Prediction of Protein Secondary Structure with two-stage multi-class SVMs
Online publication date: Wed, 06-Dec-2006
by Minh N. Nguyen, Jagath C. Rajapakse
International Journal of Data Mining and Bioinformatics (IJDMB), Vol. 1, No. 3, 2007
Abstract: Bioinformatics techniques for Protein Secondary Structure (PSS) prediction mostly depend on the information available in amino acid sequences. In this paper, we propose a two-stage Multi-class Support Vector Machine (MSVM) approach, in which a second MSVM predictor is introduced at the output of the first-stage MSVM to capture the contextual relationships among secondary structure elements and thereby minimise the generalisation error in the prediction. Using position-specific scoring matrices generated by PSI-BLAST, the two-stage MSVM approach achieves Q3 accuracies of 78.0% on the RS126 dataset of 126 non-homologous globular proteins and 76.3% on the CB396 dataset of 396 non-homologous proteins, higher than the scores previously reported on either dataset. With MSVM, the present prediction scheme achieves significant improvements of 2–6% in Q3 accuracy and 3–15% in Sov accuracy on the two datasets. On larger blind-test datasets from PSIPRED, CASP4 and EVA, the two-stage MSVM approach achieves Q3 accuracies from 77.0% to 79.5%.
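As a rough illustration of two ideas mentioned in the abstract, the Python sketch below shows how a Q3 accuracy can be computed and how first-stage outputs might be gathered into a window to form second-stage inputs. Q3 is taken here as the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The function names, the zero-padding at the sequence ends, and the window half-width are assumptions made for illustration; the abstract does not specify the authors' encoding or window size.

```python
from typing import List, Sequence


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def window_features(
    stage_one_scores: Sequence[Sequence[float]], half_width: int
) -> List[List[float]]:
    """Build hypothetical second-stage inputs from first-stage outputs.

    For each residue, the per-class score vectors of its neighbours within
    `half_width` positions are concatenated. Positions that fall outside the
    sequence are filled with zeros (an assumption, not the paper's choice).
    """
    n = len(stage_one_scores)
    width = len(stage_one_scores[0]) if n else 0
    padding = [0.0] * width
    features = []
    for i in range(n):
        row: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(stage_one_scores[j] if 0 <= j < n else padding)
        features.append(row)
    return features
```

For example, `q3_accuracy("HHEC", "HHCC")` compares four residues, three of which agree. Each row produced by `window_features` has `(2 * half_width + 1)` times as many entries as a single score vector, and such rows could then be passed to a second classifier.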