Title: Effective framework for protein structure prediction

Authors: Nagamma Patil; Durga Toshniwal; Kumkum Garg

Addresses: Department of Electronics and Computer Engineering, Indian Institute of Technology, Roorkee 247667, India; Department of Electronics and Computer Engineering, Indian Institute of Technology, Roorkee 247667, India; Department of Computer Science and Engineering, Manipal University, Jaipur 302026, India

Abstract: This paper presents a computational system that predicts protein structure using N-grams and a wrapper feature selection framework (an N-gram is a subsequence of N characters extracted from a larger sequence). N-gram features are extracted from a dataset of 277 domains: 70 all-α domains, 61 all-β domains, 81 α/β domains and 65 α + β domains. A wrapper feature selection system, GA-SVM, is applied to obtain an optimised feature set. Using the optimised 3,070-feature subset, a classifier model is trained and tested in the Support Vector Machine (SVM) learning system. This model achieves an overall accuracy of 88.09% under a 10-fold cross-validation test, which is 4.7% higher than the accuracy obtained with the initial 6,414 features. Experimental results also show that feature subset selection with the proposed GA-SVM wrapper approach improves classification accuracy in comparison to other GA-based wrapper approaches and existing protein sequence encoding methods.
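The N-gram definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy sequence, and the use of raw overlapping counts as feature values are all assumptions made for illustration, since the abstract does not say how the features are weighted or which values of N are used.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every overlapping length-n subsequence of `sequence`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty Counter.
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def feature_vector(sequence, vocabulary, n):
    """Map a sequence to counts over a fixed, ordered N-gram vocabulary."""
    counts = ngram_counts(sequence, n)
    return [counts[gram] for gram in vocabulary]


# Hypothetical toy sequence of one-letter amino acid codes (not from the paper's dataset).
toy_sequence = "MKVLAAGMKV"

counts = ngram_counts(toy_sequence, 2)
vocabulary = sorted(counts)  # assumed ordering: alphabetical over observed 2-grams
vector = feature_vector(toy_sequence, vocabulary, 2)

print(vocabulary)
print(vector)
```

In a wrapper scheme such as the one the abstract describes, vectors of this kind would be the input from which a feature subset is selected and scored by the classifier.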

Keywords: wrapper feature selection; GAs; genetic algorithms; SVM; support vector machines; protein structure prediction; classification accuracy; protein sequences.

DOI: 10.1504/IJFIPM.2012.050426

International Journal of Functional Informatics and Personalised Medicine, 2012 Vol.4 No.1, pp.69-79

Received: 21 Mar 2012
Accepted: 16 Jul 2012

Published online: 20 Nov 2012
