Authors: Ioannis K. Valavanis, George M. Spyrou, Konstantina S. Nikita
Addresses: Faculty of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., 15780 Zografou, Athens, Greece; Biomedical Informatics Unit, Biomedical Research Foundation, Academy of Athens, 4 Soranou Efessiou Str., 115 27 Athens, Greece; Faculty of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., 15780 Zografou, Athens, Greece
Abstract: Fold recognition based on sequence-derived features is a complex multi-class classification problem. In this study, we compare five classification techniques for fold recognition: multilayer perceptron neural networks, probabilistic neural networks, nearest neighbour classifiers, multi-class support vector machines and classification trees. The comparison uses a reference set of proteins organised in 27 folds, with each protein described by a 125-dimensional vector of sequence-derived features. We evaluate all classifiers in terms of total accuracy, mutual information coefficient, sensitivity and specificity using ten-fold cross-validation. A polynomial support vector machine and a multilayer perceptron with one hidden layer of 88 nodes performed best, achieving multi-class classification accuracies of 42.8% and 42.1%, respectively. These accuracies are satisfactory given the complexity of the problem and the similar classification performances reported by other researchers.
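To make the evaluation measures named in the abstract concrete, the following minimal Python sketch computes total accuracy and one-vs-rest sensitivity and specificity for a multi-class prediction. It is an illustration only, not the authors' implementation: the function name `per_class_metrics`, the one-vs-rest definition of the per-class measures and the toy labels are all assumptions, and the mutual information coefficient and the cross-validation loop are not covered.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return (total accuracy, {label: (sensitivity, specificity)}).

    Hypothetical helper for illustration. Sensitivity and specificity are
    computed one-vs-rest for each class, which is an assumed definition.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pair_counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)

    # Total accuracy: fraction of samples on the confusion-matrix diagonal.
    accuracy = sum(pair_counts[(c, c)] for c in labels) / n

    metrics = {}
    for c in labels:
        tp = pair_counts[(c, c)]
        fn = true_counts[c] - tp   # class c samples predicted as another class
        fp = pred_counts[c] - tp   # other samples predicted as class c
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        metrics[c] = (sensitivity, specificity)
    return accuracy, metrics


# Toy labels standing in for fold classes (not data from the study).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
accuracy, metrics = per_class_metrics(y_true, y_pred)
```

In a ten-fold cross-validation setting, such a function would typically be applied to the held-out predictions of each fold, or to the predictions pooled across all ten folds; which of the two the study used is not stated in the abstract.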
Keywords: protein fold recognition; sequence-derived features; neural networks; NNs; support vector machine; SVM; classification tree; nearest neighbour classifier; multi-class classification; proteins.
International Journal of Computational Intelligence in Bioinformatics and Systems Biology, 2010 Vol.1 No.3, pp.332 - 346
Available online: 02 Feb 2010