Title: Exploring alternative knowledge representations for protein secondary-structure prediction

Authors: Uros Midic, A. Keith Dunker, Zoran Obradovic

Addresses: Center for Information Science and Technology, Temple University, 1805 N. Broad St., 303 Wachman Hall, Philadelphia, PA 19129, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 714 North Senate Avenue, Suite 250, Indianapolis, IN 46202, USA; Center for Information Science and Technology, Temple University, 1805 N. Broad St., 303 Wachman Hall, Philadelphia, PA 19129, USA

Abstract: Methods for 3-class secondary-structure prediction are thought to be reaching the highest achievable accuracy. Their accuracy on the β-sheet residue class is considerably lower than on the other two classes. We analysed the relevance of 315 individual input attributes for a predictor that uses the usual framework: sequence-profile-based data with a fixed-size input window. We propose two alternative knowledge representations with significantly smaller sets of input attributes. We also investigated whether the prediction of connected pairs of β-sheet residues and the prediction of residue contact maps can be exploited to improve the accuracy of secondary-structure prediction.

Keywords: protein structure prediction; protein folding; bioinformatics; sensitivity analysis; feature selection; knowledge representation; machine learning; data mining; protein secondary structure.

DOI: 10.1504/IJDMB.2007.011614

International Journal of Data Mining and Bioinformatics, 2007 Vol.1 No.3, pp.286 - 313

Published online: 06 Dec 2006
