Title: Predicting functional residues of protein sequence alignments as a feature selection task
Authors: Chris Haddow; Justin Perry; Marcus Durrant; Joe Faith
Addresses: School of Computing, Engineering, and Information Sciences, Northumbria University, Newcastle, NE2 1XE, UK; School of Applied Sciences, Northumbria University, Newcastle, NE1 8ST, UK; School of Life Sciences, Department of Chemical and Forensic Sciences, Northumbria University, Newcastle, NE2 1XE, UK; School of Computing, Engineering, and Information Sciences, Northumbria University, Newcastle, NE2 1XE, UK
Abstract: Determining which residues within a multiple alignment of protein sequences are most responsible for protein function is a difficult and important task in bioinformatics. Here, we show that this task is an instance of the standard Feature Selection (FS) problem. We compare standard FS techniques with more specialised algorithms on a range of data sets backed by experimental evidence, and find that some standard algorithms perform as well as specialised ones. We also discuss how considering the discriminating power of combinations of residue positions, rather than that of each position individually, could improve the performance of such algorithms.
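To make the feature selection framing concrete, the following is a minimal sketch, not the authors' implementation. It treats each alignment column as a categorical feature and ranks columns by information gain (one of the listed keywords) with respect to a class label per sequence. The toy alignment, the class labels, and the function names are all hypothetical; using functional class labels as the target is an assumption, as is treating a gap character as just another symbol.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting sequences by the residue in one column."""
    n = len(labels)
    groups = {}
    for residue, label in zip(column, labels):
        groups.setdefault(residue, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_positions(alignment, labels):
    """Return column indices sorted by decreasing information gain, plus the raw scores."""
    columns = list(zip(*alignment))  # transpose: one tuple of residues per position
    scores = [information_gain(col, labels) for col in columns]
    ranking = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Hypothetical toy data: four aligned sequences in two assumed functional classes.
alignment = ["ACDK", "ACEK", "GCDR", "GCER"]
labels = ["class1", "class1", "class2", "class2"]

ranking, scores = rank_positions(alignment, labels)
print(ranking, scores)
```

In this toy case the first and last columns separate the two classes perfectly, while the fully conserved second column and the class-independent third column score zero. A per-column score like this evaluates each position individually, so it cannot detect positions that are only discriminating in combination.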
Keywords: feature selection; protein structure; functional residue prediction; targeted projection pursuit; information gain; functional residues; protein sequence alignments; protein sequences; bioinformatics.
DOI: 10.1504/IJDMB.2011.045417
International Journal of Data Mining and Bioinformatics, 2011 Vol.5 No.6, pp.691 - 705
Received: 05 Oct 2009
Accepted: 07 May 2010
Published online: 24 Jan 2015