Title: Protein sub-cellular localisation prediction by analysis of short-range residue correlations

Authors: Jian Guo, Yuanlie Lin, Zhirong Sun

Addresses: Laboratory of Statistical Computing and Bioinformatics, Department of Mathematical Sciences, Tsinghua University, Beijing 100084, PR China; Laboratory of Statistical Computing and Bioinformatics, Department of Mathematical Sciences, Tsinghua University, Beijing 100084, PR China; MOE Key Lab of Bioinformatics, State Key Lab of Biomembrane and Membrane Biotechnology, Institute of Bioinformatics, Department of Biological Sciences and Biotechnology, Tsinghua University, Beijing 100084, PR China

Abstract: Sub-cellular localisation plays an important role in genome analysis. This paper describes a new residue-couple model that uses a support vector machine to predict the sub-cellular localisation of proteins. This new approach provides better predictions than the existing methods. The total prediction accuracies on Reinhardt and Hubbard's dataset reach 92.0% for prokaryotic protein sequences and 86.9% for eukaryotic protein sequences with fivefold cross-validation. For a new dataset with 8304 proteins located in eight sub-cellular locations, the total accuracy reaches 88.9%. The model is also robust against N-terminal errors in the sequences.
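
The abstract does not define the residue-couple model precisely, so the following is only a minimal sketch of one plausible reading of "short-range residue correlations": for each small separation between sequence positions, count how often each ordered pair of amino acids occurs at that separation and normalise by the number of positions. The function name `residue_couple_features` and the parameter `max_gap` are hypothetical and chosen for illustration; the paper's actual feature definition and parameter choices may differ. The resulting fixed-length vectors could then be passed to any support vector machine implementation.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes) and all 400 ordered pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def residue_couple_features(sequence, max_gap=3):
    """Return normalised ordered-pair frequencies at separations 1..max_gap.

    Assumption: max_gap is an illustrative parameter, not a value taken
    from the paper. The output has 400 * max_gap entries.
    """
    seq = sequence.upper()
    features = []
    for gap in range(1, max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = max(len(seq) - gap, 0)
        for i in range(n_positions):
            pair = seq[i] + seq[i + gap]
            if pair in counts:  # skip non-standard residues such as X
                counts[pair] += 1
        denom = n_positions or 1  # avoid division by zero for short input
        features.extend(counts[p] / denom for p in PAIRS)
    return features
```

Because every separation contributes its own block of 400 values, the vector length depends only on `max_gap` and not on the protein length, which is what allows sequences of different lengths to be compared by a single classifier.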

Keywords: sub-cellular localisation; residue-couple model; short-range residue correlations; support vector machine; bioinformatics; proteins; prokaryotic protein sequences; eukaryotic protein sequences; genome sequencing.

DOI: 10.1504/IJBRA.2006.009762

International Journal of Bioinformatics Research and Applications, 2006 Vol.2 No.2, pp.105 - 118

Published online: 09 May 2006
