Title: Towards predicting Protein-Protein Interactions in novel organisms

Authors: Patrick Shaughnessy, Gary Livingston, Michael V. Graves

Addresses: Department of Computer Science, University of Massachusetts Lowell, One University Avenue, Lowell, MA 01854, USA; Department of Computer Science, University of Massachusetts Lowell, One University Avenue, Lowell, MA 01854, USA; Department of Biological Sciences, University of Massachusetts Lowell, One University Avenue, Lowell, MA 01854, USA

Abstract: Machine learning methods are often used to predict Protein-Protein Interactions (PPI). It is common to develop such methods using known PPI from a well-characterised reference organism, drawing on that organism's data both to infer a predictive model and to evaluate it. We present evidence that this practice does not give a meaningful indication of the model's performance on genetically distinct organisms. We conclude that this practice cannot be applied to proteins inferred from the genetic sequence of a novel organism for which no PPI data is available, and that there is a need to evaluate such methods on organisms distinct from their training organisms.

Keywords: proteins; protein-protein interactions; interaction prediction; machine learning; classification; model generalisation; cross-organism prediction; random forests; human herpesvirus 8; Saccharomyces cerevisiae; Arabidopsis thaliana; Chlorella viruses; genetic sequences; novel organisms.

DOI: 10.1504/IJCBDD.2008.021417

International Journal of Computational Biology and Drug Design, 2008 Vol.1 No.3, pp.235 - 253

Available online: 26 Nov 2008
