Authors: Patrick Shaughnessy, Gary Livingston, Michael V. Graves
Addresses: Department of Computer Science, University of Massachusetts Lowell, One University Avenue, Lowell, MA 01854, USA; Department of Computer Science, University of Massachusetts Lowell, One University Avenue, Lowell, MA 01854, USA; Department of Biological Sciences, University of Massachusetts Lowell, One University Avenue, Lowell, MA 01854, USA
Abstract: Machine learning methods are often used to predict Protein-Protein Interactions (PPI). It is common to develop such methods using known PPI from well-characterised reference organisms, drawing on that organism's data both to infer a predictive model and to evaluate it. We present evidence that this practice does not give a meaningful indication of the model's performance on genetically distinct organisms. We conclude that this practice cannot be relied upon for proteins inferred from the genetic sequence of a novel organism for which no PPI data are available, and that there is a need to evaluate such methods on organisms distinct from their training organisms.
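The abstract contrasts two evaluation protocols: holding out pairs from the same organism used for training, and testing on an organism never seen during training. The Python sketch below illustrates that difference under stated assumptions; it is not the authors' method. All function names (`within_organism_split`, `cross_organism_split`, `leave_one_organism_out`, `fit_centroids`, `accuracy`), the `(organism, features, label)` record format, and the organism tags `org_A`/`org_B`/`org_C` are hypothetical. A simple nearest-centroid classifier stands in for the random forests named in the keywords, and the demo data are synthetic, not results from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record format: (organism, feature_vector, label), label 1 = interacting pair.
Example = Tuple[str, List[float], int]

def within_organism_split(examples: List[Example], organism: str,
                          test_frac: float = 0.3, seed: int = 0):
    """Conventional practice: train and test on disjoint pairs from ONE organism."""
    pool = [e for e in examples if e[0] == organism]
    random.Random(seed).shuffle(pool)
    cut = int(len(pool) * (1 - test_frac))
    return pool[:cut], pool[cut:]

def cross_organism_split(examples: List[Example], test_organism: str):
    """Cross-organism evaluation: the test organism never appears in training."""
    train = [e for e in examples if e[0] != test_organism]
    test = [e for e in examples if e[0] == test_organism]
    return train, test

def leave_one_organism_out(examples: List[Example], fit, score) -> Dict[str, float]:
    """Hold out each organism in turn, train on the rest, score on the held-out one."""
    results = {}
    for org in sorted({e[0] for e in examples}):
        train, test = cross_organism_split(examples, org)
        if train and test:
            results[org] = score(fit(train), test)
    return results

def fit_centroids(train: List[Example]) -> Dict[int, List[float]]:
    """Stand-in classifier (NOT a random forest): mean feature vector per label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for _, x, y in train:
        sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def accuracy(centroids: Dict[int, List[float]], test: List[Example]) -> float:
    """Fraction of test pairs whose nearest centroid (squared Euclidean) has the true label."""
    def predict(x: List[float]) -> int:
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return sum(predict(x) == y for _, x, y in test) / len(test)

if __name__ == "__main__":
    # Synthetic data: each organism shifts the feature distribution differently.
    rng = random.Random(42)
    data: List[Example] = []
    for org, shift in [("org_A", 0.0), ("org_B", 0.5), ("org_C", 1.0)]:
        for _ in range(20):
            y = rng.randint(0, 1)
            data.append((org, [y + shift + rng.gauss(0, 0.3), rng.gauss(0, 1)], y))
    train, test = within_organism_split(data, "org_A")
    print("within org_A:", accuracy(fit_centroids(train), test))
    print("leave-one-organism-out:", leave_one_organism_out(data, fit_centroids, accuracy))
```

Because the synthetic organisms differ by a feature shift, the within-organism score and the held-out-organism scores can diverge; that gap is the kind of discrepancy the abstract argues a same-organism evaluation would hide.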
Keywords: proteins; protein-protein interactions; interaction prediction; machine learning; classification; model generalisation; cross-organism prediction; random forests; human herpesvirus 8; Saccharomyces cerevisiae; Arabidopsis thaliana; Chlorella viruses; genetic sequences; novel organisms.
International Journal of Computational Biology and Drug Design, 2008 Vol.1 No.3, pp.235 - 253
Available online: 26 Nov 2008