Authors: Ananda Mondal; Jianjun Hu
Addresses: Department of Computer Science and Engineering, University of South Carolina, 301 Main St, Columbia, SC 29036, USA; Computer Science Department, 301 Main St, Columbia, SC 29208, USA
Abstract: We present NetLoc, a novel diffusion Kernel-based Logistic Regression (KLR) algorithm for predicting protein subcellular localisation using four types of protein networks: physical Protein-Protein Interaction (PPI) networks, genetic PPI networks, mixed PPI networks and co-expression networks. NetLoc is applied to yeast protein localisation prediction. The results show that protein networks provide rich information for protein localisation prediction, achieving an Area Under Curve (AUC) score of 0.93. We also show that networks with high connectivity and a high percentage of co-localised PPIs lead to better prediction performance. Further investigation shows that NetLoc is a robust approach: it achieves good performance (AUC = 0.75) using only 30% of the original interactions, and an overall accuracy greater than 0.5 with only 20% annotation coverage. Compared with the previous network feature-based prediction algorithm, which achieved AUC scores of 0.49 and 0.52 on the yeast PPI network, NetLoc achieved significantly better overall performance, with an AUC of 0.74.
Keywords: NetLoc; protein localisation prediction; protein-protein interaction; PPI networks; genetic networks; co-expression networks; kernel-based logistic regression; diffusion kernel; data mining; bioinformatics; protein subcellular localisation.
International Journal of Data Mining and Bioinformatics, 2014 Vol.9 No.4, pp.386 - 400
Received: 15 Apr 2011
Accepted: 15 Apr 2011
Published online: 15 Oct 2013