Int. J. of Data Mining and Bioinformatics   »   2016 Vol.14, No.1

 

 


 

 

Title: Protein structure prediction (RMSD ≤ 5 Å) using machine learning models

 

Authors: Yadunath Pathak; Prashant Singh Rana; P.K. Singh; Mukesh Saraswat

 

Addresses:
Computational Intelligence and Data Mining Research Lab, ABV-Indian Institute of Information Technology and Management, Gwalior 474015, Madhya Pradesh, India
Computer Science and Engineering Department, Thapar University, Patiala 147004, Punjab, India
Computational Intelligence and Data Mining Research Lab, ABV-Indian Institute of Information Technology and Management, Gwalior 474015, Madhya Pradesh, India
Jaypee Institute of Information Technology, Noida 201307, Uttar Pradesh, India

 

Abstract: The physical and chemical properties of a protein help determine the quality of its structure. Here we explore machine learning models that use six physical and chemical properties, namely total empirical energy, secondary structure penalty, total surface area, pair number, residue length and Euclidean distance, to predict the RMSD of a protein structure in the absence of its true native state. A real-coded genetic algorithm is used to determine feature importance, and k-fold cross-validation is used to measure the robustness of the best predictive model. The experiments show that the random forest model outperforms the other machine learning approaches in RMSD prediction. On the testing data, the RMSD predictions have a Root Mean Square Error (RMSE) of 0.48, a correlation of 0.90, an R² of 0.82 and an accuracy of 97.02% (with ±2 error). The data set used in the study is available at http://bit.ly/PSP-ML.
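The abstract reports four evaluation measures (RMSE, correlation, R² and accuracy with ±2 error) and mentions k-fold cross-validation. The following minimal Python sketch shows one standard way to compute those quantities from observed and predicted RMSD values; it is not the authors' code. Several points are assumptions: "correlation" is taken to be the Pearson coefficient, "accuracy (with ±2 error)" is taken to mean the percentage of predictions whose absolute error is at most 2, the folds are contiguous and unshuffled, and the `rmsd` helper assumes the two coordinate sets are already optimally superimposed. The random forest regressor and the genetic-algorithm feature weighting are not reproduced, and all function names are hypothetical.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rmsd(model: Sequence[Vec3], native: Sequence[Vec3]) -> float:
    """RMSD between two equal-length coordinate sets.

    Assumes the structures are already optimally superimposed
    (no Kabsch alignment is performed here).
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    sq = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(model, native))
    return math.sqrt(sq / len(model))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted RMSD values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation (assumed meaning of 'correlation' in the abstract)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy_within(y_true: Sequence[float], y_pred: Sequence[float], tol: float = 2.0) -> float:
    """Percentage of predictions with |error| <= tol (assumed reading of '±2 error')."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return 100.0 * hits / len(y_true)


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    start = 0
    for i in range(k):
        # The first n % k folds receive one extra sample.
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    print("RMSE:", round(rmse(observed, predicted), 4))
    print("r:", round(pearson_r(observed, predicted), 4))
    print("R^2:", round(r_squared(observed, predicted), 4))
    print("accuracy (tol=2):", accuracy_within(observed, predicted))
    for train, test in k_fold_indices(len(observed), 2):
        print("train", train, "test", test)
```

In a full cross-validation loop, a regressor would be fitted on each `train` split and the four metrics evaluated on the matching `test` split; scikit-learn's `KFold` and `RandomForestRegressor` are one common way to do that in practice.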

 

Keywords: protein structure prediction; machine learning; random forest; real coded genetic algorithms; total empirical energy; secondary structure penalty; total surface area; pair number; residue length; Euclidean distance; bioinformatics.

 

DOI: 10.1504/IJDMB.2016.073361

 

Int. J. of Data Mining and Bioinformatics, 2016 Vol.14, No.1, pp.71 - 85

 

Date of acceptance: 04 May 2015
Available online: 30 Nov 2015

 

 
