Protein structure prediction (RMSD ≤ 5 Å) using machine learning models
Online publication date: Mon, 30-Nov-2015
by Yadunath Pathak; Prashant Singh Rana; P.K. Singh; Mukesh Saraswat
International Journal of Data Mining and Bioinformatics (IJDMB), Vol. 14, No. 1, 2016
Abstract: The physical and chemical properties of a protein help to determine the quality of its structure. Here we explore machine learning models that use six physical and chemical properties, namely total empirical energy, secondary structure penalty, total surface area, pair number, residue length and Euclidean distance, to predict the RMSD of a protein structure in the absence of its true native state. The Real Coded Genetic Algorithm is used to determine feature importance, and k-fold cross-validation is used to measure the robustness of the best predictive model. The experiments show that the random forest model outperforms the other machine learning approaches in RMSD prediction. On the testing data, RMSD prediction achieves a Root Mean Square Error (RMSE) of 0.48, a correlation of 0.90, an R² of 0.82 and an accuracy of 97.02% (with ±2 error). The data set used in the study is available at http://bit.ly/PSP-ML.
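The four performance figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and the toy values are hypothetical, and two definitions are assumptions: R² is taken as the coefficient of determination, and "accuracy (with ±2 error)" is read as the percentage of predictions that fall within ±2 of the actual RMSD. The abstract does not spell out either definition.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def pearson(actual, predicted):
    """Pearson correlation coefficient between the two sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def accuracy_within(actual, predicted, tol=2.0):
    """Percentage of predictions within ±tol of the actual value
    (assumed reading of 'accuracy with ±2 error')."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tol)
    return 100.0 * hits / len(actual)


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the paper's data set.
    actual_rmsd = [0.0, 1.0, 2.0, 3.0, 4.0]
    predicted_rmsd = [0.5, 1.0, 2.5, 3.0, 7.0]
    print("RMSE:", rmse(actual_rmsd, predicted_rmsd))
    print("Correlation:", pearson(actual_rmsd, predicted_rmsd))
    print("R^2:", r_squared(actual_rmsd, predicted_rmsd))
    print("Accuracy (±2):", accuracy_within(actual_rmsd, predicted_rmsd))
```

In a k-fold setting, these functions would be applied to the held-out predictions of each fold and the results averaged across folds.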