Title: Secondary protein structure prediction combining protein structural class, relative surface accessibility, and contact number

Authors: Imad Rahal; Jonathon Walz

Addresses: Department of Computer Science, College of Saint Benedict, Saint John's University, Collegeville, MN 56321, USA (both authors)

Abstract: With huge amounts of molecular data produced from ever-increasing numbers of genomic and proteomic studies, predicting the secondary structure of proteins from amino acid sequences has become a common expectation among scientists. Several studies in the literature have demonstrated that the accuracy of such predictions can be drastically improved by incorporating additional types of protein data into the prediction process; however, no work has studied the effect of incorporating multiple types of protein data simultaneously. In this work, we report our findings from an extensive experimental study that uses neural networks designed to study the effect of using different combinations of protein data on the accuracy of predicting secondary protein structures. Overall, our experimental results indicate that accuracy improves the most when incorporating contact number, relative surface accessibility or any combination that includes at least one of the two into the prediction process.

Keywords: protein structure prediction; neural networks; machine learning; scientific data mining; data science; bioinformatics; protein structural class; relative surface accessibility; protein contact number.

DOI: 10.1504/IJDS.2018.090624

International Journal of Data Science, 2018 Vol.3 No.1, pp.68 - 85

Received: 08 Jan 2015
Accepted: 07 Nov 2015

Published online: 25 Mar 2018
