Title: Prediction of the disulphide bonding state of cysteines in proteins using Conditional Random Fields

Authors: Watshara Shoombuatong, Patrinee Traisathit, Sukon Prasitwattanaseree, Chatchai Tayapiwatana, Robert Cutler, Jeerayut Chaijaruwanich

Addresses: Faculty of Science, Bioinformatics Research Laboratory, Chiang Mai University, 50200, Thailand. ' Faculty of Science, Department of Statistics and Bioinformatics Research Laboratory, Chiang Mai University, 50200, Thailand. ' Faculty of Science, Department of Statistics and Bioinformatics Research Laboratory, Chiang Mai University, 50200, Thailand. ' Faculty of Associated Medical Sciences, Division of Clinical Immunology, Department of Medical Technology, Chiang Mai University, 50200, Thailand; Biomedical Technology Research Unit, National Science and Technology Development Agency, Chiang Mai University, 50200, Thailand. ' 15 Prasing Post, Ampur Muang, Chiang Mai 50200, Thailand. ' Faculty of Science, Department of Computer Science and Bioinformatics Research Laboratory, Chiang Mai University, 50200, Thailand

Abstract: The formation of disulphide bonds between cysteines plays a major role in protein folding, structure, function and evolution. Many computational approaches have been used to predict the disulphide bonding state of cysteines. In this work, we developed a novel method based on Conditional Random Fields (CRFs) to predict the disulphide bonding state from the protein primary sequence, predicted secondary structures and predicted relative solvent accessibilities (together referred to as all-state information). Using all-state information, our method achieves 84% accuracy, 88% precision and 94% recall. However, essentially identical results are obtained using only the protein sequence and predicted relative solvent accessibilities, without secondary structure.
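The abstract does not give the model details, so the following is only a minimal sketch of how a linear-chain CRF can assign a bonding state to each cysteine: Viterbi decoding over two labels, plus the accuracy, precision and recall measures quoted above. It is not the authors' implementation. It assumes that the cysteines of one protein form the chain being labelled, that "bonded" is the positive class for precision and recall, and that the scores in `EMISSION` and `TRANSITION` are hypothetical numbers rather than learned weights; the function names are illustrative.

```python
# Minimal linear-chain CRF decoding sketch (illustrative; not the authors' code).
# Assumption: each cysteine in a protein is one position in the chain and takes
# one of two labels. All scores below are hypothetical numbers.

LABELS = ("bonded", "free")


def viterbi(emission, transition):
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emission[t][y]   -- log-potential of label y at position t (in a trained
                        CRF, a weighted sum of features derived from the
                        sequence, predicted secondary structure and predicted
                        relative solvent accessibility)
    transition[a][b] -- log-potential of label a followed by label b
    The partition function Z(x) is the same for every labelling, so it is
    not needed to find the best one.
    """
    k = len(LABELS)
    score = [list(emission[0])]  # best score of any path ending in each label
    back = [[0] * k]             # back-pointers to the previous label
    for t in range(1, len(emission)):
        row, ptr = [], []
        for y in range(k):
            prev = max(range(k), key=lambda a: score[t - 1][a] + transition[a][y])
            row.append(score[t - 1][prev] + transition[prev][y] + emission[t][y])
            ptr.append(prev)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(range(k), key=lambda y: score[-1][y])]
    for t in range(len(emission) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return [LABELS[y] for y in reversed(path)]


def confusion_metrics(true, pred, positive="bonded"):
    """Accuracy, precision and recall, treating `positive` as the positive class."""
    pairs = list(zip(true, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical scores for four cysteines in one protein.
EMISSION = [[1.2, 0.3], [0.4, 0.6], [0.9, 0.1], [-0.2, 0.5]]
# Hypothetical transitions that reward neighbouring cysteines sharing a state.
TRANSITION = [[0.8, -0.5], [-0.5, 0.8]]

predicted = viterbi(EMISSION, TRANSITION)
print(predicted)  # ['bonded', 'bonded', 'bonded', 'bonded']
print(confusion_metrics(["bonded", "bonded", "free", "bonded"], predicted))
# (0.75, 0.75, 1.0)
```

With these transition scores, the second and fourth cysteines are labelled bonded even though their own emission scores favour free, which illustrates how a CRF shares information between positions. Setting every transition score to zero reduces the decoder to an independent per-position choice.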

Keywords: cysteine bonding; CRFs; conditional random fields; bioinformatics; machine learning; disulphide bonds; proteins; protein sequences.

DOI: 10.1504/IJDMB.2011.041559

International Journal of Data Mining and Bioinformatics, 2011 Vol.5 No.4, pp.449 - 464

Published online: 24 Jan 2015