Title: RNA secondary structure prediction using conditional random fields model

Authors: Sitthichoke Subpaiboonkit; Chinae Thammarongtham; Robert W. Cutler; Jeerayut Chaijaruwanich

Addresses: Faculty of Science, Department of Computer Science and Bioinformatics Research Laboratory, Chiang Mai University 50200, Thailand; Biochemical Engineering and Pilot Plant Research and Development Unit, National Center for Genetic Engineering and Biotechnology, Bangkok 10150, Thailand; Independent Research Scientist, 15 Prasing Post, Ampur Muang, Chiang Mai 50200, Thailand; Faculty of Science, Department of Computer Science and Bioinformatics Research Laboratory, Chiang Mai University 50200, Thailand

Abstract: Non-coding RNAs (ncRNAs) have important biological functions in living cells that depend on their conserved secondary structures. Here, we focus on computational RNA secondary structure prediction by exploring primary sequences and complementary base pair interactions using the Conditional Random Fields (CRFs) model, which treats RNA prediction as a sequence labelling problem. To obtain suitable features from known RNA secondary structures, we developed a feature extraction method based on the loop and stem characteristics of natural RNAs. Our CRFs models can predict the secondary structures of the test RNAs with optimal F-scores between 56.61% and 98.20% for different RNA families.
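
To make the sequence labelling view and the F-score figures concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that structures are written in dot-bracket notation, that each position gets a hypothetical two-letter label ("S" for stem, "L" for loop), and that the F-score is computed over predicted versus known base pairs; the abstract does not specify the label set or the exact scoring unit, so all function names and conventions below are illustrative.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base-pair positions in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def labels_from_dot_bracket(structure):
    """Map each position to a label: 'S' (paired, stem) or 'L' (unpaired, loop).

    This two-label scheme is an assumption made for illustration only.
    """
    return ["L" if ch == "." else "S" for ch in structure]


def f_score(true_structure, predicted_structure):
    """Harmonic mean of precision and recall over base pairs (assumed unit)."""
    true_pairs = pairs_from_dot_bracket(true_structure)
    pred_pairs = pairs_from_dot_bracket(predicted_structure)
    tp = len(true_pairs & pred_pairs)  # correctly predicted base pairs
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)
```

For example, `labels_from_dot_bracket("((..))")` yields one label per nucleotide, which is the kind of target sequence a sequence labelling model is trained to emit, and `f_score("((..))", "(....)")` compares a predicted structure against a known one.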

Keywords: RNA secondary structure prediction; ncRNA; non-coding RNA; CRFs; conditional random fields; bioinformatics; machine learning; data mining; feature extraction.

DOI: 10.1504/IJDMB.2013.053195

International Journal of Data Mining and Bioinformatics, 2013 Vol.7 No.2, pp.118 - 134

Received: 20 Oct 2010
Accepted: 20 Jul 2011

Published online: 20 Oct 2014
