Authors: Sitthichoke Subpaiboonkit; Chinae Thammarongtham; Robert W. Cutler; Jeerayut Chaijaruwanich
Addresses: Faculty of Science, Department of Computer Science and Bioinformatics Research Laboratory, Chiang Mai University 50200, Thailand; Biochemical Engineering and Pilot Plant Research and Development Unit, National Center for Genetic Engineering and Biotechnology, Bangkok 10150, Thailand; Independent Research Scientist, 15 Prasing Post, Ampur Muang, Chiang Mai 50200, Thailand; Faculty of Science, Department of Computer Science and Bioinformatics Research Laboratory, Chiang Mai University 50200, Thailand
Abstract: Non-coding RNAs (ncRNAs) perform important biological functions in living cells, and these functions depend on their conserved secondary structures. Here, we focus on computational RNA secondary structure prediction from primary sequences and complementary base-pair interactions using Conditional Random Fields (CRFs), a model that treats RNA structure prediction as a sequence labelling problem. To extract suitable features from known RNA secondary structures, we developed a feature extraction method based on the loop and stem characteristics of natural RNAs. Our CRF models predict the secondary structures of the test RNAs with optimal F-scores between 56.61% and 98.20%, depending on the RNA family.
Keywords: RNA secondary structure prediction; ncRNA; non-coding RNA; CRFs; conditional random fields; bioinformatics; machine learning; data mining; feature extraction.
International Journal of Data Mining and Bioinformatics, 2013 Vol.7 No.2, pp.118-134
Received: 20 Oct 2010
Accepted: 20 Jul 2011
Published online: 29 Mar 2013
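
The abstract frames structure prediction as sequence labelling and reports results as F-scores. The sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. It assumes structures are written in dot-bracket notation, uses a hypothetical two-label scheme ("S" for stem, "L" for loop) that may differ from the labels used in the paper, and assumes the F-score is computed over predicted base pairs. The function names are illustrative only.

```python
def dot_bracket_to_labels(structure):
    """Turn a dot-bracket string into one label per base.

    Hypothetical scheme: 'S' for a paired (stem) base, 'L' for an
    unpaired (loop) base. A sequence labeller such as a CRF would be
    trained to emit one such label per nucleotide.
    """
    return ["S" if ch in "()" else "L" for ch in structure]


def base_pairs(structure):
    """Return the set of (i, j) index pairs in a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def f_score(true_structure, predicted_structure):
    """F-score over base pairs (assumed definition): harmonic mean of
    precision and recall of the predicted pairs."""
    true_pairs = base_pairs(true_structure)
    pred_pairs = base_pairs(predicted_structure)
    tp = len(true_pairs & pred_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)
```

For example, `dot_bracket_to_labels("((..))")` gives one label for each of the six bases, and `f_score("((..))", "(....)")` scores a prediction that recovers one of the two true base pairs.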