Title: A hybrid method for splice site prediction based on Markov model and codon information

Authors: Dan Wei; Yin Peng; Yanjie Wei; Qingshan Jiang; Jinglong Fang

Addresses: Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China ' Department of Pathology, Shenzhen University School of Medicine, Shenzhen, Guangdong, China; Department of Pathology, School of Basic Medical Sciences, Wuhan University, Hubei, China ' Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China ' Shenzhen Key Lab for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China ' Key Laboratory of Complex Systems Modeling and Simulation, School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China

Abstract: Accurate splice site prediction is an essential step in gene identification. In this paper, we propose a hybrid splice site prediction method, SVM with Markov model and Codon usage (MC-SVM). The sequence features used by MC-SVM combine codon bias information with the Markov probabilistic dependence between adjacent nucleotides. Features are selected with an F-score-based method, and an SVM then predicts both acceptor and donor splice sites. Tests on the HS3D data set show that MC-SVM performs well on human gene sequences: on a data set with equal numbers of true and false splice site sequences, its prediction accuracy is 94.0% for donor splice sites and 91.5% for acceptor splice sites. Compared with many other methods, MC-SVM achieves improved prediction performance.
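
Illustrative note: the abstract does not spell out the F-score-based selection step, so the following is only a minimal sketch of how such a ranking is commonly done, not the authors' implementation. It assumes the widely used two-class F-score (squared distance of each class mean from the overall mean, divided by the sum of the per-class sample variances). The function names `f_scores` and `select_top_k` and the toy feature vectors are hypothetical.

```python
from statistics import mean, variance


def f_scores(pos, neg):
    """Return one F-score per feature column.

    pos, neg: lists of equal-length numeric feature vectors for true and
    false splice sites. Each class needs at least two samples, because
    the sample variance is undefined otherwise.
    """
    n_features = len(pos[0])
    scores = []
    for j in range(n_features):
        p = [row[j] for row in pos]   # feature j over true sites
        n = [row[j] for row in neg]   # feature j over false sites
        overall = mean(p + n)
        # Between-class separation of the two class means.
        between = (mean(p) - overall) ** 2 + (mean(n) - overall) ** 2
        # Within-class spread (sample variances, n-1 denominator).
        within = variance(p) + variance(n)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def select_top_k(pos, neg, k):
    """Return the indices of the k features with the highest F-score."""
    scores = f_scores(pos, neg)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (assumed): feature 0 separates the classes, feature 1 does not.
toy_pos = [[10, 5], [12, 4], [9, 6]]
toy_neg = [[1, 5], [2, 6], [0, 4]]
toy_scores = f_scores(toy_pos, toy_neg)
toy_selected = select_top_k(toy_pos, toy_neg, 1)
```

In a pipeline of the kind the abstract describes, the retained columns would then be passed to an SVM classifier; the number of features to keep is a tuning choice that this sketch leaves open.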

Keywords: splice site prediction; support vector machines; SVM; Markov models; codon bias; splice sites; gene identification; bioinformatics; feature selection; gene sequences.

DOI: 10.1504/IJDMB.2016.082211

International Journal of Data Mining and Bioinformatics, 2016 Vol.16 No.4, pp.345 - 362

Accepted: 26 Dec 2016
Published online: 12 Feb 2017
