Title: Using naïve Bayesian classification as a meta-predictor to improve start codon prediction accuracy in prokaryotic organisms

Authors: Sean Landman; Imad Rahal

Addresses: Computer Science Department, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA; Computer Science Department, College of St. Benedict/St. John's University, 211 Peter Engel Science Center, Collegeville, MN 56321-3000, USA

Abstract: Modern gene location prediction techniques achieve near-perfect accuracy for prokaryotic organisms, but the reported accuracy generally applies only to stop codon locations. Accurate prediction of start codon locations is harder to attain, and different approaches often produce conflicting predictions for the same gene. In this paper, we describe a new approach to resolve these conflicts and improve start codon prediction accuracy. Our approach combines the gene location predictions of several popular prediction approaches to find consistently predicted gene locations. It then uses these consistent genes as a training set for a naïve Bayesian classifier that improves accuracy on the ambiguous genes, i.e., those for which the original predictions disagree on the start codon location. The methods detailed here apply to prokaryotic organisms, with E. coli and the EcoGene Verified Set database as a case study.
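The meta-prediction idea in the abstract (train on genes where all predictors agree, then use the classifier to arbitrate genes where they disagree) can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made purely for illustration: genes are on the forward strand only; candidate starts are the in-frame ATG/GTG/TTG codons between a gene's stop codon and the previous in-frame stop; the features are the start codon plus the six upstream bases (a crude stand-in for a ribosome binding site signal); and negatives are the non-agreed candidates of consistent genes. All function and class names (`candidate_starts`, `features`, `NaiveBayes`, `build_training_set`, `resolve`) are hypothetical. The classifier is a categorical naïve Bayes with add-one (Laplace) smoothing, and each ambiguous gene receives the predicted start with the largest log-odds of being a true start.

```python
from collections import Counter, defaultdict
import math

START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def candidate_starts(genome, stop_pos):
    """Hypothetical: in-frame start codons upstream of the stop codon at
    genome[stop_pos:stop_pos+3] (forward strand), scanning back to the
    previous in-frame stop codon. Returned nearest-to-stop first."""
    starts, pos = [], stop_pos - 3
    while pos >= 0:
        codon = genome[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            starts.append(pos)
        pos -= 3
    return starts

def features(genome, start, window=6):
    """Hypothetical feature set: the start codon plus each upstream base."""
    feats = {"codon": genome[start:start + 3]}
    for k in range(1, window + 1):
        i = start - k
        feats[f"up{k}"] = genome[i] if i >= 0 else "N"
    return feats

class NaiveBayes:
    """Binary categorical naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)      # feature -> values seen in training
        for feats, label in zip(X, y):
            for name, val in feats.items():
                self.counts[(label, name)][val] += 1
                self.values[name].add(val)
        return self

    def log_score(self, feats, label):
        # Smoothed prior for two classes, so an absent class never gives log(0).
        total = sum(self.class_counts.values())
        n = self.class_counts[label]
        score = math.log((n + 1) / (total + 2))
        for name, val in feats.items():
            v = len(self.values[name]) + 1  # +1 reserves mass for unseen values
            score += math.log((self.counts[(label, name)][val] + 1) / (n + v))
        return score

def build_training_set(genome, genes):
    """genes: list of (stop_pos, {predictor_name: predicted_start}).
    Only consistent genes (all predictors agree) contribute examples."""
    X, y = [], []
    for stop_pos, preds in genes:
        starts = set(preds.values())
        if len(starts) != 1:
            continue  # ambiguous gene: held out for classification
        agreed = starts.pop()
        for cand in candidate_starts(genome, stop_pos):
            X.append(features(genome, cand))
            y.append(1 if cand == agreed else 0)
    return X, y

def start_margin(genome, model, start):
    """Log-odds that `start` is a true start codon under the model."""
    f = features(genome, start)
    return model.log_score(f, 1) - model.log_score(f, 0)

def resolve(genome, genes, model):
    """Keep agreed starts; for ambiguous genes pick the conflicting
    predicted start with the highest classifier log-odds."""
    choices = {}
    for stop_pos, preds in genes:
        starts = sorted(set(preds.values()))
        choices[stop_pos] = max(starts, key=lambda s: start_margin(genome, model, s))
    return choices
```

For example, with two consistent genes and one gene on which two predictors disagree, `model = NaiveBayes().fit(*build_training_set(genome, genes))` followed by `resolve(genome, genes, model)` returns one chosen start per stop codon. Restricting the choice to starts that some predictor proposed mirrors the abstract's framing of resolving conflicts among existing predictions; scoring every in-frame candidate would be a different design.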

Keywords: gene location prediction; start codon locations; Bayesian classification; meta-predictor; prokaryote; genomics; E. coli; EcoGene; data mining; prokaryotic organisms.

DOI: 10.1504/IJDMMM.2013.055864

International Journal of Data Mining, Modelling and Management, 2013 Vol.5 No.3, pp.246 - 260

Published online: 29 Jul 2014
