Title: Discriminating TATA box from putative TATA boxes in plant genome

Authors: Raja Loganantharaj

Addresses: The Center for Computer Science, University of Louisiana, P.O. Box 44330, Lafayette, LA 70504, USA

Abstract: The TATA box has been used successfully to identify a transcription start site (TSS) and thereby a promoter. Unfortunately, many substrings fit the profile of a TATA box; such substrings are called putative TATA boxes. We have applied linear and nonlinear classifiers to discriminate true TATA boxes from putative TATA boxes and have compared their performance. We have also investigated how the length of the pair of sequences flanking a putative TATA box influences prediction accuracy. The techniques presented in this paper are general enough to be applicable to other domains or to other genomes.
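The abstract outlines a three-step pipeline: scan a sequence for substrings that fit a TATA-box profile (the putative TATA boxes), cut out each candidate together with a pair of flanking sequences of some length, and classify each window as a true or false TATA box. The sketch below illustrates that pipeline under loudly stated assumptions. The profile is the commonly cited consensus TATAWAWR (W = A/T, R = A/G), which may not match the profile used in the paper. The classifier is a positional Naive Bayes model with Laplace smoothing, a stand-in for the Naive Bayes classifier named in the keywords; the feature encoding, the function names (`find_putative_tata`, `flanked_window`) and the class `PositionalNaiveBayes` are all hypothetical. The neural-network classifiers the paper also evaluates are not shown.

```python
import math
import re
from collections import defaultdict

# ASSUMPTION: the TATA-box profile is the consensus TATAWAWR (W = A/T, R = A/G).
# The lookahead lets finditer report overlapping matches.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT][AG]))")
MOTIF_LEN = 8
ALPHABET = "ACGT"


def find_putative_tata(seq):
    """Return start positions of every (possibly overlapping) profile match."""
    return [m.start() for m in TATA_PATTERN.finditer(seq.upper())]


def flanked_window(seq, start, flank):
    """Return the motif plus `flank` bases on each side, or None if the
    window would run past either end of the sequence."""
    lo, hi = start - flank, start + MOTIF_LEN + flank
    if lo < 0 or hi > len(seq):
        return None
    return seq[lo:hi].upper()


class PositionalNaiveBayes:
    """Hypothetical Naive Bayes classifier: each window position contributes an
    independent nucleotide likelihood; all windows must have equal length."""

    def fit(self, windows, labels):
        self.classes = sorted(set(labels))
        self.totals = defaultdict(int)
        # counts[c][i][base] = how often `base` occurs at position i in class c
        self.counts = {c: defaultdict(lambda: defaultdict(int)) for c in self.classes}
        for window, label in zip(windows, labels):
            self.totals[label] += 1
            for i, base in enumerate(window):
                self.counts[label][i][base] += 1
        n = len(labels)
        self.log_prior = {c: math.log(self.totals[c] / n) for c in self.classes}
        return self

    def log_posterior(self, window, c):
        """Unnormalised log P(c | window) = log P(c) + sum_i log P(base_i | c)."""
        score = self.log_prior[c]
        for i, base in enumerate(window):
            # Laplace smoothing: add one pseudo-count per nucleotide.
            p = (self.counts[c][i][base] + 1) / (self.totals[c] + len(ALPHABET))
            score += math.log(p)
        return score

    def predict(self, window):
        return max(self.classes, key=lambda c: self.log_posterior(window, c))
```

To mirror the paper's question about flanking length, one could rerun `flanked_window` with several values of `flank`, retrain the classifier on each resulting set of windows, and compare held-out accuracy. Longer flanks give the model more context but also discard candidates that sit close to a sequence end, because those windows return `None`.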

Keywords: promoter detection; binding sites; transcription start site; linear classifiers; nonlinear classifiers; Naive Bayes algorithm; artificial neural networks; bioinformatics; plant genome; putative TATA boxes; prediction accuracy; promoter prediction; DNA sequences.

DOI: 10.1504/IJBRA.2006.009192

International Journal of Bioinformatics Research and Applications, 2006 Vol.2 No.1, pp. 36–51

Published online: 09 Mar 2006