Int. J. of Data Mining and Bioinformatics   »   2009 Vol.3, No.2

 

 

Title: Computational identification of protein-coding sequences by comparative analysis

 

Authors: Arnaud Fontaine, Helene Touzet

 

Addresses:
LIFL ‐ UMR CNRS 8022 University Lille 1, INRIA Lille Nord Europe, France.

 

Abstract: Gene prediction is an essential step in understanding the genome of a species once it has been sequenced. Comparative genomics is a promising direction in current gene-finding research. In this paper, we present a novel approach to identifying evolutionarily conserved protein-coding sequences in genomes. The method exploits the specific substitution pattern of coding sequences together with the consistency of reading frames. It has been implemented in a software tool called PROTEA. Large-scale experimentation shows good results. PROTEA is intended as a useful complement to existing tools based on homology search or on statistical properties of the sequences.

 

Keywords: comparative genomics; exon prediction; gene prediction; genome annotation; multiple sequence alignment; bioinformatics; data mining; protein coding sequences.

 

DOI: 10.1504/IJDMB.2009.024849

 

Int. J. of Data Mining and Bioinformatics, 2009 Vol.3, No.2, pp.160 - 176

 

Available online: 01 May 2009

 

 
