Title: A software system for gene sequence database construction based on fast approximate string matching

Authors: Zheng Liu, James Borneman, Tao Jiang

Addresses: Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA (all authors)

Abstract: We propose a web-based software system for sequence acquisition and database construction. An example application of this system is the construction of a ribosomal RNA gene (rDNA) sequence database to facilitate the study of microbial communities. A fast and accurate approximate string matching algorithm is implemented to fetch from GenBank the rDNA sequences sandwiched between two given primers. A homology search algorithm based on the Basic Local Alignment Search Tool (BLAST) is then used to extract rDNA sequences that do not contain the primers. This two-step process yields an rDNA sequence database for a specific taxonomic group. When performing string matching, we take into account the distance between the occurrences of the two given primers, mismatches, and primer degeneracy. In the homology search, a chaining algorithm is combined with BLAST to obtain global alignments from local alignments. This system can be used in many biological applications.
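
The primer-matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a brute-force scan that counts Hamming-style mismatches, treats degenerate primer positions using the standard IUPAC nucleotide codes, and keeps only primer pairs whose separation falls within a given range. The function names (`count_mismatches`, `find_primer`, `extract_between`) and the parameters (`max_mismatches`, `min_gap`, `max_gap`) are hypothetical, and both primers are assumed to be written in the same orientation as the sequence being searched.

```python
# Standard IUPAC nucleotide codes: each primer symbol maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, window):
    """Count positions where the sequence base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, window))


def find_primer(seq, primer, max_mismatches):
    """Return every start index where the primer matches within the mismatch budget."""
    k = len(primer)
    return [
        i for i in range(len(seq) - k + 1)
        if count_mismatches(primer, seq[i:i + k]) <= max_mismatches
    ]


def extract_between(seq, fwd, rev, max_mismatches=1, min_gap=0, max_gap=2000):
    """Return the subsequences sandwiched between a forward and a reverse primer hit.

    Assumption (hypothetical interface): `rev` is given in the same orientation
    as `seq`, and the gap is measured from the end of `fwd` to the start of `rev`.
    """
    hits = []
    rev_starts = find_primer(seq, rev, max_mismatches)
    for f in find_primer(seq, fwd, max_mismatches):
        inner_start = f + len(fwd)
        for r in rev_starts:
            if min_gap <= r - inner_start <= max_gap:
                hits.append(seq[inner_start:r])
    return hits


# Toy example: "R" in the forward primer matches either A or G.
seq = "TTACGTAGGGGCCCCTTGACAA"
print(extract_between(seq, "ACGTR", "TTGAC", max_mismatches=0))  # ['GGGGCCCC']
```

The scan costs time proportional to the sequence length times the primer length for each primer, which is adequate for illustration; a production system searching GenBank would need a faster matching method, as the abstract indicates.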

Keywords: sequence acquisition; oligonucleotide fingerprinting; approximate string matching; homology search; bioinformatics; gene sequence databases; rDNA gene sequences; internet; microbial communities; ribosomal RNA genes; database construction.

DOI: 10.1504/IJBRA.2005.007906

International Journal of Bioinformatics Research and Applications, 2005, Vol. 1, No. 3, pp. 273-291

Published online: 30 Sep 2005
