Title: The generalised k-Truncated Suffix Tree for time-and space-efficient searches in multiple DNA or protein sequences

Authors: Marcel H. Schulz, Sebastian Bauer, Peter N. Robinson

Addresses: Institute fur Medizinische Genetik, Charite Universitatsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany. ' Institute fur Medizinische Genetik, Charite Universitatsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany. ' Institute fur Medizinische Genetik, Charite Universitatsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany

Abstract: Efficient searching for specific subsequences in a set of longer sequences is an important component of many bioinformatics algorithms. Generalised suffix trees and suffix arrays allow searches for a pattern of length n in time proportional to n independent of the length of the sequences, and are thus attractive for a variety of applications. Here, we present an algorithm termed the generalised k-Truncated Suffix Tree (kTST), that represents an adaption of Ukkonen|s linear-time suffix tree construction algorithm. The kTST algorithm creates a k-deep tree in linear time that allows rapid searches for short patterns of length of up to k characters. The kTST can offer advantages in computational time and memory usage for searches for short sequences in DNA or protein sequences compared to other suffix-based algorithms.

Keywords: suffix trees; biological sequence analysis; suffix array; bioinformatics; multiple DNA sequences; protein sequences.

DOI: 10.1504/IJBRA.2008.017165

International Journal of Bioinformatics Research and Applications, 2008 Vol.4 No.1, pp.81 - 95

Published online: 17 Feb 2008 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article