Authors: Ken D. Nguyen, Yi Pan
Addresses: Department of Computer Science, Georgia State University, 34 Peachtree Street, Suite 1450, Atlanta, GA 30303-3994, USA; Department of Computer Science, Georgia State University, 34 Peachtree Street, Suite 1450, Atlanta, GA 30303-3994, USA
Abstract: Aligning multiple DNA, RNA, or protein sequences to identify common functionalities, structures, or relationships between species is a fundamental task in bioinformatics. In this study, we propose a new multiple sequence alignment strategy that extracts sequence information, together with global and local sequence similarities, to assign a different weight to each input sequence. A weighted pair-wise distance matrix is calculated from these sequences to build a dynamic alignment guiding tree. The tree can reorder its higher-level branches based on the corresponding alignment results from lower tree levels to guarantee the highest alignment score at each level of the tree. On many of the benchmarks tested, this technique improves alignment accuracy by up to 10% over multiple sequence alignment tools such as CLUSTALW (Thompson et al., 1994), DIALIGN (Morgenstern, 1999), T-COFFEE (Notredame et al., 2000), MUSCLE (Edgar, 2004), and PROBCONS (Do et al., 2005).
Keywords: multiple sequence alignment; sequence information; weighted phylogeny dendrogram; dynamic alignment guiding tree; bioinformatics; RNA protein sequences; DNA protein sequences.
International Journal of Bioinformatics Research and Applications, 2011 Vol.7 No.2, pp.168 - 182
Received: 08 Oct 2010
Accepted: 27 Oct 2010
Published online: 13 May 2011