Title: A novel approach to Multiple Sequence Alignment using hadoop data grids
Authors: G. Sudha Sadasivam, G. Baktavatchalam
Addresses: Department of Computer Science and Engineering, PSG College of Technology, Coimbatore 641004, Tamil Nadu, India; Department of Computer Science and Engineering, PSG College of Technology, Coimbatore 641004, Tamil Nadu, India
Abstract: Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The two factors to consider when aligning multiple sequences are the speed and the accuracy of the alignment. Although dynamic programming algorithms produce accurate alignments, they are computationally intensive. In this paper we propose a time-efficient approach to sequence alignment that also produces high-quality alignments. The dynamic nature of the algorithm, coupled with the data and computational parallelism of Hadoop data grids, improves the accuracy and speed of sequence alignment. The principle of block splitting in Hadoop, coupled with its scalability, facilitates alignment of very large sequences.
Keywords: DNA sequences; global alignment; Hadoop data grids; Needleman–Wunsch; multiple sequence alignment; protein sequences; block splitting; scalability; bioinformatics.
DOI: 10.1504/IJBRA.2010.037987
International Journal of Bioinformatics Research and Applications, 2010 Vol.6 No.5, pp.472 - 483
Published online: 07 Jan 2011
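
For readers unfamiliar with the Needleman–Wunsch global alignment named in the keywords, the following is a minimal pairwise sketch in Python. It is not the authors' implementation and does not involve Hadoop or block splitting; the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen only for illustration. The nested loops fill a table of size (len(a)+1) x (len(b)+1), which gives a sense of why the abstract describes dynamic programming as accurate but computationally intensive.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: O(len(a) * len(b)) time and memory
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("GATTACA", "GCATGCU")
    print(s)
    print(x)
    print(y)
```

The two returned strings always have equal length, and removing the `-` characters from each recovers the original inputs. A multiple sequence alignment built on this step would need further machinery (for example, progressive pairwise merging), which this sketch does not attempt.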