Title: A novel approach to Multiple Sequence Alignment using Hadoop data grids

Authors: G. Sudha Sadasivam, G. Baktavatchalam

Address (both authors): Department of Computer Science and Engineering, PSG College of Technology, Coimbatore 641004, Tamil Nadu, India

Abstract: Multiple alignment of protein sequences helps to determine evolutionary relationships and to predict molecular structures. The main factors to consider when aligning multiple sequences are the speed and the accuracy of the alignment. Although dynamic programming algorithms produce accurate alignments, they are computationally intensive. In this paper we propose a time-efficient approach to sequence alignment that also produces high-quality alignments. The dynamic nature of the algorithm, coupled with the data and computational parallelism of Hadoop data grids, improves both the accuracy and the speed of sequence alignment. The block-splitting principle in Hadoop, together with its scalability, facilitates the alignment of very large sequences.
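The keywords name Needleman–Wunsch as the global alignment method behind the dynamic programming the abstract refers to. As an illustration only, here is a minimal Python sketch of the pairwise Needleman–Wunsch dynamic program. The linear gap penalty and the match/mismatch/gap scores (+1/-1/-1) are assumptions chosen for simplicity, not parameters reported in the paper. The sketch also leaves out the Hadoop distribution and the multiple-alignment steps entirely.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Substitution score for aligning character x against character y.
        return match if x == y else mismatch

    # score[i][j] = best score for aligning prefix a[:i] with prefix b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps

    # Fill the O(n*m) table: each cell depends on its diagonal, upper and left neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap     # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap   # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("AC", "A")` returns `(0, "AC", "A-")`. The full n-by-m score table is why such exact methods are computationally intensive, which is the cost the abstract cites as motivation for distributing the work.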

Keywords: DNA sequences; global alignment; Hadoop data grids; Needleman–Wunsch; multiple sequence alignment; protein sequences; block splitting; scalability; bioinformatics.

DOI: 10.1504/IJBRA.2010.037987

International Journal of Bioinformatics Research and Applications, 2010, Vol. 6, No. 5, pp. 472–483

Published online: 07 Jan 2011
