Title: PMS6: a fast algorithm for motif discovery

Authors: Shibdas Bandyopadhyay; Sartaj Sahni; Sanguthevar Rajasekaran

Addresses: Department of CISE, University of Florida, Gainesville, FL 32611, USA ' Department of CISE, University of Florida, Gainesville, FL 32611, USA ' Department of CSE, University of Connecticut, Storrs, CT 06269, USA

Abstract: We propose a new algorithm, PMS6, for the (l,d)-motif discovery problem in which we are to find all strings of length l that appear in every string of a given set of strings with at most d mismatches. The run time ratio PMS5/PMS6, where PMS5 is the fastest previously known algorithm for motif discovery in large instances, ranges from a high of 2.20 for the (21,8) challenge instances to a low of 1.69 for the (17,6) challenge instances. Both PMS5 and PMS6 require some amount of pre-processing. The pre-processing time for PMS6 is 34 times faster than that for PMS5 for (23,9) instances. When pre-processing time is factored in, the run time ratio PMS5/PMS6 is as high as 2.75 for (13,4) instances and as low as 1.95 for (17,6) instances.

Keywords: PMS6; motif search; planted motifs; approximate string search; bioinformatics; motif discovery.

DOI: 10.1504/IJBRA.2014.062990

International Journal of Bioinformatics Research and Applications, 2014 Vol.10 No.4/5, pp.369 - 383

Published online: 24 Oct 2014 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article