Title: An efficient motif discovery algorithm with unknown motif length and number of binding sites

Authors: Henry C.M. Leung, Francis Y.L. Chin

Addresses: Department of Computer Science, The University of Hong Kong, Hong Kong, China; Department of Computer Science, The University of Hong Kong, Hong Kong, China

Abstract: Most algorithms for discovering motifs in DNA sequences require the motif's length as input. Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP), in which the motif's length is not an input parameter. Unfortunately, their algorithm takes an unacceptably long time to run, e.g. over 3 months to discover a length-14 motif. Since the best motif may be neither the longest nor the one with the largest number of binding sites, in this paper we eliminate a further input parameter, the minimum number of binding sites, in order to provide more realistic and robust results. We also develop an efficient algorithm to solve both EMP and this redefined problem.

Keywords: motif discovery; transcription factors; binding sites; consensus pattern; DNA sequences; gene regulatory networks; bioinformatics; extended motif problem; motif length; gene expression data.

DOI: 10.1504/IJDMB.2006.010856

International Journal of Data Mining and Bioinformatics, 2006 Vol.1 No.2, pp.201-215

Published online: 07 Sep 2006
