Authors: Peng Xiao; Soumitra Pal; Sanguthevar Rajasekaran
Addresses: Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road, Storrs, CT 06269, USA; Algorithmic Methods in Computational and Systems Biology Section, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA; Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road, Storrs, CT 06269, USA
Abstract: Motifs are crucial patterns in biological sequences that have numerous applications. Motif search is an important step in obtaining meaningful patterns from biological data. However, most existing algorithms are deterministic, and the role of randomisation in this area remains unexploited. This paper focuses on the (l,d)-motif model, also known as Planted Motif Search (PMS), and proposes an efficient randomised algorithm, named qPMS10, to solve PMS. We utilise qPMS9, the most efficient PMS solver to date, as a subroutine. We analyse the time complexity of both algorithms and compare the performance of qPMS10 with that of qPMS9 on standard benchmark datasets. In addition, we offer a parallel implementation of qPMS10 and run tests using up to four processors. Both theoretical and empirical analyses demonstrate that our randomised algorithm outperforms the existing algorithms for solving PMS.
Keywords: quorum planted motif search; random sample; parallelism; Chernoff bounds; Hamming distance.
International Journal of Data Mining and Bioinformatics, 2017 Vol.18 No.2, pp.105 - 124
Received: 24 Apr 2017
Accepted: 03 May 2017
Published online: 29 Aug 2017