Title: TrieAMD: a scalable and efficient apriori motif discovery approach

Authors: Isra Al-Turaiki; Ghada Badr; Hassan Mathkour

Addresses: College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia ' College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia; IRI - The City of Scientific Research and Technological Applications, University and Research District, P.O. 21934 New Borg Alarab, Alexandria, Egypt ' College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia

Abstract: Motif discovery is the problem of finding recurring patterns in biological sequences. It is one of the hardest and long-standing problems in bioinformatics. Apriori is a well-known data-mining algorithm for the discovery of frequent patterns in large datasets. In this paper, we apply the Apriori algorithm and use the Trie data structure to discover motifs. We propose several modifications so that we can adapt the classic Apriori to our problem. Experiments are conducted on Tompa's benchmark to investigate the performance of our proposed algorithm, the Trie-based Apriori Motif Discovery (TrieAMD). Results show that our algorithm outperforms all of the tested tools on real datasets for the average sensitivity measure, which means that our approach is able to discover more motifs. In terms of specificity, the performance of our algorithm is comparable to the other tools. The results also confirm both linear time and linear space scalability of the algorithm.

Keywords: bioinformatics; apriori motif discovery; data mining; scalability; Trie; transcription factor binding sites; recurring patterns; biological sequences; suffix tree; data structures.

DOI: 10.1504/IJDMB.2015.070833

International Journal of Data Mining and Bioinformatics, 2015 Vol.13 No.1, pp.13 - 30

Received: 21 Dec 2012
Accepted: 19 Oct 2013

Published online: 30 Jul 2015 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article