Title: The Apriori property of sequence pattern mining with wildcard gaps
Authors: Fan Min; Youxi Wu; Xindong Wu
Addresses: Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China; School of Computer Science, Hebei University of Technology, Tianjin 300130, China; Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA
Abstract: In biological sequence analysis, long and frequently occurring patterns tend to be interesting. Data miners try to obtain frequent patterns with periodic wildcard gaps. However, under the existing set of definitions, the Apriori property does not hold; consequently, state-of-the-art algorithms are rather complex. This paper proposes an alternative definition of the number of offset sequences, obtained by adding a number of dummy characters. With the new definition, the Apriori property holds, so our Apriori algorithm can mine all frequent patterns with little effort. This study also serves as a foundation for further research on sequence pattern mining.
Keywords: sequence pattern mining; wildcard gap; frequency; apriori; biological sequence analysis; offset sequences; dummy characters.
DOI: 10.1504/IJFIPM.2012.050418
International Journal of Functional Informatics and Personalised Medicine, 2012 Vol.4 No.1, pp.15 - 31
Received: 23 May 2011
Accepted: 12 Jun 2011
Published online: 20 Nov 2012
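
The abstract describes the idea only in words, so the following is a minimal illustrative sketch and not the authors' algorithm. It assumes (this is not stated in the abstract) that each gap between consecutive pattern characters may hold between min_gap and max_gap wildcards, and that the number of offset sequences for a pattern of length l in a sequence of length L is L * W^(l-1), where W = max_gap - min_gap + 1. Under that assumed denominator, extending a pattern by one character can never increase its frequency ratio, which is what permits level-wise Apriori pruning. The names support, ratio and mine are hypothetical.

```python
def support(seq, pattern, min_gap, max_gap):
    """Count occurrences of pattern in seq, where each gap between
    consecutive pattern characters holds min_gap..max_gap wildcards."""
    # counts[i] = number of partial matches whose last character is at i
    counts = [1 if c == pattern[0] else 0 for c in seq]
    for ch in pattern[1:]:
        nxt = [0] * len(seq)
        for i, c in enumerate(seq):
            if c != ch:
                continue
            for gap in range(min_gap, max_gap + 1):
                j = i - gap - 1  # position of the previous pattern character
                if j >= 0:
                    nxt[i] += counts[j]
        counts = nxt
    return sum(counts)


def ratio(seq, pattern, min_gap, max_gap):
    """Frequency ratio under the ASSUMED denominator L * W**(l - 1)."""
    width = max_gap - min_gap + 1
    offsets = len(seq) * width ** (len(pattern) - 1)
    return support(seq, pattern, min_gap, max_gap) / offsets


def mine(seq, min_gap, max_gap, threshold):
    """Level-wise mining: only frequent patterns are extended."""
    if threshold <= 0:
        raise ValueError("threshold must be positive so the search terminates")
    alphabet = sorted(set(seq))
    frequent = []
    level = [c for c in alphabet
             if ratio(seq, c, min_gap, max_gap) >= threshold]
    while level:
        frequent.extend(level)
        # Apriori pruning: infrequent patterns are never extended
        candidates = [p + c for p in level for c in alphabet]
        level = [p for p in candidates
                 if ratio(seq, p, min_gap, max_gap) >= threshold]
    return frequent
```

For example, mine("AAAA", 0, 0, 0.5) extends "A" to "AA" and "AAA" and stops at "AAAA", whose ratio of 1/4 falls below the threshold.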