Title: The Apriori property of sequence pattern mining with wildcard gaps

Authors: Fan Min; Youxi Wu; Xindong Wu

Addresses: Lab of Granular Computing, Zhangzhou Normal University, Zhangzhou 363000, China; School of Computer Science, Hebei University of Technology, Tianjin 300130, China; Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA

Abstract: In biological sequence analysis, long and frequently occurring patterns tend to be interesting. Data miners try to obtain frequent patterns with periodical wildcard gaps. However, with the existing definition set, the Apriori property does not hold; consequently, state-of-the-art algorithms are rather complex. This paper proposes an alternative definition of the number of offset sequences by adding a number of dummy characters. With the new definition, the Apriori property holds, hence our Apriori algorithm can mine all frequent patterns with minimal effort. This study also serves as a foundation for further research on sequence pattern mining.
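
Illustrative sketch (not taken from the paper): the abstract does not give the formal definitions, so the Python below assumes one plausible reading of them. A pattern occurrence is taken to be a tuple of positions whose consecutive gaps each lie in [min_gap, max_gap], and the number of offset sequences for a pattern of length l is assumed to be L * W^(l-1), where L is the sequence length and W = max_gap - min_gap + 1 (as if the sequence were padded with dummy characters so every start position admits every gap choice). Under that assumption, extending a pattern by one character can multiply its support by at most W, so its frequency never rises and level-wise Apriori pruning is safe. The names support, frequency and mine are hypothetical.

```python
def support(seq, pattern, min_gap, max_gap):
    """Count position tuples matching `pattern` in `seq`, where the number of
    wildcard characters between consecutive matched positions is within
    [min_gap, max_gap]."""
    # ways[i] = number of partial matches of the pattern prefix ending at index i
    ways = [1 if c == pattern[0] else 0 for c in seq]
    for ch in pattern[1:]:
        nxt = [0] * len(seq)
        for i, c in enumerate(seq):
            if c != ch:
                continue
            for gap in range(min_gap, max_gap + 1):
                j = i - gap - 1  # index of the previously matched character
                if j >= 0:
                    nxt[i] += ways[j]
        ways = nxt
    return sum(ways)


def frequency(seq, pattern, min_gap, max_gap):
    """Support divided by the ASSUMED number of offset sequences L * W^(l-1)."""
    width = max_gap - min_gap + 1
    offsets = len(seq) * width ** (len(pattern) - 1)
    return support(seq, pattern, min_gap, max_gap) / offsets


def mine(seq, min_gap, max_gap, threshold):
    """Level-wise (Apriori-style) mining of all patterns whose frequency
    reaches `threshold`, relying on the assumed anti-monotone frequency."""
    frequent = []
    level = [c for c in sorted(set(seq))
             if frequency(seq, c, min_gap, max_gap) >= threshold]
    while level:
        frequent.extend(level)
        # Join step: a and b overlap on all but a's first and b's last character.
        candidates = {a + b[-1] for a in level for b in level if a[1:] == b[:-1]}
        level = sorted(p for p in candidates
                       if frequency(seq, p, min_gap, max_gap) >= threshold)
    return frequent


if __name__ == "__main__":
    print(mine("AAAA", 0, 1, 0.5))
```

Because only patterns whose prefix and suffix are both frequent are ever generated as candidates, the search never needs to look past a level that produced no frequent patterns.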

Keywords: sequence pattern mining; wildcard gap; frequency; apriori; biological sequence analysis; offset sequences; dummy characters.

DOI: 10.1504/IJFIPM.2012.050418

International Journal of Functional Informatics and Personalised Medicine, 2012 Vol.4 No.1, pp.15 - 31

Received: 23 May 2011
Accepted: 12 Jun 2011

Published online: 20 Nov 2012
