Title: A fast and parallel algorithm for frequent pattern mining from big data in many-task environments
Authors: Wei-Tee Lin; Chih-Ping Chu
Addresses: Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan ' Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan
Abstract: Many studies have tried to efficiently discover frequent patterns in large databases. The algorithms used in these studies fall into two main categories: apriori algorithms and frequent pattern growth (FP-growth) algorithms. Apriori algorithms operate according to a generate-and-test approach, so performance suffers from the testing of too many candidate itemsets. Therefore, most recent studies have applied an FP-growth approach to the discovery of frequent patterns. The rapid growth of data, however, has introduced new challenges for the mining of frequent patterns, in terms of both execution efficiency and scalability. Big data often contains a large number of items, a large number of transactions and long average transaction length, which result in large FP-trees. In addition to its dependence on data characteristics, FP-tree size is also sensitive to the minimum support threshold. This is because the small support is probable to bring many branches for nodes, greatly enlarging the FP-tree and the number of reconstructed conditional pattern-based trees. In this paper, we propose a novel algorithm and architecture for efficiently mining frequent patterns from big data in distributed many-task computing environments. Through empirical evaluation of various simulation conditions, we show that the proposed method delivers excellent execution time.
Keywords: data mining; big data; many-task computing; frequent patterns mining.
DOI: 10.1504/IJHPCN.2017.084244
International Journal of High Performance Computing and Networking, 2017 Vol.10 No.3, pp.157 - 167
Received: 05 Aug 2013
Accepted: 03 Jun 2014
Published online: 22 May 2017 *