Article: A fast and parallel algorithm for frequent pattern mining from big data in many-task environments Journal: International Journal of High Performance Computing and Networking (IJHPCN) 2017 Vol.10 No.3 pp.157 - 167 Abstract: Many studies have tried to efficiently discover frequent patterns in large databases. The algorithms used in these studies fall into two main categories: apriori algorithms and frequent pattern growth (FP-growth) algorithms. Apriori algorithms operate according to a generate-and-test approach, so performance suffers from the testing of too many candidate itemsets. Therefore, most recent studies have applied an FP-growth approach to the discovery of frequent patterns. The rapid growth of data, however, has introduced new challenges for the mining of frequent patterns, in terms of both execution efficiency and scalability. Big data often contains a large number of items, a large number of transactions and long average transaction length, which result in large FP-trees. In addition to its dependence on data characteristics, FP-tree size is also sensitive to the minimum support threshold. This is because the small support is probable to bring many branches for nodes, greatly enlarging the FP-tree and the number of reconstructed conditional pattern-based trees. In this paper, we propose a novel algorithm and architecture for efficiently mining frequent patterns from big data in distributed many-task computing environments. Through empirical evaluation of various simulation conditions, we show that the proposed method delivers excellent execution time. Inderscience Publishers - linking academia, business and industry through research

Title: A fast and parallel algorithm for frequent pattern mining from big data in many-task environments

Authors: Wei-Tee Lin; Chih-Ping Chu

Addresses: Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan ' Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan

Abstract: Many studies have tried to efficiently discover frequent patterns in large databases. The algorithms used in these studies fall into two main categories: apriori algorithms and frequent pattern growth (FP-growth) algorithms. Apriori algorithms operate according to a generate-and-test approach, so performance suffers from the testing of too many candidate itemsets. Therefore, most recent studies have applied an FP-growth approach to the discovery of frequent patterns. The rapid growth of data, however, has introduced new challenges for the mining of frequent patterns, in terms of both execution efficiency and scalability. Big data often contains a large number of items, a large number of transactions and long average transaction length, which result in large FP-trees. In addition to its dependence on data characteristics, FP-tree size is also sensitive to the minimum support threshold. This is because the small support is probable to bring many branches for nodes, greatly enlarging the FP-tree and the number of reconstructed conditional pattern-based trees. In this paper, we propose a novel algorithm and architecture for efficiently mining frequent patterns from big data in distributed many-task computing environments. Through empirical evaluation of various simulation conditions, we show that the proposed method delivers excellent execution time.

Keywords: data mining; big data; many-task computing; frequent patterns mining.

DOI: 10.1504/IJHPCN.2017.084244

International Journal of High Performance Computing and Networking, 2017 Vol.10 No.3, pp.157 - 167

Received: 05 Aug 2013
Accepted: 03 Jun 2014
Published online: 22 May 2017 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article

Title: A fast and parallel algorithm for frequent pattern mining from big data in many-task environments

Keep up-to-date