Title: A proposed hybrid algorithm for mining frequent patterns on Spark
Authors: Wael Mohamed; Manal A. Abdel-Fattah
Addresses: Information Systems Department, Faculty of Computers and Information, Helwan University, Helwan, Egypt; Information Systems Department, Faculty of Computers and Information, Helwan University, Helwan, Egypt
Abstract: Frequent itemset mining is one of the most important data mining techniques; it is applied to discover frequent itemsets, interesting information, and correlations in data. Many algorithms, such as Apriori, FP-growth and Eclat, have been adapted and implemented to handle big data on big data processing engines such as MapReduce and Spark. However, the existing implementations have limitations. Consequently, this paper proposes a hybrid algorithm for mining frequent patterns from sparse big datasets on the Spark platform. The proposed hybrid algorithm uses Apriori for the first few levels and then switches to Eclat for the remaining levels. It consists of four phases. Experiments were conducted to test the performance of the proposed algorithm, and its elapsed time was compared with that of parallel FP-growth, YAFIM and Eclat-Spark. The proposed algorithm outperforms YAFIM, Eclat-Spark and parallel FP-growth at high minimum support thresholds.
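The abstract does not detail the four phases, so the following is only a minimal single-machine sketch of the general idea of running Apriori for the first levels and then switching to Eclat; it is not the authors' Spark implementation. It assumes transactions given as collections of comparable items (e.g., strings), an absolute support count `min_count`, and a caller-chosen `switch_level`; the names `hybrid_mine`, `generate_candidates` and `eclat` are hypothetical. On Spark, the per-level support counting and the tidset construction would be distributed, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import combinations


def generate_candidates(prev_level, k):
    """Apriori join + prune: k-itemsets whose every (k-1)-subset is frequent."""
    prev = list(prev_level)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k and all(
                frozenset(sub) in prev_level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def eclat(prefix_class, min_count, out):
    """Depth-first Eclat over (sorted itemset tuple, tidset) pairs sharing a prefix."""
    for i, (a, tids_a) in enumerate(prefix_class):
        new_class = []
        for b, tids_b in prefix_class[i + 1:]:
            tids = tids_a & tids_b  # support = size of the tidset intersection
            if len(tids) >= min_count:
                itemset = a + (b[-1],)
                out[frozenset(itemset)] = len(tids)
                new_class.append((itemset, tids))
        if new_class:
            eclat(new_class, min_count, out)


def hybrid_mine(transactions, min_count, switch_level=2):
    """Apriori for levels 1..switch_level, then Eclat for the remaining levels.

    switch_level is a hypothetical, caller-chosen parameter for this sketch.
    Returns a dict mapping each frequent itemset (frozenset) to its support count.
    """
    transactions = [set(t) for t in transactions]
    frequent = {}
    # Apriori, level 1: one horizontal scan counting single items.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    level = {s: n for s, n in counts.items() if n >= min_count}
    frequent.update(level)
    k = 1
    # Apriori, levels 2..switch_level: candidate generation + one scan per level.
    while level and k < switch_level:
        k += 1
        candidates = generate_candidates(set(level), k)
        counts = defaultdict(int)
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        level = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(level)
    if not level:
        return frequent
    # Switch: build the vertical layout (item -> set of transaction ids).
    item_tids = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in t:
            item_tids[item].add(tid)
    # Group the last Apriori level into prefix classes and continue with Eclat.
    classes = defaultdict(list)
    for itemset in level:
        items = tuple(sorted(itemset))
        tids = set.intersection(*(item_tids[i] for i in items))
        classes[items[:-1]].append((items, tids))
    for members in classes.values():
        members.sort(key=lambda pair: pair[0][-1])
        eclat(members, min_count, frequent)
    return frequent
```

For example, `hybrid_mine([{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"}], 2)` finds levels 1 and 2 by Apriori and the 3-itemsets by tidset intersection. The design choice the sketch illustrates is that Apriori rescans the horizontal data once per level, whereas after the switch, Eclat computes the support of deeper itemsets by intersecting tidsets without rescanning the transactions.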
Keywords: big data; frequent pattern mining; Eclat; Apriori; Spark.
DOI: 10.1504/IJBIDM.2022.120833
International Journal of Business Intelligence and Data Mining, 2022, Vol. 20, No. 2, pp. 146-169
Received: 17 May 2019
Accepted: 24 Feb 2020
Published online: 11 Feb 2022