Title: A proposed hybrid algorithm for mining frequent patterns on Spark

Authors: Wael Mohamed; Manal A. Abdel-Fattah

Addresses: Information Systems Department, Faculty of Computers and Information, Helwan University, Helwan, Egypt; Information Systems Department, Faculty of Computers and Information, Helwan University, Helwan, Egypt

Abstract: Frequent itemset mining is one of the most important data mining techniques, applied to discover frequent itemsets, interesting information, and correlations in data. Many algorithms, such as Apriori, FP-growth and Eclat, have been adapted and implemented to deal with big data on processing engines such as MapReduce and Spark. However, the existing implementations have limitations. Consequently, this paper proposes a hybrid algorithm to mine frequent patterns in sparse big datasets on the Spark platform. The proposed hybrid algorithm uses Apriori for the first few levels, then switches to Eclat for the remaining levels, and consists of four phases. Experiments are conducted to test the performance of the proposed algorithm, and its elapsed time is compared with that of parallel FP-growth, YAFIM and Eclat-Spark. The proposed algorithm outperforms YAFIM, Eclat, and FP-growth at high minimum support values.
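
The Apriori-then-Eclat switch described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it does not use Spark, it does not reproduce the paper's four phases (which the abstract does not detail), and the function name `hybrid_mine` and the parameter `switch_level` are hypothetical names introduced here for illustration only. The sketch counts candidates level by level over the horizontal transactions (Apriori style) up to `switch_level`, then builds transaction-id sets and continues by intersecting them (Eclat style).

```python
from itertools import combinations


def hybrid_mine(transactions, min_support, switch_level=2):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Illustrative sketch only: `switch_level` (an assumed parameter) is the
    itemset size at which counting switches from Apriori to Eclat.
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}

    # Apriori part: count single items by scanning the transactions.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    k = 1

    # Apriori part: generate (k+1)-candidates, prune, and rescan the data.
    while level and k < switch_level:
        frequent.update(level)
        candidates = set()
        for a, b in combinations(level, 2):
            u = a | b
            # Keep u only if every k-subset is itself frequent.
            if len(u) == k + 1 and all(
                frozenset(s) in level for s in combinations(u, k)
            ):
                candidates.add(u)
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        k += 1
    frequent.update(level)

    # Switch: build a tidset (set of transaction ids) per frequent k-itemset.
    tidsets = {
        s: {i for i, t in enumerate(transactions) if s <= t} for s in level
    }

    # Eclat part: support of a union is the size of the tidset intersection.
    while tidsets:
        next_tidsets = {}
        for a, b in combinations(tidsets, 2):
            u = a | b
            if len(u) == k + 1 and u not in next_tidsets:
                tids = tidsets[a] & tidsets[b]
                if len(tids) >= min_support:
                    next_tidsets[u] = tids
        for s, tids in next_tidsets.items():
            frequent[s] = len(tids)
        tidsets = next_tidsets
        k += 1
    return frequent
```

In this sketch the choice of `switch_level` changes only where the work is done, not the result: the same frequent itemsets are returned whether the switch happens at level 1, 2 or 3.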

Keywords: big data; frequent pattern mining; Eclat; Apriori; Spark.

DOI: 10.1504/IJBIDM.2022.120833

International Journal of Business Intelligence and Data Mining, 2022 Vol.20 No.2, pp.146 - 169

Received: 17 May 2019
Accepted: 24 Feb 2020

Published online: 11 Feb 2022
