A proposed hybrid algorithm for mining frequent patterns on Spark
Online publication date: Fri, 11-Feb-2022
by Wael Mohamed; Manal A. Abdel-Fattah
International Journal of Business Intelligence and Data Mining (IJBIDM), Vol. 20, No. 2, 2022
Abstract: Frequent itemset mining is one of the most important data mining techniques, used to discover frequent itemsets, interesting information, and correlations in data. Many algorithms, such as Apriori, FP-growth and Eclat, have been adapted to deal with big data and implemented on big data processing engines such as MapReduce and Spark. However, the existing implementations have limitations. Consequently, this paper proposes a hybrid algorithm to mine frequent patterns in sparse big datasets on the Spark platform. The proposed hybrid algorithm uses Apriori for the first few levels, then switches to Eclat for the remaining levels. It consists of four phases. Experiments are conducted to test the performance of the proposed algorithm, and its elapsed time is compared with that of parallel FP-growth, YAFIM and Eclat-Spark. The proposed algorithm outperforms YAFIM, Eclat-Spark, and parallel FP-growth at high minimum support values.
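The abstract's central idea, Apriori-style counting for the first few levels followed by Eclat-style mining for the rest, can be illustrated with a small single-machine sketch. This is not the authors' implementation: it does not use Spark, it does not reproduce the paper's four phases (which the abstract does not describe), and the function name `hybrid_frequent_itemsets`, the `switch_level` parameter, and the use of an absolute transaction count for `min_support` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hybrid_frequent_itemsets(transactions, min_support, switch_level=2):
    """Return {frozenset(itemset): support_count}.

    Assumptions (not from the paper): min_support is an absolute
    transaction count, and switch_level is the last level mined
    Apriori-style before switching to Eclat-style intersections.
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}

    # Level 1: one horizontal scan counting single items.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent.update(level)

    # Apriori phase: join frequent (k-1)-itemsets into k-candidates,
    # prune by the downward-closure property, count by rescanning.
    k = 1
    while level and k < switch_level:
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(level)

    # Eclat phase: convert the last level to a vertical layout (tidsets)
    # and group itemsets into equivalence classes by shared prefix.
    classes = defaultdict(list)
    for itemset in level:
        key = tuple(sorted(itemset))
        tids = {i for i, t in enumerate(transactions) if itemset <= t}
        classes[key[:-1]].append((key, tids))

    def extend(members):
        # Depth-first extension: support comes from tidset intersection,
        # so no further scans of the transactions are needed.
        members.sort(key=lambda m: m[0])
        for i, (a, tids_a) in enumerate(members):
            children = []
            for b, tids_b in members[i + 1:]:
                tids = tids_a & tids_b
                if len(tids) >= min_support:
                    new = a + (b[-1],)
                    frequent[frozenset(new)] = len(tids)
                    children.append((new, tids))
            if children:
                extend(children)

    for members in classes.values():
        extend(members)
    return frequent


data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = hybrid_frequent_itemsets(data, min_support=2, switch_level=2)
```

In this sketch the switch point changes only how the work is done, not the result: any `switch_level` yields the same frequent itemsets. The choice matters for cost, since tidset intersections avoid repeated dataset scans once the candidate space has been narrowed by the early levels.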