Title: A node sets based fast and scalable frequent itemset algorithm for mining big data using map reduce paradigm

Authors: B. Sivaiah; R. Rajeswara Rao

Addresses: Department of CSE, Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh, India; Department of CSE, Jawaharlal Nehru Technological University, Gurajada, Andhra Pradesh, India

Abstract: Big data is growing rapidly, making traditional tools inefficient for handling large amounts of data. Existing algorithms for frequent itemset mining struggle with scalability due to limitations in parallel processing power. In this paper, we propose a fast and scalable frequent itemset mining (FSFIM) algorithm for generating frequent itemsets from huge data. It builds on preorder coding (POC) trees and the Nodeset data structure, which save half the memory compared with Node-lists and N-lists. FSFIM uses Cloudera's CDH MapReduce framework. Experimental results show that FSFIM outperforms state-of-the-art methods such as HBPFP, MLlib PFP, and BigFIM, with a maximum speedup of 1.85 when the minimum support is set to 1. Overall, FSFIM is faster and more scalable for mining frequent itemsets from big data.

Keywords: big data; frequent itemset mining; FIM; MapReduce paradigm; fast and scalable frequent itemset mining; FSFIM.

DOI: 10.1504/IJDMMM.2024.140540

International Journal of Data Mining, Modelling and Management, 2024 Vol.16 No.3, pp.326 - 343

Received: 27 Jul 2023
Accepted: 19 Nov 2023

Published online: 22 Aug 2024
