Authors: K.M. Padmapriya; B. Anandhi; M. Vijayakumar
Addresses: Department of Computer Science, SSM College of Arts and Science, Komarapalayam, Namakal (Dt), Tamil Nadu, India; Department of Computer Science, Vellalar College for Women, Erode, India; Department of Computer Applications, K.S.R. College of Engineering, Tiruchengode, Namakal (Dt), Tamil Nadu, India
Abstract: Big data clustering is a significant process employed in numerous application domains. Existing clustering algorithms do not cope well with large-scale data, resulting in a higher false positive rate. To cluster such large datasets with higher accuracy, a MapReduce gradient descent gentle AdaBoost clustering (MGDGAC) technique is proposed. The MGDGAC technique designs MapReduce fuzzy C-means (MFCM) clustering, in which the large dataset is first subdivided into a number of chunks that are processed in parallel on different nodes so that clustering completes in minimal time. With the help of the mappers, each data point is grouped into the cluster for which it has the larger membership value. The reducer in MFCM clustering then re-estimates the centroid values, which are iteratively fed back to the mappers until a particular iteration is reached, grouping similar data together. Finally, the MGDGAC technique applies gentle AdaBoost with the intention of reducing the training error of large data clustering.
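The mapper/reducer loop described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`memberships`, `mapper`, `reducer`, `mfcm`), the Euclidean distance, the fuzzifier `m = 2`, and the sequential loop standing in for parallel nodes are all assumptions made for illustration, and the gentle AdaBoost stage is not shown.

```python
def memberships(point, centroids, m=2.0):
    """Standard fuzzy C-means membership of one point in every cluster."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) ** 0.5
             for cen in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def mapper(chunk, centroids, m=2.0):
    """Per-chunk partial sums: for each cluster, sum(u^m * x) and sum(u^m)."""
    dim = len(centroids[0])
    num = [[0.0] * dim for _ in centroids]
    den = [0.0] * len(centroids)
    for point in chunk:
        for j, u in enumerate(memberships(point, centroids, m)):
            w = u ** m
            den[j] += w
            for d in range(dim):
                num[j][d] += w * point[d]
    return num, den


def reducer(partials, old_centroids):
    """Merge partial sums from all chunks and re-estimate each centroid."""
    new_centroids = []
    for j, old in enumerate(old_centroids):
        den = sum(p[1][j] for p in partials)
        if den == 0.0:
            new_centroids.append(tuple(old))  # no weight: keep old centroid
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(p[0][j][d] for p in partials) / den for d in range(dim))
        )
    return new_centroids


def mfcm(data, centroids, n_chunks=2, iterations=10, m=2.0):
    """Assumed driver: fixed number of map/reduce rounds, then hard labels."""
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    for _ in range(iterations):
        # On a real cluster each mapper call would run on a different node.
        partials = [mapper(chunk, centroids, m) for chunk in chunks]
        centroids = reducer(partials, centroids)
    labels = []
    for point in data:
        u = memberships(point, centroids, m)
        labels.append(u.index(max(u)))  # cluster with the largest membership
    return centroids, labels
```

Calling `mfcm(points, initial_centroids)` with a list of coordinate tuples returns the re-estimated centroids and, for each point, the index of the cluster in which it has the largest membership. Only the per-cluster partial sums travel from mappers to the reducer, which is what lets the chunks be processed independently.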
Keywords: big data clustering; MapReduce; gradient descent; gentle AdaBoost; fuzzy C-means.
International Journal of Business Intelligence and Data Mining, 2021 Vol.19 No.2, pp.170 - 188
Received: 26 Feb 2019
Accepted: 10 Jun 2019
Published online: 17 Aug 2021