Title: An improved content splitting and merging algorithm for Hadoop clusters using component analysis and hamming distance

Authors: Balraj Singh; Harsh Kumar Verma; Gulshan Kumar; Hye-jin Kim

Addresses: Department of Computer Science and Engineering, Dr. B.R. Ambedkar National Institute of Technology, Jalandhar, India; Department of Computer Science and Engineering, Dr. B.R. Ambedkar National Institute of Technology, Jalandhar, India; Department of Computer Science and Engineering, Lovely Professional University, Jalandhar, 144411, India; Business Administration Research Institute, Sungshin W. University, 2 Bomun-ro 34da gil, Seongbuk-gu, Seoul, South Korea

Abstract: Distributed storage and processing of big data sets have become an integral component of data science. As technology progresses towards the Internet of Things (IoT), big data becomes more important. Processing such data therefore demands careful attention to availability and accuracy. Various studies to date have addressed the efficient use of content splitting and merging in data processing, but they fall short in generating proper clusters in Hadoop. In this paper, we present an efficient approach to splitting and merging in data processing. We use component analysis and Hamming distance to generate the clusters depending on the split values, which is novel in this domain. The experimental results show that our proposed approach provides better efficiency in terms of discrete clusters and time consumption.

Keywords: big data; Hadoop; split; merge; cluster.

DOI: 10.1504/IJTPM.2019.104061

International Journal of Technology, Policy and Management, 2019 Vol.19 No.4, pp.392 - 404

Received: 07 Dec 2017
Accepted: 22 Apr 2018

Published online: 07 Dec 2019
