A novel entropy-based dynamic data placement strategy for data intensive applications in Hadoop clusters
Online publication date: Wed, 16-Jan-2019
by K. Hemant Kumar Reddy; Vishal Pandey; Diptendu Sinha Roy
International Journal of Big Data Intelligence (IJBDI), Vol. 6, No. 1, 2019
Abstract: Over the last decade, efficient analysis of data-intensive applications has become an important research issue. Hadoop is the most widely used platform for data-intensive applications. However, the majority of data placement strategies attempt to place related data close together for faster access without considering newly generated datasets or different MapReduce jobs. This paper deals with improving MapReduce performance over multi-cluster datasets by means of a novel entropy-based data placement strategy (EDPS) in three phases. First, a K-means clustering strategy is employed to extract dependencies among different datasets and group them into data-groups. Then these data-groups are placed in different datacenters while considering heterogeneity. Finally, newly generated datasets are grouped with the most similar existing cluster based on their relative entropy. The experimental results show the efficacy of the proposed three-fold dynamic grouping and data placement policy, which significantly reduces the time and improves Hadoop performance.
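The third phase described in the abstract, assigning a newly generated dataset to the most similar existing group by relative entropy, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which features describe a dataset, so the sketch assumes each dataset and each group is summarised by a vector of per-job access counts, and the names normalize, relative_entropy and assign_to_group are hypothetical. Relative entropy is taken here to be the Kullback-Leibler divergence.

```python
import math

def normalize(counts, eps=1e-9):
    """Turn raw counts into a probability distribution.

    A small eps is added to every entry so that no probability is
    exactly zero (assumption: simple additive smoothing).
    """
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def assign_to_group(new_counts, group_profiles):
    """Return the name of the group whose profile diverges least
    from the new dataset's access pattern.

    new_counts     -- hypothetical per-job access counts of the new dataset
    group_profiles -- dict mapping group name to its aggregate counts
    """
    p = normalize(new_counts)
    return min(
        group_profiles,
        key=lambda name: relative_entropy(p, normalize(group_profiles[name])),
    )

# Hypothetical example: two existing data-groups and one new dataset.
profiles = {"group_a": [8, 1, 1], "group_b": [1, 1, 8]}
chosen = assign_to_group([7, 2, 1], profiles)
```

In this example the new dataset is accessed mostly by the first job, so its distribution lies much closer to group_a than to group_b, and group_a is selected. Choosing the group with the smallest divergence is one plausible reading of "most similar existing cluster"; the paper itself may define similarity differently.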