Title: An overall approach to achieve load balancing for Hadoop Distributed File System

Authors: Chi-Yi Lin; Ying-Chen Lin

Addresses: Department of Computer Science and Information Engineering, Tamkang University, Taipei 25137, Taiwan (both authors)

Abstract: Hadoop Distributed File System (HDFS) is a popular cloud storage system that scales easily to meet growing demand for storage capacity. In HDFS, files are divided into fixed-size blocks, which are then replicated and stored at random on many DataNodes to prevent data loss. The random nature of this default block placement strategy can leave the DataNodes in a load-imbalanced state. Although HDFS has a built-in utility for load balancing, it comes at the cost of reduced system performance because blocks must be moved between nodes. In this paper, we take a holistic approach to load balancing by considering all situations that may influence the load-balancing state. We designed a new role named BalanceNode that matches heavily loaded DataNodes with lightly loaded ones, so that the lightly loaded nodes can take over part of the load from the heavily loaded ones. We also designed a better block placement strategy to keep the storage load as balanced as possible in the first place. The simulation results show that our approach achieves a better load-balancing state than existing algorithms.
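
The abstract does not give the algorithms themselves, so the following is only an illustrative sketch of the two ideas it names: pairing overloaded DataNodes with underloaded ones, and choosing placement targets by current load. The function names, the `threshold` parameter, and the mean-based classification are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean


def match_nodes(utilization, threshold=0.1):
    """Pair overloaded nodes with underloaded ones (hypothetical sketch).

    utilization maps a node name to its storage utilization in [0, 1].
    A node counts as heavy if it is more than `threshold` above the
    cluster mean, and light if it is more than `threshold` below it.
    Both the rule and the default threshold are assumptions.
    """
    avg = mean(utilization.values())
    heavy = sorted(
        (n for n, u in utilization.items() if u > avg + threshold),
        key=lambda n: utilization[n],
        reverse=True,  # most loaded first
    )
    light = sorted(
        (n for n, u in utilization.items() if u < avg - threshold),
        key=lambda n: utilization[n],  # least loaded first
    )
    # Pair the most loaded node with the least loaded one, and so on.
    return list(zip(heavy, light))


def pick_least_loaded(utilization, replicas=3):
    """Choose the `replicas` least-utilized nodes as placement targets."""
    return sorted(utilization, key=lambda n: utilization[n])[:replicas]


cluster = {"dn1": 0.9, "dn2": 0.2, "dn3": 0.5, "dn4": 0.8, "dn5": 0.1}
pairs = match_nodes(cluster)
targets = pick_least_loaded(cluster)
```

In this toy cluster the mean utilization is 0.5, so `dn1` and `dn4` are treated as heavy and `dn5` and `dn2` as light. A real placement policy would also have to respect HDFS constraints such as rack awareness, which this sketch ignores.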

Keywords: cloud computing; Hadoop Distributed File System; load balancing.

DOI: 10.1504/IJWGS.2017.087370

International Journal of Web and Grid Services, 2017 Vol.13 No.4, pp.448 - 466

Received: 20 Oct 2016
Accepted: 15 Jul 2017

Published online: 13 Oct 2017