Title: A job-oriented load-distribution scheme for cost-effective NameNode service in HDFS

Authors: Chi-Yi Lin; Jhih-Kai Liao

Addresses: Department of Computer Science and Information Engineering, Tamkang University, Taipei, 25137, Taiwan

Abstract: Apache Hadoop has been widely used in big data processing and distributed computation. In the Hadoop ecosystem, data are stored and managed by the Hadoop Distributed File System (HDFS), in which the NameNode machine is a single point of failure. Although HDFS Federation and HDFS High Availability solve this problem, they do so at the significant cost of extra server hardware. Therefore, we aim to improve the availability of the NameNode service in a more cost-effective way. The primary innovation is the joint consideration of MapReduce jobs and the HDFS operations they generate. Specifically, for each job we dynamically allocate a SubNameNode on one of the existing TaskTrackers to provide the NameNode service. Since the load of the single NameNode is naturally distributed across the SubNameNodes, the failure rate of the NameNode machine can be reduced. Moreover, because the SubNameNodes are more local to the participating TaskTrackers, the TaskTrackers can access the NameNode service more efficiently.

Keywords: big data; cloud computing; Hadoop Distributed File System; HDFS; NameNode service; load distribution; fault tolerance; MapReduce; job-oriented scheme.

DOI: 10.1504/IJWGS.2014.064933

International Journal of Web and Grid Services, 2014 Vol.10 No.4, pp.319 - 337

Received: 15 Feb 2014
Accepted: 15 Mar 2014
Published online: 29 Oct 2014
