Title: A job-oriented load-distribution scheme for cost-effective NameNode service in HDFS
Authors: Chi-Yi Lin; Jhih-Kai Liao
Addresses: Department of Computer Science and Information Engineering, Tamkang University, Taipei, 25137, Taiwan; Department of Computer Science and Information Engineering, Tamkang University, Taipei, 25137, Taiwan
Abstract: Apache Hadoop has been widely used in big data processing and distributed computations. In the Hadoop ecosystem, data are stored and managed by the Hadoop Distributed File System (HDFS), in which the NameNode machine is a single point of failure. Although HDFS Federation and HDFS High Availability solve this problem, they do so at the significant cost of extra server hardware. Therefore, we aim to improve the availability of the NameNode service in a more cost-effective way. The primary innovation is the joint consideration of MapReduce jobs and the HDFS operations they generate. Specifically, for each job we dynamically allocate a SubNameNode on one of the existing TaskTrackers to provide the NameNode service. Because the load of the single NameNode is thereby distributed across the SubNameNodes, the failure rate of the NameNode machine can be reduced. Moreover, because the SubNameNodes are closer to the participating TaskTrackers, those TaskTrackers can access the NameNode service more efficiently.
Keywords: big data; cloud computing; Hadoop Distributed File System; HDFS; NameNode service; load distribution; fault tolerance; MapReduce; job-oriented scheme.
DOI: 10.1504/IJWGS.2014.064933
International Journal of Web and Grid Services, 2014 Vol. 10 No. 4, pp. 319-337
Received: 15 Feb 2014
Accepted: 15 Mar 2014
Published online: 29 Oct 2014