A job-oriented load-distribution scheme for cost-effective NameNode service in HDFS
Online publication date: Wed, 29-Oct-2014
by Chi-Yi Lin; Jhih-Kai Liao
International Journal of Web and Grid Services (IJWGS), Vol. 10, No. 4, 2014
Abstract: Apache Hadoop has been widely used in big data processing and distributed computation. In the Hadoop ecosystem, data are stored and managed by the Hadoop Distributed File System (HDFS), in which the NameNode machine is a single point of failure. Although HDFS Federation and HDFS High Availability address this problem, they come at a significant cost in extra server hardware. We therefore aim to improve the availability of the NameNode service in a more cost-effective way. The primary innovation is the joint consideration of MapReduce jobs and the HDFS operations they generate. Specifically, we dynamically allocate a SubNameNode for each job on one of the existing TaskTrackers to provide the NameNode service. Since the load of the single NameNode is naturally distributed across the SubNameNodes, the failure rate of the NameNode machine can be reduced. Moreover, because the SubNameNodes are located closer to the participating TaskTrackers, the TaskTrackers can access the NameNode service more efficiently.
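The per-job idea in the abstract can be illustrated with a small routing model. This is a minimal sketch under stated assumptions, not the authors' implementation: the classes `NameService` and `Cluster`, the methods `start_job`, `lookup`, and `finish_job`, and the placement policy (host the SubNameNode on the TaskTracker running the most of the job's tasks, ties broken by host name) are all hypothetical. The model also ignores how a SubNameNode would keep its metadata consistent with the central NameNode. It only shows how metadata requests from a job with a SubNameNode stop reaching the single NameNode.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NameService:
    """A server answering HDFS metadata requests (hypothetical model)."""
    host: str
    requests_served: int = 0

    def handle(self, path: str) -> str:
        # Count each request so load can be compared across servers.
        self.requests_served += 1
        return f"{self.host}:{path}"


@dataclass
class Cluster:
    namenode: NameService
    # Hypothetical routing table: job id -> SubNameNode serving that job.
    sub_namenodes: dict = field(default_factory=dict)

    def start_job(self, job_id: str, task_placement: list) -> str:
        """Allocate a SubNameNode on one of the job's TaskTrackers.

        Assumed policy (not from the paper): pick the TaskTracker that
        hosts the most of the job's tasks, breaking ties by host name.
        """
        counts = Counter(task_placement)
        host = min(counts, key=lambda h: (-counts[h], h))
        self.sub_namenodes[job_id] = NameService(host)
        return host

    def lookup(self, job_id, path: str) -> str:
        # Jobs with a SubNameNode are served locally; any other request
        # falls back to the central NameNode.
        server = self.sub_namenodes.get(job_id, self.namenode)
        return server.handle(path)

    def finish_job(self, job_id: str) -> None:
        # Release the SubNameNode when the job completes.
        self.sub_namenodes.pop(job_id, None)


cluster = Cluster(NameService("namenode"))
host = cluster.start_job("job1", ["tt1", "tt2", "tt1", "tt3"])
for i in range(4):
    cluster.lookup("job1", f"/data/part-{i}")
cluster.lookup(None, "/tmp/x")  # a request not tied to any job
print(host, cluster.sub_namenodes["job1"].requests_served,
      cluster.namenode.requests_served)  # prints "tt1 4 1"
```

In this toy run, four of the five metadata requests are absorbed by the SubNameNode on `tt1`, which is the load-shedding effect the abstract describes; the real scheme's placement and synchronization details are in the full paper.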