Title: Task scheduling and virtual resource optimising in Hadoop YARN-based cloud computing environment
Authors: Frederic Nzanywayingoma; Yang Yang
Addresses: School of Computer and Communication Engineering, University of Science and Technology Beijing, Haidian District, P.O. Box 100083, Beijing, China; School of Computer and Communication Engineering, University of Science and Technology Beijing, Haidian District, P.O. Box 100083, Beijing, China
Abstract: Big data is generated continuously by cameras, mobile devices, sensors, and software logs, in volumes ranging from hundreds of terabytes to petabytes. Analysing data at this scale requires new skills, data-intensive applications, and storage clusters. Apache Hadoop is one of the most popular recent tools for big data processing. The main purpose of this paper is to analyse scheduling algorithms that can improve the performance, efficiency, and reliability of a Hadoop YARN environment. We describe several task schedulers that operate at different levels of Hadoop: the first in first out (FIFO) scheduler, fair scheduler, delay scheduler, deadline constraint scheduler, dynamic priority scheduling, and capacity scheduler. We then analyse the performance of these widely used Hadoop task schedulers in terms of makespan, turnaround time, and throughput. The paper concludes with experimental results.
Keywords: Hadoop; MapReduce; task scheduling; yet another resource negotiator; YARN; Hadoop distributed file system; HDFS; JobTracker; TaskTracker.
International Journal of Cloud Computing, 2018 Vol.7 No.2, pp.83 - 102
Received: 01 Mar 2016
Accepted: 05 Feb 2017
Published online: 03 Aug 2018
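
The three evaluation metrics named in the abstract (makespan, turnaround time, and throughput) can be illustrated with a minimal sketch. It assumes the common textbook definitions, which may differ in detail from those used in the paper, and it simulates a single-slot FIFO queue rather than a real YARN cluster. The `Job` class, the `fifo_metrics` function, and the sample workload are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float   # submission time, in seconds (hypothetical field)
    duration: float  # run time, in seconds (hypothetical field)


def fifo_metrics(jobs):
    """Run jobs one at a time in arrival order and return the three metrics.

    Assumed definitions (not taken from the paper):
      makespan        = last completion time - first submission time
      turnaround time = completion time - submission time, per job
      throughput      = completed jobs / makespan
    """
    if not jobs:
        raise ValueError("need at least one job")

    # FIFO: serve jobs strictly in order of submission.
    ordered = sorted(jobs, key=lambda j: j.arrival)
    clock = ordered[0].arrival
    turnarounds = []

    for job in ordered:
        start = max(clock, job.arrival)  # wait for the slot or for the job
        clock = start + job.duration     # completion time of this job
        turnarounds.append(clock - job.arrival)

    makespan = clock - ordered[0].arrival
    throughput = len(ordered) / makespan if makespan > 0 else float("inf")
    return {
        "makespan": makespan,
        "mean_turnaround": sum(turnarounds) / len(turnarounds),
        "throughput": throughput,
    }


if __name__ == "__main__":
    workload = [Job("a", 0, 4), Job("b", 1, 2), Job("c", 2, 2)]
    print(fifo_metrics(workload))
```

With this sample workload, the short jobs `b` and `c` queue behind the long job `a`, which is the head-of-line blocking that schedulers such as the fair and capacity schedulers are designed to reduce.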