Title: Task scheduling and virtual resource optimising in Hadoop YARN-based cloud computing environment

Authors: Frederic Nzanywayingoma; Yang Yang

Addresses: School of Computer and Communication Engineering, University of Science and Technology Beijing, Haidian District, P.O. Box 100083, Beijing, China (both authors)

Abstract: Big data is being generated all around us, at all times, by cameras, mobile devices, sensors and software logs, in volumes ranging from hundreds of terabytes to petabytes. Analysing such massive data therefore requires new skills, data-intensive applications and storage clusters. Apache Hadoop is one of the most popular recent tools developed for big data processing. The main purpose of this paper is to analyse different scheduling algorithms that can help achieve better performance, efficiency and reliability in the Hadoop YARN environment. We describe several task schedulers that operate at different levels of Hadoop, such as the first in first out (FIFO) scheduler, fair scheduler, delay scheduler, deadline constraint scheduler, dynamic priority scheduler and capacity scheduler, and we analyse the performance of these widely used Hadoop task schedulers in terms of three metrics: makespan, turnaround time and throughput. Experimental results are presented at the end of the paper.
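To make the three evaluation metrics concrete, the following Python sketch simulates a toy cluster under two policies and reports makespan, average turnaround time and throughput. It is an illustrative assumption, not the authors' experimental setup: the workload model (jobs made of unit-length tasks running on a fixed number of identical slots), the metric definitions (makespan as the span from first submission to last completion, turnaround as completion time minus submission time, throughput as jobs completed per time step) and the names `Job`, `simulate` and `metrics` are all hypothetical. The "fair" policy here is a simple equal-share round robin, far simpler than Hadoop's actual Fair Scheduler with its queues, weights, minimum shares and preemption.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical workload model: a job is a batch of unit-length tasks.
    name: str
    submit: int  # submission time, in discrete time steps
    tasks: int   # number of tasks; each occupies one slot for one step


def simulate(jobs, slots, policy):
    """Toy discrete-time scheduler; returns {job name: completion time}.

    policy="fifo": the earliest-submitted job takes every free slot it can use.
    policy="fair": free slots are dealt round-robin across active jobs
    (a simplified equal-share policy, not Hadoop's Fair Scheduler).
    """
    if slots <= 0 or any(j.tasks <= 0 for j in jobs):
        raise ValueError("need at least one slot and one task per job")
    if policy not in ("fifo", "fair"):
        raise ValueError(f"unknown policy: {policy}")
    order = sorted(jobs, key=lambda j: (j.submit, j.name))
    remaining = {j.name: j.tasks for j in jobs}
    done = {}
    t = 0
    while len(done) < len(jobs):
        # Submitted jobs that still have unscheduled tasks, oldest first.
        active = [j for j in order if j.submit <= t and remaining[j.name] > 0]
        free = slots
        if policy == "fifo":
            for j in active:
                take = min(free, remaining[j.name])
                remaining[j.name] -= take
                free -= take
        else:
            # Deal one slot at a time to each active job in turn.
            while free > 0 and any(remaining[j.name] > 0 for j in active):
                for j in active:
                    if free > 0 and remaining[j.name] > 0:
                        remaining[j.name] -= 1
                        free -= 1
        t += 1  # tasks launched during step t-1 finish at time t
        for j in active:
            if remaining[j.name] == 0:
                done[j.name] = t
    return done


def metrics(jobs, done):
    """Makespan, mean turnaround time and throughput (assumed definitions)."""
    makespan = max(done.values()) - min(j.submit for j in jobs)
    avg_turnaround = sum(done[j.name] - j.submit for j in jobs) / len(jobs)
    throughput = len(jobs) / makespan  # jobs completed per time step
    return makespan, avg_turnaround, throughput


# Hypothetical workload: one long job and two short ones on a 2-slot cluster.
jobs = [Job("A", 0, 8), Job("B", 0, 2), Job("C", 1, 2)]
for policy in ("fifo", "fair"):
    done = simulate(jobs, slots=2, policy=policy)
    m, avg, thr = metrics(jobs, done)
    print(f"{policy}: completions={done} makespan={m} "
          f"avg_turnaround={avg:.2f} throughput={thr:.2f}")
# fifo: completions={'A': 4, 'B': 5, 'C': 6} makespan=6 avg_turnaround=4.67 throughput=0.50
# fair: completions={'B': 2, 'C': 4, 'A': 6} makespan=6 avg_turnaround=3.67 throughput=0.50
```

With this made-up workload both policies finish all work at time 6, so makespan and throughput are identical; equal sharing mainly changes who waits, letting the short jobs B and C complete early and lowering average turnaround time at the expense of the long job A. The sketch does not model data locality (the concern addressed by the delay scheduler), deadlines or dynamic priorities.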

Keywords: Hadoop; MapReduce; task scheduling; yet another resource negotiator; YARN; Hadoop distributed file system; HDFS; JobTracker; TaskTracker.

DOI: 10.1504/IJCC.2018.093741

International Journal of Cloud Computing, 2018 Vol.7 No.2, pp.83 - 102

Received: 01 Mar 2016
Accepted: 05 Feb 2017

Published online: 03 Aug 2018
