Title: A survey of scheduling frameworks in big data systems

Authors: Ji Liu; Esther Pacitti; Patrick Valduriez

Addresses: Inria and LIRMM, University of Montpellier, France; Inria and LIRMM, University of Montpellier, France; Inria and LIRMM, University of Montpellier, France

Abstract: Cloud and big data technologies are now converging to enable organisations to outsource data to the cloud and get value from it. Big data systems typically exploit computer clusters to gain scalability and obtain a good cost-performance ratio. However, scheduling a workload in a computer cluster remains a well-known open problem. Scheduling methods are typically implemented in a scheduling framework. In this paper, we survey scheduling methods and frameworks for big data systems, propose a taxonomy and analyse the features of scheduling frameworks. These frameworks were initially designed for the cloud (MapReduce) to process web data. We examine 16 popular scheduling frameworks. Our study shows that different frameworks are proposed for different big data systems, different scales of computer clusters and different objectives. We propose the main dimensions for workloads and metrics for benchmarks to evaluate these scheduling frameworks. Finally, we analyse their limitations and propose new research directions.

Keywords: big data; cloud computing; cluster computing; parallel processing; scheduling method; scheduling framework; large-scale systems; distributed architecture; resource management; quality of service.

DOI: 10.1504/IJCC.2018.093765

International Journal of Cloud Computing, 2018 Vol.7 No.2, pp.103-128

Received: 27 May 2017
Accepted: 18 Jan 2018

Published online: 03 Aug 2018
