
Title: Data locality-aware and QoS-aware dynamic cloud workflow scheduling in Hadoop for heterogeneous environment

Authors: Fan Ding; Minjin Ma

Addresses: College of Computer and Communication Engineering, Lanzhou University of Technology, No. 287 Langongping Road, Qilihe District, Lanzhou City, Gansu, 730050, China; College of Atmospheric Sciences, Lanzhou University, No. 222 South Tianshui Road, Lanzhou, Gansu Province, 730000, China

Abstract: Hadoop has become a popular data-parallel computing framework for data-intensive scientific applications in recent years. Most scientific applications employ workflows to describe the procedures and dependencies between jobs. However, the current default scheduling policy in Hadoop does not take data locality into account, and the movement of data among virtual machines (VMs) introduces latency into workflow scheduling. In addition, because cloud resources are heterogeneous and dynamic, static workflow scheduling cannot satisfy the user's demand for quality of service (QoS). Hence, we propose a data locality-aware and QoS-aware dynamic cloud workflow scheduling algorithm (DQ-DCWS) based on dynamic programming. The algorithm balances data locality and delays by grouping nodes that hold tasks correlated with data blocks. We consider five QoS factors and normalise them into a path optimisation problem to maximise QoS. DQ-DCWS is implemented and validated by running the Montage workflow on real Hadoop clusters deployed on Amazon EC2.

Keywords: data locality; Hadoop MapReduce; heterogeneous; workflow scheduling; quality of service; QoS; big data.

DOI: 10.1504/IJWGS.2023.129338

International Journal of Web and Grid Services, 2023, Vol. 19, No. 1, pp. 113-135

Received: 10 Jan 2022
Received in revised form: 26 Nov 2022
Accepted: 04 Dec 2022

Published online: 06 Mar 2023
