Title: A reliability-aware scheduling algorithm for parallel task executing on cloud computing system

Authors: Jie Cao; Zhifeng Zhang; Bo Wang; Xiao Cui; Jinchao Xu

Addresses: Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450002, China ' Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450002, China ' Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450002, China ' Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450002, China ' Information Center, Shanghai Jiaotong University, Shanghai, 200240, China

Abstract: Cloud computing is built on massive clusters of cheap servers, so the software and hardware of compute nodes are prone to failure. Different computing nodes and communication links have different failure rates. To address the parallel task scheduling problem in which cloud users impose requirements on deadline and execution reliability, we propose generating all possible execution schemes of a parallel task on a cloud computing system. These execution schemes are assembled into an execution scheme graph (ESG), in which each path from the start point to the end point corresponds to one execution scheme of the parallel task. Based on the ESG, we propose MRES, a maximum reliability execution scheme solving algorithm that searches for the execution schemes that have the maximum reliability cost while meeting the parallel task's deadline requirement. Experimental results show that the MRES algorithm can effectively improve the execution success rate.

Keywords: cloud computing; reliability; parallel task; directed acyclic graph; task scheduling; algorithm.

DOI: 10.1504/IJISTA.2021.120501

International Journal of Intelligent Systems Technologies and Applications, 2021 Vol.20 No.3, pp.215 - 232

Received: 19 Oct 2020
Accepted: 04 Dec 2020

Published online: 24 Jan 2022
