Title: A job submission manager for large-scale distributed systems based on job futurity predictor

Authors: Hamid Saadatfar; Hossein Deldari

Addresses: Parallel and Distributed Processing Lab, Computer Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, Azadi Sq., Mashhad, Khorasan Razavi, P.O. Box 91775-1111, Iran

Abstract: Compared with supercomputers and PCs, today's large distributed systems such as clusters and grids have a higher rate of unsuccessful job executions, which is a significant cause of wasted resources. Although many approaches have been proposed to make these environments more fault tolerant, their high overhead has led researchers to look for preventive methods. In this work, we employ a job futurity predictor to manage arriving jobs efficiently. To this end, we propose a novel meta-scheduler sub-component called the Job Submission Manager (JSM). The main role of JSM is to filter incoming jobs according to parameters such as the current system load and the job failure probability. Experimental results based on two different modelling approaches indicate that this component can effectively influence system throughput and increase the utilisation of computing resources.
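The abstract does not state the decision rule JSM uses. As a rough illustration only, the Python sketch below shows one way a filter could combine the two parameters named above. Everything in it is a hypothetical assumption, not the authors' design: the `Job` record, the `filter_job` and `manage` helpers, the submit/defer/reject outcomes, and the `base_threshold` and `reject_threshold` values. In this sketch the predicted failure probability is simply given as an input.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    # Hypothetical outcomes of the admission filter.
    SUBMIT = "submit"
    DEFER = "defer"
    REJECT = "reject"


@dataclass
class Job:
    job_id: str
    failure_prob: float  # assumed output of a job futurity predictor, in [0, 1]


def filter_job(job: Job, system_load: float,
               base_threshold: float = 0.5,
               reject_threshold: float = 0.9) -> Decision:
    """Hypothetical rule: the tolerated failure probability shrinks as load grows."""
    if not 0.0 <= system_load <= 1.0:
        raise ValueError("system_load must be in [0, 1]")
    if not 0.0 <= job.failure_prob <= 1.0:
        raise ValueError("failure_prob must be in [0, 1]")
    # Jobs that are very likely to fail are never admitted.
    if job.failure_prob >= reject_threshold:
        return Decision.REJECT
    # Idle system tolerates up to reject_threshold; fully loaded, only base_threshold.
    tolerated = base_threshold + (1.0 - system_load) * (reject_threshold - base_threshold)
    return Decision.SUBMIT if job.failure_prob < tolerated else Decision.DEFER


def manage(jobs: list[Job], system_load: float) -> dict[Decision, list[str]]:
    """Partition a batch of arriving jobs by the hypothetical filter's decision."""
    result: dict[Decision, list[str]] = {d: [] for d in Decision}
    for job in jobs:
        result[filter_job(job, system_load)].append(job.job_id)
    return result


if __name__ == "__main__":
    batch = [Job("a", 0.3), Job("b", 0.7), Job("d", 0.95)]
    print(manage(batch, system_load=0.8))
```

In this sketch, under a load of 0.8 a job with a predicted failure probability of 0.7 is deferred. Under a load of 0.1 the same job is submitted. The linear threshold is only one possible choice. A real meta-scheduler would more likely rely on the predictor's calibration and on queue state.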

Keywords: high performance computing; distributed systems; unsuccessful job execution; job submission managers; job futurity predictor; meta-scheduling; current system load; job failure probability; modelling; system throughput; resource utilisation.

DOI: 10.1504/IJGUC.2014.058252

International Journal of Grid and Utility Computing, 2014 Vol.5 No.1, pp.50 - 59

Received: 27 Jul 2012
Accepted: 17 Apr 2013

Published online: 29 Oct 2014
