Title: Safety scheduling strategies in distributed computing

Authors: Victor V. Toporkov, Alexey Tselishchev

Addresses: Computer Science Department, Moscow Power Engineering Institute, ul. Krasnokazarmennaya 14, Moscow, 111250 Russia; European Organization for Nuclear Research (CERN), 1211 Geneva 23, Switzerland

Abstract: In this paper, we present an approach to safety scheduling in distributed computing based on strategies of resource co-allocation for complex sets of tasks (jobs). Guaranteeing that jobs complete within their time limits requires taking into account the dynamics of the distributed environment, namely changes in the number of jobs to be serviced, the volumes of computations, possible failures of processor nodes, etc. As a consequence, in the general case, a set of scheduling and resource co-allocation versions, called a strategy, is required instead of a single version. Safety strategies are formed for structurally different job models with various levels of task granularity and data replication policies. We develop and consider scheduling strategies that combine fine-grain and coarse-grain computations, multiple data replicas and constrained data movement. These strategies are evaluated in simulation studies against a variety of metrics.

Keywords: distributed computing; safety scheduling; resource allocation; job execution; task execution; work; critical computing; resource co-allocation; simulation; resource management; job models.

DOI: 10.1504/IJCCBS.2010.031899

International Journal of Critical Computer-Based Systems, 2010 Vol.1 No.1/2/3, pp.41 - 58

Published online: 01 Mar 2010
