A recovery mechanism for errors caused by a late subjob in a system handling SLA-based Grid workflows
by Dang Minh Quan, Jorn Altmann
International Journal of Web and Grid Services (IJWGS), Vol. 4, No. 1, 2008

Abstract: Supporting SLAs (Service Level Agreements) for Grid-based workflows requires mechanisms for handling errors (i.e., the failures of subjobs). In this paper, we propose an error recovery mechanism that can handle one failed subjob of a workflow. The error recovery mechanism has a maximum of three phases, depending on the impact of the error. In each phase, we use a dedicated algorithm to remap the subjobs of the workflow to the resources. The main contributions of the paper are the error recovery mechanism for SLA-based workflows and the mapping algorithm G-map, which is used in the first phase of the recovery mechanism. G-map remaps the groups of subjobs that are directly affected by an error. The efficiency of the proposed algorithm is validated through simulation results.

Online publication date: Sun, 25-May-2008
