Title: Error detection and error classification: failure awareness in data transfer scheduling

Authors: Mehmet Balman, Tevfik Kosar

Addresses: Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Computer Science, Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA

Abstract: Data transfer in distributed environments is prone to frequent failures resulting from back-end system-level problems, such as connectivity failures, which are technically untraceable by users. Error messages are not logged efficiently and are sometimes not relevant or useful from the users' point of view. Our study explores the possibility of an efficient error detection and reporting system for such environments. Prior knowledge about the environment and awareness of the actual reason behind a failure would enable higher-level planners to make better and more accurate decisions. Well-defined error detection and error reporting methods are necessary to increase the usability and serviceability of existing data transfer protocols and data management systems. We investigate the applicability of early error detection and error classification techniques, and propose an error reporting framework and a failure-aware data transfer life cycle to improve the arrangement of data transfer operations and to enhance the decision making of data transfer schedulers.

Keywords: error detection; error classification; network exploration; data movement; distributed repositories; failure awareness; data transfer scheduling; bulk data transfer; error reporting; decision making.

DOI: 10.1504/IJAC.2010.037516

International Journal of Autonomic Computing, 2010 Vol.1 No.4, pp.425 - 446

Published online: 15 Dec 2010
