Title: Grid-aware approach to data statistics, data understanding and data preprocessing

Authors: Alexander Wohrer, Lenka Novakova, Peter Brezany, A. Min Tjoa

Addresses: Institute for Scientific Computing, Faculty of Computer Science, University of Vienna, Nordbergstrasse 15/C/3, 1090 Vienna, Austria; Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics, Technicka 2, 166 27 Prague 6, Czech Republic; Institute for Scientific Computing, Faculty of Computer Science, University of Vienna, Nordbergstrasse 15/C/3, 1090 Vienna, Austria; Institute of Software Technology, Vienna University of Technology, Favoritenstr. 9-11/188, 1040 Wien, Austria

Abstract: In recent years the focus of grid computing has shifted towards more data-intensive applications, which increasingly need access to various public and private databases. Relocating the code for Data Preprocessing (DPP) closer to the data source is the overall task of the D³G framework. This paper presents the data-service-side architecture used to gather Data Statistics (DS) on-the-fly, to use them in remote DPP methods on query results, and to gather exact continuous DS for whole tables inside a database. The performance results show low running costs for the continuous DS and demonstrate the feasibility of the service-side DPP functionality.
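
As an illustration of the idea of gathering statistics on-the-fly and then using them in a preprocessing step, the following is a minimal sketch in Python. It is not taken from the paper and does not represent the D³G implementation: the class `RunningStats`, the function `min_max_normalize`, and the choice of Welford's online algorithm and min-max normalization are all assumptions made for illustration only.

```python
import math


class RunningStats:
    """Hypothetical helper: statistics updated one value at a time (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, x):
        # Fold a single value into the statistics without storing the data.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self):
        # Population variance of the values seen so far.
        return self._m2 / self.count if self.count else 0.0


def min_max_normalize(values, stats):
    """Hypothetical preprocessing step: scale values to [0, 1] using gathered statistics."""
    span = stats.maximum - stats.minimum
    if span == 0:
        return [0.0 for _ in values]
    return [(v - stats.minimum) / span for v in values]


# Usage: statistics are gathered while the rows of a query result stream past,
# then reused for normalization without a second pass to compute min and max.
column = [2.0, 4.0, 6.0]
stats = RunningStats()
for value in column:
    stats.update(value)
normalized = min_max_normalize(column, stats)
```

A single-pass update of this kind keeps only a few numbers per column, which is one plausible reason statistics could be maintained cheaply next to the data source; the paper's own mechanism may differ.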

Keywords: data preprocessing; grid computing; data statistics; database access.

DOI: 10.1504/IJHPCN.2009.026288

International Journal of High Performance Computing and Networking, 2009, Vol. 6, No. 1, pp. 15-24

Published online: 05 Jun 2009
