Title: A comprehensive data quality methodology for web and structured data

Authors: Carlo Batini, Federico Cabitza, Cinzia Cappiello, Chiara Francalanci

Addresses: Università di Milano, Bicocca, Milano, Italy; Università di Milano, Bicocca, Milano, Italy; Politecnico di Milano, Milano, Italy; Politecnico di Milano, Milano, Italy

Abstract: Measuring and improving data quality in an organisation or in a group of interacting organisations is a complex task. Several methodologies have been developed in the past, providing a basis for the definition of a data quality programme that guarantees high data quality levels. Since the main limitation of existing approaches is their specialisation on specific issues or contexts, this paper presents a Comprehensive Data Quality (CDQ) methodology. The main aim of the CDQ methodology is the integration and enhancement of the phases, techniques and tools proposed by previous approaches. In particular, the CDQ methodology is conceived to be at the same time complete, flexible and simple to apply. Completeness is achieved by considering existing techniques and tools and integrating them in a framework that can work in any organisation. The methodology is flexible, since it supports the user in the selection of the most suitable techniques and tools within each phase and in any context. Finally, CDQ is simple, since it is organised in phases and each phase is characterised by a specific goal and a set of techniques to apply. The methodology is explained by means of a running example and significant cases of its application are reported.

Keywords: data quality; methodology; assessment; improvement; process; cost; web data; structured data.

DOI: 10.1504/IJICA.2008.019688

International Journal of Innovative Computing and Applications, 2008 Vol.1 No.3, pp.205-218

Published online: 20 Jul 2008