Title: Managing data quality of integrated data with known provenance

Authors: Maria Del Pilar Angeles, Lachlan M. MacKinnon

Addresses: Facultad de Ingenieria, UNAM, Edificio 'Bernardo Quintana' 2do. Piso, C.U., C.P. 04510, Mexico; School of Computing and Mathematical Sciences, University of Greenwich, Old Royal Naval College, Park Row, London SE10 9LS, UK

Abstract: Users querying a database system are returned a set of data with no indication of its quality. To address the issue of data quality, and to challenge the presumptions of perfection, atomicity and primary authorship, a toolset has been developed. This project proposes a data quality manager (DQM), which contains a reference model, a measurement model and an assessment model. The present work aims to identify data quality criteria to measure and assess the quality of derived data, as well as of data at multiple levels of granularity. The qualitative information provided by the DQM is enhanced by considering data provenance. The qualitative measures allow data sources to be ranked according to the users' specification of the context in a heterogeneous multi-database environment. The DQM prototype has been tested, and several experiments have been carried out to prove that more accurate information is being provided to users.

Keywords: data quality; data provenance; heterogeneous database systems; information quality; quality management; granularity; data retrieval; information retrieval.

DOI: 10.1504/IJIQ.2011.040671

International Journal of Information Quality, 2011 Vol.2 No.3, pp.244 - 263

Accepted: 23 Aug 2009
Published online: 31 Oct 2014
