Title: A provenance-based approach to evaluate data quality in eScience

Authors: Joana E. Gonzales Malaverri; André Santanchè; Claudia Bauzer Medeiros

Addresses: Institute of Computing, University of Campinas - UNICAMP, Campinas, SP, Brazil (all three authors)

Abstract: Data quality is growing in relevance as a research topic. Quality assessment has been progressively incorporated into many business environments and into software engineering practices. In eScience environments, however, the multiplicity and heterogeneity of the data sources and of the scientific experts involved in a given problem complicate data quality assessment. This paper deals with evaluating the quality of data managed by eScience applications. Our approach is based on data provenance, i.e. the history of the origins of a given data product and of the transformations applied to it. Our contributions include (a) the specification of a framework to track data provenance and use it to derive quality information, (b) a model for data provenance based on the Open Provenance Model, and (c) a methodology to evaluate the quality of data based on its provenance. Our proposal is validated experimentally by a prototype that takes advantage of the Taverna workflow system.
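To make the idea of deriving quality information from provenance more concrete, here is a minimal Python sketch. It is not the authors' framework or methodology. It assumes an OPM-style graph restricted to artifacts and processes linked by `used` and `wasGeneratedBy` edges (agents are omitted), and a purely hypothetical rule in which a derived data product inherits the lowest quality score among its primary sources. The class and function names (`ProvenanceGraph`, `quality_score`), the artifact names and the scores are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical OPM-style graph with artifacts and processes only (no agents)."""

    # artifact -> process that generated it (OPM "wasGeneratedBy" edge)
    generated_by: dict = field(default_factory=dict)
    # process -> artifacts it consumed (OPM "used" edges)
    used: dict = field(default_factory=dict)

    def was_generated_by(self, artifact, process):
        self.generated_by[artifact] = process

    def add_used(self, process, artifact):
        self.used.setdefault(process, []).append(artifact)

    def sources(self, artifact):
        """Walk provenance backwards and return the primary artifacts.

        An artifact counts as primary if no generating process with recorded
        inputs exists for it. Assumes the graph is acyclic.
        """
        process = self.generated_by.get(artifact)
        inputs = self.used.get(process, []) if process is not None else []
        if not inputs:
            return {artifact}
        result = set()
        for parent in inputs:
            result |= self.sources(parent)
        return result


def quality_score(graph, artifact, source_scores):
    # Illustrative rule (an assumption, not the paper's method):
    # a derived product is only as good as its weakest primary source.
    return min(source_scores[s] for s in graph.sources(artifact))


# Hypothetical agricultural-monitoring example.
g = ProvenanceGraph()
g.add_used("interpolate", "station_readings")
g.add_used("interpolate", "satellite_ndvi")
g.was_generated_by("crop_map", "interpolate")
scores = {"station_readings": 0.9, "satellite_ndvi": 0.7}
print(quality_score(g, "crop_map", scores))  # prints 0.7
```

The minimum rule is only one conservative choice; a weighted average or a rule that also considers the processes and agents involved would fit the same traversal.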

Keywords: provenance information; data quality; quality evaluation; agricultural monitoring; e-science applications; data provenance; open provenance model.

DOI: 10.1504/IJMSO.2014.059127

International Journal of Metadata, Semantics and Ontologies, 2014 Vol.9 No.1, pp. 15-28

Received: 09 Jan 2013
Accepted: 08 Oct 2013

Published online: 05 Feb 2014
