Authors: Joana E. Gonzales Malaverri; André Santanchè; Claudia Bauzer Medeiros
Addresses: Institute of Computing, University of Campinas - UNICAMP, Campinas, SP, Brazil; Institute of Computing, University of Campinas - UNICAMP, Campinas, SP, Brazil; Institute of Computing, University of Campinas - UNICAMP, Campinas, SP, Brazil
Abstract: Data quality is growing in relevance as a research topic. Quality assessment has been progressively incorporated into many business environments and software engineering practices. In eScience environments, however, data quality assessment is complicated by the multiplicity and heterogeneity of the data sources and scientific experts involved in a given problem. This paper deals with evaluating the quality of data managed by eScience applications. Our approach is based on data provenance, i.e., the history of the origins of a given data product and the transformations applied to it. Our contributions include (a) the specification of a framework to track data provenance and use it to derive quality information, (b) a model for data provenance based on the Open Provenance Model, and (c) a methodology to evaluate the quality of data based on its provenance. Our proposal is validated experimentally by a prototype that takes advantage of the Taverna workflow system.
Keywords: provenance information; data quality; quality evaluation; agricultural monitoring; e-science applications; data provenance; open provenance model.
International Journal of Metadata, Semantics and Ontologies, 2014 Vol.9 No.1, pp.15 - 28
Available online: 05 Feb 2014