Title: Longitudinal data consistency verification using formal methods

Authors: Roberto Boselli; Mirko Cesarini; Fabio Mercorio; Mario Mezzanzanica

Addresses: Department of Statistics and Quantitative Methods, CRISP Research Centre, University of Milan Bicocca, Milan 20126, Italy; Department of Statistics and Quantitative Methods, CRISP Research Centre, University of Milan Bicocca, Milan 20126, Italy; CRISP Research Centre, University of Milan Bicocca, Milan 20126, Italy; Department of Statistics and Quantitative Methods, CRISP Research Centre, University of Milan Bicocca, Milan 20126, Italy

Abstract: The longitudinal data collected by public administrations and large organisations are well suited to describing social and economic phenomena whose dynamics require close attention from policy makers and civil servants. Unfortunately, the quality of the stored data is often very poor, so data cleansing is a mandatory step before the data can be exploited. This paper is driven by the idea that formal methods (specifically model checking) can contribute strongly to extracting, formalising, and refining consistency requirements from the domain knowledge, and then to verifying the real data against the elicited requirements. We developed a methodology (the Robust Data Quality Analysis) that assesses the quality of both the original data and the cleansing results. We applied the proposed approach to a real-world scenario in the labour market domain, evaluating the consistency of millions of people's careers. The results show that our approach can contribute effectively to the improvement of data cleansing activities.

Keywords: data quality; model checking; administrative archives; longitudinal data; information quality; consistency verification; labour market; data consistency; formal methods; data cleansing.

DOI: 10.1504/IJIQ.2014.064054

International Journal of Information Quality, 2014 Vol.3 No.3, pp.185 - 206

Received: 11 Sep 2012
Accepted: 31 May 2013

Published online: 30 Aug 2014
