Title: Extending a re-identification risk-based anonymisation framework and evaluating its impact on data mining classifiers
Authors: Tania Basso; Hebert Silva; Regina Moraes
Addresses: School of Technology, University of Campinas, Limeira, SP, Brazil; School of Technology, University of Campinas, Limeira, SP, Brazil; School of Technology, University of Campinas, Limeira, SP, Brazil
Abstract: Preserving sensitive information in data mining processes is one of the major issues in the context of big data. Handling huge volumes of data demands techniques to ensure that private data is not accessible to unauthorised users. One of these techniques is data anonymisation, which aims to prevent the identification of individuals. However, even anonymised data may be subject to re-identification through privacy attacks. This paper presents a two-stage policy-based anonymisation framework, which applies anonymisation techniques during the ETL process and again before exporting data analytics results. We extended part of this framework, the k-anonymity-based component, to help minimise the risk of data re-identification. Experiments evaluated the impact of applying this two-stage anonymisation on data mining with respect to accuracy, performance, re-identification risk and information loss. Results showed that, when applied carefully, the anonymisation barely affects classifier results and even improves accuracy in some cases.
Keywords: privacy; data mining; data anonymisation; re-identification risk; k-anonymity; personally identifiable information; data leakage; privacy attack; data utility; de-anonymisation.
DOI: 10.1504/IJCCBS.2019.106817
International Journal of Critical Computer-Based Systems, 2019 Vol.9 No.4, pp.348 - 378
Received: 22 Oct 2018
Accepted: 05 Aug 2019
Published online: 21 Apr 2020
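
The abstract refers to k-anonymity and re-identification risk without defining them. As a rough illustration only, and not the framework described in the paper, the sketch below computes the k of a small table and the worst-case re-identification risk under the common assumption that a record's risk is 1 divided by the size of its equivalence class (the records sharing the same quasi-identifier values). The function names, the sample records and the choice of quasi-identifiers are all hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Return, for each record, the size of its equivalence class.

    An equivalence class is the set of records that share the same
    values for every quasi-identifier attribute.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest equivalence class."""
    if not records:
        raise ValueError("records must not be empty")
    return min(equivalence_class_sizes(records, quasi_identifiers))


def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk, assuming risk = 1 / equivalence class size."""
    return 1 / k_anonymity(records, quasi_identifiers)


# Hypothetical, already-generalised sample data (illustration only).
records = [
    {"age": "30-39", "zip": "134**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "134**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "134**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "135**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "135**", "diagnosis": "flu"},
]
quasi_identifiers = ["age", "zip"]

k = k_anonymity(records, quasi_identifiers)
risk = max_reidentification_risk(records, quasi_identifiers)
print(f"k = {k}, maximum re-identification risk = {risk}")
```

In this sample the smallest equivalence class holds two records, so the table is 2-anonymous and the worst-case risk under the stated assumption is 0.5. Generalising the quasi-identifiers further would raise k and lower that risk, at the cost of information loss, which is the trade-off the abstract says the experiments evaluate.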