Title: Extending a re-identification risk-based anonymisation framework and evaluating its impact on data mining classifiers

Authors: Tania Basso; Hebert Silva; Regina Moraes

Addresses: School of Technology, University of Campinas, Limeira, SP, Brazil; School of Technology, University of Campinas, Limeira, SP, Brazil; School of Technology, University of Campinas, Limeira, SP, Brazil

Abstract: Preserving sensitive information in data mining processes is one of the major issues in the context of big data. Handling huge volumes of data demands techniques to ensure that private data is not accessible to unauthorised users. One of these techniques is data anonymisation, which aims to prevent the identification of individuals. However, even when anonymised, data may be subject to re-identification through privacy attacks. This paper presents a two-stage policy-based anonymisation framework, which applies anonymisation techniques during the ETL process and again before exporting data analytics results. We extended part of this framework, the k-anonymity-based component, to help minimise the risk of data re-identification. Experiments evaluated the impact of applying this two-stage anonymisation on data mining with respect to accuracy, performance, re-identification risk and information loss. Results showed that, when applied carefully, the anonymisation barely affects classifier results and improves accuracy in some cases.

Keywords: privacy; data mining; data anonymisation; re-identification risk; k-anonymity; personally identifiable information; data leakage; privacy attack; data utility; de-anonymisation.

DOI: 10.1504/IJCCBS.2019.106817

International Journal of Critical Computer-Based Systems, 2019 Vol.9 No.4, pp.348 - 378

Received: 22 Oct 2018
Accepted: 05 Aug 2019

Published online: 21 Apr 2020
