Extending a re-identification risk-based anonymisation framework and evaluating its impact on data mining classifiers
Online publication date: Thu, 16-Apr-2020
by Tania Basso; Hebert Silva; Regina Moraes
International Journal of Critical Computer-Based Systems (IJCCBS), Vol. 9, No. 4, 2019
Abstract: Preserving sensitive information in data mining processes is one of the major issues in the context of big data. Handling huge volumes of data demands techniques to ensure that private data is not accessible to unauthorised users. One of these techniques is data anonymisation, which aims to prevent the identification of individuals. However, even anonymised data may be subject to re-identification through privacy attacks. This paper presents a two-stage, policy-based anonymisation framework, which applies anonymisation techniques during the ETL process and again before data analytics results are exported. We extended part of this framework, the k-anonymity-based component, to help minimise the risk of data re-identification. Experiments evaluated the impact of this two-stage anonymisation on data mining with respect to accuracy, performance, re-identification risk and information loss. Results showed that, when applied carefully, anonymisation barely affects classifier results and improves accuracy in some cases.
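To make the link between k-anonymity and re-identification risk concrete, the following is a minimal sketch and not the authors' framework. It assumes the common "prosecutor" risk model, in which a record's risk is 1 divided by the size of its equivalence class (the group of records sharing the same quasi-identifier values); the abstract does not state which risk measure the paper uses. The function names, the column names and the toy records are all hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest equivalence class."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values())


def reidentification_risk(records, quasi_identifiers):
    """Return (highest, average) risk under the assumed prosecutor model.

    Each record's risk is 1 / (size of its equivalence class), so the
    highest risk is 1 / k and the average is (#classes / #records).
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    highest = 1 / min(sizes.values())
    average = len(sizes) / len(records)
    return highest, average


# Hypothetical toy data whose quasi-identifiers are already generalised.
records = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "131**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "131**", "diagnosis": "flu"},
]
quasi_identifiers = ["age", "zip"]  # assumed choice of quasi-identifiers

k = k_anonymity(records, quasi_identifiers)
highest, average = reidentification_risk(records, quasi_identifiers)
print(f"k = {k}, highest risk = {highest}, average risk = {average}")
```

In this toy data set the two equivalence classes contain three and two records, so the data is 2-anonymous. Generalising the quasi-identifiers further would raise k and lower the risk, at the cost of the information loss that the abstract lists among the evaluated criteria.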