Title: Data transformation techniques for preserving privacy in distance-based mining algorithms

Authors: Mohammad Ali Kadampur; D.V.L.N. Somayajulu

Addresses: National Institute of Technology Warangal, Warangal 506004, India; National Institute of Technology Warangal, Warangal 506004, India

Abstract: Dissimilarity calculation between two objects is one of the important knowledge-gathering methods in cognitive science. Many data mining algorithms use dissimilarity computation to cluster data in order to discover intra-relations, inter-relations, and outliers. The majority of these algorithms use Euclidean distance as the dissimilarity criterion. In this paper, signal transformation functions, with their orthogonality and energy compaction properties, are explored for transforming the data. The data transformation scheme treats the entire data set as a single entity. The proposed scheme is designed so that it can also be used in non-Euclidean spaces by applying the distance mapping algorithm. Existing randomisation approaches to data transformation preserve only the distributions and do not preserve the Euclidean distance between records. The proposed methods are superior to the existing methods in terms of run-time complexity, O(n), and preservation of the distance between individual data points.
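The distance-preservation property that the abstract relies on can be illustrated with a minimal sketch. It is not the authors' scheme: it assumes an orthonormal Haar wavelet transform applied to each record separately (the abstract describes transforming the entire data set as a single entity), and the function names `haar_transform` and `euclidean` and the sample records are hypothetical. It only shows that an orthonormal transform changes the stored values while leaving pairwise Euclidean distances unchanged.

```python
import math
from itertools import combinations


def haar_transform(vec):
    """Orthonormal Haar wavelet transform (illustrative, not the paper's code).

    Assumes len(vec) is a power of two. Each pass maps a pair (a, b) to
    ((a + b) / sqrt(2), (a - b) / sqrt(2)), which is a rotation and
    therefore preserves the Euclidean norm.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    root2 = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / root2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / root2 for i in range(half)]
        out[:length] = averages + details  # keep coarser part in front
        length = half
    return out


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical records with four numeric attributes each.
records = [
    [52.0, 61.5, 3.0, 120.0],
    [47.0, 58.0, 1.0, 98.5],
    [35.0, 72.25, 4.0, 143.0],
]
transformed = [haar_transform(r) for r in records]

# Pairwise distances before and after the transform agree up to rounding.
for i, j in combinations(range(len(records)), 2):
    before = euclidean(records[i], records[j])
    after = euclidean(transformed[i], transformed[j])
    print(f"pair ({i}, {j}): before={before:.6f} after={after:.6f}")
```

Because the transform is linear and orthonormal, the difference of two transformed records is the transform of their difference, so its length is unchanged. A distance-based algorithm such as k-means would therefore see the same pairwise distances on the transformed records as on the originals.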

Keywords: privacy preservation; privacy protection; data perturbation; wavelet transforms; data mining; data transformation; distance-based mining; signal transformation functions; Euclidean distance.

DOI: 10.1504/IJDMMM.2014.065148

International Journal of Data Mining, Modelling and Management, 2014 Vol.6 No.3, pp.285 - 311

Published online: 23 Oct 2014
