Authors: Balasubramaniam Jayaram; Frank Klawonn
Addresses: Department of Mathematics, Indian Institute of Technology Hyderabad, Yeddumailaram – 502 205, India. ' Department of Computer Science, Ostfalia University of Applied Sciences, D-38302 Wolfenbuettel, Germany; Bioinformatics and Statistics, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany
Abstract: In this work, we revisit the curse of dimensionality, especially the concentration of the norm phenomenon which is the inability of distance functions to separate points well in high dimensions. We study the influence of the different properties of a distance measure, viz., triangle inequality, boundedness and translation invariance and on this phenomenon. Our studies indicate that unbounded distance measures whose expectations do not exist are to be preferred. We propose some new distance measures based on our studies and present many experimental results which seem to confirm our analysis. In particular, we study these distance measures w.r.t. indices like relative variance and relative contrast and further compare and contrast these measures in the setting of nearest neighbour/proximity searches and hierarchical clustering.
Keywords: curse of dimensionality; CoD; nearest neighbour classifier; cluster analysis; unbounded distance measures; triangle inequality; boundedness; translation invariance; classification; clustering.
International Journal of Data Mining, Modelling and Management, 2012 Vol.4 No.4, pp.361 - 383
Received: 08 May 2021
Accepted: 12 May 2021
Published online: 18 Oct 2012 *