Title: Can unbounded distance measures mitigate the curse of dimensionality?

Authors: Balasubramaniam Jayaram; Frank Klawonn

Addresses: Department of Mathematics, Indian Institute of Technology Hyderabad, Yeddumailaram – 502 205, India. ' Department of Computer Science, Ostfalia University of Applied Sciences, D-38302 Wolfenbuettel, Germany; Bioinformatics and Statistics, Helmholtz Centre for Infection Research, D-38124 Braunschweig, Germany

Abstract: In this work, we revisit the curse of dimensionality, especially the concentration of the norm phenomenon which is the inability of distance functions to separate points well in high dimensions. We study the influence of the different properties of a distance measure, viz., triangle inequality, boundedness and translation invariance and on this phenomenon. Our studies indicate that unbounded distance measures whose expectations do not exist are to be preferred. We propose some new distance measures based on our studies and present many experimental results which seem to confirm our analysis. In particular, we study these distance measures w.r.t. indices like relative variance and relative contrast and further compare and contrast these measures in the setting of nearest neighbour/proximity searches and hierarchical clustering.

Keywords: curse of dimensionality; CoD; nearest neighbour classifier; cluster analysis; unbounded distance measures; triangle inequality; boundedness; translation invariance; classification; clustering.

DOI: 10.1504/IJDMMM.2012.049883

International Journal of Data Mining, Modelling and Management, 2012 Vol.4 No.4, pp.361 - 383

Published online: 23 Aug 2014 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article