Title: An ensemble-clustering-based distance metric and its applications

Authors: Loai AbdAllah; Ilan Shimshoni

Addresses: Department of Mathematics, University of Haifa, Haifa, 31905, Israel; Department of Mathematics and Computer Science, The College of Sakhnin, Sakhnin, B.O. 100, ZIP:20173, Israel | Department of Information Systems, University of Haifa, Haifa, 31905, Israel

Abstract: A distance metric learned from data reflects the actual similarity between objects better than the geometric distance. In this paper, we propose a new distance based on clustering, because objects belonging to the same cluster usually share common traits even when their geometric distance is large. We perform several clustering runs to yield an ensemble of clustering results, and define the distance between two objects as the number of times they were not clustered together. To evaluate the ability of this new distance to reflect object similarity, we apply it to two types of data mining algorithms: classification (kNN) and selective sampling (LSS). We experimented on standard numerical datasets and on real colour images. Using our distance, the algorithms run on equivalence classes instead of single objects, yielding a considerable speedup. We compared the resulting kNN-EC classifier and LSS-EC algorithm to the original kNN and LSS algorithms.
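
The distance described in the abstract can be illustrated with a minimal sketch. It assumes the ensemble is already available as one list of cluster labels per clustering run; the abstract does not say which clustering algorithm or parameters produce the runs, so the label lists below are made up. The function names `ensemble_distance` and `equivalence_classes` are hypothetical, and treating objects at distance zero as one equivalence class is an assumption about how the classes mentioned in the abstract are formed.

```python
from itertools import combinations


def ensemble_distance(labelings):
    """Count, for every pair of objects, the runs that separated them.

    labelings: list of clustering runs; each run is a list holding one
    cluster label per object (same object order in every run).
    Returns a symmetric n x n matrix as a list of lists.
    """
    n = len(labelings[0])
    dist = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] != labels[j]:  # not clustered together in this run
                dist[i][j] += 1
                dist[j][i] += 1
    return dist


def equivalence_classes(labelings):
    """Group objects that share a cluster in every run (distance 0).

    Assumption: this is how the equivalence classes are formed.
    """
    groups = {}
    for index, signature in enumerate(zip(*labelings)):
        groups.setdefault(signature, []).append(index)
    return list(groups.values())


# Three made-up clustering runs over five objects.
runs = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
dist = ensemble_distance(runs)
classes = equivalence_classes(runs)
```

In this toy ensemble, objects 0 and 1 are never separated, so their distance is 0 and they fall into the same class, while objects 0 and 4 are separated in all three runs and get the maximum distance of 3. A classifier working on the classes rather than on single objects would then handle four items instead of five.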

Keywords: kNN classification; unsupervised learning; distance metric; ensemble clustering; k nearest neighbour; data mining; selective sampling; object similarity; numerical datasets; colour images.

DOI: 10.1504/IJBIDM.2013.059052

International Journal of Business Intelligence and Data Mining, 2013 Vol.8 No.3, pp.264 - 287

Received: 13 Nov 2013
Accepted: 15 Nov 2013

Published online: 28 Jun 2014
