Title: FastMap in dimensionality reduction: ensemble clustering of high dimensional data

Authors: Imran Khan; Joshua Z. Huang

Addresses: Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China ' Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

Abstract: We propose an ensemble clustering method for high dimensional data that uses FastMap projection (FP) to generate component datasets. Compared with other subspace component data generation methods, namely random sampling (RS), random projection (RP) and principal component analysis (PCA), FP better preserves the clustering structure of the original data in the component datasets, which significantly improves ensemble clustering performance. Experimental results on six real-world high dimensional datasets show that component datasets generated with FP preserve the clustering structure of the original data better than those generated with RS, RP and PCA. We also evaluate 12 ensemble clustering methods, formed by combining the four component data generation methods with three consensus functions; the methods using FP outperform those using RS, RP and PCA. Ensemble clustering with FP also performs better than the k-means clustering algorithm.
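
As an illustration of the projection step named in the abstract, the following is a minimal sketch of the standard FastMap procedure for Euclidean data, written in plain Python. It is not taken from the paper: the function name `fastmap`, the `seed` parameter and the farthest-point pivot heuristic are assumptions made for this example, and the paper's own way of generating component datasets may differ.

```python
import math
import random


def fastmap(points, k, seed=0):
    """Sketch of FastMap: map `points` (equal-length numeric lists) to k dims.

    Hypothetical helper for illustration only; assumes Euclidean distances.
    """
    rng = random.Random(seed)
    n = len(points)
    coords = [[0.0] * k for _ in range(n)]

    def dist2(i, j, col):
        # Squared distance between objects i and j in the residual space,
        # i.e. after removing the first `col` projected coordinates.
        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        d2 -= sum((coords[i][c] - coords[j][c]) ** 2 for c in range(col))
        return max(d2, 0.0)  # clamp small negative rounding errors

    for col in range(k):
        # Pivot heuristic (assumed): start at a random object, then move to
        # the farthest object twice to obtain a distant pivot pair (a, b).
        b = rng.randrange(n)
        a = max(range(n), key=lambda i: dist2(i, b, col))
        b = max(range(n), key=lambda i: dist2(i, a, col))
        dab2 = dist2(a, b, col)
        if dab2 < 1e-12:
            break  # no residual spread left; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][col] = (dist2(a, i, col) + dab2 - dist2(b, i, col)) / (2 * dab)
    return coords
```

Under these assumptions, calling `fastmap(points, k, seed=s)` with different values of `s` can select different pivot pairs, which is one possible way to obtain several distinct low dimensional component datasets from the same data.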

Keywords: ensemble clustering; FastMap; random sampling; random projection; PCA; principal component analysis; dimensionality reduction; high dimensional data; k-means clustering.

DOI: 10.1504/IJDS.2017.082743

International Journal of Data Science, 2017 Vol.2 No.1, pp.15 - 28

Received: 08 Apr 2014
Accepted: 11 Sep 2014

Published online: 10 Mar 2017
