Authors: Imran Khan; Joshua Z. Huang
Addresses: (1) Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China (2) Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
Abstract: In this paper, we propose an ensemble clustering method for high-dimensional data that uses FastMap projection (FP) to generate component datasets. Compared with subspace component data generation methods such as random sampling (RS), random projection (RP) and principal component analysis (PCA), FP better preserves the clustering structure of the original data in the component datasets, so the performance of ensemble clustering can be improved significantly. We present experimental results on six real-world high-dimensional datasets which demonstrate that component datasets generated with FastMap preserve the clustering structure of the original data better than those generated with RS, RP and PCA. Experimental results for 12 ensemble clustering methods, formed by combining the four subspace component data generation methods with three consensus functions, also show that the methods using FastMap outperformed those using RS, RP and PCA. Ensemble clustering with FastMap also performed better than the k-means clustering algorithm.
Keywords: ensemble clustering; FastMap; random sampling; random projection; PCA; principal component analysis; dimensionality reduction; high dimensional data; k-means clustering.
International Journal of Data Science, 2017, Vol. 2, No. 1, pp. 15-28
Available online: 10 Mar 2017