Authors: Dharmveer Singh Rajput; Pramod Kumar Singh; Mahua Bhattacharya
Addresses: ABV, Indian Institute of Information Technology and Management Gwalior, Morena Link Road, Gwalior – 474015, Madhya Pradesh, India; ABV, Indian Institute of Information Technology and Management Gwalior, Morena Link Road, Gwalior – 474015, Madhya Pradesh, India; ABV, Indian Institute of Information Technology and Management Gwalior, Morena Link Road, Gwalior – 474015, Madhya Pradesh, India
Abstract: Clustering is the process of partitioning data objects into groups according to a similarity or dissimilarity measure, e.g., a distance criterion. In a high dimensional dataset, however, the objects are almost equidistant from one another, so the distance criterion becomes meaningless and fails to group them. Numerous clustering algorithms for high dimensional datasets have been presented in the literature; they select the relevant dimensions and cluster the objects on the selected dimensions only. Because these algorithms produce different clustering results on the same dataset, it is unclear which algorithm to select for better clustering of a high dimensional dataset. In this paper, we present a comparative study of conventional feature selection based clustering algorithms and propose a new feature selection based clustering method, IQRAM (inter quartile range and median based clustering of high dimensional dataset). We perform our experiments on two real datasets and analyse the clustering results using five well-known clustering quality measures and Student's t-test. The qualitative results show that IQRAM outperforms ten competitive clustering algorithms.
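The near-equidistance of objects in high dimensions can be illustrated with a small experiment. The sketch below is not part of the paper and does not implement IQRAM; it assumes uniformly random points in the unit hypercube, and the function name `relative_contrast` and its parameters are hypothetical. It measures how far the farthest point is from a query relative to the nearest one, a ratio that tends to shrink as the number of dimensions grows.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Assumption: uniform random data, chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Print the contrast for an increasing number of dimensions.
    for n_dims in (2, 10, 100, 1000):
        print(n_dims, round(relative_contrast(200, n_dims), 3))
```

A small contrast means the nearest and farthest objects are at almost the same distance, which is the situation in which a distance criterion stops separating groups and selecting relevant dimensions becomes attractive.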
Keywords: data clustering; high dimensional datasets; dimension reduction; feature extraction; feature selection; data mining; clustering algorithms.
International Journal of Knowledge Engineering and Data Mining, 2012 Vol.2 No.2/3, pp.117 - 136
Received: 08 May 2021
Accepted: 12 May 2021
Published online: 28 Dec 2012