Authors: Prakash Kumar, B.K. Tripathy
Addresses: School of Computing Sciences, VIT University, Vellore-632 014, Tamilnadu, India; School of Computing Sciences, VIT University, Vellore-632 014, Tamilnadu, India
Abstract: Several cluster analysis techniques have been developed to group objects having similar characteristics. Clustering categorical data is more challenging than clustering numerical data. Most of the early cluster analysis techniques face problems because much of the data contained in today's databases is categorical in nature. This necessitated the development of algorithms for clustering categorical data. Uncertainty is an integral part of databases. The algorithms put forth either lack the capability to handle uncertainty or do not reach a steady state in a few iterations, which gives rise to stability issues. Recently, an algorithm termed MMR was proposed (Parmar et al., 2007), which uses rough set theory to deal with the above problems in clustering categorical data. In this paper, we modify MMR to develop an improved algorithm, which we call MMeR. It handles numerical and categorical data simultaneously, in addition to handling uncertainty. This new algorithm also provides much better performance than most of the existing algorithms, including MMR. Some well-known data sets are used to test and illustrate the superiority of MMeR over most of the existing algorithms.
Keywords: clustering; min-min-roughness; MMR; MMeR; rough sets; heterogeneous data; purity; cluster analysis; uncertainty.
International Journal of Rapid Manufacturing, 2009 Vol.1 No.2, pp.189 - 207
Available online: 28 Nov 2009