Title: Data clustering using modified k-medoids algorithm

Authors: T. Geetha; Michael Arock

Addresses: Department of Computer Applications, National Institute of Technology, Trichirappalli – 620015, Tamil Nadu, India; Department of Computer Applications, National Institute of Technology, Trichirappalli – 620015, Tamil Nadu, India

Abstract: This paper proposes a modified k-medoids algorithm for data clustering that improves the selection of initial medoids and changes how medoids are updated. The records in a dataset are divided into k groups. An initial medoid is selected from each group, and each updated medoid is likewise selected from among the objects in its group, instead of trying every object in the dataset as a replacement one by one. The algorithm has two salient features: 1) it avoids treating every object in the dataset as a candidate medoid; 2) it calculates the distance matrix only once, which avoids rescanning a large database on every iteration. Both features reduce execution time. The algorithm is applied to seven datasets covering synthetic, gene expression and medical data, including two gene expression datasets and two medical datasets. The outcomes are validated using measures such as the Rand Index and the FM Index. Experiments show that the proposed algorithm runs fast and finds better results than existing algorithms.
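The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how records are divided into k groups or how a medoid is chosen within a group, so the sketch assumes contiguous chunks in record order, Euclidean distance, and the member with the smallest total distance to its group. All function names are hypothetical. The distance matrix is built once and reused, and each medoid update searches only the members of its own group. A helper also computes the Rand Index and the FM (Fowlkes-Mallows) Index from pair counts.

```python
import math
from itertools import combinations

def distance_matrix(points):
    """Pairwise Euclidean distances, computed once and reused afterwards."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d

def best_medoid(members, d):
    """ASSUMPTION: the medoid is the member with the smallest total
    distance to the other members of the same group."""
    return min(members, key=lambda c: sum(d[c][m] for m in members))

def assign(medoids, d):
    """Label every record with the index of its nearest medoid."""
    k = len(medoids)
    return [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(len(d))]

def modified_k_medoids(points, k, max_iter=100):
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of records")
    d = distance_matrix(points)
    # ASSUMPTION: the initial k groups are contiguous chunks in record order.
    groups = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    medoids = [best_medoid(g, d) for g in groups]
    for _ in range(max_iter):
        labels = assign(medoids, d)
        groups = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Candidates for each update come only from that group's own members;
        # an empty group keeps its previous medoid.
        updated = [best_medoid(g, d) if g else medoids[c]
                   for c, g in enumerate(groups)]
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids, d)

def pair_indices(pred, truth):
    """Rand Index and FM Index from pairwise agreement counts."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred, same_true = pred[i] == pred[j], truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    rand = (tp + tn) / (tp + fp + fn + tn)
    fm = tp / math.sqrt((tp + fp) * (tp + fn)) if tp else 0.0
    return rand, fm

if __name__ == "__main__":
    # Toy data (not from the paper): two well-separated groups, interleaved.
    pts = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    medoids, labels = modified_k_medoids(pts, k=2)
    print(medoids, labels, pair_indices(labels, list("ababab")))
```

Because `best_medoid` only sums entries of the precomputed matrix, no iteration rescans the raw records. Different grouping or medoid-selection rules could be swapped in without changing the rest of the loop.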

Keywords: data clustering; k-medoids; cluster validation; Rand Index; FM Index; gene expression datasets; medical datasets; medoids selection.

DOI: 10.1504/IJMEI.2012.046988

International Journal of Medical Engineering and Informatics, 2012 Vol.4 No.2, pp.109 - 124

Published online: 11 Aug 2014
