Title: Factors influencing K means algorithm

Authors: Shejuti Khan; S.M. Monzurur Rahman; M. Faysal Tanim; Fizar Ahmed

Addresses: Department of Computer Science and Engineering, United International University, Road # 8/A, Dhaka-1209, Bangladesh (all four authors)

Abstract: Clustering is an unsupervised learning technique, and K-means is one of the most popular clustering algorithms. K-means requires the number of clusters, k, to be specified in advance. Finding the appropriate number of clusters for a dataset is a trial-and-error process, made more difficult by the subjective nature of deciding what constitutes 'correct' clustering (Han and Kamber, 2000). The aim of K-means is to group the items into k clusters such that items in the same cluster are as similar to each other as possible and items in different clusters are as dissimilar as possible. Different distance measures can be used to calculate similarity. Improving the performance of K-means is therefore useful, because it leads to better clustering. Such improvement depends on factors that need to be explored and measured experimentally. This paper does so, studying and identifying five factors that influence the performance of K-means.
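The abstract does not give the authors' implementation. What follows is a minimal sketch of standard Lloyd-style K-means in plain Python, under several assumptions. Items are numeric feature tuples. Initial centroids are k distinct items drawn at random. Each centroid is updated to the arithmetic mean of its members. The `distance` parameter and the helper names (`euclidean`, `manhattan`, `kmeans`) are illustrative and are not taken from the paper. They show how the choice of distance measure plugs into the assignment step.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def _assign(points: List[Point], centroids: List[Point], distance: Distance) -> List[int]:
    # Assignment step: each point joins the cluster of its nearest centroid
    # (ties go to the lowest cluster index).
    return [min(range(len(centroids)), key=lambda c: distance(p, centroids[c])) for p in points]


def kmeans(points: List[Point], k: int, distance: Distance = euclidean,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical Lloyd-style K-means sketch; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = _assign(points, centroids, distance)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        # Converged: centroids no longer move, so assignments are stable too.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, _assign(points, centroids, distance)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for dist in (euclidean, manhattan):
        cents, labs = kmeans(data, 2, distance=dist)
        print(dist.__name__, labs, sorted(cents))
```

The mean update minimises squared Euclidean distance. With other measures, such as Manhattan distance, it is only a heuristic, and a median-based update would be the matching choice. Because k is a required input, trying several values of k on the same data is the trial-and-error process the abstract describes.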

Keywords: K-means clustering; influencing factors; convergence; distance equation; encoding; normalisation; unit vector; performance improvement.

DOI: 10.1504/IJCSYSE.2013.057212

International Journal of Computational Systems Engineering, 2013 Vol.1 No.4, pp.217 - 228

Published online: 26 Jul 2014
