Title: Optimisation of K-means algorithm based on sample density canopy

Authors: Guo-xin Shen; Zhong-yun Jiang

Addresses: Information Institute, Shanghai Ocean University, Shanghai 210306, China; Information Technology Institute, Shanghai Jianqiao University, Shanghai 210306, China

Abstract: Because random selection of the initial centroids and manual specification of the number of clusters both affect the results of K-means, this article uses sample density and the canopy algorithm to optimise K-means. The algorithm first calculates the sample density of each data point and selects the point with the smallest density as the first cluster centroid; it then applies the canopy algorithm to the original sample data to obtain the number of clusters and the centre of each cluster. These values serve as the initial parameters of K-means, which is then used to cluster the original samples. Simulation experiments on UCI and self-built datasets were used for comparison. The results show that the algorithm produces more accurate clustering results, runs faster, and is more stable.
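
The pipeline outlined in the abstract can be sketched roughly as follows. This is an illustrative reading only, not the authors' implementation: the abstract does not define "sample density", the canopy thresholds, or how centres after the first are chosen. The sketch assumes that density is the number of neighbours within a hypothetical `radius`, that a single hypothetical canopy threshold `t` is used, and that each later centre is the lowest-density point still uncovered. The function names are invented for illustration.

```python
import numpy as np

def sample_density(X, radius):
    """Assumed density: number of other points within `radius` of each point."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return (d <= radius).sum(axis=1) - 1  # subtract 1 to exclude the point itself

def canopy_centres(X, radius, t):
    """Single-threshold canopy pass seeded by density (assumed variant)."""
    density = sample_density(X, radius)
    remaining = np.ones(len(X), dtype=bool)
    centres = []
    # The abstract seeds with the smallest-density point; that wording is followed here.
    idx = int(np.argmin(density))
    while remaining.any():
        centres.append(X[idx])
        dist = np.linalg.norm(X - X[idx], axis=1)
        remaining &= dist > t  # drop every point covered by this canopy
        if remaining.any():
            # Assumption: the next centre is the lowest-density uncovered point.
            cand = np.flatnonzero(remaining)
            idx = int(cand[np.argmin(density[cand])])
    return np.array(centres)

def kmeans(X, centres, n_iter=100):
    """Plain Lloyd iterations started from the supplied centres."""
    centres = centres.astype(float).copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
            for k in range(len(centres))
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

# Usage on synthetic data: three well-separated groups of points.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([m + rng.uniform(-1, 1, size=(30, 2)) for m in means])
init = canopy_centres(X, radius=1.0, t=5.0)  # number of centres found = k
labels, centres = kmeans(X, init)
```

In this sketch the canopy pass supplies both the number of clusters (the number of rows in `init`) and the starting centres, so K-means needs neither a user-chosen k nor random initialisation. The values of `radius` and `t` are picked by hand for the synthetic data and would need tuning on real datasets.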

Keywords: clustering; K-means algorithm; density; neighbourhood; initial centroid.

DOI: 10.1504/IJAHUC.2021.119087

International Journal of Ad Hoc and Ubiquitous Computing, 2021 Vol.38 No.1/2/3, pp.62 - 69

Received: 03 Sep 2020
Accepted: 07 Jan 2021

Published online: 22 Nov 2021
