Authors: Guo-xin Shen; Zhong-yun Jiang
Addresses: Information Institute, Shanghai Ocean University, Shanghai 210306, China; Information Technology Institute, Shanghai Jianqiao University, Shanghai 210306, China
Abstract: Because K-means results depend on the random selection of the initial centroids and on a manually specified number of clusters, this article uses sample density and the canopy algorithm to optimise K-means. The algorithm first calculates the sample density of each data point and selects the point with the smallest density as the first cluster centroid. It then applies the canopy algorithm to the original sample data to obtain the number of clusters and the cluster centres. Finally, these are used as the initial parameters of K-means, which clusters the original samples. Simulation experiments on UCI and self-built datasets were used for comparison. The results show that the algorithm produces more accurate clustering results, runs faster, and is more stable.
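The three stages named in the abstract (density-based first centroid, canopy to fix the cluster count and centres, then K-means) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The abstract does not define sample density, so the code assumes a mean-distance score, under which the smallest value marks the most crowded point. The single canopy threshold `t` and the function names `density_scores`, `canopy_centers` and `kmeans` are likewise hypothetical.

```python
import numpy as np

def density_scores(X):
    """Return (scores, pairwise distances).

    ASSUMPTION: 'density' is taken as the mean Euclidean distance from a
    point to every other point; the abstract does not give the definition.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return d.sum(axis=1) / (len(X) - 1), d

def canopy_centers(X, t):
    """Simplified single-threshold canopy pass (threshold t is hypothetical).

    Repeatedly pick the remaining point with the smallest score as a centre,
    then discard every remaining point within distance t of it.
    """
    scores, d = density_scores(X)
    remaining = np.ones(len(X), dtype=bool)
    chosen = []
    while remaining.any():
        idx = np.flatnonzero(remaining)
        c = idx[np.argmin(scores[idx])]
        chosen.append(c)
        remaining &= d[c] > t   # drop points covered by this canopy
        remaining[c] = False    # always drop the centre itself
    return X[chosen]

def kmeans(X, centers, max_iter=100):
    """Plain Lloyd iterations started from the supplied centres."""
    centers = np.asarray(centers, dtype=float).copy()

    def assign(c):
        return np.argmin(np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2), axis=1)

    for _ in range(max_iter):
        labels = assign(centers)
        # Keep the old centre if a cluster happens to become empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(centers), centers
```

Calling `canopy_centers(X, t)` yields both the number of clusters (the number of rows returned) and the starting centres, which are then passed to `kmeans(X, centers)`. The outcome is sensitive to `t`, which has to be chosen for the data at hand.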
Keywords: clustering; K-means algorithm; density; neighbourhood; initial centroid.
International Journal of Ad Hoc and Ubiquitous Computing, 2021 Vol.38 No.1/2/3, pp.62 - 69
Received: 03 Sep 2020
Accepted: 07 Jan 2021
Published online: 22 Nov 2021