
Title: Optimisation of K-means algorithm based on sample density canopy

Authors: Guo-xin Shen; Zhong-yun Jiang

Addresses: Information Institute, Shanghai Ocean University, Shanghai 210306, China; Information Technology Institute, Shanghai Jianqiao University, Shanghai 210306, China

Abstract: The random selection of initial centroids and the manual choice of the number of clusters both affect the results of K-means, so this article uses sample density and the canopy algorithm to optimise K-means. The algorithm first calculates the sample density of each data point and selects the point with the smallest density as the first cluster centroid. It then applies the canopy algorithm to the original sample data to obtain the number of clusters and the centre of each cluster. These values serve as the initial parameters of K-means, which then clusters the original samples. Comparative simulation experiments were run on UCI and self-built datasets. The results show that the algorithm produces more accurate clustering results, runs faster, and is more stable.
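
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the density definition or the canopy thresholds, so the sketch assumes that density is the number of other points within a fixed `radius`, that a single threshold (the same `radius`) defines each canopy, and that each subsequent centre is the remaining point of smallest density. The function names `density_canopy_centres` and `kmeans` are hypothetical.

```python
import math


def density_canopy_centres(points, radius):
    """Pick initial centres with a density-guided, canopy-style pass.

    Assumptions (not taken from the paper): density is the count of other
    points within `radius`, and one threshold is used for the canopies.
    """
    n = len(points)
    density = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]
    remaining = set(range(n))
    centres = []
    while remaining:
        # Per the abstract, start from the point of smallest density;
        # ties are broken by index (an assumption).
        c = min(remaining, key=lambda i: (density[i], i))
        centres.append(points[c])
        # Remove every point that falls inside this canopy (including c).
        remaining -= {i for i in remaining
                      if math.dist(points[c], points[i]) <= radius}
    return centres


def kmeans(points, centres, max_iter=100):
    """Standard Lloyd iterations started from the supplied centres."""
    centres = [tuple(c) for c in centres]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centre for each point.
        labels = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        # Update step: mean of each cluster; empty clusters keep their centre.
        new_centres = []
        for k, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centres.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centres.append(old)
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels
```

Calling `density_canopy_centres(points, radius)` yields both the number of clusters (the length of the returned list) and the starting centres, which are then passed to `kmeans(points, centres)`. Because this sketch starts from low-density points, isolated outliers tend to become centres of their own, so the choice of `radius` matters in practice.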

Keywords: clustering; K-means algorithm; density; neighbourhood; initial centroid.

DOI: 10.1504/IJAHUC.2021.119087

International Journal of Ad Hoc and Ubiquitous Computing, 2021 Vol.38 No.1/2/3, pp.62 - 69

Received: 03 Sep 2020
Accepted: 07 Jan 2021
Published online: 22 Nov 2021
