Title: The FRCK clustering algorithm for determining cluster number and removing outliers automatically

Authors: Yubin Guo; Yuhang Wu; Xiaopeng Zhang; Aofeng Bo; Ximing Li

Addresses: South China Agricultural University, Guangzhou 510642, China; South China Agricultural University, Guangzhou 510642, China; South China Agricultural University, Guangzhou 510642, China; Guangzhou HolandAI Technology Co., Ltd., Guangzhou 510006, China; South China Agricultural University, Guangzhou 510642, China

Abstract: Clustering is one of the most popular unsupervised approaches to grouping data. The K-means algorithm is widely used for its simplicity, ease of implementation and efficiency. However, the optimal number of clusters for K-means is difficult to predict, and the algorithm is sensitive to outliers. In this paper, we divide outliers into two types and propose a clustering algorithm that removes both types of outliers and calculates the optimal cluster number in each clustering iteration. The algorithm is a fusion of rough clustering and K-means, abbreviated as the FRCK algorithm. Because FRCK removes outliers precisely, the estimated cluster number is more accurate and the quality of the clustering result improves accordingly. Experiments demonstrate the effectiveness of the algorithm.
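The abstract describes only the overall loop: in each clustering iteration, outliers are removed and the cluster number is re-estimated. The Python sketch below illustrates that control flow under stated assumptions; it is not the FRCK algorithm. It omits the rough-clustering component entirely, and its outlier rule (drop clusters smaller than a hypothetical min_size, and points whose distance to their centroid exceeds the mean plus z standard deviations) and its silhouette criterion for choosing k are stand-ins chosen for illustration, not the paper's two outlier types or its cluster-number criterion. All function names and parameters are hypothetical.

```python
import math
from collections import Counter


def assign(points, centroids):
    """Index of the nearest centroid for each point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means. Assumption for reproducibility: start from k evenly
    spaced points of the lexicographically sorted data instead of random seeds."""
    ordered = sorted(points)
    step = len(ordered) / k
    centroids = [ordered[int(step * i + step / 2)] for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # An empty cluster keeps its previous centroid.
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


def silhouette(points, labels):
    """Mean silhouette width; singletons score 0, a single cluster scores -1."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    if len(groups) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = groups[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # p itself adds 0
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for m, g in groups.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_range):
    """Hypothetical criterion: the candidate k with the highest mean silhouette."""
    candidates = [k for k in k_range if 2 <= k < len(points)]
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)[1]))


def remove_outliers(points, centroids, labels, min_size=2, z=2.0):
    """Hypothetical rule: drop members of clusters smaller than min_size and
    points whose centroid distance exceeds mean + z * std of all such distances."""
    sizes = Counter(labels)
    d = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    mu = sum(d) / len(d)
    sd = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
    return [p for p, l, x in zip(points, labels, d)
            if sizes[l] >= min_size and x <= mu + z * sd]


def fit_with_outlier_removal(points, k_range=range(2, 5), rounds=5, **rule):
    """Alternate k selection, K-means and outlier removal until nothing is dropped."""
    data = list(points)
    for _ in range(rounds):
        k = choose_k(data, k_range)
        centroids, labels = kmeans(data, k)
        kept = remove_outliers(data, centroids, labels, **rule)
        if len(kept) == len(data):
            break
        data = kept
    else:  # rounds exhausted: re-cluster the last reduced data set
        k = choose_k(data, k_range)
        centroids, labels = kmeans(data, k)
    return k, centroids, labels, data


if __name__ == "__main__":
    # Toy data: three unit squares 20 apart on the x axis plus one far point.
    square = [(0, 0), (0, 1), (1, 0), (1, 1)]
    data = [(x + dx, y) for dx in (0, 20, 40) for x, y in square] + [(100, 100)]
    k, centroids, labels, kept = fit_with_outlier_removal(data)
    print(k, sorted(centroids), len(data) - len(kept), "point(s) removed")
```

On this toy data the first round should isolate the far point in a cluster of its own, which the size rule then drops, and the second round should settle on three clusters with nothing further removed. The size rule is there because a single distant point tends to capture its own centroid, where a pure distance test would never flag it. The deterministic start keeps the sketch reproducible; it is not a recommendation over randomized seeding with restarts.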

Keywords: clustering; K-means clustering algorithm; optimal cluster number; outlier.

DOI: 10.1504/IJCSE.2021.118097

International Journal of Computational Science and Engineering, 2021 Vol.24 No.5, pp.485-494

Received: 15 Jun 2020
Accepted: 07 Jan 2021

Published online: 12 Oct 2021
