Title: A novel centroids initialisation for K-means clustering in the presence of benign outliers

Authors: Amin Karami; Shafiq Urréhman; Mustansar Ali Ghazanfar

Addresses: Department of Architecture, Computing and Engineering (ACE), University of East London (UEL), Docklands Campus, UK; China Euro Vehicle Technology AB (CEVT), Theres Svenssons Gata 7, SE-41755 Göteborg, Sweden; Department of Architecture, Computing and Engineering (ACE), University of East London (UEL), Docklands Campus, UK

Abstract: K-means is one of the most important and widely applied clustering algorithms in learning systems. However, it is sensitive to the initialisation of its centroids, which makes the algorithm unstable. The performance and stability of K-means may also degrade when benign outliers (i.e., long-term independent data points) are present in the data. In this paper, we develop a novel algorithm to optimise K-means performance in the presence of benign outliers. We first identify the benign outliers and run K-means over them; K-means is then run over all data points to relocate the clusters' centroids, yielding high accuracy. Experimental results on several benchmark and synthetic datasets confirm that the proposed method significantly outperforms some existing approaches in accuracy on the applied performance metrics.
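The two-stage idea in the abstract (cluster the benign outliers first, then run K-means over all points) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how benign outliers are detected or how the first-stage K-means is seeded, so the k-nearest-neighbour distance detector (`knn_outlier_mask`) and farthest-first seeding (`farthest_first`) are hypothetical stand-ins. Passing the stage-one centroids to stage two as its initial centroids is likewise an assumption, suggested by the paper's title.

```python
import numpy as np


def lloyd_kmeans(X, init_centroids, n_iter=100, tol=1e-8):
    """Standard Lloyd K-means iterations starting from the given centroids."""
    C = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its members; empty clusters stay put.
        new_C = C.copy()
        for j in range(C.shape[0]):
            members = X[labels == j]
            if len(members) > 0:
                new_C[j] = members.mean(axis=0)
        shift = np.abs(new_C - C).max()
        C = new_C
        if shift < tol:
            break
    # Final assignment consistent with the returned centroids.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return C, d2.argmin(axis=1)


def knn_outlier_mask(X, n_neighbors=5, quantile=0.9):
    """HYPOTHETICAL detector: flag points whose distance to their
    n_neighbors-th nearest neighbour lies in the top (1 - quantile) fraction."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # After sorting each row, column 0 is the point's zero distance to itself.
    kth = np.sort(D, axis=1)[:, n_neighbors]
    return kth >= np.quantile(kth, quantile)


def farthest_first(X, k):
    """HYPOTHETICAL seeding: deterministic farthest-first choice of k points."""
    idx = [int(np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1)))]
    for _ in range(1, k):
        # Distance from every point to its closest already-chosen seed.
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d2)))
    return X[idx]


def outlier_seeded_kmeans(X, k, n_neighbors=5, quantile=0.9):
    """Stage 1: K-means on flagged outliers. Stage 2: K-means on all points,
    initialised (by assumption) with the stage-1 centroids."""
    outliers = X[knn_outlier_mask(X, n_neighbors, quantile)]
    if len(outliers) < k:
        outliers = X  # too few outliers to seed k clusters; fall back to all data
    seed_C, _ = lloyd_kmeans(outliers, farthest_first(outliers, k))
    return lloyd_kmeans(X, seed_C)
```

For example, `outlier_seeded_kmeans(X, 3)` on a `(150, 2)` array returns a `(3, 2)` centroid array and 150 labels. The intuition behind seeding from outliers is that points on the periphery of the data tend to be spread out, so centroids fitted to them start far apart rather than collapsing into one dense region.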

Keywords: clustering; K-means; centroid initialisation; benign outlier.

DOI: 10.1504/IJDATS.2020.111498

International Journal of Data Analysis Techniques and Strategies, 2020, Vol. 12, No. 4, pp. 287-298

Accepted: 12 Dec 2019
Published online: 30 Nov 2020
