Title: An empirical evaluation of strategies based on the triangle inequality for accelerating the k-means algorithm

Authors: Marcelo Kuchar Matte; Maria do Carmo Nicoletti

Addresses: Centro Universitário Campo Limpo Paulista – PMCC, Rua Guatemala 167, 13231-230 C. L. Paulista, SP, Brazil; Instituto Federal de Mato Grosso do Sul – IFMS, Rodovia BR-060 S/N, CEP 79.240-000 Jardim, MS, Brazil ' Centro Universitário Campo Limpo Paulista – PMCC, Rua Guatemala 167, 13231-230 C. L. Paulista, SP, Brazil

Abstract: The k-means clustering algorithm has a long history of success in a wide range of applications in many different research areas. Part of its success is due to both, the simplicity of the algorithm, which helps its quick implementation and the good results it produces. Despite success, however, the original k-means has some shortcomings. One of them relates to the processing time required for the algorithm to finish the iterative process that, given a set of data instances, and an integer value k, induces a clustering having k clusters of the given data instances. This article presents an empirical evaluation of three strategies found in the literature that employ the triangle inequality, with the purpose of accelerating the k-means processing time. Experiments were conducted using two groups of datasets, seven real datasets and ten artificially created datasets. Besides empirically evaluating the impact of variables involved in clustering processes that can interfere with accelerating processes, the article also discusses the different ways the triangle inequality concept is employed by the strategies.

Keywords: k-means; optimisation; triangular inequality; clustering; machine learning; acceleration strategies.

DOI: 10.1504/IJICA.2022.125656

International Journal of Innovative Computing and Applications, 2022 Vol.13 No.4, pp.198 - 209

Received: 21 Apr 2020
Accepted: 10 Sep 2020

Published online: 26 Sep 2022 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article