Title: An empirical evaluation of strategies based on the triangle inequality for accelerating the k-means algorithm
Authors: Marcelo Kuchar Matte; Maria do Carmo Nicoletti
Addresses: Centro Universitário Campo Limpo Paulista – PMCC, Rua Guatemala 167, 13231-230 C. L. Paulista, SP, Brazil; Instituto Federal de Mato Grosso do Sul – IFMS, Rodovia BR-060 S/N, CEP 79.240-000 Jardim, MS, Brazil ' Centro Universitário Campo Limpo Paulista – PMCC, Rua Guatemala 167, 13231-230 C. L. Paulista, SP, Brazil
Abstract: The k-means clustering algorithm has a long history of success in a wide range of applications in many different research areas. Part of its success is due to both, the simplicity of the algorithm, which helps its quick implementation and the good results it produces. Despite success, however, the original k-means has some shortcomings. One of them relates to the processing time required for the algorithm to finish the iterative process that, given a set of data instances, and an integer value k, induces a clustering having k clusters of the given data instances. This article presents an empirical evaluation of three strategies found in the literature that employ the triangle inequality, with the purpose of accelerating the k-means processing time. Experiments were conducted using two groups of datasets, seven real datasets and ten artificially created datasets. Besides empirically evaluating the impact of variables involved in clustering processes that can interfere with accelerating processes, the article also discusses the different ways the triangle inequality concept is employed by the strategies.
Keywords: k-means; optimisation; triangular inequality; clustering; machine learning; acceleration strategies.
DOI: 10.1504/IJICA.2022.125656
International Journal of Innovative Computing and Applications, 2022 Vol.13 No.4, pp.198 - 209
Received: 21 Apr 2020
Accepted: 10 Sep 2020
Published online: 26 Sep 2022 *