Publisher: Inderscience Publishers

Title: Distributed and multi-core version of k-means algorithm

Authors: Ilias K. Savvas; Dimitrios Tselios; Georgia Garani

Addresses: Department of Computer Science and Engineering, T.E.I. of Thessaly, Larissa, Greece; Department of Business Administration, T.E.I. of Thessaly, Larissa, Greece; Department of Computer Science and Engineering, T.E.I. of Thessaly, Larissa, Greece

Abstract: Huge quantities of data are now generated by billions of machines and devices. Numerous methods have been employed to make use of this valuable resource, some of them altered versions of established algorithms. Clustering is one of the most seminal methods for mining data sources, and k-means is a key algorithm that forms clusters of data according to a set of attributes. Its main shortcoming, however, is its high computational complexity, which makes k-means very inefficient on big data sets. Although k-means is very widely used, a functional distributed variant that also exploits the multi-core power of contemporary machines has not yet been accepted by researchers. This work presents a three-phase distributed/multi-core version of k-means together with an analysis of its results. The experimental results obtained are in line with the theoretical outcomes and prove the correctness, efficiency, and scalability of the proposed technique.
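
The abstract does not spell out the three phases, so the following is not the authors' method. It is only a minimal sketch of the generic data-parallel pattern that distributed k-means variants commonly build on: each worker assigns its own chunk of points to the nearest centroid and returns per-cluster partial sums, and a reduction step combines them into new centroids. The function names (partial_sums, kmeans_partitioned) and the n_parts parameter are hypothetical, and the chunks are processed sequentially here; in a real deployment each chunk might correspond to an MPI rank or an OpenMP thread.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def partial_sums(chunk, centroids):
    """Assign each point of one chunk to its nearest centroid.

    Returns per-cluster coordinate sums and point counts for this chunk only.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        # Index of the closest centroid to point p.
        j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans_partitioned(points, centroids, n_parts=2, max_iters=100):
    """Lloyd-style k-means written as map-over-chunks plus a reduction.

    Assumption: chunks stand in for workers and are handled sequentially.
    """
    chunks = [points[i::n_parts] for i in range(n_parts)]
    centroids = [list(c) for c in centroids]
    for _ in range(max_iters):
        # "Map" step: independent work per chunk.
        results = [partial_sums(chunk, centroids) for chunk in chunks]
        # "Reduce" step: combine partial sums into new centroids.
        new_centroids = []
        for j, old in enumerate(centroids):
            total = sum(counts[j] for _, counts in results)
            if total == 0:
                new_centroids.append(old)  # empty cluster keeps its centroid
                continue
            new_centroids.append([
                sum(sums[j][d] for sums, _ in results) / total
                for d in range(len(old))
            ])
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
result = kmeans_partitioned(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Because only the per-cluster sums and counts cross the chunk boundary, the amount of data exchanged per iteration depends on the number of clusters and attributes rather than on the number of points, which is what makes this pattern attractive for large data sets.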

Keywords: parallel algorithm; clustering; multi-core; distributed; k-means; OpenMP; MPI.

DOI: 10.1504/IJGUC.2019.099668

International Journal of Grid and Utility Computing, 2019 Vol.10 No.3, pp.283 - 291

Received: 08 Nov 2017
Accepted: 12 Jan 2018
Published online: 20 May 2019
