Authors: Ilias K. Savvas; Dimitrios Tselios; Georgia Garani
Addresses: Department of Computer Science and Engineering, T.E.I. of Thessaly, Larissa, Greece; Department of Business Administration, T.E.I. of Thessaly, Larissa, Greece; Department of Computer Science and Engineering, T.E.I. of Thessaly, Larissa, Greece
Abstract: Huge quantities of data are now generated by billions of machines and devices. Numerous methods have been employed to make use of this valuable resource, some of them altered versions of established algorithms. Clustering is one of the most important methods for mining data sources, and k-means is a key algorithm that forms clusters of data according to a set of attributes. Its main shortcoming is its high computational complexity, which makes k-means very inefficient on big data sets. Although k-means is widely used, a functional distributed variant that also exploits the multi-core power of contemporary machines has not yet been accepted by researchers. In this work, a three-phase distributed/multi-core version of k-means is presented together with an analysis of its results. The experimental results obtained are in line with the theoretical outcomes and confirm the correctness, efficiency, and scalability of the proposed technique.
Keywords: parallel algorithm; clustering; multi-core; distributed; k-means; OpenMP; MPI.
International Journal of Grid and Utility Computing, 2019 Vol.10 No.3, pp.283 - 291
Received: 08 Nov 2017
Accepted: 12 Jan 2018
Published online: 15 May 2019
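
Illustrative note: the abstract refers to k-means without describing it, so the following is a minimal sketch of the standard serial k-means iteration only. It is not the authors' three-phase distributed/multi-core method, whose details the abstract does not give. The function name `kmeans`, its parameters, and the random choice of initial centroids are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain serial k-means on a list of equal-length tuples of floats.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        # (squared Euclidean distance over all attributes).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = updated
    return centroids, clusters


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (10.0,), (11.0,)]
    centres, groups = kmeans(data, k=2)
    print(sorted(centres))
```

Every point is compared against every centroid in each pass, which is where the cost on large data sets comes from. The assignment step treats each point independently, which is the property that parallel variants in general can exploit.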