Title: A novel near-parallel version of k-means algorithm for n-dimensional data objects using MPI

Authors: Ilias K. Savvas; Georgia N. Sofianidou

Addresses: Department of Computer Science and Engineering, T.E.I. of Thessaly, Larissa, Greece; Department of Computer Science and Engineering, T.E.I. of Thessaly, Larissa, Greece

Abstract: Data volumes are growing exponentially, producing colossal amounts of information. To explore data at this scale, new fast algorithms must be developed or existing ones redesigned. Clustering is one of the most useful techniques for extracting information from data pools, and k-means is one such method. Its main disadvantage is its computational complexity, which makes it difficult to apply to big data sets. In this study, a fully parallel version of k-means for one-dimensional objects is presented and, in addition, a near-parallel approach for n-dimensional objects is explored. The experimental results for one-dimensional data are in line with the theoretical outcome and confirm both the correctness and the effectiveness of the approach. For n-dimensional objects, the results are so close to those of the original algorithm that they can either be accepted as they are or be used as its initial solution.

Keywords: data mining; k-means clustering; MPI; message passing interface; parallel k-means; big data.

DOI: 10.1504/IJGUC.2016.077487

International Journal of Grid and Utility Computing, 2016 Vol.7 No.2, pp.80 - 91

Received: 31 Oct 2014
Accepted: 01 Jan 2015

Published online: 04 Jul 2016
