Title: A fast clustering approach for large multidimensional data

Authors: Hajar Rehioui; Abdellah Idrissi

Addresses: Computer Science Laboratory (LRI), Computer Science Department, Faculty of Sciences, Mohammed V University in Rabat, Morocco; Computer Science Laboratory (LRI), Computer Science Department, Faculty of Sciences, Mohammed V University in Rabat, Morocco

Abstract: Density-based clustering is a strong family of clustering methods. Its strength is the ability to find clusters of arbitrary shape and to discard noise. Among these methods, DENCLUE (density-based clustering) is one of the best known and most powerful. DENCLUE is based on the hill-climbing algorithm: to find the clusters, it has to reach a set of points called density attractors. Despite its advantages, DENCLUE remains sensitive to growth in both data size and dimensionality, because a density attractor is calculated for each point in the input data. In this paper, to overcome this shortcoming, we propose an efficient approach that replaces the concept of the density attractor with a new concept, 'the hyper-cube representative'. Experimental results on several datasets show that our approach achieves a trade-off between clustering performance and fast response time. In this way, the proposed clustering approach works efficiently for large multidimensional data.
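
To make the cost mentioned in the abstract concrete, the following is a minimal sketch of the classic DENCLUE-style hill climbing that moves one point to its density attractor. It is not the approach proposed in the paper, and it does not implement the hyper-cube representative, which the abstract does not define. The Gaussian kernel, the function names (gaussian_density, density_gradient, hill_climb) and the parameters sigma, delta and max_iter are assumptions chosen for illustration.

```python
import math


def gaussian_density(x, data, sigma):
    """Kernel density estimate at x: sum of Gaussian influences of all points."""
    total = 0.0
    for p in data:
        sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total


def density_gradient(x, data, sigma):
    """Direction of the density gradient at x.

    The constant factor 1/sigma^2 is dropped because the caller
    normalizes the gradient to unit length anyway.
    """
    grad = [0.0] * len(x)
    for p in data:
        sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        weight = math.exp(-sq_dist / (2.0 * sigma ** 2))
        for d in range(len(x)):
            grad[d] += weight * (p[d] - x[d])
    return grad


def hill_climb(x, data, sigma=1.0, delta=0.05, max_iter=1000):
    """Follow the density gradient from x with fixed step size delta.

    Stops when a step no longer increases the density; the last
    accepted position approximates the density attractor of x.
    """
    current = list(x)
    current_density = gaussian_density(current, data, sigma)
    for _ in range(max_iter):
        grad = density_gradient(current, data, sigma)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0:
            break  # flat region: no direction to climb
        candidate = [c + delta * g / norm for c, g in zip(current, grad)]
        candidate_density = gaussian_density(candidate, data, sigma)
        if candidate_density <= current_density:
            break  # density stopped increasing: local maximum reached
        current, current_density = candidate, candidate_density
    return current


# Hypothetical toy data: two small groups of 2-D points.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]

# One climb per input point; each step scans the whole dataset.
attractors = [hill_climb(p, points) for p in points]
```

In this sketch every climbing step evaluates the kernel over all input points, and one climb is run per input point, which illustrates why the cost grows quickly with the size and dimensionality of the data.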

Keywords: large data; dimensional data; clustering; density-based clustering; DENCLUE.

DOI: 10.1504/IJBIDM.2019.101946

International Journal of Business Intelligence and Data Mining, 2019 Vol.15 No.3, pp.349 - 369

Received: 31 Jan 2017
Accepted: 17 Jun 2017

Published online: 04 Jul 2019
