Title: A multilevel genetic algorithm for the clustering problem

Authors: Noureddine Bouhmala

Addresses: Department of Maritime Technology and Innovation, Buskerud and Vestfold University College, Norway

Abstract: Data mining is concerned with the discovery of interesting patterns and knowledge in data repositories. Cluster analysis, one of the core methods of data mining, is the process of discovering homogeneous groups called clusters. Given a dataset and some measure of similarity between data objects, the goal of most clustering algorithms is to maximise both the homogeneity within each cluster and the heterogeneity between different clusters. The multilevel paradigm suggests a hierarchical optimisation process that moves through different levels, evolving from a coarse-grained to a fine-grained strategy. The clustering problem is solved by first reducing the problem, level by level, to a coarser problem on which an initial clustering is computed. This clustering is then mapped back level by level, and the intermediate clusterings obtained at each level are refined to yield a better clustering of the original problem. In this paper, a multilevel genetic algorithm and a multilevel K-means algorithm are introduced for solving the clustering problem. A benchmark using a number of datasets collected from a variety of domains is used to compare the effectiveness of the hierarchical approach against its single-level counterpart.
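
The coarsen, cluster, and refine cycle described in the abstract can be illustrated with a minimal Python sketch of a multilevel K-means. It is not the paper's implementation: the abstract does not specify the coarsening or refinement operators, so random pairwise merging into weighted centroids and weighted Lloyd iterations are assumed here, and the names `coarsen`, `refine`, `multilevel_kmeans`, and `min_size` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coarsen(points, weights, rng):
    """Assumed coarsening: merge randomly paired points into weighted centroids."""
    order = list(range(len(points)))
    rng.shuffle(order)
    coarse_pts, coarse_w, parent = [], [], [0] * len(points)
    for i in range(0, len(order), 2):
        group = order[i:i + 2]  # a pair, or one leftover point
        w = sum(weights[g] for g in group)
        centroid = tuple(
            sum(weights[g] * points[g][d] for g in group) / w
            for d in range(len(points[0]))
        )
        for g in group:
            parent[g] = len(coarse_pts)  # fine point -> coarse point index
        coarse_pts.append(centroid)
        coarse_w.append(w)
    return coarse_pts, coarse_w, parent


def refine(points, weights, assign, k, iters=20):
    """Assumed refinement: weighted Lloyd iterations from a given assignment."""
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w, c in zip(points, weights, assign):
            mass[c] += w
            for d in range(dim):
                sums[c][d] += w * p[d]
        # Empty clusters get no centre and cannot attract points.
        centres = {c: [s / mass[c] for s in sums[c]]
                   for c in range(k) if mass[c] > 0}
        new_assign = [min(centres, key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_assign == assign:
            break
        assign = new_assign
    return assign


def multilevel_kmeans(points, k, min_size=8, seed=0):
    rng = random.Random(seed)
    levels = []  # each entry: (points, weights, parent map to coarser level)
    pts, wts = list(points), [1.0] * len(points)
    # Coarsening phase: shrink the problem level by level.
    while len(pts) > max(min_size, k):
        coarse_pts, coarse_w, parent = coarsen(pts, wts, rng)
        levels.append((pts, wts, parent))
        pts, wts = coarse_pts, coarse_w
    # Initial clustering on the coarsest problem (round-robin start, then refine).
    assign = refine(pts, wts, [i % k for i in range(len(pts))], k)
    # Uncoarsening phase: project the clustering back and refine at each level.
    for fine_pts, fine_w, parent in reversed(levels):
        assign = [assign[parent[i]] for i in range(len(fine_pts))]
        assign = refine(fine_pts, fine_w, assign, k)
    return assign


if __name__ == "__main__":
    data = [(0.1 * i,) for i in range(12)] + [(1000.0 + 0.1 * i,) for i in range(8)]
    print(multilevel_kmeans(data, k=2))
```

A genetic-algorithm variant would, under the same assumptions, keep the coarsening and projection steps and swap `refine` for an evolutionary search over assignments at each level.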

Keywords: multilevel genetic algorithms; K-means clustering; data mining; cluster analysis.

DOI: 10.1504/IJICT.2016.077692

International Journal of Information and Communication Technology, 2016 Vol.9 No.1, pp.101 - 116

Received: 08 Nov 2013
Accepted: 25 Sep 2014

Published online: 13 Jul 2016
