Title: An evolutionary approach for high dimensional attribute selection

Authors: Lydia Boudjeloud-Assala

Addresses: Laboratory of Theoretical and Applied Computer Science, University of Lorraine, LITA EA 3097, Ile du Saulcy, Metz Cedex 01, F-57045, France

Abstract: We present a method to select a relevant dimension subset (with little or no loss of information) for clustering and outlier detection in high dimensional datasets. We use a heuristic search, based on a genetic algorithm, to select the relevant dimension subset. For clustering, the genetic algorithm fitness function uses the validity indexes of classification algorithms. We use these validity indexes first to select a dimension subset and then to evaluate the clustering quality in this subspace. For outlier detection, the genetic algorithm fitness function is an individual distance-based function. The performance of our new dimension selection approach is evaluated in simulations on different high dimensional datasets for the two applications (clustering and outlier detection). Furthermore, as the number of selected dimensions is low, it is possible to display the datasets in order to visually evaluate and interpret the obtained results.
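
To make the search described in the abstract more concrete, the following is a minimal sketch of a genetic algorithm that looks for a small dimension subset. It is not the author's implementation: the names `select_dimensions` and `outlier_fitness`, the truncation selection, the elitism, and the 0.2 mutation rate are all assumptions chosen for illustration. The fitness shown is the largest nearest-neighbour distance in the subspace, a simple stand-in for a distance-based outlier function; for clustering, a validity index would presumably take its place.

```python
import math
import random


def outlier_fitness(data, dims):
    """Hypothetical distance-based score: the largest nearest-neighbour
    distance among all points, measured only on the dimensions in `dims`."""
    def dist(a, b):
        return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))
    return max(
        min(dist(p, q) for j, q in enumerate(data) if j != i)
        for i, p in enumerate(data)
    )


def select_dimensions(data, k, fitness, pop_size=30, generations=30, seed=0):
    """Search for a k-dimension subset that maximises fitness(data, dims).

    Each individual is a sorted tuple of k distinct dimension indices.
    """
    rng = random.Random(seed)
    n_dims = len(data[0])

    def random_subset():
        return tuple(sorted(rng.sample(range(n_dims), k)))

    def crossover(a, b):
        # The child draws k distinct dimensions from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, k)))

    def mutate(ind):
        # Swap one selected dimension for a currently unselected one.
        unused = [d for d in range(n_dims) if d not in ind]
        if not unused:
            return ind
        kept = list(ind)
        kept[rng.randrange(k)] = rng.choice(unused)
        return tuple(sorted(kept))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda ind: fitness(data, ind), reverse=True)
        parents = scored[: pop_size // 2]  # truncation selection (assumption)
        children = [scored[0]]             # elitism: keep the current best
        while len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents))
            if rng.random() < 0.2:         # mutation rate (assumption)
                child = mutate(child)
            children.append(child)
        population = children

    best = max(population, key=lambda ind: fitness(data, ind))
    return best, fitness(data, best)
```

Calling `select_dimensions(data, 2, outlier_fitness)` on a list of equal-length numeric rows returns a pair of dimension indices and its score. With only two or three selected dimensions, the result can be plotted directly, which is the visual evaluation the abstract mentions.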

Keywords: genetic algorithms; GAs; high dimensional datasets; clustering; attribute selection; outlier detection; visualisation.

DOI: 10.1504/IJIIDS.2012.050110

International Journal of Intelligent Information and Database Systems, 2012 Vol.6 No.6, pp.578 - 602

Accepted: 06 Aug 2012
Published online: 23 Aug 2014
