Title: Feature selection with ensemble learning using enriched SOM

Authors: Ameni Filali; Chiraz Jlassi; Najet Arous

Addresses: Laboratory LIMTIC, Higher Institute of Computer Science, University of Tunis El Manar, 2 Rue Abou Raihan El Bayrouni, 2080 Ariana, Tunisia (all authors)

Abstract: Finding pertinent subspaces in very high-dimensional datasets is a challenging task. Feature selection should be stable while also improving clustering results. Ensemble methods have successfully increased stability and clustering accuracy, but their runtime prevents them from scaling up to real-world applications. This paper addresses the problem of selecting, for each cluster, a subset of the most relevant features from a dataset. The proposed model extends the random forests method to unlabelled data using an enriched self-organising map (SOM); it assesses out-of-bag (oob) feature importance from an ensemble of partitions. Each partition is produced using a different bootstrap sample and a random subset of the features. We assess the accuracy and scalability of the proposed method on 19 benchmark datasets and compare its effectiveness against other unsupervised feature selection methods with ensemble learning.
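
The ensemble scheme summarised in the abstract (one partition per bootstrap sample and random feature subset, followed by an out-of-bag importance estimate) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Plain K-means stands in for the enriched SOM, and importance is taken to be the increase in out-of-bag quantisation error after permuting a feature, which is an assumption borrowed from supervised random forests; the abstract does not state the exact measure. All function and parameter names (`kmeans`, `oob_feature_importance`, `n_members`, `n_sub`) are hypothetical.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's K-means; a stand-in for the enriched SOM (assumption)."""
    # Seed centroids from distinct points (bootstrap samples contain duplicates).
    seeds = rng.sample(sorted(set(map(tuple, points))), k)
    centroids = [list(s) for s in seeds]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def quantization_error(points, centroids):
    """Mean squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)


def oob_feature_importance(data, k=2, n_members=30, n_sub=3, seed=0):
    """Average out-of-bag error increase per feature (assumed measure)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    total, count = [0.0] * d, [0] * d
    for _ in range(n_members):
        in_bag = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
        oob = sorted(set(range(n)) - set(in_bag))        # rows left out of it
        if not oob:
            continue
        feats = rng.sample(range(d), n_sub)              # random feature subset
        centroids = kmeans([[data[i][f] for f in feats] for i in in_bag], k, rng)
        oob_pts = [[data[i][f] for f in feats] for i in oob]
        base = quantization_error(oob_pts, centroids)
        for pos, f in enumerate(feats):
            # Permute one feature across the out-of-bag rows only.
            shuffled = [p[pos] for p in oob_pts]
            rng.shuffle(shuffled)
            permuted = [p[:pos] + [v] + p[pos + 1:] for p, v in zip(oob_pts, shuffled)]
            total[f] += quantization_error(permuted, centroids) - base
            count[f] += 1
    # Features never drawn into any subset get importance 0.0.
    return [t / c if c else 0.0 for t, c in zip(total, count)]
```

Calling `oob_feature_importance(data)` on a list of equal-length numeric rows returns one score per feature. In this sketch a feature scores highly only when permuting it breaks its agreement with the other features in the subset, so the scores are relative rankings rather than calibrated quantities.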

Keywords: unsupervised learning; K-means; self-organising map; random forest; feature selection.

DOI: 10.1504/IJISTA.2017.085357

International Journal of Intelligent Systems Technologies and Applications, 2017 Vol.16 No.3, pp.208 - 224

Received: 03 May 2016
Accepted: 28 Dec 2016

Published online: 24 Jul 2017
