Title: Feature selection with ensemble learning using enriched SOM
Authors: Ameni Filali; Chiraz Jlassi; Najet Arous
Addresses: Laboratory LIMTIC, Higher Institute of Computer Science, University of Tunis El Manar, 2 Rue Abou Raihan El Bayrouni, 2080 Ariana, Tunisia ' Laboratory LIMTIC, Higher Institute of Computer Science, University of Tunis El Manar, 2 Rue Abou Raihan El Bayrouni, 2080 Ariana, Tunisia ' Laboratory LIMTIC, Higher Institute of Computer Science, University of Tunis El Manar, 2 Rue Abou Raihan El Bayrouni, 2080 Ariana, Tunisia
Abstract: Finding pertinent subspaces in very high-dimensional datasets is a challenging task. Feature selection should be stable while also improving clustering results. Ensemble methods have successfully increased stability and clustering accuracy, but their runtime prevents them from scaling up to real-world applications. This paper addresses the problem of selecting, for each cluster, a subset of the most relevant features from a dataset. The proposed model extends the random forests method to unlabelled data using an enriched self-organising map (SOM), and assesses the out-of-bag (oob) feature importance from an ensemble of partitions. Each partition is produced using a different bootstrap sample and a random subset of the features. We assess the accuracy and scalability of the proposed method on 19 benchmark datasets and compare its effectiveness against other unsupervised feature selection methods with ensemble learning.
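Illustrative sketch: The ensemble procedure described in the abstract (one partition per bootstrap sample and random feature subset, followed by an out-of-bag importance estimate) might look roughly like the Python below. This is not the authors' implementation. It makes two loud assumptions: a plain K-means stands in for the enriched SOM, and importance is measured as the fraction of out-of-bag points whose cluster assignment changes when one feature is permuted, which is only one plausible reading since the abstract does not define the measure. All function and parameter names (`oob_feature_importance`, `n_partitions`, `n_sub`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, rng, iters=20):
    """Plain K-means; an assumed stand-in for the paper's enriched SOM."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def oob_feature_importance(data, k=2, n_partitions=30, n_sub=None, seed=0):
    """Average, per feature, the share of out-of-bag points whose cluster
    changes when that feature is permuted (assumed importance measure)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    n_sub = n_sub or max(1, int(d ** 0.5))
    totals, counts = [0.0] * d, [0] * d
    for _ in range(n_partitions):
        boot = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        feats = rng.sample(range(d), n_sub)           # random feature subset
        centroids = kmeans([[data[i][f] for f in feats] for i in boot], k, rng)
        oob_rows = [[data[i][f] for f in feats] for i in oob]
        base = [nearest(r, centroids) for r in oob_rows]
        for j, f in enumerate(feats):
            shuffled = [r[j] for r in oob_rows]
            rng.shuffle(shuffled)                     # permute one feature
            changed = 0
            for row, value, label in zip(oob_rows, shuffled, base):
                perturbed = row[:j] + [value] + row[j + 1:]
                changed += nearest(perturbed, centroids) != label
            totals[f] += changed / len(oob_rows)
            counts[f] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

Calling `oob_feature_importance(data, k=2, n_sub=3)` on a list of numeric rows returns one score per feature. Under the assumptions above, features that drive the partition should score higher than features that only carry noise.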
Keywords: unsupervised learning; K-means; self-organising map; random forest; feature selection.
DOI: 10.1504/IJISTA.2017.085357
International Journal of Intelligent Systems Technologies and Applications, 2017 Vol.16 No.3, pp.208 - 224
Received: 03 May 2016
Accepted: 28 Dec 2016
Published online: 24 Jul 2017