Title: Measuring stability of feature ranking techniques: a noise-based approach

Authors: Wilker Altidor; Taghi M. Khoshgoftaar; Amri Napolitano

Addresses: Department of Computer & Electrical Engineering & Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA (all three authors)

Abstract: One very common criterion used to evaluate feature selection methods is the performance of a chosen classifier trained with the selected features. Another important evaluation criterion that has, until recently, been neglected is the stability of these feature selection methods. While other studies have shown interest in measuring the degree of agreement between the outputs of a technique trained on randomly selected subsets from the same input data, this study presents the importance of evaluating stability in the presence of noise. Experiments are conducted with 17 filters (six standard filter-based ranking techniques and 11 threshold-based feature selection techniques) on nine different real-world datasets. This paper identifies the techniques that are inherently more sensitive to class noise and demonstrates how certain characteristics (sample size and class imbalance) of the data can affect the stability performance of some feature selection methods.

Keywords: feature ranking; class noise; stability; Kuncheva index; class imbalance; data size; feature selection.
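
Illustrative sketch (not taken from the article): the keywords name the Kuncheva index, which scores the agreement between two equal-sized feature subsets while correcting for chance overlap. The Python below shows one way a noise-based stability check could be set up under simplifying assumptions: binary 0/1 labels, class noise injected by flipping a fixed fraction of labels, and a toy ranker based on the absolute difference of class means. The function names, the noise-injection scheme, and the toy ranker are all hypothetical stand-ins; none of them is one of the 17 filters or the exact experimental procedure used in the paper.

```python
import random


def kuncheva_index(subset_a, subset_b, n_features):
    """Kuncheva consistency index for two feature subsets of equal size k.

    I = (r * n - k^2) / (k * (n - k)), where r is the size of the overlap
    and n is the total number of features. Identical subsets give 1.
    """
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("both subsets must contain the same number of features")
    if not 0 < k < n_features:
        raise ValueError("subset size k must satisfy 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def flip_labels(labels, noise_rate, rng):
    """Return a copy of binary 0/1 labels with a fraction flipped.

    Assumption: class noise is simulated by flipping round(noise_rate * N)
    randomly chosen labels. This is a simplification for illustration.
    """
    n_flip = round(noise_rate * len(labels))
    flipped = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flipped else y for i, y in enumerate(labels)]


def top_k_features(rows, labels, k):
    """Toy ranker (hypothetical): score each feature by the absolute
    difference between its class-1 mean and its class-0 mean, then
    return the indices of the k highest-scoring features."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, y in zip(rows, labels) if y == 1]
        neg = [row[j] for row, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            raise ValueError("both classes must be present")
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def noise_stability(rows, labels, k, noise_rate, seed=0):
    """Compare the top-k subset from clean labels with the top-k subset
    obtained after injecting class noise, using the Kuncheva index."""
    rng = random.Random(seed)
    clean = top_k_features(rows, labels, k)
    noisy = top_k_features(rows, flip_labels(labels, noise_rate, rng), k)
    return kuncheva_index(clean, noisy, len(rows[0]))
```

In this sketch, a value close to 1 from `noise_stability` would suggest that the ranker selects nearly the same top-k features despite the injected noise, while values near 0 correspond to the overlap expected by chance. Repeating the call over several seeds and noise rates would give a rough stability profile for the toy ranker.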

DOI: 10.1504/IJBIDM.2012.048729

International Journal of Business Intelligence and Data Mining, 2012 Vol.7 No.1/2, pp.80 - 115

Published online: 12 Nov 2014
