Title: Stability analysis of feature ranking techniques in the presence of noise: a comparative study

Authors: Iman Ramezani; Mojtaba Khorram Niaki; Milad Dehghani; Mostafa Rezapour

Addresses: Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran; Department of Industrial Engineering, University of Tehran, Tehran, Iran; Department of System Engineering and Engineering Management, City University of Hong Kong, Hong Kong; Department of Mathematics and Statistics, Washington State University, Pullman, WA, USA

Abstract: Noisy data is a common problem in real-world datasets and may affect the performance of data models, the decisions based on them, and the performance of feature ranking techniques. In this paper, we show how stability changes when different feature ranking methods are applied to data containing attribute noise and class noise. We use Kendall's Tau rank correlation and Spearman rank correlation to evaluate the stability of various feature ranking methods, quantifying the degree of agreement between the ordered list of features that a filter produces on a clean dataset and the lists it produces on the same dataset corrupted with different combinations of noise levels. According to the Kendall and Spearman measures, Gini index (GI) and information gain (IG) perform best, respectively. However, both measures show that ReliefF (RF) is the most sensitive to noise and therefore has the worst stability.
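
As an illustration of the stability measurement described in the abstract, the following minimal sketch compares a feature ranking obtained on clean data with one obtained on noisy data. It is not the authors' implementation. The feature names and scores are hypothetical, the helper names (ranks_from_scores, kendall_tau, spearman_rho) are introduced here for illustration only, and the formulas assume rankings without ties (Kendall's tau-a and the classic Spearman formula).

```python
from itertools import combinations


def ranks_from_scores(scores):
    """Map each feature to its rank (1 = best) by descending score.

    Equal scores are ordered by feature name so that no ranks are tied.
    """
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two tie-free rankings of the same features."""
    concordant = discordant = 0
    for f, g in combinations(list(rank_a), 2):
        sign = (rank_a[f] - rank_a[g]) * (rank_b[f] - rank_b[g])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


def spearman_rho(rank_a, rank_b):
    """Spearman's rho between two tie-free rankings of the same features."""
    n = len(rank_a)
    squared_diffs = sum((rank_a[f] - rank_b[f]) ** 2 for f in rank_a)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Hypothetical filter scores on a clean dataset and on a noisy copy of it.
clean_scores = {"f1": 0.9, "f2": 0.7, "f3": 0.4, "f4": 0.1}
noisy_scores = {"f1": 0.8, "f2": 0.3, "f3": 0.5, "f4": 0.1}

clean_ranks = ranks_from_scores(clean_scores)
noisy_ranks = ranks_from_scores(noisy_scores)

tau = kendall_tau(clean_ranks, noisy_ranks)
rho = spearman_rho(clean_ranks, noisy_ranks)
print(f"Kendall tau = {tau:.3f}, Spearman rho = {rho:.3f}")
```

Values close to 1 indicate that the ranking changed little under noise, which corresponds to a more stable ranking method in the sense used above. In this toy case only the features f2 and f3 swap places.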

Keywords: attribute noise; class noise; filter-based feature ranking; threshold-based feature ranking; stability; Kendall's Tau rank correlation; Spearman rank correlation.

DOI: 10.1504/IJBIDM.2020.110371

International Journal of Business Intelligence and Data Mining, 2020 Vol.17 No.4, pp.413 - 427

Received: 02 Nov 2017
Accepted: 25 Feb 2018

Published online: 16 Oct 2020