Title: Stability analysis of feature ranking techniques in the presence of noise: a comparative study
Authors: Iman Ramezani; Mojtaba Khorram Niaki; Milad Dehghani; Mostafa Rezapour
Addresses: Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran; Department of Industrial Engineering, University of Tehran, Tehran, Iran; Department of System Engineering and Engineering Management, City University of Hong Kong, Hong Kong; Department of Mathematics and Statistics, Washington State University, Pullman, WA, USA
Abstract: Noisy data is a common problem in real-world datasets and may affect the performance of data models, the decisions based on them, and the performance of feature ranking techniques. In this paper, we show how the stability of different feature ranking methods changes in the presence of attribute noise and class noise. We use Kendall's Tau rank correlation and Spearman rank correlation to evaluate the stability of various feature ranking methods, quantifying the degree of agreement between the ordered list of features produced by a filter on a clean dataset and its outputs on the same dataset corrupted with different combinations of noise levels. According to the Kendall and Spearman measures, the Gini index (GI) and information gain (IG), respectively, perform best. In contrast, both measures show that ReliefF (RF) is the most sensitive to noise and therefore performs worst.
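The abstract does not say exactly how the two correlations were computed. As a minimal sketch, assuming each filter returns a complete ordering of the same feature set with no ties, the agreement between the clean-data ranking and a noisy-data ranking can be scored with Kendall's tau-a and the closed-form Spearman formula, both of which rely on that no-ties assumption. The function names `kendall_tau`, `spearman_rho`, and `positions` and the feature names are hypothetical, for illustration only; they are not taken from the paper.

```python
from itertools import combinations


def positions(ordered_features):
    """Map each feature name to its 0-based position in an ordered list."""
    return {feature: i for i, feature in enumerate(ordered_features)}


def kendall_tau(clean_order, noisy_order):
    """Kendall's tau-a between two tie-free rankings of the same features (n >= 2)."""
    if set(clean_order) != set(noisy_order):
        raise ValueError("both lists must rank the same features")
    p, q = positions(clean_order), positions(noisy_order)
    n = len(clean_order)
    concordant = discordant = 0
    for a, b in combinations(clean_order, 2):
        # A pair is concordant if both rankings order a and b the same way.
        s = (p[a] - p[b]) * (q[a] - q[b])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def spearman_rho(clean_order, noisy_order):
    """Spearman's rho as 1 - 6 * sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    if set(clean_order) != set(noisy_order):
        raise ValueError("both lists must rank the same features")
    p, q = positions(clean_order), positions(noisy_order)
    n = len(clean_order)
    # d is the difference between a feature's positions in the two rankings.
    d_squared = sum((p[f] - q[f]) ** 2 for f in clean_order)
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical rankings: noise swaps the 2nd and 3rd features.
    clean = ["f1", "f2", "f3", "f4", "f5"]
    noisy = ["f1", "f3", "f2", "f4", "f5"]
    print(kendall_tau(clean, noisy))   # 0.8
    print(spearman_rho(clean, noisy))  # 0.9
```

A value of 1 means the noisy ranking matches the clean one exactly, and -1 means it is fully reversed. Under this sketch, a more stable filter is one whose scores stay closer to 1 as the noise level rises. Library routines such as SciPy's `kendalltau` default to the tie-corrected tau-b, which equals tau-a when there are no ties.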
Keywords: attribute noise; class noise; filter-based feature ranking; threshold-based feature ranking; stability; Kendall's Tau rank correlation; Spearman rank correlation.
International Journal of Business Intelligence and Data Mining, 2020 Vol.17 No.4, pp.413-427
Received: 02 Nov 2017
Accepted: 25 Feb 2018
Published online: 28 Apr 2020