Authors: K. Michalak, H. Kwasnicka
Addresses: Faculty of Computer Science and Management, Institute of Informatics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland; Faculty of Computer Science and Management, Institute of Informatics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
Abstract: Feature selection is an important data preprocessing step performed before a learning algorithm is applied. A key consideration when proposing a feature selection method is its computational complexity. Often, a fast feature selection process cannot search the feature subset space thoroughly, and classification accuracy is degraded. Recently, a pairwise feature selection method was proposed as an effective trade-off between computation speed and classification accuracy. In this paper, a new feature selection method is proposed which further improves feature selection speed while preserving classification accuracy. The new method selects features individually or in a pairwise manner based on the correlations between features. Experiments conducted on several benchmark data sets show with high statistical significance that the correlation-based feature selection method shortens computations compared to the pairwise feature selection method and produces classification errors that are not worse than those produced by existing methods.
Keywords: machine learning; feature selection; feature correlation; pairwise selection; pattern classification; correlation based feature selection; computational complexity; data preprocessing; classification accuracy.
International Journal of Bio-Inspired Computation, 2010 Vol.2 No.5, pp.319 - 332
Available online: 25 Oct 2010