Title: An empirical study of feature selection for classification using genetic algorithm
Authors: Saptarsi Goswami; Amlan Chakrabarti; Basabi Chakraborty
Addresses: Computer Science and Engineering, Institute of Engineering and Management, Kolkata, India; A.K. Choudhury School of Information and Technology, Calcutta University, Kolkata, India; Faculty of Software and Information Science, Iwate Prefectural University, Iwate, Japan
Abstract: Feature selection is one of the most important pre-processing steps for a data mining, pattern recognition or machine learning problem. Features are eliminated because they are either irrelevant or redundant. The literature shows that most approaches combine these two objectives into a single numeric measure. In this paper, by contrast, the problem of finding an optimal feature subset is formulated as a multi-objective problem. The notion of redundancy is further refined by introducing a threshold value, and an additional objective of maximising entropy is included. An extensive empirical study was set up using 33 publicly available datasets. A 12% improvement in classification accuracy is reported for the multi-objective setup, and the other proposed refinements are also shown to improve performance. The improvement is statistically significant, as confirmed by pairwise t-tests and Friedman's test.
Keywords: feature selection; classification; genetic algorithm; GA; multi-objective; filter.
International Journal of Advanced Intelligence Paradigms, 2018 Vol.10 No.3, pp.305 - 326
Received: 22 Sep 2015
Accepted: 11 Nov 2015
Published online: 28 Mar 2018
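The abstract describes genetic-algorithm feature selection with a redundancy threshold. A minimal sketch of that idea is shown below; the relevance scores, redundancy matrix, threshold value, and the scalarised fitness are all illustrative assumptions, not the paper's actual multi-objective formulation or measures.

```python
import random

random.seed(0)

# Hypothetical per-feature relevance scores and pairwise redundancy matrix
# (the paper derives such quantities from data; these values are made up).
RELEVANCE = [0.9, 0.8, 0.1, 0.7, 0.05, 0.6]
N = len(RELEVANCE)
REDUNDANCY = [[0.0] * N for _ in range(N)]
REDUNDANCY[0][1] = REDUNDANCY[1][0] = 0.85  # features 0 and 1 overlap heavily
REDUNDANCY[3][5] = REDUNDANCY[5][3] = 0.75
THRESHOLD = 0.7  # redundancy below this threshold is ignored, as a stand-in
                 # for the threshold refinement mentioned in the abstract

def fitness(mask):
    """Scalarised stand-in for a multi-objective fitness:
    reward total relevance, penalise above-threshold redundancy."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    rel = sum(RELEVANCE[i] for i in chosen)
    red = sum(REDUNDANCY[i][j]
              for i in chosen for j in chosen
              if i < j and REDUNDANCY[i][j] > THRESHOLD)
    return rel - red

def evolve(pop_size=20, generations=40):
    """Simple GA over binary feature masks: elitist selection,
    one-point crossover, single-bit mutation."""
    pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N)         # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(N)] ^= 1      # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Because survivors are carried over unchanged, the best fitness never decreases across generations; on this toy instance the GA converges to a mask that drops one feature from each redundant pair.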