Title: An empirical study of feature selection for classification using genetic algorithm

Authors: Saptarsi Goswami; Amlan Chakrabarti; Basabi Chakraborty

Addresses: Computer Science and Engineering, Institute of Engineering and Management, Kolkata, India; A.K. Choudhury School of Information and Technology, Calcutta University, Kolkata, India; Faculty of Software and Information Science, Iwate Prefectural University, Iwate, Japan

Abstract: Feature selection is one of the most important pre-processing steps for data mining, pattern recognition and machine learning problems. Features are eliminated because they are either irrelevant or redundant. According to the literature, most approaches combine these objectives into a single numeric measure. In this paper, by contrast, the problem of finding an optimal feature subset is formulated as a multi-objective problem. The concept of redundancy is further refined with a threshold value, and an objective of maximising entropy is added. An extensive empirical study has been set up using 33 publicly available datasets. A 12% improvement in classification accuracy is reported in the multi-objective setup, and the other suggested refinements are also shown to improve the performance measure. The performance improvement is statistically significant, as confirmed by pairwise t-tests and Friedman's test.
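The abstract describes GA-based feature selection with a thresholded redundancy penalty and an entropy objective. The paper's exact objective functions and GA parameters are not given here, so the following is only a minimal illustrative sketch: it scalarises the objectives into one fitness value (the paper treats them as separate objectives), uses made-up relevance scores and a made-up pairwise-redundancy matrix, and penalises redundancy only above a threshold, echoing the paper's refinement.

```python
import math
import random

def ga_feature_select(relevance, redundancy, threshold=0.5, w_entropy=0.1,
                      pop_size=20, generations=40, seed=0):
    """Toy GA over binary feature masks (illustrative, not the paper's method)."""
    rng = random.Random(seed)
    n = len(relevance)

    def fitness(mask):
        chosen = [i for i, bit in enumerate(mask) if bit]
        if not chosen:
            return float("-inf")
        # Objective 1: mean relevance of the selected features.
        rel = sum(relevance[i] for i in chosen) / len(chosen)
        # Objective 2: penalise pairwise redundancy, but only above the threshold.
        red = sum(redundancy[a][b]
                  for i, a in enumerate(chosen)
                  for b in chosen[i + 1:]
                  if redundancy[a][b] > threshold)
        # Objective 3: entropy of the selection ratio (peaks at half the features).
        p = len(chosen) / n
        ent = 0.0 if p in (0.0, 1.0) else -(p * math.log2(p)
                                            + (1 - p) * math.log2(1 - p))
        return rel - red + w_entropy * ent

    # Random initial population of binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:             # mutation: flip one random bit
                child[rng.randrange(n)] ^= 1
            children.append(child)
        pop = parents + children               # elitist replacement
    best = max(pop, key=fitness)
    return [i for i, bit in enumerate(best) if bit]

# Hypothetical data: features 0 and 1 are highly relevant but redundant
# with each other; features 2 and 4 are nearly irrelevant.
relevance = [0.9, 0.85, 0.1, 0.8, 0.05]
redundancy = [[0.0, 0.9, 0.1, 0.2, 0.1],
              [0.9, 0.0, 0.1, 0.2, 0.1],
              [0.1, 0.1, 0.0, 0.1, 0.1],
              [0.2, 0.2, 0.1, 0.0, 0.1],
              [0.1, 0.1, 0.1, 0.1, 0.0]]
selected = ga_feature_select(relevance, redundancy)
```

In a real setting, relevance and redundancy would come from filter measures such as mutual information or correlation, and a true multi-objective GA (e.g. NSGA-II) would keep the objectives separate rather than summing them as above.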

Keywords: feature selection; classification; genetic algorithm; GA; multi-objective; filter.

DOI: 10.1504/IJAIP.2018.090792

International Journal of Advanced Intelligence Paradigms, 2018 Vol.10 No.3, pp.305 - 326

Received: 22 Sep 2015
Accepted: 11 Nov 2015

Published online: 28 Mar 2018
