Title: Empirical study of feature selection methods over classification algorithms

Authors: N. Bhalaji; K.B. Sundhara Kumar; Chithra Selvaraj

Addresses: Department of Information Technology, SSN College of Engineering, Chennai, 603110, India (all three authors)

Abstract: Feature selection methods are used in machine learning to reduce redundancy in a dataset and to improve the clarity of the resulting models without losing much information. This paper investigates how feature selection methods perform when applied to different datasets and different classification algorithms. We compare two feature selection algorithms, Chi-square and Boruta, on the standard metrics of accuracy, precision and recall. Experiments conducted in RStudio showed an improvement of around 5-6% in these metrics when the Boruta feature selection algorithm was applied. The experiments were carried out on two datasets with different sets of features, using five standard classification algorithms: Naive Bayes, decision tree, support vector machines (SVM), random forest and gradient boosting.
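
To illustrate the quantities named in the abstract, the following is a minimal sketch in plain Python of a Chi-square score for a single categorical feature, together with accuracy, precision and recall for a binary classifier. It is not the authors' code: their experiments were run in R, and Boruta is not shown here. The function names and the toy data are assumptions introduced only for illustration.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels.

    A larger score means the feature values depend more strongly on the class.
    """
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)

    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Count expected in this cell if feature and class were independent.
            expected = f_count * l_count / n
            diff = observed.get((f_value, l_value), 0) - expected
            score += diff * diff / expected
    return score


def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy data (hypothetical, not from the paper): one feature that mirrors the
# class and one that is unrelated to it.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 1, 0, 0, 0, 0]
noise = [1, 0, 1, 0, 1, 0, 1, 0]

scores = {
    "informative": chi_square_score(informative, labels),
    "noise": chi_square_score(noise, labels),
}
# Rank features from most to least class-dependent.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In a Chi-square filter of this kind, features are ranked by score and only the top-ranked ones are passed to the classifier; the three metrics are then computed on the classifier's predictions for held-out data.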

Keywords: classification; feature selection; Boruta; Chi-square; ensemble classifiers.

DOI: 10.1504/IJISTA.2018.091590

International Journal of Intelligent Systems Technologies and Applications, 2018 Vol.17 No.1/2, pp.98 - 108

Received: 14 Feb 2017
Accepted: 03 May 2017

Published online: 08 May 2018
