Authors: N. Bhalaji; K.B. Sundhara Kumar; Chithra Selvaraj
Addresses: Department of Information Technology, SSN College of Engineering, Chennai, 603110, India (all authors)
Abstract: Feature selection methods are deployed in machine-learning pipelines to reduce redundancy in the dataset and to increase the clarity of system models without much loss of information. The objective of this paper is to investigate the performance of feature selection methods when they are exposed to different datasets and different classification algorithms. We investigate standard parameters such as accuracy, precision and recall over two feature selection methods, namely the Chi-Square and Boruta feature selection algorithms. Experiments conducted using RStudio showed around a 5-6% improvement in the above parameters when the Boruta feature selection algorithm was used. The experiments were run on two datasets with different sets of features, using five standard classification algorithms: naive Bayes, decision tree, support vector machines (SVM), random forest and gradient boosting.
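As a minimal sketch of the first of the two methods compared in the paper, the snippet below scores a categorical feature against the class label with the Chi-Square statistic (higher score means stronger dependence, so the feature is more worth keeping). The paper's experiments were done in RStudio; this pure-Python version, its toy dataset and the ranking step are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def chi_square_score(feature, labels):
    """Chi-square statistic for one categorical feature vs. the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # contingency-table cell counts
    f_counts = Counter(feature)                # row marginals
    l_counts = Counter(labels)                 # column marginals
    score = 0.0
    for f in f_counts:
        for l in l_counts:
            # Expected count under independence of feature and label
            expected = f_counts[f] * l_counts[l] / n
            score += (observed[(f, l)] - expected) ** 2 / expected
    return score

# Toy data (hypothetical): feature A tracks the label, feature B is noise.
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]
feat_a = [1, 1, 0, 0, 1, 0]   # informative
feat_b = [1, 0, 1, 0, 0, 1]   # uninformative

scores = {"A": chi_square_score(feat_a, labels),
          "B": chi_square_score(feat_b, labels)}
# Rank features by score; a selector would keep the top-scoring ones.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['A', 'B'] -- A scores higher than B
```

Boruta, the second method, works differently: it repeatedly compares each feature's random-forest importance against randomly shuffled "shadow" copies of the features, which is why it tends to retain all genuinely relevant features rather than a fixed top-k.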
Keywords: classification; feature selection; Boruta; Chi-square; ensemble classifiers.
International Journal of Intelligent Systems Technologies and Applications, 2018 Vol.17 No.1/2, pp.98 - 108
Received: 14 Feb 2017
Accepted: 03 May 2017
Published online: 03 May 2018