Title: Effective hybrid feature subset selection for multilevel datasets using decision tree classifiers

Authors: S. Dinakaran; P. Ranjit Jeba Thangaiah

Addresses: T. John College, Bangalore, India; Department of Information Technology, Karunya Institute of Technology and Sciences, Coimbatore, India

Abstract: Feature selection is one of the most significant procedures in machine learning. It is particularly useful for improving the performance and prediction accuracy of complex data classification. This paper discusses a hybrid feature selection technique used with decision tree-based classification algorithms. The features selected using information gain (IG) are combined with the features selected by ReliefF to generate the resultant feature subset. The resultant feature subset is then combined with a correlation-based feature selection (CFS) method to generate the aggregated feature subset. To evaluate classification accuracy on the aggregated feature subset, different decision tree-based classification algorithms, such as C4.5, decision stump, naïve Bayes tree, and random forest, have been deployed with ten-fold cross-validation. To check the prediction accuracy of the proposed work, eight different multilevel University of California, Irvine (UCI) machine learning datasets, ranging from the minimum to the maximum number of features, have been used. The main objective of the hybrid feature selection is to improve classification accuracy and prediction and to reduce execution time on standard datasets.

Keywords: feature selection; decision tree; information gain; ReliefF; correlation-based feature selection; CFS; naïve Bayes tree; random forest; C4.5; decision stump; exclusive OR; intersection; ranker.
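
The two-stage combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which set operation joins the subsets (exclusive OR and intersection appear among the keywords), so the `how` parameter, the function names, and the toy data are all assumptions. Only information gain is computed here; the ReliefF and CFS subsets are hypothetical placeholders standing in for the output of those selectors.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column w.r.t. the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_ig(rows, labels, k):
    """Rank feature indices by information gain and keep the k best."""
    scores = {
        j: information_gain([row[j] for row in rows], labels)
        for j in range(len(rows[0]))
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def combine(first, second, how="union"):
    """Join two feature subsets; the choice of operator is an assumption."""
    if how == "union":
        return first | second
    if how == "intersection":
        return first & second
    if how == "xor":
        return first ^ second
    raise ValueError(f"unknown combination rule: {how}")


# Toy data (hypothetical): feature 0 determines the class, 1 and 2 do not.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
labels = [0, 0, 1, 1]

ig_subset = top_k_by_ig(rows, labels, k=1)
relieff_subset = {0, 1}  # placeholder for a ReliefF selection
cfs_subset = {0}         # placeholder for a CFS selection

# Stage 1: IG with ReliefF gives the resultant subset.
resultant = combine(ig_subset, relieff_subset, how="union")
# Stage 2: resultant with CFS gives the aggregated subset.
aggregated = combine(resultant, cfs_subset, how="intersection")
```

In practice the aggregated subset would then be passed to each decision tree classifier and scored with ten-fold cross-validation, as the abstract describes.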

DOI: 10.1504/IJAIP.2023.128082

International Journal of Advanced Intelligence Paradigms, 2023 Vol.24 No.1/2, pp.206 - 228

Accepted: 09 Jun 2019
Published online: 05 Jan 2023
