Title: Hybrid feature selection methods for high-dimensional multi-class datasets

Authors: Amit Kumar Saxena; Vimal Kumar Dubey; John Wang

Addresses: Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, 495009, India; Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, 495009, India; Department of Information Management and Business Analysis, Montclair State University, Montclair, NJ 07043, USA

Abstract: Hybrid methods are important for feature selection when classifying high-dimensional datasets. In this paper, we propose two hybrid methods that combine filter-based feature selection with a genetic algorithm and with sequential random search. The first method hybridises information gain and a genetic algorithm. The features are first ranked by information gain, and a user-defined number of top-ranked features is retained. The genetic algorithm is then applied to these retained features to select an optimal feature subset, using two types of fitness function: single-objective and multi-objective. The second method hybridises information gain and sequential random K-nearest neighbour (SRKNN). Here, information gain is again used to rank the features, and a user-defined number of top-ranked features is retained. A population of binary strings, each spanning all the features retained by the user, is generated, and a sequential search is applied to each member of the population to maximise classification accuracy. Both methods are applied to 21 high-dimensional multi-class datasets. The results show that the first method performs well on some datasets and the second method performs well on others. The results of the proposed methods are also compared with those reported for other methods.
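
The filter stage shared by both methods (rank features by information gain, then keep a user-defined number of top-ranked features) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete feature values, uses the standard definition of information gain (class entropy minus conditional class entropy given the feature), and the function names `entropy`, `information_gain`, `rank_features` and `top_k_features`, as well as the toy data, are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """Information gain of one discrete feature column w.r.t. the labels."""
    n = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: weighted average entropy within each group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return feature indices sorted by decreasing information gain."""
    n_features = len(rows[0])
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)


def top_k_features(rows, labels, k):
    """Keep the k top-ranked features (k is the user-defined count)."""
    return rank_features(rows, labels)[:k]


# Toy three-class data (hypothetical): feature 0 determines the class,
# feature 1 is constant, feature 2 is only partly informative.
rows = [(0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 0), (2, 1, 1), (2, 1, 1)]
labels = ["a", "a", "b", "b", "c", "c"]
selected = top_k_features(rows, labels, 2)
```

In a hybrid scheme of the kind the abstract describes, the indices in `selected` would define the reduced search space handed to the wrapper stage (the genetic algorithm or the sequential search). Continuous features would need to be discretised first, which this sketch does not attempt.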

Keywords: intelligent mining; high-dimensional dataset; genetic algorithm; filter approach; information gain; classification.

DOI: 10.1504/IJDMMM.2017.088411

International Journal of Data Mining, Modelling and Management, 2017 Vol.9 No.4, pp.315 - 339

Received: 04 Aug 2016
Accepted: 20 Jan 2017

Published online: 06 Dec 2017
