Int. J. of Data Mining, Modelling and Management   »   2017 Vol.9, No.4

 

 

Title: Hybrid feature selection methods for high-dimensional multi-class datasets

 

Authors: Amit Kumar Saxena; Vimal Kumar Dubey; John Wang

 

Addresses:
Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, 495009, India
Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya, Bilaspur, 495009, India
Department of Information Management and Business Analysis, Montclair State University, Montclair, NJ 07043, USA

 

Abstract: Hybrid methods are important for feature selection when classifying high-dimensional datasets. In this paper, we propose two hybrid methods that combine filter-based feature selection with a genetic algorithm and with sequential random search. The first method hybridises information gain and a genetic algorithm: the features are first ranked by information gain, and a user-defined number of top-ranked features is retained. A genetic algorithm is then applied to these retained features to select an optimal feature subset, using two types of fitness function, single-objective and multi-objective. The second method hybridises information gain and sequential random K-nearest neighbour (SRKNN). Here, information gain is again used to rank the features, and a user-defined number of top-ranked features is retained. A population of binary strings (each covering all the user-selected features) is generated, and a sequential search is applied to each member of the population to maximise classification accuracy. Both methods are applied to 21 high-dimensional multi-class datasets. The results show that the first method performs well on some datasets and the second method performs well on others. The results obtained by the proposed methods are compared with those reported for other methods.
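The two-stage idea in the abstract (rank by information gain, keep a user-defined number of features, then search binary masks over them for higher classification accuracy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, the leave-one-out 1-nearest-neighbour fitness, and the single-pass bit-flip search are all assumptions made for illustration, and the genetic algorithm and the actual SRKNN procedure are not reproduced here.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column with respect to the labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_information_gain(rows, labels, k):
    """Indices of the k features with the highest information gain (filter stage)."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k]


def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (assumed fitness)."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def sequential_flip_search(rows, labels, candidates, rng):
    """Hypothetical search: start from a random binary mask over the candidate
    features, flip each bit once in order, and keep a flip only if accuracy improves."""
    mask = [rng.random() < 0.5 for _ in candidates]

    def selected(m):
        return [f for f, keep in zip(candidates, m) if keep]

    best = loo_1nn_accuracy(rows, labels, selected(mask))
    for pos in range(len(candidates)):
        mask[pos] = not mask[pos]
        score = loo_1nn_accuracy(rows, labels, selected(mask))
        if score > best:
            best = score
        else:
            mask[pos] = not mask[pos]  # revert a flip that did not help
    return selected(mask), best


# Toy data (made up): feature 0 determines the class, features 2 and 3 are noise.
rows = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [2, 1, 1, 0],
    [2, 1, 0, 1],
]
labels = ["a", "a", "b", "b", "c", "c"]

candidates = top_k_by_information_gain(rows, labels, k=2)
subset, score = sequential_flip_search(rows, labels, candidates, random.Random(0))
print(candidates, subset, score)
```

The filter stage is cheap and shrinks the search space before the more expensive wrapper stage evaluates masks with a classifier, which is the general motivation for hybrid schemes of this kind.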

 

Keywords: intelligent mining; high-dimensional dataset; genetic algorithm; filter approach; information gain; classification.

 

DOI: 10.1504/IJDMMM.2017.10009449

 

Int. J. of Data Mining, Modelling and Management, 2017 Vol.9, No.4, pp.315 - 339

 

Submission date: 04 Aug 2016
Date of acceptance: 20 Jan 2017
Available online: 01 Dec 2017

 

 
