Title: Pre-processing of microarray gene expression data for classification using adaptive feature selection and imputation of non-ignorable missing values

Authors: R. Devi Priya; R. Sivaraj

Addresses: Department of Information Technology, Kongu Engineering College, Erode, Tamil Nadu, India; Department of Computer Science and Engineering, Velalar College of Engineering and Technology, Erode, Tamil Nadu, India

Abstract: Microarray datasets often contain many features and missing values. To address both issues, this paper introduces Genetic Algorithm-Based Adaptive Feature Selection with Missing value Imputation (GAFSMI), which makes two contributions. First, Genetic Algorithm-Based Adaptive Feature Selection (GAFS) is proposed to identify the noteworthy features. Second, the Bayesian Genetic Algorithm (BAGEL), which integrates a genetic algorithm with Bayesian principles, is introduced to impute the non-ignorable missing values. Together, these two pre-processing steps produce a complete dataset with an optimal feature subset, so that classification can be performed with better accuracy. The proposed algorithm is implemented on eight microarray datasets. GAFS is observed to select an optimal feature subset with better classification accuracy than other feature selection techniques, and the imputation accuracy of BAGEL is found to be better than that of other standard imputation techniques at missing rates from 5% to 40%. Classification accuracy improves on all the datasets processed with GAFS and BAGEL.

Keywords: microarray datasets; feature selection; missing values; genetic algorithms; classification accuracy; pre-processing; gene expression data; bioinformatics; imputation accuracy.

DOI: 10.1504/IJDMB.2016.080670

International Journal of Data Mining and Bioinformatics, 2016 Vol. 16, No. 3, pp. 183-204

Received: 14 Nov 2015
Accepted: 11 Sep 2016

Published online: 01 Dec 2016
