Title: Usage of ensemble model and genetic algorithm in pipeline for feature selection from cancer microarray data

Authors: Sahu Barnali; Dehuri Satchidananda; Jagadev Alok Kumar

Addresses: Department of Computer Science and Engineering, Siksha 'O' Anusandhan (deemed to be) University, Bhubaneswar, 751030, Odisha, India; P.G. Department of Information and Communication Technology, Fakir Mohan University, Vyasa Vihar, Balasore, 756019, Odisha, India; School of Computer Engineering, Kalinga Institute of Industrial Technology (deemed to be) University, Bhubaneswar, 751024, Odisha, India

Abstract: This paper proposes an ensemble of feature selection techniques with a genetic algorithm (GA) in a pipeline for selecting features from microarray data. The ensemble combines filter-based and wrapper-based feature selection methods, and the GA in the pipeline refines the ensemble output to produce a non-local, robust feature subset. An extensive computational experiment has been carried out on a prostate cancer dataset to validate the method and to compare it with the group genetic algorithm (GGA). Finally, the feature subsets produced by the GA, the GGA, and the other constituents of the ensemble in standalone mode have been mined for frequent patterns using Apriori and FP-growth. The experimental study shows that the proposed method achieves classification accuracies of 100%, 98.34%, 98.02%, and 97% with an ensemble of classifiers for 5, 10, 15, and 20 features, respectively, compared with 92.34%, 90.34%, 86.54%, and 87.21% for GGA.
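The frequent-pattern step mentioned in the abstract can be pictured by treating each method's selected feature subset as one "transaction" and mining genes that recur across methods. The sketch below is a minimal, textbook level-wise Apriori in plain Python, offered only as an illustration under assumptions: the gene IDs (`g1`, `g2`, ...), the five toy subsets, and the `min_support` threshold are all hypothetical and are not taken from the paper's prostate cancer data or the authors' implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic level-wise Apriori.

    Returns {frozenset(itemset): support_count} for every itemset that
    appears in at least `min_support` transactions.
    """
    sets = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in sets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: merge pairs of frequent (k-1)-itemsets into size-k candidates.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Support counting: one pass over the transactions per level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in sets if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical feature subsets returned by five selectors (toy gene IDs).
subsets = {
    "selector_a": {"g1", "g2", "g3"},
    "selector_b": {"g1", "g2", "g4"},
    "selector_c": {"g1", "g2", "g3", "g5"},
    "selector_d": {"g2", "g3"},
    "selector_e": {"g1", "g4"},
}

result = apriori(subsets.values(), min_support=3)
for itemset, count in sorted(result.items(),
                             key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
# ['g1'] 4
# ['g2'] 4
# ['g1', 'g2'] 3
# ['g2', 'g3'] 3
# ['g3'] 3
```

FP-growth returns the same set of frequent itemsets for the same threshold; it differs only in how support is counted (a compressed prefix tree instead of repeated candidate generation), which matters mainly when there are many transactions or long itemsets.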

Keywords: microarray data; differentially expressed genes; ensemble feature selection; Apriori; FP-growth.

DOI: 10.1504/IJBRA.2020.109100

International Journal of Bioinformatics Research and Applications, 2020 Vol.16 No.3, pp.217 - 244

Received: 16 Mar 2017
Accepted: 22 Jan 2018

Published online: 14 Aug 2020
