Title: Ensemble classifier design selecting important genes based on extracted features

Authors: Soumen Kumar Pati; Asit Kumar Das

Addresses: Department of Information Technology, St. Thomas' College of Engineering and Technology, 4, D.H. Road, Kolkata, India; Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India

Abstract: The performance of an ensemble classifier depends heavily on the nature of the dataset, and classifier efficiency degrades sharply in the presence of irrelevant features. Because of the distinct characteristics inherent to specific cancers, selecting the most informative genes from a high-volume microarray dataset is a challenging bioinformatics research topic. In this paper, the informative genes are selected based on prominent features generated using statistical and probabilistic concepts. The selected genes are then supplied to a genetic algorithm that selects an appropriate combination of classifiers, with many classifiers of distinct characteristics considered as base classifiers. In this algorithm, non-linear uniform cellular automata generate the initial population, multipoint crossover and a unique jumping gene mechanism for mutation are used to preserve diversity in the population, and a steady-state fitness function is introduced to obtain maximum accuracy with a minimum number of classifiers. The performance of the proposed method is compared with state-of-the-art algorithms to demonstrate its effectiveness.
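
As a rough illustration of the classifier-selection step, the sketch below encodes an ensemble as a bit mask over the base classifiers and shows a generic multipoint crossover and a size-penalised fitness. It is not the authors' implementation: the names `multipoint_crossover`, `fitness`, `accuracy_of` and `weight`, and the weighted-sum form of the fitness, are assumptions made for illustration, and the cellular-automata initialisation and jumping gene mutation are omitted because the abstract does not specify them.

```python
def multipoint_crossover(parent_a, parent_b, points):
    """Swap alternating segments of two equal-length bit lists at the cut points."""
    child_a, child_b = [], []
    swap = False
    prev = 0
    for cut in sorted(points) + [len(parent_a)]:
        seg_a, seg_b = parent_a[prev:cut], parent_b[prev:cut]
        if swap:
            # Every second segment is exchanged between the two children.
            seg_a, seg_b = seg_b, seg_a
        child_a += seg_a
        child_b += seg_b
        swap = not swap
        prev = cut
    return child_a, child_b


def fitness(mask, accuracy_of, weight=0.9):
    """Hypothetical fitness: reward ensemble accuracy, penalise ensemble size.

    mask        -- list of 0/1 flags, one per base classifier
    accuracy_of -- caller-supplied function mapping a mask to accuracy in [0, 1]
    weight      -- assumed trade-off between accuracy and compactness
    """
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0  # an empty ensemble cannot classify anything
    size_term = 1.0 - n_selected / len(mask)
    return weight * accuracy_of(mask) + (1.0 - weight) * size_term
```

In use, `accuracy_of` would wrap whatever validation procedure scores the selected classifiers on the chosen genes; the weighted sum is only one of several ways to favour high accuracy with few classifiers.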

Keywords: ensemble classifier; informative gene; bioinformatics research; statistical concept; probabilistic concept; genetic algorithm; cellular automata; jumping gene mutation; microarray dataset.

DOI: 10.1504/IJDMB.2017.089282

International Journal of Data Mining and Bioinformatics, 2017 Vol.19 No.2, pp.117 - 149

Accepted: 05 Sep 2017
Published online: 11 Jan 2018
