Title: Cuckoo search optimisation for feature selection in cancer classification: a new approach

Authors: C. Gunavathi; K. Premalatha

Addresses: Department of Computer Science and Engineering, K.S. Rangasamy College of Technology, Tiruchengode, Tamil Nadu, India; Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India

Abstract: The Cuckoo Search (CS) optimisation algorithm is applied to feature selection for cancer classification using microarray gene expression data. Because gene expression datasets contain thousands of genes but only a small number of samples, feature selection can be used to identify informative genes and thereby improve classification accuracy. The genes are first ranked by their T-statistic, Signal-to-Noise Ratio (SNR) and F-statistic values, and CS is then used to find the informative genes among the top-m ranked genes. The classification accuracy of the k-Nearest Neighbour (kNN) classifier serves as the fitness function for CS. The proposed method is evaluated on ten cancer gene expression datasets. The results show that CS achieves 100% average accuracy on the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets and outperforms existing techniques on the DLBCL outcome and prostate datasets.
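The ranking step can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this example assumes common textbook forms: the Golub-style SNR (μ1 − μ2)/(σ1 + σ2), the Welch two-sample t-statistic, and the one-way ANOVA F-statistic. The function names `snr`, `t_stat`, `f_stat` and `top_m_genes` are hypothetical, and the code is not the authors' implementation. Each gene (column of `X`) is scored independently and the indices of the m highest-scoring genes are returned.

```python
import math
from statistics import mean, stdev

# Assumed formulas; the paper's exact definitions may differ.

def snr(a, b):
    """Golub-style signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))

def t_stat(a, b):
    """Welch two-sample t-statistic (an assumed variant of the T-statistic)."""
    return (mean(a) - mean(b)) / math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))

def f_stat(groups):
    """One-way ANOVA F-statistic for a list of per-class value lists."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_m_genes(X, y, m, score="snr"):
    """Hypothetical helper: indices of the m top-ranked genes.

    X is a list of samples (each a list of expression values, one per gene),
    y the class labels. "snr" and "t" assume two classes and rank by
    absolute value; "f" handles any number of classes.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[x[j] for x, c in zip(X, y) if c == cls] for cls in classes]
        if score == "f":
            scores.append(f_stat(groups))
        elif score == "t":
            scores.append(abs(t_stat(groups[0], groups[1])))
        else:
            scores.append(abs(snr(groups[0], groups[1])))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]
```

This sketch assumes at least two samples per class and non-constant genes; a gene with zero variance in both classes would cause a division by zero and would need to be filtered out beforehand.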
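The search step, with kNN accuracy as the fitness, can be sketched as follows. This is a generic binary adaptation of Cuckoo Search (Lévy flights generated with Mantegna's algorithm, plus abandonment of a fraction `pa` of the worst nests), not necessarily the variant used in the paper. Several details are assumptions: a gene is selected when its continuous nest coordinate is positive, fitness is leave-one-out accuracy of a 1-NN classifier with Euclidean distance, and the parameter values (`n_nests`, `pa`, `alpha`, `iters`) are illustrative. The names `knn_loocv_accuracy`, `levy_step` and `cuckoo_search` are hypothetical; `candidates` stands for the indices of the top-m ranked genes.

```python
import math
import random

def knn_loocv_accuracy(X, y, features, k=1):
    """Leave-one-out accuracy of k-NN restricted to the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance to every other sample, paired with its label.
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), y[j])
            for j in range(len(X)) if j != i
        )
        votes = {}
        for _, label in dists[:k]:
            votes[label] = votes.get(label, 0) + 1
        if max(votes, key=votes.get) == y[i]:
            correct += 1
    return correct / len(X)

def levy_step(rng, beta=1.5):
    """One Levy-distributed step drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0, sigma) / abs(rng.gauss(0, 1)) ** (1 / beta)

def cuckoo_search(X, y, candidates, n_nests=10, pa=0.25, alpha=0.5,
                  iters=50, k=1, seed=0):
    """Binary CS sketch: returns (selected gene indices, best fitness)."""
    rng = random.Random(seed)
    d = len(candidates)

    def decode(pos):  # assumed encoding: coordinate > 0 means "gene selected"
        return [candidates[j] for j in range(d) if pos[j] > 0]

    def fitness(pos):
        return knn_loocv_accuracy(X, y, decode(pos), k)

    nests = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_nests)]
    fits = [fitness(p) for p in nests]
    b = max(range(n_nests), key=fits.__getitem__)
    best, best_fit = nests[b][:], fits[b]

    for _ in range(iters):
        # Levy flight for each cuckoo, scaled by its distance from the best nest;
        # the new egg replaces a randomly chosen nest if it is fitter.
        for i in range(n_nests):
            new = [x + alpha * levy_step(rng) * (x - bx) for x, bx in zip(nests[i], best)]
            j = rng.randrange(n_nests)
            f_new = fitness(new)
            if f_new > fits[j]:
                nests[j], fits[j] = new, f_new
        # Abandon a fraction pa of the worst nests and build new random ones.
        for i in sorted(range(n_nests), key=fits.__getitem__)[:int(pa * n_nests)]:
            nests[i] = [rng.uniform(-1, 1) for _ in range(d)]
            fits[i] = fitness(nests[i])
        b = max(range(n_nests), key=fits.__getitem__)
        if fits[b] > best_fit:
            best, best_fit = nests[b][:], fits[b]
    return decode(best), best_fit
```

Keeping the global best separately means an abandoned nest can never discard the best subset found so far. In practice, `candidates` would be the output of the ranking step and `X` the full expression matrix.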

Keywords: microarray technology; microarray gene expression; cancer classification; bioinformatics; gene selection; feature selection; cuckoo search; T-statistics; SNR; signal-to-noise ratio; F-statistics; kNN; k-nearest neighbour.

DOI: 10.1504/IJDMB.2015.072092

International Journal of Data Mining and Bioinformatics, 2015 Vol.13 No.3, pp.248 - 265

Accepted: 14 Jan 2015
Published online: 30 Sep 2015
