Authors: Abdulrauf Garba Sharifai; Zurinahni Zainol
Addresses: Department of Computer Sciences, Yusuf Maitama Sule University, Kano, Nigeria and School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia (Sharifai); School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia (Zainol)
Abstract: Microarray data analysis is notoriously challenging because such data sets comprise a very large number of genes but only a small number of samples. Various gene selection methods have been proposed; however, most existing methods focus predominantly on selecting relevant gene subsets, and the selected genes often include redundant genes in the training data, which may reduce the performance and increase the complexity of the learning algorithm. This paper proposes a Correlation-Based Redundancy Multiple Filter Approach (CBRMFA). Three filter methods with diverse classification abilities are employed to select relevant genes. The top N genes with the highest ranking scores are combined to form a new ranking list. Correlation-based redundancy analysis is then used to eliminate redundant genes, and finally a wrapper approach based on sequential forward search is used to select the optimal gene subset. Experimental results show that CBRMFA achieved outstanding results on three performance measures compared with state-of-the-art algorithms.
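The four-stage pipeline summarised in the abstract (multiple filters, top-N combination, correlation-based redundancy removal, sequential forward search) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the three filters, the value of N, the redundancy threshold or the classifier used inside the wrapper. The Welch t-statistic, two-class Fisher score and absolute label correlation used as filters, the 0.9 absolute-Pearson threshold, the leave-one-out nearest-centroid classifier, and all function names are therefore hypothetical choices. Labels are assumed to be binary (0/1).

```python
import numpy as np

# Three illustrative filters (HYPOTHETICAL choices; the abstract does not name
# the paper's filters). Each returns one score per gene; higher = more relevant.
def welch_t(X, y):
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def fisher_score(X, y):
    a, b = X[y == 0], X[y == 1]
    gap = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    return gap / (a.var(axis=0) + b.var(axis=0) + 1e-12)

def label_correlation(X, y):
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def combined_top_n(X, y, n):
    """Union of each filter's top-n genes, ordered by the best rank any filter gave."""
    best = {}
    for score in (welch_t, fisher_score, label_correlation):
        for rank, g in enumerate(np.argsort(-score(X, y))[:n]):
            best[int(g)] = min(rank, best.get(int(g), n))
    return sorted(best, key=lambda g: (best[g], g))

def remove_redundant(X, ranked, threshold=0.9):
    """Keep a gene only if |Pearson r| with every already-kept gene is below threshold."""
    kept = []
    for g in ranked:
        if all(abs(np.corrcoef(X[:, g], X[:, k])[0, 1]) < threshold for k in kept):
            kept.append(g)
    return kept

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (HYPOTHETICAL wrapper model)."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        Xt, yt = X[mask], y[mask]
        d0 = np.linalg.norm(X[i] - Xt[yt == 0].mean(axis=0))
        d1 = np.linalg.norm(X[i] - Xt[yt == 1].mean(axis=0))
        correct += int((d1 < d0) == (y[i] == 1))
    return correct / len(y)

def sequential_forward_search(X, y, candidates):
    """Greedily add the gene that most improves accuracy; stop when no gene improves it."""
    selected, best_acc, remaining = [], 0.0, list(candidates)
    while remaining:
        scores = {g: loo_accuracy(X[:, selected + [g]], y) for g in remaining}
        g = max(remaining, key=scores.get)  # first (best-ranked) gene wins ties
        if scores[g] <= best_acc:
            break
        selected.append(g)
        remaining.remove(g)
        best_acc = scores[g]
    return selected, best_acc

# Synthetic usage: 20 samples x 50 genes, gene 0 informative, gene 1 a near-copy of it.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 50))
X[:, 0] += 5 * y
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=20)

ranked = combined_top_n(X, y, n=10)
kept = remove_redundant(X, ranked, threshold=0.9)
subset, acc = sequential_forward_search(X, y, kept)
print("kept:", kept)
print("selected:", subset, "LOO accuracy:", acc)
```

Because the redundancy step keeps a gene only when it is weakly correlated with every higher-ranked gene already kept, the near-duplicate of gene 0 in this synthetic example should be discarded before the wrapper runs. This leaves the more expensive sequential forward search a shorter candidate list, which is the motivation the abstract gives for removing redundant genes.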
Keywords: multi-filter; correlation-based redundancy; ensemble method; microarray data set; gene selection.
International Journal of Data Mining and Bioinformatics, 2020, Vol. 23, No. 1, pp. 62–78
Received: 03 Oct 2019
Accepted: 13 Nov 2019
Published online: 28 Feb 2020