Title: A new data-driven method for microarray data classification

Authors: Ganeshkumar Pugalendhi; Ammu Vijayakumar; Ku-Jin Kim

Addresses: School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, South Korea; Department of Computer Science and Engineering, Erode Builder Educational Trust's Group of Institutions, Kangayam 638108, Tamil Nadu, India; School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, South Korea

Abstract: Knowledge gained through the classification of microarray gene expression data is increasingly important because it is useful for the phenotype classification of diseases. In this paper, we propose a rule-based approach called 'Large Coverage Rule' for microarray data classification. The proposed approach is parameter-free and data-driven, and it constructs decision rules based on the expression values of a gene. A simple Rank-Based Scoring algorithm is proposed for selecting informative genes. The performance of the proposed approach is evaluated using ten publicly available gene expression data sets. The simulation results show that the proposed approach generates compact rules and achieves classification accuracy that compares favourably with that of the other approaches. A gene ontology based biological semantic analysis is also carried out to analyse the informative genes. Statistical analysis of the test results shows that the generated rules are simple to interpret, highly comprehensible and classify microarray data accurately.

Keywords: microarray data classification; rank-based scoring; decision rules; large coverage rule; weighted voting; gene expression data; bioinformatics; simulation; gene ontology; biological semantics; informative genes.

DOI: 10.1504/IJDMB.2016.076532

International Journal of Data Mining and Bioinformatics, 2016 Vol.15 No.2, pp.101 - 124

Accepted: 29 Dec 2015
Published online: 11 May 2016
