Title: A semi-supervised rough set and random forest approach for pattern classification of gene expression data

Authors: Pradeep Kumar Mallick; Debahuti Mishra; Srikanta Patnaik; Kailash Shaw

Addresses: Department of Computer Science and Engineering, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, India; Department of Computer Science and Engineering, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, India; Department of Computer Science and Engineering, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, India; D.Y. Patil College of Engineering, Akurdi, Pune, India

Abstract: In this paper, we present a semi-supervised rough set-based random forest gene selection method for the classification of data patterns. The proposed method seeks to identify the genes of interest, known as significant genes, and to maximise the accuracy of the model along with the reduction percentage. The advantage of this approach is analysed through experiments on three benchmark datasets (leukaemia, colon cancer and SRBCT), and the results show improved accuracy over existing methods such as support vector machine, k-nearest neighbour and random forest. Finally, the performance of the selected significant genes is measured using classifier validity and statistical measures. The experimental results and performance measures demonstrate the efficiency of the proposed hybridised technique over the traditional random forest method.

Keywords: gene selection; rough set theory; random forest; lower approximation; importance score; semi-supervised rough sets; pattern classification; gene expression data; bioinformatics; leukaemia; colon cancer.

DOI: 10.1504/IJRIS.2016.082976

International Journal of Reasoning-based Intelligent Systems, 2016 Vol.8 No.3/4, pp.155 - 167

Received: 10 May 2016
Accepted: 24 Jul 2016

Published online: 17 Mar 2017
