Title: Development of a library with feature selection algorithm based on microarray gene expression dataset for biomarker identification

Authors: Sangbum Lee; Sejong Oh

Addresses: Department of Software Science, Dankook University, Yongin 16890, South Korea; Department of Software Science, Dankook University, Yongin 16890, South Korea

Abstract: Gene expression data are used to find significant genes related to a specific disease, such as lung cancer. These significant genes can serve as biomarkers for diagnosing the disease, and data mining techniques are useful in finding them. Feature selection and classification schemes are used extensively for this purpose. Because no single scheme works best for every dataset, researchers must test various combinations of data mining schemes to find the best biomarkers, a process that is tedious and requires considerable effort. In this study, we propose a software library that finds biomarker genes based on microarray datasets. The proposed library contains procedural steps to identify and test biomarker genes and is implemented as an R library for general use. This library, with its feature selection algorithm, helps researchers save the time and effort otherwise spent analysing and combining code to test their research ideas.

Keywords: classification; feature selection; biomarkers; data mining; R library; dimensionality; feature evaluation; microarray datasets; gene expression dataset; biomarker identification; bioinformatics; software library; biomarker genes.

DOI: 10.1504/IJDMB.2016.080041

International Journal of Data Mining and Bioinformatics, 2016 Vol.16 No.2, pp.93 - 110

Accepted: 11 Sep 2016
Published online: 29 Oct 2016
