Title: A comparative study of multiclass feature selection on RNAseq and microarray data

Authors: Silu Zhang; Junqing Wang; Keli Xu; Megan M. York; Yin-yuan Mo; Yixin Chen; Yunyun Zhou

Addresses: Department of Computer and Information Science, University of Mississippi, Oxford, Mississippi, 38655, USA; Department of Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 20025, China; Department of Neurobiology and Anatomical Sciences, University of Mississippi Medical Center, Jackson, Mississippi, 39216, USA; John D. Bower School of Population Health, University of Mississippi Medical Center, Jackson, Mississippi, 39216, USA; Department of Pharmacology and Toxicology, University of Mississippi Medical Center, Jackson, Mississippi, 39216, USA; Department of Computer and Information Science, University of Mississippi, Oxford, Mississippi, 38655, USA; Department of Data Science, John D. Bower School of Population Health, University of Mississippi Medical Center, Jackson, Mississippi, 39216, USA

Abstract: Gene expression profiles are widely used to identify phenotype-specific biomarkers in clinical cancer research. Examining the important genes expressed in different phenotypes allows patients to be classified into different treatment groups. Microarray and RNAseq are the two leading technologies for measuring gene expression. However, because the two platforms are heterogeneous, the genes selected from each differ. In this project, we systematically compared the breast cancer subtype classification accuracies obtained from genes selected by four popular multiclass feature selection algorithms, and discussed the strengths and weaknesses of the selected genes across different platforms and cohorts. Our results showed that classification with the selected genes performs best within the same platform across different cohorts. This suggests that merging datasets from the same platform will increase statistical power and improve the prediction accuracy of the selected genes in multiclass classification analysis.

Keywords: Systems biology; feature selection; breast cancer; cancer subtypes; machine learning; functional analysis; integration analysis.

DOI: 10.1504/IJCBDD.2019.099764

International Journal of Computational Biology and Drug Design, 2019 Vol.12 No.2, pp.128 - 142

Received: 30 May 2018
Accepted: 21 Jun 2018

Published online: 11 May 2019
