Title: Estrogen receptor status prediction by gene component regression: a comparative study

Authors: Chi-Cheng Huang; Shih-Hsin Tu; Heng-Hui Lien; Jaan-Yeh Jeng; Jung-Sen Liu; Ching-Shui Huang; Liang-Chuan Lai; Eric Y. Chuang

Addresses: Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan; Cathay General Hospital SiJhih, New Taipei City, Taiwan; School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan; School of Medicine, Taipei Medical University, Taipei City, Taiwan ' School of Medicine, Taipei Medical University, Taipei City, Taiwan; Department of Surgery, Cathay General Hospital, Taipei City, Taiwan ' School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan; Department of Surgery, Cathay General Hospital, Taipei City, Taiwan ' Cathay General Hospital SiJhih, New Taipei City, Taiwan; School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan; School of Medicine, Taipei Medical University, Taipei City, Taiwan ' School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan; Department of Surgery, Cathay General Hospital, Taipei City, Taiwan ' School of Medicine, Taipei Medical University, Taipei City, Taiwan; Department of Surgery, Cathay General Hospital, Taipei City, Taiwan ' Graduate Institute of Physiology, National Taiwan University, Taipei City, Taiwan ' Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei City, Taiwan

Abstract: The aim of the study is to evaluate gene component analysis for microarray studies. Three dimension reduction strategies, Principal Component Regression (PCR), Partial Least Squares (PLS) and Reduced Rank Regression (RRR), were applied to a publicly available breast cancer microarray dataset, and the derived gene components were used for tumour classification by Logistic Regression (LR) and Linear Discriminant Analysis (LDA). The impact of gene selection/filtration was evaluated as well. We demonstrated that gene component classifiers could reduce the high dimensionality of gene expression data and the collinearity problem inherent in most modern microarray experiments. In our study, gene component analysis could discriminate Estrogen Receptor (ER) positive breast cancers from ER-negative cancers, and the proposed classifiers were successfully reproduced and projected onto an independent microarray dataset with high predictive accuracy.

Keywords: gene components; dimension reduction; principal component regression; PCR; partial least squares; PLS; reduced rank regression; RRR; microarrays; breast cancer; estrogen receptors; receptor status prediction; gene component regression; tumour classification; gene selection; gene filtration; gene expression data; collinearity; bioinformatics.

DOI: 10.1504/IJDMB.2014.059065

International Journal of Data Mining and Bioinformatics, 2014 Vol.9 No.2, pp.149 - 171

Received: 22 Feb 2012
Accepted: 02 Mar 2012

Published online: 02 Oct 2013
