Title: An integrated multivariate group sparse approach to identify differentially expressed genes of breast cancer data
Authors: N.A.D.N. Napagoda
Addresses: Department of Mathematical Sciences, Wayamba University of Sri Lanka, Kuliyapitiya, Sri Lanka; School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
Abstract: Identifying differentially expressed genes plays an important role in disease diagnosis and prognosis. In this study, we use Student's t-statistic to analyse genes in publicly available breast cancer data. The different t values obtained for the same gene from multiple data sets cannot be used separately to identify cancer-related genes, and noise in gene expression data may affect the performance of the analysis. Therefore, we develop an Integrated Multivariate Group Sparse (IMGS) model based on the combined Student's t-statistics of independent multiple data sets. Stability selection is used to identify the optimal tuning parameter values in the IMGS method. We compare the performance of Student's t-statistic, GeneMeta, metaMa and the IMGS model on breast cancer genes, using reference genes from GWAS. According to the results, the IMGS model is a more appropriate statistical approach than the other three methods for identifying the most significant genes in multiple gene expression data sets.
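The input the abstract builds on is a set of Student's t values for each gene, one per data set. The sketch below is a minimal illustration, not the author's code: it assumes each data set supplies a tumour and a normal expression matrix with genes as rows in the same order across data sets, computes the pooled-variance two-sample t-statistic per gene, and stacks one column per data set. It does not implement the IMGS model, its combination step or stability selection; the function names `student_t` and `t_matrix` and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical helpers; assumes each data set is a (tumour, normal) pair of
# expression matrices with genes as rows, in the same gene order across sets.

def student_t(x_tumour, x_normal):
    """Pooled-variance two-sample Student's t-statistic for every gene (row)."""
    n1, n2 = x_tumour.shape[1], x_normal.shape[1]
    mean_diff = x_tumour.mean(axis=1) - x_normal.mean(axis=1)
    # Unbiased sample variances (ddof=1), pooled across both groups
    v1 = x_tumour.var(axis=1, ddof=1)
    v2 = x_normal.var(axis=1, ddof=1)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return mean_diff / np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

def t_matrix(datasets):
    """Stack per-data-set t vectors into an (n_genes, n_datasets) matrix."""
    return np.column_stack([student_t(tum, nor) for tum, nor in datasets])

# Synthetic example: 3 data sets, 100 genes; gene 0 is shifted up in tumour
rng = np.random.default_rng(0)
datasets = []
for n_tumour, n_normal in [(20, 15), (30, 25), (12, 10)]:
    tumour = rng.normal(size=(100, n_tumour))
    tumour[0] += 2.0
    normal = rng.normal(size=(100, n_normal))
    datasets.append((tumour, normal))

T = t_matrix(datasets)
print(T.shape)  # (100, 3)
print(T[0].round(2))  # t values of gene 0 in each data set
```

Ranking genes by one column at a time reproduces the separate per-data-set analysis that, according to the abstract, motivates an integrated model over the whole matrix.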
Keywords: differentially expressed genes; GeneMeta; IMGS; metaMa; stability selection; Student's t-statistic.
International Journal of Data Mining and Bioinformatics, 2019 Vol.22 No.2, pp.149 - 170
Accepted: 01 Apr 2019
Published online: 18 May 2019