Title: Identify differentially expressed genes with large background samples
Authors: Jennifer Fowler; Jonathan Stubblefield; Jason Causey; Jake Qualls; Wei Dong; Hongmei Jiang; Karl Walker; Yuanfang Guan; Xiuzhen Huang
Addresses: Molecular Biosciences Graduate Program, Arkansas State University, Jonesboro, AR, 72401, USA (ORCID: 0000-0001-5132-8872); Arkansas Biosciences Institute, Department of Computer Science, Center for No-Boundary Thinking, Arkansas State University, Jonesboro, AR, 72401, USA; Department of Computer Science, Center for No-Boundary Thinking, Arkansas State University, Jonesboro, AR, 72401, USA; Department of Computer Science, Center for No-Boundary Thinking, Arkansas State University, Jonesboro, AR, 72401, USA; Ann Arbor Algorithm, Ann Arbor, Michigan, 48104, USA; Department of Statistics, Northwestern University, Evanston, IL, 60208, USA; Computer Science & Math, University of Arkansas, Pine Bluff, AR, 71601, USA; Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA; Department of Computer Science, Center for No-Boundary Thinking, Arkansas State University, Jonesboro, AR, 72401, USA
Abstract: Identifying differentially expressed genes related to diseases is important but challenging. The challenges include the inherently noisy nature of the collected data and the imbalance between the very large number of genes and the relatively small number of collected study samples. To address some of these challenges, we implemented AUCg (Area Under the Curve gene ranking). The novelty of this implementation is that it utilises not only the information in the study samples but also the large number of publicly available gene expression samples as "background". We applied AUCg to a private dataset of 217 multiple myeloma samples, comparing them against 36,754 publicly available gene expression samples. The analysis identified genes that are potentially unique to multiple myeloma. The AUCg gene ranking method can be applied to the study of many other cancers and human diseases, taking advantage of large publicly available datasets.
Keywords: genes; gene expression; samples; differentially expressed genes; multiple myeloma.
DOI: 10.1504/IJCBDD.2021.121615
International Journal of Computational Biology and Drug Design, 2021 Vol.14 No.6, pp.411 - 428
Received: 23 Jul 2021
Accepted: 28 Sep 2021
Published online: 21 Mar 2022
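
The abstract describes ranking genes by an area-under-the-curve score that contrasts study samples with a large background set. The sketch below illustrates that general idea only; it is not the authors' implementation, whose details are not given here. It assumes the per-gene score is the standard rank-based AUC (the probability that a randomly chosen study sample has higher expression than a randomly chosen background sample, with ties counted as one half). The function names, gene labels, and expression values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auc_study_vs_background(study: Sequence[float],
                            background: Sequence[float]) -> float:
    """Rank-based AUC: P(study > background) + 0.5 * P(tie).

    Computed from the Mann-Whitney U statistic with average ranks for ties.
    This scoring rule is an assumption made for illustration.
    """
    if len(study) == 0 or len(background) == 0:
        raise ValueError("both groups need at least one sample")

    # Pool the values, remembering which ones came from the study group.
    pooled = sorted([(v, 1) for v in study] + [(v, 0) for v in background],
                    key=lambda pair: pair[0])

    rank_sum_study = 0.0
    i, n = 0, len(pooled)
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_study += avg_rank * sum(label for _, label in pooled[i:j])
        i = j

    n_s, n_b = len(study), len(background)
    u_stat = rank_sum_study - n_s * (n_s + 1) / 2.0
    return u_stat / (n_s * n_b)


def rank_genes_by_auc(
    expression: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Return (gene, AUC) pairs sorted from highest to lowest AUC.

    `expression` maps a gene name to (study values, background values).
    """
    scored = [(gene, auc_study_vs_background(s, b))
              for gene, (s, b) in expression.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up expression values; not data from the paper.
    toy = {
        "GENE_A": ([9.0, 8.0, 10.0], [1.0, 2.0, 3.0, 2.0]),
        "GENE_B": ([2.0, 3.0, 2.0], [2.0, 3.0, 1.0, 2.0]),
        "GENE_C": ([1.0, 1.0, 2.0], [5.0, 6.0, 7.0, 8.0]),
    }
    for gene, auc in rank_genes_by_auc(toy):
        print(gene, round(auc, 3))
```

Under this assumed scoring, an AUC near 1 marks a gene expressed more highly in the study samples than in the background, an AUC near 0 marks the reverse, and values near 0.5 indicate little separation. With tens of thousands of background samples, a vectorised implementation would likely be preferable to this pure-Python loop.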