Title: Deep learning-based classification and interpretation of gene expression data from cancer and normal tissues
Authors: TaeJin Ahn; Taewan Goo; Chan-Hee Lee; SungMin Kim; Kyullhee Han; Sangick Park; Taesung Park
Addresses: Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Interdisciplinary Program of Bioinformatics, Seoul National University, Gwanak-gu, Seoul, South Korea; Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Department of Statistics, Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-gu, Seoul, South Korea
Abstract: Deep learning has achieved outstanding performance on recognition and classification problems. As increasing amounts of gene expression data from cancer and normal samples become publicly available, deep learning may become an integral component of revealing specific patterns within massive data sets. We therefore aimed to assess the extent to which a deep learning model can learn to recognise cancer. We integrated gene expression data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Genotype-Tissue Expression (GTEx) databases, comprising gene expression data from 13,406 cancer and 12,842 normal samples across 24 different tissues. We first trained a Deep Neural Network (DNN) to distinguish cancer from normal samples using various gene selection strategies: genes of high expression or large variance, therapeutic target genes from commercial cancer panels, and genes in NCI-curated cancer pathways. We also suggest a systematic analysis method for interpreting trained deep neural networks, and we applied it to find the genes that contribute most to the classification of an individual sample as cancer.
Keywords: cancer; deep learning; gene expression; oncogene addiction.
DOI: 10.1504/IJDMB.2020.110155
International Journal of Data Mining and Bioinformatics, 2020 Vol.24 No.2, pp.121 - 139
Received: 04 Mar 2020
Accepted: 05 Mar 2020
Published online: 07 Oct 2020
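
The abstract mentions selecting genes of high expression or large variance before training the classifier. The following is a minimal sketch of how such a filter might look, not the authors' implementation: the function name `select_genes`, the use of mean expression as the "high expression" score, and the samples-by-genes matrix layout are all assumptions made for illustration, and the abstract does not state how many genes were kept or how the data were normalised.

```python
import numpy as np


def select_genes(expr, k, by="variance"):
    """Return the sorted column indices of the k top-scoring genes.

    Hypothetical helper for illustration only.
    expr : array-like of shape (n_samples, n_genes), expression values.
    k    : number of genes to keep.
    by   : "variance" (largest variance across samples) or
           "expression" (largest mean expression across samples).
    """
    expr = np.asarray(expr, dtype=float)
    if expr.ndim != 2:
        raise ValueError("expr must be a 2-D samples-by-genes matrix")
    if not 0 < k <= expr.shape[1]:
        raise ValueError("k must be between 1 and the number of genes")

    if by == "variance":
        score = expr.var(axis=0)   # per-gene variance across samples
    elif by == "expression":
        score = expr.mean(axis=0)  # per-gene mean across samples
    else:
        raise ValueError("by must be 'variance' or 'expression'")

    # Rank genes from highest to lowest score and keep the first k.
    top = np.argsort(score, kind="stable")[::-1][:k]
    return np.sort(top)
```

The returned indices can be used to slice the expression matrix (for example `expr[:, select_genes(expr, 1000)]`) before passing it to a classifier. The panel-based and pathway-based strategies named in the abstract would instead use a curated gene list, which is not reproduced here.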