Title: Deep learning-based classification and interpretation of gene expression data from cancer and normal tissues

Authors: TaeJin Ahn; Taewan Goo; Chan-Hee Lee; SungMin Kim; Kyullhee Han; Sangick Park; Taesung Park

Addresses: Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Interdisciplinary Program of Bioinformatics, Seoul National University, Gwanak-gu, Seoul, South Korea; Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Department of Life Science, Handong Global University, Pohang, Gyeongsang, South Korea; Department of Statistics, Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-gu, Seoul, South Korea

Abstract: Deep learning has achieved outstanding performance on recognition and classification problems. As increasing amounts of gene expression data from cancer and normal samples become publicly available, deep learning may become an integral tool for revealing specific patterns within massive data sets. We therefore aimed to determine the extent to which deep learning can learn to recognise cancer. We integrated gene expression data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Genotype-Tissue Expression (GTEx) databases, comprising 13,406 cancer and 12,842 normal gene expression profiles from 24 different tissues. We first trained a Deep Neural Network (DNN) to distinguish cancer from normal samples using various gene selection strategies: genes with high expression or large variance, therapeutic target genes from commercial cancer panels, and genes in NCI-curated cancer pathways. We also propose a systematic method for interpreting trained deep neural networks, and we applied it to identify the genes that contribute most to the cancer classification of an individual sample.
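The first gene selection strategy named in the abstract (keeping genes of high expression or large variance) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input layout (a dict from gene name to per-sample values), the function names `select_top_genes` and `subset_samples`, the cut-off `k`, and the toy gene names are all hypothetical, and the abstract does not say how expression was normalised or how the cut-off was chosen.

```python
from statistics import mean, pvariance
from typing import Dict, List

# Hypothetical layout: gene name -> expression values, one per sample,
# with every list in the same sample order.
Expr = Dict[str, List[float]]


def select_top_genes(expr: Expr, k: int, by: str = "variance") -> List[str]:
    """Return the k genes with the largest variance (or mean) across samples."""
    if by == "variance":
        score = {g: pvariance(v) for g, v in expr.items()}
    elif by == "mean":
        score = {g: mean(v) for g, v in expr.items()}
    else:
        raise ValueError("by must be 'variance' or 'mean'")
    # Rank by descending score; ties are broken by gene name so output is stable.
    ranked = sorted(score, key=lambda g: (-score[g], g))
    return ranked[:k]


def subset_samples(expr: Expr, genes: List[str]) -> List[List[float]]:
    """Build a sample-by-gene matrix restricted to the chosen genes."""
    n_samples = len(next(iter(expr.values())))
    return [[expr[g][i] for g in genes] for i in range(n_samples)]


# Toy data with made-up gene names and values, four samples each.
toy_expr: Expr = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],
    "GENE_B": [0.0, 4.0, 0.0, 4.0],
    "GENE_C": [5.0, 5.5, 5.0, 5.5],
    "GENE_D": [2.0, 3.0, 2.0, 3.0],
}

if __name__ == "__main__":
    print(select_top_genes(toy_expr, 2, by="variance"))  # ['GENE_B', 'GENE_D']
    print(select_top_genes(toy_expr, 2, by="mean"))      # ['GENE_C', 'GENE_D']
```

Gene lists from the other strategies (commercial cancer panels, NCI-curated pathways) would come from external sources and could be passed directly to `subset_samples`; the resulting sample-by-gene matrix is the kind of input a DNN classifier would be trained on.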

Keywords: cancer; deep learning; gene expression; oncogene addiction.

DOI: 10.1504/IJDMB.2020.110155

International Journal of Data Mining and Bioinformatics, 2020 Vol.24 No.2, pp.121 - 139

Received: 04 Mar 2020
Accepted: 05 Mar 2020

Published online: 07 Oct 2020