Title: Classification performance comparison of deep learning and classical data mining methods on RNA-Seq data set

Authors: Merve Kasikci; Erdal Coşgun; Erdem Karabulut

Addresses: Department of Biostatistics, School of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey; Microsoft Genomics Team, AI & Research, Seattle, Washington, USA; Department of Biostatistics, School of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey

Abstract: This study compares the performance of deep learning and classical classification methods on RNA-Seq data, one of the data sources used to investigate the relationship between diseases and genes. Two data sets with different characteristics are used. The first, a lung cancer data set, has two classes with balanced class ratios. The second, a renal cell carcinoma data set, has three imbalanced classes. Different gene filtering methods are applied to these data sets. The classification performance of random forest, support vector machines, artificial neural network and deep learning is evaluated on both data sets under each filter. Hyper-parameters are optimised for each classification method. In general, deep learning and support vector machines achieve the highest or second-highest values on performance measures such as accuracy, F-measure and Kappa coefficient. On the lung cancer data sets, which contain more genes and show a balanced class distribution, deep learning outperforms the classical classification methods and is the recommended choice.
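
The three performance measures named in the abstract can all be derived from a confusion matrix. The following is a minimal sketch in plain Python, not the authors' code: the function names and the example matrix are hypothetical, rows are assumed to be true classes and columns predicted classes, and the F-measure is assumed to be macro-averaged over classes (the abstract does not state which averaging was used).

```python
from typing import List

Matrix = List[List[int]]  # cm[i][j] = samples of true class i predicted as class j


def accuracy(cm: Matrix) -> float:
    """Share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_f_measure(cm: Matrix) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp  # predicted i, truly another class
        fn = sum(cm[i]) - tp                       # truly i, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


def cohen_kappa(cm: Matrix) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = accuracy(cm)
    # Chance agreement from the row (true) and column (predicted) marginals.
    p_e = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)
    ) / total ** 2
    if p_e == 1.0:  # degenerate case: every sample in a single class
        return 0.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical two-class example, not data from the study.
example = [[40, 10],
           [5, 45]]
print(accuracy(example), macro_f_measure(example), cohen_kappa(example))
```

Unlike accuracy, the Kappa coefficient discounts agreement expected by chance, which makes it the more informative of the two for imbalanced data such as the three-class renal cell carcinoma set.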

Keywords: RNA-Seq; cancer; data mining; classification methods; deep learning.

DOI: 10.1504/IJDMB.2021.126844

International Journal of Data Mining and Bioinformatics, 2021 Vol.26 No.3/4, pp.188 - 201

Received: 08 Feb 2022
Accepted: 14 Jul 2022

Published online: 08 Nov 2022
