Title: TISRover: ConvNets learn biologically relevant features for effective translation initiation site prediction

Authors: Jasper Zuallaert; Mijung Kim; Arne Soete; Yvan Saeys; Wesley De Neve

Addresses: Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon, 305-701, South Korea; IDLab, ELIS, Ghent University, Ghent, 9000, Belgium ' Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon, 305-701, South Korea; IDLab, ELIS, Ghent University, Ghent, 9000, Belgium ' VIB-UGent Center for Inflammation Research, Technologiepark 927, 9052 Zwijnaarde-Ghent, Belgium; Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium ' VIB-UGent Center for Inflammation Research, Technologiepark 927, 9052 Zwijnaarde-Ghent, Belgium; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281, S9, 9000 Ghent, Belgium ' Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon, 305-701, South Korea; IDLab, ELIS, Ghent University, Ghent, 9000, Belgium

Abstract: Being a key component in gene regulation, translation initiation is a well-studied topic. However, recent findings have shown translation initiation to be more complex than initially thought, calling for more effective prediction methods. In this paper, we present TISRover, a multi-layered convolutional neural network architecture for translation initiation site prediction. We achieve state-of-the-art results, outperforming a previous deep learning approach by 4% to 23% in terms of auPRC, and other approaches by at least 68% in terms of error rate. Furthermore, we present a methodology to analyse the decision-making process of our network models, revealing various biologically relevant features for translation initiation site prediction that are automatically learnt from scratch, without any prior knowledge. The most notable features found are the Kozak consensus sequence, the reading frame characteristics, the influence of stop and start codons in the sequence, and the presence of donor splice site patterns.

Keywords: convolutional neural networks; deep learning; genomics; model interpretation; model visualisation; translation initiation site prediction.

DOI: 10.1504/IJDMB.2018.094781

International Journal of Data Mining and Bioinformatics, 2018 Vol.20 No.3, pp.267 - 284

Received: 31 May 2018
Accepted: 12 Jun 2018

Published online: 13 Sep 2018
