Authors: Timothy Becker; Dong-Guk Shin
Addresses: Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA; Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA
Abstract: A single DNA alignment file can be resource intensive to visualise at arbitrary scale with current visualisation systems. We address this limitation by integrating a parallel out-of-core feature extraction algorithm with a disk-based hierarchical data store that is several orders of magnitude faster for visualisation tasks. To demonstrate the utility of our approach, we designed a high-performance web application that serves translated data to an interactive client. We incorporate novel visualisation of these data features, while allowing user-specified resolution and response. Unlike per-read techniques, which can run out of memory when displaying large-scale genomic variations, our data structure returns a controllable representation of the displayed region, making the technique ideally suited for visualisation of multiple large data sets. We describe our open-source feature extraction framework and web-based visualisation, and compare their performance with that of current systems.
Keywords: feature extraction; sequence alignment visualisation.
International Journal of Data Mining and Bioinformatics, 2020 Vol.23 No.4, pp.285 - 298
Received: 28 Mar 2020
Accepted: 02 Apr 2020
Published online: 27 Jul 2020