Title: On stabbing queries for generalised longest repeats

Authors: Bojian Xu

Addresses: Department of Computer Science, Eastern Washington University, Cheney, Washington 99004, USA

Abstract: A longest repeat query on a string, motivated by its applications in many subfields including computational biology, asks for the longest repetitive substring(s) covering a particular string position (point query). In this paper, we extend the point query to interval query, allowing the search for longest repeat(s) covering any position interval. Our method for interval query takes a different approach using the insight from a recent work on shortest unique substrings, as the prior work's approach for point query becomes infeasible in the setting of interval query. We propose an indexing structure, which can be constructed in the optimal O(n) time and space for a string of size n, such that any future interval query can be answered in O(1) time. Further, our solution can find all longest repeats covering any given interval using optimal O(occ) time, where occ is the number of longest repeats covering that given interval.

Keywords: strings; longest repeats; longest repeat query; stabbing queries; string position; point query; interval query; shortest unique substrings; bioinformatics; stringology; repetitive structures.

DOI: 10.1504/IJDMB.2016.078152

International Journal of Data Mining and Bioinformatics, 2016 Vol.15 No.4, pp.350 - 371

Received: 07 May 2016
Accepted: 07 May 2016

Published online: 04 Aug 2016 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article