Title: On stabbing queries for generalised longest repeats
Authors: Bojian Xu
Addresses: Department of Computer Science, Eastern Washington University, Cheney, Washington 99004, USA
Abstract: A longest repeat query on a string, motivated by its applications in many subfields including computational biology, asks for the longest repetitive substring(s) covering a particular string position (point query). In this paper, we extend the point query to interval query, allowing the search for longest repeat(s) covering any position interval. Our method for interval query takes a different approach using the insight from a recent work on shortest unique substrings, as the prior work's approach for point query becomes infeasible in the setting of interval query. We propose an indexing structure, which can be constructed in the optimal O(n) time and space for a string of size n, such that any future interval query can be answered in O(1) time. Further, our solution can find all longest repeats covering any given interval using optimal O(occ) time, where occ is the number of longest repeats covering that given interval.
Keywords: strings; longest repeats; longest repeat query; stabbing queries; string position; point query; interval query; shortest unique substrings; bioinformatics; stringology; repetitive structures.
DOI: 10.1504/IJDMB.2016.078152
International Journal of Data Mining and Bioinformatics, 2016 Vol.15 No.4, pp.350 - 371
Received: 07 May 2016
Accepted: 07 May 2016
Published online: 04 Aug 2016 *