Title: Multi-patterns parameterised matching with application to computational biology
Authors: Swati Tevatia; Rajesh Prasad
Addresses: Department of Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, 201009, India ' Department of Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, 201009, India
Abstract: In the multi-pattern parameterised string matching problem, we are given a text T and pattern set P = {p1, p2, , pn{. The pattern pi, 1 ≤ i ≤ n, matches a substring t of the text T, if characters of the pattern pi can be transformed into the characters of the substring t with some bijective mapping. This problem has important application in computational biology, where two trends of DNA can be matched with some one-to-one mapping even they are not exactly same. In 2008, Salmela and Tarhio developed a fast parameterised Boyer-Moore-Horspool with hash algorithm, but it is unable to handle multiple patterns simultaneously. In this paper, we extend this algorithm for simultaneously searching the presence of more than one pattern in a large DNA text. Experimental results show that our algorithm is very fast in practice.
Keywords: parameterised string matching; q-gram; predecessor string; multiple patterns; computational biology; DNA matching.
DOI: 10.1504/IJICT.2015.070319
International Journal of Information and Communication Technology, 2015 Vol.7 No.4/5, pp.469 - 480
Received: 24 Jul 2013
Accepted: 17 Jan 2014
Published online: 02 Jul 2015 *