Title: Multi-patterns parameterised matching with application to computational biology

Authors: Swati Tevatia; Rajesh Prasad

Addresses: Department of Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, 201009, India ' Department of Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, 201009, India

Abstract: In the multi-pattern parameterised string matching problem, we are given a text T and pattern set P = {p1, p2, , pn{. The pattern pi, 1 ≤ i ≤ n, matches a substring t of the text T, if characters of the pattern pi can be transformed into the characters of the substring t with some bijective mapping. This problem has important application in computational biology, where two trends of DNA can be matched with some one-to-one mapping even they are not exactly same. In 2008, Salmela and Tarhio developed a fast parameterised Boyer-Moore-Horspool with hash algorithm, but it is unable to handle multiple patterns simultaneously. In this paper, we extend this algorithm for simultaneously searching the presence of more than one pattern in a large DNA text. Experimental results show that our algorithm is very fast in practice.

Keywords: parameterised string matching; q-gram; predecessor string; multiple patterns; computational biology; DNA matching.

DOI: 10.1504/IJICT.2015.070319

International Journal of Information and Communication Technology, 2015 Vol.7 No.4/5, pp.469 - 480

Received: 24 Jul 2013
Accepted: 17 Jan 2014

Published online: 02 Jul 2015 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article