Title: Analysis of the relationships among Longest Common Subsequences, Shortest Common Supersequences and patterns and its application on pattern discovery in biological sequences

Authors: Kang Ning; Hoong Kee Ng; Hon Wai Leong

Addresses: Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA. ' Department of Computer Science, National University of Singapore, 117417, Singapore; Faculty of Information and Communication Technology, Limkokwing University of Creative Technology, Inovasi 1-1, Jalan Teknokrat 1/1, 63000 Cyberjaya, Selangor, Malaysia. ' Department of Computer Science, National University of Singapore, 117417, Singapore

Abstract: For a set of multiple sequences, the patterns, Longest Common Subsequences (LCS) and Shortest Common Supersequences (SCS) each represent a different aspect of the sequences' profile. Revealing the relationship between the patterns and the LCS/SCS might provide a deeper view of the patterns. In this paper, we show that patterns, LCS and SCS are closely related to each other. Based on these relations, the PALS algorithms are proposed to discover patterns in a set of biological sequences from LCS and SCS results. Experiments show that the PALS algorithms are superior in efficiency and accuracy on a variety of sequences.
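
As a rough illustration of how LCS and SCS relate, the sketch below computes both for just two sequences using the textbook dynamic-programming approach. It is not the PALS algorithm from the paper, which works on sets of multiple sequences; the function names `lcs`, `scs` and `is_subsequence` and the example strings are assumptions made here for illustration only.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def scs(a: str, b: str) -> str:
    """Return one shortest common supersequence of a and b, built around an LCS."""
    out = []
    i = j = 0
    for ch in lcs(a, b):
        # Copy the characters of each input that precede the shared character.
        while a[i] != ch:
            out.append(a[i])
            i += 1
        while b[j] != ch:
            out.append(b[j])
            j += 1
        # Emit the shared character once and advance in both inputs.
        out.append(ch)
        i += 1
        j += 1
    # Append whatever remains of either input after the last shared character.
    out.append(a[i:])
    out.append(b[j:])
    return "".join(out)


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


if __name__ == "__main__":
    x, y = "ACGTAC", "AGTTC"  # made-up example sequences
    common, merged = lcs(x, y), scs(x, y)
    print(common, merged)
    # For two sequences: |SCS| = |x| + |y| - |LCS|
    print(len(merged) == len(x) + len(y) - len(common))
```

For two sequences the lengths satisfy |SCS| = |x| + |y| - |LCS|, which is the simplest instance of the kind of relationship the abstract refers to. This identity does not carry over directly to more than two sequences.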

Keywords: pattern discovery; biological sequences; LCSs; longest common subsequences; SCSs; shortest common supersequences; approximation algorithms; multiple sequences; bioinformatics.

DOI: 10.1504/IJDMB.2011.045413

International Journal of Data Mining and Bioinformatics, 2011 Vol.5 No.6, pp.611 - 625

Received: 01 Feb 2010
Accepted: 01 Feb 2010

Published online: 24 Jan 2015
