Title: Analysis of the relationships among Longest Common Subsequences, Shortest Common Supersequences and patterns and its application on pattern discovery in biological sequences
Authors: Kang Ning; Hoong Kee Ng; Hon Wai Leong
Addresses: Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA. ' Department of Computer Science, National University of Singapore, 117417, Singapore; Faculty of Information and Communication Technology, Limkokwing University of Creative Technology, Inovasi 1-1, Jalan Teknokrat 1/1, 63000 Cyberjaya, Selangor, Malaysia. ' Department of Computer Science, National University of Singapore, 117417, Singapore
Abstract: For a set of multiple sequences, the patterns, Longest Common Subsequences (LCS) and Shortest Common Supersequences (SCS) represent different aspects of the sequences' profile. Revealing the relationship between the patterns and LCS/SCS might provide a deeper view of the patterns. In this paper, we show that patterns, LCS and SCS are closely related to each other. Based on these relations, the PALS algorithms are proposed to discover patterns in a set of biological sequences from LCS and SCS results. Experiments show that the PALS algorithms are superior in efficiency and accuracy on a variety of sequences.
Keywords: pattern discovery; biological sequences; LCSs; longest common subsequences; SCSs; shortest common supersequences; approximation algorithms; multiple sequences; bioinformatics.
DOI: 10.1504/IJDMB.2011.045413
International Journal of Data Mining and Bioinformatics, 2011 Vol.5 No.6, pp.611 - 625
Received: 01 Feb 2010
Accepted: 01 Feb 2010
Published online: 24 Jan 2015