Authors: Yong Chen, Guojun Li
Addresses: School of Mathematics and System Sciences, Shandong University, Jinan 250100, Shandong, China; School of Sciences, Jinan University, Jinan 250022, Shandong, China. ' School of Mathematics and System Sciences, Shandong University, Jinan 250100, Shandong, China; Computational Systems and Biology Laboratory, Department of Biochemistry and Molecular Biology, The University of Georgia, GA 30602, USA
Abstract: Identifying short DNA sequence motifs, which serve as binding targets for transcription factors, is a fundamental problem in both computer science and molecular biology. Finding subtle motifs with variable gaps is especially challenging. This paper presents a new algorithm that explores several new strategies. Based on a neighbourhood set concept, a new probability matrix is defined that can capture the target motifs effectively. An iterative restart strategy is used, which combines information from several similar motifs to detect the real motif. To demonstrate the effectiveness of the algorithm, we test it on several kinds of data and compare it with other representative current algorithms. Simulations show that the algorithm can effectively detect subtle motifs.
Keywords: motif finding; Hamming distance; heuristic algorithms; bioinformatics; DNA sequence motifs; neighbourhood set; probability matrix; simulation.
International Journal of Bioinformatics Research and Applications, 2008 Vol.4 No.2, pp.137 - 149
Available online: 17 May 2008