Title: AliBiMotif: Integrating alignment and biclustering to unravel transcription factor binding sites in DNA sequences

Authors: Joana P. Gonçalves; Yves Moreau; Sara C. Madeira

Addresses: Knowledge Discovery and Bioinformatics (KDBIO), INESC-ID Computer Science Department, IST, Technical University of Lisbon, Rua Alves Redol 9, 1000-029 Lisboa, Portugal; BIOI, Electrical Engineering Dept. (ESAT-SCD), KULeuven, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium; Knowledge Discovery and Bioinformatics (KDBIO), INESC-ID Computer Science Department, IST, Technical University of Lisbon, Rua Alves Redol 9, 1000-029 Lisboa, Portugal

Abstract: Transcription Factors (TFs) control transcription by binding to specific sites in the promoter regions of their target genes; these binding sites can be modelled by structured motifs. In this paper, we propose AliBiMotif, a method that combines sequence alignment with a biclustering approach based on efficient string matching using suffix trees. It unravels approximately conserved sets of blocks (structured motifs) while straightforwardly disregarding the non-conserved stretches in between. The ability to ignore the width of non-conserved regions is a major advantage of the proposed method over other motif finders, as the lengths of the binding sites are usually easier to estimate than the distances separating them.
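
To make the notion of a structured motif concrete, the following minimal sketch checks whether a set of conserved blocks occurs, in order, in each promoter sequence while allowing gaps of any width between the blocks. It is an illustration only and is not the AliBiMotif algorithm: it uses exact matching with a regular expression rather than alignment, biclustering, or suffix trees, and the function name `find_structured_motif`, the gene names, the sequences, and the two blocks are all hypothetical.

```python
import re


def find_structured_motif(sequence, blocks):
    """Return the start position of each block if all blocks occur in order.

    Gaps between consecutive blocks may have any length (including zero).
    Matching is exact; approximate conservation is NOT handled here.
    Returns None when the blocks do not all occur in the given order.
    """
    # One capture group per block, joined by a non-greedy "any gap" wildcard.
    pattern = ".*?".join("(" + re.escape(block) + ")" for block in blocks)
    match = re.search(pattern, sequence)
    if match is None:
        return None
    return [match.start(i + 1) for i in range(len(blocks))]


# Hypothetical promoter sequences and blocks, chosen only for illustration.
promoters = {
    "geneA": "TTGACAGGCTATAATCC",
    "geneB": "ACTTGACACCCCCCTTTATAATG",
    "geneC": "GGGGCCCCAAAATTTT",
}
blocks = ["TTGACA", "TATAAT"]

hits = {name: find_structured_motif(seq, blocks) for name, seq in promoters.items()}
for name, positions in hits.items():
    print(name, positions)
```

In this toy input, `geneA` and `geneB` contain both blocks separated by gaps of different widths, and `geneC` contains neither, so only the block contents need to be specified and not the separating distances.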

Keywords: biclustering; sequence alignment; motif finding algorithms; structured motifs; motif identification; binding sites; transcription factors; TFBS; cis-regulatory module discovery; promoter regions; motif finders; integrative mining; bioinformatics; DNA sequences.

DOI: 10.1504/IJDMB.2012.048198

International Journal of Data Mining and Bioinformatics, 2012 Vol.6 No.2, pp.196 - 215

Published online: 17 Dec 2014
