Title: A new algorithm for quantifying binding site pattern similarity with applications for Next Generation Sequencing

Authors: Paul W. Bible; Rasiah Loganantharaj

Address (both authors): The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70503, USA

Abstract: New sources of regulatory data, such as transcription factor ChIP-seq experiments, can yield important insights into biological function through downstream analysis of motifs. Position Frequency Matrices (PFMs) are a standard format for representing transcription factor binding patterns. Measures for comparing these binding patterns are needed to enable more sophisticated detection and classification of regulatory sequences. In this work we have developed PfmSim, a novel algorithm for gapped alignment of PFMs. We compare PfmSim with a standard measure, that of Sandelin and Wasserman, on similarity and classification tasks. As evaluated by multiple tests, our measure gives better similarity values than the Sandelin and Wasserman measure.
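The abstract does not specify PfmSim's column scoring or gap model, so the following is only a minimal sketch of what gapped alignment of two PFMs can look like, not the authors' algorithm. It assumes that a PFM is a list of A/C/G/T count columns, that columns are compared with a sum-of-squared-differences score 2 − Σ(p − q)² (range 0 to 2), and that internal gaps cost a fixed, hypothetical penalty while end gaps are free, so a short motif can sit anywhere inside a longer one. The names `normalize`, `column_similarity` and `gapped_alignment_score` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: each column holds frequencies for A, C, G, T (in that order).
Column = Tuple[float, ...]


def normalize(pfm_counts: Sequence[Sequence[float]]) -> List[Column]:
    """Convert raw count columns into probability columns summing to 1.

    Assumes every column has a positive total (no pseudocounts added).
    """
    return [tuple(c / sum(counts) for c in counts) for counts in pfm_counts]


def column_similarity(p: Column, q: Column) -> float:
    """Assumed column score: 2 minus the sum of squared differences.

    Identical columns score 2.0; columns with disjoint mass score 0.0.
    """
    return 2.0 - sum((a - b) ** 2 for a, b in zip(p, q))


def gapped_alignment_score(m1: List[Column], m2: List[Column],
                           gap: float = -1.0) -> float:
    """Best semi-global alignment score of two PFMs (hypothetical gap penalty).

    Dynamic programming as in Needleman-Wunsch, except that gaps at either
    end are free: the first row/column start at 0 and the result is the
    maximum over the last row and last column.
    """
    n, m = len(m1), len(m2)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]  # free leading end gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_similarity(m1[i - 1], m2[j - 1]),
                dp[i - 1][j] + gap,  # gap in m2 (skip a column of m1)
                dp[i][j - 1] + gap,  # gap in m1 (skip a column of m2)
            )
    # Free trailing end gaps: best cell in the last row or last column.
    last_row = max(dp[n])
    last_col = max(dp[i][m] for i in range(n + 1))
    return max(last_row, last_col)


if __name__ == "__main__":
    acgt = normalize([[9, 0, 0, 0], [0, 9, 0, 0], [0, 0, 9, 0], [0, 0, 0, 9]])
    act = normalize([[9, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 9]])
    # A-A, C-C, one internal gap opposite G, then T-T: 2 + 2 - 1 + 2
    print(gapped_alignment_score(acgt, act))
```

In the usage example, allowing one internal gap (score 5.0) beats every ungapped placement of the shorter motif (at most 4.0), which is the kind of case a gapped comparison is meant to capture. A real measure would also need a normalization for motif length before scores of differently sized PFMs are compared.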

Keywords: PFM; position frequency matrix; motif analysis; ChIP-Seq; pattern similarity; NGS; next generation sequencing; binding site patterns; binding sites; bioinformatics.

DOI: 10.1504/IJBRA.2012.045973

International Journal of Bioinformatics Research and Applications, 2012 Vol.8 No.1/2, pp. 4–17

Published online: 05 Dec 2014