Title: Mining poly-regions in DNA

Authors: Panagiotis Papapetrou; Gary Benson; George Kollios

Addresses: Department of Information and Computer Science, Aalto University 00076, Finland; Departments of Biology and Computer Science, Boston University, MA 02215, USA; Computer Science Department, Boston University, MA 02215, USA

Abstract: We study the problem of mining poly-regions in DNA. A poly-region is defined as a bursty DNA area, i.e., an area with an elevated frequency of a DNA pattern. We introduce a general formulation that covers a range of meaningful types of poly-regions and develop three efficient detection methods. The first is entropy-based and applies recursive segmentation. The second uses a set of sliding windows that summarize each sequence segment using several statistics. The third employs a technique based on majority vote. The proposed algorithms are evaluated on DNA sequences from four different organisms in terms of recall and runtime.
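
To make the notion of a poly-region concrete, the following is a minimal sketch of a single fixed-size sliding window that flags areas where a pattern occurs with elevated frequency. It is an illustration only and not the authors' algorithm: the function name `find_poly_regions`, the parameters `window` and `min_density`, and the scoring rule (pattern occurrences times pattern length, divided by window length) are all assumptions introduced here. The paper's second method uses a set of windows and several statistics, which this sketch does not attempt to reproduce.

```python
def find_poly_regions(sequence, pattern, window=12, min_density=0.5):
    """Return merged (start, end) half-open intervals whose windows score
    at least min_density. Hypothetical helper, not from the paper."""
    n, k = len(sequence), len(pattern)
    if k == 0 or window < k or n < window:
        return []

    # starts[i] is 1 when the pattern occurs at position i (overlaps allowed).
    starts = [1 if sequence.startswith(pattern, i) else 0
              for i in range(n - k + 1)]

    # Prefix sums so each window count is an O(1) lookup.
    prefix = [0]
    for s in starts:
        prefix.append(prefix[-1] + s)

    regions = []
    for left in range(n - window + 1):
        right = left + window
        # Count occurrences that lie fully inside [left, right).
        count = prefix[right - k + 1] - prefix[left]
        # Crude score (assumption); overlapping occurrences can push it above 1.
        score = count * k / window
        if score >= min_density:
            if regions and left <= regions[-1][1]:
                regions[-1][1] = right          # extend the previous region
            else:
                regions.append([left, right])   # open a new region
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    seq = "GGGG" + "CACACACA" + "GGGG"
    print(find_poly_regions(seq, "CA", window=4, min_density=1.0))
```

Qualifying windows that overlap or touch are merged into one region, so a long burst is reported once rather than once per window. The threshold and window size are tuning knobs in this sketch and would need to be chosen per pattern and per organism.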

Keywords: DNA polyregions; burstiness; sliding windows; recursive segmentation; majority vote; nucleosomes; bursty DNA; DNA sequences; bioinformatics; arrangement mining; data mining.

DOI: 10.1504/IJDMB.2012.049278

International Journal of Data Mining and Bioinformatics, 2012 Vol.6 No.4, pp.406 - 428

Received: 12 Mar 2010
Accepted: 05 Dec 2010

Published online: 17 Dec 2014
