Title: Breaking the computational barrier: a divide-conquer and aggregate based approach for Alu insertion site characterisation

Authors: Kun Zhang, Wei Fan, Prescott Deininger, Andrea Edwards, Zujia Xu, Dongxiao Zhu

Addresses: Department of Computer Science, Xavier University of Louisiana, New Orleans, Louisiana 70125, USA; IBM T.J. Watson, Hawthorne, New York 10532, USA; Tulane Cancer Center, Tulane School of Public Health and Tropical Medicine, New Orleans, Louisiana 70122, USA; Department of Computer Science, Xavier University of Louisiana, New Orleans, Louisiana 70125, USA; Department of Computer Science, Dillard University, New Orleans, Louisiana 70122, USA; Department of Computer Science, University of New Orleans, New Orleans, Louisiana 70148, USA

Abstract: Insertion site characterisation of Alu elements is an important problem in primate-specific bioinformatics research. The problem poses three key challenges: the data are not given as pre-defined feature vectors suitable for predictive model construction; general patterns must be discovered, and biological insights drawn, without any prior knowledge; and compact yet discriminative patterns must be obtained from a search space of 4^200. This paper provides an integrated algorithmic framework for these mining tasks. Compared to the benchmark biological study, our results provide a further refined analysis of the patterns involved in Alu insertion. In particular, we acquire a 200 nt predictive profile around the primary insertion site which not only contains the widely accepted consensus, but also suggests a longer pattern (T)7AA[G|A]AATAA. This pattern provides more insight into the favourable sequence variations allowed for preferred binding and cleavage by the L1 ORF2 endonuclease. The proposed method is general enough to be applied to other sequence detection problems, such as microRNA target prediction.
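
As an illustration only, the longer pattern quoted in the abstract can be read as a regular expression. The sketch below is not the paper's framework: it assumes that (T)7 means seven consecutive T's and that [G|A] means either G or A at that position, and the function name `find_pattern_sites` and the toy sequence are hypothetical. It also computes the size of the naive search space for a 200 nt window over the four nucleotides.

```python
import re

# Assumed regex reading of the abstract's pattern (T)7AA[G|A]AATAA:
# seven T's, then AA, then G or A, then AATAA.
# The lookahead wrapper lets finditer report overlapping occurrences too.
PATTERN = re.compile(r"(?=T{7}AA[GA]AATAA)")

def find_pattern_sites(sequence):
    """Return the 0-based start offsets of every pattern occurrence."""
    return [m.start() for m in PATTERN.finditer(sequence.upper())]

# Number of distinct 200 nt sequences over the alphabet {A, C, G, T}.
SEARCH_SPACE = 4 ** 200

if __name__ == "__main__":
    # Hypothetical toy sequence containing one G variant and one A variant.
    toy = "GC" + "TTTTTTTAAGAATAA" + "CCG" + "tttttttaaaaataa"
    print(find_pattern_sites(toy))
    print(len(str(SEARCH_SPACE)), "decimal digits in 4^200")
```

Exhaustive enumeration over a space of this size is out of reach, which is why the abstract stresses compact yet discriminative patterns; a fixed regular expression such as the one above only checks a single candidate pattern and says nothing about how such a pattern is discovered.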

Keywords: frequent pattern discovery; Alu insertion sites; feature construction; sequence-based prediction; data mining; machine learning; primate-specific bioinformatics; Alu elements; sequence detection; microRNA target prediction; retrotransposable elements; mobile DNA sequences.

DOI: 10.1504/IJCBDD.2009.030763

International Journal of Computational Biology and Drug Design, 2009 Vol.2 No.4, pp.302 - 322

Published online: 04 Jan 2010
