Title: Improved short adjacent repeat identification using three evolutionary Monte Carlo schemes

Authors: Jin Xu; Qiwei Li; Victor O.K. Li; Shuo-Yen Robert Li; Xiaodan Fan

Addresses: Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong ' Department of Statistics, The Chinese University of Hong Kong, Sha Tin, Hong Kong ' Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong; Department of Computer Engineering, King Saud University, Saudi Arabia ' Department of Information Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong ' Department of Statistics, The Chinese University of Hong Kong, Sha Tin, Hong Kong

Abstract: This paper employs three Evolutionary Monte Carlo (EMC) schemes to solve the Short Adjacent Repeat Identification Problem (SARIP), which aims to identify the common repeat units shared by multiple sequences. The three EMC schemes, i.e., Random Exchange (RE), Best Exchange (BE), and crossover, are implemented on a parallel platform. The simulation results show that, compared with the conventional Markov Chain Monte Carlo (MCMC) algorithm, all three EMC schemes not only shorten the computation time by speeding up convergence but also improve the solution quality in difficult cases. Moreover, we observe that the performance of the different EMC schemes depends on the degree of degeneracy of the motif pattern.
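
As a rough illustration of the exchange idea behind schemes such as RE, the following minimal sketch shows a temperature-swap move between adjacent chains in the style of parallel tempering (listed in the keywords). The toy bimodal target, the function names (`log_target`, `swap_log_ratio`, `random_exchange`), and the temperature ladder are all assumptions made for illustration; this is not the paper's SARIP model or its implementation.

```python
import math
import random


def log_target(x):
    # Hypothetical toy target: unnormalized log-density of a two-mode
    # Gaussian mixture centred at -3 and +3 (not the SARIP posterior).
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def swap_log_ratio(x_i, x_j, temp_i, temp_j):
    # Log Metropolis ratio for exchanging the states of two chains that
    # target log_target(x) / temp_i and log_target(x) / temp_j.
    return (log_target(x_j) - log_target(x_i)) * (1.0 / temp_i - 1.0 / temp_j)


def random_exchange(states, temps, rng):
    # Pick one random pair of adjacent chains and attempt to swap states.
    i = rng.randrange(len(states) - 1)
    j = i + 1
    log_r = swap_log_ratio(states[i], states[j], temps[i], temps[j])
    accepted = log_r >= 0.0 or rng.random() < math.exp(log_r)
    if accepted:
        states[i], states[j] = states[j], states[i]
    return accepted


def metropolis_step(x, temp, rng, step=1.0):
    # Ordinary random-walk Metropolis move within one tempered chain.
    proposal = x + rng.gauss(0.0, step)
    log_r = (log_target(proposal) - log_target(x)) / temp
    if log_r >= 0.0 or rng.random() < math.exp(log_r):
        return proposal
    return x


def run(n_iter=2000, temps=(1.0, 2.0, 4.0, 8.0), seed=0):
    # Alternate within-chain moves with one exchange attempt per iteration.
    rng = random.Random(seed)
    states = [0.0 for _ in temps]
    cold_samples = []
    for _ in range(n_iter):
        states = [metropolis_step(x, t, rng) for x, t in zip(states, temps)]
        random_exchange(states, list(temps), rng)
        cold_samples.append(states[0])  # chain at temperature 1.0
    return cold_samples
```

Calling `run()` returns the samples of the coldest chain. The hotter chains cross between modes more easily, and the swap move lets those states migrate down to the cold chain, which is the general mechanism by which exchange-based schemes can speed up convergence.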

Keywords: short adjacent repeats; evolutionary Monte Carlo; EMC; parallel tempering; maximum a posteriori; Monte Carlo simulation; DNA sequences; convergence speed; motif patterns; bioinformatics; parallel computing.

DOI: 10.1504/IJDMB.2013.056614

International Journal of Data Mining and Bioinformatics, 2013 Vol.8 No.4, pp.462 - 479

Received: 04 May 2011
Accepted: 04 May 2011

Published online: 20 Oct 2014
