Title: Accuracy and efficiency of algorithms for the demarcation of bacterial ecotypes from DNA sequence data

Authors: Juan Carlos Francisco; Frederick M. Cohan; Danny Krizanc

Addresses: Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT, USA; Department of Biology, Wesleyan University, Middletown, CT, USA; Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT, USA

Abstract: Identification of closely related, ecologically distinct populations of bacteria would benefit microbiologists working in many fields including systematics, epidemiology and biotechnology. Several laboratories have recently developed algorithms aimed at demarcating such 'ecotypes'. We examine the ability of four of these algorithms to correctly identify ecotypes from sequence data. We tested the algorithms on synthetic sequences, with known history and habitat associations, generated under the stable ecotype model, and on data from Bacillus strains isolated from Death Valley, where previous work has confirmed the existence of multiple ecotypes. We found that one of the algorithms (ecotype simulation) performs significantly better than the others (AdaptML, GMYC, BAPS) in both instances. However, while ecotype simulation is the most accurate, it is also by a large margin the slowest of the four algorithms tested. Attempts at improving its efficiency are underway.

Keywords: bacterial ecotypes; demarcation algorithms; stable ecotype model; DNA sequences; ecotype simulation; AdaptML; GMYC; BAPS; Bacillus strains; bioinformatics.

DOI: 10.1504/IJBRA.2014.062992

International Journal of Bioinformatics Research and Applications, 2014 Vol.10 No.4/5, pp.409 - 425

Published online: 24 Oct 2014
