Title: BOTUX: Bayesian-like operational taxonomic unit examiner

Authors: Vishal N. Koparde; Ricky S. Adkins; Jennifer M. Fettweis; Myrna G. Serrano; Gregory A. Buck; Mark A. Reimers; Nihar U. Sheth

Addresses: Center for Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA; Center for Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA; Center for Study of Biological Complexity and Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284, USA; Center for Study of Biological Complexity and Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284, USA; Center for Study of Biological Complexity and Department of Microbiology and Immunology, Virginia Commonwealth University, Richmond, VA 23284, USA; Center for Study of Biological Complexity and Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA; Center for Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA

Abstract: Bayesian-like operational taxonomic unit examiner (BOTUX) is a new tool for classifying 16S rRNA gene sequences into operational taxonomic units (OTUs) that addresses the overestimation of OTU numbers caused by errors introduced during PCR amplification and DNA sequencing. BOTUX uses a grammar-based assignment strategy in which Bayesian models are built from the words of a given length (e.g., 8-mers) in each sequence. Because BOTUX does not require a training set and updates its probabilistic models as new sequences are recruited to an OTU, it supports de novo analysis. In benchmarking tests on real and simulated 16S rDNA datasets, BOTUX accurately identified OTUs with comparable or better clustering efficiency and lower execution times than the other OTU algorithms tested. BOTUX is the only OTU classifier that allows incremental analysis of large datasets, and it can cluster both 454 and Illumina datasets in a reasonable timeframe.
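The abstract describes the assignment strategy only at a high level. As an illustration, the Python sketch below shows one way a word-based (k-mer) Bayesian-style model that is updated as sequences are recruited could drive de novo, incremental OTU assignment. It is not the BOTUX implementation: the +0.5/+1 pseudocount, the mean log-likelihood score, the greedy join-or-create rule and the `min_mean_prob` threshold are all assumptions, and the names `kmers`, `OTU` and `cluster` are hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k=8):
    """Return the set of distinct k-mers ("words") in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class OTU:
    """Hypothetical per-OTU word model, updated as members are recruited."""

    def __init__(self):
        self.n = 0                    # number of member sequences
        self.word_counts = Counter()  # number of members containing each word

    def add(self, words):
        # Incremental model update: each distinct word of the new member counts once.
        self.n += 1
        self.word_counts.update(words)

    def score(self, words):
        # Mean log P(word | OTU), using an assumed (count + 0.5) / (n + 1) pseudocount.
        total = sum(math.log((self.word_counts[w] + 0.5) / (self.n + 1))
                    for w in words)
        return total / len(words)


def cluster(seqs, otus=None, k=8, min_mean_prob=0.5):
    """Greedy de novo assignment (assumed rule): join the best-scoring OTU
    if its mean word probability reaches min_mean_prob, else start a new OTU.
    Passing an existing `otus` list continues clustering incrementally."""
    if otus is None:
        otus = []
    threshold = math.log(min_mean_prob)
    labels = []
    for seq in seqs:
        words = kmers(seq, k)
        if not words:
            raise ValueError("sequence is shorter than k")
        best, best_score = None, float("-inf")
        for idx, otu in enumerate(otus):
            s = otu.score(words)
            if s > best_score:
                best, best_score = idx, s
        if best is None or best_score < threshold:
            otus.append(OTU())
            best = len(otus) - 1
        otus[best].add(words)  # recruit the sequence and update that OTU's model
        labels.append(best)
    return labels, otus
```

For example, two sequences differing by a single terminal substitution share 12 of their 13 8-mers and fall into the same OTU, while an unrelated sequence starts a new one. Feeding the returned `otus` back into `cluster` scores later batches against the existing models rather than reclustering from scratch, which loosely mirrors the incremental analysis the abstract mentions.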

Keywords: OTU; operational taxonomic unit; Bayesian models; 16S rRNA sequences; gene sequences; sequence clustering; 454; Illumina; classification; grammar-based assignment.

DOI: 10.1504/IJCBDD.2014.061652

International Journal of Computational Biology and Drug Design, 2014 Vol.7 No.2/3, pp. 130-145

Published online: 27 May 2014
