Title: Highly scalable algorithms for robust string barcoding

Authors: Bhaskar DasGupta, Kishori M. Konwar, Ion I. Mandoiu, Alex A. Shvartsman

Addresses: Department of Computer Science,University of Illinois at Chicago, Chicago, IL 60607-7053, USA. ' Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Rd., 2155 Unit, Storrs, CT 06269-2155, USA. ' Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Rd., 2155 Unit, Storrs, CT 06269-2155, USA. ' Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Rd., 2155 Unit, Storrs, CT 06269-2155, USA

Abstract: String barcoding is a recently introduced technique for genomic based identification of microorganisms. In this paper, we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size, on a well equipped workstation. Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds for the problem.

Keywords: string barcoding; setcover problem; greedy algorithms; microorganism identification; bioinformatics; genomic sequences; whole genome based selection; distinguisher selection; highly scalable algorithms; viruses; bacteria.

DOI: 10.1504/IJBRA.2005.007574

International Journal of Bioinformatics Research and Applications, 2005 Vol.1 No.2, pp.145 - 161

Published online: 06 Aug 2005 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article