Title: Correcting short reads with high error rates for improved sequencing result

Authors: Thomas K.F. Wong, T.W. Lam, P.Y. Chan, S.M. Yiu

Addresses: Faculty of Engineering, Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong (all authors)

Abstract: In the sequencing process, reads of the sequence are generated, then assembled to form contigs. New technologies can produce reads faster with lower cost and higher coverage. However, these reads are shorter. With errors, short reads make the assembly step more difficult. Chaisson et al. (2004) proposed an algorithm to correct the reads prior to the assembly step. The result is not satisfactory when the error rate is high (e.g., ≥3%). We improve their approach to handle reads of higher error rates. Experimental results show that our approach is much more effective in correcting errors, producing contigs of higher quality.

Keywords: short reads; error correction; sequence assembly; bioinformatics; sequencing results; DNA sequencing; contigs.

DOI: 10.1504/IJBRA.2009.024039

International Journal of Bioinformatics Research and Applications, 2009 Vol.5 No.2, pp.224 - 237

Available online: 24 Mar 2009
