Authors: Francis H. Hunt; Stephanie Perkins; Derek H. Smith
Addresses: School of Computing and Mathematics, University of South Wales, Pontypridd, CF37 1DL, Wales, UK; School of Computing and Mathematics, University of South Wales, Pontypridd, CF37 1DL, Wales, UK; School of Computing and Mathematics, University of South Wales, Pontypridd, CF37 1DL, Wales, UK
Abstract: It has been demonstrated in recent years that synthetic DNA can be used to reliably store large volumes of data. It should be possible to recover the data from the synthetic DNA after very long time periods under fairly mild storage conditions. Two key requirements are the need to avoid repeated symbols known as homopolymers and the need to avoid errors arising from secondary structures. In this paper, an error model is developed and error correction techniques are proposed for this technology. The use of variable length Huffman codes in the avoidance of homopolymers can lead to loss of synchronisation if any errors do occur. A scheme to recover synchronisation is proposed and shown to be effective.
Keywords: biological information theory; channel models; error correction codes; deoxyribonucleic acid; DNA information storage; variable length codes; synchronisation; secondary structures; homopolymers; synthetic DNA; Huffman codes.
International Journal of Information and Coding Theory, 2015 Vol.3 No.2, pp.120 - 136
Received: 20 Nov 2014
Accepted: 22 Mar 2015
Published online: 22 Oct 2015