Authors: M. Zaki, M. Sayed
Addresses: System and Computer Engineering Department, Faculty of Engineering, Al-Azhar University, Nasr City, Cairo, Egypt | System and Computer Engineering Department, Faculty of Engineering, Al-Azhar University, Nasr City, Cairo, Egypt; Armed Forces Main Information Center (AFMIC), Cairo, Egypt
Abstract: This paper exploits a modified genetic programming (GP) approach to solve the data compression problem. The typical GP algorithm, in which a candidate solution is expressed as a tree rather than a bit string, fails to solve that problem because it does not guarantee a one-to-one correspondence between a particular symbol and its codeword during subtree exchange operations. The nature of the problem requires generating one, and only one, codeword for each symbol of the underlying text. In the proposed scheme, the authors introduce three new operators, namely insertion, two-level mutation and modified crossover. Accordingly, a modified version of GP is presented and applied to different data texts to validate the proposed approach. The developed algorithm can provide optimum codes since its final solution reaches the Huffman tree. Moreover, it makes use of GP not only to achieve the optimum compression ratio but also to provide an adaptive compression implementation. The adaptation is achieved by making the selection of the codebook depend on the nature of the input text. The proposed compression scheme is written in C++ and is applied to different text types under various operational conditions. Accordingly, the algorithm's performance has been measured and evaluated.
Keywords: Huffman code; genetic programming; adaptive text compression; data compression; lossless compression; alphabet; Arabic language.
International Journal of Information and Coding Theory, 2009 Vol.1 No.1, pp.88 - 108
Published online: 24 Mar 2009