Title: A universal compression strategy using sorting transformation

Authors: Bo Liu; Xi Huang; Xiaoguang Liu; Gang Wang; Ming Xu

Addresses: College of Computer and Control Engineering, Nankai University, Tianjin, 300350, China ' College of Computer and Control Engineering, Nankai University, Tianjin, 300350, China ' College of Computer and Control Engineering, Nankai University, Tianjin, 300350, China ' College of Computer and Control Engineering, Nankai University, Tianjin, 300350, China ' QIHU360 Inc., Beijing, 100020, China

Abstract: Although traditional universal compression algorithms can effectively utilise repetition located in a slide window, they cannot take their own advantages for some message source in which similar messages are distributed uniformly. In this paper, we come up with a universal segmenting-sorting compression algorithm to solve this problem. The key idea is to reorder the message source before compressing it with Lz77 algorithm. We design transformation methods for two common data types, corpus of webpages and access log. The experimental results show that segmenting-sorting transformation is truly beneficial to compression ratio. Our new algorithm is able to make compression ratio 20% to 50% lower than naive Lz77 algorithm does and takes almost the same decompression time. For some read-heavy source segmenting-sorting compression can reduce space cost while guaranteeing throughput.

Keywords: segmenting; sorting; Lz77; compression; universal compression method.

DOI: 10.1504/IJCSE.2019.098531

International Journal of Computational Science and Engineering, 2019 Vol.18 No.3, pp.211 - 216

Received: 23 Jun 2016
Accepted: 30 Aug 2016

Published online: 26 Mar 2019 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article