Title: Impact of traffic distribution on web cache performance

Authors: Manuel Gómez Zotano; Jorge Gómez-Sanz; Juan Pavón

Addresses: Corporación de Radiotelevisión Española, Alcalde Sainz de Baranda 92, 28007, Madrid, Spain; Facultad de Informática UCM, Universidad Complutense de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain; Facultad de Informática UCM, Universidad Complutense de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain

Abstract: Caches are a critical element of web-based information systems. Understanding the expected behaviour of cache policies is especially important for achieving good quality of service. Earlier work suggested that web demand can be modelled as a Zipf distribution with α ≤ 1. New evidence presented in this paper shows that today's websites follow Zipf distributions with α > 1. This article analyses real logs obtained from the client layer of high-traffic websites. The main result is that, under these conditions, the cache hit ratio can be extremely high even with a very small cache. An expensive, resource-intensive cache is therefore not needed for an effective implementation: a cache size equal to 0.6% of the working set is enough to reach a hit ratio above 80%, once the right replacement policy has been chosen.
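
The relationship between the Zipf exponent and the hit ratio of a small cache can be explored with a short simulation. The following is a minimal sketch, not the authors' method: it uses synthetic requests rather than the real logs analysed in the paper, it assumes an LRU replacement policy (the abstract does not name the policy it recommends), and it treats the object catalogue size as a stand-in for the working set. The function names `zipf_requests` and `lru_hit_ratio`, and all parameter values, are hypothetical.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_requests(n_objects, alpha, n_requests, seed=0):
    """Draw object ranks (1 = most popular) with P(rank) proportional to rank**-alpha."""
    rng = random.Random(seed)
    ranks = range(1, n_objects + 1)
    # Cumulative unnormalised weights; random.choices normalises them itself.
    cum_weights = list(accumulate(rank ** -alpha for rank in ranks))
    return rng.choices(ranks, cum_weights=cum_weights, k=n_requests)


def lru_hit_ratio(requests, cache_size):
    """Replay a request sequence through an LRU cache and return the hit ratio."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for obj in requests:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)  # mark as most recently used
        else:
            cache[obj] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests)


if __name__ == "__main__":
    n_objects = 10_000                   # assumed catalogue size (illustrative)
    cache_size = int(0.006 * n_objects)  # 0.6% of the catalogue, as in the abstract
    for alpha in (0.8, 1.2):             # one exponent below 1, one above 1
        reqs = zipf_requests(n_objects, alpha, n_requests=100_000, seed=1)
        print(f"alpha={alpha}: LRU hit ratio = {lru_hit_ratio(reqs, cache_size):.3f}")
```

Running the script prints the hit ratio for one exponent below 1 and one above 1 at the same cache size. The absolute values depend on the assumed catalogue size, request count, and policy, so they should not be read as a reproduction of the paper's figures.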

Keywords: website traffic distribution; web cache performance; Zipf distribution; web logs; cache hit rate; web performance; web information systems; quality of service; QoS; cache size; replacement policy.

DOI: 10.1504/IJWET.2015.072349

International Journal of Web Engineering and Technology, 2015 Vol.10 No.3, pp.202 - 213

Published online: 09 Oct 2015