Title: Duplicate detection in pay-per-click streams using temporal stateful Bloom filters

Authors: Chamila Walgampaya; Mehmed Kantardzic; Brent Wenerstrom

Addresses: Computer Science and Computer Engineering Department, Speed School of Engineering, University of Louisville, Louisville, KY 40292, USA (all three authors)

Abstract: Detecting duplicates in click data streams is an important task in fighting click fraud, the act of generating false clicks in internet advertising. Revenue models that charge advertisers for each click leave room for individuals or rival companies to generate false clicks, and the damage click fraud does to online advertising has grown tremendously over the years. In this paper, we consider the problem of detecting duplicates in click data streams. Our solution, the temporal stateful Bloom filter (TSBF), is a modified version of the counting Bloom filter: it replaces the bit-vector with an array of counters of states. These counters are dynamic and decay with time. We conducted a comprehensive set of experiments using synthetic and real-world data. Results are compared with the buffering techniques used in NetMosaics, a click fraud detection and prevention solution. Our results show that the TSBF approach achieves 99% accuracy on duplicate detection while keeping its space requirement constant.
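
The idea of cells that decay with time can be illustrated with a minimal Python sketch. It is not the TSBF algorithm from the paper, whose details are not given in the abstract. As a simplifying assumption, each cell stores an expiry timestamp (a lazily decayed counter), and a click counts as a duplicate when all of its cells are still live. The class name `DecayingBloomFilter`, its parameters, the SHA-256 based hashing, and the click key format are all hypothetical.

```python
import hashlib


class DecayingBloomFilter:
    """Illustrative time-decaying Bloom filter (not the paper's TSBF)."""

    def __init__(self, num_cells=1024, num_hashes=3, window=60.0):
        self.num_cells = num_cells    # fixed array size, so space stays constant
        self.num_hashes = num_hashes  # number of cells touched per key
        self.window = window          # seconds a cell stays "live" after a hit
        # Each cell holds the time at which it expires; -inf means never set.
        self.expiry = [float("-inf")] * num_cells

    def _indexes(self, key):
        # Derive num_hashes cell indexes by salting SHA-256 with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_cells

    def seen_recently(self, key, now):
        """Return True if key looks like a duplicate at time `now`, then record it."""
        idx = list(self._indexes(key))
        # Duplicate only if every cell for this key is still live.
        duplicate = all(self.expiry[j] > now for j in idx)
        # Refresh the cells, which restarts the window (a design choice of this sketch).
        for j in idx:
            self.expiry[j] = now + self.window
        return duplicate


clicks = DecayingBloomFilter(window=60.0)
click_key = "203.0.113.7|ad-42"                      # hypothetical key: source IP plus ad ID
first = clicks.seen_recently(click_key, now=0.0)     # False: not seen before
repeat = clicks.seen_recently(click_key, now=30.0)   # True: inside the 60 s window
later = clicks.seen_recently(click_key, now=91.0)    # False: cells expired at t = 90
```

Like any Bloom filter, this sketch can report false positives when unrelated keys share cells, and its memory use depends only on `num_cells`, not on the length of the stream.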

Keywords: click fraud; Bloom filters; BFs; streaming data; duplicate detection; pay-per-click advertising models; duplicates; click data streams; false clicks; internet advertising.

DOI: 10.1504/IJDATS.2012.050405

International Journal of Data Analysis Techniques and Strategies, 2012 Vol.4 No.4, pp.340 - 377

Published online: 06 Sep 2014
