Title: A graph traversal attack on Bloom filter-based medical data aggregation

Authors: William Mitchell; Rinku Dewri; Ramakrishna Thurimella; Max Roschke

Addresses: Department of Computer Science, University of Denver, Denver, CO 80210, USA ' Department of Computer Science, University of Denver, Denver, CO 80210, USA ' Department of Computer Science, University of Denver, Denver, CO 80210, USA ' Department of Computer Science, University of Denver, Denver, CO 80210, USA

Abstract: We present a novel cryptanalytic method based on graph traversals to show that record linkage using Bloom filter encoding does not preserve privacy in a two-party setting. Bloom filter encoding is often suggested as a practical approach to medical data aggregation. This attack is stronger than a simple dictionary attack in that it does not assume knowledge of the universe. The attack is very practical and produced accurate results when experimented on large amounts of name-like data derived from a North Carolina voter registration database. We also give theoretical arguments that show that going from bigrams to n-grams, n > 2, does not increase privacy; on the contrary, it actually makes the attack more effective. Finally, some ways to resist this attack are suggested.

Keywords: Bloom filter encoding; BFE; privacy-preserving record linkage; PPRL; medical data aggregation; cryptanalysis; two-party linkage; private record linkage; PRL.

DOI: 10.1504/IJBDI.2017.086956

International Journal of Big Data Intelligence, 2017 Vol.4 No.4, pp.217 - 226

Received: 16 Oct 2015
Accepted: 18 Aug 2016

Published online: 04 Aug 2017 *

Full-text access for editors Access for subscribers Purchase this article Comment on this article