Title: Facilitating discovery on the private web using dataset digests

Authors: Peter Mork, Ken Smith, Barbara Blaustein, Christopher Wolf, Ken Samuel, Keri Sarver, Irina Vayndiner

Addresses: The MITRE Corporation, 7515 Colshire Dr, McLean, VA 22102, USA (all authors)

Abstract: Whereas strategies for discovering content on the surface web are commonplace, similar strategies for the private web are non-existent. In this paper, we first establish a general framework for advertising the existence of private web resources that subsumes many existing summarisation strategies, and is based on succinct statistical summaries (which we call digests). We then investigate the trade-off between the data owners' desires to minimise disclosure of sensitive information and the searchers' desires to minimise query error, demonstrating that our techniques are superior to using k-anonymity for that purpose. Finally, we show that our techniques for summarisation do, in fact, make it possible to discover private web data resources.

Keywords: data discovery; data summarisation; privacy; search; indexing; histograms; private web; disclosure; dataset digests; web content discovery; statistical summaries.

DOI: 10.1504/IJMSO.2010.034042

International Journal of Metadata, Semantics and Ontologies, 2010 Vol.5 No.3, pp.170 - 183

Published online: 06 Jul 2010
