Title: Persistent queries over dynamic text streams

Authors: Javed Aslam, Ekaterina Pelekhov, Daniela Rus

Addresses: College of Computer Science, Northeastern University, Boston, MA, USA; Department of Computer Science, Dartmouth College, Hanover, NH, USA; Department of EECS, MIT and Dartmouth, Boston, MA, USA

Abstract: We wish to develop automated tools for information organisation that support information processing in the age of information overload. We present a filtering-based approach to persistent queries that uses clustering. We use the online version of the star algorithm developed in our previous work as our clustering tool because this algorithm computes, with high precision, naturally occurring topics in a collection and it admits an efficient online solution for dynamic streams of text. We describe the principle behind the filtering algorithms and show experimental data. We then discuss a system that uses these algorithms in support of information push by allowing users to submit persistent queries. Finally, we evaluate the persistent query system using TREC data.

Keywords: information retrieval; filtering; persistent queries; information processing; clustering; e-business; electronic business; dynamic text streams; digital collections; browsable hierarchies; star algorithm.

DOI: 10.1504/IJEB.2005.007273

International Journal of Electronic Business, 2005 Vol.3 No.3/4, pp.288-299

Published online: 30 Jun 2005
