Title: CCReSD: concept-based categorisation of Hidden Web databases

Authors: Yih-Ling Hedley, Muhammad Younas, Anne James

Addresses: Faculty of Engineering and Computing, Coventry University, Coventry CV1 5FB, UK; Department of Computing, Oxford Brookes University, Oxford OX33 1HX, UK; Faculty of Engineering and Computing, Coventry University, Coventry CV1 5FB, UK

Abstract: Hidden Web databases dynamically generate results in response to users' queries. The categorisation of such databases into a category scheme has been widely employed in information searches. We present a Concept-based Categorisation over Refined Sampled Documents (CCReSD) approach that effectively handles information extraction, summarisation and categorisation of such databases. CCReSD detects and extracts query-related information from sampled documents of databases. It generates terms and frequencies to summarise database contents. It also generates descriptions of concepts from their coverage and specificity given in a category scheme. We conduct experiments to evaluate our approach and show that it assigns more relevant subject categories to databases.

Keywords: hidden web databases; database categorisation; information extraction; information retrieval; subject categories.

DOI: 10.1504/IJHPCN.2007.015761

International Journal of High Performance Computing and Networking, 2007 Vol.5 No.1/2, pp.24 - 33

Published online: 14 Nov 2007
