Authors: Yih-Ling Hedley, Muhammad Younas, Anne James
Addresses: Faculty of Engineering and Computing, Coventry University, Coventry CV1 5FB, UK; Department of Computing, Oxford Brookes University, Oxford OX33 1HX, UK; Faculty of Engineering and Computing, Coventry University, Coventry CV1 5FB, UK
Abstract: Hidden Web databases dynamically generate results in response to users' queries. The categorisation of such databases into a category scheme has been widely employed in information searches. We present a Concept-based Categorisation over Refined Sampled Documents (CCReSD) approach that effectively handles information extraction, summarisation and categorisation of such databases. CCReSD detects and extracts query-related information from sampled documents of databases. It generates terms and frequencies to summarise database contents. It also generates descriptions of concepts from their coverage and specificity given in a category scheme. We conduct experiments to evaluate our approach and to show that it assigns more relevant subject categories to databases.
Keywords: hidden web databases; database categorisation; information extraction; information retrieval; subject categories.
International Journal of High Performance Computing and Networking, 2007 Vol.5 No.1/2, pp.24 - 33
Published online: 14 Nov 2007