Title: CCReSD: concept-based categorisation of Hidden Web databases
Author: Yih-Ling Hedley, Muhammad Younas, Anne James
Faculty of Engineering and Computing, Coventry University, Coventry CV1 5FB, UK.
Department of Computing, Oxford Brookes University, Oxford OX33 1HX, UK.
Faculty of Engineering and Computing, Coventry University, Coventry CV1 5FB, UK.
Abstract: Hidden Web databases dynamically generate results in response to users' queries. Categorising such databases under a category scheme is widely used to support information searches. We present a Concept-based Categorisation over Refined Sampled Documents (CCReSD) approach that effectively handles the information extraction, summarisation and categorisation of such databases. CCReSD detects and extracts query-related information from documents sampled from the databases, and generates terms and their frequencies to summarise database contents. It also generates descriptions of concepts from their coverage and specificity, as given in a category scheme. We conduct experiments to evaluate the approach, and the results show that it assigns more relevant subject categories to databases.
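The abstract names three stages: refining sampled documents, summarising them as term frequencies, and matching the summary against concept descriptions. The sketch below strings these stages together under stated assumptions; it is an illustration only, not the CCReSD algorithm. In particular, `refine`, `summarise`, `cosine` and `categorise` are hypothetical names, "refinement" is approximated by dropping lines shared by every sampled page (treated as template text), and concept descriptions are hand-written term weights standing in for the coverage- and specificity-derived descriptions the approach actually uses.

```python
import math
import re
from collections import Counter


def refine(pages):
    """Hypothetical refinement: lines present in every sampled page are
    treated as template text and removed, keeping query-related content.
    Assumes at least two pages; with one page every line counts as shared."""
    line_sets = [set(p.splitlines()) for p in pages]
    shared = set.intersection(*line_sets) if line_sets else set()
    return ["\n".join(l for l in p.splitlines() if l not in shared) for p in pages]


def summarise(docs):
    """Summarise a database as term frequencies over its refined documents."""
    counts = Counter()
    for d in docs:
        counts.update(re.findall(r"[a-z]+", d.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two term-weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def categorise(summary, concepts):
    """Assign the category whose (assumed) description best matches the summary."""
    return max(concepts, key=lambda c: cosine(summary, concepts[c]))


# Hypothetical sampled result pages sharing a header and footer template.
pages = [
    "Search results\nheart disease treatment\nCopyright 2007",
    "Search results\ncancer treatment drugs\nCopyright 2007",
]
# Hypothetical concept descriptions (term -> weight) for two categories.
concepts = {
    "Health": {"disease": 1, "treatment": 1, "cancer": 1, "drugs": 1},
    "Sports": {"football": 1, "tennis": 1},
}

refined = refine(pages)
summary = summarise(refined)
category = categorise(summary, concepts)
```

Here the shared "Search results" and "Copyright 2007" lines are discarded before summarising, so template terms do not dilute the summary, and the database is assigned to "Health". Cosine similarity is a common choice for comparing term vectors, but it is used here purely as a stand-in for whatever matching function the approach employs.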
Keywords: hidden web databases; database categorisation; information extraction; information retrieval; subject categories.
Int. J. of High Performance Computing and Networking, 2007, Vol. 5, No. 1/2, pp. 24-33
Available online: 14 Nov 2007