Title: Hyper-textual language model for web information retrieval
Author: Ying Xie, Vijay V. Raghavan, Andrew Young
Department of Computer Science and Information Systems, Kennesaw State University, Kennesaw, GA 30144, USA.
The Centre for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70503, USA.
4Access Communications, Cumming, GA 30040, USA.
Abstract: In this paper, we propose a unified retrieval model, called the hyper-textual language model, for web information retrieval. The proposed model integrates information from multiple sources, including web content, hyperlinks and the topology of the web, in a single modelling framework. On the one hand, the model extends the language modelling technique to accommodate the special structural and semantic information carried by the hyperlinks of the web; on the other hand, it provides a formal retrieval model that realises topic-relevant page ranking. An experimental study on a university website shows that this formal retrieval model outperforms several alternative search techniques, including Google and Inktomi, on a group of test queries.
Keywords: granular computing; hyper-textual language models; information retrieval; language models; page ranking; web search; web information; internet; hypertext; web content; hyperlinks; web topology; unified modelling; semantics; webpages; university websites.
Int. J. of Granular Computing, Rough Sets and Intelligent Systems, 2009 Vol.1, No.2, pp.190 - 202
Submission date: 01 Oct 2008
Date of acceptance: 05 Jan 2009
Available online: 27 Aug 2009