Title: Hyper-textual language model for web information retrieval

Authors: Ying Xie, Vijay V. Raghavan, Andrew Young

Addresses: Department of Computer Science and Information Systems, Kennesaw State University, Kennesaw, GA 30144, USA; The Centre for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70503, USA; 4Access Communications, Cumming, GA 30040, USA

Abstract: In this paper, we propose a unified retrieval model, called the hyper-textual language model, for web information retrieval. The proposed model seamlessly integrates information from multiple sources, including web content, hyperlinks and the topology of the web, in a unified modelling framework. On the one hand, the model extends the language modelling technique to accommodate the special structural and semantic information carried by the hyperlinks of the web; on the other hand, it provides a formal retrieval model that realises topic-relevant page ranking. An experimental study on a university website shows that this formal retrieval model outperforms several alternative search techniques, including Google and Inktomi, on a group of test queries.
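The abstract names two ingredients: a language model for page content and a topic-relevant, PageRank-style ranking derived from the link graph. The abstract does not give the paper's equations, so the sketch below is NOT the hyper-textual language model itself. It is a generic illustration, under stated assumptions, of how those two ingredients are commonly combined: a query-likelihood language model with Dirichlet smoothing, plus a log prior from a PageRank variant whose random jumps land only on topic-relevant pages. All function names, the parameters `mu` and `damping`, and the toy pages are hypothetical.

```python
import math
from collections import Counter


def topic_biased_pagerank(links, topic_pages, damping=0.85, iters=50):
    """Power iteration for a PageRank variant whose random jumps (teleports)
    land only on pages assumed to be relevant to the topic."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    teleport = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iters):
        new = {p: (1 - damping) * teleport[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Spread this page's rank evenly over its out-links.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: redistribute its mass via the teleport vector.
                for q in pages:
                    new[q] += damping * rank[p] * teleport[q]
        rank = new
    return rank


def query_log_likelihood(query, doc_tokens, collection_counts, collection_len, mu=2000.0):
    """log P(query | document) under a Dirichlet-smoothed unigram language model."""
    counts = Counter(doc_tokens)
    dlen = len(doc_tokens)
    score = 0.0
    for term in query:
        p_c = collection_counts.get(term, 0) / collection_len
        if p_c == 0:
            continue  # term unseen in the whole collection; skip to avoid log(0)
        score += math.log((counts[term] + mu * p_c) / (dlen + mu))
    return score


def rank_pages(query, docs, links, topic_pages, mu=2000.0):
    """Score = content log-likelihood + log link-based prior (hypothetical combination)."""
    collection = Counter(t for toks in docs.values() for t in toks)
    total = sum(collection.values())
    prior = topic_biased_pagerank(links, topic_pages)
    scores = {
        d: query_log_likelihood(query, toks, collection, total, mu)
        + math.log(prior.get(d, 0.0) + 1e-12)  # epsilon guards pages with zero prior
        for d, toks in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True), prior


# Toy web site (hypothetical): pages a and b are on-topic, page c is not.
docs = {
    "a": "language model retrieval".split(),
    "b": "language model web links".split(),
    "c": "campus parking map".split(),
}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ranking, prior = rank_pages(["language", "model"], docs, links, topic_pages={"a", "b"})
print(ranking)
```

In this toy site, page b receives links from both other pages and so gets the largest link prior, which outweighs page a's slightly better content score; page c contains neither query term and ranks last. Adding the log prior to the log likelihood is equivalent to multiplying the query likelihood by a document prior, one common way to fold link evidence into a language model.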

Keywords: granular computing; hyper-textual language models; information retrieval; language models; page ranking; web search; web information; internet; hypertext; web content; hyperlinks; web topology; unified modelling; semantics; webpages; university websites.

DOI: 10.1504/IJGCRSIS.2009.028009

International Journal of Granular Computing, Rough Sets and Intelligent Systems, 2009 Vol.1 No.2, pp. 190-202

Received: 02 Oct 2008
Accepted: 05 Jan 2009

Published online: 27 Aug 2009
