Title: The cooperation model for multi-agents and the identification on replicated collections for web crawler

Authors: Kai Gao, Shengwang Li

Addresses: College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, 050018, China; College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei, 050018, China

Abstract: The quality of a web crawler increases if it can assess whether a newly crawled web page is a duplicate of a previously crawled page, so the strategy for detecting and filtering duplicates is important. In this paper, a dynamic cooperation model for message exchange among different agents is presented, and a method for identifying replicated collections is then proposed. The space complexity of the algorithm is not high, and its time complexity and recall ratio are better than those of the primary algorithm. Both the experimental results and the application validate the feasibility of the algorithm. Existing problems and future work are presented at the end.

Keywords: cooperation modelling; multi-agent systems; MAS; agent-based systems; replicated collections; web crawlers; search engines; URL; hash function; message exchange; web pages.

DOI: 10.1504/IJMIC.2010.037034

International Journal of Modelling, Identification and Control, 2010 Vol.11 No.3/4, pp.224 - 231

Available online: 21 Nov 2010
