An evaluation of provenance-based near-duplicates detection
by Y. Syed Mudhasir; J. Deepika; S. Sendhilkumar
International Journal of Knowledge and Web Intelligence (IJKWI), Vol. 2, No. 2/3, 2011

Abstract: Existing search engines suffer from redundancy in their search results. Detecting and eliminating such redundancy (near-duplicates) is an area of research widely pursued by search engine researchers. Provenance-based factors can improve web search by helping to deliver quality content to the user. For users, many of the factors that affect personalisation may also prove useful in determining the quality of, and trust in, web documents. Provenance information is also helpful in filtering near-duplicates from search results based on 6W factors. This paper therefore aims to develop a web search system that uses a provenance-based technique for near-duplicate detection and elimination. The system incorporates a personalised crawler (focused crawler) for computing author credentials, which contribute to the trustworthiness of a web document. Finally, the results of the proposed system are compared with those of existing algorithms using a test bed of web documents.

Online publication date: Fri, 09-Dec-2011
