Title: Inferring relevant blocks on hyperlinked web page based on block-to-block similarity

Authors: Keiichiro Tsukamoto; Yuki Koizumi; Hiroyuki Ohsaki; Kunio Hato; Junichi Murayama

Addresses: Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka, 565-0871, Japan; Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka, 565-0871, Japan; Department of Informatics, School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo 669-1337, Japan; NTT Secure Platform Laboratories, NTT Corporation, 3-9-11, Midori-cho, Musashino, Tokyo 180-8585, Japan; NTT Secure Platform Laboratories, NTT Corporation, 3-9-11, Midori-cho, Musashino, Tokyo 180-8585, Japan

Abstract: Internet users devote considerable time and effort to collecting information from the web. To do so efficiently, a user who follows a hyperlink must be able to determine rapidly whether the destination web page contains the desired information. We therefore propose a method called hyperlink referring block estimation (HERB), which infers the existence and location of relevant content on destination web pages. HERB exploits the user's browsing context, in particular the selected hyperlink and the text around it. We quantitatively investigate the effectiveness of HERB through experiments simulating ordinary web browsing, which show that HERB can infer blocks relevant to a hyperlink with approximately 65% precision and 70% recall. Furthermore, we design two HERB implementations, a web proxy and a web browser, and present an overview of a web proxy prototype and an example use case.
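
The abstract and keywords indicate that relevance is judged by a block-to-block cosine similarity between the text around the selected hyperlink and the blocks of the destination page. The following is a minimal sketch of that general idea only, not the authors' HERB implementation: the bag-of-words tokenisation, the function names (tokenize, cosine_similarity, rank_blocks) and the threshold value of 0.1 are all assumptions made for illustration, and the page segmentation step is assumed to have already produced a list of block texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_blocks(link_context, blocks, threshold=0.1):
    """Return (block index, score) pairs scoring at least `threshold`, best first.

    `link_context` stands for the hyperlink text plus the text around it;
    `blocks` stands for the text of each block on the destination page.
    The threshold is a hypothetical value, not one taken from the paper.
    """
    ctx = Counter(tokenize(link_context))
    scored = [
        (i, cosine_similarity(ctx, Counter(tokenize(block))))
        for i, block in enumerate(blocks)
    ]
    relevant = [(i, score) for i, score in scored if score >= threshold]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)
```

Called with a link context such as "cheap flights to Osaka" and a list of block texts, rank_blocks returns only the blocks that share enough vocabulary with the context, which a proxy or browser could then highlight or scroll to. Raw term frequencies are used here for brevity; a weighting scheme such as TF-IDF would be a natural substitution.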

Keywords: web browsing; hyperlinks; relevant block inference; web page segmentation; cosine similarity; web pages; block-to-block similarity; content relevance; user context; simulation; web proxy; web browsers.

DOI: 10.1504/IJKWI.2013.060266

International Journal of Knowledge and Web Intelligence, 2013 Vol.4 No.4, pp.279 - 299

Received: 27 Nov 2012
Accepted: 12 Jul 2013

Published online: 26 Jul 2014
