Title: Extraction of relationship between web pages and files in access logs

Authors: Qiang Song; Yousuke Watanabe; Haruo Yokota

Addresses: Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, Japan; Global Scientific Information and Computing Center, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, Japan; Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, Japan

Abstract: As the Internet has matured, the information available on the Web grows richer every day and has become increasingly useful in daily life. It is now common to refer to information on the Web, particularly when writing documents or programs. However, when we later want to revisit those Web pages in order to modify part of a file, it can be very hard to track down the pages originally consulted. In this paper, we propose methods for extracting relationships between files and Web pages based on the co-occurrence of data in Web-access logs and file-access logs. These relationships are very useful for revisiting Web pages related to target files. Analysing co-occurrence across the two types of access logs requires merging them, and there are two approaches to doing so, which we call the Pre-Merge and Post-Merge methods; they involve a trade-off between accuracy and execution time. We have evaluated both methods using actual access logs.
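As a rough illustration of what co-occurrence between the two kinds of logs can mean, the following minimal sketch counts how often a Web page access falls within a fixed time window of a file access. It is not the Pre-Merge or Post-Merge method of the paper, whose details are not given in this abstract; the function name `cooccurrence_counts`, the `(timestamp, name)` log format, the `window` parameter, and the sample URLs and file names are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def cooccurrence_counts(web_log, file_log, window=300):
    """Count (file, url) pairs accessed within `window` seconds of each other.

    Both logs are assumed to be lists of (timestamp_in_seconds, name) tuples.
    This is a generic time-window heuristic, not the paper's algorithm.
    """
    web_sorted = sorted(web_log)               # order Web accesses by time
    web_times = [t for t, _ in web_sorted]     # timestamps only, for bisect
    counts = Counter()
    for t, path in file_log:
        # Locate all Web accesses with timestamp in [t - window, t + window].
        lo = bisect_left(web_times, t - window)
        hi = bisect_right(web_times, t + window)
        for _, url in web_sorted[lo:hi]:
            counts[(path, url)] += 1
    return counts


# Hypothetical sample logs (timestamps in seconds).
web_log = [
    (100, "https://example.org/a"),
    (160, "https://example.org/b"),
    (5000, "https://example.org/c"),
]
file_log = [(120, "report.tex"), (5010, "main.py"), (130, "report.tex")]

counts = cooccurrence_counts(web_log, file_log, window=60)
for (path, url), n in counts.most_common():
    print(path, url, n)
```

Pairs with higher counts would be candidates for "Web pages related to this file"; how the two logs are merged and how accuracy is traded against execution time is where the methods proposed in the paper differ.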

Keywords: relationship analysis; file search; web search; access logs; business intelligence; data mining; co-occurrence frequency; internet; files; web pages.

DOI: 10.1504/IJBIDM.2012.049552

International Journal of Business Intelligence and Data Mining, 2012 Vol.7 No.3, pp.152 - 171

Published online: 12 Nov 2014
