Title: Online commercial intention detection framework based on web pages

Authors: Huakang Li; Xiaofeng Xu; Longbin Lai; Yao Shen

Addresses: Department of Computer Science and Engineering, Shanghai Jiao Tong University, SEIEE Building 3-118, 800 Dongchuan Road, Shanghai, 200240, China (all four authors)

Abstract: The China Internet Network Information Centre (CNNIC) reported that most internet users around the world spent 10-16 hours per week online. For effective advertising and social information publishing on the internet, extracting commercial value from users' online behaviour has become a new challenge compared with traditional recommendation systems. In this paper, we propose a novel system, the 'online commercial intention (OCI) detection system', which uses users' global web browsing history to predict the products they may purchase on an online shopping platform. A 'commercial keyword dictionary (KD)' that captures the relationship between user queries and product categories is first built by analysing the click distribution of queries at billion scale on the shopping platform. The browsing footprints of millions of internet users are then gathered and the raw page contents are crawled. Keywords in these pages are extracted using an N-gram algorithm, and their commercial probabilities are estimated with query frequency (QF), inverse category frequency (ICF), etc. The page OCI is estimated by merging the KD matrices of its commercial keywords. To increase the coherence and accuracy of the predicted categories, we provide a category similarity model that measures the distance between the top N categories. The experimental results show that category prediction accuracy reaches 86% under manual evaluation.
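
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the ICF definition (by analogy with inverse document frequency), the log-damped QF weight, and the weighted-sum merge are all assumptions, and the names `build_kd`, `page_oci` and `ngrams` are hypothetical.

```python
import math
from collections import defaultdict


def ngrams(tokens, max_n=2):
    """Yield every contiguous word n-gram of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_kd(click_log, num_categories):
    """Build a toy keyword dictionary from (query, category, clicks) rows.

    Assumed definitions (not taken from the paper):
      QF  = total clicks recorded for the query
      ICF = log(num_categories / number of categories the query was clicked in)
    """
    counts = defaultdict(lambda: defaultdict(int))
    for query, category, clicks in click_log:
        counts[query][category] += clicks

    kd = {}
    for query, per_cat in counts.items():
        total = sum(per_cat.values())
        kd[query] = {
            "dist": {c: n / total for c, n in per_cat.items()},
            "qf": total,
            "icf": math.log(num_categories / len(per_cat)),
        }
    return kd


def page_oci(text, kd, max_n=2):
    """Merge the category distributions of all dictionary keywords on a page.

    Each matching n-gram contributes its click distribution, weighted by
    log(1 + QF) * ICF (an assumed weighting). Scores are normalised to sum to 1.
    """
    scores = defaultdict(float)
    for gram in ngrams(text.lower().split(), max_n):
        entry = kd.get(gram)
        if entry is None:
            continue
        weight = math.log(1 + entry["qf"]) * entry["icf"]
        for category, p in entry["dist"].items():
            scores[category] += weight * p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else {}


# Made-up click data, for illustration only.
click_log = [
    ("running shoes", "Shoes", 90),
    ("running shoes", "Sports", 10),
    ("laptop", "Computers", 50),
]
kd = build_kd(click_log, num_categories=3)
result = page_oci("cheap running shoes for sale", kd)
```

In this sketch a keyword clicked in every category gets an ICF of zero and so contributes nothing, which mirrors the usual intuition behind inverse-frequency weights. The category similarity step mentioned in the abstract is not covered here.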

Keywords: user behaviour; online behaviour; online commercial intention; OCI; user profiles; large-scale data; commercial keyword dictionary; category similarity modelling; commercial probabilities; product categories; web browsing history; potential purchases; online shopping; user queries; click distribution; query frequency; inverse category frequency; commercial keywords.

DOI: 10.1504/IJCSE.2016.076220

International Journal of Computational Science and Engineering, 2016 Vol.12 No.2/3, pp.176 - 185

Received: 01 Feb 2013
Accepted: 09 Jun 2013

Published online: 30 Apr 2016
