Title: An efficient approach for mining web content sensitivity

Authors: Cheng Wang, Ying Liu, Liheng Jian, Peng Zhang

Addresses: Agilent Technologies Co. Ltd., Beijing 100102, China. ' Graduate University of Chinese Academy of Sciences; Fictitious Economy and Data Science Research Center, Beijing 100080, China. ' Graduate University of Chinese Academy of Sciences. ' Graduate University of Chinese Academy of Sciences

Abstract: Abnormal remarks on the web, such as violence, threats, and superstition (referred to here as sensitive content), may disturb social order and public morality. To quantify how sensitive a webpage is, we propose the concept of web content sensitivity, together with an approach for mining it. In our experiment, the approach identified a number of sensitive webpages that traditional frequency-based methods failed to find. Varying the sensitivity values assigned to the keywords yielded different sets of high-sensitivity keywords and their corresponding webpages.

Keywords: utility mining; web content mining; public opinion monitoring; web content sensitivity; two-phase algorithm; internet; sensitive web pages; sensitivity keywords.

DOI: 10.1504/IJKWI.2009.027927

International Journal of Knowledge and Web Intelligence, 2009 Vol.1 No.1/2, pp.95 - 109

Published online: 19 Aug 2009
