Title: Self-supervised capturing of users' activities from weblogs

Authors: The-Minh Nguyen; Takahiro Kawamura; Yasuyuki Tahara; Akihiko Ohsuga

Addresses: Graduate School of Information Systems, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan (all authors)

Abstract: This paper describes a method for automatically extracting all the basic attributes of an activity, namely actor, action, object, time and location, from Japanese weblogs. Sentences retrieved from weblogs are often diverse, complex and syntactically incorrect, and frequently contain emoticons and new words. Some previous works have tried to extract users' activities from sentences retrieved from the web and weblogs. However, these works have several limitations, such as the inability to extract infrequent activities, high setup cost, restrictions on the types of sentences that can be handled, and the need to prepare a list of objects and actions. To resolve these problems, we propose a novel approach that treats activity extraction as a sequence labelling problem and automatically generates its own training data. This approach can extract infrequent activities, is scalable, and requires no hand-tagged data. Since it does not require the positions and number of the attributes in activity sentences to be fixed, this approach can extract all attributes with high recall.

Keywords: human activities; semantic networks; weblogs; weblog mining; self-supervised learning; conditional random fields; blogs; blog mining; Japan; user attributes; activity extraction; sequence labelling; infrequent activities; attribute extraction.

DOI: 10.1504/IJIIDS.2012.045117

International Journal of Intelligent Information and Database Systems, 2012 Vol.6 No.1, pp.61 - 76

Received: 17 Jul 2010
Accepted: 28 Dec 2010

Published online: 16 Aug 2014
