Title: The estimate method of the omission of Japanese inquiry texts using an LDA algorithm

Authors: Tomohiko Harada; Kazuhiko Tsuda; Nobuo Suzuki; Yoshikatsu Fujita

Addresses: Graduate School of Systems and Information Engineering, University of Tsukuba, Tokyo 112-0012, Japan; Graduate School of Systems and Information Engineering, University of Tsukuba, Tokyo 112-0012, Japan; KDDI R&D Laboratories, Tokyo 102-8460, Japan; Department of Sociology, Teikyo University, Tokyo 192-0395, Japan

Abstract: Inquiries through web forms and emails are becoming increasingly common. These inquiry texts usually include many informal expressions and use a colloquial style closer to spoken language. Words are often omitted, which makes sentences ambiguous and sometimes causes them to be misunderstood. In this paper, we focus on the frequently omitted noun 'B' in the noun phrase 'A NO B' (usually meaning 'B of A') seen in colloquial-style inquiry text, and we propose a method that predicts the omitted noun 'B' from the context and from knowledge, using topic information. The results of an evaluation experiment confirm that our method improved the prediction accuracy by 11.34% compared to the conventional method and predicted the omitted word with an accuracy of more than 75% using latent Dirichlet allocation (LDA). Note: In this paper, italic fonts are used to express Japanese pronunciation (e.g., 'NO' expresses the pronunciation of the Japanese connective particle 'NO').
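
The abstract does not spell out the prediction procedure, so the following is only a rough sketch of how topic information could, in principle, be used to rank candidates for an omitted noun 'B'. Everything in it is an assumption made for illustration: the two topics, the English stand-in words, the probability values, the smoothing constant, and the scoring rule (sum over topics of P(B | topic) × P(topic | context)) are hypothetical and are not taken from the paper. In a real setting the topic-word probabilities would be estimated by an LDA model trained on inquiry texts.

```python
# Illustrative sketch only: the topic-word probabilities are made-up toy
# values, not results from the paper, and the scoring rule is an assumption.

# Hypothetical P(word | topic) for two topics.
TOPIC_WORD = {
    "billing": {"invoice": 0.30, "payment": 0.30, "amount": 0.25,
                "month": 0.10, "signal": 0.05},
    "network": {"signal": 0.35, "antenna": 0.25, "strength": 0.25,
                "month": 0.05, "amount": 0.10},
}
TOPIC_PRIOR = {"billing": 0.5, "network": 0.5}
SMOOTHING = 1e-3  # probability assigned to words missing from a topic


def word_prob(word, topic):
    """Return P(word | topic), falling back to a small smoothing value."""
    return TOPIC_WORD[topic].get(word, SMOOTHING)


def topic_posterior(context_words):
    """Estimate P(topic | context) under a naive bag-of-words assumption."""
    scores = {}
    for topic, prior in TOPIC_PRIOR.items():
        score = prior
        for word in context_words:
            score *= word_prob(word, topic)
        scores[topic] = score
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


def rank_candidates(context_words, candidates):
    """Rank candidate nouns B by sum over topics of P(B | topic) * P(topic | context)."""
    posterior = topic_posterior(context_words)
    scored = {
        b: sum(word_prob(b, topic) * p for topic, p in posterior.items())
        for b in candidates
    }
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    # Context suggesting a billing inquiry: "amount" should outrank "strength".
    print(rank_candidates(["invoice", "month"], ["strength", "amount"]))
```

With a billing-like context the candidate "amount" is ranked first, while a context such as `["signal", "antenna"]` ranks "strength" first. This only shows how context-derived topic weights can shift the preferred candidate; it makes no claim about the accuracy figures reported in the abstract.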

Keywords: colloquial expressions; omissions; topic information; LDA; latent Dirichlet allocation; Japanese texts; Japan; web forms; emails; informal expressions; enquiry texts; prediction accuracy; online enquiries.

DOI: 10.1504/IJCAT.2015.071980

International Journal of Computer Applications in Technology, 2015 Vol.52 No.2/3, pp.186 - 195

Published online: 26 Sep 2015
