Title: Acquisition of a classification model for a risk search system from unbalanced textual examples

Authors: Shigeaki Sakurai, Ryohei Orihara

Addresses: Corporate Research and Development Center, Toshiba Corporation, 1, Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan; Corporate Research and Development Center, Toshiba Corporation, 1, Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan

Abstract: This paper proposes a method that acquires a more appropriate classification model for a risk search system analysing corporate reputation information included in bulletin board sites. The method inductively acquires the model from textual examples composed of many negative examples and a few positive examples. It selects two kinds of important negative examples by referring to expressions related to a specific label. Here, the label represents the contents of the papers. Finally, the method uses the selected negative examples and all the positive examples to acquire the model. The paper verifies the effectiveness of the method through comparative experiments.

Keywords: unbalanced textual examples; classification models; text mining; SVM; support vector machines; bulletin board sites; corporate reputation information; risk search; g-measure; Tomek links.
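
Illustrative sketch (not taken from the paper): the keywords mention Tomek links, a standard device for thinning the majority class in unbalanced data. A Tomek link is a pair of examples with opposite labels that are each other's nearest neighbours. The code below finds such pairs and drops the negative member of each one. It assumes that texts have already been converted to numeric feature vectors and that Euclidean distance is appropriate; the function names, the label encoding (0 = negative, 1 = positive) and the removal policy are hypothetical choices made for illustration. The record above does not say how the authors use Tomek links, and their own selection of negative examples relies on expressions related to a specific label, which this sketch does not attempt to reproduce.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def drop_linked_negatives(points, labels, negative=0):
    """Remove negative examples that take part in a Tomek link (assumed policy)."""
    linked = {k for pair in tomek_links(points, labels) for k in pair
              if labels[k] == negative}
    keep = [k for k in range(len(points)) if k not in linked]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data: four negatives (0) and one positive (1).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 6.0), (9.0, 9.0)]
labels = [0, 1, 0, 0, 0]
kept_points, kept_labels = drop_linked_negatives(points, labels)
```

In the toy data only the first negative sits next to the positive example, so it is the only one removed; the remaining examples could then be passed to a classifier such as an SVM.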

DOI: 10.1504/IJBIDM.2009.025409

International Journal of Business Intelligence and Data Mining, 2009 Vol.4 No.1, pp.22 - 37

Published online: 21 May 2009
