Title: Acquisition of a classification model for a risk search system from unbalanced textual examples
Authors: Shigeaki Sakurai, Ryohei Orihara
Addresses: Corporate Research and Development Center, Toshiba Corporation, 1, Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan; Corporate Research and Development Center, Toshiba Corporation, 1, Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan
Abstract: This paper proposes a method that acquires a more appropriate classification model for a risk search system analysing corporate reputation information included in bulletin board sites. The method inductively acquires the model from textual examples composed of many negative examples and a few positive examples. It selects two kinds of important negative examples by referring to expressions related to a specific label. Here, the label represents the contents of the papers. Finally, the method uses the selected negative examples and all the positive examples to acquire the model. The paper verifies the effectiveness of the method through comparative experiments.
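The general idea of the abstract (keep all of the few positive examples, keep only the negative examples that mention expressions related to a label) might be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state what the two kinds of selected negative examples are, so the single substring-matching rule, the function name `select_training_examples`, and the sample texts and expressions are all assumptions made for this sketch.

```python
from typing import List, Tuple

# An example is a (text, is_positive) pair; this representation is an assumption.
Example = Tuple[str, bool]


def select_training_examples(
    examples: List[Example], label_expressions: List[str]
) -> List[Example]:
    """Keep every positive example and only the negative examples
    whose text contains at least one label-related expression.

    Hypothetical stand-in for the paper's selection step: the real
    criteria for the two kinds of negative examples are not given
    in the abstract.
    """
    selected: List[Example] = []
    for text, is_positive in examples:
        if is_positive:
            # Positive examples are scarce, so all of them are kept.
            selected.append((text, True))
        elif any(expr in text for expr in label_expressions):
            # Negative examples near the label's vocabulary are kept.
            selected.append((text, False))
    return selected


# Made-up sample data for illustration only.
examples = [
    ("the product caught fire after a week", True),
    ("fire sale on all products this weekend", False),
    ("nice weather today", False),
    ("support desk was closed", False),
]
label_expressions = ["fire", "recall"]

training_set = select_training_examples(examples, label_expressions)
```

The reduced `training_set` would then be passed to a learner such as an SVM, which the keywords name; the feature extraction and training steps are outside what the abstract describes and are omitted here.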
Keywords: unbalanced textual examples; classification models; text mining; SVM; support vector machines; bulletin board sites; corporate reputation information; risk search; g-measure; Tomek links.
International Journal of Business Intelligence and Data Mining, 2009 Vol.4 No.1, pp.22 - 37
Available online: 21 May 2009