Title: Using Laplace and angular measures for Feature Selection in Text Categorisation

Authors: Elena Montanes, Pedro Alonso, Elias F. Combarro, Irene Diaz, Raquel Cortina, Jose Ranilla

Addresses: Computer Science Department, University of Oviedo, Spain; Mathematics Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain

Abstract: Text Categorisation (TC) consists of automatically assigning documents to a set of predefined categories. It usually involves managing a huge number of features, some of which are irrelevant or noisy and mislead the classifiers. Reducing the feature set therefore increases both the efficiency and the effectiveness of the classification. In this paper we propose to select relevant features using two different families of filtering measures, which are simpler than other measures commonly applied for this purpose. Experiments over three corpora show that, in general, the proposed measures perform as well as or better than the existing ones, sometimes allowing greater reductions.
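Filtering-based feature selection scores each term independently for a category and keeps only the highest-scoring fraction. The sketch below illustrates this pipeline with a Laplace-style measure. It assumes the standard rule-learning Laplace accuracy of the rule "term present, therefore category", (tp + 1) / (tp + fp + 2), where tp counts documents of the category that contain the term and fp counts documents outside the category that contain it. This is an illustrative assumption, not necessarily the exact formulation used in the paper. The angular measures are not shown, and the names `laplace_score`, `select_features` and `keep_ratio` are hypothetical.

```python
from collections import Counter


def laplace_score(tp: int, fp: int) -> float:
    # Assumed Laplace accuracy of the rule "term -> category" (hypothetical form).
    return (tp + 1) / (tp + fp + 2)


def select_features(docs, labels, category, keep_ratio=0.1):
    """Hypothetical filter: rank terms by Laplace score and keep the top fraction.

    docs: list of token lists; labels: list of category sets, one per document.
    """
    tp = Counter()  # documents in `category` that contain the term
    fp = Counter()  # documents outside `category` that contain the term
    for tokens, cats in zip(docs, labels):
        for term in set(tokens):  # count each term at most once per document
            if category in cats:
                tp[term] += 1
            else:
                fp[term] += 1
    vocab = set(tp) | set(fp)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-laplace_score(tp[t], fp[t]), t))
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]


# Toy usage: keep half of the vocabulary for the "grain" category.
docs = [["wheat", "price", "export"], ["wheat", "crop"], ["stock", "price"], ["stock", "market"]]
labels = [{"grain"}, {"grain"}, {"finance"}, {"finance"}]
top = select_features(docs, labels, "grain", keep_ratio=0.5)
print(top)  # ['wheat', 'crop', 'export']
```

Here "wheat" appears only in grain documents (tp = 2, fp = 0) and scores highest, while "price" is shared across categories and is filtered out. The `keep_ratio` parameter stands in for the reduction level that such experiments typically vary.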

Keywords: feature selection; text categorisation; polynomial filtering measures.

DOI: 10.1504/IJAIP.2008.020819

International Journal of Advanced Intelligence Paradigms, 2008 Vol.1 No.1, pp.40 - 59

Published online: 17 Oct 2008
