Title: Set Cover Feature Selection for Text Categorisation and Spam Detection

Authors: Elias F. Combarro, Jose Ranilla, Manuel Roberto Berdasco, Elena Montanes, Irene Diaz

Addresses: Computer Science Department, University of Oviedo, Spain (all authors)

Abstract: This paper studies the performance of the Set Cover (SC) Feature Selection (FS) method on Text Categorisation (TC) and Spam Detection problems. Several variants of the original method are presented, either to cope with the unbalanced problems that commonly arise in TC or to improve efficiency. The behaviour of the algorithm is tested on several collections. The experiments show that these methods greatly reduce the dimensionality of the problem while either preserving classification effectiveness or causing only a slight decrease.

Keywords: feature selection; text categorisation; spam detection; scoring measures; classification.

DOI: 10.1504/IJAIP.2009.026764

International Journal of Advanced Intelligence Paradigms, 2009, Vol. 1, No. 4, pp. 444-462

Published online: 25 Jun 2009
