Title: Set Cover Feature Selection for Text Categorisation and spam detection
Authors: Elias F. Combarro, Jose Ranilla, Manuel Roberto Berdasco, Elena Montanes, Irene Diaz
Addresses: Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain; Computer Science Department, University of Oviedo, Spain
Abstract: This paper studies the performance of the Set Cover (SC) Feature Selection (FS) method on Text Categorisation (TC) and Spam Detection problems. Several variants of the original method are presented, either to overcome the difficulties posed by the unbalanced problems that are common in TC or to increase efficiency. The behaviour of the algorithm is tested on several collections. The experiments show that these methods greatly reduce the dimensionality of the problem while either preserving the effectiveness of the classification or causing only a slight decrease.
Keywords: feature selection; text categorisation; spam detection; scoring measures; classification.
DOI: 10.1504/IJAIP.2009.026764
International Journal of Advanced Intelligence Paradigms, 2009 Vol. 1 No. 4, pp. 444-462
Published online: 25 Jun 2009