Title: Classical and swarm-based approaches for feature selection in spam filtering
Authors: Kamilia Menghour; Labiba Souici-Meslati
Addresses: LISCO Laboratory, Badji Mokhtar-Annaba University, P.O. Box 12, 23000, Annaba, Algeria; LISCO Laboratory, Badji Mokhtar-Annaba University, P.O. Box 12, 23000, Annaba, Algeria
Abstract: Feature selection is a significant stage in classification and data mining systems. As a preprocessing step for machine learning, it is very effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving the comprehensibility of results. Swarm intelligence, which deals with natural and artificial systems composed of many individuals that coordinate through decentralised control and self-organisation, represents an interesting recent trend in feature selection. In this article, we propose several feature selection algorithms for e-mail classification, based on classical feature ranking methods as well as on different variants of ant colony optimisation and binary particle swarm optimisation. This work presents a comparative study of the performance of these approaches when they are applied in conjunction with Naive Bayes classifiers. Our experimental results show that the proposed feature selection approaches can achieve low predictive error rates in spam filtering while significantly reducing the number of selected features. These results were confirmed by our experiments on other public databases.
Keywords: feature selection; swarm intelligence; ant colony optimisation; ACO; particle swarm optimisation; PSO; spam filtering; feature ranking; naive Bayes classifiers; email classification.
International Journal of Advanced Intelligence Paradigms, 2014, Vol. 6, No. 3, pp. 214-234
Received: 20 May 2013
Accepted: 17 Mar 2014
Published online: 29 Oct 2014