Authors: Mumtaz M. Al-Mukhtar; Yasmine M. Tabra
Addresses: Department of Internet Engineering, College of Information Engineering, Al-Nahrain University, P.O. Box 64074, Aljadria, Baghdad, Iraq; Department of Internet Engineering, College of Information Engineering, Al-Nahrain University, P.O. Box 64074, Aljadria, Baghdad, Iraq
Abstract: The volume of unsolicited bulk e-mail, commonly known as spam, has increased enormously in recent years and has become a serious threat not only to the internet but also to society. Developing spam filters that automatically and effectively eliminate this growing volume of unwanted e-mail is challenging. This work presents a spam filter that combines a support vector machine (SVM) classifier for non-linear data, using a suitable kernel function, with appropriate data pre-processing. Data pre-processing is a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. The pre-processing steps are HTML removal, HTML replacement, de-obfuscation and stop-word removal. Adding the pre-processing stage improved classification performance. The estimated training and classification times for different document sizes indicate that the adopted method is practical and computationally efficient. Experimental results show that the approach enhances filtering performance.
Keywords: spam filters; kernel function; classification; support vector machines; SVM; unsolicited email; filtering performance.
International Journal of Internet Technology and Secured Transactions, 2012 Vol.4 No.1, pp.42 - 54
Available online: 26 Jan 2012
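
Illustrative sketch (not the authors' implementation): the abstract describes pre-processing e-mail text into feature vectors that an SVM kernel can consume. The Python below is a minimal sketch of that idea under stated assumptions. The stop-word list, the character substitution map, the bag-of-words representation and the choice of an RBF kernel are all assumptions made for illustration, since the abstract does not specify them. The "HTML replacement" step is omitted because the abstract does not define it.

```python
import html
import math
import re
from collections import Counter

# Hypothetical stop-word list and substitution map (assumptions, not from the paper).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "is", "for", "your", "you"}
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def preprocess(raw):
    """Turn one raw e-mail body into a bag-of-words Counter."""
    # HTML removal: drop script/style blocks, then any remaining tags.
    text = re.sub(r"(?is)<(script|style).*?</\1>", " ", raw)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    text = html.unescape(text)

    # De-obfuscation (assumed rules): undo simple character substitutions,
    # then join letters split by punctuation, e.g. "m.o.n.e.y" -> "money".
    text = text.lower().translate(SUBSTITUTIONS)
    text = re.sub(
        r"\b(?:[a-z][.\-_*]){2,}[a-z]\b",
        lambda m: re.sub(r"[.\-_*]", "", m.group()),
        text,
    )

    # Stop-word removal on alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text)
    return Counter(t for t in tokens if t not in STOP_WORDS)


def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel exp(-gamma * ||x - y||^2) on sparse count vectors.

    The RBF kernel is an assumed example of a non-linear kernel.
    """
    keys = set(x) | set(y)
    squared_distance = sum((x[k] - y[k]) ** 2 for k in keys)
    return math.exp(-gamma * squared_distance)


spam = preprocess("<html><body><b>FR33</b> m.o.n.e.y for you &amp; the family</body></html>")
ham = preprocess("<p>Minutes of the family meeting</p>")
print(sorted(spam))  # ['family', 'free', 'money']
print(rbf_kernel(spam, ham))
```

In a full filter, kernel values like these would be computed by an SVM library during training and classification; the sketch only shows how cleaned text can become vectors on which a kernel is defined.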