Title: Predictive analytics for spam email classification using machine learning techniques

Authors: Pradeep Kumar

Addresses: Department of Computer Science and Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, India

Abstract: Automated text classification is the most widely used approach for managing the enormous and continuously growing volume of unstructured digital text data. Machine learning techniques are applied effectively to automatic email filtering, detecting spam mail and preventing it from being delivered to the user's inbox. This paper applies Logistic Regression, k-Nearest Neighbours (k-NN), Naive Bayes, Decision Trees, AdaBoost, Artificial Neural Networks (ANNs), and Support Vector Machines (SVMs) to spam email classification. All classifiers are trained, and their performance is measured in terms of precision, recall, and accuracy through a set of systematic experiments conducted on the Spambase data set obtained from the UCI Machine Learning Repository. The effectiveness of each model is illustrated empirically in order to identify a better and viable alternative model. A quantitative performance analysis of supervised and hybrid learning techniques is presented in detail. Experimental results indicate that ensemble methods outperform the other methods applied in terms of accuracy.
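The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The function name `classification_metrics`, the label convention (1 = spam, 0 = non-spam), and the toy labels below are assumptions made for illustration only; they are not taken from the paper and do not reproduce any of its results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and accuracy for binary labels.

    Hypothetical helper for illustration; `positive` marks the spam class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # legitimate mail flagged
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # legitimate mail passed

    # Guard against empty denominators by reporting 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Toy labels (invented): 4 spam and 4 non-spam messages.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

In this toy case two of the three messages flagged as spam really are spam (precision 2/3), two of the four spam messages are caught (recall 0.5), and five of the eight messages are classified correctly (accuracy 0.625).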

Keywords: text analytics; feature selection; predictive modelling; spam filtering; machine learning techniques.

DOI: 10.1504/IJCAT.2020.111844

International Journal of Computer Applications in Technology, 2020 Vol.64 No.3, pp.282 - 296

Received: 04 May 2020
Accepted: 26 Jun 2020

Published online: 16 Dec 2020
