Title: Predictive analytics for spam email classification using machine learning techniques
Authors: Pradeep Kumar
Addresses: Department of Computer Science and Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, India
Abstract: Automated text classification is the most widely used approach for managing the enormous and continuously growing amount of unstructured digital text data worldwide. Machine learning techniques are applied effectively to automatic email filtering, detecting spam mail and preventing it from being delivered to the user's inbox. This paper uses logistic regression, k-Nearest Neighbours (k-NN), Naive Bayes, Decision Trees, AdaBoost, artificial neural networks (ANNs), and support vector machines (SVMs) for spam email classification. All classifiers are trained, and their performance is measured in terms of precision, recall, and accuracy through a set of systematic experiments conducted on the Spambase data set obtained from the UCI Machine Learning Repository. The effectiveness of each model is illustrated empirically in order to identify a better and viable alternative model. The quantitative performance analysis of supervised and hybrid learning techniques is presented in detail. Experimental results indicate that ensemble methods outperform the other methods applied in terms of accuracy.
Keywords: text analytics; feature selection; predictive modelling; spam filtering; machine learning techniques.
DOI: 10.1504/IJCAT.2020.111844
International Journal of Computer Applications in Technology, 2020 Vol.64 No.3, pp.282 - 296
Received: 04 May 2020
Accepted: 26 Jun 2020
Published online: 16 Dec 2020
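
The abstract reports classifier performance in terms of precision, recall, and accuracy. As a rough illustration of how these three measures relate to one another, the following minimal sketch computes them from true and predicted labels. The function name `binary_metrics`, the convention that label 1 means spam, and the toy label lists are assumptions made for illustration only; they are not taken from the paper and do not reproduce its results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for a binary classification task.

    `positive` is the label treated as the spam class (assumed to be 1 here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # ham passed through

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical toy labels (1 = spam, 0 = not spam), not drawn from Spambase.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

precision, recall, accuracy = binary_metrics(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

In a spam filter, precision reflects how many flagged messages were actually spam, while recall reflects how much of the spam was caught, so the two are usually reported together with accuracy rather than accuracy alone.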