Title: Spam filtering based on PV-DBOW model

Authors: Ghizlane Hnini; Anass Fahfouh; Jamal Riffi; Mohamed Adnane Mahraz; Ali Yahyaouy; Hamid Tairi

Addresses: LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, University Sidi Mohamed Ben Abdellah, Morocco (all authors)

Abstract: Many feature extraction techniques have been proposed for filtering spam e-mails. Despite their performance and efficiency, they still have notable weaknesses. Term frequency-inverse document frequency (TF-IDF) and bag-of-words (BoW) are two well-known methods, yet they do not capture the semantic content of e-mails, which may lead to misclassification. To address this issue, we propose an architecture based on the distributed bag-of-words version of paragraph vector (PV-DBOW), which can be regarded as a deep learning architecture. The features it generates from an e-mail are rich and capture the e-mail's semantics by taking the context of its sentences into account. The results show that the proposed approach outperforms state-of-the-art methods in terms of precision, recall, F-measure, and accuracy.
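To make the PV-DBOW idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy setting: one vector per e-mail is trained to predict the words of that e-mail through a full softmax (practical implementations usually use negative sampling or hierarchical softmax instead). The function name `train_pv_dbow`, its parameters, and the sample e-mails are hypothetical and chosen only for illustration.

```python
import math
import random


def train_pv_dbow(docs, dim=8, epochs=200, lr=0.05, seed=0):
    """Toy PV-DBOW: train one vector per document to predict that document's words."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    # Paragraph (document) vectors: these become the e-mail features.
    doc_vecs = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in docs]
    # Output-layer weights, one row per vocabulary word.
    out = [[0.0] * dim for _ in vocab]
    losses = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for d, doc in enumerate(docs):
            vec = doc_vecs[d]
            for word in doc:
                target = index[word]
                # Full softmax over the vocabulary (an assumption made for brevity).
                scores = [sum(o[k] * vec[k] for k in range(dim)) for o in out]
                top = max(scores)
                exps = [math.exp(s - top) for s in scores]
                norm = sum(exps)
                probs = [e / norm for e in exps]
                total -= math.log(probs[target])
                count += 1
                grad_vec = [0.0] * dim
                for j, o in enumerate(out):
                    err = probs[j] - (1.0 if j == target else 0.0)
                    for k in range(dim):
                        grad_vec[k] += err * o[k]  # read o[k] before updating it
                        o[k] -= lr * err * vec[k]
                for k in range(dim):
                    vec[k] -= lr * grad_vec[k]
        losses.append(total / count)  # mean cross-entropy for this epoch
    return doc_vecs, losses


# Hypothetical tokenised e-mails, invented for this illustration.
emails = [
    "win free cash prize now".split(),
    "claim your free prize today".split(),
    "meeting agenda for monday morning".split(),
    "please review the monday agenda".split(),
]
vectors, losses = train_pv_dbow(emails)
print(len(vectors), len(vectors[0]))
print(round(losses[0], 3), round(losses[-1], 3))
```

The resulting `vectors` would then serve as input features for a spam/ham classifier; the abstract does not say which classifier is used, so none is shown here. In practice a library implementation is preferable to this sketch, for example gensim's `Doc2Vec` with `dm=0`, which selects the PV-DBOW training mode.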

Keywords: spam-filtering; deep learning; machine learning; PV-DBOW; feature extraction.

DOI: 10.1504/IJDATS.2021.120111

International Journal of Data Analysis Techniques and Strategies, 2021 Vol.13 No.4, pp.302 - 316

Received: 22 May 2020
Accepted: 04 Mar 2021

Published online: 07 Jan 2022
