Spam filtering based on PV-DBOW model
by Ghizlane Hnini; Anass Fahfouh; Jamal Riffi; Mohamed Adnane Mahraz; Ali Yahyaouy; Hamid Tairi
International Journal of Data Analysis Techniques and Strategies (IJDATS), Vol. 13, No. 4, 2021

Abstract: Many feature extraction techniques have been applied to spam e-mail filtering. However, despite their performance and efficiency, they still have notable weaknesses. Term frequency-inverse document frequency (TF-IDF) and bag-of-words (BoW) are two well-known methods, yet they do not capture the semantics of e-mails, which may lead to misclassification. To tackle this issue, we propose an architecture based on the distributed bag-of-words version of paragraph vector (PV-DBOW), which is considered a deep learning architecture. The features generated from an e-mail are rich, and they capture its semantics by taking the context of its sentences into account. The results show that the proposed approach outperforms state-of-the-art methods in terms of precision, recall, F-measure, and accuracy.
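As a rough illustration of the PV-DBOW idea named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the usual PV-DBOW formulation, in which each document vector is trained to predict the words of its own document with negative sampling. The function name `train_pv_dbow`, the hyperparameters, and the four toy e-mails are all hypothetical and chosen only for illustration; the paper's data set, settings, and classifier are not given in the abstract.

```python
import math
import random


def sigmoid(x):
    # Clamp the score so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_pv_dbow(docs, dim=8, negatives=2, epochs=200, lr=0.05, seed=0):
    """Toy PV-DBOW (hypothetical helper): each document vector learns to
    predict the words occurring in that document, via negative sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    # One trainable vector per document, one output vector per word.
    doc_vecs = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in docs]
    word_out = [[0.0] * dim for _ in vocab]
    losses = []
    for _ in range(epochs):
        total, steps = 0.0, 0
        for d, doc in enumerate(docs):
            v = doc_vecs[d]
            for w in doc:
                target = word_id[w]
                # One observed (positive) word plus random negative words.
                samples = [(target, 1.0)]
                while len(samples) < negatives + 1:
                    neg = rng.randrange(len(vocab))
                    if neg != target:
                        samples.append((neg, 0.0))
                grad_v = [0.0] * dim
                for wid, label in samples:
                    u = word_out[wid]
                    p = sigmoid(sum(a * b for a, b in zip(u, v)))
                    total -= math.log(p if label else 1.0 - p)
                    g = lr * (label - p)
                    for i in range(dim):
                        grad_v[i] += g * u[i]  # uses u before its update
                        u[i] += g * v[i]
                # Apply the accumulated update to the document vector.
                for i in range(dim):
                    v[i] += grad_v[i]
                steps += 1
        losses.append(total / steps)  # mean loss per word position
    return doc_vecs, losses


# Hypothetical toy corpus: two spam-like and two legitimate e-mails.
emails = [
    "win a free prize now".split(),
    "claim your free prize today".split(),
    "meeting agenda for monday".split(),
    "please review the monday agenda".split(),
]
vectors, losses = train_pv_dbow(emails)
```

Each entry of `vectors` is a fixed-length feature vector for one e-mail, which could then be passed to any standard classifier. Unlike BoW or TF-IDF features, its length does not depend on the vocabulary size.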

Online publication date: Fri, 07-Jan-2022
