Article: Comparative study of classification approaches for e-mail analysis
Journal: International Journal of Information and Computer Security (IJICS) 2020 Vol.13 No.3/4 pp.411 - 427
Publisher: Inderscience Publishers

Title: Comparative study of classification approaches for e-mail analysis

Authors: Pranjal S. Bogawar; K.K. Bhoyar

Addresses: Department of Bachelor of Computer Applications, Aakar College of Management for Women, S.N.D.T. University, Nagpur, India; Department of Information Technology, Yeshwantrao Chavhan College of Engineering, R.T.M. Nagpur University, Maharashtra, India

Abstract: Illicit messages, such as threatening and abusive ones, affect a person's emotions and psychology. Such messages begin to influence a person's mental state and, ultimately, their physical condition. E-mail is one of the most widely used channels for communicating personal and official messages. Sentiment analysis of e-mails typically involves classifying them as positive, negative or neutral. Identifying the sentiment of e-mails with an efficient and effective algorithm is an important step in the domain of e-mail forensics. In this work, support vector machine, k-nearest neighbour and back-propagation neural network algorithms are used to classify the sentiment of e-mails into positive, negative and neutral categories using a self-curated e-mail dataset. This dataset combines the Enron e-mail dataset with publicly available messages converted into e-mails. This paper presents a comparative study of these classification approaches for e-mail analysis. It concludes that the neural network trained with the back-propagation algorithm gives the best results in terms of accuracy and memory requirements, with a small compromise in the time required to recognise the sentiment of a given e-mail.
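The abstract does not describe the features or classifier settings used in the study. As a rough illustration of what three-way (positive / negative / neutral) sentiment classification with one of the compared methods, k-nearest neighbour, involves, the following is a minimal Python sketch. It assumes bag-of-words features and cosine similarity; the toy messages, the helpers `bag_of_words`, `cosine` and `knn_predict`, and the choice k = 3 are all hypothetical and are not the authors' dataset or pipeline.

```python
import math
from collections import Counter

# Hypothetical toy training set (not the paper's Enron-based dataset).
TRAIN = [
    ("Thank you so much, great work on the report", "positive"),
    ("Congratulations, the launch was a great success", "positive"),
    ("I appreciate your help, thank you", "positive"),
    ("You will regret this, I will hurt you", "negative"),
    ("Stop sending me this garbage, you idiot", "negative"),
    ("This is a threat, you will pay for this", "negative"),
    ("The meeting is scheduled for Monday at 10", "neutral"),
    ("Please find the attached invoice for March", "neutral"),
    ("The server will be down for maintenance on Friday", "neutral"),
]


def bag_of_words(text: str) -> Counter:
    """Lower-case the text, strip simple punctuation and count tokens."""
    tokens = [t.strip(".,!?;:\"'()") for t in text.lower().split()]
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text: str, k: int = 3) -> str:
    """Label `text` by majority vote among its k most similar training e-mails."""
    query = bag_of_words(text)
    scored = sorted(
        ((cosine(query, bag_of_words(doc)), label) for doc, label in train),
        reverse=True,
    )
    # Counter.most_common breaks count ties by first-encountered order,
    # i.e. in favour of the label of the higher-ranked neighbour.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

For example, `knn_predict(TRAIN, "Stop this, you will pay for this")` returns `"negative"`, because its three nearest neighbours are all threatening messages. An SVM or a back-propagation network would replace the voting step with a learned decision function over the same kind of feature vectors.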

Keywords: e-mail mining; e-mail classification; k-nearest neighbour; neural network; support vector machine; forensic; abusive; threatening.

DOI: 10.1504/IJICS.2020.109485

International Journal of Information and Computer Security, 2020 Vol.13 No.3/4, pp.411 - 427

Received: 15 Nov 2017
Accepted: 26 Jan 2019
Published online: 10 Sep 2020


