Title: An evaluation dataset for depression detection in Arabic social media

Authors: Somaia Elimam; Mohamed Bougeussa

Addresses: College of Graduate Studies, Sudan University of Science and Technology, Khartoum, Sudan; Faculty of Computer Science and Information Technology, University of El Imam El Mahdi, Kosti, Sudan; Department of Computer Science, Faculty of Science, University of Quebec, Montreal, Canada

Abstract: Depression in Arabic social media has received far less research attention than in other languages. Meanwhile, the traditional way of dealing with depression (face-to-face medical diagnosis) is no longer sufficient, because the number of people suffering from depression in Arabic communities has increased dramatically. This paper proposes the first dataset for detecting depressed users in Arabic social media. We collected tweets from Twitter, pre-processed them, and converted them to a structured format. A notable advantage of the resulting dataset is that it allows effective evaluation of machine learning algorithms for depression detection. We employ several classification algorithms, including a deep neural network, logistic regression, multinomial Naïve Bayes, Bernoulli Naïve Bayes, AdaBoost, passive aggressive, nearest centroid, and linear SVC. The F-score, AUC, precision, and accuracy were selected as performance measures to compare the algorithms, and the results show that classifying Arabic tweets is very challenging, especially given the sparse nature of Twitter data.
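To make the four performance measures concrete, the following is a minimal, dependency-free sketch of how precision, F-score, accuracy, and AUC can be computed for a binary depressed/non-depressed labelling. It assumes label 1 marks a depressed user and that each classifier also outputs a real-valued score for the AUC. The function names and toy labels are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn). Assumption: labels are binary, 1 = depressed (positive class)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:  # p == 0 and t == 0 for binary labels
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float, float]:
    """Precision, recall, F1 (harmonic mean of precision and recall), and accuracy."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return precision, recall, f1, accuracy


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: six users, true labels, hard predictions, and classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
print(precision_recall_f1_accuracy(y_true, y_pred))  # each value is 2/3 here
print(roc_auc(y_true, scores))  # 8/9
```

The pairwise AUC loop is quadratic in the number of examples; a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` yields the same value more efficiently on a full dataset.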

Keywords: Arabic dataset; depression detection; Arabic social media; machine learning.

DOI: 10.1504/IJKEDM.2021.119888

International Journal of Knowledge Engineering and Data Mining, 2021 Vol.7 No.1/2, pp. 113-126

Received: 30 Dec 2020
Accepted: 12 Sep 2021

Published online: 22 Dec 2021
