Title: Contributions to the automatic processing of the user-generated Tunisian dialect on the social web

Authors: Jihene Younes; Hadhemi Achour; Emna Souissi; Ahmed Ferchichi

Addresses: ISGT, Université de Tunis, LR99ES04 BESTMOD, 2000, Le Bardo, Tunisia; ISGT, Université de Tunis, LR99ES04 BESTMOD, 2000, Le Bardo, Tunisia; ENSIT, Université de Tunis, 1008, Montfleury, Tunisia; ISGT, Université de Tunis, LR99ES04 BESTMOD, 2000, Le Bardo, Tunisia

Abstract: With the growing use of social media in the Arab world, Arabic dialects are rapidly spreading on the web and attracting growing interest from NLP researchers. However, these dialects are still under-resourced, which is a major obstacle to their study and processing. In this paper, we focus on the automatic processing of the user-generated Tunisian dialect (TD) on the social web and propose an approach that helps automatically generate TD language resources. This approach exploits the large amount of text produced on the social web to extract and generate dialectal content. It is based on two main NLP components, namely TD identification and TD transliteration. Both components are implemented with a machine learning approach based on conditional random fields (CRFs), which reached an accuracy of 87.45% for TD identification and 90.49% for the automatic generation of dialectal content by transliteration.
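The abstract does not describe how the CRF components are set up. As a rough illustration only, the sketch below shows how a linear-chain CRF assigns labels to a token sequence at decoding time. It assumes that TD identification is framed as token-level tagging with a hypothetical tag set (`TD` vs `OTHER`). The feature templates (`token_features`), the toy weights (`EMIT_W`, `TRANS_W`) and the example tokens are invented for illustration and are not taken from the paper. Training, which estimates the weights from annotated data, is not shown.

```python
from typing import Dict, List, Sequence, Set, Tuple

Weights = Dict[Tuple[str, str], float]

# Hypothetical tag set: "TD" for Tunisian-dialect tokens, "OTHER" for everything else.
LABELS: Tuple[str, ...] = ("TD", "OTHER")

# Toy weights for illustration only; a real CRF learns these from annotated data.
EMIT_W: Weights = {
    ("has_digit", "TD"): 2.0,      # Latin-script Arabic often writes letters as digits (3, 7, ...)
    ("bias", "OTHER"): 0.5,
    ("suffix2=ur", "OTHER"): 2.0,  # e.g. French "bonjour"
}
TRANS_W: Weights = {
    ("TD", "TD"): 1.0,             # reward keeping the same label on neighbouring tokens
    ("OTHER", "OTHER"): 1.0,
}


def token_features(tokens: Sequence[str], i: int) -> Set[str]:
    """Hypothetical binary feature templates that fire for token i."""
    w = tokens[i].lower()
    feats = {"bias", "suffix2=" + w[-2:]}
    if any(ch.isdigit() for ch in w):
        feats.add("has_digit")
    return feats


def viterbi(tokens: Sequence[str], emit_w: Weights, trans_w: Weights) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    if not tokens:
        return []

    def emit(i: int, y: str) -> float:
        # Sum of the weights of all features that fire for token i under label y.
        return sum(emit_w.get((f, y), 0.0) for f in token_features(tokens, i))

    # score[i][y]: best total score of any label prefix ending with label y at position i.
    score: List[Dict[str, float]] = [{y: emit(0, y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        score.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[i - 1][p] + trans_w.get((p, y), 0.0))
            score[i][y] = score[i - 1][prev] + trans_w.get((prev, y), 0.0) + emit(i, y)
            back[i][y] = prev

    # Start from the best final label and follow the back-pointers.
    path = [max(LABELS, key=lambda lab: score[-1][lab])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

With these toy weights, `viterbi(["3aslema", "chnowa", "7welek"], EMIT_W, TRANS_W)` returns `["TD", "TD", "TD"]`. Taken alone, "chnowa" contains no digit and scores slightly higher as `OTHER`. The `TD`→`TD` transition weight pulls it into the `TD` label because its neighbours are labelled `TD`. This joint decoding over neighbouring labels is what separates a CRF from labelling each token independently. By contrast, `viterbi(["3aslema", "bonjour"], EMIT_W, TRANS_W)` returns `["TD", "OTHER"]`.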

Keywords: Tunisian dialect; TD; language resources; LR; corpora; lexica; identification; transliteration; natural language processing; NLP; machine learning.

DOI: 10.1504/IJCISTUDIES.2020.106487

International Journal of Computational Intelligence Studies, 2020 Vol.9 No.1/2, pp.33 - 51

Received: 06 Mar 2018
Accepted: 06 Sep 2018
Published online: 09 Apr 2020
