Contributions to the automatic processing of the user-generated Tunisian dialect on the social web
Online publication date: Thu, 09-Apr-2020
by Jihene Younes; Hadhemi Achour; Emna Souissi; Ahmed Ferchichi
International Journal of Computational Intelligence Studies (IJCISTUDIES), Vol. 9, No. 1/2, 2020
Abstract: With the growing use of social media in the Arab world, Arabic dialects are rapidly spreading on the web, attracting increasing interest from NLP researchers. These dialects are, however, still under-resourced, which is a major obstacle to their study and processing. In this paper, we focus on the automatic processing of the user-generated Tunisian dialect (TD) on the social web and propose an approach that helps to automatically generate TD language resources. This approach exploits the large volume of text produced on the social web to extract and generate dialectal content. It is based on two main NLP components: TD identification and TD transliteration. A machine learning approach using conditional random fields is proposed for implementing these two components; it reached an accuracy of 87.45 for TD identification and 90.49 for the automatic generation of dialectal content by transliteration.