Title: Classifiers for Arabic NLP: survey

Authors: Marwan Al Omari; Moustafa Al-Hajj

Addresses: Centre for Language Sciences and Communication, Lebanese University, Celine Centre, Tayouneh, Beirut, Lebanon; Centre for Language Sciences and Communication, Lebanese University, Celine Centre, Tayouneh, Beirut, Lebanon

Abstract: In this paper, we review the most commonly used models and classifiers that have been applied to the Arabic language to classify texts into categories, classes, or topics in the tasks of opinion mining, sentence categorisation, part-of-speech tagging, language identification, named entity recognition, authorship attribution, word sense disambiguation, and text classification. The classification tasks are compared in terms of the models' performance and accuracy. Classification approaches fall into three types: lexicon-based, machine and deep learning, and hybrid. The research sample comprises 34 articles in the classification domain. Challenges facing the Arabic language are discussed, along with proposed solutions: 1) solid research training in both approaches, lexicon-based and corpus-based (machine and deep learning); 2) research contributions, mainly corpora, approach techniques, and free accessibility; 3) increased funding for research development in the Arab world.

Keywords: lexicon-based approach; corpus-based approach; machine learning; deep learning; classification; big data; classifier; Arabic NLP; natural language processing; NLP; classification approach; NLP lexicon-based; NLP machine learning.

DOI: 10.1504/IJCCIA.2020.105538

International Journal of Computational Complexity and Intelligent Algorithms, 2020 Vol.1 No.3, pp.231 - 258

Received: 11 Oct 2018
Accepted: 31 Dec 2018

Published online: 28 Feb 2020
