Title: Non-vocalised Arabic word classifications based on mining affixes features

Authors: Sari Awwad; Mustafa Hammad; Safaa Al-Haj Saleh

Addresses: Computer Science and Applications, The Hashemite University, Zarqa, Jordan; Department of Information Technology, Mutah University, Al-Karak, Jordan; Department of Software Engineering, The Hashemite University, Zarqa, Jordan

Abstract: Arabic word classification is a challenging problem owing to the cursive nature of the language and its modulation marks. Existing approaches rely on databases and dictionaries to classify Arabic words, which slows the classification process. This paper therefore investigates word classification in non-vocalised Arabic text using affix features alone, and explores how far these features can be relied on to determine the class of an Arabic word without dictionaries or word lists. The proposed approach is based mainly on affix features and a Support Vector Machine (SVM). Fisher encoding is also applied to remove redundancy while preserving important information. The approach is tested on a data set with two main classes (noun and verb) and six different noun sub-classes. The results indicate a success rate approaching 64% of the total words in the articles used in this study. Misclassifications occur when the target Arabic word has no affixes or when some of its original characters are mistaken for affixes.
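The idea of describing a word only by its affixes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `PREFIXES` and `SUFFIXES` lists and the `affix_features` function are hypothetical names chosen for this example, the affix lists are a small assumed sample rather than the feature set used in the paper, and the Fisher encoding and SVM stages are not shown.

```python
# Illustrative affix lists (assumptions for this sketch, not the paper's feature set).
PREFIXES = ["ال", "ي", "ت", "ن", "س"]
SUFFIXES = ["ة", "ات", "ون", "ين", "وا"]


def affix_features(word):
    """Return a binary vector: one slot per prefix, then one slot per suffix.

    A slot is 1 when the word starts (or ends) with that affix and is
    longer than the affix itself; no dictionary or word list is consulted.
    """
    pre = [int(word.startswith(p) and len(word) > len(p)) for p in PREFIXES]
    suf = [int(word.endswith(s) and len(word) > len(s)) for s in SUFFIXES]
    return pre + suf


# A noun carrying a definite-article prefix and a feminine suffix.
school = affix_features("المدرسة")

# A word with no listed affix yields an all-zero vector (nothing to classify on).
bare = affix_features("كتب")

# A noun whose first original character coincides with a listed prefix.
day = affix_features("يوم")
```

In this sketch, `bare` and `day` correspond to the two failure cases named in the abstract: a word with no affixes gives the classifier no information, and a word whose original first letter matches a prefix is given a misleading feature. Vectors of this kind would then be passed to a classifier such as an SVM.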

Keywords: affixes features; word classification; SVM; support vector machine; Fisher encoding; Arabic language.

DOI: 10.1504/IJCAT.2019.099196

International Journal of Computer Applications in Technology, 2019 Vol.59 No.4, pp.347 - 353

Received: 07 Mar 2018
Accepted: 16 Apr 2018

Published online: 23 Apr 2019