New rules-based algorithm to improve Arabic stemming accuracy
Online publication date: Wed, 06-Jan-2016
by Walid Cherif; Abdellah Madani; Mohamed Kissi
International Journal of Knowledge Engineering and Data Mining (IJKEDM), Vol. 3, No. 3/4, 2015
Abstract: In recent years, natural language processing has grown steadily, owing to the spread of the internet. However, efforts devoted to the Arabic language are still limited. Because of its morphological and semantic properties, Arabic is considered a difficult language for automatic processing. Many different approaches have therefore been attempted to deal with morphological variation and the agglutination phenomenon when stemming Arabic texts. Stemming and light stemming are used to remove irrelevant morphological variations from a given word and extract its original stem or root. This research introduces a completely new rules-based algorithm. It first removes affixes precisely, based on context-sensitive morphological rules, and then deduces the root according to a predefined set of rules. Results show that the accuracy of the proposed algorithm is higher than that of two well-known Arabic stemmers.
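To make the affix-removal step concrete, here is a minimal sketch of generic light stemming in Python. It is not the algorithm proposed in the article: the affix lists, the minimum stem length, and the function names are all illustrative assumptions. The article's context-sensitive rules are reduced here to a single length guard, and the root-deduction step is not attempted.

```python
# Illustrative affix lists (assumptions for this sketch, not the article's rules).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]

# Hypothetical guard: never strip an affix if fewer than 3 letters would remain.
MIN_STEM_LEN = 3


def strip_prefix(word):
    """Remove the longest matching prefix, if the remainder is long enough."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def strip_suffix(word):
    """Remove the longest matching suffix, if the remainder is long enough."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[:-len(suffix)]
    return word


def light_stem(word):
    """Strip at most one prefix, then at most one suffix."""
    return strip_suffix(strip_prefix(word))
```

Under these assumptions, `light_stem("المعلمون")` strips the definite article and the plural ending and returns `"معلم"`, while a short word such as `"وهي"` is left unchanged because the length guard blocks both removals. A rule set like the one the abstract describes would replace that single guard with conditions on the surrounding letters.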