Title: An intelligent tool for syntactic annotation of Arabic corpora

Authors: Chiraz Ben Othmane Zribi, Feriel Ben Fraj, Mohamed Ben Ahmed

Addresses: RIADI Laboratory, National School of Computer Science, La Manouba University, La Manouba, Tunisia (all three authors).

Abstract: In this paper, we propose a new technique for the semi-automatic syntactic annotation of Arabic corpora. We describe a tool that takes a morpho-syntactically tagged corpus as input and produces its syntactic annotation as output, according to the ArabTAG formalism. We call the tool "intelligent" because it automatically learns and improves during elementary annotation (supertagging). It applies a supervised classification method that combines three classifiers (Naive Bayes, k-nearest neighbours and a decision tree). To evaluate the tool's ability to acquire information from human intervention, we present an experimental protocol applied to a small treebank of 5,000 words.
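The abstract says the tool combines three classifiers for supertagging but does not say how their outputs are merged. The following is a minimal sketch, in standard-library Python, of one plausible reading. Each word is described by a hypothetical feature tuple (the POS tags of the previous, current and next words). The three classifiers are trained on supertags already validated by the annotator, and their predictions are merged by hard majority vote. The feature set, the supertag names (`t_noun`, `t_noun_idafa`, ...), the class names and the tie-breaking rule are all assumptions made for illustration, not the authors' design.

```python
import math
from collections import Counter, defaultdict
# ASSUMPTION: each word is a tuple of hypothetical features (previous POS, current POS,
# next POS); labels are hypothetical supertag names standing in for ArabTAG trees.

class NaiveBayes:
    def fit(self, X, y):
        self.priors, self.n = Counter(y), len(y)
        self.counts = defaultdict(Counter)  # (feature index, class) -> value counts
        self.values = defaultdict(set)      # feature index -> values seen in training
        for xs, c in zip(X, y):
            for i, v in enumerate(xs):
                self.counts[(i, c)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, xs):
        def log_score(c):
            s = math.log(self.priors[c] / self.n)
            for i, v in enumerate(xs):
                # Laplace smoothing; the extra +1 leaves mass for unseen values
                s += math.log((self.counts[(i, c)][v] + 1)
                              / (self.priors[c] + len(self.values[i]) + 1))
            return s
        return max(self.priors, key=log_score)

class KNN:
    k = 3  # number of neighbours

    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict(self, xs):
        # Overlap (Hamming) distance: number of features whose values differ
        ranked = sorted(self.data, key=lambda d: sum(a != b for a, b in zip(d[0], xs)))
        # most_common keeps first-seen order on ties, favouring the nearer neighbour
        return Counter(c for _, c in ranked[:self.k]).most_common(1)[0][0]

def entropy(labels):
    n = len(labels)
    return -sum(m / n * math.log2(m / n) for m in Counter(labels).values())

class DecisionTree:  # ID3: split on the categorical feature with the highest information gain
    def fit(self, X, y):
        self.root = self._build(list(zip(X, y)), set(range(len(X[0]))))
        return self

    def _build(self, rows, features):
        labels = [c for _, c in rows]
        majority = Counter(labels).most_common(1)[0][0]
        if len(set(labels)) == 1 or not features:
            return majority  # leaf
        def gain(i):
            groups = defaultdict(list)
            for xs, c in rows:
                groups[xs[i]].append(c)
            return entropy(labels) - sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        best = max(sorted(features), key=gain)
        children = {v: self._build([r for r in rows if r[0][best] == v], features - {best})
                    for v in {xs[best] for xs, _ in rows}}
        return (best, children, majority)  # internal node

    def predict(self, xs):
        node = self.root
        while isinstance(node, tuple):
            best, children, majority = node
            if xs[best] not in children:
                return majority  # unseen value: use this node's majority class
            node = children[xs[best]]
        return node

class VotingSupertagger:  # hard majority vote; if all three disagree, use Naive Bayes (arbitrary)
    def __init__(self):
        self.models = [NaiveBayes(), KNN(), DecisionTree()]

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        for m in self.models:
            m.fit(self.X, self.y)
        return self

    def learn(self, xs, label):  # annotator-validated supertag -> new training example
        return self.fit(self.X + [xs], self.y + [label])

    def predict(self, xs):
        votes = [m.predict(xs) for m in self.models]
        label, count = Counter(votes).most_common(1)[0]
        return label if count >= 2 else votes[0]

X = [("START", "VERB", "NOUN"), ("VERB", "NOUN", "ADJ"), ("NOUN", "ADJ", "END"),
     ("START", "NOUN", "ADJ"), ("NOUN", "ADJ", "PREP"), ("ADJ", "PREP", "NOUN"),
     ("PREP", "NOUN", "END")]
y = ["t_verb_vso", "t_noun", "t_adj_mod", "t_noun", "t_adj_mod", "t_prep_mod", "t_noun"]
tagger = VotingSupertagger().fit(X, y)
print(tagger.predict(("VERB", "NOUN", "END")))  # t_noun
tagger.learn(("NOUN", "NOUN", "END"), "t_noun_idafa")
print(tagger.predict(("NOUN", "NOUN", "END")))  # t_noun_idafa
```

In this sketch, calling `learn` after each human correction and retraining from scratch is the simplest way to let the classifiers improve as annotation proceeds. Retraining in batches, or using richer features drawn from the morpho-syntactic tags, are equally plausible alternatives.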

Keywords: treebank; syntactic annotation; intelligent annotation; ArabTAG; TAG; tree adjoining grammar; supervised classification; machine learning; supertagging; Arabic language; morpho-syntactically tagged texts; XML.

DOI: 10.1504/IJCAT.2011.041651

International Journal of Computer Applications in Technology, 2011, Vol. 40, No. 4, pp. 227-237

Published online: 28 Jul 2011
