Title: Statistical pair pruning towards target class in learning-based anaphora resolution for Tamil

Authors: K. Arul Deepa; C. Deisy

Addresses: Department of Information Science and Technology, College of Engineering, Guindy, Anna University, Chennai - 600 025, India; Department of Computer Science and Engineering, Thiagarajar College of Engineering, Madurai - 625 015, India

Abstract: Anaphora resolution is an important task in many natural language understanding (NLU) applications, including machine translation. This paper proposes a learning-based system, built around various classification algorithms, to resolve pronouns in Tamil text. To improve learning accuracy, the system is built in two stages. The first is feature vector production, where mentions are identified and characterised, and feature vectors of lexical, syntactic and semantic features are then produced. The second is the pair pruning module, where the number of non-target class pairs is reduced by deep statistical analysis of the feature vectors. Incorporating the deeper pair pruning module dramatically increases the f-measure score compared with training the same models without the pruning module. We trained the system with various classification algorithms on the TDIL tourism dataset and obtained encouraging results for a challenging language, Tamil. We discuss how f-measure, precision and recall vary with and without the pruning module in a comparative model.
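
The idea of pruning non-target class pairs can be illustrated with a minimal sketch. The abstract does not specify the statistical analysis used, so everything here is an assumption made for illustration: the `Pair` type, the `prune_pairs` helper, the single categorical feature and the `min_positive_rate` threshold are all hypothetical. The sketch drops negative (non-coreferent) training pairs whose feature value almost never occurs among positive pairs, and includes the standard f-measure formula for comparing runs with and without pruning.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """A (pronoun, candidate antecedent) pair; hypothetical representation."""
    features: tuple   # categorical feature values, e.g. (gender_agrees,)
    coreferent: bool  # True for the target class


def prune_pairs(pairs, feature_index, min_positive_rate=0.05):
    """Drop negative pairs whose feature value is rarely seen in positive pairs.

    This is an assumed, simplified pruning rule, not the paper's method.
    """
    total = Counter()
    positive = Counter()
    for p in pairs:
        value = p.features[feature_index]
        total[value] += 1
        if p.coreferent:
            positive[value] += 1

    kept = []
    for p in pairs:
        value = p.features[feature_index]
        rate = positive[value] / total[value]
        # Positive pairs are always kept; negatives survive only if their
        # feature value is plausible for the target class.
        if p.coreferent or rate >= min_positive_rate:
            kept.append(p)
    return kept


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


pairs = [
    Pair((True,), True),
    Pair((True,), False),
    Pair((False,), False),
    Pair((False,), False),
]
kept = prune_pairs(pairs, feature_index=0)
```

In this toy input, the two negative pairs whose feature value never co-occurs with a positive pair are removed, which reduces the class imbalance the classifier sees during training.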

Keywords: anaphora resolution; classification; machine learning; pronoun resolution; Tamil computing; co-reference resolution; natural language understanding; NLU; natural language processing; NLP.

DOI: 10.1504/IJAIP.2017.088142

International Journal of Advanced Intelligence Paradigms, 2017 Vol.9 No.5/6, pp.437 - 463

Received: 02 Dec 2014
Accepted: 02 Sep 2015

Published online: 27 Nov 2017
