Title: Mining annotators' common knowledge for automatic text revision

Authors: Giovanni Siragusa; Luigi Di Caro; Marco Tosalli

Addresses: Department of Computer Science, University of Turin, Turin, Italy; Department of Computer Science, University of Turin, Turin, Italy; Nuance Communication Inc., Strada del Lionetto, 6, Turin, Italy

Abstract: Many natural language understanding tasks require clean textual input in order to train systems with the highest precision. Such data, usually collected from surveys or the web, are manually processed to remove morphosyntactic variability, spelling errors and inconsistent naming of entities. Because these operations are carried out by domain experts and annotators, they are usually costly and time-consuming, a scenario that is very common in industrial tasks where annotators are hired. In this context, we propose an innovative and simple method that extracts correction patterns, i.e., <expression, replacement> pairs, where expression is a matching string and replacement indicates how to rewrite the matched string. Such a tool can be used both to evaluate annotators (since it provides a deep understanding of their work) and to automatically revise texts. We tested our method extensively in a multilingual setting, obtaining better results than baseline approaches.
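
To make the notion of a correction pattern concrete, the following is a minimal Python sketch of how a list of <expression, replacement> pairs could be applied to revise a text. It is an illustration only, not the authors' implementation: the function name `apply_patterns`, the example pairs, and the whole-word matching rule are all assumptions introduced here, and the pattern extraction step described in the paper is not shown.

```python
import re
from typing import List, Tuple

# Hypothetical correction patterns as (expression, replacement) pairs.
# These pairs are invented for illustration; they do not come from the paper.
PATTERNS: List[Tuple[str, str]] = [
    ("adress", "address"),
    ("e-mail", "email"),
    ("new york", "New York"),
]


def apply_patterns(text: str, patterns: List[Tuple[str, str]]) -> str:
    """Rewrite each whole-word match of an expression with its replacement."""
    for expression, replacement in patterns:
        # re.escape makes the expression match as a literal string.
        # The lookarounds are an assumption of this sketch: they stop a
        # pattern from firing inside a longer word (e.g. "adressing").
        regex = re.compile(r"(?<!\w)" + re.escape(expression) + r"(?!\w)")
        # A function replacement avoids backslash interpretation in the
        # replacement string.
        text = regex.sub(lambda _match: replacement, text)
    return text


if __name__ == "__main__":
    print(apply_patterns("my adress in new york, send e-mail", PATTERNS))
```

Patterns are applied in list order, so an earlier rewrite can affect what later patterns match; a real system would need to decide how to order or prioritise conflicting patterns.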

Keywords: pattern extraction; natural language understanding; annotation learning; correction patterns.

DOI: 10.1504/IJMSO.2019.099837

International Journal of Metadata, Semantics and Ontologies, 2019 Vol.13 No.3, pp.254 - 263

Received: 22 Mar 2018
Accepted: 01 Mar 2019

Published online: 23 May 2019
