Annotation tools for syntax and named entities in the National Corpus of Polish
by Jakub Waszczuk; Katarzyna Głowińska; Agata Savary; Adam Przepiórkowski; Michał Lenart
International Journal of Data Mining, Modelling and Management (IJDMMM), Vol. 5, No. 2, 2013

Abstract: The ongoing National Corpus of Polish project assumes several levels of linguistic annotation. We present the technical environment and methodological background developed for the three upper annotation levels: the levels of syntactic words, syntactic groups and named entities. We show how knowledge-based platforms Spejd and Sprout are used for the automatic pre-annotation of the corpus and discuss some particular problems faced during the preparation of the parser grammar, which contains over 1,000 rules and is one of the largest chunking grammars for Polish. We also show how the tree editor TrEd has been customised for manual post-editing of annotations and for further revision of discrepancies. Our XML format converters and customised archiving repository ensure an automatic data flow and efficient corpus file management. We discuss the inter-annotator agreement in the manually annotated data, and present the first results of a CRF classifier trained on these data.

Online publication date: Tue, 29-Jul-2014

The full text of this article is only available to individual subscribers or to users at subscribing institutions.

 
Existing subscribers:
Go to Inderscience Online Journals to access the Full Text of this article.

Pay per view:
If you are not a subscriber and you just want to read the full contents of this article, buy online access here.

Complimentary Subscribers, Editors or Members of the Editorial Board of the International Journal of Data Mining, Modelling and Management (IJDMMM):
Login with your Inderscience username and password:

    Username:        Password:         

Forgotten your password?


Want to subscribe?
A subscription gives you complete access to all articles in the current issue, as well as to all articles in the previous three years (where applicable). See our Orders page to subscribe.

If you still need assistance, please email subs@inderscience.com