Title: A pilot investigation of information extraction in the semantic annotation of archaeological reports

Authors: Andreas Vlachidis; Douglas Tudhope

Addresses: Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan, Pontypridd, CF37 1DL, Wales, UK; Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan, Pontypridd, CF37 1DL, Wales, UK

Abstract: The paper discusses a prototype investigation of semantic annotation, a form of metadata that assigns conceptual entities to textual instances, applied to archaeological grey literature. Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while Knowledge Organization Systems (KOS) are explored as a means of associating semantic annotations with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension, and English Heritage thesauri and glossaries. Results from an initial evaluation suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed, drawing on the evaluation and on the characteristics of the archaeology domain.

Keywords: NLP; natural language processing; KOS; knowledge organisation systems; semantic annotation; information extraction; GATE; digital archaeology; grey literature; CIDOC CRM ontology; archaeological reports; metadata.

DOI: 10.1504/IJMSO.2012.050183

International Journal of Metadata, Semantics and Ontologies, 2012 Vol.7 No.3, pp.222-235

Received: 21 Nov 2011
Accepted: 11 Jul 2012

Published online: 31 Dec 2014
