Title: Building a large-scale testing dataset for conceptual semantic annotation of text

Authors: Xiao Wei; Daniel Dajun Zeng; Xiangfeng Luo; Wei Wu

Addresses: Shanghai Institute of Technology, Shanghai 201418, China; State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China | State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China | School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China | Shanghai Institute of Technology, Shanghai 201418, China

Abstract: One major obstacle facing research on semantic annotation is the lack of large-scale testing datasets. In this paper, we develop a systematic approach to constructing such datasets. The approach is based on guided ontology auto-construction and annotation methods, which require little a priori domain knowledge and little user knowledge of the documents. We demonstrate the efficacy of the proposed approach by developing a large-scale testing dataset using information available from MeSH and PubMed. The resulting testing dataset consists of a large-scale ontology, a large-scale set of annotated documents, and baselines for evaluating a target algorithm. It can be used to evaluate both ontology construction algorithms and semantic annotation algorithms.

Keywords: semantic annotation; ontology concept learning; testing dataset; evaluation baseline; ontology auto-construction; priori knowledge; evaluation parameters; guided annotation method; MeSH; PubMed.

DOI: 10.1504/IJCSE.2018.089582

International Journal of Computational Science and Engineering, 2018, Vol. 16, No. 1, pp. 63-72

Received: 17 Mar 2015
Accepted: 11 Oct 2015

Published online: 31 Jan 2018
