Layout logical labelling and finding the semantic relationships between citing and cited paper content
Online publication date: Fri, 19-Jun-2020
by Sergey Parinov; Amir Bakarov; Daniil Vodolazcky
International Journal of Metadata, Semantics and Ontologies (IJMSO), Vol. 14, No. 1, 2020
Abstract: Large data sets of in-text citations and citation contexts are becoming available for research and tool development. Applying the "topic model" method to these data, one can characterise the thematic relationships between citation contexts in citing papers and the content of the cited papers. However, to build relevant topic models and compare them accurately for papers linked by citation relationships, we have to know the semantic labels of the PDF papers' layout elements, such as section titles and paragraph boundaries. Recent advances in converting papers from PDF into a richly attributed JSON format allow us to develop new approaches to the logical labelling of a paper's layout. This paper presents a reusable method and open source software for the logical labelling of PDF papers, which recognised layout elements with good quality on a set of research papers. Using these semantic labels, we made a precise comparison of the topic models built for citing and cited papers and found some level of similarity between them.