International Journal of Metadata, Semantics and Ontologies (7 papers in press)
Extraction and visualisation of citation relationships and its attributes for papers in PDF
by Sergey Parinov
Abstract: This paper presents a method for server-side processing of the content of research papers in binary PDF format, which gives research information systems new features for citation content analysis. The method efficiently generates JSON versions of PDF documents, making it easier to recognise paper references, in-text citations, citation contexts, etc. As a result, an extended set of citation data can be parsed, including the location of citations within a paper's structure, how often the same reference is mentioned, the style in which references are mentioned, and so on. Based on these data, we enrich traditional citation relationships with semantics and other attributes. By formatting these data according to the W3C Web Annotation Data Model and integrating them with annotation tools, we visualise the citation relationships, their semantic attributes, related statistics and some other data as annotations on the content of PDF documents available to users of a research information system.
Keywords: research information system; PDF.js; PDF to JSON conversion; citation relationships; semantic attributes; citation content analysis; visualisation.
A method for examining metadata quality in open research datasets using the OAI-PMH and SQL queries: the case of the Dublin Core Subject element and suggestions for user-centred metadata annotation design
by Panos Balatsoukas, Dimitris Rousidis, Emmanouel Garoufallou
Abstract: Open research datasets are growing rapidly in volume. Although metadata play an important role in the management of research datasets and data repositories, several factors, such as the large volume of data and its complex lifecycles, as well as operational constraints related to financial resources and human factors, may compromise metadata quality. Poor-quality metadata can have a negative impact both on the way research datasets are retrieved, shared and used by scientists and on the way research data repositories are managed and audited. The aim of the research reported in this paper was to perform a descriptive analysis of the Dublin Core Subject metadata element and to identify its quality problems, if any, in the context of the Dryad research data repository. To address this aim, a total of 4,557 metadata packages and 13,638 metadata data files were analysed using a novel data-preprocessing method based on SQL queries. The findings revealed emerging trends in the subject coverage of the repository (e.g. the most popular subjects and the authors who contributed most to these subjects). Quality problems related to the lack of controlled vocabulary and standardisation were also identified, such as the inconsistent use of singular and plural forms, adjectives and synonyms. This study has both practical and methodological implications for the evaluation of metadata and for improving the quality of the research data annotation process in open research data repositories.
Keywords: big data; Dublin Core; data quality; open access repositories; metadata; digital curation; user-centred design; open research datasets; Dryad; metadata quality; OAI-PMH; SQL queries.
Metadata and semantics research: a case of an international conference paving toward a data driven future
by Getaneh Alemu, Emmanouel Garoufallou, Rania Siatri, Damiana Koutsomiha, Sirje Virkus
Abstract: This paper provides an overview of the 11th International Conference on Metadata and Semantics Research (MTSR-2017), which was held in Tallinn, Estonia, from 28 November to 1 December 2017. The paper places the conference in the context of existing literature and concludes by offering insights into the future of metadata. MTSR-2017 brought together metadata experts from various domains, including digital libraries, museums, archives, higher education and agriculture. The conference gave participants an opportunity to share their knowledge and novel approaches to implementing metadata and semantics technologies across diverse types of information environments and applications. In libraries, there are indications that contemporary standards-based metadata approaches fail to describe these resources and to address users' changing requirements. The paper shows the need to re-conceptualise existing metadata principles and technical formats in line with emerging Linked Open Data frameworks. This is where the theory of metadata enriching and filtering (Alemu & Stevens, 2015) fits in. It emerges that the future of metadata, ontologies and semantics is enriched, linked, open and filtered. In addition, ontologies need to reflect the diversity of interpretations inherent in human beings and the existence of multilingual, cross-cultural and multidisciplinary content; hence, they should be designed, developed and maintained with diversity, scalability and interoperability in mind.
Keywords: metadata; semantics; ontologies; data; Linked Data; Open Data; RDF; OWL; BIBFRAME; digital libraries; MTSR.
Who is Open Data for and why could it be hard to use it in the digital humanities? Federated application programming interfaces for interdisciplinary research
by Go Sugimoto
Abstract: Open Data has become widespread across the research community over the past few years. However, there are a number of reasons why data reuse could be hard for digital humanities scholars. This article is based on an application that addresses the issues of data-owning culture, interdisciplinary studies and distributed-data research in terms of Application Programming Interfaces (APIs) and Open Data. The James Cook Dynamic Journal helps users study Cook's journal by aggregating information from various sets of APIs that provide full-text search, named entity recognition and map views. Developing the application revealed some critical issues in data federation and processing automation. In particular, the standardisation of JSON and the development of user-friendly GUI tools would significantly increase the value of APIs. The paper also proposes Easy Data as a way to avoid a digital divide and to liberate Open Data for a wider spectrum of users.
Keywords: application programming interfaces; APIs; Digital Humanities; data owning culture; distributed data research; James Cook; Open Data; data reuse; interdisciplinary studies; JSON; JSON-LD; FAIR principles; data revolution; digital divide; Easy Data.
Description + Annotation: semantic data publication workflow with Dendro and B2NOTE
by Yulia Karimova, João Aguiar Castro, João Rocha Da Silva, Nelson Pereira, Joana Rodrigues, Cristina Ribeiro
Abstract: Metadata puts research data in context, making the data intelligible, able to withstand technological evolution and reusable, in compliance with the FAIR principles. The workflow proposed in this work includes both metadata generated in the context of research projects, created with the Dendro platform, and metadata originating from people's interaction with the deposited data, created with the B2NOTE service from EUDAT. In our experiments, datasets are prepared with Dendro, taking into account both general-purpose and domain-specific descriptors, and are then transparently deposited in B2SHARE. After publication, B2NOTE provides an environment where authors, other researchers and any interested party can enrich the description with less formal comments, tags or keywords. This work contributes 1) a set of use cases in several domains, 2) details of the descriptors used by authors in each case, and 3) reflections on the use of data after publication, based on the B2NOTE contributions.
Keywords: research data management; metadata; semantic annotation; Dendro; B2SHARE; B2NOTE.
Data provenance in multiagent systems: relevance, benefits and research opportunities
by Tassio Sirqueira, Marx Viana, Carlos Lucena
Abstract: The popularity of applications based on artificial intelligence creates the need to make them able to explain their behaviour and be accountable for their decisions. This is particularly challenging when applications are distributed, being composed of multiple autonomous agents that form a multiagent system (MAS). A key means of making these systems explainable is to track agent behaviour, that is, to record the provenance of their actions and reasoning. Although provenance has been studied in some contexts, it has received little attention in the context of MAS, leaving many open issues that must be understood and addressed. Our goal in this paper is to make a case for the importance of data provenance to MAS, discussing which questions about MAS behaviour can be answered using provenance and demonstrating, with a case study, the benefits that provenance provides in answering these questions. The study uses a framework, FProvW3C, which collects and stores the provenance of data produced by a MAS. These data can be analysed to answer a wide variety of questions about MAS behaviour. Our case study thus demonstrates that data provenance in MAS is a potential solution for making the agent reasoning process transparent.
Keywords: provenance; multiagent systems; explainable artificial intelligence.
An ontology-based coordination and integration of multi-channel online communication
by Zaenal Akbar, Anna Fensel, Dieter Fensel
Abstract: Although multiple online communication channels are available on the internet, coordinating and integrating them has been challenging owing to limited interoperability. Every channel has unique characteristics, and there is no standard format for interchanging knowledge among them. In this paper, we propose an ontology-based model that represents communication activities on multiple channels uniformly, so that content publication to multiple channels can be coordinated and the engagement received from multiple channels can be integrated. We show how to apply the model to analyse multi-channel publications from the tourism industry on popular social media channels.
Keywords: online communication; multi-channel; coordinated publication; integrated user engagement.