International Journal of Metadata, Semantics and Ontologies (11 papers in press)
Automated subject indexing using word embeddings and controlled vocabularies: a comparative study
by Michalis Sfakakis, Leonidas Papachristopoulos, Kyriaki Zoutsou, Christos Papatheodorou, Giannis Tsakonas
Abstract: Text mining methods contribute significantly to the understanding and management of digital content, increasing the potential for entry links. This paper introduces a method for subject analysis that combines topic modelling with automated labelling of the generated topics, exploiting terms from existing knowledge organisation systems. A testbed was developed in which the Latent Dirichlet Allocation (LDA) algorithm was deployed to model the topics of a corpus of papers from the Digital Library (DL) evaluation domain. The generated topics were represented as bags of words using word embeddings and were used to retrieve terms from the EuroVoc thesaurus and the Computer Science Ontology (CSO). The results of this study show that the DL domain can be described with different vocabularies, but that the context needs to be taken into account during automatic labelling.
Keywords: subject indexing; similarity measures; text classification; machine learning; word embedding.
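The labelling step the abstract describes, matching a topic's terms against a controlled vocabulary in embedding space, can be sketched as follows. This is a minimal illustration, not the paper's method: the embedding values and vocabulary terms below are toy data invented for the example, and a real system would use trained embeddings and EuroVoc/CSO term lists.

```python
from math import sqrt

# Toy 3-dimensional word embeddings (hypothetical values, for illustration only).
EMB = {
    "digital": [0.9, 0.1, 0.0], "library": [0.8, 0.2, 0.1],
    "evaluation": [0.7, 0.3, 0.2], "metric": [0.6, 0.4, 0.1],
    "thesaurus": [0.1, 0.9, 0.2], "information": [0.5, 0.5, 0.3],
}

def centroid(words):
    """Average the embeddings of the known words (assumes at least one is known)."""
    vecs = [EMB[w] for w in words if w in EMB]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def best_label(topic_words, vocabulary_terms):
    """Return the controlled-vocabulary term closest to the topic centroid."""
    t = centroid(topic_words)
    return max(vocabulary_terms, key=lambda term: cosine(centroid(term.split()), t))
```

Here a topic produced by LDA (its top words) is reduced to a centroid vector and compared, by cosine similarity, against the centroid of each candidate vocabulary term.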
EPIC: an iterative model for metadata improvement
by Hannah Tarver, Mark Phillips, Ana Krahmer
Abstract: This paper provides a case study of iterative metadata correction and enhancement at the University of North Texas (UNT), within a model that we have developed to describe this process: Evaluate, Prioritise, Identify, Correct (EPIC). These steps are illustrated within the paper to show how they function at UNT and why the model may serve as a useful tool for other organisations. We suggest that the EPIC model works for ongoing assessment, but is particularly useful for large remediation and enhancement projects: it helps to plan timelines and to allocate the people and resources needed to determine which issues should be addressed (evaluate), to rate their severity, importance or difficulty (prioritise), to define the subsets of records that are affected (identify), and to make changes based on prioritisation (correct).
Keywords: metadata quality; iterative processes; enhancement projects; models.
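One pass of the four EPIC steps over a record set could look like the following sketch. The record shape, required fields and severity ranking are assumptions made for the example; the paper's actual workflow at UNT is organisational as much as computational.

```python
def epic_pass(records, required=("title", "date")):
    """One iteration sketch: Evaluate -> Prioritise -> Identify -> Correct."""
    # Evaluate: find missing required fields per record.
    issues = [(rid, field) for rid, rec in records.items()
              for field in required if not rec.get(field)]
    # Prioritise: hypothetical severity ranking ('title' outranks 'date').
    severity = {"title": 0, "date": 1}
    issues.sort(key=lambda issue: severity[issue[1]])
    # Identify: the subset of affected record ids.
    affected = {rid for rid, _ in issues}
    # Correct: here we only flag records for review rather than auto-fix them.
    for rid, field in issues:
        records[rid].setdefault("flags", []).append(f"missing {field}")
    return affected
```

Because the model is iterative, `epic_pass` would be rerun after each round of corrections until the evaluation step reports no remaining issues.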
What process can a university follow for open data? The University of Crete case
by Yannis Tzitzikas, Marios Pitikakis, Giorgos Giakoumis, Kalliopi Varouha, Eleni Karkanaki
Abstract: All public bodies in Greece, including universities, are obliged to comply with the national legal framework and policy on open data. An emerging concern is how such a big and diverse organisation can develop supporting procedures, from an administrative, legal and technical standpoint, that enhance and expand the open data-related services it provides. In this paper, we describe our experience at the University of Crete in tackling these requirements. In particular, (a) we detail the steps of the process that we followed; (b) we show how an Open Data Catalogue can also be exploited in the first steps of this process; (c) we describe the platform that we selected and how we organised the catalogue and the metadata selection; (d) we describe the extensions that were required; (e) we motivate and describe various additional services that we developed; and (f) we discuss the current status and possible next steps.
Keywords: open data; university open data; data sharing.
An ontology proposal for a corpus of letters of Vincenzo Bellini: formal properties of physical structure and the case of rotated texts
by Salvatore Cristofaro, Daria Spampinato, Pietro Sichera
Abstract: This paper describes the formal OntoBelliniLetters ontology, which concerns the corpus of Vincenzo Bellini's letters kept at the Belliniano Civic Museum of Catania. The ontology is part of a wider effort, the BellinInRete project, one of whose aims is the development of a more general and complete ontology for the whole of Vincenzo Bellini's legacy preserved in the museum. The main concepts and relations that make up the ontology knowledge base are described and discussed, and some of their formal properties are presented. The ontology schema is inspired by the CIDOC Conceptual Reference Model (CIDOC CRM).
Keywords: letters; ontology; Vincenzo Bellini; CIDOC CRM; text arrangement.
Making heterogeneous smart home data interoperable with the SAREF ontology
by Roderick Van Der Weerdt, Victor De Boer, Laura Daniele, Barry Nouwt, Ronald Siebes
Abstract: SAREF is an ontology created to enable interoperability between smart devices, but the literature lacks practical examples of implementing SAREF in real applications. We validate the practical implementation of SAREF through two approaches. First, we examine two methods to map the Internet of Things (IoT) data available in a smart home into Linked Data using SAREF: 1) creating a template-based mapping that describes how SAREF can be used, and 2) using a mapping language to demonstrate that the mapping can be simple while still using SAREF. The second approach demonstrates the communication capabilities of IoT devices when they share knowledge represented using SAREF, and describes how SAREF enables interoperability between different devices. Together, the two approaches demonstrate that all the information from various smart-device datasets can successfully be transformed into the SAREF ontology, and show how SAREF can be applied in a concrete interoperability framework.
Keywords: internet of things; SAREF ontology; data mapping; smart home.
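A template-based mapping of the kind the abstract mentions can be as simple as rendering each IoT reading into Turtle against SAREF core terms. The sketch below is an assumption-laden illustration, not the paper's mapping: the `ex:` namespace, identifier scheme and the choice of properties are ours, although `saref:Device`, `saref:Measurement`, `saref:relatesToProperty`, `saref:hasValue` and `saref:hasTimestamp` are terms from the SAREF core ontology.

```python
def saref_measurement(device_id, prop, value, unit, timestamp):
    """Render one smart-home reading as SAREF-style Turtle (template-based mapping sketch)."""
    return f"""\
@prefix saref: <https://saref.etsi.org/core/> .
@prefix ex: <http://example.org/home/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:{device_id} a saref:Device ;
    saref:makesMeasurement ex:{device_id}_m1 .
ex:{device_id}_m1 a saref:Measurement ;
    saref:relatesToProperty saref:{prop} ;
    saref:hasValue "{value}"^^xsd:float ;
    saref:isMeasuredIn ex:{unit} ;
    saref:hasTimestamp "{timestamp}"^^xsd:dateTime .
"""
```

A mapping-language approach (e.g. RML) would replace this hand-written template with declarative rules, which is the trade-off the two methods in the paper compare.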
Introducing a novel bi-functional method for exploiting sentiment in complex information networks
by Paraskevas Koukaras, Dimitrios Rousidis, Christos Tjortjis
Abstract: This paper elaborates on multilayer Information Network (IN) modelling, utilising graph mining and machine learning. Although Social Media (SM) INs may be modelled as homogeneous networks, real-world networks contain multi-typed entities characterised by complex relations and interactions, making them heterogeneous INs. To mine data whilst retaining semantic context in such complex structures, we need better ways of handling multi-typed and interconnected data. This work designs and performs several simulations on SM data. The first simulation models information based on a bipartite network schema. The second utilises a star network schema, along with a graph database that offers querying for graph metrics. The third handles data from the previous simulations to generate a multilayer IN. The paper proposes a novel bi-functional method for sentiment extraction from user reviews/opinions across multiple SM platforms, drawing on the concepts of supervised/unsupervised learning and sentiment analysis.
Keywords: linked data; multi-layer information networks; graph modelling; social media; NoSQL; data mining; machine learning; supervised/unsupervised learning; graph metrics; bi-functional algorithms.
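The unsupervised half of a sentiment pipeline like the one sketched in the abstract is often lexicon-based. The following toy scorer is an illustration under our own assumptions (the word lists are invented, and the paper's bi-functional method additionally pairs this with a supervised classifier, which is omitted here):

```python
# Hypothetical mini-lexicons; a real system would use a curated sentiment lexicon.
POS = {"great", "love", "excellent"}
NEG = {"bad", "awful", "poor"}

def lexicon_sentiment(review):
    """Unsupervised, lexicon-based polarity score in [-1, 1]:
    (positive hits - negative hits) normalised by review length."""
    words = review.lower().split()
    score = sum(w in POS for w in words) - sum(w in NEG for w in words)
    return score / max(len(words), 1)
```

Scores from such a function can then be attached as node or edge attributes in the multilayer IN, so that graph queries can filter or aggregate by sentiment.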
Integrated classification schemas to interlink cultural heritage collections over the web using LOD technologies
by Carlos Henrique Marcondes
Abstract: Library, archive and museum collections are now being published over the web using LOD technologies. Many of them have thematic intersections or are related to other web subjects and resources, such as authorities, sites of historic events, online exhibitions, or articles in Wikipedia and its sibling resources DBpedia and Wikidata. The full potential of such LOD publishing initiatives rests heavily on the meaningful interlinking of these collections. In this context, vocabularies and classification schemas are important, as they provide meaning and context to heritage data. This paper proposes comprehensive classification schemas - a Culturally Relevant Relationships (CRR) vocabulary and a classification schema of types of heritage objects - to order, integrate and provide structure to the cultural heritage data brought about by the publication of heritage collections as LOD.
Keywords: heritage objects; digital collections; classification; interlinking; culturally relevant relationships; linked data; LOD.
Institutional support for data management plans: case studies for a systematic approach
by Yulia Karimova, Cristina Ribeiro, Gabriel David
Abstract: Researchers have to ensure that their projects comply with Research Data Management (RDM) requirements, and the main funding agencies now require Data Management Plans (DMPs) for grant applications. Institutions are therefore investing in RDM tools and implementing RDM workflows to support their researchers. In this context, we propose a collaborative DMP-building method that involves researchers, data stewards and other parties where required. The method was applied as part of an RDM workflow in research groups across several scientific domains; we describe it as a systematic approach and illustrate it through a set of case studies. We also address the monitoring of DMPs during the life cycle of projects. Feedback from the researchers highlighted the advantages of creating DMPs and the growing need for them, which motivates improving the DMP support process in line with the machine-actionable DMP concept and the best practices of each scientific community.
Keywords: research data management; data management plan; research workflow; open data; open science.
A method for archaeological and dendrochronological concept annotation using domain knowledge in information extraction
by Andreas Vlachidis, Douglas Tudhope
Abstract: Advances in Natural Language Processing allow the process of deriving information from large volumes of text to be automated. Attention is turned to one of the most important, but traditionally difficult to access, resources in archaeology, commonly known as 'grey literature'. This paper presents the development of two separate Named-Entity Recognition (NER) pipelines aimed at extracting archaeological and dendrochronological concepts, respectively, from Dutch texts. The role of domain vocabulary is discussed in the development of a Knowledge Organisation System (KOS)-driven, rule-based method of NER that makes complementary use of ontologies, thesauri and domain vocabulary for information extraction and for attribute assignment of semantic annotations. The NER task is challenged by a series of domain- and language-oriented aspects and is evaluated against a human-annotated gold standard. The results suggest the suitability of rule-based, KOS-driven approaches for attaining the low-hanging fruit of NER, using a combination of quality vocabulary and rules.
Keywords: information extraction; knowledge organisation systems; named entity recognition; archaeology; dendrochronology; grey literature; semantic annotation.
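At its core, a KOS-driven, rule-based NER pipeline matches vocabulary terms against text with rules layered on top. The sketch below shows only that gazetteer core, with an invented three-term vocabulary standing in for the thesauri the paper uses; the longest-match ordering is one simple rule of the kind such pipelines apply.

```python
import re

# Hypothetical mini-gazetteer; real pipelines draw terms from domain thesauri.
GAZETTEER = {
    "bronze age": "PERIOD",
    "oak": "SPECIES",
    "posthole": "FEATURE",
}

def annotate(text):
    """Rule-based NER sketch: case-insensitive, whole-word gazetteer lookup,
    trying longer terms first so multi-word concepts win over substrings."""
    spans = []
    for term, label in sorted(GAZETTEER.items(), key=lambda kv: -len(kv[0])):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)
```

Each returned span carries a semantic label that can be mapped back to a KOS concept URI, which is what turns plain matches into semantic annotations.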
Interlinking and enrichment of disparate organisational data with LOD at application run-time
by Sotiris Angelis, Konstantinos Kotis, Panagiotis Mouzakis
Abstract: The present work focuses on the semantic integration, enrichment and interlinking of data that is semi-automatically generated by documenting artworks and their creators. In this work, we have been experimenting with RDFization and link-discovery tools, W3C standards and widely accepted vocabularies. The approach has already been evaluated with museum data, emphasising the discovery of links between disparate datasets and external data sources at its back-end. In this paper, while contributing a number of new and extended features at the back-end, we emphasise link discovery at the front-end, in order to interlink and enrich cultural data with LOD at application run-time and to facilitate the real-time, up-to-date exploitation of semantically integrated and LOD-enriched data. This is achieved by implementing a custom link-discovery method and evaluating it within a web application using LOD cloud data sources such as DBpedia and Europeana.
Keywords: link discovery; RDF querying; semantic enrichment; LOD; cultural data.
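A simple stand-in for run-time link discovery is label matching between local entities and candidate LOD resources. The sketch below is not the paper's custom method: it uses plain string similarity over labels, and the URIs, threshold and `owl:sameAs` choice are illustrative assumptions.

```python
from difflib import SequenceMatcher

def discover_links(local_entities, lod_entities, threshold=0.85):
    """Naive label-based link discovery: pair each local URI with any LOD URI
    whose label is sufficiently similar (threshold chosen for illustration)."""
    links = []
    for local_uri, local_label in local_entities.items():
        for lod_uri, lod_label in lod_entities.items():
            similarity = SequenceMatcher(None, local_label.lower(),
                                         lod_label.lower()).ratio()
            if similarity >= threshold:
                links.append((local_uri, "owl:sameAs", lod_uri))
    return links
```

In a run-time setting, `lod_entities` would be fetched on demand (e.g. via a SPARQL lookup against DBpedia), so the discovered links always reflect the current state of the LOD source.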
A workflow for supporting the evolution requirements of RDF-based semantic warehouses
by Yannis Marketakis, Yannis Tzitzikas, Aureliano Gentile, Bracken Van Niekerk, Marc Taconet
Abstract: Semantic data integration aims to exploit heterogeneous pieces of similar or complementary information to enable integrated browsing and querying services. A common approach is to transform the original sources with respect to a common graph-based data model and to construct a global semantic warehouse. The main problem is the periodic refreshment of the warehouse as the contents of the data sources change. This is a challenging requirement, not only because the transformations used to construct the warehouse can be invalidated, but also because additional information may have been added to the semantic warehouse that needs to be preserved after every reconstruction. In this paper, we focus on this problem in the context of a semantic warehouse that integrates data about stocks and fisheries from various information systems; we detail the requirements related to the evolution of semantic warehouses and propose a workflow for tackling them.
Keywords: semantic warehouse; evolution; semantic data integration; preserve updates; refresh workflow.
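The preservation requirement the abstract raises, keeping warehouse-local additions across a rebuild, can be stated set-theoretically over triples. The sketch below is our simplification under the assumption that local additions are exactly the triples not derivable from the old sources; the paper's workflow handles subtler cases (e.g. invalidated transformations) that plain set algebra does not.

```python
def refresh_warehouse(old_warehouse, old_source_triples, new_source_triples):
    """Rebuild the warehouse from fresh source triples while preserving triples
    that were added directly to the warehouse (absent from the old sources)."""
    local_additions = old_warehouse - old_source_triples
    return new_source_triples | local_additions
```

Representing the warehouse as a set of (subject, predicate, object) tuples, a refresh keeps curator-added statements even when the underlying source data changes entirely.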