International Journal of Metadata, Semantics and Ontologies (7 papers in press)
Semantic architectures and dashboard creation processes within the data and analytics framework
by Michele Petito, Francesca Fallucchi, Ernesto William De Luca
Abstract: It is almost twenty years since Tim Berners-Lee, creator of the web and the semantic web, described his idea of the web (Berners-Lee, Hendler, & Lassila, 2001) as an environment in which programs would be able to understand the meaning of words and make autonomous decisions. The open data tools currently on the market neither exploit the semantic web nor provide tools for data analysis and visualisation. Most of them are simple open data portals that display a data catalogue, often without fulfilling even the lowest level of the famous five-star model. Current solutions (commercial or free) provide users neither with easy access to data nor with tools for analysing and displaying it. The Data and Analytics Framework (DAF), a project run by the Italian government, was launched at the end of 2017 with the aim of becoming the single platform for solving all problems related to the management of semantic data in public administration (PA). DAF extracts knowledge from the immense amount of data owned by the State. It favours the spread of linked open data (LOD) within PA thanks to the integration of open-source business intelligence products and the network of controlled ontologies and vocabularies (OntoPiA). The research outlined in this paper illustrates some of the platform's competing solutions and introduces the five-step process for creating a DAF dashboard, as well as the related data story. The case study created by the authors concerns tourism in Sardinia (a region of Italy) and represents one of the few demonstrations of a real case being tested in DAF.
Keywords: big data; data and analytics framework; data visualisation; dashboard; open data; linked open data.
HSLD: a hybrid similarity measure for linked data resources
by Gabriela Silva, Frederico Durão, Paulo Roberto De Souza
Abstract: The web of data is a set of deeply linked resources that can be instantly read and understood by both humans and machines. A vast amount of RDF data has been published in freely accessible and interconnected datasets, creating the so-called Linked Open Data cloud. Such a huge amount of available data, along with the development of semantic web standards, has opened up opportunities for the development of semantic applications. However, most semantic recommender systems use only the link structure between resources to calculate their similarity. In this paper we propose HSLD, a hybrid similarity measure for linked data that exploits information present in RDF literals in addition to the links between resources. We evaluate the proposed approach in the context of a LOD-based recommender system using data from DBpedia. Experiment results indicate that HSLD increases the precision of the recommendations in comparison with pure link-based baseline methods.
Keywords: recommender systems; linked data; lexical similarity; semantic similarity.
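The abstract does not give HSLD's actual formula; a minimal pure-Python sketch of a hybrid measure of this general kind, assuming a Jaccard component over shared links blended with a lexical component over literals via a hypothetical weight `alpha`, could look like:

```python
from difflib import SequenceMatcher

def link_similarity(links_a: set, links_b: set) -> float:
    """Jaccard overlap of the resources two entities link to."""
    if not links_a and not links_b:
        return 0.0
    return len(links_a & links_b) / len(links_a | links_b)

def literal_similarity(text_a: str, text_b: str) -> float:
    """Lexical similarity between RDF literals (e.g. abstracts)."""
    return SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()

def hybrid_similarity(links_a, links_b, text_a, text_b, alpha=0.5):
    """Weighted blend of link-based and literal-based similarity."""
    return (alpha * link_similarity(links_a, links_b)
            + (1 - alpha) * literal_similarity(text_a, text_b))

# Two films sharing one of three linked resources, plus similar abstracts.
sim = hybrid_similarity(
    {"dbr:Drama", "dbr:Italy"}, {"dbr:Drama", "dbr:France"},
    "a drama set in rome", "a drama set in paris", alpha=0.5)
```

The weighting, the choice of lexical measure, and the example resources are all illustrative assumptions, not HSLD's published definition.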
EngMeta: metadata for computational engineering
by Björn Schembera, Dorothea Iglezakis
Abstract: Computational engineering generates knowledge through the analysis and interpretation of research data, which is produced by computer simulation. Supercomputers produce huge amounts of research data. To address a research question, many simulations are run over a large parameter space. Therefore, handling this data and keeping an overview becomes a challenge. Data documentation is mostly handled by file and folder names in inflexible file systems, making it almost impossible for data to be findable, accessible, and interoperable and hence reusable. To enable and improve a structured documentation of research data from computational engineering, we developed EngMeta as a metadata model. We built this model by incorporating existing standards for general descriptive and technical information and adding metadata fields for discipline-specific information, such as the components and parameters of the simulated target system, and information about the research process, such as the methods, software and computational environment used. In practical use, EngMeta functions as the descriptive core for an institutional repository. In order to reduce the burden of description on scientists, we have developed an approach for automatically extracting metadata information from the output and log files of computer simulations. Through a qualitative analysis, we show that EngMeta fulfills the criteria of a good metadata model. Through a quantitative survey, we show that it meets the needs of engineering scientists. The overall outcome is the metadata model EngMeta in XML/XSD, ready for use in computational engineering. This metadata product is backed by automated metadata extraction and a repository, making discipline-specific research data management possible in computational engineering.
Keywords: research data management; metadata; big data; high performance computing; simulation; computational engineering; metadata extraction; repository.
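The paper's actual extractor is not described in the abstract; as a sketch of the general idea of harvesting metadata from simulation log files, one might match hypothetical field patterns (the field names and log format below are invented for illustration, not EngMeta's):

```python
import re

# Hypothetical patterns for fields a solver might print to its log.
PATTERNS = {
    "software":  re.compile(r"^Code:\s*(.+)$", re.M),
    "version":   re.compile(r"^Version:\s*(.+)$", re.M),
    "timestep":  re.compile(r"^dt\s*=\s*([0-9.eE+-]+)", re.M),
    "num_nodes": re.compile(r"^Nodes:\s*(\d+)", re.M),
}

def extract_metadata(log_text: str) -> dict:
    """Pull discipline-specific metadata fields out of a simulation log."""
    meta = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            meta[field] = match.group(1).strip()
    return meta

log = """Code: FlowSolver
Version: 2.1
Nodes: 128
dt = 1.0e-4
"""
meta = extract_metadata(log)
```

In a real pipeline the extracted dictionary would then be serialised into the EngMeta XML/XSD model rather than kept as a plain dict.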
Special Issue on: Towards an Enriched, Linked, Open and Filtered Metadata Model
Intermediary XML schemas: constraint, templating and interoperability in complex environments
by Richard Gartner
Abstract: This article introduces the methodology of intermediary schemas for complex metadata environments. Metadata in instances conforming to these schemas is not generally intended for dissemination but must usually be transformed, typically by XSLT, to generate instances conforming to the referent schemas to which they mediate. The methodology is designed to enhance the interoperability of complex metadata within XML architectures. It incorporates three subsidiary methods: project-specific schemas that act as constrained mediators to over-complex or over-flexible referents (Method 1), templates or conceptual maps from which instances may be generated (Method 2), and serialized maps of instances conforming to their referent schemas (Method 3). The three methods are detailed and their applications to current research in digital ecosystems, archival description, and digital asset management and preservation are examined. A possible synthesis of the three is also proposed in order to enable the methodology to operate within a single schema, the Metadata Encoding and Transmission Standard (METS).
Keywords: XML; intermediary XML schemas; metadata; interoperability; digital asset management; digital preservation; METS; constraint; templating.
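The article's transformations are XSLT; purely as an illustration of the mediation step, the sketch below maps a toy intermediary-schema record into a simplified METS-like referent structure in Python (all element names here are invented, not drawn from the article's schemas):

```python
import xml.etree.ElementTree as ET

# A minimal intermediary-schema instance (illustrative element names).
intermediary = ET.fromstring(
    "<record><title>Annual Report</title>"
    "<creator>Gartner, R.</creator></record>"
)

def to_referent(record: ET.Element) -> ET.Element:
    """Mediate the constrained record into a simplified referent schema,
    playing the role the article assigns to an XSLT transformation."""
    mets = ET.Element("mets")
    dmd = ET.SubElement(mets, "dmdSec")
    for src_tag, dst_tag in [("title", "dc:title"), ("creator", "dc:creator")]:
        src = record.find(src_tag)
        if src is not None:
            ET.SubElement(dmd, dst_tag).text = src.text
    return mets

referent = to_referent(intermediary)
```

The point of the intermediary layer is that authors work against the small, constrained `record` schema, while conformant referent instances (here, the METS-like tree) are generated mechanically.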
Unique challenges facing linked data implementation for National Educational Television
by Chris Pierce
Abstract: Implementing linked data involves a costly process of converting metadata to an exchange format substantially different from traditional library 'records-based' exchange. To achieve full implementation, it is necessary to navigate a complex process of data modelling, crosswalking, and publishing. This paper documents the transition of a dataset of National Educational Television (NET) collection records to a 'data-based' exchange environment of linked data by discussing challenges faced during the conversion. These challenges include silos such as the Library's media asset management system, the Merged Audiovisual Information System (MAVIS); aligning PBCore with the bibliographic linked data model BIBFRAME; modelling differences in works between archival moving image cataloging and other domains using Entertainment Identifier Registry IDs (EIDR IDs); and possible alignments with EBUCore (the European Broadcasting Union linked data model) to address gaps between PBCore and BIBFRAME.
Keywords: linked data; MARC21; PBCore; BIBFRAME 2.0; National Educational Television; EIDR; EBUCore; crosswalking; data modelling.
Exploring the utility of metadata record graphs and network analysis for metadata quality evaluation and augmentation
by Mark Phillips, Oksana Zavalina, Hannah Tarver
Abstract: Our study explores the possible uses and effectiveness of network analysis, including metadata record graphs, as a method of evaluating collections of metadata records at scale. This paper presents the results of an experiment applying these methods to records in a university digital library system as well as two sub-collections of different sizes and composition. The data includes count- and value-based statistics as well as network metrics for every Dublin Core element in each of the metadata sets. We discuss the benefits and constraints of these metrics based on this analysis and suggest possible future applications.
Keywords: metadata record graphs; metadata quality; metadata linking.
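One common construction behind metadata record graphs treats records as nodes and connects records that share a value for a given element; a toy sketch for a single Dublin Core element (the records and subject values below are invented, and the study's own construction and metrics may differ):

```python
from itertools import combinations
from collections import defaultdict

# Toy records: record id -> set of dc:subject values (illustrative only).
records = {
    "rec1": {"texas", "history"},
    "rec2": {"texas", "railroads"},
    "rec3": {"music"},
}

def record_graph(recs: dict) -> dict:
    """Edges link records sharing at least one value for the element."""
    edges = defaultdict(set)
    for a, b in combinations(recs, 2):
        if recs[a] & recs[b]:
            edges[a].add(b)
            edges[b].add(a)
    return edges

graph = record_graph(records)
# A simple network metric: degree per record. Isolated records
# (degree 0) flag values shared with no other record in the set.
degree = {rec_id: len(graph[rec_id]) for rec_id in records}
```

Computing such metrics per element, as the study does for every Dublin Core field, lets quality problems (isolated records, over-connected hub values) surface at collection scale.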
From the web of bibliographic data to the web of bibliographic meaning: structuring, interlinking and validating ontologies on the semantic web
by Helena Simões Patrício, Maria Inês Cordeiro, Pedro Nogueira Ramos
Abstract: Bibliographic datasets have achieved good levels of technical interoperability by observing the principles and good practices of linked data. However, they have a low level of quality from the semantic point of view, owing to many factors: lack of a common conceptual framework for a diversity of standards often used together, a reduced number of links between the ontologies underlying datasets, proliferation of heterogeneous vocabularies, underuse of semantic mechanisms in data structures, 'ontology hijacking' (Feeney et al., 2018), and point-to-point mappings, as well as limitations of semantic web languages with respect to the requirements of bibliographic data interoperability. After reviewing such issues, a research direction is proposed to overcome the misalignments found by means of a reference model and a superontology, using SHACL (Shapes Constraint Language) to solve current limitations of RDF languages.
Keywords: linked open data; bibliographic data; semantic web; SHACL; LOD validation; ontologies; reference model; bibliographic standards.
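Real SHACL shapes are themselves RDF graphs evaluated by a SHACL engine; purely to illustrate the kind of constraint SHACL declares (here, a minCount check on bibliographic properties, with invented property names and a toy data model), consider:

```python
# A SHACL-style shape expressed as (property, min_count) constraints,
# and a bibliographic resource as a simple property map. This only
# mimics the *kind* of check SHACL declares, not SHACL itself.
shape = {"dct:title": 1, "dct:creator": 1}

def validate(resource: dict, shape: dict) -> list:
    """Return a report of violated minCount-style constraints."""
    report = []
    for prop, min_count in shape.items():
        if len(resource.get(prop, [])) < min_count:
            report.append(f"{prop}: expected at least {min_count} value(s)")
    return report

book = {"dct:title": ["Linked Data"], "dct:creator": []}
violations = validate(book, shape)
```

The appeal of SHACL for the bibliographic case argued in the paper is precisely that such constraints become declarative, shareable data rather than ad hoc validation code like the sketch above.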