International Journal of Metadata, Semantics and Ontologies (3 papers in press)
A method for examining metadata quality in open research datasets using the OAI-PMH and SQL queries: the case of the Dublin Core Subject element and suggestions for user-centred metadata annotation design
by Panos Balatsoukas, Dimitris Rousidis, Emmanouel Garoufallou
Abstract: Open research datasets are growing rapidly in volume. Despite the fact that metadata play an important role in the management of research datasets and data repositories, several factors - such as the large volume of data and its complex lifecycles, as well as operational constraints related to financial resources and human factors - may impede the quality of metadata. Poor quality metadata can have a negative impact both on the way research datasets are retrieved, shared and used by scientists, and on the way research data repositories are managed and audited. The aim of the research reported in this paper was to perform a descriptive analysis of the Dublin Core Subject metadata element and identify its quality problems, if any, in the context of the Dryad research data repository. In order to address this aim, a total of 4,557 metadata packages and 13,638 metadata data files were analysed following a novel data-preprocessing method using SQL queries. The findings showed emerging trends about the subject coverage of the repository (e.g. the most popular subjects and the authors that contributed the most to these subjects). Also, quality problems related to the lack of controlled vocabulary and standardisation were identified, such as the inconsistent use of singular and plural forms, adjectives and synonyms. This study has both practical and methodological implications for the evaluation of metadata and the improvement of the quality of the research data annotation process in open research data repositories.
Keywords: big data; Dublin Core; data quality; open access repositories; metadata; digital curation; user-centred design; open research datasets; Dryad; metadata quality; OAI-PMH; SQL queries.
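The kind of SQL-based check described in the first abstract (spotting subject terms that occur in both singular and plural form) can be illustrated with a minimal sketch. It is not the authors' preprocessing method: the `subject` table, its columns, the function name `find_plural_variants` and the toy rows are all hypothetical, and the naive "append s" rule only catches regular English plurals.

```python
import sqlite3


def find_plural_variants(subjects):
    """Return (singular, plural) pairs where both forms occur as subject terms.

    `subjects` is an iterable of (package_id, term) tuples. The table layout
    is an assumption made for illustration, not the Dryad schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subject (package_id INTEGER, term TEXT)")
    conn.executemany("INSERT INTO subject VALUES (?, ?)", subjects)
    # Self-join: a term matches when another term equals it plus a trailing 's'.
    # Comparison is case-insensitive so 'Adaptation' and 'adaptations' pair up.
    pairs = conn.execute(
        """
        SELECT DISTINCT LOWER(a.term) AS singular, LOWER(b.term) AS plural
        FROM subject AS a
        JOIN subject AS b
          ON LOWER(b.term) = LOWER(a.term) || 's'
        ORDER BY singular
        """
    ).fetchall()
    conn.close()
    return pairs


# Invented toy rows, not taken from the repository studied in the paper.
toy_subjects = [
    (1, "Adaptation"),
    (2, "adaptations"),
    (3, "Phylogeny"),
    (4, "hybrid zone"),
    (5, "Hybrid zones"),
]

print(find_plural_variants(toy_subjects))
```

A check for synonyms or adjective variants would need a controlled vocabulary or a stemmer rather than a string-suffix rule.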
Data provenance in multiagent systems: relevance, benefits and research opportunities
by Tassio Sirqueira, Marx Viana, Francisco Cunha, Ingrid Nunes, Carlos Lucena
Abstract: The popularity of applications based on artificial intelligence creates the need for making them able to explain their behaviour and be accountable for their decisions. This is particularly challenging when applications are distributed, being composed of multiple autonomous agents that form a multiagent system (MAS). A key means of making these systems explainable is to track agent behaviour, that is, to record the provenance of their actions and reasoning. Although the idea of provenance has been explored in some contexts, it has been little explored in the context of MAS, leaving many open issues that must be understood and addressed. Our goal in this paper is to make a case for the importance of data provenance to MAS, discussing what questions can be answered regarding MAS behaviour using provenance and, with a case study, demonstrating the benefits that provenance provides to answer these questions. This study involves the use of a framework, namely FProvW3C, which collects and stores the provenance of data produced by a MAS. These data can be analysed to answer a wide variety of questions to understand the MAS behaviour. Our case study thus demonstrates that the use of data provenance in MAS is a potential solution to making the agent reasoning process transparent.
Keywords: provenance; multiagent systems; explainable artificial intelligence.
A combined path index for efficient processing of XML queries
by Dhanalekshmi Gopinathan, Krishna Asawa
Abstract: In today's digitally connected world, diverse applications use data in various formats. Many application domains use data in a structured format (e.g., transaction data in financial systems), an unstructured format (e.g., social media postings) or a combination of both (e.g., emails, contracts and health records). The flexible nature of XML has motivated applications in various fields, such as technical and financial, to drift towards the XML representation. This drift towards XML applications has increased the number of documents on the web exponentially. Thus, the unprecedented and exponential growth in the usage of XML documents on the web warrants research attention towards improved and efficient methodologies to facilitate accelerated query processing of XML documents. As XML documents are modelled as rooted, labelled, ordered trees, the primary challenge is to store the data while preserving the tree structure and to process queries efficiently. This paper proposes a new indexing structure that combines terminal sibling nodes at the same level into a single path. It reduces the search space while querying, thereby accelerating query processing. The main advantage of this index is that it can process branch (twig) queries efficiently with fewer lookups and decompositions in contrast to the existing approaches. The results also show that queries are processed with equal or better performance compared with the existing approaches.
Keywords: XML; information retrieval; query processing; XPath; ordered tree; branch query; indexing; sibling combined path.
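The idea in the third abstract of combining terminal sibling nodes at the same level into a single path can be illustrated with a rough sketch. This is one loose reading of the abstract, not the authors' index: the key layout `(path, set of leaf-child tags)`, the function names `build_combined_path_index` and `match_twig`, and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_combined_path_index(root):
    """Index each element under (root-to-element path, tags of its leaf children).

    All leaf (terminal) siblings of an element are folded into one key, so a
    branch query on several leaf children needs a single lookup per path.
    """
    index = defaultdict(list)

    def visit(node, parent_path):
        path = parent_path + "/" + node.tag
        # A child with no sub-elements is treated as a terminal node.
        leaf_tags = frozenset(child.tag for child in node if len(child) == 0)
        if leaf_tags:
            index[(path, leaf_tags)].append(node)
        for child in node:
            if len(child) > 0:
                visit(child, path)

    visit(root, "")
    return dict(index)


def match_twig(index, path, required_leaf_tags):
    """Return elements at `path` that have all the required leaf children."""
    required = frozenset(required_leaf_tags)
    return [
        node
        for (indexed_path, tags), nodes in index.items()
        if indexed_path == path and required <= tags
        for node in nodes
    ]


# Invented sample document for illustration only.
SAMPLE_XML = """
<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title><author>Y</author><year>2001</year></book>
</library>
"""

sample_index = build_combined_path_index(ET.fromstring(SAMPLE_XML))
hits = match_twig(sample_index, "/library/book", {"title", "year"})
print([book.findtext("title") for book in hits])
```

In this sketch the twig query `/library/book[title][year]` is answered by scanning the combined keys once, rather than looking up the `title` and `year` paths separately and joining the results.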