International Journal of Metadata, Semantics and Ontologies (8 papers in press)
A method for examining metadata quality in open research datasets using the OAI-PMH and SQL queries: the case of the Dublin Core Subject element and suggestions for user-centred metadata annotation design
by Panos Balatsoukas, Dimitris Rousidis, Emmanouel Garoufallou
Abstract: Open research datasets are growing rapidly in volume. Although metadata play an important role in the management of research datasets and data repositories, several factors - such as the large volume of data and its complex lifecycles, as well as operational constraints related to financial resources and human factors - may impede the quality of metadata. Poor-quality metadata can have a negative impact both on the way research datasets are retrieved, shared and used by scientists, and on the way research data repositories are managed and audited. The aim of the research reported in this paper was to perform a descriptive analysis of the Dublin Core Subject metadata element and identify its quality problems, if any, in the context of the Dryad research data repository. To address this aim, a total of 4,557 metadata packages and 13,638 metadata data files were analysed following a novel data-preprocessing method using SQL queries. The findings showed emerging trends in the subject coverage of the repository (e.g. the most popular subjects and the authors that contributed the most to these subjects). Also, quality problems related to the lack of controlled vocabulary and standardisation were identified, such as the inconsistent use of singular and plural forms, adjectives and synonyms. This study has both practical and methodological implications for the evaluation of metadata and the improvement of the quality of the research data annotation process in open research data repositories.
Keywords: big data; Dublin Core; data quality; open access repositories; metadata; digital curation; user-centred design; open research datasets; Dryad; metadata quality; OAI-PMH; SQL queries.
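As an illustration of the kind of SQL-based check the abstract above describes, the following minimal sketch flags Subject values that differ only by a trailing "s" (a crude proxy for singular/plural inconsistency). It is not the authors' method: the `subjects` table, its columns, the sample rows and the function name `find_plural_variants` are all hypothetical, and the "strip a final s" rule is a deliberately naive assumption.

```python
import sqlite3


def find_plural_variants(conn):
    """Return (stem, variant_count) for subjects that appear in more than
    one form once a trailing 's' is stripped (naive plural heuristic)."""
    query = """
        WITH norm AS (
            -- Case-fold and trim so 'Speciation' and 'speciation' count once.
            SELECT DISTINCT lower(trim(subject)) AS s FROM subjects
        )
        SELECT CASE WHEN s LIKE '%s'
                    THEN substr(s, 1, length(s) - 1)
                    ELSE s
               END AS stem,
               COUNT(*) AS variants
        FROM norm
        GROUP BY stem
        HAVING COUNT(*) > 1
        ORDER BY stem
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data standing in for harvested Dublin Core Subject values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (package_id INTEGER, subject TEXT)")
conn.executemany(
    "INSERT INTO subjects VALUES (?, ?)",
    [
        (1, "Adaptation"),
        (2, "adaptations"),
        (3, "Phylogenetics"),
        (4, "speciation"),
        (5, "Speciation"),
    ],
)

flagged = find_plural_variants(conn)
print(flagged)
```

A heuristic this simple will miss irregular plurals and will over-match words that merely end in "s", so its output is best treated as a list of candidates for manual review.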
Data provenance in multiagent systems: relevance, benefits and research opportunities
by Tassio Sirqueira, Marx Viana, Francisco Cunha, Ingrid Nunes, Carlos Lucena
Abstract: The popularity of applications based on artificial intelligence creates the need for making them able to explain their behaviour and be accountable for their decisions. This is a challenge mainly if applications are distributed, being composed of multiple autonomous agents, forming a multiagent system (MAS). A key means of making these systems explainable is to track agent behaviour, that is, to record the provenance of their actions and reasoning. Although the idea of provenance has been explored in some contexts, it has been little explored in the context of MAS, leaving many open issues that must be understood and addressed. Our goal in this paper is to make a case for the importance of data provenance to MAS, discussing what questions can be answered regarding MAS behaviour using provenance and, with a case study, demonstrating the benefits that provenance provides to answer these questions. This study involves the use of a framework, namely FProvW3C, which collects and stores the provenance of data produced by a MAS. These data can be analysed to answer a wide variety of questions to understand the MAS behaviour. Our case study thus demonstrates that the use of data provenance in MAS is a potential solution to making the agent reasoning process transparent.
Keywords: provenance; multiagent systems; explainable artificial intelligence.
A combined path index for efficient processing of XML queries
by Dhanalekshmi Gopinathan, Krishna Asawa
Abstract: In today's digitally connected world, diverse applications use data in various formats. Many application domains use data in a structured format (e.g. transaction data in financial systems), an unstructured format (e.g. social media postings) or a combination of both (e.g. emails, contracts and health records). The flexible nature of XML has motivated applications in various fields, such as technical and financial, to drift towards the XML representation. This drift towards XML applications has increased the number of documents on the web exponentially. Thus, the unprecedented and exponential growth in the usage of XML documents on the web warrants research attention towards improved and efficient methodologies to facilitate accelerated query processing of XML documents. As XML documents are modelled as a rooted labelled ordered tree, the primary challenge is to store the data while preserving the tree structure and to process queries efficiently. This paper proposes a new indexing structure that combines terminal sibling nodes at the same level into a single path. It reduces the search space while querying, thereby accelerating query processing. The main advantage of this index is that it can process branch (twig) queries efficiently with fewer lookups and decompositions in contrast to the existing approaches. The results also show that these queries are processed with equal or better performance compared with existing approaches.
Keywords: XML; information retrieval; query processing; XPath; ordered tree; branch query; indexing; sibling combined path.
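To make the idea of combining terminal sibling nodes into a single path more concrete, here is a simplified sketch. It is only one reading of the abstract above, not the index structure proposed in the paper: the key format (`/path/{tag,tag}`), the function name `build_combined_path_index` and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_combined_path_index(root):
    """Map each parent path to the leaf children found under it.

    Leaf (terminal) siblings of the same parent are grouped under one
    combined key such as '/library/book/{title,year}', so a branch query
    on several sibling leaves needs a single lookup in this sketch.
    """
    index = defaultdict(list)

    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        # Terminal siblings: children of elem that have no children themselves.
        leaves = [child for child in elem if len(child) == 0]
        if leaves:
            tags = ",".join(sorted({child.tag for child in leaves}))
            key = path + "/{" + tags + "}"
            index[key].append(
                [(child.tag, (child.text or "").strip()) for child in leaves]
            )
        # Recurse only into non-terminal children.
        for child in elem:
            if len(child) > 0:
                visit(child, path)

    visit(root, "")
    return dict(index)


# Hypothetical sample document.
doc = (
    "<library>"
    "<book><title>A</title><year>2001</year></book>"
    "<book><title>B</title><year>2005</year></book>"
    "</library>"
)
index = build_combined_path_index(ET.fromstring(doc))

# A twig query such as /library/book[title][year] maps to one combined key.
matches = index["/library/book/{title,year}"]
print(matches)
```

In this sketch a query that constrains both `title` and `year` is answered from one index entry instead of joining two separate path lookups, which is the intuition behind the reduced search space mentioned in the abstract.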
A supervised aspect level sentiment model to predict overall sentiment on tweeter documents
by Syed Muzamil Basha, Dharmendra Singh Rajput
Abstract: As social applications gain popularity, different kinds of social media platforms allow users to publish and express emotions in the form of reviews. Detecting the concealed sentiment patterns and discovering knowledge in this huge volume of user-generated input is a challenging task of great social significance. In a traditional sentiment analysis approach, statistical correlation between words is considered, whereas the dependency between aspects and sentiment words is ignored, although it has a greater impact on overall sentiment analysis. In this paper, we propose a new supervised topic level sentiment model (SSM), which is capable of handling overall sentiment analysis problems. A belief maximisation algorithm is used in the SSM model, and a Dirichlet distribution is used to estimate aspects and sentiment words. In order to prepare and modernise new documents, a hyperparameter Gibbs sampling method is used. We conducted experiments on reviews related to different products in multiple documents, and the results show that the SSM model outperforms the existing algorithm in terms of aspect recognition and overall sentiment prediction accuracy.
Keywords: belief maximisation; Dirichlet distribution; Gibbs sampling.
The properties of property alignment on the semantic web
by Michelle Cheatham, Catia Pesquita, Daniela Oliveira, Helena B. McCurdy
Abstract: The performance of alignment systems on property matching lags behind that on class and instance matching. This work seeks to understand the reasons for this and consider avenues for improvement. The paper contains an exploration of the performance of current alignment systems on the only commonly accepted alignment benchmark that involves matches between properties. A second benchmark involving properties from DBPedia and YAGO, scaled to be within the capabilities of most existing alignment systems, is also proposed. A basic approach focused on aligning properties is then presented and evaluated using both benchmarks to serve as a baseline against which to compare more complex matchers on the property alignment task. The results show that even a relatively simplistic approach can achieve a significantly higher F-measure than current matchers. Finally, an existing full-featured alignment system is augmented with the basic property matching approach and the difference in performance is assessed.
Keywords: property alignment; ontology alignment; property matching; ontology matching; ontology mapping; ontology alignment benchmarks; semantic web; semantic data integration.
Ontology of folktales in the Greater Mekong Subregion
by Kulthida Tuamsuk, Wirapong Chansanam, Nattapong Kaewboonma
Abstract: The goal of this research is to use the digital humanities research concept in the study of folktales in the Greater Mekong Subregion (GMS). This paper presents the second phase of the research, focusing on developing ontologies of folktales in the GMS. The ontology development comprised two processes: 1) ontology design and development and 2) ontology documentation. In both processes, domain knowledge and ontology of folktales were collected, captured, revised, and evaluated by experts in the field of folktale studies, literary studies, Asian studies, and ontology development. The outcome of this research is domain ontologies for folktales in the GMS. Approximately 74 concepts of folktales in the GMS have been defined and classified into classes and subclasses, including some necessary scope notes and relationships of the topics. The ontology was developed using Prot
Keywords: ontology; folktales; Greater Mekong Subregion; digital humanities.
A semantic web enabled host intrusion detection system
by Ozgu Can, Murat Osman Unalir, Emine Sezer, Okan Bursa, Batuhan Erdogdu
Abstract: Security has preeminent importance in today's technological environment. In recent years, as cyber-attacks have emerged, new security concerns have arisen. In order to overcome the serious consequences of these cyber-attacks, fully functioning and performance-improved intrusion detection systems are required. In this work, we propose a semantic web based host intrusion detection system to reduce the search time for malware scanning and to improve the performance of intrusion detection systems. For this purpose, we used ontologies to provide semantic expressiveness and knowledge description for an intrusion detection system. The proposed ontology based intrusion detection system scans for malware running on the operating system. Services and processes that are working on the system are also scanned, and the results are compared with a malware database. If any match occurs, the proposed system displays a list of the matching malware, together with the information about each item and where it is running.
Keywords: host intrusion detection system; intrusion detection system; semantic web; ontology.
Using type and temporal semantic enrichment to boost content discoverability and multilingualism in the Greek cultural heritage aggregator, SearchCulture.gr
by Haris Georgiadis, Agathi Papanoti, Maria Paschou, Alexandra Roubani, Despina Hardouveli, Evi Sachini
Abstract: Most aggregators face challenges regarding searchability, discoverability and visual presentation of their content owing to metadata heterogeneity across the collections. Particularly for cultural and historical material, keyword-based searching is far from sufficient. Structured item types and temporal information are key metadata for the discoverability of cultural heritage content. We developed an innovative metadata enrichment and homogenisation scheme for types and temporal information that is both effective and user-friendly, and we embedded it in the ingestion workflow of SearchCulture.gr, the Greek cultural heritage aggregator developed by the National Documentation Centre (EKT). Two key components of the enrichment scheme are Semantics.gr, a platform for publishing vocabularies that contains a mapping tool for massive semantic enrichment, and a parametric tool for chronological normalisation. We enriched and homogenised the aggregated content with respect to types and temporal information which subsequently allowed us to develop advanced multilingual search and browsing features, including hierarchical navigation on types and historical periods, searching and faceting on types, time spans and historical periods, a tag cloud of types and an interactive timeline/histogram.
Keywords: aggregator; semantic enrichment; linked data; automatic categorization; vocabularies; thesauri; cultural heritage; historical periods; time-driven search; temporal coverage; timeline; multilingualism.