Title: A method for examining metadata quality in open research datasets using the OAI-PMH and SQL queries: the case of the Dublin Core 'Subject' element and suggestions for user-centred metadata annotation design
Authors: Panos Balatsoukas; Dimitris Rousidis; Emmanouel Garoufallou
Addresses: Department of Computer Science, City, University of London, London, UK ' Department of Computer Science, University of Alcala, Madrid, Spain; Department of Library Science and Information Systems, Alexander Technological Educational Institute of Thessaloniki, Macedonia, Greece ' Department of Library Science and Information Systems, Alexander Technological Educational Institute of Thessaloniki, Macedonia, Greece; Department of Computer Science, University of Alcala, Madrid, Spain
Abstract: Poor quality metadata can have a negative impact not only on the way research datasets are retrieved, shared and used by scientists, but also on the way research data repositories are managed and audited. The aim of the research reported in this paper was to perform a descriptive analysis of the Dublin Core Subject metadata element and to identify its quality problems, if any, in the context of the Dryad research data repository, following a novel data-preprocessing method using SQL queries. The findings showed quality problems related to the lack of controlled vocabulary and standardisation, such as the inconsistent use of singular and plural forms, adjectives and synonyms. This study has both practical and methodological implications for the evaluation of metadata and the improvement of the quality of the research data annotation process in open research data repositories.
Keywords: Big Data; Dublin Core; data quality; open access repositories; metadata; digital curation; user centred design; open research datasets; Dryad; metadata quality; OAI-PMH; SQL queries.
International Journal of Metadata, Semantics and Ontologies, 2018 Vol.13 No.1, pp.1 - 8
Received: 15 Feb 2018
Accepted: 28 Feb 2018
Published online: 30 Nov 2018
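The abstract reports inconsistent singular and plural forms among Subject values, found with SQL queries. The record does not give the authors' actual queries, so the following is only a minimal sketch of how such a check might look, assuming the harvested subject terms have already been loaded into a relational table. The table name `subject`, its columns, the sample terms, and the helper `find_variant_groups` are all hypothetical, and the trailing-"s" rule is a deliberately naive stand-in for real plural handling.

```python
import sqlite3

# Hypothetical sample of Dublin Core Subject values (record id, term).
# These are illustrative only and are not taken from the Dryad repository.
SAMPLE_SUBJECTS = [
    (1, "Adaptation"),
    (2, "adaptation"),
    (3, "Adaptations"),
    (4, "Hybrid zone"),
    (5, "hybrid zones"),
    (6, "Phylogeny"),
]


def find_variant_groups(rows):
    """Return (normalised term, distinct spellings, total uses) for every
    normalised term that appears under more than one spelling."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subject (record_id INTEGER, term TEXT)")
    conn.executemany("INSERT INTO subject VALUES (?, ?)", rows)

    # Normalise each term by trimming, lower-casing and dropping one
    # trailing 's' (a naive plural rule that will mis-handle words such
    # as 'species'). Then keep groups with more than one spelling.
    query = """
        SELECT norm,
               COUNT(DISTINCT term) AS variants,
               COUNT(*) AS uses
        FROM (
            SELECT term,
                   CASE WHEN lower(trim(term)) LIKE '%s'
                        THEN substr(lower(trim(term)), 1,
                                    length(trim(term)) - 1)
                        ELSE lower(trim(term))
                   END AS norm
            FROM subject
        )
        GROUP BY norm
        HAVING COUNT(DISTINCT term) > 1
        ORDER BY norm
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for norm, variants, uses in find_variant_groups(SAMPLE_SUBJECTS):
        print(f"{norm}: {variants} spellings across {uses} uses")
```

Grouping on a normalised key keeps the check inside a single SQL statement, which suits a descriptive analysis over a large harvested table. Synonyms and adjective forms, which the abstract also mentions, would need a vocabulary or stemming step that this sketch does not attempt.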