Title: Applying cross-data set identity reasoning for producing URI embeddings over hundreds of RDF data sets

Authors: Michalis Mountantonakis; Yannis Tzitzikas

Addresses: Institute of Computer Science, FORTH-ICS, Greece; Computer Science Department, University of Crete, Greece (both authors)

Abstract: There is a proliferation of approaches that exploit RDF data sets for creating URI embeddings, i.e., embeddings that are produced by taking as input URI sequences (instead of simple words or phrases), since they can be of primary importance for several tasks (e.g., machine learning tasks). However, existing techniques exploit either a single data set or only a few data sets for creating URI embeddings. For this reason, we introduce a prototype, called LODVec, which exploits LODsyndesis to enable the creation of URI embeddings from hundreds of data sets simultaneously, after enriching them with the results of cross-data set identity reasoning. With LODVec, it is feasible to produce URI sequences by following paths of any length (according to a given configuration), and the produced URI sequences are used as input for creating embeddings through the word2vec model. We provide comparative results for evaluating the gain of using several data sets for creating URI embeddings, for the tasks of classification and regression, and for finding the most similar entities to a given one.
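
Illustrative sketch (not from the article): the abstract describes two steps, namely merging equivalent URIs across data sets and then following paths of a configurable length to produce URI sequences. The Python sketch below shows one simple way this could look, assuming a random-walk strategy over a toy triple list. All URIs, the `SAME_AS` mapping, and the function names `canonical`, `build_index`, and `uri_sequences` are hypothetical and do not reflect the actual LODVec or LODsyndesis implementation.

```python
import random
from collections import defaultdict

# Toy triples from two hypothetical data sets (prefixes ds1: and ds2:).
TRIPLES = [
    ("ds1:Aristotle", "ds1:birthPlace", "ds1:Stagira"),
    ("ds1:Stagira", "ds1:country", "ds1:Greece"),
    ("ds2:Aristotle", "ds2:influenced", "ds2:Alexander"),
    ("ds2:Alexander", "ds2:birthPlace", "ds2:Pella"),
]

# Assumed output of identity reasoning: each URI maps to one representative.
SAME_AS = {"ds2:Aristotle": "ds1:Aristotle"}


def canonical(uri):
    """Return the representative URI of an equivalence class."""
    return SAME_AS.get(uri, uri)


def build_index(triples):
    """Index outgoing edges by canonical subject, merging equivalent URIs."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[canonical(s)].append((p, canonical(o)))
    return index


def uri_sequences(index, start, depth, num_walks, seed=0):
    """Generate URI sequences by random walks of at most `depth` hops."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(num_walks):
        node = canonical(start)
        seq = [node]
        for _ in range(depth):
            edges = index.get(node)
            if not edges:  # dead end: stop this walk early
                break
            predicate, node = rng.choice(edges)
            seq.extend([predicate, node])
        sequences.append(seq)
    return sequences


index = build_index(TRIPLES)
sequences = uri_sequences(index, "ds2:Aristotle", depth=2, num_walks=5)
```

Each resulting sequence alternates entity and predicate URIs and could be passed as a "sentence" to a word2vec implementation (for example, gensim's `Word2Vec` class). Because `ds2:Aristotle` is mapped to `ds1:Aristotle` before indexing, walks starting from either URI can traverse facts contributed by both data sets, which is the intuition behind enriching the input with identity reasoning.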

Keywords: embeddings; cross-data set identity reasoning; RDF; machine learning; data integration; linked data; finding similar entities; classification; regression.

DOI: 10.1504/IJMSO.2021.117103

International Journal of Metadata, Semantics and Ontologies, 2021 Vol.15 No.1, pp.1 - 22

Received: 30 May 2020
Accepted: 10 Dec 2020

Published online: 10 Aug 2021