Title: An effective and time-efficient approach for Linked Data fusion using genetic algorithms

Authors: Khayra Bencherif; Mimoun Malki

Addresses: EEDIS Laboratory, Djilali Liabes University, Sidi Bel Abbes, Algeria; LabRI Laboratory, High School of Computer Science ESI, Sidi Bel Abbes, Algeria

Abstract: The Linked Open Data (LOD) Cloud is a project that uses the RDF formalism to publish data on the web as triples under open licences. With the ever-increasing number of data sets available in the LOD Cloud, integrating heterogeneous data manually is already beyond human capability. So far, Linked Data fusion has required a significant amount of time owing to the large number of instances in LOD Cloud data sets. In this paper, we propose a new system that efficiently combines heterogeneous data from the LOD Cloud. First, we extract similar instances from the LOD Cloud to identify identical or related information. Then, our system collects all predicates and objects of the similar instances to construct a set of trees. Finally, we propose a genetic algorithm to merge the data in the constructed trees. We give an overview of our system architecture and detail our genetic algorithm. We also evaluate our system on real data sets, showing that it can increase the completeness and conciseness of the fused data. Moreover, we show that our system is faster when fusing large data sets from the LOD Cloud.
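As a rough illustration of the three steps named in the abstract (group similar instances, collect their predicates and objects, merge with a genetic algorithm), the following is a minimal sketch and not the authors' implementation. The abstract does not specify the tree structure, the chromosome encoding or the fitness function, so everything here is an assumption: the trees are flattened to a predicate-to-objects mapping, a chromosome picks one candidate object per predicate, and the hypothetical fitness counts how many sources agree with each pick. The names collect, fitness and fuse, and the ex: triples, are made up for the example.

```python
import random


def collect(instances):
    """Group objects by predicate across similar instances.

    Assumption: a flat dict stands in for the trees mentioned in the abstract.
    """
    grouped = {}
    for triples in instances:
        for predicate, obj in triples:
            grouped.setdefault(predicate, []).append(obj)
    return grouped


def fitness(chromosome, predicates, grouped):
    """Hypothetical fitness: number of sources that state each chosen value."""
    return sum(
        grouped[p].count(grouped[p][gene])
        for p, gene in zip(predicates, chromosome)
    )


def fuse(instances, generations=50, pop_size=20, mutation_rate=0.1, seed=0):
    """Pick one object per predicate with a small genetic algorithm."""
    rng = random.Random(seed)
    grouped = collect(instances)
    predicates = sorted(grouped)

    def random_gene(p):
        # A gene is an index into the candidate objects of predicate p.
        return rng.randrange(len(grouped[p]))

    def score(chromosome):
        return fitness(chromosome, predicates, grouped)

    population = [[random_gene(p) for p in predicates] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half as parents (elitism).
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasionally re-draw a gene at random.
            child = [
                random_gene(p) if rng.random() < mutation_rate else gene
                for p, gene in zip(predicates, child)
            ]
            children.append(child)
        population = parents + children

    best = max(population, key=score)
    return {p: grouped[p][gene] for p, gene in zip(predicates, best)}


# Toy input: three descriptions of the same entity (made-up triples).
instances = [
    [("ex:name", "Alice"), ("ex:city", "Oran")],
    [("ex:name", "Alice"), ("ex:job", "Engineer")],
    [("ex:name", "Alicia")],
]
fused = fuse(instances)
print(fused)
```

In this sketch the fused record holds every predicate seen in any source and exactly one value per predicate, which is one simple reading of the completeness and conciseness goals; the paper's actual measures may differ.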

Keywords: linked data; data integration; data fusion; genetic algorithms; Linked Open Data; LOD Cloud.

DOI: 10.1504/IJMSO.2016.080349

International Journal of Metadata, Semantics and Ontologies, 2016 Vol.11 No.2, pp.110 - 123

Received: 28 Apr 2016
Accepted: 02 Aug 2016

Published online: 16 Nov 2016
