Title: Automated big data quality assessment using knowledge graph embeddings

Authors: Hadi Fadlallah; Rima Kilany; Mitri Haber; Ali Jaber

Addresses: Faculty of Sciences, Saint Joseph University of Beirut, Beirut, Lebanon ' Faculty of Engineering, Saint Joseph University of Beirut, Beirut, Lebanon ' Faculty of Engineering, Saint Joseph University of Beirut, Beirut, Lebanon ' Faculty of Sciences, Lebanese University, Beirut, Lebanon

Abstract: This paper introduces a knowledge-based approach to automate data quality assessment, addressing the limitations of traditional methods that overlook contextual data characteristics. By using knowledge graph embeddings, it predicts missing connections between a dataset's context and relevant quality rules within a knowledge graph. This integration of diverse representations enables a context-specific data quality assessment plan tailored to each scenario. The approach enhances understanding of the dataset's context, surpassing traditional strict matching methods. Numerical edge attributes are applied to assign weights to predicted quality measurements, providing a comprehensive assessment. The solution is evaluated using AmpliGraph on a radiation sensors dataset from the Lebanese Atomic Energy Commission (LAEC-CNRS). The results demonstrate its capability to generate a comprehensive data quality assessment plan for the given input dataset.

Keywords: data quality assessment; data context; big data; machine learning; knowledge graph embeddings; automation.

DOI: 10.1504/IJDMMM.2025.150987

International Journal of Data Mining, Modelling and Management, 2025 Vol.17 No.4, pp.383 - 405

Received: 27 Feb 2024
Accepted: 22 Apr 2024

Published online: 07 Jan 2026
