Title: WBDPR: a way for big data provenance relationship

Authors: Zhiwen Zheng; Ying Song; Yunmei Shi; Bo Wang

Addresses: Department of Computer, Beijing Information Science and Technology University, Beijing, 100101, China ' Department of Computer, Beijing Information Science and Technology University, Beijing, 100101, China; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Academy of Sciences, Beijing, 100190, China ' Department of Computer, Beijing Information Science and Technology University, Beijing, 100101, China ' Software Engineering College, Zhengzhou University of Light Industry, Henan Zhengzhou, 450002, China

Abstract: As data generation relationships grow more complex, existing provenance frameworks face challenges such as high resource consumption, redundant storage, and slow queries. This paper proposes a way for big data provenance relationship (WBDPR), a solution for efficient data provenance in the Hadoop scenario. WBDPR addresses these issues by supporting asynchronous provenance log integration and by introducing a provenance storage mode and query algorithm based on a provenance directed acyclic graph (PROV-DAG). Experimental results show that WBDPR reduces memory occupation by 56% and index disk storage by 75%. It also improves query performance by 80% in 64% of leaf and intermediate nodes. Compared with the RAMP, Newt, and Atlas systems, WBDPR achieves up to a 5.1% reduction in tracing time. WBDPR is a fault-tolerant technology that records the provenance information of data and its calculation process; its significance lies in ensuring the integrity and reliability of data.

Keywords: data provenance; provenance graph; provenance model; data storage; Hadoop.

DOI: 10.1504/IJCSE.2024.141374

International Journal of Computational Science and Engineering, 2024 Vol.27 No.5, pp.627 - 641

Received: 23 Feb 2023
Accepted: 17 Jun 2023

Published online: 09 Sep 2024
