Title: Maintaining data integrity in cloud systems through version management

Authors: Tsozen Yeh; Yipin Wang; Yiming Tu

Addresses: Department of Computer Science and Information Engineering, Fu Jen Catholic University, New Taipei City, Taiwan; Department of Computer Science and Information Engineering, Fu Jen Catholic University, New Taipei City, Taiwan; Department of Computer Science and Information Engineering, Fu Jen Catholic University, New Taipei City, Taiwan

Abstract: As the era of big data arrives, the amount of data collected has far exceeded what traditional computer systems can appropriately handle and process. Accordingly, cloud computing has been widely used to facilitate the processing of big data. Individual data files often contain data inserted at different times, which means they have accumulated chronological versions of their contents since their creation. Hadoop is one of the most popular cloud systems in use today. Unfortunately, it does not provide an efficient scheme for version management of files. Previously, we improved Hadoop by realising autonomous snapshot and extra duplication for files covered in snapshots. In this paper, we report our efforts to design and implement version management for files in snapshots. With the help of autonomous snapshot and extra file duplication, version management can further maintain data integrity for important files contained in snapshots.

Keywords: big data; data integrity; cloud computing; Hadoop; Hadoop Distributed File System; HDFS.

DOI: 10.1504/IJAHUC.2020.107818

International Journal of Ad Hoc and Ubiquitous Computing, 2020 Vol.34 No.2, pp.63 - 73

Received: 20 Mar 2019
Accepted: 26 Aug 2019

Published online: 22 Jun 2020
