Title: A survey on deduplication systems

Authors: Amdewar Godavari; Chapram Sudhakar

Addresses: Computer Science and Engineering, National Institute of Technology Warangal, Warangal, Telangana, India; Computer Science and Engineering, National Institute of Technology Warangal, Warangal, Telangana, India

Abstract: With the arrival of new technological trends such as Big Data and the Internet of Things, a tremendous amount of duplicate data is being generated. Duplicate data wastes storage capacity and degrades the performance of storage systems. Data deduplication is a storage optimisation technique used to eliminate duplicate data. Deploying a deduplication system for primary or secondary storage is challenging because of the extra latency incurred by deduplication processing. In addition, because duplicates are eliminated, deduplication disrupts the contiguous placement of data on disk, a problem known as disk fragmentation. This paper gives an overview of the issues and the solutions proposed for deploying a deduplication component in primary and/or secondary storage systems using centralised or distributed approaches. Experiments are conducted with the Destor tool on different data sets, and the results are used to study the effect of different chunking algorithms on the deduplication phases.

Keywords: data fragmentation; deduplication; disk bottleneck.

DOI: 10.1504/IJGUC.2024.137902

International Journal of Grid and Utility Computing, 2024 Vol.15 No.2, pp.143-159

Received: 22 Mar 2022
Received in revised form: 08 Jan 2023
Accepted: 28 Jan 2023

Published online: 08 Apr 2024
