Implementation of a deduplication cache mechanism using content-defined chunking
by Yoshihiro Oyama; Jun Murakami; Shun Ishiguro; Osamu Tatebe
International Journal of High Performance Computing and Networking (IJHPCN), Vol. 9, No. 3, 2016

Abstract: Many application programs in data-intensive science read and write large files. Large files consume significant memory because their data is loaded into the page cache. Since memory is a critically valuable resource in data-intensive computing, reducing the memory footprint of file data is essential. In this paper, we propose a cache deduplication mechanism with content-defined chunking (CDC) for the Gfarm distributed file system. CDC divides a file into variable-size blocks (chunks) based on the contents of the file. The client stores the chunks in the local file system as cache files and reuses them during subsequent file accesses. Deduplication of chunks reduces the amount of data transmitted between clients and servers, as well as storage and memory requirements. The experimental results demonstrate that the proposed mechanism significantly improves the performance of file-read operations and that the introduction of parallelism reduces the overhead of file-write operations.
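The abstract describes CDC and chunk deduplication only at a high level. The following is a minimal Python sketch of the general technique, not the Gfarm implementation: it assumes a Gear rolling hash (the fingerprint used by FastCDC), which may differ from the fingerprint the authors use, and the chunk-size parameters, `cdc_chunks`, and `ChunkCache` are all hypothetical. It illustrates the two properties the abstract relies on: boundaries depend on content, so inserting bytes near the start of a file leaves later chunks unchanged, and identical chunks are stored only once.

```python
import hashlib
import random
from typing import Dict, List

# Hypothetical parameters for illustration; the abstract does not give
# the chunk sizes used by the Gfarm implementation.
MIN_SIZE = 2 * 1024    # never cut a chunk shorter than this
MAX_SIZE = 64 * 1024   # always cut once a chunk reaches this length
BITS = 13              # boundary probability ~1/2**13, so ~8 KiB past MIN_SIZE on average
U64 = (1 << 64) - 1
MASK = ((1 << BITS) - 1) << (64 - BITS)  # test the top BITS bits of the hash

# Gear table: one pseudo-random 64-bit value per byte value, seeded so that
# every client computes the same boundaries for the same content.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes) -> List[bytes]:
    """Split data into variable-size chunks whose boundaries depend on content."""
    chunks: List[bytes] = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + MAX_SIZE, n)
        cut = end  # forced cut at MAX_SIZE (or at end of data) if no boundary found
        h = 0
        for i in range(start, end):
            # Gear rolling hash: after 64 steps, h depends only on the last 64 bytes.
            h = ((h << 1) + GEAR[data[i]]) & U64
            if i - start + 1 >= MIN_SIZE and (h & MASK) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


class ChunkCache:
    """Hypothetical client-side chunk cache keyed by SHA-256 digest.

    The paper stores chunks as cache files on the client's local file system;
    an in-memory dict stands in for that here.
    """

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def put_file(self, data: bytes) -> List[str]:
        """Chunk a file, keep only unseen chunks, and return its chunk recipe."""
        recipe = []
        for chunk in cdc_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.store.setdefault(digest, chunk)  # a duplicate chunk is stored once
            recipe.append(digest)
        return recipe

    def get_file(self, recipe: List[str]) -> bytes:
        """Rebuild a file from cached chunks."""
        return b"".join(self.store[d] for d in recipe)


if __name__ == "__main__":
    original = random.Random(1).randbytes(128 * 1024)  # randbytes needs Python 3.9+
    edited = b"new header" + original  # shifts every byte of the original

    cache = ChunkCache()
    r1 = cache.put_file(original)
    before = len(cache.store)
    r2 = cache.put_file(edited)
    print(f"original: {len(r1)} chunks; edited: {len(r2)} chunks; "
          f"newly stored: {len(cache.store) - before}")
    assert cache.get_file(r2) == edited
```

Testing the high bits of the Gear hash (rather than the low bits) makes each boundary decision depend on the preceding 64 bytes, while the minimum and maximum sizes bound chunk length. Because a fixed-size split would shift every block after an insertion, the content-defined cuts are what let the edited file reuse almost all cached chunks.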

Online publication date: Sat, 30-Apr-2016
