Title: A decentralised framework for efficient storage and processing of big data using HDFS and IPFS
Authors: Franklin John; Suji Gopinath; Elizabeth Sherly
Addresses: Indian Institute of Information Technology and Management – Kerala, Technopark, Thiruvananthapuram, Kerala, 695581, India; University of Kerala, Thiruvananthapuram, Kerala, 695581, India; Indian Institute of Information Technology and Management – Kerala, Technopark, Thiruvananthapuram, Kerala, 695581, India
Abstract: The big data revolution has brought greater opportunities as well as challenges. Among the major challenges are capturing, storing, transferring, analysing, processing and updating these large and complex datasets. Traditional data handling techniques cannot manage this fast-growing data. Apache Hadoop is one of the leading technologies for addressing the challenges of big data handling; it provides a distributed data storage model with centralised coordination. The InterPlanetary File System (IPFS) is an emerging technology that provides decentralised distributed storage. By integrating these two technologies, we can create a better framework for the distributed storage and processing of big data. In the proposed work, we formulate a model for big data placement, replication and processing by combining the features of Hadoop and IPFS. The Hadoop distributed file system and IPFS jointly handle the data placement and replication tasks, while Hadoop's MapReduce programming framework handles the data processing task. Experimental results show that the proposed framework achieves cost-effective storage as well as faster processing of big data.
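The two ideas the abstract combines can be illustrated in miniature: IPFS addresses each block by the hash of its contents (so identical blocks deduplicate), and MapReduce aggregates per-record results into a final answer. The sketch below is purely illustrative and is not the authors' implementation; `ContentStore` and `map_reduce_word_count` are hypothetical names, and real IPFS uses multihash-based CIDs rather than plain SHA-256 hex digests.

```python
import hashlib
from collections import Counter

class ContentStore:
    """Toy content-addressed store: each block's key is the SHA-256
    digest of its contents, mimicking IPFS-style addressing."""
    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self.blocks[cid] = data  # identical blocks deduplicate automatically
        return cid

    def get(self, cid: str) -> bytes:
        return self.blocks[cid]

def map_reduce_word_count(store: ContentStore, cids: list) -> dict:
    """Simplified MapReduce: the map phase emits (word, 1) pairs per
    block; the reduce phase sums the counts for each word."""
    counts = Counter()
    for cid in cids:                              # map over stored blocks
        for word in store.get(cid).decode().split():
            counts[word] += 1                     # reduce by summation
    return dict(counts)

store = ContentStore()
cids = [store.put(b"big data"), store.put(b"big storage")]
print(map_reduce_word_count(store, cids))  # {'big': 2, 'data': 1, 'storage': 1}
```

In the actual framework the content-addressed layer is the real IPFS network and the aggregation runs as a Hadoop MapReduce job over HDFS; this toy version only shows how the two roles fit together.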
Keywords: big data management; cloud computing; Hadoop distributed file system; HDFS; InterPlanetary File System; IPFS; erasure coding.
International Journal of Humanitarian Technology, 2020 Vol.1 No.2, pp.131 - 143
Received: 04 Oct 2017
Accepted: 06 Jul 2018
Published online: 18 Jan 2021