Title: Hadoop ecosystem as enterprise big data platform: perspectives and practices

Authors: Sourav Mazumder; Subhankar Dhar

Addresses: IBM Software Group, San Jose, CA, USA; San Jose State University, San Jose, CA, USA

Abstract: Recent innovation in Hadoop and related big data technologies promises better management of enterprise data at much lower cost and with greater business benefit. However, managing a big data environment at the enterprise level is an involved task from both cost and operational perspectives. Key challenges include supporting various types of enterprise use cases with different workload patterns in the same cluster, minimising data movement, meeting different service level agreements (SLAs), and ensuring data lineage, veracity, and security. In this paper, we delve into these challenges from the practitioners' perspective, based on lessons learnt from various big data implementation scenarios. We also discuss the concept of the Hadoop ecosystem as a big data platform that can potentially address these challenges. Finally, we provide a prescriptive approach that can help enterprises move towards the vision of an enterprise big data platform built on the Hadoop ecosystem.

Keywords: business analytics; big data; data mining; Map Reduce; Hadoop; Spark; Alluxio; NoSQL.

DOI: 10.1504/IJITM.2018.095060

International Journal of Information Technology and Management, 2018 Vol.17 No.4, pp.334 - 348

Received: 26 Mar 2016
Accepted: 26 Nov 2016

Published online: 01 Jun 2018
