Authors: Sourav Mazumder; Subhankar Dhar
Addresses: IBM Software Group, San Jose, CA, USA; San Jose State University, San Jose, CA, USA
Abstract: Recent innovation in Hadoop and related big data technologies promises better management of enterprise data at much lower cost and with greater business benefit. However, managing a big data environment at enterprise level is an involved task from both cost and operational perspectives. Key challenges include supporting various types of enterprise use cases with different workload patterns in the same cluster, minimising data movement, assuring different service level agreements (SLAs), and ensuring data lineage, veracity, and security. In this paper, we delve into these challenges from the practitioners' perspective, based on lessons learnt from various big data implementation scenarios. We also discuss the concept of the Hadoop ecosystem as a big data platform that can potentially address these challenges. Finally, we provide a prescriptive approach that can help in moving towards the vision of an enterprise big data platform using the Hadoop ecosystem.
Keywords: business analytics; big data; data mining; MapReduce; Hadoop; Spark; Alluxio; NoSQL.
International Journal of Information Technology and Management, 2018, Vol. 17, No. 4, pp. 334-348
Received: 26 Mar 2016
Accepted: 26 Nov 2016
Published online: 01 Jun 2018