Int. J. of Big Data Intelligence   »   2017 Vol.4, No.1

 

 

Title: HCEm model and a comparative workload analysis of Hadoop cluster

 

Authors: José Benedito De Souza Brito; Aletéia Patrícia Favacho De Araújo

 

Addresses:
Department of Computer Science, Universidade de Brasília (UnB), POB: 4.466, ZIP: 90.910-900, Brasília, DF, Brazil
Department of Computer Science, Universidade de Brasília (UnB), POB: 4.466, ZIP: 90.910-900, Brasília, DF, Brazil

 

Abstract: This paper describes the HCEm model, designed to estimate, for a given timeframe, the size of a cluster running Hadoop in cloud environments. HCEm consists of a light optimisation layer for MapReduce jobs and a model to estimate the size of a Hadoop cluster. Additionally, this paper presents a comparative study of HCEm using similar applications and workloads in two production Hadoop clusters, Amazon Elastic MapReduce and a private cloud in a large financial company, in order to evaluate the performance of the model in real, processing-intensive environments. The estimates generated by the HCEm model and the processing performed are representative and consistent, which can help researchers and engineers understand the workload characteristics of Hadoop clusters in their production environments. The performance differences observed between the real environments confirmed that increased sharing of physical host computing resources reduces the accuracy of the model.

 

Keywords: distributed computing; computational efficiency; Hadoop benchmarks; big data; data analysis; resource allocation; performance model; MapReduce; Hadoop performance evaluation; job estimation; Hadoop clusters; cloud computing.

 

DOI: 10.1504/IJBDI.2016.10001999

 

Int. J. of Big Data Intelligence, 2017 Vol.4, No.1, pp.47 - 60

 

Available online: 26 Dec 2016

 

 
