Title: Optimising virtual machine allocation in MapReduce cloud for improved data locality
Authors: T.P. Shabeera; S.D. Madhu Kumar
Addresses: Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala, India; Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala, India
Abstract: Big data is receiving increasing attention. Although MapReduce is successful in processing big data, it has some performance bottlenecks when deployed in the cloud, and data locality plays an important role among them. The focus of this paper is on improving data locality in a MapReduce cloud by allocating adjacent virtual machines (VMs) for executing MapReduce jobs. Good data locality reduces cross-network traffic and hence results in higher performance. When a user requests a set of VMs, the VMs are chosen based on their physical distance from one another. We first propose a greedy algorithm for creating a cluster of VMs; greedy methods are not guaranteed to give an optimal solution. The second method allocates VMs using partitioning around medoids, which always finds a local minimum, so the resulting allocation may not be globally optimal. We also present a dynamic programming approach, which is guaranteed to find an optimal solution from the users' perspective.
Keywords: cloud computing; virtual machines; MapReduce cloud; Hadoop; data locality; optimisation; big data; virtual machine allocation; greedy algorithm; VM clusters; partitioning around medoids; dynamic programming.
DOI: 10.1504/IJBDI.2015.067563
International Journal of Big Data Intelligence, 2015 Vol.2 No.1, pp.2 - 8
Received: 07 Jun 2014
Accepted: 22 Aug 2014
Published online: 21 Mar 2015
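
Illustrative sketch: The abstract describes choosing VMs by their physical distance from one another and mentions a greedy method for building a VM cluster, but it does not give the algorithm itself. The Python sketch below is therefore only one plausible reading, not the authors' method. It assumes a symmetric distance matrix `dist` (the values 1 for same host, 2 for same rack and 4 for cross-rack are hypothetical), grows a cluster from each possible seed VM by repeatedly adding the VM closest to those already chosen, and keeps the cluster with the smallest total pairwise distance. The names `cluster_cost`, `greedy_cluster` and `allocate` are invented for this illustration.

```python
from itertools import combinations


def cluster_cost(dist, vms):
    """Sum of pairwise distances among the chosen VMs (lower is better)."""
    return sum(dist[a][b] for a, b in combinations(vms, 2))


def greedy_cluster(dist, k, seed):
    """Grow a cluster of k VMs from `seed`, always adding the closest VM."""
    chosen = [seed]
    remaining = set(range(len(dist))) - {seed}
    while len(chosen) < k:
        # Pick the VM with the smallest total distance to the current cluster;
        # iterating in sorted order breaks ties by lowest index.
        nxt = min(sorted(remaining),
                  key=lambda v: sum(dist[v][c] for c in chosen))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen


def allocate(dist, k):
    """Try every VM as a seed and return the cheapest greedy cluster."""
    if not 1 <= k <= len(dist):
        raise ValueError("k must be between 1 and the number of VMs")
    candidates = (greedy_cluster(dist, k, s) for s in range(len(dist)))
    return min(candidates, key=lambda vms: cluster_cost(dist, vms))


# Hypothetical topology: VMs 0 and 1 share a host, VM 2 is on the same rack,
# and VMs 3 and 4 share a host on a different rack.
dist = [
    [0, 1, 2, 4, 4],
    [1, 0, 2, 4, 4],
    [2, 2, 0, 4, 4],
    [4, 4, 4, 0, 1],
    [4, 4, 4, 1, 0],
]

print(allocate(dist, 3))  # [0, 1, 2]
```

As the abstract notes for greedy methods in general, a sketch like this is not guaranteed to return the globally optimal cluster; it only illustrates why selecting mutually close VMs keeps a job's traffic within a host or rack.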