
Title: Optimising virtual machine allocation in MapReduce cloud for improved data locality

Authors: T.P. Shabeera; S.D. Madhu Kumar

Addresses: Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala, India; Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala, India

Abstract: Big data is getting more attention in today's world. Although MapReduce is successful in processing big data, it has some performance bottlenecks when deployed in the cloud, and data locality plays an important role among them. The focus of this paper is on improving data locality in a MapReduce cloud by allocating adjacent VMs for executing MapReduce jobs. Good data locality reduces cross-network traffic and hence results in higher performance. When a user requests a set of virtual machines (VMs), the VMs are chosen based on their physical distance from one another. We first propose a greedy algorithm for creating a cluster of VMs; greedy methods are not guaranteed to give an optimal solution. The second method allocates VMs using partitioning around medoids. This method always finds a local minimum, so the resulting allocation may not be globally optimised. We also present a dynamic programming approach which is guaranteed to find an optimal solution from the users' perspective.
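The greedy idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here; it assumes a hypothetical symmetric distance matrix `dist` between candidate VMs (for example, network hop counts) and takes the sum of pairwise distances as the cost of a cluster. The names `cluster_cost`, `greedy_cluster`, and `best_greedy_cluster` are illustrative only.

```python
from itertools import combinations


def cluster_cost(dist, vms):
    """Sum of pairwise distances among the chosen VMs (assumed cost measure)."""
    return sum(dist[a][b] for a, b in combinations(vms, 2))


def greedy_cluster(dist, k, seed=0):
    """Grow a cluster of k VMs from `seed`, always adding the VM that is
    closest (in total distance) to the VMs already chosen."""
    chosen = [seed]
    remaining = set(range(len(dist))) - {seed}
    while len(chosen) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        nxt = min(sorted(remaining),
                  key=lambda v: sum(dist[v][c] for c in chosen))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen


def best_greedy_cluster(dist, k):
    """Run the greedy step from every possible seed and keep the cheapest result.
    Still a heuristic: it is not guaranteed to be globally optimal."""
    candidates = (greedy_cluster(dist, k, s) for s in range(len(dist)))
    return min(candidates, key=lambda vms: cluster_cost(dist, vms))


# Hypothetical distances: VMs 0 and 1 share a host, VM 2 is in the same rack,
# VM 3 is in a different rack.
dist = [
    [0, 1, 2, 4],
    [1, 0, 2, 4],
    [2, 2, 0, 4],
    [4, 4, 4, 0],
]
cluster = best_greedy_cluster(dist, 3)
print(cluster, cluster_cost(dist, cluster))
```

The result of a single greedy run depends on the seed, which is one way to see why a greedy choice may miss the best cluster and why the paper also considers partitioning around medoids and dynamic programming.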

Keywords: cloud computing; virtual machines; MapReduce cloud; Hadoop; data locality; optimisation; big data; virtual machine allocation; greedy algorithm; VM clusters; partitioning around medoids; dynamic programming.

DOI: 10.1504/IJBDI.2015.067563

International Journal of Big Data Intelligence, 2015 Vol.2 No.1, pp.2 - 8

Received: 07 Jun 2014
Accepted: 22 Aug 2014
Published online: 21 Mar 2015
