Authors: Ahmed Radwan; Akmal Younis; Santhosh Srinivasan; Abhay Gupta
Addresses: Yahoo! Inc., Santa Clara, CA 95054, USA; College of Engineering, University of Miami, Coral Gables, FL 33124, USA; Yahoo! Inc., Santa Clara, CA 95054, USA; Yahoo! Inc., Santa Clara, CA 95054, USA
Abstract: MapReduce is a parallel programming model that has been proven to scale. However, using low-level MapReduce for general data processing tasks poses the problem of developing, maintaining and reusing custom low-level user code. Several frameworks have emerged to address this problem. We highlight several issues in these approaches and propose an alternative: a novel refined MapReduce model (MR-LEGOS), an explicit model for composing MapReduce constructs from simpler components, namely Maplets, Reducelets and, optionally, Combinelets. This composition can be viewed as defining a micro-workflow inside the MapReduce job. Using MR-LEGOS, complex problem semantics can be defined in the encompassing micro-workflow while keeping the building blocks simple. The model is analogous to LEGO bricks: a collection of standard, reusable, predefined bricks helps define complex processing tasks efficiently. We present the design details, usage scenarios and performance experiments, and highlight the main features of MR-LEGOS.
Keywords: cloud computing; MapReduce; Hadoop; data management; grid computing; LEGOS; extract transform load; parallel programming; semantics.
International Journal of Cloud Computing, 2011 Vol.1 No.1, pp.58 - 80
Received: 19 May 2010
Accepted: 26 Oct 2010
Published online: 30 Dec 2014
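
Illustrative sketch: the composition idea described in the abstract (small reusable bricks chained into a micro-workflow inside a single job) can be pictured roughly as follows. This is not the MR-LEGOS API, which the abstract does not specify. Every name here (`tokenize`, `lowercase_keys`, `sum_values`, `compose_maplets`, `run_job`) is hypothetical, and the in-memory `run_job` only stands in for a real MapReduce runtime such as Hadoop.

```python
from collections import defaultdict

# Hypothetical "maplets": each takes an iterable of (key, value) pairs
# and yields new (key, value) pairs.
def tokenize(pairs):
    for _, line in pairs:
        for word in line.split():
            yield word, 1

def lowercase_keys(pairs):
    for key, value in pairs:
        yield key.lower(), value

# Hypothetical "reducelet": receives one key and all of its grouped values.
def sum_values(key, values):
    yield key, sum(values)

def compose_maplets(*maplets):
    """Chain maplets into a single map function (an assumed micro-workflow)."""
    def mapper(pairs):
        for maplet in maplets:
            pairs = maplet(pairs)
        return pairs
    return mapper

def run_job(records, mapper, reducelet):
    """Toy single-process stand-in for the map, shuffle and reduce phases."""
    groups = defaultdict(list)
    for key, value in mapper(records):      # map phase
        groups[key].append(value)           # shuffle: group values by key
    result = {}
    for key in sorted(groups):              # reduce phase
        for out_key, out_value in reducelet(key, groups[key]):
            result[out_key] = out_value
    return result

records = [(0, "Map Reduce map"), (1, "reduce MAP")]
counts = run_job(records, compose_maplets(tokenize, lowercase_keys), sum_values)
print(counts)
```

In this sketch the word-count semantics live in the way the bricks are wired together, while each brick stays trivial; swapping `lowercase_keys` for another maplet changes the job without touching the other bricks.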