Authors: Soham Das; Sartaj Sahni
Addresses: Department of Computer and Information Science and Engineering, University of Florida, Gainesville, USA ' Department of Computer and Information Science and Engineering, University of Florida, Gainesville, USA
Abstract: In this paper, we develop algorithms for the data aggregation problem which arises in the context of big data applications that employ the MapReduce operation. For the case when source racks can send their data to the aggregator using multiple paths, we show that an aggregation tree topology that minimises aggregation time can be constructed in a polynomial time. We also consider the problem of constructing aggregation trees that minimise total network traffic subject to the primary constraints that the aggregation time is minimised. The heuristics for this problem is presented and the experiments show that allowing multiple paths reduces aggregation time by up to 99% related to the aggregation trees constructed using the LPT rule. This reduction in aggregation time, however, comes with up to 35% increase in total network traffic when the racks have more than two optical links and up to 98% increase in total network traffic when each rack has two optical links.
Keywords: big data applications; data centre networks; MapReduce tasks; software-defined networking; network topology optimisation; data aggregation; multiple paths; network traffic.
International Journal of Metaheuristics, 2015 Vol.4 No.2, pp.115 - 140
Received: 07 Apr 2015
Accepted: 29 Apr 2015
Published online: 18 Jan 2016 *