Authors: Sikha Bagui; Keerthi Devulapalli
Addresses: Department of Computer Science, University of West Florida, Pensacola, FL, USA (both authors)
Abstract: The ever-increasing size of data sets in this big data era has forced data analytics to move from traditional RDBMSs to distributed technologies like Hadoop. Since data analysts are more familiar with SQL than with the MapReduce programming paradigm, HiveQL was built on Hadoop's MapReduce framework. Traditional RDBMS query optimisation techniques, as used in the rule-based optimiser (RBO) of Hive, do not perform well in the MapReduce environment; hence, the correlation optimiser (CRO) and the cost-based optimiser (CBO) were developed. These optimisers take the MapReduce execution framework into account when optimising queries. In this work, the three optimisers, RBO, CRO, and CBO, are compared. Queries with common intra-query operations are found to be better optimised with CRO.
Keywords: Hive; query optimisation; correlation optimiser; CRO; rule-based optimiser; RBO; cost-based optimiser; CBO.
International Journal of Big Data Intelligence, 2018 Vol.5 No.4, pp.243-257
Received: 09 Jun 2017
Accepted: 16 Aug 2017
Published online: 05 Dec 2017