Title: Comparison of Hive's query optimisation techniques

Authors: Sikha Bagui; Keerthi Devulapalli

Addresses: Department of Computer Science, University of West Florida, Pensacola, FL, USA; Department of Computer Science, University of West Florida, Pensacola, FL, USA

Abstract: The ever-increasing size of data sets in the big data era has forced data analytics to move from traditional RDBMSs to distributed technologies like Hadoop. Since data analysts are more familiar with SQL than with the MapReduce programming paradigm, HiveQL was built on Hadoop's MapReduce framework. Traditional RDBMS query optimisation techniques, as used in Hive's rule-based optimiser (RBO), do not perform well in the MapReduce environment; hence, the correlation optimiser (CRO) and the cost-based optimiser (CBO) were developed. These optimisers take the MapReduce execution framework into account when optimising queries. In this work, the three optimisers, RBO, CRO, and CBO, are compared. Queries with common intra-query operations are found to be better optimised with CRO.

Keywords: Hive; query optimisation; correlation optimiser; CRO; rule-based optimiser; RBO; cost-based optimiser; CBO.

DOI: 10.1504/IJBDI.2018.094993

International Journal of Big Data Intelligence, 2018 Vol.5 No.4, pp.243-257

Received: 09 Jun 2017
Accepted: 16 Aug 2017

Published online: 28 Sep 2018
