Title: A SPARQL query processing system using map-phase-multi join for big data in clouds
Authors: Sheng-Wei Huang; Chia-Ho Yu; Ce-Kuen Shieh; Ming-Fong Tsai
Addresses: Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University, Taiwan; Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University, Taiwan; Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University, Taiwan; Department of Electronic Engineering, National United University, Taiwan
Abstract: Big data refers to datasets so large and complex that they are hard to store and analyse with traditional data processing tools. Linked data is one approach to dealing with big data, and linked data is stored and processed in a TripleStore. For huge datasets, a TripleStore requires more scalable techniques. The MapReduce programming model is the most representative cloud technology. Several approaches use MapReduce to serve SPARQL queries, but they still exhibit unacceptable performance on complex queries. In this paper, we propose a map-phase-multi-join algorithm for processing SPARQL queries. Multi-join reduces job initialisation time by avoiding iterative MapReduce jobs. Furthermore, map-phase join saves bandwidth by preventing data that takes no part in any join from being transferred among computing nodes. We also design a storage schema and a join-order rule that further improve the performance of our system. The evaluation results show that our system outperforms traditional join approaches on most queries.
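To illustrate why joining in the map phase can avoid both a shuffle and a chain of MapReduce jobs, here is a minimal sketch under loudly stated assumptions. It is not the authors' algorithm, storage schema, or join-order rule, none of which the abstract specifies. Its assumptions are that triples are partitioned across nodes by subject, and that the query is a star pattern (?x p1 ?v1 . ?x p2 ?v2 . ...) sharing one subject variable. Under those assumptions, every triple needed for a given ?x sits on one node, so each "mapper" can evaluate all the triple patterns at once and emit final bindings without a reduce step. The functions partition_by_subject, map_phase_star_join and run_query are hypothetical names chosen for this illustration.

```python
import zlib
from collections import defaultdict
from itertools import product


def partition_by_subject(triples, num_nodes):
    """Hypothetical storage layout: hash each triple's subject to pick a node,
    so all triples about the same subject are co-located."""
    nodes = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        nodes[zlib.crc32(s.encode("utf-8")) % num_nodes].append((s, p, o))
    return nodes


def map_phase_star_join(local_triples, predicates):
    """Join every pattern (?x, p, ?v_p) for p in `predicates` on one node only.

    Triples whose predicate is not in the query are dropped here, so they
    would never be sent across the network."""
    index = defaultdict(lambda: defaultdict(list))  # subject -> predicate -> objects
    for s, p, o in local_triples:
        if p in predicates:
            index[s][p].append(o)
    results = []
    for s, by_pred in index.items():
        # A subject yields bindings only if it matches every triple pattern.
        if all(p in by_pred for p in predicates):
            for combo in product(*(by_pred[p] for p in predicates)):
                results.append((s, *combo))
    return results


def run_query(triples, predicates, num_nodes=3):
    """Simulate one map-only job: each node joins locally and results are concatenated."""
    out = []
    for local in partition_by_subject(triples, num_nodes):
        out.extend(map_phase_star_join(local, predicates))
    return sorted(out)


if __name__ == "__main__":
    data = [
        ("alice", "worksFor", "ncku"),
        ("alice", "name", "Alice"),
        ("alice", "email", "a@x"),
        ("bob", "worksFor", "nuu"),
        ("bob", "name", "Bob"),
        ("carol", "name", "Carol"),
    ]
    print(run_query(data, ["worksFor", "name"]))
```

On the sample data, the query for ?x worksFor ?org . ?x name ?n returns the bindings for alice and bob only, because carol has no worksFor triple. The result does not depend on the number of nodes. The sketch covers only single-subject star joins. Queries that join on objects or span several variables would need additional jobs or a different partitioning, which is where a join-order rule such as the one mentioned above would come in.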
Keywords: big data; linked data; MapReduce; SPARQL; TripleStore; NoSQL.
DOI: 10.1504/IJIPT.2017.087555
International Journal of Internet Protocol Technology, 2017 Vol.10 No.3, pp.177-188
Received: 27 Jul 2016
Accepted: 06 Apr 2017
Published online: 18 Oct 2017