Title: Efficient join algorithms for distributed information integration based on XML

Authors: Hongzhi Wang, Jianzhong Li, Shuguang Xiong

Addresses: Department of Computer Science and Technology, Harbin Institute of Technology, P.O. Box 750, Harbin, China; Department of Computer Science and Technology, Harbin Institute of Technology, P.O. Box 750, Harbin, China; Department of Computer Science and Technology, Harbin Institute of Technology, P.O. Box 750, Harbin, China

Abstract: Because of its flexibility, XML is well suited to data representation in information integration systems. Querying XML data in a distributed information integration system brings new challenges. In this paper, we focus on join algorithms in the result-merging step of query processing in XML-based distributed information integration systems. To transmit partial results efficiently, we present data compacting strategies. We present four join operators with different semantics for result merging in XML-based information integration systems. Based on the compacted data, we design efficient evaluation algorithms for these join operators. To process joins on data from multiple data sources, we extend our two-way join algorithms to a multi-join algorithm. Extensive experimental results show that our data compacting strategy is effective; that our join algorithms outperform XJoin significantly and have good scalability; and that our multi-join algorithm outperforms the strategy of performing ..., is efficient, and has good scalability.

Keywords: XML; information integration; data compacting strategy; join algorithms; data representation; query processing; semantics.

DOI: 10.1504/IJBPIM.2008.024984

International Journal of Business Process Integration and Management, 2008 Vol.3 No.4, pp.271 - 281

Published online: 06 May 2009
