Int. J. of Big Data Intelligence   »   2017 Vol.4, No.4

 

 

Title: Data partition optimisation for column-family NoSQL databases

 

Authors: Meng-Ju Hsieh; Li-Yung Ho; Jan-Jan Wu; Pangfeng Liu

 

Addresses:
Institute of Information Science, Academia Sinica, 128, Section 2, Academia Road, Nankang, Taipei 11529, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
Research Center for Information Technology Innovation, Academia Sinica, 128, Section 2, Academia Road, Nankang, Taipei 11529, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
Institute of Information Science, Academia Sinica, 128, Section 2, Academia Road, Nankang, Taipei 11529, Taiwan; Research Center for Information Technology Innovation, Academia Sinica, 128, Section 2, Academia Road, Nankang, Taipei 11529, Taiwan
Department of Computer Science and Information Engineering, National Taiwan University, 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan; Graduate Institute of Networking and Multimedia, National Taiwan University, 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan

 

Abstract: Data conversion has become an emerging topic in the big data era. To cope with rapid data growth, legacy and existing relational databases need to be converted into NoSQL column-family databases to achieve better scalability. Converting from SQL to NoSQL databases requires combining small, normalised SQL data tables into larger NoSQL data tables, a process called denormalisation. A challenging issue in data conversion is how to group the denormalised columns of a large data table into 'families' so that query processing performs well. In this paper, we propose an efficient heuristic algorithm, the graph-based partition algorithm (GPA), to address this problem. We use the TPC-C and TPC-H benchmarks to demonstrate that the column families produced by GPA are very efficient for large-scale data processing.

 

Keywords: vertical partition; column partition; column family; NoSQL database.

 

DOI: 10.1504/IJBDI.2017.10006848

 

Int. J. of Big Data Intelligence, 2017 Vol.4, No.4, pp.263 - 275

 

Submission date: 01 Mar 2016
Date of acceptance: 03 Oct 2016
Available online: 04 Aug 2017

 

 
