Title: Data partition optimisation for column-family NoSQL databases

Authors: Meng-Ju Hsieh; Li-Yung Ho; Jan-Jan Wu; Pangfeng Liu

Addresses (one line per author, in author order):
Institute of Information Science, Academia Sinica, 11529 128, Section 2, Academia Road, Nankang, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, 10617 1, Section 4, Roosevelt Road, Taipei, Taiwan
Research Center for Information Technology Innovation, Academia Sinica, 11529 128, Section 2, Academia Road, Nankang, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, 10617 1, Section 4, Roosevelt Road, Taipei, Taiwan
Institute of Information Science, Academia Sinica, 11529 128, Section 2, Academia Road, Nankang, Taipei, Taiwan; Research Center for Information Technology Innovation, Academia Sinica, 11529 128, Section 2, Academia Road, Nankang, Taipei, Taiwan
Department of Computer Science and Information Engineering, National Taiwan University, 10617 1, Section 4, Roosevelt Road, Taipei, Taiwan; Graduate Institute of Networking and Multimedia, National Taiwan University, 10617 1, Section 4, Roosevelt Road, Taipei, Taiwan

Abstract: Data conversion has become an emerging topic in the big data era. To cope with rapid data growth, legacy or existing relational databases need to be converted into NoSQL column-family databases to achieve better scalability. The conversion from SQL to NoSQL databases requires combining small, normalised SQL data tables into larger NoSQL data tables, a process called denormalisation. A challenging issue in data conversion is how to group the denormalised columns of a large data table into 'families' so as to ensure good query-processing performance. In this paper, we propose an efficient heuristic algorithm, the graph-based partition algorithm (GPA), to address this problem. We use the TPC-C and TPC-H benchmarks to demonstrate that the column families produced by GPA are very efficient for large-scale data processing.
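
To make the column-grouping problem concrete, the following is a minimal illustrative sketch, not the GPA algorithm itself, whose details are not given in this abstract. It assumes a simple heuristic: columns that are read together by at least `threshold` queries are placed in the same family. The function names, the threshold, and the sample columns and workload are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_access_graph(queries):
    """Weight each column pair by the number of queries that read both."""
    weights = defaultdict(int)
    for cols in queries:
        for a, b in combinations(sorted(set(cols)), 2):
            weights[(a, b)] += 1
    return weights


def group_columns(columns, queries, threshold=2):
    """Merge columns whose co-access weight reaches the threshold.

    Illustrative heuristic only; the threshold is an assumed parameter.
    """
    parent = {c: c for c in columns}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), w in co_access_graph(queries).items():
        # Ignore columns that are not part of the table being partitioned.
        if w >= threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    families = defaultdict(list)
    for c in columns:
        families[find(c)].append(c)
    return sorted(sorted(f) for f in families.values())


# Hypothetical denormalised table and query workload.
columns = ["c_id", "c_name", "o_id", "o_date", "o_total"]
queries = [
    ["c_id", "c_name"],
    ["c_id", "c_name", "o_id"],
    ["o_id", "o_date", "o_total"],
    ["o_id", "o_date"],
    ["o_date", "o_total"],
]
print(group_columns(columns, queries))
# → [['c_id', 'c_name'], ['o_date', 'o_id', 'o_total']]
```

In this toy workload the customer columns and the order columns end up in separate families, because no customer/order pair is read together by two or more queries.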

Keywords: vertical partition; column partition; column family; NoSQL database.

DOI: 10.1504/IJBDI.2017.086962

International Journal of Big Data Intelligence, 2017 Vol.4 No.4, pp.263 - 275

Received: 01 Mar 2016
Accepted: 03 Oct 2016

Published online: 03 Oct 2017