Title: Cassandra: what it does and what it does not and benchmarking

Authors: Melyssa Barata; Jorge Bernardino; Pedro Furtado

Addresses: University of Coimbra, Portugal, Rua Pedro Nunes, Quinta da Nora, 3030-190 Coimbra, Portugal | ISEC – Coimbra Institute of Engineering, Portugal, Rua Pedro Nunes, Quinta da Nora, 3030-190 Coimbra, Portugal; CISUC, Polytechnic Institute of Coimbra, Pinhal de Marrocos, 3030-290 Coimbra, Portugal | University of Coimbra, Portugal, Pinhal de Marrocos, 3030-290 Coimbra, Portugal

Abstract: Cassandra is an open-source distributed data store designed for storing and managing very large amounts of data. It can serve both as a read-intensive database for large-scale business intelligence systems and as a real-time operational data store for online transactional applications. In this article, we describe three of the most relevant benchmarks developed to assess NoSQL and big data capabilities and complex analytic functionality: YCSB, BigBench, and TPC-DS. We also review the Cassandra database, explaining its characteristics and operational principles. The main aim of the paper is to explain these three benchmarks and Cassandra as background for the results we obtained with this database.

Keywords: benchmarking; Not Only SQL; NoSQL; database; Cassandra; Yahoo! Cloud Serving Benchmark; YCSB; TPC-DS; BigBench; big data; business intelligence.

DOI: 10.1504/IJBPIM.2015.073658

International Journal of Business Process Integration and Management, 2015, Vol. 7, No. 4, pp. 364-371

Published online: 15 Dec 2015
