Title: Towards quality-of-service driven consistency for Big Data management

Authors: Álvaro García-Recuero; Sérgio Esteves; Luís Veiga

Addresses: INRIA, Rennes-Bretagne Atlantique Research Center, Campus de Beaulieu, 35042 Rennes Cedex, France; INESC-ID Lisboa-Distributed Systems Group, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, 1000-029 Lisbon, Portugal; INESC-ID Lisboa-Distributed Systems Group, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, 1000-029 Lisbon, Portugal

Abstract: With the advent of Cloud Computing, Big Data management has become a fundamental challenge in the deployment and operation of distributed, highly available and fault-tolerant storage systems such as the HBase extensible record-store. These systems can support geo-replication, which raises the issue of data consistency among distributed sites. To offer a best-in-class service to applications, one wants to maximise performance while minimising latency. For data replication, that means incurring as little latency as possible when moving data between distant data centres. Traditional consistency models pose a significant problem for systems architects, especially when large amounts of data need to be replicated across wide-area networks. In such scenarios eventual consistency may be suitable: although not always convenient, it trades consistency guarantees for partly reduced latency, so that data transfers do not impact performance. In contrast, this work proposes a broader range of data semantics for consistency, prioritising critical data at the cost of a minimal latency overhead on the remaining, non-critical updates. Finally, we show how these semantics can help in finding an optimal data replication strategy that achieves just the required level of data consistency under low latency and with more efficient network bandwidth utilisation.

Keywords: cloud storage; data consistency; replication; geo-replication; data storage; NoSQL; quality-of-service; QoS; big data management; data semantics; latency; network bandwidth.

DOI: 10.1504/IJBDI.2014.063853

International Journal of Big Data Intelligence, 2014 Vol.1 No.1/2, pp.74 - 88

Received: 28 Dec 2013
Accepted: 20 Mar 2014

Published online: 30 Sep 2014
