Int. J. of Big Data Intelligence   »   2014 Vol.1, No.1/2

 

 

Title: Towards quality-of-service driven consistency for Big Data management

 

Authors: Álvaro García-Recuero; Sérgio Esteves; Luís Veiga

 

Addresses:
INRIA, Rennes-Bretagne Atlantique Research Center, Campus de Beaulieu, 35042 Rennes Cedex, France
INESC-ID Lisboa-Distributed Systems Group, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, 1000-029 Lisbon, Portugal
INESC-ID Lisboa-Distributed Systems Group, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, 1000-029 Lisbon, Portugal

 

Abstract: With the advent of Cloud Computing, Big Data management has become a fundamental challenge in the deployment and operation of distributed, highly available and fault-tolerant storage systems such as the HBase extensible record-store. These systems can provide support for geo-replication, which raises the issue of data consistency among distributed sites. In order to offer a best-in-class service to applications, one wants to maximise performance while minimising latency. In terms of data replication, that means incurring as little latency as possible when moving data between distant data centres. Traditional consistency models pose a significant problem for system architects, especially where large amounts of data need to be replicated across wide-area networks. In such scenarios eventual consistency may be suitable: although it is not always convenient, it trades consistency guarantees for reduced latency so that data transfers do not impact performance. In contrast, this work proposes a broader range of data semantics for consistency, prioritising critical data at the cost of a minimal latency overhead on the remaining, non-critical updates. Finally, we show how these semantics can help in finding an optimal data replication strategy that achieves just the required level of data consistency with low latency and more efficient use of network bandwidth.

 

Keywords: cloud storage; data consistency; replication; geo-replication; data storage; NoSQL; quality-of-service; QoS; big data management; data semantics; latency; network bandwidth.

 

DOI: 10.1504/IJBDI.2014.063853

 

Int. J. of Big Data Intelligence, 2014 Vol.1, No.1/2, pp.74 - 88

 

Submission date: 27 Dec 2013
Date of acceptance: 20 Mar 2014
Available online: 23 Jul 2014

 

 
