Authors: Álvaro García-Recuero; Sérgio Esteves; Luís Veiga
Addresses: INRIA, Rennes-Bretagne Atlantique Research Center, Campus de Beaulieu, 35042 Rennes Cedex, France; INESC-ID Lisboa-Distributed Systems Group, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, 1000-029 Lisbon, Portugal; INESC-ID Lisboa-Distributed Systems Group, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, 1000-029 Lisbon, Portugal
Abstract: With the advent of Cloud Computing, Big Data management has become a fundamental challenge in the deployment and operation of distributed, highly available and fault-tolerant storage systems such as the HBase extensible record-store. These systems can support geo-replication, which raises the issue of data consistency among distributed sites. To offer a best-in-class service to applications, one wants to maximise performance while minimising latency. For data replication, that means incurring as little latency as possible when moving data between distant data centres. Traditional consistency models pose a significant problem for systems architects, especially where large amounts of data need to be replicated across wide-area networks. In such scenarios it might be suitable to use eventual consistency: although not always convenient, it trades consistency guarantees for partly reduced latency, so that data transfers do not impact performance. In contrast, this work proposes a broader range of data semantics for consistency, prioritising critical data at the cost of a minimal latency overhead on the remaining non-critical updates. Finally, we show how these semantics can help in finding an optimal data replication strategy that achieves just the required level of data consistency with low latency and more efficient network bandwidth utilisation.
Keywords: cloud storage; data consistency; replication; geo-replication; data storage; NoSQL; quality-of-service; QoS; big data management; data semantics; latency; network bandwidth.
International Journal of Big Data Intelligence, 2014 Vol.1 No.1/2, pp.74 - 88
Available online: 23 Jul 2014