Modern web applications face stringent requirements along many dimensions including latency, scalability, and availability. In response, several geo-distributed cloud storage systems have emerged in recent years. Customizing cloud data stores to meet application SLA requirements is a challenge given the scale of applications, and their diverse and dynamic workloads. In this paper, we tackle these challenges in the context of quorum-based systems (e.g. Amazon Dynamo, Cassandra), an important and widely used class of distributed cloud storage systems. We present models that seek to optimize percentiles of response time under normal operation and under a data-center (DC) failure. Our models consider a variety of factors such as the geographic spread of users, DC locations, relative priorities of read and write requests, application consistency requirements and inter-DC communication costs. We evaluate our models using realworld traces of three popular applications: Twitter,Wikipedia and Gowalla, and through experiments with a Cassandra cluster. Our results confirm the importance and effectiveness of our models, and offer important insights on the performance achievable with geo-distributed data stores.

Date of this Version