Taming tail latency for erasure-coded, distributed storage systems

Jingxian Fan, Purdue University

Abstract

Long tails in response time are a particular concern in distributed storage systems. Large-scale services such as Bing, Facebook, and Amazon Web Services report 99.9th percentile response times that are orders of magnitude worse than the mean. Because it maintains high data reliability while remaining space efficient, erasure coding has become a popular storage method in distributed storage systems. However, due to the lack of mathematical models for analyzing erasure-coded distributed storage systems, taming tail latency is still an open problem. In this research, we quantify tail latency in such systems by deriving a closed-form upper bound on tail latency for general service time distributions and heterogeneous files. We then specialize the service time to a shifted exponential distribution. Based on this model, we formulate an optimization problem that minimizes the weighted tail latency probability over all files, and we propose an alternating minimization algorithm to solve it. Our simulation results show a significant reduction in the tail latency of erasure-coded distributed storage systems under a realistic workload.

Degree

M.S.I.E.

Advisors

Aggarwal, Purdue University.

Subject Area

Industrial engineering

