Cloud computing holds the exciting potential of elastically scaling computation to match time-varying demand, thus eliminating the need to provision for peak demand. However, the uncertainty of variable loads necessitates the use of "margins" - servers that must be held active to absorb unpredictable potential load surges - which can be a significant fraction of overall cost. Further, naively switching to an on-demand cloud model can actually increase "true costs" (server costs that would be incurred even if margin costs disappeared) because of the fundamental economic rule that on-demand goods and services cost more than "reserved" goods and services for which the user bears some commitment (i.e., on-demand customers must pay a premium in exchange for not undertaking the fixed-cost risk that committed customers undertake). This paper addresses the twin challenges of minimizing margin costs and true costs in an infrastructure-as-a-service (IaaS) cloud. Our paper makes the following two contributions. To address the problem of margin costs, we make two key observations based on real web server traces. First, rather than use a fixed margin, we observe that the margin may be load-dependent. For example, the margin required at low loads may be higher than the margin required at high loads. Second, we observe that the "tolerance" - the fraction of time when the response time target may be violated - need not be uniform across all load levels. For example, compared to a case where we satisfy requests within the target response time 95% of the time (for a tolerance of 5%) irrespective of load, one may achieve lower costs by satisfying the response time target 93% of the time at low loads and 97% of the time at high loads, while still achieving an overall 95% satisfaction ratio.
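The arithmetic behind the non-uniform tolerance example can be checked with a short sketch. It assumes, purely for illustration, that the system spends half of its time at low load and half at high load; the abstract does not state these fractions, and the function name is hypothetical.

```python
def overall_satisfaction(time_fractions, satisfaction_ratios):
    """Time-weighted average of per-load-level satisfaction ratios."""
    if abs(sum(time_fractions) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(f * s for f, s in zip(time_fractions, satisfaction_ratios))


# Assumed split (not from the paper): 50% of time at low load, 50% at high load.
# Per-level targets from the example: 93% at low load, 97% at high load.
overall = overall_satisfaction([0.5, 0.5], [0.93, 0.97])
print(f"overall satisfaction ratio: {overall:.2%}")
```

With a different split of time between load levels, the same per-level targets would give a different overall ratio, so the per-level tolerances have to be chosen against the observed load distribution.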
We propose ShrinkWrap-opt, a dynamic programming algorithm that exploits both of the above observations to achieve optimal margin cost while achieving the desired (statistical) response time guarantees. To address true costs, we propose commitment straddling - the mixed use of reserved and on-demand machines - to achieve optimal true cost. Simulations with real web server load traces (including 3 months of traces from Wikimedia from Summer 2010) using the Amazon EC2 cost model reveal that our techniques save between 13% and 29% (21% on average) in cost while satisfying response-time targets.
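The idea of commitment straddling can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a brute-force search under a simplified cost model, and the demand trace, the rates, and the function names are all hypothetical. A reserved machine is assumed to cost a flat effective rate for every hour whether used or not, and an on-demand machine is paid for only in the hours it is needed.

```python
def straddling_cost(demand, reserved, reserved_rate, on_demand_rate):
    """Total cost of serving `demand` (machines needed per hour) with
    `reserved` committed machines plus on-demand machines for the overflow."""
    # Reserved machines are paid for in every hour, used or not.
    reserved_cost = reserved * reserved_rate * len(demand)
    # On-demand machines cover only the hours where demand exceeds the reservation.
    on_demand_cost = on_demand_rate * sum(max(0, d - reserved) for d in demand)
    return reserved_cost + on_demand_cost


def best_reserved_count(demand, reserved_rate, on_demand_rate):
    """Try every reservation level from 0 up to peak demand; keep the cheapest."""
    return min(
        range(max(demand) + 1),
        key=lambda r: straddling_cost(demand, r, reserved_rate, on_demand_rate),
    )


# Hypothetical hourly demand and per-machine-hour rates (not EC2 prices).
demand = [2, 2, 3, 5, 8, 5, 3, 2]
best = best_reserved_count(demand, reserved_rate=0.6, on_demand_rate=1.0)
cost = straddling_cost(demand, best, reserved_rate=0.6, on_demand_rate=1.0)
print(best, round(cost, 2))
```

With these made-up numbers, reserving 3 machines and covering the rest on demand costs 23.4, compared with 30 for an all-on-demand deployment and 38.4 for reserving for the peak of 8. The mixed strategy wins because a reserved machine pays off only if it is busy in enough hours to offset its lower but unconditional rate.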
Date of this Version