Dynamic network pricing using sampling techniques for Markov games
We consider the problem of dynamic pricing in a bandwidth market, where users arrive into a network according to a known stochastic traffic model and request resources according to their class-specific demand functions. The aim of the bandwidth owner (vendor) is to maximize its revenue. We first consider the single-vendor case and provide three different pricing schemes for this setting. Using simulation results, we show that model-based schemes outperform an appropriately tuned model-free scheme. We then extend the problem formulation to a Markov game by considering two vendors in the market. We give heuristic pricing schemes for this problem that attempt to find a Nash equilibrium in a restricted policy space, and provide simulation results. We then present an opportunistic pricing scheme that exploits knowledge of the pricing scheme being used by its opponent, and empirically show that this scheme indeed benefits from such knowledge. The success of this opportunistic scheme emphasizes the need for sophisticated strategies for Markov games with large state spaces. Motivated by this need, we present a key approximation theorem for zero-sum, discounted Markov games, and use it to prove performance bounds, independent of state-space size, for two sampling techniques. Specifically, we extend the policy-rollout technique for Markov decision processes to Markov games, and establish that under certain conditions, the policy generated by this algorithm outperforms the base policy in the sup norm. We also provide an alternate proof of near-optimality of the sparse-sampling algorithm developed by Kearns et al. We provide simulation results to empirically evaluate the new policy-rollout technique.
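The policy-rollout extension described above can be illustrated with a small sketch. This is not the dissertation's implementation: the toy game, its dynamics, and all function names (`step`, `base_policy`, `rollout_value`, `rollout_action`) are invented for illustration. It shows the core idea: for each of the maximizing player's actions, estimate the worst-case (over opponent actions) discounted return of playing that action once and then following a fixed base policy, and pick the maximin action.

```python
import random

# Toy zero-sum Markov game: two states, two actions per player.
# Rewards accrue to player 1; player 2 receives the negative.
# These dynamics are illustrative only, not from the dissertation.
REWARD = {  # (state, a1, a2) -> reward to player 1
    (0, 0, 0): 1.0, (0, 0, 1): -1.0, (0, 1, 0): -1.0, (0, 1, 1): 1.0,
    (1, 0, 0): 0.5, (1, 0, 1): 0.0, (1, 1, 0): 0.0, (1, 1, 1): 0.5,
}

def step(state, a1, a2, rng):
    """Sample a reward and a (uniformly random) next state."""
    r = REWARD[(state, a1, a2)]
    return r, rng.choice([0, 1])

def base_policy(state, rng):
    """Base policy for both players: uniform random over actions."""
    return rng.choice([0, 1])

def rollout_value(state, gamma, horizon, rng):
    """Monte Carlo estimate of the value of the base policies from `state`."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        a1, a2 = base_policy(state, rng), base_policy(state, rng)
        r, state = step(state, a1, a2, rng)
        total += discount * r
        discount *= gamma
    return total

def rollout_action(state, gamma=0.9, horizon=20, samples=200, seed=0):
    """One-step maximin lookahead using rollout estimates of Q(s, a1, a2)."""
    rng = random.Random(seed)
    best_a, best_worst = None, float("-inf")
    for a1 in (0, 1):
        worst = float("inf")  # opponent picks a2 to minimize player 1's return
        for a2 in (0, 1):
            q = 0.0
            for _ in range(samples):
                r, s2 = step(state, a1, a2, rng)
                q += r + gamma * rollout_value(s2, gamma, horizon, rng)
            worst = min(worst, q / samples)
        if worst > best_worst:
            best_a, best_worst = a1, worst
    return best_a, best_worst
```

Because both inner estimates use sampling, the per-decision cost depends on the sample count and horizon rather than on the number of states, which is the property the state-space-size-independent bounds in the abstract formalize.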
Givan, Purdue University.