On-line sampling-based control for network queueing problems
This thesis proposes novel on-line sampling algorithms for control in (possibly partially observable) Markov decision processes (MDPs). We employ a receding horizon control framework: we fix a sampling horizon, compute an approximately optimal current action for that horizon, and take that action at each decision time. We first discuss two notable previous efforts in this direction: the sampled look-ahead tree of Kearns et al. and the rollout algorithm of Bertsekas and Castanon. We then propose two sampling-based control techniques, called “parallel rollout” and “hindsight optimization”. Parallel rollout generalizes the Bertsekas and Castanon rollout algorithm, and hindsight optimization is motivated by Ginsberg's Monte Carlo card-play algorithm for computer bridge. In parallel rollout, we start with a small set of simple heuristic base policies that we wish to combine on-line into a single controller. The approach yields a policy that is provably no worse at each state than the best of the base policies at that state. In hindsight optimization, the utility of taking an action is upper-bounded by the average, over many sampled traces, of the (possibly discounted) reward sum obtained by taking the action and then following the trace-relative optimal plan for the remaining horizon. The action with the highest utility upper bound is taken at each decision time. The utility estimate from hindsight optimization is an upper bound on the true utility, whereas the estimate from parallel rollout is a lower bound. As a “proof of concept” of parallel rollout and hindsight optimization, we formulate two resource allocation problems arising in telecommunication networks as partially observable MDPs: a buffer management problem and a multiclass packet scheduling problem with deadlines.
The key feature of these two approaches is that a given or learned stochastic model of network traffic can be incorporated effectively and tractably into on-line network control decisions. We compare our proposed approaches with well-known non-sampling control policies and previously published sampling-based techniques, and show, using empirical results based on simulated traffic, that our approaches improve on several known alternatives.
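The two estimates described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy two-queue scheduling model (`step`, `WEIGHTS`, `ARRIVAL_P`, `CAP`), the three base policies, the undiscounted reward sum, and the brute-force search used to find the trace-relative optimal plan. The rollout-style estimate is also simplified: it takes the best base policy's sampled mean return after the first action, which may differ from the exact estimator defined in the thesis. The same sampled traces are shared by both estimates.

```python
import itertools
import random

ACTIONS = (0, 1)        # which of two queues to serve (toy assumption)
WEIGHTS = (1.0, 2.0)    # reward for serving one packet from each queue
ARRIVAL_P = (0.6, 0.3)  # per-step arrival probability for each queue
CAP = 3                 # buffer capacity per queue
HORIZON = 4             # sampling horizon


def sample_trace(rng, horizon=HORIZON):
    """One trace: the arrivals to both queues at every step of the horizon."""
    return [tuple(int(rng.random() < p) for p in ARRIVAL_P) for _ in range(horizon)]


def step(state, action, arrivals):
    """Admit arrivals (dropping overflow), then serve the chosen queue."""
    queues = [min(CAP, q + a) for q, a in zip(state, arrivals)]
    reward = 0.0
    if queues[action] > 0:
        queues[action] -= 1
        reward = WEIGHTS[action]
    return tuple(queues), reward


def policy_return(state, first_action, policy, trace):
    """Reward sum of taking first_action, then following policy along the trace."""
    total, action = 0.0, first_action
    for arrivals in trace:
        state, reward = step(state, action, arrivals)
        total += reward
        action = policy(state)
    return total


def hindsight_return(state, first_action, trace):
    """Best reward sum over all action sequences starting with first_action."""
    best = float("-inf")
    for rest in itertools.product(ACTIONS, repeat=len(trace) - 1):
        s, total = state, 0.0
        for action, arrivals in zip((first_action,) + rest, trace):
            s, reward = step(s, action, arrivals)
            total += reward
        best = max(best, total)
    return best


# Hypothetical base policies: two static priorities and longest-queue-first.
BASE_POLICIES = (
    lambda s: 0,
    lambda s: 1,
    lambda s: 0 if s[0] >= s[1] else 1,
)


def estimates(state, traces):
    """Return (rollout-style, hindsight) utility estimates for each action."""
    rollout, hindsight = {}, {}
    n = len(traces)
    for a in ACTIONS:
        # Simplified parallel-rollout-style estimate: best base policy's mean return.
        rollout[a] = max(
            sum(policy_return(state, a, pi, t) for t in traces) / n
            for pi in BASE_POLICIES
        )
        # Hindsight estimate: mean of the per-trace optimal return.
        hindsight[a] = sum(hindsight_return(state, a, t) for t in traces) / n
    return rollout, hindsight


rng = random.Random(0)
traces = [sample_trace(rng) for _ in range(200)]
rollout, hindsight = estimates((1, 1), traces)
rollout_action = max(ACTIONS, key=rollout.get)
hindsight_action = max(ACTIONS, key=hindsight.get)
```

On every trace, the best action sequence in hindsight earns at least as much as any base policy does, so with shared traces the hindsight estimate for an action is never below the rollout-style estimate for that action. This mirrors the upper-bound and lower-bound relationship stated in the abstract. In a receding horizon controller, the selected action would be applied, the new state observed, and the estimates recomputed from fresh traces at the next decision time.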
Givan, Purdue University.
Electrical engineering|Computer science