Statistical Design of Sequential Decision Making Algorithms
Abstract
Sequential decision making is a fundamental class of problems that motivates algorithm design in online machine learning and reinforcement learning. The resulting online algorithms arguably underpin the data-driven, real-time, automated decision making of modern online service industries. Applications span several industries, including dynamic pricing (marketing), recommendation (advertising), and dosage finding (clinical trials). In this dissertation, we contribute fundamental statistical design advances for sequential decision-making algorithms, advancing the theory and application of online learning and sequential decision making under uncertainty, including online sparse learning, finite-armed bandits, and high-dimensional online decision making. Our work lies at the intersection of decision-making algorithm design, online statistical machine learning, and operations research, contributing new algorithms, theory, and insights to diverse fields including optimization, statistics, and machine learning.
In Part I, we contribute a theoretical framework of continuous risk monitoring for regularized online statistical learning. Such a framework is desirable for modern online service industries that need to monitor the performance of deployed models on online machine learning tasks. In the first project (Chapter 1), we develop continuous risk monitoring for the online Lasso procedure and provide an always-valid algorithm for high-dimensional dynamic pricing problems. In the second project (Chapter 2), we develop continuous risk monitoring for online matrix regression and provide new algorithms for rank-constrained online matrix completion problems. These theoretical advances stem from an interplay we establish between non-asymptotic martingale concentration theory and regularized online statistical machine learning.
In Part II, we contribute a bootstrap-based methodology for finite-armed bandit problems, termed residual bootstrap exploration. This method opens the possibility of designing model-agnostic bandit algorithms without problem-adaptive optimism engineering or instance-specific prior tuning. In the first project (Chapter 3), we develop residual bootstrap exploration for multi-armed bandit algorithms and show that it generalizes easily to bandit problems with complex or ambiguous reward structures. In the second project (Chapter 4), we develop a theoretical framework for residual bootstrap exploration in linear bandits with a fixed action set. These methodological advances stem from our development of non-asymptotic theory for the bootstrap procedure.
In Part III, we contribute application-driven insights on the exploration-exploitation dilemma for high-dimensional online decision-making problems. These insights help practitioners implement effective high-dimensional statistical methods for online decision-making problems. In the first project (Chapter 5), we develop a bandit sampling scheme for online batch high-dimensional decision making, a practical scenario in interactive marketing and sequential clinical trials. In the second project (Chapter 6), we develop a bandit sampling scheme for federated online high-dimensional decision making that maintains data decentralization while supporting collaborative decisions. These insights stem from a new bandit sampling design that effectively addresses application-driven exploration-exploitation trade-offs.
Degree
Ph.D.
Advisors
Song, Purdue University.
Subject Area
Computer science