Retrospective Approximation for Smooth Stochastic Optimization

David Newton, Purdue University

Abstract

Stochastic Gradient Descent (SGD) is a widely-used iterative algorithm for solving stochastic optimization problems for a smooth (and possibly non-convex) objective function via queries from a first-order stochastic oracle. In this dissertation, we critically examine SGD’s choice of executing a single step as opposed to multiple steps between subsample updates. Our investigation leads naturally to generalizing SG into Retrospective Approximation (RA) where, during each iteration, a deterministic solver executes possibly multiple steps on a subsampled deterministic problem and stops when further solving is deemed unnecessary from the standpoint of statistical efficiency. RA thus leverages what is appealing for implementation – during each iteration, a solver, e.g., L-BFGS with backtracking line search is used, as is, and the subsampled objected function is solved only to the extent necessary. We develop a complete theory using relative errorof the observed gradients as the principal object, demonstrating that almost sure and L1 consistency of RA are preserved under especially weak conditions when sample sizes are increased at appropriate rates. We also characterize the iteration and oracle complexity (for linear and sub-linear solvers) of RA, and identify two practical termination criteria, one of which we show leads to optimal complexity rates. The message from extensive numerical experiments is that the ability of RA to incorporate existing second-order deterministic solvers in a strategic manner is useful both in terms of algorithmic trajectory as well as from the standpoint of dispensing with hyper-parameter tuning.

Degree

Ph.D.

Advisors

Pasupathy, Purdue University.

Subject Area

Artificial intelligence|Mathematics

Off-Campus Purdue Users:
To access this dissertation, please log in to our
proxy server
.

Share

COinS