Adaptive Sampling Methods for Stochastic Optimization

Daniel Andres Vasquez Carvajal, Purdue University

Abstract

This dissertation investigates the use of sampling methods for solving stochastic optimization problems using iterative algorithms. Two sampling paradigms are considered: (i) adaptive sampling, where, before each iterate update, the sample size for estimating the objective function and the gradient is adaptively chosen; and (ii) retrospective approximation (RA), where iterate updates are performed using a chosen fixed sample size for as long as progress is deemed statistically significant, at which time the sample size is increased. We investigate adaptive sampling within the context of a trust-region framework for solving stochastic optimization problems in R^d, and retrospective approximation within the broader context of solving stochastic optimization problems on a Hilbert space.

In the first part of the dissertation, we propose Adaptive Sampling Trust-Region Optimization (ASTRO), a class of derivative-based stochastic trust-region (TR) algorithms developed to solve smooth stochastic unconstrained optimization problems in R^d where the objective function and its gradient are observable only through a noisy oracle or using a large dataset. Efficiency in ASTRO stems from two key aspects: (i) adaptive sampling, which ensures that the objective function and its gradient are sampled only to the extent needed, so that small sample sizes are chosen when the iterates are far from a critical point and large sample sizes are chosen when the iterates are near a critical point; and (ii) quasi-Newton Hessian updates using BFGS. We prove three main results for ASTRO and for general stochastic trust-region methods that estimate function and gradient values adaptively, using sample sizes that are stopping times with respect to the sigma-algebra of the generated observations. The first asserts strong consistency when the adaptive sample sizes have a mild logarithmic lower bound, assuming that the oracle errors are light-tailed. The second and third results characterize the iteration and oracle complexities in terms of certain risk functions. Specifically, the second result asserts that the best achievable O(ε^{-1}) iteration complexity (in the squared gradient norm) is attained when the total relative risk associated with the adaptive sample-size sequence is finite; and the third result characterizes the corresponding oracle complexity in terms of the total generalized risk associated with the adaptive sample-size sequence. We report encouraging numerical results in certain settings.

In the second part of this dissertation, we consider the use of RA as an alternate adaptive sampling paradigm to solve smooth stochastic constrained optimization problems in infinite-dimensional Hilbert spaces. RA generates a sequence of subsampled deterministic infinite-dimensional problems that are approximately solved within a dynamic error tolerance, and the bottleneck in RA becomes solving this sequence of problems efficiently. To this end, we propose a progressive subspace expansion (PSE) framework to solve smooth deterministic optimization problems in infinite-dimensional Hilbert spaces with a TR sequential quadratic programming (SQP) solver. The infinite-dimensional optimization problem is discretized, and a sequence of finite-dimensional problems is solved in which the problem dimension is progressively increased. Additionally, (i) we solve this sequence of finite-dimensional problems only to the extent necessary, i.e., we spend just enough computational work to solve each problem within a dynamic error tolerance, and (ii) we use the solution of the current optimization problem as the initial guess for the subsequent problem.
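To make the adaptive-sampling idea of the first part concrete, the following is a minimal Python sketch of one possible adaptive-sampling TR loop. The names (astro_sketch, oracle_f, oracle_g) and all constants (kappa, eta, the logarithmic sample-size floor, the radius update factors) are illustrative assumptions, and the Cauchy-step subproblem solver stands in for a full TR subproblem solve; this is not the dissertation's exact algorithm.

```python
import numpy as np

def astro_sketch(oracle_f, oracle_g, x0, iters=50, delta0=1.0,
                 eta=0.1, kappa=1.0, n0=4):
    # x: incumbent iterate; delta: trust-region radius; H: BFGS Hessian model
    x = np.asarray(x0, dtype=float).copy()
    delta, H = delta0, np.eye(x.size)
    for k in range(iters):
        # Adaptive sampling: start from a mild logarithmic floor and keep
        # drawing gradient observations until the estimated standard error
        # of the averaged gradient is dominated by the squared TR radius
        # (an illustrative stopping rule; the sample size is a stopping time).
        n = max(n0, int(np.ceil(np.log(k + 2))))
        G = np.array([oracle_g(x) for _ in range(n)])
        while np.sqrt(G.var(axis=0, ddof=1).sum() / len(G)) > kappa * delta**2:
            G = np.vstack([G, oracle_g(x)])
        g, n = G.mean(axis=0), len(G)
        if np.linalg.norm(g) < 1e-10:
            break
        # Cauchy step for the quadratic model m(s) = g's + 0.5 s'Hs.
        gHg = g @ H @ g
        t = delta / np.linalg.norm(g)
        if gHg > 0:
            t = min(t, (g @ g) / gHg)
        s = -t * g
        # Ratio test using function estimates averaged at the same sample size.
        f_x = np.mean([oracle_f(x) for _ in range(n)])
        f_xs = np.mean([oracle_f(x + s) for _ in range(n)])
        pred = -(g @ s + 0.5 * s @ H @ s)
        rho = (f_x - f_xs) / pred if pred > 0 else -1.0
        if rho >= eta:
            # Successful step: BFGS-update H (with a curvature safeguard),
            # accept the step, and expand the trust region.
            y = np.mean([oracle_g(x + s) for _ in range(n)], axis=0) - g
            if y @ s > 1e-12:
                Hs = H @ s
                H += np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)
            x, delta = x + s, 2.0 * delta
        else:
            delta *= 0.5  # unsuccessful step: shrink the region and resample

    return x

# Toy usage: minimize E[0.5||x - xi||^2] with xi ~ N(0, I); the minimizer is 0.
rng = np.random.default_rng(1)
xhat = astro_sketch(lambda x: 0.5 * np.sum((x - rng.standard_normal(x.size))**2),
                    lambda x: x - rng.standard_normal(x.size),
                    3.0 * np.ones(5))
```

Note how the sampling rule couples the noise in the gradient estimate to the trust-region radius: a large radius (iterates far from a critical point) tolerates a crude estimate, while a shrinking radius forces larger samples.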
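The RA paradigm of the second part can likewise be sketched in a few lines. Here the sample-path problem, the plain gradient-descent inner solver, and the geometric sample-size/tolerance schedule are all illustrative stand-ins (the dissertation works with constrained problems on Hilbert spaces and a TR-SQP solver); the sketch only shows the outer structure: fix a sample, solve the resulting deterministic problem to a dynamic tolerance, then grow the sample, tighten the tolerance, and warm-start.

```python
import numpy as np

def retrospective_approximation(sample_grad, x0, m0=8, eps0=0.5,
                                growth=2.0, decay=0.5, rounds=8, step=0.1):
    rng = np.random.default_rng(0)
    x, m, eps = np.asarray(x0, dtype=float).copy(), m0, eps0
    for _ in range(rounds):
        # Freeze one sample path of size m: this defines a deterministic
        # sample-average problem for this round.
        xi = rng.standard_normal((m,) + x.shape)
        grad = lambda z, xi=xi: np.mean([sample_grad(z, e) for e in xi], axis=0)
        # Solve the fixed-sample problem only to tolerance eps (plain
        # gradient descent stands in for the dissertation's solver).
        while np.linalg.norm(g := grad(x)) > eps:
            x = x - step * g
        # Increase the sample size and tighten the tolerance; the next
        # round is warm-started at the current solution.
        m, eps = int(np.ceil(growth * m)), decay * eps
    return x

# Toy usage: the gradient of 0.5*E||x - xi||^2 is x - xi; the minimizer is 0.
sol = retrospective_approximation(lambda x, e: x - e, np.ones(3))
```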
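Finally, a schematic of the PSE idea, under the assumption of a one-dimensional grid discretization with linear-interpolation prolongation and a gradient-descent inner solver (pse_sketch, solve_discrete, prolong, and make_grad are hypothetical names; the dissertation's inner solver is TR-SQP). Each discretization level is solved only to a loose tolerance, and its solution initializes the next, finer level.

```python
import numpy as np

def solve_discrete(grad_n, x, tol, step=0.5):
    # Inner solver placeholder: gradient descent to a loose tolerance.
    while np.linalg.norm(g := grad_n(x)) > tol:
        x = x - step * g
    return x

def prolong(x, n_new):
    # Prolongation by linear interpolation onto the finer grid.
    t_old = np.linspace(0.0, 1.0, x.size)
    return np.interp(np.linspace(0.0, 1.0, n_new), t_old, x)

def pse_sketch(make_grad, n0=8, levels=4, eps0=1e-1, decay=0.1):
    n, eps = n0, eps0
    x = np.zeros(n)  # coarse initial guess
    for level in range(levels):
        # Spend just enough computational work at this discretization level.
        x = solve_discrete(make_grad(n), x, eps)
        if level < levels - 1:
            # Expand the subspace and warm-start the finer problem.
            n *= 2
            x = prolong(x, n)
            eps *= decay
    return x

# Toy usage: minimize the discretized 0.5*||u - sin(2*pi*t)||^2 on an
# n-point grid, whose gradient is u - sin(2*pi*t).
make_grad = lambda n: (lambda u: u - np.sin(2 * np.pi * np.linspace(0, 1, n)))
u_star = pse_sketch(make_grad)
```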

Degree

Ph.D.

Advisors

Pasupathy, Purdue University.

Subject Area

Artificial intelligence
