Optimal Subsampling for Massive Penalized Spline Single Index Models
Abstract
The semiparametric single index model is a well-known compromise between parametric and nonparametric regression models: the mean of the response depends on a linear combination of the covariates through an unknown univariate function. It has been widely studied because of its simplicity and flexibility, yet applying it remains challenging, especially for large datasets. This thesis focuses on a subsampling approach to fitting semiparametric single index models on large datasets, a task that can be computationally difficult because of long computation times and high memory requirements. With subsampling, the estimate computed on a subsample, called the subsampling estimator, is used to approximate the estimate computed on the full sample, called the full sample estimator. To obtain an optimal sampling probability for subsampling, i.e., the optimal subsampling method, we first study the asymptotic properties of the subsampling estimator in a general semiparametric single index model under a general subsampling method. We then derive the formula for the optimal sampling probability by minimizing the asymptotic mean squared error (MSE) of the subsampling estimator. We consider specific models in simulation studies and real data applications to investigate the numerical performance of the optimal subsampling method.
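The relationship between a subsampling estimator and a full sample estimator can be illustrated with a minimal sketch. It is not the method of the thesis: it assumes an ordinary linear model in place of the penalized spline single index model, and uniform sampling probabilities in place of the optimal ones, whose formula is not given in this abstract. The function name subsample_weighted_ls, the simulated data, and the subsample size r are all hypothetical choices made for illustration.

```python
import numpy as np

def subsample_weighted_ls(X, y, probs, r, rng):
    """Draw r rows with replacement according to probs, then fit
    inverse-probability-weighted least squares on the subsample.
    (Hypothetical helper; a linear-model stand-in, not the thesis's estimator.)"""
    n = X.shape[0]
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Weight each sampled row by 1 / (n * pi_i) so the weighted
    # subsample objective is unbiased for the full sample objective.
    w = 1.0 / (n * probs[idx])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X[idx] * sw[:, None], y[idx] * sw, rcond=None)
    return beta

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n, p = 100_000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.25])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Full sample estimator versus a subsampling estimator with uniform probabilities.
uniform = np.full(n, 1.0 / n)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sub = subsample_weighted_ls(X, y, uniform, r=2_000, rng=rng)

print("full sample estimate:", beta_full)
print("subsample estimate:  ", beta_sub)
```

Passing a non-uniform probs vector to the same function changes only how rows are drawn and weighted, which is where an optimal sampling probability would enter in a sketch of this kind.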
Degree
Ph.D.
Advisors
Li, Purdue University.
Subject Area
Statistics