High Dimensional Inference for Semiparametric Models
In the literature, high dimensional inference refers to statistical inference when the number of unknown parameters is much greater than the sample size. Semiparametric models include both parametric and nonparametric components; examples are partially linear models and partially additive models. Because the parameter of interest is high dimensional and a nuisance function is present, estimation and inference for the parametric component, for instance the construction of confidence intervals and hypothesis tests, are very challenging in high dimensional semiparametric settings. This thesis presents two sets of estimation and inference results under high dimensional semiparametric setups. The first concerns minimax optimal estimation in high dimensional semiparametric models. Our particular focus is on partially linear additive models with high dimensional sparse vectors and smooth nonparametric functions. The minimax lower bound for the parametric component depends only on the dimensionality and sparsity, while the minimax lower bound for each nonparametric component reflects an interplay among dimensionality, sparsity and smoothness. Indeed, the minimax risk for parametric estimation is not affected by the roughness of the nonparametric functions. However, the minimax rate for smooth nonparametric estimation can be slowed down to the classical parametric rate by the presence of a high dimensional sparse vector, given sufficiently large smoothness or dimensionality. This rate-switching phenomenon differs significantly from low dimensional models, where the estimation rate for each component depends only on that component. In the above setting, a general class of penalized least squares estimators is constructed that nearly achieves the minimax lower bounds. The second concerns high dimensional inference for partial spline models, where the dimension of the parametric component is allowed to grow exponentially with the sample size.
We propose a semiparametric version of the de-biased Lasso estimator. In the high dimensional regime, this new estimator is shown to be asymptotically normal. Based on this distributional result, we further develop simultaneous hypothesis tests, with applications to support recovery and to multiple testing with strong familywise error rate control.
Cheng, Purdue University.