Fast nonparametric estimation of a mixing distribution with application to high-dimensional inference
Mixture distributions have, for many years, been used in a wide range of classical statistical problems, including cluster analysis and density estimation, but they are now finding new and interesting applications in the high-dimensional problems inspired by microarrays and other recent technological advances. Computational breakthroughs such as the EM and MCMC algorithms make fitting the mixture model relatively easy, but inference on the mixing distribution itself remains a challenging problem. Recently, M. A. Newton proposed a fast recursive algorithm for nonparametric estimation of the mixing distribution, motivated by heuristic Bayesian arguments, which has been shown to perform well in a host of applications. Theoretical investigations, on the other hand, have been rather limited. This thesis gives a thorough exploration of the theoretical properties of Newton’s recursive estimate (RE). We begin with a rigorous justification for the recursive algorithm, showing that RE is just a special case of stochastic approximation. For finite mixtures, consistency of RE is established using classical stochastic approximation results; general mixtures, on the other hand, would require an infinite-dimensional stochastic approximation which is still not well studied in general. As an alternative approach in the general mixture problem, a martingale approximation is used to show, under mild conditions, that the estimated mixture density converges almost surely to the “best possible” mixture in a Kullback-Leibler sense, and a competitive bound on the rate of convergence is obtained. Under some extra conditions, including identifiability, we prove almost sure weak convergence of the estimated mixing distribution.
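The recursive algorithm at the center of the thesis updates an estimate of the mixing density one observation at a time. A minimal sketch of Newton's recursion, assuming a discretized support grid, a Gaussian kernel, and the common weight choice w_i = 1/(i+1) (the grid, kernel, and weights here are illustrative choices, not the thesis's specific settings):

```python
import numpy as np

def newton_recursive_estimate(x, theta_grid, kernel, w=None):
    """Newton's recursive estimate of a mixing distribution on a fixed grid.

    x          : 1-D array of observations
    theta_grid : discretized support of the mixing distribution
    kernel     : kernel(x_i, theta_grid) -> k(x_i | theta) on the grid
    w          : optional weights w_i in (0, 1); defaults to 1/(i+1)
    """
    m = len(theta_grid)
    f = np.full(m, 1.0 / m)              # uniform initial guess f_0
    for i, xi in enumerate(x, start=1):
        wi = 1.0 / (i + 1) if w is None else w[i - 1]
        k = kernel(xi, theta_grid)       # kernel evaluated at x_i
        post = k * f
        post /= post.sum()               # normalized "posterior" weights
        f = (1.0 - wi) * f + wi * post   # convex-combination update
    return f

# Illustration: normal location mixture, k(x | theta) = N(x; theta, 1),
# data drawn from a two-point mixing distribution at -2 and +2.
def gauss_kernel(x, theta):
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(0)
theta_true = rng.choice([-2.0, 2.0], size=500)
data = theta_true + rng.standard_normal(500)
grid = np.linspace(-5.0, 5.0, 201)
f_hat = newton_recursive_estimate(data, grid, gauss_kernel)
```

Because each update is a convex combination of two probability vectors, the estimate remains a valid probability mass function on the grid at every step, which is what makes the procedure so fast: a single pass over the data suffices.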
Liu, Purdue University.