Efficient Bayesian Machine Learning with Gaussian Processes
Statistical inference for functions is an important topic for regression and classification problems in machine learning. One of the challenges in function inference is modeling the non-linear relationships between data inputs and outputs. Gaussian processes (GPs) are powerful non-linear models that provide a nonparametric representation of functions. However, exact GP inference is computationally expensive for big data, since its cost grows cubically with the number of training samples. In this dissertation, we aim to develop efficient, scalable, yet accurate approximate inference algorithms for GPs. First, we propose a new Bayesian approach, EigenGP, which learns both the dictionary basis functions (eigenfunctions of a GP prior) and the prior precision in a sparse finite model. EigenGP can be viewed as a low-rank approximation to exact GPs, obtained by projecting the GP prior onto an eigensubspace. By maximizing the model evidence, EigenGP constructs sparse basis functions in a reproducing kernel Hilbert space as finite linear combinations of kernel functions. The learned basis functions of EigenGP can interpolate non-stationary functions even with a stationary kernel. We further extend EigenGP to parallel computing and to classification problems. Experimental results on benchmark datasets demonstrate the advantages of EigenGP over alternative state-of-the-art sparse GP methods as well as Relevance Vector Machines (RVMs). Second, we propose a more general yet principled model, ADVGP, in the variational Bayesian framework. ADVGP approximates exact GP inference using a weight-space augmentation with a flexible choice of basis functions. By choosing different basis functions, we show that ADVGP not only connects many existing sparse GP models but also yields new ones. More importantly, to the best of our knowledge, we are the first to implement the delayed proximal gradient algorithm for sparse GPs on the recent distributed platform PARAMETERSERVER.
By exploiting the structure of ADVGP's log evidence lower bound, we derive efficient element-wise asynchronous updates of our variational parameters on the servers, with convergence guarantees. Our experiments show that ADVGP achieves better predictive performance than other distributed or stochastic sparse GP models and scales GP regression to a real-world application with billions of samples.
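The idea of projecting a GP prior onto an eigensubspace and then doing regression in weight space can be illustrated with a minimal NumPy sketch. It is not EigenGP or ADVGP: here the basis points `z` are fixed on a grid rather than learned by evidence maximization, the kernel hyperparameters are fixed, and the helper names (`make_eigen_feature_map`, `weight_space_mean`, `exact_gp_mean`) and the synthetic sine data are hypothetical. The features are proportional to Nyström estimates of the top-k eigenfunctions of an RBF kernel, so each one is a finite linear combination of kernel functions, and weight-space regression with prior w ~ N(0, I) solves a k x k system instead of the n x n system of exact GP inference.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between rows of a (n x d) and b (m x d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def make_eigen_feature_map(z, k, jitter=1e-8):
    """Hypothetical helper: rank-k Nystrom eigenfunction features from fixed points z.

    phi_j(x) = k(x, z) u_j / sqrt(lam_j) is proportional to the Nystrom estimate
    of the j-th kernel eigenfunction, i.e. a finite combination of kernel functions.
    Phi Phi^T = K_xz U_k diag(1/lam_k) U_k^T K_zx, the rank-k Nystrom approximation.
    """
    kzz = rbf_kernel(z, z) + jitter * np.eye(len(z))
    lam, u = np.linalg.eigh(kzz)                # eigenvalues in ascending order
    lam, u = lam[::-1][:k], u[:, ::-1][:, :k]   # keep the k largest eigenpairs
    return lambda x: rbf_kernel(x, z) @ u / np.sqrt(lam)


def weight_space_mean(phi_train, y, phi_test, noise_var):
    """Posterior mean of f(x) = phi(x)^T w with prior w ~ N(0, I) and Gaussian noise."""
    k = phi_train.shape[1]
    a = phi_train.T @ phi_train + noise_var * np.eye(k)   # k x k, not n x n
    w_mean = np.linalg.solve(a, phi_train.T @ y)
    return phi_test @ w_mean


def exact_gp_mean(x, y, x_test, noise_var):
    """Exact GP posterior mean; the n x n solve costs O(n^3)."""
    kxx = rbf_kernel(x, x) + noise_var * np.eye(len(x))
    return rbf_kernel(x_test, x) @ np.linalg.solve(kxx, y)


# Synthetic 1-D regression problem, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(2.0 * x[:, 0]) + 0.1 * rng.standard_normal(200)
x_test = np.linspace(-3.0, 3.0, 50)[:, None]
noise_var = 0.01

z = np.linspace(-3.0, 3.0, 20)[:, None]   # fixed basis points (EigenGP would learn them)
phi = make_eigen_feature_map(z, k=10)
mean_low_rank = weight_space_mean(phi(x), y, phi(x_test), noise_var)
mean_exact = exact_gp_mean(x, y, x_test, noise_var)
print(np.max(np.abs(mean_low_rank - mean_exact)))  # largest gap between the two means
```

Once the m basis points are chosen, the cost is dominated by forming the n x m kernel block and a k x k solve, rather than the O(n^3) solve of exact GP regression; swapping in a different feature map changes the sparse model while the weight-space machinery stays the same.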
Li, Purdue University.