Efficient sparse Bayesian learning using spike-and-slab priors
In statistical machine learning, sparse learning seeks a reconciliation between two competing aspects of a statistical model: good predictive power and interpretability. In a Bayesian setting, sparse learning methods invoke sparsity-inducing priors to encode this tradeoff explicitly and in a principled manner. Spike-and-slab priors have recently become very popular in the sparse machine learning community. This popularity stems from their selective shrinkage property: irrelevant variables are shrunk aggressively, while relevant variables are regularized only mildly. However, the classical formulation of spike-and-slab priors does not explicitly incorporate information about the correlation structure between variables, which is available in various domains and could be useful for revealing the sparsity structure. In this dissertation we focus on supervised parametric linear models and propose a generalized formulation of spike-and-slab priors that seeks optimal model complexity by exploiting this domain-based correlation structure, and hence aims to improve the predictive power and interpretability of the results. Bayesian learning through spike-and-slab priors, though attractive, is not free of challenges: a major bottleneck of current Bayesian inference methodologies is their high computational cost in high dimensions. In this dissertation we therefore also propose scalable Bayesian inference strategies for classical spike-and-slab models.

First, we present a new sparse Bayesian approach, called Network and Node Selection (NaNOS), for joint group and feature selection. NaNOS extends the classical spike-and-slab prior for group selection through a generalized formulation that incorporates correlation structure information provided by the domain for each group, allowing the model to induce structured sparsity, guided by domain knowledge, within the selected groups.
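The selective shrinkage property can be illustrated with a minimal one-dimensional sketch: a single observation y ~ N(beta, sigma^2) with the prior beta ~ pi*N(0, tau^2) + (1 - pi)*delta_0. The closed-form posterior mean below is a textbook illustration of the prior's behavior, not the dissertation's model; all hyperparameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def spike_slab_posterior_mean(y, sigma2=1.0, tau2=9.0, pi=0.2):
    """Posterior mean of beta given one observation y ~ N(beta, sigma2),
    under the spike-and-slab prior beta ~ pi*N(0, tau2) + (1-pi)*delta_0.
    Hyperparameter values are illustrative, not from the dissertation."""
    # Marginal likelihoods of y under the slab and the spike components.
    m_slab = normal_pdf(y, sigma2 + tau2)
    m_spike = normal_pdf(y, sigma2)
    # Posterior probability that beta was drawn from the slab (is "relevant").
    p_slab = pi * m_slab / (pi * m_slab + (1.0 - pi) * m_spike)
    # Conditional on the slab, beta | y is Gaussian with a ridge-style mean.
    slab_mean = (tau2 / (tau2 + sigma2)) * y
    return p_slab * slab_mean

# Selective shrinkage: a weak signal is shrunk aggressively toward zero,
# while a strong signal is regularized only mildly.
print(spike_slab_posterior_mean(0.5))  # near zero
print(spike_slab_posterior_mean(5.0))  # close to 5
```

A weak observation is almost certainly attributed to the spike and collapses toward zero, while a strong observation retains nearly its full magnitude; a single Gaussian (ridge) prior, by contrast, would shrink both by the same factor.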
NaNOS also provides a principled framework for jointly selecting relevant groups and relevant features within the selected groups. Simulation and real-data results demonstrate improved predictive performance and selection accuracy of our method over alternative methods.

Second, we propose a scalable approximate Bayesian inference algorithm, based on Laplace's method, for classical spike-and-slab models. Our method can be seen as a hybrid of Bayesian and frequentist treatments, drawing on the benefits of both: from a frequentist perspective, it is computationally efficient and possesses asymptotic consistency properties; from a Bayesian perspective, it performs posterior inference better than, or comparably to, existing approximate inference techniques. Experimental results show improved performance of our approach over alternative approximate inference methods, with computational efficiency comparable to that of frequentist ℓ1 approaches.
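Laplace's method approximates a posterior by a Gaussian centered at the posterior mode, with covariance given by the inverse negative Hessian of the log density at that mode. The sketch below applies it to a toy one-dimensional Beta posterior, where the exact answer is known for comparison; it is a generic illustration of the method, not the dissertation's inference algorithm.

```python
import math

def laplace_approx_beta(a, b, tol=1e-10):
    """Laplace (Gaussian) approximation to a Beta(a, b) density:
    find the mode by Newton's method on the log density, then use
    the inverse negative Hessian at the mode as the variance.
    Toy illustration of Laplace's method, not the dissertation's algorithm."""
    theta = 0.5  # interior starting point for Newton's method
    hess = -1.0
    for _ in range(100):
        grad = (a - 1) / theta - (b - 1) / (1 - theta)          # d/dtheta log p
        hess = -(a - 1) / theta**2 - (b - 1) / (1 - theta)**2   # d2/dtheta2 log p
        step = grad / hess
        theta -= step
        if abs(step) < tol:
            break
    # Gaussian approximation: mean = mode, variance = inverse negative Hessian.
    return theta, -1.0 / hess

mode, var = laplace_approx_beta(20, 10)
print(mode, var)  # mode is exactly (a-1)/(a+b-2) = 19/28
```

For Beta(20, 10) the approximation places its mean at the mode 19/28 ≈ 0.679 with variance ≈ 0.0078, close to the exact mean 2/3 and variance ≈ 0.0072; because the mode and Hessian are available in closed form, only a cheap optimization is needed, which is the source of the method's scalability.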
Qi, Purdue University.