Graphical Models for Non-Gaussian Continuous Data with Applications to Genomics Datasets
Abstract
A graphical model captures conditional relationships among a set of random variables via a graph. Under the assumption of a multivariate Gaussian distribution (the Gaussian assumption, for short), the edges of the graph correspond to the nonzero elements of the inverse covariance matrix. Based on this fact, many approaches have been proposed to estimate a sparse inverse covariance matrix with various regularization methods. Although simple and powerful, these approaches rely on a rather strong distributional assumption, which limits their application to real data. In this dissertation, we propose two approaches to graphical models that can be used for continuous multivariate data under conditions weaker than the Gaussian assumption. First, we introduce kernel partial correlation, which extends the partial correlation coefficient. The partial correlation coefficient is a useful conditional dependence measure that, under the Gaussian assumption, is computed by combining two parametric methods. In kernel partial correlation, these two parametric methods are replaced by two flexible nonparametric kernel-based approaches. The proposed approach is not only flexible with respect to the shape of the relationship but also robust to high levels of noise, owing to the characteristics of the nonparametric approaches employed. Our method outperforms existing approaches on simulated data as well as on real data from single-cell RNA-sequencing experiments.
Although flexible, kernel partial correlation cannot explore dependence among variables in full generality because of the limitations of traditional regression. In the second part, we remove these limitations by assuming that the variables jointly follow a multivariate elliptical distribution after certain transformations. We propose the elliptical graphical models with individual transformations approach, which estimates a graph by integrating optimal transformations and sparse nonparametric function estimation. Owing to the flexibility of the model, our method captures highly nonlinear conditional relationships, such as circular relationships with many dummy variables. Our simulation studies under various scenarios show that our method outperforms other approaches. Finally, we survey the theory of other methods that are closely linked to our approach and describe possible future work.
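As a rough illustration of the Gaussian baseline that the abstract starts from, the sketch below computes partial correlations in two ways: from the inverse covariance matrix, and by correlating the residuals of two regressions. It assumes the "two parametric methods" are ordinary least-squares regressions of each variable on the remaining ones, which is the usual textbook construction rather than something the abstract states. The function names, the three-variable chain graph, and the sample size are hypothetical choices made for this example; this is not the dissertation's kernel-based method.

```python
import numpy as np


def partial_corr_from_precision(cov):
    """Partial correlation of every pair given all other variables,
    read off the inverse covariance (precision) matrix."""
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def partial_corr_from_residuals(X, i, j):
    """Partial correlation of columns i and j: regress each on the
    remaining columns (with intercept) and correlate the residuals."""
    n, p = X.shape
    rest = [k for k in range(p) if k not in (i, j)]
    Z = np.column_stack([np.ones(n), X[:, rest]])
    res_i = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]
    res_j = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]


# Hypothetical chain graph 0 - 1 - 2: variables 0 and 2 share no edge,
# so the (0, 2) entry of the precision matrix is zero.
precision = np.array([[2.0, 0.8, 0.0],
                      [0.8, 2.0, 0.8],
                      [0.0, 0.8, 2.0]])
cov = np.linalg.inv(precision)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), cov, size=5000)

pcor_pop = partial_corr_from_precision(cov)                       # population values
pcor_sample = partial_corr_from_precision(np.cov(X, rowvar=False))  # sample estimate
resid_01 = partial_corr_from_residuals(X, 0, 1)
resid_02 = partial_corr_from_residuals(X, 0, 2)
```

On the sample, the two routes agree up to floating-point error, and the estimate for the missing edge (0, 2) is close to zero. Both routes capture only linear conditional dependence, which is the restriction the kernel-based and transformation-based approaches described above aim to relax.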
Degree
Ph.D.
Advisors
Chun, Purdue University.
Subject Area
Statistics