Chromosome 3D Structure Modeling and New Approaches for General Statistical Inference

Rongrong Zhang, Purdue University

Abstract

This thesis consists of two separate topics: the use of piecewise helical models for inferring the 3D spatial organization of chromosomes, and new approaches for general statistical inference. The recently developed Hi-C technology enables a genome-wide view of chromosome spatial organization and has yielded deep insights into genome structure and function. However, multiple sources of uncertainty make downstream data analysis and interpretation challenging. Specifically, statistical models for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from mature. Most existing methods are highly over-parameterized, lack clear interpretations, and are sensitive to outliers. We propose a parsimonious, easy-to-interpret, and robust piecewise helical curve model for the inference of 3D chromosomal structures from Hi-C data, for both individual topologically associated domains and whole chromosomes. When applied to a real Hi-C dataset, the piecewise helical model not only achieves much better model fitting than existing models, but also reveals that geometric properties of chromatin spatial organization are closely related to genome function.

For potential applications in big data analytics and machine learning, we propose to use deep neural networks to automate the Bayesian model selection and parameter estimation procedures. Two such frameworks are developed under different scenarios. First, we construct a deep neural network-based Bayes estimator for the parameters of a given model. The neural Bayes estimator mitigates the computational challenges faced by traditional approaches for computing Bayes estimators. When applied to generalized linear mixed models, the neural Bayes estimator outperforms existing methods implemented in R packages and SAS procedures. Second, we construct a deep convolutional neural network-based framework to perform simultaneous Bayesian model selection and parameter estimation. We refer to the neural networks for model selection and parameter estimation in this framework as the neural model selector and the neural parameter estimator, respectively; both can be properly trained using labeled data systematically generated from the candidate models. A simulation study shows that both the neural selector and the neural estimator perform excellently.

The theory of Conditional Inferential Models (CIMs) has been introduced to combine information for efficient inference in the Inferential Models framework for prior-free yet valid probabilistic inference. While the general theory is subject to further development, the so-called regular CIMs are simple. We establish and prove a necessary and sufficient condition for the existence and identification of regular CIMs. More specifically, it is shown that for inference based on a sample from continuous distributions with unknown parameters, the corresponding CIM is regular if and only if the unknown parameters are generalized location and scale parameters, indexing the transformations of an affine group.
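The idea behind training an estimator on labeled data generated from a model, as described for the neural Bayes estimator above, can be illustrated with a deliberately small sketch. It is not the thesis's method: it assumes a conjugate normal-normal model (chosen here only because the exact Bayes estimator is known in closed form) and swaps the deep neural network for a linear least-squares fit on the sample mean. The function names `simulate_pairs` and `fit_linear_estimator` and all numeric settings are hypothetical. Minimizing squared error over simulated (parameter, data) pairs targets the posterior mean, so the fitted slope should approach the exact shrinkage factor.

```python
import numpy as np


def simulate_pairs(num_pairs, n_obs, tau, sigma, rng):
    """Draw parameters from the prior, then data from the model given each parameter."""
    # Assumed prior: theta ~ N(0, tau^2)
    theta = rng.normal(0.0, tau, size=num_pairs)
    # Assumed model: x_i | theta ~ N(theta, sigma^2), i = 1..n_obs
    data = rng.normal(theta[:, None], sigma, size=(num_pairs, n_obs))
    return theta, data


def fit_linear_estimator(theta, data):
    """Least-squares fit of theta on the sample mean (stand-in for a neural network)."""
    xbar = data.mean(axis=1)
    design = np.column_stack([np.ones_like(xbar), xbar])
    coef, *_ = np.linalg.lstsq(design, theta, rcond=None)
    return coef  # (intercept, slope)


rng = np.random.default_rng(0)
tau, sigma, n_obs = 1.0, 1.0, 5  # illustrative values only

theta, data = simulate_pairs(200_000, n_obs, tau, sigma, rng)
intercept, slope = fit_linear_estimator(theta, data)

# Exact posterior mean under this conjugate model is exact_slope * xbar.
exact_slope = tau**2 / (tau**2 + sigma**2 / n_obs)

print(f"learned intercept: {intercept:.4f}")
print(f"learned slope:     {slope:.4f}")
print(f"exact Bayes slope: {exact_slope:.4f}")
```

In this toy setting the Bayes estimator is linear in the sample mean, which is why a linear fit suffices; for models such as generalized linear mixed models no such closed form is available, which is the situation a flexible function approximator is meant to address.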

Degree

Ph.D.

Advisors

Zhu, Purdue University.

Subject Area

Artificial intelligence|Bioinformatics|Genetics|Mathematics|Polymer chemistry
