Expeditious Causal Inference for Big Observational Data

Yumin Zhang, Purdue University

Abstract

This dissertation address two significant challenges in the causal inference workflow for Big Observational Data. The first is designing Big Observational Data with high-dimensional and heterogeneous covariates. The second is performing uncertainty quantification for estimates of causal estimands that are obtained from the application of black box machine learning algorithms on the designed Big Observational Data. The methodologies developed by addressing these challenges are applied for the design and analysis of Big Observational Data from a large public university in the United States.Distributed DesignA fundamental issue in causal inference for Big Observational Data is confounding due to covariate imbalances between treatment groups. This can be addressed by designing the study prior to analysis. The design ensures that subjects in the different treatment groups that have comparable covariates are subclassified or matched together. Analyzing such a designed study helps to reduce biases arising from the confounding of covariates with treatment. Existing design methods, developed for traditional observational studies consisting of a single designer, can yield unsatisfactory designs with sub-optimum covariate balance for Big Observational Data due to their inability to accommodate the massive dimensionality, heterogeneity, and volume of the Big Data. We propose a new framework for the distributed design of Big Observational Data amongst collaborative designers. Our framework first assigns subsets of the high-dimensional and heterogeneous covariates to multiple designers. The designers then summarize their covariates into lower-dimensional quantities, share their summaries with the others, and design the study in parallel based on their assigned covariates and the summaries they receive. The final design is selected by comparing balance measures for all covariates across the candidates and identifying the best amongst the candidates. We perform simulation studies and analyze datasets from the 2016 Atlantic Causal Inference Conference Data Challenge to demonstrate the flexibility and power of our framework for constructing designs with good covariate balance from Big Observational Data.Designed BootstrapThe combination of modern machine learning algorithms with the nonparametric bootstrap can enable effective predictions and inferences on Big Observational Data. An increasingly prominent and critical objective in such analyses is to draw causal inferences from the Big Observational Data. A fundamental step in addressing this objective is to design the observational study prior to the application of machine learning algorithms. However, the application of the traditional nonparametric bootstrap on Big Observational Data requires excessive computational efforts. This is because every bootstrap sample would need to be re-designed under the traditional approach, which can be prohibitive in practice. We propose a design-based bootstrap for deriving causal inferences with reduced bias from the application of machine learning algorithms on Big Observational Data. Our bootstrap procedure operates by resampling from the original designed observational study. It eliminates the need for additional, costly design steps on each bootstrap sample that are performed under the standard nonparametric bootstrap. We demonstrate the computational efficiency of this procedure compared to the traditional nonparametric bootstrap, and its equivalency in terms of confidence interval coverage rates for the average treatment effects, by means of simulation studies and a real-life casestudy.Case StudyWe apply the distributed design and designed bootstrap methodologies in a case study involving institutional data from a large public university in the United States.

Degree

Ph.D.

Advisors

Sabbaghi, Purdue University.

Subject Area

Artificial intelligence|Design|Higher education

Off-Campus Purdue Users:
To access this dissertation, please log in to our
proxy server
.

Share

COinS