Open Access Dissertations

Divide and recombine for large complex data: The subset likelihood modeling approach to recombination

Abstract

Divide and recombine (D&R) is a statistical framework for the analysis of large complex data. The data are divided into subsets. Numeric and visualization methods, which collectively are analytic methods, are applied to each subset. For each analytic method, the outputs of the application of the method to the subsets are recombined. So each analytic method has associated with it a division method and a recombination method. Here we study D&R methods for likelihood-based model fitting. We introduce a notion of likelihood analysis and modeling. We divide the data and fit a likelihood model on each subset. The fitted model is characterized by a set of parameters much smaller than the subset data size, but retains as much information as possible about the true subset likelihood. Analysis of subset likelihoods and their fitted models consists of visualizations on an appropriate scale and region. These visualizations allow the analyst to verify the choice and fit of the model. The fitted models are recombined across subsets to form a model of the all-data likelihood, which we maximize to obtain a likelihood modeling estimate (LME). We present simulation results demonstrating the performance of our method compared with the all-data maximum likelihood estimate (MLE) for the case of logistic regression.
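The divide, fit, and recombine steps described in the abstract can be illustrated with a minimal Python sketch for logistic regression. The abstract does not say which family of likelihood models the dissertation fits to each subset, so this sketch assumes a quadratic (normal) approximation of each subset log-likelihood around the subset MLE. That choice, the simulated data, and the helper names `fit_logistic` and `recombine_quadratic` are illustrative assumptions, not the author's method.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson MLE for logistic regression.

    Returns the estimate and the observed information matrix at that estimate.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        info = X.T @ (X * (p * (1 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(info, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1 - p))[:, None])
    return beta, info

def recombine_quadratic(fits):
    """Maximize the sum of quadratic subset log-likelihood models.

    ASSUMPTION: each subset log-likelihood is modeled as
    -0.5 * (b - beta_s)' I_s (b - beta_s); the maximizer of the sum is the
    information-weighted average of the subset estimates.
    """
    total_info = sum(info for _, info in fits)
    weighted = sum(info @ beta for beta, info in fits)
    return np.linalg.solve(total_info, weighted)

# Simulated data (hypothetical sizes and coefficients, for illustration only).
rng = np.random.default_rng(0)
n, n_subsets = 20000, 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.5, 1.0, 0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

# Divide: random partition of the rows into subsets.
subsets = np.array_split(rng.permutation(n), n_subsets)

# Fit a small-parameter model of each subset likelihood, then recombine.
fits = [fit_logistic(X[idx], y[idx]) for idx in subsets]
lme = recombine_quadratic(fits)

# All-data MLE for comparison.
mle, _ = fit_logistic(X, y)
print("LME:", lme)
print("MLE:", mle)
```

Under this assumption each subset is summarized by only its estimate and information matrix, so recombination never touches the raw subset data. A richer subset likelihood model would replace `recombine_quadratic` with a numerical maximization of the summed fitted models.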

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Statistics

Committee Chair

William S. Cleveland

Date of Award

Spring 2015

Recommended Citation

Gautier, Philip, "Divide and recombine for large complex data: The subset likelihood modeling approach to recombination" (2015). Open Access Dissertations. 458.
https://docs.lib.purdue.edu/open_access_dissertations/458

First Advisor

William S. Cleveland

Committee Member 1

Chuanhai Liu

Committee Member 2

Bowei Xi

Committee Member 3

Lingsong Zhang

Included in

Statistics and Probability Commons
