Variable selection methodology for high-dimensional multivariate binary data with application to microbial community DNA fingerprint analysis

Jayson Dwight Wilbur, Purdue University

Abstract

Understanding the role of microorganisms in an environment requires identifying and characterizing the relevant microbial community. Characteristic profiles of microbial communities are obtained by denaturing gradient gel electrophoresis (DGGE) of polymerase chain reaction (PCR)-amplified 16S rDNA from DNA extracted from soil. These profiles, commonly called community DNA fingerprints, can be represented as high-dimensional binary vectors. The problem of modeling and variable selection for high-dimensional multivariate binary data is addressed from both a frequentist and a Bayesian perspective. Permutation-based approaches are used to select variables that vary significantly with respect to a treatment effect, and the properties of these methods are explored via simulation. An empirical Bayes model for multivariate binary response data is proposed, and variables are selected by making posterior inference on the model space. Finally, an application of the methodology is explored in the context of a controlled agricultural experiment.
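The abstract does not state which test statistic or multiplicity correction the permutation-based approaches use, so the following is only a minimal sketch under stated assumptions. It treats each DGGE band as one binary variable (1 = band present), assumes two treatment groups, uses the absolute difference in band-presence proportions as the per-band statistic (an assumption), and controls the family-wise error rate with a single-step max-statistic permutation adjustment (the Westfall-Young maxT idea, swapped in here for illustration). The function name `permutation_select` is hypothetical.

```python
import numpy as np


def permutation_select(X, groups, n_perm=2000, alpha=0.05, seed=0):
    """Flag binary variables whose presence rate differs between two groups.

    Hypothetical sketch (not the dissertation's method):
    X      -- (n_samples, n_bands) array of 0/1 band-presence indicators
    groups -- length-n_samples array of 0/1 treatment labels
    Returns maxT-adjusted permutation p-values and a boolean selection mask.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)

    def stat(labels):
        # Assumed statistic: absolute difference in band-presence proportions.
        return np.abs(X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0))

    observed = stat(groups)

    # Null distribution of the maximum statistic over all bands,
    # obtained by randomly permuting the treatment labels.
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        max_null[b] = stat(rng.permutation(groups)).max()

    # Single-step adjusted p-value per band: share of permutations whose
    # maximum statistic reaches the band's observed statistic (+1 correction).
    # A small tolerance guards against floating-point ties.
    exceed = (max_null[None, :] >= observed[:, None] - 1e-12).sum(axis=1)
    adj_p = (1 + exceed) / (n_perm + 1)
    return adj_p, adj_p <= alpha
```

Because the null distribution is built from the maximum over all bands, each band's adjusted p-value already accounts for testing every band at once, which matters when fingerprints have many bands relative to the number of samples.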

Degree

Ph.D.

Advisors

Ghosh, Purdue University.

Subject Area

Statistics|Biostatistics
