WaveTDA: Bayesian Statistics, Wavelets and Topological Data Analysis to Assess Statistical Shape

Patrick S Medina, Purdue University

Abstract

Advanced imaging and scanning technologies are providing an abundance of shape-based data that require novel approaches to understand the factors that generate them. This is especially true in modern agricultural practices, where large quantities of highly detailed morphometric data are collected and used for studying genetic associations. Toward this end, there is an immediate need for efficient algorithms that extract relevant geometric information from images in a form that can be used within a statistical framework. Topological Data Analysis (TDA) has been shown to be both fast and effective for studying complex shapes, and can simultaneously reduce dimensionality into simpler topological summaries (referred to as persistence diagrams) while capturing essential geometric information about the overall image shape. In an agricultural setting, and otherwise, TDA has the capacity to quantify plant morphological images across multiple organs, but fails to provide a form that lends itself to further statistical analysis. Specifically, there is a significant gap in our understanding of how to adapt TDA summaries so that they can be analyzed using the existing arsenal of statistical theory and methodology. Although kernel methods have been applied to persistence diagrams to make them amenable to existing statistical methods, very few of these kernel methods make use of all the topological information contained in a persistence diagram without a significant loss of information. To overcome this, WaveTDA is introduced as a Bayesian approach based on wavelets that lends itself to both regression analysis and hypothesis testing. Further, WaveTDA has the capacity to identify regions of the kernel-transformed persistence diagrams (under the assumption of independence) that are significantly associated with (genetic) covariates.
While this work is motivated and presented in the context of genetic studies, it is general enough to be used in a variety of applications.
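
As a rough illustration of the two ingredients named in the abstract, a kernel-transformed persistence diagram and a wavelet decomposition of it, the following minimal sketch may help. It is not the dissertation's implementation: the Gaussian kernel, the persistence weighting, the 8 x 8 grid, the Haar wavelet, and the function names `kernel_surface`, `haar_1d`, and `haar_2d` are all assumptions chosen for illustration, and the Bayesian regression and testing steps are not shown.

```python
import numpy as np

def kernel_surface(diagram, grid_size=8, sigma=0.1, lo=0.0, hi=1.0):
    """Smooth a persistence diagram onto a square grid (assumed Gaussian kernel).

    `diagram` holds one (birth, death) pair per row. Points are mapped to
    (birth, persistence) coordinates and weighted by their persistence.
    """
    diagram = np.asarray(diagram, dtype=float)
    birth = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]
    centers = np.linspace(lo, hi, grid_size)
    gx, gy = np.meshgrid(centers, centers, indexing="ij")
    surface = np.zeros((grid_size, grid_size))
    for b, p in zip(birth, pers):
        surface += p * np.exp(-((gx - b) ** 2 + (gy - p) ** 2) / (2 * sigma ** 2))
    return surface

def haar_1d(x):
    """Orthonormal Haar transform of a vector whose length is a power of two."""
    out = np.asarray(x, dtype=float).copy()
    n = out.size
    while n > 1:
        half = n // 2
        # Pairwise scaled averages (coarse part) and differences (detail part).
        avg = (out[0:n:2] + out[1:n:2]) / np.sqrt(2)
        diff = (out[0:n:2] - out[1:n:2]) / np.sqrt(2)
        out[:half] = avg
        out[half:n] = diff
        n = half
    return out

def haar_2d(surface):
    """Separable 2-D Haar transform: transform every column, then every row."""
    cols_done = np.apply_along_axis(haar_1d, 0, surface)
    return np.apply_along_axis(haar_1d, 1, cols_done)

# Toy diagram with three (birth, death) pairs; the values are made up.
diagram = np.array([[0.1, 0.4], [0.2, 0.9], [0.5, 0.6]])
surface = kernel_surface(diagram)
coeffs = haar_2d(surface)
```

Each entry of `coeffs` summarizes the smoothed diagram at one location and scale, which is the kind of quantity a coefficient-wise regression on covariates could then use. Because this Haar transform is orthonormal, the coefficients carry the same total energy as the surface they were computed from.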

Degree

Ph.D.

Advisors

Chun, Purdue University.

Subject Area

Statistics
