Supervised dimension reduction for high-dimensional generalized linear models

Yanzhu Lin, Purdue University

Abstract

Dimensionality reduction has become an increasingly important strategy for high-dimensional data analysis in modern statistics. This is largely driven by the need to analyze massive data sets in which high dimensionality and multicollinearity make the problems ill-posed. In this thesis, we propose two new regression-based modeling methods for high-dimensional classification problems that implement the idea of dimension reduction. To handle the generalized linear model (GLM) with high-dimensional data, we propose a strategy that applies the supervised dimension reduction idea of partial least squares (PLS) to fit high-dimensional GLMs, building up generalized orthogonal-components regression (GOCRE) for GLMs. Unlike existing methods based on extending PLS to categorical data, we construct orthogonal predictors sequentially, and each orthogonal predictor is the result of a convergent construction. The bias correction procedure of Firth (1993) is also applied. To implement dimension reduction and variable selection simultaneously in high-dimensional data analysis, we develop Sparse-GOCRE by incorporating a penalized approach into the GOCRE framework. Within the sequential construction of components in GOCRE, a penalized approach is used to identify the sparse predictors for each component. Two different penalization strategies are considered: an L1 penalty and an empirical Bayes thresholding strategy. Our methods not only provide a solution to the high-dimensionality issue but are also able to identify variables that are highly correlated or share common coherent patterns. Both simulation studies and real data analyses of gene expression microarray data are presented to illustrate the competitive performance of our methods in comparison with several existing methods.

Degree

Ph.D.

Advisors

Zhang, Purdue University.

Subject Area

Biostatistics|Statistics|Bioinformatics
