Lasso and general L1-regularized regression under linear equality and inequality constraints

Tianhong He, Purdue University

Abstract

This thesis consists of three parts. In Chapter 1, we examine existing variable selection methods and introduce the almost explicit solution of the Lasso from the perspective of convex optimization. In Chapter 2, we focus on the problem of incorporating linear equality and inequality constraints into the Lasso for variable selection. We develop efficient algorithms for computing the solution path of the linearly constrained Lasso and establish its theoretical properties. In Chapter 3, we extend the results of Chapter 2 to a general family of L1-regularized regression.

The Lasso, proposed by Tibshirani (1996), has become a popular variable selection method for high-dimensional data analysis. Much effort has been dedicated to its further improvement in the recent statistical literature. It is well known that incorporating prior information about predictor variables can lead to more accurate estimates of the regression coefficients in linear regression. We propose a systematic approach to incorporating prior information into the Lasso via linear constraints. We develop an efficient algorithm to compute the Lasso solution under linear constraints and establish the theoretical properties of the resulting estimates, including variable selection consistency. Simulation studies show that the proposed method can lead to more accurate results in terms of variable selection and parameter estimation. The proposed method is applied to real-life examples. In particular, we use it to incorporate prior knowledge of genetic regulatory networks and metabolic pathways into genomic data analysis. Furthermore, we consider a general family of L1-regularized regression and develop algorithms for computing the solution paths of this family with or without linear constraints. Future research directions are also discussed.

Degree

Ph.D.

Advisors

Zhu, Purdue University.

Subject Area

Statistics

