Generation and statistical modeling of active protein chimeras: A sequence based approach

Nicholas Justin Fico, Purdue University

Abstract

Generation of active protein chimeras is a valuable tool to probe the functional space of proteins. Statistical modeling is the next logical step, allowing us to build a model of gene fragment replaceability between species. In this thesis I begin to develop the statistical tools that are needed to systematically describe combinatorial protein libraries. I present three sets of diverse chimeric protein libraries developed using sequence information. The statistical model of the human N-Ras and human K-Ras-4B genes reveals a set of previously unidentified surface residues on the N-Ras G-Domain that may be involved in cellular localization. Statistical modeling of a library of chimeric proteins between A. thaliana cinnamate 4-hydroxylase (AtC4H) and S. moellendorffii cinnamate 4-hydroxylase (SmC4H) reveals a possible stabilizing effect of the N-terminal amino acids from SmC4H and irreplaceable catalytic domains between AtC4H and SmC4H. I also show gene fragment replaceability on a small scale between the functionally divergent AtC4H and A. thaliana ferulate 5-hydroxylase proteins. Finally, I show that commonly occurring residue pairs in the sequence record are effective covariates when modeling activity in the AtC4H-SmC4H chimeric library.

Degree

Ph.D.

Advisors

Friedman, Purdue University.

Subject Area

Molecular biology|Statistics

