Computational models of mutations for predicting and classifying protein-protein interaction sites

David La, Purdue University

Abstract

Protein-protein interaction residues mediate many critical functions in the cell, such as inhibition through enzyme-inhibitor interactions, initiation of the immune response through antibody-antigen interactions, and regulation of cell-signaling proteins. Various methods are currently available for predicting protein-protein interaction sites; these methods allow a residue-level understanding of the protein-binding phenomena revealed by the global construction of protein-protein interaction networks. In this thesis, protein-protein interaction sites are predicted using phylogenetic substitution models of amino acid mutations at protein interfaces:
1) Predicting Protein-Protein Interaction Sites using Phylogenetic Substitution Models: Protein-protein interactions are critical for maintaining many different biological functions in the cell. In particular, these processes involve functionally important amino acid residues that are traditionally regarded as conserved in sequence throughout evolutionary time. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites and ligand-binding sites. Consequently, the semi-conservation of protein-protein interaction sites poses significant challenges for current protein-protein interface prediction methods. To approach this problem, we developed a phylogenetic framework to capture the mutational behavior of essential protein-protein binding residues. Through comprehensive analysis of functionally diverse protein families, we discovered key amino acid substitution patterns that are characteristic of protein-protein interfaces. We demonstrate that the contrast between interface and non-interface substitution models reveals mutational biases imposed on protein-protein binding residues.
Based on this analysis, we have developed a novel method, BindML, which uses these evolutionary models to predict protein-protein binding sites on protein structures even without knowledge of their interacting partners. When assessed on a large benchmark of protein complexes, our method performs better than alternative methods for protein binding interface prediction. The conceptual novelty of this method is that it detects semi-conserved mutations rather than conventional conservation in protein family sequences, and it thus aims to open a new direction in protein sequence analysis.
2) Prediction and Classification of Permanent and Transient Protein-Protein Interfaces: Proteins interact with each other in different ways, with specific functional consequences. Our current research direction involves the development of a new method to classify mutation patterns of protein-protein interaction sites into permanent and transient types. Permanent interactions require tight binding between proteins to assemble strong complexes. For example, enzyme-inhibitor complexes, antigen-antibody complexes, and large homo-oligomeric enzyme structures are all composed of proteins that must be permanently bound in order to carry out their functions correctly. In contrast, proteins in transient interactions can readily dissociate after binding. Examples of transient interactions include those of proteins involved in signaling pathways, in which the binding of transiently interacting proteins (such as protein kinases and G-proteins) induces conformational changes that switch protein function (and hence pathways) on and off, allowing strict and precise control of cellular activity.
Although many studies have already explored the differences between these two types of interactions at the level of protein structure, in this study we develop amino acid substitution models that differentiate permanent and transient interfaces primarily using sequence information. We built highly discriminative substitution models that can be used to classify protein interface predictions into permanent and transient interaction types. A detailed understanding of the differences in mutational constraints between permanent and transient protein complexes should help elucidate critical amino acid substitution preferences that are useful for annotating protein binding interface predictions for structures and sequences of unknown function.
3) 3D-SURFER Software for high-throughput protein surface comparison and analysis: A web-based tool, 3D-Surfer, has been developed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. Because each protein is represented by a vector of 3D Zernike descriptors, comparing a query protein against the entire PDB takes, on average, only a couple of seconds. The web interface has been designed to be as interactive as possible, with displays showing animated protein rotations, CATH codes, and structural alignments computed with the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand-binding sites, as well as protrusions and flat regions, can also be identified and visualized.
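The speed of a descriptor-based search of this kind comes from reducing each surface comparison to a distance between fixed-length vectors. The following is a minimal sketch of that idea, not the 3D-Surfer implementation: it assumes Euclidean distance as the dissimilarity measure, and the function names, protein labels, and three-element vectors are hypothetical placeholders (real 3D Zernike descriptor vectors are considerably longer).

```python
import math


def euclidean_distance(a, b):
    """Distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, database, top_k=3):
    """Return the top_k (name, distance) pairs closest to the query.

    `database` maps a protein label to its descriptor vector.
    Smaller distance is taken to mean more similar surface shape
    (an assumption of this sketch).
    """
    scored = [(euclidean_distance(query, vec), name)
              for name, vec in database.items()]
    scored.sort()
    return [(name, dist) for dist, name in scored[:top_k]]


# Hypothetical toy data; the labels and values are made up for illustration.
toy_db = {
    "protein_A": [1.0, 0.0, 2.0],
    "protein_B": [1.0, 0.5, 2.0],
    "protein_C": [4.0, 3.0, 0.0],
}
query = [1.0, 0.1, 2.0]

ranking = rank_by_similarity(query, toy_db, top_k=2)
for name, dist in ranking:
    print(f"{name}\t{dist:.3f}")
```

Because every protein is compared through the same short vector, a full database scan is a single linear pass, with no structural alignment needed at query time.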

Degree

Ph.D.

Advisors

Kihara, Purdue University.

Subject Area

Bioinformatics|Biophysics
