Protein function, diversity and functional interplay
Functional annotation of novel or uncharacterized proteins is one of the central problems in post-genomic bioinformatics research. With the vast expansion of genomic and proteomic data and technologies over the last decade, automated function prediction (AFP) methods have become essential for large-scale identification of protein function. In this research, we address two important divergences from the “one protein – one function” concept on which all existing AFP methods are developed:
1. One protein with multiple independent functions (moonlighting proteins): Moonlighting proteins perform more than one independent cellular function within a single polypeptide chain. Biological experiments continue to discover such multi-functional proteins at a steady pace. Our work on moonlighting proteins has two logical parts:
1a. Development of a computational framework for comprehensive genome-scale characterization of moonlighting proteins based on functional and context-based information. This work identifies characteristic features of moonlighting proteins both when current databases contain functional annotations for their diverse functions and when such annotations do not exist.
1b. Development of automated prediction models for moonlighting proteins. We take two approaches to model development: using functional and context-based features in a machine learning framework, and using text-based features learned through text-mining algorithms.
2. A group of proteins sharing a common function: Biological experiments regularly reveal sets of proteins involved in a disease, disorder, or cellular phenomenon without sufficiently explaining the functional mechanisms behind the group's activity. Intuitively, proteins in a cell interact physically, through gene expression, or through genetic interaction to perform a common function that often ends up causing a disease or disorder.
To understand the functional nature of a set of proteins, it is often important to understand the functions they are involved in as a group, rather than the detailed functional characteristics of the individual proteins. In this research, we develop a conditional random field (CRF)-based framework that predicts the function of “protein groups” based on the group neighborhood in their interaction network, and iteratively updates the function annotations of unknown group members so that they reflect each protein’s group activity.

In the protein function prediction domain, it is vital to keep pace with existing AFP methods by improving prediction accuracy, updating the models, and making the methods available to the bioinformatics community. The final part of this research addresses the AFP problem in three aspects, namely improvement, database update, and web-server development of two existing methods, PFP and ESG, as well as participation in CAFA (Critical Assessment of Function Annotation), a community-wide challenge for AFP methods, and benchmarking of their performance.
Kihara, Purdue University.