Predicting protein structure and function using machine learning methods

Mary Qu-Xuanyuan Yang, Purdue University


Many of the central questions in bioinformatics relate to protein structure and function. We are mainly concerned with three problems: identifying transmembrane segments in proteins, distinguishing disordered from ordered regions, and determining protein function from sequence information. To deal effectively with these problems, we have conducted an in-depth analysis of the physicochemical properties of the amino acids that make up proteins and of the amino acid compositions of the various types of proteins. We approach these questions from a machine learning perspective; the advantage of machine learning approaches over traditional laboratory methods is that they are generally faster and less expensive. We address the problem of identifying transmembrane segments in proteins using a variant of a self-organizing global ranking algorithm. The problems of distinguishing ordered from disordered regions in proteins and of determining protein function from sequence information are addressed using a tree-based recursive classifier. We provide extensive empirical and theoretical justification for these algorithms and compare them to existing methods such as decision trees and support vector machines.




Ersoy, Purdue University.

Subject Area

Biophysics|Artificial intelligence
