Protein Structural Modeling Using Electron Microscopy Maps
Abstract
Proteins are essential components of living cells. They perform a diverse range of biological functions, such as maintaining cell shape and carrying out metabolism. The function of a protein is determined by its three-dimensional structure. Cryogenic electron microscopy (cryo-EM) is a technique for determining the structures of large macromolecular assemblies, including protein complexes. When atomic structures of the individual proteins are available, a critical task in structure modeling is fitting those structures into the cryo-EM density map.
In this research, we developed MarkovFit, a machine learning-based method that performs simultaneous rigid fitting of the atomic structures of individual proteins into cryo-EM maps of medium to low resolution in order to model the three-dimensional structure of protein complexes. MarkovFit uses a Markov random field (MRF), which allows probabilistic evaluation of fitted models. MarkovFit first searches the conformational space using the fast Fourier transform (FFT) for candidate poses of the protein structures, then computes scores that quantify the goodness of fit between each individual protein and the cryo-EM map, as well as the interactions between the proteins. The proteins and their interactions are then represented as an MRF graph. The MRF nodes exchange information through a belief propagation algorithm, and the best conformations are extracted using a max-heap tree. Lastly, the top final conformations of the protein complex are refined using two structural refinement methods.
We benchmarked MarkovFit on a dataset of nine experimentally determined cryo-EM maps at resolutions better than 4 Å, in which each protein subunit was shifted and rotated randomly. The average root-mean-square distance (RMSD) between the predicted models and the native structures was 1.86 Å for both the best-RMSD and the highest-scored models. Besides this high-resolution experimental dataset, MarkovFit was tested on a dataset of 28 experimental maps determined at resolutions ranging from 6 to 20 Å. This medium-resolution experimental dataset has two versions. In the first version, each protein was shifted and rotated randomly, while in the second version the initial orientation was used. For the randomly transformed experimental dataset, the RMSD values between the predicted models and the native structures were 8.14 Å and 13.91 Å for the best-RMSD and the highest-scored models, respectively. For the non-transformed experimental dataset, the average RMSD values were 6.08 Å and 9.95 Å for the best-RMSD and the highest-scored models, respectively.
In addition to the medium-resolution experimental dataset, MarkovFit was benchmarked on a dataset of 31 EM maps simulated at a resolution of 10 Å. For the randomly transformed simulated dataset, the RMSD values were 5.28 Å and 6.12 Å for the best-RMSD and the highest-scored models, respectively. For the non-transformed simulated dataset, the average RMSD values were 0.94 Å and 1.27 Å for the best-RMSD and the highest-scored models, respectively.
Lastly, MarkovFit was compared with two existing methods and showed superior fitting results.
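The MRF stage summarized in the abstract can be illustrated with a toy example. The sketch below is not MarkovFit's implementation; it assumes a pairwise MRF in which each node is a protein subunit, each state is a candidate pose, unary scores stand in for the fit to the map, and pairwise scores stand in for subunit interactions. All names and score values (`max_sum_bp`, `top_k`, `unary`, `pairwise`) are hypothetical. It runs max-sum belief propagation on a three-node chain and then ranks joint assignments with Python's `heapq`, enumerating them exhaustively, which is only feasible at this toy size.

```python
import heapq
from itertools import product


def max_sum_bp(unary, pairwise, n_iters=10):
    """Max-sum belief propagation on a pairwise MRF (toy sketch).

    unary[i][a]          : score of node i in state a (e.g. fit to the map)
    pairwise[(i, j)][a][b]: score of node i in state a with node j in state b
    Returns one belief vector per node; exact on tree-shaped graphs.
    """
    neighbors = {i: [] for i in range(len(unary))}
    for i, j in pairwise:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def pair(i, j, a, b):
        # Look up the edge score regardless of the direction it was stored in.
        if (i, j) in pairwise:
            return pairwise[(i, j)][a][b]
        return pairwise[(j, i)][b][a]

    # One message per directed edge, initialised to zero.
    msgs = {(i, j): [0.0] * len(unary[j]) for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = []
            for b in range(len(unary[j])):
                out.append(max(
                    unary[i][a] + pair(i, j, a, b)
                    + sum(msgs[(k, i)][a] for k in neighbors[i] if k != j)
                    for a in range(len(unary[i]))
                ))
            shift = max(out)  # normalise so messages stay bounded
            new_msgs[(i, j)] = [v - shift for v in out]
        msgs = new_msgs

    return [
        [unary[i][a] + sum(msgs[(k, i)][a] for k in neighbors[i])
         for a in range(len(unary[i]))]
        for i in range(len(unary))
    ]


def joint_score(assign, unary, pairwise):
    """Total score of one joint assignment (one pose per subunit)."""
    score = sum(unary[i][a] for i, a in enumerate(assign))
    score += sum(t[assign[i]][assign[j]] for (i, j), t in pairwise.items())
    return score


def top_k(unary, pairwise, k):
    """Rank all joint assignments with a heap (exhaustive; toy sizes only)."""
    candidates = product(*(range(len(u)) for u in unary))
    return heapq.nlargest(k, candidates,
                          key=lambda a: joint_score(a, unary, pairwise))


# Hypothetical scores: three subunits on a chain 0-1-2 with 2, 3 and 2 poses.
unary = [[1.0, 0.5], [0.2, 0.9, 0.1], [0.3, 0.4]]
pairwise = {
    (0, 1): [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0]],
    (1, 2): [[0.5, 0.0], [0.0, 0.1], [0.0, 0.0]],
}

beliefs = max_sum_bp(unary, pairwise)
map_assign = tuple(max(range(len(b)), key=b.__getitem__) for b in beliefs)
top3 = top_k(unary, pairwise, 3)
print(map_assign, top3)
```

In this toy case the pose with the highest unary score is not always part of the best joint assignment: subunit 0 ends up in its lower-scoring pose because the pairwise term with subunit 1 outweighs the difference. This is the kind of trade-off that message passing between nodes is meant to capture.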
Degree
Ph.D.
Advisors
Kihara, Purdue University.
Subject Area
Artificial intelligence|Mathematics|Medical imaging