Novel statistical models and a high-performance computing toolkit for the solution of cryo electron microscopy inverse problems in viral structural biology
Abstract
Computing the 3-D variation of the electron scattering intensity of a virus from cryo electron microscope images by statistical methods requires high-performance computing that can be adapted to different types of problems, e.g., a spherical virus versus a helical virus, or heterogeneous versus homogeneous ensembles of virus particles. This dissertation describes end-to-end formulations of the biological problems, statistical models, and algorithms, together with the software design, implementation, and performance, which have resulted in a parallel software toolkit. The toolkit targets commodity PC clusters, is modular and reusable, and provides user-transparent parallelism while delivering nearly linear speedup on practical problems. Two applications of these software ideas and systems have been developed in detail: spherical viruses showing heterogeneous structures, as an example implemented in Matlab, and the portals of bacteriophage P22, as an example implemented in C/C++/OpenMP/MPI. In the first application, virus particles sharing identical genomic information often have heterogeneous 3-D structures when examined particle by particle. This dissertation develops a new statistical approach to describing such heterogeneous ensembles and to computing the parameters of the description from large sets of cryo electron microscope images. The parameters describe both the typical 3-D variation of electron scattering intensity (i.e., a reconstruction) and its spatial fluctuation. Because this new description of virus particle ensembles is statistical in character, it requires new numerical algorithms and parallel software to process the image data; these have been developed within the framework of the EM software toolkit and demonstrated on problems related to Flock House Virus. The second application is a two-class classification problem for end-on views of the portal of bacteriophage P22, where the two classes correspond to a portal containing 11 copies of the protein in 11-fold rotational symmetry or 12 copies in 12-fold rotational symmetry. Algorithms have been developed that compute the probability of class membership to automatically classify the two types of portals, as well as the broken particles and debris that are common in cryo EM images; these algorithms are capable of jointly classifying 104 images without training data.
Degree
Ph.D.
Advisors
Doerschuk, Purdue University.
Subject Area
Electrical engineering