Determination of biological macromolecular structures using distributed memory MIMD systems

Marius A Cornea-Hasegan, Purdue University

Abstract

X-ray crystallography is the major tool for determining the structure of proteins and viruses, and the Molecular Replacement Method is one of the most important methods used for this purpose in Structural Biology. Determining the 3-dimensional atomic structure of biological macromolecules requires a huge amount of computing resources. Until recently, processing of experimental data, phase determination, and model building were performed by a series of complicated sequential programs. Chapter 1 of this thesis introduces the physical problem. Chapter 2 discusses algorithms and programs, implemented on distributed memory MIMD systems, for electron density averaging as part of the Molecular Replacement Method. Electron density averaging is the most computationally intensive step in phase refinement and extension when computing the 3-D structure of macromolecules such as proteins and viruses. Determining a single structure may require thousands of hours of CPU time on traditional supercomputers; the approach discussed herein reduces the computing time by two orders of magnitude. The implemented programs have been used for more than two years by the Structural Biology Group at Purdue University, and run on the Intel iPSC/860, the Intel Touchstone Delta, and the Intel Paragon systems. A Shared Virtual Memory based on interrupt-driven messages was implemented to solve the data access problem efficiently. Chapter 2 also discusses different data management techniques for mapping a large data space onto the memory hierarchy of a distributed memory MIMD system, and presents experimental results for Structural Biology computations using the Molecular Replacement Method. An object-oriented User Interface based on X Window and Motif was developed to support execution of the Structural Biology programs as part of a Problem Solving Environment; it is discussed in chapter 3.

A Problem Specification Language, SBL (Structural Biology Language), was designed and implemented to support the execution of complex sequences of Structural Biology programs on distributed memory MIMD systems, ensuring automatic checkpointing and restart mechanisms. This is the subject of chapter 4. The present thesis is not only of academic interest. Two objectives were pursued: first, to uncover interesting problems from the computer science point of view, and second, to provide biologists and crystallographers with a workable environment that allows them to determine biological macromolecular structures. Both goals have been achieved: several journal and conference papers, as well as technical reports, were published as a result of the work presented herein, and biologists are currently using the implemented programs in their studies.

Degree

Ph.D.

Advisors

Marinescu, Purdue University.

Subject Area

Computer science
