Advanced informatics based approaches for data driven drug discovery
Modern drug discovery has evolved into a rational design paradigm in which the vast chemical space is systematically explored for a desired biological activity. The post-genomic era has been marked by the ready availability of sophisticated high-throughput screening techniques and petascale computational power for drug discovery research. This has generated enormous amounts of data which, if mined effectively, could provide invaluable information to supplement existing knowledge from theoretical models and experimental findings. The focus of this work is to demonstrate how informatics approaches can contribute to strategies for screening chemical libraries. The work is organized around two problems in the context of structure-based drug design: one in receptor-based drug design and the other in ligand-based drug design.
In receptor-based drug design, the structure of the receptor protein is well known and is used to search for strongly binding inhibitors with desired properties. The first problem of the thesis discusses the development of the weighted SIFt (w-SIFt) method, which is based on efficiently capturing and encoding key protein-ligand binding interactions. The w-SIFt tool can be used as a virtual screening technique for mining potent compounds from chemical databases. w-SIFt applies a dimensionality reduction technique, nonnegative matrix factorization, coupled with simulated annealing optimization, to efficiently learn weights that signify the importance of each interaction in protein-ligand binding.
Ligand-based drug design involves searching for new inhibitors based on information from a known set of active ligands. Such indirect approaches are necessary when the structure of the target protein cannot be determined experimentally. Present-day computational methods for the structural overlay of active ligands are highly approximate and therefore prone to errors. A random forest model is applied here to predict which structural overlays of ligands, generated by three different computational overlay algorithms, are correct. These predictions are based on various descriptors that characterize the overlaid ligand, the template ligand, and the receptor protein. It is shown that, using random forests, the computationally obtained overlays can be efficiently classified as correct or incorrect, and that overlays classified as correct can then be accepted with a high degree of confidence.
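The overlay classification step described above can be illustrated with a minimal sketch of a random forest: an ensemble of small decision trees, each grown on a bootstrap sample and restricted to a random subset of descriptors at every split, with the final label decided by majority vote. This is not the thesis's implementation. The three descriptor columns, their values, and all function names below are hypothetical and invented purely for illustration; the actual descriptors and model settings are not given in this abstract.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, n_try, rng, depth=0, max_depth=3):
    """Grow a tree; every node considers a random subset of n_try descriptors."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: a plain class label
    features = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left],
                      n_try, rng, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right],
                      n_try, rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))  # common sqrt(#features) heuristic
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy descriptors per overlay (made-up numbers, not thesis data):
# [overlay score, shape similarity to template, pocket fit], all scaled to 0..1.
rows = [
    [0.82, 0.75, 0.90], [0.70, 0.88, 0.65], [0.91, 0.62, 0.78],
    [0.66, 0.80, 0.85], [0.77, 0.69, 0.72],
    [0.20, 0.35, 0.15], [0.38, 0.12, 0.30], [0.15, 0.28, 0.40],
    [0.33, 0.40, 0.22], [0.25, 0.18, 0.36],
]
labels = ["correct"] * 5 + ["incorrect"] * 5

forest = train_forest(rows, labels)
print(predict_forest(forest, [0.95, 0.90, 0.92]))
print(predict_forest(forest, [0.05, 0.10, 0.08]))
```

The toy data is deliberately well separated so the behaviour is easy to follow; real overlay descriptors would be noisier, and the fraction of trees voting for a class could then serve as a rough confidence estimate. For practical work, a tested library implementation such as scikit-learn's `RandomForestClassifier` would normally be preferred over a hand-written forest.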
Sangtae Kim, Purdue University.
Health Sciences, Pharmacology|Engineering, Chemical|Biology, Bioinformatics