Advanced computational and machine learning tools in pharmaceutical informatics

Sridhar V. V. S Maddipati, Purdue University


Computational and informatics approaches are urgently needed in drug discovery research, both to anticipate the possible side effects of drugs a priori and to integrate the vast amounts of disparate data generated by high-throughput screening experiments and the ‘omic’ technologies. In this work, computational methods are proposed for two problems. (1) A priori prediction of side effects of cancer therapeutic drugs: Protein kinases are central targets for drug-based treatment of diseases such as cancer, diabetes and arthritis. However, recent high-throughput screening data reveal that most kinase inhibitors of pharmacological relevance exhibit high cross-reactivity, which frequently leads to toxic side effects because kinases play a critical role in many cell signaling events. In this work, the recently proposed theory of dehydrons is first illustrated by molecular dynamics simulations and then used to develop a structure-based predictor of cross-reactivity, which is validated against affinity fingerprinting of the kinases. Pharmacological distances are shown to be highly correlated with the patterns of dehydrons. (2) Analysis of high-throughput data on recombinant proteins: Recombinant proteins find several applications in the biotechnology and pharmaceutical industries. Recent high-throughput experiments can measure the properties of many of these artificial proteins. However, experiments spanning the entire combinatorial library of such proteins are expensive, so building statistical models from the limited experimental data is of utmost importance. In this work, a statistical classification model built using Support Vector Machines is used to analyze the folding patterns in recombinants of Cytochrome P450. This model can further be used to formulate a design problem for finding optimal recombinants.
The methods (kernels) developed in this work can be generalized to arrive at new kernels, called ‘categorical kernels’, which are found to be applicable to general pattern recognition problems such as handwritten digit recognition. Since the kernel methods can be applied to categorical, numerical and even continuous attributes, they can potentially be employed to make cross-reactivity predictions for other protein families for which only a few crystal structures are available. These kernels could bridge high-throughput experimental data with sequence and structure information.
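
The abstract does not give the form of the categorical kernels, so the following is only a minimal sketch of one standard kernel on categorical attributes, the overlap (matching) kernel, and is not the kernel developed in the dissertation. The function names and the example sequences are hypothetical; each letter is assumed to record which parent protein contributed a given sequence block of a recombinant.

```python
def overlap_kernel(x, y):
    """Fraction of positions at which two equal-length categorical sequences agree.

    This equals the inner product of the one-hot encodings of x and y,
    scaled by 1/len(x), so it is a valid (positive semidefinite) kernel.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    if len(x) == 0:
        raise ValueError("sequences must be non-empty")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def gram_matrix(samples, kernel=overlap_kernel):
    """Pairwise kernel values for a list of samples, as a list of lists."""
    n = len(samples)
    return [[kernel(samples[i], samples[j]) for j in range(n)] for i in range(n)]


# Hypothetical recombinants built from three parents A, B and C
# (illustrative data only, not taken from the dissertation).
chimeras = ["AABBC", "AABBA", "CCBAA"]
K = gram_matrix(chimeras)

for row in K:
    print(row)
```

A Gram matrix of this kind could be handed to any Support Vector Machine implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, together with folded/unfolded labels from the experiments.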




Sangtae Kim, Purdue University; Venkat Venkatasubramanian, Purdue University

Subject Area

Engineering, Chemical|Biology, Bioinformatics
