Machine Learning Methods for Better Drug Prioritization

Junfeng Liu, Purdue University

Abstract

Effective prioritization is critical in drug discovery and precision medicine. Various computational tools have been developed for both the development and the use of drugs. In the early stages of drug discovery, compound prioritization is widely used in high-throughput screening to help identify drug candidates for further investigation. To become a successful drug, a compound has to exhibit favorable biological properties (e.g., activity, selectivity, and toxicity). Compound prioritization methods rank drug candidates by such properties so that compounds with more drug-like properties are prioritized over those that are less likely to become drugs. After drugs are developed, drug prioritization is also essential for developing better treatment plans in precision medicine, one of whose primary goals is to select the right drugs for the right patients. For instance, when selecting drugs for patients with different cancer types, drugs to which a given cancer type is sensitive should be prioritized over drugs to which it is insensitive, even if the latter are effective against other cancer types. The current development of computational methods for compound prioritization and drug prioritization suffers from three major issues, and we have developed novel machine learning methods to tackle each of them.

First, existing methods for compound prioritization focus largely on devising advanced ranking algorithms that better learn the ordering among compounds. However, such methods are fundamentally limited by the scarcity of available data, particularly when screening is conducted at a relatively small scale over known promising compounds. To tackle this problem, we explore the structure of the bioassay space and leverage it to improve the ranking performance of an existing strong ranking algorithm. We do so by identifying assistance bioassays and assistance compounds and incorporating them into the existing ranking algorithm, which helps overcome the data scarcity. Along this line, we developed a machine learning framework, MACPAU, which consists of a suite of assistance bioassay selection methods and assistance compound selection methods. Our experiments demonstrate an overall 8.34% improvement in ranking performance over the current state of the art.

Second, current computational methods for compound prioritization usually rank compounds based on one property, typically activity, with respect to a single target. However, compound selectivity is also a key property that should be considered simultaneously so as to minimize the likelihood of undesired side effects of future drugs. To solve this problem, we present a novel machine learning-based differential compound prioritization method, dCPPP. dCPPP learns compound prioritization models that rank active compounds well and, at the same time, rank selective compounds higher via a bi-directional selectivity push strategy. The bi-directional push is enhanced by push powers that are determined by the ranking differences of selective compounds over multiple bioassays. Our experiments demonstrate that dCPPP achieves significant improvement in prioritizing selective compounds over baseline models.

Third, conventional methods for drug selection are unable to effectively prioritize sensitive drugs over insensitive drugs or to differentiate the ordering among sensitive drugs. We formulate cancer drug selection as the problem of accurately predicting 1) the ranking positions of sensitive drugs and 2) the ranking order among sensitive drugs in cancer cell lines, based on the cell lines' responses to cancer drugs. We have developed a new learning-to-rank method, denoted pLETORg, that predicts the drug ranking structure in each cell line using drug latent vectors and cell line latent vectors. pLETORg learns these latent vectors by explicitly enforcing that, in the drug ranking list of each cell line, sensitive drugs are pushed above insensitive drugs while the ranking order among sensitive drugs is correct. Genomics information on cell lines is leveraged in learning the latent vectors. Our experimental results on a benchmark cell line-drug response dataset demonstrate that pLETORg significantly outperforms the state-of-the-art method in prioritizing new sensitive drugs.
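
As a rough illustration of the latent-vector ranking idea described in the abstract, the sketch below scores each drug for a cell line by the dot product of the two latent vectors and measures how well sensitive drugs are pushed above insensitive ones. It is not the pLETORg objective: the pairwise logistic loss, the function names (`score`, `push_loss`, `rank_drugs`), and the toy vectors are all assumptions chosen for illustration, and the sketch omits both the ordering among sensitive drugs and the use of genomics information.

```python
import math

def score(cell_vec, drug_vec):
    """Predicted response: dot product of cell-line and drug latent vectors."""
    return sum(c * d for c, d in zip(cell_vec, drug_vec))

def push_loss(cell_vec, drug_vecs, sensitive):
    """Mean logistic loss over (sensitive, insensitive) drug pairs.

    The loss shrinks as sensitive drugs score higher than insensitive ones.
    This is a generic pairwise surrogate, assumed here for illustration.
    """
    scores = {name: score(cell_vec, vec) for name, vec in drug_vecs.items()}
    pos = [name for name in scores if name in sensitive]
    neg = [name for name in scores if name not in sensitive]
    if not pos or not neg:
        return 0.0
    total = 0.0
    for p in pos:
        for n in neg:
            margin = scores[p] - scores[n]
            total += math.log1p(math.exp(-margin))
    return total / (len(pos) * len(neg))

def rank_drugs(cell_vec, drug_vecs):
    """Drug names sorted from highest to lowest predicted score."""
    return sorted(drug_vecs,
                  key=lambda name: score(cell_vec, drug_vecs[name]),
                  reverse=True)

# Hypothetical two-dimensional latent vectors for one cell line and three drugs.
cell = [1.0, 0.5]
drugs = {
    "drug_a": [0.9, 0.8],
    "drug_b": [0.1, 0.2],
    "drug_c": [-0.5, 0.3],
}
ranking = rank_drugs(cell, drugs)
loss = push_loss(cell, drugs, sensitive={"drug_a"})
```

In a trained model the latent vectors would be learned by minimizing a loss of this kind over all cell lines; here they are fixed by hand only to show how a ranking and a push penalty follow from them.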

Degree

M.S.

Advisors

Ning, Purdue University.

Subject Area

Computer science

