Integrative Analysis of Multimodal Biomedical Data with Machine Learning

Zhi Huang, Purdue University

Abstract

With the rapid development in high-throughput technologies and the next generation sequencing (NGS) during the past decades, the bottleneck for advances in computational biology and bioinformatics research has shifted from data collection to data analysis. As one of the central goals in precision health, understanding and interpreting high-dimensional biomedical data is of major interest in computational biology and bioinformatics domains. Since significant effort has been committed to harnessing biomedical data for multiple analyses, this thesis is aiming for developing new machine learning approaches to help discover and interpret the complex mechanisms and interactions behind the high dimensional features in biomedical data. Moreover, this thesis also studies the prediction of post-treatment response given histopathologic images with machine learning. Capturing the important features behind the biomedical data can be achieved in many ways such as network and correlation analyses, dimensionality reduction, image processing, etc. In this thesis, we accomplish the computation through co-expression analysis, survival analysis, and matrix decomposition in supervised and unsupervised learning manners. We use co-expression analysis as upfront feature engineering, implement survival regression in deep learning to predict patient survival and discover associated factors. By integrating Cox proportional hazards regression into non-negative matrix factorization algorithm, the latent clusters of human genes are uncovered. Using machine learning and automatic feature extraction workflow, we extract thirty-six image features from histopathologic images, and use them to predict post-treatment response. In addition, a web portal written by R language is built in order to bring convenience to future biomedical studies and analyses. In conclusion, driven by machine learning algorithms, this thesis focuses on the integrative analysis given multimodal biomedical data, especially the supervised cancer patient survival prognosis, the recognition of latent gene clusters, and the application of predicting posttreatment response from histopathologic images. The proposed computational algorithms present its superiority comparing to other state-of-the-art models, provide new insights toward the biomedical and cancer studies in the future.

Degree

Ph.D.

Advisors

Delp, Purdue University.

Subject Area

Artificial intelligence|Oncology

Off-Campus Purdue Users:
To access this dissertation, please log in to our
proxy server
.

Share

COinS