Statistical Methods for Effective Protein Biomarker Discovery

Lin-Yang Cheng, Purdue University


Effective discovery of protein biomarkers whose expression profiles correlate with (disease) conditions is of great interest for many scientific purposes, such as drug discovery, diagnosis, and prognosis. Data-independent acquisition (DIA) is the most advanced experimental approach to date for mass spectrometry-based proteomics: it can quantify protein markers across a much wider mass range, providing opportunities to discover important markers by exploring thousands of unknown ones. However, the process of biomarker discovery is complex and costly. High-throughput bottom-up experiments regularly produce noisy or incorrect signals in DIA data, which then lead to problematic results in subsequent statistical analysis. Meanwhile, the data configuration is so complicated that the most effective statistical method for analyzing the data at hand must be determined on a case-by-case basis. Finally, biomarker discovery using DIA data faces sparsity and high-dimensionality problems, which often cause suboptimal results. Most importantly, we observe that even the best of the currently popular statistical methods either mistakenly discover too many irrelevant markers or fail to identify all the relevant markers. This dissertation aims to simplify the process of biomarker discovery. First, we propose a regularization framework for removing the noisy and deleterious signals incurred during the DIA experiment. Next, we propose a permutation-based method for determining the most effective statistical method for discovering informative markers. Lastly, we propose an effective likelihood ratio-based discriminant statistic, which also has desirable asymptotic properties, for directly finding all relevant markers with a much lower false discovery rate than existing methods.




Xi, Purdue University.


