A flexible and versatile framework for statistical design and analysis of quantitative mass spectrometry-based proteomic experiments
Quantitative mass spectrometry (MS)-based proteomics is an indispensable technology for biological and clinical research. As the proteomics field grows, MS-based proteomic workflows are becoming more complex and diverse. The accuracy and throughput of MS measurements and of signal processing tools have increased dramatically. However, many existing statistical tools and workflows have not kept pace with these technological developments. There is therefore a need for flexible statistical tools that reflect diverse and complex workflows, are computationally efficient for large datasets, and maximize the reproducibility of the results. We propose a family of linear mixed effects models, together with a split-plot view of the experimental design, to represent measurements from quantitative MS-based proteomics. The whole plot part of the design reflects the structure of the biological variation in the experiment, such as a case-control, paired, or time-course design. The subplot part of the design reflects the structure of the technological variation, such as fragmentation patterns, labeling strategy, and the presence of multiple peptides per protein. We propose an estimation procedure that estimates the parameters of the subplot and whole plot parts of the design separately, in order to maximize the flexibility of the model, increase the speed of the analysis, and facilitate interpretation. The proposed modeling framework was validated using 9 controlled mixtures and 10 experimental datasets from targeted Selected Reaction Monitoring (SRM), Data-Dependent Acquisition (DDA or shotgun), and Data-Independent Acquisition (DIA or SWATH-MS), where signals were extracted with multiple signal processing tools. We implemented the proposed method in the software package MSstats, which checks the correctness of the user input, recognizes arbitrarily complex experimental designs, visualizes the data, and performs statistical modeling and inference. It is interoperable with other computational tools such as Skyline.
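The idea of estimating the subplot and whole plot parts separately can be illustrated with a minimal sketch. This is not the MSstats implementation or its API. It assumes a toy dataset for a single protein, uses a plain per-run mean of log2 feature intensities as a stand-in for the subplot step, and uses a pooled-variance two-group comparison as a stand-in for the whole plot model. All names (`rows`, `summarize_runs`, `compare_groups`) and all intensity values are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical toy data for one protein:
# (condition, run, feature, log2 intensity). Values are made up.
rows = [
    ("control", "run1", "pep1", 20.0), ("control", "run1", "pep2", 22.0),
    ("control", "run2", "pep1", 20.4), ("control", "run2", "pep2", 22.4),
    ("case", "run3", "pep1", 21.0), ("case", "run3", "pep2", 23.0),
    ("case", "run4", "pep1", 21.4), ("case", "run4", "pep2", 23.4),
]


def summarize_runs(rows):
    """Subplot step (assumed form): collapse all features of a run
    into one run-level value, here the mean log2 intensity."""
    by_run = defaultdict(list)
    condition_of = {}
    for condition, run, _feature, log2_intensity in rows:
        by_run[run].append(log2_intensity)
        condition_of[run] = condition
    return [(condition_of[run], run, mean(vals)) for run, vals in by_run.items()]


def compare_groups(run_summaries, group_a, group_b):
    """Whole plot step (assumed form): compare two conditions using
    only the run-level summaries, with a pooled variance estimate.
    Needs at least two runs per condition."""
    a = [y for cond, _run, y in run_summaries if cond == group_a]
    b = [y for cond, _run, y in run_summaries if cond == group_b]
    log2_fold_change = mean(a) - mean(b)
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    std_error = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return log2_fold_change, std_error, df


summaries = summarize_runs(rows)
log2_fc, se, df = compare_groups(summaries, "case", "control")
print(f"log2 fold change = {log2_fc:.3f}, SE = {se:.3f}, df = {df}")
```

Because the whole plot step sees only one value per run, the group comparison can be changed (for example to a paired or time-course model) without touching the feature-level summarization, which is the kind of flexibility the abstract attributes to the two-part procedure.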
Vitek, Purdue University.