An algorithmic pipeline for analyzing multi-parametric flow cytometry data
Flow cytometry (FC) is a single-cell profiling platform that measures the phenotypes (protein expression levels) of individual cells among the millions of cells in biological samples. In the last several years, FC has begun to employ high-throughput technologies and to generate high-dimensional data, so the algorithms for analyzing these data have become a bottleneck. This dissertation addresses several computational challenges that arise in modern cytometry when mining information from high-dimensional, high-content biological data. It develops a collection of combinatorial and statistical algorithms for locating, matching, prototyping, and classifying cellular populations in multi-parametric flow cytometry data. These algorithms are assembled into a data analysis pipeline called flowMatch. The pipeline consists of five well-defined algorithmic modules for (1) transforming data to stabilize within-population variance, (2) identifying phenotypic cell populations with robust clustering algorithms, (3) registering cell populations across samples, (4) encapsulating a class of samples with templates, and (5) classifying samples based on their similarity to the templates. Each module of flowMatch is designed to perform a specific task independently of the other modules. However, the modules can also be employed sequentially, in the order listed above, to perform a complete data analysis. The flowMatch pipeline is available as an open-source R package in Bioconductor (http://www.bioconductor.org/). I have employed flowMatch for classifying leukemia samples, evaluating the effects of phosphorylation on T cells, classifying healthy immune profiles, comparing the impact of two treatments for multiple sclerosis, and classifying the vaccination status of HIV patients. In these analyses, the automated algorithms of the pipeline reach biologically meaningful conclusions quickly and efficiently.
The algorithms included in flowMatch can also be applied to problems outside flow cytometry, such as microarray data analysis and image recognition. This dissertation therefore contributes to the solution of fundamental problems in computational cytometry and related domains.
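To make the sequence of modules more concrete, the following is a minimal Python sketch of three of the five steps (variance stabilization, population matching, and template-based classification). It is an illustration under stated assumptions and not the flowMatch implementation, which is an R package. The function names, the arcsinh transform with a cofactor of 5.0, the brute-force one-to-one matching, and the toy data are all hypothetical choices made for this example. Clustering (module 2) and template construction (module 4) are skipped: each sample and template is assumed to be already summarized as a short list of population centroids of equal length.

```python
import math
from itertools import permutations


def stabilize(value, cofactor=5.0):
    # Module 1 stand-in: arcsinh is one common variance-stabilizing choice.
    # The cofactor of 5.0 is an arbitrary assumption, not a flowMatch default.
    return math.asinh(value / cofactor)


def match_cost(pops_a, pops_b):
    # Module 3 stand-in: cheapest one-to-one pairing of population centroids,
    # found by brute force over all orderings of pops_b. This is only sensible
    # for a handful of populations and assumes both lists have equal length.
    return min(
        sum(math.dist(a, b) for a, b in zip(pops_a, perm))
        for perm in permutations(pops_b)
    )


def classify(sample, templates):
    # Module 5 stand-in: choose the class whose template is cheapest to match.
    return min(templates, key=lambda label: match_cost(sample, templates[label]))


# Hypothetical toy data: each population is a centroid over two markers.
raw_templates = {
    "healthy": [(10.0, 200.0), (300.0, 15.0)],
    "disease": [(10.0, 200.0), (300.0, 400.0)],
}
templates = {
    label: [tuple(stabilize(x) for x in pop) for pop in pops]
    for label, pops in raw_templates.items()
}

raw_sample = [(280.0, 20.0), (12.0, 190.0)]
sample = [tuple(stabilize(x) for x in pop) for pop in raw_sample]

predicted = classify(sample, templates)
print(predicted)
```

In this toy setup the sample lists its populations in a different order than the templates, which is why a matching step is needed before any distance between a sample and a template can be computed.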
Pothen, Purdue University.