Statistical design and analysis of next-generation sequencing data

Paul Livermore Auer, Purdue University

Abstract

Next-generation sequencing technologies have revolutionized genomic research. Although these technologies provide an unprecedented amount of genomic information, they also raise numerous statistical issues related to the high dimensionality, small sample sizes, and inherent variability of the data they produce. To address some of these issues, we first examine the effectiveness of arranging next-generation sequencing experiments according to well-known experimental designs in order to partition sources of variation and remove confounding with nuisance factors. Second, we propose a novel method for testing differential gene expression as measured by next-generation sequencing technologies. The advantages of this approach over other methods are demonstrated through simulations and applications to data from Homo sapiens and Mus musculus. Finally, two special cases of next-generation sequencing data are presented, demonstrating the flexibility of these data to answer diverse biological questions.

Degree

Ph.D.

Advisors

Doerge, Purdue University.

Subject Area

Statistics

