Sequencing-Based Gene Discovery and Gene Regulatory Variation Exploration in Pedigreed Populations

Robert Ebow McEwan, Purdue University

Abstract

Forward genetic discovery of the molecular basis of induced mutants has fundamentally contributed to our understanding of basic biological processes such as metabolism, cell dynamics, growth, and development. Advances in Next-Generation Sequencing (NGS) technologies have enabled rapid genome sequencing but also come with limitations such as sequencing errors, dependence on reference genome accuracy, and alignment errors. By incorporating pedigree information to help correct for some of these errors, I optimized variant calling and filtering strategies to suit the experimental design. This led to the identification of multiple causative alleles, the detection of pedigree errors, and the ability to explore the mutational spectrum of multiple mutagens in Arabidopsis. Similar to the problems in forward genetic discovery of mutant alleles, variation in genomes complicates the analysis of gene expression affected by natural variation. The plant hypersensitive response (HR) is a highly localized and rapid form of programmed cell death that plants use to contain biotrophic pathogens. Substantial natural variation exists in the mechanisms that trigger and control HR, yet a complete understanding of the molecular mechanisms modulating HR is lacking. I explored the gene expression consequences of the plant HR in maize using a semi-dominant mutant encoding a constitutively active HR-inducing Nucleotide Binding Site Leucine Rich Repeat protein, Rp1-D21, derived from the receptor responsible for perceiving certain strains of the common rust Puccinia sorghi. Differentially expressed genes (DEG) in response to Rp1-D21 were identified in different genetic backgrounds and hybrids that exhibit divergent enhancing (NC350) or suppressing (H95, B73) effects on the visual manifestations of HR. To enable this analysis, I created anonymized reference genomes for each comparison so that the reference genome induced less bias in the mapping steps.
Comprehensive identification of DEG corroborated the visual phenotypes and provided the identities of genes influential in the plant hypersensitive response for further studies. The locations of expression quantitative trait loci (eQTL) that determined the differential response of NC350 and B73 were identified using 198 F1 families generated by crossing the B73 x NC350 RIL population to Rp1-D21/+ in H95. This identified 3514 eQTL controlling the variability in differential expression between mutant and wild-type. Trans-eQTL were strikingly clustered in the genome, forming 17 hotspots, each of which influenced more than 200 genes. A single locus significantly affected expression variation in 5700 genes, 5396 (94.7%) of which were DEG. An allele-specific expression (ASE) analysis of NC350 x H95 and B73 x H95 F1 hybrids with and without Rp1-D21 identified cis-eQTL and ASE at a subset of these genes. Bias in the confirmation of eQTL by ASE was still present despite the anonymized reference genomes, indicating that additional efforts to improve signal processing in these experiments are needed.

Degree

Ph.D.

Advisors

Dilkes, Purdue University.

Subject Area

Genetics|Bioinformatics|Electrical engineering
