Comprehensive Computational Analysis of Chromatin-enriched RNAs Reveal Both Active and Repressive Cis-regulatory Non-Coding RNAs
Abstract
Nuclear RNA-seq has revealed thousands of potentially regulatory long noncoding RNAs (lncRNAs). Nuclear-retained lncRNAs may interact with various chromatin regulatory proteins and recruit them to cis-regulatory elements to regulate gene expression. We are interested in analyzing nuclear RNA-seq data to identify chromatin-associated lncRNAs (cheRNAs) that share enhancer features and transcription-factor dependence and are thus indicators of cis-acting loci. Nuclear RNA-seq requires rigorous and effective pipelines that differ from the conventional pipelines used for total RNA-seq datasets, but a thorough survey of analytic pipelines for nuclear RNA-seq has not been performed.
The existing computational pipeline (Werner) has important biases. To address the flaws in the Werner pipeline, we have developed three new pipelines (referred to here as Tuxedo, Concatenating, and Taco) to analyze nuclear RNA-seq datasets. In this study, we survey the four nuclear RNA-seq analytic pipelines for cheRNA identification and use the optimal scheme to explore new structural features of cheRNAs that have high cis-regulatory potential.
To evaluate the transcriptomes assembled by the four pipelines, we used RNA from K562 cells as an example. The Tuxedo pipeline assembles a complete transcriptome, including 10.9k unannotated lncRNAs. Transcripts assembled by Tuxedo, compared to those from the other three pipelines, showed the highest fraction of ongoing transcription by Pol II and the highest level of nascent transcription by GRO-seq, demonstrating that the Tuxedo-assembled transcriptome is more concordant with the active transcription signal represented by traditional measurements.
Comparing the four pipelines, Tuxedo also outperforms the others in constructing assemblies that are enriched in enhancer hallmarks.
ROC analysis, using the pool of predicted transcripts identified by all four methods, shows that Tuxedo identifies cheRNAs precisely while recapturing three known genomic features of active enhancers.
Applying the Tuxedo approach to the K562 dataset, we found that intergenic cheRNA (icheRNA) expression is more positively correlated with the transcription of the neighboring gene than with that of a randomly selected gene. This demonstrates, for the first time, a quantitative cis-regulatory effect of cheRNA expression. A similar analysis of FANTOM- or ChromHMM-predicted eRNAs, which are believed to have a cis-regulatory enhancer effect, shows a similar but weaker positive correlation. In contrast, intergenic chromatin-depleted RNAs (isneRNAs) and their neighboring genes show a negative correlation.
Genomic regions with abundant H3K9me3 modification, which is usually associated with condensed, inactive chromatin, can be actively transcribed. IcheRNAs with high levels of H3K9me3 in the gene body are transcribed at dramatically higher levels than those with lower levels; this is seen in both the soluble nuclear extract and the chromatin pellet, indicating that, contrary to expectation, icheRNAs are actively transcribed from regions with high H3K9me3 modification. One hypothesis for the unexpected H3K9me3 signal around icheRNAs is that they may be embedded in condensed domains derived from mobile elements.
We observed that the TSS of mRNAs colocalized with antisense cheRNAs (as-cheRNAs) is significantly less open (as measured by ATAC-seq signal), has fewer active transcription marks (Pol II, H3K4me3), and has more repressive marks (H3K27me3) and PRC2 complex binding (SUZ12, EZH2) compared with random mRNAs. This pattern is not observed in mRNAs colocalized with antisense chromatin-depleted RNAs (as-sneRNAs), suggesting that as-cheRNAs may be cis-regulatory elements that interfere with transcription of colocalized mRNAs on the opposite strand by recruiting the PRC2 complex.
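The neighbor-versus-random comparison described above can be illustrated with a minimal sketch. The abstract does not state how the correlation was computed, so this example makes several assumptions: Pearson correlation over per-locus expression values, a single chromosome, and "neighboring gene" defined as the gene with the nearest TSS. All function names, parameters, and the toy values are hypothetical and are not taken from the dissertation.

```python
import random
from bisect import bisect_left


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    # Convention chosen for this sketch: a constant input gives 0.0.
    return cov / denom if denom else 0.0


def nearest_gene_index(position, gene_positions):
    """Index of the gene whose TSS is closest to `position`.

    `gene_positions` must be sorted in ascending order.
    """
    i = bisect_left(gene_positions, position)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(gene_positions)]
    return min(candidates, key=lambda j: abs(gene_positions[j] - position))


def neighbor_vs_random(rnas, genes, n_shuffles=200, seed=0):
    """Compare RNA-to-nearest-gene correlation with a random-gene control.

    `rnas` and `genes` are lists of (position, expression) tuples.
    Returns (neighbor_r, mean_random_r).
    """
    genes = sorted(genes)
    gene_pos = [p for p, _ in genes]
    gene_expr = [e for _, e in genes]
    rna_expr = [e for _, e in rnas]

    # Pair every RNA with the expression of its nearest gene.
    neighbor_expr = [gene_expr[nearest_gene_index(p, gene_pos)] for p, _ in rnas]
    neighbor_r = pearson(rna_expr, neighbor_expr)

    # Control: pair every RNA with a randomly selected gene, repeated.
    rng = random.Random(seed)
    random_rs = []
    for _ in range(n_shuffles):
        picked = [rng.choice(gene_expr) for _ in rnas]
        random_rs.append(pearson(rna_expr, picked))
    return neighbor_r, sum(random_rs) / n_shuffles


# Toy values invented purely for illustration; they are not results.
toy_genes = [(1000, 5.0), (5000, 1.0), (9000, 8.0), (13000, 3.0), (17000, 6.0)]
toy_rnas = [(1200, 2.5), (5300, 0.5), (8800, 4.0), (13100, 1.5), (16500, 3.0)]
neighbor_r, random_r = neighbor_vs_random(toy_rnas, toy_genes)
print(neighbor_r, random_r)
```

A real analysis would need to handle multiple chromosomes, strand, and expression normalization, and would likely use a statistical test rather than a comparison of two summary values.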
Degree
Ph.D.
Advisors
Gribskov, Purdue University.
Subject Area
Energy|Biology|Bioinformatics|Agronomy|Biochemistry|Genetics