Search for conserved patterns in RNA structures
Abstract
RNA molecules play intricate roles in many cellular processes, but unlike for proteins and DNA, our ability to predict and compare RNA structures is inadequate. Divergent RNA molecules with similar functions are likely to have similar structures but may show no detectable resemblance in sequence, so approaches to prediction and comparison based on sequence homology are not very reliable. Moreover, predicted Minimum Free Energy (MFE) structures do not include pseudoknots, and a large fraction of biologically relevant stems may be missing because of restrictions of the algorithms and inaccuracies in the energy parameters. A reasonable strategy, therefore, is to analyze the ensemble of predicted suboptimal structures to find additional biologically relevant stems, especially pseudoknots. We use a graph framework (the XIOS RNA graph) to represent these structures and learn biologically important motifs by comparative analysis. We convert RNA structures to XIOS RNA graphs, in which each stem is a vertex and the relationships between stems are represented as different types of edges, including pseudoknots and mutually exclusive relationships between stems. Hence, an ensemble of RNA structures can be represented in a single graph. First, we generate a comprehensive representation of an RNA, which includes all biological stems, from predicted suboptimal structures. Since individual suboptimal structures do not contain pseudoknots, we incorporate likely pseudoknots by considering all unique stems in the ensemble of structures. We find that, at the cost of overprediction, the accuracy of predicting stems and pseudoknots increases as more suboptimal structures are considered. We reduce the complexity of XIOS RNA graphs by removing similar stems while preserving those that are biologically important. To do so, we enumerate the topologies of all possible stems that can be formed with respect to another stem. Then, we identify the instances where the topological space can be reduced by merging similar stems, creating the basis for a set of heuristic rules for contracting these graphs. We have also developed methods to remove infrequent base pairs from the ensemble. We have developed an RNA structure comparison tool, XIOSMatch, by optimizing, extending, and parallelizing a maximal subgraph isomorphism algorithm (gSpan), and we use it to identify the largest topological match in a set of RNA graphs. We apply our tool to different RNAs and demonstrate that the conserved motifs discovered for various RNA species are likely to have functional and structural significance.
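The stem-graph representation described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The `Stem` record, the coordinate convention, the functions `relation` and `build_graph`, and the single-letter edge labels (X for exclusive, I for included, O for overlapping, i.e. pseudoknotted, S for serial) are all assumptions made for illustration. The sketch pools the unique stems from an ensemble of structures into one vertex set and labels every pair of stems with one relationship.

```python
from itertools import combinations
from typing import NamedTuple


class Stem(NamedTuple):
    """Hypothetical stem record: the 5' half covers positions l5..r5
    and the 3' half covers positions l3..r3 (inclusive, l5 <= r5 < l3 <= r3)."""
    l5: int
    r5: int
    l3: int
    r3: int


def _shares_positions(lo1: int, hi1: int, lo2: int, hi2: int) -> bool:
    # Two inclusive intervals share at least one position.
    return lo1 <= hi2 and lo2 <= hi1


def relation(a: Stem, b: Stem) -> str:
    """Classify a pair of stems with an assumed one-letter edge label."""
    if b < a:  # order the pair so that `a` starts first
        a, b = b, a
    halves_a = [(a.l5, a.r5), (a.l3, a.r3)]
    halves_b = [(b.l5, b.r5), (b.l3, b.r3)]
    # Stems that use the same bases cannot coexist in one structure.
    if any(_shares_positions(*ha, *hb) for ha in halves_a for hb in halves_b):
        return "X"
    if b.l5 > a.r3:  # b lies entirely after a
        return "S"
    if b.r3 < a.l3:  # b lies entirely inside the loop closed by a
        return "I"
    return "O"       # b starts inside a and ends after it: pseudoknot


def build_graph(ensemble):
    """Pool the unique stems of all structures and label every stem pair.

    `ensemble` is an iterable of structures, each a list of Stem records.
    Returns (vertices, edges) with edges keyed by vertex index pairs.
    """
    vertices = sorted({stem for structure in ensemble for stem in structure})
    edges = {
        (i, j): relation(vertices[i], vertices[j])
        for i, j in combinations(range(len(vertices)), 2)
    }
    return vertices, edges


# Illustrative coordinates only, not taken from any real RNA.
s1 = Stem(1, 5, 20, 24)
s2 = Stem(8, 10, 14, 16)    # nested inside s1
s3 = Stem(30, 33, 40, 43)   # downstream of s1
s4 = Stem(12, 13, 27, 28)   # crosses s1, forming a pseudoknot
s5 = Stem(3, 6, 18, 21)     # reuses bases of s1

vertices, edges = build_graph([[s1, s2], [s1, s4], [s3]])
```

Because the vertex set is the union of stems over all structures, a pseudoknot edge such as the one between `s1` and `s4` can appear in the pooled graph even when no single input structure contains both stems in a crossing arrangement.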
Degree
Ph.D.
Advisors
Gribskov, Purdue University.
Subject Area
Bioinformatics