Reverse engineering gene networks using genomic time-course data

Andrea Rau, Purdue University


Gene regulatory networks are collections of genes that interact, whether directly or indirectly, with each other and with other substances in the cell. Such gene-to-gene interactions play an important role in a variety of biological processes, as they regulate the rate and degree to which genes are transcribed and proteins are created. By measuring gene expression over time, it may be possible to reverse engineer, or infer, the structure of the gene network involved in a particular cellular process. With the development of microarray and next-generation sequencing technologies, it has become possible to conduct longitudinal experiments that measure the expression of thousands of genes simultaneously over time. However, due to the high dimensionality of gene expression data, the limited number of biological replicates and time points typically measured, and the complexity of biological systems themselves, the problem of reverse engineering networks from transcriptomic data demands a specialized suite of statistical tools and methodologies. Two methods are proposed that use directed graphical models of stochastic processes, known as dynamic Bayesian networks, and first-order linear models to represent gene regulatory networks. In the first method, an algorithm is developed based on a hierarchical Bayesian framework for a Gaussian state space model. Hyperparameters are estimated using an empirical Bayes procedure, and the posterior distributions of the parameters are used to determine the presence or absence of gene-to-gene interactions. In the second method, a simulation-based approach known as Approximate Bayesian Computation based on Markov chain Monte Carlo sampling is adapted to the context of gene regulatory networks. Because no likelihood calculation is required, this method permits inference even for networks where no distributional assumptions are made.
The performance of the proposed approaches is investigated via simulations, and both methods are applied to real longitudinal expression data. The two methods, while not comparable, are complementary, and help illustrate the need for a variety of network inference methods adapted for different contexts.
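As an illustration of the second idea, the following is a minimal sketch of likelihood-free MCMC for a first-order linear network model. It is not the dissertation's algorithm: the function names, the Gaussian prior on the interaction matrix, the root-mean-square distance, the tolerance, and all numeric settings are assumptions chosen for the example. A candidate interaction matrix is proposed by a symmetric random walk, data are simulated from it, and the candidate is considered only if the simulated series lies within a tolerance of the observed series, so no likelihood is ever evaluated.

```python
import numpy as np

def simulate_var1(A, x0, n_steps, noise_sd, rng):
    """Simulate x_t = A x_{t-1} + e_t with e_t ~ N(0, noise_sd^2 I)."""
    x = np.empty((n_steps, len(x0)))
    x[0] = x0
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + rng.normal(0.0, noise_sd, size=len(x0))
    return x

def log_prior(A, prior_sd=1.0):
    """Log density (up to a constant) of an assumed independent N(0, prior_sd^2) prior."""
    return -0.5 * np.sum(A ** 2) / prior_sd ** 2

def abc_mcmc(observed, A_init, n_iter, epsilon, step_sd, noise_sd, rng):
    """Likelihood-free MCMC over the interaction matrix A (illustrative only)."""
    x0, n_steps = observed[0], observed.shape[0]
    A = A_init.copy()
    samples = np.empty((n_iter,) + A.shape)
    n_accept = 0
    for i in range(n_iter):
        # Symmetric random-walk proposal, so the proposal densities cancel.
        proposal = A + rng.normal(0.0, step_sd, size=A.shape)
        simulated = simulate_var1(proposal, x0, n_steps, noise_sd, rng)
        # Assumed distance: root-mean-square difference between the two series.
        distance = np.sqrt(np.mean((simulated - observed) ** 2))
        if distance <= epsilon:
            # Metropolis step on the prior ratio only; the likelihood is never computed.
            if np.log(rng.uniform()) < log_prior(proposal) - log_prior(A):
                A = proposal
                n_accept += 1
        samples[i] = A
    return samples, n_accept / n_iter

# Toy usage on synthetic data from a hypothetical two-gene network.
rng = np.random.default_rng(0)
A_true = np.array([[0.8, -0.3],
                   [0.0, 0.5]])
observed = simulate_var1(A_true, np.array([1.0, -1.0]), 10, 0.1, rng)
# Starting the chain at A_true is a convenience for this demo only.
samples, rate = abc_mcmc(observed, A_true, 200, 0.5, 0.05, 0.1, rng)
```

In a sketch like this, an interaction could be flagged when the sampled values of the corresponding matrix entry concentrate away from zero. The tolerance trades off acceptance rate against how closely the samples approximate the posterior.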




Doerge, Purdue University.


