Comparative analysis of biological networks

Mehmet Koyuturk, Purdue University

Abstract

Recent developments in molecular biology have resulted in experimental data that capture the relationships and interactions between biomolecules. Biomolecular interaction data, generally referred to as biological or cellular networks, are frequently abstracted using graph models. In systems biology, comparative analysis of these networks provides an understanding of functional modularity in the cell by integrating cellular organization, functional hierarchy, and evolutionary conservation. In this dissertation, we address a number of algorithmic issues associated with comparative analysis of molecular interaction networks. We first discuss the problem of identifying common sub-networks in a collection of molecular interaction networks belonging to diverse species. The main algorithmic challenges here stem from the exponential worst-case complexity of the underlying mining problem involving large patterns, as well as the NP-hardness of the subgraph isomorphism problem. Three decades of research into the theoretical aspects of this problem have highlighted the futility of syntactic approaches, thus motivating the use of semantic information. Using a biologically motivated ortholog-contraction technique for relating proteins across species, we render this problem tractable. We experimentally show that the proposed method can be used as a pruning heuristic that accelerates existing techniques significantly, as well as a stand-alone tool that conveys significant biological insights at near-interactive rates. With a view to understanding the conservation and divergence of functional modules, we also develop network alignment techniques grounded in theoretical models of network evolution. Through graph-theoretic modeling of evolutionary events in terms of matches, mismatches, and duplications, we reduce the alignment problem to a graph optimization problem and develop effective heuristics to solve it efficiently.
We probabilistically analyze the existence of highly connected and conserved subgraphs in random graphs in order to assess the statistical significance of the patterns identified by our algorithms. Our methods and algorithms are implemented on various platforms and tested extensively on a comprehensive collection of molecular interaction data, illustrating the effectiveness of the algorithms in terms of both novel biological insights and computational efficiency. The source code of the software described in this dissertation is available in the public domain and has been downloaded and effectively used by several researchers.
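
As a rough illustration of why ortholog contraction can make the mining problem tractable, the sketch below collapses every protein onto an ortholog-group label, so that each network has uniquely labeled nodes and finding conserved interactions reduces to counting identical labeled edges rather than testing subgraph isomorphism. This is only a minimal sketch under stated assumptions, not the dissertation's implementation: the function names, the `group_of` mapping, the protein identifiers, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter

def contract(edges, group_of):
    """Contract a protein interaction network onto ortholog groups.

    edges: iterable of (protein, protein) pairs (undirected interactions).
    group_of: hypothetical mapping from protein id to ortholog-group label.
    """
    contracted = set()
    for u, v in edges:
        gu, gv = group_of.get(u), group_of.get(v)
        if gu is None or gv is None:
            continue  # skip proteins with no ortholog-group assignment
        # frozenset makes the edge undirected; an interaction inside one
        # group collapses to a single-element set (a self-loop).
        contracted.add(frozenset((gu, gv)))
    return contracted

def conserved_edges(networks, group_of, min_support):
    """Return contracted edges that appear in at least min_support networks."""
    counts = Counter()
    for edges in networks:
        counts.update(contract(edges, group_of))  # each network counts once per edge
    return {edge for edge, count in counts.items() if count >= min_support}

# Toy data: two species, made-up protein names and ortholog groups A, B, C.
group_of = {
    "yA1": "A", "yA2": "A", "yB1": "B", "yC1": "C",
    "hA1": "A", "hB1": "B", "hC1": "C",
}
networks = [
    [("yA1", "yB1"), ("yA2", "yC1")],
    [("hA1", "hB1"), ("hB1", "hC1")],
]
result = conserved_edges(networks, group_of, min_support=2)
```

In this toy case only the interaction between groups A and B survives in both networks. Larger conserved sub-networks would be built from such edges by a frequent-itemset style search, which the sketch leaves out.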

Degree

Ph.D.

Advisors

Szpankowski, Purdue University.

Subject Area

Bioinformatics
