Date of Award

8-2018

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

Committee Chair

Walid G. Aref

Committee Member 1

Sunil Prabhakar

Committee Member 2

Susanne E. Hambrusch

Committee Member 3

Tiark Rompf

Abstract

A variety of data management applications spanning various domains, e.g., social networks, transportation, and bioinformatics, have graphs as first-class citizens. Specialized graph databases can serve these applications to evaluate vital graph-traversal queries, e.g., shortest path and reachability queries. However, specialized graph databases are not as mature as the pervasive relational database systems. Although the literature has several proposals to process graph queries inside an RDBMS, none of these proposals process graphs natively inside an RDBMS for potentially better performance, which is particularly challenging due to the impedance mismatch between the relational and the graph models. Additionally, to scale for large graphs and heavy query-workloads, the state-of-the-art approaches pre-construct indexing structures to improve query latency. However, updates that dynamically change the underlying graphs may frequently invalidate these costly pre-constructed indexes. This dissertation presents GRFusion, an RDBMS that manages graphs as first-class citizens as well as novel methods to index and summarize dynamic graphs for efficient query processing, namely, EDP and SBG-Sketch.

GRFusion is based on VoltDB, an open-source in-memory relational database. This dissertation shows how the SQL query engine of GRFusion is empowered to declaratively define graphs and execute cross-data-model query plans acting on graphs and relations, resulting in up to four orders-of-magnitude in query-time speedup w.r.t. state-of-the-art approaches.

To scale for large graphs and heavy query-workloads, this dissertation presents Edge-Disjoint Partitioning (EDP) and Self-Balanced Graph Sketch (SBG-Sketch). EDP is a new technique for efficiently answering shortest-path queries with filtering predicates over dynamic graphs. EDP has two main components: a graph-partitioning index and an efficient traversal algorithm. EDP can dynamically handle various types of online graph updates, e.g., graph topology and attribute updates. Experimental results demonstrate that EDP can achieve query performance gains of up to four orders-of-magnitude in comparison to state-of-the-art techniques. SBG-Sketch is a graph sketch that summarizes edge-labeled graph streams, coping with highly imbalanced edge-labels. SBG-Sketch maintains synopsis for both the edge attributes (e.g., edge weight) as well as the topology of the streamed graph. SBG-Sketch allows efficient processing of frequency-based queries (e.g., edge counts) and graph-traversal queries (e.g., reachability queries). Experimental results over a variety of real labeled-graph streams show SBG-Sketch to reduce the estimation errors of the state-of-the-art methods by up to 99%.

Share

COinS