Higher-Order Reasoning with Graph Data

Leonardo Cotta, Purdue University

Abstract

Graphs are the natural framework for many of today's highest-impact computing applications: online social networking, Web search, product recommendations, chemistry, bioinformatics, knowledge bases, and mobile ad-hoc networking. To develop successful applications in these domains, we often need representation learning methods: models mapping nodes, edges, subgraphs, or entire graphs to some meaningful vector space. Such models are studied in the machine learning subfield of graph representation learning (GRL). Previous GRL research has focused on learning node or entire-graph representations as associational tasks. In this work I study higher-order ((k > 1)-node) representations of graphs in the context of both associational and counterfactual tasks. Specifically, I tackle the following problems.

Unsupervised k-node representations. Existing Graph Neural Network (GNN) methods that learn inductive unsupervised graph representations focus on learning node and edge representations by predicting observed edges in the graph. Although such approaches have advanced downstream node classification tasks, they are ineffective at jointly representing larger k-node sets (k > 2). I propose MHM-GNN, an inductive unsupervised graph representation approach that combines joint k-node representations with energy-based models (hypergraph Markov networks) and GNNs. To address the intractability of the loss that arises from this combination, I endow the optimization with a loss upper bound using a finite-sample unbiased Markov chain Monte Carlo estimator. My experiments show that the unsupervised joint k-node representations of MHM-GNN are better than those of existing approaches in the literature.

Graph reconstruction for powerful graph representations. GNNs have limited expressive power, failing to represent many graph classes correctly. While more expressive GRL alternatives can distinguish some of these classes, they are significantly harder to implement, may not scale well, and have not been shown to outperform well-tuned GNNs on real-world tasks. Devising simple, scalable, and expressive GRL architectures that also achieve real-world improvements thus remains an open challenge. In this work, I show the extent to which graph reconstruction (reconstructing a graph from its subgraphs) can mitigate the theoretical and practical problems currently faced by GRL architectures. First, I leverage graph reconstruction to build two new classes of expressive graph representations. Second, I show how graph reconstruction boosts the expressive power of any GNN architecture while being a (provably) powerful inductive bias for invariance to node removals. Empirically, I show how reconstruction boosts a GNN's expressive power, while maintaining its invariance to node permutations, by solving seven graph property tasks the original GNN cannot solve. Further, I demonstrate how it improves state-of-the-art GNN performance across nine real-world benchmark datasets.

Learning the Effect of Interventions in Graphs through Counterfactual Lifting. Here I study the most pervasive higher-order problem in GRL: link prediction. The AI methods that predict missing links in graphs are arguably a key component of today's society, since they are used in knowledge base completion, remote sensing, drug-target interaction prediction, and recommendation systems that match Internet users to each other, to products, and to online content.
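The reconstruction idea in the abstract (representing a graph through its node-deleted subgraphs) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names are hypothetical, and each subgraph is summarized by a toy invariant (its sorted degree sequence) where the actual work would use a learned GNN encoder, aggregated with a permutation-invariant pooling.

```python
# Hedged sketch of reconstruction-based graph features, assuming a simple
# edge-list graph encoding. A real model would replace degree_sequence
# with a GNN encoder and the sorted-tuple aggregation with learned pooling.

def node_deleted_subgraphs(edges, n):
    """Yield the edge lists of the n one-node-deleted subgraphs."""
    for v in range(n):
        yield [(a, b) for (a, b) in edges if a != v and b != v]

def degree_sequence(edges, n):
    """Toy subgraph invariant: the sorted degree sequence over n nodes."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return tuple(sorted(deg))

def reconstruction_feature(edges, n):
    """Multiset of subgraph invariants, sorted so the result is
    invariant to permutations of the node labels."""
    return tuple(sorted(degree_sequence(sub, n)
                        for sub in node_deleted_subgraphs(edges, n)))
```

For example, the triangle and the 3-node path receive different features (every node-deleted subgraph of the triangle is a single edge, while deleting the path's middle node leaves no edges), and relabeling a graph's nodes leaves its feature unchanged.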

Degree

Ph.D.

Advisors

Ribeiro, Purdue University.

Subject Area

Bioinformatics|Artificial intelligence|Information science|Web Studies

