Neural Representation Learning for Semi-Supervised Node Classification and Explainability

Hogun Park, Purdue University

Abstract

Many real-world domains are relational, consisting of objects (e.g., users and papers) linked to each other in various ways. Because class labels in graphs are often available for only a subset of the nodes, semi-supervised learning for graphs has been studied extensively as a way to predict the unobserved class labels. For example, we can predict political views in a partially labeled social graph, or estimate the expected gross income of movies in an actor/movie graph with only a few labels. Recent advances in representation learning for graph data have made great strides in semi-supervised node classification. However, most methods have focused on learning node representations from simple relational properties (e.g., random walks) or by aggregating nearby attributes. It remains challenging to learn complex interaction patterns in partially labeled graphs and to provide explanations for the learned representations.

In this dissertation, multiple methods are proposed to alleviate both challenges for semi-supervised node classification. First, we propose a graph neural network architecture, REGNN, that leverages local inferences for unlabeled nodes. REGNN performs graph convolution to enable label propagation via high-order paths and predicts class labels for unlabeled nodes. In particular, the proposed attention layer of REGNN measures role equivalence among nodes and effectively reduces the noise generated when observed labels are aggregated from neighbors at various distances. Second, we propose a neural network architecture that jointly captures both temporal and static interaction patterns, which we call Temporal-Static-Graph-Net (TSGNet). The architecture learns a latent representation of each node in order to encode complex interaction patterns. Our key insight is that combining a static neighbor encoder, which learns aggregate neighbor patterns, with a graph neural network-based recurrent unit, which captures complex interaction patterns, improves the performance of node classification.

Lastly, although representation learning performs better on node classification tasks, neural network-based representation learning models are still less interpretable than earlier relational learning models because explanation methods are lacking. To address this problem, we show that nodes with high bridgeness scores have larger impacts under perturbation on node embeddings such as DeepWalk [1], LINE [2], Struc2Vec [3], and PTE [4]. Because bridgeness scores are computationally expensive to obtain, we propose a novel gradient-based explanation method, GRAPH-wGD, to find nodes with high bridgeness efficiently. In our evaluations, the proposed architectures (REGNN and TSGNet) for semi-supervised node classification consistently improve predictive performance on real-world datasets. GRAPH-wGD also identifies important nodes as global explanations: perturbing the highly ranked nodes and re-learning low-dimensional node representations with DeepWalk and LINE significantly changes both the predicted probabilities on node classification tasks and the k-nearest neighbors in the embedding space.
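The k-nearest-neighbor comparison in the embedding space can be illustrated with a minimal sketch. It is not the dissertation's evaluation code: the function names `knn_indices` and `knn_change`, the use of cosine similarity, and the "fraction of changed neighbors" score are all assumptions made for illustration. Given node embeddings learned before and after a perturbation, it reports for each node how much of its neighbor set changed.

```python
import numpy as np


def knn_indices(emb: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of each row (cosine similarity; an assumption)."""
    # Normalize rows so that the dot product equals cosine similarity.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # A node is never counted as its own neighbor.
    np.fill_diagonal(sim, -np.inf)
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-sim, axis=1)[:, :k]


def knn_change(emb_before: np.ndarray, emb_after: np.ndarray, k: int) -> np.ndarray:
    """Per-node fraction of the k nearest neighbors that differ between two embeddings."""
    before = knn_indices(emb_before, k)
    after = knn_indices(emb_after, k)
    changed = np.empty(before.shape[0])
    for i, (b, a) in enumerate(zip(before, after)):
        overlap = len(set(b.tolist()) & set(a.tolist()))
        changed[i] = 1.0 - overlap / k
    return changed


if __name__ == "__main__":
    # Toy embeddings standing in for representations re-learned after a perturbation.
    rng = np.random.default_rng(0)
    emb_before = rng.normal(size=(6, 4))
    emb_after = emb_before + rng.normal(scale=0.5, size=(6, 4))
    print(knn_change(emb_before, emb_after, k=2))
```

A score of 0 means a node kept all of its neighbors and 1 means every neighbor was replaced; averaging the scores over all nodes gives one possible summary of how strongly a perturbation reshapes the embedding space.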

Degree

Ph.D.

Advisors

Neville, Purdue University.

Subject Area

Artificial intelligence|Pedagogy|Internet and social media studies
