Generating Attributed Networks: Modeling, Learning, and Sampling

Pablo D Robles-Granda, Purdue University

Abstract

Statistical models of networks are widely used to reason about the properties of complex systems, where the nodes represent entities (e.g., users), the links represent relationships (e.g., friendships), and the attributes represent properties of the nodes (e.g., professions). In particular, generative network models (GNMs) allow us to create synthetic graphs (structure only) for prediction, anonymization, testing, etc. To better understand the underlying properties of a system (e.g., a social network), it is crucial to develop GNMs that accurately capture the characteristics observed in real-world network structure and that incorporate information about the attributes of the system. Because networks arise in a wide variety of problems and domains, it is important to consider both efficiency and correctness. In this work we investigate three main statistical tasks where efficient and/or correct algorithms can be created: modeling of networks and attributed networks, parameter learning for the models, and sampling of synthetic data. Furthermore, we identify three open problems that will benefit from this work.

First, we study the primitives governing the behavior of edge-based GNMs and introduce a generalization. We create a unified GNM representation using Bayesian networks (BNs) with parametric symmetries and context-specific dependence. We use these two properties to design a universal, efficient, and correct sampling method. We provide two example transformations of existing GNMs to the new representation and design two new GNMs based on the representation.

Second, we analyze the modeling of attributed networks using hierarchical GNMs, a problem for which no prior solution exists. Our solution is general and can be applied to other types of GNMs as well. Sampling attributed networks using hierarchical GNMs is difficult because the probability mass is allocated to certain regions of the network space. In our solution we: 1) model attributed networks as an approximation to a joint distribution with marginals of structure (defined by a GNM) and attributes (defined by a probability function), and combine them using maximum entropy; and 2) propose CSAG, a sampling method adapted to hierarchical GNMs that jointly models attributes and structure better than the state of the art and maintains the variability of the GNM.

Finally, using the insights from the second problem, we propose a generalized representation and probabilistic description of attributed networks using copulas, which are an effective way to combine marginals to model joint distributions. The immediate benefit is not only a precise formulation of the joint distribution but also a simplification of estimation. We provide a non-iterative structural marginal to facilitate sampling. We also provide a theoretical analysis of the conditions that determine when we can or cannot model the autocorrelation of attributes.
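
As a generic illustration of how a copula combines fixed marginals into a joint distribution, the following minimal sketch uses a Gaussian copula with two Bernoulli marginals. It is not the dissertation's model or the CSAG method: the function names (`gaussian_copula_pairs`, `bernoulli_inv_cdf`), the choice of a Gaussian copula, and the parameter values are all assumptions made only for illustration.

```python
import math
import random
from statistics import NormalDist


def bernoulli_inv_cdf(p):
    """Return the inverse CDF of Bernoulli(p): maps u in (0, 1) to 1 with probability p."""
    return lambda u: 1 if u > 1.0 - p else 0


def gaussian_copula_pairs(n, rho, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n pairs (x, y) with the given marginals (hypothetical helper).

    Dependence between x and y is induced by a Gaussian copula with
    correlation rho; the marginals are fixed by the two inverse CDFs.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, used only for its CDF
    pairs = []
    for _ in range(n):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to correlated uniforms, then through each marginal's inverse CDF.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs


# Illustrative marginals (assumed values): P(x = 1) = 0.3, P(y = 1) = 0.6.
pairs = gaussian_copula_pairs(20000, 0.8, bernoulli_inv_cdf(0.3), bernoulli_inv_cdf(0.6))
mean_x = sum(x for x, _ in pairs) / len(pairs)
mean_y = sum(y for _, y in pairs) / len(pairs)
both = sum(1 for x, y in pairs if x == 1 and y == 1) / len(pairs)
```

In this sketch the marginal frequencies stay close to the chosen values of 0.3 and 0.6, while the frequency of both variables equalling 1 exceeds the product of the marginals (0.18) that independence would give. That separation of marginals from dependence is the property the abstract attributes to copulas.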

Degree

Ph.D.

Advisors

Jennifer Neville, Purdue University.

Subject Area

Statistics|Artificial intelligence|Computer science

