Network hypothesis testing for relational data

Sebastian Ignacio Moreno Araya, Purdue University

Abstract

Recent interest in networks (social, physical, and others) has led to a great deal of research on the analysis and modeling of graphs. However, most studies analyze a single network rather than a population of networks. Although some studies compare networks from different domains or samples, they have mainly consisted of empirical exploration of graph characteristics. As a result, there are few statistical methods for determining whether two networks are likely to have been sampled from the same probability distribution. Such a method would be useful in several contexts, for example, to uncover behavior in graphs by comparing the effects of network modifications. The challenges in developing this type of statistical method include the difficulty of obtaining a set of networks from which to learn the graph distribution and the statistical limitations of models in capturing variation across networks. In this dissertation, we propose a new model-based statistical hypothesis testing approach for network similarity. We start with an analysis of a common generative model used in the literature. This analysis reveals the lack of variation in the generation process of current models. Using this knowledge, we create a new generative model, based on Kronecker products, capable of modeling the mean and variance of the distribution from a single given network. Unfortunately, the quadratic runtime of a straightforward sampling process for Kronecker models limits our proposed hypothesis testing framework to small networks. To solve this, we propose a new representation of Kronecker models that facilitates the implementation of new sampling algorithms, which replicate the original Kronecker model distribution and run in time proportional to the number of edges. Using the new model and sampling algorithms, we generate an empirical sampling distribution of graphs that are likely to occur. Then, based on this distribution, we develop a method to test the null hypothesis that a new network is generated from the same distribution as an observed training network.
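The overall shape of such a test can be illustrated with a minimal sketch. It is not the dissertation's model or algorithm: it uses a plain Kronecker-power edge-probability matrix, the straightforward quadratic sampler that the abstract describes as the bottleneck, and a simple Monte Carlo p-value. The initiator matrix `theta`, the choice of edge count as the test statistic, and all function names are assumptions made for illustration only. The abstract does not say which statistic is used or how the model is fitted, so the initiator is taken as given here.

```python
import random


def kronecker_power(theta, k):
    """Return the k-th Kronecker power of a square matrix given as a list of lists."""
    result = [[1.0]]
    for _ in range(k):
        # Row (i, p) and column (j, q) of result (x) theta hold result[i][j] * theta[p][q].
        result = [
            [a * b for a in row_r for b in row_t]
            for row_r in result
            for row_t in theta
        ]
    return result


def sample_graph(prob, rng):
    """Naive sampler: one Bernoulli trial per cell, so quadratic in the number of nodes."""
    n = len(prob)
    return {(i, j) for i in range(n) for j in range(n) if rng.random() < prob[i][j]}


def edge_count(edges):
    """Illustrative test statistic (an assumption): the number of directed edges."""
    return len(edges)


def empirical_p_value(observed_stat, sampled_stats):
    """Two-sided Monte Carlo p-value with add-one smoothing."""
    center = sum(sampled_stats) / len(sampled_stats)
    observed_dev = abs(observed_stat - center)
    extreme = sum(1 for s in sampled_stats if abs(s - center) >= observed_dev)
    return (extreme + 1) / (len(sampled_stats) + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [[0.9, 0.5], [0.5, 0.2]]   # hypothetical initiator, assumed already fitted
    prob = kronecker_power(theta, 4)   # 16 x 16 edge-probability matrix

    # Empirical sampling distribution of the statistic under the null model.
    null_stats = [edge_count(sample_graph(prob, rng)) for _ in range(200)]

    # Stand-in for a new observed network; here it is drawn from the same model.
    new_graph = sample_graph(prob, rng)
    p = empirical_p_value(edge_count(new_graph), null_stats)
    print(f"empirical p-value: {p:.3f}")
```

A small p-value would count as evidence against the null hypothesis that the new network comes from the same distribution as the training network. Because every Kronecker-power sample here is drawn from one fixed probability matrix, the sketch shows only the testing loop and does not model the variance across networks that the proposed model is designed to capture.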

Degree

Ph.D.

Advisors

Neville, Purdue University.

Subject Area

Computer science
