Scaling Up Network Analysis and Mining: Statistical Sampling, Estimation, and Pattern Discovery

Nesreen K Ahmed, Purdue University

Abstract

Network analysis and graph mining play a prominent role in providing insights and studying phenomena across various domains, including the social, behavioral, biological, transportation, communication, and financial domains. Across all these domains, networks arise as a natural and rich representation for data. Studying these real-world networks is crucial for solving numerous problems that lead to high-impact applications. Examples include identifying the behavior and interests of users in online social networks (e.g., viral marketing), monitoring and detecting virus outbreaks in human contact networks, predicting protein functions in biological networks, and detecting anomalous behavior in computer networks. A key characteristic of these networks is that their complex structure is massive and continuously evolving over time, which makes it challenging and computationally intensive to analyze, query, and model these networks in their entirety. In this dissertation, we propose sampling, together with fast, efficient, and scalable methods, for network analysis and mining in both static and streaming graphs. We develop a generic framework for statistical network stream sampling, called graph sample and hold. We formulate network sampling as a principled approach with two main functions: (1) the sampling function, and (2) the holding function. This approach allows tuning the sampling and estimation of graph properties more efficiently and accurately than the state-of-the-art. We develop a suite of algorithms to sample and estimate various graph properties while processing the graph sequentially as a stream of edges. Finally, we develop a fast parallel algorithm for counting motifs, which is 460 times faster than the state-of-the-art. We show how these motif patterns can be used as features to benefit various machine learning tasks such as large-scale graph classification, prediction, anomaly detection, and visual analytics.
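
The sample-and-hold idea described above can be illustrated with a minimal sketch. The abstract does not specify the sampling and holding functions, so the version below makes assumptions: an arriving edge is kept with probability q if it touches an already-sampled edge (holding) and with probability p otherwise (sampling), and the total edge count is estimated by weighting each kept edge by the inverse of its selection probability. The names graph_sample_and_hold, estimate_edge_count, p, and q are hypothetical, and the stream is assumed to contain each edge once.

```python
import random


def graph_sample_and_hold(edge_stream, p, q, seed=None):
    """Single pass over an edge stream (illustrative sketch, not the author's code).

    Assumption: edges adjacent to the current sample are kept with
    probability q (holding); all other edges with probability p (sampling).
    Returns a dict mapping each kept edge to its selection probability.
    """
    rng = random.Random(seed)
    sampled = {}     # edge -> probability with which it was selected
    touched = set()  # nodes incident to at least one sampled edge
    for u, v in edge_stream:
        adjacent = u in touched or v in touched
        prob = q if adjacent else p
        if rng.random() < prob:
            sampled[(u, v)] = prob
            touched.update((u, v))
    return sampled


def estimate_edge_count(sampled):
    """Inverse-probability-weighted estimate of the number of edges in the stream."""
    return sum(1.0 / prob for prob in sampled.values())


if __name__ == "__main__":
    stream = [(1, 2), (2, 3), (3, 4), (5, 6), (6, 7), (1, 3)]
    sample = graph_sample_and_hold(stream, p=0.5, q=0.9, seed=42)
    print(len(sample), estimate_edge_count(sample))
```

Setting q larger than p biases the sample toward edges connected to what has already been kept, which is one way such a scheme could be tuned toward a particular graph property; the inverse-probability weights are what correct for that bias at estimation time.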

Degree

Ph.D.

Advisors

Neville, Purdue University.

Subject Area

Statistics, Artificial intelligence, Computer science
