New Approaches Towards Online, Distributed, and Robust Learning of Statistical Properties of Data

Tong Yao, Purdue University

Abstract

In this thesis, we present algorithms that allow agents to estimate certain statistical properties of data in a robust, online, and distributed manner. Each agent receives a sequence of observations and, through communication with other agents, collectively infers properties of the data gathered by all agents.

In the first part of the thesis, we provide algorithms to infer the correlations between interacting entities from large datasets. Gaussian graphical models are well studied as representations of the relationships between the random variables that generate data, and numerous algorithms have been proposed to learn the dependencies in such models. However, existing algorithms typically process data in a batch at a central location, limiting their applicability in scenarios where data arrive in real time and are gathered by different agents. To address these challenges, we first propose an online sparse inverse covariance algorithm to infer the static network structure (i.e., the dependencies between nodes) in real time from time-series data at a centralized location. We then propose a distributed algorithm to cooperatively learn the network structure in real time from data collected by distributed agents. We characterize the theoretical convergence properties of these methods and provide simulations using synthetic datasets and real-world hurricane Twitter datasets from disaster management applications.

The second part of this thesis addresses the robustness of online and distributed learning under arbitrary data corruption. We propose online and distributed algorithms for robust mean, covariance, and sparse inverse covariance estimation that operate effectively even in the presence of adversarial data attacks. We provide theoretical bounds on the error and convergence rate of these methods and evaluate their performance under various settings.

Finally, we consider the problem of classification with a network of heterogeneous and partially informative agents, each receiving local data from an underlying true class and equipped with a classifier that distinguishes only a subset of the entire set of classes. We propose an iterative algorithm that uses the posterior probabilities produced by any classifier and recursively updates each agent's local belief based on its local signals and belief information from its neighbors. We then adopt a novel distributed min-rule to update each agent's global belief and enable all agents to learn the true class. We analyze the convergence properties of the proposed algorithm and demonstrate and compare its performance with local averaging and global average consensus through simulations and on a visual image dataset.
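To make the distributed classification scheme in the final paragraph concrete, the snippet below gives a minimal, hypothetical Python sketch of one communication round, under assumptions not stated in the abstract: each agent keeps a local and a global belief vector over all classes, fuses its classifier's posterior into its local belief with a simple multiplicative update, and applies an element-wise minimum over its neighbors' global beliefs. The function name `min_rule_step` and the exact fusion rule are illustrative choices, not the algorithm from the thesis.

```python
import numpy as np

def min_rule_step(local_beliefs, global_beliefs, posteriors, neighbors):
    """One illustrative round of a min-rule style belief update (assumed form).

    local_beliefs, global_beliefs: dicts mapping agent -> probability vector over classes
    posteriors: dict mapping agent -> posterior vector from that agent's local classifier
                (e.g., uniform over classes the classifier cannot distinguish)
    neighbors: dict mapping agent -> list of neighboring agents
    """
    new_local, new_global = {}, {}
    for i, nbrs in neighbors.items():
        # Local update: fuse the agent's own classifier posterior with its
        # current local belief (a simple multiplicative update, assumed here).
        fused = local_beliefs[i] * posteriors[i]
        new_local[i] = fused / fused.sum()

        # Global update via a min-rule: element-wise minimum over the neighbors'
        # global beliefs and the agent's updated local belief, then renormalize.
        stacked = np.vstack([global_beliefs[j] for j in nbrs] + [new_local[i]])
        m = stacked.min(axis=0)
        new_global[i] = m / m.sum()
    return new_local, new_global

# Illustrative usage: two fully connected agents, three classes.
# Agent 1's classifier is uninformative (uniform posterior), agent 0's is partially informative.
neighbors = {0: [0, 1], 1: [0, 1]}
uniform = np.ones(3) / 3
beliefs = {0: uniform.copy(), 1: uniform.copy()}
posteriors = {0: np.array([0.6, 0.2, 0.2]), 1: uniform.copy()}
local, global_ = min_rule_step(beliefs, beliefs, posteriors, neighbors)
```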

Degree

Ph.D.

Advisors

Sundaram, Purdue University.

Subject Area

Communication
