Differentially private data publishing: From histograms to transaction sets

Wahbeh Qardaji, Purdue University

Abstract

The prevalent need for publicly available datasets, coupled with the spate of privacy-related incidents pertaining to the release of such data, has made it necessary to develop resilient and accurate methods of privacy-preserving data publishing. In this dissertation, we consider the problem of private data publishing while satisfying the robust notion of differential privacy. In particular, we consider the scenario in which a trusted curator gathers sensitive information from a large number of respondents, creates a dataset where each tuple corresponds to one entity, and publishes a privacy-preserving synopsis of the dataset. The diverse nature of the datasets of interest prevents the development of a single general method of data publishing that works in all situations. We therefore develop differentially private synopsis mechanisms for various types of data. We start with the simplest data publishing scenario: publishing a one-dimensional histogram. We explore hierarchical approaches to publishing histograms and propose various optimizations. Next, we consider two-dimensional datasets and propose grid-based approaches for publishing geospatial datasets. For datasets with more than a few dimensions, we propose a framework for publishing k-way marginals and contingency tables while guaranteeing accuracy and consistency. Furthermore, for high-dimensional datasets, we propose a framework for frequent itemset mining while guaranteeing differential privacy. Finally, we explore relaxations of differential privacy in light of an adversary's uncertainty about the dataset.

Degree

Ph.D.

Advisors

Li, Purdue University.

Subject Area

Computer science

