Anonymization views: Supporting privacy in database systems
Abstract
Many anonymization techniques proposed in the literature are standalone algorithms that operate on isolated tables to generate a privacy-preserving anonymized version of the data. Applying these algorithms efficiently and correctly within a database system is not straightforward, especially when answering queries that involve multiple tables and predicates. We introduce the notion of 'Anonymization Views' as an abstraction to support privacy through anonymization in database systems. We treat anonymization as a relational view on the tables containing sensitive data, and propose a generic definition of anonymization views that can involve single tables, joins of multiple tables, and other anonymization views. We define and implement anonymization operators that are used in query plans to construct and operate on anonymization views. In addition to adapting an existing anonymization algorithm to support multiple anonymization requirements, we propose a new non-blocking anonymization algorithm that supports pipelined query evaluation. We identify scenarios in which relational operators can be pushed below the anonymization operators to improve performance and utility while maintaining correctness, i.e., proper privacy. We present a prototype system built on PostgreSQL that defines and operates on anonymization views using extensions to SQL. We demonstrate how anonymization views integrate with other privacy-preserving Hippocratic database components, e.g., privacy policy management, limited retention, and limited disclosure. Experiments report the performance and utility of anonymization views and the associated query processing and optimization strategies under various circumstances.
Degree
M.S.
Advisors
Aref, Purdue University.
Subject Area
Computer science