Abstract
An important issue any organization or individual has to face when managing data containing sensitive information, is the risk that can be incurred when releasing such data. Even though data may be sanitized before being released, it is still possible for an adversary to reconstruct the original data using additional information thus resulting in privacy violations. To date, however, a systematic approach to quantify such risks is not available. In this paper we develop a framework, based on statistical decision theory, that assesses the relationship between the disclosed data and the resulting privacy risk. We model the problem of deciding which data to disclose, in terms of deciding which disclosure rule to apply to a database. We assess the privacy risk by taking into account both the entity identification and the sensitivity of the disclosed information. Furthermore, we prove that, under some conditions, the estimated privacy risk is an upper bound on the true privacy risk. Finally, we relate our framework with the k-anonymity disclosure method. The proposed framework makes the assumptions behind k-anonymity explicit, quantifies them, and extends them in several natural directions.
Keywords
statistical decision theory, privacy, entity identification, database, k-anonymity, Privacy, Security, Risk Management, Data Sharing, Decision Theory, Anonymity
Date of this Version
2009
Included in
Engineering Commons, Life Sciences Commons, Medicine and Health Sciences Commons, Physical Sciences and Mathematics Commons
Comments
Transactions on Data Privacy 2:3 (2009) 153 - 183 The contributors grant the journal the right to publish, distribute, index, archive and publicly display the article (and the abstract) in printed, electronic or any other form.