An important issue any organization or individual has to face when managing data containing sensitive information, is the risk that can be incurred when releasing such data. Even though data may be sanitized before being released, it is still possible for an adversary to reconstruct the original data using additional information thus resulting in privacy violations. To date, however, a systematic approach to quantify such risks is not available. In this paper we develop a framework, based on statistical decision theory, that assesses the relationship between the disclosed data and the resulting privacy risk. We model the problem of deciding which data to disclose, in terms of deciding which disclosure rule to apply to a database. We assess the privacy risk by taking into account both the entity identification and the sensitivity of the disclosed information. Furthermore, we prove that, under some conditions, the estimated privacy risk is an upper bound on the true privacy risk. Finally, we relate our framework with the k-anonymity disclosure method. The proposed framework makes the assumptions behind k-anonymity explicit, quantifies them, and extends them in several natural directions.


statistical decision theory, privacy, entity identification, database, k-anonymity, Privacy, Security, Risk Management, Data Sharing, Decision Theory, Anonymity

Date of this Version



Transactions on Data Privacy 2:3 (2009) 153 - 183 The contributors grant the journal the right to publish, distribute, index, archive and publicly display the article (and the abstract) in printed, electronic or any other form.



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.