Efficient k-Anonymization Using Clustering Techniques
Abstract
k-anonymization techniques have been the focus of intense research in the last few years. An important requirement for such techniques is to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications. In this paper we propose an approach that uses the idea of clustering to minimize information loss and thus ensure good data quality. The key observation is that data records that are naturally similar to each other should be part of the same equivalence class. We thus formulate a specific clustering problem, referred to as the k-member clustering problem. We prove that this problem is NP-hard and present a greedy heuristic whose complexity is in O(n^2). As part of our approach we develop a suitable metric to estimate the information loss introduced by generalizations, which works for both numeric and categorical data.
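To make the idea concrete, the following is a minimal sketch of a greedy k-member clustering, not the paper's algorithm. Everything in it is an assumption made for illustration: the names `info_loss` and `greedy_k_member`, the choice of the first remaining record as each cluster's seed, and the loss metric itself (normalized range for numeric attributes, normalized count of distinct values for categorical ones). The sketch recomputes the loss from scratch at every step, so it makes no attempt to match the O(n^2) bound stated above.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[int, float, str]
Record = Tuple[Value, ...]


def info_loss(cluster: Sequence[Record], data: Sequence[Record]) -> float:
    """Assumed metric: |cluster| * sum of per-attribute costs, each in [0, 1]."""
    if not cluster:
        return 0.0
    total = 0.0
    for j in range(len(data[0])):
        col = [r[j] for r in cluster]
        dom = [r[j] for r in data]
        if isinstance(dom[0], str):
            # Categorical (assumption): share of extra distinct values merged.
            span = len(set(dom)) - 1
            cost = (len(set(col)) - 1) / span if span else 0.0
        else:
            # Numeric: cluster range relative to the full domain range.
            span = max(dom) - min(dom)
            cost = (max(col) - min(col)) / span if span else 0.0
        total += cost
    return len(cluster) * total


def greedy_k_member(data: Sequence[Record], k: int) -> List[List[Record]]:
    """Partition data into clusters of at least k records each."""
    if k < 1 or len(data) < k:
        raise ValueError("need k >= 1 and at least k records")

    def loss(indices: List[int]) -> float:
        return info_loss([data[i] for i in indices], data)

    remaining = list(range(len(data)))
    clusters: List[List[int]] = []
    while len(remaining) >= k:
        # Seed choice is arbitrary here (assumption): first remaining record.
        cluster = [remaining.pop(0)]
        while len(cluster) < k:
            # Greedily add the record that keeps the cluster's loss lowest.
            best = min(remaining, key=lambda i: loss(cluster + [i]))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    for i in remaining:
        # Fewer than k records are left: attach each to the cluster
        # whose loss grows least.
        target = min(clusters, key=lambda c: loss(c + [i]) - loss(c))
        target.append(i)
    return [[data[i] for i in c] for c in clusters]


if __name__ == "__main__":
    records = [(25, "M"), (27, "M"), (26, "M"), (60, "F"),
               (62, "F"), (61, "F"), (63, "F")]
    for group in greedy_k_member(records, 3):
        print(group)
```

Each returned cluster can then serve as one equivalence class: generalizing its records to a shared range or value set yields a k-anonymous table under the assumptions above.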
Keywords
k-anonymization, data modifications, clustering, NP-hard, heuristic
Date of this Version
2007
Comments
Advances in Databases: Concepts, Systems and Applications. Lecture Notes in Computer Science, Volume 4443 (2007), pp. 188-200.