Hybrid Microdata via Model-Based Clustering
- Cite this paper as:
- Oganian A., Domingo-Ferrer J. (2012) Hybrid Microdata via Model-Based Clustering. In: Domingo-Ferrer J., Tinnirello I. (eds) Privacy in Statistical Databases. PSD 2012. Lecture Notes in Computer Science, vol 7556. Springer, Berlin, Heidelberg
In this paper we propose a new scheme for statistical disclosure limitation that can be classified as a hybrid method of protection, that is, a method that combines properties of perturbative and synthetic methods. The approach applies model-based clustering and then synthesizes the records within each cluster. The novelty is that the clustering and synthesis methods have been chosen to fit each other, with the aim of reducing information loss. The model-based clustering seeks clusters within which the data distribution is approximately normal, so that a multivariate normal synthesizer can be used for the local synthesis of data. In this way, some of the non-normal characteristics of the data are captured by the clustering, and a simple synthesizer for normal data suffices within each cluster. Our method is shown to be effective when compared to other disclosure limitation strategies.
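The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the number of clusters is chosen, how EM is initialized, or the exact form of the synthesizer, so everything below is an assumption made for illustration. The function names (`fit_mixture`, `synthesize`), the fixed number of components `k`, the random initialization, and the small `ridge` term added to each covariance are all hypothetical. The sketch fits a Gaussian mixture by EM, assigns each record to its most probable component, and then replaces the records of each cluster with draws from a multivariate normal fitted to that cluster.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_density(x, mean, L):
    """Log multivariate normal density, given the Cholesky factor L of the covariance."""
    d = len(x)
    z = []
    for i in range(d):  # forward substitution: solve L z = x - mean
        z.append((x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + sum(v * v for v in z))


def mean_cov(rows, weights, ridge=1e-6):
    """Weighted mean and covariance; the ridge (an assumption) keeps it invertible."""
    d = len(rows[0])
    total = max(sum(weights), 1e-12)
    mean = [sum(w * r[j] for r, w in zip(rows, weights)) / total for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            s = sum(w * (r[i] - mean[i]) * (r[j] - mean[j]) for r, w in zip(rows, weights))
            cov[i][j] = s / total + (ridge if i == j else 0.0)
    return mean, cov


def fit_mixture(rows, k, iters=50, seed=0):
    """EM for a k-component Gaussian mixture (iters >= 1); returns one hard label per row."""
    rng = random.Random(seed)
    n = len(rows)
    _, cov0 = mean_cov(rows, [1.0] * n)
    means = rng.sample(rows, k)          # assumed initialization: k random records
    covs = [cov0] * k                    # start every component at the overall covariance
    props = [1.0 / k] * k
    for _ in range(iters):
        chols = [cholesky(c) for c in covs]
        resp = []
        for x in rows:                   # E-step: posterior component probabilities
            logs = [math.log(max(props[c], 1e-300)) + log_density(x, means[c], chols[c])
                    for c in range(k)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        for c in range(k):               # M-step: re-estimate each component
            wc = [r[c] for r in resp]
            means[c], covs[c] = mean_cov(rows, wc)
            props[c] = sum(wc) / n
    return [max(range(k), key=lambda c: r[c]) for r in resp]


def synthesize(rows, labels, seed=0):
    """Replace each cluster's records with draws from a normal fitted to that cluster."""
    rng = random.Random(seed)
    out = []
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        mean, cov = mean_cov(members, [1.0] * len(members))
        L = cholesky(cov)
        d = len(mean)
        for _ in members:
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            out.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return out
```

Calling `synthesize(data, fit_mixture(data, k=2))` on a list of numeric records returns a synthetic data set of the same size, grouped by cluster. Because each cluster is synthesized separately, a multimodal or skewed original can be approximated by several simple normal pieces, which is the motivation given in the abstract. In practice one would probably rely on an established mixture-model library and choose `k` with a model selection criterion; both choices are outside what the abstract states.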
Keywords and Phrases: Statistical disclosure limitation (SDL); hybrid SDL methods; mixture models; model-based clustering; expectation-maximization (EM) algorithm