Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation
k-Anonymity is a useful concept for resolving the tension between data utility and respondent privacy in the protection of individual data (microdata). However, the generalization and suppression approach proposed in the literature to achieve k-anonymity is not equally suited to all types of attributes: (i) generalization/suppression is one of the few possibilities for nominal categorical attributes; (ii) it is just one possibility for ordinal categorical attributes, and one that does not always preserve ordinality; and (iii) it is completely unsuitable for continuous attributes, as it causes them to lose their numerical meaning. Since attributes leading to disclosure (and thus needing k-anonymization) may be nominal, ordinal or continuous, it is important to devise k-anonymization procedures that preserve the semantics of each attribute type as much as possible. In this paper we propose categorical microaggregation as an alternative to generalization/suppression for nominal and ordinal k-anonymization, and continuous microaggregation as the method for continuous k-anonymization.
Keywords: k-anonymity, microdata privacy, database security, microaggregation
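To make the idea of continuous microaggregation concrete, the following is a minimal sketch, not the procedure from the paper. It assumes a single continuous attribute and a simple fixed-size grouping heuristic: records are sorted, split into consecutive groups of at least k, and each value is replaced by its group mean. The function name `microaggregate` and the sample `ages` data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def microaggregate(values, k):
    """Replace each value by the mean of a group of at least k sorted neighbours.

    Illustrative fixed-size heuristic for one continuous attribute;
    the last group absorbs any leftover records so no group is smaller than k.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")

    # Indices of the records, ordered by attribute value.
    order = sorted(range(n), key=lambda i: values[i])

    n_groups = n // k
    masked = [0.0] * n
    for g in range(n_groups):
        start = g * k
        # The final group takes all remaining records (between k and 2k-1).
        end = n if g == n_groups - 1 else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked


# Hypothetical sample data: one continuous quasi-identifier.
ages = [23, 45, 31, 27, 52, 38, 29, 61]
masked = microaggregate(ages, k=3)

# Every released value is shared by at least k records.
smallest_group = min(Counter(masked).values())
print(masked)
print("smallest group size:", smallest_group)
```

Because each masked value is an average of real values, the attribute stays numerical, which is the property that generalization into intervals loses. A multivariate version would need a distance between records rather than a simple sort.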