Data Mining and Knowledge Discovery, Volume 11, Issue 2, pp 195–212

Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation

Abstract

k-Anonymity is a useful concept for resolving the tension between data utility and respondent privacy in the protection of individual data (microdata). However, the generalization and suppression approach proposed in the literature to achieve k-anonymity is not equally suited to all types of attributes: (i) generalization/suppression is one of the few possibilities for nominal categorical attributes; (ii) it is just one possibility for ordinal categorical attributes, and one that does not always preserve ordinality; and (iii) it is completely unsuitable for continuous attributes, as it causes them to lose their numerical meaning. Since attributes leading to disclosure (and thus needing k-anonymization) may be nominal, ordinal or continuous, it is important to devise k-anonymization procedures that preserve the semantics of each attribute type as much as possible. In this paper we propose categorical microaggregation as an alternative to generalization/suppression for nominal and ordinal k-anonymization, and continuous microaggregation as the method for continuous k-anonymization.
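
To make the idea of continuous microaggregation concrete, the following is a minimal sketch for a single continuous attribute. It is not the procedure developed in the paper: it assumes a simple fixed-size grouping of sorted values, and the function name `microaggregate` and the sample `ages` data are hypothetical. Each value is replaced by the mean of a group of at least k neighbouring values, so every released value is shared by at least k records while remaining numerical.

```python
from collections import Counter


def microaggregate(values, k):
    """Replace each value by the mean of its group of sorted neighbours.

    Illustrative assumption: groups are formed by sorting the values and
    cutting the sorted sequence into consecutive blocks of k; the last
    block absorbs any remainder, so its size is between k and 2k - 1.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")

    # Indices of the records, ordered by attribute value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)

    for g in range(n_groups):
        start = g * k
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean  # every record in the group gets the same value

    return masked


# Hypothetical data: seven ages, k = 3.
ages = [23, 25, 31, 35, 36, 52, 58]
masked_ages = microaggregate(ages, 3)

# Each masked value is shared by at least k records.
group_sizes = Counter(masked_ages)
```

In this sketch the masked attribute keeps its total (and therefore its mean), and the group sizes in `group_sizes` can be inspected to confirm that no masked value appears fewer than k times.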

Keywords

k-anonymity, microdata privacy, database security, microaggregation

Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  1. Department of Computer Engineering and Maths, Rovira i Virgili University of Tarragona, Tarragona, Spain
  2. Institut d'Investigació en Intel·ligència Artificial-CSIC, Bellaterra, Spain
