Beyond Multivariate Microaggregation for Large Record Anonymization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8313)

Abstract

Microaggregation is one of the most commonly employed microdata protection methods. Its basic idea is to anonymize data by aggregating the original records into small groups of at least \(k\) elements and replacing each record with the centroid of its group, thereby ensuring \(k\)-anonymity. When records are large, i.e., when the data set has many attributes, the data set is usually split into smaller blocks of attributes and microaggregation is applied to each block successively and independently. This is called multivariate microaggregation. This technique reduces the information loss incurred when several values are collapsed to the centroid of their group. Unfortunately, with multivariate microaggregation the \(k\)-anonymity property is lost when the intruder knows at least two attributes from different blocks, which may well be the usual case.
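The loss of \(k\)-anonymity across blocks can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy records, and the simple fixed-size grouping heuristic (sort, then cut into consecutive groups of at least \(k\)) are all assumptions made for illustration, and each block is assumed to contain a single attribute.

```python
from collections import Counter
from statistics import mean


def microaggregate_1d(values, k):
    """Sort the values, cut them into groups of at least k, and replace
    every value with the mean (centroid) of its group.
    Illustrative fixed-size heuristic, not an optimal partition."""
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    masked = [0.0] * n
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so it holds k to 2k-1 values.
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = mean(values[i] for i in members)
        for i in members:
            masked[i] = centroid
    return masked


def microaggregate_blocks(records, k):
    """Mask each attribute independently (assumption: one attribute per block)."""
    columns = [microaggregate_1d(list(col), k) for col in zip(*records)]
    return list(zip(*columns))


def smallest_group(masked_records):
    """Size of the smallest set of records that share all masked values."""
    return min(Counter(masked_records).values())


# Hypothetical toy data: four records with two attributes each.
records = [(1, 5), (2, 50), (10, 6), (11, 51)]
masked = microaggregate_blocks(records, k=2)
print(masked)                  # [(1.5, 5.5), (1.5, 50.5), (10.5, 5.5), (10.5, 50.5)]
print(smallest_group(masked))  # 1
```

Each attribute taken alone is 2-anonymous, since every masked value is shared by two records. However, an intruder who knows both attributes sees four distinct masked records, so the smallest group has size 1 and \(k\)-anonymity with \(k = 2\) no longer holds.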

In this work, we present a new microaggregation method called one dimension microaggregation (\(Mic1D-k\)). With \(Mic1D-k\), the problem of \(k\)-anonymity loss is mitigated by first mixing all the values of the original microdata file into a single non-attributed data set through a set of simple pre-processing steps, and then microaggregating all the mixed values together. Our experiments on real data show that our proposal achieves lower disclosure risk than previous approaches while preserving the level of information loss.
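The pooling idea can be sketched as follows. The abstract does not spell out the pre-processing steps, so this sketch assumes one plausible choice: standardize each attribute, pool every cell into a single list, microaggregate that list in one dimension, and then map the values back to their attributes. The names `mic1d_sketch` and `microaggregate_1d`, the fixed-size grouping, and the de-standardization step are hypothetical and should not be read as the authors' actual procedure.

```python
from statistics import mean, pstdev


def microaggregate_1d(values, k):
    """Replace each value with the mean of its group of at least k
    sorted neighbours (illustrative fixed-size heuristic)."""
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    masked = [0.0] * n
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder (k to 2k-1 values).
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = mean(values[i] for i in members)
        for i in members:
            masked[i] = centroid
    return masked


def mic1d_sketch(records, k):
    """Pool all standardized cells into one list and microaggregate it."""
    n_attrs = len(records[0])
    columns = list(zip(*records))
    mus = [mean(col) for col in columns]
    # Assumption: constant attributes get scale 1.0 to avoid division by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    # Assumed pre-processing: z-score each cell, then drop the attribute
    # labels by flattening everything into a single list.
    pooled = [
        (row[j] - mus[j]) / sigmas[j]
        for row in records
        for j in range(n_attrs)
    ]
    masked = microaggregate_1d(pooled, k)
    # Map each masked cell back to its record and attribute scale.
    return [
        tuple(masked[r * n_attrs + j] * sigmas[j] + mus[j] for j in range(n_attrs))
        for r in range(len(records))
    ]


# Hypothetical toy data: four records with two attributes each.
records = [(1, 10), (2, 20), (3, 30), (4, 40)]
for row in mic1d_sketch(records, k=4):
    print(row)
```

Because all cells are grouped together, every masked standardized value is shared by at least \(k\) cells regardless of which attributes the intruder knows. Whether this matches the risk and information-loss behaviour reported for \(Mic1D-k\) depends on the actual pre-processing steps, which this sketch only guesses at.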

Keywords

Microaggregation, \(k\)-anonymity, Privacy in statistical databases

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya (BarcelonaTech), Barcelona, Spain
