
Mixing Genetic Algorithms and V-MDAV to Protect Microdata

Computational Intelligence for Privacy and Security

Abstract

Protecting the privacy of individuals whose data are released to untrusted parties is a problem that has held the attention of the scientific community for years. Several techniques have been proposed to address it. Among them, microaggregation provides a good trade-off between information loss and disclosure risk, and much effort has therefore been devoted to its study. Microaggregation is a statistical disclosure control (SDC) technique that protects the privacy of individual respondents by aggregating the information of similar respondents so that they become indistinguishable. Although the approach is appealing, optimally microaggregating multivariate data sets is known to be an NP-hard problem. Consequently, heuristics have been suggested as a way to solve the problem in reasonable time. Specifically, genetic algorithms (GA) have been shown to find good solutions to the microaggregation problem for small multivariate data sets. However, due to the very nature of the problem, GA can hardly cope with large multivariate data sets. To apply GA to large data sets, the data must first be partitioned into smaller disjoint subsets that the GA can handle separately. In this chapter, we summarise several proposals for partitioning data sets so that GA can be applied to microaggregate them. In addition, we elaborate on a partitioning strategy based on the variable-MDAV (V-MDAV) algorithm and study the effect of several parameters, namely the dimension, the aggregation parameter (k), and the size of the data sets, among others. We also compare it with the most relevant previous proposals.
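
To make the idea of microaggregation concrete, the following is a minimal sketch in Python with NumPy. It is not the chapter's V-MDAV partitioning or its genetic algorithm: it is a simplified fixed-size grouping heuristic chosen only for illustration. The function name microaggregate, the choice of the record farthest from the centroid as the seed of each group, and the use of the within-group sum of squared errors (SSE) as a stand-in for information loss are all assumptions, not details taken from the abstract.

```python
import numpy as np

def microaggregate(data, k):
    """Illustrative sketch only (hypothetical helper, not V-MDAV or the GA).

    Builds groups of at least k similar records and replaces every record
    by the centroid of its group, so group members become indistinguishable.
    """
    data = np.asarray(data, dtype=float)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of records")

    remaining = list(range(n))
    groups = []
    # Carve out groups of exactly k records while at least 2k remain,
    # so the final leftover group always has between k and 2k-1 records.
    while len(remaining) >= 2 * k:
        pts = data[remaining]
        centroid = pts.mean(axis=0)
        # Assumed seed rule: the remaining record farthest from the centroid.
        seed = pts[np.argmax(np.linalg.norm(pts - centroid, axis=1))]
        # Group the seed with its k-1 nearest remaining records.
        nearest = np.argsort(np.linalg.norm(pts - seed, axis=1))[:k]
        group = [remaining[i] for i in nearest]
        groups.append(group)
        taken = set(group)
        remaining = [i for i in remaining if i not in taken]
    groups.append(remaining)

    # Aggregation step: every record is replaced by its group centroid.
    protected = data.copy()
    for g in groups:
        protected[g] = data[g].mean(axis=0)

    # Assumed information-loss measure: within-group sum of squared errors.
    sse = float(((data - protected) ** 2).sum())
    return protected, groups, sse


if __name__ == "__main__":
    records = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 5]]
    protected, groups, sse = microaggregate(records, k=2)
    print(groups)
    print(protected)
    print(sse)
```

In this sketch a larger k yields larger groups, which lowers disclosure risk but raises the SSE; that is the trade-off between information loss and disclosure risk that the abstract refers to.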

Author information

Corresponding author

Correspondence to Agusti Solanas.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Solanas, A., González-Nicolás, Ú., Martínez-Ballesté, A. (2012). Mixing Genetic Algorithms and V-MDAV to Protect Microdata. In: Elizondo, D., Solanas, A., Martinez-Balleste, A. (eds) Computational Intelligence for Privacy and Security. Studies in Computational Intelligence, vol 394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25237-2_8

  • DOI: https://doi.org/10.1007/978-3-642-25237-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25236-5

  • Online ISBN: 978-3-642-25237-2

  • eBook Packages: Engineering, Engineering (R0)
