Soft Computing, Volume 15, Issue 7, pp 1301–1311

An evolutionary approach to enhance data privacy

  • Javier Jiménez
  • Jordi Marés
  • Vicenç Torra
Original Paper

Abstract

Dissemination of data with sensitive information about individuals has an implicit risk of unauthorized disclosure. Perturbative masking methods propose the distortion of the original data sets before publication, tackling a difficult tradeoff between data utility (low information loss) and protection against disclosure (low disclosure risk). In this paper, we describe how information loss and disclosure risk measures can be integrated within an evolutionary algorithm to seek new and enhanced masking protections for continuous microdata. The proposed technique constitutes a hybrid approach that combines state-of-the-art protection methods with an evolutionary algorithm optimization. We also provide experimental results using three data sets in order to illustrate and empirically evaluate the application of this technique.
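As a rough illustration of the idea summarized in the abstract, the following minimal sketch shows how an information loss score and a disclosure risk score might be combined into one fitness value and minimized by a simple evolutionary loop that starts from an already masked file. Everything in it is an assumption made for illustration: the two measures are toy proxies (mean absolute difference, and a distance-based record-linkage rate), the equal-weight average is an arbitrary choice, the loop is a plain (1+1) evolution strategy, and the names `information_loss`, `disclosure_risk`, `fitness`, `mutate` and `evolve` are hypothetical. It is not the algorithm or the measures evaluated in the paper.

```python
import random


def information_loss(original, masked):
    """Toy proxy (assumption): mean absolute difference over all cells."""
    cells = len(original) * len(original[0])
    total = sum(abs(o - m)
                for row_o, row_m in zip(original, masked)
                for o, m in zip(row_o, row_m))
    return total / cells


def disclosure_risk(original, masked):
    """Toy proxy (assumption): share of masked records whose nearest
    original record (squared Euclidean distance) is their own source."""
    hits = 0
    for i, row_m in enumerate(masked):
        dists = [sum((a - b) ** 2 for a, b in zip(row_o, row_m))
                 for row_o in original]
        if dists.index(min(dists)) == i:
            hits += 1
    return hits / len(masked)


def fitness(original, masked):
    """Equal-weight average of both scores (assumption); lower is better."""
    return 0.5 * information_loss(original, masked) \
        + 0.5 * disclosure_risk(original, masked)


def mutate(masked, rng, scale=0.1):
    """Copy the candidate and add Gaussian noise to one random cell."""
    child = [row[:] for row in masked]
    i = rng.randrange(len(child))
    j = rng.randrange(len(child[0]))
    child[i][j] += rng.gauss(0.0, scale)
    return child


def evolve(original, seed_masked, generations=200, rng=None):
    """(1+1) evolution strategy: keep the child only if it is not worse."""
    if rng is None:
        rng = random.Random(0)
    best = [row[:] for row in seed_masked]
    best_score = fitness(original, best)
    for _ in range(generations):
        child = mutate(best, rng)
        score = fitness(original, child)
        if score <= best_score:
            best, best_score = child, score
    return best, best_score


# Usage: synthetic continuous microdata, seeded with a noise-added masking.
rng = random.Random(42)
original = [[rng.random() for _ in range(3)] for _ in range(20)]
seed = [[v + rng.gauss(0.0, 0.05) for v in row] for row in original]
masked, score = evolve(original, seed, generations=300, rng=rng)
```

Seeding the search with the output of an existing masking method mirrors the hybrid flavour described above; a population-based algorithm with crossover would follow the same pattern, with `fitness` ranking candidates instead of a single parent-child comparison.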

Keywords

  • Information privacy and security
  • Evolutionary algorithms

Notes

Acknowledgments

Partial support by Generalitat de Catalunya (AGAUR, 2009 SGR 7) and by the Spanish MEC (projects ARES CONSOLIDER INGENIO 2010 CSD2007 00004 and e-AEGIS TSI2007 65406-C03-01/02) is acknowledged.

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. IIIA, Artificial Intelligence Research Institute, CSIC, Consejo Superior de Investigaciones Científicas, Bellaterra, Spain
