
The Application of Genetic Algorithms to Data Synthesis: A Comparison of Three Crossover Methods

  • Yingrui Chen
  • Mark Elliot (corresponding author)
  • Duncan Smith
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11126)

Abstract

Data synthesis is a data confidentiality method which is applied to microdata to prevent leakage of sensitive information about respondents. Instead of publishing real data, data synthesis produces an artificial dataset that does not contain the real records of respondents. In particular, this offers significant protection against reidentification attacks. However, effective data synthesis requires retaining the key statistical properties of the original data and respecting its multiple utilities. In previous work, we demonstrated the value of matrix genetic algorithms (GAs) in data synthesis [4]. The current paper compares three crossover methods within a matrix GA: parallelised (two-point) crossover, matrix crossover, and parametric uniform crossover. The crossover methods are applied to three different datasets and are compared on the basis of how well they reproduce the relationships between variables in the original datasets.
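
As an illustration of the kind of operators being compared, the sketch below gives generic textbook forms of two-point crossover and parametric uniform crossover on flat lists of genes. It is not the authors' implementation: the function names, the flat-list encoding, and the swap probability p_swap are assumptions made here, and matrix crossover is omitted because the abstract does not define it.

```python
import random


def two_point_crossover(a, b, rng):
    """Exchange the segment between two random cut points.

    a, b: equal-length lists of genes (hypothetical flat encoding).
    rng:  a random.Random instance, passed in for reproducibility.
    """
    n = len(a)
    # Two distinct cut points in 0..n, sorted so that i < j.
    i, j = sorted(rng.sample(range(n + 1), 2))
    child1 = a[:i] + b[i:j] + a[j:]
    child2 = b[:i] + a[i:j] + b[j:]
    return child1, child2


def parametric_uniform_crossover(a, b, p_swap, rng):
    """Swap each gene position independently with probability p_swap."""
    child1, child2 = list(a), list(b)
    for k in range(len(a)):
        if rng.random() < p_swap:
            child1[k], child2[k] = child2[k], child1[k]
    return child1, child2


if __name__ == "__main__":
    rng = random.Random(0)
    parent_a = [0, 0, 0, 0, 0, 0]
    parent_b = [1, 1, 1, 1, 1, 1]
    print(two_point_crossover(parent_a, parent_b, rng))
    print(parametric_uniform_crossover(parent_a, parent_b, 0.5, rng))
```

Both operators only rearrange genes between the two parents, so at every position the pair of children holds exactly the pair of parent values. With p_swap set to 0.5 the second function is ordinary uniform crossover; other values bias each child towards one parent.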

Keywords

Genetic algorithms, Data synthesis, Data privacy

References

  1. Abowd, J.M., Lane, J.: New approaches to confidentiality protection: synthetic data, remote access and research data centers. In: Domingo-Ferrer, J., Torra, V. (eds.) PSD 2004. LNCS, vol. 3050, pp. 282–289. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-25955-8_22
  2. Cantu-Paz, E., Goldberg, D.: Efficient parallel genetic algorithms: theory and practice. Comput. Methods Appl. Mech. Eng. 186, 221–238 (2000)
  3. Chen, Y., Elliot, M., Sakshaug, J.: A genetic algorithm approach to synthetic data production. In: Proceedings of the 1st International Workshop on AI for Privacy and Security (2016). Article no. 13
  4. Chen, Y., Elliot, M., Sakshaug, J.: Genetic algorithms in matrix representation and its application in synthetic data, UNECE work session on statistical data confidentiality (2017). https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.46/2017/2_Genetic_algorithms.pdf. Accessed 20 Dec 2017
  5. Ciriani, V., di Vimercati, S.D.C., Foresti, S., Samarati, P.: Microdata protection. In: Yu, T., Jajodia, S. (eds.) Secure Data Management in Decentralized Systems, vol. 33, pp. 291–321. Springer, New York (2007). https://doi.org/10.1007/978-0-387-27696-0_9
  6. Department for Communities and Local Government, Ipsos MORI: Citizenship Survey, 2010–2011. [data collection]. UK Data Service. SN: 7111 (2012). https://doi.org/10.5255/UKDA-SN-7111-1. Accessed 20 Dec 2017
  7. Drechsler, J.: Synthetic data, where do we come from? Where do we want to go? In: Synthetic Data Workshop; Office of National (2014)
  8. Maimon, O., Rokach, L.: Data Mining and Knowledge Discovery Handbook, p. 704. Springer, Heidelberg (2010)
  9. Navarro-Arribas, G., Torra, V.: Advanced research on data privacy in the ARES project. In: Navarro-Arribas, G., Torra, V. (eds.) Advanced Research in Data Privacy. SCI, vol. 567, pp. 3–14. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-09885-2_1
  10. Office for National Statistics. Crime Survey for England and Wales, 2015–2016. [data collection]. UK Data Service. SN: 8140 (2017). https://doi.org/10.5255/UKDA-SN-8140-1. Accessed 11 Jan 2018
  11. Office for National Statistics. Social Survey Division, Northern Ireland Statistics and Research Agency, Eurostat (2011). European Union Statistics on Income and Living Conditions, 2009. [data collection]. UK Data Service. SN: 6767 (2009). https://doi.org/10.5255/UKDA-SN-6767-1. Accessed 11 Jan 2018
  12. Pongcharoen, P., Khadwilard, A., Klakankhai, A.: Multi-matrix real-coded genetic algorithm for minimising total costs, in logistics chain network. In: World Academy of Science, Engineering and Technology, vol. 1, no. 11, pp. 574–597 (2007). (Int. J. Econ. Manag. Eng.)
  13. Sun, L., Zhang, Y., Jiang, C.: A matrix real-coded genetic algorithm to the unit commitment problem. Electr. Pow. Syst. Res. 76, 716–728 (2006)
  14. Wallet, B.C., Marchette, D.J., Solka, J.L.: A matrix representation for genetic algorithms. In: Proceedings of Automatic Object Recognition IV of SPIE Aerosense. Naval Surface Warfare Center Dahlgren, Virginia (1996)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Manchester, Manchester, UK
