GANs for Tabular Healthcare Data Generation: A Review on Utility and Privacy

  • Conference paper
Discovery Science (DS 2021)

Abstract

Data is a major asset in today’s healthcare landscape. Hospitals are among the primary producers of healthcare-related data, and the value this data can provide is enormous. However, to use it to improve healthcare practice and push science forward, it is necessary to safeguard patients’ privacy and the ethical use of the data. The ethical and legal requirements are vast and complex. Synthetic data appears as a tool to overcome these hurdles and provide fast and reliable access to data without compromising utility or privacy. Even though Generative Adversarial Networks (GANs) have received a lot of attention lately, the most common models and architectures are not suited to tabular data, the most prevalent type of healthcare-related data. This study surveys the current GAN implementations tailored to this scenario. The analysis focuses mainly on the models employed, the datasets used, and the metrics reported regarding the quality of the generated data in terms of utility and privacy, and on how the models compare with one another. We aim to help institutions and investigators get a grasp of the tools that facilitate access to healthcare data, and to provide recommendations for testing data synthesizers with privacy concerns.

Acknowledgments

This work was carried out within the scope of, and funded by, the Ph.D. Program in Health Data Science of the Faculty of Medicine of the University of Porto, Portugal (heads.med.up.pt).

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Coutinho-Almeida, J., Rodrigues, P.P., Cruz-Correia, R.J. (2021). GANs for Tabular Healthcare Data Generation: A Review on Utility and Privacy. In: Soares, C., Torgo, L. (eds) Discovery Science. DS 2021. Lecture Notes in Computer Science, vol 12986. Springer, Cham. https://doi.org/10.1007/978-3-030-88942-5_22

  • DOI: https://doi.org/10.1007/978-3-030-88942-5_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88941-8

  • Online ISBN: 978-3-030-88942-5

  • eBook Packages: Computer Science (R0)
