Abstract
Data is a major asset in today’s healthcare landscape. Hospitals are among the primary producers of healthcare-related data, and the value this data can provide is enormous. However, using it to improve healthcare practice and advance science requires safeguarding patient privacy and ensuring that the data is used ethically. The ethical and legal requirements are vast and complex. Synthetic data has emerged as a tool to overcome these hurdles and provide fast, reliable access to data without compromising utility or privacy. Although Generative Adversarial Networks (GANs) have received considerable attention recently, the most common models and architectures are not suited to tabular data, the most prevalent type of healthcare-related data. This study surveys current GAN implementations tailored to this scenario. The analysis focuses mainly on the models employed, the datasets used, and the metrics reported on the quality of the generated data in terms of utility and privacy, as well as how the models compare with one another. We aim to help institutions and investigators get a grasp of the tools that facilitate access to healthcare data, and to provide recommendations for testing data synthesizers with privacy concerns.
Acknowledgments
This work was carried out under the scope of, and funded by, the Ph.D. Program in Health Data Science of the Faculty of Medicine of the University of Porto, Portugal (heads.med.up.pt).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Coutinho-Almeida, J., Rodrigues, P.P., Cruz-Correia, R.J. (2021). GANs for Tabular Healthcare Data Generation: A Review on Utility and Privacy. In: Soares, C., Torgo, L. (eds) Discovery Science. DS 2021. Lecture Notes in Computer Science, vol 12986. Springer, Cham. https://doi.org/10.1007/978-3-030-88942-5_22
DOI: https://doi.org/10.1007/978-3-030-88942-5_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88941-8
Online ISBN: 978-3-030-88942-5
eBook Packages: Computer Science, Computer Science (R0)