Generation of Synthetic Tabular Healthcare Data Using Generative Adversarial Networks

  • Conference paper
MultiMedia Modeling (MMM 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13833))

Abstract

High-quality tabular data is a crucial requirement for developing data-driven applications, especially in healthcare, because most of the data collected in this context today is in tabular form. However, strict data protection laws complicate access to medical datasets. Synthetic data has therefore become an ideal alternative for data scientists and healthcare professionals seeking to circumvent such hurdles. Although many healthcare institutions still use classical de-identification and anonymization techniques to generate synthetic data, deep learning-based generative models such as generative adversarial networks (GANs) have shown remarkable performance in generating tabular datasets with complex structures. This paper examines the potential and applicability of GANs within the healthcare industry, which often faces serious challenges from insufficient training data and the sensitivity of patient records. We investigate several state-of-the-art GAN-based models proposed for tabular synthetic data generation. We examine healthcare datasets that differ in size, number of variables, column data types, feature distributions, and inter-variable correlations. Moreover, we define a comprehensive evaluation framework to assess the quality of the synthetic records and the ability of each model to preserve patient privacy. The results indicate that the proposed models can generate synthetic datasets that maintain the statistical characteristics, model compatibility, and privacy of the original data, and that synthetic tabular healthcare datasets can be a viable option in many data-driven applications. However, there is still room for improvement in designing an architecture for generating synthetic tabular data.
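The abstract names three evaluation axes (statistical characteristics, model compatibility, and privacy) without giving formulas. As a rough illustration only, the sketch below shows two checks that are commonly used for the first and third axes: the gap between Pearson correlation matrices, and the distance from each synthetic row to its closest real row. The function names, the choice of metrics, and the toy Gaussian data are assumptions made for this example; they are not taken from the paper's evaluation framework.

```python
import numpy as np


def correlation_gap(real, synth):
    """Mean absolute difference between the Pearson correlation matrices.

    Hypothetical helper: values near 0 suggest that inter-variable
    correlations are similar in the real and synthetic tables.
    """
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synth, rowvar=False)
    return float(np.mean(np.abs(real_corr - synth_corr)))


def distance_to_closest_record(real, synth):
    """Euclidean distance from each synthetic row to its nearest real row.

    Hypothetical helper: distances of exactly 0 mean a real record was
    reproduced verbatim, which is a privacy warning sign.
    """
    diffs = synth[:, None, :] - real[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


# Toy stand-ins for a real table and a generator's output (assumed data).
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
real = rng.multivariate_normal([0.0, 0.0], cov, size=500)
synth = rng.multivariate_normal([0.0, 0.0], cov, size=500)

gap = correlation_gap(real, synth)
dcr = distance_to_closest_record(real, synth)
print(f"correlation gap: {gap:.3f}")
print(f"median distance to closest record: {np.median(dcr):.3f}")
```

Both functions assume purely numeric columns on comparable scales; categorical columns would need encoding and continuous ones would need scaling before distances are meaningful.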

Notes

  1. https://github.com/ds-anik/Synthetic_Tabular_Healthcare_Data_Generation

Corresponding author

Correspondence to Michael A. Riegler.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nik, A.H.Z., Riegler, M.A., Halvorsen, P., Storås, A.M. (2023). Generation of Synthetic Tabular Healthcare Data Using Generative Adversarial Networks. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham. https://doi.org/10.1007/978-3-031-27077-2_34

  • DOI: https://doi.org/10.1007/978-3-031-27077-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer Science, Computer Science (R0)
