Utility of GAN generated synthetic data for cardiovascular diseases mortality prediction: an experimental study

Khan, Shahzad Ahmed; Murtaza, Hajra; Ahmed, Musharif

doi:10.1007/s12553-024-00847-6

Original Paper
Published: 17 April 2024

Health and Technology (2024)

Abstract

Purpose

Electronic Health Records (EHRs) are invaluable sources of information for healthcare research and decision-making. However, laws protecting patient privacy restrict the sharing of real EHR data, thus impeding the development of advanced AI-based healthcare technologies, which require large volumes of quality data. To bridge this gap, synthetic data (SD) has emerged as a potential privacy-preserving alternative to real data. While SD can serve as a proxy for real data in many practical scenarios, its true potential remains unexploited: the lack of sufficient empirical evidence supporting its efficacy has led to skepticism and decreased trust in SD among stakeholders. This research article presents the results of extensive experimentation with SD in the prediction of Cardiovascular Disease (CVD) mortality.

Methods

Generative adversarial networks (GANs) are a popular choice for generating SD, especially in the medical domain. We perform two controlled experiments to evaluate the effectiveness of the state-of-the-art GAN models for CVD SD generation, and to study the impact of increasing data-dimensionality upon the utility of generated SD.

Results

The results demonstrate that GAN-generated SD performs well in predicting CVD, with accuracy comparable to that of real data, and highlight the potential of SD for disease prediction.

Conclusion

We believe that our results will foster greater trust in practical use cases of SD among medical practitioners and other stakeholders, for applications such as decision support systems, health monitoring and planning, and mobile health systems.

Data availability

The datasets utilized in the experiments are publicly available.

Funding

The authors did not receive support from any organization for the submitted research article.

Author information

Authors and Affiliations

Faculty of Computing, Riphah International University, I-14, Hajj Complex, Islamabad, Pakistan
Shahzad Ahmed Khan, Hajra Murtaza & Musharif Ahmed

Contributions

All the authors contributed to the conception and design of the study. The research methodology was designed by Dr. Musharraf Ahmed. Material preparation, data collection and analysis were performed by Shahzad Ahmed Khan. Hajra Murtaza and Shahzad Ahmed conducted the experiments and prepared the manuscript, which was reviewed by Dr. Musharraf Ahmed. All the authors read and approved the final manuscript.

Corresponding author

Correspondence to Hajra Murtaza.

Ethics declarations

Ethical approval

Not Applicable.

Consent to publish

All the authors agreed to publish this work in the respective journal.

Consent to participate

Not Applicable.

Conflict of interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Pearson correlation

1.1 Pearson correlation for LSTMGAN and DPGAN based SD

As described in the results section, we evaluated three models. Among the three selected models, CTGAN performed well. The CTGAN results are presented in the results section; here we present the results of LSTMGAN and DPGAN in Figs. 6 and 7, respectively.

From the above results, we can say that there is a huge gap between real data and SD: the relationships among the variables in the real dataset differ markedly from those in the synthetic dataset. However, a visual comparison alone is not enough to reject a model from further experiments, so we also evaluate the utility of the SD by fitting our model on SD and predicting patient mortality on both real data and SD. The prediction results on SD are far from the results on real data.
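The comparison of Pearson correlations between real data and SD can also be summarised numerically. The following is a minimal sketch, not the authors' code: the column names and values are made up, and the mean absolute difference between the two correlation matrices is only one possible way to quantify the gap that the figures show visually.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for constant columns
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """columns: dict mapping column name -> list of values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

def correlation_gap(real, synthetic):
    """Mean absolute difference between the two correlation matrices."""
    real_corr = correlation_matrix(real)
    synth_corr = correlation_matrix(synthetic)
    diffs = [abs(real_corr[key] - synth_corr[key]) for key in real_corr]
    return sum(diffs) / len(diffs)

# Toy, made-up columns (not taken from the paper's datasets).
real = {"age": [40, 50, 60, 70], "bp": [120, 130, 140, 150], "died": [0, 0, 1, 1]}
synthetic = {"age": [40, 50, 60, 70], "bp": [150, 140, 130, 120], "died": [0, 1, 0, 1]}

gap = correlation_gap(real, synthetic)
print(f"mean absolute correlation gap: {gap:.3f}")
```

A gap of 0 means the synthetic data reproduce every pairwise correlation exactly; larger values indicate that relationships among variables have drifted.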

1.2 Result for experiment 01: DPGAN

As a basic sanity check, we first examine the relationships among variables using the Pearson correlation. The visual representation is given in Fig. 7.

1.3 CTGAN for heart stroke dataset

These experiments address a binary classification problem in which we predict the target variable and compare real data with SD. We first present the Pearson correlation of both datasets in Fig. 8.

1.4 CTGAN with Heart failure dataset

In this experiment, we perform binary classification on SD, predicting the target variable. We first present the Pearson correlation of both datasets in Fig. 9.

1.5 CTGAN for UCI dataset

The UCI dataset is used for binary prediction of the target variable. We first present the Pearson correlation of both datasets. For SD, we varied the number of epochs; the results are presented in Fig. 10.

We compared the real and synthetic datasets by checking their accuracy. With our best model configuration, we obtained an accuracy of 74%. The mean accuracy is presented in Table 7.

Appendix 2: DWP results

2.1 Dimension wise prediction for heart stroke dataset

In the table below, Stroke is predicted from the remaining attributes. All the other attributes are predicted in the same manner, and the results are presented in Table 12. The experiment is performed on both SD and real data for every attribute.

Table 12 DWP Accuracy (%) for both Real and Synthetic Heart Stroke Datasets

2.2 Dimension wise prediction for UCI dataset

In the table below, the target variable is predicted from the remaining attributes, and the other variables are predicted in the same manner. The experiment is performed on both SD and real data for every attribute. Table 13 shows that the prediction accuracy is very close for both datasets.

Table 13 DWP Accuracy (%) for UCI with Real and Synthetic Datasets
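
Dimension-wise prediction (DWP), as described in this appendix, treats each attribute in turn as the target and predicts it from the remaining attributes. The sketch below illustrates that loop under stated assumptions: the nearest-centroid classifier, the toy binary table and the choice to score both models on the same held-out real rows are all illustrative stand-ins, since the classifier and evaluation split used in the study are not given here.

```python
import random

def fit_centroids(rows, labels):
    """Nearest-centroid classifier: one mean feature vector per class label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Return the label whose centroid is closest to `row`."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def split_target(table, target):
    """Separate column index `target` (the label) from the remaining features."""
    features = [[v for i, v in enumerate(row) if i != target] for row in table]
    labels = [row[target] for row in table]
    return features, labels

def dimension_wise_accuracy(train_table, test_table):
    """Accuracy of predicting each column from all the other columns."""
    scores = []
    for target in range(len(train_table[0])):
        x_train, y_train = split_target(train_table, target)
        x_test, y_test = split_target(test_table, target)
        centroids = fit_centroids(x_train, y_train)
        hits = sum(predict(centroids, x) == y for x, y in zip(x_test, y_test))
        scores.append(hits / len(y_test))
    return scores

def noisy_copy(bit, rng, keep=0.9):
    """Return `bit` with probability `keep`, otherwise its flip."""
    return bit if rng.random() < keep else 1 - bit

def toy_table(n_rows, rng):
    """Made-up binary table with three correlated columns."""
    table = []
    for _ in range(n_rows):
        base = rng.randint(0, 1)
        table.append([base, noisy_copy(base, rng), noisy_copy(base, rng)])
    return table

rng = random.Random(0)
real_train, real_test = toy_table(200, rng), toy_table(100, rng)
synthetic = toy_table(200, rng)  # stand-in for GAN-generated rows

dwp_real = dimension_wise_accuracy(real_train, real_test)
dwp_synthetic = dimension_wise_accuracy(synthetic, real_test)
for column, (r, s) in enumerate(zip(dwp_real, dwp_synthetic)):
    print(f"column {column}: real={r:.2f} synthetic={s:.2f}")
```

Close per-column accuracies for the two training sources suggest that the SD preserve the dependencies needed to predict each attribute.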

Appendix 3: TSTR and TRTS results

To evaluate the quality of SD, we performed two tests. In TRTS (Train on Real, Test on Synthetic), we trained our model on the real dataset and then tested it on SD. We performed TRTS 50 times, and each time our model generated different data. The second test is TSTR (Train on Synthetic, Test on Real), in which we trained the model on SD and tested it on real data. TSTR yields accuracy similar to TRTS. Here, too, we iterated 50 times to obtain more reliable results.
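
The two protocols can be sketched as follows. This is an illustrative outline rather than the study's pipeline: the nearest-centroid classifier, the Gaussian toy data and the `sample` helper that stands in for drawing a fresh synthetic dataset from a trained GAN are all assumptions made for the example.

```python
import random
from statistics import mean

def fit_centroids(data):
    """data: list of (features, label). Returns one mean vector per label."""
    sums, counts = {}, {}
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label of the closest centroid."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(train, test):
    """Train a nearest-centroid classifier on `train`, score it on `test`."""
    centroids = fit_centroids(train)
    hits = sum(predict(centroids, x) == y for x, y in test)
    return hits / len(test)

def sample(n, rng, spread):
    """Made-up two-feature binary dataset; `spread` controls class overlap."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label
        data.append(([rng.gauss(center, spread), rng.gauss(center, spread)], label))
    return data

rng = random.Random(42)
real_train, real_test = sample(300, rng, 1.0), sample(150, rng, 1.0)
baseline = accuracy(real_train, real_test)        # train on real, test on real

trts, tstr = [], []
for _ in range(10):                                # repeated runs, as in the text
    synthetic = sample(300, rng, 1.3)              # stand-in for a fresh GAN sample
    trts.append(accuracy(real_train, synthetic))   # Train on Real, Test on Synthetic
    tstr.append(accuracy(synthetic, real_test))    # Train on Synthetic, Test on Real

summary = {"baseline": baseline, "TRTS": mean(trts), "TSTR": mean(tstr)}
print(summary)
```

Averaging over repeated synthetic draws, as the text does, reduces the influence of any single unusually good or bad sample from the generator.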

TRTS Results of Heart Stroke dataset:

Accuracy with the real dataset: 0.9572
Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy
1	0.9256	7	0.8849	13	0.8849	19	0.8492
2	0.9345	8	0.9232	14	0.9232	20	0.9175
3	0.9226	9	0.8890	15	0.8800	21	0.8951
4	0.9144	10	0.9096	16	0.9096	22	0.8447
5	0.9491	11	0.8831	17	0.8831	23	0.7856
6	0.9002	12	0.9047	18	0.9042	24	0.9123
25	0.8961
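
The 25 TRTS accuracies listed directly above can be condensed into a mean, a spread and a gap to the real-data accuracy. The snippet below is a small worked example using only the values from that table; the choice of the sample standard deviation as the measure of spread is an assumption, not something the text specifies.

```python
from statistics import mean, stdev

# TRTS accuracies for the Heart Stroke dataset, iterations 1-25,
# copied from the table above.
trts_heart_stroke = [
    0.9256, 0.9345, 0.9226, 0.9144, 0.9491, 0.9002,
    0.8849, 0.9232, 0.8890, 0.9096, 0.8831, 0.9047,
    0.8849, 0.9232, 0.8800, 0.9096, 0.8831, 0.9042,
    0.8492, 0.9175, 0.8951, 0.8447, 0.7856, 0.9123,
    0.8961,
]
real_accuracy = 0.9572  # "Accuracy with the real dataset" reported above

mean_accuracy = mean(trts_heart_stroke)
spread = stdev(trts_heart_stroke)            # sample standard deviation
gap_to_real = real_accuracy - mean_accuracy  # positive: SD scores below real data

print(f"mean={mean_accuracy:.4f} sd={spread:.4f} gap={gap_to_real:.4f}")
```

The same three numbers can be computed for each of the other tables by swapping in their values.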

TSTR Results of Heart Stroke dataset:

Accuracy with the real dataset: 0.9467
Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy
1	0.9498	7	0.9112	13	0.9387	19	0.9344
2	0.8964	8	0.8812	14	0.8890	20	0.9358
3	0.9176	9	0.8592	15	0.9166	21	0.9124
4	0.9287	10	0.9345	16	0.8986	22	0.9234
5	0.9191	11	0.8861	17	0.9153	23	0.9431
6	0.8912	12	0.8974	18	0.8443	24	0.9342
25	0.8976

TRTS Results for UCI datasets:

Accuracy with the real dataset: 0.8689
Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy
1	0.6428	7	0.5564	13	0.6463	19	0.43548
2	0.6123	8	0.5457	14	0.5246	20	0.41140
3	0.6453	9	0.5545	15	0.6111	21	0.61243
4	0.6452	10	0.4657	16	0.6136	22	0.43584
5	0.6354	11	0.7146	17	0.5345	23	0.64537
6	0.6128	12	0.6751	18	0.5456	24	0.41751
25	0.6055

TSTR Results for UCI datasets:

Accuracy with the real dataset: 0.8645
Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy
1	0.6317	7	0.7428	13	0.6670	19	0.5847
2	0.6114	8	0.7344	14	0.6073	20	0.6236
3	0.6338	9	0.6434	15	0.6754	21	0.7000
4	0.6358	10	0.7147	16	0.6476	22	0.7073
5	0.5387	11	0.6756	17	0.6175	23	0.6837
6	0.5175	12	0.7045	18	0.5751	24	0.6042
25	0.7031

TRTS Results for Heart Failure dataset:

Accuracy with the real dataset: 0.8645
Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy
1	0.6354	7	0.7123	13	0.6534	19	0.6564
2	0.6114	8	0.7812	14	0.7766	20	0.5246
3	0.5124	9	0.7453	15	0.6545	21	0.7344
4	0.6445	10	0.7111	16	0.6343	22	0.7133
5	0.5453	11	0.7312	17	0.6734	23	0.7334
6	0.6144	12	0.6128	18	0.6984	24	0.6456
25	0.7122

TSTR Results for Heart Failure dataset:

Accuracy with the real dataset: 0.8334
Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy	Iteration	Accuracy
1	0.7234	7	0.7089	13	0.5985	19	0.6114
2	0.6344	8	0.6567	14	0.6122	20	0.5433
3	0.6434	9	0.5434	15	0.6073	21	0.5358
4	0.7147	10	0.5985	16	0.5837	22	0.6387
5	0.6756	11	0.6751	17	0.5042	23	0.6147
6	0.7045	12	0.6845	18	0.5317	24	0.7300
25	0.5788

Appendix 4: Architectural details of GANs used in the experiments

4.1 CTGAN

CTGAN (Conditional Tabular Generative Adversarial Network) is a generative model designed to produce synthetic structured data while preserving its original statistical properties. Built upon the principles of Generative Adversarial Networks (GANs), CTGAN encompasses a generator and discriminator that work in tandem to create high-quality synthetic data.

This architecture operates within a conditional GAN framework, allowing it to generate data samples while considering specific attributes. By transforming discrete categorical attributes into continuous embeddings using embedding networks, CTGAN ensures that similar categorical values are closely represented in the synthetic data. The generator takes both continuous and categorical noise vectors as well as attribute values for conditioning. Through its layers, including fully connected ones, it transforms these inputs into realistic synthetic samples that mirror the original data. The discriminator, on the other hand, distinguishes between real and generated data by classifying input samples as authentic or synthetic.

To train CTGAN, adversarial and auxiliary loss functions come into play. The adversarial loss drives the generator and discriminator against each other so that the generated data become hard to distinguish from real data. Auxiliary losses, such as a distance-based loss, help the synthetic samples align with the statistical properties of the original data. CTGAN also uses a sampling strategy that promotes diversity in the generated samples, mitigating the risk of the generator producing repetitive data (mode collapse). Moreover, post-processing can be employed to conform to predefined constraints or business rules, enhancing the utility of the generated data.
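
The conditioning and sampling ideas above can be made concrete with a simplified sketch of how a condition vector might be drawn, loosely following the "training-by-sampling" idea described in the CTGAN paper by Xu et al. (2019). It is not the reference implementation: the function names, the toy rows and the use of log(count + 1) as the category weight are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def build_sampler(table, discrete_columns):
    """Pre-compute, per discrete column, its categories and log-frequency weights."""
    sampler = {}
    for col in discrete_columns:
        counts = Counter(row[col] for row in table)
        categories = sorted(counts)
        # Log-frequency weighting lifts rare categories relative to raw counts.
        # log(count + 1) is an assumption made here to keep weights positive.
        weights = [math.log(counts[c] + 1) for c in categories]
        sampler[col] = (categories, weights)
    return sampler

def sample_condition(sampler, rng):
    """Pick one discrete column uniformly, then one of its categories by weight.

    Returns (column, category, cond), where cond concatenates one-hot masks
    over every discrete column and has exactly one entry set to 1.
    """
    columns = list(sampler)
    chosen_col = rng.choice(columns)
    categories, weights = sampler[chosen_col]
    chosen_cat = rng.choices(categories, weights=weights, k=1)[0]
    cond = []
    for col in columns:
        cats, _ = sampler[col]
        cond.extend(1 if (col == chosen_col and c == chosen_cat) else 0 for c in cats)
    return chosen_col, chosen_cat, cond

# Made-up rows; the column names are illustrative only.
table = [
    {"sex": "F", "smoker": "no"}, {"sex": "M", "smoker": "no"},
    {"sex": "M", "smoker": "no"}, {"sex": "F", "smoker": "yes"},
]
sampler = build_sampler(table, ["sex", "smoker"])
column, category, cond = sample_condition(sampler, random.Random(0))
print(column, category, cond)
```

In a full model, the condition vector would be concatenated with the noise input of the generator, and real rows matching the chosen category would be shown to the discriminator, so that minority categories are seen more often than their raw frequency would allow.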

4.2 Architecture of DP-GAN

Privacy-preserving generative models, like the Deep Privacy Preserving Generative Model (DPGAN), are designed to generate synthetic data while protecting the privacy of individuals in the original dataset. These models are developed to ensure that generated data retains the statistical properties of the original data without revealing sensitive information.

The DPGAN framework comprises several key components. It starts with an original dataset containing sensitive information and defines privacy constraints that specify which attributes or attribute combinations should remain private. The generator, a neural network, takes random noise as input and produces synthetic data samples. To achieve privacy, the generator generates samples that adhere to the original data's statistical distribution while satisfying privacy constraints.

Privacy preservation is achieved through mechanisms such as differential privacy, which injects calibrated noise (in DP-GAN-style models, typically into the gradients during training rather than into the released samples), ensuring that the synthetic samples do not divulge specific individual details. Privacy loss bounds are also enforced to maintain acceptable privacy levels. Utility must be balanced against privacy: the generated data should remain useful for downstream tasks without compromising the privacy objectives. Evaluation therefore assesses both privacy and utility, with adversarial training used to keep the synthetic data hard to distinguish from the original while preserving privacy.

Post-processing might be applied to guarantee that the synthetic data adhere to the privacy constraints. Validation processes check the validity and privacy alignment of the generated data using suitable metrics and tests. It is important to note that privacy-preserving generative models represent an evolving research field, and architecture specifics can vary with the design of the model and the goals of the researchers.
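
One common way to realise differential privacy when training a GAN is to clip each example's gradient and add Gaussian noise before the parameter update (the DP-SGD recipe). The sketch below illustrates only that step, under the assumption that noise is injected into gradients; the text above does not specify the exact mechanism used in the experiments, and the function names and numbers are hypothetical. Accounting for the cumulative privacy budget is not shown.

```python
import math
import random

def clip(gradient, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]

def private_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    total = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(clipped) for v in noisy]

# Hypothetical per-example discriminator gradients for a two-parameter model.
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
step = private_gradient(grads, max_norm=1.0, noise_multiplier=1.1,
                        rng=random.Random(0))
print(step)
```

Clipping bounds how much any single record can influence the update, and the noise masks what influence remains; a larger `noise_multiplier` gives stronger privacy at the cost of utility.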

4.3 Architecture of LSTM GAN

The term "LSTMGAN" denotes a model that merges the capabilities of LSTM (Long Short-Term Memory) networks and GANs (Generative Adversarial Networks). Its architecture can be conceptualized as follows.

LSTMGAN could be a generative model designed to produce sequential data, leveraging LSTM's proficiency in capturing temporal patterns. Its components include a generator and a discriminator, fundamental to GANs. The generator employs LSTM architecture to create sequences of data points, using random noise as input. The discriminator's role is to differentiate real and generated sequences, steering the generator towards crafting more authentic sequences. Training involves an adversarial framework. The generator strives to make its sequences indistinguishable from real data, while the discriminator refines its classification abilities.

Loss functions encompass an adversarial loss and an LSTM loss. The former compels the generator to craft sequences resembling real data, while the latter ensures coherence and temporal consistency. Iterations of training refine both generator and discriminator through backpropagation, where the generator's parameters adapt to enhance sequence quality, and the discriminator becomes adept at classification.
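
The adversarial part of this objective can be written down compactly. The sketch below shows the standard binary cross-entropy form of the discriminator loss and the commonly used non-saturating generator loss; it is a generic GAN formulation, not one confirmed for the model used in the study, and it omits the "LSTM loss" term because the text does not define it. The discriminator outputs are made-up numbers.

```python
import math

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real sequences should score 1, generated ones 0."""
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss: low when the discriminator scores fakes near 1."""
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sequence is real).
d_on_real = [0.9, 0.8, 0.95]
d_on_fake = [0.2, 0.1, 0.3]

d_loss = discriminator_loss(d_on_real, d_on_fake)
g_loss = generator_loss(d_on_fake)
print(f"discriminator loss={d_loss:.3f} generator loss={g_loss:.3f}")
```

During training the two losses are minimised in alternation: the discriminator update uses `d_loss`, and the generator update uses `g_loss` with the discriminator held fixed.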

The generator's sampling strategy ensures diverse sequences, sidestepping repetition and mode collapse issues. In applications, the LSTMGAN model could generate realistic time series data, text sequences, or other sequential data types. However, this description is conceptual; the original research papers and documentation should be consulted for the precise architecture and methodology of any specific LSTMGAN implementation.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Khan, S.A., Murtaza, H. & Ahmed, M. Utility of GAN generated synthetic data for cardiovascular diseases mortality prediction: an experimental study. Health Technol. (2024). https://doi.org/10.1007/s12553-024-00847-6

Received: 20 September 2023
Accepted: 19 March 2024
Published: 17 April 2024
DOI: https://doi.org/10.1007/s12553-024-00847-6
