CB-GAN: Generate Sensitive Data with a Convolutional Bidirectional Generative Adversarial Networks

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13946)

Abstract

In the era of big data, measurements collected from all walks of life play important roles in a wide range of data mining applications. Not all data owners (or keepers) can develop feasible learning models for knowledge discovery themselves. The original data often need to be shared with researchers or data scientists to obtain better mining insights, especially in the medical, financial, and industrial fields. However, concerns about sensitivity and privacy limit the availability and completeness of the shared data and, in turn, the quality of the mining results. In this paper, we propose a novel Convolutional Bidirectional Generative Adversarial Network (CB-GAN) framework for generating synthetic versions of sensitive data. Convolutional Neural Networks are used to capture the feature correlations of the original data, and Generative Adversarial Networks are combined with Autoencoders to synthesize realistically distributed data. To demonstrate the feasibility of the model, we evaluate it from three aspects: how similar the distribution of the synthetic data is to that of the original data, how well the synthetic data support subsequent data mining tasks, and how much sensitive information is hidden. Experimental results show that the proposed method outperforms state-of-the-art methods.
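The first evaluation aspect, distribution similarity, can be illustrated with a small sketch. The abstract does not say which similarity measure the paper uses, so the code below is an assumption for illustration only: it computes a per-feature two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs), where 0 means the samples share an empirical distribution and 1 means they do not overlap. The function names `ks_statistic` and `per_feature_ks` are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(real, synth):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Illustrative only; not taken from the CB-GAN paper.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synth)
    n, m = len(real_sorted), len(synth_sorted)
    max_gap = 0.0
    # Empirical CDFs are step functions that only change at observed values,
    # so evaluating the gap at every observed value is sufficient.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def per_feature_ks(real_rows, synth_rows):
    """Apply the KS statistic column by column to two tabular samples."""
    n_features = len(real_rows[0])
    return [
        ks_statistic([row[j] for row in real_rows],
                     [row[j] for row in synth_rows])
        for j in range(n_features)
    ]


if __name__ == "__main__":
    real = [[1.0, 10.0], [2.0, 20.0]]    # toy "original" records
    synth = [[1.0, 30.0], [2.0, 40.0]]   # toy "synthetic" records
    print(per_feature_ks(real, synth))
```

A per-feature statistic like this ignores correlations between features, so it could only complement, not replace, a multivariate comparison.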

This research was supported by the Guangdong Natural Science Foundation General Program (Grant No. 2022A1515011713).

Author information

Corresponding author

Correspondence to Dan Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hu, R., Li, D., Ng, SK., Zheng, Z. (2023). CB-GAN: Generate Sensitive Data with a Convolutional Bidirectional Generative Adversarial Networks. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13946. Springer, Cham. https://doi.org/10.1007/978-3-031-30678-5_13

  • DOI: https://doi.org/10.1007/978-3-031-30678-5_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30677-8

  • Online ISBN: 978-3-031-30678-5

  • eBook Packages: Computer Science, Computer Science (R0)