
Creation of Synthetic Data with Conditional Generative Adversarial Networks

  • Belén Vega-Márquez
  • Cristina Rubio-Escudero
  • José C. Riquelme
  • Isabel Nepomuceno-Chamorro
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 950)

Abstract

The generation of synthetic data is becoming a fundamental task for any organization because of the new data protection laws that are emerging. Generative Adversarial Networks (GANs) and their variants have attracted many researchers because of their elegant theoretical basis and their strong performance in generating new data [19]. The goal of synthetic data generation is to create data that perform similarly to the original dataset in many analysis tasks, such as classification. The problem with GANs in a classification setting is that they do not take class labels into account when generating new data; they treat the label as just another attribute. This research work focuses on creating new synthetic data from the "Default of Credit Card Clients" dataset with a Conditional Generative Adversarial Network (CGAN). CGANs are an extension of GANs in which the class label is taken into account when new data are generated. Performance was measured by applying classification algorithms to both the original dataset and the generated data and comparing the results.
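The key mechanism the abstract relies on is that a CGAN feeds the class label y to both networks: the generator G(z | y) receives noise concatenated with y, and the discriminator D(x | y) receives a data row concatenated with y, following Mirza and Osindero [11]. The sketch below illustrates only that conditioning and the CGAN value function V(D, G) = E[log D(x|y)] + E[log(1 - D(G(z|y)|y))] as an untrained NumPy forward pass. It is not the authors' model: the layer sizes, the plain-NumPy implementation, the 23-feature width (assumed to match the explanatory attributes of "Default of Credit Card Clients"), the encoding of label 1 as the "default" class, and every function name are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sizes for illustration; not taken from the paper.
# N_FEATURES = 23 is an assumption about the credit card dataset's width.
NOISE_DIM, N_FEATURES, N_CLASSES, HIDDEN = 8, 23, 2, 16

rng = np.random.default_rng(0)


def one_hot(labels, n_classes):
    """Encode integer class labels as one-hot rows (the condition y)."""
    return np.eye(n_classes)[labels]


def init_layer(n_in, n_out):
    """Small random weights and a zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(a):
    return np.maximum(a, 0.0)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Generator G(z | y): its input is the noise vector concatenated with y.
G = [init_layer(NOISE_DIM + N_CLASSES, HIDDEN), init_layer(HIDDEN, N_FEATURES)]
# Discriminator D(x | y): its input is a data row concatenated with y.
D = [init_layer(N_FEATURES + N_CLASSES, HIDDEN), init_layer(HIDDEN, 1)]


def generate(z, y):
    """Map (noise, label) pairs to synthetic feature rows."""
    h = relu(np.concatenate([z, y], axis=1) @ G[0][0] + G[0][1])
    return h @ G[1][0] + G[1][1]  # linear output for continuous features


def discriminate(x, y):
    """Return the estimated probability that each row x is real, given y."""
    h = relu(np.concatenate([x, y], axis=1) @ D[0][0] + D[0][1])
    return sigmoid(h @ D[1][0] + D[1][1]).ravel()


def cgan_value(x_real, y_real, z, y_fake):
    """Monte Carlo estimate of V(D, G) on one batch."""
    d_real = discriminate(x_real, y_real)
    d_fake = discriminate(generate(z, y_fake), y_fake)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))


def sample_class(label, n):
    """Draw n synthetic rows, all conditioned on the same class label."""
    z = rng.normal(size=(n, NOISE_DIM))
    y = one_hot(np.full(n, label), N_CLASSES)
    return generate(z, y)


# Stand-in "real" batch: random features with random binary labels.
x_real = rng.normal(size=(32, N_FEATURES))
y_real = one_hot(rng.integers(0, N_CLASSES, size=32), N_CLASSES)
z = rng.normal(size=(32, NOISE_DIM))

value = cgan_value(x_real, y_real, z, y_real)
synthetic_defaults = sample_class(1, 100)  # label 1 assumed to mean "default"
print(synthetic_defaults.shape, value)
```

Training would alternate gradient steps that raise V with respect to the discriminator's weights and lower it with respect to the generator's; that loop, and any deep learning framework, is omitted here. Because y is an explicit input to the generator, a function like `sample_class` can request rows of a chosen class, which is what lets a CGAN keep the class structure that a plain GAN treats as just another attribute.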

Keywords

Synthetic data, Conditional Generative Adversarial Networks, Deep Learning, Credit Card Fraud Data

References

  1. Beaulieu-Jones, B.K., Wu, Z.S., Williams, C., Lee, R., Bhavnani, S.P., Byrd, J.B., Greene, C.S.: Privacy-preserving generative deep neural networks support clinical data sharing. bioRxiv, p. 159756, Jan 2018. http://biorxiv.org/content/early/2018/12/20/159756.abstract
  2. Chen, T., Guestrin, C.: XGBoost. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2016 (2016)
  3. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. CoRR abs/1603.02754 (2016). http://arxiv.org/abs/1603.02754
  4. Choi, E., Biswal, S., Malin, B., Duke, J., Stewart, W.F., Sun, J.: Generating multi-label discrete electronic health records using generative adversarial networks. CoRR abs/1703.06490 (2017). http://arxiv.org/abs/1703.06490
  5. Chollet, F., et al.: Keras (2015). https://keras.io
  6. Dietz, M.: GAN-Sandbox (2017). https://github.com/mjdietzx/GAN-Sandbox
  7. Cortes Generales: Ley orgánica 3/2018, de 5 de diciembre, de protección de datos personales y garantía de los derechos digitales [Organic Law 3/2018 of 5 December on the Protection of Personal Data and Guarantee of Digital Rights], December 2018. https://www.boe.es/buscar/doc.php?id=BOE-A-2018-16673. Accessed 14 Feb 2019
  8. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS (2014)
  9. Kim, H.Y.: Statistical notes for clinical researchers: covariance and correlation. Restorative Dent. Endod. 43(1), e4 (2018). http://www.ncbi.nlm.nih.gov/pubmed/29487835, http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5816993
  10. Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
  11. Mirza, M., Osindero, S.: Conditional generative adversarial nets. CoRR abs/1411.1784 (2014). http://arxiv.org/abs/1411.1784
  12. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
  13. Ramponi, G., Protopapas, P., Brambilla, M., Janssen, R.: T-CGAN: conditional generative adversarial network for data augmentation in noisy time series with irregular sampling. CoRR abs/1811.08295 (2018). http://arxiv.org/abs/1811.08295
  14. Rezaei, M., Yang, H., Meinel, C.: Multi-task generative adversarial network for handling imbalanced clinical data. CoRR abs/1811.10419 (2018). http://arxiv.org/abs/1811.10419
  15. Schober, P., Boer, C., Schwarte, L.A.: Correlation coefficients. Anesth. Analg. 126(5), 1763–1768 (2018). http://insights.ovid.com/crossref?an=00000539-201805000-00050
  16. Sedgwick, P.: Pearson’s correlation coefficient. BMJ 345, e4483 (2012). https://www.bmj.com/content/345/bmj.e4483
  17. Triastcyn, A., Faltings, B.: Generating differentially private datasets using GANs (2018). https://openreview.net/forum?id=rJv4XWZA
  18. Vega, B.: Syntheticdata (2019). https://github.com/bvegaus/syntheticData
  19. Xie, L., Lin, K., Wang, S., Wang, F., Zhou, J.: Differentially private generative adversarial network, February 2018
  20. Yoon, J., Jordon, J., van der Schaar, M.: PATE-GAN: generating synthetic data with differential privacy guarantees. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=S1zk9iRqF7

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Belén Vega-Márquez (1)
  • Cristina Rubio-Escudero (1)
  • José C. Riquelme (1)
  • Isabel Nepomuceno-Chamorro (1)
  1. Department of Computer Languages and Systems, University of Sevilla, Sevilla, Spain
