
Generalization error of GAN from the discriminator’s perspective


Abstract

The generative adversarial network (GAN) is a well-known model for learning high-dimensional distributions, but the mechanism behind its generalization ability is not understood. In particular, GAN is vulnerable to the memorization phenomenon: eventual convergence to the empirical distribution. We consider a simplified GAN model with the generator replaced by a density, and analyze how the discriminator contributes to generalization. We show that with early stopping, the generalization error measured by the Wasserstein metric escapes the curse of dimensionality, even though memorization is inevitable in the long run. In addition, we present a hardness-of-learning result for WGAN.
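For intuition, here is a minimal numerical sketch (not from the paper) of the memorization baseline: outputting the empirical distribution of \(n\) samples incurs a Wasserstein error that decays only like \(n^{-1/d}\) in dimension \(d\). The snippet estimates this decay for the uniform distribution on \([0,1]^d\). It assumes NumPy and the POT optimal-transport library (pip install pot); the helper name empirical_w1 is introduced here for illustration.

```python
# Sketch (illustrative, not the paper's method): estimate W1 between an
# n-point empirical measure and the uniform distribution on [0,1]^d,
# using a large fresh sample as a stand-in for the true distribution.
import numpy as np
import ot  # Python Optimal Transport (POT)

def empirical_w1(n, d, m=2000, seed=0):
    rng = np.random.default_rng(seed)
    xs = rng.random((n, d))                  # "training" sample
    xt = rng.random((m, d))                  # fresh proxy for the truth
    M = ot.dist(xs, xt, metric="euclidean")  # pairwise Euclidean cost matrix
    a = np.full(n, 1.0 / n)                  # uniform weights on each sample
    b = np.full(m, 1.0 / m)
    return ot.emd2(a, b, M)                  # exact optimal-transport cost (W1)

for d in (2, 8):
    for n in (100, 400, 1600):
        print(f"d={d:2d}  n={n:5d}  W1 ~ {empirical_w1(n, d):.3f}")
```

For d = 8 the printed values barely shrink as n grows, which is exactly the curse of dimensionality that the early-stopping analysis is shown to escape.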


Fig. 1


Notes

  1. Technically, curve \(\textcircled {1}\) in Fig. 1 concerns the \(W_2\) loss, but it is reasonable to believe that training on the \(W_1\) loss, or any \(W_p\) loss, cannot escape the curse of dimensionality either.
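For context, the classical rate underlying this footnote is standard (Dobrić and Yukich 1995; Weed and Bach 2017; the exact regularity conditions vary by reference): if \(\mu\) is an absolutely continuous probability measure on \([0,1]^d\), \(\hat{\mu }_n\) is its \(n\)-sample empirical measure, and \(d > 2p\), then

\[ \mathbb {E}\, W_p(\hat{\mu }_n, \mu ) \asymp n^{-1/d}, \]

so reaching accuracy \(\varepsilon \) requires on the order of \(\varepsilon ^{-d}\) samples, for every \(p \ge 1\).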


Author information


Corresponding author

Correspondence to Hongkang Yang.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, H., E, W. Generalization error of GAN from the discriminator’s perspective. Res Math Sci 9, 8 (2022). https://doi.org/10.1007/s40687-021-00306-y

