
Generalization error of GAN from the discriminator’s perspective


Abstract

The generative adversarial network (GAN) is a well-known model for learning high-dimensional distributions, but the mechanism behind its generalization ability is not understood. In particular, GAN is vulnerable to the memorization phenomenon: eventual convergence to the empirical distribution. We consider a simplified GAN model with the generator replaced by a density, and analyze how the discriminator contributes to generalization. We show that with early stopping, the generalization error measured by the Wasserstein metric escapes the curse of dimensionality, even though memorization is inevitable in the long run. In addition, we present a hardness-of-learning result for WGAN.
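For intuition, here is a minimal numerical sketch (not from the paper) of the memorization baseline: outputting the empirical distribution of \(n\) samples incurs a Wasserstein error that decays only like \(n^{-1/d}\) in dimension \(d\). The snippet estimates this decay for the uniform distribution on \([0,1]^d\). It assumes NumPy and the POT optimal-transport library (pip install pot); the helper name empirical_w1 is introduced here for illustration.

```python
# Sketch (illustrative, not the paper's method): estimate W1 between an
# n-point empirical measure and the uniform distribution on [0,1]^d,
# using a large fresh sample as a stand-in for the true distribution.
import numpy as np
import ot  # Python Optimal Transport (POT)

def empirical_w1(n, d, m=2000, seed=0):
    rng = np.random.default_rng(seed)
    xs = rng.random((n, d))                  # "training" sample
    xt = rng.random((m, d))                  # fresh proxy for the truth
    M = ot.dist(xs, xt, metric="euclidean")  # pairwise Euclidean cost matrix
    a = np.full(n, 1.0 / n)                  # uniform weights on each sample
    b = np.full(m, 1.0 / m)
    return ot.emd2(a, b, M)                  # exact optimal-transport cost (W1)

for d in (2, 8):
    for n in (100, 400, 1600):
        print(f"d={d:2d}  n={n:5d}  W1 ~ {empirical_w1(n, d):.3f}")
```

For d = 8 the printed values barely shrink as n grows, which is exactly the curse of dimensionality that the early-stopping analysis is shown to escape.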


Fig. 1


Notes

  1. Technically, curve \(\textcircled {1}\) in Fig. 1 concerns the \(W_2\) loss, but it is reasonable to believe that training on the \(W_1\) loss, or any \(W_p\) loss, cannot escape the curse of dimensionality either.
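For context, the classical rate underlying this footnote is standard (Dobrić and Yukich 1995; Weed and Bach 2017; the exact regularity conditions vary by reference): if \(\mu\) is an absolutely continuous probability measure on \([0,1]^d\), \(\hat{\mu }_n\) is its \(n\)-sample empirical measure, and \(d > 2p\), then

\[ \mathbb {E}\, W_p(\hat{\mu }_n, \mu ) \asymp n^{-1/d}, \]

so reaching accuracy \(\varepsilon \) requires on the order of \(\varepsilon ^{-d}\) samples, for every \(p \ge 1\).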


Author information


Corresponding author

Correspondence to Hongkang Yang.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, H., E, W. Generalization error of GAN from the discriminator’s perspective. Res Math Sci 9, 8 (2022). https://doi.org/10.1007/s40687-021-00306-y

