
On the regularized risk of distributionally robust learning over deep neural networks


In this paper, we explore the relation between distributionally robust learning and different forms of regularization used to enforce the robustness of deep neural networks. In particular, starting from a concrete min-max distributionally robust problem, and using tools from optimal transport theory, we derive first-order and second-order approximations to the distributionally robust problem in terms of appropriate regularized risk minimization problems. In the context of deep ResNet models, we identify the structure of the resulting regularization problems as mean-field optimal control problems in which the number and dimension of state variables are within a dimension-free factor of the dimension of the original, unrobustified problem. Using the Pontryagin maximum principles associated with these problems, we motivate a family of scalable algorithms for the training of robust neural networks. Our analysis recovers some results and algorithms known in the literature (in settings explained throughout the paper) and provides many other theoretical and algorithmic insights that, to our knowledge, are novel. In our analysis, we employ tools that we deem useful for future analyses of more general adversarial learning problems.
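The first-order approximation mentioned in the abstract trades the min-max problem for an empirical risk plus a penalty on the norm of the loss's gradient with respect to the input. The following is a minimal numpy sketch of that surrogate for logistic regression; the logistic loss, the Euclidean dual norm, and the adversarial budget `eps` are illustrative assumptions, not the specific setting of the paper.

```python
import numpy as np

def loss_and_input_grad(w, x, y):
    """Logistic loss l(w; x, y) = log(1 + exp(-y * <w, x>)) and its gradient in x."""
    m = y * (w @ x)
    loss = np.log1p(np.exp(-m))
    # dl/dm = -1 / (1 + exp(m)); chain rule with dm/dx = y * w
    grad_x = -y * w / (1.0 + np.exp(m))
    return loss, grad_x

def regularized_risk(w, X, Y, eps):
    """First-order surrogate of the distributionally robust risk:
    empirical risk plus eps times the mean l2 norm of the input gradient."""
    risks, penalties = [], []
    for x, y in zip(X, Y):
        loss, gx = loss_and_input_grad(w, x, y)
        risks.append(loss)
        penalties.append(np.linalg.norm(gx))
    return np.mean(risks) + eps * np.mean(penalties)
```

Setting `eps = 0` recovers the plain empirical risk; increasing `eps` interpolates toward a more conservative, gradient-penalized objective.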


Fig. 1


  1. Two layers with a convolutional kernel, ReLU activation functions, and max-pooling, followed by two linear layers at the end
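The footnote describes the small convolutional network only loosely. As a shape-bookkeeping sketch, one can track how the spatial size shrinks through the two conv/ReLU/max-pool blocks; the hyperparameters below (3x3 kernels, stride 1, no padding, 2x2 max-pooling, 28x28 input) are assumed for illustration and are not stated in the text.

```python
def conv2d_shape(h, w, k=3, stride=1, pad=0):
    """Output spatial size of a 2-D convolution (floor convention)."""
    return ((h + 2 * pad - k) // stride + 1, (w + 2 * pad - k) // stride + 1)

def maxpool_shape(h, w, k=2):
    """Output spatial size of a non-overlapping k x k max-pool."""
    return (h // k, w // k)

h, w = 28, 28
for _ in range(2):             # two conv + ReLU + max-pool blocks
    h, w = conv2d_shape(h, w)  # ReLU leaves the shape unchanged
    h, w = maxpool_shape(h, w)
# (h, w) is the spatial size that gets flattened into the two linear layers
```

Under these assumptions the feature map reaching the linear layers is 5 x 5 per channel.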





Acknowledgements

The authors would like to thank two anonymous reviewers for their positive and constructive feedback, and Leon Bungert for enlightening conversations and for providing many useful references. NGT was supported by NSF-DMS grant 2005797 and would also like to thank the IFDS at UW-Madison and the NSF, through TRIPODS grant 2023239, for their support. Part of this work was completed while NGT was visiting the Simons Institute to participate in the program "Geometric Methods in Optimization and Sampling" during the Fall of 2021; NGT thanks the institute for its hospitality and support.

Author information



Corresponding author

Correspondence to Nicolás García Trillos.


Cite this article

García Trillos, C.A., García Trillos, N. On the regularized risk of distributionally robust learning over deep neural networks. Res Math Sci 9, 54 (2022).
