
A Principle of Least Action for the Training of Neural Networks

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12458)

Abstract

Neural networks achieve high generalization performance on many tasks despite being heavily over-parameterized. Since classical statistical learning theory struggles to explain this behaviour, much recent effort has focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and of gaining better control over the trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system that displaces input particles over time. We conduct a series of experiments and, by analyzing the network’s behaviour through its displacements, we show the presence of a low kinetic energy bias in the network’s transport map and link this bias to generalization performance. From this observation, we reformulate the learning problem as follows: find neural networks that solve the task while transporting the data as efficiently as possible. This offers a novel formulation of the learning problem which allows us to provide regularity results for the solution network, based on Optimal Transport theory. From a practical viewpoint, it also allows us to propose a new learning algorithm that automatically adapts to the complexity of the task and leads to networks with high generalization ability even in low-data regimes.
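To make the transport view of the abstract concrete, the snippet below sketches a residual network whose blocks are read as discrete-time displacements x_{k+1} = x_k + v_k(x_k), trained with the task loss plus a kinetic-energy penalty on those displacements. This is a minimal PyTorch illustration of the general idea, not the authors' algorithm; the two-layer residual blocks, the penalty weight 0.1, and the toy data are assumptions made purely for the example.

    # Minimal sketch (assumed setup): residual blocks as particle displacements,
    # trained with task loss + a kinetic-energy (transport-cost) penalty.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResBlock(nn.Module):
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

        def forward(self, x):
            v = self.net(x)      # displacement applied to the "particles" x
            return x + v, v

    class TransportNet(nn.Module):
        def __init__(self, dim, depth=5, n_classes=2):
            super().__init__()
            self.blocks = nn.ModuleList([ResBlock(dim) for _ in range(depth)])
            self.head = nn.Linear(dim, n_classes)

        def forward(self, x):
            kinetic = 0.0
            for block in self.blocks:
                x, v = block(x)
                kinetic = kinetic + (v ** 2).sum(dim=1).mean()  # mean squared displacement per layer
            return self.head(x), kinetic

    # Toy usage: classification loss + 0.1 * transport cost (weight chosen arbitrarily here).
    model = TransportNet(dim=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(128, 2), torch.randint(0, 2, (128,))
    for _ in range(100):
        logits, kinetic = model(x)
        loss = F.cross_entropy(logits, y) + 0.1 * kinetic
        opt.zero_grad()
        loss.backward()
        opt.step()

In this sketch the penalty simply biases training towards small displacements; the paper's formulation instead treats the transport cost as the objective to minimize among networks that solve the task.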


Notes

  1. \(T_\sharp \alpha \) is the push-forward measure: \(T_\sharp \alpha (B) = \alpha (T^{-1}(B))\) for any measurable set B.

  2. By this, we mean that \(\Vert T^{\theta ^\star }-T^\star \Vert _\infty \le \epsilon \), where \(T^\star \) is the optimal transport (OT) map; the standard OT formulations are recalled just after these notes.
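For background, the two standard optimal transport formulations referred to in these notes are the Monge problem with quadratic cost and its Benamou–Brenier dynamic counterpart; this is textbook material recalled here for convenience, not part of the paper's contribution:

\[ \min_{T \,:\, T_\sharp \alpha = \beta} \int \Vert T(x) - x \Vert ^2 \, d\alpha (x), \qquad W_2^2(\alpha ,\beta ) = \min_{(\rho _t, v_t)} \int_0^1 \!\! \int \Vert v_t(x) \Vert ^2 \, \rho _t(x) \, dx \, dt, \]

where the dynamic formulation is minimized over flows satisfying the continuity equation \(\partial _t \rho _t + \nabla \cdot (\rho _t v_t) = 0\) with \(\rho _0 = \alpha \) and \(\rho _1 = \beta \). The integrand \(\Vert v_t \Vert ^2 \rho _t\) is the kinetic energy of the induced particle flow, which is precisely the quantity the learning formulation above seeks to keep small.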


Author information

Correspondence to Skander Karkar.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 449 KB)


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Karkar, S., Ayed, I., Bézenac, E.d., Gallinari, P. (2021). A Principle of Least Action for the Training of Neural Networks. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science (LNAI), vol. 12458. Springer, Cham. https://doi.org/10.1007/978-3-030-67661-2_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-67661-2_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67660-5

  • Online ISBN: 978-3-030-67661-2

  • eBook Packages: Computer Science, Computer Science (R0)
