Abstract
Neural networks achieve high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behaviour, much recent effort has focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and gaining better control over the trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system displacing input particles over time. We conduct a series of experiments and, by analyzing the network's behaviour through its displacements, we show the presence of a low kinetic energy bias in the transport map of the network, and link this bias with generalization performance. From this observation, we reformulate the learning problem as follows: find neural networks that solve the task while transporting the data as efficiently as possible. This offers a novel formulation of the learning problem which allows us to provide regularity results for the solution network, based on Optimal Transport theory. From a practical viewpoint, this allows us to propose a new learning algorithm, which automatically adapts to the complexity of the task and leads to networks with high generalization ability even in low-data regimes.
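To make the dynamical-system view concrete, the sketch below treats a toy residual network as a discrete-time transport map: each block displaces the input particles by \(v_k = f_k(x_k)\), and the discrete kinetic energy \(\sum_k \Vert v_k\Vert^2\) of the trajectory is accumulated alongside the forward pass. This is a minimal illustration of the regularized objective described in the abstract, not the paper's exact algorithm; all names (`forward`, `lam`, the `tanh` blocks, the placeholder task loss) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual network: x_{k+1} = x_k + f_k(x_k), with f_k(x) = tanh(x @ W_k).
depth, dim, n = 4, 3, 32
weights = [0.1 * rng.standard_normal((dim, dim)) for _ in range(depth)]

def forward(x, weights):
    """Return the network output and the discrete kinetic energy of the trajectory."""
    energy = 0.0
    for W in weights:
        v = np.tanh(x @ W)                          # displacement ("velocity") at this layer
        energy += np.mean(np.sum(v ** 2, axis=1))   # accumulate sum_k ||v_k||^2
        x = x + v                                   # residual update moves the particles
    return x, energy

x0 = rng.standard_normal((n, dim))
out, kinetic = forward(x0, weights)

# Regularized objective: fit the task while transporting the data cheaply.
lam = 0.1                                # hypothetical trade-off weight
task_loss = np.mean((out - x0) ** 2)     # placeholder task loss for illustration
total_loss = task_loss + lam * kinetic
```

Minimizing `total_loss` over the weights would then prefer, among all networks that solve the task, those whose transport map has low kinetic energy.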
Notes
1. \(T_\sharp \alpha \) is the push-forward measure: \(T_\sharp \alpha (B) = \alpha (T^{-1}(B))\) for any measurable set B.
2. By this, we mean that \(\Vert T^{\theta ^\star }-T^\star \Vert _\infty \le \epsilon \), where \(T^\star \) is the OT map.
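The push-forward identity \(T_\sharp \alpha (B) = \alpha (T^{-1}(B))\) can be checked empirically on samples: the mass that the transported samples place in a set B equals the mass the original samples place in its preimage. A small NumPy illustration, with the map \(T(x) = 2x\) and \(B = [0, 1]\) chosen purely for this example:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10_000)   # samples of alpha = Uniform[0, 1]

T = lambda x: 2.0 * x                    # transport map T
B = (0.0, 1.0)                           # measurable set B = [0, 1]

# T_#alpha(B): fraction of pushed-forward samples T(x) landing in B
pushforward_mass = np.mean((T(x) >= B[0]) & (T(x) <= B[1]))
# alpha(T^{-1}(B)): fraction of original samples in T^{-1}(B) = [0, 0.5]
preimage_mass = np.mean((x >= 0.0) & (x <= 0.5))
```

The two quantities agree exactly here, since the same samples are counted against equivalent events (\(2x \le 1 \Leftrightarrow x \le 0.5\)).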
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Karkar, S., Ayed, I., Bézenac, E.d., Gallinari, P. (2021). A Principle of Least Action for the Training of Neural Networks. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science(), vol 12458. Springer, Cham. https://doi.org/10.1007/978-3-030-67661-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67660-5
Online ISBN: 978-3-030-67661-2