Abstract
In this paper, we introduce a theoretical framework for semi-discrete optimization using ideas from optimal transport. Our primary motivation comes from deep learning, and specifically from the task of neural architecture search. With this aim in mind, we discuss the geometric and theoretical motivation for new techniques for neural architecture search; in the companion work (García Trillos et al. in Traditional and accelerated gradient descent for neural architecture search, 2021), we show that algorithms inspired by our framework are competitive with contemporaneous methods. We introduce a Riemannian-like metric on the space of probability measures over a semi-discrete space \({\mathbb {R}}^d \times \mathcal {G}\), where \(\mathcal {G}\) is a finite weighted graph. With this Riemannian structure in hand, we derive formal expressions for the gradient flow of a relative entropy functional, as well as second-order dynamics for the optimization of this energy. Then, with the aim of providing a rigorous motivation for the formally derived gradient flow equations, we consider an iterative procedure known as the minimizing movement scheme (i.e., the implicit Euler scheme, or JKO scheme) and apply it to the relative entropy with respect to a suitable cost function. For some specific choices of metric and cost, we rigorously show that the minimizing movement scheme of the relative entropy functional converges to the gradient flow process provided by the formal Riemannian structure. This flow coincides with a system of reaction–diffusion equations on \({\mathbb {R}}^d\).
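For readers unfamiliar with the minimizing movement scheme mentioned in the abstract, its generic one-step form can be sketched as follows. The notation here is illustrative rather than the paper's own: \(W\) stands for a transport-type cost between measures, \(\tau > 0\) for the time step, and \(E\) for the relative entropy functional being minimized.

```latex
% One step of the minimizing movement (implicit Euler / JKO) scheme:
% given the current measure \rho_k and step size \tau > 0, set
\rho_{k+1} \in \operatorname*{arg\,min}_{\rho}
  \left\{ \frac{1}{2\tau}\, W(\rho_k, \rho)^2 + E(\rho) \right\}.
% As \tau \to 0, a suitable interpolation of the iterates (\rho_k)
% converges, under appropriate conditions, to the gradient flow of E
% in the geometry induced by the cost W.
```

In the classical Euclidean–Wasserstein setting of Jordan, Kinderlehrer and Otto (1998), this limit recovers the Fokker–Planck equation; the paper's contribution concerns the analogous construction on the semi-discrete space \({\mathbb {R}}^d \times \mathcal {G}\).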
References
Ambrosio, L., Gigli, N.: A User’s Guide to Optimal Transport, pp. 1–155. Springer, Berlin (2013)
Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zürich. Birkhäuser, Basel (2005)
Benamou, J.-D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)
Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 24, pp. 2546–2554. Curran Associates Inc., Red Hook (2011)
Chow, S.-N., Huang, W., Li, Y., Zhou, H.: Fokker–Planck equations for a free energy functional or Markov process on a graph. Arch. Ration. Mech. Anal. 203(3), 969–1008 (2012)
Chung, F.: Spectral Graph Theory. American Mathematical Society, Providence (1996)
do Carmo, M.P.: Riemannian Geometry. Mathematics: Theory & Applications. Birkhäuser Boston, Inc., Boston (1992) (Translated from the second Portuguese edition by Francis Flaherty)
Elsken, T., Metzen, J.-H., Hutter, F.: Simple and efficient architecture search for convolutional neural networks (2017). arXiv:1711.04528
Erbar, M., Fathi, M., Laschos, V., Schlichting, A.: Gradient flow structure for Mckean–Vlasov equations on discrete spaces (2016)
Erbar, M., Maas, J.: Ricci curvature of finite Markov chains via convexity of the entropy. Arch. Ration. Mech. Anal. 206(3), 997–1038 (2012)
Esposito, A., Patacchini, F.S., Schlichting, A., Slepcev, D.: Nonlocal-interaction equation on graphs: gradient flow structure and continuum limit (2019). arXiv:1912.09834
Figalli, A., Gigli, N.: A new transportation distance between non-negative measures, with applications to gradient flows with Dirichlet boundary conditions. J. Math. Pures Appl. 94(2), 107–130 (2010)
Garbuno-Inigo, A., Hoffmann, F., Li, W., Stuart, A.M.: Interacting Langevin diffusions: gradient structure and ensemble Kalman sampler (2019). arXiv:1903.08866
García Trillos, N.: Gromov–Hausdorff limit of Wasserstein spaces on point clouds. Calc. Var. 59, 73 (2020). https://doi.org/10.1007/s00526-020-1729-3
Gigli, N., Maas, J.: Gromov–Hausdorff convergence of discrete transportation metrics. SIAM J. Math. Anal. 45(2), 879–899 (2013)
Gladbach, P., Kopfer, E., Maas, J.: Scaling limits of discrete optimal transport. SIAM J. Math. Anal. 52(3), 2759–2802 (2020)
Gladbach, P., Kopfer, E., Maas, J., Portinale, L.: Homogenisation of one-dimensional discrete optimal transport. J. Math. Pures Appl. 139, 204–234 (2020)
Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1–17 (1998)
Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., Murphy, K.: Progressive neural architecture search. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 19–35. Springer, Cham (2018)
Maas, J.: Gradient flows of the entropy for finite Markov chains. J. Funct. Anal. 261(8), 2250–2292 (2011)
Mielke, A.: A gradient structure for reaction–diffusion systems and for energy-drift-diffusion systems. Nonlinearity 24(4), 1329–1346 (2011)
Mielke, A.: Geodesic convexity of the relative entropy in reversible Markov chains. Calc. Var. Partial Differ. Equ. 48(1), 1–31 (2013)
Peyré, G., Cuturi, M.: Computational Optimal Transport: With Applications to Data Science, Foundations and Trends in Machine Learning, vol. 11, pp. 355–607 (2019)
Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameters sharing. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 4095–4104. PMLR, Stockholmsmässan, Stockholm, 10–15 Jul (2018)
Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Regularized evolution for image classifier architecture search. In: AAAI (2018)
Simon, J.: Compact sets in the space \(L^p(0, T; B)\). Annali di Matematica Pura ed Applicata 146, 65–96 (1986)
Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17(153), 1–43 (2016)
García Trillos, N., Morales, F., Morales, J.: Traditional and accelerated gradient descent for neural architecture search. In: Nielsen, F., Barbaresco, F. (eds.) Geometric Science of Information. GSI 2021. Lecture Notes in Computer Science, vol. 12829. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-80209-7_55
Villani, C.: Optimal Transport. Springer, Berlin (2009)
Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992)
Yu, T., Zhu, H.: Hyper-parameter optimization: a review of algorithms and applications (2020). arXiv:2003.05689
Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning (2016). arXiv:1611.01578
Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition (2017). arXiv:1707.07012
Acknowledgements
N. García Trillos was supported by NSF-DMS 2005797. The work of J. Morales was supported by NSF grants DMS16-13911, RNMS11-07444 (KI-Net) and ONR grant N00014-1812465. Support for this research was provided by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison with funding from the Wisconsin Alumni Research Foundation.
Communicated by Mary Pugh.
Cite this article
García Trillos, N., Morales, J. Semi-discrete Optimization Through Semi-discrete Optimal Transport: A Framework for Neural Architecture Search. J Nonlinear Sci 32, 27 (2022). https://doi.org/10.1007/s00332-022-09780-2