
Semi-discrete Optimization Through Semi-discrete Optimal Transport: A Framework for Neural Architecture Search

Journal of Nonlinear Science

Abstract

In this paper, we introduce a theoretical framework for semi-discrete optimization using ideas from optimal transport. Our primary motivation comes from deep learning, and specifically from the task of neural architecture search. With this aim in mind, we discuss the geometric and theoretical motivation for new techniques for neural architecture search; in the companion work (García-Trillos et al., Traditional and accelerated gradient descent for neural architecture search, 2021), we show that algorithms inspired by our framework are competitive with contemporaneous methods. We introduce a Riemannian-like metric on the space of probability measures over a semi-discrete space \({\mathbb {R}}^d \times \mathcal {G}\), where \(\mathcal {G}\) is a finite weighted graph. With this Riemannian structure in hand, we derive formal expressions for the gradient flow of a relative entropy functional, as well as second-order dynamics for the optimization of said energy. Then, with the aim of rigorously motivating the formally derived gradient flow equations, we consider an iterative procedure known as the minimizing movement scheme (i.e., the implicit Euler, or JKO, scheme) and apply it to the relative entropy with respect to a suitable cost function. For some specific choices of metric and cost, we rigorously show that the minimizing movement scheme of the relative entropy functional converges to the gradient flow process provided by the formal Riemannian structure. This flow coincides with a system of reaction–diffusion equations on \({\mathbb {R}}^d\).
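The minimizing movement scheme mentioned in the abstract can be illustrated with a toy computation. The sketch below iterates \(\rho^{k+1} = \mathrm{argmin}_\rho \{ d(\rho, \rho^k)^2/(2\tau) + H(\rho \mid \pi)\}\) on a four-point state space, using the Euclidean distance on the probability simplex as a deliberately simplified stand-in for the semi-discrete transport metric constructed in the paper; the function names, the exponentiated-gradient inner solver, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def relative_entropy(rho, pi):
    """H(rho | pi) = sum_i rho_i log(rho_i / pi_i)."""
    return float(np.sum(rho * np.log(rho / pi)))

def jko_step(rho_prev, pi, tau, n_inner=500, eta=0.05):
    """One minimizing movement (JKO) step: approximately minimize
    |rho - rho_prev|^2 / (2 tau) + H(rho | pi) over the probability
    simplex, via exponentiated-gradient (mirror) descent started
    at the previous iterate."""
    rho = rho_prev.copy()
    for _ in range(n_inner):
        grad = (rho - rho_prev) / tau + np.log(rho / pi) + 1.0
        rho = rho * np.exp(-eta * grad)  # multiplicative update keeps rho > 0
        rho /= rho.sum()                 # renormalize onto the simplex
    return rho

# Target (stationary) measure pi on a 4-point state space; start from uniform.
pi = np.array([0.4, 0.3, 0.2, 0.1])
rho = np.full(4, 0.25)
energies = [relative_entropy(rho, pi)]
for _ in range(20):
    rho = jko_step(rho, pi, tau=0.1)
    energies.append(relative_entropy(rho, pi))
# The relative entropy decreases along the scheme, and rho drifts toward pi.
```

Because the previous iterate is feasible for each inner minimization, every JKO step can only lower the relative entropy, which is the discrete analogue of the energy-dissipation property of the gradient flow that the paper establishes rigorously for its transport metric.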

References

  • Ambrosio, L., Gigli, N.: A User’s Guide to Optimal Transport, pp. 1–155. Springer, Berlin (2013)

  • Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zürich. Birkhäuser, Basel (2005)

  • Benamou, J.-D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)

  • Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 24, pp. 2546–2554. Curran Associates Inc., Red Hook (2011)

  • Chow, S.-N., Huang, W., Li, Y., Zhou, H.: Fokker–Planck equations for a free energy functional or Markov process on a graph. Arch. Ration. Mech. Anal. 203(3), 969–1008 (2012)

  • Chung, F.: Spectral Graph Theory. American Mathematical Society, Providence (1996)

  • do Carmo, M.P.: Riemannian Geometry. Mathematics: Theory & Applications. Birkhäuser Boston, Inc., Boston (1992) (Translated from the second Portuguese edition by Francis Flaherty)

  • Elsken, T., Metzen, J.-H., Hutter, F.: Simple and efficient architecture search for convolutional neural networks (2017). arXiv:1711.04528

  • Erbar, M., Fathi, M., Laschos, V., Schlichting, A.: Gradient flow structure for McKean–Vlasov equations on discrete spaces (2016)

  • Erbar, M., Maas, J.: Ricci curvature of finite Markov chains via convexity of the entropy. Arch. Ration. Mech. Anal. 206(3), 997–1038 (2012)

  • Esposito, A., Patacchini, F.S., Schlichting, A., Slepcev, D.: Nonlocal-interaction equation on graphs: gradient flow structure and continuum limit (2019). arXiv:1912.09834

  • Figalli, A., Gigli, N.: A new transportation distance between non-negative measures, with applications to gradient flows with Dirichlet boundary conditions. J. Math. Pures Appl. 94(2), 107–130 (2010)

  • Garbuno-Inigo, A., Hoffmann, F., Li, W., Stuart, A.M.: Interacting Langevin diffusions: gradient structure and ensemble Kalman sampler (2019). arXiv:1903.08866

  • García Trillos, N.: Gromov–Hausdorff limit of Wasserstein spaces on point clouds. Calc. Var. 59, 73 (2020). https://doi.org/10.1007/s00526-020-1729-3

  • Gigli, N., Maas, J.: Gromov–Hausdorff convergence of discrete transportation metrics. SIAM J. Math. Anal. 45(2), 879–899 (2013)

  • Gladbach, P., Kopfer, E., Maas, J.: Scaling limits of discrete optimal transport. SIAM J. Math. Anal. 52(3), 2759–2802 (2020)

  • Gladbach, P., Kopfer, E., Maas, J., Portinale, L.: Homogenisation of one-dimensional discrete optimal transport. J. Math. Pures Appl. 139, 204–234 (2020)

  • Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1–17 (1998)

  • Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., Murphy, K.: Progressive neural architecture search. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, pp. 19–35. Springer, Cham (2018)

  • Maas, J.: Gradient flows of the entropy for finite Markov chains. J. Funct. Anal. 261(8), 2250–2292 (2011)

  • Mielke, A.: A gradient structure for reaction–diffusion systems and for energy-drift-diffusion systems. Nonlinearity 24(4), 1329–1346 (2011)

  • Mielke, A.: Geodesic convexity of the relative entropy in reversible Markov chains. Calc. Var. Partial Differ. Equ. 48(1), 1–31 (2013)

  • Peyré, G., Cuturi, M.: Computational Optimal Transport: With Applications to Data Science. Foundations and Trends in Machine Learning, vol. 11, pp. 355–607 (2019)

  • Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameters sharing. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 4095–4104. PMLR, Stockholm (2018)

  • Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Regularized evolution for image classifier architecture search. In: AAAI (2018)

  • Simon, J.: Compact sets in the space L^p(0, T; B). Annali di Matematica Pura ed Applicata 146, 65–96 (1986)

  • Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)

  • Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17(153), 1–43 (2016)

  • García Trillos, N., Morales, F., Morales, J.: Traditional and accelerated gradient descent for neural architecture search. In: Nielsen, F., Barbaresco, F. (eds.) Geometric Science of Information. GSI 2021. Lecture Notes in Computer Science, vol. 12829. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-80209-7_55

  • Villani, C.: Optimal Transport: Old and New. Springer, Berlin (2009)

  • Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992)

  • Yu, T., Zhu, H.: Hyper-parameter optimization: a review of algorithms and applications (2020). arXiv:2003.05689

  • Zoph, B., Le, Q.V.: Neural architecture search with reinforcement learning (2016). arXiv:1611.01578

  • Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition (2017). arXiv:1707.07012

Acknowledgements

N. García Trillos was supported by NSF-DMS 2005797. The work of J. Morales was supported by NSF grants DMS16-13911, RNMS11-07444 (KI-Net) and ONR grant N00014-1812465. Support for this research was provided by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison with funding from the Wisconsin Alumni Research Foundation.

Corresponding author

Correspondence to Javier Morales.

Communicated by Mary Pugh.

Cite this article

García Trillos, N., Morales, J. Semi-discrete Optimization Through Semi-discrete Optimal Transport: A Framework for Neural Architecture Search. J Nonlinear Sci 32, 27 (2022). https://doi.org/10.1007/s00332-022-09780-2
