Abstract
Wasserstein gradient flows on probability measures have found a host of applications in optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our setup, and these examples are worked out in detail.
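The finite-dimensional side of this picture can be illustrated with a small sketch. It reads an n x n symmetric matrix W of edge weights as the step graphon W_n(x, y) = W[ceil(nx)][ceil(ny)] and uses the triangle homomorphism density t(K3, W_n) = trace(W^3) / n^3 as the objective. Assumptions, not taken from the paper: all n^2 entries are treated as free coordinates; time is sped up by a factor n^2, so the rescaled Euclidean gradient 3 (W^2)[i][j] / n coincides on each block with the graphon derivative 3 ∫ W(x, z) W(z, y) dz; and the [0, 1] constraint is handled by crude clipping. The paper treats that constraint more carefully. All function names (`triangle_density`, `scaled_gradient`, `flow`) are hypothetical.

```python
"""Illustrative sketch (assumptions noted above): an n^2-sped-up Euclidean
gradient flow of the triangle density on edge weights, viewed as a step graphon."""


def matmul(a, b):
    """Plain-Python product of two n x n matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def triangle_density(w):
    """t(K3, W_n) = trace(W^3) / n^3 for the step graphon of the n x n matrix w."""
    n = len(w)
    w2 = matmul(w, w)
    trace_w3 = sum(w2[i][k] * w[k][i] for i in range(n) for k in range(n))
    return trace_w3 / n**3


def scaled_gradient(w):
    """n^2 times the Euclidean gradient of triangle_density over all n^2 entries.

    d trace(W^3) / dW[i][j] = 3 (W^2)[j][i], so the rescaled entry is
    3 (W^2)[j][i] / n; for symmetric w this equals 3 (W^2)[i][j] / n.
    """
    n = len(w)
    w2 = matmul(w, w)
    return [[3.0 * w2[j][i] / n for j in range(n)] for i in range(n)]


def flow(w, step=0.05, n_steps=100):
    """Explicit Euler for dW/dt = -n^2 grad f(W), clipped to [0, 1] (a simplification)."""
    n = len(w)
    w = [row[:] for row in w]  # do not mutate the caller's matrix
    for _ in range(n_steps):
        g = scaled_gradient(w)
        w = [[min(1.0, max(0.0, w[i][j] - step * g[i][j])) for j in range(n)]
             for i in range(n)]
    return w


if __name__ == "__main__":
    # Deterministic symmetric 4 x 4 weight matrix with entries in [0, 1].
    w0 = [[0.5, 0.9, 0.2, 0.7],
          [0.9, 0.5, 0.8, 0.1],
          [0.2, 0.8, 0.5, 0.6],
          [0.7, 0.1, 0.6, 0.5]]
    w_end = flow(w0)
    print(triangle_density(w0), triangle_density(w_end))
```

Because every rescaled gradient entry is nonnegative for nonnegative weights, each Euler step lowers the weights and hence the triangle density. The n^2 time scaling is what keeps the per-block velocity of order one as n grows, which is the regime in which a graphon-valued limit curve can be expected.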
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Code Availability
The codes used during the current study are available from the corresponding author on reasonable request.
Acknowledgements
Many thanks to Persi Diaconis, Apoorva Khare and Stefan Steinerberger for helpful conversations and references and to the PIMS Kantorovich Initiative for facilitating this collaboration. The authors are listed in alphabetical order.
Funding
This research is partially supported by the following grants. Pal is supported by NSF Grant No. DMS-2052239 and a PIMS CRG (PIHOT). Pal and Oh are supported by NSF grant DMS-2134012. Oh is supported by NSF Grant No. CCF-2019844.
Author information
Contributions
The authors are arranged in alphabetical order.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
Not relevant to the content of this article.
Consent to Participate
Not relevant to the content of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Oh, S., Pal, S., Somani, R. et al. Gradient Flows on Graphons: Existence, Convergence, Continuity Equations. J Theor Probab (2023). https://doi.org/10.1007/s10959-023-01271-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10959-023-01271-8