Abstract
This paper develops approximate message passing algorithms to optimize multi-species spherical spin glasses. We first show how to efficiently achieve the algorithmic threshold energy identified in our companion work (Huang and Sellke in arXiv preprint, 2023. arXiv:2303.12172), thus confirming that the Lipschitz hardness result proved therein is tight. Next we give two generalized algorithms which produce multiple outputs and show all of them are approximate critical points. Namely, in an r-species model we construct \(2^r\) approximate critical points when the external field is stronger than a “topological trivialization” phase boundary, and exponentially many such points in the complementary regime. We also compute the local behavior of the Hamiltonian around each. These extensions are relevant for another companion work (Huang and Sellke in arXiv preprint, 2023. arXiv:2308.09677) on topological trivialization of the landscape.
1 Introduction
This paper studies the efficient optimization of a family of random non-convex functions \(H_N\) defined on high-dimensional spaces, namely the Hamiltonians of multi-species spherical spin glasses. Mean-field spin glasses have been studied since [25] as models for disordered magnetic systems and are also closely linked to random combinatorial optimization problems [12, 19, 22]. In short, their Hamiltonians are certain polynomials in many variables with independent centered Gaussian coefficients.
The purpose of this work is to develop efficient algorithms to optimize \(H_N\). Our companion work [16] derives an algorithmic threshold \({\textsf {ALG}}\) and proves no optimization algorithm with suitably Lipschitz dependence on \(H_N\) can achieve energy better than \({\textsf {ALG}}\) with more than exponentially small probability. The value \({\textsf {ALG}}\) is expressed as the maximum of a variational principle over several increasing functions, which was shown to be achieved by joining the solutions to a pair of well-posed differential equations. The first main contribution of this paper is to show that given a solution to this variational problem, so-called approximate message passing (AMP) algorithms efficiently achieve the value \({\textsf {ALG}}\). We note that several previous works [4, 21, 24, 26] have given similar algorithms for mean-field spin glasses with 1 species, and our algorithm is in line with the latter three.
Furthermore, we use these AMP algorithms to aid a detailed study of the landscape of \(H_N\) by probing neighborhoods of special critical points. This is related to a second companion work [17] which identifies the phase boundary for topological trivialization of \(H_N\), where the number of critical points is a constant independent of N. Therein, Kac-Rice estimates are used to show that for r-species models (defined on a product of r spheres) in the “super-solvable” regime with strong external field, \(H_N\) has exactly \(2^r\) critical points with high probability. In this paper, we give a signed AMP algorithm which explicitly approximates each of these critical points. Moreover in the complementary “sub-solvable” regime, we use AMP to construct \(\exp (cN)\) separated approximate critical points with high probability. This implies the failure of strong topological trivialization as defined in [17], which is proved therein to hold for super-solvable models. Finally, the machinery of AMP allows us to compute the local behavior of \(H_N\) around these algorithmic outputs, giving even more precise information about the landscape.
1.1 Problem Description
Fix a finite set \({\mathscr {S}}= \{1,\ldots ,r\}\). For each positive integer N, fix a deterministic partition \(\{1,\ldots ,N\} = \sqcup _{s\in {\mathscr {S}}}\, {\mathcal {I}}_s\) with \(\lim _{N\rightarrow \infty } |{\mathcal {I}}_s| / N =\lambda _s\) where \({\vec \lambda }= (\lambda _1,\ldots ,\lambda _r) \in {\mathbb {R}}_{>0}^{\mathscr {S}}\). For \(s\in {\mathscr {S}}\) and \({\varvec{x}}\in {\mathbb {R}}^N\), let \({\varvec{x}}_s \in {\mathbb {R}}^{{\mathcal {I}}_s}\) denote the restriction of \({\varvec{x}}\) to coordinates \({\mathcal {I}}_s\). We consider the state space
Fix \({\vec h}= (h_1,\ldots ,h_r) \in {\mathbb {R}}_{\ge 0}^{\mathscr {S}}\) and let \({\textbf {1}}= (1,\ldots ,1) \in {\mathbb {R}}^N\). For each \(k\ge 2\) fix a symmetric tensor \(\Gamma ^{(k)} = (\gamma _{s_1,\ldots ,s_k})_{s_1,\ldots ,s_k\in {\mathscr {S}}} \in ({\mathbb {R}}_{\ge 0}^{{\mathscr {S}}})^{\otimes k}\) with \(\sum _{k\ge 2} 2^k {\left\Vert\Gamma ^{(k)}\right\Vert}_\infty < \infty \), and let \({\textbf {G}}^{(k)} \in ({\mathbb {R}}^N)^{\otimes k}\) be a tensor with i.i.d. standard Gaussian entries.
For \(A\in ({\mathbb {R}}^{\mathscr {S}})^{\otimes k}\), \(B\in ({\mathbb {R}}^N)^{\otimes k}\), define \(A\diamond B \in ({\mathbb {R}}^N)^{\otimes k}\) to be the tensor with entries
where s(i) denotes the \(s\in {\mathscr {S}}\) such that \(i\in {\mathcal {I}}_s\). Let \(\varvec{h}= {\vec h}\diamond {\textbf {1}}\). We consider the mean-field multi-species spin glass Hamiltonian
with inputs \({\varvec{\sigma }}= (\sigma _1,\ldots ,\sigma _N) \in {\mathcal {B}}_N\). For example, the choice of parameters \(\Gamma ^{(2)} = ({\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}})\) and \(\Gamma ^{(k)}=0\) for \(k\ge 3\) yields the well-known bipartite spherical SK model [2]. For \({\varvec{\sigma }},{\varvec{\rho }}\in {\mathcal {B}}_N\), define the species s overlap and overlap vector
Let \(\odot \) denote coordinate-wise product. For \(\vec {x}= (x_1,\ldots ,x_r) \in {\mathbb {R}}^{\mathscr {S}}\), let
The random function \(\widetilde{H}_N\) can also be described as the Gaussian process on \({\mathcal {B}}_N\) with covariance
We will also often refer to the product of spheres
It will be useful to define, for \(s\in {\mathscr {S}}\),
1.2 The Value \({\textsf {ALG}}\)
Given \(({\vec \lambda },\xi )\), the ground state energy of the associated multi-species spherical spin glass is
In the bipartite SK model mentioned above, \(\textsf{OPT}\) is the limiting operator norm of an i.i.d. Gaussian rectangular matrix with aspect ratio \(\lambda _1/\lambda _2\). For large k, the asymptotic operator norm of an i.i.d. random k-tensor is similarly encoded as \(\textsf{OPT}(\xi )\) for some \(\xi \) (with e.g. \(r=k\)). Perhaps surprisingly, it is generally believed that polynomial-time algorithms are incapable of finding \({\varvec{\sigma }}\in {\mathcal {B}}_N\) such that \(H_N({\varvec{\sigma }})\ge \textsf{OPT}(\xi )-{\varepsilon }\) with high probability as \(N\rightarrow \infty \). Our work [15] showed that in the single species case (and with all terms of even degree), one can identify an exact threshold \({\textsf {ALG}}\) for the performance of a class of Lipschitz algorithms which includes gradient-based methods and Langevin dynamics. More recently in [16], we extended the algorithmic hardness direction of this result to multi-species spherical spin glasses, using a new proof technique that applies even when \(\textsf{OPT}\) is not known. The purpose of this paper is to give explicit algorithms attaining the value \({\textsf {ALG}}\), and we present here the formula for this value.
The algorithmic threshold \({\textsf {ALG}}\) is given by the following variational principle. This is a simplification of the more general variational formula [16, Equation (1.7)], obtained by a partial characterization of its maximizers [16, Theorem 3]. The following generic assumption is needed therein to ensure well-posedness of the ODE (2.3) used in this description, and we will freely assume it throughout the paper.
Assumption 1
All quadratic and cubic interactions participate in \(H_N\), i.e. \(\Gamma ^{(2)}, \Gamma ^{(3)} > 0\) coordinate-wise. We will call such models non-degenerate. Since this condition depends only on \(\xi \), we similarly call \(\xi \) non-degenerate.
To optimize \(H_N\) for degenerate \(\xi \), it suffices to apply our algorithms to a slight perturbation \(\widetilde{\xi }\) which is non-degenerate and satisfies \(\Vert \xi -\widetilde{\xi }\Vert _{C^3([0,1]^r)}\le {\varepsilon }\) to obtain the guarantees in this and the next section. Here, \(C^3([0,1]^r)\) denotes the norm
Since both the ground state and the more general \({\textsf {ALG}}\) formula in [16] (allowing degenerate \(\xi \)) vary continuously in \(\xi \), there is essentially no loss of generality in assuming non-degeneracy.
The formula for \({\textsf {ALG}}\) is described by two cases depending on whether \(\vec {1}=1^{{\mathscr {S}}}\) is super-solvable as defined below.
Definition 1.1
A matrix \(M\in {\mathbb {R}}^{{\mathscr {S}}\times {\mathscr {S}}}\) is diagonally signed if \(M_{i,i}\ge 0\) and \(M_{i,j}<0\) for all \(i\ne j\).
Definition 1.2
A symmetric diagonally signed matrix M is super-solvable if it is positive semidefinite, and solvable if it is furthermore singular; otherwise M is strictly sub-solvable. A point \(\vec {x}\in (0,1]^{\mathscr {S}}\) is super-solvable, solvable, or strictly sub-solvable if \(M^*(\vec {x})\) is, where
We also adopt the convention that \(\vec {0}\) is always super-solvable, and solvable if \({\vec h}=\vec {0}\).
The following will be useful.
Proposition 1.3
([16, Proposition 4.3], see also [17, Lemma 2.5]) If the square matrix M is diagonally signed, then the minimal eigenvalue \({\varvec{\lambda }}_{\min }(M)\) has multiplicity 1, and the corresponding eigenvector \(\vec {v}\) has strictly positive entries. Moreover
and the supremum is uniquely attained at \(\vec {v}\).
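Proposition 1.3 can be illustrated numerically. The sketch below (Python, with a randomly generated diagonally signed matrix; it is an illustration, not part of the proof) checks that the minimal eigenvalue is simple and that its eigenvector can be taken strictly positive, which follows since \(-M+cI\) is an irreducible nonnegative matrix for large c and Perron-Frobenius applies.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 4

# Random symmetric diagonally signed matrix: nonnegative diagonal,
# strictly negative off-diagonal entries.
M = -rng.uniform(0.1, 1.0, size=(r, r))
M = (M + M.T) / 2
np.fill_diagonal(M, rng.uniform(0.0, 1.0, size=r))

eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
lam_min, v = eigvals[0], eigvecs[:, 0]
v = v if v.sum() > 0 else -v           # fix the overall sign

# Minimal eigenvalue is simple; its eigenvector has positive entries.
assert eigvals[1] - eigvals[0] > 1e-8
assert np.all(v > 0)
```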
It is easy to see that any \(\vec {x}\in (0,1]^{\mathscr {S}}\) is sub-solvable when \({\vec h}=\vec {0}\), and that super-solvability is a coordinate-wise increasing property of \({\vec h}\). For our purposes, an external field is large if \(\vec {1}\) is super-solvable and small if \(\vec {1}\) is strictly sub-solvable. (Unfortunately we do not have more refined intuition for the precise form of \(M^*\) above, nor for the resulting phase boundary between super- and sub-solvability.) As shown in our companion work [17], in super-solvable models the external fields \(\varvec{h}\) are strong enough to trivialize the “glassy” nature of the landscape for \(H_N\). Namely, the number of critical points is exactly \(2^r\) with high probability, the minimum possible for any generic smooth (“Morse”) function on a product of r spheres. By contrast, in the sub-solvable case the expected number of critical points is exponentially large in the dimension N. As explained below, the optimization algorithms are also simpler in the super-solvable case.
Definition 1.4
(Algorithmic Threshold, Super-Solvable Case) If \(\vec {1}\) is super-solvable, then
When \(\vec {1}\) is strictly sub-solvable, the formula for \({\textsf {ALG}}\) becomes more complicated and depends on the optimal choice of an increasing \(C^2\) function \(\Phi :[q_1,1]\rightarrow [0,1]^{{\mathscr {S}}}\) satisfying certain conditions. We term such \(\Phi \) pseudo-maximizers and defer the formal definition to Definition 2.1. Note that \(q_1 \in [0,1]\) is not fixed, but is determined by the choice of \(\Phi \).
Definition 1.5
(Algorithmic Threshold, Strictly Sub-solvable Case) If \(\vec {1}\) is strictly sub-solvable, then with the maximum taken over all pseudo-maximizers \(\Phi \) of \({\mathbb {A}}\),
See [16, Remark 1.3] for an approach to maximizing \({\mathbb {A}}\) using the well-posedness of the ODEs (2.2), (2.3) in the definition of pseudo-maximizer. The computational complexity of this task is in particular independent of N.
The following theorem is our main result. We equip the space \({\mathscr {H}}_N\) of Hamiltonians \(H_N\) with the following distance. We identify \(H_N\) with its disorder coefficients \(({\varvec{G}}^{(k)})_{k\ge 2}\), which we arrange in an arbitrary but fixed order into an infinite vector \({\varvec{g}}(H_N)\), and define
(In other words, \(\Vert {H_N-H'_N}\Vert _2^2\) is the sum of squared differences \((g_{i_1,\ldots ,i_k}-g'_{i_1,\ldots ,i_k})^2\) between all corresponding pairs of coefficients in \(({\varvec{G}}^{(k)})_{k\ge 2}\) and \(({\varvec{G}}'^{(k)})_{k\ge 2}\).) We say an algorithm \({\mathcal {A}}_N: {\mathscr {H}}_N \rightarrow {\mathcal {B}}_N\) is \(\tau \)-Lipschitz if
Note that \(\Vert {H_N-H'_N}\Vert _2\) may be infinite, and if so this condition holds vacuously for such pairs \((H_N,H'_N)\). Here and throughout, all implicit constants may depend also on \((\xi ,{\vec h},{\vec \lambda })\).
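As a concrete reading of this metric, the following sketch (Python; the degree cutoff and dimensions are hypothetical, for illustration only) flattens the coefficient tensors in a fixed order and computes the resulting \(\ell ^2\) distance between two Hamiltonians.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

def coeff_vector(tensors):
    """Flatten the disorder coefficients (G^(k))_k into one vector,
    in an arbitrary but fixed order."""
    return np.concatenate([G.ravel() for G in tensors])

# Two Hamiltonians, identified with their Gaussian coefficient tensors
# for k = 2, 3 (a hypothetical finite-degree model).
H = [rng.standard_normal((N, N)), rng.standard_normal((N, N, N))]
Hp = [G + 0.01 * rng.standard_normal(G.shape) for G in H]

# ||H - H'||_2 is the l2 distance between the coefficient vectors,
# i.e. the sum of squared coefficient differences, square-rooted.
dist = np.linalg.norm(coeff_vector(H) - coeff_vector(Hp))
```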
Theorem 1
For any \({\varepsilon }>0\), there exists an \(O_{{\varepsilon }}(1)\)-Lipschitz \({\mathcal {A}}_N:{\mathscr {H}}_N\rightarrow {\mathcal {B}}_N\) such that
The main result in our companion work [16, Theorem 1] states that any \(\tau \)-Lipschitz \({\mathcal {A}}_N: {\mathscr {H}}_N \rightarrow {\mathcal {B}}_N\) satisfies, for the same threshold \({\textsf {ALG}}\) and N sufficiently large,
Together these results thus characterize the best possible Lipschitz optimization algorithms for multi-species spherical spin glasses.
We prove Theorem 1 with an explicit algorithm based on AMP, following a recent line of work [4, 5, 21, 24, 26]. Such algorithms are shown to be Lipschitz (up to modification on a set with \(\exp (-cN)\) probability) in [15, Sect. 8]. AMP algorithms also have computational complexity which is linear in the input size when \(H_N\) is a polynomial of finite degree (modulo solving for \(\Phi \), a task that does not depend on N). See [4, Remark 2.1] for related discussion on this last point.
Similarly to [5, 24], our algorithm has two phases, a “root-finding” phase and a “tree-descending” phase. Roughly speaking, the set of points reachable by our algorithm has the geometry of a densely branching ultrametric tree, which is rooted at the origin when \(\varvec{h}= {\textbf {0}}\) and more generally at a random point correlated with \(\varvec{h}\). The first phase identifies this root, and the second traces a root-to-leaf path of this tree. The structure of the first phase is similar to the original AMP algorithm of [9] for the SK model at high-temperature, while the latter incremental AMP technique was introduced in [21].
For the purposes of this paper, the significance of (super, sub)-solvability is as follows. When the external field is sufficiently large, the root moves all the way to the boundary of \({\mathcal {B}}_N\) (in all r species) and the algorithmic tree becomes degenerate. In [16], it is shown that the external field is large enough for this to occur if and only if \(\vec {1}\) is super-solvable. Moreover, [17] shows this condition coincides with strong topological trivialization (defined therein) of the optimization landscape.
In Sect. 3 we extend our main algorithm in several ways. In Sect. 3.1 we define \(2^r\) signed generalizations of the root-finding algorithm with similar behavior. In Sect. 3.2 we compute the gradients of \(H_N\) at the points output by our algorithm, in both cases when \(\vec {1}\) is super-solvable and sub-solvable. In particular, we show that they are approximate critical points on the product of spheres \({\mathcal {S}}_N\) (defined in (1.6)). As explained in Remark 3.1, in the strictly super-solvable case these \(2^r\) outputs approximate the \(2^r\) genuine critical points of \(H_N\) on \({\mathcal {S}}_N\). The sub-solvable case of this computation is used in our companion paper [17, Theorem 1.5(c) and Sect. 5.3] to show failure of annealed topological trivialization in the sub-solvable case. Finally in Sect. 3.3 we give a modification of the tree-descending phase for the super-solvable case. It constructs \(\exp (cN)\) well-separated approximate critical points arranged in a densely branching ultrametric tree; this implies the failure of strong topological trivialization in [17, Definition 6 and Theorem 1.6].
1.3 Notations
Throughout, we will use boldface lowercase letters (\({\varvec{u}},{\varvec{v}},\ldots \)) to denote vectors in \({\mathbb {R}}^N\), and lowercase letters with vector sign (\(\vec {u},\vec {v},\ldots \)) to denote vectors in \({\mathbb {R}}^{\mathscr {S}}\simeq {\mathbb {R}}^r\). Similarly, boldface uppercase letters denote matrices or tensors in \(({\mathbb {R}}^N)^{\otimes k}\), and non-boldface uppercase letters denote matrices or tensors in \(({\mathbb {R}}^r)^{\otimes k}\). We let
for \({\varvec{u}},{\varvec{v}}\in {\mathbb {R}}^N\). The corresponding norm is
Next \(a_N\simeq b_N\) means that \(a_N-b_N\) converges in probability to 0. Analogously, for two vectors \({\varvec{u}}_N, {\varvec{v}}_N\), we write \({\varvec{u}}_N\simeq {\varvec{v}}_N\) when \(\Vert {\varvec{u}}_N-{\varvec{v}}_N\Vert _N\) converges in probability to 0. We denote limits in probability by \(\mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\). Analogously we write \(\approx _{\delta }\) to denote asymptotic equality as \(\delta \rightarrow 0\).
For any tensor \(\varvec{A}\in ({\mathbb {R}}^N)^{\otimes k}\), we define the operator norm
The following proposition shows that with exponentially good probability, the operator norms of all constant-order gradients of \(H_N\) are bounded on the appropriate scale.
Proposition 1.6
([16, Proposition 1.13]) For any fixed model \((\xi , {\vec h})\) there exists a constant \(c>0\), sequence \((K_N)_{N\ge 1}\) of convex sets \(K_N\subseteq {\mathscr {H}}_N\), and sequence of constants \((C_{k})_{k\ge 1}\) independent of N, such that the following properties hold.
-
(a)
\(\mathbb {P}[H_N\in K_N]\ge 1-e^{-cN}\);
-
(b)
For all \(H_N\in K_N\) and \({\varvec{x}}\in {\mathcal {B}}_N\),
$$\begin{aligned} {\left\Vert\nabla ^k H_N({\varvec{x}})\right\Vert}_{\text {op}}&\le C_{k}N^{1-\frac{k}{2}}. \end{aligned}$$(1.9)
2 Achieving Energy \({\textsf {ALG}}\)
In this section we prove Theorem 1 by exhibiting an AMP algorithm. Throughout this section, Assumption 1 on non-degeneracy of \(\xi \) will be enforced without loss of generality.
2.1 Definition of Pseudo-Maximizer
As mentioned before Definition 1.5, the threshold \({\textsf {ALG}}\) in the sub-solvable case depends on a notion of pseudo-maximizer. We now provide this definition, which was derived in [16, Theorem 3] as a necessary condition for \(\Phi \) to maximize \({\mathbb {A}}\) defined in (1.8) (and it is proved therein that a maximizer always exists).
Definition 2.1
A coordinate-wise strictly increasing \(C^2\) function \(\Phi :[q_1,1]\rightarrow [0,1]^{{\mathscr {S}}}\), for some \(q_1\in [0,1]\), is a pseudo-maximizer if:
-
(1)
\(\Phi \) is admissible, meaning it satisfies the normalization
$$\begin{aligned} \langle {\vec \lambda }, \Phi (q)\rangle = q,\quad \forall q\in [q_1,1]. \end{aligned}$$(2.1)In particular \(\Phi (1) = \vec {1}\).
-
(2)
\(\Phi (q_1)\) is solvable.
-
(3)
The derivative at \(q_1\) satisfies \(M^*(\Phi (q_1))\Phi '(q_1)=\vec {0}\). This amounts to no restriction when \({\vec h}=\vec {0}\) and thus \((q_1,\Phi (q_1))=(0,\vec {0})\); when \({\vec h}\ne \vec {0}\) it means that
$$\begin{aligned} \Phi _s'(q_1) = \frac{\Phi _s(q_1) (\xi ^s\circ \Phi )'(q_1)}{\xi ^s(\Phi (q_1))+h_s^2}, \quad s\in {\mathscr {S}}. \end{aligned}$$(2.2) -
(4)
For all \(q\in [q_1,1]\), \(\Phi \) solves the (second-order) tree-descending differential equation:
$$\begin{aligned} \Psi (q) \equiv \frac{1}{\Phi '_s(q)} {\frac{{\text {d}}}{{\text {d}q}}} \sqrt{\frac{\Phi '_s(q)}{(\xi ^s \circ \Phi )'(q)}} \end{aligned}$$(2.3)is independent of the species s. (See [16, Lemma 4.37] for well-posedness of this ODE.)
Note that there may exist multiple such \(\Phi \); see [16, Figure 2]. If \(\vec {1}\) is super-solvable, we adopt the convention that \(q_1=1\) and \(\Phi \) has domain \(\{1\}\).
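As an illustration of the admissibility constraint (2.1), the sketch below (Python; the two-species \(\Phi \) and \({\vec \lambda }=(1/2,1/2)\) are hypothetical choices, and conditions (2.2), (2.3) are not checked) verifies \(\langle {\vec \lambda },\Phi (q)\rangle = q\), the consequence \(\Phi (1)=\vec {1}\), and coordinate-wise strict monotonicity on a grid.

```python
import numpy as np

# Hypothetical two-species example with lambda = (1/2, 1/2): the
# admissibility constraint (2.1) says <lambda, Phi(q)> = q on [q1, 1],
# which forces Phi(1) = (1, 1).
lam = np.array([0.5, 0.5])
q1 = 0.3

def Phi(q):
    c = 0.2 * (q - q1) * (1.0 - q)   # asymmetry between the two species
    return np.array([q + c, q - c])

qs = np.linspace(q1, 1.0, 1001)
vals = np.array([Phi(q) for q in qs])

assert np.allclose(vals @ lam, qs)            # normalization (2.1)
assert np.allclose(Phi(1.0), [1.0, 1.0])      # Phi(1) = vec 1
assert np.all(np.diff(vals, axis=0) > 0)      # coordinate-wise increasing
```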
We now give an efficient AMP algorithm achieving energy \({\mathbb {A}}(\Phi )\) for any pseudo-maximizer \(\Phi \). In particular for the optimal pseudo-maximizer this achieves energy \({\textsf {ALG}}\).
2.2 Review of Approximate Message Passing
Here we recall the class of AMP algorithms, specialized to our setting of interest. We initialize AMP with a deterministic vector \({\varvec{w}}^0\) with coordinates
depending only on the species. Let \(f_{t,s}:{\mathbb {R}}^{t+1}\rightarrow {\mathbb {R}}\) be a Lipschitz function for each \((t,s)\in {\mathbb {Z}}_{\ge 0}\times {\mathscr {S}}\). For \(({\varvec{w}}^0,{\varvec{w}}^1,\ldots ,{\varvec{w}}^t)\in {\mathbb {R}}^{N\times (t+1)}\), let \(f_{t}({\varvec{w}}^0,{\varvec{w}}^1,\ldots ,{\varvec{w}}^t)\in {\mathbb {R}}^N\) be given by
We generate subsequent iterates through recursions of the following form, where \({\textbf {ons}}_t\) is known as the Onsager correction term:
Here \(W^t_s,M^t_s\) are defined as follows. \(W^0_s=w_s\) and the variables \((\widetilde{W}^t_s)_{(t,s)\in {\mathbb {Z}}_{\ge 1}\times {\mathscr {S}}}\) form a centered Gaussian process with covariance defined recursively by
and \({\mathbb {E}}[\widetilde{W}^{t+1}_{s} \widetilde{W}^{t'+1}_{s'}]=0\) if \(s\ne s'\) (i.e. different species are independent).
The following state evolution characterizes the behavior of the above iterates. It states that for each \(s\in {\mathscr {S}}\), when \(i\in {\mathcal {I}}_s\) is uniformly random, the sequence of coordinates \((w^1_i,w^2_i,\ldots ,w^t_i)\) asymptotically has the same law as \((W^1_s,\ldots ,W^t_s)\). Say a function \(\psi :{\mathbb {R}}^{\ell } \rightarrow {\mathbb {R}}\) is pseudo-Lipschitz if \(|\psi (x) - \psi (y)| \le C(1+|x|+|y|)|x-y|\) for a constant C.
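For example, \(\psi (x)=x^2\) is pseudo-Lipschitz with \(C=1\), since \(|x^2-y^2| = |x+y|\,|x-y| \le (1+|x|+|y|)|x-y|\); the quick numerical check below (Python) confirms the inequality on random inputs.

```python
import numpy as np

rng = np.random.default_rng(4)

# psi(x) = x^2 is pseudo-Lipschitz with C = 1:
# |x^2 - y^2| = |x + y| |x - y| <= (1 + |x| + |y|) |x - y|.
psi = lambda x: x ** 2
x, y = rng.standard_normal(10_000), rng.standard_normal(10_000)
lhs = np.abs(psi(x) - psi(y))
rhs = (1 + np.abs(x) + np.abs(y)) * np.abs(x - y)
assert np.all(lhs <= rhs + 1e-12)
```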
Proposition 2.2
For any pseudo-Lipschitz function \(\psi \) and \(\ell \in {\mathbb {Z}}_{\ge 0}\), \(s\in {\mathscr {S}}\),
This proposition allows us to read off normalized inner products of the AMP iterates, since e.g.
Proposition 2.2 is proved in Appendix 1. In fact we show a slight generalization allowing \(f_t=f_t({\varvec{w}}^0,\ldots ,{\varvec{w}}^t,{\varvec{g}}^0,\ldots ,{\varvec{g}}^t)\) to depend also on independently generated vectors \(({\varvec{g}}^0,\ldots ,{\varvec{g}}^t)\in {\mathbb {R}}^{N(t+1)}\). When using this extension, we will always take each \({\varvec{g}}^t\sim {\mathcal {N}}(0,I_N)\) to be standard Gaussian. The more general result essentially says that \({\varvec{g}}^t\) still acts as an independent Gaussian for the purposes of state evolution. Since this is relatively intuitive, we refer to Theorem 2 in the appendix for a precise statement.
For random matrices (i.e. the case of quadratic H) there is a considerable literature establishing state evolution in many settings beginning with [7, 9] and later [6, 8, 10, 11, 13] (see also [14] for a survey of many statistical applications). The generalization to tensors was introduced in [23] and proved in [4], whose approach we follow.
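To make the Onsager correction concrete, the following sketch (Python) runs the textbook single-species, degree-2 specialization with identity non-linearities, where the correction reduces to subtracting the previous iterate. State evolution then predicts \(\Vert {\varvec{w}}^t\Vert _N^2\approx 1\) at every step. This is an illustration of the general scheme under a hypothetical normalization, not the multi-species algorithm of Sect. 2.3.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000

# Single-species, degree-2 disorder: symmetric matrix, entries of
# variance 1/N (hypothetical normalization for illustration).
G = rng.standard_normal((N, N))
A = (G + G.T) / np.sqrt(2 * N)

# With identity non-linearities f_t(w) = w, the Onsager coefficient
# (1/N) * sum_i f_t'(w^t_i) equals 1, so the iteration reads
#     w^{t+1} = A w^t - w^{t-1}.
w_prev, w = np.zeros(N), np.ones(N)   # deterministic initialization w^0
norms = []
for t in range(5):
    w, w_prev = A @ w - w_prev, w
    norms.append(np.dot(w, w) / N)

# State evolution: each w^t behaves like i.i.d. Gaussian coordinates of
# variance 1, so ||w^t||_N^2 stays near 1 at every step.
```

Dropping the correction (iterating \(w\mapsto Aw\) instead) inflates \(\Vert {\varvec{w}}^t\Vert _N^2\) like the even moments of the semicircle law, which illustrates why the Onsager term is essential.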
2.3 Stage \(\text {I}\): Finding the Root of the Ultrametric Tree
Our goal in this subsection will be to compute a vector \(\varvec{m}^{{\underline{\ell }}}\) satisfying
and with the correct energy value (as stated in Lemma 2.5). We take as given a pseudo-maximizer \(\Phi \) of \({\mathbb {A}}\) with domain \([q_1,1]\). Recall that \(\Phi (q_1)\) is super-solvable: either \(\vec {1}\) is strictly sub-solvable, in which case \(\Phi (q_1)\) is solvable, or \(\vec {1}\) is super-solvable, in which case \(\Phi (q_1) = \Phi (1) = \vec {1}\).
We use the initialization
Define the vector \({\vec a}\in \mathbb R^{{\mathscr {S}}}\) by
Subsequent iterates are defined via the following recursion.
The last term in (2.10) comes from specializing the formula (2.6) for the Onsager term.
Next recalling (2.8), let \((W^j_s,M^j_s)_{j\ge 0,s\in {\mathscr {S}}}\) be the state evolution limit of the coordinates of
as \(N\rightarrow \infty \). Concretely, each \(W^j_s\) is Gaussian with mean \(h_s\) and
We next compute the covariance of the Gaussians \(\widetilde{W}^j_s = W^j_s - h_s\). Define \({\vec \alpha }: {\mathbb {R}}_{\ge 0}^{\mathscr {S}}\rightarrow {\mathbb {R}}_{\ge 0}^{\mathscr {S}}\) by
Define the (deterministic) \({\mathbb {R}}_{\ge 0}^{{\mathscr {S}}}\)-valued sequence \((\vec R^0,\vec R^1,\dots )\) of asymptotic overlaps recursively by \(\vec R^0=\vec {0}\) and \(\vec R^{k+1} = {\vec \alpha }(\vec R^k)\).
Lemma 2.3
For integers \(0\le j<k\), the following equalities hold (the first in distribution):
Proof
We proceed by induction on j, first showing (2.14) and (2.16) together. As a base case, (2.14) holds for \(j=0\) by initialization. For the inductive step, assume first that (2.14) holds for j. Then by the definition (2.11),
so that (2.14) implies (2.16) for each \(j\ge 0\). On the other hand, state evolution directly implies that if (2.16) holds for j then (2.14) holds for \(j+1\). This establishes (2.14) and (2.16) for all \(j\ge 0\).
We similarly show (2.15) and (2.17) together by induction, beginning with (2.15). When \(j=0\) it is clear because \(\widetilde{W}^k_s\) is mean zero and independent of \(\widetilde{W}^0_s\). Just as above, it follows from state evolution that (2.15) for (j, k) implies (2.17) for (j, k) which in turn implies (2.15) for \((j+1,k+1)\). Hence induction on j proves (2.15) and (2.17) for all (j, k). \(\square \)
The next lemma is crucial and uses super-solvability of \(\Phi (q_1)\).
Lemma 2.4
The limit \(\vec R^\infty \equiv \lim _{j\rightarrow \infty } \vec R^j\) exists and equals \(\Phi (q_1)\).
Proof
First we observe that \({\vec \alpha }\) (recall (2.13)) is coordinate-wise strictly increasing in the sense that if \(0\preceq x\prec y\) then \({\vec \alpha }(x)\prec {\vec \alpha }(y)\). Moreover \({\vec \alpha }(\vec {0})\succ 0\) (assuming \({\vec h}\ne 0\), else the result is trivial) and \({\vec \alpha }(\Phi (q_1))=\Phi (q_1)\). Therefore \(\vec R^\infty \) exists, \({\vec \alpha }(\vec R^\infty )=\vec R^\infty \), and
It remains to show that the above forces \(\vec R^\infty =\Phi (q_1)\) to hold.
Let \(M\in {\mathbb {R}}^{{\mathscr {S}}\times {\mathscr {S}}}\) be the matrix with entries \(M_{s,s'}={\frac{{\text {d}}}{{\text {d}t}}}{\vec \alpha }_s(\Phi (q_1)+te_{s'})|_{t=0}\) for \(e_{s'}\) a standard basis vector. Then M is the derivative matrix for \({\vec \alpha }\) at \(\Phi (q_1)\) in the sense that for any \(\vec {u}\in {\mathbb {R}}^{{\mathscr {S}}}\),
We easily calculate that
We claim that for any entry-wise non-negative vector \(\vec w\in \mathbb R_{\ge 0}^{{\mathscr {S}}}\),
for some \(s\in {\mathscr {S}}\). Indeed, suppose to the contrary that \((M\vec w)_s > w_s\) for all \(s\in {\mathscr {S}}\). This rearranges to
i.e. \(M^*(\Phi (q_1)) \vec w\prec \vec {0}\) (recall (1.7)). Proposition 1.3 then implies that \({\varvec{\lambda }}_{\min }(M^*(\Phi (q_1))) < 0\), so \(\Phi (q_1)\) is strictly sub-solvable, which is a contradiction. Thus (2.18) holds for some \(s\in {\mathscr {S}}\).
Now suppose for sake of contradiction that \(\vec R^\infty \prec \Phi (q_1)\), let \(\vec w=\Phi (q_1)-\vec R^\infty \), and choose \(s\in {\mathscr {S}}\) such that (2.18) holds. Write \(f(t)=\alpha _s(\Phi (q_1)+t\vec w)\). Since \(\alpha _s\) is a polynomial with non-negative coefficients and \(\xi \) is non-degenerate, f is strictly convex and strictly increasing on \([-1,0]\). Hence
The first inequality above is strict, so we deduce that \({\vec \alpha }(\vec R^\infty )\ne \vec R^\infty \) if \(\vec R^\infty \prec \Phi (q_1)\). This contradicts the definition of \(\vec R^\infty \). Therefore \(\vec R^\infty =\Phi (q_1)\), completing the proof. \(\square \)
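The monotone convergence underlying Lemma 2.4 can be seen in a toy example. In the sketch below (Python), \({\vec \alpha }\) is a hypothetical coordinate-wise increasing polynomial map with nonnegative coefficients and \({\vec \alpha }(\vec {0})\succ \vec {0}\), standing in for (2.13); iterating from \(\vec R^0=\vec {0}\) increases monotonically to the smallest fixed point.

```python
import numpy as np

def alpha(x):
    """Hypothetical coordinate-wise increasing polynomial map with
    nonnegative coefficients, standing in for (2.13); alpha(0) > 0
    plays the role of a nonzero external field."""
    return np.array([0.2 + 0.3 * x[1] + 0.2 * x[0] * x[1],
                     0.1 + 0.4 * x[0] + 0.1 * x[0] ** 2])

R = np.zeros(2)
trajectory = [R]
for _ in range(200):
    R = alpha(R)
    trajectory.append(R)

# Coordinate-wise monotone convergence to the least fixed point.
assert all(np.all(b >= a - 1e-12) for a, b in zip(trajectory, trajectory[1:]))
assert np.linalg.norm(alpha(R) - R) < 1e-10
```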
Remark 2.1
Super-solvability of \(\Phi (q_1)\) is a tight condition for the above argument to hold, as the matrix M above needs to have Perron-Frobenius eigenvalue at most 1. Indeed, suppose that \(\Phi (q_1)\) were chosen so that \(\lambda _1(M)>1\). Then there exists \(\vec w\in {\mathbb {R}}_{>0}^{{\mathscr {S}}}\) with \(M\vec w\succ \vec w\). Letting \(\vec {x}=\Phi (q_1)-{\varepsilon }\vec w\) for small \({\varepsilon }>0\), we find \({\vec \alpha }(\vec {x})\prec \vec {x}\). Monotonicity implies that \({\vec \alpha }\) maps the compact, convex set
into itself. By the Brouwer fixed point theorem, a fixed point of \({\vec \alpha }\) strictly smaller than \(\Phi (q_1)\) exists whenever \(\Phi (q_1)\) is strictly sub-solvable.
We finish our analysis of the first AMP phase by computing the asymptotic energy it achieves. As expected, the resulting value agrees with the first term in the formula (1.8) for \({\textsf {ALG}}\).
Lemma 2.5
Proof
We use the identity
and interchange the limit in probability with the integral. To compute \(\mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\langle \varvec{m}^k,\nabla \widetilde{H}_N(t\varvec{m}^k)\rangle \) we introduce an auxiliary AMP step
which depends implicitly on \(t\in [0,1]\). Rearranging yields
For the first term, recalling (2.11) yields
Note also that
Integrating with respect to t, and switching the roles of \(s,s'\) in applying (2.20), we thus find
Finally the external field \(\varvec{h}\) gives energy contribution
Since \(\vec R^\infty =\Phi (q_1)\) by Lemma 2.4, we conclude
\(\square \)
2.4 Stage \(\text {II}\): Descending the Ultrametric Tree
We now turn to the second phase which uses incremental approximate message passing. Choose a large integer \({\underline{\ell }}\), and with \(\delta ={\underline{\ell }}^{-1}\) let
We then define
with the square-root taken entrywise, and \({\varvec{g}}\sim {\mathcal {N}}(0,I_N)\). Then
The point \({\varvec{n}}^{{\underline{\ell }}}\) will be the “root” of our IAMP algorithm.
Moreover we set \(\overline{\ell }=\max \{\ell \in {\mathbb {Z}}_+~:~q_{\ell }^{\delta }\le 1-2\delta \}.\) We also define for \(s\in {\mathscr {S}}\) and \({\underline{\ell }}\le \ell \le \overline{\ell }\) the constants
Set \({\varvec{z}}^{{\underline{\ell }}}={\varvec{w}}^{{\underline{\ell }}}-\varvec{h}\). We will define \(({\varvec{z}}^{\ell })_{\ell \ge {\underline{\ell }}+1}\) via
The Onsager coefficients \(d_{\ell ,j}\) are given by (2.7) and will not appear explicitly in any calculations until Sect. 3.2. Note that formally, they may depend on the first \({\underline{\ell }}\) iterates, since (2.24) is a continuation of the same AMP iteration. To complete the definition of the iteration (2.24), for \(s(i)=s\) and \(\ell \ge {\underline{\ell }}\) we set
where
The algorithm \({\mathcal {A}}\) outputs
where the power \(-1/2\) is taken entry-wise. We show in (2.32) below that
Hence we will often not distinguish between the two and just consider \({\varvec{n}}^{\overline{\ell }}\) to be the output. This makes essentially no difference by virtue of Proposition 1.6.
The state evolution limits of \({\varvec{z}}^\ell \) and \({\varvec{n}}^\ell \) are described by time-changed Brownian motions with total variance \(\Phi _s(q^{\delta }_{\ell })\) in species s after iteration \(\ell \). This is made precise below.
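As a sanity check of this description, the simulation below (Python; the single-species pseudo-maximizer \(\Phi (q)=q\) and the grid parameters are hypothetical) draws the limiting Gaussian process with independent increments of variance \(\Phi (q^{\delta }_{\ell +1})-\Phi (q^{\delta }_{\ell })\) and confirms that the total variance after step \(\ell \) is \(\Phi (q^{\delta }_{\ell })\), i.e. a Brownian motion time-changed by \(\Phi \).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical single-species pseudo-maximizer (admissible when
# lambda = 1): Phi(q) = q on [q1, 1], discretized as q_l = q1 + l*delta.
q1, delta = 0.2, 0.05
Phi = lambda q: q
qs = np.arange(q1, 1.0 + 1e-9, delta)

# The limiting process has independent Gaussian increments of variance
# Phi(q_{l+1}) - Phi(q_l), started from variance Phi(q1) at the root,
# so its variance after step l is Phi(q_l).
n = 100_000
incs = np.sqrt(np.diff(Phi(qs))) * rng.standard_normal((n, len(qs) - 1))
start = np.sqrt(Phi(q1)) * rng.standard_normal((n, 1))
paths = np.concatenate([start, start + np.cumsum(incs, axis=1)], axis=1)

emp_var = paths.var(axis=0)   # should track Phi(q_l) along the grid
```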
Lemma 2.6
Fix \(s\in {\mathscr {S}}\). The sequences \((Z^{\delta }_{{\underline{\ell }},s},Z^{\delta }_{{\underline{\ell }}+1,s},\dots )\) and \((N^{\delta }_{{\underline{\ell }},s},N^{\delta }_{{\underline{\ell }}+1,s},\dots )\) are Gaussian processes satisfying
Proof
The fact that these sequences are Gaussian processes is a general fact about state evolution (the external Gaussian \({\varvec{g}}\) is permitted in Theorem 2). We proceed by induction on \(\ell \ge {\underline{\ell }}\). The proof is similar to [24, Sect. 8] so we give only the main points (in fact (2.21) simplifies the corresponding construction therein, which avoided the use of external Gaussian noise). We will make liberal use of (2.8) to connect asymptotic overlaps before and after applying \(\nabla H_N(\cdot )\).
For base cases, the \({\underline{\ell }}\) case of (2.30) is immediate from (2.16). The base case of (2.31) follows from (2.22), and thus the \({\underline{\ell }}+1\) case of (2.30). The main computation for the base case is
Here we used the general AMP statement of Theorem 2 to say that
For inductive steps, we always have by state evolution
It follows by the inductive hypothesis of (2.28) that for \(j\le \ell \),
Plugging into the above yields that for \(j\le \ell \),
This depends only on \(\min (j,\ell )\), so (2.28) follows. The others are proved by similar computations. \(\square \)
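The covariance structure of Lemma 2.6, a time-changed Brownian motion whose covariance \({\mathbb {E}}[Z_\ell Z_j]\) depends only on \(\Phi (q_{\ell \wedge j})\), can be checked by direct simulation. In this sketch \(\Phi \) is an arbitrary increasing surrogate of our choosing, and the grid stands in for \(q^{\delta }_{\ell }\).

```python
import numpy as np

Phi = lambda q: q + 0.5 * q**2      # any increasing C^1 surrogate for Phi_s
q = np.linspace(0.3, 1.0, 8)        # stand-in for the overlap grid q_l^delta

rng = np.random.default_rng(1)
T = 200_000                          # Monte Carlo trials
Z = np.zeros((T, len(q)))
Z[:, 0] = np.sqrt(Phi(q[0])) * rng.normal(size=T)
for l in range(1, len(q)):
    # independent increments with variance Phi(q_l) - Phi(q_{l-1})
    Z[:, l] = Z[:, l - 1] + np.sqrt(Phi(q[l]) - Phi(q[l - 1])) * rng.normal(size=T)

# empirical covariance: E[Z_l Z_j] = Phi(q_{min(l,j)}), a function of min(l,j) only
emp = (Z.T @ Z) / T
for l in range(len(q)):
    for j in range(len(q)):
        assert abs(emp[l, j] - Phi(q[min(l, j)])) < 0.03
```

The independence of increments used here is exactly what makes the martingale computations in Lemma 2.7 go through.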
Equation (2.31) implies that \(\vec R({\varvec{n}}^{\delta }_{\ell },{\varvec{n}}^{\delta }_{j})\simeq \Phi (q^{\delta }_{(\ell \wedge j)+1})\), which exactly corresponds to the previous sections of the paper. In particular it implies that the final iterate \({\varvec{n}}^{\delta }_{\overline{\ell }}\) satisfies
so the rounding step (2.27) causes only an \(O(\delta )\) change in the Hamiltonian value. Finally we compute in Lemma 2.7 the energy gain from the second phase, which matches the second term in (1.8).
Lemma 2.7
Proof
Observe that \(\langle h,{\varvec{n}}^{\overline{\ell }}-{\varvec{n}}^{{\underline{\ell }}}\rangle _N\simeq 0\) because the values \((N_{\ell ,s}^{\delta })_{\ell \ge {\underline{\ell }}}\) form a martingale sequence for each \(s\in {\mathscr {S}}\). Therefore it suffices to find the in-probability limit of \(\frac{\widetilde{H}_{N}({\varvec{n}}^{\overline{\ell }})-\widetilde{H}_{N}({\varvec{n}}^{{\underline{\ell }}})}{N}\). We write
and use a Taylor series approximation for each term. In particular for \(F\in C^3(\mathbb R;{\mathbb {R}})\), applying Taylor’s approximation theorem twice yields
Assuming \(\sup _{\ell } {\left\Vert{\varvec{n}}^\ell \right\Vert}_N \le 1\), which holds with probability \(1-o_N(1)\) by state evolution and the definition of \(\overline{\ell }\), we apply this estimate with
The result is:
Proposition 1.6 implies that for deterministic constants c, C,
On the other hand for each \({\underline{\ell }}\le \ell \le \overline{\ell }-1\) we have
Summing and noting that \(\overline{\ell }-{\underline{\ell }}\le \delta ^{-1}\) yields the high-probability estimate
So, this term vanishes as \(\delta \rightarrow 0\). It remains to prove
To establish this it suffices to show for each species \(s\in {\mathscr {S}}\) the equality
Observe by (2.24) that
Passing to the limiting Gaussian process \((Z^{\delta }_k)_{k\in \mathbb Z^+}\) via state evolution,
As \((N^{\delta }_k)_{k\in \mathbb Z^+}\) is a martingale process, it follows that all right-most expectations vanish. Similarly it holds that
We conclude that
In the second-to-last step we used independence of \(Z^{\delta }_{\ell ,s}\) increments, which follows from Lemma 2.6, while the last step used (2.23) and (2.29). Combining with [16, Lemma 3.7] on discrete approximation of the integral in \({\mathbb {A}}\) implies (2.34). \(\square \)
Proof of Theorem 1
We take \({\mathcal {A}}\) as in (2.27) for \({\underline{\ell }}\) a large constant depending on \(({\varepsilon },\xi ,h,\lambda )\). First,
follows from combining Lemmas 2.5, 2.7 and the fact that (recall (2.32))
Next, let \(K_N\subseteq {\mathscr {H}}_N\) be as in Proposition 1.6. We recall that \({\mathbb {P}}[H_N\in K_N]\ge 1-e^{-cN}\). Exactly as in [15, Theorem 10] it follows that there is a \(C({\varepsilon })\)-Lipschitz function \(\widetilde{\mathcal {A}}:{\mathscr {H}}_N\rightarrow {\mathbb {R}}\) such that \(\widetilde{\mathcal {A}}\) and \({\mathcal {A}}\) agree on \(K_N\). Moreover (1.6) and concentration of measure on Gaussian space imply that \(H_N(\widetilde{\mathcal {A}}(H_N))\) is \(O(N^{1/2})\)-sub-Gaussian. In light of (2.36) and since \({\mathbb {P}}[\widetilde{\mathcal {A}}(H_N)={\mathcal {A}}(H_N)]\ge {\mathbb {P}}[H_N\in K_N]\ge 1-e^{-cN}\), we deduce that
This concludes the proof. \(\square \)
3 Extensions
3.1 Signed AMP
In our companion paper [17], we show that strictly super-solvable models have w.h.p. exactly \(2^r\) critical points, indexed by sign patterns \({\vec \Delta }\in \{\pm 1\}^r\) with the following physical meaning. Consider first the extreme case of a linear Hamiltonian, with external field \(\varvec{h}= {\vec h}\diamond {\textbf {1}}\), where all entries of \({\vec h}\) are nonzero, and no other interactions. This model clearly has \(2^r\) critical points, which are the products of the maxima and minima in the spheres \(\{{\left\Vert{\varvec{x}}_s\right\Vert}_2^2 = \lambda _s N\}\) corresponding to each species \(s\in {\mathscr {S}}\), and the signs \({\vec \Delta }\) record whether the critical point is a maximum or minimum in each species. As explained in [17, Sect. 6.6], if a strictly super-solvable \(H_N\) is gradually deformed to a linear function (staying inside the strictly super-solvable phase), the critical points move stably, and over this process their Hessian eigenvalues do not cross zero. Thus, each critical point of \(H_N\) can also be associated with a sign pattern \({\vec \Delta }\).
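For the linear Hamiltonian just described, the \(2^r\) critical points can be written down and verified directly: in each species the critical point is \(\pm \sqrt{\lambda _s N}\,{\varvec{h}}_s/\Vert {\varvec{h}}_s\Vert \). A small numpy sketch, in which the equal-size species blocks and equal weights are our illustrative choices:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
N, r = 900, 3
species = np.repeat(np.arange(r), N // r)   # three equal species blocks
lam = np.full(r, 1.0 / r)
h = rng.normal(size=N)                      # external field; entries nonzero a.s.

def riem_grad_norm(x):
    """Norm of the Riemannian gradient of H(x) = <h, x> on the product of spheres."""
    g = np.zeros(N)
    for s in range(r):
        m = species == s
        # tangential part: remove the radial component in each species
        g[m] = h[m] - (h[m] @ x[m]) / (x[m] @ x[m]) * x[m]
    return np.linalg.norm(g)

crits = []
for Delta in product([1, -1], repeat=r):    # one sign per species
    x = np.zeros(N)
    for s in range(r):
        m = species == s
        x[m] = Delta[s] * np.sqrt(lam[s] * N) * h[m] / np.linalg.norm(h[m])
    assert riem_grad_norm(x) < 1e-8         # each of the 2^r points is critical
    crits.append(x)
assert len(crits) == 2**r
```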
We now show that the root-finding algorithm defined in Sect. 2.3 can be generalized to find all \(2^r\) critical points in a strictly super-solvable model. More precisely, it finds \(2^r\) approximate critical points, one in a neighborhood of each exact critical point of the model, from which the exact critical points can be computed by Newton’s method (see Remark 3.2). For general models, it finds \(2^r\) approximate critical points on the product of spheres with self-overlap \(\Phi (q_1)\). The restriction of \(H_N\) to this set, considered as a spin glass in its own right (see [16, Remark 1.2]) is a solvable model.
Fixing \({\vec \Delta }\in \{\pm 1\}^r\), the analogous iteration to (2.10) is:
The change of sign does not affect the proofs or statements of Lemmas 2.3 and 2.4. Indeed, \(a_s^2\) only changes to \(\Delta _s^2 a_s^2\) in the proof of the former, which is no change at all. The generalization of Lemma 2.5 is as follows.
Lemma 3.1
Proof
The proof is similar to Lemma 2.5. The main calculation now becomes:
Moreover the external field \(\varvec{h}\) now contributes energy
Combining gives the desired statement. \(\square \)
Remark 3.1
One can sign the IAMP phase as well by redefining (2.26) to
The resulting output \({\varvec{n}}^{\overline{\ell }}({\vec \Delta })\) then achieves asymptotic energy (recall (1.8))
However it is unclear whether \({\varvec{n}}^{\overline{\ell }}({\vec \Delta })\) can be made to obey any notable properties. We will show that the signed outputs \(\varvec{m}^k({\vec \Delta })\) of the first phase above are approximate critical points for \(H_N\) (and in [17] that all near-critical points are close to one of them). By contrast, for the output of signed IAMP to be a critical point, \(\Phi \) must satisfy a signed version of the tree-descending ODE (2.3) in which the function \((\xi ^s \circ \Phi )'(q)\) is replaced by
Since this quantity appears inside a square root in (2.3), it is unclear when to expect solutions to exist. Furthermore the proof in [16] of well-posedness relies on positivity of coefficients (via Perron-Frobenius theory) and does not seem to generalize. Additionally, a solution would not seem to correspond to a maximizer of any variational problem as in (1.8). As a result we do not know how to prove a solution exists in the signed case. However if one takes as given a smooth function \(\Phi \) satisfying the signed tree-descending ODE, the iteration (3.2) starting from signed initialization \({\varvec{n}}^{{\underline{\ell }}}({\vec \Delta })=\varvec{m}^{{\underline{\ell }}}({\vec \Delta })+ \sqrt{\Phi (q_1+\delta )-\Phi (q_1)}\diamond {\varvec{g}}\) would produce an approximate critical point \({\varvec{n}}^{\overline{\ell }}({\vec \Delta })\) which still satisfies (3.3).
3.2 Gradient Computation and Connection to \(E_{\infty }\)
We now compute the gradient of the outputs, showing that \(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\) and \({\varvec{n}}^{\ell }\) (\({\underline{\ell }}\le \ell \le \overline{\ell }\)) are approximate critical points for the restriction of \(H_N\) to the products of r spheres with suitable radii passing through them. For \({\varvec{\sigma }}\) to be an approximate critical point means precisely that there exist coefficients \(\vec {A}\in {\mathbb {R}}^r\) such that
In our case, these coefficients will be given as follows. If \(\vec {1}\) is strictly sub-solvable (so \(q_1<1\)), define \(\vec {A}(q)\) for \(q\in [q_1,1]\) by
Further define for \({\vec \Delta }\in \{-1,1\}^r\)
Note that, by (2.2), this is consistent with the definition of \(\vec {A}(q_1)\) above, in the sense that \(\vec {A}(q_1;\vec {1}) = \vec {A}(q_1)\). We take this to be the definition of \(\vec {A}(q_1)\) if \(\vec {1}\) is super-solvable (and \(q_1=1\)).
Proposition 3.2
If \(\Phi \) is a pseudo-maximizer for \({\mathbb {A}}\) (recall Definition 2.1) then for any \({\vec \Delta }\in \{\pm 1\}^r\),
Proof
Recall from Lemma 2.3 (which holds without modification for general \({\vec \Delta }\)) that
Thus rearranging (3.1) yields
Since \(\lim _{{\underline{\ell }}\rightarrow \infty } \big (\Delta _s a_s^{-1}+b_{{\underline{\ell }},s}({\vec \Delta })\big )=A_s({\vec \Delta })\) by (3.6), the result follows. \(\square \)
Remark 3.2
In [17, Theorems 1.5 and 1.6], we show that when \(\xi \) is strictly super-solvable, \(H_N\) has exactly \(2^r\) critical points \(\{{\varvec{x}}({\vec \Delta })\}_{{\vec \Delta }\in \{-1,1\}^r}\). Moreover all \({\varepsilon }\)-approximate critical points with Riemannian gradient \(\Vert \nabla _{{\text {sp}}}H_N({\varvec{x}})\Vert \le {\varepsilon }\sqrt{N}\) are within \(o_{{\varepsilon }}(\sqrt{N})\) of some \({\varvec{x}}({\vec \Delta })\). It follows from Proposition 3.2 that each \(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\) is an \({\varepsilon }\)-approximate critical point for large enough \({\underline{\ell }}={\underline{\ell }}(\xi ,{\varepsilon })\). In fact the preceding gradient computation shows that the values \({\vec \Delta }\) agree, implying that \(\Vert \varvec{m}^{{\underline{\ell }}}({\vec \Delta })-{\varvec{x}}({\vec \Delta })\Vert _N \le o_{{\underline{\ell }}\rightarrow \infty }(1)\) (compare with [17, Definition 5, Eq. (1.15)]). Moreover by [17, Theorem 1.6] each Riemannian Hessian \(\nabla ^2_{{\text {sp}}}H_N({\varvec{x}}({\vec \Delta }))\) has condition number at least \(1/C(\xi )\). It follows that each critical point \({\varvec{x}}({\vec \Delta })\) can be efficiently computed to arbitrary accuracy by applying Newton’s method from \(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\) for a large enough \({\underline{\ell }}={\underline{\ell }}(\xi )\). (By contrast, the convergence of \(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\) itself to \({\varvec{x}}({\vec \Delta })\) is only in the careful double-limit sense \(\lim _{{\underline{\ell }}\rightarrow \infty }\lim _{N\rightarrow \infty }\).)
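A toy version of this Newton refinement: on a single sphere with a quadratic Hamiltonian, critical points solve the Lagrange system \(A x = \mu x\), \(\Vert x\Vert =1\), and Newton's method on the bordered system converges rapidly from an approximate critical point when the relevant Jacobian is well-conditioned. This simplified single-species sketch is ours, not the algorithm of [17].

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.normal(size=(n, n)); A = (A + A.T) / 2     # toy symmetric "Hamiltonian"

def newton_refine(x, mu, steps=12):
    """Newton iteration on F(x, mu) = (A x - mu x, (||x||^2 - 1)/2) = 0."""
    for _ in range(steps):
        F = np.concatenate([A @ x - mu * x, [(x @ x - 1.0) / 2]])
        J = np.block([[A - mu * np.eye(n), -x[:, None]],
                      [x[None, :],          np.zeros((1, 1))]])
        d = np.linalg.solve(J, -F)
        x, mu = x + d[:n], mu + d[n]
    return x, mu

# start from a crude approximate critical point: perturbed top eigenvector
w, V = np.linalg.eigh(A)
x0 = V[:, -1] + 0.05 * rng.normal(size=n)
x0 /= np.linalg.norm(x0)
x, mu = newton_refine(x0, x0 @ A @ x0)
# refined point is an exact critical point on the sphere to machine precision
assert np.linalg.norm(A @ x - mu * x) < 1e-8 and abs(x @ x - 1.0) < 1e-8
```

The quadratic convergence here mirrors why a uniform lower bound on the Hessian condition number at \({\varvec{x}}({\vec \Delta })\) suffices for efficient refinement to arbitrary accuracy.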
Proposition 3.3
If \(\Phi \) is a pseudo-maximizer for \({\mathbb {A}}\), then for any \({\underline{\ell }}\)-indexed sequence \((q_*,\ell )=\big ((q_*,\ell )_{{\underline{\ell }}\ge 1}\big )\) such that \(q_*\in [q_1,1]\), \({\underline{\ell }}\le \ell \le \overline{\ell }\) and \(\lim _{{\underline{\ell }}\rightarrow \infty }|q_*-q_{\ell }^{\delta }|=0\), we have
Proof
For notational convenience we assume \((q_*,\ell )=(1,\overline{\ell })\); the proof is identical in general. Recall the rearrangement (2.35):
So far we did not have to compute \(d_{\overline{\ell }, j}\). We do this now, focusing on the IAMP phase. Recalling (2.25), the IAMP iteration used non-linearity
Using the formula (2.7) we find
Note that since \(\Phi \in C^2([q_1,1])\) we have the uniform-in-\(q_j^{\delta }\) approximations (recall (3.5)):
Substituting into (3.9), we obtain
Since the increments \(({\varvec{n}}^{j+1}-{\varvec{n}}^j)\) are orthogonal in the state evolution sense, it easily follows that the approximation of \(C_{j,s}\) by \(\widehat{C}_{s}(q_j^{\delta })\) commutes with summation, i.e.
Note that we manifestly have \(\widehat{C}(1)=\vec {A}(1)\). We claim the function \(\widehat{C}\) is constant on \([q_1,1]\). This is equivalent to showing that for each s the function
is constant. Differentiating, it suffices to show
Write \(f_{s'}'(q)=\Psi (q)\Phi _{s'}'(q)\), where \(\Psi \) is independent of s since \(\Phi \) solves the tree-descending ODE (2.3). Then using the chain rule, the left-hand side of (3.12) equals
Meanwhile the right-hand side of (3.12) is
Therefore \(\widehat{C}(q)=\vec {A}(1)\) is constant as claimed. Finally it is clear that the \({\varvec{n}}^{{\underline{\ell }}}\) coefficient in (3.11) approximately equals \(\widehat{C}(q_1)\) and hence also \(\vec {A}(1)\). Then (3.11) implies
which completes the proof. \(\square \)
From the point of view of [16], the fact that \(\Vert \nabla _{{\text {sp}}} H_N({\varvec{n}}^{\overline{\ell }})\Vert _N\approx 0\) is to be expected. At least for \((\Phi ;q_1)\) maximizing \({\mathbb {A}}\), if this were not true, then an extra step of gradient descent would essentially suffice to reach energy strictly better than \({\textsf {ALG}}\), contradicting the optimality in [16, Theorem 1]. However the radial derivative computation is interesting in its own right and lets us study the spherical Hessian around an output \({\varvec{\sigma }}\). We believe that Corollary 3.4 can be strengthened to hold with \({\varvec{\lambda }}_1\) rather than \({\varvec{\lambda }}_{{\varepsilon }N}\). This seems to require a more precise Gaussian conditioning argument around \({\mathcal {A}}(H_N)\), which we chose not to pursue.
Corollary 3.4
With \({\varvec{\lambda }}_k\) the k-th largest eigenvalue of a symmetric real matrix,
Proof
Fixing \(\vec {A}=\vec {A}(1)\), the bulk spectral measure of
for deterministic \({\varvec{x}}\in {\mathcal {S}}_N\) concentrates with rate function \(N^2\) around a limiting spectral measure independent of \({\varvec{x}}\). By union-bounding over an \(\delta \sqrt{N}\)-net as in [26, Proof of Lemma 3], it thus suffices to show (3.13) at a point \({\varvec{x}}\in {\mathcal {S}}_N\) independent of \(H_N\), with \({\varvec{W}}({\varvec{x}})\) in place of \(\nabla ^2_{{\text {sp}}} H_N({\varvec{\sigma }})\). This is purely a statement of random matrix theory and is shown in [17, Proposition 5.18]. \(\square \)
Notably Corollary 3.4 explains the equality \({\textsf {ALG}}=E_{\infty }\) for pure models, which we derived manually in [16]. Indeed for a pure model with \(\xi =\prod _{i=1}^r x_i^{a_i}\), the energy and radial derivative are deterministically proportional:
It follows (using again the \(N^2\) large deviation rate for the spectral bulk) that there is a unique energy level \(E_{\infty }\) at which critical points can have spherical Hessian obeying the conclusion of Corollary 3.4. This is the definition of \(E_\infty \) given in [1, 20].
3.3 Branching IAMP and Exponential Concentration
Here we modify the second stage of our IAMP algorithm (which requires \({\vec \Delta }=\vec {1}\)) to use external Gaussian randomness in a small number of increment steps. This allows the construction of an ultrametric tree of outputs with large constant depth and \(\exp (cN)\) breadth, with pairwise overlaps given by \(\Phi \). More precisely, for any finite ultrametric space \(X=(x_1,\ldots ,x_M)\), \(M=\exp (cN)\), of diameter at most \(1-q_1\), branching IAMP outputs \(({\varvec{\sigma }}_1,\ldots ,{\varvec{\sigma }}_M)\) with
We use an approach suggested in [3] by injecting external Gaussian noise \({\varvec{g}}^{(i)}\) into the IAMP phase of the algorithm at depth \(q_i\in (q_1,1)\). Importantly, this gives an explicit construction of \(\exp (cN)\) approximate critical points of \(H_N\) (with exponentially good probability) whenever there is an IAMP phase. A similar construction was used by one of us in [24, Sect. 4]. There the Gaussian noise was constructed artificially by preliminary iterates of AMP rather than from exogenous noise (due to the lack of a state evolution result incorporating independent Gaussian vectors). This only enabled the construction of a large constant number of outputs rather than exponentially many.
Our branching IAMP proceeds as follows. We first apply Stage \(\text {I}\) with \({\vec \Delta }=\vec {1}\) as before. We fix \(q_1<q_2<\dots <q_m=1\) and let
We define \({\varvec{n}}^{\ell }\) with the same recursive formula as before, unless \(\ell =\ell ^{\delta }_{q_i}\) for some \(i\in [m]\). For these cases, we define \({\varvec{g}}^{(1)},\ldots ,{\varvec{g}}^{(m)}\sim {\mathcal {N}}(0,I_N)\) to be independent standard Gaussian vectors. Then we set:
The definition (3.15) naturally enables couplings for pairs of iterations. We say the iterations \(\big ({\varvec{n}}^{\ell ,1},{\varvec{n}}^{\ell ,2}\big )_{\ell \ge 1}\) are \(q_j\)-coupled if their associated Gaussian vectors
are coupled so that \({\varvec{g}}^{(i,1)}={\varvec{g}}^{(i,2)}\) almost surely for \(i<j\), and the variables are otherwise independent.
Proposition 3.5
Let the iterations \({\varvec{n}}^{\ell ,1},{\varvec{n}}^{\ell ,2}\) be \(q_j\)-coupled as above, and let \(\Phi \) be a pseudo-maximizer of \({\mathbb {A}}\) (recall Definition 2.1). Then
Proof
The analysis uses the slightly generalized state evolution given in Theorem 2, which states that (2.8) continues to hold even in the presence of external randomness \({\varvec{g}}^{(i)}\). Modulo this point, the calculations are essentially identical. Indeed [24] uses exactly the same calculations to analyze a slightly different formulation of branching IAMP (therein, the vectors \({\varvec{g}}\) are defined via negatively time-indexed AMP iterates to sidestep the lack of a generalized state evolution result). We therefore give only an outline below.
The SDE description in (2.6) is unchanged if one uses the slightly added generality of Theorem 2 to incorporate the external Gaussian noise. (This Gaussian noise is scaled in (3.15) to achieve exactly the same effect as a usual iteration step.) The energy analysis of \(H_N({\varvec{n}}^{\overline{\ell }})\) only changes on the m modified steps which has negligible effect since \(\delta \rightarrow 0\) as \({\underline{\ell }}\rightarrow \infty \); similarly for \(\nabla H_N({\varvec{n}}^{\overline{\ell }})\). Thus (3.16) follows by the same proof as before. The proof of (3.18) is identical to [24, Sect. 8]. \(\square \)
In Proposition 3.6 we observe that concentration of measure implies Proposition 3.5 holds with exponentially high probability. Thus we can couple together \(\exp (cN)\) branching IAMPs to construct a full ultrametric tree of large constant depth m and breadth \(\exp (cN)\). To do this, we fix m, take \({\underline{\ell }}\) sufficiently large and then \(\eta >0\) sufficiently small. Then with \(K=\exp (\eta N)\), we consider a complete depth m rooted tree \({\mathcal {T}}\), with root defined to have depth 1, such that each vertex at depths \(1,\ldots ,m-1\) has K children. Thus the leaf-set \(L({\mathcal {T}})\) is naturally indexed by \([K]^m\). For \(v,v'\in L({\mathcal {T}})\) we let \(v\wedge v'\in \{1,2,\ldots ,m\}\) denote the height of their least common ancestor. For each non-leaf \(x\in V({\mathcal {T}})\), label the edge from x to its parent by an i.i.d. Gaussian vector \({\varvec{g}}^{(x)}\sim {\mathcal {N}}(0,I_N)\). Then for each leaf \(v\in L({\mathcal {T}})\), using the m Gaussian vectors along the path from the root of \({\mathcal {T}}\) to v yields branching IAMP output \({\varvec{\sigma }}^{(v)}\) for any \(H_N\).
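The resulting ultrametric overlap structure can be illustrated by a toy version of this tree construction, in which each output is the shared Stage-I part plus increments whose noise vectors are indexed by the ancestors of the leaf in \({\mathcal {T}}\). Here the breadth \(K\) is a small constant rather than \(\exp (\eta N)\), the model is reduced to a single species, and \(\Phi \) is an arbitrary increasing surrogate; all of these simplifications are ours.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
N, K, m = 20_000, 3, 4               # dimension, branching, depth
Phi = lambda q: q                    # single-species surrogate: Phi = identity
q = np.array([0.4, 0.6, 0.8, 1.0])   # q_1 < q_2 < ... < q_m = 1

base = np.sqrt(Phi(q[0])) * rng.normal(size=N)   # common Stage-I part
edge_noise = {}                                   # g^{(x)} per tree edge

def output(v):
    """Toy branching-IAMP output for leaf v: noise shared up to the LCA."""
    sigma = base.copy()
    for i in range(m - 1):
        key = v[:i + 1]              # ancestor prefix at depth i+1 indexes the noise
        if key not in edge_noise:
            edge_noise[key] = rng.normal(size=N)
        sigma += np.sqrt(Phi(q[i + 1]) - Phi(q[i])) * edge_noise[key]
    return sigma

leaves = list(product(range(K), repeat=m - 1))
sig = {v: output(v) for v in leaves}
for v in leaves:
    for u in leaves:
        # depth of the least common ancestor = first differing coordinate
        wedge = next((i for i in range(m - 1) if v[i] != u[i]), m - 1)
        R = sig[v] @ sig[u] / N
        assert abs(R - Phi(q[wedge])) < 0.05     # overlap = Phi(q_{v ^ u})
```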
Proposition 3.6
Proposition 3.5 holds with exponentially good probability in the following sense. Fix m and \(q_1<q_2<\dots <q_m=1\). For any \({\varepsilon }>0\), for large enough \({\underline{\ell }}\) there exists \(\eta =\eta ({\varepsilon },{\underline{\ell }})>0\) such that for N large enough, the following hold simultaneously across all \(v,v'\in L({\mathcal {T}})\) with probability at least \(1-\exp (-\eta N)\):
Proof
As explained in [15, Sect. 8], the map \(H_N\mapsto {\varvec{n}}^{\overline{\ell }}\) agrees with a \(C({\underline{\ell }})\)-Lipschitz function of the coefficients \({\varvec{G}}^{(k)}\) of \(H_N\) except with probability \(1-\exp (-cN)\). The same proof applies for \(H_N\mapsto {\varvec{n}}^{\overline{\ell },v}\) as well since the external noise variables are also Gaussian. Concentration of measure on Gaussian space now ensures that the statements above hold with exponentially high probability for each fixed \((v,v')\). Union bounding over all such pairs for small enough \(\eta \) implies the result. \(\square \)
In particular, the last conclusion in (3.19) shows that all \(\exp (\eta N)\) constructed points have pairwise distance at least \(\delta \sqrt{N}\) for \(0<\delta <1-q_{m-1}\). Thus for any sub-solvable model, with high probability there are exponentially many \(\sqrt{N}/C(\xi )\)-separated approximate critical points. This is a converse to the main result of [17], where we show that strictly super-solvable models enjoy a strong topological trivialization property which rules out such behavior.
Remark 3.3
An alternative to branching IAMP, which is very natural from the point of view of our companion work [16], is to slightly perturb \(H_N\) to a \((1-\eta )\)-correlated function \(H_N^{(\eta )}\). Concentration of measure implies that the overlap
concentrates exponentially around a limiting value \(R_{\delta ,{\underline{\ell }},\eta }\in {\mathbb {R}}^r\). We expect that taking \(\eta \rightarrow 0\) with \(\delta ,{\underline{\ell }}\) in a suitable way enables \(R_{\delta ,{\underline{\ell }},\eta }\approx \Phi (q)\) for any desired \(q\in [q_1,1]\). This corresponds to the behavior of p(q) for \(q\in [q_1,1]\) for any \((p,\Phi ;q_0)\) maximizing \({\mathbb {A}}\). However this approach seems more cumbersome to analyze explicitly.
Remark 3.4
The construction in this section shows the quenched existence of \(\exp (\eta N)\) well-separated approximate critical points for strictly sub-solvable models. In [17, Theorem 5.15] we use this fact to prove the number of exact critical points is exponentially large in expectation. However we are unable to prove the quenched (i.e. high-probability) existence of \(\exp (\eta N)\) exact critical points in strictly sub-solvable models. Showing that this is the case, or more generally identifying the quenched exponential order of the number of critical points, is an interesting direction for future work.
Data Availability
We do not analyze or generate any data. Instead, our work proceeds via a mathematical approach.
Notes
Technically the \(N\rightarrow \infty \) limit is not known to exist for general \(\xi \). Since \(\textsf{OPT}\) appears in the present paper only in this informal discussion, we will not belabor this point.
If \({\vec h}=0\), one takes \({\underline{\ell }}=q_1=0\), \(n^{1}_i=\sqrt{\Phi _{s(i)}(\delta )}{\varvec{g}}_i\), and proceeds identically.
The unusual factor 2 in the exponent comes from the external randomness vectors \({\varvec{e}}^1,\ldots ,{\varvec{e}}^t\).
References
Auffinger, A., Ben Arous, G., Černý, J.: Random matrices and complexity of spin glasses. Commun. Pure Appl. Math. 66(2), 165–201 (2013)
Auffinger, A., Chen, W.-K.: Free energy and complexity of spherical bipartite models. J. Stat. Phys. 157(1), 40–59 (2014)
El Alaoui, A., Montanari, A.: Algorithmic thresholds in mean field spin glasses. arXiv preprint (2020). arXiv:2009.11481
El Alaoui, A., Montanari, A., Sellke, M.: Optimization of mean-field spin glasses. Ann. Probab. 49(6), 2922–2960 (2021)
El Alaoui, A., Sellke, M.: Algorithmic pure states for the negative spherical perceptron. J. Stat. Phys. 189(2), 27 (2022)
Bayati, M., Lelarge, M., Montanari, A.: Universality in polytope phase transitions and message passing algorithms. Ann. Appl. Probab. 25(2), 753–822 (2015)
Bayati, M., Montanari, A.: The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inf. Theory 57, 764–785 (2011)
Berthier, R., Montanari, A., Nguyen, P.-M.: State evolution for approximate message passing with non-separable functions. Inf. Inference 9, 33–79 (2019)
Bolthausen, E.: An iterative construction of solutions of the TAP equations for the Sherrington–Kirkpatrick model. Commun. Math. Phys. 325(1), 333–366 (2014)
Chen, W.-K., Lam, W.-K.: Universality of approximate message passing algorithms. Electron. J. Probab. 26, 1–44 (2021)
Dudeja, R., Lu, Y.M., Sen, S.: Universality of approximate message passing with semi-random matrices. Ann. Probab. 51(5), 1616–1683 (2023). https://doi.org/10.1214/23-AOP1628
Dembo, A., Montanari, A., Sen, S.: Extremal cuts of sparse random graphs. Ann. Probab. 45(2), 1190–1217 (2017)
Fan, Z.: Approximate message passing algorithms for rotationally invariant matrices. Ann. Stat. 50(1), 197–224 (2022)
Feng, O.Y., Venkataramanan, R., Rush, C., Samworth, R.J., et al.: A unifying tutorial on approximate message passing. Found. Trends Mach. Learn. 15(4), 335–536 (2022)
Huang, B., Sellke, M.: Tight Lipschitz hardness for optimizing mean field spin glasses. arXiv preprint (2021). arXiv:2110.07847
Huang, B., Sellke, M.: Algorithmic threshold for multi-species spherical spin glasses. arXiv preprint (2023). arXiv:2303.12172
Huang, B., Sellke, M.: Strong topological trivialization of multi-species spherical spin glasses. arXiv preprint (2023). arXiv:2308.09677
Javanmard, A., Montanari, A.: State evolution for general approximate message passing algorithms, with applications to spatial coupling. Inf. Inference 2(2), 115–144 (2013)
Krzakala, F., Montanari, A., Ricci-Tersenghi, F., Semerjian, G., Zdeborová, L.: Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. Natl. Acad. Sci. 104(25), 10318–10323 (2007)
McKenna, B.: Complexity of bipartite spherical spin glasses. arXiv preprint (2021). arXiv:2105.05043
Montanari, A.: Optimization of the Sherrington–Kirkpatrick Hamiltonian. SIAM J. Comput. (2021). https://doi.org/10.1137/20M132016X
Panchenko, D.: On the K-sat model with large number of clauses. Random Struct. Algorithms 52(3), 536–542 (2018)
Richard, E., Montanari, A.: A statistical model for tensor PCA. In: Advances in Neural Information Processing Systems, pp. 2897–2905 (2014)
Sellke, M.: Optimizing mean field spin glasses with external field. Electron. J. Probab. 29, 1–47 (2024)
Sherrington, D., Kirkpatrick, S.: Solvable model of a spin-glass. Phys. Rev. Lett. 35(26), 1792 (1975)
Subag, E.: Following the ground states of full-RSB spherical spin glasses. Commun. Pure Appl. Math. 74(5), 1021–1044 (2021)
Acknowledgements
B.H. was supported by an NSF Graduate Research Fellowship, a Siebel scholarship, NSF awards CCF-1940205 and DMS-1940092, and NSF-Simons collaboration grant DMS-2031883. M.S. was supported by an NSF graduate research fellowship, a Stanford graduate fellowship, and NSF award CCF-2006489 and was a member at the IAS while parts of this work were completed.
Funding
Open Access funding provided by the MIT Libraries
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests (beyond the aforementioned funding).
Additional information
Communicated by Francesco Zamponi.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: State Evolution: Proof of Proposition 2.2
In this section we prove Proposition 2.2, following the appendix of [4]. Throughout, we denote by \({\varvec{G}}^{(k)}\in ({\mathbb {R}}^N)^{\otimes k}\), \(k\ge 2\) a sequence of standard Gaussian tensors. For \(S_k\) the symmetric group on k elements we also write
for the rescaled tensors with entries
For a symmetric tensor \(\varvec{A}^{(k)}\in ({\mathbb {R}}^N)^{\otimes k}\) and \(\varvec{T}\in ({\mathbb {R}}^N)^{\otimes (k-1)}\), we denote by \(\varvec{A}^{(k)}\{\varvec{T}\}\in {\mathbb {R}}^N\) the vector with components
For \({\varvec{u}}\in {\mathbb {R}}^N\) we denote by \(\varvec{A}^{(k)}\{{\varvec{u}}\}=\varvec{A}^{(k)}\{{\varvec{u}}^{\otimes (k-1)}\}\) the vector with entries
Note that for \(\varvec{A}^{(k)}\) as in (A.1), one has
where \(H_{N,k}\) denotes the part of \(H_N\) of total degree k.
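For small k, the contraction \(\varvec{A}^{(k)}\{{\varvec{u}}\}\) and its relation to the gradient of the degree-k part of the Hamiltonian can be checked directly: for symmetric \(\varvec{A}\), \(\nabla \langle \varvec{A},{\varvec{u}}^{\otimes k}\rangle = k\,\varvec{A}^{(k)}\{{\varvec{u}}\}\). The sketch below takes k = 3 and ignores the \(1/N\) normalizations and mixture coefficients.

```python
import numpy as np

rng = np.random.default_rng(5)
N, k = 8, 3
G = rng.normal(size=(N,) * k)
# symmetrize: average over all permutations of the k = 3 indices
A = (G + G.transpose(0, 2, 1) + G.transpose(1, 0, 2)
       + G.transpose(1, 2, 0) + G.transpose(2, 0, 1) + G.transpose(2, 1, 0)) / 6

def contract(A, u):
    """A{u}: contract the symmetric k-tensor against u in all but one index."""
    return np.einsum('ijk,j,k->i', A, u, u)

def H(u):
    """Homogeneous degree-k polynomial <A, u^{tensor k}> (normalization omitted)."""
    return np.einsum('ijk,i,j,k->', A, u, u, u)

u = rng.normal(size=N)
# gradient of H equals k * A{u} for symmetric A; check one coordinate numerically
eps, e0 = 1e-6, np.eye(N)[0]
num_grad0 = (H(u + eps * e0) - H(u - eps * e0)) / (2 * eps)
assert abs(num_grad0 - k * contract(A, u)[0]) < 1e-4
```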
For \({\varvec{u}},{\varvec{v}}\in {\mathbb {R}}^N\) we recall from Sect. 1.3 the notations
Given functions \(f_{t,s}:{\mathbb {R}}^{t+1}\rightarrow {\mathbb {R}}\) of \(t+1\) variables for each \(s\in {\mathscr {S}}\), and \({\varvec{v}}^0,{\varvec{v}}^1,\ldots ,{\varvec{v}}^t\in {\mathbb {R}}^{N}\), we define \(f_t({\varvec{v}}^0,{\varvec{v}}^1,\ldots ,{\varvec{v}}^t)\in \mathbb R^N\) component-wise via
Finally, for a sequence of vectors \({\varvec{w}}^0,{\varvec{w}}^1,\dots \), we write \({\varvec{w}}^{\le t} = ({\varvec{w}}^0,{\varvec{w}}^1,\ldots ,{\varvec{w}}^t)\).
To deduce the state evolution result for mixed tensors, we analyze a slightly more general iteration where each homogeneous k-tensor is tracked separately, while restricting ourselves to the case where the mixture \(\xi \) has finitely many components: \(\gamma _{s_1,\ldots ,s_k} = 0\) for all \((s_1,\ldots ,s_k)\in {\mathscr {S}}^k\) whenever \(k \ge D +1\), for some fixed \(D \ge 2\). We then proceed by an approximation argument to extend the convergence to the general case \(D = \infty \).
We begin by introducing the Gaussian process that captures the asymptotic behavior of AMP. Define \(\xi ^k\) to be the degree k part of \(\xi \), and
the degree \(k-1\) part of \(\xi ^s\).
An AMP iteration is specified by Lipschitz functions \(f_{t,s}:{\mathbb {R}}^{2(t+1)}\rightarrow {\mathbb {R}}\) for each \((t,s)\in {\mathbb {N}}\times {\mathscr {S}}\). For each iteration t, the state of the algorithm is given by vectors \({\varvec{w}}^t\in {\mathbb {R}}^N\), and \({\varvec{z}}^{k,t}\in {\mathbb {R}}^N\), with \(k\in \{2,\ldots ,D\}\). Moreover for each t, there is also an external randomness vector \({\varvec{e}}^t\in {\mathbb {R}}^N\) with independent coordinates \(e^t_i\sim \mu _{t,s(i)}\) from deterministic probability distributions \(\big (\mu _{t,s}\big )_{t\ge 0,s\in {\mathscr {S}}}\) with finite second moment. We now start to define the AMP iteration steps (the definition finishes at (A.11)). A single step is given by
A general multi-species tensor AMP algorithm then takes the form:
For the right-hand side of (A.9) to make sense, we must define for each \(t\ge 0\) and \(s\in {\mathscr {S}}\) a distribution over sequences \((W^0_s,\ldots ,W^t_s;E^0_s,\ldots ,E^t_s)\). The latter variables \(E^{t'}_s\sim \mu _{t',s}\) are simply taken independent of each other and all other variables. The construction of the W variables is recursive across t as follows. For each \(2\le k\le D\) and \(s\in {\mathscr {S}}\), we let \(U^{k,0}_s\sim \nu _{k,s}\) and construct a centered Gaussian process
which is independent of \(U^{k,0}_s\). The variables \(U^{k,t}_s\) and \(U^{k',t'}_{s'}\) are independent unless \((k,s)=(k',s')\). It remains to specify the covariance of \((U^{k,t}_s)_{1\le t\le T}\) which is given recursively by:
The main result, an extension of Proposition 2.2, follows. Below we use \({\mathbb {W}}_2\) to denote the Wasserstein-2 distance between probability measures on Euclidean space in any dimension. We say a function \(\psi :{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) is pseudo-Lipschitz if
Theorem 2
(State Evolution for AMP) Let \(\{{\varvec{G}}^{(k)}\}_{k\ge 2}\) be independent standard Gaussian tensors with \({\varvec{G}}^{(k)}\in ({\mathbb {R}}^N)^{\otimes k}\), and define \(\varvec{A}^{(k)}\) as in (A.2). Fix a sequence of Lipschitz functions \(f_{t,s}:{\mathbb {R}}^{2(t+1)}\rightarrow {\mathbb {R}}\). Let \({\varvec{z}}^{2,0},\ldots ,{\varvec{z}}^{D,0}\in {\mathbb {R}}^N\) be deterministic vectors and \({\varvec{w}}^0 =\sum _{2\le k\le D} {\varvec{z}}^{k,0}\). Assume that for each \(s\in {\mathscr {S}}\), the empirical distribution of the vectors
converges in \({\mathbb {W}}_2({\mathbb {R}}^{D-1})\) distance to the law of the vector \((U^{k,0}_s)_{2\le k\le D}\).
Let \({\varvec{w}}^{t}, {\varvec{z}}^{k,t}\), \(t\ge 1\) be given by the tensor AMP iteration. Then, for all \(s\in {\mathscr {S}}\) and \(T\ge 1\) and for any pseudo-Lipschitz functions \(\psi :{\mathbb {R}}^{D \times T}\rightarrow {\mathbb {R}}\) and \(\widetilde{\psi }:{\mathbb {R}}^T\rightarrow {\mathbb {R}}\), we have
Note that (A.13) (which concerns the actual AMP iterates \({\varvec{w}}^t\)) is a special case of (A.12) (which is more convenient to prove). Indeed one can take \(\psi \left((z^{k,t})_{k\le D,t\le T}\right)=\widetilde{\psi }\left(\big (\sum _{k\le D}z^{k,t}\big )_{t\le T}\right)\). In the special case that \(c_k =0\) for all \(k\ge D+1\), Proposition 2.2 follows immediately from Theorem 2 by baking the contribution of \(\varvec{h}\) explicitly into \(f_t\) (since we require \(k\ge 2\) above). Proposition 2.2 for non-polynomial \(\xi \) follows by a standard approximation argument outlined at the end of Sect. 1. For the remainder of this appendix we thus focus on establishing (A.12).
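Although Theorem 2 concerns the multi-species tensor iteration, the state-evolution phenomenon it formalizes is easy to observe numerically. The following sketch uses a single-species \(k=2\) toy case: a Wigner-type matrix, a \(\tanh \) nonlinearity, and the standard Onsager correction (all illustrative choices, not the paper's algorithm), and compares the empirical second moment of the iterates against the scalar state-evolution recursion \(\tau _{t+1}={\mathbb {E}}[f(\sqrt{\tau _t}Z)^2]\).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6000

# Wigner-type matrix: symmetric, off-diagonal variance 1/N (a k=2 stand-in for A^(k))
G = rng.standard_normal((N, N))
A = (G + G.T) / np.sqrt(2 * N)

f = np.tanh                           # Lipschitz nonlinearity (illustrative choice)
df = lambda x: 1.0 - np.tanh(x) ** 2  # its derivative, used in the Onsager term

w_prev_f = np.ones(N)   # f applied to the initialization; ||.||_N norm equal to 1
w = A @ w_prev_f        # first iterate; state evolution predicts variance tau_1 = 1
tau = 1.0

taus_emp, taus_pred = [], []
for t in range(3):
    onsager = df(w).mean() * w_prev_f        # Onsager correction
    w_prev_f = f(w)
    w = A @ w_prev_f - onsager               # AMP update for w^{t+1}
    Z = rng.standard_normal(200_000)
    tau = np.mean(f(np.sqrt(tau) * Z) ** 2)  # scalar state-evolution recursion
    taus_pred.append(tau)
    taus_emp.append(np.mean(w ** 2))

for e, p in zip(taus_emp, taus_pred):
    print(f"empirical {e:.3f}   state evolution {p:.3f}")
```

Deleting the Onsager term typically makes the two columns drift apart after a couple of iterations, which is one way to see that the correction term in the AMP update is not optional.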
A.1: Further Definitions
We define the notations
Given an \(N\times (t+1)\) matrix such as \({\varvec{W}}_t\), and a tensor \(\varvec{A}^{(k)}\in ({\mathbb {R}}^{N})^{\otimes k}\), we write \(\varvec{A}^{(k)}\{{\varvec{W}}_t\}\) for the \(N\times (t+1)\) matrix with columns \(\varvec{A}^{(k)}\{{\varvec{w}}^0\}\), ..., \(\varvec{A}^{(k)}\{{\varvec{w}}^t\}\):
We will write \(\varvec{f}_t=f_t({\varvec{W}}_t,{\varvec{E}}_t)=f_t({\varvec{w}}^0,\ldots ,{\varvec{w}}^t,{\varvec{e}}^0,\ldots ,{\varvec{e}}^t)\) and also set
We also define an associated \((t+1)\times (t+1)\) Gram matrix \({\varvec{G}}_{\xi ^{k,s}}={\varvec{G}}_{\xi ^{k,s},t}\) via
The dependence of \({\varvec{G}}_{\xi ^{k,s},t}\) on t will often be suppressed (this dependence is relevant when inverting the matrix \({\varvec{G}}_{\xi ^{k,s},t}\) but not for defining individual entries). Finally, we let \({\mathcal {F}}_t\) denote the \(\sigma \)-algebra generated by all iterates up to time t:
Throughout the proof of state evolution we make the following simplifying assumptions:
Assumption 2
\(\xi \) is a degree D polynomial with all coefficients \(\gamma _{s_1,\ldots ,s_k}\) for \(2\le k\le D\) strictly positive.
Assumption 3
Each matrix \({\varvec{G}}_{\xi ^{k,s},t}\) is well-conditioned, i.e.
for all \(t\le T\). Here \({\varvec{G}}_{\xi ^{k,s},t}\) is defined based on iterates that will appear in Theorem 3 and Lemma A.6. The same holds for \({\mathcal {L}}_{k,t}\) as defined in (A.34).
By a standard argument, to establish Proposition 2.2 it suffices to do so under the above assumptions: one can always slightly perturb both \(\xi \) and the non-linearities \(f_{t,s}\) to ensure the assumptions hold, and suitable continuity properties then transfer all asymptotic guarantees. We refer the reader to [4, Appendices A.8 and A.9] for the arguments in the single-species case, still in the generality of mixed tensors. (In the more common setting \(D=2\) of just a random matrix this step is also standard for state evolution proofs, see e.g. [18, Sect. 4.2.1].) The corresponding extension in our setting is completely analogous and omitted.
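Assumption 3 is a non-degeneracy condition: the normalized Gram matrix of the nonlinearity outputs must have singular values bounded away from 0 and \(\infty \). A small numerical illustration of what it rules out (generic versus nearly collinear columns; the sizes below are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2000, 4

# Columns stand in for the nonlinearity outputs f_0, ..., f_{T-1} on one species;
# genuinely different iterates give a well-conditioned normalized Gram matrix.
F = rng.standard_normal((N, T))
gram = F.T @ F / N              # entries play the role of overlaps R(f_{t1}, f_{t2})
c_good = np.linalg.cond(gram)
print(c_good)                   # close to 1 here

# Nearly collinear iterates violate the assumption: the Gram matrix degenerates.
F_bad = np.outer(rng.standard_normal(N), np.ones(T)) + 1e-3 * rng.standard_normal((N, T))
c_bad = np.linalg.cond(F_bad.T @ F_bad / N)
print(c_bad)                    # enormous condition number
```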
A.2: Preliminary Lemmas
The next lemma has several parts. All are elementary Gaussian calculations so their proofs are omitted.
Lemma A.1
For any deterministic \({\varvec{u}},{\varvec{v}}\in {\mathbb {R}}^N\) and \(\varvec{A}^{(k)}\) defined by (A.2) we have:
1.
Letting \(g_0\sim \textsf{N}(0,1)\) be independent of \({\varvec{g}}\sim \textsf{N}(0,{\varvec{I}}_N)\), we have
$$\begin{aligned} \varvec{A}^{(k)}\{{\varvec{u}}\}{\mathop {=}\limits ^{\textrm{d}}}\sum _{s\in {\mathscr {S}}} {\varvec{g}}_s \sqrt{\xi ^{k,s}(\vec R({\varvec{u}},{\varvec{u}}))} + \frac{g_0}{\sqrt{N}} \sum _{s\in {\mathscr {S}}} {\varvec{u}}_s \sqrt{ \sum _{s'\in {\mathscr {S}}} \partial _{x_{s'}} \xi ^{k,s} \Big ( \vec R({\varvec{u}},{\varvec{u}})\Big )}. \end{aligned}$$(A.18)
2.
Let \(g_0,g_1,\ldots ,g_r\sim \textsf{N}(0,1)\) be independent. We have (jointly across \(s\in {\mathscr {S}}\))
$$\begin{aligned} \sqrt{\lambda _s N}\, R_s({\varvec{v}},\varvec{A}^{(k)}\{{\varvec{u}}\}) &{\mathop {=}\limits ^{\textrm{d}}} \sqrt{\xi ^{k,s}(\vec R({\varvec{u}},{\varvec{u}}))\cdot \vec R({\varvec{v}},{\varvec{v}})}\, g_s \\ &\quad + \sqrt{\sum _{s'\in {\mathscr {S}}} \partial _{x_{s'}} \xi ^{k,s} \Big (\vec R({\varvec{u}},{\varvec{u}})\Big )}\, R_s({\varvec{u}},{\varvec{v}})\, g_0. \end{aligned}$$(A.19)
3.
For \(s\in {\mathscr {S}}\):
$$\begin{aligned} R\left(\varvec{A}^{(k)}\{{\varvec{u}}\},\varvec{A}^{(k)}\{{\varvec{v}}\}\right)_s\simeq \xi ^{k,s}\left(\vec R( {\varvec{u}},{\varvec{v}})\right). \end{aligned}$$
4.
For a deterministic symmetric tensor \(\varvec{T}\in ({\mathbb {R}}^N)^{\otimes k-1}\), the vector \(\varvec{A}^{(k)}\{\varvec{T}\}\) is centered Gaussian. Its covariance is given by
$$\begin{aligned} \mathop {\mathrm {{\mathbb {E}}}}\limits \big [\varvec{A}^{(k)}\{\varvec{T}\}_i\varvec{A}^{(k)}\{\varvec{T}\}_j\big ]&= \langle \xi ^{k,s(i)}\diamond \varvec{T},\, \varvec{T}\rangle _N \cdot 1\{i=j\} \\&\quad +\frac{k(k-1)}{N^{k-1}}\;\sum _{i_1,\ldots ,i_{k-2}=1}^N \gamma _{i,i_1,\ldots ,i_{k-2}} \gamma _{j,i_1,\ldots ,i_{k-2}} T_{i,i_1,\ldots ,i_{k-2}} T_{j,i_1,\ldots ,i_{k-2}}. \end{aligned}$$
5.
Let \({\varvec{P}}\in {\mathbb {R}}^{N\times N}\) be the orthogonal projection onto a (deterministic) subspace \(S\subseteq {\mathbb {R}}^N\) with \(d=\dim (S)=O(1)\). Then
$$\begin{aligned} \Vert {\varvec{P}}{\varvec{G}}^{(k)}\{{\varvec{u}}\} - {\varvec{G}}^{(k)}\{{\varvec{u}}\}\Vert _2 /\Vert {\varvec{G}}^{(k)}\{{\varvec{u}}\}\Vert _2\simeq 0. \end{aligned}$$
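Point 3 of the lemma can be spot-checked by Monte Carlo in the simplest single-species case \(k=2\). In the normalization below (a Wigner-type stand-in for \(\varvec{A}^{(k)}\), not the paper's multi-species tensor), the limiting overlap of \(\varvec{A}\{{\varvec{u}}\}\) and \(\varvec{A}\{{\varvec{v}}\}\) is simply \(R({\varvec{u}},{\varvec{v}})\):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4000
G = rng.standard_normal((N, N))
A = (G + G.T) / np.sqrt(2 * N)   # symmetric, E[A_ij^2] ~ 1/N

R = lambda a, b: a @ b / N       # normalized overlap R(a, b)

u = rng.standard_normal(N)
v = 0.5 * u + np.sqrt(0.75) * rng.standard_normal(N)   # overlap R(u, v) near 0.5

lhs = R(A @ u, A @ v)   # overlap of the transformed vectors
rhs = R(u, v)           # predicted limit in this normalization
print(lhs, rhs)
```

The agreement improves like \(N^{-1/2}\); the multi-species statement replaces the identity map by \(\xi ^{k,s}\).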
We next develop a formula for the conditional expectation of a Gaussian tensor \(\varvec{A}^{(k)}\) given a collection of linear observations. We set \({\varvec{D}}\) to be the \(t\times t\times t\) tensor with entries \(D_{ijk}=1\) if \(i=j=k\) and \(D_{ijk}=0\) otherwise.
Lemma A.2
Recalling (A.17), let \(\widehat{\varvec{A}}^{(k)}=\mathop {\mathrm {{\mathbb {E}}}}\limits [\varvec{A}^{(k)}|{\mathcal {F}}_t]\). Equivalently, \(\mathop {\mathrm {{\mathbb {E}}}}\limits [\varvec{A}^{(k)}|{\mathcal {F}}_t]\) is the conditional expectation of \(\varvec{A}^{(k)}\) given the linear-in-\(\varvec{A}^{(k)}\) observations
Then we have for \(i_1,i_2,\ldots ,i_k\le N\),
Here, the matrix \(\widehat{\varvec{Z}}_{k,t}\in {\mathbb {R}}^{N\times t}\) is defined as the solution of a system of linear equations as follows. Define the linear operator \({\mathcal {T}}_{k,t}:{\mathbb {R}}^{N\times t}\rightarrow {\mathbb {R}}^{N\times t}\) by letting, for \(i\le N\), \(0\le t_3\le t-1\):
Then \(\widehat{\varvec{Z}}_{k,t}\) is the unique solution of the following linear equation (with \(\varvec{Y}_{k,t}\) defined as per (A.14))
(Here, \(\widehat{\varvec{Z}}_{k,t} = [\hat{{\varvec{z}}}_{k,0},\ldots ,\hat{{\varvec{z}}}_{k,t-1}]\) and \(\varvec{Y}_{k,t} = [{{\varvec{y}}}_{k,1},\ldots ,{{\varvec{y}}}_{k,t}]\) have dimensions \(N \times t\).)
The above formulas for \(\mathop {\mathrm {{\mathbb {E}}}}\limits [\varvec{A}^{(k)}|{\mathcal {F}}_t]\) and \({\mathcal {T}}_{k,t}\) are rather complicated. In [4, Appendix A] the reader may find helpful tensor network diagrams for the single-species case. Unfortunately it is less clear how to draw a corresponding tensor network with multiple species.
Proof of Lemma A.2
Let \({\mathcal {V}}_{k,t}\) be the affine space of symmetric tensors satisfying the constraint (A.20). The conditional expectation \(\mathop {\mathrm {{\mathbb {E}}}}\limits [\varvec{A}^{(k)}|{\mathcal {F}}_t]\) is the tensor with minimum weighted Frobenius norm \(\Vert \cdot \Vert _{F,\xi ^k}\) in the affine space \({\mathcal {V}}_{k,t}\), given by
Here \((\Gamma ^{(k)})^{-1}\) is the entry-wise inverse of \(\Gamma ^{(k)}\), which exists by Assumption 2.
By Lagrange multipliers, there exist vectors \(\varvec{m}^1,\ldots ,\varvec{m}^t\in {\mathbb {R}}^N\) such that \(\mathop {\mathrm {{\mathbb {E}}}}\limits [\varvec{A}^{(k)}|{\mathcal {F}}_t]=\widehat{\varvec{A}}^{(k)}\) equals
Also by Lagrange multipliers, if a tensor \(\widehat{\varvec{A}}^{(k)}\) is of this form (for some choice of vectors \(\varvec{m}^1,\ldots ,\varvec{m}^t\)) and satisfies the constraints \(\widehat{\varvec{A}}^{(k)}\{\varvec{f}_{t'}\}={\varvec{y}}_{k,t'+1}\) for \(t'< t\), then this tensor is unique and equals \(\mathop {\mathrm {{\mathbb {E}}}}\limits [\varvec{A}^{(k)}|{\mathcal {F}}_t]\). Without loss of generality, we write
By direct calculation we obtain that for each \(i\in [N]\),
We next stack these vectors as columns of an \(N\times t\) matrix. The first term yields \(\widehat{\varvec{Z}}_{k,t}\). Moreover the second term coincides with \({\mathcal {T}}_{k,t}(\widehat{\varvec{Z}}_{k,t})\) by rearranging the order of sums in (A.27). Hence
This in turn implies that the equation determining \(\widehat{\varvec{Z}}_{k,t}\) takes the form (A.23). \(\square \)
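The structure of Lemma A.2 is easiest to see in a stripped-down case: \(k=2\), a single observation, i.i.d. unweighted Gaussian entries, and no symmetrization. There the minimum-Frobenius-norm matrix satisfying \(\varvec{T}\{\varvec{f}\}={\varvec{y}}\) is an explicit rank-one regression formula, and Frobenius-orthogonality to the constraint's null space is exactly the characterization used in the proof; the \(\Gamma ^{(k)}\)-weighted norm and the symmetry constraint are what the full lemma adds.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 300
f = rng.standard_normal(N)
A = rng.standard_normal((N, N)) / np.sqrt(N)   # i.i.d. entries; symmetry dropped in this sketch
y = A @ f                                      # the linear observation A{f} = y

# minimum-Frobenius-norm matrix satisfying T f = y: an explicit rank-one formula
A_hat = np.outer(y, f) / (f @ f)

# A_hat reproduces the observation ...
resid = np.linalg.norm(A_hat @ f - y)

# ... and is Frobenius-orthogonal to every matrix annihilating f, which is the
# variational characterization of the conditional expectation E[A | Af = y]
T = rng.standard_normal((N, N))
T0 = T - np.outer(T @ f, f) / (f @ f)          # now T0 f = 0
inner = float(np.sum(A_hat * T0))
print(resid, inner)
```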
A.3: Long AMP
As an intermediate step towards proving Theorem 2, we introduce a new iteration that we call Long AMP (LAMP), following [8]. This iteration is less compact but simpler to analyze. For each \(k\le D\), let \({\mathcal {S}}_{k,t}\subseteq ({\mathbb {R}}^N)^{\otimes k}\) be the linear subspace of tensors \(\varvec{T}\) that are symmetric and such that \(\varvec{T}\{\varvec{f}_{t_1}\}=0\) for all \(t_1<t\). We denote by \({\mathcal {P}}_t^{\perp }(\varvec{A}^{(k)})\) the projection of \(\varvec{A}^{(k)}\) onto \({\mathcal {S}}_{k,t}\), in the inner product space (A.24) corresponding to \(\Gamma ^{(k)}\). We then define the LAMP mapping
Here we use similar notations \(\varvec{f}_t = f_t({\varvec{V}}_t;{\varvec{E}}_t)\) and \({\varvec{G}}_{\xi ^{k,s},t}\) as before (recall (A.16)), and take the vectors \({\varvec{e}}^t\) as before. However the quantities \(\varvec{f}_t,{\varvec{G}}_{\xi ^{k,s},t}\) are now different: they are computed using the vectors \({\varvec{v}}^0,\ldots ,{\varvec{v}}^t\) using the recursion:
Following [4, 8], we first establish state evolution for LAMP (under the non-degeneracy Assumption 2), and then deduce the result for the original AMP. In analyzing LAMP we use notations analogous to the ones introduced for AMP. In particular:
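In the same stripped-down setting (non-symmetric \(k=2\), unweighted Frobenius norm), the projection \({\mathcal {P}}_t^{\perp }\) amounts to multiplying on the right by the projector orthogonal to the span of past nonlinearities; the projected object then annihilates every \(\varvec{f}_{t_1}\) with \(t_1<t\), which is the independence property exploited in part (a) of Theorem 3 below.

```python
import numpy as np

rng = np.random.default_rng(4)
N, t = 400, 3
F = rng.standard_normal((N, t))               # columns stand in for f_0, ..., f_{t-1}
A = rng.standard_normal((N, N)) / np.sqrt(N)  # non-symmetric sketch of A^(k)

# orthogonal projector onto span(f_0, ..., f_{t-1})
P = F @ np.linalg.solve(F.T @ F, F.T)
A_perp = A @ (np.eye(N) - P)                  # sketch of P_t^perp(A)

# the projected object annihilates every past nonlinearity, hence carries no
# information about the observations A{f_{t1}}
err = np.abs(A_perp @ F).max()
print(err)  # numerically zero
```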
A.4: State Evolution for Long AMP
Theorem 3
Under the assumptions of Theorem 2, let \({\varvec{q}}^{2,0},\ldots ,{\varvec{q}}^{D,0}\in {\mathbb {R}}^N\) be deterministic vectors and \({\varvec{v}}^0 =\sum _{2\le k\le D} {\varvec{q}}^{k,0}\). Assume that the uniform empirical distribution of the N vectors \(\{(q_i^{2,0},\ldots , q_i^{D,0})\}_{i\le N}\) converges in \({\mathbb {W}}_2\) distance to the law of the vector \((U^{k,0})_{2\le k\le D}\).
Further we assume there is a constant \(C<\infty \) such that for all \(t\le T\):
(i)
The matrices \({\varvec{G}}_{\xi ^{k,s},t}= {\varvec{G}}_{\xi ^{k,s},t}({\varvec{V}})\) are uniformly well-conditioned as guaranteed by Assumption 3.
(ii)
Let the linear operator \({\mathcal {T}}_{k,t}:{\mathbb {R}}^{N\times t}\rightarrow {\mathbb {R}}^{N\times t}\) be defined as per (A.22), with \({\varvec{G}}_{\xi ^{k,s},t} = {\varvec{G}}_{\xi ^{k,s},t}({\varvec{V}},{\varvec{E}})\), and \(\varvec{f}_t=f_t({\varvec{V}},{\varvec{E}})\), and define
$$\begin{aligned} {\mathcal {L}}_{k,t} = {\varvec{1}}+{\mathcal {T}}_{k,t}. \end{aligned}$$(A.34)
Then \(C^{-1}\le \sigma _{\min }({\mathcal {L}}_{k,t})\le \sigma _{\max }({\mathcal {L}}_{k,t})\le C\).
Then the following statements hold for any \(t\le T\) and sufficiently large N:
(a)
Correct conditional law:
$$\begin{aligned} {\varvec{q}}^{k,t+1}\big |_{{\mathcal {F}}_t}{\mathop {=}\limits ^{\textrm{d}}}\mathop {\mathrm {{\mathbb {E}}}}\limits [{\varvec{q}}^{k,t+1}|{\mathcal {F}}_t] + {\mathcal {P}}_t^{\perp }(\widetilde{\varvec{A}}^{(k)}) \{\varvec{f}_t\}, \end{aligned}$$(A.35)
where \(\widetilde{\varvec{A}}^{(k)}\) is a symmetric tensor distributed identically to \(\varvec{A}^{(k)}\) and independent of everything else, and \({\mathcal {P}}_{t}^{\perp }\) is the projection onto the subspace \({\mathcal {S}}_{k,t}\) defined in Sect. A.3. Further
$$\begin{aligned} \mathop {\mathrm {{\mathbb {E}}}}\limits [{\varvec{q}}^{k,t+1}|{\mathcal {F}}_t] = \sum _{s\in {\mathscr {S}}} \sum _{0\le t_1\le t} h_{t,t_1-1,k,s} {\varvec{q}}^{k,t_1}_s. \end{aligned}$$(A.36)
Moreover, the vectors \(({\varvec{q}}^{k,t+1})_{2\le k\le D}\) are conditionally independent given \({\mathcal {F}}_t\).
(b)
Approximate isometry: we have
$$\begin{aligned} R_s({\varvec{q}}^{k,t_1+1},{\varvec{q}}^{k,t_2+1})&\simeq \xi ^{k,s}\left(\vec R( \varvec{f}_{t_1},\varvec{f}_{t_2})\right), \end{aligned}$$(A.37)
$$\begin{aligned} R_s({\varvec{v}}^{t_1+1},{\varvec{v}}^{t_2+1})&\simeq \xi ^{s}\big (\vec R( \varvec{f}_{t_1},\varvec{f}_{t_2})\big ). \end{aligned}$$(A.38)
Moreover, both sides converge in probability to constants as \(N\rightarrow \infty \), and for \(k_1\ne k_2\) and any \((t_1,t_2)\) and \(s\in {\mathscr {S}}\),
$$\begin{aligned} R_s({\varvec{q}}^{k_1,t_1},{\varvec{q}}^{k_2,t_2})\simeq 0. \end{aligned}$$(A.39)
(c)
State evolution: for each \(s\in {\mathscr {S}}\) and any pseudo-Lipschitz function \(\psi :{\mathbb {R}}^{D \times 2(t+1)}\rightarrow {\mathbb {R}}\), we have
$$\begin{aligned} \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \frac{1}{N_s} \sum _{i\in {\mathcal {I}}_s} \psi \big ((q_i^{k,t'})_{ k\le D,t'\le t}; (e^{t'}_i)_{t'\le t}\big ) = \mathop {\mathrm {{\mathbb {E}}}}\limits \big \{\psi \big ((U^{k,t'}_s)_{2\le k\le D,t'\le t}; (E^{t'}_s)_{t'\le t} \big )\big \}, \end{aligned}$$(A.40)
where \((U^{k,t}_s)_{k\le D,1\le t\le T}\) is the centered Gaussian process defined in the statement of Theorem 2.
In the next subsection, we will prove these statements by induction on t. The crucial point we exploit is the representation (a). We emphasize that the iteration number t is bounded as \(N\rightarrow \infty \); therefore all numerical quantities not depending on N (but possibly on t) will be treated as constants.
A.5: Proof of Theorem 3
The proof will be by induction over t. The base case is clear (see, e.g., Proposition A.4), and we focus on the inductive step. We assume the statements above for \(t-1\) and prove them for t.
A.5.1: Proof of (a)
Note that \({\mathcal {P}}_t^{\perp }(\varvec{A}^{(k)})\) is by construction independent of \({\mathcal {F}}_t\), and therefore we can replace \(\varvec{A}^{(k)}\) by a fresh independent tensor in (A.29), whence (A.35) follows. The equality (A.36) holds by definition of the iteration.
A.5.2: Proof of (b): Approximate isometry
We will repeatedly apply Lemma A.1. We start with (A.37). As we are inducting on t, we may limit ourselves to considering overlaps \(\vec R( {\varvec{q}}^{k,t+1},{\varvec{q}}^{k,t_1+1})\), for \(t_1\le t\).
Define the tensor \(\Gamma ^{(k),\nabla }\in ({\mathbb {R}}^{{\mathscr {S}}}_{\ge 0})^{\otimes (k-1)}\) by
We choose
such that
is the orthogonal projection of \(\Gamma ^{(k),\nabla }\diamond \varvec{f}_{t}^{\otimes k-1}\) onto
and also set
We will use (and soon after, prove) the following lemma.
Lemma A.3
For all \(t_1\le t\), we have
For \(t_1\le t-1\), Lemma A.1 (point 2) implies
We next use the formula in (a) for \(\mathop {\mathrm {{\mathbb {E}}}}\limits [{\varvec{q}}^{k,t+1}|{\mathcal {F}}_t]\) together with the expression in (A.29). For each \(s\in {\mathscr {S}}\):
Here (A.43) comes from the induction hypothesis (A.37) (and the symmetry of the matrix \({\varvec{G}}_{\xi ^{k,s},t-1}\) is used to obtain the next line). We next prove that (A.37) holds for \(t_1=t\). We have by definition of the projections that
where the right-hand side is defined according to (A.3). Using (A.42) from Lemma A.3 as well as point 4 of Lemma A.1, we have
Next, using (A.42) and Lemma A.1 (point 2), we obtain that for all \(s\in {\mathscr {S}}\)
Moreover we recall that by the expression for \(\mathop {\mathrm {{\mathbb {E}}}}\limits [{\varvec{q}}^{k,t+1}|{\mathcal {F}}_t]\) from part (a),
The formula for linear regression implies
By part (b) of the inductive step, for \(1\le t_1,t_2\le t-1\) we have
In particular the formulas (A.30) and (A.47) have asymptotically the same coefficients, and the overlap structure between the summands is identical. It follows that
Using together Eqs. (A.44), (A.45), and (A.50), we get
This establishes (A.37).
Next consider (A.39), i.e., approximate orthogonality of \({\varvec{q}}^{k,r}\) and \({\varvec{q}}^{k',r}\) for \(k\ne k'\). This follows easily from the representation in point (a), which, together with Lemma A.1, inductively implies that the iterates \({\varvec{q}}^{k,\cdot }\) for different k are approximately orthogonal. Finally, (A.38) follows directly from (A.37) and (A.39). We now prove Lemma A.3.
Proof of Lemma A.3
For convenience we write \(\widetilde{\varvec{A}}=\widetilde{\varvec{A}}^{(k)}\). By Lagrange multipliers, there exist vectors \(({\varvec{\theta }}_{t_1})_{t_1 \le t-1}\) in \({\mathbb {R}}^N\) such that \({\mathcal {P}}_t^{\perp } (\widetilde{\varvec{A}}) = \widetilde{\varvec{A}}- {\varvec{Q}}\), where
The vectors \(({\varvec{\theta }}_{t_1})_{t_1 \le t-1}\) are determined by the equations \({\varvec{Q}}\{\varvec{f}_{t_1}\}=\widetilde{\varvec{A}}\{\varvec{f}_{t_1}\}\) for all \(t_1\le t-1\). This expands (for each \(t_1\le t-1\)) to
Recall that we assume each \({\varvec{G}}_{\xi ^{k,s},t-1}\) is well-conditioned with high probability. Thus we can multiply the system of t equations above by \({\varvec{G}}_{\xi ^{k,s},t-1}^{-1}\) in the coordinates \({\mathcal {I}}_s\) for each \(s\in {\mathscr {S}}\). For each \(t_3\le t-1\), we obtain:
Relabeling \(t_3\) as \(t_1\), we find
We claim that \(\Vert {\varvec{\theta }}^{\parallel }_{t_1}\Vert _N\simeq 0\), i.e., \({\varvec{\theta }}_{t_1}\simeq {\varvec{\theta }}^0_{t_1}\). Indeed, let \({\varvec{\Theta }}\in {\mathbb {R}}^{N\times t}\) be the matrix with columns \(({\varvec{\theta }}_{t_2})_{t_2<t}\), and \({\varvec{\Theta }}^0\) the matrix with columns \(({\varvec{\theta }}^0_{t_2})_{t_2<t}\). Then (A.51) can be written as
Here we recall \({\mathcal {L}}_{k,t}={\varvec{1}}+{\mathcal {T}}_{k,t}\) and \({\mathcal {T}}_{k,t}\in {\mathbb {R}}^{Nt\times Nt}\) is defined in (A.22). Substituting the decomposition \({\varvec{\Theta }}= {\varvec{\Theta }}^0+{\varvec{\Theta }}^{\parallel }\) in the above, we obtain
Recall that \({\mathcal {L}}_{k,t}\) is well-conditioned by Assumption 3. Therefore it remains to prove
Let \({\varvec{c}}_0,\ldots ,{\varvec{c}}_{t-1} \in {\mathbb {R}}^N\) be the columns of \({\mathcal {T}}^{\textsf{T}}_{k,t}({\varvec{\Theta }}^0)\). We first note that for all \(t_1\le t-1\) and \(s\in {\mathscr {S}}\),
Moreover the Gram matrix
is well-conditioned for each \(s\in {\mathscr {S}}\). Therefore it is sufficient to check that \(R_s(\varvec{f}_{t_1},{\varvec{c}}_{t_4})\simeq 0\) for each \(t_1,t_4<t\) and \(s\in {\mathscr {S}}\). Plugging in the definition (A.22), it remains to check that for \(0\le t_1,t_4\le t-1\),
Finally, this last claim follows by substituting the definition (A.52) of \({\varvec{\theta }}^0_{t_2}\), and using the fact that
which follows from Lemma A.1. Thus (A.53) is established.
We are now ready to prove (A.42). First note that
decomposes into two types of terms based on the definition of \({\varvec{Q}}\) above. Recalling (A.41), the first involves
for \(t_1\le t-1\), which vanishes by the definition of \((\varvec{f}_t^{\otimes k-1})_{\perp }\). The other terms take the form
In particular, this means that to prove that (A.54) vanishes, it suffices to show
for all \(t_2\le t\).
Note that by construction,
By the well-conditioning assumption, the \(b_{t_1}\) are bounded. Therefore it suffices to show that
Finally note that each term in the left-hand side includes an overlap \(R_s({\varvec{\theta }}_{t_1},\varvec{f}_{t_2})\). However these all vanish:
This is because we can substitute \({\varvec{\theta }}_{t_1}\) with \({\varvec{\theta }}_{t_1}^0\) as defined in (A.52) and use the fact that \(\vec R(\widetilde{\varvec{A}}\{\varvec{f}_{t_3}\},\varvec{f}_t)\simeq 0\) which follows from Lemma A.1. This completes the proof. \(\square \)
A.5.3: Proof of (c)
The base case of initialization is handled by the following basic fact.
Proposition A.4
Let \(\mu \in {\mathcal {P}}({\mathbb {R}}^k)\) be a probability distribution with finite second moment. Then if \(E_1,\ldots ,E_N{\mathop {\sim }\limits ^{i.i.d.}}\mu \) and \(\hat{\mu }_N=\frac{1}{N}\sum _{i=1}^N \delta _{E_i}\), one has
Proof
It suffices to show that \(\hat{\mu }_N\rightarrow \mu \) weakly in probability, and that the second moment of \(\hat{\mu }_N\) converges in probability to that of \(\mu \). The first claim is clear, and the second holds by the law of large numbers. \(\square \)
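Proposition A.4 is also easy to observe numerically. In one dimension the \({\mathbb {W}}_2\) distance between two equal-size empirical measures is computed exactly by the sorted (quantile) coupling, so one can watch the distance shrink as the sample size grows. This is a standalone illustration, not part of the proof:

```python
import numpy as np

rng = np.random.default_rng(5)

def w2_empirical(x, y):
    # 1-D optimal transport: the monotone (sorted) coupling is optimal
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

dists = []
for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)   # empirical measure mu_hat_n
    y = rng.standard_normal(n)   # fresh sample standing in for mu
    dists.append(w2_empirical(x, y))
print(dists)  # decreasing toward 0 as n grows
```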
Continuing to the inductive step, recall that the process \((U^{k,t}_s)_{t\ge 1}\) is Gaussian by construction, and independent of \(U^{k,0}_s\). Define
We then have
Here in writing \(({\varvec{C}}^{-1}_{\le t,s})_{t_1,t_2}\), we view \({\varvec{C}}_{\le t,s}\) as a \((t+1)\times (t+1)\) matrix for each \(s\in {\mathscr {S}}\).
On the other hand, from point (a), we know that
Moreover the induction hypothesis of (A.40) implies that for \(t_1,t_2 \le t\),
(Recall that by definition \(W^t_s \equiv \sum _{k\le D} U^{k,t}_s\), while \(\varvec{f}_t=f_t({\varvec{V}}_t;{\varvec{E}}_t)\) here.)
Therefore, from the definition of the process \((U^{k,t}_s)_{t\ge 0}\),
Recalling that \({\varvec{G}}_{\xi ^{k,s},t}\) is well-conditioned, we find (recall (A.55), (A.56)):
Therefore we also have
Moreover, Lemma A.1 (point 4) shows that \({\mathcal {P}}^{\perp }_t(\widetilde{\varvec{A}}^{(k)})\{\varvec{f}_t\}\simeq \widetilde{\varvec{A}}^{(k)}\{(\varvec{f}_{t}^{\otimes k-1})_{\perp }\}\) has entries which are approximately independent Gaussian with variance
on coordinates \(i\in {\mathcal {I}}_s\), even conditionally on \({\mathcal {F}}_t\). Therefore
where \(\Vert {\varvec{err}}\Vert _N\simeq 0\) and \({\varvec{g}}\sim \textsf{N}(\varvec{0},{\varvec{I}}_N)\) is independent of everything else. It now remains to verify that this agrees with the desired covariance. As proved in the previous point, for any \(t_1\le t\),
In particular this establishes convergence of the second moment, so in order to prove (A.40) it is sufficient to establish weak convergence. Hence we may assume \(\psi :{\mathbb {R}}^{D \times (t+1)}\rightarrow {\mathbb {R}}\) is Lipschitz (rather than just pseudo-Lipschitz).
Using the representation (A.58), and focusing for simplicity on a single k, we get
The second equality above follows by Gaussian concentration since \(\psi \) is assumed Lipschitz. Applying the induction hypothesis now implies (A.40), except that \({\varvec{e}}^{t+1}\) is not present. However since \({\varvec{e}}^{t+1}_i\) and \(E^{t+1}_s\) have the same law and are both independent of the past, \({\mathbb {W}}_2\) convergence immediately transfers by Proposition A.5 below. This completes the proof of part (c).
Proposition A.5
Let \(\nu _n=\frac{1}{n}\sum _{i=1}^n \delta _{\widehat{X}_i}\) for \(n\ge 1\) be a sequence of probability measures on \({\mathbb {R}}^k\) converging to \(\nu \in {\mathcal {P}}({\mathbb {R}}^k)\) in \({\mathbb {W}}_2\). Let \(\mu \in {\mathcal {P}}({\mathbb {R}}^k)\) be a probability distribution with finite second moment. Let
and set
Then
Proof
Using Proposition A.4 applied to \(\nu \), for any \({\varepsilon }>0\) we can find a coupling \(\Pi =\big ((\widehat{X}_i,X_i)\big )_{i\in [N]}\) of \(\nu _n\) with an i.i.d. empirical sample \(\hat{\nu }_n\), with transport cost at most \({\varepsilon }\). Generate independent variables \(E_1,\ldots ,E_N{\mathop {\sim }\limits ^{i.i.d.}}\mu \). Then note that
Here in the latter step we used the assumption on the coupling \(\Pi \) for the first term and Proposition A.4 applied to \(\nu \otimes \mu \) on the second term. This completes the proof. \(\square \)
A.6: Asymptotic Equivalence of AMP and Long AMP
Here we show that AMP and LAMP produce approximately the same iterates.
Lemma A.6
Let \(\{{\varvec{G}}^{(k)}\}_{k\le D}\) be standard Gaussian tensors, and \(\varvec{A}^{(k)} = \Gamma ^{(k)}\diamond {\varvec{G}}^{(k)}\) for \(k\ge 2\). Consider the corresponding AMP iterates \({\varvec{Z}}_{t}\equiv ({\varvec{z}}^{k,t_1})_{k\le D,t_1\le t}\) and LAMP iterates \({\varvec{Q}}_{t}\equiv ({\varvec{q}}^{k,t_1})_{k\le D,t_1\le t}\), from the same initialization \({\varvec{Z}}_0={\varvec{Q}}_0\) satisfying the assumptions of Theorems 2 and 3.
Let \(\varvec{f}_t = f_t({\varvec{V}}_t;{\varvec{E}}_t)\), \(t\ge 0\) be the nonlinearities applied to LAMP iterates. Further assume that there exists a constant \(C<\infty \) such that, for all \(t\le T\),
-
(i)
The LAMP Gram matrices \({\varvec{G}}_{k,t} = {\varvec{G}}_{k,t}({\varvec{V}})\) are well-conditioned as guaranteed by Assumption 3, i.e.,
$$\begin{aligned} C^{-1}\le \sigma _{\min }({\varvec{G}}_{k,t})\le \sigma _{\max }({\varvec{G}}_{k,t})\le C, \quad \quad \forall k\le D,~t\le T. \end{aligned}$$ -
(ii)
Let the linear operator \({\mathcal {T}}_{k,t}:{\mathbb {R}}^{N\times t}\rightarrow {\mathbb {R}}^{N\times t}\) be defined as per (A.22), with \({\varvec{G}}_{k,t} = {\varvec{G}}_{k,t}({\varvec{V}})\), and \(\varvec{f}_t=f_t({\varvec{V}},{\varvec{E}}_t)\), and define \({\mathcal {L}}_{k,t} = {\textbf {1}}+{\mathcal {T}}_{k,t}\). Then
$$\begin{aligned} C^{-1}\le \sigma _{\min }({\mathcal {L}}_{k,t})\le \sigma _{\max }({\mathcal {L}}_{k,t})\le C. \end{aligned}$$
Then, for any \(t\le T\), we have
Proof
Throughout the proof we will suppress \({\varvec{E}}_t\) and simply write \(f_t({\varvec{W}}_t)\) or \(f_t({\varvec{V}}_t)\) to distinguish AMP and LAMP iterates, and analogously for \({\varvec{G}}_{k,t}({\varvec{W}}_t)\) or \({\varvec{G}}_{k,t}({\varvec{V}}_t)\). The proof is by induction over the iteration number, so we will assume it to hold at iteration t, and prove it for iteration \(t+1\). We prove the induction step by establishing the following two facts for each \(2\le k\le D\):
Let us first consider the claim (A.60), and note that
where we wrote \(d_{t,t_1,k,s}\) for the coefficients of (A.8), with AMP iterates replaced by LAMP iterates. We then have
Notice that, by the induction assumption (and recalling that each \(f_{t,s}\) is Lipschitz continuous and acts component-wise):
Further, for any tensor \(\varvec{T}\in ({\mathbb {R}}^{N})^{\otimes k}\), and any vectors \({\varvec{v}}_1,{\varvec{v}}_2\in {\mathbb {R}}^N\),
Using Lemma A.1, this implies that the following bound holds with high probability for a constant C:
The last step follows from (A.62) and Theorem 3, which implies (recall each \(f_{t,s}\) is Lipschitz) that \(\Vert f_t({\varvec{V}}_t)\Vert _N\le C\) with probability \(1-o(1)\). Notice that the same argument implies \(\Vert f_t({\varvec{W}}_t)\Vert _N \le C\) with high probability.
Similarly, \(D_{2,t}\simeq 0\) follows since \(\Vert f_{t_1-1}({\varvec{W}}_{t_1-1})- f_{t_1-1}({\varvec{V}}_{t_1-1})\Vert _N\simeq 0\) and \(|d_{t,t_1,k,s}|\le C_T\) by construction, thus yielding (A.60).
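The stability fact used in this step, that componentwise Lipschitz nonlinearities cannot amplify the normalized discrepancy between the AMP and LAMP iterates, admits a one-line numerical confirmation (with \(\tanh \) as a stand-in 1-Lipschitz \(f_{t,s}\)):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 1000
norm_N = lambda x: np.sqrt(np.mean(x ** 2))   # the ||.||_N norm

W = rng.standard_normal(N)                    # stand-in AMP iterate
V = W + 1e-3 * rng.standard_normal(N)         # stand-in LAMP iterate, ||W - V||_N small

f = np.tanh                                   # 1-Lipschitz, acts component-wise
gap_in = norm_N(W - V)
gap_out = norm_N(f(W) - f(V))
print(gap_out, gap_in)  # gap_out <= gap_in
```

The same bound, with the Lipschitz constant of \(f_{t,s}\) in place of 1, is what propagates \(\Vert f_{t}({\varvec{W}}_{t})- f_{t}({\varvec{V}}_{t})\Vert _N\simeq 0\) through the induction.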
We now prove (A.61). Comparing (A.8) and (A.29), with \({\mathcal {P}}_t^{\parallel } = {\varvec{1}}-{\mathcal {P}}_t^{\perp }\) we find
Note that \({\mathcal {P}}_t^{\parallel } (\varvec{A}^{(k)})=\mathop {\mathrm {{\mathbb {E}}}}\limits \left[\varvec{A}^{(k)}|{\mathcal {F}}_t\right]\), where \({\mathcal {F}}_t\) here is the analogous \(\sigma \)-algebra generated by \(\{{{\textbf {q}}}^{k,t_1},{\varvec{e}}^{t_1}\}_{t_1\le t, k\le D}\). Equivalently, this is the conditional expectation of \(\varvec{A}^{(k)}\) given the linear constraints
Also notice that, by the induction hypothesis, and the definition of \({\varvec{y}}_{k,t_1}\), (A.14), we have for all \(t_1\le t\),
Lemma A.2 implies that \({\mathcal {P}}_t^{\parallel } (\varvec{A}^{(k)})\) takes the form of (A.21) for a suitable matrix \(\widehat{\varvec{Z}}_{k,t}\in {\mathbb {R}}^{N\times t}\). The key claim is that
In order to establish this claim, we show that, under the inductive hypothesis,
Since \({\mathcal {L}}_{k,t}={\varvec{1}}+{\mathcal {T}}_{k,t}\) is well-conditioned by assumption, the combination of (A.23) and (A.68) implies \(\widehat{\varvec{Z}}_{k,t}\simeq {\varvec{Q}}_t\). By (A.66), in order to prove (A.68), it is sufficient to show that
In order to prove (A.69), we use Theorem 3. Recall that
(The value \(2\le k\le D\) is implicitly fixed in the definition of \({\varvec{C}}_{\le t}\).) By Theorem 3,
This implies for any \(0 \le t_1\le t-1\) and \(s\in {\mathscr {S}}\),
Indeed, Gaussian integration by parts yields the latter expression (it can be done conditionally on the variables E since they are independent). Combining (A.70) with the definition (A.8) will now allow us to conclude \({\mathcal {T}}_{k,t}{\varvec{Q}}_t\simeq {\textbf {ONS}}_{k,t}\) as desired. Indeed for each \(s\in {\mathscr {S}}\) we have
Having established (A.67), we now use the formula (A.21) for \({\mathcal {P}}^{\parallel }_t(\varvec{A}^{(k)})=\mathop {\mathrm {{\mathbb {E}}}}\limits \big [\varvec{A}^{(k)}|{\mathcal {F}}_t\big ]\). The result is:
On the other hand, using again (A.70) gives
We conclude from (A.64) that \(\Vert \textsf{AMP}_{t+1}({\varvec{Q}}_{t})_k -\textsf{LAMP}_{t+1}({\varvec{Q}}_{t})_k \Vert _N\simeq 0\). This completes the proof. \(\square \)
Cite this article
Huang, B., Sellke, M. Optimization Algorithms for Multi-species Spherical Spin Glasses. J Stat Phys 191, 29 (2024). https://doi.org/10.1007/s10955-024-03242-7