1 Introduction

This paper studies the efficient optimization of a family of random non-convex functions \(H_N\) defined on high-dimensional spaces, namely the Hamiltonians of multi-species spherical spin glasses. Mean-field spin glasses have been studied since [25] as models for disordered magnetic systems and are also closely linked to random combinatorial optimization problems [12, 19, 22]. In short, their Hamiltonians are certain polynomials in many variables with independent centered Gaussian coefficients.

The purpose of this work is to develop efficient algorithms to optimize \(H_N\). Our companion work [16] derives an algorithmic threshold \({\textsf {ALG}}\) and proves that no optimization algorithm with suitably Lipschitz dependence on \(H_N\) can achieve energy better than \({\textsf {ALG}}\) with more than exponentially small probability. The value \({\textsf {ALG}}\) is expressed as the maximum of a variational principle over certain increasing functions, which was shown to be attained by joining the solutions to a pair of well-posed differential equations. The first main contribution of this paper is to show that, given a solution to this variational problem, so-called approximate message passing (AMP) algorithms efficiently achieve the value \({\textsf {ALG}}\). We note that several previous works [4, 21, 24, 26] have given similar algorithms for single-species mean-field spin glasses, and our algorithm is in line with the latter three.

Furthermore, we use these AMP algorithms to aid a detailed study of the landscape of \(H_N\) by probing neighborhoods of special critical points. This is related to a second companion work [17] which identifies the phase boundary for topological trivialization of \(H_N\), where the number of critical points is a constant independent of N. Therein, Kac-Rice estimates are used to show that for r-species models (defined on a product of r spheres) in the “super-solvable” regime with strong external field, \(H_N\) has exactly \(2^r\) critical points with high probability. In this paper, we give a signed AMP algorithm which explicitly approximates each of these critical points. Moreover in the complementary “sub-solvable” regime, we use AMP to construct \(\exp (cN)\) separated approximate critical points with high probability. This implies the failure of strong topological trivialization as defined in [17], which is proved therein to hold for super-solvable models. Finally, the machinery of AMP allows us to compute the local behavior of \(H_N\) around these algorithmic outputs, giving even more precise information about the landscape.

1.1 Problem Description

Fix a finite set \({\mathscr {S}}= \{1,\ldots ,r\}\). For each positive integer N, fix a deterministic partition \(\{1,\ldots ,N\} = \sqcup _{s\in {\mathscr {S}}}\, {\mathcal {I}}_s\) with \(\lim _{N\rightarrow \infty } |{\mathcal {I}}_s| / N =\lambda _s\) where \({\vec \lambda }= (\lambda _1,\ldots ,\lambda _r) \in {\mathbb {R}}_{>0}^{\mathscr {S}}\). For \(s\in {\mathscr {S}}\) and \({\varvec{x}}\in {\mathbb {R}}^N\), let \({\varvec{x}}_s \in {\mathbb {R}}^{{\mathcal {I}}_s}\) denote the restriction of \({\varvec{x}}\) to coordinates \({\mathcal {I}}_s\). We consider the state space

$$\begin{aligned} {\mathcal {B}}_N = \left\{ {\varvec{x}}\in {\mathbb {R}}^N : {\left\Vert{\varvec{x}}_s\right\Vert}_2^2 \le \lambda _s N \quad \forall ~s\in {\mathscr {S}}\right\} . \end{aligned}$$
(1.1)

Fix \({\vec h}= (h_1,\ldots ,h_r) \in {\mathbb {R}}_{\ge 0}^{\mathscr {S}}\) and let \({\textbf {1}}= (1,\ldots ,1) \in {\mathbb {R}}^N\). For each \(k\ge 2\) fix a symmetric tensor \(\Gamma ^{(k)} = (\gamma _{s_1,\ldots ,s_k})_{s_1,\ldots ,s_k\in {\mathscr {S}}} \in ({\mathbb {R}}_{\ge 0}^{{\mathscr {S}}})^{\otimes k}\) with \(\sum _{k\ge 2} 2^k {\left\Vert\Gamma ^{(k)}\right\Vert}_\infty < \infty \), and let \({\textbf {G}}^{(k)} \in ({\mathbb {R}}^N)^{\otimes k}\) be a tensor with i.i.d. standard Gaussian entries.

For \(A\in ({\mathbb {R}}^{\mathscr {S}})^{\otimes k}\), \(B\in ({\mathbb {R}}^N)^{\otimes k}\), define \(A\diamond B \in ({\mathbb {R}}^N)^{\otimes k}\) to be the tensor with entries

$$\begin{aligned} (A\diamond B)_{i_1,\ldots ,i_k} = A_{s(i_1),\ldots ,s(i_k)} B_{i_1,\ldots ,i_k}, \end{aligned}$$
(1.2)

where s(i) denotes the \(s\in {\mathscr {S}}\) such that \(i\in {\mathcal {I}}_s\). Let \(\varvec{h}= {\vec h}\diamond {\textbf {1}}\). We consider the mean-field multi-species spin glass Hamiltonian

$$\begin{aligned} H_N({\varvec{\sigma }})&= \langle \varvec{h}, {\varvec{\sigma }}\rangle + \widetilde{H}_N({\varvec{\sigma }}), \quad \text {where} \end{aligned}$$
(1.3)
$$\begin{aligned} \widetilde{H}_N({\varvec{\sigma }})&= \sum _{k\ge 2} \frac{1}{N^{(k-1)/2}} \langle \Gamma ^{(k)} \diamond {\varvec{G}}^{(k)}, {\varvec{\sigma }}^{\otimes k} \rangle \nonumber \\&= \sum _{k\ge 2} \frac{1}{N^{(k-1)/2}} \sum _{i_1,\ldots ,i_k=1}^N \gamma _{s(i_1),\ldots ,s(i_k)} {\varvec{G}}^{(k)}_{i_1,\ldots ,i_k} \sigma _{i_1}\ldots \sigma _{i_k} \end{aligned}$$
(1.4)

with inputs \({\varvec{\sigma }}= (\sigma _1,\ldots ,\sigma _N) \in {\mathcal {B}}_N\). For example, the choice of parameters \(\Gamma ^{(2)} = \left({\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}}\right)\) and \(\Gamma ^{(k)}=0\) for \(k\ge 3\) yields the well-known bipartite spherical SK model [2]. For \({\varvec{\sigma }},{\varvec{\rho }}\in {\mathcal {B}}_N\), define the species-\(s\) overlap and overlap vector

$$\begin{aligned} R_s({\varvec{\sigma }}, {\varvec{\rho }}) = \frac{ \langle {\varvec{\sigma }}_s, {\varvec{\rho }}_s \rangle }{\lambda _s N}, \qquad \vec R({\varvec{\sigma }}, {\varvec{\rho }}) = \left(R_1({\varvec{\sigma }}, {\varvec{\rho }}), \ldots , R_r({\varvec{\sigma }}, {\varvec{\rho }})\right). \end{aligned}$$
(1.5)

Let \(\odot \) denote coordinate-wise product. For \(\vec {x}= (x_1,\ldots ,x_r) \in {\mathbb {R}}^{\mathscr {S}}\), let

$$\begin{aligned} \xi (\vec {x})&= \sum _{k\ge 2} \langle \Gamma ^{(k)}\odot \Gamma ^{(k)}, ({\vec \lambda }\odot \vec {x})^{\otimes k}\rangle \\&= \sum _{k\ge 2} \sum _{s_1,\ldots ,s_k\in {\mathscr {S}}} \gamma _{s_1,\ldots ,s_k}^2 (\lambda _{s_1} x_{s_1}) \ldots (\lambda _{s_k} x_{s_k}). \end{aligned}$$

The random function \(\widetilde{H}_N\) can also be described as the centered Gaussian process on \({\mathcal {B}}_N\) with covariance

$$\begin{aligned} {\mathbb {E}}\widetilde{H}_N({\varvec{\sigma }})\widetilde{H}_N({\varvec{\rho }}) = N\xi (\vec R({\varvec{\sigma }}, {\varvec{\rho }})). \end{aligned}$$

We will also often refer to the product of spheres

$$\begin{aligned} {\mathcal {S}}_N=\big \{{\varvec{u}}\in {\mathbb {R}}^N~:~\Vert {\varvec{u}}_s\Vert ^2=\lambda _s N~~\forall ~s\in {\mathscr {S}}\big \}. \end{aligned}$$
(1.6)

It will be useful to define, for \(s\in {\mathscr {S}}\),

$$\begin{aligned} \xi ^s(\vec {x}) = \lambda _s^{-1} \partial _{x_s} \xi (\vec {x}). \end{aligned}$$
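To make these definitions concrete, the following numerical sketch evaluates \(\xi \) by the explicit double sum and checks the relation \(\xi ^s = \lambda _s^{-1}\partial _{x_s}\xi \) by finite differences. The two-species model and all coefficient values here are invented for illustration and are not drawn from the paper.

```python
import itertools
import numpy as np

# Hypothetical 2-species model: illustrative coefficient tensors Gamma^(k).
lam = np.array([0.4, 0.6])                       # species proportions lambda_s
Gamma = {2: np.array([[0.3, 1.0], [1.0, 0.3]]),  # symmetric, entry-wise >= 0
         3: 0.2 * np.ones((2, 2, 2))}

def xi(x):
    """xi(x) = sum_k sum_{s_1..s_k} gamma^2 * prod_j (lambda_{s_j} x_{s_j})."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for k, G in Gamma.items():
        for idx in itertools.product(range(len(lam)), repeat=k):
            total += G[idx] ** 2 * np.prod([lam[s] * x[s] for s in idx])
    return total

def xi_s(x, s, eps=1e-6):
    """xi^s(x) = lambda_s^{-1} * d/dx_s xi(x), via central differences."""
    e = np.zeros(len(lam)); e[s] = eps
    return (xi(np.asarray(x) + e) - xi(np.asarray(x) - e)) / (2 * eps * lam[s])
```

For instance, `xi([0.7, 0.5])` sums \(\gamma ^2\)-weighted products over all index tuples of lengths 2 and 3, and `lam[s] * xi_s(x, s)` recovers the partial derivative \(\partial _{x_s}\xi (\vec x)\).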

1.2 The Value \({\textsf {ALG}}\)

Given \(({\vec \lambda },\xi )\), the ground state energy of the associated multi-species spherical spin glass is

$$\begin{aligned} \textsf{OPT}=\textsf{OPT}(\xi ) = \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \sup _{{\varvec{\sigma }}\in {\mathcal {B}}_N} H_N({\varvec{\sigma }})/N. \end{aligned}$$

In the bipartite SK model mentioned above, \(\textsf{OPT}\) is the limiting operator norm of an IID Gaussian rectangular matrix with aspect ratio \(\lambda _1/\lambda _2\). For large k, the asymptotic operator norm of an IID random k-tensor is similarly encoded as \(\textsf{OPT}(\xi )\) for some \(\xi \) (with e.g. \(r=k\)). Perhaps surprisingly, it is widely believed that polynomial-time algorithms are in general incapable of finding \({\varvec{\sigma }}\in {\mathcal {B}}_N\) such that \(H_N({\varvec{\sigma }})\ge \textsf{OPT}(\xi )-{\varepsilon }\) with high probability as \(N\rightarrow \infty \). Our work [15] showed that in the single-species case (with all terms of even degree), one can identify an exact threshold \({\textsf {ALG}}\) for the performance of a class of Lipschitz algorithms which includes gradient-based methods and Langevin dynamics. More recently, in [16] we extended the algorithmic hardness direction of this result to multi-species spherical spin glasses, using a new proof technique that applies even when \(\textsf{OPT}\) is not known. The purpose of this paper is to give explicit algorithms attaining the value \({\textsf {ALG}}\), whose formula we now present.

The algorithmic threshold \({\textsf {ALG}}\) is given by the following variational principle. This is a simplification of the more general variational formula [16, Equation (1.7)], obtained by a partial characterization of its maximizers [16, Theorem 3]. The following generic assumption is needed therein to ensure well-posedness of the ODE (2.3) used in this description, and we will freely assume it throughout the paper.

Assumption 1

All quadratic and cubic interactions participate in \(H_N\), i.e. \(\Gamma ^{(2)}, \Gamma ^{(3)} > 0\) coordinate-wise. We will call such models non-degenerate. Since this condition depends only on \(\xi \), we similarly call \(\xi \) non-degenerate.

To optimize \(H_N\) for degenerate \(\xi \), it suffices to apply our algorithms to a slight perturbation \(\widetilde{\xi }\) which is non-degenerate and satisfies \(\Vert \xi -\widetilde{\xi }\Vert _{C^3([0,1]^r)}\le {\varepsilon }\) to obtain the guarantees in this and the next section. Here, \(C^3([0,1]^r)\) denotes the norm

$$\begin{aligned} \Vert \xi \Vert _{C^3([0,1]^r)} = \sup _{\vec {x}\in [0,1]^r} \max \left\{ |\xi (\vec {x})|, \Vert \nabla \xi (\vec {x})\Vert _\infty , \Vert \nabla ^2 \xi (\vec {x})\Vert _\infty , \Vert \nabla ^3 \xi (\vec {x})\Vert _\infty \right\} . \end{aligned}$$

Since both the ground state and the more general \({\textsf {ALG}}\) formula in [16] (allowing degenerate \(\xi \)) vary continuously in \(\xi \), there is essentially no loss of generality in assuming non-degeneracy.

The formula for \({\textsf {ALG}}\) is described by two cases depending on whether \(\vec {1}=1^{{\mathscr {S}}}\) is super-solvable as defined below.

Definition 1.1

A matrix \(M\in {\mathbb {R}}^{{\mathscr {S}}\times {\mathscr {S}}}\) is diagonally signed if \(M_{i,i}\ge 0\) and \(M_{i,j}<0\) for all \(i\ne j\).

Definition 1.2

A symmetric diagonally signed matrix M is super-solvable if it is positive semidefinite, and solvable if it is furthermore singular; otherwise M is strictly sub-solvable. A point \(\vec {x}\in (0,1]^{\mathscr {S}}\) is super-solvable, solvable, or strictly sub-solvable if \(M^*(\vec {x})\) is, where

$$\begin{aligned} M^*(\vec {x}) = \text {diag}\left(\left(\frac{\partial _{x_s}\xi (\vec {x}) + \lambda _s h_s^2}{x_s}\right)_{s\in {\mathscr {S}}}\right) - \left(\partial _{x_s,x_{s'}}\xi (\vec {x})\right)_{s,s'\in {\mathscr {S}}}. \end{aligned}$$
(1.7)

We also adopt the convention that \(\vec {0}\) is always super-solvable, and solvable if \({\vec h}=\vec {0}\).
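As an illustration of Definition 1.2, the sketch below assembles \(M^*(\vec x)\) from (1.7) and classifies a point by numerically testing positive semidefiniteness. The two-species model used here (with \(\xi (\vec x) = 0.25(x_1+x_2)^2 + 0.025(x_1+x_2)^3\) and \({\vec \lambda }=(1/2,1/2)\)) is invented for the example; it is degenerate in the sense of Assumption 1, but Definition 1.2 does not require non-degeneracy.

```python
import numpy as np

lam = np.array([0.5, 0.5])          # illustrative species proportions

def grad_xi(x):
    """Gradient of xi(x) = 0.25*(x1+x2)^2 + 0.025*(x1+x2)^3."""
    s = x[0] + x[1]
    g = 0.5 * s + 0.075 * s ** 2
    return np.array([g, g])

def hess_xi(x):
    s = x[0] + x[1]
    return np.full((2, 2), 0.5 + 0.15 * s)

def M_star(x, h):
    """M*(x) from (1.7): diag((d_{x_s} xi + lam_s h_s^2)/x_s) minus Hessian."""
    return np.diag((grad_xi(x) + lam * h ** 2) / x) - hess_xi(x)

def classify(x, h, tol=1e-10):
    ev_min = np.linalg.eigvalsh(M_star(x, h)).min()
    if ev_min > tol:
        return "strictly super-solvable"
    if ev_min > -tol:
        return "solvable"
    return "strictly sub-solvable"

one = np.array([1.0, 1.0])
# classify(one, np.zeros(2))  -> "strictly sub-solvable"   (no external field)
# classify(one, np.ones(2))   -> "strictly super-solvable" (strong field)
```

The two example calls also illustrate the monotonicity in \({\vec h}\) noted below: turning on a strong external field moves this toy model from the sub-solvable to the super-solvable regime.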

The following will be useful.

Proposition 1.3

([16, Proposition 4.3], see also [17, Lemma 2.5]) If the square matrix M is diagonally signed, then the minimal eigenvalue \({\varvec{\lambda }}_{\min }(M)\) has multiplicity 1, and the corresponding eigenvector \(\vec {v}\) has strictly positive entries. Moreover

$$\begin{aligned} {\varvec{\lambda }}_{\min }(M) = \sup _{\vec {v}\succ \vec {0}} \min _{s\in {\mathscr {S}}} \frac{(M\vec {v})_s}{v_s}, \end{aligned}$$

and the supremum is uniquely attained at \(\vec {v}\).
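Proposition 1.3 is a Perron–Frobenius-type statement (applied to \(cI-M\) for large c), and it is easy to verify numerically. The sketch below draws a random symmetric diagonally signed matrix and checks the three claims: the minimal eigenvalue is simple, its eigenvector is strictly positive, and the sup–min characterization holds with the supremum attained at that eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric diagonally signed matrix: nonnegative diagonal,
# strictly negative off-diagonal entries.
r = 4
M = -rng.uniform(0.1, 1.0, size=(r, r))
M = (M + M.T) / 2
np.fill_diagonal(M, rng.uniform(0.0, 1.0, size=r))

evals, evecs = np.linalg.eigh(M)          # eigenvalues in ascending order
lam_min, v = evals[0], evecs[:, 0]
v = v * np.sign(v[0])                     # fix the overall sign

# Minimal eigenvalue is simple and its eigenvector is strictly positive.
assert evals[1] - evals[0] > 1e-12 and np.all(v > 0)

def obj(u):
    """The inner objective min_s (Mu)_s / u_s for entry-wise positive u."""
    return np.min(M @ u / u)

# The supremum over positive u equals lam_min and is attained at v:
assert abs(obj(v) - lam_min) < 1e-10
for _ in range(100):                      # random positive u never do better
    u = rng.uniform(0.1, 1.0, size=r)
    assert obj(u) <= lam_min + 1e-10
```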

It is easy to see that any \(\vec{x}\in (0,1]^{\mathscr {S}}\) is sub-solvable when \({\vec h}=\vec {0}\), and that super-solvability is a coordinate-wise increasing property of \({\vec h}\). For our purposes, an external field is large if \(\vec {1}\) is super-solvable and small if \(\vec {1}\) is strictly sub-solvable. (Unfortunately we do not have more refined intuition for the precise form of \(M^*\) above, nor for the resulting phase boundary between super- and sub-solvability.) As shown in our companion work [17], in super-solvable models the external fields \(\varvec{h}\) are strong enough to trivialize the “glassy” nature of the landscape of \(H_N\): the number of critical points is exactly \(2^r\) with high probability, the minimum number for any generic smooth (“Morse”) function on a product of r spheres. By contrast, in the sub-solvable case the expected number of critical points is exponentially large in the dimension N. As explained below, the optimization algorithms are also simpler in the super-solvable case.

Definition 1.4

(Algorithmic Threshold, Super-Solvable Case) If \(\vec {1}\) is super-solvable, then

$$\begin{aligned} {\textsf {ALG}}= \sum _{s\in {\mathscr {S}}} \lambda _s \sqrt{\xi ^s(\vec {1}) + h_s^2}. \end{aligned}$$
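In the super-solvable case the threshold is thus a closed-form expression. As a small numerical sketch (reusing an invented two-species model with \(\xi (\vec x) = 0.25(x_1+x_2)^2 + 0.025(x_1+x_2)^3\), \({\vec \lambda }=(1/2,1/2)\), and a strong field \({\vec h}=(1,1)\), for which \(\vec {1}\) turns out to be super-solvable):

```python
import numpy as np

lam = np.array([0.5, 0.5])
h = np.array([1.0, 1.0])        # strong external field (illustrative)

# xi^s(1) = lam_s^{-1} * d_{x_s} xi(1,1); here d_{x_s} xi(1,1) = 0.5*2 + 0.075*4 = 1.3
xi_s_one = np.array([1.3, 1.3]) / lam

# ALG = sum_s lam_s * sqrt(xi^s(1) + h_s^2)
ALG = np.sum(lam * np.sqrt(xi_s_one + h ** 2))
# by symmetry ALG = sqrt(2.6 + 1.0) = sqrt(3.6) here
```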

When \(\vec {1}\) is strictly sub-solvable, the formula for \({\textsf {ALG}}\) becomes more complicated and depends on the optimal choice of an increasing \(C^2\) function \(\Phi :[q_1,1]\rightarrow [0,1]^{{\mathscr {S}}}\) satisfying certain conditions. We term such \(\Phi \) pseudo-maximizers and defer the formal definition to Definition 2.1. Note that \(q_1 \in [0,1]\) is not fixed, but is determined by the choice of \(\Phi \).

Definition 1.5

(Algorithmic Threshold, Strictly Sub-solvable Case) If \(\vec {1}\) is strictly sub-solvable, then with the maximum taken over all pseudo-maximizers \(\Phi \) of \({\mathbb {A}}\),

$$\begin{aligned} \begin{aligned} {\textsf {ALG}}&= \max _{\Phi } {\mathbb {A}}(\Phi ); \\ {\mathbb {A}}(\Phi )&\equiv \sum _{s\in {\mathscr {S}}} \lambda _s \left[ \sqrt{\Phi _s(q_1) (\xi ^s(\Phi (q_1)) + h_s^2)} + \int _{q_1}^1 \sqrt{\Phi '_s(q)(\xi ^s\circ \Phi )'(q)}~\text {d}q \right]. \end{aligned} \end{aligned}$$
(1.8)

See [16, Remark 1.3] for an approach to maximizing \({\mathbb {A}}\) using the well-posedness of the ODEs (2.2), (2.3) in the definition of pseudo-maximizer. The computational complexity of this task is in particular independent of N.

The following theorem is our main result. We equip the space \({\mathscr {H}}_N\) of Hamiltonians \(H_N\) with the following distance. We identify \(H_N\) with its disorder coefficients \(({\varvec{G}}^{(k)})_{k\ge 2}\), which we arrange in an arbitrary but fixed order into an infinite vector \({\varvec{g}}(H_N)\), and define

$$\begin{aligned} \Vert {H_N-H'_N}\Vert _2 = \Vert {{\varvec{g}}(H_N) - {{\varvec{g}}}(H'_N)}\Vert _2. \end{aligned}$$

(In other words, \(\Vert {H_N-H'_N}\Vert _2^2\) is the sum of squared differences \((g_{i_1,\ldots ,i_k}-g'_{i_1,\ldots ,i_k})^2\) between all corresponding pairs of coefficients in \(({\varvec{G}}^{(k)})_{k\ge 2}\) and \(({\varvec{G}}'^{(k)})_{k\ge 2}\).) We say an algorithm \({\mathcal {A}}_N: {\mathscr {H}}_N \rightarrow {\mathcal {B}}_N\) is \(\tau \)-Lipschitz if

$$\begin{aligned} \Vert {{\mathcal {A}}_N(H_N) - {\mathcal {A}}_N(H'_N)}\Vert _2 \le \tau \Vert {H_N - H'_N}\Vert _2, \qquad \forall H_N, H'_N \in {\mathscr {H}}_N. \end{aligned}$$

Note that \(\Vert {H_N-H'_N}\Vert _2\) may be infinite, and if so this condition holds vacuously for such pairs \((H_N,H'_N)\). Here and throughout, all implicit constants may depend also on \((\xi ,{\vec h},{\vec \lambda })\).

Theorem 1

For any \({\varepsilon }>0\), there exists an \(O_{{\varepsilon }}(1)\)-Lipschitz \({\mathcal {A}}_N:{\mathscr {H}}_N\rightarrow {\mathcal {B}}_N\) such that

$$\begin{aligned} {\mathbb {P}}[H_N({\mathcal {A}}_N(H_N))/N \ge {\textsf {ALG}}-{\varepsilon }] \ge 1-\exp (-cN), \quad c = c({\varepsilon }) > 0. \end{aligned}$$

The main result in our companion work [16, Theorem 1] states that any \(\tau \)-Lipschitz \({\mathcal {A}}_N: {\mathscr {H}}_N \rightarrow {\mathcal {B}}_N\) satisfies, for the same threshold \({\textsf {ALG}}\) and N sufficiently large,

$$\begin{aligned} {\mathbb {P}}[H_N({\mathcal {A}}_N(H_N))/N \ge {\textsf {ALG}}+ {\varepsilon }] \le \exp (-cN), \quad c = c({\varepsilon },\tau ) > 0. \end{aligned}$$

Together, these results characterize the best possible Lipschitz optimization algorithms for multi-species spherical spin glasses.

We prove Theorem 1 with an explicit algorithm based on AMP, following a recent line of work [4, 5, 21, 24, 26]. Such algorithms are shown to be Lipschitz (up to modification on a set with \(\exp (-cN)\) probability) in [15, Sect. 8]. AMP algorithms also have computational complexity which is linear in the input size when \(H_N\) is a polynomial of finite degree (modulo solving for \(\Phi \), a task that does not depend on N). See [4, Remark 2.1] for related discussion on this last point.

Similarly to [5, 24], our algorithm has two phases, a “root-finding” phase and a “tree-descending” phase. Roughly speaking, the set of points reachable by our algorithm has the geometry of a densely branching ultrametric tree, which is rooted at the origin when \(\varvec{h}= {\textbf {0}}\) and more generally at a random point correlated with \(\varvec{h}\). The first phase identifies this root, and the second traces a root-to-leaf path of the tree. The structure of the first phase is similar to the original AMP algorithm of [9] for the SK model at high temperature, while the incremental AMP technique of the second phase was introduced in [21].

For the purposes of this paper, the significance of (super, sub)-solvability is as follows. When the external field is sufficiently large, the root moves all the way to the boundary of \({\mathcal {B}}_N\) (in all r species) and the algorithmic tree becomes degenerate. In [16], it is shown that the external field is large enough for this to occur if and only if \(\vec {1}\) is super-solvable. Moreover, [17] shows this condition coincides with strong topological trivialization (defined therein) of the optimization landscape.

In Sect. 3 we extend our main algorithm in several ways. In Sect. 3.1 we define \(2^r\) signed generalizations of the root-finding algorithm with similar behavior. In Sect. 3.2 we compute the gradients of \(H_N\) at the points output by our algorithm, in both the super-solvable and sub-solvable cases for \(\vec {1}\). In particular, we show that they are approximate critical points on the product of spheres \({\mathcal {S}}_N\) (defined in (1.6)). As explained in Remark 3.1, in the strictly super-solvable case these \(2^r\) outputs approximate the \(2^r\) genuine critical points of \(H_N\) on \({\mathcal {S}}_N\). The sub-solvable case of this computation is used in our companion paper [17, Theorem 1.5(c) and Sect. 5.3] to show failure of annealed topological trivialization in the sub-solvable case. Finally, in Sect. 3.3 we give a modification of the tree-descending phase for the sub-solvable case. It constructs \(\exp (cN)\) well-separated approximate critical points arranged in a densely branching ultrametric tree; this implies the failure of strong topological trivialization ([17, Definition 6 and Theorem 1.6]).

1.3 Notations

Throughout, we will use boldface lowercase letters (\({\varvec{u}},{\varvec{v}},\ldots \)) to denote vectors in \({\mathbb {R}}^N\), and lowercase letters with vector sign (\(\vec {u},\vec {v},\ldots \)) to denote vectors in \({\mathbb {R}}^{\mathscr {S}}\simeq {\mathbb {R}}^r\). Similarly, boldface uppercase letters denote matrices or tensors in \(({\mathbb {R}}^N)^{\otimes k}\), and non-boldface uppercase letters denote matrices or tensors in \(({\mathbb {R}}^r)^{\otimes k}\). We let

$$\begin{aligned} \langle {\varvec{v}}\rangle _N=N^{-1}\sum _{i\le N} v_i; \quad \quad \langle {\varvec{u}},{\varvec{v}}\rangle _N = N^{-1}\sum _{i\le N}u_iv_i = \langle \vec \lambda , \vec R({\varvec{u}},{\varvec{v}})\rangle \end{aligned}$$

for \({\varvec{u}},{\varvec{v}}\in {\mathbb {R}}^N\). The corresponding norm is

$$\begin{aligned} \Vert {\varvec{u}}\Vert _{N}= \langle {\varvec{u}},{\varvec{u}}\rangle _N^{1/2}=\sqrt{\sum _s \lambda _s R_s({\varvec{u}},{\varvec{u}})}. \end{aligned}$$

Next, \(a_N\simeq b_N\) means that \(a_N-b_N\) converges in probability to 0. Analogously, for two vectors \({\varvec{u}}_N, {\varvec{v}}_N\), we write \({\varvec{u}}_N\simeq {\varvec{v}}_N\) when \(\Vert {\varvec{u}}_N-{\varvec{v}}_N\Vert _N\) converges in probability to 0. We denote limits in probability by \(\mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\). Similarly, we write \(\approx _{\delta }\) to denote asymptotic equality as \(\delta \rightarrow 0\).

For any tensor \(\varvec{A}\in ({\mathbb {R}}^N)^{\otimes k}\), we define the operator norm

$$\begin{aligned} {\Vert \varvec{A}\Vert }_{\text {op}} = \sup _{\Vert {\varvec{\sigma }}^1\Vert ,\ldots ,\Vert {\varvec{\sigma }}^k\Vert \le 1} \left|\langle \varvec{A}, {\varvec{\sigma }}^1 \otimes \cdots \otimes {\varvec{\sigma }}^k \rangle \right|. \end{aligned}$$

The following proposition shows that with exponentially good probability, the operator norms of all constant-order gradients of \(H_N\) are bounded on the appropriate scale.

Proposition 1.6

([16, Proposition 1.13]) For any fixed model \((\xi , {\vec h})\) there exists a constant \(c>0\), sequence \((K_N)_{N\ge 1}\) of convex sets \(K_N\subseteq {\mathscr {H}}_N\), and sequence of constants \((C_{k})_{k\ge 1}\) independent of N, such that the following properties hold.

  (a) \(\mathbb {P}[H_N\in K_N]\ge 1-e^{-cN}\);

  (b) For all \(H_N\in K_N\) and \({\varvec{x}}\in {\mathcal {B}}_N\),

    $$\begin{aligned} {\left\Vert\nabla ^k H_N({\varvec{x}})\right\Vert}_{\text {op}}&\le C_{k}N^{1-\frac{k}{2}}. \end{aligned}$$
    (1.9)

2 Achieving Energy \({\textsf {ALG}}\)

In this section we prove Theorem 1 by exhibiting an AMP algorithm. Throughout this section, Assumption 1 on non-degeneracy of \(\xi \) is in force; as discussed above, this entails no loss of generality.

2.1 Definition of Pseudo-Maximizer

As mentioned before Definition 1.5, the threshold \({\textsf {ALG}}\) in the sub-solvable case depends on a notion of pseudo-maximizer. We now provide this definition, which was derived in [16, Theorem 3] as a necessary condition for \(\Phi \) to maximize \({\mathbb {A}}\) defined in (1.8) (and it is proved therein that a maximizer always exists).

Definition 2.1

A coordinate-wise strictly increasing \(C^2\) function \(\Phi :[q_1,1]\rightarrow [0,1]^{{\mathscr {S}}}\), for some \(q_1\in [0,1]\), is a pseudo-maximizer if:

  (1) \(\Phi \) is admissible, meaning it satisfies the normalization

    $$\begin{aligned} \langle {\vec \lambda }, \Phi (q)\rangle = q,\quad \forall q\in [q_1,1]. \end{aligned}$$
    (2.1)

    In particular \(\Phi (1) = \vec {1}\).

  (2) \(\Phi (q_1)\) is solvable.

  (3) The derivative at \(q_1\) satisfies \(M^*(\Phi (q_1))\Phi '(q_1)=\vec {0}\). This amounts to no restriction when \({\vec h}=\vec {0}\) and thus \((q_1,\Phi (q_1))=(0,\vec {0})\); when \({\vec h}\ne \vec {0}\) it means that

    $$\begin{aligned} \Phi _s'(q_1) = \frac{\Phi _s(q_1) (\xi ^s\circ \Phi )'(q_1)}{\xi ^s(\Phi (q_1))+h_s^2}, \quad s\in {\mathscr {S}}. \end{aligned}$$
    (2.2)
  (4) For all \(q\in [q_1,1]\), \(\Phi \) solves the (second-order) tree-descending differential equation:

    $$\begin{aligned} \Psi (q) \equiv \frac{1}{\Phi '_s(q)} {\frac{{\text {d}}}{{\text {d}q}}} \sqrt{\frac{\Phi '_s(q)}{(\xi ^s \circ \Phi )'(q)}} \end{aligned}$$
    (2.3)

    is independent of the species s. (See [16, Lemma 4.37] for well-posedness of this ODE.)

Note that there may exist multiple such \(\Phi \); see [16, Figure 2]. If \(\vec {1}\) is super-solvable, we adopt the convention that \(q_1=1\) and \(\Phi \) has domain \(\{1\}\).

We now give an efficient AMP algorithm achieving energy \({\mathbb {A}}(\Phi )\) for any pseudo-maximizer \(\Phi \). In particular for the optimal pseudo-maximizer this achieves energy \({\textsf {ALG}}\).

2.2 Review of Approximate Message Passing

Here we recall the class of AMP algorithms, specialized to our setting of interest. We initialize AMP with a deterministic vector \({\varvec{w}}^0\) with coordinates

$$\begin{aligned} w^0_i = w_{s(i)} \end{aligned}$$
(2.4)

depending only on the species. Let \(f_{t,s}:{\mathbb {R}}^{t+1}\rightarrow {\mathbb {R}}\) be a Lipschitz function for each \((t,s)\in {\mathbb {Z}}_{\ge 0}\times {\mathscr {S}}\). For \(({\varvec{w}}^0,{\varvec{w}}^1,\ldots ,{\varvec{w}}^t)\in {\mathbb {R}}^{N\times (t+1)}\), let \(f_{t}({\varvec{w}}^0,{\varvec{w}}^1,\ldots ,{\varvec{w}}^t)\in {\mathbb {R}}^N\) be given by

$$\begin{aligned} f_{t}({\varvec{w}}^0,{\varvec{w}}^1,\ldots ,{\varvec{w}}^t)_i = f_{t,s(i)}(w^0_i,w^1_i,\ldots ,w^t_i),\quad i\in [N]. \end{aligned}$$

We generate subsequent iterates through recursions of the following form, where \({\textbf {ons}}_t\) is known as the Onsager correction term:

$$\begin{aligned} {\varvec{w}}^{t+1}&= \nabla H_N(\varvec{m}^t) - {\textbf {ons}}_t ; \nonumber \\ \varvec{m}^t&= f_{t}({\varvec{w}}^0,{\varvec{w}}^1,\ldots ,{\varvec{w}}^t); \end{aligned}$$
(2.5)
$$\begin{aligned} {\textbf {ons}}_t&= \sum _{t'\le t} d_{t,t'} \diamond f_{t'-1}({\varvec{w}}^0,{\varvec{w}}^1,\ldots ,{\varvec{w}}^{t'-1}); \end{aligned}$$
(2.6)
$$\begin{aligned} d_{t,t',s}&= \left( \sum _{s'\in {\mathscr {S}}} \partial _{x_{s'}} \xi ^s \left( \big ( {\mathbb {E}}[M^t_{s''} M^{t'-1}_{s''}] \big )_{s''\in {\mathscr {S}}} \right) \cdot {\mathbb {E}}\left[ \partial _{W^{t'}_{s'}}f_{t,s'}(W^0_{s'},\ldots ,W^t_{s'}) \right] \right). \end{aligned}$$
(2.7)

Here \(W^t_s,M^t_s\) are defined as follows. \(W^0_s=w_s\) and the variables \((\widetilde{W}^t_s)_{(t,s)\in {\mathbb {Z}}_{\ge 1}\times {\mathscr {S}}}\) form a centered Gaussian process with covariance defined recursively by

$$\begin{aligned} \begin{aligned} {\mathbb {E}}[\widetilde{W}^{t+1}_s \widetilde{W}^{t'+1}_{s}]&= \xi ^s\left(\big ({\mathbb {E}}[f_{t,s'}(W^0_{s'},\ldots ,W^{t}_{s'})f_{t',s'}(W^0_{s'},\ldots ,W^{t'}_{s'})]\big )_{s'\in {\mathscr {S}}}\right), \\ W^t_s&= \widetilde{W}^t_s+h_s; \\ M^t_s&= f_{t,s}(W^0_s,\ldots ,W^t_s) \end{aligned} \end{aligned}$$
(2.8)

and \({\mathbb {E}}[\widetilde{W}^{t+1}_{s} \widetilde{W}^{t'+1}_{s'}]=0\) if \(s\ne s'\) (i.e. different species are independent).
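To make the recursion (2.5)–(2.7) concrete, here is a minimal simulation sketch. It is specialized, purely for readability, to a single species, the pure \(k=2\) interaction (so \(\xi (x)=x^2\)), \({\vec h}=\vec 0\), and a linear nonlinearity; these simplifications and the parameter values are assumptions of the sketch, not the setting used later in the paper. It empirically checks the variances predicted by state evolution for the first two iterates.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
G = rng.standard_normal((N, N))

def grad_H(sigma):
    """Gradient of H(sigma) = N^{-1/2} <G, sigma x sigma>, i.e. xi(x) = x^2."""
    return (G + G.T) @ sigma / np.sqrt(N)

# Step 1: from a deterministic start m^0 with <m^0, m^0>_N = q, state
# evolution predicts w^1 has approximately N(0, xi'(q)) coordinates.
q = 0.5
m0 = np.full(N, np.sqrt(q))
w1 = grad_H(m0)                      # no Onsager correction at the first step
emp_var1 = np.mean(w1 ** 2)          # should be close to xi'(q) = 2q = 1.0

# Step 2: with linear nonlinearity f_1(w) = a*w, the coefficient (2.7) is
# d_{1,1} = xi''(.) * E[f_1'] = 2a, applied to m^0 as in (2.6).
a = 0.6
m1 = a * w1
w2 = grad_H(m1) - 2 * a * m0         # Onsager-corrected iterate
emp_var2 = np.mean(w2 ** 2)          # prediction: xi'(a^2 * 2q) = 4*a^2*q = 0.72
```

Without the subtracted Onsager term, the empirical variance of `w2` would not match the state evolution prediction: the correction removes the backward correlation between \(\varvec{m}^1\) and the disorder.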

The following state evolution characterizes the behavior of the above iterates. It states, roughly, that for each \(s\in {\mathscr {S}}\), when \(i\in {\mathcal {I}}_s\) is uniformly random, the sequence of coordinates \((w^1_i,w^2_i,\ldots ,w^t_i)\) asymptotically has the same law as \((W^1_s,\ldots ,W^t_s)\). Below, \(N_s = |{\mathcal {I}}_s|\). Say a function \(\psi :{\mathbb {R}}^{m} \rightarrow {\mathbb {R}}\) is pseudo-Lipschitz if \(|\psi (x) - \psi (y)| \le C(1+|x|+|y|)|x-y|\) for a constant C.

Proposition 2.2

For any pseudo-Lipschitz function \(\psi \) and \(\ell \in {\mathbb {Z}}_{\ge 0}\), \(s\in {\mathscr {S}}\),

$$\begin{aligned} \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\frac{1}{N_s} \sum _{i\in {\mathcal {I}}_s} \psi ({\varvec{w}}^0_i,\ldots ,{\varvec{w}}^{\ell }_i) = {\mathbb {E}}[ \psi (W^0_s,\ldots ,W^{\ell }_s) ]. \end{aligned}$$
(2.9)

This proposition allows us to read off normalized inner products of the AMP iterates, since e.g.

$$\begin{aligned} \langle {\varvec{w}}^k,{\varvec{w}}^{\ell }\rangle _N \simeq \sum _{s\in {\mathscr {S}}} \lambda _s {\mathbb {E}}[W^k_s W^{\ell }_s]. \end{aligned}$$

Proposition 2.2 is proved in Appendix 1. In fact we show a slight generalization allowing \(f_t=f_t({\varvec{w}}^0,\ldots ,{\varvec{w}}^t,{\varvec{g}}^0,\ldots ,{\varvec{g}}^t)\) to depend also on independently generated vectors \(({\varvec{g}}^0,\ldots ,{\varvec{g}}^t)\in {\mathbb {R}}^{N\times (t+1)}\). When using this extension, we will always take each \({\varvec{g}}^t\sim {\mathcal {N}}(0,I_N)\) to be standard Gaussian. The more general result essentially says that \({\varvec{g}}^t\) still acts as an independent Gaussian for the purposes of state evolution. Since this is relatively intuitive, we refer to Theorem 2 in the appendix for a precise statement.

For random matrices (i.e. the case of quadratic H) there is a considerable literature establishing state evolution in many settings beginning with [7, 9] and later [6, 8, 10, 11, 13] (see also [14] for a survey of many statistical applications). The generalization to tensors was introduced in [23] and proved in [4], whose approach we follow.

2.3 Stage \(\text {I}\): Finding the Root of the Ultrametric Tree

Our goal in this subsection is to compute a vector \(\varvec{m}^{{\underline{\ell }}}\) satisfying

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \vec R(\varvec{m}^{{\underline{\ell }}},\varvec{m}^{{\underline{\ell }}})=\Phi (q_1) \end{aligned}$$

and with the correct energy value (as stated in Lemma 2.5). We take as given a maximizer \(\Phi \) of \({\mathbb {A}}\) with domain \([q_1,1]\). Recall that \(\Phi (q_1)\) is super-solvable: either \(\vec {1}\) is strictly sub-solvable, in which case \(\Phi (q_1)\) is solvable, or \(\vec {1}\) is super-solvable, in which case \(\Phi (q_1) = \Phi (1) = \vec {1}\).

We use the initialization

$$\begin{aligned} w^0_i = \sqrt{\xi ^s(\Phi (q_1))+h_s^2},\quad i\in {\mathcal {I}}_s. \end{aligned}$$

Define the vector \({\vec a}\in \mathbb R^{{\mathscr {S}}}\) by

$$\begin{aligned} a_s= \sqrt{\frac{\Phi _s(q_1)}{\xi ^s(\Phi (q_1))+h_s^2}}. \end{aligned}$$

Subsequent iterates are defined via the following recursion.

$$\begin{aligned} {\varvec{w}}^{k+1}&= \nabla H_N(\varvec{m}^k) - {\vec b}_k \diamond \varvec{m}^{k-1} \nonumber \\&= \varvec{h}+ \nabla \widetilde{H}_N(\varvec{m}^k) - {\vec b}_k \diamond \varvec{m}^{k-1}; \end{aligned}$$
(2.10)
$$\begin{aligned} \varvec{m}^k&= {\vec a}\diamond {\varvec{w}}^k \end{aligned}$$
(2.11)
$$\begin{aligned} b_{k,s}&\equiv \sum _{s'\in {\mathscr {S}}} a_{s'} \partial _{s'}\xi ^s \big (\vec R(\varvec{m}^k,\varvec{m}^{k-1})\big ). \end{aligned}$$
(2.12)

The last term in (2.10) comes from specializing the formula (2.6) for the Onsager term.

Next recalling (2.8), let \((W^j_s,M^j_s)_{j\ge 0,s\in {\mathscr {S}}}\) be the state evolution limit of the coordinates of

$$\begin{aligned} ({\varvec{w}}^{0},\varvec{m}^{0},\ldots ,{\varvec{w}}^k,\varvec{m}^k) \end{aligned}$$

as \(N\rightarrow \infty \). Concretely, each \(W^j_s\) is Gaussian with mean \(h_s\) and

$$\begin{aligned} M^{j}_s=\sqrt{\frac{\Phi _s(q_1)}{\xi ^s(\Phi (q_1))+h_s^2} } \cdot W^j_s, \quad j\ge 0,~ s\in {\mathscr {S}}. \end{aligned}$$

We next compute the covariance of the Gaussians \(\widetilde{W}^j_s = W^j_s - h_s\). Define \({\vec \alpha }: {\mathbb {R}}_{\ge 0}^{\mathscr {S}}\rightarrow {\mathbb {R}}_{\ge 0}^{\mathscr {S}}\) by

$$\begin{aligned} \alpha _s(\vec {x}) = \left(\xi ^s(\vec {x})+h_s^2\right)\left(\frac{\Phi _s(q_1)}{\xi ^s(\Phi (q_1))+h_s^2}\right). \end{aligned}$$
(2.13)

Define the (deterministic) \({\mathbb {R}}_{\ge 0}^{{\mathscr {S}}}\)-valued sequence \((\vec R^0,\vec R^1,\dots )\) of asymptotic overlaps recursively by \(\vec R^0=\vec {0}\) and \(\vec R^{k+1} = {\vec \alpha }(\vec R^k)\).
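The recursion \(\vec R^{k+1} = {\vec \alpha }(\vec R^k)\) is an increasing fixed-point iteration, and its convergence (established in Lemma 2.4 below) can be sketched numerically. In the toy example here, the two-species model, the affine \(\xi ^s\), and the fixed point \(\vec p\) (playing the role of \(\Phi (q_1)\)) are all invented for illustration, with \({\vec \alpha }(\vec p)=\vec p\) planted by construction.

```python
import numpy as np

# Invented two-species toy with only quadratic interactions, so xi^s is affine.
h = np.array([0.5, 0.6])
p = np.array([0.8, 0.7])                 # plays the role of Phi(q_1)

def xi_s(x):
    """Hypothetical derivatives (xi^1, xi^2) of a quadratic xi."""
    return np.array([0.1 * x[0] + 0.2 * x[1],
                     0.2 * x[0] + 0.1 * x[1]])

c = p / (xi_s(p) + h ** 2)               # chosen so that alpha(p) = p

def alpha(x):
    """alpha_s(x) = (xi^s(x) + h_s^2) * Phi_s(q1) / (xi^s(Phi(q1)) + h_s^2)."""
    return (xi_s(x) + h ** 2) * c

# Iterate R^{k+1} = alpha(R^k) from R^0 = 0; since alpha is coordinate-wise
# increasing with alpha(0) > 0, the iterates increase up to the fixed point p.
R = np.zeros(2)
for _ in range(200):
    R = alpha(R)
```

After 200 iterations `R` agrees with `p` to machine precision in this toy, since the iteration contracts near the fixed point.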

Lemma 2.3

For integers \(0\le j<k\), the following equalities hold (the first in distribution):

$$\begin{aligned} W^j_s&{\mathop {=}\limits ^{d}} h_s+Z\sqrt{\xi ^s(\Phi (q_1))},\quad Z\sim {\mathcal {N}}(0,1) \end{aligned}$$
(2.14)
$$\begin{aligned} \mathbb E[\widetilde{W}^j_s \widetilde{W}^k_s]&=\xi ^s(\vec R^j) \end{aligned}$$
(2.15)
$$\begin{aligned} \mathbb E[(M^j_s)^2]&=\Phi _s(q_1) \end{aligned}$$
(2.16)
$$\begin{aligned} \mathbb E[M^j_s M^k_s]&=R^{j+1}_s. \end{aligned}$$
(2.17)

Proof

We proceed by induction on j, first showing (2.14) and (2.16) together. As a base case, (2.14) holds for \(j=0\) by initialization. For the inductive step, assume first that (2.14) holds for j. Then by the definition (2.11),

$$\begin{aligned} \mathbb E\left[(M^j_s)^2\right]&= \left(\xi ^s(\Phi (q_1))+h_s^2\right)\cdot a_s^2 \\ {}&= \left(\xi ^s(\Phi (q_1))+h_s^2\right)\cdot \left(\frac{\Phi _s(q_1)}{\xi ^s(\Phi (q_1))+h_s^2}\right) \\ {}&=\Phi _s(q_1) \end{aligned}$$

so that (2.14) implies (2.16) for each \(j\ge 0\). On the other hand, state evolution directly implies that if (2.16) holds for j then (2.14) holds for \(j+1\). This establishes (2.14) and (2.16) for all \(j\ge 0\).

We similarly show (2.15) and (2.17) together by induction, beginning with (2.15). When \(j=0\) it is clear because \(\widetilde{W}^k_s\) is mean zero and independent of \(\widetilde{W}^0_s\). Just as above, it follows from state evolution that (2.15) for \((j,k)\) implies (2.17) for \((j,k)\), which in turn implies (2.15) for \((j+1,k+1)\). Hence induction on j proves (2.15) and (2.17) for all \((j,k)\). \(\square \)

The next lemma is crucial and uses super-solvability of \(\Phi (q_1)\).

Lemma 2.4

The limit \(\vec R^\infty \equiv \lim _{j\rightarrow \infty } \vec R^j\) exists and equals \(\Phi (q_1)\).

Proof

First we observe that \({\vec \alpha }\) (recall (2.13)) is coordinate-wise strictly increasing in the sense that if \(0\preceq x\prec y\) then \({\vec \alpha }(x)\prec {\vec \alpha }(y)\). Moreover \({\vec \alpha }(\vec {0})\succ 0\) (assuming \({\vec h}\ne 0\), else the result is trivial) and \({\vec \alpha }(\Phi (q_1))=\Phi (q_1)\). Therefore \(\vec R^\infty \) exists, \({\vec \alpha }(\vec R^\infty )=\vec R^\infty \), and

$$\begin{aligned} \vec {0}\preceq \vec R^\infty \preceq \Phi (q_1). \end{aligned}$$

It remains to show that the above forces \(\vec R^\infty =\Phi (q_1)\) to hold.

Let \(M\in {\mathbb {R}}^{{\mathscr {S}}\times {\mathscr {S}}}\) be the matrix with entries \(M_{s,s'}={\frac{{\text {d}}}{{\text {d}t}}}{\vec \alpha }_s(\Phi (q_1)+te_{s'})|_{t=0}\) for \(e_{s'}\) a standard basis vector. Then M is the derivative matrix for \({\vec \alpha }\) at \(\Phi (q_1)\) in the sense that for any \(\vec {u}\in {\mathbb {R}}^{{\mathscr {S}}}\),

$$\begin{aligned} {\frac{{\text {d}}}{{\text {d}t}}}{\vec \alpha }(\Phi (q_1)+t\vec {u})|_{t=0}=M\vec {u}. \end{aligned}$$

We easily calculate that

$$\begin{aligned} M_{s,s'} = \frac{\Phi _s(q_1) \partial _{x_s,x_{s'}}\xi (\Phi (q_1))}{\partial _{x_s}\xi (\Phi (q_1)) + \lambda _s h_s^2}. \end{aligned}$$

We claim that for any entry-wise non-negative vector \(\vec w\in \mathbb R_{\ge 0}^{{\mathscr {S}}}\),

$$\begin{aligned} (M\vec w)_s\le w_s \end{aligned}$$
(2.18)

for some \(s\in {\mathscr {S}}\). Indeed, suppose to the contrary that \((M\vec w)_s > w_s\) for all \(s\in {\mathscr {S}}\). This rearranges to

$$\begin{aligned} \frac{\partial _{x_s}\xi (\Phi (q_1)) + \lambda _s h_s^2}{\Phi _s(q_1)} w_s - \sum _{s'\in {\mathscr {S}}} \partial _{x_s,x_{s'}} \xi (\Phi (q_1)) w_{s'} < 0 \quad \forall s\in {\mathscr {S}}, \end{aligned}$$

i.e. \(M^*(\Phi (q_1)) \vec w\prec \vec {0}\) (recall (1.7)). Proposition 1.3 then implies that \({\varvec{\lambda }}_{\min }(M^*(\Phi (q_1))) < 0\), so \(\Phi (q_1)\) is strictly sub-solvable, which is a contradiction. Thus (2.18) holds for some \(s\in {\mathscr {S}}\).

Now suppose for sake of contradiction that \(\vec R^\infty \prec \Phi (q_1)\), let \(\vec w=\Phi (q_1)-\vec R^\infty \), and choose \(s\in {\mathscr {S}}\) such that (2.18) holds. Write \(f(t)=\alpha _s(\Phi (q_1)+t\vec w)\). Since \(\alpha _s\) is a polynomial with non-negative coefficients and \(\xi \) is non-degenerate, f is strictly convex and strictly increasing on \([-1,0]\). Hence

$$\begin{aligned} \alpha _s(\vec R^\infty ) = f(-1) > f(0)-f'(0) \ge \Phi _s(q_1)-(M\vec w)_s {\mathop {\ge }\limits ^{(2.18)}} \Phi _s(q_1)-w_s = R^\infty _s. \end{aligned}$$

The first inequality above is strict, so we deduce that \({\vec \alpha }(\vec R^\infty )\ne \vec R^\infty \) if \(\vec R^\infty \prec \Phi (q_1)\). This contradicts the definition of \(\vec R^\infty \). Therefore \(\vec R^\infty =\Phi (q_1)\), completing the proof. \(\square \)

Remark 2.1

Super-solvability of \(\Phi (q_1)\) is a tight condition for the above argument to hold, as the matrix M above needs to have Perron-Frobenius eigenvalue at most 1. Indeed, suppose that \(\Phi (q_1)\) were chosen so that \(\lambda _1(M)>1\). Then there exists \(\vec w\in {\mathbb {R}}_{>0}^{{\mathscr {S}}}\) with \(M\vec w\succ \vec w\). Letting \(\vec {x}=\Phi (q_1)-{\varepsilon }\vec w\) for small \({\varepsilon }>0\), we find \({\vec \alpha }(\vec {x})\prec \vec {x}\). Monotonicity implies that \({\vec \alpha }\) maps the compact, convex set

$$\begin{aligned} K=\{{\vec y}\in [0,1]^{{\mathscr {S}}}~:~\vec {0}\preceq {\vec y}\preceq \vec {x}\} \end{aligned}$$

into itself. By the Brouwer fixed point theorem, a fixed point of \({\vec \alpha }\) strictly smaller than \(\Phi (q_1)\) exists whenever \(\Phi (q_1)\) is strictly sub-solvable.
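The Perron-Frobenius dichotomy underlying this remark — a positive matrix M admits \(\vec w\succ 0\) with \(M\vec w\succ \vec w\) exactly when \(\lambda _1(M)>1\) — is easy to check numerically. The matrix below is an arbitrary illustration, not derived from any spin glass model:

```python
# Illustrates: for a positive matrix M, a vector w > 0 with Mw > w (entrywise)
# exists iff the Perron-Frobenius eigenvalue lambda_1(M) exceeds 1.
import numpy as np

B = np.array([[0.2, 0.3],
              [0.4, 0.1]])               # arbitrary positive matrix
rho = max(abs(np.linalg.eigvals(B)))     # spectral radius of B (here 0.5)

for target in (1.2, 0.9):                # super- and sub-critical scalings
    M = (target / rho) * B               # now lambda_1(M) = target
    vals, vecs = np.linalg.eig(M)
    w = np.abs(vecs[:, np.argmax(vals.real)].real)  # Perron eigenvector, w > 0
    # M w = lambda_1 w, so Mw > w entrywise iff lambda_1 > 1
    print(target, np.all(M @ w > w))
```

With the \(1.2\) scaling the Perron eigenvector satisfies \(M\vec w\succ \vec w\); with the \(0.9\) scaling it satisfies \(M\vec w\prec \vec w\), matching the claim (2.18) in the proof above.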

We finish our analysis of the first AMP phase by computing the asymptotic energy it achieves. As expected, the resulting value agrees with the first term in the formula (1.8) for \({\textsf {ALG}}\).

Lemma 2.5

$$\begin{aligned} \lim _{k\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\frac{H_N(\varvec{m}^k)}{N} = \sum _{s\in {\mathscr {S}}} \lambda _s \sqrt{ \Phi _s(q_1) \cdot \left(h_s^2+\xi ^s(\Phi (q_1))\right)}. \end{aligned}$$

Proof

We use the identity

$$\begin{aligned} \frac{H_N(\varvec{m}^k)}{N}=\langle \varvec{h},\varvec{m}^k\rangle _N+\int _0^1 \langle \varvec{m}^k,\nabla {\widetilde{H}}_N(t\varvec{m}^k)\rangle _N \,\text {d}t \end{aligned}$$
(2.19)

and interchange the limit in probability with the integral. To compute \(\mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\langle \varvec{m}^k,\nabla \widetilde{H}_N(t\varvec{m}^k)\rangle _N\) we introduce an auxiliary AMP step

$$\begin{aligned} {\varvec{y}}^{k+1}=\nabla {\widetilde{H}}_N(t\varvec{m}^k)- t {\vec b}_k \diamond \varvec{m}^{k-1} \end{aligned}$$

which depends implicitly on \(t\in [0,1]\). Rearranging yields

$$\begin{aligned} \vec R(\varvec{m}^k,\nabla {\widetilde{H}}_N(t\varvec{m}^k))&= \vec R(\varvec{m}^k,{\varvec{y}}^{k+1}) + t\cdot \left( \vec R(\varvec{m}^k,\varvec{m}^{k-1} ) \odot {\vec b}_k \right) \\ {}&\simeq \vec R(\varvec{m}^k,{\varvec{y}}^{k+1}) + t\cdot \left( \vec R^k \odot {\vec b}_k \right). \end{aligned}$$

For the first term, recalling (2.11) yields

$$\begin{aligned} R_s(\varvec{m}^k,{\varvec{y}}^{k+1})&= \mathbb E[a_s W^k_s Y^{k+1}_s] \\ &= a_s\,\xi ^s(t \vec R^k). \end{aligned}$$

Note also that

$$\begin{aligned} \lambda _s \partial _{s'}\xi ^s(\vec R^k)=\partial _{x_s,x_{s'}}\xi (\vec R^k)=\lambda _{s'}\partial _{s}\xi ^{s'}(\vec R^k). \end{aligned}$$
(2.20)

Integrating with respect to t, and switching the roles of \(s,s'\) in applying (2.20), we thus find

$$\begin{aligned} \int _0^1 \langle \varvec{m}^k,\nabla {\widetilde{H}}_N(t\varvec{m}^k)\rangle _N \text {d}t&\simeq \sum _{s\in {\mathscr {S}}} \lambda _s \int _0^1 R_s(\varvec{m}^k,\nabla {\widetilde{H}}_N(t\varvec{m}^k)) \text {d}t \\ {}&\simeq \sum _{s\in {\mathscr {S}}} \lambda _s \int _0^1 \Big ( a_s \xi ^s(t\vec R^k) + t R^k_s \sum _{s'} a_{s'}\partial _{s'}\xi ^s(\vec R^k) \Big ) ~\text {d}t \\&{\mathop {=}\limits ^{(2.20)}} \sum _{s\in {\mathscr {S}}} \lambda _s \int _0^1 \Big ( a_s \xi ^s(t\vec R^k) + t a_s \sum _{s'} R^k_{s'} \partial _{s'}\xi ^s(\vec R^k) \Big ) ~\text {d}t \\ {}&= \sum _{s\in {\mathscr {S}}} \lambda _s a_s \int _0^1 \frac{\text {d}~}{\text {d}t} \left(t\, \xi ^s(t\, \vec R^k)\right) \text {d}t \\ {}&= \sum _{s\in {\mathscr {S}}} \lambda _s a_s \xi ^s(\vec R^k). \end{aligned}$$

Finally the external field \(\varvec{h}\) gives energy contribution

$$\begin{aligned} \langle \varvec{h}, \varvec{m}^k\rangle _N \simeq \sum _{s\in {\mathscr {S}}} \lambda _s h_s{\mathbb {E}}[M^k_s] = \sum _{s\in {\mathscr {S}}} \lambda _s a_s h_s^2. \end{aligned}$$

Since \(\vec R^\infty =\Phi (q_1)\) by Lemma 2.4, we conclude

$$\begin{aligned} \lim _{k\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\frac{H_N(\varvec{m}^k)}{N}&= \sum _{s\in {\mathscr {S}}} \lambda _s a_s\big (h_s^2 + \xi ^s(\Phi (q_1))\big ) \\&= \sum _{s\in {\mathscr {S}}} \lambda _s \sqrt{ \Phi _s(q_1) \cdot \left(h_s^2+\xi ^s(\Phi (q_1))\right) } . \end{aligned}$$

\(\square \)
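The final simplification uses only the definition of \(a_s\). With hypothetical one-species numbers (\(\Phi _s(q_1)=0.8\), \(\xi ^s(\Phi (q_1))=0.64\), \(h_s=1\), chosen solely for illustration) the identity \(a_s(h_s^2+\xi ^s)=\sqrt{\Phi _s(h_s^2+\xi ^s)}\) can be sanity-checked directly:

```python
# Sanity check of the algebra closing the proof of Lemma 2.5.
# Hypothetical values for one species (illustration only).
import math

Phi = 0.8            # Phi_s(q_1)
xi = 0.64            # xi^s(Phi(q_1))
h = 1.0              # h_s

a = math.sqrt(Phi / (xi + h ** 2))          # a_s as in the first AMP phase
lhs = a * (h ** 2 + xi)                     # per-species energy a_s (h_s^2 + xi^s)
rhs = math.sqrt(Phi * (h ** 2 + xi))        # closed form in Lemma 2.5
print(abs(lhs - rhs))                       # agree up to floating point
```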

2.4 Stage \(\text {II}\): Descending the Ultrametric Tree

We now turn to the second phase which uses incremental approximate message passing. Choose a large integer \({\underline{\ell }}\), and with \(\delta ={\underline{\ell }}^{-1}\) let

$$\begin{aligned} q^{\delta }_{\ell } = q_1 + (\ell -{\underline{\ell }})\delta ,\quad \ell \ge 0. \end{aligned}$$

We then define

$$\begin{aligned} {\varvec{n}}^{{\underline{\ell }}} = \varvec{m}^{{\underline{\ell }}} + \sqrt{\Phi (q_1+\delta )-\Phi (q_1)}\diamond {\varvec{g}}\end{aligned}$$
(2.21)

with the square-root taken entrywise, and \({\varvec{g}}\sim {\mathcal {N}}(0,I_N)\). Then

$$\begin{aligned} \vec R({\varvec{n}}^{{\underline{\ell }}},{\varvec{n}}^{{\underline{\ell }}})\simeq \Phi (q_1+\delta )=\Phi (q^{\delta }_{{\underline{\ell }}+1}). \end{aligned}$$
(2.22)

The point \({\varvec{n}}^{{\underline{\ell }}}\) will be the “root” of our IAMP algorithm.

Moreover we set \(\overline{\ell }=\max \{\ell \in {\mathbb {Z}}_+~:~q_{\ell }^{\delta }\le 1-2\delta \}.\) We also define for \(s\in {\mathscr {S}}\) and \({\underline{\ell }}\le \ell \le \overline{\ell }\) the constants

$$\begin{aligned} u_{\ell ,s}^{\delta } = \sqrt{ \frac{\Phi _s(q^{\delta }_{\ell +1})-\Phi _s(q^{\delta }_{\ell })}{\xi ^s(\Phi (q_{\ell +1}^{\delta })) - \xi ^s(\Phi (q_{\ell }^{\delta }))}}. \end{aligned}$$
(2.23)
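By construction, the coefficients (2.23) make the variance increments of the iterates telescope: \((u^{\delta }_{\ell ,s})^2\) times the \(\xi ^s\)-increment equals the \(\Phi _s\)-increment at each step. This can be seen in a toy single-species normalization (\(\Phi (q)=q\), \(\xi (x)=x^2\), \(q_1=0.5\), \(\delta =0.01\) — all hypothetical choices, not from the paper):

```python
# Discretization grid q_l and IAMP coefficients u_l for a toy model.
# Hypothetical choices: Phi(q) = q, xi(x) = x^2, q1 = 0.5, delta = 0.01.
import math

def Phi(q): return q
def xi(x): return x ** 2

q1, delta, steps = 0.5, 0.01, 48               # grid q_l = q1 + l*delta
grid = [q1 + k * delta for k in range(steps + 1)]

total = 0.0
for ql, qn in zip(grid, grid[1:]):
    dPhi = Phi(qn) - Phi(ql)
    dxi = xi(Phi(qn)) - xi(Phi(ql))
    u = math.sqrt(dPhi / dxi)                  # coefficient (2.23)
    total += u ** 2 * dxi                      # variance gained at this step
print(total, Phi(grid[-1]) - Phi(grid[0]))     # telescoping: these agree
```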

Set \({\varvec{z}}^{{\underline{\ell }}}={\varvec{w}}^{{\underline{\ell }}}-\varvec{h}\). We will define \(({\varvec{z}}^{\ell })_{\ell \ge {\underline{\ell }}+1}\) via

$$\begin{aligned} \begin{aligned} {\varvec{z}}^{\ell +1}&= \nabla \widetilde{H}_N(f_{\ell }({\varvec{z}}^{{\underline{\ell }}},\ldots ,{\varvec{z}}^\ell )) - \sum _{j=0}^\ell d_{\ell , j}\diamond f_{j-1}({\varvec{z}}^{{\underline{\ell }}},\ldots ,{\varvec{z}}^{j-1}). \end{aligned} \end{aligned}$$
(2.24)

The Onsager coefficients \(d_{\ell ,j}\) are given by (2.7) and will not appear explicitly in any calculations until Sect. 3.2. Note that formally, they may depend on the first \({\underline{\ell }}\) iterates, since (2.24) is a continuation of the same AMP iteration. To complete the definition of the iteration (2.24), for \(s(i)=s\) and \(\ell \ge {\underline{\ell }}\) we set

$$\begin{aligned} f_{\ell ,s}(z^{{\underline{\ell }}}_i,\ldots ,z^{\ell }_i) = n^{\ell }_i, \end{aligned}$$
(2.25)

where

$$\begin{aligned} {\varvec{n}}^{\ell +1} = {\varvec{n}}^{\ell }+ u_{\ell }^{\delta } \diamond \left({\varvec{z}}^{\ell +1}-{\varvec{z}}^{\ell } \right). \end{aligned}$$
(2.26)

The algorithm \({\mathcal {A}}\) outputs

$$\begin{aligned} {\mathcal {A}}(H_N) = \vec R({\varvec{n}}^{\overline{\ell }},{\varvec{n}}^{\overline{\ell }})^{-1/2}\diamond {\varvec{n}}^{\overline{\ell }} \in {\mathcal {B}}_N \end{aligned}$$
(2.27)

where the power \(-1/2\) is taken entry-wise. We show in (2.32) below that

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \Vert {\varvec{n}}^{\overline{\ell }}-{\mathcal {A}}(H_N)\Vert _N=0. \end{aligned}$$

Hence we will often not distinguish between the two and just consider \({\varvec{n}}^{\overline{\ell }}\) to be the output. This makes essentially no difference by virtue of Proposition 1.6.
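The rounding step (2.27) is simply a per-species rescaling: each species block of \({\varvec{n}}^{\overline{\ell }}\) is multiplied by the scalar \(R_s({\varvec{n}}^{\overline{\ell }},{\varvec{n}}^{\overline{\ell }})^{-1/2}\), so that the normalized squared norm of the block becomes exactly \(\lambda _s\). A minimal numerical sketch (the species sizes, weights, and random blocks below are arbitrary illustrations):

```python
# Per-species normalization as in (2.27): rescale each block of n so that
# its self-overlap R_s(n, n) = ||n_s||^2 / (lambda_s N) equals 1 exactly.
import numpy as np

rng = np.random.default_rng(0)
N = 1000
lam = {"s1": 0.4, "s2": 0.6}                        # hypothetical species weights
blocks = {s: rng.normal(size=int(l * N)) for s, l in lam.items()}

out = {}
for s, v in blocks.items():
    R_s = (v @ v) / (lam[s] * N)                    # self-overlap of the block
    out[s] = R_s ** (-0.5) * v                      # diamond product with R^{-1/2}

for s, v in out.items():
    print(s, (v @ v) / (lam[s] * N))                # equals 1 for each species
```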

The state evolution limits of \({\varvec{z}}^\ell \) and \({\varvec{n}}^\ell \) are described by time-changed Brownian motions with total variance \(\Phi _s(q^{\delta }_{\ell })\) in species s after iteration \(\ell \). This is made precise below.

Lemma 2.6

Fix \(s\in {\mathscr {S}}\). The sequences \((Z^{\delta }_{{\underline{\ell }},s},Z^{\delta }_{{\underline{\ell }}+1,s},\dots )\) and \((N^{\delta }_{{\underline{\ell }},s},N^{\delta }_{{\underline{\ell }}+1,s},\dots )\) are Gaussian processes satisfying

$$\begin{aligned} \mathbb E[(Z^{\delta }_{\ell +1,s}-Z^{\delta }_{\ell ,s})Z^{\delta }_{j,s}]&= 0,\quad \text {for all }{\underline{\ell }}+1\le j\le \ell \end{aligned}$$
(2.28)
$$\begin{aligned} \mathbb E\big [ (Z^{\delta }_{\ell +1,s}-Z^{\delta }_{\ell ,s})^2 \big ]&= \xi ^s(\Phi (q_{\ell +1}^{\delta })) - \xi ^s(\Phi (q_{\ell }^{\delta })) \end{aligned}$$
(2.29)
$$\begin{aligned} \mathbb E[Z^{\delta }_{\ell ,s}Z^{\delta }_{j,s}]&= \xi ^s(\Phi (q_{j\wedge \ell }^{\delta })) \end{aligned}$$
(2.30)
$$\begin{aligned} \mathbb E[N^{\delta }_{\ell ,s}N^{\delta }_{j,s}]&= \Phi _s(q^{\delta }_{(j\wedge \ell )+1}). \end{aligned}$$
(2.31)

Proof

The fact that these sequences are Gaussian processes is a general fact about state evolution (the external Gaussian \({\varvec{g}}\) is permitted in Theorem 2). We proceed by induction on \(\ell \ge {\underline{\ell }}\). The proof is similar to [24, Sect. 8] so we give only the main points (in fact (2.21) simplifies the corresponding construction therein, which avoided the use of external Gaussian noise). We will make liberal use of (2.8) to connect asymptotic overlaps before and after applying \(\nabla H_N(\cdot )\).

For base cases, the \({\underline{\ell }}\) case of (2.30) is immediate from (2.16). The base case of (2.31) follows from (2.22), and with it the \({\underline{\ell }}+1\) case of (2.30). The main computation for the base case is

$$\begin{aligned} {\mathbb {E}}\big [\big (Z^{\delta }_{{\underline{\ell }}+1,s}-Z^{\delta }_{{\underline{\ell }},s}\big )Z^{\delta }_{{\underline{\ell }},s}\big ]&= \xi ^s\left(\{{\mathbb {E}}[N^{\delta }_{{\underline{\ell }},s}M^{{\underline{\ell }}-1}_s]\}_{s\in {\mathscr {S}}}\right) - \xi ^s\left(\{{\mathbb {E}}[M^{{\underline{\ell }}-1}_{s}M^{{\underline{\ell }}-1}_s]\}_{s\in {\mathscr {S}}}\right) \\ {}&= \xi ^s(\Phi (q_1))-\xi ^s(\Phi (q_1)) \\ {}&= 0. \end{aligned}$$

Here we used the general AMP statement of Theorem 2 to say that

$$\begin{aligned} {\mathbb {E}}[N^{\delta }_{{\underline{\ell }},s} M^{{\underline{\ell }}-1}_s] = {\mathbb {E}}[M^{{\underline{\ell }}-1}_s M^{{\underline{\ell }}-1}_s] = \Phi _s(q_1). \end{aligned}$$

For inductive steps, we always have by state evolution

$$\begin{aligned} {\mathbb {E}}[Z^{\delta }_{\ell +1,s}Z^{\delta }_{j+1,s}] \simeq \xi ^s\big (\vec R({\varvec{n}}^{\ell },{\varvec{n}}^{j})\big ). \end{aligned}$$

It follows by the inductive hypothesis of (2.28) that for \(j\le \ell \),

$$\begin{aligned} R_s({\varvec{n}}^{\ell },{\varvec{n}}^{j})&= R_s({\varvec{n}}^{{\underline{\ell }}},{\varvec{n}}^{{\underline{\ell }}}) + \sum _{k={\underline{\ell }}}^{j-1} (u_k^{\delta })^2 R_s({\varvec{z}}^{k+1}-{\varvec{z}}^k,{\varvec{z}}^{k+1}-{\varvec{z}}^k) \\ {}&= R_s({\varvec{n}}^{{\underline{\ell }}},{\varvec{n}}^{{\underline{\ell }}}) + \sum _{k={\underline{\ell }}}^{j-1} (u_k^{\delta })^2 \left( \xi ^s(\Phi (q_{k+1}^{\delta })) - \xi ^s(\Phi (q_{k}^{\delta })) \right) \\&= \Phi _s(q_1) + \sum _{k={\underline{\ell }}}^{j-1} \Big ( \Phi _s(q^{\delta }_{k+1}) - \Phi _s(q^{\delta }_{k}) \Big ) \\ {}&= \Phi _s(q^{\delta }_j). \end{aligned}$$

Plugging into the above yields that for \(j\le \ell \),

$$\begin{aligned} {\mathbb {E}}[Z^{\delta }_{\ell +1,s}Z^{\delta }_{j+1,s}] = \xi ^s(\Phi (q^{\delta }_j)). \end{aligned}$$

This depends only on \(\min (j,\ell )\), so (2.28) follows. The others are proved by similar computations. \(\square \)

Equation (2.31) implies that \(\vec R({\varvec{n}}^{\delta }_{\ell },{\varvec{n}}^{\delta }_{j})\simeq \Phi (q^{\delta }_{(\ell \wedge j)+1})\), which exactly corresponds to the previous sections of the paper. In particular it implies that the final iterate \({\varvec{n}}^{\delta }_{\overline{\ell }}\) satisfies

$$\begin{aligned} (1-O(\delta ))\cdot \vec {1}\preceq \vec R({\varvec{n}}^{\delta }_{\overline{\ell }},{\varvec{n}}^{\delta }_{\overline{\ell }})\preceq \vec {1}\end{aligned}$$
(2.32)

so the rounding step (2.27) causes only an \(O(\delta )\) change in the Hamiltonian value. Finally we compute in Lemma 2.7 the energy gain from the second phase, which matches the second term in (1.8).

Lemma 2.7

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N \rightarrow \infty } \frac{H_{N}({\varvec{n}}^{\overline{\ell }})-H_{N}\left({\varvec{n}}^{{\underline{\ell }}}\right)}{N} = \sum _{s\in {\mathscr {S}}} \lambda _s \int _{q_1}^{1} \sqrt{\Phi '_s(t) (\xi ^s \circ \Phi )'(t)}\,\text {d}t. \end{aligned}$$
(2.33)

Proof

Observe that \(\langle \varvec{h},{\varvec{n}}^{\overline{\ell }}-{\varvec{n}}^{{\underline{\ell }}}\rangle _N\simeq 0\) because the values \((N_{\ell ,s}^{\delta })_{\ell \ge {\underline{\ell }}}\) form a martingale sequence for each \(s\in {\mathscr {S}}\). Therefore it suffices to find the in-probability limit of \(\frac{\widetilde{H}_{N}({\varvec{n}}^{\overline{\ell }})-\widetilde{H}_{N}({\varvec{n}}^{{\underline{\ell }}})}{N}\). We write

$$\begin{aligned} \frac{\widetilde{H}_{N}({\varvec{n}}^{\overline{\ell }})-\widetilde{H}_{N}({\varvec{n}}^{{\underline{\ell }}})}{N} =\sum _{\ell ={\underline{\ell }}}^{\overline{\ell }-1}\frac{\widetilde{H}_{N}({\varvec{n}}^{\ell +1})-\widetilde{H}_{N}({\varvec{n}}^{\ell })}{N} \end{aligned}$$

and use a Taylor series approximation for each term. In particular for \(F\in C^3(\mathbb R;{\mathbb {R}})\), applying Taylor’s approximation theorem twice yields

$$\begin{aligned} F(1)-F(0)&= F'(0)+\frac{1}{2}F''(0)+O\big (\sup _{a\in [0,1]}|F'''(a)|\big ) \\ {}&= F'(0)+\frac{1}{2}(F'(1)-F'(0))+O\big (\sup _{a\in [0,1]}|F'''(a)|\big ) \\ {}&= \frac{1}{2}(F'(1)+F'(0))+O\big (\sup _{a\in [0,1]}|F'''(a)|\big ) . \end{aligned}$$
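This is the trapezoid-rule error bound in disguise; the classical constant is \(\sup |F'''|/12\). For an arbitrary smooth test function (here \(F=\sin \), which is purely illustrative and not from the paper) the estimate is easy to verify numerically:

```python
# Check F(1) - F(0) = (F'(0) + F'(1))/2 + O(sup |F'''|) for F = sin.
import math

F, dF = math.sin, math.cos                 # F' = cos, F''' = -cos
lhs = F(1.0) - F(0.0)
trap = 0.5 * (dF(0.0) + dF(1.0))           # trapezoid value (F'(0) + F'(1))/2
sup_F3 = 1.0                               # sup_{[0,1]} |F'''| = sup |cos| = 1

err = abs(lhs - trap)
print(err, sup_F3 / 12)                    # trapezoid error is at most sup|F'''|/12
```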

Assuming \(\sup _{\ell } {\left\Vert{\varvec{n}}^\ell \right\Vert}_N \le 1\), which holds with probability \(1-o_N(1)\) by state evolution and the definition of \(\overline{\ell }\), we apply this estimate with

$$\begin{aligned} F(a)= \frac{1}{N} \widetilde{H}_N\left((1-a){\varvec{n}}^{\ell }+a{\varvec{n}}^{\ell +1}\right). \end{aligned}$$

The result is:

$$\begin{aligned}&\frac{1}{N} \left| \widetilde{H}_{N} ({\varvec{n}}^{\ell +1})-\widetilde{H}_{N}({\varvec{n}}^{\ell }) -\frac{1}{2}\left\langle \nabla \widetilde{H}_N({\varvec{n}}^{\ell })+\nabla \widetilde{H}_N({\varvec{n}}^{\ell +1}),{\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell }\right\rangle \right|\\&\quad \le O\left( \underline{C} \Vert {\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell }\Vert _N^3 \right) ; \\&\underline{C}N^{-1/2} = \sup _{\Vert {\varvec{\sigma }}\Vert \le \sqrt{N}}\left\Vert\nabla ^3 \widetilde{H}_N({\varvec{\sigma }})\right\Vert_{\text {op}}. \end{aligned}$$

Proposition 1.6 implies that for deterministic constants \(c, C\),

$$\begin{aligned} {\mathbb {P}}[\underline{C}\le C]\ge 1-e^{-cN}. \end{aligned}$$

On the other hand for each \({\underline{\ell }}\le \ell \le \overline{\ell }-1\) we have

$$\begin{aligned} \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\Vert {\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell }\Vert _N&= \sqrt{ \sum _{s\in {\mathscr {S}}}\lambda _s R_s({\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell },{\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell })} \\ {}&= \sqrt{\sum _{s\in {\mathscr {S}}}\lambda _s \big (\Phi _s(q^{\delta }_{\ell +2})-\Phi _s(q^{\delta }_{\ell +1}) \big ) } \\ {}&=\sqrt{\delta }. \end{aligned}$$

Summing and noting that \(\overline{\ell }-{\underline{\ell }}\le \delta ^{-1}\) yields the high-probability estimate

$$\begin{aligned} \sum _{\ell ={\underline{\ell }}}^{\overline{\ell }-1}&\frac{1}{N} \left| \widetilde{H}_{N}({\varvec{n}}^{\ell +1}) - \widetilde{H}_{N}({\varvec{n}}^{\ell }) - \frac{1}{2}\left\langle \nabla \widetilde{H}_N({\varvec{n}}^{\ell })+\nabla \widetilde{H}_N({\varvec{n}}^{\ell +1}),{\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell }\right\rangle \right| \\ {}&\le \sum _{\ell ={\underline{\ell }}}^{\overline{\ell }-1} \Vert {\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell }\Vert _N^3 \le O(\sqrt{\delta }). \end{aligned}$$

So, this term vanishes as \(\delta \rightarrow 0\). It remains to prove

$$\begin{aligned} \lim _{\delta \rightarrow 0} \mathop {\mathrm {p-lim}}\limits _{N \rightarrow \infty } \sum _{\ell ={\underline{\ell }}}^{\overline{\ell }-1} \left\langle \nabla \widetilde{H}_N({\varvec{n}}^{\ell }) + \nabla \widetilde{H}_N({\varvec{n}}^{\ell +1}), {\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell } \right\rangle _N {\mathop {=}\limits ^{?}} 2 \sum _{s\in {\mathscr {S}}} \lambda _s \int _{q_1}^{1} \sqrt{\Phi '_s(t) (\xi ^s \circ \Phi )'(t)} \text {d}t. \end{aligned}$$

To establish this it suffices to show for each species \(s\in {\mathscr {S}}\) the equality

$$\begin{aligned} \lim _{\delta \rightarrow 0} \mathop {\mathrm {p-lim}}\limits _{N \rightarrow \infty } \sum _{\ell ={\underline{\ell }}}^{\overline{\ell }-1} R_s\left( \nabla \widetilde{H}_N({\varvec{n}}^{\ell }) + \nabla \widetilde{H}_N({\varvec{n}}^{\ell +1}) , {\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell } \right) {\mathop {=}\limits ^{?}} 2 \int _{q_1}^{1} \sqrt{\Phi '_s(t) (\xi ^s \circ \Phi )'(t)} \text {d}t. \end{aligned}$$
(2.34)

Observe by (2.24) that

$$\begin{aligned} \nabla \widetilde{H}_N({\varvec{n}}^{\ell }) = {\varvec{z}}^{\ell +1}+\sum _{j=0}^\ell d_{\ell , j}\diamond {\varvec{n}}^{j-1}. \end{aligned}$$
(2.35)

Passing to the limiting Gaussian process \((Z^{\delta }_k)_{k\in \mathbb Z^+}\) via state evolution,

$$\begin{aligned} \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } R\left(\nabla \widetilde{H}_N({\varvec{n}}^{\ell }),{\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell }\right)_s&= \mathbb E\left[ Z^{\delta }_{\ell +1,s}(N^{\delta }_{\ell +1,s}-N^{\delta }_{\ell ,s})\right]\\&\quad + \sum _{j=0}^{\ell } d_{\ell ,j,s} \mathbb E\left[ N^{\delta }_{j-1,s}(N^{\delta }_{\ell +1,s}-N^{\delta }_{\ell ,s}) \right], \\ \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } R\left(\nabla \widetilde{H}_N({\varvec{n}}^{\ell +1}),{\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell }\right)_s&= \mathbb E\left[ Z^{\delta }_{\ell +2,s}(N^{\delta }_{\ell +1,s}-N^{\delta }_{\ell ,s})\right]\\&\quad + \sum _{j=0}^{\ell +1} d_{\ell +1,j,s} \mathbb E\left[ N^{\delta }_{j-1,s}(N^{\delta }_{\ell +1,s}-N^{\delta }_{\ell ,s}) \right]. \end{aligned}$$

As \((N^{\delta }_k)_{k\in \mathbb Z^+}\) is a martingale, it follows that all right-most expectations vanish. Similarly it holds that

$$\begin{aligned} \mathbb E[Z_{\ell +2}^{\delta }(N_{\ell +1}^{\delta }-N_{\ell }^{\delta })]&=\mathbb E[Z_{\ell +1}^{\delta }(N_{\ell +1}^{\delta }-N_{\ell }^{\delta })] \\ \mathbb E[Z_{\ell }^{\delta }(N_{\ell +1}^{\delta }-N_{\ell }^{\delta })]&=0. \end{aligned}$$

We conclude that

$$\begin{aligned}&\mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } R\left(\nabla \widetilde{H}_N({\varvec{n}}^{\ell })+\nabla \widetilde{H}_N({\varvec{n}}^{\ell +1}),{\varvec{n}}^{\ell +1}-{\varvec{n}}^{\ell }\right)_s\\&\quad = 2\,\mathbb E[ (Z_{\ell +1,s}^{\delta }-Z^{\delta }_{\ell ,s})(N_{\ell +1,s}^{\delta }-N_{\ell ,s}^{\delta }) ] \\&\quad = 2\,u_{\ell ,s}^{\delta }\,{\mathbb {E}}[(Z_{\ell +1,s}^{\delta }-Z^{\delta }_{\ell ,s})^2] \\&\quad = 2\,\sqrt{ \Big ( \Phi _s(q^{\delta }_{\ell +1})-\Phi _s(q^{\delta }_{\ell }) \Big ) \cdot \Big (\xi ^s(\Phi (q_{\ell +1}^{\delta }))-\xi ^s(\Phi (q_{\ell }^{\delta })) \Big ) } . \end{aligned}$$

In the second equality we used that \(N_{\ell +1,s}^{\delta }-N_{\ell ,s}^{\delta }=u_{\ell ,s}^{\delta }(Z_{\ell +1,s}^{\delta }-Z^{\delta }_{\ell ,s})\) by (2.26), with \(u_{\ell ,s}^{\delta }\) the deterministic constant (2.23), while the last step used (2.23) and (2.29). Combining with [16, Lemma 3.7] on discrete approximation of the integral in \({\mathbb {A}}\) implies (2.34). \(\square \)
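The discrete approximation invoked at the end can be illustrated numerically: with the toy choices \(\Phi (q)=q\) and \(\xi (x)=x^2\) (so the integrand is \(\sqrt{2t}\); these choices and \(q_1=0.5\) are hypothetical, not from the paper), the sum of \(\sqrt{\Delta \Phi \cdot \Delta (\xi \circ \Phi )}\) over the grid approaches the integral as \(\delta \rightarrow 0\):

```python
# Riemann-sum approximation of int_{q1}^{1} sqrt(Phi'(t) (xi o Phi)'(t)) dt.
# Toy model: Phi(q) = q, xi(x) = x^2, so the integrand is sqrt(2 t).
import math

q1 = 0.5
# int_{q1}^{1} sqrt(2 t) dt = (2 sqrt(2) / 3) (1 - q1^{3/2})
exact = math.sqrt(2.0) * (2.0 / 3.0) * (1.0 - q1 ** 1.5)

def discrete_sum(delta):
    grid = [q1 + k * delta for k in range(round((1 - q1) / delta) + 1)]
    s = 0.0
    for ql, qn in zip(grid, grid[1:]):
        s += math.sqrt((qn - ql) * (qn ** 2 - ql ** 2))  # sqrt(dPhi * d(xi o Phi))
    return s

print(abs(discrete_sum(1e-4) - exact))   # small: the sum converges to the integral
```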

Proof of Theorem 1

We take \({\mathcal {A}}\) as in (2.27) for \({\underline{\ell }}\) a large constant depending on \(({\varepsilon },\xi ,h,\lambda )\). First,

$$\begin{aligned} {\mathbb {P}}[H_N({\mathcal {A}}(H_N))/N \ge {\textsf {ALG}}-{\varepsilon }/2] \ge 1-o_N(1) \end{aligned}$$
(2.36)

follows from combining Lemmas 2.5, 2.7 and the fact that (recall (2.32))

$$\begin{aligned} H_N({\mathcal {A}}(H_N))/N \simeq H_N({\varvec{n}}^{\overline{\ell }})/N+o_{{\mathbb {P}}}(1). \end{aligned}$$

Next, let \(K_N\subseteq {\mathscr {H}}_N\) be as in Proposition 1.6. We recall that \({\mathbb {P}}[H_N\in K_N]\ge 1-e^{-cN}\). Exactly as in [15, Theorem 10] it follows that there is a \(C({\varepsilon })\)-Lipschitz function \(\widetilde{\mathcal {A}}:{\mathscr {H}}_N\rightarrow {\mathbb {R}}^N\) such that \(\widetilde{\mathcal {A}}\) and \({\mathcal {A}}\) agree on \(K_N\). Moreover (1.6) and concentration of measure on Gaussian space imply that \(H_N(\widetilde{\mathcal {A}}(H_N))\) is \(O(N^{1/2})\)-sub-Gaussian. In light of (2.36) and since \({\mathbb {P}}[\widetilde{\mathcal {A}}(H_N)={\mathcal {A}}(H_N)]\ge {\mathbb {P}}[H_N\in K_N]\ge 1-e^{-cN}\), we deduce that

$$\begin{aligned} {\mathbb {P}}[H_N({\mathcal {A}}(H_N))/N \ge {\textsf {ALG}}-{\varepsilon }] \ge 1-e^{-cN}. \end{aligned}$$

This concludes the proof. \(\square \)

3 Extensions

3.1 Signed AMP

In our companion paper [17], we show that strictly super-solvable models have w.h.p. exactly \(2^r\) critical points, indexed by sign patterns \({\vec \Delta }\in \{\pm 1\}^r\) with the following physical meaning. Consider first the extreme case of a linear Hamiltonian, consisting only of an external field \(\varvec{h}= {\vec h}\diamond {\textbf {1}}\) in which all entries of \({\vec h}\) are nonzero. This model clearly has \(2^r\) critical points, which are the products of the maxima and minima in the spheres \(\{{\left\Vert{\varvec{x}}_s\right\Vert}_2^2 = \lambda _s N\}\) corresponding to each species \(s\in {\mathscr {S}}\), and the signs \({\vec \Delta }\) record whether the critical point is a maximum or minimum in each species. As explained in [17, Sect. 6.6], if a strictly super-solvable \(H_N\) is gradually deformed to a linear function (staying inside the strictly super-solvable phase), the critical points move stably, and over this process their Hessian eigenvalues do not cross zero. Thus, each critical point of \(H_N\) can also be associated with a sign pattern \({\vec \Delta }\).

We now show that the root-finding algorithm defined in Sect. 2.3 can be generalized to find all \(2^r\) critical points in a strictly super-solvable model. More precisely, it finds \(2^r\) approximate critical points, one in a neighborhood of each exact critical point of the model, from which the exact critical points can be computed by Newton’s method (see Remark 3.2). For general models, it finds \(2^r\) approximate critical points on the product of spheres with self-overlap \(\Phi (q_1)\). The restriction of \(H_N\) to this set, considered as a spin glass in its own right (see [16, Remark 1.2]) is a solvable model.

Fixing \({\vec \Delta }\in \{\pm 1\}^r\), the analogous iteration to (2.10) is:

$$\begin{aligned} \begin{aligned} {\varvec{w}}^{k+1}&= \nabla H_N(\varvec{m}^k) - {\vec b}_k({\vec \Delta }) \diamond \varvec{m}^{k-1} \\ {}&= \varvec{h}+ \nabla \widetilde{H}_N(\varvec{m}^k) - {\vec b}_k({\vec \Delta }) \diamond \varvec{m}^{k-1}; \\ \varvec{m}^k&= {\vec \Delta }\odot {\vec a}\diamond {\varvec{w}}^k \\ b_{k,s}({\vec \Delta })&\equiv \sum _{s'\in {\mathscr {S}}} \Delta _{s'} a_{s'} \partial _{s'}\xi ^s \big (\vec R(\varvec{m}^k,\varvec{m}^{k-1})\big ). \end{aligned} \end{aligned}$$
(3.1)

The change of sign does not affect the proofs or statements of Lemmas 2.3 and 2.4. Indeed, \(a_s^2\) only changes to \(\Delta _s^2 a_s^2\) in the proof of the former, which is no change at all. The generalization of Lemma 2.5 is as follows.

Lemma 3.1

$$\begin{aligned} \lim _{k\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty }\frac{H_N(\varvec{m}^k)}{N} = \sum _{s\in {\mathscr {S}}} \lambda _s \Delta _s \sqrt{ \Phi _s(q_1) \left(h_s^2+\xi ^s(\Phi (q_1))\right) } . \end{aligned}$$

Proof

The proof is similar to Lemma 2.5. The main calculation now becomes:

$$\begin{aligned} \int _0^1 \langle \varvec{m}^k,\nabla {\widetilde{H}}_N(t\varvec{m}^k)\rangle _N \text {d}t&\simeq \sum _{s\in {\mathscr {S}}} \lambda _s \int _0^1 R_s(\varvec{m}^k,\nabla {\widetilde{H}}_N(t\varvec{m}^k)) \text {d}t \\&\simeq \sum _{s\in {\mathscr {S}}} \lambda _s \int _0^1 \Big ( \Delta _s a_s \xi ^s(t\vec R^k) + t R^k_s \sum _{s'\in {\mathscr {S}}} \Delta _{s'} a_{s'}\partial _{s'}\xi ^s(\vec R^k) \Big ) ~\text {d}t \\&{\mathop {=}\limits ^{(2.20)}} \sum _{s\in {\mathscr {S}}} \lambda _s \int _0^1 \Big ( \Delta _s a_s \xi ^s(t\vec R^k) + t \Delta _s a_s \sum _{s'\in {\mathscr {S}}} R^k_{s'} \partial _{s'}\xi ^s(\vec R^k) \Big ) ~\text {d}t \\&= \sum _{s\in {\mathscr {S}}} \lambda _s \Delta _s a_s \int _0^1 \frac{\text {d}~}{\text {d}t} \left(t\, \xi ^s(t\, \vec R^k)\right) \text {d}t \\&= \sum _{s\in {\mathscr {S}}} \lambda _s \Delta _s a_s \xi ^s(\vec R^k). \end{aligned}$$

Moreover the external field \(\varvec{h}\) now contributes energy

$$\begin{aligned} \langle \varvec{h}, \varvec{m}^k\rangle _N \simeq \sum _{s\in {\mathscr {S}}} \lambda _s h_s{\mathbb {E}}[M^k_s] = \sum _{s\in {\mathscr {S}}} \lambda _s \Delta _s a_s h_s^2. \end{aligned}$$

Combining gives the desired statement. \(\square \)

Remark 3.1

One can sign the IAMP phase as well by redefining (2.26) to

$$\begin{aligned} {\varvec{n}}^{\ell +1}({\vec \Delta }) = {\varvec{n}}^{\ell }({\vec \Delta }) + {\vec \Delta }\odot u_{\ell }^{\delta } \diamond ({\varvec{z}}^{\ell +1}-{\varvec{z}}^{\ell }). \end{aligned}$$
(3.2)

The resulting output \({\varvec{n}}^{\overline{\ell }}({\vec \Delta })\) then achieves asymptotic energy (recall (1.8))

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty }\mathop {\mathrm {p-lim}}\limits _{N \rightarrow \infty }\frac{H_N\big ({\varvec{n}}^{\overline{\ell }}({\vec \Delta })\big )}{N} =\sum _{s\in {\mathscr {S}}} \lambda _s \Delta _s\left[ \sqrt{\Phi _s(q_1) (\xi ^s(\Phi (q_1)) + h_s^2)} + \int _{q_1}^1 \sqrt{\Phi '_s(q)(\xi ^s\circ \Phi )'(q)}~\text {d}q \right]. \end{aligned}$$
(3.3)

However it is unclear whether \({\varvec{n}}^{\overline{\ell }}({\vec \Delta })\) can be made to obey any notable properties. We will show that the signed outputs \(\varvec{m}^k({\vec \Delta })\) of the first phase above are approximate critical points for \(H_N\) (and in [17] that all near-critical points are close to one of them). By contrast, for the output of signed IAMP to be a critical point, \(\Phi \) must satisfy a signed version of the tree-descending ODE (2.3) in which the function \((\xi ^s \circ \Phi )'(q)\) is replaced by

$$\begin{aligned} \sum _{s'\in {\mathscr {S}}} \Delta _{s'} \partial _{s'}\xi ^s(\Phi (q)) \Phi _s'(q). \end{aligned}$$

Since this quantity appears inside a square root in (2.3), it is unclear when to expect solutions to exist. Furthermore the proof in [16] of well-posedness relies on positivity of coefficients (via Perron-Frobenius theory) and does not seem to generalize. Additionally, a solution would not seem to correspond to a maximizer of any variational problem as in (1.8). As a result we do not know how to prove a solution exists in the signed case. However if one takes as given a smooth function \(\Phi \) satisfying the signed tree-descending ODE, the iteration (3.2) starting from signed initialization \({\varvec{n}}^{{\underline{\ell }}}({\vec \Delta })=\varvec{m}^{{\underline{\ell }}}({\vec \Delta })+ \sqrt{\Phi (q_1+\delta )-\Phi (q_1)}\diamond {\varvec{g}}\) would produce an approximate critical point \({\varvec{n}}^{\overline{\ell }}({\vec \Delta })\) which still satisfies (3.3).

3.2 Gradient Computation and Connection to \(E_{\infty }\)

We now compute the gradient of the outputs, showing that \(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\) and \({\varvec{n}}^{\ell }\) (\({\underline{\ell }}\le \ell \le \overline{\ell }\)) are approximate critical points for the restriction of \(H_N\) to the products of r spheres with suitable radii passing through them. For \({\varvec{\sigma }}\) to be an approximate critical point means precisely that there exist coefficients \(\vec {A}\in {\mathbb {R}}^r\) such that

$$\begin{aligned} \Vert \nabla H_N({\varvec{\sigma }}) - \vec {A}\diamond {\varvec{\sigma }}\Vert _N \simeq 0. \end{aligned}$$

In our case, these coefficients will be given as follows. If \(\vec {1}\) is strictly sub-solvable (so \(q_1<1\)), define \(\vec {A}(q)\) for \(q\in [q_1,1]\) by

$$\begin{aligned} A_s(q)&\equiv f_s(q)^{-1} +\sum _{s'\in {\mathscr {S}}} f_{s'}(q) \partial _{s'}\xi ^s\big (\Phi (q)\big ), \end{aligned}$$
(3.4)
$$\begin{aligned} f_s(q)&\equiv \sqrt{\frac{\Phi '_s(q)}{(\xi ^s \circ \Phi )'(q)}}. \end{aligned}$$
(3.5)

Further define for \({\vec \Delta }\in \{-1,1\}^r\)

$$\begin{aligned} A_s(q_1;{\vec \Delta }) \equiv \Delta _s \sqrt{\frac{\xi ^s(\Phi (q_1))+h_s^2}{\Phi _s(q_1)}} + \sum _{s'\in {\mathscr {S}}} \Delta _{s'} \partial _{s'}\xi ^s\big (\Phi (q_1)\big ) \sqrt{ \frac{\Phi _{s'}(q_1)}{\xi ^{s'}(\Phi (q_1))+h_{s'}^2} }. \end{aligned}$$
(3.6)

Note that, by (2.2), this is consistent with the definition of \(\vec {A}(q_1)\) above, in the sense that \(\vec {A}(q_1;\vec {1}) = \vec {A}(q_1)\). We take this to be the definition of \(\vec {A}(q_1)\) if \(\vec {1}\) is super-solvable (and \(q_1=1\)).

Proposition 3.2

If \(\Phi \) is a pseudo-maximizer for \({\mathbb {A}}\) (recall Definition 2.1) then for any \({\vec \Delta }\in \{\pm 1\}^r\),

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \Vert \nabla H_N(\varvec{m}^{{\underline{\ell }}}({\vec \Delta }))- \vec {A}(q_1;{\vec \Delta }) \diamond \varvec{m}^{{\underline{\ell }}}({\vec \Delta })\Vert _N = 0 . \end{aligned}$$
(3.7)

Proof

Recall from Lemma 2.3 (which holds without modification for general \({\vec \Delta }\)) that

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty }\mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \Vert \varvec{m}^{{\underline{\ell }}+1}({\vec \Delta })-\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\Vert _N =0. \end{aligned}$$
(3.8)

Thus rearranging (3.1) yields

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty }\mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \Vert \nabla H_N(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })) - ({\vec \Delta }\odot {\vec a}^{-1} + {\vec b}_{{\underline{\ell }}}({\vec \Delta })) \diamond \varvec{m}^{{\underline{\ell }}}({\vec \Delta }) \Vert _N =0. \end{aligned}$$

Since \(\lim _{{\underline{\ell }}\rightarrow \infty } \big (\Delta _s a_s^{-1}+b_{{\underline{\ell }},s}({\vec \Delta })\big )=A_s(q_1;{\vec \Delta })\) by (3.6), the result follows. \(\square \)

Remark 3.2

In [17, Theorems 1.5 and 1.6], we show that when \(\xi \) is strictly super-solvable, \(H_N\) has exactly \(2^r\) critical points \(\{{\varvec{x}}({\vec \Delta })\}_{{\vec \Delta }\in \{-1,1\}^r}\). Moreover all \({\varepsilon }\)-approximate critical points with Riemannian gradient \(\Vert \nabla _{{\text {sp}}}H_N({\varvec{x}})\Vert \le {\varepsilon }\sqrt{N}\) are within \(o_{{\varepsilon }}(\sqrt{N})\) of some \({\varvec{x}}({\vec \Delta })\). It follows from Proposition 3.2 that each \(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\) is an \({\varepsilon }\)-approximate critical point for large enough \({\underline{\ell }}={\underline{\ell }}(\xi ,{\varepsilon })\). In fact the preceding gradient computation shows that the values \({\vec \Delta }\) agree, implying that \(\Vert \varvec{m}^{{\underline{\ell }}}({\vec \Delta })-{\varvec{x}}({\vec \Delta })\Vert _N \le o_{{\underline{\ell }}\rightarrow \infty }(1)\) (compare with [17, Definition 5, Eq. (1.15)]). Moreover by [17, Theorem 1.6] each Riemannian Hessian \(\nabla ^2_{{\text {sp}}}H_N({\varvec{x}}({\vec \Delta }))\) has condition number at least \(1/C(\xi )\). It follows that each critical point \({\varvec{x}}({\vec \Delta })\) can be efficiently computed to arbitrary accuracy by applying Newton’s method from \(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\) for a large enough \({\underline{\ell }}={\underline{\ell }}(\xi )\). (By contrast, the convergence of \(\varvec{m}^{{\underline{\ell }}}({\vec \Delta })\) itself to \({\varvec{x}}({\vec \Delta })\) is only in the careful double-limit sense \(\lim _{{\underline{\ell }}\rightarrow \infty }\lim _{N\rightarrow \infty }\).)
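The Newton refinement in the remark can be illustrated in one dimension: starting from an approximate critical point with a nondegenerate Hessian, Newton's method applied to the gradient converges quadratically. A toy sketch (the landscape \(x^4/4-x^2/2\) and the starting point are purely illustrative):

```python
# Toy landscape: H(x) = x**4/4 - x**2/2 has critical points at -1, 0, 1.
def grad(x):
    return x**3 - x

def hess(x):
    return 3 * x**2 - 1

# Start from an "approximate critical point" near x = 1 (playing the role
# of an AMP output) and refine by Newton's method on the gradient.
x = 0.9
errors = []
for _ in range(6):
    x -= grad(x) / hess(x)
    errors.append(abs(x - 1.0))

print(x)       # -> 1.0 to machine precision
print(errors)  # errors roughly square at each step: quadratic convergence
```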

Proposition 3.3

If \(\Phi \) is a pseudo-maximizer for \({\mathbb {A}}\), then for any \({\underline{\ell }}\)-indexed sequence \((q_*,\ell )_{{\underline{\ell }}\ge 1}\) such that \(q_*\in [q_1,1]\), \({\underline{\ell }}\le \ell \le \overline{\ell }\) and \(\lim _{{\underline{\ell }}\rightarrow \infty }|q_*-q_{\ell }^{\delta }|=0\), we have

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \left\Vert\nabla H_N({\varvec{n}}^{\ell })-\vec {A}(q_*) \diamond {\varvec{n}}^{\ell } \right\Vert_N = 0. \end{aligned}$$

Proof

For notational convenience we assume \((q_*,\ell )=(1,\overline{\ell })\); the proof is identical in general. Recall the rearrangement (2.35):

$$\begin{aligned} \nabla \widetilde{H}_N({\varvec{n}}^{\overline{\ell }}) = {\varvec{z}}^{\overline{\ell }+1}+\sum _{j=0}^{\overline{\ell }} d_{\overline{\ell }, j}\diamond {\varvec{n}}^{j-1} . \end{aligned}$$
(3.9)

So far we did not have to compute \(d_{\overline{\ell }, j}\). We do this now, focusing on the IAMP phase. Recalling (2.25), the IAMP iteration used non-linearity

$$\begin{aligned} \varvec{f}_{\overline{\ell }}&= {\varvec{n}}^{\overline{\ell }} = {\varvec{n}}^{{\underline{\ell }}}+ \sum _{j={\underline{\ell }}}^{\overline{\ell }-1} ({\varvec{n}}^{j+1}-{\varvec{n}}^{j}) \\&= {\vec a}\diamond {\varvec{z}}^{{\underline{\ell }}}+ \sum _{j={\underline{\ell }}}^{\overline{\ell }-1} {\varvec{u}}^{\delta }_{j} \diamond ({\varvec{z}}^{j+1}-{\varvec{z}}^{j}) \\&= ({\vec a}- {\varvec{u}}_{{\underline{\ell }}}^{\delta }) \diamond {\varvec{z}}^{{\underline{\ell }}} + {\varvec{u}}_{\overline{\ell }-1}^{\delta }\diamond {\varvec{z}}^{\overline{\ell }} - \sum _{j={\underline{\ell }}+1}^{\overline{\ell }-1} ({\varvec{u}}^{\delta }_{j}-{\varvec{u}}^{\delta }_{j-1}) \diamond {\varvec{z}}^{j} . \end{aligned}$$

Using the formula (2.7) we find

$$\begin{aligned} d_{\overline{\ell },j,s}&\approx {\left\{ \begin{array}{ll} \sum _{s'\in {\mathscr {S}}} \partial _{s'}\xi ^s(\Phi (1)) ~ u^{\delta }_{\overline{\ell },s'} ,\quad \quad \quad \quad \quad \quad j=\overline{\ell }; \\ - \sum _{s'\in {\mathscr {S}}} \partial _{s'}\xi ^s(\Phi (q_{j-1}^{\delta })) ~ (u^{\delta }_{j,s'}-u^{\delta }_{j-1,s'}) ,\quad {\underline{\ell }}<j<\overline{\ell }; \\ \sum _{s'\in {\mathscr {S}}} \partial _{s'}\xi ^s(\Phi (q_1)) ~ (a_{s'}-u^{\delta }_{{\underline{\ell }},s'}) ,\quad \quad \quad j={\underline{\ell }}. \end{array}\right. } \end{aligned}$$

Note that since \(\Phi \in C^2([q_1,1])\) we have the uniform-in-\(q_j^{\delta }\) approximations (recall (3.5)):

$$\begin{aligned} \begin{aligned} u^{\delta }_{j,s}&\approx f_s(q_j^{\delta }), \\ \frac{u^{\delta }_{j,s}-u^{\delta }_{j-1,s}}{\delta }&\approx \frac{\text {d}}{\text {d}q} \sqrt{\frac{\Phi '_s(q)}{(\xi ^s \circ \Phi )'(q)}} \, \Bigg |_{q=q_j^{\delta }}, \\ a_s\approx u_{{\underline{\ell }},s}^{\delta }&\approx f_s(q_{{\underline{\ell }}}^{\delta })\approx f_s(q_1) . \end{aligned} \end{aligned}$$
(3.10)

Substituting into (3.9), we obtain

$$\begin{aligned} \nabla H_N({\varvec{n}}^{\overline{\ell }})= & {} \varvec{h}+{\varvec{z}}^{\overline{\ell }+1} + \sum _{j={\underline{\ell }}}^{\overline{\ell }} d_{\overline{\ell }, j}\diamond {\varvec{n}}^{j-1}\nonumber \\= & {} {\vec a}^{-1}\diamond \varvec{m}^{{\underline{\ell }}} + \sum _{j={\underline{\ell }}}^{\overline{\ell }} u^{\delta }_j \diamond ({\varvec{n}}^{j+1}-{\varvec{n}}^j) + \sum _{j={\underline{\ell }}}^{\overline{\ell }} d_{\overline{\ell }, j}\diamond {\varvec{n}}^{j-1}\nonumber \\\approx & {} \Big ( {\vec a}^{-1} + \sum _{j={\underline{\ell }}}^{\overline{\ell }} d_{\overline{\ell },j} \Big ) \diamond {\varvec{n}}^{{\underline{\ell }}} + \sum _{j={\underline{\ell }}}^{\overline{\ell }-1} \vec {C}_j\diamond ({\varvec{n}}^{j+1}-{\varvec{n}}^j) ;\nonumber \\ C_{j,s}\equiv & {} a_s^{-1} + \sum _{k=j+1}^{{\overline{\ell }}} d_{\overline{\ell },k,s}\nonumber \\{} & {} {\mathop {\approx }\limits ^{(3.10)}} f_s(q_j^{\delta })^{-1} + \sum _{s'\in {\mathscr {S}}} \left( \partial _{s'} \xi ^s(\Phi (1)) f_{s'}(1) - \int _{q_j^{\delta }}^1 \partial _{s'}\xi ^s(\Phi (q)) \, f_{s'}'(q)~\text {d}q \right)\nonumber \\\equiv & {} \widehat{C}_s(q_j^{\delta }) . \end{aligned}$$
(3.11)
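The passage from the discrete coefficients \(d_{\overline{\ell },j,s}\) to the integral in (3.11) is the convergence of a Riemann-Stieltjes sum: for smooth g, f one has \(\sum _j g(q_j)\,(f(q_{j+1})-f(q_j))\rightarrow \int g(q)f'(q)\,\text {d}q\) as \(\delta \rightarrow 0\). A self-contained numerical check with illustrative stand-ins for g and f:

```python
import math

# Illustrative stand-ins for q -> partial_{s'} xi^s(Phi(q)) and q -> f_{s'}(q).
g = math.exp
f = math.sin
q1, Q = 0.3, 1.0

def stieltjes_sum(delta):
    # sum_j g(q_j) * (f(q_{j+1}) - f(q_j)) over the grid q_1, q_1 + delta, ...
    n = round((Q - q1) / delta)
    return sum(g(q1 + j * delta) * (f(q1 + (j + 1) * delta) - f(q1 + j * delta))
               for j in range(n))

# Reference value of int_{q1}^{Q} g(q) f'(q) dq by a fine midpoint rule
# (here f' = cos).
M = 100000
h = (Q - q1) / M
exact = sum(g(q1 + (k + 0.5) * h) * math.cos(q1 + (k + 0.5) * h) * h
            for k in range(M))

err_coarse = abs(stieltjes_sum(0.1) - exact)
err_fine = abs(stieltjes_sum(0.001) - exact)
print(err_coarse, err_fine)  # the error shrinks roughly linearly in delta
```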

Since the increments \(({\varvec{n}}^{j+1}-{\varvec{n}}^j)\) are orthogonal in the state evolution sense, it easily follows that the approximation of \(C_{j,s}\) by \(\widehat{C}_{s}(q_j^{\delta })\) commutes with summation, i.e.

$$\begin{aligned} \nabla H_N({\varvec{n}}^{\overline{\ell }}) \approx \Big ( {\vec a}^{-1} + \sum _{j={\underline{\ell }}}^{\overline{\ell }} d_{\overline{\ell },j} \Big ) \diamond {\varvec{n}}^{{\underline{\ell }}} + \sum _{j={\underline{\ell }}}^{\overline{\ell }-1} \widehat{C}(q_j^{\delta })\diamond ({\varvec{n}}^{j+1}-{\varvec{n}}^j) \end{aligned}$$

Note that we manifestly have \(\widehat{C}(1)=\vec {A}(1)\). We claim the function \(\widehat{C}\) is constant on \([q_1,1]\). This is equivalent to showing that for each s the function

$$\begin{aligned} F_s(q) = \frac{1}{f_s(q)} + \left( \int _{q_1}^q \sum _{s'\in {\mathscr {S}}} \partial _{s'}\xi ^s(\Phi (t)) f_{s'}'(t) ~\text {d}t \right) \end{aligned}$$

is constant. Differentiating, it suffices to show

$$\begin{aligned} \sum _{s'\in {\mathscr {S}}} \partial _{s'}\xi ^s(\Phi (q)) f_{s'}'(q) {\mathop {=}\limits ^{?}} f_s'(q)/f_s(q)^2 . \end{aligned}$$
(3.12)

Write \(f_{s'}'(q)=\Psi (q)\Phi _{s'}'(q)\), where \(\Psi \) is independent of \(s'\) since \(\Phi \) solves the tree-descending ODE (2.3). Then using the chain rule, the left-hand side of (3.12) equals

$$\begin{aligned} \Psi (q) \sum _{s'\in {\mathscr {S}}} \partial _{s'}\xi ^s(\Phi (q))\cdot \Phi _{s'}'(q) =\Psi (q)(\xi ^s\circ \Phi )'(q). \end{aligned}$$

Meanwhile the right-hand side of (3.12) is

$$\begin{aligned} f_s'(q)/f_s(q)^2 = \Psi (q)\Phi _{s}'(q) \cdot \frac{(\xi ^s\circ \Phi )'(q)}{\Phi _s'(q)} = \Psi (q)(\xi ^s\circ \Phi )'(q). \end{aligned}$$

Therefore \(\widehat{C}(q)=\vec {A}(1)\) is constant as claimed. Finally it is clear that the \({\varvec{n}}^{{\underline{\ell }}}\) coefficient in (3.11) approximately equals \(\widehat{C}(q_1)\) and hence also \(\vec {A}(1)\). Then (3.11) implies

$$\begin{aligned} \nabla H_N({\varvec{n}}^{\overline{\ell }}) \approx \vec {A}(1)\diamond \left( {\varvec{n}}^{{\underline{\ell }}}+\sum _{j={\underline{\ell }}}^{\overline{\ell }-1} ({\varvec{n}}^{j+1}-{\varvec{n}}^j)\right) =\vec {A}(1)\diamond {\varvec{n}}^{\overline{\ell }}, \end{aligned}$$

which completes the proof. \(\square \)
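As a sanity check on (3.12): in the single-species case \(r=1\) the identity holds for any smooth increasing \(\Phi \), since then \(f=(\xi '\circ \Phi )^{-1/2}\) and both sides equal \(\xi '(\Phi )f'\). The following sketch (with an arbitrary illustrative \(\xi \) and \(\Phi \), chosen for demonstration only) verifies numerically that the corresponding function \(F\) is constant:

```python
import math

# Illustrative single-species data (no ODE needs to be solved when r = 1).
def xi_prime(x):          # xi(x) = x^2 + x^3, so xi'(x) = 2x + 3x^2
    return 2 * x + 3 * x**2

def Phi(q):               # a smooth increasing path, chosen arbitrarily
    return 0.2 + 0.8 * q**2

def f(q):                 # f = (xi' o Phi)^{-1/2} in the r = 1 case
    return 1.0 / math.sqrt(xi_prime(Phi(q)))

def F(q, q1=0.3, n=20000):
    # F(q) = 1/f(q) + int_{q1}^{q} xi'(Phi(t)) f'(t) dt
    # (midpoint rule; f' by central differences)
    h = (q - q1) / n
    eps = 1e-6
    integral = sum(
        xi_prime(Phi(t)) * (f(t + eps) - f(t - eps)) / (2 * eps) * h
        for t in (q1 + (k + 0.5) * h for k in range(n))
    )
    return 1.0 / f(q) + integral

vals = [F(q) for q in (0.5, 0.7, 0.9, 1.0)]
print(vals)  # all four values coincide up to quadrature error
```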

From the point of view of [16], the fact that \(\Vert \nabla _{{\text {sp}}} H_N({\varvec{n}}^{\overline{\ell }})\Vert _N\approx 0\) is to be expected. At least for \((\Phi ;q_1)\) maximizing \({\mathbb {A}}\), if this were not true then an extra step of gradient descent would essentially suffice to reach energy strictly better than \({\textsf {ALG}}\), contradicting the optimality in [16, Theorem 1]. However the radial derivative computation is interesting in its own right and lets us study the spherical Hessian around an output \({\varvec{\sigma }}\). We believe that Corollary 3.4 can be strengthened to hold with \({\varvec{\lambda }}_1\) rather than \({\varvec{\lambda }}_{{\varepsilon }N}\). This seems to require a more precise Gaussian conditioning argument around \({\mathcal {A}}(H_N)\) which we chose not to pursue.

Corollary 3.4

With \({\varvec{\lambda }}_k\) the k-th largest eigenvalue of a symmetric real matrix,

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty ,{\varepsilon }\rightarrow 0} \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } {\varvec{\lambda }}_{{\varepsilon }N} \left( \nabla ^2_{{\text {sp}}} H_N({\varvec{n}}^{\overline{\ell }}) \right) =0. \end{aligned}$$
(3.13)

Proof

Fixing \(\vec {A}=\vec {A}(1)\), the bulk spectral measure of

$$\begin{aligned} {\varvec{W}}({\varvec{x}})=\nabla ^2 H_N({\varvec{x}}) - \vec {A}\diamond {\varvec{x}}\end{aligned}$$
(3.14)

for deterministic \({\varvec{x}}\in {\mathcal {S}}_N\) concentrates with rate function \(N^2\) around a limiting spectral measure independent of \({\varvec{x}}\). By union-bounding over a \(\delta \sqrt{N}\)-net as in [26, Proof of Lemma 3], it thus suffices to show (3.13) at a point \({\varvec{x}}\in {\mathcal {S}}_N\) independent of \(H_N\), with \({\varvec{W}}({\varvec{x}})\) in place of \(\nabla ^2_{{\text {sp}}} H_N({\varvec{n}}^{\overline{\ell }})\). This is purely a statement of random matrix theory and is shown in [17, Proposition 5.18]. \(\square \)

Notably Corollary 3.4 explains the equality \({\textsf {ALG}}=E_{\infty }\) for pure models, which we derived manually in [16]. Indeed for a pure model with \(\xi =\prod _{i=1}^r x_i^{a_i}\), the energy and radial derivative are deterministically proportional:

$$\begin{aligned} \nabla _{{\text {rad}}} H_N({\varvec{x}}) = -H_N({\varvec{x}}){\vec a}\diamond {\varvec{x}},\quad \forall {\varvec{x}}\in {\mathcal {B}}_N. \end{aligned}$$

It follows (using again the \(N^2\) large deviation rate for the spectral bulk) that there is a unique energy level \(E_{\infty }\) at which critical points can have spherical Hessian obeying the conclusion of Corollary 3.4. This is the definition of \(E_\infty \) given in [1, 20].
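The deterministic proportionality above rests on Euler's identity for homogeneous functions: if H is homogeneous of degree p, then \(\langle x, \nabla H(x)\rangle = pH(x)\). A quick single-species check on a random homogeneous cubic (all names and values illustrative):

```python
import random

random.seed(1)
n, p = 5, 3
# Random homogeneous cubic H(x) = sum_{i<=j<=k} c_{ijk} x_i x_j x_k.
idx = [(i, j, k) for i in range(n) for j in range(i, n) for k in range(j, n)]
coef = {t: random.gauss(0, 1) for t in idx}

def H(x):
    return sum(c * x[i] * x[j] * x[k] for (i, j, k), c in coef.items())

def gradH(x, eps=1e-6):
    # central finite differences (truncation error O(eps^2) for a cubic)
    g = []
    for a in range(n):
        xp, xm = list(x), list(x)
        xp[a] += eps
        xm[a] -= eps
        g.append((H(xp) - H(xm)) / (2 * eps))
    return g

x = [random.gauss(0, 1) for _ in range(n)]
g = gradH(x)
lhs = sum(x[a] * g[a] for a in range(n))  # <x, grad H(x)>
rhs = p * H(x)                            # Euler's identity for degree p
print(lhs, rhs)
```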

3.3 Branching IAMP and Exponential Concentration

Here we modify the second stage of our IAMP algorithm (which requires \({\vec \Delta }=\vec {1}\)) to use external Gaussian randomness in a small number of increment steps. This allows the construction of an ultrametric tree of outputs with large constant depth and \(\exp (cN)\) breadth, with pairwise overlaps given by \(\Phi \). More precisely, for any finite ultrametric space \(X=(x_1,\ldots ,x_M)\), \(M=\exp (cN)\), of diameter at most \(1-q_1\), branching IAMP outputs \(({\varvec{\sigma }}_1,\ldots ,{\varvec{\sigma }}_M)\) with

$$\begin{aligned} \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \max _{1\le i,j\le M}{\Vert \vec R({\varvec{\sigma }}_i,{\varvec{\sigma }}_j) - \Phi \big (1-d_X(x_i,x_j)\big ) \Vert }_{\infty }=0. \end{aligned}$$

We use an approach suggested in [3] by injecting external Gaussian noise \({\varvec{g}}^{(i)}\) into the IAMP phase of the algorithm at depth \(q_i\in (q_1,1)\). Importantly, this gives an explicit construction of \(\exp (cN)\) approximate critical points of \(H_N\) (with exponentially good probability) whenever there is an IAMP phase. A similar construction was used by one of us in [24, Sect. 4]. There the Gaussian noise was constructed artificially by preliminary iterates of AMP rather than from exogenous noise (due to the lack of a state evolution result incorporating independent Gaussian vectors). This only enabled the construction of a large constant number of outputs rather than exponentially many.

Our branching IAMP proceeds as follows. We first apply Stage \(\text {I}\) with \({\vec \Delta }=\vec {1}\) as before. We fix \(q_1<q_2<\dots <q_m=1\) and let

$$\begin{aligned} \ell ^{\delta }_{q_i}={\underline{\ell }}+\left\lceil \frac{q_i-q_1}{\delta }\right\rceil +1, \quad i\in [m]. \end{aligned}$$

We define \({\varvec{n}}^{\ell }\) with the same recursive formula as before, unless \(\ell =\ell ^{\delta }_{q_i}\) for some \(i\in [m]\). For these cases, we define \({\varvec{g}}^{(1)},\ldots ,{\varvec{g}}^{(m)}\sim {\mathcal {N}}(0,{\textbf {1}}_N)\) to be independent standard Gaussian vectors. Then we set:

$$\begin{aligned} {\varvec{n}}^{\ell +1} = {\left\{ \begin{array}{ll} {\varvec{n}}^{\ell } + \sqrt{\xi ^s\big (\Phi (q^{\delta }_{\ell _{q_i}^{\delta }+1})\big )-\xi ^s\big (\Phi (q^{\delta }_{\ell _{q_i}^{\delta }})\big )} \diamond {\varvec{g}}^{(i)}, \quad \quad \quad \ell = \ell ^{\delta }_{q_i}\quad \text {for some } i\in [m] \\ {\varvec{n}}^{\ell }+ u_{\ell }^{\delta }\diamond \left( {\varvec{z}}^{\ell +1}-{\varvec{z}}^{\ell } \right) , \quad \text { else}. \end{array}\right. } \end{aligned}$$
(3.15)

The definition (3.15) naturally enables couplings for pairs of iterations. We say the iterations \(\big ({\varvec{n}}^{\ell ,1},{\varvec{n}}^{\ell ,2}\big )_{\ell \ge 1}\) are \(q_j\)-coupled if their associated Gaussian vectors

$$\begin{aligned} {\varvec{g}}^{(1,1)},\ldots ,{\varvec{g}}^{(m,1)}&\sim {\mathcal {N}}(0,{\textbf {1}}_N), \\ {\varvec{g}}^{(1,2)},\ldots ,{\varvec{g}}^{(m,2)}&\sim {\mathcal {N}}(0,{\textbf {1}}_N) \end{aligned}$$

are coupled so that \({\varvec{g}}^{(i,1)}={\varvec{g}}^{(i,2)}\) almost surely for \(i<j\), and the variables are otherwise independent.

Proposition 3.5

Let the iterations \({\varvec{n}}^{\ell ,1},{\varvec{n}}^{\ell ,2}\) be \(q_j\)-coupled as above, and let \(\Phi \) be a pseudo-maximizer of \({\mathbb {A}}\) (recall Definition 2.1). Then

$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \frac{H_N({\varvec{n}}^{\overline{\ell },a}_{\delta })}{N}&= {\mathbb {A}}(\Phi ),\quad a\in \{1,2\} ; \end{aligned}$$
(3.16)
$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \left\Vert\nabla H_N({\varvec{n}}^{\ell ,a})-\vec {A}(1) \diamond {\varvec{n}}^{\ell ,a} \right\Vert_N&= 0,\quad a\in \{1,2\} ; \end{aligned}$$
(3.17)
$$\begin{aligned} \lim _{{\underline{\ell }}\rightarrow \infty } \mathop {\mathrm {p-lim}}\limits _{N\rightarrow \infty } \vec R\big ( {\varvec{n}}^{\overline{\ell },1}_{\delta } , {\varvec{n}}^{\overline{\ell },2}_{\delta } \big )&= \Phi (q_j^{\delta }) . \end{aligned}$$
(3.18)

Proof

The analysis uses the slightly generalized state evolution given in Theorem 2, which states that (2.8) continues to hold even in the presence of external randomness \({\varvec{g}}^{(i)}\). Modulo this point, the calculations are essentially identical. Indeed [24] uses exactly the same calculations to analyze a slightly different formulation of branching IAMP (therein, the vectors \({\varvec{g}}\) are defined via negatively time-indexed AMP iterates to sidestep the lack of a generalized state evolution result). We therefore give only an outline below.

The SDE description in (2.6) is unchanged if one uses the slightly added generality of Theorem 2 to incorporate the external Gaussian noise. (This Gaussian noise is scaled in (3.15) to achieve exactly the same effect as a usual iteration step.) The energy analysis of \(H_N({\varvec{n}}^{\overline{\ell }})\) changes only on the m modified steps, which have negligible effect since \(\delta \rightarrow 0\) as \({\underline{\ell }}\rightarrow \infty \); similarly for \(\nabla H_N({\varvec{n}}^{\overline{\ell }})\). Thus (3.16) and (3.17) follow by the same proofs as before. The proof of (3.18) is identical to [24, Sect. 8]. \(\square \)

In Proposition 3.6 we observe that concentration of measure implies Proposition 3.5 holds with exponentially high probability. Thus we can couple together \(\exp (cN)\) branching IAMPs to construct a full ultrametric tree of large constant depth m and breadth \(\exp (cN)\). To do this, we fix m, take \({\underline{\ell }}\) sufficiently large and then \(\eta >0\) sufficiently small. Then with \(K=\exp (\eta N)\), we consider a complete depth m rooted tree \({\mathcal {T}}\), with root defined to have depth 1, such that each vertex at depths \(1,\ldots ,m-1\) has K children. Thus the leaf-set \(L({\mathcal {T}})\) is naturally indexed by \([K]^m\). For \(v,v'\in L({\mathcal {T}})\) we let \(v\wedge v'\in \{1,2,\ldots ,m\}\) denote the height of their least common ancestor. For each non-leaf \(x\in V({\mathcal {T}})\), label the edge from x to its parent by an i.i.d. Gaussian vector \({\varvec{g}}^{(x)}\sim {\mathcal {N}}(0,I_N)\). Then for each leaf \(v\in L({\mathcal {T}})\), using the m Gaussian vectors along the path from the root of \({\mathcal {T}}\) to v yields a branching IAMP output \({\varvec{\sigma }}^{(v)}\) for any \(H_N\).
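The overlap structure of this tree construction can be sketched in isolation: attach an independent Gaussian vector to each edge, give depth d the increment variance \(\Phi (q_d)-\Phi (q_{d-1})\) (taken scalar here for simplicity), and sum along root-to-leaf paths; pairwise overlaps then concentrate around \(\Phi \) evaluated at the meeting depth, as in (3.19). A toy version with hypothetical numerical values:

```python
import itertools
import math
import random

random.seed(0)
N = 8000                     # ambient dimension (toy scale)
m, K = 3, 2                  # tree depth and branching factor
Phi = [0.3, 0.6, 0.85, 1.0]  # Phi[0] plays the role of Phi(q_1); Phi[m] = Phi(1)

def gauss_vec():
    return [random.gauss(0, 1) for _ in range(N)]

base = gauss_vec()  # shared Stage-I part, variance Phi[0]
# One independent Gaussian vector per tree edge, keyed by the path prefix.
edge = {pre: gauss_vec()
        for d in range(1, m + 1)
        for pre in itertools.product(range(K), repeat=d)}

def leaf_vector(leaf):
    v = [math.sqrt(Phi[0]) * x for x in base]
    for d in range(1, m + 1):
        sd = math.sqrt(Phi[d] - Phi[d - 1])
        g = edge[leaf[:d]]
        v = [v[i] + sd * g[i] for i in range(N)]
    return v

leaves = list(itertools.product(range(K), repeat=m))
vecs = {lf: leaf_vector(lf) for lf in leaves}

def overlap(u, v):
    return sum(a * b for a, b in zip(u, v)) / N

def meet(u, v):
    d = 0
    while d < m and u[d] == v[d]:
        d += 1
    return d  # length of the common path prefix

# Pairwise overlaps concentrate around Phi evaluated at the meeting depth.
err = max(abs(overlap(vecs[u], vecs[v]) - Phi[meet(u, v)])
          for u in leaves for v in leaves if u != v)
print(err)  # O(1/sqrt(N)) fluctuation
```

Here m, K, and N are of course far smaller than in the actual construction (where \(K=\exp (\eta N)\)); the point is only the ultrametric overlap pattern.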

Proposition 3.6

Proposition 3.5 holds with exponentially good probability in the following sense. Fix m and \(q_1<q_2<\dots <q_m=1\). For any \({\varepsilon }>0\), for large enough \({\underline{\ell }}\) there exists \(\eta =\eta ({\varepsilon },{\underline{\ell }})>0\) such that for N large enough, the following hold simultaneously across all \(v,v'\in L({\mathcal {T}})\) with probability at least \(1-\exp (-\eta N)\):

$$\begin{aligned} \begin{aligned} |{\mathbb {A}}(\Phi ) - H_N({\varvec{n}}^{\overline{\ell },v})/N |&\le {\varepsilon }; \\ \Vert \nabla H_N({\varvec{n}}^{\overline{\ell },v})- \vec {A}(1) \diamond {\varvec{n}}^{\overline{\ell },v}\Vert _N&\le {\varepsilon }; \\ \left\Vert\vec R({\varvec{n}}^{\overline{\ell },v},{\varvec{n}}^{\overline{\ell },v'}) - \Phi (q_{v\wedge v'}) \right\Vert_{\infty }&\le {\varepsilon }. \end{aligned} \end{aligned}$$
(3.19)

Proof

As explained in [15, Sect. 8], the map \(H_N\mapsto {\varvec{n}}^{\overline{\ell }}\) agrees with a \(C({\underline{\ell }})\)-Lipschitz function of the coefficients \({\varvec{G}}^{(k)}\) of \(H_N\) except with probability \(\exp (-cN)\). The same proof applies for \(H_N\mapsto {\varvec{n}}^{\overline{\ell },v}\) as well since the external noise variables are also Gaussian. Concentration of measure on Gaussian space now ensures that the statements above hold with exponentially high probability for each fixed \((v,v')\). Union bounding over all such pairs for small enough \(\eta \) implies the result. \(\square \)

In particular, the last conclusion in (3.19) shows that all \(\exp (\eta N)\) constructed points have pairwise distance at least \(\delta \sqrt{N}\) for \(0<\delta <1-q_{m-1}\). Thus for any sub-solvable model, with high probability there are exponentially many \(\sqrt{N}/C(\xi )\)-separated approximate critical points. This is a converse to the main result of [17], where we show that strictly super-solvable models enjoy a strong topological trivialization property which rules out such behavior.

Remark 3.3

An alternative to branching IAMP, which is very natural from the point of view of our companion work [16], is to slightly perturb \(H_N\) to a \((1-\eta )\)-correlated function \(H_N^{(\eta )}\). Concentration of measure implies that the overlap

$$\begin{aligned} \vec R\big ({\mathcal {A}}(H_N),{\mathcal {A}}(H_N^{(\eta )})\big ) \end{aligned}$$

concentrates exponentially around a limiting value \(R_{\delta ,{\underline{\ell }},\eta }\in {\mathbb {R}}^r\). We expect that taking \(\eta \rightarrow 0\) with \(\delta ,{\underline{\ell }}\) in a suitable way enables \(R_{\delta ,{\underline{\ell }},\eta }\approx \Phi (q)\) for any desired \(q\in [q_1,1]\). This corresponds to the behavior of \(p(q)\) for \(q\in [q_1,1]\) for any \((p,\Phi ;q_0)\) maximizing \({\mathbb {A}}\). However this approach seems more cumbersome to analyze explicitly.

Remark 3.4

The construction in this section shows the quenched existence of \(\exp (\eta N)\) well-separated approximate critical points for strictly sub-solvable models. In [17, Theorem 5.15] we use this fact to prove the number of exact critical points is exponentially large in expectation. However we are unable to prove the quenched (i.e. high-probability) existence of \(\exp (\eta N)\) exact critical points in strictly sub-solvable models. Showing that this is the case, or more generally identifying the quenched exponential order of the number of critical points, is an interesting direction for future work.