1 Introduction

This paper is concerned with the independence ratio of random regular graphs. A graph is said to be regular if each vertex has the same degree. For a fixed degree d, let \(\mathbb {G}(N,d)\) be a uniform random d-regular graph on N vertices. Note that \(\mathbb {G}(N,d)\) has a “trivial local structure” in the sense that with high probability almost all vertices have the same local neighborhood: \(\mathbb {G}(N,d)\) almost surely converges locally to the d-regular tree \(T_d\) as \(N \rightarrow \infty \). In statistical physics \(T_d\) is also known as the Bethe lattice. In fact, Mézard and Parisi used the expression Bethe lattice to refer to \(\mathbb {G}(N,d)\) (see [21] for example), and proposed to study various models on these random graphs.

In a graph, an independent set is a set of vertices, no two of which are adjacent, that is, the induced subgraph has no edges. The independence ratio of a graph is the size of its largest independent set normalized by the number of vertices. For any fixed degree \(d \ge 3\), the independence ratio of \(\mathbb {G}(N,d)\) is known to converge to some constant \(\alpha ^*_d\) as \(N \rightarrow \infty \) [4]. Determining \(\alpha ^*_d\) is a major challenge of the area. The cavity method, a non-rigorous statistical physics tool, led to a 1-step replica symmetry breaking (1-RSB) formula for \(\alpha ^*_d\). The authors of [5] also argued that the formula may be exact for \(d \ge 20\), which is widely believed to be indeed the case. Later this 1-RSB formula was confirmed to be exact for (very) large d in the seminal paper of Ding et al. [10].

Lelarge and Oulamara [19] used the interpolation method to rigorously establish the 1-RSB formula as an upper bound for every \(d \ge 3\). This approach also provides r-step RSB bounds for any \(r \ge 2\). The problem is that these formulas get increasingly complicated and fully solving the corresponding optimization problems seems to be out of reach. Can we, at least, get an estimate or a bound?

The parameters of the r-RSB bound include a functional order parameter which can be thought of as a measure. The optimal measure satisfies a certain self-consistency equation. We cannot hope for an exact solution so the natural instinct is to try to find an approximate solution. In the physics literature an iterative randomized algorithm called population dynamics is often used to find an approximate solution in the 1-RSB (and occasionally in the 2-RSB) setting for various models. This sounds like a promising approach but we came to the surprising conclusion that for the hard-core model it may be a better strategy to forget about the equation altogether and search among “simple” measures. It seems to be possible to get very close to the global optimum using atomic measures with a moderate number of atoms. Furthermore, when we only have a few atoms, we can tune their weights and locations to a great precision, and this seems to outweigh the advantage of having a more “delicate” measure (but being unable to tune it to the same precision).

Moreover, using a small number of atoms means that we can compute the value exactly and the interpolation method ensures that what we get is always a rigorous upper bound. In contrast, population dynamics only gives an estimate for the value of the bound because for large populations one simply cannot compute the corresponding bound precisely and has to settle for an estimate based on a sample.

Therefore, our approach is to search for local minima of the discrete version (corresponding to atomic measures) using a computer. Even this is a formidable challenge, as we will see. Table 1 shows the best bounds we found via numerical optimization.

Table 1 Upper bounds for the asymptotic independence ratio \(\alpha ^*_d\), \(3 \le d \le 8\)

These may seem to be small improvements but we actually expect the true values to be fairly close to our new bounds. In particular, for \(d=3\) it is reasonable to conjecture that the bound is sharp up to at least five decimal digits, that is, \(\alpha ^*_3 = 0.45078...\).

We also have improvements for \(9 \le d \le 19\). However, as the degree gets closer to the threshold \(d \ge 20\) (above which 1-step replica symmetry breaking is believed to be the truth), the 1-RSB bound gets sharper and our improvement gets smaller. For more details, see the tables in the Appendix.

1.1 Upper Bound Formulas

In Sect. 2 we will explain the RSB bounds in detail. Here we only display a few formulas in order to give the reader an idea of the optimization tasks we are faced with.

For comparison, we start with the replica symmetric (RS) bound: for any \(\lambda >0\) and any \(x \in [0,1]\) we have

$$\begin{aligned} \alpha ^*_d \log \lambda \le \log \big ( 1+\lambda (1-x)^d \big ) - \frac{d}{2} \log (1-x^2) . \end{aligned}$$
(1)

Here the fugacity parameter \(\lambda \) is the “reward” for including a vertex in the independent set, while x can be thought of as the probability that a cavity vertex is included. Then the right-hand side expresses the change in the free energy when adding a star (i.e., connecting a new vertex to d cavities) versus adding d/2 edges between cavities. Choosing \(\lambda \) and x optimally leads to the exact same formula as the Bollobás bound from 1981 [6], which was based on a first moment calculation for the number of independent sets of a given size. Actually, this relatively simple bound is already asymptotically tight: \(\alpha ^*_d = \big ( 2 + o_d(1) \big ) \frac{\log d}{d}\), where the matching asymptotic lower bound is due to Frieze and Łuczak [11].
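To give a concrete sense of the optimization tasks involved, here is a minimal numerical sketch (not the code used for the tables; the starting point and box constraints are ad hoc) that minimizes the right-hand side of (1) divided by \(\log \lambda \) for \(d=3\):

```python
import numpy as np
from scipy.optimize import minimize

d = 3

def rs_bound(params):
    # right-hand side of (1) divided by log(lambda); lambda is parametrized by its logarithm
    log_lam, x = params
    lam = np.exp(log_lam)
    rhs = np.log(1 + lam * (1 - x) ** d) - d / 2 * np.log(1 - x ** 2)
    return rhs / log_lam

res = minimize(rs_bound, x0=[2.0, 0.4], method="L-BFGS-B",
               bounds=[(0.1, 20.0), (1e-9, 1 - 1e-9)])
print(res.fun)  # optimal RS bound for d = 3, i.e. the first moment (Bollobás) bound
```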

The 1-RSB bound says that for any \(\lambda _0>1\) and any \(q \in [0,1]\):

$$\begin{aligned} \alpha ^*_d \log (\lambda _0) \le \log \big ( 1+(\lambda _0-1)(1-q)^d \big ) - \frac{d}{2} \log \big ( 1- (1-1/\lambda _0) q^2 \big ) . \end{aligned}$$
(2)

Choosing \(\lambda _0\) and q optimally leads to an (implicit) formula for \(\alpha ^*_d\). As we mentioned, this 1-RSB bound is conjectured to be sharp for any \(d \ge 20\) and known to be sharp for sufficiently large d.
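Analogously, a minimal sketch (again with ad hoc starting values) for optimizing the 1-RSB bound (2):

```python
import numpy as np
from scipy.optimize import minimize

d = 3

def one_rsb_bound(params):
    # right-hand side of (2) divided by log(lambda_0)
    log_lam0, q = params
    lam0 = np.exp(log_lam0)
    rhs = (np.log(1 + (lam0 - 1) * (1 - q) ** d)
           - d / 2 * np.log(1 - (1 - 1 / lam0) * q ** 2))
    return rhs / log_lam0

res = minimize(one_rsb_bound, x0=[3.0, 0.5], method="L-BFGS-B",
               bounds=[(0.1, 30.0), (1e-9, 1 - 1e-9)])
print(res.fun)  # for d = 3 this should be close to the ~0.45086 value quoted in Sect. 1.2
```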

Heavy notation would be needed to describe the r-step RSB bounds in general. In order to keep the introduction concise, we only give (a discretized version of) the formula for the case \(r=2\): for any \(\lambda _0 >1\), \(0<m<1\), and any \(p_1, \ldots , p_n, q_1, \ldots , q_n \in [0,1]\) with \(p_1 + \cdots + p_n = 1\) we have

$$\begin{aligned} \alpha ^*_d \, m \log (\lambda _0){} & {} \le \log \sum _{i_1=1}^n \cdots \sum _{i_d=1}^n \, \bigg ( \prod _{\ell =1}^d p_{i_\ell } \bigg ) \bigg ( 1+(\lambda _0-1) \prod _{\ell =1}^d (1-q_{i_\ell }) \bigg )^m \nonumber \\{} & {} \quad - \frac{d}{2} \log \sum _{i_1=1}^n \sum _{i_2=1}^n p_{i_1} p_{i_2} \bigg ( 1- (1-1/\lambda _0) q_{i_1} q_{i_2} \bigg )^m . \end{aligned}$$
(3)

The number of parameters for general r is roughly \(2 n_1 \ldots n_{r-1}\), where \(n_k\) denotes the number of atoms used at the different layers, so the dimension of the parameter space grows exponentially in r, see Sect. 2.3 for details.

1.2 The Case of Degree 3

One can plug any concrete choice of parameter values into (3) to get a bound for the independence ratio. To demonstrate the strength of (3) even for small n, we include here an example for a 2-RSB bound for \(d=3\), \(n=4\): the values

$$\begin{aligned} \lambda _0&= 19.3&\quad p_1&= 0.2493&\quad p_2&= 0.2778&\quad p_3&= 0.2880&\quad p_4&= 0.1849\\ m&= 0.557&\quad q_1&= 0.1184&\quad q_2&= 0.5947&\quad q_3&= 0.8876&\quad q_4&= 0.9827 \end{aligned}$$

give a bound \(\alpha ^*_3< 0.450789952<0.45079\) that already comfortably beats the previous best bound (\(\approx 0.45086\)). Table 2 shows our best bounds for \(d=3\).

Table 2 The degree 3 case: our best r-RSB bounds for \(\alpha ^*_3\) for \(r=2,3,4,5\) compared to previous upper bounds
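The parameter values displayed above are easy to verify independently of our optimization code: the following short script (a minimal sketch) plugs them into (3) and should reproduce the stated bound up to rounding.

```python
import numpy as np
from itertools import product

d, lam0, m = 3, 19.3, 0.557
p = np.array([0.2493, 0.2778, 0.2880, 0.1849])
q = np.array([0.1184, 0.5947, 0.8876, 0.9827])

# star and edge terms of (3), summing over all index tuples
star = sum(np.prod(p[list(i)]) * (1 + (lam0 - 1) * np.prod(1 - q[list(i)])) ** m
           for i in product(range(4), repeat=d))
edge = sum(p[i] * p[j] * (1 - (1 - 1 / lam0) * q[i] * q[j]) ** m
           for i, j in product(range(4), repeat=2))

print((np.log(star) - d / 2 * np.log(edge)) / (m * np.log(lam0)))  # roughly 0.45079
```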

As for lower bounds for small d, the best results have been achieved by so-called local algorithms. Table 3 lists a few selected works and the obtained bounds for \(\alpha ^*_3\).

Table 3 Lower bounds on \(\alpha ^*_3\)

Note that a beautiful result of Rahman and Virág [25], building on a work of Gamarnik and Sudan [14], says that asymptotically (as \(d \rightarrow \infty \)) local algorithms can only produce independent sets of at most half the maximum size (over random regular graphs). For small d, however, the independence ratio produced by local algorithms may be the same as (or very close to) \(\alpha ^*_d\).

1.3 Optimization

We wrote Python/SAGE codes to perform the numerical optimization for the replica bounds.

  • The first task was to efficiently compute the r-RSB formulas and their derivatives w.r.t. the parameters.

  • Then we used standard algorithms to perform local optimization starting from random points. As the parameter space grows, more attempts are required to find an appropriate starting point leading to a good local optimum.

  • Eventually we start to encounter a rugged landscape with a huge number of local minima, where we cannot expect to get close to the global optimum even after trying a large number of starting points. In order to overcome this obstacle, for \(d=3\) we used a technique called basin hopping. In each step, the algorithm randomly visits a “nearby” local minimum, favoring steps to smaller values. This approach led to the discovery of our best bounds for \(d=3\).

  • The smaller the degree d, the deeper we could go in the replica hierarchy (i.e., use larger r). We could perform the 3-RSB optimization for \(d \le 10\), the 4-RSB optimization for \(d \le 6\), and the 5-RSB optimization for \(d=3\).

See Sect. 3.2 for further details about the implementation.

Although the bounds are hard to find, they are easy to check: one simply needs to plug the specific parameter values into the given formulas. We created a website with interactive SAGE codes where the interested reader may check the claimed bounds and even run simple optimizations: https://www.renyi.hu/~harangi/rsb.htm. Our codes can be found in the public GitHub repository https://github.com/harangi/rsb.

1.4 2-RSB in the Literature

As far as we know, there was only one previous attempt to get an estimate for the 2-RSB formula (only for \(d=3\)). In [5] one reads that “the 2-RSB calculation is [...] somewhat involved and was done in [24] [and obtained the value] 0.45076(7)”. Rivoire’s thesis [24] indeed reports briefly on a 2-RSB calculation. Note that, since he considers the equivalent vertex-cover problem (concerning the complements of independent sets), we need to subtract his value from 1 to get our value. On page 113 he writes that using population dynamics he obtained the following estimate: \(0.54924 \pm 0.00007\). For our problem this means \(0.45076 \pm 0.00007 = [0.45069,0.45083]\). The value 0.45076(7) in [5] may have come from mistakenly using an error \(\pm 0.000007\) instead of \(\pm 0.00007\) when citing Rivoire’s work. The thesis only provides a short description of how this estimate was obtained. The author refers to it as “unfactored” 1-RSB and it seems to be the same as what we call a non-standard 1-RSB in our remarks after Theorem 2.2. If that is indeed the case, then our findings suggest that its true value should actually be around 0.45081.

1.5 Outline of the Paper

In Sect. 2 we present the general replica bounds and their discrete versions that we need to optimize. Section 3 contains details about the numerical optimization. In Sect. 4 we revisit the \(r=1\) case and investigate more sophisticated choices for the functional order parameter. The Appendix contains a table listing our best bounds for different values of d and r (Sect. 5) and an overview of the interpolation method for the hard-core model over random regular graphs (Sect. 6).

2 Replica Formulas

Originally the cavity method and belief propagation were non-rigorous techniques in statistical physics to predict the free energy of various models. They inspired a large body of rigorous work, and over the years several predictions were confirmed. In particular, the so-called interpolation method has been used with great success to establish rigorous upper bounds on the free energy.

In the context of the hard-core model over random d-regular graphs, the interpolation method was carried out by Lelarge and Oulamara in [19], building on the pioneering works [12, 13, 23]. First we present the general r-step RSB bound obtained this way.

2.1 The General Replica Bound

For a topological space \(\Omega \) let \(\mathcal {P}(\Omega )\) denote the space of Borel probability measures on \(\Omega \) equipped with the weak topology. We set \(\mathcal {P}^1 = \mathcal {P}\big ( [0,1] \big )\) and then recursively \(\mathcal {P}^{k+1} = \mathcal {P}\big ( \mathcal {P}^k \big )\) for \(k \ge 1\). The general bound will have the following parameters:

  • \(\lambda >1\);

  • \(0< m_1, \ldots , m_r <1\) corresponding to the so-called Parisi parameters;

  • a measure \(\eta ^{(r)} \in \mathcal {P}^r\).

Definition 2.1

Given a fixed \(\eta ^{(r)} \in \mathcal {P}^r\), we choose (recursively for \(k=r-1,r-2,\ldots ,1\)) a random \(\eta ^{(k)} \in \mathcal {P}^k\) with distribution \(\eta ^{(k+1)}\). Finally, given \(\eta ^{(1)}\) we choose a random \(x \in [0,1]\) with distribution \(\eta ^{(1)}\). In fact, we will need d independent copies of this random sequence, indexed by \(\ell \in \{1,\ldots , d\}\). Schematically:

$$\begin{aligned} \eta ^{(r)} \, \rightarrow \, \eta _\ell ^{(r-1)} \, \rightarrow \, \cdots \, \rightarrow \, \eta _\ell ^{(1)} \, \rightarrow \, x_\ell \quad (\ell = 1,\ldots , d) . \end{aligned}$$

For \(1 \le k \le r\) we define \(\mathcal {F}_k\) as the \(\sigma \)-algebra generated by \(\eta _\ell ^{(r-1)}, \ldots , \eta _\ell ^{(k)}\), \(\ell =1,\ldots ,d\), and by \(\mathbb {E}_k\) we denote the conditional expectation w.r.t. \(\mathcal {F}_k\). Note that \(\mathcal {F}_r\) is the trivial \(\sigma \)-algebra and hence \(\mathbb {E}_r\) is simply \(\mathbb {E}\).

Given a random variable V (depending on the variables \(\eta _\ell ^{(k)}, x_\ell \)), let us perform the following procedure: raise it to power \(m_1\), then apply \(\mathbb {E}_1\), raise the result to power \(m_2\), then apply \(\mathbb {E}_2\), and so on. In formula, let \(T_0 V = V\) and recursively for \(k=1, \ldots , r\) set

$$\begin{aligned} T_k V = \mathbb {E}_k \Big ( \big ( T_{k-1} V \big )^{m_k} \Big ) . \end{aligned}$$

In this scenario, applying \(\mathbb {E}_k\) means that, given \(\eta _\ell ^{(k)}\), \(\ell =1,\ldots ,d\), we take expectation in \(\eta _\ell ^{(k-1)}\), \(\ell =1,\ldots ,d\) (or in \(x_\ell \) if \(k=1\)).

Now we are ready to state the r-RSB bound given by the interpolation method.

Theorem 2.2

Let \(r \ge 1\) be a positive integer and \(\lambda , m_1, \ldots , m_r, \eta ^{(r)}\) parameters as described above. Let \(x_\ell \), \(\ell =1,\ldots ,d\) denote the random variables obtained from \(\eta ^{(r)}\) via the procedure in Definition 2.1. Then we have the following upper bound for the asymptotic independence ratio \(\alpha ^*_d\) of random d-regular graphs:

$$\begin{aligned} \alpha ^*_d \, m_1 \cdots m_r \log \lambda \le \log T_r \big ( 1+ \lambda (1-x_1)\cdots (1-x_d) \big ) - \frac{d}{2} \log T_r (1-x_1 x_2) . \end{aligned}$$

This was rigorously proved in [19]. They actually considered a more general setting incorporating a class of (random) models over a general class of random hypergraphs (with given degree distributions). They used the hard-core model over d-regular graphs as their chief example, working out the specific formulas corresponding to their general RS and 1-RSB bounds. Theorem 2.2 follows from their general r-RSB bound [19, Theorem 3] exactly the same way as in the RS and 1-RSB case.

We should make a number of remarks at this point.

  • Above we slightly deviated from the standard notation as the usual form of the Parisi parameters would be

    $$\begin{aligned} 0< \hat{m}_1< \cdots< \hat{m}_r < 1 , \end{aligned}$$

    where \(\hat{m}_k\) can be expressed in terms of our parameters \(m_k\) as follows:

    $$\begin{aligned} \hat{m}_r = m_1; \quad \hat{m}_{r-1} = m_1 m_2; \quad \ldots ; \quad \hat{m}_{1} = m_1 m_2 \cdots m_r . \end{aligned}$$

    As a consequence, the indexing of \(\mathcal {F}_k\), \(\mathbb {E}_k\), \(T_k\) is in reverse order, and the definition of \(T_k\) simplifies a little because raising to power \(1/\hat{m}_{r-k+2}\) and then immediately to \(\hat{m}_{r-k+1}\) (as done, for example, in [23]) amounts to a single exponent \(\hat{m}_{r-k+1}/\hat{m}_{r-k+2}=m_k\) in our setting.

  • Also, generally there is an extra layer of randomness (starting from an \(\eta ^{(r+1)} \in \mathcal {P}^{r+1}\)) resulting in another expectation outside the \(\log \). This random choice is meant to capture the local structure of the graph in a given direction. However, when the underlying graph is d-regular (meaning that essentially all vertices see the same graph structure locally), we do not need this layer of randomness (in principle). Therefore, in the d-regular case one normally chooses a trivial \(\eta ^{(r+1)} = \delta _{\eta ^{(r)}}\). That is why we omitted \(\eta ^{(r+1)}\) and started with a deterministic \(\eta ^{(r)}\).

    For \(d \ge 20\), where the 1-RSB bound is (conjectured to be) tight, the optimal choice of parameters indeed uses a trivial \(\eta ^{(r+1)} = \delta _{\eta ^{(r)}}\) with r being 1 in this case.

    For \(d \le 19\), the same choice gives us a 1-RSB upper bound (which is not tight any more). Let us call this the standard 1-RSB bound, and, in general, we call an r-RSB bound standard if it was obtained by using a deterministic \(\eta ^{(r)}\) at the start. Then a non-standard bound would use \(\eta ^{(r+1)}\) (and hence random \(\eta _\ell ^{(r)}\) variables). Note that a non-standard r-RSB bound is actually a special case of standard \((r+1)\)-RSB bounds in the limit \(m_{r+1} \rightarrow 0\). So even though it is possible to improve on standard r-step bounds by non-standard r-step bounds, it actually makes more sense to use the extra layer to move to \((r+1)\)-step bounds instead (and use some positive \(m_{r+1})\).

  • The full RSB picture is well-understood for the famous Sherrington–Kirkpatrick model [22, 26], where the infimum of the r-RSB bound converges to the free energy as \(r \rightarrow \infty \). It is reasonable to conjecture that this is the case for the hard-core model as well. There is some progress towards this in [8] where a variational formula is obtained for \(\alpha _d^*\).

2.2 A Specific Choice

The formula in Theorem 2.2 would be hard to work with numerically because it would only give good results for very large \(\lambda \). So we make a specific choice (similar to the one made in [19, Section 3.2.1] in the case \(r=1\)) that may not be optimal but will allow us to use numerical optimization. We consider the limit \(\lambda \rightarrow \infty \) and \(m_1 \rightarrow 0\) in a way that \(m_1 \log \lambda \) stays constant and x is concentrated on the two-element set \(\{0, 1-1/\lambda \}\), meaning that \(\eta ^{(1)}\) is a distribution \(q \delta _{1-1/\lambda } + (1-q) \delta _0\) for some random \(q \in [0,1]\).

For a fixed \(\lambda _0>1\) let \(\log \lambda _0 = m_1 \log \lambda \). First we focus on the expressions \(T_1 \big ( 1+ \lambda (1-x_1)\cdots (1-x_d) \big )\) and \(T_1 (1-x_1 x_2)\). If each \(x_\ell \in \{0, 1-1/\lambda \}\) was fixed, we would have the following in the limit as \(\lambda \rightarrow \infty \), \(m_1 \rightarrow 0\) with \(m_1 \log \lambda = \log \lambda _0\):

$$\begin{aligned} \big ( 1+ \lambda (1-x_1)\cdots (1-x_d) \big )^{m_1}&\rightarrow {\left\{ \begin{array}{ll} \lambda _0 &{} \text{ if } \text{ each } x_\ell \text{ is } \text{0; }\\ 1 &{} \text{ otherwise; } \end{array}\right. } \\ (1-x_1 x_2)^{m_1}&\rightarrow {\left\{ \begin{array}{ll} 1/\lambda _0 &{} \text{ if } x_1=x_2=1-1/\lambda ;\\ 1 &{} \text{ otherwise. } \end{array}\right. } \end{aligned}$$

Therefore, conditioned on

$$\begin{aligned} \eta _\ell ^{(1)} = q_\ell \delta _{1-1/\lambda } + (1-q_\ell ) \delta _0 \end{aligned}$$

for some deterministic \(q_1, \ldots , q_d \in [0,1]\), we get

$$\begin{aligned} T_1 \big ( 1+ \lambda (1-x_1)\cdots (1-x_d) \big )&\rightarrow 1+(\lambda _0-1) (1-q_1)\cdots (1-q_d) ;\\ T_1 (1-x_1 x_2)&\rightarrow 1-(1-1/\lambda _0) q_1 q_2 . \end{aligned}$$

In the resulting formula the randomness in layer 1 disappears along with the Parisi parameter \(m_1\). After re-indexing (\(k \rightarrow k-1\)) we get the following corollary.

Corollary 2.3

Let \(\lambda _0>1\) and \(0< m_1, \ldots , m_{r-1} <1\). Furthermore, fix a deterministic \(\pi ^{(r-1)} \in \mathcal {P}^{r-1}\) and take d independent copies of recursive sampling:

$$\begin{aligned} \pi ^{(r-1)} \, \rightarrow \, \pi _\ell ^{(r-2)} \, \rightarrow \, \cdots \, \rightarrow \, \pi _\ell ^{(1)} \, \rightarrow \, q_\ell \quad (\ell = 1,\ldots , d) . \end{aligned}$$

We define the conditional expectations \(\mathbb {E}_k\) and the corresponding \(T_k\) as before, w.r.t. this new system of random variables. Then

$$\begin{aligned} \alpha ^*_d \, m_1 \cdots m_{r-1} \log \lambda _0{} & {} \le \log T_{r-1} \big ( 1+ (\lambda _0-1)(1-q_1)\cdots (1-q_d) \big )\\ {}{} & {} \quad - \frac{d}{2} \log T_{r-1} \big ( 1-(1-1/\lambda _0) q_1 q_2 \big ) . \end{aligned}$$

Proof

For a formal proof one needs to define an \(\eta ^{(r)}=\eta _\lambda ^{(r)} \in \mathcal {P}^r\) for the fixed \(\pi ^{(r-1)}\) and any given \(\lambda \) such that the corresponding \(\eta _\ell ^{(1)}\) is distributed as \(q_\ell \delta _{1-1/\lambda } + (1-q_\ell ) \delta _0\). Then Theorem 2.2 can be applied and we get the new formula in the limit. \(\square \)

2.3 Discrete Versions

In our numerical computations we will use the bound of Corollary 2.3 in the special case when each distribution is discrete.

For \(r=1\) we have a deterministic q and we get back (2), while \(r=2\) gives (3).

Let \(r=3\). For any \(\lambda _0 >1\), \(0<m_1,m_2<1\), \(p_i \ge 0\) with \(\sum p_i=1\), \(p_{i,j} \ge 0\) with \(\sum _j p_{i,j}=1\) for every fixed i, and \(q_{i,j} \in [0,1]\) we get that

$$\begin{aligned}&\alpha ^*_d \, m_1 m_2 \log (\lambda _0) \le \log R^{\textrm{star}} - \frac{d}{2} \log R^{\textrm{edge}} \text{, } \text{ where } \\&R^{\textrm{star}} = \sum _{i_1} \cdots \sum _{i_d} \, \bigg ( \prod _{\ell =1}^d p_{i_\ell } \bigg ) \left( \sum _{j_1} \cdots \sum _{j_d} \, \bigg ( \prod _{\ell =1}^d p_{i_\ell ,j_\ell } \bigg ) \bigg ( 1+(\lambda _0-1) \prod _{\ell =1}^d (1-q_{i_\ell ,j_\ell }) \bigg )^{m_1} \right) ^{m_2} ;\\&R^{\textrm{edge}} = \sum _{i_1} \sum _{i_2} p_{i_1} p_{i_2} \left( \sum _{j_1} \sum _{j_2} p_{i_1,j_1} p_{i_2,j_2} \bigg ( 1-(1-1/\lambda _0) q_{i_1,j_1} q_{i_2,j_2} \bigg )^{m_1} \right) ^{m_2} . \end{aligned}$$

For a general \(r \ge 1\), we will index our parameters \(p_s,q_s\) with sequences \(s=\big ( s^{(1)},\ldots ,s^{(k)} \big )\) of length \(|s|=k \le r-1\). We denote the empty sequence (of length 0) by \(\emptyset \). Furthermore, we write \(s' \succ s\) if \(s'\) is obtained by adding an element to the end of s, that is, \(|s'|=|s|+1\) and the first |s| elements coincide.

Now let S be some set of sequences of length at most \(r-1\) such that \(\emptyset \in S\). We partition S into two parts \(S_{\le r-2} \cup S_{r-1}\) based on whether the length of the sequence is at most \(r-2\) or exactly \(r-1\), respectively.

Now the discrete version of the r-RSB bound has the following parameters:

  • \(\lambda _0>1\);

  • \(0<m_1, \ldots , m_{r-1}<1\);

  • \(p_{s} \ge 0\), \(s \in S\), satisfying

    $$\begin{aligned} \sum _{s' \succ s} p_{s'} = 1 \text{ for } \text{ each } s \in S_{\le r-2} ; \end{aligned}$$
  • \(q_{s} \in [0,1]\), \(s \in S_{r-1}\).

Now we define the distribution \(\pi ^{(r-1)} \in \mathcal {P}^{r-1}\) corresponding to the parameters \(p_s, q_s\). Set

$$\begin{aligned} \pi _s = q_s \quad \text{ for } \text{ each } s \in S_{r-1} , \end{aligned}$$

and then, recursively for \(k=r-2,r-3,\ldots ,1,0\), for a sequence s of length \(|s|=k\) let

$$\begin{aligned} \pi _s = \sum _{s' \succ s} p_{s'} \, \delta _{\pi _{s'}} . \end{aligned}$$

We want to use Corollary 2.3 with \(\pi ^{(r-1)} = \pi _\emptyset \). The obtained bound can be expressed as follows.

For any d-tuple \(s_1,\ldots ,s_d\) of sequences of length \(r-1\), set

$$\begin{aligned} R_{s_1,\ldots ,s_d}^{\textrm{star}} = 1+(\lambda _0-1) \prod _{\ell =1}^d (1-q_{s_\ell }) , \end{aligned}$$
(4)

and then, recursively for \(k=r-1, r-2, \ldots , 1\), for any d-tuple \(s_1,\ldots ,s_d\) of sequences of length \(k-1\) let

$$\begin{aligned} R_{s_1,\ldots ,s_d}^{\textrm{star}} = \sum _{s'_1 \succ s_1} \cdots \sum _{s'_d \succ s_d} \, \bigg ( \prod _{\ell =1}^d p_{s'_\ell } \bigg ) \Big ( R_{s'_1,\ldots ,s'_d}^{\textrm{star}} \Big )^{m_{r-k}} . \end{aligned}$$
(5)

Similarly, for any pair \(s_1,s_2\) of sequences of length \(r-1\), set

$$\begin{aligned} R_{s_1,s_2}^{\textrm{edge}} = 1-(1-1/\lambda _0) \, q_{s_1} q_{s_2} , \end{aligned}$$

and then, recursively for \(k=r-1, r-2, \ldots , 1\), for any pair \(s_1,s_2\) of sequences of length \(k-1\), let

$$\begin{aligned} R_{s_1,s_2}^{\textrm{edge}} = \sum _{s'_1 \succ s_1} \sum _{s'_2 \succ s_2} p_{s'_1} p_{s'_2} \Big ( R_{s'_1,s'_2}^{\textrm{edge}} \Big )^{m_{r-k}} . \end{aligned}$$
Then the bound is

$$\begin{aligned} \alpha ^*_d \, m_1 \ldots m_{r-1} \log (\lambda _0) \le \log R_{\emptyset ,\ldots ,\emptyset }^{\textrm{star}} - \frac{d}{2} \log R_{\emptyset ,\emptyset }^{\textrm{edge}} . \end{aligned}$$
(6)

Remark 2.4

Normally we fix integers \(n_1, \ldots , n_{r-1} \ge 2\) and assume that the k-th elements of our sequences come from the set \(\{1,\ldots , n_k\}\). This way the number of free parameters (after taking the sum restrictions on the parameters \(p_s\) into account) is

$$\begin{aligned} (r-1) + 2 n_1 n_2 \cdots n_{r-1} . \end{aligned}$$
(7)

In the tables of Sect. 3.1 and the Appendix we will refer to such a parameter space as \([n_1, \ldots , n_{r-1}]\).
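For instance, in a 3-RSB setup \([n_1,n_2]=[4,2]\) (an arbitrary illustrative choice) the free parameters are \(\lambda _0\), the Parisi parameters \(m_1,m_2\), the \(n_1 n_2 - 1 = 7\) free values among the probabilities \(p_s\), and the \(n_1 n_2 = 8\) atom locations \(q_s\), in accordance with (7): \((r-1) + 2 n_1 n_2 = 18\) parameters in total.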

3 Numerical Optimization

3.1 Numerical Results

Our starting point was the observation in [5] that the 1-RSB formula for \(\alpha ^*_d\) “is stable towards more steps of replica symmetry breaking” only for \(d \ge 20\), so it should not be exact for \(d \le 19\). Therefore the 2-RSB bound in Corollary 2.3 ought to provide an improved upper bound for some choice of \(\lambda _0,m_1,\pi ^{(1)}\). The optimal \(\pi ^{(1)}\) may be continuous. Can we achieve significant improvement on the 1-RSB bound even by using some atomic measure \(\pi ^{(1)} = \sum _{i=1}^n p_i \delta _{q_i}\)? In other words, can we find good values to substitute for the parameters \(p_i,q_i\) of the discrete version (3) using numerical optimization? We were skeptical because we might not be able to use a large enough n to get a good atomic approximation of the optimal \(\pi ^{(1)}\). Surprisingly, based on our findings it appears that even a small number of atoms may yield close-to-optimal bounds. Table 4 shows our best 2-RSB bounds for \(d=3\) and for different values of n.

Table 4 Our 2-RSB bounds for \(\alpha ^*_3\) using n atoms

Of course, we do not know what the true infimum of the bound in Corollary 2.3 is, but our bounds seem to stabilize very quickly as we increase the number of atoms (n). Also, we experimented with various other approaches that would allow for better approximations of continuous distributions, and they all pointed in the direction that the bounds in Table 4 are close to optimal.

This actually gave us hope that further improvements might be possible by considering r-step replica bounds for \(r \ge 3\), even though the number of atoms we can use at each layer is very small due to computational constraints. Table 5 shows some bounds we obtained for \(d=3\) and \(r \ge 3\) using different parameter spaces (see Remark 2.4).

Table 5 Our 3,4,5-step RSB bounds for \(\alpha ^*_3\)

The dimension of the parameter space (7) depends only on \(r,n_1,\ldots ,n_{r-1}\) and not on the degree d. However, as we increase d, computing \(R^{\textrm{star}}\) (see Sect. 2.3) and its derivative takes longer and we have to settle for using smaller values of r and \(n_k\). At the same time, the 1-RSB formula is presumably getting closer to the truth as we are approaching the phase transition between \(d=19\) and \(d=20\). Nevertheless, we tried to achieve as much improvement as we could for each degree \(d=3,\ldots , 19\). See the Appendix for results for \(d \ge 4\).

3.2 Implementation

3.2.1 Efficient Computation

According to (6), our RSB upper bound for \(\alpha ^*_d\) reads as

$$\begin{aligned} \frac{\log R_{\emptyset ,\ldots ,\emptyset }^{\textrm{star}} - \frac{d}{2} \log R_{\emptyset ,\emptyset }^{\textrm{edge}}}{m_1 \ldots m_{r-1} \log (\lambda _0)} , \end{aligned}$$
(8)

where \(R_{\emptyset ,\ldots ,\emptyset }^{\textrm{star}}\) and \(R_{\emptyset ,\emptyset }^{\textrm{edge}}\) were defined recursively through \(r-1\) steps, each step involving a multifold summation, see Sect. 2.3 for details. So our task is to minimize (8) as a function of the parameters. During optimization the function and its partial derivatives need to be evaluated at a large number of locations. So it was crucial for us to design program codes that compute them efficiently. Instead of trying to do the summations using for loops, the idea is to utilize the powerful array manipulation tools of the Python library NumPy. In particular, one can efficiently perform element-wise calculations or block summations on the multidimensional arrays of NumPy.

First we show how \(R_{\emptyset ,\ldots ,\emptyset }^{\textrm{star}}\) can be obtained using such tools. Recall that \(S_k\) contains sequences of length k and we have a parameter \(p_s\) for any \(s \in S_1 \cup \cdots \cup S_{r-1}\) and \(q_s\) for any \(s \in S_{r-1}\). In particular, in the \([n_1, \ldots , n_{r-1}]\) setup we have \(|S_k|=n_1 \cdots n_k\). We do the following steps.

  • \(v_k\): vector of length \(|S_k|\) consisting of \(p_s\), \(s \in S_k\)    (\(1 \le k \le r-1\)).

  • \(P_k\): d-dimensional array of size \(|S_k| \times \cdots \times |S_k|\) obtained by “multiplying” d copies of \(v_k\). (Each element of \(P_k\) is a product \(p_{s_1}\cdots p_{s_d}\) for some \(s_1,\ldots ,s_d \in S_k\).)

  • \(M_{r-1}\): d-dimensional array of size \(|S_{r-1}| \times \cdots \times |S_{r-1}|\) obtained by “multiplying” d copies of the vector consisting of \(1-q_s\), \(s \in S_{r-1}\), then multiply each element by \(\lambda _0-1\) and add 1; cf. (4).

  • Then, recursively for \(k=r-1,r-2, \ldots ,1\), given the \(|S_k| \times \cdots \times |S_k|\) array \(M_k\) we obtain \(M_{k-1}\) as follows: we raise \(M_k\) to the power of \(m_{r-k}\) and multiply by \(P_k\) (both element-wise), and perform a block summation: in the \([n_1,\ldots ,n_{r-1}]\) setup we divide the array into \(n_k \times \cdots \times n_k\) blocks and replace each with the sum of the elements in the block; cf. (5).

  • At the end \(M_0\) will have a single element equal to \(R_{\emptyset ,\ldots ,\emptyset }^{\textrm{star}}\).

One can compute \(R_{\emptyset ,\ldots ,\emptyset }^{\textrm{edge}}\) similarly, using two-dimensional arrays this time.
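The steps above can be summarized in the following minimal sketch (with our own helper names, not the actual code from the repository), assuming the values \(p_s\) are stored level by level in lexicographic order, so that the children of each sequence occupy a contiguous block:

```python
import numpy as np
from functools import reduce

def outer(vec, arity):
    # "multiply" arity copies of vec into an arity-dimensional array
    return reduce(np.multiply.outer, [np.asarray(vec, dtype=float)] * arity)

def block_sum(arr, block):
    # replace each block of size block x ... x block by the sum of its entries
    arity = arr.ndim
    new_len = arr.shape[0] // block
    return arr.reshape([new_len, block] * arity).sum(axis=tuple(range(1, 2 * arity, 2)))

def rsb_bound(d, lam0, m, p_levels, q):
    """Discrete r-RSB bound (8): m = [m_1, ..., m_{r-1}], p_levels = [v_1, ..., v_{r-1}],
    q = atom locations q_s, s in S_{r-1} (same lexicographic order as v_{r-1})."""
    def contract(M, arity):
        # recursive steps (5) for k = r-1, r-2, ..., 1
        for k in range(len(p_levels), 0, -1):
            n_k = len(p_levels[k - 1]) // (len(p_levels[k - 2]) if k > 1 else 1)
            M = block_sum(M ** m[len(p_levels) - k] * outer(p_levels[k - 1], arity), n_k)
        return M.item()

    q = np.asarray(q, dtype=float)
    r_star = contract(1 + (lam0 - 1) * outer(1 - q, d), d)   # base case (4)
    r_edge = contract(1 - (1 - 1 / lam0) * outer(q, 2), 2)
    return (np.log(r_star) - d / 2 * np.log(r_edge)) / (np.prod(m) * np.log(lam0))

# the 2-RSB example of Sect. 1.2 (r = 2, d = 3, n = 4); prints roughly 0.45079
print(rsb_bound(3, 19.3, [0.557],
                [[0.2493, 0.2778, 0.2880, 0.1849]],
                [0.1184, 0.5947, 0.8876, 0.9827]))
```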

Note that during the computation of \(R_{\emptyset ,\ldots ,\emptyset }^{\textrm{star}}\) all the d-dimensional arrays are invariant under any permutation of the d axes. This means that the same products appear in many instances, hence the same calculations are repeated many times in the approach above. However, typically we get plenty of compensation in efficiency due to the fact that all the calculations can be done in one sweep using powerful array tools. Nevertheless, when the degree d gets above 7, we do use another approach in the 2-RSB setting \(d \, \, [n]\). In advance, we create a list containing all partitions of d into the sum of n nonnegative integers \(d=a_1+\cdots +a_n\). We also store the corresponding multinomial coefficients \(\left( {\begin{array}{c}d\\ a_1,\ldots ,a_n\end{array}}\right) \) in a vector. Then, at each function call, we go through the list of partitions and compute

$$\begin{aligned} p_1^{a_1}\cdots p_n^{a_n} \big (1 + (\lambda _0-1) (1-q_1)^{a_1}\cdots (1-q_n)^{a_n} \big )^{m_1} , \end{aligned}$$

storing the values in a vector. Then we simply need to take the dot product with the precalculated vector containing the multinomial coefficients.
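A minimal sketch of this second approach (helper names are ours) could look as follows; as described above, the list of partitions and the multinomial coefficients are computed once in advance:

```python
import numpy as np
from math import factorial

def compositions(d, n):
    # all ordered n-tuples of nonnegative integers summing to d ("stars and bars")
    if n == 1:
        yield (d,)
        return
    for a in range(d + 1):
        for rest in compositions(d - a, n - 1):
            yield (a,) + rest

def r_star_2rsb(d, lam0, m1, p, q, parts, coefs):
    # R^star in the 2-RSB setting with n atoms, summing over partitions instead of n^d tuples
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    vals = np.array([np.prod(p ** c) * (1 + (lam0 - 1) * np.prod((1 - q) ** c)) ** m1
                     for c in parts])
    return coefs @ vals

d, n = 8, 4
parts = list(compositions(d, n))                                        # precomputed once
coefs = np.array([factorial(d) / np.prod([factorial(a) for a in c]) for c in parts])
# placeholder parameter values, for illustration only
print(r_star_2rsb(d, 19.0, 0.55, [0.25] * n, [0.2, 0.5, 0.8, 0.95], parts, coefs))
```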

In both approaches computing the partial derivatives with respect to the parameters (\(\lambda _0\), \(m_k\), \(p_s\), \(q_s\)) is more involved but can be done using similar techniques (array manipulations and partitioning, respectively). As an example, we show how we can compute \(\partial R_{\emptyset ,\ldots ,\emptyset }^{\textrm{star}} / \partial p_s\) in the first approach. For a given \(1 \le \ell \le r-1\) we will do this for all \(s \in S_\ell \) at once, resulting in a vector of length \(|S_\ell |\) consisting of the partial derivatives w.r.t. each \(p_s\), \(s \in S_\ell \). We will again use the vectors \(v_k\) and the arrays \(P_k,M_k\) obtained during the computation of \(R_{\emptyset ,\ldots ,\emptyset }^{\textrm{star}}\).

  • \(P'_k\): d-dimensional array of size \(|S_k| \times \cdots \times |S_k|\) obtained by “multiplying” the all-ones vector of length \(|S_k|\) and \(d-1\) copies of \(v_k\).

  • \(D_k\): d-dimensional array of size \(|S_k| \times \cdots \times |S_k|\) obtained by (element-wise) raising \(M_k\) to the power of \(m_{r-k}-1\) and multiplying by \(m_{r-k}\) and by \(P_k\).

  • For a given \(1 \le \ell \le r-1\) we start with \(M_\ell \), raise it to the power of \(m_{r-\ell }\) and multiply it by \(P'_\ell \) (both element-wise) and perform a block summation for blocks of size \(1 \times n_\ell \times \cdots \times n_\ell \), resulting in an \(|S_\ell | \times |S_{\ell -1}| \times \cdots \times |S_{\ell -1}|\) array that we denote by \(M'_{\ell -1}\).

  • Then, recursively for \(k=\ell -1,\ell -2, \ldots ,1\), given the \(|S_\ell | \times |S_k| \times \cdots \times |S_k|\) array \(M'_k\) we obtain \(M'_{k-1}\) as follows: we “stretch” \(D_k\) so that its first axis has length \(|S_\ell |\) by repeating each element \(|S_\ell |/|S_k|\) times (along that first axis) to get an \(|S_\ell | \times |S_k| \times \cdots \times |S_k|\) array, which we multiply element-wise by \(M'_k\), and perform a block summation for blocks of size \(1 \times n_k \times \cdots \times n_k\).

  • At the end we get the array \(M'_0\) of size \(|S_\ell | \times 1 \times \cdots \times 1\). We simply need to multiply its elements by d to get the partial derivatives w.r.t. \(p_s\), \(s \in S_\ell \).
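Hand-coded gradients of this complexity are easy to get wrong; a standard safeguard (our suggestion here, not a step described above) is to compare them with central finite differences at a few random points:

```python
import numpy as np

def max_gradient_error(f, grad, x, eps=1e-6):
    # largest deviation between the analytic gradient and central finite differences at x
    g = np.asarray(grad(x), dtype=float)
    num = np.zeros_like(g)
    for i in range(len(x)):
        step = np.zeros_like(g)
        step[i] = eps
        num[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return np.max(np.abs(g - num))
```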

3.2.2 Local Optimization

Given a differentiable multivariate function, gradient descent means that at each step we move in the opposite direction of the gradient at the current point, thus (hopefully) converging to a local minimum of the function. This is a very natural strategy because we have the steepest initial descent in that direction. There are other standard iterative algorithms that also use the gradient (i.e., the vector consisting of the partial derivatives). They can make more sophisticated steps because they take previous gradient evaluations into account as well, resulting in faster convergence to a local minimum. Since we have complicated functions for which gradient evaluations are computationally expensive, it is important for us to reach a local optimum in as few iterations as possible. Specifically, we used the conjugate gradient method and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm, both of which are implemented in the Python library SciPy.
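In SciPy this amounts to calls of the following kind (a schematic sketch; objective and gradient stand for the bound (8) and its gradient as functions of the flattened parameter vector, and the option values are illustrative):

```python
from scipy.optimize import minimize

def local_minimum(objective, gradient, x0, method="BFGS"):
    # one local search from the starting point x0, using the analytic gradient;
    # method="CG" selects the conjugate gradient algorithm instead
    res = minimize(objective, x0, jac=gradient, method=method,
                   options={"gtol": 1e-12, "maxiter": 10000})
    return res.x, res.fun
```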

With efficient gradient evaluation and fast-converging optimization at our disposal, we were able to find local optima. However, we were surprised to see that, depending on the starting point, these algorithms find a large number of different local minima of the RSB formulas. This is due to our parameterization: we only consider discrete measures with a fixed number of atoms, and the atom locations are included among the parameters. (This is what allowed us to tackle the problem numerically but it also makes the function behave somewhat chaotically.)

It is hard to get a good picture of the behavior of a function of so many variables. To give some idea, in Fig. 1 we plotted the 2-RSB bound for \(d=3 \, [n=5]\) over two-dimensional sections. In both cases we chose three local minima and took the plane H going through them and plotted the function over H. (Note that the left one appears to have a fourth local minimum. However, it is only a local minimum for the two-dimensional restriction of the function and it can actually be locally improved when we are allowed to use all dimensions.)

Fig. 1

Plots of our 2-RSB bound over two-dimensional sections. The black ticks mark the local minima. We cut the function at a certain height

Many of these local minima have very similar values. It appears that one would basically need to check them all in order to find the one that happens to be the global minimum (for the given number of atoms). So our strategy is to simply start local optimization from various (random) points to eventually get a strong bound. This seems to work well as long as there are not too many local minima.
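Schematically, the restart strategy might look as follows (a sketch; random_start is a placeholder for drawing an admissible random parameter vector):

```python
import numpy as np
from scipy.optimize import minimize

def best_of_restarts(objective, gradient, random_start, tries=200):
    # repeat local optimization from random starting points and keep the best value found
    best_x, best_val = None, np.inf
    for _ in range(tries):
        res = minimize(objective, random_start(), jac=gradient, method="BFGS")
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x, best_val
```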

3.2.3 Basin Hopping

As the dimension of the parameter space grows, we start to see a landscape with a huge number of local minima and our chance for picking a good starting point becomes extremely slim. Instead, when we get to a local minimum (i.e., the bottom of a “basin”), we may try to “hop out” of the basin by applying a small perturbation of the variables. After a new round of local optimization, we end up at the bottom of another basin. If the function value decreases compared to the previous basin, we accept this step. If not, then we make a random decision of acceptance/rejection with a probability based on the difference of the values. Such a basin hopping algorithm randomly travels through local minima, with a slight preference for smaller values. (This preference should not be too strong, though, as we have to allow enough leeway for this random travel.) This approach led to the discovery of our best bounds for \(d=3\). We mention that in the case of our 5-RSB bound the basin hopping algorithm was running for days.
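SciPy ships a generic implementation of this heuristic (scipy.optimize.basinhopping), which can serve as a stand-in sketch for what we did; the temperature and step size below are illustrative values, not the ones used in our runs:

```python
from scipy.optimize import basinhopping

def basin_hop(objective, gradient, x0, rounds=1000):
    # random walk over local minima with a Metropolis-type acceptance rule
    res = basinhopping(objective, x0, niter=rounds,
                       T=1e-4,         # acceptance "temperature": preference for smaller values
                       stepsize=0.05,  # size of the random perturbation used to hop out of a basin
                       minimizer_kwargs={"jac": gradient, "method": "BFGS"})
    return res.x, res.fun
```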

3.2.4 Avoiding Lower-Depth Minima

There is one more subtlety we have to pay attention to, especially when \(r \ge 3\). The fact that the r-RSB formula contains the \((r-1)\)-RSB as a special case means that the optimization has the tendency to converge to such “lower-depth” local minima (on the boundary of the parameter space). So it is beneficial to distort the target function in some way in order to force the \(r-1\) Parisi parameters to stay away from the boundary. That is, we need to add a penalty term to our function based on the distance of each \(m_k\) from 1. Once the function value is sufficiently small, we can continue the optimization with the original (undistorted) function.
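As an illustration (this particular penalty is our own choice for the sketch, not necessarily the one used for the reported bounds), one can add a term that blows up as any \(m_k\) approaches 1 and drop it once the search has moved into the interior:

```python
import numpy as np

def penalized(objective, m_indices, strength=1e-3):
    # m_indices: positions of m_1, ..., m_{r-1} inside the flattened parameter vector
    def f(x):
        return objective(x) + strength * np.sum(1.0 / (1.0 - x[m_indices]))
    return f
```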

4 One-Step RSB Revisited

It is possible to improve the previous best bounds even within the framework of the \(r=1\) case of the interpolation method. Recall that Theorem 2.2 gives the following bound in this case:

$$\begin{aligned} \alpha ^*_d \, m \log \lambda \le \log \mathbb {E}\big ( 1+ \lambda (1-x_1)\cdots (1-x_d) \big )^m - \frac{d}{2} \log \mathbb {E}(1-x_1 x_2)^m , \end{aligned}$$

where \(x_1,\ldots ,x_d\) are IID from some fixed distribution \(\eta \) on [0, 1]. If we use \(\eta =q \delta _{1-1/\lambda }+(1-q)\delta _0\) and take the limit \(m \rightarrow 0, \lambda \rightarrow \infty \) with \(m \log \lambda = \log \lambda _0\), then we get (2) as explained in Sect. 2.2 for general r. Optimizing (2) leads to what we refer to as the 1-RSB bound throughout the paper. In this section we show how one can improve on (2) for \(d \le 19\) by considering a more sophisticated \(\eta \). We will refer to the obtained bounds as \(1^+\)-RSB bounds. Although this approach is generally inferior to 2-RSB bounds, it is computationally less demanding. In fact, for degrees \(d=17,19\) we could only perform the 2-RSB optimization with \(n_1=2\) and the obtained bound was actually worse than the \(1^+\)-RSB bound outlined below.

For the sake of simplicity we start with a choice of \(\eta \) only slightly more general than the original one: let \(\eta \) have three atoms at the locations

$$\begin{aligned} 0, \qquad 1-1/\sqrt{\lambda }, \qquad 1-1/\lambda . \end{aligned}$$

We denote the measures of these atoms by , where ; i.e.,

Note that the original choice corresponds to the case .

As before, we let \(m \rightarrow 0, \lambda \rightarrow \infty \) with \(m \log \lambda = \log \lambda _0\), which leads to the following bound:

$$\begin{aligned} \alpha ^*_d \log (\lambda _0) \le \log S - \frac{d}{2} \log E , \end{aligned}$$
(9)

where

Substituting , we have three remaining parameters: . Setting the partial derivatives of the right-hand side w.r.t. \(q_0\) and to 0, we get that

$$\begin{aligned} 0=\partial _{q_0} \big (\log S - \frac{d}{2} \log E \big ) = \frac{\partial _{q_0} S}{S} - \frac{d}{2} \frac{\partial _{q_0} E}{E} , \end{aligned}$$

and similarly for . It follows that for the optimal choice of parameters we have

One can easily compute these partial derivatives to conclude that

The second equality gives

which turns the first equality into

So our bound has one free parameter left (\(q_0\)), in which we can easily optimize numerically. For \(d=3\) one gets 0.450851131. This is the simplest way to improve upon the basic 1-RSB bound.

More generally, one can take any measure \(\tau \) on [0, 1] and define \(\eta \) as the push-forward of \(\tau \) w.r.t. the mapping \(t \mapsto 1-1/\lambda ^t\). Once again, letting \(m \rightarrow 0, \lambda \rightarrow \infty \) with \(m \log \lambda = \log \lambda _0\), we get the following:

$$\begin{aligned} \alpha ^*_d \log (\lambda _0){} & {} \le \log \bigg ( \int \lambda _0^{\max (0,1-\sum t_\ell )} \, \textrm{d} \tau ^d(t_1,\ldots ,t_d) \bigg ) \\{} & {} \quad - \frac{d}{2} \log \bigg ( \int \lambda _0^{-\min (t_1,t_2)} \, \textrm{d} \tau ^2(t_1,t_2) \bigg ) . \end{aligned}$$

For any fixed \(\lambda _0\), an optimal \(\tau \) must satisfy a simple fixed point equation involving the convolution power \(\tau ^{*(d-1)}\). For \(\textrm{div} \in \mathbb {N}\) one can divide [0, 1] into \(\textrm{div}\) many intervals and search among atomic measures \(\tau \) with atom locations at \(i/\textrm{div}\), \(i=0,1,\ldots ,\textrm{div}\). It is possible to numerically solve the fixed point equation by an iterative algorithm. Then it remains to tune the parameter \(\lambda _0\). We computed these \(1^+\)-RSB bounds for \(\textrm{div}=1,2,4,8,\ldots ,1024\). Note that \(\textrm{div}=1\) corresponds to the original 1-RSB, while \(\textrm{div}=2\) gives (9). Table 6 shows the results for \(d=3\).

Table 6 Our \(1^+\)-RSB bounds for \(\alpha ^*_3\)
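Once a candidate \(\tau \) is fixed, evaluating the bound displayed above is cheap. The following minimal sketch (helper names are ours) does so for an atomic \(\tau \) supported on the grid \(i/\textrm{div}\): the star term uses the fact that the law of \(t_1+\cdots +t_d\) is the d-fold convolution of the weight vector, while the edge term is a direct double sum.

```python
import numpy as np

def one_plus_rsb_bound(d, lam0, w, div):
    # w: weights of tau at the atom locations 0, 1/div, ..., 1
    w = np.asarray(w, dtype=float)
    t = np.arange(div + 1) / div
    w_sum = w.copy()
    for _ in range(d - 1):                     # law of t_1 + ... + t_d
        w_sum = np.convolve(w_sum, w)
    s = np.arange(len(w_sum)) / div
    star = np.sum(w_sum * lam0 ** np.maximum(0.0, 1.0 - s))
    edge = np.sum(np.outer(w, w) * lam0 ** (-np.minimum.outer(t, t)))
    return (np.log(star) - d / 2 * np.log(edge)) / np.log(lam0)

# div = 1 with weights (1 - q, q) recovers the basic 1-RSB bound (2)
print(one_plus_rsb_bound(3, 19.0, [0.6, 0.4], 1))
```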