The number of solutions for random regular NAE-SAT

Abstract

Recent work has made substantial progress in understanding the transitions of random constraint satisfaction problems. In particular, for several of these models, the exact satisfiability threshold has been rigorously determined, confirming predictions of statistical physics. Here we revisit one of these models, random regular k-nae-sat: knowing the satisfiability threshold, it is natural to study, in the satisfiable regime, the number of solutions in a typical instance. We prove here that these solutions have a well-defined free energy (limiting exponential growth rate), with explicit value matching the one-step replica symmetry breaking prediction. The proof develops new techniques for analyzing a certain “survey propagation model” associated to this problem. We believe that these methods may be applicable in a wide class of related problems.


Data Availability Statement

Data sharing is not applicable to this article as no datasets were generated or analyzed during this study.

Notes

  1. The converse is not needed for the final bound, but we mention it for the sake of concreteness.

  2. The matrix \(MM^t\) is invertible: if \(MM^tx=0\) then \(M^t x \in \ker M = ({{\,\mathrm{im}\,}}M^t)^\perp \). On the other hand clearly \(M^t x \in {{\,\mathrm{im}\,}}M^t\), so \(M^t x \in ({{\,\mathrm{im}\,}}M^t) \cap ({{\,\mathrm{im}\,}}M^t)^\perp =\{0\}\). Therefore \(x\in \ker M^t\), but \(M^t\) is injective by assumption.

  3. For the proof of Theorem E.5 it is equivalent to sample \(\rho \) from \(\eta ^\text {av} \equiv \int \eta \,d\zeta \).

  4. The event \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r}\) is measurable with respect to \({\mathscr {F}}_{r,\circ }\), since \({\delta 'V_{r,\circ }},{\delta 'U_{r,\circ }}\) would remain less than k if the coupling fails at an earlier iteration.

  5. That is to say, let \((w_\gamma )_{\gamma \geqslant 1}\) be a Poisson point process on \(\mathbb {R}_{>0}\) with intensity measure \(w^{-(1+\lambda )}\,dw\). Let W denote their sum, which is finite almost surely. Assume the points of \(w_\gamma \) are arranged in decreasing order, and write \(z_\gamma \equiv w_\gamma /W\). Then \((z_\gamma )_{\gamma \geqslant 1}\) is distributed as a Poisson–Dirichlet process with parameter \(\lambda \).
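For concreteness, the Poisson–Dirichlet construction in Note 5 can be simulated by truncating the point process at a small level. The following sketch is illustrative only: the truncation level `eps`, the sampler details, and the function name are our own choices, and the normalization neglects the mass of points below `eps` (which is \(O(\varepsilon ^{1-\lambda })\)).

```python
import math
import random

def poisson_dirichlet_weights(lam, eps=0.01, seed=0):
    """Approximate the largest weights of a Poisson-Dirichlet(lam) vector by
    truncating the Poisson point process with intensity w^(-(1+lam)) dw
    at w > eps, then normalizing by the (truncated) total mass W."""
    rng = random.Random(seed)
    # Mean number of points above eps: int_eps^infty w^(-(1+lam)) dw = eps^(-lam)/lam.
    mu = eps ** (-lam) / lam
    # Knuth's multiplicative method for a Poisson(mu) count.
    count, prod, threshold = 0, 1.0, math.exp(-mu)
    while prod >= threshold:
        prod *= rng.random()
        count += 1
    count -= 1
    # Given the count, each point has density proportional to w^(-(1+lam)) on
    # (eps, infty); inverse-CDF sampling gives w = eps * U^(-1/lam).
    points = sorted((eps * rng.random() ** (-1.0 / lam) for _ in range(count)),
                    reverse=True)
    total = sum(points)  # approximates W; mass below eps is neglected
    return [w / total for w in points]

z = poisson_dirichlet_weights(0.5)  # decreasing weights summing to 1
```

The returned list plays the role of \((z_\gamma )_{\gamma \geqslant 1}\), arranged in decreasing order.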


Acknowledgements

We are grateful to Amir Dembo, Jian Ding, Andrea Montanari, and Lenka Zdeborová for helpful conversations. We thank the anonymous referee and Youngtak Sohn for pointing out errors and giving many helpful comments on drafts of the paper. We also gratefully acknowledge the hospitality of the Simons Institute at Berkeley, where part of this work was completed during a spring 2016 semester program.

Author information

Corresponding author

Correspondence to Allan Sly.

Additional information


Research supported in part by NSF grants DMS-1208338 and DMS-1352013 and a Sloan Fellowship (Allan Sly), and by an NSF Mathematical Sciences Postdoctoral Research Fellowship (Nike Sun).

Appendices

Appendix A: Contraction estimates

We now prove Proposition 5.5, on the contraction of the bp recursion for the coloring model. In Section A.1 we analyze the recursions for the first moment (single-copy) model and prove Proposition 5.5a. In Section A.2 we analyze the recursions for the second moment (pair) model and prove the remainder of Proposition 5.5. We assume throughout the section that \(0\leqslant \lambda \leqslant 1\) and \(1\leqslant T\leqslant \infty \).

A.1. Single-copy coloring recursions

Recall from Sect. 5.2 that the bp recursion is a pair (64) of mappings \(\dot{{\texttt {BP}}}: {\mathscr {P}}(\hat{\Omega }_T)\rightarrow {\mathscr {P}}(\dot{\Omega }_T)\) and \(\hat{{\texttt {BP}}}: {\mathscr {P}}(\dot{\Omega }_T)\rightarrow {\mathscr {P}}(\hat{\Omega }_T)\). Recall that for our purposes we can restrict attention to measures satisfying \({\dot{q}}={\dot{q}}^\text {av}\) and \(\hat{q}=\hat{q}^\text {av}\). Under this restriction, the bp recursion is quite explicit, as we now describe. Recall from Definition 2.8, equations (24) and (25), that for \(\dot{\tau }\in \dot{\mathscr {M}}\) and \(\hat{\tau }\in \hat{\mathscr {M}}\) we defined \(\dot{{\texttt {m}}}(\dot{\tau })\) and \(\hat{{\texttt {m}}}(\hat{\tau })\) as probability measures on \(\{{\texttt {0}},{\texttt {1}}\}\). For convenience, we also define

$$\begin{aligned} \dot{{\texttt {m}}}({\texttt {r}}_{\texttt {1}})=\hat{{\texttt {m}}}({\texttt {b}}_{\texttt {1}})=\delta _{\texttt {1}},\quad \dot{{\texttt {m}}}({\texttt {r}}_{\texttt {0}})=\hat{{\texttt {m}}}({\texttt {b}}_{\texttt {0}})=\delta _{\texttt {0}}\,.\end{aligned}$$
(69)

In what follows we often represent a probability measure on \(\{{\texttt {0}},{\texttt {1}}\}\) by the probability assigned to \({\texttt {1}}\), writing \(\dot{m}(\dot{\tau })\equiv \dot{{\texttt {m}}}[\dot{\tau }]({\texttt {1}})\) and \(\hat{m}(\hat{\tau })\equiv \hat{{\texttt {m}}}[\hat{\tau }]({\texttt {1}})\). Thus, equations (24), (25), and (69) together define mappings \(\dot{m}:\dot{\Omega }\rightarrow [0,1]\) and \(\hat{m}:\hat{\Omega }\rightarrow [0,1]\). Recall that we denote \(\{{\texttt {r}}\}\equiv \{{\texttt {r}}_{\texttt {1}},{\texttt {r}}_{\texttt {0}}\}\), \(\{{\texttt {b}}\}\equiv \{{\texttt {b}}_{\texttt {1}},{\texttt {b}}_{\texttt {0}}\}\), and \(\{{\texttt {f}}\}\equiv \Omega {\setminus }\{{\texttt {r}},{\texttt {b}}\}\). We also write \(\{{\texttt {f}}\}\equiv (\dot{\Omega }\cup \hat{\Omega }){\setminus }\{{\texttt {r}},{\texttt {b}}\}\); the precise meaning of \(\{{\texttt {f}}\}\) will be unambiguous from context. Then, for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\), let us abbreviate

$$\begin{aligned}{\texttt {g}}\equiv {\texttt {b}}\cup {\texttt {f}}, \ {\texttt {g}}_{{\varvec{x}}}\equiv {\texttt {b}}_{{\varvec{x}}}\cup {\texttt {f}}, \ {\texttt {y}}\equiv {\texttt {r}}\cup {\texttt {f}}, \ {\texttt {p}}_{{\varvec{x}}} \equiv {\texttt {b}}_{{\varvec{x}}} \cup {\texttt {r}}_{{\varvec{x}}}\,.\end{aligned}$$

The variable recursion \(\dot{{\texttt {BP}}}\equiv \dot{{\texttt {BP}}}_{\lambda ,T}\) is given by

$$\begin{aligned}(\dot{{\texttt {BP}}}\hat{q})(\dot{\sigma }) \cong {\left\{ \begin{array}{ll} \hat{q}({\texttt {p}}_{\texttt {1}})^{d-1} &{} \text {if }\dot{\sigma }\in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\},\\ \hat{q}({\texttt {p}}_{\texttt {1}})^{d-1} - (\hat{q}({\texttt {b}}_{\texttt {1}}))^{d-1} &{} \text {if }\dot{\sigma }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\},\\ \displaystyle \dot{z}(\dot{\sigma })^\lambda \sum _{\hat{\sigma }_2,\ldots ,\hat{\sigma }_d} {\mathbf {1}}\bigg \{ \dot{\sigma }= \dot{T}\Big ((\hat{\sigma }_i)_{i\geqslant 2}\Big ) \bigg \} \prod _{i=2}^d \hat{q}(\hat{\sigma }_i) &{} \text {if }\dot{\sigma }\in \dot{\Omega }_T{\setminus }\{{\texttt {r}},{\texttt {b}}\}, \end{array}\right. } \end{aligned}$$

where \(\cong \) indicates the normalization which makes \(\dot{{\texttt {BP}}}\hat{q}\) a probability measure on \(\dot{\Omega }_T\). For the clause recursion, let us write \(\underline{{\dot{\sigma }}}\sim \hat{\sigma }\) if \(\underline{{\dot{\sigma }}}\equiv (\dot{\sigma }_2,\ldots ,\dot{\sigma }_k)\in (\dot{\Omega }_T)^{k-1}\) is compatible with \(\hat{\sigma }\), in the sense that

$$\begin{aligned} \Big \{\underline{{\sigma }}=( (\dot{\sigma },\hat{\sigma }), (\dot{\sigma }_2,\hat{\sigma }_2), \ldots (\dot{\sigma }_k,\hat{\sigma }_k)) \in (\Omega _T)^k: \hat{I}^\text {lit}(\underline{{\sigma }})=1 \Big \} \ne \varnothing . \end{aligned}$$
(70)

The clause recursion \(\hat{{\texttt {BP}}}\equiv \hat{{\texttt {BP}}}_{\lambda ,T}\) is given by

$$\begin{aligned}(\hat{{\texttt {BP}}}{\dot{q}})(\hat{\sigma }) \cong {\left\{ \begin{array}{ll} \displaystyle {\dot{q}}({\texttt {b}}_{{\texttt {0}}})^{k-1} &{} \text {if }\hat{\sigma }\in \{{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\},\\ \displaystyle \hat{z}(\hat{\sigma })^\lambda \sum _{\dot{\sigma }_2,\ldots ,\dot{\sigma }_k} {\mathbf {1}}\bigg \{ \hat{\sigma }= \hat{T}\Big ( (\dot{\sigma }_i)_{i\geqslant 2}\Big ) \bigg \} \prod _{i=2}^k{\dot{q}}(\dot{\sigma }_i) &{} \text {if }\hat{\sigma }\in \hat{\Omega }_T{\setminus }\{{\texttt {r}},{\texttt {b}}\},\\ \displaystyle \sum _{\underline{{\dot{\sigma }}} \sim {\texttt {b}}_{\texttt {1}}} \Big ( 1-\prod _{i=2}^k \dot{m}(\dot{\sigma }_i) \Big )^\lambda \prod _{i=2}^k {\dot{q}}(\dot{\sigma }_i) &{} \text {if } \hat{\sigma }\in \{{\texttt {b}}_{\texttt {0}}, {\texttt {b}}_{\texttt {1}}\}, \end{array}\right. } \end{aligned}$$

where the last line uses the convention (69). Recall that \({\texttt {BP}}\equiv \dot{{\texttt {BP}}}\circ \hat{{\texttt {BP}}}\equiv {\texttt {BP}}_{\lambda ,T}\). We will show the following contraction result (assuming, as always, \(0\leqslant \lambda \leqslant 1\) and \(1\leqslant T\leqslant \infty \)).

Proposition A.1

If \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }\), then \({\texttt {BP}}{\dot{q}}_1, {\texttt {BP}}{\dot{q}}_2 \in \varvec{\Gamma }\) and \(\Vert {{\texttt {BP}}{\dot{q}}_1-{\texttt {BP}}{\dot{q}}_2} \Vert = O(k^2/2^k)\Vert {{\dot{q}}_1-{\dot{q}}_2} \Vert \).

Before the proof of Proposition A.1 we deduce the following consequences:

Proof of Proposition 5.5a

Let \({\dot{q}}^{(0)}\) be the uniform measure on \(\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}},{\texttt {r}}_{\texttt {1}},{\texttt {r}}_{\texttt {0}}\}\), and let \({\dot{q}}^{(l)} \equiv {\texttt {BP}}{\dot{q}}^{(l-1)}\). It is clear that \({\dot{q}}^{(0)}\in \varvec{\Gamma }\), so Proposition A.1 implies \({\dot{q}}^{(l)}\in \varvec{\Gamma }\) for all \(l\geqslant 1\), and furthermore that \(({\dot{q}}^{(l)})_{l\geqslant 1}\) forms an \(\ell ^1\) Cauchy sequence. By completeness of \(\ell ^1\) we conclude that there exists \({\dot{q}}^{(\infty )}={\dot{q}}_\star \in \varvec{\Gamma }\) such that \({\texttt {BP}}{\dot{q}}_\star = {\dot{q}}_\star \) and \(\Vert {{\dot{q}}^{(l)} - {\dot{q}}_\star } \Vert \rightarrow 0\) as \(l\rightarrow \infty \). Applying Proposition A.1 again gives \(\Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert = O(k^2/2^k)\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \) for any \({\dot{q}}\in \varvec{\Gamma }\), from which it follows that \({\dot{q}}_\star \) is the unique fixed point of \({\texttt {BP}}\) in \(\varvec{\Gamma }\).\(\square \)
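The convergence step in this proof is the Banach fixed-point theorem for an \(\ell ^1\) contraction. The following toy sketch is our own illustration (a contrived contraction on the 3-simplex, not the actual \({\texttt {BP}}\) recursion of this paper); it shows the mechanism: successive \(\ell ^1\) gaps decay geometrically, so the iterates are Cauchy and converge to the unique fixed point.

```python
def l1(u, v):
    """l1 distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def toy_bp(q, c=0.05):
    """A toy contraction on the 3-simplex: a fixed anchor measure plus a
    small nonlinear perturbation, normalized.  Its l1 Lipschitz constant
    is O(c), so iteration converges geometrically to a unique fixed point."""
    anchor = (0.5, 0.3, 0.2)
    pert = (q[0] * q[1], q[1] * q[2], q[2] * q[0])
    z = sum(a + c * p for a, p in zip(anchor, pert))
    return tuple((a + c * p) / z for a, p in zip(anchor, pert))

q = (1 / 3, 1 / 3, 1 / 3)
gaps = []
for _ in range(30):
    q_next = toy_bp(q)
    gaps.append(l1(q, q_next))
    q = q_next

# Successive gaps decay geometrically, so the iterates form an l1 Cauchy
# sequence; the limit is the unique fixed point, as in the proof above.
assert gaps[-1] < 1e-12
assert all(g2 <= 0.5 * g1 + 1e-15 for g1, g2 in zip(gaps, gaps[1:]))
```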

Proof of Proposition 5.5b

For each \(1\leqslant T\leqslant \infty \), let \(({\dot{q}}_{\lambda ,T})^{(l)}\) (\(l\geqslant 0\)) be defined in the same way as \({\dot{q}}^{(l)}\) in the proof of Proposition 5.5a. It follows from the definition that \(({\dot{q}}_{\lambda ,T})^{(l)} = ({\dot{q}}_{\lambda ,\infty })^{(l)}\) for all \(l \leqslant l_T\), where \(l_T\equiv \ln T/ \ln (dk)\). By the triangle inequality and Proposition 5.5a,

$$\begin{aligned} \Vert {{\dot{q}}_{\lambda ,T} - {\dot{q}}_{\lambda ,\infty }} \Vert \leqslant \Vert {{\dot{q}}_{\lambda ,T} - ({\dot{q}}_{\lambda ,\infty })^{(l_T)}} \Vert +\Vert {({\dot{q}}_{\lambda ,\infty })^{(l_T)} - {\dot{q}}_{\lambda ,\infty }} \Vert \leqslant (C/2^k)^{l_T} \end{aligned}$$

for some absolute constant C. The result follows assuming \(k\geqslant k_0\).\(\square \)

We now turn to the proof of Proposition A.1. We work with the non-normalized bp recursions \(\dot{{\texttt {NB}}}\equiv \dot{{\texttt {NB}}}_{\lambda ,T}\) and \(\hat{{\texttt {NB}}}\equiv \hat{{\texttt {NB}}}_{\lambda ,T}\), defined by substituting “\(\cong \)” with “\(=\)” in the definitions of \(\dot{{\texttt {BP}}}\) and \(\hat{{\texttt {BP}}}\) respectively. One can then recover \(\dot{{\texttt {BP}}},\hat{{\texttt {BP}}}\) from \(\dot{{\texttt {NB}}},\hat{{\texttt {NB}}}\) via

$$\begin{aligned} (\dot{{\texttt {BP}}}\hat{p})(\dot{\sigma }) = \frac{(\dot{{\texttt {NB}}}\hat{p})(\dot{\sigma })}{\sum _{\dot{\sigma }' \in \dot{\Omega }}(\dot{{\texttt {NB}}}\hat{p})(\dot{\sigma }')}\,,\quad (\hat{{\texttt {BP}}}\dot{p})(\hat{\sigma }) = \frac{(\hat{{\texttt {NB}}}\dot{p})(\hat{\sigma })}{\sum _{\hat{\sigma }' \in \hat{\Omega }}(\hat{{\texttt {NB}}}\dot{p})(\hat{\sigma }')}\,. \end{aligned}$$

Let \(\dot{p}\) be the reweighted measure defined by

$$\begin{aligned} \dot{p}(\dot{\sigma }) \equiv [\dot{p}({\dot{q}})](\dot{\sigma }) \equiv \frac{{\dot{q}}(\dot{\sigma })}{1 - {\dot{q}}({\texttt {r}})}\,. \end{aligned}$$
(71)

In the above we have assumed that the inputs to \(\dot{{\texttt {BP}}},\hat{{\texttt {BP}}},\dot{{\texttt {NB}}},\hat{{\texttt {NB}}}\) are probability measures; we now extend them in the obvious manner to nonnegative measures with strictly positive total mass.

Given two measures \(r_1,r_2\) defined on any space \(\mathcal {X}\), we denote \(\Delta r(x) \equiv |r_1(x)-r_2(x)|\). We regard \(\Delta r\) as a nonnegative measure on \(\mathcal {X}\): for any subset \(S\subseteq \mathcal {X}\),

$$\begin{aligned} \Delta r(S) =\sum _{x\in S}|r_1(x)-r_2(x)| \geqslant |r_1(S)-r_2(S)|, \end{aligned}$$

where the inequality may be strict. For any nonnegative measure \({\hat{r}}\) on \(\hat{\Omega }\), we abbreviate

$$\begin{aligned} \hat{m}^\lambda {\hat{r}}(\hat{\sigma })&\equiv \hat{m}(\hat{\sigma })^\lambda {\hat{r}}(\hat{\sigma }),\\ (1-\hat{m})^\lambda {\hat{r}}(\hat{\sigma })&\equiv (1-\hat{m}(\hat{\sigma }))^\lambda {\hat{r}}(\hat{\sigma }). \end{aligned}$$

In what follows we will begin with two measures in \(\varvec{\Gamma }\), and show that they contract under one step of the bp recursion. Let \(\hat{{\texttt {NB}}}\) and \(\dot{{\texttt {NB}}}\) be the non-normalized single-copy bp recursions at parameters \(\lambda ,T\). Starting from \({\dot{q}}_i\in \varvec{\Gamma }\) (\(i=1,2\)), denote

$$\begin{aligned} \dot{p}_i&\equiv \dot{p}({\dot{q}}_i) \text { (as defined by } (71)),\\ \hat{p}_i&\equiv \hat{{\texttt {NB}}}(\dot{p}_i) \text { and } \hat{p}_{i,\infty }\equiv \hat{{\texttt {NB}}}_{\lambda ,\infty }(\dot{p}_i),\\ \dot{p}^{\text {u}}_i&\equiv \dot{{\texttt {NB}}}(\hat{p}_i) \text { and } \tilde{q}_i \equiv \dot{{\texttt {BP}}}\hat{p}_i ={\texttt {BP}}{\dot{q}}_i. \end{aligned}$$

With this notation in mind, the proof of Proposition A.1 is divided into four lemmas.

Lemma A.2

(effect of reweighting) Assuming \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }\), \(\Vert {\Delta \dot{p}} \Vert = O(1) \Vert {{\dot{q}}_1 - {\dot{q}}_2} \Vert \), where O(1) indicates a constant depending on the constant appearing in (68).

Lemma A.3

(clause bp) Assuming \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }\),

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {s}})&= 1 -4/2^k + O(k/4^k),\nonumber \\ \hat{m}^\lambda \hat{p}_i({\texttt {f}})&=\hat{m}^\lambda \hat{p}_i({\texttt {s}}) + O(k/4^k),\nonumber \\ \hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}})&= 1 + O(k/2^k),\nonumber \\ \hat{m}^\lambda \hat{p}_i({\texttt {r}}_{\texttt {1}})&= (2/2^k)[1 + O(k/2^k)]. \end{aligned}$$
(72)

Further, writing \(\Delta \hat{m}^\lambda \hat{p}(\cdot ) \equiv \hat{m}^\lambda (\cdot )|\hat{p}_1 (\cdot ) - \hat{p}_2 (\cdot )|\),

$$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}({\texttt {f}}) + \Delta \hat{m}^\lambda \hat{p}({\texttt {r}})&= O(k/2^k)\Delta \dot{p}({\texttt {f}}),\nonumber \\ \Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert&= O(k^2/2^k) \Vert {\Delta \dot{p}} \Vert . \end{aligned}$$
(73)

(Recall that \(\hat{p}(\hat{\sigma }\oplus {\texttt {1}})=\hat{p}(\hat{\sigma })\) and \(\hat{m}(\hat{\sigma }\oplus {\texttt {1}})=1-\hat{m}(\hat{\sigma })\), so \((1-\hat{m})^\lambda \hat{p}(\hat{\sigma }) = \hat{m}^\lambda \hat{p}(\hat{\sigma }\oplus {\texttt {1}})\). As a result, the bounds for \(\Delta \hat{m}^\lambda \hat{p}\) imply analogous bounds for \(\Delta (1-\hat{m})^\lambda \hat{p}\).)

Lemma A.4

(variable bp, non-normalized) Assuming \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }\), we have

$$\begin{aligned} \begin{bmatrix} \dot{p}^{\text {u}}_i({\texttt {f}}) \\ \dot{p}^{\text {u}}_i({\texttt {r}}) \end{bmatrix} = \begin{bmatrix} O(2^{-k}) \\ 1+O(2^{-k}) \end{bmatrix}\dot{p}^{\text {u}}_i({\texttt {b}}) ,\quad \begin{bmatrix} \Delta \dot{p}^{\text {u}}({\texttt {f}})\\ \Delta \dot{p}^{\text {u}}({\texttt {b}})\\ \Delta \dot{p}^{\text {u}}({\texttt {r}}) \end{bmatrix} = \begin{bmatrix} O(k) \\ O(k2^k) \\ O(k2^k) \end{bmatrix} \Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \max _{i=1,2}\Big \{ \dot{p}^{\text {u}}_i({\texttt {b}})\Big \}. \end{aligned}$$
(74)

Lemma A.5

(variable bp, normalized) Assuming \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }\), we have \(\tilde{q}_1,\tilde{q}_2\in \varvec{\Gamma }\) as well, with

$$\begin{aligned}\Vert {\tilde{q}_1-\tilde{q}_2} \Vert \lesssim k \Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \,.\end{aligned}$$

Proof of Proposition A.1

Follows by combining the four preceding Lemmas A.2–A.5.\(\square \)

We now prove the four lemmas.

Proof of Lemma A.2

This follows from the elementary identity

$$\begin{aligned} \frac{a_1}{b_1} - \frac{a_2}{b_2} = \frac{1}{b_1} (a_1 - a_2) + \frac{b_2 - b_1}{b_1b_2} {a_2}\,. \end{aligned}$$
(75)

together with (68).\(\square \)
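For the record, the identity (75) is a one-line computation over the common denominator \(b_1b_2\):

```latex
\frac{a_1}{b_1}-\frac{a_2}{b_2}
  =\frac{a_1b_2-a_2b_1}{b_1b_2}
  =\frac{(a_1-a_2)b_2+a_2(b_2-b_1)}{b_1b_2}
  =\frac{1}{b_1}(a_1-a_2)+\frac{b_2-b_1}{b_1b_2}\,a_2\,.
```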

In the proof of the next two lemmas, the following elementary fact will be used repeatedly: suppose for \(1\leqslant l\leqslant m\) that we have nonnegative measures \(a^l,b^l\) over a finite set \(\mathcal {X}^l\). Then, denoting \(\underline{{\mathcal {X}}}=\mathcal {X}^1\times \cdots \times \mathcal {X}^m\), we have

$$\begin{aligned} \sum _{\underline{{x}}\in \underline{{\mathcal {X}}}} \bigg | \prod _{l=1}^m a^l(x^l) -\prod _{l=1}^m b^l(x^l) \bigg |&\leqslant \sum _{l=1}^m \sum _{\underline{{x}}\in \underline{{\mathcal {X}}}} \bigg \{\prod _{1\leqslant j<l} b^j(x^j)\bigg \} \bigg \{\prod _{l<j\leqslant m} a^j(x^j)\bigg \} \Big | a^l(x^l)-b^l(x^l) \Big |\nonumber \\&\leqslant \sum _{l=1}^m \Vert {a^l-b^l} \Vert \prod _{j\ne l} \Big ( \Vert {a^j} \Vert +\Vert {a^j-b^j} \Vert \Big ). \end{aligned}$$
(76)

If all the \((\mathcal {X}^l,a^l,b^l)\) are the same \((\mathcal {X},a,b)\), this reduces to the bound

$$\begin{aligned} \sum _{x_1,\dots ,x_m\in \mathcal {X}} \bigg | \prod _{i=1}^{m} a(x_i) - \prod _{i=1}^{m} b(x_i) \bigg | \leqslant m\Vert {a-b} \Vert \Big ( \Vert {a} \Vert + \Vert {a-b} \Vert \Big )^{m-1}\,. \end{aligned}$$
(77)
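The bound (77) is easy to sanity-check by brute-force enumeration on small examples. The sketch below is an illustration only (the test measures and the function name are our own choices); it compares the two sides of (77) directly.

```python
import itertools
import math

def product_difference_bound(a, b, m):
    """Return (lhs, rhs) of the bound (77): the l1 distance between m-fold
    product measures, against m * ||a-b|| * (||a|| + ||a-b||)^(m-1)."""
    lhs = sum(
        abs(math.prod(a[x] for x in xs) - math.prod(b[x] for x in xs))
        for xs in itertools.product(range(len(a)), repeat=m)
    )
    diff = sum(abs(x - y) for x, y in zip(a, b))
    rhs = m * diff * (sum(abs(x) for x in a) + diff) ** (m - 1)
    return lhs, rhs

lhs, rhs = product_difference_bound([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], m=3)
assert 0 < lhs <= rhs
```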

In what follows we will abbreviate (for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\))

$$\begin{aligned} {\texttt {a}}_{\varvec{x}}\equiv \Big \{\hat{\sigma }\in \hat{\Omega }_T: \underline{{\dot{\sigma }}}\in ({\texttt {g}}_{\varvec{x}})^{k-1} \text { for all } \underline{{\dot{\sigma }}}\sim \hat{\sigma }\Big \}\,. \end{aligned}$$
(78)

Proof of Lemma A.3

From the definition, if \(\dot{p}=\dot{p}({\dot{q}})\) then

$$\begin{aligned}\dot{p}({\texttt {b}}) =\frac{{\dot{q}}({\texttt {b}})}{1-{\dot{q}}({\texttt {r}})} =\frac{{\dot{q}}({\texttt {b}})}{{\dot{q}}({\texttt {g}})} =1-\dot{p}({\texttt {f}})\,. \end{aligned}$$

It follows that for any \({\dot{q}}_1,{\dot{q}}_2 \in \varvec{\Gamma }\) we have \(\Delta \dot{p}({\texttt {b}}) \leqslant \Delta \dot{p}({\texttt {f}}) \leqslant \dot{p}_1({\texttt {f}})+\dot{p}_2({\texttt {f}}) = O(2^{-k})\). Another consequence of the definition of \(\varvec{\Gamma }\) is that \(\Vert {\Delta \dot{p}} \Vert =O(1)\). We now control \(\Delta \hat{m}^\lambda \hat{p}(\hat{\sigma })\), distinguishing a few cases:

  1.

    We first consider \(\hat{\sigma }\in \hat{\Omega }{\setminus }\{{\texttt {b}},{\texttt {s}}\}\). For such \(\hat{\sigma }\) we have

    $$\begin{aligned}\Delta \hat{m}^\lambda \hat{p}(\hat{\sigma }) =\bigg | [\hat{m}(\hat{\sigma }) \hat{z}(\hat{\sigma })]^\lambda \sum _{\underline{{\dot{\sigma }}}\sim \hat{\sigma }} \bigg (\prod _{j=2}^k\dot{p}_1(\dot{\sigma }_j) -\prod _{j=2}^k\dot{p}_2(\dot{\sigma }_j)\bigg ) \bigg |,\end{aligned}$$

    and it is easy to check that

    $$\begin{aligned}\hat{m}(\hat{\sigma }) \hat{z}(\hat{\sigma }) =1-\prod _{j=2}^k \dot{m}(\dot{\sigma }_j)\in [0,1]\,.\end{aligned}$$

    Moreover, any such \(\hat{\sigma }\) must belong to \({\texttt {a}}_{\texttt {0}}\) or \({\texttt {a}}_{\texttt {1}}\). By summing over \(\hat{\sigma }\in {\texttt {a}}_{\texttt {0}}\) and applying (77) we have

    $$\begin{aligned}\Delta \hat{m}^\lambda \hat{p}({\texttt {a}}_{\texttt {0}}) \leqslant (k-1) \Vert (\dot{p}_1-\dot{p}_2) \Vert _{\ell ^1 ({\texttt {g}}_{\texttt {0}})} \Big ( \dot{p}_1({\texttt {g}}_{\texttt {0}}) + \Delta \dot{p}({\texttt {f}}) \Big )^{k-2}\,.\end{aligned}$$

    Recalling that \(\dot{p}_1\) and \(\dot{p}_2\) both lie in \(\varvec{\Gamma }\), in the above we have \(\dot{p}_1({\texttt {g}}_{\texttt {0}}) + \Delta \dot{p}({\texttt {f}}) \leqslant [1 + O(2^{-k})]/2\), as well as \(\Vert (\dot{p}_1-\dot{p}_2) \Vert _{\ell ^1 ({\texttt {b}}_{\texttt {0}},{\texttt {f}})} =O(1) \Delta \dot{p}({\texttt {f}})\). Combining these gives

    $$\begin{aligned}\Delta \hat{m}^\lambda \hat{p}({\texttt {a}}_{\texttt {0}}) =O(k/2^k) \Delta \dot{p}({\texttt {f}}),\end{aligned}$$

    and the same bound holds for \(\Delta \hat{m}^\lambda \hat{p}({\texttt {a}}_{\texttt {1}})\).

  2.

    Next consider \(\hat{\sigma }={\texttt {s}}\), for which we have \(\hat{m}(\hat{\sigma })=1/2\) and \(\hat{z}(\hat{\sigma })=2\). Thus

    $$\begin{aligned} \hat{m}^\lambda \hat{p}({\texttt {s}}) = 1 - (\dot{p}({\texttt {g}}_{\texttt {0}}))^{k-1} - (\dot{p}({\texttt {g}}_{\texttt {1}}))^{k-1} + \dot{p}({\texttt {f}})^{k-1}\,.\end{aligned}$$
    (79)

    Arguing as above gives \(\Delta \hat{m}^\lambda \hat{p}({\texttt {s}}) = O(k/2^k) \Delta \dot{p}({\texttt {f}})\), proving the first half of (73).

  3.

    Lastly consider \(\hat{\sigma }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}}\}\). Recalling (69) we have \(\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {0}})=0\), so let us take \(\hat{\sigma }={\texttt {b}}_{\texttt {1}}\), and consider \(\underline{{\dot{\sigma }}}\sim {\texttt {b}}_{\texttt {1}}\). Note that if \(\underline{{\dot{\sigma }}}\) has no entry in \(\{{\texttt {r}}\}\), then we also have \(\underline{{\dot{\sigma }}}\sim \hat{\sigma }'\) for some \(\hat{\sigma }' \in \{{\texttt {r}},{\texttt {f}}\}\). Again making use of (69), this \(\underline{{\dot{\sigma }}}\) gives the same contribution to \(\hat{m}^\lambda \hat{p}_\infty (\hat{\sigma }')\) as to \(\hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}})\). It follows that

    $$\begin{aligned}\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}) \leqslant \Delta \hat{m}^\lambda \hat{p}_\infty ({\texttt {y}}) + k\Big | \dot{p}_1({\texttt {r}}_{{\texttt {0}}})\dot{p}_1({\texttt {b}}_{\texttt {1}})^{k-2} - \dot{p}_2({\texttt {r}}_{{\texttt {0}}}) \dot{p}_2({\texttt {b}}_{\texttt {1}})^{k-2} \Big |\,.\end{aligned}$$

    The first term on the right-hand side captures the contribution from those \(\underline{{\dot{\sigma }}}\) with no entry in \(\{{\texttt {r}}\}\), and by the preceding arguments it is \(O(k/2^k)\Delta \dot{p}({\texttt {f}})\). It is easy to check that the second term is \(O(k^2/2^k)\Vert {\Delta \dot{p}} \Vert \), which finishes the second part of (73).

Combining the above estimates proves (73). We next prove (72). For this purpose we introduce the notation \({\texttt {f}}_{\geqslant 1}\) to refer to elements of \(\dot{\Omega }\) or \(\hat{\Omega }\) that contain at least one free variable. In particular, \({\texttt {f}}_{\geqslant 1}\cap \hat{\Omega }\) is given by \(\{{\texttt {f}}\}{\setminus }\{{\texttt {s}}\} \subseteq {\texttt {a}}_{\texttt {0}}\cup {\texttt {a}}_{\texttt {1}}\subseteq \hat{\Omega }\). Since \({\dot{q}}_i\in \varvec{\Gamma }\), we must have from (68) that

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {f}}_{\geqslant 1}) \leqslant 2\sum _{l=1}^{k-1} \left( {\begin{array}{c}k-1\\ l\end{array}}\right) \dot{p}_i({\texttt {f}})^l \dot{p}_i({\texttt {b}}_{\texttt {0}})^{k-1-l} \leqslant 2 \dot{p}_i({\texttt {b}}_{\texttt {0}})^{k-1} \sum _{l=1}^{k-1} \bigg ( \frac{k \dot{p}_i({\texttt {f}})}{\dot{p}_i({\texttt {b}}_{\texttt {0}})} \bigg )^l = O(k/4^k)\,.\end{aligned}$$
(80)

On the other hand, we see from (79) that

$$\begin{aligned}\hat{m}^\lambda \hat{p}_i({\texttt {s}}) = 1-4/2^k+O(k/4^k)\,.\end{aligned}$$
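The geometric-series step in (80), which replaces the binomial coefficient \(\left( {\begin{array}{c}k-1\\ l\end{array}}\right) \) by \(k^l\), can be sanity-checked numerically. A minimal sketch, with hypothetical values standing in for \(\dot{p}_i({\texttt {f}})\) and \(\dot{p}_i({\texttt {b}}_{\texttt {0}})\):

```python
from math import comb

def lhs(k, f, b):
    # exact sum in (80): tuples with at least one free entry among k-1 slots
    return sum(comb(k - 1, l) * f**l * b**(k - 1 - l) for l in range(1, k))

def rhs(k, f, b):
    # geometric upper bound obtained from comb(k-1, l) <= k**l
    return b**(k - 1) * sum((k * f / b)**l for l in range(1, k))

# hypothetical values of the order of p_i(f) and p_i(b0) in the proof
k, f, b = 10, 1.0 / 2**10, 0.5
assert lhs(k, f, b) <= rhs(k, f, b)
```

With \(k\dot{p}_i({\texttt {f}})/\dot{p}_i({\texttt {b}}_{\texttt {0}})\ll 1\) the geometric sum is dominated by its first term, giving the stated \(O(k/4^k)\) order.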

If \(\underline{{\dot{\sigma }}}\sim {\texttt {b}}_{\texttt {1}}\) has no entry in \(\{{\texttt {r}}\}\), then there must exist some \(\hat{\sigma }\in \{{\texttt {f}}\}\) such that \(\underline{{\dot{\sigma }}}\sim \hat{\sigma }\) as well. Conversely, if \(\hat{\sigma }\in \hat{\Omega }_T{\setminus }\{{\texttt {r}},{\texttt {b}}\}\) and \(\underline{{\dot{\sigma }}}\sim \hat{\sigma }\), then \(\underline{{\dot{\sigma }}}\sim {\texttt {b}}_{\texttt {1}}\), unless \(\underline{{\dot{\sigma }}}\) has exactly one spin \(\dot{\sigma }_i\in \{{\texttt {b}}_{\texttt {0}},{\texttt {f}}\}\) with the remaining \(k-2\) spins equal to \({\texttt {b}}_{\texttt {1}}\) (see Note 1). It follows that

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}})&=\hat{p}_i({\texttt {b}}_{\texttt {1}})=\hat{m}^\lambda \hat{p}_{i,\infty } ({\texttt {f}}) + (k-1) \Big [ \dot{p}_i({\texttt {r}}_{\texttt {0}}) -\dot{p}_i({\texttt {g}}_{\texttt {0}}) \Big ] \dot{p}_i({\texttt {b}}_{\texttt {1}})^{k-2} \nonumber \\&\leqslant \hat{m}^\lambda \hat{p}_{i,\infty } ({\texttt {f}}) + (k-1) \dot{p}_i({\texttt {r}}_{\texttt {0}}) \dot{p}_i({\texttt {b}}_{\texttt {1}})^{k-2} = 1 + O(k/2^k). \end{aligned}$$
(81)

For a lower bound it suffices to consider the contribution from clauses with all k incident colors in \(\{{\texttt {b}}\}\):

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}}) =\hat{p}_i({\texttt {b}}_{\texttt {1}}) \geqslant \dot{p}_i({\texttt {b}})^{k-1} [1-O(k/2^k)]= 1 - O(k/2^k)\,.\end{aligned}$$
(82)

Lastly, note by symmetry that

$$\begin{aligned}\hat{m}^\lambda \hat{p}_i({\texttt {r}}_{\texttt {1}}) =\hat{p}_i({\texttt {r}}_{\texttt {1}}) =\dot{p}_i({\texttt {b}}_{\texttt {0}})^{k-1} =(2/2^k) \dot{p}_i({\texttt {b}})^{k-1}\,.\end{aligned}$$

Combining these estimates proves (72).\(\square \)

Proof of Lemma A.4

We control \(\dot{p}^{\text {u}}\) and \(\Delta \dot{p}^{\text {u}}\) in two cases.

  1. 1.

    First consider \(\dot{\sigma }\in \dot{\Omega }{\setminus }\{{\texttt {r}},{\texttt {b}}\}\). Up to permutation there is a unique \(\underline{{\hat{\sigma }}}\in \{{\texttt {f}}\}^{d-1}\) such that \(\dot{\sigma }=\hat{T}(\underline{{\hat{\sigma }}})\). Let \({\textsf {comb}}(\dot{\sigma })\) denote the number of distinct tuples \(\underline{{\hat{\sigma }}}'\) that can be obtained by permuting the coordinates of \(\underline{{\hat{\sigma }}}\). For this \(\underline{{\hat{\sigma }}}\) we have

    $$\begin{aligned} \prod _{j=2}^d\hat{m}(\hat{\sigma }_j)^\lambda \leqslant \dot{z}(\dot{\sigma })^\lambda \leqslant \prod _{j=2}^d\hat{m}(\hat{\sigma }_j)^\lambda +\prod _{j=2}^d(1-\hat{m}(\hat{\sigma }_j))^\lambda ,\end{aligned}$$
    (83)

    where the rightmost inequality uses that \((a+b)^\lambda \leqslant a^\lambda + b^\lambda \) for \(a,b\geqslant 0\) and \(\lambda \in [0,1]\). It follows that for \(i=1,2\) we have

    $$\begin{aligned} {\textsf {comb}}(\dot{\sigma }) \prod _{j=2}^d \hat{m}^\lambda \hat{p}_i(\hat{\sigma }_j) \leqslant \dot{p}^{\text {u}}_i(\dot{\sigma }) \leqslant {\textsf {comb}}(\dot{\sigma }) \bigg \{ \prod _{j=2}^d \hat{m}^\lambda \hat{p}_i(\hat{\sigma }_j) + \prod _{j=2}^d (1-\hat{m})^\lambda \hat{p}_i(\hat{\sigma }_j)\bigg \}\,. \end{aligned}$$

    By symmetry \(\hat{m}^\lambda \hat{p}_i({\texttt {f}}) = (1-\hat{m})^\lambda \hat{p}_i({\texttt {f}})\), so

    $$\begin{aligned}{}[\hat{m}^\lambda \hat{p}_i({\texttt {s}})]^{d-1} \leqslant \dot{p}^{\text {u}}_i({\texttt {f}}) \leqslant [\hat{m}^\lambda \hat{p}_i({\texttt {f}})]^{d-1} + [(1-\hat{m})^\lambda \hat{p}_i({\texttt {f}})]^{d-1} = 2[\hat{m}^\lambda \hat{p}_i({\texttt {f}})]^{d-1}\,. \end{aligned}$$
    (84)

    Making use of the symmetry together with (83) gives

    $$\begin{aligned}\Delta \dot{p}^{\text {u}}({\texttt {f}}) \leqslant 2\sum _{\underline{{\hat{\sigma }}} \in (\hat{\Omega }_{\texttt {f}})^{d-1}} \bigg | \prod _{j=2}^{d} \hat{m}^\lambda \hat{p}_1(\hat{\sigma }_j) -\prod _{j=2}^{d} \hat{m}^\lambda \hat{p}_2(\hat{\sigma }_j) \bigg |,\end{aligned}$$

    and applying (77) gives

    $$\begin{aligned}\Delta \dot{p}^{\text {u}}({\texttt {f}}) \lesssim d\Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \Big ( \hat{m}^\lambda \hat{p}_1({\texttt {f}}) +\Delta \hat{m}^\lambda \hat{p}({\texttt {f}}) \Big )^{d-2}\,.\end{aligned}$$

    Combining (72) with the lower bound from (83) then gives

    $$\begin{aligned}\Delta \dot{p}^{\text {u}}({\texttt {f}}) \lesssim d\Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \max _{i=1,2}\Big \{\dot{p}^{\text {u}}_i({\texttt {f}})\Big \}\,.\end{aligned}$$
  2. 2.

    Next consider \(\dot{\sigma }\in \{{\texttt {r}},{\texttt {b}}\}\): for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\), note that \(\dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}}) = \hat{p}_i({\texttt {p}}_{\varvec{x}})^{d-1}\), and

    $$\begin{aligned} \frac{\dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}}) - \dot{p}^{\text {u}}_i({\texttt {b}}_{\varvec{x}})}{\dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}})} =\frac{\hat{p}_i({\texttt {b}}_{\varvec{x}})^{d-1}}{\hat{p}_i({\texttt {p}}_{\varvec{x}})^{d-1}} =\bigg (1 - \frac{\hat{p}_i({\texttt {r}}_{\varvec{x}})}{\hat{p}_i({\texttt {p}}_{\varvec{x}})} \bigg )^{d-1}=O(2^{-k}),\end{aligned}$$
    (85)

    where the last estimate uses (72) and \(d/k=2^{k-1}\ln 2+O(1)\). Applying (77) gives

    $$\begin{aligned}\Delta \dot{p}^{\text {u}}({\texttt {p}}_{\texttt {1}}) \lesssim d \Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \Big (\min _{i=1,2}\Big \{ \hat{m}^\lambda \hat{p}_i({\texttt {p}}_{\texttt {1}})\Big \} +\Delta \hat{m}^\lambda \hat{p}({\texttt {p}}_{\texttt {1}}) \Big )^{d-2}\,.\end{aligned}$$

    Suppose without loss that \(\hat{m}^\lambda \hat{p}_1({\texttt {b}}_{\texttt {1}}) \leqslant \hat{m}^\lambda \hat{p}_2({\texttt {b}}_{\texttt {1}})\): then

    $$\begin{aligned} \hat{m}^\lambda \hat{p}_1({\texttt {p}}_{\texttt {1}}) +\Delta \hat{m}^\lambda \hat{p}({\texttt {p}}_{\texttt {1}})&=\hat{m}^\lambda \hat{p}_2({\texttt {b}}_{\texttt {1}}) +\hat{m}^\lambda \hat{p}_1({\texttt {r}}_{\texttt {1}}) +\Delta \hat{m}^\lambda \hat{p}({\texttt {r}}_{\texttt {1}}) \\&\leqslant \hat{m}^\lambda \hat{p}_2({\texttt {p}}_{\texttt {1}}) +2\Delta \hat{m}^\lambda \hat{p}({\texttt {r}}_{\texttt {1}}), \end{aligned}$$

    and substituting into the above gives

    $$\begin{aligned}\Delta \dot{p}^{\text {u}}({\texttt {p}}_{\texttt {1}}) \lesssim d\Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \Big (\max _{i=1,2}\Big \{ \hat{m}^\lambda \hat{p}_i({\texttt {p}}_{\texttt {1}})\Big \} +\Delta \hat{m}^\lambda \hat{p}({\texttt {r}}_{\texttt {1}}) \Big )^{d-2}\,.\end{aligned}$$

    From (73) and the definition (68) of \(\varvec{\Gamma }\) we have \(\Delta \hat{m}^\lambda \hat{p}({\texttt {r}}_{\texttt {1}}) = O(k/2^k)\Delta \dot{p}({\texttt {f}}) = O(k/4^k)\). It follows from (85) that

    $$\begin{aligned} \Delta \dot{p}^{\text {u}}({\texttt {p}}_{\texttt {1}}) \lesssim d \Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \max _{i=1,2}\Big \{ \dot{p}^{\text {u}}_i({\texttt {b}}_{\texttt {1}})\Big \}\,.\end{aligned}$$
    (86)
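The estimate (85) rests on the asymptotics \(d/k=2^{k-1}\ln 2+O(1)\), under which \((1-2/2^k)^{d-1}\) is of order \(2^{-k}\). A quick numerical check; the values of \(k\), \(d\), and the ratio \(x\) are illustrative stand-ins, not taken from the model:

```python
from math import log

k = 12
d = round(k * 2**(k - 1) * log(2))  # degree, per d/k = 2^{k-1} ln 2 + O(1)
x = 2.0 / 2**k                      # stand-in for the ratio p_i(r_x)/p_i(p_x)
val = (1.0 - x)**(d - 1)
# (1 - 2/2^k)^(d-1) ~ exp(-2d/2^k) = exp(-k ln 2) = 2^{-k}
assert 0.5 * 2**-k <= val <= 2.0 * 2**-k
```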

It remains to show \(\dot{p}^{\text {u}}({\texttt {f}}) / \dot{p}^{\text {u}}({\texttt {b}}) = O(2^{-k})\). From (81),

$$\begin{aligned}\hat{m}^\lambda \hat{p}_i({\texttt {f}}) -\hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}}) \leqslant \hat{m}^\lambda \hat{p}_{i,\infty }({\texttt {f}}) -\hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}}) \leqslant (k-1) \Big [\dot{p}_i({\texttt {g}}_{\texttt {0}}) -\dot{p}_i({\texttt {r}}_{\texttt {0}})\Big ] \dot{p}_i({\texttt {b}}_{\texttt {1}})^{k-2},\end{aligned}$$

and by definition of \(\varvec{\Gamma }\) the right-hand side is \(O(k/4^k) \dot{p}_i({\texttt {b}})^{k-1}\). Now recall from (82) that \(\hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}}) \gtrsim \dot{p}_i({\texttt {b}})^{k-1}\). Combining these gives

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {f}}) \leqslant [1+O(k/4^k)] \hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}})\,.\end{aligned}$$
(87)

Recalling (83), it follows that

$$\begin{aligned}\frac{\dot{p}^{\text {u}}_i({\texttt {f}})}{\dot{p}^{\text {u}}_i({\texttt {b}}_{\texttt {1}})} \lesssim \bigg (\frac{\hat{m}^\lambda \hat{p}_i({\texttt {f}})}{\hat{m}^\lambda \hat{p}_i({\texttt {p}}_{\texttt {1}})}\bigg )^{d-1} \lesssim \bigg (\frac{\hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}})}{\hat{m}^\lambda \hat{p}_i({\texttt {p}}_{\texttt {1}})}\bigg )^{d-1} \lesssim 2^{-k},\end{aligned}$$

where the last step uses (72). This concludes the proof.\(\square \)
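The subadditivity inequality \((a+b)^\lambda \leqslant a^\lambda +b^\lambda \) for \(a,b\geqslant 0\) and \(\lambda \in [0,1]\), used in (83), is easy to verify numerically:

```python
import random

def subadditive(a, b, lam, tol=1e-12):
    # (a + b)^lam <= a^lam + b^lam for a, b >= 0 and lam in [0, 1]
    return (a + b)**lam <= a**lam + b**lam + tol

random.seed(0)
for _ in range(1000):
    a, b = random.uniform(0.0, 10.0), random.uniform(0.0, 10.0)
    lam = random.uniform(0.0, 1.0)
    assert subadditive(a, b, lam)
```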

Proof of Lemma A.5

Denote \(\tilde{q}_i \equiv {\texttt {BP}}{\dot{q}}_i\) and \(\Delta \tilde{q}\equiv |\tilde{q}_1-\tilde{q}_2|\). We first check that \(\tilde{q}_i\) lies in \(\varvec{\Gamma }\): the first condition of (68) follows from (74), and the second is automatically satisfied from the definition of \(\dot{{\texttt {BP}}}\). Next we bound \(\Delta \tilde{q}\). With some abuse of notation, we shall write \(\tilde{q}_i({\texttt {X}})\equiv \tilde{q}_i({\texttt {r}})-\tilde{q}_i({\texttt {b}})\) and

$$\begin{aligned}\Delta \tilde{q}({\texttt {X}}) \equiv |(\tilde{q}_1({\texttt {r}})-\tilde{q}_1({\texttt {b}})) - (\tilde{q}_2({\texttt {r}})-\tilde{q}_2({\texttt {b}}))|\,.\end{aligned}$$

Let \(\dot{p}^{\text {u}}_i({\texttt {X}})\) and \(\Delta \dot{p}^{\text {u}}({\texttt {X}})\) be defined analogously. Arguing as in the derivation of (86),

$$\begin{aligned} \Delta \dot{p}^{\text {u}}({\texttt {X}}) = 2 |\hat{p}_1({\texttt {b}}_{\texttt {1}})^{d-1} -\hat{p}_2({\texttt {b}}_{\texttt {1}})^{d-1}| \lesssim k\Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \max _{i=1,2} \Big \{\dot{p}^{\text {u}}_i({\texttt {b}})\Big \}\,.\end{aligned}$$
(88)

Recalling \(\Vert {\tilde{q}_i} \Vert =1\), we have

$$\begin{aligned} 2\tilde{q}_i({\texttt {r}})&=[1-\tilde{q}_i({\texttt {f}})] +[\tilde{q}_i({\texttt {r}})-\tilde{q}_i({\texttt {b}})] \text { and}\\ 2\tilde{q}_i({\texttt {b}})&=[1-\tilde{q}_i({\texttt {f}})]-[\tilde{q}_i({\texttt {r}})-\tilde{q}_i({\texttt {b}})], \text { so}\\ \Vert {\Delta \tilde{q}} \Vert&\lesssim \Delta \tilde{q}({\texttt {f}}) + \Delta \tilde{q}({\texttt {X}}). \end{aligned}$$

If we take \(a\in \{1,2\}\) and \(b=3-a\), and write \({\dot{Z}}_i\equiv \Vert {\dot{p}^{\text {u}}_i} \Vert \), then

$$\begin{aligned} \Delta \tilde{q}({\texttt {f}}) + \Delta \tilde{q}({\texttt {X}}) \leqslant \frac{\Delta \dot{p}^{\text {u}}({\texttt {f}}) +\Delta \dot{p}^{\text {u}}({\texttt {X}}) }{{\dot{Z}}_a} + \frac{|{\dot{Z}}_a-{\dot{Z}}_b|}{{\dot{Z}}_a} \frac{[\dot{p}^{\text {u}}_b({\texttt {f}}) +\dot{p}^{\text {u}}_b({\texttt {r}})-\dot{p}^{\text {u}}_b({\texttt {b}})]}{{\dot{Z}}_b}\,. \end{aligned}$$

If we take \(a\in {{\,\mathrm{arg\,max}\,}}_i\dot{p}^{\text {u}}_i({\texttt {b}})\), then, by (74) and (88), the first term on the right-hand side is

$$\begin{aligned} \lesssim \frac{ k\Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \dot{p}^{\text {u}}_a({\texttt {b}}) }{{\dot{Z}}_a} \lesssim k\Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert , \end{aligned}$$

where the rightmost inequality uses \({\dot{Z}}_i\geqslant \dot{p}^{\text {u}}_i({\texttt {b}})\). As for the second term, (74) gives

$$\begin{aligned} \frac{|{\dot{Z}}_a-{\dot{Z}}_b|}{{\dot{Z}}_a} \lesssim d\Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert \quad \text {and}\quad \frac{[\dot{p}^{\text {u}}_b({\texttt {f}}) +\dot{p}^{\text {u}}_b({\texttt {r}})-\dot{p}^{\text {u}}_b({\texttt {b}})]}{ {\dot{Z}}_b} \lesssim 2^{-k}\,. \end{aligned}$$

Combining these estimates yields the claimed bound.\(\square \)

A.2. Pair coloring recursions

In this section we analyze the bp recursions for the pair coloring model and prove the remaining assertions of Proposition 5.5. Recall that we assume \({\dot{q}}={\dot{q}}^\text {av}\) and \(\hat{q}=\hat{q}^\text {av}\), where these are now probability measures on \((\dot{\Omega }_T)^2\) and \((\hat{\Omega }_T)^2\) respectively. For any measure p(x) defined on \(x\equiv (x^1,x^2)\) in \((\dot{\Omega }_T)^2\) or \((\hat{\Omega }_T)^2\), define

$$\begin{aligned}(\mathfrak {f}p)(x) \equiv p(\mathfrak {f}x)\quad \text {where } \mathfrak {f}x \equiv x\oplus ({\texttt {0}},{\texttt {1}}) \equiv (x^1,x^2\oplus {\texttt {1}})\,. \end{aligned}$$
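As a concrete illustration of the operation \(\mathfrak {f}\), the following toy sketch represents a pair measure as a dictionary over spin labels (the string labels are our own shorthand, not the paper's notation) and checks that \(\mathfrak {f}\) is an involution fixing the uniform measure on \(\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}},{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}^2\):

```python
from itertools import product

def flip_spin(s):
    # x ⊕ 1 on the 0/1 value label; a free spin is self-conjugate in this toy
    return s if s == "f" else s[0] + str(1 - int(s[1]))

def flip(x):
    # the map f x = (x^1, x^2 ⊕ 1): flip only the second coordinate
    return (x[0], flip_spin(x[1]))

def f_op(p):
    # (f p)(x) = p(f x)
    return {x: p[flip(x)] for x in p}

support = list(product(["b0", "b1", "r0", "r1"], repeat=2))
q0 = {x: 1.0 / len(support) for x in support}  # uniform pair measure
assert f_op(f_op(q0)) == q0                    # f is an involution
assert f_op(q0) == q0                          # uniform measure is f-invariant
```

The \(\mathfrak {f}\)-invariance of the uniform measure is the property used to initialize the iteration in the proof of Proposition 5.5A.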

Recall from Sect. 5.3 the definition of \(\varvec{\Gamma }(c,\kappa )\). We will prove the following.

Proposition A.6

For any \(c\in (0,1]\) and any \( {\dot{q}}_1,{\dot{q}}_2 \in \varvec{\Gamma }(c,1)\), we have \({\texttt {BP}}{\dot{q}}_1, {\texttt {BP}}{\dot{q}}_2 \in \varvec{\Gamma }(1,1)\) and

$$\begin{aligned} \Vert {{\texttt {BP}}{\dot{q}}_1-{\texttt {BP}}{\dot{q}}_2} \Vert = O(k^4/2^k)\Vert {{\dot{q}}_1-{\dot{q}}_2} \Vert + O(k^4/2^k) \sum _{i=1,2} \Vert {{\dot{q}}_i -\mathfrak {f}{\dot{q}}_i} \Vert . \end{aligned}$$
(89)

Assuming this result, it is straightforward to deduce Proposition 5.5A:

Proof of Proposition 5.5A

Let \({\dot{q}}^{(0)}\) be the uniform probability measure on \(\{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}},{\texttt {r}}_{\texttt {0}},{\texttt {r}}_{\texttt {1}}\}^2\), and define recursively \({\dot{q}}^{(l)} = {\texttt {BP}}{\dot{q}}^{(l-1)}\) for \(l\geqslant 1\). It is clear that \({\dot{q}}^{(0)}\in \varvec{\Gamma }(1,1)\) and \({\dot{q}}^{(0)} =\mathfrak {f}{\dot{q}}^{(0)}\). Since \({\texttt {BP}}\) commutes with \(\mathfrak {f}\), we have \({\dot{q}}^{(l)}=\mathfrak {f}{\dot{q}}^{(l)}\) for all \(l\geqslant 0\), so the second term on the right-hand side of (89) vanishes and \(({\dot{q}}^{(l)})_{l\geqslant 1}\) forms an \(\ell ^1\) Cauchy sequence. By completeness of \(\ell ^1\), \({\dot{q}}^{(l)}\) converges to a limit \({\dot{q}}^{(\infty )}={\dot{q}}_\star \in \varvec{\Gamma }(1,1)\), satisfying \({\dot{q}}_\star =\mathfrak {f}{\dot{q}}_\star ={\texttt {BP}}{\dot{q}}_\star \). This implies that for any probability measure \({\dot{q}}\),

$$\begin{aligned}\Vert {{\dot{q}}-\mathfrak {f}{\dot{q}}} \Vert \leqslant \Vert {{\dot{q}}-{\dot{q}}_\star } \Vert +\Vert {{\dot{q}}_\star -\mathfrak {f}{\dot{q}}} \Vert =2\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert \,.\end{aligned}$$

Applying (89) again gives

$$\begin{aligned} \Vert {{\texttt {BP}}{\dot{q}}-{\dot{q}}_\star } \Vert = O(k^4/2^k)\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert + O(k^4/2^k)\Vert {{\dot{q}}-\mathfrak {f}{\dot{q}}} \Vert = O(k^4/2^k)\Vert {{\dot{q}}-{\dot{q}}_\star } \Vert ,\end{aligned}$$

proving the claimed contraction estimate. Uniqueness of \({\dot{q}}_\star \) can be deduced from this contraction.\(\square \)
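The convergence argument above is a standard contraction iteration. The following sketch illustrates the pattern on a toy map; the map and its contraction rate \(1/2\) are hypothetical stand-ins for \({\texttt {BP}}\) and the \(O(k^4/2^k)\) rate of (89):

```python
def iterate_bp(bp, q0, tol=1e-12, max_iter=200):
    # iterate q <- bp(q); an l1 contraction makes the iterates Cauchy
    q = q0
    for _ in range(max_iter):
        q_next = bp(q)
        if sum(abs(q_next[x] - q[x]) for x in q) < tol:
            return q_next
        q = q_next
    return q

# toy contraction with rate 1/2: its unique fixed point is `target`
target = {"a": 0.75, "b": 0.25}
bp = lambda q: {x: 0.5 * q[x] + 0.5 * target[x] for x in q}
q_star = iterate_bp(bp, {"a": 0.5, "b": 0.5})
assert all(abs(q_star[x] - target[x]) < 1e-9 for x in target)
```

As in the proof, the \(\ell ^1\) distance between successive iterates decays geometrically, so the limit exists and is the unique fixed point.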

We now turn to the proof of Proposition A.6; the proof of Proposition 5.5B is given after. Let \(\dot{{\texttt {NB}}},\hat{{\texttt {NB}}}\) now denote the non-normalized bp recursions for the pair model, at parameters \(\lambda ,T\). Let \({\texttt {r}}[\dot{\sigma }]\in \{0,1,2\}\) count the number of \({\texttt {r}}\) spins in \(\dot{\sigma }\), and let \(\dot{p}\equiv \dot{p}({\dot{q}})\) be the reweighted measure

$$\begin{aligned} \dot{p}(\dot{\sigma }) \equiv \frac{{\dot{q}}(\dot{\sigma })}{1 - {\dot{q}}({\texttt {r}}[\dot{\sigma }] > 0)}\,.\end{aligned}$$
(90)
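A minimal sketch of the reweighting (90), which conditions a toy pair measure on its \({\texttt {r}}\)-free part; the measure and labels here are hypothetical:

```python
def reweight(q, count_r):
    # (90): p(sigma) = q(sigma) / (1 - q(r[sigma] > 0))
    no_r_mass = sum(w for s, w in q.items() if count_r(s) == 0)
    return {s: w / no_r_mass for s, w in q.items()}

# hypothetical pair measure; count_r counts r-coordinates of the pair spin
q = {("b", "b"): 0.5, ("b", "r"): 0.2, ("r", "r"): 0.3}
count_r = lambda s: sum(1 for c in s if c == "r")
p = reweight(q, count_r)
assert abs(p[("b", "b")] - 1.0) < 1e-12  # the r-free part gets total mass 1
assert abs(p[("r", "r")] - 0.6) < 1e-12  # r-spins may carry additional mass
```

This matches the normalization used below: \(\dot{p}({\texttt {g}}{\texttt {g}})=1\) while \(\dot{p}({\texttt {r}}{\texttt {r}})\) can be large, as in (95).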

Recalling convention (69), we will denote

$$\begin{aligned}\hat{m}^\lambda {\hat{r}}(\hat{\sigma }^1,\hat{\sigma }^2) \equiv [\hat{m}(\hat{\sigma }^1) \hat{m}(\hat{\sigma }^2)]^\lambda {\hat{r}}(\hat{\sigma }^1,\hat{\sigma }^2)\,.\end{aligned}$$

Starting from \({\dot{q}}_i\in \varvec{\Gamma }(c,\kappa )\) (\(i=1,2\)), we denote

$$\begin{aligned} \dot{p}_i&\equiv \dot{p}({\dot{q}}_i) \text { (as defined by (90)),}\\ \hat{p}_i&\equiv \hat{{\texttt {NB}}}(\dot{p}_i) \text { and } \hat{p}_{i,\infty }\equiv \hat{{\texttt {NB}}}_{\lambda ,\infty }(\dot{p}_i),\\ \dot{p}^{\text {u}}_i&\equiv \dot{{\texttt {NB}}}(\hat{p}_i) \text { and } \tilde{q}_i \equiv \dot{{\texttt {BP}}}\hat{p}_i ={\texttt {BP}}{\dot{q}}_i.\end{aligned}$$

With this notation in mind, the proof of Proposition A.6 is divided into the following lemmas.

Lemma A.7

(effect of reweighting) Suppose \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }(c,\kappa )\) for \(c\in (0,1]\) and \(\kappa \in [0,1]\): then

$$\begin{aligned} \Vert {\Delta \dot{p}} \Vert&= O(2^{2(1-\kappa )k}) \Vert {\Delta {\dot{q}}} \Vert ,\\ \Vert {\dot{p}_i - \mathfrak {f}\dot{p}_i} \Vert&= O(2^{(1-\kappa )k}) \Vert {{\dot{q}}_i - \mathfrak {f}{\dot{q}}_i} \Vert . \end{aligned}$$

Lemma A.8

(clause bp contraction) Suppose \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }(c,\kappa )\) for \(c\in (0,1]\) and \(\kappa \in [0,1]\): then

$$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}({\texttt {y}}{\texttt {y}})&=O(k^3/2^{k}) \Delta \dot{p}({\texttt {g}}{\texttt {g}}) =O(k^3/2^{(1+c)k}),\nonumber \\ \Delta \hat{m}^\lambda \hat{p}(\{{\texttt {b}}{\texttt {r}},{\texttt {b}}{\texttt {f}}_{\geqslant 1}\})&= O(k^2/2^k) [\Delta \dot{p}({\texttt {g}}{\texttt {g}}) +2^{-k}\Delta \dot{p}( \dot{\Omega }^2{\setminus }\{{\texttt {r}}{\texttt {r}}\})] =O(k^3/2^{(1+c)k}),\nonumber \\ \Vert {\Delta \hat{m}^\lambda \hat{p}} \Vert&= O(k^3/2^{k})\Vert {\Delta \dot{p}} \Vert =O(k^3 2^{(1-2\kappa )k}), \end{aligned}$$
(91)

and the same estimates hold with \(\mathfrak {f}\hat{p}\) in place of \(\hat{p}\). For both \(i=1,2\),

$$\begin{aligned} \Vert {\hat{m}^\lambda \hat{p}_i -\hat{m}^\lambda \mathfrak {f}\hat{p}_i} \Vert =O(k^3/2^{(1+\kappa )k}) \Vert {\dot{p}_i-\mathfrak {f}\dot{p}_i} \Vert =O(k^3/2^{2\kappa k}) \Vert {{\dot{q}}_i-\mathfrak {f}{\dot{q}}_i} \Vert \,. \end{aligned}$$
(92)

Lemma A.9

(clause bp output values) Suppose \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }(c,\kappa )\) for \(c\in (0,1]\) and \(\kappa \in [0,1]\). For \(s,t\subseteq \hat{\Omega }\) let \(st\equiv s\times t\). Then it holds for all \(s,t \in \{{\texttt {r}}_{\texttt {1}}, {\texttt {b}}_{\texttt {1}}, {\texttt {f}}, {\texttt {s}}\}\) that

$$\begin{aligned} \frac{\hat{m}^\lambda \hat{p}_i(s, t)}{(2/2^k)^{{\texttt {r}}[s]+{\texttt {r}}[t]} } = {\left\{ \begin{array}{ll} 1+O(k^2/2^{k}) &{} \text {if }{\texttt {r}}[s] + {\texttt {r}}[t] \leqslant 1,\\ 1+O(k^2/2^{ck}) &{} \text {if }{\texttt {r}}[s] + {\texttt {r}}[t] = 2. \\ \end{array}\right. } \end{aligned}$$
(93)

Furthermore we have the bounds

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {f}}_{\geqslant 1}t) + \hat{m}^\lambda \hat{p}_i(t{\texttt {f}}_{\geqslant 1})&\leqslant O(k/4^k) \text { for all } t\in \{{\texttt {r}}_{\texttt {1}},{\texttt {b}}_{\texttt {1}},{\texttt {f}},{\texttt {s}}\},\nonumber \\ \hat{m}^\lambda \hat{p}_i (\{{\texttt {f}}\}\times \hat{\Omega }) -\hat{m}^\lambda \hat{p}_i(\{{\texttt {b}}_{\texttt {1}}\} \times \hat{\Omega })&\leqslant O(k/4^{k}). \end{aligned}$$
(94)

The same estimates hold with \(\mathfrak {f}\hat{p}_i\) in place of \(\hat{p}_i\).

Lemma A.10

(variable bp) Suppose \({\dot{q}}_1,{\dot{q}}_2\in \varvec{\Gamma }(c,\kappa )\) for \(c\in (0,1]\) and \(\kappa \in [0,1]\). Then \({\texttt {BP}}{\dot{q}}_1,{\texttt {BP}}{\dot{q}}_2 \in \varvec{\Gamma }(c',1)\) for \(c' = \max \{0,2\kappa -1\}\), and

$$\begin{aligned}\Big \Vert {{\texttt {BP}}{\dot{q}}_1 - {\texttt {BP}}{\dot{q}}_2} \Big \Vert \lesssim k \Big \Vert {\Delta \hat{m}^\lambda \hat{p}+\Delta \hat{m}^\lambda \mathfrak {f}\hat{p}} \Big \Vert + k2^k\sum _{i=1,2} \Big \Vert {\hat{m}^\lambda \hat{p}_i - \hat{m}^\lambda \mathfrak {f}\hat{p}_i} \Big \Vert \,.\end{aligned}$$

Proof of Proposition A.6

Follows by combining the preceding Lemmas A.7–A.10.\(\square \)

Proof of Proposition 5.5B

If \({\dot{q}}\in \varvec{\Gamma }(c,0)\) is a fixed point of \({\texttt {BP}}\), then it follows from Lemmas A.8–A.10 that \({\dot{q}}\in \varvec{\Gamma }(c,0)\cap \varvec{\Gamma }(0,1) = \varvec{\Gamma }(c,1)\).\(\square \)

We now prove the three lemmas leading to Proposition A.6.

Proof of Lemma A.7

Applying (75) we have

$$\begin{aligned} |\dot{p}_1(\dot{\sigma })-\dot{p}_2(\dot{\sigma })| \leqslant \frac{|{\dot{q}}_1(\dot{\sigma })-{\dot{q}}_2(\dot{\sigma })|}{ {\dot{q}}_1({\texttt {g}}{\texttt {g}}) } + \frac{|{\dot{q}}_1({\texttt {g}}{\texttt {g}}) -{\dot{q}}_2({\texttt {g}}{\texttt {g}})|}{{\dot{q}}_1({\texttt {g}}{\texttt {g}}){\dot{q}}_2({\texttt {g}}{\texttt {g}})} {\dot{q}}_2(\dot{\sigma }), \end{aligned}$$

and summing over \(\dot{\sigma }\in \dot{\Omega }^2\) gives

$$\begin{aligned} \Vert {\Delta \dot{p}} \Vert \leqslant \frac{\Vert {{\dot{q}}_1-{\dot{q}}_2} \Vert }{ {\dot{q}}_1({\texttt {g}}{\texttt {g}}) } + \frac{|{\dot{q}}_1({\texttt {g}}{\texttt {g}}) -{\dot{q}}_2({\texttt {g}}{\texttt {g}})|}{{\dot{q}}_1({\texttt {g}}{\texttt {g}}){\dot{q}}_2({\texttt {g}}{\texttt {g}})} \leqslant \frac{2\Vert {{\dot{q}}_1-{\dot{q}}_2} \Vert }{{\dot{q}}_1({\texttt {g}}{\texttt {g}}){\dot{q}}_2({\texttt {g}}{\texttt {g}})}\,. \end{aligned}$$

Since \({\dot{q}}_i\in \varvec{\Gamma }\), the first and second conditions defining \(\varvec{\Gamma }\) give

$$\begin{aligned} \dot{p}_i(\dot{\Omega }^2{\setminus }\{{\texttt {r}}{\texttt {r}}\})= O(1)\,,\quad \dot{p}_i({\texttt {r}}{\texttt {r}})=O(2^{(1-\kappa )k})\,. \end{aligned}$$
(95)

Consequently \({\dot{q}}_i({\texttt {g}}{\texttt {g}})^{-1} = O(2^{(1-\kappa )k})\), and the claimed bound on \(\Vert {\Delta \dot{p}} \Vert \) follows. The bound on \(\Vert {\dot{p}_i-\mathfrak {f}\dot{p}_i} \Vert \) follows by noting that if \({\dot{q}}_2=\mathfrak {f}{\dot{q}}_1\), then \({\dot{q}}_1({\texttt {g}}{\texttt {g}})={\dot{q}}_2({\texttt {g}}{\texttt {g}})\).

\(\square \)
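The chain of bounds in the proof above, comparing two measures after normalizing each by its \({\texttt {g}}{\texttt {g}}\)-mass, can be checked on a toy example; the two-point measures below are hypothetical:

```python
def l1(p, q):
    # l1 distance between two measures given as dicts
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))

# hypothetical measures, each normalized by its r-free ("gg") mass z_i
q1 = {"gg": 0.4, "rr": 0.6}
q2 = {"gg": 0.5, "rr": 0.5}
z1, z2 = q1["gg"], q2["gg"]
p1 = {x: w / z1 for x, w in q1.items()}
p2 = {x: w / z2 for x, w in q2.items()}
# the bound from the proof: ||p1 - p2|| <= 2 ||q1 - q2|| / (z1 z2)
assert l1(p1, p2) <= 2 * l1(q1, q2) / (z1 * z2) + 1e-12
```

The bound holds in general since \(|z_1-z_2|\leqslant \Vert q_1-q_2\Vert \) and \(z_2\leqslant 1\).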

Proof of Lemma A.8

We will prove (91) for \(\hat{p}_i\); the proof for \(\mathfrak {f}\hat{p}_i\) is entirely similar. It follows from the symmetry \(\dot{p}_i=(\dot{p}_i)^\text {av}\) that for any \({\varvec{x}},{\varvec{y}}\in \{{\texttt {0}},{\texttt {1}}\}\),

$$\begin{aligned} \Big |\dot{p}_i({\texttt {b}}{\texttt {b}}) -4\dot{p}_i({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}})\Big | =2\Big | \dot{p}_i({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{{\varvec{y}}\oplus {\texttt {1}}}) -\dot{p}_i({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}})\Big | =2\Big | \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -\dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})\Big |, \end{aligned}$$

from which we obtain that

$$\begin{aligned} \Delta \dot{p}({\texttt {b}}{\texttt {b}}) \lesssim \max _{i=1,2} \Big | \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -\dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})\Big |\,. \end{aligned}$$

Recall \({\texttt {g}}=\{{\texttt {b}},{\texttt {f}}\}\) and \(\dot{p}_i({\texttt {g}}{\texttt {g}})=1\). Combining the above with the definition of \(\varvec{\Gamma }(c,\kappa )\) gives

$$\begin{aligned} \Delta \dot{p}({\texttt {g}}{\texttt {g}})&\leqslant \Delta \dot{p}({\texttt {b}}{\texttt {b}})+ \Delta \dot{p}({\texttt {g}}{\texttt {f}}) +\Delta \dot{p}({\texttt {f}}{\texttt {g}})\nonumber \\&\leqslant \sum _{i=1,2}\bigg \{ \Big | \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -\dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})\Big | +\dot{p}_i({\texttt {g}}{\texttt {f}}) +\dot{p}({\texttt {f}}{\texttt {g}}) \bigg \} = O(2^{-ck}). \end{aligned}$$
(96)
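The symmetry identity used above, \(|\dot{p}({\texttt {b}}{\texttt {b}})-4\dot{p}({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}})| =2|\dot{p}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})-\dot{p}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})|\) for averaged measures, can be checked directly; the numerical values below are hypothetical:

```python
from itertools import product

# hypothetical averaged pair measure on {b0,b1}^2: p(x) = p(x ⊕ (1,1))
p = {("b0", "b0"): 0.30, ("b1", "b1"): 0.30,
     ("b0", "b1"): 0.20, ("b1", "b0"): 0.20}
p_bb = sum(p.values())
gap = 2 * abs(p[("b0", "b0")] - p[("b0", "b1")])
for x, y in product(["b0", "b1"], repeat=2):
    # |p(bb) - 4 p(bx by)| = 2 |p(b0 b0) - p(b0 b1)| for all x, y
    assert abs(abs(p_bb - 4 * p[(x, y)]) - gap) < 1e-12
```

The identity holds because averaging forces \(\dot{p}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})=\dot{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})\) and \(\dot{p}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})=\dot{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {0}})\).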

Step I. We first control \(\Delta \hat{m}^\lambda \hat{p}(\hat{\sigma })\). By symmetry it suffices to analyze the bp recursion at a clause with all literals \({\texttt {L}}_j={\texttt {0}}\). We distinguish the following cases of \(\hat{\sigma }\in \hat{\Omega }^2\):

  1. 1.

    Recall \({\texttt {y}}\equiv {\texttt {r}}\cup {\texttt {f}}\), and note \(\{{\texttt {y}}\}{\setminus }\{{\texttt {s}}\} \subseteq {\texttt {a}}_{\texttt {0}}\cup {\texttt {a}}_{\texttt {1}}\) (as defined by (78)). Thus

    $$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}( \{{\texttt {y}}{\texttt {y}}\}{\setminus }\{{\texttt {s}}{\texttt {s}}\}) \leqslant \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \bigg \{ \Delta \hat{m}^\lambda \hat{p}({\texttt {a}}_{\varvec{x}}\times \{{\texttt {y}}\}) + \Delta \hat{m}^\lambda \hat{p}( \{{\texttt {y}}\}\times {\texttt {a}}_{\varvec{x}}) \bigg \}. \end{aligned}$$
    (97)

    For \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\), consider \(\hat{\sigma }\in {\texttt {a}}_{\varvec{x}}\times \{{\texttt {y}}\}\): in order for \(\underline{{\dot{\sigma }}}\in (\dot{\Omega }^2)^{k-1}\) to be compatible with \(\hat{\sigma }\), it is necessary that \(\dot{\sigma }_j\in A \equiv \{{\texttt {g}}_{\varvec{x}}\}\times \{{\texttt {g}}\}\) for all \(2\leqslant j\leqslant k\). Combining with (77) gives

    $$\begin{aligned}\Delta \hat{m}^\lambda \hat{p}({\texttt {a}}_{\varvec{x}}\times \{{\texttt {y}}\}) \leqslant \sum _{\underline{{\dot{\sigma }}}\in A^{k-1}} \bigg | \prod _{j=2}^k\dot{p}_1(\dot{\sigma }_j) -\prod _{j=2}^k\dot{p}_2(\dot{\sigma }_j) \bigg | \leqslant k\Delta \dot{p}({\texttt {g}}{\texttt {g}}) \Big ( \dot{p}_1(A)+\Delta \dot{p}({\texttt {g}}{\texttt {g}}) \Big )^{k-2}\,. \end{aligned}$$

    It follows from the definition of \(\varvec{\Gamma }(c,\kappa )\) that \(\dot{p}_1(A)+\Delta \dot{p}({\texttt {g}}{\texttt {g}}) = \tfrac{1}{2}+O(2^{-ck})\), so we conclude

    $$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}( \{{\texttt {y}}{\texttt {y}}\}{\setminus }\{{\texttt {s}}{\texttt {s}}\}) = O(k/2^k) \Delta \dot{p}({\texttt {g}}{\texttt {g}})\,.\end{aligned}$$
    (98)
  2. 2.

    Now take \(\hat{\sigma }={\texttt {s}}{\texttt {s}}\): for \(\underline{{\dot{\sigma }}}\in (\dot{\Omega }^2)^{k-1}\) to be compatible with \(\hat{\sigma }\), it is necessary that \(\underline{{\dot{\sigma }}}\in \{{\texttt {y}}{\texttt {y}}\}^{k-1}\). On the other hand, it is sufficient that \(\underline{{\dot{\sigma }}}\in \{{\texttt {g}}{\texttt {g}}\}^{k-1}\) does not belong to any of the sets \((A_{\texttt {0}})^{k-1},(A_{\texttt {1}})^{k-1},(B_{\texttt {0}})^{k-1},(B_{\texttt {1}})^{k-1}\), where for \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\) we define \(A_{\varvec{x}}\equiv \{{\texttt {b}}_{\varvec{x}}{\texttt {g}}\}\cup \{{\texttt {f}}{\texttt {g}}\}\) and \(B_{\varvec{x}}\equiv \{{\texttt {g}}{\texttt {b}}_{\varvec{x}}\}\cup \{{\texttt {g}}{\texttt {f}}\}\). Therefore

    $$\begin{aligned}\Delta \hat{m}^\lambda \hat{p}({\texttt {s}}{\texttt {s}}) \leqslant \sum _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \sum _{\underline{{\dot{\sigma }}} \in (A_{\varvec{x}})^{k-1} \cup (B_{\varvec{x}})^{k-1}} \bigg | \prod _{j=2}^k\dot{p}_1(\dot{\sigma }_j) -\prod _{j=2}^k\dot{p}_2(\dot{\sigma }_j) \bigg | = O(k/2^k) \Delta \dot{p}({\texttt {g}}{\texttt {g}}),\end{aligned}$$

    where the last estimate follows by the same argument that led to (98). This concludes the proof of the first line of (91).

  3. 3.

    Now consider \(\hat{\sigma }\) with exactly one coordinate in \(\{{\texttt {b}}\}\), meaning the other must be in \(\{{\texttt {y}}\}\). Recalling convention (69), we assume without loss that \(\hat{\sigma }\in \{{\texttt {b}}_{\texttt {1}}{\texttt {y}}\}\) and proceed to bound \(\Delta \hat{m}^\lambda \hat{p}(\hat{\sigma })\). Let \(\underline{{\dot{\sigma }}}\in (\dot{\Omega }^2)^{k-1}\) be compatible with \(\hat{\sigma }\). There are two cases:

    1. a.

      If \(\underline{{\dot{\sigma }}}\) has no entry in \(\{{\texttt {r}}\}\), it must also be compatible with some \(\hat{\sigma }'\in \{{\texttt {y}}{\texttt {y}}\}\), as long as we permit the possibility that \(|(\hat{\sigma }')^1|>T\). Such \(\underline{{\dot{\sigma }}}\) gives the same contribution to \(\hat{m}^\lambda \hat{p}(\hat{\sigma })\) as to \(\hat{m}^\lambda \hat{p}_\infty ({\texttt {y}}{\texttt {y}})\). It follows from the preceding estimates that the contribution to \(\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {y}})\) from all such \(\underline{{\dot{\sigma }}}\) is upper bounded by

      $$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}_\infty ({\texttt {y}}{\texttt {y}}) =O(k/2^k) \Delta \dot{p}({\texttt {g}}{\texttt {g}})\,.\end{aligned}$$
      (99)
    2. b.

      The only remaining possibility is that some permutation of \(\underline{{\dot{\sigma }}}\) belongs to \(A\times B^{k-2}\) for \(A=\{{\texttt {r}}_{\texttt {0}}{\texttt {g}}\}\) and \(B=\{{\texttt {b}}_{\texttt {1}}{\texttt {g}}\}\): the contribution to \(\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {y}})\) from all such \(\underline{{\dot{\sigma }}}\) is

      $$\begin{aligned} \leqslant (k-1)\sum _{\underline{{\dot{\sigma }}}\in A\times B^{k-2}}\bigg |\prod _{j=2}^k \dot{p}_1(\dot{\sigma }_j) -\prod _{j=2}^k \dot{p}_2(\dot{\sigma }_j)\bigg | = O(k^2/2^k)\Vert {\Delta \dot{p}} \Vert ,\end{aligned}$$
      (100)

      where the last estimate follows using (76) and (95).

    Combining the above estimates (and using the symmetry between \({\texttt {b}}_{\texttt {1}}{\texttt {y}}\) and \({\texttt {y}}{\texttt {b}}_{\texttt {1}}\)) gives

    $$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {y}}) +\Delta \hat{m}^\lambda \hat{p}({\texttt {y}}{\texttt {b}}_{\texttt {1}}) = O(k^2/2^k)\Vert {\Delta \dot{p}} \Vert \,.\end{aligned}$$
    (101)

    If we further have \(\hat{\sigma }\in \{{\texttt {b}}_{\texttt {1}}\}\times \{{\texttt {r}},{\texttt {f}}_{\geqslant 1}\}\), then, arguing as above, \(\underline{{\dot{\sigma }}}\) either contributes to \(\Delta \hat{m}^\lambda \hat{p}_\infty (\{{\texttt {y}}\}\times \{{\texttt {r}},{\texttt {f}}_{\geqslant 1}\})\), or else belongs to \(A_{\varvec{x}}\times (B_{\varvec{x}})^{k-2}\) for \(A_{\varvec{x}}=\{{\texttt {r}}_{\texttt {0}}{\texttt {g}}_{\varvec{x}}\}\), \(B_{\varvec{x}}=\{{\texttt {b}}_{\texttt {1}}{\texttt {g}}_{\varvec{x}}\}\) and \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\). The contribution from the first case is bounded by (98). The contribution from the second case, using (76) and (95), is

    $$\begin{aligned} \lesssim k \Delta \dot{p}(\dot{\Omega }^2{\setminus }\{{\texttt {r}}{\texttt {r}}\}) \Big (\max _{{\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}} \dot{p}_1(B_{\varvec{x}}) + \Delta \dot{p}({\texttt {g}}{\texttt {g}}) \Big )^{k-2} =O(k^2/4^k) \Delta \dot{p}(\dot{\Omega }^2{\setminus }\{{\texttt {r}}{\texttt {r}}\})\,.\end{aligned}$$

    The second claim of (91) follows by combining these estimates and recalling (96).

  4. 4.

    Lastly, consider \(\hat{\sigma }\in \{{\texttt {b}}{\texttt {b}}\}\). Without loss of generality, we take \(\hat{\sigma }={\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}}\) and proceed to bound \(\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})\). Let \(\underline{{\dot{\sigma }}}\in (\dot{\Omega }^2)^{k-1}\) be compatible with \(\hat{\sigma }\). We distinguish three cases:

    1. a.

      For at least one \(i\in \{1,2\}\), \(\underline{{\dot{\sigma }}}^i\) contains no entry in \(\{{\texttt {r}}\}\). In this case \(\underline{{\dot{\sigma }}}\) is also compatible with some \(\hat{\sigma }'\in \{{\texttt {b}}_{\texttt {1}}{\texttt {y}}\} \cup \{{\texttt {y}}{\texttt {b}}_{\texttt {1}}\}\), as long as we permit the possibility that \(|(\hat{\sigma }')^i|>T\). The contribution of all such \(\underline{{\dot{\sigma }}}\) to \(\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})\) is therefore upper bounded by

      $$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}_\infty ({\texttt {b}}_{\texttt {1}}{\texttt {y}}) +\Delta \hat{m}^\lambda \hat{p}_\infty ({\texttt {y}}{\texttt {b}}_{\texttt {1}}) = O(k^2/2^k)\Vert {\Delta \dot{p}} \Vert ,\end{aligned}$$
      (102)

      where the last step is by the same argument as for (101).

    2. b.

      The next case is that \(\underline{{\dot{\sigma }}}\) is a permutation of \(({\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}, ({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-2})\). The contribution to \(\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})\) from this case is at most

      $$\begin{aligned}(k-1)\bigg | \dot{p}_1({\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}) \dot{p}_1({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-2} -\dot{p}_2({\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}) \dot{p}_2({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-2} \bigg |\,.\end{aligned}$$

      Using (76) and the definition of \(\varvec{\Gamma }(c,\kappa )\), this is at most

      $$\begin{aligned}&O(k^2/4^k) \Big ( \Delta \dot{p}({\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}) + \dot{p}({\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}) \cdot \Delta \dot{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}}) \Big )\nonumber \\&\quad =O(k^2/4^k)\Vert {\dot{p}} \Vert \Vert {\Delta \dot{p}} \Vert = O(k^2/2^{(1+\kappa )k})\Vert {\Delta \dot{p}} \Vert . \end{aligned}$$
      (103)
    (c)

      The last case is that \(\underline{{\dot{\sigma }}}\) is a permutation of \(({\texttt {r}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}}, {\texttt {b}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}, ({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-3})\). The contribution to \(\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})\) from this case is at most

      $$\begin{aligned} k^2 \bigg | \dot{p}_1({\texttt {r}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}}) \dot{p}_1({\texttt {b}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}) \dot{p}_1({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-3} -\dot{p}_2({\texttt {r}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}}) \dot{p}_2({\texttt {b}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}) \dot{p}_2({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-3} \bigg |\,. \end{aligned}$$

      This is at most \(O(k^2/4^k) \Vert {\Delta \dot{p}} \Vert \) by another application of (76) and the definition of \(\varvec{\Gamma }(c,\kappa )\).

    The above estimates together give

    $$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}}) = O(k^2/2^k)\Vert {\Delta \dot{p}} \Vert ,\end{aligned}$$
    (104)

    where the main contribution comes from (102). Combining with the previous bound (101) yields the last part of (91).

Step II. Next we prove (92) by improving the preceding bounds in the special case that \(\dot{p}_1=\dot{p}\) and \(\dot{p}_2 \equiv \mathfrak {f}\dot{p}\). Recall \(\hat{p}_i \equiv \hat{{\texttt {NB}}}(\dot{p}_i)\); it follows that \(\hat{p}_2=\mathfrak {f}\hat{p}_1\). Thus, for any \(\hat{\sigma }\in \hat{\Omega }^2\) with \(\hat{\sigma }^2={\texttt {s}}\), we have \(\hat{\sigma }=\mathfrak {f}\hat{\sigma }\), consequently \(\hat{p}_2(\hat{\sigma }) =\hat{p}_1(\mathfrak {f}\hat{\sigma })=\hat{p}_1(\hat{\sigma })\). For \(\hat{\sigma }\in \hat{\Omega }^2\) with \(\hat{\sigma }^1={\texttt {s}}\), we have \(\hat{\sigma }=(\mathfrak {f}\hat{\sigma })\oplus {\texttt {1}}\), so \(\hat{p}_2(\hat{\sigma })=\hat{p}_1(\mathfrak {f}\hat{\sigma })=\hat{p}_1(\hat{\sigma })\), where the last step uses that \(\hat{p}_1=(\hat{p}_1)^\text {av}\). It follows that instead of (97) and (99) we have the improved bound

$$\begin{aligned} \Delta \hat{m}^\lambda \hat{p}_\infty ({\texttt {y}}{\texttt {y}})&=\Delta \hat{m}^\lambda \hat{p}_\infty (\{{\texttt {y}}{\texttt {y}}\} {\setminus }(\{{\texttt {s}}{\texttt {y}}\}\cup \{{\texttt {y}}{\texttt {s}}\})) \leqslant \sum _{{\varvec{x}},{\varvec{y}}\in \{{\texttt {0}},{\texttt {1}}\}} \Delta \hat{m}^\lambda \hat{p}_\infty ({\texttt {a}}_{\varvec{x}}\times {\texttt {a}}_{\varvec{y}})\\&= O(k) \Vert {\Delta \dot{p}} \Vert \sum _{{\varvec{x}},{\varvec{y}}\in \{{\texttt {0}},{\texttt {1}}\}} \Big ( \dot{p}_1({\texttt {g}}_{\varvec{x}}{\texttt {g}}_{\varvec{y}}) +\Delta \dot{p}({\texttt {g}}{\texttt {g}}) \Big )^{k-2} = O(k/4^k)\Vert {\dot{p}-\mathfrak {f}\dot{p}} \Vert . \end{aligned}$$

Similarly, instead of (100) we would only have a contribution from \(\underline{{\dot{\sigma }}}\) belonging to either \(A_{\texttt {0}}\times (B_{\texttt {0}})^{k-2}\) or \(A_{\texttt {1}}\times (B_{\texttt {1}})^{k-2}\), where \(A_{\varvec{x}}=\{{\texttt {r}}_{\texttt {0}}{\texttt {g}}_{\varvec{x}}\}\) and \(B_{\varvec{x}}=\{{\texttt {b}}_{\texttt {1}}{\texttt {g}}_{\varvec{x}}\}\). It follows that instead of (101) and (102) we have the improved bound

$$\begin{aligned}\Delta \hat{m}^\lambda \hat{p}_\infty ({\texttt {b}}_{\texttt {1}}{\texttt {y}}) +\Delta \hat{m}^\lambda \hat{p}_\infty ({\texttt {y}}{\texttt {b}}_{\texttt {1}}) =O(k^4/4^k)\Vert {\Delta \dot{p}} \Vert \,.\end{aligned}$$

Previously the main contribution in (104) came from (102), but now it comes instead from (103). This gives the improved bound \(\Delta \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}}) =O(k^2/2^{(1+\kappa )k}) \Vert {\dot{p}-\mathfrak {f}\dot{p}} \Vert \), which proves the first part of (92). The second part follows by applying Lemma A.7.\(\square \)

Proof of Lemma A.9

We first prove (93). Assume \(s,t\in \{{\texttt {b}}_{\texttt {1}},{\texttt {f}},{\texttt {s}}\}\), and write \(st\equiv s\times t \subseteq \hat{\Omega }^2\). Then for a lower bound we have

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i(st) \geqslant [1-O(k/2^k)] \dot{p}_i({\texttt {b}}{\texttt {b}})^{k-1} =1-O(k/2^k)\,. \end{aligned}$$

For an upper bound, we have

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i(st)&\leqslant \dot{p}_i({\texttt {g}}{\texttt {g}})^{k-1} + k\dot{p}_i({\texttt {r}}_{\texttt {0}}{\texttt {g}}) \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {g}})^{k-2} + k\dot{p}_i({\texttt {g}}{\texttt {r}}_{\texttt {0}}) \dot{p}_i({\texttt {g}}{\texttt {b}}_{\texttt {1}})^{k-2}\\&\qquad + k\dot{p}_i({\texttt {r}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}) \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-2} + k^2 \dot{p}_i({\texttt {r}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}}) \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {r}}_{\texttt {0}}) \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-3}\\ {}&= 1+O(k^2/2^k). \end{aligned}$$

Writing \({\texttt {r}}_{\texttt {1}}t\equiv {\texttt {r}}_{\texttt {1}}\times t\) for \(t\in \{{\texttt {b}}_{\texttt {1}},{\texttt {f}},{\texttt {s}}\}\), a similar argument gives

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {r}}_{\texttt {1}}t)&\geqslant [1-O(k/2^k)] \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}})^{k-1} = [1-O(k/2^k)] \cdot (2/2^k)\,,\nonumber \\ \hat{m}^\lambda \hat{p}_i({\texttt {r}}_{\texttt {1}}t)&\leqslant \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {g}})^{k-1} + k \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {r}}_{\texttt {0}}) \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})^{k-2} = [1-O(k/2^k)] \cdot (2/2^k)\,. \end{aligned}$$
(105)

Lastly, it is easily seen that

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {1}}) = \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}})^{k-1} = [1-O(k/2^{ck})] \cdot (2/2^k)^2\,. \end{aligned}$$

This concludes the proof of (93), and we turn next to the proof of (94). Arguing similarly as for (80) gives

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i(\{{\texttt {f}}{\texttt {f}}\} {\setminus }\{{\texttt {s}}{\texttt {s}}\}) \leqslant \hat{m}^\lambda \hat{p}_i({\texttt {f}}_{\geqslant 1}{\texttt {f}}) +\hat{m}^\lambda \hat{p}_i({\texttt {f}}{\texttt {f}}_{\geqslant 1}) = O(k/4^{k})\,. \end{aligned}$$

Next, suppose \(\underline{{\dot{\sigma }}}\) is compatible with \(\hat{\sigma }\in {\texttt {b}}_{\texttt {1}}{\texttt {f}}_{\geqslant 1}\): if \(\underline{{\dot{\sigma }}}\) has no entry in \(\{{\texttt {r}}\}\), then it is also compatible with some \(\hat{\sigma }'\in {\texttt {f}}{\texttt {f}}_{\geqslant 1}\), provided we allow \(|(\hat{\sigma }')^1|>T\). Therefore

$$\begin{aligned}&\hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {f}}_{\geqslant 1}) - \hat{m}^\lambda \hat{p}_{i,\infty } ({\texttt {f}}{\texttt {f}}_{\geqslant 1}) \\&\qquad \leqslant \sum _{{\varvec{y}}\in \{{\texttt {0}},{\texttt {1}}\}} \bigg [ k\dot{p}_i({\texttt {r}}_{\texttt {0}}{\texttt {f}}) \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {g}}_{\varvec{y}})^{k-2} +k^2 \dot{p}_i({\texttt {r}}_{\texttt {0}}{\texttt {b}}_{\varvec{y}}) \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {f}}) \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {g}}_{\varvec{y}})^{k-3} \bigg ], \end{aligned}$$

and by definition of \(\varvec{\Gamma }(c,\kappa )\) this is \(O(k/4^k)\). Finally,

$$\begin{aligned} \hat{m}^\lambda \hat{p}_i({\texttt {r}}_{\texttt {1}}{\texttt {f}}_{\geqslant 1}) \leqslant \sum _{{\varvec{y}}\in \{{\texttt {0}},{\texttt {1}}\}} k\dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {f}}) \dot{p}_i({\texttt {b}}_{\texttt {0}}{\texttt {g}}_{\varvec{y}})^{k-2} =O(k/8^k), \end{aligned}$$

which proves the first part of (94). For the second part, arguing as for (87), we have for any \(\hat{\eta }\in \hat{\Omega }\) that

$$\begin{aligned}\hat{m}^\lambda \hat{p}_i({\texttt {f}}\hat{\eta }) -\hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\texttt {1}}\hat{\eta }) \leqslant (k-1) \sum _{\underline{{\dot{\sigma }}}\sim \hat{\eta }} [ \dot{p}_i({\texttt {g}}_{\texttt {0}}\dot{\sigma }_2) -\dot{p}_i({\texttt {r}}_{\texttt {0}}\dot{\sigma }_2) ] \prod _{j=3}^k \dot{p}_i({\texttt {b}}_{\texttt {1}}\dot{\sigma }_j)\,. \end{aligned}$$

Note that \(\underline{{\dot{\sigma }}}\) has at most one entry in \(\{{\texttt {r}}\}\). If \(\dot{\sigma }_2={\texttt {r}}_{\texttt {0}}\), then \(\dot{\sigma }_j={\texttt {b}}_{\texttt {1}}\) for all \(j\geqslant 3\). Since \({\dot{q}}_i\in \varvec{\Gamma }(c,\kappa )\) (which means also that \({\dot{q}}_i=({\dot{q}}_i)^\text {av}\)), we have

$$\begin{aligned} \sum _{\underline{{\dot{\sigma }}}\sim \hat{\eta }} \mathbf {1}\{\dot{\sigma }_2=\zeta \} \prod _{j=3}^k \dot{p}_i({\texttt {b}}_{\texttt {1}}\dot{\sigma }_j) \leqslant {\left\{ \begin{array}{ll} \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})^{k-2} \leqslant O(4^{-k}) &{} \text {if }\zeta ={\texttt {r}}_{\texttt {0}},\\ \dot{p}_i({\texttt {b}}_{\texttt {1}}{\texttt {g}})^{k-3} \leqslant O(2^{-k}) &{} \text {if }\zeta \in \dot{\Omega }{\setminus }\{{\texttt {r}}_{\texttt {0}}\}. \end{array}\right. } \end{aligned}$$

On the other hand, \({\dot{q}}_i\in \varvec{\Gamma }(c,\kappa )\) also implies

$$\begin{aligned}\dot{p}_i({\texttt {g}}_{\texttt {0}}\zeta ) -\dot{p}_i({\texttt {r}}_{\texttt {0}}\zeta ) \leqslant O(2^{-k}) \dot{p}_i({\texttt {b}}_{\texttt {0}}\zeta ) +\dot{p}_i({\texttt {f}}\zeta ) \leqslant {\left\{ \begin{array}{ll} O(1) &{}\text {if }\zeta ={\texttt {r}}_{\texttt {0}},\\ O(2^{-k}) &{}\text {if }\zeta \in \dot{\Omega }{\setminus }\{{\texttt {r}}_{\texttt {0}}\}.\end{array}\right. } \end{aligned}$$

Combining these estimates and summing over \(\hat{\eta }\in \hat{\Omega }\) proves the second part of (94).\(\square \)
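Spelling out this last combination (a sketch, with constants absorbed into the \(O(\cdot )\) notation): the case \(\zeta ={\texttt {r}}_{\texttt {0}}\) pairs the \(O(1)\) difference bound with the \(O(4^{-k})\) product bound, while each \(\zeta \in \dot{\Omega }{\setminus }\{{\texttt {r}}_{\texttt {0}}\}\) pairs \(O(2^{-k})\) with \(O(2^{-k})\), so for each fixed \(\hat{\eta }\),

```latex
\hat{m}^\lambda \hat{p}_i(\mathtt{f}\hat{\eta})
  - \hat{m}^\lambda \hat{p}_i(\mathtt{b_1}\hat{\eta})
\leqslant (k-1)\Big[\, O(1)\cdot O(4^{-k})
  + \sum_{\zeta\in\dot{\Omega}\setminus\{\mathtt{r_0}\}}
      O(2^{-k})\cdot O(2^{-k}) \,\Big]
= O(k/4^{k})\,,
```

and summing over the finitely many \(\hat{\eta }\in \hat{\Omega }\) costs only a constant factor.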

An immediate application of (93), which will be useful in the next proof, is that

$$\begin{aligned} \frac{\hat{m}^\lambda \hat{p}_i({\texttt {r}}_{\varvec{x}}\hat{\eta })}{\hat{m}^\lambda \hat{p}_i({\texttt {b}}_{\varvec{x}}\hat{\eta })} \geqslant [1+O(k^2/2^{k})] \cdot (2/2^k) \end{aligned}$$
(106)

for all \(\hat{\eta }\in \{{\texttt {b}}_{\texttt {0}},{\texttt {b}}_{\texttt {1}},{\texttt {f}},{\texttt {s}}\}\) and all \({\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\).
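The proof of Lemma A.10 below opens by invoking the elementary inequality \((a+b)^\lambda \leqslant a^\lambda +b^\lambda \) for \(a,b\geqslant 0\) and \(\lambda \in [0,1]\), which holds because \(x\mapsto x^\lambda \) is concave with value \(0\) at \(0\), hence subadditive. A quick numerical sanity check (purely illustrative, not part of the argument):

```python
import random

def subadditive(a, b, lam, tol=1e-12):
    # Concavity of x -> x**lam with f(0) = 0 gives subadditivity:
    # (a + b)**lam <= a**lam + b**lam for a, b >= 0 and lam in [0, 1].
    return (a + b) ** lam <= a ** lam + b ** lam + tol

random.seed(0)
assert all(
    subadditive(random.uniform(0, 100), random.uniform(0, 100), random.uniform(0, 1))
    for _ in range(10000)
)
```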

Proof of Lemma A.10

We divide the proof into two parts.

Step I. Non-normalized messages.

  1.

    First consider \(\dot{\sigma }\in \{{\texttt {f}}{\texttt {f}}\}\). Recalling \((a+b)^\lambda \leqslant a^\lambda + b^\lambda \) for \(a,b\geqslant 0\) and \(\lambda \in [0,1]\),

    $$\begin{aligned}\Delta \dot{p}^{\text {u}}({\texttt {f}}{\texttt {f}}) \leqslant 2 \sum _{{\hat{r}} \in \{\hat{p},\mathfrak {f}\hat{p}\} } \sum _{\underline{{\hat{\sigma }}} \in \{{\texttt {f}}{\texttt {f}}\}^{d-1}} \bigg |\prod _{j=2}^d \hat{m}^\lambda {\hat{r}}_1(\hat{\sigma }_j) -\prod _{j=2}^d\hat{m}^\lambda {\hat{r}}_2(\hat{\sigma }_j) \bigg |\end{aligned}$$

    where the \({\hat{r}}=\mathfrak {f}\hat{p}\) term arises from the fact that

    $$\begin{aligned}\hat{m}(\hat{\sigma }^1)^\lambda [1-\hat{m}(\hat{\sigma }^2)]^\lambda \hat{p}(\hat{\sigma }) =\hat{m}(\hat{\sigma }^1)^\lambda \hat{m}(\hat{\sigma }^2\oplus {\texttt {1}})^\lambda (\mathfrak {f}\hat{p})(\mathfrak {f}\hat{\sigma }) =\hat{m}^\lambda \mathfrak {f}\hat{p}(\mathfrak {f}\hat{\sigma })\,.\end{aligned}$$

    Applying (77) gives

    $$\begin{aligned}\Delta \dot{p}^{\text {u}}({\texttt {f}}{\texttt {f}}) =O(d) \sum _{{\hat{r}}\in \{\hat{p},\mathfrak {f}\hat{p}\}} \Delta \hat{m}^\lambda {\hat{r}}({\texttt {f}}{\texttt {f}}) \Big (\hat{m}^\lambda {\hat{r}}_1({\texttt {f}}{\texttt {f}}) + \Delta \hat{m}^\lambda {\hat{r}}({\texttt {f}}{\texttt {f}}) \Big )^{d-2}\,.\end{aligned}$$

    We have from (91) and (93) that \(\hat{m}^\lambda \hat{p}_1({\texttt {f}}{\texttt {f}}) \asymp 1\) and \(\Delta \hat{m}^\lambda \hat{p}({\texttt {f}}{\texttt {f}}) = O(k^3/2^{(1+c)k})\), so

    $$\begin{aligned} \Delta \dot{p}^{\text {u}}({\texttt {f}}{\texttt {f}}) = O(d)\Vert {\Delta \hat{m}^\lambda \hat{p}+ \Delta \hat{m}^\lambda \mathfrak {f}\hat{p}} \Vert \cdot \dot{p}^{\text {u}}_1({\texttt {f}}{\texttt {f}})\,.\end{aligned}$$
    (107)
  2.

    Next consider \(\dot{\sigma }\in \{{\texttt {p}}_{\texttt {1}}{\texttt {f}}\}\). Let \({\hat{r}}_{\max }(\hat{\sigma }) \equiv \max _{i=1,2}{\hat{r}}_i(\hat{\sigma })\); in this notation,

    $$\begin{aligned}{\hat{r}}_{\max }(\hat{\Omega }) = \sum _{\hat{\sigma }\in \hat{\Omega }} \max _{i=1,2} {\hat{r}}_i(\hat{\sigma }) \geqslant \max _{i=1,2} \sum _{\hat{\sigma }\in \hat{\Omega }} {\hat{r}}_i(\hat{\sigma }) = \max _{i=1,2}{\hat{r}}_i(\hat{\Omega })\end{aligned}$$

    where the inequality may be strict. Then

    $$\begin{aligned}\Delta \dot{p}^{\text {u}}({\texttt {p}}_{\texttt {1}}{\texttt {f}}) = O(d) \sum _{{\hat{r}}\in \{\hat{p},\mathfrak {f}\hat{p}\}} \Delta \hat{m}^\lambda {\hat{r}}({\texttt {p}}_{\texttt {1}}{\texttt {f}}) [\hat{m}^\lambda {\hat{r}}_{\max } ({\texttt {p}}_{\texttt {1}}{\texttt {f}}) ]^{d-2}\,.\end{aligned}$$

    Let \(a\in {{\,\mathrm{arg\,max}\,}}_i {\hat{r}}_i({\texttt {b}}_{\texttt {1}}{\texttt {s}})\), so that

    $$\begin{aligned}0\leqslant \hat{m}^\lambda {\hat{r}}_{\max }({\texttt {p}}_{\texttt {1}}{\texttt {f}}) -\hat{m}^\lambda {\hat{r}}_a({\texttt {p}}_{\texttt {1}}{\texttt {f}}) \leqslant \Delta \hat{m}^\lambda {\hat{r}}({\texttt {r}}_{\texttt {1}}{\texttt {f}}) +\Delta \hat{m}^\lambda {\hat{r}} ({\texttt {b}}_{\texttt {1}}{\texttt {f}}_{\geqslant 1}) = O(2^{ -(1+ c)k}),\end{aligned}$$

    where the last estimate is by (91) and (94). We also have from (93) that \(\hat{m}^\lambda \hat{p}({\texttt {p}}_{\texttt {1}}{\texttt {f}}) \geqslant \hat{m}^\lambda \hat{p}({\texttt {b}}_{\texttt {1}}{\texttt {f}}) \asymp 1\), and it follows that

    $$\begin{aligned}{}[\hat{m}^\lambda {\hat{r}}_{\max } ({\texttt {p}}_{\texttt {1}}{\texttt {f}})]^{d-2} \asymp [\hat{m}^\lambda {\hat{r}}_a ({\texttt {p}}_{\texttt {1}}{\texttt {f}})]^{d-1}. \end{aligned}$$
    (108)

    Applying (93) and (94) again, we have (for \(i=1,2\))

    $$\begin{aligned}{}[\hat{m}^\lambda {\hat{r}}_i ({\texttt {p}}_{\texttt {1}}{\texttt {f}})]^{d-1} \asymp [ \hat{m}^\lambda {\hat{r}}_i ({\texttt {p}}_{\texttt {1}}{\texttt {s}})]^{d-1}\,.\end{aligned}$$

    On the other hand, assuming \(T\geqslant 1\), we have

    $$\begin{aligned}\dot{p}^{\text {u}}_i({\texttt {r}}_{\texttt {1}}{\texttt {f}}) \geqslant [ \hat{m}^\lambda {\hat{r}}_i({\texttt {p}}_{\texttt {1}}{\texttt {s}}) ]^{d-1} -[ \hat{m}^\lambda {\hat{r}}_i({\texttt {b}}_{\texttt {1}}{\texttt {s}}) ]^{d-1} \asymp [ \hat{m}^\lambda {\hat{r}}_i({\texttt {p}}_{\texttt {1}}{\texttt {s}}) ]^{d-1}\end{aligned}$$

    where the last step follows by (106). Similarly,

    $$\begin{aligned} \dot{p}^{\text {u}}_i({\texttt {r}}_{\texttt {1}}{\texttt {f}}) - \dot{p}^{\text {u}}_i({\texttt {b}}_{\texttt {1}}{\texttt {f}})&=O(1) \sum _{{\hat{r}}\in \{\hat{p},\mathfrak {f}\hat{p}\}} \hat{m}^\lambda {\hat{r}}_i({\texttt {b}}_{\texttt {1}}{\texttt {f}})^{d-1} = O(2^{-k}) \sum _{{\hat{r}}\in \{\hat{p},\mathfrak {f}\hat{p}\}} \hat{m}^\lambda {\hat{r}}_i({\texttt {p}}_{\texttt {1}}{\texttt {f}})^{d-1} \nonumber \\&= O(2^{-k}) \dot{p}^{\text {u}}_i({\texttt {r}}_{\texttt {1}}{\texttt {f}}) = O(2^{-k}) \dot{p}^{\text {u}}_i({\texttt {b}}_{\texttt {1}}{\texttt {f}}), \end{aligned}$$
    (109)

    where the last step follows by rearranging the terms. Combining the above gives

    $$\begin{aligned} \Delta \dot{p}^{\text {u}}({\texttt {p}}_{\texttt {1}}{\texttt {f}}) \leqslant O(d) \Vert {\Delta \hat{m}^\lambda \hat{p}+ \Delta \hat{m}^\lambda \mathfrak {f}\hat{p}} \Vert \max _{i=1,2} \dot{p}^{\text {u}}_i({\texttt {b}}_{\texttt {1}}{\texttt {f}}). \end{aligned}$$
    (110)

    Clearly, similar bounds hold if we replace \({\texttt {p}}_{\texttt {1}}{\texttt {f}}\) with any of \({\texttt {p}}_{\texttt {0}}{\texttt {f}}\), \({\texttt {f}}{\texttt {p}}_{\texttt {1}}\), or \({\texttt {f}}{\texttt {p}}_{\texttt {0}}\).

  3.

    Lastly we bound \(\Delta \dot{p}^{\text {u}}({\texttt {p}}_{\varvec{x}}{\texttt {p}}_{\varvec{y}})\) for \({\varvec{x}},{\varvec{y}}\in \{{\texttt {0}},{\texttt {1}}\}\). As in the single-copy recursion, we denote

    $$\begin{aligned} {\dot{r}}({\texttt {X}}_{\varvec{x}}\dot{\sigma })&\equiv {\dot{r}}({\texttt {r}}_{\varvec{x}}\dot{\sigma }) -{\dot{r}}({\texttt {b}}_{\varvec{x}}\dot{\sigma }),\\ {\dot{r}}(\dot{\sigma }{\texttt {X}}_{\varvec{x}})&\equiv {\dot{r}}(\dot{\sigma }{\texttt {r}}_{\varvec{x}}) -{\dot{r}}(\dot{\sigma }{\texttt {b}}_{\varvec{x}}),\\ {\dot{r}}({\texttt {X}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}})&\equiv {\dot{r}}({\texttt {r}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}) -{\dot{r}}({\texttt {r}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}}) - {\dot{r}}({\texttt {b}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}) + {\dot{r}}({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}}). \end{aligned}$$

    Applying (106) gives

    $$\begin{aligned} \dot{p}^{\text {u}}_i({\texttt {X}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}})&=[\hat{p}_i({\texttt {b}}_{\varvec{x}}{\texttt {p}}_{\varvec{y}})]^{d-1} = O(2^{-k}) [\hat{p}_i({\texttt {p}}_{\varvec{x}}{\texttt {p}}_{\varvec{y}})]^{d-1} = O(2^{-k})\dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}),\\ \dot{p}^{\text {u}}_i({\texttt {X}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}})&=[ \hat{p}_i({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}})]^{d-1} =O(2^{-k}) \dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}). \end{aligned}$$

    Combining the above estimates gives

    $$\begin{aligned}\dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}) -\dot{p}^{\text {u}}_i({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}}) =\dot{p}^{\text {u}}_i ({\texttt {X}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}) + \dot{p}^{\text {u}}_i ({\texttt {r}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}}) - \dot{p}^{\text {u}}_i ({\texttt {X}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}}) =O(2^{-k}) \dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}})\,. \end{aligned}$$

    Further, it follows from the bp equations that

    $$\begin{aligned}&\max \{ \dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}}), \dot{p}^{\text {u}}_i({\texttt {b}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}}), \dot{p}^{\text {u}}_i({\texttt {X}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}), \dot{p}^{\text {u}}_i({\texttt {X}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}}) \} \leqslant \dot{p}^{\text {u}}_i({\texttt {r}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}) -\dot{p}^{\text {u}}_i({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}}), \nonumber \\&\quad \text {so } \dot{p}^{\text {u}}_i(st) =[1+O(2^{-k})]\dot{p}^{\text {u}}_i({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}}) \text { for all }s\in \{{\texttt {r}}_{\varvec{x}},{\texttt {b}}_{\varvec{x}}\}, t\in \{{\texttt {r}}_{\varvec{y}},{\texttt {b}}_{\varvec{y}}\}. \end{aligned}$$
    (111)

    Similarly, we can upper bound

    $$\begin{aligned} \Delta \dot{p}^{\text {u}}({\texttt {p}}_{\varvec{x}}{\texttt {p}}_{\varvec{y}})&\leqslant 4[ \Delta \dot{p}^{\text {u}}({\texttt {r}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}) + \Delta \dot{p}^{\text {u}}({\texttt {X}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}}) + \Delta \dot{p}^{\text {u}}({\texttt {r}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}}) + \Delta \dot{p}^{\text {u}}({\texttt {X}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}}) ]\nonumber \\&\leqslant O(d) \sum _{{\hat{r}}\in \{ \hat{p},\mathfrak {f}\hat{p}\}} \sum _{\begin{array}{c} s\in \{{\texttt {p}}_{\varvec{x}},{\texttt {b}}_{\varvec{x}}\} \\ t\in \{{\texttt {p}}_{\varvec{y}},{\texttt {b}}_{\varvec{y}}\} \end{array}} \Vert { \Delta \hat{m}^\lambda {\hat{r}} } \Vert [\hat{m}^\lambda {\hat{r}}_{\max }(st)]^{d-2}. \end{aligned}$$
    (112)

    For \({\hat{r}} \in \{\hat{p},\mathfrak {f}\hat{p}\}\), let \(a={{\,\mathrm{arg\,max}\,}}_{i=1,2} \hat{m}^\lambda {\hat{r}}_i({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})\): then, for any \(s\in \{{\texttt {p}}_{\varvec{x}},{\texttt {b}}_{\varvec{x}}\}\), \(t\in \{{\texttt {p}}_{\varvec{y}},{\texttt {b}}_{\varvec{y}}\}\),

    $$\begin{aligned} 0&\leqslant {\hat{m}^\lambda {\hat{r}}_{\max }(st)}- {\max _{i=1,2}\hat{m}^\lambda {\hat{r}}_i(st)} \leqslant {\hat{m}^\lambda {\hat{r}}_{\max }(st)} -\hat{m}^\lambda {\hat{r}}_a(st)\\&\leqslant O(1) \Delta \hat{m}^\lambda {\hat{r}}(\{{\texttt {p}}{\texttt {p}}\}{\setminus }\{{\texttt {b}}{\texttt {b}}\}) \leqslant O(1/2^{(1+c)k}), \end{aligned}$$

    where the last estimate is by (91). Combining with (72) and (111) gives

    $$\begin{aligned}\sum _{\begin{array}{c} s\in \{{\texttt {p}}_{\varvec{x}},{\texttt {b}}_{\varvec{x}}\} \\ t\in \{{\texttt {p}}_{\varvec{y}},{\texttt {b}}_{\varvec{y}}\} \end{array}} [\hat{m}^\lambda {\hat{r}}_{\max }(st)]^{d-2} = O(1) \Big [ \max _{i=1,2} {\hat{r}}_i({\texttt {p}}_{\varvec{x}}{\texttt {p}}_{\varvec{y}})\Big ]^{d-1} =O(1) \max _{i=1,2} \dot{p}^{\text {u}}_i({\texttt {b}}{\texttt {b}})\,.\end{aligned}$$

    Substituting into (112) gives

    $$\begin{aligned} \Delta \dot{p}^{\text {u}}({\texttt {p}}_{\varvec{x}}{\texttt {p}}_{\varvec{y}}) \leqslant O(d)\Vert {\Delta \hat{m}^\lambda \hat{p}+ \Delta \hat{m}^\lambda \mathfrak {f}\hat{p}} \Vert \max _{i=1,2}\dot{p}^{\text {u}}_i ({\texttt {b}}{\texttt {b}})\,.\end{aligned}$$
    (113)

    Further, for any \(st\in \{{\texttt {r}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}}, {\texttt {X}}_{\varvec{x}}{\texttt {r}}_{\varvec{y}},{\texttt {X}}_{\varvec{x}}{\texttt {X}}_{\varvec{y}}\}\), we have

    $$\begin{aligned} \Delta \dot{p}^{\text {u}}(st) \leqslant O(k)\Vert {\Delta \hat{m}^\lambda \hat{p}+ \Delta \hat{m}^\lambda \mathfrak {f}\hat{p}} \Vert \max _{i=1,2}\dot{p}^{\text {u}}_i ({\texttt {b}}{\texttt {b}})\,.\end{aligned}$$
    (114)

    Lastly, in the special case \(\hat{p}_2 = \mathfrak {f}\hat{p}_1\), (113) reduces to

    $$\begin{aligned} |\dot{p}^{\text {u}}_1({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -\dot{p}^{\text {u}}_1({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})|&\leqslant O(d) \Vert {\hat{m}^\lambda \hat{p}_1 -\hat{m}^\lambda \mathfrak {f}\hat{p}_1} \Vert \dot{p}^{\text {u}}_1({\texttt {b}}{\texttt {b}})\nonumber \\&\leqslant k^5 2^{(1-2\kappa )k} \Vert {\dot{p}_1-\mathfrak {f}\dot{p}_1} \Vert \,, \end{aligned}$$
    (115)

    where the last estimate is by (92).

Step II. Normalized messages. Recall \(\tilde{q}_i\equiv {\texttt {BP}}{\dot{q}}_i\). It remains to verify that \(\tilde{q}_i\in \varvec{\Gamma }(c',1)\) with \(c'=\max \{0,2\kappa -1\}\): recalling the definition of \(\varvec{\Gamma }\), this means

$$\begin{aligned}&|p({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -p({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})| \leqslant (k^9/2^{c'k})p({\texttt {b}}{\texttt {b}})\text { and } p({\texttt {f}}{\texttt {f}})+p(\{{\texttt {f}}{\texttt {r}},{\texttt {r}}{\texttt {f}}\})/2^k + p({\texttt {r}}{\texttt {r}})/4^k = O(2^{-k}) p({\texttt {b}}{\texttt {b}});\quad (1\varvec{\Gamma }')\\&p( {\texttt {f}}{\texttt {r}}) = O(2^{-k})p({\texttt {b}}{\texttt {b}})\text { and } p({\texttt {r}}{\texttt {r}}) = O(1) p({\texttt {b}}{\texttt {b}});\quad (2\varvec{\Gamma }')\\&p({\texttt {r}}_{\varvec{x}}\dot{\sigma }) \geqslant [1-O(2^{-k})] p({\texttt {b}}_{\varvec{x}}\dot{\sigma }) \text { for all } {\varvec{x}}\in \{{\texttt {0}},{\texttt {1}}\}\text { and } \dot{\sigma }\in \dot{\Omega }.\quad (3\varvec{\Gamma }') \end{aligned}$$

Condition (\(3\varvec{\Gamma }'\)) is automatically satisfied due to the bp equations. The second part of (\(2\varvec{\Gamma }'\)) follows from (111). The first part of (\(1\varvec{\Gamma }'\)) holds trivially if \(c'=0\), and otherwise follows from (115). We claim that

$$\begin{aligned} \tilde{q}_i(\{{\texttt {r}}{\texttt {f}},{\texttt {f}}{\texttt {r}},{\texttt {f}}{\texttt {f}}\}) = O(2^{-k})\tilde{q}_i({\texttt {b}}{\texttt {b}})\,. \end{aligned}$$
(116)

This immediately gives the first part of (\(2\varvec{\Gamma }'\)). Further, the bp equations give \(\tilde{q}_i({\texttt {b}}{\texttt {f}})\leqslant \tilde{q}_i({\texttt {r}}{\texttt {f}})\) and \(\tilde{q}_i({\texttt {f}}{\texttt {b}})\leqslant \tilde{q}_i({\texttt {f}}{\texttt {r}})\), so the second part of (\(1\varvec{\Gamma }'\)) also follows. To see that (116) holds, note that the second part of (94) gives

$$\begin{aligned} \dot{p}^{\text {u}}_i({\texttt {f}}{\texttt {f}})&\leqslant O(1) \sum _{{\hat{r}}\in \{\hat{p},\mathfrak {f}\hat{p}\}} [\hat{m}^\lambda {\hat{r}}_i({\texttt {f}}{\texttt {f}})]^{d-1} \leqslant O(1) \sum _{{\hat{r}}\in \{\hat{p},\mathfrak {f}\hat{p}\}} [\hat{m}^\lambda {\hat{r}}_i({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})]^{d-1},\\ \dot{p}^{\text {u}}_i({\texttt {r}}_{\texttt {1}}{\texttt {f}})&\leqslant O(1) \sum _{{\hat{r}}\in \{\hat{p},\mathfrak {f}\hat{p}\}} [\hat{m}^\lambda {\hat{r}}_i({\texttt {p}}_{\texttt {1}}{\texttt {f}})]^{d-1} \leqslant O(1) \sum _{{\hat{r}}\in \{\hat{p},\mathfrak {f}\hat{p}\}} [\hat{m}^\lambda {\hat{r}}_i ({\texttt {p}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})]^{d-1}. \end{aligned}$$

Combining with (106) gives \(\dot{p}^{\text {u}}_i(\{{\texttt {r}}_{\texttt {1}}{\texttt {f}},{\texttt {f}}{\texttt {f}}\}) = O(2^{-k})\dot{p}^{\text {u}}_i({\texttt {r}}_{\texttt {1}}{\texttt {r}}_{\texttt {1}})\). Recalling (111) (and making use of symmetry) gives (116). Finally, we conclude the proof of the lemma by bounding the difference \(\Delta \tilde{q}\equiv |\tilde{q}_1-\tilde{q}_2|\). Recalling the definition of \({\texttt {X}}_{\varvec{x}}\), we have

$$\begin{aligned} \Delta \tilde{q}({\texttt {p}}{\texttt {p}})&\leqslant O(1) \Delta \tilde{q}(\{{\texttt {b}}{\texttt {b}}, {\texttt {r}}{\texttt {X}},{\texttt {X}}{\texttt {r}},{\texttt {X}}{\texttt {X}}\}),\\ \Delta \tilde{q}(\dot{\Omega }^2{\setminus }\{{\texttt {p}}{\texttt {p}}\})&\leqslant O(1)\Delta \tilde{q}( \{{\texttt {b}}{\texttt {f}},{\texttt {f}}{\texttt {b}},{\texttt {f}}{\texttt {f}}, {\texttt {f}}{\texttt {X}},{\texttt {X}}{\texttt {f}}\}). \end{aligned}$$

We next bound \(\Delta \tilde{q}({\texttt {b}}{\texttt {b}})\), which is the sum of \(\Delta \tilde{q}({\texttt {b}}_{\varvec{x}}{\texttt {b}}_{\varvec{y}})\) over \({\varvec{x}},{\varvec{y}}\in \{{\texttt {0}},{\texttt {1}}\}\). By symmetry let us take \({\varvec{x}}={\varvec{y}}={\texttt {0}}\). Since \(\tilde{q}_i=(\tilde{q}_i)^\text {av}\), \(\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) =\tfrac{1}{4}\tilde{q}_i({\texttt {b}}{\texttt {b}}) +\tfrac{1}{2}[ \tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}}) ]\), so

$$\begin{aligned} \Delta \tilde{q}({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) \leqslant \tfrac{1}{4} |\tilde{q}_1({\texttt {b}}{\texttt {b}})-\tilde{q}_2({\texttt {b}}{\texttt {b}})| +\tfrac{1}{2} \sum _{i=1,2}|\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})|\,. \end{aligned}$$
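The averaging identity invoked just above can be checked directly, under the reading that \((\tilde{q}_i)^\text {av}\) symmetrizes with respect to the global spin flip, so that \(\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) =\tilde{q}_i({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {1}})\), \(\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}}) =\tilde{q}_i({\texttt {b}}_{\texttt {1}}{\texttt {b}}_{\texttt {0}})\), and hence \(\tilde{q}_i({\texttt {b}}{\texttt {b}}) = 2\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) + 2\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})\):

```latex
\tfrac{1}{4}\tilde{q}_i(\mathtt{b}\mathtt{b})
  + \tfrac{1}{2}\big[\tilde{q}_i(\mathtt{b_0}\mathtt{b_0})
      - \tilde{q}_i(\mathtt{b_0}\mathtt{b_1})\big]
= \tfrac{1}{2}\tilde{q}_i(\mathtt{b_0}\mathtt{b_0})
  + \tfrac{1}{2}\tilde{q}_i(\mathtt{b_0}\mathtt{b_1})
  + \tfrac{1}{2}\tilde{q}_i(\mathtt{b_0}\mathtt{b_0})
  - \tfrac{1}{2}\tilde{q}_i(\mathtt{b_0}\mathtt{b_1})
= \tilde{q}_i(\mathtt{b_0}\mathtt{b_0})\,.
```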

Since the \(\tilde{q}_i\) are normalized to be probability measures,

$$\begin{aligned} 1-\tilde{q}_i(\dot{\Omega }^2{\setminus }\{{\texttt {p}}{\texttt {p}}\}) =\tilde{q}_i({\texttt {p}}{\texttt {p}}) =2\tilde{q}_i({\texttt {r}}{\texttt {X}}) +2\tilde{q}_i({\texttt {X}}{\texttt {r}}) -3 \tilde{q}_i({\texttt {X}}{\texttt {X}}) +4\tilde{q}_i({\texttt {b}}{\texttt {b}}), \end{aligned}$$

from which it follows that

$$\begin{aligned} |\tilde{q}_1({\texttt {b}}{\texttt {b}})-\tilde{q}_2({\texttt {b}}{\texttt {b}})| \lesssim |\tilde{q}_1(\dot{\Omega }^2{\setminus }\{{\texttt {p}}{\texttt {p}}\}) -\tilde{q}_2(\dot{\Omega }^2{\setminus }\{{\texttt {p}}{\texttt {p}}\})| + \Delta \tilde{q}(\{ {\texttt {r}}{\texttt {X}},{\texttt {X}}{\texttt {r}},{\texttt {X}}{\texttt {X}}\})\,. \end{aligned}$$
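The identity for \(\tilde{q}_i({\texttt {p}}{\texttt {p}})\) two displays above can be verified by expanding the \({\texttt {X}}\) notation, writing each symbol for the corresponding \(\tilde{q}_i\)-mass and recalling that \({\texttt {X}}\) abbreviates the signed combination \({\texttt {r}}-{\texttt {b}}\) in each coordinate:

```latex
2\,\mathtt{rX} + 2\,\mathtt{Xr} - 3\,\mathtt{XX} + 4\,\mathtt{bb}
= 2(\mathtt{rr}-\mathtt{rb}) + 2(\mathtt{rr}-\mathtt{br})
  - 3(\mathtt{rr}-\mathtt{rb}-\mathtt{br}+\mathtt{bb}) + 4\,\mathtt{bb}
= \mathtt{rr}+\mathtt{rb}+\mathtt{br}+\mathtt{bb}
= \mathtt{pp}\,.
```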

Combining the above estimates gives

$$\begin{aligned} \Vert {\Delta \tilde{q}} \Vert \lesssim \Delta \tilde{q}(\texttt {A} ) + \sum _{i=1,2} |\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -\tilde{q}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})|,\quad \texttt {A} \equiv \{ {\texttt {b}}{\texttt {f}},{\texttt {f}}{\texttt {b}},{\texttt {f}}{\texttt {f}}, {\texttt {f}}{\texttt {X}},{\texttt {X}}{\texttt {f}}, {\texttt {r}}{\texttt {X}},{\texttt {X}}{\texttt {r}},{\texttt {X}}{\texttt {X}}\}\,. \end{aligned}$$

Write \(\dot{Z}_i\equiv \Vert {\dot{p}^{\text {u}}_i} \Vert \). Taking \(a\in \{1,2\}\) and \(b=3-a\), we find \(\Vert {\Delta \tilde{q}} \Vert \leqslant e_1+e_2e_3+e_4\) with

$$\begin{aligned} e_1\equiv \frac{\Delta \dot{p}^{\text {u}}(\texttt {A})}{{\dot{Z}}_a}\,,\quad e_2\equiv \frac{|{\dot{Z}}_1-{\dot{Z}}_2|}{{\dot{Z}}_a} \leqslant \frac{\Vert {\Delta \dot{p}^{\text {u}}} \Vert }{{\dot{Z}}_a}\,,\quad e_3\equiv \frac{\dot{p}^{\text {u}}_b(\texttt {A})}{{\dot{Z}}_b}\,,\quad e_4\equiv \sum _{i=1,2} \frac{|\dot{p}^{\text {u}}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}) -\dot{p}^{\text {u}}_i({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}})|}{{\dot{Z}}_i}\,. \end{aligned}$$
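The decomposition \(\Vert {\Delta \tilde{q}} \Vert \leqslant e_1+e_2e_3+e_4\) rests on the elementary estimate for a difference of ratios, sketched here with \(\{a,b\}=\{1,2\}\) and \(A_i\) standing for the relevant unnormalized masses \(\dot{p}^{\text {u}}_i(\texttt {A})\):

```latex
\Big|\frac{A_1}{\dot{Z}_1}-\frac{A_2}{\dot{Z}_2}\Big|
= \Big|\frac{A_a-A_b}{\dot{Z}_a}
    + A_b\Big(\frac{1}{\dot{Z}_a}-\frac{1}{\dot{Z}_b}\Big)\Big|
\leqslant \frac{|A_1-A_2|}{\dot{Z}_a}
    + \frac{|\dot{Z}_1-\dot{Z}_2|}{\dot{Z}_a}\cdot\frac{A_b}{\dot{Z}_b}\,,
```

matching \(e_1\), \(e_2\), and \(e_3\) term by term; the term \(e_4\) accounts separately for the \({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {0}}\) versus \({\texttt {b}}_{\texttt {0}}{\texttt {b}}_{\texttt {1}}\) discrepancy.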

It follows from (107), (110), (114) and (116), and taking \(a={{\,\mathrm{arg\,max}\,}}_i\dot{p}^{\text {u}}_i({\texttt {b}}{\texttt {b}})\), that

$$\begin{aligned} e_1 \lesssim \Vert {\Delta \hat{m}^\lambda \hat{p}+\Delta \hat{m}^\lambda \mathfrak {f}\hat{p}} \Vert (d/2^k) \max _{i=1,2}\dot{p}^{\text {u}}_i({\texttt {b}}{\texttt {b}})/ {\dot{Z}}_a \lesssim k \Vert {\Delta \hat{m}^\lambda \hat{p}+\Delta \hat{m}^\lambda \mathfrak {f}\hat{p}} \Vert \,. \end{aligned}$$

Further, recalling (113) gives

$$\begin{aligned} e_2 \lesssim k2^k \Vert {\Delta \hat{m}^\lambda \hat{p}+\Delta \hat{m}^\lambda \mathfrak {f}\hat{p}} \Vert \,. \end{aligned}$$

Combining (109), (111), and (116) gives \(e_3=O(2^{-k})\). Finally, (115) gives

$$\begin{aligned} e_4 \lesssim k2^{k} \Vert {\hat{m}^\lambda \hat{p}_1 -\hat{m}^\lambda \mathfrak {f}\hat{p}_1} \Vert \,. \end{aligned}$$

Combining the pieces together finishes the proof.\(\square \)

Appendix B: The 1RSB free energy

B.1 Equivalence of recursions

In this section, we relate the coloring recursion (64) to the distributional recursion (10), and prove the following:

Proposition B.1

Let \({\dot{q}}_\lambda \) be the fixed point given by Proposition 5.5a for parameters \(\lambda \in [0,1]\) and \(T=\infty \). Let \(H_\lambda \equiv (\dot{H}_\lambda ,\hat{H}_\lambda ,\bar{H}_\lambda )\in \varvec{\Delta }\) be the associated triple of measures defined by Proposition 3.4. We then have the identity \((\varvec{s}(H_\lambda ),\varvec{\Sigma }(H_\lambda ),\varvec{F}(H_\lambda )) = (s_\lambda , \Sigma (s_\lambda ),{\mathfrak {F}}(\lambda ))\).

In the course of the proof, we will obtain Proposition 1.2 as a corollary. Throughout the section we take \(T=\infty \) unless explicitly indicated otherwise. We begin with some notations. Recall that \({\mathscr {P}}(\mathcal {X})\) is the space of probability measures on \(\mathcal {X}\). Given \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega })\), we define two associated measures \(\dot{m}^\lambda {\dot{q}},(1-\dot{m})^\lambda {\dot{q}}\) on \(\dot{\Omega }\) by

$$\begin{aligned}(\dot{m}^\lambda {\dot{q}})(\dot{\sigma }) \equiv \dot{m}(\dot{\sigma })^\lambda {\dot{q}}(\dot{\sigma }),\quad ((1-\dot{m})^\lambda {\dot{q}})(\dot{\sigma }) \equiv (1-\dot{m}(\dot{\sigma }))^\lambda {\dot{q}}(\dot{\sigma }),\end{aligned}$$

We let \(\dot{\pi }\equiv \dot{\pi }({\dot{q}})\) be the probability measure on \(\dot{\mathscr {M}}{\setminus }\{\star \}\) given by

Recall from §A.1 the mappings \(\dot{m}: \dot{\Omega }\rightarrow [0,1]\) and \(\hat{m}: \dot{\Omega }\rightarrow [0,1]\). We then denote the pushforward measure \(\dot{u}\equiv \dot{u}({\dot{q}}) \equiv \dot{\pi }\circ \dot{m}^{-1}\), so that \(\dot{u}\) belongs to the space \({\mathscr {P}}\) of discrete probability measures on [0, 1]. Analogously, given \(\hat{q}\in {\mathscr {P}}(\hat{\Omega })\), we define two associated measures \(\hat{m}^\lambda \hat{q},(1-\hat{m})^\lambda \hat{q}\) on \(\hat{\Omega }\). We let \(\hat{\pi }\equiv \hat{\pi }(\hat{q})\) be the probability measure on \(\hat{\mathscr {M}}{\setminus }\{\star \}\) given by

and we then denote \(\hat{u}\equiv \hat{u}(\hat{q})\equiv \hat{\pi }\circ \hat{m}^{-1}\), so that \(\hat{u}\in {\mathscr {P}}\) also. The next two lemmas follow straightforwardly from the above definitions, and we omit their proofs:

Lemma B.2

Suppose \({\dot{q}}\in {\mathscr {P}}(\dot{\Omega })\) satisfies \({\dot{q}}={\dot{q}}^\text {av}\) and

$$\begin{aligned} \dot{m}^\lambda {\dot{q}}({\texttt {f}}) ={\dot{q}}({\texttt {r}}_{\texttt {1}}) - {\dot{q}}({\texttt {b}}_{\texttt {1}}) ={\dot{q}}({\texttt {r}}_{\texttt {0}}) - {\dot{q}}({\texttt {b}}_{\texttt {0}}) =(1-\dot{m})^\lambda {\dot{q}}({\texttt {f}}) \end{aligned}$$
(117)

Then \(\hat{q}\equiv \hat{{\texttt {BP}}}{\dot{q}}\in {\mathscr {P}}(\hat{\Omega })\) must satisfy \(\hat{q}=\hat{q}^\text {av}\) and

$$\begin{aligned} \hat{m}^\lambda \hat{q}({\texttt {f}}) = \hat{q}({\texttt {b}}_{\texttt {1}}) = \hat{q}({\texttt {b}}_{\texttt {0}}) = (1-\hat{m})^\lambda \hat{q}({\texttt {f}}), \end{aligned}$$
(118)

Let \(\hat{\varvec{z}}\equiv (\hat{{\texttt {NB}}}{\dot{q}})/(\hat{{\texttt {BP}}}{\dot{q}})\) be the normalizing constant. Then \(\dot{u}\equiv \dot{u}({\dot{q}})\) and \(\hat{u}\equiv \hat{u}(\hat{q})\) satisfy

$$\begin{aligned} \hat{u}= \hat{\mathscr {R}}_\lambda (\dot{u}), \quad \hat{{\mathscr {Z}}}_\lambda (\dot{u}) = \frac{\hat{\varvec{z}}(1-\hat{q}({\texttt {b}}))}{(1-{\dot{q}}({\texttt {r}}))^{k-1}}. \end{aligned}$$
(119)

Lemma B.3

Suppose \(\hat{q}\in {\mathscr {P}}(\hat{\Omega })\) satisfies \(\hat{q}=\hat{q}^\text {av}\) and (118). Then \({\dot{q}}\equiv \hat{{\texttt {BP}}}\hat{q}\in {\mathscr {P}}(\dot{\Omega })\) must satisfy \({\dot{q}}={\dot{q}}^\text {av}\) and (117). Let \(\dot{\varvec{z}}\equiv (\dot{{\texttt {NB}}}\hat{q})/(\dot{{\texttt {BP}}}\hat{q})\) be the normalizing constant: then

$$\begin{aligned} \dot{u}= \dot{\mathscr {R}}_\lambda (\hat{u}), \quad \dot{{\mathscr {Z}}}_\lambda (\hat{u}) = \frac{\dot{\varvec{z}}(1-{\dot{q}}({\texttt {r}}))}{(1-\hat{q}({\texttt {b}}))^{d-1}}. \end{aligned}$$
(120)

Proof of Proposition 1.2

This is simply a rephrasing of the proof of Proposition 5.5a, using Lemma B.2 and Lemma B.3.\(\square \)

We next prove Proposition B.1. In the remainder of this section, fix \(\lambda \in [0,1]\) and \(T=\infty \). Let \({\dot{q}}\equiv {\dot{q}}_\lambda \) be the fixed point of \({\texttt {BP}}\equiv {\texttt {BP}}_{\lambda ,\infty }\) given by Proposition 5.5a. Let \(\hat{q}\equiv \hat{q}_\lambda \) denote the image of \({\dot{q}}\) under the mapping \(\hat{{\texttt {BP}}}\equiv \hat{{\texttt {BP}}}_{\lambda ,\infty }\). Denote the associated normalizing constants

$$\begin{aligned} \hat{\varvec{z}}\equiv \hat{\varvec{z}}_\lambda \equiv (\hat{{\texttt {NB}}}{\dot{q}})/(\hat{{\texttt {BP}}}{\dot{q}}),\quad \dot{\varvec{z}}\equiv \dot{\varvec{z}}_\lambda \equiv (\dot{{\texttt {NB}}}\hat{q})/(\dot{{\texttt {BP}}}\hat{q})\,. \end{aligned}$$

Let \(H_\lambda \equiv (\dot{H}_\lambda ,\hat{H}_\lambda ,\bar{H}_\lambda )\) be the triple of associated measures defined as in Proposition 3.4, with normalizing constants \((\dot{\mathfrak {z}}_\lambda ,\hat{\mathfrak {z}}_\lambda ,\bar{\mathfrak {z}}_\lambda )\). Recall from (12) that \({\mathfrak {F}}(\lambda )=\ln \dot{\mathfrak {Z}}_\lambda +\alpha \ln \hat{\mathfrak {Z}}_\lambda - d\ln \bar{\mathfrak {Z}}_\lambda \). We now show that it coincides with \(\varvec{F}(H_\lambda )\):

Lemma B.4

Under the above notations, \(\varvec{F}(H_\lambda ) = \ln \dot{\mathfrak {z}}_\lambda + \alpha \ln \hat{\mathfrak {z}}_\lambda - d \ln \bar{\mathfrak {z}}_\lambda \), and

$$\begin{aligned} \bar{\mathfrak {Z}}_\lambda = \frac{\bar{\mathfrak {z}}_\lambda }{(1-{\dot{q}}_\lambda ({\texttt {r}}))(1-\hat{q}_\lambda ({\texttt {b}}))} ,\quad \dot{\mathfrak {Z}}_\lambda = \frac{\dot{\mathfrak {z}}_\lambda }{(1-\hat{q}_\lambda ({\texttt {b}}))^{d}} ,\quad \hat{\mathfrak {Z}}_\lambda = \frac{\hat{\mathfrak {z}}_\lambda }{(1-{\dot{q}}_\lambda ({\texttt {r}}))^{k}} . \end{aligned}$$
(121)

Consequently \({\mathfrak {F}}(\lambda ) = \varvec{F}(H_\lambda )\).

Proof

It follows from the definition (43) (and recalling from Corollary 2.18 that \(\hat{\Phi }(\underline{{\sigma }})^\lambda = \hat{F}(\underline{{\sigma }})^\lambda {\hat{v}}(\underline{{\sigma }})\)) that

$$\begin{aligned} \varvec{F}(H_\lambda ) = \langle \ln (\dot{\Phi }^\lambda /\dot{H}_\lambda ) , \dot{H}_\lambda \rangle + \alpha \langle \ln ( \hat{\Phi }^\lambda /\hat{H}_\lambda ) ,\hat{H}_\lambda \rangle - d \langle \ln ( \bar{\Phi }^\lambda /\bar{H}_\lambda ) , \bar{H}_\lambda \rangle \,. \end{aligned}$$

Substituting in Definition 5.6 and rearranging gives

$$\begin{aligned}&\varvec{F}(H_\lambda ) -\Big (\ln \dot{\mathfrak {z}}_\lambda + \alpha \ln \hat{\mathfrak {z}}_\lambda - d \ln \bar{\mathfrak {z}}_\lambda \Big ) \\&\qquad =- \Big \langle \sum _{i=1}^d\ln \hat{q}_\lambda (\hat{\sigma }_i), \dot{H}_\lambda \Big \rangle -\alpha \Big \langle \sum _{i=1}^k\ln {\dot{q}}_\lambda (\dot{\sigma }_i),\hat{H}_\lambda \Big \rangle + d \langle \ln [{\dot{q}}_\lambda (\dot{\sigma })\hat{q}_\lambda (\hat{\sigma })], \bar{H}_\lambda \rangle . \end{aligned}$$

This equals zero since \(H_\lambda \in \varvec{\Delta }\). The proof of (121) is straightforward from the preceding definitions, and is omitted. \(\square \)

Proof of Proposition B.1

By similar calculations as above, it is straightforward to verify that \(s_\lambda =\varvec{s}(H_\lambda )\). Since by definition \({\mathfrak {F}}(\lambda ) = \lambda s_\lambda + \Sigma (s_\lambda )\) and \(\varvec{F}(H_\lambda ) = \lambda \varvec{s}(H_\lambda ) + \varvec{\Sigma }(H_\lambda )\), it follows that \(\Sigma (s_\lambda )= \varvec{\Sigma }(H_\lambda )\), concluding the proof.\(\square \)

Proof of Proposition 3.13

Immediate consequence of Proposition B.1 together with Proposition 5.5b.\(\square \)

B.2. Large-k asymptotics

We now evaluate the large-k asymptotics of the free energy, beginning with (12). Let \(\dot{\mu }_\lambda \) be the probability measure on [0, 1] given by Proposition 1.2, and write \(\hat{\mu }_\lambda \equiv \hat{{\mathscr {R}}}_\lambda (\dot{\mu }_\lambda )\). In what follows it will be useful to denote \(\dot{\mu }_\lambda ({\texttt {f}}) \equiv \dot{\mu }_\lambda ((0,1))\), as well as

$$\begin{aligned}\psi _\lambda \equiv \int x^\lambda \mathbf {1}\{x\in (0,1)\} \dot{\mu }_\lambda (dx), \quad \rho _\lambda \equiv \int y^\lambda \mathbf {1}\{ y\in (0,1) {\setminus }\{\tfrac{1}{2}\} \} \hat{\mu }_\lambda (dy)\,. \end{aligned}$$

Proposition B.5

For \(k\geqslant k_0\), \(\alpha _\text {lbd}\leqslant \alpha = (2^{k-1}-c)\ln 2 \leqslant \alpha _\text {ubd}\), and \(\lambda \in [0,1]\),

$$\begin{aligned} \ln \dot{\mathfrak {Z}}_{\lambda }&=\ln 2 -(1-2^{\lambda -1})/2^k + d\ln \Big ( 2^{-\lambda }\hat{\mu }_\lambda (\tfrac{1}{2}) +\hat{\mu }_\lambda (1) +\rho _\lambda \Big ) + \mathrm{{\textsf {{err}}}}, \end{aligned}$$
(122)
$$\begin{aligned} -d\ln \bar{\mathfrak {Z}}_\lambda&=- d \ln \Big ( 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) +\hat{\mu }_\lambda (1) + \rho _\lambda \Big ) - (k\ln 2) [- \dot{\mu }_\lambda ({\texttt {f}}) + 2 \psi _\lambda ] + \mathrm{{\textsf {{err}}}}, \end{aligned}$$
(123)
$$\begin{aligned} \alpha \ln \hat{\mathfrak {Z}}_\lambda&=\alpha \ln (1-2/2^k) + (k\ln 2) ( - \dot{\mu }_\lambda ({\texttt {f}}) + 2 \psi _\lambda ) + \mathrm{{\textsf {{err}}}}, \end{aligned}$$
(124)

where \(\mathrm{{\textsf {{err}}}}\) denotes any error bounded by \(k^{O(1)}/4^k\). Altogether this yields

$$\begin{aligned} {\mathfrak {F}}(\lambda ) ={\textsf {f}}^\textsc {rs}(\alpha ) - (1-2^{\lambda -1})/2^k + \mathrm{{\textsf {{err}}}}= [(2c-1)\ln 2 -(1-2^{\lambda -1})]/2^k +\mathrm{{\textsf {{err}}}}\,. \end{aligned}$$

On the other hand \(\lambda s_\lambda = \lambda (\ln 2) 2^{\lambda -1} / 2^k + \mathrm{{\textsf {{err}}}}\).

Proof of Proposition 1.4b

Apply Proposition B.5: setting \({\mathfrak {F}}(\lambda ) = \lambda s_\lambda \) gives

$$\begin{aligned} \alpha _\lambda =(2^{k-1}-c_\lambda )\ln 2+\mathrm{{\textsf {{err}}}},\quad c_\lambda = \frac{1}{2} + \frac{ 1 - 2^{\lambda -1}(1-\lambda \ln 2) }{2\ln 2}\,. \end{aligned}$$

Substituting the special values \(\lambda =1\) and \(\lambda =0\) gives

$$\begin{aligned} c_\text {cond} =c_1=1,\quad c_\text {sat} =c_0 =\frac{1}{2} + \frac{1}{4\ln 2}, \end{aligned}$$

verifying (1) and (16).\(\square \)
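The two special values can be verified directly from the displayed formula for \(c_\lambda \); a quick numeric sketch (the function name is our own):

```python
import math

def c_lambda(lam):
    # c_lambda = 1/2 + (1 - 2^(lam-1) * (1 - lam*ln2)) / (2*ln2)
    ln2 = math.log(2)
    return 0.5 + (1 - 2 ** (lam - 1) * (1 - lam * ln2)) / (2 * ln2)

# lambda = 1 (condensation) gives c_cond = 1
assert abs(c_lambda(1.0) - 1.0) < 1e-12
# lambda = 0 (satisfiability) gives c_sat = 1/2 + 1/(4 ln 2)
assert abs(c_lambda(0.0) - (0.5 + 1 / (4 * math.log(2)))) < 1e-12
```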

Proof of Proposition B.5

Throughout the proof we abbreviate \(\epsilon _k\) for a small error term which may change from one occurrence to the next, but is bounded throughout by \(k^C/2^k\) for a sufficiently large absolute constant C. Note that

$$\begin{aligned} \hat{\mu }_\lambda (\tfrac{1}{2}) = 1 -2 \cdot \frac{2^{1-\lambda }}{2^k} + \epsilon _k, \quad \hat{\mu }_\lambda (1) = \hat{\mu }_\lambda (0) = \frac{2^{1-\lambda }}{2^k} + \epsilon _k, \quad \hat{\mu }_\lambda ((0,1){\setminus }\{\tfrac{1}{2}\}) = \epsilon _k, \end{aligned}$$

from which it follows that \(\rho _\lambda =\epsilon _k\). Meanwhile, \(\psi _\lambda \leqslant \dot{\mu }_\lambda ({\texttt {f}})\), and we will show below that

$$\begin{aligned} \dot{\mu }_\lambda ({\texttt {f}})= \frac{2^{\lambda -1}}{2^k} + \epsilon _k\,. \end{aligned}$$
(125)


Estimate of \(\dot{\mathfrak {Z}}_{\lambda }\). Recall from the definition (11) that

$$\begin{aligned}\dot{\mathfrak {Z}}_{\lambda } = \int \bigg (\prod _{i=1}^d y_i +\prod _{i=1}^d (1-y_i)\bigg )^\lambda \prod _{i=1}^d \hat{\mu }_\lambda (dy_i)\,.\end{aligned}$$

Let \(\dot{\mathfrak {Z}}_{\lambda }({\texttt {f}})\) denote the contribution to \(\dot{\mathfrak {Z}}_{\lambda }\) from free variables, meaning \(y_i\in (0,1)\) for all i. This can be decomposed further into the contribution \(\dot{\mathfrak {Z}}_{\lambda }({\texttt {f}}_1)\) from isolated free variables (meaning \(y_i=1/2\) for all i) and the remainder \(\dot{\mathfrak {Z}}_{\lambda }({\texttt {f}}_{\geqslant 2})\). We then calculate

$$\begin{aligned} \dot{\mathfrak {Z}}_\lambda ({\texttt {f}}_1) = 2^\lambda \Big ( 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2})\Big )^d\,. \end{aligned}$$

This dominates the contribution from non-isolated free variables:

$$\begin{aligned} \dot{\mathfrak {Z}}_\lambda ({\texttt {f}}_{\geqslant 2})&= \sum _{j=1}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) \bigg (\int y^\lambda \mathbf {1}\{y\in (0,1){\setminus }\{\tfrac{1}{2}\}\} \hat{\mu }_\lambda (dy)\bigg )^j \Big ( 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) \Big )^{d-j} \\&\leqslant O(1) d \hat{\mu }_\lambda ( (0,1){\setminus }\{\tfrac{1}{2}\}) \Big ( 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) \Big )^d \leqslant \dot{\mathfrak {Z}}_\lambda ({\texttt {f}}_1) k^{O(1)}/2^k. \end{aligned}$$

Next let \(\dot{\mathfrak {Z}}_{\lambda }({\texttt {1}})\) denote the contribution from variables frozen to \({\texttt {1}}\):

$$\begin{aligned} \dot{\mathfrak {Z}}_{\lambda }({\texttt {1}})&=\Big ( \int y^\lambda \hat{\mu }_\lambda (dy) \Big )^d -\Big ( \int y^\lambda \mathbf {1}\{ y\in (0,1)\} \hat{\mu }_\lambda (dy) \Big )^d\\&= \Big ( 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) +\hat{\mu }_\lambda (1)+ \rho _\lambda \Big )^d - 2^{-\lambda } \dot{\mathfrak {Z}}_{\lambda }({\texttt {f}}_1) + \epsilon _k. \end{aligned}$$

The ratio of free to frozen variables is given by

$$\begin{aligned}\frac{\dot{\mathfrak {Z}}_{\lambda }({\texttt {f}})}{2\dot{\mathfrak {Z}}_{\lambda }({\texttt {1}})} = \frac{2^\lambda }{2} \bigg (\frac{\hat{\mu }_\lambda (\tfrac{1}{2})}{\hat{\mu }_\lambda (\tfrac{1}{2}) + 2^\lambda \hat{\mu }_\lambda (1)} \bigg )^d + \epsilon _k = \frac{2^{\lambda -1}}{2^k} + \epsilon _k. \end{aligned}$$

Combining these yields (122). The proof of (125) is very similar.

Estimate of \(\bar{\mathfrak {Z}}_\lambda \). Recall from the definition (11) that

$$\begin{aligned}\bar{\mathfrak {Z}}_\lambda = \int \Big ( xy+(1-x)(1-y)\Big )^\lambda \dot{\mu }_\lambda (dx) \hat{\mu }_\lambda (dy)\,. \end{aligned}$$

The contribution to \(\bar{\mathfrak {Z}}_\lambda \) from \(x=0\) or \(x=1\) is given by

$$\begin{aligned} \bar{\mathfrak {Z}}_\lambda (x=1) = \dot{\mu }_\lambda (1) \Big ( 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) +\hat{\mu }_\lambda (1)+ \rho _\lambda \Big )=\bar{\mathfrak {Z}}_\lambda (x=0)\,. \end{aligned}$$

The contribution from \(x\in (0,1)\) and \(y=1/2\) is given by

$$\begin{aligned} \bar{\mathfrak {Z}}_\lambda (x\in (0,1),y=1/2) =\dot{\mu }_\lambda ({\texttt {f}}) 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2})\,. \end{aligned}$$

Lastly, the contribution from \(x\in (0,1)\) and \(y=1\) is given by

$$\begin{aligned} \bar{\mathfrak {Z}}_\lambda (x\in (0,1),y=1) =\hat{\mu }_\lambda (1) \psi _\lambda , \end{aligned}$$

and there is an equal contribution from the case \(x\in (0,1)\) and \(y=0\). The contribution from the case that both \(x,y\in (0,1)\) is \(\leqslant k^{O(1)}/8^k\). Combining these estimates gives

$$\begin{aligned} d\ln \bar{\mathfrak {Z}}_\lambda&= d \ln \Big ( 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) + 2 \dot{\mu }_\lambda (1) \hat{\mu }_\lambda (1) + 2 \dot{\mu }_\lambda (1) \rho _\lambda + 2 \hat{\mu }_\lambda (1)\psi _\lambda \Big ) + \epsilon _k \\&=d \ln \Big ( 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) +\hat{\mu }_\lambda (1) + \rho _\lambda \Big ) +d\ln \Big (1 + \frac{\hat{\mu }_\lambda (1)[-\dot{\mu }_\lambda ({\texttt {f}}) + 2 \psi _\lambda ] }{ 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) } \Big ) + \epsilon _k. \end{aligned}$$

Recalling \(\hat{\mu }_\lambda =\hat{{\mathscr {R}}}_\lambda (\dot{\mu }_\lambda )\) gives

$$\begin{aligned}d\ln \Big (1+\frac{ \hat{\mu }_\lambda (1)[ -\dot{\mu }_\lambda ({\texttt {f}}) + 2 \psi _\lambda ] }{ 2^{-\lambda } \hat{\mu }_\lambda (\tfrac{1}{2}) }\Big ) =d\dot{\mu }_{\lambda }(0)^{k-1} (-\dot{\mu }_\lambda ({\texttt {f}}) + 2 \psi _\lambda ) + \epsilon _k,\end{aligned}$$

and (123) follows.

Estimate of \(\hat{\mathfrak {Z}}_\lambda \). Recall from the definition (11) that

$$\begin{aligned}\hat{\mathfrak {Z}}_\lambda = \int \bigg (1 -\prod _{i=1}^k x_i -\prod _{i=1}^k (1-x_i)\bigg )^\lambda \prod _{i=1}^k\dot{\mu }_\lambda (dx_i)\,. \end{aligned}$$

Writing \(\dot{\mu }_\lambda ({\texttt {0}},{\texttt {f}})\equiv \dot{\mu }_\lambda ([0,1))\), the contribution to \(\hat{\mathfrak {Z}}_\lambda \) from separating clauses is

$$\begin{aligned} 1 - 2 \dot{\mu }_\lambda ({\texttt {0}},{\texttt {f}})^k+\dot{\mu }_\lambda ({\texttt {f}})^k = 1 - (2/2^k) (1+k\dot{\mu }_\lambda ({\texttt {f}})) + k^{O(1)}/8^k\,. \end{aligned}$$

The contribution from clauses which are forcing to some variable that is not forced by any other clause is \(2k \dot{\mu }_\lambda (0)^{k-1} \psi _\lambda \). The contribution from all other clause types is \(\leqslant k^{O(1)}/8^k\), and (124) follows.

Estimate of \(s_\lambda \). Recall from (13) the definition of \(s_\lambda \). By similar considerations as above, it is straightforward to check that the total contribution from frozen variables, edges incident to frozen variables, and separating or forcing clauses is zero. The dominant term is the contribution of isolated free variables, and the estimate follows.\(\square \)

B.3. Properties of the complexity function

We conclude with a few basic properties of the complexity function \(\Sigma (s)\), including a proof of Proposition 1.4a.

Lemma B.6

For fixed \(1\leqslant T < \infty \), the fixed point \({\dot{q}}_{\lambda ,T}\) of Proposition 5.5a is continuously differentiable as a function of \(\lambda \in [0,1]\).

Proof

Fix \(T<\infty \) and define \(f_T[{\dot{q}},\lambda ] \equiv {\texttt {BP}}_{\lambda ,T}[{\dot{q}}]-{\dot{q}}\), a mapping from \({\mathscr {P}}(\dot{\Omega }_T)\times [0,1]\) to the set of signed measures on \(\dot{\Omega }_T\). The function \(\dot{z}(\dot{\sigma })\) (respectively, \(\hat{z}(\hat{\sigma })\)) can take only finitely many values on \(\dot{\Omega }_T\) (respectively, \(\hat{\Omega }_T\)), and is therefore uniformly bounded away from 0. It is straightforward to check that for any \(\lambda \in [0,1]\),

$$\begin{aligned} f_T[{\dot{q}}_\star (\lambda ,T),\lambda ](\dot{\sigma }) = 0 ,\quad \forall \dot{\sigma }\in \dot{\Omega }_T, \end{aligned}$$

and that \(f_T\) is uniformly differentiable in a neighborhood of \(\{({\dot{q}}_\star (\lambda ,T),\lambda ):\lambda \in [0,1]\}\).

For any other \({\dot{q}}\) in the contraction region (68), Proposition A.1 guarantees that

$$\begin{aligned} \Vert {f_T[{\dot{q}},\lambda ] - f_T[{\dot{q}}_\star (\lambda ,T),\lambda ]} \Vert&\geqslant \Vert {{\dot{q}}-{\dot{q}}_\star (\lambda ,T)} \Vert - \Vert {{\texttt {BP}}_{\lambda ,T}[{\dot{q}}] - {\texttt {BP}}_{\lambda ,T}[{\dot{q}}_\star (\lambda ,T)]} \Vert \\&\geqslant (1-O(k^22^{-k})) \Vert {{\dot{q}}-{\dot{q}}_\star (\lambda ,T)} \Vert . \end{aligned}$$

Therefore the Jacobian matrix

$$\begin{aligned}\Big (\frac{\partial f_T(\dot{\sigma }_i)}{\partial {\dot{q}}(\dot{\sigma }_j)} \Big )_{\dot{\Omega }_T\times \dot{\Omega }_T} \end{aligned}$$

is invertible at each \(({\dot{q}}_\star (\lambda ,T),\lambda )\). By the implicit function theorem, \({\dot{q}}_\star (\lambda ,T)\), as the solution of \(f_T[{\dot{q}},\lambda ] = 0 \), is continuously differentiable in \(\lambda \).\(\square \)

Let us first fix \(T<\infty \) and consider the clusters encoded by T-colorings. We have explicitly defined \(\varvec{\Sigma }(H)\) and \(\varvec{s}(H)\). Let \({\mathcal {S}}(s) \equiv \sup \{ \varvec{\Sigma }(H) : \varvec{s}(H)=s\}\), with the convention that a supremum over an empty set is \(-\infty \). Thus \({\mathcal {S}}(s)\) is a well-defined function which captures the spirit of the function \(\Sigma (s)\) discussed in the introduction. (Note \({\mathcal {S}}\) implicitly depends on T since the maximum is taken over empirical measures H which are supported on T-colorings.) Recall that the physics approach ([31] and refs. therein) takes \({\mathcal {S}}(s)\) as a conceptual starting point. However, for purposes of explicit calculation the actual starting point is the Legendre dual

$$\begin{aligned}{\mathfrak {F}}(\lambda ) \equiv (-{\mathcal {S}})^\star (\lambda ) = \sup _{s\in \mathbb {R}} \Big \{\lambda s + {\mathcal {S}}(s)\Big \} = \sup _H \varvec{F}_\lambda (H), \end{aligned}$$

where \(\varvec{F}_\lambda (H)\equiv \lambda \varvec{s}(H)+\varvec{\Sigma }(H)\). The replica symmetry breaking heuristic gives an explicit conjecture for \({\mathfrak {F}}\). One then makes the assumption that \({\mathcal {S}}(s)\) is concave in s: this means it is the same as

$$\begin{aligned} {\mathcal {R}}(s) \equiv - {\mathfrak {F}}^\star (s) = -(-{\mathcal {S}})^{\star \star }(s), \end{aligned}$$

so if \({\mathcal {S}}\) is concave then it can be recovered from \({\mathfrak {F}}\).
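This biconjugation step is easy to see numerically. The following minimal sketch (the grids, the test function \({\mathcal {S}}(s)=-s^2\), and the tolerance are our illustrative choices, not from the paper) computes \({\mathfrak {F}}=(-{\mathcal {S}})^\star \) on a grid and checks that \(-{\mathfrak {F}}^\star \) recovers \({\mathcal {S}}\):

```python
# Biconjugation sketch: for concave S, R(s) = -(-S)**(s) recovers S(s).
# Grid resolution and the test function are illustrative choices.
def S(s):
    return -s * s                          # a concave "complexity" function

s_grid = [i / 50 for i in range(-100, 101)]    # s in [-2, 2]
lam_grid = [i / 50 for i in range(-250, 251)]  # lambda in [-5, 5]

def F(lam):
    # F(lam) = sup_s { lam*s + S(s) }, the Legendre dual of -S
    return max(lam * s + S(s) for s in s_grid)

def R(s):
    # R(s) = -F*(s) = inf_lam { F(lam) - lam*s }
    return min(F(lam) - lam * s for lam in lam_grid)

for s in (0.0, 0.5, -1.0):
    assert abs(R(s) - S(s)) < 1e-2         # recovery up to grid error
```

For a non-concave \({\mathcal {S}}\) the same procedure would instead return the concave envelope, which is why concavity is the key assumption here.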

We do not have a proof that \({\mathcal {S}}(s)\) is concave for all s, but we will argue that this holds on the interval of s corresponding to \(\lambda \in [0,1]\). Formally, for \(\lambda \in [0,1]\), we proved that \(\varvec{F}_\lambda (H)\) has a unique maximizer \(H_\star \equiv H_\lambda \). This implies that there is a unique \(s_\lambda \) which maximizes \(\lambda s +{\mathcal {S}}(s)\), given by

$$\begin{aligned} s_\lambda =\varvec{s}(H_\lambda )\,. \end{aligned}$$

Recall that \(H_\lambda \) and \(s_\lambda \) both depend implicitly on T. We also have from Lemma B.6 that for any fixed \(T<\infty \), \(s_\lambda \) is continuous in \(\lambda \), so it maps \(\lambda \in [0,1]\) onto some compact interval \({\mathcal {I}} \equiv [s_-,s_+]\). Define the modified function

$$\begin{aligned} \overline{{\mathcal {S}}}(s) \equiv {\left\{ \begin{array}{ll} {\mathcal {S}}(s) &{} \text {if }s\in {\mathcal {I}},\\ -\infty &{} \text {otherwise.}\end{array}\right. } \end{aligned}$$

Lemma B.7

For all \(s\in \mathbb {R}\), \(\overline{{\mathcal {S}}}(s) =-(-\overline{{\mathcal {S}}})^{\star \star }(s)\). Consequently the function \(\overline{{\mathcal {S}}}\) is concave, and \(s_\lambda \) is nondecreasing in \(\lambda \).

Proof

The function \(-\overline{{\mathcal {S}}}(s)\) has Legendre dual

$$\begin{aligned} \overline{{\mathfrak {F}}}(\lambda ) = \sup _{s\in \mathbb {R}}\Big \{ \lambda s + \overline{{\mathcal {S}}}(s)\Big \} = \sup _{s\in {\mathcal {I}} }\Big \{ \lambda s + {\mathcal {S}}(s)\Big \} \leqslant {\mathfrak {F}}(\lambda )\,. \end{aligned}$$

For \(\lambda \in [0,1]\) it is clear that \(\overline{{\mathfrak {F}}}(\lambda ) ={\mathfrak {F}}(\lambda )\). It is straightforward to check that if \(\lambda <0\) then

$$\begin{aligned}\overline{{\mathfrak {F}}}(\lambda ) \leqslant \max _{s\in {\mathcal {I}}}\lambda s +\max _{s\in {\mathcal {I}}}{\mathcal {S}}(s) = \lambda s_- + {\mathcal {S}}(s_0), \end{aligned}$$

where \(s_0\) denotes a maximizer of \({\mathcal {S}}\) over \({\mathcal {I}}\). Thus, if \(s<s_-\), then

$$\begin{aligned} (-\overline{{\mathcal {S}}})^{\star \star }(s)=(\overline{{\mathfrak {F}}})^\star (s) \geqslant \sup _{\lambda<0} \Big \{\lambda s -\overline{{\mathfrak {F}}}(\lambda ) \Big \} \geqslant \sup _{\lambda <0} \Big \{\lambda (s-s_-) - {\mathcal {S}}(s_0) \Big \}=+\infty \,. \end{aligned}$$

A symmetric argument shows that \((-\overline{{\mathcal {S}}})^{\star \star }(s)=+\infty \) also for \(s>s_+\). If \(s\in {\mathcal {I}}\), we must have \(s=s_{\lambda _\circ }\) for some \(\lambda _\circ \in [0,1]\), and so

$$\begin{aligned} (-\overline{{\mathcal {S}}})^{\star \star }(s) \geqslant \lambda _\circ s - {\mathfrak {F}}(\lambda _\circ ) = -{\mathcal {S}}(s)\,. \end{aligned}$$

This proves \((-\overline{{\mathcal {S}}})^{\star \star }(s) \geqslant -\overline{{\mathcal {S}}}(s)\) for all \(s\in \mathbb {R}\). On the other hand, it holds for any function f that \(f^{\star \star } \leqslant f\), so we conclude \((-\overline{{\mathcal {S}}})^{\star \star }(s) =-\overline{{\mathcal {S}}}(s)\) for all \(s\in \mathbb {R}\). This implies that \(\overline{{\mathcal {S}}}\) is concave; since \(s_\lambda \) maximizes \(s\mapsto \lambda s+\overline{{\mathcal {S}}}(s)\), concavity forces \(s_\lambda \) to be nondecreasing in \(\lambda \), concluding the proof.\(\square \)

Proof of Proposition 1.4a

We obtain \(\Sigma (s)\) as the \(T\rightarrow \infty \) limit of \(\overline{{\mathcal {S}}}(s)\). It then follows from Lemma B.7 together with Proposition 5.5b that \(\Sigma (s)\) is strictly decreasing in s.\(\square \)

Appendix C: Constrained entropy maximization

In this section we review basic calculations for entropy maximization problems under affine constraints.

C.1. Constraints and continuity

We will optimize a functional over nonnegative measures \(\nu \) on a finite space X (with \(|X|=s\)), subject to some affine constraints \(M\nu =b\). We begin by discussing basic continuity properties. Denote

$$\begin{aligned}\mathbb {H}(b)\equiv \{\nu \geqslant 0\} \cap \{M\nu =b\} \subseteq \mathbb {R}^s\,. \end{aligned}$$

Let \(\Delta \equiv \{\nu \geqslant 0\} \cap \{\langle {\mathbf {1}},\nu \rangle =1\}\), and let \(\varvec{B}\) denote the space of \(b\in \mathbb {R}^r\) for which

$$\begin{aligned}\varnothing \ne \mathbb {H}(b) \subseteq \Delta \,.\end{aligned}$$

Then \(\varvec{B}\) is contained in the image of \(\Delta \) under M, so \(\varvec{B}\) is a compact subset of \(\mathbb {R}^r\).

Proposition C.1

If \(\varvec{F}\) is any continuous function on \(\Delta \) and

$$\begin{aligned} F(b)= \max \{ \varvec{F}(\nu ) : \nu \in \mathbb {H}(b) \},\end{aligned}$$
(126)

then F is (uniformly) continuous over \(b\in \varvec{B}\).

Proposition C.1 is a straightforward consequence of the following two lemmas.

Lemma C.2

For \(b\in \varvec{B}\) and any vector u in the unit sphere \({\mathbb {S}}^{r-1}\), let

$$\begin{aligned}d(b,u)\equiv \inf \{t\geqslant 0: b+tu\notin \varvec{B}\}\,.\end{aligned}$$

There exists \(\delta =\delta (b)>0\) such that

$$\begin{aligned}d(b,u) \in \{0\} \cup [\delta ,\infty ) \quad \text {for all }u\in {\mathbb {S}}^{r-1}\,.\end{aligned}$$

Proof

\(\varvec{B}\) is a polytope, so it can be expressed as the intersection of finitely many closed half-spaces \(H_1,\ldots ,H_m\), where \(H_i = \{ x\in \mathbb {R}^r : \langle a_i,x\rangle \leqslant c_i \}\). Consequently there is at least one index \(1\leqslant i\leqslant m\) such that

$$\begin{aligned}d(b,u) = \inf \{t\geqslant 0:b+tu\notin H_i\}\,.\end{aligned}$$

It follows that \(\langle a_i,u\rangle >0\) and

$$\begin{aligned}d(b,u)=\frac{c_i-\langle a_i,b\rangle }{\langle a_i,u\rangle } \geqslant \frac{c_i-\langle a_i,b\rangle }{|a_i|} = d( b,\partial H_i )\end{aligned}$$

where \(d( b,\partial H_i )\) is the distance between b and the boundary of \(H_i\). In particular, \(d(b,u)>0\) if and only if \(\langle a_i,b\rangle <c_i\), which in turn holds if and only if \(d( b,\partial H_i )>0\). It follows that for all \(u\in {\mathbb {S}}^{r-1}\) we have \(d(b,u)\in \{0\} \cup [\delta ,\infty )\) with

$$\begin{aligned}\delta =\delta (b)= \min \{ d( b,\partial H_i ): d( b,\partial H_i )>0 \};\end{aligned}$$

\(\delta \) is a minimum over finitely many positive numbers so it is also positive.\(\square \)
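The half-space computation in the proof is elementary to check numerically; in the sketch below (the half-space and base point are arbitrary choices of ours), the exit time lands exactly on the bounding hyperplane and dominates the distance to it:

```python
import math
import random

random.seed(0)
# One closed half-space H = {x : <a, x> <= c} and a point b in its interior.
a, c = [3.0, 4.0], 2.0
b = [0.1, 0.2]                       # <a, b> = 1.1 < c
norm_a = math.sqrt(sum(ai * ai for ai in a))
slack = c - sum(ai * bi for ai, bi in zip(a, b))

for _ in range(100):
    # random unit direction; only directions with <a, u> > 0 ever exit H
    v = [random.gauss(0, 1), random.gauss(0, 1)]
    n = math.sqrt(sum(x * x for x in v))
    u = [x / n for x in v]
    au = sum(ai * ui for ai, ui in zip(a, u))
    if au <= 0:
        continue
    d = slack / au                   # exit time d(b, u)
    # b + d*u lies exactly on the boundary hyperplane <a, x> = c ...
    assert abs(sum(ai * (bi + d * ui) for ai, bi, ui in zip(a, b, u)) - c) < 1e-9
    # ... and d is at least the distance from b to that hyperplane
    assert d >= slack / norm_a - 1e-12
```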

Lemma C.3

The set-valued function \(\mathbb {H}\) is continuous on \(\varvec{B}\) with respect to the Hausdorff metric \(d_{\mathcal {H}}\), that is to say, if \(b_n\in \varvec{B}\) with \(\lim _{n\rightarrow \infty } b_n=b\) then

$$\begin{aligned}\lim _{n\rightarrow \infty }d_{\mathcal {H}}(\mathbb {H}(b_n),\mathbb {H}(b)) = 0\,.\end{aligned}$$

Proof

Recall that the Hausdorff distance between two subsets X and Y of a metric space is

$$\begin{aligned}d_{\mathcal {H}}(X,Y) = \inf \{\epsilon \geqslant 0 : X \subseteq Y^\epsilon \text { and } Y \subseteq X^\epsilon \},\end{aligned}$$

where \(X^\epsilon ,Y^\epsilon \) are the \(\epsilon \)-thickenings of X and Y. Any sequence \(\nu _n\in \mathbb {H}(b_n)\) converges along subsequences to limits \(\nu \in \mathbb {H}(b)\), so for all \(\epsilon >0\) there exists \(n_0(\epsilon )\) large enough that

$$\begin{aligned}\mathbb {H}(b_n) \subseteq (\mathbb {H}(b))^\epsilon , \quad n\geqslant n_0(\epsilon )\,.\end{aligned}$$

In the other direction, we now argue that if \(\nu \in \mathbb {H}(b)\) and \(b'=b+tu\) for \(u\in {\mathbb {S}}^{r-1}\) and t a small positive number, then we can find \(\nu '\in \mathbb {H}(b')\) which is close to \(\nu \). For \(u\in {\mathbb {S}}^{r-1}\) let d(bu) be as in Lemma C.2, and take \(\nu (b,u)\) to be any fixed element of \(\mathbb {H}(b+d(b,u)u)\) (which by definition is nonempty). Since we consider \(b'=b+tu\) for \(t>0\), we can assume that d(bu) is positive, hence \(\geqslant \delta (b)\) by Lemma C.2. We can express \(b'=b+tu\) as the convex combination

$$\begin{aligned}b' = (1-\epsilon )b + \epsilon [ b+d(b,u)u ],\quad \epsilon = \frac{t}{d(b,u)} = \frac{|b'-b|}{d(b,u)} \leqslant \frac{|b'-b|}{\delta }\,.\end{aligned}$$

Then \(\nu ' = (1-\epsilon )\nu + \epsilon \nu (b,u)\in \mathbb {H}(b')\), so

$$\begin{aligned}|\nu '-\nu | = \epsilon | \nu (b,u)-\nu | \leqslant \frac{({{\,\mathrm{diam}\,}}\Delta )|b-b'|}{\delta }\,.\end{aligned}$$

This implies \(\mathbb {H}(b) \subseteq (\mathbb {H}(b_n))^\epsilon \) for all sufficiently large n, and the result follows.\(\square \)

Proof of Proposition C.1

Take \(\nu \in \mathbb {H}(b)\) so that \(F(b)=\varvec{F}(\nu )\). If \(b'=b+tu\in \varvec{B}\) for some \(u\in {\mathbb {S}}^{r-1}\), then Lemma C.3 implies that we can find \(\nu '\in \mathbb {H}(b')\) with \(|\nu '-\nu | = o_t(1)\), where \(o_t(1)\) indicates a function tending to zero in the limit \(t\downarrow 0\), uniformly over \(u\in {\mathbb {S}}^{r-1}\). It follows that \(\varvec{F}(\nu ) = \varvec{F}(\nu ')+o_t(1)\), since \(\varvec{F}\) is uniformly continuous on \(\Delta \) by the Heine–Cantor theorem. Therefore

$$\begin{aligned}F(b) = \varvec{F}(\nu ) = \varvec{F}(\nu ') + o_t(1) \leqslant F(b') + o_t(1)\,.\end{aligned}$$

By the same argument \(F(b') \leqslant F(b) + o_t(1)\), concluding the proof.\(\square \)

When solving (126) for a fixed value of \(b\in \varvec{B}\), it will be convenient to make the following reduction:

Remark C.4

Suppose M is an \(r\times s\) matrix where \(s=|X|\). We can assume without loss that M has full rank r, since otherwise we can eliminate redundant constraints. We consider only \(b\in \varvec{B}\), meaning \(\varnothing \ne \mathbb {H}(b)\subseteq \Delta \). The affine space \(\{M\nu =b\}\) has dimension \(s-r\); we assume this is positive since otherwise \(\mathbb {H}(b)\) would be a single point. Then, if \(\mathbb {H}(b)\) does not contain an interior point of \(\{\nu \geqslant 0\}\), it must be that

$$\begin{aligned}X_\circ \equiv \{x\in X : \exists \nu \in \{\nu \geqslant 0\}\cap \{M\nu =b\} \text { so that }\nu (x)>0\}\end{aligned}$$

is a nonempty proper subset of X. In this case, it is equivalent to solve the optimization problem over measures \(\nu _\circ \) on the reduced alphabet \(X_\circ \), subject to constraints \(M' \nu _\circ =b\) where \(M'\) is the submatrix of M formed by the columns indexed by \(X_\circ \). Then, by construction, the space

$$\begin{aligned}\mathbb {H}_\circ (b) =\{\nu _\circ \geqslant 0\} \cap \{M' \nu _\circ =b\}\end{aligned}$$

contains an interior point of \(\{\nu _\circ \geqslant 0\}\). The matrix \(M'\) is \(r\times s_\circ \) where \(s_\circ =|X_\circ |\); and if \(M'\) is not of rank r then we can again remove redundant constraints, replacing \(M'\) with an \(r_\circ \times s_\circ \) submatrix \(M_\circ \) which has full rank \(r_\circ \). We emphasize that the final matrix \(M_\circ \) depends on b. In conclusion, when solving (126) for a fixed \(b\in \varvec{B}\), we may assume with no essential loss of generality that the original matrix M is \(r\times s\) with full rank r, and that \(\mathbb {H}(b)=\{\nu \geqslant 0\}\cap \{M\nu =b\}\) contains an interior point of \(\{\nu \geqslant 0\}\). It follows that this space has dimension \(s-r>0\), and its boundary is contained in the boundary of \(\{\nu \geqslant 0\}\).

C.2. Entropy maximization

We now restrict (126) to the case of functionals \(\varvec{F}\) which are concave on the domain \(\{\nu \geqslant 0\}\). It is straightforward to verify from definitions that the optimal value F(b) is (weakly) concave in b. Recall that the convex conjugate of a function f on domain C is the function \(f^\star \) defined by

$$\begin{aligned} f^\star (x^\star ) = \sup \{\langle x^\star ,x\rangle -f(x) : x\in C\}\,.\end{aligned}$$

Denote \(G(\gamma ) \equiv (-\varvec{F})^\star (M^t\gamma )\), and consider the Lagrangian functional

$$\begin{aligned}{\mathcal {L}}(\gamma ;b) = \sup \{ \varvec{F}(\nu ) + \langle \gamma , M\nu -b \rangle : \nu \geqslant 0 \} = -\langle \gamma ,b\rangle + G(\gamma )\,.\end{aligned}$$

It holds for any \(\gamma \in \mathbb {R}^r\) that \({\mathcal {L}}(\gamma ;b) \geqslant F(b)\), so

$$\begin{aligned} F(b) \leqslant \inf \{{\mathcal {L}}(\gamma ;b) : \gamma \in \mathbb {R}^r\} = -G^\star (b)\,.\end{aligned}$$
(127)

Now assume \(\psi \) is a positive function on X, and consider (126) for the special case

$$\begin{aligned} \varvec{F}(\nu ) = \mathcal {H}(\nu ) + \langle \nu ,\ln \psi \rangle = \sum _{x\in X} \nu (x)\ln \frac{\psi (x)}{\nu (x)}\,.\end{aligned}$$
(128)

We remark that the supremum in \((-\mathcal {H})^\star (\nu ^\star ) =\sup \{ \langle \nu ^\star ,\nu \rangle +\mathcal {H}(\nu ): \nu \geqslant 0\}\) is uniquely attained by the measure \(\nu ^\text {op}(x)=\exp \{ -1 + \nu ^\star (x)\}\), yielding

$$\begin{aligned}(-\mathcal {H})^\star (\nu ^\star ) =\langle \nu ^\text {op}(\nu ^\star ) ,1\rangle = \sum _x \exp \{ -1+\nu ^\star (x)\}\,.\end{aligned}$$

This gives the explicit expression

$$\begin{aligned} G(\gamma ) =(-\varvec{F})^\star (M^t\gamma ) =(-\mathcal {H})^\star (\ln \psi +M^t\gamma ) =\sum _x \psi (x) \exp \{ -1 + (M^t\gamma )(x)\}\,.\end{aligned}$$
(129)

Lemma C.5

Assume \(\psi \) is a strictly positive function on a set X of size s and that M is \(r\times s\) with rank r. Then the function \(G(\gamma )\) of (129) is strictly convex in \(\gamma \).

Proof

Let \(\nu \equiv \nu (\gamma )\) denote the measure on X defined by

$$\begin{aligned}\nu (x) = \psi (x) \exp \{ -1+(M^t\gamma )(x) \},\end{aligned}$$

and write \(\langle f(x) \rangle _\nu \equiv \langle f,\nu \rangle \). The Hessian matrix \(H \equiv {{\,\mathrm{Hess}\,}}G(\gamma )\) has entries

$$\begin{aligned}H_{i,j} = \frac{\partial ^2{\mathcal {L}}(\gamma ;b)}{ \partial \gamma _i\partial \gamma _j } =\sum _{x\in X} \nu (x) M_{i,x} M_{j,x} =\langle M_{i,x} M_{j,x}\rangle _\nu \,.\end{aligned}$$

Let \(M_x\) denote the vector-valued function \((M_{i,x})_{i\leqslant r}\), so

$$\begin{aligned}\alpha ^t H \alpha = \langle (\alpha ^t M_x)^2 \rangle _\nu \,.\end{aligned}$$

This is zero if and only if \(\alpha ^t M_x=0\) for \(\nu \)-almost every \(x\in X\). Since \(\nu \) is a strictly positive measure, this forces \(\alpha ^t M_x=0\) for all \(x\in X\), which for \(\alpha \ne 0\) contradicts the assumption that M has rank r. This proves that H is positive-definite, so G is strictly convex in \(\gamma \).\(\square \)
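The positive-definiteness argument in Lemma C.5 can be checked numerically. The following sketch (with made-up \(M\), \(\psi \), and \(\gamma \), chosen only for illustration) forms the Hessian \(H = M\,{{\,\mathrm{diag}\,}}(\nu )\,M^t\) of \(G\) and verifies that its eigenvalues are strictly positive.

```python
import numpy as np

# Numerical illustration of Lemma C.5 (a sketch with made-up M and psi,
# not data from the paper): for a full-rank r x s matrix M and strictly
# positive psi, the Hessian of
#   G(gamma) = sum_x psi(x) exp(-1 + (M^t gamma)(x))
# is H = M diag(nu) M^t with nu(x) = psi(x) exp(-1 + (M^t gamma)(x)),
# and H is positive-definite, so G is strictly convex.
rng = np.random.default_rng(0)
r, s = 3, 7
M = rng.standard_normal((r, s))       # full rank r with probability one
psi = rng.uniform(0.5, 2.0, size=s)   # strictly positive function on X
gamma = rng.standard_normal(r)

nu = psi * np.exp(-1 + M.T @ gamma)   # the positive measure nu(gamma)
H = M @ np.diag(nu) @ M.T             # Hessian of G at gamma
assert np.all(np.linalg.eigvalsh(H) > 0)   # positive-definite
```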

Proposition C.6

Let \(b\in \varvec{B}\) such that \(\mathbb {H}(b)=\{\nu \geqslant 0\}\cap \{M\nu =b\}\) contains an interior point of \(\{\nu \geqslant 0\}\), and consider the optimization problem (126) for \(\varvec{F}\) as in (128). For this problem, the inequality (127) becomes an equality,

$$\begin{aligned}F(b) =\inf \{ {\mathcal {L}}(\gamma ;b): \gamma \in \mathbb {R}^r\}=-G^\star (b)\,.\end{aligned}$$

Further, \({\mathcal {L}}(\gamma ;b)\) is strictly convex in \(\gamma \), and its infimum is achieved by a unique \(\gamma =\gamma (b)\). The optimum value of (126) is uniquely attained by the measure \(\nu =\nu ^\text {op}(b)\) defined by

$$\begin{aligned} \nu (x)= \psi (x) \exp \{-1 + (M^t\gamma )(x)\}\,.\end{aligned}$$
(130)

For any \(\mu \in \mathbb {H}(b)\), \(\varvec{F}(\nu )-\varvec{F}(\mu ) =\mathcal {D}_{\textsc {kl}}(\mu |\nu ) \gtrsim \Vert \nu -\mu \Vert ^2\). Finally, in a neighborhood of b in \(\varvec{B}\), \(\gamma '(b)\) is defined and F(b) is strictly concave in b.

Proof

Under the assumptions, the boundary of the set \(\mathbb {H}(b)\) is contained in the boundary of \(\{\nu \geqslant 0\}\). The entropy \(\mathcal {H}\) has unbounded gradient at this boundary, so for \(\varvec{F}\) as in (128), the optimization problem (126) must be solved by a strictly positive measure \(\nu >0\). Since \(\nu >0\), we can differentiate in the direction of any vector \(\delta \) with \(M\delta =0\) to find

$$\begin{aligned}0=\frac{d}{dt} \bigg [ \mathcal {H}(\nu +t\delta ) +\langle \ln \psi ,\nu +t\delta \rangle \bigg ]\bigg |_{t=0} = \langle \delta ,-1-\ln \nu +\ln \psi \rangle \,. \end{aligned}$$

Recalling Remark C.4, we assume without loss that M is \(r\times s\) with rank r, since otherwise we can eliminate redundant constraints. Then, since \(M\delta =0\), for any \(\gamma \in \mathbb {R}^r\) we have

$$\begin{aligned} 0 = \langle \delta ,\epsilon \rangle \quad \text {where }\epsilon = -1-\ln \nu +\ln \psi + M^t\gamma \,. \end{aligned}$$

We can then solve for \(\gamma \) so that \(M\epsilon =0\):Footnote 2

$$\begin{aligned} \gamma = (M M^t)^{-1} M(\ln \nu -\ln \psi +1)\,. \end{aligned}$$

Setting \(\delta =\epsilon \) in the above gives \(0=\Vert \epsilon \Vert ^2\), therefore we must have \(\epsilon =0\). This proves the existence of \(\gamma =\gamma (b) \in \mathbb {R}^r\) such that (126) is optimized by \(\nu =\nu ^\text {op}(b)\), as given by (130). The optimal value of (126) is then

$$\begin{aligned} F(b)&= \langle 1,\nu ^\text {op}(b)\rangle - \langle M^t\gamma (b), \nu ^\text {op}(b)\rangle \\&= \sum _x \psi (x) \exp \{ -1+ (M^t\gamma )(x) \} -\langle \gamma ,b\rangle \bigg |_{\gamma =\gamma (b)} = {\mathcal {L}}(\gamma (b),b). \end{aligned}$$

In view of (127), this proves that in fact

$$\begin{aligned}-G^\star (b)=\inf \{ {\mathcal {L}}(\gamma ,b) :\gamma \in \mathbb {R}^r\} = \min \{ {\mathcal {L}}(\gamma ,b) : \gamma \in \mathbb {R}^r\} = {\mathcal {L}}(\gamma (b),b) = F(b)\end{aligned}$$

as claimed. Now recall from Lemma C.5 that \(G(\gamma )\) is strictly convex, which implies that \({\mathcal {L}}(\gamma ;b)\) is strictly convex in \(\gamma \). Thus \(\gamma =\gamma (b)\) is the unique stationary point of \({\mathcal {L}}(\gamma ;b)\).

These conclusions are valid under the assumption that \(\mathbb {H}(b)\) contains an interior point of \(\{\nu \geqslant 0\}\), which is valid in a neighborhood of b in \(\varvec{B}\). Throughout this neighborhood, \(\gamma (b)\) is defined by the stationarity condition \(b = G'(\gamma )\). Differentiating again with respect to \(\gamma \) gives

$$\begin{aligned} b'(\gamma ) = {{\,\mathrm{Hess}\,}}G(\gamma ),\quad \gamma '(b) = [{{\,\mathrm{Hess}\,}}G(\gamma (b))]^{-1}\,.\end{aligned}$$
(131)

We also find (in this neighborhood) that

$$\begin{aligned}F'(b) = -\gamma (b),\quad F''(b) = - \gamma '(b) = -[{{\,\mathrm{Hess}\,}}G(\gamma (b))]^{-1},\end{aligned}$$

so F is strictly concave. It remains to prove that \(\varvec{F}(\nu )-\varvec{F}(\mu ) =\mathcal {D}_{\textsc {kl}}(\mu |\nu )\). (The estimate \(\mathcal {D}_{\textsc {kl}}(\mu |\nu )\gtrsim \Vert \mu -\nu \Vert ^2\) is well known and straightforward to verify.) For any measure \(\mu \),

$$\begin{aligned}-\mathcal {D}_{\textsc {kl}}(\mu |\nu ) = \mathcal {H}(\mu ) + \langle \mu ,\ln (\psi \exp \{-1+M^t\gamma \}) \rangle \,.\end{aligned}$$

Applying this with \(\mu =\nu \) gives

$$\begin{aligned}0=-\mathcal {D}_{\textsc {kl}}(\nu |\nu ) = \mathcal {H}(\nu ) + \langle \nu ,\ln (\psi \exp \{-1+M^t\gamma \}) \rangle \,.\end{aligned}$$

Subtracting these two equations gives

$$\begin{aligned}-\mathcal {D}_{\textsc {kl}}(\mu |\nu ) = \mathcal {H}(\mu )-\mathcal {H}(\nu ) +\langle \mu -\nu ,\ln \psi \rangle + \langle \mu -\nu ,\ln (\exp \{-1+M^t\gamma \}) \rangle \,.\end{aligned}$$

If \(M\mu =M\nu =b\) then the last term vanishes, giving \(-\mathcal {D}_{\textsc {kl}}(\mu |\nu ) = \varvec{F}(\mu )-\varvec{F}(\nu )\).\(\square \)
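The dual characterization of Proposition C.6 is also easy to exercise numerically. In the sketch below, the instance (the matrix \(M\), weight \(\psi \), and target \(b\)) is made up for illustration; the point is that minimizing the strictly convex Lagrangian dual recovers the maximizer \(\nu ^\text {op}(b)\) of (130) with zero duality gap.

```python
import numpy as np

# A sketch of Proposition C.6 on a toy instance (M, psi, b are made up,
# not taken from the paper). The constrained entropy maximization
#   F(b) = max{ sum_x nu(x) ln(psi(x)/nu(x)) : nu >= 0, M nu = b }
# is solved by minimizing the strictly convex dual
#   L(gamma; b) = -<gamma, b> + sum_x psi(x) exp(-1 + (M^t gamma)(x)).
# Since grad L = M nu(gamma) - b and Hess L = M diag(nu) M^t (cf. (131)),
# Newton's method on gamma converges rapidly.
M = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],      # row of ones fixes total mass
              [0.1, 0.3, 0.5, 0.7, 0.9]])     # full rank, r = 2 and s = 5
psi = np.array([1.0, 0.8, 1.2, 0.6, 1.5])     # strictly positive psi
b = M @ np.full(5, 0.2)                       # feasible: interior point nu0 = 0.2

gamma = np.zeros(2)
for _ in range(50):                           # Newton iteration on the dual
    nu = psi * np.exp(-1 + M.T @ gamma)
    gamma -= np.linalg.solve(M @ np.diag(nu) @ M.T, M @ nu - b)

nu = psi * np.exp(-1 + M.T @ gamma)           # the optimizer nu^op(b) of (130)
F_primal = float(np.sum(nu * np.log(psi / nu)))
F_dual = float(-gamma @ b + nu.sum())         # L(gamma(b); b)
assert np.allclose(M @ nu, b)                 # constraints hold at the optimum
assert np.isclose(F_primal, F_dual)           # no duality gap
```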

Remark C.7

Our main application of Proposition C.6 is for the depth-one tree \(\mathcal {D}\) of Fig. 6. In the notation of the current section, X is the space of valid T-colorings \(\underline{{\sigma }}\) of \(\mathcal {D}\), and \(\psi : X \rightarrow (0,\infty )\) is defined by

$$\begin{aligned}\psi (\underline{{\sigma }}) = \varvec{w}_{\mathcal {D}}(\underline{{\sigma }})^\lambda =\bigg \{ \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in \partial v} [\bar{\Phi }(\sigma _{av}) \hat{\Phi }(\underline{{\sigma }}_{\delta a})] \bigg \}^\lambda \,.\end{aligned}$$

We then wish to solve the optimization problem (126) for \(\varvec{F}(\nu )\) as in (128), under the constraint that \(\nu \) has marginals \(\dot{h}^\text {tr}(\dot{\sigma })\) on the boundary edges \(\delta \mathcal {D}\). This can be expressed as \(M\nu ={\dot{h}}\) where M has rows indexed by the spins \(\dot{\sigma }\in \dot{\Omega }\), columns indexed by valid T-colorings \(\underline{{\eta }}\equiv \underline{{\eta }}_\mathcal {D}\) of \(\mathcal {D}\): the \((\dot{\sigma },\underline{{\eta }})\) entry of M is given by

$$\begin{aligned}M(\dot{\sigma },\underline{{\eta }}) =|\delta \mathcal {D}|^{-1} \sum _{e\in \delta \mathcal {D}} \mathbf {1}\{\dot{\eta }_e=\dot{\sigma }\}\,.\end{aligned}$$

Recalling Remark C.4, let \(\dot{\Omega }_+=\{\dot{\sigma }\in \dot{\Omega }: \dot{h}^\text {tr}(\dot{\sigma })>0\}\), and \(X_\circ = \{\underline{{\eta }}\in X:M(\dot{\sigma },\underline{{\eta }})=0\ \forall \dot{\sigma }\notin \dot{\Omega }_+\}\). Let \(M_+\) be the \(\dot{\Omega }_+ \times X_\circ \) submatrix of M, and set \({\dot{q}}(\dot{\sigma })=0\) for all \(\dot{\sigma }\notin \dot{\Omega }_+\). Next, in the matrix \(M_+\), if the row indexed by \(\dot{\sigma }\) is a linear combination of the other rows, then set \({\dot{q}}(\dot{\sigma })=1\) and remove this row. Repeat until we arrive at an \(\dot{\Omega }_\circ \times X_\circ \) matrix \(M_\circ \) of full rank \(r_\circ =|\dot{\Omega }_\circ |\). The original problem reduces to an optimization over \(\{\nu _\circ \geqslant 0\}\cap \{M_\circ \nu _\circ =b_\circ \}\), where \(b_\circ \) denotes the entries of b indexed by \(\dot{\Omega }_\circ \). It follows from Proposition C.6 that the unique maximizer of (126) is the measure \(\nu =\nu ^\text {op}(b)\) given by

$$\begin{aligned}\nu (\underline{{\sigma }}) = \frac{1}{Z} \varvec{w}_{\mathcal {D}}(\underline{{\sigma }})^\lambda = \frac{1}{Z}\bigg \{ \dot{\Phi }(\underline{{\sigma }}_{\delta v}) \prod _{a\in \partial v} [\bar{\Phi }(\sigma _{av}) \hat{\Phi }(\underline{{\sigma }}_{\delta a})] \bigg \}^\lambda \prod _{e\in \delta \mathcal {D}} {\dot{q}}(\sigma _e)\,.\end{aligned}$$

Note however that if \(M_+\) is not of full rank then \({\dot{q}}\) need not be unique.

Appendix D: Pairs of intermediate or large overlap

In this section we prove Proposition 3.7, which states that the first moment of \(\varvec{Z}=\varvec{Z}_{\lambda ,T}\) is dominated by separable colorings provided \(0\leqslant \lambda \leqslant 1\).

D.1. Intermediate overlap

We first show that configurations with “intermediate” overlap are negligible. This can be done with quite crude estimates, working with nae-sat solutions rather than colorings.

Lemma D.1

Consider random regular nae-sat at clause density \(\alpha \geqslant 2^{k-1}\ln 2 - O(1)\). On \(\mathscr {G}=(V,F,E,\underline{{\texttt {L}}})\), let \(Z^2[\rho ]\) count the number of pairs \(\underline{{x}},\underline{{\acute{x}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) of valid nae-sat solutions which agree on a \(\rho \) fraction of the variables. Then

$$\begin{aligned}\mathbb {E}Z^2[\rho ] \leqslant (\mathbb {E}Z) \exp \Big \{ n \Big [ H(\rho ) -(\ln 2)\pi (\rho ) + O(1/2^k) \Big ] \Big \},\end{aligned}$$

for \(\pi (\rho )\equiv 1-\rho ^k-(1-\rho )^k\).

Proof

For \(\underline{{u}}\in \{{\texttt {0}},{\texttt {1}}\}^V\), let \(I^\textsc {nae}(\underline{{u}};\mathscr {G})\) be the indicator that \(\underline{{u}}\) is a valid nae-sat solution on \(\mathscr {G}\). Fix any pair of vectors \(\underline{{x}},\underline{{\acute{x}}}\in \{{\texttt {0}},{\texttt {1}}\}^V\) which agree on a \(\rho \) fraction of the variables:

$$\begin{aligned}\mathbb {E}Z^2[\rho ] = 2^n \left( {\begin{array}{c}n\\ n\rho \end{array}}\right) \mathbb {E}[ I^\textsc {nae}(\underline{{x}};\mathscr {G}) I^\textsc {nae}(\underline{{\acute{x}}};\mathscr {G})] = (\mathbb {E}Z) \left( {\begin{array}{c}n\\ n\rho \end{array}}\right) \mathbb {E}[I^\textsc {nae}(\underline{{\acute{x}}};\mathscr {G}) \,|\, I^\textsc {nae}(\underline{{x}};\mathscr {G})=1]\,.\end{aligned}$$

Given \(\underline{{x}},\underline{{\acute{x}}}\), let \(M\equiv M(\underline{{x}},\underline{{\acute{x}}})\) count the number of clauses \(a\in F\) where

$$\begin{aligned}|\{e\in \delta a : x_{v(e)}=\acute{x}_{v(e)} \}| \notin \{0,k\}\,.\end{aligned}$$

In each of these clauses, there are \(2^k-2\) literal assignments \(\underline{{\texttt {L}}}_{\delta a}\) which are valid for \(\underline{{x}}\). Out of these, exactly \(2^k-4\) are valid also for \(\underline{{\acute{x}}}\). If we define i.i.d. binomial random variables \(D_a\sim {\mathrm {Bin}}(k,\rho )\), indexed by \(a\in F\), then

$$\begin{aligned}\mathbb {P}( M=m\gamma ) = \mathbb {P}\bigg ( \sum _{a\in F} \mathbf {1}\{D_a\notin \{0,k\}\} = m\gamma \,\bigg |\, \sum _{a\in F} D_a = mk\rho \bigg )\,.\end{aligned}$$

The \((D_a)_{a\in F}\) sum to \(mk\rho \) with probability at least \(n^{-O(1)}\), so

$$\begin{aligned}\mathbb {P}( M=m\gamma ) \leqslant n^{O(1)} \mathbb {P}({\mathrm {Bin}}(m,\pi )=m\gamma )\end{aligned}$$

with \(\pi =\pi (\rho )\) as in the statement of the lemma. Therefore

$$\begin{aligned}\mathbb {E}[I^\textsc {nae}(\underline{{\acute{x}}};\mathscr {G}) \,|\, I^\textsc {nae}(\underline{{x}};\mathscr {G})=1] \leqslant n^{O(1)} \mathbb {E}\bigg [ \bigg ( \frac{2^k-4}{2^k-2} \bigg )^X \bigg ]\end{aligned}$$

for \(X\sim {\mathrm {Bin}}(m,\pi )\). It is easily seen that the above is \(\leqslant \exp \{ -m\pi /2^{k-1} \}\), and the claimed bound follows, using the lower bound on \(\alpha =m/n\).\(\square \)

Corollary D.2

Let \(\psi (\rho ) = H(\rho ) - (\ln 2)\pi (\rho )\). Then \(\psi (\rho ) \leqslant -2k/2^k\) for all \(\rho \) in

$$\begin{aligned} {[}\exp \{-k/(\ln k)\}, \tfrac{1}{2}(1 - k/2^{k/2})] \cup [\tfrac{1}{2}(1 + k/2^{k/2}), 1-\exp \{-k/(\ln k)\}]\,.\end{aligned}$$

Assuming \(\alpha =m/n\geqslant 2^{k-1}\ln 2-O(1)\), \(\mathbb {E}Z^2[\rho ] \leqslant \exp \{ -nk/2^k \}\) for all such \(\rho \).

Proof

Note that \(H( \tfrac{1+\epsilon }{2}) \leqslant \ln 2-\epsilon ^2/2\). If \((k\ln k)/2^k \leqslant \epsilon \leqslant 1/k\), then

$$\begin{aligned}\psi ( \tfrac{1+\epsilon }{2} ) \leqslant -\epsilon ^2/2 + O(k\epsilon /2^k) \leqslant -\epsilon ^2/3\,.\end{aligned}$$

Both \(H( \tfrac{1+\epsilon }{2})\) and \(\pi ( \tfrac{1+\epsilon }{2})\) are symmetric about \(\epsilon =0\), and decreasing on the interval \(0\leqslant \epsilon \leqslant 1\). It follows that for any \(0\leqslant a \leqslant b\leqslant 1\),

$$\begin{aligned}\max _{a\leqslant \epsilon \leqslant b} \psi ( \tfrac{1+\epsilon }{2} ) \leqslant H(\tfrac{1+a}{2}) -(\ln 2)\pi (\tfrac{1+b}{2})\,.\end{aligned}$$

With this in mind, if \(1/k \leqslant \epsilon \leqslant 1-5(\ln k)/k\),

$$\begin{aligned}\psi ( \tfrac{1+\epsilon }{2} ) \leqslant -(2k^2)^{-1} + O(k^{-5/2}) \leqslant -(4k^2)^{-1}\,.\end{aligned}$$

If \(1-5(\ln k)/k \leqslant \epsilon \leqslant 1-(\ln k)^3/k^2\),

$$\begin{aligned}\psi ( \tfrac{1+\epsilon }{2} ) \leqslant O(1) (\ln k)^2/k -\Omega (1) (\ln k)^3/k \leqslant -\Omega (1) (\ln k)^3/k\,.\end{aligned}$$

Finally, if \(1-(\ln k)^3/k^2 \leqslant \epsilon \leqslant 1-\exp \{-2k/(\ln k)\}\), then

$$\begin{aligned}\psi ( \tfrac{1+\epsilon }{2} ) \leqslant O(1) (1-\epsilon ) k/(\ln k) - \Omega (1) (1-\epsilon ) k \leqslant - \Omega (1) (1-\epsilon ) k\,.\end{aligned}$$

Combining these estimates proves the claimed bound on \(\psi (\rho )\). The assertion for \(\mathbb {E}Z^2[\rho ]\) then follows by substituting into Lemma D.1, and noting that \(\mathbb {E}Z \leqslant \exp \{O(n/2^k)\}\).\(\square \)
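As a numerical sanity check of Corollary D.2 (ours, not part of the proof, and only for one illustrative value of \(k\)), the sketch below evaluates \(\psi (\rho ) = H(\rho ) - (\ln 2)\pi (\rho )\) on a grid over the upper interval \([\tfrac12(1+k/2^{k/2}),\, 1-e^{-k/\ln k}]\) and confirms it stays below the threshold \(-2k/2^k\); by the symmetry \(\psi (\rho )=\psi (1-\rho )\) the same holds on the mirror interval.

```python
import math

# Numerical check of Corollary D.2 at k = 30 (a sketch, not the proof):
# psi(rho) = H(rho) - ln(2) * pi(rho), pi(rho) = 1 - rho^k - (1-rho)^k,
# should stay at most -2k/2^k on the intermediate-overlap range.
def H(rho):                                  # natural-log binary entropy
    return -rho * math.log(rho) - (1 - rho) * math.log(1 - rho)

def psi(rho, k):
    pi = 1.0 - rho**k - (1.0 - rho)**k
    return H(rho) - math.log(2.0) * pi

k = 30
lo = 0.5 * (1 + k / 2**(k / 2))              # left end of the upper interval
hi = 1 - math.exp(-k / math.log(k))          # right end of the upper interval
worst = max(psi(lo + (hi - lo) * t / 1000, k) for t in range(1001))
assert worst <= -2 * k / 2**k                # below the claimed threshold
```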

D.2. Large overlap

In what follows, we restrict consideration to a small neighborhood \(\mathbf {N}\) of \(H_\star \). We abbreviate \(\underline{{\sigma }}\in H\) if \(H(\mathcal {G},\underline{{\sigma }})=H\), and \(\underline{{\sigma }}\in \mathbf {N}\) if \(H(\mathcal {G},\underline{{\sigma }})\in \mathbf {N}\). Recall that we write \(\underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\) if the number of free variables in \(\underline{{x}}(\underline{{\sigma }}')\) upper bounds the number in \(\underline{{x}}(\underline{{\sigma }})\). We also write \(H' \succcurlyeq H\) if \(\underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\) for any (all) \(\underline{{\sigma }}\in H\) and \(\underline{{\sigma }}'\in H'\). Let \(\varvec{Z}_\text {ns}(H,H')\) count the colorings \(\underline{{\sigma }}\in H\) such that

$$\begin{aligned}\Big |\Big \{ \underline{{\sigma }}'\in H' : \mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}') \leqslant \exp \{-k/(\ln k) \} \Big \}\Big | \geqslant \omega (n),\end{aligned}$$

for \(\omega (n) = \exp \{ (\ln n)^4 \}\). (Although we will not write it explicitly, it should be understood that \(\varvec{Z}_\text {ns}(H,H')\) depends on \(\mathscr {G}\), since both \(\underline{{\sigma }},\underline{{\sigma }}'\) are required to be valid colorings of \(\mathscr {G}\).) Let \(\varvec{Z}_\text {ns}(\mathbf {N})\) denote the sum of \(\varvec{Z}_\text {ns}(H,H')\) over all pairs \(H,H'\in \mathbf {N}\) with \(H'\succcurlyeq H\). Let \(\varvec{Z}(\mathbf {N})\) denote the sum of \(\varvec{Z}(H)\) over all \(H\in \mathbf {N}\).

Proposition D.3

There exists a small enough positive constant \(\epsilon _{\max }(k)\) such that, if \(\mathbf {N}\) is the \(\epsilon \)-neighborhood of \(H_\star \) for any \(\epsilon \leqslant \epsilon _{\max }\), then

$$\begin{aligned}\mathbb {E}\varvec{Z}_\text {ns}(\mathbf {N})\leqslant \mathbb {E}\varvec{Z}(\mathbf {N}) \exp \{-(\ln n)^2\}\,.\end{aligned}$$

Proof

By definition,

$$\begin{aligned}\varvec{Z}_\text {ns}(\mathbf {N}) = \sum _{H\in \mathbf {N}} \varvec{Z}_\succcurlyeq (H),\quad \varvec{Z}_\succcurlyeq (H)\equiv \sum _{H'\in \mathbf {N}} \mathbf {1}\{ H' \succcurlyeq H \} \varvec{Z}_\text {ns}(H,H')\,.\end{aligned}$$

It suffices to show that for every \(H\in \mathbf {N}\), \(\mathbb {E}\varvec{Z}_\succcurlyeq (H)\leqslant \mathbb {E}\varvec{Z}(H)\exp \{ -2 (\ln n)^2 \}\). Note that the total number of empirical measures \(H'\) is at most \(n^c\) for some constant c(k,T). Let \(\varvec{E}\) denote the set of pairs \((\mathscr {G},\underline{{\sigma }})\) for which

$$\begin{aligned}\Big |\Big \{ \underline{{\sigma }}'\in \mathbf {N}: \underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\text { and } \mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}') \leqslant \exp \{-k/(\ln k)\} \Big \}\Big |\geqslant \omega (n)\,.\end{aligned}$$

(Again, it is understood that both \(\underline{{\sigma }},\underline{{\sigma }}'\) must be valid colorings of \(\mathscr {G}\).) Then

$$\begin{aligned}\varvec{Z}_\succcurlyeq (H) \leqslant n^{c} \sum _{\underline{{\sigma }}\in H} \mathbf {1}\{(\mathscr {G},\underline{{\sigma }})\in \varvec{E}\}\,.\end{aligned}$$

Consequently, in order to show the required bound on \(\mathbb {E}\varvec{Z}_\succcurlyeq (H)\), it suffices to show

$$\begin{aligned} \mathbb {P}^H(\varvec{E})\leqslant n^{-c} \exp \{-2(\ln n)^2\},\end{aligned}$$
(132)

where \(\mathbb {P}^H\) is a “planted” measure on pairs \((\mathscr {G},\underline{{\sigma }})\): to sample from \(\mathbb {P}^H\), we start with a set V of n isolated variables each with d incident half-edges, and a set F of m isolated clauses each with k incident half-edges. Assign colorings of the half-edges,

$$\begin{aligned}\underline{{\sigma }}_\delta \equiv (\underline{{\sigma }}_{\delta V},\underline{{\sigma }}_{\delta F}) \quad \text {where } \underline{{\sigma }}_{\delta V} \equiv (\underline{{\sigma }}_{\delta v})_{v\in V}, \ \underline{{\sigma }}_{\delta F} \equiv (\underline{{\sigma }}_{\delta a})_{a\in F},\end{aligned}$$

which are uniformly random subject to the empirical measure H. Then \(\underline{{\sigma }}_\delta \) is the “planted” coloring: conditioned on it, we sample uniformly at random a graph \(\mathscr {G}\) such that \(\underline{{\sigma }}_\delta \) becomes a valid coloring \(\underline{{\sigma }}\) on \(\mathscr {G}\). The resulting pair \((\mathscr {G},\underline{{\sigma }})\) is a sample from \(\mathbb {P}^H\).

Suppose \((\mathscr {G},\underline{{\sigma }})\in \varvec{E}\). The total number of configurations \(\underline{{\sigma }}'\) with \(\mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}') \leqslant \delta \) is at most \((cn)^{n\delta }\), which is \(\ll \omega (n)\) if \(\delta \leqslant n^{-1} (\ln n)^2\). This implies that there must exist \(\underline{{\sigma }}'\in \mathbf {N}\) such that \(\underline{{\sigma }}'\succcurlyeq \underline{{\sigma }}\) and

$$\begin{aligned}n^{-1} (\ln n)^2\leqslant \mathrm {\textsf {{sep}}}(\underline{{\sigma }},\underline{{\sigma }}') \leqslant \exp \{-k/(\ln k)\}\,.\end{aligned}$$

It follows that

$$\begin{aligned}S\equiv \{v\in V: x_v(\underline{{\sigma }})\in \{{\texttt {0}},{\texttt {1}}\} \text { and } x_v(\underline{{\sigma }}') \ne x_v(\underline{{\sigma }})\}\end{aligned}$$

has size \(|S| \equiv ns\) for \(s \in [(2n)^{-1}(\ln n)^2,\exp \{-k/(\ln k)\}]\). The set S is internally forced in \(\underline{{\sigma }}\): for every \(v\in S\), any clause forcing to v must have another edge connecting to S. Formally, let \(\texttt {R}_U\) (resp. \(\texttt {B}_U\)) count the number of \(\{{\texttt {r}}\}\)-colored (resp. \(\{{\texttt {b}}\}\)-colored) edges incident to a subset of vertices U. Let \(I_S\) be the indicator that all variables in S are forced. For any fixed \(S\subseteq V\),

$$\begin{aligned}\mathbb {P}^H(S\text { internally forced}) \leqslant \mathbb {E}_{\mathbb {P}^H}\bigg [ I_S k^{\texttt {R}_S} \frac{(\texttt {B}_S)_{\texttt {R}_S}}{(\texttt {B}_F)_{\texttt {R}_S}} \bigg ] \leqslant \mathbb {E}_{\mathbb {P}^H}[ I_S (4ks)^{\texttt {R}_S}]\,.\end{aligned}$$

In the first inequality, the factor \(k^{\texttt {R}_S}\) accounts for the choice, for each S-incident \(\{{\texttt {r}}\}\)-colored edge e, of another edge \(e'\) sharing the same clause. The factor \((\texttt {B}_S)_{\texttt {R}_S}/(\texttt {B}_F)_{\texttt {R}_S}\) then accounts for the chance that the chosen edge \(e'\) (which must have color in \(\{{\texttt {b}}\}\)) will also be S-incident. The second inequality follows by noting that we certainly have \(\texttt {B}_S \leqslant nsd\), and for H near \(H_\star \) we also clearly have \(\texttt {B}_F \geqslant nd/4\).

To bound the above, we can work with a slightly different measure \(\mathbb {Q}^H\): instead of sampling \(\underline{{\sigma }}_\delta \) subject to H, we can simply sample variable-incident colorings \(\underline{{\sigma }}_{\delta v}\) i.i.d. from \(\dot{H}\), and clause-incident colorings \(\underline{{\sigma }}_{\delta a}\) i.i.d. from \(\hat{H}\). On the event \(\mathrm {\textsf {{MARG}}}\) that the resulting \(\underline{{\sigma }}_\delta \) has empirical measure H, we sample the graph \(\mathscr {G}\) according to \(\mathbb {P}^H(\mathscr {G}|\underline{{\sigma }}_\delta )\), and otherwise we set \(\mathscr {G}=\varnothing \). Then, since \(\mathbb {Q}^H(\mathrm {\textsf {{MARG}}}) \geqslant n^{-c}\) (adjusting c as needed), we have

$$\begin{aligned}\mathbb {P}^H((\mathscr {G},\underline{{\sigma }}))=\mathbb {Q}^H((\mathscr {G},\underline{{\sigma }}) \,|\,\mathrm {\textsf {{MARG}}}) \leqslant n^c \, \mathbb {Q}^H((\mathscr {G},\underline{{\sigma }}) ; \mathrm {\textsf {{MARG}}})\,.\end{aligned}$$

Let us abbreviate \(\dot{H}(\ell )\) for the probability under \(\dot{H}\) that \(\underline{{\sigma }}_{\delta v}\) has \(\ell \) entries in \(\{{\texttt {r}}\}\): then

$$\begin{aligned} \mathbb {E}_{\mathbb {P}^H}[ I_S (4ks)^{\texttt {R}_S}] \leqslant n^c \, \mathbb {E}_{\mathbb {Q}^H}[ I_S (4ks)^{\texttt {R}_S}; \mathrm {\textsf {{MARG}}}] \leqslant n^c \, \, \bigg (\sum _{\ell \geqslant 1} \dot{H}(\ell ) (4ks)^{\ell }\bigg )^{ns}\,. \end{aligned}$$
(133)

For H sufficiently close to \(H_\star \), we will have

$$\begin{aligned}\dot{H}(\ell ) \leqslant 2\dot{H}_\star (\ell ) \leqslant 2 \left( {\begin{array}{c}d\\ \ell \end{array}}\right) \frac{\hat{q}_\star ({\texttt {r}}_{\texttt {1}})^\ell \hat{q}_\star ({\texttt {b}}_{\texttt {1}})^{d-\ell }}{ [\hat{q}_\star ({\texttt {r}}_{\texttt {1}}) +\hat{q}_\star ({\texttt {b}}_{\texttt {1}})]^d -\hat{q}_\star ({\texttt {b}}_{\texttt {1}})^d }\,.\end{aligned}$$

It follows that the right-hand side of (133) is (for some absolute constant \(\delta \))

$$\begin{aligned}\leqslant n^c \, 2^{ns} \bigg (\frac{[\hat{q}_\star ({\texttt {r}}_{\texttt {1}}) \cdot 4ks +\hat{q}_\star ({\texttt {b}}_{\texttt {1}})]^d -\hat{q}_\star ({\texttt {b}}_{\texttt {1}})^d}{[\hat{q}_\star ({\texttt {r}}_{\texttt {1}}) +\hat{q}_\star ({\texttt {b}}_{\texttt {1}})]^d -\hat{q}_\star ({\texttt {b}}_{\texttt {1}})^d }\bigg )^{ns} \leqslant n^c s^{ns} 2^{-\delta k n s} ,\end{aligned}$$

where the last inequality uses that \(s\leqslant \exp \{ -k/(\ln k)\}\). Summing over the \(\left( {\begin{array}{c}n\\ ns\end{array}}\right) \leqslant (e/s)^{ns}\) choices of S (the factor \((e/s)^{ns}\) cancels \(s^{ns}\), at the cost of \(e^{ns}\leqslant 2^{\delta kns/2}\) for large k) gives

$$\begin{aligned}\mathbb {P}^H(\varvec{E}) \leqslant \max _{s \geqslant (2n)^{-1}(\ln n)^2} n^c 2^{-\delta k n s/2} \leqslant \exp \{ -\Omega (1) k(\ln n)^2\}\,.\end{aligned}$$

This implies (132); and the claimed result follows as previously explained.\(\square \)

Proof of Proposition 3.7

Follows by combining Corollary D.2 and Proposition D.3.\(\square \)

Appendix E: Free energy upper bound

For a family of spin systems that includes nae-sat, an interpolative calculation gives an upper bound for the free energy on Erdős-Rényi graphs ([26, 43], cf. [30]). These bounds build on earlier work [29] concerning the subadditivity of the free energy in the Sherrington–Kirkpatrick model, which was later generalized to a broad class of models [4, 5, 12, 27]. (Although these results are closely related, we remark that interpolation gives quantitative bounds whereas subadditivity does not.) To prove the upper bound in Theorem 1, we establish the analogue of [26, 43] for random regular graphs. Although the main concern of this paper is the nae-sat model, we give the bound for a more general class of models, which may be of independent interest.

E.1. Basic interpolation bound

Recall \(\mathcal {G}=(V,F,E)\) denotes a (dk)-regular bipartite graph (without edge literals). We consider measures defined on vectors \(\underline{{x}}\in \mathcal {X}^V\) where \(\mathcal {X}\) is some fixed alphabet of finite size. Fix also a finite index set S. Suppose we have (random) vectors \(b\in \mathbb {R}^S\) and \(f\in {\mathcal {F}}(\mathcal {X})^S\), where \({\mathcal {F}}(\mathcal {X})\) denotes the space of functions \(\mathcal {X}\rightarrow \mathbb {R}_{\geqslant 0}\). Independently of b, let \(f_1,\ldots ,f_k\) be i.i.d. copies of f, and define the random function

$$\begin{aligned} \theta (\underline{{x}}) \equiv \sum _{s\in S} b_s \prod _{j=1}^k f_{s,j}(x_j). \end{aligned}$$
(134)

Let h be another (random) element of \({\mathcal {F}}(\mathcal {X})\). Assume there is a constant \(\epsilon >0\) so that

$$\begin{aligned} \epsilon \leqslant \{h,1-\theta \} \leqslant \frac{1}{\epsilon }\quad \text {almost surely.} \end{aligned}$$
(135)

Note we do not require the \(b_s\) to be nonnegative; however, we assume that

$$\begin{aligned} b^p(\underline{{s}}) \equiv \mathbb {E}\Big [ \prod _{\ell =1}^p b_{s_\ell }\Big ]\geqslant 0 \quad \text {for any }p\geqslant 1, \underline{{s}} \equiv (s_1,\ldots ,s_p)\in S^p. \end{aligned}$$
(136)

Let \(\mathscr {G}\) denote the graph \(\mathcal {G}\) labelled by a vector \(((h_v)_{v\in V},(\theta _a)_{a\in F})\) of independent functions, where the \(h_v\) are i.i.d. copies of h and the \(\theta _a\) are i.i.d. copies of \(\theta \). For \(a\in F\) we abbreviate \(\underline{{x}}_{\delta a}\equiv ( x_{v(e)} )_{e\in \delta a}\in \mathcal {X}^k\), and we consider the (random) Gibbs measure

$$\begin{aligned} \mu _{\mathscr {G}}(\underline{{x}}) \equiv \frac{1}{Z(\mathscr {G})} \prod _{v\in V} h_v(x_v) \prod _{a\in F} [1-\theta _a( \underline{{x}}_{\delta a})] \end{aligned}$$
(137)

where \(Z(\mathscr {G})\) is the normalizing constant. Now let \(\mathscr {G}\) be the random (dk)-regular graph on n variables, together with the random function labels. We write \(\mathbb {E}_n\) for expectation over the law of \(\mathscr {G}\), and define the (logarithmic) free energy of the model to be

$$\begin{aligned}F_n \equiv \frac{1}{n}\mathbb {E}_n\ln Z(\mathscr {G})\,. \end{aligned}$$

Example E.1

(positive-temperature nae-sat) Let \(\mathcal {X}=\{{\texttt {0}},{\texttt {1}}\}\), and let \(\underline{{{\texttt {L}}}}\equiv ({\texttt {L}}_i)_{i\leqslant k}\) be a sequence of i.i.d. \(\text {Bernoulli}(1/2)\) random variables. The positive-temperature nae-sat model corresponds to taking \(h\equiv 1\) and

$$\begin{aligned}\theta (\underline{{x}}) \equiv (1-e^{-\beta }) \bigg ( \prod _{i=1}^k ({\texttt {L}}_{i}\oplus x_{i}) +\prod _{i=1}^k ({\texttt {1}}\oplus {\texttt {L}}_{i}\oplus x_{i}) \bigg )\end{aligned}$$

where \(\beta \in (0,\infty )\) is the inverse temperature. In this model, each violated clause incurs a multiplicative penalty \(e^{-\beta }\).
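This clause weight can be checked exhaustively. The following sketch (assuming the indicator form of \(\theta \), with arbitrary illustrative values of \(k\), \(\beta \), and a seeded random literal vector) verifies that \(\theta (\underline{x})\) equals \(1-e^{-\beta }\) exactly when all evaluated literals \({\texttt {L}}_i\oplus x_i\) agree, so each violated clause carries Gibbs weight \(1-\theta =e^{-\beta }\) and each satisfied clause carries weight 1.

```python
import itertools
import math
import random

# A sketch of Example E.1 (indicator form of theta; k, beta, and the
# literal vector L are arbitrary illustrative choices): theta(x) equals
# (1 - e^{-beta}) iff the clause is violated, i.e. iff the evaluated
# literals L_i XOR x_i are all equal, and 0 otherwise.
random.seed(0)
k, beta = 4, 1.5
L = [random.randint(0, 1) for _ in range(k)]

def theta(x):
    all_one = math.prod(L[i] ^ x[i] for i in range(k))       # all literals 1
    all_zero = math.prod(1 ^ L[i] ^ x[i] for i in range(k))  # all literals 0
    return (1 - math.exp(-beta)) * (all_one + all_zero)

for x in itertools.product([0, 1], repeat=k):                # all 2^k assignments
    lits = {L[i] ^ x[i] for i in range(k)}
    assert theta(x) == (1 - math.exp(-beta)) * (len(lits) == 1)
```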

Example E.2

(positive-temperature coloring) Let \(\mathcal {X}=[q]\). The positive-temperature coloring model (i.e., anti-ferromagnetic Potts model) on a k-uniform hypergraph corresponds to \(h\equiv 1\) and

$$\begin{aligned}\theta (\underline{{x}}) \equiv (1-e^{-\beta }) \sum _{s=1}^q \mathbf {1}\{x_1=\cdots =x_k=s\}\end{aligned}$$

where \(\beta \in (0,\infty )\) is the inverse temperature. In this model, each monochromatic (hyper)edge incurs a multiplicative penalty \(e^{-\beta }\).

The following theorem is a random regular graph analog of [43, Thm. 3]. (We state our result for a slightly more general class of models than the one considered in [43]; however, the main result of [43] extends to these models with only minor modifications.)

Theorem E.3

Consider a (random) Gibbs measure (137) satisfying assumptions (134)–(136), and consider the (nonasymptotic) free energy \(F_n \equiv n^{-1}\mathbb {E}_n\ln Z(\mathscr {G})\). Let

$$\begin{aligned} {\mathcal {M}}_0&\equiv \text {space of probability measures over } \mathcal {X},\\ {\mathcal {M}}_1&\equiv \text {space of probability measures over } {\mathcal {M}}_0,\\ {\mathcal {M}}_2&\equiv \text {space of probability measures over } {\mathcal {M}}_1. \end{aligned}$$

For \({\zeta \in {\mathcal {M}}_2}\), let \(\underline{{\eta }}\equiv (\eta _{a,j})_{a\geqslant 0,j\geqslant 0}\) be an array of i.i.d. samples from \(\zeta \). For each index (a,j) let \(\rho _{a,j}\) be a conditionally independent sample from \(\eta _{a,j}\), and denote \(\underline{{\rho }}\equiv (\rho _{a,j})_{a\geqslant 0,j\geqslant 0}\). Let \((h\rho )_{a,j}(x) \equiv h_{a,j}(x)\rho _{a,j}(x)\), and define the random variables

$$\begin{aligned} \varvec{u}_a(x)&\equiv \sum _{\underline{{x}}\in \mathcal {X}^k} \mathbf {1}\{x_1=x\} [1-\theta _a(\underline{{x}})] \prod _{j=2}^k (h\rho )_{a,j}(x_j) ,\\ \varvec{u}_a&\equiv \sum _{\underline{{x}}\in \mathcal {X}^k} [1-\theta _a(\underline{{x}})] \prod _{j=1}^k (h\rho )_{a,j}(x_j). \end{aligned}$$

For any \(\lambda \in (0,1)\) and any \({\zeta \in {\mathcal {M}}_2}\),

$$\begin{aligned}F_n \leqslant \lambda ^{-1} \mathbb {E}\ln \mathbb {E}'\bigg [ \Big ( \sum _{x\in \mathcal {X}} h(x) \prod _{a=1}^d \varvec{u}_a(x) \Big )^\lambda \bigg ] -(k-1)\alpha \lambda ^{-1} \mathbb {E}\ln \mathbb {E}'[(\varvec{u}_0)^\lambda ] +O_{\epsilon }(n^{-1/3}) \end{aligned}$$

where \(\mathbb {E}'\) denotes the expectation over \(\underline{{\rho }}\) conditioned on all else, and \(\mathbb {E}\) denotes the overall expectation.

Remark E.4

In the statistical physics framework, elements \(\rho \in {\mathcal {M}}_0\) correspond to belief propagation messages for the underlying model, which has state space \(\mathcal {X}\). Elements \(\eta \in {\mathcal {M}}_1\) correspond to belief propagation messages for the 1rsb model (termed “auxiliary model” in [33, Ch. 19]), which has state space \({\mathcal {M}}_0\). The informal picture is that the \(\eta \) associated to a variable x is determined by the geometry of the local neighborhood of x; that is, the randomness of \(\zeta \) reflects the randomness in the geometry of the R-neighborhood of a uniformly random variable in the graph. In random regular graphs this randomness is degenerate: the R-neighborhood of (almost) every vertex is simply a regular tree. It is therefore expected that the best upper bound in Theorem E.3 can be achieved with \(\zeta \) a point mass.
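As an illustration of this remark, the sketch below (our own toy computation, with arbitrary values of \(k\), \(d\), \(\beta \), and the indicator form of Example E.1 assumed) evaluates the bound of Theorem E.3 for positive-temperature nae-sat with \(\zeta \) a point mass at the point mass at the uniform message \(\rho \equiv 1/2\). Then \(\mathbb {E}'\) is deterministic, the \(\lambda \)-dependence cancels, and the bound collapses to the replica symmetric formula \(\ln 2 + \alpha \ln (1-(1-e^{-\beta })2^{1-k})\).

```python
import itertools
import math

# A sketch of the Theorem E.3 bound for positive-temperature nae-sat with
# zeta a double point mass at the uniform message rho = 1/2 (k, d, beta
# are arbitrary illustrative values; h == 1 as in Example E.1). With E'
# deterministic, lambda^{-1} ln E'[(.)^lambda] reduces to a plain log.
k, d, beta = 4, 8, 1.2
alpha = d / k
w = 1 - math.exp(-beta)                       # penalty weight 1 - e^{-beta}
L = (0, 1, 1, 0)                              # any literal vector gives the same u's

def theta(x):                                 # indicator form of Example E.1
    lits = {L[i] ^ x[i] for i in range(k)}
    return w if len(lits) == 1 else 0.0       # violated iff all literals equal

def u(x1):                                    # u_a(x): message rho on edges 2..k
    return sum((1 - theta((x1,) + y)) * 0.5**(k - 1)
               for y in itertools.product([0, 1], repeat=k - 1))

u0 = sum((1 - theta(x)) * 0.5**k              # u_0: message rho on all k edges
         for x in itertools.product([0, 1], repeat=k))

bound = math.log(u(0)**d + u(1)**d) - (k - 1) * alpha * math.log(u0)
rs = math.log(2) + alpha * math.log(1 - w * 2**(1 - k))
assert math.isclose(bound, rs)                # collapses to the RS formula
```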

E.2. Replica symmetric bound

Along the lines of [43], we first prove a weaker “replica symmetric” version of Theorem E.3. Afterwards we will apply it to obtain the full result.

Theorem E.5

In the setting of Theorem E.3, define

$$\begin{aligned}\Phi _V\equiv \mathbb {E}\ln \Big ( \sum _{x\in \mathcal {X}} h(x)\prod _{a=1}^d \varvec{u}_a (x) \Big ),\quad \Phi _F \equiv (k-1)\alpha \mathbb {E}\ln (\varvec{u}_0)\,. \end{aligned}$$

Then \(F_n \leqslant \Phi _V-\Phi _F+O_{\epsilon }(n^{-1/3})\).

Inspired by the proof of [12], we prove Theorem E.5 by a combinatorial interpolation between two graphs, \(\mathscr {G}_{-1}\) and \(\mathscr {G}_{nd+1}\). The initial graph \(\mathscr {G}_{-1}\) has free energy \(\Phi _V\), and the final graph \(\mathscr {G}_{nd+1}\) has free energy \(F_n + \Phi _F\). We will show that, up to \(O_\epsilon (n^{-1/3})\) error, the free energy of \(\mathscr {G}_{-1}\) is at least that of \(\mathscr {G}_{nd+1}\), from which the bound of Theorem E.5 follows.

To begin, we take \(\mathscr {G}_{-1}\) to be a factor graph consisting of n disjoint trees (Fig. 7a). Each tree is rooted at a variable v which joins to d clauses. Each of these clauses then joins to \(k-1\) more variables, which form the leaves of the tree. We write V for the root variables, A for the clauses, and U for the leaf variables. Note \(|V|=n\), \(|A|=nd\), and \(|U|=nd(k-1)\).

Independently of all else, take a vector of i.i.d. samples \((\eta _u,\rho _u)_{u\in U}\) where \(\eta _u\) is a sample from \(\zeta \), and \(\rho _u\) is a sample from \(\eta _u\).Footnote 3 As before, the variables and clauses in \(\mathscr {G}_{-1}\) are labelled independently with functions \(h_v\) and \(\theta _a\). We now additionally assign to each \(u\in U\) the label \((\eta _u,\rho _u)\). Let \((h\rho )_u(x)\equiv h_u(x)\rho _u(x)\). We consider the factor model on \(\mathscr {G}_{-1}\) defined by

$$\begin{aligned}\mu _{\mathscr {G}_{-1}}(\underline{{x}}) = \frac{1}{Z(\mathscr {G}_{-1})} \prod _{v\in V} h_v(x_v) \prod _{a\in A} [1-\theta _a(\underline{{x}}_{\delta a})] \prod _{u\in U} (h\rho )_u(x_u)\,. \end{aligned}$$

We now define the interpolating sequence of graphs \(\mathscr {G}_{-1},\mathscr {G}_0,\ldots ,\mathscr {G}_{nd+1}\). Fix \(m'\equiv 2n^{2/3}\). The construction proceeds by adding and removing clauses. Whenever we remove a clause a, the edges \(\delta a\) are left behind as k unmatched edges in the remaining graph. Whenever we add a new clause b, we label it with a fresh sample \(\theta _b\) of \(\theta \). The graph \(\mathscr {G}_r\) has clauses \(F_r\) which can be partitioned into \(A_{U,r}\) (clauses involving U only), \(A_{V,r}\) (clauses involving V only), and \(A_r\) (clauses involving both U and V). We will define below a certain sequence of events \(\mathrm {\textsf {{{COUP}}}}_r\). Let \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r}\) be the event that \(\mathrm {\textsf {{{COUP}}}}_s\) occurs for all \(0\leqslant s \leqslant r\). The event \(\mathrm {\textsf {{{COUP}}}}_{\leqslant -1}\) occurs vacuously, so \(\mathbb {P}(\mathrm {\textsf {{{COUP}}}}_{\leqslant -1})=1\). With this notation in mind, the construction goes as follows:

  1.

    Starting from \(\mathscr {G}_{-1}\), choose a uniformly random subset of \(m'\) clauses from \(F_{-1}=A_{-1}=A\), and remove them to form the new graph \(\mathscr {G}_0\).

  2.

    For \(0\leqslant r\leqslant nd-m'-1\), we start from \(\mathscr {G}_r\) and form \(\mathscr {G}_{r+1}\) as follows.

    a.

      If \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r-1}\) succeeds, choose a uniformly random clause a from \(A_r\), and remove it to form the new graph \(\mathscr {G}_{r,\circ }\). Let \(\delta 'U_{r,\circ }\) and \(\delta 'V_{r,\circ }\) denote the unmatched half-edges incident to U and V respectively in \(\mathscr {G}_{r,\circ }\), and define the event

      $$\begin{aligned}\mathrm {\textsf {{{COUP}}}}_r\equiv \{ \min \{|{\delta 'U_{r,\circ }}|,|{\delta 'V_{r,\circ }}|\}\geqslant k\}\,.\end{aligned}$$

      If instead \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r-1}\) fails, then \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r}\) fails by definition.

    b.

      If \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r}\) fails, let \(\mathscr {G}_{r+1}=\mathscr {G}_r\). If \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r}\) succeeds, then with probability 1/k take k half-edges from \(\delta ' V_{r,\circ }\) and join them into a new clause c. With the remaining probability \((k-1)/k\) take k half-edges from \(\delta ' U_{r,\circ }\) and join them into a new clause c.

  3.

    For \(nd-m' \leqslant r\leqslant nd-1\) let \(\mathscr {G}_{r+1}=\mathscr {G}_r\). Starting from \(\mathscr {G}_{nd}\), remove all the clauses in \(A_{nd}\). Then connect (uniformly at random) all remaining unmatched V-incident edges into clauses. Likewise, connect all remaining unmatched U-incident edges into clauses. Denote the resulting graph \(\mathscr {G}_{nd+1}\).

By construction, \(\mathscr {G}_{nd+1}\) consists of two disjoint subgraphs, which are the induced subgraphs \(\mathscr {G}_U,\mathscr {G}_V\) of U and V respectively. Note that \(\mathscr {G}_V\) is distributed as the random graph \(\mathscr {G}\) of interest, while \(\mathscr {G}_U\) consists of a collection of \(nd(k-1)/k = n\alpha (k-1)\) disjoint trees.

Fig. 7

Interpolation with \(d=2,k=3\), \(n = 6\)

Lemma E.6

Under the construction above,

$$\begin{aligned} \mathbb {E}\ln Z(\mathscr {G}_0)\geqslant \mathbb {E}\ln Z(\mathscr {G}_{nd})-O_\epsilon (n^{1/3})\,,\end{aligned}$$
(138)

where the expectation \(\mathbb {E}\) is over the sequence of random graphs \((\mathscr {G}_r)_{-1\leqslant r\leqslant nd+1}\).

Proof

Let \({\mathscr {F}}_{r,\circ }\) be the \(\sigma \)-field generated by \(\mathscr {G}_{r,\circ }\), and write \(\mathbb {E}_{r,\circ }\) for expectation conditioned on \({\mathscr {F}}_{r,\circ }\). One can rewrite (138) as

$$\begin{aligned} \mathbb {E}\ln \frac{Z(\mathscr {G}_0)}{Z(\mathscr {G}_{nd})} = \sum _{r=0}^{nd-1} \mathbb {E}\Delta _r,\quad \Delta _r\equiv \mathbb {E}_{r,\circ } \ln \frac{Z(\mathscr {G}_r)}{Z(\mathscr {G}_{r,\circ })} -\mathbb {E}_{r,\circ }\ln \frac{Z(\mathscr {G}_{r+1})}{Z(\mathscr {G}_{r,\circ })}\,.\end{aligned}$$

In particular, \(\Delta _r=0\) if the coupling fails. Therefore it suffices to show that \(\Delta _r\) is nonnegative, up to small error terms, conditioned on \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r}\).Footnote 4 First we compare \(\mathscr {G}_r\) and \(\mathscr {G}_{r,\circ }\). Conditioned on \({\mathscr {F}}_{r,\circ }\), we know \(\mathscr {G}_{r,\circ }\). From \(\mathscr {G}_{r,\circ }\) we can obtain \(\mathscr {G}_r\) by adding a single clause \(a\equiv a_r\), together with a random label \(\theta _a\) which is a fresh copy of \(\theta \). To choose the unmatched edges \(\delta a=(e_1,\ldots ,e_k)\) which are combined into the clause a, we take \(e_1\) uniformly at random from \(\delta 'V_{r,\circ }\), then take \(\{e_2,\ldots ,e_k\}\) to be a uniformly random subset of \(\delta 'U_{r,\circ }\). Let \(\mu _{r,\circ }\) be the Gibbs measure on \(\mathscr {G}_{r,\circ }\) (ignoring unmatched half-edges). Let \(\underline{\underline{x}}\equiv (\underline{{x}},\underline{{x}}^1,\underline{{x}}^2,\ldots )\) be an infinite sequence of i.i.d. samples from \(\mu _{r,\circ }\), and write \(\langle \cdot \rangle _{r,\circ }\) for the expectation with respect to their joint law. Then

$$\begin{aligned}\mathbb {E}_{r,\circ } \ln \frac{Z(\mathscr {G}_r)}{Z(\mathscr {G}_{r,\circ })} = \mathbb {E}_{r,\circ } \ln (1- \langle \theta (\underline{{x}}_{\delta a}) \rangle _{r,\circ }) = -\sum _{p\geqslant 1} \frac{1}{p} {\mathscr {A}}_p,\quad {\mathscr {A}}_p \equiv \mathbb {E}_{r,\circ }\bigg [\Big \langle \prod _{\ell =1}^p \theta (\underline{{x}}^\ell _{\delta a}) \Big \rangle _{r,\circ } \bigg ]\,.\end{aligned}$$

We have \(\mathbb {E}_{r,\circ }=\mathbb {E}_a\mathbb {E}_\theta \) where \(\mathbb {E}_a\) is expectation over the choice of \(\delta a\), and \(\mathbb {E}_\theta \) is expectation over the choice of \(\theta \). Under \(\mathbb {E}_a\), the edges \((e_2,\ldots ,e_k)\) are weakly dependent, since they are required to be distinct elements of \(\delta 'U_{r,\circ }\). We can consider instead sampling \(e_2,\ldots ,e_k\) uniformly with replacement from \(\delta 'U_{r,\circ }\), so that \(e_1,\ldots ,e_k\) are independent conditional on \({\mathscr {F}}_{r,\circ }\); let \(\mathbb {E}_{a,\text {ind}}\) denote expectation with respect to this choice of \(\delta a\). Under \(\mathbb {E}_{a,\text {ind}}\) the chance of a collision \(e_i=e_j\) (\(i<j\)) is \(O(k^2/|\delta 'U_{r,\circ }|)\). Recalling \(1-\theta \geqslant \epsilon \) almost surely, we have

$$\begin{aligned}{\mathscr {A}}_{p,\text {ind}} \equiv \mathbb {E}_{a,\text {ind}} \mathbb {E}_\theta \bigg [ \Big \langle \prod _{\ell =1}^p \theta (\underline{{x}}^\ell _{\delta a}) \Big \rangle _{r,\circ } \bigg ] = {\mathscr {A}}_p + O(1) (1-\epsilon )^p \min \bigg \{ \frac{k^2}{|\delta 'U_{r,\circ }|}, 1 \bigg \}\,.\end{aligned}$$

Recall from (134) the product form of \(\theta \), and let \(\mathbb {E}_f\) denote expectation over the law of \(f\equiv (f_s)_{s\in S}\). Then, with \(b^p(\underline{{s}})\) as defined in (136), we have

$$\begin{aligned} {\mathscr {A}}_{p,\text {ind}}&=\sum _{\underline{{s}}\in S^p} b^p(\underline{{s}}) \bigg \langle \mathbb {E}_{a,\text {ind}}\bigg \{ \prod _{j=1}^k \mathbb {E}_f\bigg [ \prod _{\ell =1}^p f_{s_\ell }(x^\ell _{e_j})\bigg ] \bigg \} \bigg \rangle _{r,\circ }\\&= \sum _{\underline{{s}}\in S^p} b^p(\underline{{s}}) \langle I_{V,\underline{{s}}}(\underline{\underline{x}}) I_{U,\underline{{s}}}(\underline{\underline{x}})^{k-1} \rangle _{r,\circ }, \end{aligned}$$

where, for \(W=U\) or \(W=V\), we define

$$\begin{aligned}I_{W,\underline{{s}}}(\underline{\underline{x}}) \equiv \frac{1}{|\delta 'W_{r,\circ }|} \sum _{e\in \delta 'W_{r,\circ }} \mathbb {E}_f\bigg [ \prod _{\ell =1}^p f_{s_\ell }(x^\ell _e) \bigg ]\,.\end{aligned}$$

Summing over \(p\geqslant 1\) gives that, on the event \(\mathrm {\textsf {{{COUP}}}}_{\leqslant r}\),

$$\begin{aligned}&\mathbb {E}_{r,\circ } \ln \frac{Z(\mathscr {G}_r)}{Z(\mathscr {G}_{r,\circ })} = -\sum _{p\geqslant 1} \frac{1}{p} \sum _{\underline{{s}}\in S^p} b^p(\underline{{s}}) \mathbb {E}_{r,\circ } \langle I_{V,\underline{{s}}}(\underline{\underline{x}}) I_{U,\underline{{s}}}(\underline{\underline{x}})^{k-1} \rangle _{r,\circ } + \textsf {err}_{r,1},\\&\qquad \text {where } |\textsf {err}_{r,1}| \leqslant O_\epsilon (1) \min \bigg \{ \frac{k^2}{|\delta 'U_{r,\circ }|}, 1 \bigg \}. \end{aligned}$$

A similar comparison between \(\mathscr {G}_{r+1}\) and \(\mathscr {G}_{r,\circ }\) gives

$$\begin{aligned} \mathbb {E}_{r,\circ } \ln \frac{Z(\mathscr {G}_{r+1})}{Z(\mathscr {G}_{r,\circ })}&= -\sum _{p\geqslant 1} \frac{1}{p} \mathbb {E}_{r,\circ } \bigg [ \sum _{\underline{{s}}\in S^p} b^p(\underline{{s}}) \bigg \langle \frac{k-1}{k} I_{U,\underline{{s}}}(\underline{\underline{x}})^{k} + \frac{1}{k} I_{V,\underline{{s}}}(\underline{\underline{x}})^{k} \bigg \rangle _{r,\circ } \bigg ] +\textsf {err}_{r,2},\\&\qquad |\textsf {err}_{r,2}| \leqslant O_\epsilon (1) \min \bigg \{ \frac{k^2}{ \min \{|\delta 'U_{r,\circ }|, |\delta 'V_{r,\circ }|\}}, 1 \bigg \}. \end{aligned}$$

We now argue that the sum of the error terms \(\textsf {err}_{r,1},\textsf {err}_{r,2}\), over \(0\leqslant r\leqslant nd-1\), is small in expectation. First note that for a constant \(C=C(k,\epsilon )\),

$$\begin{aligned}\sum _{r=0}^{nd-1} \mathbb {E}[\textsf {err}_{r,1} +\textsf {err}_{r,2}] \leqslant C n \bigg [ n^{-2/3} + \mathbb {P}\Big ( \min \{|\delta 'U_{r,\circ }|, |\delta 'V_{r,\circ }|\} \leqslant n^{2/3} \text { for some } r \leqslant nd \Big ) \bigg ]\,.\end{aligned}$$

The process \((|\delta 'V_{r,\circ }|)_{r\geqslant 0}\) is an unbiased random walk started from \(m'+1 = 2n^{2/3}+1\). In each step it goes up by 1 with chance \((k-1)/k\), and down by \(k-1\) with chance 1/k; it is absorbed if it hits k before time \(nd-m'\). Similarly, \((|\delta 'U_{r,\circ }|)_{r\geqslant 0}\) is an unbiased random walk started from \((m'+1)(k-1)\) with an absorbing barrier at k. By the Azuma–Hoeffding bound, there is a constant \(c=c(k)\) such that

$$\begin{aligned}\mathbb {P}(|\delta 'V_{r,\circ }| \leqslant |\delta 'V_{0,\circ }|-n^{2/3}) + \mathbb {P}(|\delta 'U_{r,\circ }| \leqslant |\delta 'U_{0,\circ }|-n^{2/3}) \leqslant \exp \{ -c n^{1/3} \}\,.\end{aligned}$$

Taking a union bound over r shows that with very high probability, neither of the walks \(|\delta 'V_{r,\circ }|,|\delta 'U_{r,\circ }|\) is absorbed before time \(nd-m'\), and (adjusting the constant C as needed)

$$\begin{aligned}\sum _{r=0}^{nd-1} \mathbb {E}[\textsf {err}_{r,1} +\textsf {err}_{r,2}] \leqslant C n^{1/3}\,.\end{aligned}$$
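As a numerical aside (not part of the argument), the one-step drift of the walk for \(|\delta 'V_{r,\circ }|\) can be computed exactly, and a seeded trajectory simulated. All parameters below are illustrative choices, not taken from the paper:

```python
from fractions import Fraction
import random

k = 4                   # illustrative clause size
n = 10_000              # illustrative problem size

# exact one-step drift: up by 1 w.p. (k-1)/k, down by k-1 w.p. 1/k
drift = Fraction(k - 1, k) * 1 + Fraction(1, k) * (-(k - 1))
# drift == 0, so the walk is unbiased, as claimed

def walk(start, steps, rng):
    # trajectory of |delta' V_{r,o}|: +1 w.p. (k-1)/k, -(k-1) w.p. 1/k
    path = [start]
    for _ in range(steps):
        if rng.random() < (k - 1) / k:
            path.append(path[-1] + 1)
        else:
            path.append(path[-1] - (k - 1))
    return path

m = 2 * round(n ** (2 / 3))          # starting height m' ~ 2 n^{2/3}
path = walk(m + 1, n, rng=random.Random(1))
```

Starting a mean-zero walk at height of order \(n^{2/3}\) and running for order n steps, a typical trajectory fluctuates on the scale \(n^{1/2}\) and so tends to stay well above the absorbing barrier, in line with the Azuma–Hoeffding estimate above.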

Altogether this gives

$$\begin{aligned}&\mathbb {E}\ln \frac{Z(\mathscr {G}_0)}{Z(\mathscr {G}_{nd})} + O_\epsilon (n^{1/3})\\&=\sum _{r=0}^{nd-1} \sum _{p\geqslant 1} \frac{1}{p} \sum _{\underline{{s}}} b^p(\underline{{s}}) \mathbb {E}_{r,\circ } \bigg \langle \frac{k-1}{k} I_{U,\underline{{s}}}(\underline{\underline{x}})^{k} +\frac{1}{k} I_{V,\underline{{s}}}(\underline{\underline{x}})^{k} - I_{V,\underline{{s}}}(\underline{\underline{x}}) I_{U,\underline{{s}}}(\underline{\underline{x}})^{k-1} \bigg \rangle _{r,\circ }. \end{aligned}$$

Using the fact that \(x^k-kxy^{k-1}+(k-1)y^k\geqslant 0\) for all \(x,y\in \mathbb {R}\) and even \(k\geqslant 2\), or \(x,y\geqslant 0\) and odd \(k\geqslant 3\) (an instance of the weighted AM–GM inequality \(xy^{k-1}\leqslant \tfrac{1}{k}x^k+\tfrac{k-1}{k}y^k\)), finishes the proof.\(\square \)
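The elementary inequality invoked in the last step can be spot-checked numerically, including the equality case \(x=y\); a minimal sketch:

```python
import random

def g(x, y, k):
    # left-hand side of the inequality x^k - k x y^{k-1} + (k-1) y^k >= 0
    return x ** k - k * x * y ** (k - 1) + (k - 1) * y ** k

rng = random.Random(0)
for _ in range(10_000):
    x, y = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
    assert g(x, y, 4) >= -1e-9             # even k: all real x, y
    assert g(abs(x), abs(y), 3) >= -1e-9   # odd k: x, y >= 0
```

Equality holds exactly at \(x=y\), which is where the interpolation bound is tight.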

Corollary E.7

In the setting of Lemma E.6,

$$\begin{aligned}\mathbb {E}\ln Z(\mathscr {G}_{-1})\geqslant \mathbb {E}\ln Z(\mathscr {G}_{nd+1}) -O_{\epsilon }(n^{2/3}),\end{aligned}$$

where the expectation \(\mathbb {E}\) is over the sequence of random graphs \((\mathscr {G}_r)_{-1\leqslant r\leqslant nd+1}\).

Proof

Adding or removing a clause can change the partition function by at most a multiplicative constant (depending on \(\epsilon \)). On the event that the coupling succeeds for all r,

$$\begin{aligned}\bigg | \ln \frac{Z(\mathscr {G}_0)}{Z(\mathscr {G}_{-1})}\bigg | +\bigg | \ln \frac{Z(\mathscr {G}_{nd+1})}{Z(\mathscr {G}_{nd})}\bigg | = O_\epsilon (m') = O_\epsilon (n^{2/3})\,.\end{aligned}$$

On the event that the coupling fails, the difference is crudely \(O_\epsilon (n)\). We saw in the proof of Lemma E.6 that the coupling fails with probability exponentially small in n, so altogether we conclude

$$\begin{aligned}\mathbb {E}\bigg | \ln \frac{Z(\mathscr {G}_0)}{Z(\mathscr {G}_{-1})}\bigg | +\mathbb {E}\bigg | \ln \frac{Z(\mathscr {G}_{nd+1})}{Z(\mathscr {G}_{nd})}\bigg | = O_\epsilon (n^{2/3})\,. \end{aligned}$$

Combining with the result of Lemma E.6 proves the claim.\(\square \)

Proof of Theorem E.5

In the interpolation, the initial graph \(\mathscr {G}_{-1}\) consists of n disjoint trees \(T_v\), each rooted at a variable \(v\in V\). Thus

$$\begin{aligned}n^{-1}\mathbb {E}\ln Z(\mathscr {G}_{-1}) =\mathbb {E}\ln Z(T_v) =\mathbb {E}\ln \bigg ( \sum _{x\in \mathcal {X}} h_v(x) \prod _{a=1}^d \varvec{u}_a(x) \bigg )\,.\end{aligned}$$

The final graph \(\mathscr {G}_{nd+1}\) consists of two disjoint subgraphs: one subgraph \(\mathscr {G}_V\) has the same law as the graph \(\mathscr {G}\) of interest, while the other subgraph \(\mathscr {G}_U=(U,F_U,E_U)\) consists of \(n\alpha (k-1)\) disjoint trees \(S_c\), each rooted at a clause \(c\in A_U\). Thus

$$\begin{aligned}n^{-1}\mathbb {E}\ln Z(\mathscr {G}_{nd+1}) = \alpha (k-1)\mathbb {E}\ln Z(S_c) + n^{-1}\mathbb {E}\ln Z(\mathscr {G}) = \alpha (k-1)\mathbb {E}\ln \varvec{u}_0 + F_n\,.\end{aligned}$$

The theorem follows by substituting these into the bound of Corollary E.7.\(\square \)
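The tree identity \(Z(T_v)=\sum _{x\in \mathcal {X}} h_v(x)\prod _{a=1}^d \varvec{u}_a(x)\) used above is easy to check by brute force on a small depth-one tree. The following sketch is purely illustrative (a binary alphabet, random labels, and small d, k, none taken from the paper); it compares a direct sum over all configurations with the message form:

```python
import itertools
import random
from math import prod

random.seed(0)
X = [0, 1]     # illustrative stand-in for the alphabet
d, k = 2, 3    # root degree d and clause size k, kept small for brute force

# random labels: root weight h, leaf weights (h*rho), clause functions theta
h = {x: random.uniform(0.5, 1.5) for x in X}
hr = [[{x: random.uniform(0.5, 1.5) for x in X} for _ in range(k - 1)]
      for _ in range(d)]
theta = [{xs: random.uniform(0.0, 0.5) for xs in itertools.product(X, repeat=k)}
         for _ in range(d)]

# brute-force partition function of the depth-one tree T_v
Z_brute = 0.0
for xv in X:
    for leaves in itertools.product(itertools.product(X, repeat=k - 1), repeat=d):
        w = h[xv]
        for a in range(d):
            w *= 1.0 - theta[a][(xv,) + leaves[a]]
            for j, xl in enumerate(leaves[a]):
                w *= hr[a][j][xl]
        Z_brute += w

# message form: u_a(x) sums out the k-1 leaf spins of clause a
def u(a, x):
    total = 0.0
    for ys in itertools.product(X, repeat=k - 1):
        t = 1.0 - theta[a][(x,) + ys]
        for j, y in enumerate(ys):
            t *= hr[a][j][y]
        total += t
    return total

Z_formula = sum(h[x] * prod(u(a, x) for a in range(d)) for x in X)
```

The two computations agree to floating-point accuracy, reflecting the factorization of the tree partition function over the clause subtrees hanging off the root.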

E.3. Proof of 1RSB bound

For the proof of Theorem E.3, we take \(\mathscr {G}_{-1}\) as before and modify it as follows. Where previously each \(u\in U\) had spin value \(x_u\in \mathcal {X}\), it now has the augmented spin \((x_u,\gamma _u)\) where \(\gamma \) ranges over the positive integers. Let \(\underline{{\gamma }}\equiv (\gamma _u)_u\). Next, instead of labeling u with \((h_u,\eta _u,\rho _u)\) as before, we now label it with \((h_u,\eta _u,(\rho ^\gamma _u)_{\gamma \geqslant 1})\) where \((\rho ^\gamma _u)_{\gamma \geqslant 1}\) is an infinite sequence of i.i.d. samples from \(\eta _u\). Lastly, we join all variables in U to a new clause \(a_*\) (Fig. 8), which is labelled with the function

$$\begin{aligned}\varphi _{a_*}(\underline{{\gamma }}) =\sum _{\gamma \geqslant 1} z_\gamma \prod _{u\in U}\mathbf {1}\{ \gamma _u=\gamma \}\end{aligned}$$

for some sequence of (random) weights \((z_\gamma )_{\gamma \geqslant 1}\). Let \({\mathscr {H}}_{-1}\) denote the resulting graph.

Fig. 8

\({\mathscr {H}}_{-1}\)

Given \({\mathscr {H}}_{-1}\), let \(\mu _{{\mathscr {H}}_{-1}}\) be the associated Gibbs measure on configurations \((\underline{{\gamma }},\underline{{x}})\). Due to the definition of \(\varphi _{a_*}\), the support of \(\mu _{{\mathscr {H}}_{-1}}\) contains only those configurations where all the \(\gamma _u\) share a common value \(\gamma \), in which case we denote \((\underline{{\gamma }},\underline{{x}})\equiv (\gamma ,\underline{{x}})\). Explicitly,

$$\begin{aligned}\mu _{{\mathscr {H}}_{-1}}(\gamma ,\underline{{x}}) =\frac{1}{Z({\mathscr {H}}_{-1})} z_\gamma \prod _{v\in V} h_v(x_v) \prod _{a\in A} [1-\theta _a(\underline{{x}}_{\delta a})] \prod _{u\in U} (\rho ^\gamma h)_u(x_u)\,.\end{aligned}$$

We can then define an interpolating sequence \({\mathscr {H}}_{-1},\ldots , {\mathscr {H}}_{nd+1}\) precisely as in the proof of Theorem E.5, leaving \(a_*\) untouched. Let \(\mathscr {G}_r\) denote the graph \({\mathscr {H}}_r\) without the clause \(a_*\), and let \(Z_\gamma (\mathscr {G}_r)\) denote the partition function on \(\mathscr {G}_r\) restricted to configurations where \(\gamma _u=\gamma \) for all u. Then, for each \(0\leqslant r\leqslant nd+1\),

$$\begin{aligned}Z({\mathscr {H}}_r) =\sum _\gamma z_\gamma Z_\gamma (\mathscr {G}_r)\,.\end{aligned}$$

The proofs of Lemma E.6 and Corollary E.7 carry over to this setting with essentially no changes, giving

Corollary E.8

Under the assumptions above,

$$\begin{aligned}\mathbb {E}\ln Z({\mathscr {H}}_{-1}) \geqslant \mathbb {E}\ln Z({\mathscr {H}}_{nd+1}) -O_{\epsilon }(n^{2/3}),\end{aligned}$$

where the expectation \(\mathbb {E}\) is over the sequence of random graphs \(({\mathscr {H}}_r)_{-1 \leqslant r\leqslant nd+1}\).

The result of Corollary E.8 applies for any \((z_\gamma )_{\gamma \geqslant 1}\). Now take \((z_\gamma )_{\gamma \geqslant 1}\) to be a Poisson–Dirichlet process with parameter \(\lambda \in (0,1)\).Footnote 5 The process has the following invariance property (see e.g. [41, Ch. 2]):

Proposition E.9

Let \((z_\gamma )_{\gamma \geqslant 1}\) be a Poisson–Dirichlet process with parameter \(\lambda \in (0,1)\). Independently, let \((\xi _\gamma )_{\gamma \geqslant 1}\) be a sequence of i.i.d. positive random variables with finite second moment. Then the two sequences \((z_\gamma \xi _\gamma )_{\gamma \geqslant 1}\) and \((z_\gamma (\mathbb {E}\xi _1^\lambda )^{1/\lambda })_{\gamma \geqslant 1}\) have the same distribution, and consequently

$$\begin{aligned} \mathbb {E}\ln \sum _{\gamma \geqslant 1} z_\gamma \xi _\gamma =\frac{1}{\lambda }\ln \mathbb {E}\xi _1^\lambda \,. \end{aligned}$$
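Proposition E.9 lends itself to a Monte Carlo illustration. The sketch below assumes the standard representation of the weights, \(u_n=t_n^{-1/\lambda }\) for unit-rate Poisson arrival times \(t_n\), truncated at finitely many points and normalized; all numerical choices are illustrative. Taking \(\xi =e^g\) with g standard Gaussian gives \(\lambda ^{-1}\ln \mathbb {E}\xi ^\lambda =\lambda /2\) on the right-hand side:

```python
import math
import random

rng = random.Random(42)
lam = 0.5        # PD parameter in (0, 1)
N = 2000         # truncation: keep the N largest points

def pd_weights():
    # normalized Poisson-Dirichlet weights via u_n = t_n^{-1/lam},
    # t_n the arrival times of a unit-rate Poisson process (assumed rep.)
    t, us = 0.0, []
    for _ in range(N):
        t += rng.expovariate(1.0)
        us.append(t ** (-1.0 / lam))
    s = sum(us)
    return [u_ / s for u_ in us]

# invariance: E ln sum z*xi = (1/lam) ln E xi^lam for i.i.d. positive xi;
# with xi = exp(standard normal), E xi^lam = exp(lam^2 / 2) exactly
exact = lam / 2.0

trials = 200
est = 0.0
for _ in range(trials):
    z = pd_weights()
    est += math.log(sum(zi * math.exp(rng.gauss(0.0, 1.0)) for zi in z))
est /= trials
```

With the seed fixed, the empirical average over 200 draws should land within a few standard errors of \(\lambda /2=0.25\); the truncation bias is negligible here since the tail weights decay like \(n^{-1/\lambda }\).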

Proof of Theorem E.3

Consider \(\underline{{Z}}(\gamma ) \equiv (Z_\gamma (\mathscr {G}_r) )_{-1 \leqslant r \leqslant nd+1}\). If we condition on everything else except for the \(\rho \)’s, then \((\underline{{Z}}(\gamma ))_{\gamma \geqslant 1}\) is an i.i.d. sequence indexed by \(\gamma \). Let \(\mathbb {E}_{z,\rho }\) denote expectation over the z’s and \(\rho \)’s, conditioned on all else: then applying Proposition E.9 gives

$$\begin{aligned} n^{-1}\mathbb {E}\ln Z({\mathscr {H}}_{-1})&= (n\lambda )^{-1} \mathbb {E}\ln \mathbb {E}_{z,\rho } [ Z({\mathscr {G}}_{-1})^\lambda ] = \lambda ^{-1} \mathbb {E}\ln \mathbb {E}_{z,\rho }\bigg [ \bigg (\sum _{x\in \mathcal {X}} h(x) \prod _{a=1}^d \varvec{u}_a(x)\bigg )^\lambda \bigg ],\\ n^{-1}\mathbb {E}\ln Z({\mathscr {H}}_{nd+1})&=F_n + \lambda ^{-1}\mathbb {E}\ln \mathbb {E}_{z,\rho }[ (\varvec{u}_0)^\lambda ]. \end{aligned}$$

Combining with Corollary E.8 proves the result.\(\square \)

E.4. Extension to higher levels of RSB

We finally explain how Theorem E.3 can be extended relatively easily to cover the scenario of r-step replica symmetry breaking. Before stating the result, we define some notation (mainly following [41, §2.3]). Let \({\mathbb {N}}\) be the set of positive integers and \({\mathbb {N}}^r\) be its r-fold product; in particular, \({\mathbb {N}}^0\equiv \{\varnothing \}\). We consider arrays indexed by the set

$$\begin{aligned} {\mathcal {A}}\equiv \bigcup _{p=0}^r {\mathbb {N}}^p\,. \end{aligned}$$

We view \({\mathcal {A}}\) as a depth-r infinitary tree rooted at \(\varnothing \). For \(0\leqslant p\leqslant r-1\), each vertex \(\gamma =(\gamma _1,\ldots ,\gamma _p)\in {\mathbb {N}}^p\) has children \(\gamma n \equiv (\gamma _1,\dots ,\gamma _p,n)\in {\mathbb {N}}^{p+1}\). The leaves of the tree are in the last level \({\mathbb {N}}^r\). For \(\gamma \in {\mathbb {N}}^p\) write \(|\gamma |\equiv p\), and let \({\mathsf {p}}(\gamma )\) be the path from the root to \(\gamma \) (excluding the root):

$$\begin{aligned} {\mathsf {p}}( \gamma ) \equiv \bigg \{ \gamma _1,(\gamma _1,\gamma _2), \dots ,(\gamma _1,\dots ,\gamma _{p}) \bigg \}\,. \end{aligned}$$

Fix a sequence of parameters \( \underline{{m}} = (m_0,\dots ,m_{r-1})\) satisfying

$$\begin{aligned} 0< m_0< \dots< m_{r-1}< 1. \end{aligned}$$
(139)

For each \(\gamma \in {\mathcal {A}}{\setminus }{\mathbb {N}}^r\), let \(\Pi _\gamma \) be (independently of all else) a Poisson–Dirichlet point process with parameter \(m_{|\gamma |}\). Let \((u_{\gamma n})_{n\in {\mathbb {N}}}\) be the points of \(\Pi _\gamma \) arranged in decreasing order. As \(\gamma \) ranges over all of \({\mathcal {A}}\setminus {\mathbb {N}}^r\), we obtain an array \((u_\beta )_{\beta \in {\mathcal {A}}\setminus {\mathbb {N}}^0}\). Let

$$\begin{aligned}w_\gamma \equiv \prod _{\beta \in {\mathsf {p}}(\gamma )}u_\beta \,.\end{aligned}$$

The Ruelle probability cascade of parameter \(\underline{{m}}\) (hereafter \(\text {RPC}(\underline{{m}})\)) is defined as the \({\mathbb {N}}^r\)-indexed array

$$\begin{aligned}\nu _\gamma \equiv \frac{w_\gamma }{\sum _{\beta \in {\mathbb {N}}^r}w_\beta }\,.\end{aligned}$$

For the validity of the definition, see for instance [41, Lem. 2.4]. As in the 1rsb setting, we plan to apply the interpolation argument from the proof of Theorem E.5 to the modified graph \({\mathscr {H}}_{-1}\), where we “glue” multiple weighted copies of \(\mathscr {G}_{-1}\) together via the extra clause \(a_\star \). The only difference is that now the copies of \(\mathscr {G}_{-1}\) are indexed by \(\gamma \in {\mathbb {N}}^r\) instead of \({\mathbb {N}}\). More precisely, the extra spin at each vertex \(u\in U\) will take a value \(\gamma \in {\mathbb {N}}^r\); the label at each vertex \(u\in U\) will be \((h_u,\eta _u,(\rho ^\gamma _u)_{\gamma \in {\mathcal {A}}})\); and the function at \(a_\star \) will be

$$\begin{aligned} \phi _{a_\star } (\underline{{\gamma }}) = \sum _{\gamma \in {\mathbb {N}}^r} z_\gamma \prod _{u\in U} \mathbf {1}\{\gamma _u = \gamma \} , \end{aligned}$$
(140)

where \((z_\gamma )_{\gamma \in {\mathbb {N}}^r}\) is a \({\mathbb {N}}^r\)-indexed random array representing the weight of copy \(\gamma \in {\mathbb {N}}^r\). In the proof, we will choose \((z_\gamma )_{\gamma \in {\mathbb {N}}^r}\) according to the \(\text {RPC}(\underline{{m}})\) law.
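The \(\text {RPC}(\underline{{m}})\) array can be sketched numerically by truncating each \({\mathbb {N}}\) to N children, representing each Poisson–Dirichlet point process via \(u_n=t_n^{-1/m}\) for unit-rate Poisson arrival times (a standard representation, assumed here; all parameter values illustrative):

```python
import itertools
import random

rng = random.Random(7)
m = [0.3, 0.7]    # parameters with 0 < m_0 < m_1 < 1, so r = 2
r = len(m)
N = 40            # truncation: children kept per internal vertex

def pd_points(lam):
    # decreasing points of the PD point process with parameter lam,
    # via u_n = t_n^{-1/lam} for unit-rate Poisson arrival times t_n
    t, pts = 0.0, []
    for _ in range(N):
        t += rng.expovariate(1.0)
        pts.append(t ** (-1.0 / lam))
    return pts

# assign u_{gamma n} for every internal vertex gamma (depth < r)
u = {}
for depth in range(r):
    for gamma in itertools.product(range(N), repeat=depth):
        for n_, val in enumerate(pd_points(m[depth])):
            u[gamma + (n_,)] = val

# w_gamma: product of u along the root-to-leaf path; nu: normalization
w = {}
for leaf in itertools.product(range(N), repeat=r):
    wt = 1.0
    for p in range(1, r + 1):
        wt *= u[leaf[:p]]
    w[leaf] = wt
total = sum(w.values())
nu = {leaf: wt / total for leaf, wt in w.items()}
```

The resulting array `nu` is a random probability distribution on the (truncated) leaf set \({\mathbb {N}}^r\), which is exactly the role \((z_\gamma )_{\gamma \in {\mathbb {N}}^r}\) plays in (140).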

We now specify the labels \((\rho ^\gamma )_{\gamma \in {\mathcal {A}}}\) that will be used in the proof. Recall that \({\mathcal {M}}_0\) is the space of probability measures on the alphabet \({\mathcal {X}}\). We recursively define \({\mathcal {M}}_p\), for \(1\leqslant p\leqslant r\), to be the space of probability measures on \({\mathcal {M}}_{p-1}\). Now fix an element \(\rho ^\varnothing = \zeta \in {\mathcal {M}}_r\). For each \(0\leqslant p \leqslant r-1\) and \(\gamma \in {\mathbb {N}}^p\), suppose inductively that we have constructed \(\rho ^\gamma \in {\mathcal {M}}_{r-p}\); since \(\rho ^\gamma \) is a probability measure on \({\mathcal {M}}_{r-p-1}\), we can take \(\rho ^{\gamma n}\in {\mathcal {M}}_{r-p-1}\), for \(n\in {\mathbb {N}}\), to be i.i.d. samples from \(\rho ^\gamma \). The process terminates with the construction of \(\rho ^\gamma \in {\mathcal {M}}_0\) for each \(\gamma \in {\mathbb {N}}^r\). Define the \(\sigma \)-field

$$\begin{aligned} {\mathcal {F}}_p \equiv \sigma \bigg ( \Big ( (\rho ^\gamma )_{\gamma \in {\mathbb {N}}^s} \Big )_{s\leqslant p} \bigg )\,, \end{aligned}$$

and write \(\mathbb {E}_p\) for expectation conditional on \({\mathcal {F}}_p\). For any deterministic function \(V(u,\rho )\), any random variable U independent of \((\rho ^\gamma )_{\gamma \in {\mathbb {N}}^r}\), and any sequence of parameters \(\underline{{m}}\) satisfying (139), consider the random array \((V^\gamma )_{\gamma \in {\mathbb {N}}^r} \equiv (V(U,\rho ^\gamma ))_{\gamma \in {\mathbb {N}}^r}\). Let \({\mathsf {T}}_r(V) = V(U,\rho ^{\underline{{1}}})\), and for \(0\leqslant p\leqslant r-1\) let

$$\begin{aligned} {\mathsf {T}}_p(V) = \bigg \{ \mathbb {E}_p\Big ( {\mathsf {T}}_{p+1}(V) \Big )^{m_p} \bigg \}^{1/m_p} \end{aligned}$$

The resulting operator \({\mathsf {T}}_0\) depends implicitly on the distribution of U, the measure \(\rho ^\varnothing \in {\mathcal {M}}_r\), and the parameters \(\underline{{m}}\). The following lemma is a well-known property of the RPC.

Lemma E.10

([43, Prop. 2]) Let \((z_\gamma )_{\gamma \in {\mathbb {N}}^r}\) be the RPC with parameter \(\underline{{m}}\). Under the notations above,

$$\begin{aligned} \mathbb {E}\ln {\mathsf {T}}_0(V) = \mathbb {E}\ln \sum _{\gamma \in {\mathbb {N}}^r} z_\gamma V^\gamma . \end{aligned}$$
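The recursion defining \({\mathsf {T}}_p\) can be illustrated on a toy hierarchy of finitely supported measures, with each conditional expectation \(\mathbb {E}_p\) approximated by sampling. Everything below is an illustrative stand-in; a useful exact sanity check is that a constant function \(V\equiv c\) is a fixed point of the recursion:

```python
import random

rng = random.Random(3)
m = [0.3, 0.7]     # 0 < m_0 < m_1 < 1, so r = 2
r = len(m)
N = 400            # samples approximating each conditional expectation E_p

class Finite:
    # a finitely supported probability measure (stand-in for M_p elements)
    def __init__(self, atoms, probs):
        self.atoms, self.probs = atoms, probs
    def sample(self, rng):
        return rng.choices(self.atoms, weights=self.probs)[0]

def T(p, rho, V):
    # T_r(V) evaluates V at a level-0 object; for p < r,
    # T_p(V) = (E_p[ T_{p+1}(V)^{m_p} ])^{1/m_p}, E_p by Monte Carlo
    if p == r:
        return V(rho)
    vals = [T(p + 1, rho.sample(rng), V) ** m[p] for _ in range(N)]
    return (sum(vals) / N) ** (1.0 / m[p])

# toy hierarchy: rho_root in M_2 over two elements of M_1 over {1.0, 2.0}
rho_a = Finite([1.0, 2.0], [0.5, 0.5])
rho_b = Finite([1.0, 2.0], [0.2, 0.8])
rho_root = Finite([rho_a, rho_b], [0.5, 0.5])

# for constant V = c, every level of the recursion returns c (up to rounding)
c = 3.0
t0_const = T(0, rho_root, lambda x: c)
```

For non-constant V, \({\mathsf {T}}_0(V)\) interpolates between the plain average (\(m_p\rightarrow 1\)) and the essential supremum (\(m_p\rightarrow 0\)), which is the usual role of the breaking parameters.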

The next result generalizes Theorem E.3.

Theorem E.11

Consider a (random) Gibbs measure (137) satisfying assumptions (134)–(136). Write \((h\rho )_{a,j}(x) \equiv h_{a,j}(x)\rho _{a,j}(x)\). For each \(a\in F\) we define

$$\begin{aligned} \varvec{u}_a(x)&\equiv \sum _{\underline{{x}}\in \mathcal {X}^k} \mathbf {1}\{x_1=x\} [1-\theta _a(\underline{{x}})] \prod _{j=2}^k (h\rho )_{a,j}(x_j) ,\\ \varvec{u}_a&\equiv \sum _{\underline{{x}}\in \mathcal {X}^k} [1-\theta _a(\underline{{x}})] \prod _{j=1}^k (h\rho )_{a,j}(x_j)\,. \end{aligned}$$

Note that \(\varvec{u}_a(x)\) and \(\varvec{u}_a\) are deterministic functions of the variables \(\big (\theta _a, (h_{a,j})_{j\in [k]}, (\rho _{a,j})_{j\in [k]}\big )\). Let

$$\begin{aligned} \varvec{v}\equiv \sum _{x\in \mathcal {X}} h(x) \prod _{a=1}^d \varvec{u}_a(x) . \end{aligned}$$

For any \({\zeta \in {\mathcal {M}}_r}\) and sequence \(\underline{{m}}\) satisfying (139), let \((\rho ^\gamma )_{\gamma \in {\mathbb {N}}^r}\) be constructed as above, and let \((\rho ^\gamma _{a,j})_{\gamma \in {\mathbb {N}}^r}\) be i.i.d. copies indexed by \((a,j)\). Define \({\mathsf {T}}_0\) similarly as above, using the \(\sigma \)-fields

$$\begin{aligned} {\mathcal {F}}_p = \sigma \bigg ( (\rho ^\gamma _{a,j})_{\gamma \in {\mathbb {N}}^s}: a\in F, j\in [k], s\leqslant p \bigg )\,. \end{aligned}$$

Then the nonasymptotic free energy \(F_n \equiv n^{-1}\mathbb {E}_n\ln Z(\mathscr {G})\) satisfies the bound

$$\begin{aligned} F_n \leqslant \mathbb {E}\ln {\mathsf {T}}_0 (\varvec{v}) -(k-1)\alpha \mathbb {E}\ln {\mathsf {T}}_0(\varvec{u}_0) +O_{\epsilon }\bigg (\frac{1}{n^{1/3}}\bigg ) \end{aligned}$$

where \(\mathbb {E}\) denotes the expectation over \((\theta _a)_{a\in F}\) and \((h_{a,j})_{a\in F,j\in [k]}\).

Proof

As outlined above, we consider the modified graph \({\mathscr {H}}_{-1}\) where each vertex \(u\in U\) is independently labeled with \((h_u,\eta _u,(\rho ^\gamma _u)_{\gamma \in {\mathcal {A}}})\) and the extra clause \(a_\star \) is labeled with the function defined in (140). In this setting, each \(u\in U\) has spin value \((\gamma ,x)\in {\mathbb {N}}^r\times \mathcal {X}\). Since we are interested only in configurations \((\underline{{\gamma }},\underline{{x}})\) such that \(\gamma _u\equiv \gamma \) for all \(u\in U\), we write \((\gamma ,\underline{{x}})\) instead of \((\underline{{\gamma }},\underline{{x}})\) and define the Gibbs measure as

$$\begin{aligned} \mu _{{\mathscr {H}}_{-1}}(\gamma ,\underline{{x}}) =\frac{1}{Z({\mathscr {H}}_{-1})} z_\gamma \prod _{v\in V} h_v(x_v) \prod _{a\in A} [1-\theta _a(\underline{{x}}_{\delta a})] \prod _{u\in U} (\rho ^\gamma h)_u(x_u) \,. \end{aligned}$$

Sample the weights \((z_\gamma )_{\gamma \in {\mathbb {N}}^r}\) according to the law \(\text {RPC}(\underline{{m}})\). The result then follows by the proof of Theorem E.3, with Lemma E.10 replacing the role of Proposition E.9. \(\square \)

Cite this article

Sly, A., Sun, N. & Zhang, Y. The number of solutions for random regular NAE-SAT. Probab. Theory Relat. Fields 182, 1–109 (2022). https://doi.org/10.1007/s00440-021-01029-5


Mathematics Subject Classification

  • 60K35
  • 82B44