1 Introduction

1.1 Motivation

Spin systems on random graphs have turned out to be a source of extremely challenging problems at the junction of mathematical physics and combinatorics [42, 43]. Beyond the initial motivation of modelling disordered systems, applications have sprung up in areas as diverse as computational complexity, coding theory, machine learning and even screening for infectious diseases; e.g. [1, 16, 27, 41, 46, 48, 49]. Progress has been inspired largely by techniques from statistical physics, which to a significant extent still await a rigorous justification. The physicists’ sophisticated but largely heuristic tool is the Belief Propagation message passing scheme in combination with a functional called the Bethe free energy [40]. Roughly speaking, the fixed points of Belief Propagation are conjectured to correspond to the ‘pure states’ of the underlying distribution, with the Bethe functional gauging the relative weight of the different pure states. Yet at closer inspection matters are actually rather complicated. For instance, the system typically possesses spurious Belief Propagation fixed points without any actual combinatorial meaning, while other fixed points need not correspond to metastable states that attract dynamics such as the Glauber Markov chain [13, 17]. Generally, the mathematical understanding of the connection between Belief Propagation and dynamics leaves much to be desired.

In this paper we investigate the ferromagnetic Potts model on the random regular graph. Recall, for an integer \(q\ge 3\) and real \(\beta >0\), the Potts model on a graph \(G=(V,E)\) corresponds to a probability distribution \(\mu _{G,\beta }\) over all possible configurations \([q]^V\), commonly referred to as the Boltzmann/Gibbs distribution; the weight of a configuration \(\sigma \) in the distribution is defined as \(\mu _{G,\beta }(\sigma )=\textrm{e}^{\beta \mathcal {H}_G(\sigma )}/Z_\beta (G)\) where \(\mathcal {H}_G(\sigma )\) is the number of edges that are monochromatic under \(\sigma \), and \(Z_\beta (G)= \sum _{\tau \in [q]^V}\textrm{e}^{\beta \mathcal {H}_G(\tau )}\) is the normalising factor of the distribution. In physics jargon, \(\beta \) corresponds to the so-called inverse-temperature of the model, \(\mathcal {H}_G(\,\cdot \,)\) is known as the Hamiltonian, and \(Z_\beta (\,\cdot \,)\) is the partition function. Note, since \(\beta >0\), the Boltzmann distribution assigns greater weight to configurations \(\sigma \) where many edges join vertices of the same colour; thus, the pairwise interactions between vertices are ferromagnetic.

The Potts model on the d-regular random graph has two distinctive features. First, the local geometry of the random regular graph is essentially deterministic. For any fixed radius \(\ell \), the depth-\(\ell \) neighbourhood of all but a tiny number of vertices is just a d-regular tree. Second, the ferromagnetic nature of the model precludes replica symmetry breaking, a complex type of long-range correlations [40]. Given these features, it is conjectured that the model on the random regular graph behaves similarly to the model on the clique (the so-called mean-field case), and there has already been some preliminary evidence of this correspondence [6, 22, 23, 27, 33]. On the clique, the phase transitions are driven by a battle between two subsets of configurations (phases): (i) the paramagnetic/disordered phase, consisting of configurations where every colour appears a roughly equal number of times, and (ii) the ferromagnetic/ordered phase, where one of the colours appears more frequently than the others. It is widely believed that these two phases also mark (qualitatively) the same type of phase transitions for the Potts model on the random regular graph, yet establishing this rigorously has remained largely elusive.

The main reason that this behaviour is harder to establish on the random regular graph is its non-trivial global geometry, which makes both the analysis of the Boltzmann distribution and the analysis of Markov chains significantly more involved (to say the least). In particular, the emergence of the metastable states in the distribution, which can be established by way of calculus in the mean-field case, is out of reach of direct analytical approaches on the random regular graph, and it is therefore not surprising that it has resisted a detailed analysis so far. Likewise, the analysis of Markov chains is a far more complicated task, since their evolution needs to be tracked in terms of the graph geometry and is therefore much harder to control.

Our main contribution is to detail the emergence of the metastable states, viewed as fixed points of Belief Propagation on this model, and their connection with the dynamic evolution of the two most popular Markov chains, the Glauber dynamics and the Swendsen–Wang chain. We prove that these natural fixed points, whose emergence is directly connected with the phase transitions of the model, have the combinatorial meaning in terms of both the pure state decomposition of the distribution and the Glauber dynamics that physics intuition predicts they should. The proofs avoid the complicated moment calculations and the associated complex optimisation arguments that have become a hallmark of the study of spin systems on random graphs [3]. Instead, building upon and extending ideas from [5, 18], we exploit a connection between spatial mixing properties on the d-regular tree and the Boltzmann distribution. Our metastability results for the Potts model significantly refine those appearing in the literature, especially those in [27, 33] which are the most relevant to this work; see Sect. 1.6 for a more detailed discussion.

We expect that this approach might carry over to other examples, particularly other ferromagnetic models. Let us begin by recapitulating Belief Propagation.

1.2 Belief propagation

Suppose that \(n,d\ge 3\) are integers such that dn is even and let \(\mathbb {G}=\mathbb {G}(n,d)\) be the random d-regular graph on the vertex set \([n]=\{1,\ldots ,n\}\). For an inverse temperature parameter \(\beta >0\) and an integer \(q\ge 3\) we set out to investigate the Boltzmann distribution \(\mu _{\mathbb {G},\beta }\); let us write \(\varvec{\sigma }_{\mathbb {G},\beta }\) for a configuration drawn from \(\mu _{\mathbb {G},\beta }\).

A vital step toward understanding the Boltzmann distribution is to get a good handle on the partition function \(Z_{\beta }(\mathbb {G})\). Indeed, according to the physicists’ cavity method, Belief Propagation actually solves both problems in one fell swoop [40]. To elaborate, with each edge \(e=uv\) of \(\mathbb {G}\), Belief Propagation associates two messages \(\mu _{\mathbb {G},\beta ,u\rightarrow v},\mu _{\mathbb {G},\beta ,v\rightarrow u}\), which are probability distributions on the set [q] of colours. The message \(\mu _{\mathbb {G},\beta ,u\rightarrow v}(c)\) is defined as the marginal probability of v receiving colour c in a configuration drawn from the Potts model on the graph \(\mathbb {G}-u\) obtained by removing u. The semantics of \(\mu _{\mathbb {G},\beta ,v\rightarrow u}\) is analogous.

Under the assumption that the colours of far apart vertices of \(\mathbb {G}\) are asymptotically independent, one can heuristically derive a set of equations that links the various messages together. For a vertex v, let \(\partial v\) be the set of neighbours of v, and for an integer \(\ell \ge 1\) let \(\partial ^\ell v\) be the set of vertices at distance precisely \(\ell \) from v. The Belief Propagation equations read

$$\begin{aligned} \mu _{\mathbb {G},\beta ,v\rightarrow u}(c)&=\frac{\prod _{w\in \partial v{\setminus }\left\{ {u}\right\} }\left[ 1+(\textrm{e}^\beta -1)\mu _{\mathbb {G},\beta ,w\rightarrow v}(c)\right] }{\sum _{\chi \in [q]}\prod _{w\in \partial v{\setminus }\left\{ {u}\right\} }\left[ 1+(\textrm{e}^\beta -1)\mu _{\mathbb {G},\beta ,w\rightarrow v}(\chi )\right] }{} & {} (uv\in E(\mathbb {G}),\ c\in [q]). \end{aligned}$$
(1.1)

The insight behind (1.1) is that once we remove v from the graph, its neighbours \(w\ne u\) are typically far apart from one another because \(\mathbb {G}\) contains only a negligible number of short cycles. Hence, we expect that in \(\mathbb {G}-v\) the spins assigned to \(w\in \partial v{\setminus }\left\{ {u}\right\} \) are asymptotically independent. From this assumption it is straightforward to derive the sum–product formula (1.1).

A few obvious issues spring to mind. First, for large \(\beta \) it is not actually true that far apart vertices decorrelate. This is because at low temperature there are q different ferromagnetic pure states, one for each choice of the dominant colour. To break the symmetry between them one could introduce a weak external field that slightly boosts a specific colour or, more bluntly, confine oneself to the conditional distribution on the subspace where a specific colour dominates. In the definition of the messages and in (1.1) we should thus replace the Boltzmann distribution by the conditional distribution \(\mu _{\mathbb {G},\beta }(\,\cdot \,\mid S)\) for a suitable \(S\subseteq [q]^{n}\). Second, even for the conditional measure we do not actually expect (1.1) to hold precisely. This is because for any finite n minute correlations between far apart vertices are bound to remain.

Nonetheless, precise solutions \((\mu _{v\rightarrow u})_{uv\in E(\mathbb {G})}\) to (1.1) are still meaningful. They correspond to stationary points of a functional called the Bethe free energy, which connects Belief Propagation with the problem of approximating the partition function [52]. Given a collection \((\mu _{u\rightarrow v})_{uv\in E(\mathbb {G})}\) of probability distributions on [q], the Bethe functional reads

$$\begin{aligned} \begin{aligned} \mathcal {B}_{\mathbb {G},\beta }\left( {(\mu _{u\rightarrow v})_{uv\in E(\mathbb {G})}}\right)&=\frac{1}{n}\sum _{v\in V(\mathbb {G})}\log \bigg [\sum _{c\in [q]}\prod _{w\in \partial v}\left( 1+(\textrm{e}^\beta -1)\mu _{w\rightarrow v}(c)\right) \bigg ]\\&\quad -\frac{1}{n}\sum _{vw\in E(\mathbb {G})}\log \bigg [1+(\textrm{e}^\beta -1)\sum _{c\in [q]}\mu _{v\rightarrow w}(c)\mu _{w\rightarrow v}(c)\bigg ]. \end{aligned} \end{aligned}$$
(1.2)

According to the cavity method the maximum of \(\mathcal {B}_{\mathbb {G},\beta }\left( {(\mu _{u\rightarrow v})_{uv\in E(\mathbb {G})}}\right) \) over all solutions \((\mu _{u\rightarrow v})_{uv\in E(\mathbb {G})}\) to (1.1) should be asymptotically equal to \(\tfrac{1}{n}\log Z_\beta (\mathbb {G})\) with high probability.
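
To make (1.1) and (1.2) concrete, the following minimal sketch (purely illustrative; the graph \(K_4\) and all parameters are example choices, not part of our arguments) iterates the message update on a small 3-regular graph and evaluates the Bethe functional at the resulting messages, comparing with the exact value of \(\tfrac{1}{n}\log Z_\beta (G)\) obtained by brute force. On such a tiny graph with many short cycles the two quantities need not agree; the cavity prediction concerns large random regular graphs.

```python
# Illustrative sketch: Belief Propagation (1.1) and the Bethe functional (1.2)
# on K_4 (a 3-regular graph), compared against the exact (1/n) log Z_beta(G).
# All parameters below are example choices, not taken from the arguments above.
import itertools
import math

q, beta = 3, 0.5
n, edges = 4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
adj = {v: [u for e in edges for u in e if v in e and u != v] for v in range(n)}
ew = math.exp(beta) - 1.0

# mu[(u, v)] is the message mu_{u -> v}; start from the uniform distribution
mu = {(u, v): [1.0 / q] * q for a, b in edges for (u, v) in ((a, b), (b, a))}
for _ in range(200):                                  # iterate the update (1.1)
    new = {}
    for (u, v) in mu:
        un = [math.prod(1.0 + ew * mu[(w, u)][c] for w in adj[u] if w != v)
              for c in range(q)]
        new[(u, v)] = [x / sum(un) for x in un]
    mu = new

# Bethe functional (1.2): vertex contributions minus edge contributions
vertex = sum(math.log(sum(math.prod(1.0 + ew * mu[(w, v)][c] for w in adj[v])
                          for c in range(q))) for v in range(n))
edge = sum(math.log(1.0 + ew * sum(mu[(u, v)][c] * mu[(v, u)][c] for c in range(q)))
           for u, v in edges)
print("Bethe value:", (vertex - edge) / n)

# exact (1/n) log Z_beta(G) by enumerating all q^n configurations
Z = sum(math.exp(beta * sum(s[u] == s[v] for u, v in edges))
        for s in itertools.product(range(q), repeat=n))
print("exact (1/n) log Z:", math.log(Z) / n)
```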

In summary, physics lore holds that the solutions \((\mu _{u\rightarrow v})_{uv\in E(\mathbb {G})}\) to (1.1) are meaningful because they correspond to a decomposition of the phase space \([q]^{n}\) into pieces where long-range correlations are absent. Indeed, these “pure states” are expected to exhibit metastability, i.e., they trap dynamics such as the Glauber Markov chain for an exponential amount of time. Moreover, the relative probabilities of the pure states are expected to be governed by their respective Bethe free energy. In the following we undertake to investigate these claims rigorously.

Before proceeding, let us mention that ferromagnetic spin systems on random graphs have been among the first models for which predictions based on the cavity method could be verified rigorously. Following seminal work by Dembo and Montanari on the Ising model [21] vindicating the “replica symmetric ansatz”, Dembo et al. [23] studied, among other things, the Gibbs uniqueness phase of the Potts ferromagnet on the random regular graph, and Dembo, Montanari, Sly and Sun [23] established the free energy of the model for all \(\beta \) (and d even). More generally, Ruozzi [47] pointed out how graph covers [51] can be used to investigate the partition function of supermodular models, of which the Ising ferromagnet is an example. In addition, Barbier, Chan and Macris [6] proved that ferromagnetic spin systems on random graphs are generally replica symmetric in the sense that the multi-overlaps of samples from the Boltzmann distribution concentrate on deterministic values.

1.3 The ferromagnetic and the paramagnetic states

An obvious attempt at constructing solutions to the Belief Propagation equations is to choose identical messages \(\mu _{u\rightarrow v}\) for all edges \(uv\in E(\mathbb {G})\). Clearly, any solution \((\mu (c))_{c\in [q]}\) to the system

$$\begin{aligned} \mu (c)&=\frac{(1+(\textrm{e}^\beta -1)\mu (c))^{d-1}}{\sum _{\chi \in [q]}(1+(\textrm{e}^\beta -1)\mu (\chi ))^{d-1}}&(c\in [q]) \end{aligned}$$
(1.3)

supplies such a ‘constant’ solution to (1.1). Let \(\mathcal {F}_{d,\beta }\) be the set of all solutions \((\mu (c))_{c\in [q]}\) to (1.3). The Bethe functional (1.2) then simplifies to

$$\begin{aligned} \mathcal {B}_{d,\beta }\big ((\mu (c))_{c\in [q]}\big )&=\log \bigg [\sum _{c\in [q]}\left( {1+(\textrm{e}^\beta -1)\mu (c)}\right) ^d\bigg ]-\frac{d}{2}\log \bigg [1+(\textrm{e}^\beta -1)\sum _{c\in [q]}\mu (c)^2\bigg ]. \end{aligned}$$
(1.4)

One obvious solution to (1.3) is the uniform distribution on [q]; we refer to that solution as paramagnetic/disordered and denote it by \(\mu _{\textrm{p}}\). Apart from \(\mu _{\textrm{p}}\), other solutions to (1.3) emerge as \(\beta \) increases for any \(d\ge 3\). Specifically, let \({\beta _u}>0\) be the supremum of the values \(\beta >0\) for which \(\mu _{\textrm{p}}\) is the unique solution to (1.3). Then, for \(\beta = {\beta _u}\), one more solution \(\mu _{\textrm{f}}\) emerges such that \(\mu _{\textrm{f}}(1)>\mu _{\textrm{f}}(i)=\tfrac{1-\mu _{\textrm{f}}(1)}{q-1}\) for \(i=2,\ldots ,q\), portending the emergence of a metastable state and, ultimately, a phase transition. In particular, for any \(\beta > {\beta _u}\), a bit of calculus reveals that there exist either one or two distinct solutions \(\mu \) with \(\mu (1)>\mu (i)=\tfrac{1-\mu (1)}{q-1}\) for \(i=2,\ldots ,q\); we denote by \(\mu _{\textrm{f}}\) the solution of (1.3) which maximises the value \(\mu (1)\) and refer to it as ferromagnetic/ordered. The value \({\beta _u}\) is the so-called uniqueness threshold for the Potts model on the d-regular tree; see, e.g., [27] for a more detailed discussion and related pointers.
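
As a purely numerical illustration (not used in the proofs), the ferromagnetic solution can be located by iterating (1.3) within the one-parameter family \(\mu (1)=x\), \(\mu (i)=\tfrac{1-x}{q-1}\) for \(i\ge 2\), starting from a strongly biased point; the sketch below also evaluates the Bethe functional (1.4) at \(\mu _{\textrm{f}}\) and \(\mu _{\textrm{p}}\). The values of d, q and \(\beta \) below are example choices for which the iteration does settle on a non-uniform solution.

```python
# Illustrative sketch: find the ferromagnetic solution of (1.3) numerically by
# iterating within the family mu(1) = x, mu(i) = (1 - x)/(q - 1), i >= 2, from a
# strongly biased start, and evaluate the Bethe functional (1.4) at mu_f and mu_p.
# d, q, beta are example choices (for them the iteration settles on x > 1/q).
import math

d, q, beta = 3, 3, 1.36
ew = math.exp(beta) - 1.0

def bethe(mu):                                   # the Bethe functional (1.4)
    return (math.log(sum((1.0 + ew * m) ** d for m in mu))
            - 0.5 * d * math.log(1.0 + ew * sum(m * m for m in mu)))

x = 0.99                                         # biased start selects the ferromagnetic branch
for _ in range(5000):                            # iterate the right-hand side of (1.3)
    mu = [x] + [(1.0 - x) / (q - 1)] * (q - 1)
    w = [(1.0 + ew * m) ** (d - 1) for m in mu]
    x = w[0] / sum(w)

mu_f = [x] + [(1.0 - x) / (q - 1)] * (q - 1)
mu_p = [1.0 / q] * q
print(x, bethe(mu_f), bethe(mu_p))
```

Starting instead from the uniform distribution, the iteration simply stays at \(\mu _{\textrm{p}}\); which of the two Bethe values is the larger one depends on the position of \(\beta \) relative to the threshold \({\beta _p}\) introduced next.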

At the critical value

$$\begin{aligned} {\beta _p}&=\max \left\{ {\beta \ge {\beta _u}:\mathcal {B}_{d,\beta }(\mu _{\textrm{p}})\ge \mathcal {B}_{d,\beta }(\mu _{\textrm{f}})}\right\} =\log \frac{q-2}{(q-1)^{1-2/d}-1} \end{aligned}$$

the ferromagnetic solution \(\mu _{\textrm{f}}\) takes over from the paramagnetic solution \(\mu _{\textrm{p}}\) as the global maximiser of the Bethe functional. For that reason, the threshold \({\beta _p}\) is also known in the literature as the ordered-disordered threshold. Yet, up to the threshold

$$\begin{aligned} {\beta _h}&=\log (1+q/(d-2)) \end{aligned}$$

the paramagnetic solution remains a local maximiser of the Bethe free energy; later, in Sect. 2.3 we will see that \({\beta _h}\) has a natural interpretation as a tree-broadcasting threshold (and is also a conjectured threshold for uniqueness in the random-cluster representation for the Potts model, see [32] for details).
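
For concreteness, at the smallest admissible values \(d=q=3\) these two formulas evaluate to

$$\begin{aligned} {\beta _p}=\log \frac{1}{2^{1/3}-1}\approx 1.3475,\qquad {\beta _h}=\log 4\approx 1.3863, \end{aligned}$$

so the window in which the paramagnetic solution survives only as a local (rather than global) maximiser is already rather narrow for small d and q.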

The relevance of these critical values has been demonstrated in [27] (see also [22] for d even, and [33] for q large), where it was shown that \(\tfrac{1}{n}\log Z_\beta (\mathbb {G})\) is asymptotically equal to \(\max _{\mu } \mathcal {B}_{d,\beta }(\mu )\), the maximum ranging over \(\mu \) satisfying (1.3). In particular, at the maximum it holds that \(\mu =\mu _{\textrm{p}}\) when \(\beta <{\beta _p}\), \(\mu =\mu _{\textrm{f}}\) when \(\beta >{\beta _p}\) and \(\mu \in \{\mu _{\textrm{p}},\mu _{\textrm{f}}\}\) when \(\beta ={\beta _p}\).

1.4 Slow mixing and metastability

To investigate the two BP solutions further and obtain connections to the dynamical evolution of the model, we need to look more closely at how these two solutions \(\mu _{\textrm{p}},\mu _{\textrm{f}}\) manifest themselves in the random regular graph. To this end, we define for a given distribution \(\mu \) on [q] another distribution

$$\begin{aligned} \nu ^\mu (c)&=\frac{(1+(\textrm{e}^\beta -1)\mu (c))^{d}}{\sum _ {\chi \in [q]}(1+(\textrm{e}^\beta -1)\mu (\chi ))^{d}}&(c\in [q]). \end{aligned}$$
(1.5)

Let \(\nu _{\textrm{f}}=\nu ^{\mu _{\textrm{f}}}\) and \(\nu _{\textrm{p}}=\nu ^{\mu _{\textrm{p}}}\) for brevity; of course \(\nu _{\textrm{p}}=\mu _{\textrm{p}}\) is just the uniform distribution. The distributions \(\nu _{\textrm{f}}\) and \(\nu _{\textrm{p}}\) represent the expected Boltzmann marginals within the pure states corresponding to \(\mu _{\textrm{f}}\) and \(\mu _{\textrm{p}}\). Indeed, the r.h.s. of (1.5) resembles that of (1.3) except that the exponents read d rather than \(d-1\). This means that we pass from messages, where we omit one specific endpoint of an edge from the graph, to actual marginals, where all d neighbours of a vertex are present. For small \(\varepsilon >0\), it will therefore be relevant to consider the sets of configurations

$$\begin{aligned} S_{\textrm{f}}(\varepsilon )&=\bigg \{\sigma \in [q]^{n}:\sum _{c\in [q]}\Big |\big | \sigma ^{-1}(c) \big |-n\nu _{\textrm{f}}(c)\Big |<\varepsilon n\bigg \},\\ S_{\textrm{p}}(\varepsilon )&=\bigg \{\sigma \in [q]^{n}:\sum _{c\in [q]}\Big |\big | \sigma ^{-1}(c) \big |-n\nu _{\textrm{p}}(c)\Big |<\varepsilon n\bigg \}, \end{aligned}$$

whose colour statistics are about \(n\nu _{\textrm{f}}\) and \(n\nu _{\textrm{p}}\), respectively; i.e., in \(S_{\textrm{p}}\), all colours appear with roughly equal frequency, whereas in \(S_{\textrm{f}}\) colour 1 is favoured over the other \(q-1\) colours (which appear with roughly equal frequency).

We are now in position to state our main result for Glauber dynamics. Recall that, for a graph \(G=(V,E)\), Glauber is initialised at a configuration \(\sigma _0\in [q]^{V}\); at each time step \(t\ge 1\), Glauber draws a vertex uniformly at random and obtains a new configuration \(\sigma _t\) by updating the colour of the chosen vertex according to the conditional Boltzmann distribution given the colours of its neighbours. It is a well-known fact that Glauber converges in distribution to \(\mu _{G,\beta }\); the mixing time of the chain is defined as the maximum number of steps t needed to get within total variation distance \(\le 1/4\) from \(\mu _{G,\beta }\), where the maximum is over the choice of the initial configuration \(\sigma _0\), i.e., the quantity \(\max _{\sigma _0}\min \{t:\,d_{\textrm{TV}}( \sigma _t,\mu _{G,\beta })\le 1/4\}\).
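
The following minimal sketch (an illustration on an arbitrary small graph, with example parameters) implements the single-site update just described: the chosen vertex is recoloured with probability proportional to \(\textrm{e}^{\beta k_c}\), where \(k_c\) denotes the number of its neighbours currently coloured c.

```python
# Illustrative sketch of one Glauber update as described above; the example
# graph (a 4-cycle), q and beta are arbitrary small choices.
import math
import random

def glauber_step(sigma, adj, q, beta):
    v = random.randrange(len(sigma))             # pick a uniformly random vertex
    # conditional Boltzmann weights: exp(beta * #neighbours of v with colour c)
    weights = [math.exp(beta * sum(sigma[u] == c for u in adj[v])) for c in range(q)]
    sigma[v] = random.choices(range(q), weights=weights)[0]

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # the 4-cycle
sigma = [0, 0, 0, 0]                                 # an "ordered" initial configuration
for _ in range(1000):
    glauber_step(sigma, adj, 3, 0.8)
print(sigma)
```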

For metastability, we will consider Glauber launched from a random configuration from a subset \(S\subseteq [q]^{V}\) of the state space. More precisely, let us denote by \(\mu _{G,\beta ,S}=\mu _{G,\beta }(\cdot \mid S)\) the conditional Boltzmann distribution on S. We call S a metastable state for Glauber dynamics on G if there exists \(\delta >0\) such that

$$\begin{aligned} \mathbb {P}\left[{\min \{t:\,\sigma _t\not \in S\}\le \textrm{e}^{\delta |V|}\mid \sigma _0\sim \mu _{G,\beta ,S}}\right]\le \textrm{e}^{-\delta |V|}. \end{aligned}$$

Hence, it will most likely take Glauber an exponential amount of time to escape from a metastable state.

Theorem 1.1

Let \(d,q\ge 3\) be integers and \(\beta >0\) be real. Then, for all sufficiently small \(\varepsilon >0\), the following hold w.h.p. over the choice of \(\mathbb {G}=\mathbb {G}(n,d)\).

  1. (i)

    If \(\beta <{\beta _h}\), then \(S_{\textrm{p}}(\varepsilon )\) is a metastable state for Glauber dynamics on \(\mathbb {G}\).

  2. (ii)

    If \(\beta >{\beta _u}\), then \(S_{\textrm{f}}(\varepsilon )\) is a metastable state for Glauber dynamics on \(\mathbb {G}\).

Further, for \(\beta >{\beta _u}\), the mixing time of Glauber is \(\textrm{e}^{\Omega (n)}\).

Thus, we can summarise the evolution of the Potts model as follows. For \(\beta <{\beta _u}\) there is no ferromagnetic state. As \(\beta \) passes \({\beta _u}\), the ferromagnetic state \(S_{\textrm{f}}\) emerges first as a metastable state. Hence, if we launch Glauber from \(S_{\textrm{f}}\), the dynamics will most likely remain trapped in the ferromagnetic state for an exponential amount of time, even though the Boltzmann weight of the paramagnetic state is exponentially larger (as we shall see in the next section). At the point \({\beta _p}\) the ferromagnetic state then takes over as the one dominating the Boltzmann distribution, but the paramagnetic state remains as a metastable state up to \({\beta _h}\). Note in particular that the two states coexist as metastable states throughout the interval \(({\beta _u},{\beta _h})\).

The metastability for the Potts model manifests also in the evolution of the Swendsen–Wang (SW) chain, which is another popular and substantially more elaborate chain that makes non-local moves, based on the random-cluster representation of the model. For a graph \(G=(V,E)\) and a configuration \(\sigma \in [q]^V\), a single iteration of SW starting from \(\sigma \) consists of two steps.

  • Percolation step: Let \(M=M(\sigma )\) be the random edge-set obtained by adding (independently) each monochromatic edge under \(\sigma \) with probability \(p=1-\textrm{e}^{-\beta }\).

  • Recolouring step: Obtain the new \(\sigma '\in [q]^V\) by assigning each component of the graph \((V,M)\) a uniformly random colour from [q]; for \(v\in V\), we set \(\sigma _v'\) to be the colour assigned to v’s component. A minimal illustrative sketch of one full iteration is given below.
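
This sketch (an illustration only; the union–find bookkeeping and the toy graph are incidental choices) combines the two steps into a single routine.

```python
# Illustrative sketch of one Swendsen-Wang iteration as described in the two
# steps above; the union-find helper and the toy 4-cycle are example choices.
import math
import random

def swendsen_wang_step(sigma, edges, q, beta):
    n, p = len(sigma), 1.0 - math.exp(-beta)
    parent = list(range(n))                      # union-find over the vertex set

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]        # path halving
            v = parent[v]
        return v

    # percolation step: keep each monochromatic edge independently with probability p
    for u, v in edges:
        if sigma[u] == sigma[v] and random.random() < p:
            parent[find(u)] = find(v)

    # recolouring step: every component of (V, M) receives a fresh uniform colour
    colour = {}
    for v in range(n):
        r = find(v)
        if r not in colour:
            colour[r] = random.randrange(q)
        sigma[v] = colour[r]

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
sigma = [0, 1, 0, 1]
for _ in range(100):
    swendsen_wang_step(sigma, edges, 3, 1.0)
print(sigma)
```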

We define metastable states for SW dynamics analogously to the above. The following theorem establishes the analogue of Theorem 1.1 for the non-local SW dynamics. Note here that SW might change the most-frequent colour due to the recolouring step, so the metastability statement for the ferromagnetic phase needs to consider the set \(S_{\textrm{f}}(\varepsilon )\) together with its \(q-1\) permutations.

Theorem 1.2

Let \(d,q\ge 3\) be integers and \(\beta >0\) be real. Then, for all sufficiently small \(\varepsilon >0\), the following hold w.h.p. over the choice of \(\mathbb {G}=\mathbb {G}(n,d)\).

  1. (i)

    If \(\beta <{\beta _h}\), then \(S_{\textrm{p}}(\varepsilon )\) is a metastable state for SW dynamics on \(\mathbb {G}\).

  2. (ii)

    If \(\beta >{\beta _u}\), then \(S_{\textrm{f}}(\varepsilon )\) together with its \(q-1\) permutations is a metastable state for SW dynamics on \(\mathbb {G}\).

Further, for \(\beta \in ({\beta _u},{\beta _h})\), the mixing time of SW is \(\textrm{e}^{\Omega (n)}\).

1.5 The relative weight of the metastable states

At the heart of obtaining the metastability results of the previous section is a refined understanding of the relative weight of the ferromagnetic and paramagnetic states. The following notion of non-reconstruction will be the key in our arguments; it captures the absence of long-range correlations within a set \(S\subseteq [q]^{n}\), saying that, for any vertex v, a typical boundary configuration on \(\varvec{\sigma }_{\partial ^\ell v}\) chosen according to the conditional distribution on S does not impose a discernible bias on the colour of v (for large \(\ell ,n\); recall, \(\partial ^\ell v\) is the set of all vertices at distance precisely \(\ell \) from v). More precisely, let \(\mu =\mu _{\mathbb {G},\beta }\) and \(\varvec{\sigma }\sim \mu \); the Boltzmann distribution exhibits non-reconstruction given a subset \(S\subseteq [q]^{n}\) if for any vertex v it holds that

$$\begin{aligned} \lim _{\ell \rightarrow \infty }\limsup _{n\rightarrow \infty } \sum _{c\in [q]}\sum _{\tau \in S}\mathbb {E}\left[{\mu (\tau \mid S) \times \left| {\mu (\varvec{\sigma }_{v}=c\mid \varvec{\sigma }_{\partial ^\ell v}=\tau _{\partial ^\ell v})-\mu (\varvec{\sigma }_{v}=c\mid S)}\right| }\right]=0, \end{aligned}$$

where the expectation is over the choice of the graph \(\mathbb {G}\).

Theorem 1.3

Let \(d,q\ge 3\) be integers and \(\beta >0\) be real. The following hold for all sufficiently small \(\varepsilon >0\) as \(n\rightarrow \infty \).

  1. (i)

    For all \(\beta <{\beta _p}\), \(\mathbb {E}\left[{\mu _{\mathbb {G},\beta }(S_{\textrm{p}})}\right]\rightarrow 1\) and, if \(\beta >{\beta _u}\), then \(\mathbb {E}\left[{\tfrac{1}{n}\log \mu _{\mathbb {G},\beta }(S_{\textrm{f}})}\right]\rightarrow \mathcal {B}_{d,\beta }(\mu _{\textrm{f}})-\mathcal {B}_{d,\beta }(\mu _{\textrm{p}})\).

  2. (ii)

    For all \(\beta >{\beta _p}\), \(\mathbb {E}\left[{\mu _{\mathbb {G},\beta }(S_{\textrm{f}})}\right]\rightarrow 1/q\) and, if \(\beta <{\beta _h}\), then \(\mathbb {E}\left[{\tfrac{1}{n}\log \mu _{\mathbb {G},\beta }(S_{\textrm{p}})}\right]\rightarrow \mathcal {B}_{d,\beta }(\mu _{\textrm{p}})-\mathcal {B}_{d,\beta }(\mu _{\textrm{f}})\).

Furthermore, the Boltzmann distribution given \(S_{\textrm{p}}\) exhibits non-reconstruction if \(\beta <{\beta _h}\) and the Boltzmann distribution given \(S_{\textrm{f}}\) exhibits non-reconstruction if \(\beta >{\beta _u}\).

Theorem 1.3 shows that for \(\beta <{\beta _p}\) the Boltzmann distribution is dominated by the paramagnetic state \(S_{\textrm{p}}\). Nonetheless, at \({\beta _u}\) the ferromagnetic state \(S_{\textrm{f}}\) and its \(q-1\) mirror images start to emerge. Their probability mass is determined by the Bethe free energy evaluated at \(\mu _{\textrm{f}}\). Further, as \(\beta \) passes \({\beta _p}\) the ferromagnetic state takes over as the dominant state, with the paramagnetic state lingering on as a sub-dominant state up to \({\beta _h}\). Finally, both states \(S_{\textrm{p}}\) and \(S_{\textrm{f}}\) are free from long-range correlations, both in the regime of \(\beta \) where they dominate and in the regime where they are sub-dominant.

1.6 Discussion

Our slow mixing result for Glauber dynamics when \(\beta >{\beta _u}\) (Theorem 1.1) significantly improves upon previous results of Bordewich et al. [11] that applied to \(\beta >{\beta _u}+\Theta _q(1)\). Similarly, our slow mixing result for Swendsen–Wang dynamics when \(\beta \in ({\beta _u},{\beta _h})\) (Theorem 1.2) strengthens earlier results of Galanis et al. [27], which applied to \(\beta ={\beta _p}\), and of Helmuth et al. [33], which applied to a small interval around \({\beta _p}\); both results applied only for q sufficiently large. To obtain our result for all integers \(q,d\ge 3\), we need to carefully track how SW evolves on the random regular graph for configurations starting from the ferromagnetic and paramagnetic phases, by accounting for the percolation step via delicate arguments, whereas the approaches of [27, 33] side-stepped this analysis by considering the change in the number of monochromatic edges instead.

Our slow mixing results complement the recent fast mixing result of Blanca and Gheissari [8] for edge dynamics on the random d-regular graph that applies to all \(\beta <{\beta _u}\). Roughly, edge dynamics is the analogue of Glauber dynamics for the random-cluster representation of the Potts model (the random-cluster representation has nicer monotonicity properties). The result of [8] already implies a polynomial bound on the mixing time of SW when \(\beta <{\beta _u}\) (due to comparison results by Ullrich that apply to general graphs [50]), and conversely our exponential lower bound on the mixing time of SW for \(\beta \in ({\beta _u},{\beta _h})\) implies an exponential lower bound on the mixing time of edge dynamics for \(\beta \in ({\beta _u},{\beta _h})\). The main open questions remaining are therefore showing whether Glauber dynamics for the Potts model mixes fast when \(\beta \le {\beta _u}\) and whether SW/edge-dynamics mixes fast when \(\beta \ge {\beta _h}\). Extrapolating from the mean-field case (see discussion below), it is natural to conjecture that our slow mixing results are best-possible, i.e., for \(\beta \le {\beta _u}\), Glauber mixes rapidly and similarly, for \(\beta \notin ({\beta _u},{\beta _h})\), SW mixes rapidly on the random regular graph.

Theorem 1.3, aside from being critical in establishing the aforementioned slow mixing and metastability results, is the first to establish for all \(q,d\ge 3\) the coexistence of the ferromagnetic and paramagnetic phases for all \(\beta \) in the interval \(({\beta _u},{\beta _h})\) and to detail the logarithmic order of their relative weight in the same interval. Previous work in [27] showed coexistence for \(\beta ={\beta _p}\) (for all \(q,d\ge 3\)) and [33] for \(\beta \) in a small interval around \({\beta _p}\) (for large q and \(d\ge 5\)). Together with Theorems 1.1 and 1.2, Theorem 1.3 delineates more firmly the correspondence with the (simpler) mean-field case, the Potts model on the clique. In the mean-field case, there are qualitatively similar thresholds \({\beta _u},{\beta _p},{\beta _h}\) and the mixing times of Glauber and SW have been detailed for all \(\beta \), even at criticality, see [9, 10, 20, 26, 29, 31, 37]. As mentioned earlier, the most tantalising question remaining open is to establish whether the fast mixing of SW for \(\beta ={\beta _u}\) and \(\beta \ge {\beta _h}\) in the mean-field case translates to the random regular graph as well. Another interesting direction is to extend our arguments to the random-cluster representation of the Potts model for all non-integer \(q\ge 1\); note that the arguments of [7, 33] do apply to non-integer q (\(q\ge 1\) and q large, respectively). The proof of Theorem 1.3 relies on a truncated second moment computation, an argument that was applied to different models in [15, 18].

We further remark here that, from a worst-case perspective, it is known that sampling from the Potts model on d-regular graphs is #BIS-hard for \(\beta >{\beta _p}\) [27], and we conjecture that the problem admits a poly-time approximation algorithm when \(\beta <{\beta _p}\). However, even showing that Glauber mixes fast on any d-regular graph in the uniqueness regime \(\beta <{\beta _u}\) is a major open problem, and Theorems 1.1 and 1.2 further demonstrate that getting an algorithm all the way to \({\beta _p}\) will require using different techniques. To this end, progress has been made in [12, 19] where an efficient algorithm is obtained asymptotically up to \(\beta _p\) for large q and d using cluster-expansion methods. More precise results have been shown on the random regular graph: [33] obtained an algorithm for \(d\ge 5\) and q large that applies to all \(\beta \) by sampling from each phase separately based also on cluster-expansion methods; also, for \(\beta <{\beta _p}\), Efthymiou [24] gives an algorithm with weaker approximation guarantees but which applies to all \(q,d\ge 3\) (see also [7]). In principle, and extrapolating again from the mean-field case, one could use Glauber/SW to sample from each phase on the random regular graph for all \(q,d\ge 3\) and all \(\beta \). Analysing such chains appears to be relatively far from the reach of current techniques even in the case of the random regular graph, let alone worst-case graphs. In the case of the Ising model however, the case \(q=2\), the analogue of this fast mixing question has recently been established for sufficiently large \(\beta \) in [30] on the random regular graph and the grid, exploiting certain monotonicity properties.

Finally, let us note that the case of the grid has qualitatively different behaviour from the mean-field and random-regular cases. There, the three critical points coincide and the behaviour at criticality depends on the value of q; the mixing time of Glauber and SW has largely been detailed, see [9, 28, 39].

2 Overview

In this section we give an overview of the proofs of Theorems 1.1–1.3. Fortunately, we do not need to start from first principles. Instead, we build upon the formula for the partition function \(Z_\beta (\varvec{G})\) and its proof via the second moment method from [27]. Additionally, we are going to seize upon facts about the non-reconstruction properties of the Potts model on the random \((d-1)\)-ary tree, also from [27]. We will combine these tools with an auxiliary random graph model known as the planted model, which also plays a key role in the context of inference problems on random graphs [17].

2.1 Preliminaries

Throughout most of the paper, instead of the simple random regular graph \(\mathbb {G}\), we are going to work with the random d-regular multi-graph \(\varvec{G}=\varvec{G}(n,d)\) drawn from the pairing model. Recall that \(\varvec{G}\) is obtained by creating d clones of each of the vertices from [n], choosing a random perfect matching of the complete graph on \([n]\times [d]\) and subsequently contracting the vertices \(\{i\}\times [d]\) into a single vertex i, for all \(i\in [n]\). It is well-known that \(\mathbb {G}\) is contiguous with respect to \(\varvec{G}\) [35], i.e., any property that holds w.h.p. for \(\varvec{G}\) also holds w.h.p. for \(\mathbb {G}\).

For a configuration \(\sigma \in [q]^{V(G)}\) define a probability distribution \(\nu ^\sigma \) on [q] by letting

$$\begin{aligned} \nu ^\sigma (s)&=|\sigma ^{-1}(s)|/n{} & {} (s\in [q]). \end{aligned}$$

In words, \(\nu ^\sigma \) is the empirical distribution of the colours under \(\sigma \). Similarly, let \(\rho ^{G,\sigma }\in \mathcal {P}([q]\times [q])\) be the edge statistics of a given graph/colouring pair, i.e.,

$$\begin{aligned} \rho ^{G,\sigma }(s,t)&=\frac{1}{2|E(G)|}\sum _{u,v\in V(G)}\varvec{1}\{uv\in E(G),\,\sigma _u=s,\,\sigma _v=t\}. \end{aligned}$$
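
For illustration, the following sketch (with small example parameters) draws a multigraph from the pairing model just described and computes the statistics \(\nu ^\sigma \) and \(\rho ^{G,\sigma }\) for a random configuration.

```python
# Illustrative sketch: sample a d-regular multigraph from the pairing model and
# compute the colour statistics nu^sigma and edge statistics rho^{G,sigma}.
# The parameters n, d, q are small example values (dn must be even).
import random
from collections import Counter

def pairing_model(n, d):
    clones = [v for v in range(n) for _ in range(d)]      # d clones per vertex
    random.shuffle(clones)                                # a uniform perfect matching
    return [(clones[2 * i], clones[2 * i + 1]) for i in range(n * d // 2)]

def statistics(edges, sigma, n, q):
    nu = [sum(s == c for s in sigma) / n for c in range(q)]
    rho = Counter()
    for u, v in edges:                                    # each edge contributes to (s,t) and (t,s)
        rho[(sigma[u], sigma[v])] += 0.5 / len(edges)
        rho[(sigma[v], sigma[u])] += 0.5 / len(edges)
    return nu, dict(rho)

n, d, q = 10, 3, 3
edges = pairing_model(n, d)
sigma = [random.randrange(q) for _ in range(n)]
print(statistics(edges, sigma, n, q))
```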

We are going to need the following elementary estimate of the number of d-regular multigraphs G that attain a specific \(\rho ^{G,\sigma }\).

Lemma 2.1

([14, Lemma 2.7]). Suppose that \(\sigma \in [q]^n\). Moreover, suppose that \(\rho =(\rho (s,t))_{s,t\in [q]}\) is a symmetric matrix with positive entries such that \(dn\rho (s,t)\) is an integer for all \(s,t\in [q]\), \(dn\rho (s,s)\) is even for all \(s\in [q]\) and \(\sum _{t=1}^q\rho (s,t)=\nu ^{\sigma }(s)\) for all \(s\in [q]\). Let \(\mathcal {G}(\sigma ,\rho )\) be the event that \(\rho ^{\varvec{G},\sigma }=\rho \).

Then

$$\begin{aligned} \mathbb {P}\left[{\mathcal {G}(\sigma ,\rho )}\right]&=\exp \left[{\frac{dn}{2}\sum _{s,t=1}^q\rho (s,t) \log \frac{\nu ^{\sigma }(s)\nu ^{\sigma }(t)}{\rho (s,t)}+O(\log n)}\right]. \end{aligned}$$

2.2 Moments and messages

The routine method for investigating the partition function and the Boltzmann distribution of random graphs is the method of moments [3]. The basic strategy is to calculate, one way or another, the first two moments \(\mathbb {E}[Z_\beta (\varvec{G})]\), \(\mathbb {E}[Z_\beta (\varvec{G})^2]\) of the partition function. Then we cross our fingers that the second moment is not much larger than the square of the first. It sometimes works. But potential pitfalls include a pronounced tendency of running into extremely challenging optimisation problems in the course of the second moment calculation and, worse, lottery effects that may foil the strategy altogether. While regular graphs in general and the Potts ferromagnet in particular are relatively tame specimens, these difficulties actually do arise once we set out to investigate metastable states. Drawing upon [5, 18] to sidestep these challenges, we develop a less computation-heavy proof strategy.

The starting point is the observation that the fixed points of (1.3) are intimately related to the moment calculation. This will not come as a surprise to experts, and indeed it was already noticed in [27]. To elaborate, let \(\nu =(\nu (s))_{s\in [q]}\) be a probability distribution on the q colours. Moreover, let \({\mathcal R}(\nu )\) be the set of all symmetric matrices \(\rho =(\rho (s,t))_{s,t\in [q]}\) with non-negative entries such that

$$\begin{aligned} \sum _{t\in [q]}\rho (s,t)&=\nu (s)\qquad \text{ for } \text{ all } s\in [q]. \end{aligned}$$
(2.1)

Simple manipulations (e.g., [14, Lemma 2.7]) show that the first moment satisfies

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\log \mathbb {E}[Z_\beta (\varvec{G})]&=\max _{\nu \in \mathcal {P}([q]), \rho \in {\mathcal R}(\nu )}F_{d,\beta }(\nu ,\rho ),\qquad \text{ where }\nonumber \\ F_{d,\beta }(\nu ,\rho )&=(d-1)\sum _{s\in [q]}\nu (s)\log \nu (s)-\frac{d}{2}\sum _{s,t\in [q]}\rho (s,t)\log \rho (s,t)\nonumber \\&\quad +\frac{d\beta }{2}\sum _{s\in [q]}\rho (s,s). \end{aligned}$$
(2.2)

Thus, the first moment is governed by the maximum or maxima, as the case may be, of \(F_{d,\beta }\).

We need to know that the maxima of \(F_{d,\beta }\) are in one-to-one correspondence with the stable fixed points of (1.3). To be precise, a fixed point \(\mu \) of (1.3) is stable if the Jacobian of (1.3) at \(\mu \) has spectral radius strictly less than one. Let \(\mathcal {F}^+_{d,\beta }\) be the set of all stable fixed points \(\mu \in \mathcal {F}_{d,\beta }\). Moreover, let \(\mathcal {F}^1_{d,\beta }\) be the set of all \(\mu \in \mathcal {F}^+_{d,\beta }\) such that \(\mu (1)=\max _{s\in [q]}\mu (s)\). In addition, let us call a local maximum \((\nu ,\rho )\) of \(F_{d,\beta }\) stable if there exist \(\delta ,c>0\) such that

$$\begin{aligned} F_{d,\beta }(\nu ',\rho ')&\le F_{d,\beta }(\nu ,\rho )-c\left( {\Vert \nu -\nu '\Vert ^2+\Vert \rho -\rho '\Vert ^2}\right) \end{aligned}$$
(2.3)

for all \(\nu '\in \mathcal {P}([q])\) and \(\rho '\in {\mathcal R}(\nu ')\) such that \(\Vert \nu -\nu '\Vert +\Vert \rho -\rho '\Vert <\delta \). Roughly, (2.3) provides that the Hessian of \(F_{d,\beta }\) is negative definite on the subspace of all possible \(\nu ,\rho \).

Lemma 2.2

([27, Theorem 8]). Suppose that \(d\ge 3,\beta >0\). The map \(\mu \in \mathcal {P}([q])\mapsto (\nu ^\mu ,\rho ^\mu )\) defined by

$$\begin{aligned} \nu ^\mu (s)&=\frac{(1+(\textrm{e}^\beta -1)\mu (s))^d}{\sum _{t\in [q]}(1+(\textrm{e}^\beta -1) \mu (t))^d},&\rho ^\mu (s,t)&=\frac{\textrm{e}^{\beta \varvec{1}\{s=t\}}\mu (s)\mu (t)}{1+(\textrm{e}^\beta -1)\sum _{c\in [q]}\mu (c)^2} \end{aligned}$$
(2.4)

is a bijection from \(\mathcal {F}^+_{d,\beta }\) to the set of stable local maxima of \(F_{d,\beta }\). Moreover, for any fixed point \(\mu \) we have \(\mathcal {B}_{d,\beta }(\mu )=F_{d,\beta }(\nu ^\mu ,\rho ^\mu ).\)
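
As a quick numerical sanity check (with example parameters; not part of the proofs), one can evaluate both sides of the identity \(\mathcal {B}_{d,\beta }(\mu )=F_{d,\beta }(\nu ^\mu ,\rho ^\mu )\) at the uniform fixed point \(\mu _{\textrm{p}}\):

```python
# Illustrative numerical check of Lemma 2.2 at the uniform fixed point mu_p:
# evaluate the Bethe functional (1.4) and F_{d,beta} from (2.2) at the image
# (nu^mu, rho^mu) under the map (2.4).  d, q, beta are example choices.
import math

d, q, beta = 4, 3, 1.0
ew = math.exp(beta) - 1.0
mu = [1.0 / q] * q                                        # the paramagnetic fixed point

w = [(1.0 + ew * m) ** d for m in mu]
nu = [x / sum(w) for x in w]                              # nu^mu from (2.4)
Z2 = 1.0 + ew * sum(m * m for m in mu)
rho = [[math.exp(beta * (s == t)) * mu[s] * mu[t] / Z2    # rho^mu from (2.4)
        for t in range(q)] for s in range(q)]

bethe = math.log(sum(w)) - 0.5 * d * math.log(Z2)         # the Bethe value (1.4)
F = ((d - 1) * sum(v * math.log(v) for v in nu)           # F_{d,beta} from (2.2)
     - 0.5 * d * sum(r * math.log(r) for row in rho for r in row)
     + 0.5 * d * beta * sum(rho[s][s] for s in range(q)))
print(bethe, F)                                           # the two values coincide
```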

For brevity, let \((\nu _{\textrm{p}},\rho _{\textrm{p}})=(\nu ^{\mu _{\textrm{p}}},\rho ^{\mu _{\textrm{p}}})\) and \((\nu _{\textrm{f}},\rho _{\textrm{f}})=(\nu ^{\mu _{\textrm{f}}},\rho ^{\mu _{\textrm{f}}})\). The following result characterises the stable fixed points \(\mathcal {F}^1_{d,\beta }\).

Proposition 2.3

([27, Theorem 4]). Suppose that \(d\ge 3,\beta >0\).

  1. (i)

    If \(\beta <{\beta _u}\), then (1.3) has a unique fixed point, namely the paramagnetic solution \(\mu _{\textrm{p}}\), the uniform distribution on [q]. This fixed point is stable and thus \(F_{d,\beta }\) attains its global maximum at \((\nu _{\textrm{p}},\rho _{\textrm{p}})\).

  2. (ii)

    If \({\beta _u}<\beta <{\beta _h}\), then \(\mathcal {F}^1_{d,\beta }\) contains two elements, namely the paramagnetic solution \(\mu _{\textrm{p}}\) and the ferromagnetic solution \(\mu _{\textrm{f}}\); \((\nu _{\textrm{p}},\rho _{\textrm{p}})\) is a global maximum of \(F_{d,\beta }\) iff \(\beta \le {\beta _p}\), and \((\nu _{\textrm{f}},\rho _{\textrm{f}})\) iff \(\beta \ge {\beta _p}\).

  3. (iii)

    If \(\beta >{\beta _h}\), then \(\mathcal {F}^1_{d,\beta }\) contains precisely one element, namely the ferromagnetic solution \(\mu _{\textrm{f}}\), and \((\nu _{\textrm{f}},\rho _{\textrm{f}})\) is a global maximum of \(F_{d,\beta }\).

Like the first moment, the second moment boils down to an optimisation problem as well, albeit one of much higher dimension (\(q^2-1\) rather than \(q-1\)). Indeed, it is not difficult to derive the following approximation (once again, e.g., via [14, Lemma 2.7]). For a probability distribution \(\nu \in \mathcal {P}([q])\) and a symmetric matrix \(\rho \in {\mathcal R}(\nu )\) let \({\mathcal R}^\otimes (\rho )\) be the set of all tensors \(r=(r(s,s',t,t'))_{s,s',t,t'\in [q]}\) such that

$$\begin{aligned} r(s,s',t,t')&=r(t,t',s,s')\quad \text{ and }\quad \sum _{t,t'\in [q]}r(s,s',t,t') =\sum _{t,t'\in [q]}r(t,t',s,s')\nonumber \\&=\rho (s,s')\quad \text{ for } \text{ all } s,s'\in [q]. \end{aligned}$$
(2.5)

Then, with \(H(\cdot )\) denoting the entropy function, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\log \mathbb {E}[(Z_{\beta }(\varvec{G}))^2]&=\max _{\nu ,\rho \in {\mathcal R}(\nu ),r\in {\mathcal R}^\otimes (\rho )}F_{d,\beta }^\otimes (\rho ,r), \text{ where } \nonumber \\ F_{d,\beta }^\otimes (\rho ,r)&=(1-d)H(\rho )+\frac{d}{2}H(r)\nonumber \\&\quad +\frac{d\beta }{2} \sum _{s,s',t,t'\in [q]} \left( {\varvec{1}\{s=t\}+\varvec{1}\{s'=t'\}}\right) r(s, s', t, t') . \end{aligned}$$
(2.6)

A frontal assault on this optimisation problem is in general a daunting task due to the doubly-stochastic constraints in (2.5), i.e., the constraint \(r\in {\mathcal R}^\otimes (\rho )\). But fortunately, to analyse the global maximum (over \(\nu \) and \(\rho \)), these constraints can be relaxed, permitting an elegant translation of the problem to operator theory. In effect, the second moment computation can be reduced to a study of matrix norms. The result can be summarised as follows.

Proposition 2.4

([27, Theorem 7]). For all \(d,q\ge 3\) and \(\beta >0\) we have \(\max _{\nu ,\rho \in {\mathcal R}(\nu ),r\in {\mathcal R}^\otimes (\rho )}F_{d,\beta }^\otimes (\rho ,r)=2\max _{\nu ,\rho }F_{d,\beta }(\nu ,\rho )\) and thus \(\mathbb {E}[Z_\beta (\varvec{G})^2]=O (\mathbb {E}[Z_\beta (\varvec{G})]^2)\).

Combining Lemma 2.2, Proposition 2.3 and Proposition 2.4, we obtain the following reformulation of [27, Theorem 7], which verifies that we obtain good approximations to the partition function by maximising the Bethe free energy on \(\mathcal {F}_{d,\beta }\).

Theorem 2.5

For all integers \(d,q\ge 3\) and real \(\beta >0\), we have \(\displaystyle \lim _{n\rightarrow \infty }n^{-1}\log Z_{\beta }(\mathbb {G})=\max \limits _{\mu \in \mathcal {F}_{d,\beta }}\mathcal {B}_{d,\beta }(\mu )\) in probability.

2.3 Non-reconstruction

While the global maximisation of the function \(F_{d,\beta }^\otimes \) and thus the proof of Theorem 2.5 boils down to matrix norm analysis, in order to prove Theorems 1.1 and 1.3 via the method of moments we would in addition need to get a good handle on all the local maxima. Unfortunately, we do not see a way to reduce this more refined question to operator norms. Hence, it would seem that we should have to perform a fine-grained analysis of \(F_{d,\beta }^\otimes \) after all. But luckily another path is open to us. Instead of proceeding analytically, we resort to probabilistic ideas. Specifically, we harness the notion of non-reconstruction on the Potts model on the d-regular tree.

To elaborate, let \(\mathbb T_{d}\) be the infinite d-regular tree with root o. Given a probability distribution \(\mu \in \{\mu _{\textrm{p}},\mu _{\textrm{f}}\}\) we define a broadcasting process \(\varvec{\sigma }=\varvec{\sigma }_{d,\beta ,\mu }\) on \(\mathbb T_{d}\) as follows. Initially we draw the colour \(\varvec{\sigma }_{o}\) of the root o from the distribution \(\nu ^{\mu }\). Subsequently, working our way down the levels of the tree, the colour of a vertex v whose parent u has been coloured already is drawn from the distribution

$$\begin{aligned} \mathbb {P}\left[{\varvec{\sigma }_{v}=c\mid \varvec{\sigma }_{u}}\right]&=\frac{\mu (c)\textrm{e}^ {\beta \varvec{1}\{c=\varvec{\sigma }_{u}\}}}{\sum _{c'\in [q]}\mu (c')\textrm{e}^{\beta \varvec{1}\{c'=\varvec{\sigma }_{u}\}}}. \end{aligned}$$

Naturally, the colours of different vertices on the same level are mutually independent conditioned on the colours of their parents. Let \(\partial ^\ell o\) be the set of all vertices at distance precisely \(\ell \) from o. We say that the broadcasting process has the strong non-reconstruction property if

$$\begin{aligned} \sum _{c\in [q]} \mathbb {E}\Big [\big |\mathbb {P}\left[{\varvec{\sigma }_{o}=c\mid \varvec{\sigma }_{\partial ^\ell o}}\right]-\mathbb {P}\left[{\varvec{\sigma }_{o}=c}\right]\big |\Big ]=\exp (-\Omega (\ell )), \end{aligned}$$

where the expectation is over the random configuration \(\varvec{\sigma }_{\partial ^\ell o}\) (distributed according to the broadcasting process). In words, this says that the information about the spin of the root decays in the broadcasting process; the term “strong” refers to the fact that the decay is exponential with respect to the depth \(\ell \).
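
For illustration, the broadcasting process is straightforward to simulate; the sketch below (example parameters, with \(\mu =\mu _{\textrm{p}}\)) draws the root colour from \(\nu ^\mu \) and propagates colours down to level \(\ell \) using the transition kernel displayed above.

```python
# Illustrative simulation of the broadcasting process on the first ell levels of
# the d-regular tree: the root gets d children, every other vertex d - 1.
# The parameters and the choice mu = mu_p are examples only.
import math
import random

def broadcast(d, q, beta, ell, mu):
    ew = math.exp(beta) - 1.0
    # unnormalised transition weights mu(c) * exp(beta * 1{c = cu}) from the display above
    kernel = [[mu[c] * math.exp(beta * (c == cu)) for c in range(q)] for cu in range(q)]
    root_weights = [(1.0 + ew * m) ** d for m in mu]          # root colour ~ nu^mu
    root = random.choices(range(q), weights=root_weights)[0]
    level = [random.choices(range(q), weights=kernel[root])[0] for _ in range(d)]
    for _ in range(ell - 1):
        level = [random.choices(range(q), weights=kernel[cu])[0]
                 for cu in level for _ in range(d - 1)]
    return root, level                                        # root and level-ell colours

root, boundary = broadcast(d=3, q=3, beta=1.0, ell=6, mu=[1.0 / 3] * 3)
print(root, len(boundary), boundary[:10])
```

Estimating the conditional distribution \(\mathbb {P}[\varvec{\sigma }_{o}=c\mid \varvec{\sigma }_{\partial ^\ell o}]\) from such samples would require an additional dynamic-programming pass back up the tree; the sketch only produces the boundary configuration.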

Proposition 2.6

([27, Theorem 50]). Let \(d,q\ge 3\) be integers and \(\beta >0\) be real.

  1. (i)

    For \(\beta <{\beta _h}\), the broadcasting process \(\varvec{\sigma }_{d,\beta ,\mu _{\textrm{p}}}\) has the strong non-reconstruction property.

  2. (ii)

    For \(\beta >{\beta _u}\), the broadcasting process \(\varvec{\sigma }_{d,\beta ,\mu _{\textrm{f}}}\) has the strong non-reconstruction property.

In order to prove Theorems 1.1–1.3 we will combine Proposition 2.6 with reweighted random graph models known as planted models. To be precise, we will consider two versions of planted models, a paramagnetic and a ferromagnetic one. Then we will deduce from Proposition 2.6 that the Boltzmann distribution of these planted models has the non-reconstruction property in a suitably defined sense. In combination with some general facts about Boltzmann distributions this will enable us to prove Theorems 1.1–1.3 without the need for complicated moment computations.

2.4 Planting

We proceed to introduce the paramagnetic and the ferromagnetic version of the planted model. Roughly speaking, these are weighted versions of the common random regular graph \(\mathbb {G}\) where the probability mass of a specific graph is proportional to the paramagnetic or ferromagnetic bit of the partition function. To be precise, for \(\varepsilon >0\), recall the subsets \(S_{\textrm{p}}=S_{\textrm{p}}(\varepsilon ),S_{\textrm{f}}=S_{\textrm{f}}(\varepsilon )\) of the configuration space \([q]^n\). Letting

$$\begin{aligned} Z_{\textrm{f}}(G)=\sum _{\sigma \in S_{\textrm{f}}}\textrm{e}^{\beta \mathcal {H}_{G}(\sigma )}\quad \text{ and } \quad Z_{\textrm{p}}(G)=\sum _{\sigma \in S_{\textrm{p}}}\textrm{e}^{\beta \mathcal {H}_{G}(\sigma )}, \end{aligned}$$
(2.7)

we define random graph models \(\hat{\varvec{G}}_{\textrm{f}},\hat{\varvec{G}}_{\textrm{p}}\) by

$$\begin{aligned} \mathbb {P}\left[{\hat{\varvec{G}}_{\textrm{f}}=G}\right]&=\frac{Z_{\textrm{f}}(G)\mathbb {P}\left[{\varvec{G}=G}\right]}{\mathbb {E}[Z_{\textrm{f}}(\varvec{G})]},&\mathbb {P}\left[{\hat{\varvec{G}}_{\textrm{p}}=G}\right]&=\frac{Z_{\textrm{p}}(G)\mathbb {P}\left[{\varvec{G}=G}\right]}{\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]}. \end{aligned}$$
(2.8)

Thus, \(\hat{\varvec{G}}_{\textrm{f}}\) and \(\hat{\varvec{G}}_{\textrm{p}}\) are d-regular random graphs on n vertices such that the probability that a specific graph G comes up is proportional to \(Z_{\textrm{f}}(G)\) and \(Z_{\textrm{p}}(G)\), respectively.

We need to extend the notion of non-reconstruction to \(\hat{\varvec{G}}_{\textrm{p}},\hat{\varvec{G}}_{\textrm{f}}\). Specifically, we need to define non-reconstruction for the conditional Boltzmann distributions \(\mu _{\mathbb {G},\beta }(\,\cdot \,\mid S_{\textrm{p}})\), \(\mu _{\mathbb {G},\beta }(\,\cdot \,\mid S_{\textrm{f}})\). We thus say that for a graph/configuration pair \((G,\sigma )\), an event \(S\subseteq [q]^{n}\), a positive real \(\xi >0\), a real number \(\gamma \in [0,1]\), an integer \(\ell \ge 1\) and a probability distribution \(\mu \) on [q] the conditional \((\gamma , \xi ,\ell ,\mu )\)-non-reconstruction property holds if

$$\begin{aligned} \frac{1}{n}\sum _{v\in [n]}\sum _{c\in [q]}\left| {\nu ^\mu (c)-\mu _{G,\beta }(\varvec{\sigma }_{G,\beta ,v}=c\mid S,\varvec{\sigma }_{G,\beta ,\partial ^\ell v}=\sigma _{\partial ^\ell v})}\right|&<\xi \end{aligned}$$
(2.9)

holds with probability \(1-\gamma \). In words, (2.9) provides that, on average over all v, the conditional marginal probability \(\mu _{G,\beta }(\varvec{\sigma }_{G,\beta ,v}=c\mid S,\varvec{\sigma }_{G,\beta ,\partial ^\ell v}=\sigma _{\partial ^\ell v})\) that v receives colour c given the boundary condition induced by \(\sigma \) on the vertices at distance \(\ell \) from v and given the event S is close to \(\nu ^\mu (c)\).

Further, while (2.9) deals with a specific graph/configuration pair \((G,\sigma )\), we need to extend the definition to the random graph models \(\hat{\varvec{G}}_{\textrm{f}}\) and \(\hat{\varvec{G}}_{\textrm{p}}\). For a graph G let \(\varvec{\sigma }_{G,\mathrm f}\) denote a sample from the conditional distribution \(\mu _{G,\beta }(\,\cdot \,\mid S_{\textrm{f}})\). Also define \(\varvec{\sigma }_{G,\mathrm p}\) similarly for \(S_{\textrm{p}}\). We say that the random graph \(\hat{\varvec{G}}_{\textrm{f}}\) has the \((\eta ,\xi ,\ell )\)-non-reconstruction property if

$$\begin{aligned} \mathbb {E}\left[{\mu _{\hat{\varvec{G}}_{\textrm{f}},\beta }\left( {\left\{ {(\hat{\varvec{G}}_{\textrm{f}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{f}},\textrm{f}}) \text{ fails } \text{ to } \text{ have } \text{ the } (\xi ,\ell ,\mu _{\textrm{f}})\text{-non-reconstruction } \text{ property } \text{ conditional } \text{ on } S_{\textrm{f}}}\right\} }\right) }\right]<\eta . \end{aligned}$$
(2.10)

Thus, we ask that (2.9) holds for a typical graph/configuration pair obtained by first drawing a graph \(\hat{\varvec{G}}_{\textrm{f}}\) from the planted model and then sampling \(\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{f}},\mathrm f}\) from \(\mu _{\hat{\varvec{G}}_{\textrm{f}}}(\,\cdot \,\mid S_{\textrm{f}})\). We introduce a similar definition for \(\hat{\varvec{G}}_{\textrm{p}}\).

The following proposition shows that the non-reconstruction statements from Proposition 2.6 carry over to the planted random graphs. This is the key technical statement toward the proofs of Theorems 1.1–1.3.

Proposition 2.7

Let \(d\ge 3\).

  1. (i)

    Assume that \({\beta _u}<\beta \). Then \(\hat{\varvec{G}}_{\textrm{f}}\) has the \((o(1),1/\log \log n,\lceil \log \log n\rceil )\)-non-reconstruction property. Moreover, for any \(\delta >0\) there exist \(\ell =\ell (d,\beta ,\delta )>0\) and \(\chi =\chi (d,\beta ,\delta )>0\) such that \((\hat{\varvec{G}}_{\textrm{f}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{f}},\textrm{f}})\) has the \((\exp (-\chi n),\delta ,\ell ,\mu _{\textrm{f}})\)-non-reconstruction property.

  2. (ii)

    Assume that \(\beta <{\beta _h}\). Then \(\hat{\varvec{G}}_{\textrm{p}}\) has the \((o(1),1/\log \log n,\lceil \log \log n\rceil )\)-non-reconstruction property. Moreover, for any \(\delta >0\) there exist \(\ell =\ell (d,\beta ,\delta )>0\) and \(\chi =\chi (d,\beta ,\delta )>0\) such that \((\hat{\varvec{G}}_{\textrm{p}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}})\) has the \((\exp (-\chi n),\delta ,\ell ,\mu _{\textrm{p}})\)-non-reconstruction property.

Together with a few routine arguments for the study of Boltzmann distributions that build upon [5], we can derive from Proposition 2.7 that for \(\beta >{\beta _u}\) two typical samples from the ferromagnetic Boltzmann distribution have overlap about \(\nu _{\textrm{f}}\otimes \nu _{\textrm{f}}\). This insight enables a truncated second moment computation that sidesteps a detailed study of the function \(F_{d,\beta }^\otimes \) from (2.6). Indeed, the only observation about \(F^\otimes _{d,\beta }\) that we need to make is that \(F^\otimes _{d,\beta }(\nu _{\textrm{f}}\otimes \nu _{\textrm{f}},\rho _{\textrm{f}}\otimes \rho _{\textrm{f}})=2F_{d,\beta }(\nu _{\textrm{f}},\rho _{\textrm{f}})\). Similar arguments apply to the paramagnetic case. We can thus determine the asymptotic Boltzmann weights of \(S_{\textrm{p}},S_{\textrm{f}}\) on the random regular graph as follows.

Corollary 2.8

Let \(d,q\ge 3\) be arbitrary integers.

  1. (i)

    For \(\beta >{\beta _u}\), for all sufficiently small \(\varepsilon >0\), we have w.h.p. \(\frac{1}{n}\log Z_{\textrm{f}}(\varvec{G})=\mathcal {B}_{d,\beta }(\mu _{\textrm{f}})+o(1)\).

  2. (ii)

    For \(\beta <{\beta _h}\), for all sufficiently small \(\varepsilon >0\), we have w.h.p. \(\frac{1}{n}\log Z_{\textrm{p}}(\varvec{G})=\mathcal {B}_{d,\beta }(\mu _{\textrm{p}})+o(1)\).

Finally, combining Corollary 2.8 with the definition (2.8) of the planted models and the non-reconstruction statements from Proposition 2.7, we obtain the following conditional non-reconstruction statements for the plain random regular graph.

Corollary 2.9

Let \(d,q\ge 3\) be arbitrary integers.

  1. (i)

    For \(\beta >{\beta _u}\), the Boltzmann distribution \(\mu _{\varvec{G},\beta }\) given \(S_{\textrm{f}}\) exhibits the non-reconstruction property.

  2. (ii)

    For \(\beta <{\beta _h}\), the Boltzmann distribution \(\mu _{\varvec{G},\beta }\) given \(S_{\textrm{p}}\) exhibits the non-reconstruction property.

Theorem 1.3 is an immediate consequence of Corollaries 2.8 and 2.9.

3 Quiet Planting

In this section we prove Proposition 2.7 along with Corollaries 2.8 and 2.9. We begin with an important general observation about the planted model called the Nishimori identity, which will provide an explicit constructive description of the planted models.

3.1 The Nishimori identity

We complement the definition (2.8) of the planted random graphs \(\hat{\varvec{G}}_{\textrm{f}},\hat{\varvec{G}}_{\textrm{p}}\) by also introducing a reweighted distribution on graphs for a specific configuration \(\sigma \in [q]^{n}\). Specifically, we define a random graph \(\hat{\varvec{G}}(\sigma )\) by letting

$$\begin{aligned} \mathbb {P}\left[{\hat{\varvec{G}}(\sigma )=G}\right]&=\frac{\mathbb {P}\left[{\varvec{G}=G}\right] \textrm{e}^{\beta \mathcal {H}_{G}(\sigma )}}{\mathbb {E}[\textrm{e}^{\beta \mathcal {H}_{\varvec{G}}(\sigma )}]}. \end{aligned}$$
(3.1)

Furthermore, for \(\varepsilon >0\), recalling the truncated partition functions \(Z_{\textrm{f}},Z_{\textrm{p}}\) from (2.7), we introduce reweighted random configurations \(\hat{\varvec{\sigma }}_{\textrm{f}}=\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\in [q]^n\) and \(\hat{\varvec{\sigma }}_{\textrm{p}}=\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\in [q]^{n}\) with distributions

$$\begin{aligned} \mathbb {P}\left[{\hat{\varvec{\sigma }}_{\textrm{f}}=\sigma }\right]&=\frac{\varvec{1}\left\{ {\sigma \in S_{\textrm{f}}}\right\} \mathbb {E}[\textrm{e}^{\beta \mathcal {H}_{\varvec{G}}(\sigma )}]}{\mathbb {E}[Z_{\textrm{f}}(\varvec{G})]},&\mathbb {P}\left[{\hat{\varvec{\sigma }}_{\textrm{p}}=\sigma }\right]&=\frac{\varvec{1}\left\{ {\sigma \in S_{\textrm{p}}}\right\} \mathbb {E}[\textrm{e}^{\beta \mathcal {H}_{\varvec{G}}(\sigma )}]}{\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]}. \end{aligned}$$
(3.2)

We have the following paramagnetic and ferromagnetic Nishimori identities.

Proposition 3.1

For any integers \(d,q\ge 3\) and real \(\beta ,\varepsilon >0\), we have

$$\begin{aligned} (\hat{\varvec{G}}_{\textrm{p}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}})\,{{\mathop {=}\limits ^{\text{ d }}}}\,(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\hat{\varvec{\sigma }}_{\textrm{p}}),{} & {} (\hat{\varvec{G}}_{\textrm{f}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{f}},\textrm{f}})\,{{\mathop {=}\limits ^{\text{ d }}}}\,(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}),\hat{\varvec{\sigma }}_{\textrm{f}}). \end{aligned}$$
(3.3)

Proof

Let G be a d-regular graph on n vertices and \(\sigma \in [q]^n\). We have

$$\begin{aligned}&\mathbb {P}\left[{ (\hat{\varvec{G}}_{\textrm{p}}, \varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}})=\left( {G,\sigma }\right) }\right] = \mathbb {P}\left[{\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}}= \sigma \Big \vert \hat{\varvec{G}}_{\textrm{p}}= G}\right] \mathbb {P}\left[{ \hat{\varvec{G}}_{\textrm{p}}=G}\right] \nonumber \\&\quad = \mu _{G,\beta } \left( {\sigma \Big \vert S_{\textrm{p}}}\right) \frac{Z_{\textrm{p}}(G)\mathbb {P}\left[{\varvec{G}=G}\right]}{\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]} . \end{aligned}$$
(3.4)

Moreover, by the definition of the Boltzmann distribution \(\mu _{G,\beta }\),

$$\begin{aligned} \mu _{G,\beta } \left( {\sigma \vert S_{\textrm{p}}}\right)&= \frac{\textrm{e}^{\beta \mathcal {H}_{G}(\sigma )}\varvec{1}\left\{ { \sigma \in S_{\textrm{p}}}\right\} }{ Z_\beta (G) \ \mu _{G,\beta }\left( {S_{\textrm{p}}}\right) },&\mu _{G,\beta }\left( {S_{\textrm{p}}}\right)&= \frac{ Z_{\textrm{p}}(G) }{ Z_\beta (G)}. \end{aligned}$$
(3.5)

Combining (3.4) and (3.5), we obtain

$$\begin{aligned} \mathbb {P}\left[{ (\hat{\varvec{G}}_{\textrm{p}}, \varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}}) =(G, \sigma ) }\right]&= \frac{\textrm{e}^{\beta \mathcal {H}_{\varvec{G}}(\sigma )} \mathbb {P}\left[{\varvec{G}=G}\right] }{ \mathbb {E}\left[{ \textrm{e}^{\beta \mathcal {H}_{\varvec{G}}(\sigma )}}\right]} \cdot \frac{\mathbb {E}\left[{\textrm{e}^{\beta \mathcal {H}_{\varvec{G}}(\sigma )}}\right]\varvec{1}\left\{ { \sigma \in S_{\textrm{p}}}\right\} }{ \mathbb {E}[ Z_{\textrm{p}}(\varvec{G})]} \\&= \mathbb {P}\left[{ \hat{\varvec{G}}\left( { \hat{\varvec{\sigma }}_{\textrm{p}}}\right) = G \Big \vert \hat{\varvec{\sigma }}_{\textrm{p}}=\sigma }\right] \mathbb {P}\left[{ \hat{\varvec{\sigma }}_{\textrm{p}}= \sigma }\right]\\&=\mathbb {P}\left[{(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}})=G,\hat{\varvec{\sigma }}_{\textrm{p}}=\sigma )}\right], \end{aligned}$$

as claimed. The very same steps apply to \(\hat{\varvec{G}}_{\textrm{f}}\). \(\quad \square \)

Nishimori identities were derived in [17] for a broad family of planted models which, however, does not include the planted ferromagnetic models \(\hat{\varvec{G}}_{\textrm{p}},\hat{\varvec{G}}_{\textrm{f}}\). Nonetheless, the (simple) proof of Proposition 3.1 is practically identical to the argument from [17].
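As a sanity check, and not as part of the argument, the identity can also be verified numerically. In the toy computation below, a uniform prior over three small graphs stands in for the pairing model \(\varvec{G}\) and a balanced-colour set stands in for \(S_{\textrm{p}}\); both choices, as well as the function names, are illustrative placeholders of ours.

```python
import itertools, math

# Toy numerical check of the Nishimori identity (3.3). The uniform prior over the
# three small graphs below stands in for the pairing model, and the balanced-colour
# set stands in for S_p; both are hypothetical choices made only to exercise the
# bookkeeping in the proof of Proposition 3.1.
q, beta, n = 3, 0.7, 4
graphs = [
    ((0, 1), (1, 2), (2, 3), (3, 0)),
    ((0, 1), (0, 2), (0, 3), (1, 2)),
    ((0, 1), (1, 2), (2, 3), (3, 0), (0, 2)),
]
prior = 1.0 / len(graphs)                                  # stand-in for P[G = G]
confs = list(itertools.product(range(q), repeat=n))
S_p = [s for s in confs if max(s.count(c) for c in range(q)) <= 2]

H = lambda G, s: sum(s[u] == s[v] for u, v in G)           # number of monochromatic edges
w = lambda G, s: math.exp(beta * H(G, s))                  # Boltzmann weight e^{beta H_G(sigma)}
Zp = lambda G: sum(w(G, s) for s in S_p)                   # truncated partition function Z_p(G)
EZp = sum(prior * Zp(G) for G in graphs)                   # E[Z_p(G)]
Ew = lambda s: sum(prior * w(G, s) for G in graphs)        # E[e^{beta H_G(sigma)}]

for G in graphs:
    for s in S_p:
        lhs = (prior * Zp(G) / EZp) * (w(G, s) / Zp(G))    # P[G_p = G] * mu_{G,beta}(s | S_p)
        rhs = (Ew(s) / EZp) * (prior * w(G, s) / Ew(s))    # P[sigma_p = s] * P[G(sigma_p) = G | s]
        assert abs(lhs - rhs) < 1e-12
print("Nishimori identity holds on the toy instance")
```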

While the original definition (2.8) of the planted models may appear unwieldy, Proposition 3.1 paves the way for a more hands-on description. As a preliminary step we need to get a handle on the empirical distribution of the colours under the random configurations \(\hat{\varvec{\sigma }}_{\textrm{f}},\hat{\varvec{\sigma }}_{\textrm{p}}\). In addition, we need to determine the edge statistics \(\rho ^{\hat{\varvec{G}}_{\textrm{p}},\hat{\varvec{\sigma }}_{\textrm{p}}}\) and \(\rho ^{\hat{\varvec{G}}_{\textrm{f}},\hat{\varvec{\sigma }}_{\textrm{f}}}\). The following two lemmas solve these problems for us.

Lemma 3.2

Suppose that \(0\le \beta <\beta _h\). Then \(\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]=\exp (nF_{d,\beta }(\nu _{\textrm{p}},\rho _{\textrm{p}})+O(\log n))\).

Proof

To obtain a lower bound on \(\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]\) let \(\sigma _0\in [q]^n\) be a configuration such that \(|\sigma _0^{-1}(s)|=\frac{n}{q}+O(1)\) for all \(s\in [q]\). Let \(\nu (s)=|\sigma _0^{-1}(s)|/n\). Then

$$\begin{aligned} \rho ^\nu (s,t)&=\frac{\textrm{e}^{\beta \varvec{1}\{s=t\}}}{q(q-1+\textrm{e}^\beta )}+O(1/n)=\rho _{\textrm{p}}(s,t)+O(1/n){} & {} (s,t\in [q]). \end{aligned}$$

Therefore, Lemma 2.1 yields

$$\begin{aligned} \mathbb {E}[Z_{\textrm{p}}(\varvec{G})]&\ge \sum _{\sigma \in [q]^n}\varvec{1}\left\{ {\forall s\in [q]:|\sigma ^{-1}(s)|=n\nu (s)}\right\} \mathbb {P}\left[{\mathcal {G}(\sigma ,\rho ^\nu )}\right] \nonumber \\&\quad \exp \left( {\frac{\beta \textrm{e}^\beta dn}{2(q-1+\textrm{e}^\beta )}+O(1)}\right) \nonumber \\&\ge q^n\exp \left( {\frac{\beta \textrm{e}^\beta dn}{2(q-1+\textrm{e}^\beta )}+O(\log n)}\right) =\exp \left( {nF_{d,\beta }(\nu _{\textrm{p}},\rho _{\textrm{p}})+O(\log n)}\right) . \end{aligned}$$
(3.6)

Conversely, since there are only \(n^{O(1)}\) choices of \(\nu ,\rho \), Lemma 2.2 and Proposition 2.3 imply that

$$\begin{aligned} \mathbb {E}[Z_{\textrm{p}}(\varvec{G})]&\le \exp \left( {nF_{d,\beta }(\nu _{\textrm{p}},\rho _{\textrm{p}})+O(\log n)}\right) . \end{aligned}$$
(3.7)

The assertion follows from (3.6) and (3.7). \(\quad \square \)
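As a quick numerical aside, and under our reading of the quantities in the proof above, the per-edge combination of the pairing probability and the Boltzmann reward collapses, for the explicit \(\rho _{\textrm{p}}(s,t)=\textrm{e}^{\beta \varvec{1}\{s=t\}}/(q(q-1+\textrm{e}^\beta ))\) displayed above, to the familiar paramagnetic value \(\log ((q-1+\textrm{e}^\beta )/q)\). The short sketch below (ours, with arbitrary parameter values) checks this identity.

```python
import math
import numpy as np

# Check that sum_{s,t} rho_p(s,t) * [log(1/(q^2 rho_p(s,t))) + beta*1{s=t}]
# equals log((q-1+e^beta)/q), with rho_p(s,t) = e^{beta*1{s=t}}/(q(q-1+e^beta)).
def per_edge_exponent(q, beta):
    rho = np.exp(beta * np.eye(q)) / (q * (q - 1 + math.exp(beta)))
    return float(np.sum(rho * (np.log(1.0 / (q * q * rho)) + beta * np.eye(q))))

for q, beta in [(3, 0.5), (3, 1.2), (5, 0.8)]:
    lhs = per_edge_exponent(q, beta)
    rhs = math.log((q - 1 + math.exp(beta)) / q)
    print(q, beta, lhs, rhs, abs(lhs - rhs) < 1e-12)
```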

Lemma 3.3

Suppose that \(\beta >\beta _u\). Then \(\mathbb {E}[Z_{\textrm{f}}(\varvec{G})]=\exp (nF_{d,\beta }(\nu _{\textrm{f}},\rho _{\textrm{f}})+O(\log n))\).

Proof

As in the proof of Lemma 3.2 let \(\sigma _0\in [q]^n\) be a configuration such that \(|\sigma _0^{-1}(s)|=n\nu _{\textrm{f}}(s)+O(1)\) for all \(s\in [q]\). Letting \(\nu (s)=|\sigma _0^{-1}(s)|/n\) we see that \(\rho ^{\nu }(s,t)=\rho _{\textrm{f}}(s,t)+O(1/n)\) for all \(s,t\in [q]\). Therefore, Lemma 2.1 yields

$$\begin{aligned} \mathbb {E}[Z_{\textrm{f}}(\varvec{G})]&\ge \left( {\begin{array}{c}n\\ \nu n\end{array}}\right) \exp \left( {-\frac{dn}{2}D_{\textrm{KL}}\left( {{{\rho ^\nu }\Vert {\nu \otimes \nu }}}\right) +\frac{\beta \textrm{e}^{\beta } dn\sum _{s\in [q]}\mu _{\textrm{f}}(s)^2}{2\big (1+(\textrm{e}^\beta -1)\sum _{s\in [q]}\mu _{\textrm{f}}(s)^2\big )}+O(\log n)}\right) \nonumber \\&=\exp \left( {nF_{d,\beta }(\nu _{\textrm{f}},\rho _{\textrm{f}})+O(\log n)}\right) . \end{aligned}$$
(3.8)

As for the upper bound, once again because there are only \(n^{O(1)}\) choices of \(\nu ,\rho \), Lemma 2.2 and Proposition 2.3 yield

$$\begin{aligned} \mathbb {E}[Z_{\textrm{f}}(\varvec{G})]&\le \exp \left( {nF_{d,\beta }(\nu _{\textrm{f}},\rho _{\textrm{f}})+O(\log n)}\right) . \end{aligned}$$
(3.9)

Combining the lower and upper bounds from (3.8) and (3.9) completes the proof. \(\quad \square \)

Lemma 3.4

For any integers \(d,q\ge 3\) and real \(\beta \in (0,\beta _h)\), there exist \(c,t_0>0\) such that

$$\begin{aligned}&\mathbb {P}\left[{d_{\textrm{TV}}\left( {\nu ^{\hat{\varvec{\sigma }}_{\textrm{p}}},\nu _{\textrm{p}}}\right) +d_{\textrm{TV}}\left( {\rho ^{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\hat{\varvec{\sigma }}_{\textrm{p}}},\rho _{\textrm{p}}}\right) >t}\right] \\&\quad \le \exp (-ct^2n+O(\log n))\qquad \text{ for } \text{ all } 0\le t<t_0. \end{aligned}$$

Proof

Suppose that \(\nu \) is a probability distribution on [q] such that \(n\nu (s)\) is an integer for all \(s\in [q]\). Moreover, suppose that \(\rho =(\rho (s,t))_{s,t\in [q]}\) is a symmetric matrix such that \(dn\rho (s,t)\) is an integer for all \(s,t\in [q]\), \(dn\rho (s,s)\) is even for all \(s\in [q]\) and \(\sum _{t=1}^q\rho (s,t)=\nu (s)\) for all \(s\in [q]\). Retracing the steps of the proof of Lemma 3.3, we see that

$$\begin{aligned} \sum _{\sigma \in [q]^n}\varvec{1}\left\{ {\nu ^{\sigma }=\nu }\right\} \mathbb {P}\left[{\mathcal {G}(\sigma ,\rho )}\right]\exp \left( {\frac{\beta dn}{2}\sum _{s=1}^q\rho (s,s)}\right)&=\exp \left( {nF_{d,\beta }(\nu ,\rho )+O(\log n)}\right) . \end{aligned}$$
(3.10)

Therefore, the assertion follows from Proposition 2.3 and the definition (2.3) of stable local maxima. \(\quad \square \)

Lemma 3.5

For any integers \(d,q\ge 3\) and real \(\beta >{\beta _u}\), there exist \(c,t_0>0\) such that

$$\begin{aligned}&\mathbb {P}\left[{d_{\textrm{TV}}\left( {\nu ^{\hat{\varvec{\sigma }}_{\textrm{f}}},\nu _{\textrm{f}}}\right) +d_{\textrm{TV}}\left( {\rho ^{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}),\hat{\varvec{\sigma }}_{\textrm{f}}}, \rho _{\textrm{f}}}\right) >t}\right]\\&\quad \le \exp (-ct^2n+O(\log n))\qquad \text{ for } \text{ all } 0\le t<t_0. \end{aligned}$$

Proof

The argument from the proof of Lemma 3.4 applies mutatis mutandis. \(\quad \square \)

At this point we have handy, constructive descriptions of the models \(\hat{\varvec{G}}_{\textrm{p}},\hat{\varvec{G}}_{\textrm{f}}\). Indeed, Lemmas 3.4 and 3.5 show that the planted configurations \(\hat{\varvec{\sigma }}_{\textrm{p}}\) and \(\hat{\varvec{\sigma }}_{\textrm{f}}\) have colour statistics approximately equal to \(\nu _{\textrm{p}}\) and \(\nu _{\textrm{f}}\) w.h.p., respectively. Moreover, because the random graph models are invariant under permutations of the vertices, \(\hat{\varvec{\sigma }}_{\textrm{p}}\) and \(\hat{\varvec{\sigma }}_{\textrm{f}}\) are uniformly random given their colour statistics. In addition, the edge statistics of the random graphs \(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}})\) and \(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}})\) concentrate about \(\rho _{\textrm{p}}\) and \(\rho _{\textrm{f}}\), respectively. Once more because of permutation invariance, the random graphs \(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}})\) and \(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}})\) themselves are uniformly random given the planted assignment \(\hat{\varvec{\sigma }}_{\textrm{p}}\) or \(\hat{\varvec{\sigma }}_{\textrm{f}}\) and given the edge statistics.
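To make this two-step description concrete, the following sketch (ours; the inputs and names are illustrative placeholders) takes colour counts approximating \(n\nu \) and edge counts approximating \(dn\rho \) (with the diagonal counting half-edge incidences within a class, hence even, as in the proof of Lemma 3.4) and produces a uniformly random pairing respecting these statistics.

```python
import random
from collections import Counter

def planted_pairing(c, m, d, rng=random):
    # c[s]   : number of vertices of colour s (approximating n*nu(s))
    # m[s][t]: for s != t the number of s-t edges, m[s][s] the (even) number of
    #          half-edge incidences within class s (approximating d*n*rho(s,t))
    offs, clones = 0, {}
    for s, cs in enumerate(c):
        clones[s] = [(offs + v, i) for v in range(cs) for i in range(d)]
        rng.shuffle(clones[s])
        offs += cs
    edges = []
    for s in range(len(c)):
        own = [clones[s].pop() for _ in range(m[s][s])]            # within-class half-edges
        edges += [(own[2*i][0], own[2*i + 1][0]) for i in range(len(own) // 2)]
        for t in range(s + 1, len(c)):
            a = [clones[s].pop() for _ in range(m[s][t])]
            b = [clones[t].pop() for _ in range(m[s][t])]
            edges += [(x[0], y[0]) for x, y in zip(a, b)]          # cross-class edges
    return edges

# toy usage: q=3 colours, d=3, 4 vertices per colour, near-paramagnetic edge counts
c = [4, 4, 4]
m = [[4, 4, 4], [4, 4, 4], [4, 4, 4]]     # row sums equal d*c[s]; diagonal entries even
G = planted_pairing(c, m, d=3)
print(len(G), "edges;", Counter((min(e) // 4, max(e) // 4) for e in G))
```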

Thus, let \(\mathfrak {S}_{\textrm{f}}\) and \(\mathfrak {S}_{\textrm{p}}\) be the \(\sigma \)-algebras generated by \(\hat{\varvec{\sigma }}_{\textrm{f}},\rho ^{\hat{\varvec{G}}_{\textrm{f}},\hat{\varvec{\sigma }}_{\textrm{f}}}\) and \(\hat{\varvec{\sigma }}_{\textrm{p}},\rho ^{\hat{\varvec{G}}_{\textrm{p}},\hat{\varvec{\sigma }}_{\textrm{p}}}\), respectively. Then we can use standard techniques from the theory of random graphs to derive typical properties of \(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}})\) given \(\mathfrak {S}_{\textrm{p}}\) and of \(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}})\) given \(\mathfrak {S}_{\textrm{f}}\), which are distributed precisely as \(\hat{\varvec{G}}_{\textrm{p}}\) and \(\hat{\varvec{G}}_{\textrm{f}}\) by Proposition 3.1. Using these characterisations, we are now going to prove Proposition 2.7.

3.2 Proof of Proposition 2.7

Lemma 3.4 gives sufficiently accurate information as to the distribution of \(\hat{\varvec{\sigma }}_{\textrm{p}},\rho ^{\hat{\varvec{G}}_{\textrm{p}},\hat{\varvec{\sigma }}_{\textrm{p}}}\) for us to couple the distribution of the colouring produced by the broadcasting process and the colouring that \(\hat{\varvec{\sigma }}_{\textrm{p}}\) induces on the neighbourhood of some particular vertex of \(\hat{\varvec{G}}_{\textrm{p}}\), say v.
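For orientation, here is a minimal sketch (ours) of the broadcasting process on the d-regular tree that the coupling targets, as we read it from the proof below: the root receives a uniform colour, and each child of a vertex of colour s receives colour t with probability proportional to \(\rho _{\textrm{p}}(s,t)\), i.e. it keeps the parent colour with probability \(\textrm{e}^\beta /(q-1+\textrm{e}^\beta )\) and otherwise picks one of the remaining colours uniformly; the root has d children and every other vertex \(d-1\).

```python
import math
import random

def broadcast(d, q, beta, depth, rng=random):
    p_stay = math.exp(beta) / (q - 1 + math.exp(beta))       # normalised rho_p(s, s)
    def child(s):
        if rng.random() < p_stay:
            return s
        return rng.choice([t for t in range(q) if t != s])   # other colours equally likely
    level, levels = [rng.randrange(q)], []                    # uniform colour at the root o
    levels.append(level)
    for ell in range(depth):
        fanout = d if ell == 0 else d - 1                     # d children at the root, d-1 below
        level = [child(s) for s in level for _ in range(fanout)]
        levels.append(level)
    return levels                                             # levels[ell]: colours on partial^ell o

print([len(l) for l in broadcast(d=3, q=3, beta=0.8, depth=4)])   # 1, 3, 6, 12, 24 vertices
```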

Lemma 3.6

Let \(d,q\ge 3\) be integers and \(\beta \in (0,{\beta _h})\) be real. Then, for any vertex v and any non-negative integer \(\ell =o(\log n)\), given \(\mathfrak {S}_{\textrm{p}}\) w.h.p. we have

$$\begin{aligned} d_{\textrm{TV}}(\hat{\varvec{\sigma }}_{\textrm{p},\partial ^\ell v},\tau _{\partial ^\ell o})=O\left( {d^\ell \left( {d_{\textrm{TV}}(\nu ^{\hat{\varvec{\sigma }}_{\textrm{p}}},\nu _{\textrm{p}})+d_{\textrm{TV}}(\rho ^{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\hat{\varvec{\sigma }}_{\textrm{p}}},\rho _{\textrm{p}})+O(n^{-0.99})}\right) }\right) . \end{aligned}$$

Proof

Proceeding by induction on \(\ell \), we construct a coupling of \(\hat{\varvec{\sigma }}_{\textrm{p},\partial ^\ell v}\) and \(\tau _{\partial ^\ell o}\). Let

$$\begin{aligned} \zeta =d_{\textrm{TV}}(\nu ^{\hat{\varvec{\sigma }}_{\textrm{p}}},\nu _{\textrm{p}})+d_{\textrm{TV}}(\rho ^{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\hat{\varvec{\sigma }}_{\textrm{p}}},\rho _{\textrm{p}}). \end{aligned}$$
(3.11)

In the case \(\ell =0\) the set \(\partial ^\ell v\) consists of v only, while \(\partial ^\ell o\) comprises only the root vertex o itself. Hence, the colours \(\hat{\varvec{\sigma }}_{\textrm{p}}(v)\) and \(\tau (o)\) can be coupled to coincide with probability at least \(1-\zeta \). As for \(\ell \ge 1\), assume by induction that \(\partial ^{\ell -1} v\) is acyclic and that \(\hat{\varvec{\sigma }}_{\textrm{p},\partial ^{\ell -1} v}\) and \(\tau _{\partial ^{\ell -1} o}\) coincide. Given \(\partial ^{\ell -1} v\) and \(\hat{\varvec{\sigma }}_{\textrm{p},\partial ^{\ell -1} v}\) every vertex u at distance precisely \(\ell -1\) from v in \(\hat{\varvec{G}}_{\textrm{p}}\) then requires another \(d-1\) neighbours outside of \(\partial ^{\ell -1} v\). Because \(\hat{\varvec{G}}_{\textrm{p}}\) is uniformly random given \(\mathfrak {S}_{\textrm{p}}\), for each u these \(d-1\) neighbours are simply the endpoints of edges \(e_{u,1},\ldots ,e_{u,d-1}\) drawn randomly from the set of all remaining edges with one endpoint of colour \(\hat{\varvec{\sigma }}_{\textrm{p}}(u)\). Since \(\ell =o(\log n)\), the subgraph \(\partial ^\ell v\) consumes no more than \(n^{o(1)}\) edges. As a consequence, for each neighbour \(w\not \in \partial ^{\ell -1}v\) the colour \(\hat{\varvec{\sigma }}_{\textrm{p}}(w)\) has distribution \(\rho ^\nu (\hat{\varvec{\sigma }}_{\textrm{p}}(u),\,\cdot \,)\), up to an error of \(n^{o(1)-1}\) in total variation. Finally, the probability that two vertices at distance precisely \(\ell \) from v are neighbours is bounded by \(n^{o(1)-1}\) as well.

By comparison, in the broadcasting process on \(\mathbb T_{d}\) the colours of the children of a vertex y are always drawn independently from the distribution \(\rho _{\textrm{p}}(\varvec{\sigma }_{d,\beta ,\nu _{\textrm{p}}}(y),\,\cdot \,)\). Hence, the colours of the vertices at distance \(\ell \) in the two processes can be coupled so that they coincide completely with probability \(1-O(d^\ell (\zeta +n^{o(1)-1}))\), as claimed.

In addition, since we work with the conditional Boltzmann distributions where we “cut off” a part of the phase space, we need to verify that the configuration is very unlikely to hit the boundary of \(S_{\textrm{p}}\). To see this, recall from Proposition 2.3 that, for \(\beta \in (0, {\beta _h})\), \((\nu _{\textrm{p}}, \rho _{\textrm{p}})\) is a stable local maximum of \(F_{d, \beta }\), i.e., there exist \(\delta ,c>0\) such that

$$\begin{aligned} F_{d,\beta }(\nu ',\rho ')&\le F_{d,\beta }(\nu _{\textrm{p}},\rho _{\textrm{p}})-c\left( {\Vert \nu _{\textrm{p}}-\nu '\Vert ^2+\Vert \rho _{\textrm{p}}-\rho '\Vert ^2}\right) \end{aligned}$$
(3.12)

for all \(\nu '\in \mathcal {P}([q])\) and \(\rho '\in {\mathcal R}(\nu ')\) such that \(\Vert \nu _{\textrm{p}}-\nu '\Vert +\Vert \rho _{\textrm{p}}-\rho '\Vert <\delta \). Now, choose \(\varepsilon \) in the definition of \(S_{\textrm{p}}(\varepsilon )\) such that \(\varepsilon >\delta \) and define \(T_p(\delta )=\left\{ \sigma \in [q]^{n}:\frac{1}{n} \sum _{c\in [q]} \left| {\sigma ^{-1} (c)}\right| = \nu _{\textrm{p}}+ \delta ^\Delta \right\} \) for some \(\Delta >0\). Moreover, define a probability distribution \(\nu _{\textrm{p}}'\) on the q colours by \(\nu _{\textrm{p}}'(c)=\frac{1}{q}+\frac{\delta ^{\Delta }}{ q}\) for all \(c \in [q]\) and let \(\rho _{\textrm{p}}' \in {\mathcal R}(\nu _{\textrm{p}}')\) be the corresponding maximiser of \(F_{d, \beta }(\nu _{\textrm{p}}', \cdot )\) (as in (2.4)). Furthermore, choose \(\Delta \) sufficiently small so that \( \Vert \nu _{\textrm{p}}-\nu _{\textrm{p}}'\Vert +\Vert \rho _{\textrm{p}}-\rho _{\textrm{p}}'\Vert <\delta \). Thus, by (3.12) and Lemma 3.2 we have

$$\begin{aligned} \mathbb {P}\left[{\hat{\varvec{\sigma }}_{\textrm{p}}\in T_p(\delta ) }\right]&=\sum _{\sigma \in T_p(\delta ) }\frac{\varvec{1}\left\{ {\sigma \in S_{\textrm{p}}}\right\} \mathbb {E}[\textrm{e}^{\beta \mathcal {H}_{\varvec{G}}(\sigma )}]}{\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]} \le \frac{\exp \left( { n F_{d,\beta }(\nu _{\textrm{p}}',\rho _{\textrm{p}}') }\right) }{\exp \left( { n F_{d,\beta }(\nu _{\textrm{p}},\rho _{\textrm{p}}) + O(\log n)}\right) } \\&\le \exp \left( {\left( {- c \left( { \delta ^{2 \Delta } + \Vert \rho _{\textrm{p}}-\rho '\Vert ^2 }\right) + o(1) }\right) n }\right) \le \exp \left( { \left( {-K +o(1)}\right) n}\right) \end{aligned}$$

for some constant \(K>0\), as desired. \(\quad \square \)

The colouring of the neighbourhood of a vertex v in \(\hat{\varvec{G}}_{\textrm{f}}\) admits a similar coupling with the ferromagnetic version of the broadcasting process.

Lemma 3.7

Let \(d,q\ge 3\) be integers and \(\beta >{\beta _u}\) be real. Then, for any vertex v and any non-negative integer \(\ell =o(\log n)\), given \(\mathfrak {S}_{\textrm{f}}\) w.h.p. we have

$$\begin{aligned} d_{\textrm{TV}}(\hat{\varvec{\sigma }}_{\textrm{f},\partial ^\ell v},\tau _{\partial ^\ell o})=O\left( {d^\ell \left( {d_{\textrm{TV}}(\nu ^{\hat{\varvec{\sigma }}_{\textrm{f}}},\nu _{\textrm{f}})+d_{\textrm{TV}}(\rho ^{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}),\hat{\varvec{\sigma }}_{\textrm{f}}},\rho _{\textrm{f}})+n^{-0.99}}\right) }\right) . \end{aligned}$$

Proof

The argument from the proof of Lemma 3.6 carries over directly. \(\quad \square \)

Proof of Proposition 2.7

We prove the first statement concerning \(\hat{\varvec{G}}_{\textrm{f}}\); the proof of the second statement for \(\hat{\varvec{G}}_{\textrm{p}}\) is analogous. Due to Proposition 3.1 we may work with the random graph \(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}})\) with planted configuration \(\hat{\varvec{\sigma }}_{\textrm{f}}\). Fix an arbitrary vertex v and \(\ell =\lceil \log \log n\rceil \). For the first assertion, by the Nishimori identity, it suffices to prove that

$$\begin{aligned} \sum _{c\in [q]}\mathbb {E}\left| {\nu _{\textrm{f}}(c)-\mu _{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}),\beta } (\varvec{\sigma }_{v}=c\mid \varvec{\sigma }_{\partial ^\ell v}=\hat{\varvec{\sigma }}_{\textrm{f}}{}_{,\partial ^\ell v})}\right|&<\ell ^{-3}, \end{aligned}$$
(3.13)

where the expectation is over the choice of the pair \((\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}),\hat{\varvec{\sigma }}_{\textrm{f}})\). Indeed, the desired \((o(1),\ell ^{-1},\ell )\)-non-reconstruction property follows from (3.13) and Markov’s inequality.

To obtain (3.13) we first apply Lemma 3.5, which implies that with probability \(1-o(1/n)\),

$$\begin{aligned} d_{\textrm{TV}}\left( {\nu ^{\hat{\varvec{\sigma }}_{\textrm{f}}},\nu _{\textrm{f}}}\right) +d_{\textrm{TV}}\left( {\rho ^{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}),\hat{\varvec{\sigma }}_{\textrm{f}}},\rho _{\textrm{f}}}\right) \le n^{-1/4}. \end{aligned}$$
(3.14)

Further, assuming (3.14), we obtain from Lemma 3.7 that

$$\begin{aligned} d_{\textrm{TV}}(\hat{\varvec{\sigma }}_{\textrm{f},\partial ^\ell v},\tau _{\partial ^\ell o})=o(n^{-1/5}). \end{aligned}$$
(3.15)

Hence, the colourings \(\hat{\varvec{\sigma }}_{\textrm{f},\partial ^\ell v}\) and \(\tau _{\partial ^\ell o}\) can be coupled so that they are identical with probability \(1-o(n^{-1/5})\). Consequently, (3.13) follows from Proposition 2.6.

It remains to prove the second assertion concerning \((\exp (-\chi n),\delta ,\ell ,\mu _{\textrm{f}})\)-non-reconstruction. To this end, given \(\delta >0\) pick a sufficiently large \(\ell =\ell (d,\beta ,\delta )>0\), a sufficiently small \(\zeta =\zeta (\delta ,\ell )>0\) and even smaller \(\xi =\xi (\delta ,\ell ,\zeta )>0\), \(\chi =\chi (d,\beta ,\xi )>0\). Then in light of Lemma 3.5 we may assume that

$$\begin{aligned} d_{\textrm{TV}}\left( {\nu ^{\hat{\varvec{\sigma }}_{\textrm{f}}},\nu _{\textrm{f}}}\right) +d_{\textrm{TV}}\left( {\rho ^{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}),\hat{\varvec{\sigma }}_{\textrm{f}}},\rho _{\textrm{f}}}\right) <\xi . \end{aligned}$$
(3.16)

Further, let \(\varvec{X}\) be the number of vertices u such that

$$\begin{aligned} \sum _{c\in [q]}\left| {\nu _{\textrm{f}}(c)-\mu _{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}),\beta }(\varvec{\sigma }_{u}=c\mid \varvec{\sigma }_{\partial ^\ell u}=\hat{\varvec{\sigma }}_{\textrm{f}}{}_{,\partial ^\ell u})}\right| >\zeta . \end{aligned}$$

Then Proposition 2.6, (3.16) and Lemma 3.7 imply that \(\mathbb {E}[\varvec{X}]<\zeta n\). Moreover, \(\varvec{X}\) is tightly concentrated about its mean. Indeed, adding or removing a single edge of the random d-regular graph \(\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}})\) can alter the \(\ell \)-th neighbourhoods of no more than \(d^\ell \) vertices. Therefore, the Azuma–Hoeffding inequality shows that

$$\begin{aligned} \mathbb {P}\left[{\varvec{X}>\mathbb {E}[\varvec{X}\mid \mathfrak {S}_{\textrm{f}}]+\zeta n\mid \mathfrak {S}_{\textrm{f}}}\right]<\exp (-\chi n), \end{aligned}$$
(3.17)

as desired. \(\quad \square \)
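For concreteness, one admissible way to instantiate the Azuma–Hoeffding step behind (3.17), sketched here under the crude bounded-difference constant \(2d^\ell \) per exposed edge suggested by the observation above, is to reveal the \(dn/2\) edges of the pairing one at a time and apply the standard bound:

$$\begin{aligned} \mathbb {P}\left[{\varvec{X}>\mathbb {E}[\varvec{X}\mid \mathfrak {S}_{\textrm{f}}]+\zeta n\mid \mathfrak {S}_{\textrm{f}}}\right]\le \exp \left( {-\frac{2(\zeta n)^2}{\tfrac{dn}{2}\,(2d^\ell )^2}}\right) =\exp \left( {-\frac{\zeta ^2 n}{d^{2\ell +1}}}\right) , \end{aligned}$$

so that any constant \(\chi \le \zeta ^2/d^{2\ell +1}\) satisfies (3.17); the precise constant is immaterial for the argument.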

3.3 Proof of Corollary 2.8

We derive the corollary from Proposition 2.7, the Nishimori identity from Proposition 3.1 and the formula (2.6) for the second moment. As a first step we derive an estimate of the typical overlap of two configurations drawn from the Boltzmann distribution. To be precise, for a graph \(G=(V,E)\), the overlap of two configurations \(\sigma ,\sigma '\in [q]^{V}\) is defined as the probability distribution \(\nu (\sigma ,\sigma ')\in \mathcal {P}([q]^2)\) with

$$\begin{aligned} \nu _{c,c'}(\sigma ,\sigma ')&=\frac{1}{n}\sum _{v\in V(G)}\varvec{1}\left\{ {\sigma _{v}=c,\,\sigma '_{v}=c'}\right\}{} & {} (c,c'\in [q]). \end{aligned}$$

Thus, \(\nu (\sigma ,\sigma ')\) gauges the frequency of the colour combinations among vertices.
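As a small illustration (ours), the overlap matrix and its distance from a product measure can be computed directly; comparing against the product of the empirical marginals approximates the quantity controlled in Lemma 3.8 below whenever those marginals are close to \(\nu _{\textrm{p}}\).

```python
import numpy as np

def overlap(sigma, sigma_prime, q):
    # empirical distribution nu(sigma, sigma') on [q] x [q]
    nu = np.zeros((q, q))
    for c, cp in zip(sigma, sigma_prime):
        nu[c, cp] += 1.0 / len(sigma)
    return nu

def tv_from_product(nu):
    # total variation distance between nu and the product of its marginals
    return 0.5 * np.abs(nu - np.outer(nu.sum(axis=1), nu.sum(axis=0))).sum()

rng = np.random.default_rng(0)
s1, s2 = rng.integers(0, 3, 3000), rng.integers(0, 3, 3000)
print(tv_from_product(overlap(s1, s2, 3)))     # small for independent uniform colourings
```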

Lemma 3.8

Let \(d,q\ge 3\) be integers and \(\beta <{\beta _h}\) be real. Let \(\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}}'\) be independent samples from \(\mu _{\hat{\varvec{G}}_{\textrm{p}},\beta }(\,\cdot \,\mid S_{\textrm{p}})\). Then \(\mathbb {E}\left[{d_{\textrm{TV}}\big (\nu (\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}}'),\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}\big )}\right]=o(1)\).

Proof

Due to the Nishimori identity (3.3) it suffices to prove that w.h.p. for a sample \(\varvec{\sigma }_{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}})}\) from \(\mu _{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\beta }(\,\cdot \,\mid S_{\textrm{p}})\) it holds that

$$\begin{aligned} d_{\textrm{TV}}\big (\nu (\hat{\varvec{\sigma }}_{\textrm{p}},\varvec{\sigma }_{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\textrm{p}}),\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}\big )&=o(1) \end{aligned}$$
(3.18)

To see (3.18), for colours \(s,t\in [q]\), we consider the first and second moment of the number of vertices u with \(\hat{\varvec{\sigma }}_{\textrm{p}}(u)=s\) and \(\varvec{\sigma }_{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\textrm{p}}(u)=t\). To facilitate the analysis of the second moment, it will be convenient to consider the following configuration \(\varvec{\sigma }_{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\textrm{p}}'\). Let \(\varvec{v},\varvec{w}\) be two random vertices such that \(\hat{\varvec{\sigma }}_{\textrm{p}}(\varvec{v})=\hat{\varvec{\sigma }}_{\textrm{p}}(\varvec{w})=s\). Also let \(\ell =\ell (n)=\lceil \log \log n\rceil \). Now, draw \(\varvec{\sigma }_{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\textrm{p}}''\) from \(\mu _{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\beta }(\,\cdot \,\mid S_{\textrm{p}})\) and subsequently generate \(\varvec{\sigma }_{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\textrm{p}}'\) by re-sampling the colours of the vertices at distance less than \(\ell \) from \(\varvec{v},\varvec{w}\) given the colours of the vertices at distance precisely \(\ell \) from \(\varvec{v},\varvec{w}\) and the event \(S_{\textrm{p}}\). Then \(\varvec{\sigma }_{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\textrm{p}}'\) has distribution \(\mu _{\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\beta }(\,\cdot \,\mid S_{\textrm{p}})\). Moreover, since the \(\ell \)-neighbourhoods of the two random vertices \(\varvec{v},\varvec{w}\) are disjoint w.h.p., Proposition 2.7 implies that w.h.p.

$$\begin{aligned}&\mathbb {P}\left[{\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}}'(\varvec{v})=\chi ,\,\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}}'(\varvec{w})=\chi '\mid \hat{\varvec{\sigma }}_{\textrm{p}},\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}),\varvec{v},\varvec{w}}\right]\nonumber \\&\quad =\nu _{\textrm{p}}(\chi )\nu _{\textrm{p}}(\chi ')+o(1)\qquad \text{ for } \text{ all } \chi ,\chi '\in [q]. \end{aligned}$$
(3.19)

Hence, for a colour \(t\in [q]\) let \(\varvec{X}(s,t)\) be the number of vertices u with \(\hat{\varvec{\sigma }}_{\textrm{p}}(u)=s\) and \(\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{p}},\textrm{p}}'(u)=t\). Then (3.19) shows that w.h.p.

$$\begin{aligned} \mathbb {E}\left[{\varvec{X}(s,t)\mid \hat{\varvec{\sigma }}_{\textrm{p}},\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}})}\right]\sim \frac{n}{q^2},\qquad \mathbb {E}\left[{\varvec{X}(s,t)^2\mid \hat{\varvec{\sigma }}_{\textrm{p}},\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}})}\right]&\sim \frac{n^2}{q^4}. \end{aligned}$$

Thus, (3.18) follows from Chebyshev’s inequality. \(\quad \square \)

Lemma 3.9

Let \(d,q\ge 3\) be integers and \(\beta >{\beta _u}\) be real. Let \(\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{f}},\textrm{f}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{f}},\textrm{f}}'\) be independent samples from \(\mu _{\hat{\varvec{G}}_{\textrm{f}},\beta }(\,\cdot \,\mid S_{\textrm{f}})\). Then \(\mathbb {E}\left[{d_{\textrm{TV}}\big (\nu (\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{f}},\textrm{f}},\varvec{\sigma }_{\hat{\varvec{G}}_{\textrm{f}},\textrm{f}}'),\nu _{\textrm{f}}\otimes \nu _{\textrm{f}}\big )}\right]=o(1)\).

Proof

The same argument as in the proof of Lemma 3.8 applies. \(\quad \square \)

We proceed to apply the second moment method to truncated versions of the paramagnetic and ferromagnetic partition functions \(Z_{\textrm{p}},Z_{\textrm{f}}\) where we expressly drop graphs that violate the overlap bounds from Lemmas 3.8 and 3.9. Thus, we introduce

$$\begin{aligned} Y_{\textrm{p}}(G)&=Z_{\textrm{p}}(G)\cdot \varvec{1}\left\{ {\mathbb {E}\big [d_{\textrm{TV}}(\nu (\varvec{\sigma }_{G,\textrm{p}}, \varvec{\sigma }_{G,\textrm{p}}'),\nu _{\textrm{p}}\otimes \nu _{\textrm{p}})\big ]=o(1)}\right\} ,\end{aligned}$$
(3.20)
$$\begin{aligned} Y_{\textrm{f}}(G)&=Z_{\textrm{f}}(G)\cdot \varvec{1}\left\{ {\mathbb {E}\big [d_{\textrm{TV}}(\nu (\varvec{\sigma }_{G,\textrm{f}},\varvec{\sigma }_{G,\textrm{f}}'), \nu _{\textrm{f}}\otimes \nu _{\textrm{f}})\big ]=o(1)}\right\} . \end{aligned}$$
(3.21)

Estimating the second moments of these two random variables is a cinch because by construction we can avoid an explicit optimisation of the function \(F^\otimes _{d,\beta }\) from (2.6). Indeed, because we drop graphs G whose overlaps stray far from the product measures \(\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}\) and \(\nu _{\textrm{f}}\otimes \nu _{\textrm{f}}\), respectively, we basically just need to evaluate the function \(F^\otimes _{d,\beta }\) at \(\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}\) and \(\nu _{\textrm{f}}\otimes \nu _{\textrm{f}}\).

Corollary 3.10

Let \(d\ge 3\).

  1. (i)

    If \(\beta <{\beta _h}\), then \(\mathbb {E}[Y_{\textrm{p}}(\varvec{G})]\sim \mathbb {E}[Z_{\textrm{p}}(\varvec{G})]\) and \(\mathbb {E}[Y_{\textrm{p}}(\varvec{G})^2]\le \exp (o(n))\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]^2\).

  2. (ii)

    If \(\beta >{\beta _u}\), then \(\mathbb {E}[Y_{\textrm{f}}(\varvec{G})]\sim \mathbb {E}[Z_{\textrm{f}}(\varvec{G})]\) and \(\mathbb {E}[Y_{\textrm{f}}(\varvec{G})^2]\le \exp (o(n))\mathbb {E}[Z_{\textrm{f}}(\varvec{G})]^2\).

Proof

Assume that \(\beta <{\beta _h}\). Let \({\mathcal E}_{\mathrm p}=\{G:\mathbb {E}\big [d_{\textrm{TV}}(\nu (\varvec{\sigma }_{G,\textrm{p}},\varvec{\sigma }_{G,\textrm{p}}'),\nu _{\textrm{p}}\otimes \nu _{\textrm{p}})\big ]=o(1)\}\). Combining Lemma 3.8 with the Nishimori identity (3.3), we obtain

$$\begin{aligned} \frac{\mathbb {E}[Y_{\textrm{p}}(\varvec{G})]}{\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]}&=\mathbb {P}\left[{\hat{\varvec{G}}_{\textrm{p}}\in {\mathcal E}_{\mathrm p}}\right]\sim 1 \end{aligned}$$
(3.22)

and thus \(\mathbb {E}[Y_{\textrm{p}}(\varvec{G})]\sim \mathbb {E}[Z_{\textrm{p}}(\varvec{G})]\).

Regarding the second moment, consider the set \(\mathcal {P}_{{\textrm{p}}}(n)\) of all probability distributions \(\nu \) on \([q]\times [q]\) such that \(n\nu (\chi ,\chi ')\) is an integer for all \(\chi ,\chi '\in [q]\) and such that \(d_{\textrm{TV}}(\nu ,\varvec{u})=o(1)\). Let \({\mathcal R}_{\textrm{p}}(\nu ,n)\) be the set of all distributions \(\rho \) on \([q]^4\) such that

$$\begin{aligned}&\rho (\chi ,\chi ',\chi '',\chi ''')=\rho (\chi '',\chi ''',\chi ,\chi ')\quad \text{ for } \text{ all } \chi ,\chi ',\chi '',\chi '''\in [q]\quad \text{ and }\end{aligned}$$
(3.23)
$$\begin{aligned}&\sum _{\chi '',\chi '''\in [q]}\rho (\chi ,\chi ',\chi '',\chi ''')=\nu (\chi ,\chi ')\quad \text{ for } \text{ all } \chi ,\chi '\in [q] \end{aligned}$$
(3.24)

and such that \(n\rho (\chi ,\chi ',\chi '',\chi ''')\) is an integer for all \(\chi ,\chi ',\chi '',\chi '''\in [q]\). Using the definition (3.20) of \(Y_{\textrm{p}}\), Lemma 2.1 and the linearity of expectation, we bound

$$\begin{aligned}&\mathbb {E}\left[{Y_{\textrm{p}}(\varvec{G})^2}\right]\le (1+o(1))\sum _{\sigma ,\sigma '\in [q]^n}\varvec{1}\left\{ {d_{\textrm{TV}}(\nu (\sigma ,\sigma '), \nu _{\textrm{p}}\otimes \nu _{\textrm{p}})=o(1)}\right\} \mathbb {E}\left[{\textrm{e}^{\beta (\mathcal {H}_{\varvec{G}}(\sigma )+\mathcal {H}_{\varvec{G}}(\sigma '))}}\right]\nonumber \\&\quad \le \sum _{\nu \in \mathcal {P}_{\textrm{p}}(n)}\left( {\begin{array}{c}n\\ \nu n\end{array}}\right) \sum _{\rho \in {\mathcal R}_{\textrm{p}}(\nu ,n)} \exp \Big [\frac{dn}{2}\sum _{\chi ,\chi ',\chi '',\chi '''=1}^q\rho (\chi ,\chi ',\chi '',\chi ''')\Big ( \log \frac{\nu (\chi ,\chi ')\nu (\chi '',\chi ''')}{\rho (\chi ,\chi ',\chi '',\chi ''')}\nonumber \\&\qquad +\beta \left( {\varvec{1}\left\{ {\chi =\chi ''}\right\} +\varvec{1}\left\{ {\chi '=\chi '''}\right\} }\right) \Big )+O(\log n)\Big ]. \end{aligned}$$
(3.25)

For any given \(\nu \) the term inside the square brackets is a strictly concave function of \(\rho \). Therefore, for any \(\nu \) there exists a unique maximiser \(\rho ^*_\nu \). Moreover, the set \({\mathcal R}_{\textrm{p}}(\nu ,n)\) has size \(|{\mathcal R}_{\textrm{p}}(\nu ,n)|=n^{O(1)}\). Hence, using Stirling’s formula we can simplify (3.25) to

$$\begin{aligned}&\mathbb {E}\left[{Y_{\textrm{p}}(\varvec{G})^2}\right]\le \sum _{\nu \in \mathcal {P}_{\textrm{p}}(n)} \exp \Big [-n\sum _{\chi ,\chi '=1}^q\nu (\chi ,\chi ')\log \nu (\chi ,\chi ')\nonumber \\&\qquad +\frac{dn}{2}\sum _{\chi ,\chi ',\chi '',\chi '''=1}^q\rho ^*_\nu (\chi ,\chi ',\chi '',\chi ''')\Big ( \log \frac{\nu (\chi ,\chi ')\nu (\chi '',\chi ''')}{\rho ^*_\nu (\chi ,\chi ',\chi '',\chi ''')}\nonumber \\&\qquad +\beta \left( {\varvec{1}\left\{ {\chi =\chi ''}\right\} +\varvec{1}\left\{ {\chi '=\chi '''}\right\} }\right) \Big )+O(\log n)\Big ]. \end{aligned}$$
(3.26)

To further simplify the expression notice that the maximiser \(\rho ^*_\nu \) is the unique solution to a concave optimisation problem subject to the linear constraints (3.23)–(3.24). Since the constraints (3.24) themselves are linear in \(\nu \), by the inverse function theorem the maximiser \(\rho ^*_\nu \) is a continuous function of \(\nu \). In effect, since \(|\mathcal {P}_{\textrm{p}}(n)|=n^{O(1)}\), we can bound (3.26) by the contribution of the uniform distribution \(\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}\) only. We thus obtain

$$\begin{aligned}&\mathbb {E}\left[{Y_{\textrm{p}}(\varvec{G})^2}\right]\le q^{2n}\exp \Big [\frac{dn}{2}\sum _{\chi ,\chi ',\chi '',\chi '''=1}^q\rho ^*_{\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}} (\chi ,\chi ',\chi '',\chi ''')\Big (\log \frac{\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}(\chi ,\chi ')\,\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}(\chi '',\chi ''')}{\rho ^*_{\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}}(\chi ,\chi ',\chi '',\chi ''')}\nonumber \\&\qquad +\beta \left( {\varvec{1}\left\{ {\chi =\chi ''}\right\} +\varvec{1}\left\{ {\chi '=\chi '''}\right\} }\right) \Big )+o(n)\Big ]. \end{aligned}$$
(3.27)

Finally, the maximiser \(\rho ^*_{\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}}\) in (3.27) works out to be \(\rho ^*_{\nu _{\textrm{p}}\otimes \nu _{\textrm{p}}}=\rho _{\textrm{p}}\otimes \rho _{\textrm{p}}\). To see this, recall that \(\nu _{\textrm{p}}\) is the uniform distribution on [q]. It therefore remains to show that subject to (3.23)–(3.24), the function

$$\begin{aligned} g(\rho )&=\sum _{\chi ,\chi ',\chi '',\chi '''=1}^q\rho (\chi ,\chi ',\chi '',\chi ''')\left( {\log \frac{\nu _{\textrm{p}}(\chi ) \nu _{\textrm{p}}(\chi ') \nu _{\textrm{p}}(\chi '') \nu _{\textrm{p}}(\chi ''')}{\rho (\chi ,\chi ',\chi '',\chi ''')}+\beta \left( {\varvec{1}\left\{ {\chi =\chi ''}\right\} +\varvec{1}\left\{ {\chi '=\chi '''}\right\} }\right) }\right) \\&=-4\log q-\sum _{\chi ,\chi ',\chi '',\chi '''=1}^q\rho (\chi ,\chi ',\chi '',\chi ''')\left( {\log \left( {\rho (\chi ,\chi ',\chi '',\chi ''')}\right) -\beta \left( {\varvec{1}\left\{ {\chi =\chi ''}\right\} +\varvec{1}\left\{ {\chi '=\chi '''}\right\} }\right) }\right) \end{aligned}$$

attains its maximum at the distribution \(\rho =\rho _{\textrm{p}}\otimes \rho _{\textrm{p}}\). Since g is strictly concave, the unique maximum occurs at the unique stationary point of the Lagrangian

$$\begin{aligned} L_{{\textrm{p}}}&=g(\rho )+\sum _{\chi ,\chi ',\chi '',\chi '''}\lambda _{\chi ,\chi ',\chi '',\chi '''} \left( {\rho \left( {\chi ,\chi ',\chi '',\chi '''}\right) -\rho \left( {\chi '',\chi ''',\chi ,\chi '}\right) }\right) \\&\quad +\sum _{\chi ,\chi '}\lambda _{\chi ,\chi '}\left( {\sum _{\chi '',\chi '''\in [q]} \rho (\chi ,\chi ',\chi '',\chi ''')-\nu (\chi ,\chi ')}\right) . \end{aligned}$$

Since the derivatives work out to be

$$\begin{aligned} \frac{\partial L_{{\textrm{p}}}}{\partial \rho (\chi ,\chi ',\chi '',\chi ''')}&=-1-\log \rho (\chi ,\chi ',\chi '',\chi ''') +\lambda _{\chi ,\chi ',\chi '',\chi '''} -\lambda _{\chi '',\chi ''',\chi ,\chi '} \\&\quad +\lambda _{\chi ,\chi '}+\beta \varvec{1}\{\chi =\chi ''\}+\beta \varvec{1}\{\chi '=\chi '''\}, \end{aligned}$$

for the choice \(\rho =\rho _{\textrm{p}}\otimes \rho _{\textrm{p}}\) there exist Lagrange multipliers such that all partial derivatives vanish.
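For completeness, here is one way to check the last claim, reading the tensor as \((\rho _{\textrm{p}}\otimes \rho _{\textrm{p}})(\chi ,\chi ',\chi '',\chi ''')=\rho _{\textrm{p}}(\chi ,\chi '')\rho _{\textrm{p}}(\chi ',\chi ''')\) (the ordering consistent with (3.24)) and using the explicit \(\rho _{\textrm{p}}(s,t)=\textrm{e}^{\beta \varvec{1}\{s=t\}}/(q(q-1+\textrm{e}^\beta ))\) from the proof of Lemma 3.2. We have

$$\begin{aligned} \log \big (\rho _{\textrm{p}}\otimes \rho _{\textrm{p}}\big )(\chi ,\chi ',\chi '',\chi ''')&=\beta \varvec{1}\{\chi =\chi ''\}+\beta \varvec{1}\{\chi '=\chi '''\}-2\log \big (q(q-1+\textrm{e}^\beta )\big ), \end{aligned}$$

so upon substituting \(\rho =\rho _{\textrm{p}}\otimes \rho _{\textrm{p}}\) into the partial derivatives above the \(\beta \) indicator terms cancel and what remains does not depend on \((\chi ,\chi ',\chi '',\chi ''')\). Since \(\rho _{\textrm{p}}\) is symmetric, \(\rho _{\textrm{p}}\otimes \rho _{\textrm{p}}\) satisfies the symmetry constraint (3.23) and we may take all multipliers \(\lambda _{\chi ,\chi ',\chi '',\chi '''}=0\), while \(\sum _{\chi '',\chi '''}\rho _{\textrm{p}}(\chi ,\chi '')\rho _{\textrm{p}}(\chi ',\chi ''')=\nu _{\textrm{p}}(\chi )\nu _{\textrm{p}}(\chi ')\) confirms (3.24); choosing \(\lambda _{\chi ,\chi '}\) equal to the negative of the remaining constant then makes every partial derivative vanish.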

The proof of (ii) proceeds analogously. \(\quad \square \)

Proof of Corollary 2.8

The corollary is now an immediate consequence of Corollary 3.10, the Paley-Zygmund and Azuma inequalities. \(\quad \square \)

3.4 Proof of Corollary 2.9

To prove Corollary 2.9 we derive the following general transfer principle from the estimate of the Boltzmann weights of \(S_{\textrm{f}}\) and \(S_{\textrm{p}}\) from Corollary 2.8.

Lemma 3.11

Let \(d\ge 3\).

  1. (i)

    If \(\beta <{\beta _h}\), then for any event \({\mathcal E}\) with \(\mathbb {P}\left[{\hat{\varvec{G}}_{\textrm{p}}\in {\mathcal E}}\right]\le \exp (-\Omega (n))\) we have \(\mathbb {P}\left[{\mathbb {G}\in {\mathcal E}}\right]=o(1)\).

  2. (ii)

    If \(\beta >{\beta _u}\), then for any event \({\mathcal E}\) with \(\mathbb {P}\left[{\hat{\varvec{G}}_{\textrm{f}}\in {\mathcal E}}\right]\le \exp (-\Omega (n))\) we have \(\mathbb {P}\left[{\mathbb {G}\in {\mathcal E}}\right]=o(1)\).

Proof

This follows from a “quiet planting” argument akin to the one from [2]. Specifically, Theorem 2.5 and Proposition 2.3 show that for \(\beta <{\beta _h}\) the event \(\mathcal {Z}_{\textrm{p}}=\{Z_{\textrm{p}}(\varvec{G})=\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]\exp (o(n))\}\) occurs w.h.p. Therefore, recalling the definition (2.8) of the planted model, we obtain

$$\begin{aligned} \mathbb {P}\left[{\varvec{G}\in {\mathcal E}}\right]&\le \mathbb {P}\left[{\varvec{G}\in {\mathcal E}\cap \mathcal {Z}_{\textrm{p}}}\right] +\mathbb {P}\left[{\varvec{G}\not \in \mathcal {Z}_{\textrm{p}}}\right]\le \frac{\mathbb {E}[\varvec{1}\{\varvec{G}\in {\mathcal E}\} Z_{\textrm{p}}(\varvec{G})]\exp (o(n))}{\mathbb {E}[Z_{\textrm{p}}(\varvec{G})]}+o(1)\nonumber \\&\le \exp (o(n))\mathbb {P}\left[{\hat{\varvec{G}}_{\textrm{p}}\in {\mathcal E}}\right]+o(1)=o(1). \end{aligned}$$
(3.28)

Since the simple random regular graph \(\mathbb {G}\) is contiguous with respect to \(\varvec{G}\), assertion (i) follows from (3.28). The proof of (ii) is identical. \(\quad \square \)

Proof of Corollary 2.9

The assertion follows from Lemma 3.11 and Proposition 2.7. \(\quad \square \)

4 Metastability and Slow Mixing

In this section, we prove Theorems 1.1 and 1.2. Recall from Sect. 1.3 the paramagnetic and ferromagnetic states \(S_{\textrm{p}}(\varepsilon )\) and \(S_{\textrm{f}}(\varepsilon )\) for \(\varepsilon >0\). For the purposes of this section we will need to be more systematic in keeping track of the dependence of these phases on \(\varepsilon \). In particular, we will use the more explicit notation \(Z_{\textrm{p}}^{\varepsilon }(G)\) and \(Z_{\textrm{f}}^{\varepsilon }(G)\) to denote the quantities \(Z_{\textrm{p}}(G)\) and \(Z_{\textrm{f}}(G)\), respectively, from (2.7).

The following lemma reflects the fact that \(\nu _{\textrm{p}}\) and \(\nu _{\textrm{f}}\) are local maxima of the first moment.

Lemma 4.1

Let \(q,d\ge 3\) be integers and \(\beta >0\) be real. Then, for all sufficiently small constants \(\varepsilon '>\varepsilon >0\) and any constant \(\theta >0\), there exists constant \(\zeta >0\) such that w.h.p. over \(G\sim \varvec{G}\), it holds that

  1. (1)

    If \(\beta <{\beta _h}\), then \(Z_{\textrm{p}}^{\varepsilon }(G)= \textrm{e}^{ o(n)}\mathbb {E}[Z_{\textrm{p}}^{\varepsilon }(\varvec{G})]\)  and  \(Z_{\textrm{p}}^{\varepsilon '}(G)\le (1+\textrm{e}^{-\zeta n})Z_{\textrm{p}}^{\varepsilon }(G)\).

  2. (2)

    If \(\beta >{\beta _u}\), then \(Z_{\textrm{f}}^{\varepsilon }(G)= \textrm{e}^{o(n)}\mathbb {E}[Z_{\textrm{f}}^{\varepsilon }(\varvec{G})]\) and \(Z_{\textrm{f}}^{\varepsilon '}(G)\le (1+\textrm{e}^{-\zeta n})Z_{\textrm{f}}^{\varepsilon }(G)\).

Proof

We first prove Item 1, so let \(\beta <{\beta _h}\). Recall from (2.2) the function \(F(\nu ,\rho ):=F_{d, \beta }(\nu ,\rho )\) for \(\nu \in \mathcal {P}([q])\) and \(\rho \in {\mathcal R}(\nu )\). By Corollary 2.8, for every sufficiently small constant \(\varepsilon >0\), we have that

$$\begin{aligned} \mathbb {E}[\tfrac{1}{n}\log Z_{\textrm{p}}^{\varepsilon }(\varvec{G})]=\mathcal {B}_{d,\beta }(\mu _{\textrm{p}})+o(1)=F(\nu _{\textrm{p}},\rho _{\textrm{p}})+o(1), \end{aligned}$$

where the last equality holds by Lemma 2.2. Applying Azuma’s inequality to the random variable \(\log Z_{\textrm{p}}^{\varepsilon }(\varvec{G})\) by revealing the edges of \(\varvec{G}\) one-by-one, we therefore obtain that w.h.p. it holds that \(Z_{\textrm{p}}^{\varepsilon }(G)= \textrm{e}^{nF(\nu _{\textrm{p}},\rho _{\textrm{p}})+o(n)}\). Also, from Lemma 3.2 we have that \(\mathbb {E}[Z_{\textrm{p}}^{\varepsilon }(\varvec{G})]= \textrm{e}^{nF(\nu _{\textrm{p}},\rho _{\textrm{p}})+o(n)}\), so we obtain that \(Z_{\textrm{p}}^{\varepsilon }(G)= \textrm{e}^{o(n)}\mathbb {E}[Z_{\textrm{p}}^{\varepsilon }(\varvec{G})]\) proving the first inequality of Item 1. For the second inequality, recall from Proposition 2.3 that \((\nu _{\textrm{p}},\rho _{\textrm{p}})\) is a local maximum of F for \(\beta <{\beta _h}\), cf. (2.3). Therefore, for all sufficiently small constants \(\varepsilon '>\varepsilon >0\), there exists constant \(\zeta >0\) such that

$$\begin{aligned} F(\nu ,\rho )\le F(\nu _{\textrm{p}},\rho _{\textrm{p}})-4\zeta \end{aligned}$$
(4.1)

for all \(\nu \in \mathcal {P}([q])\) and \(\rho \in {\mathcal R}(\nu )\) with

$$\begin{aligned} \varepsilon \le \left\| {\nu -\nu _{\textrm{p}}}\right\| + \left\| {\rho -\rho _{\textrm{p}}}\right\| \le \varepsilon '. \end{aligned}$$
(4.2)

Using (3.10), we see that

$$\begin{aligned} \mathbb {E}\big [Z_{\textrm{p}}^{\varepsilon '}(\varvec{G})-Z_{\textrm{p}}^{\varepsilon }(\varvec{G})\big ]\le \sum _{\nu , \rho }\exp (nF(\nu ,\rho )+O(\log n)) \end{aligned}$$

where the sum ranges over \(\nu \in \mathcal {P}([q])\) and \(\rho \in {\mathcal R}(\nu )\) satisfying (4.2) such that \(n\nu (s),dn\rho (s,t)\) are integers for all \(s,t\in [q]\), and \(dn\rho (s,s)\) is even. Since there are at most \(n^{O(1)}\) choices for such colour statistics \(\nu ,\rho \), we obtain that \(\mathbb {E}\big [Z_{\textrm{p}}^{\varepsilon '}(\varvec{G})-Z_{\textrm{p}}^{\varepsilon }(\varvec{G})\big ]\le \textrm{e}^{n(F(\nu _{\textrm{p}},\rho _{\textrm{p}})-3\zeta )}\) for all sufficiently large n. By Markov’s inequality, we therefore have that w.h.p. \(Z_{\textrm{p}}^{\varepsilon '}(G)-Z_{\textrm{p}}^{\varepsilon }(G)\le \textrm{e}^{n F(\nu _{\textrm{p}},\rho _{\textrm{p}})-2\zeta n}\). As we showed above, w.h.p. \(Z_{\textrm{p}}^{\varepsilon }(G)= \textrm{e}^{nF(\nu _{\textrm{p}},\rho _{\textrm{p}})+o(n)}\), so combining these we obtain that w.h.p. \(Z_{\textrm{p}}^{\varepsilon }(G)\ge \textrm{e}^{\zeta n}\big (Z_{\textrm{p}}^{\varepsilon '}(G)-Z_{\textrm{p}}^{\varepsilon }(G)\big )\), completing the proof of Item 1 of the lemma.

For the second item of the lemma, the proof is completely analogous, using the fact from Proposition 2.3 that \((\nu _{\textrm{f}},\rho _{\textrm{f}})\) is a local maximum of \(F(\nu ,\rho )\) for \(\beta >{\beta _u}\). \(\quad \square \)

Theorem 1.1 will follow by way of a conductance argument. Let \(G=(V,E)\) be a graph, and P be the transition matrix for the Glauber dynamics defined in Sect. 1.4. For a set \(S \subseteq [q]^{V}\) define the bottleneck ratio of S to be

$$\begin{aligned} \Phi \left( {S}\right) = \frac{\sum _{\sigma \in S,\,\tau \not \in S}\mu _{G,\beta }(\sigma )P(\sigma ,\tau )}{\mu _{G,\beta }(S)} \end{aligned}$$
(4.3)
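To make (4.3) concrete, the following sketch (ours) evaluates the bottleneck ratio of a set S for a small reversible chain specified by its stationary distribution and transition matrix; the two-well chain below is a toy illustration unrelated to the Potts dynamics.

```python
import numpy as np

def bottleneck_ratio(mu, P, in_S):
    # sum_{sigma in S, tau not in S} mu(sigma) P(sigma, tau), divided by mu(S)
    flow_out = mu[in_S] @ P[np.ix_(in_S, ~in_S)]
    return flow_out.sum() / mu[in_S].sum()

eps = 1e-3                                    # tiny crossing probability between the two wells
P = np.array([[0.5, 0.5 - eps, eps, 0.0],
              [0.5 - eps, 0.5, 0.0, eps],
              [eps, 0.0, 0.5, 0.5 - eps],
              [0.0, eps, 0.5 - eps, 0.5]])
mu = np.full(4, 0.25)                         # uniform is stationary for this symmetric chain
print(bottleneck_ratio(mu, P, np.array([True, True, False, False])))   # equals eps
```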

The following lemma provides a routine conductance bound (e.g., [38, Theorem 7.3]). For the sake of completeness the proof is included in Appendix A.

Lemma 4.2

Let \(G=(V,E)\) be a graph. For any \(S \subseteq [q]^{V}\) such that \(\mu _{G}(S)>0\) and any integer \(t\ge 0\) we have \(\left\| { \mu _{G,S} P^t - \mu _{G,S}}\right\| _{TV} \le t \Phi (S).\)

Proof of Theorem 1.1

We prove the statement for the pairing model \(\varvec{G}\); the result for \(\mathbb {G}\) follows immediately by contiguity. Let \(\varepsilon '>\varepsilon >0\) and \(\zeta >0\) be small constants such that Lemma 4.1 applies, and let \(G\sim \varvec{G}\) be a graph satisfying the lemma. For convenience, set \(\mu =\mu _{G,\beta }\); we consider first the metastability of \(S_{\textrm{f}}(\varepsilon )\) for \(\beta >{\beta _u}\).

Since Glauber updates one vertex at a time it is impossible in one step to move from \(\sigma \in S_{\textrm{f}}(\varepsilon )\) to \(\tau \in [q]^n\backslash S_{\textrm{f}}(\varepsilon ')\), i.e., \(P(\sigma ,\tau )=0\), and therefore

$$\begin{aligned} \Phi \big (S_{\textrm{f}}(\varepsilon )\big )&=\frac{\sum _{\sigma \in S_{\textrm{f}}(\varepsilon )}\sum _{\tau \notin S_{\textrm{f}}(\varepsilon )}\mu (\sigma )P(\sigma ,\tau )}{\mu \big (S_{\textrm{f}}(\varepsilon )\big )}= \frac{\sum _{\sigma \in S_{\textrm{f}}(\varepsilon )}\sum _{\tau \in S_{\textrm{f}}(\varepsilon ')\backslash S_{\textrm{f}}(\varepsilon )}\mu (\sigma )P(\sigma ,\tau )}{\mu \big (S_{\textrm{f}}(\varepsilon )\big )} \end{aligned}$$

By reversibility of Glauber, for any \(\sigma ,\tau \in [q]^n\) we have \(\mu (\sigma )P(\sigma ,\tau )=\mu (\tau )P(\tau ,\sigma )\), and therefore

$$\begin{aligned}{} & {} \sum _{\sigma \in S_{\textrm{f}}(\varepsilon )}\sum _{\tau \in S_{\textrm{f}}(\varepsilon ')\backslash S_{\textrm{f}}(\varepsilon )}\mu (\sigma )P(\sigma ,\tau )\\{} & {} \quad =\sum _{\tau \in S_{\textrm{f}}(\varepsilon ')\backslash S_{\textrm{f}}(\varepsilon )}\mu (\tau )\sum _{\sigma \in S_{\textrm{f}}(\varepsilon )}P(\tau ,\sigma )\le \sum _{\tau \in S_{\textrm{f}}(\varepsilon ')\backslash S_{\textrm{f}}(\varepsilon )}\mu (\tau )=\mu \big (S_{\textrm{f}}(\varepsilon ')\backslash S_{\textrm{f}}(\varepsilon )\big ) \end{aligned}$$

Hence, \(\Phi \big (S_{\textrm{f}}(\varepsilon )\big )\le \frac{\mu \big (S_{\textrm{f}}(\varepsilon ')\backslash S_{\textrm{f}}(\varepsilon )\big )}{\mu \big (S_{\textrm{f}}(\varepsilon )\big )}=\tfrac{Z_{\textrm{f}}^{\varepsilon '}(G)-Z_{\textrm{f}}^{\varepsilon }(G)}{Z_{\textrm{f}}^{\varepsilon }(G)}\le \textrm{e}^{-\zeta n}\), where the last inequality follows from the fact that G satisfies Lemma 4.1. Lemma 4.2 therefore ensures that for all nonnegative integers \(T\le \textrm{e}^{\zeta n/3}\)

$$\begin{aligned} \left\| { \mu \big (\,\cdot \,\mid S_{\textrm{f}}(\varepsilon )\big ) P^{T} - \mu \big (\,\cdot \,\mid S_{\textrm{f}}(\varepsilon )\big )}\right\| _{TV} \le T \cdot \Phi (S_{\textrm{f}})\le \textrm{e}^{-2\zeta n/3}. \end{aligned}$$
(4.4)

Now, consider the Glauber dynamics \((\sigma _t)_{t\ge 0}\) launched from \(\sigma _0\) drawn from \(\mu \big (\,\cdot \,\mid S_{\textrm{f}}(\varepsilon )\big )\), and denote by \(T_{\textrm{f}}=\min \left\{ {t >0: \sigma _t \notin S_{\textrm{f}}(\varepsilon )}\right\} \) its escape time from \(S_{\textrm{f}}(\varepsilon )\). Observe that \(\sigma _t\) has the same distribution as \(\mu (\,\cdot \,\mid S_{\textrm{f}}(\varepsilon )) P^{t}\), so (4.4) implies that for all nonnegative integers \(T\le \textrm{e}^{\zeta n/3}\)

$$\begin{aligned} \big |\mathbb {P}\left[{\sigma _T\in S_{\textrm{f}}(\varepsilon )}\right]-1\big |<\textrm{e}^{-2\zeta n/3}, \text{ or } \text{ equivalently } \mathbb {P}\left[{\sigma _T\notin S_{\textrm{f}}(\varepsilon )}\right]\le \textrm{e}^{-2\zeta n/3}. \end{aligned}$$

By a union bound over the values of T, we therefore obtain that \(\mathbb {P}[T_{\textrm{f}}\le \textrm{e}^{\zeta n/3}]\le \textrm{e}^{-\zeta n/3}\), thus proving that \(S_{\textrm{f}}(\varepsilon )\) is a metastable state for \(\beta >{\beta _u}\). Analogous arguments show that \(S_{\textrm{p}}(\varepsilon )\) is a metastable state for \(\beta <{\beta _h}\).

The slow mixing of Glauber for \(\beta >{\beta _u}\) follows from the metastability of \(S_{\textrm{f}}(\varepsilon )\). In particular, from Theorem 1.3 we have that \(\left\| {\mu \big (\,\cdot \,\mid S_{\textrm{f}}(\varepsilon )\big )-\mu }\right\| \ge 3/5\) and therefore, from (4.4), \(\left\| { \mu \big (\,\cdot \,\mid S_{\textrm{f}}(\varepsilon )\big ) P^{T}-\mu }\right\| \ge 1/2\), yielding that the mixing time is \(\textrm{e}^{\Omega (n)}\). \(\quad \square \)

The final ingredients to establish Theorem 1.2 are the following results, bounding the probability that Swendsen–Wang escapes \(S_{\textrm{p}}(\varepsilon )\) and \(S_{\textrm{f}}(\varepsilon )\). More precisely, for a graph G, a configuration \(\sigma \in [q]^n\), and \(S\subseteq [q]^n\), let \(P^{G}_{SW}(\sigma \rightarrow S)\) denote the probability that after one step of SW on G starting from \(\sigma \), we end up in a configuration in S.
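For reference, here is a sketch (ours) of a single SW update under the convention we assume for the weights \(\textrm{e}^{\beta \mathcal {H}}\) used here: every monochromatic edge is retained independently with probability \(1-\textrm{e}^{-\beta }\), and each connected component of the retained subgraph is then assigned a fresh uniformly random colour.

```python
import math
import random

def sw_step(n, edges, sigma, beta, q, rng=random):
    parent = list(range(n))                     # union-find over the n vertices
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    p_keep = 1.0 - math.exp(-beta)              # retention probability for monochromatic edges
    for u, v in edges:
        if sigma[u] == sigma[v] and rng.random() < p_keep:
            parent[find(u)] = find(v)           # merge the two components
    new_colour = {}                             # one fresh uniform colour per component root
    return [new_colour.setdefault(find(v), rng.randrange(q)) for v in range(n)]

# example: one step on a triangle
print(sw_step(3, [(0, 1), (1, 2), (2, 0)], [0, 0, 1], 1.0, 3))
```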

The following proposition shows that for almost all pairs \((G,\sigma )\) from the paramagnetic planted distribution \(\big (\hat{\varvec{G}}\big (\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big ),\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big )\), the probability that SW leads to a configuration in the paramagnetic phase, slightly enlarged, is \(1-\textrm{e}^{-\Omega (n)}\).

Proposition 4.3

Let \(q,d\ge 3\) be integers and \(\beta \in ({\beta _u},{\beta _h})\) be real. Then, for all sufficiently small constants \(\varepsilon '>\varepsilon >0\), there exists constant \(\eta >0\) such that with probability \(1-\textrm{e}^{- \eta n}\) over the planted distribution \((G,\sigma )\sim \big (\hat{\varvec{G}}\big (\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big ),\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big )\), it holds that \(P^{G}_{SW}\big (\sigma \rightarrow S_{\textrm{p}}(\varepsilon ')\big )\ge 1-\textrm{e}^{-\eta n}\).

The following establishes the analogue of the previous proposition for the ferromagnetic planted distribution \(\big (\hat{\varvec{G}}\big (\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big ),\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big )\). Note here that SW might change the dominant colour due to the recolouring step, so, for \(\varepsilon >0\), we now need to consider the set of configurations \(\tilde{S_{\textrm{f}}}(\varepsilon )\) that consists of the ferromagnetic phase \(S_{\textrm{f}}(\varepsilon )\) together with its \(q-1\) permutations, and the probability that SW escapes from it, starting from a ferromagnetic state.

Proposition 4.4

Let \(q,d\ge 3\) be integers and \(\beta \in ({\beta _u},{\beta _h})\) be real. Then, for all sufficiently small constants \(\varepsilon '>\varepsilon >0\), there exists constant \(\eta >0\) such that with probability \(1-\textrm{e}^{-\eta n}\) over the planted distribution \((G,\sigma )\sim \big (\hat{\varvec{G}}\big (\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big ),\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big )\), it holds that \(P^{G}_{SW}\big (\sigma \rightarrow \tilde{S_{\textrm{f}}}(\varepsilon ')\big )\ge 1-\textrm{e}^{-\eta n}\).

Proof of Theorem 1.2

We prove the statement for the pairing model \(\varvec{G}\); the result for \(\mathbb {G}\) follows immediately by contiguity. We consider first the metastability for the ferromagnetic phase when \(\beta >{\beta _u}\). Let \(\varepsilon '>\varepsilon >0\) and \(\eta ,\zeta >0\) be small constants such that Lemma 4.1 and Propositions 4.3 and 4.4 all apply. Let \(\theta =\tfrac{1}{10}\min \{\eta ,\zeta \}\).

Let \(\mathcal {Q}\) be the set of d-regular (multi)graphs that satisfy

$$\begin{aligned}Z_{\textrm{f}}^{\varepsilon }(G)\ge \textrm{e}^{-\theta n}\mathbb {E}[Z_{\textrm{f}}^{\varepsilon }(\varvec{G})]\quad \text{ and } \quad Z_{\textrm{f}}^{\varepsilon '}(G)\le (1+\textrm{e}^{-\zeta n})Z_{\textrm{f}}^{\varepsilon }(G),\end{aligned}$$

and note that by Item 2 of Lemma 4.1 it holds that \(\mathbb {P}[\varvec{G}\in \mathcal {Q}]=1-o(1)\). Moreover, let \(\mathcal {Q}'\) be the set of d-regular (multi)graphs G such that the set of configurations from which SW has a non-negligible probability of escaping \(\tilde{S_{\textrm{f}}}(\varepsilon ')\) has small weight, i.e., the set

$$\begin{aligned} S_{\textrm{Bad}}(G)=\big \{\sigma \in \tilde{S_{\textrm{f}}}(\varepsilon )\,\big |\, P^{G}_{SW}\big (\sigma \rightarrow \tilde{S_{\textrm{f}}}(\varepsilon ')\big )<1- \textrm{e}^{-\eta n}\big \} \end{aligned}$$

has aggregate weight \(Z_{\textrm{Bad}}(G)=\sum _{\sigma \in S_{\textrm{Bad}}(G)}\textrm{e}^{\beta \mathcal {H}_G(\sigma )}\) less than \(\textrm{e}^{-\theta n}Z_{\textrm{f}}^{\varepsilon }(G)\). We claim that for a d-regular graph G such that \(G\in \mathcal {Q}\cap \mathcal {Q}'\), it holds that \(\Phi _{SW}\big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )\le 10\textrm{e}^{-\theta n}\), where \(\Phi _{SW}(\cdot )\) denotes the bottleneck ratio for the SW-chain. Indeed, we have

$$\begin{aligned} \Phi _{SW}\big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )&=\frac{\sum _{\sigma \in \tilde{S_{\textrm{f}}}(\varepsilon )}\mu (\sigma )P^G_{SW}(\sigma \rightarrow [q]^n\backslash \tilde{S_{\textrm{f}}}(\varepsilon ))}{\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )}\\&\le \frac{\mu \big (S_{\textrm{Bad}}(G)\big )+\sum _{\sigma \in \tilde{S_{\textrm{f}}}(\varepsilon )\backslash S_{\textrm{Bad}}(G)}\mu (\sigma )P^G_{SW}(\sigma \rightarrow [q]^n\backslash \tilde{S_{\textrm{f}}}(\varepsilon ))}{\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )} \end{aligned}$$

We can decompose the sum in the numerator of the last expression as

$$\begin{aligned} \sum _{\sigma \in \tilde{S_{\textrm{f}}}(\varepsilon )\backslash S_{\textrm{Bad}}(G)}\mu (\sigma )P^G_{SW}\big (\sigma \rightarrow [q]^n\backslash \tilde{S_{\textrm{f}}}(\varepsilon ')\big )+\sum _{\sigma \in \tilde{S_{\textrm{f}}}(\varepsilon )\backslash S_{\textrm{Bad}}(G)}\mu (\sigma )P^G_{SW}\big (\sigma \rightarrow \tilde{S_{\textrm{f}}}(\varepsilon ')\backslash \tilde{S_{\textrm{f}}}(\varepsilon )\big ). \end{aligned}$$

For \(\sigma \in \tilde{S_{\textrm{f}}}(\varepsilon )\backslash S_{\textrm{Bad}}(G)\), we have \(P^G_{SW}\big (\sigma \rightarrow [q]^n\backslash \tilde{S_{\textrm{f}}}(\varepsilon ')\big )\le \textrm{e}^{-\eta n}\) and therefore the first sum is upper bounded by \(\textrm{e}^{-\eta n}\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )\). The second sum, using the reversibility of the SW chain, is upper bounded by \(\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon ')\backslash \tilde{S_{\textrm{f}}}(\varepsilon )\big )\). Using these, we therefore have that

$$\begin{aligned} \Phi _{SW}\big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )&\le \frac{\mu \big (S_{\textrm{Bad}}(G)\big )+\textrm{e}^{-\eta n}\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )+\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon ')\backslash \tilde{S_{\textrm{f}}}(\varepsilon )\big )}{\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )}\le 10\textrm{e}^{-\theta n}, \end{aligned}$$

since \(\frac{\mu \big (S_{\textrm{Bad}}(G)\big )}{\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )}=\frac{Z_{\textrm{Bad}}(G)}{qZ_{\textrm{f}}^{\varepsilon }(G)}\le \textrm{e}^{-\theta n}\) from the assumption \(G\in \mathcal {Q}'\) and \(\frac{\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon ')\backslash \tilde{S_{\textrm{f}}}(\varepsilon )\big )}{\mu \big (\tilde{S_{\textrm{f}}}(\varepsilon )\big )}=\frac{q(Z_{\textrm{f}}^{\varepsilon '}(G)-Z_{\textrm{f}}^{\varepsilon }(G))}{qZ_{\textrm{f}}^{\varepsilon }(G)}\le \textrm{e}^{-\theta n}\) from Lemma 4.1. By arguments analogous to those in the proof of Theorem 1.1, we have that \(\tilde{S_{\textrm{f}}}(\varepsilon )\) is a metastable state for graphs \(G\in \mathcal {Q}\cap \mathcal {Q}'\). Therefore, to finish the metastability proof for the random graph, it suffices to show that \(\mathbb {P}[\varvec{G}\in \mathcal {Q}\cap \mathcal {Q}']=1-o(1)\).

To do this, let \(\mathcal {G}(n,d)\) be the set of all multigraphs that can be obtained in the pairing model and \(\Lambda _{d,\beta }(n)=\big \{(G,\sigma )\, \big | \, G\in \mathcal {G}(n,d),\ \sigma \in \tilde{S_{\textrm{f}}}(\varepsilon )\big \}\). Let \({\mathcal E}\) be the pairs \((G,\sigma )\in \Lambda _{d,\beta }(n)\) where one step of SW starting from \(G,\sigma \) stays within \(\tilde{S_{\textrm{f}}}(\varepsilon ')\) with probability \(1- \textrm{e}^{-\Omega (n)}\), more precisely

$$\begin{aligned} {\mathcal E}=\Big \{(G,\sigma )\in \Lambda _{d,\beta }(n)\, \big | \, P^{G}_{SW}\big (\sigma \rightarrow \tilde{S_{\textrm{f}}}(\varepsilon ')\big )\ge 1- \textrm{e}^{-\eta n}\Big \}. \end{aligned}$$

The aggregate weight corresponding to pairs \((G,\sigma )\) that do not belong to \({\mathcal E}\) can be lower-bounded by

$$\begin{aligned}{} & {} \sum _{(G,\sigma )\in \Lambda _{d,\beta }\backslash {\mathcal E}} \textrm{e}^{\beta \mathcal {H}_G(\sigma )}\ge \sum _{\begin{array}{c} (G,\sigma )\in \Lambda _{d,\beta }\backslash {\mathcal E};\\ G\in \mathcal {Q}\backslash \mathcal {Q}' \end{array}} \textrm{e}^{\beta \mathcal {H}_G(\sigma )}\\{} & {} \quad =\sum _{G\in \mathcal {Q}\backslash \mathcal {Q}'} \sum _{\sigma \in S_{\textrm{Bad}}(G)}\textrm{e}^{\beta \mathcal {H}_G(\sigma )}\ge \textrm{e}^{-\theta n}\sum _{G\in \mathcal {Q}\backslash \mathcal {Q}'}Z_{\textrm{f}}^{\varepsilon }(G). \end{aligned}$$

For graphs \(G\in \mathcal {Q}\) we have \(Z_{\textrm{f}}^{\varepsilon }(G)\ge \textrm{e}^{-\theta n}\mathbb {E}[Z_{\textrm{f}}^{\varepsilon }(\varvec{G})]\), and therefore

$$\begin{aligned} \sum _{(G,\sigma )\in \Lambda _{d,\beta }\backslash {\mathcal E}} \textrm{e}^{\beta \mathcal {H}_G(\sigma )}\ge \textrm{e}^{-2\theta n} \big |\mathcal {Q}\backslash \mathcal {Q}'\big |\ \mathbb {E}\big [Z_{\textrm{f}}^{\varepsilon }(\varvec{G})\big ]=\textrm{e}^{-2\theta n}\big |\mathcal {Q}\backslash \mathcal {Q}'\big |\,\frac{\sum _{(G,\sigma )\in \Lambda _{d,\beta }} \textrm{e}^{\beta \mathcal {H}_G(\sigma )}}{\big |\mathcal {G}(n,d)\big |} \end{aligned}$$
(4.5)

From the definition of \(\big (\hat{\varvec{G}}\big (\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big ),\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big )\), cf. (3.1),(3.2), observe that

$$\begin{aligned} \frac{\sum _{(G,\sigma )\in \Lambda _{d,\beta }\backslash {\mathcal E}} \textrm{e}^{\beta \mathcal {H}_G(\sigma )}}{\sum _{(G,\sigma )\in \Lambda _{d,\beta }} \textrm{e}^{\beta \mathcal {H}_G(\sigma )}}=\mathbb {P}\big [\big (\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )),\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big )\in \Lambda _{d,\beta }\backslash {\mathcal E}\big ]\le \textrm{e}^{-\eta n}\le \textrm{e}^{-10\theta n}, \end{aligned}$$

where the penultimate inequality follows from Proposition 4.4 and the last from the choice of \(\theta \). Combining this with (4.5), we obtain \(\mathbb {P}[\varvec{G}\in \mathcal {Q}\backslash \mathcal {Q}']=o(1)\). Since \(\mathbb {P}[\varvec{G}\in \mathcal {Q}]=1-o(1)\) from Lemma 4.1, it follows that

$$\begin{aligned} \mathbb {P}[\varvec{G}\in \mathcal {Q}\cap \mathcal {Q}']\ge \mathbb {P}[\varvec{G}\in \mathcal {Q}]-\mathbb {P}[\varvec{G}\in \mathcal {Q}\backslash \mathcal {Q}']\ge 1-o(1). \end{aligned}$$

This concludes the proof for the metastability of the ferromagnetic phase \(\tilde{S_{\textrm{f}}}(\varepsilon )\) when \(\beta >{\beta _u}\).

A similar bottleneck-ratio argument shows that \(S_{\textrm{p}}(\varepsilon )\) is a metastable state for \(\beta <{\beta _h}\). The slow mixing of SW for \(\beta \in ({\beta _u},{\beta _h})\) follows from the metastability of \(\tilde{S_{\textrm{f}}}(\varepsilon )\) when \(\beta \in ({\beta _u},{\beta _p}]\) and the metastability of \(S_{\textrm{p}}(\varepsilon )\) when \(\beta \in [{\beta _p},{\beta _h})\). In particular, let \(S\in \{\tilde{S_{\textrm{f}}}(\varepsilon ),S_{\textrm{p}}(\varepsilon )\}\) be such that \(\left\| {\mu \big (\,\cdot \,\mid S\big )-\mu }\right\| \ge 1/2\); then Lemma 4.2 gives that for \(T=\textrm{e}^{\Omega (n)}\), it holds that \(\left\| { \mu \big (\,\cdot \,\mid S\big ) P^{T}_{SW}-\mu }\right\| \ge 1/2-1/10\), yielding that the mixing time is \(\textrm{e}^{\Omega (n)}\). \(\quad \square \)

5 Remaining Proofs for Swendsen–Wang

To analyse the Swendsen–Wang dynamics on the d-regular random graph \(\varvec{G}\), we will need to consider the component structure after performing edge percolation with probability \(p\in (0,1)\). The key quantities of interest are the size of the largest component and the sum of squares of the component sizes; the first signals whether we land in the paramagnetic or the ferromagnetic phase, and the second allows us to track the random fluctuations caused by the colouring step of SW. Both of these ingredients have been worked out in detail for the mean-field case; here the random regular graph makes all the arguments technically more involved, even for a single iteration (recall that the reason it suffices to analyse a single iteration is the quiet planting idea of Sects. 3 and 4).

5.1 Percolation on random regular graphs

For a graph G and \(p\in (0,1)\), we denote by \(G_p\) the random graph obtained by keeping every edge of G with probability p. Working in the configuration model, we will denote by \(\varvec{G}_{p}:=\varvec{G}_{p}(n,d)\) the multigraph obtained by first choosing a random matching of the points in \([n]\times [d]\), then keeping each edge of the matching with probability p, and finally projecting the edges onto vertices in [n]. It will also be relevant to consider the multigraph \(\tilde{\varvec{G}}_p:=\tilde{\varvec{G}}_{p}(n,d)\) where in the second step we instead keep a random subset of exactly \(m=[pdn/2]\) edges. To help differentiate between the two models, we will refer to \(\varvec{G}_{p}\) as the binomial-edge model and to \(\tilde{\varvec{G}}_p\) as the exact-edge model. Note that for an n-vertex multigraph G of maximum degree d with m edges, the two models are related by

$$\begin{aligned} \mathbb {P}\big [\varvec{G}_{p}=G\mid E(\varvec{G}_{p})=m\big ]=\mathbb {P}[\tilde{\varvec{G}}_{\tilde{p}}=G], \text{ where } \tilde{p}=2m/nd. \end{aligned}$$

See, for example, [25, Lemma 3.1]. Based on this, it is standard to relate the two models for events that are monotone under edge inclusion.

Lemma 5.1

Let \(d\ge 3\) be an integer and \(p^*\in (0,1)\) be a constant. There exists a constant \(c>0\) such that, for any constant \(\delta \in (0,1)\), for any increasing property \({\mathcal E}\) and any decreasing property \(\mathcal {F}\) on multigraphs of maximum degree d, it holds that

$$\begin{aligned} \tfrac{1}{2}\mathbb {P}[\tilde{\varvec{G}}_{p^*-\delta }\in {\mathcal E}]&\le \mathbb {P}[\varvec{G}_{p^*}\in {\mathcal E}]\le \mathbb {P}[\tilde{\varvec{G}}_{p^*+\delta }\in {\mathcal E}]+\textrm{e}^{-c\delta ^2 n},\\ \tfrac{1}{2}\mathbb {P}[\tilde{\varvec{G}}_{p^*+\delta }\in \mathcal {F}]&\le \mathbb {P}[\varvec{G}_{p^*}\in \mathcal {F}]\le \mathbb {P}[\tilde{\varvec{G}}_{p^*-\delta }\in \mathcal {F}]+\textrm{e}^{-c\delta ^2 n}. \end{aligned}$$

Proof

Let \(\mathcal {A}\) be the event that \(E(\varvec{G}_{p^*})\) has \((p^*\pm \delta )dn/2\) edges. By standard Chernoff bounds we obtain that there exists a constant \(c>0\) such that \(\mathbb {P}(\mathcal {A})\ge 1- \textrm{e}^{-c\delta ^2 n}\). Further, conditioned on \(|E(\varvec{G}_{p^*})|=p dn/2\) for some p, the graph \(\varvec{G}_{p^*}\) has the same distribution as \(\tilde{\varvec{G}}_{p}\), and therefore, using the fact that \({\mathcal E}\) is an increasing property, we have that \(\mathbb {P}[\tilde{\varvec{G}}_{p^*+\delta }\in {\mathcal E}]\ge \mathbb {P}[\varvec{G}_{p^*}\in {\mathcal E}\mid \mathcal {A}]\ge \mathbb {P}[\tilde{\varvec{G}}_{p^*-\delta }\in {\mathcal E}]\). The lemma follows since \(\mathbb {P}[\varvec{G}_{p^*}\in {\mathcal E}]\ge \mathbb {P}(\mathcal {A})\,\mathbb {P}[\varvec{G}_{p^*}\in {\mathcal E}\mid \mathcal {A}]\) and \(\mathbb {P}[\varvec{G}_{p^*}\in {\mathcal E}]\le \mathbb {P}[\varvec{G}_{p^*}\in {\mathcal E}\mid \mathcal {A}]+\mathbb {P}(\overline{\mathcal {A}})\); the inequalities are reversed for \(\mathcal {F}\). \(\quad \square \)

It is a classical result [4] that for percolation on random d-regular graphs there is a phase transition at \(p=1/(d-1)\) with regard to the emergence of a giant component; see also [34, 36, 44, 45]. To prove Propositions 4.3 and 4.4, we will need to control the sizes of the components in the strictly subcritical and supercritical regimes with probability bounds that are exponentially close to 1, which makes most of these results not directly applicable.

For a graph G and an integer \(i\ge 1\), we denote by \(C_i(G)\) the i-th largest component of G (in terms of vertices); \(|C_i(G)|\) and \(|E(C_i(G))|\) denote the number of vertices and edges in \(C_i(G)\). The following proposition gives the desired bound on the component sizes in the subcritical regime.

Proposition 5.2

Let \(d\ge 3\) be an integer and \(p_0<1/(d-1)\) be a positive constant. There exist constants \(c,M>0\) such that the following holds for all integers n. For any positive \(p<p_0\), with probability at least \(1-\textrm{e}^{-cn}\) over the choice of either \(G\sim \varvec{G}_p\) or \(G\sim {\tilde{\varvec{G}}}_p\), it holds that \(\sum _{i\ge 1}|C_i(G)|^2\le M n\).

Proof

The proof is fairly standard and in fact applies to percolation on an arbitrary graph of maximum degree d. We argue first for the binomial-edge case \(G\sim \varvec{G}_p\). Consider the process in which we examine the vertices of G in an arbitrary order and explore, by breadth-first search, the components of those vertices that have not been discovered so far. Suppose that we have already explored the components \(\mathcal {C}_1,\ldots , \mathcal {C}_k\) and we are exploring the component \(\mathcal {C}_{k+1}\) starting from vertex v. Since the graph has maximum degree d, the size of \(\mathcal {C}_{k+1}\) is stochastically dominated by the total progeny of a branching process in which the root has offspring distribution \(\textrm{Bin}(d,p_0)\) and every other individual has offspring distribution \(\textrm{Bin}(d-1,p_0)\). Since \(p_0<1/(d-1)\), the latter process is subcritical and therefore there exist constants \(c',K>0\) (depending only on d and \(p_0\)) such that for all \(t>K\), it holds that

$$\begin{aligned} \mathbb {P}\big [|\mathcal {C}_{k+1}|>t\mid \mathcal {C}_1,\ldots , \mathcal {C}_k\big ]\le \textrm{e}^{-c' t}. \end{aligned}$$
(5.1)

We have that \(\sum _{i\ge 1}|C_i(G)|^2=\sum _{k\ge 1}|\mathcal {C}_k|^2\le K^2 n+\sum _{k\ge 1}|\mathcal {C}_k|^2\textbf{1}\{|\mathcal {C}_k|\ge K\}\). From (5.1), the sum in the last expression is stochastically dominated by the sum of n i.i.d. random variables with exponential tails, and therefore there exist constants \(c,M'>0\), depending only on d and \(p_0\), such that with probability \(1-\textrm{e}^{-cn}\) the sum is bounded by \(M'n\), yielding the result with \(M=M'+K^2\). The exact-edge case \(G\sim {\tilde{\varvec{G}}}_p\) follows by applying Lemma 5.1, noting that the graph property \(\sum _{i\ge 1}|C_i(G)|^2\le M n\) is decreasing under edge inclusion. \(\quad \square \)
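
To illustrate the tail bound (5.1), the following sketch (illustrative only; the parameters are arbitrary choices of ours) simulates the total progeny of the dominating branching process from the proof and estimates its tail and second moment; for \(p<1/(d-1)\) the tail decays roughly geometrically, which is what drives the bound \(\sum _{i\ge 1}|C_i(G)|^2\le Mn\).

```python
import random

def dominating_component_size(d, p, rng=random):
    """Total progeny of the dominating branching process from the proof of
    Proposition 5.2: the root has Bin(d, p) children and every subsequent
    individual has Bin(d-1, p) children; subcritical when p < 1/(d-1)."""
    def offspring(m):
        return sum(1 for _ in range(m) if rng.random() < p)
    total = 1
    active = offspring(d)              # children of the root
    while active > 0:
        total += 1
        active -= 1
        active += offspring(d - 1)     # children of the individual just counted
    return total

d, p = 3, 0.4                          # p < 1/(d-1) = 0.5, hence subcritical
rng = random.Random(1)
samples = [dominating_component_size(d, p, rng) for _ in range(200000)]
for t in (10, 20, 30, 40):
    print(t, sum(s > t for s in samples) / len(samples))   # roughly geometric decay, cf. (5.1)
print(sum(s * s for s in samples) / len(samples))          # the second moment is a constant
```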

The supercritical regime is more involved since we need to account for the giant component using large deviation bounds. While there is no off-the-shelf result that we can use, we can adapt a technique of Krivelevich et al. [36] that was developed in a closely related setting (for high-girth expanders, refining previous results of Alon, Benjamini and Stacey [4]).

For \(d\ge 3\) and \(p\in (\frac{1}{d-1},1)\), let \(\phi =\phi (p)\in (0,1)\) be the probability that a branching process with offspring distribution \(\textrm{Bin}(d-1,p)\) dies out, i.e., \(\phi (p)\in (0,1)\) is the (unique) solution of

$$\begin{aligned} \phi =(p \phi +1-p)^{d-1}, \quad \text{ and } \text{ define } \chi =\chi (p),\ \psi =\psi (p) \text{ by } \quad \chi =1-(p \phi +1-p)^{d}, \quad \psi =\tfrac{1}{2}dp(1-\phi ^2). \end{aligned}$$
(5.2)
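
For concreteness, the quantities in (5.2) are easy to evaluate numerically by iterating the extinction-probability fixed point; the sketch below (our own helper names, given for illustration only) does so for the supercritical example \(d=3\), \(p=0.6\).

```python
def extinction_prob(d, p, iters=100000, tol=1e-14):
    """Smallest fixed point in [0, 1] of phi = (p*phi + 1 - p)^(d-1), i.e. the
    extinction probability of a branching process with offspring Bin(d-1, p)."""
    phi = 0.0
    for _ in range(iters):
        nxt = (p * phi + 1.0 - p) ** (d - 1)
        if abs(nxt - phi) < tol:
            return nxt
        phi = nxt
    return phi

def chi_psi(d, p):
    """The giant-component vertex and edge densities chi(p), psi(p) of (5.2)."""
    phi = extinction_prob(d, p)
    chi = 1.0 - (p * phi + 1.0 - p) ** d
    psi = 0.5 * d * p * (1.0 - phi * phi)
    return chi, psi

# Example: d = 3 and p = 0.6 > 1/(d-1); roughly 70% of the vertices should lie
# in the giant component of the percolated random 3-regular graph.
print(chi_psi(3, 0.6))   # approximately (0.704, 0.722)
```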

In Appendix B, we show the following adapting the argument from [36].

Lemma 5.3

Let \(d\ge 3\) be an integer, \(p\in (\frac{1}{d-1},1)\) be a real, and \(\chi =\chi (p),\psi =\psi (p)\) be as in (5.2). Then, for any \(\delta >0\), with probability \(1-\textrm{e}^{-\Omega (n)}\) over the choice of either \(G\sim \varvec{G}_p\) or \(G\sim {\tilde{\varvec{G}}}_p\), it holds that

$$\begin{aligned} |C_1(G)|=(\chi \pm \delta ) n, \quad |E(C_1(G))|=(\psi \pm \delta ) n. \end{aligned}$$

With this and a bit of algebra, we can derive the analogue of Proposition 5.2 in the supercritical regime.

Proposition 5.4

Let \(d\ge 3\) be an integer. Consider arbitrary \(p_0\in (\tfrac{1}{d-1},1)\) and let \(\chi _0=\chi (p_0)\) be as in (5.2). Then, for all \(\delta >0\), there exist \(\varepsilon ,c,M>0\), such that the following holds. For all sufficiently large integers n and any \(p=p_0\pm \varepsilon \), with probability at least \(1-\textrm{e}^{-cn}\) over the choice of either \(G\sim \varvec{G}_p\) or \(G\sim {\tilde{\varvec{G}}}_p\), it holds that \(|C_1(G)|=(\chi _0 \pm \delta ) n\) and \(\sum _{i\ge 2}|C_i(G)|^2\le M n\).

To prove Proposition 5.4, the following inequality between \(\chi \) and \(\psi \) will be useful; it will allow us to conclude that once we remove the giant component, the remaining components are in the subcritical regime.

Lemma 5.5

Let \(d\ge 3\) be an integer and \(p\in (\tfrac{1}{d-1},1)\). Then, \(\frac{2(\tfrac{1}{2}dp-\psi )}{d(1-\chi )}< \tfrac{1}{d-1}\).

Proof

Using (5.2), we have

$$\begin{aligned} \tfrac{d(1-\chi )}{d-1}-2(\tfrac{1}{2}dp-\psi )&=\tfrac{d}{d-1}(p \phi +1-p)^{d}-dp\phi (p \phi +1-p)^{d-1}\\ &=\tfrac{d}{d-1}(p \phi +1-p)^{d-1}\big (1-p-(d-2)p\phi \big ), \end{aligned}$$

so it suffices to show that \(1-p-(d-2)p\phi >0\). Let \(g(y)=y -(p y+1-p)^{d-1}\) and note that \(g(\phi )=0\). Then, we have that \(g(0)<0\) and \(g(1)=0\). Moreover, \(g'(y)=1-(d-1)p(py+1-p)^{d-2}\) and hence \(g'(1)<0\). It follows that \(g(y)>0\) for \(y\uparrow 1\), and therefore there is \(y\in (0,1)\) such that \(g(y)=0\). Note that g is strictly concave and therefore cannot have three zeros in the interval (0, 1], so \(y=\phi \), and therefore \(g'(\phi )> 0\). It remains to observe that \(g'(\phi )=\tfrac{1-p-(d-2)p\phi }{p \phi +1-p}\), from where the desired inequality follows. \(\quad \square \)
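
As a sanity check (not part of the proof), the inequality of Lemma 5.5 can also be verified numerically on a grid of supercritical pairs (d, p); the sketch below uses the fixed-point iteration for (5.2).

```python
def phi_chi_psi(d, p, iters=100000, tol=1e-15):
    """phi(p), chi(p), psi(p) of (5.2) for offspring distribution Bin(d-1, p)."""
    phi = 0.0
    for _ in range(iters):
        nxt = (p * phi + 1.0 - p) ** (d - 1)
        if abs(nxt - phi) < tol:
            phi = nxt
            break
        phi = nxt
    chi = 1.0 - (p * phi + 1.0 - p) ** d
    psi = 0.5 * d * p * (1.0 - phi * phi)
    return phi, chi, psi

# Check 2*(dp/2 - psi) / (d*(1 - chi)) < 1/(d-1) on a grid of supercritical (d, p).
for d in range(3, 8):
    for k in range(1, 20):
        p = 1.0 / (d - 1) + (k / 20.0) * (1.0 - 1.0 / (d - 1))
        _, chi, psi = phi_chi_psi(d, p)
        lhs = 2.0 * (0.5 * d * p - psi) / (d * (1.0 - chi))
        assert lhs < 1.0 / (d - 1), (d, p, lhs)
print("Lemma 5.5 verified on the sampled grid")
```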

Proof of Proposition 5.4

Let \(\psi _0=\psi (p_0)\) and consider an arbitrarily small \(\delta >0\). Since \(\chi (p)\) and \(\psi (p)\) are continuous functions of p in the interval \((\tfrac{1}{d-1},1)\), we can pick \(\varepsilon >0\) so that, for all \(p=p_0\pm \varepsilon \) it holds that \(d|p-p_0|, |\chi (p)-\chi _0|,|\psi (p)-\psi _0|\le \delta /10\) and, by Lemma 5.5, \(\frac{2(\tfrac{1}{2}dp-\psi )+4\delta }{d(1-\chi )-\delta }<\tfrac{1}{d-1}-\delta \). Consider now an arbitrary \(p=p_0\pm \varepsilon \) and consider random G sampled from either of the distributions \(\varvec{G}_p\) or \(\tilde{\varvec{G}}_p\). Using the monotonicity of the events \(\{|C_1(G)|\ge t\},\{|E(C_1(G))|\ge t\}\), we obtain from Lemmas 5.1 and  5.3 (as well as a standard Chernoff bound for the number of edges in G) that there exists a constant \(c'>0\), depending only on \(d, p_0,\varepsilon \) (but not on p), such that with probability at least \(1-\textrm{e}^{-c'n}\) over the choice of G it holds that \(|E(G)|=\tfrac{1}{2}{dpn}\pm \delta n\), \(|C_1(G)|=(\chi _0 \pm \delta ) n\), and \(|E(C_1(G))|=(\psi _0 \pm \delta ) n\). Let \({\mathcal E}\) denote this event.

Note that conditioned on \(|C_1(G)|, |E(C_1(G))|\) and |E(G)|, the remaining components of G are distributed according to those in the exact-edge model \(\tilde{\varvec{G}}_{\tilde{p}}({\tilde{n}},d)\) with \(\tilde{n}=n-|C_1(G)|\) and \({\tilde{p}}=\tfrac{2}{d\tilde{n}}(|E(G)|-|E(C_1(G))|)\), conditioned on the event \(\mathcal {F}\) that all components have size less than \(|C_1(G)|\). Hence, conditioned on \({\mathcal E}\), we have that \(\tilde{p}\le \frac{2(\frac{1}{2}dpn-\psi n)+4\delta n}{d(n-\chi n)-\delta n}<\tfrac{1}{d-1}-\delta \) where the last inequality follows from the choice of \(\varepsilon \), i.e., \(\tilde{\varvec{G}}_{\tilde{p}}({\tilde{n}},d)\) is in the subcritical regime. Therefore, the probability of \(\mathcal {F}\) is \(1-\textrm{e}^{-\Omega (n)}\) and hence the conditioning on \(\mathcal {F}\) when considering \(\tilde{\varvec{G}}_{\tilde{p}}({\tilde{n}},d)\) can safely be ignored. From Proposition 5.2, we have that there exist constants \(M,c''>0\), depending only on d and \(p_0\), so that with probability at least \(1-\textrm{e}^{-c''n}\) over the choice of \(G'\sim \tilde{\varvec{G}}_{\tilde{p}}({\tilde{n}},d)\), it holds that \(\sum _{i\ge 1}|C_i(G')|^2\le M {\tilde{n}}\). Therefore, with probability at least \(1-\textrm{e}^{-cn}\) for a suitable constant \(c>0\), we have \(\sum _{i\ge 2}|C_i(G)|^2\le M n\). \(\quad \square \)
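
The conclusions of Lemma 5.3 and Proposition 5.4 can also be observed in simulation; the sketch below (illustrative only; the sampler and parameter choices are our own) percolates a configuration-model d-regular multigraph, compares the largest component with \(\chi (p)n\), and reports \(\sum _{i\ge 2}|C_i|^2/n\).

```python
import random
from collections import deque

def random_regular_multigraph(n, d, rng=random):
    """Configuration model: pair the n*d half-edges by a uniform random matching."""
    points = [v for v in range(n) for _ in range(d)]
    rng.shuffle(points)
    return [(points[2 * k], points[2 * k + 1]) for k in range(n * d // 2)]

def percolated_component_sizes(n, edges, p, rng=random):
    """Keep each edge independently with probability p (binomial-edge model)
    and return the component sizes in decreasing order."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen, sizes = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s], size, queue = True, 1, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    size += 1
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def chi(d, p, iters=100000, tol=1e-14):
    """chi(p) of (5.2), via the extinction-probability fixed point."""
    phi = 0.0
    for _ in range(iters):
        nxt = (p * phi + 1.0 - p) ** (d - 1)
        if abs(nxt - phi) < tol:
            phi = nxt
            break
        phi = nxt
    return 1.0 - (p * phi + 1.0 - p) ** d

n, d, p = 20000, 3, 0.6                       # supercritical: p > 1/(d-1) = 0.5
sizes = percolated_component_sizes(n, random_regular_multigraph(n, d), p)
print(sizes[0] / n, chi(d, p))                # both are close to 0.70
print(sum(c * c for c in sizes[1:]) / n)      # stays bounded as n grows
```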

5.2 Percolation in the planted model

Recall the edge-empirical distributions \(\rho _{G,\sigma }\), \(\rho _{\textrm{p}}\), \(\rho _{\textrm{f}}\), cf. (2.4). The following lemma will allow us to deduce the regime (subcritical or supercritical) that dictates the percolation step of SW when we start from the paramagnetic and ferromagnetic phases.

Lemma 5.6

For \(\beta <{\beta _h}\), any colour \(s\in [q]\) in the paramagnetic phase satisfies \((1-\textrm{e}^{-\beta })\frac{\rho _{\textrm{p}}(s,s)}{\nu _{\textrm{p}}(s)}<\tfrac{1}{d-1}\). For \(\beta >{\beta _u}\), any colour \(s\in [q]\) in the ferromagnetic phase satisfies \((1-\textrm{e}^{-\beta })\frac{\rho _{\textrm{f}}(s,s)}{\nu _{\textrm{f}}(s)}=\tfrac{(\textrm{e}^{\beta }-1)\mu _{\textrm{f}}(s)}{1+(\textrm{e}^{\beta }-1)\mu _{\textrm{f}}(s)}\); this is larger than \(\tfrac{1}{d-1}\) for the colour \(s=1\), and less than \(\tfrac{1}{d-1}\) for all the other \(q-1\) colours.

Proof

For the paramagnetic phase and any colour \(s\in [q]\), it follows from (2.4) that

$$\begin{aligned} \nu _{\textrm{p}}(s)=\tfrac{1}{q}, \qquad \rho _{\textrm{p}}(s,s)=\tfrac{\textrm{e}^{\beta }}{q\textrm{e}^{\beta }+(q^2-q)}, \end{aligned}$$

so \((1-\textrm{e}^{-\beta })\frac{\rho _{\textrm{p}}(s,s)}{\nu _{\textrm{p}}(s)}<\tfrac{1}{d-1}\) is equivalent to \((1-\textrm{e}^{-\beta })\frac{\textrm{e}^{\beta }}{\textrm{e}^{\beta }+q-1}<\tfrac{1}{d-1}\) which is true iff \(\beta <{\beta _h}\), since \({\beta _h}=\log (1+\tfrac{q}{d-2})\).

For the ferromagnetic phase, recall from Sect. 1.3 that \(x=\mu _{\textrm{f}}(1)\) is the largest number in the interval (1/q, 1) that satisfies

$$\begin{aligned} x=\frac{(1+(\textrm{e}^\beta -1)x)^{d-1}}{(1+(\textrm{e}^\beta -1)x)^{d-1} +(q-1)\big (1+(\textrm{e}^\beta -1)\tfrac{1-x}{q-1}\big )^{d-1}}. \end{aligned}$$
(5.3)

Let \(t=\tfrac{1+(\textrm{e}^\beta -1)x}{1+(\textrm{e}^\beta -1)\tfrac{1-x}{q-1}}\) and note that \(t>1\) since \(x>1/q\) and \(\beta >0\). Moreover, (5.3) can be written as \(x=\frac{t^{d-1}}{t^{d-1}+(q-1)}\), and hence \(t^{d-1}=\frac{(q-1)x}{1-x}\). Then, it follows from (2.4) that for colour \(s=1\) we have

$$\begin{aligned} \nu _{\textrm{f}}(1)&=\frac{t^d}{t^d+(q-1)}=\frac{t x}{tx+1-x}, \\ \rho _{\textrm{f}}(1,1)&=\frac{\textrm{e}^\beta x^2}{1+(\textrm{e}^\beta -1)\big (x^2+\tfrac{(1-x)^2}{q-1}\big )}=\frac{\textrm{e}^\beta t x^2}{(tx+1-x)\big (1+(\textrm{e}^\beta -1)x\big )}, \end{aligned}$$
(5.4)

whereas for colours \(s\ne 1\) we have

$$\begin{aligned} \nu _{\textrm{f}}(s)&=\frac{1}{t^d+(q-1)}=\frac{\tfrac{1-x}{q-1}}{tx+1-x}, \\ \rho _{\textrm{f}}(s,s)&=\frac{\textrm{e}^\beta \big (\tfrac{1-x}{q-1}\big )^2}{1+(\textrm{e}^\beta -1)\big (x^2+\tfrac{(1-x)^2}{q-1}\big )}=\frac{\textrm{e}^\beta t \big (\tfrac{1-x}{q-1}\big )^2}{(tx+1-x)\big (1+(\textrm{e}^\beta -1)x\big )}. \end{aligned}$$

Using these expressions, it takes only a few manipulations to verify that \((1-\textrm{e}^{-\beta })\frac{\rho _{\textrm{f}}(s,s)}{\nu _{\textrm{f}}(s)} =\tfrac{(\textrm{e}^{\beta }-1)\mu _{\textrm{f}}(s)}{1+(\textrm{e}^{\beta }-1)\mu _{\textrm{f}}(s)}\) for all colours \(s\in [q]\).

Using this, for \(s=1\), we have that the inequality \((1-\textrm{e}^{-\beta })\frac{\rho _{\textrm{f}}(1,1)}{\nu _{\textrm{f}}(1)}>\tfrac{1}{d-1}\) is equivalent to \((\textrm{e}^\beta -1)x>\tfrac{1}{d-2}\). Plugging \(x=\frac{t^{d-1}}{t^{d-1}+(q-1)}\) into \(t=\tfrac{1+(\textrm{e}^\beta -1)x}{1+(\textrm{e}^\beta -1)\tfrac{1-x}{q-1}}\) and solving for \((\textrm{e}^\beta -1)\) yields that \(\textrm{e}^\beta -1=\frac{(t-1)(t^{d-1}+q-1)}{t^{d-1}-t}\). Therefore the desired inequality becomes

$$\begin{aligned} \frac{(t-1)t^{d-1}}{t^{d-1}-t}>\tfrac{1}{d-2}, \text{ or } \text{ equivalently } (d-2)t^{d-1}-(d-1)t^{d-2}+1>0, \end{aligned}$$

which is true for any \(t>1\). For a colour \(s\ne 1\), the inequality \((1-\textrm{e}^{-\beta })\frac{\rho _{\textrm{f}}(s,s)}{\nu _{\textrm{f}}(s)}<\tfrac{1}{d-1}\) can be proved analogously. We have in particular the equivalent inequality \((\textrm{e}^\beta -1)\tfrac{1-x}{q-1}<\tfrac{1}{d-2}\), which further reduces to \(\frac{t-1}{t^{d-1}-t}<\tfrac{1}{d-2}\); the latter again holds for any \(t>1\). \(\quad \square \)
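
A quick numerical illustration of Lemma 5.6 (not part of the proof): we solve (5.3) for \(x=\mu _{\textrm{f}}(1)\) by iterating the BP recursion from \(x=1\), and evaluate the relevant ratios at the sample point \(q=3\), \(d=5\), \(\beta =0.66\); these parameters are our own choice, with \(\beta \) taken just below \({\beta _h}=\log 2\approx 0.693\), a point at which the ferromagnetic fixed point of (5.3) exists numerically.

```python
import math

def mu_f1(q, d, beta, iters=100000, tol=1e-14):
    """Largest fixed point of the BP equation (5.3) in (1/q, 1), obtained by
    iterating the recursion from x = 1 (the iteration decreases monotonically)."""
    eb1 = math.exp(beta) - 1.0
    x = 1.0
    for _ in range(iters):
        a = (1.0 + eb1 * x) ** (d - 1)
        b = (1.0 + eb1 * (1.0 - x) / (q - 1)) ** (d - 1)
        x_new = a / (a + (q - 1) * b)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

q, d, beta = 3, 5, 0.66                    # beta_h = log(1 + q/(d-2)) = log 2
threshold = 1.0 / (d - 1)
eb1 = math.exp(beta) - 1.0

# Paramagnetic phase: (1 - e^{-beta}) e^beta / (e^beta + q - 1) = (e^beta - 1) / (e^beta + q - 1),
# which is strictly below 1/(d-1) for beta < beta_h.
r_para = (math.exp(beta) - 1.0) / (math.exp(beta) + q - 1)
print("paramagnetic:  ", r_para, "<", threshold, r_para < threshold)

# Ferromagnetic phase: colour 1 is supercritical, the remaining colours are not.
x = mu_f1(q, d, beta)
r_1 = eb1 * x / (1.0 + eb1 * x)
r_s = eb1 * (1.0 - x) / (q - 1) / (1.0 + eb1 * (1.0 - x) / (q - 1))
print("colour 1:      ", r_1, ">", threshold, r_1 > threshold)
print("other colours: ", r_s, "<", threshold, r_s < threshold)
```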

5.3 Tracking one step of SW—Proof of Propositions 4.3 and 4.4

Proposition 4.3

Let \(q,d\ge 3\) be integers and \(\beta \in ({\beta _u},{\beta _h})\) be real. Then, for all sufficiently small constants \(\varepsilon '>\varepsilon >0\), there exists constant \(\eta >0\) such that with probability \(1-\textrm{e}^{- \eta n}\) over the planted distribution \((G,\sigma )\sim \big (\hat{\varvec{G}}\big (\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big ),\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big )\), it holds that \(P^{G}_{SW}\big (\sigma \rightarrow S_{\textrm{p}}(\varepsilon ')\big )\ge 1-\textrm{e}^{-\eta n}\).

Proof

Let \(\varepsilon >0\) be a sufficiently small constant so that by Lemma 3.4, for any constant \(\delta >0\), with probability \(1-\textrm{e}^{-\Omega (n)}\) over the choice of \((G,\sigma )\sim \big (\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )),\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big )\), we have

$$\begin{aligned} \left\| {\nu ^{\sigma }-\nu _{\textrm{p}}}\right\| \le \delta \quad \text{ and } \quad \left\| {\rho ^{G,\sigma }-\rho _{\textrm{p}}}\right\| \le \delta . \end{aligned}$$
(5.5)

Let \(\varepsilon '\) be an arbitrary constant such that \(\varepsilon '>\varepsilon \). We will show that there exists a constant \(\eta >0\) such that for arbitrary \(\nu \) and \(\rho \in {\mathcal R}(\nu )\) satisfying \(\left\| {\nu -\nu _{\textrm{p}}}\right\| \le \delta \) and \(\left\| {\rho -\rho _{\textrm{p}}}\right\| \le \delta \), for \((G,\sigma )\sim \big (\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )),\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big )\), it holds that

$$\begin{aligned} \mathbb {P}\Big [P^{G}_{SW}\big (\sigma \rightarrow S_{\textrm{p}}(\varepsilon ')\big )\ge 1-e^{-\eta n}\,\big |\, \nu ^{\sigma }=\nu , \rho ^{G,\sigma }=\rho \Big ]\ge 1-\textrm{e}^{-\eta n} \end{aligned}$$
(5.6)

and therefore the conclusion follows by aggregating over \(\nu \) and \(\rho \), using the law of total probability and the probability bound for (5.5).

Choose \((G,\sigma )\sim \big (\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )),\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big )\) conditioned on \(\nu ^{\sigma }=\nu \) and \(\rho ^{G,\sigma }=\rho \). Observe that \(\hat{\varvec{G}}(\sigma )\) is a uniformly random graph conditioned on the sizes of the vertex/edge classes prescribed by \(\nu ,\rho \). For \(i\ge 1\), let \(C_i(G_{\sigma ,SW})\) be the components of G (in decreasing order of size) after the percolation step of the SW dynamics with parameter \(p=1-\textrm{e}^{-\beta }\), when starting from the configuration \(\sigma \). We will show that there exists a constant \(M>0\) such that

$$\begin{aligned} \mathbb {P}\bigg [\sum _{i\ge 1}|C_i(G_{\sigma ,SW})|^2\le Mn\,\Big |\, \nu ^{\sigma }=\nu , \rho ^{G,\sigma }=\rho \bigg ] \ge 1-\textrm{e}^{-\Omega (n)}. \end{aligned}$$
(5.7)

Assuming this for the moment, for a colour \(s\in [q]\), let \(N_s\) be the number of vertices with colour s after the recolouring step of SW. Note that the expectation of \(N_s\) is n/q, and whenever the event in (5.7) holds, by Azuma’s inequality we obtain that \(\frac{1}{n}N_s\) is within an additive \(\varepsilon '\) of its expectation with probability \(1-\textrm{e}^{-\Omega (n)}\). By a union bound over the q colours, we obtain (5.6).
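
As a toy illustration of this concentration step (an aside, not part of the proof; the component sizes and parameters below are arbitrary): when the component sizes \(c_1,c_2,\ldots \) satisfy \(\sum _i c_i^2\le Mn\), the number of vertices receiving a fixed colour in the recolouring step fluctuates only on the scale \(\sqrt{n}\).

```python
import random

rng = random.Random(0)
q, n = 3, 300_000

# Split n vertices into components of size at most 50, so that sum_i c_i^2 <= 50*n.
sizes, remaining = [], n
while remaining > 0:
    c = min(remaining, rng.randint(1, 50))
    sizes.append(c)
    remaining -= c

def colour_one_fraction(component_sizes):
    """Recolouring step: each component picks a uniform colour; return the
    fraction of vertices that end up with the first colour."""
    return sum(c for c in component_sizes if rng.randrange(q) == 0) / n

samples = [colour_one_fraction(sizes) for _ in range(20)]
print(min(samples), 1.0 / q, max(samples))   # all three values are close to 1/q
```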

For a colour \(s\in [q]\), let \(G(\sigma ^{-1}(s))\) be the subgraph of G induced on \(\sigma ^{-1}(s)\), and note that since G is uniformly random conditioned on \(\nu \) and \(\rho \), \(G(\sigma ^{-1}(s))\) has the same distribution as the exact-edge model \(H(s)\sim {\tilde{\varvec{G}}}_{\tilde{r}(s)}({\tilde{n}}(s),d)\) where \({\tilde{n}}(s)=n\nu (s)\) and \(\tilde{r}(s)=\frac{\rho (s,s)}{\nu (s)}\). Percolation on this graph with parameter p is therefore closely related to the binomial-edge model \(\varvec{G}_{r(s)}(\tilde{n}(s),d)\) with \(r(s)=p\tilde{r}(s)\). More precisely, note that for all sufficiently small \(\delta >0\), Lemma 5.6 guarantees that the percolation parameter r(s) is bounded by a constant strictly less than \(1/(d-1)\), so by Proposition 5.2 there exists a constant \(M>0\) such that

$$\begin{aligned} \mathbb {P}\Big [\sum _{i\ge 1}|C_i(\varvec{G}_{r(s)})|^2\le M\tilde{n}(s)\Big ]\ge 1-\textrm{e}^{-\Omega (\tilde{n}(s))}\ge 1-\textrm{e}^{-\Omega (n)}. \end{aligned}$$
(5.8)

Note that, for any \(p\in (0,1)\), the property \(\big \{G\,:\, \mathbb {P}\big [\sum _{i\ge 1}|C_i(G_p)|^2\le Mn\big ]\ge 1-\textrm{e}^{-\Omega (n)}\big \}\) is a decreasing graph property, i.e., if G is a subgraph of \(G'\), we can couple the random graphs \(G_p\) and \(G_p'\) so that \(\sum _{i\ge 1}|C_i(G_p)|^2\le \sum _{i\ge 1}|C_i(G_p')|^2\). Viewing the event in (5.8) as a property of the binomial-edge model \(\varvec{G}_{\tilde{r}(s)}(\tilde{n}(s),d)\), it follows from Lemma 5.1 that with probability \(1-\textrm{e}^{-\Omega (n)}\) over the choice of the exact-edge model \(H(s)\sim {\tilde{\varvec{G}}}_{\tilde{r}(s)}({\tilde{n}}(s),d)\) it holds that

$$\begin{aligned} \mathbb {P}\Big [\sum _{i\ge 1}|C_i(H_p(s))|^2\le M\tilde{n}(s)\Big ]\ge 1-\textrm{e}^{-\Omega (n)}. \end{aligned}$$

Applying this for colours \(s=1,\ldots ,q\) and \(H(s)=G(\sigma ^{-1}(s))\), and noting that edges joining distinct colour classes are discarded in the percolation step, so that every component of \(G_{\sigma ,SW}\) lies inside a single colour class, we obtain by the union bound that with probability \(1-\textrm{e}^{-\Omega (n)}\) over the choice of \((G,\sigma )\sim \big (\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )),\hat{\varvec{\sigma }}_{\textrm{p}}(\varepsilon )\big )\) conditioned on \(\nu ^{\sigma }=\nu \) and \(\rho ^{G,\sigma }=\rho \), the components of G after the percolation step of SW satisfy (5.7), as claimed, therefore finishing the proof. \(\quad \square \)

Proposition 4.4

Let \(q,d\ge 3\) be integers and \(\beta >{\beta _u}\) be real. Then, for all sufficiently small constants \(\varepsilon '>\varepsilon >0\), there exists constant \(\eta >0\) such that with probability \(1-\textrm{e}^{- \eta n}\) over the planted distribution \((G,\sigma )\sim \big (\hat{\varvec{G}}\big (\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big ),\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big )\), it holds that \(P^{G}_{SW}\big (\sigma \rightarrow \tilde{S_{\textrm{f}}}(\varepsilon ')\big )\ge 1-\textrm{e}^{-\eta n}\).

Proof of Proposition 4.4

The first part of the proof is analogous to that of Proposition 4.3. Let \(\varepsilon >0\) be a sufficiently small constant, so that by Lemma 3.5, for any constant \(\delta >0\), with probability \(1-\textrm{e}^{-\Omega (n)}\) over the choice of \((G,\sigma )\sim \big (\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )),\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big )\), we have

$$\begin{aligned} \left\| {\nu ^{\sigma }-\nu _{\textrm{f}}}\right\| \le \delta \quad \text{ and } \quad \left\| {\rho ^{G,\sigma }-\rho _{\textrm{f}}}\right\| \le \delta . \end{aligned}$$
(5.9)

We will show that there exists a constant \(\eta >0\) such that for arbitrary \(\nu \) and \(\rho \in {\mathcal R}(\nu )\) satisfying \(\left\| {\nu -\nu _{\textrm{f}}}\right\| \le \delta \) and \(\left\| {\rho -\rho _{\textrm{f}}}\right\| \le \delta \), for \((G,\sigma )\sim \big (\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )),\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big )\) it holds that

$$\begin{aligned} \mathbb {P}\Big [P^{G}_{SW}\big (\sigma \rightarrow \tilde{S_{\textrm{f}}}(\varepsilon ')\big )\ge 1-e^{-\eta n}\,\big |\, \nu ^{\sigma }=\nu , \rho ^{G,\sigma }=\rho \Big ]\ge 1-\textrm{e}^{-\eta n} \end{aligned}$$
(5.10)

and therefore the conclusion follows by aggregating over \(\nu \) and \(\rho \).

Choose \((G,\sigma )\sim \big (\hat{\varvec{G}}(\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )),\hat{\varvec{\sigma }}_{\textrm{f}}(\varepsilon )\big )\) conditioned on \(\nu ^\sigma =\nu \) and \(\rho ^{G,\sigma }=\rho \), and observe once again that \(\hat{\varvec{G}}(\sigma )\) is uniformly random conditioned on \(\nu ,\rho \). For \(i\ge 1\), let \(C_i(G_{\sigma ,SW})\) be the components of G (in decreasing order of size) after the percolation step of the SW dynamics with parameter \(p=1-\textrm{e}^{-\beta }\), when starting from the configuration \(\sigma \). We will show that there exists a constant \(M>0\) such that

$$\begin{aligned} \mathbb {P}\Big [|C_1(G_{\sigma ,SW})|= n\big (1-\tfrac{q(1-\nu _{\textrm{f}}(1))}{q-1}\big )\pm \varepsilon 'n,\ \ \sum _{i\ge 2}|C_i(G_{\sigma ,SW})|^2\le Mn\,\Big |\, \nu ^{\sigma }=\nu , \rho ^{G,\sigma }=\rho \Big ]\ge 1-\textrm{e}^{-\Omega (n)}. \end{aligned}$$
(5.11)

We first complete the proof of the proposition assuming this for the moment, and return to the proof of (5.11) later. In particular, assume w.l.o.g. that \(C_1(G_{\sigma ,SW})\) gets colour 1. For a colour \(s\in [q]\), let \(N_s\) be the number of vertices outside \(C_1(G_{\sigma ,SW})\) that get colour s after the recolouring step of SW. Note that in the final configuration after the recolouring step, the number of vertices with colour \(s\in [q]\) is \(N_s+\textbf{1}\{s=1\}|C_1(G_{\sigma ,SW})|\). Now, the expectation of \(N_s\) is \(\tfrac{n-|C_1(G_{\sigma ,SW})|}{q}\), and whenever the event in (5.11) holds, by Azuma’s inequality we obtain that \(\frac{1}{n}N_s\) is within an additive \(\varepsilon '\) of its expectation with probability \(1-\textrm{e}^{-\Omega (n)}\). Therefore, by a union bound over the q colours, the Potts configuration obtained after one step of SW belongs to \(\tilde{S_{\textrm{f}}}(\varepsilon ')\) with probability \(1-\textrm{e}^{-\Omega (n)}\), which establishes the claim in (5.10).

It remains to prove (5.11). As in the proof of Proposition 4.3, for a colour \(s\in [q]\), let \(G(\sigma ^{-1}(s))\) be the subgraph of G induced on \(\sigma ^{-1}(s)\), and note that \(G(\sigma ^{-1}(s))\) has the same distribution as the exact-edge model \(H(s)\sim {\tilde{\varvec{G}}}_{\tilde{r}(s)}({\tilde{n}}(s),d)\) where \(\tilde{n}(s)=n\nu (s)\) and \(\tilde{r}(s)=\frac{\rho (s,s)}{\nu (s)}\). By considering again the binomial-edge model \(\varvec{G}_{r(s)}(\tilde{n}(s),d)\) with \(r(s)=p\tilde{r}(s)\), and using the inequalities in Lemma 5.6 for the ferromagnetic phase, we obtain that for all colours \(s\ne 1\) the parameter r(s) is bounded by a constant strictly less than \(\tfrac{1}{d-1}\) and hence the model is in the subcritical regime. In fact, by the same line of argument as in the proof of Proposition 4.3, we therefore have that there exists a constant \(M_0>0\) (depending only on \(d,\beta \) but not on \(\nu \) or \(\rho \)) such that, for all colours \(s\ne 1\), with probability \(1-\textrm{e}^{-\Omega (n)}\) over the choice of \(H(s)\sim \tilde{\varvec{G}}_{\tilde{r}(s)}({\tilde{n}}(s),d)\), it holds that

$$\begin{aligned} \mathbb {P}\Big [\sum _{i\ge 1}|C_i(H_p(s))|^2\le M_0\tilde{n}(s)\Big ]\ge 1-\textrm{e}^{-\Omega (n)} \end{aligned}$$
(5.12)

By contrast, for \(s=1\), the binomial-edge model \(\varvec{G}_{r(s)}(\tilde{n}(s),d)\) is in the supercritical regime since \(r(s)=r_{\textrm{f}}\pm \varepsilon \) where \(r_{\textrm{f}}=(1-\textrm{e}^{-\beta })\frac{\rho _{\textrm{f}}(1,1)}{\nu _{\textrm{f}}(1)} =\tfrac{(\textrm{e}^{\beta }-1)\mu _{\textrm{f}}(1)}{1+(\textrm{e}^{\beta }-1)\mu _{\textrm{f}}(1)}\) is a constant larger than \(\tfrac{1}{d-1}\) (by Lemma 5.6). Let \(\chi _{\textrm{f}}=\chi (r_{\textrm{f}})\) be as in (5.2), so by Proposition 5.4 there exists a constant \(M_1>0\) such that

$$\begin{aligned} \mathbb {P}\Big [\big |C_1(\varvec{G}_{r(s)})\big |= \tilde{n}(s)(\chi _{\textrm{f}}\pm \tfrac{\varepsilon '}{2})\Big ],\ \mathbb {P}\bigg [\sum _{i\ge 2}|C_i(\varvec{G}_{r(s)})|^2\le M_1\tilde{n}(s)\bigg ]\ge 1-\textrm{e}^{-\Omega (n)}. \end{aligned}$$
(5.13)

We will shortly show that

$$\begin{aligned} 1-\frac{q(1-\nu _{\textrm{f}}(1))}{q-1}=\chi _{\textrm{f}}\nu _{\textrm{f}}(1) \text{ or } \text{ equivalently } \chi _{\textrm{f}}= \frac{q\nu _{\textrm{f}}(1)-1}{(q-1)\nu _{\textrm{f}}(1)}. \end{aligned}$$
(5.14)

Assuming this for now, note that since \(|C_1(G_p)|\) and \(\sum _{i\ge 2}|C_i(G_p)|^2\) are monotone under edge-inclusion, we can again use Lemma 5.1 to transfer (5.13) to the exact-edge model for the colour \(s=1\). So, we conclude that with probability \(1-\textrm{e}^{-\Omega (n)}\) over the choice of \(H(s)\sim \tilde{\varvec{G}}_{\tilde{r}(s)}({\tilde{n}}(s),d)\), it holds that

$$\begin{aligned} \mathbb {P}\bigg [\big |C_1(H_p(s))\big |= \tilde{n}(s)(\chi _{\textrm{f}}\pm \varepsilon '), \quad \sum _{i\ge 2}|C_i(H_p(s))|^2\le M_1\tilde{n}(s)\bigg ]\ge 1-\textrm{e}^{-\Omega (n)}. \end{aligned}$$

Combining (5.12) and the last display with a union bound over the q colours, we obtain (5.11) with \(M=\max \{M_0,M_1\}\).

It only remains to prove (5.14). Recall from (5.4) that \(\nu _{\textrm{f}}(1)=\tfrac{t^d}{t^d+(q-1)}\) where \(t=\tfrac{1+(\textrm{e}^\beta -1)x}{1+(\textrm{e}^\beta -1)\tfrac{1-x}{q-1}}\) and \(x=\mu _{\textrm{f}}(1)\). So, \(\chi _{\textrm{f}}= \frac{q\nu _{\textrm{f}}(1)-1}{(q-1)\nu _{\textrm{f}}(1)}\) is equivalent to showing that

$$\begin{aligned} \chi _{\textrm{f}}= 1-(1/t)^d. \end{aligned}$$
(5.15)

Now, recall from (5.2) that \(\chi _{\textrm{f}}=1-\big (1-r_{\textrm{f}}+r_{\textrm{f}}\phi _{\textrm{f}}\big )^{d}\), where \(\phi _{\textrm{f}}=\phi (r_{\textrm{f}})\). So (5.15) reduces to showing that

$$\begin{aligned} 1/t=1-r_{\textrm{f}}+r_{\textrm{f}}\phi _{\textrm{f}}, \quad \text{ which, } \text{ using } t=\tfrac{1+(\textrm{e}^\beta -1)x}{1+(\textrm{e}^\beta -1)\tfrac{1-x}{q-1}} \text{ and } r_{\textrm{f}}=\tfrac{(\textrm{e}^{\beta }-1)x}{1+(\textrm{e}^{\beta }-1)x}, \text{ is } \text{ equivalent } \text{ to } \phi _{\textrm{f}}=\tfrac{1-x}{(q-1)x}. \end{aligned}$$
(5.16)

From (5.2), \(y=\phi _{\textrm{f}}\) is the unique solution in (0, 1) of the equation

$$\begin{aligned} y=\big (1-r_{\textrm{f}}+r_{\textrm{f}}y\big )^{d-1}, \end{aligned}$$
(5.17)

and note that \(\tfrac{1-x}{(q-1)x}\in (0,1)\) since \(x>1/q\). So, to prove the equality \(\phi _{\textrm{f}}=\tfrac{1-x}{(q-1)x}\) in (5.16), it suffices to show that \(y=\tfrac{1-x}{(q-1)x}\) satisfies (5.17). This follows from the fact that \(x=\mu _{\textrm{f}}(1)\) satisfies the Belief Propagation equations; in particular, from (5.3) we have

$$\begin{aligned} x=\frac{(1+(\textrm{e}^\beta -1)x)^{d-1}}{(1+(\textrm{e}^\beta -1)x)^{d-1}+(q-1)\big (1+(\textrm{e}^\beta -1)\tfrac{1-x}{q-1}\big )^{d-1}}, \end{aligned}$$

from which it follows that \(y=\tfrac{1-x}{(q-1)x}=\Big (\frac{1+(\textrm{e}^\beta -1)\tfrac{1-x}{q-1}}{1+(\textrm{e}^\beta -1)x}\Big )^{d-1}=\big (1-r_{\textrm{f}}+r_{\textrm{f}}y\big )^{d-1}\). This finishes the proof of (5.14) and therefore the proof of Proposition 4.4. \(\quad \square \)
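
The identities (5.14)–(5.16) can also be confirmed numerically; the sketch below (helper names and the parameter point are ours, chosen as in the illustration after Lemma 5.6) solves (5.3) for x, computes \(r_{\textrm{f}},\phi _{\textrm{f}},\chi _{\textrm{f}}\), and checks the three displayed identities.

```python
import math

def mu_f1(q, d, beta, iters=100000, tol=1e-14):
    """Largest fixed point of the BP equation (5.3), iterated from x = 1."""
    eb1 = math.exp(beta) - 1.0
    x = 1.0
    for _ in range(iters):
        a = (1.0 + eb1 * x) ** (d - 1)
        b = (1.0 + eb1 * (1.0 - x) / (q - 1)) ** (d - 1)
        x_new = a / (a + (q - 1) * b)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def extinction_prob(d, p, iters=100000, tol=1e-14):
    """Smallest fixed point in [0, 1] of phi = (p*phi + 1 - p)^(d-1), cf. (5.17)."""
    phi = 0.0
    for _ in range(iters):
        nxt = (p * phi + 1.0 - p) ** (d - 1)
        if abs(nxt - phi) < tol:
            return nxt
        phi = nxt
    return phi

q, d, beta = 3, 5, 0.66
eb1 = math.exp(beta) - 1.0
x = mu_f1(q, d, beta)

r_f = eb1 * x / (1.0 + eb1 * x)
phi_f = extinction_prob(d, r_f)
chi_f = 1.0 - (r_f * phi_f + 1.0 - r_f) ** d

t = (1.0 + eb1 * x) / (1.0 + eb1 * (1.0 - x) / (q - 1))
nu_f1 = t ** d / (t ** d + (q - 1))

print(phi_f, (1.0 - x) / ((q - 1) * x))                  # (5.16): the two agree
print(chi_f, 1.0 - (1.0 / t) ** d)                       # (5.15): the two agree
print(1.0 - q * (1.0 - nu_f1) / (q - 1), chi_f * nu_f1)  # (5.14): the two agree
```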