Introduction

History of the problem

Since the seminal works of Dobrushin and Lanford-Ruelle [16, 29], the equilibrium states of a lattice model of statistical mechanics in the thermodynamic limit—the so-called Gibbs states—are identified with the probability measures \(\mu \) that are solutions of the DLR equation,

$$\begin{aligned} \mu (\cdot )=\int \mathrm d \mu (\omega )\gamma _\Lambda ( \cdot \,\vert \,\omega ), \quad \text{ for all finite subsets } \Lambda \text{ of the lattice,} \end{aligned}$$

where the probability kernel \(\gamma _\Lambda \) is the Gibbsian specification associated to the system; see [19]. Under very weak assumptions (at least for bounded spins), it can be shown that the set \(\mathcal G \) of all Gibbs states is a non-empty simplex. The analysis of \(\mathcal G \) is thus reduced to determining its extremal elements. In general, this is a very hard problem which remains essentially completely open in dimensions \(3\) and higher, for any nontrivial model, even in perturbative regimes.

The problem of determining all extremal Gibbs states amounts to understanding all possible local behaviors of the system. At very low temperatures, Pirogov–Sinaĭ theory [32, 35] often makes it possible to determine the pure phases of the model, i.e., the extremal, translation invariant (or periodic) Gibbs states, as perturbations of the corresponding ground states. However, it might be the case that suitable boundary conditions induce interfaces resulting in the local coexistence of different thermodynamic phases. That such a phenomenon can occur was first proved for the nearest-neighbor ferromagnetic (n.n.f.) Ising model on \(\mathbb Z ^3\) by Dobrushin [17], by considering the model in a cubic box with \(+\) spins on the top half boundary of the box and \(-\) spins on the bottom half (the so-called Dobrushin boundary condition). He proved that, at low enough temperatures, the induced interface is rigid (it is given by a plane with local defects) and the corresponding Gibbs state is extremal.

In two dimensions, the situation is very different. Gallavotti [18] proved, by studying the fluctuations of the corresponding interface, that the Gibbs state of the (very low temperature) n.n.f. Ising model on \(\mathbb Z ^2\) obtained using the Dobrushin boundary condition is the mixture \(\tfrac{1}{2}(\mu ^+ + \mu ^-)\), where \(\mu ^+\) and \(\mu ^-\) are the two pure phases of the Ising model. This was refined by Higuchi [24], who proved that the interface, after diffusive scaling, weakly converges to a Brownian bridge at sufficiently low temperatures. These two results were then pushed to all subcritical temperatures by, respectively, Messager and Miracle-Sole [31] and Greenberg and Ioffe [22]. A weaker but very simple and general proof of the non-extremality of the state obtained using Dobrushin boundary condition can be found in [8].

The fact that the Dobrushin boundary condition gives rise to a translation invariant Gibbs state is a strong indication that all Gibbs states of the two-dimensional Ising model should be translation invariant: because of the large fluctuations of the interfaces, a small box deep inside the system should remain, with high probability, far away from any of the interfaces that are induced by the boundary condition. Thus the possible local behaviors of the system should correspond to the pure phases.

In the late 1970s, this phenomenology was established in the celebrated works of Aizenman [1] and Higuchi [25], based on important earlier work of Russo [34]. They proved that \(\mathcal G =\{\alpha \mu ^++(1-\alpha )\mu ^-\,:\,0\le \alpha \le 1\}\) for the n.n.f. Ising model on \(\mathbb Z ^2\). Their approaches relied on many specific properties of the Ising model (in particular, GKS, Lebowitz and FKG inequalities were used in the proof). A decade ago, Georgii and Higuchi [21] devised a variant of this proof with a number of advantages. In particular, their version only relies on the FKG inequality (and some lattice symmetries), which made it possible to obtain in the same way a complete description of Gibbs states in several other models: the n.n.f. Ising model on the triangular and hexagonal lattices, the antiferromagnetic Ising model in a homogeneous field and the hard-core lattice gas. It should be emphasized that all these works deal directly with the infinite-volume system, and have only very weak implications for large finite systems. In particular, the reasoning underlying these arguments (taking the form of a proof by contradiction) remains far from the heuristics of interface fluctuations.

A much more general result, restricted to very low temperatures, was established by Dobrushin and Shlosman [15]. They proved that, under suitable assumptions (finite single-spin space, bounded interactions, finite number of periodic ground states), all Gibbs states are periodic, and in particular are convex combinations of the pure phases corresponding to perturbations of the ground states of the model. Their approach deals with finite systems and is closer in spirit, if not fully in practice, to the above heuristics. Namely, even though interface fluctuations play a central role in the approach of [15], the authors resort to crude low temperature surgery estimates without developing a comprehensive fluctuation theory.

Very recently, a completely different approach to the Aizenman–Higuchi result was developed by two of us [13]. This new approach, although still restricted to the n.n.f. Ising model on \(\mathbb Z ^2\), presents several advantages over the former ones. Unlike [15], it does not require a very low temperature assumption, and actually holds for all sub-critical temperatures. Furthermore, it provides a quantitative, finite-volume version of the Aizenman–Higuchi theorem, with the correct rate of relaxation. Another interesting feature of the proof is that it closely follows the outlined heuristics and, consequently, should be much more robust.

In the present work, we extend the approach of [13] to n.n.f. Potts models on \(\mathbb Z ^2\). As will be seen below, two major factors make the proof substantially more difficult in this case. The first one is of a physical nature: In all previous non-perturbative studies, there were only two pure phases, and thus macroscopic interfaces were always line segments. In the Potts model with \(3\) or more states, there are more than two phases and, consequently, interfaces are more complicated objects, elementary macroscopic interfaces being trees rather than lines. The second difficulty is of a technical nature: The positive association of Ising spins, manifested through the FKG inequality, simplified many parts of the proof in [13]. Unfortunately, this property does not hold anymore in the context of general \(q\)-state Potts models. We will therefore avoid this difficulty by reformulating the problem in terms of the random-cluster representation.

Statement of the results

Let \(\Omega =\{1,\ldots ,q\}^{\mathbb Z ^2}\) be the space of configurations. Let \(\Lambda \) be a finite subset of \(\mathbb Z ^2\), and \(\Lambda ^c=\mathbb Z ^2{\setminus }\Lambda \) be its complement. The finite-volume Gibbs measure in \(\Lambda \) for the \(q\)-state Potts model with boundary condition \(\sigma \in \Omega \) and at inverse temperature \(\beta >0\) is the probability measure on \(\Omega \) (with the associated product \(\sigma \)-algebra) defined by

$$\begin{aligned} \mathbb P _{\beta ,\Lambda }^\sigma (\eta ) = {\left\{ \begin{array}{ll} \frac{1}{Z^\sigma _{\beta ,\Lambda }}\mathrm{e}^{-\beta H_\Lambda (\eta )}&\text{ if } \eta _i=\sigma _i \text{ for all } i\in \Lambda ^c,\\ 0&\text{ otherwise}, \end{array}\right.} \end{aligned}$$

where the normalization constant \(Z^\sigma _{\beta ,\Lambda }\) is the partition function. The Hamiltonian in \(\Lambda \) is given by

$$\begin{aligned} H_\Lambda (\eta ) = -\sum _{\begin{array}{c} i\sim j\\ \{i,j\}\cap \Lambda \ne \varnothing \end{array}}\delta _{\eta _i,\eta _j} \end{aligned}$$

where \(i\sim j\) if \(i\) and \(j\) are nearest neighbors in \(\mathbb Z ^2\). In the case of a pure boundary condition \(i \in \{ 1,\ldots , q\} \), meaning that \(\sigma _x=i\) for every \(x\in \Lambda ^c\), we denote the measure by \(\mathbb P _{\beta ,\Lambda }^{(i)}\).
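To make the definition of \(H_\Lambda \) concrete, here is a minimal sketch that evaluates the Hamiltonian for a finite box. The function name `potts_hamiltonian` and the dictionary encoding of configurations are illustrative choices, not notation from the text; the configuration is assumed to be specified on \(\Lambda \) and on all its exterior neighbors.

```python
def potts_hamiltonian(eta, box):
    """H_Lambda(eta) = - sum of delta(eta_i, eta_j) over nearest-neighbour
    pairs {i, j} having at least one end-point in the box.

    eta: dict mapping sites (x, y) to spins, defined on the box and on its
         exterior neighbours (hypothetical encoding).
    box: set of sites (x, y) playing the role of Lambda.
    """
    energy = 0
    seen = set()
    for (x, y) in box:
        for (dx, dy) in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (x + dx, y + dy)
            pair = frozenset(((x, y), nb))
            if pair in seen:
                continue  # each unordered pair {i, j} is counted once
            seen.add(pair)
            if eta[(x, y)] == eta[nb]:
                energy -= 1
    return energy


# Lambda is the single site at the origin, surrounded by boundary spins equal to 1.
box = {(0, 0)}
eta_aligned = {(0, 0): 1, (1, 0): 1, (-1, 0): 1, (0, 1): 1, (0, -1): 1}
eta_flipped = dict(eta_aligned)
eta_flipped[(0, 0)] = 2

print(potts_hamiltonian(eta_aligned, box))  # all four bonds agree
print(potts_hamiltonian(eta_flipped, box))  # no bond agrees
```

In this toy case the aligned configuration has energy \(-4\) and the flipped one has energy \(0\), so the Gibbs weight \(\mathrm{e}^{-\beta H_\Lambda }\) favors agreement with the boundary condition.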

For an arbitrary subset \(A\) of \(\mathbb Z ^2\), let \(\mathcal{F }_A\) be the sigma-algebra generated by spins in \(A\). A probability measure \(\mathbb P \) on \(\Omega \) is an infinite-volume Gibbs measure for the \(q\)-state Potts model at inverse temperature \(\beta \) if and only if it satisfies the following DLR condition:

$$\begin{aligned} \mathbb P (\cdot |\mathcal F _{\Lambda ^c})(\sigma ) = \mathbb P ^\sigma _{\beta ,\Lambda }\quad \text{ for } \mathbb P \text{-a.e. } \sigma \text{, and all finite subsets } \Lambda \text{ of } \mathbb Z ^2. \end{aligned}$$

Let \(\mathcal{G }_{q,\beta }\) be the space of infinite-volume \(q\)-state Potts measures.

Non-emptiness of \(\mathcal{G }_{q,\beta }\) can be proved constructively in this model. For \(i\in \{1,\ldots ,q\}\), \({(\mathbb P ^{(i)}_{\beta ,\Lambda })}_\Lambda \) converges when \(\Lambda {\nearrow }\mathbb Z ^2\) (in particular, the limit does not depend on the sequence of boxes chosen); this follows easily, e.g., from the random cluster representation. We denote by \(\mathbb P ^{(i)}_\beta \) the corresponding limit. It can be checked [20, Prop. 6.9] that the measures \(\mathbb P ^{(i)}_\beta \) (\(i=1,\ldots ,q\)) belong to \(\mathcal{G }_{q,\beta }\) and are translation invariant.

When \(\beta \) is less than the critical inverse temperature \(\beta _c(q)=\log (1+\sqrt{q})\) [6], it is known that there exists a unique infinite-volume Gibbs measure (in particular \(\mathbb P ^{(i)}_\beta =\mathbb P ^{(j)}_\beta \) for every \(i,j\in \{1,\ldots ,q\}\)). The relevant values of \(\beta \) for a study of \(\mathcal{G }_{q,\beta }\) are thus \(\beta \ge \beta _c(q)\).

In the present work, we extend ideas of [13] in order to determine all infinite-volume Gibbs measures for the \(q\)-state Potts models at inverse temperature \(\beta >\beta _c(q)\) on \(\mathbb Z ^2\). More precisely, we show that every Gibbs state is a convex combination of infinite-volume measures with pure boundary condition:

Theorem 1.1

For any \(q\ge 2\) and \(\beta >\beta _c(q)\),

$$\begin{aligned} \mathcal{G }_{q,\beta } = \left\{ \sum _{i=1}^q\alpha _i\mathbb P _\beta ^{(i)}, \text{ where } \alpha _i\ge 0,\ \forall i\in \{1,\ldots , q\}, \text{ and } \sum _{i=1}^q\alpha _i=1\right\} . \end{aligned}$$
(1)

A straightforward yet important corollary of this theorem is the fact that any Gibbs state is invariant under translations.

Corollary 1.2

For any \(q\ge 2\) and \(\beta >\beta _c(q)\), all elements of \(\mathcal{G }_{q,\beta }\) are invariant under translations.

A second important corollary is the fact that the extremal Gibbs measures (also called pure states) of the simplex \(\mathcal{G }_{q,\beta }\) are the infinite-volume measures with pure boundary condition.

Corollary 1.3

For any \(q\ge 2\) and \(\beta >\beta _c(q)\), the extremal elements of \(\mathcal{G }_{q,\beta }\) are the \(\mathbb P _\beta ^{(i)}, i\in \{1,\dots ,q\}\).

This follows from Theorem 1.1: Define \(\Delta =\Delta (\beta )\) via \( \mathbb{P }_\beta ^{(i)}(\eta _0\ne i) = (q-1)\Delta \), and observe that in the decomposition (1) of \(\mathbb{P }_\beta \in \mathcal{G }_{q,\beta }\), the coefficient \(\alpha _i\) equals \(\frac{\mathbb{P }_\beta (\eta _0=i)-\Delta }{\mathbb{P }_\beta ^{(i)}(\eta _0=i)-\Delta }.\)
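The formula for \(\alpha _i\) can be checked in one line. The computation below is a sketch that assumes the symmetry of the model under permutations of the colors, which gives \(\mathbb{P }_\beta ^{(j)}(\eta _0=i)=\Delta \) for every \(j\ne i\). Evaluating the decomposition (1) on the event \(\{\eta _0=i\}\) and using \(\sum _{j}\alpha _j=1\),

$$\begin{aligned} \mathbb{P }_\beta (\eta _0=i) = \sum _{j=1}^q\alpha _j\,\mathbb{P }_\beta ^{(j)}(\eta _0=i) = \alpha _i\,\mathbb{P }_\beta ^{(i)}(\eta _0=i) + (1-\alpha _i)\,\Delta , \end{aligned}$$

so that \(\mathbb{P }_\beta (\eta _0=i)-\Delta = \alpha _i\bigl (\mathbb{P }_\beta ^{(i)}(\eta _0=i)-\Delta \bigr )\). Whenever the denominator \(\mathbb{P }_\beta ^{(i)}(\eta _0=i)-\Delta = 1-q\Delta \) does not vanish, the coefficients \(\alpha _i\) are therefore uniquely determined by \(\mathbb{P }_\beta \), which is what uniqueness of the decomposition requires.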

Actually, our main result is stronger than Theorem 1.1. As in [13], we obtain a finite-volume, quantitative version of the latter theorem, which, together with its proof, fully vindicates the heuristics given above. For a measure \(\mu \) and an integrable function \(f\), we write \(\mu [f]=\int f\mathrm d \mu \).

Theorem 1.4

Let \(q\ge 2\) and \(\beta >\beta _c(q)\), and set \(\Lambda _n=\mathbb Z ^2\cap [-n,n]^2\). For any \(\varepsilon >0\) small enough, there exists \(C_\varepsilon <\infty \) such that, for any boundary condition \(\sigma \) on \(\partial \Lambda _n\), we can find \(\alpha _1^n,\ldots ,\alpha _q^n\ge 0\) depending on \((n,\sigma ,\beta ,q)\) only, such that

$$\begin{aligned} \bigl |\,\mathbb P _{\beta ,\Lambda _n}^\sigma [g] - \sum _{i=1}^q\alpha _i^n\, \mathbb P _{\beta }^{(i)}[g] \bigr | \le C_\varepsilon \Vert g\Vert _\infty n^{-\tfrac{1}{2} +14\varepsilon }, \end{aligned}$$

for any measurable function \(g\) of the spins in \(\Lambda _{n^\varepsilon }\).

Note that the error term is essentially of the right order (which is \(O(n^{-1/2})\)); see [13] for a proof of this claim when \(q=2\).

The strategy of the proof is the following. We consider the conditioned random-cluster measure on \(\Lambda \) associated to the \(q\)-state Potts model with boundary condition \(\sigma \). Boundary conditions for the Potts model get rephrased as the absence of connections (in the random-cluster configuration) between specified parts of the boundary of \(\Lambda \). In other words, boundary conditions for the Potts model correspond to conditioning on the existence of dual-clusters between some dual-sites on the boundary. Note that the conditioning can be very messy, since intricate boundary conditions correspond to microscopic conditioning on the existence of dual-clusters. It will be seen that being a mixture of measures with pure boundary condition boils down to the fact that, with high probability, no dual-cluster connected to the boundary reaches a small box deep inside \(\Lambda \) (which, in particular, implies that the same is true for the Potts interfaces).

The techniques involved in the proof are two-fold. First, we use positivity of surface tension in the regime \(\beta >\beta _c\), which was proved in [6], in order to get rid of the microscopic mess due to the conditioning and to show that, deep inside the box, the conditioning with respect to \(\sigma \) corresponds to the existence of macroscopic dual-clusters. The second part of the proof consists in proving that these clusters are very slim, and that they fluctuate in a diffusive way, so that the probability that they touch a small box centered at the origin is going to zero as the size of \(\Lambda \) goes to infinity. The crucial step here is the use of the Ornstein–Zernike theory of sub-critical FK clusters developed in [10].

Open problems

Before delving into the proof, let us formulate some important open problems related to the present study.

  • Critical 2d Potts models. The behavior of two-dimensional \(q\)-state Potts models in the critical regime \(\beta =\beta _c(q)\) is still widely open. It is conjectured that there is a unique Gibbs state when \(q=3\) and \(4\), but that, for \(q\ge 5\), there is coexistence at \(\beta _c\) of \(q+1\) pure phases: the \(q\) low-temperature ordered pure phases and the high temperature disordered phase. This is known to be true when \(q\) is large enough [28, 30]. The extension of the latter result to every \(q>4\) remains a mathematical challenge.

  • Finite-range 2d models. The extension of the present result, even in the Ising case \(q=2\), to general finite-range interactions still seems out of reach today. There are, at least, two main difficulties when dealing with such models: On the one hand, it is difficult to find a suitable non-perturbative definition of interfaces (the classical definitions used, e.g., in Pirogov–Sinaĭ theory become meaningless once the temperature is not very low); on the other hand, interfaces will not partition the system into (random) subsystems with pure boundary conditions anymore, which implies that it will be necessary to understand relaxation to pure phases from impure boundary conditions. Of course, the general philosophy of the approach we use should still apply.

  • The question of quasiperiodicity. There is a general conjecture that two-dimensional models should always possess a finite number of extremal Gibbs states, all of which are periodic. In particular, this would imply that all Gibbs states are periodic, and thus that a two-dimensional quasicrystal cannot exist (as an equilibrium state).

  • Models in higher dimensions. Needless to say, the situation in higher dimensions is very different, due to the existence of translation non-invariant states. Even in the very low-temperature \(3\)-dimensional n.n.f. Ising model, the set of extremal Gibbs states is not known. Note, however, that it has been proved, in the case of a \(d\)-dimensional Ising model for any \(d\ge 3\), that all translation invariant Gibbs states are convex combinations of the two pure phases at all temperatures [7]. A similar result also holds for large enough values of \(q\) [30].

Notations

Each nearest-neighbor edge \(e\) of \(\mathbb Z ^2\) intersects a unique dual edge of \((\mathbb Z ^2)^* = (\frac{1}{2} ,\frac{1}{2}) +\mathbb Z ^2\), which we denote by \(e^*\). Consider a subgraph \(G=(V,E)\) of \(\mathbb Z ^2\), with vertex set \(V\) and edge set \(E\). If \(E\) is a set of direct edges, then its dual is defined by \(E^* = \left\{ e^*\, :\, e\in E\right\} \). Furthermore, if \(G\) does not possess any isolated vertices, we can define the dual \(V^*\) as the endpoints of edges in \(E^*\). Altogether, this defines a dual graph \(G^*=(V^*,E^*)\).

Let \(\Lambda _n\) be the set of sites of \(\mathbb Z ^2\cap [-n,n]^2\) and \(E_n\) be the set of all nearest-neighbor edges of \(\Lambda _n\). The dual graph is denoted by \(\left(\Lambda _n^* , E_n^*\right)\). For \(m<n\), the annulus \(\Lambda _n{\setminus }\Lambda _m\) is denoted by \(A_{m,n}\).

The vertex-boundary \(\partial V\) of a graph \((V,E)\) is defined by \(\partial V = \{ x\in V: \exists y\sim x \text{ such that } y\not \in V\}\).

The exterior vertex-boundary \(\partial ^\mathrm{ext} V\) of a graph \((V,E)\) is defined by \(\partial ^\mathrm{ext} V = \cup _{x\in V}\left\{ y\not \in V : y\sim x\ \right\} \).

The edge-boundary \(\partial E\) of a graph \((V,E)\) is the set of edges between two adjacent points of \(\partial V\).
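The three boundary notions can be illustrated on the boxes \(\Lambda _n\). The sketch below is only an illustration: the function names are hypothetical, and the edge set \(E\) is assumed to consist of all nearest-neighbor edges between sites of \(V\), as is the case for \((\Lambda _n,E_n)\).

```python
def box(n):
    """Sites of Lambda_n, the intersection of Z^2 with [-n, n]^2."""
    return {(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1)}


def neighbours(site):
    x, y = site
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def vertex_boundary(V):
    """Sites of V having at least one nearest neighbour outside V."""
    return {x for x in V if any(y not in V for y in neighbours(x))}


def exterior_vertex_boundary(V):
    """Sites outside V having at least one nearest neighbour in V."""
    return {y for x in V for y in neighbours(x) if y not in V}


def edge_boundary(V):
    """Edges between two adjacent points of the vertex boundary
    (assuming every nearest-neighbour edge inside V belongs to E)."""
    bd = vertex_boundary(V)
    return {frozenset((x, y)) for x in bd for y in neighbours(x) if y in bd}


V = box(1)
print(len(vertex_boundary(V)), len(exterior_vertex_boundary(V)), len(edge_boundary(V)))
```

For \(\Lambda _1\) (a \(3\times 3\) box) this gives \(8\) boundary sites, \(12\) exterior boundary sites and \(8\) boundary edges, the latter forming the closed contour mentioned in the text.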

It will occasionally be convenient to think about \(\partial E_m\) as a closed contour in \(\mathbb{R }^2\) or, more generally, to think about subsets of \(E\) (clusters, paths, etc) in terms of their embedding into \(\mathbb{R }^2\); we shall do it without further comments in the sequel.

All constants in the sequel depend on \(\beta \) and \(q\) only. We shall use the notation \(f=O(g)\) if there exists \(C=C(\beta ,q)>0\) such that \(|f|\le C|g|\). We shall write \(f=\Theta (g)\) if both \(f=O(g)\) and \(g=O(f)\).

From Potts model to random-cluster model

In this section, we relate Potts and random-cluster models. We will assume throughout this article that the reader is familiar with the basic properties of the Fortuin–Kasteleyn (FK) representation. A very concise and clear exposition including derivation of comparison inequalities can be found in [2]. Mixing properties of random-cluster measures were studied in [3, 4]. There is an extensive review [20] and a book [23] on the subject. More recent results [6, 10] play an important role in our approach.

Let \(G=(V(G),E(G))\) be a finite graph. An element \(\omega \in \{0,1\}^{E(G)}\) is called a configuration. An edge \(e\) is said to be open in \(\omega \) if \(\omega (e)=1\) and closed if \(\omega (e)=0\). We shall work with two types of boundary conditions: \(\mathsf{f}\)-free and \(\mathsf{w}\)-wired. Recall that the random-cluster measure with edge-weight \(p\) and cluster-weight \(q\) on \(G\) with \(*\)-boundary condition (\(*= \mathsf{f}, \mathsf{w}\)) is given by

$$\begin{aligned} \mu _{G,p,q}^*(\omega ) = \mu _{G}^{*}(\omega )=\frac{p^{\#\,\text{open edges}}(1-p)^{\#\,\text{closed edges}}q^{\#_*\,\text{clusters}}}{Z_{G,p,q}^*}, \end{aligned}$$

where \(Z_{G,p,q}^*\) is a normalizing constant and a cluster is a maximal connected component of the graph \((V(G),\{e\in E(G)\,:\, \omega (e)=1\})\). The number \(\#_\mathsf{f}\,\text{ clusters}\) counts all the disjoint clusters, whereas the number \(\#_\mathsf{w}\,\text{ clusters}\) counts only those disjoint clusters which are not connected to the vertex boundary \(\partial V\).
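The difference between the two ways of counting clusters is easy to see on a small example. The following sketch computes the unnormalized weight \(p^{\#\,\text{open edges}}(1-p)^{\#\,\text{closed edges}}q^{\#_*\,\text{clusters}}\) of a configuration; the names `count_clusters` and `rc_weight`, and the convention of passing the vertex boundary explicitly in the wired case, are illustrative assumptions.

```python
def count_clusters(vertices, open_edges, boundary=None):
    """Number of connected components of (vertices, open_edges).

    If `boundary` is given (wired case), the components containing a
    boundary vertex are not counted.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in open_edges:
        parent[find(u)] = find(v)

    roots = {find(v) for v in vertices}
    if boundary:
        roots -= {find(v) for v in boundary}
    return len(roots)


def rc_weight(vertices, edges, omega, p, q, boundary=None):
    """Unnormalised random-cluster weight of the configuration omega."""
    open_edges = [e for e in edges if omega[e] == 1]
    n_open = len(open_edges)
    n_closed = len(edges) - n_open
    k = count_clusters(vertices, open_edges, boundary)
    return p ** n_open * (1 - p) ** n_closed * q ** k


# Path graph 0 - 1 - 2 with the first edge open and the second one closed.
vertices = [0, 1, 2]
edges = [(0, 1), (1, 2)]
omega = {(0, 1): 1, (1, 2): 0}

free_weight = rc_weight(vertices, edges, omega, p=0.5, q=3)
wired_weight = rc_weight(vertices, edges, omega, p=0.5, q=3, boundary={0, 2})
print(free_weight, wired_weight)
```

With free counting the configuration has two clusters, \(\{0,1\}\) and \(\{2\}\); with wired counting and boundary \(\{0,2\}\) both clusters touch the boundary, so the exponent of \(q\) is zero.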

Coupling with a supercritical random-cluster model on \((\mathbb Z ^2)^*\)

We consider the \(q\)-state Potts model on the graph \((\mathbb Z ^2)^*\) at inverse temperature \(\beta >\beta _c(q)\). As the parameters \(\beta \) and \(q\) will always remain fixed, we drop them from the notation. Fix \(\sigma \in \left\{ 1, \dots , q\right\} ^{(\mathbb Z ^2)^*}\). For each \(n\), we define the Potts measure \(\mathbb{P }^\sigma _{\Lambda _n^*}\) on \(\Lambda _n^*\) with boundary condition \(\sigma \) on the vertex boundary \(\partial \Lambda _n^*\).

It is a classical result (see, e.g., [2, 20]) that the Potts model can be coupled with a random-cluster configuration in the following way. From a configuration of spins \(\eta \in \{1,\dots ,q\}^{V(\Lambda _n^*)}\), construct a percolation configuration \(\omega ^*\in \{0,1\}^{E_n^*}\) by setting each edge in \(E_n^*\) to be

  • closed if the two end-points have different spins,

  • closed with probability \(\mathrm{e}^{-\beta }\) and open otherwise if the two end-points have the same spins.

The measure thus obtained is a random-cluster measure on \((\mathbb Z ^2)^*\) with edge-weight \(p^*=1-\mathrm{e}^{-\beta }\), cluster-weight \(q\) and wired boundary condition on \(\partial \Lambda _n^*\), conditioned on the following event, called \(\mathrm{Cond}_n[\sigma ]\): writing \(S_i= \left\{ x\in \partial \Lambda _n^*\, :\, \sigma (x) =i\right\} \), the sets \(S_i\) and \(S_j\) are not connected by open edges in \( E_n^*\), for every \(i\ne j\) in \(\{1,\ldots ,q\}\). We denote this measure by \( \mu _{\Lambda _n^*}^{\mathsf{w}}(\cdot \; | \; \text{ Cond}_n[\sigma ])\). When there is no conditioning, the random-cluster measure with wired (resp. free) boundary condition is denoted by \(\mu _{\Lambda _n^*}^{\mathsf{w}}\) (resp. \(\mu _{\Lambda _n^*}^{\mathsf{f}}\)).
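The first half of the coupling (from spins to edges) can be sketched as follows. This is an illustrative sketch on an arbitrary finite edge list, with a hypothetical function name `sample_edges`; it is not tied to the specific graph \((\Lambda _n^*,E_n^*)\).

```python
import math
import random


def sample_edges(eta, edges, beta, rng):
    """Sample a percolation configuration from a spin configuration.

    eta:   dict mapping vertices to spins.
    edges: iterable of pairs (u, v).
    An edge whose end-points carry different spins is closed; otherwise it
    is closed with probability exp(-beta) and open with the complementary
    probability p* = 1 - exp(-beta).
    """
    omega = {}
    closing_probability = math.exp(-beta)
    for (u, v) in edges:
        if eta[u] != eta[v]:
            omega[(u, v)] = 0
        elif rng.random() < closing_probability:
            omega[(u, v)] = 0
        else:
            omega[(u, v)] = 1
    return omega


eta = {0: 1, 1: 1, 2: 2, 3: 2}
edges = [(0, 1), (1, 2), (2, 3)]
omega = sample_edges(eta, edges, beta=1.5, rng=random.Random(0))
print(omega[(1, 2)])  # the edge between spins 1 and 2 is always closed
```

In particular, open edges never join sites with different spins, which is the origin of the event \(\mathrm{Cond}_n[\sigma ]\).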

Reciprocally, the Potts measure can be obtained from \(\mu _{\Lambda _n^*}^{\mathsf{w}}(\cdot \; | \; \text{ Cond}_n[\sigma ])\) by assigning to every cluster a spin in \(\{1,\ldots ,q\}\) according to the following rule:

  • For every \(i\in \{1,\ldots ,q\}\), sites connected to \(S_i\) receive the spin \(i\),

  • The sites of a cluster which is not connected to any of the sets \(S_i\) receive the same spin in \(\{1,\ldots ,q\}\) chosen uniformly at random, independently of the spins of the other clusters.
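The coloring rule above can be sketched in a few lines. The function name `assign_spins` and the encoding of the sets \(S_i\) as a dictionary are hypothetical; the sketch assumes that the edge configuration satisfies \(\mathrm{Cond}_n[\sigma ]\) and raises an error otherwise.

```python
import random


def assign_spins(vertices, open_edges, S, q, rng):
    """Colour the clusters of an edge configuration.

    S maps each colour i to the set S_i of boundary sites with spin i.
    Clusters meeting S_i receive spin i; every other cluster receives a
    spin chosen uniformly in {1, ..., q}, independently of the others.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in open_edges:
        parent[find(u)] = find(v)

    spin_of_root = {}
    for i, sites in S.items():
        for x in sites:
            root = find(x)
            if spin_of_root.setdefault(root, i) != i:
                raise ValueError("a cluster meets two different sets S_i")

    for v in vertices:
        root = find(v)
        if root not in spin_of_root:
            spin_of_root[root] = rng.randint(1, q)  # uniform on {1, ..., q}

    return {v: spin_of_root[find(v)] for v in vertices}


vertices = [0, 1, 2, 3, 4]
S = {1: {0}, 2: {4}}
spins = assign_spins(vertices, [(0, 1), (3, 4)], S, q=3, rng=random.Random(0))
print(spins[1], spins[3])  # inherited from S_1 and S_2 respectively
```

Here site \(1\) inherits spin \(1\) from \(S_1\), site \(3\) inherits spin \(2\) from \(S_2\), and the isolated site \(2\) receives a uniformly chosen spin.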

Thanks to the connection between Potts measures and random-cluster measures, tools provided by the theory of random-cluster models can be used in this context. Note that the parameters of the corresponding random-cluster measure are supercritical (\(p^*>p_c(q)\)).

Coupling with the subcritical random-cluster model on \(\mathbb Z ^2\)

Rather than working with the supercritical random-cluster measure on \((\mathbb Z ^2)^*\), we will be working with its subcritical dual measure on \(\mathbb Z ^2\) (this is the reason for choosing to define the Potts model on \((\mathbb Z ^2)^*\)). There is a natural one-to-one mapping between \(\left\{ 0,1\right\} ^{ E_n^*}\) and \(\left\{ 0,1\right\} ^{ E_n}\). Namely, set \(\omega (e) = 1-\omega ^*(e^*)\). In this way, both direct and dual FK configurations are defined on the same probability space. In the sequel, the same notation will be used for percolation events in direct and dual configurations. For instance, \(\omega \in \mathrm{Cond}_n [\sigma ]\) means that \(\omega ^*\in \mathrm{Cond}_n [\sigma ]\). The corresponding direct FK measure is \(\mu ^{\mathsf{f}}_{\Lambda _n} (\cdot \; | \; \text{ Cond}_n[\sigma ])\).

It is well-known [11] that this defines an FK measure with parameters \(q\) and \(p\) satisfying \(pp^*/[(1-p)(1-p^*)]=q\).
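The duality relation is easy to explore numerically. The sketch below solves \(pp^*/[(1-p)(1-p^*)]=q\) for \(p\) and checks that the edge-weight \(p^*=1-\mathrm{e}^{-\beta }\) taken at \(\beta _c(q)=\log (1+\sqrt{q})\) is a fixed point of the relation, while any \(\beta >\beta _c(q)\) yields a dual parameter below that fixed point. The function names are hypothetical.

```python
import math


def dual_edge_weight(p_star, q):
    """Solve p p* / ((1 - p)(1 - p*)) = q for p."""
    ratio = q * (1 - p_star) / p_star  # ratio = p / (1 - p)
    return ratio / (1 + ratio)


def edge_weight(beta):
    """Edge-weight p* = 1 - exp(-beta) of the coupled random-cluster model."""
    return 1 - math.exp(-beta)


def beta_c(q):
    """Critical inverse temperature log(1 + sqrt(q)) quoted in the text."""
    return math.log(1 + math.sqrt(q))


q = 3
p_self_dual = edge_weight(beta_c(q))       # equals sqrt(q) / (1 + sqrt(q))
p_star = edge_weight(beta_c(q) + 0.5)      # a low-temperature example
p = dual_edge_weight(p_star, q)
print(p < p_self_dual < p_star)
```

The map \(p^*\mapsto p\) is an involution, so applying `dual_edge_weight` twice returns the original parameter.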

Since we are working with the low temperature Potts model, the random-cluster model on \((\mathbb Z ^2)^*\) corresponds to \(p^*>p_c(q)\) so that the random-cluster model on \(\mathbb Z ^2\) is subcritical (\(p<p_c(q)\)). For this measure, \(\mathrm{Cond}_n [\sigma ]\) is an increasing event which requires the existence of direct open paths disconnecting the different dual sets \(S_i\). This reduces the problem to the study of the stochastic geometry of subcritical clusters. In particular, this enables us to use known results on the subcritical model.

Let us recall the few properties we will be using in the next sections. First, there is a unique infinite-volume measure, denoted \(\mu _{\mathbb Z ^2}\). Second, there is exponential decay of connectivities in the random-cluster model with parameter \(p<p_c(q)\). These two properties imply the following corollary.

Proposition 2.1

There exists \(c>0\) such that, for \(n\) large enough and \(2k\le n \le m\),

$$\begin{aligned} \mu _{A_{k,n}}^\mathsf{w}(\text{there exists a crossing of } A_{k,n})\le \mathrm{e}^{-cn},\\ \mu _{\Lambda _n}^\mathsf{w}(\text{there exists a cluster of cardinality } m \text{ in } \Lambda _{n/2})\le \mathrm{e}^{-cm}, \end{aligned}$$

where a crossing is a cluster of \(A_{k,n}\) connecting the inner box to the outer box.

A cluster surrounding the inner box of \(A_{m,n}\) inside the outer box of \(A_{m,n}\) is said to be a circuit. Note that the existence of a dual circuit is a complementary event to the existence of a crossing between the inner and outer boxes.

Proposition 2.1 follows from the exponential decay of connectivities proved for any \(p<p_c(q)\) in [6] together with the uniqueness of the infinite-volume measure (this is required to tackle wired boundary conditions, see [10, Appendix] for details). The result would not be true at criticality when \(q\) is very large, despite the fact that there is exponential decay for free boundary conditions.

Surface tension. Surface tension in the supercritical dual model is the inverse correlation length in the primal sub-critical FK percolation. Let \(p<p_c(q)\). The surface tension in direction \(x\) is defined by

$$\begin{aligned} \tau (x) = \tau _p(x) = -\lim _{k\rightarrow \infty }\frac{1}{k}\log \mu _{\mathbb Z ^2}(0\leftrightarrow [kx]), \end{aligned}$$

where \(y\leftrightarrow z\) means that \(y\) and \(z\) belong to the same connected component. We will also refer to it as the \(\tau \)-distance. By Proposition 2.1, \(\tau \) is equivalent to the usual Euclidean distance on \(\mathbb{R }^2\). Furthermore, by [10] it is strictly convex, and the following sharp triangle inequality of [26, 33] holds: There exists \(\rho =\rho (p) >0\) such that

$$\begin{aligned} \tau (x)+\tau (y) -\tau (x+y)\ge \rho (|x|+|y|-|x+y|) . \end{aligned}$$
(2)

Define \(\mathrm{d}_\tau (A,B)=\sup _{a\in A}\inf _{b\in B} \tau (a-b)\) to be the \(\tau \)-Hausdorff distance between two sets.
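As a small illustration of this definition for finite sets, the sketch below evaluates \(\sup _{a\in A}\inf _{b\in B}\tau (a-b)\) with an arbitrary norm passed as a function. Since the surface tension is not explicit, the Euclidean norm is used here as a stand-in for \(\tau \); this substitution, and the function names, are assumptions made for the example only.

```python
import math


def d_tau(A, B, tau):
    """sup over a in A of inf over b in B of tau(a - b), for finite
    non-empty sets of points of the plane."""
    return max(
        min(tau((a[0] - b[0], a[1] - b[1])) for b in B)
        for a in A
    )


def euclidean_norm(x):
    # Stand-in for tau: any norm equivalent to the Euclidean one would do.
    return math.hypot(x[0], x[1])


A = [(0, 0)]
B = [(0, 0), (3, 4)]
print(d_tau(A, B, euclidean_norm), d_tau(B, A, euclidean_norm))
```

The two values differ (\(0\) and \(5\) in this example), which shows that the quantity, as written, is not symmetric in \(A\) and \(B\): it measures how far \(A\) can be from \(B\), not the converse.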

Reformulation of the problem in terms of the subcritical random-cluster model

Theorem 2.2

Fix \(p<p_c(q)\) and let \(\varepsilon \in (0,1)\). Then, uniformly in all boundary conditions \(\sigma \),

$$\begin{aligned} \mu _{\Lambda _n}^\mathsf{f}\bigl (\mathsf{C }\cap \Lambda _{n^{\varepsilon }}\ne \varnothing \; | \; \text{ Cond}_n[\sigma ]\bigr ) = O(n^{-\frac{1}{2} + 14\varepsilon }) \end{aligned}$$
(3)

where \(\mathsf C \) is the set of sites connected to the boundary \(\partial \Lambda _n\).

The proof of this theorem will be the core of the paper. Before delving into the proof, let us show how it implies Theorem 1.4.

Lemma 2.3

Let \(\beta >\beta _c(q)\). Then,

$$\begin{aligned} \mathbb P ^\mathsf{f}_{(\mathbb Z ^2)^*} = \frac{1}{q}\sum _{i=1}^q\mathbb P ^{(i)}_{(\mathbb Z ^2)^*}. \end{aligned}$$
(4)

Proof

Fix \(\beta >\beta _c\). Note that \(\mathbb{P }^{(i)}_{(\mathbb{Z }^2)^*}\) can be defined via the coupling with the random-cluster measure as follows. Let \(\mu _{(\mathbb{Z }^2)^*}\) be the unique infinite-volume random-cluster measure on \((\mathbb{Z }^2)^*\). Since \(p^*>p_c(q)\), this measure possesses a unique infinite cluster. The Potts measure \(\mathbb{P }^{(i)}_{(\mathbb{Z }^2)^*}\) is constructed by assigning spin \(i\) to the infinite cluster, and a spin chosen uniformly at random for each finite cluster, independently of the spin of the other clusters. The Potts measure \(\mathbb{P }^\mathsf{f}_{(\mathbb{Z }^2)^*}\) can also be constructed from \(\mu _{(\mathbb{Z }^2)^*}\) by assigning to each cluster (including the infinite one) a spin chosen uniformly at random, independently of the spin of the other clusters. We deduce (4) immediately.

Note that in general, \(\mathbb{P }^{(i)}_{(\mathbb{Z }^2)^*}\) is constructed from the infinite-volume random-cluster measure \(\mu _{(\mathbb Z ^2)^*}^{\mathsf{w}}\) while \(\mathbb{P }^\mathsf{f}_{(\mathbb{Z }^2)^*}\) is constructed from the infinite-volume random-cluster measure \(\mu _{(\mathbb Z ^2)^*}^{\mathsf{f}}\). Therefore, if these two measures are different, (4) will not be valid. This is the case when \(p=p_c(q)\) and \(q\) is large enough. \(\square \)

Lemma 2.4

There exists \(c>0\) such that, for any \(n>0\) and any subdomain \(\Omega ^*\) of \((\mathbb Z ^2)^*\) containing \(\Lambda _{2n}^*\),

$$\begin{aligned} \mathbb P ^\mathsf{f}_{\Omega ^*}[g] = \mathbb P ^\mathsf{f}_{(\mathbb Z ^2)^*}[g]+O(\Vert g\Vert _\infty \mathrm{e}^{-cn}), \end{aligned}$$
(5)

for any \(g\) depending only on spins in \(\Lambda _n^*\). The same holds for pure boundary conditions \(i\in \{1,\ldots ,q\}\).

Proof

We treat the case of the free boundary condition. The other cases follow from the same proof. Since \(p<p_c(q)\), the random-cluster model on \(\mathbb Z ^2\) has exponential decay of connectivities. Therefore, [4, Theorem 1.7(ii)] implies the so-called ratio strong mixing property for the dual random-cluster model: If a percolation event \(A\) depends on edges from \(E_A\) and if \(B\) depends on edges from \(E_B\), then,

$$\begin{aligned} \left| \frac{\mu ^{\mathsf{f}}_{(\mathbb{Z }^2)^*}(A\cap B)}{\mu ^{\mathsf{f}}_{(\mathbb{Z }^2)^*}(A)\mu ^{\mathsf{f}}_{(\mathbb{Z }^2)^*}(B)} -1\right| \le \sum _{e_A \in E_A , e_B\in E_B} \mathrm{e}^{-c\, \text{ d}(e_A , e_B )}, \end{aligned}$$
(6)

where \(\text{ d}(e_A , e_B )\) is a distance between edges \(e_A\) and \(e_B\) (for instance the distance between their mid-points).

Together with the observation that \(\mu ^{\mathsf{f}}_{\Omega ^*}=\mu _{(\mathbb Z ^2)^*}^{\mathsf{f}}(\cdot |\omega (e)=0,\forall e\notin E(\Omega ^*))\), this leads to

$$\begin{aligned} \bigl |\mu ^{\mathsf{f}}_{\Omega ^*}[f]-\mu _{(\mathbb Z ^2)^*}^{\mathsf{f}}[f]\bigr |= O\bigl ( \mathrm{e}^{-cn}\mu _{(\mathbb Z ^2)^*}^{\mathsf{f}}[f]\bigr ) \end{aligned}$$
(7)

for any function \(f\) depending only on edges in \(E_{3n/2}^*\). More generally, let \(F\) be the event that there is no open crossing in the annulus \(A_{n,3n/2}\) (this corresponds to the existence of a dual circuit surrounding the origin). The complement \(F^c\) of this event has exponentially small probability by Proposition 2.1. Consider a function \(f\) depending a priori on every dual edge, but with the property that \(f\mathbf 1 _{F}\) is measurable with respect to edges in \(E_{3n/2}^*\). We immediately find that

$$\begin{aligned} \mu ^{\mathsf{f}}_{\Omega ^*}[f] = \mu ^{\mathsf{f}}_{\Omega ^*}[f\mathbf 1 _F]+O\big (||f||_{\infty }\mu ^{\mathsf{f}}_{\Omega ^*} (F^c)\big ) = \mu ^ {\mathsf{f}}_{\Omega ^*}[f\mathbf 1 _F]+O(||f||_{\infty }\mathrm{e}^{-cn}) \end{aligned}$$

and similarly for \(\mu ^\mathsf{f}_{(\mathbb Z ^2)^*}[f]\), so that (7) is preserved for this class of functions.

Now, consider \(g\) depending only on spins in \(\Lambda _n^*\). Via the coupling with the random-cluster model, \(\mathbb P ^\mathsf{f}_{\Omega ^*}[g]\) and \(\mathbb P ^\mathsf{f}_{(\mathbb Z ^2)^*}[g]\) can be seen as \(\mu ^{\mathsf{f}}_{\Omega ^*}[f]\) and \(\mu ^{\mathsf{f}}_{(\mathbb Z ^2)^*}[f]\) for a certain function \(f\), depending a priori on every edge, but for which \(f\mathbf 1 _F\) depends on edges in \(E_{3n/2}^*\) only (on the event \(F\), the dual connections between vertices of \(\Lambda _n^*\) are determined by edges in \(E_{3n/2}^*\)). We conclude that

$$\begin{aligned} \big |\mathbb P ^\mathsf{f}_{\Omega ^*}[g]-\mathbb P ^\mathsf{f}_{(\mathbb Z ^2)^*}[g]\big | = \big |\mu ^{\mathsf{f}}_{\Omega ^*}[f]-\mu ^{\mathsf{f}}_{(\mathbb Z ^2)^*}[f] \big | = O\big (||f||_{\infty }\mathrm{e}^{-cn} \big ). \end{aligned}$$

\(\square \)

Proof of Theorem 1.4

Fix \(n>0\) and a boundary condition \(\sigma \) on \(\partial \Lambda _n\). Fix \(\varepsilon >0\) small.

We consider the coupling \((\eta ,\omega )\) (the measure is denoted by \(\mathbf P \)) with marginals \(\mathbb P ^\sigma _{\Lambda _n}\) and \(\mu ^\mathsf{f}_{\Lambda _n}(\cdot \; | \; \text{ Cond}_n[\sigma ])\) described in the previous section. Let \(\mathcal E \) be the event that \(\omega \) contains an open crossing in \(A_{2n^\varepsilon ,n}\). Let \(\mathcal F ^\mathsf{f}\) be the event that \(\omega \) contains an open circuit in \(A_{2n^\varepsilon ,n}\). Let \(\mathcal F ^{(i)}\) be the event that \(\omega \) contains neither an open crossing nor an open circuit in \(A_{2 n^\varepsilon ,n}\), and that \((\Lambda _{2n^\varepsilon })^*\) is connected in the dual configuration to \(S_i\) (Fig. 1). Note that

$$\begin{aligned} \mathbf P (\mathcal E ) = \mu ^\mathsf{f}_{\Lambda _n}(\mathcal E \; | \; \text{ Cond}_n[\sigma ]) = O(n^{-\frac{1}{2} +14\varepsilon }), \end{aligned}$$

by applying Theorem 2.2.

  • (conditioning on \(\mathcal F ^\mathsf{f}\)).  Let \(\Gamma ^*\) be the connected component of \(\partial \Lambda _n^*\) in \(\omega ^*\). Denote the connected component of \(\Lambda _{2n^\varepsilon }^*\) in \(\Lambda _n^*{\setminus } \Gamma ^*\) by \(\Omega ^*\). We have \(\Lambda _{2n^\varepsilon }^*\subset \Omega ^*\). Conditioning on \(\Gamma ^*\) we infer, using (5) and (4) that

    $$\begin{aligned} \mathbf P \bigl (g\bigm |\mathcal F ^\mathsf{f}\bigr )&= \mathbf P \bigl ( \mathbb P ^\mathsf{f}_{\Omega ^*}[g] \bigm | \mathcal{F }^\mathsf{f}\bigr ) = \mathbb P ^\mathsf{f}_{(\mathbb Z ^2)^*}[g] + O(\Vert g\Vert _\infty \mathrm{e}^{-cn^{\varepsilon }}) \\&= \frac{1}{q} \sum _{i=1}^q\mathbb P ^{(i)}_{(\mathbb Z ^2)^*}[g] + O(\Vert g\Vert _\infty \mathrm{e}^{-cn^{\varepsilon }}). \end{aligned}$$
  • (conditioning on \(\mathcal F ^{(i)}\)). In this case, let us condition on the connected cluster \(\Gamma \) of \(\partial \Lambda _n\). We view \(\Gamma \) as a set of bonds. Define \(\Omega ^*\) as the connected component of \(\Lambda _{2n^\varepsilon }^*\) in \(\left(E_n{\setminus }\Gamma \right)^*\). By construction, \(\Lambda _{2n^\varepsilon }^*\subset \Omega ^*\) and \(\Omega ^*\cap S_i \ne \varnothing \). Consequently, using (5) once again, we obtain

    $$\begin{aligned} \mathbf P \bigl (g\bigm |\mathcal F ^{(i)}\bigr ) = \mathbf P \bigl ( \mathbb P ^{(i)}_{\Omega ^*}[g] \bigm | \mathcal{F }^{(i)}\bigr ) = \mathbb P ^{(i)}_{(\mathbb Z ^2)^*}[g] + O(\Vert g\Vert _\infty \mathrm{e}^{-cn^{\varepsilon }}). \end{aligned}$$
Fig. 1. On the left (resp. center, right), the event \( \mathcal E \) (resp. \(\mathcal{F }^\mathsf{f}\), \(\mathcal{F }^{(i)}\)) is depicted.

By summing all these terms,

$$\begin{aligned} \mathbb P ^\sigma _{\Lambda _n}[g]&= \mathbf P [g] = \mathbf P [g|\mathcal E ]\, \mathbf P [\mathcal E ] + \mathbf P [g|\mathcal F ^\mathsf{f}]\, \mathbf P [\mathcal F ^\mathsf{f}] + \sum _{i=1}^q \mathbf P [g|\mathcal F ^{(i)}]\mathbf P [\mathcal F ^{(i)}]\\&= \sum _{i=1}^q \bigl ( \tfrac{1}{q} \mathbf P [\mathcal F ^\mathsf{f}] + \mathbf P [\mathcal F ^{(i)}] \bigr ) \mathbb P ^{(i)}_{(\mathbb Z ^2)^*}[g] + O(\Vert g\Vert _\infty n^{-\frac{1}{2} +14\varepsilon }), \end{aligned}$$

which implies the claim readily. \(\square \)
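To make the last step explicit, here is a short bookkeeping sketch. The notation \(\alpha _i\) is introduced only for this remark, and we use that the events \(\mathcal E \), \(\mathcal F ^\mathsf{f}\) and \(\mathcal F ^{(i)}\), \(i=1,\dots ,q\), exhaust the probability space, as is implicit in the decomposition above. The coefficients in front of the pure phases are nonnegative and sum to one up to the error term:

$$\begin{aligned} \alpha _i = \tfrac{1}{q} \mathbf P [\mathcal F ^\mathsf{f}] + \mathbf P [\mathcal F ^{(i)}] \ge 0, \qquad \sum _{i=1}^q \alpha _i = \mathbf P [\mathcal F ^\mathsf{f}] + \sum _{i=1}^q \mathbf P [\mathcal F ^{(i)}] = 1-\mathbf P [\mathcal E ] = 1- O(n^{-\frac{1}{2} +14\varepsilon }). \end{aligned}$$

Hence, up to an error of order \(\Vert g\Vert _\infty n^{-\frac{1}{2} +14\varepsilon }\), the expectation \(\mathbb P ^\sigma _{\Lambda _n}[g]\) is a convex combination of the \(q\) expectations \(\mathbb P ^{(i)}_{(\mathbb Z ^2)^*}[g]\).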

Macroscopic flower domains

In the box \(\Lambda _n\), the conditioning on \(\mathrm{Cond}_n[\sigma ]\) can be very messy. Indeed, as we mentioned before, it forces the existence of open paths separating the sets \(S_i\). For instance, the number of such paths forced by an alternating boundary condition \(1,2,\dots ,q,1,2,\dots \) is necessarily of order \(n\).

We first show that, no matter what the boundary condition \(\sigma \) is, with high probability only a bounded number of such interfaces is capable of reaching an inner box \(\Lambda _m\), where \(m\) is a fraction of \(n\). Furthermore, we shall argue that the number of sites in \(\partial \Lambda _m\) which are connected to the original \(\partial \Lambda _n\) is uniformly bounded. In terms of the original Potts model, this corresponds to the existence, with high probability, of a domain including the box \(\Lambda _m\) for which the boundary condition contains a uniformly bounded number of spin changes. This will be called a flower domain below.

Definition of flower domains

Let \(m<n\). For a configuration \(\omega \), let \(\mathsf{C}_{m,n}=\mathsf{C}_{m,n}(\omega )\) be the set of sites connected to \(\partial \Lambda _n\) in \(\omega \cap (E_n{\setminus }E_m)\). Define the set of marked vertices by

$$\begin{aligned} \mathbb{G }_{m,n} = \mathbb{G }_{m,n}(\omega ) = \mathsf{C}_{m,n}\cap \partial \Lambda _{m}. \end{aligned}$$

The set \(\mathbb{G }_{m,n}\cup \left(\Lambda _n{\setminus }\mathsf{C}_{m,n}\right)\) may have several connected components, exactly one of them containing \(\Lambda _m\). Let us call the latter the flower domain \(\mathcal{D }_{m,n}=\mathcal{D }_{m,n}(\omega )\) rooted at \(m\). Note that \(\mathbb{G }_{m,n} = \partial \mathcal{D }_{m,n}\cap \partial \Lambda _m\); that is, marked sites are unambiguously determined by the corresponding flower domain (Fig. 2).
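The definitions above are algorithmic, and the following minimal Python sketch may help to fix ideas. It rests on conventions that are assumptions of the sketch rather than statements of the text: \(\Lambda _n=\{-n,\dots ,n\}^2\), \(\partial \Lambda _n\) is the set of sites at sup-norm distance \(n\) from the origin, edges are nearest-neighbour pairs, connected components of site sets are taken in the nearest-neighbour sense, and \(1\le m<n\). The function names are hypothetical.

```python
from collections import deque


def box(n):
    """Sites of the box Lambda_n = {-n, ..., n}^2 (assumed convention)."""
    return {(x, y) for x in range(-n, n + 1) for y in range(-n, n + 1)}


def boundary(n):
    """Sites of Lambda_n at sup-norm distance n from the origin."""
    return {v for v in box(n) if max(abs(v[0]), abs(v[1])) == n}


def neighbours(v):
    x, y = v
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def flower_domain(open_edges, m, n):
    """Return (C_{m,n}, G_{m,n}, D_{m,n}) for a set of open edges.

    open_edges is a set of frozenset({u, v}) nearest-neighbour pairs.
    Assumes 1 <= m < n.
    """
    lam_n, lam_m = box(n), box(m)

    # C_{m,n}: sites connected to the boundary of Lambda_n by open edges
    # of E_n \ E_m, i.e. edges that do not have both endpoints in Lambda_m.
    cluster = set(boundary(n))
    queue = deque(cluster)
    while queue:
        u = queue.popleft()
        for v in neighbours(u):
            if v not in lam_n or v in cluster:
                continue
            if u in lam_m and v in lam_m:  # this edge belongs to E_m
                continue
            if frozenset((u, v)) in open_edges:
                cluster.add(v)
                queue.append(v)

    # G_{m,n}: marked vertices on the boundary of Lambda_m.
    marked = cluster & boundary(m)

    # D_{m,n}: component of G_{m,n} ∪ (Lambda_n \ C_{m,n}) containing Lambda_m,
    # explored from the origin (which lies in Lambda_m and never in C_{m,n}).
    allowed = marked | (lam_n - cluster)
    domain = {(0, 0)}
    queue = deque(domain)
    while queue:
        u = queue.popleft()
        for v in neighbours(u):
            if v in allowed and v not in domain:
                domain.add(v)
                queue.append(v)
    return cluster, marked, domain


# Example: one open path from the boundary of Lambda_3 down to Lambda_1.
omega = {frozenset(((3, 0), (2, 0))), frozenset(((2, 0), (1, 0)))}
cluster, marked, domain = flower_domain(omega, m=1, n=3)
```

In this example the only marked vertex is \((1,0)\), and the flower domain is \(\Lambda _2\) with the site \((2,0)\) removed.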

Fig. 2. Description of a flower domain \(\mathcal D _{m,n}\) (light grey area). The blue points are locations of spin changes (i.e. separation between sets \(S_i\)), the red points constitute \(\mathbb G _{m,n}\), the solid black lines in the annulus \(\Lambda _n\backslash \Lambda _m\) constitute \(\mathsf{C}_{m,n}\) (color figure online)

Fix a configuration \(\omega \). Let \(\mathcal{C }=\mathsf{C}_{m,n}(\omega )\) and let \(\mathcal{D }=\mathcal{D }_{m,n}(\omega )\) be the corresponding flower domain. Let also \(\mathbb{G }=\mathbb{G }_{m,n}(\omega )\). By construction, the restriction of the conditional measure \(\mu ^{\mathsf{f}}_{\Lambda _n}(\, \cdot \, | \mathsf{C}_{m,n} =\mathcal{C })\) to \(\left\{ 0,1\right\} ^{\mathcal{E }_{\mathcal{D }}}\), where \(\mathcal{E }_{\mathcal{D }}\) is the set of edges of \(\mathcal{D }\), is the FK measure with free boundary conditions on \(\partial \mathcal{D }{\setminus }\mathbb{G }\) and wiring between sites of \(\mathbb{G }\) inherited from connections in \(\mathcal{C }\). We denote this restricted conditional measure as \(\mu _\mathcal{D }^{\text{ flower}}\). We also set \(\mathsf{C}_{\mathbb{G }}\) for the connected component of \(\mathbb{G }\) in the restriction of \(\omega \) to \(\mathcal{E }_{\mathcal{D }}\).

Cardinality of \(\mathbb{G }_{m,n}\)

Flower domains typically have small sets \(\mathbb{G }_{m,n}\), as the following proposition shows.

Proposition 3.1

There exists \(M>0\) such that for any \(\delta >0\)

$$\begin{aligned} \mu _{\Lambda _n}^\mathsf{f}\Bigl ( \exists m\in \bigl [\tfrac{\delta n}{3},\delta n\bigr ]:|\mathbb{G }_{m,n}| \le M \Bigm | \mathrm{Cond}_n[\sigma ]\Bigr ) \ge 1-\mathrm{e}^{-\delta n} , \end{aligned}$$
(8)

uniformly in \(\sigma \) and \(n\) sufficiently large.

The notation \(M\) will now be reserved for an integer \(M>0\) satisfying the previous proposition. We shall prove this Proposition for \(\delta =1\); the general case follows by a straightforward adaptation.

Definition 3.2

Let \(\mathcal{E }_r\) be the event that there exist \(r\) disjoint crossings of \(A_{n/3,n/2}\).

Lemma 3.3

For all \(r\ge 1\) and \(n>0\),

$$\begin{aligned} \mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_r)\le \mathrm{e}^{-crn}, \end{aligned}$$

where \(c>0\) is defined in Proposition 2.1.

Proof

We prove that for all \(r\ge 1\) and \(n>0\),

$$\begin{aligned} \mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_r) \le \bigl (\mu ^{\mathsf{w}}_{A_{n/3,n/2}}(\mathcal{E }_1)\bigr )^{r}. \end{aligned}$$
(9)

The conclusion will then follow easily, since Proposition 2.1 implies that \(\mu ^{\mathsf{w}}_{A_{n/3,n/2}}(\mathcal{E }_1) \le \exp (-cn)\).

In order to prove (9), we proceed by induction. First, note that \(\mu _{\Lambda _n}^{\mathsf{f}}\) restricted to \(A_{n/3,n/2}\) is stochastically dominated by \(\mu _{A_{n/3,n/2}}^{\mathsf{w}}\).

Let \(r\ge 1\) and consider \(\mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_{r+1}|\mathcal{E }_r)\). We number the vertices of \(\partial \Lambda _n=\{x_1,\ldots ,x_{4n+4}\}\) in clockwise order, starting at the bottom right corner. Let \(k\) be the smallest number such that there are \(r\) crossings among the clusters containing \(x_1,\ldots ,x_k\). Denote by \(\mathcal{S }\) the union of these clusters (which may contain isolated vertices). Observe that all edges in \(A_{n/3,n/2}{\setminus }\mathcal{S }\) which are incident to vertices of \(\mathcal{S }\) are closed. Therefore, the conditional measure \(\mu _{\Lambda _n}^{\mathsf{f}}(\cdot _{|A_{n/3,n/2}{\setminus }\mathcal{S }} |\mathcal{S })\) is stochastically dominated by \(\mu ^\mathsf{w}_{A_{n/3,n/2}}(\cdot _{|A_{n/3,n/2}\setminus \mathcal{S }})\). In both instances above, the symbol \(\nu (\cdot _{|B} )\) means the restriction of \(\nu \) to edges of the graph with the vertex set \(B\). As a result, the probability, under \(\mu _{\Lambda _n}^{\mathsf{f}}(\cdot _{|A_{n/3,n/2}\setminus \mathcal{S }} |\mathcal{S })\), that there exists a crossing of \(A_{n/3,n/2}\) is smaller than \(\mu ^{\mathsf{w}}_{A_{n/3,n/2}}(\mathcal{E }_1)\). We obtain

$$\begin{aligned} \mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_{r+1})&= \mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_{r+1}|\mathcal{E }_r)\mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_r) = \mu _{\Lambda _n}^{\mathsf{f}}[\mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_{r+1}|\mathcal{S })]\mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_r)\\&\le \mu ^{\mathsf{w}}_{A_{n/3,n/2}}(\mathcal{E }_1) \mu _{\Lambda _n}^{\mathsf{f}}(\mathcal{E }_{r})\le \mu ^{\mathsf{w}}_{A_{n/3,n/2}} (\mathcal{E }_1)^{r+1}. \end{aligned}$$

\(\square \)

Proof of Proposition 3.1

Obviously,

$$\begin{aligned} \mu _{\Lambda _n}^\mathsf{f}\bigl (\forall m\in \left[\tfrac{n}{3},\tfrac{n}{2}\right]:|\mathbb{G }_{m,n}|> M \bigm | \mathrm{Cond}_n[\sigma ]\bigr ) \le \frac{\mu _{\Lambda _n}^\mathsf{f}\bigl (\forall m\in [\tfrac{n}{3},\tfrac{n}{2}]:|\mathbb{G }_{m,n}|> M\bigr )}{\mu _{\Lambda _n}^\mathsf{f}(\mathrm{Cond}_n[\sigma ])}.\nonumber \\ \end{aligned}$$
(10)

Let us bound from below the denominator of (10). If all the edges of \(\partial E_n\) are open, then \(\mathrm{Cond}_n [\sigma ]\) occurs. Moreover, the measure \(\mu _{\Lambda _n}^{\mathsf{f}}\) stochastically dominates independent Bernoulli edge percolation on \(\left\{ 0,1\right\} ^{ E_n}\) with \(\tilde{p} = p/(p+ q(1-p))\), see [2, Theorem 4.1]. We deduce

$$\begin{aligned} \mu _{\Lambda _n}^{\mathsf{f}}(\mathrm{Cond}_n[\sigma ])\ge \mu _{\Lambda _n}^{\mathsf{f}}(\text{ all} \text{ the} \text{ edges} \text{ in} \partial E_n \text{ are} \text{ open}) \ge \tilde{p}^{8n}. \end{aligned}$$
(11)

Let us now bound from above the numerator of (10). First,

$$\begin{aligned} \mu _{\Lambda _n}^{\mathsf{f}}\bigl (\forall m\in \bigl [\tfrac{n}{3},\tfrac{n}{2}\bigr ]:|\mathbb{G }_{m,n}| > M\bigr ) \le \mu _{\Lambda _n}^{\mathsf{f}}\bigl (|\mathsf{C}_{n/3,n}\cap A_{n/3,n/2}|\ge M n/6\big ). \end{aligned}$$

Fix \(R>0\). If \(|\mathsf{C}_{n/3,n}\cap A_{n/3,n/2}|\ge M n/6\), then either \(A_{n/3,n/2}\) contains more than \(R\) crossings or one of the crossings has cardinality larger than \(M n/(6R)\). Proposition 2.1 implies that the probability of having clusters with size larger than \(M n/(6R)\) in \(\Lambda _{n/2}\) is smaller than \(\exp [-c M n/(6R)]\) for \(n\) large enough. Lemma 3.3, together with (10) and (11), implies that, for \(n\) large enough,

$$\begin{aligned} \mu _{\Lambda _n}^{\mathsf{f}}\bigl (\forall m\in \bigl [\tfrac{n}{3},\tfrac{n}{2}\bigr ]:|\mathbb{G }_{m,n}| > M \bigm | \mathrm{Cond}_n[\sigma ] \bigr ) \le \tilde{p}^{-8n}[\mathrm{e}^{-cRn}+\mathrm{e}^{-c Mn/(6R)}]\le \mathrm{e}^{-n}, \end{aligned}$$

provided that \(R\) and \(M\) are sufficiently large. \(\square \)
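One admissible way of choosing "sufficiently large" \(R\) and \(M\) in the last display can be checked numerically. The sketch below is only an illustration: the rule \(cR\ge 8\log (1/\tilde{p})+2\), \(M=6R^2\) is one convenient choice and not necessarily the authors', the constant \(c\) of Proposition 2.1 is treated as a given input, and the function names are hypothetical. With this rule both exponentials are at most \(\tilde{p}^{8n}\mathrm{e}^{-2n}\), so the right-hand side is at most \(2\mathrm{e}^{-2n}\le \mathrm{e}^{-n}\) for \(n\ge 1\).

```python
import math


def choose_constants(p_tilde, c):
    """Pick (R, M) with c*R >= 8*log(1/p_tilde) + 2 and M = 6*R**2.

    p_tilde: Bernoulli parameter from the domination step, 0 < p_tilde < 1.
    c: the rate constant of Proposition 2.1 (an input assumed to be known).
    """
    a = 8 * math.log(1 / p_tilde)
    R = math.ceil((a + 2) / c)
    M = 6 * R * R  # then c*M/(6*R) = c*R
    return R, M


def right_hand_side(n, p_tilde, c, R, M):
    """p_tilde^(-8n) * (exp(-c*R*n) + exp(-c*M*n/(6*R))), computed in log space."""
    a = 8 * math.log(1 / p_tilde)
    return math.exp(n * (a - c * R)) + math.exp(n * (a - c * M / (6 * R)))


# Illustrative values only; they are not taken from the paper.
R, M = choose_constants(p_tilde=0.2, c=0.5)
ok = all(right_hand_side(n, 0.2, 0.5, R, M) <= math.exp(-n) for n in range(1, 50))
```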

Reduction to FK measures on flower domains with free boundary condition

We define

$$\begin{aligned} \mathcal{M }_n = \max \{ m\le n : \left| \mathbb{G }_{m,n }\right|\le M \}, \end{aligned}$$
(12)

where the maximum is set to be equal to \(\infty \) if there is no \(m\le n\) such that \(\left| \mathbb{G }_{m,n }\right|\le M\). With this notation, we actually proved that \(\mathcal{M }_n \in [\tfrac{n}{3}, n]\) with probability bounded below by \(1-\mathrm{e}^{-{n} }\).

Let \(\mathcal{C }\) be a possible realization of \(\mathsf{C}_{m,n}\) and \(\mathcal{D }=\mathcal{D }_{m,n}\) be the corresponding flower domain. The restriction of \(\mu _{\Lambda _n}^{\mathsf{f}}\left(\cdot ~|~\mathcal{M }_n =m ;\mathsf{C}_{m, n}=\mathcal{C }\right)\) to \(\mathcal{D }\) is \(\mu _\mathcal{D }^{\text{ flower}}\). Furthermore,

$$\begin{aligned} \text{ Cond}_n[\sigma ]\cap \{ \mathcal{M }_n=m\} \cap \{ \mathsf{C}_{m,n} = \mathcal{C }\} \end{aligned}$$

is a product event \(\Omega _{\sigma ,\mathcal{C }}\times \{ \mathcal{M }_n= m ; \mathsf{C}_{m,n}=\mathcal{C }\}\), where \(\Omega _{\sigma ,\mathcal{C }}\subset \left\{ 0,1\right\} ^{\mathcal{E }_{\mathcal{D }}}\). Then

$$\begin{aligned} \mu _{\Lambda _n}^{\mathsf{f}}(\mathsf{C}\cap \Lambda _{n^\varepsilon }\!\ne \!\varnothing \; | \; \text{ Cond}_n[\sigma ];\mathcal{M }_n\!=\!m;\mathsf{C}_{m, n}\!=\!\mathcal{C }) \!=\! \mu _\mathcal{D }^{\text{ flower}}( \mathsf{C}_{\mathbb{G }} \cap \Lambda _{n^\varepsilon }\!\ne \!\varnothing \,|\, \Omega _{\sigma ,\mathcal{C }}).\nonumber \\ \end{aligned}$$
(13)

The event \(\Omega _{\sigma ,\mathcal{C }}\) has an obvious structure. It corresponds to the existence of certain connections between different sites of \(\mathbb{G }= \mathcal{C }\cap \partial \Lambda _m = \partial \mathcal{D }\cap \partial \Lambda _m\). More precisely, let \(\mathcal{P }_{\mathbb{G }}\) be the collection of different partitions of \(\mathbb{G }\). Elements of \(\mathcal{P }_\mathbb{G }\) are of the form \(\underline{\mathbb{G }} = \left(\mathbb{G }_1, \dots ,\mathbb{G }_\ell \right)\). Define

$$\begin{aligned} \Omega _{\underline{\mathbb{G }}} = \bigcap _{i}\bigcap _{u,v\in \mathbb{G }_i}\{ u\leftrightarrow v\} \subset \{0,1\}^{\mathcal{E }_{\mathcal{D }}} . \end{aligned}$$

Let us say that a partition \(\underline{\mathbb{G }}\) is compatible with \(\Omega _{\sigma ,\mathcal{C }}\) if \(\Omega _{\underline{\mathbb{G }}} \subseteq \Omega _{\sigma ,\mathcal{C }}\). Note that we do not rule out that some elements \(\mathbb{G }_i\) of a partition \(\underline{\mathbb{G }}\) are singletons. If \(\mathbb{G }_i\) is a singleton, then \(\bigcap _{u,v\in \mathbb{G }_i}\{ u\leftrightarrow v\}\) is, of course, a sure event, which could be dropped from the definition of \(\Omega _{\underline{\mathbb{G }}}\). In other words, only non-singleton elements of \(\underline{\mathbb{G }}\) are relevant for \(\Omega _{\underline{\mathbb{G }}}\). Also note that the events \(\Omega _{\underline{\mathbb{G }}}\) do not have to be disjoint. Still, for any \(\sigma \),

$$\begin{aligned} \Omega _{\sigma ,\mathcal{C }} = \bigcup _{\underline{\mathbb{G }}\in \mathcal{P }^\prime _\mathbb{G }} \Omega _{\underline{\mathbb{G }}}, \end{aligned}$$

where the set \(\mathcal{P }^\prime _\mathbb{G }\) corresponds to partitions which are compatible with the occurrence of the event \(\Omega _{\sigma ,\mathcal{C }}\), and which are maximal in the sense that one cannot find a finer partition which would be still compatible with \(\Omega _{\sigma ,\mathcal{C }}\).
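The events \(\Omega _{\underline{\mathbb{G }}}\) are easy to test on a given configuration, which also makes their lack of disjointness concrete. The following Python sketch is a hypothetical illustration: the graph, the vertex labels and the function names are ours. It uses a union-find structure to decide whether every block of a partition lies in a single open cluster.

```python
def make_find(vertices, open_edges):
    """Union-find over the open edges; returns a function v -> cluster label."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in open_edges:
        parent[find(u)] = find(v)
    return find


def omega_occurs(partition, vertices, open_edges):
    """True if, for each block of the partition, all its sites are connected."""
    find = make_find(vertices, open_edges)
    return all(len({find(u) for u in block}) == 1 for block in partition)


# Toy example on the path 0 - 1 - 2 - 3 with the middle edge closed.
vertices = [0, 1, 2, 3]
open_edges = [(0, 1), (2, 3)]
coarse = [[0, 1], [2, 3]]
fine = [[0, 1], [2], [3]]       # singletons impose no constraint
too_coarse = [[0, 1, 2], [3]]
```

Here both `coarse` and `fine` occur on the same configuration, so the corresponding events overlap, while `too_coarse` does not occur.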

The previous section implies the following reduction, on which we shall rely for the rest of this work.

Proposition 3.4

Fix \(\delta >0\). Then, writing \(B_k\) for the \(k^{th}\) Bell number, which counts the number of partitions of a set of \(k\) elements,

$$\begin{aligned} \mu _{\Lambda _n}^{\mathsf{f}}\bigl ( \mathsf{C}\cap \Lambda _{n^\varepsilon }\ne \varnothing \bigm | \text{ Cond}_n[\sigma ]\bigr ) \le \mathrm{e}^{-\delta n} + B_{M} q ^{M} \max \mu _\mathcal{D }^{\mathsf{f}}\bigl ( \mathsf{C}_\mathbb{G }\cap \Lambda _{n^\varepsilon }\ne \varnothing \bigm | \Omega _{\underline{\mathbb{G }}}\bigr ) ,\nonumber \\ \end{aligned}$$
(14)

for all boundary conditions \(\sigma \) and \(n\) sufficiently large. The above maximum is over all flower domains \(\mathcal{D }\) rooted at \(m\in [\tfrac{n}{3}, n]\) with at most \(M\) marked points (that is, \(\left| \mathbb{G }\right|\le M\)), and over all partitions \(\underline{\mathbb{G }}\in \mathcal{P }^{\prime }_\mathbb{G }\).

Above, the term \(q^{M}\) comes from the fact that the elements of \(\mathbb{G }\) are possibly wired together; it bounds the Radon–Nikodym derivative between the measures \(\mu _\mathcal{D }^{\text{ flower}}\) and \(\mu _\mathcal{D }^{\mathsf{f}}\). The quantity \(B_{M}\) bounds from above the number of sub-partitions of \(\mathbb{G }\) (the events \(\Omega _{\underline{\mathbb{G }}}\) being not necessarily disjoint).
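For orientation on the size of the combinatorial prefactor \(B_M q^M\), the Bell numbers can be generated with the classical Bell triangle. The Python sketch below is purely illustrative; the function names and the sample values of \(M\) and \(q\) are ours. The point is that the prefactor depends only on \(M\) and \(q\), not on \(n\).

```python
def bell_numbers(k_max):
    """Return [B_0, B_1, ..., B_{k_max}] computed with the Bell triangle."""
    row = [1]
    bells = [1]  # B_0 = 1
    for _ in range(k_max):
        new_row = [row[-1]]              # a row starts with the last entry above
        for entry in row:
            new_row.append(new_row[-1] + entry)
        row = new_row
        bells.append(row[0])
    return bells


def prefactor(M, q):
    """The factor B_M * q**M multiplying the maximum in the reduction."""
    return bell_numbers(M)[M] * q ** M


bells = bell_numbers(5)
```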

Macroscopic structure near the center of the box

This section studies the macroscopic structure of the set \(\mathsf{C}\) of sites connected to the boundary of \(\Lambda _n\). Its main result, Proposition 4.2 below, implies that on a sufficiently small scale \(\delta >0\), the intersection \(\mathsf{C}\cap \Lambda _k\) for boxes with \(k\in [\frac{\delta n}{3},\delta n]\) is with an overwhelming probability either empty, or close to a segment, or close to a tripod (three segments coming out from a point).

Before starting, note that Proposition 3.4 enables us to restrict attention to a flower domain \(\mathcal{D }=\mathcal{D }_{m,n}\) with \(m\in [\frac{n}{3},n]\) and \(\left| \mathbb{G }_{m,n}\right|\le M\). We set \(\mathbb{G }=\mathbb{G }_{m,n}\). We now fix this flower domain and work under \(\mu _\mathcal{D }^{\mathsf{f}}\left(\cdot ~\big |~ \Omega _{\underline{\mathbb{G }}}\right)\) for some \(\underline{\mathbb{G }}\in \mathcal{P }_{\mathbb{G }}^{\prime }\). All constants in this section are independent of \(\mathcal{D }_{m,n}\) and \(\underline{\mathbb{G }}\) as long as \(|\mathbb{G }_{m,n}|\le M\). We will often recall this independence by using the expression “uniformly in \((\mathcal{D },\underline{\mathbb{G }})\) with \(|\mathbb{G }|\le M\)”.

Define \(\mathsf{C}_{k,\mathbb{G }}\) to be the set of edges connected to \(\mathbb{G }\) in \(\mathcal{D }{\setminus }\Lambda _k\) (which can consist of several connected components). Note that \(\mathbb{G }_{k,n}=\mathsf{C}_{k,n}\cap \partial \Lambda _k=\mathsf{C}_{k,\mathbb{G }}\cap \partial \Lambda _k\). Given \(v_1,v_2\in \mathbb R ^2\), we define \([v_1,v_2]\) to be the line segment with endpoints \(v_1\) and \(v_2\), and \(\measuredangle (v_1,v_2)\) to be the angle between \(v_1\) and \(v_2\), seen as vectors in the plane. We refer to Fig. 3 for an illustration of the following definitions.

Fig. 3. Description of the events \(E^\ell _{\nu ,k}\), \(\ell =1,2,3\) from left to right. The set \(\mathbb G _{k,n}\), partitioned into \(\mathbb{V }^\ell _{k,n}\), \(\ell =1,2,3\), is indicated in red (color figure online)

Definition 4.1

For \(k< m\), \(\nu >0\) and \(\ell =1,2,3\), let us say that \(E_{\nu ,k}^\ell \subset \left\{ 0,1\right\} ^{\mathcal{E }_\mathcal{D }}\) occurs if S\(\ell \) below happens:

  1. S1.

    \(\mathbb{G }_{k,n}=\varnothing \).

  2. S2.

    \(\mathbb{G }_{k,n} = \mathbb{V }_{k,n}^1\cup \mathbb{V }_{k,n}^2\), where \(\mathbb{V }_{k,n}^1 , \mathbb{V }_{k,n}^2\) are two disjoint sets of \(\tau \)-diameter less than or equal to \(\nu k\). Moreover,

    • Each of the sets \(\mathbb{V }_{k,n}^1\) and \(\mathbb{V }_{k,n}^2\) is connected in \(\mathsf{C}_{k, \mathbb{G }}\).

    • For any two vertices \(v_i\in \mathbb{V }_{k,n}^i\); \(i=1,2\), we have \([v_1 ,v_2]\cap \Lambda _{k/2}\ne \varnothing \).

  3. S3.

    \(\mathbb{G }_{k,n} = \mathbb{V }_{k,n}^1\cup \mathbb{V }_{k,n}^2 \cup \mathbb{V }_{k,n}^3\), where \(\mathbb{V }_{k,n}^1\), \(\mathbb{V }_{k,n}^2\) and \(\mathbb{V }_{k,n}^3\) are disjoint sets with \(\tau \)-diameter less than or equal to \(\nu k\). Moreover,

    • Each of the sets \(\mathbb{V }_{k,n}^1\), \(\mathbb{V }_{k,n}^2\) and \(\mathbb{V }_{k,n}^3\) is connected in \(\mathsf{C}_{k ,\mathbb{G }}\),

    • For any choice of \(v_i\in \mathbb{V }_{k,n}^i;\ i=1,2,3\), there exists \(x\in \Lambda _{k/2}\) such that \(\mathcal{T }= \left\{ v_1 ,v_2 ,v_3 ; x\right\} \) is a Steiner tripod (see Definition 4.7 below). In particular, as follows from P2 of Proposition 4.3 below, \(\measuredangle (v_i - x,v_j -x) > \frac{\pi }{2} +\eta \) for every \(i\ne j\).

We are now in a position to state the main proposition.

Proposition 4.2

For any \(\nu >0\), there exist \(\delta =\delta (\nu ,M)>0\) and \(\kappa =\kappa (\nu ,M) > 0\) such that

$$\begin{aligned} \mu _\mathcal{D }^{\mathsf{f}}\left(\ \bigcup _{k\ge \delta n} \big (E_{\nu ,k}^1\cup E_{\nu ,k}^2\cup E_{\nu ,k}^3\big )\cap \big \{ |\mathbb{G }_{k,n}|\le M\big \}\ \Bigm |\ \Omega _{\underline{\mathbb{G }}}\ \right) \ge 1- \mathrm{e}^{-\kappa n}, \end{aligned}$$
(15)

uniformly in \((\mathcal{D },\underline{\mathbb{G }})\) with \(|\mathbb{G }|\le M\).

The proof of Proposition 4.2 comprises two steps: First, we show that the implied geometric structure is characteristic of deterministic objects called Steiner forests. Then, we show that, with high \(\mu _\mathcal{D }^{\mathsf{f}}(\,\cdot \,|\, \Omega _{\underline{\mathbb{G }}})\)-probability, the cluster \(\mathsf{C}_\mathbb{G }\) sits in the vicinity of one such forest.

Steiner forests

Note that for every \(m\) the set \(\mathcal{K }_m\) of all compact subsets of \(\Lambda _m\) is a Polish space with respect to the \(\text{ d}_\tau \)-distance.

We now recall the concept of Steiner forest (Fig. 4). Consider \(E\subseteq \partial \Lambda _m\) with \(|E|\le M\). Let \(\underline{E}=(E_1,\dots ,E_i)\) be a partition of \(E\) and \(\Omega _{\underline{E}}\) be the set of compact subsets of \(\mathbb{R }^2\) such that \(E_j\) is included in one of their connected components for every \(j\in \{1,\dots ,i\}\). For the trivial partition \(\underline{E} = \left\{ E\right\} \), we shall write \(\Omega _{E}\).

For a compact \(\mathcal{S }\subset \mathbb{R }^2\), let \(\tau (\mathcal{S })\) be the (one-dimensional) Hausdorff measure of \(\mathcal{S }\) in the \(\tau \)-norm. Explicitly,

$$\begin{aligned} \tau (\mathcal{S }) = \lim _{\varepsilon \rightarrow 0}\,\inf \Bigl \{ \sum \mathrm{diam}_\tau (A_i )\,:\, \mathcal{S }\subseteq \cup A_i,\, \mathrm{diam}_\tau (A_i )\le \varepsilon \Bigr \}, \end{aligned}$$
(16)

where \(\mathrm{diam}_\tau (A)=\sup \{ \tau (x-y) : x,y\in A\}\). Define the set of Steiner forests by

$$\begin{aligned} \Omega ^\mathrm{min}_{\underline{E}} = \bigl \{ \mathcal{F }\in \Omega _{\underline{E}} \,:\, \tau (\mathcal{F }) = \min _{\mathcal{S }\in \Omega _{\underline{E}}} \tau (\mathcal{S })\bigr \} . \end{aligned}$$

We set

$$\begin{aligned} \tau _{\underline{E}} = \min _{\mathcal{S }\in \Omega _{\underline{E}}} \tau (\mathcal{S }) = \tau (\mathcal{F }) , \end{aligned}$$

for any Steiner forest \(\mathcal{F }\in \Omega ^\mathrm{min}_{\underline{E}}\).

Fig. 4. A nontrivial Steiner forest with a partition \(\underline{E} = (E_1, E_2)\) with \(E_1 = \left\{ u_1, \dots , u_4\right\} \) and \(E_2 = \left\{ u_5, u_6\right\} \)

In the sequel we shall work only with Steiner forests \(\mathcal{F }\in \Omega ^\mathrm{min}_{\underline{E}}\), when \(\underline{E}\) is a partition of a set \(E\subset \partial \Lambda _m\) of cardinality \(\left| E\right|\le M\). Let \(\Omega ^\mathrm{min}_{M, m}\) be the collection of all such forests.
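For intuition only, the simplest Steiner tree (a tripod on three terminals) can be computed numerically in the isotropic special case. The Python sketch below replaces the \(\tau \)-norm by the Euclidean norm, which is an assumption of the sketch and not the setting of the paper. It uses Weiszfeld's iteration, a standard method for the point minimising the sum of distances to given points, with an illustrative choice of terminals and hypothetical function names. In the Euclidean case the three branches meet at angles of \(120^\circ \) whenever all angles of the triangle are smaller than \(120^\circ \).

```python
import math


def fermat_point(points, iterations=2000):
    """Weiszfeld iteration for the minimiser of the sum of Euclidean distances."""
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        wx = wy = w = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < 1e-15:
                # Degenerate case (iterate sits on a terminal); not treated here.
                return (px, py)
            wx += px / d
            wy += py / d
            w += 1.0 / d
        x, y = wx / w, wy / w
    return (x, y)


def angles_at(center, points):
    """Angles in degrees between consecutive rays from center to the points."""
    cx, cy = center
    dirs = sorted(math.atan2(py - cy, px - cx) for px, py in points)
    k = len(dirs)
    return [math.degrees((dirs[(i + 1) % k] - dirs[i]) % (2 * math.pi))
            for i in range(k)]


def total_length(center, points):
    return sum(math.hypot(center[0] - px, center[1] - py) for px, py in points)


terminals = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]  # all triangle angles < 120 degrees
node = fermat_point(terminals)
angles = angles_at(node, terminals)
```

The tripod through `node` is strictly shorter than any tree obtained by joining two sides of the triangle at a terminal.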

Proposition 4.3

Fix \(M>0\). The following properties hold uniformly in \(m\), in finite subsets \(E\subseteq \partial \Lambda _m\) with \(|E|\le M\) and in partitions \(\underline{E}\) of \(E\):

  1. P1.

    (Number of Steiner forests and compactness of \(\Omega ^\mathrm{min}_{M, m}\)) There exists \(k=k(M)<\infty \) such that \(|\Omega ^\mathrm{min}_{\underline{E}} |\le k\). The set \(\Omega ^\mathrm{min}_{M, m}\) is a compact subset of \(\left(\mathcal{K }_m, \text{ d}_\tau \right)\).

  2. P2.

    (Structure of Steiner forests) The sets \(\mathcal{F }\in \Omega ^\mathrm{min}_{\underline{E}}\) are forests (that is, collections of disjoint trees). Each inner node (that is, each node not belonging to \(E\)) of such an \(\mathcal{F }\) has degree \(3\). Furthermore, there exists an \(\eta >0\) such that the angle between two edges incident to an inner node of \(\mathcal{F }\) is always larger than \(\tfrac{\pi }{2} + \eta \).

  3. P3.

    (Well-separatedness of trees) There exists \(\delta _1=\delta _1(M) > 0\) such that any \(\mathcal{F }\in \Omega ^\mathrm{min}_{\underline{E}}\) satisfies:

    1. (a)

      for any Steiner tree \(\mathcal{T }\in \mathcal{F }\), two different nodes of \(\mathcal{T }\) in \(\Lambda _{m/2}\) are at \(\text{ d}_\tau \)-distance at least \(\delta _1 m\) from each other;

    2. (b)

      if \(\mathcal{T }_1\) and \(\mathcal{T }_2\) are two disjoint trees of \(\mathcal{F }\), then \(\text{ d}_\tau \left(\mathcal{T }_1\cap \Lambda _{m/2} , \mathcal{T }_2\cap \Lambda _{m/2}\right)\ge \delta _1 m\) .

  4. P4.

    (Stability) For any \(\delta _2 > 0\), there exists \(\kappa _2=\kappa _2(\delta _2,M) > 0\) such that, for any \(\left| E\right|\le M\), any partition \(\underline{E}\) of \(E\) and any \(\mathcal{S }\in \Omega _{\underline{E}}\),

    $$\begin{aligned} \tau (\mathcal{S }) \le \tau _{\underline{E}} + \kappa _2 m\quad \text{ implies}\quad \min _{\mathcal{F }\in \Omega ^\mathrm{min}_{M, m}}\mathrm{d}_\tau (\mathcal{S }, \mathcal{F })< \delta _2 m . \end{aligned}$$
    (17)
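A consequence of P2 that is used in the proof of P3 is a matching upper bound on the same angles. As a short worked step (the notation \(\theta _1,\theta _2,\theta _3\) for the three consecutive angles at an inner node is introduced only here): since an inner node has degree \(3\), the three angles sum to \(2\pi \), and each of them exceeds \(\tfrac{\pi }{2}+\eta \), so that

$$\begin{aligned} \theta _1 = 2\pi - \theta _2 - \theta _3 < 2\pi - 2\bigl (\tfrac{\pi }{2}+\eta \bigr ) = \pi - 2\eta , \end{aligned}$$

and likewise for \(\theta _2\) and \(\theta _3\). For comparison, in the isotropic (Euclidean) case all three angles equal \(\tfrac{2\pi }{3}\).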

Proof

We shall be rather sketchy since the arguments are presumably well understood. We shall consider the case \(m=1\) (the general case follows by homogeneity).

Let us start with P4. The functional \(\tau \) in (16) is lower semi-continuous on \((\mathcal{K }_1,\text{ d}_\tau )\) and has compact level sets (meaning sets of the form \(\{\mathcal{S }:\tau (\mathcal{S })\le R\}\)). See, for instance, [14, Proposition 3.1], where these facts are explained for the inverse correlation length of sub-critical Bernoulli bond percolation.

Assume that P4 is wrong. Then there exist \(\delta >0\) and two sequences, \(\underline{E_k}\) and \(\mathcal{S }_k\in \Omega _{\underline{E_k}}\), such that

$$\begin{aligned} \tau \left(\mathcal{S }_k\right)< \tau _{\underline{E_k}}+\frac{1}{k}\quad \text{ but}\quad \min _{\mathcal{F }\in \Omega ^\mathrm{min}_{M, m}}\mathrm{d}_\tau (\mathcal{S }_k,\mathcal{F }) > \delta . \end{aligned}$$

Since \(\left| E_k\right|\le M\), the sequence \(\tau _{\underline{E_k}}\) is bounded. Hence \(\left\{ \mathcal{S }_k\right\} \) is precompact. Possibly passing to a subsequence, we may assume that \(\underline{E_k}\) converges to \(\underline{E}\) (points might collapse, but this is irrelevant since this preserves \(\left| E\right|\le M\)), and that \(\mathcal{S }_k\) converges to \(\mathcal{S }\in \Omega _{\underline{E}}\). Both convergences are, of course, in the sense of Hausdorff distance. By minimality it is evident that \(\tau _{\underline{E}} = \lim \tau _{\underline{E_k}}\). By lower semicontinuity, \(\tau (\mathcal{S })\le \liminf \tau (\mathcal{S }_k )\le \lim \tau _{\underline{E_k}} = \tau _{\underline{E}}\), which means that \(\mathcal{S }\in \Omega _{\underline{E}}^\mathrm{min}\). Since \(\mathcal{S }_k\) converges to \(\mathcal{S }\), this contradicts the lower bound \(\delta \) on the distance from \(\mathcal{S }_k\) to the Steiner forests.

A proof of the first assertion of P1 can be found in [12, Theorem 1]. Compactness of \(\Omega _{M,1}^\mathrm{min}\) follows from compactness of level sets of \(\tau \) and the fact that if \(\mathcal{F }_k\in \Omega _{\underline{E_k}}^\mathrm{min}\) converges to \(\mathcal{F }\in \Omega _{\underline{E}}\), then, as was already mentioned above, \(\tau _{\underline{E}} = \lim \tau _{\underline{E_k}}\), and hence, by the lower-semicontinuity of \(\tau \), \(\mathcal{F }\in \Omega _{\underline{E}}^\mathrm{min}\).

A proof of P2 can be found in [5].

Let us turn to the proof of P3. For trivial partitions, Steiner forests are trees. Now, assume that there exists a sequence of Steiner trees \(\mathcal{T }_k\in \Omega ^\mathrm{min}_{{ E}_k}\) such that \(\mathcal{T }_k\) contains at least two inner nodes in \(\Lambda _{1/2}\) at distance less than or equal to \(\frac{1}{k}\). There is no loss of generality in assuming that the sequence \(\mathcal{T }_k\) converges to some \(\mathcal{T }^*\). As follows from P4, \(\mathcal{T }^*\in \Omega ^\mathrm{min}_{{E}^*}\), where \({E}^*\) is the corresponding limit of \({E}_k\). Obviously, \(|E^*|\) is still less than or equal to \(M\), since boundary points can only collapse under the limiting procedure.

The total number of nodes of each of \(\mathcal{T }_k\) is uniformly bounded above. Hence by our assumption we can choose a number \(\ell \ge 2\), a point \(x\in \Lambda _{1/2}\), a radius \(\varepsilon >0\) and a sequence \(\nu (k )\rightarrow 0\), so that

  1. (a)

    each of \(\mathcal{T }_k\) contains \(\ell \) nodes in \(\Lambda _{\nu (k)} (x)=x+\Lambda _{\nu (k)}\);

  2. (b)

    none of \(\mathcal{T }_k\) contains nodes in the annulus \(A_{\nu (k), \varepsilon }(x)\).

Then the restriction of \(\mathcal{T }_k\) to \(\Lambda _{\varepsilon } (x)\) is a Steiner tree, whereas \({|\partial \Lambda _{\varepsilon } (x)\cap \mathcal{T }_k | = \ell +2}\). By the minimality of \(\mathcal{T }_k\), the points of \(\partial \Lambda _{\varepsilon } (x)\cap \mathcal{T }_k \) are uniformly separated. Consequently, \(|\partial \Lambda _{\varepsilon } (x)\cap \mathcal{T }^* | = \ell +2 >3\). We infer that the degree of \(x\) in the Steiner tree \(\mathcal{T }^*\) is \(\ell +2 >3\), which is impossible by P2. This proves P3(a).

Consider now two disjoint Steiner trees \(\mathcal{T }_1\in \Omega ^\mathrm{min}_{{ E}_1}\) and \(\mathcal{T }_2 \in \Omega ^\mathrm{min}_{{ E}_2}\), such that the forest \(\left\{ \mathcal{T }_1 ,\mathcal{T }_2\right\} \) belongs to \(\Omega ^\mathrm{min}_{\left\{ { E}_1 , E_2 \right\} }\). By the strict convexity of \(\tau \), the trees are confined to their convex envelopes: \(\mathcal{T }_i\subset \mathrm{co}\left(E_i \right)\) for \(i=1,2\). Thus if both trees are disjoint and intersect \(\Lambda _{1/2}\), it follows that \(\mathrm{co}\left(E_1\right)\cap \mathrm{co}\left(E_2\right)= \varnothing \). Consequently, there exist \(u_1, v_1\in E_1\) and \(u_2, v_2\in E_2\), such that \(\mathcal{T }_1\) lies below the interval \([u_1 ,v_1]\) and \(\mathcal{T }_2\) lies above the interval \([u_2 ,v_2]\) (notions of above and below are with respect to the directions of normals). We are now facing two cases:

  • \(\mathcal{T }_1\) or \(\mathcal{T }_2\) has an inner node in \(\Lambda _{2/3}\). By P2, inner nodes are of degree three and angles between edges incident to inner nodes are at most \(\pi -2\eta \). This pushes inner nodes of \(\mathcal{T }_i\) away from \([u_i ,v_i ]\) uniformly in \(\mathcal{T }_1\) and \(\mathcal{T }_2\). In such a case, P3 is satisfied.

  • Neither \(\mathcal{T }_1\) nor \(\mathcal{T }_2\) contains nodes in \(\Lambda _{2/3}\), but each contains an edge which crosses \(\Lambda _{1/2}\). Having such edges close to each other (and hence running essentially in parallel across \(\Lambda _{1/2}\)) is easily seen to be incompatible with the minimality of \(\mathcal{F }\).

This completes the proof of P3(b). \(\square \)

Forest skeleton of the cluster \(\mathsf{C}_\mathbb{G }\)

Let \(\underline{\mathbb{G }}\) be a partition of \(\mathbb{G }\). We now aim to show that, under \(\mu _\mathcal{D }^{\mathsf{f}}(\,\cdot \,|\, \Omega _{\underline{\mathbb{G }}} )\), the cluster \(\mathsf{C}_\mathbb{G }\) stays typically close to one of the Steiner forests from \(\Omega _{M, m}^\mathrm{min}\). In order to do that, we introduce the notion of forest skeleton of the cluster. This notion is a modification of the coarse-graining procedure developed in Section 2.2 of [10].

Let \(\mathbf{U }_\tau \) be the unit ball in \(\tau \)-norm. Fix a large number \(c >0\) and consider \(K\) such that \(c\log K<K\). For any \(y\in \mathbb Z ^2\), set

$$\begin{aligned} \mathbf{B }_K (y)\, = \, \left(y+ K\cdot \mathbf{U }_\tau \right)\cap \mathbb Z ^2 \quad \text{ and}\quad \hat{\mathbf{B }}_{K } (y) \, =\, \mathbf{B }_{K +c\log K} (y). \end{aligned}$$

If \(x\in A\subset \mathbb Z ^2\) and \(y\in A\cup \partial ^{{\mathrm{ext}}} A\), we shall use \(\{x\stackrel{A}{\longleftrightarrow }y\}\) to denote the event that \(x\) and \(y\) are connected by an open path from \(x\) to \(y\) whose vertices belong to \(A\), with the possible exception of the terminal point \(y\) itself.

Let us construct the forest skeleton \(\mathcal{F }_{K}\) of the cluster \(\mathsf{C}_\mathbb{G }\) (see Fig. 5). Here and below, vertices in \(\mathbb Z ^2\) are ordered using the lexicographical ordering. In the following construction, we will often refer to the minimal vertex having some property.

  • Step 1. Set \(r=1, i=1\). Let \(x_0^1=u_{i_1}\) be the minimal vertex of \(\mathbb{G }\). Set \(V = \left\{ x_0^{1}\right\} \) and \(\mathcal{C }= \hat{\mathbf{B }}_K(x_0^1)\). Go to Step 2.

  • Step 2. If there exists \(x\in V\) and \(u\in \mathbb{G }{\setminus } V\) such that \(u\in \hat{\mathbf{B }}_{2K} (x)\), then choose \(u^*\in \mathbb{G }{\setminus } V\) to be the minimal such vertex. Set \(x^r_i = u^* ,{A_i^r = \mathbf{B }_K (x_i^r )}\) and go to Step 3. Otherwise, go to Step 4.

  • Step 3. Update \(V\rightarrow V\cup \{x_i^r\}\), \(\mathcal{C }\rightarrow \mathcal{C }\cup \hat{\mathbf{B }}_K(x_i^r)\) and \(i \rightarrow i+1\). Go to Step 2.

  • Step 4. If there is at least one vertex \(y\in \partial ^{\mathrm{ext}} \mathcal{C }\) such that

    $$\begin{aligned} y \stackrel{\mathsf{C}_\mathbb{G }\setminus \mathcal{C }}{\longleftrightarrow } \partial ^{\mathrm{ext}} \mathbf{B }_K(y) {\setminus }\mathcal{C }, \end{aligned}$$

    then choose \(y^*\) to be the minimal such vertex, set \(x_i^r =y^*, A_i^r = \mathbf{B }_K (x_i^r ){\setminus } \mathcal{C }\), and go to Step 3. Otherwise, go to Step 5.

  • Step 5. If \(\mathbb{G }\subset V\), then terminate the construction. Otherwise, choose \(u^*\) to be the minimal vertex of \(\mathbb{G }{\setminus }V\). Update \(r\rightarrow r+1\) and set \(x_0^r = u^*\). Update \(V\rightarrow V\cup \left\{ x_0^r\right\} \) and \(i=1\). Go to Step 2.
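The five steps above can be sketched in code. The following Python sketch is illustrative only and rests on simplifying assumptions that are not part of the construction in the text: the \(\tau \)-norm is replaced by the Euclidean norm, the cluster is given as a finite set of lattice vertices with every nearest-neighbour edge between two of them open, and the ball around each new root is counted in \(\mathcal{C }\) in Step 5 just as in Step 1. The name `forest_skeleton` and its arguments are hypothetical; edges are attached with the smallest-index rule of Definition 4.4.

```python
import math
from collections import deque

def forest_skeleton(cluster, G, K, c=1.0):
    """Return a list of trees (vertices, edges) for the K-skeleton of `cluster`.

    Assumptions (not from the paper): Euclidean norm instead of the tau-norm,
    K >= 2, and `cluster` is a set of lattice points joined by open edges.
    """
    cluster = set(cluster)
    r_hat = K + c * math.log(K)             # radius of \hat B_K
    r_hat2 = 2 * K + c * math.log(2 * K)    # radius of \hat B_{2K}

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def nbrs(z):
        return [(z[0] + 1, z[1]), (z[0] - 1, z[1]), (z[0], z[1] + 1), (z[0], z[1] - 1)]

    V = []                                   # all skeleton vertices chosen so far

    def covered(z):                          # is z in C, the union of \hat B_K(x), x in V?
        return any(dist(z, x) <= r_hat for x in V)

    def exits(y):                            # does y leave B_K(y) inside cluster \ C?
        seen, queue = {y}, deque([y])
        while queue:
            z = queue.popleft()
            for w in nbrs(z):
                if w in cluster and w not in seen and not covered(w):
                    if dist(w, y) > K:
                        return True
                    seen.add(w)
                    queue.append(w)
        return False

    trees = []
    while True:
        rest = sorted(u for u in G if u not in V)    # Step 1 / Step 5: next root
        if not rest:
            return trees
        tree, edges = [rest[0]], []
        V.append(rest[0])
        while True:
            # Step 2: a vertex of G close to an already chosen vertex
            near = [u for u in G if u not in V
                    and any(dist(u, x) <= r_hat2 for x in V)]
            if near:
                new = min(near)                      # lexicographically minimal
            else:
                # Step 4: a cluster vertex on the exterior boundary of C with an exit path
                cand = [y for y in cluster if not covered(y)
                        and any(covered(w) for w in nbrs(y)) and exits(y)]
                if not cand:
                    break                            # go to Step 5
                new = min(cand)
            # Step 3, with the edge rule of Definition 4.4 (smallest index j)
            parent = next(x for x in tree if dist(new, x) <= r_hat2)
            edges.append((parent, new))
            tree.append(new)
            V.append(new)
        trees.append((tree, edges))
```

For a cluster consisting of a straight horizontal segment from \((0,0)\) to \((40,0)\) plus an isolated point of \(\mathbb{G }\) far away, with \(K=5\), this sketch produces two trees: a chain of skeleton vertices spaced roughly \(K+c\log K\) apart along the segment, and a trivial tree reduced to its root.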

Fig. 5. Construction of the forest skeleton \(\mathcal{F }_{K}=\{\mathcal{T }_K^1,\mathcal{T }_K^2\}\) of the cluster \(\mathsf{C}_\mathbb{G }\) (in black), consisting of the trees \(\mathcal{T }_K^i=\{\mathfrak{t }^i,\mathfrak B ^i\}\), \(i=1,2\). The Steiner forest corresponding to the partition \(\underline{\mathbb{G }}=(\{u_1,u_2,u_5\},\{u_3,u_4\})\) is drawn in dashed green (color figure online).

Definition 4.4

The above procedure produces \(r\) disjoint sets of vertices \(V^1 = \left\{ x_0^1, x_1^1, \dots \right\} \), \(V^2 = \left\{ x_0^2, x_1^2, \dots \right\} \), \(\dots \), \(V^r = \left\{ x_0^r, x_1^r, \dots \right\} \). The vertices \(x_i^j\) constructed in Step 4 are equipped with sets \(A_i^j\), \(j=1,\ldots ,r\). Exit paths through such sets \(A_i^j\) contribute a multiplicative factor \(\mathrm{e}^{-K}\) each. The sets \(A_i^j\) for vertices \(x_i^j\) constructed in Step 2 play no role and are introduced for notational convenience only (see (20) below). By construction, there are at most \(M\) such vertices.

The edges within each group \(\ell = 1, \dots , r\) are constructed as follows: \(x_i^\ell \) is connected to the vertex of

$$\begin{aligned} \bigl \{ x_j^\ell \,:\, j<i \text{ and } x_i^\ell \in \hat{\mathbf{B }}_{2K}(x_j^\ell )\bigr \} \end{aligned}$$

which has smallest index \(j\).

This produces a graph which is a set of \(r\) trees \(\mathcal{T }_{K}^1,\ldots ,\mathcal{T }_K^r\). The union of the trees is called the forest skeleton \(\mathcal{F }_K=\cup _\ell \mathcal{T }_K^\ell \).

Note that we consider these graphs as compact subsets of \(\mathbb{R }^2\). An example of a forest skeleton is drawn in Fig. 5. The following result follows immediately from the construction of the forest skeleton.

Proposition 4.5

Let \(\mathcal{F }_K\) be the forest skeleton of \(\mathsf{C}_\mathbb{G }\), then

  1. \(\mathbb{G }\) is included in the vertices of \(\mathcal{F }_K\).

  2. Two vertices \(u,v\in \mathbb{G }\) which are connected in \(\mathsf{C}_\mathbb{G }\) are also connected in \(\mathcal{F }_K\).

  3. \(\mathsf{C}_{\mathbb{G }}\subseteq \cup _{\ell ,i} \hat{\mathbf{B }}_{2K} (x_i^\ell )\).

Distance between \(\mathsf{C}_\mathbb{G }\) and Steiner forests

Proposition 4.6

For every \(\delta _3>0\), there exists \(\kappa _3=\kappa _3(M)>0\) such that for \(n\) large enough,

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}\left( \min _{\mathcal{F }\in \Omega ^\mathrm{min}_{M,m }}\mathrm{d}_\tau \bigl (\mathsf{C}_\mathbb{G }, \mathcal{F }\bigr ) > \delta _3 n \Bigm | \Omega _{{\underline{\mathbb{G }}}} \right) \le \mathrm{e}^{- \kappa _3 n }, \end{aligned}$$

uniformly in \((\mathcal{D },\underline{\mathbb{G }})\) with \(|\mathbb{G }|\le M\).

Proof

Let \(\mathcal{F }_K\) be the forest skeleton of \(\mathsf{C}_\mathbb{G }\) at scale \(K\) (\(K\) will be chosen later). By the third item of Proposition 4.5,

$$\begin{aligned} \mathrm d _\tau \left(\mathsf{C}_\mathbb{G }, \mathcal{F }_K\right)\le 2K +c\log 2K. \end{aligned}$$

The proposition thus reduces to the following claim: for any \(\delta _3>0\), there exist \(K=K(M)>0\) and \(\kappa _3=\kappa _3(M)>0\) such that

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}\left( \min _{\mathcal{F }\in \Omega ^\mathrm{min}_{M,m }} \mathrm{d}_\tau \bigl (\mathcal{F }_{K}, \mathcal{F }\bigr ) > \delta _3 n \Bigm | \Omega _{{\underline{\mathbb{G }}}} \right) \le \mathrm{e}^{- \kappa _3 n }, \end{aligned}$$

uniformly in \((\mathcal{D },\underline{\mathbb{G }})\) with \(|\mathbb{G }|\le M\). We now prove this statement.

Writing \(E:=\{ \min _{\mathcal{F }\in \Omega ^\mathrm{min}_{M,m }} \mathrm{d}_\tau \bigl (\mathcal{F }_{K}, \mathcal{F }\bigr ) > \delta _3 n\}\), we have

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}(E\vert \Omega _{{\underline{\mathbb{G }}}} ) = \frac{\mu _{\mathcal{D }}^\mathsf{f}(E \cap \Omega _{{\underline{\mathbb{G }}}} )}{\mu _{\mathcal{D }}^\mathsf{f}( \Omega _{{\underline{\mathbb{G }}}} )} \le \frac{\mu _{\mathcal{D }}^\mathsf{f}( \tau (\mathcal{F }_{K }) \ge \tau _{{\mathbb{G }}} +\kappa _2 n)}{\mu _{\mathcal{D }}^\mathsf{f}( \Omega _{{\underline{\mathbb{G }}}} )} \end{aligned}$$
(18)

where in the last inequality we used Property P4 of Proposition 4.3, applied with \(\delta _2 =\delta _3\).

Let \(\mathcal{F }\) be a Steiner forest in \(\Omega _{\underline{\mathbb{G }}}^\mathrm{min}\) and \(\mathcal{F }^{\prime }\) be the forest obtained by replacing each inner node of \(\mathcal{F }\) by the closest vertex of \(\mathbb Z ^2\). Now, by the FKG inequality, we can lower bound the denominator

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}(\Omega _{{\underline{\mathbb{G }}}} )&\ge \mu _{\mathcal{D }}^\mathsf{f}\left(\bigcap _{\{x,y\}\in \mathcal{E }(\mathcal{F }^{\prime })} \{x \leftrightarrow y\}\right) \ge \prod _{\{x,y\}\in \mathcal{E }(\mathcal{F }^{\prime })} \mu _{\mathcal{D }}^\mathsf{f}( x \leftrightarrow y )\nonumber \\&\ge \prod _{\{x,y\}\in \mathcal{E }(\mathcal{F }^{\prime })} \mathrm{e}^{-\tau (y-x)(1+o_{\left| y-x\right|}(1))} = \mathrm{e}^{-\tau _{\mathbb{G }} (1+o_n(1))}. \end{aligned}$$
(19)

where \(\lim _{k\rightarrow \infty } \text{ o}_{k}(1)=0\) by definition, and the product is taken over the set \(\mathcal{E }(\mathcal{F }^{\prime })\) of all the inner edges of the approximate Steiner forest \(\mathcal{F }^{\prime }\).

To obtain an upper bound on the numerator, we follow [10, Section 2]. Let \(\vert V(\mathcal{F }_{{K}})\vert =\sum _{\ell =1}^r \vert V^\ell \vert \) be the total number of vertices of the forest skeleton \(\mathcal{F }_K\). Then, for any fixed forest \(\mathcal{F }\),

$$\begin{aligned} {\mathrm{e}^{-2KM}}\mu _{\mathcal{D }}^\mathsf{f}(\mathcal{F }_{K}=\mathcal{F })&\le \mu _\mathcal{D }^\mathsf{f}\left(\bigcap _{\ell =1}^r\bigcap _{i=0}^{\vert V^\ell \vert } x_i^\ell \stackrel{{A_i^\ell }}{\leftrightarrow }\partial ^{{\mathrm{ext}}}\mathbf{B }_K(x_i^\ell )\right) \nonumber \\&\le \prod _{\ell =1}^r\prod _{i=1}^{\vert V^\ell \vert } \mu _{\hat{A}_K^i}^\mathsf{w}\bigl ( x_i^\ell \stackrel{{A_i^\ell }}{\leftrightarrow }\partial ^{{\mathrm{ext}}} \mathbf{B }_K(x_i^\ell ) \bigr )\nonumber \\&\le \bigl (\mathrm{e}^{-K(1-\text{ o}_{K}(1))}\bigr )^{\sum _{\ell =1}^r \vert V^\ell \vert } = \mathrm{e}^{-K\vert V(\mathcal{F })\vert (1-\text{ o}_{K}(1))}\nonumber \\&\le \mathrm{e}^{-\tau (\mathcal{F })(1-\text{ o}_{K}(1)-\text{ o}_{n}(1))}, \end{aligned}$$
(20)

where in the first inequality the term \(\mathrm{e}^{-2MK}\) compensates (by the FKG inequality) for the inclusion of the events \(x_i^\ell \stackrel{{A_i^\ell }}{\leftrightarrow }\partial ^{{\mathrm{ext}}}\mathbf{B }_K(x_i^\ell )\) for points \(x_i^\ell \in \mathbb{G }\); in the second inequality we expand the probability of the intersection as a product of conditional expectations and then use the FKG inequality to compare these conditional expectations with the probability under wired boundary conditions; and in the third inequality we use that \(\mu _{\mathbf{B }_K(x)}^\mathsf{w}(x\leftrightarrow \partial ^{{\mathrm{ext}}} \mathbf{B }_K(x))=\mathrm{e}^{-K(1-\text{ o}_{K}(1))}\) (this follows from [10, Corollary 1.1], which is now known to be valid up to \(p_c(q)\) thanks to Proposition 2.1). If we now crudely bound from above the number of forest \(K\)-skeletons rooted at \(\mathbb{G }\) with \(\tau (\mathcal{F })=T\) (and so with fewer than \(C_1 T/K\) vertices) by \((C_2 K)^{C_3 T/K}\), we get

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}( \tau (\mathcal{F }_{K }) \ge \tau _{{\mathbb{G }}} +\kappa _2 n)&= \sum _{\mathcal{F }: \tau (\mathcal{F })\ge \tau _\mathbb{G }+\kappa _2n} \mu _\mathcal{D }^\mathsf{f}(\mathcal{F }_K=\mathcal{F }) \nonumber \\&= \sum _{T\ge \tau _\mathbb{G }+\kappa _2n}\ \sum _{\mathcal{F }\,:\, \tau (\mathcal{F })=T}\mu _\mathcal{D }^\mathsf{f}(\mathcal{F }_K=\mathcal{F })\nonumber \\&\le \mathrm{e}^{2KM}\sum _{T\ge \tau _\mathbb{G }+\kappa _2n}\mathrm{e}^{(C_3 T/K)\log (C_2 K)-T(1-\text{ o}_{K}(1)-\text{ o}_{n}(1))}\nonumber \\&\le C_4 \cdot \mathrm{e}^{-(\tau _\mathbb{G }+\kappa _2 n)(1-\text{ o}_{K}(1)-\text{ o}_{n}(1))}, \end{aligned}$$
(21)

where we used (20) and the bound on the number of skeletons in the third line; the constant \(C_4\) depends on \(K\) and \(M\). The result follows by comparison with (19):

$$\begin{aligned} \mu _\mathcal{D }^\mathsf{f}(E\vert \Omega _{\underline{\mathbb{G }}}) \le C_4\, \mathrm{e}^{-(\tau _\mathbb{G }+\kappa _2 n)(1-\text{ o}_{K}(1)-\text{ o}_{n}(1))+\tau _\mathbb{G }(1+\text{ o}_{n}(1))} \le \mathrm{e}^{-n\kappa _3}. \end{aligned}$$

Note that \(\tau _\mathbb{G }o_n(1) = o(n)\) since \(\tau _{\mathbb{G }}=O(n)\). The latter follows from the fact that \(\tau _\mathbb{G }\) is bounded by the \(\tau \)-length of the forest obtained by opening all the edges of \(\partial E_m\) (recall that \(\tau \) is an equivalent norm on \(\mathbb R ^2\)). \(\square \)
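The last step can be made explicit as follows; this is only a sketch under assumed constants. Let \(\theta \) denote a common upper bound on \(|\text{o}_K(1)|+|\text{o}_n(1)|\), and let \(C_5\) be a constant such that \(\tau _\mathbb{G }\le C_5 n\); both \(\theta \) and \(C_5\) are notation introduced for this illustration only. Then

$$\begin{aligned} -(\tau _\mathbb{G }+\kappa _2 n)(1-\theta )+\tau _\mathbb{G }(1+\theta )&= -\kappa _2 n(1-\theta ) + 2\theta \tau _\mathbb{G }\nonumber \\&\le -\tfrac{1}{2}\kappa _2 n + 2\theta C_5 n \le -\tfrac{1}{4}\kappa _2 n, \end{aligned}$$

as soon as \(\theta \le \min \{\tfrac{1}{2}, \kappa _2/(8C_5)\}\), which holds once \(K\) and then \(n\) are chosen large enough. The entropy term is harmless for the same reason: \((C_3 T/K)\log (C_2 K) = T\cdot \text{o}_K(1)\), since \(\log (C_2 K)/K\rightarrow 0\). Under these assumptions one may take, for instance, \(\kappa _3=\kappa _2/8\), the remaining margin absorbing any \(n\)-independent prefactor for \(n\) large.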

Proof of Proposition 4.2

Fix \(\mathcal{D }=\mathcal{D }_{m,n}\) and \(\mathbb{G }=\mathbb{G }_{m,n}\) with \(m\ge \frac{n}{3}\) and \(|\mathbb{G }|\le M\). Let \(\nu >0\). Fix an arbitrary \(0<\delta \ll 1\) such that \(\Lambda _{250\delta n}\subset \delta _1 n\mathbf U _\tau \), where \(\delta _1\) is given by P3. By definition of \(\delta \), we know that for any forest \(\mathcal{F }\in \Omega ^\mathrm{min}_{{M}, m}\), \(\mathcal{F }\cap \Lambda _{250\delta n}\) is connected and contains at most one node. Therefore, we have three cases: either \(\mathcal{F }\cap \Lambda _{2\delta n}=\varnothing \), or \(\mathcal{F }\cap \Lambda _{2\delta n}\ne \varnothing \) but \(\mathcal{F }\cap \Lambda _{20\delta n}\) contains only one edge, or \(\mathcal{F }\cap \Lambda _{20\delta n}\) contains more than one edge. In the latter case, the fact that edges incident to a node make an angle larger than or equal to \(\frac{\pi }{2}+\eta \) implies that \(\mathcal{F }\cap \Lambda _{40\delta n}\) contains a node.

Also set \(\delta _3<\min \{\nu ,\delta \}\). Proposition 4.6 implies that

$$\begin{aligned} \min _{\mathcal{F }\in \Omega ^\mathrm{min}_{M,m }}\mathrm d _\tau \bigl ( \mathsf{C}_\mathbb{G }, \mathcal{F }\bigr ) \le \delta n, \end{aligned}$$
(22)

with probability larger than \(1-\mathrm{e}^{-\kappa _3 n}\) for \(n\) large enough. We now assume that this inequality is indeed satisfied. Since, by P1 of Proposition 4.3, the set \(\Omega ^\mathrm{min}_{M,m }\) is compact, and since we are after an upper bound which vanishes with \(n\), it will be enough to fix a Steiner forest \(\mathcal{F }\in \Omega ^\mathrm{min}_{M,m }\) and to assume that

$$\begin{aligned} \mathrm d _\tau \bigl ( \mathsf{C}_\mathbb{G }, \mathcal{F }\bigr ) \le \delta n. \end{aligned}$$
(23)

Let us treat the three previous cases separately.

  1. C1.

    \(\mathcal{F }\cap \Lambda _{2\delta n} = \varnothing \). In this case, (23) shows that \(\mathsf{C}_\mathbb{G }\cap \Lambda _{\delta n}=\varnothing \). Thus, \(E^1_{\nu ,\delta n}\) holds true and \(\mathbb{G }_{\delta n,n}=\varnothing \).

  2. C2.

    \(\mathcal{F }\cap \Lambda _{20\delta n}=[u_1,u_2]\) with \(u_1\) and \(u_2\) on \(\partial \Lambda _{20\delta n}\) and \([u_1 ,u_2]\cap \Lambda _{2\delta n}\ne \varnothing \). In this case, (23) shows that \(\mathsf{C}_\mathbb{G }\) intersects \(\Lambda _{3\delta n}\), which in turn implies that \(E^2_{\nu ,k}\) holds for every \(k\in [6\delta n,18\delta n]\). Proposition 3.1 implies the existence of \(k\in [6\delta n,18\delta n]\) with \(|\mathbb{G }_{k,n}|\le M\) on an event of probability larger than \(1-\mathrm{e}^{-18\delta n}\).

  3. C3.

    There exists a node \(x\in \Lambda _{40\delta n}\) and therefore \(\mathcal{F }\cap \Lambda _{250\delta n}=[u_1,x]\cup [u_2,x]\cup [u_3,x]\) with \(u_1\), \(u_2\), \(u_3\) on \(\partial \Lambda _{250\delta n}\) such that \(\measuredangle (u_i - x,u_j - x) >\frac{\pi }{2} +\eta \) for every \(i\ne j\). In this case, (23) shows that \(E^3_{\nu ,k}\) holds for every \(k\in [ 82\delta n,246\delta n]\). Proposition 3.1 implies the existence of \(k\in [82\delta n,246\delta n]\) with \(|\mathbb{G }_{k,n}|\le M\) on an event of probability larger than \(1-\mathrm{e}^{-246\delta n}\).

Altogether, we obtain the claim. \(\square \)

For later use, let us introduce the following definition:

Definition 4.7

For \(u_1, u_2, u_3\) in general position the function \(\phi (y ) := \sum _{i=1}^3 \tau (u_i -y)\) is strictly convex and quadratic around its minimum point; see [9, Lemma 3]. Let \(x\) be the unique minimizer of \(\phi \). In this way the notation \(\mathcal{T }(u_1, u_2 , u_3 ; x)\) is reserved for the minimal Steiner forest (in this case it is a tree) which contains \(u_1, u_2, u_3\). It might happen, of course, that \(x\) coincides with one of the \(u_i\)-s. When, however, this is not the case, we shall refer to \(\mathcal{T }(u_1, u_2 , u_3 ; x)\) as a Steiner tripod.
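To make the definition concrete, the minimizer \(x\) can be computed numerically in a special case. The following Python sketch assumes that \(\tau \) is the Euclidean norm, which is an assumption made for illustration only (the surface tension \(\tau \) of the text is a general strictly convex norm, and the \(120^\circ \) angles below are specific to the Euclidean case). It uses the classical Weiszfeld iteration; the function names `cos_angle` and `steiner_point` are hypothetical.

```python
import math

def cos_angle(a, b, c):
    """Cosine of the angle at a between the directions b - a and c - a."""
    v, w = (b[0] - a[0], b[1] - a[1]), (c[0] - a[0], c[1] - a[1])
    return (v[0] * w[0] + v[1] * w[1]) / (math.hypot(*v) * math.hypot(*w))

def steiner_point(u1, u2, u3, iters=2000):
    """Minimizer of phi(y) = |u1 - y| + |u2 - y| + |u3 - y| (Euclidean norm).

    Assumes u1, u2, u3 are distinct and not collinear (general position).
    """
    pts = [u1, u2, u3]
    # Degenerate case: if the triangle has an angle of at least 120 degrees
    # at a terminal, the minimizer coincides with that terminal.
    for i in range(3):
        if cos_angle(pts[i], pts[(i + 1) % 3], pts[(i + 2) % 3]) <= -0.5:
            return pts[i]
    # Otherwise run the Weiszfeld iteration, starting from the centroid.
    y = (sum(p[0] for p in pts) / 3.0, sum(p[1] for p in pts) / 3.0)
    for _ in range(iters):
        wts = [1.0 / math.hypot(p[0] - y[0], p[1] - y[1]) for p in pts]
        total = sum(wts)
        y = (sum(w * p[0] for w, p in zip(wts, pts)) / total,
             sum(w * p[1] for w, p in zip(wts, pts)) / total)
    return y
```

In the non-degenerate case the three legs of the resulting tripod meet at the returned point with pairwise angles of \(120^\circ \); in the degenerate case the function returns one of the \(u_i\), matching the remark that \(x\) may coincide with one of the terminals.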

Fluctuation theory and proof of Theorem 2.2

We are now in a position to prove Theorem 2.2. Let \(\nu >0\) be small enough, to be fixed later. By (14) and (15), we can assume that there exist \(\delta =\delta (\nu )>0\) and \(k\ge \delta n\) such that \(|\mathbb{G }_{k,n}|\le M\) and \(E^\ell _{\nu ,k}\) holds true for some \(\ell \in \{1,2,3\}\). Let

$$\begin{aligned} \mathcal{R }_n = \max \left\{ k\ge \delta n : |\mathbb{G }_{k,n}|\le M \text{ and } E^1_{\nu ,k}\cup E^2_{\nu ,k}\cup E^3_{\nu ,k} \text{ occurs} \right\} \in \left[\delta n , n\right]\!. \end{aligned}$$

Let \(\mathcal{C }\) be a possible realization of \(\mathsf{C}_{k,n}\) and \(\mathcal{D }=\mathcal{D }_{k,n}\) be the corresponding flower domain. We also set \(\mathbb{G }=\mathbb{G }_{k,n}\). The restriction of \(\mu _{\Lambda _n}^{\mathsf{f}}\left(\cdot ~|~\mathcal{R }_n =k ;\mathsf{C}_{k, n}=\mathcal{C }\right)\) to \(\mathcal{D }\) is \(\mu _\mathcal{D }^{\text{ flower}}\). Exactly as in Sect. 3.3,

$$\begin{aligned} \text{ Cond}_n[\sigma ]\cap \{ \mathcal{R }_n= k\} \cap \{ \mathsf{C}_{k,n} = \mathcal{C }\} = \Omega _{\sigma ,\mathcal{C }}\times \{ \mathcal{R }_n= k ; \mathsf{C}_{k,n}=\mathcal{C }\}, \end{aligned}$$

where \(\Omega _{\sigma ,\mathcal{C }}=\cup _{\underline{\mathbb{G }}\in \mathcal{P }^{\prime }_\mathbb{G }}\Omega _{\underline{\mathbb{G }}}\) is defined as in Sect. 3.3. This reduction shows that it is sufficient to prove that

$$\begin{aligned} \mu ^\mathsf{f}_\mathcal{D }\bigl ( \mathsf{C}_\mathbb{G }\cap \Lambda _{n^\varepsilon }\ne \varnothing \bigm | \Omega _{\sigma ,\mathcal{C }}\bigr ) = O(n^{14\varepsilon -1/2}), \end{aligned}$$

uniformly in the possible realizations of \(\mathcal{D }\), \(\mathcal{C }\) and \(\mathbb{G }\).

From now on, we fix \(k\ge \delta n\) such that \(|\mathbb{G }_{k,n}|\le M\) and \(E^\ell _{\nu ,k}\) holds true for some \(\ell \in \{1,2,3\}\). We set \(\mathcal{D }=\mathcal{D }_{k,n}, {\mathcal{C }= \mathcal{C }_{k, n}}\) and \(\mathbb{G }=\mathbb{G }_{k,n}\).

Since each set \(\mathbb{V }^i\) is already assumed to be connected outside of \(\mathcal{D }\) (since \(E^1_{\nu ,k}\cup E^2_{\nu ,k}\cup E^3_{\nu ,k}\) occurs), partitions \(\underline{\mathbb{G }}\in \mathcal{P }^{\prime }_{\mathbb{G }}\) can be of four different types (recall that they are maximal in the sense defined in the previous section): (i) singletons only; (ii) singletons together with one pair of elements in two different \(\mathbb{V }^i\) (this cannot occur in \(E^1_{\nu ,k}\)); (iii) singletons together with one triplet of elements in three different \(\mathbb{V }^i\) (this can occur only in \(E^3_{\nu ,k}\)); (iv) singletons together with two pairs \((u,v)\) and \((u^{\prime },w)\), where \(u\) and \(u^{\prime }\) belong to the same \(\mathbb{V }^i\), and \(v\) and \(w\) belong to the other two sets \(\mathbb{V }^j\) (this can occur only in \(E^3_{\nu ,k}\)). Let \(\mathcal{P }^*_{\mathbb{G }}\) be the set of partitions in \(\mathcal{P }^{\prime }_{\mathbb{G }}\) of one of the first three types. If the configuration is in \(\Omega _{\sigma ,\mathcal{C }}\setminus \cup _{\underline{\mathbb{G }}\in \mathcal{P }^*_{\mathbb{G }}} \Omega _{\underline{\mathbb{G }}}\), there are two different clusters connecting two pairs of vertices \((u,v)\) and \((u^{\prime },w)\) satisfying the conditions described above. By choosing \(\nu >0\) small enough, the assumption that \(E^3_{\nu ,k}\) holds implies that \(\tau (u-v)+\tau (u^{\prime }-w)\ge (1+\varepsilon )\tau _{\mathbb{G }}\) (where \(\varepsilon =\varepsilon (\delta _3,\nu )>0\)) uniformly in the possible pairs \((u,v)\) and \((u^{\prime },w)\). As in the proof of Proposition 4.6, one obtains after a short computation that

$$\begin{aligned} \mu ^\mathsf{f}_{\mathcal{D }}\bigl ( \Omega _{\sigma ,\mathcal{C }}\setminus \cup _{\underline{\mathbb{G }}\in \mathcal{P }^*_{\mathbb{G }}}\Omega _{\underline{\mathbb{G }}}\bigm | \Omega _{\sigma ,\mathcal{C }}\bigr ) = O(\mathrm{e}^{-ck}), \end{aligned}$$

for some constant \(c>0\). Hence, a reduction in the spirit of Proposition 3.4 shows that Theorem 2.2 would follow from the bound

$$\begin{aligned} \mu ^\mathsf{f}_{\mathcal{D }}\bigl ( \mathsf{C}_{\mathbb{G }}\cap \Lambda _{n^\varepsilon }\ne \varnothing \bigm | \Omega _{\underline{\mathbb{G }}}\bigr ) = O(n^{14\varepsilon -1/2}), \end{aligned}$$
(24)

where the right-hand side is uniform in the possible realizations of \(\mathcal{D }\) and in the \(\underline{\mathbb{G }}\in \mathcal{P }^*_{\mathbb{G }}\). We decompose the proof of (24) into three cases, depending on the type of \(\underline{\mathbb{G }}\).

Scenario S1: No imposed crossing. This occurs in the following two cases (cf. Definition 4.1): (i) \(E^1_{\nu ,k}\) occurs; (ii) \(E^2_{\nu ,k}\cup E^3_{\nu ,k}\) occurs and the partition \(\underline{\mathbb{G }}\) is composed of singletons only. In this case, the measure \(\mu ^\mathsf{f}_{\mathcal{D }_{k,n}}(\,\cdot \,|\Omega _{\underline{\mathbb{G }}})\) is unconditioned (i.e. \(\Omega _{\underline{\mathbb{G }}}=\Omega \)). Proposition 2.1 then implies that \(\mu ^\mathsf{f}_{\mathcal{D }}\bigl ( \mathsf{C}_{\mathbb{G }}\cap \Lambda _{n^\varepsilon }\ne \varnothing \bigm | \Omega _{\underline{\mathbb{G }}}\bigr )\) decays exponentially with \(n\).

Scenario S2: One imposed crossing. This occurs when \(E^2_{\nu ,k}\cup E^3_{\nu ,k}\) occurs and \(\underline{\mathbb{G }}\) is composed of singletons together with a unique pair \((u,v)\), where \(u\in \mathbb{V }^i, v\in \mathbb{V }^{j}\) with \(i\ne j\). In other words, \(\Omega _{\underline{\mathbb{G }}} =\left\{ u\leftrightarrow v\right\} \). In this case, the cluster \(\mathsf{C}_{\mathbb{G }}\subseteq \mathcal{D }\) may contain several connected components, but, up to exponentially small (in \(k\)) \(\mu ^\mathsf{f}_\mathcal{D }(\cdot |u\leftrightarrow v)\)-conditional probabilities, only one of them, namely the connected cluster \(\mathsf{C}(u, v)\) of \(\left\{ u,v\right\} \), is capable of reaching \(\Lambda _{n^\varepsilon }\). Moreover, the law of the cluster connecting \(u\) and \(v\) converges to the law of a Brownian bridge. In fact, one obtains the following stronger result:

$$\begin{aligned} \mu ^\mathsf{f}_\mathcal{D }(x\in \mathsf{C}(u, v )|u\leftrightarrow v) \le \frac{C}{\sqrt{|u-v|}}\exp \Bigl (-\kappa \frac{d_\tau (x,[u,v])^2}{|u-v|}\Bigr ), \end{aligned}$$
(25)

where \(\kappa \) and \(C\) are constants depending on \(p\) only, and \([u,v]\) denotes the segment between \(u\) and \(v\). In the case of Ising interfaces, such a bound was obtained in [22, (3.31)]. The proof relies on the positive curvature of the surface tension and on the representation of the interface as an effective random walk with exponentially decaying step distribution. The theory developed in [10] enables a literal adaptation to the case of sub-critical FK-clusters; see Theorems C and E and Subsections 4.4 and 4.5 in [10]. Consequently,

$$\begin{aligned} \mu _\mathcal{D }^{\mathsf{f}}\bigl ( \mathsf{C}_\mathbb{G }\cap \Lambda _{n^\varepsilon }\ne \varnothing \bigm | \Omega _{\underline{\mathbb{G }}}\bigr ) = O(n^{2\varepsilon -1/2}). \end{aligned}$$
(26)
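For the reader's convenience, here is a sketch of how (26) follows from (25) by a union bound. It relies on two assumptions which we state explicitly: that \(\Lambda _r\) is the box \([-r,r]^2\cap \mathbb Z ^2\), so that \(|\Lambda _r|\le (2r+1)^2\), and that \(|u-v|\ge c_0 k\) for some constant \(c_0>0\) (with \(k\ge \delta n\)), reflecting the fact that \(u\) and \(v\) belong to different sets \(\mathbb{V }^i\). The constants \(c_0\) and \(c\) are introduced for this illustration only. Dropping the Gaussian factor in (25), which is at most \(1\),

$$\begin{aligned} \mu _\mathcal{D }^{\mathsf{f}}\bigl ( \mathsf{C}_\mathbb{G }\cap \Lambda _{n^\varepsilon }\ne \varnothing \bigm | u\leftrightarrow v\bigr )&\le \sum _{x\in \Lambda _{n^\varepsilon }} \mu ^\mathsf{f}_\mathcal{D }\bigl (x\in \mathsf{C}(u, v )\bigm |u\leftrightarrow v\bigr ) + O(\mathrm{e}^{-ck})\nonumber \\&\le \frac{C\,(2n^\varepsilon +1)^2}{\sqrt{c_0\delta n}} + O(\mathrm{e}^{-c\delta n}) = O(n^{2\varepsilon -1/2}), \end{aligned}$$

where the exponentially small term accounts for the components of \(\mathsf{C}_\mathbb{G }\) other than \(\mathsf{C}(u,v)\).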

Scenario S3: One tripod. This can only happen when \(E^3_{\nu ,k}\) occurs and \(\underline{\mathbb{G }}\) is composed of singletons together with one triplet \((u_1,u_2,u_3)\) with \(u_1\in \mathbb{V }^1,u_2\in \mathbb{V }^2,u_3\in \mathbb{V }^3\). Thus, in this case \(\Omega _{\underline{\mathbb{G }}} = \left\{ \mathsf{C}(u_1, u_2 , u_3 )\ne \varnothing \right\} \), where \(\mathsf{C}(u_1, u_2 , u_3 )\) is the joint connected cluster of \(\left\{ u_1, u_2 , u_3\right\} \). Again, \(\mathsf{C}= \mathsf{C}_{\mathbb{G }}\subseteq \mathcal{D }\) may contain several connected components, but, up to exponentially small (in \(k\)) \(\mu ^\mathsf{f}_\mathcal{D }(\cdot |\mathsf{C}(u_1 , u_2 , u_3 )\ne \varnothing )\)-conditional probabilities, only one of them, namely \(\mathsf{C}(u_1 , u_2 , u_3 )\) itself, is capable of reaching \(\Lambda _{n^\varepsilon }\). By definition, there exists a unique \(x{= x (u_1, u_2 , u_3 )} \in \Lambda _{k/2}\) (see Definition 4.7) such that \(\mathcal{T }_x = \mathcal{T }(u_1 ,u_2 ,u_3 ; x)\) is a Steiner tripod. To lighten the notation, we set

$$\begin{aligned} E(u_1,u_2,u_3,x)=\{u_1,u_2,u_3 \text{ are connected and } \mathrm d _\tau (\mathsf{C}_\mathbb{G }, \mathcal{T }_x)\le \nu k\} \end{aligned}$$

and redefine \(\mathsf{C}= \mathsf{C}(u_1 , u_2 , u_3 ) \). Thanks to Propositions 4.2 and 4.6, we now aim at proving the bound

$$\begin{aligned} \mu ^\mathsf{f}_{\mathcal{D }}\bigl ( \mathsf{C }\cap \Lambda _{n^{\varepsilon }} \ne \varnothing \bigm | E(u_1,u_2,u_3,x)\bigr ) = O(n^{14\varepsilon -1/2}). \end{aligned}$$
(27)

This bound will imply Theorem 2.2.

Proving (27) is more complicated than proving (26). Nevertheless, the idea remains the same: the tripod has Gaussian fluctuations, therefore it intersects a small box with probability going to 0. In the case of percolation, fluctuations of tripods on the level of local limit results were studied in [9]. We are not after a full local limit picture here, and merely explain how techniques from [10] allow one to derive (27). Let us write \(\Lambda _r(x)\) for \(x+\Lambda _r\).

Definition 5.1

(Cones \(\mathcal{Y }_1,\mathcal{Y }_2,\mathcal{Y }_3\)) Since, by Property P2 of the Steiner forests, for every \(i\ne j\),

$$\begin{aligned} \measuredangle (u_i-x,u_j-x) \ge \frac{\pi }{2} + \eta , \end{aligned}$$

there exist disjoint cones \(\mathcal{Y }_1, \mathcal{Y }_2\) and \(\mathcal{Y }_3\) such that each \(\mathcal{Y }_i\) contains exactly one lattice direction in its interior (i.e., one of the four vectors \((1,0)\), \((0,1)\), \((-1,0)\) and \((0,-1)\), denoted by \(\mathsf{f}_i\)), and there exists \(\varepsilon _1>0\) such that \(u_i\in \mathrm{int}\left(y+ \mathcal{Y }_i\right)\) for every \(y\in \Lambda _{\varepsilon _1 k }(x)\) and every \(i\in \{1,2,3\}\), and \(u_i \in \mathrm{int}\left(u_j - \mathcal{Y }_i\right)\) for every \(i\ne j\).

Definition 5.2

(Event \(\mathcal{S }(t,y)\)) Given \(y\in \Lambda _{\varepsilon _1 k} (x) \) and \(t\in \mathbb{N }\), let \(\mathcal{S }(t ,y)\) be the event that the following three conditions hold (Fig. 6):

Fig. 6. Description of the event \(\mathcal{S }(t,y)\), namely the cones and the decomposition of the cluster \(\mathsf{C}\) into \(\mathsf{C}_i(t)\) and \(v_i(t)\), \(i=1,2,3\).

  1. R1.

    \(u_1\), \(u_2\) and \(u_3\) are pairwise disconnected in \(\mathsf{C}{\setminus }\Lambda _t(y)\),

  2. R2.

    \(\mathsf{C}\) intersects \(\partial \Lambda _t (y )\) in exactly three vertices. For \(i=1,2,3\), let \(\mathsf{C}_i(t ,y)\) be the connected component of \(\mathsf{C}\setminus \Lambda _t (y )\) containing \(u_i\), and \(v_i(t ,y)=\mathsf{C}_i(t ,y)\cap \partial \Lambda _t (y )\). Define \(\mathsf{C}_0(t ,y)=\mathsf{C}\setminus \big (\mathsf{C}_1(t ,y) \cup \mathsf{C}_2(t ,y) \cup \mathsf{C}_3(t ,y)\big )\). We will drop the reference to \(t\) and \(y\) when no confusion is possible.

  3. R3.

    \(\mathsf{C}_0\) is contained in \(\bigcap _{i=1}^3\left(v_i - \mathcal{Y }_i\right)\) and \(\mathsf{C}_i\subset \left(v_i +\mathcal{Y }_i\right)\) for \(i\in \{1,2,3\}\).

Lemma 5.3

Fix \(\varepsilon _1 >0\) and let \(\varepsilon >0\) be sufficiently small. There exists \(C>0\) such that

$$\begin{aligned} \mu _\mathcal{D }^\mathsf{f}\left(\bigcup _{t\le Ck^\varepsilon } \bigcup _{ y\in \Lambda _{\varepsilon _1 k}(x)} \mathcal{S }(t,y) \Bigm | E(u_1,u_2,u_3,x) \right) \ge 1-O(\mathrm{e}^{-k^{\varepsilon }}) . \end{aligned}$$
(28)

Proof of Lemma 5.3

First of all, we notice that coarse-graining on the \(k^\varepsilon \)-scale enables a reduction to particularly simple geometric structures. Consider a forest skeleton of the cluster \(\mathsf{C}\) at scale \(k^\varepsilon \). Note that, conditionally on \(E(u_1,u_2,u_3,x)\), this forest is in fact a tree \(\mathcal{T }_{k^\varepsilon }\).

We define the trunk \(\mathfrak{t }_\varepsilon \) of \(\mathcal{T }_{k^\varepsilon }\) as the minimal subtree of \(\mathcal{T }_{k^\varepsilon }\) which spans \(\left\{ u_1, u_2, u_3\right\} \).

We define the branches of \(\mathcal{T }_{k^\varepsilon }\) as \(\mathfrak B _\varepsilon =\mathcal{T }_{k^\varepsilon }\backslash \mathfrak{t }_\varepsilon \). In this case, we obtain the following reduced geometry of typical \(\mathcal{T }_{k^\varepsilon }\), which holds uniformly in all situations in question, up to probabilities which are exponentially small in \(k^\varepsilon \):

  1. T1.

    \(\mathcal{T }_{k^\varepsilon }\) does not have branches. This means that the tree \(\mathcal{T }_{k^\varepsilon }\) consists only of a trunk which is a tripod, i.e. with one vertex of degree 3 and all other vertices of degree at most 2. We will write \(x_\varepsilon \) for the only triple point of \(\mathcal{T }_{k^\varepsilon }\), and \(\mathcal{T }_{k^\varepsilon }^i =\{ u_{i ,\varepsilon }^{n_i}, \dots , u_{i ,\varepsilon }^1 = x_\varepsilon \}\), \(i=1,2,3\), for the three legs of \(\mathcal{T }_{k^\varepsilon }\). Note that \(u_i\in \hat{\mathbf{B }}_{2k^\varepsilon }(u_{i,\varepsilon }^{n_i})\).

  2. T2.

    Fix \(\kappa > 0\) small. For every \(\varepsilon > 0\) and each \({\varepsilon ^{\prime }}\in (0,\varepsilon /2)\), the skeletons satisfy \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i{\setminus } \Lambda _{k^\varepsilon } (x_{\varepsilon ^{\prime }})\subseteq x_{\varepsilon ^{\prime }}+ \mathcal{Y }_{i, 2\kappa } \) as soon as \(k\) becomes sufficiently large, where the cones \(\mathcal{Y }_{i, 2\kappa }\) are defined via

    $$\begin{aligned} \mathcal{Y }_{i, r} = \bigl \{ z : \measuredangle (z,u_i-x_{\varepsilon ^{\prime }}) \le r\bigr \}. \end{aligned}$$
    (29)

    That is, the vertices of each of the three legs of \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}\) outside the box \(\Lambda _{k^\varepsilon }(x_{\varepsilon ^{\prime }})\) are confined to the respective cones \(x_{\varepsilon ^{\prime }}+ \mathcal{Y }_{i, 2\kappa }\).

Before proving Properties T1 and T2, let us describe how they can be used to prove the lemma. First of all, note that, by Proposition 4.6, we may assume that \(|x_{\varepsilon ^{\prime }}- x|\le \delta _1 k\) with \(\delta _1 >0\) fixed as small as we wish. In particular, we may assume that \(u_i\in \mathrm{int} (x_{\varepsilon ^{\prime }}+\mathcal{Y }_i)\) (see Definition 5.1) and, consequently, that \(\mathcal{Y }_{i ,2\kappa }\subset \mathcal{Y }_i\).

By Proposition 4.5, the connected cluster \(\mathsf{C}\) is included in \(\mathcal{T }_{k^{\varepsilon ^{\prime }}} + 2k^{\varepsilon ^{\prime }}\mathbf{U}_\tau \). Therefore, Properties T1 and T2 imply that \(\mathsf{C}\setminus \Lambda _{k^\varepsilon } (x_{\varepsilon ^{\prime }})= \tilde{\mathsf{C}}_1 \cup \tilde{\mathsf{C}}_2 \cup \tilde{\mathsf{C}}_3\), where \(\tilde{\mathsf{C}}_1\), \(\tilde{\mathsf{C}}_2\) and \(\tilde{\mathsf{C}}_3\) are the clusters (in \(\mathsf{C}\setminus \Lambda _{k^\varepsilon } (x_{\varepsilon ^{\prime }})\)) of \(u_1\), \(u_2\) and \(u_3\) respectively. Note that, by T2, the clusters \(\tilde{\mathsf{C}}_i\) are confined to the sets (actually truncated cones) \(( x_{\varepsilon ^{\prime }}+ \mathcal{Y }_{i ,2\kappa } + 2k^{\varepsilon ^{\prime }}\mathbf{U}_\tau ){\setminus }\Lambda _{k^\varepsilon } (x_{\varepsilon ^{\prime }})\), which are well separated on the \(k^\varepsilon \)-scale. Consequently, the coarse-graining estimates developed in [10, Section 2] apply to each of the \(\tilde{\mathsf{C}}_i\) separately. As a result, the claim of Lemma 5.3 follows by a straightforward adaptation of the mass-gap arguments of [10, Section 2] applied separately to each of the three disjoint clusters \(\tilde{\mathsf{C}}_1 ,\tilde{\mathsf{C}}_2 \) and \(\tilde{\mathsf{C}}_3\). For instance, one can show the following: Fix \(r\) large enough so that \(\Lambda _{k^{\varepsilon ^{\prime }}} (x_{\varepsilon ^{\prime }})\subset v- \mathcal{Y }_i\) for any \(v\in (x_{\varepsilon ^{\prime }}+ \mathcal{Y }_{i ,2\kappa }) \cap (\Lambda _{2rk^\varepsilon } (x_{\varepsilon ^{\prime }}){\setminus } \Lambda _{r k^\varepsilon } (x_{\varepsilon ^{\prime }}))\) and \(i=1,2,3\). Then, up to probabilities which are exponentially small in \(k^\varepsilon \), there exists \(t\in [rk^\varepsilon ,2rk^\varepsilon ]\) such that each of the clusters \(\tilde{\mathsf{C}}_i\) contains a \(\mathcal{Y }_i\)-break point on \(\partial \Lambda _{t} (x_{\varepsilon ^{\prime }})\). That is,

  • for \(i=1,2,3\), the intersection \(v_i = \tilde{\mathsf{C}}_i\cap \partial \Lambda _t (x_{\varepsilon ^{\prime }})\) is a singleton;

  • for \(i= 1,2,3\) the cluster \(\tilde{\mathsf{C}}_i \subset (v_i +\mathcal{Y }_i) \cup (v_i -\mathcal{Y }_i)\).

This ensures that \(\mathcal{S }(t ,y)\) occurs for some \(y\in \Lambda _{\varepsilon _1 k}(x)\), and (28) follows. \(\square \)

For the proof of Property T1, we refer to [10, Lemmas 2.1 and 2.2].

Proof of Property T2

Let us start with a lower bound on \(\mu _\mathcal{D }^\mathsf{f}(E(u_1,u_2,u_3,x))\) which will be used later as a test threshold quantity for ruling out improbable events. Let \(y\) be a lattice approximation of \(x\). By the FKG inequality,

$$\begin{aligned} \mu _\mathcal{D }^\mathsf{f}(E(u_1,u_2,u_3,x)) \ge \mu _\mathcal{D }^\mathsf{f}\left( \bigcap _{i=1}^{3} \left\{ y\stackrel{\mathcal{D }}{\leftrightarrow } u_i\right\} \right) \ge \prod _{i=1}^3 \mu _\mathcal{D }^\mathsf{f}\bigl ( y\stackrel{\mathcal{D }}{\leftrightarrow } u_i\bigr ) . \end{aligned}$$

Theorem A in [10] gives sharp asymptotics of the quantities \(\mu ^\mathsf{f}\left(y\leftrightarrow u_i\right)\). These sharp asymptotics are built upon an effective random walk representation of the events \(\left\{ y\leftrightarrow u \right\} \), as described in Subsection 4.1 of that paper. The steps of this random walk have an effective drift from \(u_i\) towards \(y\), and, since \(\Lambda _k\subset \mathcal{D }\), it is easy to adjust the arguments therein in order to show that

$$\begin{aligned} \mu ^\mathsf{f}_\mathcal{D }\bigl ( y \stackrel{\mathcal{D }}{\leftrightarrow } u\bigr ) \ge \frac{C_0}{\sqrt{k}} \mathrm{e}^{-\tau (u -y )}\!, \end{aligned}$$

uniformly in \(y\in \Lambda _{\frac{k}{2}}\) and \(u\in \partial \Lambda _k\), where \(C_0\) (and, similarly, \(C_1\), \(C_2\), \(\ldots \) below) is a universal constant, in the sense that (30) applies uniformly in all the situations in question as soon as \(k\) is sufficiently large. Consequently,

$$\begin{aligned} \mu _\mathcal{D }^\mathsf{f}(E(u_1,u_2,u_3,x)) \ge \exp \left( -\sum _{i=1}^3 \tau (u_i-x) - C_1 \log k \right), \end{aligned}$$
(30)

also uniformly in all the situations in question as soon as \(k\) is sufficiently large.
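
For the reader's convenience, here is a sketch of how (30) follows from the two preceding displays. It assumes (our reading of the term "lattice approximation") that \(|x-y|\) is bounded by a constant, so that \(\tau (x-y)=O(1)\); the constant \(c\) below is introduced only for this illustration. Using the triangle inequality \(\tau (u_i-y)\le \tau (u_i-x)+\tau (x-y)\),

$$\begin{aligned} \prod _{i=1}^3 \mu _\mathcal{D }^\mathsf{f}\bigl ( y\stackrel{\mathcal{D }}{\leftrightarrow } u_i\bigr )&\ge \frac{C_0^3}{k^{3/2}} \exp \Bigl ( -\sum _{i=1}^3 \tau (u_i-y)\Bigr ) \\&\ge \frac{C_0^3}{k^{3/2}} \exp \Bigl ( -\sum _{i=1}^3 \tau (u_i-x) - 3\tau (x-y)\Bigr ) \\&\ge \exp \Bigl ( -\sum _{i=1}^3 \tau (u_i-x) - \tfrac{3}{2}\log k - c\Bigr ) , \end{aligned}$$

where \(c\ge 3\tau (x-y) - 3\log C_0\). Under this assumption, any \(C_1>3/2\) works in (30) once \(k\) is large enough.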

Next, let us say that \(\mathsf{w}\in \mathcal{T }_{k^{\varepsilon ^{\prime }}}^i\) is a \(2\kappa \)-cone point of \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i\) if \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i\subset \left(\mathsf{w}-\mathcal{Y }_{i ,2\kappa }\right)\cup \left(\mathsf{w}+\mathcal{Y }_{i ,2\kappa }\right)\). In our notation,

$$\begin{aligned} \tau (\mathcal{T }_{k^{\varepsilon ^{\prime }}} ) = \sum _{i=1}^3 \tau (\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i ) . \end{aligned}$$

Since \(\tau \) is a strictly convex norm ([10, Subsection 1.3.2]),

$$\begin{aligned} \tau (\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i) \ge \tau (u_i - x_{\varepsilon ^{\prime }})\left(1 +\delta (\kappa )\right)\ge \tau (u_i - x_{\varepsilon ^{\prime }})+ C_2 k, \end{aligned}$$
(31)

whenever \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i\) does not contain \(2\kappa \)-cone points at all. This is essentially Lemma 2.4 of [10]. In view of (20) and of the lower bound (30), we may ignore the situation in which some \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i\) has no \(2\kappa \)-cone points.

In the sequel, we use \(\mathsf{w}^*_i\) to denote the first \(2\kappa \)-cone point of \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i\) (starting at \(x_{\varepsilon ^{\prime }}\)) and \(N_i\) to denote its serial number; that is, \(\mathsf{w}_i^* = u^{N_i}_{i,{\varepsilon ^{\prime }}}\). Define \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^{i,*} = \{ u^1_{i, {\varepsilon ^{\prime }}}, \ldots , u^{N_i}_{i , {\varepsilon ^{\prime }}} = \mathsf{w}_i^*\}\) as the portion of \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^i\) up to \(\mathsf{w}_i^*\). Given \(y\) and \(\underline{\mathsf{w}} = \left(\mathsf{w}_1 , \mathsf{w}_2, \mathsf{w}_3\right)\), define the percolation event \(E_{\varepsilon ^{\prime }}(y, \underline{\mathsf{w}} )\subset E(u_1,u_2,u_3,x)\) as

$$\begin{aligned} E_{\varepsilon ^{\prime }}(y, \underline{\mathsf{w}})= \bigl \{ x_{\varepsilon ^{\prime }}= y\, ;\, \mathsf{w}_i^* = \mathsf{w}_i\ \text{ for } i=1,2,3 \bigr \} . \end{aligned}$$

In view of (30), Property T2 will follow once we have checked that

$$\begin{aligned} \mu ^\mathsf{f}_\mathcal{D }\left(E_{\varepsilon ^{\prime }}(y, \underline{\mathsf{w}})\right)\le \mathrm{e}^{-\sum _i\tau (u_i -y) - C_3 k^\varepsilon }, \end{aligned}$$
(32)

uniformly in \(k\), tripods \(\mathcal{T }_x\), \(y\) and \(\underline{\mathsf{w}}\not \subset \Lambda _{k^\varepsilon } (y )\). For fixed realizations \(\mathcal{T }^{i, *}\) of \(\mathcal{T }_{k^{\varepsilon ^{\prime }}}^{i,*}\) we have

$$\begin{aligned}&\mu ^\mathsf{f}_\mathcal{D }\bigl ( E_{\varepsilon ^{\prime }}(y, \underline{\mathsf{w}}); \mathcal{T }_{k^{\varepsilon ^{\prime }}}^{i,*}=\mathcal{T }^{i,*} \ \text{ for } i=1,2,3 \bigr ) \\&\quad \le \mathrm{exp}\left\{ -\sum _{i=1}^3\bigl \{ \tau (u_i - \mathsf{w}_i) +\tau (\mathcal{T }^{i,*}) (1-\text{o}_{k^{\varepsilon ^{\prime }}}(1))\bigr \} + C_4 k^{2{\varepsilon ^{\prime }}}\right\} . \end{aligned}$$

This follows from (20) and from the finite energy property (applied to configurations on \(\Lambda _{C_5 k^{\varepsilon ^{\prime }}} (\mathsf{w}_i )\)) of the FK measures. Indeed, the finite energy property and the exponential ratio mixing property (6) allow us to decouple the event \(\bigcap _i \bigl \{ \mathcal{T }_{k^{\varepsilon ^{\prime }}}^{i,*}=\mathcal{T }^{i,*} \bigr \}\) from the events \(\bigcap _i \bigl \{ \mathsf{w}_i\stackrel{\mathsf{w}_i + \mathcal{Y }_{i ,2\kappa }}{\longleftrightarrow } u_i\bigr \}\).

Assume, for instance, that \(\mathsf{w}_1\not \in \Lambda _{k^\varepsilon } (y)\). There are two cases to consider:

Case 1: \(\mathsf{w}_1 \in y + \mathcal{Y }_{1, \kappa }\). Then, exactly as in (31), \(\tau (\mathcal{T }^{1, *} )\ge \tau (\mathsf{w}_{1}-y) + C_6\left| \mathsf{w}_1 - y\right|\). As in (21), the entropic factor related to the number of possible compatible realizations \(\mathcal{T }^{1 ,*}\) is suppressed, and (32) follows.

Case 2: \(\mathsf{w}_1 \in ( y + \mathcal{Y }_{1, 2\kappa })\backslash ( y + \mathcal{Y }_{1, \kappa })\). By construction, \(\tau (\mathcal{T }^{1, *} ) \ge \tau (\mathsf{w}_1 - y )\). However, by the sharp triangle inequality (2),

$$\begin{aligned} \tau (\mathsf{w}_1 - y) +\tau (u_1 - \mathsf{w}_1 ) - \tau (u_1 -y ) \ge C_7\left| \mathsf{w}_1 - y\right|, \end{aligned}$$

uniformly in \(\mathsf{w}_1\) under consideration. Again, since the entropic factor is suppressed, (32) follows by choosing \(\varepsilon > 2 \varepsilon ^\prime \).
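
The exponent bookkeeping behind this last step can be sketched as follows. The sketch neglects the \(\text{o}(1)\) correction and the entropic factor, assumes that \(\tau (\mathcal{T }^{i,*})\ge \tau (\mathsf{w}_i-y)\) for each \(i\) (as stated above for \(i=1\)), and uses that \(\left| \mathsf{w}_1-y\right|\) is at least of order \(k^\varepsilon \) when \(\mathsf{w}_1\not \in \Lambda _{k^\varepsilon }(y)\). For \(i=1\) we apply the sharp triangle inequality displayed above, and for \(i=2,3\) the ordinary triangle inequality:

$$\begin{aligned} \sum _{i=1}^3\bigl \{ \tau (u_i - \mathsf{w}_i) +\tau (\mathcal{T }^{i,*})\bigr \} - C_4 k^{2\varepsilon ^{\prime }}&\ge \sum _{i=1}^3\bigl \{ \tau (u_i - \mathsf{w}_i) +\tau (\mathsf{w}_i - y)\bigr \} - C_4 k^{2\varepsilon ^{\prime }} \\&\ge \sum _{i=1}^3 \tau (u_i - y) + C_7\left| \mathsf{w}_1 - y\right| - C_4 k^{2\varepsilon ^{\prime }} . \end{aligned}$$

Since \(2\varepsilon ^{\prime }<\varepsilon \), the term \(C_4 k^{2\varepsilon ^{\prime }}\) is of lower order than \(C_7\left| \mathsf{w}_1-y\right|\), which leaves a margin of the form \(C_3 k^\varepsilon \) as required in (32).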

Lemma 5.4

Let \(S_\varepsilon (y)=\bigcup _{t\le Ck^\varepsilon }S(t,y)\). There exist two universal constants \(\kappa >0\) and \(C<\infty \) such that

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}\bigl ( S_\varepsilon (y) \bigm \vert E(u_1,u_2,u_3,x) \bigr ) = O \Bigl ( k^{12\varepsilon -1} \exp \Bigl (-\kappa \frac{\left| y -x\right|^2}{k}\Bigr )\Bigr ) , \end{aligned}$$
(33)

uniformly in \(y\in \Lambda _{\varepsilon _1 k} (x)\).

Proof

Decompose

$$\begin{aligned} \mathcal{S }( t, y ) = \bigcup _{W}\mathcal{S }^{W} (t,y) \end{aligned}$$

according to the triple \(W = \{ v_1-y ,v_2-y ,v_3-y \} \subset \partial \Lambda _t\) which shows up in the definition. From now on, we set \(w_1=v_1-y\), \(w_2=v_2-y\) and \(w_3=v_3-y\).

Since, under the event \(S^W(t,y)\), we have that \(\mathsf{C}_0\subseteq \bigcap _{i=1}^3\left(v_i - \mathcal{Y }_i\right)\) and that the points \(u_i\) lie deep in the interior of the corresponding cones \(v_i+\mathcal{Y }_i\), with \(v_i\in \partial \Lambda _t(y)\) and \(t\le Ck^\varepsilon \), the Ornstein–Zernike asymptotics of [10, Theorem A] imply that

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}\left(\, \bigcap _{i=1}^3\{ \mathsf{C}_i\subset v_i +\mathcal{Y }_i \} \Bigm | \mathsf{C}_0(t,y)\right) = \Theta \left(k^{-3/2}\, \mathrm{e}^{-\sum _{i=1}^3 \tau (u_i-v_i)}\right),\quad \quad \end{aligned}$$
(34)

uniformly in any possible realization \(\mathsf{C}_0\) of \(\mathsf{C}_0(t,y)\) compatible with \(S^W(t,y)\). Note that if \(\mathsf{C}_0(t,y)\) is compatible with \(S^W(t,y)\), then shifts \(\mathsf{C}_0^{u}\stackrel{\Delta }{=}\mathsf{C}_0 +u\) are compatible with shifted events \(S^W(t,y+u)\).

Recall Definition 4.7 of \(\phi ( y)\). Given a triple \(W = \left\{ w_1 , w_2 , w_3\right\} \subset \Lambda _{Ck^\varepsilon }\), let us define

$$\begin{aligned} \phi _W (y) = \sum _{i=1}^3 \tau (u_i-w_i-y). \end{aligned}$$

Together with (34), we obtain

$$\begin{aligned} \frac{ \mu _{\mathcal{D }}^\mathsf{f}\bigl (\mathcal{S }^{W}(t,y) \bigr ) }{ \mu _{\mathcal{D }}^\mathsf{f}\bigl ( \mathcal{S }^{W}(t,z) \bigr ) }&= \frac{\sum _{\mathsf{C}_0}\mu _{\mathcal{D }}^\mathsf{f}\Bigl ( \bigcap _{i=1}^3\{ \mathsf{C}_i\subset y+w_i +\mathcal{Y }_i \} \Bigm | \mathsf{C}_0\Bigr )\mu _\mathcal{D }^\mathsf{f}(\mathsf{C}_0(t,y)=\mathsf{C}_0)}{\sum _{\mathsf{C}_0}\mu _{\mathcal{D }}^\mathsf{f}\Bigl ( \bigcap _{i=1}^3\{ \mathsf{C}_i\subset z+w_i +\mathcal{Y }_i \} \Bigm | \mathsf{C}_0\Bigr )\mu _\mathcal{D }^\mathsf{f}(\mathsf{C}_0(t,z)=\mathsf{C}_0^{{z-y}})}\nonumber \\&= \Theta \bigl (\mathrm{e}^{ \phi _W (z) - \phi _W (y) }\bigr ) , \end{aligned}$$
(35)

uniformly in \(t\le Ck^\varepsilon \), \(W = \left\{ w_1 ,w_2 ,w_3\right\} \subset \partial \Lambda _t\) and \(y,z\in \Lambda _{\varepsilon _1 k}\), where the sum is over \(\mathsf{C}_0\) compatible with \(S^W(t,y)\) and where in the last line we used classical ratio-mixing properties of subcritical random-cluster measures [3, Theorem 3.4] and (6) to compare \(\mu _\mathcal{D }^\mathsf{f}(\mathsf{C}_0(t,y)=\mathsf{C}_0)\) and \(\mu _\mathcal{D }^\mathsf{f}(\mathsf{C}_0(t,z)=\mathsf{C}_0^{z-y})\).

The function \(\phi _W\) has a non-degenerate quadratic minimum at some \(x_\mathrm{min}(W)\) (see [9, Lemma 3]). In view of the homogeneity of \(\tau \), a quadratic expansion around \(x_\mathrm{min}\) yields

$$\begin{aligned} \phi _W (y) - \phi _W (x_\mathrm{min}) = \Theta \left( \frac{\left| y - x_\mathrm{min}\right|^2}{k}\right) , \end{aligned}$$
(36)

uniformly in all situations in question. Note that \(|\phi (y)-\phi _W(y)|=O(k^\varepsilon )\). The minimizer \(x_\mathrm{min} (W)\) of \(\phi _W\) solves

$$\begin{aligned} F(x, W ) \stackrel{\Delta }{=}\nabla _x\phi _W (x ) = 0. \end{aligned}$$

Since \(\mathrm{Hess} (\phi )\) is non-degenerate at \(x\), the implicit function theorem applies. As a result, \(|x_\mathrm{min} (W) -x|=O(\sum _i \left| w_i \right|) = O (k^\varepsilon )\) uniformly in all \(W\) in question. This fact, together with (36), yields

$$\begin{aligned} \phi _W (y)-{\phi _W(x ) } = \Theta \left( \frac{\left| y - x\right|^2}{k}+k^{2\varepsilon -1}\right) . \end{aligned}$$
(37)
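
One way to see how (36) and the bound \(|x_\mathrm{min}(W)-x|=O(k^\varepsilon )\) combine is the following sketch, in which \(c\) and \(c^{\prime }\) are generic positive constants introduced only for this illustration. Writing \(x_\mathrm{min}=x_\mathrm{min}(W)\), we decompose

$$\begin{aligned} \phi _W (y)-\phi _W(x) = \bigl [ \phi _W (y)-\phi _W(x_\mathrm{min})\bigr ] - \bigl [ \phi _W (x)-\phi _W(x_\mathrm{min})\bigr ] . \end{aligned}$$

By (36), the first bracket is \(\Theta (|y-x_\mathrm{min}|^2/k)\), while the second is non-negative and \(O(|x-x_\mathrm{min}|^2/k)=O(k^{2\varepsilon -1})\). Combining this with the elementary inequalities

$$\begin{aligned} \tfrac{1}{2}\left| y-x\right|^2 - \left| x-x_\mathrm{min}\right|^2 \le \left| y-x_\mathrm{min}\right|^2 \le 2\left| y-x\right|^2 + 2\left| x-x_\mathrm{min}\right|^2 \end{aligned}$$

gives

$$\begin{aligned} c\,\frac{\left| y-x\right|^2}{k} - c^{\prime } k^{2\varepsilon -1} \le \phi _W (y)-\phi _W(x) \le c^{\prime }\Bigl ( \frac{\left| y-x\right|^2}{k} + k^{2\varepsilon -1}\Bigr ) , \end{aligned}$$

which is how we read (37): the correction \(k^{2\varepsilon -1}\) tends to zero when \(\varepsilon <1/2\).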

Since there are at most \(O(k^{3\varepsilon })\) possible choices for \(W\) and \(O(k^\varepsilon )\) possible choices for \(t\), we deduce from (35) and (37) that

$$\begin{aligned} \frac{1}{O(k^{4\varepsilon })} \exp \left(-{C_1}\frac{|y-x|^2}{k}\right) \le \frac{ \mu _{\mathcal{D }}^\mathsf{f}( \mathcal{S }_\varepsilon (y) )}{ \mu _{\mathcal{D }}^\mathsf{f}( \mathcal{S }_\varepsilon ({x} ) ) } \le O(k^{4\varepsilon }) \exp \left(-{C_2}\frac{|y-x|^2}{k}\right) .\qquad \end{aligned}$$
(38)

Above \(\mathcal{S }_\varepsilon ({x} )\) means in fact \(\mathcal{S }_\varepsilon (\lfloor x\rfloor )\). We can now compute

$$\begin{aligned} \mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (y) \,\vert \, E(u_1,u_2,u_3,x))&= \frac{\mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (y))}{\mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (x ))}\cdot \frac{\mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (x ))}{\mu _\mathcal{D }^\mathsf{f}(E(u_1,u_2,u_3,x))} \nonumber \\&\le O(k^{4\varepsilon })\exp \Bigl (-C_2\frac{|y-x|^2}{k} \Bigr )\frac{\mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (x ))}{\mu _\mathcal{D }^\mathsf{f}(E(u_1,u_2,u_3,x))} \end{aligned}$$
(39)

where we used the second inequality in (38). To see that the rightmost factor in (39) is of the right order, observe that, for \(|y-x|\le k^{1/2-\varepsilon }\), the exponential factors in (38) are of order 1, so that the first inequality in (38) gives \(\mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (x))\le O(k^{4\varepsilon })\, \mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (y))\). Averaging over the \(k^{1-2\varepsilon }\) sites which are at distance at most \(k^{1/2-\varepsilon }\) from \(x\), we deduce that

$$\begin{aligned} \mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (x )) \!\le \! O(k^{-1+6\varepsilon }) \sum _{y\in \Lambda _{k^{1/2-\varepsilon }}(x )} \mu _\mathcal{D }^\mathsf{f}(S_\varepsilon (y)) \!\le \! O(k^{-1+8\varepsilon })\mu _\mathcal{D }^\mathsf{f}(E(u_1,u_2,u_3,x)), \end{aligned}$$

where in the second inequality, we used the fact that in a given configuration there are at most \(O(k^{2\varepsilon })\) sites \(y\) such that the corresponding events \(\mathcal{S }_\varepsilon (y)\) occur. This implies that

$$\begin{aligned} \frac{\mu _{\mathcal{D }}^\mathsf{f}\bigl ( \mathcal{S }_\varepsilon (x) \bigr )}{\mu ^\mathsf{f}_\mathcal{D }\bigl (E(u_1,u_2,u_3,x)\bigr )} \le O(k^{8\varepsilon -1}). \end{aligned}$$

Together with (39), we obtain (33). \(\square \)
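
The powers of \(k\) appearing in this proof can be tallied as follows; this is a bookkeeping sketch that only restates the steps above:

$$\begin{aligned} \underbrace{O(k^{4\varepsilon })}_{\text{first inequality in (38)}} \cdot \underbrace{O(k^{-(1-2\varepsilon )})}_{\text{averaging over } k^{1-2\varepsilon } \text{ sites}}&= O(k^{6\varepsilon -1}), \\ O(k^{6\varepsilon -1}) \cdot \underbrace{O(k^{2\varepsilon })}_{\text{sites } y \text{ per configuration}}&= O(k^{8\varepsilon -1}), \\ \underbrace{O(k^{4\varepsilon })}_{\text{second inequality in (38)}} \cdot O(k^{8\varepsilon -1})&= O(k^{12\varepsilon -1}). \end{aligned}$$

The last line is the prefactor in (33).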

Lemmas 5.3 and 5.4 imply that

$$\begin{aligned}&\mu _{\mathcal{D }}^\mathsf{f}\bigl ( \mathsf{C}\cap \Lambda _{k^\varepsilon }\ne \varnothing \bigm | E(u_1,u_2,u_3,x) \bigr ) \nonumber \\&\quad \le O(k^{12\varepsilon -1}) \sum _{y\in \Lambda _{\varepsilon _1 k} (x)} \mathrm{e}^{-\kappa \left| y-x\right|^2/k} \mu _{\mathcal{D }}^\mathsf{f}\bigl ( \mathsf{C}\cap \Lambda _{k^\varepsilon }\ne \varnothing \bigm | \mathcal{S }_\varepsilon (y) \bigr ) + O\bigl ( \mathrm{e}^{-k^\varepsilon } \bigr ).\nonumber \\ \end{aligned}$$
(40)

It remains to provide an upper bound on \(\mu _{\mathcal{D }}^\mathsf{f}\bigl (\mathsf{C}\cap \Lambda _{k^\varepsilon }\ne \varnothing \bigm | \mathcal{S }_\varepsilon (y)\bigr )\). There are two cases to consider:

CASE 1: \(y\in \Lambda _{2Ck^\varepsilon }\). In this case, we simply use

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}\bigl ( \mathsf{C}\cap \Lambda _{k^\varepsilon }\ne \varnothing \bigm | \mathcal{S }_\varepsilon (y) \bigr ) \le 1. \end{aligned}$$
(41)

The total contribution to the right-hand side of (40) is then bounded by \(O(k^{14\varepsilon -1})\), which is negligible with respect to our target estimate (27).
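
Explicitly, bounding the Gaussian factor by \(1\) and using that \(\Lambda _{2Ck^\varepsilon }\) contains \(O(k^{2\varepsilon })\) sites (the same two-dimensional counting as for the \(k^{1-2\varepsilon }\) sites at distance at most \(k^{1/2-\varepsilon }\) from \(x\) in the proof of Lemma 5.4), the contribution of Case 1 is at most

$$\begin{aligned} O(k^{12\varepsilon -1}) \sum _{y\in \Lambda _{2Ck^\varepsilon }} \mathrm{e}^{-\kappa \left| y-x\right|^2/k} \le O(k^{12\varepsilon -1})\cdot O(k^{2\varepsilon }) = O(k^{14\varepsilon -1}). \end{aligned}$$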

CASE 2: \(y\not \in \Lambda _{2Ck^\varepsilon }\). In this case, \(\Lambda _{k^\varepsilon }\) can intersect at most one of the \(\Lambda _{Ck^\varepsilon }(y) + \mathcal{Y }_i\), and therefore can be hit by only one cluster \(\mathsf{C}_i\). Without loss of generality, let us assume that \(\mathsf{C}_i=\mathsf{C}_1\). Conditioning on the smallest \(t\) such that \(\mathcal{S }(t,y)\) occurs as well as on \(\mathsf{C}_0\), \(\mathsf{C}_2\) and \(\mathsf{C}_3\), the cluster \(\mathsf{C}_1\) obeys, as was explained after (25), a diffusive scaling. In particular,

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}\bigl (z \in \mathsf{C}_1 \bigm | S(t,y),\mathsf{C}_0,\mathsf{C}_2,\mathsf{C}_3 \bigr ) = O\left(\frac{1}{\sqrt{|v_1-z|}} \, \exp \left[ -\kappa ^{\prime } \frac{\text{ d}_\tau (z,[v_1,u_1])^2}{\left| v_1-z\right|}\right]\right). \end{aligned}$$

In the previous bound, \(v_1=v_1(t,y)\). We find:

$$\begin{aligned} \mu _{\mathcal{D }}^\mathsf{f}\bigl ( \mathsf{C}\cap \Lambda _{k^\varepsilon }\ne \varnothing \bigm | S(t,y),\mathsf{C}_0,\mathsf{C}_2,\mathsf{C}_3 \bigr )&\le \sum _{z\in \partial \Lambda _{k^\varepsilon }}\mu _{\mathcal{D }}^\mathsf{f}\bigl (z \in \mathsf{C}_1 \bigm | S(t,y),\mathsf{C}_0,\mathsf{C}_2,\mathsf{C}_3 \bigr )\nonumber \\&\le \sum _{z\in \partial \Lambda _{k^\varepsilon }} O\left(\frac{1}{\sqrt{|v_1-z|}} \, \exp \Bigl [ -\kappa ^{\prime } \frac{\text{ d}_\tau (z,[v_1,u_1])^2}{\left| v_1-z\right|}\Bigr ] \right)\nonumber \\&= O\left(\frac{k^\varepsilon }{\sqrt{\left| y\right|}} \, \exp \bigl \{ -\kappa ^{\prime \prime } \frac{\text{ d}_\tau (0,[y,u_1])^2}{\left| y\right|}\bigr \} \right) . \end{aligned}$$
(42)

In the last line, we used the fact that \(y, v_1\notin \Lambda _{Ck^\varepsilon }\) and \(|v_1-y|\le Ck^\varepsilon \). Let us substitute (42) into the sum on the right-hand side of (40) to obtain

$$\begin{aligned}&\mu _{\mathcal{D }}^\mathsf{f}\bigl ( \mathsf{C}\cap \Lambda _{k^\varepsilon }\ne \varnothing \bigm | E(u_1,u_2,u_3,x) \bigr ) \le O(\mathrm{e}^{-k^\varepsilon })+O(k^{14\varepsilon -1})\nonumber \\&\quad + O(k^{13\varepsilon -1}) \sum _{y\in \Lambda _{\varepsilon _1 k} (x)\setminus \Lambda _{2C k^{\varepsilon }}} \frac{1}{\sqrt{|y|}}\exp \Big [-\kappa \frac{\left| y-x\right|^2}{k}-\kappa ^{\prime \prime }\frac{\text{ d}_\tau (0,[y,u_1])^2}{|y|}\Big ].\nonumber \\ \end{aligned}$$
(43)

A simple estimate shows that the sum on the right-hand side is bounded above by

$$\begin{aligned}&2\sum _{y\in \Lambda _{k^{1/2+\varepsilon }}(x)\setminus \Lambda _{2C k^{\varepsilon }}} \frac{1}{\sqrt{|y|}}\exp \Big [-\kappa ^{\prime \prime }\frac{\text{ d}_\tau (0,[y,u_1])^2}{|y|}\Big ] \nonumber \\&\quad \le \sum _{\ell =\max \{2Ck^\varepsilon ,|x|-k^{1/2+\varepsilon }\}}^{|x|+k^{1/2+\varepsilon }} \frac{O(\sqrt{\ell })}{\sqrt{\ell }}=O(k^{\frac{1}{2}+\varepsilon }), \end{aligned}$$

uniformly in \(x\). To obtain the first inequality, we used the fact that \(\exp (-\kappa |y-x|^2/k)\) is very small for sites outside \(\Lambda _{k^{1/2+\varepsilon }}(x)\). For the second, observe that the sites \(y\) at distance \(\ell \) from the origin which contribute substantially to this sum must be such that \(0\) is at distance \(O(\sqrt{\ell })\) from \([y,u_1]\); there are \(O(\sqrt{\ell })\) such sites. This concludes the proof.
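
As a final bookkeeping sketch, the three contributions on the right-hand side of (43) combine as

$$\begin{aligned} O(\mathrm{e}^{-k^\varepsilon }) + O(k^{14\varepsilon -1}) + O(k^{13\varepsilon -1})\cdot O(k^{\frac{1}{2}+\varepsilon }) = O(k^{14\varepsilon -\frac{1}{2}}), \end{aligned}$$

the last term being the dominant one for large \(k\). This is the quantity to be compared with the target estimate (27), which is stated earlier and is not reproduced here.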