1 Introduction

Let \(d\ge 1\) and suppose that \(J:{\mathbb {Z}}^d \rightarrow [0,\infty )\) is both symmetric in the sense that \(J(x)=J(-x)\) for every \(x\in {\mathbb {Z}}^d\) and integrable in the sense that \(\sum _{x\in {\mathbb {Z}}^d} J(x) <\infty \). For each \(\beta \ge 0\), long-range percolation on \({\mathbb {Z}}^d\) with intensity J is the random graph with vertex set \({\mathbb {Z}}^d\) in which we choose whether or not to include each potential edge \(\{x,y\}\) independently at random with inclusion probability \(1-\exp (-\beta J(y-x))\). Note that this model is equivalent to nearest-neighbour percolation when \(J(x)=\mathbb {1}(\Vert x\Vert _1=1)\). Here we will instead be most interested in the case that J(x) decays like an inverse power of \(\Vert x\Vert \), so that

$$\begin{aligned} J(x) \sim A\Vert x\Vert ^{-d-\alpha } \qquad \text { as } x\rightarrow \infty \end{aligned}$$
(1.1)

for some constants \(A>0\) and \(\alpha >0\). We denote the law of the resulting random graph by \({\mathbf {P}}_{\beta }={\mathbf {P}}_{J,\beta }\) and refer to the connected components of this random graph as clusters. Studying the geometry of these clusters leads to many interesting questions, some of which are motivated by applications to modeling ‘small-world’ phenomena in physics, epidemiology, the social sciences, and so on; see e.g. [12, Section 1.4] and [14, Section 10.6] for background and many references. Although substantial progress on these questions has been made over the last forty years, with highlights of the literature including [5, 12, 13, 21, 24, 25, 42, 72], many further important problems remain open.

In this paper we study the phase transition in long-range percolation. Given \(d\ge 1\) and a symmetric, integrable function \(J:{\mathbb {Z}}^d \rightarrow [0,\infty )\), we define the critical parameter

$$\begin{aligned} \beta _c = \beta _c(J) = \sup \bigl \{ \beta \ge 0 : {\mathbf {P}}_\beta \text { is supported on configurations with no infinite clusters}\bigr \}. \end{aligned}$$

Elementary path-counting arguments yield that there are no infinite clusters almost surely when \(\beta \sum _x J(x) <1\), and hence that \(\beta _c \ge 1/\sum _x J(x) >0\) under the assumption that J is integrable. When \(d =1\) and J is of the form (1.1), the model has a non-trivial phase transition in the sense that \(0<\beta _c<\infty \) if and only if \(\alpha \le 1\), while for \(d\ge 2\) the phase transition is non-trivial for every \(\alpha >0\) [58, 61]. As with nearest-neighbour percolation, the model is expected to exhibit many interesting fractal-like features when \(\beta =\beta _c\) (see e.g. [21, 25, 42]), but proving this rigorously seems to be a very difficult problem in general.
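
To spell out the path-counting bound mentioned at the start of the preceding paragraph, here is a minimal sketch. It uses only the elementary inequality \(1-e^{-t}\le t\) and a union bound over self-avoiding paths; the shorthand \(|J|=\sum _x J(x)\) is introduced only for this illustration, and the bookkeeping is not quoted from the cited references. Each vertex of \(K_0\) is the endpoint of some open self-avoiding path started at 0, and the edges of such a path are distinct and hence independent, so that

$$\begin{aligned} {\mathbf {E}}_\beta |K_0| \le \sum _{k\ge 0} \, \sum _{x_1,\ldots ,x_k \in {\mathbb {Z}}^d} \, \prod _{i=1}^k \bigl (1-e^{-\beta J(x_i-x_{i-1})}\bigr ) \le \sum _{k\ge 0} \bigl (\beta |J|\bigr )^k = \frac{1}{1-\beta |J|} < \infty \end{aligned}$$

whenever \(\beta |J|<1\), where \(x_0=0\) and the inner sum over all sequences is an upper bound for the sum over self-avoiding ones. In particular \(|K_0|\) is almost surely finite for such \(\beta \).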

It is a surprising fact that our understanding of long-range percolation models is better than our understanding of their nearest-neighbour counterparts in many situations. Indeed, it is a remarkable theorem of Noam Berger [11] that long-range percolation on \({\mathbb {Z}}^d\) undergoes a continuous phase transition in the sense that there is no infinite cluster at \(\beta _c\) whenever \(d \ge 1\) and \(0<\alpha <d\). The corresponding statement for nearest-neighbour percolation with \(d\ge 2\) is of course a notorious open problem needing little further introduction. While it is widely believed that the phase transition should be continuous for all \(\alpha >0\) and \(d\ge 2\), it is a theorem of Aizenman and Newman [5] that the model undergoes a discontinuous phase transition when \(d=\alpha =1\), so that the condition \(\alpha <d\) cannot be removed from Berger’s result in general.

Berger’s proof works by showing that the set of \(\beta \) for which an infinite cluster exists a.s. is open, and gives little quantitative control of percolation at the critical parameter \(\beta _c\) itself. In this paper we give a new, quantitative proof of Berger’s result that yields an explicit power-law upper bound on the tail of the volume of the cluster of the origin at criticality under the same assumptions. We write \(K_0\) for the cluster of the origin, write \(\Lambda _r=[-r,r]^d \cap {\mathbb {Z}}^d\) for each \(r\ge 0\), and write \(\{x \leftrightarrow y\}\) for the event that x and y belong to the same cluster.

Theorem 1.1

Let \(d\ge 1\), let \(J:{\mathbb {Z}}^d \rightarrow [0,\infty )\) be symmetric and integrable, and suppose that there exist \(\alpha < d\), \(c>0\), and \(r_0<\infty \) such that \(J(x)\ge c \Vert x\Vert _1^{-d-\alpha }\) for every \(x\in {\mathbb {Z}}^d\) with \(\Vert x\Vert _1 \ge r_0\). Then there exists a constant C such that

$$\begin{aligned} {\mathbf {P}}_{\beta }(|K_0| \ge n)&\le C n^{-(d-\alpha )/(2d+\alpha )} \end{aligned}$$
(1.2)
$$\begin{aligned} \quad \text {and} \quad \frac{1}{|\Lambda _r|} \sum _{x\in \Lambda _r} {\mathbf {P}}_\beta (0 \leftrightarrow x)&\le C r^{-2(d-\alpha )/3} \end{aligned}$$
(1.3)

for every \(\beta \le \beta _c\), \(n \ge 1\), and \(r\ge 1\). In particular, there are almost surely no infinite clusters at the critical parameter \(\beta _c\).

The theorem is most interesting when \(d < 6\) and \(\alpha > d/3\), in which case the model is not expected to have mean-field behaviour and high-dimensional techniques such as the lace expansion [21, 35, 42] should not apply. Indeed, we believe that Theorem 1.1 is the first rigorous, non-trivial power-law upper bound for a critical Bernoulli percolation model that is neither two-dimensional nor expected to be described by mean-field critical exponents.

Let us now discuss interpretations of our results in terms of critical exponents. It is strongly believed that the large-scale behaviour of critical (long-range or nearest-neighbour) percolation on d-dimensional Euclidean lattices is described by critical exponents [33, Chapters 9 and 10]. The most relevant of these exponents to us are traditionally denoted \(\delta \) and \(\eta \) and are believed to describe the distribution of the cluster of the origin at criticality via the asymptotics

$$\begin{aligned} {\mathbf {P}}_{\beta _c}(|K_0| \ge n)&\approx n^{-1/\delta }&\text { as }n&\rightarrow \infty \\ \quad \text {and} \quad {\mathbf {P}}_{\beta _c}(x\leftrightarrow y)&\approx \Vert x-y\Vert ^{-d+2-\eta }&\text { as }\Vert x-y\Vert&\rightarrow \infty , \end{aligned}$$

where \(\approx \) means that the ratio of the logarithms of the two sides tends to 1 in the relevant limit. These exponents are expected to depend on the dimension d and the long-range parameter \(\alpha \) (if appropriate) but not on the small-scale details of the model such as the choice of lattice. It is an open problem of central importance to prove the existence of and/or compute these exponents, as well as to prove that they are universal in this sense. Significant progress has been made in high dimensions (\(d >6\) or \(\alpha < d/3\)) [4, 6, 21, 30, 35, 37, 42], where it is known that \(\delta =2\) and \(\eta =0\) for several large classes of examples, and for nearest-neighbour models in two dimensions [49, 52, 66, 67], where it has been proven in particular that \(\delta =91/5\) and \(\eta =5/24\) for site percolation on the triangular lattice as predicted by Nienhuis [59]. Important partial progress for other two-dimensional planar lattices has been made by Kesten [47,48,49] and Kesten and Zhang [50]. Progress in intermediate dimensions has however been extremely limited. Theorem 1.1 can be seen as a modest first step towards understanding the problem in this regime, and implies that for long-range percolation with \(0<\alpha <d\) the exponents \(\delta \) and \(\eta \) satisfy

$$\begin{aligned} \delta \le \frac{2d+\alpha }{d-\alpha } \qquad \text { and } \qquad 2-\eta \le \frac{1}{3}d + \frac{2}{3}\alpha \end{aligned}$$
(1.4)

whenever they are well-defined. (Conversely, the mean-field lower bound of Aizenman and Barsky [2] implies that \(\delta \) satisfies \(\delta \ge 2\) whenever it is well-defined; see also [29, Proposition 1.3].) See Sect. 1.3 for a discussion of how these bounds compare to the non-rigorous predicted values of \(\eta \) and \(\delta \) in the physics literature. We remark that similar bounds on other exponents including the susceptibility exponent \(\gamma \), gap exponent \(\Delta \), and cluster density exponent \(\beta \) can be obtained from (1.4) using the rigorous scaling inequalities \(\gamma \le \delta -1\), \(\Delta \le \delta \), and \(\beta \ge 2/\delta \) proven in [46] and [57].
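
As a concrete illustration of (1.4), consider the hypothetical parameter choice \(d=3\), \(\alpha =2\), which is chosen only for arithmetic convenience and satisfies \(d<6\) and \(d/3<\alpha <d\). Substituting gives

$$\begin{aligned} \delta \le \frac{2\cdot 3+2}{3-2} = 8 \qquad \text { and } \qquad 2-\eta \le \frac{1}{3}\cdot 3 + \frac{2}{3}\cdot 2 = \frac{7}{3}, \end{aligned}$$

so that, whenever the exponents are well-defined for this choice, they satisfy \(2\le \delta \le 8\) and \(\eta \ge -1/3\).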

Hyperscaling inequalities. As a part of our proof, we also prove a new rigorous hyperscaling inequality \((2-\eta )(\delta +1)\le d(\delta -1)\) for both long-range and nearest-neighbour percolation. To prove this inequality, we first prove a universal inequality implying in particular that the maximum cluster size in percolation on any finite graph is of the same order as its mean with high probability. Both results are of independent interest, and are discussed in detail in Sect. 2.
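
At the level of exponents, the hyperscaling inequality converts the bound on \(\delta \) in (1.4) into the bound on \(\eta \) in (1.4). The following is only a heuristic consistency check, assuming both exponents are well-defined, and is not a substitute for the quantitative argument of Sect. 2. Since \(t\mapsto (t-1)/(t+1)\) is increasing on \((0,\infty )\), the inequalities \((2-\eta )(\delta +1)\le d(\delta -1)\) and \(\delta \le (2d+\alpha )/(d-\alpha )\) give

$$\begin{aligned} 2-\eta \le d\,\frac{\delta -1}{\delta +1} \le d\,\frac{(2d+\alpha )-(d-\alpha )}{(2d+\alpha )+(d-\alpha )} = d\,\frac{d+2\alpha }{3d} = \frac{1}{3}d+\frac{2}{3}\alpha . \end{aligned}$$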

Other graphs. Our methods are not very specific to the hypercubic lattice \({\mathbb {Z}}^d\), and can also be used to establish very similar results for long-range percolation on, say, arbitrary transitive graphs of d-dimensional volume growth. We now formulate an even more general version of our theorem, which will follow by essentially the same proof. The definitions introduced here will also be used throughout the rest of the paper. Given a graph G and a vertex v, we write \(E^\rightarrow _v\) for the set of oriented edges emanating from v. (We will often abuse notation by identifying this set with the corresponding set of unoriented edges.) We define a weighted graph \(G=(V,E,J)\) to be a countable graph \((V,E)\) together with an assignment of positive weights \(\{J_e : e \in E\}\) such that \(\sum _{e\in E^\rightarrow _v} J_e < \infty \) for each \(v\in V\). Locally finite graphs can be considered as weighted graphs by setting \(J_e \equiv 1\). A graph automorphism of \((V,E)\) is a weighted graph automorphism of \((V,E,J)\) if it preserves the weights, and a weighted graph G is said to be transitive if for every two vertices x and y in G there exists an automorphism of G sending x to y. We say that a weighted graph is simple if there is at most one edge between any two vertices. Given a weighted graph \(G=(V,E,J)\) and \(\beta \ge 0\), we define Bernoulli-\(\beta \) bond percolation on G to be the random subgraph of G in which each edge is chosen to be either retained or deleted independently at random with retention probability \(1-e^{-\beta J_e}\), and write \({\mathbf {P}}_\beta ={\mathbf {P}}_{G,\beta }\) for the law of this random subgraph.

Theorem 1.2

Let \(G=(V,E,J)\) be an infinite, simple, unimodular transitive weighted graph, let o be a vertex of G, and suppose that there exist constants \(1/2< a < 1\), \(c>0\), and \(\varepsilon _0>0\) such that \(|\{ e\in E^\rightarrow _o : J_e \ge \varepsilon \}| \ge c \varepsilon ^{-a}\) for every \(0<\varepsilon \le \varepsilon _0\). Then \( \beta _c < \infty \) and there exists a constant C such that

$$\begin{aligned} {\mathbf {P}}_{\beta }(|K_o| \ge n) \le C n^{-(2a-1)/(a+1)} \end{aligned}$$

for every \(0\le \beta \le \beta _c\) and \(n\ge 1\). In particular, there are almost surely no infinite clusters at the critical parameter \(\beta _c\).

The hypothesis of unimodularity is a technical condition that holds in most natural examples, including all amenable transitive weighted graphs and all weighted graphs defined in terms of a countable group \(\Gamma \) and a symmetric, integrable function \(J:\Gamma \rightarrow [0,\infty )\) by \(V=\Gamma \), \(E=\{ \{g,h\} : g,h\in \Gamma , J(g^{-1} h)>0\}\), and \(J(\{g,h\})=J(g^{-1}h)\) for each \(\{g,h\}\in E\) [68]. (As in the case of \({\mathbb {Z}}^d\), we say that a function \(J:\Gamma \rightarrow [0,\infty )\) on a countable group \(\Gamma \) is symmetric if \(J(\gamma )=J(\gamma ^{-1})\) for every \(\gamma \in \Gamma \) and integrable if \(\sum _{\gamma \in \Gamma }J(\gamma )<\infty \).) It follows in particular that Theorem 1.2 implies Theorem 1.1. See [56, Chapter 8] for further background on unimodularity.
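
For orientation, here is a sketch of how the hypotheses of Theorem 1.1 translate into those of Theorem 1.2, assuming \(0<\alpha <d\) and ignoring the values of constants. If \(J(x)\ge c\Vert x\Vert _1^{-d-\alpha }\) whenever \(\Vert x\Vert _1\ge r_0\), then every x with \(r_0\le \Vert x\Vert _1 \le (c/\varepsilon )^{1/(d+\alpha )}\) satisfies \(J(x)\ge \varepsilon \), and the number of such x is of order \(\varepsilon ^{-d/(d+\alpha )}\) as \(\varepsilon \downarrow 0\). Thus the hypothesis of Theorem 1.2 holds with \(a=d/(d+\alpha )\), which lies in (1/2, 1) precisely when \(0<\alpha <d\), and

$$\begin{aligned} \frac{2a-1}{a+1} = \frac{2d-(d+\alpha )}{d+(d+\alpha )} = \frac{d-\alpha }{2d+\alpha }, \end{aligned}$$

which is the exponent appearing in (1.2).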

Remark 1.3

Theorem 1.2 also leads to a new proof of a recent theorem of Xiang and Zou [76] which states that every countably infinite (but not necessarily finitely generated) group \(\Gamma \) admits a symmetric, integrable function \(J:\Gamma \rightarrow [0,\infty )\) for which the associated weighted graph has a non-trivial percolation phase transition. To deduce their theorem from ours, simply pick a bijection \(\sigma :\Gamma \rightarrow \{1,2,\ldots \}\), let \(1<\alpha <2\), and consider the symmetric, integrable function on \(\Gamma \) defined by \(J(\gamma )= \sigma (\gamma )^{-\alpha }+\sigma (\gamma ^{-1})^{-\alpha }\) for every \(\gamma \in \Gamma \): the associated long-range percolation model has \(\beta _c<\infty \) by Theorem 1.2. We remark also that Xiang and Zou’s proof relied on the results of Duminil-Copin et al. [26] in the case that the group is finitely generated, while our proof is self-contained. It would be interesting if a new proof of the results of [26] could be derived from Theorem 1.2 by comparison of short- and long-range percolation.
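
The hypotheses of Theorem 1.2 can be checked for this choice of J as follows. This is a sketch in which the constant 1/2 is an illustrative choice and the identity element of \(\Gamma \) is ignored, since removing it affects only the constants. Because \(\sigma \) and \(\gamma \mapsto \gamma ^{-1}\) are bijections and \(\alpha >1\), we have \(\sum _{\gamma \in \Gamma } J(\gamma ) = 2\sum _{k\ge 1} k^{-\alpha }<\infty \). Moreover \(J(\gamma )\ge \sigma (\gamma )^{-\alpha }\), so that

$$\begin{aligned} |\{\gamma \in \Gamma : J(\gamma ) \ge \varepsilon \}| \ge |\{\gamma \in \Gamma : \sigma (\gamma ) \le \varepsilon ^{-1/\alpha }\}| = \lfloor \varepsilon ^{-1/\alpha } \rfloor \ge \tfrac{1}{2}\, \varepsilon ^{-1/\alpha } \end{aligned}$$

for every \(0<\varepsilon \le 1\). The counting hypothesis therefore holds with \(a=1/\alpha \), which lies in (1/2, 1) since \(1<\alpha <2\).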

1.1 About the proof

We now outline the basic structure of our proof and discuss how it compares to previous approaches to critical percolation. We begin with a brief overview of the two main strategies that have been employed in the study of critical percolation, which we term the supercritical strategy and the subcritical strategy. Broadly speaking, the supercritical strategy has found more success in low-dimensional settings while the subcritical strategy has found more success in high-dimensional settings, but there are notable exceptions in both cases. We write \(\theta (p)={\mathbf {P}}_p(|K_o|=\infty )\) for the probability that the origin lies in an infinite cluster.

The supercritical strategy. In this strategy, one attempts to prove that the set \(\{p:\theta (p)>0\}\) is open by analysis of percolation under the assumption that \(\theta (p)>0\). For example, one may hope to show that if infinite clusters exist then each such cluster K must be ‘large’ in some coarse sense that is strong enough to ensure that \(p_c(K)<1\). This approach has been successfully followed both in Berger’s analysis of long-range percolation on \({\mathbb {Z}}^d\) [11] and in Benjamini, Lyons, Peres, and Schramm’s proof that critical percolation on any nonamenable Cayley graph has no infinite clusters [10]. Harris’s classical proof that \(\theta (1/2)=0\) for the square lattice [38] can also be thought of in similar terms. Alternatively, one may instead attempt to find a finite-size characterisation of supercriticality, that is, a sequence of events \(({\mathscr {E}}_n)_{n\ge 1}\) each depending on at most finitely many edges and a sequence of positive numbers \((\delta _n)_{n\ge 1}\) such that

$$\begin{aligned} \theta (p)>0 \iff \text {there exists }n\ge 1\text { such that } {\mathbf {P}}_p({\mathscr {E}}_n) > 1- \delta _n \end{aligned}$$

for every \(p\in [0,1]\); the existence of such a finite-size characterisation of supercriticality is easily seen to imply that the set \(\{p:\theta (p)>0\}\) is open as required. Such finite-size characterisations are typically derived via a renormalization argument, and this strategy often amounts to an alternative formalization of the more geometric strategy discussed above. Successful realisations of this approach include Barsky, Grimmett, and Newman’s analysis [7, 8] of half-spaces and orthants in \({\mathbb {Z}}^d\) and Duminil-Copin, Sidoravicius, and Tassion’s analysis [28] of two-dimensional slabs \({\mathbb {Z}}^2 \times [0,r]^{k}\). A popular approach to critical percolation on \({\mathbb {Z}}^3\) seeks to implement this strategy by eliminating the ‘sprinkling’ from the proof of the Grimmett-Marstrand theorem [34]; while this has not yet been done successfully, interesting partial progress in this direction has been made by Cerf [20].

Arguments following the supercritical strategy tend to be ineffective in the sense that they give little or no quantitative information about percolation at \(p_c\); see however the recent work of Duminil-Copin, Kozma, and Tassion [27] for some progress towards reversing this trend.

The subcritical strategy. In this strategy, one attempts to prove that the set \(\{p : \theta (p)=0\}\) is closed by proving that some non-trivial upper bound on the distribution of the cluster of the origin holds uniformly throughout the subcritical phase. In contrast to the supercritical strategy, the subcritical strategy is inherently quantitative in nature and typically yields explicit estimates on the distribution of the cluster of the origin at criticality. The simplest example of such an argument is the proof that there is no percolation at criticality on any amenable transitive graph of exponential volume growth [43], which uses elementary subadditivity considerations to prove the uniform bound

$$\begin{aligned} \min \{ {\mathbf {P}}_p(x \leftrightarrow y) : d(x,y) \le n\} \le {\text {gr}}(G)^{-n} \end{aligned}$$

for every \(n\ge 1\) and \(p<p_c\), where \({\text {gr}}(G)=\limsup _{n\rightarrow \infty } |B(x,n)|^{1/n}\) is the rate of exponential volume growth of G. Left-continuity of connection probabilities then implies that the same bound continues to hold at \(p_c\), from which the theorem is easily deduced.
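
One standard way to carry out this final deduction can be sketched as follows. The sketch assumes that the infinite cluster is almost surely unique when it exists, which holds on amenable transitive graphs by the Burton-Keane argument, and is not claimed to reproduce the argument of [43] verbatim. If \(\theta (p_c)>0\), then uniqueness and the Harris-FKG inequality give

$$\begin{aligned} {\mathbf {P}}_{p_c}(x \leftrightarrow y) \ge {\mathbf {P}}_{p_c}\bigl (|K_x|=\infty ,\, |K_y|=\infty \bigr ) \ge \theta (p_c)^2 > 0 \qquad \text { for every } x,y\in V, \end{aligned}$$

which contradicts the bound \(\min \{ {\mathbf {P}}_{p_c}(x \leftrightarrow y) : d(x,y) \le n\} \le {\text {gr}}(G)^{-n} \rightarrow 0\), since \({\text {gr}}(G)>1\) for graphs of exponential volume growth.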

More sophisticated versions of the subcritical strategy often involve a ‘bootstrapping’ or ‘forbidden zone’ argument. Such an argument was first used to analyze high-dimensional statistical mechanics models by Slade [62]. To implement such an argument, one aims to prove that some well-chosen estimate, called the bootstrapping hypothesis, implies a strictly stronger version of itself. Once this is done, it is usually straightforward to conclude via a continuity argument that the strong form of the estimate holds uniformly throughout the subcritical phase. For example, the lace expansion for high-dimensional percolation [30, 36, 37, 42] works roughly by showing that if d is sufficiently large and \({\mathscr {G}}\) denotes the Green’s function on \({\mathbb {Z}}^d\) then for each \(p\in [0,p_c)\) we have the implication

$$\begin{aligned}&\left( {\mathbf {P}}_p(x\leftrightarrow y) \le 3\, {\mathscr {G}}(x,y) \text { for every }x,y\in {\mathbb {Z}}^d\right) \nonumber \\&\quad \Rightarrow \left( {\mathbf {P}}_p(x\leftrightarrow y) \le 2\, {\mathscr {G}}(x,y) \text { for every } x,y\in {\mathbb {Z}}^d\right) . \end{aligned}$$
(1.5)

The estimate \({\mathbf {P}}_p(x\leftrightarrow y) \le 3\, {\mathscr {G}}(x,y)\) holds trivially when p is small. Since we also have that \(\limsup _{x \rightarrow \infty } {\mathbf {P}}_p(0 \leftrightarrow x)/{\mathscr {G}}(0,x) =0\) for every \(p<p_c\) by sharpness of the phase transition [2, 29], it follows by an elementary continuity argument that \({\mathbf {P}}_p(x\leftrightarrow y) \le 2\, {\mathscr {G}}(x,y)\) for every \(0 \le p \le p_c\) and hence that there is no infinite cluster at \(p_c\) as desired. (In fact the bootstrapping hypothesis used in the lace expansion analysis of percolation is more complicated than this, but the essence of the argument is as described.) See [41, 63] for an overview of this method and [16, 65] for recent work simplifying the implementation of the lace expansion for weakly self-avoiding walk.

In this paper we build upon a new version of the subcritical strategy that has been developed in our recent works [39, 44, 45]. The most basic form of the method was first used to prove power-law upper bounds for percolation on groups of exponential growth in [44], while a more sophisticated version of the method, closer to that employed here, was subsequently used to analyze critical percolation on certain groups of stretched-exponential volume growth in joint work with Hermon [39]. Very recently, similar ideas have also been used to prove continuity of the phase transition for the Ising model on nonamenable groups [45].

Let us now outline how this method works. In [44], we built upon the work of Aizenman, Kesten, and Newman [3] to prove an upper bound on the probability of a certain two-arm-type event, which we called the two-ghost inequality, that holds universally for all unimodular transitive graphs. One formulation of this inequality states that if \(G=(V,E)\) is a connected, locally finite, transitive unimodular graph (e.g. \(G={\mathbb {Z}}^d\)) and \({\mathscr {S}}_{e,n}\) denotes the event that the endpoints of the edge e are in distinct clusters each of which touches (i.e., contains a vertex incident to) at least n edges and at least one of which is finite, then

$$\begin{aligned} \sum _{e\in E^\rightarrow _o} {\mathbf {P}}_p({\mathscr {S}}_{e,n}) \le 66 \deg (o) \sqrt{\frac{1-p}{p n}} \end{aligned}$$
(1.6)

for every \(p\in (0,1]\), \(n\ge 1\), and \(o \in V\). An extension of the two-ghost inequality to long-range models (including certain dependent models) was proven in [45, Section 3], and we improve it further in Theorem 3.1. The two-ghost inequality can sometimes be used to prove that the percolation phase transition is continuous via the following rough strategy, a version of which we implement in this paper:

  1.

    Assume as a bootstrapping hypothesis some well-chosen upper bound \({\mathbf {P}}_\beta (|K_o|\ge n) \le h(n)\) for each \(n\ge 1\), where \(h(n)\rightarrow 0\) as \(n\rightarrow \infty \), the bound holds trivially when \(\beta \) is very small, and h decays subexponentially, so that \(\lim _{n\rightarrow \infty } h(n)^{-1} {\mathbf {P}}_\beta (|K_o|\ge n) =0\) for every \(\beta <\beta _c\) by sharpness of the phase transition [29, 46]. Choosing which bound to use is a potentially subtle matter which may involve trial and error. Heuristically, there is a ‘Goldilocks principle’ that needs to be satisfied when choosing the bootstrapping hypothesis: a bound that is too weak will be of too little use as an input to the rest of the argument, while a bound that is too strong will be too difficult to re-derive in a stronger form as required for the bootstrapping argument to come full circle. In particular, any bound decaying faster than \(n^{-1/2}\) cannot possibly work. In this paper we are able to consider power-law upper bounds as seems most natural, while in [40] the optimal upper bound making the argument work was of the form \(C e^{-\log ^\varepsilon n}\) for small \(\varepsilon >0\).

  2.

    Find some way to convert the bootstrapping hypothesis \({\mathbf {P}}_\beta (|K_o|\ge n)\le h(n)\) into a two-point function upper bound \({\mathbf {P}}_\beta (o \leftrightarrow x)\le f(x)\) for some function f that hopefully decays reasonably quickly as \(x\rightarrow \infty \), at least for some well-chosen values of x. In [39], for example, this is done by letting X be a random walk and bounding \({\mathbf {P}}_\beta (o \leftrightarrow X_k)\) via spectral techniques. Here we will instead prove such a bound using hyperscaling inequalities as discussed in Sect. 2.

  3.

    Use the Harris-FKG inequality and a union bound to observe that \({\mathbf {P}}_\beta ({\mathscr {S}}_{o,x,n}') \ge {\mathbf {P}}_\beta (|K_o|\ge n)^2- {\mathbf {P}}_\beta (o\leftrightarrow x)\), where \({\mathscr {S}}_{o,x,n}'\) is the event that o and x belong to distinct clusters of size at least n, then prove an upper bound of the form \({\mathbf {P}}_\beta ({\mathscr {S}}_{o,x,n}') \le F(x) {\mathbf {P}}_\beta ({\mathscr {S}}_{e,n}')\) for some appropriately chosen edge \(e=e(x)\) and some function F(x) that is hopefully not too large. In [39, 44] this second step is done via a surgery argument using the finite-energy property of percolation. In our setting this step is much simpler and more efficient since we can just take e to be the ‘long edge’ connecting o to x and take \(F(x)\equiv 1\).

  4.

    Put steps 2 and 3 together to get an inequality of the form

    $$\begin{aligned} {\mathbf {P}}_\beta (|K_o|\ge n) \le \sqrt{ F(x) {\mathbf {P}}_\beta ({\mathscr {S}}'_{e,n})+f(x)} \end{aligned}$$

    for every \(n\ge 1\) and every vertex x under the assumption that \(\beta <\beta _c\) and that the bootstrapping hypothesis holds. The proof will work if bounding \({\mathbf {P}}_\beta ({\mathscr {S}}'_{e,n})\) using the two-ghost inequality and optimizing over the choice of x leads to a bound \({\mathbf {P}}_\beta (|K_o|\ge n) \le g(n)\) that is a strict improvement of the bootstrapping hypothesis in the sense that \(g(n)<h(n)\) whenever \(h(n) < 1\). (The function g must not depend on the choice of \(0\le \beta < \beta _c\).) Once this has been done successfully, it follows by an elementary continuity argument (using that \(\lim _{n\rightarrow \infty } h(n)^{-1} {\mathbf {P}}_\beta (|K_o|\ge n) =0\) for every \(\beta <\beta _c\)) that the bound \({\mathbf {P}}_\beta (|K_o|\ge n) \le g(n)\) holds for all \(0\le \beta \le \beta _c\) and \(n\ge 1\), and hence that there is no percolation at criticality as desired.
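
In symbols, steps 2 and 3 combine as follows; this is a schematic restatement using the placeholder functions f and F from the list above, not an additional estimate:

$$\begin{aligned} {\mathbf {P}}_\beta (|K_o|\ge n)^2 \le {\mathbf {P}}_\beta ({\mathscr {S}}_{o,x,n}') + {\mathbf {P}}_\beta (o\leftrightarrow x) \le F(x)\, {\mathbf {P}}_\beta ({\mathscr {S}}'_{e,n}) + f(x), \end{aligned}$$

and taking square roots yields the inequality displayed in step 4. The choice of x involves a trade-off: in the proof of Proposition 1.4 below, for instance, the two-ghost term grows with the distance from o to x while the two-point term decays with it.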

1.2 A short proof of a weaker result

In order to give a simple illustration of how the strategy sketched above can be applied to long-range percolation on \({\mathbb {Z}}^d\), we now give a quick proof of a weaker result requiring \(\alpha <d/4\) rather than \(\alpha <d\) and giving a worse upper bound on the exponent \(\delta \).

Proposition 1.4

Let \(d\ge 1\), let \(J:{\mathbb {Z}}^d \rightarrow [0,\infty )\) be symmetric and integrable, and suppose that there exists \(\alpha < d/4\), \(c>0\), and \(r_0<\infty \) such that \(J(x)\ge c \Vert x \Vert _1^{-d-\alpha }\) for every \(x\in {\mathbb {Z}}^d\) with \(\Vert x\Vert _1 \ge r_0\). Then there exists a constant C such that

$$\begin{aligned} {\mathbf {P}}_{\beta }(|K_0| \ge n)&\le C n^{-(d-4\alpha )/(4d)} \end{aligned}$$

for every \(0\le \beta \le \beta _c\) and \(n \ge 1\). In particular, there are almost surely no infinite clusters at the critical parameter \(\beta _c\).

The proof will apply the following special case of the two-ghost inequality of [45, Corollary 3.2]. We will prove a stronger version of this inequality in Sect. 3. For each \(e\in E\) and \(\lambda >0\), we define \({\mathscr {S}}_{e,\lambda }\) to be the event that the endpoints of e are in distinct clusters each of which touches a set of edges of total weight at least \(\lambda \) and at least one of which contains only finitely many vertices.

Theorem 1.5

Let \(G=(V,E,J)\) be a connected, unimodular, transitive weighted graph, let o be a vertex of G, and let \(\beta \ge 0\). Then

$$\begin{aligned} \sum _{e\in E^\rightarrow _o} \sqrt{J_e(e^{\beta J_e}-1)} {\mathbf {P}}_{\beta }({\mathscr {S}}_{e,\lambda }) \le \frac{42}{\sqrt{\lambda }} \qquad \text { for every }\lambda >0. \end{aligned}$$
(1.7)

Proof of Proposition 1.4

By rescaling if necessary, we may assume without loss of generality that \(\sum _{x \in {\mathbb {Z}}^d} J(x) =1\), so that \(\beta _c \ge 1\). Let \(\theta =(d-4\alpha )/(4d)<1/4\). We claim that there exists a constant \(C\ge 1\) such that the following implication holds for each \(1/2 \le \beta < \beta _c\) and \(1 \le A < \infty \):

$$\begin{aligned}&\Bigl ({\mathbf {P}}_\beta (|K_0|\ge n) \le A n^{-\theta }\text { for every } n\ge 1\Bigr ) \nonumber \\&\quad \Rightarrow \Bigl ({\mathbf {P}}_\beta (|K_0|\ge n) \le C A^{1/2} n^{-\theta }\text { for every }n\ge 1\Bigr ). \end{aligned}$$
(1.8)

Indeed, fix one such \(1/2 \le \beta <\beta _c\) and suppose that \(1 \le A <\infty \) is such that \({\mathbf {P}}_\beta (|K_0|\ge n) \le A n^{-\theta }\) for every \(n\ge 1\). All the constants appearing in the remainder of the proof will be allowed to depend on \(d\), \(\alpha \), c, and \(r_0\), but not on the choice of \(1\le A < \infty \) or \(1/2 \le \beta <\beta _c\). For each \(x\in {\mathbb {Z}}^d\) and \(n\ge 1\), let \({\mathscr {S}}_{x,n}'\) be the event that 0 and x belong to distinct clusters each of which contains at least n vertices. Both clusters are automatically finite since \(\beta < \beta _c\). It follows from Theorem 1.5 that there exists a constant \(C_1\) such that

$$\begin{aligned} \sum _{x \in {\mathbb {Z}}^d} J(x)^{1/2}(e^{\beta J(x)}-1)^{1/2} {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}') \le C_1 n^{-1/2} \end{aligned}$$

for every \(n \ge 1\). For each \(r\ge r_0\), define \(\Lambda _r'=\Lambda _r \setminus \Lambda _{r_0-1}\); we will prove bounds depending on both n and r before optimizing over the choice of r later in the proof. Using the inequality \(e^x-1 \ge x\) and the assumption that \(J(x) \ge c \Vert x\Vert _1^{-d-\alpha }\) for every \(x \in {\mathbb {Z}}^d \setminus \Lambda _{r_0-1}\), it follows that there exists a constant \(C_2\) such that

$$\begin{aligned}&\sum _{x\in \Lambda _r'} {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}') \le \max \left\{ J(x)^{-1/2}(e^{\beta J(x)}-1)^{-1/2} : x\in \Lambda _r'\right\} \nonumber \\&\qquad \sum _{x \in {\mathbb {Z}}^d} J(x)^{1/2}(e^{\beta J(x)}-1)^{1/2} {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}') \nonumber \\&\quad \le \max \left\{ \beta ^{-1/2} J(x)^{-1} : x\in \Lambda _r'\right\} C_1 n^{-1/2} \le C_2 r^{d+\alpha } n^{-1/2} \end{aligned}$$
(1.9)

for every \(n\ge 1\) and \(r\ge r_0\). On the other hand, we have trivially that there exists a constant \(C_3\) such that

$$\begin{aligned} \sum _{x \in \Lambda _r'}{\mathbf {P}}_\beta (0 \leftrightarrow x)= & {} {\mathbf {E}}_\beta |K_0 \cap \Lambda _r'| \le {\mathbf {E}}_\beta \left[ |K_0| \wedge |\Lambda _r'| \right] = \sum _{n=1}^{|\Lambda _r'|} {\mathbf {P}}_\beta (|K_0| \ge n) \nonumber \\\le & {} A\sum _{n=1}^{|\Lambda _r'|} n^{-\theta } \le C_3 A r^{d(1-\theta )} \end{aligned}$$
(1.10)

for every \(r\ge r_0\). We now apply these two bounds to obtain a new bound on \({\mathbf {P}}_\beta (|K_0|\ge n)\). We have by a union bound and the Harris-FKG inequality that

$$\begin{aligned} {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}') \ge {\mathbf {P}}_\beta (|K_0|\ge n,|K_x|\ge n)-{\mathbf {P}}_\beta (0 \leftrightarrow x) \ge {\mathbf {P}}_\beta (|K_0|\ge n)^2-{\mathbf {P}}_\beta (0 \leftrightarrow x) \end{aligned}$$

for each \(x\in {\mathbb {Z}}^d\) and \(n\ge 1\). Rearranging yields \({\mathbf {P}}_\beta (|K_0|\ge n)^2 \le {\mathbf {P}}_\beta (0 \leftrightarrow x) + {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}') \) for every \(x\in {\mathbb {Z}}^d\) and \(n\ge 1\), and it follows by averaging over \(x\in \Lambda _r'\) that there exists a constant \(C_4\) such that

$$\begin{aligned} {\mathbf {P}}_\beta (|K_0|\ge n)^2\le & {} \frac{1}{|\Lambda _r'|}\sum _{x\in \Lambda _r'} {\mathbf {P}}_\beta (0 \leftrightarrow x) \\&+ \frac{1}{|\Lambda _r'|}\sum _{x\in \Lambda _r'}{\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}') \le C_4 A r^{-d\theta } + C_4 r^{\alpha } n^{-1/2} \end{aligned}$$

for every \(r\ge r_0\) and \(n\ge 1\), where we applied (1.10) and (1.9) in the second inequality. Taking \(r=r_0 \vee \left\lceil n^{(1-4\theta )/(2\alpha )}\right\rceil \) yields that there exists a constant \(C_5\) such that

$$\begin{aligned} {\mathbf {P}}_\beta (|K_0|\ge n)^2&\le C_5 A n^{-d \theta (1-4\theta )/(2\alpha )} + C_5 n^{-2\theta } = C_5 (A+1) n^{-2\theta } \le 2 C_5 A n^{-2\theta } \end{aligned}$$
(1.11)

for every \(n \ge 1\), where we used that \(\theta =(d-4\alpha )/(4d)\) in the central equality. (We arrived at this value of \(\theta \) by getting to this stage of the calculation with \(\theta \) indeterminate and solving for the value of \(\theta \) that made the two powers of n equal.) The inequality (1.11) implies the claimed implication (1.8) by taking square roots on both sides.
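
For the reader who wishes to reproduce this calculation, the following sketch leaves \(\theta \) indeterminate, assumes \(\alpha >0\), and ignores the ceiling and the maximum with \(r_0\), which affect only constants; the auxiliary exponent s is introduced only for this illustration. Taking \(r=n^{s}\), the two terms \(C_4 A r^{-d\theta }\) and \(C_4 r^{\alpha } n^{-1/2}\) are of order \(n^{-d\theta s}\) and \(n^{\alpha s-1/2}\) respectively. Matching the second to the target \(n^{-2\theta }\) forces \(s=(1-4\theta )/(2\alpha )\), and matching the first to the same target then requires

$$\begin{aligned} \frac{d\theta (1-4\theta )}{2\alpha } = 2\theta \quad \Longleftrightarrow \quad 1-4\theta = \frac{4\alpha }{d} \quad \Longleftrightarrow \quad \theta = \frac{d-4\alpha }{4d}, \end{aligned}$$

which is positive precisely when \(\alpha <d/4\).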

We now apply the bootstrapping implication (1.8) to complete the proof of the proposition. For each \(1/2 \le \beta < \beta _c\), we have by sharpness of the phase transition [2, 29] that \(|K_0|\) has finite mean (indeed, it has an exponential tail), and in particular that there exists \(1 \le A < \infty \) such that \({\mathbf {P}}_\beta (|K_0|\ge n) \le A n^{-\theta }\) for every \(n\ge 1\). For each \(1/2\le \beta < \beta _c\) we may therefore define

$$\begin{aligned} A_\beta = \min \bigl \{1 \le A< \infty : {\mathbf {P}}_\beta (|K_0|\ge n) \le A n^{-\theta } \text { for every }n\ge 1\bigr \} < \infty . \end{aligned}$$

Since the set we are minimizing over is closed, we have that \({\mathbf {P}}_\beta (|K_0|\ge n) \le A_\beta n^{-\theta }\) for every \(n\ge 1\) and \(1/2 \le \beta < \beta _c\). Moreover, (1.8) implies that there exists a constant C such that \(A_\beta \le C A_\beta ^{1/2}\) for every \(1/2 \le \beta < \beta _c\), and since \(A_\beta \) is finite for every \(1/2\le \beta <\beta _c\) we may safely rearrange this inequality to obtain that \(A_\beta \le C^2\) for every \(1/2\le \beta <\beta _c\). Thus, we have proven that \({\mathbf {P}}_\beta (|K_0|\ge n) \le C^{2} n^{-\theta }\) for every \(1/2 \le \beta < \beta _c\). Considering the standard monotone coupling of \({\mathbf {P}}_\beta \) and \({\mathbf {P}}_{\beta '}\) for \(\beta \le \beta '\) and taking limits, it follows that the same estimate holds for all \(0\le \beta \le \beta _c\) and \(n\ge 1\) as claimed. \(\square \)

In order to prove Theorem 1.1, we will improve the above proof in two ways: In Sect. 2 we develop a better method to convert volume-tail bounds into two-point function bounds than the primitive method used in (1.10), while in Sect. 3 we prove an improved form of the two-ghost inequality that gives better bounds on \({\mathbf {P}}_\beta ({\mathscr {S}}_{e,n})\) in the case that e is a typical ‘long’ edge. (Each improvement can be used in isolation to prove a result of intermediate strength requiring \(\alpha <d/2\).)

1.3 Comparison to physics predictions

We now give a brief heuristic discussion of how our exponent bounds compare to the values predicted in the physics literature. Building upon the work of Sak [60] on long-range Ising models (see also e.g. [9, 55]), physicists including Brezin, Parisi, and Ricci-Tersenghi [19] have argued that if \(\eta (d,\alpha )\) and \(\delta (d,\alpha )\) denote the values of the exponents \(\eta \) and \(\delta \) for long-range percolation in dimension d with long-range parameter \(\alpha \) and \(\eta _\mathrm {SR}(d)\) and \(\delta _\mathrm {SR}(d)\) denote the corresponding nearest-neighbour exponents then

$$\begin{aligned} 2-\eta (d,\alpha ) = {\left\{ \begin{array}{ll} \alpha &{} \alpha \le 2-\eta _\mathrm {SR}(d)\\ 2-\eta _\mathrm {SR}(d) &{} \alpha >2-\eta _\mathrm {SR}(d), \end{array}\right. } \end{aligned}$$
(1.12)

with logarithmic corrections to scaling at the ‘crossover’ value \(\alpha ^*(d) = 2-\eta _\mathrm {SR}(d)\). In particular, the exponent \(\eta (d,\alpha )\) is predicted to ‘stick’ to its mean-field value of \(2-\alpha \) in the interval \((d/3,\alpha ^*]\), even though other exponents such as \(\delta \) are not expected to take their mean-field values in this interval. Numerical work supporting these predictions has recently been carried out in [31]. See [21, 22, 42] for rigorous proofs in certain high-dimensional cases and [53, 64] for related rigorous results for the long-range spin O(n) model. Assuming further that \(\delta (d,\alpha )\) takes its mean-field value of 2 when \(\alpha \le d/3\) and that the hyperscaling relation \((2-\eta )(\delta +1)=d(\delta -1)\) is satisfied when \(\alpha \ge d/3\) yields that

$$\begin{aligned} \delta (d,\alpha ) = {\left\{ \begin{array}{ll} 2 &{} 0< \alpha \le d/3\\ (d+\alpha )/(d-\alpha ) &{} d/3 \le \alpha \le \alpha ^*(d)\\ \delta _\mathrm {SR}(d) &{} \alpha ^*(d) \le \alpha <\infty . \end{array}\right. } \end{aligned}$$
(1.13)
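The predictions (1.12) and (1.13) are simple piecewise formulas, and it can help to have them in executable form. The sketch below is illustrative only: the crossover value \(\alpha ^*(d)\) and the short-range exponent \(\delta _\mathrm {SR}(d)\) are passed in as inputs (here named `alpha_star` and `delta_sr`, which are our own names) rather than computed, since they are known only numerically in general.

```python
def predicted_two_minus_eta(alpha, alpha_star):
    """Predicted value of 2 - eta from (1.12): alpha up to the crossover, then constant."""
    return min(alpha, alpha_star)

def predicted_delta(d, alpha, alpha_star, delta_sr):
    """Predicted value of delta from (1.13)."""
    if alpha <= d / 3:
        return 2.0                          # mean-field regime
    if alpha <= alpha_star:
        return (d + alpha) / (d - alpha)    # hyperscaling with 2 - eta = alpha
    return delta_sr                         # short-range regime

# Example with the d = 3 numerical inputs quoted in the caption of Fig. 2.
print(predicted_delta(3, 1.5, 2.0457, 5.2886))
```

In the middle regime the returned value satisfies \(\alpha (\delta +1)=d(\delta -1)\), which is the hyperscaling relation with \(2-\eta =\alpha \).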

As discussed above, it is strongly expected and known in some cases that \(\eta _\mathrm {SR}(2)=5/24\) and that \(\eta _\mathrm {SR}(d)=0\) when \(d\ge 6\). On the other hand, it is believed that \(\eta _\mathrm {SR}\) takes small negative values for \(d\in \{3,4,5\}\): Both numerical estimates [54, 71, 75, 77] and non-rigorous renormalization group methods [32] give values ranging between \(-0.1\) and \(-0.01\) in all three cases. (See the Wikipedia page https://en.wikipedia.org/wiki/Percolation_critical_exponents for a summary.) As such, it is believed that \(\alpha ^*(d)<d\) for every \(d\ge 2\) and hence that the models treated by Theorem 1.1 should include examples in the same universality class as nearest-neighbour Bernoulli bond percolation on each lattice of dimension \(d\ge 2\). (Proving such a universality claim would, however, require a vastly better understanding of these models than that provided by Theorem 1.1.) The bounds we obtain on the exponents for these models are of reasonable order, with our upper bounds on \(\delta (d,\alpha )\) always within a factor of 2 of the predicted true values when \(\alpha \le \alpha ^*(d) =2-\eta _\mathrm {SR}(d)\). See Figs. 1 and 2 for side-by-side comparisons in two and three dimensions.

Fig. 1

Our upper bounds (blue) versus the conjectured true values (red) of \(2-\eta \) and \(\delta \) when \(d=2\).

Fig. 2

Our upper bounds (blue) versus the conjectured true values (red) of \(2-\eta \) and \(\delta \) when \(d=3\). Here we use the numerical values \(\alpha ^*(3)=2-\eta _\mathrm {SR}(3)\approx 2.0457\) and \(\delta _\mathrm {SR}(3)\approx 5.2886\) obtained by applying the scaling and hyperscaling relations to the numerical estimates on the exponents \(\nu \) and \(\beta /\nu \) obtained by Wang et al. in [75]. When \(\alpha =2.0457\approx \alpha ^*(3)\) our upper bound on \(\delta \) is about 8.43.

2 Hyperscaling inequalities and the maximum cluster size in a box

The proof of Proposition 1.4 made use of the fact that if Bernoulli bond percolation on some weighted graph \(G=(V,E,J)\) satisfies a bound of the form \(\sup _{v\in V}{\mathbf {P}}_\beta (|K_v|\ge n) \le A n^{-\theta }\) for some \(0\le \theta <1\) and \(A<\infty \) then we have that

$$\begin{aligned} \sum _{v\in \Lambda }{\mathbf {P}}_\beta (u \leftrightarrow v) = {\mathbf {E}}_\beta |K_u \cap \Lambda | \le {\mathbf {E}}_\beta \left[ |K_u| \wedge |\Lambda | \right] \le A\sum _{n=1}^{|\Lambda |} n^{-\theta } \le C(\theta )A |\Lambda |^{1-\theta }\nonumber \\ \end{aligned}$$
(2.1)

for every \(\Lambda \subseteq V\) and \(u\in V\), where \(C(\theta )=O(1/(1-\theta ))\) depends only on \(\theta \). Tasaki [69] observed that this inequality, which holds for arbitrary random graph models on \({\mathbb {Z}}^d\), can be thought of as giving a primitive hyperscaling inequality \((2-\eta )\delta \le d(\delta -1)\). In this section, we prove an inequality implying the stronger hyperscaling inequality \((2-\eta )(\delta +1) \le d(\delta -1)\). Note that while the arguments in the rest of the paper can all be applied to certain dependent percolation models including the random-cluster model with only a little extra work, the arguments in this section rely on the BK inequality in an essential way and are therefore very specific to Bernoulli percolation.
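The last step of (2.1) is the elementary comparison \(\sum _{n=1}^{N} n^{-\theta } \le \int _0^N t^{-\theta }\,\mathrm {d}t = N^{1-\theta }/(1-\theta )\), which is where the factor \(C(\theta )=O(1/(1-\theta ))\) comes from. A quick numerical sanity check, written as a hedged sketch with illustrative function names:

```python
def naive_sum(theta, size):
    """Left-hand side: sum of n^(-theta) for n = 1, ..., size."""
    return sum(n ** -theta for n in range(1, size + 1))

def integral_bound(theta, size):
    """Right-hand side: size^(1 - theta) / (1 - theta), valid for 0 <= theta < 1."""
    return size ** (1 - theta) / (1 - theta)

for theta in (0.0, 0.25, 0.5, 0.9):
    print(theta, naive_sum(theta, 1000), integral_bound(theta, 1000))
```

The bound degrades as \(\theta \uparrow 1\), consistent with the restriction \(0\le \theta <1\).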

Let us now briefly review what is known about scaling and hyperscaling relations for Bernoulli percolation. In addition to the critical exponents \(\delta \) and \(\eta \) that we have already introduced, it is also believed that there exist exponents \(\gamma , \Delta ,\) \(\rho \), and \(\beta \) such that

$$\begin{aligned}&{\mathbf {E}}_{\beta _c-\varepsilon }\left[ |K_0|^k \right] \approx \varepsilon ^{-\gamma - \Delta (k-1)}&\text { as }\varepsilon \downarrow 0\text { for each }k\ge 1\\&{\mathbf {P}}_{\beta _c}(0 \leftrightarrow \partial [-r,r]^d) \approx r^{-1/\rho }&\text { as }r \uparrow \infty ,\text { and}\\&{\mathbf {P}}_{\beta _c+\varepsilon }(|K_0|=\infty ) \approx \varepsilon ^\beta&\text { as } \varepsilon \downarrow 0. \end{aligned}$$

As before, \(\approx \) means that the ratio of the logarithms of the two sides tends to 1 in the relevant limit. A further critical exponent \(\nu \) is expected to describe the correlation length \(\xi (\beta )\) through the asymptotics \(\xi (\beta _c-\varepsilon ) \approx \varepsilon ^{-\nu }\) as \(\varepsilon \downarrow 0\). Intuitively the correlation length is the scale on which off-critical behaviour begins to manifest itself, see [33, Section 6.2] for a precise definition in the nearest-neighbour context. Heuristic scaling theory predicts that these exponents always satisfy the scaling relations

$$\begin{aligned} \gamma = \beta (\delta -1), \qquad \beta \delta = \Delta , \qquad \text { and } \qquad \gamma = \nu (2-\eta ). \end{aligned}$$
(2.2)

Below the upper critical dimension, two additional relations between these exponents known as the hyperscaling relations are expected to hold, namely

$$\begin{aligned} d \rho = \delta + 1 \qquad \text { and } \qquad d \nu = \beta (\delta +1). \end{aligned}$$
(2.3)

Note that the hyperscaling relations involve the dimension d while the scaling relations do not. It is a heuristic originally due to Coniglio [23] that the hyperscaling relations should hold if there are typically O(1) ‘large’ critical clusters on any given scale. This condition is believed to hold below the upper critical dimension but not above; see [1, 17] for detailed discussions. See [33, Section 9.1] for an overview of the heuristic arguments in support of the scaling and hyperscaling relations.

For nearest-neighbour percolation on two-dimensional planar lattices, the scaling relations (2.2) and hyperscaling relations (2.3) were proven to hold by Kesten [49] under the assumption that the exponents \(\delta \) and \(\nu \) are both well-defined. (Kesten’s results were of central importance to the subsequent computation of the critical exponents for site percolation on the triangular lattice following Smirnov’s proof of conformal invariance [52, 66, 67].) See also [74] for related results on two-dimensional Voronoi percolation. Meanwhile, in high dimensions, it is now known that the exponents \(\beta ,\gamma ,\delta ,\Delta ,\eta ,\rho ,\) and \(\nu \) all take their mean-field values in nearest-neighbour percolation with \(d\gg 6\), from which it follows that the scaling relations (2.2) are satisfied but that the hyperscaling relations (2.3) are violated; see [41] for an overview and [4, 6, 21, 35, 37, 51] for highlights of the high-dimensional literature.
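The two regimes described here can be checked against (2.2) and (2.3) by direct substitution. The sketch below assumes the standard exponent values from the literature, which are not restated in this section: the planar values (\(\beta =5/36\), \(\gamma =43/18\), \(\delta =91/5\), \(\Delta =91/36\), \(\eta =5/24\), \(\nu =4/3\), \(\rho =48/5\)) and the mean-field values (\(\beta =\gamma =1\), \(\delta =\Delta =2\), \(\eta =0\), \(\nu =\rho =1/2\)).

```python
from fractions import Fraction as F

def relations(d, e):
    """Return (scaling relations (2.2) hold, hyperscaling relations (2.3) hold)."""
    scaling = (e["gamma"] == e["beta"] * (e["delta"] - 1)
               and e["beta"] * e["delta"] == e["Delta"]
               and e["gamma"] == e["nu"] * (2 - e["eta"]))
    hyperscaling = (d * e["rho"] == e["delta"] + 1
                    and d * e["nu"] == e["beta"] * (e["delta"] + 1))
    return scaling, hyperscaling

# Assumed inputs: standard planar and mean-field exponent values.
PLANAR = dict(beta=F(5, 36), gamma=F(43, 18), delta=F(91, 5), Delta=F(91, 36),
              eta=F(5, 24), nu=F(4, 3), rho=F(48, 5))
MEAN_FIELD = dict(beta=F(1), gamma=F(1), delta=F(2), Delta=F(2),
                  eta=F(0), nu=F(1, 2), rho=F(1, 2))

print(relations(2, PLANAR), relations(7, MEAN_FIELD))
```

With mean-field exponents both hyperscaling relations reduce to \(d=6\), which is why they fail in every dimension \(d>6\) while the scaling relations continue to hold.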

It remains completely open to prove that the scaling and hyperscaling relations hold in dimensions \(2<d\le 6\), even if one assumes that all the relevant exponents are well-defined. The most significant progress is due to Borgs, Chayes, Kesten, and Spencer [17, 18], who proved in particular that the scaling and hyperscaling relations both hold in low-dimensional lattices for which \(\rho \) is well-defined under the (as yet unproven) assumption that the number of clusters crossing the box \([0,r]\times [0,3r]^{d-1}\) in the easy direction is tight as \(r\rightarrow \infty \). Their proof also yields that the hyperscaling inequalities

$$\begin{aligned} d\rho \ge \delta +1 \qquad \text { and } \qquad d-2+\eta \ge 2/\rho \end{aligned}$$

hold on any graph for which these exponents are well-defined. Many further works have established various other inequalities between critical exponents; see the work of Tasaki [69, 70] for hyperscaling inequalities and the recent work [46] and references therein for an overview of scaling inequalities.

The main goal of this section is to prove the following theorem, which improves significantly upon the naive bound of (2.1).

Theorem 2.1

There exists a universal constant C such that the following holds. Let \(G=(V,E,J)\) be a weighted graph, let \(\beta \ge 0\), and suppose that there exist constants \(A<\infty \) and \(0 \le \theta \le 1/2\) such that \({\mathbf {P}}_\beta ( |K_u| \ge \lambda ) \le A \lambda ^{-\theta }\) for every \(u\in V\) and \(\lambda >0\). Then

$$\begin{aligned} \frac{1}{|\Lambda |}\sum _{v \in \Lambda } {\mathbf {P}}_\beta (u\leftrightarrow v) \le C A^{2 /(1+\theta )} |\Lambda |^{-2\theta /(1+\theta )} \end{aligned}$$

for every \(u\in V\) and every finite set \(\Lambda \subseteq V\).

In the context of \({\mathbb {Z}}^d\), it follows from Theorem 2.1 that if the exponents \(\eta \) and \(\delta \) are both well-defined then they satisfy the hyperscaling inequality

$$\begin{aligned} (2 -\eta )(\delta +1) \le d(\delta -1). \end{aligned}$$
(2.4)

Indeed, if \(\eta \) and \(\delta \) are both well-defined then either \(\eta \ge 2\), in which case (2.4) is trivial, or we can apply Theorem 2.1 with \(\theta =1/\delta -\varepsilon \) for \(\varepsilon >0\) arbitrary (noting that \(\delta \ge 2\) when it is well defined [2, 29]) to compute that

$$\begin{aligned}&r^{-d+2-\eta } \approx r^{-d} \sum _{x \in \Lambda _r} \Vert x\Vert ^{-d+2-\eta } \approx r^{-d} \sum _{x \in \Lambda _r} {\mathbf {P}}_\beta (0\leftrightarrow x) \\&\quad \lesssim r^{-2d/(\delta +1)} \qquad \text { as }r\rightarrow \infty , \end{aligned}$$

where we write \(\lesssim \) to mean that the ratio of the logarithms of the left and right hand sides has limit supremum less than 1. This inequality may be rearranged to prove the inequality (2.4) in the case \(\eta < 2\). We remark that the inequality (2.4) is expected to be an equality below the upper critical dimension, as would follow from the validity of the scaling and hyperscaling relations.
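For concreteness, the rearrangement can be spelled out at the level of exponents. This is a sketch in which the arbitrary \(\varepsilon >0\) has already been sent to zero, so that \(\theta =1/\delta \), and in which we use that \(|\Lambda _r|\) is of order \(r^d\):

$$\begin{aligned} -\frac{2\theta }{1+\theta }\cdot d = -\frac{2d}{\delta +1}, \qquad -d+2-\eta \le -\frac{2d}{\delta +1} \;\Longleftrightarrow \; 2-\eta \le \frac{d(\delta -1)}{\delta +1} \;\Longleftrightarrow \; (2-\eta )(\delta +1) \le d(\delta -1). \end{aligned}$$

The first identity converts the exponent of \(|\Lambda |\) in Theorem 2.1 into a power of r, and the chain of equivalences is exactly (2.4).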

In order to prove Theorem 2.1, we first prove in Sect. 2.1 a universal inequality implying in particular that the maximum size of the intersection of a cluster with a finite set is exponentially unlikely to be much larger than its median value, Theorem 2.2. This inequality is proven by a combinatorial argument using the BK inequality. We then deduce Theorem 2.1 from this inequality in Sect. 2.2 by a fairly straightforward calculation.

2.1 Universal tightness of the maximum cluster size in a finite region

Let \(G=(V,E,J)\) be a countable weighted graph, and consider Bernoulli bond percolation on G with parameter \(\beta \ge 0\). For each finite subset \(\Lambda \) of V, we define

$$\begin{aligned} |K_\mathrm {max}(\Lambda )|=\max \{|K_v \cap \Lambda | : v\in V\}=\max \{|K_v \cap \Lambda | : v\in \Lambda \}. \end{aligned}$$

(This is a slight abuse of notation: there may be more than one cluster achieving this maximum, so that \(K_\mathrm {max}(\Lambda )\) need not be well-defined as a set in general.) In this section we prove a general inequality, applying universally to all G, \(\beta \), and \(\Lambda \), implying that \(|K_\mathrm {max}(\Lambda )|\) is of the same order as its ‘typical value’ \(M_\beta (\Lambda ):=\min \{n \ge 0 : {\mathbf {P}}_\beta (|K_\mathrm {max}(\Lambda )| \ge n)\le e^{-1} \}\) with high probability. In particular, one simple consequence of this theorem is that \(e^{-1} M_\beta (\Lambda ) \le {\mathbf {E}}_\beta |K_\mathrm {max} (\Lambda )| \le 10 M_\beta (\Lambda )\), so that the mean and typical value of \(|K_\mathrm {max}(\Lambda )|\) are always of the same order. We expect that the inequalities we prove in this section will have many further applications in the future.

Theorem 2.2

(Universal tightness of the maximum cluster size). Let \(G=(V,E,J)\) be a countable weighted graph and let \(\Lambda \subseteq V\) be finite and non-empty. Then the inequalities

$$\begin{aligned} {\mathbf {P}}_\beta \Bigl (|K_\mathrm {max}(\Lambda )| \ge \alpha M_\beta (\Lambda )\Bigr )&\le \exp \left( -\frac{1}{9}\alpha \right) \end{aligned}$$
(2.5)
$$\begin{aligned} \text {and} \qquad {\mathbf {P}}_\beta \Bigl (|K_\mathrm {max}(\Lambda )| < \varepsilon M_\beta (\Lambda ) \Bigr )&\le 27 \varepsilon \qquad \, \qquad \end{aligned}$$
(2.6)

hold for every \(\beta \ge 0\), \(\alpha \ge 1\), and \(0<\varepsilon \le 1\). Moreover, the inequality

$$\begin{aligned} {\mathbf {P}}_\beta \Bigl (|K_u \cap \Lambda | \ge \alpha M_\beta (\Lambda )\Bigr ) \le e \cdot {\mathbf {P}}_\beta \Bigl (|K_u \cap \Lambda | \ge M_\beta (\Lambda )\Bigr ) \exp \left( -\frac{1}{9}\alpha \right) \end{aligned}$$
(2.7)

holds for every \(\beta \ge 0\), \(\alpha \ge 1\), and \(u \in V\).

We will deduce this theorem as a corollary of the following more general inequality.

Theorem 2.3

Let \(G=(V,E,J)\) be a countable weighted graph and let \(\Lambda \subseteq V\) be finite and non-empty. Then the inequalities

$$\begin{aligned}&{\mathbf {P}}_\beta \bigl (|K_\mathrm {max}(\Lambda )| \ge 3^k \lambda \bigr ) \le {\mathbf {P}}_\beta \bigl (|K_\mathrm {max}(\Lambda )| \ge \lambda \bigr )^{3^{k-1}+1} \end{aligned}$$
(2.8)
$$\begin{aligned} \text {and}&\,\, {\mathbf {P}}_\beta \bigl (|K_u \cap \Lambda | \ge 3^k \lambda \bigr ) \le {\mathbf {P}}_\beta \bigl (|K_\mathrm {max}(\Lambda )| \ge \lambda \bigr )^{3^{k-1}} {\mathbf {P}}_\beta \bigl (|K_u \cap \Lambda | \ge \lambda \bigr ) \end{aligned}$$
(2.9)

hold for every \(\beta \ge 0\), \(\lambda \ge 1\), \(k\ge 1\), and \(u\in V\).

(This theorem does not require \(\lambda \) to be an integer.)

Proof of Theorem 2.2 given Theorem 2.3

The inequalities (2.5) and (2.7) are trivial when \(\alpha \le 9\), while for \(\alpha \ge 9\) they follow immediately from (2.8) and (2.9) by taking \(\lambda =M_\beta (\Lambda )\) and \(k=\lfloor \log _3 \alpha \rfloor \ge 1\) and using that \(3^{\lfloor \log _3 \alpha \rfloor -1} \ge \alpha /9\). We now turn to (2.6). Write \(M=M_\beta (\Lambda )\). The inequality is trivial if \(\varepsilon M <1\) or \(9 \varepsilon \ge 1\), so we may assume that \(M \ge 1/ \varepsilon \ge 9\). The definitions ensure that \({\mathbf {P}}_\beta (|K_\mathrm {max}(\Lambda )|\ge M-1) \ge e^{-1}\). Let \(k=\lfloor \log _3 (1/\varepsilon )\rfloor -1\), so that \(3^{-k}(M-1) \ge 3\varepsilon (M-1) \ge \varepsilon M\). The inequality (2.8) implies that

$$\begin{aligned} {\mathbf {P}}_\beta \Bigl (|K_\mathrm {max}(\Lambda )| \ge M -1 \Bigr )\le & {} {\mathbf {P}}_\beta \Bigl (|K_\mathrm {max}(\Lambda )| \ge 3^{-k} (M-1) \Bigr )^{3^{k-1}} \\\le & {} {\mathbf {P}}_\beta \Bigl (|K_\mathrm {max}(\Lambda )| \ge \varepsilon M \Bigr )^{1/(27\varepsilon )}, \end{aligned}$$

which can be rearranged to yield that

$$\begin{aligned} {\mathbf {P}}_\beta \Bigl (|K_\mathrm {max}(\Lambda )| < \varepsilon M \Bigr ) \le 1-{\mathbf {P}}_\beta \Bigl (|K_\mathrm {max}(\Lambda )| \ge M -1 \Bigr )^{27\varepsilon } \le 1- e^{-27\varepsilon } \le 27\varepsilon \end{aligned}$$

as claimed, where we used that \(1-e^{-x} \le x\) in the third inequality. \(\square \)

We now turn to the proof of Theorem 2.3. We will deduce the theorem as a consequence of the BK inequality together with the following combinatorial lemma.

Lemma 2.4

Let \(G=(V,E)\) be a connected, locally finite graph, let \(k\ge 1\), and let A be a finite subset of V such that \(|A| \ge 3^k\). Then there exists \(m \ge 3^{k-1}+1\) and a collection \(\{E_i : 1 \le i \le m\}\) of disjoint, non-empty subsets of E such that the following hold:

  1. For each \(1\le i \le m\), the subgraph of G spanned by \(E_i\) is connected.

  2. Every vertex in V is incident to some edge in \(\bigcup _{i=1}^{m} E_i\).

  3. The set \(V_i\) of vertices incident to an edge of \(E_i\) satisfies

    $$\begin{aligned} 3^{-k} \le \frac{|A \cap V_i|}{|A|} < 3^{-k+1} \end{aligned}$$

    for each \(1 \le i \le m\).

When G is finite, the proof of this lemma can be used to derive an explicit divide-and-conquer algorithm for finding such a collection of sets \(E_1,\ldots ,E_{m}\) after taking a spanning tree of G.

Proof of Lemma 2.4

We may without loss of generality assume that \(G=T\) is a tree, taking a spanning tree of G otherwise. In this case we will prove the stronger claim that the sets \(\{E_i : 1 \le i \le m\}\) can be taken to be a partition of E. (Here, a partition of E is a set of disjoint subsets of E whose union is E.)

We say that a partition \(\pi \) of E is good if each piece of \(\pi \) spans a connected subgraph of T.

We first prove that if \(T=(V,E)\) is a locally finite tree and \(A \subseteq V\) has \(3\le |A| < \infty \) then there exists a good partition of E into two non-empty sets \(E_1\) and \(E_2\) such that if \(V(E_i)\) denotes the set of vertices incident to at least one edge of \(E_i\) then

$$\begin{aligned} \frac{1}{3}|A| \le |A \cap V(E_i)| \le \left\lceil \frac{2}{3}|A|\right\rceil \end{aligned}$$

for each \(i=1,2\). Let \(\rho \) be a vertex of T. We root T at \(\rho \), and call a vertex v a descendant of an edge e if the unique shortest path from \(\rho \) to v contains e. We will iteratively define a sequence \((v_n,W_n)_{n= 0}^N\), where \(1 \le N \le \infty \), \(v_n \in V\), and \(W_n \subseteq E\) for each \(0 \le n \le N\). Start by setting \(v_0=\rho \) and \(W_0=\emptyset \). At each intermediate stage \(0<n<N\) of the sequence, \(W_n\) and \(W_n^c\) will both be non-empty and span connected subgraphs of T, while \(v_n\) will be incident to edges of both \(W_n\) and \(W_n^c\). Given \((v_n,W_n)\) for some \(n\ge 0\), we let \(V_n\) be the set of vertices of T that are either equal to \(\rho \) or incident to some edge of \(W_n\), and define \((v_{n+1},W_{n+1})\) using the following procedure, which is illustrated in Fig. 3.

  1. If \(v_n\) has exactly one edge \(e\in W_n^c\) adjacent to it, we set \(W_{n+1} = W_n \cup \{e\}\) and set \(v_{n+1}\) to be the other endpoint of this edge. If \(W_{n+1}=E\) then we set \(N=n+1\) and terminate the sequence.

  2. Otherwise, \(v_{n}\) has at least two edges of \(W_n^c\) adjacent to it. Enumerate these edges \(e_1,\ldots ,e_\ell \), and let \(D_i\) be the set of descendants of \(e_i\) for each \(1\le i \le \ell \). Since \(\sum _{i=1}^\ell |D_i \cap A| = |V_n^c \cap A|\), there must exist \(1\le i \le \ell \) such that \(|D_i \cap A| \le |V_n^c \cap A|/2\). Choose one such i, set \(W_{n+1}\) to be the union of \(W_n\) with the set of edges incident to \(D_i\) (i.e., having at least one endpoint in \(D_i\)), and set \(v_{n+1}=v_n\).

Fig. 3

A sequence of good partitions of a tree as constructed in the proof of Lemma 2.4. Red vertices represent elements of the distinguished set A. At each stage, edges belonging to \(W_n\) are thick, blue, and contained in the blue shaded region, while edges belonging to the complement \(W_n^c\) are thin and black. The vertex \(v_n\), which lies at the boundary of \(W_n\) and \(W_n^c\), is represented with a green outer ring. In this example, \(W_n\) is first incident to more than 1/3 of the vertices of A when \(n=2\), in which case \(W_n\) and \(W_n^c\) are both incident to exactly 3 vertices of A.

We may verify by induction that \(W_n\) and \(W_n^c\) are indeed both non-empty and span connected subgraphs of T for every \(0< n < N\) as claimed, so that \(\{W_n,W_n^c\}\) is a good partition of E for every \(0< n < N\). Moreover, the assumption that T is locally finite implies that \(\bigcup _{n=0}^N W_n = E\) and hence that \(\bigcup _{n=0}^N V_n = V\). Since A is finite and \(\bigcup _{n=0}^N V_n = V\), there exists a finite time \(N'\le N\) such that \(V_n\) contains A for every \( N' \le n \le N\). Observe that the set \(\{0\le n \le N' : |V_n \cap A| > |A|/3\}\) contains \(N'\) but does not contain 0 since \(|A| \ge 3\). Letting \(m \ge 1\) be the minimal element of this set, we have that

$$\begin{aligned} \frac{1}{3}|A|< & {} |V_{m} \cap A| \le |V_{m-1} \cap A| + \max \Bigl \{ 1,\, \frac{1}{2}|V_{m-1}^c \cap A| \Bigr \} \\= & {} \max \Bigl \{ |V_{m-1} \cap A| + 1,\, \frac{1}{2}\left( |A|+|V_{m-1} \cap A|\right) \Bigr \} \le \frac{2}{3}|A|, \end{aligned}$$

where we used that \(|A| \ge 3\) in the final inequality. It follows in particular that \(0< m < N\). Moreover, if \(V_m'\) denotes the set of vertices incident to an edge of \(W_m^c\) then \(V_m \cup V_m'=V\) and \(|V_m \cap V_m'|\le 1\), so that

$$\begin{aligned} \frac{1}{3} |A| \le |A| - |V_m \cap A| \le |V_m'\cap A| \le |A|- |V_m \cap A| +1 \le \left\lceil \frac{2}{3}|A|\right\rceil , \end{aligned}$$

where the final inequality follows since \(|V_m \cap A| > \frac{1}{3}|A|\) is an integer. It follows that \(\{W_{m},W_{m}^c\}\) is a good partition of E with the desired properties.

We now apply the claim proven in the previous paragraph to complete the proof of the lemma. Let \(T=(V,E)\) be a locally finite tree, let \(k\ge 1\), and let \(A \subseteq V\) satisfy \(3^k \le |A| < \infty \). For each subset K of E, write V(K) for the set of vertices incident to an edge of K. Construct a sequence \(\pi _0,\pi _1,\ldots \) of good partitions of E recursively as follows. Set \(\pi _0=\{E\}\) to be the trivial partition. For each \(i\ge 0\), define \(\pi _{i+1}\) by retaining those pieces W of \(\pi _i\) satisfying \(|V(W) \cap A| < 3^{-k+1} |A|\) and splitting each piece W of \(\pi _i\) with \(|V(W) \cap A| \ge 3^{-k+1}|A|\) into two pieces \(W_1\) and \(W_2\) that each span connected subgraphs of T and satisfy

$$\begin{aligned} \frac{1}{3}|V(W) \cap A| \le |V(W_1) \cap A|,|V(W_2) \cap A| \le \left\lceil \frac{2}{3}|V(W) \cap A|\right\rceil . \end{aligned}$$

This can be done by applying the claim proven in the previous paragraph to the subgraph of T spanned by the piece W. We have by induction on i that

$$\begin{aligned} \min \{|V(W) \cap A| : W \in \pi _{i} \} \ge 3^{-k} |A| \end{aligned}$$

for every \(i\ge 0\). Moreover, noting that \(\lceil 2x/3 \rceil < 9x/10\) for every integer \(x\ge 3\), we also have by induction on i that

$$\begin{aligned} \max \{|V(W) \cap A| : W \in \pi _{i} \} < \max \left\{ 3^{-k+1}|A|,\, \left( \frac{9}{10}\right) ^i |A| \right\} \end{aligned}$$

for every \(i \ge 0\). It follows that there exists \(i_0<\infty \) such that every piece W of the good partition \(\pi _{i_0}\) satisfies \(3^{-k}|A| \le |V(W) \cap A| < 3^{-k+1}|A|\) as required. Letting \(m=|\pi _{i_0}|\) we have that \(3^{-k+1} |A| m > \sum _{W \in \pi _{i_0}} |V(W) \cap A| \ge |A|\) and hence that \(m \ge 3^{k-1}+1\) as desired. \(\square \)
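The proof above is constructive, and for a finite tree it translates into a short divide-and-conquer procedure. The following is a minimal sketch under stated assumptions rather than a definitive implementation: the tree is given as a list of edges with comparable vertex labels, A is a subset of its vertices, and the names `split` and `partition_tree` are our own. In step 2 it always absorbs the branch containing the fewest vertices of A, which is one valid way to make the choice of i allowed by the proof.

```python
def _norm(u, w):
    return (u, w) if u <= w else (w, u)

def _vertices(edge_set):
    return {x for e in edge_set for x in e}

def split(edge_set, a):
    """Split the edges of a finite tree into two connected pieces, each
    touching between |a|/3 and ceil(2|a|/3) vertices of a (needs |a| >= 3)."""
    adj = {}
    for u, w in edge_set:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    root = next(iter(adj))
    parent, order = {root: None}, [root]
    for u in order:                        # breadth-first search from the root
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
    children = {u: [w for w in adj[u] if parent[w] == u] for u in order}
    cnt = {u: int(u in a) for u in order}  # a-vertices in the subtree below u
    for u in reversed(order[1:]):
        cnt[parent[u]] += cnt[u]

    def subtree_edges(c):
        out, stack = [], [c]
        while stack:
            u = stack.pop()
            for w in children[u]:
                out.append(_norm(u, w))
                stack.append(w)
        return out

    grown, v, count = set(), root, int(root in a)
    remaining = list(children[root])
    while 3 * count <= len(a):             # stop once more than |a|/3 is covered
        if len(remaining) == 1:            # step 1: walk along the only free edge
            c = remaining.pop()
            grown.add(_norm(v, c))
            count += int(c in a)
            v, remaining = c, list(children[c])
        else:                              # step 2: absorb the lightest branch at v
            c = min(remaining, key=cnt.get)
            remaining.remove(c)
            grown.add(_norm(v, c))
            grown.update(subtree_edges(c))
            count += cnt[c]
    return grown, set(edge_set) - grown

def partition_tree(edges, A, k):
    """Pieces E_1, ..., E_m as in Lemma 2.4, for a finite tree."""
    A = set(A)
    assert k >= 1 and len(A) >= 3 ** k
    todo, done = [{_norm(u, w) for u, w in edges}], []
    while todo:
        piece = todo.pop()
        hit = A & _vertices(piece)
        if 3 ** (k - 1) * len(hit) < len(A):   # |V(W) ∩ A| < 3^(-k+1) |A|
            done.append(piece)
        else:
            todo.extend(split(piece, hit))
    return done

pieces = partition_tree([(i, i + 1) for i in range(8)], range(9), 2)
print(len(pieces))
```

For a general finite connected graph one would first pass to a spanning tree, as in the first line of the proof. All size comparisons are done in integers to avoid rounding issues.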

Proof of Theorem 2.3

Let \(G=(V,E,J)\) be a finite weighted graph. Recall that if \(A_1,\ldots ,A_k\) are (not necessarily distinct), increasing subsets of \(\{0,1\}^E\), the disjoint occurrence \(A_1 \circ \cdots \circ A_k\) is the set of \(\omega \in \{0,1\}^E\) such that there exist disjoint sets \(W_1,\ldots ,W_k \subseteq \{e:\omega (e)=1\}\) such that

$$\begin{aligned} (\omega '(e)=1 \text { for every }e\in W_i) \Rightarrow (\omega ' \in A_i) \qquad \text {for every }\omega ' \in \{0,1\}^E\text { and }1 \le i \le k. \end{aligned}$$

(Here, a subset of \(\{0,1\}^E\) is said to be increasing if \(\omega \in A \Rightarrow \omega ' \in A\) for every \(\omega ,\omega ' \in \{0,1\}^E\) such that \(\omega '(e)\ge \omega (e)\) for every \(e\in E\).) The sets \(W_1,\ldots ,W_k\) are known as disjoint witnesses for the events \(A_1,\ldots ,A_k\). The van den Berg and Kesten inequality [73], a.k.a. the BK inequality, states that if \(G=(V,E,J)\) is a finite weighted graph and \(A_1,\ldots ,A_k \subseteq \{0,1\}^E\) are increasing events then

$$\begin{aligned} {\mathbf {P}}_\beta (A_1 \circ \cdots \circ A_k) \le \prod _{i=1}^k {\mathbf {P}}_\beta (A_i) \end{aligned}$$

for every \(\beta \ge 0\). See [33, Chapter 2.3] for further background.

Let \(G=(V,E,J)\) be a finite weighted graph and let \(\Lambda \subseteq V\). Suppose that the event \(\{|K_\mathrm {max}(\Lambda )|\ge 3^k \lambda \}\) holds for some \(\lambda \ge 1\) and \(k\ge 1\), and let \(v\in V\) be such that \(|K_v \cap \Lambda | \ge 3^k \lambda \). Applying Lemma 2.4 to \(K_v\) yields that there exists \(m\ge 3^{k-1}+1\) and m disjoint sets of open edges \(E_1,\ldots ,E_{m}\), each spanning a connected subgraph of \(K_v\), such that the set \(V_i\) of vertices incident to an edge of \(E_i\) satisfies \(|V_i \cap \Lambda | \ge \lambda \) for every \(1\le i \le m\). It follows that the sets \(E_1,\ldots ,E_{m}\) are all witnesses for the event \(\{|K_\mathrm {max}(\Lambda )|\ge \lambda \}\), and since these sets are all disjoint we deduce that

$$\begin{aligned} \{ |K_\mathrm {max}(\Lambda )|\ge 3^k \lambda \} \subseteq \underbrace{\{|K_\mathrm {max}(\Lambda )|\ge \lambda \} \circ \cdots \circ \{|K_\mathrm {max}(\Lambda )|\ge \lambda \}}_{3^{k-1}+1\text { copies}} \end{aligned}$$
(2.10)

for every \(\lambda \ge 1\) and \(k\ge 1\). Taking probabilities on both sides and applying the BK inequality yields the claimed inequality (2.8) in the case that G is finite. Now suppose that the event \(\{|K_u \cap \Lambda |\ge 3^k \lambda \}\) holds for some \(\lambda \ge 1\), \(k\ge 1\), and \(u\in V\). Similarly to above, applying Lemma 2.4 to \(K_u\) yields that there exists \(m \ge 3^{k-1}+1\) and m disjoint sets of open edges \(E_1,\ldots ,E_{m}\), each spanning a connected subgraph of \(K_u\), such that the set \(V_i\) of vertices incident to an edge of \(E_i\) satisfies \(|V_i \cap \Lambda | \ge \lambda \) for every \(1\le i \le m\) and such that \(\bigcup _{i=1}^{m} V_i\) is equal to the vertex set of \(K_u\). In particular, \(u\in V_i\) for some \(1 \le i \le m\). Thus, at least one of the sets \(E_1,\ldots ,E_{m}\) is a witness for the event \(\{ |K_u \cap \Lambda |\ge \lambda \}\), while the remaining sets are all witnesses for the event \(\{|K_\mathrm {max}(\Lambda )|\ge \lambda \}\). Since the sets \(E_1,\ldots ,E_{m}\) are all disjoint, we deduce that

$$\begin{aligned} \{ |K_u \cap \Lambda |\ge 3^k \lambda \} \subseteq \{ |K_u \cap \Lambda |\ge \lambda \} \circ \underbrace{\{|K_\mathrm {max}(\Lambda )|\ge \lambda \} \circ \cdots \circ \{|K_\mathrm {max}(\Lambda )|\ge \lambda \}}_{3^{k-1}\text { copies}}\nonumber \\ \end{aligned}$$
(2.11)

for every \(\lambda \ge 1\), \(k\ge 1\), and \(u \in V\). As before, taking probabilities on both sides and applying the BK inequality yields the claimed inequality (2.9) in the case that G is finite. The infinite cases of (2.8) and (2.9) follow straightforwardly from the finite cases by passing to the limit in an exhaustion over finite subgraphs. \(\square \)
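On very small graphs the inequality (2.8) can be verified exactly by enumerating all configurations. The sketch below is illustrative only and makes a simplifying assumption: every edge is open with the same probability p (corresponding to constant weights), and p is passed as a `Fraction` so that the comparison is exact. The function name `kmax_tail` is ours. The enumeration has cost \(2^{|E|}\), so it is feasible only for toy examples.

```python
from fractions import Fraction
from itertools import product

def _find(parent, v):
    while parent[v] != v:
        v = parent[v]
    return v

def kmax_tail(vertices, edges, p, region):
    """Exact tail[n] = P(|K_max(region)| >= n) for n = 0, ..., len(region)."""
    tail = [Fraction(0)] * (len(region) + 1)
    for omega in product((0, 1), repeat=len(edges)):
        weight = Fraction(1)
        parent = {v: v for v in vertices}
        for (u, w), is_open in zip(edges, omega):
            weight *= p if is_open else 1 - p
            if is_open:
                parent[_find(parent, u)] = _find(parent, w)   # merge clusters
        sizes = {}
        for v in region:
            r = _find(parent, v)
            sizes[r] = sizes.get(r, 0) + 1
        for n in range(max(sizes.values()) + 1):
            tail[n] += weight
    return tail

# 2 x 3 grid, Lambda = all six vertices, lambda = 2 and k = 1 in (2.8).
grid = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
t = kmax_tail(range(6), grid, Fraction(1, 2), list(range(6)))
print(t[6] <= t[2] ** 2)
```

Here \(3^{k-1}+1=2\), so the check compares \({\mathbf {P}}(|K_\mathrm {max}(\Lambda )|\ge 6)\) with \({\mathbf {P}}(|K_\mathrm {max}(\Lambda )|\ge 2)^2\).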

2.2 Proof of the hyperscaling inequality

We now apply Theorem 2.2 to prove Theorem 2.1. In fact we will prove the following stronger theorem which also gives control of the maximal cluster size in \(\Lambda \) and allows \(0\le \theta <1\).

Theorem 2.5

There exists a universal continuous function \(C:[0,1)\rightarrow (0,\infty )\) such that the following holds. Let \(G=(V,E,J)\) be a countable weighted graph, let \(\beta \ge 0\), let \(\Lambda \subseteq V\) be finite, and suppose that there exist \(A<\infty \) and \(0\le \theta <1\) such that \({\mathbf {P}}_\beta ( |K_u \cap \Lambda | \ge n) \le A n^{-\theta }\) for every \(u\in V\) and \(n \ge 1\). Then

$$\begin{aligned}&M_\beta (\Lambda ) \le C(\theta ) A^{1/(1+\theta )} |\Lambda |^{1/(1+\theta )} \quad \text { and } \quad \\&\quad \frac{1}{|\Lambda |}\sum _{v \in \Lambda } {\mathbf {P}}_\beta (u\leftrightarrow v) \le C(\theta ) A^{2/(1+\theta )} |\Lambda |^{-2\theta /(1+\theta )} \end{aligned}$$

for every \(u\in V\).

We begin by writing down the following immediate corollary of Theorem 2.2.

Corollary 2.6

Let \(G=(V,E,J)\) be a weighted graph and let \(\beta \ge 0\). Let \(u\in V\) and \(\Lambda \subseteq V\) be finite, and suppose that there exist constants \(A<\infty \) and \(0\le \theta <1\) such that \({\mathbf {P}}_\beta ( |K_u \cap \Lambda | \ge n) \le A n^{-\theta }\) for every \(n \ge 1\). Then

$$\begin{aligned} {\mathbf {P}}_\beta (|K_u \cap \Lambda | \ge n) \le e A \left( \frac{18}{n}\right) ^{\theta } \!\exp \left[ -\frac{n}{18 M_\beta (\Lambda )}\right] \qquad \text {for every }n\ge 1. \end{aligned}$$

Proof of Corollary 2.6

Write \(M=M_\beta (\Lambda )\). The claim is trivial when \(n \le 18 M\). If not, we have by Theorem 2.2 that

$$\begin{aligned}&{\mathbf {P}}_\beta (|K_u \cap \Lambda | \ge n) \le e A M^{-\theta } \exp \left[ -\frac{n}{9 M}\right] \le e A n^{-\theta } \exp \left[ -\frac{n}{18 M}\right] \left( \frac{n^{\theta }}{M^{\theta }}\exp \left[ -\frac{n}{18M}\right] \right) . \end{aligned}$$

Using that \(x^{\theta } e^{-x/C}\) is decreasing on \([C,\infty )\) yields the claimed inequality. \(\square \)

Proof of Theorem 2.5

For each \(u\in V\) we can apply Corollary 2.6 to compute that

$$\begin{aligned} \sum _{v \in \Lambda } {\mathbf {P}}_\beta (u\leftrightarrow v)&= {\mathbf {E}}_\beta |K_u \cap \Lambda | = \sum _{n\ge 1} {\mathbf {P}}_\beta (|K_u \cap \Lambda | \ge n) \nonumber \\&\le e A \sum _{n=1}^\infty \left( \frac{18}{n}\right) ^{\theta } \exp \left[ -\frac{n}{18 M}\right] \nonumber \\&\le e A \int _0^\infty \left( \frac{18}{t}\right) ^{\theta } \exp \left[ -\frac{t}{18 M}\right] \,\mathrm {d}t = 18 e \Gamma (1-\theta ) A M^{1-\theta } \end{aligned}$$
(2.12)

where \(\Gamma (\alpha )=\int _0^\infty t^{\alpha -1} e^{-t} \,\mathrm {d}t\) is the Gamma function and where we used the change of variables \(s=t/(18M)\) in the final equality. Summing over \(u\in \Lambda \), it follows that

$$\begin{aligned} \sum _{u,v \in \Lambda } {\mathbf {P}}_\beta (u \leftrightarrow v) = \sum _{u\in \Lambda } {\mathbf {E}}_\beta |K_u \cap \Lambda | \le 18 e \Gamma (1-\theta ) A M^{1-\theta }|\Lambda |. \end{aligned}$$
(2.13)
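The sum-to-integral step in (2.12) can be checked numerically. The sketch below compares a truncated version of the series \(\sum _{n\ge 1}(18/n)^{\theta }\exp [-n/(18M)]\) with the closed form \(18\,\Gamma (1-\theta )M^{1-\theta }\); the function names, the truncation rule and the tolerance are assumptions made for illustration only.

```python
import math


def tail_sum(theta, M, tol=1e-15):
    """Sum of (18/n)^theta * exp(-n/(18 M)) over n >= 1, truncated when terms are tiny."""
    total = 0.0
    n = 1
    while True:
        term = (18.0 / n) ** theta * math.exp(-n / (18.0 * M))
        total += term
        # Terms are decreasing in n, so stopping here only discards a negligible tail.
        if term < tol:
            break
        n += 1
    return total


def closed_form(theta, M):
    """Value of the integral in (2.12) without the prefactor e * A."""
    return 18.0 * math.gamma(1.0 - theta) * M ** (1.0 - theta)
```

Since the summand is a decreasing function of n, the series is bounded by the integral over \((0,\infty )\), and the truncated sum should therefore never exceed `closed_form`.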

On the other hand, we also have the lower bound

$$\begin{aligned} \sum _{u,v \in \Lambda } {\mathbf {P}}_\beta (u \leftrightarrow v)&= {\mathbf {E}}_\beta \left[ \sum _{u,v \in \Lambda } \mathbb {1}(u \leftrightarrow v)\right] \ge {\mathbf {E}}_\beta \left[ |K_\mathrm {max}(\Lambda )|^2\right] \nonumber \\&\ge (M-1)^2 {\mathbf {P}}_\beta \Bigl (|K_\mathrm {max}(\Lambda )|\ge M-1\Bigr ) \ge \frac{1}{e}(M-1)^2 \ge \frac{1}{4e}M^2, \end{aligned}$$
(2.14)

where we used that \(M\ge 2\) in the final inequality. Comparing the estimates (2.13) and (2.14) and rearranging yields that

$$\begin{aligned} M^{1+\theta } \le 72 e^2 \Gamma (1-\theta ) A |\Lambda |, \end{aligned}$$
(2.15)

completing the proof of the first claimed bound. Substituting this bound into (2.12) yields that there exists a universal continuous function \(C:[0,1)\rightarrow (0,\infty )\) such that

$$\begin{aligned} \frac{1}{|\Lambda |}\sum _{v \in \Lambda } {\mathbf {P}}_\beta (u \leftrightarrow v)\le & {} \frac{1}{|\Lambda |} 18 e \Gamma (1-\theta ) A \left( 72 e^2 \Gamma (1-\theta ) A |\Lambda |\right) ^{(1-\theta )/(1+\theta )} \\= & {} C(\theta ) A^{2/(1+\theta )}|\Lambda |^{-2\theta /(1+\theta )} \end{aligned}$$

for each \(u\in V\), completing the proof of the second bound. \(\square \)
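For concreteness, the constants produced by this proof can be written out explicitly. The sketch below records one admissible choice read off from (2.15) and from the last display (the theorem only asserts that some continuous \(C(\theta )\) exists, and a single function working for both bounds can be obtained by taking the maximum of the two); the function names are assumptions introduced here.

```python
import math


def max_cluster_constant(theta):
    """Constant c with M <= c * A^(1/(1+theta)) * |Lambda|^(1/(1+theta)), from (2.15)."""
    return (72.0 * math.e ** 2 * math.gamma(1.0 - theta)) ** (1.0 / (1.0 + theta))


def two_point_constant(theta):
    """Constant in the averaged two-point bound, read off from the final display."""
    g = math.gamma(1.0 - theta)
    return 18.0 * math.e * g * (72.0 * math.e ** 2 * g) ** ((1.0 - theta) / (1.0 + theta))


def averaged_bound_two_ways(theta, A, volume):
    """Evaluate the final bound before and after collecting the powers of A and |Lambda|."""
    g = math.gamma(1.0 - theta)
    exponent = (1.0 - theta) / (1.0 + theta)
    direct = 18.0 * math.e * g * A * (72.0 * math.e ** 2 * g * A * volume) ** exponent / volume
    factored = (two_point_constant(theta)
                * A ** (2.0 / (1.0 + theta))
                * volume ** (-2.0 * theta / (1.0 + theta)))
    return direct, factored
```

The two values returned by `averaged_bound_two_ways` agree because \(1+(1-\theta )/(1+\theta )=2/(1+\theta )\) and \(-1+(1-\theta )/(1+\theta )=-2\theta /(1+\theta )\).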

Remark 2.7

Although the distribution of the entire cluster of critical percolation on a transitive weighted graph always satisfies \({\mathbf {P}}_{\beta _c}(|K_v|\ge n) \ge c n^{-1/2}\), the \(1/2<\theta <1\) case of Theorem 2.5 may nevertheless be useful when taking e.g. \(\Lambda \subseteq {\mathbb {Z}}^{d-k} \subseteq {\mathbb {Z}}^d\) to be contained in a lower-dimensional subspace of the full lattice. In particular, it would be interesting if one could improve the high-dimensional case of Theorem 1.1 by first proving an upper bound of the form \({\mathbf {P}}_{\beta _c}(|K_0 \cap {\mathbb {Z}}^{d-2}| \ge n) \le n^{-1/\delta _2}\) for some \(\delta _2 < 2\) and then using Theorem 2.5 to get an improved bound on the two-point function within \({\mathbb {Z}}^{d-2}\). It seems that only a relatively modest improvement along these lines is needed to give a lace expansion-free proof that the triangle condition is satisfied when d is large and \(\alpha \) is fixed. Note also that bounds on the maximum cluster size similar to those of Theorem 2.5 may be proven in the regime \(\theta \ge 1\) by following the proof as above but considering \(\sum _{u\in \Lambda } {\mathbf {E}}_\beta |K_u \cap \Lambda |^k\) instead of \(\sum _{u\in \Lambda } {\mathbf {E}}_\beta |K_u \cap \Lambda |\) for an appropriate choice of \(k\ge 2\).

3 An improved two-ghost inequality

In this section we derive an improved version of the two-ghost inequality of [44, Theorem 1.6 and Corollary 1.7] as stated for long-range models in [45, Section 3]. This improved two-ghost inequality will be applied together with Theorem 2.1 to prove Theorems 1.1 and 1.2 in the next section. The proof of the two-ghost inequality uses ideas originating in the important work of Aizenman, Kesten, and Newman [3]; see [44] and [20] for further discussion of how the methods of [3] can be used to derive quantitative estimates on critical percolation. Our improvement to the two-ghost inequality as stated in [45, Corollary 3.2] is two-fold:

  • We show that a starting assumption of the form \({\mathbf {P}}_\beta (|K|\ge n)\le A n^{-\theta }\) (as will come from our bootstrapping hypothesis) can be used to improve the exponent given by the two-ghost inequality. The fact that this can be done had previously been discussed briefly in [44, Remark 6.1] and [45, Remark 3.6].

  • We use a re-weighting trick to improve the bound one obtains on the probability of the two-arm event for a typical ‘long’ edge. The basic idea behind this improvement is that the two-ghost inequality of [45] holds not just for the weights J that are given with the graph G, but also for any other automorphism-invariant choice of weights. Optimizing the resulting bound over all possible automorphism-invariant weights leads to the bound of Theorem 3.1.

For the benefit of future applications, we phrase the results in this section not just for Bernoulli percolation but for the more general class of percolation in random environment models. The same level of generality was employed in [45, Section 3], where we applied the two-ghost inequality to the random-cluster and Ising models. (See in particular [45, Section 3.3] for a representation of the random-cluster model as a percolation in random environment model first arising in [15].) Let \(G=(V,E,J)\) be a countable weighted graph. Suppose that \(\mu \) is a probability measure on \([0,1]^E\), and let \({\mathbf {p}}=({\mathbf {p}}_e)_{e\in E}\) be a \([0,1]^E\)-valued random variable with law \(\mu \). Let \((U_e)_{e\in E}\) be i.i.d. Uniform[0, 1] random variables independent of \({\mathbf {p}}\) and let \(\omega =\omega ({\mathbf {p}},U)\) be the \(\{0,1\}^E\)-valued random variable defined by \(\omega (e) = \mathbb {1}(U_e \le {\mathbf {p}}_e)\) for each \(e\in E\). We say that \(\omega \) is a percolation in random environment on G with environment distribution \(\mu \) and write \({\mathbf {P}}_\mu \) for the joint law of \({\mathbf {p}}\) and \(\omega \). We can consider Bernoulli percolation on G to be a percolation in random environment model for which the environment measure \(\mu \) is concentrated on the point \(({\mathbf {p}}_e)_{e\in E} = (1-e^{-\beta J_e})_{e\in E}\).

For each \(e\in E\) and \(n\ge 1\), let \({\mathscr {S}}'_{e,n}\) be the event that the endpoints of e belong to distinct clusters each of which includes at least n vertices and at least one of which is finite. (We use \({\mathscr {S}}'_{e,n}\) rather than \({\mathscr {S}}_{e,n}\) to indicate that we are measuring volume in terms of vertices rather than edges.) Recall that we write \(E_v^\rightarrow \) for the set of oriented edges emanating from v for each vertex v of G; we do not distinguish notationally between oriented and unoriented edges, and will often abuse notation to apply functions defined on unoriented edges to oriented edges by forgetting the orientation.

Theorem 3.1

(Improved two-ghost inequality). Let \(G=(V,E,J)\) be a connected transitive weighted graph, let o be a vertex of G, and let \(\Gamma \subseteq {\text {Aut}}(G)\) be a closed, transitive, unimodular subgroup of automorphisms of G. Let \(\mu \) be a \(\Gamma \)-invariant probability measure on \([0,1]^E\) and suppose that there exist constants \(A<\infty \) and \(0\le \theta <1/2\) such that \({\mathbf {P}}_{\mu }(|K_o| \ge n) \le A n^{-\theta }\) for every \(n \ge 1\). Then

$$\begin{aligned} \sum _{e\in E^\rightarrow _o} {\mathbf {E}}_{\mu }\left[ \mathbb {1}({\mathscr {S}}_{e,n}') \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] ^2 \le \frac{40000 \cdot A^2}{(1-2\theta )^2 n^{1+2\theta }} \qquad \text { for every }n\ge 1. \end{aligned}$$
(3.1)

Here, the closed, transitive subgroup \(\Gamma \subseteq {\text {Aut}}(G)\) is said to be unimodular if it satisfies the mass-transport principle, i.e., if

$$\begin{aligned} \sum _{v \in V} F(o,v) = \sum _{v\in V}F(v,o) \end{aligned}$$
(3.2)

for every \(o\in V\) and every function \(F:V^2\rightarrow [0,\infty ]\) that is diagonally invariant under \(\Gamma \) in the sense that \(F(\gamma u,\gamma v)=F(u,v)\) for every \(u,v\in V\) and \(\gamma \in \Gamma \). Equivalently, \(\Gamma \) is unimodular if its left and right Haar measures coincide. This holds in particular whenever \(\Gamma \) is countable, in which case it has counting measure as both a left and right Haar measure. Most transitive weighted graphs arising in examples have unimodular automorphism groups, including all amenable transitive weighted graphs and all weighted graphs defined in terms of a countable group as described after the statement of Theorem 1.2. See e.g. [45, Section 2] and [56, Chapter 8] for further background and for proofs of these statements. For the main purposes of this paper, it suffices to consider the case that G has vertex set \({\mathbb {Z}}^d\) and that \(\Gamma ={\mathbb {Z}}^d\) acts transitively on G by translations as in Theorem 1.1.

Let \(G=(V,E,J)\) be a connected, transitive weighted graph, let o be a vertex of G, and let \(\Gamma \) be a closed transitive subgroup of \({\text {Aut}}(G)\). We call \(\mathrm {w}:E\rightarrow [0,1]\) a (\(\Gamma \)-)good weight function if \(\mathrm {w}(\gamma e)=\mathrm {w}(e)\) for every \(e\in E\) and \(\gamma \in \Gamma \), \(\sum _{e\in E^\rightarrow _o} \mathrm {w}(e)=1\), and \(\sum _{e\in E^\rightarrow _o} \sqrt{\mathrm {w}(e)} < \infty \). (The last condition holds trivially if \(\mathrm {w}(e)=0\) for all but finitely many \(e\in E^\rightarrow _o\), and it would in fact suffice to consider this case for the rest of the proof.) Let \(\mu \) be a \(\Gamma \)-invariant probability measure on \([0,1]^E\), let \({\mathbf {p}}\) be a random variable with law \(\mu \) and let \(\omega \) be the associated percolation in random environment process as above. Let \(h>0\). Given the environment \({\mathbf {p}}\) and a good weight function \(\mathrm {w}\), let \({\mathcal {G}}\in \{0,1\}^E\) be a random subset of E, independent of \({\mathbf {p}}\) and \(\omega \), in which each edge \(e \in E\) is included independently at random with probability \(1-e^{-h \mathrm {w}(e)}\). We write \({\mathbf {P}}_{\mu ,\mathrm {w},h}\) and \({\mathbf {E}}_{\mu ,\mathrm {w},h}\) for probabilities and expectations taken with respect to the joint law of \({\mathbf {p}}\), \(\omega \), and \({\mathcal {G}}\). We call \({\mathcal {G}}\) the \(\mathrm {w}\)-ghost field and call an edge \(\mathrm {w}\)-green if it is included in \({\mathcal {G}}\). Note that

$$\begin{aligned} {\mathbf {P}}_{\mu ,\mathrm {w},h}(A \cap {\mathcal {G}}= \emptyset \mid {\mathbf {p}}) = \exp \left[ -h \cdot \mathrm {w}(A) \right] \end{aligned}$$

for every finite set \(A \subseteq E\), where we write \(\mathrm {w}(A)=\sum _{e\in A}\mathrm {w}(e)\) for the total weight of A.

For each edge e of G, we define \({\mathscr {T}}_e\) to be the event that e is closed in \(\omega \) and that the endpoints of e are in distinct clusters of \(\omega \), each of which touches some \(\mathrm {w}\)-green edge, and at least one of which is finite. We will deduce Theorem 3.1 from the following proposition.

Proposition 3.2

Let \(G=(V,E,J)\) be a connected transitive weighted graph, let o be a vertex of G, and let \(\Gamma \subseteq {\text {Aut}}(G)\) be a closed transitive unimodular subgroup of automorphisms of G. Let \(\mu \) be a \(\Gamma \)-invariant probability measure on \([0,1]^E\) and suppose that there exist constants \(A<\infty \) and \(0\le \theta <1/2\) such that \({\mathbf {P}}_{\mu }(|K_o|\ge n) \le A n^{-\theta }\) for every \(n\ge 1\). Then for each \(\Gamma \)-good weight function \(\mathrm {w}:E\rightarrow [0,1]\) we have that

$$\begin{aligned} \sum _{e\in E^\rightarrow _o} \sqrt{\mathrm {w}(e)}{\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}({\mathscr {T}}_e) \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] \le \frac{40 A}{1-2\theta } h^{(1+2\theta )/2} \qquad \text {for every }h>0. \end{aligned}$$
(3.3)

(The condition \(\sum _{e\in E^\rightarrow _o} \sqrt{\mathrm {w}(e)} < \infty \) is not really needed for this proposition to hold, but will slightly simplify the proof.) Before proving this proposition, let us see how it implies Theorem 3.1.

Proof of Theorem 3.1 given Proposition 3.2

Let \(\mathrm {w}:E\rightarrow [0,1]\) be a \(\Gamma \)-good weight function. Let e be an edge of G with endpoints x and y and let \({\mathscr {D}}_e\) be the event that x and y are in distinct clusters at least one of which is finite. Then we have by the definitions that

$$\begin{aligned} {\mathbf {P}}_{\mu ,\mathrm {w},h}({\mathscr {T}}_e \mid {\mathbf {p}})\ge & {} (1-e^{-\frac{1}{2}hn})^2{\mathbf {P}}_{\mu ,\mathrm {w},h}\bigl ({\mathscr {D}}_e \cap \bigl \{\mathrm {w}(E(K_x)),\mathrm {w}(E(K_{y})) \ge \tfrac{1}{2}n\bigr \}\mid {\mathbf {p}}\,\bigr ) \\\ge & {} (1-e^{-\frac{1}{2}hn})^2{\mathbf {P}}_{\mu ,\mathrm {w},h}\bigl ({\mathscr {S}}_{e,n}'\mid {\mathbf {p}}\,\bigr ) \end{aligned}$$

for each \(h>0\) and \(n\ge 1\), where we used that \(|A| \le 2\mathrm {w}(E(A))\) for every \(A \subseteq V\) in the final inequality. Setting \(h=c n^{-1}\) with \(c\ge 1\) and applying Proposition 3.2, it follows that

$$\begin{aligned}&\sum _{e\in E^\rightarrow _o} \sqrt{\mathrm {w}(e)} {\mathbf {E}}_{\mu }\left[ \mathbb {1}({\mathscr {S}}_{e,n}') \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] \\&\quad \le (1-e^{-\frac{1}{2}hn})^{-2} \sum _{e\in E^\rightarrow _o} \sqrt{\mathrm {w}(e)} {\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}({\mathscr {T}}_{e}) \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] \\&\quad \le \frac{c}{(1-e^{-c/2})^2} \cdot \frac{40 A}{1-2\theta } n^{-(1+2\theta )/2} \end{aligned}$$

for every \(n\ge 1\) and \(c\ge 1\). Using that \(\inf _{c \ge 1} 40 c (1-e^{-c/2})^{-2} =196.433 \ldots \le 200\) gives that

$$\begin{aligned} \sum _{e\in E^\rightarrow _o} \sqrt{\mathrm {w}(e)} {\mathbf {E}}_{\mu }\left[ \mathbb {1}({\mathscr {S}}_{e,n}') \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] \le \frac{200 A}{1-2\theta } n^{-(1+2\theta )/2} \end{aligned}$$
(3.4)

for every \(n\ge 1\) and every \(\Gamma \)-good weight function \(\mathrm {w}:E\rightarrow [0,1]\).
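The numerical constant used to pass to (3.4) can be reproduced with a short computation. The sketch below minimizes \(40c(1-e^{-c/2})^{-2}\) over \(c\ge 1\) by ternary search; the function names, the search interval \([1,10]\) and the iteration count are assumptions chosen for illustration. The search is justified because the derivative of \(c(1-e^{-c/2})^{-2}\) has the sign of \(1-e^{-c/2}-ce^{-c/2}\), which vanishes exactly when \(e^{c/2}=1+c\) and changes sign only once on \((0,\infty )\).

```python
import math


def prefactor(c):
    """The quantity 40 * c / (1 - exp(-c/2))^2 minimized in the proof."""
    return 40.0 * c / (1.0 - math.exp(-c / 2.0)) ** 2


def minimize_prefactor(lo=1.0, hi=10.0, iterations=200):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if prefactor(m1) < prefactor(m2):
            hi = m2  # the minimizer lies to the left of m2
        else:
            lo = m1  # the minimizer lies to the right of m1
    c_star = 0.5 * (lo + hi)
    return c_star, prefactor(c_star)
```

Calling `minimize_prefactor()` returns a minimizer close to the root of \(e^{c/2}=1+c\) and a minimum value consistent with the figure \(196.433\ldots \) quoted above.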

We now optimize over the choice of good weight function \(\mathrm {w}\) in order to prove the claimed inequality (3.1). Fix \(n\ge 1\). This inequality is trivial if \({\mathbf {E}}_{\mu }[\mathbb {1}({\mathscr {S}}_{e,n}') \sqrt{{\mathbf {p}}_e/(1-{\mathbf {p}}_e)}] =0\) for every \(e\in E^\rightarrow _o\), so we may assume that there exists \(e_0 \in E^\rightarrow _o\) for which this quantity is positive. Let \((A_m)_{m\ge 0}\) be an exhaustion of E by finite sets containing \(e_0\), so that the orbit \(\Gamma A_m=\{\gamma e : \gamma \in \Gamma , e\in A_m\}\) has finite intersection with \(E^\rightarrow _o\) for each \(m\ge 1\). For each \(m\ge 1\) we may therefore define a good weight function \(\mathrm {w}_{n,m}:E\rightarrow [0,1]\) by taking

$$\begin{aligned} \mathrm {w}_{n,m}(e)= & {} \frac{\tilde{\mathrm {w}}_{n,m}(e) \mathbb {1}(e\in \Gamma A_m)}{\sum _{e' \in E^\rightarrow _o} \tilde{\mathrm {w}}_{n,m}(e') \mathbb {1}(e'\in \Gamma A_m)} \; \text { where } \; \tilde{\mathrm {w}}_{n,m}(e)\\= & {} \min \left\{ m,{\mathbf {E}}_{\mu }\left[ \mathbb {1}({\mathscr {S}}_{e,n}') \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] \right\} ^2 \end{aligned}$$

for every \(e\in E\). Applying (3.4) with this choice of good weight function and rearranging yields that

$$\begin{aligned} \sum _{e\in E^\rightarrow _o \cap \Gamma A_m} \min \left\{ m,{\mathbf {E}}_{\mu }\left[ \mathbb {1}({\mathscr {S}}_{e,n}') \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] \right\} ^2 \le \frac{40000 A^2}{(1-2\theta )^2 n^{1+2\theta }} \end{aligned}$$

for every \(m\ge 1\). The claim follows by taking the limit as \(m\rightarrow \infty \). \(\square \)
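The choice of weights in this optimization is the equality case of the Cauchy-Schwarz inequality: if \(a_e\ge 0\) are finitely many numbers and \(\sum _e \mathrm {w}(e)=1\), then \(\sum _e \sqrt{\mathrm {w}(e)}\,a_e \le (\sum _e a_e^2)^{1/2}\), with equality when \(\mathrm {w}(e)\) is proportional to \(a_e^2\). The following sketch illustrates this on a toy vector; the names and the sample values are hypothetical, and the truncation by m and the restriction to \(\Gamma A_m\) used in the proof are ignored.

```python
import math


def weighted_sum(a, w):
    """Compute sum_e sqrt(w(e)) * a_e, the left-hand side of (3.4) for a toy vector a."""
    return sum(math.sqrt(wi) * ai for ai, wi in zip(a, w))


def optimal_weights(a):
    """Weights proportional to a_e^2, normalized to have total mass one."""
    total = sum(x * x for x in a)
    return [x * x / total for x in a]
```

With these weights `weighted_sum` equals the Euclidean norm of `a`, so squaring the right-hand side of (3.4) yields a bound on \(\sum _e a_e^2\), which is the form of (3.1).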

We now begin to work towards the proof of Proposition 3.2. Although the proof is similar to that of [45, Theorem 3.1], we will present most of the details in order to keep the paper self-contained. Let \(G=(V,E,J)\) be a connected transitive weighted graph, let \(\Gamma \) be a closed transitive subgroup of automorphisms of G, and let \(\mathrm {w}:E\rightarrow [0,1]\) be a \(\Gamma \)-good weight function. For each environment \({\mathbf {p}}\in (0,1)^E\) and subgraph H of G, we define the \(\mathrm {w}\)-fluctuation of H to be

$$\begin{aligned} h_{{\mathbf {p}},\mathrm {w}}(H)&:= \sum _{e \in E(H)} \sqrt{\mathrm {w}(e)} \left[ \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\mathbb {1}\left( e\in \partial H\right) -\sqrt{\frac{1-{\mathbf {p}}_e}{{\mathbf {p}}_e}} \mathbb {1}\left( e\in E_\circ (H)\right) \right] \\&=\sum _{e \in E(H)} \sqrt{\frac{\mathrm {w}(e) {\mathbf {p}}_e}{1-{\mathbf {p}}_e}} \cdot \frac{{\mathbf {p}}_e-\mathbb {1}(e\in E_\circ (H))}{{\mathbf {p}}_e} \end{aligned}$$

where E(H) denotes the set of edges that touch H, i.e., have at least one endpoint in the vertex set of H, \(\partial H\) denotes the set of edges of G that touch the vertex set of H but are not included in H, and \(E_\circ (H)\) denotes the set of edges of G that are included in H, so that \(E(H)=\partial H \cup E_\circ (H)\). As in [45], the fluctuation is defined so that \(h_{{\mathbf {p}},\mathrm {w}}(K_v)\) is the final value of a certain martingale that arises when exploring the cluster \(K_v\) one edge at a time after conditioning on the environment \({\mathbf {p}}\). The following key lemma uses the mass-transport principle to relate the probability of the two-arm event to an expectation written in terms of the fluctuation. (This lemma is the only place that unimodularity is used in the proofs of any of our theorems.)
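To see why the per-edge terms in the definition of the fluctuation are natural, one can check that a single edge's contribution has mean zero and second moment \(\mathrm {w}(e)\) when the edge is open with probability \({\mathbf {p}}_e\). The sketch below verifies this identity numerically; it assumes, for illustration only, that an open edge contributes the \(E_\circ \) term and a closed edge the \(\partial \) term, and the function names are hypothetical.

```python
import math


def increment(w, p, is_open):
    """Contribution of one edge with weight w and opening probability p to the fluctuation."""
    if is_open:
        # An open edge belongs to the cluster and contributes the negative term.
        return -math.sqrt(w * (1.0 - p) / p)
    # A closed edge lies on the boundary and contributes the positive term.
    return math.sqrt(w * p / (1.0 - p))


def conditional_moments(w, p):
    """Mean and second moment of the contribution when the edge is open with probability p."""
    mean = p * increment(w, p, True) + (1.0 - p) * increment(w, p, False)
    second = p * increment(w, p, True) ** 2 + (1.0 - p) * increment(w, p, False) ** 2
    return mean, second
```

Both identities are exact: the mean is \(-\sqrt{\mathrm {w}p(1-p)}+\sqrt{\mathrm {w}p(1-p)}=0\) and the second moment is \(\mathrm {w}(1-p)+\mathrm {w}p=\mathrm {w}\), which is consistent with the appearance of \(\mathrm {w}(E(K_o))\) in the denominator in Lemma 3.3.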

Lemma 3.3

Let \(G=(V,E,J)\) be a connected transitive weighted graph and let \(\Gamma \subseteq {\text {Aut}}(G)\) be a closed transitive unimodular subgroup of automorphisms. Let \(\mu \) be a \(\Gamma \)-invariant probability measure on \((0,1)^E\) and let \(\mathrm {w}:E\rightarrow [0,1]\) be a \(\Gamma \)-good weight function. Then the inequality

$$\begin{aligned}&\sum _{e\in E^\rightarrow _o} \sqrt{\mathrm {w}(e)}{\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}({\mathscr {T}}_e) \sqrt{\frac{ {\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] \\&\quad \le 2{\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \frac{|h_{{\mathbf {p}},\mathrm {w}}(K_{o})|}{\mathrm {w}(E(K_{o}))} \mathbb {1}\bigl (|K_o| < \infty \text { and } E(K_{o}) \cap {\mathcal {G}}\ne \emptyset \bigr )\right] \end{aligned}$$

holds for every \(h>0\).

Be careful to note here that \(\mathrm {w}(E(K_o))\) and \(h_{{\mathbf {p}},\mathrm {w}}(K_{o})\) are defined in terms of unoriented edges. In particular, an edge contributes the same amount to both quantities whether it has one or two endpoints in \(K_o\).

Lemma 3.3 follows by a very similar proof to that of [45, Lemma 3.3] but where we have allowed ourselves to use the weights \(\mathrm {w}\) instead of the original weights J. Before giving the proof of this lemma, let us state a variant form of the mass-transport principle involving oriented edges and good weight functions that will be useful. Let \(G=(V,E)\) be a connected weighted graph, let \(\Gamma \subseteq {\text {Aut}}(G)\) be a unimodular closed transitive subgroup, and let \(\mathrm {w} : E\rightarrow [0,1]\) be a \(\Gamma \)-good weight function. If \(F:E^\rightarrow \times E^\rightarrow \rightarrow [0,\infty ]\) is \(\Gamma \)-diagonally invariant in the sense that \(F(\gamma e_1,\gamma e_2)=F(e_1,e_2)\) for each two oriented edges \(e_1,e_2 \in E^\rightarrow \) and \(\gamma \in \Gamma \), then we have that

$$\begin{aligned} \sum _{e_1 \in E_o^\rightarrow } \sum _{e_2 \in E^\rightarrow } \mathrm {w}(e_1)\mathrm {w}(e_2)F(e_1,e_2) = \sum _{e_1 \in E_o^\rightarrow } \sum _{e_2 \in E^\rightarrow } \mathrm {w}(e_1)\mathrm {w}(e_2)F(e_2,e_1). \end{aligned}$$
(3.5)

Indeed, this follows by applying the usual mass-transport principle to the function \(F'(u,v) = \sum _{e_1 \in E_u^\rightarrow } \sum _{e_2 \in E_v^\rightarrow } \mathrm {w}(e_1)\mathrm {w}(e_2)F(e_1,e_2)\). The equality (3.5) also holds for signed diagonally-invariant functions \(F:E^\rightarrow \times E^\rightarrow \rightarrow {\mathbb {R}}\) that satisfy the absolute integrability condition

$$\begin{aligned} \sum _{e_1\in E^\rightarrow _o} \sum _{e_2 \in E^\rightarrow } \mathrm {w}(e_1)\mathrm {w}(e_2) |F(e_1,e_2)|<\infty . \end{aligned}$$
(3.6)

This follows by applying (3.5) separately to the positive and negative parts of F, which are defined by \(F^+(e_1,e_2)=0\vee F(e_1,e_2)\) and \(F^-(e_1,e_2)= 0\vee (-F(e_1,e_2))\).

Proof of Lemma 3.3

Define \({\mathscr {F}}_e\) to be the event that every cluster touching e is finite and let \({\mathscr {G}}_e\) be the event that there exists a finite cluster touching e and \({\mathcal {G}}\). Then \({\mathscr {T}}_e \cap {\mathscr {F}}_e\) is the event that the endpoints of e are in distinct finite clusters each of which touches the ghost field \({\mathcal {G}}\), and for each edge e of G we have that

$$\begin{aligned}&\mathbb {1}({\mathscr {T}}_e \cap {\mathscr {F}}_e) = \mathbb {1}(\omega (e)=0)\cdot \#\{\text {finite clusters touching }e\text { and }{\mathcal {G}}\} \\&\quad - \mathbb {1}\bigl (\{\omega (e)=0\}\cap {\mathscr {G}}_e). \end{aligned}$$

Taking expectations conditional on the environment \({\mathbf {p}}\), it follows that

$$\begin{aligned}&{\mathbf {P}}_{\mu ,\mathrm {w},h}({\mathscr {T}}_e \cap {\mathscr {F}}_e \mid {\mathbf {p}}\,) = {\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}(\omega (e)=0)\cdot \#\{\text {finite clusters touching }e\text { and }{\mathcal {G}}\} \mid {\mathbf {p}}\,\right] \nonumber \\&\quad - {\mathbf {P}}_{\mu ,\mathrm {w},h}\bigl (\{\omega (e)=0\} \cap {\mathscr {G}}_e \mid {\mathbf {p}}\,\bigr ). \end{aligned}$$
(3.7)

Next, we observe that the event \({\mathscr {F}}_e \cap {\mathscr {G}}_e\) is conditionally independent of the value of \(\omega (e)\) given \({\mathbf {p}}\) and hence that

$$\begin{aligned}&{\mathbf {P}}_{\mu ,\mathrm {w},h}\bigl (\{\omega (e)=0\} \cap {\mathscr {F}}_e \cap {\mathscr {G}}_e \mid {\mathbf {p}}\, \bigr ) = \frac{1-{\mathbf {p}}_e}{{\mathbf {p}}_e} {\mathbf {P}}_{\mu ,\mathrm {w},h}\bigl (\{\omega (e)=1\} \cap {\mathscr {F}}_e \cap {\mathscr {G}}_e \mid {\mathbf {p}}\, \bigr )\nonumber \\&\quad = \frac{1-{\mathbf {p}}_e}{{\mathbf {p}}_e} {\mathbf {P}}_{\mu ,\mathrm {w},h}\bigl (\{\omega (e)=1\} \cap {\mathscr {G}}_e \mid {\mathbf {p}}\,\bigr ). \end{aligned}$$
(3.8)

Substituting (3.8) into (3.7) yields that

$$\begin{aligned}&{\mathbf {P}}_{\mu ,\mathrm {w},h}({\mathscr {T}}_e \cap {\mathscr {F}}_e \mid {\mathbf {p}}\,) = {\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}(\omega (e)=0)\cdot \#\{\text {finite clusters touching }e\text { and }{\mathcal {G}}\} \mid {\mathbf {p}}\,\right] \nonumber \\&\quad - \frac{1-{\mathbf {p}}_e}{{\mathbf {p}}_e}{\mathbf {P}}_{\mu ,\mathrm {w},h}(\{\omega (e)=1\} \cap {\mathscr {G}}_e \mid {\mathbf {p}}\,) - {\mathbf {P}}_{\mu ,\mathrm {w},h}\bigl (\{\omega (e)=0\} \cap {\mathscr {G}}_e \setminus {\mathscr {F}}_e \mid {\mathbf {p}}\,\bigr ). \end{aligned}$$
(3.9)

Since the events \(\{\omega (e)=0\} \cap {\mathscr {G}}_e \setminus {\mathscr {F}}_e\) and \({\mathscr {T}}_e \cap {\mathscr {F}}_e\) are disjoint and \({\mathscr {T}}_e\) coincides with \(({\mathscr {T}}_e \cap {\mathscr {F}}_e) \cup (\{\omega (e)=0\} \cap {\mathscr {G}}_e \setminus {\mathscr {F}}_e)\) up to a null set, the equation (3.9) implies that

$$\begin{aligned}&{\mathbf {P}}_{\mu ,\mathrm {w},h}({\mathscr {T}}_e \mid {\mathbf {p}}\,) = {\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}(\omega (e)=0)\cdot \#\{\text {finite clusters touching }e\text { and }{\mathcal {G}}\} \mid {\mathbf {p}}\right] \\&\quad - \frac{1-{\mathbf {p}}_e}{{\mathbf {p}}_e}{\mathbf {P}}_{\mu ,\mathrm {w},h}(\{\omega (e)=1\} \cap {\mathscr {G}}_e \mid {\mathbf {p}}\,). \end{aligned}$$

We can rewrite this equality more succinctly as

$$\begin{aligned} {\mathbf {P}}_{\mu ,\mathrm {w},h}({\mathscr {T}}_e \mid {\mathbf {p}}\,)= {\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \frac{{\mathbf {p}}_e-\omega (e)}{{\mathbf {p}}_e} \cdot \#\{\text {finite clusters touching }e\text { and }{\mathcal {G}}\} \;\Bigm |\; {\mathbf {p}}\;\right] .\nonumber \\ \end{aligned}$$
(3.10)

Note that this equality is essentially identical to [45, Eq. 3.7], although of course the ghost field is defined with respect to a different choice of weights there.

Consider the \(\Gamma \)-diagonally-invariant function \(F:E^\rightarrow \times E^\rightarrow \rightarrow {\mathbb {R}}\) defined by

$$\begin{aligned}&F(e_1,e_2) \\&\quad ={\mathbf {E}}_{\mu ,\mathrm {w},h}\sum \left\{ \frac{1}{2\mathrm {w}(E(K))} \left[ \frac{{\mathbf {p}}_{e_1}-\omega (e_1)}{{\mathbf {p}}_{e_1}}\right] \right. \\&\qquad \left. \sqrt{\frac{{\mathbf {p}}_{e_1}}{(1-{\mathbf {p}}_{e_1})\mathrm {w}(e_1)}} : \begin{array}{l}K\text { is a finite cluster}\\ \text {of }\omega \text { touching }e_1,e_2,\text { and }{\mathcal {G}}\end{array}\right\} , \end{aligned}$$

where we write \(\sum \{x(i) :i\in I\} = \sum _{i\in I} x(i)\) and where we include the factor of 1/2 to account for the fact that each edge in E(K) can be oriented in two directions. (We say that an oriented edge touches K if at least one of its endpoints belongs to K.) The multiset of numbers being summed over has cardinality either 0, 1,  or 2, and we can therefore compute that

$$\begin{aligned}&\sum _{e_1 \in E^\rightarrow _o} \sum _{e_2\in E^\rightarrow } \mathrm {w}(e_1)\mathrm {w}(e_2) |F(e_1,e_2)| \\&\quad \le 2 \sum _{e_1 \in E^\rightarrow _o} \mathrm {w}(e_1) {\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \frac{|{\mathbf {p}}_{e_1}-\omega (e_1)|}{{\mathbf {p}}_{e_1}} \sqrt{\frac{{\mathbf {p}}_{e_1}}{(1-{\mathbf {p}}_{e_1})\mathrm {w}(e_1)}}\right] \\&\quad = 4 \sum _{e_1 \in E_o^\rightarrow } \sqrt{\mathrm {w}(e_1)} {\mathbf {E}}_{\mu ,\mathrm {w},h} \left[ \sqrt{{\mathbf {p}}_{e_1}(1-{\mathbf {p}}_{e_1})} \right] \le 4 \sum _{e_1 \in E_o^\rightarrow } \sqrt{\mathrm {w}(e_1)}, \end{aligned}$$

which is finite since \(\mathrm {w}\) is good. This gives us the integrability required to apply the mass-transport principle (3.5) to the right hand side of (3.10) and deduce that

$$\begin{aligned}&\sum _{e_1\in E_o^\rightarrow }\sqrt{\mathrm {w}(e_1)}{\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}({\mathscr {T}}_{e_1})\sqrt{\frac{{\mathbf {p}}_{e_1}}{(1-{\mathbf {p}}_{e_1})}}\right] \nonumber \\&\quad = \sum _{e_1\in E_o^\rightarrow } \mathrm {w}(e_1){\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \frac{{\mathbf {p}}_{e_1}-\omega ({e_1})}{{\mathbf {p}}_{e_1}} \right. \nonumber \\&\qquad \left. \sqrt{\frac{{\mathbf {p}}_{e_1} }{(1-{\mathbf {p}}_{e_1})\mathrm {w}(e_1)}}\cdot \#\{\text {finite clusters touching }e_1\text { and }{\mathcal {G}}\} \right] \nonumber \\&\quad = \sum _{e_1\in E_o^\rightarrow } \sum _{e_2\in E^\rightarrow } \mathrm {w}(e_1) \mathrm {w}(e_2) F(e_1,e_2) = \sum _{e_1\in E_o^\rightarrow } \sum _{e_2\in E^\rightarrow } \mathrm {w}(e_1)\mathrm {w}(e_2) F(e_2,e_1)\nonumber \\&\quad = \sum _{e_1 \in E_o^\rightarrow }\mathrm {w}(e_1){\mathbf {E}}_{\mu ,\mathrm {w},h} \sum \left\{ \frac{h_{{\mathbf {p}},\mathrm {w}}(K)}{\mathrm {w}(E(K))} : \begin{array}{l}K\text { is a finite cluster}\\ \text {of }\omega \text { touching }e_1\text { and }{\mathcal {G}}\end{array}\right\} . \end{aligned}$$
(3.11)

Letting \({\mathscr {O}}_v\) be the event that the cluster \(K_v\) is finite and touches \({\mathcal {G}}\) for each vertex v of G, we deduce that

$$\begin{aligned}&\sum _{e_1\in E_o^\rightarrow }\sqrt{\mathrm {w}(e_1)}{\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}({\mathscr {T}}_{e_1})\sqrt{\frac{{\mathbf {p}}_{e_1}}{(1-{\mathbf {p}}_{e_1})}}\right] \\&\quad \le \sum _{e_1\in E_o^\rightarrow } \mathrm {w}(e_1) {\mathbf {E}}_{\mu ,\mathrm {w},h} \sum \left\{ \frac{|h_{{\mathbf {p}},\mathrm {w}}(K)|}{\mathrm {w}(E(K))} : \begin{array}{l} K\text { is a finite cluster}\\ \text {of }\omega \text { touching }e_1\text { and }{\mathcal {G}}\end{array}\right\} \\&\quad \le \sum _{e_1\in E_o^\rightarrow } \mathrm {w}(e_1) {\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \frac{|h_{{\mathbf {p}},\mathrm {w}}(K_{o})|}{\mathrm {w}(E(K_{o}))}\mathbb {1}\bigl ({\mathscr {O}}_{o} \bigr )+ \frac{|h_{{\mathbf {p}},\mathrm {w}}(K_{e_1^+})|}{\mathrm {w}(E(K_{e_1^+}))}\mathbb {1}\bigl ({\mathscr {O}}_{e_1^+} \bigr )\right] \\&\quad =2{\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \frac{|h_{{\mathbf {p}},\mathrm {w}}(K_{o})|}{\mathrm {w}(E(K_{o}))}\mathbb {1}\bigl ({\mathscr {O}}_{o} \bigr )\right] \end{aligned}$$

as claimed, where the final equality follows by transitivity since \(\sum _{e_1\in E_o^\rightarrow }\mathrm {w}(e_1)=1\).

\(\square \)

We now bound the right hand side of the inequality of Lemma 3.3 via a martingale analysis, where we use the assumption \({\mathbf {P}}_\mu (|K_o|\ge n)\le A n^{-\theta }\) to improve upon the analysis of [45, Section 3.1]. Let \(X=(X_n)_{n\ge 0}\) be a real-valued martingale with respect to the filtration \({\mathcal {F}}=({\mathcal {F}}_n)_{n\ge 0}\), and suppose that \(X_0=0\). The quadratic variation process \(Q=(Q_n)_{n\ge 0}\) associated to \((X,{\mathcal {F}})\) is defined by \(Q_0=0\) and

$$\begin{aligned} Q_n = \sum _{i=1}^n{\mathbb {E}}\left[ |X_i - X_{i-1}|^2 \mid {\mathcal {F}}_{i-1} \right] \end{aligned}$$

for each \(n\ge 1\). The following is a minor improvement of [45, Lemma 3.4].

Lemma 3.4

Let \((X_n)_{n\ge 0}\) be a martingale with respect to the filtration \(({\mathcal {F}}_n)_{n\ge 0}\) such that \(X_0=0\), let \((Q_n)_{n\ge 0}\) be the associated quadratic variation process, and let T be a stopping time. Then

$$\begin{aligned} {\mathbb {E}}\Bigl [ \sup \bigl \{X_n^2 : 0 \le n \le T,\, Q_T \le \lambda \bigr \} \Bigr ] \le 4 {\mathbb {E}}\left[ Q_T \wedge \lambda \right] \qquad \text {for every }\lambda \ge 0. \end{aligned}$$

Proof

Fix \(\lambda \ge 0\) and let \(\tau =\sup \{k\ge 0: Q_k \le \lambda \}=\inf \{k\ge 0 : Q_k > \lambda \}-1\), which may be infinite. Since \(Q_n\) is \({\mathcal {F}}_{n-1}\)-measurable for every \(n\ge 0\), \(\tau \) is a stopping time and \(X_{n\wedge \tau \wedge T}\) is a martingale. Thus, we have by the orthogonality of martingale increments that

$$\begin{aligned} {\mathbb {E}}\left[ X^2_{n\wedge \tau \wedge T}\right]&= \sum _{i=1}^n{\mathbb {E}}\left[ (X_{i\wedge \tau \wedge T}-X_{(i-1)\wedge \tau \wedge T})^2\right] \\&= \sum _{i=1}^n{\mathbb {E}}\left[ {\mathbb {E}}\left[ (X_{i\wedge \tau \wedge T}-X_{(i-1)\wedge \tau \wedge T})^2\mid {\mathcal {F}}_{i-1} \right] \right] \\&={\mathbb {E}}\left[ \sum _{i=1}^{n \wedge T} {\mathbb {E}}\left[ (X_{i}-X_{i-1})^2\mid {\mathcal {F}}_{i-1} \right] \mathbb {1}(i \le \tau )\right] \\&= {\mathbb {E}}\left[ Q_{n \wedge \tau \wedge T}\right] \le {\mathbb {E}}\left[ Q_{T} \wedge \lambda \right] \end{aligned}$$

for every \(n\ge 1\). The claim follows by applying Doob’s \(L^2\) maximal inequality to \((X_{n\wedge \tau \wedge T})_{n\ge 0}\). \(\square \)

We now apply Lemma 3.4 to deduce the following improvement to [45, Lemma 3.5] under the assumption that the tail of the total quadratic variation satisfies a power-law upper bound.

Lemma 3.5

Let \((X_n)_{n\ge 0}\) be a martingale with respect to the filtration \(({\mathcal {F}}_n)_{n\ge 0}\) such that \(X_0=0\), and let \((Q_n)_{n\ge 0}\) be the associated quadratic variation process. Let T be a stopping time and suppose that there exist constants A and \(0\le \theta <1/2\) such that \({\mathbb {P}}( Q_T \ge x) \le A x^{-\theta }\) for every \(x > 0\). Then

$$\begin{aligned}&{\mathbb {E}}\left[ \frac{\sup _{0 \le n \le T}|X_n|}{Q_T} (1-e^{-h Q_T})\mathbb {1}(0<Q_T < \infty ) \right] \nonumber \\&\quad \le \frac{20 A}{1-2\theta } h^{(1+2\theta )/2} \qquad \text {for every }h> 0. \end{aligned}$$
(3.12)

Proof

Write \(M_n=\max _{0\le m \le n} |X_m|\) for each \(n\ge 0\). Since \((1-e^{-hx})/x\) is a decreasing function of \(x>0\), we may write

$$\begin{aligned}&{\mathbb {E}}\left[ \frac{M_T}{Q_T}\bigl (1-e^{-hQ_T}\bigr )\mathbb {1}(0<Q_T<\infty ) \right] \\&\quad \le h\sum _{k=-\infty }^\infty \frac{1-e^{-e^{k}}}{e^k} {\mathbb {E}}\left[ M_T \mathbb {1}(e^k \le h Q_T \le e^{k+1})\right] . \end{aligned}$$

We can then compute that

$$\begin{aligned} {\mathbb {E}}\left[ Q_T \wedge \lambda \right] = \int _{x=0}^\lambda {\mathbb {P}}(Q_T \ge x)\,\mathrm {d}x \le \int _{x=0}^\lambda A x^{-\theta }\,\mathrm {d}x = \frac{A}{1-\theta }\lambda ^{1-\theta } \end{aligned}$$

for every \(\lambda >0\), so that Lemma 3.4 and Cauchy-Schwarz let us bound

$$\begin{aligned} {\mathbb {E}}\left[ M_T \mathbb {1}(e^k \le hQ_T \le e^{k+1})\right] ^2&\le 4{\mathbb {E}}\left[ Q_T \wedge h^{-1}e^{k+1}\right] {\mathbb {P}}\bigl (Q_T \ge h^{-1}e^k\bigr ) \\&\le \frac{4 A}{1-\theta } e^{(1-\theta )(k+1)} h^{-(1-\theta )}\cdot A e^{-\theta k} h^{\theta } \\&= \frac{4 A^2e^{1-\theta }}{1-\theta } e^{(1-2\theta )k} h^{2\theta -1} \end{aligned}$$

for each \(k\in {\mathbb {Z}}\). Taking square roots and summing over k we obtain that

$$\begin{aligned} {\mathbb {E}}\left[ \frac{M_T}{Q_T}\bigl (1-e^{-hQ_T}\bigr )\mathbb {1}(0<Q_T<\infty ) \right] \le \frac{2A e^{(1-\theta )/2}}{\sqrt{1-\theta }} h^{(1+2\theta )/2}\sum _{k=-\infty }^\infty \frac{1-e^{-e^{k}}}{e^{(1+2\theta )k/2}}. \end{aligned}$$

This series is easily seen to converge, and indeed satisfies

$$\begin{aligned} \sum _{k=-\infty }^\infty \frac{1-e^{-e^{k}}}{e^{(1+2\theta )k/2}}&\le \sum _{k= 0}^\infty \frac{1}{e^{(1+2\theta )k/2}} + \sum _{k= 1}^\infty \frac{e^{-k}}{e^{-(1+2\theta )k/2}}\\&=\frac{1}{1-e^{-(1+2\theta )/2}}+\frac{1}{e^{(1-2\theta )/2}-1}\\&\le \frac{\sqrt{e}+1}{(\sqrt{e}-1)(1-2\theta )} \end{aligned}$$

for every \(0\le \theta <1/2\), where the final inequality can be verified by calculus. It follows that

$$\begin{aligned} {\mathbb {E}}\left[ \frac{M_T}{Q_T}\bigl (1-e^{-hQ_T}\bigr )\mathbb {1}(0<Q_T<\infty ) \right]\le & {} \frac{2 (\sqrt{e}+1) A \sqrt{2e}}{(\sqrt{e}-1)(1-2\theta )} h^{(1+2\theta )/2} \\\le & {} \frac{20 A}{1-2\theta } h^{(1+2\theta )/2} \end{aligned}$$

as claimed, where we used the bound \((2 (\sqrt{e}+1) \sqrt{2e}) / (\sqrt{e}-1) = 19.040\ldots \le 20\) to simplify the constant. \(\square \)
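The numerical constants in this proof can be sanity-checked with a short script. The following is only an illustrative sketch: the function names and the truncation level `k_max` are our own choices, and the truncated sum is a lower bound on the full series, so the comparison is a plausibility check rather than a proof.

```python
import math

def truncated_series(theta, k_max=200):
    """Partial sum over |k| <= k_max of (1 - exp(-e^k)) / e^{(1+2*theta)k/2}."""
    total = 0.0
    for k in range(-k_max, k_max + 1):
        # -expm1(-x) equals 1 - exp(-x) and stays accurate when x = e^k is tiny.
        total += -math.expm1(-math.exp(k)) * math.exp(-(1 + 2 * theta) * k / 2)
    return total

def geometric_bound(theta):
    """The two geometric sums appearing in the middle line of the series estimate."""
    return (1 / (1 - math.exp(-(1 + 2 * theta) / 2))
            + 1 / (math.exp((1 - 2 * theta) / 2) - 1))

def final_bound(theta):
    """The closing bound (sqrt(e)+1) / ((sqrt(e)-1)(1-2*theta))."""
    s = math.sqrt(math.e)
    return (s + 1) / ((s - 1) * (1 - 2 * theta))

def simplified_constant():
    """The constant 2(sqrt(e)+1)sqrt(2e)/(sqrt(e)-1) that is rounded up to 20."""
    s = math.sqrt(math.e)
    return 2 * (s + 1) * math.sqrt(2 * math.e) / (s - 1)

for theta in (0.0, 0.1, 0.25, 0.4):
    print(theta, truncated_series(theta), geometric_bound(theta), final_bound(theta))
print(simplified_constant())
```

For values of \(\theta \) very close to 1/2 the terms with negative k decay slowly, so a larger `k_max` would be needed before the truncated sum is a good approximation.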

Proof of Proposition 3.2

We prove the proposition in the case that \(\mu \) is supported on \((0,1)^E\), which is the only case required by our main theorems. The general case follows by a simple limiting argument that is given in detail in the proof of [45, Theorem 3.1]. Let \(\mu \) be a \(\Gamma \)-invariant probability measure on \((0,1)^E\), let \(\mathrm {w}\) be a \(\Gamma \)-good weight function, and let \(({\mathbf {p}},\omega )\) be random variables with law \({\mathbf {P}}_\mu \). Write \(K=K_o\) for the cluster of o in \(\omega \). As in the proofs of [44, Theorem 1.6] and [45, Theorem 3.1], we can condition on the environment \({\mathbf {p}}\) and explore the cluster K one edge at a time in such a way that if T denotes the (possibly infinite) total number of edges touching K, \(E_{n}\) denotes the (random, unoriented) edge whose status is queried at the nth step of the exploration for each \(n\ge 0\), and \({\mathcal {F}}_{n}\) denotes the \(\sigma \)-algebra generated by the environment \({\mathbf {p}}\) and the first n steps of the exploration for each \(n\ge 0\), then \({\mathbf {P}}_\mu (\omega (E_{n+1})=1 \mid {\mathcal {F}}_n) = {\mathbf {p}}_{E_{n+1}}\) whenever \(n<T\) and \(\{E_i : 1 \le i \le T\}=E(K)\). (Briefly, we can define such an exploration process by fixing an enumeration \(E=\{e_1,e_2,\ldots \}\) and, at each step, taking \(E_{n+1}\) to be minimal with respect to this enumeration among those edges that are incident to the part of the cluster of o that has been explored but have not already been queried. See the above references for formal definitions.) It follows that the process \((Z_n)_{n\ge 0}\) defined by \(Z_0=0\) and

$$\begin{aligned} Z_n = \sum _{i=1}^{n\wedge T} \sqrt{\mathrm {w}(E_i)} \left[ \sqrt{\frac{{\mathbf {p}}_{E_i}}{1-{\mathbf {p}}_{E_i}}} \mathbb {1}(\omega (E_i)=0) - \sqrt{\frac{1-{\mathbf {p}}_{E_i}}{{\mathbf {p}}_{E_i}}} \mathbb {1}(\omega (E_i)=1)\right] \end{aligned}$$

for each \(n\ge 1\) is a martingale with respect to the filtration \(({\mathcal {F}}_n)_{n\ge 0}\) for which the final value \(Z_T\) is equal to the \(\mathrm {w}\)-fluctuation \(h_{{\mathbf {p}},\mathrm {w}}(K)\). Moreover, we can express the associated quadratic variation process \(Q_n=\sum _{i=1}^n {\mathbf {E}}_{\mu }[(Z_{i}-Z_{i-1})^2 \mid {\mathcal {F}}_{i-1}]\) as

$$\begin{aligned} Q_n= & {} \sum _{i=1}^{n\wedge T} {\mathbf {E}}_{\mu }\left[ \mathrm {w}(E_i) \left[ \frac{{\mathbf {p}}_{E_i}}{1-{\mathbf {p}}_{E_i}} \mathbb {1}(\omega (E_i)=0) + \frac{1-{\mathbf {p}}_{E_i}}{{\mathbf {p}}_{E_i}} \mathbb {1}(\omega (E_i)=1)\right] \Biggm | {\mathcal {F}}_{i-1}\right] \\= & {} \sum _{i=1}^{n\wedge T} \mathrm {w}(E_i) \end{aligned}$$

for every \(n\ge 0\), so that \(Q_T = \mathrm {w}(E(K))\) is the total weight of all the edges touching K. Thus, it follows from Lemmas 3.3 and 3.5 that if \({\mathbf {P}}_\mu (|K|\ge n) \le A n^{-\theta }\) for every \(n\ge 1\) then

$$\begin{aligned}&\sum _{e\in E^\rightarrow _o} \sqrt{\mathrm {w}(e)}{\mathbf {E}}_{\mu ,\mathrm {w},h}\left[ \mathbb {1}({\mathscr {T}}_e) \sqrt{\frac{{\mathbf {p}}_e}{1-{\mathbf {p}}_e}}\right] \\&\quad \le 2 {\mathbf {E}}_{\mu }\left[ \frac{|h_{{\mathbf {p}},\mathrm {w}}(K)|}{\mathrm {w}(E(K))} \bigl (1-e^{-h \mathrm {w}(E(K))}\bigr ) \mathbb {1}\bigl (|K|<\infty \bigr )\right] \\&\quad = 2 {\mathbf {E}}_{\mu }\left[ \frac{|Z_T|}{Q_T}\bigl (1-e^{-h Q_T}\bigr )\mathbb {1}(0<Q_T<\infty ) \right] \le \frac{40 A}{1-2\theta } h^{(1+2\theta )/2} \end{aligned}$$

as required. \(\square \)
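The two facts about the exploration martingale used above (each increment has conditional mean zero, and its conditional second moment equals the weight of the queried edge) reduce to a one-line computation for a single edge. The sketch below checks this numerically for a few illustrative values; the function name and the sample values of p and w are our own and are not taken from the paper.

```python
import math

def increment_moments(p, w):
    """Conditional mean and second moment of one increment of Z.

    p is the probability that the queried edge is open and w is its weight.
    """
    value_if_closed = math.sqrt(w) * math.sqrt(p / (1 - p))    # omega(e) = 0
    value_if_open = -math.sqrt(w) * math.sqrt((1 - p) / p)     # omega(e) = 1
    mean = (1 - p) * value_if_closed + p * value_if_open
    second_moment = (1 - p) * value_if_closed ** 2 + p * value_if_open ** 2
    return mean, second_moment

for p in (0.1, 0.5, 0.9):
    for w in (0.3, 2.0):
        mean, second_moment = increment_moments(p, w)
        print(p, w, mean, second_moment)
```

Up to floating-point error, the mean vanishes and the second moment equals w, in line with the identity \(Q_T=\mathrm {w}(E(K))\).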

4 Proof of the main theorem

In this section we apply Theorems 2.1 and 3.1 to prove Theorems 1.1 and 1.2. The proof of Theorem 1.1 relies on the following key bootstrapping lemma.

Lemma 4.1

Let \(d\ge 1\), let \(J:{\mathbb {Z}}^d \rightarrow [0,\infty )\) be symmetric and integrable, and suppose that there exist \(\alpha < d\), \(c>0\), and \(r_0<\infty \) such that \(J(x)\ge c \Vert x \Vert _1^{-d-\alpha }\) for every \(x\in {\mathbb {Z}}^d\) with \(\Vert x\Vert _1 \ge r_0\). Let \(\theta =(d-\alpha )/(2d+\alpha ) <1/2\). Then there exists a constant \(C\ge 1\) such that the following implication holds for each \(0 \le \beta < \beta _c\) and \(1 \le A<\infty \):

$$\begin{aligned}&\Bigl ({\mathbf {P}}_\beta (|K|\ge n) \le A n^{-\theta }\text { for every } n\ge 1\Bigr )\\&\quad \Rightarrow \Bigl ({\mathbf {P}}_\beta (|K|\ge n) \le C A^{1/(1+\theta )} n^{-\theta }\text { for every }n\ge 1\Bigr ). \end{aligned}$$

Proof of Lemma 4.1

By rescaling if necessary, we may assume without loss of generality that \(\sum _{e\in E^\rightarrow _o} J_e =1\). Fix \(0 \le \beta <\beta _c\) and suppose that \(1 \le A <\infty \) is such that \({\mathbf {P}}_\beta (|K|\ge n) \le A n^{-\theta }\) for every \(n\ge 1\), where \(\theta =(d-\alpha )/(2d+\alpha ) <1/2\). We wish to prove that there exists a constant C that may depend on d, \(\alpha \), c, and \(r_0\) but not on the choice of \(1\le A < \infty \) or \(0\le \beta <\beta _c\) such that

$$\begin{aligned} {\mathbf {P}}_\beta (|K|\ge n) \le C A^{1/(1+\theta )} n^{-\theta } \end{aligned}$$

for every \(n\ge 1\). If \(\beta \le 1/2\) then a standard path-counting argument implies that \({\mathbf {E}}_\beta |K_o| \le 2\), so that the claim holds trivially in this case by Markov’s inequality provided that we take \(C\ge 2\). We may therefore assume that \(\beta \ge 1/2\) for the remainder of the proof.

All the constants appearing in the remainder of the proof may depend on d, \(\alpha \), c, and \(r_0\) but not on the choice of \(1\le A < \infty \) or \(1/2\le \beta <\beta _c\). For each \(x\in {\mathbb {Z}}^d\) and \(n\ge 1\), let \({\mathscr {S}}_{x,n}'\) be the event that 0 and x belong to distinct clusters each of which contains at least n vertices; both clusters are automatically finite since \(\beta < \beta _c\). Since \(\theta <1/2\), we have by Theorem 3.1 that there exists a constant \(C_1\) such that

$$\begin{aligned} \sum _{x \in {\mathbb {Z}}^d} (e^{\beta J_x}-1) {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}')^2 \le C_1 A n^{-(1+2\theta )} \end{aligned}$$

for every \(n \ge 1\). Let \(\Lambda '_r=\Lambda _r \setminus \Lambda _{r_0-1}\) for each \(r\ge r_0\). It follows by Cauchy-Schwarz that there exists a constant \(C_2\) such that

$$\begin{aligned} \sum _{x \in \Lambda _r'} {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}')&\le \left[ \sum _{x \in \Lambda _r'} (e^{\beta J_x}-1) {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}')^2\right] ^{1/2}\left[ \sum _{x \in \Lambda _r'} \frac{1}{e^{\beta J_x}-1}\right] ^{1/2} \nonumber \\&\le C_1^{1/2} A^{1/2} n^{-(1+2\theta )/2} \left( \frac{1}{c\beta r^{-d-\alpha }} |\Lambda _r'| \right) ^{1/2} \nonumber \\&\le C_2 A^{1/2} n^{-(1+2\theta )/2} r^{\alpha /2} |\Lambda _r| \end{aligned}$$
(4.1)

for every \(r\ge r_0\), where we used the inequality \(e^x-1 \ge x\) together with the lower bound on J in the second line and the assumption \(\beta \ge 1/2\) in the third. On the other hand, since \(\theta <1/2\), it follows immediately from Theorem 2.1 that there exists a constant \(C_3\) such that

$$\begin{aligned} \frac{1}{|\Lambda _r'|}\sum _{x\in \Lambda _r} {\mathbf {P}}_\beta (0 \leftrightarrow x) \le C_3 A^{2/(1+\theta )} |\Lambda _r'|^{-2\theta /(1+\theta )} \le C_3 A^{2/(1+\theta )} r^{-2\theta d/(1+\theta )} \nonumber \\ \end{aligned}$$
(4.2)

for every \(r\ge r_0\). We now apply these two bounds to obtain a new bound on \({\mathbf {P}}_\beta (|K_0|\ge n)\). We have by a union bound and the Harris-FKG inequality that

$$\begin{aligned} {\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}') \ge {\mathbf {P}}_\beta (|K_0|\ge n,|K_x|\ge n)-{\mathbf {P}}_\beta (0 \leftrightarrow x) \ge {\mathbf {P}}_\beta (|K_0|\ge n)^2-{\mathbf {P}}_\beta (0 \leftrightarrow x) \end{aligned}$$

for each \(x\in {\mathbb {Z}}^d\) and \(n\ge 1\). Rearranging and averaging over \(x\in \Lambda _r'\), it follows that

$$\begin{aligned} {\mathbf {P}}_\beta (|K_0|\ge n)^2&\le \frac{1}{|\Lambda _r'|}\sum _{x\in \Lambda _r'}{\mathbf {P}}_\beta ({\mathscr {S}}_{x,n}')+ \frac{1}{|\Lambda _r'|}\sum _{x\in \Lambda _r'} {\mathbf {P}}_\beta (0 \leftrightarrow x) \end{aligned}$$
(4.3)
$$\begin{aligned}&\le C_2 A^{1/2} r^{\alpha /2} n^{-(1+2\theta )/2}+C_3 A^{2/(1+\theta )} r^{-2d\theta /(1+\theta )} \end{aligned}$$
(4.4)

for every \(r\ge r_0\) and \(n\ge 1\). Taking \(r=r_0 \vee \left\lceil n^{(1-2\theta )/\alpha }\right\rceil \) yields that there exists a constant \(C_4\) such that

$$\begin{aligned} {\mathbf {P}}_\beta (|K_0|\ge n)^2&\le C_4 \left( A^{1/2} n^{-2\theta }+ A^{2/(1+\theta )} n^{-2d\theta (1-2\theta )/(\alpha +\alpha \theta )} \right) \end{aligned}$$
(4.5)

for every \(n \ge 1\). Since \(\theta =(d-\alpha )/(2d+\alpha )\), the two powers of n appearing in this expression are equal. Since we also have that \(A^{1/2} \le A^{2/(1+\theta )}\), it follows by taking square roots on both sides of (4.5) that \({\mathbf {P}}_\beta (|K_0|\ge n) \le \sqrt{2C_4} A^{1/(1+\theta )} n^{-\theta }\) for every \(n\ge 1\). This completes the proof. \(\square \)
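The claim that the choice \(\theta =(d-\alpha )/(2d+\alpha )\) balances the two powers of n can be checked in exact rational arithmetic. The sketch below substitutes \(r=n^{(1-2\theta )/\alpha }\) into the two terms of (4.4) and compares the resulting exponents; the function name and the sample values of d and \(\alpha \) are illustrative choices of ours, and the rounding in the definition of r is ignored.

```python
from fractions import Fraction

def lemma41_exponents(d, alpha):
    """Return theta and the powers of n in the two terms of (4.4) after
    substituting r = n^{(1 - 2*theta)/alpha}. Assumes 0 < alpha < d."""
    d, alpha = Fraction(d), Fraction(alpha)
    theta = (d - alpha) / (2 * d + alpha)
    r_power = (1 - 2 * theta) / alpha
    # First term: r^{alpha/2} * n^{-(1 + 2*theta)/2}
    first = r_power * alpha / 2 - (1 + 2 * theta) / 2
    # Second term: r^{-2*d*theta/(1 + theta)}
    second = -r_power * 2 * d * theta / (1 + theta)
    return theta, first, second

for d, alpha in [(1, Fraction(1, 2)), (2, 1), (3, Fraction(5, 2))]:
    theta, first, second = lemma41_exponents(d, alpha)
    print(d, alpha, theta, first, second)
```

In each case both exponents come out equal to \(-2\theta \), which is what the final square-root step of the proof requires.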

Proof of Theorem 1.1

We follow the same argument used to deduce Proposition 1.4 from the implication (1.8); we include the details again here for ease of reading. Let \(\theta =(d-\alpha )/(2d+\alpha )<1/2\). For each \(0 \le \beta < \beta _c\), we have by sharpness of the phase transition [2, 29] that \(|K_0|\) has finite mean, and in particular that there exists \(1 \le A < \infty \) such that \({\mathbf {P}}_\beta (|K_0|\ge n) \le A n^{-\theta }\) for every \(n\ge 1\). For each \(0\le \beta < \beta _c\) we may therefore define

$$\begin{aligned} A_\beta = \min \bigl \{1 \le A< \infty : {\mathbf {P}}_\beta (|K_0|\ge n) \le A n^{-\theta } \text { for every }n\ge 1\bigr \} < \infty . \end{aligned}$$

Observe that the set we are minimizing over is closed, so that \({\mathbf {P}}_\beta (|K_0|\ge n) \le A_\beta n^{-\theta }\) for every \(n\ge 1\) and \(0 \le \beta < \beta _c\). Lemma 4.1 implies that there exists a constant \(C=C(d,\alpha ,c,r_0)\) such that \(A_\beta \le C A_\beta ^{1/(1+\theta )}\) for every \(0 \le \beta < \beta _c\). Since \(A_\beta \) is finite for every \(0\le \beta <\beta _c\) we may safely rearrange this inequality to obtain that \(A_\beta \le C^{(1+\theta )/\theta }\) for every \(0\le \beta <\beta _c\) and hence that

$$\begin{aligned} {\mathbf {P}}_\beta (|K_0|\ge n) \le C^{(1+\theta )/\theta } n^{-\theta } \end{aligned}$$

for every \(0 \le \beta < \beta _c\) and \(n\ge 1\). This implies in particular that \(\beta _c<\infty \). Considering the standard monotone coupling of \({\mathbf {P}}_\beta \) and \({\mathbf {P}}_{\beta _c}\) for \(\beta \le \beta _c\) and taking limits as \(\beta \uparrow \beta _c\), it follows that the same estimate holds for all \(0\le \beta \le \beta _c\) as claimed. The claimed bound on the averaged two-point function \(|\Lambda _r|^{-1}\sum _{x\in \Lambda _r} {\mathbf {P}}_\beta (0\leftrightarrow x)\) follows immediately from the bound \({\mathbf {P}}_\beta (|K_0|\ge n) \le C^{(1+\theta )/\theta } n^{-\theta }\) together with Theorem 2.1. \(\square \)
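The rearrangement step in this proof says that any finite \(A\ge 1\) satisfying \(A \le C A^{1/(1+\theta )}\) is at most \(C^{(1+\theta )/\theta }\). The following sketch illustrates this threshold numerically; the function names and the sample values C = 3 and \(\theta \) = 0.2 are hypothetical and chosen only for illustration.

```python
def bootstrap_ceiling(C, theta):
    """Threshold C^{(1+theta)/theta} obtained by rearranging A <= C * A^{1/(1+theta)}."""
    return C ** ((1 + theta) / theta)

def satisfies_bootstrap(A, C, theta):
    """Check whether A is consistent with the self-improving inequality."""
    return A <= C * A ** (1 / (1 + theta))

C, theta = 3.0, 0.2  # illustrative values only
ceiling = bootstrap_ceiling(C, theta)
print(ceiling)
print(satisfies_bootstrap(0.99 * ceiling, C, theta))
print(satisfies_bootstrap(1.01 * ceiling, C, theta))
```

Values of A slightly below the threshold are consistent with the inequality while values slightly above are not, which is why the minimal constant \(A_\beta \) cannot exceed \(C^{(1+\theta )/\theta }\) once it is known to be finite.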

Proof of Theorem 1.2

This proof is very similar to that of Theorem 1.1, and we will omit most of the details. As before, we may assume without loss of generality that \(\sum _{e\in E^\rightarrow _o} J_e =1\). The analogue of Lemma 4.1 is as follows: Let \(\theta =(2a-1)/(a+1)<1/2\). Then there exists a constant C such that the implication

$$\begin{aligned}&\Bigl ({\mathbf {P}}_\beta (|K|\ge n) \le A n^{-\theta }\text { for every }n\ge 1\Bigr )\nonumber \\&\quad \Rightarrow \Bigl ({\mathbf {P}}_\beta (|K|\ge n) \le C A^{1/(1+\theta )} n^{-\theta }\text { for every }n\ge 1\Bigr ) \end{aligned}$$
(4.6)

holds for every \(1 \le A < \infty \) and \(0 \le \beta < \beta _c\). This will be proven via essentially the same argument as above but where we replace the set \(\Lambda _r'\) with the analogous set \(\Lambda _\varepsilon =\{x \in V : \{o,x\} \in E, J_{\{o,x\}} \ge \varepsilon \}\), which satisfies \(|\Lambda _\varepsilon | \ge c\varepsilon ^{-a}\) for every \(0<\varepsilon \le \varepsilon _0\) by assumption. As before, it suffices to consider the case that \(\beta \ge 1/2\). Fix \(1/2 \le \beta <\beta _c\) and \(1 \le A < \infty \) and suppose that \({\mathbf {P}}_\beta (|K|\ge n) \le A n^{-\theta }\) for every \(n\ge 1\). The derivations of (4.1) and (4.2) from Theorem 3.1 and Theorem 2.1 yield in this context that there exist constants \(C_1\), \(C_2\), and \(C_3\) such that

$$\begin{aligned}&\frac{1}{|\Lambda _\varepsilon |}\sum _{x \in \Lambda _\varepsilon } {\mathbf {P}}_\beta ({\mathscr {S}}'_{x,n})&\le C_1 A^{1/2}n^{-(1+2\theta )/2} \varepsilon ^{-(1-a)/2} \end{aligned}$$
(4.7)
$$\begin{aligned} \text {and}&\frac{1}{|\Lambda _\varepsilon |}\sum _{x\in \Lambda _\varepsilon } {\mathbf {P}}_\beta (0 \leftrightarrow x)&\le C_2 A^{2/(1+\theta )} |\Lambda _\varepsilon |^{-2\theta /(1+\theta )} \le C_3 A^{2/(1+\theta )} \varepsilon ^{2a\theta /(1+\theta )} \end{aligned}$$
(4.8)

for every \(0<\varepsilon \le \varepsilon _0\) and \(n\ge 1\). The same union bound and Harris-FKG argument used to derive (4.4) then yields that

$$\begin{aligned} {\mathbf {P}}_\beta (|K_o| \ge n)^2 \le C_1 A^{1/2}n^{-(1+2\theta )/2} \varepsilon ^{-(1-a)/2} + C_3 A^{2/(1+\theta )} \varepsilon ^{2a\theta /(1+\theta )} \end{aligned}$$

for every \(0<\varepsilon \le \varepsilon _0\) and \(n\ge 1\). Taking \(\varepsilon =\varepsilon _0 \wedge n^{-(1-2\theta )/(1-a)}\) implies that there exists a constant \(C_4\) such that

$$\begin{aligned} {\mathbf {P}}_\beta (|K_o| \ge n)^2 \le C_4 A^{1/2} n^{-2\theta } + C_4 A^{2/(1+\theta )} n^{-2a\theta (1-2\theta )/((1-a)(1+\theta ))} \end{aligned}$$

for every \(n\ge 1\). As before, the definition of \(\theta \) is chosen such that these two powers of n are equal, and we obtain that \({\mathbf {P}}_\beta (|K_o|\ge n) \le \sqrt{2C_4} A^{1/(1+\theta )} n^{-\theta }\) for every \(n\ge 1\). This completes the proof of the implication (4.6). The derivation of Theorem 1.2 from the implication (4.6) is identical to the derivation of Theorem 1.1 from Lemma 4.1 and is omitted. \(\square \)
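As with Lemma 4.1, the balance of exponents behind the choice \(\theta =(2a-1)/(a+1)\) can be verified exactly. The sketch below substitutes \(\varepsilon =n^{-(1-2\theta )/(1-a)}\) into the right-hand sides of (4.7) and (4.8); the function name and the sample values of a are illustrative choices of ours, and the truncation at \(\varepsilon _0\) is ignored.

```python
from fractions import Fraction

def theorem12_exponents(a):
    """Return theta and the powers of n obtained from (4.7) and (4.8) after
    substituting eps = n^{-(1 - 2*theta)/(1 - a)}. Assumes 0 < a < 1."""
    a = Fraction(a)
    theta = (2 * a - 1) / (a + 1)
    eps_power = -(1 - 2 * theta) / (1 - a)
    # (4.7): n^{-(1 + 2*theta)/2} * eps^{-(1 - a)/2}
    first = -(1 + 2 * theta) / 2 - eps_power * (1 - a) / 2
    # (4.8): eps^{2*a*theta/(1 + theta)}
    second = eps_power * 2 * a * theta / (1 + theta)
    return theta, first, second

for a in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(9, 10)):
    theta, first, second = theorem12_exponents(a)
    print(a, theta, first, second)
```

For each sample value of a, both exponents equal \(-2\theta \), matching the power of n claimed in the final bound.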