Power-law bounds for critical long-range percolation below the upper-critical dimension

We study long-range Bernoulli percolation on $\mathbb{Z}^d$ in which each two vertices $x$ and $y$ are connected by an edge with probability $1-\exp(-\beta \|x-y\|^{-d-\alpha})$. It is a theorem of Noam Berger (CMP, 2002) that if $0<\alpha<d$ then there is no infinite cluster at the critical parameter $\beta_c$. We give a new, quantitative proof of this theorem establishing the power-law upper bound \[ \mathbf{P}_{\beta_c}\bigl(|K|\geq n\bigr) \leq C n^{-(d-\alpha)/(2d+\alpha)} \] for every $n\geq 1$, where $K$ is the cluster of the origin. We believe that this is the first rigorous power-law upper bound for a Bernoulli percolation model that is neither planar nor expected to exhibit mean-field critical behaviour. As part of the proof, we establish a universal inequality implying that the maximum size of a cluster in percolation on any finite graph is of the same order as its mean with high probability. We apply this inequality to derive a new rigorous hyperscaling inequality $(2-\eta)(\delta+1)\leq d(\delta-1)$ relating the cluster-volume exponent $\delta$ and two-point function exponent $\eta$.


Introduction
Let $d\geq1$ and suppose that $J:\mathbb{Z}^d\to[0,\infty)$ is both symmetric, in the sense that $J(x)=J(-x)$ for every $x\in\mathbb{Z}^d$, and integrable, in the sense that $\sum_{x\in\mathbb{Z}^d}J(x)<\infty$. For each $\beta\geq0$, long-range percolation on $\mathbb{Z}^d$ with intensity $J$ is the random graph with vertex set $\mathbb{Z}^d$ in which we choose whether or not to include each potential edge $\{x,y\}$ independently at random with inclusion probability $1-\exp(-\beta J(y-x))$. Note that this model is equivalent to nearest-neighbour percolation when $J(x)=\mathbb{1}(\|x\|_1=1)$. Here we will instead be most interested in the case that $J(x)$ decays like an inverse power of $\|x\|$, so that
\[ J(x)\sim A\|x\|^{-d-\alpha} \quad\text{as } x\to\infty \tag{1.1} \]
for some constants $A>0$ and $\alpha>0$. We denote the law of the resulting random graph by $\mathbf{P}_\beta=\mathbf{P}_{J,\beta}$ and refer to the connected components of this random graph as clusters. Studying the geometry of these clusters leads to many interesting questions, some of which are motivated by applications to modeling 'small-world' phenomena in physics, epidemiology, the social sciences, and so on; see e.g. [12, Section 1.4] and [14, Section 10.6] for background and many references. Although substantial progress on these questions has been made over the last forty years, with highlights of the literature including [5,12,13,20,23,24,68], many further important problems remain open.
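To make the definition concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions rather than anything used in the paper: since $\mathbb{Z}^d$ is infinite, it samples the model on the finite torus $(\mathbb{Z}/L\mathbb{Z})^d$ as a stand-in, takes $J(x)=\|x\|^{-d-\alpha}$ (that is, $A=1$) with $\|\cdot\|$ the Euclidean norm of the shortest torus representative of $x$, includes each pair $\{x,y\}$ independently with probability $1-\exp(-\beta J(y-x))$, and returns the cluster sizes found by union-find. The names `torus_norm` and `sample_cluster_sizes` are hypothetical.

```python
import itertools
import math
import random


def torus_norm(diff, L):
    """Euclidean norm of the shortest representative of diff in (Z/LZ)^d."""
    return math.sqrt(sum(min(abs(c) % L, L - abs(c) % L) ** 2 for c in diff))


def sample_cluster_sizes(L, d, alpha, beta, seed=0):
    """Sample long-range percolation on the torus (Z/LZ)^d (a finite-volume
    stand-in for Z^d) with J(x) = ||x||^(-d-alpha); return cluster sizes,
    largest first."""
    rng = random.Random(seed)
    vertices = list(itertools.product(range(L), repeat=d))
    parent = list(range(len(vertices)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(vertices)), 2):
        diff = [a - b for a, b in zip(vertices[i], vertices[j])]
        J = torus_norm(diff, L) ** (-d - alpha)
        # Each potential edge is open independently with probability 1 - exp(-beta * J).
        if rng.random() < 1.0 - math.exp(-beta * J):
            parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(vertices)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


print(sample_cluster_sizes(8, 2, 1.0, 0.5)[:5])
```

The loop over all pairs costs order $L^{2d}$ operations, which is only practical for small tori.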
In this paper we study the phase transition in long-range percolation. Given $d\geq1$ and a symmetric, integrable function $J:\mathbb{Z}^d\to[0,\infty)$, we define the critical parameter
\[ \beta_c=\beta_c(J)=\sup\bigl\{\beta\geq0:\mathbf{P}_\beta \text{ is supported on configurations with no infinite clusters}\bigr\}. \]
Elementary path-counting arguments yield that there are almost surely no infinite clusters when $\beta\sum_x J(x)<1$, and hence that $\beta_c\geq1/\sum_x J(x)>0$ under the assumption that $J$ is integrable. When $d=1$ and $J$ is of the form (1.1), the model has a non-trivial phase transition, in the sense that $0<\beta_c<\infty$, if and only if $\alpha\leq1$, while for $d\geq2$ the phase transition is non-trivial for every $\alpha>0$ [54,57]. As with nearest-neighbour percolation, the model is expected to exhibit many interesting fractal-like features when $\beta=\beta_c$ (see e.g. [20,24]), but proving this rigorously seems to be a very difficult problem in general.
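For convenience, here is the standard path-counting argument written out (the expectation notation $\mathbf{E}_\beta$ is ours). Write $x_0=0$ and use $1-e^{-t}\leq t$. Since the edges of a self-avoiding path are distinct, independence gives
\[ \mathbf{E}_\beta\bigl[\#\{\text{open self-avoiding paths } 0=x_0,x_1,\ldots,x_n\}\bigr]\leq\sum_{x_1,\ldots,x_n\in\mathbb{Z}^d}\,\prod_{i=1}^{n}\Bigl(1-e^{-\beta J(x_i-x_{i-1})}\Bigr)\leq\Bigl(\beta\sum_{x\in\mathbb{Z}^d}J(x)\Bigr)^{n}, \]
where the last step sums over $x_n,x_{n-1},\ldots,x_1$ in turn. The expected degree of each vertex is at most $\beta\sum_x J(x)<\infty$, so every vertex has finite degree almost surely, and if $|K_0|=\infty$ then (by König's lemma) there is an open self-avoiding path of every length starting at the origin. Markov's inequality therefore gives $\mathbf{P}_\beta(|K_0|=\infty)\leq(\beta\sum_x J(x))^n$ for every $n\geq1$, which tends to zero when $\beta\sum_x J(x)<1$.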
It is a surprising fact that our understanding of long-range percolation models is better than our understanding of their nearest-neighbour counterparts in many situations. Indeed, it is a remarkable theorem of Noam Berger [11] that long-range percolation on $\mathbb{Z}^d$ undergoes a continuous phase transition, in the sense that there is no infinite cluster at $\beta_c$, whenever $d\geq1$ and $0<\alpha<d$. The corresponding statement for nearest-neighbour percolation with $d\geq2$ is of course a notorious open problem needing little further introduction. While it is widely believed that the phase transition should be continuous for all $\alpha>0$ and $d\geq2$, it is a theorem of Aizenman and Newman [5] that the model undergoes a discontinuous phase transition when $d=\alpha=1$, so that the condition $\alpha<d$ cannot be removed from Berger's result in general.
Berger's proof works by showing that the set of $\beta$ for which an infinite cluster exists a.s. is open, and gives little quantitative control of percolation at the critical parameter $\beta_c$ itself. In this paper we give a new, quantitative proof of Berger's result that yields an explicit power-law upper bound on the tail of the volume of the cluster of the origin at criticality under the same assumptions. We write $K_0$ for the cluster of the origin, write $\Lambda_r=[-r,r]^d\cap\mathbb{Z}^d$ for each $r\geq0$, and write $\{x\leftrightarrow y\}$ for the event that $x$ and $y$ belong to the same cluster. for every $x\in\mathbb{Z}^d$ with $\|x\|_1\geq r_0$. Then there exists a constant $C$ such that for every $\beta\leq\beta_c$, $n\geq1$, and $r\geq1$. In particular, there are almost surely no infinite clusters at the critical parameter $\beta_c$.
The theorem is most interesting when d < 6 and α > d/3, in which case the model is not expected to have mean-field behaviour and high-dimensional techniques such as the lace expansion [20,33] should not apply. Indeed, we believe that Theorem 1.1 is the first rigorous, non-trivial power-law upper bound for a critical Bernoulli percolation model that is neither two-dimensional nor expected to be described by mean-field critical exponents.
Let us now discuss interpretations of our results in terms of critical exponents. It is strongly believed that the large-scale behaviour of critical (long-range or nearest-neighbour) percolation on $d$-dimensional Euclidean lattices is described by critical exponents [31, Chapters 9 and 10]. The most relevant of these exponents to us are traditionally denoted $\delta$ and $\eta$ and are believed to describe the distribution of the cluster of the origin at criticality via the asymptotics
\[ \mathbf{P}_{\beta_c}(|K_0|\geq n)\approx n^{-1/\delta} \text{ as } n\to\infty \qquad\text{and}\qquad \mathbf{P}_{\beta_c}(0\leftrightarrow x)\approx\|x\|^{-d+2-\eta} \text{ as } x\to\infty, \]
where $\approx$ means that the ratio of the logarithms of the two sides tends to 1 in the relevant limit. These exponents are expected to depend on the dimension $d$ and the long-range parameter $\alpha$ (if appropriate) but not on the small-scale details of the model such as the choice of lattice. It is an open problem of central importance to prove the existence of and/or compute these exponents, as well as to prove that they are universal in this sense. Significant progress has been made in high dimensions ($d>6$ or $\alpha<d/3$) [4,6,20,29,33,35], where it is known that $\delta=2$ and $\eta=0$ for several large classes of examples, and for nearest-neighbour models in two dimensions [46,49,62,63], where it has been proven in particular that $\delta=91/5$ and $\eta=5/24$ for site percolation on the triangular lattice, as predicted by Nienhuis [55]. Important partial progress for other two-dimensional planar lattices has been made by Kesten [44,45,46] and Kesten and Zhang [47]. Progress in intermediate dimensions has, however, been extremely limited. Theorem 1.1 can be seen as a modest first step towards understanding the problem in this regime, and implies that for long-range percolation with $0<\alpha<d$ the exponents $\delta$ and $\eta$ satisfy whenever they are well-defined. (Conversely, the mean-field lower bound of Aizenman and Barsky [2] implies that $\delta$ satisfies $\delta\geq2$ whenever it is well-defined; see also [28, Proposition 1.3].)
See Section 1.3 for a discussion of how these bounds compare to the non-rigorous predicted values of η and δ in the physics literature. We remark that similar bounds on other exponents including the susceptibility exponent γ, gap exponent ∆, and cluster density exponent β can be obtained from (1.4) using the rigorous scaling inequalities γ ≤ δ − 1, ∆ ≤ δ, and β ≥ 2/δ proven in [43] and [53].
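As a worked illustration, the sketch below turns the volume-tail bound $\mathbf{P}_{\beta_c}(|K|\geq n)\leq Cn^{-(d-\alpha)/(2d+\alpha)}$ from the abstract into numerical bounds on $\delta$, $\gamma$, $\Delta$, and the exponent $\beta$ via the scaling inequalities quoted above. It assumes all exponents exist, uses only the volume part of the estimates, and the function name `exponent_bounds` is hypothetical. Comparing $n^{-1/\delta}$ with $Cn^{-(d-\alpha)/(2d+\alpha)}$ forces $1/\delta\geq(d-\alpha)/(2d+\alpha)$, i.e. $\delta\leq(2d+\alpha)/(d-\alpha)$.

```python
from fractions import Fraction


def exponent_bounds(d, alpha):
    """Bounds implied by delta <= (2d + alpha)/(d - alpha) together with the
    scaling inequalities gamma <= delta - 1, Delta <= delta, beta >= 2/delta.
    Assumes all the exponents are well-defined."""
    alpha = Fraction(alpha)
    if not 0 < alpha < d:
        raise ValueError("need 0 < alpha < d")
    delta_max = Fraction(2 * d + alpha) / (d - alpha)
    return {
        "delta_max": delta_max,      # delta <= (2d + alpha)/(d - alpha)
        "gamma_max": delta_max - 1,  # gamma <= delta - 1
        "Delta_max": delta_max,      # Delta <= delta
        "beta_min": 2 / delta_max,   # beta >= 2/delta
    }


print(exponent_bounds(3, Fraction(1)))
```

Exact rational arithmetic keeps the bounds in closed form, so for example $d=3$, $\alpha=1$ gives $\delta\leq7/2$ and $\beta\geq4/7$.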
Hyperscaling inequalities. As a part of our proof, we also prove a new rigorous hyperscaling inequality (2 − η)(δ + 1) ≤ d(δ − 1) for both long-range and nearest-neighbour percolation. To prove this inequality, we first prove a universal inequality implying in particular that the maximum cluster size in percolation on any finite graph is of the same order as its mean with high probability. Both results are of independent interest, and are discussed in detail in Section 2.
Other graphs. Our methods are not very specific to the hypercubic lattice $\mathbb{Z}^d$, and can also be used to establish very similar results for long-range percolation on, say, arbitrary transitive graphs of $d$-dimensional volume growth. We now formulate an even more general version of our theorem, which will follow by essentially the same proof. The definitions introduced here will also be used throughout the rest of the paper. Given a graph $G$ and a vertex $v$, we write $E^{\to}_v$ for the set of oriented edges emanating from $v$. (We will often abuse notation by identifying this set with the corresponding set of unoriented edges.) We define a weighted graph $G=(V,E,J)$ to be a countable graph $(V,E)$ together with an assignment of positive weights $\{J_e:e\in E\}$ such that $\sum_{e\in E^{\to}_v}J_e<\infty$ for each $v\in V$. Locally finite graphs can be considered as weighted graphs by setting $J_e\equiv1$. A graph automorphism of $(V,E)$ is a weighted graph automorphism of $(V,E,J)$ if it preserves the weights, and a weighted graph $G$ is said to be transitive if for every two vertices $x$ and $y$ in $G$ there exists an automorphism of $G$ sending $x$ to $y$. We say that a weighted graph is simple if there is at most one edge between any two vertices. Given a weighted graph $G=(V,E,J)$ and $\beta\geq0$, we define Bernoulli-$\beta$ bond percolation on $G$ to be the random subgraph of $G$ in which each edge is chosen to be either retained or deleted independently at random with retention probability $1-e^{-\beta J_e}$, and write $\mathbf{P}_\beta=\mathbf{P}_{G,\beta}$ for the law of this random subgraph.
Theorem 1.2. Let $G=(V,E,J)$ be an infinite, simple, unimodular transitive weighted graph, let $o$ be a vertex of $G$, and suppose that there exist constants $1/2<a<1$, $c>0$, and $\varepsilon_0>0$ such that for every $0\leq\beta\leq\beta_c$ and $n\geq1$. In particular, there are almost surely no infinite clusters at the critical parameter $\beta_c$.
The hypothesis of unimodularity is a technical condition that holds in most natural examples, including all amenable transitive weighted graphs and all weighted graphs defined in terms of a countable group $\Gamma$ and a symmetric, integrable function $J:\Gamma\to[0,\infty)$ by $V=\Gamma$, $E=\{\{g,h\}:g,h\in\Gamma,\ J(g^{-1}h)>0\}$, and $J(\{g,h\})=J(g^{-1}h)$ for each $\{g,h\}\in E$ [64]. (As in the case of $\mathbb{Z}^d$, we say that a function $J:\Gamma\to[0,\infty)$ on a countable group $\Gamma$ is symmetric if $J(\gamma)=J(\gamma^{-1})$ for every $\gamma\in\Gamma$ and integrable if $\sum_{\gamma\in\Gamma}J(\gamma)<\infty$.) It follows in particular that Theorem 1.2 implies Theorem 1.1. See [52, Chapter 8] for further background on unimodularity.
Remark 1.3. Theorem 1.2 also leads to a new proof of a recent theorem of Xiang and Zou [72], which states that every countably infinite (but not necessarily finitely generated) group $\Gamma$ admits a symmetric, integrable function $J:\Gamma\to[0,\infty)$ for which the associated weighted graph has a non-trivial percolation phase transition. To deduce their theorem from ours, simply pick a bijection $\sigma:\Gamma\to\{1,2,\ldots\}$, let $1<\alpha<2$, and consider the symmetric, integrable function on $\Gamma$ defined by $J(\gamma)=\sigma(\gamma)^{-\alpha}+\sigma(\gamma^{-1})^{-\alpha}$ for every $\gamma\in\Gamma$: the associated long-range percolation model has $\beta_c<\infty$ by Theorem 1.2. We remark also that Xiang and Zou's proof relied on the results of Duminil-Copin, Goswami, Raoufi, Severo, and Yadin [25] in the case that the group is finitely generated, while our proof is self-contained. It would be interesting if a new proof of the results of [25] could be derived from Theorem 1.2 by comparison of short- and long-range percolation.
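As a sanity check of the construction in Remark 1.3, the sketch below takes the illustrative case $\Gamma=\mathbb{Z}$, written additively so that $\gamma^{-1}=-\gamma$, together with a hypothetical bijection $\sigma(0)=1$, $\sigma(k)=2k$, $\sigma(-k)=2k+1$ for $k\geq1$. It checks that $J$ is symmetric and that its partial sums stay below $2(1+1/(\alpha-1))$, an integral-test upper bound for $2\zeta(\alpha)=2\sum_{m\geq1}m^{-\alpha}$; since $\sigma$ is a bijection onto $\{1,2,\ldots\}$, the full sum of $J$ is exactly $2\zeta(\alpha)$.

```python
def sigma(g: int) -> int:
    """Hypothetical bijection Z -> {1, 2, 3, ...}: 0 -> 1, k -> 2k, -k -> 2k + 1 for k >= 1."""
    if g == 0:
        return 1
    return 2 * g if g > 0 else 2 * (-g) + 1


def J(g: int, alpha: float) -> float:
    """J(g) = sigma(g)^(-alpha) + sigma(g^{-1})^(-alpha); in the group Z, g^{-1} = -g."""
    return sigma(g) ** (-alpha) + sigma(-g) ** (-alpha)


def zeta_upper(alpha: float) -> float:
    """Integral-test bound sum_{m >= 1} m^(-alpha) <= 1 + 1/(alpha - 1), valid for alpha > 1."""
    return 1.0 + 1.0 / (alpha - 1.0)


alpha = 1.5  # any 1 < alpha < 2, as in Remark 1.3
partial = sum(J(g, alpha) for g in range(-10_000, 10_001))
# The full sum equals 2 * zeta(alpha), which is at most 2 * zeta_upper(alpha).
print(partial, 2 * zeta_upper(alpha))
```

Symmetry holds by construction for any bijection $\sigma$, because swapping $\gamma$ and $\gamma^{-1}$ only swaps the two summands.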

About the proof
We now outline the basic structure of our proof and discuss how it compares to previous approaches to critical percolation. We begin with a brief overview of the two main strategies that have been employed in the study of critical percolation, which we term the supercritical strategy and the subcritical strategy. Broadly speaking, the supercritical strategy has found more success in low-dimensional settings while the subcritical strategy has found more success in high-dimensional settings, but there are notable exceptions in both cases. We write $\theta(p)=\mathbf{P}_p(|K_o|=\infty)$ for the probability that the origin lies in an infinite cluster.
The supercritical strategy. In this strategy, one attempts to prove that the set $\{p:\theta(p)>0\}$ is open by analysis of percolation under the assumption that $\theta(p)>0$. For example, one may hope to show that if infinite clusters exist then each such cluster $K$ must be 'large' in some coarse sense that is strong enough to ensure that $p_c(K)<1$. This approach has been successfully followed both in Berger's analysis of long-range percolation on $\mathbb{Z}^d$ [11] and in Benjamini, Lyons, Peres, and Schramm's proof that critical percolation on any nonamenable Cayley graph has no infinite clusters [10]. Harris's classical proof that $\theta(1/2)=0$ for the square lattice [36] can also be thought of in similar terms. Alternatively, one may instead attempt to find a finite-size characterisation of supercriticality, that is, a sequence of events $(E_n)_{n\geq1}$, each depending on at most finitely many edges, and a sequence of positive numbers $(\delta_n)_{n\geq1}$ such that
\[ \theta(p)>0 \iff \text{there exists } n\geq1 \text{ such that } \mathbf{P}_p(E_n)>1-\delta_n \]
for every $p\in[0,1]$; the existence of such a finite-size characterisation of supercriticality is easily seen to imply that the set $\{p:\theta(p)>0\}$ is open as required. Such finite-size characterisations are typically derived via a renormalization argument, and this strategy often amounts to an alternative formalization of the more geometric strategy discussed above. Successful realisations of this approach include Barsky, Grimmett, and Newman's analysis [7,8] of half-spaces and orthants in $\mathbb{Z}^d$ and Duminil-Copin, Sidoravicius, and Tassion's analysis [27] of two-dimensional slabs $\mathbb{Z}^2\times[0,r]^k$. A popular approach to critical percolation on $\mathbb{Z}^3$ seeks to implement this strategy by eliminating the 'sprinkling' from the proof of the Grimmett-Marstrand theorem [32]; while this has not yet been done successfully, interesting partial progress in this direction has been made by Cerf [19].
Arguments following the supercritical strategy tend to be ineffective in the sense that they give little or no quantitative information about percolation at $p_c$; see however the recent work of Duminil-Copin, Kozma, and Tassion [26] for some progress towards reversing this trend.
The subcritical strategy. In this strategy, one attempts to prove that the set $\{p:\theta(p)=0\}$ is closed by proving that some non-trivial upper bound on the distribution of the cluster of the origin holds uniformly throughout the subcritical phase. In contrast to the supercritical strategy, the subcritical strategy is inherently quantitative in nature and typically yields explicit estimates on the distribution of the cluster of the origin at criticality. The simplest example of such an argument is the proof that there is no percolation at criticality on any amenable transitive graph of exponential volume growth [40], which uses elementary subadditivity considerations to prove the uniform bound $\min\{\mathbf{P}_p(x\leftrightarrow y):d(x,y)\leq n\}\leq\operatorname{gr}(G)^{-n}$ for every $n\geq1$ and $p<p_c$, where $\operatorname{gr}(G)=\limsup_{n\to\infty}|B(x,n)|^{1/n}$ is the rate of exponential volume growth of $G$. Left-continuity of connection probabilities then implies that the same bound continues to hold at $p_c$, from which the theorem is easily deduced.
More sophisticated versions of the subcritical strategy often involve a 'bootstrapping' or 'forbidden zone' argument. Such an argument was first used to analyze high-dimensional statistical mechanics models by Slade [58]. To implement such an argument, one aims to prove that some well-chosen estimate, called the bootstrapping hypothesis, implies a strictly stronger version of itself. Once this is done, it is usually straightforward to conclude via a continuity argument that the strong form of the estimate holds uniformly throughout the subcritical phase. For example, the lace expansion for high-dimensional percolation [29,34,35] works roughly by showing that if $d$ is sufficiently large and $G$ denotes the Green's function on $\mathbb{Z}^d$, then for each $p\in[0,p_c)$ we have the implication
\[ \Bigl(\mathbf{P}_p(x\leftrightarrow y)\leq3G(x,y) \text{ for all } x,y\in\mathbb{Z}^d\Bigr)\implies\Bigl(\mathbf{P}_p(x\leftrightarrow y)\leq2G(x,y) \text{ for all } x,y\in\mathbb{Z}^d\Bigr). \]
The estimate $\mathbf{P}_p(x\leftrightarrow y)\leq3G(x,y)$ holds trivially when $p$ is small. Since we also have that $\limsup_{x\to\infty}\mathbf{P}_p(0\leftrightarrow x)/G(0,x)=0$ for every $p<p_c$ by sharpness of the phase transition [2,28], it follows by an elementary continuity argument that $\mathbf{P}_p(x\leftrightarrow y)\leq2G(x,y)$ for every $0\leq p\leq p_c$, and hence that there is no infinite cluster at $p_c$ as desired. (In fact the bootstrapping hypothesis used in the lace expansion analysis of percolation is more complicated than this, but the essence of the argument is as described.) See [39,59] for an overview of this method and [15,61] for recent work simplifying the implementation of the lace expansion for weakly self-avoiding walk.
In this paper we build upon a new version of the subcritical strategy that has been developed in our recent works [37,41,42]. The most basic form of the method was first used to prove power-law upper bounds for percolation on groups of exponential growth in [41], while a more sophisticated version of the method, closer to that employed here, was subsequently used to analyze critical percolation on certain groups of stretched-exponential volume growth in joint work with Hermon [37]. Very recently, similar ideas have also been used to prove continuity of the phase transition for the Ising model on nonamenable groups [42].
Let us now outline how this method works. In [41], we built upon the work of Aizenman, Kesten, and Newman [3] to prove an upper bound on the probability of a certain two-arm-type event, which we called the two-ghost inequality, that holds universally for all unimodular transitive graphs. One formulation of this inequality states that if $G=(V,E)$ is a connected, locally finite, transitive unimodular graph (e.g. $G=\mathbb{Z}^d$) and $S_{e,n}$ denotes the event that the endpoints of the edge $e$ are in distinct clusters, each of which touches at least $n$ edges and at least one of which is finite, then for every $p\in(0,1]$, $n\geq1$, and $o\in V$. An extension of the two-ghost inequality to long-range models (including certain dependent models) was proven in [42, Section 3], to which we give a further improvement in Theorem 3.1. The two-ghost inequality can sometimes be used to prove that the percolation phase transition is continuous via the following rough strategy, a version of which we implement in this paper:
1. Assume as a bootstrapping hypothesis some well-chosen upper bound $\mathbf{P}_\beta(|K_o|\geq n)\leq h(n)$ for each $n\geq1$, with $h(n)\to0$ as $n\to\infty$, that holds trivially when $\beta$ is very small. Choosing which bound to use is a potentially subtle matter which may involve trial and error.
Heuristically, there is a 'Goldilocks principle' that needs to be satisfied when choosing the bootstrapping hypothesis: a bound that is too weak will be of too little use as an input to proceed further into the argument, while a bound that is too strong will be too difficult to re-derive in a stronger form, as required for the bootstrapping argument to come full circle. In particular, any bound decaying faster than $n^{-1/2}$ cannot possibly work. In this paper we are able to consider power-law upper bounds, as seems most natural, while in [38] the optimal upper bound making the argument work was of the form $Ce^{-(\log n)^{\varepsilon}}$ for small $\varepsilon>0$.
2. Find some way to convert the bootstrapping hypothesis into an upper bound of the form $\mathbf{P}_\beta(o\leftrightarrow x)\leq f(x)$ for some function $f$ that hopefully decays reasonably quickly as $x\to\infty$, at least for some well-chosen choices of $x$. In [37], for example, this is done by letting $X$ be a random walk and bounding $\mathbf{P}_\beta(o\leftrightarrow X_k)$ via spectral techniques. Here we will instead prove such a bound using hyperscaling inequalities, as discussed in Section 2.

3. Use the Harris-FKG inequality and a union bound to observe that
\[ \mathbf{P}_\beta(|K_o|\geq n)^2\leq\mathbf{P}_\beta(o\leftrightarrow x)+\mathbf{P}_\beta(S'_{x,n}), \]
where $S'_{x,n}$ is the event that $o$ and $x$ belong to distinct clusters of size at least $n$, then prove an upper bound of the form
\[ \mathbf{P}_\beta(S'_{x,n})\leq F(x)\,\mathbf{P}_\beta(S_{e,n}) \]
for some appropriately chosen edge $e=e(x)$ and some function $F(x)$ that is hopefully not too large. In [37,41] this latter bound is proven via a surgery argument using the finite-energy property of percolation. In our setting this step is much simpler and more efficient, since we can just take $e$ to be the 'long edge' connecting $o$ to $x$ and take $F(x)\equiv1$.

4. Put steps 2 and 3 together to get an inequality of the form
\[ \mathbf{P}_\beta(|K_o|\geq n)^2\leq f(x)+F(x)\,\mathbf{P}_\beta(S_{e,n}) \]
for every $n\geq1$ and every vertex $x$, under the assumption that $\beta<\beta_c$ and that the bootstrapping hypothesis holds. The proof will work if bounding $\mathbf{P}_\beta(S_{e,n})$ using the two-ghost inequality and optimizing over the choice of $x$ leads to a bound $\mathbf{P}_\beta(|K_o|\geq n)\leq g(n)$ that is a strict improvement of the bootstrapping hypothesis, in the sense that $g(n)<h(n)$ whenever $h(n)<1$. (The function $g$ must not depend on the choice of $0\leq\beta<\beta_c$.) Once this has been done successfully, it follows by an elementary continuity argument that the bound $\mathbf{P}_\beta(|K_o|\geq n)\leq g(n)$ holds for all $0\leq\beta\leq\beta_c$ and $n\geq1$, and hence that there is no percolation at criticality as desired.
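For the reader's convenience, the first inequality of step 3 can be derived in one line. The events $\{|K_o|\geq n\}$ and $\{|K_x|\geq n\}$ are increasing, transitivity gives $\mathbf{P}_\beta(|K_x|\geq n)=\mathbf{P}_\beta(|K_o|\geq n)$, and on the intersection of the two events either $o\leftrightarrow x$ or $o$ and $x$ lie in distinct clusters of size at least $n$ (the event $S'_{x,n}$). Hence
\[ \mathbf{P}_\beta(|K_o|\geq n)^2=\mathbf{P}_\beta(|K_o|\geq n)\,\mathbf{P}_\beta(|K_x|\geq n)\leq\mathbf{P}_\beta\bigl(|K_o|\geq n,\ |K_x|\geq n\bigr)\leq\mathbf{P}_\beta(o\leftrightarrow x)+\mathbf{P}_\beta(S'_{x,n}), \]
where the first inequality is Harris-FKG and the second is a union bound.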

A short proof of a weaker result
In order to give a simple illustration of how the strategy sketched above can be applied to long-range percolation on Z d , we now give a quick proof of a weaker result requiring α < d/4 rather than α < d and giving a worse upper bound on the exponent δ.
be symmetric and integrable, and suppose that there exist $\alpha<d/4$, $c>0$, and Then there exists a constant $C$ such that for every $0\leq\beta\leq\beta_c$ and $n\geq1$. In particular, there are almost surely no infinite clusters at the critical parameter $\beta_c$.
The proof will apply the following special case of the two-ghost inequality of [42, Corollary 3.2]. We will prove a stronger version of this inequality in Section 3. For each $e\in E$ and $\lambda>0$, we define $S_{e,\lambda}$ to be the event that the endpoints of $e$ are in distinct clusters, each of which touches a set of edges of total weight at least $\lambda$, and at least one of which contains only finitely many vertices.
Proof of Proposition 1.4. By rescaling if necessary, we may assume without loss of generality that . We claim that there exists a constant $C\geq1$ such that the following implication holds for each $1/2\leq\beta<\beta_c$ and $1\leq A<\infty$:
\[ \Bigl(\mathbf{P}_\beta(|K_0|\geq n)\leq An^{-\theta} \text{ for every } n\geq1\Bigr)\implies\Bigl(\mathbf{P}_\beta(|K_0|\geq n)\leq CA^{1/2}n^{-\theta} \text{ for every } n\geq1\Bigr). \tag{1.8} \]
All the constants appearing in the remainder of the proof will be allowed to depend on $d$, $\alpha$, and $c$, but not on the choice of $1\leq A<\infty$ or $1/2\leq\beta<\beta_c$. For each $x\in\mathbb{Z}^d$ and $n\geq1$, let $S'_{x,n}$ be the event that $0$ and $x$ belong to distinct clusters, each of which contains at least $n$ vertices. Both clusters are automatically finite since $\beta<\beta_c$. It follows from Theorem 1.5 that there exists a constant $C_1$ such that for every $n\geq1$ and $r\geq r_0$. On the other hand, we have trivially that there exists a constant $C_3$ such that for every $r\geq r_0$. We now apply these two bounds to obtain a new bound on $\mathbf{P}_\beta(|K_0|\geq n)$. We have by a union bound and the Harris-FKG inequality that
\[ \mathbf{P}_\beta(|K_0|\geq n)^2\leq\mathbf{P}_\beta(0\leftrightarrow x)+\mathbf{P}_\beta(S'_{x,n}) \]
for each $x\in\mathbb{Z}^d$ and $n\geq1$. Rearranging and averaging over $x\in\Lambda'_r$, it follows that there exists a constant $C_4$ such that for every $r\geq r_0$ and $n\geq1$. Taking $r=r_0\vee n^{(1-4\theta)/(2\alpha)}$ yields that there exists a constant $C_5$ such that for every $n\geq1$, where we used that $\theta=(d-4\alpha)/(4d)$ in the central equality. (We arrived at this value of $\theta$ by getting to this stage of the calculation with $\theta$ indeterminate and solving for the value of $\theta$ that made the two powers of $n$ equal.) The inequality (1.10) implies the claimed implication (1.8) by taking square roots on both sides. We now apply the bootstrapping implication (1.8) to complete the proof of the proposition. For each $1/2\leq\beta<\beta_c$, we have by sharpness of the phase transition [2,28] that $|K_0|$ has finite mean (indeed, it has an exponential tail), and in particular that there exists $1\leq A<\infty$ such that $\mathbf{P}_\beta(|K_0|\geq n)\leq An^{-\theta}$ for every $n\geq1$. For each $1/2\leq\beta<\beta_c$ we may therefore define
\[ A_\beta=\min\bigl\{A\geq1:\mathbf{P}_\beta(|K_0|\geq n)\leq An^{-\theta} \text{ for every } n\geq1\bigr\}. \]
Since the set we are minimizing over is closed, we have that $\mathbf{P}_\beta(|K_0|\geq n)\leq A_\beta n^{-\theta}$ for every $n\geq1$ and $1/2\leq\beta<\beta_c$.
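Moreover, (1.8) implies that there exists a constant $C$ such that $A_\beta\leq CA_\beta^{1/2}$ for every $1/2\leq\beta<\beta_c$, and since $A_\beta$ is finite for every $1/2\leq\beta<\beta_c$ we may safely rearrange this inequality to obtain that $A_\beta\leq C^2$ for every $1/2\leq\beta<\beta_c$. Thus, we have proven that $\mathbf{P}_\beta(|K_0|\geq n)\leq C^2n^{-\theta}$ for every $1/2\leq\beta<\beta_c$ and $n\geq1$, and hence, since $\mathbf{P}_\beta(|K_0|\geq n)$ is increasing in $\beta$, for every $0\leq\beta<\beta_c$ and $n\geq1$. Considering the standard monotone coupling of $\mathbf{P}_\beta$ and $\mathbf{P}_{\beta'}$ for $\beta\leq\beta'$ and taking limits, it follows that the same estimate holds for all $0\leq\beta\leq\beta_c$ and $n\geq1$ as claimed.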
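The exponent bookkeeping at the end of this proof can be checked with exact rational arithmetic. The sketch rests on an assumption that is not displayed in the text: that before optimizing over $r$ the bound on $\mathbf{P}_\beta(|K_0|\geq n)^2$ has the two-term shape $Ar^{-d\theta}+r^{\alpha}n^{-1/2}$ up to constants. The first term is what averaging (2.1) over a box of volume of order $r^d$ gives; the form of the second term is our reconstruction, chosen to be consistent with the stated choices of $r$ and $\theta$. Under that assumption, the hypothetical helper `check_exponents` confirms that $r=n^{(1-4\theta)/(2\alpha)}$ equals $n^{2/d}$ when $\theta=(d-4\alpha)/(4d)$, and that both terms are then of order $n^{-2\theta}$.

```python
from fractions import Fraction


def check_exponents(d, alpha):
    """Exponent bookkeeping for the proof of Proposition 1.4 under the ASSUMED
    two-term shape A r^(-d theta) + r^alpha n^(-1/2).

    Returns (theta, s, e1, e2) where r = n^s and the two terms are of order
    n^e1 and n^e2 respectively."""
    d, alpha = Fraction(d), Fraction(alpha)
    if not 0 < alpha < d / 4:
        raise ValueError("Proposition 1.4 requires 0 < alpha < d/4")
    theta = (d - 4 * alpha) / (4 * d)  # value of theta chosen in the proof
    s = (1 - 4 * theta) / (2 * alpha)  # r = n^{(1 - 4 theta)/(2 alpha)}
    e1 = -d * theta * s                # r^{-d theta} = n^{e1}
    e2 = alpha * s - Fraction(1, 2)    # r^{alpha} n^{-1/2} = n^{e2}
    return theta, s, e1, e2


theta, s, e1, e2 = check_exponents(3, Fraction(1, 2))
print(theta, s, e1, e2)  # prints: 1/12 2/3 -1/6 -1/6
```

Taking square roots of an $n^{-2\theta}$ bound then gives the $n^{-\theta}$ decay in the statement of the proposition.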
In order to prove Theorem 1.1, we will improve the above proof in two ways: in Section 2 we develop a better method to convert volume-tail bounds into two-point function bounds than the primitive method used in (1.9), while in Section 3 we prove an improved form of the two-ghost inequality that gives better bounds on $\mathbf{P}_\beta(S_{e,n})$ in the case that $e$ is a typical 'long' edge. (Each improvement can be used in isolation to prove a result of intermediate strength requiring $\alpha<d/2$.)

Comparison to physics predictions
We now give a brief heuristic discussion of how our exponent bounds compare to the values predicted in the physics literature. Building upon the work of Sak [56] on long-range Ising models (see also e.g. [9]), physicists including Brezin, Parisi, and Ricci-Tersenghi [18] have argued that if $\eta(d,\alpha)$ and $\delta(d,\alpha)$ denote the values of the exponents $\eta$ and $\delta$ for long-range percolation in dimension $d$ with long-range parameter $\alpha$, and $\eta_{\mathrm{SR}}(d)$ and $\delta_{\mathrm{SR}}(d)$ denote the corresponding nearest-neighbour exponents, then with logarithmic corrections to scaling at the 'crossover' value $\alpha^*(d)=2-\eta_{\mathrm{SR}}(d)$. In particular, the exponent $\eta(d,\alpha)$ is predicted to 'stick' to its mean-field value of $2-\alpha$ in the interval $(d/3,\alpha^*]$, even though other exponents such as $\delta$ are not expected to take their mean-field values in this interval. See [20,21] for rigorous proofs in certain high-dimensional cases and [50,60] for related rigorous results for the long-range spin $O(n)$ model. Assuming further that $\delta(d,\alpha)$ takes its mean-field value of 2 when $\alpha\leq d/3$ and that the hyperscaling relation ( (1.12) As discussed above, it is strongly expected, and known in some cases, that $\eta_{\mathrm{SR}}(2)=5/24$ and that $\eta_{\mathrm{SR}}(d)=0$ when $d\geq6$. On the other hand, it is believed that $\eta_{\mathrm{SR}}$ takes small negative values for $d\in\{3,4,5\}$: both numerical estimates [51,67,71,73] and non-rigorous renormalization group methods [30] give values ranging between $-0.1$ and $-0.01$ in all three cases. (See the Wikipedia page https://en.wikipedia.org/wiki/Percolation_critical_exponents for a summary.) As such, it is believed that $\alpha^*(d)<d$ for every $d\geq2$, and hence that the models treated by Theorem 1.1 should include examples in the same universality class as nearest-neighbour Bernoulli bond percolation on each lattice of dimension $d\geq2$. (Proving such a universality claim would, however, require a vastly better understanding of these models than that provided by Theorem 1.1.)
The bounds we obtain on the exponents for these models are of reasonable order, with our upper bounds on $\delta(d,\alpha)$ always within a factor of 2 of the predicted true values when $\alpha\leq\alpha^*(d)=2-\eta_{\mathrm{SR}}(d)$. See Figures 1 and 2 for side-by-side comparisons in two and three dimensions.
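The factor-of-2 comparison can be checked with a few lines of exact arithmetic. This is a heuristic sketch under assumptions drawn from the surrounding discussion, not a formula quoted from the physics literature: take the predicted value $\delta=2$ for $\alpha\leq d/3$ and, for $d/3<\alpha\leq\alpha^*(d)$, combine $2-\eta=\alpha$ with the hyperscaling relation $(2-\eta)(\delta+1)=d(\delta-1)$ to get $\delta=(d+\alpha)/(d-\alpha)$. The rigorous bound in the abstract gives $\delta\leq(2d+\alpha)/(d-\alpha)$. The crossover values use $\eta_{\mathrm{SR}}(2)=5/24$ and, for $d=3$, the most negative quoted estimate $-0.1$. All function names are hypothetical.

```python
from fractions import Fraction


def delta_bound(d, alpha):
    """Rigorous upper bound delta <= (2d + alpha)/(d - alpha) from the volume-tail bound."""
    return Fraction(2 * d + alpha) / (d - alpha)


def delta_predicted(d, alpha):
    """Heuristic prediction for 0 < alpha <= alpha*(d): 2 for alpha <= d/3, and
    (d + alpha)/(d - alpha) from 2 - eta = alpha plus hyperscaling otherwise.
    The second formula is <= 2 exactly when alpha <= d/3, so a max covers both cases."""
    return max(Fraction(2), Fraction(d + alpha) / (d - alpha))


def crossover(eta_sr):
    """alpha*(d) = 2 - eta_SR(d)."""
    return 2 - Fraction(eta_sr)


print(delta_bound(2, Fraction(1)) / delta_predicted(2, Fraction(1)))  # prints: 5/3
```

In this heuristic comparison the ratio is largest at $\alpha=d/3$, where it equals $7/4$.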

Hyperscaling inequalities and the maximum cluster size in a box
The proof of Proposition 1.4 made use of the fact that if Bernoulli bond percolation on some weighted graph $G=(V,E,J)$ satisfies a bound of the form $\sup_{v\in V}\mathbf{P}_\beta(|K_v|\geq n)\leq An^{-\theta}$ for some $0\leq\theta<1$ and $A<\infty$, then we have that
\[ \sum_{x\in\Lambda}\mathbf{P}_\beta(u\leftrightarrow x)\leq C(\theta)A|\Lambda|^{1-\theta} \tag{2.1} \]
for every $\Lambda\subseteq V$ and $u\in V$, where $C(\theta)=O(1/(1-\theta))$ depends only on $\theta$. Tasaki [65] observed that this inequality, which holds for arbitrary random graph models on $\mathbb{Z}^d$, can be thought of as giving a primitive hyperscaling inequality $(2-\eta)\delta\leq d(\delta-1)$. In this section, we prove an inequality implying the stronger hyperscaling inequality $(2-\eta)(\delta+1)\leq d(\delta-1)$. Note that while the arguments in the rest of the paper can all be applied to certain dependent percolation models, including the random-cluster model, with only a little extra work, the arguments in this section rely on the BK inequality in an essential way and are therefore very specific to Bernoulli percolation.
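The naive bound on $\sum_{x\in\Lambda}\mathbf{P}_\beta(u\leftrightarrow x)$ is just the tail-sum formula for the mean. Since $\sum_{x\in\Lambda}\mathbf{P}_\beta(u\leftrightarrow x)=\mathbf{E}_\beta|K_u\cap\Lambda|$ and $|K_u\cap\Lambda|\leq|\Lambda|$, we have $\mathbf{E}_\beta|K_u\cap\Lambda|=\sum_{\lambda=1}^{|\Lambda|}\mathbf{P}_\beta(|K_u\cap\Lambda|\geq\lambda)\leq A\sum_{\lambda=1}^{|\Lambda|}\lambda^{-\theta}\leq\frac{A}{1-\theta}|\Lambda|^{1-\theta}$, where the last step compares the sum with $\int_0^{|\Lambda|}s^{-\theta}\,ds$; this shows one can take $C(\theta)=1/(1-\theta)$. The minimal sketch below checks this numerically; the function names are hypothetical.

```python
def tail_sum_mean_bound(A: float, theta: float, volume: int) -> float:
    """Bound on E|K_u cap Lambda| via the tail-sum formula
    E[X] = sum_{lam=1}^{|Lambda|} P(X >= lam) for X = |K_u cap Lambda| <= |Lambda|,
    using P(X >= lam) <= min(1, A * lam^(-theta))."""
    return sum(min(1.0, A * lam ** (-theta)) for lam in range(1, volume + 1))


def closed_form_bound(A: float, theta: float, volume: int) -> float:
    """C(theta) * A * |Lambda|^(1 - theta) with C(theta) = 1/(1 - theta), for 0 <= theta < 1."""
    return A * volume ** (1.0 - theta) / (1.0 - theta)


# Example: theta = 1/2 and a set Lambda of 10^4 vertices.
print(tail_sum_mean_bound(1.0, 0.5, 10_000), closed_form_bound(1.0, 0.5, 10_000))
```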
Let us now briefly review what is known about scaling and hyperscaling relations for Bernoulli percolation. In addition to the critical exponents $\delta$ and $\eta$ that we have already introduced, it is also believed that there exist exponents $\gamma$, $\Delta$, $\rho$, and $\beta$ such that As before, $\approx$ means that the ratio of the logarithms of the two sides tends to 1 in the relevant limit. A further critical exponent $\nu$ is expected to describe the correlation length $\xi(\beta)$ through the asymptotics $\xi(\beta_c-\varepsilon)\approx\varepsilon^{-\nu}$ as $\varepsilon\downarrow0$. Intuitively, the correlation length is the scale on which off-critical behaviour begins to manifest itself; see [31, Section 6.2] for a precise definition in the nearest-neighbour context. Heuristic scaling theory predicts that these exponents always satisfy the scaling relations $\gamma=\beta(\delta-1)$, $\beta\delta=\Delta$, and $\gamma=\nu(2-\eta)$.
Note that the hyperscaling relations involve the dimension d while the scaling relations do not. It is a heuristic originally due to Coniglio [22] that the hyperscaling relations should hold if there are typically O(1) 'large' critical clusters on any given scale. This condition is believed to hold below the upper critical dimension but not above; see [1,16] for detailed discussions. See [31,Section 9.1] for an overview of the heuristic arguments in support of the scaling and hyperscaling relations.
For nearest-neighbour percolation on two-dimensional planar lattices, the scaling relations (2.2) and hyperscaling relations (2.3) were proven to hold by Kesten [46] under the assumption that the exponents $\delta$ and $\nu$ are both well-defined. (Kesten's results were of central importance to the subsequent computation of the critical exponents for site percolation on the triangular lattice following Smirnov's proof of conformal invariance [49,62,63].) See also [70] for related results on two-dimensional Voronoi percolation. Meanwhile, in high dimensions, it is now known that the exponents $\beta$, $\gamma$, $\delta$, $\Delta$, $\eta$, $\rho$, and $\nu$ all take their mean-field values in nearest-neighbour percolation with $d\gg6$, from which it follows that the scaling relations (2.2) are satisfied but that the hyperscaling relations (2.3) are violated; see [39] for an overview and [4,6,20,33,35,48] for highlights of the high-dimensional literature.
It remains completely open to prove that the scaling and hyperscaling relations hold in dimensions $2<d\leq6$, even if one assumes that all the relevant exponents are well-defined. The most significant progress is due to Borgs, Chayes, Kesten, and Spencer [16,17], who proved in particular that the scaling and hyperscaling relations both hold in low-dimensional lattices for which $\rho$ is well-defined under the (as yet unproven) assumption that the number of clusters crossing the box $[0,r]\times[0,3r]^{d-1}$ in the easy direction is tight as $r\to\infty$. Their proof also yields that the hyperscaling inequalities $d\rho\geq\delta+1$ and $d-2+\eta\geq2/\rho$ hold on any graph for which these exponents are well-defined. Many further works have established various other inequalities between critical exponents; see the work of Tasaki [65,66] for hyperscaling inequalities and the recent work [43] and references therein for an overview of scaling inequalities. The main goal of this section is to prove the following theorem, which improves significantly upon the naive bound of (2.1).
Theorem 2.1. There exists a universal constant $C$ such that the following holds. Let $G=(V,E,J)$ be a weighted graph, let $\beta\geq0$, and suppose that there exist constants $A<\infty$ and $0\leq\theta\leq1/2$ such that $\mathbf{P}_\beta(|K_u\cap\Lambda|\geq\lambda)\leq A\lambda^{-\theta}$ for every $u\in V$ and $\lambda>0$. Then for every $u\in V$ and every finite set $\Lambda\subseteq V$.
In the context of $\mathbb{Z}^d$, it follows from Theorem 2.1 that if the exponents $\eta$ and $\delta$ are both well-defined then they satisfy the hyperscaling inequality
\[ (2-\eta)(\delta+1)\leq d(\delta-1). \tag{2.4} \]
Indeed, if $\eta$ and $\delta$ are both well-defined then either $\eta\geq2$, in which case (2.4) is trivial, or we can apply Theorem 2.5 with $\theta=1/\delta-\varepsilon$ for $\varepsilon>0$ arbitrary to compute that where we write to mean that the ratio of the logarithms of the left and right hand sides has limit supremum less than 1. This inequality may be rearranged to prove the inequality (2.4) in the case $\eta<2$. We remark that the inequality (2.4) is expected to be an equality below the upper critical dimension, as would follow from the validity of the scaling and hyperscaling relations.
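Writing $\theta=1/\delta$, Tasaki's primitive inequality and the improved inequality $(2-\eta)(\delta+1)\leq d(\delta-1)$ can be compared directly. Dividing by $\delta$ and by $\delta+1$ respectively gives
\[ (2-\eta)\delta\leq d(\delta-1)\iff 2-\eta\leq d(1-\theta), \qquad (2-\eta)(\delta+1)\leq d(\delta-1)\iff 2-\eta\leq d\,\frac{1-\theta}{1+\theta}, \]
so the improved inequality sharpens the upper bound on $2-\eta$ by a factor $1/(1+\theta)=\delta/(\delta+1)$. For example, the mean-field value $\delta=2$ gives $2-\eta\leq d/2$ from the primitive inequality but $2-\eta\leq d/3$ from the improved one.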

Universal tightness of the maximum cluster size in a finite region
Let G = (V, E, J) be a countable weighted graph, and consider Bernoulli bond percolation on G with parameter β ≥ 0. For each finite subset Λ of V, we define
\[ |K_{\max}(\Lambda)| := \max\{|K_v\cap\Lambda| : v\in V\}. \]
(This is a slight abuse of notation: there may be more than one cluster achieving this maximum, so that $K_{\max}(\Lambda)$ need not be well-defined as a set in general.) In this section we prove a general inequality, applying universally to all G, β, and Λ. It implies that, with high probability, $|K_{\max}(\Lambda)|$ is of the same order as its 'typical value' $M_\beta(\Lambda):=\min\{n\geq0 : \mathbf{P}_\beta(|K_{\max}(\Lambda)|\geq n)\leq e^{-1}\}$. In particular, one simple consequence of this theorem is that the mean and typical value of $|K_{\max}(\Lambda)|$ are always of the same order. We expect that the inequalities we prove in this section will have many further applications in the future. and hold for every β ≥ 0, α ≥ 1, and 0 < ε ≤ 1. Moreover, the inequality holds for every β ≥ 0, α ≥ 1, and u ∈ V.
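To make $|K_{\max}(\Lambda)|=\max_v|K_v\cap\Lambda|$ and the typical value $M_\beta(\Lambda)$ concrete, here is a small Monte Carlo sketch. It uses nearest-neighbour Bernoulli bond percolation on an L × L grid, with each edge open with probability $1-e^{-\beta}$ (that is, J ≡ 1 on nearest-neighbour pairs, an illustrative assumption). The function names are hypothetical. The empirical typical value is simply the plug-in estimate of $\min\{n\geq0:\mathbf{P}(|K_{\max}(\Lambda)|\geq n)\leq e^{-1}\}$, not a procedure from the paper.

```python
import math
import random

def grid_edges(L):
    """Nearest-neighbour edges of the L x L grid; vertex (i, j) has index i*L + j."""
    edges = []
    for i in range(L):
        for j in range(L):
            if i + 1 < L:
                edges.append((i * L + j, (i + 1) * L + j))
            if j + 1 < L:
                edges.append((i * L + j, i * L + j + 1))
    return edges

def kmax_sample(n_vertices, edges, p, Lam, rng):
    """One sample of |K_max(Lam)| = max_v |K_v ∩ Lam| for bond percolation with parameter p."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if rng.random() < p:  # edge is open with probability p
            parent[find(u)] = find(v)
    counts = {}
    for x in Lam:  # count the vertices of Lam in each cluster
        r = find(x)
        counts[r] = counts.get(r, 0) + 1
    return max(counts.values(), default=0)

def typical_value(samples):
    """Empirical M = min{n >= 0 : fraction of samples that are >= n is at most e^{-1}}."""
    n = 0
    while sum(s >= n for s in samples) / len(samples) > math.exp(-1):
        n += 1
    return n

if __name__ == "__main__":
    L, beta = 20, 0.6
    p = 1 - math.exp(-beta)
    rng = random.Random(1)
    edges = grid_edges(L)
    Lam = range(L * L)  # take Λ to be the whole grid
    samples = [kmax_sample(L * L, edges, p, Lam, rng) for _ in range(200)]
    print("typical value M:", typical_value(samples))
    print("sample mean:", sum(samples) / len(samples))
```

Comparing the printed sample mean with the empirical M for a few values of β and L gives a feel for the claim that the two are always of the same order.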
We will deduce this theorem as a corollary of the following more general inequality. and hold for every β ≥ 0, λ ≥ 1, k ≥ 0, and u ∈ V .
(This theorem does not require λ to be an integer.)
We now turn to the proof of Theorem 2.3. We will deduce the theorem as a consequence of the BK inequality together with the following combinatorial lemma.
Lemma 2.4. Let G = (V, E) be a connected, locally finite graph, let k ≥ 1, and let A be a finite subset of V such that $|A|\geq 3^k$. Then there exist $m\geq 3^{k-1}+1$ and a collection $\{E_i : 1\leq i\leq m\}$ of disjoint, non-empty subsets of E such that the following hold:
1. For each 1 ≤ i ≤ m, the subgraph of G spanned by $E_i$ is connected.
2. For each 1 ≤ i ≤ m, the set $V_i$ of vertices incident to an edge of $E_i$ satisfies $|V_i\cap A|\geq 3^{-k}|A|$.
When G is finite, the proof of this lemma can be used to derive an explicit divide-and-conquer algorithm for finding such a collection of sets $E_1,\ldots,E_m$ after taking a spanning tree of G.
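As a sketch of such a divide-and-conquer algorithm, the Python below works on a finite tree given as an edge list. Two points are our own choices rather than the paper's. First, the single split in the first stage of the proof is carried out with a weighted-centroid argument instead of the sequential exploration $(v_n,W_n)$. Second, we assume the per-split guarantee is that each of the two pieces touches at least a third of the A-vertices of the piece being split. That is what the second stage of the proof needs, and it yields the lower bound $|V_i\cap A|\geq 3^{-k}|A|$ used later in the BK argument. The outer loop repeatedly splits the piece meeting A the most, until every piece meets A in fewer than $3^{-k+1}|A|$ vertices. The function names are hypothetical.

```python
from collections import defaultdict

def split_tree(edges, A):
    """Split the edges of a finite tree into two connected pieces, each touching
    at least a third of the tree's A-vertices (needs at least 3 such vertices)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    A = set(A) & set(adj)
    N = len(A)
    root = next(iter(adj))
    parent, order, stack = {root: None}, [root], [root]
    while stack:  # DFS recording parents and a discovery order
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                order.append(y)
                stack.append(y)
    sub = {x: int(x in A) for x in order}  # number of A-vertices in the subtree of x
    for x in reversed(order):  # descendants are processed before their ancestors
        if parent[x] is not None:
            sub[parent[x]] += sub[x]
    v = root  # walk to a weighted centroid: every branch at v has <= N/2 A-vertices
    while True:
        heavy = [y for y in adj[v] if parent[y] == v and 2 * sub[y] > N]
        if not heavy:
            break
        v = heavy[0]
    branches = []  # each branch: the edge (v, u) plus all edges beyond u
    for u in adj[v]:
        b_edges, count = [(v, u)], int(u in A)
        seen, todo = {v, u}, [u]
        while todo:
            x = todo.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    b_edges.append((x, y))
                    count += int(y in A)
                    todo.append(y)
        branches.append((count, b_edges))
    branches.sort(key=lambda b: b[0], reverse=True)
    S = N - int(v in A)  # A-vertices other than v; v itself lies in both pieces
    E1, c1, i = [], 0, 0
    while 3 * c1 < S:  # greedily take the largest branches until they hold >= S/3
        c1 += branches[i][0]
        E1 += branches[i][1]
        i += 1
    E2 = [e for _, b in branches[i:] for e in b]
    return E1, E2

def lemma_2_4_pieces(edges, A, k):
    """Partition the edges of a finite tree into connected pieces, each touching
    at least 3**-k |A| and fewer than 3**(1-k) |A| vertices of A."""
    A = set(A)
    total = len(A)
    assert total >= 3 ** k, "Lemma 2.4 needs |A| >= 3^k"

    def count(piece):
        return len({x for e in piece for x in e} & A)

    pieces = [list(edges)]
    while True:
        j = max(range(len(pieces)), key=lambda i: count(pieces[i]))
        if count(pieces[j]) * 3 ** (k - 1) < total:  # the largest piece is small enough
            return pieces
        pieces.extend(split_tree(pieces.pop(j), A))
```

For instance, with |A| = 30 and k = 2, each returned piece meets A in between 4 and 9 vertices, and there are at least $3^{1}+1=4$ pieces. Splitting the largest piece first mirrors the lexicographic minimality argument in the proof: every split replaces one entry of $R(\mathcal{P})$ by two strictly smaller ones.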
Proof of Lemma 2.4. We may without loss of generality assume that G = T is a tree, taking a spanning tree of G otherwise. In this case we will prove the stronger claim that the sets {E i : 1 ≤ i ≤ m} can be taken to be a partition of E. (In fact this is true in general also.) We say that a partition of E is good if each piece of the partition spans a connected subgraph of T .
We first prove a splitting claim. Let T = (V, E) be a locally finite tree and let A ⊆ V satisfy 3 ≤ |A| < ∞. Then there exists a good partition of E into two sets $E_1$ and $E_2$ such that, if $V_i$ denotes the set of vertices incident to an edge of $E_i$, then $|V_i\cap A|\geq|A|/3$ for each i = 1, 2.

Let ρ be a vertex of T. We root T at ρ, and call a vertex v a descendant of an edge e if the unique shortest path from ρ to v contains e. We will iteratively define a sequence $(v_n,W_n)_{n=0}^N$, where 1 ≤ N ≤ ∞, $v_n\in V$, and $W_n\subseteq E$ for each 0 ≤ n ≤ N. Start by setting $v_0=\rho$ and $W_0=\emptyset$. At each intermediate stage 0 < n < N of the sequence, $W_n$ and $W_n^c$ will both be non-empty and span connected subgraphs of T, and $v_n$ will be incident to edges of both $W_n$ and $W_n^c$. Given $(v_n,W_n)$ for some n ≥ 0, we let $V_n$ be the set of vertices of T that are either equal to ρ or incident to some edge of $W_n$, and define $(v_{n+1},W_{n+1})$ as follows:
1. If $v_n$ has exactly one edge $e\in W_n^c$ adjacent to it, we set $W_{n+1}=W_n\cup\{e\}$ and set $v_{n+1}$ to be the other endpoint of this edge. If $W_{n+1}=E$ then we set N = n + 1 and terminate the sequence.
2. Otherwise, $v_n$ has at least two edges of $W_n^c$ adjacent to it. Enumerate these edges $e_1,\ldots,e_\ell$, and let $D_i$ be the set of descendants of $e_i$ for each 1 ≤ i ≤ ℓ. Since $\sum_{i=1}^\ell|D_i\cap A|=|V_n^c\cap A|$ and ℓ ≥ 2, there must exist 1 ≤ i ≤ ℓ such that $|D_i\cap A|\leq|V_n^c\cap A|/2$. Choose one such i, set $W_{n+1}$ to be the union of $W_n$ with the set of edges incident to some vertex of $D_i$, and set $v_{n+1}=v_n$.
We may verify by induction that $W_n$ and $W_n^c$ are indeed both non-empty and span connected subgraphs of T for every 0 < n < N as claimed, so that $\{W_n,W_n^c\}$ is a good partition of E for every 0 < n < N. Moreover, the assumption that T is locally finite implies that $\bigcup_{n=0}^N W_n=E$ and hence that $\bigcup_{n=0}^N V_n=V$. Since A is finite and $\bigcup_{n=0}^N V_n=V$, there exists a finite time N′ ≤ N such that $V_n$ contains A for every N′ ≤ n ≤ N. Observe that the set $\{0\leq n\leq N' : |V_n\cap A|>|A|/3\}$ contains N′ but does not contain 0 since |A| ≥ 3. Letting m ≥ 1 be the minimal element of this set, we have that $|A|/3<|V_m\cap A|\leq 2|A|/3$, where we used that |A| ≥ 3 in the final inequality. It follows that 0 < m < N and that $\{W_m,W_m^c\}$ is a good partition of E with the desired properties.
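The upper bound on $|V_m\cap A|$ can be recovered from the two update rules. The following is our reconstruction of that step. It uses only rules 1 and 2 above and the minimality of m, which gives $|V_{m-1}\cap A|\leq|A|/3$.

```latex
% Rule 1 adds a single vertex to V_{m-1}; rule 2 adds D_i, where
% |D_i \cap A| \le |V_{m-1}^c \cap A|/2 = (|A| - |V_{m-1} \cap A|)/2.
\[
  |V_m\cap A| \le \max\Bigl\{|V_{m-1}\cap A|+1,\;
      \tfrac12\bigl(|A|+|V_{m-1}\cap A|\bigr)\Bigr\}
  \le \max\Bigl\{\tfrac{|A|}{3}+1,\;\tfrac{2|A|}{3}\Bigr\}
  = \tfrac{2|A|}{3},
\]
% where the final equality uses |A| \ge 3. Every vertex outside V_m is incident only
% to edges of W_m^c, so the complementary piece satisfies
% |V(W_m^c) \cap A| \ge |A| - |V_m \cap A| \ge |A|/3.
```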
We now apply the claim proven in the previous paragraph to complete the proof of the lemma. Let T = (V, E) be a locally finite tree, let k ≥ 1, and let A ⊆ V satisfy $3^k\leq|A|<\infty$. Let $\mathcal{G}$ be the set of good partitions of E. For each set of edges W ⊆ E, let V(W) ⊆ V be the set of vertices incident to an edge of W. For each 1 ≤ n ≤ |E| and each good partition $\mathcal{P}=\{E_1,\ldots,E_n\}\in\mathcal{G}$, let σ : {1, …, n} → {1, …, n} be a permutation such that $|V(E_{\sigma(j)})\cap A|$ is non-increasing in j, and define
\[ R(\mathcal{P}) := \bigl(|V(E_{\sigma(1)})\cap A|,\ldots,|V(E_{\sigma(n)})\cap A|,\underbrace{0,\ldots,0}_{|E|-n\text{ zeros}}\bigr)\in\{0,1,\ldots\}^{|E|} \]
to be the sizes of the intersections with A of the vertex sets associated to the pieces of the partition $\mathcal{P}$, in decreasing order. Note that σ is not necessarily unique, but that the choice of σ does not affect the sequence $R(\mathcal{P})$. We define a preorder on $\mathcal{G}$ by letting $\mathcal{P}<\mathcal{P}'$ if the sequence $R(\mathcal{P})$ is strictly lexicographically smaller than the sequence $R(\mathcal{P}')$. Let $\mathcal{A}$ be the set of good partitions $\mathcal{P}=\{E_1,\ldots,E_n\}\in\mathcal{G}$ such that $|V(E_i)\cap A|\geq 3^{-k}|A|$ for every 1 ≤ i ≤ n. Let $\mathcal{P}_0=\{E_1,\ldots,E_m\}$ be an element of $\mathcal{A}$ that is minimal with respect to this preorder, in the sense that there does not exist $\mathcal{P}\in\mathcal{A}$ with $\mathcal{P}<\mathcal{P}_0$. Such a $\mathcal{P}_0$ exists since $\mathcal{A}$ is non-empty (it contains the trivial partition {E}) and $\{R(\mathcal{P}):\mathcal{P}\in\mathcal{G}\}$ is finite.

It suffices to prove that $|V(E_i)\cap A|<3^{-k+1}|A|$ for every 1 ≤ i ≤ m. Indeed, since $\sum_{i=1}^m|V(E_i)\cap A|\geq|A|$, it will then follow that $m>|A|/(3^{-k+1}|A|)=3^{k-1}$ and hence that $m\geq 3^{k-1}+1$ as desired. Suppose for contradiction that this claim does not hold, so that $|V(E_{\sigma(1)})\cap A|\geq 3^{-k+1}|A|\geq 3$. We apply the claim of the previous paragraph to the subgraph of T spanned by $E_{\sigma(1)}$, with A replaced by $A\cap V(E_{\sigma(1)})$. This gives a partition $E_{\sigma(1)}=E_{\sigma(1),1}\cup E_{\sigma(1),2}$ such that $E_{\sigma(1),1}$ and $E_{\sigma(1),2}$ both span connected subgraphs of T and $|V(E_{\sigma(1),j})\cap A|\geq\frac13|V(E_{\sigma(1)})\cap A|\geq 3^{-k}|A|$ for j = 1, 2. It follows that $\mathcal{P}_0':=\{E_{\sigma(1),1},E_{\sigma(1),2},E_{\sigma(2)},\ldots,E_{\sigma(m)}\}$ is a good partition of E that belongs to $\mathcal{A}$ and satisfies $\mathcal{P}_0'<\mathcal{P}_0$, contradicting the minimality of $\mathcal{P}_0$.
for every β ≥ 0. See [31, Chapter 2.3] for further background.

Let G = (V, E, J) be a finite weighted graph and let Λ ⊆ V. Suppose that the event $\{|K_{\max}(\Lambda)|\geq 3^k\lambda\}$ holds for some λ ≥ 1 and k ≥ 1, and let v ∈ V be such that $|K_v\cap\Lambda|\geq 3^k\lambda$. Applying Lemma 2.4 to $K_v$ with $A=K_v\cap\Lambda$ yields $m\geq 3^{k-1}+1$ disjoint sets of open edges $E_1,\ldots,E_m$ with the following properties: each spans a connected subgraph of $K_v$, and the set $V_i$ of vertices incident to an edge of $E_i$ satisfies $|V_i\cap\Lambda|\geq\lambda$ for every 1 ≤ i ≤ m. It follows that the sets $E_1,\ldots,E_m$ are all witnesses for the event $\{|K_{\max}(\Lambda)|\geq\lambda\}$. Since these sets are all disjoint, we deduce that, for every λ ≥ 1 and k ≥ 1, the event $\{|K_{\max}(\Lambda)|\geq 3^k\lambda\}$ is contained in the event that $\{|K_{\max}(\Lambda)|\geq\lambda\}$ occurs disjointly at least $3^{k-1}+1$ times. Taking probabilities on both sides and applying the BK inequality yields the claimed inequality (2.8) in the case that G is finite.

Now suppose that the event $\{|K_u\cap\Lambda|\geq 3^k\lambda\}$ holds for some λ ≥ 1, k ≥ 1, and u ∈ V. As above, applying Lemma 2.4 to $K_u$ yields $m\geq 3^{k-1}+1$ disjoint sets of open edges $E_1,\ldots,E_m$ with the following properties: each spans a connected subgraph of $K_u$, the set $V_i$ of vertices incident to an edge of $E_i$ satisfies $|V_i\cap\Lambda|\geq\lambda$ for every 1 ≤ i ≤ m, and $\bigcup_{i=1}^m V_i$ is equal to the vertex set of $K_u$. In particular, $u\in V_i$ for some 1 ≤ i ≤ m. Thus, at least one of the sets $E_1,\ldots,E_m$ is a witness for the event $\{|K_u\cap\Lambda|\geq\lambda\}$, while the remaining sets are all witnesses for the event $\{|K_{\max}(\Lambda)|\geq\lambda\}$. Since the sets $E_1,\ldots,E_m$ are all disjoint, we deduce the following for every λ ≥ 1, k ≥ 1, and u ∈ V: the event $\{|K_u\cap\Lambda|\geq 3^k\lambda\}$ is contained in the event that $\{|K_u\cap\Lambda|\geq\lambda\}$ and $3^{k-1}$ copies of $\{|K_{\max}(\Lambda)|\geq\lambda\}$ all occur disjointly. As before, taking probabilities on both sides and applying the BK inequality yields the claimed inequality (2.9) in the case that G is finite. The infinite cases of (2.8) and (2.9) follow straightforwardly from the finite cases by passing to the limit in an exhaustion over finite subgraphs.
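Iterating this disjoint-occurrence bound with λ equal to the typical value $M=M_\beta(\Lambda)$ already gives an exponential tail on the scale M, since $\mathbf{P}_\beta(|K_{\max}(\Lambda)|\geq M)\leq e^{-1}$ by definition. The helper below is a hypothetical illustration. It assumes that (2.8) takes the form $\mathbf{P}_\beta(|K_{\max}(\Lambda)|\geq 3^k\lambda)\leq\mathbf{P}_\beta(|K_{\max}(\Lambda)|\geq\lambda)^{3^{k-1}+1}$ suggested by the argument above. The constants it produces are not claimed to be those of Theorem 2.2.

```python
import math

def kmax_tail_bound(n, M):
    """Upper bound on P(|K_max(Λ)| >= n) obtained by iterating the (assumed) BK bound
    P(|K_max| >= 3**k * lam) <= P(|K_max| >= lam) ** (3**(k-1) + 1)
    at lam = M, where P(|K_max| >= M) <= exp(-1) by definition of M."""
    if n < 3 * M:
        return 1.0  # the iteration gives no information below the scale 3M
    k = 1
    while 3 ** (k + 1) * M <= n:  # largest k with 3**k * M <= n
        k += 1
    # By monotonicity, P(|K_max| >= n) <= P(|K_max| >= 3**k * M) <= exp(-1) ** (3**(k-1) + 1).
    return math.exp(-(3 ** (k - 1) + 1))
```

Since $3^{k+1}M>n$ for the chosen k, we have $3^{k-1}+1>n/(9M)$. The returned value is therefore below $e^{-n/(9M)}$ whenever n ≥ 3M, which is a tightness statement of the kind described at the start of this section.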

Proof of the hyperscaling inequality
We now apply Theorem 2.2 to prove Theorem 2.1. In fact, we will prove the following stronger theorem, which also gives control of the maximal cluster size in Λ and allows 0 ≤ θ < 1.
We begin by writing down the following immediate corollary of Theorem 2.2.
Proof of Corollary 2.6. Write $M=M_\beta(\Lambda)$. The claim is trivial when n ≤ 18M. If not, we have by Theorem 2.2 that
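Using that $x^\theta e^{-x/C}$ is decreasing on $[C,\infty)$ yields the claimed inequality. This monotonicity holds because the logarithmic derivative $\theta/x-1/C$ is negative for $x>\theta C$, and θ < 1.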
Proof of Theorem 2.5. For each u ∈ V we can apply Corollary 2.6 to compute that where $\Gamma(\alpha)=\int_0^\infty t^{\alpha-1}e^{-t}\,dt$ is the Gamma function and where we used the change of variables s = t/(18M) in the final equality. Summing over u ∈ Λ, it follows that u,v∈Λ On the other hand, we also have the lower bound u,v∈Λ where we used that M ≥ 2 in the final inequality. Comparing the estimates (2.13) and (2.14) and rearranging yields that completing the proof of the first claimed bound. Substituting this bound into (2.12) yields that there exists a universal continuous function C : for each u ∈ V, completing the proof of the second bound.
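For reference, the change of variables s = t/(18M) used above is an instance of the following standard identity. It is valid for every a > 0 and M > 0 and is stated here for a generic power $t^{a-1}$.

```latex
\[
  \int_0^\infty t^{a-1}e^{-t/(18M)}\,dt
  = (18M)^{a}\int_0^\infty s^{a-1}e^{-s}\,ds
  = (18M)^{a}\,\Gamma(a),
  \qquad t = 18M s,\quad dt = 18M\,ds.
\]
```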
Remark 2.7. The distribution of the entire cluster of critical percolation on a transitive weighted graph always satisfies $\mathbf{P}_{\beta_c}(|K_v|\geq n)\geq cn^{-1/2}$. The 1/2 < θ < 1 case of Theorem 2.5 may nevertheless be useful when Λ is contained in a lower-dimensional subspace of the full lattice, for example $\Lambda\subseteq\mathbb{Z}^{d-k}\subseteq\mathbb{Z}^d$. In particular, it would be interesting to improve the high-dimensional case of Theorem 1.1 in two steps. The first step would be an upper bound of the form $\mathbf{P}_{\beta_c}(|K_0\cap\mathbb{Z}^{d-2}|\geq n)\leq n^{-1/\delta_2}$ for some $\delta_2<2$. The second would use Theorem 2.5 to get an improved bound on the two-point function within $\mathbb{Z}^{d-2}$. It seems that only a relatively modest improvement along these lines is needed to give a lace-expansion-free proof that the triangle condition is satisfied when d is large and α is fixed. Note also that bounds on the maximum cluster size similar to those of Theorem 2.5 may be proven in the regime θ ≥ 1. One follows the proof as above but considers $\sum_{u\in\Lambda}\mathbf{E}_\beta|K_u\cap\Lambda|^k$ instead of $\sum_{u\in\Lambda}\mathbf{E}_\beta|K_u\cap\Lambda|$, for an appropriate choice of k ≥ 2.

An improved two-ghost inequality
In this section we derive an improved version of the two-ghost inequality of [41, Theorem 1.6 and Corollary 1.7] as stated for long-range models in [42, Section 3]. This improved two-ghost inequality will be applied together with Theorem 2.1 to prove Theorems 1.1 and 1.2 in the next section. The proof of the two-ghost inequality uses ideas originating in the important work of Aizenman, Kesten, and Newman [3]. See [41] and [19] for further discussion of how the methods of [3] can be used to derive quantitative estimates on critical percolation. Our improvement to the two-ghost inequality as stated in [42, Corollary 3.2] is two-fold:
• We show that a starting assumption of the form $\mathbf{P}_\beta(|K|\geq n)\leq An^{-\theta}$ (as will come from our bootstrapping hypothesis) can be used to improve the exponent given by the two-ghost inequality. That this can be done had previously been discussed briefly in [41, Remark 6.1] and [42, Remark 3.6].
• We use a re-weighting trick to improve the bound one obtains on the probability of the two-arm event for a typical 'long' edge. The basic idea behind this improvement is that the two-ghost inequality of [42] holds not just for the weights J that are given with the graph G, but also for any other automorphism-invariant choice of weights. Optimizing the resulting bound over all possible automorphism-invariant weights leads to the bound of Theorem 3.1.
For the benefit of future applications, we phrase the results in this section not just for Bernoulli percolation but for the more general class of percolation in random environment models. The same level of generality was employed in [42, Section 3], where we applied the two-ghost inequality to the random-cluster and Ising models. Let G = (V, E, J) be a countable weighted graph. Suppose that µ is a probability measure on $[0,1]^E$, and let $p=(p_e)_{e\in E}$ be a $[0,1]^E$-valued random variable with law µ. Let $(U_e)_{e\in E}$ be i.i.d. Uniform[0, 1] random variables independent of p. Let ω = ω(p, U) be the $\{0,1\}^E$-valued random variable defined by $\omega(e)=\mathbb{1}(U_e\leq p_e)$ for each e ∈ E. We say that ω is a percolation in random environment on G with environment distribution µ and write $\mathbf{P}_\mu$ for the joint law of p and ω. We can consider Bernoulli percolation on G to be a percolation in random environment model for which the environment measure µ is concentrated on the point $(1-e^{-\beta J_e})_{e\in E}$. For each e ∈ E and n ≥ 1, let $S'_{e,n}$ be the event that the endpoints of e belong to distinct clusters, each of which includes at least n vertices and at least one of which is finite. (We use $S'_{e,n}$ rather than $S_{e,n}$ to indicate that we are measuring volume in terms of vertices rather than edges.)

Theorem 3.1 (Improved two-ghost inequality). Let G = (V, E, J) be a connected transitive weighted graph, let o be a vertex of G, and let Γ ⊆ Aut(G) be a closed, transitive, unimodular subgroup of automorphisms of G. Let µ be a Γ-invariant probability measure on $[0,1]^E$ and suppose that there exist constants A < ∞ and 0 ≤ θ < 1/2 such that $\mathbf{P}_\mu(|K_o|\geq n)\leq An^{-\theta}$ for every n ≥ 1. Then for every n ≥ 1.
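The coupling $\omega(e)=\mathbb{1}(U_e\leq p_e)$ is straightforward to simulate once an environment has been drawn. In the sketch below, the environment is a dictionary of edge probabilities on a finite edge list. Drawing those probabilities i.i.d. uniform, as in the demo, is a hypothetical choice for illustration only. Passing the deterministic environment $p_e=1-e^{-\beta J_e}$ recovers Bernoulli percolation. The function names are ours.

```python
import math
import random

def sample_percolation_in_environment(p, rng):
    """Given an environment p (dict: edge -> probability), return omega with
    omega[e] = 1 if U_e <= p[e], using fresh i.i.d. uniform variables U_e."""
    return {e: int(rng.random() <= pe) for e, pe in p.items()}

def bernoulli_environment(J, beta):
    """Deterministic environment p_e = 1 - exp(-beta * J_e), i.e. Bernoulli percolation."""
    return {e: 1 - math.exp(-beta * Je) for e, Je in J.items()}

if __name__ == "__main__":
    rng = random.Random(0)
    edges = [(i, i + 1) for i in range(10)]
    p_random = {e: rng.random() for e in edges}  # hypothetical random environment
    print(sample_percolation_in_environment(p_random, rng))
    J = {e: 1.0 for e in edges}
    print(sample_percolation_in_environment(bernoulli_environment(J, 0.7), rng))
```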
See e.g. [42, Section 2] for relevant background, including the definition of unimodularity of a transitive subgroup Γ ⊆ Aut(G). For the main purposes of this paper, it suffices to consider the case that G is unimodular and Γ is the full automorphism group of G. Indeed, it suffices to consider the case that G has vertex set $\mathbb{Z}^d$ and that $\Gamma=\mathbb{Z}^d$ acts transitively on G by translations as in Theorem 1.1.
Let G = (V, E, J) be a connected, transitive weighted graph, let o be a vertex of G, and let Γ be a closed transitive subgroup of Aut(G). We call w : E → [0, ∞) a (Γ-)good weight function if three conditions hold: w(γe) = w(e) for every e ∈ E and γ ∈ Γ, $\sum_{e\in E^{\to}_o}w(e)=1$, and $\sum_{e\in E^{\to}_o}w(e)^{1/2}<\infty$. (The last condition holds trivially if w(e) = 0 for all but finitely many $e\in E^{\to}_o$, and it would in fact suffice to consider this case for the rest of the proof.) Let µ be a Γ-invariant probability measure on $[0,1]^E$, let p be a random variable with law µ, and let ω be the associated percolation in random environment process as above. Let h > 0. Given the environment p and a good weight function w, let $\mathcal{G}\in\{0,1\}^E$ be a random subset of E, independent of p and ω. Each edge e of E is included in $\mathcal{G}$ independently at random with probability $1-e^{-hw(e)}$. We write $\mathbf{P}_{\mu,w,h}$ and $\mathbf{E}_{\mu,w,h}$ for probabilities and expectations taken with respect to the joint law of p, ω, and $\mathcal{G}$. We call $\mathcal{G}$ the w-ghost field and call an edge w-green if it is included in $\mathcal{G}$. Note that
\[ \mathbf{P}_{\mu,w,h}\bigl(\text{no edge of } A \text{ is } w\text{-green}\bigr)=e^{-hw(A)} \]
for every finite set A ⊆ E, where we write $w(A)=\sum_{e\in A}w(e)$ for the total weight of A.
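The probability that a finite set A contains no w-green edge follows from independence: it equals $\prod_{e\in A}e^{-hw(e)}=e^{-hw(A)}$. The short Monte Carlo check below illustrates this. The edge labels, weights, and value of h are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random

def sample_ghost_field(w, h, rng):
    """Return the set of w-green edges: each e is green independently w.p. 1 - exp(-h*w(e))."""
    return {e for e, we in w.items() if rng.random() < 1 - math.exp(-h * we)}

def no_green_frequency(w, A, h, trials, rng):
    """Empirical probability that no edge of the finite set A is w-green."""
    hits = sum(not (sample_ghost_field(w, h, rng) & A) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    rng = random.Random(0)
    w = {("o", i): 2.0 ** -(i + 1) for i in range(10)}  # weights summing to just under 1
    A = {("o", 0), ("o", 2), ("o", 5)}
    h = 1.5
    wA = sum(w[e] for e in A)
    print(no_green_frequency(w, A, h, 20000, rng), math.exp(-h * wA))
```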
For each edge e of G, we define T e to be the event that e is closed in ω and that the endpoints of e are in distinct clusters of ω, each of which touches some w-green edge, and at least one of which is finite. In order to prove Theorem 3.1, it suffices to prove the following proposition.
(The condition $\sum_{e\in E^{\to}_o}w(e)^{1/2}<\infty$ is not really needed for this proposition to hold, but will slightly simplify the proof.) Before proving this proposition, let us see how it implies Theorem 3.1.

Proof of Theorem 3.1 given Proposition 3.2.
Let w : E → [0, 1] be a Γ-good weight function. Let e be an edge of G with endpoints x and y and let $D_e$ be the event that x and y are in distinct clusters at least one of which is finite. Then we have by the definitions that for each h > 0 and n ≥ 1, where we used that w(E(A)) ≤ |A| for every A ⊆ V in the final inequality. Setting $h=cn^{-1}$ with c ≥ 1 and applying Proposition 3.2, it follows that for every m ≥ 1. The claim follows by taking the limit as m → ∞.
We now begin to work towards the proof of Proposition 3.2. Since the proof is fairly similar to that of [42, Theorem 3.1], we will keep the details light and focus on those aspects of the proof that are genuinely different. Let G = (V, E, J) be a connected transitive weighted graph, let Γ be a closed transitive subgroup of automorphisms of G, and let w : E → [0, 1] be a Γ-good weight function. For each environment $p\in(0,1)^E$ and subgraph H of G, we define the w-fluctuation of H to be As in [42], the fluctuation is defined so that $h_{p,w}(K_v)$ is the final value of a certain martingale that arises when exploring the cluster $K_v$ one edge at a time after conditioning on the environment p. The following key lemma uses the mass-transport principle to relate the probability of the two-arm event to an expectation written in terms of the fluctuation. (This lemma is the only place that unimodularity is used in the proofs of any of our theorems.) holds for every h > 0.
Proof. This follows from exactly the same proof as [42, Lemma 3.3], except that we use the weights w instead of the original weights J; doing so requires notational changes only. The cited lemma is phrased in terms of a random root edge. Since we have required that $\sum_{e\in E^{\to}_o}w(e)=1$, this is equivalent to the formulation in terms of a sum over $e\in E^{\to}_o$ that we have given here. Moreover, the integrability condition required by the cited lemma holds automatically here since we have required that $\sum_{e\in E^{\to}_o}w(e)^{1/2}<\infty$.

We now bound the right-hand side of this inequality via a martingale analysis, where we use the assumption $\mathbf{P}_\mu(|K_o|\geq n)\leq An^{-\theta}$ to improve upon the analysis of [42, Section 3.1]. Let $X=(X_n)_{n\geq0}$ be a real-valued martingale with respect to the filtration $\mathcal{F}=(\mathcal{F}_n)_{n\geq0}$, and suppose that $X_0=0$. The quadratic variation process $Q=(Q_n)_{n\geq0}$ associated to $(X,\mathcal{F})$ is defined by $Q_0=0$ and
\[ Q_n=Q_{n-1}+\mathbf{E}\bigl[(X_n-X_{n-1})^2\bigm|\mathcal{F}_{n-1}\bigr] \]
for each n ≥ 1. The following is a minor improvement of [42, Lemma 3.4].
Lemma 3.4. Let $(X_n)_{n\geq0}$ be a martingale with respect to the filtration $(\mathcal{F}_n)_{n\geq0}$ such that $X_0=0$, let $(Q_n)_{n\geq0}$ be the associated quadratic variation process, and let T be a stopping time. Then

Proof. Fix λ ≥ 0 and let $\tau=\sup\{k\geq0 : Q_k\leq\lambda\}=\inf\{k\geq0 : Q_k>\lambda\}-1$, which may be infinite. Since Q is non-decreasing and $Q_n$ is $\mathcal{F}_{n-1}$-measurable for every n ≥ 1, the event $\{\tau=n\}=\{Q_n\leq\lambda<Q_{n+1}\}$ belongs to $\mathcal{F}_n$ for every n ≥ 0. Thus τ is a stopping time and $(X_{n\wedge\tau\wedge T})_{n\geq0}$ is a martingale. Thus, we have by the orthogonality of martingale increments that for every n ≥ 1. The claim follows by applying Doob's $L^2$ maximal inequality to $(X_{n\wedge\tau\wedge T})_{n\geq0}$.
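The stopping argument in this proof can be checked numerically. The sketch below uses a hypothetical martingale with fair ±σ_n increments, where the step size σ_n depends only on the past, so that $Q_n=\sum_{i\leq n}\sigma_i^2$ is predictable. The martingale is stopped at τ, the last time at which Q ≤ λ, and at a deterministic time N. The code then compares the estimate of $\mathbf{E}[\max_{n\leq N}X_{n\wedge\tau}^2]$ with the bound $4\lambda$. That bound follows from orthogonality of increments, $\mathbf{E}[X_{N\wedge\tau}^2]=\mathbf{E}[Q_{N\wedge\tau}]\leq\lambda$, together with Doob's $L^2$ maximal inequality. The martingale and its parameters are illustrative assumptions.

```python
import random

def stopped_martingale_stats(lam, n_steps, trials, seed=0):
    """Monte Carlo estimates of E[max_n X_{n∧τ}^2], E[X_{N∧τ}^2] and E[Q_{N∧τ}] for a
    martingale with predictable step sizes, stopped at τ = sup{k : Q_k <= lam}
    and at the deterministic time N = n_steps."""
    rng = random.Random(seed)
    tot_max_sq = tot_final_sq = tot_q = 0.0
    for _ in range(trials):
        x = q = max_sq = 0.0
        for _ in range(n_steps):
            # The next step size depends only on the past, so Q_{n+1} is known at time n.
            sigma = 1.0 + 0.5 * min(abs(x), 2.0)
            if q + sigma ** 2 > lam:  # Q_{n+1} > lam, so the current time n equals τ
                break
            q += sigma ** 2
            x += sigma if rng.random() < 0.5 else -sigma  # fair ±sigma increment
            max_sq = max(max_sq, x * x)
        tot_max_sq += max_sq
        tot_final_sq += x * x
        tot_q += q
    return tot_max_sq / trials, tot_final_sq / trials, tot_q / trials

if __name__ == "__main__":
    m, f, qbar = stopped_martingale_stats(10.0, 50, 20000)
    print(f"E[max X^2] ~ {m:.2f} <= 4*lam = 40;  E[X^2] ~ {f:.2f} vs E[Q] ~ {qbar:.2f}")
```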
We now apply Lemma 3.4 to deduce the following improvement to [42, Lemma 3.5] under the assumption that the tail of the total quadratic variation satisfies a power-law upper bound.

Lemma 3.5. Let $(X_n)_{n\geq0}$ be a martingale with respect to the filtration $(\mathcal{F}_n)_{n\geq0}$ such that $X_0=0$, and let $(Q_n)_{n\geq0}$ be the associated quadratic variation process. Let T be a stopping time and suppose that there exist constants A and 0 ≤ θ < 1/2 such that $\mathbf{P}(Q_T\geq x)\leq Ax^{-\theta}$ for every x > 0. Then

Proof. Write $M_n=\max_{0\leq m\leq n}|X_m|$ for each n ≥ 0. Since $(1-e^{-hx})/x$ is a decreasing function of x > 0, we may write We can then compute that for every λ > 0, so that Lemma 3.4 and Cauchy-Schwarz let us bound for each $k\in\mathbb{Z}$. Taking square roots and summing over k we obtain that This series is easily seen to converge, and indeed satisfies for every 0 ≤ θ < 1/2, where the final inequality can be verified by calculus. It follows that as claimed, where we used the bound $8\sqrt{2e}=18.653\ldots\leq19$ to simplify the constant.

Proof of Proposition 3.2.
We prove the proposition in the case that µ is supported on $(0,1)^E$, which is the only case required by our main theorems. The general case follows by a simple limiting argument that is given in detail in the proof of [42, Theorem 3.1]. Let µ be a Γ-invariant probability measure on $(0,1)^E$, let w be a Γ-good weight function, and let (p, ω) be random variables with law $\mathbf{P}_\mu$. Write $K=K_o$ for the cluster of o in ω. As in the proofs of [41, Theorem 1.6] and [42, Theorem 3.1], we can condition on the environment p and explore the cluster K one edge at a time, as follows. Let T denote the (possibly infinite) total number of edges touching K, let $E_n$ denote the (random) edge whose status is queried at the nth step of the exploration for each n ≥ 1, and let $\mathcal{F}_n$ denote the σ-algebra generated by the environment p and the first n steps of the exploration for each n ≥ 0. The exploration can be arranged so that $\mathbf{P}_\mu(\omega(E_{n+1})=1\mid\mathcal{F}_n)=p_{E_{n+1}}$ whenever n < T and $\{E_i : 1\leq i\leq T\}=E(K)$. It follows that the process $(Z_n)_{n\geq0}$ defined by $Z_0=0$ and for each n ≥ 1 is a martingale with respect to the filtration $(\mathcal{F}_n)_{n\geq0}$ for which the final value $Z_T$ is equal to the w-fluctuation $h_{p,w}(K)$. Moreover, we can express the associated quadratic variation for every n ≥ 0, so that $Q_T=w(E(K))$ is the total weight of all the edges touching K. Every edge touching K is oriented out of some vertex of K, and $\sum_{e\in E^{\to}_v}w(e)=1$ for every v ∈ V by Γ-invariance and transitivity. Hence $Q_T=w(E(K))\leq|K|$. Consequently, a bound $\mathbf{P}_\mu(|K|\geq n)\leq An^{-\theta}$ for every n ≥ 1 implies that $\mathbf{P}_\mu(Q_T\geq x)\leq Ax^{-\theta}$ for every x > 0 (note that necessarily A ≥ 1). Thus, it follows from Lemma 3.3 and Lemma 3.5 that if $\mathbf{P}_\mu(|K|\geq n)\leq An^{-\theta}$ for every n ≥ 1 then as required.
Proof of Lemma 4.1. By rescaling if necessary, we may assume without loss of generality that $\sum_{e\in E^{\to}_o}J_e=1$. Fix 0 ≤ β < β_c and suppose that 1 ≤ A < ∞ is such that $\mathbf{P}_\beta(|K|\geq n)\leq An^{-\theta}$ for every n ≥ 1, where θ = (d − α)/(2d + α) < 1/2. We wish to prove that there exists a constant C such that $\mathbf{P}_\beta(|K|\geq n)\leq CA^{1/(1+\theta)}n^{-\theta}$ for every n ≥ 1, where C may depend on d, α, c, and $r_0$ but not on the choice of 1 ≤ A < ∞ or 0 ≤ β < β_c. If β ≤ 1/2 then a standard path-counting argument implies that $\mathbf{E}_\beta|K_o|\leq 2$. The claim then holds trivially in this case by Markov's inequality provided that we take C ≥ 2: indeed, $\mathbf{P}_\beta(|K|\geq n)\leq 2n^{-1}\leq 2A^{1/(1+\theta)}n^{-\theta}$ since A ≥ 1 and θ ≤ 1. We may therefore assume that β ≥ 1/2 for the remainder of the proof.
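The path-counting bound for β ≤ 1/2 can be written out as follows. It uses $1-e^{-x}\leq x$, the normalization $\sum_{y}J(y)=\sum_{e\in E^{\to}_o}J_e=1$, and translation invariance to sum over the successive steps of a (not necessarily self-avoiding) path from the origin. This is a standard argument, recorded here for convenience.

```latex
\[
  \mathbf{E}_\beta|K_o|
  \le \sum_{k\ge0}\ \sum_{o=x_0,x_1,\dots,x_k}\ \prod_{i=1}^{k}\bigl(1-e^{-\beta J(x_i-x_{i-1})}\bigr)
  \le \sum_{k\ge0}\Bigl(\beta\sum_{y}J(y)\Bigr)^{k}
  = \sum_{k\ge0}\beta^{k}
  = \frac{1}{1-\beta}\le 2 .
\]
```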
All the constants appearing in the remainder of the proof may depend on d, α, c, and $r_0$ but not on the choice of 1 ≤ A < ∞ or 0 ≤ β < β_c. For each $x\in\mathbb{Z}^d$ and n ≥ 1, let $S'_{x,n}$ be the event that 0 and x belong to distinct clusters each of which contains at least n vertices. Both clusters are automatically finite since β < β_c. Since θ < 1/2, we have by Theorem 3.1 that there exists a constant $C_1$ such that $\sum_{x\in\mathbb{Z}^d}(e^{\beta J_x}-1)\mathbf{P}_\beta(S'_{x,n})^2\leq C_1An^{-(1+2\theta)}$ for every n ≥ 1. Let $\Lambda'_r=\Lambda_r\setminus\Lambda_{r_0-1}$ for each r ≥ r_0. It follows by Cauchy-Schwarz that there exists a constant $C_2$ such that
\[ \leq C_2A^{1/2}n^{-(1+2\theta)/2}r^{\alpha/2}|\Lambda_r| \tag{4.1} \]
for every r ≥ r_0, where we used the inequality $e^x-1\geq x$ in the first inequality on the second line. On the other hand, since δ ≥ 2, it follows immediately from Theorem 2.1 that there exists a constant $C_3$ such that for every r ≥ r_0. We now apply these two bounds to obtain a new bound on $\mathbf{P}_\beta(|K_0|\geq n)$. We have by a union bound and the Harris-FKG inequality that for each $x\in\mathbb{Z}^d$ and n ≥ 1. Rearranging and averaging over x ∈ Λ_r, it follows that
\[ \leq C_2A^{1/2}r^{\alpha/2}n^{-(1+2\theta)/2}+C_3A^{2/(1+\theta)}r^{-2d\theta/(1+\theta)} \tag{4.4} \]
for every r ≥ r_0 and n ≥ 1. Taking $r=r_0\vee n^{(1-2\theta)/\alpha}$ yields that there exists a constant $C_4$ such that
\[ \mathbf{P}_\beta(|K_0|\geq n)^2\leq C_4\Bigl(A^{1/2}n^{-2\theta}+A^{2/(1+\theta)}n^{-2d\theta(1-2\theta)/(\alpha+\alpha\theta)}\Bigr) \tag{4.5} \]
for every n ≥ 1. Since θ = (d − α)/(2d + α), the two powers of n appearing in this expression are equal. Since we also have that $A^{1/2}\leq A^{2/(1+\theta)}$ (as A ≥ 1), it follows by taking square roots on both sides of (4.5) that $\mathbf{P}_\beta(|K_0|\geq n)\leq\sqrt{2C_4}\,A^{1/(1+\theta)}n^{-\theta}$ for every n ≥ 1. This completes the proof.
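The claim that the two powers of n in (4.5) coincide when θ = (d − α)/(2d + α) is elementary algebra. The snippet below checks it with exact rational arithmetic. It computes both decay exponents from the substitution $r=n^{(1-2\theta)/\alpha}$ in (4.4) rather than copying them from (4.5). The sample values of d and α are arbitrary, and the function names are ours.

```python
from fractions import Fraction as F

def theta(d, alpha):
    """The exponent theta = (d - alpha) / (2d + alpha) from the proof of Lemma 4.1."""
    return (d - alpha) / (2 * d + alpha)

def exponents(d, alpha, th):
    """Decay exponents (powers of 1/n) of the two terms on the right of (4.4)
    after substituting r = n^{(1 - 2 theta)/alpha}."""
    s = (1 - 2 * th) / alpha  # r = n^s
    first = (1 + 2 * th) / 2 - s * alpha / 2  # from r^{alpha/2} n^{-(1+2 theta)/2}
    second = s * 2 * d * th / (1 + th)  # from r^{-2 d theta/(1+theta)}
    return first, second

if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        for alpha in (F(1, 3), F(1, 2), F(d, 2)):
            th = theta(F(d), alpha)
            print(d, alpha, th, exponents(F(d), alpha, th))
```

Both exponents come out equal to 2θ, which is why taking square roots in (4.5) gives the rate $n^{-\theta}$. For any other θ one of the two terms dominates.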