Analyticity of Gaussian free field percolation observables

We prove that cluster observables of level-sets of the Gaussian free field on the hypercubic lattice $\mathbb{Z}^d$, $d\geq3$, are analytic on the whole off-critical regime $\mathbb{R}\setminus\{h_*\}$. This result concerns in particular the percolation density function $\theta(h)$ and the (truncated) susceptibility $\chi(h)$. As an important step towards the proof, we show the exponential decay in probability for the capacity of a finite cluster for all $h\neq h_*$, which we believe to be a result of independent interest. We also discuss the case of general transient graphs.


Introduction
Motivation and main results. We consider level-set percolation for the Gaussian free field (GFF) on a connected, locally finite, transient graph G = (V, E). Of particular interest is the case of the hypercubic lattice Z^d in dimensions d ≥ 3. The Gaussian free field ϕ = (ϕ_x)_{x∈V} is defined as the centered Gaussian process with covariance E(ϕ_x ϕ_y) = g(x, y) for all x, y ∈ V, where g(·, ·) stands for the Green function of the simple random walk on G. Given h ∈ R, we are interested in the excursion set {ϕ ≥ h} := {x ∈ V : ϕ_x ≥ h}, seen as a random subgraph of G (with the induced adjacency). The first and most fundamental question in percolation theory is the existence of a nontrivial phase transition, which in our case corresponds to −∞ < h_* < +∞. A soft argument due to Bricmont, Lebowitz & Maes [2] shows that the GFF percolates above any negative level, i.e. h_*(G) ≥ 0 (> −∞) for every transient graph G; it has recently been proved [3] that h_*(Z^d) > 0 for all d ≥ 3. The opposite inequality h_* < +∞ is more delicate. In the special case G = Z^d, d ≥ 3, this was proved by Rodriguez & Sznitman [18]; the case d = 3 had already been obtained in [2]. Other graphs have been proved to satisfy h_* < +∞ [20], but this remains open for more general transient graphs. Remarkably, this is in contrast with classical Bernoulli percolation, for which proving the existence of a percolative regime is in general much harder than proving the existence of a non-percolative regime; see [5].
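For the reader's convenience, we recall the standard definition of the critical parameter (see e.g. [18]); the display below is a sketch of the usual convention, not a quotation:

```latex
% Critical level for GFF level-set percolation: by monotonicity in h,
h_*(G) := \inf\Bigl\{ h \in \mathbb{R} \;:\;
  \mathbb{P}\bigl[\{\varphi \ge h\} \text{ contains an infinite cluster}\bigr] = 0 \Bigr\},
% so that \{\varphi \ge h\} percolates a.s. for every h < h_*
% and does not percolate for every h > h_*.
```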
Once the existence of a phase transition is established, the next important question concerns the uniqueness of the critical point, i.e. whether h_* defined above is the only value at which one can see a qualitative change in the large-scale behavior of the model. This immediately raises the question of whether there are critical points at other values of h and how to define them. There are two main approaches to this question.
From a percolation theory perspective, a natural approach consists in defining alternative critical parameters h̄ and h_**, which characterize a strongly percolative and a strongly non-percolative regime, respectively. In the last decade, this approach has been successfully implemented in the case G = Z^d: definitions appeared in many works (see e.g. [18,17,4,19]) and more recently it has been proved by Duminil-Copin, Goswami, Rodriguez & Severo [6] that indeed h̄ = h_* = h_**. This equality is often referred to as "sharpness" of the phase transition and is also expected to hold for other transient graphs, but this remains open. The corresponding result for Bernoulli percolation on Z^d was obtained in the highly influential works of Aizenman & Barsky [1] and Menshikov [14] (on the subcritical phase) and Grimmett & Marstrand [9] (on the supercritical phase).
From the point of view of statistical physics, a classical approach consists in considering a function (such as θ) describing the macroscopic behavior of the model, and defining the critical points to be the singularities of that function. Uniqueness of the critical point then corresponds to the analyticity of this function on R \ {h_*}, which is precisely the main result of the present article. Let us mention that the corresponding result for Bernoulli percolation on Z^d has been proved on the subcritical phase by Kesten [11] and on the supercritical phase by Georgakopoulos & Panagiotis [7]. Hermon & Hutchcroft [10] also proved a corresponding result for Bernoulli percolation on non-amenable transitive graphs.
In order to state our main result, we need to introduce some notation. Let X denote the family of all finite subsets of V. We say that a cluster observable F : X → C has subexponential growth if |F(S)| ≤ e^{o(cap(S̄))} as cap(S̄) → ∞. Here cap(S̄) denotes the (harmonic) capacity of S̄, the (vertex) closure of S; see Section 2 for definitions. Finally, for a cluster observable F : X → C and a subset X ∈ X, consider the function F_X : R → C defined by

F_X(h) := E[F(C_X(h)) 1_{|C_X(h)| < ∞}],

where C_X(h) denotes the union of all clusters in {ϕ ≥ h} intersecting X.
Theorem 1.1. Let G = Z^d, d ≥ 3. Then for every observable F : X → C of subexponential growth and every X ∈ X, the function F_X is well-defined and analytic on R \ {h_*}.
Notice that the analyticity of the percolation density θ on R \ {h_*} follows from Theorem 1.1 by taking F ≡ 1 and X = {o}, for which F_X(h) = P[|C_o(h)| < ∞] = 1 − θ(h). The following is a corollary of Theorem 1.1, where X ∈ X with |X| = k.
The only function for which Corollary 1.2 does not follow readily from Theorem 1.1 is the (non-truncated) k-point function τ_X(h). In order to deduce its analyticity, simply notice that by the uniqueness of the infinite cluster (see e.g. [18, Remark 1.6]) and the inclusion-exclusion principle, τ_X can be written in terms of quantities covered by Theorem 1.1. We remark that the analyticity of τ_X(h) may break down in the supercritical phase if uniqueness of the infinite cluster does not hold. Indeed, for Bernoulli percolation there are examples [10] of transitive non-amenable graphs for which τ has a discontinuity at the uniqueness critical point p_u, which in this case satisfies p_c < p_u < 1.
Our proof of analyticity of F_X makes crucial use of the following convenient series decomposition. For every integer N ≥ 1 and h ∈ R, consider the event

A_N^X(h) := {N − 1 ≤ cap(C̄_X(h)) < N}.

We can then write

F_X(h) = Σ_{N≥1} E[F(C_X(h)) 1_{A_N^X(h)}].   (1.1)

With the series (1.1) in hand, it is enough to show that each term F_X^N(h) := E[F(C_X(h)) 1_{A_N^X(h)}] can be analytically extended to a domain of C containing R \ {h_*} on which the series converges locally uniformly. A crucial step to establish such a convergence is proving that P[A_N^X(h)] decays (uniformly) exponentially in N for h ≠ h_*. This is the content of the following theorem, which we believe to be of independent interest.

Theorem 1.3. Let G = Z^d, d ≥ 3. Then for every ε > 0 and X ∈ X, there exists c = c(X, ε, d) > 0 such that P[A_N^X(h)] ≤ e^{−cN} for every N ≥ 1 and every h ∈ R with |h − h_*| ≥ ε.
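To see schematically why these two ingredients suffice, one can combine the imaginary-direction bound of Proposition 2.1 with the exponential decay of Theorem 1.3; the following sketch (under the subexponential growth assumption on F) is only heuristic:

```latex
% Sketch: convergence of the series (1.1) on a horizontal strip,
% assuming |F_X^N(h+it)| \le e^{t^2 N/2}\,E[|F(C_X(h))| 1_{A_N^X(h)}] (Prop. 2.1),
% |F(S)| \le e^{o(\mathrm{cap}(\bar S))}, and P[A_N^X(h)] \le e^{-cN} near h.
|F_X^N(h+it)| \;\le\; e^{t^2 N/2}\, e^{o(N)}\, \mathbb{P}\bigl[A_N^X(h)\bigr]
             \;\le\; e^{\left(t^2/2 \,-\, c \,+\, o(1)\right)N}.
% For |t| < \sqrt{c} the right-hand side is summable in N, so the series
% \sum_{N} F_X^N converges locally uniformly and defines an analytic function.
```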
In recent years, large deviation problems for GFF percolation events have attracted considerable attention; see e.g. [19,16,15,22,8]. A common feature of these problems is a deep connection with potential theory and in particular with the notion of capacity. Typically, the exponential rate of decay is given by the solution of a constrained optimization problem involving the Dirichlet energy and, in some cases, the percolation density θ as well [21,24,25]. It is therefore relevant to understand the regularity of θ in order to study these optimization problems. Motivated by this, it has been recently proved [23] that θ is C^1 for the closely related model of random interlacements. We expect that the techniques developed in the present article may be helpful for studying similar questions for random interlacements and other strongly correlated models as well. The proof of Theorem 1.3 is based on a coarse graining argument which is very much in the spirit of the works cited above. However, we would like to highlight a key new aspect of our work: we use a coarse graining procedure that involves multiple scales at the same time, instead of only one. We describe this multi-scale coarse graining scheme in more detail at the end of this section.
We now discuss the case of general transient graphs.

Theorem 1.5. For every transient graph G, every observable F : X → C of subexponential growth and every X ∈ X, the function F_X is well-defined and analytic on (−∞, 0).
Under weaker assumptions on the decay of P[A_N^X(h)], we can prove that F_X is smooth for observables F : X → C of (at most) polynomial growth, i.e. satisfying |F(S)| ≤ C cap(S̄)^C for all S ∈ X and some constant C ∈ (0, ∞). We say that a sequence (c_N)_{N≥1} decays super-polynomially fast if lim_{N→∞} N^k c_N = 0 for every k ≥ 1. We then define

h̃ := sup{h ∈ R : for every X ∈ X there exists (c_N)_{N≥1} decaying super-polynomially fast such that P[A_N^X(h′)] ≤ c_N for every h′ ≤ h}.

We also define an analogous parameter in the subcritical phase

ĥ := inf{h ∈ R : for every X ∈ X there exists (c_N)_{N≥1} decaying super-polynomially fast such that P[A_N^X(h′)] ≤ c_N for every h′ ≥ h}.

Theorem 1.6. For every transient graph G, every observable F : X → C of (at most) polynomial growth and every X ∈ X, F_X is well-defined and smooth on R \ [h̃, ĥ].

The parameters h̃ and ĥ defined above can be seen, respectively, as alternative definitions of the classical parameters h̄ and h_** mentioned above. Indeed, for the case G = Z^d, it is not hard to prove that h̄ ≤ h̃ ≤ h_* ≤ ĥ ≤ h_**, which in turn implies h̃ = h_* = ĥ, as the equality h̄ = h_** is known in this case [6]. It is natural to expect that the equality h̃ = h_* = ĥ holds in great generality, but sharpness of the phase transition remains open beyond Z^d. It is also natural to expect that, independently of sharpness, one might be able to bootstrap the decay of P[A_N^X(h)] from super-polynomial to exponential via a coarse graining argument, thus proving that θ is analytic on R \ [h̃, ĥ]. This is essentially what we do in the proof of Theorem 1.3 for the case G = Z^d: we start from the sub-optimal decay provided by the assumption h ∈ R \ [h̄, h_**] (= R \ {h_*} by [6]) and enhance it to the desired exponential decay through a coarse graining argument; see the discussion below for more details. On general graphs though, developing a coarse graining argument is more challenging due to a poorer understanding of their geometry.
About the proof. As mentioned above, our proof makes crucial use of the series (1.1). We first use a shift-argument based on the Cameron-Martin formula to naturally construct an analytic extension of the function F_X^N(h) = E[F(C_X(h)) 1_{A_N^X(h)}] to the whole complex plane C for every N ≥ 1. This construction provides a simple way to effectively estimate the growth of this entire function along the imaginary direction. More precisely, we prove in Proposition 2.1 that |F_X^N(h + it)| is at most e^{t²N/2} times the corresponding expectation on the real line. Due to this result, it is not difficult to deduce the locally uniform convergence (and therefore analyticity) of the series (1.1) from the (uniform) exponential upper bound for P[A_N^X(h)] on the real line; see Proposition 2.2. This exponential bound is then provided in the case G = Z^d by Theorem 1.3, which is the most technical part of this article.
Before discussing the ideas involved in the proof of Theorem 1.3, we would like to highlight some key differences between GFF level-sets and Bernoulli percolation. Kesten's proof [11] of analyticity for Bernoulli percolation on the subcritical phase is based on a series expansion similar to (1.1), but in terms of the cluster size N = |C_X|. Since in the subcritical phase the cluster size decays exponentially in probability [1,14] and the expansion in the imaginary direction also grows (at most) exponentially in N, one can prove that the series converges locally uniformly near the real line and is therefore analytic. This strategy does not work in the supercritical phase though: while the expansion in the imaginary direction is still exponential in N, the decay of the cluster probabilities is exponential in the boundary size [12], which is typically of order N. Motivated by this issue, Georgakopoulos & Panagiotis [7] considered a series decomposition in terms of the size of (multi-)interfaces instead, in which case both the expansion in the imaginary direction and the decay on the real line are of the same exponential order. For the GFF level-sets though, none of these decompositions can work, as the decay on the real line is subexponential in both the volume and the boundary size. Nevertheless, we observe that both the imaginary expansion and the decay of the cluster probability (in both subcritical and supercritical phases!) are exponential in the capacity of the cluster, thus allowing us to make effective use of the series expansion (1.1). This fact is due to an entropic repulsion phenomenon that emerges from the strong (non-integrable) correlations of the GFF, which in turn are deeply related to the potential theory attached to the random walk.
We will now outline the main ideas present in the proof of Theorem 1.3. As mentioned above, a quite substantial multi-scale coarse graining argument takes place in the proof. We start by discussing the more natural single-scale coarse graining approach, with the hope of making the need for a multi-scale argument more apparent. This single-scale approach would consist in choosing an appropriate scale L and observing that on the event A_N(h) one can find a family F of L-boxes on which an unlikely event (a so-called bad event) happens. Then one can hope to prove that, for every given F, the probability that all of these boxes are bad is at most e^{−cN}, while keeping the combinatorial complexity (i.e. the number of possible families F) of order e^{o(N)}. In order to prove the desired exponential upper bound, one can use the harmonic decomposition of the GFF on each box of F into the sum of a local and a global field and then consider two cases: either most boxes of F are globally bad, which corresponds to the global (harmonic) field deviating from 0, or many boxes are locally bad, which corresponds to the occurrence of an unlikely percolation event for the local field. By applying a large deviation result of Sznitman [19], one can prove that the probability of the first case decays exponentially in the capacity, i.e. it is smaller than e^{−cN}, as desired. In the second case though, one can use independence to show that its probability is smaller than p_L^{|F|}, where p_L is the probability of a single L-box being locally bad. On the one hand, since the available a priori bound on p_L is only stretched exponential in L and the geometry of F is completely arbitrary, one quickly notices that in order for the desired inequality p_L^{|F|} ≤ e^{−cN} to hold uniformly in F, it is necessary to choose L not too large. On the other hand, because of the arbitrary geometry of F again, it is necessary to take L sufficiently large in order to have a combinatorial complexity of order e^{o(N)}. As a consequence, choosing such a scale L becomes impossible, suggesting the need of a multi-scale approach.
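To make this tension quantitative, the following back-of-the-envelope accounting may help; the stretched-exponential exponent ρ and the capacity count are illustrative placeholders, not the constants of the actual proof:

```latex
% Single-scale accounting (illustrative). Suppose the a priori bound is
% p_L \le \exp(-L^{\rho}) for some \rho < d-2, and that a family \mathcal{F}
% of L-boxes covering a set of capacity N satisfies
% |\mathcal{F}| \gtrsim N / L^{d-2} (each L-box has capacity of order L^{d-2}).
p_L^{|\mathcal{F}|} \;\le\; \exp\!\Bigl(-L^{\rho}\,\frac{N}{L^{d-2}}\Bigr)
 \;=\; \exp\!\bigl(-N\,L^{\rho-(d-2)}\bigr),
% which is of order e^{-cN} only if L stays bounded; yet enumerating
% arbitrary families of L-boxes has complexity e^{o(N)} only when L is
% large. The two requirements on the single scale L are incompatible.
```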
Our multi-scale coarse graining construction goes roughly as follows. For each configuration ϕ ∈ A_N(h) we construct a set F of bad (and very-bad) boxes consisting of multiple scales. We do so inductively in the scales, starting from a sufficiently large scale L such that the combinatorial complexity is of order e^{o(N)}. We then look at all the boxes where something unlikely happens (these boxes are called bad) and we add to F all those boxes where something "very unlikely" happens (these boxes are called very-bad). Here "very unlikely" corresponds to an event for which an improved a priori upper bound of type q_L ≤ e^{−c·cap(B)} holds. If these boxes have capacity of order N, we are done. Otherwise, we can go down to a smaller scale L′ < L, inspect the bad L′-boxes contained in the remaining L-boxes (i.e. bad but not very-bad) and add to F those L′-boxes which are very-bad. By continuing this process, we eventually obtain either a family of very-bad boxes with capacity of order N or a very large number of bad boxes of the smallest scale L_0. We can then prove that the probability of both cases is smaller than e^{−cN}. Since each time we go down one scale we look only inside certain boxes of the previous scale, it turns out that we can do so while keeping the combinatorial complexity of order e^{o(N)}, as desired. For this construction to work though, one has to define the notions of bad and very-bad boxes in a very careful way so that a certain propagation property holds; see item (iii) of Definition 3.2.
Organization of the paper. In Section 2 we review the potential theory attached to the simple random walk and describe the shift-argument used to extend each term of the series (1.1) to an entire function. We then prove Theorems 1.5 and 1.6 and also deduce Theorem 1.1 from Theorem 1.3, to which the remaining sections are dedicated. In Section 3, we describe the large deviation argument used to prove Theorem 1.3. In Section 4 we prove the (deterministic) multi-scale coarse graining theorem stated in Section 3. Finally, in Sections 5 and 6 we prove the decay in probability for the notions of bad and very-bad boxes introduced in Section 3.
Throughout, we write P_x for the law of the simple random walk on G started at x ∈ V and (X_n)_{n≥0} for the corresponding process. We let g(·, ·) stand for the Green function of the walk,

g(x, y) := (1/d(y)) Σ_{n≥0} P_x[X_n = y],

where d(y) denotes the degree of y. It is well known that the Green function is finite (as G is transient), symmetric and positive-definite. Therefore, we can effectively define the GFF ϕ = (ϕ_x)_{x∈V} as the centered Gaussian field with covariance matrix g. In the case of Z^d, d ≥ 3, it is well known that g(x, y) ≍ ‖x − y‖^{2−d}. Given K ⊂⊂ V and x ∈ V, we consider the equilibrium measure

e_K(x) := d(x) P_x[H̃_K = ∞] 1_{x∈K},

where H̃_K := min{n ≥ 1 : X_n ∈ K}. The capacity of K is defined as its total mass,

cap(K) := Σ_{x∈K} e_K(x).

The capacity is an increasing and sub-additive function, i.e. cap(A) ≤ cap(A ∪ B) ≤ cap(A) + cap(B) for every A, B ⊂⊂ V. The following variational characterization of the capacity is useful for obtaining lower bounds:

cap(K) = ( inf_ν E(ν) )^{−1},   (2.3)

where E(ν) := Σ_{x,y} ν(x)ν(y)g(x, y) and the infimum ranges over all probability measures ν supported on K. As a direct consequence, one has the inequality cap(K) ≥ |K|² / Σ_{x,y∈K} g(x, y). The optimizing measure in (2.3) is precisely the normalized equilibrium measure ē_K(x) := e_K(x)/cap(K). In particular, for a box B of side length L in Z^d, d ≥ 3, one has

cap(B) ≍ L^{d−2}.   (2.5)

Further, for every K ⊂ K′ ⊂⊂ Z^d one has the sweeping identity

e_K(x) = Σ_y e_{K′}(y) P_y[H_K < ∞, X_{H_K} = x],   (2.6)

where H_K := min{n ≥ 0 : X_n ∈ K}. Consider the Dirichlet inner product

E(f, g) := −Σ_{x∈V} ∆f(x) g(x),

defined for every pair of functions f, g : V → R for which the sum converges (for instance, if either ∆f or g has finite support), where ∆f(x) := Σ_{y∼x}(f(y) − f(x)) is the Laplacian of f. One also has the following variational characterization of the capacity in terms of the Dirichlet energy:

cap(K) = inf{E(f, f) : f(x) ≥ 1 for every x ∈ K},   (2.8)

where the infimum ranges over all functions f for which E(f, f) is well-defined. The optimizing function in (2.8) is called the harmonic potential of K and is given by f_K(x) := P_x[H_K < ∞]. In fact, f_K takes value 1 on K and is harmonic on V \ K, i.e. ∆f_K(x) = 0 for all x ∈ V \ K.
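As a sanity check on these definitions, one can compute the capacity of a single vertex directly from the equilibrium measure and compare with the variational characterization:

```latex
% Capacity of a singleton K = \{x\}. The expected number of visits to x
% satisfies \sum_{n \ge 0} P_x[X_n = x] = 1 / P_x[\tilde H_{\{x\}} = \infty], so
\operatorname{cap}(\{x\}) \;=\; d(x)\, P_x\bigl[\tilde H_{\{x\}} = \infty\bigr]
 \;=\; \frac{d(x)}{\sum_{n \ge 0} P_x[X_n = x]} \;=\; \frac{1}{g(x,x)},
% in agreement with (2.3): the only probability measure supported on \{x\}
% is \nu = \delta_x, for which E(\nu) = g(x,x).
```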
Given a function f : V → C for which the Laplacian ∆f has finite support, we introduce the complex measure

dP_f/dP := exp(−E(f, ϕ) − E(f, f)/2).

Notice that when f takes real values, the Cameron-Martin formula implies that P_f is a probability measure and furthermore, the law of ϕ under P_f coincides with the law of (ϕ_x − f(x))_{x∈V} under P. This observation will allow us to extend the probability of local events to the complex plane.
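For real f this is the usual Gaussian change of measure; the following one-line check (a sketch in the notation above, with the density written as exp(−E(f,ϕ) − E(f,f)/2)) verifies both the normalization and the shift of the mean:

```latex
% Cameron-Martin check. Since E(f,\varphi) = \sum_x (-\Delta f)(x)\,\varphi_x
% is a centered Gaussian with variance E(f,f) (using -\Delta g(x,\cdot) = \delta_x),
\mathbb{E}\bigl[e^{-E(f,\varphi)}\bigr] = e^{E(f,f)/2},
% so dP_f/dP := \exp(-E(f,\varphi) - E(f,f)/2) integrates to 1. Moreover,
% Gaussian integration by parts gives, for every x \in V,
\mathbb{E}_{P_f}[\varphi_x]
 = \mathbb{E}\bigl[\varphi_x\, e^{-E(f,\varphi)-E(f,f)/2}\bigr]
 = -\operatorname{Cov}\bigl(\varphi_x,\, E(f,\varphi)\bigr) = -f(x),
% while the covariance is unchanged: \varphi under P_f has the law of \varphi - f.
```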
Proposition 2.1. For every X ∈ X and N ≥ 1, the function F_X^N(h) = E[F(C_X(h)) 1_{A_N^X(h)}] extends to an entire function such that for every h, t ∈ R,

|F_X^N(h + it)| ≤ e^{t²N/2} E[|F(C_X(h))| 1_{A_N^X(h)}].   (2.10)

Proof. Let S ∈ X. We start by extending h ↦ P[C_X(h) = S] to the complex plane. For every z ∈ C, we define

θ_S^X(z) := e^{−z² cap(S̄)/2} Σ_{n≥0} ((−z)^n/n!) E[1_{C_X(0)=S} E(f_S̄, ϕ)^n].   (2.11)

First notice that since the event {C_X(0) = S} only depends on ϕ restricted to S̄ and hf_S̄ = h on S̄, it follows from the Cameron-Martin formula that θ_S^X(h) is indeed equal to P[C_X(h) = S] for h ∈ R. In order to prove that θ_S^X(z) is analytic on C it suffices to show that the series in (2.11) converges locally uniformly. Indeed, this follows directly from the fact that for all n ≥ 0,

|E[1_{C_X(0)=S} E(f_S̄, ϕ)^n]| ≤ E[|E(f_S̄, ϕ)|^n],

and E[exp|zE(f_S̄, ϕ)|] is finite for every z ∈ C as E(f_S̄, ϕ) is a Gaussian random variable.
We will now obtain a bound for θ_S^X(h + it) in terms of θ_S^X(h) for h, t ∈ R. By the Cameron-Martin formula, we have

θ_S^X(h + it) = E[1_{C_X(0)=S} exp(−(h + it)E(f_S̄, ϕ) − (h + it)² cap(S̄)/2)].

Since |exp(−itE(f_S̄, ϕ))| = 1 a.s., we obtain

|θ_S^X(h + it)| ≤ e^{t² cap(S̄)/2} θ_S^X(h).   (2.12)

Finally, for every z ∈ C, we define

F_X^N(z) := Σ_{S ∈ A_N} F(S) θ_S^X(z).   (2.13)

Here A_N denotes the family of all sets S ∈ X such that N − 1 ≤ cap(S̄) < N. By (2.12),

|F(S) θ_S^X(h + it)| ≤ |F(S)| e^{t²N/2} θ_S^X(h).

We can then apply the Weierstrass M-test to conclude that the series in (2.13) converges locally uniformly and therefore F_X^N is indeed analytic on C. The inequality (2.10) follows readily from (2.12).
With Proposition 2.1 in hand, we can now easily obtain a sufficient condition for the analyticity of F_X.
Proposition 2.2. Let F : X → C be an observable of subexponential growth, X ∈ X, and let I ⊂ R be an open set. Assume that for every compact J ⊂ I there exists c = c(J) > 0 such that P[A_N^X(h)] ≤ e^{−cN} for every h ∈ J and N ≥ 1. Then F_X is well-defined and analytic on I.

Proof. By Proposition 2.1 and our assumption on the decay of P[A_N^X(h)], the series Σ_{N≥1} F_X^N converges locally uniformly on an open domain of C containing I; hence F_X is analytic on that set.
Notice that Theorem 1.1 follows directly from Proposition 2.2 and Theorem 1.3, whose proof is presented in the following sections. Theorem 1.5 follows from Proposition 2.2 and the following simple result. Recall that {ϕ ≥ h} is known to percolate for every h < 0 on any transient graph [2].
Lemma 2.3. For every transient graph G, every h < 0, every X ∈ X and every N ≥ 1, we have P[A_N^X(h)] ≤ e^{−h²(N−1)/2}.

Proof. Let h < 0 and S ∈ A_N. Recall that by the Cameron-Martin formula,

θ_S^X(h) = E[1_{C_X(0)=S} exp(−hE(f_S̄, ϕ) − h² cap(S̄)/2)].

Notice that on the event {C_X(0) = S} we have ϕ < 0 on the support of the equilibrium measure e_S̄ (which is contained in the vertex boundary of S), whence E(f_S̄, ϕ) = Σ_x e_S̄(x) ϕ_x ≤ 0. Since h < 0, this yields θ_S^X(h) ≤ e^{−h²(N−1)/2} P[C_X(0) = S], and the desired inequality follows by summing over S ∈ A_N.
We finish this section by proving Theorem 1.6.
Proof of Theorem 1.6. Let us write D(h, R) for the closed disk in the complex plane that is centred at h and has radius R. Consider some h ∈ R \ [h̃, ĥ] and let R = N^{−1/2}. By Proposition 2.1 and the Cauchy estimate, we can bound the kth derivative of F_X^N as follows:

|∂^k F_X^N(h)| ≤ k! R^{−k} sup_{z∈D(h,R)} |F_X^N(z)| ≤ e^{1/2} k! N^{k/2} sup_{|h′−h|≤R} E[|F(C_X(h′))| 1_{A_N^X(h′)}].

Thus, by the polynomial growth of F and the super-polynomial decay of P[A_N^X(h′)], it follows that |∂^k F_X^N(h)| decays to 0 super-polynomially fast and uniformly on compact subsets of R \ [h̃, ĥ]. We can now conclude that the sum Σ_{N≥1} |∂^k F_X^N(h)| converges uniformly on compact subsets of R \ [h̃, ĥ], hence the kth derivative of F_X(h) exists and is equal to Σ_{N≥1} ∂^k F_X^N(h).

Exponential decay of capacity on Z d
In this section, we will introduce some definitions and state the technical results needed for the proof of Theorem 1.3. Since Z d is transitive and the capacity is sub-additive, by a union bound we can assume without loss of generality that X = {o} and henceforth omit X from the notation.

Markov decomposition and harmonic deviations
We start by introducing some notation.
We will view LZ^d both as a graph that is naturally isomorphic to Z^d and as the collection of all the boxes B_L(z). Given a box B = B_L(z), we consider the Gaussian fields

ξ_B(x) := Σ_y P_x[X_{T_K} = y] ϕ_y  and  ψ_B := ϕ − ξ_B,

where K = K_L(z) and T_K := min{n ≥ 0 : X_n ∉ K}. One then has the decomposition ϕ = ψ_B + ξ_B on K. Moreover, ξ_B is harmonic in K and the covariance matrix of ψ_B is equal to the Green function g_K of the simple random walk killed on the boundary of K. The fields ξ_B and ψ_B are often called harmonic and local fields, respectively. The aforementioned decomposition of ϕ is of great importance for large deviation results as it allows us to distinguish local contributions (driven by ψ) from global ones (driven by ξ). In this subsection we focus on estimating the global contributions, which correspond to deviations of ξ and are governed by the capacity.
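The covariance split behind this decomposition can be checked from the strong Markov property of the walk; the following sketch records the standard identity in the notation above:

```latex
% Markov decomposition check. The Green function satisfies, for x \in K,
g(x,y) \;=\; g_K(x,y) \;+\; \mathbb{E}_x\bigl[g\bigl(X_{T_K},\, y\bigr)\bigr],
% where g_K is the Green function of the walk killed upon exiting K.
% Consequently, with \xi_B(x) = \sum_y P_x[X_{T_K} = y]\,\varphi_y, the
% remainder \psi_B := \varphi - \xi_B has covariance g_K on K and is
% independent of the restriction of \varphi to the complement of the
% interior of K; moreover \xi_B is harmonic in K, i.e. \Delta \xi_B = 0 there.
```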
Let ε > 0 and L ≥ 1. We say that the box B = B_L(z) is (ξ, ε)-good if

sup_{x∈D} |ξ_B(x)| < ε,

where D = D_L(z). If B is not (ξ, ε)-good, we will call it (ξ, ε)-bad. Sznitman [19] obtained a precise estimate for the probability that many boxes of the same scale are (ξ, ε)-bad. For our purposes, a multi-scale version of Sznitman's result is necessary. To formally state this new version, we will need the notion of a well-separated family F of boxes of possibly different scales. We remark that for a well-separated family F, the local fields ψ_B, B ∈ F, are independent from each other, which will be useful in the following sections for estimating the probability of certain events. Finally, we write Σ = Σ(F) := ∪_{B∈F} B. The following is a slight modification of Sznitman's result.
Lemma 3.1. There is a constant c_0 > 0 such that the following holds. For every ε > 0 there is a constant δ > 0 such that for every well-separated collection F with |F| ≤ δcap(Σ), we have

P[every B ∈ F is (ξ, ε)-bad] ≤ e^{−c_0 ε² cap(Σ)}.

Proof. It suffices to prove that for some constant c′ > 0 and every well-separated collection F′, we have

P[sup_D ξ_B ≥ ε for every B ∈ F′] ≤ e^{−c′ε² cap(Σ(F′))},   (3.2)

together with the analogous bound for inf_D ξ_B ≤ −ε. Indeed, notice that the ξ_B are centered and either F_− = {B ∈ F : inf_D ξ_B ≤ −ε} or F_+ = {B ∈ F : sup_D ξ_B ≥ ε} has capacity at least cap(Σ)/2 by the sub-additivity of the capacity. Moreover, there are 2^{|F|} ≤ 2^{δcap(Σ)} possibilities for F_±, so it is enough to take 0 < δ ≪ c′. The proof of (3.2) is essentially the same as in [19, Corollary 4.4], and we will only point out the necessary changes; the results mentioned throughout this proof are from [19]. We attach to F a suitable collection F of functions as in [19] and show that there exists a constant C = C(d) > 0 such that the analogues of (3.3) and (3.5) hold for every f, k ∈ F, where L denotes the scale of the box B attached to f and L′ the scale of the box B′ attached to k; the first inequality can be obtained by arguing as in the single-scale setting. Now given ε ∈ (0, √C], for every L ≥ 1 we pick the largest integer l such that l ≤ 7εL/√C and for each box B ∈ F of scale L, we partition D into disjoint boxes, each having ℓ∞-diameter at most l. If f, k ∈ F are such that for every B ∈ F, f(B) and k(B) lie in the same box of the corresponding partition, then it follows from (3.5) that, arguing as on page 1820 of [19], we obtain (3.4). We can now use the Borell-TIS inequality as in the proof of [19, Corollary 4.4]; with (3.4) in hand, the desired result follows once we choose δ so that C√δ ≤ ε/2 and 0 < δ ≪ c′.
Notice that by applying Lemma 3.1 to a single box B ∈ LZ^d and recalling (2.5), we have

P[B is (ξ, ε)-bad] ≤ e^{−cε²L^{d−2}}   (3.6)

for some constant c = c(d) > 0.

Bad boxes and multi-scale coarse graining
Our aim now is to set up the abstract multi-scale coarse graining scheme used to prove Theorem 1.3. This is encapsulated in Theorem 3.3 below, which is purely deterministic and whose proof is postponed to Section 4. In the next subsections, we deduce the desired exponential decay of capacity in the subcritical and supercritical phases separately by applying Theorem 3.3 with well chosen notions of "bad" and "very-bad" events. Let us start by giving some definitions and introducing some notation. For every L ≥ 1, let C_o(h, L) denote the set of boxes in LZ^d intersecting C_o(h), and let ∂C_o(h, L) denote its outer box-boundary. We will introduce a general framework that will allow us to study both the supercritical and the subcritical regimes. To this end, we consider a family of "bad events" indexed by boxes B = B_L(z) ∈ LZ^d, satisfying the admissibility properties of Definition 3.2. For our purposes, both E^b_B and E^vb_B will be chosen to be unlikely events, with E^vb_B in particular being extremely unlikely, in the sense that its probability decays exponentially in cap(B). Item (ii) can be thought of as an initiation property that ensures that the union of the boxes B ∈ C_o(h, L) for which E_B happens has capacity at least cap(C_o(h)). Item (iii) can be thought of as a propagation property. Ideally, we would like the event E^vb_B to happen for most boxes in ∂C_o(h, L). If this is not the case, then we have many boxes B ∈ ∂C_o(h, L) for which E^b_B happens. In this case, item (iii) ensures that for many boxes in C_o(h, L) that are adjacent to ∂C_o(h, L), the event E_B happens. Continuing in this way we explore more and more boxes for which E_B happens.
With such events in hand, we will associate to C_o(h) an interface I such that for each box B of I, E_B happens. An interface I is a finite collection of disjoint boxes of L_1Z^d, L_2Z^d, ..., L_kZ^d for an integer k > 0 and scales 1 ≤ L_1 < L_2 < ... < L_k. For most of the interfaces I we will consider, o will be contained in a bounded component of Z^d \ I (hence the term "interface"), but it will be more convenient for us not to add this condition to the definition. When E_B happens for each box B of I, we will say that I occurs. There are two subsets of I that play an important role. The first one, denoted B, is the set of boxes B ∈ I such that E^b_B happens. The second one, denoted VB, is the set of all boxes B ∈ I such that E^vb_B happens. In the following theorem, we construct a family of interfaces I_N of small cardinality such that whenever A_N(h) happens, some interface I ∈ I_N occurs for which either VB has large capacity or B has large cardinality.
Theorem 3.3. Let (E_B)_{B∈LZ^d}, L ≥ 1, be a family of events which are admissible for each L ≥ 1. For every ρ > 0 and δ > 0, there exist constants t = t(d, ρ, δ) > 0, L_0 = L_0(d, ρ, δ) ≥ 1 and N_0 = N_0(d, ρ, δ) ≥ 1 such that for every N ≥ N_0 there is a family I_N of interfaces with |I_N| ≤ e^{δtN} such that the following holds: whenever A_N(h) happens, some I ∈ I_N occurs, and one of the following holds:

(c1) cap(∪_{B∈VB} B) ≥ N/4d, or

(c2) B ⊂ L_1Z^d and |B|L_1^ρ ≥ tN, where L_1 is the smallest scale of I.

We stress that the constants t, L_0 and N_0 in the above theorem depend only on d, ρ and δ and not on the choice of E^b_B and E^vb_B. We also remark that for our applications, E_B will be chosen in such a way that its probability decays stretched exponentially with exponent the constant ρ appearing in the statement of the theorem.

Exponential decay in the supercritical regime
We will split the proof of Theorem 1.3 into two parts, depending on whether h belongs to the supercritical or the subcritical regime. We will first handle the supercritical regime. Our aim is to choose E^b_B and E^vb_B appropriately and then apply Theorem 3.3. To this end, consider an integer L > 0 and a box B ∈ LZ^d. We say that a cluster in B is dense if it intersects at least (3/4)M^d of the boxes contained in B and has diameter at least L/5; the latter follows immediately for any connected subgraph that intersects at least (3/4)M^d boxes contained in B, provided that L is large enough, but we will not need this fact.
Fix h < h_* and ε_0 := (h_* − h)/2. The events appearing in (b2), (vb1) and (vb2) are unlikely to happen. However, it will be convenient for us to work with events that, in addition to being unlikely, are independent for boxes that are far away from each other. For this reason, we will now introduce certain local bad and very-bad events. In what follows, given a box B = B_L(z), U stands for U_L(z) and D stands for D_L(z).
We say that B is (ψ, h, ε)-good if for every function g : D → R which is harmonic in D and satisfies |g(x)| < ε for all x ∈ D, the following happens: • {ψ_B + g ≥ h} ∩ U contains a cluster of diameter at least L/5. If B is not (ψ, h, ε)-good, we will call it (ψ, h, ε)-bad. It is not hard to see that if E^b_B happens and L ≤ diam(C_o(h)), then B is (ψ, h, ε_0)-bad (with the choice g = ξ_B), since C_o(h) ∩ U contains a cluster of diameter at least L/5. The following result will be proved in Section 5.

Proposition 3.4. For every h < h_* and ε > 0, there exist ρ = ρ(d) ∈ (0, 1) and c = c(d, h, ε) > 0 such that for every L ≥ 1 and every box B ∈ LZ^d, P[B is (ψ, h, ε)-bad] ≤ e^{−cL^ρ}.
We now define another local event. We say that B is (ψ, h, ε)-very-good if for every function g : D → R which is harmonic in D and satisfies |g(x)| < ε for all x ∈ D, the following happen: • for every B′ which is either B or some neighbour of B, {ψ_B + g ≥ h} ∩ B′ contains a dense cluster, • for every neighbour B′ of B and every pair of dense clusters of {ψ_B + g ≥ h} ∩ B and {ψ_B + g ≥ h} ∩ B′, respectively, there is a path in {ψ_B + g ≥ h} ∩ D visiting both dense clusters.
If B is not (ψ, h, ε)-very-good, we will call it (ψ, h, ε)-very-bad. It is not hard to see that if E^vb_B happens and B is (ξ, ε_0)-good, then B is (ψ, h, ε_0)-very-bad. The following result will be proved in Section 6.

Proposition 3.5. For every h < h_* and ε > 0, there exists c = c(d, h, ε) > 0 such that for every L large enough and every box B ∈ LZ^d, P[B is (ψ, h, ε)-very-bad] ≤ e^{−c·cap(B)}.
Assuming Theorem 3.3 and Propositions 3.4 and 3.5, we are now in position to prove Theorem 1.3 for h in the supercritical regime.
Proof of Theorem 1.3 for h < h_*. Consider some h̃ ≤ h < h_* and let ρ > 0 be the exponent of Proposition 3.4. Consider also a constant δ > 0, which will be chosen along the way to be sufficiently small. We start by applying Theorem 3.3 with the choice of events E^b_B and E^vb_B mentioned above to obtain a family I_N as in the statement of the theorem. For each I ∈ I_N, we will prove an exponential upper bound for the probability that I occurs satisfying either (c1) or (c2), and then apply a union bound over all I ∈ I_N. First, let us fix I ∈ I_N and a pair of subsets I_1, I_2 ⊂ I such that I_1 satisfies cap(∪_{B∈I_1} B) ≥ N/4d and I_2 satisfies I_2 ⊂ L_1Z^d (where L_1 is the smallest scale of I) and |I_2|L_1^ρ ≥ tN. We will bound separately the probability that VB = I_1 and B = I_2, starting with the latter. Let I_2′ be a well-separated subset of I_2 that is maximal with respect to this property. By the maximality of I_2′, every B_{L_1}(z) ∈ I_2 lies within bounded ℓ∞-distance (in units of L_1) of some box of I_2′, whence |I_2′| ≥ c_1|I_2| for some constant c_1 = c_1(d) > 0. Notice that the local fields ψ_B, B ∈ I_2′, are independent of each other (since I_2′ is well-separated) and each box in I_2′ is (ψ, h, ε_0)-bad (since the boxes of I have scale smaller than the diameter of C_o(h)). Therefore, by Proposition 3.4 and independence, we have

P[I occurs with B = I_2] ≤ e^{−c|I_2′|L_1^ρ} ≤ e^{−cc_1 tN}.

We shall now bound the probability that VB = I_1. First, we restrict I_1 to a well-separated subset with capacity of order N. Let L_1 < L_2 < ... < L_k be the scales of I_1. Let I_1^k be a subset of I_1 ∩ L_kZ^d which is well-separated and maximal with respect to this property. Proceeding inductively downwards, for each i ∈ {k − 1, ..., 1}, let I_1^i be a subset of I_1 ∩ L_iZ^d such that ∪_{j=i}^k I_1^j is well-separated and I_1^i is maximal with respect to this property. Finally, let I_1′ = ∪_{j=1}^k I_1^j. It follows from the maximality of the construction that for every B ∈ I_1 of scale L_i there exists B′ ∈ I_1′ of scale L_j ≥ L_i such that the ℓ∞-distance between B and B′ is at most 201L_j.
In this case, for every x ∈ B we have that P_x[H_{B′} < ∞] ≥ q, where q = q(d) > 0 is a constant depending only on the dimension d (see e.g. [13, Proposition 2.2.2]). It then follows from the sweeping identity (2.6) that cap(Σ(I′_1)) ≥ q·cap(Σ(I_1)) ≥ qN/4d.
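The capacity comparison obtained from the sweeping identity can be sketched as follows (a standard argument; the notation e_K for the equilibrium measure of a finite set K, and the identity P_x[H_K < ∞] = Σ_z g(x, z) e_K(z), are ours and are not spelled out in this excerpt):

```latex
% If P_x[H_{\Sigma(I_1')} < \infty] \ge q for every x \in \Sigma(I_1), then, using the
% symmetry of the Green function g:
q \,\operatorname{cap}(\Sigma(I_1))
\;\le\; \sum_{x} e_{\Sigma(I_1)}(x)\, P_x\!\left[ H_{\Sigma(I_1')} < \infty \right]
\;=\; \sum_{z} e_{\Sigma(I_1')}(z)\, P_z\!\left[ H_{\Sigma(I_1)} < \infty \right]
\;\le\; \sum_{z} e_{\Sigma(I_1')}(z)
\;=\; \operatorname{cap}(\Sigma(I_1')).
```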
Applying Lemma 3.1 and a union bound over all possibilities for Ξ(I′_1), we obtain that P[I occurs with VB = I_1 and cap(Ξ(I′_1)) ≥ qN/8d] ≤ 2^{δtN} exp{−cqN/8d}, where the union bound ranges over all possible J such that cap(J) ≥ qN/8d. Recall that |J| ≤ |I′_1| ≤ |I| ≤ δtN, so that we can indeed guarantee that J satisfies the hypothesis of Lemma 3.1 by decreasing the value of δ if necessary. The term 2^{δtN} above accounts for the number of possible J. For the second case, notice that, by the sub-additivity of the capacity and (2.5), if cap(Ξ(I′_1)) < qN/8d then cap(Ψ(I′_1)) ≥ qN/8d. Hence by Proposition 3.5, the probability that I occurs with VB = I_1 and cap(Ψ(I′_1)) ≥ qN/8d is also exponentially small in N. Since |I_N| ≤ e^{δtN}, applying a union bound over all I ∈ I_N and all possible I_1, I_2 ⊂ I, and decreasing δ even further if necessary, we obtain the desired exponential bound for some constant c > 0 depending only on h′ and d, as desired.
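The final union bound can be sketched as follows; the rates c_1, c_2 are placeholders for the exponents produced by the two cases above, not constants named in the text:

```latex
% Assembling the two case bounds over the at most e^{\delta t N} interfaces I and the
% at most 4^{|I|} \le 4^{\delta t N} choices of the pair of subsets I_1, I_2 \subset I:
\sum_{I \in \mathcal{I}_N} \;\sum_{I_1, I_2 \subset I}
P\left[\, I \text{ occurs with } VB = I_1 \text{ and } B = I_2 \,\right]
\;\le\; e^{\delta t N} \cdot 4^{\delta t N} \cdot
\left( e^{-c_1 t N} + e^{-c_2 N} \right)
\;\le\; e^{-c t N},
% for \delta small enough compared to c_1 and c_2, using t \le 1.
```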

Exponential decay in the subcritical regime
We now move on to the proof of Theorem 1.3 for h in the subcritical regime. We will implement a strategy similar to the one we used for the supercritical regime.
First, we need to choose the events E^b_B and E^vb_B suitably. Given h′ ≥ h > h_*, ε_0 = (h − h_*)/2 and a box B ∈ LZ^d, let E_B be the event that {ϕ ≥ h} ∩ U contains a cluster of diameter at least L/5, and let E^b_B and E^vb_B be defined accordingly. It is straightforward to see that this family of events is h-admissible when C_o(h) has diameter at least L, since then for every box B ∈ C_o(h, L), the event E_B happens.
Notice that when the event E^b_B happens, {ψ_B ≥ h − ε_0} ∩ U contains a cluster of diameter at least L/5. The latter happens with probability decaying stretched exponentially. Proof. This is a simple consequence of the (subcritical) sharpness of GFF percolation on Z^d (i.e. h_* = h_{**}) mentioned in the introduction. Indeed, by the main result of [6], for every h > h_*, there exist ρ = ρ(d) ∈ (0, 1) and c = c(d, h) such that (3.7) holds for every N ≥ 1. Assume that {ψ_B ≥ h} ∩ D contains a cluster of diameter at least L/5, and let ε_0 = (h − h_*)/2. Up to a probability decaying exponentially in L^{d−2}, B is (ξ, ε_0)-good by (3.6). When this happens, {ϕ ≥ h − ε_0} ∩ D contains a cluster of diameter at least L/5. The latter event has probability decaying stretched exponentially in the subcritical regime by (3.7).
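The sharpness input (3.7) from [6] is not reproduced in this excerpt; a sketch of its plausible form is the following, where the prefactor and the precise connection event are our assumptions:

```latex
% Stretched-exponential decay of subcritical one-arm probabilities:
% for every h > h_*, there exist \rho = \rho(d) \in (0,1) and c = c(d,h) > 0 such that
P\left[\, 0 \xleftrightarrow{\;\{\varphi \ge h\}\;} \partial B_N \,\right]
\;\le\; 2\, e^{-c N^{\rho}}
\qquad \text{for every } N \ge 1. \tag{3.7}
```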
Assuming Theorem 3.3 and Proposition 3.6, we are now in a position to prove Theorem 1.3 for h in the subcritical regime.
Proof of Theorem 1.3 for h > h_*. The proof is similar to that of the case h < h_* presented in Section 3.2.1. Consider some h′ ≥ h > h_*, and let ρ > 0 be the exponent of Proposition 3.6. Consider also a small enough constant δ > 0. We can apply Theorem 3.3 to obtain a family I_N satisfying the conclusion of the theorem. Fix I ∈ I_N and a pair of subsets I_1, I_2 ⊂ I as in the proof of the case h < h_*. If B = I_2 happens, then we restrict to a maximal well-separated subset I′_2 of I_2. Arguing as in the previous section and using Proposition 3.6 and independence, we deduce that P[I occurs with B = I_2] ≤ exp{−201^{−d}c_3tN}. If VB = I_1 happens, then we restrict to a well-separated subset I′_1 of I_1, defined as in the proof of the case h < h_*, for which we have cap(Σ(I′_1)) ≥ q·cap(Σ(I_1)) ≥ qN/4d. Then we apply Lemma 3.1 to conclude. A union bound over all I ∈ I_N and over all possible subsets I_1, I_2 of I gives the desired exponential bound, for some constant c > 0 depending only on h′ and d, as desired.

Multi-scale coarse graining construction
We will now proceed with the proof of Theorem 3.3. In order to prove the theorem, we need to introduce some notation. The following lemma will be used in the proof of Theorem 3.3. For simplicity, we may henceforth identify any set of boxes F with its union Σ(F) = ⋃_{B∈F} B. We can now easily deduce that cap(X) ≥ cap(C_o(h)). Notice that, when we start a simple random walk from some x ∈ ∂C_o(h), one way to never visit C_o(h) again is to first visit a given neighbour y ∈ ∂_out C_o(h) and from there to never visit X. We are now ready to prove Theorem 3.3.
Proof of Theorem 3.3. Our aim is to construct an occurring multi-scale interface I for every configuration on the event A_N(h). We will construct I by starting from I(2^k) for a certain choice of 2^k and then adding boxes of smaller and smaller scales. We will divide the definition of I into segments. At each step of the first segment we will add at most N/f(N) boxes, at each step of the second segment we will add at most N/f(f(N)) boxes, and so on, where f(N) = log_b(N) and b = 3(d − 2)/ρ. The process will stop once we reach a scale of size roughly L, or if it happens that (c1) or (c2) is satisfied before we reach that scale. It suffices to prove the theorem for ρ ≤ 1. Consider an integer L ≥ 1 and let N ≥ N_0, where N_0 is a large enough constant that will be determined along the way. Assume that the event A_N(h) happens, and let k_{1,1} be the largest k such that |I_{1,1}(2^k)| ≥ N/f(N), where I_{1,1}(2^k) := I(2^k). Notice that k_{1,1} is well-defined, since |I_{1,1}(1)| ≥ |∂C_o(h)| ≥ cap(C_o(h)) ≥ N/2d, provided that N_0 is large enough so that f(N_0) ≥ 2d. By further increasing the value of N_0, we can assume that 2^{k_{1,1}} ≤ r := diam(C_o(h))/2, because |I_{1,1}(r)| ≤ 4^d. By definition, |I_{1,1}(L_{1,1})| < N/f(N), where L_{1,1} := 2^{k_{1,1}+1} ≤ diam(C_o(h)). On the other hand, the number of boxes of L_{1,1}Z^d that contain a box of I_{1,1}(2^{k_{1,1}}) is at least 2^{−d}|I_{1,1}(2^{k_{1,1}})| ≥ 2^{−d}N/f(N).
Thus we can add enough boxes of I_{1,1}(2^{k_{1,1}}) to I_{1,1}(L_{1,1}) to obtain an interface I_{1,1} with |I_{1,1}| ≤ N/f(N). We now take cases according to whether cap(VB_{1,1}) ≥ N/4d or not. In the first case, the process stops because (c1) is satisfied, and we let I = VB_{1,1}. In the second case, we would like to check whether (c2) is satisfied. For that purpose, we consider two cases according to whether the condition in (c2) already holds at scale L_{1,1} or not.
In the first case, we stop the first segment of our process. In the second case, we move on to the second step of the first segment. We remark that along the second and every subsequent step, we will define some integers k_{i,j}, L_{i,j} and some collections I_{i,j} of 2^{k_{i,j}}-boxes and L_{i,j}-boxes, where L_{i,j} = 2^{k_{i,j}+1}. To avoid repetition, let us mention that we will use the notation VB_{i,j}, defined analogously to VB_{1,1}. For the second step, we will require N to be large enough so that f(N) ≥ 4d. Now let k_{1,2} be the largest k such that |I_{1,2}(2^k)| ≥ N/f(N), where I_{1,2}(2^k) is the set of boxes of I(2^k) that lie in some box of B_{1,1}. Inequality (4.1) now follows from our assumption that cap(VB_{1,1}) < N/4d. Hence I_{1,2}(1), which contains ∂C_o(h) ∩ B_{1,1}, has size at least N/4d. Moreover, by our assumption that the condition in (c2) fails at scale L_{1,1}, we obtain that k_{1,2} < k_{1,1}. This proves that k_{1,2} is well-defined.
Let now L_{1,2} := 2^{k_{1,2}+1}. Arguing as in the first step, we obtain an interface I_{1,2} with |I_{1,2}| ≤ N/f(N), obtained from I_{1,2}(L_{1,2}) by adding enough 2^{k_{1,2}}-boxes of I_{1,2}(2^{k_{1,2}}) that are disjoint from the boxes of I_{1,2}(L_{1,2}). At this point, we take cases according to whether cap(VB_{1,2}) ≥ N/4d or not. As before, if the first case happens, the process stops and we define I = ⋃_{j=1}^{2} VB_{1,j}, while if the second case happens, then we check whether the condition in (c2) holds at scale L_{1,2}. Similarly, if the first case happens, we end the first segment. If the second case happens, then we continue to the third step. At this point, we need a generalisation of Lemma 4.1 which will ensure that |I_{1,3}(1)| ≥ N/4d and, more generally, that |I_{i,j}(1)| ≥ N/4d for the subsequent steps. This is proved in Lemma 4.3.
Continuing in this manner, we obtain a sequence of interfaces (I_{1,j})_{j≥1}, where I_{1,j} is contained in B_{1,j−1}. We claim that eventually, for some integer j_1 ≥ 1, one of the two stopping conditions above is satisfied. Indeed, if the first inequality does not hold, then by Lemma 4.3 and the sub-additivity of the capacity we have cap(I_{1,j}) > N/4d. On the other hand, by (2.5) and the sub-additivity of the capacity again, cap(I_{1,j}) is at most a constant multiple of |I_{1,j}| L_{1,j}^{d−2}. We thus conclude the lower bound (4.2) on L_{1,j}. However, it follows from the definitions that (L_{1,j})_{j≥1} is a strictly decreasing sequence, and so (4.2) cannot hold for arbitrarily large j.
We end the first segment as soon as we reach a step j_1 as above. We shall now decide whether we start the second segment or not. If it happens that (4.3) is satisfied, then our process stops. In the first case, we simply set I = ⋃_{j=1}^{j_1} VB_{1,j}. In the second case though, we set I = B(L_{1,j_1}) ∩ I_{1,j_1} if |B(L_{1,j_1}) ∩ I_{1,j_1}| ≥ |B(2^{k_{1,j_1}}) ∩ I_{1,j_1}|, and I = B(2^{k_{1,j_1}}) ∩ I_{1,j_1} otherwise. In other words, I contains only one of the sets B(L_{1,j_1}) ∩ I_{1,j_1} and B(2^{k_{1,j_1}}) ∩ I_{1,j_1}, namely the one of larger size. If (4.3) is not satisfied, then we move on to the second segment.
Arguing in a similar manner, we obtain a sequence of occurring interfaces (I_{2,j})_{j≥1} such that |I_{2,j}| < N/f(f(N)) for all j ≥ 1, where each I_{2,j} lies in I_{1,j_1}. The segment ends when we reach a certain step j_2 at which one of the two stopping conditions is satisfied.
The process stops at the end of the second segment if the analogue of (4.3) is satisfied. In that case, we set I = ⋃_{i=1}^{2} ⋃_{j=1}^{j_i} VB_{i,j}, I = B(L_{2,j_2}) ∩ I_{2,j_2} or I = B(2^{k_{2,j_2}}) ∩ I_{2,j_2}, as appropriate.
Proceeding inductively, we define sequences of occurring interfaces (I_{1,j})_{j=1}^{j_1}, (I_{2,j})_{j=1}^{j_2}, . . . such that |I_{i,j}| < N/f^{∘i}(N) for all i and j, where f^{∘i} denotes the i-fold composition of f.
At the end of an arbitrary kth segment, we either stop as above or move on to the (k + 1)th segment. Let m be the largest integer i such that f^{∘i}(N) > M. Notice that m is well-defined for every N such that f(N) > M. If the desired conditions are not satisfied at the end of the ith segment for every i ≤ m, we move on to the (m + 1)th segment. This segment plays a special role, as we define each I_{m+1,j} in such a way that |I_{m+1,j}| ≤ N/M. At the end of the (m + 1)th segment, the process stops, and we set I = ⋃_{i=1}^{m+1} ⋃_{j=1}^{j_i} VB_{i,j}, I = B(L_{m+1,j_{m+1}}) ∩ I_{m+1,j_{m+1}} or I = B(2^{k_{m+1,j_{m+1}}}) ∩ I_{m+1,j_{m+1}}, as appropriate.
It is not hard to see that if (c1) is not satisfied, then (c2) is satisfied. Indeed, if the process stops at the end of the ith segment for some i ≤ m, then the smallest scale of I is comparable to the scale at which the segment ended (here we use the notation introduced above the statement of Theorem 3.3). Thus |B|L_k^ρ ≥ 2^{−ρ−d−1}N. On the other hand, if the process stops at the end of the (m + 1)th segment, then we can argue as in the proof of (4.2) to deduce that the smallest scale of I is at least L_{m+1,j_{m+1}}/2 ≥ L, which implies that |B|L_k^ρ ≥ tN. Since t ≤ 2^{−ρ−d−1}, the desired assertion follows.
The above construction gives us a family of interfaces I_N satisfying all the properties claimed in Theorem 3.3. The only properties that do not follow immediately from the construction are that |I_N| ≤ e^{δtN} and that |I| ≤ δtN for every I ∈ I_N. In order to prove these inequalities, we will treat each segment separately. We start with the first segment. To determine I_{1,j}, j = 1, 2, . . . , j_1, we need to first determine the sequence (L_{1,j})_{j=1}^{j_1}. Recall that by construction we have L_{1,1} ≤ diam(C_o(h)). As we mentioned above Corollary 1.4, a cluster of capacity at most N has volume (and therefore diameter) at most C_1N^3. Hence (L_{1,j})_{j=1}^{j_1} is simply a strictly decreasing sequence of powers of 2 with exponents at most log_2(C_1N^3), which in turn implies that there are at most 2^{log_2(C_1N^3)} = C_1N^3 possibilities for (L_{1,j})_{j=1}^{j_1}. Once the scales (L_{1,j})_{j=1}^{j_1} are fixed, we should bound the possibilities for I_{1,j}, 1 ≤ j ≤ j_1. Notice that for all j = 1, 2, . . . , j_1, each box of I_{1,j} is at distance at most C_1N^3 from the origin, and furthermore |I_{1,j}| ≤ N_1 := ⌈N/f(N)⌉. Hence, for each 1 ≤ j ≤ j_1, the number of possibilities for I_{1,j} given L_{1,j} is at most (N′_1 choose N_1), where N′_1 = C_2N^{3d}. Using the inequality (n choose k) ≤ (n/k)^k e^k and the monotonicity of the binomial coefficient (n choose k) for k ≤ n/2, we obtain the bound (4.5) on the number of possibilities for the first segment. By increasing C_3 if necessary, the term inside the exponential in (4.5) is also an upper bound for the number of boxes of the first segment contained in I. Moving on to the second segment, first notice that all scales (L_{2,j})_{j=1}^{j_2} are powers of 2 smaller than f(N)^{1/ρ} (recall that (4.3) does not hold). Therefore, there are at most f(N)^{1/ρ} possibilities for (L_{2,j})_{j=1}^{j_2}. Since L_{1,j_1} < f(N)^{1/ρ} and every box of the second segment is contained in I_{1,j_1}, which in turn contains at most N_1 boxes, we deduce that for every j = 1, 2, . . . , j_2, I_{2,j} contains at most N_2 := ⌈N/f(f(N))⌉ boxes. Hence, for each 1 ≤ j ≤ j_2, the number of possibilities for I_{2,j} given L_{2,j} is at most (N_1 choose N_2). Overall, the number of possibilities for the second segment admits a bound analogous to (4.5), where for the last inequality we increase the value of C_3 if necessary.
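The binomial estimate invoked in this counting step is the standard bound below, written out as a worked form; the pairs (n, k) it is applied to are those appearing in the surrounding counting, e.g. n = C_2 N^{3d} and k = ⌈N/f(N)⌉:

```latex
% Standard bound on binomial coefficients, valid for 1 \le k \le n:
\binom{n}{k} \;\le\; \frac{n^k}{k!}
\;\le\; \left(\frac{n}{k}\right)^{\!k} e^{k}
\;=\; \exp\!\left\{ k \log \frac{en}{k} \right\},
% e.g. with n = C_2 N^{3d} and k = \lceil N/f(N) \rceil this gives a bound of the
% form \exp\{ C N \log N / f(N) \} for a constant C = C(d).
```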
Arguing in the same way, we see that for the boxes of an arbitrary ith segment there is an analogous bound on the number of possibilities. Overall, we deduce the bound (4.6) on |I_N|. Furthermore, the term inside the exponential in (4.6) is an upper bound for |I|. Therefore, it remains to prove (4.7), provided that L and N are large enough (recall from (4.4) that t = L^ρ/(2^{d+1}M)). We start by bounding the (m + 1)th term. By the definition of m, we have f^{∘(m+1)}(N) ≤ M, which implies the bound (4.8). Now, let us handle the sum up to the mth term. First notice that for all i ≤ m, g_i(N) = log_b(g_{i−1}(N)), and that the relevant function of x is monotone for all x ≥ C_5. Since g_{i−1}(N) ≥ g_m(N) ≥ C_5 for all L and N that are large enough, one readily deduces a comparison between consecutive terms of the sum.
Iterating the last inequality, we obtain (4.10). Combining (4.8) and (4.10), we deduce an upper bound for the left-hand side of (4.7). Recalling the definitions of M and t, we see that t = C_6M^{−1+ρ/(d−2)}. Since by definition b = 3(d − 2)/ρ, the desired inequality (4.7) follows readily, provided that L is sufficiently large. This completes the proof.
We now prove the lemma mentioned in the proof of the above theorem. We recall that, for convenience, we identify sets of boxes with the corresponding subsets of Z^d.
Proof. As in the proof of Lemma 4.1, the desired result will follow once we show that X_{i,j} is a separating set of C_o(h). Recall the definitions of I_{i,j} and I_{i,j}(L_{i,j}). We will prove that X_{i,j} is a separating set of C_o(h) in the special case where I_{1,1} = I_{1,1}(L_{1,1}), I_{1,2} = I_{1,2}(L_{1,2}), . . . , I_{i,j} = I_{i,j}(L_{i,j}). The general case follows easily by removing (I_{1,1} \ I_{1,1}(L_{1,1})) ∪ (I_{1,2} \ I_{1,2}(L_{1,2})) ∪ . . . ∪ (I_{i,j} \ I_{i,j}(L_{i,j})) from X_{i,j}.
It is clear that X_{i,j} is a separating set of C_o(h) ∩ VB_{i,j}. We claim that every box in ∂_out B_{i,j} lies either in VB_{i,j} or in L_{i,j}Z^d \ C_o(h, L_{i,j}), (4.11) which implies that X_{i,j} is a separating set of C_o(h) ∩ B_{i,j}, by arguing as in the proof of Lemma 4.1. Indeed, for (i, j) = (1, 1), the claim follows from property (iii). Proceeding inductively, assume that the statement holds for an arbitrary (i, j). Let (k, l) be the next pair of indices, i.e. (k, l) = (i, j + 1) if j < j_i, or (k, l) = (i + 1, 1) if j = j_i. Clearly, every box in ∂_out B_{k,l} lies either in L_{k,l}Z^d \ C_o(h, L_{k,l}), in which case there is nothing to show, or in C_o(h, L_{k,l}). So let us consider a box B ∈ ∂_out B_{k,l} ∩ C_o(h, L_{k,l}). Then B has a neighbour B′ ∈ B_{k,l} ⊂ B_{i,j}, which implies that B is contained entirely in B_{i,j} or in ∂_out B_{i,j}. If B is contained in B_{i,j}, then B ∈ VB_{k,l}, because of our assumption that B ∈ ∂_out B_{k,l} ∩ C_o(h, L_{k,l}). Let us now assume that B is contained in some box B′′ ∈ ∂_out B_{i,j}. It follows from our inductive hypothesis that B′′ lies either in VB_{i,j} or in L_{i,j}Z^d \ C_o(h, L_{i,j}). This proves the inductive statement, and the claim follows. Then we claim that, for every component S of Z_{i,j}, the outer boundary ∂_out S is contained in VB_{i,j}, which implies that VB_{i,j} is a separating set of C_o(h) \ Y_{i,j}. We will prove the claim inductively. For (i, j) = (1, 1), this follows from the proof of Lemma 4.1, where it is shown that for every component S of C_o(h, L_{1,1}) \ I(L_{1,1}), we have ∂_out S ⊂ VB(L_{1,1}).
Assume that the statement holds for some (i, j). We will prove it for the next pair of indices (k, l). Let S be a component of Z_{k,l}. Although S ⊂ Z_{k,l}, it is possible that some box of S is contained in B_{i,j}. Let us assume that this is the case. Then, by the connectivity of S and (4.11), all boxes of S are contained in B_{i,j}. Notice that ∂_out S lies in C_o(h, L_{k,l}), because otherwise some box B of S lies in ∂_out C_o(h, L_{k,l}) ∩ B_{i,j}, hence B is contained in Y_{k,l}, which contradicts the definition of Z_{k,l}. From this we deduce that ∂_out S ⊂ Y_{k,l}. Moreover, no box of ∂_out S lies in B_{k,l}, because otherwise some box of S lies in I_{k,l} ⊂ Y_{k,l}, by our assumption that S is contained in B_{i,j}. Therefore, ∂_out S lies in VB_{k,l}.
Let us now assume that no boxes of S are contained in B_{i,j}. Then S lies entirely in C_o(h, L_{i,j}) \ Y_{i,j}, since Y_{k,l} contains VB_{i,j}. We can now apply the induction hypothesis to deduce that ∂_out S is contained in VB_{i,j} ⊂ VB_{k,l}. This completes the inductive proof.

Decay of badness
In this section, we will prove Proposition 3.4. We will make use of the (supercritical) sharpness of the phase transition for GFF percolation [6] (i.e. h̄ = h_*). We say that a box B = B_L(z), z ∈ LZ^d, is (ϕ, h)-good if there exists a connected component in {ϕ ≥ h} ∩ B with diameter at least L/5, and furthermore any two clusters in {ϕ ≥ h} ∩ U having diameter at least L/10 are connected to each other in {ϕ ≥ h} ∩ D. By the main result of [6], for every h′ < h_* there exist ρ = ρ(d) ∈ (0, 1) and c = c(d, h′) such that (5.1) holds for every h ≤ h′ and L ≥ 1. Our aim is to express the event that a box is (ψ, h, ε)-bad in terms of events depending on ϕ, so that we can use (5.1). For this purpose, we will make use of the following classical fact about discrete harmonic functions. For any function f : D_L(z) → R which is harmonic in D_L(z), we have |f(x) − f(y)| ≤ C′L^{−1} sup_{u∈D_L(z)} |f(u)| for neighbouring x and y in B_L(z), where C′ = C′(d) > 0 is a universal constant (see [13, Theorem 1.7.1]). We shall apply this result for f = ξ_B and B being a (ξ, ε)-good box for a certain value of ε > 0. We first need to introduce some definitions. Consider an integer N ≥ 1 and let L = N/M ≈ N^{1/(α+2)}, where α = (2d+1)^2 and M = N^{(α+1)/(α+2)}. We say that a connected subgraph C of U_N is very dense if it has diameter at least N/5 and, for every box B = B_L(z), z ∈ LZ^d, contained in B_N, C ∩ B contains a connected subgraph of diameter at least L/5.
Given 0 < ε < h_* − h, we say that strong local uniqueness happens in B_N if {ϕ ≥ h + ε} ∩ U_N contains a very dense cluster and, furthermore, for every integer k with |k| ≤ εL^α + 7dC′ε + 1, every box B = B_L(z), z ∈ LZ^d, contained in B_N is (ϕ, h + kL^{−α})-good, where C′ is the constant appearing in (5.2). We denote by NSLU(h, ε, N) the event that strong local uniqueness does not happen in B_N.
We now turn to the proof of each of the above lemmas.
Proof of Lemma 5.3. If B_N is (ξ, δ)-bad or the event Conf(h, ε + δ, N) happens, then there is nothing to prove, so let us assume that B_N is (ξ, δ)-good and Conf(h, ε + δ, N) does not happen. We need to show that NSLU(h, ε + δ, N) happens. To this end, if {ϕ ≥ h + ε + δ} ∩ U_N does not contain a very dense cluster, then NSLU(h, ε + δ, N) happens, so let us assume that {ϕ ≥ h + ε + δ} ∩ U_N does contain a very dense cluster C_1.
We claim that, for some function f : D_N → R which is harmonic in D_N and satisfies |f(x)| < ε + δ for every x ∈ D_N, the event in (5.3) happens. Indeed, B_N is (ψ, h, ε)-bad, so it follows from the decomposition of ϕ that there is a function f as above for which either {ϕ + f ≥ h} ∩ U_N does not contain a cluster of diameter at least N/5, or (5.3) happens. However, C_1 is contained in {ϕ + f ≥ h} ∩ U_N and has diameter at least N/5, which implies that (5.3) happens; in particular, we obtain a cluster C_2 as in (5.4) and (5.5). Since f is harmonic in D_N and |f(x)| < ε + δ for every x ∈ D_N, we have that |f(x) − f(y)| ≤ C′(ε + δ)/N for neighbouring x and y in B_N, by (5.2). Since D has diameter at most 7dL^2, we conclude that the variation of f over D is at most 7dC′(ε + δ)L^2/N = 7dC′(ε + δ)L^{−α}. Consider the smallest k ∈ {−⌈(ε + δ)L^α⌉, . . . , ⌈(ε + δ)L^α⌉} such that max_{u∈D} f(u) ≤ kL^{−α}. Then, by the minimality of k and the bound on the variation of f, we have min_{u∈D} f(u) ≥ rL^{−α}, where r = k − 1 − 7dC′(ε + δ). We can now deduce from (5.4) and (5.5) that ϕ ≥ h − kL^{−α} on C_2 ∩ D, (5.7) and ϕ < h − rL^{−α} on ∂_out C_2 ∩ D. (5.8) Now, as Conf(h, ε + δ, N) does not happen and (5.7) holds, for all but at most L − 1 vertices x of C_2 ∩ D we have ϕ_x ≥ h − rL^{−α}. We claim that {ϕ ≥ h − rL^{−α}} ∩ C_2 ∩ D contains a cluster of diameter at least L. Indeed, notice that C_2 ∩ D contains a connected set of diameter at least 3L^2, because the graph distance between B and ∂D is 3L^2. Consider a path γ in C_2 ∩ D connecting two vertices u and v with graph distance d(u, v) ≥ 3L^2. Then we have that d(u, v) ≤ Σ_{i=0}^{j−1} d(x_i, x_{i+1}), where x_0 = u, x_j = v, and x_1, . . . , x_{j−1} are the vertices of γ in between u and v such that h − kL^{−α} ≤ ϕ_{x_i} < h − rL^{−α}, ordered in order of appearance in γ as we move from u to v. As γ can have at most L − 1 such vertices, we have j ≤ L, and so d(x_i, x_{i+1}) ≥ 3L^2/L = 3L for some i. The subpath γ′ of γ in between x_i and x_{i+1} thus has diameter at least 3L − 2 ≥ L, and ϕ_x ≥ h − rL^{−α} for every x ∈ γ′. Consider the cluster C_3 of {ϕ ≥ h − rL^{−α}} ∩ D containing γ′. We will show that C_2 contains C_3, which proves the claim. To this end, recall that C_2 is a cluster of {ϕ + f ≥ h} ∩ D and that, since f ≥ rL^{−α} on D, we have {ϕ ≥ h − rL^{−α}} ∩ D ⊂ {ϕ + f ≥ h} ∩ D. (5.9) Since C_2 and C_3 overlap at γ′, we deduce that C_2 contains C_3. Consider a box B_L(w) ∈ LZ^d lying in B that intersects C_3.
We will show that B_L(w) is (ϕ, h − rL^{−α})-bad, which implies that NSLU(h, ε + δ, N) happens, as desired. Recalling the definition of a very dense cluster, we see that B_L(w) intersects C_1 as well. Notice that both C_3 ∩ U_L(w) and C_1 ∩ U_L(w) contain a cluster of diameter at least L/5, because both C_3 and C_1 have diameter at least L/5. On the other hand, C_3 is not connected to C_1 in {ϕ + f ≥ h} ∩ D_L(w), by (5.3) and the fact that C_3 ⊂ C_2. Using (5.9), we can deduce that C_3 is also not connected to C_1 in {ϕ ≥ h − rL^{−α}} ∩ D_L(w), which proves that B_L(w) is (ϕ, h − rL^{−α})-bad.

Proof of Lemma 5.1. Let us start by constructing a very dense cluster. By increasing the value of N, if necessary, we can assume that for every B = B_L(z) ∈ LZ^d contained in B_N, D = D_L(z) is contained in U_N. It is not hard to see that if all boxes of LZ^d that are contained in B_N are (ϕ, h + ε)-good, then {ϕ ≥ h + ε} ∩ U_N contains a cluster C such that C ∩ B contains a cluster of diameter at least L/5 for every B ∈ LZ^d lying in B_N. This is because for every pair of neighbouring boxes B and B′ contained in B_N, both {ϕ ≥ h + ε} ∩ B and {ϕ ≥ h + ε} ∩ B′ contain a cluster of diameter at least L/5, and these two clusters are connected in {ϕ ≥ h + ε} ∩ D. Increasing the value of N even further ensures that C has diameter at least N/5, i.e. it is a very dense cluster.
Proof of Lemma 5.2. We will show that the probability of Conf(h, ε, B) decays stretched exponentially for every B ∈ L^2Z^d. Then the desired result will follow from a union bound over all B ∈ L^2Z^d lying in B_N and the fact that there are polynomially many choices for B.
In order to prove the aforementioned result, consider a subset S of D of cardinality L and an integer k ∈ {−⌈εL^α⌉, . . . , ⌈εL^α⌉}. We will estimate the probability that h − kL^{−α} ≤ ϕ_x < h − rL^{−α} for all x ∈ S, and then apply a union bound over all possible S and k. Let us set h_1 := h − kL^{−α} and h_2 := h − rL^{−α}. Choose a subset S′ of S such that for every x, y ∈ S′ we have d(x, y) ≥ 2, and S′ is a maximal subset of S with respect to this property. Then |S′| ≥ L/(2d + 1). Now, conditioning on ϕ_y for y ∈ Z^d \ S′, we obtain that the probability in question is at most (h_2 − h_1)^{|S′|}. For the second equality, we used that, conditionally on all ϕ_y, y ∉ S′, the random variables ϕ_x, x ∈ S′, are pairwise independent. For the first inequality, we used that P[h_1 ≤ ϕ_x < h_2 | σ(ϕ_y, y ∈ Z^d \ S′)] ≤ h_2 − h_1, which follows from the fact that, conditionally on σ(ϕ_y, y ∈ Z^d \ S′), ϕ_x is a normal random variable with variance 1 (the value of the mean is not important), hence its probability density function is bounded by 1.
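The conditional estimate can be written out as the following chain (a sketch consistent with the quantities defined above; the tower property and the conditional independence over the 2-separated set S′ are the only inputs):

```latex
% Conditioning on the field outside S' and using conditional independence:
P\left[\, h_1 \le \varphi_x < h_2 \;\;\forall x \in S' \,\right]
\;=\; E\!\left[\, \prod_{x \in S'}
P\left[\, h_1 \le \varphi_x < h_2 \;\middle|\; \sigma(\varphi_y,\; y \in \mathbb{Z}^d \setminus S') \,\right] \right]
\;\le\; (h_2 - h_1)^{|S'|}
\;=\; \left( (k - r)\, L^{-\alpha} \right)^{|S'|},
% with |S'| \ge L/(2d+1), since the conditional density is bounded by 1.
```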
On the other hand, D contains (7L^2)^d vertices, hence there are at most (7L^2)^{dL} possible subsets of D of cardinality L. A union bound over the 2⌈εL^α⌉ + 1 possible values of k and the subsets of D of cardinality L then bounds the probability of Conf(h, ε, B) by (2⌈εL^α⌉ + 1)(7L^2)^{dL}((k − r)L^{−α})^{L/(2d+1)}. By our choice of α, this bound decays stretched exponentially in L. This completes the proof.

Decay of very-badness
In this section, we will prove Proposition 3.5. First, we need to express the event that B is (ψ, h, ε)-very-bad in terms of ϕ and ξ. We say that a box B is (ϕ, h, ε)-very-good if for every function g : D → R which is harmonic in D and satisfies |g(x)| < ε for all x ∈ D, the following happen:
• for every B′ which is either B or some neighbour of B, {ϕ + g ≥ h} ∩ B′ contains a dense cluster,
• for every neighbour B′ of B and every pair of dense clusters of {ϕ + g ≥ h} ∩ B and {ϕ + g ≥ h} ∩ B′, respectively, there is a path in {ϕ + g ≥ h} ∩ D visiting both dense clusters.
We shall now introduce another event that will be used to handle the non-uniqueness of a dense cluster. We define H(h, ε, B) to be the event that there are
• a function g : D → R which is harmonic in D and satisfies |g(x)| < ε for all x ∈ D, and
• a pair C_1, C_2 of clusters of {ϕ + g ≥ h} ∩ U of diameter at least L/5,
for which there is no path in {ϕ + g ≥ h} ∩ D connecting C_1 with C_2. It is not hard to see that if H(h, ε, B) happens and B is (ξ, δ)-good for some δ > 0, then B is (ψ, h, ε + δ)-bad.
Recall that the definition of a dense cluster involves considering the boxes of L_0Z^d that are contained in B_L. In order to construct a dense cluster, we will need to work with the columns of this collection of L_0-boxes. To define them precisely, let {e_1, e_2, . . . , e_d} be the standard basis of Z^d. Given a collection F of boxes of RZ^d for some R ≥ 1, the columns of F parallel to e_i, i ∈ {1, 2, . . . , d}, are defined as follows: for every sequence of integers (y_j)_{j≠i}, the set of boxes B_R(z) ∈ F with z = (z_1, z_2, . . . , z_d) such that z_j = y_j for every j ≠ i will be called a column of F parallel to e_i.
We will now prove Proposition 3.5.
Proof of Proposition 3.5. Notice that if B_L is (ψ, h, ε)-very-bad and (ξ, δ)-good for some δ > 0, then it is (ϕ, h, ε + δ)-very-bad. Applying this observation for δ = (h_* − h − ε)/2, and using Lemma 3.1 to handle the case that B is (ξ, δ)-bad, we see that (after redefining ε) it suffices to prove that for every h < h_* and 0 < ε < h_* − h, the probability that B_L is (ϕ, h, ε)-very-bad decays exponentially in L^{d−2}. We will first focus on the existence of a dense cluster. Recall that L_0 = L/M. Consider the boxes of L_0Z^d contained in U_L and notice that they form a partition of U_L. We will show that, when only a few columns of this partition contain a (ϕ, h + ε)-bad box, {ϕ ≥ h + ε} ∩ B′_L contains a dense cluster, where B′_L is either B_L or a neighbouring box of B_L. The latter easily implies that {ϕ + g ≥ h} ∩ B′_L contains a dense cluster for every function g : D_L → R which is harmonic in D_L and satisfies |g(x)| < ε for all x ∈ D_L. Then we will proceed to show that the probability of having many columns that contain a (ϕ, h + ε)-bad box decays exponentially in L^{d−2}.
Among the columns of the partition of U_L that are parallel to e_i, i = 1, 2, . . . , d, consider those that contain a box which is (ϕ, h + ε)-bad. We let Φ_i be the event that there are at least M^{d−1}/(10(2d − 1)!) such columns. When the event ⋃_{i=1}^{d} Φ_i does not happen, we will show that {ϕ ≥ h + ε} ∩ B′_L contains a dense cluster. To this end, since the dense cluster needs to lie in B′_L, we need to restrict to the collection of boxes B_{L_0}(z) such that D_{L_0}(z) is contained in B′_L. Let us assume that L is large enough so that this collection is non-empty. This collection forms a partition of a smaller box B′′_L that is contained in B′_L. Notice that the number of boxes in each column of B′′_L is M − 6. Let Γ be the set of boxes of the partition of B′′_L that are (ϕ, h + ε)-good. Then for every e_i, Γ contains a large proportion of the columns parallel to e_i, provided that L is large enough. One can then find a connected component F of Γ containing at least 3M^d/4 boxes, whose good boxes' clusters merge into a single cluster C of {ϕ ≥ h + ε} ∩ B′′_L. To show that C is dense, it remains to estimate its diameter. Since F contains at least 3M^d/4 boxes, it must intersect a column Col(F) of B′′_L contained entirely in Γ. Since F is a connected component of Γ, it must contain Col(F). In other words, C contains a vertex from each of the two extremal boxes of Col(F), which implies that C has diameter at least (M − 8)L_0. We have that (M − 8)L_0 = L − 8L_0 ≥ L/5, provided that L is large enough. Thus C is a dense cluster.
We will now estimate P[⋃_{i=1}^{d} Φ_i]. To this end, let i ∈ {1, 2, . . . , d} and let S be a set of M^{d−1}/(10(2d − 1)!) boxes that lie in different columns of U_L parallel to e_i. We will first count the possibilities for S, and then estimate the probability that, for a fixed S as above, all its boxes are (ϕ, h + ε)-bad.
Notice that there are A^{d−1} columns parallel to e_i, where A = 3M, and each column contains A boxes. Hence there are at most (A^{d−1} choose |S|)·A^{|S|} possibilities for S, since we can construct S by first choosing a set of columns and then picking a box from each column of this set.
Moving on to the probabilistic estimate, let S′ be a subset of S which is well-separated and maximal with respect to this property. Then it is not hard to see that |S′| ≥ 201^{−d}|S|. Let ε_0 = (h_* − h − ε)/2. We will now consider two cases: either at least |S′|/2 boxes of S′ are (ξ, ε_0)-good, or at least |S′|/2 boxes of S′ are (ξ, ε_0)-bad. In the first case, because we have assumed that all boxes of S are (ϕ, h + ε)-bad, we can deduce that at least |S′|/2 boxes of S′ are (ψ, h + ε, ε_0)-bad. Applying Proposition 3.4 and using a union bound over the subsets of S′, we obtain an upper bound decaying exponentially in L^{d−2}. On the other hand, if the second case holds, we can argue as follows. Let T be the set of boxes of S′ that are (ξ, ε_0)-bad. Applying Lemma 6.2 below, we see that cap(Σ(T)) ≥ rL^{d−2} for some constant r > 0. We shall now apply Lemma 3.1, and for this reason we need to check that |T| ≤ δrL^{d−2}, where δ is the constant of Lemma 3.1. This inequality follows from (6.1) by choosing L to be large enough. Hence a union bound over the subsets of S′ implies that P[at least |S′|/2 boxes of S′ are (ξ, ε_0)-bad] ≤ 2^{|S′|} exp{−crε_0^2L^{d−2}} ≤ exp{−c_4L^{d−2}}.
Overall, we obtain that P[⋃_{i=1}^{d} Φ_i] decays exponentially in L^{d−2}. Let B′_L be a neighbouring box of B_L. We shall now consider the event that, for some function g : D → R which is harmonic in D and satisfies |g(x)| < ε for all x ∈ D, and a pair C_1, C_2 of dense clusters of {ϕ + g ≥ h} ∩ B_L and {ϕ + g ≥ h} ∩ B′_L, respectively, there is no path in {ϕ + g ≥ h} ∩ D_L connecting C_1 to C_2. Let i ∈ {1, 2, . . . , d} be such that B′_L = B_L ± Le_i. Notice that for every L_0-box B intersecting C_j, j = 1, 2, C_j ∩ U contains a cluster of diameter at least L_0/5. Hence each column of B_L ∪ B′_L parallel to e_i that intersects both C_1 and C_2 must contain a box B ∈ L_0Z^d such that H(h, ε, B) happens, since otherwise C_1 and C_2 are connected in {ϕ + g ≥ h} ∩ D_L. We will show that many columns are intersected by both clusters. Indeed, it follows from the definition of a dense cluster that each of C_1 and C_2 intersects at least 3M^{d−1}/4 columns parallel to e_i. In particular, at least M^{d−1}/2 columns parallel to e_i are intersected by both C_1 and C_2, and all of them contain a box B ∈ L_0Z^d such that H(h, ε, B) happens.
To estimate the probability of the event that at least M^{d−1}/2 columns contain a box B ∈ L_0Z^d such that H(h, ε, B) happens, we consider two cases: either at least M^{d−1}/4 columns contain a (ξ, ε_0)-bad box, or at least M^{d−1}/4 columns contain a box B such that H(h, ε, B) happens and B is (ξ, ε_0)-good. When H(h, ε, B) happens and B is (ξ, ε_0)-good, B is (ψ, h, ε + ε_0)-bad. In both cases, we can argue as above to obtain the desired decay.
We will now prove the two lemmas mentioned above.

Proof. We will prove, by induction on the dimension, that the statement of the lemma holds for all Γ (as in the statement) and all 0 < x < 1/(2d − 1)!. For d = 2, the statement holds because any pair of vertical and horizontal columns shares a common vertex. Let us assume that it holds for some d ≥ 2. We will prove it for d + 1.

Proof. Let F_1 be the face of B_L intersecting all columns of B_L parallel to e_1, and let Γ′ be the projection of Γ to F_1. We claim that cap(Γ) ≥ t·cap(Γ′) for some constant t = t(d) > 0.
Indeed, recall the variational characterization of the capacity (2.3). Let ν′ be the probability measure supported on Γ′ such that cap(Γ′) = E(ν′)^{−1}, and define ν to be the probability measure supported on Γ such that ν(x) = ν′(x′), where x′ is the projection of x to F_1. Then cap(Γ) ≥ E(ν)^{−1}. Notice that, by projecting Γ onto F_1, the distance between its vertices decreases. Since the Green function g(x, y) is asymptotically decreasing in the distance ‖x − y‖, we have E(ν′) ≥ t(d)·E(ν), and the claim follows.
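For reference, the variational characterization (2.3) being invoked is, in its standard form, the following (a sketch; the infimum runs over probability measures supported on the set):

```latex
% Capacity as the inverse of the minimal energy of a probability measure on K:
\operatorname{cap}(K) \;=\; \Big( \inf_{\nu \,:\, \nu(K) = 1} E(\nu) \Big)^{-1},
\qquad
E(\nu) \;=\; \sum_{x, y \in K} g(x, y)\, \nu(x)\, \nu(y).
% In particular, cap(K) \ge E(\nu)^{-1} for ANY probability measure \nu on K,
% which is the direction used for the measure \nu constructed above.
```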
We will now lower bound the capacity of Γ′ by applying (2.4). To this end, notice that Γ′ contains at least rL^{d−1} vertices, and consider some vertex x ∈ Γ′. Since the number of vertices in F_1 that are at ‖·‖_∞-distance k from x is of order k^{d−2}, it is not hard to see that there are constants t_1 = t_1(d, r) > 0 and t_2 = t_2(d, r) > 0 such that for at least t_1L values of k ∈ {0, 1, . . . , L − 1}, Γ′ contains at least t_2k^{d−2} vertices at distance k from x. The desired lower bound on cap(Γ′) now follows from (2.4).