Block factorization of the relative entropy via spatial mixing

We consider spin systems in the $d$-dimensional lattice $Z^d$ satisfying the so-called strong spatial mixing condition. We show that the relative entropy functional of the corresponding Gibbs measure satisfies a family of inequalities which control the entropy on a given region $V\subset Z^d$ in terms of a weighted sum of the entropies on blocks $A\subset V$ when each $A$ is given an arbitrary nonnegative weight $\alpha_A$. These inequalities generalize the well known logarithmic Sobolev inequality for the Glauber dynamics. Moreover, they provide a natural extension of the classical Shearer inequality satisfied by the Shannon entropy. Finally, they imply a family of modified logarithmic Sobolev inequalities which give quantitative control on the convergence to equilibrium of arbitrary weighted block dynamics of heat bath type.


Introduction
Functional inequalities such as the Poincaré and the logarithmic Sobolev inequality have long played a key role in the analysis of convergence to equilibrium for spin systems. For the Glauber dynamics associated to lattice Gibbs measures in the high temperature regime, rather conclusive results were obtained around thirty years ago in a series of influential papers [17,34,32,31,20,25]. Broadly speaking, the main results of these works can be summarized with the statement that for finite or compact spin space, if the spin system satisfies a spatial mixing condition, then the relative entropy functional of the Gibbs measure $\mu_V$ describing the system on any region $V \subset \mathbb{Z}^d$ satisfies an approximate tensorization of the form
$$\mathrm{Ent}_V f \le C \sum_{x\in V} \mu_V[\mathrm{Ent}_x f], \tag{1.1}$$
where $C > 0$ is a constant, $f$ is a nonnegative function, and $\mathrm{Ent}_V f$, the relative entropy of the probability measure $(f/\mu_V f)\,\mu_V$ with respect to $\mu_V$, is given by
$$\mathrm{Ent}_V f = \mu_V\big[f \log (f/\mu_V f)\big],$$
with $\mathrm{Ent}_x$ denoting $\mathrm{Ent}_{\{x\}}$, for any vertex $x \in V$. The key feature of this inequality is its dimensionless character, namely the fact that the constant $C > 0$ is independent of both the region $V$ and the boundary condition fixed in $\mathbb{Z}^d \setminus V$, which we have omitted from our notation for simplicity. The papers mentioned above formulate their results in terms of logarithmic Sobolev inequalities, but we find it natural to restate them in terms of the tensorization inequality (1.1), which seems to have a more fundamental character in our setting. In any case, if the spin space is finite, the statement (1.1) is equivalent to the standard logarithmic Sobolev inequality for the single site heat bath Markov chain, see e.g. [8,28]. The proof of these results was obtained through refined recursive techniques, which exploit the spatial mixing assumption to establish some form of factorization of the entropy functional. We refer to the surveys [23,16] for systematic expositions of these techniques.
A particularly simple and effective approach was later developed in [9] and [11], where it was independently shown that the spatial mixing condition implies a factorization estimate of the form
$$\mathrm{Ent}_V f \le (1+\varepsilon)\,\mu_V[\mathrm{Ent}_A f + \mathrm{Ent}_B f], \tag{1.2}$$
where $A, B$ are e.g. two overlapping rectangular regions in $\mathbb{Z}^d$, with $V = A \cup B$, and $\varepsilon > 0$ is a constant that can be made small provided the overlap between $A$ and $B$ is sufficiently thick. If the inequality (1.2) is available, then a relatively simple recursion leads to the desired conclusion (1.1). The spatial mixing assumed for all these results is a condition of the Dobrushin-Shlosman type [13], which can be formulated in terms of exponential decay of correlations. In the literature one finds various degrees of generality of the mixing condition, often loosely referred to as strong spatial mixing. We refer to the original papers for the precise notions of spatial mixing involved; see also Section 2.3 below for more on this matter. We point out that the discussion here is mostly concerned with the case of finite or compact spin space, in which case one can actually show that (1.1) is equivalent to a strong mixing condition [30,25]. In the case of unbounded spins the techniques and the results are somewhat different; we refer the interested reader to [35,33,7,19,29,27].
While the inequality (1.1) is well suited for the analysis of the single site heat bath Markov chain, it is not very helpful in the analysis of more general block dynamics, that is Markov chains where an entire region $A \subset V$ can be resampled at once by a single heat bath move. With that motivation in mind, in this work we address the question of the validity of a version of the inequality (1.1) where single sites $x \in V$ are replaced by arbitrary blocks $A \subset V$. More precisely, we consider the question of finding the best constant $C$ such that for all nonnegative functions $f$,
$$\gamma(\alpha)\,\mathrm{Ent}_V f \le C \sum_{A\subset V} \alpha_A\,\mu_V[\mathrm{Ent}_A f], \tag{1.3}$$
where $\alpha = \{\alpha_A, A \subset V\}$ is an arbitrary collection of nonnegative weights, and we define
$$\gamma(\alpha) = \min_{x\in V} \sum_{A:\, A\ni x} \alpha_A.$$
If (1.3) holds with the same constant $C$ for all finite regions $V \subset \mathbb{Z}^d$, for all given boundary conditions on $\mathbb{Z}^d \setminus V$, and for all choices of weights $\alpha$, we say that the spin system satisfies the block factorization of entropy (with constant $C$). This definition is inspired by the fact that in the case of infinite temperature, that is if $\mu_V$ is a product measure, (1.3) holds with $C = 1$. Indeed, in this special case it is a consequence of the well known Shearer inequality satisfied by the Shannon entropy, see [8]. These inequalities have far-reaching applications in several different settings, see e.g. [21,2,10], and it is thus very natural to investigate their validity beyond the product case.
However, as far as we know there are no significant results in the literature concerning the validity of (1.3) when µ V is not a product measure. Notice that the tensorization statement (1.1) corresponds to the special case where α A = 1 or 0 according to whether A is a single site or not. In this case, the right hand side of (1.3) has a simple additive structure, a feature that is crucially used in all existing proofs of (1.1).
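As a sanity check on the product case mentioned above, the following minimal Python sketch evaluates both sides of (1.3) with $C = 1$ for a product measure on $\{0,1\}^n$ by brute-force enumeration. The function names, the choice of binary spins, and the random test function are illustrative assumptions and do not come from the paper.

```python
import itertools
import math
import random


def entropy(values, probs):
    """Ent(f) = E[f log f] - E[f] log E[f] for positive values of f."""
    mean = sum(p * v for p, v in zip(probs, values))
    return sum(p * v * math.log(v) for p, v in zip(probs, values)) - mean * math.log(mean)


def mean_block_entropy(f, marginals, block):
    """mu[Ent_A f] for the product measure on {0,1}^n with the given site marginals."""
    n = len(marginals)
    outside = [x for x in range(n) if x not in block]
    total = 0.0
    for out_spins in itertools.product([0, 1], repeat=len(outside)):
        # Probability of the frozen configuration outside the block.
        p_out = math.prod(marginals[x][s] for x, s in zip(outside, out_spins))
        values, probs = [], []
        for in_spins in itertools.product([0, 1], repeat=len(block)):
            sigma = [0] * n
            for x, s in zip(outside, out_spins):
                sigma[x] = s
            for x, s in zip(block, in_spins):
                sigma[x] = s
            values.append(f[tuple(sigma)])
            probs.append(math.prod(marginals[x][s] for x, s in zip(block, in_spins)))
        total += p_out * entropy(values, probs)
    return total


def gamma(alpha, n):
    """gamma(alpha) = min over sites x of the total weight of blocks containing x."""
    return min(sum(a for block, a in alpha.items() if x in block) for x in range(n))


def both_sides(n, alpha, seed=0):
    """Return (gamma(alpha) * Ent_V f, sum_A alpha_A mu[Ent_A f]) for a random f > 0."""
    rng = random.Random(seed)
    marginals = []
    for _ in range(n):
        p = rng.uniform(0.1, 0.9)
        marginals.append((p, 1.0 - p))
    f = {s: rng.uniform(0.1, 5.0) for s in itertools.product([0, 1], repeat=n)}
    lhs = gamma(alpha, n) * mean_block_entropy(f, marginals, tuple(range(n)))
    rhs = sum(a * mean_block_entropy(f, marginals, block) for block, a in alpha.items())
    return lhs, rhs
```

For instance, `both_sides(3, {(0, 1): 1.0, (1, 2): 0.5, (0, 2): 2.0})` returns a pair whose first entry should not exceed the second, in line with the weighted Shearer inequality. Blocks are encoded as tuples of site indices.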
Important progress was obtained recently in [5] concerning the linearized version of (1.3). Namely, if we replace the entropy functional $\mathrm{Ent}_V f$ by the variance functional $\mathrm{Var}_V f = \mu_V[(f - \mu_V f)^2]$, we obtain the inequality
$$\gamma(\alpha)\,\mathrm{Var}_V f \le C \sum_{A\subset V} \alpha_A\,\mu_V[\mathrm{Var}_A f], \tag{1.4}$$
which we may refer to as the block factorization of variance. Notice that the inequality (1.4) provides the lower bound $\gamma(\alpha)/C$ on the spectral gap of the $\alpha$-weighted block dynamics, that is the Markov chain with Dirichlet form defined by
$$\mathcal{E}_{V,\alpha}(f,g) = \sum_{A\subset V} \alpha_A\,\mu_V[\mathrm{Cov}_A(f,g)], \tag{1.5}$$
where $\mathrm{Cov}_A(f,g)$ denotes the covariance of two functions $f, g$ with respect to $\mu_A$. This is the continuous time Markov chain where each block $A$ independently undergoes full heat bath resamplings at the arrival times of a Poisson process with rate $\alpha_A \ge 0$, see e.g. [23]. One of the main results of [5] shows that, if the system satisfies the strong spatial mixing assumption, then it must satisfy the special case of (1.4) where the weights $\alpha$ are all either zero or one, but otherwise arbitrary, and where $\gamma(\alpha)$ is replaced by the indicator $\mathbf{1}_{\gamma(\alpha)>0}$, see [5, Theorem 1.2]. The proofs in [5], however, rely crucially on coupling arguments as in [14], which do not seem to be effective in establishing the stronger statement (1.3).
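To make the $\alpha$-weighted block dynamics concrete, here is a minimal Python sketch of the chain for a toy example: the one-dimensional Ising model on a short path with free boundary. The model choice, the function names, and the brute-force conditional sampling are illustrative assumptions; the independent Poisson clocks are simulated through the standard equivalent description with a single clock of total rate $\sum_A \alpha_A$ and a block chosen with probability proportional to $\alpha_A$.

```python
import itertools
import math
import random


def ising_weight(sigma, beta):
    """Unnormalized weight exp(beta * sum of sigma_x sigma_{x+1}) on a path, free boundary."""
    return math.exp(beta * sum(sigma[x] * sigma[x + 1] for x in range(len(sigma) - 1)))


def heat_bath_update(sigma, block, beta, rng):
    """Resample the spins in `block` from their conditional law given the spins outside."""
    candidates, weights = [], []
    for spins in itertools.product([-1, 1], repeat=len(block)):
        new = list(sigma)
        for x, s in zip(block, spins):
            new[x] = s
        candidates.append(new)
        # The full weight is proportional to the conditional probability of `spins`.
        weights.append(ising_weight(new, beta))
    return rng.choices(candidates, weights=weights, k=1)[0]


def run_block_dynamics(sigma, alpha, beta, t_max, seed=0):
    """Run the alpha-weighted block dynamics up to time t_max; return (state, number of updates)."""
    rng = random.Random(seed)
    blocks = [block for block, a in alpha.items() if a > 0]
    rates = [alpha[block] for block in blocks]
    total_rate = sum(rates)
    sigma = list(sigma)
    n_updates = 0
    t = rng.expovariate(total_rate)  # first ring of the superposed Poisson clock
    while t < t_max:
        block = rng.choices(blocks, weights=rates, k=1)[0]
        sigma = heat_bath_update(sigma, block, beta, rng)
        n_updates += 1
        t += rng.expovariate(total_rate)
    return sigma, n_updates
```

A call such as `run_block_dynamics([1, 1, 1, 1], {(0, 1): 1.0, (1, 2, 3): 0.5, (3,): 2.0}, beta=0.3, t_max=5.0)` runs the chain on four sites. The enumeration inside `heat_bath_update` is exponential in the block size, so this is only meant for very small blocks.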
In this paper we establish the block factorization of entropy, namely the full statement (1.3), provided the system satisfies a strong spatial mixing assumption. For instance, it will follow that the block factorization of entropy holds throughout the whole one phase region for the ferromagnetic Ising/Potts models in two dimensions, provided V in (1.3) is a sufficiently regular set in the sense of [24], see Section 2.3.
As a corollary, we obtain estimates on the speed of convergence to equilibrium of any block dynamics. Indeed, Jensen's inequality shows that, for any $A \subset V \subset \mathbb{Z}^d$,
$$\mu_V[\mathrm{Ent}_A f] \le \mu_V[\mathrm{Cov}_A(f, \log f)],$$
and therefore (1.3) implies the following modified logarithmic Sobolev inequality for any $\alpha$-weighted block dynamics:
$$\gamma(\alpha)\,\mathrm{Ent}_V f \le C\,\mathcal{E}_{V,\alpha}(f, \log f). \tag{1.6}$$
In particular, the block factorization of entropy implies the exponential decay in time of the relative entropy, with rate at least $\gamma(\alpha)/C$, for any $\alpha$-weighted block dynamics. Moreover, if the spin space is finite the bound (1.6) implies the upper bound where $|V|$ is the cardinality of the set $V$, $D$ is some new absolute constant and $T_{\mathrm{mix}}(V, \alpha)$ denotes the total variation mixing time of the $\alpha$-weighted block dynamics. We refer e.g. to [12,6] for the standard background on these implications. If the spin space is finite it is also possible to use (1.3) to derive a standard logarithmic Sobolev inequality for the $\alpha$-weighted block dynamics in the form with the constant where $D$ is an absolute constant and $\mu_{A,*}$ is the minimum value attained by the probability measure $\mu_A$, minimized over the choice of the implicit boundary condition in $\mathbb{Z}^d \setminus A$.

We conclude this introduction with a brief discussion of the main ideas involved in the proof of our main result (1.3). The proof starts with an observation already put forward in [5] for the case of the spectral gap, which allows us to reduce the general factorization problem to the problem of factorization with two special blocks only: the even sites and the odd sites. The latter is then analyzed via a recursion similar to that employed in Cesi's proof of (1.1), see [9]. As mentioned above, the main obstacle in implementing the recursion here is the lack of an additive structure, which generates potentially large error terms when trying to restore a block from smaller components.
To overcome this difficulty we develop a two-stage recursion, which combines a version of the two-block factorization estimate (1.2) together with a decomposition of the entropy which allows us to smear out the errors coming from the restoration of large blocks, see Theorem 4.6. A further crucial ingredient in the proof is a new tensorization lemma which we believe to be of independent interest, see Lemma 3.2 below.
The plan of the paper is as follows. In Section 2 we describe the setup and the main results. In Section 3 we develop some key tools needed for the proof. In Section 4 we prove the block factorization estimate.

Setup and main results
2.1. The spin system. The underlying graph is the $d$-dimensional integer lattice $\mathbb{Z}^d$, with vertices $x = (x_1, \dots, x_d)$, and edges $E$ defined as unordered pairs $xy$ of vertices $x$ and $y$ such that $\sum_{i=1}^d |x_i - y_i| = 1$. We call $d(\cdot,\cdot)$ the resulting graph distance. For any set of vertices $\Lambda \subset \mathbb{Z}^d$, the exterior boundary is $\partial\Lambda = \{y \in \Lambda^c : d(y, \Lambda) = 1\}$, where $\Lambda^c = \mathbb{Z}^d \setminus \Lambda$. We write $\mathbb{F}$ for the set of finite subsets $\Lambda \subset \mathbb{Z}^d$.
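The graph distance and the exterior boundary defined above are easy to compute for small regions. The following Python sketch, with illustrative function names that are not part of the paper, represents vertices of $\mathbb{Z}^d$ as integer tuples.

```python
def graph_distance(x, y):
    """Graph distance on Z^d: the sum of |x_i - y_i| over the coordinates."""
    return sum(abs(a - b) for a, b in zip(x, y))


def neighbors(x):
    """Yield the 2d nearest neighbors of the vertex x in Z^d."""
    for i in range(len(x)):
        for step in (-1, 1):
            yield x[:i] + (x[i] + step,) + x[i + 1:]


def exterior_boundary(region):
    """Return the set of vertices outside `region` at distance 1 from it."""
    region = set(region)
    return {y for x in region for y in neighbors(x) if y not in region}
```

For a single vertex in $\mathbb{Z}^2$ the exterior boundary consists of its four neighbors, while for a $2 \times 2$ square it consists of the eight vertices adjacent to its sides (the four diagonal corners are at distance 2 and are excluded).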
We take the single spin state to be an arbitrary probability space $(S, \mathscr{S}, \nu)$. Given any region $\Lambda \subset \mathbb{Z}^d$, the associated configuration space is the product space $(\Omega_\Lambda, \mathcal{F}_\Lambda) = (S^\Lambda, \mathscr{S}^\Lambda)$, whose elements are denoted by $\sigma_\Lambda = \{\sigma_x, x \in \Lambda\}$ with $\sigma_x \in S$ for all $x$. The a priori measure on $\Omega_\Lambda$ is the product measure $\nu_\Lambda = \otimes_{x\in\Lambda}\, \nu$.
Given a bounded measurable symmetric function U : S × S → R, the pair potential, and a bounded measurable function W : S → R, the single site potential, for any Λ ∈ F, and τ ∈ Ω Λ c , the Hamiltonian H τ The Gibbs measure in the region Λ ∈ F with boundary condition τ ∈ Ω Λ c is the probability measure µ τ Λ on (Ω Λ , F Λ ) defined by where Z τ Λ is the normalizing constant. For any measurable function f : Ω Λ → R we write µ τ Λ f for the expectation of f under µ τ Λ , and write µ Λ f for the measurable function Ω Λ c ∋ τ → µ τ Λ f . A fundamental feature of the family of measures {µ τ Λ , Λ ∈ F , τ ∈ Ω Λ c } is the so-called DLR property: valid for all Λ ⊂ V ∈ F, and for all bounded measurable function f : Ω V → R.

2.2. Examples and remarks. Below we list some standard examples which fit the general framework defined above and discuss possible extensions. We refer the reader to [15] for an introduction to the statistical mechanics of lattice spin systems.
2.2.1. Finite spins. When the space S is finite we take ν as the counting measure on S.
The Potts model corresponds to S = {1, . . . , q}, with q ≥ 2 a fixed integer, where the parameter β ∈ R is related to the inverse temperature of the system and the fixed vector (h 1 , . . . , h q ) ∈ R q to an external magnetic field. When β ≥ 0 the model is called ferromagnetic. When q = 2 the Potts model is called the Ising model. In the case of finite spin space, in order to include spin systems with hard constraints, we shall also allow the function U to take the value −∞. The spin system is called permissive if for every Λ ∈ F, for every τ ∈ Ω Λ c , there exists σ Λ ∈ Ω Λ with positive mass under µ τ Λ , that is such that µ τ Λ (σ Λ ) > 0. Well known examples of permissive spin systems include the hardcore model with parameter λ, for any λ > 0, and the uniform distribution over proper q-colorings, for any integer q ≥ 2d+1. The hard-core model with parameter λ corresponds to S = {0, 1}, U (1, 1) = −∞, U (1, 0) = U (0, 1) = U (0, 0) = 0, W (s) = s log(λ), while the uniform distribution over proper q-colorings corresponds to the limit β → −∞ in the Potts model. A permissive spin system is called irreducible if the single site heat bath Markov chain on Λ with boundary condition τ is irreducible for any choice of Λ ∈ F and τ ∈ Ω Λ c , see [5,Section 2]. Our main results below will apply to permissive irreducible spin systems.
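As a small illustration of a permissive system with hard constraints, the sketch below enumerates the hard-core Gibbs measure on a one-dimensional segment of $n$ sites with a boundary condition at the two neighboring sites. It uses only the familiar description of the hard-core model (weight $\lambda^{\#\text{occupied}}$ on configurations with no two adjacent occupied sites); the function name and the one-dimensional setting are illustrative assumptions.

```python
import itertools


def hardcore_gibbs(n, lam, tau_left=0, tau_right=0):
    """Hard-core measure on sites 0..n-1 of a path, given boundary spins at -1 and n."""
    weights = {}
    for sigma in itertools.product([0, 1], repeat=n):
        extended = (tau_left,) + sigma + (tau_right,)
        # Discard configurations with two adjacent occupied sites (hard constraint).
        if any(extended[i] == 1 and extended[i + 1] == 1 for i in range(n + 1)):
            continue
        weights[sigma] = lam ** sum(sigma)
    z = sum(weights.values())  # normalizing constant; positive since all-empty is allowed
    return {sigma: w / z for sigma, w in weights.items()}
```

The all-empty configuration is compatible with every boundary condition, which is one way to see that the model is permissive. For example, `hardcore_gibbs(2, 1.0)` is uniform over the three allowed configurations of two sites.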

2.2.2. Continuous compact spins. Other classical examples are obtained when $S$ is a compact subset of $\mathbb{R}^n$ and $\nu$ is the uniform distribution over $S$. The $O(n)$ model, for $n \ge 2$, corresponds to the case where $S$ is the unit sphere in $\mathbb{R}^n$, $\beta \in \mathbb{R}$, for some fixed vector $v \in S$, with $\langle\cdot,\cdot\rangle$ denoting the standard inner product in $\mathbb{R}^n$.

2.2.3. Unbounded spins. The setup introduced above includes unbounded (continuous or discrete) spins. When $S = \mathbb{Z}_+$, for instance, it covers the particle systems considered in [11]. It should be clear, however, that the boundedness assumption on the interaction $U$ rules out many interesting models in the unbounded setting.

2.2.4. Extensions. Concerning possible extensions of our main results to more general settings, we remark that the definitions given above can be extended to include spatially non-homogeneous models, with pair potentials $U$ and site potentials $W$ replaced by edge dependent functions $U_{xy}$ and site dependent functions $W_x$ respectively. It is not difficult to check that all results in this paper can be extended to include these cases provided that all the estimates involved in our assumptions are uniform with respect to the new potentials. Finally, we remark that our setup is restricted to the case of nearest neighbor interactions, and the extension of our main results to more general finite range spin systems is not immediate. Indeed, our proof makes explicit use of the nearest neighbor structure at various places. We believe, however, that a similar approach can be used, provided the decomposition into even and odd sites used in our proof is replaced by more general tilings such as the ones used in [5].
2.3. Spatial mixing. The notion of spatial mixing to be considered belongs to the family of strong spatial mixing conditions. In the case of finite spins it is one of many equivalent conditions introduced by Dobrushin and Shlosman [13] to characterize the so-called complete analyticity regime. The precise formulation we give here coincides with the one adopted in Cesi's paper [9]. For any $\Delta \subset \Lambda \in \mathbb{F}$ we call $\mu^\tau_{\Lambda,\Delta}$ the marginal of $\mu^\tau_\Lambda$ on $\Omega_\Delta$. A version of the Radon-Nikodym density of $\mu^\tau_{\Lambda,\Delta}$ with respect to $\nu_\Delta$ is given by the function

Definition 2.1. Given constants $K, a > 0$, and $\Lambda \in \mathbb{F}$ we say that condition $C(\Lambda, K, a)$ holds if for any $\Delta \subset \Lambda$, for all $x \in \partial\Lambda$: where $\tau, \tau' \in \Omega_{\Lambda^c}$ are such that $\tau_y = \tau'_y$ for all $y \neq x$, and $\|\cdot\|_\infty$ denotes the $L^\infty$ norm. We say that the spin system satisfies $SM(K, a)$ if $C(\Lambda, K, a)$ holds for all $\Lambda \in \mathbb{F}$.
As emphasized in [24] it is often important to consider a relaxed spatial mixing condition that requires C(Λ, K, a) to hold only for all sufficiently "fat" sets Λ. The latter is defined as follows.
For systems without hard constraints it is well known that SM (K, a), for some K, a, is always satisfied in dimension one, and that for any dimension d > 1 it holds under the assumption of suitably high temperature, see e.g. [23]. It is important to note that the validity of both SM (K, a) and SM L (K, a) can be ensured by checking finite size conditions only [22].
We recall that SM (K, a) can be strictly stronger than SM L (K, a). For instance, as a consequence of results in [26,1,3] it is known that the two-dimensional ferromagnetic Potts model satisfies SM L (K, a), for some K, a > 0 and L ∈ N, throughout the whole uniqueness region, while SM (K, a) cannot hold in this generality.
Finally, we note that C(Λ, K, a) is too strong a requirement in the case of systems with hard constraints, since µ τ ′ Λ,∆ may be not absolutely continuous with respect to µ τ Λ,∆ . However, since (2.2) will only be relevant if d(x, ∆) is sufficiently large, in order to have a meaningful assumption for permissive spin systems with hard constraints, we may rephrase the condition SM L (K, a) by requiring, for all Λ ∈ F (L) , that (2.2) holds for all ∆ ⊂ Λ and x ∈ ∂Λ such that d(x, ∆) ≥ L/2.

2.4. Main results. We first recall some standard notation. For any $V \in \mathbb{F}$, $\tau \in \Omega_{V^c}$, and $f$ : where $\gamma(\alpha) = \min_{x\in V} \sum_{A:\, A\ni x} \alpha_A$. If instead the spin system satisfies $SM_L(K, a)$ for some constants $K, a > 0$, $L \in \mathbb{N}$, then the conclusion (2.3) continues to hold, provided we require that $V \in \mathbb{F}^{(L)}$.
As we mentioned in Section 1, Theorem 2.3 has the following immediate corollary for the α-weighted block dynamics defined by (1.5). Below, E τ V,α (f, g) denotes the Dirichlet form (1.5) evaluated at a given boundary condition τ ∈ Ω V c .
Corollary 2.4. If the spin system satisfies SM (K, a) for some constants K, a > 0, then the following modified logarithmic Sobolev inequalities hold: Finally, all statements above continue to hold if we only assume SM L (K, a) for some constants K, a > 0 and L ∈ N, provided we restrict to V ∈ F (L) .

Some key tools
In this section we collect some key general facts that do not depend on the spatial mixing assumption. We start by recalling some standard decompositions of the entropy. Next, we prove a new general tensorization lemma. Finally, we revisit the two-block factorization (1.2). Some remarks on the notation are in order. We fix a region V ∈ F and a boundary condition τ ∈ Ω V c . To avoid heavy notation, we often omit explicit reference to V, τ . In particular, whenever possible we shall use the following shorthand notation Moreover, whenever we write µ Λ or Ent Λ for some Λ ⊂ V , we assume that the implicit boundary condition outside Λ has been fixed, and it agrees with τ outside of V .
Unless otherwise stated, f will always denote a nonnegative measurable function such that f log + f ∈ L 1 (µ). To avoid repetitions, we simply write f ≥ 0 throughout. As a convention, we set µ ∅ f = f and Ent ∅ f = 0.
3.1. Preliminaries. We first recall a standard lemma that will be repeatedly used.
More generally, for any
The proof is complete once we show that for each j, For each j, k fixed, µ Λ k,j is a product of µ A i,j , i = 1, . . . , k. Hence, Therefore, (3.5) follows if we show that all j, k fixed: To prove (3.6), notice that where the second identity follows from the product structure where the inequality follows from the variational principle valid for any region U , any boundary condition on U c , and any function g ≥ 0.
Here is an example to keep in mind, with $n$ arbitrary and $m = 2$. Let $\{R_1, \dots, R_n\}$ denote a collection of subsets $R_i \in \mathbb{F}$ with $d(R_i, R_j) > 1$ for all $i \neq j$. Let $A_{i,1} = ER_i$ be the even sites in $R_i$ and $A_{i,2} = OR_i$ be the odd sites in $R_i$, where a vertex $x \in \mathbb{Z}^d$ is even or odd according to the parity of $\sum_{i=1}^d x_i$. Lemma 3.2 says that if we can factorize the even and odd sites on each $R_i$ with some constant $s_i$, then we can also factorize, with the constant $\max_i s_i$, the even and odd sites on all of $\Lambda = \cup_i R_i$. In this example, one has $A_{i,j} \cap A_{i,k} = \emptyset$ if $k \neq j$, so in particular $C_j \cap C_k = \emptyset$ for $k \neq j$, but it is interesting to note that this need not be the case in Lemma 3.2; that is, each "row" $R_i$ is allowed to be decomposed into arbitrary, possibly overlapping subsets $A_{i,j}$, $j = 1, \dots, m$. We refer to Remark 3.5 for useful applications of the latter situation.
3.3. Two block factorizations. We shall need the following versions of an inequality of Cesi [9]. Lemma 3.3. Take A, B ∈ F and V = A ∪ B. Suppose that for some ε ∈ (0, 1): for all functions g ∈ L 1 (µ). Then, for all functions f ≥ 0, Proof. The inequality (3.9) coincides with [9, Eq. (2.10)]. To prove (3.10) we use essentially the same argument. As in the proof of (3.9) we may restrict to the case where f is bounded, and bounded away from zero. Then Cesi's inequality [9, Eq. (3. 2)] says that the assumption (3.8) implies for all f ≥ 0, where θ(ε) = 84ε(1 − ε) −2 . Therefore, the claim (3.10) follows from (3.11) applied with µ A f in place of f . with the same error ε ∈ (0, 1), but one has to take the error proportional to n in order to have (3.8)

Proof of the main results
We first reduce the general block factorization problem to the factorization into even and odd sites only.

4.1. Reduction to even and odd blocks. We partition the vertices of $\mathbb{Z}^d$ into even sites and odd sites, where $x$ is even if $\sum_{i=1}^d x_i$ is an even integer, and $x$ is odd if $\sum_{i=1}^d x_i$ is an odd integer. Given a set of vertices $V \in \mathbb{F}$ we write $EV$ for the set of even vertices $x \in V$ and $OV$ for the set of odd vertices $x \in V$. Whenever possible we simply write $E$ for $EV$ and $O$ for $OV$. Notice that both $\mu_E$ and $\mu_O$ are product measures.
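The product structure of $\mu_E$ and $\mu_O$ rests on the fact that, for nearest neighbor interactions, no two even sites (and no two odd sites) are adjacent, so the even spins are conditionally independent once the odd spins are frozen, and vice versa. The short Python sketch below, with illustrative function names, splits a region by parity and checks the non-adjacency on a small box.

```python
def is_even(x):
    """A vertex of Z^d is even if the sum of its coordinates is even."""
    return sum(x) % 2 == 0


def split_even_odd(region):
    """Return the pair (EV, OV) of even and odd vertices of the region V."""
    even = {x for x in region if is_even(x)}
    odd = {x for x in region if not is_even(x)}
    return even, odd


def are_adjacent(x, y):
    """Two vertices are nearest neighbors if their graph distance equals 1."""
    return sum(abs(a - b) for a, b in zip(x, y)) == 1
```

Moving to a nearest neighbor changes exactly one coordinate by one, and hence flips the parity of the coordinate sum, which is why `are_adjacent` can never hold for two vertices of the same parity.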
The reduction to even and odd blocks can be stated as follows. As usual we assume that a region V ∈ F, and a boundary condition τ ∈ Ω V c have been fixed, and we use the shorthand notation (3.1).
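Proposition 4.1. Suppose that for some constant $C > 0$ and some function $f \ge 0$,
$$\mathrm{Ent} f \le C\,\mu[\mathrm{Ent}_E f + \mathrm{Ent}_O f].$$
Then, for the same $C$ and $f$, for all nonnegative weights $\alpha = \{\alpha_A, A \subset V\}$,
$$\gamma(\alpha)\,\mathrm{Ent} f \le C \sum_{A\subset V} \alpha_A\,\mu[\mathrm{Ent}_A f],$$
where $\gamma(\alpha) = \min_{x\in V} \sum_{A:\, A\ni x} \alpha_A$.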
Proposition 4.1 is a direct consequence of the following version of Shearer's inequality satisfied by the relative entropy functional of any product measure.
Proof. As in [8, Proposition 2.6], the inequality (4.2) follows from a weighted version of Shearer's inequality for Shannon entropy. For a proof of the latter we refer e.g. to [10, Theorem 6.2].

Proof of Proposition 4.1. Fix a choice of weights
Since $\mu_E$ is a product measure on $\Omega_E$, we may apply Lemma 4.2 with $\Lambda = E$ and weights $\alpha$ replaced by $\bar\alpha = \{\bar\alpha_U, U \subset E\}$, with $\bar\alpha_U = \sum_{A\subset V} \alpha_A \mathbf{1}_{EA=U}$. It follows that where $\gamma_E(\bar\alpha) = \min_{x\in E} \sum_{A:\, A\ni x} \alpha_A$. Similarly,

The rest of this section is concerned with the proof of the factorization into even and odd blocks. Namely, we prove the following theorem, which together with Proposition 4.1 establishes the main result, Theorem 2.3.
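The induced weights on subsets of even sites that appear in the proof of Proposition 4.1 can be computed mechanically: each block $A$ transfers its weight $\alpha_A$ to its even part $EA$. The Python sketch below uses illustrative names and represents blocks as frozensets of integer tuples; it also lets one check that, for an even site $x$, the total induced weight of the sets containing $x$ equals the total original weight of the blocks containing $x$, so that the minimum over even sites is at least $\gamma(\alpha)$.

```python
from collections import defaultdict


def is_even(x):
    """A vertex of Z^d is even if the sum of its coordinates is even."""
    return sum(x) % 2 == 0


def induced_even_weights(alpha):
    """Map weights on blocks A to weights on their even parts EA, summing collisions."""
    induced = defaultdict(float)
    for block, weight in alpha.items():
        even_part = frozenset(x for x in block if is_even(x))
        induced[even_part] += weight
    return dict(induced)


def site_weight(weights, x):
    """Total weight of the sets that contain the site x."""
    return sum(w for block, w in weights.items() if x in block)
```

Blocks with no even sites contribute weight to the empty set, which plays no role in `site_weight`. The same construction with odd parts in place of even parts handles the odd sites.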

If instead the spin system satisfies SM L (K, a) for some constants K, a > 0, L ∈ N, then the same conclusion (4.5) holds, provided we require that V ∈ F (L) .

4.2. Proof of Theorem 4.3. The overall idea is to follow a recursive strategy based on a geometric construction introduced in [4], see also [9]. However, contrary to the problems studied in [4,9], the error terms produced at each step of the iteration are too large in our setting to obtain directly the desired conclusion, see Theorem 4.6, and we will need an additional recursive argument to finish the proof, see Theorem 4.7. We first carry out the proof under the spatial mixing assumption $SM(K, a)$, and then, at the end, consider the relaxed assumption $SM_L(K, a)$.
Let δ(k) denote the largest constant δ > 0 such that holds for all V ∈ F k , τ ∈ Ω V c , and all f : Note that δ(k) ≤ 1 for any k ∈ N since if e.g. f = f (σ E ) is a function depending only on the spins at even sites then the right hand side in (4.6) is equal to On the other hand, the next lemma guarantees that it is positive for all k ∈ N.
Proof. If the spin system has no hard constraints one can use a perturbation argument from [17], see e.g. [8,Lemma 2.2] for the application to our setting. In particular, one obtains that there exists a constant C > 0 such that for all k ∈ N: In the presence of hard constraints, in the case of irreducible permissive systems one can argue as follows. It is known that any probability measure µ satisfies with µ * = min σ µ(σ), where the minimum is restricted to σ such that µ(σ) > 0, and C 0 is an absolute constant, see [12,Corollary A.4]. Here Var denotes the variance functional of µ. For a finite permissive system in a region V one has µ * ≥ e −C|V | for some C > 0 independent of V . Moreover, using the irreducibility assumption, a crude coupling argument shows that the spectral gap of the even/odd Markov chain is bounded away from zero in any fixed region V ∈ F, see [5,Lemma 5.1]. In other words, for some constant C 1 = C 1 (k) one has   Assume SM (K, a). There exists a constant k 0 ∈ N depending on K, a, d such that Theorem 4.6 can only be useful if we know that δ(k) is much larger than 1/ℓ k for k large enough, and thus it is not sufficient to prove Theorem 4.3. The next result allows us to have an independent control on δ(k) which, together with Theorem 4.6 implies the desired uniform bound of Theorem 4.3.
Theorem 4.7. Assume SM (K, a). For any ε > 0, there exists a constant k 0 ∈ N depending on K, a, d, ε, such that Theorem 4.6 and Theorem 4.7 are more than sufficient for our purpose. Indeed, using (4.10) and (4.9), taking for instance ε = 1/2, we see that

4.3. Proof of Theorem 4.6. We start with a simple decomposition that will be used in the inductive step. Recall that $E = EV$ and $O = OV$ are the even and odd sites, respectively, in the given region $V$.
Proof. The decomposition in Lemma 3.1 shows that Another application of that decomposition shows that However, the product property of µ E implies that µ EB µ EA f = µ E f , and therefore The same argument applies to the case of odd sites.
Let us give a sketch of the main steps of the proof before entering the details. Suppose that V = A ∪ B ∈ F k , and suppose that the assumption of Lemma 3.3 is satisfied. Then where we use the fact that Entµ A f ≤ Entf . Now suppose furthermore that A, B ∈ F k−1 . By definition of δ(k) we then have Therefore, using Lemma 4.8, Disregarding the second line in (4.12) would allow us to obtain a bound of the form provided that an arbitrary set V ∈ F k can be decomposed into sets A, B ∈ F k−1 as above.
We remark that if $\mu$ were a product over $A, B$ then by convexity one would have (4.13) and the same bound for odd sites. Thus in the product case the second line in (4.12) may be neglected and we recover a factorization statement which is contained already in Lemma 3.2. In the case we are interested in, however, one has $A \cap B \neq \emptyset$ and we cannot hope for a bound like (4.13). For an illustration of the problem, consider for instance the one-dimensional case, with $V = \{1, \dots, n\}$, $A = \{1, \dots, m\}$ and $B = \{m - \ell, \dots, n\}$ for some integers $0 < \ell < m < n$. Suppose that $m + 1$ is even, and suppose that $f$ only depends on $\sigma_m$. Then, once all odd sites have been frozen, $\mu_{EA} f$ is a constant, and therefore $\mathrm{Ent}_{EB}\, \mu_{EA} f = 0$. On the other hand, $\mu_A f$ depends on $\sigma_{m+1}$, since the conditional expectation $\mu_A$ depends non-trivially on $\sigma_{m+1}$, and thus we may well have $\mathrm{Ent}_{EB}\, \mu_A f \neq 0$. Therefore, the second line of (4.12) does produce a nontrivial error term. At this point a fruitful idea from [23] comes to our rescue. Namely, one can average over many possible choices of the decomposition $V = A \cup B$ and hope that the averaging lowers the size of the overall error. This strategy works very well if the error terms have an additive structure, such as in the case of [9]. Here there is no simple additive structure to exploit, and we resort to using the martingale-type decompositions from Lemma 3.1 to control the average error term by means of the global entropy $\mathrm{Ent} f$, see Lemma 4.11. This will be sufficient to obtain the recursive estimate (4.9). To implement this argument, we use a slightly different averaging procedure than in [9].
We turn to the actual proof. We start with some geometric considerations, see Figure  4.1 for a two-dimensional representation. Set r := ⌊ 1 6 ℓ k+d ⌋, and define the rectangular sets 1 2 ℓ k+d + i] , i = 0, . . . , r + 1. Suppose that V ⊂ [0, ℓ k+1 ] × · · · × [0, ℓ k+d ], and define, for i = 1, . . . , r + 1: where, as usual E = EV and O = OV denote the even and the odd sites of V respectively. Define also Lemma 4.9. Suppose that V ⊂ [0, ℓ k+1 ] × · · · × [0, ℓ k+d ], and that V / ∈ F k−1 . Referring to the above setting, for all i = 1, . . . r: independent if we condition on the spins in Γ i+1 , that is Right: A1 is the set of yellow vertices, Γ2 is the set of red vertices, and A2 is the set of yellow and red vertices together.
Proof. 1. Suppose that V \ B is empty. Then V = B and therefore, up to translation it is contained in [0, ℓ k+1 ] × · · · × [0, 2 3 ℓ k+d ]. Since 2 3 ℓ k+d = ℓ k this would imply that up to permutation of the coordinates V ∈ [0, The maximal stretch of B along the d-th coordinate is at most 2 3 ℓ k+d = ℓ k and therefore up to translations and permutation of the coordinates B ∈ [0, ℓ k ]×[0, ℓ k+1 ]×· · ·× [0, ℓ k+d−1 ] which says that B ∈ F k−1 . The same argument shows that A i ⊂ R i ∩V ∈ F k−1 for all i.

If
and therefore Γ i+1 ⊂ E. Similarly, one has Γ i+1 ⊂ O if i is even. Moreover, any Z d -path inside V connecting A i with V \A i+1 must go through Γ i+1 , and therefore A i and V \A i+1 become independent if we condition on the spins in Γ i+1 .
Proof. Since i is fixed, for simplicity we write A instead of A i . Set h = µ A g. Then h depends only on σ ∆ , where ∆ = V \ A ⊂ B. We are going to use (2.2) with Λ = B. Let Ω B,τ denote the set of all spin configurations η ∈ Ω B c which agree on the set V c with the overall boundary condition τ ∈ Ω V c . For any η ∈ Ω B,τ one has Therefore, where Since ψ η B,∆ depends on η only through the spins in ∂B, the configurations η, η ′ ∈ Ω B,τ in (4.15) can be assumed to differ only in the set N B = (∂B) ∩ (V \ B). Notice that N B has at most (ℓ k+d−1 + 1) d−1 elements, and that Lemma 4.9(2). Therefore, if η(0) = η, . . . , η(m) = η ′ , denotes a sequence of configurations interpolating between η and η ′ , such that, for all j ∈ {0, . . . , m − 1}, η(j) and η(j + 1) differ only at one site The definition of SM (K, a) implies that Expanding the products in (4.15), and assuming mε 0 ≤ 1, we obtain where we use the inequality (1 + x) m ≤ 1 + emx for x > 0 and m > 0 such that mx ≤ 1. Thus, if k ≥ k 0 for some constant k 0 depending only on K, a, d, we have obtained (4.14) Proof. We prove the first inequality. The same argument proves the second one, with the role of even and odd sites exchanged. Fix i ∈ {1, . . . , r}. Notice that Let us first observe that if i is even then Indeed, in this case i + 1 is odd and Lemma 4.9(4) implies Therefore, where the inequality follows from the variational principle (3.7). This settles the case when i is even. Next, suppose that i is odd. Here the commutation relation (4.17) does not hold, since the average µ A i depends on the spins in the even sites Γ i+1 ⊂ B \ A i . Moreover, (4.16) is in general false since if e.g. f depends only on σ Γ i , then Ent EB µ EA i f = 0 while one can have Ent EB µ A i f > 0.
Define g = µ EA i f . From the decomposition in Lemma 3.1 we see that where we use the shorthand notation σ i+1 for σ Γ i+1 , Ent E (g|σ i+1 ) denotes the entropy of g with respect to the conditional measure µ E (·|σ i+1 ) = µ E\Γ i+1 . Since µ E is a product measure, where Ent i+1 = Ent Γ i+1 denotes the entropy with respect to the probability measure µ Γ i+1 . Similarly, Let us show that Indeed, Lemma 4.9(4) implies that where E(V \ A i+1 ) are the even sites in V \ A i+1 , and we have used the fact that A i and E(V \ A i+1 ) are conditionally independent given the spins σ i+1 . Therefore, reasoning as in (4.18): From (4.19)-(4.20)-(4.21)-(4.22) we conclude that, when i is odd: As in (4.22), we may write where the first inequality follows from convexity of entropy and the second from the monotonicity of A → µ[Ent A f ]. Neglecting the last term in (4.23), we have arrived at for all i odd. In view of the estimate (4.16) we may use the bound (4.24) for all i. Therefore, an application of Lemma 3.1 shows that We are now able to conclude the proof of Theorem 4.6. To prove the recursive bound (4.9) we suppose V ∈ F k \ F k−1 . Then, by translation invariance and by the invariance under coordinate permutation, we may assume that V is as in Lemma 4.9. Combining Lemma 3.3 with Lemma 4.10 we obtain, for each i = 1, . . . , r, Since A i , B ∈ F k−1 , by definition of δ(k) we obtain From Lemma 4.8 we find that the right hand side of (4.25) equals Averaging over i in (4.26) and using Lemma 4.11, In conclusion, δ(k) ≥ (1 − θ(ε k ))δ(k − 1) − 2 r , or equivalently Since r ∼ 1 4 ℓ k and δ(k − 1) ≤ 1, it follows that 1 rδ(k−1) ≫ θ(ε k ) for all k large enough, and therefore for all k ≥ k 0 (K, a, d).

4.4. Proof of Theorem 4.7
Here we shall again use a recursion on an exponential scale. However, this time we divide the set $V$ into two sets $A = \cup_i A_i$, $B = \cup_i B_i$, each being the union of a large number of well separated subsets. We use the factorization from Lemma 3.3 to reduce the problem in the set $V$ to the problem in either $A$ or $B$.
Then we use Lemma 3.2 to tensorize within $A$ and within $B$, which allows us to reduce the problem to a single region $A_i$ or $B_i$ only.

Fix a large integer $b > 1$, define $u_k = b^{k/d}$, and call $G_k$ the set of all subsets $V \subset Z^d$ which up to translations and permutation of the coordinates are included in the rectangle $[0, u_{k+1}] \times \cdots \times [0, u_{k+d}]$. We partition the interval $I = [0, u_{k+d}]$ into $2b$ consecutive nonoverlapping intervals $I_1, \dots, I_{2b}$ such that each $I_j$ has length $t_k := \frac{1}{2b} u_{k+d}$, that is Define also the enlarged intervals $\bar I_j = \{s \in I : d(s, I_j) \le t_k/4\}$, and consider the collections of intervals We remark that both $\Delta_A$ and $\Delta_B$ are collections of non-overlapping intervals, with On the other hand, $\Delta_A \cap \Delta_B = \emptyset$. We define the rectangular sets in $R^d$: and define the $Z^d$ subsets We refer to Figure 4.2 for a two-dimensional representation. We observe that $A_i \in G_{k-1}$ and $B_i \in G_{k-1}$ for all $i = 1, \dots, b$. Indeed, the stretch of $A_i$ along the $d$-th coordinate is at most $t_k + 2t_k/4 \le 2t_k \le u_k$ which together with $u_{k,i} = u_{k-1,i+1}$, $i = 1, \dots, d-1$, implies that $A_i \in G_{k-1}$. The same applies to $B_i$. Observe that with these definitions one has the product property Moreover, the geometric construction shows that Thus, a repetition of the argument in Lemma 4.10 shows that the assumption of Lemma 3.3 is satisfied with $\varepsilon$ given by

Next, let $\varrho(k)$ be defined as the largest constant $\varrho > 0$ such that the inequality holds for all $V \in G_k$, $\tau \in \Omega_{V^c}$, and all $f \ge 0$. The key observation is that thanks to the product property (4.27), and using the fact that $A_i \in G_{k-1}$ for all $i$, Lemma 3.2 allows us to estimate Similarly, Thus, (4.28) implies where we use the monotonicity of $\Lambda \mapsto \mu[\mathrm{Ent}_\Lambda f]$. Estimating $1 - \theta(\varepsilon_k) \ge 1/2$ we have proved that $\varrho(k) \ge \frac{1}{4}\varrho(k-1)$.
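To see what the recursion $\varrho(k) \ge \frac14 \varrho(k-1)$ yields quantitatively, one can iterate it and rewrite the result in terms of the side length $u_k = b^{k/d}$. The sketch below assumes a base scale $k_0$ (a hypothetical index, not fixed in the text above) from which the recursion holds and at which $\varrho(k_0) > 0$. The first line also checks the identity behind the bound $2t_k \le u_k$ used in the construction.

```latex
% Sketch; k_0 is an assumed base scale with \varrho(k_0) > 0.
% Geometry check: t_k = u_{k+d}/(2b) and u_{k+d} = b^{(k+d)/d} = b\,u_k, hence
\[
2t_k \;=\; \frac{u_{k+d}}{b} \;=\; b^{\frac{k+d}{d}-1} \;=\; b^{k/d} \;=\; u_k .
\]
% Iterating \varrho(k) \ge \tfrac14 \varrho(k-1) down to k_0:
\[
\varrho(k) \;\ge\; 4^{-(k-k_0)}\,\varrho(k_0), \qquad k \ge k_0 .
\]
% Since u_k = b^{k/d}, i.e. k = d \log u_k / \log b:
\[
4^{-k} \;=\; \exp\Big(-\frac{d\,\log 4}{\log b}\,\log u_k\Big)
\;=\; u_k^{-\,d\log 4/\log b} .
\]
```

In this form the lower bound decays polynomially in the side length $u_k$, with an exponent $d\log 4/\log b$ that shrinks as $b$ grows.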
Remark 4.12. We point out that the argument given in the proof of Theorem 4.7 can be improved if one replaces the parameter $t_k$, which is linear in $u_k$, by $t'_k = C_1 \log(u_k)$, with $C_1$ a suitably large constant. Since $t'_k$ is logarithmic in $u_k$, one can modify the recursion to obtain a bound of the form $\delta(k) \ge \delta(C_2 \log(k))/C_2$ for some new constant $C_2$, which provides a much better lower bound on $\delta(k)$ than the one stated in Theorem 4.7. However, without the companion recursive estimate from Theorem 4.6, this argument alone would not provide the uniform estimate $\inf_k \delta(k) > 0$.
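A heuristic way to read the bound in Remark 4.12 is to iterate it. The sketch below ignores rounding of $C_2\log k$ to integers. It introduces hypothetical names $T$, $N(k)$ and $k_*$, where $k_*$ is assumed large enough that $T(k) < k$ for all $k > k_*$, and assumes $\delta(j) > 0$ for $j \le k_*$. None of this notation appears in the original argument.

```latex
% Heuristic sketch; T, N(k), k_* are hypothetical names, rounding is ignored.
\[
T(k) := C_2 \log k, \qquad
N(k) := \min\{\, n \ge 0 \;:\; T^{(n)}(k) \le k_* \,\},
\]
% where T^{(n)} denotes the n-fold composition of T. Applying the bound N(k) times:
\[
\delta(k) \;\ge\; \frac{1}{C_2}\,\delta\big(T(k)\big)
\;\ge\; \cdots \;\ge\;
C_2^{-N(k)}\,\min_{j \le k_*}\delta(j) .
\]
```

Since each application of $T$ replaces $k$ by a multiple of its logarithm, $N(k)$ grows far more slowly than $k$, which is consistent with the remark's claim of a much better bound than the geometric decay in $k$ obtained in the proof of Theorem 4.7. The resulting bound still tends to zero as $k \to \infty$, in line with the last sentence of the remark.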

4.5. Proof of Theorem 4.3 assuming $SM_L(K, a)$
Theorem 4.6 and Theorem 4.7 allowed us to establish Theorem 4.3 under the assumption $SM(K, a)$. We now prove it assuming only $SM_L(K, a)$. To this end we observe that any set $V \in F^{(L)}$ is uniquely identified by the set $V' \in F$ such that
$$V = \bigcup_{y \in V'} Q_L(y). \qquad (4.29)$$
A careful check of the previous proofs then shows that if we work on the rescaled lattice, that is, if we replace vertices $x$ with blocks $Q_L(x)$, then we may repeat all steps in Theorem 4.6 and Theorem 4.7 to obtain the following coarse-grained version of Theorem 4.3 assuming only $SM_L(K, a)$: for any $V \in F^{(L)}$, for all $f \ge 0$, where, if $V$ is given by (4.29), then $E_L = \cup_{x \in EV'} Q_L(x)$, and $O_L = \cup_{x \in OV'} Q_L(x)$. Consider now a single cube $Q_L(x)$. By Lemma 4.5 we know that for some constant $C_1 = C_1(L)$. Observe that by construction $d(Q_L(x), Q_L(y)) > 1$ for all distinct $x, y \in EV'$. Similarly, $d(Q_L(x), Q_L(y)) > 1$ for all distinct $x, y \in OV'$. Therefore, Lemma 3.2 implies where $EE_L$ denotes the even sites in $E_L$, $EO_L$ the even sites in $O_L$, and so on. Plugging these estimates in (4.30) and using the monotonicity of $A \mapsto \mu[\mathrm{Ent}_A f]$ one arrives at with $D = 2C \times C_1$. This ends the proof of Theorem 4.3.