Abstract
Recently, van den Berg and Jonasson gave the first substantial extension of the BK inequality for non-product measures: they proved that, for \(k\)-out-of-\(n\) measures, the probability that two increasing events occur disjointly is at most the product of the two individual probabilities. We show several other extensions and modifications of the BK inequality. In particular, we prove that the antiferromagnetic Ising Curie–Weiss model satisfies the BK inequality for all increasing events. We prove that this also holds for the Curie–Weiss model with three-body interactions under the so-called negative lattice condition. For the ferromagnetic Ising model we show that the probability that two events occur ‘cluster-disjointly’ is at most the product of the two individual probabilities, and we give a more abstract form of this result for arbitrary Gibbs measures. The above cases are derived from a general abstract theorem whose proof is based on an extension of the Fortuin–Kasteleyn random-cluster representation for all probability distributions and on a ‘folding procedure’ which generalizes an argument of Reimer.
1 Introduction and statement of results
1.1 Definitions, background and overview
Before we state and discuss the results in the literature that are needed in, or that partly motivated, our current work, we introduce the main definitions and notation. Let \(S\) be a finite set and let \(\Omega \) denote the set \(S^n\). This set will be our state space. We will often use the notation \([n]\) for \(\{1,\ldots ,n\}\), the set of indices. For \(\omega \in \Omega \) and \(K \subset [n]\), we define \(\omega _K\) as the ‘tuple’ \((\omega _i, i \in K)\). We use the notation \([\omega ]_K\) for the set of all elements of \(\Omega \) that ‘agree with \(\omega \) on \(K\)’. More formally,
$$\begin{aligned} {[}\omega ]_K := \{\alpha \in \Omega \, : \, \alpha _K = \omega _K\}. \end{aligned}$$
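For \(A, B \subset \Omega \), \(A \square B\) is defined as the event that \(A\) and \(B\) ‘occur disjointly’. Formally, the definition is:
$$\begin{aligned} A \square B = \{\omega \in \Omega \, : \, \exists \text{ disjoint } K, L \subset [n] \text{ such that } [\omega ]_K \subset A \text{ and } [\omega ]_L \subset B\}. \end{aligned} \tag{1}$$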
For the case where \(S\) is an ordered set and \(\omega \) and \(\omega ^{\prime } \in \Omega \), we write \(\omega ^{\prime } \ge \omega \) if \(\omega ^{\prime }_i \ge \omega _i\) for all \(i \in [n]\). An event \(A \subset \Omega \) is said to be increasing if \(\omega ^{\prime } \in A\) whenever \(\omega \in A\) and \(\omega ^{\prime }\ge \omega \).
The following inequality, (2) below, was conjectured (and proved for the special case where \(S = \{0,1\}\) and \(A\) and \(B\) are increasing events) in [1]. Some other special cases were proved in [2] and [18]. The general case was proved by Reimer (see [17]).
Theorem 1.1
For all \(n\), all product measures \(\mu \) on \(S^n\), and all \(A, B \subset S^n\),
$$\begin{aligned} \mu (A \square B) \le \mu (A)\, \mu (B). \end{aligned} \tag{2}$$
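For small \(n\) the disjoint-occurrence operation and inequality (2) can be checked by brute force. The following is a minimal sketch, not an efficient algorithm: the helper names `forces`, `box`, `product_measure` and `random_event` are hypothetical, indices run over \(0, \ldots , n-1\) rather than \(1, \ldots , n\), and the marginals are arbitrary illustrative values.

```python
from itertools import combinations, product
import random

def forces(omega, K, A, states):
    """True if [omega]_K is a subset of A: every configuration that agrees
    with omega on the index set K lies in A."""
    n = len(omega)
    free = [i for i in range(n) if i not in K]
    for vals in product(states, repeat=len(free)):
        sigma = list(omega)
        for i, v in zip(free, vals):
            sigma[i] = v
        if tuple(sigma) not in A:
            return False
    return True

def box(A, B, states, n):
    """Brute-force A box B: configurations for which some disjoint index
    sets K and L witness A and B respectively."""
    index_sets = [frozenset(c) for r in range(n + 1)
                  for c in combinations(range(n), r)]
    out = set()
    for omega in product(states, repeat=n):
        wit_a = [K for K in index_sets if forces(omega, K, A, states)]
        wit_b = [L for L in index_sets if forces(omega, L, B, states)]
        if any(K.isdisjoint(L) for K in wit_a for L in wit_b):
            out.add(omega)
    return out

def product_measure(marginals):
    """marginals[i] maps each state to its probability at index i."""
    def mu(event):
        total = 0.0
        for omega in event:
            weight = 1.0
            for i, s in enumerate(omega):
                weight *= marginals[i][s]
            total += weight
        return total
    return mu

def random_event(states, n, rng):
    """An arbitrary (not necessarily increasing) event."""
    return {w for w in product(states, repeat=n) if rng.random() < 0.5}

rng = random.Random(0)
states, n = (0, 1, 2), 3
marginals = [{0: 0.2, 1: 0.5, 2: 0.3},
             {0: 0.6, 1: 0.1, 2: 0.3},
             {0: 0.3, 1: 0.3, 2: 0.4}]
mu = product_measure(marginals)
A, B = random_event(states, n, rng), random_event(states, n, rng)
# Small tolerance guards against floating-point rounding in equality cases.
print(mu(box(A, B, states, n)) <= mu(A) * mu(B) + 1e-12)
```

The enumeration is exponential in \(n\), so this is only meant to make the definition concrete on toy examples.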
We also state the following result, Proposition 1.2 below, which was Reimer’s key ingredient (intermediate result) in his proof of Theorem 1.1, and which is also crucial in our work.
To state this result, some more notation is needed: For \(\omega = (\omega _1, \ldots , \omega _n) \in \{0,1\}^n\), we denote by \({\bar{\omega }}\) the configuration obtained from \(\omega \) by replacing \(1\)’s by \(0\)’s and vice versa:
$$\begin{aligned} {\bar{\omega }} = (1 - \omega _1, \ldots , 1 - \omega _n). \end{aligned}$$
Further, for \(A \subset \Omega \), we define \({\bar{A}}=\{{\bar{\omega }} \, : \, \omega \in A\}\). Finally, if \(V\) is a finite set, \(|V|\) denotes the number of elements of \(V\).
Proposition 1.2
(Reimer [17]) For all \(n\) and all \(A, B \subset \{0,1\}^n\),
$$\begin{aligned} |A \square B| \le |A \cap {\bar{B}}|. \end{aligned} \tag{3}$$
It is easy to see that if \(\mu \) is a non-product measure on \(S^n\), it cannot satisfy (2) for all events. However, it seemed intuitively obvious that many measures on \(\{0,1\}^n\) do satisfy this inequality for all increasing events. Such measures are sometimes called BK measures. The first natural, non-trivial, non-product measure which was proved to be BK is the so-called \(k\)-out-of-\(n\) measure: Let \(k \le n\) and let \(\Omega _{k,n}\) be the set of all \(\omega \in \{0,1\}^n\) with exactly \(k\ 1\)’s. Let \(P_{k,n}\) be the distribution on \(\{0,1\}^n\) that assigns equal probability to all \(\omega \in \Omega _{k,n}\) and probability \(0\) to all other elements of \(\{0,1\}^n\).
Theorem 1.3
(van den Berg and Jonasson [3]) For all \(n\), all \(k \le n\), and all increasing \(A, B \subset \{0,1\}^n\),
$$\begin{aligned} P_{k,n}(A \square B) \le P_{k,n}(A)\, P_{k,n}(B). \end{aligned} \tag{4}$$
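The statement can be explored numerically for small \(n\). The sketch below uses the fact that, for increasing events, \(\omega \in A \square B\) exactly when the set of \(1\)’s of \(\omega \) can be split into two parts whose indicator configurations lie in \(A\) and \(B\) respectively. All function names are hypothetical, indices are zero-based, and the comparison is done in exact integer arithmetic.

```python
from itertools import combinations, product
from math import comb
import random

def up_closure(generators, n):
    """Smallest increasing event in {0,1}^n containing the generators."""
    return {w for w in product((0, 1), repeat=n)
            if any(all(w[i] >= g[i] for i in range(n)) for g in generators)}

def indicator(K, n):
    """Configuration with 1's exactly on the index set K."""
    return tuple(1 if i in K else 0 for i in range(n))

def box_increasing(A, B, n):
    """A box B for increasing A, B: split the 1's of w into K and the rest."""
    out = set()
    for w in product((0, 1), repeat=n):
        ones = [i for i in range(n) if w[i] == 1]
        splits = (set(K) for r in range(len(ones) + 1)
                  for K in combinations(ones, r))
        if any(indicator(K, n) in A and indicator(set(ones) - K, n) in B
               for K in splits):
            out.add(w)
    return out

def holds_for_k_out_of_n(A, B, n, k):
    """Exact check of P_{k,n}(A box B) <= P_{k,n}(A) P_{k,n}(B),
    after multiplying both sides by comb(n, k) ** 2."""
    level = {w for w in product((0, 1), repeat=n) if sum(w) == k}
    ab = len(box_increasing(A, B, n) & level)
    return ab * comb(n, k) <= len(A & level) * len(B & level)

def random_increasing(rng, n):
    """Up-closure of two random generators (an illustrative choice)."""
    gens = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(2)]
    return up_closure(gens, n)

rng = random.Random(1)
n = 5
A, B = random_increasing(rng, n), random_increasing(rng, n)
print(all(holds_for_k_out_of_n(A, B, n, k) for k in range(n + 1)))
```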
Remark
This result, which was conjectured in [9], extends, as pointed out in [3], to certain weighted versions of \(P_{k,n}\) and to products of such measures.
Theorem 1.3 is one direction in which Theorem 1.1 can be extended or generalized. In our current work we prove some other natural cases, including the antiferromagnetic Curie–Weiss model (see Theorems 1.4 and 1.5), in this direction. However, we also generalize Theorem 1.1 in a very different sense, namely by modifying the disjoint-occurrence operation. In particular we will show that the ferromagnetic Ising model satisfies (2) for the modification where the usual ‘disjoint-occurrence’ notion is replaced by the stronger notion of disjoint-spin-cluster occurrence (see Theorem 1.6). (A form of this result is also proved for arbitrary Gibbs measures, see Theorem 1.10). As an example we derive an upper bound for the probability of a certain four-arm event in terms of the one-arm probabilities (see Corollary 1.8).
All these results are stated in Sect. 1.2. Next, in Sect. 2 we state and prove a very general result, Theorem 2.3. This theorem involves the notion of ‘foldings’ of a measure, which already plays (but only within the class of product measures and therefore less explicitly) an important role in Reimer’s work [17]. This notion is defined in Sect. 2.2. Theorem 2.3 also involves a highly generalized form of the Fortuin–Kasteleyn random-cluster representation for all probability distributions, presented in Sect. 2.1. In very simplified and informal terms, Theorem 2.3 states that an inequality similar to (2) holds whenever the events \(A\) and \(B\) and the probability distribution \(\mu \) are such that if (a certain version of) \(A \square B\) holds, \(A\) and \(B\) can be ‘witnessed’ by sets of indices that are not connected to each other in the random-cluster configurations for the foldings of \(\mu \). In Sect. 3 we investigate random-cluster representations for Gibbs measures and show how our general result implies the above mentioned Theorem 1.10 and Theorem 1.6. Finally, in Sect. 4 we derive our theorems for the Curie–Weiss model from the general result.
We finish the current section by remarking that a very different kind of extension (namely, a ‘dual form’) of Theorem 1.1, within the class of product-measures, was obtained by Kahn, Saks and Smyth [13], and that there are examples of non-product measures for which the BK property can be proved ‘more directly’ from Theorem 1.1 (see [12]).
1.2 New extensions of Theorem 1.1
1.2.1 Antiferromagnetic Ising Curie–Weiss model
In this model each pair of vertices has the same, antiferromagnetic, interaction. Moreover, each vertex ‘feels’ an external field (which may be different from that at the other vertices).
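More precisely, the Ising Curie–Weiss measure with vertices \(\{1, \ldots , n\}\), interaction parameter \(J\) and external fields \(h_1, \ldots , h_n\), is the distribution \(\mu \) on \(\{-1, +1\}^n\) given by
$$\begin{aligned} \mu (\omega ) = \frac{1}{Z} \exp \left( J \sum _{i<j} \omega _i \omega _j + \sum _i h_i \omega _i \right) , \end{aligned} \tag{5}$$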
where (here and in similar expressions later) \(Z\) is a normalizing constant, the first sum is over all pairs \((i,j)\) with \(1 \le i < j \le n\) and the second sum is over all \(i = 1, \ldots , n\). If \(J < 0\), the measure is called antiferromagnetic. One of our main new results is that in that case it is a BK measure:
Theorem 1.4
The Ising Curie–Weiss measure (5) with \(J \le 0\) satisfies
$$\begin{aligned} \mu (A \square B) \le \mu (A)\, \mu (B) \end{aligned} \tag{6}$$
for all increasing \(A, B \subset \{-1, +1\}^n\).
In Sect. 4 we prove this theorem from the more general Theorem 2.3.
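As a numerical illustration only, the inequality can be tested by enumeration for a few vertices. The sketch assumes that the measure has the form \(\mu (\omega ) \propto \exp (J \sum _{i<j} \omega _i \omega _j + \sum _i h_i \omega _i)\), which matches the description of the two sums in the text; the function names and the parameter values are hypothetical, and indices are zero-based.

```python
from itertools import combinations, product
from math import exp
import random

def curie_weiss(n, J, h):
    """Assumed form: mu(w) proportional to
    exp(J * sum_{i<j} w_i w_j + sum_i h_i w_i)."""
    weight = {}
    for w in product((-1, 1), repeat=n):
        pair = sum(w[i] * w[j] for i, j in combinations(range(n), 2))
        weight[w] = exp(J * pair + sum(h[i] * w[i] for i in range(n)))
    Z = sum(weight.values())
    return {w: x / Z for w, x in weight.items()}

def up_closure(generators, n):
    """Smallest increasing event in {-1,+1}^n containing the generators."""
    return {w for w in product((-1, 1), repeat=n)
            if any(all(w[i] >= g[i] for i in range(n)) for g in generators)}

def plus_on(K, n):
    """Configuration that is +1 on K and -1 elsewhere."""
    return tuple(1 if i in K else -1 for i in range(n))

def box_increasing(A, B, n):
    """A box B for increasing events: split the +1 sites of w in two parts."""
    out = set()
    for w in product((-1, 1), repeat=n):
        plus = [i for i in range(n) if w[i] == 1]
        splits = (set(K) for r in range(len(plus) + 1)
                  for K in combinations(plus, r))
        if any(plus_on(K, n) in A and plus_on(set(plus) - K, n) in B
               for K in splits):
            out.add(w)
    return out

def bk_gap(mu, A, B, n):
    """mu(A) mu(B) - mu(A box B); nonnegative when the BK inequality holds."""
    prob = lambda E: sum(mu[w] for w in E)
    return prob(A) * prob(B) - prob(box_increasing(A, B, n))

def random_increasing(rng, n):
    gens = [tuple(rng.choice((-1, 1)) for _ in range(n)) for _ in range(2)]
    return up_closure(gens, n)

rng = random.Random(2)
n, J = 4, -0.7                       # J <= 0: antiferromagnetic case
h = [rng.uniform(-1, 1) for _ in range(n)]
mu = curie_weiss(n, J, h)
A, B = random_increasing(rng, n), random_increasing(rng, n)
# Small tolerance guards against floating-point rounding in equality cases.
print(bk_gap(mu, A, B, n) >= -1e-12)
```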
Remark
If \(n\) is even and all the \(h_i\)’s are \(0\), letting \(J \rightarrow -\infty \) in (5) yields the \(n/2\)-out-of-\(n\) distribution (with \(-1\) playing the role of \(0\)). More generally, taking all the \(h_i\)’s equal to a common value \(h\), and then letting \(J \rightarrow -\infty \) and simultaneously (in a suitable way, depending on \(k\)) \(h \rightarrow \infty \) (or \(-\infty \)), yields the \(k\)-out-of-\(n\) distribution. In this sense Theorem 1.3 can be seen as a special case of Theorem 1.4 above. In fact, the proof of Theorem 1.4 is somewhat similar in spirit to that of Theorem 1.3. Roughly speaking, it boils down to showing that an antiferromagnetic Curie–Weiss measure without external fields on \(\{-1,+1\}^n\) can be written as a convex combination of products of ‘independent fair coin-flips’ and 1-out-of-2 measures. However, showing that such a convex combination exists is much more involved than the analogous work in [3] for the \(k\)-out-of-\(n\) model (and is not intuitively obvious at all).
1.2.2 Curie–Weiss model with three-body interactions
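It is well-known and easy to see that the Ising Curie–Weiss measure (5) satisfies the negative lattice condition (where \(\vee \) and \(\wedge \) denote the coordinatewise maximum and minimum)
$$\begin{aligned} \mu (\omega \vee \omega ^{\prime })\, \mu (\omega \wedge \omega ^{\prime }) \le \mu (\omega )\, \mu (\omega ^{\prime }), \quad \omega , \omega ^{\prime } \in \Omega , \end{aligned} \tag{7}$$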
if \(J \le 0\), and that it satisfies the positive lattice condition (i.e. (7) with reverse inequality, also widely known as the FKG lattice condition; see e.g. Section 2.2 in [10]) if \(J \ge 0\). Thus Theorem 1.4 says that for the Ising Curie–Weiss model the negative lattice condition implies the BK property, whereas previously it was only known to imply negative association (see [16]), a property which is weaker than BK (see [15]).
One could wonder whether this is also the case for the Curie–Weiss model with multibody interaction. This question is in some sense opposite to those in [14], which deals with infinite extendibility (IE) to an exchangeable distribution, a property which implies the positive lattice condition.
We investigate here only the first step in this direction, namely the addition of a three-body interaction to (5): combining Lemmas 4.2 and 4.3 and the remark that the cubic part of the interaction disappears in the foldings, we show that even for this model the negative lattice condition implies BK. With \(S= \{-1,1\}\) the Curie–Weiss model with three-body interactions \(\mu \) is the distribution on \(\Omega =S^n\) given by
where in the last two sums we take \(i< j\) and \(i<j<k\), respectively.
Theorem 1.5
If \(\mu \) as in (8) satisfies the negative lattice condition (7), then
$$\begin{aligned} \mu (A \square B) \le \mu (A)\, \mu (B) \end{aligned} \tag{9}$$
for all increasing \(A, B \subset \{-1, +1\}^n\).
This theorem will be proved in Sect. 4.2.
Remark
There exist non-zero \(J_2\) and \(J_3\) such that the distribution (8) satisfies (7). For instance, take \(J_2 < 0\) and take \(|J_3|\) sufficiently small.
1.2.3 A cluster-disjointness inequality for the ferromagnetic Ising model
In this section we state, for the ferromagnetic Ising model, a version of Theorem 1.1 with a modified form of the \(\square \) operation. First we recall that the ferromagnetic Ising measure for vertices \(1, \ldots , n\), interaction parameters \(J_{i,j} \ge 0, 1 \le i < j \le n\) and external fields \(h_i, 1 \le i \le n\), is the distribution on \(\{-1, +1\}^n\) given by
$$\begin{aligned} \mu (\omega ) = \frac{1}{Z} \exp \left( \sum _{i<j} J_{i,j}\, \omega _i \omega _j + \sum _i h_i \omega _i \right) . \end{aligned} \tag{10}$$
It is well-known that this measure satisfies the FKG inequality: for all increasing events \(A\) and \(B\),
$$\begin{aligned} \mu (A \cap B) \ge \mu (A)\, \mu (B). \end{aligned} \tag{11}$$
Note that (since the complement of an increasing event is decreasing) this is equivalent to saying that for all increasing events \(A\) and decreasing events \(B\),
$$\begin{aligned} \mu (A \cap B) \le \mu (A)\, \mu (B). \end{aligned} \tag{12}$$
To define the modified \(\square \)-operation we first consider the usual graph \(G\) induced by the interaction values. This is the graph with vertices \(1, \ldots , n\) where two vertices \(i\) and \(j\) share an edge iff \(J_{i,j} >0\). A \(+\) cluster (with respect to a realization \(\omega \)) is a connected component in the graph obtained from \(G\) by deleting all vertices \(i\) with \(\omega _i = -1\). Similarly, \(-\) clusters are defined. The term spin cluster will be used for \(+\) as well as for \(-\) clusters. If \(K \subset [n]\), we use the notation \(C(K)\) for the set of all vertices \(i \in [n]\) for which there is a \(j \in K\) which is in the same spin cluster as \(i\). (Note that, in particular, \(C(K) \supset K\)). The modified \(\square \) operation is defined as in (1), but with the constraint \(K \cap L = \emptyset \) replaced by the stronger constraint that \(C(K) \cap C(L) = \emptyset \). Note that this stronger constraint is equivalent to saying that there is no ‘monochromatic’ path from \(K\) to \(L\). We denote the modified operation by \(\boxminus \); explicitly (with the clusters taken with respect to \(\omega \)),
$$\begin{aligned} A \boxminus B = \{\omega \in \Omega \, : \, \exists \, K, L \subset [n] \text{ such that } C(K) \cap C(L) = \emptyset ,\ [\omega ]_K \subset A \text{ and } [\omega ]_L \subset B\}. \end{aligned} \tag{13}$$
One of our main results is that, with the above modification of the \(\square \)-operation, the ferromagnetic Ising model satisfies the analog of Theorem 1.1:
Theorem 1.6
The ferromagnetic Ising measure (10) satisfies
$$\begin{aligned} \mu (A \boxminus B) \le \mu (A)\, \mu (B) \end{aligned} \tag{14}$$
for all \(A, B \subset \{-1, +1\}^n\).
In Sect. 1.2.5 we present a more general result of this flavour, Theorem 1.10, for Gibbs measures. In Sect. 3.2 we show that Theorem 1.6 can be obtained easily from Theorem 1.10 (which in turn follows from our most general result, Theorem 2.3).
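The cluster-disjoint operation can be made concrete on a tiny graph. The following sketch enumerates all configurations of a four-vertex cycle, computes spin clusters, and compares \(\mu (A \boxminus B)\) with \(\mu (A)\mu (B)\) for arbitrary events. It assumes the Ising measure has the form \(\mu (\omega ) \propto \exp (\sum _{i<j} J_{i,j} \omega _i \omega _j + \sum _i h_i \omega _i)\); the function names, the couplings and the fields are hypothetical illustrative choices, and indices are zero-based.

```python
from itertools import combinations, product
from math import exp
import random

def ising(n, J, h):
    """Assumed form: mu(w) proportional to
    exp(sum_{i<j} J[i,j] w_i w_j + sum_i h_i w_i)."""
    weight = {w: exp(sum(c * w[i] * w[j] for (i, j), c in J.items())
                     + sum(h[i] * w[i] for i in range(n)))
              for w in product((-1, 1), repeat=n)}
    Z = sum(weight.values())
    return {w: x / Z for w, x in weight.items()}

def spin_clusters(w, J):
    """Label each vertex by its spin cluster: connected components of the
    interaction graph restricted to edges whose endpoints have equal spin."""
    label = list(range(len(w)))
    changed = True
    while changed:  # naive label propagation; adequate for tiny graphs
        changed = False
        for (i, j), c in J.items():
            if c > 0 and w[i] == w[j] and label[i] != label[j]:
                label[i] = label[j] = min(label[i], label[j])
                changed = True
    return label

def forces(w, K, A):
    """True if every configuration agreeing with w on K lies in A."""
    n = len(w)
    free = [i for i in range(n) if i not in K]
    for vals in product((-1, 1), repeat=len(free)):
        s = list(w)
        for i, v in zip(free, vals):
            s[i] = v
        if tuple(s) not in A:
            return False
    return True

def cluster_box(A, B, J, n):
    """Witness sets K, L must have disjoint spin clusters C(K), C(L)."""
    index_sets = [frozenset(c) for r in range(n + 1)
                  for c in combinations(range(n), r)]
    out = set()
    for w in product((-1, 1), repeat=n):
        label = spin_clusters(w, J)
        wit_a = [{label[i] for i in K} for K in index_sets if forces(w, K, A)]
        wit_b = [{label[i] for i in L} for L in index_sets if forces(w, L, B)]
        if any(ca.isdisjoint(cb) for ca in wit_a for cb in wit_b):
            out.add(w)
    return out

def random_event(n, rng):
    return {w for w in product((-1, 1), repeat=n) if rng.random() < 0.5}

n = 4
J = {(0, 1): 0.5, (1, 2): 0.8, (2, 3): 0.3, (0, 3): 0.6}  # 4-cycle
h = [0.2, -0.4, 0.0, 0.7]
mu = ising(n, J, h)
prob = lambda E: sum(mu[w] for w in E)
rng = random.Random(3)
A, B = random_event(n, rng), random_event(n, rng)
# Small tolerance guards against floating-point rounding in equality cases.
print(prob(cluster_box(A, B, J, n)) <= prob(A) * prob(B) + 1e-12)
```

With an increasing event such as \(\{\omega _0 = +1\}\) and a decreasing event such as \(\{\omega _1 = -1\}\), `cluster_box` returns the plain intersection, since sites of opposite spin always lie in different spin clusters.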
Note that if \(A\) is increasing and \(B\) decreasing, then \(A \boxminus B = A \cap B\), so that the FKG inequality (11) can be considered as a special case of (14). Another special case is given by the following corollary. For \(W, W^{\prime }\subset [n]\) we write \(W \stackrel{+}{\rightarrow } W^{\prime }\) for the event that there is a \(+\) path (i.e. a path, with respect to the graph structure mentioned above, of which every vertex has value \(+1\)) from some vertex in \(W\) to some vertex in \(W^{\prime }\). We denote the complement of this event simply by \((W \stackrel{+}{\rightarrow } W^{\prime })^c\).
Corollary 1.7
Let \(\mu \) be the Ising distribution defined above, and let \(X, Y, U, W\subset [n]\). Then
Proof of Corollary 1.7
Take for \(A\) the event \(\{X \stackrel{+}{\rightarrow } Y\}\) and for \(B\) the event \(\{U \stackrel{+}{\rightarrow } W\}\). Then \(A \boxminus B\) is the event in the l.h.s. of (15). Now apply Theorem 1.6. \(\square \)
Remark
If \(X, Y, U\) and \(W\) have only one element, say \(x, y, u\) and \(w\) respectively, the event in the l.h.s. of (15) can be written as
where the union marked by \(*\) is over all connected components \(K\) with \(\{u, w\} \subset K\) and \(\{x, y\} \cap K = \emptyset \), and where \([\mathbf{1}]_K\) is the event that all vertices in \(K\) have value \(1\), and \([\mathbf{-1}]_{\partial K}\) is the event that all vertices that are not in \(K\) but have a neighbour in \(K\) have value \(-1\). Since this is a union of disjoint events, a more direct decoupling inequality of Borgs and Chayes [4] can be applied in this case to obtain (15). However, if the sets \(X, Y, U\) and \(W\) have more elements, the event in the l.h.s. of (15) can, in general, not be written as a suitable disjoint union, and the Borgs–Chayes decoupling inequality is not applicable.
The following ‘four-arm event’ is another example which illustrates how Theorem 1.6 can be used. Consider the graph with vertices \(V = \{-k, \ldots ,k\}^2{\setminus }\{(0,0)\}\), and where two vertices \(v = (v_1, v_2)\) and \(w = (w_1,w_2)\) share an edge iff \(|v_1 - w_1| + |v_2 - w_2| = 1\). In other words, this graph is the \(2 k \times 2k\) box on the square lattice, centered at \(O\), but with \(O\) ‘cut out’. Consider the Ising distribution on \(\{-1, +1\}^V\) with external fields \(h_v, v \in V\) and interaction parameters \(J_{v,w} > 0\) if \(v\) and \(w\) share an edge and \(0\) otherwise. We use the notation \(W \stackrel{+}{\rightarrow } W^{\prime }\) as in Corollary 1.7, and the notation \(W \stackrel{-}{\rightarrow } W^{\prime }\) for its analog, with \(+\) replaced by \(-\). Further, we will use here the notation \(\partial V\) in a slightly different way than above, namely for the set of those vertices \((v_1, v_2) \in V\) for which \(|v_1| + |v_2| = k\).
Corollary 1.8
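Let \(\mu \) be the Ising distribution on \(V\) described above. We have
$$\begin{aligned}&\mu \big ( (1,0) \overset{+}{\rightarrow } \partial V,\ (0,1) \overset{-}{\rightarrow } \partial V,\ (-1,0) \overset{+}{\rightarrow } \partial V,\ (0,-1) \overset{-}{\rightarrow } \partial V \big ) \\&\quad \le \mu ((1,0) \overset{+}{\rightarrow } \partial V)\, \mu ((0,1) \overset{-}{\rightarrow } \partial V)\, \mu ((-1,0) \overset{+}{\rightarrow } \partial V)\, \mu ((0,-1) \overset{-}{\rightarrow } \partial V). \end{aligned} \tag{16}$$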
Proof of Corollary 1.8 from Theorem 1.6
Let \(A\) be the event \(\{(1,0) \overset{+}{\rightarrow } \partial V, (0,1) \overset{-}{\rightarrow } \partial V \}\), and \(B\) the event \(\{(-1,0) \overset{+}{\rightarrow } \partial V, (0,-1) \overset{-}{\rightarrow } \partial V \}\). It is easy to see that the event in the l.h.s. of (16) is contained in \(A \boxminus B\), [with \(\boxminus \) as defined in (13)]. Hence, by Theorem 1.6, the l.h.s. of (16) is at most \(\mu (A) \mu (B)\), which, by applying the FKG inequality (12) to \(\mu (A)\) and \(\mu (B)\) separately, is at most the r.h.s. of (16). \(\square \)
Remarks
(i)
At first sight one might have the impression that Corollary 1.8 can be proved more directly, by the earlier mentioned decoupling inequalities in [4] or a straightforward combination of FKG and elementary manipulations. However, we do not see how to do that.
(ii)
Various versions of Corollary 1.8 can be proved in exactly the same way. One example is the analog of this corollary, where the ‘hole’ in the box is bigger than just one point, and where the event in the l.h.s. of (16) is replaced by the event that there exist four points \(u, v, w\) and \(x\) on the boundary of the hole (denoted by \(\partial H\)) with the property that travelling along this boundary clockwise, starting in \(u\), we first encounter \(v\), then \(w\) and then \(x\), and such that \(u \overset{+}{\rightarrow } \partial V, v \overset{-}{\rightarrow } \partial V, w \overset{+}{\rightarrow } \partial V\) and \(x \overset{-}{\rightarrow } \partial V\). (In this case the r.h.s. of (16) is replaced by the product of \((\mu (\partial H \overset{+}{\rightarrow } \partial V))^2\) and \((\mu (\partial H \overset{-}{\rightarrow } \partial V))^2\).) A different kind of version, which can also be proved in the same way, is the one in which the \(-\) path events are replaced by their analogs for so-called \(*\) paths (i.e. where besides horizontal and vertical steps, also diagonal steps are allowed in the path).
1.2.4 Potts models
The Potts measure for vertices \(1, \ldots , n\), set of ‘spin’ values \(S\), and interaction parameters \(J_{i,j}, 1 \le i, j \le n\) is the distribution on \(S^n\) given by
If all the \(J_{i,j}\)’s are larger (smaller) than or equal to \(0\) we say that the measure is ferromagnetic (antiferromagnetic). Note that if \(|S| = 2\), the ferromagnetic Potts measure is, in fact, the ferromagnetic Ising model. Remarkably, if \(|S| \ge 3\) we do not have a non-trivial analog of Theorem 1.6 for the ferromagnetic Potts model, but we do have one for the antiferromagnetic case.
First, as we did for the Ising model, we consider the graph with vertices \(1, \ldots , n\) where two vertices \(i\) and \(j\) share an edge iff \(J_{i,j} \ne 0\). A path \(\pi \) in this graph is called changing (w.r.t. a configuration \(\omega \)) if, for each two consecutive vertices \(v\) and \(w\) on \(\pi , \omega _v \ne \omega _w\). (In particular, the path consisting of the vertex \(v\) only, is considered as a changing path). The cluster of a set \(K \subset [n]\), again (as in Sect. 1.2.3) denoted by \(C(K)\), is now defined as the set of all vertices \(v\) for which there is a changing path with starting point in \(K\) and endpoint \(v\).
The modified operation \(\boxminus \) we now use has exactly the same form as (13) (but now with the new meaning of \(C(K)\) and \(C(L)\)). We get the following analog of Theorem 1.6.
Theorem 1.9
Consider the Potts measure \(\mu \) [see (17)] with all \(J_{i,j}\)’s non-positive. With the above defined notion of clusters and \(\boxminus \) operation, this measure satisfies
$$\begin{aligned} \mu (A \boxminus B) \le \mu (A)\, \mu (B) \end{aligned} \tag{18}$$
for all \(A, B \subset S^n\).
In Sect. 3.3 we will show that this theorem follows from Theorem 1.10 below.
Remark
This analog of Theorem 1.6 is, due to the more complicated notion of clusters, less intuitively appealing than Theorem 1.6 itself, and we do not (yet) know an interesting consequence of the form of Corollary 1.8. We hope the following example, which is in the same spirit as Corollary 1.7, enhances the intuitive understanding. In this example we take \(S = \{1, 2, 3\}\), and let \(\mu \) be as in Theorem 1.9. We use notation like \( x \stackrel{1}{\rightarrow } Y\) similarly to that in Corollary 1.7. Further, for vertices \(x\) and \(u\), let \(DB(x,u;2)\) denote the event that there is a ‘double barrier’ of \(2\)’s, separating \(x\) and \(u\). More precisely, this means that every path from \(x\) to \(u\) has at least two consecutive vertices with value \(2\). It is easy to check from the definitions that Theorem 1.9 implies that, for \(X, Y, U, W \subset [n]\),
1.2.5 Gibbs measures
This section sets the two previous ones in a more general context. (Since the Ising model and Potts model are such widely used models, and the correspondence of the \(\boxminus \) operation in Theorems 1.6 and 1.9 to that in Theorem 1.10 below is not immediate, we devoted a separate section to them). First we give several definitions and introduce the needed notation.
If \(K\) and \(L\) are disjoint subsets of \([n]\), and \(\alpha \in S^K\) and \(\gamma \in S^L\), we denote by \(\alpha \circ \gamma \) the configuration on \(K \cup L\) that agrees with \(\alpha \) on \(K\) and with \(\gamma \) on \(L\). Formally, \(\alpha \circ \gamma \) is the (unique) \(\omega \in S^{K \cup L}\) for which
$$\begin{aligned} \omega _i = {\left\{ \begin{array}{ll} \alpha _i,&\text{ if } i \in K, \\ \gamma _i,&\text{ if } i \in L. \end{array}\right.} \end{aligned}$$
A potential \(\Phi \) (for the configuration space \(S^{[n]}\)) is a collection of functions
Below we will often use the term ‘hyperedges’ for subsets of \([n]\), but sometimes we will simply call them edges. In most examples \(\Phi _b \equiv \) some constant for most \(b\)’s. For ease of notation we will often (when there is no risk of confusion) write \(\Phi (\omega _b)\) instead of \(\Phi _b(\omega _b)\).
A hyperedge \(b\) is called inefficient (with respect to a configuration \(\omega \in S^b\)) if \(|b| \ge 2\) and, for every \(N \subset b\) and every \(\sigma \in S^b\),
Note that if \(\Phi _b\) is constant on \(S^b\), then \(b\) is inefficient with respect to every \(\omega \in S^b\).
Remark
The name ‘inefficient’ is motivated by certain explicit examples. For instance, the ‘usual’ potential function for the ferromagnetic Ising model assigns to an edge \(b\), of which the endpoints \(x\) and \(y\) have spin \(\omega _x\) and \(\omega _y\) respectively, the value \(J_b \omega _x \omega _y\) (where \(J_b\) is a positive number). It is easy to see that in that case the edge is inefficient (in the sense defined above) if and only if its endpoints have spin values that minimize this value.
Let \(K \subset [n]\) and \(v \in [n]\). A hyperpath from \(K\) to \(v\) is a sequence \(\pi = (b_1, \ldots , b_m)\), such that \(K \cap b_1 \ne \emptyset , b_i \cap b_{i+1} \ne \emptyset , \, 1 \le i \le m-1\) and \(v \in b_m\). If (w.r.t. a certain configuration) none of these edges \(b_i, 1 \le i\le m\), is inefficient, we say that \(\pi \) is an efficient path (w.r.t. that configuration). We define the cluster of \(K\), denoted by \(C(K)\), as the set of all vertices \(v\) for which there is an efficient path from \(K\) to \(v\).
Remark
Note that, by the definition of an inefficient edge, there is always an efficient path from a vertex \(v\) to itself (namely the path which consists only of the edge \(\{v\}\)). Hence \(C(K) \supset K\).
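We define the following modification of the box-operation (with the clusters \(C(K)\) and \(C(L)\) taken with respect to \(\omega \)):
$$\begin{aligned} A \boxminus B = \{\omega \in S^n \, : \, \exists \, K, L \subset [n] \text{ such that } C(K) \cap C(L) = \emptyset ,\ [\omega ]_K \subset A \text{ and } [\omega ]_L \subset B\}. \end{aligned} \tag{20}$$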
Remark
We have used here the same notation as for the ferromagnetic Ising model in the previous section. Note that, by taking for \(\Phi \) the ‘usual potential function for the Ising model’, definition (20) becomes exactly definition (13).
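The Gibbs measure for the potential \(\Phi \) is the measure \(\mu \) on \(S^{[n]}\) given by
$$\begin{aligned} \mu (\omega ) = \frac{1}{Z} \exp \left( \sum _{b} \Phi _b(\omega _b) \right) , \end{aligned} \tag{21}$$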
where the sum is over all \(b \subset [n]\) (or, equivalently, by adjusting \(Z\), over all \(b \subset [n]\) with the property that \(\Phi _b\) is non-constant on \(S^b\)).
The main result of this section is:
Theorem 1.10
The Gibbs measure (21) satisfies
$$\begin{aligned} \mu (A \boxminus B) \le \mu (A)\, \mu (B) \end{aligned} \tag{22}$$
for all \(A, B \subset S^n\).
We will prove in Sect. 3 that the above theorem follows from Theorem 2.3.
2 General framework
2.1 Generalized random-cluster representations
As before, \(S\) is a finite set (and will play the role of ‘single-site state space’), and as set of ‘indices’ (also called ‘vertices’) we take \([n] := \{1, \ldots , n\}\). (So the state space is \(S^n\)). The set of all subsets of a set \(V\) will be denoted by \(\mathcal{P }(V)\). In the case \(V = [n]\) we simply write \(\mathcal{P }(n)\) for \(\mathcal{P }([n])\). Elements of \(\mathcal{P }(n)\) will often be called hyperedges.
We assign, to each hyperedge \(b\), a random subset of \(S^b\). Let \(\nu \) denote the joint distribution of this collection of subsets. So \(\nu \) is a probability measure on \(\prod _{b \subset [n]} \mathcal{P }(S^b)\). In the following, we will typically use the notation \(\eta _b\) for a subset of \(S^b\), and the notation \(\eta \) for the collection \((\eta _b, b \subset [n])\). Further (as before) we typically denote an element of \(S^n\) by \(\omega \). For \(\omega \) and \(\eta \) as above, we say that \(\omega \) is compatible with \(\eta \) (notation: \(\omega \sim \eta \)) if \(\omega _b \in \eta _b\) for all \(b \subset [n]\).
Definition 2.1
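Let \(\mu \) be a probability distribution on \(S^n\). We say that \(\mu \) has a random cluster representation (RCR) with base \(\nu \) if, for all \(\omega \in S^n\),
$$\begin{aligned} \mu (\omega ) = \frac{1}{Z} \sum _{\eta \, : \, \eta \sim \omega } \nu (\eta ). \end{aligned} \tag{23}$$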
A hyperedge \(b\) is said to be active (w.r.t. \(\eta \)) if \(\eta _b \ne S^b\). Two vertices (elements of \([n]\)) \(v\) and \(w\) are said to be neighbours (w.r.t. \(\eta \)) if there is an active hyperedge \(b\) such that \(v \in b\) and \(w \in b\). This notion naturally gives rise to the notion of clusters: the cluster of \(v\) is the set which consists of \(v\) and all \(w \in [n]\) for which there exists a sequence \(b_1, \ldots , b_k\) of active hyperedges such that \(v \in b_1, w \in b_k\), and \(b_i \cap b_{i+1} \ne \emptyset , 1 \le i \le k-1\). To emphasize that these notions depend on \(\eta \), we speak of \(\eta \)-active, \(\eta \)-cluster, etc.
Remarks
(i)
This notion of random cluster representation is an abstraction of the usual notion in the literature. To indicate the correspondence with the usual notion, consider as an example the ferromagnetic Ising measure (10), with all \(h_i\)’s equal to \(0\), and each \(J_{i,j}\) either \(0\) or \(J\). The usual random cluster model for this measure, introduced by Fortuin and Kasteleyn (see e.g. [7]; see also [10] and Chapter 10 in [11]) assigns to each edge (pair of vertices \(i, j\) for which \(J_{i,j} \ne 0\)) the value ‘open’ or ‘closed’. The two endpoints of an open edge ‘receive’ the same spin value (i.e. both are \(+1\) or both are \(-1\)). In the language of our definition above, this is the same as assigning to the edge \((i,j)\) the ‘value’ \(\{(-1,-1), (+1, +1)\}\) (which corresponds to being ‘open’), or the value \(\{-1,+1\}^b\) (which corresponds to being ‘closed’). The base \(\nu \) in Definition 2.1 is, in this special case, a Bernoulli measure: each edge \(b = \{i,j\}\) with \(i \ne j\) and \(J_{i,j} >0\) has, independently of the other edges, \(\eta _b\) equal to \(\{(-1,-1), (+1, +1)\}\) with probability \(p\), and equal to \(\{-1,+1\}^b\) with probability \(1-p\) (where \(p = 1 - \exp (- 2 J)\) as in the Fortuin–Kasteleyn (FK) random-cluster measure). The ‘extra’ factor (\(2\) to the number of open clusters) in the FK random-cluster measure is missing in the equation for \(\nu \). This makes our computations involving \(\nu \) more elegant but has no fundamental consequences.
(ii)
Typically a measure \(\mu \) on \(S^n\) has more than one random cluster representation. For instance, taking
$$\begin{aligned} \nu (\eta ) = {\left\{ \begin{array}{ll} \mu (\omega ),&\text{ if } \eta _{[n]} = \{\omega \} \text{ and } \eta _b = S^b \text{ for all } b \ne [n], \\ 0,&\text{ otherwise,} \end{array}\right.} \end{aligned}$$
gives an RCR of \(\mu \) which is trivial and not useful.
(iii)
Generalized random-cluster representations are not only useful for the purposes in this paper but also interesting in themselves. Several properties will be studied in more detail in the separate paper [8].
(iv)
For \(q\)-state Potts models on the triangular lattice a generalization of the usual random-cluster model was obtained by Chayes and Lei (see Section 3.1 in [6]).
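The correspondence described in Remark (i) can be checked numerically on a small graph. The sketch below assumes that a random cluster representation means \(\mu (\omega ) \propto \sum _{\eta \sim \omega } \nu (\eta )\), and that the zero-field Ising measure is \(\mu (\omega ) \propto \exp (J \sum \omega _i \omega _j)\) with the sum over the edges; the edge list, the value of \(J\) and the function names are hypothetical.

```python
from itertools import product
from math import exp, isclose

def ising_zero_field(n, edges, J):
    """Assumed form: mu(w) proportional to exp(J * sum over edges of w_i w_j)."""
    weight = {w: exp(J * sum(w[i] * w[j] for i, j in edges))
              for w in product((-1, 1), repeat=n)}
    Z = sum(weight.values())
    return {w: x / Z for w, x in weight.items()}

def rcr_measure(n, edges, J):
    """Measure induced by the Bernoulli base nu: each edge is 'open' with
    probability p = 1 - exp(-2J), independently of the other edges."""
    p = 1 - exp(-2 * J)
    weight = {}
    for w in product((-1, 1), repeat=n):
        total = 0.0
        for status in product((True, False), repeat=len(edges)):
            nu = 1.0
            compatible = True
            for (i, j), is_open in zip(edges, status):
                nu *= p if is_open else 1 - p
                # Open edge: eta_b = {(-1,-1), (+1,+1)}; closed edge: eta_b = S^b.
                if is_open and w[i] != w[j]:
                    compatible = False
            if compatible:
                total += nu
        weight[w] = total
    Z = sum(weight.values())
    return {w: x / Z for w, x in weight.items()}

n, J = 4, 0.4
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
mu = ising_zero_field(n, edges, J)
mu_rcr = rcr_measure(n, edges, J)
print(all(isclose(mu[w], mu_rcr[w], rel_tol=1e-9) for w in mu))
```

The agreement reflects the identity \(\exp (J \omega _i \omega _j) = e^{J}\,(1-p)^{\mathbf{1}[\omega _i \ne \omega _j]}\) for \(p = 1 - e^{-2J}\): summing \(\nu \) over compatible \(\eta \) leaves a factor \(1-p\) for every edge whose endpoints disagree.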
2.2 Foldings
Let \(M \subset [n]\) and \(\alpha \in S^M\). Further, let \(\beta , \gamma \in S^{M^c}\) be such that \(\beta _i \ne \gamma _i\) for all \(i \in M^c\). Finally, let \(\mu \) be a distribution on \(S^n\). The following notion plays an important role in [3, 17], although there it appears less explicitly, in less generality, and without this terminology.
Definition 2.2
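The \((\alpha ;\beta ,\gamma )\)-folded version of \(\mu \) is the probability measure on \(\prod _{i \in M^c} \{\beta _i, \gamma _i\}\) given by
$$\begin{aligned} \mu ^{(\alpha ; \beta , \gamma )}(\omega ) = \frac{1}{Z}\, \mu (\alpha \circ \omega )\, \mu (\alpha \circ {\bar{\omega }}), \end{aligned} \tag{24}$$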
where \({\bar{\omega }}\) is the unique element of \(\prod _{i \in M^c} \{\beta _i, \gamma _i\}\) with the property that, for each \(i \in M^c, {\bar{\omega }}_i \ne \omega _i\). We call \(M\) the locked area of the folding.
Remarks
(i)
Instead of \((\alpha ;\beta ,\gamma )\)-folded version we will often say \((\alpha ;\beta ,\gamma )\)-folding (or, simply, folding).
(ii)
Although the definition of \({\bar{\omega }}\) depends on \(\beta \) and \(\gamma \) we do not show this in our notation; it will always be clear from the context which \(\beta \) and \(\gamma \) are meant. We will also use this notation in more generality: if \(V \subset M^c\) and \(\omega \in \prod _{i \in V} \{\beta _i, \gamma _i\}, \bar{\omega }\) is the unique element of \(\prod _{i \in V} \{\beta _i, \gamma _i\}\) with the property that, for each \(i \in V, {\bar{\omega }}_i \ne \omega _i\). And, if \(F \subset \prod _{i \in V} \{\beta _i, \gamma _i\}, \bar{F}\) is defined as the set \(\{{\bar{\omega }} \, : \, \omega \in F \}\).
(iii)
It is also obvious from the definition that if, for some indices \(i\), we replace \(\beta _i\) by \(\gamma _i\) and vice versa, this does not change the measure \(\mu ^{(\alpha ; \beta , \gamma )}\). In particular, if the set \(S\) has only two elements, the choice of \(\beta \) and \(\gamma \) is immaterial and therefore we simply write \(\mu ^{(\alpha )}\) in that case.
(iv)
We will be interested in random-cluster representations of foldings. Note that the base \(\nu \) of such a representation is a probability measure on
$$\begin{aligned} \prod _{b \subset M^c} \mathcal{P }\left(\prod _{i \in b} \{\beta _i, \gamma _i\}\right). \end{aligned}$$
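To see what a folding looks like in a concrete case, the following sketch computes the folding of a Curie–Weiss measure for \(S = \{-1,+1\}\). It assumes that the folded measure is proportional to \(\mu (\alpha \circ \omega )\, \mu (\alpha \circ {\bar{\omega }})\) and that the Curie–Weiss weight is \(\exp (J \sum _{i<j} \omega _i \omega _j + \sum _i h_i \omega _i)\); the function names and parameter values are hypothetical, and indices are zero-based. Under these assumptions the external fields and the interaction with the locked area cancel, so the folding is a zero-field Curie–Weiss measure on \(M^c\) with coupling \(2J\).

```python
from itertools import combinations, product
from math import exp, isclose

def curie_weiss_weight(w, J, h):
    """Unnormalized weight exp(J * sum_{i<j} w_i w_j + sum_i h_i w_i)."""
    n = len(w)
    return exp(J * sum(w[i] * w[j] for i, j in combinations(range(n), 2))
               + sum(h[i] * w[i] for i in range(n)))

def folding(n, M, alpha, J, h):
    """Assumed form for S = {-1,+1}: the folded measure at w is proportional
    to mu(alpha o w) * mu(alpha o w_bar), with w living on the complement of M."""
    free = [i for i in range(n) if i not in M]

    def glue(w):
        """The configuration alpha o w on all n indices."""
        full = dict(zip(M, alpha))
        full.update(zip(free, w))
        return tuple(full[i] for i in range(n))

    weight = {}
    for w in product((-1, 1), repeat=len(free)):
        w_bar = tuple(-x for x in w)
        weight[w] = (curie_weiss_weight(glue(w), J, h)
                     * curie_weiss_weight(glue(w_bar), J, h))
    Z = sum(weight.values())
    return {w: x / Z for w, x in weight.items()}

def zero_field(m, J):
    """Zero-field Curie-Weiss measure on m vertices with coupling J."""
    weight = {w: curie_weiss_weight(w, J, [0.0] * m)
              for w in product((-1, 1), repeat=m)}
    Z = sum(weight.values())
    return {w: x / Z for w, x in weight.items()}

n, J = 5, -0.6
h = [0.3, -1.0, 0.5, 0.0, 0.8]
M, alpha = (0, 3), (1, -1)           # locked area and its fixed spins
folded = folding(n, M, alpha, J, h)
target = zero_field(n - len(M), 2 * J)
print(all(isclose(folded[w], target[w], rel_tol=1e-9) for w in folded))
```

In particular the folded measure is invariant under \(\omega \mapsto {\bar{\omega }}\), which is immediate from the symmetric form of the product.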
2.3 A general form of restricted disjoint occurrence
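For \(A, B \subset S^n\) and \(\omega \in S^n\), we define the set of disjoint-occurrence pairs \(\mathcal{D }(A,B,\omega )\) as follows:
$$\begin{aligned} \mathcal{D }(A,B,\omega ) = \{(K, L) \, : \, K, L \subset [n],\ K \cap L = \emptyset ,\ [\omega ]_K \subset A \text{ and } [\omega ]_L \subset B\}. \end{aligned} \tag{25}$$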
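Note that \(A \square B\) can be written as
$$\begin{aligned} A \square B = \{\omega \in S^n \, : \, \mathcal{D }(A,B,\omega ) \ne \emptyset \}. \end{aligned} \tag{26}$$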
Restricted forms of the disjoint-occurrence operation can be obtained by replacing, in the r.h.s. of (26), \(\mathcal{D }(A,B,\omega )\) by a subset. We already did this in Sects. 1.2.3 and 1.2.4 for Ising and Potts models, and in Sect. 1.2.5 for other Gibbs measures. More generally, let \(\Psi \) be a map which assigns to each triple \((A,B,\omega )\) (where \(A, B \subset S^n\) and \(\omega \in S^n\)) a (possibly empty) subset of \(\mathcal{D }(A,B,\omega )\). Such a map will be called a selection rule. Now define the \(\Psi \)-restricted disjoint occurrence operation as follows:
$$\begin{aligned} A \boxminus B = \{\omega \in S^n \, : \, \Psi (A,B,\omega ) \ne \emptyset \}. \end{aligned} \tag{27}$$
This definition depends of course on the selection rule \(\Psi \). Although this dependence is not visible in the notation \(A \boxminus B\), it will always be clear from the context to which selection rule it refers. In fact, in the section on the ferromagnetic Ising model and that on Gibbs measures, we already used this notation [see (13) and (20), respectively], which, as we will see in Sect. 3, corresponds to certain particular choices of the selection rule. Note that if for \(\Psi \) we take the ‘obvious’ selection rule \(\Psi (A,B, \omega ) = \mathcal{D }(A, B, \omega )\), then \(A \boxminus B\) is simply \(A \square B\).
Our general theorem for (restricted) disjoint-occurrence is the following.
Theorem 2.3
Let \(A, B \subset S^n\), let \(\Psi \) be a selection rule and let \(\mu \) be a probability measure on \(S^n\). If, for each \(M \subset [n]\), each \(\alpha \in S^M\) and all \(\beta , \gamma \in S^{M^c}\) with \(\beta _i \ne \gamma _i\) for all \(i \in M^c\), the folding \(\mu ^{(\alpha ; \beta , \gamma )}\) has a random-cluster representation with base \(\nu ^{(\alpha ; \beta , \gamma )}\) which satisfies conditions (i) and (ii) below, then
$$\begin{aligned} \mu (A \boxminus B) \le \mu (A)\, \mu (B). \end{aligned} \tag{28}$$
Condition (i) (Symmetry): For \(\nu ^{(\alpha ; \beta , \gamma )}\)-almost every \(\eta \) and each \(b \subset M^c, \eta _b = {\bar{\eta }}_b\).
Condition (ii) (Separation): For all \(\omega \in \prod _{i \in M^c} \{\beta _i, \gamma _i\}\) for which \(\Psi (A,B,\alpha \circ \omega ) \ne \emptyset \), there is a pair \((K,L) \in \Psi (A,B,\alpha \circ \omega )\) such that for \(\nu ^{(\alpha ; \beta , \gamma )}\)-almost all \(\eta \sim \omega \) there is no element of \(K \cap M^c\) that belongs to the same \(\eta \)-cluster as an element of \(L \cap M^c\).
Proof
The proof substantially generalizes the overall structure of the proofs of Theorems 1.1 and 1.3 from Proposition 1.2 (in [17] and [3], respectively), and combines it with the notion of random-cluster representations.
It is clear that (28) can be written as
Let, for each \(M \subset [n]\), each \(\alpha \in S^M\), and all pairs \(\beta , \gamma \in S^{M^c}\) with \(\beta _i \ne \gamma _i, i \in M^c\),
It is clear that two such sets are either equal to each other or disjoint. Moreover, the union of all such sets is \(S^{n} \times S^{n}\). Hence, to prove (29) it is sufficient to prove that, for each of the above mentioned sets \(W^{(\alpha ; \beta , \gamma )}\),
By the definition of ‘foldings’ [see (24)], and that of random-cluster representations [see (23)] the l.h.s. of (30) is equal to
where \(Z\) corresponds to the normalizing factor in (24), and \(Z^{\prime }\) to the normalizing factor in (23).
Similarly, the r.h.s. of (30) is equal to
The theorem now follows if, for each \(\eta \) with \(\nu ^{(\alpha ; \beta , \gamma )}(\eta ) >0\), the cardinality of the set of \(\omega \)’s in the last line of (31) is smaller than or equal to that in the last line of (32). The following lemma states that this is indeed the case.
Lemma 2.4
Let \(\eta \) be such that \(\nu ^{(\alpha ; \beta , \gamma )}(\eta ) >0\). Then
Proof of Lemma 2.4
Let \(C_1, C_2, \ldots , C_k\) denote the \(\eta \)-clusters. (So, in particular, \((C_1, \ldots , C_k)\) is a partition of \(M^c\)). From Condition (i) in Theorem 2.3 (and the definition of \(\eta \)-clusters) it follows that if \(\omega \sim \eta \), and \(\sigma \in \prod _{i \in M^c} \{\beta _i, \gamma _i\}\) satisfies \(\sigma _{C_i} \in \{\omega _{C_i}, {\bar{\omega }}_{C_i}\}\) for all \(i = 1, \ldots , k\), then also \(\sigma \sim \eta \). Therefore it is sufficient to show that, for each \(\omega \sim \eta \),
Consider the map \(T : \prod _{1 \le i \le k} \{\omega _{C_i}, {\bar{\omega }}_{C_i}\} \rightarrow \{0,1\}^k\), defined by
This map is clearly one-to-one, and
Now let
Claim
- (i) The image under \(T\) of the set in the l.h.s. of (34) is contained in \(D \square E\).
- (ii) The image under \(T\) of the set in the r.h.s. of (34) is equal to \(D \cap {\bar{E}}\).
To see that the first claim holds, let \(\sigma \) be an element of the set in the l.h.s. of (34). Let
and define \(B(\alpha )\) analogously. By the definition of \(\Psi (A,B,\alpha \circ \sigma )\) and Condition (ii), it follows that there are disjoint subsets \(\{i_1, \ldots , i_l\}\) and \(\{j_1, \ldots , j_m\}\) of \(\{1, \ldots , k\}\) such that
From (37) and the definition of the map \(T\) it follows that \([T(\sigma )]_{\{i_1,\ldots , i_l\}} \subset D\) and \([T(\sigma )]_{\{j_1,\ldots , j_m\}} \subset E\), and hence that \(T(\sigma ) \in D \square E\). This shows that Claim (i) holds. Checking Claim (ii) is straightforward.
Lemma 2.4 is now obtained as follows: By Claim (i) and because \(T\) is a one-to-one map, the l.h.s. of (34) is at most \(|D \square E|\), which by Proposition 1.2 is at most \(|D \cap {\bar{E}}|\), which by Claim (ii) (and again the fact that \(T\) is one-to-one) is equal to the r.h.s. of (34). This completes the proof of Lemma 2.4.\(\square \)
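The combinatorial input used in the last step, \(|D \square E| \le |D \cap {\bar{E}}|\) for \(D, E \subset \{0,1\}^k\), can be checked numerically for very small \(k\). The sketch below is only an illustration under stated assumptions: \({\bar{E}}\) is taken to be the set of coordinate-wise flipped elements of \(E\), indices are 0-based, and all function names are hypothetical.

```python
import random
from itertools import combinations, product

def box(D, E, k):
    """D box E on {0,1}^k, computed directly from the definition."""
    points = list(product((0, 1), repeat=k))
    subsets = [frozenset(c) for r in range(k + 1)
               for c in combinations(range(k), r)]

    def cyl(event, w, K):
        # every point agreeing with w on K must belong to event
        return all(p in event for p in points
                   if all(p[i] == w[i] for i in K))

    return {w for w in points
            if any(not (K & L) and cyl(D, w, K) and cyl(E, w, L)
                   for K in subsets for L in subsets)}

def flip(E):
    """The set of coordinate-wise flipped configurations of E."""
    return {tuple(1 - x for x in w) for w in E}

def reimer_holds(D, E, k):
    """Check |D box E| <= |D intersected with flip(E)| for one pair of events."""
    return len(box(D, E, k)) <= len(D & flip(E))

def all_events(k):
    """Every subset of {0,1}^k (only feasible for very small k)."""
    points = list(product((0, 1), repeat=k))
    return [set(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]
```

For \(k = 2\) all 256 pairs of events can be enumerated; for larger \(k\) one would sample pairs at random, since the number of events grows doubly exponentially.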
As we saw before, Lemma 2.4 completes the proof of Theorem 2.3.\(\square \)
3 RCR for Gibbs measures, and proofs of Theorems 1.6–1.10
3.1 Proof of Theorem 1.10 from Theorem 2.3
We start with the following general result for random-cluster representations of Gibbs measures, which is also interesting in itself. See Sect. 2.1 for the definition of RCR and Sect. 1.2.5 for notation and terminology related to Gibbs measures.
First some definitions: We say that \(\eta _b \subset S^b\) is monotone (w.r.t. the potential \(\Phi \)) if \(\omega \in \eta _b\) and \(\Phi _b(\omega ^{\prime }) \ge \Phi _b(\omega )\) implies \(\omega ^{\prime } \in \eta _b\). We say that the collection \(\eta = (\eta _b, b \subset [n])\) is monotone if each \(\eta _b, b \subset [n]\), is monotone. Finally, we say that a probability measure \(\nu \) on the set \(\prod _{b \subset [n]} \mathcal{P }(S^b)\) is monotone if it is concentrated on the set of monotone \(\eta \)’s (i.e. if \(\nu (\eta ) = 0\) whenever \(\eta \) is not monotone).
Lemma 3.1
Let \(\Phi \) be a potential for the configuration space \(S^n\), as defined in Sect. 1.2.5, and let \(\mu \) be the Gibbs measure on \(S^n\) for the potential \(\Phi \). Then \(\mu \) has an RCR with base \(\nu \) given by
(In (38) we define the maximum over an empty set to be \(0\).)
Remark
Although we do not need the explicit form (38) for the proof of Theorem 1.10 (only the monotonicity of \(\nu \) is needed), this form may be of interest in itself. Note from this form that \(\nu \) is a product measure (where the product is over all edges \(b\) for which \(\Phi _b\) is non-constant).
Proof of Lemma 3.1
We have to show that \(\nu \) is indeed the base of an RCR for \(\mu \). So let \(\omega \in S^n\). We have
where the symbol \(*\) in the second line indicates that the sum is over all monotone \(\eta \) (in the set \(\prod _{b \subset [n]} \mathcal{P }(S^b)\)), and the \(*\) in the third line indicates that the sum is over monotone \(\beta \). It is easy to see (from the monotonicity property of \(\beta \)) that in this last sum everything cancels except the term \(\exp (\Phi _b(\omega _b))\). Hence
which completes the proof. \(\square \)
Now we start with the proof of Theorem 1.10.
Proof of Theorem 1.10
Let \(\Phi \) and \(\mu \) be as in the statement of the theorem. First note that the definition of \(A \boxminus B\) in Theorem 1.10 is consistent with the general definition (27) of the \(\boxminus \) operation. To see this, we just take the selection rule \(\Psi \) as follows.
with \(C(K)\) as defined in the paragraph below (19).
Let \(M \subset [n], \alpha \in S^M\), and \(\beta , \gamma \in S^{M^c}\) with \(\beta _i \ne \gamma _i, \, i \in M^c\). Recall the definition of the folded measure \(\mu ^{(\alpha ; \beta , \gamma )}\) in (24). By (21), \(\mu ^{(\alpha ; \beta , \gamma )}\) can be written as
for \(\omega \in \prod _{i \in M^c} \{\beta _i, \gamma _i\}\).
From this form it is clear that \(\mu ^{(\alpha ; \beta , \gamma )}\) is the Gibbs measure with the following potential \(\tilde{\Phi }\):
Note that
Since \(\mu ^{(\alpha ; \beta , \gamma )}\) is the Gibbs measure for the potential \(\tilde{\Phi }\), we have by Lemma 3.1 that it has an RCR with base \(\nu ^{(\alpha ; \beta , \gamma )}\) which is monotone w.r.t. \(\tilde{\Phi }\). To prove Theorem 1.10 it is sufficient to show that this RCR satisfies Conditions (i) and (ii) in the statement of Theorem 2.3. Condition (i) follows immediately from the above mentioned monotonicity property of \(\nu ^{(\alpha ; \beta , \gamma )}\) and from the symmetry property (41).
Now we show that Condition (ii) also holds: Let \(\omega \in \prod _{i \in M^c} \{\beta _i, \gamma _i\}\) and let \((K,L) \in \Psi (A, B, \alpha \circ \omega )\). Hence (by the way we chose \(\Psi \))
Let \(\eta \sim \omega \) be such that \(\nu ^{(\alpha ; \beta , \gamma )}(\eta ) > 0\). It is sufficient to show that no element of \(K \cap M^c\) belongs to the same \(\eta \)-cluster as an element of \(L \cap M^c\). To do this, it is, by (42), sufficient to show that every \(b \subset M^c\) which satisfies
is inactive (w.r.t. \(\eta \)).
So, let \(b \subset M^c\) satisfy (43). Let \(b^{\prime }\subset [n]\) be such that \(b^{\prime }\supset b\). From the definition of \(C(K)\) it follows that \(b^{\prime }\) is inefficient (w.r.t. \(\alpha \circ \omega \)). From the definition (19) of ‘inefficient’ it follows (by replacing in (19) \(b\) by \(b^{\prime }\), \(\omega _b\) by \((\alpha \circ \omega )_{b^{\prime }}\), \(\sigma \) by \((\alpha \circ {\bar{\omega }})_{b^{\prime }}\), and \(N\) by \(\{ i \in b^{\prime }\!\setminus \! M \, : \, \delta _i = \omega _i \}\)) that
for all \(\delta \in \prod _{i \in b^{\prime }\setminus M} \{\beta _i, \gamma _i\}\). Applying this to each term in the r.h.s. of (40) gives
From this (and because \(\eta _b\) is monotone and \(\omega _b \in \eta _b\)) it follows immediately that each \(\delta \in \prod _{i \in b} \{\beta _i, \gamma _i\}\) belongs to \(\eta _b\). Hence \(b\) is not active.
As explained above, this completes the proof of Theorem 1.10. \(\square \)
3.2 Proof of Theorem 1.6 from Theorem 1.10
Proof
The Ising distribution (10) is clearly a Gibbs measure with respect to the potential \(\Phi \) given by:
Let \(b \subset [n]\) and \(\omega \in \{-1,+1\}^n\). Suppose \(b\) is not inefficient w.r.t. \(\omega \) (in the sense of definition (19), with \(\Phi \) as in (44)). It then follows from the definitions that \(b\) is of the form \(\{i,j\}\) for some \(i, j \in [n]\) with \(i \ne j\) and \(J_{i j} > 0\), and, moreover, that for some \(x, y \in \{-1, +1\}\)
Hence \(\omega _i = \omega _j\).
Vice versa, if \(J_{i j} > 0\) and \(\omega _i = \omega _j\) then it follows similarly that \(\{i,j\}\) is not inefficient. This shows that the notion of clusters in Sect. 1.2.5 (with \(\Phi \) given by (44)) is the same as that in Sect. 1.2.3. But then the meaning of \(\boxminus \) in the two sections is also the same, and Theorem 1.6 is a special case of Theorem 1.10. \(\square \)
3.3 Proof of Theorem 1.9 from Theorem 1.10
Proof
The argument is quite similar to that for the ferromagnetic Ising model in the previous section. First note that the antiferromagnetic Potts measure, (17) with all \(J_{i j}\)’s non-positive, is a Gibbs measure on \(S^n\) with potential function \(\Phi \) given by
We will use the notion of inefficient in the sense of definition (19), with \(\Phi \) as in (45).
Let \(b \subset [n]\) and \(\omega \in S^n\). It follows immediately from the definitions that if \(|b| \ne 2\) or \(b\) is of the form \(\{i,j\}, i \ne j\), with \(J_{i j} = 0\), then \(b\) is inefficient.
Now suppose that \(b\) is of the form \(\{i, j\}, i \ne j\), with \(J_{i j} < 0\) and that \(\omega _i = \omega _j\). We claim that in that case \(b\) is also inefficient. If this claim holds, the above considerations imply that if two vertices are in different clusters in the sense of Sect. 1.2.4, then they also are in different clusters in the sense of Sect. 1.2.5. That, in turn, implies that \(A \boxminus B\) as defined in the former section is contained in \(A \boxminus B\) as defined in the latter section, so that Theorem 1.9 follows indeed from Theorem 1.10. By the definition of ‘inefficient’ and the form of the potential \(\Phi \) in (45), to prove the claim it suffices to show that (recall that \(J_{i j} < 0\)) for all \(x, y \in S\)
This last inequality can be checked straightforwardly: Since the first term in the l.h.s. is \(1\), the inequality can only fail if the r.h.s. equals \(2\), i.e. if both terms in the r.h.s. are \(1\). However, it follows immediately that in that case \(\omega _i, \omega _j, x\) and \(y\) are all equal, so that the l.h.s. is also equal to \(2\).
This completes the proof of the claim, and thus that of Theorem 1.9. \(\square \)
4 Permutation invariance and proof of Theorem 1.4
We first state the following corollary of the general Theorem 2.3. Recall from Sect. 2.2 that if \(\mu \) is a distribution on \(\{0,1\}^n, M \subset [n]\) and \(\alpha \in \{0,1\}^M\), the base of an RCR of the folding \(\mu ^{(\alpha )}\) is a distribution on the set of all \(\eta \) of the form \((\eta _b, \, b \subset M^c)\) with each \(\eta _b\) a subset of \(\{0,1\}^b\).
Corollary 4.1
Let \(\mu \) be a probability distribution on \(\{0,1\}^n\) such that for each \(M \subset [n]\) and each \(\alpha \in \{0,1\}^M\), the folding \(\mu ^{(\alpha )}\) has a random-cluster representation with base \(\nu ^{(\alpha )}\) such that \(\nu ^{(\alpha )}\)-almost every \(\eta \) has the following two properties, (a) and (b) below.
- Property (a): Every \(\eta \)-active \(b\) has \(|b| = 2\) and \(\eta _b = \{(0,1), \, (1,0) \}\).
- Property (b): If \(b\) and \(b^{\prime }\) are \(\eta \)-active, then \(b=b^{\prime }\) or \(b \cap b^{\prime } = \emptyset \).
Then \(\mu \) has the BK-property.
Proof
Let \(A\) and \(B\) be increasing subsets of \(\{0,1\}^n\). Let \(\Psi \) be the following selection rule:
It is easy to see that, since \(A\) and \(B\) are increasing,
Recall that if \(W\) is a finite set and \(\omega \in \{0,1\}^W\), we use the notation \(|\omega |\) for \(\sum _{i \in W} \omega _i\). Now let \(M \subset [n]\) and \(\alpha \in \{0,1\}^M\). We will show that the base \(\nu ^{(\alpha )}\) satisfies Conditions (i) and (ii) of Theorem 2.3. First of all, by Property (a) above it follows immediately that Condition (i) is satisfied. Further, let \(\omega \in \{0,1\}^{M^c}\), let \((K, L) \in \Psi (A,B, \alpha \circ \omega )\), and let \(\eta \) be such that \(\nu ^{(\alpha )}(\eta ) >0\) and \(\eta \sim \omega \). Suppose that \(K \cap M^c\) and \(L \cap M^c\) have an element in the same \(\eta \)-cluster. From Property (b) in the statement of the corollary it follows immediately that then there is an \(\eta \)-active \(b\) such that \(K \cap b\) and \(L \cap b\) are non-empty. Since \(K \cap L = \emptyset \) and \(\omega \equiv 1\) on \(K \cup L\) it follows that \(|\omega _b| \ge 2\). However, this contradicts Property (a) in the statement of the corollary, which (since \(\eta \sim \omega \)) forces \(|\omega _b| = 1\). Hence \(K \cap M^c\) and \(L \cap M^c\) have no element in the same \(\eta \)-cluster. So Condition (ii) in Theorem 2.3 is also satisfied, and it follows from that theorem that \(\mu (A \square B) \le \mu (A) \mu (B).\) \(\square \)
Lemma 4.2
Let \(\mu \) be a symmetric, permutation-invariant distribution on \(\{0,1\}^n\). (That is, \(\sum _{\omega \in \{0,1\}^n} \mu (\omega ) = 1\) and there are \(p_0, \ldots , p_{\lfloor n/2 \rfloor } \ge 0\) such that for all \(\omega \in \{0,1\}^n\) with \(|\omega | \le \lfloor n/2 \rfloor , \mu (\omega ) = \mu ({\bar{\omega }}) = p_{|\omega |}\)). Suppose there exist \(\xi _j \ge 0, j = 0,\ldots , \lfloor n/2 \rfloor \), such that the following equations hold:
with
Then \(\mu \) has a random-cluster representation with base \(\nu \) such that \(\nu \)-almost every \(\eta \) satisfies properties (a) and (b) in Corollary 4.1.
Proof
First we prove the following.
Claim
Let \(0 \le j \le k \le n/2\). Let \(\omega \in \{0,1\}^n\) with \(|\omega | = k\). Then the number of \(\eta \sim \omega \) which have exactly \(j\) active edges and satisfy properties (a) and (b) in Corollary 4.1 is equal to \(a_{k, j}\).
The proof of this claim is a rather straightforward application of elementary combinatorics and we only give a brief sketch. Let \(V \subset [n]\) be the set of indices \(v\) for which \(\omega _v =1\). ‘Constructing’ an \(\eta \) of the form in the claim corresponds to choosing a subset \(W\) of size \(j\) of \(V\), and ‘pairing’ each \(w \in W\) with an index \(w^{\prime }\in V^c\), with distinct \(w\) receiving distinct \(w^{\prime }\). (Each such pair \(\{w,w^{\prime }\}\) corresponds to an active edge of \(\eta \)). Since \(|V| = k\), there are \(\binom{k}{j}\) ways to choose \(W\). Next, for each choice of \(W\) there are (since \(|V^c| = n-k\)) \(\frac{(n-k)!}{(n-k-j)!}\) ways to assign to each \(w \in W\) a \(w^{\prime } \in V^c\). So the number of \(\eta \)’s of the form in the claim is
$$\begin{aligned} \binom{k}{j} \, \frac{(n-k)!}{(n-k-j)!}, \end{aligned}$$
which indeed equals \(a_{k, j}\), completing the proof of the claim.
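The count in the claim can be confirmed by direct enumeration for small \(n\). The sketch below assumes that an \(\eta \) satisfying properties (a) and (b) and compatible with \(\omega \) is determined by its set of active edges, each joining an index where \(\omega \) is \(1\) to an index where \(\omega \) is \(0\), with no two edges sharing an endpoint. The function names are hypothetical, and `a(n, k, j)` encodes the closed form \(\binom{k}{j} \frac{(n-k)!}{(n-k-j)!}\) derived in the sketch of the proof above.

```python
from itertools import combinations
from math import comb, factorial

def a(n, k, j):
    """Closed form binom(k, j) * (n-k)! / (n-k-j)!, for 0 <= j <= k <= n/2."""
    return comb(k, j) * factorial(n - k) // factorial(n - k - j)

def count_etas(omega, j):
    """Count sets of j pairwise disjoint edges joining a 1-index to a 0-index."""
    n = len(omega)
    ones = [i for i in range(n) if omega[i] == 1]
    zeros = [i for i in range(n) if omega[i] == 0]
    edges = [(v, w) for v in ones for w in zeros]
    count = 0
    for chosen in combinations(edges, j):
        endpoints = [i for e in chosen for i in e]
        if len(set(endpoints)) == 2 * j:  # no endpoint is used twice
            count += 1
    return count
```

For example, with \(n = 6\) and \(\omega = (1,1,1,0,0,0)\), comparing `count_etas(omega, j)` with `a(6, 3, j)` for \(j = 0, \ldots , 3\) tests the claim in the case \(k = 3\).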
Now we continue with the proof of Lemma 4.2. Let \(\nu \) be the probability distribution which assigns to each \(\eta \) probability
where we use the notation \(|\eta |\) for the number of \(\eta \)-active edges, and \(Z\) is a normalizing constant. Now let \(\omega \in \{0,1\}^n\) with \(|\omega | = k \le n/2\). We have
which, by the definition of \(\nu \) and by the claim in the beginning of this proof, equals
which by (46) is equal to \(p_k/Z\) and hence to \(\mu (\omega )/Z\). Further, if \(|\omega | \ge n/2\), then \(|{\bar{\omega }}| \le n/2\), and, using the above, we get a similar result as follows:
where in the last equation we used that (with the above choice of \(\nu \)) \(\nu \)-almost every \(\eta \) is compatible with \(\omega \) if and only if it is compatible with \({\bar{\omega }}\). Hence \(\nu \) is indeed the base of an RCR for \(\mu \). From the definition of \(\nu \) it is trivial that \(\nu \)-almost every \(\eta \) satisfies (a) and (b) in Corollary 4.1.
This completes the proof of Lemma 4.2.
4.1 Proof of Theorem 1.4
Proof
Let \(M \subset [n]\) and \(\alpha \in \{-1,+1\}^M\). Denote \(|M|\) by \(m\). From the definitions it follows that the folded measure \(\mu ^{(\alpha )}\) is given by
where the first equality holds because the contributions from the external fields for \(\omega \) and \({\bar{\omega }}\) cancel, and the contributions from the ‘interaction’ with \(\alpha \) for \(\omega \) and \({\bar{ \omega }}\) also cancel, and where we used the notation \(|\omega |\) for the number of \(i \in M^c\) with \(\omega _i = +1\). This distribution is clearly symmetric and permutation-invariant in the sense of Lemma 4.2 (with the ‘spin value’ \(0\) replaced by \(-1\), but that is of course immaterial). Writing \(x\) for \(\exp ( -4 J)\) and \(k\) for \(|\omega |\), the last expression in (48) (apart from the constant factor \(1/{\tilde{Z}}\)) becomes \(x^{k (n-m-k)}\). So, by Lemma 4.2 and Corollary 4.1, it is sufficient to prove the following.
Lemma 4.3
For each \(n\) and each \(x \ge 1\), the following system of linear equations has a non-negative solution \((\xi _j, 0 \le j \le n/2)\):
where \(a_{k j}\) is given by (47) if \(0 \le j \le k \le n/2\) and equal to \(0\) otherwise.
Proof of Lemma 4.3
We start with some simple observations. First of all, since the matrix entries \(a_{k j}\) in the system of equations (49) are non-zero if and only if \(j \le k\), the matrix has an inverse \((a^{(-1)}_{j,k})_{0 \le j, k \le n/2}\) and the system of equations has a unique solution, which we denote by \(\xi _j(x), 0 \le j \le n/2\).
So we have to prove that if \(x \ge 1\), then \(\xi _j(x) \ge 0\) for all \(j\). From now on we restrict to \(x \ge 1\).
Now observe that, since \(a_{k 0} = 1\) for all \(k\), it follows immediately that
and
We will study the derivatives of \(\xi _j(x)\) for \(j \ge 1\). First we define, for a real function \(f\),
Next we define
The key ingredient of the proof of Lemma 4.3 is the following claim.
Claim 4.4
For each \(j \le n/2\) and each \(r = 1, 2, \ldots , j-1\),
Before we prove the claim we show how it is used to prove Lemma 4.3. Taking \(r = j-1\) in the claim gives
This, together with the obvious facts that \(a^{(-1)}_{j, j-1} = 1/a_{j j}\) and
gives
From (54) we have
Using this and the definition of \(\mathcal{D }\) we can now go ‘step by step backwards’, starting from (55), as follows. From the definition we have that
which by (55) is \(\ge 0\). Since we also have, by (56), that \({\mathcal{D }}^{j-2, \, j}(1) = 0\), it follows that
Repeating this argument for \(j-3, j-4\), and so on, we eventually get that
Now recall that the l.h.s. of this last expression is, by definition, \(\frac{d}{d x} \xi _j(x)\). Also recall [see (51)] that \(\xi _j(1) = 0\). Hence \(\xi _j(x) \ge 0\) for all \(x \ge 1\), which is the statement of the lemma.
So the only thing which still has to be done is to prove Claim 4.4. This is done by induction. First note that if \(r=1\) then, by the definitions (52) and (53), the l.h.s. of (54) is just \(\frac{d}{dx} \xi _j(x)\), which we can write as
where the third equality uses the definition (47) of \(a_{k j}\). Since the last expression in (57) is equal to the r.h.s. of (54) (for \(r=1\)), this shows that the claim holds for \(r = 1\). Now suppose the claim holds for \(r-1\). We show that then it also holds for \(r\): By the induction hypothesis [and the definition (53)], the l.h.s. of (54) can be written as
where the first equality follows from the definition (52) of \(D_r\), and the last from simple manipulations and the definition (47). Since the last expression in (58) is equal to the r.h.s. of (54), this completes the proof of Claim 4.4. As we pointed out before, this also completes the proof of Lemma 4.3.
Finally, as we explained before the statement of Lemma 4.3, this completes the proof of Theorem 1.4.
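The statement of Lemma 4.3 can be explored numerically. The sketch below rests on two assumptions that are not reproduced verbatim in the text above: that the system (49) reads \(\sum _{j} a_{k j} \xi _j = x^{k(n-k)}\) for \(0 \le k \le n/2\) (suggested by the expression \(x^{k(n-m-k)}\) in the proof of Theorem 1.4), and that \(a_{k j} = \binom{k}{j} \frac{(n-k)!}{(n-k-j)!}\) as in the claim in the proof of Lemma 4.2. The function names are hypothetical. Because the matrix is lower triangular with non-zero diagonal, forward substitution with exact rational arithmetic is enough.

```python
from fractions import Fraction
from math import comb, factorial

def a(n, k, j):
    """Assumed matrix entry a_{kj}; zero unless 0 <= j <= k <= n/2."""
    if 0 <= j <= k <= n // 2:
        return comb(k, j) * factorial(n - k) // factorial(n - k - j)
    return 0

def solve_xi(n, x):
    """Solve sum_j a_{kj} xi_j = x^(k(n-k)), k = 0..floor(n/2), exactly."""
    x = Fraction(x)
    xi = []
    for k in range(n // 2 + 1):
        rhs = x ** (k * (n - k))
        # contribution of the already computed xi_0, ..., xi_{k-1}
        partial = sum(a(n, k, j) * xi[j] for j in range(k))
        xi.append((rhs - partial) / a(n, k, k))
    return xi
```

For \(n = 2\) the solution is \(\xi _0 = 1\) and \(\xi _1 = x - 1\), which is non-negative exactly when \(x \ge 1\); this matches the role of the antiferromagnetic assumption, since \(x = \exp (-4J) \ge 1\) when \(J \le 0\).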
4.2 Proof of Theorem 1.5
Proof
The negative lattice condition can be expressed in terms of the foldings as follows. For \(\omega , \omega ^{\prime } \in \Omega \), let \(M=\{i \in [n]\, : \, \omega _i = \omega ^{\prime }_i\}\) and let \(\alpha = \omega _M\). Then \(\mu (\omega )\mu (\omega ^{\prime })=\mu ^{(\alpha )}(\omega _{M^c})\). Moreover, if \(\hat{\omega } \in \{-1, +1\}^{M^c}\) is such that \(\hat{\omega }_i \equiv 1\), then \(\mu (\omega \vee \omega ^{\prime }) \, \mu (\omega \wedge \omega ^{\prime })=\mu ^{(\alpha )}(\hat{\omega })\). The negative lattice condition is then equivalent to
For \(\mu \) as in (8), it is easy to see that in each folding the interaction between an odd number of spins vanishes, and that what is left is a permutation invariant model with interactions expressed in terms of products of two spin values. More precisely, the folding is of the form
for a suitable \(x\) (which depends on \(\alpha \)), where \(n^{\prime }=|M^c|\) and \(k^{\prime }=|\sigma |\).
From (59) and (60) it follows that \(x \ge 1\). Hence, by Lemma 4.3 there is a nonnegative solution to the system of equations (49). This implies that \(\mu ^{(\alpha )}\) satisfies the conditions of Lemma 4.2, so that Corollary 4.1 can be applied to yield Theorem 1.5.
\(\square \)
References
van den Berg, J., Kesten, H.: Inequalities with applications to percolation and reliability. J. Appl. Probab. 22, 556–569 (1985)
van den Berg, J., Fiebig, U.: On a combinatorial conjecture concerning disjoint occurrences of events. Ann. Probab. 15, 354–374 (1987)
van den Berg, J., Jonasson, J.: A BK inequality for randomly drawn subsets of fixed size. Probab. Theory Relat. Fields (2011). doi:10.1007/s00440-011-0386-z
Borgs, C., Chayes, J.T.: On the covariance matrix of the Potts model: a random-cluster analysis. J. Stat. Phys. 82, 1235–1297 (1996)
Borgs, C., Chayes, J.T., Randall, D.: The van den Berg-Kesten-Reimer inequality: a review. In: Bramson, M., Durrett, R. (eds.) Perplexing Problems in Probability (Festschrift in honor of Harry Kesten), pp. 159–175 (1999)
Chayes, L., Lei, H.K.: Random cluster models on the triangular lattice. J. Stat. Phys. 122, 647–670 (2006)
Fortuin, C.M., Kasteleyn, P.W.: On the random-cluster model. I. Introduction and relation to other models. Physica 57, 536–564 (1972)
Gandolfi, A.: Random cluster representations and foldings of a finite probability (in preparation)
Grimmett, G.R.: Percolative problems. In: Grimmett, G.R. (ed.) Probability and Phase Transition, pp. 69–86. Kluwer, Dordrecht (1994)
Grimmett, G.R.: The Random-Cluster Model. Springer, Berlin (2006)
Grimmett, G.R.: Probability on Graphs. IMS Textbooks. Cambridge University Press, Cambridge (2010)
Jonasson, J.: The BK inequality for pivotal sampling (2012, preprint). http://www.math.chalmers.se/jonasson
Kahn, J., Saks, M., Smyth, C.: The dual BKR inequality and Rudich’s conjecture. Combin. Probab. Comput. 20(2), 257–266 (2011)
Liggett, T.M., Steif, J.E., Tóth, B.: Statistical mechanics systems on complete graphs, infinite exchangeability, finite extensions and a discrete finite moment problem. Ann. Probab. 35(3), 867–914 (2007)
Markström, K.: Closure properties and negatively associated measures violating the van den Berg–Kesten inequality. Elect. Comm. Probab. 15, 449–456 (2009)
Pemantle, R.: Towards a theory of negative dependence. J. Math. Phys. 41, 1371–1390 (2000)
Reimer, D.: Proof of the Van den Berg–Kesten conjecture. Combin. Probab. Comput. 9, 27–32 (2000)
Talagrand, M.: Some remarks on the Berg–Kesten inequality. In: Probability in Banach Spaces, Vol. 9, pp. 293–297. Birkhäuser, Boston (1994)
Cite this article
van den Berg, J., Gandolfi, A. BK-type inequalities and generalized random-cluster representations. Probab. Theory Relat. Fields 157, 157–181 (2013). https://doi.org/10.1007/s00440-012-0452-1
DOI: https://doi.org/10.1007/s00440-012-0452-1
Keywords
- BK inequality
- Negative dependence
- Random-cluster representation
- Gibbs distribution
- Arm events
- Curie-Weiss model
- Foldings
Mathematics Subject Classification (2010)
- 60C05
- 60K35
- 82B20