What Can be Sampled Locally?

The local computation of Linial [FOCS'87] and Naor and Stockmeyer [STOC'93] concerns the question of whether a locally definable distributed computing problem can be solved locally: more specifically, for a given local CSP (Constraint Satisfaction Problem), whether a CSP solution can be constructed by a distributed algorithm using local information. In this paper, we consider the problem of sampling a uniform CSP solution by distributed algorithms, and ask whether a locally definable joint distribution can be sampled from locally. More broadly, we consider sampling from Gibbs distributions induced by weighted local CSPs, especially the Markov random fields (MRFs), in the LOCAL model. We give two Markov chain based distributed algorithms which we believe represent two fundamental approaches for sampling from Gibbs distributions via distributed algorithms. The first algorithm generically parallelizes the single-site sequential Markov chain by updating in each step the variables from a random independent set in parallel, and achieves an O(Δ log n) time upper bound in the LOCAL model, where Δ is the maximum degree, when Dobrushin's condition for the Gibbs distribution is satisfied. The second algorithm is a novel parallel Markov chain which proposes to update all variables simultaneously yet still guarantees to converge correctly with no bias. It surprisingly parallelizes an intrinsically sequential process: stabilizing to a joint distribution with massive local dependencies, and may achieve an optimal O(log n) time upper bound independent of the maximum degree Δ under a stronger mixing condition. We also show a strong Ω(diam) lower bound for sampling: in particular for sampling independent sets in graphs with maximum degree Δ ≥ 6. Independent sets are trivial to construct locally, and the sampling lower bound holds even when every node is aware of the entire graph. This gives a strong separation between sampling and constructing locally checkable labelings.


Introduction
Local computation and the LOCAL model. Locality of computation is a central theme in the theory of distributed computing. In the seminal works of Linial [44], and Naor and Stockmeyer [49], the locality of distributed computation and the locally definable distributed computing problems are respectively captured by the LOCAL model and the notion of locally checkable labeling (LCL) problems. In the LOCAL model [49,52], a network of n processors is represented as an undirected graph, where each vertex represents a processor and each edge represents a bidirectional communication channel. Computations and communications are organized in synchronized rounds. In each round, each processor may receive a message of arbitrary size from each of its neighbors, perform an arbitrary local computation with the information collected so far, and send a message of arbitrary size to each of its neighbors. The output value for each vertex in a t-round protocol is determined by the local information within the t-neighborhood of the vertex. The local computation tasks are usually formulated as labeling problems, such as the locally checkable labeling (LCL) problems introduced in [49], in which the distributed algorithm is asked to construct a feasible solution of a constraint satisfaction problem (CSP) defined by local constraints with constant diameter in the network. Many problems can be expressed in this way, including various vertex/edge colorings, or local optimizations such as maximal independent set (MIS) and maximal matching.
The local sampling problem. Given an LCL problem which defines a local CSP on the network, aside from constructing a feasible solution of the local CSP, another interesting problem is to sample a uniform random feasible solution, e.g. to sample a uniform random proper coloring of the network G with a given number of colors. More abstractly, given an instance of local CSP which, say, treats the vertices in the network G(V, E) as variables, a joint distribution of a uniform random feasible solution X = (X_v)_{v∈V} is accordingly defined by these local constraints. Our main question is whether a locally definable joint distribution can be sampled from locally.
Intuitively, sampling could be substantially more difficult than labeling, because to sample a feasible solution is at least as difficult as to construct one, and furthermore, the marginal distribution of each random variable X_v in a jointly distributed feasible solution X = (X_v)_{v∈V} may already encapsulate a certain amount of non-local information about the solution space.
Retrieving such information about the solution space (as in sampling) instead of constructing one solution (as in labeling) by distributed algorithms is especially well motivated in the context of distributed machine learning [50,17,57,62,32,15,14,63,61], where the data (the description of the joint distribution) is usually distributed among a large number of servers.
Besides uniform distributions, it is also natural to consider sampling from general non-uniform distributions over the solution space, which are usually formulated as graphical models known as the weighted CSPs [7], also known as factor graphs [47]. In this model, a probability distribution called the Gibbs distribution is defined over the space Ω = [q]^V of configurations, in such a way that each constraint of the weighted CSP contributes a nonnegative factor in the probability measure of a configuration in Ω. Due to Hammersley-Clifford's fundamental theorem [47, Theorem 9.3] of random fields, this model is universal for conditionally independent (spatial Markovian) [47, Proposition 9.2] joint distributions. The conditional independence property roughly says that, for a fixed separator S ⊂ V whose removal "disconnects" the variable sets A and B, given any feasible configuration X_S = σ_S over S, the configurations X_A over A and X_B over B are conditionally independent.
We are particularly interested in a basic class of weighted local CSPs, namely the Markov random fields (MRFs), where every local constraint (factor) is either a binary constraint over an edge or a unary constraint on a vertex. Specifically, given a graph G(V, E) and a finite domain [q] = {1, 2, ..., q}, the probability measure µ(σ) of each configuration σ ∈ [q]^V under the Gibbs distribution µ is defined to be proportional to the weight:
$$ w(\sigma) = \prod_{e=(u,v)\in E} A_e(\sigma_u, \sigma_v) \prod_{v\in V} b_v(\sigma_v), \tag{1} $$
where {A_e ∈ R^{q×q}_{≥0}}_{e∈E} are non-negative q × q symmetric matrices and {b_v ∈ R^q_{≥0}}_{v∈V} are non-negative q-vectors, both specified by the instance of MRF. Examples of MRFs include combinatorial models such as independent set, vertex cover, graph coloring, and graph homomorphism, or physical models such as hardcore gas model, Ising model, Potts model, and general spin systems.

Our results
We give two Markov chain based distributed algorithms for sampling from Gibbs distributions. Given any ε > 0, each algorithm returns a random output which is within total variation distance ε from the Gibbs distribution. Our expositions mainly focus on MRFs, although both algorithms can be extended straightforwardly to general weighted local CSPs.
In classic single-site Markov chains for Gibbs sampling, such as the Glauber dynamics, at each step a variable is picked at random and is updated according to its neighbors' current states. A generic approach for parallelizing a single-site sequential Markov chain is to update a set of nonadjacent vertices in parallel at each step. This natural idea has been considered in [32], also in a much broader context such as parallel job scheduling [12] or distributed Lovász local lemma [48,11]. For sampling from locally defined joint distributions, it is especially suitable because of the conditional independence property of MRFs.
Our first algorithm, named LubyGlauber, naturally parallelizes the Glauber dynamics by updating in parallel the vertices from independent sets generated by the "Luby step" in Luby's algorithm [1,46]. It is well known that Glauber dynamics achieves the mixing rate τ(ε) = O(n log(n/ε)) under Dobrushin's condition for the decay of correlation [16,35]. By a standard coupling argument, the LubyGlauber algorithm achieves a mixing rate τ(ε) = O(Δ log(n/ε)) under the same condition, where Δ is the maximum degree of the network. In particular, for uniform proper q-colorings, this implies:
Theorem 1.1. If q ≥ αΔ for an arbitrary constant α > 2, there is an algorithm which samples a uniform proper q-coloring within total variation distance ε > 0 within O(Δ log(n/ε)) rounds of communications on any graph G(V, E) with n = |V| vertices and maximum degree Δ, where Δ may be unbounded.
A barrier for this natural approach is that it will perform poorly on general graphs with large chromatic number. The situation motivates us to ask the following questions:
• Is it possible to update all variables in X = (X_v)_{v∈V} simultaneously and still converge to the correct stationary distribution µ?
• More concretely, is it always possible to sample an almost uniform proper q-coloring, for a q = O(Δ), on any graph G(V, E) with n = |V| vertices and maximum degree Δ, within O(log n) rounds of communications, especially when Δ is unbounded?
Surprisingly, the answers to both questions are "yes". We give an algorithm, called the LocalMetropolis algorithm, achieving these goals. This is a bit surprising, since it seems to fully parallelize a process which is intrinsically sequential due to the massive local dependencies, especially on graphs with unbounded maximum degree. The algorithm follows the Metropolis-Hastings paradigm: at each step, it proposes to update all variables independently and then applies proper local filtrations to the proposals to ensure its convergence to the correct joint distribution. Our main discovery is that for locally defined joint distributions, the Metropolis filters are localizable.
The LocalMetropolis algorithm always converges to the correct Gibbs distribution. The analysis of its mixing time is more involved. In particular, for uniformly sampling proper q-colorings we show:
Theorem 1.2. If q ≥ αΔ for an arbitrary constant α > 2 + √2, there is an algorithm for sampling a uniform proper q-coloring within total variation distance ε > 0 in O(log(n/ε)) rounds of communications on any graph G(V, E) with n = |V| vertices and maximum degree at most Δ ≥ 9, where Δ may be unbounded.
Neither of the algorithms abuses the power of the LOCAL model: each message is of O(log n) bits if the domain size q = poly(n).
Due to the exponential correlation between variables in Gibbs distributions, the O(log(n/ε)) time bound achieved in Theorem 1.2 is optimal.
After the submission of this paper, two independent works [21,23] give the same distributed algorithm for sampling random q-colorings, which improves the LocalMetropolis algorithm by introducing a step of laziness as distributed symmetry breaking. This new algorithm achieves an O(log n) mixing time under Dobrushin's condition q ≥ (2 + δ)Δ. Furthermore, for graphs with sufficiently large maximum degree and girth at least 9, it achieves an O(log n) mixing time when q ≥ (α* + δ)Δ, where α* ≈ 1.763 is the positive root of the equation x = e^{1/x}. Another non-MCMC algorithm named the distributed JVV sampler is given in [22]. For many locally definable joint distributions, this algorithm successfully samples a configuration within polylog(n) rounds in the LOCAL model with high probability. In particular, this algorithm samples a random q-coloring of triangle-free graphs within O(log^3 n) rounds in the LOCAL model as long as q ≥ (α* + δ)Δ. This non-MCMC sampling algorithm abuses the power of the LOCAL model by assuming unlimited message size and local computations.
It is a well known phenomenon that sampling may become computationally intractable when the model exhibits the non-uniqueness phase-transition property, e.g. independent sets in graphs of maximum degree bounded by a Δ ≥ 6 [55,56,27,28]. For the same class of distributions, we show the following unconditional Ω(diam) lower bound for sampling in the LOCAL model.
Theorem 1.3. For Δ ≥ 6, there exist infinitely many graphs G(V, E) with maximum degree Δ and diameter diam(G) = |V|^{Ω(1)} such that any algorithm that samples a uniform independent set in G within sufficiently small constant total variation distance ε requires at least Ω(diam(G)) rounds of communications, even assuming the vertices v ∈ V to be aware of G.
The lower bound is proved by a now fairly well-understood reduction from maximum cut to sampling independent sets when Δ ≥ 6 [55,56,28]. Specifically, we show that when Δ ≥ 6 there are infinitely many graphs G(V, E) such that if one can sample a nearly uniform independent set in G(V, E), then one can also sample an almost uniform maximum cut in an even cycle of size |V|^{Ω(1)}, which is necessarily a global task because of the long-range correlation.
Theorem 1.3 strongly separates sampling from labeling problems for distributed computing:
• In the LOCAL model it is trivial to construct an independent set (because ∅ is an independent set). In contrast, Theorem 1.3 says that sampling a uniform independent set is very much a global task for graphs with maximum degree Δ ≥ 6.
• In the LOCAL model any labeling problem would be trivial once the network structure G is known to each vertex. In contrast, the sampling lower bound in Theorem 1.3 still holds even when each vertex is aware of G. Unlike labeling, whose hardness is due to the locality of information, for sampling the hardness is solely due to the locality of randomness.
• A breakthrough of Ghaffari, Kuhn and Maus [30] shows that any labeling problem that can be solved sequentially with local information admits a randomized protocol within O(polylog(n)) rounds in the LOCAL model. In contrast, for sampling we have an Ω(diam) randomized lower bound for graphs with n^{Ω(1)} diameter.

Related work
The topic of sequential MCMC (Markov chain Monte Carlo) sampling is extensively studied. The study of sampling proper q-colorings was initiated by the seminal works of Jerrum [37] and independently of Salas and Sokal [53]. So far the best rapid mixing condition for general bounded-degree graphs is q ≥ (11/6)Δ due to Vigoda [59]. See [26] for an excellent survey. The chromatic-scheduler-based parallelization of the Glauber dynamics chain was studied in [32]. This parallel chain is in fact a special case of systematic scan for Glauber dynamics [18,19,35], in which the variables are updated according to a fixed order.
Empirical studies showed that sometimes an ad hoc "Hogwild!" parallelization of a sequential sampler might work well in practice [51], and mixing results assuming bounded asynchrony were given in [14,38].
A sampling algorithm based on the Lovász local lemma is given in [33]. When sampling from the hardcore model with λ < 1/(2√e·Δ − 1) on a graph of maximum degree Δ, this sampling algorithm can be implemented in the LOCAL model and runs in O(log n) rounds.
A problem related to local sampling is finitary coloring [36], in which a random feasible solution is sampled according to an unconstrained distribution as long as the distribution is over feasible solutions, rather than a specific distribution such as the Gibbs distribution. Therefore, the nature of this problem is still labeling rather than sampling.
Our algorithms are Markov chains which randomly walk over the solution space. A related notion is the distributed random walks [13], which walk over the network.
Our LocalMetropolis chain should be distinguished from the parallel Metropolis-Hastings algorithm [9] or parallel tempering [58], in which the sampling algorithm makes N proposals or runs N copies of the system in parallel for a suitably large N, in order to improve the dynamic properties of the Monte Carlo simulation.
Organization of the paper. The models and preliminaries are introduced in Section 2. The LubyGlauber algorithm is introduced in Section 3. The LocalMetropolis algorithm is introduced in Section 4. The lower bounds are proved in Section 5.

The LOCAL model
We assume Linial's LOCAL model [49,52] for distributed computation, which is as described in Section 1. We further allow each node in the network G(V, E) to be aware of upper bounds of Δ and log n, where n = |V| is the number of nodes. This information is accessed only because the running time of the Monte Carlo algorithms may depend on them.

Markov random field and local CSP
The Markov random field (MRF), or spin system, is a well studied stochastic model in probability theory and statistical physics. Given a graph G(V, E) and a set of spin states [q] = {1, 2, ..., q} for a finite q ≥ 2, a configuration σ ∈ [q]^V assigns each vertex one of the q spin states. For each edge e ∈ E there is a non-negative q × q symmetric matrix A_e ∈ R^{q×q}_{≥0} associated with e, called the edge activity; and for each vertex v ∈ V there is a non-negative q-dimensional vector b_v ∈ R^q_{≥0} associated with v, called the vertex activity. Then each configuration σ ∈ [q]^V is assigned a weight w(σ) which is as defined in (1).
This gives rise to a natural probability distribution µ, called the Gibbs distribution, over all configurations in the sample space Ω = [q]^V proportional to their weights, such that µ(σ) = w(σ)/Z for each σ ∈ Ω, where Z = Σ_{σ∈Ω} w(σ) is the normalizing factor. Several natural joint distributions can be expressed as MRFs:
• Independent sets / vertex covers: When q = 2, every A_e is the 2 × 2 matrix with rows (1, 1) and (1, 0), and every b_v is the all-1 vector, each feasible configuration corresponds to an independent set (or vertex cover, if the other spin state indicates the set) in G, and the Gibbs distribution µ is the uniform distribution over independent sets (or vertex covers) in G. When b_v = (1, λ) for some parameter λ > 0, this is the hardcore model from statistical physics.
• Colorings and list colorings: When every A_e has A_e(i, i) = 0 and A_e(i, j) = 1 if i ≠ j, and every b_v is the all-1 vector, the Gibbs distribution µ becomes the uniform distribution over proper q-colorings of graph G. For list colorings, each vertex v ∈ V can only use the colors from its list L_v ⊆ [q] of available colors. Then we can let each b_v be the indicator vector for the list L_v, with the A_e's the same as for proper q-colorings, so that the Gibbs distribution is the uniform distribution over proper list colorings.
• Physical model: The proper q-coloring is a special case of the Potts model in statistical physics, in which each A_e has A_e(i, i) = β for some parameter β > 0 and A_e(i, j) = 1 if i ≠ j. When further q = 2, the model becomes the Ising model.
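The encodings above can be checked by brute force on a tiny graph. The following is a minimal sketch, not part of the algorithms of this paper: the helper names (`weight`, `gibbs`, `coloring_mrf`, `hardcore_mrf`) are hypothetical, spin states are 0-indexed instead of 1-indexed, and spin 1 is assumed to mean "occupied" in the independent-set encoding. The enumeration is exponential in |V| and only meant for sanity checks.

```python
from itertools import product

def weight(sigma, edges, A, b):
    """Weight w(sigma): product of edge activities and vertex activities."""
    w = 1.0
    for (u, v) in edges:
        w *= A[(u, v)][sigma[u]][sigma[v]]
    for v, s in enumerate(sigma):
        w *= b[v][s]
    return w

def gibbs(n, q, edges, A, b):
    """Brute-force Gibbs distribution mu(sigma) = w(sigma) / Z over [q]^V."""
    configs = list(product(range(q), repeat=n))
    weights = [weight(s, edges, A, b) for s in configs]
    Z = sum(weights)  # normalizing factor
    return {s: w / Z for s, w in zip(configs, weights)}

def coloring_mrf(n, q, edges):
    """Proper q-colorings: A_e(i, i) = 0, A_e(i, j) = 1 for i != j, b_v all-1."""
    A_col = [[0.0 if i == j else 1.0 for j in range(q)] for i in range(q)]
    return {e: A_col for e in edges}, [[1.0] * q for _ in range(n)]

def hardcore_mrf(n, edges, lam=1.0):
    """Hardcore model: two adjacent occupied vertices are forbidden, b_v = (1, lam)."""
    A_is = [[1.0, 1.0], [1.0, 0.0]]
    return {e: A_is for e in edges}, [[1.0, lam] for _ in range(n)]

# Path 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
A, b = coloring_mrf(3, 3, edges)
mu_col = gibbs(3, 3, edges, A, b)
num_colorings = sum(1 for p in mu_col.values() if p > 0)

A, b = hardcore_mrf(3, edges)
mu_is = gibbs(3, 2, edges, A, b)
num_ind_sets = sum(1 for p in mu_is.values() if p > 0)
```

On this path, the support of `mu_col` is the set of proper 3-colorings and the support of `mu_is` (with λ = 1) is the set of independent sets, each with equal probability.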
The model of MRF can be further generalized to allow multivariate asymmetric constraints, which gives the weighted CSPs, also known as factor graphs. In this model, we have a collection C of constraints c = (f_c, S_c), where each f_c : [q]^{S_c} → R_{≥0} is a constraint function with scope S_c ⊆ V. Each configuration σ ∈ [q]^V is assigned a weight
$$ w(\sigma) = \prod_{c=(f_c,S_c)\in C} f_c(\sigma|_{S_c}), $$
where σ|_{S_c} represents the restriction of σ on S_c. The Gibbs distribution µ over all configurations in Ω = [q]^V is defined in the same way, proportional to the weights. In particular, when the f_c's are Boolean-valued functions, the Gibbs distribution µ is the uniform distribution over CSP solutions.
A constraint c = (f_c, S_c) is said to be local with respect to network G if the diameter of the scope S_c in network G is bounded by a constant. Local CSPs are expressive, for example:
• Dominating sets: They can be expressed by having a "cover" constraint on each inclusive neighborhood Γ^+(v) which constrains that at least one vertex from Γ^+(v) is chosen.
• Maximal independent sets (MISs): An MIS is a dominating independent set.
Clearly, the MRF is a special class of weighted local CSPs, defined by unary and binary symmetric local constraints with respect to G.
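As an illustration of how such local constraints compose, here is a small sketch that enumerates the solutions of the dominating-set and MIS CSPs on a 4-vertex path. The representation of a constraint as a pair `(f, scope)` and the helper names are assumptions made for this example only.

```python
from itertools import product

def csp_weight(sigma, constraints):
    """Weight of sigma: product of f_c applied to sigma restricted to the scope S_c."""
    w = 1.0
    for f, scope in constraints:
        w *= f(tuple(sigma[v] for v in scope))
    return w

def dominating_constraints(adj):
    """One 'cover' constraint per inclusive neighborhood: some vertex in it is chosen."""
    cover = lambda vals: 1.0 if any(vals) else 0.0
    return [(cover, [v] + sorted(adj[v])) for v in sorted(adj)]

def independence_constraints(adj):
    """One binary constraint per edge: the two endpoints are not both chosen."""
    not_both = lambda vals: 0.0 if all(vals) else 1.0
    return [(not_both, [u, v]) for u in sorted(adj) for v in adj[u] if u < v]

# Path 0 - 1 - 2 - 3; sigma[v] = 1 means that v is chosen.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
dom = dominating_constraints(adj)
mis = dom + independence_constraints(adj)

all_sigma = list(product((0, 1), repeat=4))
dominating_sets = [s for s in all_sigma if csp_weight(s, dom) > 0]
maximal_independent_sets = [s for s in all_sigma if csp_weight(s, mis) > 0]
```

Every scope here has diameter at most 2 in G, so both CSPs are local in the sense defined above; since all constraints are Boolean-valued, the induced Gibbs distribution is uniform over the listed solutions.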

Local Sampling
The local sampling problem is defined as follows. Let G(V, E) be a network. Given an MRF defined on G (or more generally a weighted CSP that is local with respect to G), where the specifications of the local constraints are given as private inputs to the involved processors, for any ε > 0, upon termination each processor v ∈ V outputs a random variable X_v such that the total variation distance between the distribution ν of the random vector X = (X_v)_{v∈V} and the Gibbs distribution µ is bounded as d_TV(µ, ν) ≤ ε, where the total variation distance between two distributions µ, ν over Ω = [q]^V is defined as
$$ d_{\mathrm{TV}}(\mu,\nu) = \frac{1}{2}\sum_{\sigma\in\Omega} |\mu(\sigma)-\nu(\sigma)| = \max_{A\subseteq\Omega} |\mu(A)-\nu(A)|. $$
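For concreteness, the total variation distance between two explicitly given distributions can be computed as half the ℓ1 distance. The sketch below assumes distributions are represented as dictionaries mapping configurations to probabilities (a representation chosen for this example only).

```python
def total_variation(mu, nu):
    """d_TV(mu, nu) = (1/2) * sum over sigma of |mu(sigma) - nu(sigma)|."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

uniform = {"a": 0.5, "b": 0.5}
point_mass = {"a": 1.0}
d = total_variation(uniform, point_mass)
```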

Mixing rate
Our algorithms are given as Markov chains. Given an irreducible and aperiodic Markov chain X^(0), X^(1), ... ∈ Ω, for any σ ∈ Ω let π_σ^(t) denote the distribution of X^(t) conditioning on X^(0) = σ. For ε > 0 the mixing rate τ(ε) is defined as
$$ \tau(\epsilon) = \max_{\sigma\in\Omega} \min\left\{ t : d_{\mathrm{TV}}\big(\pi_\sigma^{(t)}, \pi\big) \le \epsilon \right\}, $$
where π is the stationary distribution for the chain. For formal definitions of these notions for Markov chains, we refer to a standard textbook on the subject [43]. Informally, irreducibility and aperiodicity guarantee that X^(t) converges to the unique stationary distribution π as t → ∞, and the mixing rate τ(ε) tells us how fast it converges.
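On a chain small enough to store its transition matrix, the mixing rate can be evaluated directly from the definition: evolve the distribution from every starting state and return the first time at which the worst-case total variation distance to π drops to ε. The sketch below assumes a row-stochastic matrix `P` given as nested lists and a known stationary vector `pi`; the function names are hypothetical.

```python
def step(dist, P):
    """One step of the chain: row vector dist times transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv(p, r):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, r))

def mixing_rate(P, pi, eps, t_max=10000):
    """Smallest t with d_TV(pi_sigma^(t), pi) <= eps for every start state sigma."""
    n = len(P)
    dists = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for t in range(t_max + 1):
        if max(tv(d, pi) for d in dists) <= eps:
            return t
        dists = [step(d, P) for d in dists]
    raise RuntimeError("chain did not mix within t_max steps")

# Lazy two-state chain: stay with probability 3/4, switch with probability 1/4.
P = [[0.75, 0.25], [0.25, 0.75]]
tau = mixing_rate(P, [0.5, 0.5], eps=0.1)
```

For this chain the distance from either start halves at each step (0.5, 0.25, 0.125, 0.0625, ...), so the threshold 0.1 is first met at t = 3.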
Notations. Given a graph G(V, E), we denote by d_v = deg(v) the degree of v in G, Δ = Δ_G the maximum degree of G, diam = diam(G) the diameter of G, and dist(u, v) = dist_G(u, v) the shortest path distance between vertices u and v in G. We also denote by Γ(v) = {u | uv ∈ E} the neighborhood of v, and Γ^+(v) = Γ(v) ∪ {v} the inclusive neighborhood. Finally we write B_r(v) = {u | dist(u, v) ≤ r} for the r-ball centered at v.

The LubyGlauber Algorithm
In this section, we analyze a generic scheme for parallelizing Glauber dynamics, a classic sequential Markov chain for sampling from Gibbs distributions.
We assume a Markov random field (MRF) defined on the network G(V, E), with edge activities A = {A_e}_{e∈E} and vertex activities b = {b_v}_{v∈V}, which specifies a Gibbs distribution µ over Ω = [q]^V. The single-site heat-bath Glauber dynamics, or simply the Glauber dynamics, is a well known Markov chain for sampling from the Gibbs distribution µ. Starting from an arbitrary initial configuration X ∈ [q]^V, at each step the chain does the following:
• sample a vertex v ∈ V uniformly at random;
• resample the value of X_v according to the marginal distribution induced by µ at vertex v conditioning on the current spin states of v's neighborhood.
It is well known (see [43]) that the Glauber dynamics is a reversible Markov chain whose stationary distribution is the Gibbs distribution µ. Formally, suppose that σ ∈ [q]^V is sampled from µ. For any v ∈ V, S ⊆ V and τ_S ∈ [q]^S, the marginal distribution at vertex v conditioning on τ_S, denoted as µ_v(· | τ_S), is defined as
$$ \forall a\in[q]:\quad \mu_v(a \mid \tau_S) = \Pr[\sigma_v = a \mid \sigma_S = \tau_S]. $$
In the Glauber dynamics, X_v is resampled according to the marginal distribution µ_v(· | X_{Γ(v)}). Here X_{Γ(v)} represents the current spin states of v's neighborhood Γ(v). For a Markov random field, this marginal distribution can be computed as
$$ \forall a\in[q]:\quad \mu_v(a \mid X_{\Gamma(v)}) = \frac{b_v(a)\prod_{u\in\Gamma(v)} A_{uv}(a, X_u)}{\sum_{a'\in[q]} b_v(a')\prod_{u\in\Gamma(v)} A_{uv}(a', X_u)}. \tag{2} $$
For example, when the MRF is the proper q-coloring, this is just the uniform distribution over available colors in [q] which are not used by v's neighbors. For the Glauber dynamics to work, it is common to assume that the sum Σ_{a∈[q]} b_v(a) ∏_{u∈Γ(v)} A_{uv}(a, X_u) is always positive, so that the marginal distributions are well-defined.
A generic scheme for parallelizing the Glauber dynamics is that at each step, instead of updating one vertex, the chain updates a group of "non-interfering" vertices in parallel, as follows:
• independently sample a random independent set I in G;
• for each v ∈ I, resample X_v in parallel according to the marginal distribution µ_v(· | X_{Γ(v)}).
This can be seen as a relaxation of the chromatic-based scheduler [32] and systematic scans [19].
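The single-site update can be sketched as follows. This is an illustrative sequential simulation under assumptions of our own: spins are 0-indexed, `A` maps `frozenset({u, v})` to a symmetric q × q matrix, `b[v]` is a list of length q, and the helper names `marginal` and `glauber_step` are hypothetical. The marginal is obtained by multiplying the vertex activity by the edge activities towards the neighbors' current spins and normalizing.

```python
import random

def marginal(v, X, q, adj, A, b):
    """Marginal distribution of vertex v given its neighbors' current spins."""
    weights = []
    for a in range(q):
        w = b[v][a]
        for u in adj[v]:
            w *= A[frozenset((u, v))][a][X[u]]
        weights.append(w)
    total = sum(weights)
    if total <= 0:
        raise ValueError("marginal is not well-defined at vertex %r" % (v,))
    return [w / total for w in weights]

def glauber_step(X, q, adj, A, b, rng):
    """One step of heat-bath Glauber dynamics; updates X in place."""
    v = rng.choice(sorted(adj))  # uniform random vertex
    X[v] = rng.choices(range(q), weights=marginal(v, X, q, adj, A, b))[0]
    return v

# Proper 3-colorings of the path 0 - 1 - 2.
q = 3
adj = {0: [1], 1: [0, 2], 2: [1]}
A_col = [[0.0 if i == j else 1.0 for j in range(q)] for i in range(q)]
A = {frozenset(e): A_col for e in [(0, 1), (1, 2)]}
b = {v: [1.0] * q for v in adj}

X = [0, 1, 2]
rng = random.Random(0)
for _ in range(1000):
    glauber_step(X, q, adj, A, b, rng)
```

Started from a proper coloring, the chain only ever visits proper colorings, since a resampled vertex never receives a color used by a neighbor.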
A convenient way for generating a random independent set in a distributed fashion is the "Luby step" in Luby's algorithm for distributed MIS [1,46]: each vertex samples a uniform and independent ID from the interval [0, 1] (which can be discretized with O(log n) bits) and the vertices v who are locally maximal among the inclusive neighborhood Γ + (v) are selected into the independent set I.
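A sequential simulation of the "Luby step" is short. In the sketch below (names are our own), `adj` maps each vertex to its neighbors and `rng` is a `random.Random` instance; ties between IDs occur with negligible probability for floating-point draws and are resolved by selecting neither vertex.

```python
import random

def luby_step(adj, rng):
    """Each vertex draws a uniform ID; strict local maxima form the set I."""
    beta = {v: rng.random() for v in adj}
    return {v for v in adj if all(beta[v] > beta[u] for u in adj[v])}

def is_independent(adj, I):
    """Check that no two vertices of I are adjacent."""
    return all(u not in I for v in I for u in adj[v])

# Cycle on 8 vertices: each vertex is selected with probability 1/(deg + 1) = 1/3.
cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
rng = random.Random(2024)
samples = [luby_step(cycle, rng) for _ in range(1000)]
freq0 = sum(1 for I in samples if 0 in I) / len(samples)
```

A vertex v is selected exactly when its ID is the largest among the d_v + 1 IDs in its inclusive neighborhood, which happens with probability 1/(d_v + 1) ≥ 1/(Δ + 1).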
The resulting algorithm is called LubyGlauber, whose pseudocode is given in Algorithm 1.
Algorithm 1: Pseudocode for vertex v ∈ V in the LubyGlauber algorithm
Input: Vertex v ∈ V receives {A_uv}_{u∈Γ(v)} and b_v as input.
  initialize X_v to an arbitrary value in [q];
  for t = 1 through T do
    sample a real β_v ∈ [0, 1] uniformly and independently;
    if β_v > max{β_u | u ∈ Γ(v)} then
      resample X_v according to the marginal distribution µ_v(· | X_{Γ(v)});
  return X_v;
According to the definition of marginal distribution (2), resampling X_v can be done locally by exchanging neighbors' current spin states. After T iterations, where T is a threshold determined for the specific Markov random field, the algorithm terminates and outputs the current X = (X_v)_{v∈V}.
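The following is a centralized simulation of the LubyGlauber chain, given as a sketch under stated assumptions rather than a distributed implementation: the whole graph is held in one process, `marginal(v, X)` is a caller-supplied function returning (possibly unnormalized) weights over the q spins, and the coloring marginal is used as the example. All names are hypothetical.

```python
import random

def luby_glauber(adj, q, marginal, T, rng, init=None):
    """Run T rounds: Luby step, then resample all selected vertices in parallel."""
    X = dict(init) if init is not None else {v: 0 for v in adj}
    for _ in range(T):
        beta = {v: rng.random() for v in adj}
        I = [v for v in adj if all(beta[v] > beta[u] for u in adj[v])]
        # I is an independent set, so no selected vertex sees another selected
        # vertex among its neighbors; all new values are drawn from the same
        # snapshot X and then written back together.
        new = {v: rng.choices(range(q), weights=marginal(v, X))[0] for v in I}
        X.update(new)
    return X

def coloring_marginal(adj, q):
    """Uniform over colors not used by any neighbor (unnormalized weights)."""
    def marginal(v, X):
        used = {X[u] for u in adj[v]}
        return [0.0 if a in used else 1.0 for a in range(q)]
    return marginal

# 5-colorings of a 6-cycle, started from the improper all-0 configuration.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
q = 5
X = luby_glauber(adj, q, coloring_marginal(adj, q), T=200, rng=random.Random(1))
is_proper = all(X[u] != X[v] for v in adj for u in adj[v])
```

Each round of this simulation corresponds to a constant number of communication rounds in the LOCAL model: one exchange of the IDs β and one exchange of current spins.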
Remark 3.1. The LubyGlauber algorithm can be easily extended to sample from weighted CSPs defined by local constraints c = (f_c, S_c) ∈ C, by simply overriding the definition of neighborhood as Γ(v) = {u ≠ v | ∃c ∈ C, {u, v} ⊆ S_c}, which is the neighborhood of v in the hypergraph where the S_c's are the hyperedges, and now I is a strongly independent set of this hypergraph.

Mixing of LubyGlauber
Let µ_LG denote the distribution of X returned by the algorithm upon termination. As in the case of single-site Glauber dynamics, we assume that the marginal distribution (2) is always well-defined, and the single-site Glauber dynamics is irreducible among all feasible configurations. The following proposition is easy to obtain.
Proposition 3.1. The Markov chain LubyGlauber is reversible and has stationary distribution µ. Furthermore, under the above assumption, d_TV(µ_LG, µ) converges to 0 as T → ∞.
Proof.We prove this for a more general family of Markov chains, where the "Luby step" is replaced by an arbitrary way of independently sampling a random independent set I, as long as Pr[v ∈ I] > 0 for every vertex v ∈ V .
Let Ω = [q]^V and let P ∈ R^{|Ω|×|Ω|}_{≥0} denote the transition matrix for the LubyGlauber chain. We first show that the chain is reversible and µ is stationary. Specifically, this means to verify the detailed balance equation:
$$ \mu(X)\,P(X,Y) = \mu(Y)\,P(Y,X) \quad \text{for all } X, Y \in \Omega. $$
If both X and Y are infeasible, then µ(X) = µ(Y) = 0 and the detailed balance equation holds trivially. If X is feasible and Y is not, then µ(Y) = 0 and meanwhile, since the chain never moves from a feasible configuration to an infeasible one, we have P(X, Y) = 0, so the detailed balance equation is also satisfied.
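It remains to verify the detailed balance equation when both X and Y are feasible. Let D = {v ∈ V | X_v ≠ Y_v} be the set of disagreeing vertices. If D is not an independent set, then P(X, Y) = P(Y, X) = 0 and the detailed balance equation holds. Suppose that D is an independent set. For any independent set I ⊇ D, we denote by Pr[X → Y | I] the probability that within an iteration the chain moves from X to Y conditioning on I being the independent set sampled in the first step. Since I is an independent set and X, Y agree outside D ⊆ I, we have X_{Γ(v)} = Y_{Γ(v)} for every v ∈ I, and X_v = Y_v for every v ∈ I \ D. Therefore,
$$ \frac{\Pr[X\to Y \mid I]}{\Pr[Y\to X \mid I]} = \frac{\prod_{v\in I}\mu_v(Y_v \mid X_{\Gamma(v)})}{\prod_{v\in I}\mu_v(X_v \mid Y_{\Gamma(v)})} = \prod_{v\in D}\frac{b_v(Y_v)\prod_{u\in\Gamma(v)}A_{uv}(Y_v, X_u)}{b_v(X_v)\prod_{u\in\Gamma(v)}A_{uv}(X_v, X_u)} = \frac{w(Y)}{w(X)} = \frac{\mu(Y)}{\mu(X)}. $$
By the law of total probability,
$$ \frac{P(X,Y)}{P(Y,X)} = \frac{\sum_{I\supseteq D}\Pr[I]\,\Pr[X\to Y \mid I]}{\sum_{I\supseteq D}\Pr[I]\,\Pr[Y\to X \mid I]} = \frac{\mu(Y)}{\mu(X)}. $$
Thus, the chain is reversible with respect to the Gibbs distribution µ.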
Next, observe that the chain will never move from a feasible configuration to an infeasible one. Moreover, due to the assumption that the marginal distribution (2) is always well-defined, once a vertex v has been resampled, it will satisfy all local constraints. Therefore, the chain will be feasible once every vertex has been resampled. Since every vertex v has positive probability Pr[v ∈ I] to be resampled, the chain is absorbing to feasible configurations.
It is easy to observe that every feasible configuration is aperiodic, since it has a self-loop transition, i.e. P(X, X) > 0 for all feasible X. Any move X → Y between feasible configurations X, Y ∈ Ω in the single-site Glauber dynamics with vertex v being updated can be simulated by a move in the LubyGlauber chain by first sampling an independent set I ∋ v (which is always possible since Pr[v ∈ I] > 0) and then updating v according to X → Y while keeping all u ∈ I \ {v} unchanged (which is always possible for feasible X). Provided the irreducibility of the single-site Glauber dynamics among all feasible configurations, the LubyGlauber chain is also irreducible among all feasible configurations. Combining with the absorption towards feasible configurations and their aperiodicity, due to the Markov chain convergence theorem [43], the total variation distance d_TV(µ_LG, µ) converges to 0 as T → ∞.
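The reversibility claim can be checked numerically on a toy instance. The sketch below builds the exact transition matrix of the LubyGlauber chain for proper 3-colorings of the path 0 - 1 - 2 and measures the largest violation of detailed balance with respect to the uniform distribution over proper colorings. It is an illustration under our own assumptions: the law of the Luby step is obtained by enumerating all relative orders of the IDs (each equally likely), and the helper names are hypothetical.

```python
from itertools import permutations, product

def local_maxima(adj, rank):
    """Vertices whose rank exceeds that of every neighbor."""
    return frozenset(v for v in adj if all(rank[v] > rank[u] for u in adj[v]))

def independent_set_law(adj):
    """Exact distribution of the Luby-step set I via all relative ID orders."""
    verts = sorted(adj)
    perms = list(permutations(range(len(verts))))
    law = {}
    for perm in perms:
        I = local_maxima(adj, dict(zip(verts, perm)))
        law[I] = law.get(I, 0.0) + 1.0 / len(perms)
    return law

def coloring_marginal(v, X, q, adj):
    """Uniform distribution over colors not used by the neighbors of v."""
    free = [a for a in range(q) if all(X[u] != a for u in adj[v])]
    return {a: 1.0 / len(free) for a in free}

def transition_row(X, q, adj, law):
    """Row P(X, .) of the LubyGlauber chain, as a dict over next states."""
    row = {}
    for I, p_I in law.items():
        order = sorted(I)
        margs = [coloring_marginal(v, X, q, adj) for v in order]
        for choice in product(*[m.items() for m in margs]):
            Y, p = list(X), p_I
            for v, (a, p_a) in zip(order, choice):
                Y[v] = a
                p *= p_a
            row[tuple(Y)] = row.get(tuple(Y), 0.0) + p
    return row

adj = {0: [1], 1: [0, 2], 2: [1]}
q = 3
law = independent_set_law(adj)
states = list(product(range(q), repeat=3))
proper = [X for X in states if X[0] != X[1] and X[1] != X[2]]
mu = {X: (1.0 / len(proper) if X in proper else 0.0) for X in states}
P = {X: transition_row(X, q, adj, law) for X in states}
max_violation = max(
    abs(mu[X] * P[X].get(Y, 0.0) - mu[Y] * P[Y].get(X, 0.0))
    for X in states for Y in states
)
```

Here `max_violation` is zero up to floating-point error, in agreement with Proposition 3.1; the same enumeration also confirms that no proper coloring has a transition to an improper one.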
We then apply a standard coupling argument from [35,18] to analyze the mixing rate of the LubyGlauber chain.The following notions are essential to the mixing of Glauber dynamics.
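Definition 3.1 (influence matrix). For v ∈ V and σ ∈ [q]^V, we write µ_v(· | σ) = µ_v(· | σ_{Γ(v)}) for the marginal distribution of the value of v, for configurations sampled from µ conditioning on agreeing with σ at all neighbors of v. For vertices i, j ∈ V, the influence of j on i is defined as
$$ \rho_{i,j} = \max_{(\sigma,\tau)\in S_j} d_{\mathrm{TV}}\big(\mu_i(\cdot \mid \sigma),\, \mu_i(\cdot \mid \tau)\big), $$
where S_j denotes the set of all pairs of feasible configurations σ, τ ∈ [q]^V such that σ and τ agree on all vertices except j. Let R = (ρ_{i,j})_{i,j∈V} be the n × n influence matrix.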
Definition 3.2 (Dobrushin's condition). Let α be the total influence on a vertex, defined by
$$ \alpha = \max_{i\in V} \sum_{j\in V} \rho_{i,j}. $$
We say that Dobrushin's condition is satisfied if α < 1.
It is a fundamental result that Dobrushin's condition is sufficient for the rapid mixing of Glauber dynamics [16,53,35], with a mixing rate of τ(ε) = O((n/(1−α)) log(n/ε)). Here we show that the LubyGlauber chain is essentially a parallel speed up of the Glauber dynamics by a factor of Θ(n/Δ).
Theorem 3.2. Under the same assumption as Proposition 3.1, if the total influence α < 1, then the mixing rate of the LubyGlauber chain is τ(ε) = O((Δ/(1−α)) log(n/ε)). Consequently, for any ε > 0 the LubyGlauber algorithm can terminate within O((Δ/(1−α)) log(n/ε)) rounds in the LOCAL model and return an X ∈ [q]^V whose distribution µ_LG is ε-close to the Gibbs distribution µ in total variation distance.
Remark 3.2. In fact, Proposition 3.1 and Theorem 3.2 hold for a more general family of Markov chains, where the "Luby step" could be any subroutine which independently generates a random independent set I, as long as every vertex has positive probability to be selected into I. In general, the mixing rate in Theorem 3.2 is in fact τ(ε) = O((1/((1−α)γ)) log(n/ε)), where γ is a lower bound for the probability Pr[v ∈ I] for all v ∈ V.
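The following lemma is crucial for relating the mixing rate to the influence matrix. The lemma has been proved in various places [35,18,14].
Lemma 3.3. Let X and Y be two random variables that take values over the feasible configurations in Ω = [q]^V. Then for any i ∈ V,
$$ \mathbb{E}\big[d_{\mathrm{TV}}(\mu_i(\cdot \mid X), \mu_i(\cdot \mid Y))\big] \le \sum_{j\in V} \rho_{i,j}\,\Pr[X_j \ne Y_j]. $$
Proof. We enumerate V as V = {1, 2, ..., n}. For 0 ≤ k ≤ n, define Z^(k) as that for each j ∈ V,
$$ Z^{(k)}_j = \begin{cases} Y_j & \text{if } j \le k, \\ X_j & \text{if } j > k. \end{cases} $$
In particular, Z^(0) = X and Z^(n) = Y. Now, by the triangle inequality,
$$ d_{\mathrm{TV}}(\mu_i(\cdot \mid X), \mu_i(\cdot \mid Y)) \le \sum_{k=1}^{n} d_{\mathrm{TV}}\big(\mu_i(\cdot \mid Z^{(k-1)}), \mu_i(\cdot \mid Z^{(k)})\big). $$
Next, we note that the k-th term is 0 whenever X_k = Y_k, because then Z^(k−1) = Z^(k). Since Z^(k−1) and Z^(k) can only differ at vertex k, it follows that (Z^(k−1), Z^(k)) ∈ S_k, and hence,
$$ d_{\mathrm{TV}}\big(\mu_i(\cdot \mid Z^{(k-1)}), \mu_i(\cdot \mid Z^{(k)})\big) \le \rho_{i,k}\cdot \mathbf{1}[X_k \ne Y_k]. $$
By linearity of expectation,
$$ \mathbb{E}\big[d_{\mathrm{TV}}(\mu_i(\cdot \mid X), \mu_i(\cdot \mid Y))\big] \le \sum_{k\in V} \rho_{i,k}\,\Pr[X_k \ne Y_k]. $$
Proof of Theorem 3.2: We are actually going to prove a stronger result. Denote by I the random independent set on which the resampling is executed. We write γ_v = Pr[v ∈ I] for each v ∈ V, and assume that for all v ∈ V, γ_v ≥ γ for some γ > 0. Clearly, when I is generated by the "Luby step", this holds for γ = 1/(Δ+1). We are going to prove that
$$ \tau(\epsilon) = O\left(\frac{1}{(1-\alpha)\gamma}\log\frac{n}{\epsilon}\right). $$
The proof follows the framework of Hayes [35]. We construct a coupling of the Markov chain (X^(t), Y^(t)) such that the transition rules for X^(t) → X^(t+1) and Y^(t) → Y^(t+1) are the same as the LubyGlauber chain. If Pr[X^(T) ≠ Y^(T) | X^(0) = σ ∧ Y^(0) = τ] ≤ ε for any initial configurations σ, τ ∈ Ω, then by the coupling lemma for Markov chains [43], we have the mixing rate τ(ε) ≤ T.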
The coupling we are going to use is the maximal one-step coupling of the LubyGlauber chain, which for every vertex i ∈ V achieves that, conditioning on i being resampled,
$$ \Pr\big[X^{(t+1)}_i \ne Y^{(t+1)}_i \mid X^{(t)}, Y^{(t)}\big] = d_{\mathrm{TV}}\big(\mu_i(\cdot \mid X^{(t)}), \mu_i(\cdot \mid Y^{(t)})\big), $$
where µ_i(· | X^(t)) and µ_i(· | Y^(t)) are the marginal distributions as defined in Definition 3.1. The existence of such a coupling is guaranteed by the coupling lemma.
Arbitrarily fix σ, τ ∈ Ω = [q]^V. For t ≥ 0, define (X^(t), Y^(t)) ∈ Ω^2 by iterating a maximal one-step coupling of the LubyGlauber chain, starting from the initial condition X^(0) = σ, Y^(0) = τ. Due to the well-definedness of the marginal distribution (2), we know that once all vertices have been resampled, the configuration will be feasible and will remain feasible in the future.
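Let T_1 be a positive integer and let F denote the event that all vertices have been resampled in chains X and Y in the first T_1 steps. By the union bound, we have
$$ \Pr[\neg F] \le n(1-\gamma)^{T_1}. \tag{3} $$
Next, we assume that X^(t), Y^(t) are both feasible for t ≥ T_1. We define the vector p^(t) ∈ [0, 1]^V by p^(t)_i = Pr[X^(t)_i ≠ Y^(t)_i] for each i ∈ V.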
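By the definition of the LubyGlauber chain, it holds for every j ∈ V that
$$ \Pr\big[X^{(t+1)}_j \ne Y^{(t+1)}_j\big] = (1-\gamma_j)\,\Pr\big[X^{(t)}_j \ne Y^{(t)}_j\big] + \gamma_j\,\Pr\big[X^{(t+1)}_j \ne Y^{(t+1)}_j \mid j \in I\big]. \tag{4} $$
By the definition of maximal one-step coupling and Lemma 3.3,
$$ \Pr\big[X^{(t+1)}_j \ne Y^{(t+1)}_j \mid j \in I\big] = \mathbb{E}\big[d_{\mathrm{TV}}(\mu_j(\cdot \mid X^{(t)}), \mu_j(\cdot \mid Y^{(t)}))\big] \le \sum_{k\in V}\rho_{j,k}\,\Pr\big[X^{(t)}_k \ne Y^{(t)}_k\big]. $$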
Combined with equality (4), for t ≥ T 1 we have where matrix M = (J − Γ)J + ΓR, where Γ is the n × n diagonal matrix with Γ i,i = γ i ; J is the n × n identity matrix; and R = (ρ ij ) is the influence matrix.The ∞-norm of M is bounded as By induction, we obtain the component-wise inequality Conditioning on that X (T 1 ) and Y (T 1 ) are both feasible, we have 1 by union bound For any ǫ, we choose . Combining (3) and ( 5), conditioning on

This implies that
In particular, if the random independent set $I$ is generated by the "Luby step", we have $\gamma = \frac{1}{\Delta+1}$, therefore for the LubyGlauber chain

Application of LubyGlauber for sampling graph colorings
For uniformly distributed proper $q$-colorings of a graph $G$, it is well known that Dobrushin's condition is satisfied when $q \ge 2\Delta + 1$, where $\Delta$ is the maximum degree of $G$.
We consider a more general problem, list coloring, where each vertex $v \in V$ maintains a list $L_v \subseteq [q]$ of colors that it can use. Proper $q$-coloring is the special case of list coloring in which every list is precisely $[q]$. For each vertex $v \in V$, we denote by $q_v = |L_v|$ the size of $v$'s list, and by $d_v = \deg(v)$ the degree of $v$. It is easy to verify that the total influence $\alpha$ is now bounded as $\alpha \le \max_{v \in V} \frac{d_v}{q_v - d_v}$. Applying Theorem 3.2, we have the following corollary, which also implies Theorem 1.1.
Corollary 3.4. If there is an arbitrary constant $\delta > 0$ such that $q_v \ge (2 + \delta) d_v$ for every vertex $v$, then the mixing rate of the LubyGlauber chain for sampling list colorings is $\tau(\epsilon) = O\left(\Delta \log \frac{n}{\epsilon}\right)$.
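The LubyGlauber chain for list colorings can be illustrated with a short centralized simulation. This is only a sketch under stated assumptions: the function names `luby_step` and `luby_glauber_step` are hypothetical, the "Luby step" is taken to mean that a vertex joins the independent set when its random rank exceeds the ranks of all its neighbors, and every list is assumed to satisfy $q_v > d_v$ so that an available color always exists.

```python
import random

def luby_step(adj, rng):
    """Sample a random independent set: v joins iff its rank beats all neighbors' ranks."""
    rank = {v: rng.random() for v in adj}
    return {v for v in adj if all(rank[v] > rank[u] for u in adj[v])}

def luby_glauber_step(adj, lists, X, rng):
    """One round: every vertex of a random independent set resamples its color.

    Assumes len(lists[v]) > deg(v), so the set of available colors is non-empty.
    """
    independent = luby_step(adj, rng)
    new_X = dict(X)
    for v in independent:
        used = {X[u] for u in adj[v]}          # colors currently held by neighbors
        available = [c for c in lists[v] if c not in used]
        new_X[v] = rng.choice(available)       # uniform over the conditional marginal
    return new_X
```

Since the updated vertices form an independent set, their neighbors are unchanged within the round, so the parallel updates do not interfere with one another.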

The LocalMetropolis Algorithm
In this section, we give an algorithm that may fully parallelize the sequential process under suitable mixing conditions, even on graphs with unbounded degree. The algorithm is inspired by the famous Metropolis-Hastings algorithm for MCMC, in which a random choice is proposed and then filtered to enforce the target stationary distribution. Our algorithm, called the LocalMetropolis algorithm, makes each vertex propose independently, and localizes the work of filtering to each edge. We are given a Markov random field (MRF) defined on the network $G(V, E)$, with edge activities $A = \{A_e\}_{e \in E}$ and vertex activities $b = \{b_v\}_{v \in V}$, whose Gibbs distribution is $\mu$. Starting from an arbitrary configuration $X \in [q]^V$, in each iteration, the LocalMetropolis chain does the following:
• Propose: Each vertex $v \in V$ independently proposes a spin state $\sigma_v \in [q]$ with probability $b_v(\sigma_v) / \sum_{i \in [q]} b_v(i)$.
• Local filter: Each edge $e = (u, v) \in E$ flips a biased coin independently, with the probability of HEADS being $\tilde{A}_e(\sigma_u, \sigma_v)\, \tilde{A}_e(X_u, \sigma_v)\, \tilde{A}_e(\sigma_u, X_v)$, where $\tilde{A}_e$ is the matrix obtained by normalizing $A_e$ as $\tilde{A}_e = A_e / \max_{i,j} A_e(i, j)$. We say that the edge passes the check if the outcome of the coin flip is HEADS.
Then for each vertex $v \in V$, if all edges incident with $v$ passed their checks, $v$ accepts the proposal and updates its value as $X_v \leftarrow \sigma_v$. After $T$ iterations, where $T$ is a threshold determined for the specific Markov random field, the algorithm terminates and outputs the current $X = (X_v)_{v \in V}$. The pseudocode for the LocalMetropolis algorithm is given in Algorithm 2. We remark that in each iteration, for each edge $e = uv$, the two endpoints $u$ and $v$ access the same random coin to determine whether $e$ passes the check in this iteration.

Algorithm 2 (LocalMetropolis):
  each $v \in V$ initializes $X_v$ to an arbitrary value in $[q]$;
  for $t = 1$ through $T$ do
    each $v \in V$ proposes $\sigma_v \in [q]$ with probability $b_v(\sigma_v) / \sum_{i \in [q]} b_v(i)$;
    each $e = (u, v) \in E$ passes the check independently with probability $\frac{A_e(\sigma_u, \sigma_v)\, A_e(X_u, \sigma_v)\, A_e(\sigma_u, X_v)}{(\max_{i,j \in [q]} A_e(i, j))^3}$;
    for each $v \in V$: if all edges $e$ incident with $v$ pass the checks then $X_v \leftarrow \sigma_v$;
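One iteration of the chain described above can be sketched as a centralized simulation. The sketch makes several assumptions that are not part of the original text: the function name `local_metropolis_step` and the data layout (a list of edges, a dictionary `A` of $q \times q$ activity matrices indexed by edge, a dictionary `b` of vertex activity vectors) are hypothetical, and proposals are drawn proportionally to the vertex activities.

```python
import random

def local_metropolis_step(edges, A, b, X, rng):
    """One LocalMetropolis-style iteration on an MRF (illustrative sketch).

    edges: list of (u, v); A[(u, v)][i][j]: edge activity with i the state of u;
    b[v][i]: vertex activity; X: dict vertex -> current spin state.
    """
    q = len(next(iter(b.values())))
    # Propose: each vertex draws sigma_v with probability proportional to b_v.
    sigma = {v: rng.choices(range(q), weights=b[v])[0] for v in X}
    accepted = {v: True for v in X}
    for (u, v) in edges:
        Ae = A[(u, v)]
        m = max(max(row) for row in Ae)
        p = Ae[sigma[u]][sigma[v]] * Ae[X[u]][sigma[v]] * Ae[sigma[u]][X[v]] / m ** 3
        # A single coin per edge, shared by both endpoints.
        if rng.random() >= p:
            accepted[u] = accepted[v] = False
    # A vertex updates only if all its incident edges passed their checks.
    return {v: (sigma[v] if accepted[v] else X[v]) for v in X}
```

With 0/1 edge activities encoding proper colorings, each edge check passes with probability 0 or 1, and a proper coloring is never turned into an improper one.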

Mixing of LocalMetropolis
Let µ LM denote the distribution of X = (X v ) v∈V returned by the LocalMetropolis algorithm after T iterations.We need to ensure the chain is well behaved even when starting from infeasible configurations.Now we make the following assumption: for all which is slightly stronger than the assumption made for the Glauber dynamics.As in the case of Glauber dynamics, the property is needed only when the chain is allowed to start from an infeasible configuration X ∈ [q] V with µ(X) = 0.For specific MRF, such as graph colorings, the condition ( 6) is satisfied as long as q ≥ ∆ + 1 and q ≥ 3.As before, we further assume that the single-site Markov chain2 is irreducible among feasible configurations.
Let $P$ denote the transition matrix for the LocalMetropolis chain. First, we show that this chain is reversible and that $\mu$ is stationary, by verifying the detailed balance equation $\mu(X) P(X, Y) = \mu(Y) P(Y, X)$. If two configurations $X, Y$ are both infeasible, then $\mu(X) = \mu(Y) = 0$. If precisely one of $X, Y$ is feasible, say $X$ is feasible and $Y$ is not, then $\mu(Y) = 0$ and $X$ cannot move to $Y$ since at least one edge cannot pass its check, which means $P(X, Y) = 0$. In both cases, the detailed balance equation holds.
Next, we suppose $X, Y$ are both feasible. Consider a move in the LocalMetropolis chain. Let $C \in \{0, 1\}^E$ be a Boolean vector such that $C_e$ indicates whether edge $e \in E$ passes its check. We call $v \in V$ non-restricted by $C$ if $C_e = 1$ for all $e$ incident with $v$ (so that $v$ accepts the proposal), and call $v \in V$ restricted by $C$ otherwise.
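A move in the chain is completely determined by $C$ along with the proposed configuration $\sigma \in [q]^V$. Let $\Omega_{X \to Y}$ denote the set of pairs $(\sigma, C)$ with which $X$ moves to $Y$, and $\Delta_{X,Y} = \{v \in V \mid X_v \neq Y_v\}$ the set of vertices on which $X$ and $Y$ disagree. Note that each $(\sigma, C) \in \Omega_{X \to Y}$ satisfies: every $v \in \Delta_{X,Y}$ is non-restricted by $C$ and has $\sigma_v = Y_v$; and every $v \notin \Delta_{X,Y}$ is either restricted by $C$ or has $\sigma_v = X_v = Y_v$. The same holds, with the roles of $X$ and $Y$ exchanged, for $\Omega_{Y \to X}$, the set of $(\sigma, C)$ with which $Y$ moves to $X$. Hence:
$P(X, Y) = \sum_{(\sigma, C) \in \Omega_{X \to Y}} \Pr(\sigma) \Pr(C \mid \sigma, X)$ and $P(Y, X) = \sum_{(\sigma', C') \in \Omega_{Y \to X}} \Pr(\sigma') \Pr(C' \mid \sigma', Y)$. (7)
In order to verify the detailed balance equation, we construct a bijection $\phi_{X,Y} : \Omega_{X \to Y} \to \Omega_{Y \to X}$, and for every $(\sigma, C) \in \Omega_{X \to Y}$, denoting $(\sigma', C') = \phi_{X,Y}(\sigma, C)$, show that
$\frac{\Pr(\sigma) \Pr(C \mid \sigma, X)}{\Pr(\sigma') \Pr(C' \mid \sigma', Y)} = \frac{\mu(Y)}{\mu(X)}$. (8)
The detailed balance equation then follows from (7) and (8).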
The bijection (σ, C) ) is constructed as follow: • for all v restricted by C, since (σ, It can be verified that the φ X,Y constructed in this way is indeed a bijection from Ω X→Y to Ω Y →X .For any (σ, C) ∈ Ω X→Y and the corresponding (σ ′ , C ′ ) ∈ Ω Y →X , since C ′ = C, in the following we will not specify whether v is (non-)restricted by C or C ′ but just say v is (non-)restricted, and the followings are satisfied: Then we have: Next, for each edge e ∈ E we calculate the ratio Pr(Ce|σ,X) Pr(C ′ e |σ ′ ,Y ) .There are two cases: • If C e = 0 which means e does not pass its check, then and And both u and v are restricted by C. By our construction of the bijection φ X,Y , we have • If C e = 1 which means e passes its check, then and There are three sub-cases according to whether vertices u and v are restricted: 1.Both u and v are restricted, in which case In all three sub-cases, the following identity can be verified: Since each edges passes its check independently, we have Combining ( 9) and ( 10), for every (σ, C) ∈ Ω X→Y and the corresponding (σ ′ , C ′ ) ∈ Ω Y →X , we have: This completes the verification of detailed balance equation and the proof of the reversibility of the chain with respect to stationary distribution µ.
Next, observe that the chain never moves from a feasible configuration to an infeasible one, since at least one edge would fail its check. By assumption (6), for all $X \in [q]^V$, feasible or not, and for every $v \in V$, there must be a spin state $i \in [q]$ such that with positive probability $v$ is successfully updated to spin state $i$. Note that once a vertex is successfully updated, it satisfies, and will keep satisfying, all its local constraints. Therefore, the chain is absorbed into the feasible configurations.
It is easy to observe that every feasible configuration is aperiodic, since it has a self-loop transition, i.e., $P(X, X) > 0$ for all feasible $X$. In addition, any move $X \to Y$ between feasible configurations $X, Y \in \Omega$ in the single-site Markov chain with vertex $v$ being updated can be simulated by a move in the LocalMetropolis chain in which every vertex $u$ other than $v$ proposes its current spin state $X_u$ and $v$ proposes $Y_v$. Given the irreducibility of the single-site Markov chain among all feasible configurations, the LocalMetropolis chain is also irreducible among all feasible configurations. Combining this with the absorption towards feasible configurations and their aperiodicity, by the Markov chain convergence theorem [43], $d_{TV}(\mu_{LM}, \mu)$ converges to 0 as $T \to \infty$.

The mixing of LocalMetropolis chain for graph colorings
Unlike the LubyGlauber chain, whose mixing rate essentially follows from the analysis of systematic scans, the mixing rate of the LocalMetropolis chain is much more complicated to analyze. Here we analyze the mixing rate of the LocalMetropolis chain for proper $q$-colorings.
Given a graph $G(V, E)$, a $q$-coloring $\sigma \in [q]^V$ is proper if $\sigma_u \neq \sigma_v$ for all $uv \in E$. For this special MRF, the LocalMetropolis chain behaves simply as follows. Starting from an arbitrary coloring $X \in [q]^V$, not necessarily proper, in each step:
• Propose: each vertex $v$ proposes a color $c_v \in [q]$ uniformly at random;
• Local filter: each vertex $v$ rejects its proposal if there is a neighbor $u \in \Gamma(v)$ such that one of the following occurs:
1. ($v$ proposed the neighbor's current color) $c_v = X_u$;
2. ($v$ and the neighbor proposed the same color) $c_v = c_u$;
3. (the neighbor proposed $v$'s current color) $c_u = X_v$;
otherwise, $v$ accepts its proposal and updates its color $X_v$ to $c_v$.
The first two filtering rules are sufficient to guarantee that the chain will never move to a "less proper" coloring.Although at first glance the third filtering rule looks redundant, it is necessary to guarantee the reversibility of the chain as well as the uniform stationary distribution.
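The three filtering rules for proper colorings translate directly into code. The following is a minimal centralized sketch, assuming integer vertex labels and an adjacency-list dictionary; the names `local_metropolis_coloring_step` and `monochromatic_edges` are hypothetical helpers introduced for illustration.

```python
import random

def local_metropolis_coloring_step(adj, X, q, rng):
    """One step of the LocalMetropolis chain specialized to proper q-colorings."""
    # Propose: every vertex picks a uniform random color.
    c = {v: rng.randrange(q) for v in adj}
    new_X = dict(X)
    for v in adj:
        rejected = any(
            c[v] == X[u]       # rule 1: v proposed the neighbor's current color
            or c[v] == c[u]    # rule 2: v and the neighbor proposed the same color
            or c[u] == X[v]    # rule 3: the neighbor proposed v's current color
            for u in adj[v]
        )
        if not rejected:
            new_X[v] = c[v]
    return new_X

def monochromatic_edges(adj, X):
    """Set of edges (as sorted pairs) whose endpoints share a color."""
    return {(min(u, v), max(u, v)) for v in adj for u in adj[v] if X[u] == X[v]}
```

Tracking `monochromatic_edges` across steps illustrates the remark about the first two rules: an edge that is properly colored before a step stays properly colored after it.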
It can be verified that when q ≥ ∆ + 2, the condition ( 6) is satisfied and the single-site Glauber dynamics for proper q-coloring is irreducible, and hence the chain is mixing due to Theorem 4.1.
The following theorem states a condition in the form q ≥ α∆ for the logarithmic mixing rate even for unbounded ∆ and q.This proves Theorem 1.2.Theorem 4.2.If q ≥ α∆ for a constant α > 2 + √ 2, the mixing rate of the LocalMetropolis chain for proper q-coloring on graphs with maximum degree at most , where the constant factor in O(•) depends only on α but not on the maximum degree ∆.
The theorem is proved by path coupling, a powerful engineering tool for coupling Markov chains.A coupling of a Markov chain on space Ω is a Markov chain (X, Y ) → (X ′ , Y ′ ) on space Ω 2 such that the transitions X → X ′ and Y → Y ′ individually follow the same transition rule as the original chain on Ω.For path coupling, we can construct a coupled Markov chain (X, Y ) → (X ′ , Y ′ ) for X, Y ∈ [q] V which differ at only one vertex.The chain mixes rapidly if the expected number of disagreeing vertices in (X ′ , Y ′ ) is less than 1.

An ideal coupling
The $2 + \sqrt{2}$ threshold in Theorem 4.2 is due to an ideal coupling in the $\Delta$-regular tree. Let $T_\Delta$ denote the infinite $\Delta$-regular tree rooted at $v_0$. We assume that the current pair of colorings $(X, Y)$ disagree only at the root $v_0$ and $X_u = Y_u \notin \{X_{v_0}, Y_{v_0}\}$ for all other vertices $u$ in $T_\Delta$.
An ideal coupling can be constructed as follows in a breadth-first fashion: (1) the root v 0 proposes the same random color in both chains X, Y ; (2) each child u of the root proposes the same random color in both chains unless it proposed one of {X v 0 , Y v 0 }, in which case it switches the roles of the two colors {X v 0 , Y v 0 } in the Y chain; (3) for all other vertices u, it proposes the same random color in both chains unless its parent proposed different colors in the two chains, in which case u switches the roles of {X v 0 , Y v 0 } in the Y chain.For this ideal coupling, by a calculation, it can be verified that for the root v 0 : ∆ and for any non-root vertex u in T ∆ at distance ℓ from v 0 : The expected number of disagreeing vertices in (X ′ , Y ′ ) is then bounded as The path coupling argument requires this quantity to be less than 1.For q = α ⋆ ∆ and ∆ → ∞, this quantity becomes For general non-tree graphs G(V, E) and arbitrary pairs of colorings (X, Y ) which disagree at only one vertex, where X, Y may not even be proper, we essentially show that the above special pair of colorings (X, Y ) on the infinite ∆-regular tree T ∆ represent the worst case for path coupling.The analysis for this general case is quite involved.We first state the path coupling lemma with general metric.Lemma 4.3 (Bubley and Dyer [6]).Given a pre-metric, which is a connected undirected graph on configuration space Ω with positive edge weight such that every edge is a shortest path, let Φ(X, Y ) be the length of the shortest path between two configurations X, Y ∈ Ω. Suppose that there is a coupling (X, Y ) → (X ′ , Y ′ ) of the Markov chain defined only for the pair (X, Y ) of configurations that are adjacent in the pre-metric, which satisfies that for some 0 < δ < 1.Then the mixing rate of the Markov chain is bounded by where diam(Ω) denotes the diameter of Ω in the pre-metric.
We use the following slightly modified pre-metric: A pair (X, Y ) ∈ Ω = [q] V is connected by an edge in the pre-metric if and only if X and Y differ at only one vertex, say v, and the edge-weight is given by deg(v).This leads us to the following definition.
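For any vertex $u \in V$ and any pair $(X', Y') \in \Omega^2$, we define $\phi_u(X', Y') = \deg(u)$ if $X'_u \neq Y'_u$ and $\phi_u(X', Y') = 0$ otherwise; and for $S \subseteq V$, we define the distance between $X'$ and $Y'$ on $S$ as $\Phi_S(X', Y') = \sum_{u \in S} \phi_u(X', Y')$. In addition, we denote $\Phi(X', Y') = \Phi_V(X', Y')$. Clearly, the diameter of $\Omega$ in distance $\Phi$ has $\mathrm{diam}(\Omega) \le n\Delta$.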
We prove the mixing rate in Theorem 4.2 for two separate regimes for $q$ by using two different couplings. We define $\alpha^* \approx 3.634\ldots$ to be the positive root of $\alpha = 2e^{1/\alpha} + 1$.
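The constant $\alpha^*$ can be checked numerically. The sketch below uses plain bisection on $f(\alpha) = \alpha - 2e^{1/\alpha} - 1$, which is increasing for $\alpha > 0$; the function name `alpha_star` and the bracketing interval $[3, 4]$ are choices made here for illustration.

```python
import math

def alpha_star(lo=3.0, hi=4.0, tol=1e-12):
    """Bisection for the positive root of alpha = 2 * exp(1 / alpha) + 1."""
    f = lambda a: a - 2.0 * math.exp(1.0 / a) - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return 0.5 * (lo + hi)
```

The computed root lies strictly between $2 + \sqrt{2}$ and $3.7$, which is why the two regimes treated by the two couplings overlap.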
Theorem 4.2 follows by combining the two lemmas.

An easy local coupling for q > 3.634∆ + 3
We first prove Lemma 4.4 by constructing a local coupling where the disagreement will not percolate outside its neighborhood. Let $X, Y \in [q]^V$ be two $q$-colorings, not necessarily proper. Assume that $X$ and $Y$ disagree only at vertex $v_0 \in V$. The coupling $(X, Y) \to (X', Y')$ is constructed as follows:
• Each vertex $v \in V$ proposes the same random color in the two chains $X$ and $Y$. Then $(X', Y')$ is determined by the transition rule of the LocalMetropolis chain.
Next we show the path coupling condition: The following technical lemma is frequently applied in the analysis of this and next couplings.
Lemma 4.6.If q ≥ a∆, then for any integer 0 which is nonnegative when q ≥ ad.
Proof of Lemma 4.4. First, observe that if $v \notin \Gamma^+(v_0)$, where $v_0$ is the vertex at which $X$ and $Y$ disagree, then it always holds that $X'_v = Y'_v$, because all vertices in $\Gamma^+(v)$ are colored the same in $X$ and $Y$ and propose the same random color in the two chains under the coupling. Therefore, it is sufficient to consider the difference between $X'$ and $Y'$ in $\Gamma^+(v_0)$, and we have $\mathbb{E}[\Phi(X', Y') \mid X, Y] = \sum_{u \in \Gamma^+(v_0)} \mathbb{E}[\phi_u(X', Y') \mid X, Y]$. For each $v$, let $c_v \in [q]$ be the uniform random color proposed independently by $v$, which is identical in both chains by the coupling.
For the disagreeing vertex v 0 , it holds that For each u ∈ Γ(v 0 ), since X u = Y u , the event Combining ( 11) and ( 12) together and due to linearity of expectation, we have where the last inequality is due to the monotonicity stated in Lemma 4.6.

A global coupling for $(2 + \sqrt{2})\Delta < q \le 3.7\Delta + 3$
Next, we prove Lemma 4.5 and bound the mixing rate when $(2 + \sqrt{2})\Delta < q \le 3.7\Delta + 3$. This is done by a global coupling where the disagreement may percolate to the entire graph, whose construction and analysis are substantially more sophisticated than those of the previous local coupling. Although this sophistication only improves the threshold for $q$ in Lemma 4.4 by a small constant factor, the effort is worthwhile because it helps us approach the threshold of the ideal coupling discussed in Section 4.2.1 and shows that the infinite $\Delta$-regular tree $T_\Delta$ represents the worst case for path coupling. Curiously, the extremity of this worst case only holds when $q$ is also properly upper bounded, say $q \le 3.7\Delta + 3$, whereas the mixing rate for larger $q$ was guaranteed by Lemma 4.4.
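Let $v_0 \in V$ be a vertex and $X, Y \in [q]^V$ any two $q$-colorings (not necessarily proper) which disagree only at $v_0$. The coupling $(X, Y) \to (X', Y')$ of the LocalMetropolis chain is constructed by coupling $(c^X, c^Y)$, where $c^X, c^Y \in [q]^V$ are the respective vectors of proposed colors in the two chains $X$ and $Y$. For each $v \in V$, the pair $(c^X_v, c^Y_v)$ is sampled from one of the two following joint distributions:
• consistent: $c^X_v$ is uniform over $[q]$ and $c^Y_v = c^X_v$;
• permuted: $c^X_v$ is uniform over $[q]$ and $c^Y_v$ is obtained from $c^X_v$ by exchanging the roles of the two colors $X_{v_0}$ and $Y_{v_0}$, that is, $c^Y_v = Y_{v_0}$ if $c^X_v = X_{v_0}$, $c^Y_v = X_{v_0}$ if $c^X_v = Y_{v_0}$, and $c^Y_v = c^X_v$ otherwise.
Note that for all $u \neq v_0$ we have $X_u = Y_u$, and if further $X_u \in \{X_{v_0}, Y_{v_0}\}$, we say the vertices $w \in \Gamma^+(u) \setminus \{v_0\}$ are blocked by $u$; all other vertices $u \neq v_0$ are unblocked. The special vertex $v_0$ is neither blocked nor unblocked. We denote by $\Gamma_B(v)$ and $\Gamma_U(v)$ the respective sets of blocked and unblocked neighbors of vertex $v$, and let $b_v = |\Gamma_B(v)|$ be the number of blocked neighbors of $v$. The coupling $(c^X, c^Y)$ of proposed colors is constructed by the following recursive procedure:
• Initially, for the disagreeing vertex $v_0$, $(c^X_{v_0}, c^Y_{v_0})$ is sampled consistently in the two chains.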
• For each unblocked u ∈ Γ(v 0 ), the (c X u , c Y u ) is sampled independently (of other vertices) from the permuted distribution.
• Let $S \subseteq V$ denote the current set of vertices $v$ such that $(c^X_v, c^Y_v)$ has been sampled, and $S^{\neq} = \{v \in S \mid c^X_v \neq c^Y_v\}$. We abuse the notation and use $\partial S^{\neq} = \{\text{unblocked } u \notin S \mid \exists uv \in E \text{ s.t. } v \in S^{\neq}\}$ to denote the unblocked un-sampled vertex boundary of $S^{\neq}$. If $\partial S^{\neq}$ is non-empty, then all $u \in \partial S^{\neq}$ sample their respective $(c^X_u, c^Y_u)$ independently from the permuted distribution and join $S$ simultaneously. Grow $S^{\neq}$ according to the results of sampling. Repeat this step until the current $\partial S^{\neq}$ is empty and thus $S$ is stabilized.
• For all remaining vertices v, (c X v , c Y v ) is sampled independently and consistently.
This procedure is in fact a Galton-Watson branching process starting from the root $v_0$. The blockedness of each vertex is determined by the current $X$ and $Y$. The set $S$ grows from the root by a percolation of disagreements $c^X_v \neq c^Y_v$ added in breadth-first order. It is easy to see that each individual $c^X_v$ or $c^Y_v$ is uniformly distributed over $[q]$ and is independent of $c^X_u$ or $c^Y_u$ for all other $u \neq v$ (although the joint distributions $(c^X_v, c^Y_v)$ may depend on each other). Therefore, $(c^X, c^Y)$ is a valid coupling of proposed colors.
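The recursive construction of the coupled proposals can be sketched as follows. This is an illustrative reading of the procedure, not a definitive implementation: the names `swap_colors` and `couple_proposals` are hypothetical, the graph is an adjacency-list dictionary, and a vertex $w \neq v_0$ is treated as blocked when some vertex $u \neq v_0$ in its closed neighborhood has $X_u \in \{X_{v_0}, Y_{v_0}\}$.

```python
import random

def swap_colors(c, a, b):
    """Exchange the roles of colors a and b; leave every other color unchanged."""
    return b if c == a else a if c == b else c

def couple_proposals(adj, X, Y, v0, q, rng):
    """Sample coupled proposals (cX, cY) for colorings X, Y that disagree only at v0."""
    a, b = X[v0], Y[v0]

    def blocked(w):
        return any(u != v0 and X[u] in (a, b) for u in [w] + list(adj[w]))

    cX, cY = {}, {}

    def sample(v, permuted):
        cX[v] = rng.randrange(q)
        cY[v] = swap_colors(cX[v], a, b) if permuted else cX[v]

    sample(v0, False)                                   # root: consistent
    frontier = [u for u in adj[v0] if not blocked(u)]   # unblocked neighbors of v0
    while frontier:
        for u in frontier:                              # whole boundary joins S at once
            sample(u, True)
        disagree = {v for v in cX if cX[v] != cY[v]}
        frontier = [u for u in adj
                    if u != v0 and u not in cX and not blocked(u)
                    and any(w in disagree for w in adj[u])]
    for v in adj:                                       # everything else: consistent
        if v not in cX:
            sample(v, False)
    return cX, cY
```

Because exchanging two colors is a bijection on $[q]$, each of `cX[v]` and `cY[v]` is individually uniform, which is the property that makes this a valid coupling of the proposals.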
A walk P = (v 0 , v 1 , . . ., v ℓ ) in G(V, E) is called a strongly self-avoiding walk (SSAW) if P is a simple path in G and v i v j is not an edge in G for any 0 < i+1 < j ≤ ℓ.An SSAW P = (v 0 , v 1 , . . ., v ℓ ) is said to be a path of disagreement with respect to (c X , c Y ) if (c X v i , c Y v i ), v i ∈ P are sampled in the order along the path P from i = 0 to ℓ, and c X v i = c Y v i for all 1 ≤ i ≤ ℓ.For any specific SSAW P = (v 0 , v 1 , . . ., v ℓ ) through unblocked vertices v 1 , v 2 , . . ., v ℓ , by the chain rule Proposition 4.7.For any vertex u = v 0 , the event c X u = c Y u occurs only if there is a strongly selfavoiding walk (SSAW) This means that u itself must be unblocked.At the time when (c X u , c Y u ) is being sampled, there must exist a neighbor w ∈ Γ(u) such that either (1) , and vertex w is unblocked.If it is the latter case, we repeat this argument for w recursively until v 0 is reached.This will give us a path P = (v 0 , v 1 , . . ., v ℓ ) from v 0 to u = v ℓ through unblocked vertices v 1 , . . ., v ℓ such that for all 1 Thus, P is a path of disagreement through unblocked vertices.Note that this path P = (v 0 , v 1 , . . ., v ℓ ) must be a strongly self-avoiding.To the contrary assume that P is not strongly self-avoiding and there exist 0 ≤ i, j ≤ ℓ such that i < j − 1 and v i v j is an edge.In this case, right after c X v i = c Y v i being sampled and v i joining S = , v i+1 and v j must be both in ∂S = because they are both unblocked un-sampled neighbors of v i then.And due to our construction of coupling, the (c X v i+1 , c Y v i+1 ) and (c X v j , c Y v j ) are sampled and v i+1 , v j join S simultaneously, which contradict that (c ) along the path.Therefore, P is an SSAW through unblocked vertices and is also a path of disagreement.
The coupled next step (X ′ , Y ′ ) is determined by the current (X, Y ) and the coupled proposed colors (c X , c Y ).Proposition 4.8.For any vertex u = v 0 , the event We then show for every edge uw incident to u, the followings hold: With ( 15), ( 16) and ( 17), each edge uw passes the check in chain X if and only if it passes the check in chain Y .Combining with the fact that X u = Y u for all u = v 0 , this implies We then verify ( 15), ( 16) and ( 17): , then for every neighbor w ∈ Γ(u), either w is blocked or w = v 0 .In both cases c X w = c Y w is sampled consistently, this implies ( 15) and ( 16), because c 15) and (16).And it holds that either For an unblocked vertex u = v 0 , assume ) must be sampled from the consistent distribution.And since u is unblocked and u = v 0 , the (c X u , c Y u ) is sampled from the consistent distribution only when for all neighbors w for all neighbors w ∈ Γ(u), which guarantees that X ′ u = Y ′ u , a contradiction.Therefore, we also show that for any unblocked u We then analyze the probability of Lemma 4.9.For the vertex v 0 at which the q-colorings X, Y ∈ [q] V disagree, accepts the proposal, which happens if the following events occur simultaneously: and the fact that X u = Y u for u = v 0 ).This occurs with probability at least q−dv 0 q .
• For all unblocked neighbors u ∈ Γ ).This occurs with probability at least 1 Thus the following is obtained by the chain rule: where the last inequality is due to the monotonicity stated in Lemma 4.6.
Proof.Due to Proposition 4.8, for unblocked u = v 0 , the event and u accepts its proposal in at least one chain among X, Y .Observe that any edge uv between unblocked vertices u, v either passes the check in both chains X, Y or does not pass the check in both chains.Therefore, the event X ′ u = Y ′ u occurs for an unblocked u = v 0 only if the following events occurs simultaneously: , which according to Proposition 4.7, occurs only if there is a SSAW P = (v 0 , v 1 , . . ., v ℓ ) from v 0 to v ℓ = u through unblocked vertices v 1 , . . ., v ℓ such that P is a path of disagreement; • for all unblocked neighbors w ∈ Γ U (u), the edge uw passes the check, which means c X w ∈ {c X u , X u } (and meanwhile c Y w ∈ {c Y u , Y u } by coupling) for all w ∈ Γ U (u); • all blocked neighbors w ∈ Γ B (u) passes the check in at least one chains among X, Y , which means either c . More specifically, these events occur only if: , which in either case, occurs with probability 1 q conditioning on (c

by the principle of inclusion-exclusion.
Take the union bound over all SSAW P = (v 0 , v 1 , . . ., v ℓ ) through unblocked vertices v 1 , . . ., v ℓ = u.Due to the strongly-avoiding property, it is safe to apply the chain rule for every P. We have: with that c Y u = Y w 0 which is due to that c X u = c Y u and X w 0 = Y w 0 , the edge uw 0 cannot pass the check in both chains, giving us X ′ u = Y ′ u , a contradiction.For the following, we assume c u and X w 0 = Y w 0 .We claim that u must have an unblocked neighbor w * ∈ Γ(u) such that c X w * = c Y w * because if otherwise for all the vertices w ∈ Γ + (u), the consistencies c X w = c Y w and X w = Y w hold, giving us X ′ u = Y ′ u , a contradiction.Therefore, there is a neighbor w * ∈ Γ(u) such that c X w * = c Y w * , which by Proposition 4.7, means that there is a strongly self-avoiding walk (SSAW ) is a path of disagreement.Fix any SSAW P = (v 0 , v 1 , . . ., v ℓ ) from v 0 to v ℓ = u with only u blocked.By Proposition 4.7: As argued above, assuming and c Y u = c X u due to the coupling), which occurs with probability 1 q conditioning on that P ′ is a path of disagreement.
As argued above, we have {c Since P is a strongly selfavoiding, we have w ∈ P for all w ∈ Γ(u)\{v ℓ−1 }.And the proposals are mutually independent in one chain.Condition on previous events, this probability is at most 1 − 2 q du−1 .
By the union bound over all SSAW P from v 0 to u with u being the only blocked vertex, and the chain rule for every P, we have This proves (19).
We then verify the path coupling condition: for some constant δ > 0, By the linearity of expectation, Due to Lemma 4.9, On the other hand, due to Lemma 4.10 and Lemma 4.11, ≤ P from v 0 to any u =v 0 where the inequality ( 22) is due to the monotonicity stated in Lemma 4.6, and the last sum in (23) enumerates all the walks P = (v 0 , v 1 , . . ., v ℓ ) from v 0 .And for such walk P, the quantity φ P is defined as that φ P = 0 if P is not a strongly self-avoiding walk (SSAW), and for a SSAW P = (v 0 , v 1 , . . ., v ℓ ) from v 0 to any v ℓ = u: It is easy to verify the inequality (23) with this definition of φ P .Given any walk P = (v 0 , v 1 , . . ., v ℓ ) from v 0 such that all v 1 , . . ., v ℓ are unblocked, we further define that where the sum enumerates all walks (not necessarily strongly self-avoiding) P ′ = (v 0 , v 1 , . . ., v ℓ , v ℓ+1 , . ..) with P as its prefix, including P itself.
Then by the inequality (23) the expected distance except for v 0 can be expressed as: Here each (v 0 , u) is a path (of length 1) from v 0 to its neighbor u.
And more importantly, for Φ P we have the following recurrence.For any walk P = (v 0 , v 1 , . . ., v ℓ ) from v 0 through unblocked vertices v 1 , . . ., v ℓ = u, if P is not strongly self-avoiding then Φ P = 0; and if otherwise P is strongly self-avoiding, then the following recurrence follows directly from the definition (24) of Φ P : where (P, w) denotes the walk P ′ = (v 0 , v 1 , . . ., v ℓ , w) that extends P.
The following lemma essentially states that Φ P is maximized when the number of blocked neighbors b u = 0 and then the value of Φ P is upper bounded by the fixpoint for this recurrence.Lemma 4.12.If 3∆ < q ≤ 3.7∆ + 3 and ∆ ≥ 5, then for any walk P = (v 0 , v 1 , . . ., v ℓ ) from v 0 such that all v 1 , . . ., v ℓ are unblocked, it holds that Proof.We prove by induction on the length of the walk.Let P = (v 0 , v 1 , . . ., v ℓ ) be a walk from v 0 such that all v 1 , . . ., v ℓ are unblocked and v ℓ = u.When ℓ is longer than the longest strongly self-avoiding walk among unblocked v 1 , . . ., v ℓ , then P is not a SSAW and thus Φ P = 0. Assume that the lemma holds for all unblocked walks longer than ℓ.Then due to the recurrence (26), which is bounded from above by The inequality holds trivially when b u = 0.It is then sufficient to prove that LHS is monotone on integer b u ≥ 0: Denoted In particular this holds when 3∆ < q ≤ 3.7∆ + 3 and ∆ ≥ 5.This completes the induction.

Lower bounds
In this section, we show lower bounds for local sampling. Let $G(V, E)$ be a network, and $\mathcal{I}$ an instance of MRF or weighted local CSP defined on graph $G$. For example, $\mathcal{I} = (G, [q], A, b)$ for an MRF with edge activities $A = \{A_e\}_{e \in E}$ and vertex activities $b = \{b_v\}_{v \in V}$. We assume that each vertex $v \in V$ has access to an independent random variable $\Psi_v$ as its source of randomness. Then a $t$-round protocol specifies a family of functions $\Pi_{v,\mathcal{I}}$, such that for each vertex $v \in V$, the output $X_v$ is produced as $X_v = \Pi_{v,\mathcal{I}}(\Psi_u : u \in B_t(v))$, where $B_t(v) = \{u \in V \mid \mathrm{dist}(u, v) \le t\}$ represents the $t$-ball centered at $v$. Let $\mu_{out}$ denote the distribution of the output random vector $X = (X_v)_{v \in V}$. The goal is to have $d_{TV}(\mu_{out}, \mu) \le \epsilon$, where $\mu = \mu_{\mathcal{I}}$ is the Gibbs distribution defined by the MRF instance $\mathcal{I}$.
Note that in above we allow the protocol Π v,I executed at each vertex v ∈ V to be aware of the instance I of the MRF.This is much stronger than the original LOCAL model.In fact, the only locality property we are using to prove our lower bounds is that for any X = (X v ) v∈V returned by a t-round protocol: The lower bounds implied by this property is due to the locality of randomness.For many natural MRFs, the Gibbs distribution µ exhibits the following exponential correlations: There exist constants δ, η > 0 such that for a path P of length n, any vertices u, v from the path, there are two spin states σ u , σ ′ u ∈ [q] such that µ u (σ u ) ≥ δ, µ u (σ ′ u ) ≥ δ for the marginal distribution µ u induced by µ at vertex u and This exponential correlation property is satisfied by many MRFs, in particular, the proper qcolorings for any constant q.For MRFs having this property, for any ǫ > exp(−o(n)), vertex pairs (u, v) with sufficiently small dist(u, v) = Ω(log 1 ǫ ) will contribute at least an ǫ total variation distance between Gibbs (σ u , σ v ) and any independent (X u , X v ).And due to (30), this gives an Ω(log 1 ǫ ) lower bound for local sampling from any MRF satisfying (31), where ǫ is the total variation distance.
We then show that the $\Omega(\log n)$ lower bound holds even for a constant total variation distance $\epsilon$. A similar $\Omega(\log n)$ lower bound for sampling independent sets is proved independently in [33]. Altogether it shows that the $O\left(\log \frac{n}{\epsilon}\right)$ upper bound in Theorem 1.2 is optimal.
Theorem 5.1. Let $q \ge 3$ be a constant and $\epsilon < \frac{1}{3}$. Any $t$-round protocol that samples a uniform proper $q$-coloring in a path within total variation distance $\epsilon$ must have $t = \Omega(\log n)$.
Proof.We actually prove the lower bound for all MRFs satisfying a stronger exponential correlation property stated as follows: There exist constants δ, η > 0 such that for a path P of length n, for any non-adjacent vertices x, u, v, y in the path from left to right, any spin states σ x , σ y ∈ [q], there exist two spin states It can be verified by a simple recursion for marginal probabilities in paths [45] that this property as well as the weaker correlation property (31) hold for uniform proper q-colorings in paths for any constant q ≥ 3. Let P = (w 0 , w 1 , . . ., w n−1 ) be a path of n vertices.For i = 0, 1, . . ., m where m = n−1 3(2t+1) , we denote x i = w 3(2t+1)i ; and for i = 0, 1, . . ., m − 1, denote u i = w 3(2t+1)i+2t+1 , and v i = w 3(2t+1)i+2(2t+1) .We denote F = {x i | 0 ≤ i ≤ m} and U = {u i , v i | 0 ≤ i ≤ m − 1}, and let C = F ∪ U .We call the vertices in C the centers, and the vertices in F and U the fixed and unfixed centers respectively.Note that the pairs (u i , v i ) of consecutive unfixed centers are separated by the fixed centers x i 's.Due to the conditional independence of MRF, conditioning on any particular configuration σ F ∈ [q] F of fixed centers, for a σ ∈ [q] P sampled from the Gibbs distribution µ consistent with σ F over F , the pairs (σ u i , σ v i ) are mutually independent of each other.For the followings we assume that we are conditioning on an arbitrarily fixed σ F ∈ [q] F .Let X u i and X v i be the respective output of u i and v i in a t-round protocol.Due to the observation of (30), X u i and X v i are mutually independent.According to the exponential correlation of (32), by choosing a suitably small t = O(log n), the total variation distance between (σ u i , σ v i ) and (X u i , X v i ) is at least exp(−Ω(t)) = n − 1 4 .We denote X i = (X u i , X v i ) and Y i = (σ u i , σ v i ), and consider the random vector X = (X i ) 0≤i≤m−1 and Y = (Y i ) 0≤i≤m−1 where Y is sampled conditioning on an arbitrarily fixed σ F ∈ [q] F 
.As we argued above, both X = (X i ) and Y = (Y i ) are vectors of mutually independent variables, and , where π i is a distribution over [q] 2 .Suppose Y follows the product distribution ν = ν 0 ×ν 1 ×. ..×ν m−1 , where ν i is a distribution over [q] 2 .Given π and ν, for each 0 ≤ i ≤ m − 1, define a map f i : [q] 2 → {0, 1}: Define the function f : where the last inequality holds because t = O(log n) and n is sufficiently large.Given a sample τ ∈ ([q] 2 ) m , we say the event A occurs if Note that m ≤ n.By Hoeffding's inequality, it holds that Pr .
By the definition of total variation distance, it holds that Recall that the above Y is sampled conditioning on an arbitrary configuration σ F ∈ [q] F of fixed centers.Now we consider a σ ∈ [q] P sampled from the Gibbs distribution µ on the path P and its restrictions σ F , σ U and σ C on F = {x i }, U = {u i , v i } and C = F ∪ U .Also let X be the vector of values returned by the vertices in P in a t-round protocol, and X F , X U and X C its restrictions on the respective sets of centers.The theorem follows if we can show that d TV (X, σ) > 1  3 for our choice of t = n).By definition of the total variation distance, we have: Note that If this quantity is greater than 1/3, then we already have d TV (X, σ) > 1/3 and the lower bound is proved.If otherwise, we suppose that 1 2 Observe that for any σ F ∈ [q] F , we have 1 2 0≤i≤m−1 is sampled conditioning on σ F and the inequality is due to (33).Therefore, the total variation distance in (34) can be further bounded as Next, we state a strong Ω(diam) lower bound for sampling with long-range correlations.
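The placement of centers in the proof of Theorem 5.1 can be made concrete with a small sketch. The helper names `centers` and `ball` are hypothetical, and rounding $m$ down to an integer is an assumption made here. The point illustrated is that $u_i$ and $v_i$ are at distance $2t + 1$, so their $t$-balls are disjoint; since each output of a $t$-round protocol depends only on the random variables $\Psi$ inside its own $t$-ball, the outputs at $u_i$ and $v_i$ are independent.

```python
def centers(n, t):
    """Indices of fixed centers x_i and unfixed pairs (u_i, v_i) on a path w_0..w_{n-1}."""
    step = 2 * t + 1
    m = (n - 1) // (3 * step)          # assumption: m is rounded down
    fixed = [3 * step * i for i in range(m + 1)]
    unfixed = [(3 * step * i + step, 3 * step * i + 2 * step) for i in range(m)]
    return fixed, unfixed

def ball(center, t, n):
    """Vertices of the path within distance t of the given index."""
    return set(range(max(0, center - t), min(n - 1, center + t) + 1))
```

For example, with `n = 100` and `t = 2` the pairs are `(5, 10), (20, 25), ...`, each separated from the next pair by a fixed center.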

An Ω(diam) lower bound in the non-uniqueness regime
We consider the weighted independent sets of graphs, the hardcore model. Given a graph $G(V, E)$ and a fugacity parameter $\lambda > 0$, each configuration $\sigma$ indicates an independent set $I$ in $G$ and is assigned a weight $w(\sigma) = \lambda^{|I|}$. The Gibbs distribution $\mu = \mu_G$ is defined over all independent sets in $G$ proportional to their weights. As discussed in Section 2.2, the model is an MRF. The hardcore model on graphs with maximum degree $\Delta$ undergoes a computational phase transition at the uniqueness threshold $\lambda_c(\Delta) = \frac{(\Delta-1)^{\Delta-1}}{(\Delta-2)^{\Delta}}$, such that sampling from the Gibbs distribution can be done in polynomial time in the uniqueness regime $\lambda < \lambda_c$ [60,20] and is intractable unless NP=RP in the non-uniqueness regime $\lambda > \lambda_c$ [8,55,56,28].
The following theorem states an $\Omega(\mathrm{diam})$ lower bound for sampling from the hardcore model in the non-uniqueness regime. In particular, when $\lambda = 1$ the model represents the uniform independent sets, and the non-uniqueness condition $\lambda > \lambda_c(\Delta)$ holds when $\Delta \ge 6$, which gives us Theorem 1.3.

Theorem 5.2. Let $\Delta \ge 3$ and $\lambda > \lambda_c(\Delta)$. Let $\epsilon > 0$ be a sufficiently small constant. For all $N > 0$ there exists a graph $G$ on $\Theta(N)$ vertices with maximum degree $\Delta$ and diameter $\mathrm{diam}(G) = \Omega(N^{1/11})$ such that for the hardcore model on $G$ with fugacity $\lambda$, any $t$-round protocol that samples within total variation distance $\epsilon$ from the Gibbs distribution $\mu = \mu_G$ must have $t = \Omega(\mathrm{diam}(G))$.
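The claim that $\lambda = 1$ falls in the non-uniqueness regime exactly from $\Delta = 6$ onward can be checked directly from the threshold formula $\lambda_c(\Delta) = (\Delta-1)^{\Delta-1}/(\Delta-2)^{\Delta}$. The sketch below uses exact rational arithmetic; the function name `lambda_c` and the search range 3..20 are choices made here for illustration.

```python
from fractions import Fraction

def lambda_c(delta):
    """Uniqueness threshold (delta-1)^(delta-1) / (delta-2)^delta, exactly."""
    if delta < 3:
        raise ValueError("the threshold is defined for maximum degree >= 3")
    return Fraction((delta - 1) ** (delta - 1), (delta - 2) ** delta)

# Thresholds for small degrees: 4, 27/16, 256/243, 3125/4096, ...
thresholds = {d: lambda_c(d) for d in range(3, 8)}

# Smallest degree (in the searched range) at which lambda = 1 exceeds the threshold.
first_non_unique = min(d for d in range(3, 21) if lambda_c(d) < 1)
```

For $\Delta = 5$ the threshold is $256/243$, slightly above 1, while for $\Delta = 6$ it is $3125/4096$, below 1, which matches the condition $\Delta \ge 6$ stated above.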
We follow the approaches in [8,55,56,27,28] for the computational phase transition. The network $G = H^G$ is constructed by lifting a graph $H$ with a gadget $G$, such that sampling from the hardcore model on $H^G$ with $\lambda > \lambda_c(\Delta)$ effectively samples a maximum cut in $H$. We choose $H$ to be an even cycle, in which the maximum cut imposes a long-range correlation among vertices. To sample with such a long-range correlation, the sampling algorithm cannot be local.
Unlike the results of [8,55,56,27,28], which concern the computational complexity of approximate counting, here we prove unconditional lower bounds for sampling in the LOCAL model. Our lower bound is due to the long-range correlations in the random max-cut rather than the computational complexity of optimization. Technically, this means that in addition to showing that a max-cut in $H$ is sampled, we also need to show that the sampled max-cut is distributed almost uniformly.

The random graph gadget
We now describe the random graph gadget, which is essential to the hardness of sampling. The gadget is constructed in two steps. For positive integers $n$, $r$ and $\Delta$, we first describe the construction of the random bipartite (multi)graph $G^r_n$:
• Let $V^+$ and $V^-$ be two vertex sets with
• Uniformly and independently sample $\Delta - 1$ perfect matchings between $V^+$ and $V^-$, and then uniformly and independently sample a perfect matching between $U^+$ and $U^-$.
The union of all these matchings gives us the random bipartite (multi)graph $G^r_n$, in which every vertex in $U$ has degree $\Delta$ and every vertex in $W$ has degree $\Delta - 1$.

Now we describe the second part of the construction. Let $0 < \theta < \psi < 1/8$ be constants. Let r ′ := ). First, we sample $G$ from the distribution $G^{r'}_n$. Next, attach $k$ disjoint $(\Delta-1)$-ary trees of even depth $l$ (with $k = (\Delta-1)^{\lfloor \theta \log_{\Delta-1} n \rfloor}$ and $l = 2\lfloor \frac{\psi}{2} \log_{\Delta-1} n \rfloor$) to $W^\pm$, such that every vertex in $W$ is a leaf of exactly one tree and the trees do not share common vertices with the bipartite graph $G$, apart from the vertices in $W$. Let $T^\pm$ denote the roots of those trees ($|T^+| = |T^-| = k$), called "terminals". We denote the family of graphs that can be constructed this way by $G(k, n, \Delta)$. Note that our construction is still bipartite with size $\Theta(n)$, and the terminals in $T^+$ and $T^-$ belong to distinct partitions of the bipartite graph.
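The first step of the construction (the union of random perfect matchings) can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact gadget: it assumes each side splits as $V^\pm = U^\pm \cup W^\pm$ with hypothetical sizes `n_u` and `n_w` chosen by the caller, it represents a uniform perfect matching by a uniform random permutation, and it omits the second step (attaching the trees).

```python
import random
from collections import Counter

def sample_bipartite_gadget(n_u, n_w, delta, rng):
    """Sample the union of delta-1 perfect matchings between V+ and V-
    and one perfect matching between U+ and U-.

    Vertices on each side are indexed 0..n_u+n_w-1; indices below n_u form U,
    the rest form W (an assumed split). Returns (U, W, edges), where each edge
    is a pair (index on the + side, index on the - side); repeated edges are
    kept, so the result is a multigraph.
    """
    U = list(range(n_u))
    W = list(range(n_u, n_u + n_w))
    V = U + W
    edges = []
    for _ in range(delta - 1):
        perm = V[:]
        rng.shuffle(perm)            # uniform perfect matching V+ -> V-
        edges.extend(zip(V, perm))
    perm = U[:]
    rng.shuffle(perm)                # one extra perfect matching U+ -> U-
    edges.extend(zip(U, perm))
    return U, W, edges

def side_degrees(edges):
    """Degree of every vertex, separately for the + side and the - side."""
    plus = Counter(a for a, _ in edges)
    minus = Counter(b for _, b in edges)
    return plus, minus

rng = random.Random(0)  # fixed seed only to make the illustration repeatable
U, W, edges = sample_bipartite_gadget(n_u=6, n_w=2, delta=4, rng=rng)
plus_deg, minus_deg = side_degrees(edges)
```

Whatever the random choices, every vertex of $U$ ends up with degree $\Delta$ and every vertex of $W$ with degree $\Delta - 1$, which is the degree property stated above and the reason the vertices of $W$ have one free slot for a tree edge.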
The phase of a configuration $\sigma$, denoted as $Y(\sigma)$, is defined as It is easy to verify that the random bipartite graph $G^r_n$ in the first step is an expander with high probability. The following proposition was proved in [8].
(∆−2) ∆ then there exist two constants $0 < q^- < q^+ < 1$ such that the following hold. Let $Q^\pm_T$ denote the product measure on configurations in $\{0, 1\}^T$ so that the spin states are i.i.d. Bernoulli with probability $q^\pm$ on $T^+$ and $q^\mp$ on $T^-$, that is: For any $\delta > 0$, there exists a sufficiently large constant $N_0(\delta)$ such that for all $n > N_0(\delta)$ the following hold altogether with positive probability for $G \sim G(k, n, \Delta)$: where $\Pr_G$ is the probability law for $\sigma$ sampled from $\mu_G$. By the probabilistic method, there exists a $G$ satisfying the above conditions.
• For each vertex $x \in H$ let $G_x$ be a copy of $G$. We denote by $T^\pm_x$ the respective set of $2k$ terminals in $G_x$. Let $\hat{H}^G$ be the graph consisting of the disconnected copies $G_x$, $x \in H$.
• For every edge $(x, y) \in H$, add $k$ edges between $T^+_x$ and $T^+_y$ and similarly add $k$ edges between $T^-_x$ and $T^-_y$. This can be done in such a way that the resulting (multi)graph $H^G$ is $\Delta$-regular.
Given the phase $Y' \in \{+, -\}^{V(H)}$, we define: where is the set of all independent sets in $H^G$. We also use $\Pr_{H^G}$ to represent the probability law for $\sigma$ sampled from $\mu_{H^G}$.
Note that the cycle $H$ has precisely two maximum cuts. A key property for proving the lower bound is that in the non-uniqueness regime, sampling from the hardcore model on the graph $H^G$ corresponds to sampling a maximum cut in $H$ almost uniformly.
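The statement that an even cycle has precisely two maximum cuts, viewed as phase assignments in $\{+, -\}^{V(H)}$, can be verified by exhaustive search on small cycles. The sketch below is for intuition only; the function name `max_cuts_of_cycle` is introduced here and is not from the paper.

```python
from itertools import product

def max_cuts_of_cycle(m):
    """Return (max cut size, list of phase assignments attaining it) for the m-cycle."""
    edges = [(i, (i + 1) % m) for i in range(m)]
    best_size, best_assignments = -1, []
    for phases in product("+-", repeat=m):
        # An edge is cut iff its endpoints carry different phases.
        size = sum(phases[u] != phases[v] for u, v in edges)
        if size > best_size:
            best_size, best_assignments = size, [phases]
        elif size == best_size:
            best_assignments.append(phases)
    return best_size, best_assignments

# m = 6: m/2 = 3 is odd, so antipodal vertices 0 and 3 get opposite phases.
size6, cuts6 = max_cuts_of_cycle(6)
# m = 8: m/2 = 4 is even, so antipodal vertices 0 and 4 get equal phases.
size8, cuts8 = max_cuts_of_cycle(8)
```

In both cases the two maximum cuts are the two alternating assignments, so the phase of one vertex determines the phase of every other vertex, including the one at distance $m/2$. This is the long-range correlation that the lower bound exploits.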
Proof. Since the graph $\hat{H}^G$ consists of a collection of disconnected copies of $G$, the distribution of a configuration on $\hat{H}^G$ is given by the product measure of configurations on the $(G_x)_{x \in H}$. In particular the phases are independent, therefore Note that the ratio $Z_{H^G}(Y') / Z_{\hat{H}^G}(Y')$ is precisely the probability of a $\sigma$ sampled from $\mu_{\hat{H}^G}$ being an independent set in $H^G$. Due to Proposition 5.3, conditioning on the phase $Y'$, the spins of $\sigma$ on $\bigcup_{x \in H} T_x$ are almost i.i.d. Bernoulli with probabilities $q^+$ or $q^-$ depending on the phase, therefore where x Tx (σ Tx ) .

Proof of the Ω(diam) lower bound
Now we are ready to prove Theorem 5.2. Let $N$ be sufficiently large. We choose an integer $n = \Theta(N^{10/11})$ and an even integer $m = \Theta(N^{1/11})$ such that $m/2$ is odd, so that a gadget $G$ is constructed to satisfy Proposition 5.3, and the graph $G = H^G$, where $H$ is a cycle of length $m$, is constructed as described in Section 5.1.2. Note that $\mathrm{diam}(G) \ge \mathrm{diam}(H) \ge m/2$ and $|V(G)| = \Theta(N)$, therefore $\mathrm{diam}(G) = \Omega(N^{1/11})$. Let $\sigma'$ denote the output of a $t$-round protocol with $t \le 0.49 \cdot \mathrm{diam}(G)$ on network $G$, whose distribution is denoted as $\mu_t$; and let $\sigma$ be sampled from the hardcore Gibbs distribution $\mu = \mu_G$. For the sake of contradiction, we assume that $d_{TV}(\mu_t, \mu) \le \epsilon$ for a sufficiently small constant $\epsilon$. We pick $u, v \in V(G)$ which satisfy $\mathrm{dist}_G(u, v) = \mathrm{diam}(G)$. Since $G = H^G$ is constructed by replacing each vertex $x$ in $H$ with $G_x$, which is an identical copy of $G$, it must hold that $u \in G_x$, $v \in G_y$ for some vertices $x, y$ in $H$ with $\mathrm{dist}_H(x, y) = m/2$. Since $m/2$ is odd, without loss of generality we suppose that the two maximum cuts $Y'$ and $Y''$ of $H$ satisfy $Y'_x = +$, $Y'_y = -$ and $Y''_x = -$, $Y''_y = +$. Moreover, for all $u' \in G_x$, $v' \in G_y$, by the triangle inequality we have: For the $\sigma'$ returned by a $t$-round protocol where $t \le 0.49 \cdot \mathrm{diam}(G)$, according to the property (30), $\sigma'_{G_x}$ and $\sigma'_{G_y}$ are independent of each other, thus the phases of $G_x$ and $G_y$ on $\sigma'$ are independent of each other: by taking $\epsilon$ to be a sufficiently small constant, which contradicts the independence given in (39).
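The contradiction at the end of the proof rests on a simple fact: a pair of independent phases cannot be close to a pair of perfectly anti-correlated phases. The sketch below checks this numerically in an idealized setting, which is an assumption made here for illustration: the target law puts mass exactly 1/2 on each of $(+,-)$ and $(-,+)$ (ignoring the small errors in the actual proof), and the protocol's phases at $G_x$ and $G_y$ follow an arbitrary product law with $\Pr[Y_x = +] = a$ and $\Pr[Y_y = +] = b$.

```python
def tv_to_anticorrelated(a, b):
    """Total variation distance between the product law with
    Pr[Y_x = +] = a, Pr[Y_y = +] = b and the uniform law on {(+,-), (-,+)}."""
    product_law = {
        ("+", "+"): a * b,
        ("+", "-"): a * (1 - b),
        ("-", "+"): (1 - a) * b,
        ("-", "-"): (1 - a) * (1 - b),
    }
    target = {("+", "-"): 0.5, ("-", "+"): 0.5}
    return 0.5 * sum(abs(p - target.get(pair, 0.0)) for pair, p in product_law.items())

# Grid search over product laws (step 0.005 in both marginals).
grid = [i / 200 for i in range(201)]
min_tv = min(tv_to_anticorrelated(a, b) for a in grid for b in grid)
```

On this grid the distance never drops below 0.41, so no product law is within a small constant $\epsilon$ of the anti-correlated target. This mirrors why an output with independent phases at $G_x$ and $G_y$ cannot be $\epsilon$-close to the Gibbs distribution.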

Conclusion
In this paper, we study the local sampling problem and ask a new question about local computation: whether a locally definable joint distribution can be sampled locally.
On the positive side, we give two distributed sampling algorithms, LubyGlauber and LocalMetropolis. We show that LubyGlauber achieves $O(\Delta \log n)$ mixing time under Dobrushin's condition, and LocalMetropolis may achieve the optimal $O(\log n)$ mixing time under a stronger mixing condition. Thus many locally definable joint distributions can be sampled locally.
On the negative side, we give an $\Omega(\log n)$ lower bound for sampling from a broad class of locally defined joint distributions. Thus an $O(\log n)$ radius can be considered as the new criterion for being local for distributed sampling algorithms. Furthermore, we give an $\Omega(\mathrm{diam}) = n^{\Omega(1)}$ lower bound for sampling weighted independent sets in the non-uniqueness regime. Since an independent set is trivial to construct, this gives a strong separation between local sampling and local construction. The lower bounds hold even if every vertex is aware of the graph structure, which means that the hardness of local sampling is due to the discrepancy between the locality of randomness in distributed algorithms and the long-range correlation in the joint distribution from which we want to sample.

Remark 4.1.
The LocalMetropolis algorithm can be naturally extended to sample from weighted CSPs. The local filtering now occurs on each local constraint, such that a $k$-ary constraint $c = (f_c, S_c) \in C$ passes the check with a probability which is a product of $2^k - 1$ normalized factors $f_c(\tau)$, for the $\tau \in [q]^{S_c}$ obtained from the $2^k - 1$ ways of mixing $\sigma_{S_c}$ with $X_{S_c}$, excluding $X_{S_c}$ itself.
Algorithm 2: Pseudocode for the LocalMetropolis algorithm
Input: Each vertex $v \in V$ receives $\{A_{uv}\}_{u \in \Gamma(v)}$ and $b_v$ as input.
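The filtering rule of Remark 4.1 can be sketched as below. This is a minimal illustration under assumptions that are not confirmed by the text: the normalization is taken to be division by the maximum value `f_max` of the constraint function, $X$ denotes the current values and $\sigma$ the proposed values on the scope $S_c$, and the function name `constraint_pass_probability` is hypothetical.

```python
from itertools import product

def constraint_pass_probability(f, f_max, X_S, sigma_S):
    """Probability that a k-ary constraint passes the local check.

    f       : constraint function on k-tuples of values (non-negative)
    f_max   : assumed normalizer, the maximum value of f
    X_S     : current values on the scope of the constraint
    sigma_S : proposed values on the scope of the constraint

    Multiplies the normalized factor f(tau) / f_max over the 2^k - 1 tuples tau
    obtained by taking each coordinate from sigma_S or from X_S, excluding the
    tuple that is taken entirely from X_S.
    """
    k = len(X_S)
    prob = 1.0
    for pick in product((0, 1), repeat=k):
        if not any(pick):
            continue  # all coordinates taken from X_S: this tuple is excluded
        tau = tuple(sigma_S[i] if pick[i] else X_S[i] for i in range(k))
        prob *= f(tau) / f_max
    return prob

# Hypothetical binary constraint (k = 2) forbidding two occupied endpoints,
# as in the hardcore model: three factors are multiplied.
def no_both_occupied(tau):
    return 0.0 if tau[0] == 1 and tau[1] == 1 else 1.0

p_ok = constraint_pass_probability(no_both_occupied, 1.0, X_S=(0, 0), sigma_S=(1, 0))
p_bad = constraint_pass_probability(no_both_occupied, 1.0, X_S=(0, 1), sigma_S=(1, 0))

# Hypothetical soft ternary constraint (k = 3) with constant value 0.5:
# seven factors are multiplied.
p_soft = constraint_pass_probability(lambda tau: 0.5, 1.0, X_S=(0, 0, 0), sigma_S=(1, 1, 1))
```

For $k = 2$ the product has three factors, one per mixed or fully proposed pair. In the second example the mixed tuple that combines the proposed value on the first coordinate with the current value on the second is infeasible, so the check fails with certainty.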