Abstract
The local computation of Linial [FOCS’87] and Naor and Stockmeyer [STOC’93] studies whether a locally defined distributed computing problem is locally solvable. In classic local computation tasks, the goal of distributed algorithms is to construct a feasible solution for some constraint satisfaction problem (CSP) locally defined on the network. In this paper, we consider the problem of sampling a uniform CSP solution by distributed algorithms in the \(\mathsf {LOCAL}\) model, and ask whether a locally definable joint distribution is locally sampleable. We use Markov random fields and Gibbs distributions to model locally definable joint distributions. We give two distributed algorithms based on Markov chains, called LubyGlauber and LocalMetropolis, which we believe represent two basic approaches to distributed Gibbs sampling. The algorithms achieve respective mixing times \(O(\varDelta \log n)\) and \(O(\log n)\) under typical mixing conditions, where n is the number of vertices and \(\varDelta \) is the maximum degree of the graph. We show that the time bound \(\varTheta (\log n)\) is optimal for distributed sampling. We also show a strong \(\varOmega (\mathrm {diam})\) lower bound, in particular for sampling independent sets in graphs with maximum degree \(\varDelta \ge 6\). This gives a strong separation between sampling and constructing locally checkable labelings.
Keywords
Distributed sampling algorithms · Local computation · \(\mathsf {LOCAL}\) model · Gibbs sampling · Markov chain Monte Carlo
1 Introduction
Local computation and the \(\mathsf {LOCAL}\) model. Locality of computation is a central theme in the theory of distributed computing. In the seminal works of Linial [44], and Naor and Stockmeyer [49], the locality of distributed computation and the locally definable distributed computing problems are respectively captured by the \(\mathsf {LOCAL}\) model and the notion of locally checkable labeling (LCL) problems. In the \(\mathsf {LOCAL}\) model [49, 52], a network of n processors is represented as an undirected graph, where each vertex represents a processor and each edge represents a bidirectional communication channel. Computations and communications are organized in synchronized rounds. In each round, each processor may receive a message of arbitrary size from each of its neighbors, perform an arbitrary local computation with the information collected so far, and send a message of arbitrary size to each of its neighbors. The output value of each vertex in a t-round protocol is determined by the local information within the t-neighborhood of the vertex. Local computation tasks are usually formulated as labeling problems, such as the locally checkable labeling (LCL) problems introduced in [49], in which the distributed algorithm is asked to construct a feasible solution of a constraint satisfaction problem (CSP) defined by local constraints of constant diameter in the network. Many problems can be expressed in this way, including various vertex/edge colorings, and local optimization problems such as maximal independent set (MIS) and maximal matching.
A classic question for local computation is whether a locally definable problem is locally computable. Mathematically, this asks whether a feasible solution for a given local CSP can be constructed using only local information. There is a substantial body of research works dedicated to this question [2, 3, 4, 5, 10, 24, 25, 29, 30, 31, 34, 39, 40, 41, 42, 44, 49, 54].
The local sampling problem. Given an LCL problem which defines a local CSP on the network, besides constructing a feasible solution of the local CSP, another interesting problem is to sample a uniform random feasible solution, e.g. a uniform random proper coloring of the network G with a given number of colors. More abstractly, given an instance of a local CSP which, say, treats the vertices of the network G(V, E) as variables, the joint distribution of a uniform random feasible solution \(\varvec{X}=(X_v)_{v\in V}\) is defined accordingly by these local constraints. Our main question is whether a locally definable joint distribution can be sampled locally.
Intuitively, sampling could be substantially more difficult than labeling, because sampling a feasible solution is at least as difficult as constructing one; furthermore, the marginal distribution of each random variable \(X_v\) in a jointly distributed feasible solution \(\varvec{X}=(X_v)_{v\in V}\) may already encapsulate a certain amount of nonlocal information about the solution space.
Retrieving such information about the solution space (as in sampling) instead of constructing one solution (as in labeling) by distributed algorithms is especially well motivated in the context of distributed machine learning [14, 15, 17, 32, 50, 57, 61, 62, 63], where the data (the description of the joint distribution) is usually distributed among a large number of servers.
Besides uniform distributions, it is also natural to consider sampling from general nonuniform distributions over the solution space, which are usually formulated as graphical models known as weighted CSPs [7], also known as factor graphs [47]. In this model, a probability distribution called the Gibbs distribution is defined over the space \(\varOmega =[q]^V\) of configurations, in such a way that each constraint of the weighted CSP contributes a nonnegative factor to the probability measure of a configuration in \(\varOmega \). By the fundamental Hammersley-Clifford theorem [47, Theorem 9.3] of random fields, this model is universal for conditionally independent (spatially Markovian) [47, Proposition 9.2] joint distributions. Roughly speaking, the conditional independence property says that, for a separator \(S\subset V\) whose removal “disconnects” the variable sets A and B, given any feasible configuration \(X_S=\sigma _S\) over S, the configurations \(X_A\) over A and \(X_B\) over B are conditionally independent.
1.1 Our results
We give two Markov chain based distributed algorithms for sampling from Gibbs distributions. Given any \(\epsilon >0\), each algorithm returns a random output which is within total variation distance \(\epsilon \) from the Gibbs distribution. Our exposition mainly focuses on MRFs, although both algorithms extend straightforwardly to general weighted local CSPs.
In classic single-site Markov chains for Gibbs sampling, such as the Glauber dynamics, at each step a variable is picked at random and updated according to its neighbors’ current states. A generic approach to parallelizing a single-site sequential Markov chain is to update a set of nonadjacent vertices in parallel at each step. This natural idea has been considered in [32], and also in much broader contexts such as parallel job scheduling [12] and the distributed Lovász local lemma [11, 48]. It is especially suitable for sampling from locally defined joint distributions because of the conditional independence property of MRFs.
Our first algorithm, named LubyGlauber, naturally parallelizes the Glauber dynamics by updating in parallel the vertices of independent sets generated by the “Luby step” of Luby’s algorithm [1, 46]. It is well known that the Glauber dynamics achieves the mixing rate \(\tau (\epsilon )=O\left( n\log \left( \frac{n}{\epsilon }\right) \right) \) under Dobrushin’s condition for the decay of correlation [16, 35]. By a standard coupling argument, the LubyGlauber algorithm achieves a mixing rate \(\tau (\epsilon )=O\left( \varDelta \log \left( \frac{n}{\epsilon }\right) \right) \) under the same condition, where \(\varDelta \) is the maximum degree of the network. In particular, for uniform proper q-colorings, this implies:
Theorem 1
If \(q\ge \alpha \varDelta \) for an arbitrary constant \(\alpha >2\), there is an algorithm which samples a uniform proper q-coloring within total variation distance \(\epsilon >0\) in \(O\left( \varDelta \log \left( \frac{n}{\epsilon }\right) \right) \) rounds of communication on any graph G(V, E) with \(n=|V|\) vertices and maximum degree \(\varDelta \), where \(\varDelta \) may be unbounded.

Is it possible to update all variables in \(\varvec{X}=(X_v)_{v\in V}\) simultaneously and still converge to the correct stationary distribution \(\mu \)?

More concretely, is it always possible to sample an almost uniform proper q-coloring, for some \(q=O(\varDelta )\), on any graph G(V, E) with \(n=|V|\) vertices and maximum degree \(\varDelta \), within \(O(\log n)\) rounds of communication, especially when \(\varDelta \) is unbounded?
The LocalMetropolis algorithm always converges to the correct Gibbs distribution. The analysis of its mixing time is more involved. In particular, for uniformly sampling proper q-colorings we show:
Theorem 2
If \(q\ge \alpha \varDelta \) for an arbitrary constant \(\alpha >2+\sqrt{2}\), there is an algorithm for sampling a uniform proper q-coloring within total variation distance \(\epsilon >0\) in \(O\left( \log \left( \frac{n}{\epsilon }\right) \right) \) rounds of communication on any graph G(V, E) with \(n=|V|\) vertices and maximum degree at most \(\varDelta \ge 9\), where \(\varDelta \) may be unbounded.
Neither of the algorithms abuses the power of the \(\mathsf {LOCAL}\) model: each message is of \(O(\log n)\) bits if the domain size \(q=\mathrm {poly}(n)\).
Due to the exponential correlation between variables in Gibbs distributions, the \(O\left( \log \left( \frac{n}{\epsilon }\right) \right) \) time bound achieved in Theorem 2 is optimal.
After the submission of this paper, two independent works [21, 23] give the same distributed algorithm for sampling random q-colorings, which improves the LocalMetropolis algorithm by introducing a lazy step as a means of distributed symmetry breaking. This new algorithm achieves an \(O(\log n)\) mixing time under Dobrushin’s condition \(q \ge (2+\delta )\varDelta \). Furthermore, for graphs with sufficiently large maximum degree and girth at least 9, it achieves an \(O(\log n)\) mixing time when \(q \ge (\alpha ^* + \delta )\varDelta \), where \(\alpha ^* \approx 1.763\) is the positive root of the equation \(x = \mathrm {e}^{1/x}\). Another, non-MCMC algorithm, named the distributed JVV sampler, is given in [22]. For many locally definable joint distributions, this algorithm successfully samples a configuration within \(\mathrm {polylog}(n)\) rounds in the \(\mathsf {LOCAL}\) model with high probability. In particular, it samples random q-colorings of triangle-free graphs within \(O(\log ^3 n)\) rounds in the \(\mathsf {LOCAL}\) model as long as \(q \ge (\alpha ^* + \delta )\varDelta \). This non-MCMC sampling algorithm abuses the power of the \(\mathsf {LOCAL}\) model by assuming unlimited message size and local computation.
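The constant \(\alpha ^*\) is defined only implicitly. As a quick numerical check, the following Python sketch finds the positive root of \(x=\mathrm {e}^{1/x}\) by bisection; the bisection method, the bracket [1, 2] and the name `alpha_star` are our choices and are not taken from [21, 23].

```python
import math

def alpha_star(lo: float = 1.0, hi: float = 2.0, iters: int = 60) -> float:
    """Positive root of x = exp(1/x), found by bisection on g(x) = x - exp(1/x).

    g is strictly increasing for x > 0 and g(1) < 0 < g(2), so the root in
    [1, 2] is unique and the bracket is valid.
    """
    def g(x: float) -> float:
        return x - math.exp(1.0 / x)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
    return (lo + hi) / 2

print(round(alpha_star(), 3))  # 1.763
```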
It is a well-known phenomenon that sampling may become computationally intractable when the model exhibits the nonuniqueness phase transition, e.g. for independent sets in graphs of maximum degree \(\varDelta \ge 6\) [27, 28, 55, 56]. For the same class of distributions, we show the following unconditional \(\varOmega ({\mathrm {diam}})\) lower bound for sampling in the \(\mathsf {LOCAL}\) model.
Theorem 3
For \(\varDelta \ge 6\), there exist infinitely many graphs G(V, E) with maximum degree \(\varDelta \) and diameter \({\mathrm {diam}}(G)=|V|^{\varOmega (1)}\) such that any algorithm that samples a uniform independent set in G within a sufficiently small constant total variation distance \(\epsilon \) requires \(\varOmega ({\mathrm {diam}}(G))\) rounds of communication, even assuming that every vertex \(v\in V\) is aware of G.
The lower bound is proved by a now fairly well-understood reduction from maximum cut to sampling independent sets when \(\varDelta \ge 6\) [28, 55, 56]. Specifically, we show that when \(\varDelta \ge 6\) there are infinitely many graphs G(V, E) such that if one can sample a nearly uniform independent set in G(V, E), then one can also sample an almost uniform maximum cut in an even cycle of size \(|V|^{\varOmega (1)}\), which is necessarily a global task because of the long-range correlation.

In the \(\mathsf {LOCAL}\) model it is trivial to construct an independent set (because \(\emptyset \) is an independent set). In contrast, Theorem 3 says that sampling a uniform independent set is very much a global task for graphs with maximum degree \(\varDelta \ge 6\).

In the \(\mathsf {LOCAL}\) model, any labeling problem would be trivial once the network structure G is known to each vertex. In contrast, the sampling lower bound in Theorem 3 still holds even when each vertex is aware of G. Unlike labeling, whose hardness is due to the locality of information, the hardness of sampling is solely due to the locality of randomness.

A breakthrough of Ghaffari et al. [30] shows that any labeling problem that can be solved sequentially with local information admits an \(O(\mathrm {polylog}(n))\)-round randomized protocol in the \(\mathsf {LOCAL}\) model. In contrast, for sampling we have an \(\varOmega ({\mathrm {diam}})\) randomized lower bound for graphs with \(n^{\varOmega (1)}\) diameter.
1.2 Related work
The topic of sequential MCMC (Markov chain Monte Carlo) sampling is extensively studied. The study of sampling proper q-colorings was initiated by the seminal works of Jerrum [37] and, independently, of Salas and Sokal [53]. So far the best rapid mixing condition for general bounded-degree graphs is \(q \ge \frac{11}{6}\varDelta \), due to Vigoda [59]. See [26] for an excellent survey.
The chromatic-scheduler-based parallelization of Glauber dynamics was studied in [32]. This parallel chain is in fact a special case of systematic scan for Glauber dynamics [18, 19, 35], in which the variables are updated according to a fixed order.
Empirical studies showed that an ad hoc “Hogwild!” parallelization of a sequential sampler sometimes works well in practice [51], and mixing results assuming bounded asynchrony were given in [14, 38].
A sampling algorithm based on the Lovász local lemma is given in [33]. When sampling from the hard-core model with \(\lambda <\frac{1}{2\sqrt{\mathrm {e}}\varDelta -1}\) on a graph of maximum degree \(\varDelta \), this algorithm can be implemented in the \(\mathsf {LOCAL}\) model, where it runs in \(O(\log n)\) rounds.
A problem related to local sampling is finitary coloring [36], in which a random feasible solution is sampled from an arbitrary distribution, as long as the distribution is supported on feasible solutions, rather than from a specific distribution such as the Gibbs distribution. Therefore, the nature of this problem is still labeling rather than sampling.
Our algorithms are Markov chains which randomly walk over the solution space. A related notion is the distributed random walks [13], which walk over the network.
Our LocalMetropolis algorithm should be distinguished from the parallel Metropolis-Hastings algorithm [9] and from parallel tempering [58], in which the sampling algorithm makes N proposals or runs N copies of the system in parallel, for a suitably large N, in order to improve the dynamic properties of the Monte Carlo simulation.
Organization of the paper. The models and preliminaries are introduced in Sect. 2. The LubyGlauber algorithm is introduced in Sect. 3, and the LocalMetropolis algorithm in Sect. 4. The lower bounds are proved in Sect. 5.
2 Models and preliminaries
2.1 The \(\mathsf {LOCAL}\) model
We assume Linial’s \(\mathsf {LOCAL}\) model [49, 52] for distributed computation, as described in Sect. 1. We further allow each node in the network G(V, E) to know upper bounds on \(\varDelta \) and \(\log n\), where \(n=|V|\) is the number of nodes. This information is needed only because the running time of the Monte Carlo algorithms may depend on it.
2.2 Markov random field and local CSP
The Markov random field (MRF), or spin system, is a well-studied stochastic model in probability theory and statistical physics. Given a graph G(V, E) and a set of spin states \([q]=\{1,2,\ldots ,q\}\) for a finite \(q\ge 2\), a configuration \(\sigma \in [q]^V\) assigns each vertex one of the q spin states. For each edge \(e\in E\) there is a nonnegative \(q\times q\) symmetric matrix \(A_e\in {\mathbb {R}}_{\ge 0}^{q\times q}\) associated with e, called the edge activity; and for each vertex \(v\in V\) there is a nonnegative q-dimensional vector \(b_v \in {\mathbb {R}}_{\ge 0}^q\) associated with v, called the vertex activity. Then each configuration \(\sigma \in [q]^V\) is assigned the weight
$$\begin{aligned} w(\sigma )=\prod _{e=uv\in E}A_e(\sigma _u,\sigma _v)\prod _{v\in V}b_v(\sigma _v). \end{aligned}\qquad (1)$$
This gives rise to a natural probability distribution \(\mu \), called the Gibbs distribution, over all configurations in the sample space \(\varOmega =[q]^V\) proportional to their weights, such that \(\mu (\sigma ) = {w(\sigma )}/{Z}\) for each \(\sigma \in \varOmega \), where \(Z=\sum _{\sigma \in \varOmega }w(\sigma )\) is the normalizing factor. A configuration \(\sigma \in \varOmega \) is feasible if \(\mu (\sigma )>0\).
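To make (1) and the definition of \(\mu \) concrete, here is a brute-force Python sketch that enumerates \(\varOmega =[q]^V\) on a tiny graph and returns the exact Gibbs distribution with rational arithmetic. It is only meant for checking small examples (the enumeration is exponential in n); spins are indexed 0 to q-1 rather than 1 to q, and the helper names `weight` and `gibbs` are ours.

```python
from fractions import Fraction
from itertools import product

def weight(sigma, edges, A, b):
    """w(sigma): product over edges uv of A_e(sigma_u, sigma_v), times product over v of b_v(sigma_v)."""
    w = Fraction(1)
    for e in edges:
        u, v = e
        w *= A[e][sigma[u]][sigma[v]]
    for v, b_v in enumerate(b):
        w *= b_v[sigma[v]]
    return w

def gibbs(n, q, edges, A, b):
    """Exact Gibbs distribution mu(sigma) = w(sigma) / Z over all q**n configurations."""
    w = {sigma: weight(sigma, edges, A, b) for sigma in product(range(q), repeat=n)}
    Z = sum(w.values())  # normalizing factor, assumed positive
    return {sigma: w_sigma / Z for sigma, w_sigma in w.items()}

# Uniform proper 3-colorings of a triangle: A_e(i, i) = 0, A_e(i, j) = 1 otherwise, b_v all ones.
q = 3
neq = [[0 if i == j else 1 for j in range(q)] for i in range(q)]
edges = [(0, 1), (1, 2), (0, 2)]
mu = gibbs(3, q, edges, {e: neq for e in edges}, [[1] * q] * 3)
feasible = [s for s, p in mu.items() if p > 0]
print(len(feasible), mu[feasible[0]])  # 6 1/6
```

The feasible configurations are exactly those with \(\mu (\sigma )>0\), here the six proper colorings, each of probability 1/6.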

Independent sets/vertex covers: When \(q=2\), all \(A_e=\begin{bmatrix}1&1 \\ 1&0\end{bmatrix}\) and all \(b_v=\begin{bmatrix}1 \\ 1 \end{bmatrix}\), each feasible configuration corresponds to an independent set (or vertex cover, if the other spin state indicates the set) in G, and the Gibbs distribution \(\mu \) is the uniform distribution over independent sets (or vertex covers) in G. When \(b_v=\begin{bmatrix}1 \\ \lambda \end{bmatrix}\) for some parameter \(\lambda >0\), this is the hard-core model from statistical physics.

Colorings and list colorings: When every \(A_e\) has \(A_e(i,i)=0\) and \(A_e(i,j)=1\) for \(i\ne j\), and every \(b_v\) is the all-1 vector, the Gibbs distribution \(\mu \) becomes the uniform distribution over proper q-colorings of graph G. For list colorings, each vertex \(v\in V\) can only use colors from its list \(L_v\subseteq [q]\) of available colors. We can then let each \(b_v\) be the indicator vector of the list \(L_v\), with the \(A_e\)’s the same as for proper q-colorings, so that the Gibbs distribution is the uniform distribution over proper list colorings.

Physical model: Proper q-coloring is the special case \(\beta =0\) of the Potts model in statistical physics, in which each \(A_e\) has \(A_e(i,i)=\beta \) for some parameter \(\beta \ge 0\) and \(A_e(i,j)=1\) if \(i\ne j\). When further \(q=2\), the model becomes the Ising model.

Dominating sets: They can be expressed by placing a “cover” constraint on each inclusive neighborhood \(\varGamma ^+(v)\), which requires at least one vertex of \(\varGamma ^+(v)\) to be chosen.

Maximal independent sets (MISs): An MIS is a dominating independent set.
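The encodings above can be checked by brute force on a path with three vertices. In this Python sketch, `partition_function` is our own helper and, for brevity, every edge carries the same matrix A (a simplification); spin 1 means “in the set” for the hard-core model, and colors are indexed 0 to q-1.

```python
from fractions import Fraction
from itertools import product

def partition_function(n, q, edges, A, b):
    """Z = sum over sigma in [q]^V of prod_e A(sigma_u, sigma_v) * prod_v b_v(sigma_v)."""
    Z = Fraction(0)
    for s in product(range(q), repeat=n):
        w = Fraction(1)
        for u, v in edges:
            w *= A[s[u]][s[v]]
        for v in range(n):
            w *= b[v][s[v]]
        Z += w
    return Z

path = [(0, 1), (1, 2)]        # path 0 - 1 - 2
HARDCORE = [[1, 1], [1, 0]]     # two adjacent vertices may not both have spin 1
lam = Fraction(2)

# Uniform independent sets (b_v = [1, 1]): the 3-path has 5 independent sets.
print(partition_function(3, 2, path, HARDCORE, [[1, 1]] * 3))    # 5
# Hard-core model (b_v = [1, lam]): Z = 1 + 3*lam + lam**2.
print(partition_function(3, 2, path, HARDCORE, [[1, lam]] * 3))  # 11

# List coloring: b_v is the indicator vector of the list L_v.
q = 3
NEQ = [[0 if i == j else 1 for j in range(q)] for i in range(q)]
lists = [{0, 1}, {0, 1}, {0, 1, 2}]
b = [[1 if c in L else 0 for c in range(q)] for L in lists]
print(partition_function(3, q, path, NEQ, b))  # 4
```

In the list-coloring case, vertices 0 and 1 must take the two colors of {0, 1} in one of 2 orders, and vertex 2 then has 2 colors left, giving 4 proper list colorings.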
2.3 Local sampling
2.4 Mixing rate
Notations Given a graph G(V, E), we denote by \(d_v=\deg (v)\) the degree of v in G, \(\varDelta =\varDelta _G\) the maximum degree of G, \({\mathrm {diam}}={\mathrm {diam}}(G)\) the diameter of G, and \({\mathrm {dist}}(u,v)={\mathrm {dist}}_G(u,v)\) the shortest path distance between vertices u and v in G.
We also denote by \(\varGamma (v)=\{u\mid uv\in E\}\) the neighborhood of v, and \(\varGamma ^+(v)=\varGamma (v)\cup \{v\}\) the inclusive neighborhood. Finally we write \(B_r(v)=\{u\mid {\mathrm {dist}}(u,v)\le r\}\) for the r-ball centered at v.
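In a t-round \(\mathsf {LOCAL}\) algorithm the output of v can depend only on the information in \(B_t(v)\), and \(\varGamma ^+(v)=B_1(v)\). A minimal Python sketch of computing \(B_r(v)\) by breadth-first search truncated at depth r; representing the graph as a dict of adjacency lists and the name `ball` are our assumptions.

```python
from collections import deque

def ball(adj, v, r):
    """B_r(v) = {u : dist(u, v) <= r}, via breadth-first search truncated at depth r."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

# Path 0 - 1 - 2 - 3 - 4 - 5
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
print(sorted(ball(adj, 2, 2)))  # [0, 1, 2, 3, 4]
```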
3 The LubyGlauber algorithm
In this section, we analyze a generic scheme for parallelizing Glauber dynamics, a classic sequential Markov chain for sampling from Gibbs distributions.

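Glauber dynamics: starting from an arbitrary initial configuration \(\varvec{X}\in [q]^V\), each step does the following:

sample a vertex \(v\in V\) uniformly at random;

resample the value of \(X_v\) according to the marginal distribution induced by \(\mu \) at vertex v conditioning on the current spin states of v’s neighborhood.

The LubyGlauber algorithm: starting from an arbitrary initial configuration \(\varvec{X}\in [q]^V\), each iteration does the following:

independently sample a random independent set I in G;

for each \(v\in I\), resample \(X_v\) in parallel according to the marginal distribution \(\mu _v(\cdot \mid X_{\varGamma (v)})\).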
A convenient way of generating a random independent set in a distributed fashion is the “Luby step” in Luby’s algorithm for distributed MIS [1, 46]: each vertex samples a uniform and independent ID from the interval [0, 1] (which can be discretized with \(O(\log n)\) bits), and the vertices v that are locally maximal in their inclusive neighborhood \(\varGamma ^+(v)\) are selected into the independent set I.
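The Luby step can be sketched in a few lines of Python. The helper names are ours, and ties between floating-point IDs (probability essentially zero) simply exclude both vertices, which keeps I independent. Since v is selected exactly when its ID is the largest of the \(d_v+1\) i.i.d. IDs in \(\varGamma ^+(v)\), we have \(\Pr [v\in I]=\frac{1}{d_v+1}\); the simulation below estimates this on a star.

```python
import random

def luby_step(adj, rng):
    """One Luby step: each vertex draws an ID uniformly from [0, 1) and joins I iff its
    ID is strictly larger than every neighbour's ID. Two adjacent vertices can never
    both be strict local maxima, so I is always an independent set."""
    ids = {v: rng.random() for v in adj}
    return {v for v in adj if all(ids[v] > ids[u] for u in adj[v])}

def selection_frequencies(adj, trials, seed=0):
    """Monte Carlo estimate of Pr[v in I]; the exact value is 1/(d_v + 1)."""
    rng = random.Random(seed)
    hits = {v: 0 for v in adj}
    for _ in range(trials):
        for v in luby_step(adj, rng):
            hits[v] += 1
    return {v: hits[v] / trials for v in adj}

# Star with centre 0 (degree 4) and four leaves (degree 1).
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
I = luby_step(star, random.Random(1))
assert all(u not in I for v in I for u in star[v])  # independence
freq = selection_frequencies(star, 20000)
print(round(freq[0], 2), round(freq[1], 2))  # approximately 0.2 and 0.5
```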
According to the definition of the marginal distribution (2), for an MRF we have \(\mu _v(i\mid X_{\varGamma (v)})\propto b_v(i)\prod _{u\in \varGamma (v)}A_{uv}(X_u,i)\), so resampling \(X_v\) can be done locally once v has exchanged current spin states with its neighbors. After T iterations, where T is a threshold determined by the specific Markov random field, the algorithm terminates and outputs the current \(\varvec{X}=(X_v)_{v\in V}\).
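A minimal Python sketch of the LubyGlauber chain for an MRF, assuming (as a simplification of ours) that every edge uses the same symmetric matrix A; `marginal_weights` and `luby_glauber` are hypothetical names. Each iteration performs one Luby step and then resamples every selected vertex from the unnormalized marginal \(b_v(i)\prod _{u\in \varGamma (v)}A(X_u,i)\). In the usage example, once every vertex has been resampled at least once the coloring is proper, which after 200 iterations happens with overwhelming probability, so the final line should print True.

```python
import random

def marginal_weights(v, X, adj, A, b):
    """Unnormalized marginal (2) at v: b_v(i) * prod over u in Gamma(v) of A(X_u, i)."""
    weights = []
    for i in range(len(b[v])):
        w = b[v][i]
        for u in adj[v]:
            w *= A[X[u]][i]
        weights.append(w)
    return weights

def luby_glauber(adj, A, b, X, T, rng):
    """Run T iterations of LubyGlauber (simplification: every edge uses the same symmetric A)."""
    X = dict(X)
    for _ in range(T):
        # Luby step: strict local maxima of i.i.d. uniform IDs form an independent set I.
        ids = {v: rng.random() for v in adj}
        I = [v for v in adj if all(ids[v] > ids[u] for u in adj[v])]
        # Resample every v in I from its marginal. No two vertices of I are adjacent,
        # so each v reads only neighbours that are not updated in this iteration.
        new = {v: rng.choices(range(len(b[v])), weights=marginal_weights(v, X, adj, A, b))[0]
               for v in I}
        X.update(new)
    return X

# Proper 5-colorings of a 10-cycle (Delta = 2), started from the infeasible all-0 configuration.
n, q = 10, 5
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
NEQ = [[0 if i == j else 1 for j in range(q)] for i in range(q)]
ones = {v: [1] * q for v in cycle}
X = luby_glauber(cycle, NEQ, ones, {v: 0 for v in cycle}, T=200, rng=random.Random(0))
print(all(X[u] != X[v] for v in cycle for u in cycle[v]))
```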
Remark 1
The LubyGlauber algorithm can be easily extended to sample from weighted CSPs defined by local constraints \(c=(f_c,S_c)\in {\mathcal {C}}\), by simply overriding the definition of neighborhood as \(\varGamma (v)=\{u\ne v\mid \exists c\in {\mathcal {C}}, \{u,v\}\subseteq S_c\}\); thus \(\varGamma (v)\) is the neighborhood of v in the hypergraph whose hyperedges are the \(S_c\)’s, and I is now a strongly independent set of this hypergraph.
3.1 Mixing of LubyGlauber
Let \(\mu _{\mathsf {LG}}\) denote the distribution of \(\varvec{X}\) returned by the algorithm upon termination. As in the case of single-site Glauber dynamics, we assume that the marginal distribution (2) is always well-defined, and that the single-site Glauber dynamics is irreducible among all feasible configurations. The following proposition is easy to obtain.
Proposition 1
The Markov chain LubyGlauber is reversible and has stationary distribution \(\mu \). Furthermore, under the above assumption, \(d_{\mathrm {TV}}\left( {\mu _{\mathsf {LG}}},{\mu }\right) \) converges to 0 as \(T\rightarrow \infty \).
Proof
We prove this for a more general family of Markov chains, where the “Luby step” is replaced by an arbitrary way of independently sampling a random independent set I, as long as \(\Pr [v\in I]>0\) for every vertex \(v\in V\).
If both X and Y are infeasible, then \(\mu (X)=\mu (Y)=0\) and the detailed balance equation holds trivially. If X is feasible and Y is not, then \(\mu (Y)=0\); moreover, since the chain never moves from a feasible configuration to an infeasible one, we have \(P(X,Y)=0\), so the detailed balance equation is also satisfied. The case where Y is feasible and X is not is symmetric.
Next, observe that the chain never moves from a feasible configuration to an infeasible one. Moreover, due to the assumption that the marginal distribution (2) is always well-defined, once a vertex v has been resampled, it satisfies all its local constraints. Therefore, the chain becomes feasible once every vertex has been resampled. Since every vertex v is resampled with positive probability \(\Pr [v\in I]\) in each step, the chain is eventually absorbed into the feasible configurations.
It is easy to observe that every feasible configuration is aperiodic, since it has a self-loop transition, i.e. \(P(X,X) > 0\) for all feasible X. Any move \(X \rightarrow Y\) between feasible configurations \(X,Y\in \varOmega \) in the single-site Glauber dynamics, with vertex v being updated, can be simulated by a move in the LubyGlauber chain by first sampling an independent set \(I\ni v\) (which is always possible since \(\Pr [v\in I]>0\)) and then updating v according to \(X\rightarrow Y\) while keeping all \(u\in I{\setminus }\{v\}\) unchanged (which is always possible for feasible X). Given the irreducibility of the single-site Glauber dynamics among all feasible configurations, the LubyGlauber chain is also irreducible among all feasible configurations. Combining this with the absorption into feasible configurations and their aperiodicity, by the Markov chain convergence theorem [43], the total variation distance \(d_{\mathrm {TV}}\left( {\mu _{\mathsf {LG}}},{\mu }\right) \) converges to 0 as \(T \rightarrow \infty \).\(\square \)
We then apply a standard coupling argument from [18, 35] to analyze the mixing rate of the LubyGlauber chain. The following notions are essential to the mixing of Glauber dynamics.
Definition 1
Definition 2
It is a fundamental result that Dobrushin’s condition is sufficient for the rapid mixing of Glauber dynamics [16, 35, 53], with a mixing rate of \(\tau (\epsilon )=O\left( \frac{n}{1-\varvec{\alpha }}\log \left( \frac{n}{\epsilon }\right) \right) \). Here we show that the LubyGlauber chain is essentially a parallel speedup of the Glauber dynamics by a factor of \(\varTheta (\frac{n}{\varDelta })\).
Theorem 4
Under the same assumption as Proposition 1, if the total influence \(\varvec{\alpha }<1\), then the mixing rate of the LubyGlauber chain is \(\tau (\epsilon )= O\left( \frac{\varDelta }{1-\varvec{\alpha }}\log \left( \frac{n}{\epsilon }\right) \right) \).
Consequently, for any \(\epsilon >0\) the LubyGlauber algorithm can terminate within \(O\left( \frac{\varDelta }{1-\varvec{\alpha }}\log \left( \frac{n}{\epsilon }\right) \right) \) rounds in the \(\mathsf {LOCAL}\) model and return an \(\varvec{X}\in [q]^V\) whose distribution \(\mu _\mathsf {LG}\) is \(\epsilon \)-close to the Gibbs distribution \(\mu \) in total variation distance.
Remark 2
In fact, Proposition 1 and Theorem 4 hold for a more general family of Markov chains, where the “Luby step” could be any subroutine which independently generates a random independent set I, as long as every vertex has positive probability to be selected into I. In general, the mixing rate in Theorem 4 is \(\tau (\epsilon )= O\left( \frac{1}{(1-\varvec{\alpha })\gamma }\log \left( \frac{n}{\epsilon }\right) \right) \), where \(\gamma \) is a lower bound on the probability \(\Pr [v\in I]\) over all \(v\in V\).
The following lemma is crucial for relating the mixing rate to the influence matrix. The lemma has been proved in various places [14, 18, 35].
Lemma 1
Proof
Proof of Theorem 4:
We are actually going to prove a stronger result. Denote by I the random independent set on which the resampling is executed, write \(\gamma _v=\Pr [v\in I]\) for each \(v\in V\), and assume that \(\gamma _v\ge \gamma \) for all \(v\in V\), for some \(\gamma >0\). Clearly, when I is generated by the “Luby step”, this holds with \(\gamma =\frac{1}{\varDelta +1}\), since v is selected exactly when its random ID is the largest among the \(d_v+1\) IDs in \(\varGamma ^+(v)\). We are going to prove that \(\tau (\epsilon )=O\left( \frac{1}{(1-\varvec{\alpha })\gamma }\log \left( \frac{n}{\epsilon }\right) \right) \).
The proof follows the framework of Hayes [35]. We construct a coupling \((X^{(t)}, Y^{(t)})\) of two copies of the Markov chain such that the transition rules for \(X^{(t)}\rightarrow X^{(t+1)}\) and \(Y^{(t)}\rightarrow Y^{(t+1)}\) are both those of the LubyGlauber chain. If \(\Pr [X^{(T)}\ne Y^{(T)}\mid X^{(0)}=\sigma \wedge Y^{(0)}=\tau ]\le \epsilon \) for any initial configurations \(\sigma ,\tau \in \varOmega \), then by the coupling lemma for Markov chains [43], the mixing rate satisfies \(\tau (\epsilon )\le T\).
Arbitrarily fix \(\sigma ,\tau \in \varOmega =[q]^V\). For \(t\ge 0\), define \((X^{(t)},Y^{(t)})\in \varOmega ^2\) by iterating a maximal one-step coupling of the LubyGlauber chain, starting from the initial condition \(X^{(0)}=\sigma ,Y^{(0)}=\tau \). Due to the well-definedness of the marginal distribution (2), we know that once all vertices have been resampled, the configuration is feasible and remains feasible thereafter.
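The coupling can be illustrated with a Python sketch specialized (our simplification) to proper q-colorings. One way to realize a maximal one-step coupling, assumed here rather than taken from the text, is to let both copies share the same Luby step and to couple the two marginals of each selected vertex maximally, so that they disagree with probability exactly their total variation distance. Exact rational weights avoid rounding problems in the maximal coupling; all function names are hypothetical.

```python
import random
from fractions import Fraction

def coloring_marginal(v, X, adj, q):
    """Marginal (2) for proper q-colorings: uniform over colors unused by v's neighbours."""
    free = [c for c in range(q) if all(X[u] != c for u in adj[v])]
    return [Fraction(1, len(free)) if c in free else Fraction(0) for c in range(q)]

def max_coupling(p, r, rng):
    """Sample (x, y) with x ~ p, y ~ r and Pr[x != y] = d_TV(p, r) (maximal coupling)."""
    common = [min(a, c) for a, c in zip(p, r)]
    overlap = sum(common)  # equals 1 - d_TV(p, r)
    spins = range(len(p))
    if rng.random() < overlap:
        x = rng.choices(spins, weights=common)[0]
        return x, x
    # The residuals p - common and r - common have disjoint supports, so x != y here.
    x = rng.choices(spins, weights=[a - m for a, m in zip(p, common)])[0]
    y = rng.choices(spins, weights=[c - m for c, m in zip(r, common)])[0]
    return x, y

def coupled_luby_glauber(adj, q, X, Y, T, rng):
    """Run two coupled LubyGlauber copies; return the first t with X == Y, or None."""
    X, Y = dict(X), dict(Y)
    for t in range(1, T + 1):
        ids = {v: rng.random() for v in adj}  # one shared Luby step for both copies
        I = [v for v in adj if all(ids[v] > ids[u] for u in adj[v])]
        moves = {v: max_coupling(coloring_marginal(v, X, adj, q),
                                 coloring_marginal(v, Y, adj, q), rng) for v in I}
        for v, (x, y) in moves.items():
            X[v], Y[v] = x, y
        if X == Y:  # identical copies have identical marginals and stay equal
            return t
    return None

n, q = 20, 7  # Delta = 2, so q >= 2 * Delta + 1
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
t_couple = coupled_luby_glauber(cycle, q, {v: 0 for v in cycle}, {v: 1 for v in cycle},
                                T=1000, rng=random.Random(0))
print(t_couple)
```

Here the total influence is at most \(\frac{\varDelta }{q-\varDelta }=\frac{2}{5}\) and \(\gamma =\frac{1}{3}\), so the expected number of disagreements shrinks by a factor of at least \(1-\gamma (1-\varvec{\alpha })=0.8\) per iteration, and the copies coalesce long before T = 1000.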
3.2 Application of LubyGlauber for sampling graph colorings
For the uniform distribution over proper q-colorings of a graph G, it is well known that Dobrushin’s condition is satisfied when \(q\ge 2\varDelta +1\), where \(\varDelta \) is the maximum degree of G.
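Assuming the standard definition of the influence \(\rho _{v,u}\) of u on v as the maximum total variation distance between the marginals at v over pairs of configurations that differ only at u (Definition 1 is not reproduced here), the bound \(\varvec{\alpha }\le \frac{\varDelta }{q-\varDelta }\) behind this fact can be checked exactly on small cases. For colorings the marginal at v depends only on its neighbours’ colors, so the Python sketch below enumerates those colors for a vertex of degree d and sums the per-neighbour maxima. For a d-regular graph in which no two neighbours of a vertex are adjacent (e.g. a cycle of length at least 5), every vertex sees the same picture, so the sum is the total influence. The function names are ours.

```python
from fractions import Fraction
from itertools import product

def tv(p, r):
    """Total variation distance between two distributions given as lists."""
    return sum(abs(a - c) for a, c in zip(p, r)) / 2

def marginal_from_neighbour_colors(colors, q):
    """Marginal at v for proper q-colorings: uniform over colors not in `colors`."""
    free = [c for c in range(q) if c not in colors]
    return [Fraction(1, len(free)) if c in free else Fraction(0) for c in range(q)]

def influence_of_neighbour(d, q, j):
    """Max d_TV of the marginals at a degree-d vertex when only neighbour j's color changes."""
    best = Fraction(0)
    for others in product(range(q), repeat=d - 1):
        for a, c in product(range(q), repeat=2):
            if a == c:
                continue
            s = others[:j] + (a,) + others[j:]
            t = others[:j] + (c,) + others[j:]
            best = max(best, tv(marginal_from_neighbour_colors(s, q),
                                marginal_from_neighbour_colors(t, q)))
    return best

def total_influence_regular(d, q):
    """Sum of the influences of the d neighbours (alpha in the setting described above)."""
    return sum(influence_of_neighbour(d, q, j) for j in range(d))

print(total_influence_regular(2, 5))  # 2/3, i.e. Delta/(q - Delta) < 1
print(total_influence_regular(2, 4))  # 1: the condition fails at q = 2 * Delta
```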
Corollary 1
If there is an arbitrary constant \(\delta >0\) such that \(q_v\ge (2+\delta )d_v\) for every vertex v, where \(q_v=|L_v|\) is the size of the list of v, then the mixing rate of the LubyGlauber chain for sampling list colorings is \(\tau (\epsilon )= O\left( \varDelta \log \left( \frac{n}{\epsilon }\right) \right) \).
4 The LocalMetropolis algorithm
In this section, we give an algorithm that may fully parallelize the sequential process under suitable mixing conditions, even on graphs with unbounded degree. The algorithm is inspired by the famous Metropolis-Hastings algorithm for MCMC, in which a random choice is proposed and then filtered to enforce the target stationary distribution. Our algorithm, called the LocalMetropolis algorithm, makes each vertex propose independently, and localizes the work of filtering to each edge.

Propose: Each vertex \(v \in V\) independently proposes a spin state \(\sigma _v\in [q]\) with probability proportional to \(b_v(\sigma _v)\).
 Local filter: Each edge \(e=uv\in E\) flips a biased coin independently, with the probability of HEADS being
$$\begin{aligned} \tilde{A}_e(\sigma _u,\sigma _v)\tilde{A}_e(X_u,\sigma _v)\tilde{A}_e(\sigma _u,X_v), \end{aligned}$$
where \(\tilde{A}_e\) is the matrix obtained by normalizing \(A_e\) as \(\tilde{A}_e=A_e/\max _{i,j}A_e(i,j)\). We say that the edge passes the check if the outcome of the coin flip is HEADS. Then for each vertex \(v \in V\), if all edges incident with v pass their checks, v accepts the proposal and updates its value as \(X_v=\sigma _v\); otherwise v leaves \(X_v\) unchanged.
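The two steps can be sketched in Python as follows, assuming (our simplification) that every edge uses the same symmetric matrix A and that vertices are integers; `local_metropolis` is a hypothetical name. Each edge flips exactly one coin, and both endpoints read that same outcome. For proper colorings the HEADS probability is 0 or 1, so a move that would violate a constraint is always blocked, and a proper starting coloring stays proper.

```python
import random

def local_metropolis(adj, A, b, X, T, rng):
    """Run T iterations of LocalMetropolis. Simplification (ours): every edge uses the
    same symmetric matrix A; b[v] is the vertex activity of v; vertices are ints."""
    a_max = max(max(row) for row in A)
    At = [[a / a_max for a in row] for row in A]  # normalized matrix tilde A
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    X = dict(X)
    for _ in range(T):
        # Propose: sigma_v with probability proportional to b_v(sigma_v).
        sigma = {v: rng.choices(range(len(b[v])), weights=b[v])[0] for v in adj}
        # Local filter: one coin per edge, shared by both endpoints.
        passed = {}
        for u, v in edges:
            heads = At[sigma[u]][sigma[v]] * At[X[u]][sigma[v]] * At[sigma[u]][X[v]]
            passed[(u, v)] = rng.random() < heads
        # v accepts its proposal iff every incident edge passed its check.
        X = {v: sigma[v] if all(passed[(min(u, v), max(u, v))] for u in adj[v]) else X[v]
             for v in adj}
    return X

n, q = 8, 9
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
NEQ = [[0 if i == j else 1 for j in range(q)] for i in range(q)]
ones = {v: [1] * q for v in cycle}

def is_proper(Z):
    return all(Z[u] != Z[v] for v in cycle for u in cycle[v])

X0 = {v: v % 2 for v in cycle}  # a proper 2-coloring of the even cycle
print(is_proper(local_metropolis(cycle, NEQ, ones, X0, T=100, rng=random.Random(0))))  # True
```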
We remark that in each iteration, for each edge \(e=uv\), the two endpoints u and v access the same random coin to determine whether e passes the check in this iteration.
Remark 3
The LocalMetropolis algorithm can be naturally extended to sample from weighted CSPs. The local filtering now occurs on each local constraint, such that a k-ary constraint \(c=(f_c,S_c)\in {\mathcal {C}}\) passes the check with probability equal to the product of the \(2^k-1\) normalized factors \(\tilde{f}_c(\tau )\) for the \(\tau \in [q]^{S_c}\) obtained from the \(2^k-1\) ways of mixing \(\sigma _{S_c}\) with \(X_{S_c}\), excluding \(X_{S_c}\) itself.
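The pass probability of a k-ary constraint can be sketched in Python as below; `constraint_pass_probability` is our name, and `f_tilde` stands for an already normalized factor \(\tilde{f}_c\). For k = 2 it reduces to the edge rule \(\tilde{A}_e(\sigma _u,\sigma _v)\tilde{A}_e(X_u,\sigma _v)\tilde{A}_e(\sigma _u,X_v)\), which the usage example checks on a small symmetric matrix.

```python
import math
from itertools import product

def constraint_pass_probability(f_tilde, sigma, X):
    """Probability that a k-ary constraint passes its check in LocalMetropolis:
    the product of f_tilde(tau) over the 2**k - 1 coordinate-wise mixtures tau of
    sigma_S and X_S, excluding tau = X_S itself. f_tilde must be normalized (max 1)."""
    prob = 1.0
    for mask in product((False, True), repeat=len(sigma)):  # True = take sigma's coordinate
        if not any(mask):
            continue  # skip tau = X_S
        tau = tuple(s if m else x for s, x, m in zip(sigma, X, mask))
        prob *= f_tilde(tau)
    return prob

# For k = 2 this recovers the edge rule A(s_u, s_v) * A(X_u, s_v) * A(s_u, X_v).
A = [[0.2, 1.0, 0.5], [1.0, 0.3, 0.8], [0.5, 0.8, 0.9]]  # symmetric, max entry 1
sigma, X = (0, 2), (1, 1)
edge_rule = A[sigma[0]][sigma[1]] * A[X[0]][sigma[1]] * A[sigma[0]][X[1]]
print(math.isclose(constraint_pass_probability(lambda t: A[t[0]][t[1]], sigma, X), edge_rule))  # True
```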
4.1 Mixing of LocalMetropolis
Let \(\mu _\mathsf {LM}\) denote the distribution of \(\varvec{X}=(X_v)_{v\in V}\) returned by the LocalMetropolis algorithm after T iterations.
Theorem 5
The Markov chain LocalMetropolis is reversible and has stationary distribution \(\mu \). Furthermore, under the above assumptions, \(d_{\mathrm {TV}}\left( {\mu _{\mathsf {LM}}},{\mu }\right) \) converges to 0 as \(T \rightarrow \infty \).
Proof
Next, suppose that X and Y are both feasible. Consider a move in the LocalMetropolis chain. Let \({\mathcal {C}}\in \{0,1\}^E\) be a Boolean vector such that \({\mathcal {C}}_e\) indicates whether edge \(e\in E\) passes its check. We call \(v\in V\) nonrestricted by \({\mathcal {C}}\) if \({\mathcal {C}}_e=1\) for all e incident with v (so that v accepts its proposal), and restricted by \({\mathcal {C}}\) otherwise.

\(\forall v \in \varDelta _{X,Y}\): \(\sigma _v=Y_v\) and v is nonrestricted by \({\mathcal {C}}\);

\(\forall v \not \in \varDelta _{X,Y}\): either \(\sigma _v = X_v=Y_v\) or v is restricted by \({\mathcal {C}}\).

\({\mathcal {C}}^\prime = {\mathcal {C}}\);

for all v nonrestricted by \({\mathcal {C}}\), since \((\sigma ,{\mathcal {C}})\in \varOmega _{X \rightarrow Y}\) it must hold \(\sigma _v=Y_v\), then set \(\sigma '_v=X_v\);

for all v restricted by \({\mathcal {C}}\), since \((\sigma ,{\mathcal {C}})\in \varOmega _{X \rightarrow Y}\) it must hold \(X_v=Y_v\), then set \(\sigma '_v=\sigma _v\).

\(\forall v\in \varDelta _{X,Y}\): \(\sigma _v = Y_v\), \(\sigma '_v=X_v\) and v is nonrestricted;

\(\forall v \not \in \varDelta _{X,Y}\): either \(\sigma _v=\sigma _v'=X_v=Y_v\) or v is restricted and \(\sigma _v=\sigma _v'\). In both cases, \(\sigma _v = \sigma '_v\).
 If \({\mathcal {C}}_e=0\), which means e does not pass its check, then
$$\begin{aligned}&{\Pr }[{\mathcal {C}}_e=0 \mid \sigma ,X] = 1-\tilde{A}_e(\sigma _u,\sigma _v)\tilde{A}_e(X_u,\sigma _v)\tilde{A}_e(\sigma _u,X_v)\\&\text {and}\quad \\&{\Pr }[{\mathcal {C'}}_e=0\mid \sigma ',Y] = 1-\tilde{A}_e(\sigma '_u,\sigma '_v)\tilde{A}_e(Y_u,\sigma '_v)\tilde{A}_e(\sigma '_u,Y_v), \end{aligned}$$
and both u and v are restricted by \({\mathcal {C}}\). By our construction of the bijection \(\phi _{X,Y}\), we have \(\sigma _u=\sigma '_u\), \(\sigma _v=\sigma '_v\), \(X_u=Y_u\), and \(X_v=Y_v\). It follows that
$$\begin{aligned} \frac{{\Pr }[{\mathcal {C}}_e=0\mid \sigma ,X]}{{\Pr }[{\mathcal {C^\prime }}_e=0\mid \sigma ^\prime ,Y] } =\frac{A_e(Y_u,Y_v)}{A_e(X_u,X_v)}=1. \end{aligned}$$
 If \({\mathcal {C}}_e=1\), which means that e passes its check, then$$\begin{aligned}&{\Pr }[{\mathcal {C}}_e=1\mid \sigma ,X] = \tilde{A}_e(\sigma _u,\sigma _v)\tilde{A}_e(X_u,\sigma _v)\tilde{A}_e(\sigma _u,X_v),\\&\text {and }\\&{\Pr }[{\mathcal {C'}}_e=1\mid \sigma ',Y] = \tilde{A}_e(\sigma '_u,\sigma '_v)\tilde{A}_e(Y_u,\sigma '_v)\tilde{A}_e(\sigma '_u,Y_v). \end{aligned}$$There are three subcases according to whether the vertices u and v are restricted:
 1.
Both u and v are restricted, in which case \(\sigma _u=\sigma '_u\), \(\sigma _v=\sigma '_v\), \(X_u=Y_u\), \(X_v=Y_v\).
 2.
Precisely one of \(\{u,v\}\) is restricted, say v is restricted and u is non-restricted, in which case \(\sigma _u=Y_u\), \(\sigma '_u=X_u\), \(\sigma _v=\sigma '_v\), and \(X_v=Y_v\).
 3.
Both u and v are non-restricted, in which case \(\sigma _u = Y_u\), \(\sigma '_u=X_u\), \(\sigma _v=Y_v\), \(\sigma '_v=X_v\).
In all three subcases, the following identity can be verified:$$\begin{aligned} \frac{{\Pr }[{\mathcal {C}}_e=1\mid \sigma ,X]}{{\Pr }[{\mathcal {C^\prime }}_e=1\mid \sigma ^\prime ,Y]} =\frac{\tilde{A}_e(Y_u,Y_v)}{\tilde{A}_e(X_u,X_v)}=\frac{A_e(Y_u,Y_v)}{A_e(X_u,X_v)}. \end{aligned}$$
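The identity can be sanity-checked by brute force. The sketch below builds one random instance of each subcase (including the mirror image of subcase 2 where u is restricted and v is not), evaluates the two pass probabilities as products of the three filter factors, and compares their ratio with \(\tilde{A}_e(Y_u,Y_v)/\tilde{A}_e(X_u,X_v)\). The matrix `A` is a hypothetical stand-in for \(\tilde{A}_e\) with arbitrary positive rational entries; exact `Fraction` arithmetic avoids rounding noise. This is a numerical illustration, not part of the proof; the case \({\mathcal {C}}_e=0\) is trivial because both probabilities coincide.

```python
import random
from fractions import Fraction

def pass_prob(A, sigma, cur):
    """Pr[edge e=(u,v) passes its check] for proposals sigma=(s_u, s_v)
    and current spins cur=(x_u, x_v): the product of the three filter factors."""
    (su, sv), (xu, xv) = sigma, cur
    return A[su][sv] * A[xu][sv] * A[su][xv]

def subcases(rng, q):
    """One random (X, Y, sigma, sigma') per subcase, each as (value at u, value at v)."""
    xu, xv, yu, yv, s, t = (rng.randrange(q) for _ in range(6))
    return [
        ((xu, xv), (xu, xv), (s, t), (s, t)),      # 1. u and v both restricted
        ((xu, xv), (yu, xv), (yu, t), (xu, t)),    # 2. v restricted, u non-restricted
        ((xu, xv), (xu, yv), (s, yv), (s, xv)),    # 2'. u restricted, v non-restricted
        ((xu, xv), (yu, yv), (yu, yv), (xu, xv)),  # 3. both non-restricted
    ]

def count_violations(q=4, trials=500, seed=1):
    """Number of sampled instances where the ratio identity fails (expected: 0)."""
    rng = random.Random(seed)
    # Hypothetical activities: arbitrary, not necessarily symmetric, entries in (0, 1].
    A = [[Fraction(rng.randint(1, 10), 10) for _ in range(q)] for _ in range(q)]
    bad = 0
    for _ in range(trials):
        for X, Y, sig, sig_p in subcases(rng, q):
            ratio = pass_prob(A, sig, X) / pass_prob(A, sig_p, Y)
            bad += ratio != A[Y[0]][Y[1]] / A[X[0]][X[1]]
    return bad
```

Any positive rescaling of `A` leaves the ratio unchanged, which is why the identity can be stated equally for \(\tilde{A}_e\) or \(A_e\).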
Next, observe that the chain never moves from a feasible configuration to an infeasible one, since any edge whose constraint would be violated by the update fails its check. By assumption (6), for all \(X \in [q]^V\), feasible or not, and for every \(v \in V\), there is a spin state \(i\in [q]\) such that with positive probability v is successfully updated to spin state i. Note that once a vertex is successfully updated, it satisfies, and keeps satisfying, all its local constraints. Therefore, the chain is eventually absorbed into the feasible configurations.
It is easy to see that every feasible configuration is aperiodic, since it has a self-loop transition, i.e., \(P(X,X) > 0\) for all feasible X. In addition, any move \(X \rightarrow Y\) between feasible configurations \(X,Y\in \varOmega \) in the single-site Markov chain with vertex v being updated can be simulated by a move in the LocalMetropolis chain in which all vertices u other than v propose their current spin states \(X_u\) and v proposes \(Y_v\). Provided that the single-site Markov chain is irreducible among all feasible configurations, the LocalMetropolis chain is also irreducible among all feasible configurations. Combining this with the absorption into feasible configurations and their aperiodicity, the Markov chain convergence theorem [43] implies that \(d_{\mathrm {TV}}\left( {\mu _{\mathsf {LM}}},{\mu }\right) \) converges to 0 as \(T \rightarrow \infty \). \(\square \)
4.2 The mixing of LocalMetropolis chain for graph colorings
Unlike the LubyGlauber chain, whose mixing rate follows essentially from the analysis of systematic scans, the LocalMetropolis chain has a mixing rate that is much more complicated to analyze. Here we analyze the mixing rate of the LocalMetropolis chain for proper q-colorings.

Propose: each vertex v proposes a color \(c_v\in [q]\) uniformly at random;
 Local filter: each vertex v rejects its proposal if there is a neighbor \(u \in \varGamma (v)\) such that one of the following occurs:
 1.
(v proposed the neighbor’s current color) \(c_v = X_u\);
 2.
(v and the neighbor proposed the same color) \(c_v = c_u\);
 3.
(the neighbor proposed v’s current color) \(X_v = c_u\).
Otherwise, v accepts its proposal and updates its color \(X_v\) to \(c_v\).
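The two rules above can be sketched as a single synchronous round. In this minimal Python sketch the graph is an adjacency dict and colors are `range(q)`; the names `local_metropolis_step` and `is_proper` are illustrative, not from the paper. Every vertex reads the same snapshot of current colors and proposals, so the round is fully parallel, matching one round of communication in the \(\mathsf {LOCAL}\) model.

```python
import random

def local_metropolis_step(adj, X, q, rng):
    """One synchronous round of the LocalMetropolis chain for proper q-coloring.

    adj: dict mapping each vertex to a list of its neighbors.
    X:   dict mapping each vertex to its current color in range(q).
    """
    c = {v: rng.randrange(q) for v in adj}  # Propose: a uniform color per vertex
    new_X = {}
    for v in adj:
        reject = any(
            c[v] == X[u]      # 1. v proposed the neighbor's current color
            or c[v] == c[u]   # 2. v and the neighbor proposed the same color
            or X[v] == c[u]   # 3. the neighbor proposed v's current color
            for u in adj[v]
        )
        new_X[v] = X[v] if reject else c[v]  # Local filter
    return new_X

def is_proper(adj, X):
    return all(X[u] != X[v] for v in adj for u in adj[v])

# Usage: a 10-cycle started from a proper 2-coloring, run with q = 5 colors.
rng = random.Random(0)
cycle = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}
X = {v: v % 2 for v in range(10)}
for _ in range(100):
    X = local_metropolis_step(cycle, X, 5, rng)
```

Starting from a proper coloring, every round stays proper: if both endpoints of an edge accept, rule 2 separates their new colors, and if only one accepts, rule 1 (checked by the accepting endpoint) separates its new color from the other's unchanged one.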
It can be verified that when \(q \ge \varDelta + 2\), condition (6) is satisfied and the single-site Glauber dynamics for proper q-coloring is irreducible; hence, by Theorem 5, the chain converges to the uniform distribution over proper q-colorings. The following theorem gives a condition of the form \(q\ge \alpha \varDelta \) for a logarithmic mixing rate, even for unbounded \(\varDelta \) and q. This proves Theorem 2.
Theorem 6
If \(q\ge \alpha \varDelta \) for a constant \(\alpha >2+\sqrt{2}\), the mixing rate of the LocalMetropolis chain for proper q-coloring on graphs with maximum degree at most \(\varDelta =\varDelta (n)\ge 9\) is \(\tau (\epsilon )=O(\log \left( \frac{n}{\epsilon }\right) )\), where the constant factor in \(O(\cdot )\) depends only on \(\alpha \) and not on the maximum degree \(\varDelta \).
The theorem is proved by path coupling, a powerful tool for coupling Markov chains. A coupling of a Markov chain on space \(\varOmega \) is a Markov chain \((X,Y)\rightarrow (X',Y')\) on space \(\varOmega ^2\) such that the transitions \(X\rightarrow X'\) and \(Y\rightarrow Y'\) individually follow the transition rule of the original chain on \(\varOmega \). With path coupling, it suffices to construct a coupled transition \((X,Y)\rightarrow (X',Y')\) for pairs \(X,Y\in [q]^V\) that differ at only one vertex: the chain mixes rapidly if the expected number of disagreeing vertices in \((X',Y')\) is at most \(1-\delta \) for some constant \(\delta >0\).
4.2.1 An ideal coupling
The \(2+\sqrt{2}\) threshold in Theorem 6 comes from an ideal coupling on the \(\varDelta \)-regular tree. Let \({\mathbb {T}}_{\varDelta }\) denote the infinite \(\varDelta \)-regular tree rooted at \(v_0\). We assume that the current pair of colorings (X, Y) disagrees only at the root \(v_0\) and that \(X_u=Y_u\not \in \{X_{v_0},Y_{v_0}\}\) for all other vertices u in \({\mathbb {T}}_{\varDelta }\).
For general non-tree graphs G(V, E) and arbitrary pairs of colorings (X, Y) that disagree at only one vertex, where X and Y need not even be proper, we essentially show that the above special pair of colorings (X, Y) on the infinite \(\varDelta \)-regular tree \({\mathbb {T}}_{\varDelta }\) represents the worst case for path coupling. The analysis of this general case is quite involved. We first state the path coupling lemma with a general metric.
Lemma 2
We use the following slightly modified premetric: a pair \((X,Y)\in \varOmega =[q]^V\) is connected by an edge in the premetric if and only if X and Y differ at only one vertex, say v, and the edge weight is given by \(\deg (v)\). This leads us to the following definition.
Definition 3
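Clearly, the diameter of \(\varOmega \) under the distance \(\varPhi \) satisfies \({\mathrm {diam}}(\varOmega )\le n\varDelta \), since any two configurations are joined by a path of at most n single-vertex changes, each of weight at most \(\varDelta \).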
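To see how the diameter bound enters the mixing rate, here is a minimal sketch of the standard path coupling bound of Bubley and Dyer [6], under the assumption (ours, since the statement of Lemma 2 is not reproduced in this section) that it takes the usual form: if \({\mathbb {E}}[\varPhi (X',Y')]\le (1-\delta )\varPhi (X,Y)\) on every premetric edge and distinct configurations are at distance at least 1, then \(\tau (\epsilon )\le \lceil \ln ({\mathrm {diam}}(\varOmega )/\epsilon )/\delta \rceil \). The function name is hypothetical.

```python
import math

def path_coupling_mixing_bound(n, max_degree, delta, eps):
    """Upper bound on tau(eps), assuming a contraction
    E[Phi(X', Y')] <= (1 - delta) * Phi(X, Y) on every premetric edge
    and diam(Omega) <= n * max_degree as stated in the text."""
    if not 0 < delta <= 1:
        raise ValueError("delta must lie in (0, 1]")
    diam = n * max_degree
    # diam * (1 - delta)^t <= diam * exp(-delta * t) <= eps once t >= ln(diam / eps) / delta
    return math.ceil(math.log(diam / eps) / delta)

# Usage: a million vertices, maximum degree 10, contraction gap 0.1, accuracy 0.01.
bound = path_coupling_mixing_bound(10**6, 10, 0.1, 0.01)
```

Because \(\delta \) depends only on \(\alpha \), the bound is \(O(\log (n\varDelta /\epsilon )/\delta )\); the \(\varDelta \) inside the logarithm is harmless since \(\varDelta <n\).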
We prove the mixing rate in Theorem 6 for two separate regimes for q by using two different couplings. We define \(\alpha ^*\approx 3.634\ldots \) to be the positive root of \(\alpha =2\mathrm {e}^{1/\alpha }+1\).
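The constant \(\alpha ^*\) is easy to compute numerically. The function \(f(\alpha )=\alpha -2\mathrm {e}^{1/\alpha }-1\) is strictly increasing for \(\alpha >0\) (its derivative is \(1+2\mathrm {e}^{1/\alpha }/\alpha ^2>0\)), with \(f(1)<0<f(10)\), so bisection on \([1,10]\) finds its unique positive root. The function name below is illustrative.

```python
import math

def alpha_star(tol=1e-12):
    """Positive root of alpha = 2*exp(1/alpha) + 1, found by bisection."""
    f = lambda a: a - 2.0 * math.exp(1.0 / a) - 1.0
    lo, hi = 1.0, 10.0  # f(1) < 0 < f(10), and f is increasing
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a = alpha_star()  # about 3.6336, consistent with alpha* ≈ 3.634
```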
Lemma 3
If \(q \ge \alpha \varDelta +3\) for a constant \(\alpha >\alpha ^*\), then \(\tau (\epsilon )=O(\log \left( \frac{n}{\epsilon }\right) )\).
Lemma 4
If \(\alpha \varDelta \le q\le 3.7\varDelta +3\) for \(2+\sqrt{2}<\alpha \le 3.7\) and \(\varDelta \ge 9\), then \(\tau (\epsilon )=O(\log \left( \frac{n}{\epsilon }\right) )\).
Theorem 6 follows by combining the two lemmas.
4.2.2 An easy local coupling for \(q > 3.634\varDelta +3\)

Each vertex \(v\in V\) proposes the same random color in the two chains X and Y. Then \((X',Y')\) is determined by the transition rule of the LocalMetropolis chain.
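This identity coupling can be sketched directly: draw one proposal vector and apply the LocalMetropolis update to both chains. The helper names are hypothetical. The sketch also makes locality visible: a vertex at distance at least 2 from the disagreeing vertex \(v_0\) sees identical inputs in both chains, so disagreements after one step are confined to \(\varGamma ^+(v_0)\).

```python
import random

def lm_update(adj, X, c):
    """LocalMetropolis update for q-coloring, given a fixed proposal vector c."""
    return {
        v: X[v]
        if any(c[v] == X[u] or c[v] == c[u] or X[v] == c[u] for u in adj[v])
        else c[v]
        for v in adj
    }

def coupled_step(adj, X, Y, q, rng):
    """Identity coupling: both chains are driven by the same proposals."""
    c = {v: rng.randrange(q) for v in adj}
    return lm_update(adj, X, c), lm_update(adj, Y, c)

def mean_disagreements(adj, X, Y, q, trials, seed=0):
    """Monte Carlo estimate of the expected number of u with X'_u != Y'_u."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        Xn, Yn = coupled_step(adj, X, Y, q, rng)
        total += sum(Xn[v] != Yn[v] for v in adj)
    return total / trials
```

An estimate below 1 for a given pair is only empirical evidence; the proof below bounds the expectation analytically for all pairs.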
Lemma 5
If \(q \ge a \varDelta \), then for any integer \(0\le d\le \varDelta \), \(d\left( 1-\frac{a}{q}\right) ^d\le \varDelta \left( 1-\frac{a}{q}\right) ^{\varDelta }\).
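A quick exact check of Lemma 5 on small parameters (a sanity check only, not a substitute for its proof). With \(x=1-a/q\ge 1-1/\varDelta \), the ratio of consecutive terms \((d+1)x^{d+1}/(dx^d)=(1+1/d)x\) is at least 1 for \(d\le \varDelta -1\), so \(dx^d\) is nondecreasing on \(0,\ldots ,\varDelta \). The sketch also shows that the hypothesis \(q\ge a\varDelta \) matters. The function name is hypothetical.

```python
from fractions import Fraction

def lemma5_holds(a, q, max_degree):
    """Exactly check d * x^d <= D * x^D for all integers 0 <= d <= D, where x = 1 - a/q."""
    x = 1 - Fraction(a) / q
    rhs = max_degree * x ** max_degree
    return all(d * x ** d <= rhs for d in range(max_degree + 1))

# Holds whenever q >= a * D ...
ok = all(lemma5_holds(a, q, D)
         for D in range(1, 13) for a in (1, 2, 3) for q in range(a * D, a * D + 6))
# ... but can fail when q < a * D (here q = 5 < 10 = a * D).
fails = not lemma5_holds(1, 5, 10)
```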
Proof
Proof of Lemma 3
For each v, let \(c_v\in [q]\) be the uniform random color proposed independently by v, which is identical in both chains by the coupling.
Therefore, when \(\alpha >\alpha ^*\), there is a constant \(\delta >0\), depending only on \(\alpha \), such that for all \(\varDelta \ge 1\) and \(q\ge \alpha \varDelta +3\) inequality (13) is satisfied, which, by Lemma 2, gives \(\tau (\epsilon ) = O\left( \log \left( \frac{n}{\epsilon }\right) \right) \).
4.2.3 A global coupling for \((2+\sqrt{2})\varDelta <q\le 3.7\varDelta +3\)
Next, we prove Lemma 4, bounding the mixing rate when \((2+\sqrt{2})\varDelta <q\le 3.7\varDelta +3\). This is done by a global coupling in which the disagreement may percolate through the entire graph; its construction and analysis are substantially more sophisticated than those of the previous local coupling. Although this sophistication improves the threshold for q in Lemma 3 only by a small constant factor, the effort is worthwhile because it lets us approach the threshold of the ideal coupling discussed in Sect. 4.2.1 and shows that the infinite \(\varDelta \)-regular tree \({\mathbb {T}}_{\varDelta }\) represents the worst case for path coupling. Curiously, this worst case is extremal only when q is also suitably upper bounded, say \(q\le 3.7\varDelta +3\), whereas the mixing rate for larger q is already guaranteed by Lemma 3.

consistent: \(c_v^X=c_v^Y\) and is uniformly distributed over [q];

permuted: \(c_v^X\) is uniform in [q] and \(c_v^Y=\phi (c_v^X)\), where \(\phi :[q]\rightarrow [q]\) is the bijection defined by \(\phi (X_{v_0})=Y_{v_0}\), \(\phi (Y_{v_0})=X_{v_0}\), and \(\phi (x)=x\) for all \(x\not \in \{X_{v_0},Y_{v_0}\}\).

Initially, for the disagreeing vertex \(v_0\), \((c_{v_0}^X,c_{v_0}^Y)\) is sampled consistently in the two chains.

For each unblocked \(u\in \varGamma (v_0)\), the pair \((c_{u}^X,c_{u}^Y)\) is sampled from the permuted distribution, independently of other vertices.

Let \({\mathcal {S}}\subseteq V\) denote the current set of vertices v for which \((c_v^X,c_v^Y)\) has been sampled, and \({\mathcal {S}}^{\ne }\subseteq {\mathcal {S}}\) the set of vertices v with \((c_v^X,c_v^Y)\) sampled inconsistently, i.e., \(c_v^X\ne c_v^Y\). Abusing notation, we write \(\partial {\mathcal {S}}^{\ne }=\{\text {unblocked }u\not \in {\mathcal {S}}\mid \exists uv\in E, \text { s.t. }v\in {\mathcal {S}}^{\ne } \}\) for the unblocked unsampled vertex boundary of \({\mathcal {S}}^{\ne }\). If \(\partial {\mathcal {S}}^{\ne }\) is nonempty, then all \(u\in \partial {\mathcal {S}}^{\ne }\) sample their respective \((c_{u}^X,c_{u}^Y)\) independently from the permuted distribution and join \({\mathcal {S}}\) simultaneously; \({\mathcal {S}}^{\ne }\) grows according to the results of this sampling. Repeat this step until \(\partial {\mathcal {S}}^{\ne }\) is empty and thus \({\mathcal {S}}\) has stabilized.

For all remaining vertices v, \((c_v^X,c_v^Y)\) is sampled independently and consistently.
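The four steps above can be sketched as a breadth-first growth of \({\mathcal {S}}^{\ne }\). One caveat: the formal definition of "blocked" is not part of this section, so the predicate below is an assumption inferred from the proofs of Proposition 3 and Lemma 8: a vertex \(u\ne v_0\) is taken to be blocked if some \(w\in \varGamma ^+(u)\setminus \{v_0\}\) has \(X_w=Y_w\in \{X_{v_0},Y_{v_0}\}\). The function name `couple_proposals` is hypothetical and vertices are assumed to be integers.

```python
import random

def couple_proposals(adj, X, Y, v0, q, rng):
    """Sketch of the global coupling of proposals for (X, Y) disagreeing only at v0.
    Returns dicts cX, cY of proposed colors for the two chains."""
    a, b = X[v0], Y[v0]
    special = {a, b}

    def phi(c):
        # the bijection swapping X_{v0} and Y_{v0} and fixing every other color
        return b if c == a else a if c == b else c

    def blocked(u):
        # ASSUMED definition: some w in Gamma+(u) minus v0 has color in {X_{v0}, Y_{v0}}
        return any(X[w] in special for w in [u, *adj[u]] if w != v0)

    cX, cY = {}, {}
    cX[v0] = cY[v0] = rng.randrange(q)            # v0 samples consistently
    frontier = [u for u in adj[v0] if not blocked(u)]
    while frontier:
        for u in frontier:                         # permuted distribution, in parallel
            c = rng.randrange(q)
            cX[u], cY[u] = c, phi(c)
        newly = [u for u in frontier if cX[u] != cY[u]]  # vertices joining S^≠
        # unblocked, unsampled neighbors of the new members of S^≠
        frontier = sorted({w for u in newly for w in adj[u]
                           if w not in cX and not blocked(w)})
    for u in adj:                                  # all remaining vertices: consistent
        if u not in cX:
            cX[u] = cY[u] = rng.randrange(q)
    return cX, cY
```

Only the newly added members of \({\mathcal {S}}^{\ne }\) need to be scanned in each round: the unblocked unsampled neighbors of older members were already sampled in the round right after those members joined.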
It is easy to see that, within each chain, every \(c_v^X\) (respectively \(c_v^Y\)) is uniformly distributed over [q] and independent of \(c_u^X\) (respectively \(c_u^Y\)) for all other \(u\ne v\), although the pairs \((c_v^X,c_v^Y)\) for different v may depend on each other. Therefore, \((\varvec{c}^X,\varvec{c}^Y)\) is a valid coupling of the proposed colors.
Proposition 2
For any vertex \(u\ne v_0\), the event \(c_u^X\ne c_u^Y\) occurs only if there is a strongly self-avoiding walk (SSAW) \({\mathcal {P}}=(v_0,v_1,\ldots ,v_\ell )\) from \(v_0\) to \(v_\ell =u\) through unblocked vertices \(v_1,v_2,\ldots ,v_{\ell }\) such that \({\mathcal {P}}\) is a path of disagreement.
Proof
By the coupling, \(c^X_u \ne c^Y_u\) only when \((c^X_u, c^Y_u)\) is sampled from the permuted distribution and it must hold that \(\{c^X_u,c^Y_u\} = \{X_{v_0},Y_{v_0}\}\). This means that u itself must be unblocked.
At the time when \((c^X_u, c^Y_u)\) is being sampled, there must exist a neighbor \(w \in \varGamma (u)\) such that either (1) \(w=v_0\) or (2) \(w \in {\mathcal {S}}^{\ne }\), which means that \(c^X_w\ne c^Y_w\), \(\{c^X_w,c^Y_w\} = \{X_{v_0},Y_{v_0}\}\) was sampled before \((c^X_u, c^Y_u)\), and vertex w is unblocked. In the latter case, we repeat this argument for w recursively until \(v_0\) is reached. This gives a path \({\mathcal {P}}=(v_0,v_1,\ldots ,v_{\ell })\) from \(v_0\) to \(u=v_{\ell }\) through unblocked vertices \(v_1,\ldots ,v_{\ell }\) such that for all \(1\le i\le \ell \), the pairs \((c^X_{v_i}, c^Y_{v_i})\) are sampled in that order, \(c^X_{v_i} \ne c^Y_{v_i}\) and \(\{c^X_{v_i},c^Y_{v_i}\} = \{X_{v_0},Y_{v_0}\}\). Thus, \({\mathcal {P}}\) is a path of disagreement through unblocked vertices. Moreover, this path must be strongly self-avoiding. Suppose to the contrary that it is not, so that there exist \(0\le i,j\le \ell \) with \(i<j-1\) such that \(v_{i}v_{j}\) is an edge. Then, right after \(c^X_{v_i}\ne c^Y_{v_i}\) is sampled and \(v_i\) joins \({\mathcal {S}}^{\ne }\), both \(v_{i+1}\) and \(v_{j}\) must be in \(\partial {\mathcal {S}}^{\ne }\), because both are unblocked unsampled neighbors of \(v_i\) at that time. By our construction of the coupling, \((c^X_{v_{i+1}}, c^Y_{v_{i+1}})\) and \((c^X_{v_{j}}, c^Y_{v_{j}})\) are then sampled and \(v_{i+1}, v_{j}\) join \({\mathcal {S}}\) simultaneously, which contradicts the fact that \((c^X_{v_{j}}, c^Y_{v_{j}})\) is sampled after \((c^X_{v_{i+1}}, c^Y_{v_{i+1}})\) along the path. Therefore, \({\mathcal {P}}\) is an SSAW through unblocked vertices and is also a path of disagreement. \(\square \)
The coupled next step \((X',Y')\) is determined by the current (X, Y) and the coupled proposed colors \((\varvec{c}^X,\varvec{c}^Y)\).
Proposition 3
For any vertex \(u\ne v_0\), the event \(X'_u\ne Y'_u\) occurs only if \(c^X_u,c^Y_u \in \{X_{v_0},Y_{v_0}\}\). Furthermore, for any unblocked vertex \(u\ne v_0\), the event \(X_u'\ne Y_u'\) occurs only if \(c_u^X\ne c_u^Y\).
Proof
We pick any \(u\ne v_0\) and assume, for contradiction, that \(c^X_u=c^Y_u \not \in \{X_{v_0},Y_{v_0}\}\). This covers every case in which \(c^X_u,c^Y_u \in \{X_{v_0},Y_{v_0}\}\) fails, because \(c^X_u\ne c^Y_u\) occurs only when \(c^X_u,c^Y_u \in \{X_{v_0},Y_{v_0}\}\).

If \(X_u=Y_u \in \{X_{v_0},Y_{v_0}\}\), then for every neighbor \(w \in \varGamma (u)\), either w is blocked or \(w=v_0\). In both cases \(c_w^X=c_w^Y\) is sampled consistently; since also \(c^X_u = c^Y_u\) and \(X_u = Y_u\), this implies (15) and (16). Moreover, either \(\{X_w,Y_w\} = \{X_{v_0},Y_{v_0}\}\) (if \(w = v_0\)) or \(X_w = Y_w\) (if \(w \ne v_0\)), which implies (17) because \(c^X_u = c^Y_u \not \in \{X_{v_0},Y_{v_0}\}\).

If \(X_u=Y_u \not \in \{X_{v_0},Y_{v_0}\}\), then for each neighbor \(w \in \varGamma (u)\), either \(\{c^X_w,c^Y_w\} = \{X_{v_0},Y_{v_0}\}\) or \(c^X_w=c^Y_w\), because by the coupling the event \(c^X_w \ne c^Y_w\) happens if and only if \(\{c^X_w,c^Y_w\} = \{X_{v_0},Y_{v_0}\}\). Since \(c^X_u = c^Y_u \not \in \{X_{v_0},Y_{v_0}\}\) and \(X_u=Y_u \not \in \{X_{v_0},Y_{v_0}\}\), this implies (15) and (16). Moreover, either \(\{X_w,Y_w\} = \{X_{v_0},Y_{v_0}\}\) (if \(w = v_0\)) or \(X_w = Y_w\) (if \(w \ne v_0\)), which implies (17) because \(c^X_u = c^Y_u \not \in \{X_{v_0},Y_{v_0}\}\).
We then analyze the probability of \(X'_u \ne Y'_u\) for each vertex \(u \in V\).
Lemma 6
Proof

\(c^X_{v_0} \not \in \{X_u \mid u \in \varGamma (v_0)\}\) (and hence \(c^Y_{v_0} \not \in \{Y_u \mid u \in \varGamma (v_0)\}\) by the coupling \(c^Y_{v_0}=c^X_{v_0}\) and the fact that \(X_u=Y_u\) for \(u\ne v_0\)). This occurs with probability at least \(\frac{q-d_{v_0}}{q}\).

For all unblocked neighbors \(u \in \varGamma ^U(v_0)\), it must hold that \(c^X_{u} \not \in \{X_{v_0},c^X_{v_0}\}\) and \(c^Y_u \not \in \{Y_{v_0},c^Y_{v_0}\}\). This occurs with probability at least \(\left( 1-\frac{2}{q}\right) ^{d_{v_0}-b_{v_0}}\), conditioned on any choice of \(c^X_{v_0}=c^Y_{v_0}\).

For all blocked neighbors \(w \in \varGamma ^B(v_0)\), it must hold that \(c^X_w \not \in \{c^X_{v_0},X_{v_0},Y_{v_0}\}\) (and hence \(c^Y_w \not \in \{c^Y_{v_0},X_{v_0},Y_{v_0}\}\) due to the coupling \(c^Y_{w}=c^X_{w}\)). This occurs with probability at least \(\left( 1-\frac{3}{q}\right) ^{b_{v_0}}\), conditioned on any choice of \(c^X_{v_0}=c^Y_{v_0}\), and independently of the unblocked neighbors \(u \in \varGamma ^U(v_0)\).
Lemma 7
Proof

\(c_u^X\ne c_u^Y\), which according to Proposition 2 occurs only if there is an SSAW \({\mathcal {P}}=(v_0,v_1,\ldots , v_{\ell })\) from \(v_0\) to \(v_\ell =u\) through unblocked vertices \(v_1,\ldots , v_{\ell }\) such that \({\mathcal {P}}\) is a path of disagreement;

for all unblocked neighbors \(w\in \varGamma ^{U}(u)\), the edge uw passes the check, which means \(c_w^X\not \in \{c_u^X,X_u\}\) (and meanwhile \(c_w^Y\not \in \{c_u^Y,Y_u\}\) by coupling) for all \(w\in \varGamma ^{U}(u)\);

all blocked neighbors \(w\in \varGamma ^{B}(u)\) pass the check in at least one of the chains X and Y, which means that either \(c^X_w \not \in \{c^X_u, X_u\}\) for all \(w \in \varGamma ^B(u)\) or \(c^Y_w \not \in \{c^Y_u,Y_u\}\) for all \(w \in \varGamma ^B(u)\).

there is an SSAW \({\mathcal {P}}=(v_0,v_1,\ldots , v_{\ell })\) from \(v_0\) to \(v_\ell =u\) through unblocked vertices \(v_1,\ldots , v_{\ell }\) such that \(c_{v_i}^X\in \{X_{v_0},Y_{v_0}\}\) for \(1\le i\le \ell -1\), which occurs with probability \(\left( \frac{2}{q}\right) ^{\ell -1}\);

if \(u\in \varGamma (v_0)\), then \(c_u^X=Y_{v_0}\) (and meanwhile \(c_u^Y=X_{v_0}\) by coupling), and if \(u\not \in \varGamma (v_0)\), then \(c_u^X\in \{X_{v_0},Y_{v_0}\}\setminus \{c_{v_{\ell -1}}^X\}\) (and meanwhile \(c_u^Y\in \{X_{v_0},Y_{v_0}\}\setminus \{c_{v_{\ell -1}}^Y\}\) by coupling); in either case this occurs with probability \(\frac{1}{q}\) conditioned on \((c_{v_{\ell -1}}^X,c_{v_{\ell -1}}^Y)\);

\(c_w^X\not \in \{c_u^X,X_u\}\) (and meanwhile \(c_w^Y\not \in \{c_u^Y,Y_u\}\) by coupling) for all unblocked \(w\in \varGamma ^U(u)\setminus \{v_{\ell -1}\}\), which occurs with probability \(\left( 1-\frac{2}{q}\right) ^{d_u-b_u-1}\) conditioned on \(c_u^X\);

either \(c^X_w \not \in \{c^X_u, X_u\}\) for all \(w \in \varGamma ^B(u)\) or \(c^Y_w \not \in \{c^Y_u,Y_u\}\) for all \(w \in \varGamma ^B(u)\), which occurs with probability at most \(\left[ 2\left( 1-\frac{2}{q}\right) ^{b_u}-\left( 1-\frac{3}{q}\right) ^{b_u}\right] \) conditioned on \((c_u^X,c_u^Y)\), by the principle of inclusion-exclusion.
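The inclusion-exclusion step in the last event can be checked by exhaustive enumeration. Here \(c_u^X\ne c_u^Y\) lie in \(\{X_{v_0},Y_{v_0}\}\) while \(X_u=Y_u\) lies outside it (u is unblocked), so the three colors are distinct; they are relabeled 0, 1, 2 in this sketch. Each blocked neighbor proposes the same uniform color in both chains. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def blocked_pass_probability(q, b):
    """Exact Pr[all b blocked neighbors avoid {cX_u, X_u} OR all avoid {cY_u, Y_u}],
    where cX_u, cY_u and X_u = Y_u are three distinct colors, labelled 0, 1, 2."""
    cXu, cYu, Xu = 0, 1, 2
    hits = 0
    for cw in product(range(q), repeat=b):   # consistent proposals of the b neighbors
        ok_X = all(c not in (cXu, Xu) for c in cw)
        ok_Y = all(c not in (cYu, Xu) for c in cw)
        hits += ok_X or ok_Y
    return Fraction(hits, q ** b)

def inclusion_exclusion(q, b):
    """2(1 - 2/q)^b - (1 - 3/q)^b: both single events minus their intersection."""
    return 2 * Fraction(q - 2, q) ** b - Fraction(q - 3, q) ** b
```

The intersection of the two events requires avoiding all three colors \(\{c_u^X,c_u^Y,X_u\}\), which is where the factor \(\left( 1-\frac{3}{q}\right) ^{b_u}\) comes from.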
Lemma 8
Proof
By the coupling, any blocked vertex \(u \in V\) proposes consistently in the two chains, thus \(c^X_u = c^Y_u\). And we have \(X_u = Y_u\) for \(u \ne v_0\).

\(X_u = Y_u \not \in \{X_{v_0},Y_{v_0}\}\). Since vertex u is blocked, there must exist a vertex \(w_0 \in \varGamma (u) \setminus \{v_0\}\) such that \(X_{w_0} = Y_{w_0} \in \{X_{v_0},Y_{v_0}\}\). Without loss of generality, suppose \(X_{w_0} = Y_{w_0} =X_{v_0}\) (the case \(X_{w_0}=Y_{w_0}=Y_{v_0}\) follows by symmetry). By Proposition 3, \(X'_u \ne Y'_u\) only if \(c^X_u = c^Y_u \in \{X_{v_0},Y_{v_0}\}\). If \(c^X_u = c^Y_u = X_{v_0}\), then the edge \(uw_0\) passes the check in neither chain, hence \(X'_u = Y'_u\), a contradiction. So we must have \(c^X_u = c^Y_u=Y_{v_0}\), in which case the edge \(v_0u\) cannot pass the check in chain Y; thus the event \(X'_u \ne Y'_u\) occurs only if u accepts its proposal in chain X, which happens only if \(c^X_w \not \in \{c^X_u,X_u\}\) for all \(w \in \varGamma (u)\). Recall that \(c^X_u=Y_{v_0}\ne X_u\), and note that all vertices in chain X propose independently; therefore \(X'_u \ne Y'_u\) occurs with probability at most \(\frac{1}{q}\left( 1-\frac{2}{q}\right) ^{d_u}\).

\(X_u = Y_u \in \{X_{v_0},Y_{v_0}\}\). Without loss of generality, suppose \(X_u = Y_u = X_{v_0}\) (the case \(X_u = Y_u = Y_{v_0}\) follows by symmetry). By Proposition 3, \(X'_u \ne Y'_u\) only if \(c^X_u = c^Y_u \in \{X_{v_0},Y_{v_0}\}\). If \(c^X_u=c^Y_u=X_{v_0}\), then u proposes its own current color in both chains, so \(X'_u = X_u = Y_u = Y'_u\), a contradiction. So we must have \(c^X_u = c^Y_u=Y_{v_0}\), in which case the edge \(uv_0\) cannot pass the check in chain Y; thus the event \(X'_u \ne Y'_u\) occurs only if u accepts its proposal in chain X, which happens only if \(c^X_w \not \in \{c^X_u,X_u\}=\{X_{v_0},Y_{v_0}\}\) for all \(w \in \varGamma (u)\). Recall that \(c^X_u=Y_{v_0}\), and note that all vertices in chain X propose independently; therefore \(X'_u \ne Y'_u\) occurs with probability at most \(\frac{1}{q}\left( 1-\frac{2}{q}\right) ^{d_u}\).
Now we consider the general blocked vertices \(u \not \in \varGamma ^+(v_0)\). Assume that \(X_u'\ne Y'_u\).
If u is blocked by itself, i.e. \(X_u = Y_u \in \{X_{v_0},Y_{v_0}\}\), then all the vertices \(w \in \varGamma ^+(u)\) are blocked and hence propose consistently, and for \(u\not \in \varGamma ^+(v_0)\) all neighbors w have \(X_w=Y_w\), so we must have \(X'_u = Y'_u\). Thus \(\Pr [X'_u \ne Y'_u \mid X, Y] = 0\) and (19) holds trivially.
If instead u is not blocked by itself, i.e., \(X_u = Y_u \not \in \{X_{v_0},Y_{v_0}\}\), then u must be blocked by one of its neighbors \(w_0 \in \varGamma (u)\) such that \(X_{w_0}=Y_{w_0} \in \{X_{v_0},Y_{v_0}\}\). By Proposition 3, \(X'_u \ne Y'_u\) only if \(c^X_u = c^Y_u \in \{X_{v_0},Y_{v_0}\}\). We must have \(c_u^X\ne X_{w_0}\): otherwise \(c_u^X= X_{w_0}\), and since \(c^X_u=c^Y_u\) and \(X_{w_0}=Y_{w_0}\), also \(c^Y_u = Y_{w_0}\); then the edge \(uw_0\) passes the check in neither chain, giving \(X'_u = Y'_u\), a contradiction.

\({\mathcal {P}}'=(v_0,v_1,\ldots ,v_{\ell -1})\) is a path of disagreement, with probability at most \(\left( \frac{2}{q}\right) ^{\ell -1}\).

\(c^X_u\in \{X_{v_0},Y_{v_0}\} \setminus \{X_{w_0}\}\) (and \(c^Y_u=c^X_u\) due to the coupling), which occurs with probability \(\frac{1}{q}\) conditioning on that \({\mathcal {P}}'\) is a path of disagreement.

\(c^Y_w \not \in \{Y_u, c^Y_u\}\) for all \(w \in \varGamma (u)\setminus \{v_{\ell -1}\}\). Recall that \(Y_u \ne c^Y_u\). Since \({\mathcal {P}}\) is strongly self-avoiding, \(w \not \in {\mathcal {P}}\) for all \(w \in \varGamma (u)\setminus \{v_{\ell -1}\}\), and the proposals within one chain are mutually independent. Conditioned on the previous events, this occurs with probability at most \(\left( 1-\frac{2}{q}\right) ^{d_u - 1}\).
The following lemma essentially states that \(\varPhi _{{\mathcal {P}}}\) is maximized when the number of blocked neighbors is \(b_u=0\), in which case the value of \(\varPhi _{{\mathcal {P}}}\) is upper bounded by the fixed point of this recurrence.
Lemma 9
Proof
We prove the lemma by induction on the length of the walk. Let \({\mathcal {P}}=(v_0,v_1,\ldots ,v_\ell )\) be a walk from \(v_0\) such that all of \(v_1,\ldots ,v_\ell \) are unblocked and \(v_\ell =u\). When \(\ell \) exceeds the length of the longest strongly self-avoiding walk among unblocked \(v_1,\ldots ,v_\ell \), \({\mathcal {P}}\) is not an SSAW and thus \(\varPhi _{{\mathcal {P}}}=0\).
Proof of Lemma 4:
Altogether, by the path coupling Lemma 2, if \(\alpha \varDelta \le q\le 3.7\varDelta +3\) for a constant \(\alpha > 2 + \sqrt{2}\) and \(\varDelta \ge 9\), then the mixing rate is bounded by \(\tau (\epsilon ) = O(\log \left( \frac{n}{\epsilon }\right) )\). \(\square \)
5 Lower bounds
In this section, we show lower bounds for local sampling. Let G(V, E) be a network, and let \({\mathcal {I}}\) be an instance of an MRF or a weighted local CSP defined on graph G. For example, \({\mathcal {I}}=(G,[q],\varvec{A},\varvec{b})\) for an MRF with edge activities \(\varvec{A}=\{A_e\}_{e\in E}\) and vertex activities \(\varvec{b}=\{b_v\}_{v\in V}\).
We then show that the \(\varOmega (\log n)\) lower bound holds even for a constant total variation distance \(\epsilon \). A similar \(\varOmega (\log n)\) lower bound for sampling independent sets was proved independently in [33]. Together, these results show that the \(O\left( \log \left( \frac{n}{\epsilon }\right) \right) \) upper bound in Theorem 2 is optimal.
Theorem 7
Let \(q\ge 3\) be a constant and \(\epsilon <\frac{1}{3}\). Any t-round protocol that samples a uniform proper q-coloring of a path within total variation distance \(\epsilon \) must have \(t=\varOmega (\log n)\).
Proof
Let \(P=(w_0,w_1,\ldots ,w_{n-1})\) be a path of n vertices. For \(i=0,1,\ldots ,m\) where \(m=\left\lfloor \frac{n-1}{3(2t+1)}\right\rfloor \), we denote \(x_i=w_{3(2t+1)i}\); and for \(i=0,1,\ldots ,m-1\), denote \(u_i=w_{3(2t+1)i+2t+1}\), and \(v_i=w_{3(2t+1)i+2(2t+1)}\). We denote \(F=\{x_i\mid 0\le i\le m\}\) and \(U=\{u_i,v_i\mid 0\le i\le m-1\}\), and let \(C=F\cup U\). We call the vertices in C the centers, and the vertices in F and U the fixed and unfixed centers, respectively. Note that the pairs \((u_i,v_i)\) of consecutive unfixed centers are separated by the fixed centers \(x_i\). Due to the conditional independence of MRFs, conditioned on any particular configuration \(\sigma _F\in [q]^F\) of the fixed centers, for a \(\sigma \in [q]^P\) sampled from the Gibbs distribution \(\mu \) consistent with \(\sigma _F\) over F, the pairs \((\sigma _{u_i},\sigma _{v_i})\) are mutually independent. In what follows, we condition on an arbitrarily fixed \(\sigma _F\in [q]^F\).
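The placement of centers is simple index arithmetic; a short sketch (function name hypothetical) lists the path indices of the fixed centers \(x_i\) and the unfixed pairs \((u_i,v_i)\). Consecutive centers are exactly \(2t+1\) apart, so the radius-t neighborhoods of distinct centers are disjoint, which is what makes the outputs at different centers independent in a t-round protocol.

```python
def centers(n, t):
    """Indices into the path w_0..w_{n-1} of the fixed centers x_i
    and of the unfixed pairs (u_i, v_i)."""
    s = 2 * t + 1                      # spacing between consecutive centers
    m = (n - 1) // (3 * s)
    xs = [3 * s * i for i in range(m + 1)]
    pairs = [(3 * s * i + s, 3 * s * i + 2 * s) for i in range(m)]
    return xs, pairs

xs, pairs = centers(100, 1)   # 12 fixed centers, 11 unfixed pairs
```

Since \(m=\varTheta (n/t)\), the number of independent pairs stays large when \(t=O(\log n)\).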
Let \({X}_{u_i}\) and \(X_{v_i}\) be the respective outputs of \(u_i\) and \(v_i\) in a t-round protocol. By the observation (30), \({X}_{u_i}\) and \(X_{v_i}\) are mutually independent. By the exponential correlation (32), choosing a suitably small \(t=O(\log n)\) makes the total variation distance between \((\sigma _{u_i},\sigma _{v_i})\) and \(({X}_{u_i},{X}_{v_i})\) at least \(\exp (-O(t)) =n^{-\frac{1}{2}}\).
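The exponential correlation along a path can be made concrete. For a uniform proper q-coloring of a path conditioned on the color a of one endpoint, the colors evolve as a Markov chain that moves to a uniformly random other color at each step, which gives \(\Pr [\sigma _{w_d}=a\mid \sigma _{w_0}=a]=\frac{1}{q}+\left( 1-\frac{1}{q}\right) \left( -\frac{1}{q-1}\right) ^d\). The sketch below checks this closed form by exact propagation; it only illustrates the one-sided decay and does not reproduce the precise form of (32), which conditions on fixed centers at both ends. Function names are hypothetical.

```python
from fractions import Fraction

def same_color_prob(q, d):
    """Pr[sigma_{w_d} = a | sigma_{w_0} = a] for a uniform proper q-coloring of a path,
    computed by propagating the color distribution one edge at a time."""
    p = [Fraction(0)] * q
    p[0] = Fraction(1)                 # color a is labelled 0
    for _ in range(d):
        total = sum(p)                 # always 1
        p = [(total - p[c]) / (q - 1) for c in range(q)]
    return p[0]

def closed_form(q, d):
    return Fraction(1, q) + Fraction(q - 1, q) * Fraction(-1, q - 1) ** d
```

The deviation from \(1/q\) is \(\left( 1-\frac{1}{q}\right) (q-1)^{-d}\), so at distance \(d=2t+1\) the correlation is \(\exp (-\varTheta (t))\) for constant q.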
Next, we state a strong \(\varOmega ({\mathrm {diam}})\) lower bound for sampling with long-range correlations.
5.1 An \(\varOmega \)(diam) lower bound in the non-uniqueness regime
The hardcore model on graphs with maximum degree \(\varDelta \) undergoes a computational phase transition at the uniqueness threshold \(\lambda _c(\varDelta )=\frac{(\varDelta -1)^{\varDelta -1}}{(\varDelta -2)^\varDelta }\): sampling from the Gibbs distribution can be done in polynomial time in the uniqueness regime \(\lambda <\lambda _c\) [20, 60] and is intractable unless NP=RP in the non-uniqueness regime \(\lambda >\lambda _c\) [8, 28, 55, 56].
The following theorem states an \(\varOmega ({\mathrm {diam}})\) lower bound for sampling from the hardcore model in the non-uniqueness regime. In particular, when \(\lambda =1\) the model is the uniform distribution over independent sets, and non-uniqueness \(\lambda >\lambda _c(\varDelta )\) holds when \(\varDelta \ge 6\), which gives Theorem 3.
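The claim that \(\lambda =1\) lies in the non-uniqueness regime exactly when \(\varDelta \ge 6\) follows from evaluating \(\lambda _c(\varDelta )\): for instance \(\lambda _c(5)=4^4/3^5=256/243>1\) while \(\lambda _c(6)=5^5/4^6=3125/4096<1\). A short exact computation (function name hypothetical) confirms this for small degrees.

```python
from fractions import Fraction

def lambda_c(D):
    """Uniqueness threshold (D-1)^(D-1) / (D-2)^D of the hardcore model, for D >= 3."""
    return Fraction((D - 1) ** (D - 1), (D - 2) ** D)

# Degrees for which fugacity 1 (uniform independent sets) exceeds the threshold.
nonunique_at_1 = [D for D in range(3, 15) if lambda_c(D) < 1]
```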
Theorem 8
Let \(\varDelta \ge 3\) and \(\lambda >\lambda _c(\varDelta )\). Let \(\epsilon >0\) be a sufficiently small constant. For all \(N>0\) there exists a graph \({\mathcal {G}}\) on \(\varTheta (N)\) vertices with maximum degree \(\varDelta \) and diameter \({\mathrm {diam}}({\mathcal {G}})={\varOmega (N^{1/11})}\) such that for the hardcore model on \({\mathcal {G}}\) with fugacity \(\lambda \), any t-round protocol that samples within total variation distance \(\epsilon \) from the Gibbs distribution \(\mu =\mu _{\mathcal {G}}\) must have \(t=\varOmega ({\mathrm {diam}}({\mathcal {G}}))\).
We follow the approaches in [8, 27, 28, 55, 56] for the computational phase transition. The network \({\mathcal {G}}=H^G\) is constructed by lifting a graph H with a gadget G, such that sampling from the hardcore model on \(H^G\) with \(\lambda >\lambda _c(\varDelta )\) effectively samples a maximum cut in H. We choose H to be an even cycle, in which the maximum cut imposes a long-range correlation among vertices. To sample with such a long-range correlation, the sampling algorithm cannot be local.
Unlike the results of [8, 27, 28, 55, 56], which concern the computational complexity of approximate counting, here we prove unconditional lower bounds for sampling in the \(\mathsf {LOCAL}\) model. Our lower bound is due to the long-range correlations in the random max-cut rather than the computational complexity of optimization. Technically, this means that in addition to showing that a max-cut in H is sampled, we also need to show that the sampled max-cut is distributed almost uniformly.
5.1.1 The random graph gadget

Let \(V^+\) and \(V^-\) be two vertex sets with \(|V^+|=|V^-|=n+r\), such that \(V^\pm =U^\pm \uplus W^\pm \), where \(\left|U^\pm \right|=n\) and \(\left|W^\pm \right|=r\). Let \(V=V^+\cup V^-\), \(W=W^+\cup W^-\) and \(U=U^+\cup U^-\).

Uniformly and independently sample \(\varDelta -1\) perfect matchings between \(V^+\) and \(V^-\), and then uniformly and independently sample a perfect matching between \(U^+\) and \(U^-\). The union of all these matchings gives the random bipartite (multi)graph \({\mathcal {G}}^r_n\), in which every vertex in U has degree \(\varDelta \) and every vertex in W has degree \(\varDelta -1\).
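The two sampling steps above translate directly into code. In this sketch, vertices of \(V^\pm \) are labelled `('+', i)` and `('-', i)` with indices below n forming \(U^\pm \); these labels and the function name are our assumptions. A uniformly random perfect matching between two sets of equal size is obtained from a uniformly random permutation.

```python
import random

def random_gadget(n, r, D, rng):
    """Sample the random bipartite multigraph G^r_n as a list of edges (multi-edges allowed).
    Vertices ('+', i) and ('-', i) with i < n form U^+ and U^-; the rest form W^+ and W^-."""
    plus = [('+', i) for i in range(n + r)]
    minus = [('-', i) for i in range(n + r)]
    edges = []
    for _ in range(D - 1):              # D - 1 perfect matchings between V^+ and V^-
        perm = minus[:]
        rng.shuffle(perm)
        edges.extend(zip(plus, perm))
    perm = minus[:n]                    # one more perfect matching between U^+ and U^-
    rng.shuffle(perm)
    edges.extend(zip(plus[:n], perm))
    return edges
```

Every vertex of U is covered by all D matchings and every vertex of W by only the first \(D-1\), which gives the stated degrees.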
Proposition 4

(expander) G is connected with \({{\mathrm {diam}}}\left( G\right) =O(\log n)\);

(balanced phases) \(\Pr _G\left[ Y(\sigma )=\pm \right] \in [(1-\delta )/2,(1+\delta )/2]\);

(phase-correlated almost independence) \(\forall \tau _T\in \{0,1\}^T\), \(\Pr _G\left[ \sigma _T=\tau _T \mid Y(\sigma )=\pm \right] /Q_T^\pm (\tau _T)\in [1-\delta ,1+\delta ]\);
By the probabilistic method, there exists a G satisfying the above conditions.
5.1.2 Reduction from max-cut

For each vertex \(x\in H\), let \(G_x\) be a copy of G. We denote by \(T^\pm _x\) the respective sets of 2k terminals in \(G_x\). Let \(\widehat{H}^G\) be the disjoint union of the copies \(G_x\), \(x\in H\).

For every edge \((x,y)\in H\), add k edges between \(T^+_x\) and \(T^+_y\), and similarly add k edges between \(T^-_x\) and \(T^-_y\). This can be done in such a way that the resulting (multi)graph \(H^G\) is \(\varDelta \)-regular.
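For the cycle H used here, each vertex x has two neighbors, so its 2k terminals on each side split naturally into k edges toward each neighbor, and every terminal gains exactly one extra edge. The sketch below lifts a cycle of length m with a given gadget; the specific wiring (last k terminals of \(G_x\) to the first k terminals of \(G_{x+1}\)) is our assumption, as the text only fixes the number of edges per cycle edge. All names are hypothetical.

```python
def lift_cycle(m, gadget_edges, T_plus, T_minus, k):
    """Build H^G for H a cycle of length m. Vertex (x, v) is the copy of
    gadget vertex v in G_x; T_plus and T_minus are the 2k terminals on each side."""
    assert len(T_plus) == len(T_minus) == 2 * k
    # disjoint copies G_x, x in H
    edges = [((x, a), (x, b)) for x in range(m) for a, b in gadget_edges]
    for x in range(m):                      # cycle edge (x, x + 1 mod m)
        y = (x + 1) % m
        for T in (T_plus, T_minus):         # k edges on the + side and k on the - side
            # ASSUMED wiring: last k terminals of G_x to first k terminals of G_y
            edges.extend(((x, T[k + j]), (y, T[j])) for j in range(k))
    return edges
```

If every terminal has degree \(\varDelta -1\) inside the gadget and all other vertices have degree \(\varDelta \), this wiring makes the lifted graph \(\varDelta \)-regular.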
Definition 4
Note that the cycle H has precisely two maximum cuts. A key property for proving the lower bound is that in the nonuniqueness regime, sampling from the hardcore model on graph \(H^G\) corresponds to sampling a maximum cut in H almost uniformly.
Theorem 9
The theorem is implied by the following lemma, which is proved by applying a calculation in [55] with the improved gadget property Proposition 4.
Lemma 10
Proof
Proof of Theorem 9:
5.1.3 Proof of the \({\varOmega }\)(diam) lower bound
Now we are ready to prove Theorem 8. Let N be sufficiently large. We choose an integer \(n={\varTheta (N^{10/11})}\) and an even integer \(m={\varTheta (N^{1/11})}\) such that m / 2 is odd, construct a gadget G satisfying Proposition 4, and construct the graph \({\mathcal {G}}=H^G\), where H is a cycle of length m, as described in Sect. 5.1.2. Note that \(\text {diam}\left( {\mathcal {G}}\right) \ge {\mathrm {diam}}(H)\ge m/2\) and \(\left|V\left( {\mathcal {G}}\right) \right|=\varTheta (N)\); therefore \(\text {diam}\left( {\mathcal {G}}\right) ={\varOmega (N^{1/11})}\).
Let \(\sigma '\) denote the output of a t-round protocol with \(t\le 0.49\cdot {\mathrm {diam}}({\mathcal {G}})\) on network \({\mathcal {G}}\), and denote its distribution by \(\mu _t\); let \(\sigma \) be sampled from the hardcore Gibbs distribution \(\mu =\mu _{{\mathcal {G}}}\). For contradiction, assume that \(d_{\mathrm {TV}}\left( {\mu _t},{\mu }\right) \le \epsilon \) for a sufficiently small constant \(\epsilon \).
6 Conclusion
In this paper, we study the local sampling problem and ask a new question about local computation: whether a locally definable joint distribution can be sampled locally.
On the positive side, we give two distributed sampling algorithms LubyGlauber and LocalMetropolis. LubyGlauber achieves \(O(\varDelta \log n)\) mixing time under Dobrushin’s condition and LocalMetropolis may achieve optimal \(O(\log n)\) mixing time under a stronger mixing condition. Thus many locally definable joint distributions can be sampled locally.
On the negative side, we give an \(\varOmega (\log n)\) lower bound for sampling from a broad class of locally defined joint distributions. Thus an \(O(\log n)\) radius can be considered the new criterion for locality of distributed sampling algorithms. Furthermore, we give an \(\varOmega (\mathrm {diam})=n^{\varOmega (1)}\) lower bound for sampling weighted independent sets in the non-uniqueness regime. Since an independent set is trivial to construct, this gives a strong separation between local sampling and local construction. The lower bounds hold even if every vertex knows the graph structure, which means that the hardness of local sampling is due to the discrepancy between the locality of randomness in distributed algorithms and the long-range correlation in the joint distribution from which we want to sample.
Footnotes
 1.
This property holds automatically for feasible configurations X with \(\mu (X)>0\), and is only needed when the Glauber dynamics is allowed to start from an infeasible configuration. For specific MRFs, such as proper q-colorings, this property is guaranteed by the “uniqueness condition” \(q\ge \varDelta +1\).
 2.
For MRFs, since the single-site Glauber dynamics has the same connectivity structure as the natural single-site Metropolis chain, we do not distinguish between them when referring to irreducibility.
Acknowledgements
This research is supported by the National Key R&D Program of China 2018YFB1003202 and the National Science Foundation of China under Grant Nos. 61722207 and 61672275. Yitong Yin would like to thank Daniel Štefankovič for the stimulating discussions at the beginning of this project. He also thanks Heng Guo, Tom Hayes, Eric Vigoda, and Chaodong Zheng for helpful discussions.
References
1. Alon, N., Babai, L., Itai, A.: A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms 7(4), 567–583 (1986)
2. Awerbuch, B., Luby, M., Goldberg, A.V., Plotkin, S.A.: Network decomposition and locality in distributed computation. In: Proceedings of the 30th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 364–369 (1989)
3. Barenboim, L.: Deterministic \((\varDelta +1)\)-coloring in sublinear (in \(\varDelta \)) time in static, dynamic, and faulty networks. J. ACM 63(5), 47 (2016)
4. Barenboim, L., Elkin, M.: Deterministic distributed vertex coloring in polylogarithmic time. J. ACM 58(5), 23 (2011)
5. Barenboim, L., Elkin, M., Pettie, S., Schneider, J.: The locality of distributed symmetry breaking. J. ACM 63(3), 20 (2016)
6. Bubley, R., Dyer, M.: Path coupling: a technique for proving rapid mixing in Markov chains. In: Proceedings of the 38th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 223–231 (1997)
7. Cai, J.-Y., Chen, X., Lu, P.: Nonnegative weighted #CSP: an effective complexity dichotomy. SIAM J. Comput. 45(6), 2177–2198 (2016)
8. Cai, J.-Y., Galanis, A., Goldberg, L.A., Guo, H., Jerrum, M., Štefankovič, D., Vigoda, E.: #BIS-hardness for 2-spin systems on bipartite bounded degree graphs in the tree non-uniqueness region. J. Comput. Syst. Sci. 82(5), 690–711 (2016)
9. Calderhead, B.: A general construction for parallelizing Metropolis–Hastings algorithms. Proc. Natl. Acad. Sci. 111(49), 17408–17413 (2014)
10. Chang, Y.-J., Kopelowitz, T., Pettie, S.: An exponential separation between randomized and deterministic complexity in the LOCAL model. In: Proceedings of the 57th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 615–624 (2016)
11. Chung, K.-M., Pettie, S., Su, H.-H.: Distributed algorithms for the Lovász local lemma and graph coloring. In: Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing (PODC), pp. 134–143 (2014)
12. Dániel, M.: Graph colouring problems and their applications in scheduling. Period. Polytech. Electr. Eng. 48(1–2), 11–16 (2004)
13. Das Sarma, A., Nanongkai, D., Pandurangan, G., Tetali, P.: Distributed random walks. J. ACM 60(1), 2 (2013)
14. De Sa, C., Olukotun, K., Ré, C.: Ensuring rapid mixing and low bias for asynchronous Gibbs sampling. In: Proceedings of the 33rd International Conference on Machine Learning (ICML), pp. 1567–1576 (2016)
15. De Sa, C., Zhang, C., Olukotun, K., Ré, C.: Rapidly mixing Gibbs sampling for a class of factor graphs using hierarchy width. In: Advances in Neural Information Processing Systems (NIPS), pp. 3097–3105 (2015)
16. Dobrushin, R.L.: Prescribing a system of random variables by conditional distributions. Theory Probab. Appl. 15(3), 458–486 (1970)
17. Doshi-Velez, F., Mohamed, S., Ghahramani, Z., Knowles, D.A.: Large scale nonparametric Bayesian inference: data parallelisation in the Indian buffet process. In: Advances in Neural Information Processing Systems (NIPS), pp. 1294–1302 (2009)
18. Dyer, M., Goldberg, L.A., Jerrum, M.: Dobrushin conditions and systematic scan. In: Proceedings of the 10th International Workshop on Randomization and Computation (RANDOM), pp. 327–338. Springer, Berlin (2006)
19. Dyer, M., Goldberg, L.A., Jerrum, M.: Systematic scan for sampling colorings. Ann. Appl. Probab. 16(1), 185–230 (2006)
20. Efthymiou, C., Hayes, T.P., Štefankovič, D., Vigoda, E., Yin, Y.: Convergence of MCMC and loopy BP in the tree uniqueness region for the hard-core model. In: Proceedings of the 57th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 704–713 (2016)
21. Feng, W., Hayes, T.P., Yin, Y.: Distributed symmetry breaking in sampling (optimal distributed randomly coloring with fewer colors). arXiv preprint arXiv:1802.06953 (2018)
22. Feng, W., Yin, Y.: On local distributed sampling and counting. arXiv preprint arXiv:1802.06686 (2018)
23. Fischer, M., Ghaffari, M.: A simple parallel and distributed sampling technique: local Glauber dynamics. arXiv preprint arXiv:1802.06676 (2018)
24. Fraigniaud, P., Heinrich, M., Kosowski, A.: Local conflict coloring. In: Proceedings of the 57th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 625–634 (2016)
25. Fraigniaud, P., Korman, A., Peleg, D.: Towards a complexity theory for local distributed computing. J. ACM 60(5), 35 (2013)
26. Frieze, A., Vigoda, E.: A survey on the use of Markov chains to randomly sample colourings. Oxf. Lect. Ser. Math. Appl. 34, 53 (2007)
27. Galanis, A., Štefankovič, D., Vigoda, E.: Inapproximability for antiferromagnetic spin systems in the tree non-uniqueness region. J. ACM 62(6), 50 (2015)
28. Galanis, A., Štefankovič, D., Vigoda, E.: Inapproximability of the partition function for the antiferromagnetic Ising and hard-core models. Comb. Probab. Comput. 25(4), 500–559 (2016)
29. Ghaffari, M.: An improved distributed algorithm for maximal independent set. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 270–277 (2016)
30. Ghaffari, M., Kuhn, F., Maus, Y.: On the complexity of local distributed graph problems. arXiv preprint arXiv:1611.02663 (2016)
31. Ghaffari, M., Su, H.-H.: Distributed degree splitting, edge coloring, and orientations. In: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 2505–2523 (2017)
32. Gonzalez, J.E., Low, Y., Gretton, A., Guestrin, C.: Parallel Gibbs sampling: from colored fields to thin junction trees. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 15, pp. 324–332 (2011)
33. Guo, H., Jerrum, M., Liu, J.: Uniform sampling through the Lovász local lemma. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pp. 342–355 (2017)
34. Harris, D.G., Schneider, J., Su, H.-H.: Distributed \((\varDelta +1)\)-coloring in sublogarithmic rounds. In: Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pp. 465–478 (2016)
35. Hayes, T.P.: A simple condition implying rapid mixing of single-site dynamics on spin systems. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 39–46 (2006)
36. Holroyd, A.E., Schramm, O., Wilson, D.B.: Finitary coloring. arXiv preprint arXiv:1412.2725 (2014)
37. Jerrum, M.: A very simple algorithm for estimating the number of \(k\)-colorings of a low-degree graph. Random Struct. Algorithms 7(2), 157–165 (1995)
38. Johnson, M.J., Saunderson, J., Willsky, A.S.: Analyzing Hogwild parallel Gaussian Gibbs sampling. In: Advances in Neural Information Processing Systems (NIPS), pp. 2715–2723 (2013)
39. Kuhn, F., Moscibroda, T., Wattenhofer, R.: What cannot be computed locally! In: Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing (PODC), pp. 300–309 (2004)
40. Kuhn, F., Moscibroda, T., Wattenhofer, R.: The price of being near-sighted. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 980–989. Society for Industrial and Applied Mathematics (2006)
41. Kuhn, F., Moscibroda, T., Wattenhofer, R.: Local computation: lower and upper bounds. J. ACM 63(2), 17 (2016)
42. Kuhn, F., Wattenhofer, R.: On the complexity of distributed graph coloring. In: Proceedings of the 25th Annual ACM Symposium on Principles of Distributed Computing (PODC), pp. 7–15 (2006)
43. Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. American Mathematical Society, Providence (2009)
44. Linial, N.: Locality in distributed graph algorithms. SIAM J. Comput. 21(1), 193–201 (1992)
45. Lu, P., Yin, Y.: Improved FPTAS for multi-spin systems. In: Proceedings of the 17th International Workshop on Randomization and Computation (RANDOM), pp. 639–654 (2013)
46. Luby, M.: A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput. 15(4), 1036–1053 (1986)
47. Mezard, M., Montanari, A.: Information, Physics, and Computation. Oxford University Press, Oxford (2009)
48. Moser, R.A., Tardos, G.: A constructive proof of the general Lovász local lemma. J. ACM 57(2), 11 (2010)
49. Naor, M., Stockmeyer, L.: What can be computed locally? SIAM J. Comput. 24(6), 1259–1277 (1995)
50. Newman, D., Asuncion, A., Smyth, P., Welling, M.: Distributed inference for latent Dirichlet allocation. In: Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS), pp. 1081–1088 (2007)
51. Niu, F., Recht, B., Ré, C., Wright, S.J.: Hogwild: a lock-free approach to parallelizing stochastic gradient descent. In: Advances in Neural Information Processing Systems (NIPS), pp. 693–701 (2011)
52. Peleg, D.: Distributed Computing: A Locality-Sensitive Approach. SIAM, Philadelphia (2000)
53. Salas, J., Sokal, A.D.: Absence of phase transition for antiferromagnetic Potts models via the Dobrushin uniqueness theorem. J. Stat. Phys. 86(3), 551–579 (1997)
54. Sarma, A.D., Holzer, S., Kor, L., Korman, A., Nanongkai, D., Pandurangan, G., Peleg, D., Wattenhofer, R.: Distributed verification and hardness of distributed approximation. SIAM J. Comput. 41(5), 1235–1265 (2012)
55. Sly, A.: Computational transition at the uniqueness threshold. In: Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 287–296 (2010)
56. Sly, A., Sun, N.: Counting in two-spin models on \(d\)-regular graphs. Ann. Probab. 42(6), 2383–2416 (2014)
57. Smyth, P., Welling, M., Asuncion, A.U.: Asynchronous distributed learning of topic models. In: Advances in Neural Information Processing Systems (NIPS), pp. 81–88 (2009)
58. Swendsen, R.H., Wang, J.-S.: Replica Monte Carlo simulation of spin-glasses. Phys. Rev. Lett. 57(21), 2607 (1986)
59. Vigoda, E.: Improved bounds for sampling colorings. J. Math. Phys. 41(3), 1555–1569 (2000)
60. Weitz, D.: Counting independent sets up to the tree threshold. In: Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC), pp. 140–149 (2006)
61. Xu, M., Lakshminarayanan, B., Teh, Y.W., Zhu, J., Zhang, B.: Distributed Bayesian posterior sampling via moment sharing. In: Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), pp. 3356–3364 (2014)
62. Yan, F., Xu, N., Qi, Y.: Parallel inference for latent Dirichlet allocation on graphics processing units. In: Advances in Neural Information Processing Systems (NIPS), pp. 2134–2142 (2009)
63. Yang, Y., Chen, J., Zhu, J.: Distributing the stochastic gradient sampler for large-scale LDA. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1975–1984 (2016)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.