Geometric Bounds on the Fastest Mixing Markov Chain

In the Fastest Mixing Markov Chain problem, we are given a graph $G = (V, E)$ and desire the discrete-time Markov chain with smallest mixing time $\tau$ subject to having equilibrium distribution uniform on $V$ and non-zero transition probabilities only across edges of the graph. It is well-known that the mixing time $\tau_\textsf{RW}$ of the lazy random walk on $G$ is characterised by the edge conductance $\Phi$ of $G$ via Cheeger's inequality: $\Phi^{-1} \lesssim \tau_\textsf{RW} \lesssim \Phi^{-2} \log |V|$. Analogously, we characterise the fastest mixing time $\tau^\star$ via a Cheeger-type inequality but for a different geometric quantity, namely the vertex conductance $\Psi$ of $G$: $\Psi^{-1} \lesssim \tau^\star \lesssim \Psi^{-2} (\log |V|)^2$. This characterisation forbids fast mixing for graphs with small vertex conductance. To bypass this fundamental barrier, we consider Markov chains on $G$ with equilibrium distribution which need not be uniform, but rather only $\varepsilon$-close to uniform in total variation. We show that it is always possible to construct such a chain with mixing time $\tau \lesssim \varepsilon^{-1} (\operatorname{diam} G)^2 \log |V|$. Finally, we discuss analogous questions for continuous-time and time-inhomogeneous chains.


Introduction

Fastest Mixing Markov Chain: Set-Up and Motivation
Sampling objects from a finite set is a basic primitive which has a myriad of applications. Sampling directly from such a set, however, may be computationally too expensive or even impossible, for example, if the objects are nodes of a distributed network. A common approach in these scenarios is to design a random walk (RW), or, more generally, a Markov chain with state space corresponding to the set from which we wish to sample and appropriate equilibrium distribution. Furthermore, to ensure our sampling procedure is computationally efficient, we desire our Markov chain to converge to equilibrium in a small number of steps, i.e., have fast mixing time.
This has wide-ranging applications: from shuffling cards [BD92, DS81], to approximating statistical physics models [JS93, DHJM21] and analysing load-balancing protocols in distributed computing [RSW98, TS19]. Furthermore, approximately sampling from the uniform distribution of a set can be used to estimate the size of the set itself [JS96]. This has been applied to approximating the permanent of a matrix [JS89b, JSV04] and counting the number of independent sets [DJMV21], perfect matchings [DM19] and forests [Ann94] in graphs.
Fundamental to these applications is a fast mixing time. Understanding in which instances fast mixing is achievable and what the intrinsic obstacles to fast mixing are is the focus of this paper. More precisely, we consider the following scenario.
• We are given a finite, undirected graph G = (V, E): the vertex set represents the underlying state space, while the edge set E defines the transitions allowed.
• Our goal is to study the fastest mixing Markov chain satisfying these constraints. We assume throughout that graphs are finite, undirected and connected.
This problem was originally introduced by Boyd, Diaconis and Xiao [BDX04] as the Fastest Mixing Markov Chain (FMMC) problem. Specifically, by considering only reversible chains and optimising the spectral gap as a proxy for the mixing time, they recast the problem of finding the fastest mixing Markov chain on a graph as a convex optimisation problem. Analogously to Boyd, Diaconis and Xiao [BDX04], we dedicate most of our attention to reversible, time-homogeneous chains in discrete-time. We do, however, dedicate one section to questions in the continuous-time setting, first studied in [SBXD06], and one short final section to time-inhomogeneous chains. Compared with discrete-time chains, continuous-time and time-inhomogeneous chains are considerably more powerful, but perhaps less natural from an application viewpoint.
To be explicit and precise, a transition matrix P is reversible with respect to (w.r.t.) π if π(u)P(u, v) = π(v)P(v, u) for all u, v ∈ V. Results on the spectral gap below, both our own and those referenced, are always in the reversible set-up. We also restrict to lazy chains, i.e. chains with P(v, v) ≥ 1/2 for all v ∈ V. This is without loss of generality, since we are interested in maximising the spectral gap and this restriction costs a factor of at most 1/2 in the optimal spectral gap. There are a variety of choices for how "convergence to equilibrium" is measured. It is typically measured in the total variation (TV), or equivalently ℓ1, distance. Other popular measures, particularly in the statistics literature, include ℓ2, or χ2, distance and relative entropy, or Kullback-Leibler divergence. We also recall that ℓ2 is equivalent to ℓ∞, or uniform, distance for reversible chains.
Nevertheless, no matter which one of these measures we choose, the long-term convergence to equilibrium of a lazy, reversible Markov chain is governed by its spectral gap γ_P. More precisely, given a transition matrix P of a lazy, reversible Markov chain, let d_P(t, x) denote the distance between P^t(x, ·) and its equilibrium distribution according to any of the aforementioned measures and let d_P(t) := max_{x∈V} d_P(t, x). Then, d_P(t)^{1/t} → 1 − γ_P as t → ∞.
See [LPW17, Theorems 12.4 and 12.5] for details.The spectral gap thus determines the asymptotic convergence to equilibrium without having to select a specific measure.
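This asymptotic rate is easy to check numerically. The sketch below is our own illustration (not from the references): it builds the lazy simple RW on a 5-vertex path, which is reversible w.r.t. the degree-biased distribution, and compares d_P(t)^{1/t}, measured in TV distance, with 1 − γ_P.

```python
import numpy as np

# Lazy simple random walk on the path with 5 vertices (illustrative example).
n = 5
P = np.zeros((n, n))
for u in range(n):
    nbrs = [v for v in (u - 1, u + 1) if 0 <= v < n]
    for v in nbrs:
        P[u, v] = 1 / (2 * len(nbrs))
    P[u, u] = 1 / 2  # laziness

deg = np.array([1, 2, 2, 2, 1])
pi = deg / deg.sum()  # equilibrium distribution: pi(v) proportional to deg(v)

# Spectral gap via the symmetrisation D^{1/2} P D^{-1/2}, which has the same
# spectrum as P because P is reversible w.r.t. pi.
D = np.diag(np.sqrt(pi))
spec = np.sort(np.linalg.eigvalsh(D @ P @ np.linalg.inv(D)))
gamma = 1 - spec[-2]

def d(t):
    """Worst-case TV distance from equilibrium at time t."""
    Pt = np.linalg.matrix_power(P, t)
    return max(0.5 * np.abs(Pt[x] - pi).sum() for x in range(n))

t = 30
print(gamma, d(t) ** (1 / t))  # the second value is close to 1 - gamma
```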
We now define formally the class of reversible Markov chains on a graph and then the spectral gap and the relaxation and mixing times.
Definition (Markov Chains on a Graph). Let G = (V, E) be a graph and π a probability measure on V. A transition matrix P ∈ [0, 1]^{V×V} is on G if P(u, v) > 0 implies either {u, v} ∈ E or u = v. Let M(G, π) denote the set of lazy transition matrices on G which are reversible w.r.t. π.
There is a standard relation between the relaxation and mixing times: 1/γ_P ≲ τ_P ≲ γ_P^{−1} log(1/π_min); see [LPW17, Theorems 12.4 and 12.5] for details. The "≲" symbol hides an implicit universal constant; we use the symbols "≳" and "≍" similarly. Typically, log π_min^{−1} ≍ log |V|. So, the relaxation time is a proxy for the mixing time, as well as characterising long-term convergence to equilibrium.
We are now finally ready to formally introduce the FMMC problem.
Definition (Fastest Mixing Markov Chain). Let G = (V, E) be a graph and let π be a probability measure on V. The optimal spectral gap is defined as γ⋆_π(G) := sup{γ_P : P ∈ M(G, π)}. The optimal relaxation time is 1/γ⋆_π(G). We write γ⋆(G), omitting the π, when π = U_V is uniform. A transition matrix P is fast mixing if 1/γ_P is polylogarithmic in |V|, asymptotically. Analogously, a graph G admits a fast mixing chain if 1/γ⋆(G) is polylogarithmic in |V|, asymptotically.
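For intuition, the optimisation can be carried out by brute force on tiny instances. The sketch below is our own illustration (the grid-search approach is ours, not from the paper): it parametrises the lazy chains on the 3-vertex path that are reversible w.r.t. the uniform distribution, i.e. symmetric, by their two edge probabilities (p, q), and maximises the spectral gap. The optimum 1/4 is attained at p = q = 1/4, half the unrestricted (non-lazy) optimum 1/2 for this path, in line with the factor-1/2 cost of laziness noted above.

```python
import numpy as np

# Lazy, symmetric (hence uniform-reversible) chains on the path 0 - 1 - 2:
#   P(0,1) = p,  P(1,2) = q,  diagonal entries fill the rest.
# Laziness requires p, q <= 1/2 and p + q <= 1/2 (for the middle vertex).
def gap(p, q):
    P = np.array([[1 - p, p, 0.0],
                  [p, 1 - p - q, q],
                  [0.0, q, 1 - q]])
    return 1 - np.sort(np.linalg.eigvalsh(P))[-2]

grid = [i / 100 for i in range(51)]  # step 0.01 on [0, 0.5]
best_gamma, best_pq = max(
    (gap(p, q), (p, q))
    for p in grid for q in grid
    if p + q <= 0.5
)
print(best_gamma, best_pq)  # approximately 0.25, attained at p = q = 0.25
```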
Previous work has been mainly focussed on finding useful formulations of the problem or on solving particular cases; see §1.4 for further details. The primary aim of our work, instead, is twofold: (i) to control the optimal spectral gap in terms of geometric barriers in the graph; (ii) to find ways to overcome these geometric barriers by slightly relaxing the FMMC problem. Finally, we centre our attention on the case where π = U_V is the uniform distribution on V. This case was also the main focus of the original series of papers studying the FMMC problem.

Main Results
This article includes multiple avenues of study, all on the theme of finding fast mixing Markov chains. We introduce these and the main theorems that we prove in the following subsections.

A Characterisation of Fast Mixing on Graphs
We are looking for some natural statistic of the graph G which characterises fast mixing: we desire necessary and sufficient conditions for 1/γ ⋆ (G) to be 'small', namely polylogarithmic in |V |.
How well-connected a graph is should, intuitively, influence how fast a chain on the graph can mix. Thus, we would like to understand what kind of connectivity measure best characterises fast mixing. A natural candidate is the edge conductance Φ⋆ of a graph, which is defined as follows: Φ(S) := |E(S, S^c)| / vol(S) for ∅ ≠ S ⊆ V with vol(S) ≤ ½ vol(V), where vol(S) := Σ_{v∈S} deg v, and Φ⋆(G) := min_S Φ(S). It is well-known that the edge conductance Φ⋆ characterises the spectral gap of the lazy random walk (abbreviated RW) P_RW on G via the discrete Cheeger inequality, discovered in [JS89a, LS88]: ½ Φ⋆(G)² ≤ γ_{P_RW} ≤ 2 Φ⋆(G). The lazy RW on a graph, however, does not have uniform equilibrium distribution, unless the graph is regular. For this reason, it is natural to consider the uniform, or maximum degree, RW P_U, which is defined by adding the appropriate number of self-loops to each vertex so that the graph becomes regular. A simple calculation with the Dirichlet characterisation gives γ_{P_U} ≳ γ_{P_RW} / d_max; see [BDX04, §7.2] for details. Applying this along with the discrete Cheeger inequality gives γ⋆(G) ≥ γ_{P_U} ≳ Φ⋆(G)² / d_max. Fast mixing for low-degree graphs is thus characterised by the edge conductance Φ⋆(G): (i) G admits a fast mixing chain if and only if P_U is fast mixing; (ii) this holds if and only if 1/Φ⋆(G) is polylogarithmic in |V|. Such a simple characterisation does not hold if d_max is large. This may be slightly counterintuitive at first: adding edges can only increase the optimum γ⋆; but the lower bound above gets worse as d_max increases. A striking example is given by taking two cliques on n vertices and connecting them by a perfect matching; see Figure 3 in §1.3. It is a regular graph with Φ⋆ ≍ 1/n, but, as we will see later, it has γ⋆ ≍ 1. Informally, γ⋆ ≍ 1 because we can replace the two cliques with two bounded degree expander graphs without overly damaging its connectivity properties. This shows that edge conductance is not the correct conductance measure for the FMMC problem.
This prompts us to consider an alternative notion of connectivity: the vertex conductance Ψ⋆. It measures how well connected a set is by comparing the number of vertices in the boundary with its size. Contrastingly, edge conductance compares the number of edges in the boundary with the total number of edges inside the set. Formally,
Ψ(S) := |∂S| / |S| for ∅ ≠ S ⊆ V with |S| ≤ ½ |V|, and Ψ⋆(G) := min_S Ψ(S),
where ∂S is the vertex boundary of S ⊆ V: ∂S := {v ∈ V \ S : {u, v} ∈ E for some u ∈ S}. The example above in which two equisized cliques are connected by a perfect matching has vertex conductance Ψ⋆ ≍ 1. This agrees with our claimed optimal spectral gap γ⋆ ≍ 1.
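The two conductance measures can disagree dramatically. The brute-force sketch below is our own illustration: it computes both quantities for two 6-cliques joined by a perfect matching, giving Φ⋆ = 1/6 (of order 1/n for clique size n) but Ψ⋆ = 1.

```python
from itertools import combinations

# Two cliques A = {0..5}, B = {6..11} joined by the perfect matching i <-> i+6.
m = 6
V = list(range(2 * m))
E = {frozenset(e) for e in combinations(range(m), 2)}
E |= {frozenset(e) for e in combinations(range(m, 2 * m), 2)}
E |= {frozenset((i, i + m)) for i in range(m)}

adj = {v: set() for v in V}
for e in E:
    u, v = tuple(e)
    adj[u].add(v)
    adj[v].add(u)
deg = {v: len(adj[v]) for v in V}  # the graph is 6-regular

def boundary(S):
    return {v for v in V if v not in S and adj[v] & S}

def cut(S):
    return sum(1 for e in E if len(e & S) == 1)

n = len(V)
subsets = [set(S) for k in range(1, n // 2 + 1) for S in combinations(V, k)]
psi_star = min(len(boundary(S)) / len(S) for S in subsets)
# By regularity, vol(S) <= vol(V)/2 is the same as |S| <= |V|/2 here.
phi_star = min(cut(S) / sum(deg[v] for v in S) for S in subsets)
print(phi_star, psi_star)
```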
Vertex conductance has been used to provide upper bounds on the time to spread a rumour in a graph and on the hitting times of RWs, by Giakkoupis [Gia14] and Chandra et al. [Cha+96], respectively, amongst others. Roch [Roc05, Proposition 2] showed that vertex conductance represents a fundamental barrier to fast mixing: γ⋆(G) ≲ Ψ⋆(G). This can be seen directly via a simple calculation comparing the edge conductance of any reweighing of G, for which the RW on this weighted graph has uniform equilibrium distribution, with the vertex conductance.
The edge and vertex conductances are comparable for low-degree graphs: Φ⋆(G)/d_max ≲ Ψ⋆(G) ≲ d_max Φ⋆(G). Thus, the fact that the edge conductance Φ⋆(G) characterises fast mixing for low-degree graphs means that the same holds for the vertex conductance Ψ⋆(G). We remove this d_max factor, at the cost of a log |V| factor, thus showing that vertex conductance characterises the existence of a fast mixing chain for any graph. The graph of Figure 3 in §1.3 shows this does not hold for the edge conductance.
Theorem A (Characterisation of Fast Mixing). Let G = (V, E) be a finite graph. Then γ⋆(G) satisfies
Ψ⋆(G)² / log |V| ≲ γ⋆(G) ≲ Ψ⋆(G).
Thus, vertex conductance characterises fast mixing for any graph.
The quadratic dependence on the vertex conductance in the lower bound is needed for graphs such as the cycle. This has optimal spectral gap γ⋆ ≍ 1/n² and vertex conductance Ψ⋆ ≍ 1/n; see [BDSX06]. We are not aware of a graph for which the log |V| factor is needed, but we have reasons to believe that such a factor, or at least a factor log d_max, is necessary. We elaborate.
Louis, Raghavendra and Vempala [LRV13] have essentially shown the following: under the so-called Small Set Expansion Conjecture of Raghavendra and Steurer [RS10], for any ε > 0, there is no polynomial-time algorithm that can distinguish between Ψ⋆(G) ≤ ε and Ψ⋆(G) ≳ √(ε log d_max) for any graph G = (V, E). Since the optimal spectral gap γ⋆(G) can be computed in polynomial time, getting rid of the logarithmic factor altogether in Theorem A would violate the Small Set Expansion Conjecture. We leave open the problem of reducing the factor log |V| to log d_max.
One of the most interesting aspects of the proof of Theorem A, given in §2, is that it does not directly relate the vertex conductance to the spectral gap. Rather, it relates a variational characterisation of the optimal spectral gap, due to Roch [Roc05, Proposition 1], to a new connectivity measure for graphs which we introduce. We term it matching conductance and denote it Υ⋆. It is defined similarly to vertex conductance, but it replaces the size of the vertex boundary of a set S in the numerator with the size of a maximum matching between S and S^c in E. A formal definition is given in Definition 2.1. It can be viewed as a measure of fault tolerance of a graph: a graph has small matching conductance if and only if we can remove a few vertices of the graph and split the graph into two large, disconnected subsets.
It turns out the matching conductance of a graph is essentially equivalent to its vertex conductance: Υ⋆(G) ≍ Ψ⋆(G), uniformly over all graphs G; see Proposition 2.2. A specific set S of vertices, however, can have matching conductance Υ(S) much smaller than its vertex conductance Ψ(S). This fact makes using matching, rather than vertex, conductance essential in our proof.
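The gap between Υ(S) and Ψ(S) for a specific set can already be seen on the star graph. The sketch below is our own illustration (Kuhn's augmenting-path algorithm computes the maximum matching): taking S to be just the centre, its vertex boundary is all n leaves, yet any matching between S and S^c uses at most one edge.

```python
# Star graph: centre 0, leaves 1..8.
n_leaves = 8
adj = {0: set(range(1, n_leaves + 1)), **{v: {0} for v in range(1, n_leaves + 1)}}

def vertex_boundary(S):
    return {v for v in adj if v not in S and adj[v] & S}

def max_matching(S):
    """Size of a maximum matching between S and its complement (Kuhn's algorithm)."""
    match = {}  # vertex in S^c -> matched vertex in S

    def augment(u, seen):
        for v in adj[u] - S:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in S)

S = {0}  # just the centre
psi = len(vertex_boundary(S)) / len(S)  # = n_leaves: huge vertex boundary
upsilon = max_matching(S) / len(S)      # = 1: every boundary edge meets the centre
print(psi, upsilon)
```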
Finally, it is natural to ask for a statement analogous to Theorem A, but for general π, rather than specific to π = U_V. This is not immediate, for slightly technical reasons. We elaborate in §6.

B 'Almost Mixing'
We introduced the FMMC problem to formalise our desire to construct a fast mixing Markov chain. Theorem A, however, implies there are certain graphs, namely those with small vertex conductance, for which this desire cannot be attained. It is then natural to ask if we can slightly relax the constraints we imposed to overcome this fundamental obstacle.
We answer this question affirmatively: we show that if the Markov chain is not required to have equilibrium distribution exactly uniform, but only sufficiently close to uniform, then all graphs with small diameter admit a fast-mixing Markov chain. Before formalising this claim, we gain some intuition by considering the following simple example, known as the dumbbell graph: D⋆ consists of two cliques on n vertices joined via a central vertex v⋆, which is adjacent to a vertex v− in one clique and a vertex v+ in the other.
Since D⋆ has vertex conductance of order 1/n, Theorem A implies that no chain with uniform equilibrium distribution can have relaxation time of smaller order than n.
In light of the above, we propose the following RW, described by a weighting on the edges of D⋆: • give all edges which do not include any of {v+, v−, v⋆} unit weight; • give the remaining edges weight εn. The RW takes steps with distribution proportional to the edge weights. It is straightforward to check that the equilibrium distribution induced is at most ε far from uniform in TV.
The fundamental barrier to fast mixing in D⋆ is that any chain with uniform equilibrium gets stuck in one side of the graph for a time at least order n in expectation. Up-weighting the edges through the bottleneck means that the new RW transitions between the two sides with expected time order 1/ε. This leads to a relaxation time order 1/ε. This all comes at a cost of having invariant distribution ε far from uniform. Further details are given in §1.3.
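These claims are easy to verify numerically. The sketch below is our own illustration (the clique size and ε are arbitrary choices): it builds the weighting above on a dumbbell with two 10-cliques, forms the lazy RW whose steps are proportional to edge weights, and checks that the equilibrium distribution is within ε of uniform in TV.

```python
import numpy as np
from itertools import combinations

m, eps = 10, 0.05
n = 2 * m + 1                        # cliques A = 0..9, B = 10..19, centre 20
v_minus, v_plus, v_star = 0, 10, 20
special = {v_minus, v_plus, v_star}

edges = (list(combinations(range(m), 2))
         + list(combinations(range(m, 2 * m), 2))
         + [(v_minus, v_star), (v_plus, v_star)])

W = np.zeros((n, n))
for u, v in edges:
    w = eps * n if special & {u, v} else 1.0  # up-weight the bottleneck edges
    W[u, v] = W[v, u] = w

weight = W.sum(axis=1)
pi = weight / weight.sum()                       # equilibrium of the weighted RW
P = 0.5 * np.eye(n) + 0.5 * W / weight[:, None]  # lazy weighted RW

tv = 0.5 * np.abs(pi - 1 / n).sum()              # distance to uniform
gap = 1 - np.sort(np.linalg.eigvalsh(
    np.diag(np.sqrt(pi)) @ P @ np.diag(1 / np.sqrt(pi))))[-2]
print(tv, gap)  # tv is at most eps
```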
We are able to generalise this construction to general graphs and general equilibrium distributions. The fast-mixing Markov chain we design is a RW on a carefully weighted breadth-first search (BFS) spanning tree, supplemented with self-loops. We establish an upper bound of 12(diam G)²/ε on the relaxation time when we allow the RW to have equilibrium distribution ε-far from uniform.
We now define precisely our set-up and formally state our result. Let D(V) denote the set of probability distributions on V and, given π ∈ D(V) and ε ∈ (0, 1), let D(π, ε) := {π′ ∈ D(V) : ‖π − π′‖_TV ≤ ε} denote the set of distributions ε-close to π in total variation.
We actually establish a stronger result than the one described above. The above description says that there exists some reversible chain which is fast mixing: there exist π′ ∈ D(π, ε) and P ∈ M(G, π′) such that γ_P ≳ ε/(diam G)². We prove that any reversible chain can be perturbed into a fast mixing chain: for all π ∈ D(V) and all P ∈ M(G, π), there exist π′ ∈ D(π, ε) and Q ∈ M(G, π′) with γ_Q ≳ ε/(diam G)².
Theorem B (Almost Mixing). Let G = (V, E) be a finite, connected graph and π ∈ D(V). Let ε ∈ (0, 1) and P ∈ M(G, π). There exist π′ ∈ D(π, ε) and Q ∈ M(G, π′) such that γ_Q ≥ ε/(12 (diam G)²). A consequence of this spectral gap estimate is a mixing time of order at most ε^{−1} (diam G)² log |V|. The matrix Q is obtained as a perturbation of P. Moreover, this perturbation is actually independent of P: we construct a weighted BFS tree, as described above, and 'superimpose' it with the weights corresponding to P. A more refined statement, making this independence of the perturbation explicit, is given in Theorem 3.1.
This diameter bound is a substantial improvement over the vertex conductance lower bound on the optimal spectral gap from Theorem A. It comes at the cost of having invariant distribution only 'almost' uniform, hence the name "almost mixing". We show in the next section that passing to the continuous-time setting allows this diameter-squared bound to be maintained while having exactly uniform invariant distribution. We use fundamentally the same chain: it is a RW on the same weighted BFS tree, where the weights now represent the rate at which an edge is crossed.

C Continuous-Time Markov Chains
The discussion and results above all concern discrete-time Markov chains. It is also natural to study the question of the fastest mixing Markov chain in continuous-time. We restrict to the case where the target distribution π = U_V. The question of the FMMC in continuous-time was originally raised by Sun, Boyd, Xiao and Diaconis [SBXD06] and has been studied subsequently by Sammer [Sam05] and Montenegro and Tetali [MT06]. We review their work in §1.4.
A continuous-time Markov chain on a graph G = (V, E) with uniform equilibrium distribution can be represented by the RW on a weighted graph (G, q), where q : E → R + , as follows.
Definition C.1 (RW on Weighted Graph). Let G = (V, E) be a graph and q : E → R+ a collection of non-negative weights. The RW on (G, q) jumps from x to y at rate q({x, y}) for x, y ∈ V with {x, y} ∈ E. The Laplacian L_q ∈ R^{V×V} of the weighting q is defined by
(L_q)_{x,y} := 1{x = y} Σ_{z∈V:{x,z}∈E} q({x, z}) − 1{{x, y} ∈ E} q({x, y}) for x, y ∈ V.
The spectral gap, which we denote γ_q, is given by the second smallest eigenvalue of L_q.
The spectral gap γ_q is intrinsically related to the mixing time τ_q, as in discrete-time: γ_q^{−1} ≲ τ_q ≲ γ_q^{−1} log |V|. It is immediate to see that if all the rates are multiplied by some factor, then the spectral gap changes by that factor too: γ_{cq} = c γ_q for any c > 0. We must therefore impose some normalisation.
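This scaling is easy to verify numerically. A minimal sketch of ours, using the convention in which L_q is positive semidefinite, so that the spectral gap is its second smallest eigenvalue:

```python
import numpy as np

def laplacian(n, q):
    """Weighted Laplacian of the path 0-1-...-(n-1), edge weight q[i] on {i, i+1}."""
    L = np.zeros((n, n))
    for i, w in enumerate(q):
        L[i, i] += w
        L[i + 1, i + 1] += w
        L[i, i + 1] -= w
        L[i + 1, i] -= w
    return L

n = 6
q = np.array([1.0, 2.0, 0.5, 3.0, 1.5])        # arbitrary positive rates
gap = lambda L: np.sort(np.linalg.eigvalsh(L))[1]  # second smallest eigenvalue

g1 = gap(laplacian(n, q))
g3 = gap(laplacian(n, 3 * q))
print(g1, g3 / g1)  # multiplying all rates by 3 scales the gap by 3
```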
Definition C.2 (Normalisation). The rate at which the walk leaves the vertex x is given by q(x) := Σ_{y∈V} 1{{x, y} ∈ E} q({x, y}) for x ∈ V.
We call max_{x∈V} q(x) the maximal leave-rate and |V|^{−1} Σ_{x∈V} q(x) the average leave-rate.
A natural normalisation is to require a maximal leave-rate of 1. It can be seen, however, that this reduces to the discrete-time case via exponential-1 waiting times. We impose instead an average leave-rate of 1, or, equivalently, q(E) ≤ ½|V|. This allows a few vertices to have abnormally large leave-rate, but rarely enough that the average is not significantly affected. This will allow the RW to exit small 'bottlenecks' quickly, where the discrete-time walk would remain stuck for significant time. This average leave-rate normalisation was considered in [SBXD06, Sam05, MT06]. Montenegro and Tetali [MT06, §7.1] describe this normalisation as "rather powerful [compared with discrete-time]" due to the fact that the maximal leave-rate may be very large.
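Since every edge contributes its rate to exactly two leave-rates, an average leave-rate of 1 is the same as total edge weight q(E) = ½|V|, and any weighting can be normalised accordingly. A minimal sketch of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
edges = [(i, (i + 1) % n) for i in range(n)]  # a cycle, for concreteness
q = rng.uniform(0.1, 5.0, size=len(edges))    # arbitrary positive rates

q *= (n / 2) / q.sum()                        # normalise so that q(E) = |V| / 2

leave = np.zeros(n)                           # leave-rate q(x) of each vertex
for (u, v), w in zip(edges, q):
    leave[u] += w
    leave[v] += w
print(leave.mean())  # the average leave-rate is exactly 1
```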
The main result of this section states that, for any graph, it is possible to construct a weighting with average leave-rate of 1 such that its spectral gap depends only on the diameter of the graph.
Theorem C (Continuous-Time). Let G = (V, E) be a graph. There exists a weighting w : E → R+ with average leave-rate at most 1 such that the RW on (G, w) satisfies γ_w ≳ (diam G)^{−2}.
An upper bound on 1/γ⋆(G) of order (diam G)² is required for graphs with diffusive behaviour, such as the cycle or the path. A lower bound of order diam G, however, is not necessary, in general. This is in stark contrast to the discrete-time case. Indeed, in continuous-time, a few edges can be up-weighted significantly with little effect on the average. So if the 'typical' distance is much less than the maximal, a relaxation time of smaller order than the diameter may well be achievable.
A highly related theorem is given in Sammer's PhD thesis [Sam05, §3.3]; see also [MT06, §7.1]. They bound the optimal spectral gap γ⋆(G) in terms of the spread constant c(G), introduced in [ABS98], which is the maximal variance of a function that is Lipschitz on the edges of G: c(G) := max{Var_{U_V}(f) : f : V → R with |f(u) − f(v)| ≤ 1 for all {u, v} ∈ E}. The spread constant c(G) can be upper bounded by ¼(diam G)², but there are examples for which this is far from tight. Still, if a very general, easy to calculate, bound is desired, then we do not know of a better bound than c(G) ≲ (diam G)², which reduces to approximately our bound. The spread constant c(G) can also be lower bounded by a type of 'typical' distance; see [MT06, Corollary 7.2].
In contrast with our result, however, theirs is non-constructive, relying on the famous, but non-constructive, Johnson-Lindenstrauss lemma [JL84]. Montenegro and Tetali [MT06, Remark 7.3] comment on the difficulty of explicitly constructing such a process: "It might be challenging and in general impractical a task to actually find such a process explicitly." Our construction is explicit and can actually be constructed in time linear in the size of the graph.
Montenegro and Tetali [MT06, Remark 7.3] also comment on the existence of such a fast mixing Markov chain in continuous-time: "The key [to the existence of such a chain] ... might be that we were implicitly providing the continuous-time chain with more power ... by not requiring the rates in each row to sum to 1, but only the [average rate to be 1]." This significant additional power allows bottlenecks to be traversed quickly while maintaining an average leave-rate of 1. Indeed, the weighting w that we construct has max_{x∈V} w(x) ≍ n/diam G, which may be far larger than 1.
This really emphasises the strength of our 'almost mixing' result, Theorem B: the chain there is in discrete-time, or, equivalently, has max_{x∈V} q(x) ≤ 1, but still attains a spectral gap only order ε smaller than that attained in the continuous-time case of Theorem C. Of course, the cost is that the equilibrium distribution π′ is only ε-close to uniform in total variation, rather than exactly uniform. We expect that our continuous-time analysis can be adjusted to handle general equilibrium distributions π with relatively little changed. We have not checked the details, however. We focussed on the uniform case because it is, arguably, the most important and the cleanest to present.

D Time-Inhomogeneous Markov Chains
Our attention has been so far restricted to time-homogeneous Markov chains, in which the transition probabilities do not change over time and are described by a single transition matrix P. A time-inhomogeneous Markov chain, instead, is described by a sequence (P_t)_{t∈N} of transition matrices and an initial law μ_0: the time-t law μ_t := P(X_t ∈ ·) is given by μ_t = μ_0 P_1 P_2 ··· P_t for t ∈ N. A time-homogeneous chain has P_t = P for all t ∈ N, for some P. Nevertheless, we close our section of results by showing that time-inhomogeneous chains can lead to improvements over time-homogeneous ones.
Theorem D. Let G = (V, E) be a connected graph and let π ∈ D(V). There exists a time-inhomogeneous Markov chain on G that perfectly mixes to π after 2 diam G steps: μ_{2 diam G} = π for any initial law μ_0.
It is easy to see that diam G is a lower bound on the fastest 'perfectly mixing' chain. If one only requires mixing up to a small constant in TV, then ½ diam G is still a lower bound. Thus the bound of 2 diam G above is tight up to a factor of at most 4.
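To make the idea concrete, here is one explicit such construction, a sketch of ours for the path with uniform π (not necessarily the construction used in the proof): a 'gather' phase of diam G deterministic steps pushes all mass to one end, then a 'spread' phase of diam G steps redistributes it exactly uniformly.

```python
import numpy as np

n = 6                      # path 0 - 1 - ... - 5, so diam G = n - 1
D = n - 1

gather = np.zeros((n, n))  # every vertex steps towards 0; vertex 0 holds
gather[0, 0] = 1.0
for u in range(1, n):
    gather[u, u - 1] = 1.0

def spread(s):
    """Step s of the spread phase: vertex s-1 keeps mass 1/n, passes the rest on."""
    M = np.eye(n)
    M[s - 1, s - 1] = 1 / (n - s + 1)
    M[s - 1, s] = (n - s) / (n - s + 1)
    return M

steps = [gather] * D + [spread(s) for s in range(1, D + 1)]  # 2 * diam G matrices

laws = []
for x in range(n):         # from ANY starting vertex, the final law is uniform
    mu = np.eye(n)[x]
    for M in steps:
        mu = mu @ M
    laws.append(mu)
print(np.round(laws, 12))
```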

Notable Examples
We discuss briefly a few examples which are of particular interest. We always consider the uniform distribution, i.e. π = U_V, unless specified to the contrary.

Dumbbell Graph
Let D⋆ be the dumbbell graph with bells H± of size n. The bells H± need not be cliques K_n; they can be arbitrary connected graphs on n vertices. See Figure 1 for an illustration when H± = K_n.
Conductance Measures. It is straightforward to see that the set with the worst vertex conductance is given by one side of the dumbbell graph: S = H− or S = H+. This shows that Ψ⋆(D⋆) ≍ 1/n. This implies that the optimal relaxation time 1/γ⋆(D⋆) is at least order n.
It is easy to find a chain attaining the correct order-n^{−1} spectral gap when H± = K_n, i.e. each bell is a complete graph on n vertices. Define a weighting as follows. Each vertex gets the same total weight.
• Place unit weights on all edges which do not include the centre v ⋆ .
• Give the edges {v−, v⋆} and {v+, v⋆} weight n − 1.
The probability of stepping to v⋆ from either of v± is then 1/2. This gives an order-n hitting time of one clique from the other. This implies that our chain has relaxation time order n. Contrast this with the suboptimal order n² hitting and relaxation time for the uniform RW.
Almost Mixing. If the graphs H± have polylogarithmic diameter then Theorem B provides a chain with polylogarithmic relaxation time. This is a substantial improvement from linear. This is true regardless of the particular structure of H±: it just needs log diam H± ≲ log log n. For example, if the H± are expanders, then diam H± ≍ log n and the relaxation time is polylogarithmic.
We now explain very roughly how to construct this chain for this dumbbell example D⋆. The general idea is to up-weight edges towards the central vertex v⋆, which is the bottleneck. We do this in such a way as to make the distance to v⋆ behave somewhat like an unbiased RW on Z. This way it should take time order (diam H±)² to move from one side of the graph to the other. It is natural to try to achieve this bias by rooting a spanning tree T at v⋆ and then up-weighting the edges towards the root. This leads to a worst-case hitting time for the root v⋆ of order (diam T)². We choose T to be a breadth-first search (BFS) tree since this has diam T ≤ 2 diam G.
We give a more detailed overview in §3.2. We specifically chose the bottleneck vertex v⋆ to be the root of T above. It turns out that actually any choice of root suffices. The reader may find this surprising at first; we did. More generally, suppose that o ∈ V is any vertex and a BFS tree is rooted at o; we up-weight the edges towards o. Paths from v ≠ o to o naturally go through bottlenecks. This automatically up-weights edges in bottlenecks.

Binary Tree
Let T = (V, E) be the complete binary tree on n = 2^N − 1 vertices with depth N ≈ log_2 n.
Conductance Measures. It is straightforward to see that the set with the worst vertex conductance is given by one side of the tree: a child of the root together with all of its descendants. This gives Ψ⋆(T) ≍ 1/n. This implies that the optimal relaxation time 1/γ⋆(T) satisfies n ≲ 1/γ⋆(T) ≲ n² log n.
T has bounded degree, so the maximum degree chain attains the correct order relaxation time. This chain is just the simple RW, but with extra laziness at the leaves to make the invariant distribution uniform. The correct relaxation time is order n; see [Spi19] for details.
Almost Mixing. It is very natural to root the BFS tree at the root o of T. The up-weighting will help pull the walk up the tree towards the root, allowing it to spread across the width of the tree more easily. The weight given to the edge from x ≠ o to its parent is given by the number of vertices in the subtree rooted at x. Precisely, if x is at distance d ≥ 1 from the root, then the weight is 2^{N−d} − 1. The up-weighting means that the distance from the root behaves roughly as an unbiased RW. The hitting time of the root is then order N² ≍ (log n)². Once the RW hits the root, which branch it takes after is uniformly distributed. Thus, once it hits the leaves again, it is uniform on the leaves and so approximately mixed. The total time for this is order N² ≍ (log n)².
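The near-unbiasedness is exact up to an additive 1: at an internal vertex at depth d, the upward weight is 2^{N−d} − 1 while the two downward weights sum to 2^{N−d} − 2. A quick check of ours, using heap indexing (the parent of vertex i is i // 2):

```python
N = 6
n = 2 ** N - 1                      # complete binary tree, vertices 1..n

def depth(i):
    return i.bit_length() - 1       # root 1 has depth 0

def subtree(i):
    return 2 ** (N - depth(i)) - 1  # number of vertices in the subtree at i

for i in range(2, n + 1):           # every non-root vertex
    if 2 * i + 1 <= n:              # internal: i has both children
        up = subtree(i)                              # weight of the edge to the parent
        down = subtree(2 * i) + subtree(2 * i + 1)   # total weight going downwards
        assert up - down == 1       # up- and down-weights differ by exactly 1
print("distance to the root is a nearly unbiased walk")
```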
Our method does not know, however, that there is anything special about the root of the binary tree. Any vertex can be picked as the root of the BFS tree. Viewed from this vertex, T is like a complete binary tree of some depth, but with some of the branches pruned. The depth is at most 2N. The same ideas give relaxation and mixing time order N² ≍ (log n)².

Star Graph
Let G⋆ = (V, E) be the star graph with centre v⋆ and n leaves. See Figure 2 for an illustration.
Conductance Measures. There is a simple dichotomy for the vertex boundary of a set ∅ ≠ S ⊊ V in the star graph: ∂S = {v⋆} if v⋆ ∉ S, while ∂S = S^c if v⋆ ∈ S. Taking S to be half of the leaves gives Ψ⋆(G⋆) ≍ 1/n. It is straightforward to see that this gives the correct order: place unit weights on all the edges and weight-(n−1) self-loops on all the non-central vertices; it is easy to see that this chain has mixing time order n.
Another measure of vertex conductance replaces the ∂S with the symmetric union ∂_sym S := ∂S ∪ ∂S^c; denote the vertex conductance with ∂_sym by Ψ_sym. Again, there is a simple dichotomy for ∅ ≠ S ⊊ V: ∂_sym S = S ∪ {v⋆} if v⋆ ∉ S, while ∂_sym S = S^c ∪ {v⋆} if v⋆ ∈ S. In either case |∂_sym S| ≥ min(|S|, |S^c|), so Ψ⋆_sym(G⋆) ≍ 1. The difference between the two measures is that if the boundary of S is small, then that of S^c is large. The use of ∂S, as opposed to ∂_sym S, is thus important in Theorem A.
It is well-known that the spectral gap γ is characterised by a variational form. The relationship between the spectral gap γ and the edge conductance Φ⋆ of the simple RW is given by the well-known Cheeger inequalities. A related variational form was introduced by Bobkov, Houdré and Tetali [BHT00], which they denote λ∞. They establish various Cheeger-type relationships between λ∞ and the vertex conductance Ψ⋆_sym. In particular, they show, for any graph, that (Ψ⋆_sym)² ≲ λ∞ ≲ Ψ⋆_sym. In light of [BHT00] and our Theorem A, it is natural to wonder whether λ∞ can be directly related to the optimal spectral gap γ⋆, without a d_max factor. The example of the star graph G⋆ shows that this is not possible: Ψ⋆_sym ≍ 1 and thus λ∞ ≍ 1, but Ψ⋆ ≍ 1/n and so γ⋆ ≲ 1/n. This shows that λ∞ is really not the correct parameter for the FMMC problem.
Almost Mixing. Obtaining an 'almost mixing' chain with order 1/ε mixing time is simple: place ε weights on all the edges and weight-1 self-loops on all non-central vertices. The total weight of the central vertex is εn. The remainder of the weight is spread uniformly. Thus the distribution π induced by this weighting is in D(U_V, ε). It is easy to see that the mixing time is order 1/ε.
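A quick numerical check of this weighting (our sketch; n and ε are arbitrary choices): the induced equilibrium gives the centre mass roughly ε instead of roughly 1/n, and its TV distance to uniform is at most ε.

```python
import numpy as np

n, eps = 100, 0.1              # n leaves; the centre has index 0
weight = np.empty(n + 1)
weight[0] = n * eps            # centre: n incident edges of weight eps
weight[1:] = 1 + eps           # each leaf: self-loop 1 plus one eps edge

pi = weight / weight.sum()     # equilibrium of the weighted RW
tv = 0.5 * np.abs(pi - 1 / (n + 1)).sum()  # distance to uniform
print(pi[0], tv)  # centre mass is eps / (1 + 2 eps); tv is at most eps
```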
Two Cliques Connected by a Perfect Matching

Let M = (V, E) be the graph given by connecting two cliques, each on n vertices, via a perfect matching; see Figure 3 for an illustration. Any set S with |S| ≤ ½|V| has vertex boundary of size order |S|; from this, Ψ⋆(M) ≍ 1 follows easily. It is not too hard to show that the optimal spectral gap γ⋆(M) is of the same order: just replace the two cliques by two 3-regular expanders and leave the perfect matching in place. The lazy simple RW on this edge-induced subgraph of M has order-1 spectral gap.
Almost Mixing. The optimal spectral gap is order 1, so there is no need for 'almost mixing'.

Complete Graphs Connected via a 'Source'
Let Σ = (V, E) consist of two cliques H± on n vertices, where a single 'source' vertex v_0 ∈ H− is joined by k edges to H+. See Figure 4 for an illustration.
Conductance Measures. One may think at first that this 'source' of k edges, rather than just a single edge, gives rise to faster mixing; indeed, Ψ(H−) = k/n. However, removing the source from the set gives Ψ(H− \ v_0) = 1/(n − 1). So in fact the vertex conductance of the source graph Σ is almost the same as that of the dumbbell graph D⋆.
The edge conductance of the graph does improve with k: Φ(Σ) ≍ k/n². But this is always at most 1/n. So the improvement from k is not enough to outweigh the fact that the uniform RW has spectral gap far from the optimal, unless k ≍ n.
An optimal spectral gap can be achieved by choosing an arbitrary 3-regular expander as a subgraph of each of the cliques and connecting these via a single edge. The uniform RW on this sparse subgraph then has spectral gap order 1/n.
Almost Mixing. We can use exactly the same construction as in the dumbbell graph D⋆, picking an arbitrary edge amongst the k connecting edges.

Review of Previous Work
We now review previous related work. The FMMC question was originally introduced by Boyd, Diaconis and Xiao [BDX04], which was the first in a series of articles [BDX04, BDSX06, SBXD06, BDPX05, BDPX09] by those authors along with Parrilo and Sun. It has subsequently been studied by [Roc05, Sam05, MT06, AS11, JJ11, FK13, CA15]. We roughly collect these by theme.

Finding Useful Formulations
Boyd, Diaconis and Xiao [BDX04]. This original work introduces the FMMC question and then primarily studies equivalent formulations. In our view, the most important contribution of that paper, beyond the introduction of the very interesting FMMC question itself, is the formulation of the FMMC optimisation problem as a semi-definite program (SDP). This allows the computation of an optimal solution in polynomial time via standard convex optimisation techniques. The SDP leads naturally to a dual formulation, which found use in subsequent work [MT06, Roc05, Sam05].
Roch [Roc05]. Roch takes the dual formulation of [BDSX06] much further, writing the optimal spectral gap γ⋆π(G) as a minimisation of the variance of a certain constrained graph embedding. To quote him, "Informally, to obtain [a lower bound on the optimal spectral gap] we seek to embed the graph into R^|V| so as to 'spread' the nodes as much as possible under constraints over the distances separating nodes connected by edges." He re-derives the upper bound γ⋆π(G) ≲ Ψπ(G) using this formulation. This shows that vertex conductance is a fundamental barrier to fast mixing; our result shows that vertex conductance is essentially the fundamental barrier to fast mixing.
Sun, Boyd, Xiao and Diaconis [SBXD06]. The paper [SBXD06] is of a similar flavour to [BDX04], but in the continuous-time set-up. We discuss it in detail in §1.4 below.

Special Cases and Particular Examples
Boyd, Diaconis, Sun and Xiao [BDSX06]. The special case of the path with uniform distribution is studied in the short note [BDSX06], as a follow-on from [BDX04]. They show that the 'uniform chain', i.e. the unbiased RW with ½-holding at the ends, has the largest spectral gap.
Boyd, Diaconis, Parrilo and Xiao [BDPX05, BDPX09]. The FMMC problem on graphs with rich symmetry properties is studied in [BDPX09]. They are able to solve various cases analytically: edge-transitive graphs, such as the cycle; Cartesian products of graphs, such as the two-dimensional torus and the hypercube; distance-transitive graphs, such as the Petersen, Hamming and Johnson graphs. They then use algebraic methods to study FMMC on orbit graphs, employing powerful representation-theory arguments developed in [BDPX05].
Cihan and Akar [CA15]. Many similar scenarios, such as edge-transitive graphs, are studied in [CA15]. The focus is on two SDP methods. They study the degree-biased and uniform equilibria.
Jafarizadeh and Jamalipour [JJ11]. Symmetric K-partite graphs and connections to sensor networks are considered in [JJ11]. They compare numerically with a Metropolis–Hastings algorithm.
Allison and Shader [AS11]. Graphs which are overlapping unions of two cliques are studied in [AS11]. Here there are two cliques, say of sizes r + s and r + t respectively, with s overlapping vertices. The FMMC problem is solved analytically for such graphs.
Fill and Kahn [FK13]. A rather different approach is taken in [FK13]. Their paper is focussed on comparison inequalities and majorisation of measures, which they use to analyse the FMMC problem. This use of majorisation allows them to study a distance, such as TV, separation or ℓ², rather than the spectral gap, which is only a proxy for the mixing time.

Continuous-Time Set-Up
Sun, Boyd, Xiao and Diaconis [SBXD06]. The study of the FMMC question in continuous time was initiated in [SBXD06]. The structure and goals of this paper are similar to those of [BDX04]. The primary contribution is a convex SDP formulation, as well as some dual formulations.
Recall that a normalisation on the weights was required: indeed, doubling all the weights doubles the spectral gap. We imposed an "average leave-rate of 1". A slightly more general 'weighted average' is considered in [SBXD06], where a number of physical interpretations of this normalisation are given.
Sammer [Sam05] and Montenegro and Tetali [MT06]. The FMMC problem is considered by Sammer [Sam05, §3.3]; it is referenced and discussed by Montenegro and Tetali [MT06, §7]. We discussed their work in detail immediately after Theorem C. We add a small caveat to that discussion.
Montenegro and Tetali [MT06, §7.1] claim to impose a scaling of q(V) ≤ 1 on their edge weightings q : E → R+; contrast this with our imposition of q(V) ≤ n. Their scaling immediately implies that the relaxation time is at least order n; this contradicts their theorem. There are a couple of other points where there seem to be issues with the scalings, in particular in the application of results from [SBXD06]. It may be possible to rectify these issues, but we have not checked carefully.

Vertex Conductance and the Optimal Spectral Gap
This section is devoted to the proof of Theorem A. In §2.1 we define the matching conductance of a graph, which plays a central role in the proof of Theorem A; we also show, in Proposition 2.2, that the matching and vertex conductances of a graph differ by at most a universal constant factor. §2.2 contains the necessary notation and preliminaries needed in the proof of Theorem A. In §2.3 we relate the optimal spectral gap of a graph to its matching conductance; this relation is formalised in Theorem 2.10. Notice that Theorem 2.10 together with Proposition 2.2 directly implies Theorem A.

The Matching Conductance of a Graph
A matching is a set of edges that do not share an endpoint. Given a set of (undirected) edges E together with a weight function w : E → R≥0, a maximum matching for E is a matching with maximum total weight (if E is the edge set of an unweighted graph, we take w to be identically one on E). We denote by ν(E) the weight of a maximum matching for E; that is,
$$\nu(E) := \max\Big\{ \sum_{e \in M} w(e) : M \subseteq E \text{ is a matching} \Big\}.$$
We can now define the matching conductance of a graph.
Definition 2.1. Let G = (V, E) be a graph and ∅ ≠ S ⊆ V. The matching conductance of S is defined as
$$\Upsilon(S) := \frac{\nu(E(S, S^c))}{|S|}.$$
The matching conductance of G is defined as
$$\Upsilon^\star(G) := \min_{S \subseteq V : 0 < |S| \le |V|/2} \Upsilon(S).$$
The next proposition relates matching and vertex conductance.
Proposition 2.2. Let G = (V, E) be a graph. Then it holds that
$$\Upsilon^\star(G) \le \Psi^\star(G) \le 4\, \Upsilon^\star(G).$$
Proof. The inequality Υ⋆(G) ≤ Ψ⋆(G) is obvious: for any S ⊆ V, ν(E(S, S^c)) must be smaller than the size of the vertex boundary of S, since each matching edge uses a distinct boundary vertex. Therefore Υ(S) ≤ Ψ(S) for any S ⊆ V, which yields the inequality. The proof of Ψ⋆(G) ≤ 4Υ⋆(G) is slightly more involved. In particular, it is not true that Ψ(S) ≤ 4Υ(S) for every S ⊆ V: Figure 4 provides an example of a graph with a set of small matching conductance but large vertex conductance. Nevertheless, the worst vertex conductance of a set in a graph is related to the matching conductance of the graph. To prove this, consider S with |S| ≤ |V|/2 and Υ(S) = ν(E(S, S^c))/|S|. We can assume Υ(S) ≤ 1/4, since otherwise Υ⋆(G) > 1/4 ≥ Ψ⋆(G)/4. Let M be a maximum matching for E(S, S^c), that is |M| = ν(E(S, S^c)), and let V(M) ⊆ V be the set of vertices adjacent to edges in M. Now consider the set T = S \ V(M); we claim T has small vertex conductance. To this end, consider ∂T. It holds that |∂T| ≤ |V(M)|: indeed, any u ∈ S^c \ V(M) cannot be in ∂T, since otherwise there would exist an edge between a vertex in S \ V(M) and a vertex in S^c \ V(M), and this would contradict the maximality of M. Therefore, since |V(M)| = 2|M| and |T| = |S| − |M|, we have that
$$\Psi(T) \le \frac{|\partial T|}{|T|} \le \frac{2|M|}{|S| - |M|} \le \frac{4|M|}{|S|} = 4\, \Upsilon(S),$$
where in the last inequality we have used the fact that |M| = Υ(S) · |S| ≤ |S|/4.
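The two conductance notions, and the relation between them, can be sanity-checked by brute force on a small graph. The sketch below (our own helper names; Ψ(S) is taken as |∂S|/|S| with ∂S the outer vertex boundary, and sets range over |S| ≤ |V|/2) does this for the 6-cycle.

```python
from itertools import combinations

def max_matching_size(edges):
    # brute force: size of the largest set of pairwise vertex-disjoint edges
    for k in range(len(edges), 0, -1):
        for cand in combinations(edges, k):
            endpoints = [x for e in cand for x in e]
            if len(endpoints) == len(set(endpoints)):
                return k
    return 0

def matching_conductance(edges, S):
    cut = [e for e in edges if (e[0] in S) != (e[1] in S)]
    return max_matching_size(cut) / len(S)

def vertex_conductance(edges, S):
    boundary = set()  # outer vertex boundary of S
    for u, v in edges:
        if u in S and v not in S:
            boundary.add(v)
        if v in S and u not in S:
            boundary.add(u)
    return len(boundary) / len(S)

n = 6
cycle = [(i, (i + 1) % n) for i in range(n)]
sets = [set(c) for k in range(1, n // 2 + 1) for c in combinations(range(n), k)]
# per-set inequality Upsilon(S) <= Psi(S), and the global factor-4 bound
assert all(matching_conductance(cycle, S) <= vertex_conductance(cycle, S) for S in sets)
ups = min(matching_conductance(cycle, S) for S in sets)
psi = min(vertex_conductance(cycle, S) for S in sets)
assert ups <= psi <= 4 * ups
```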

Definitions and Preliminaries
Given a set of vertices V, together with a set of edges E on V, a fractional matching is a function f : E → [0, 1] such that, for any v ∈ V, Σ_{e∋v} f(e) ≤ 1. Moreover, the fractional matching number of E, denoted by ν*(E), is the maximum total weight of a fractional matching for E:
$$\nu^*(E) := \max_f \sum_{e \in E} f(e)\, w(e),$$
where the maximisation is over valid fractional matchings.
Notice that ν*(E) is the solution of a linear program which is a convex relaxation of the one defining ν(E). As such, ν(E) ≤ ν*(E). A useful characterisation of ν*(E) is the following.
Proposition 2.3. The fractional matching number ν*(E) of E is equal to the minimum of the following linear program:
$$\min \sum_{v \in V} g(v) \quad \text{subject to} \quad g(u) + g(v) \ge w(\{u, v\}) \;\; \forall \{u, v\} \in E, \qquad g : V \to \mathbb{R}_{\ge 0}.$$
Proof. This simply follows from linear programming duality.
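On small instances one can compute ν*(E) directly. The sketch below (our own code) leans on the classical fact that the fractional matching polytope is half-integral, so an optimal fractional matching may be found among assignments f(e) ∈ {0, ½, 1}; on the triangle this gives ν* = 3/2 while ν = 1.

```python
from itertools import product

def fractional_matching_number(vertices, edges):
    # the fractional matching LP has a half-integral optimal solution,
    # so enumerating f(e) in {0, 1/2, 1} suffices on tiny instances
    best = 0.0
    for f in product((0.0, 0.5, 1.0), repeat=len(edges)):
        load = {v: 0.0 for v in vertices}
        for x, (u, v) in zip(f, edges):
            load[u] += x
            load[v] += x
        if all(l <= 1.0 for l in load.values()):  # vertex constraints
            best = max(best, sum(f))
    return best

triangle = [(0, 1), (1, 2), (0, 2)]
assert fractional_matching_number(range(3), triangle) == 1.5  # nu = 1 < nu* = 3/2
```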
With a slight abuse of notation, we will also use ν(G) and ν*(G) to denote the maximum (fractional) matching weight of the edge set of G.
Up until now we have considered only matchings in undirected graphs. For technical reasons, however, we will also need matchings in directed graphs. Given a set of directed edges −→E, a directed matching M ⊆ −→E is a set of directed edges such that any two distinct edges (u, v), (w, z) ∈ M satisfy u ≠ w and v ≠ z. Alternatively, a directed matching can be seen as a subgraph in which each vertex has indegree and outdegree at most one, whereas an undirected matching is a subgraph in which each vertex has degree at most one. Analogously to the undirected case, we denote by ν(−→E) the weight of a maximum matching in −→E.

Definition 2.4. An orientation −→G = (V, −→E, w) of a weighted graph G = (V, E, w) is a directed graph constructed by replacing each undirected edge {u, v} ∈ E with a directed edge (u, v) (with arbitrary orientation) having weight w(u, v).
The next proposition relates the maximum matching weight in an undirected graph to the maximum matching weight of an orientation of it.
Proposition 2.5. For any graph G = (V, E, w) and any orientation −→G of G, it holds that ν(−→G) ≤ 4 ν(G).
Proof. Let M ⊆ E be a matching returned by the greedy algorithm for finding a maximal matching on G, which works as follows: let e1, . . ., em be an ordering of the edges of G such that w(e1) ≥ · · · ≥ w(em). Then greedy incrementally constructs M by adding e_i to it, for i = 1, . . ., m, as long as this operation maintains the property that M is a matching. Denote by e_{i1}, . . ., e_{i|M|} the edges of M ordered nonincreasingly according to their weight.
Let −→M* be a maximum matching in −→G. We upper bound its total weight as follows. Let −→M0 := −→M* and, for j = 1, . . ., |M|, let −→Mj be the directed graph obtained from −→M(j−1) by removing all edges incident to one of the endpoints of e_{ij}. Since M is maximal by construction, −→M_{|M|} is empty. Moreover, at each iteration j we remove at most four edges, since there are at most four edges in −→M* that share an endpoint with e_{ij}. These edges all have weight less than or equal to w(e_{ij}): the matching constructed by greedy after considering e_1, . . ., e_{ij−1} can be augmented by adding any one of these edges (or rather, their undirected equivalents) without breaking the property of being a matching, so their weight must be less than or equal to w(e_{ij}), since otherwise greedy would have chosen one of them instead of e_{ij}. Therefore, we have proved that 4ν(G) ≥ ν(−→M*), from which the proposition follows.
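The greedy procedure in the proof above is easy to implement. The sketch below (our own helper names) scans the edges in nonincreasing weight order and keeps an edge exactly when it shares no endpoint with the edges already kept.

```python
def greedy_matching(weighted_edges):
    # weighted_edges: list of ((u, v), weight) pairs; returns a maximal
    # matching, built in nonincreasing order of weight
    matched, M = set(), []
    for (u, v), w in sorted(weighted_edges, key=lambda e: -e[1]):
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

edges = [((0, 1), 5), ((1, 2), 4), ((2, 3), 3), ((3, 0), 2)]
assert greedy_matching(edges) == [(0, 1), (2, 3)]
```

The proof of Proposition 2.5 uses this rule only through the two properties visible here: the output is maximal, and a kept edge outweighs every later edge it blocks.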

Matching Conductance and the Fastest Mixing Problem
The following result is due to Roch [Roc05] and gives a variational characterisation of γ⋆(G). It follows from the fact that γ⋆(G) can be expressed as the solution to a semidefinite program for which strong duality holds.
Proposition 2.6. Let G = (V, E) be a graph on n vertices. Then γ⋆(G) is equal to the minimum of the following optimisation problem:
$$\min \sum_{u \in V} g(u) \quad \text{s.t.} \quad g(u) + g(v) \ge \|f(u) - f(v)\|^2 \;\; \forall \{u, v\} \in E, \quad \sum_{u \in V} f(u) = 0, \quad \sum_{u \in V} \|f(u)\|^2 = 1,$$
where the minimum is over f : V → R^n and g : V → R≥0.
Remark 2.7. The variational characterisation actually given by Roch [Roc05] does not include a non-negativity constraint for the function g. The function g, however, needs to be non-negative whenever, as in our case, Markov chains on G are allowed non-zero holding probabilities. More precisely, for any u ∈ V and transition matrix P of a Markov chain on G, if we allow P(u, u) > 0, then we need to require g(u) ≥ 0. △
The variational characterisation of γ⋆(G) given by Proposition 2.6 requires minimising over n-dimensional embeddings of the vertices of the graph. It is often more convenient to work with one-dimensional embeddings. For this reason, we introduce the following parameter.
Definition 2.8. Let G = (V, E) be a graph on n vertices. We denote by γ^(1)(G) the minimum of the following optimisation problem:
$$\min \sum_{u \in V} g(u) \quad \text{s.t.} \quad g(u) + g(v) \ge (f(u) - f(v))^2 \;\; \forall \{u, v\} \in E, \quad \sum_{u \in V} f(u) = 0, \quad \sum_{u \in V} f(u)^2 = 1,$$
where the minimum is over f : V → R and g : V → R≥0.
The following proposition shows that γ^(1)(G) approximates γ⋆(G) up to a logarithmic factor.
Proposition 2.9. Let G = (V, E) be a graph on n vertices. It holds that γ⋆(G) ≤ γ^(1)(G) ≲ γ⋆(G) · log n.
The proof of this proposition uses a standard trick (see, e.g., Montenegro and Tetali [MT06]): (i) we apply the Johnson–Lindenstrauss lemma [JL84] to show that considering only O(log n)-dimensional embeddings suffices to obtain a constant-factor approximation of γ⋆(G); (ii) we transform such an O(log n)-dimensional embedding into a one-dimensional embedding, but in doing so we lose a factor O(log n).
Proof. The relation γ⋆(G) ≤ γ^(1)(G) follows trivially, since computing γ^(1)(G) can be seen as minimising over the same set of n-dimensional embeddings as for γ⋆(G), with the additional constraint that only the first coordinate can be non-zero.
To prove the upper bound, let f : V → R^n, g : V → R≥0 be the minimiser achieving γ⋆(G) in Proposition 2.6. The Johnson–Lindenstrauss lemma then ensures there exists an embedding f′ : V → R^d with d = O(log n) which preserves all the pairwise distances ‖f(u) − f(v)‖ up to a constant factor. Picking a suitable coordinate j ∈ {1, . . ., d} and rescaling, we define a one-dimensional embedding h from f′, losing a factor at most d = O(log n). Then (h, g) is a feasible solution (up to rescaling) to the optimisation problem of Definition 2.8, where we used the fact that both h and f are centred at zero. Therefore, γ^(1)(G) ≲ γ⋆(G) log n.
Suppose now we fix f : V → R minimising the optimisation problem above. Then, by Proposition 2.3, γ^(1)(G) can be seen as the fractional matching value of a graph G_f which is constructed from G by reweighing each edge {u, v} by (f(u) − f(v))². Together with Proposition 2.9, this hints towards a connection between the matching conductance of G and γ⋆(G). This connection is formalised in Theorem 2.10, which is the main result of this section.
Theorem 2.10. Let G = (V, E) be a graph. It holds that
$$\Upsilon^\star(G)^2 \lesssim \gamma^{(1)}(G) \lesssim \Upsilon^\star(G).$$

Moreover, this implies that
$$\frac{\Upsilon^\star(G)^2}{\log |V|} \lesssim \gamma^\star(G) \lesssim \Upsilon^\star(G).$$
The proof of Theorem 2.10 follows the standard template of the proof of the discrete Cheeger inequality. To upper bound γ^(1)(G), it suffices to construct test functions f, g from a set S minimising the matching conductance of G. The other direction is more complicated: similarly to the "hard direction" of the discrete Cheeger inequality, it requires using the function f that minimises γ^(1)(G) to construct sweep sets and to analyse the matching conductance of such sets. Analysing the matching conductance of these sweep sets, however, is not as straightforward as analysing their edge conductance as in the proof of the standard discrete Cheeger inequality.
We split the proof of Theorem 2.10 into several lemmata. The first one, Lemma 2.11, relates the maximum matchings of cuts in the graph to the maximum matching of a suitably constructed weighted directed graph.
Lemma 2.11. Let G = (V, E) be an unweighted undirected graph and let f : V → R≥0. Let −→G_f = (V, −→E_f, w_f) be the directed weighted graph constructed as follows: for each {u, v} ∈ E with f(u) < f(v), include the directed edge (u, v) with weight w_f(u, v) := f(v)² − f(u)². For any t > 0, define S_t := {u ∈ V : f(u)² > t}. Then
$$\int_0^\infty \nu\big(E(S_t, S_t^c)\big) \, dt \le 2\, \nu(\vec{G}_f).$$
Proof. For any t ∈ [0, ∞), let M_t ⊆ E(S_t, S_t^c) be a matching achieving value ν(E(S_t, S_t^c)); for any t there might be several distinct maximum matchings for E(S_t, S_t^c), and we just pick one of them arbitrarily. Let −→M be the directed matching output by an execution of greedy on −→G_f, which works as follows. We first order the edges e_1, . . ., e_m of −→E_f nonincreasingly by weight; then, for i = 1, . . ., m, we incrementally construct −→M by including e_i as long as adding this edge does not break the property of −→M being a directed matching. This is the same algorithm as the one described in the proof of Proposition 2.5, with the difference that we are now constructing a directed instead of an undirected matching.
Consider now {u, v} ∈ E such that f(u) < f(v) and {u, v} ∈ M_t for some t ≥ 0. Then there must exist an edge (u′, v′) ∈ −→M such that u = u′ or v = v′ (since greedy outputs a maximal matching) and w_f(u′, v′) ≥ w_f(u, v); the inequality holds because otherwise greedy would have picked (u, v) instead of (u′, v′). For any t ≥ 0, let h_t : E → E be the function that maps any {u, v} ∈ E with f(u) < f(v) and {u, v} ∈ M_t to an edge (u′, v′) as above. Notice that, for any edge (u′, v′) ∈ −→M and any t ≥ 0, there can be at most two edges in M_t that share an endpoint with (u′, v′); hence, for each t, the edges of M_t contribute at most twice the weight of each edge of −→M they are mapped to. The lemma follows by integrating over t and swapping the order of integration and summation, which is justified since the matchings {M_t : t ≥ 0} can be chosen so that only at most n − 1 distinct matchings appear (there are at most n − 1 distinct sets S_t), so the integral can actually be computed as a finite sum.
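The reduction of the integral to a finite sum can be made concrete: as t increases, a sweep set of the form S_t = {u : f(u)² > t} changes only when t crosses one of the at most n distinct values of f(u)², so at most n − 1 distinct proper sets occur. A small sketch (the function f and the labels are illustrative, not from the text):

```python
def sweep_sets(f):
    # distinct nonempty proper sweep sets S_t = {u : f(u)^2 > t}, one per
    # interval between consecutive values of f(u)^2
    levels = sorted({x * x for x in f.values()})
    return [frozenset(u for u, x in f.items() if x * x > t)
            for t in levels[:-1]]  # for t >= max level, S_t is empty

f = {0: 0.1, 1: 0.5, 2: 0.5, 3: 0.9}
assert sweep_sets(f) == [frozenset({1, 2, 3}), frozenset({3})]
```

(For t below the smallest value of f(u)², S_t is all of V; the list above records only the proper sets, which are the ones entering the conductance bound.)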
The next lemma shows how to construct a set of small matching conductance given a "good" non-negative function f : V → R ≥0 .
Lemma 2.12. Let G = (V, E) be a graph and f : V → R≥0. Let λ be the minimum of the following optimisation problem:
$$\min \ \frac{\sum_{u \in V} g(u)}{\sum_{u \in V} f(u)^2} \quad \text{s.t.} \quad g(u) + g(v) \ge (f(u) - f(v))^2 \;\; \forall \{u, v\} \in E, \qquad g : V \to \mathbb{R}_{\ge 0}.$$
Then there exists a set S ⊆ {u ∈ V : f(u) > 0} such that Υ(S) ≤ 8√(2λ).
Proof.
First notice that, with S_t as in Lemma 2.11, ∫₀^∞ |S_t| dt = Σ_{u∈V} f(u)². We now upper-bound the numerator ∫₀^∞ ν(E(S_t, S_t^c)) dt. Let −→G_f = (V, −→E_f, w_f) be the directed weighted graph defined in Lemma 2.11. Notice that this is an orientation of a graph G_f = (V, E_f, w_f) in which each directed edge (u, v) ∈ −→E_f is replaced by {u, v} ∈ E. Therefore, by Lemma 2.11 and Proposition 2.5, we have that
$$\int_0^\infty \nu\big(E(S_t, S_t^c)\big) \, dt \le 2\, \nu(\vec{G}_f) \le 8\, \nu(G_f).$$
We now want to relate ν(G_f) to λ. Let M be a maximum matching in G_f. Then
$$\nu(G_f) = \sum_{\{u,v\} \in M} |f(u)^2 - f(v)^2| \le \Big( \sum_{\{u,v\} \in M} (f(u) - f(v))^2 \Big)^{1/2} \Big( \sum_{\{u,v\} \in M} (f(u) + f(v))^2 \Big)^{1/2} \le \Big( \sum_{\{u,v\} \in M} (f(u) - f(v))^2 \Big)^{1/2} \Big( 2 \sum_{u \in V} f(u)^2 \Big)^{1/2},$$
where the first inequality follows from Cauchy–Schwarz, while the second follows from the inequality (a + b)² ≤ 2a² + 2b² for any a, b ∈ R together with the fact that M is a matching. Notice that Σ_{{u,v}∈M} (f(u) − f(v))² can be interpreted as the weight of a matching in an undirected graph obtained from G by reweighing each edge {u, v} with weight (f(u) − f(v))². Therefore, we can apply Proposition 2.3 and the definition of λ to show that Σ_{{u,v}∈M} (f(u) − f(v))² ≤ λ Σ_{u∈V} f(u)². Putting it all together, we obtain
$$\int_0^\infty \nu\big(E(S_t, S_t^c)\big) \, dt \le 8 \sqrt{2\lambda} \sum_{u \in V} f(u)^2 = 8 \sqrt{2\lambda} \int_0^\infty |S_t| \, dt,$$
so there exists t ≥ 0 with Υ(S_t) ≤ 8√(2λ); moreover S_t ⊆ {u ∈ V : f(u) > 0}. We are now finally ready to prove Theorem 2.10.
Proof of Theorem 2.10. We start by proving the "easy side" of the Cheeger-type inequality, i.e. γ^(1)(G) ≲ Υ⋆(G). Let S ⊆ V with |S| ≤ |V|/2 achieve Υ⋆(G), and let M be a maximum matching for E(S, S^c), i.e. |M| = ν(E(S, S^c)). Define f : V → R to be (after centring and normalising) constant on S and on S^c, and define g : V → R to place mass on the endpoints of the edges of M. By construction, f, g satisfy the constraints of the optimisation problem of Definition 2.8. Moreover, Σ_{u∈V} g(u) = 2ν(E(S, S^c))/|S|, while Σ_{u∈V} f(u)² = 1. Therefore γ^(1)(G) ≤ 2Υ(S) = 2Υ⋆(G).
We now turn our attention to the "harder side", i.e. Υ⋆(G)² ≲ γ^(1)(G). Let f, g : V → R minimise γ^(1)(G). We cannot directly apply Lemma 2.12 with f, since f is not non-negative. For this reason, we define two non-negative functions h−, h+ : V → R≥0 as follows. Let c be a median of f (i.e. order the vertices by the value of f and take the value at the middle), and set h+(u) := max{f(u) − c, 0} and h−(u) := max{c − f(u), 0}. If Σ_u h−(u)² ≥ Σ_u h+(u)², we define h := h−; otherwise h := h+. We now apply Lemma 2.12 with h.
First notice that Σ_{u∈V} h(u)² ≥ ½ Σ_{u∈V} (f(u) − c)² ≥ ½ Σ_{u∈V} f(u)² = ½, since Σ_{u∈V} f(u) = 0 because f is a feasible solution to the optimisation problem of Definition 2.8. Moreover, for any {u, v} ∈ E, (h(u) − h(v))² ≤ (f(u) − f(v))² ≤ g(u) + g(v). We can then apply Lemma 2.12 with h, obtaining λ ≤ Σ_{u∈V} g(u) / Σ_{u∈V} h(u)² ≤ 2γ^(1)(G). Therefore, there exists S ⊆ {u ∈ V : h(u) > 0} such that Υ(S) ≤ 8√(2λ) ≤ 16√(γ^(1)(G)). Moreover, by construction the support of h has size at most |V|/2. Hence Υ⋆(G) ≲ √(γ^(1)(G)).

Almost Mixing

Set-Up and Main Result
The previous section was devoted to estimating mixing-type statistics for the FMMC problem: we controlled the maximal spectral gap γ_P amongst all transition matrices P on a given graph G = (V, E) which are reversible w.r.t. the uniform distribution, in terms of the vertex conductance of G.
The purpose of the current section is to relax the condition that the invariant distribution of P, which we denote π_P, is exactly U_V: we allow π_P to be ε-far from uniform in TV. We show that this can allow a significant speed-up in the mixing time compared with requiring the invariant distribution to be exactly π: we explicitly construct a Markov chain with spectral gap at least order ε/(diam G)². Recall that we write D(V) for the set of positive probability distributions on a set V. We use the following notation.
• Given an edge weighting u : E → R+, write u(x) := Σ_{y∈V} u({x, y}) and u(S) := Σ_{x∈S} u(x).
• Define the transition matrix P_u ∈ [0, 1]^{V×V} by P_u(x, y) := u({x, y})/u(x) for x, y ∈ V.
• Define the probability measure π_u : V → [0, 1] by π_u(x) := u(x)/u(V).
• Abbreviate the spectral gap as γ_u := γ_{P_u}.
P_u is the transition matrix of the RW on the weighted graph, and π_u is its invariant distribution. It is the unique invariant distribution if the graph with edge set {e ∈ E | u(e) ≠ 0} is connected.
The following theorem is a refinement of Theorem B.
Theorem 3.1 ('Almost Mixing'). Let G = (V, E) be a graph and let π ∈ D(V). There exists an edge weighting w1 : E → R+, depending only on G and π, with unit total weight, i.e. w1(V) = 1, and the following property. Let ε ∈ (0, 1). Let P be a transition matrix on G which is reversible w.r.t. π; it need not be irreducible. Let w0 : E → R+ be the unique edge weighting of G with P_{w0} = P and w0(V) = 1. Define the transition matrix Q := P_w via the superposition weighting w := w0 + εw1, i.e. w(e) := w0(e) + εw1(e) for e ∈ E.
We remark briefly on the 'independence' of the perturbation by εw 1 .
Remark 3.2 (Independence of Perturbation). The weighting w = w0 + εw1 can be seen as a perturbation of w0 by εw1, since we are most interested in the case where ε is very small: indeed, we want the new equilibrium distribution to be very close to π. We emphasise that the perturbation weighting w1 does not depend on the base weighting w0; rather, w1 is a function only of G and π. △
We fix a graph G = (V, E) and a probability measure π ∈ D(V) throughout this section; we do not always repeat these in statements below. Also, we write n := |V|.
We start by proving a slightly weaker statement. Assume that w0 consists of self-loops only: w0({x, y}) = 0 for all x, y ∈ V with x ≠ y, and w0({v}) = π(v) for all v ∈ V.
The corresponding transition matrix P_{w0} is diagonal and thus reversible w.r.t. any measure. We then extend the argument to handle arbitrary initial weightings w0 in §3.5.

Outline and Proof Given Later Results
We start by giving a very brief outline with cross-references to the results proved in the following subsections.We then flesh out this outline, giving a more detailed description.
Outline of Proof: Very Brief. The proof has four key steps. (i) We construct a weighted spanning tree; see Definition 3.11.
(ii) We control the difference between the invariant distribution of the RW on this weighted tree and the target distribution π; see Lemma 3.12.
(iii) We estimate the conductance of this weighted tree; see Proposition 3.13.
(iv) We relate its spectral gap and conductance using canonical paths; see Corollary 3.14. △
Outline of Proof: More Detailed. We now flesh out the above details somewhat for π = U_V.
• Let T = (V, F ) be a BFS spanning tree of G, rooted at v ⋆ .We choose a weighting w ⋆ : F → (0, ∞) such that the weights increase towards the root v ⋆ in such a way that w ⋆ has edge conductance order 1.We then rescale the w ⋆ to get w ⋆ with total weight w ⋆ (V ) = εn.
• It remains to analyse the edge conductance of w, which is intimately related to the original total weight w⋆(V). We can choose the weights such that w⋆(V) ≍ n diam G. We then apply a Cheeger-type inequality to deduce a spectral gap lower bound of order ε/(diam G)².
We now describe how to choose the weighting w⋆. Let T_x ⊆ T denote the subtree rooted at x, consisting of x and all its descendants. We choose the weight w⋆({x, prt(x)}) := |T_x|, where prt(x) is the (unique) parent of x, for x ≠ o. This way, the conductance of a subtree T_x in the weighted tree (T, w⋆) is precisely 1; we emphasise that this is in the weighted tree (T, w⋆). We need to rescale w⋆ and combine it with the unit-weight self-loops to get an approximately uniform weighting.
It turns out that w⋆(F) ≍ n diam T ≍ n diam G. This then gives rise to a final conductance Φ⋆ ≍ ε/diam G. The standard Cheeger inequality then gives γ ≳ ε²/(diam G)². We improve this to γ ≳ Φ⋆/diam T by applying the canonical paths method, using the fact that T is a tree.
The proof for general π is very similar: one gives the self-loop at x weight π(x) and defines w⋆({x, prt(x)}) := π(T_x). This is the natural extension, and the same arguments go through.
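The tree weighting described above is straightforward to compute. The sketch below (our own helper names, with the graph encoded as an adjacency map) builds a BFS tree, assigns each edge {x, prt(x)} the subtree size |T_x| as in the uniform case, and illustrates on a path that the total weight is at most n · diam G.

```python
from collections import deque

def bfs_tree_weights(adj, root):
    # BFS tree rooted at `root`; the edge from x to its parent prt(x)
    # receives weight |T_x|, the number of vertices in the subtree below x
    parent, order, queue = {root: None}, [], deque([root])
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    size = {x: 1 for x in parent}
    for x in reversed(order):  # accumulate subtree sizes bottom-up
        if parent[x] is not None:
            size[parent[x]] += size[x]
    return {(x, parent[x]): size[x] for x in parent if parent[x] is not None}

# path 0-1-2-3-4 rooted at 0: the edge above x carries weight 5 - x
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
w = bfs_tree_weights(path, 0)
assert w == {(1, 0): 4, (2, 1): 3, (3, 2): 2, (4, 3): 1}
assert sum(w.values()) <= 5 * 4  # total weight <= n * diam
```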

Preliminaries
We introduce some preliminary material which is used throughout the proof, as well as in §4. The majority of it will be familiar to a reader well-versed in RWs and mixing-time analysis.
First, we generalise the notion of edge conductance of a graph, herein abbreviated to conductance. We introduced this in Definition A.1 for RWs on unweighted graphs, i.e. with unit weights on all edges. Reversible RWs correspond to a general weighting u : E → R+, as in the above notation. The (edge) conductance of a reversible RW is the (edge) conductance of the corresponding weighted graph.
Definition 3.3 (Edge Conductance). Let G = (V, F) be a graph and u : F → R+ a weighting. The conductance Φ_u(S) of a set S ⊆ V with π_u(S) > 0 w.r.t. u is defined to be
$$\Phi_u(S) := \frac{u(F(S, S^c))}{u(S)}.$$
The conductance Φ⋆_u of u is defined to be
$$\Phi_u^\star := \min_{S \subseteq V : 0 < \pi_u(S) \le 1/2} \Phi_u(S).$$
The adjusted conductance Φ̄_u(S) of a set S ⊆ V with 0 < π_u(S) < 1 is defined similarly:
$$\bar\Phi_u(S) := \frac{\Phi_u(S)}{\pi_u(S^c)} \qquad \text{and} \qquad \bar\Phi_u^\star := \min_{S \subseteq V : 0 < \pi_u(S) < 1} \bar\Phi_u(S).$$
Remark 3.4a (Conductance: Original and Adjusted Relations). The definition of the adjusted conductance does not need the restriction π_u(S) ≤ ½, as the adjusted conductance is invariant under complementation: it takes the same value on S and on S^c for all S ⊆ V with 0 < π_u(S) < 1.
The following inequalities between the original and adjusted conductances are immediate: the adjusted conductance of S is at least Φ_u(S) for all S ⊆ V with 0 < π_u(S) < 1, and it is at most 2Φ_u(S) whenever π_u(S) ≤ ½.
The length of a path Γ : {0, . . ., L} → V is defined to be |Γ| := L.
Proposition 3.6 (Canonical Paths: General). Let G = (V, E) be a connected graph and u : E → R+ a weighting. Let Γ_{x,y} be an arbitrary E-path from x to y, for x, y ∈ V. The spectral gap γ_u of the RW on (G, u) satisfies
$$\gamma_u \ge \Big( \max_{e \in E} \frac{u(V)}{u(e)} \sum_{x, y \in V} 1\{e \in \Gamma_{x,y}\}\, \pi_u(x)\, \pi_u(y)\, |\Gamma_{x,y}| \Big)^{-1}.$$
Corollary 3.7 (Canonical Paths: Trees). Let T = (V, F) be a tree and u : F → (0, ∞) a weighting. The spectral gap γ_u of the RW on (T, u) satisfies γ_u ≥ Φ̄⋆_u / diam T, where Φ̄⋆_u is the adjusted conductance of u.
Proof. Let Γ_{x,y} be the shortest path between x and y, for x, y ∈ V; then |Γ_{x,y}| ≤ diam T for all x, y ∈ V. Removing an edge e = {e−, e+} ∈ F disconnects the tree, leaving two components, with e− in one component and e+ in the other; denote the component containing e± by T_{e±}. The canonical paths method (Proposition 3.6) then implies the claim, using the fact that T_{e+} ∪ T_{e−} = V and F(T_{e+}, T_{e−}) = {e} for all e ∈ F.
Remark 3.8. A more general statement is proved by Miclo [Mic99, Theorem 1] (article in French). He does not require the graph T to be a tree, at the cost of replacing diam T in the denominator of the bound by the length of the longest path in T; e.g., if T has a Hamiltonian path, then the denominator becomes n − 1. He gives two proofs, one of which uses a canonical-paths style argument. △
Finally, we introduce some notation for trees and prove a counting lemma. This lemma, innocuous as it may appear, is fundamental to multiple calculations. The notation and definitions above were for an arbitrary graph T = (V, F); assume that T is a tree for the rest of this preliminary section.
Definition 3.9 (Tree Notation). Let T = (V, F) be a tree rooted at o.
• Let anc(z) denote the unique shortest path from z to the root o, including both z and o.
• Let V_y := {z ∈ V | y ∈ anc(z)} and T_y := T[V_y] denote the subtree rooted at y.
• Let prt(x) denote the parent of x ≠ o, i.e. the unique neighbour y of x satisfying y ∈ anc(x).
Lemma 3.10 (Counting Weighted Subtrees). For all measures µ on V and all x ∈ V, we have
$$\sum_{y \in V_x} \mu(T_y) = \sum_{z \in V_x} \big( \operatorname{dist}(x, z) + 1 \big)\, \mu(z).$$
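A counting identity of this kind is quick to check numerically in the form it takes at the root (our own encoding of the tree as a parent map): summing µ(T_y) over all vertices y equals summing µ(z) · |anc(z)| over all z, since each z is counted once for every ancestor y of z.

```python
def subtree_sum(parent, mu):
    # left-hand side: sum over y of mu(T_y)
    total = 0
    for y in parent:
        stack, mass = [y], 0
        while stack:  # accumulate mu over all descendants of y (incl. y)
            z = stack.pop()
            mass += mu[z]
            stack.extend(c for c in parent if parent[c] == z)
        total += mass
    return total

def ancestor_sum(parent, mu):
    # right-hand side: sum over z of mu(z) * |anc(z)|
    total = 0
    for z in parent:
        depth, y = 0, z
        while y is not None:
            depth += 1
            y = parent[y]
        total += mu[z] * depth
    return total

tree = {0: None, 1: 0, 2: 1, 3: 1}  # root 0; 1 below 0; 2 and 3 below 1
mu = {0: 1, 1: 2, 2: 3, 3: 4}
assert subtree_sum(tree, mu) == ancestor_sum(tree, mu) == 26
```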

Construction via a Weighted Spanning Tree
First, we define a weighted spanning tree (T, w). T is a BFS tree, so ∆ := diam T satisfies ∆ ≤ 2 diam G.
The distribution π w induced by w is close to π in the following sense.
Next, we control the conductance of this weighted spanning tree.
Proposition 3.13 (Conductance). Let (T, w) be the weighted spanning tree from Definition 3.11. The conductance Φ⋆_w of the RW on (T, w) satisfies Φ⋆_w ≳ ε/∆.
Proof. First, suppose that o ∉ S. Choose x ∈ S with dist(x, o) minimal. We may assume that T[S] is connected, by Remark 3.4b. These together imply that S ⊆ V_x and {x, prt(x)} = F(T_x, T_x^c) ⊆ F(S, S^c).
This implies that w(F(S, S^c)) ≥ w({x, prt(x)}), while w(S) ≤ w(V_x); the definition of w then gives the claimed bound.
Finally, we apply the canonical paths method for trees (Corollary 3.7) to deduce a bound on the spectral gap for (T, w).
Corollary 3.14 (Spectral Gap). Let (T, w) be the weighted spanning tree from Definition 3.11. The spectral gap γ_w of the RW on (T, w) satisfies γ_w ≳ ε/(diam G)².
Proof. This is an immediate consequence of the canonical paths method for trees (Corollary 3.7) and Proposition 3.13, along with the relations of Remark 3.4a; also, ∆ ≤ 2 diam G.
We have now almost proved the main result. We just need to make sure the chain is lazy and to convert the spectral gap result into a mixing time result.
Proof of Theorem 3.1 when w0 = π. The Markov chain constructed is a RW on a weighted BFS tree; it is defined in Definition 3.11. Denote by π′ the invariant distribution of the RW on this weighted tree. Lemma 3.12 establishes the claim on the invariant distribution.
The spectral gap bound is proved via Proposition 3.13 and Corollary 3.14; precisely, Corollary 3.14 bounds the spectral gap of the reversible chain so defined. The mixing time bound will follow from the spectral gap bound via a standard mixing time–spectral gap relation. To apply this relation, we first pass from Q to its lazy version Q′ := ½(I + Q). This ensures that the spectral gap and absolute spectral gap agree. Q and Q′ have the same invariant distribution, and γ_{Q′} = ½γ_Q. A simple calculation establishes the mixing time claim using the spectral gap–mixing relation; see [AF02, Lemma 4.23] for details of this relation.
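For concreteness, here is the final calculation in hedged form: the standard relation (see [AF02, Lemma 4.23]) bounds the mixing time of a reversible, lazy chain by its spectral gap and the minimum of its invariant distribution; assuming, as the construction suggests, that γ_{Q′} ≳ ε/(diam G)² and that π′_min is of order at least 1/n up to constants, one obtains a bound of the shape claimed in Theorem B:

```latex
t_{\mathrm{mix}}(\delta)
  \;\le\; \frac{1}{\gamma_{Q'}} \log\frac{1}{\delta\, \pi'_{\min}}
  \;\lesssim\; \frac{(\operatorname{diam} G)^2}{\varepsilon}\, \log\frac{n}{\delta}.
```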
It remains to handle the case of general w0, i.e. where w0 is any unit edge weighting with π as its induced invariant distribution. This is done in the next subsection.

Perturbation to Arbitrary Base Chain
The analysis up to this point has shown the existence of a fast 'almost mixing' chain. Precisely, we defined a weighted graph by constructing an appropriately weighted BFS tree and supplementing it with π-weighted self-loops. We can think of the self-loops as a 'base' weighting which is reversible w.r.t. π. We denoted the 'base' weighting by w0 and the 'tree' weighting by w1; recall Definition 3.11.
We now explain how to extend this to an arbitrary 'base' weighting w0. The analysis is extremely similar to that of the self-loops case above: we simply take an arbitrary base weighting w0 and superimpose on it the same weighted BFS tree. Some small adjustments are needed, but not many.
Let w0 : E → R+ be an arbitrary unit edge weighting of E. Define E′ := {e ∈ E | w0(e) ≠ 0}, the edge set of the graph induced by w0. Then π is the invariant distribution of the w0-weighted RW.
Construction. We define the weighted tree (T, w1) exactly as before in Definition 3.11: T = (V, F) is an arbitrary BFS tree and w1({x, prt(x)}) = π(T_x) for x ∈ V \ {o}; set w := w0 + εw1. The proof of Lemma 3.12 is unchanged, showing that the induced distribution π_w is close to π. △
Canonical Paths and Adjusted Conductance. We can no longer use the canonical paths method for trees (Corollary 3.7) directly, since the weighting does not necessarily give rise to a tree. The 'extra edges', i.e. those corresponding to non-self-loops in w0, can only increase the conductance. Intuitively, these cannot harm mixing, but we must establish this carefully. First, we adjust the proof of the canonical paths method for trees, i.e. the deduction of Corollary 3.7 from Proposition 3.6. We use the same canonical paths Γ, defined by paths in the BFS tree T. The bound on the spectral gap γ does not require a lower bound on Φ(S) for arbitrary S; rather, it only needs a lower bound on Φ(T_x) for all x ∈ V.
The fact that E′ = ∅ when w0 consists only of self-loops meant that the set of edges emanating from the set T_x was given by F(T_x, T_x^c) = {x, prt(x)}. More generally, it is given by {x, prt(x)} ∪ E′(T_x, T_x^c). But this is always a superset of {x, prt(x)}, so the edge conductance is always larger than if the edges from E′ were ignored. Motivated by this, we define an adjustment of the edge conductance in which only the boundary edges lying in the tree T = (V, F) are counted. The same proof as for canonical paths for trees then bounds the spectral gap in terms of this tree-adjusted conductance.
Conductance Analysis. The analysis of the conductance in Proposition 3.13 needs to be adjusted: we need only analyse the conductance of complete subtrees T_x, and we must regard the boundary as consisting only of the tree edge {x, prt(x)}. The proof of Proposition 3.13 applies almost unchanged: we obtain a tree-adjusted conductance of at least ε/(6∆). The only point to be noted is the establishment of the equality w0(T_x) = π(T_x). Previously, this was obvious from the self-loop weightings. It still holds here, since the invariant distribution induced by w0 is π and w0 has unit total weight, by assumption; thus, in fact, w0(x) = π(x) for all x ∈ V. △
Conclusion. We combine the two results above, exactly as before, to obtain a spectral gap of order at least ε/∆². The conversion of this into a lazy chain and then into a mixing estimate is unchanged.

Set-Up, Main Result and Outline
We have been studying discrete-time Markov chains throughout this paper. It is natural to ask the same question for continuous-time chains. Our attention is devoted to continuous-time Markov chains which are reversible w.r.t. the uniform distribution. Such chains can always be represented as a RW on a weighted graph (G, w), where w : E → R_+ is a weighting on the edges of G = (V, E).
Our main result for continuous-time chains is simple to state: we impose the normalisation |V|^{-1} Σ_{e∈E} w(e) ≤ 1, i.e. the average rate at which the RW leaves a vertex is at most 1; we define a weighting w and show a lower bound of order (diam G)^{-2} on the spectral gap of this RW, equivalently an upper bound of order (diam G)^2 on the relaxation time. The proof proceeds in the following steps.
(i) We construct a weighted spanning tree (T, w); see Definition 4.2.
(ii) We control the total weight of the spanning tree; see Lemma 4.3.
(iii) We estimate the conductance of this weighted tree; see Proposition 4.5.
(iv) We relate its spectral gap and conductance using canonical paths; see Corollary 4.6. △

We fix a graph G = (V, E) and always take π := U_V to be the uniform distribution on V. We do not always repeat these conventions in statements below. Also, we write n := |V|.
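Steps (i)-(iv) can be sketched numerically. The code below is an illustrative toy, not the paper's construction: the normalisation w({x, prt(x)}) = |T_x| / diam(G) is our assumption, chosen only to be consistent with Lemma 4.3 (total weight at most n) and Proposition 4.5 (conductance at least 1/(2 diam G)); the graph `ADJ` is arbitrary.

```python
from collections import deque

# Toy sketch of steps (i)-(iii). The weighting w({x, prt(x)}) = |T_x|/diam(G)
# is an assumption (Definition 4.2 is not reproduced here), chosen to match
# Lemma 4.3 (total weight <= n) and Proposition 4.5 (conductance >= 1/(2*diam)).

ADJ = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3, 5], 5: [4]}

def bfs_tree(adj, root):
    """Return the child -> parent map of a BFS tree rooted at `root`."""
    parent, queue, seen = {}, deque([root]), {root}
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                parent[u] = v
                queue.append(u)
    return parent

def distances(adj, s):
    dist, queue = {s: 0}, deque([s])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

n = len(ADJ)
diam = max(max(distances(ADJ, s).values()) for s in ADJ)     # diam(G)
parent = bfs_tree(ADJ, 0)                                    # step (i): tree T

def _ancestor(v, x):
    while v in parent:
        v = parent[v]
        if v == x:
            return True
    return False

def subtree_size(x):
    return sum(1 for v in ADJ if v == x or _ancestor(v, x))

w = {x: subtree_size(x) / diam for x in parent}              # step (i): weights
assert sum(w.values()) <= n                                  # step (ii), Lemma 4.3
for x in parent:                                             # step (iii), Prop 4.5
    assert w[x] / subtree_size(x) >= 1 / (2 * diam)
```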

Proof via Adjustments to Discrete-Time Case
The proof in continuous-time is surprisingly similar to that used in discrete-time.
• We construct the same weighted tree (T, w), except that we do not include the self-loops; contrast Definitions 3.11 and 4.2.
• The invariant distribution of a RW on a graph with weights on the edges is always uniform. Thus we do not need an analogue of Lemma 3.12. We require the total weight to be at most n, instead of requiring the invariant distribution to be close to a given measure.
• We use the same argument to control the conductance; cf. Proposition 3.13. The only difference is that now we do not include the self-loop weight in the calculation.
This is equivalent to the tree-weight in the discrete-time case; see w_1 in Definition 3.11. The particular scaling is chosen so that the total weight of w is at most n, as the next lemma shows.
Lemma 4.3 (Total Weight of w). We have w(V) ≤ n.
Proof. This is an immediate consequence of the subtree counting lemma (Lemma 3.

Next, we control the conductance of this weighted spanning tree; cf. Proposition 3.13. To do this, we must first give the precise definition of conductance in continuous-time.

Definition 4.4 (Conductance). Let T = (V, F) be a graph and let u : F → R_+ be a weighting. The conductance Φ_u(S) of a set S ⊆ V with π_u(S) > 0 w.r.t. u is defined to be

	Φ_u(S) := u(F(S, S^c)) / |S|.
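A plausible reading of the subtree counting fact behind Lemma 4.3 (the lemma reference above is truncated, so this reading is an assumption) is the identity Σ_{x≠o} |T_x| = Σ_v depth(v), since v lies in T_x exactly when x is on the tree path from the root o to v. The random tree below checks it.

```python
import random

# Checks the identity we believe underlies Lemma 4.3's subtree count (an
# assumption, since the lemma reference above is truncated): v belongs to T_x
# exactly when x lies on the tree path from the root o to v (x != o), hence
#   sum_{x != o} |T_x| = sum_v depth(v) <= n * ecc(o).

random.seed(0)
n = 50
parent = {v: random.randrange(v) for v in range(1, n)}   # random tree, root 0

def depth(v):
    d = 0
    while v != 0:
        v = parent[v]
        d += 1
    return d

def subtree_size(x):
    def in_subtree(v):
        while v != 0:
            if v == x:
                return True
            v = parent[v]
        return False
    return sum(1 for v in range(1, n) if in_subtree(v))

total = sum(subtree_size(x) for x in range(1, n))
assert total == sum(depth(v) for v in range(1, n))       # exact identity
assert total <= n * max(depth(v) for v in range(1, n))   # <= n * ecc(0)
```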
Proposition 4.5 (Conductance). Let (T, w) be the weighted spanning tree from Definition 4.2. The conductance Φ⋆_w of the RW on (T, w) satisfies Φ⋆_w ≥ (1/2) ∆^{-1}.

Proof. The same reductions as used in Proposition 3.13 show that it suffices to show that Φ_w(T_x) ≥ (1/2) ∆^{-1} for all x ∈ V.
But this is immediate from the definition of w.

Finally, we apply the canonical paths method for trees (Corollary 3.7) to deduce a bound on the spectral gap for (T, w); cf. Corollary 3.14. We must adjust the method to apply in continuous-time; see Proposition 4.7 and Corollary 4.8.

Proof. This is an immediate consequence of the canonical paths method for trees in continuous-time (Corollary 4.8) and Proposition 4.5.
It remains to adjust the method of canonical paths to continuous-time.
Proposition 4.7 (Canonical Paths in Continuous-Time: General). Let G = (V, E) be a graph and u : E → R_+ a weighting. Let γ_{x,y} be an E-path from x to y for all x, y ∈ V. The spectral gap γ_u of the RW on (G, u) satisfies

	γ_u ≥ ( max_{e∈E} (1/(n u(e))) Σ_{x,y∈V} 1{e ∈ γ_{x,y}} |γ_{x,y}| )^{-1}.

Concretely, one can rescale the weights, setting w(·) := c w(·) for some value c such that max_{x∈V} w(x) = 1. This can then be realised by placing mean-1 exponential wait times between jumps of a discrete-time chain P. One then applies the canonical paths method to P. The Dirichlet form is linear in this scaling, meaning that the scaling can be 'undone' at the end.
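The functional inequality underlying such a congestion bound, namely Var_π(f) ≤ ρ · E_u(f, f) for every f, can be spot-checked numerically. The weighted path graph, the canonical path choices and the test functions below are all illustrative.

```python
import random

# Spot-check of the inequality behind the canonical paths bound: for every f,
#   Var_pi(f) <= rho * E_u(f, f),
# where rho = max_e (1/(n*u(e))) * sum over ordered pairs (x,y) of
# 1{e in gamma_xy} * |gamma_xy|, and pi is uniform. The weighted path graph,
# canonical paths and test functions below are all illustrative.

random.seed(1)
n = 6
edges = {(i, i + 1): 0.5 + random.random() for i in range(n - 1)}

def path(x, y):
    """Canonical path gamma_xy: the unique path in this path graph."""
    step = 1 if y > x else -1
    return [(min(v, v + step), max(v, v + step)) for v in range(x, y, step)]

# Congestion rho over all ordered pairs x != y.
rho = max(
    sum(len(path(x, y)) for x in range(n) for y in range(n)
        if x != y and e in path(x, y)) / (n * u)
    for e, u in edges.items()
)

def var(f):
    m = sum(f) / n
    return sum((fv - m) ** 2 for fv in f) / n

def dirichlet(f):
    return sum(u * (f[a] - f[b]) ** 2 for (a, b), u in edges.items()) / n

for _ in range(100):                       # random test functions
    f = [random.gauss(0.0, 1.0) for _ in range(n)]
    assert var(f) <= rho * dirichlet(f) + 1e-12
```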
Remark (Cheeger Inequality in Continuous-Time). We remark that, while a scaling argument as used above does apply for the usual discrete-time Cheeger inequality, namely γ ≥ (1/2)(Φ⋆)², the bound is quadratic in the scaling. Thus a factor of c is lost. See [AF02, Theorem 4.40]. In our set-up, max_{x∈V} w(x) may be as large as (n − 1)/∆. This would lead to a lower bound of 1/(n∆) on the spectral gap, rather than the 1/∆² which we were able to achieve using canonical paths. △

• If at some point the walk stays put, then keep it at that state indefinitely.

This completes the argument when X_0 = o. It remains to consider the case that X_0 ≠ o. Direct all edges towards o and run for ∆(o) steps. Precisely, set P_0(x, y) := 1{x ≠ o, y = prt(x)} + 1{x = y = o}, where prt(x) is the unique neighbour of x ≠ o on the unique shortest path from x to o. If ∆(o) steps are made according to this matrix, then X_{∆(o)} = o, regardless of X_0. We then apply the construction from the case X_0 = o, shifted by ∆(o).
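The deterministic kernel P_0 above can be simulated directly; the tree below is illustrative. After ∆(o) steps, here read as the eccentricity of o, every starting state has been routed to o.

```python
# Simulates the kernel P_0 described above on an illustrative tree: every
# x != o jumps to prt(x) and o stays put, so after Delta(o) steps (read here
# as the eccentricity of o) the chain is at o whatever its starting state.

parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 4, 6: 4}   # tree rooted at o = 0

def p0_step(x):
    return parent.get(x, 0)                      # prt(x) for x != 0; 0 stays

def depth(v):
    d = 0
    while v != 0:
        v = parent[v]
        d += 1
    return d

ecc = max(depth(v) for v in parent)              # Delta(o)
for start in range(7):
    x = start
    for _ in range(ecc):
        x = p0_step(x)
    assert x == 0                                # X_{Delta(o)} = o
```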

Open Problems and Concluding Remarks
We have studied fundamental barriers to fast mixing on graphs. There are a few questions left open by our work. First, throughout this paper we have mainly focussed on mixing to the uniform distribution. Whilst this is arguably the most important case, we believe that generalising our results to arbitrary distributions would be an interesting extension of our work. In particular, while extending our results on continuous-time chains should not require much effort, more thought is needed to generalise our Cheeger-type inequality between the vertex conductance and the optimal spectral gap. It is possible to generalise the notion of vertex conductance to an arbitrary positive distribution π. We believe that it should be possible to carry the general idea of our proof over to this case by replacing matching conductance with a similar conductance measure based on a generalisation of the weighted vertex cover problem. The main difficulty then lies in generalising the proof of Lemma 2.11, since it crucially depends on the greedy algorithm for maximal matching.
Another open problem prompted by our work is to construct a graph sparsifier, i.e., an edge-induced sparse subgraph, that approximately preserves the vertex conductance of the original graph. Our work, together with a result by Batson, Spielman and Srivastava [BSS12], implies that it is possible to construct a sparsifier of G with order n edges and vertex conductance at most order Ψ⋆(G)² log n. It is then natural to ask whether we can obtain a better approximation.
Finally, can our results spur new algorithmic applications? We mention two. First, we would like to design a distributed algorithm to compute a fast mixing Markov chain on G, where G also represents the topology of the distributed network. Second, we ask if it is possible to design a local algorithm, in the spirit of Spielman and Teng [ST13], that outputs a subset of nodes with small vertex conductance.
where Φ(S) := |E(S, S^c)| / vol(S) for S ⊆ V, with edge boundary E(S, S^c) and volume vol(S) given by

	E(S, S^c) := { {x, y} ∈ E | x ∈ S, y ∉ S }   and   vol(S) := Σ_{x∈S} deg(x).

Figure 1. Dumbbell graph D⋆ with n = 7: two cliques connected to a single external vertex.

whatever the graph. △

Remark 3.4b (Conductance: Connectivity Assumption). We may assume that S induces a connected subgraph T[S] when analysing Φ⋆_u. Indeed, if S = A ∪ B with F(A, B) = ∅, then

	Φ_u(S) = [u(F(A, A^c)) + u(F(B, B^c))] / [u(A) + u(B)] ≥ min{ Φ_u(A), Φ_u(B) },

using the fact that F(S, S^c) = F(A, A^c) ∪ F(B, B^c) and that (a + b)/(a′ + b′) ≥ min{a/a′, b/b′} for all a, a′, b, b′ > 0. △

Next, we introduce the canonical paths method and use it to relate the spectral gap to the conductance in trees. A proof of Proposition 3.6 can be found in [Sin92, Theorem 5].

Definition 3.5 (Paths). Let G = (V, F) be a graph. Γ : {0, ..., L} → V is an F-path from x to y if Γ(0) = x, Γ(L) = y and {Γ(i − 1), Γ(i)} ∈ F for each i ∈ {1, ..., L}.

Proof.
The discrete-time case is proved in [Sin92, Theorem 5]. It involves the variational characterisation of the spectral gap in terms of the Dirichlet form. This characterisation holds both in discrete- and continuous-time; see [AF02, §3.6]. The proof in [Sin92] then passes almost unchanged to the continuous-time set-up, recalling that now the invariant distribution is uniform.