The GHP Scaling Limit of Uniform Spanning Trees in High Dimensions

We show that the Brownian continuum random tree is the Gromov–Hausdorff–Prohorov scaling limit of the uniform spanning tree on high-dimensional graphs including the d-dimensional torus Z_n^d with d > 4, the hypercube {0,1}^n, and transitive expander graphs. Several corollaries for associated quantities are then deduced: convergence in distribution of the rescaled diameter, height and simple random walk on these uniform spanning trees to their continuum analogues on the continuum random tree.


Introduction
Consider the uniform spanning tree (UST) of the d-dimensional torus Z_n^d with d > 4, or another transitive high-dimensional graph such as the hypercube {0,1}^n or a transitive expander graph. In this paper we show that the Brownian continuum random tree (CRT), introduced by Aldous [1,2], is the Gromov–Hausdorff–Prohorov (GHP) scaling limit of such USTs.
Convergence of such USTs to the CRT in the sense of finite dimensional distributions has been established in the work of Peres and Revelle [28]. The novelty of the current paper is proving that this convergence holds in the stronger GHP topology. This implies the convergence in distribution of some natural geometric quantities of the USTs (which were not known to converge prior to this work) and allows us to express their limiting distribution explicitly. For example, it follows from our work that the diameter and the height seen from a random vertex of these USTs, properly rescaled, converge to certain functionals of the Brownian excursion, as predicted by Aldous (see [2, Section 4]). Additionally, it implies that the simple random walk on these USTs converges to Brownian motion on the CRT. We discuss these implications in Section 1.3.
Our main result is as follows.
Theorem 1.1. Let T_n be a uniformly drawn spanning tree of the d-dimensional torus Z_n^d with d > 4. Denote by d_{T_n} the corresponding graph-distance on T_n and by µ_n the uniform probability measure on the vertices of T_n. Then there exists a constant β(d) > 0 such that
(T_n, d_{T_n}/(β(d) n^{d/2}), µ_n) −→ (T, d_T, µ),
where (T, d_T, µ) is the CRT equipped with its canonical mass measure µ and −→ means convergence in distribution with respect to the GHP distance between metric measure spaces.
Remark 1.2. We take the convention of Aldous [2, Section 2] that the CRT is coded by twice the standard Brownian excursion, although different normalizations are sometimes used elsewhere in the literature.
Our result shows that high-dimensional USTs exhibit a strong form of universality, a common phenomenon in statistical physics whereby above an upper critical dimension, the macroscopic behaviour of a system does not depend on the finer properties of the underlying network. For USTs the upper critical dimension is well known to be four, as for the closely related model of loop-erased random walk (LERW). Above dimension four LERW rescales to Brownian motion, see [19]. In lower dimensions the scaling limits are markedly different. On Z^2 it was shown by Lawler, Schramm and Werner [20] that LERW rescales to SLE_2, and Barlow, Croydon and Kumagai [7] later established subsequential GHP scaling limits for the UST. This was later extended to full convergence in a result of Holden and Sun [13]. On Z^3, much less is known; however the breakthrough works of Kozma [16] and Li and Shiraishi [23] on subsequential scaling limits of LERW enabled Angel, Croydon, Hernandez-Torres and Shiraishi [4] to show GHP convergence of the rescaled UST along a dyadic subsequence. Their scaling factors are given in terms of the LERW growth exponent in three dimensions, which was shown to exist by Shiraishi [31]. Finally, in four dimensions, a classical result of Lawler [18] computes the logarithmic correction to scaling under which the LERW on Z^4 converges to Brownian motion. Schweinsberg [30] showed that with these logarithmic corrections to scaling, the finite-dimensional distributions of the UST on the four dimensional torus converge to those of the CRT, analogously to [28]. Various exponents governing the shape of the UST in Z^4 are given in the recent work of Hutchcroft and Sousi [15]. Our proof of GHP convergence does not encompass the four dimensional torus (see Problem 7.3).
In the rest of this section we first present the standard notation and definitions required to parse Theorem 1.1. We then state the most general version of our result, Theorem 1.5, handling other high-dimensional underlying graphs such as expanders and the hypercube. We close this section with a discussion of the various corollaries mentioned above and the organization of the paper.

Standard notation and definitions
A spanning tree of a connected finite graph G is a connected subset of edges touching every vertex and containing no cycles. The uniform spanning tree (UST) is a uniformly drawn sample from this finite set. Given a tree T we denote by d_T the graph distance metric on the vertices of T, i.e., d_T(u, v) is the number of edges in the unique path between u and v in T.
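Although not used in the paper, the finiteness of the set of spanning trees can be made concrete: Kirchhoff's matrix-tree theorem counts spanning trees as any cofactor of the graph Laplacian. The following Python sketch (our own illustrative code; the function name and graph encoding are not from the paper) is handy for sanity checks on small graphs.

```python
from fractions import Fraction

def spanning_tree_count(n, edges):
    """Number of spanning trees of a graph on vertices 0..n-1, via the
    matrix-tree theorem: the determinant of the Laplacian with one
    row and column deleted. Exact arithmetic via Fraction."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    # delete the last row and column, then Gaussian elimination
    M = [row[:-1] for row in L[:-1]]
    m = n - 1
    det = Fraction(1)
    for i in range(m):
        p = next((r for r in range(i, m) if M[r][i] != 0), None)
        if p is None:
            return 0
        if p != i:
            M[i], M[p] = M[p], M[i]
            det = -det
        det *= M[i][i]
        for r in range(i + 1, m):
            f = M[r][i] / M[i][i]
            for c in range(i, m):
                M[r][c] -= f * M[i][c]
    return int(det)

# complete graph K_4: Cayley's formula gives 4^{4-2} = 16 spanning trees
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(spanning_tree_count(4, k4))  # → 16
```

The UST of a graph is then a uniform sample from this (typically huge) set; for K_n the count is n^{n−2} by Cayley's formula.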
We follow the setup of [26, Sections 1.3 and 6] and work in the space X_c of equivalence classes of (deterministic) metric measure spaces (mm-spaces) (X, d, µ) such that (X, d) is a compact metric space and µ is a Borel probability measure on (X, d), where we treat (X, d, µ) and (X′, d′, µ′) as equivalent if there exists a bijective isometry φ : X → X′ such that φ_*µ = µ′, where φ_*µ is the pushforward measure of µ under φ. As is standard in the field, we will abuse notation and represent an equivalence class in X_c by a single element of that equivalence class.
We will now define the GHP metric on X_c. First recall that if (X, d) is a metric space, the Hausdorff distance d_H between two sets A, A′ ⊂ X is defined as
d_H(A, A′) = inf{ε > 0 : A ⊆ (A′)^ε and A′ ⊆ A^ε},
where for ε > 0 and A ⊂ X we let A^ε = {x ∈ X : d(x, A) < ε} be the ε-fattening of A in X. If µ and ν are two measures on X, the Prohorov distance between µ and ν is given by
d_P(µ, ν) = inf{ε > 0 : µ(A) ≤ ν(A^ε) + ε and ν(A) ≤ µ(A^ε) + ε for any closed set A ⊂ X}.
Definition 1.3. Let (X, d, µ) and (X′, d′, µ′) be elements of X_c. The Gromov–Hausdorff–Prohorov (GHP) distance between (X, d, µ) and (X′, d′, µ′) is defined as
d_GHP((X, d, µ), (X′, d′, µ′)) = inf { d_H(φ(X), φ′(X′)) ∨ d_P(φ_*µ, φ′_*µ′) },
where the infimum is taken over all isometric embeddings φ : X → F, φ′ : X′ → F into some common metric space F.
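To make these definitions concrete, here is a brute-force Python sketch (our own illustrative code, not from the paper) computing both distances for finite subsets of a metric space. It checks the defining inequalities over all subsets of the support, using closed fattenings and a finite candidate set for ε, which suffices for small examples.

```python
from itertools import chain, combinations

def hausdorff(A, B, d):
    """Hausdorff distance between finite nonempty subsets A, B of a
    metric space with metric d."""
    return max(max(min(d(a, b) for b in B) for a in A),
               max(min(d(a, b) for a in A) for b in B))

def prohorov(mu, nu, d):
    """Prohorov distance between finitely supported probability measures
    (dicts: point -> mass), by brute force over subsets of the support."""
    pts = sorted(set(mu) | set(nu))
    subsets = list(chain.from_iterable(combinations(pts, r)
                                       for r in range(1, len(pts) + 1)))
    def fat_mass(m, A, eps):  # mass of the closed eps-fattening of A
        return sum(w for x, w in m.items() if min(d(x, a) for a in A) <= eps)
    def ok(eps):  # both defining inequalities for this eps
        return all(sum(mu.get(x, 0.0) for x in A) <= fat_mass(nu, A, eps) + eps
                   and sum(nu.get(x, 0.0) for x in A) <= fat_mass(mu, A, eps) + eps
                   for A in subsets)
    # candidate epsilons: pairwise distances and pointwise mass differences
    cands = sorted({d(x, y) for x in pts for y in pts} |
                   {abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in pts})
    return min(e for e in cands if ok(e))
```

For example, on the real line, prohorov({0: 1.0}, {0: 0.5, 1: 0.5}, lambda x, y: abs(x - y)) evaluates to 0.5: the subset A = {1} forces ε ≥ 0.5.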
It is shown in [26, Theorem 6 and Proposition 8] that (X_c, d_GHP) is a Polish metric space. Denote by M_1(X_c) the space of probability measures on (X_c, d_GHP). The CRT is a typical example of a random fractal tree and can be thought of as the scaling limit of critical (finite variance) Galton–Watson trees. As we shall explain in Section 3, we do not directly approach the CRT in this paper; therefore we have opted to omit the definition of the CRT and refer the reader to Le Gall's comprehensive survey [21] for its construction (see also [2]) as a random element in X_c. Except for this, by now we have stated all the necessary definitions required for Theorem 1.1.

The general theorem
We now present the general version of Theorem 1.1 which will imply the GHP convergence of the UST on graphs like the hypercube {0,1}^m or transitive expanders. Our assumptions on the underlying graph are stated in terms of random walk behavior but should be thought of as geometric assumptions. For a graph G, two vertices x, y and a non-negative integer t we write p_t(x, y) for the probability that the lazy random walk starting at x will be at y at time t. When G is a finite connected regular graph on n vertices we define the uniform mixing time of G as
t_mix(G) = min{ t ≥ 0 : |n p_t(x, y) − 1| ≤ 1/2 for all vertices x, y }.    (2)
We will assume the following throughout the paper. This is the same assumption under which Peres and Revelle establish finite-dimensional convergence in [28].
Assumption 1.4. Let {G_n} be a sequence of finite connected vertex transitive graphs with |G_n| = n.
1. There exists θ < ∞ such that, uniformly in n and in the starting vertex, the expected number of intersections of two independent lazy random walks on G_n, each run up to the mixing time, is at most θ.
2. There exists α > 0 such that t_mix(G_n) ≤ n^{1/2−α}.
Both items in Assumption 1.4 imply that the graph sequence is in some sense of dimension greater than four. The first item is a finite analogue of the condition that the expected number of intersections of two independent random walks is finite; in Z^d this happens if and only if d > 4. The second item (which clearly holds on the torus on n vertices once d > 4, since this has mixing time of order n^{2/d}) heuristically ensures that different parts of the UST that are distance √n apart behave asymptotically independently. We do not claim that these conditions are optimal (see the discussion in [25, Section 1.4]), but they are enough to yield convergence to the CRT in the most interesting cases.
Theorem 1.5. Let {G_n} be a sequence of graphs satisfying Assumption 1.4 and let T_n be a sample of UST(G_n). Denote by d_{T_n} the graph distance on T_n and by µ_n the uniform probability measure on the vertices of T_n. Then there exists a sequence {β_n} of positive numbers such that
(T_n, d_{T_n}/(β_n √n), µ_n) −→ (T, d_T, µ),
where (T, d_T, µ) is the CRT equipped with its canonical mass measure µ and −→ means convergence in distribution with respect to the GHP distance.
The sequence {β_n} is inherited from the main result of Peres and Revelle, see [28, Theorem 1.2] (we restate this as Theorem 3.1 in this paper). Note that Theorem 1.1 is not a special case of Theorem 1.5, since the latter does not guarantee a single scaling factor β but rather a sequence β_n (which is the best one can hope for in the context of Theorem 1.5, since one can alternate between different graph sequences).
Proof of Theorem 1.1 given Theorem 1.5. For the torus Z_n^d with d ≥ 5, Peres and Revelle proved that there exists β(d) ∈ (0, ∞) such that [28, Theorem 1.2] holds with β_n = β(d); see the choice of β_n at the end of Section 3 of [28] as well as Lemma 8.1 and (17) in that paper. Hence, this and Theorem 1.5 readily imply Theorem 1.1. Furthermore (see Lemma 1.3 and Section 9 of [28]), in graphs where additionally two independent simple random walks typically avoid one another for long enough (see the precise condition in [28, Equation 6]), we can take β_n ≡ 1. This family of graphs includes the hypercube and transitive expanders with degrees tending to infinity. In the same spirit, for the d-dimensional torus, β(d) → 1 as d → ∞. Moreover, it is also immediate to see that Assumption 1.4 holds for a sequence of bounded degree transitive expanders (see for instance [28, Section 9]) and hence Theorem 1.5 holds for them as well.

Pointed convergence
In order to establish some of the corollaries alluded to above, it will be useful to rephrase Theorem 1.5 in terms of pointed convergence. Roughly speaking, this means that we consider our spaces to be rooted, and we add a term corresponding to the distance between the roots in the embedding in Definition 1.3. We refer to [10, Section 2.2] for the precise definition. We start with the following observation, which is a trivial consequence of a coupling characterization of the Prohorov distance (see [26, Proof of Proposition 6]).
Lemma 1.6. Suppose that (X_n, d_n, µ_n) → (X, d, µ) deterministically in the GHP topology. Let U_n be a random element of X_n sampled according to the measure µ_n, and U be a random element of X sampled according to the measure µ. Then (X_n, d_n, µ_n, U_n) → (X, d, µ, U) with respect to the pointed GHP topology, as defined in [10, Section 2.2].
Due to transitivity, in our setting the root can be an arbitrary vertex O_n rather than a uniformly chosen one. Combining Theorem 1.5 with Lemma 1.6 and the Skorohod representation theorem we deduce the following.
Theorem 1.7. In the setting of Theorem 1.5, let O_n be an arbitrary vertex of G_n. Then
(T_n, d_{T_n}/(β_n √n), µ_n, O_n) −→ (T, d_T, µ, O),
where (T, d_T, µ, O) is the CRT equipped with its canonical mass measure µ and root O, and −→ means convergence in distribution with respect to the pointed GHP distance defined in [10, Section 2.2].

Diameter distribution
The diameter of a metric space (X, d) is sup_{x,y∈X} d(x, y) and is denoted by Diam(X). When X is a tree, it is just the length of the longest path. The study of the diameter of random trees has an interesting history. Szekeres [33] proved in 1983 that the diameter D_n of a uniformly drawn labeled tree on n vertices, normalized by n^{−1/2}, converges in distribution to a random variable D with a rather unpleasant explicit density (3), in which b_{n,y} = 8(πn/y)^2 and y ∈ (0, ∞). Aldous [1,2] showed that this tree, viewed as a random metric space, converges to the CRT and deduced that D is distributed as
D = 2 sup_{0≤s≤t≤1} ( e_s + e_t − 2 min_{s≤u≤t} e_u ),    (4)
where {e_t}_{t∈[0,1]} is standard Brownian excursion. Curiously enough, up until 2015 the only known way to show that (4) has density (3) was to go via random trees and combine the results of Aldous and Szekeres.
Wang [34], prompted by a question of Aldous, gave a direct proof of this fact in 2015.
A uniformly drawn labeled tree on n vertices is just UST(K_n), where K_n is the complete graph on n vertices. Applying Theorem 1.5 we are able to extend Szekeres' 1983 result to USTs of any sequence of graphs satisfying Assumption 1.4.
Corollary 1.8. Let {G_n} be a sequence of graphs satisfying Assumption 1.4, let T_n be a sample of UST(G_n) and let {β_n} be the sequence guaranteed to exist by Theorem 1.5. Then
Diam(T_n)/(β_n √n) −→ D,
where D is the diameter of the CRT, i.e., a random variable defined by either (3) or (4).
Proof. Let D_n = Diam(T_n) and let g : [0, ∞) → R be bounded and continuous. The function h : X_c → R defined by h((X, d, µ)) = Diam(X) is continuous with respect to the GHP topology; indeed, for any two metric spaces X_1 and X_2 we have |Diam(X_1) − Diam(X_2)| ≤ 2 d_GHP(X_1, X_2), as can be read off Definition 1.3. The claim now follows from Theorem 1.5 by the continuous mapping theorem.

Height distribution
Given a rooted tree (T, v), the height of (T, v) is sup_{x∈T} d(v, x), i.e. the length of the longest simple path in T starting from v, and is denoted by Height(T, v). The study of the height of random trees predates the study of the diameter. In 1967, Rényi and Szekeres [29] found the limiting distribution of the height of a uniformly drawn labeled rooted tree on n vertices normalized by n^{−1/2}; we omit the precise formula this time (it is also unpleasant). Aldous [1,2] realized that the limiting distribution is that of the maximum of the Brownian excursion.
The following corollary is an immediate consequence of Theorem 1.7. The proof goes along the same lines as the proof of Corollary 1.8; we omit the details.
Corollary 1.9. Let {G_n} be a sequence of graphs satisfying Assumption 1.4, let T_n be a sample of UST(G_n) and let {β_n} be the sequence guaranteed to exist by Theorem 1.5. Let v_n be an arbitrary vertex of G_n. Then
Height(T_n, v_n)/(β_n √n) −→ 2 max_{t∈[0,1]} e_t,
where {e_t}_{t∈[0,1]} is standard Brownian excursion.

SRW on the UST converges to BM on the CRT
A particularly nice application of Theorem 1.5 together with [10, Theorem 1.2] allows us to deduce that the simple random walk (SRW) on UST(G_n) rescales to Brownian motion on the CRT. The latter object was first defined by Aldous in [2, Section 5.2] and formally constructed by Krebs [17].
Theorem 1.10. Let {G_n} be a sequence of graphs satisfying Assumption 1.4, let T_n be a sample of UST(G_n), and let (X_n(m))_{m≥0} be a simple random walk on T_n. Then there exists a probability space Ω on which the convergence of Theorem 1.7 holds almost surely, and furthermore, on this probability space, for almost every ω ∈ Ω the spaces ((T_n, d_n, µ_n, O_n))_{n≥1} and (T, d, µ, O) can be embedded into a common metric space (X′, d′)(ω) so that the laws of the rescaled walks converge weakly to the law of Brownian motion on the CRT, as probability measures on the space D(R_{≥0}, X′(ω)) of càdlàg functions equipped with the uniform topology.
Proof. The existence of such a probability space Ω follows from the Skorohod representation theorem, since the space of pointed compact mm-spaces endowed with a finite measure is separable. An application of [10, Theorem 1.2] then gives the desired convergence for the associated walks (Y_n(·))_{n≥1} as n → ∞, almost surely on Ω. This result then transfers to the SRW sequence (X_n(·))_{n≥1} in place of (Y_n(·))_{n≥1} by standard arguments using the strong law of large numbers and continuity of the limit process. We refer to [5, Section 4.2] for an example of such an argument.

Organization
We begin with some preliminaries in Section 2, where we introduce the standard definitions of loop-erased random walk, mixing time and capacity which are central to the proof. We also record some stochastic domination properties of USTs, and prove there a general result regarding negative correlations of certain expected volumes in the UST (see Claim 2.12).
Next, in Section 3 we present the main argument of the proof, while delegating two useful estimates, Theorem 3.3 and Theorem 3.6, to Section 4, and a third useful estimate, Lemma 3.7, to Section 5. In Section 6 we present a necessary though rather straightforward abstract argument combining the result of Section 3 with the results of [28] to yield Theorem 1.5. Lastly, in Section 7 we present some concluding remarks and open questions.

Acknowledgments
We thank Christina Goldschmidt for many useful discussions.This research is supported by ERC starting grant 676970 RANDGEOM, consolidator grant 101001124 UniversalMap, and by ISF grant 1294/19.

Preliminaries
In this section we provide an overview of the tools used to prove Theorem 1.5. Throughout the section, we assume that G = (V, E) is a finite connected graph with n vertices. We will use the following conventions:
• For an integer m ≥ 1 we write [m] = {1, . . ., m}.
• For two positive sequences t(n), r(n) we write t ∼ r when t(n)/r(n) → 1.
• For two positive sequences t(n), r(n) we write t ≫ r when t(n)/r(n) → ∞.
• We omit floor and ceiling signs where they would be needed.
• Throughout the rest of this paper, the random walk on a graph equipped with positive edge weights is the random walk that stays put with probability 1/2 and otherwise jumps to a random neighbor with probability proportional to the weight of the corresponding edge. If no edge weights are specified, then they are all unit weights.

Loop-erased random walk and Wilson's algorithm
Wilson's algorithm [35], which we now describe, is a widely used algorithm for sampling USTs. A walk X = (X_0, . . ., X_L) of length L ∈ N is a sequence of vertices where (X_i, X_{i+1}) ∈ E(G) for every 0 ≤ i ≤ L − 1.
For an interval J = [a, b] ⊂ [0, L] where a, b are integers, we write X[J] for {X_i}_{i=a}^{b}. Given a walk, we define its loop erasure Y = LE(X) = LE(X[0, L]) inductively as follows. We set Y_0 = X_0 and let λ_0 = 0. Then, for every i ≥ 1, we set
λ_i = 1 + max{ t ≤ L : X_t = Y_{i−1} }, and Y_i = X_{λ_i} whenever λ_i ≤ L.
We halt this process once we have λ_i > L. The times λ_k(X) are the times contributing to the loop-erasure of the walk X. When X is a random walk starting at some vertex v ∈ G and terminated when hitting another vertex u (L is now random), we say that LE(X) is the loop-erased random walk (LERW) from v to u.
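The inductive definition above can be transcribed directly into code. The following Python sketch (our own illustrative code, with L = len(walk) − 1) returns the loop erasure of a finite walk given as a list of vertices.

```python
def loop_erase(walk):
    """Chronological loop erasure LE(X) of a finite walk X = (X_0, ..., X_L):
    repeatedly set lambda_i = 1 + (last time the walk visits Y_{i-1}) and
    Y_i = X_{lambda_i}, halting once lambda_i exceeds L."""
    Y = [walk[0]]
    lam = 0
    L = len(walk) - 1
    while True:
        # last visit to the current endpoint, then step one past it
        lam = 1 + max(t for t in range(lam, L + 1) if walk[t] == Y[-1])
        if lam > L:
            return Y
        Y.append(walk[lam])
```

For instance, loop_erase([0, 1, 2, 1, 3]) returns [0, 1, 3]: the loop 1 → 2 → 1 is erased. The output is always a simple path.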
To sample a UST of a finite connected graph G we begin by fixing an ordering (v_1, . . ., v_n) of the vertex set V. At the first step, let T_1 be the tree containing v_1 and no edges. At each step i > 1, sample a LERW from v_i to T_{i−1} and set T_i to be the union of T_{i−1} and the LERW that has just been sampled. We terminate this algorithm with T_n. Wilson [35] proved that T_n is distributed as UST(G). An immediate consequence is that the path between any two vertices in UST(G) is distributed as a LERW between those two vertices; this was first shown by Pemantle [27].
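For concreteness, here is a compact Python sketch of Wilson's algorithm (our own illustrative code; `adj` maps a vertex to its neighbour list), using the standard last-exit "next-pointer" implementation of loop erasure. The walk here is the simple rather than the lazy one, which does not change the law of the resulting tree.

```python
import random

def wilson_ust(adj, rng=random):
    """Sample a uniform spanning tree of a finite connected graph via
    Wilson's algorithm. The pointer parent[u] records the last exit from u,
    so following the pointers after the walk hits the tree traces exactly
    the loop-erased branch. Returns the tree as a set of sorted edge pairs."""
    verts = list(adj)
    in_tree = {verts[0]}          # root the tree at an arbitrary vertex
    parent = {}
    for v in verts:
        u = v
        while u not in in_tree:   # random walk until hitting the tree
            parent[u] = rng.choice(adj[u])
            u = parent[u]
        u = v
        while u not in in_tree:   # retrace the loop-erased branch
            in_tree.add(u)
            u = parent[u]
    return {tuple(sorted((u, parent[u]))) for u in parent}
```

On a path graph the spanning tree is unique, so the output is deterministic; on K_n, repeated sampling approximates the uniform measure on Cayley's n^{n−2} trees.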
To understand the lengths of loops erased in LERW we will need the notion of the bubble sum. Let G be a graph and let W be a non-empty subset of vertices of G. For every two vertices u, w ∈ V(G), define
p_t^W(u, w) = P_u(X_t = w, X[0, t] ∩ W = ∅),
where X is a random walk on G. We define the W-bubble sum by
B_W = sup_{u∈V(G)} Σ_{t≥0} (t + 1) p_t^W(u, u).
Note that since the random walk on G is an irreducible Markov chain on a finite state space, we have that P(X[0, t] ∩ W = ∅) decays exponentially in t and hence this sum is always finite. Another bubble-sum we will consider is when the random walk is killed at a geometric time (rather than when hitting a set W). Let T_ζ be an independent geometric random variable with mean ζ > 1. We define
B_ζ = sup_{u∈V(G)} Σ_{t≥0} (t + 1) P_u(X_t = u, T_ζ > t).
Definition 2.1. We say that a random walk X on a finite connected graph G starting from an arbitrary vertex is bubble-terminated with bubble-sum bounded by ψ if it is killed upon hitting some set W and B_W ≤ ψ.
Both bubble-sums allow us to bound the size of the loops erased in the loop-erasure process. As in [14] and [25, Claim 3.2] we have the following.
Claim 2.2. Let G be a finite connected graph and X be a bubble-terminated random walk on G with bubble-sum bounded by ψ. For any finite simple path γ of length L such that P(LE(X) = γ) > 0, the random variables {λ_{i+1}(X) − λ_i(X)}_{i=0}^{L−1} are independent conditionally on {LE(X) = γ}, and furthermore
E[λ_{i+1}(X) − λ_i(X) | LE(X) = γ] ≤ ψ for every 0 ≤ i ≤ L − 1.
Proof. In the case that X is killed upon hitting W, see [25, Proof of Claim 3.2]. (Note that the definitions of the times contributing to the loop-erased random walk are a little different. More accurately, the k-th time contributing to the loop erasure according to our definition equals the (k + 1)-th time contributing to the loop erasure minus one according to the definition in [25]; thus the expected difference between two consecutive times has the same bound, and the proof is the same.)
When X is killed at T_ζ − 1, where T_ζ is an independent geometric random variable with mean ζ > 1, the proof can be deduced from the previous claim. Indeed, we add a new vertex ρ to G and edges (ρ, u) for every u ∈ G, with weights chosen so that the probability of jumping to ρ from any u ∈ G in a single step equals 1/ζ. Call the resulting network G*. A random walk on G* started from v ∈ G and terminated when hitting ρ has the same distribution as a random walk on G with geometric killing time.

Mixing times
Recall the definition of the uniform mixing time above Assumption 1.4. It follows that for every t ≥ t_mix we have that p_t(x, y) ≥ 1/(2n) for all vertices x and y, where X_t is the lazy random walk. Even though in this paper we mainly use the uniform mixing time as defined in (2), we also use a more classical notion of distance between probability measures on finite sets. Recall that the total variation distance between two probability measures µ and ν on a finite set X is defined by
d_TV(µ, ν) = max_{A⊆X} |µ(A) − ν(A)| = (1/2) Σ_{x∈X} |µ(x) − ν(x)|.
It is a standard fact (see [22, Section 4.5]) that if t ≥ k t_mix, then for any vertex x we have d_TV(p_t(x, ·), π) ≤ 2^{−k}, where π is the stationary distribution.
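As an illustration (our own helper code, not from the paper), the total variation distance and one step of the lazy walk can be computed exactly for small graphs; iterating the step shows the distance to stationarity shrinking.

```python
def tv_distance(mu, nu):
    """Total variation distance between probability measures on a finite
    set (dicts: point -> mass): half the L1 distance between the vectors."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

def lazy_step(dist, adj):
    """One step of the lazy walk: stay put w.p. 1/2, else move to a
    uniformly chosen neighbour (unit edge weights)."""
    out = {v: 0.5 * p for v, p in dist.items()}
    for v, p in dist.items():
        for w in adj[v]:
            out[w] = out.get(w, 0.0) + 0.5 * p / len(adj[v])
    return out

# lazy walk on the 4-cycle, started at vertex 0: TV to uniform decreases
c4 = {v: [(v - 1) % 4, (v + 1) % 4] for v in range(4)}
pi = {v: 0.25 for v in range(4)}
dist = {0: 1.0}
gaps = []
for _ in range(3):
    gaps.append(tv_distance(dist, pi))
    dist = lazy_step(dist, c4)
print(gaps)  # → [0.75, 0.25, 0.125]
```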

Capacity
The capacity of a set of vertices quantifies how difficult it is for a random walk to hit the set. It is a crucial notion when one wishes to analyze the behavior of Wilson's algorithm. Let {Y_i}_{i≥0} be a random walk on G started from the stationary distribution. For a set of vertices W and an integer k ≥ 0 we define the capacity
Cap_k(W) = P(Y[0, k] ∩ W ≠ ∅),
and for another set of vertices U we define the relative capacity
Cap_k(W, U) = P(Y[0, k] ∩ W ≠ ∅ and Y[0, k] ∩ U = ∅).
Note that the relative capacity is not symmetric in W, U.
We will see later that the capacities of certain subsets determine the expected volumes of balls in UST(G). Here we collect some useful facts about the capacity. By the union bound, when G is regular we always have the upper bound
Cap_k(W) ≤ (k + 1)|W|/n.
The capacity is defined for the lazy simple random walk started at stationarity. When k is significantly larger than the mixing time, the starting vertex does not make much difference, as the following claim shows.
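For small examples the capacity can be computed exactly by evolving the law of the walk killed on W. The sketch below (our own illustrative code, for a regular graph, where π is uniform) also lets one check the union bound numerically.

```python
def capacity(adj, W, k):
    """Cap_k(W): probability that a lazy random walk started from the
    uniform (stationary, for a regular graph) distribution visits the
    set W within k steps. Computed by evolving the killed walk's law."""
    n = len(adj)
    alive = {v: 1.0 / n for v in adj if v not in W}  # survives time 0
    for _ in range(k):
        nxt = {v: 0.5 * p for v, p in alive.items()}  # lazy: stay put
        for v, p in alive.items():
            for w in adj[v]:
                if w not in W:                        # moving into W kills
                    nxt[w] = nxt.get(w, 0.0) + 0.5 * p / len(adj[v])
        alive = nxt
    return 1.0 - sum(alive.values())

# lazy walk on the 4-cycle: Cap_0({0}) = 1/4, matching the union bound at k = 0
c4 = {v: [(v - 1) % 4, (v + 1) % 4] for v in range(4)}
print(capacity(c4, {0}, 0))  # → 0.25
```

Cap_k(W) is clearly non-decreasing in k, and the exact values stay below the union bound (k + 1)|W|/n, with equality only for k = 0.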
We will also use the following lemma.
Lemma 2.4. Let G be a connected regular graph. Let W ⊂ U be subsets of vertices and k, s, m ≥ 0. Assume that Cap_k(W, U \ W) ≥ s. Then we can find at least L = ⌊s/(m for all j = 1, . . ., L.
Proof. We first observe that if A is a subset of W and v ∈ W \ A then by (9) we have that Secondly, we observe that if A_1, . . ., A_{L′} are disjoint sets so that With these two observations in place, we now perform an iterative construction of the subsets. We add vertices from ≥ m and so forth. By the second observation we deduce that we can continue this way.
In order to obtain useful lower bounds on the capacity, we state a well-known relationship between the capacity of a set A and the Green kernel summed over A. Given a set A ⊂ G and k ∈ N we define
M^{(k)}(A) = Σ_{x,y∈A} G^{(k)}(x, y), where G^{(k)}(x, y) = Σ_{t=0}^{k−1} p_t(x, y).
This is useful due to the following characterization of capacity.
Lemma 2.5. Let G be a connected regular graph. For all A ⊂ G,
Proof. The proof is the same as that of [9, Theorem 2.2], but instead considering a stationary starting point distributed according to π, noting that G^{(k)}(π, x) = k/n for all x ∈ G_n by transitivity, and specifically using the measure µ(x) = 1{x ∈ A}/|A|.
The following bound on E[M^{(k)}(P)], where P is a random walk path, will be useful.
Lemma 2.6. Let G be a connected regular graph with n vertices. Let m and k be two positive integers and let P be a random walk path of length
Proof. The proof goes by the same argument as in [14, Lemma 5.6].
Furthermore, in order to lower bound the relative capacity, we define the k-closeness of two sets U and
It follows from [28, Lemma 5.2] together with (9) that on any finite connected regular graph G, if W = X[0, T], where X is a random walk on G started at stationarity and T is a stopping time, then for any set U ⊂ G,
Lastly, recall the two bubble sums defined in Section 2.1. One of the uses of the capacity is to bound such bubble sums.
Claim 2.7. Let {G_n} be a sequence of graphs satisfying Assumption 1.4 and let W ⊂ G_n be a set of vertices such that Cap_{√n}(W) ≥ c. Then
Proof. This follows by exactly the same proof as in [25, Claim 3.14].

Stochastic domination properties
The UST enjoys the negative correlation property, i.e., the probability that an edge e satisfies e ∈ UST(G) conditioned on f ∈ UST(G) for some other edge f is no more than the unconditional probability. Moreover, Feder and Mihail showed that for every increasing event A that ignores f, the probability of A given f ∈ UST(G) is no more than the unconditional probability. This led to the following result; the same proof leads to a slightly more generalized version.
Lemma 2.10. Let (G, w) be a weighted network and suppose that (H, w′) is a network such that V(G) ⊆ V(H) and that for every edge
Later in the paper, we will apply Lemma 2.10 in the following context. To study UST(G) using Wilson's algorithm, it will sometimes be convenient to add an extra vertex to G called the sun, and for every vertex v ∈ G add an extra edge from v to the sun. We give well-chosen weights to these new edges and call the new graph the sunny graph. Lemma 2.10 tells us that the UST of the sunny graph, intersected with E(G), is stochastically dominated by UST(G). This idea was previously used in [35] and [28].
We will also make use of the following well-known lemma. Here G/A denotes the graph obtained from G by identifying all vertices in A with a single vertex. Lastly, let W be a set of vertices, and let A_1 and A_2 be disjoint subsets of W. In what follows we consider UST(G/W). Given an integer k and j ∈ {1, 2}, let I_j(k) denote the vertices of G that are connected to W in UST(G/W) by a path of length at most k such that the last edge on the path to W is an edge one of whose original endpoints belonged to A_j (including A_j itself). Also, let X_j = |I_j(k)|.
Claim 2.12. Let G be a finite connected graph, take any k ≥ 1 and let W, A_1, A_2 be as above. Then, for UST(G/W) and for every M ≥ 0,
E[X_2 · 1{X_1 > M}] ≤ E[X_2] · P(X_1 > M).
Proof. We will first show that for every v ∈ G, the events {X_1 > M} and {v ∈ I_2} are negatively correlated. Fix some v ∈ G such that v ∈ I_2 has positive probability. Condition on v ∈ I_2 and on γ_2, the path from v to A_2. The UST conditioned on W and γ_2 has the distribution of UST(G/(W ∪ γ_2)). Hence, by Lemma 2.11 we have that UST(G/(W ∪ γ_2)) is dominated by UST(G/W). Therefore, by Strassen's theorem [32], there exists a coupling of the two measures such that UST(G/(W ∪ γ_2)) ⊆ UST(G/W). This means that every vertex connected to W through A_1 by a path of length at most k in UST(G/(W ∪ γ_2)) will also be connected by the same path to A_1 in UST(G/W). Therefore, we have that
P(X_1 > M | v ∈ I_2, γ_2) ≤ P(X_1 > M).
Then by averaging over γ_2 and taking complements we obtain
P(X_1 ≤ M | v ∈ I_2) ≥ P(X_1 ≤ M).
Therefore, inverting using Bayes' rule, we have for every v with P(v ∈ I_2(k)) > 0 that
P(v ∈ I_2 | X_1 > M) ≤ P(v ∈ I_2).
Summing over v yields the result.

The lower mass bound
The starting point of the proof of Theorem 1.5 is the work of Peres and Revelle [28]; we restate their main result as Theorem 3.1. For the proof of Theorem 1.5 we take the same sequence β_n guaranteed to exist by Theorem 3.1. As we shall see in Section 6, the convergence of Theorem 3.1 is equivalent to what is known as Gromov-weak convergence, which does not imply GHP convergence. In order to close this gap in the abstract theory, Athreya, Löhr and Winter [6, Theorem 6.1] introduced the lower mass bound condition and proved that this condition together with Gromov-weak convergence is in fact equivalent to GHP convergence; we discuss this further in Section 6. The main effort in this paper is proving that the lower mass bound holds under Assumption 1.4; this is the content of the following theorem.
Theorem 3.2. Let {G_n} be a sequence of graphs satisfying Assumption 1.4 and let T_n be UST(G_n). For a vertex v ∈ T_n and some r ≥ 0 we write B_{T_n}(v, r) = {u : d_{T_n}(v, u) ≤ r}, where d_{T_n} is the intrinsic graph distance metric on T_n. Then for any c > 0 and any δ > 0 there exists ε > 0 such that for all n ≥ 1,
P( ∃ v ∈ T_n : |B_{T_n}(v, c√n)| ≤ εn ) ≤ δ.
In other words, the random variables (min_{v∈T_n} n^{−1}|B_{T_n}(v, c√n)|)^{−1} are tight.
In the rest of this section we prove Theorem 3.2, delegating parts of the proof to Section 4 and Section 5. For the rest of this section as well as Section 4, {G_n} is a sequence of graphs satisfying Assumption 1.4 and T_n is UST(G_n).

Bootstrap argument
The main difficulty in Theorem 3.2 is that it is global; that is, it requires a lower tail bound on the volumes of the balls around all vertices simultaneously. Our approach is to prove a strong enough local version of this bound, that is, a bound for a single vertex, and use a bootstrap argument to obtain a weaker (yet sufficient) global bound. The idea is to use the observation that if there is one vertex x ∈ T_n such that |B(x, r)| is small, then either |B(x, r/2)| is also small, or otherwise there are many vertices v ∈ B(x, r/2) such that |B(v, r/2)| is small. Provided that these two latter events are sufficiently less likely than the former, this allows us to define a sequence of events on the balls of dyadic radii |B(x, r/2^ℓ)|, each with strictly stronger tail decay than the previous one. We will iterate this observation enough times until the probability of the final event is o(1/n), at which point we will apply the union bound and conclude the proof. Thus, our goal will be to iteratively improve the bounds on (13), where x is a fixed vertex (our graphs are transitive so the choice of x does not matter) and ℓ = 0, . . ., N_n, where N_n, the number of iterations, will be chosen suitably as we now explain.
Since we will use Wilson's algorithm to sample branches in UST(G_n), it will be important in our arguments in Section 4 that the radius c√n/2^ℓ we consider at each step is significantly longer than the mixing time of a random walk on G_n. Therefore, we require that c√n/2^{N_n} is at least n^{1/2−α} (recall the constant α from Assumption 1.4), so the number of iterations N_n can be at most of order log n. We will see in the proof of Theorem 3.2 that for this bootstrap argument to work with only log n steps, it will be convenient to obtain bounds on (13) that are sub-polynomial in ε.
A natural strategy to bound the probability in (13) is to first sample a single branch joining x to a predefined root of UST(G_n), consider the volumes of balls in subtrees attached to this branch close to x, and show that the sum of these volumes is very unlikely to be too small. This strategy almost gives sufficiently strong tail decay, but there is one step at which the tail decay is not sub-polynomial. This problem arises in the first step, since there is a probability of order ε that the path joining x to a root vertex is of length less than √(εn).
This is not a fundamental problem, since if this path is short, then it means we just picked a short branch when longer branches to different roots were available. However, it is not convenient to condition on picking a long branch to a well-chosen root, since this conditioning reveals too much information about UST(G_n), which makes it difficult to control other properties of the branch, primarily its capacity and the capacity of its subsets. It is also inconvenient (though probably possible) to continue choosing a few more branches until we reach a certain length.
The simplest way we found to circumvent this issue is to first sample a branch Γ_n between two uniformly chosen vertices of G_n and perform the bootstrap argument discussed above conditioned on Γ_n and on the event that it is a "nice" path, a property we will define later that includes, amongst others, the event that Γ_n is not too short. Then, using Wilson's algorithm, we may sample other branches of UST(G_n) by considering loop-erased random walks terminated at Γ_n; thus Γ_n can be thought of as the backbone of UST(G_n), and provided Γ_n is sufficiently long we can sample the branch from x to Γ_n and consider its extension into Γ_n to make it longer if necessary. With this modified definition of a branch, it is then possible to prove a conditional sub-polynomial tail bound in ε for (13), and then to prove Theorem 3.2 by decomposing according to whether Γ_n is "nice" or not.
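The role of Wilson's algorithm here can be illustrated with a minimal, self-contained sketch (a generic implementation, not the paper's construction): branches are loop-erased random walks run until they hit the current tree, which initially consists of the backbone — simplified below to a single root on a small torus.

```python
import random

def loop_erase(path):
    """Chronological loop erasure: when the walk revisits a vertex,
    the loop just created is removed."""
    erased, pos = [], {}
    for v in path:
        if v in pos:
            for u in erased[pos[v] + 1:]:
                del pos[u]
            erased = erased[:pos[v] + 1]
        else:
            pos[v] = len(erased)
            erased.append(v)
    return erased

def wilson(adj, roots, rng):
    """Wilson's algorithm rooted at `roots`: run a random walk from each
    unvisited vertex until it hits the current tree, loop-erase the walk,
    and attach the resulting branch."""
    in_tree = set(roots)
    edges = []
    for start in adj:
        if start in in_tree:
            continue
        walk = [start]
        while walk[-1] not in in_tree:
            walk.append(rng.choice(adj[walk[-1]]))
        branch = loop_erase(walk)
        edges.extend(zip(branch, branch[1:]))
        in_tree.update(branch)
    return edges

# toy example: UST of the 4x4 discrete torus, "backbone" reduced to one root
n = 4
adj = {(x, y): [((x + 1) % n, y), ((x - 1) % n, y),
                (x, (y + 1) % n), (x, (y - 1) % n)]
       for x in range(n) for y in range(n)}
tree = wilson(adj, roots={(0, 0)}, rng=random.Random(0))
print(len(tree))  # a spanning tree of 16 vertices has 15 edges
```

Replacing `roots` by the vertex set of a sampled backbone path gives exactly the sampling scheme described above: every further branch is a loop-erased walk terminated at the backbone.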
Throughout the rest of this paper, and in accordance with Theorem 3.2, we fix c > 0 to be a small enough parameter and ε > 0, which can also be chosen small enough depending on c, and set, for any scale ℓ ∈ {0, …, N_n},

Theorem 3.3. Let {G_n} be a sequence of graphs satisfying Assumption 1.4, let T_n be UST(G_n) and denote by Γ_n the unique path between two independent uniformly chosen vertices. Then for any δ > 0 there exist c′, ε′ > 0 such that for all c ∈ (0, c′) and all ε ∈ (0, ε′) there exists N = N(δ, c, ε) such that for any n ≥ N we have, with probability at least 1 − δ:

(II) For any scale ℓ ∈ {0, …, N_n} and subsegment I ⊆ Γ_n with |I| = r_ℓ/3 we have that

Definition 3.4. For the rest of this paper, given c and ε as above, we denote by E_{n,c,ε} the intersection of the events in (I), (II), (III) of the above theorem.
Remark 3.5. The reader may notice that although c is fixed in Theorem 3.2 and (14), it is treated as a variable parameter in Theorem 3.3. This is intentional: we need |Γ_n| ≥ c√n to overcome the problem of branch length mentioned above, and we cannot ensure this with high probability when c is fixed, only when it is small. To prove Theorem 3.2, we start with a fixed c, but our first step is to reduce it if necessary so that the statement of Theorem 3.3 holds as well. We then prove the theorem with this smaller value of c. This poses no problem since the assertion of Theorem 3.2 with smaller c is stronger; this is also discussed in the proof of Theorem 3.2.
Next we assume that E_{n,c,ε} holds for some positive c and ε, and let x be a vertex of G_n. Let Γ_x denote the loop-erasure of the random walk path started at x and stopped when it hits Γ_n (if x ∈ Γ_n then Γ_x is empty). For an integer s ∈ (0, c√n) we denote by Γ_x^s the prefix of Γ_x of length s as long as |Γ_x| ≥ s; otherwise, i.e. if |Γ_x| < s, we denote by Γ_x^s the prefix of length s of the path in UST(G_n) from x to one of the two endpoints of Γ_n, chosen so that this path has length at least s. This is possible since, by part (I) of Theorem 3.3 and (9), if E_{n,c,ε} holds, then

If both endpoints of Γ_n can be used, we choose one in some arbitrary predefined manner. In Section 4.3 we will prove the following.

Theorem 3.6. Let {G_n} be a sequence of graphs satisfying Assumption 1.4 and let T_n be UST(G_n). Denote by Γ_n the unique path between two independent uniformly chosen vertices, and for a vertex x ∈ G_n and s > 0 let Γ_x^s be as described above. Then for any c > 0 there exist ε′ > 0 and a constant a > 0 such that for every ε ∈ (0, ε′) there exists N = N(c, ε) such that for any n ≥ N and any ℓ ∈ {0, …, N_n} we have

Given these two estimates we are now ready to proceed with the proof of Theorem 3.2. Our strategy is as follows. On the event

we can condition on Γ_n and Γ_x and then apply Lemma 2.4 with m = ℓ_n for all j = 1, …, L. Moreover, since the cardinality of ∪_{j=1}^{L} A_j is at most 5r_ℓ/6, the number of j's whose A_j is not too large is at least (2^13 e)^{-1} ε_ℓ^{-1/3}; we relabel the sets so that A_i for i = 1, …, (2^13 e)^{-1} ε_ℓ^{-1/3} have this upper bound on their size and forget about the other sets. Our aim will be to test each of the intervals {A_i} in turn to see if the trees hanging on A_i contribute at least r_ℓ^2 ε_ℓ to B(x, r_ℓ). We will test these intervals conditionally on Γ_x ∪ Γ_n and on the outcome of the previous tests. Here we encounter a significant difficulty, since the failure of some past tests introduces a complicated conditioning which we cannot access directly by contracting some edges.
To overcome this difficulty we proceed as follows. Conditioned on Γ_n ∪ Γ_x ⊂ UST(G_n), we contract Γ_n ∪ Γ_x to a single vertex (still remembering the original edge-set) to form the graph G_n/(Γ_n ∪ Γ_x). By the spatial Markov property of the UST [8, Proposition 4.2], UST(G_n) is distributed as the union of Γ_n ∪ Γ_x and the UST of this new graph. Before proceeding, we then add a new vertex called the sun, denoted by ⊙, to the graph G_n/(Γ_n ∪ Γ_x), and add an edge from every vertex to the sun with weight chosen so that a lazy random walk on (G_n ∪ {⊙})/(Γ_n ∪ Γ_x) jumps to the sun at the next step with probability 1/k_ℓ. Then, we identify the sun with Γ_n ∪ Γ_x, remembering the edges emanating from the sun. This ensures that when we run Wilson's algorithm on the remaining graph, rooted at the contracted vertex, random walks are always killed when they hit the sun, so typically they only run for time of order k_ℓ.
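The probabilistic effect of the sun can be emulated without modifying the graph at all: kill the walk at an independent geometric time with success probability 1/k. A toy sketch (on a cycle, with an illustrative value of k, not the paper's k_ℓ) checks that the number of ordinary steps before absorption is geometric with mean k − 1, so walks indeed run for time of order k:

```python
import random

def steps_until_sun(adj, start, k, rng):
    """Surrogate for the sunny graph: before every move the walk jumps to
    the absorbing sun with probability 1/k; returns the number of ordinary
    steps taken before absorption (a Geometric(1/k) count of failures)."""
    v, steps = start, 0
    while rng.random() >= 1.0 / k:
        v = rng.choice(adj[v])
        steps += 1
    return steps

rng = random.Random(0)
cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
k = 10
mean = sum(steps_until_sun(cycle, 0, k, rng) for _ in range(20000)) / 20000
print(mean)  # concentrates near k - 1 = 9
```

This is the same coupling used later in Section 5, where the sun's extra edges are traded for an independent Geo(1/k_ℓ) killing time.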
On the graph G_n/({⊙} ∪ Γ_n ∪ Γ_x) we will often say "hit A" when A is a subset of {⊙} ∪ Γ_n ∪ Γ_x. The meaning of hitting A in the graph G_n/({⊙} ∪ Γ_n ∪ Γ_x) is to hit {⊙} ∪ Γ_n ∪ Γ_x by traversing an edge whose original endpoint belonged to A. In some cases it will be convenient to start a random walk at a uniform vertex U of the original graph G_n and project the start point onto G_n/({⊙} ∪ Γ_n ∪ Γ_x); in this case "hit A" also includes the event U ∈ A.

By Lemma 2.10, conditionally on Γ_n ∪ Γ_x, we can couple the two USTs together such that every edge e not adjacent to the sun in the UST of the sunny graph is also an edge of UST(G_n). If we sample this UST and then remove ⊙ and its incident edges, we obtain several connected components, one of which contains x. By stochastic domination, the component containing x is a subset of UST(G_n). Therefore, let B_⊙(x, r_ℓ) denote the set of vertices connected to x by a path of length at most r_ℓ that does not intersect the sun after expanding {⊙} ∪ Γ_n ∪ Γ_x in the sunny graph. By stochastic domination, if we can prove a lower tail bound for B_⊙(x, r_ℓ) on the sunny graph, it automatically transfers to a lower tail bound for B_{T_n}(x, r_ℓ) on the original graph.
Recall that, given Γ_n ∪ Γ_x, each of the A_i's defined above is a subset of Γ_n ∪ Γ_x. When working on the graph G_n/({⊙} ∪ Γ_n ∪ Γ_x), we let I_i(k_ℓ) be the set of vertices connected to the contracted vertex in UST(G_n/({⊙} ∪ Γ_n ∪ Γ_x)) by a path of length at most k_ℓ such that the last edge on this path has an endpoint in A_i. Note that this is equivalent to being connected to A_i by a path of length at most k_ℓ not touching Γ_n ∪ Γ_x after expanding the path and separating ⊙ to obtain a subset of UST(G_n). We also include A_i in I_i(k_ℓ), define the corresponding events B_j^⊙, and (for notational convenience) interpret B_0^⊙ as an almost sure event. In Section 5 we will prove the following lemma.

Lemma 3.7. Conditionally on Γ_x ∪ Γ_n, let B_j^⊙ be as defined above on the graph G_n/({⊙} ∪ Γ_n ∪ Γ_x). Then for each j ≤ (2^13 e)^{-1} ε_ℓ^{-1/3}

This has the following immediate corollary.
Corollary 3.8. Let {G_n} be a sequence of graphs satisfying Assumption 1.4 and let T_n, Γ_n and Γ_x be as in the previous theorem. Then for any c > 0, any ε > 0, all n large enough and any ℓ ∈ {0, …, N_n}, we have

where b = (5e^2 · 2^18)^{-1}.
Proof. Given that Cap_{k_ℓ}(Γ_n ∪ Γ_x) is sufficiently large, we can condition on Γ_n ∪ Γ_x and obtain intervals (A_j)_{j ≤ (2^13 e)^{-1} ε_ℓ^{-1/3}} on the graph G_n/({⊙} ∪ Γ_n ∪ Γ_x) as described above. Applying Lemma 3.7, we then deduce that

To conclude, we average over Γ_n ∪ Γ_x, then transfer this result from B_⊙(x, r_ℓ) to B_{T_n}(x, r_ℓ) in UST(G_n) using the stochastic domination result of Lemma 2.10, as explained above.
We now have all the tools to prove Theorem 3.2.
Proof of Theorem 3.2. Let δ > 0. We define the events

for ℓ ∈ {0, …, N_n}. We decompose by writing

In what follows we show that, given δ > 0, we can find ε and c small enough and N large enough so that the sum above is at most 3δ. This yields the required assertion of the theorem, since the quantity P(∃x ∈ T_n : |B(x, c√n)| ≤ εn) is non-decreasing as c decreases.
We first apply Theorem 3.3 and find ε and c small enough and N large enough (depending on δ) that the first term is at most δ for all n ≥ N. To control the second term in (17), we note that if A_ℓ occurs, then |B(v, r_{ℓ+1})| ≤ ε_ℓ r_ℓ^2 = 16 ε_{ℓ+1} r_{ℓ+1}^2 for all v ∈ B(x, r_{ℓ+1}), and the number of such v is at least ε_{ℓ+1} r_{ℓ+1}^2. Therefore, using Theorem 3.6, Corollary 3.8 and Markov's inequality, we have for all n large enough that

By making ε smaller and N larger if necessary we can guarantee that the term in the parenthesis on the right-hand side is at most ε_ℓ^{10} for all n ≥ N. This shows that the sum can be made smaller than δ as long as ε is small enough and N is large enough.
Finally, for the third term, recall the definition of r_{N_n}: the corresponding quantity tends to 0 as n → ∞, so this term is smaller than δ as long as n is large enough. Provided n is sufficiently large, we have therefore bounded (17) by 3δ, concluding the proof. (We can then reduce ε if necessary so that the bound holds for all n ≥ 1.)
4 Proofs of Theorems 3.3 and 3.6

In this section we prove Theorem 3.3 and Theorem 3.6. Due to the results of [25], this essentially boils down to proving capacity estimates. In both cases, we bound capacity using Lemma 2.5. In Section 4.1 we prove two claims that we later use in the proofs of Theorem 3.3 and Theorem 3.6, in Section 4.2 and Section 4.3 respectively.

Two claims
In what follows we take z = 1/20 and assume that {G n } is a sequence of graphs satisfying Assumption 1.4.
Our first claim shows that, with very high probability, any loop-erased trajectory with bounded bubble-sum has a rather long subinterval which is derived from a (relatively) short segment of the random walk trajectory (which in turn will have a long subinterval with good M(k) and closeness values, by the subsequent claim).
Claim 4.1. Fix ψ > 0 and c > 0. There exists ε′ > 0 such that for every ε ∈ (0, ε′) there exists N such that for all n ≥ N and for all scales ℓ ∈ {0, …, N_n} the following holds. Let X be a random walk on G_n which is bubble-terminated (see Definition 2.1) with bubble-sum bounded by ψ and let Γ be its loop-erasure. Also fix j ∈ N and χ = min{z/(3ψ), 1/24}. Then, with probability at least

either |Γ| < jr_ℓ/24, or there exists t ∈ [(j − 1)r_ℓ/24, jr_ℓ/24] such that for all integers 1 ≤ m ≤ χε_ℓ^{-z}

Proof. We set M_ℓ = ε_ℓ^{-5z/3}. On the event that |Γ| ≥ jr_ℓ/24, we divide Γ[(j − 1)r_ℓ/24, jr_ℓ/24] into M_ℓ/24 consecutive disjoint subintervals of length r_ℓ/M_ℓ. For m ∈ {1, …, M_ℓ/24} we say that the m-th interval is good if

As we assumed that the bubble-sum is bounded by ψ, it follows from Claim 2.2 that, conditioned on Γ and the event {|Γ| ≥ jr_ℓ/24}, the events that the m-th interval is good are independent. Furthermore, by Claim 2.2, Claim 2.7 and Markov's inequality, the probability of each such event is at least

Hence, the probability that a given run of χε_ℓ^{-z} consecutive intervals are all good is at least

where we used the inequality 1 − x ≥ e^{−2x}, valid for x > 0 small enough. Since there are M_ℓ/24 intervals in total, we can form M_ℓ/(24 χε_ℓ^{-z}) disjoint such runs. Since the events are independent conditionally on Γ, we deduce that the probability that none of these runs contains only good intervals is at most

where in the last inequality we used the fact that χ = min{z/(3ψ), 1/24} by assumption.
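The run-of-good-intervals step above is a standard disjoint-blocks union argument: among M independent intervals, each good with probability p, the chance that no L consecutive intervals are all good is at most (1 − p^L)^{M/L}, since it is enough to consider M/L disjoint blocks. A toy sketch (with illustrative parameters M, L, p, unrelated to the quantities in the proof) compares this bound with a simulation:

```python
import random

def disjoint_block_bound(M, L, p):
    """P(no L consecutive good intervals) <= (1 - p**L)**(M // L):
    restrict to M // L disjoint blocks, each all-good independently
    with probability p**L."""
    return (1 - p ** L) ** (M // L)

def no_good_run_frequency(M, L, p, trials, rng):
    """Monte Carlo frequency of seeing no run of L consecutive good intervals."""
    misses = 0
    for _ in range(trials):
        good = [rng.random() < p for _ in range(M)]
        if not any(all(good[i:i + L]) for i in range(M - L + 1)):
            misses += 1
    return misses / trials

rng = random.Random(0)
emp = no_good_run_frequency(48, 3, 0.7, 4000, rng)
print(emp, disjoint_block_bound(48, 3, 0.7))
```

The empirical frequency sits below the bound, as it must: "no run anywhere" implies "no all-good disjoint block", and the disjoint blocks are independent.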
For the next claim, recall the definitions of M(k) in (10) and of Close_k(U, V) in (11). We show that, with very high probability, any random walk interval of length of order ε_ℓ^z log(ε_ℓ^{-1}) r_ℓ has a slightly shorter subinterval of length of order ε_ℓ^z r_ℓ, such that its value of M(k) and its closeness to the rest of the path are very close to their expected values given by Lemma 2.6 and (12). This is done by finding many well-separated intervals and employing the fast mixing of the graph to obtain independence.

Claim 4.2. Fix some χ, c > 0. There exists ε′ > 0 such that for every ε ∈ (0, ε′) there exists N such that for all n ≥ N and for all scales ℓ ∈ {0, …, N_n} the following holds. Let X be a random walk on G_n, started from stationarity. Let M > 0 and fix some interval

Proof. We write I = [t_I^−, t_I^+] and then further subdivide I into (χ/5) log(ε_ℓ^{-1}) segments of length 2ε_ℓ^z r_ℓ separated by buffers of length (1/2) ε_ℓ^z r_ℓ; that is, we set

for each non-negative integer j ≤ (χ/5) log(ε_ℓ^{-1}). It will be important soon that the buffers are longer than n^{2α/3} t_mix for all n large enough, by Assumption 1.4, (14) and (15). We also set

We condition on X avoid I and define for each j the event

Note that since |I| = (1/2) χ ε_ℓ^z log(ε_ℓ^{-1}) r_ℓ, we have that t_j^− − t_I^− ≤ r_ℓ/72 and t_I^+ − t_j^+ ≤ r_ℓ/72 for all j and all ℓ, provided ε is small enough. Thus,

Hence E_j implies that the interval I_j satisfies conditions (1)–(4). Note that the events {E_j}_j are not independent, but this is why we introduced the buffers. Let {Y_j}_{j ≤ (χ/5) log(ε_ℓ^{-1})} be independent random walks started from stationarity and run for time 2ε_ℓ^z r_ℓ, set

and note that, conditioned on X avoid I, the events {E_j^ind}_j are independent. Now by Lemma 2.6 and Assumption 1.4 we have (provided ε < 1 and c < 1/6, for example) that

It also follows from (12) that

and that

Consequently, by Markov's inequality and independence we get that

as long as ε is small enough depending on θ. To conclude, note that as long as n is large enough, we can couple the independent walks {Y_j} and {X[I_j]} so that

Indeed, assume we have coupled the first j − 1 pairs and condition on all these pairs. By the Markov property, and since the buffers between distinct I_j's are longer than n^{2α/3} t_mix, the starting point of X[I_j] is 2^{−n^{2α/3}}-close in total variation distance to the stationary distribution by (8). Therefore, we may couple it with the first vertex of Y_j so that they are equal with probability at least 1 − 2^{−n^{2α/3}} (see for instance [22, Proposition 4.7]). Moreover, once their starting points are coupled, we can run the walks together so that they remain coupled for the remaining 2ε_ℓ^z r_ℓ steps. Hence (21) holds for large enough n, and we combine it with (20) in a union bound to conclude that

Proof of Theorem 3.3
As mentioned in Section 2.4, to sample Γ_n we will use a coupling with the sunny graph. Consequently, it will be convenient to work with a path Γ_n^* = Γ_n^*(ζ) sampled as follows. Let T and T′ be two independent geometric random variables with mean ζ^{-1} n^{1/2}. Given T, let X be a random walk run for T − 1 steps started from u ∈ G_n and let LE(X) be its loop-erasure. Given T′ and X, run X′, a random walk started from v ∈ G_n, and terminate X′ after T′ steps. Write T_X for the minimum of T′ and the first time X′ hits X. Let Γ_n^* be the path between

Proof of Theorem 3.3. Our main effort is to show that part (II) holds with high probability. Indeed, that parts (I) and (III) occur with probability at least 1 − δ, as long as ε > 0 and c > 0 are small enough, is a consequence of [25, Theorem 1.1 and Theorem 2.1].
Let δ > 0. We appeal to Lemma 4.3 and obtain ζ > 0 so that P(Γ_n ≠ Γ_n^*(ζ)) ≤ δ/4. Denote by B the event of part (II) of Theorem 3.3. For the rest of the proof we think of δ and ζ as fixed, set ψ = θ + 2ζ^{-2} and χ = min{z/(3ψ), 1/24}, and decrease both c and ε until we eventually obtain that P(B^c) ≤ δ. Recall that Γ_n^*(ζ) is generated using two independent random walks with geometric killing times, which we denote X and X′. Setting M = 8/(ζδ), we can thus write

The first event has probability at most δ/4 by the above. Since M = 8/(ζδ), the probability of the second event is also bounded by δ/4, by Markov's inequality. For the third term, first decrease c if necessary so that it is less than 1/(2M) (this will be useful at the end of the proof), then let ℓ ≤ N_n be a fixed scale and let I ⊂ Γ_n^*(ζ) be some segment with |I| = r_ℓ/3. It therefore has at least r_ℓ/6 vertices on either LE(X) or LE(X′), and hence contains at least one interval of the form LE(X)[(j−2)r_ℓ/24, (j+1)r_ℓ/24] or LE(X′)[(j−2)r_ℓ/24, (j+1)r_ℓ/24] (that is, an interval of the form [(j−1)r_ℓ/24, jr_ℓ/24] plus two buffers of length r_ℓ/24, one before and one after) for some j ≤ 24M√n/r_ℓ. Since Assumption 1.4 holds, we deduce from Claim 2.8 that X and X′ are bubble-terminated random walks with bubble-sum bounded by ψ. Hence we may apply Claim 4.1 and the union bound to learn that the probability that there exist a scale ℓ and a j as above such that the event of Claim 4.1 does not hold for X or X′ is at most

which can be made smaller than δ/4 by decreasing ε appropriately. Thus we may assume without loss of generality that I contains an interval of the form LE(X)[(j−2)r_ℓ/24, (j+1)r_ℓ/24] for some j that we fix henceforth, and that there exists a time t ∈ [(j−1)r_ℓ/24, jr_ℓ/24] such that for all integers m satisfying 1 ≤ m ≤ χε_ℓ^{-z}

We write X[t_1, t_2) for the corresponding part of X; that is, we set t_1 = λ_t(X) and

We now apply the union bound using Claim 4.2 with W = X′[0, M√n] to get that the probability that there exist a scale ℓ and an i for which the corresponding interval fails is at most

which can be made smaller than δ/4 by decreasing ε appropriately. Therefore we henceforth assume that all such intervals contain a good subinterval satisfying (1)–(4) of Claim 4.2.

Since [t_1, t_2] must contain an interval of the form appearing in (22), there must exist some m^* ≤ χε_ℓ^{-z} such that

We set A = (LE(X))[t + (m^* − 1)ε_ℓ^z r_ℓ, …]. Due to the buffers of length r_ℓ/24 present at the beginning and end of the interval,

where we used

Consequently, since we chose c < 1/(2M) and z = 1/20, we can reduce ε if necessary so that

as required. Finally, to cover the case where I is primarily contained in LE(X′) rather than LE(X), note that we can reverse the roles of X and X′ above to obtain a fourth contribution of δ/4 to the probability. This concludes the proof.

Proof of Theorem 3.6
We assume that E_{n,c,ε} holds and let ℓ ≤ N_n be a fixed scale throughout the proof. The proof will involve applications of Claim 4.1 and Claim 4.2; for these we take ψ = θ + c^{-2}, χ = min{z/(3ψ), 1/24}, M_ℓ = ε_ℓ^{-1/10} and W = Γ_n. These four variables retain these values throughout the proof.
Proof. Let x be some vertex of G_n, let X be a random walk started from x, and let τ_{Γ_n} denote the time at which X hits Γ_n, so that Γ_x = LE(X[0, τ_{Γ_n}]). We start by upper bounding the time until X hits Γ_n. On the event E_{n,c,ε} we have from Theorem 3.3 that
Consequently, taking a product over i ≤ M_ℓ, it follows that

Provided that ε is small enough as a function of c, this is much smaller than the required bound on the probability in Theorem 3.6; hence we work on the event {τ_{Γ_n} ≤ M_ℓ √n} for the rest of the proof. Furthermore, if |Γ_x| ≤ r_ℓ/3, then Γ_x^{5r_ℓ/6} contains a segment I ⊂ Γ_n with |I| = r_ℓ/3 (see the definitions above Theorem 3.6). Hence, on the event E_{n,c,ε}, it follows from Theorem 3.3(II) (which we proved in the previous subsection) that

so that the tail bound of Theorem 3.6 holds. We therefore also assume that |Γ_x| > r_ℓ/3. Under E_{n,c,ε} we have that Cap_{√n}(Γ_n) ≥ 2c, and hence by Claim 2.7 we have that

for all ℓ ≤ N_n, provided that ε is small enough as a function of χ and c (i.e., depending on θ and c). By the union bound and Claim 4.2, provided n exceeds some N(c, ε), the probability that all of these consecutive intervals contain a subinterval satisfying points (1)–(4) of Claim 4.2 is therefore at least

In particular, since any interval I ⊂ [0, M_ℓ √n] of length χ ε_ℓ^z log(ε_ℓ^{-1}) r_ℓ must contain an entire consecutive interval of the form above, we deduce that, provided n ≥ N(c, ε),

We next apply Claim 4.1 with j = 1 to obtain that, provided n ≥ N(c, ε), with probability at least

there exists t ≤ r_ℓ/3 such that for all 1 ≤ m ≤ χ ε_ℓ^{-z}

We write X[t_1, t_2) for the corresponding part of X, so that t_1 = λ_t(X), and, moreover, since we assumed that |Γ_x| > r_ℓ/3 and τ_{Γ_n} ≤ M_ℓ √n, we clearly have that t_2 ≤ M_ℓ √n. On the event E_{n,c,ε}, it therefore follows from (25) and (28) that the probability that [t_1, t_2] does not contain a subinterval J = [t_J^−, t_J^+] satisfying conditions (1)–(4) of Claim 4.2 is suitably bounded. For the rest of the proof we assume that such a J exists and that E_{n,c,ε} holds. By part (1) of Claim 4.2 and (27), for each 2 ≤ m ≤ χ ε_ℓ^{-z}

Therefore there must exist some m^* ≤ χ ε_ℓ^{-z} such that
Note that, by construction, it holds that A ⊂ Γ_x^{r_ℓ/3}, that |A| = ε_ℓ^{5z/3} r_ℓ and that A ⊂ X[t_J^−, t_J^+]. Since M^{(k_ℓ)}(A), as defined in (10), is monotone with respect to A, we have that

Since t_J^+ ≤ λ_{r_ℓ/3}(X) by construction, and W = Γ_n, it also follows that

Therefore, since Close_{k_ℓ}(·, ·) is monotone and subadditive in each argument (by definition and the union bound), applying (3)–(4) of Claim 4.2 we deduce that

(where we used the definition of k_ℓ) and, assuming without loss of generality that c, ε < 1/2, we obtain that
To summarize, we have shown that Cap_{k_ℓ}(Γ_x^{r_ℓ/3}) is large enough on the event E_{n,c,ε} whenever τ_{Γ_n} ≤ M_ℓ √n and the relevant events of Claim 4.1 and Claim 4.2 occur, so that we can find A as above. Theorem 3.6 therefore follows by taking a union bound over (24), (25) and (26), choosing ε′ small enough as a function of c, and requiring that n be large enough as a function of ε and c (since χ and ψ were themselves functions of c).
5 Proof of Lemma 3.7

In this section we prove Lemma 3.7. Throughout, we assume that the index n, the scale ℓ and the paths Γ_n ∪ Γ_x are fixed. We also adopt the setup of Section 3.1, as outlined above Lemma 3.7. This means that we condition on Γ_n ∪ Γ_x and add a sun ⊙ to the graph G_n/(Γ_n ∪ Γ_x), with weights chosen so that a lazy random walk jumps to the sun at the next step with probability 1/k_ℓ. We also assume that the intervals A_j ⊂ Γ_x for j = 1, …, (2^13 e)^{-1} ε_ℓ^{-1/3} are predefined as described in Section 3.1. For the rest of this section we work on the graph G_n/({⊙} ∪ Γ_n ∪ Γ_x). When we talk about capacity and relative capacity in this section, we are always referring to these quantities on the original graph G_n.
Recall also that, for each j ≤ (2^13 e)^{-1} ε_ℓ^{-1/3}, we let I_j(k_ℓ) be the set of vertices connected to the contracted vertex in UST(G_n/({⊙} ∪ Γ_n ∪ Γ_x)) by a path of length at most k_ℓ such that the last edge on this path has an endpoint in A_j. This also includes the vertices originally in A_j before the contraction. Since ℓ is fixed in this section, we also set X_j = |I_j(k_ℓ)|.
Claim 5.1. Assume that Γ_x and Γ_n satisfy (16) (and therefore (29)). Fix a scale ℓ and consider the graph G_n/(Γ_n ∪ Γ_x ∪ {⊙}) as described above. Then, for every j ∈ {1, …, (2^13 e)^{-1} ε_ℓ^{-1/3}},

Proof. By Wilson's algorithm, for every v ∈ G_n, we have that v ∈ I_j(k_ℓ) if a random walk starting at v hits Γ_n ∪ Γ_x ∪ {⊙} at A_j and its loop-erasure has length at most k_ℓ. Therefore,

where all hitting times refer to hitting times of the lazy random walk. First note that P(τ_⊙ > k_ℓ) = (1 − 1/k_ℓ)^{k_ℓ}, which is bounded below by a positive constant. Then, given τ_⊙ > k_ℓ, the lazy random walk until time k_ℓ is distributed as a lazy random walk on G_n/(Γ_n ∪ Γ_x). Since all degrees in G_n are equal, we get

and we conclude the proof using (29).
Recall that our goal is to find a lower bound for the probability that Σ_{i=1}^{j+1} X_i is large given that Σ_{i=1}^{j} X_i is small. To this end, let Φ_j be the (random) edge-set consisting of all simple paths of length at most k_ℓ in UST(G_n/({⊙} ∪ Γ_n ∪ Γ_x)) that end in the contracted vertex through A_1 ∪ … ∪ A_j. Note that Φ_j determines the event {Σ_{i=1}^{j} X_i ≤ 16 ε_ℓ r_ℓ^2}, and that conditioning on Φ_j = ϕ_j for some set of edges ϕ_j means precisely that the edges of ϕ_j are in the UST (open edges), and all other edges touching a vertex v of ϕ_j such that the path in ϕ_j from v to A_1 ∪ … ∪ A_j has length at most k_ℓ − 1 must not belong to the UST (closed edges). These open and closed edges determine Φ_j. Thus, to condition on Φ_j = ϕ_j, we erase the closed edges and contract all the open edges to a single vertex which coincides with Γ_n ∪ Γ_x ∪ {⊙}, and call the remaining graph G_n(ϕ_j). By the spatial Markov property of the UST [8, Proposition 4.2], UST(G_n(ϕ_j)) together with ϕ_j is distributed precisely as UST(G_n/({⊙} ∪ Γ_n ∪ Γ_x)) conditioned on Φ_j = ϕ_j. Note that the event {Σ_{i=1}^{j} X_i ≤ 16 ε_ℓ r_ℓ^2} occurs if and only if |V(ϕ_j)| ≤ 16 ε_ℓ r_ℓ^2, where V(ϕ_j) is the set of vertices touched by ϕ_j.

Claim 5.2. Let ϕ_j ⊂ E(G_n) be such that P(Φ_j = ϕ_j) > 0 and |V(ϕ_j)| ≤ 16 ε_ℓ r_ℓ^2. Let γ be a simple path in G_n(ϕ_j) that ends at the contracted vertex. Let (Y_t)_{t≥0} denote a lazy random walk on G_n(ϕ_j) started from a uniform vertex U of the original graph G_n and killed upon hitting the contracted vertex of G_n(ϕ_j)/γ, that is, upon hitting the vertex corresponding to the contracted edges {⊙} ∪ Γ_n ∪ Γ_x ∪ ϕ_j ∪ γ. Denote by V(Γ_n ∪ Γ_x ∪ ϕ_j ∪ γ) the set of vertices of G_n touched by the edges in Γ_n ∪ Γ_x ∪ ϕ_j ∪ γ, and let M ⊂ V(Γ_n ∪ Γ_x ∪ ϕ_j ∪ γ) be a fixed subset of vertices of G_n. Then

(Recall here that to "hit M" means to hit the contracted vertex via an edge that originally led to M.)
Proof. Let ∆ = deg(G_n), i.e. the common degree of vertices in the original graph G_n (recall that by Assumption 1.4 all vertex degrees are equal), and let

In other words, V_bad is the set of all vertices of G_n, not in the contracted vertex of G_n(ϕ_j), that are adjacent to at least ∆/2 closed edges. Since |V(ϕ_j)| ≤ 16 ε_ℓ r_ℓ^2, the number of closed edges is at most 16∆ ε_ℓ r_ℓ^2. Hence the number of incidences between vertices and closed edges is at most 32∆ ε_ℓ r_ℓ^2, and each vertex in V_bad contributes at least ∆/2 to this count, so |V_bad| ≤ 64 ε_ℓ r_ℓ^2. Recall that, when we originally added the sun to G_n/(Γ_n ∪ Γ_x), we chose the weights so that the probability that a lazy random walk on G_n/(Γ_n ∪ Γ_x) jumps to the sun at the next step is always 1/k_ℓ. In the graph G_n(ϕ_j) we have now contracted some edges and closed some others. For any x ∈ G_n(ϕ_j), these operations can only increase the probability that Y will jump directly to the sun from the vertex x. Therefore, by coupling, we can separate the sun and its incident edges, and obtain an upper bound for P(Y hits M) by instead bounding the same probability for a lazy random walk on (G_n(ϕ_j)/γ) \ {⊙} with an independent Geo(1/k_ℓ) killing time. We denote this second lazy random walk by Y′. To control capacity on (G_n(ϕ_j)/γ) \ {⊙} we will need to work with the stationary measure on (G_n(ϕ_j)/γ) \ {⊙}, which we denote by π′. (The bound on |V_bad| above will then help us compare π′ with the uniform measure.) We define π′ on all of G_n by remembering the edges from before the contraction. In particular, this means that for u ∈ G_n we have

where N_cl(v) denotes the number of closed edges incident to v in G_n(ϕ_j).
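The double-counting behind the bound |V_bad| ≤ 64 ε_ℓ r_ℓ^2 can be displayed explicitly (constants as in the proof above):

```latex
\[
\#\{\text{closed edges}\} \;\le\; \Delta\,|V(\phi_j)| \;\le\; 16\,\Delta\,\varepsilon_\ell r_\ell^2 ,
\]
and since each closed edge is incident to two vertices,
\[
\sum_{v} \#\{\text{closed edges incident to } v\} \;\le\; 32\,\Delta\,\varepsilon_\ell r_\ell^2 ,
\]
while each $v \in V_{\mathrm{bad}}$ contributes at least $\Delta/2$ to this sum, whence
\[
|V_{\mathrm{bad}}| \;\le\; \frac{32\,\Delta\,\varepsilon_\ell r_\ell^2}{\Delta/2} \;=\; 64\,\varepsilon_\ell r_\ell^2 .
\]
```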
We now observe the following.
Also, for every u ∈ G_n, provided that c < 1/32 and ε < 1, we have that

In what follows, these two observations mean that we can switch between π′ and U, and vice versa, at the cost of a factor of 2. In particular, we can write

We will use Claim 5.2 to prove the following upper bounds.
Lemma 5.3. Let ϕ_j ⊂ E(G_n) be such that P(Φ_j = ϕ_j) > 0 and

Proof. We condition on Φ_j = ϕ_j throughout this proof, so our probability space is that of UST(G_n(ϕ_j)). To prove (i), take any v ∈ G_n(ϕ_j) \ {⊙}. By Wilson's algorithm on the graph G_n(ϕ_j), the probability P(v ∈ A_{j+1} | Φ_j = ϕ_j) is upper bounded by the probability that a lazy random walk started at v hits A_{j+1} before it hits the sun. If (Y_t)_{t≥0} is such a random walk starting from a uniform vertex of G_n, by Claim 5.2 and (29) we have that

where we also used the upper bound on |A_{j+1}| in (29).
To ease notation in the proof of (ii), we write P(·), E[·] and Var(·) for P(· | Φ_j = ϕ_j) and the corresponding expectation and variance. We have

Fix some v and rewrite the inner sum as

where U is a vertex chosen uniformly from G_n. We decompose the event v ∈ I_{j+1} according to γ_v, the path from v to A_{j+1} in G_n(ϕ_j), which has length at most k_ℓ, and obtain that

To compare P(U ∈ I_{j+1} | γ_v ⊆ UST(G_n(ϕ_j))) and P(U ∈ I_{j+1}), we note again from the spatial Markov property [8, Proposition 4.2] that, given γ_v ⊆ UST(G_n(ϕ_j)), the rest of UST(G_n(ϕ_j)) is the UST of the graph obtained from G_n(ϕ_j) by contracting γ_v. By coupling Wilson's algorithm run on each of the two graphs (G_n(ϕ_j) and G_n(ϕ_j)/γ_v), the difference between the two quantities can be upper bounded by the probability that a random walk starting from a uniform vertex of G_n hits γ_v before it hits the new sun ⊙. By Claim 5.2, this is bounded by

Plugging this into (30) and using (29) we obtain

Recall that Φ_j is the random edge-set induced by ∪_{i=1}^{j} I_i(k_ℓ). Under B_j^⊙, we have no information about the structure of Φ_j other than that |Φ_j| ≤ 16 ε_ℓ r_ℓ^2 (and this was important for the factorization in the proof of Corollary 3.8). However, in order to prove Lemma 3.7, we will need the following lower bound.

Lemma 5.4. It holds that

Proof. Recall that we are working on the graph G_n/(Γ_n ∪ Γ_x ∪ {⊙}). Suppose that Σ_{i=1}^{j} X_i ≤ 16 ε_ℓ r_ℓ^2, and note that this event can be written as the disjoint union over all possible ϕ_j such that P(Φ_j = ϕ_j) > 0 and |V(ϕ_j)| ≤ 16 ε_ℓ r_ℓ^2. When conditioning on Φ_j = ϕ_j for some ϕ_j, we work on the graph G_n(ϕ_j) defined above Claim 5.2. Note that by Lemma 5.3 we have, for every ϕ_j with |V(ϕ_j)| ≤ 16 ε_ℓ r_ℓ^2 and P(Φ_j = ϕ_j) > 0, that

Furthermore, by Claim 2.12 and Claim 5.1 we have that

Write E′ and P′ for the expectation and probability operators as required.
Lemma 5.5. Suppose ϕ_j is such that P(Φ_j = ϕ_j) > 0 and

Proof. The result is a straightforward application of Chebyshev's inequality, similar to [14, Lemma 6.13].
First note that it follows from Lemma 5.3(ii) that

where in the last inequality we used that E[X_{j+1} | Φ_j = ϕ_j] ≥ 2^9 ε_ℓ r_ℓ^2 by assumption. Using this again, we deduce that

Proof of Lemma 3.7. By Lemma 5.4, given that Σ_{i=1}^{j} X_i ≤ 16 ε_ℓ r_ℓ^2, we obtain, with probability at least

some ϕ_j as above. For every such ϕ_j, Lemma 5.5 gives that, conditionally on Φ_j = ϕ_j, we have X_{j+1} ≥ 16 ε_ℓ r_ℓ^2 with probability at least 1/2. We conclude that

as required.
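The Chebyshev step underlying Lemma 5.5 can be written out schematically. Write µ = E[X_{j+1} | Φ_j = ϕ_j] ≥ 2^9 ε_ℓ r_ℓ^2; the exact variance bound from Lemma 5.3(ii) is elided above, so the illustrative assumption Var ≤ µ²/4 is used below in its place:

```latex
\[
\mathbb{P}\bigl(X_{j+1} < 16\,\varepsilon_\ell r_\ell^2 \,\big|\, \Phi_j=\phi_j\bigr)
\;\le\; \mathbb{P}\bigl(|X_{j+1}-\mu| > \mu - 16\,\varepsilon_\ell r_\ell^2 \,\big|\, \Phi_j=\phi_j\bigr)
\;\le\; \frac{\operatorname{Var}(X_{j+1}\mid \Phi_j=\phi_j)}{\bigl(\mu - 16\,\varepsilon_\ell r_\ell^2\bigr)^2}.
\]
Since $16\,\varepsilon_\ell r_\ell^2 \le \mu/32$, the denominator is at least $(31\mu/32)^2$; under
the illustrative bound $\operatorname{Var} \le \mu^2/4$ this gives a failure probability at most
$\tfrac14\,(32/31)^2 < 1/2$, so $X_{j+1} \ge 16\,\varepsilon_\ell r_\ell^2$ with probability at least $1/2$.
```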
6 A criterion for GHP convergence

GP convergence
We first aim to establish the convergence stated in Theorem 3.1. Recall our definitions and notation from Section 1.1 (in fact this section can be seen as a direct continuation of Section 1.1).

Definition 6.1. Let (X, d, µ) and (X′, d′, µ′) be elements of X_c. The Gromov–Prohorov (GP) pseudodistance between (X, d, µ) and (X′, d′, µ′) is defined as

where the infimum is taken over all isometric embeddings φ : X → F, φ′ : X′ → F into a common metric space F.
Thus d_GP is a metric on X_c^GP, the space X_c in which we identify all mm-spaces at GP distance 0. There is a useful equivalent formulation of convergence of mm-spaces with respect to the GP distance. Given an mm-space (X, d, µ) and a fixed m ∈ N, we define a measure ν_m((X, d, µ)) on R^(m choose 2) to be the law of the pairwise distances (d(x_i, x_j))_{1≤i<j≤m}, where x_1, …, x_m are i.i.d. samples from µ.

Proof. Fix some x ∈ supp(µ) and ε > 0. Then µ(B(x, ε/4)) ≥ b for some b = b(x, ε) > 0. Put δ = min{b/2, ε/12}. By the GP convergence there exists N ∈ N such that for every n ≥ N there are isometric embeddings taking X_n and X to a common metric space (E, d_n′) such that the Prohorov distance between the pushforwards of their measures is smaller than δ. Therefore we may assume that X_n and X are both subsets of some common metric space. We abuse notation and write µ and µ_n in place of their respective pushforward measures. Since the GP distance is at most δ we get that b ≤ µ(B(x, ε/4)) ≤ µ_n(B(x, ε/4 + δ)) + δ.
Hence, taking the lim inf as n → ∞ and then letting δ → 0, we obtain that for all x ∈ X

lim inf_{n→∞} inf_{y∈X_n} μ_n(B(y, ε/2)) ≤ μ(B(x, ε)),

and the claim follows by taking the infimum over x ∈ X.
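The Prohorov-distance inequality used above, μ(A) ≤ μ_n(A^δ) + δ, can be checked by brute force on toy finite spaces. The following sketch (illustrative names and setup, not from the paper) returns the smallest ε on a supplied grid for which both one-sided inequalities hold for every subset A:

```python
import itertools

def prohorov_distance(points, dist, mu, nu, grid):
    """Brute-force Prohorov distance between two probability measures on a
    small finite metric space: the smallest eps in `grid` such that
    mu(A) <= nu(A^eps) + eps and nu(A) <= mu(A^eps) + eps for every A."""
    def one_sided(eps, p, q):
        for r in range(1, len(points) + 1):
            for A in itertools.combinations(points, r):
                # A^eps: the closed eps-neighbourhood of A
                blow_up = [y for y in points if any(dist(y, a) <= eps for a in A)]
                if sum(p[a] for a in A) > sum(q[b] for b in blow_up) + eps + 1e-12:
                    return False
        return True
    for eps in grid:
        if one_sided(eps, mu, nu) and one_sided(eps, nu, mu):
            return eps
    return None

# Example: point masses at 0 and 1 at mutual distance 1 are at Prohorov distance 1.
d = prohorov_distance([0, 1], lambda a, b: abs(a - b),
                      {0: 1.0, 1: 0.0}, {0: 0.0, 1: 1.0},
                      [0.25, 0.5, 0.75, 1.0])
```

The enumeration over all subsets is exponential, so this is only a didactic check, not a practical algorithm.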
We now state and prove the main goal of this section; as we state immediately afterwards, it readily shows that Theorem 3.2 implies Theorem 1.5.

Theorem 6.5. Let (X_n, d_n, μ_n) for n ≥ 1 and (X, d, μ) be random elements of X_c such that:
(i) (X_n, d_n, μ_n) → (X, d, μ) in distribution with respect to the GP topology;
(ii) for any c > 0, the sequence (m_c((X_n, d_n, μ_n))^{-1})_{n≥1} is tight.
Then (X_n, d_n, μ_n) → (supp(μ), d, μ) in distribution with respect to the GHP topology.
Proof. The metric space (X_c, d_GP) is separable (see [6, Figure 1]), hence by the Skorohod representation theorem there exists a probability space on which the convergence in (i) holds almost surely. We will henceforth work on this probability space, and may therefore assume that (X_n)_{n≥1} and X are embedded in a common metric space where d_P(X, X_n) → 0 almost surely. We will show that on this probability space we have (X_n, d_n, μ_n) → (supp(μ), d, μ) in probability with respect to the GHP topology, giving the required assertion.
Let ε, ε_2 > 0. By (ii), there exist some c_1 > 0 and N_1 ∈ N such that for every n ≥ N_1 we have. Hence by Fatou's lemma. That is, with probability larger than 1 − ε_2, we can find a (random) subsequence n_k such that for every k ∈ N we have m_{ε/2}((X_{n_k}, d_{n_k}, μ_{n_k})) = inf. Hence μ_n(B_{d_n}(x, 2ε)) > 0.
Proof of Theorem 1.5. Lemma 6.3 shows that the UST sequence converges in distribution with respect to d_GP to the CRT (X, d, μ), so that condition (i) of Theorem 6.5 holds. Theorem 3.2 verifies that condition (ii) holds, and lastly, it is well known (see [1, Theorem 3]) that supp(μ) = X. The conclusion of Theorem 6.5 thus verifies Theorem 1.5.

Comments and open questions
Combining with self-similarity of the CRT, Theorem 1.1 can also be used to recover the UST scaling limit in other settings. For instance, Theorem 2 of [3] entails that the branch point between three uniformly chosen points in the CRT splits the CRT into three smaller copies of itself, with masses distributed according to the Dirichlet(1/2, 1/2, 1/2) distribution, and where each copy is independent of the others after rescaling. This together with Theorem 1.1 shows the following.

Let G_n be the d-dimensional torus on (approximately) n vertices with d > 4. Sample a Dirichlet(1/2, 1/2, 1/2) random variable, that is, a triplet (Δ_1, Δ_2, Δ_3) on the 2-simplex with density proportional to (Δ_1 Δ_2 Δ_3)^{−1/2}. Conditioned on this, let G_{⌊Δ_1 n⌋}, G_{⌊Δ_2 n⌋}, and G_{⌊Δ_3 n⌋} be disjoint and attach each to an outer vertex of a 3-star. Let T_n be the UST on the resulting graph and μ_n the uniform measure on its vertices. Then (T_n, (β√n)^{-1} d_{T_n}, μ_n) → (T, d_T, μ).
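For concreteness, a Dirichlet(1/2, 1/2, 1/2) triplet can be sampled by normalizing three independent Gamma(1/2, 1) variables, a standard construction (the code below is an illustration, not taken from the paper):

```python
import random

def dirichlet_half(rng):
    """Sample (D1, D2, D3) ~ Dirichlet(1/2, 1/2, 1/2) by normalizing
    three independent Gamma(1/2, 1) random variables."""
    g = [rng.gammavariate(0.5, 1.0) for _ in range(3)]
    s = sum(g)
    return tuple(x / s for x in g)

rng = random.Random(0)
deltas = dirichlet_half(rng)  # a random point of the 2-simplex
```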
Next, building on the corollaries in Section 1.3, one can also ask finer questions about the structure of the UST in the mean-field regime. One in particular is the convergence of the height profile. This does not follow straightforwardly from the GHP convergence of Theorem 1.5, since that convergence only captures full balls of diameter √n with volumes of order n. (On the other hand, it is straightforward to prove convergence of the rescaled volume profile V_n(r) = Σ_{s≤r} H_n(s) from GHP convergence.)
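The relation V_n(r) = Σ_{s≤r} H_n(s) is simply a running sum of the height profile; as a trivial illustration (with hypothetical profile values):

```python
from itertools import accumulate

H = [1, 3, 5, 4, 2]      # hypothetical height profile H_n(s) for s = 0, ..., 4
V = list(accumulate(H))  # volume profile: V_n(r) = sum of H_n(s) over s <= r
```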
Next, our paper addresses the general mean-field case but leaves open the case of Z^4, the upper critical dimension. Here the mixing time is still of order n^{1/2}, but it was shown by Schweinsberg [30] that Gromov-weak convergence to the CRT holds with an additional scaling factor of (log n)^{1/6}. Our proof of the lower mass bound does not immediately transfer to the four-dimensional setting; however, it may be possible to achieve this using the recent results of Hutchcroft and Sousi [15].

Theorem 1.7 (Pointed convergence). Let {G_n} be a sequence of graphs satisfying Assumption 1.4, let T_n be a sample of UST(G_n) and let O_n be an arbitrary vertex of G_n. Denote by d_{T_n} the graph distance on T_n and by μ_n the uniform probability measure on the vertices of T_n. Then there exists a sequence {β_n} satisfying 0 < inf_n β_n ≤ sup_n β_n < ∞ such that the following holds. Let P_n^{(O_n)} denote the (random) law of a discrete-time SRW on UST(G_n), started from O_n, and let P^{(O)}(·) denote the law of Brownian motion on the CRT as constructed by Krebs, started from O.

Theorem 3.1 ([28, Theorem 1.2]). Let {G_n} be a sequence of graphs satisfying Assumption 1.4 and let T_n be UST(G_n). Denote by d_{T_n} the graph distance on T_n and by (T, d, μ) the CRT. Then there exists a sequence {β_n} satisfying 0 < inf_n β_n ≤ sup_n β_n < ∞ such that the following holds. For fixed k ≥ 1, if {x_1, ..., x_k} are uniformly chosen independent vertices of G_n, then the distances d_{T_n}(x_i, x_j)/(β_n √n) converge jointly in distribution to the k(k−1)/2 distances in T between k i.i.d. points drawn according to μ.
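Theorem 3.1 concerns the joint law of the k(k−1)/2 pairwise distances between i.i.d. samples, which is exactly the distance-matrix measure ν_m of Section 6. On a finite mm-space this law can be sampled as follows (an illustrative sketch; the toy space and names are not from the paper):

```python
import itertools
import random

def sample_distance_vector(points, dist, weights, m, rng):
    """Draw x_1, ..., x_m i.i.d. from the measure given by `weights` and
    return the (m choose 2) pairwise distances (d(x_i, x_j))_{i<j}."""
    xs = rng.choices(points, weights=weights, k=m)
    return [dist(a, b) for a, b in itertools.combinations(xs, 2)]

# Toy mm-space: 4 points on a path with the uniform measure.
rng = random.Random(0)
vec = sample_distance_vector([0, 1, 2, 3], lambda a, b: abs(a - b),
                             [1, 1, 1, 1], m=3, rng=rng)
```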

n^{1/2−α/10} and ε_{N_n} = ε n^{−α/5}, and use Theorem 3.6, Corollary 3.8 and the union bound to bound the relevant probability. Let G*_n(ζ) denote the graph obtained from G_n by adding an extra vertex ρ_n, known as the sun, and connecting it to every vertex v ∈ G_n with an edge of weight (deg v)ζ/(√n − ζ) (so that the probability of jumping to ρ_n at any step is ζ n^{−1/2}). It follows from Lemma 2.10 that the graph UST(G*_n) \ {ρ_n}, obtained from the UST of G*_n by removing ρ_n and its incident edges, is stochastically dominated by the UST of G_n. Therefore, there is a coupling between UST(G_n) and UST(G*_n) such that UST(G*_n) \ {ρ_n} ⊂ UST(G_n); moreover, if Γ*_n denotes the path between u and v in UST(G*_n), then Γ_n = Γ*_n in this coupling provided that ρ_n ∉ Γ*_n. Note that this sunny graph is different from the sunny graph used in the statements of Lemma 3.7 and Corollary 3.8; as outlined in Section 3.1, those refer to later stages of the overall proof strategy.
such a path exists; otherwise, let Γ̃*_n = ∅.

Lemma 4.3. For every δ > 0 there exists ζ > 0 such that for all large enough n there exists a coupling of Γ_n and Γ*_n(ζ) such that Γ_n = Γ*_n(ζ) and is non-empty with probability at least 1 − δ.

Proof. For every path Γ from u to v let H(Γ) be equal to Γ if ρ_n ∉ Γ and H(Γ) = ∅ if ρ_n ∈ Γ. Run Wilson's algorithm on the graph G*_n(ζ) initiated at the points ρ_n, u and then v, and note that the hitting time τ_{ρ_n} of ρ_n is a geometric random variable; moreover, given τ_{ρ_n}, the walk until time τ_{ρ_n} is distributed as a random walk on G_n. Consequently, H(Γ̃*_n) has the distribution of Γ*_n. By the discussion above, we can find a coupling of (Γ_n, Γ̃*_n) where these paths are equal whenever ρ_n ∉ Γ̃*_n. Under this coupling, (Γ_n, Γ̃*_n, H(Γ̃*_n)) are all equal with probability P(ρ_n ∉ Γ̃*_n). As H(Γ̃*_n) has the law of Γ*_n, this is in fact a coupling of Γ_n and Γ*_n in which the paths are equal with probability P(ρ_n ∉ Γ̃*_n). By [25, Claim 2.9], this probability tends to 1 as ζ → 0. For the final part of the claim, note that Γ*_n is clearly non-empty on this good event.
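The procedure in the proof, Wilson's algorithm rooted at the sun ρ_n in which each random-walk step jumps to the sun with probability ζ n^{−1/2}, can be sketched as follows (a toy implementation with illustrative names; `p_sun` stands in for ζ n^{−1/2}):

```python
import random

def loop_erase(path):
    """Erase loops from `path` in chronological order (loop-erased random walk)."""
    out, index = [], {}
    for v in path:
        if v in index:
            out = out[: index[v] + 1]  # truncate back to the first visit of v
            index = {u: i for i, u in enumerate(out)}
        else:
            index[v] = len(out)
            out.append(v)
    return out

def wilson_with_sun(adj, vertices, p_sun, rng):
    """Wilson's algorithm rooted at an extra 'sun' vertex: each walk step
    jumps to the sun with probability p_sun, else to a uniform neighbour.
    Returns the parent map of the resulting spanning tree of adj + sun."""
    SUN = "rho"
    in_tree, parent = {SUN}, {}
    for start in vertices:
        if start in in_tree:
            continue
        walk = [start]
        while walk[-1] not in in_tree:
            v = walk[-1]
            walk.append(SUN if rng.random() < p_sun else rng.choice(adj[v]))
        branch = loop_erase(walk)
        for a, b in zip(branch, branch[1:]):
            parent[a] = b
            in_tree.add(a)
    return parent

# Toy run on a 6-cycle; deleting "rho" from the result corresponds to
# the forest UST(G*_n) \ {rho_n} considered above.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
parent = wilson_with_sun(adj, range(6), p_sun=0.3, rng=random.Random(1))
```

In the proof, the branches started at u and then v play the role of the first two runs of this loop, and the hitting time of the sun is geometric because each step jumps to it independently with the same probability.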