Historical Lattice Trees

We prove that the rescaled historical processes associated to critical spread-out lattice trees in dimensions d > 8 converge to historical Brownian motion. This is a functional limit theorem for measure-valued processes that encodes the genealogical structure of the underlying random trees. Our results are applied elsewhere to prove that random walks on lattice trees, appropriately rescaled, converge to Brownian motion on super-Brownian motion.


Introduction and Main Results
In the past three decades, various critical high-dimensional spatial branching models have been conjectured or proved to converge to super-Brownian motion (SBM), which is a continuous Markov process taking values in the space of finite measures on R^d. One of the settings in which significant progress has been made is that of critically weighted (and sufficiently spread-out) lattice trees (LT) above 8 dimensions [7,10,11,17–19]. In particular, convergence on path space has recently been proved in this setting (see [11]). For LT's, convergence to SBM means weak convergence to SBM of the rescaled empirical measure process of the locations in the LT which are a given tree distance from the root. Hence the tree distance to the root plays the role of time for the stochastic processes. More recently, it has been proved in [20] that for LT's, and in fact for several lattice models, the rescaled ranges (for LT's the range is the compact set of vertices in the tree) converge weakly to the range of SBM. Convergence of genealogical observables does not follow from the notions of weak convergence to SBM described thus far. Results of this kind can be obtained by proving convergence of the corresponding "historical processes" [6]. For LT's this would mean that, instead of just having the convergence to SBM of the rescaled empirical measure process of the particles in the LT as a function of the distance from the root, one establishes convergence to historical Brownian motion (HBM) of the rescaled empirical measure process for the entire paths in the LT to the endpoints, as a function of the distance from the root. HBM, constructed in [6], is a process taking values in the space of finite measures on R^d-valued paths, which at time t is the empirical measure of the past histories of the particles contributing to the SBM at time t. See Sect.
1.2.1 below for more about HBM, including the fact that it is the weak limit of the rescaled historical processes associated with branching Brownian motion (Theorem 1.3). Our main result, Theorem 1.4 below, establishes this convergence of "historical processes" for LT's.
In Sect. 2.1 we give a set of general conditions that are sufficient for convergence of discrete-time historical processes to HBM in the sense of finite-dimensional distributions (Theorem 2.1). Most of these conditions are already known to hold for a range of lattice models above the critical dimension, including lattice trees (d > 8) and oriented percolation (d > 4), as well as for the voter model (d ≥ 2) and the contact process (d > 4), both of which are continuous-time models. The main condition that remains to be proved in each case is convergence of the joint characteristic functions of the increments of a finite-dimensional subtree. These detailed r-particle transforms can be seen as enriched versions of the r-particle transforms studied e.g. in [13,16,17] (called Fourier transforms of (r + 1)-point functions therein) that record genealogy. We prove that these conditions are satisfied for sufficiently spread-out lattice trees in high dimensions and so establish convergence to HBM in the sense of f.d.d.'s (Proposition 2.4). The required asymptotics of the detailed r-particle transforms are obtained via the lace expansion (see e.g. [25]) in Sect. 4. It is worth noting that these asymptotics can be understood from those of the usual r-particle transforms and the detailed 1-particle transform. In particular we do not require any new "diagrammatic estimates". We believe that all of the conditions can also be verified for the other models mentioned above. For the voter model this is currently work in progress [1].
The second main ingredient in our proof is a novel tightness argument for historical processes which upgrades f.d.d. convergence to convergence on path space in a historical setting. This step is carried out in Sect. 3. We start with an abstract tightness result in a general historical setting (Theorem 3.6). For all of the lattice models mentioned above this reduces tightness of the approximating rescaled historical processes to that of the R-valued processes obtained by integrating a test function (from an appropriate class) with respect to the rescaled historical processes. (Verification of the other conditions may be found in [20].) This key condition is then verified for LT's with some effort in Proposition 3.11. The main ingredients of this argument are tightness of the total mass process from [11] and a uniform modulus of continuity for the approximating historical paths from [20]. The latter is in fact verified for all of the other lattice models mentioned above, and so we have potentially reduced the problem of tightness for historical processes to that of the total mass process for a range of other lattice models.
A simple consequence of our results is that the unique path in the tree from the origin to a uniformly chosen vertex (called the backbone from the origin to that vertex) of distance n converges weakly to BM on path space (see [18,Theorem 1.3]). Another application of our results concerns the scaling limit of random walk on lattice trees. In particular, the historical convergence proved herein is used in [21] to verify certain conditions of Ben-Arous et al. [2] which imply that random walk on lattice trees converges to a BM on a SBM cluster.

Lattice trees and scaling limits.
A lattice tree is a finite connected set of lattice bonds containing no cycles (see Fig. 1).
We will be considering lattice trees on Z^d with bonds connecting any two vertices that live in a common ball (in the ℓ^∞ norm) of sufficiently large radius L ∈ N, and with d > 8.
To be more precise, let d > 8 and let D(·) be the uniform distribution on a finite box. The assumption of uniformity of D is not essential. We expect that the results herein hold for D as in [17, Section 1].
For a lattice tree T containing o, define W_{z,D}(T) = z^{|T|} ∏_{e∈T} D(e), where the product is over the edges in T and |T| is the number of edges in T. Remark 1.1. If T is an edge-disjoint union of subtrees, then W_{z,D}(T) factors into a product over the weights of the subtrees.
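The factorization in Remark 1.1 is mechanical but worth seeing concretely. The Python sketch below is our own illustration: a constant toy kernel `D` stands in for the uniform-on-a-box kernel of the text, edges are encoded as vertex pairs, and all names are ours, not the paper's.

```python
from math import isclose

def tree_weight(z, D, edges):
    """W_{z,D}(T) = z^{|T|} * prod_{e in T} D(e), with |T| = number of edges."""
    w = z ** len(edges)
    for e in edges:
        w *= D(e)
    return w

# Toy constant step kernel (illustrative stand-in for the uniform kernel D).
D = lambda e: 0.5

# Two edge-disjoint subtrees of a small tree on Z: the weight of their
# union factors into the product of the subtree weights (Remark 1.1).
T1 = [(0, 1), (1, 2)]      # edges given as vertex pairs
T2 = [(0, -1)]
z = 0.9
assert isclose(tree_weight(z, D, T1 + T2),
               tree_weight(z, D, T1) * tree_weight(z, D, T2))
```

The check holds for any kernel and any edge-disjoint decomposition, since both the edge count and the product over edges split across the pieces.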
It turns out (see e.g. [10,17]) that there exists a unique critical value z_D such that ρ = Σ_{T∋o} W_{z_D,D}(T) < ∞ and E[|T|] = ∞, where P(T = T) = ρ^{-1} W_{z_D,D}(T) for T ∋ o. Hereafter we write W(·) for the critical weighting W_{z_D,D}(·) and suppose that we are selecting a random tree T ∋ o according to this critical weighting.
Let T be a lattice tree containing o ∈ Z^d, and for m ∈ N, let T_m denote the set of vertices in T of tree distance m from o. In particular, T_0 = {o}, and for any x ∈ T_m there is a unique path from o to x in the tree, of length m. Roughly speaking, in this paper we consider the weak limit (as m → ∞) of rescaled paths of this kind in high dimensions. For t ∈ R_+ \ Z_+ define T_t = T_⌊t⌋. For t ≥ 0 and x ∈ Z^d we will write (t, x) ∈ T to mean that x ∈ T_t. The notation (t, x) is consistent with that in [20], while in the oriented percolation and contact process literature often (x, t) is used instead.
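Since a lattice tree contains no cycles, tree distance from the root coincides with graph distance, so the generation sets T_m can be computed by breadth-first search. A minimal Python sketch on a hypothetical four-vertex tree in Z^2 (the adjacency map and all names are illustrative, not from the paper):

```python
from collections import deque

def generations(root, adj):
    """Return {m: set of vertices at tree distance m from root}, via BFS."""
    dist = {root: 0}
    q = deque([root])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    gens = {}
    for v, m in dist.items():
        gens.setdefault(m, set()).add(v)
    return gens

# A small lattice tree on Z^2, edges encoded as an adjacency map.
adj = {
    (0, 0): [(1, 0), (0, 1)],
    (1, 0): [(0, 0), (2, 0)],
    (0, 1): [(0, 0)],
    (2, 0): [(1, 0)],
}
g = generations((0, 0), adj)
assert g[0] == {(0, 0)}            # T_0 = {o}
assert g[1] == {(1, 0), (0, 1)}    # T_1
assert g[2] == {(2, 0)}            # T_2
```

Because the graph is a tree, the BFS distance to each vertex is realised by its unique path from the root.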
Functional limit theorems. For our general discussion we require the notion of weak convergence of finite measures on Polish (i.e. complete, separable metric) spaces. We refer the reader to [8,Chapter 3] for further details on what we discuss below.
For a Polish space P, let M_F(P) (resp. M_1(P)) denote the space of finite (resp. probability) measures on the Borel sets of P. For a sequence ν_n ∈ M_F(P) we say that ν_n converges weakly to ν ∈ M_F(P), and write ν_n →w ν, if ∫ f dν_n → ∫ f dν for every bounded continuous f : P → R. Equipped with the Vasershtein metric, which generates the topology of weak convergence, M_F(P) is also Polish (see e.g. [24, Ch. II]). We will use the notation E_ν[f(X)] for ∫ f(x)ν(dx), with the understanding that X ∈ P. This will be particularly convenient when X is a P-valued random variable defined on an underlying probability space and ν(·) = c · P(X ∈ ·) for some c > 0. Let S_n denote the location of a nearest-neighbour simple symmetric random walk on Z^d after n steps (starting from the origin o ∈ Z^d). Then E[S_n^2] = n (here and elsewhere, for x, y ∈ R^d we abuse notation and write xy to mean x · y, and hence x^2 to mean |x|^2) and the central limit theorem (CLT) states that n^{-1/2} S_n converges in distribution to a random vector Z that is (multivariate-)normally distributed with mean 0 ∈ R^d and covariance matrix diag(1/d). Define probability measures ν_n, ν on (the Borel sets of) R^d by ν_n(·) = P(n^{-1/2} S_n ∈ ·) and ν(·) = P(Z ∈ ·).
Phrased in the language of weak convergence of (finite) measures, the CLT says that ν_n →w ν. The statement ν_n →w ν in M_F(R^d) is well known to be equivalent to pointwise convergence of the characteristic functions (Fourier transforms), so for ν_n, ν as above, ∫ e^{ik·x} ν_n(dx) → ∫ e^{ik·x} ν(dx) for every k ∈ R^d. For a Polish space P let D_t(P) (resp. D(P)) denote the space of càdlàg paths (paths that are continuous from the right with limits existing from the left) mapping [0, t] (resp. [0, ∞)) to P. Let C_t(P) (resp. C(P)) denote the corresponding subspace of continuous paths. It is well known that there are complete metrics on these spaces (generating the Skorokhod J_1 topology) for which D_t(P) and D(P) are also Polish (see [8, Chapter 3.5]). The functional central limit theorem (FCLT) concerns the entire path (W^{(n)}_t)_{t≥0} defined by W^{(n)}_t = n^{-1/2} S_{⌊nt⌋}. Defined in this way, for each n, W^{(n)} jumps at times t = i/n for i ∈ N and is constant on intervals [i/n, (i+1)/n) for i ∈ Z_+. In particular the process W^{(n)} is a random element of the space D(R^d) of càdlàg paths from R_+ = [0, ∞) to R^d. The FCLT states that the sequence of rescaled random walks (W^{(n)}_t)_{t≥0} converges to a d-dimensional Brownian motion (B_t)_{t≥0} (with B_1 ∼ N(0, diag(1/d))). Phrased in the language of weak convergence of (probability) measures, this FCLT says that ν_n →w ν, where ν_n, ν ∈ M_1(D(R^d)) are defined by ν_n(·) = P((W^{(n)}_t)_{t≥0} ∈ ·), ν(·) = P((B_t)_{t≥0} ∈ ·).
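The identity E[S_n^2] = n and the rescaling W^{(n)}_t = n^{-1/2} S_{⌊nt⌋} can be checked in a toy computation. The Python sketch below is our own illustration (d = 1): it verifies the second moment exactly by exhausting all 2^n equally likely step sequences.

```python
from itertools import product

def W(S, n, t):
    """Rescaled walk W^(n)_t = n^{-1/2} S_{floor(nt)}: a cadlag step function.
    S is the list of walk positions S_0, S_1, ..., S_n."""
    return S[int(n * t)] / n ** 0.5

# Verify E[S_n^2] = n for the 1-d nearest-neighbour walk exactly, by
# summing S_n^2 over all 2^n equally likely sign sequences.
n = 10
total = 0
for steps in product((-1, 1), repeat=n):
    Sn = sum(steps)
    total += Sn * Sn
assert total == n * 2 ** n   # i.e. E[S_n^2] = n
```

The integer arithmetic makes the moment identity exact rather than approximate; the rescaled path `W` is the step function the FCLT is about.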
Note that ν puts all its mass on continuous paths.

(Fig. 2: The measure-valued process X^{(1)}_5 assigns masses to points in the tree at distance 5 from the root, while H^{(1)}_5 assigns the same masses to paths in the tree leading to these points.)

Every (m, x) ∈ T has associated to it an infinite càdlàg ancestral path w(m, x) which follows the unique path in T from o to x and is constant after time m. Denote the collection of ancestral paths for T by W = (w(m, x))_{(m,x)∈T}. For t ≥ 0 and x ∈ Z^d/√n such that √n x ∈ T_{nt}, we define w^{(n)}(t, x) ∈ D by diffusively rescaling the ancestral path to √n x. By [10,17] there exist constants C_A, C_V > 0 such that the asymptotics referenced below hold. Let C_0 = C_A^2 C_V, and let X^{(n)} and H^{(n)} denote the (rescaled) measure-valued "process" and historical "process" (see e.g. [6]) associated with the random lattice tree T, respectively. Note that X^{(n)}_t assigns mass to certain particles in the tree (but does not encode the genealogy) whereas H^{(n)}_t assigns mass to genealogical paths leading to those particles. See e.g. Fig. 2.
For φ : P → C and Y_t ∈ M_F(P) write Y_t(φ) = ∫ φ dY_t. Then for suitable φ the identity (1.5) relates H^{(n)} to X^{(n)}, and in particular (taking φ = 1) their total mass processes coincide. Define the survival/extinction time S as the first time at which the total mass vanishes. Due to the survival probability asymptotics (1.6), multiplying by n and working on the event that the process survives until time n is asymptotically the same (up to a constant) as conditioning on survival until time n (or rescaled time 1).
According to [24, Section II.7], for any γ, σ^2 > 0 (representing the branching rate and diffusion parameter respectively) there exists a σ-finite measure N = N_{γ,σ^2} on D(M_F(R^d)), the canonical measure of the associated superprocess, which is a (time-homogeneous) Markov process. The superprocess in question (called super-Brownian motion) is a measure-valued process that can be thought of as the empirical measures of an infinitesimal critical branching process whose spatial dispersion is governed by an R^d-valued Brownian motion with variance parameter σ^2. There is an analogous canonical measure N_H for the corresponding path-valued superprocess. The latter (as well as the process H underlying N_H) is called historical Brownian motion (HBM). The general construction of canonical measures for superprocesses may be found in [24, Section II.7], while Section II.8 therein shows how to consider the historical processes in this general framework. One can also construct N_H from the canonical measure of Le Gall's Brownian snake, since the historical process is a functional of the snake. See [22, pages 34, 64] for details.
It is proved in [11,17] that for lattice trees in dimensions d > 8 (with L sufficiently large) ν^{LT}_n →w N, where the parameters of N are γ = 1 and σ^2 = σ_0^2 = σ_0^2(L, d), which is to be discussed later. Since the limit is a σ-finite measure, ν_n →w N is defined in terms of weak convergence of a family of finite measures (indexed by t > 0) on D(M_F(R^d)), as in (1.9), or equivalently in terms of weak convergence of their conditional (on S > t) counterparts, which are probability measures. (The equivalence holds by (1.6), (1.7) and (1.8).) Similar results have been proved for other self-interacting branching systems such as the voter model [3,4] (d ≥ 2), oriented percolation (OP) [16] (d > 4), and the contact process (CP) [13] (d > 4), although for OP and CP only convergence of the finite-dimensional distributions has been established and tightness remains an open problem. The corresponding result for the historical processes (μ_n →w N_H) was an open problem in all of the above contexts. Here we resolve this problem for lattice trees (d > 8, and L sufficiently large), and, as was suggested above, our general approach may well also help in the other contexts above. A discussion of possible extensions and challenges for other models, including these, may be found in Sect. 1.3.

Main results.
In this section we state our main result (Theorem 1.4 below). For this, we first introduce some notation and present the relevant notions of weak convergence. We then introduce critical branching Brownian motion (BBM) as a simpler process from which one can understand the limiting historical Brownian motion through a corresponding historical limit theorem for rescaled BBM's, see Theorem 1.3. The latter follows easily from results in the literature as we will describe. Following this, we state our main result. Theorem 1.3 is also used in the proof of our main result by identifying the joint characteristic functions of the general moment measures for the limiting HBM in Proposition 2.6.
For a Polish space P, one should think of M^{EX}_1(P) as the space of excursion measures for càdlàg measure-valued paths, where the measures are on P. For μ ∈ M^{EX}(P) and s > 0, define the (probability) measure μ^s on D(M_F(P)) to be μ conditional on S > s.
For r ∈ N, t = (t_1, . . . , t_r) ∈ [0, ∞)^r and a finite measure κ on D(M_F(P)), let κ_t denote the (finite) measure on (M_F(P))^r given by the finite-dimensional marginals of κ at the times t_1, . . . , t_r.

(Fig. 3: On the left is the index set I drawn (with labels as edges) up to and including generation 3. On the right is an example of a Galton-Watson tree (with edge labels α), where e_α = 0 for all α ∈ {000, 0010, 0011, 01}, while e_α = 2 for α ∈ {0, 00, 001}. Note that we have dropped the parentheses and commas in the notation for elements of I to declutter the pictures.)

Branching Brownian motion.
A good way to understand historical Brownian motion is as a limit of critical branching Brownian motions. Recall that branching Brownian motion may be viewed as a system of Brownian motions run along the edges of a critical Galton-Watson tree. The notation introduced below is presented in [24] at a more leisurely pace. We start by defining a Brownian motion on a full binary tree. Let I denote the set of finite sequences α = (α_0, . . . , α_n) with α_0 = 0 and α_i ∈ {0, 1} for i ≥ 1, and for α as above set |α| = n, α|i = (α_0, . . . , α_i) for i ≤ n, and say β is an ancestor of α iff β = α|i for some i < |α|. If α, β ∈ I, the greatest common antecedent (gca) of α and β is α ∧ β = α|i, where i is the maximal integer such that α|i = β|i. If |α| > 0, the parent of α is πα := α|(|α| − 1). Let {W^α : α ∈ I} be iid d-dimensional Brownian motions with variance parameter σ^2. For a fixed n ∈ N (dependence on n is suppressed) and for α ∈ I, let B^α be the process obtained by running the motions W^{α|0}, . . . , W^{α||α|} in turn along successive time intervals of length 1/n, and note that (B^α_t)_{t≥0} is a d-dimensional Brownian motion, starting at 0, that runs until time (|α|+1)/n (after which it stays constant). We can view {B^α_t : t < (|α|+1)/n, α ∈ I} as a Brownian motion run on a rescaled binary tree with edge lengths 1/n. We next prune the binary tree to make it a critical Galton-Watson (G-W) tree. Let {e_α : α ∈ I} be a collection of iid random variables with (critical) binary offspring law ½δ_0 + ½δ_2 that is independent of {W^α : α ∈ I}. For a fixed n ∈ N (dependence on n is suppressed) and for α ∈ I, let τ_α ≤ (|α|+1)/n be the time at which the line of descent leading to α is killed by the offspring variables, and also define B^α to take the value Δ after τ_α. Here Δ is added to R^d as a cemetery point. In this way GW = {α : τ_α = (|α|+1)/n} labels the points (drawn as edges in Fig. 3) on a G-W tree with a critical binary offspring law that does not depend on n. We have scaled the edge lengths of the tree to be n^{-1} and write α ∼ t iff α ∈ GW and |α|/n ≤ t < (|α|+1)/n.

(Fig. 4: A (binary) branching Brownian motion in 1 dimension, with time on the x axis, drawn up to the third branch time, 3/n. In the corresponding G-W tree, the root 0 has two children, exactly one of which has 2 children.)
Therefore α ∼ t means that α labels an edge in the Galton-Watson tree which is alive at time t ≥ 0. In particular, 0 ∼ t for every t < 1/n, see Fig. 3. Finally {B α t : α ∼ t} for t ≥ 0 is a system of Brownian motions, starting with a single particle at the origin, and run along these edges while undergoing critical binary branching at times { j/n : j ∈ N}, with the motions being independent along the disjoint scaled edges in the G-W tree. Figure 4 gives a depiction of the system of Brownian motions in 1-dimension.
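Criticality of the offspring law ½δ_0 + ½δ_2 means each generation has expected size 1, which is what makes the rescaled empirical measures nontrivial in the limit. The Python sketch below is our own illustration: using exact rational arithmetic it computes the law of the generation sizes Z_k by convolution and checks E[Z_k] = 1.

```python
from fractions import Fraction
from math import comb

def next_gen(dist):
    """One generation of the critical binary GW law (1/2)d_0 + (1/2)d_2:
    each of the z individuals independently has 0 or 2 children, so the
    number having 2 children is Binomial(z, 1/2)."""
    new = {}
    for z, p in dist.items():
        for j in range(z + 1):
            q = p * Fraction(comb(z, j), 2 ** z)
            new[2 * j] = new.get(2 * j, Fraction(0)) + q
    return new

dist = {1: Fraction(1)}            # Z_0 = 1: a single root particle
for _ in range(5):
    dist = next_gen(dist)
mean = sum(z * p for z, p in dist.items())
assert mean == 1                   # criticality: E[Z_k] = 1 for every k
assert sum(dist.values()) == 1     # a probability distribution
assert dist[0] > Fraction(1, 2)    # extinction by generation 5 is likely
```

The last assertion reflects that a critical GW process dies out almost surely, so survival events must be reweighted (by n, as in the text) to see a limit.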
We define the scaled empirical measures X^{(n)} and H^{(n)} associated with these locations and historical paths, respectively, in the obvious manner. It is easy to extend the above definitions to the setting of a general mean 1, finite variance γ offspring law in place of the critical binary branching law above, where we have γ = 1 (see [24, Section II.3]). In this setting let ν^{BBM}_n = nP(X^{(n)} ∈ ·) and μ^{BBM}_n = nP(H^{(n)} ∈ ·). We believe that the following limit result was first proved in [24], although part (b) was not stated explicitly there. The original construction of N = N_{γ,σ^2} was carried out by Le Gall using his Brownian snake (see [22, Ch. IV] and the references therein), from which the result below was already clear enough. An easy consequence of the above and the obvious analogue of (1.5) for branching Brownian motion is that H projects down to super-Brownian motion.

Lattice trees in high dimensions.
Our main result is that the functional limit theorem for historical processes in (b) above continues to hold for lattice trees in high dimensions (the analogue of (a) was already noted in (1.9)). Recall the definition of μ_n from (1.7). Theorem 1.4. For each d > 8 there exists L_0 ≥ 1 such that for every L ≥ L_0, μ^{LT}_n →w N_H, with γ = 1 and diffusion parameter σ_0^2. Here, and throughout this work, the constant σ_0^2 is equal to vσ^2/d in [17, Theorem 3.7].
1.3. Discussion. We finish this section with a brief discussion of extensions and applications of our results, and commentary on possible extensions to other models.
Our results are extended in [21] and used in [2] to prove weak convergence of rescaled random walk on lattice trees to a Brownian motion on a super-Brownian motion cluster, the latter as defined in [5]. The paper [2] reduces this latter result to the verification of two conditions. Roughly speaking, the first of these conditions is that if one chooses K points at random in the lattice tree, then the spatial tree generated by these K points, suitably rescaled, converges (as the scaling parameter becomes large) to the random tree in R^d generated by choosing K paths independently at random according to ∫_0^∞ H_t(·) dt (normalized by its total mass). One interprets this convergence in an appropriate metric space. The weak convergence in Theorem 1.4 is extended in [21] to joint convergence with K independently chosen paths as above, and moreover one can include the branch times and path lengths, to eventually obtain the required spatial tree convergence. The second condition states that in a certain precise sense the vertices of the rescaled tree generated by the K points become dense in the full rescaled lattice tree, uniformly in the scaling parameter, as K becomes large. This is also verified in [21] by using one of the inputs of our tightness argument, namely the modulus of continuity from [20], as stated in Condition 3.4 below.
One may ask about historical convergence in other contexts. This is most natural in cases where there are existing notions of time and ancestry in the model. Such notions exist in the voter model, where the parent of (t, x) is the corresponding point (t , x ) from which (t, x) most recently updated its vote, and also in the contact process where the parent of an infected particle is the infected particle which most recently infected it. In his PhD thesis, Tim Banova is using the methodology of Sect. 2 to prove historical convergence of the voter model in dimensions d > 2 (for both nearest-neighbour and spread-out (finite range) models). We believe the methodology of Sect. 2 is also relevant for historical convergence of sufficiently spread-out contact processes for d > 4. Results for convergence of empirical measures associated with high-dimensional contact processes (but not in the historical context) have relied on a time-discretisation argument and analysis of oriented percolation (OP) (see [13]).
In the context of OP, there is a natural notion of time, but ancestral paths are not unique because there can be multiple connections between vertices. One possible "remedy" is for each site (n, x) of generation n in the cluster of the origin to choose a parent uniformly at random from among sites of generation n − 1 in the cluster that are connected to (n, x). We expect that the resulting historical process of sufficiently spread-out OP does converge to historical Brownian motion in dimensions d > 4, but note that this process does not encode every connection in the cluster of the origin.
Another approach that one could take (which would also be relevant for percolation and lattice animals) is to define ancestral paths only in terms of pivotal bonds for connections. Pivotal bonds for a connection from (0, o) to (n, x) in oriented percolation, and from o to x in percolation and lattice animals (if such a connection exists) have a natural temporal ordering, as all paths from point to point must pass through these pivotal bonds in the same order. One could then define historical paths by e.g. linearly interpolating between these pivotal bonds. After appropriate scaling we expect that these historical processes would converge to historical Brownian motion in dimensions larger than the respective critical dimension. Section 2 below would be relevant in each of these contexts.
As has already been noted, except for the voter model [3,4], tightness for any of these models has been a challenging problem even in the context of convergence of empirical measures to SBM, where it has only been established for high-dimensional lattice trees [11], with considerable effort. The proof of tightness for our historical lattice trees uses some bounds on the total mass of the rescaled LT's from [11], together with Conditions 2.3 and 2.4, which have also been verified in [20] for OP and the contact process. The additional special property of LT's that we use is a sub-Markov property, Lemma 3.15. It would be interesting to see if the proof of tightness can be carried out without this property: control of the total mass process would then suffice to prove tightness, even in the historical context, for both the contact process and OP. For percolation and lattice animals, tightness through this historical approach, without even a uniform modulus of continuity (Condition 2.3), still seems to be out of reach.
Finally, note that in this paper we have assumed that the step kernel D(·) is uniform on a large box. As noted earlier, the uniformity assumption is not essential. We suspect that D with unbounded support but more than two finite moments, together with d > d_c = 8, suffices for convergence to historical Brownian motion. In particular this ought to be true in the nearest-neighbour setting, but at present it would seem to be a considerable challenge (see e.g. [9]) to quantify some dimension d_0 above which this holds.

A general theorem.
In what follows we write N_H for N_H^{γ,σ^2}, where the branching variance γ > 0 and the diffusion parameter σ^2 > 0 are fixed throughout.
A collection G of bounded continuous functions from P to C is a determining class for M_F(P) if whenever μ, μ' ∈ M_F(P) satisfy ∫ g dμ = ∫ g dμ' for all g ∈ G, then μ = μ'. The following is the path-valued analogue of [19, Theorem 2.6]; its hypotheses include: (iii) weak convergence of the finite mean measures on D, and, for every ε > 0, μ_n(H_0(1) > ε) → 0; (iv) for every ℓ ∈ Z_+, every t ∈ (0, ∞)^ℓ, and every φ_1, . . . , φ_ℓ ∈ G, (2.1) holds. It follows that for any t ∈ (0, ∞)^ℓ, (μ^s_{n,t})_{n∈N} is tight in M_F(D). Assume μ ∈ M_F(M_F(D)) is a limit point of (μ^s_{n,t})_{n∈N}. Then it follows from (2.1) and Dominated Convergence that the limiting integrals agree with those of N^s_{H,t}, and by [8, Proposition 3.4.6] it follows that μ = N^s_{H,t}. Although this result is stated in [8] for G a set of real-valued functions, the fact that G is closed under complex conjugation allows one to see that it is also a determining class for complex-valued measures, and the proof in [8] carries over. For s = (s_0, . . . , s_m), where 0 = s_0 < · · · < s_m, and k = (k_0, k_1, . . . , k_m), define (see (2.3)) φ_{s,k}(w) = e^{ik_0·w(s_0)} ∏_{j=1}^m e^{ik_j·(w(s_j)−w(s_{j−1}))}, and let G* = {φ_{s,k} : s, k as above for some m ∈ N}. Note that G* is a determining class for M_F(D(R^d)) since finite measures on D(R^d) are determined by their finite-dimensional distributions, and the laws of these finite-dimensional random vectors are determined by characteristic functions of the appropriate dimension. The elements of G* are precisely those which correspond to the characteristic function of the increments of the path at all finite sets of times. Setting k = 0 we see that 1 ∈ G*, and by replacing k_j with −k_j we observe that G* is closed under complex conjugation. So G* satisfies the conditions on G in Theorem 2.1.

Remark 2.3.
Under N_H, H_t assigns mass only to paths that are constant from time t onwards and start at o at time 0. The same holds for H^{(n)}_t for all n, for LT and BBM. Therefore, when applying Theorem 2.1 in these settings, with G = G* as above, we may restrict our attention to φ_{s^{(1)},k^{(1)}}, . . . , φ_{s^{(ℓ)},k^{(ℓ)}} ∈ G* in part (iv) of the theorem, with k^{(i)} ∈ R^{d m^{(i)}}, and ignore the first factor in (2.3). Moreover we can without loss of generality assume that the largest element of s^{(i)} is t_i for each i (i.e., if not, we can append an extra component t_i to s^{(i)} and set the corresponding k^{(i)}_j equal to zero without changing φ_{s^{(i)},k^{(i)}}).
In the context of Theorem 1.4, we will use Theorem 2.1 with the determining class G* from the end of the previous subsection to first establish the following result, Proposition 2.4. Indeed, condition (i) of Theorem 2.1 trivially holds for lattice trees rooted at the origin. Condition (ii) of the theorem is (1.6). The first part of condition (iii) holds by [18, Theorem 2.1], and the second part is obvious because under μ^{LT}_n, H_0(1) = 1/(C_0 n). Condition (iv) of the theorem (for the determining class G*) will follow immediately from Proposition 2.6 and Theorem 2.7 below. In order to state these results we need to introduce various notation, which we proceed to do now.
The degree of a vertex in a graph is the number of incident edges. Vertices of degree 1 are called leaves. Vertices of degree ≥ 3 are called branch points.

Definition 2.5.
A non-degenerate shape is an isomorphism class of finite connected rooted tree graphs whose vertices all have degree 1 or 3, and whose r + 1 leaves (for some r ≥ 1) are labelled 0, 1, 2, . . . , r : the root 0 is always one of the leaves. To be more precise, two such graphs are considered to be the same shape if there is a graph isomorphism which preserves the labelling of the leaves (thus there is exactly one shape with 3 leaves and exactly 3 shapes with 4 leaves).
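The parenthetical counts (one shape with 3 leaves, three with 4) extend to a double-factorial formula: grafting the new highest-labelled leaf onto any of the 2r − 3 edges of a shape with leaves 0, . . . , r − 1 produces each shape with leaves 0, . . . , r exactly once, giving (2r − 3)!! shapes in total. A quick Python check of this recursion (our own illustration; the paper does not rely on this count):

```python
def num_shapes(r):
    """Number of non-degenerate shapes with r+1 labelled leaves: (2r-3)!!."""
    count = 1
    for j in range(2, r + 1):
        # leaf j can be grafted onto any of the 2j-3 edges of a shape
        # with leaves 0, ..., j-1, each grafting giving a distinct shape
        count *= 2 * j - 3
    return count

assert num_shapes(2) == 1    # exactly one shape with 3 leaves
assert num_shapes(3) == 3    # exactly 3 shapes with 4 leaves
assert num_shapes(4) == 15
```

The grafting step also explains why a shape with r + 1 leaves has 2r − 1 edges: each new leaf adds two edges (the new pendant edge plus the split of the host edge).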
We let Σ_r denote the set of non-degenerate shapes with r + 1 leaves. Any shape α ∈ Σ_r has r − 1 branch points, 2r vertices and 2r − 1 edges. Label the branch points as r + 1, . . . , 2r − 1 in order, as you encounter them as you move from the root to vertex 1, then continue to label new internal vertices in the order that you encounter them as you move from the root to vertex 2, and so on up to vertex r. See e.g. Fig. 5. This is just a convenient arbitrary but fixed order. For i, j ∈ {0, . . . , 2r − 1}, we abuse the notation for the usual order and let i ∧ j ∈ {0, . . . , 2r − 1} denote the greatest common antecedent (gca) of i and j. The edges e of α ∈ Σ_r are labelled as E(α) = {1, . . . , 2r − 1}, corresponding to the vertex label of the endvertex of e that is farthest from the root.

(Fig. 5: A depiction of a shape α ∈ Σ_4 with vertex labels above vertices and edge labels in brackets. The set of edges in the path from vertex 0 to vertex 1 is E_1(α) = {1, 5, 6}. Variables u_i are associated to each of the vertices i, describing a 'length' from 0 to i, to form T(α, u). Differences in these u_i are then the "edge lengths".)
For e, f ∈ E(α), write e ≺ f if e is an ancestor of f in α. For leaves ℓ ∈ {1, . . . , r}, let E_ℓ(α) be the set of edges in the unique path in α from the root 0 to ℓ.
For α ∈ Σ_r we assign edge lengths by letting u = (u_1, . . . , u_{2r−1}) ∈ (0, ∞)^{2r−1} give the distances from the vertices to the root. That is, u_i is the distance from the root to vertex i, and the edge lengths can be found by differencing. We let T(α, u) denote the resulting tree with shape α and edge lengths u. See Fig. 5. We often will specify the distances t = (t_1, . . . , t_r) ∈ (0, ∞)^r of the r leaves to the root in advance. In this case we let M(t, α) denote the set of possible vertex distances from the root. That is, M(t, α) denotes the set of u = (u_1, . . . , u_{2r−1}) ∈ (0, ∞)^{2r−1} such that: u_ℓ = t_ℓ for each leaf ℓ ∈ {1, . . . , r} (2.4), and if k and j are vertices of α and k is an ancestor of j in α, then u_k < u_j (2.5). Consider a given (non-degenerate) shape α ∈ Σ_r, t ∈ (0, ∞)^r, and u ∈ M(t, α) as above. Let s = (s^{(1)}, . . . , s^{(r)}) be as in Remark 2.3. For ℓ ∈ [r], e ∈ E_ℓ(α), and a ∈ {1, . . . , j_e}, define the 'subinterval' lengths š_{e,a} of edge e determined by the times s^{(ℓ)} falling across it (2.6).

(Fig. 6: Square symbols represent times s^{(1)}_j; similarly triangle symbols represent times s^{(2)}_j (with m^{(2)} = 6) and ⊗ symbols represent times s^{(3)}_j (with m^{(3)} = 3). In this example there is one point (on edge 5) that is both square and triangle simultaneously. The 'subinterval' lengths š_{5,i} are indicated for edge 5.)

The following proposition (proved in Sect. 2.2) gives an explicit formula for the right-hand side of (2.1). The integral over M(t, α) is actually an (r − 1)-dimensional integral over (u_{r+1}, . . . , u_{2r−1}), as the first r components are fixed. Proposition 2.6. For any r ∈ N, t ∈ (0, ∞)^r and φ^{(1)}, . . . , φ^{(r)} ∈ G* as in Remark 2.3, the explicit formula (2.7) holds. The following result is proved in Sect. 2.4 below.
Theorem 2.7. There exists L_0 such that for all L ≥ L_0, and all r ∈ N, t, and φ^{(1)}, . . . , φ^{(r)} ∈ G* as in Proposition 2.6, the corresponding lattice-tree quantity converges to the right-hand side of (2.7). Proof of Proposition 2.4. As noted after the statement of the proposition, we only need to verify condition (2.1) in Theorem 2.1 with G = G*, and this is immediate from Proposition 2.6 and Theorem 2.7.

Branching Brownian motion f.d.d. and proof of Proposition 2.6.
Definition 2.8. Let r ∈ N, let a shape with r leaves be given, and let t ∈ (0, ∞)^r and u ∈ M(t, ·). For each edge e let ℓ(e) ∈ {1, . . . , r} be the minimal leaf such that e ∈ E_{ℓ(e)}. Let (W^i_s)_{s≤t_i}, for i ∈ [r], be (dependent) d-dimensional Brownian motions with variance parameter σ², such that for any distinct i, j ∈ {1, . . . , r} we have (2.8) (recall that u_{i∧j} is the distance from the root to the gca of i and j), and such that (2.9) holds. We call (W^1, . . . , W^r) a tree-indexed BM with variance parameter σ² on T(·, u).
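The law in Definition 2.8 is mean-zero Gaussian, so it is determined by its covariance. A minimal sketch for d = 1 and r = 2, assuming the covariance of W^i_s and W^j_t for distinct leaves is σ²(s ∧ t ∧ u_{i∧j}) (our reading of (2.8); all helper names and numbers are illustrative), builds the covariance matrix and checks it is a genuine covariance:

```python
def tree_bm_cov(points, u_gca, sigma2=1.0):
    """Covariance matrix of (W^i_s) over a list of (leaf, time) points.
    For distinct leaves the covariance caps at the gca root-distance."""
    C = []
    for (i, s) in points:
        row = []
        for (j, t) in points:
            cap = min(s, t) if i == j else min(s, t, u_gca[frozenset((i, j))])
            row.append(sigma2 * cap)
        C.append(row)
    return C

def leading_minors_positive(C):
    """Sylvester's criterion via Gaussian elimination: for a positive
    definite matrix, all pivots encountered must be positive."""
    A = [row[:] for row in C]
    n = len(A)
    for k in range(n):
        if A[k][k] <= 0:
            return False
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return True

# Two leaves whose most recent common ancestor sits at root distance 0.7.
C = tree_bm_cov([(1, 0.5), (1, 1.0), (2, 1.0)], {frozenset((1, 2)): 0.7})
assert C[0][2] == 0.5   # before the branch point the two paths coincide
assert C[1][2] == 0.7   # after it, the cross-covariance freezes at u_gca
assert C == [list(r) for r in zip(*C)]   # symmetric
assert leading_minors_positive(C)        # positive definite
```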
(2.9) simply says that the Brownian motions run along the disjoint edges of T(·, u) are independent. Note that in (2.9) we could choose any ℓ such that e ∈ E_ℓ, by (2.8). We remark that the law of (W^1, . . . , W^r) is uniquely specified by the above (note that it is mean-zero Gaussian). Proof. This is an elementary calculation which divides the dependent Brownian increments on the left-hand side into smaller non-overlapping independent increments and keeps track of the Fourier coefficients multiplying each increment. The details are left to the reader. (2.10) Proof of Proposition 2.6. We work with the measures μ^BBM_n for branching Brownian motion, where the variance parameter is σ² > 0 and the offspring distribution is critical binary branching, i.e. (1/2)δ_0 + (1/2)δ_2, so that γ = 1. In this case, [23, Proposition 2.6(a)(i)] with φ = 1 and Doob's strong L^p inequality for martingales imply (2.11), together with the convergence of (H_{t_1}(φ^{(1)}), . . . , H_{t_r}(φ^{(r)})) under N^H_{t_1} (see, e.g., [8, Theorem 10.2 in Ch. 3]). Note also that for K large enough the corresponding moment bound holds (with exponent 2r). Therefore, the above together with (2.11) and dominated convergence imply (2.12). A moment calculation for branching Brownian motion, which uses Proposition 2.9 and is much simpler than that for lattice trees in Theorem 2.7, shows that the limit on the right-hand side above equals the right-hand side of the equality in the proposition. We sketch the proof, as it explains how the right-hand side of (2.7) arises. Let Z_+/n = {j/n : j ∈ Z_+}. Recall (1.10), and let I_t = {β ∈ I : |β| = t}. Fix t_1, . . . , t_r > 0 and consider only n large enough so that (2.13) holds, where in the last step we used the independence of the branching variables {e_β : β ∈ I} and the spatial motions {B^β : β ∈ I}. It is easy to see that the contribution to the above sum from β_1, . . . , β_r such that, for some i ≠ j, πβ_i is an ancestor of β_j, is bounded by C(r, K)/n for max{t_i : i ∈ [r]} ≤ K.
To see this, note that if πβ_i is an ancestor of β_j, then πβ_i is determined by β_j, since its length is nt_i − 1. This means there are only two possible values of β_i, and so, bounding this contribution by twice the (r − 1)-fold sum with each k^{(ℓ)} = 0 (so each φ^{(ℓ)} = 1) and applying (2.11), we obtain the above bound. Fix β := (β_1, . . . , β_r) ∈ I_{nt_1} × · · · × I_{nt_r} such that no index has a parent which is an ancestor of another index (in particular, all are distinct). Call such a β a good value of β. Then β uniquely determines a non-degenerate shape in which β_1, . . . , β_r label the r leaves, and one can define the internal vertices of the shape by locating the branch points on the path from the root to β_1, then the new branch points while proceeding from the root to β_2, and so on up to β_r. See e.g. Fig. 7. In this way we label the internal vertices by β_{r+1}, . . . , β_{2r−1}, using our labelling convention from Definition 2.5 (now with β_i in place of i). For example (assuming r > 1), β_{r+1} = β_1|κ_{r+1}, where κ_{r+1} = max{κ : β_1|κ = β_ℓ|κ for all ℓ > 1} ∈ {0, . . . , min_ℓ{|β_ℓ|} − 2} (the upper bound holds since β is good); one then continues down the branch towards β_1 until only one leaf (β_1) remains along the remaining tree. Note that each β_ℓ for ℓ > r is of the form β_i|κ for some i = i(ℓ) ≤ r and some κ < |β_i|, i.e. it is an ancestor of some β_i. We introduce tree distances u(β) = (u_1, . . . , u_{2r−1}) for the above shape. Recall that u_ℓ is the distance from vertex β_ℓ to the root, and so edge distances can be found by differencing.
[Fig. 7: on the left is (part of) a GW tree with β_1, β_2, β_3 indicated. Here |β_1| = |β_5| = 2, |β_2| = |β_3| = 3, and |β_4| = 1, and this contributes to (2.13) when t_2, t_3 ∈ [3/n, 4/n) and t_1 ∈ [2/n, 3/n) as depicted. On the right is the corresponding tree shape, whose edge lengths are determined by taking differences of the u_e, where u_4 = 1/n.]
Denote this tree shape with edge lengths by T(β). Note that the fact that β is good ensures that u_ℓ < |β_i|/n ≤ u_i whenever β_ℓ is an ancestor of β_i, for ℓ > r and i ≤ r. In fact, the possible values of u are now given by the discrete analogue of M(t, ·): u_k < u_j whenever β_k is an ancestor of β_j. (2.14) In this notation we use the fact that the ordering of the leaves given by t, the shape, and our convention for numbering internal vertices determine the ancestral relationships between the β_k, not the particular choice of β. The definition of u_ℓ for the internal branch points ℓ > r ensures that (2.15) holds. To see this, note that at a branch point β_ℓ = β_i ∧ β_j, for leaves i, j and ℓ > r, the Brownian paths B̂^{β_i} and B̂^{β_j} do not split apart and evolve independently until time (|β_ℓ| + 1)/n = u_ℓ. We now decompose the sum over good β in (2.13) according to its shape and edge lengths u. Abbreviating (β_1, . . . , β_r) ∈ I_{nt_1} × · · · × I_{nt_r} as β ∈ I^n_t, and writing β ⊂ GW for {β_1, . . . , β_r} ⊂ GW, the right-hand side of (2.13) becomes (2.16). Recall the notation (2.10). Choose a shape, u ∈ M_n(t, ·), and β ∈ I^n_t with this shape and with u(β) = u. Let N = N(·, u) ∈ Z_+ be the number of ancestors of β_1, . . . , β_r in the index set I. Note that N equals n times the sum of the (truncated) edge lengths (e.g. the left-hand side of Fig. 7); here we identify each edge of rescaled length 1/n with the index of its entry vertex in I. Therefore N is a function of the shape and u, as the notation suggests. It follows immediately that P(β ⊂ GW) = 2^{−N}, since β ⊂ GW if and only if each of these ancestors has two offspring.
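The identity P(β ⊂ GW) = 2^{−N} uses only that each of the N ancestors independently has two offspring with probability 1/2. A toy check by exact enumeration (the helper name is ours):

```python
from itertools import product

def prob_all_ancestors_branch(N):
    """Exact probability that N independent critical binary individuals
    (offspring law (1/2) delta_0 + (1/2) delta_2) each have two offspring.
    Enumerates all 2^N offspring configurations rather than simulating."""
    favourable = sum(
        1 for cfg in product((0, 2), repeat=N) if all(c == 2 for c in cfg)
    )
    return favourable / 2 ** N

for N in range(1, 8):
    assert prob_all_ancestors_branch(N) == 0.5 ** N
```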
It follows from this, (2.15), and Proposition 2.9, that (2.16) equals Here dropping the "good" requirement on β, at the cost of O 1 n , is again an easy calculation along the lines of that done earlier.
For a fixed shape and u ∈ M_n(t, ·), the number of choices of β ⊂ I with this shape and these edge lengths is 2^N, because there are two choices of offspring label for each of the N "ancestors" above. Therefore, combining the above equalities and letting n → ∞, the (r − 1)-fold Riemann sum converges to the (r − 1)-dimensional integral on the right-hand side of the proposition, and so the result follows from (2.12). For the Riemann sum convergence, we note that the u-dependence of the integrand admits only finitely many jump discontinuities.
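The final step, a Riemann sum on the mesh Z_+/n converging despite finitely many jump discontinuities of the integrand, can be illustrated numerically. The integrand below is an arbitrary stand-in, not the integrand of the proposition:

```python
import math

def riemann(f, a, b, n):
    """Left-endpoint Riemann sum of f on [a, b] with mesh (b - a)/n."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

# f jumps at u = 0.5; its exact integral over [0, 1] is still elementary.
f = lambda u: math.exp(-u) if u < 0.5 else 2.0 * math.exp(-u)
exact = (1.0 - math.exp(-0.5)) + 2.0 * (math.exp(-0.5) - math.exp(-1.0))

coarse = abs(riemann(f, 0.0, 1.0, 100) - exact)
fine = abs(riemann(f, 0.0, 1.0, 100_000) - exact)
assert fine < 1e-3       # a single jump only costs O(1/n)
assert fine < coarse     # and the error decreases with the mesh
```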

[Fig. 8: a depiction of the event in the detailed 1-particle function with n = 1, t = 6, s_1 = 1 and s_2 = 4, with the path s ↦ w_s(6, x_3) in bold; recall the notation from (1.1).]
We call the quantity P(x_m ∈ T_{nt}, ∩_{j=1}^{m}{w_{ns_j}(nt, x_m) = x_j}) a detailed 1-particle function (see e.g. Fig. 8), and the Fourier transform of the increments is called a detailed 1-particle transform, i.e.
Related quantities arising from expectations of the form (2.7) are called detailed r-particle transforms, and Theorem 2.7 amounts to verifying the appropriate asymptotics for these detailed r-particle transforms. When m = 1, the detailed 1-particle function is simply P(x_1 ∈ T_{nt}), and its Fourier transform becomes Σ_{x∈Z^d} e^{ik_1·x} P(x ∈ T_{nt}). These quantities are called the 1-particle functions (traditionally in the literature they have been called the 2-point functions, the two points being the origin o and x_1). For n ∈ Z_+^r and x = (x_1, . . . , x_r) ∈ Z^{dr} we can define the r-particle functions (see e.g. Fig. 9) and (their Fourier transforms) the r-particle transforms for k ∈ (R^d)^r. We write O(x) to denote a quantity whose absolute value is bounded by a constant times x. Using the inductive method of [12,14], one obtains (2.20).
[Fig. 9: a depiction of the event in the 3-particle function ρ_{(3,3,6)}.]
Recall that the constant C_A is equal to A in the paper [17], while σ_0² is equal to vσ²/d in [17]. The error terms in (2.20) (see [17, Theorem 3.7, Lemma 3.8]) depend on d and L, but are uniform in {k ∈ R^d : |k|² ≤ C log n} (where C depends on δ). Taking k = 0 above we see that, as claimed in Sect. 1.1, C_A = lim_{n→∞} E[|T_n|]. Asymptotics for the r-particle transforms are provided in [17, Theorem 1.14]; in particular, there exists C_V > 0, depending on D and d, such that the analogous asymptotics hold. Recall that the constant C_V in our paper is equal to Vρ² in [17]. Our task is to "upgrade" these kinds of results from [17] to get asymptotics for the "detailed" r-particle transforms. This is the focus of the next section.
2.4. The LT detailed r -particle transforms and proof of Theorem 2.7. Recall the labelling convention for internal vertices (branch points) and edges in from Definition 2.5.
A lattice tree T containing o, having r + 1 leaves (o = x_0 and x_1, . . . , x_r), r − 1 vertices x_{r+1}, . . . , x_{2r−1} of degree 3, and all other vertices of degree 2, has an associated abstract tree as follows: x_i ↦ i, and two vertices i, i′ are connected by a single edge if the shortest path from x_i to x_{i′} in T passes through no other x_j. All vertices of this abstract tree have degree 1 or 3. Relabelling the degree-3 vertices according to the labelling convention in Definition 2.5 gives an abstract shape, which is the shape of T and the points x_1, . . . , x_r (and o), and we write v_g ∈ {x_{r+1}, . . . , x_{2r−1}} for the vertex in T that is mapped to the branch point g of the shape.
Given ,y = (y e,i ) i∈[ j e ],e∈[2r −1] , andň = (ň e,i ) i∈[ j e ],e∈[2r −1] with eachy e,i ∈ Z d and eachň e,i ∈ N, letŤ ( ,y,ň) denote the set of lattice trees T o such that: i=1y e,i , and the shape of the minimal subtree T of T containing o and x 1 , . . . , x r is , and for each branch point g ∈ , the corresponding vertex v g is tree distance f ≺g from the root in T , and (**) for each ∈ [r ], each e ∈ E ( ), and each i e ∈ {1, . . . , j e }, the path from o to x in T passes through the point f ≺e Given n, , andň as above, andǩ The following proposition will be proved in Sect. 4.5 via modifications of [17,Theorem 4.8] (where each j e = 1) as indicated in [18]: where the constants in the error terms depend on L, δ, r, R, ( j e ) e∈[2r −1] and ε > 0.
The purpose of this section is to prove Theorem 2.7 using Proposition 2.11. We begin with generalisations of (2.17) and (2.18) (where r = 1). Fix r ≥ 1. Take expectations and work with the un-normalised functions w(t, x) = w(t, x)_T (a slight abuse of notation, since before w(t, x) was defined as a function of the random tree T) to see that (2.23) holds. Given x_1, . . . , x_r ∈ T, one can consider the minimal subtree containing the origin and these points. Typically this subtree has r − 1 branch points, connected to the root and to the points x_i according to an abstract (rooted) shape consisting of 2r − 1 edges and 2r vertices. Call this the shape associated to (T, x). Contributions from subtrees containing fewer than r − 1 branch points (arising if (i) the number of distinct elements of {x_1, . . . , x_r} is smaller than r, or (ii) the path in T to some x_i contains the path to some other x_j, or (iii) the most recent common ancestor of two x_j's is the origin, or (iv) some branch point of the subtree has degree more than 3) will constitute error terms (see e.g. (2.26) below), and such subtrees are said to have a degenerate shape. For a given (non-degenerate) shape and t = (t_1, . . . , t_r) ∈ (R_{>0})^r, recall the definition of M_n(t, ·) from (2.14) (now with vertex labels in place of the β_ℓ). For x ∈ (Z^d)^r, y = (y_{r+1}, . . . , y_{2r−1}) ∈ (Z^d)^{r−1}, and u ∈ M_n(t, ·), let T_n(·, t, u, x, y) denote the set of lattice trees T containing the origin and the points x_i ∈ T_{nt_i} for i ∈ [r], for which the shape associated to (T, x) is the given one, and such that for each branch point j = r + 1, . . . , 2r − 1 of the shape, the spatial and temporal location of the corresponding branch point in T is (y_j, nu_j). The main contribution to (2.23) is therefore (2.24). The modulus of each exponential is bounded by 1. Next, using (2.19) and neglecting the interaction between the parts of the tree corresponding to the 2r − 1 different edges of the shape, we get that, for any shape, the bound (2.25) holds. Remark 2.12.
Bounds similar to (2.25) hold in great generality. For any abstract rooted tree graph (call it a generalised shape) with edge set E*, and any set of temporal lengths (n_e)_{e∈E*} (with each n_e ∈ N) associated with those edges, the total weight of all lattice trees containing the origin and having vertices with the given spatial and temporal displacements and (n_e)_{e∈E*}, with the generalised shape of the connections to these points being the given one, summed over the spatial displacements, is at most K_0^{#E*}. This again is obtained by ignoring interactions between the parts of the trees corresponding to different edges of E*.
For degenerate shapes one also has (2.25) (in fact the exponent 2r − 1 can be reduced). However, in comparison with (2.24), degenerate shapes give rise to sums over fewer u_j's (at most r − 2, in fact), each of which takes at most nt̄ + 1 possible values. After summing over the finitely many degenerate shapes and summing over u, we may bound the version of (2.24) for degenerate shapes by (2.26). We conclude that the contributions to (2.23) from degenerate shapes are bounded in absolute value by Cn^{−1}(t̄ + 1)^{r−2}, while the main contribution from non-degenerate shapes is at most C(t̄ + 1)^{r−1}. Setting m^{(ℓ)} = 1 and k^{(ℓ)} = 0, we conclude the following as a special case. Lemma 2.13. For each r ∈ N there exists a constant C_r > 0 such that for all t_1, . . . , t_r ≥ 0, Given ε > 0, t, s, and a (non-degenerate) shape, let M_{n,ε}(t, ·, s) denote the set of u ∈ M_n(t, ·) for which (with u_0 := 0) either:
• there exist a leaf ℓ ∈ {1, . . . , r}, a branch point j ∈ {r + 1, . . . , 2r − 1} on the path from o to ℓ, and an i ∈ {1, . . . , m^{(ℓ)}} such that the corresponding time constraint holds; or
• there exist vertices i, j ∈ {r + 1, . . . , 2r − 1} of the shape, with i an ancestor of j, such that the corresponding edge is short.
Roughly speaking, these correspond to situations where there is branching on a path close to one of the observation times along the path, or where one of the edge lengths is short. Let M_{n,*}(t, ·, s) := M_n(t, ·) \ M_{n,ε}(t, ·, s). Then the sum over u in (2.24) can be split into a sum over u ∈ M_{n,*}(t, ·, s) and a sum over u ∈ M_{n,ε}(t, ·, s).
Using the same argument as for (2.26), we get that the absolute value of the sum over u ∈ M n,ε ( t, , s) is at most n r −1 Cε(t + 1) r −1 n −(r −1) = C(t + 1) r −1 ε. (2.27) We therefore turn our attention to the quantity . (2.28) We now define discrete analogues of the sets I following Definition 2.5. Recall the notation (2.10). Let n (a, ) .
which depends on , s, u, n and of course k.
If n ∈ N, ∈ r , s, and u ∈ M n, * ( t, , s) are given, this determinesň =ň( , s, u) as above. If we are given k as well then this also determinesǩ(n). By expressing locations of paths in terms of their spatial incrementsy = (y e,i ) i∈[ j e ],e∈[2r −1] (and recalling the definition ofŤ ( ,y,ň) given prior to (2.22)) we see that (2.28) is equal to  ( t, , s). Then, as for (2.27), we have that Recall the definition of (and its arguments) from (2.6). Below we will show that as n → ∞ (2.29) converges to Fix ∈ r and consider the quantity in (2.29) with fixed which can be written as e iǩ e,i (n) √ n ·y e,i P T ∈Ť ( ,y,ň( , u, s)) . (2.31) Then (2.31) is equal to where we recall thatǩ(n) depends on , s, u, n, k.
Proof of Theorem 2.7. Fix r, t and the φ^{(ℓ)} (hence k and s).
Let δ(s) > 0 denote the minimum difference between distinct values in s (recall that these include 0 and each t_ℓ). Let ε ∈ (0, (δ(s)/2) ∧ 1). Above (see in particular (2.26), (2.27) and (2.32)), we have shown that the left-hand side of (2.7) is equal to the displayed sum, where the constants in the O notation here depend only on t̄, r, L, d. By the definition of ň, each ň_{e,i} is equal to ⌊ns⌋ − ⌊ns′⌋ for some distinct s > s′ in s (or is equal to |⌊ns^{(ℓ)}_i⌋ − nu_j| for some branch point j on the path from o to ℓ in the shape, or to |nu_i − nu_j| for some i ≺ j). It follows from the definition of δ(s) and the fact that u ∈ M_{n,*}(t, ·, s) that ň_{e,i} > nε/2 for all e, i, for n sufficiently large depending on ε (which we assume in what follows). By Proposition 2.11 (recalling that C_0 = C_A²C_V and C_1 = C_AC_V, and that δ ∈ (0, 1) is as in Proposition 2.11), we see that this is equal to the displayed expression, where the error term in the exponent depends on k but is uniform in u. Recalling (2.6), it follows that (2.35) is equal to the following. As n → ∞, the (r − 1)-fold Riemann sum converges to the (r − 1)-dimensional integral in (2.30). We have therefore shown that there exists a constant C (depending on k, s, t) such that, for any ε > 0 and n sufficiently large, the required bound holds, which completes the proof.

Tightness
In this section we work in an abstract setting for historical processes, motivated by the historical paths {w(m, x) : m ∈ Z_+, x ∈ T_m} of lattice trees and those for branching Brownian motion, {B^α : |α| ∈ Z_+, α ∈ GW} (with n = 1), both introduced in Sect. 1. As before, we adjoin a cemetery point to R^d. Assume that on a probability space (Ω, F, P) we have, for all k ∈ Z_+, that S_k is an a.s. finite random subset of a countable set S. (3.2) So for each k ∈ Z_+ and each β in the random finite set S_k we have a discrete-time R^d-valued stochastic process starting at 0 and freezing at time k. We define the rescaled paths by (3.4), where C_g > 0 is a model-dependent constant. We call this class of measure-valued processes the historical processes associated with W. Example 3.2 (Branching random walk). We discretise (in time) the branching Brownian motions introduced in Sect. 1 and use the notation from that construction, now indicating the dependence on n ∈ N in our notation by B̂^{β,(n)} for β ∈ I. Let S = I, let S_m = {β ∈ I : β ∈ GW, |β| = m}, and for β ∈ S_m define the discretised path accordingly, frozen at time m.
Then one can check that for α ∈ S nt , In order to prove historical tightness, we will assume that the collection W (as in (3.3)) of historical paths satisfies the following condition. Recall that w (n) is the scaled version of w, as in (3.4).

Condition 3.3 (Modulus of continuity).
For some q ∈ (0, 1/2), θ ∈ (0, 1], and constant C_2 > 0, there exist random variables (δ_n)_{n∈N} such that for all historical paths w ∈ W and n ∈ N, This condition is verified for any q ∈ (0, 1/2) and θ = 1 in [20, Theorem 6] for sufficiently spread-out lattice trees in more than 8 dimensions, as in Example 3.1 above (as well as for a number of other models); see Lemma 3.12 below. For the branching random walks with Gaussian increments in Example 3.2 it is easy to derive it from [6, Theorem 8.1] for the same parameter values (in fact θ can be taken to be any value in (0, ∞)). Here one takes the underlying diffusion to be Brownian motion, restricts the time steps to Z_+/n, and then uses (3.6).
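Condition 3.3's inequality can be rendered literally as a grid checker (illustrative only; the constants, grid and paths below are ours, not the paper's). On a bounded time window [0, T], the identity path satisfies the condition once C ≥ T^{1−q}, since |t − s| ≤ C(|t − s| ∨ 1/n)^q reduces to |t − s|^{1−q} ≤ C:

```python
def satisfies_modulus(path, n, q, C, T):
    """Check |w(t) - w(s)| <= C * (max(|t - s|, 1/n)) ** q over the grid Z_+/n
    intersected with [0, T]."""
    grid = [i / n for i in range(int(T * n) + 1)]
    return all(
        abs(path(t) - path(s)) <= C * max(abs(t - s), 1.0 / n) ** q
        for s in grid for t in grid
    )

n, q, T = 16, 0.25, 2.0
assert satisfies_modulus(lambda t: 0.0, n, q, C=1.0, T=T)    # constant path
# identity path: a small safety factor absorbs floating-point rounding
assert satisfies_modulus(lambda t: t, n, q, C=1.01 * T ** (1 - q), T=T)
assert not satisfies_modulus(lambda t: t, n, q, C=0.1, T=T)  # C too small
```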
In our abstract setting, the extinction times become agreeing with our earlier definition for lattice trees. We assume S (1) satisfies the following:

Condition 3.4 (Survival bounds).
There exist c, c > 0 such that

Definition 3.5.
For a metric space, E, a collection {Q n : n ∈ N} of probabilities on D(R + , E) = D(E), is C-relatively compact iff every sequence n k → ∞ has a subsequence {n k } s.t. Q n k converges weakly in D(E) to a law, Q, supported on C(E), the set of continuous E-valued paths. If {X n } is a sequence of càdlàg E-valued processes on our underlying probability space, we say {X n : n ∈ N} is C cond -relatively compact iff for every s 0 > 0, the set of conditional laws {P(X n ∈ ·|S (n) > s 0 ) : n ∈ N} is C-relatively compact in D(E).
We start with a general tightness result for historical processes in this abstract setting. Theorem 3.6. Assume H^{(n)} is given by (3.5), where W satisfies Condition 3.3. Suppose also that Condition 3.4 holds and that {H^{(n)}_·(φ) : n ∈ N} is C_cond-relatively compact in D(C) for each φ in a determining class D_0 (for M_F(D(R^d))) containing 1. Then {H^{(n)}_· : n ∈ N} is C_cond-relatively compact, and for every s_0 > 0, every limit point H of the conditional laws is a.s. continuous with H_t supported on C(R^d) for all t ≥ 0. In practice it is the relative compactness of {H^{(n)}_·(φ) : n ∈ N} for a rich class of test functions φ that requires most of the effort. For lattice trees this is done in Proposition 3.11, which is in turn proved in Sect. 3.2 below. Applying Theorem 3.6 to the case of lattice trees (conditional on survival), we will then deduce the following below:

Theorem 3.7. Let H^{(n)} be the sequence of rescaled historical processes associated with sufficiently spread-out lattice trees in d > 8 dimensions, defined in (1.4). Then {H^{(n)}_· : n ∈ N} is C_cond-relatively compact.

and for all φ ∈ D_0 the sequence of probabilities satisfies (3.10). For δ, T > 0 and w ∈ D(R^d), we define W(w, δ, T), where the infimum is over all partitions {t_i} such that 0 = t_0 < t_1 < · · · < t_{N−1} < T ≤ t_N and t_i − t_{i−1} > δ for all i. Note that W is decreasing in δ and increasing in T. We restate [8, Ch. 3, Theorem 6.3 and Remark 6.4], with their general metric space E replaced by R^d, and use the above monotonicity to take sequential limits and restrict T ∈ N.
An easy application of Proposition 3.9 shows that A M has compact closure in D(R d ).
where in the first inequality we have moved an interval endpoint to an appropriate neighbouring point in Z + /n resulting in an error of at most n −q . Consider next W (w (n) , 2 −m , T ) for w ∈ W. If 2 −m < 1 n , then W (w (n) , 2 −m , T ) = 0, as one can see by taking t i = i n , i ∈ Z + in the definition of W , and using the fact that w (n) is constant on [i/n, (i + 1)/n) for i ∈ Z + . Assume therefore that 2 −m ≥ 1 n . Now set t i = i2 −m+1 , for i ∈ Z + , which gives t i − t i−1 > 2 −m for all i. We also have By (3.7) this implies that for s, t ∈ [t i−1 , t i ) where in the last line we have used the middle expression in (3.12). This proves that W (w (n) , 2 −m , T ) ≤ 2 −(m−2)q , which together with (3.11), shows that w (n) ∈ A M , and so completes the proof.
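The partition argument just used (for a path constant on the intervals [i/n, (i + 1)/n), the partition t_i = i/n has zero oscillation inside every cell, so W(w^{(n)}, 2^{−m}, T) = 0 when 2^{−m} < 1/n) can be sketched as follows; the step values are arbitrary:

```python
def cell_oscillation(path, partition, samples=50):
    """Max over partition cells [t_{i-1}, t_i) of the oscillation of `path`,
    estimated on a finite sample of points inside each cell."""
    worst = 0.0
    for lo, hi in zip(partition, partition[1:]):
        pts = [lo + (hi - lo) * k / samples for k in range(samples)]  # in [lo, hi)
        vals = [path(t) for t in pts]
        worst = max(worst, max(vals) - min(vals))
    return worst

n, T = 8, 2.0
step = lambda t: float(int(t * n) % 3)          # constant on each [i/n, (i+1)/n)
partition = [i / n for i in range(int(T * n) + 1)]
assert cell_oscillation(step, partition) == 0.0  # zero oscillation in every cell
assert cell_oscillation(lambda t: t, partition) > 0.0  # a moving path is not flat
```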
Proof of Theorem 3.6. Let n_k → ∞, fix s_0 > 0, and define probabilities P_n on D(M_F(D(R^d))) as the conditional laws above. For the first assertion we need to show that this sequence of probability laws is C-relatively compact on D(M_F(D(R^d))). For this we will use Theorem 3.8, and so need to verify the hypotheses of that result. For (3.9), for all T ∈ N we set K_{η,T} = A_M, where M is chosen below; the compactness of this set follows from Proposition 3.9, as already noted above. By Lemma 3.10, and using (3.8) and (3.7) in the last inequality, the relevant bound is at most c^{−1}(s_0 + 1)C_2(2^{(2−M)θ} + n_k^{−θ}), which will be smaller than η if we set M = M(η) large enough and assume n_k > N(η). This proves (3.9) for large enough k. It is easy to enlarge K_{η,T} to obtain a compact set which satisfies (3.9) for all k. For example, for fixed n = n_k ≤ N(η) and all t ≥ 0, H^{(n)}_t is supported on the space of càdlàg paths which are constant on [i/n, (i + 1)/n) and on [S^{(n)}, ∞), and whose jumps are uniformly bounded in absolute value as in (3.13). Now use (3.8) to bound S^{(n)}, and bound the jumps via (3.13), with high P_n-probability, to obtain a compact set of paths which supports H^{(n)}_t for all t ≥ 0 with P_n-probability at least 1 − η, for the finitely many values of n = n_k ≤ N(η).
The other condition (3.10) of Theorem 3.8 holds by assumption and so the C-relative compactness is established.
For the last statement, we note first that if ∆w_t = w_t − w_{t−} for w ∈ D(R^d) and t > 0, then a simple Skorokhod-topology exercise (e.g. use [8, Chapter 3, Proposition 5.3]) shows the displayed fact for any δ > 0. Consider a weak limit H of {P_{n_k}}. By Skorokhod's representation theorem and the continuity of the limit point H_·, we may realise all our processes on a common space with underlying law P̄, and assume H^{(n_k)}_t → H_t for all t ≥ 0, P̄-a.s. The Portmanteau theorem for the weak topology then gives, for all t ≥ 0 and M ∈ N, the corresponding bound. Now fix t > 0 and use Fatou's lemma to see that the analogous bound holds for δ > 0. In the last inequality we used the fact that, for k large enough, δ_{n_k} ≥ 1/n_k implies that for all ancestral paths and all s > 0, |∆w^{(n_k)}_s| ≤ (1/n_k)^q < 1/M; in the final equality we used Conditions 3.3 and 3.4. Now let M ↑ ∞ to see that H_t is supported on C = C(R^d) a.s. for each t > 0. Therefore H_t(C^c) = 0 for all t ∈ Q_{>0}. Using the openness of C^c and the Portmanteau theorem again, we get from the continuity of t ↦ H_t that H_t(C^c) = 0 for all t ≥ 0, a.s.
Let Lip_K denote the set of functions φ : D(R^d) → R such that for all w, w′ ∈ D(R^d), |φ(w)| ≤ K and |φ(w) − φ(w′)| ≤ K‖w − w′‖, where ‖w‖ = sup_{t∈R_+} |w_t|. The proof of this key result is more complicated and so is deferred until Sect. 3.2. Assuming it, Theorem 3.7 now follows. Proof of Theorem 3.7. We have already noted that the historical process for lattice trees is a special case of the general framework of this section, that Condition 3.3 was verified in [20] with q = 1/4 and θ = 1 (see Lemma 3.12 below), and that Condition 3.4 holds by (1.6). Proposition 3.11 shows that the last hypothesis of Theorem 3.6 holds with D_0 = Lip_1; D_0 is a determining class because it includes appropriate multiples of all finite-dimensional Lipschitz continuous functions. The result now follows from Theorem 3.6.
One can also prove the analogue of Theorem 3.7 for the branching random walks in Example 3.2, where the analogue of Proposition 3.11 yields easily to martingale methods, but the convergence results here can be readily proved as in [24, Chapter II].

3.2. Tightness for lattice trees.
The goal of this section is to prove Proposition 3.11. For lattice trees, we will use the modulus of continuity in the following form. Lemma 3.12. For each n ∈ N there exist a random δ_n ≥ 1/n and a constant c > 0 satisfying nP(δ_n ≤ ρ) ≤ cρ for every ρ ∈ [0, 1), such that the modulus bound holds for every w ∈ W (the system of ancestral paths to points in the tree). Proof. Apply [20, Theorem 6] with α = 1/4. The fact that we can take δ_n ≥ 1/n follows from the finite-range assumption on the lattice trees, which gives |w^{(n)}_{i/n} − w^{(n)}_{(i−1)/n}| ≤ Ln^{−1/2} ≤ Ln^{−1/4}, and so allows us to replace δ_n by δ_n ∨ (1/n).
The other main ingredient we use is a bound on the fourth moments of the increments of the total mass: Proposition 3.13. There is a γ > 1 and for any T > 0, there is a c T such that for all n ∈ N and all s 1 , The above is condition (ii) of [11, Theorem 2.2] with k = 0 and is verified in that reference (see [11, Theorem 3.3, Lemma 3.5, and Section 7]). For w ∈ D(R d ) and t ≥ 0 let w t ∈ D(R d ) be defined by w t s = w s∧t and for φ ∈ Lip 1 let φ t ∈ Lip 1 be defined by φ t (w) = φ(w t ). Define T (n) t = n −1/2 T nt . We will use T (n) t as our index set for w (n) , as in (1.2), and so depart from the notation in (3.4).
Lemma 3.14. Let δ n be as in Lemma 3.12, and assume that 0 ≤ v ≤ t 1 < t 2 satisfy Then for φ ∈ Lip 1 and i = 1, 2,  4)), and therefore for t i and v as above, where we have used t 2 − v ≤ δ n and Lemma 3.12 in the last line. The result follows.
For a lattice tree T containing x (and o), let T^{≯x} denote the tree consisting of all vertices that are not descendants of x; if x ∉ T, let T^{≯x} = ∅. Let F^{≯x} = σ(T^{≯x}). Let T_x denote the set of lattice trees containing the vertex x. If x ∈ T, let R_x(T) ∈ T_x denote the descendants of x in T, together with x and all the edges joining them. By [20, Lemma 9.4] this is at most the displayed bound. Summing over S gives a right-hand side equal to ρE[ϕ(o, T)]1_{x∈T}, as claimed. Use linearity to get the result for simple non-negative functions, and monotone convergence to complete the proof.
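The decomposition of a tree into R_x (x together with its descendants) and T^{≯x} (everything that is not a strict descendant of x) can be illustrated on a toy rooted tree (our notation and example; the two pieces cover the tree and meet exactly in {x}):

```python
def _has_ancestor(parent, v, x):
    """True iff x lies strictly above v on the path to the root."""
    while v in parent:
        v = parent[v]
        if v == x:
            return True
    return False

def strict_descendants(parent, x):
    return {v for v in parent if _has_ancestor(parent, v, x)}

parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 4}    # edges child -> parent, root 0
vertices = {0, 1, 2, 3, 4, 5}
x = 1
R_x = strict_descendants(parent, x) | {x}            # x and its descendants
T_not_desc_x = vertices - strict_descendants(parent, x)
assert R_x == {1, 3, 4, 5}
assert T_not_desc_x == {0, 1, 2}
assert R_x | T_not_desc_x == vertices and R_x & T_not_desc_x == {x}
```

This covering property is what makes conditioning on F^{≯x} compatible with summing over the subtree R_x.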
Assume 0 ≤ v ≤ t_1 < t_2 and φ ∈ Lip_1. We want to bound the fourth moment of the increment H^{(n)}_{t_2}(φ^v) − H^{(n)}_{t_1}(φ^v). Lemma 3.14 will allow us to handle the first and last terms; the majority of the work will be in bounding the expected fourth power of the middle term. For fixed n, T ∈ T_o and x ∈ n^{−1/2}Z^d, let R^{(n)}_x(T^{(n)}) = n^{−1/2}R_{√n x}(T) ⊂ T^{(n)} denote the subtree consisting of x and its descendants, and write R^{(n)}_x = R^{(n)}_x(T^{(n)}).
Using the tree structure and v ≤ t 1 < t 2 , this is equal to Let 1 (n) {x,v} := 1 {x∈T (n) v } and recall γ > 1 is as in Proposition 3.13. Lemma 3.16. Let ε ∈ (0, 1], K > 0 and T ∈ N. There is a C K ,T > 0 so that for n ∈ N, where we have used Proposition 3.13, and that, without loss of generality, . Now use the uniform bound on the survival probability from Condition 3.4 for lattice trees, to bound the above by as required.
In proving our next result, we will make use of Lemma 2.13 with each t i = t.
Proposition 3.17. There are η, ε ∈ (0, 1], and for any T ∈ N a constant C T , such that for (v may be negative), and all n ∈ N, Proof. We first show that it suffices to prove the above for t i ∈ Z + /n satisfying t i ≤ T, t 1 ≤ t 2 ≤ t 1 + 1, and any v ≤ t 1 − (t 2 − t 1 ) ε . (3.17) Assume this result and let n, t i and v be as in the theorem. Using t 2 − t 1 ≤ 1/2, we have In addition, using t 2 − t 1 ≥ 1/(2n) we have (the last by (3.18)), as required.
So consider now only t_i ∈ Z_+/n satisfying (3.17), with t_2 > t_1 (without loss of generality). We first assume v ≤ 0. In this case, for all x ∈ T^{(n)}_{t_i}, the stopped path w^{(n)}(t_i, x)^0 is the zero path 0̄, and so the required inequality follows (recall t_i ∈ Z_+/n) from Proposition 3.13 and |φ(0̄)| ≤ 1. Now assume v > 0 and set φ^{(n)}_{x,v} := φ(w^{(n)}(v, x)). Note that from (3.15) we have the displayed identity, where 1^{(n)}_{x,v} denotes the product of the indicators 1^{(n)}_{x_i,v} over the elements x_i of the vector x, and φ^{(n)}_{x,v} is the product (running over the elements of the vector x) of the φ^{(n)}_{x_i,v}. We would like to condition 1^{(n)}_{x_4,v} on F^{≯x_4} in order to extract a positive power of t_2 − t_1 using Lemma 3.16. This is complicated by the fact that there are terms in the sums where other x_i = x_4. If we specify for which i this is true, then we also have the constraint that the remaining x_j are not equal to x_4. After conditioning, we wish to restore the possibility that these x_j = x_4, in order to recover a term of the form (H^{(n)}_{t_2}(φ^v) − H^{(n)}_{t_1}(φ^v)) raised to some power smaller than 4, and so derive a recursive inequality which bounds the mean of the fourth power of this increment. This leads to the inclusion-exclusion argument below. To shorten the notation we will drop the dependence on v and n and also suppress the summation range of x.
In what follows, A 1 ⊂ [4] denotes the set of indices i for which x i = x 4 (so in particular 4 ∈ A 1 ). Then letting A c 1 = [4] \ A 1 , and writing x(A) := {x i : i ∈ A} and x A for the vector x with coordinates restricted to A, we have where in the case A 1 = [4] we interpret the term in the expectation as x 4 φ 4 x 4 4 x 4 . Taking conditional expectation with respect to F ≯x 4 and using the fact that (for Interpreting the empty sum x 4 ∈x(A c 1 ) as zero when A c 1 = ∅, we can write the above as Note that |A 1 | + |A c 1 | = 4 and reason as in (3.19) to see that (3.20) equals which, by Lemma 3.16 and |φ| ≤ 1, is bounded in absolute value by Expressing the sum over x 4 in terms of H (n) v (1) this is equal to By Hölder's inequality this is at most (1)) r ] < C r,T for each r ∈ N (by Lemma 2.13), this shows that this quantity is at most (C may depend on T throughout) We turn now to the quantity (3.21), and it is convenient to introduce further notation. For sets A i ⊂ [4], let B i = ∪ i j=1 A j . In particular B 1 = A 1 . Thus (3.21) is equal to the negative of (3.22) Abusing notation by writing x(A) = x to mean that x i = x for each i ∈ A we can write which is simply the statement that x 4 ∈ x(B c 1 ) if and only if the set A 2 := {i ∈ [4] \ B 1 : where we have also used the fact that φ x 4 , and |A 2 | + |A 1 | = |B 2 |. In the case B c 2 = ∅ the term in the expectation in (3.23) should be interpreted as We can again condition on F ≯x 4 to see that (3.23) is equal to Using inclusion-exclusion in the sum over x 4 this can be written as (3.25) where the sum over x 4 in (3.25) is interpreted as 0 when B c 2 = ∅. The quantity (3.24) is equal to [reasoning as in (3.19)] We have also used |B c 2 | + |A 1 | + |A 2 | = 4 to get the correct powers of n. Using Lemma 3.16 again as before, we may bound the summand (in absolute value) by where we have again used Hölder's inequality, Lemma 2.13, and 2 i=1 |A i | = |B 2 | since A 1 and A 2 are disjoint. 
As in (3.23), the negative of (3.25) is equal to where if B c 3 = ∅ the term in the expectation is interpreted as Conditioning again, this is equal to (3.29) As in (3.26) and (3.27), the term (3.28) is bounded in absolute value by Since in (3.29) B c 3 can contain at most one element, the sums over x B c 3 and x 4 ∈ x(B c 3 ) therein reduce to a sum over x 4 (with x B c 3 = x 4 ). After conditioning again we get that the negative of (3.29) is equal to where we note that if this term is to be non-zero then each |A i | = 1, and in particular B 4 = [4]. By Lemma 3.16 and then Lemma 2.13, (3.30) is bounded in absolute value by After dropping some negative powers of n, we have shown above that Thus, letting d = D 4 |t 2 − t 1 | 16ε−γ and recalling that |t 2 − t 1 | ≤ 1, we have Recall that D 4 is finite by Lemma 2.13, and so from the above, d ≤ C = C(C ), and therefore D 4 ≤ C|t 2 − t 1 | γ −16ε . Choosing ε < (γ − 1)/16 completes the proof.
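For orientation, the moment bound just established is exactly of the form required by the classical Kolmogorov–Chentsov continuity criterion, stated here for a generic process:

```latex
% Kolmogorov--Chentsov: if a process (X_t)_{t \in [0,T]} satisfies, for some
% q, \beta, C > 0,
\mathbb{E}\bigl[\,|X_{t_2} - X_{t_1}|^{q}\,\bigr]
   \;\le\; C\,|t_2 - t_1|^{\,1+\beta}
   \qquad \text{for all } t_1, t_2 \in [0,T],
% then X admits a continuous modification which is a.s. Holder continuous
% of every order \alpha < \beta / q.
```

Here q = 4 and 1 + β = γ − 16ε, so the choice ε < (γ − 1)/16 is precisely what guarantees β > 0.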
Proof of Proposition 3.11. Let φ ∈ Lip 1 and n k → ∞. For a fixed s 0 > 0 we must show that {n k } has a subsequence {n k ′ } along which P(H (n k ′ ) · (φ) ∈ · | S (n k ′ ) > s 0 ) converges weakly to a continuous limit. The argument remains unchanged if we assume n k = k, and to ease the notation we will assume this. So our goal is to show that

Now fix T ∈ N and assume
where ε is as in Proposition 3.17; note that v may be negative. Recall from (3.14) that and so if δ n (ω) is as in Lemma 3.12, then Lemma 3.14 (applied to v + ≥ 0) together with the facts that If η > 0 is as in Proposition 3.17, let η 0 = η/8. Proposition 3.17 shows that for m, n ∈ N satisfying m ≤ (log 2 n) + 1, that is, 2 −m ≥ 1/(2n), by taking a union bound over k ∈ Z + with 0 ≤ k2 −m ≤ T + 1,

By a union bound there is an
Set η 1 = (ε/4) ∧ η 0 > 0. Combine the above bound with (3.34) and use it in (3.33) (with T + 1 in place of T in the latter two) to see that for all natural numbers m satisfying T +2 * (1) + 1)2 −mη 1 .
Thus (3.37) holds for {t 2 } n , {t 1 } n , that is, Now use the fact that H (n) t i (φ) = H (n) {t i } n (φ) for i = 1, 2 to conclude that: Next use Lemma 3.12 and (3.35) to see that for r ∈ (0, 1/12), (3.40) Our objective now follows easily from (3.39) and (3.40). Let {H (n) t , t ≥ 0} be the continuous process obtained by linearly interpolating {H (n) j/n (φ) : j ∈ Z + }. It follows from (3.38) and (3.39), with T + 1 in place of T , that for some C T , For t 2 − t 1 ≥ 1/n this is an easy consequence of the triangle inequality and the fact that δ n ≥ 1/n. For 0 < t 2 − t 1 < 1/n, either [t 1 ] n = [t 2 ] n and the linear interpolation and δ n ≥ 1/n easily give the desired bound, or [t 2 ] n = [t 1 ] n + 1/n, and the triangle inequality gives which leads to the required bound using the linear interpolation and δ n ≥ 1/n again. Recall that |φ| ≤ 1 implies |H (n) 0 | ≤ 1/(C 0 n). We now fix T ∈ N, and for δ, M > 0, define a compact set of paths in C = C([0, T ], R) by Compactness is clear by the Arzelà–Ascoli theorem. Recall that s 0 > 0. It follows from (3.41) and (3.40) that for small enough δ k > 0 and large enough M k , n k ∈ N, (3.42) Here we are using the tightness of the maximum total mass processes from [11, Theorem 1. We use the lower bound on the survival probability from (3.8): Combine the above with (3.43) to conclude that for all m, n ∈ N, This shows that {P(H (n) ∈ ·|S (n) > s 0 ) : n ∈ N} is tight in C(R + , R) and so by Prohorov's theorem is relatively compact in C(R + , R). This implies (see, e.g., [8, Proposition 10.4 in Chapter 3]) that {P(H (n) (φ) ∈ ·|S (n) > s 0 ) : n ∈ N} is C-relatively compact in D(R + , R), proving (3.31), as required.
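The compactness used above is the classical Arzelà–Ascoli criterion, which for real-valued paths reads:

```latex
% Arzela--Ascoli: K \subset C([0,T],\mathbb{R}) is relatively compact
% if and only if
\sup_{f \in K} |f(0)| < \infty
\qquad \text{and} \qquad
\lim_{\delta \downarrow 0}\; \sup_{f \in K}\;
   \sup_{\substack{s,t \in [0,T] \\ |t-s| \le \delta}} |f(t) - f(s)| \;=\; 0 .
```

In particular, a set of paths with a common bound at 0 and a common Hölder modulus, as constructed above, has compact closure in C([0, T], R).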

Proof of Proposition 2.11
The goal of this section is to prove Proposition 2.11. The proof is a modification of that of [17, Theorem 4.8], so we will not give all of the details here. Instead we will indicate the main ideas of the proof, and refer the reader to [17] for various details.
For ∈ r , [17, Theorem 4.8] proves Proposition 2.11 in the simplified setting where j e = 1 for every e ∈ E( ). In that reference (and with j e = 1 for each e) the quantity t̂ ( ) n (·) is written as t̂ N ( ,ň) (·), where N ( ,ň) denotes a skeleton network obtained by inserting ň e,1 − 1 vertices into edge e, for each e ∈ E( ). The quantity ρ −1 t̂ N ( ,ň) then encodes (in Fourier space) the probability of our random tree T connecting the origin to r specified space-time points, with the spatial and temporal locations of the branch points, as well as the "shape" of the connections, also specified (consider the set Ť ( ,y,ň) in the case where each j e = 1). In our paper j e need not be equal to 1. In this more general setting, t̂ ( ) n (·) encodes (in Fourier space) the probability of a subset of the above event, where now the spatial locations at various other fixed times are also specified. The appropriate skeleton network is now a marked skeleton network N + (see below), where certain vertices on the skeleton network N at fixed times (graph distance from the root) are marked.
The approach in [17, proof of Theorem 4.8] relies on the so-called lace expansion and involves an inductive argument (on r ). To be more precise [17] uses the lace expansion on a tree network (introduced in [15] for networks of self-avoiding walks) in the context of lattice trees, with the expansion applied at the closest branch point to the root in the network N . The expansion gives rise to certain diagrams that involve lattice trees connecting or intersecting in various ways. Some of these connections are of fixed temporal length, and others are of unrestricted length. A crucial part of the analysis involves bounding these diagrams. The bounds depend on the complexity of the diagram, as well as the total temporal length in the diagram. Diagrams where either the complexity or the length is large give small contributions (recall that we are in high dimensions), as they are asking for either lots of intersections, or for intersections to occur over a large distance.
The point of this discussion is that, in our setting, when j e need not be 1, one can perform exactly the same expansion. It turns out that there are essentially no new diagrams to deal with in our setting. Below we introduce the definition of a marked skeleton network (see also Fig. 10) and then proceed in the following subsections to expand the above outline of the proof of Proposition 2.11. Remark 4.2. The sets of (all) vertices and edges of a marked skeleton network N + will be denoted by N + and E(N + ) respectively (note the abuse of notation that N + denotes both the marked skeleton and its set of vertices). The cardinality of E(N + ) is #E(N + ) = ∑ e ∑ i=1,…,j e ň e,i and the number of vertices is 1 larger. All special points are also vertices of N + , while marked edges should be considered as distinct objects from edges, even for marked edges (e, i) such that ň e,i = 1 (note that we have thus far specified a labelling scheme for marked edges, but not edges). The set of marked edges of N + is E(N + ).

Asymptotics of the detailed 1-particle transform.
For the case where r = 1, there exists only one shape in 1 , which consists of a single edge e. In this case, we use the notation [ň 1 , . . . , ň ] for (ň i ) i≤ ∈ N with ≥ 0 to designate the corresponding marked skeleton network (containing no branch point) with ň = {ň e,1 , . . . , ň e, }.
One of the main results of [18] (see Theorem 4.3(ii) of that reference) can be reformulated as the following proposition (the error terms are not stated explicitly in [18, Theorem 4.3], but if we keep track of them we get the following result), which is the r = 1 case of Proposition 2.11: there is an L 0 ≥ 1 such that for all L ≥ L 0 : For each δ ∈ (0, 1 ∧ (d − 8)/2), R > 0, every ∈ N and (ň i ) i≤ ∈ N and for any k ∈ [−R, R] we have, for the unique shape ∈ 1 , where the error depends on R, δ, L, d, , and any lower bound on min i≤ ň i /n and upper bound on max i≤ ň i /n.
Note that in [18] each ň i is of the form nt i − nt i−1 , where 0 = t 0 < t 1 < · · · < t ≤ t * and where the error term depends on min{t i − t i−1 } and t * .

Lace expansion.
We will use the lace expansion (and induction on r ) to reduce our required estimates on a shape in r with r ≥ 2 to the shape in 1 . In the following we let N + = N + ( ,ň) for some ∈ r and some ň, where r ≥ 2. Since each ň e,i /n ≥ ε in Proposition 2.11, for fixed ε we may assume that n is sufficiently large so that each ň e,i ≥ 2 in what follows.
Fig. 11. A graph on a marked skeleton network N + , with b denoting the branch point nearest to the root. The rightmost bond is in R since it covers two special points.
In this section, for a bond vv′ ∈ N + , U vv′ will denote a quantity in {−1, 0}. Observe that we are working with a (connected subnetwork of a) star-shaped network of degree at most 3 (since ∈ r with r ≥ 2).
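As a reminder of the basic mechanism (standard in lace-expansion arguments, e.g. [15,17]): with U_{vv′} ∈ {−1, 0} as above and B a set of bonds, the product of interaction factors expands as

```latex
\prod_{vv' \in \mathcal{B}} \bigl(1 + U_{vv'}\bigr)
  \;=\; \sum_{\Gamma \subset \mathcal{B}} \;\prod_{vv' \in \Gamma} U_{vv'} ,
```

so an indicator that no bond interaction is active becomes a sum over graphs Γ (sets of bonds), and graphs can then be classified according to the connected subnetwork containing the branch point, as in the decomposition below.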
with the convention that In words, this decomposition says that the set of graphs on N + containing no bonds that cover two or more special points consists of (i) those graphs for which the induced connected subnetwork containing b also contains a neighbour of some other special point (this is the last term in (4.2)), and (ii) those graphs for which this induced subnetwork does not contain the neighbour of another special point. For (ii) the induced connected subnetwork is some set A contained in S − N + , so we can first sum over the possibilities for A and then sum over connected graphs on A and graphs on each (N + \ A) i . Given x ∈ Z d , we write ∑ R∋x to denote a sum over lattice trees R containing the point x ∈ Z d .
As for [17, Eq. (4.17)] we can write since any combination (ω ∈ N + (y), (R s ) s∈N + ) such that the R s are mutually avoiding lattice trees uniquely defines a lattice tree T ∈ Ť ( ,ň,y) and vice versa. Here, R s is the tree hanging off the vertex s ∈ N + . Note that in the shorthand notation of [17], (4.5) would be written as Recalling Definition 4.5 and (4.1), we set (4.6) which is 0 unless U vv′ = −1 for some vv′ ∈ R, and (recalling the last term in (4.3)) By (4.1) we have and by (4.3) t N + (y) This decomposition is related to Fig. 11 where, loosely speaking, the term in J corresponds to the interactions induced by bonds around the first branch point, and the three terms in K correspond to three new smaller networks. Some notation associated to this decomposition is introduced in the next definition. Definition 4.12. For a marked skeleton network N + , let ě 1 , ě 2 , ě 3 be the three marked edges incident to the branch point b. Note that for each k = 1, 2, 3, ě k = (e k , i k ) for some e k ∈ [2r − 1] and some i k ∈ {1, j e k } (this marked edge is necessarily the last marked edge on the branch containing the origin and the first marked edge on the other two branches containing b).
Given m = (m k ) 3 k=1 such that 0 ≤ m k ≤ňě k − 2 whereňě k :=ň e k ,i k , k = 1, 2, 3, we define (N + k, m ) k=1,2,3 as the three components of N + \ S m as in Definition 4.9 (recall Definition 4.6, and note that each N + k, m is itself a marked skeleton network). Since each m k <ň e k ,i k , there is a bijection between marked edges of N + and the marked edges of (N + k, m ) k=1,2,3 . The marked edgeě k is split between S m and N + k, m , but we will abuse notation by retaining this label to refer to the corresponding truncated edge in both components. Set for the vector whose components encode the lengths of marked edges in N + k, m , i.e.
for the vector whose components arě =ň wheneverě ∈ E * (N + k, m )). In particular, recalling that there is a bijection between marked edges of N + and the marked edges of (N + k, m ) k=1,2,3 , we can see that for any a ∈ R there exist c(a), C(a) > 0 such that for m ∈ Hňb Finally, we set and (noting the change to the sum over m) (4.8) From the argument above and (4.7), we can see that The last three terms are error terms. The relevant estimates (bounds) are given in the following lemma, whose proof (which is very similar to the corresponding error bounds in [17]) will be presented in Sect. 4.6. where the constants in the O notation depend on d and the number of special points in N + .
We end this section by introducing an important quantity that will appear in the decomposition of Q N + and describes the interactions induced by the term J (S m ) in (4.8).

Definition 4.15.
For m ∈ Z 3 + and u ∈ (Z d ) 3 we define π 0 ( u) = ρ1 { u=0} and if some m i > 0, where the set of embeddings S m ( u) is defined similarly to Definition 4.10: the root of S m (which is the vertex along branch 1 at graph distance m 1 from the central vertex; if m 1 = 0 this is simply the central vertex itself) is mapped to 0; adjacent vertices in S m are mapped to points in Z d at distance at most L; and the central point is mapped to u 1 and the leaves on branches i for i = 2, 3 are mapped to u 1 + u 2 and u 1 + u 3 respectively. Remark 4.16. This definition of π m is exactly the same as the one in [17] (Definition 4.12), and as such the results on this quantity, which rely heavily on diagrammatic estimates, can be transferred directly to our context.
Recall from (2.21) and the discussion thereafter that C V = ρ 2 V . The constant V was defined in [17, (4.30)] as For N ∈ N we define (4.11) where L ∈ L (N ) (S m ) is the set of laces on S m with N bonds and C(L) denotes the set of bonds which are compatible with L. We refer to [17,Section 2] for the precise definitions, and give only a rough description here: A lace L on S m is either a minimal graph covering S m (i.e. the removal of any bond in L results in a graph that no longer covers S m ) or one that is almost minimal (in this case there is a bond covering the branch point whose removal results in a minimal graph covering S m ). There is a rule for (uniquely) defining a lace L( ) associated to a connected graph on S m . For a fixed lace L the bonds compatible with L are those for which adding them to L results in a connected graph for which L( ) = L.
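Schematically, the resummation of connected graphs on S m in terms of laces takes the standard form (as in [17, Section 2], stated here without proof and glossing over the almost-minimal laces mentioned above):

```latex
% Resummation over laces: for a connected-graph sum on S_{\vec m},
\sum_{\Gamma \text{ connected on } S_{\vec m}} \;\prod_{st \in \Gamma} U_{st}
  \;=\; \sum_{N \ge 1} \;\sum_{L \in \mathcal{L}^{(N)}(S_{\vec m})}
     \prod_{st \in L} U_{st}
     \prod_{s't' \in \mathcal{C}(L)} \bigl(1 + U_{s't'}\bigr),
```

and since each U_{st} ∈ {−1, 0}, the first product contributes the factor (−1)^N appearing in (4.11), while the compatible bonds C(L) give the remaining product there.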
In our work we only need a few facts about π (N ) m (·), including the obvious fact that π (N ) m ( u) ≥ 0 and that (see [17, (4 8 2 , and (4.14) The correction is that the ∨1 and ∨2 are missing in [17, Proposition 4.13], but what we have stated above is what is actually proved therein. Here we have also not included the extra decay in L appearing in these bounds in [17, Proposition 4.13] as we do not need it.

4.4.
Decomposition of Q N + . By (4.8) we can see that Q N + ( ,ň) (y) can be decomposed into 4 parts: the connected component S m of bonds stemming from the branch point (term in J ) and the three subgraphs of N + remaining after the removal of this connected component (terms in K). These four subgraphs are not connected by any bonds, by definition of J and K on the respective subgraphs. Furthermore the star-shaped subgraph S m contains the special point b, while all other special points are contained in one of the other subgraphs. This means that our problem can be reduced to three independent similar problems for smaller lengths. This reasoning translates into the following lemma, which can be proved exactly as for [17, Lemma 4.14], so we do not repeat the proof. Recall the definition of y v i ,i and the marked skeleton networks N + i, m in Definition 4.12.

Proof of Proposition 2.11.
The proof now closely follows that of [17, Theorem 4.8] with obvious (and straightforward) modifications. We will present the main ideas, but not the details. The goal is to prove that From (4.9), our bounds on the error terms therein (Lemma 4.14), and (4.16) we have that (4.17) We proceed by induction on r for networks with shape ∈ r , using Lemma 4.3 for the initializing case (r = 1).
Let δ ∈ (0, 1 ∧ (d − 8)/2). By the induction hypothesis applied to each N + j, m (having r j + 1 leaves, where r 1 = 1 and r 2 + r 3 = r ) we may write where we recall that the notation ň m, j · was introduced in Definition 4.12. The error terms in the above approximation are obtained from the induction hypothesis and Remark 4.13 (using the fact that ň m, j ě is comparable to ňě (they are identical unless ě = ě j for some j ≤ 3), since m ∈ Hň b ). We then use the fact that Now D̂(ǩě j / √ n) = 1 + O(|ǩě j | 2 /n) and which, when summed over m j ≤ ňě j , j = 1, 2, 3, gives at most C L 2 n −1 |ǩ b | 2 3 by (4.14). Combining the above and recalling (4.10) and that C V = ρ 2 V reveals that where t m (u) = ρP(u ∈ T m ) (so t 0 (u) = ρ1 {u=0} ), and we recall that * denotes the convolution of functions on Z d . Note that in [17] there is a ζ in the definition, but this ζ = 1 because of Lemma 3.9 of [17]. Note that for m ≥ 2,
For a given skeleton network N + , let r + = # E(N + ). If there is a bond uu′ covering two special points then either we can find two non-neighbouring marked edges ě u and ě u′ , or (at least) one of u, u′ is a leaf of N + . In order to accommodate the latter cases, for the proof of Lemma 4.14(e:R) it is notationally convenient to adjoin to each leaf in N + a "phantom" marked edge of length 0, and write E(N ++ ) for this enlarged set of marked edges. For marked edges ě, ě′ ∈ E(N ++ ) write ě ∼ ě′ if they are adjacent, and ě ≁ ě′ otherwise. Recall from (4.4) that in the notation U st , st is a pair of vertices in N + . For non-adjacent marked edges ě, ě′ ∈ E(N ++ ) and m ≤ ňě and m′ ≤ ňě′ , write st (ě, ě′, m, m′) to denote the pair of vertices in N + corresponding to the m-th vertex along marked edge ě in the direction away from ě′ and the m′-th vertex along marked edge ě′ in the direction away from ě. If e.g. ě′ was one of the phantom marked edges then ňě′ = 0 and the relevant vertex is actually the leaf that ě′ was adjoined to. See e.g. Fig. 12.
Fig. 12. A skeleton network N + with a bond in R. This bond has endpoints in the marked edge ě ∈ N + and the "phantom" marked edge ě′ ∈ N ++ , of lengths ňě = 6 and ňě′ = 0 respectively. We write st (ě, ě′, mě, mě′) for this bond. Here, mě = 2 is indicated, while mě′ = 0. The set of marked edges E + ě,ě′ on the path from ě to ě′ is { f 1 , f 2 }, as indicated.
For 0 ≤ a ≤ b ≤ ňě, write ě[a, b] to denote that part of the marked edge ě consisting of the a-th to the b-th vertices (with ordering directed away from ě′ as above), and similarly define ě′[a′, b′] for 0 ≤ a′ ≤ b′ ≤ ňě′. Proof of Lemma 4.14(e:R). In the definition of φ R N + (see (4.6)), we can see that where e.g. if mě ∈ {0, ňě} then the corresponding empty product is 1. (Note that this kind of approach is used to prove (2.25) as well as the more general statement appearing in Remark 2.12.) Using the above inequalities we can see that Note that if e.g. ě′ is a phantom marked edge then the corresponding sum over mě′ contains only the value 0 = ňě′. Now, the ω in (4.6) can be broken up at every special point and at the two vertices corresponding to st (ě, ě′, m, m′). The graph then becomes broken up into (at most) r + + 2 segments. Let us now introduce the set E + ě,ě′ of marked edges which connect (but do not include) ě to ě′, which is non-empty since ě and ě′ are not neighbours, and Ē + ě,ě′ . This arises because e.g. if f and f ′ are two distinct marked edges for which there is no U st term appearing anywhere in (4.20) with s and t vertices of f and f ′ respectively, then the corresponding segments of ω (and the sets of lattice trees R · hanging off them) have been decoupled. Segments of ω and corresponding elements of y can then be summed over "independently", with factors of ρ arising at endvertices, similarly to (4.19). Similarly, the presence of the term [−U st (·,·,·,·) ] in (4.20) forces two corresponding trees R · to intersect, which yields the (2) term above. Recalling that ∑ y h n (y) ≤ C 1 for any n by (The power of C 1 is r + − #Ē + ě,ě′ + 2 ≤ r + , and so assuming C 1 ≥ 1 without loss of generality, the above follows.) The notation in the last convolution above means that there is one term in the convolution for each f ∈ E + ě,ě′ . By Lemma 4.19 with k = 2 and l = l + := 2 + #E + ě,ě′ we have that for ňě,ě′ =
Similarly to Lemma 4.18 (but note the change in the first summation) we have that Therefore, for any y ∈ (Z d ) r + , Using a generalisation of (2.25) as in Remark 2.12, and then (4.13) and (4.14), we have The result follows. The proof of Lemma 4.14(e:b) is again an adaptation of the proof in [17] (specifically in [17, Section 6.5.3]). Here we will indicate the changes to the argument required for the present setting of a marked skeleton network. We start by adapting [17, Definition 2.2]. Given a graph ∈ E b N + on N + , a special point v of N + and a marked edge e of which v is an endpoint, we define the bond associated to e at v as follows: If there is no bond in covering v that has an endpoint strictly on e then there is no bond associated to e at v. Otherwise from the set of such bonds we choose the one whose endpoint in e is farthest from v. If this is not unique then we choose from this set one according to a fixed but arbitrary rule (e.g. choose from those whose other endpoint is strictly on some edge e′ of smallest label the one whose endpoint on e′ is farthest from v in this direction).
Proof of Lemma 4.14(e:b). Recall that (4.21) Recall also that (ě i ) 3 i=1 are the marked edges adjacent to b and denote their end vertices (other than b) as (v i ) 3 i=1 , which are special points. For F ⊂ {1, 2, 3} let E b F,N + denote the set of graphs ∈ E b N + such that: ∀i ∈ F, A b ( ) contains a nearest neighbour of v i .
Note that if F ≠ {1, 2, 3} this set may include for which some A b ( ) also contains a nearest neighbour of v i for some i ∈ {1, 2, 3} \ F. Inclusion-exclusion over the sets F gives Given ∈ E b F,N + we define a subgraph F ⊂ to be the set of bonds st ∈ such that
• st is the bond associated to one of the marked edges ě i at b, for some i ∈ F, or
• st is the bond associated to one of the marked edges ě i , at v i where i ∈ F, or
• s and t are both vertices in the marked edge ě i for some i ∈ F.
Let S F denote the largest connected subnetwork of N + containing b that is covered by F . Then S F is a star-shaped network of degree 3 or less (with branch point b) and F is a connected graph on S F . Moreover S F contains at most #F + 1 special points of N + (one of which is b) since contains no bonds in R. Note that the length of branch i of S F is at least ň i − 1. Let S F (N + ) denote (for fixed F) the set of possible S F that can arise as above from graphs ∈ E b F,N + . It follows that (4.23) Now we may proceed as in [17, (6.23)-(6.28)], which we briefly discuss in the following paragraph but direct the reader to [17] for details. For fixed F and S ∈ S F (N + ) we have the notion of a lace on S containing N bonds and the set of bonds, C(L), which are compatible with the lace L, as described after (4.11). Similarly, given F, and ∈ E b F,N + such that S F ( ) = S we have the lace associated to the subgraph F , which is a connected graph on S. Thus, as in [17, (6.23)-(6.24)], we can write (4.23) as where the sum over L is a sum over (a certain subclass of all) laces on S containing exactly N bonds (for the definition of this subclass see [17, definition prior to (6.23)]). The last two pages of [17] show how to deal with the "messy" final sum in (4.24), by breaking the sum over into three sets: (i) sets of bonds on S that are compatible with L; (ii) sets of bonds that live on N + \ S; and (iii) sets of bonds st with one endpoint in S and one in N + \ S for which S F (L ∪ {st}) = S (in each case bonds in R are excluded). Using this decomposition we see that (4.24) is equal to We bound the absolute value of the above by simply removing the factors (−1) N (everything else is non-negative). Then we can ignore the last product (4.26) (bound it by 1) and get an upper bound. Similarly we can discard any part of the last product in (4.25) to get an upper bound. For the latter we throw away all st such that s and t are on different connected components of N + \ S.
We deduce from (4.21), (4.22), and the above that where #(N + \S) denotes the number of disjoint components of N + \S and the components are denoted by (N + \S) j . Here, the components S, and (N + \S) j for all j, have now been decoupled, because there are no U st terms where s and t are on different components. Recalling (4.11), the term in curly brackets in (4.27) (in combination with the part of ω and the trees R · corresponding to S) is the quantity that gives rise to π (N ) m (where the m i are the lengths of the branches of S), except that we are summing over a restricted set of laces containing N bonds. But we can also sum over all L ∈ L (N ) (S), the set of laces on S with exactly N bonds, to get an upper bound. This gives rise to a bound where we note that the sum over m arises from the sum over S seen in previous expressions, and the constant C(r + ) arises from the generalisation of (2.25) noted in Remark 2.12. Finally use Proposition 4.17 to see that (4.28) is at most This proves the result.
Acknowledgments The authors thank an anonymous referee for many comments that helped improve the paper.
Funding
Data availability Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.