Convergence of genealogies through spinal decomposition with an application to population genetics

Consider a branching Markov process with values in some general type space. Conditional on survival up to generation N, the genealogy of the extant population defines a random marked metric measure space, where individuals are marked by their type and pairwise distances are measured by the time to the most recent common ancestor. In the present manuscript, we devise a general method of moments to prove convergence of such genealogies in the Gromov-weak topology when N→∞. Informally, the moment of order k of the population is obtained by observing the genealogy of k individuals chosen uniformly at random after size-biasing the population at time N by its kth factorial moment. We show that the sampled genealogy can be expressed in terms of a k-spine decomposition of the original branching process, and that convergence reduces to the convergence of the underlying k-spines. As an illustration of our framework, we analyse the large-time behavior of a branching approximation of the biparental Wright–Fisher model with recombination. The model exhibits some interesting mathematical features. It starts in a supercritical state but is naturally driven to criticality. We show that the limiting behavior exhibits both critical and supercritical characteristics.


Introduction
A branching process with a self-organized criticality behavior. The original motivation of the present article is the branching approximation of a classical model in population genetics. It can be formulated as a branching process in discrete time where each individual carries a subinterval of (0, R), for some fixed parameter R > 0. At generation t = 0, the population is made of a single individual carrying the full interval (0, R). At each subsequent generation, individuals reproduce independently and an individual carrying an interval I with length |I| gives birth to K(I) children, where K(I) ∼ Poisson(1 + |I|/N), and N ≥ R is another fixed parameter. Each of these K(I) children independently inherits an interval which is either the full parental interval I, or a fragmented version of it. More precisely, with probability $r_N = \frac{2|I|/N}{1 + |I|/N}$ we say that a recombination occurs: a random point is sampled uniformly on I which breaks I into two subintervals. The child inherits either the left or the right subinterval with equal probability. With probability $1 - r_N$ no recombination occurs and the child inherits the full parental interval I.
We refer to this process as the branching process with recombination.
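The reproduction step just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the Poisson sampler is a textbook method chosen to avoid external dependencies, and the recombination probability is taken to be 2(|I|/N)/(1 + |I|/N), which is an assumption consistent with the requirement N ≥ R (it keeps the probability below 1).

```python
import math
import random

def recombination_probability(length, N):
    # ASSUMED form of r_N: 2(|I|/N) / (1 + |I|/N); it is <= 1 as long as |I| <= N.
    x = length / N
    return 2 * x / (1 + x)

def sample_poisson(mean, rng):
    # Knuth's multiplication method; adequate for the small means used here.
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def reproduce(interval, N, rng):
    """Return the intervals inherited by the children of one individual."""
    left, right = interval
    length = right - left
    children = []
    for _ in range(sample_poisson(1 + length / N, rng)):
        if rng.random() < recombination_probability(length, N):
            cut = rng.uniform(left, right)  # uniform breakpoint on I
            children.append((left, cut) if rng.random() < 0.5 else (cut, right))
        else:
            children.append((left, right))  # full parental interval
    return children

def next_generation(population, N, rng):
    """Apply one reproduction step to every individual, independently."""
    return [child for interval in population for child in reproduce(interval, N, rng)]
```

Starting from `[(0.0, R)]` and iterating `next_generation` gives one realization of the process; every interval produced is a subinterval of (0, R).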
One of the most interesting aspects of the present model is a self-organized criticality property. While the process is "locally" supercritical, since E[K(I)] > 1, intervals are broken via recombination and the process is naturally driven to criticality. Under the regime N ≫ R ≫ 1, we will prove that some features of the process are reminiscent of a critical branching process (for instance, it satisfies a type of Yaglom's law), while others bear similarities to supercritical branching processes. In particular, one striking feature is related to the genealogy of the process conditioned on survival at a large time horizon. In the natural time scale, the genealogy of the extant population is indistinguishable from the supercritical case, that is, it converges to a star tree. However, if we zoom in on the root by rescaling time in a logarithmic way, the genealogy converges to the celebrated Brownian Coalescent Point Process and becomes indistinguishable from that of a critical branching process.
From a biological standpoint, our process was first introduced in [3] and corresponds to a branching approximation of a more complicated model of population genetics, named the biparental Wright-Fisher model with recombination. The connection between the two models and their biological significance are discussed in greater detail in Section 2.
Convergence of types and genealogy. In order to analyse the previous model, we introduce a general framework and provide simple criteria for the convergence of random genealogies. Although the branching process that we consider is interesting in its own right, our study aims at giving a concrete illustration of a general approach that could presumably be relevant in many other settings.
It is quite common that individuals in a branching process are endowed with a "type", which is heritable and can in turn influence the reproductive success of individuals. Let us denote by E the set of types. For instance, in our work E is the set of subintervals of (0, R), for branching random walks $E = \mathbb{R}^d$ [44], and for multi-type Galton-Watson processes E is often chosen as finite [1, Chapter 5]. In the absence of types or when the reproduction law does not depend on types (as for standard branching random walks in $\mathbb{R}^d$), the scaling limits of the tree structure and of the distribution of types have received quite a lot of attention [12,39,30]. In this particular setting, one can make use of an encoding of the tree as the excursion of a stochastic process, the so-called contour process, or height process. Convergence is then obtained by showing that the corresponding excursion converges.
When the reproduction law may depend on the types, some attempts to extend the excursion approach exist in the literature [40] but, as far as we know, a systematic and amenable approach is still missing. In this work we follow a different approach, and extend the seminal work of [18] to prove convergence in the Gromov-weak topology. Proving convergence in distribution for this setting is very similar in spirit to the method of moments for real random variables, where one proves convergence in distribution by showing that all moments converge. In the context of trees and metric spaces, the moments of order k are obtained by summing over all k-tuples of individuals at some generation, and considering a functional of the subtree spanned by these k individuals. Informally, this amounts to picking k individuals at random in a size-biased population, and then proving convergence of the genealogy of the sample. One contribution of our work is that, analogously to the method of moments in the real setting, we only need to prove convergence of the moments with no need to identify the limit. This relies on a de Finetti-like representation of exchangeable coalescents that was developed in [17]. See Theorem 4 for our main convergence result.
Spinal decomposition of Markov branching processes. To compute the moments of branching processes, we make use of a second set of tools called spinal decompositions [31,44,22]. One of the main insights of the present manuscript lies in the observation that an ingenious random change of measure allows us to reduce the computation of a polynomial of order k to a computation on a single tree with k leaves, called the k-spine tree. Since this type of manipulation allows one to reduce a computation involving the whole tree to a computation involving only k individuals, results of this type have been called many-to-few formulas. While many-to-one formulas have been extensively explored in the literature since the seminal work of Lyons, Pemantle, and Peres [31], spinal decompositions of higher order are sparser. One formulation has been exposed in [22] (see also [24,21,4,41]) where the k-spine is constructed from a system of branching particles evolving according to a prescribed Markov dynamics. While the main result in [22] could in principle be applied to our setting, the computation rapidly proved to be intractable. Another contribution of our work, which we want to emphasize, is the derivation of a new general many-to-few formula that is better suited to our case. Let us describe it briefly here, and refer to Section 3 for a complete account.
The k-spine tree in our work is constructed iteratively as a coalescent point process (CPP in short). Starting from a single branch of length N, at each step a new branch is added to the right of the tree. Branch lengths are assumed to be i.i.d., and the procedure is stopped when the tree has k leaves, see Figure 4. Given this tree, types need to be assigned to vertices of the k-spine tree. For k = 1, the tree is made of a single branch, and the sequence of types observed from the root to the unique leaf is a Markov chain. This Markov chain is the usual sequence of types along the spine that arises in many versions of the many-to-one formula [7,44]. It is obtained as the Doob harmonic transform of the offspring type, see Section 3.1. For a general k, the previous chain is duplicated independently at each branch point. The distribution of the resulting tree is connected to the original distribution of the branching process through a random change of measure $\Delta_k$ given in (3). The latter factor accounts for the fact that individuals located at the branch points are more likely to have a large offspring and a favorable type.
While our spinal decomposition result bears similarities with that in [22], our formulation allows for a more general distribution of the k-spine tree, which can be any discrete CPP. This additional degree of freedom proved very valuable in our application, where the introduction of a well-chosen ansatz for the genealogy of the process, see (17), considerably simplified earlier versions of our proofs. More generally, we believe that our approach is particularly amenable to the study of near-critical branching processes, since the scaling limit of their genealogy can also be described as a continuous CPP. Nevertheless, see [24,21] for successful applications of the techniques in [22] to study the genealogy of a sample from a Galton-Watson tree.
Outline. Overall, the contribution of our work is three-fold. We have 1) derived a new type of many-to-few formula based on a CPP tree, 2) combined it with the framework of the Gromov-weak topology to produce an effective way of studying the scaling limit of types and genealogies in branching processes, and 3) applied it to study a complex model from population genetics, the branching process with recombination.
The rest of our work is laid out as follows. Section 2 provides more details on the biological motivation of the branching process with recombination and a statement of our main results concerning this model. Those results will be proved using a general framework that will be developed in the subsequent sections.
In Section 3 we construct the k-spine tree and prove our spinal decomposition result. In Section 4, we show that the convergence of the genealogy of branching processes can be reduced to the convergence of the associated k-spines. This approach relies on a previous work [17] where we provide a de Finetti-like representation of ultrametric spaces that allows us to extend previous convergence criteria for the Gromov-weak topology.
In the last two sections, we apply the previous framework to the model at hand. In Section 5, we characterize the 1-spine associated to the branching process with recombination, and prove our convergence results in Section 6.

Branching process with recombination

2.1 Biological motivation
In the context of this work, genetic recombination is the biological mechanism by which an individual can inherit a chromosome which is not a copy of one of its two parental chromosomes, but a mix of them. An idealized version of this mechanism is illustrated in Figure 1. Due to recombination, the alleles carried by an individual at different loci, that is, locations on the chromosome, are not necessarily transmitted together. At the level of the population, this creates a complex correlation between the gene frequencies at different loci which is hard to study mathematically.
When focusing on a finite number of loci it is possible to express the dynamics of these frequencies as a set of non-linear differential equations or stochastic differential equations [2,36]. However, one needs to keep track of the frequencies of all possible combinations of alleles. As the number of such combinations grows exponentially fast with the number of loci, it leads to expressions that rapidly become cumbersome, providing little biological insight. Another very fruitful approach is to trace backward-in-time the set of potential ancestors of the population. This gives rise to a mathematical object named the ancestral recombination graph (ARG) [19], or see [13, Chapter 3]. However, the ARG is quite complicated both from a mathematical and a numerical point of view. Nevertheless, see [28] for some recent mathematical results, and [33,32] for approximations of the ARG that have proved very successful in applications.
In this work we consider a third approach to this question, which is to envision the chromosome as a continuous segment. At each reproduction event recombination can break this segment into several subintervals, a subset of which is transmitted to the offspring, as in Figure 1. The genetic contribution of an individual is now described by a collection of intervals, which are delimited by points called junctions. This point of view has a long-standing history dating back to the work of Fisher [16], see for instance [23] and references therein. Let us discuss the specific model that we consider, and how the branching process with recombination approximates it.

Connection to the Wright-Fisher model
Consider a population of fixed size N where individuals are endowed with a continuous chromosome represented by the interval (0, R). At each generation, individuals pick independently two parents uniformly from the previous generation. Assume that these parents can be distinguished, so that there is a left and a right parent. Then, independently for each individual:
• with probability 1 − R/N, it inherits the chromosome of one of its two parents, say the left one;
• with probability R/N, a recombination occurs. A crossover point U is sampled uniformly on (0, R), and the offspring inherits the part of the chromosome to the left of U from its left parent, and that to the right of U from its right parent.
Suppose that at some focal generation, labeled generation t = 0, each chromosome in the population is assigned a different color. Due to recombination, new chromosomes are formed that are mosaics of the initial colors. We are ultimately interested in describing the long-term distribution of these mosaics in the population. This is illustrated in Figure 2.
In this work, we consider a simpler but related problem. Fix a focal ancestor, and say that its chromosome is red. We trace the individuals in the population that have inherited some genetic material from this focal ancestor, that is, the set of individuals that have some red on their chromosome as well as the location of the red color. To recover the branching approximation that we study, consider an individual in the population at some generation t carrying a red interval I. Its offspring size distribution is approximately Poisson with mean 2. Each of these children has another parent in the population. As long as the number of individuals with a red piece of chromosome is small compared to N, this other parent does not have any red part on its chromosome. Therefore, there are only four possible outcomes for each child:
• With probability 1 − R/N no recombination occurs; then with probability 1/2 the child inherits I, and with probability 1/2 the interval I is lost.
• With probability R/N a recombination occurs. If U ∉ I, the interval I is transmitted or lost with probability 1/2 each; if U ∈ I, the child inherits the subinterval of I to the left or to the right of U with probability 1/2 each.
By combining the previous cases, we recover that the number of descendants of an individual with red interval I that carry some red is approximately distributed as a Poisson(1 + |I|/N) r.v., and that the probability of inheriting a fragmented interval is $r_N = \frac{2|I|/N}{1 + |I|/N}$. This is the description of the branching process with recombination.
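The bookkeeping behind this combination of cases can be checked with exact rational arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes that each individual has approximately Poisson(2) children (one mean-1 contribution as left parent and one as right parent), so that the number of red children is obtained by thinning.

```python
from fractions import Fraction

def child_outcomes(length, R, N):
    """Per-child probabilities of inheriting the full red interval or a fragment.

    Follows the four cases listed in the text; whatever remains is the
    probability that the red interval is lost.
    """
    length, R, N = Fraction(length), Fraction(R), Fraction(N)
    rec = R / N               # a recombination occurs
    inside = length / R       # the crossover U falls inside I
    half = Fraction(1, 2)
    full = (1 - rec) * half + rec * (1 - inside) * half
    fragment = rec * inside
    return full, fragment

full, fragment = child_outcomes(3, 10, 100)
# ASSUMPTION: about Poisson(2) children in total, thinned by the probability of being red.
mean_red_children = 2 * (full + fragment)
fragment_given_red = fragment / (full + fragment)
```

With |I| = 3, R = 10 and N = 100, the mean number of red children is exactly 1 + 3/100 and the conditional probability of a fragment is (2 · 3/100)/(1 + 3/100), in agreement with the Poisson parameter stated above.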

Limiting behavior
Let $P_R$ denote the distribution of the branching process with recombination started from a single individual with interval (0, R). The following asymptotic expression for the survival probability of this process was already derived in [3].

Proposition 1 ([3]). Let $Z_N$ denote the population size at generation N in the branching process with recombination. The limit

Let $T_N$ denote the set of individuals at generation N in the branching process with recombination. For an individual $u \in T_N$, we denote by $I_u$ the interval that it carries. For $u, v \in T_N$, let $d_T(u, v)$ denote the genealogical distance between u and v, that is, the number of generations that need to be traced backward in time before u and v find a common ancestor.
Our first result provides the joint limit of the interval lengths and of the genealogy of the population. To derive this limit, we will envision the population as a marked metric measure space and work with the marked Gromov-weak topology [10]. The definition of this topology is recalled in Section 4.1.
Let us consider the measure $\mu_N$ on $T_N \times \mathbb{R}_+$ defined as $$\mu_N = \sum_{u \in T_N} \delta_{(u, |I_u|)}.$$ The triple $[T_N, d_T, \mu_N]$ is the marked metric measure space corresponding to the branching process with recombination. Let us finally introduce the rescaling $F_R$ and the rescaled distance $d_N$, which is the distance obtained by rescaling time according to $F_R$.
Theorem 1. Fix t > 0. Conditional on survival at time N t the following limit holds in distribution for the marked Gromov-weak topology, where [(0, Y ), d P ] is a Brownian coalescent point process, and Exp(t) is the exponential distribution with mean 1/t.
A stronger version of this result is proved in Section 6.3. Let us now briefly discuss several consequences of the previous result.
Convergence of the empirical measure. As mentioned in the introduction, the branching process at hand is naturally driven to criticality through recombination. Recall that if the offspring distribution of a (standard) critical branching process has finite second moment, the celebrated Yaglom law states that conditional on survival up to time Nt, the rescaled population size $Z_{Nt}/N$ converges to an exponential random variable. In contrast, the convergence of $\mu_{Nt}/(tN \log R)$ entails that the rescaled population size at time Nt converges to an exponential random variable, but the population size is of order N log R instead of N. In words, the local supercritical character of the process translates into an extra log R factor for the population size.
Secondly, the convergence of the random measure $\mu_{Nt}/(tN \log R)$ also implies that the length of the interval carried by a typical individual in the population is exponentially distributed with mean 1/t. Since the limiting random measure is deterministic, the intervals carried by k typical individuals in the population are independent (propagation of chaos). Note that, although the length of the initial interval R goes to infinity, the intervals at any finite time t remain of finite length. This phenomenon is usually referred to as coming down from infinity. In our work, it originates from the existence of an entrance law at infinity for the spine, which turns out to be connected to the existence of such entrance laws for positive self-similar Markov processes with negative index of self-similarity [5,6].
Convergence of the genealogy. Let us first comment on the rescaling $F_R$. Although the expression of $F_R$ appears a bit daunting at first, it essentially boils down to first rescaling time by N (as expected), and then measuring time from the origin in the log-scale. The first consequence is that the genealogy of the population in the natural scale (that is, if we only rescale time by N) converges to a star tree, so that in the limit the genealogy becomes indistinguishable from that of a supercritical branching process.
A second consequence of this result is that, after rescaling time according to $F_R$, the genealogy of the branching process with recombination converges to a limiting metric space named the Brownian coalescent point process (CPP). It is constructed out of a Poisson point process P on (0, ∞) × (0, 1) with intensity $dt \otimes \frac{1}{x^2}\,dx$. Let $$\forall x \le y, \quad d_P(x, y) = \sup\{z : (t, z) \in P,\ x \le t \le y\}.$$
The Brownian CPP is the random metric space $[(0, Y), d_P]$, where Y is an exponential r.v. with mean 1, independent of P, see Figure 3 for a graphical construction. It corresponds to the limit of the genealogy of a critical Galton-Watson process with finite variance [39].
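The construction above can be approximated numerically. The sketch below is a hedged illustration, not part of the original argument: since the intensity x⁻² dx has infinite mass near 0, it only keeps atoms of height larger than a cutoff `eps` (a hypothetical parameter), so that distances smaller than `eps` are reported as 0. The function names are also hypothetical.

```python
import random

def sample_truncated_cpp(eps, rng):
    """Sample Y and the atoms (t, z) of P with t in (0, Y) and z in [eps, 1)."""
    Y = rng.expovariate(1.0)              # exponential with mean 1
    rate = 1.0 / eps - 1.0                # mass of x^-2 dx on (eps, 1)
    atoms = []
    t = rng.expovariate(rate)             # atoms arrive at this rate in time
    while t < Y:
        u = rng.random()
        z = 1.0 / (1.0 / eps - u * rate)  # inverse CDF of the density prop. to x^-2
        atoms.append((t, z))
        t += rng.expovariate(rate)
    return Y, atoms

def cpp_distance(atoms, x, y):
    """d_P(x, y): largest atom height between x and y (0 if there is none)."""
    lo, hi = min(x, y), max(x, y)
    return max((z for t, z in atoms if lo <= t <= hi), default=0.0)
```

Because the atoms between x and z are exactly those between x and y together with those between y and z, the function `cpp_distance` automatically satisfies the ultrametric inequality.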
Chromosomic distance. The previous result provides a complete description of the interval lengths in the population, but does not provide any insight into their distribution over (0, R). We will encode the latter information by picking a reference point belonging to each interval in the population and considering the usual distance on the real line between these points. More precisely, for each $u \in T_N$, pick a reference point $M_u$ uniformly on $I_u$. We define a new metric $$\forall u, v \in T_N, \quad D_N(u, v) = |M_u - M_v|.$$ We will refer to $D_N$ as the chromosomic distance.
The quadruple $[T_N, d_N, D_N, \mu_N]$ can be seen as a random "bi-metric" measure space with marks. We can define a straightforward extension of the marked Gromov-weak topology for such objects, see the end of Section 4.1. The correct rescaling for $D_N$ is to set In Section 6.3, we prove the following refinement of Theorem 1.
Theorem 2. Fix t > 0. Conditional on survival of the process at tN , the following limit holds in distribution for the marked Gromov-weak topology, where [(0, Y ), d P ] is a Brownian coalescent point process, and Exp(t) is the exponential distribution with mean 1/t.
It is important to note that, in the limit, the two metrics coincide. This result is quite interesting from a biological point of view. It shows that there is a correspondence between the genealogical distance between two individuals, and the chromosomic distance between the genetic material that they carry. Indeed, the latter two quantities are correlated: two individuals inherit intervals that are subsets of the interval carried by their most-recent common ancestor. If this ancestor is recent, its interval is smaller, and so is their chromosomic distance. Our result shows that, in the limit, the two distances become identical when considered on the right scale. This result is illustrated in Figure 3.

Figure 3: The black vertical lines represent the atoms of P, and the corresponding tree is pictured in grey. Bottom: geometry of the blocks of ancestral material corresponding to the top CPP. Each block is represented by a black stripe. The correspondence between the blocks and the tree is shown for some blocks by grey segments joining the two. The distance between two consecutive stripes is the logarithm of their distance on the chromosome. Note that this induces a strong deformation of the intuitive linear scale.

which corresponds to the CPP tree "viewed from the individual with the left-most interval". Using elementary properties of Poisson point processes shows that ϑ can also be written as where P is a Poisson point process on (0, ∞) × (0, ∞) with intensity $\frac{1}{x^2} e^{-x/y}\,dx\,dy$. The same expression was obtained in [28, Theorem 1.5] to describe the set of loci that share the same ancestor as the left-most locus in the fixed haplotype of a Wright-Fisher model with recombination, under a limiting regime similar to ours. This connection is quite surprising. We are considering a branching approximation where all intervals belong to distinct individuals and each chromosome carries at most one block of ancestral genome, whereas in [28] all intervals belong to a single chromosome, which has reached fixation in the population.
3 The k-spine tree

The many-to-few formula
The objective of this first section is to introduce the k-spine tree and state our many-to-few formula, which relates the expression of the polynomials of a branching process to the k-spine tree. All the random variables introduced here are more formally defined in the forthcoming sections, where the proof of the many-to-few formula is carried out. A formal statement of our result requires some preliminary notation.
Assumption and notation. Consider a Polish space $(E, d_E)$, and a collection (Ξ(x); x ∈ E) of random point measures on E. This collection can be used to construct a branching process with type space E, such that the atoms of a realization of Ξ(x) provide the types of the children of an individual with type x. The distribution of the resulting branching process is denoted by $P_x$.
Let K(x) denote the number of atoms of Ξ(x), and set for the distribution of K(x). The n-th factorial moment of K(x) is denoted by $m_n(x)$, that is, $$m_n(x) = \mathbb{E}\big[K(x)^{(n)}\big],$$ where we have used the notation $k^{(n)}$ for the n-th descending factorial of k, $$k^{(n)} = k(k-1)\cdots(k-n+1).$$ Our results are more easily formulated under the assumption that, conditional on K(x), the locations of the atoms are i.i.d. with distribution p(x, ·). That is, we assume that $$\Xi(x) = \sum_{i=1}^{K(x)} \delta_{\xi_i(x)},$$ where $(\xi_i(x); i \ge 1)$ is an i.i.d. sequence distributed as p(x, ·) and is independent of K(x). We make the further simplifying assumption that all distributions p(x, ·) have a density w.r.t. some common measure Λ on E. With a slight abuse of notation, the density of p(x, ·) is denoted by (p(x, y); y ∈ E).
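Harmonic function. We say that a map $h : E \to (0, \infty)$ is harmonic if $$\forall x \in E, \quad \mathbb{E}\big[\langle \Xi(x), h \rangle\big] = h(x),$$ where we used the notation $\langle \mu, f \rangle = \int f \, d\mu$, see for instance [7]. A harmonic function can be used to define a new probability kernel on E, defined as $$q(x, y) = m_1(x)\, \frac{h(y)}{h(x)}\, p(x, y). \qquad (2)$$ The fact that this is a probability measure follows from the harmonicity of h.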
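To make the harmonic transform concrete, here is a small finite-state sketch in exact arithmetic. Everything in it is an illustrative assumption: a two-point type space, a hypothetical kernel `p`, mean offspring numbers `m1` chosen so that `h` is harmonic, and the standard Doob form q(x, y) = m₁(x) p(x, y) h(y)/h(x) for the transformed kernel.

```python
from fractions import Fraction

def is_harmonic(m1, p, h):
    """Check m1(x) * sum_y p(x, y) h(y) == h(x) for every type x."""
    states = range(len(h))
    return all(m1[x] * sum(p[x][y] * h[y] for y in states) == h[x] for x in states)

def doob_kernel(m1, p, h):
    """Transformed kernel q(x, y) = m1(x) p(x, y) h(y) / h(x)."""
    states = range(len(h))
    return [[m1[x] * p[x][y] * h[y] / h[x] for y in states] for x in states]

# Hypothetical two-type example: offspring type kernel, harmonic function, mean offspring.
p = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 4), Fraction(3, 4)]]
h = [Fraction(1), Fraction(2)]
m1 = [Fraction(2, 3), Fraction(8, 7)]
q = doob_kernel(m1, p, h)
```

In this example the rows of `q` are (1/3, 2/3) and (1/7, 6/7). Each sums to 1 precisely because `h` is harmonic, which is the mechanism invoked in the last sentence above.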
There is a unique tree with k leaves labeled by {1, . . ., k} such that the tree distance between the leaves is $d_T$. We denote it by S and call it the ν-CPP tree. This tree is constructed inductively by grafting a branch of length $N - W_i$ on the tree constructed at step i, as illustrated in Figure 4.
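As a hedged illustration of how the branching times determine the leaf distances, the sketch below assumes that the distance between two leaves is the number of generations back to their most recent common ancestor, and that leaves i < j split at generation min(W_i, …, W_{j−1}), as is standard for coalescent point processes. The function name and the value N = 8 in the usage note are hypothetical.

```python
def cpp_leaf_distances(W, N):
    """Pairwise distances between the k = len(W) + 1 leaves of a CPP tree.

    W[i] is the generation at which leaf i + 2 (1-indexed) branches off,
    with 0 <= W[i] <= N - 1.
    """
    k = len(W) + 1
    dist = [[0] * k for _ in range(k)]
    for i in range(k):
        split = N  # generation of the common ancestor of leaf i and leaf j
        for j in range(i + 1, k):
            split = min(split, W[j - 1])
            dist[i][j] = dist[j][i] = N - split
    return dist

distances = cpp_leaf_distances([6, 4, 1, 5, 5], 8)
```

With the branching times (6, 4, 1, 5, 5) of Figure 4 and N = 8, the first two leaves are at distance 2, while the first and fourth leaves are at distance 7 because the deepest split between them occurs at generation 1.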
We now assign marks on the tree such that along each branch of the tree, marks evolve according to a Markov chain with transition kernel (q(x, y); x, y ∈ E) defined in (2). More formally, construct a collection of processes $(X_1, \dots, X_k)$ such that
• the process $(X_1(n); n \ge 0)$ is a Markov chain with transition (q(x, y); x, y ∈ E) started from x;
• conditional on $(X_1, \dots, X_i)$, $$X_{i+1}(n) = \begin{cases} X_i(n) & \text{if } n \le W_i, \\ X'(n - W_i) & \text{if } n > W_i, \end{cases}$$ for some independent Markov chain $X'$ with transition (q(x, y); x, y ∈ E) started from $X_i(W_i)$.

Figure 4: Illustration of the construction of a CPP tree. The vector $(W_1, \dots, W_{k-1})$ gives the branching times between successive leaves. In this example, this vector is (6, 4, 1, 5, 5). The tree is recovered from these times by grafting, for each i, a branch of length $N - W_i$ to the right-most vertex of the tree at generation $W_i$.

By thinking of $(X_i(n); n \ge 0)$ as giving the sequence of marks along the branch of S starting from the root and going to the i-th leaf, we can assign to each vertex u ∈ S a mark $Y_u$.
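The duplication of the chain at the branch points can be sketched as follows. This is a minimal sketch under stated assumptions: `step` is a hypothetical user-supplied sampler for one transition of the chain, and each new spine is taken to copy the previous one up to the branching time before continuing with a fresh, independent chain.

```python
import random

def sample_spine_marks(W, N, x0, step, rng):
    """Sample the mark sequences (X_1, ..., X_k) along the k = len(W) + 1 spines.

    step(x, rng) samples one transition of the chain from state x.
    Each returned list has N + 1 entries (generations 0, ..., N).
    """
    def run_chain(start, n_steps):
        path = [start]
        for _ in range(n_steps):
            path.append(step(path[-1], rng))
        return path

    spines = [run_chain(x0, N)]
    for w in W:
        previous = spines[-1]
        fresh = run_chain(previous[w], N - w)  # independent chain started from X_i(W_i)
        spines.append(previous[:w] + fresh)    # shared marks up to generation W_i
    return spines

# Hypothetical toy kernel: a lazy random walk on the integers.
toy_step = lambda x, rng: x + rng.choice((-1, 0, 1))
spines = sample_spine_marks([6, 4, 1, 5, 5], 8, 0, toy_step, random.Random(3))
```

Any sampler for the kernel q can be substituted for `toy_step`; the random walk is used only so that the example runs on its own.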
Definition 1. The k-spine tree is the random marked tree $[S, (Y_u; u \in S)]$ encoded by the variables $(W_1, \dots, W_{k-1})$ and $(X_1, \dots, X_k)$. The distribution of the latter variables is denoted by $Q^{k,N}_x$.
We are now ready to state our many-to-few formula. It can be described informally as follows. Suppose that the branching process with law $P_x$ is biased by the k-th factorial moment of its size at generation N and that k individuals are chosen uniformly from that generation. Then the law of the subtree spanned by these individuals is $Q^{k,N}_x$ biased by a random factor $\Delta_k$ that can be expressed as where $d_u$ denotes the degree of a vertex u ∈ S and $Y_u$ its mark. Note that the left product in (3) has at most k − 1 terms, which correspond to the branch points in S.
Lemma 1. Assume that for every x ∈ E, the offspring number K(x) is Poisson (for some given parameter λ(x) > 0 that may depend on x).Then .
Proof. This simply follows from the well-known fact that the k-th factorial moment of a Poisson random variable with parameter λ > 0 is $\lambda^k$.
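The Poisson identity used in this proof is easy to confirm numerically. The sketch below (hypothetical function names, series truncated after a fixed number of terms) sums the falling factorial against the Poisson weights.

```python
import math

def falling_factorial(k, n):
    """k^(n) = k (k - 1) ... (k - n + 1), which vanishes when k < n."""
    out = 1
    for j in range(n):
        out *= k - j
    return out

def poisson_factorial_moment(lam, n, terms=200):
    """Truncated series for E[K^(n)] when K is Poisson with parameter lam."""
    total, weight = 0.0, math.exp(-lam)  # weight = P(K = 0)
    for k in range(terms):
        total += falling_factorial(k, n) * weight
        weight *= lam / (k + 1)          # P(K = k + 1) from P(K = k)
    return total
```

For moderate values of the parameter, `poisson_factorial_moment(lam, n)` agrees with `lam ** n` up to floating-point error.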
Finally, let T N denote the labels of the N -th generation of a branching process with distribution P x , for u ∈ T N let X u denote its type, and let d T denote the tree distance on T N .
Remark 2. (i) In our construction of the k-spine, the distribution of the tree is independent of the marking. The term $\Delta_k$ captures the interplay between the genealogy and the types as a function of the marking at "topological" points.
(ii) Compare Proposition 2 to the many-to-few formula in [22]. Both expressions relate the distribution of a k-sample from the branching process (l.h.s. of the equality) to that of a simpler k-spine tree (r.h.s. of the equality) at the expense of a bias term, here denoted by $\Delta_k$.
(iii) In [22], the k-spine tree only depends on the moments of the reproduction law. Our formulation has one extra degree of freedom, since the k-spine tree is constructed out of an a priori genealogy, the ν-CPP tree.
(iv) In many situations, including the model at hand, the bias term in [22] becomes degenerate in the limit so that the distribution of the limiting genealogy is singular with respect to that of the original k-spine tree. For instance, for near-critical processes conditioned on survival at generation N, the first split time of the k-spine tree in [22, Section 8] remains of order 1, whereas the most-recent common ancestor of the whole population is known to live at a time of order N. In contrast, one advantage of our approach is that ν can be chosen so that the bias $\Delta_k$ converges to a non-degenerate limit. This amounts to finding a good ansatz for the limiting genealogy. In our example this ansatz is given in (17), and the limit of the bias $\Delta_k$ is independent of the genealogy. This indicates that the limit of the genealogy does not depend on the types in the population.
The rest of the section is dedicated to the proof of the many-to-few formula. Our strategy to prove this result is to define a new tree with distribution Qk,N x by grafting on the k-spine tree independent subtrees distributed as the original branching process. The many-to-few formula will then follow from the more precise spinal decomposition theorem, which states that Qk,N

Tree construction of the branching process
Let us recall some common notation on trees.
Trees. Following the usual Ulam-Harris labeling convention, all trees will be encoded as subsets of $$U = \bigcup_{n \ge 0} \mathbb{N}^n,$$ with the convention $\mathbb{N}^0 = \{\emptyset\}$. Let us consider an element u = (u(1), . . ., u(n)) ∈ U. We denote by |u| = n its length, interpreted as the generation of u. Moreover, its i-th child is denoted by $ui = (u(1), \dots, u(n), i)$, and its ancestor in the previous generation is $(u(1), \dots, u(n-1))$. The set U is naturally endowed with a partial order ⪯, where u ⪯ v if u is an ancestor of v, that is, if $|u| \le |v|$ and $u(i) = v(i)$ for all $i \le |u|$. The most-recent common ancestor of u and v can then be defined as u ∧ v := max{w : w ⪯ u and w ⪯ v}.
In the tree interpretation of U, we can define a metric $d_T$ corresponding to the graph distance. Finally, as a consequence of the Ulam-Harris encoding, trees are planar in the sense that the children of each vertex are endowed with a total order. Accordingly, let us denote by ≤ the lexicographical order on U, which we will call the planar order. Note that ≤ extends ⪯.
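The Ulam-Harris conventions translate directly into operations on tuples. The sketch below is only illustrative (the function names are hypothetical): labels are Python tuples of positive integers, the ancestor relation is the prefix relation, and the most-recent common ancestor is the longest common prefix.

```python
def is_ancestor(u, v):
    """True if u is an ancestor of v (or u == v), i.e. u is a prefix of v."""
    return len(u) <= len(v) and v[: len(u)] == u

def mrca(u, v):
    """Most-recent common ancestor: the longest common prefix of u and v."""
    n = 0
    while n < min(len(u), len(v)) and u[n] == v[n]:
        n += 1
    return u[:n]

def child(u, i):
    """Label of the i-th child of u."""
    return u + (i,)
```

Python compares tuples lexicographically and treats a prefix as smaller than any of its extensions, so the built-in `<=` on tuples plays the role of the planar order and extends the ancestor order.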
A tree is a subset τ ⊆ U such that (i) ∅ ∈ τ; (ii) if, for some j, uj ∈ τ, then u ∈ τ; (iii) for any u ∈ τ, there exists an integer $k_u \ge 0$ such that uj ∈ τ if and only if $1 \le j \le k_u$, where $k_u$ is the number of children of u, also called the (out-)degree of u.
The set of all trees is denoted by Ω. For a tree τ ∈ Ω, define its restriction to the n-th generation as $\tau_n = \{u \in \tau : |u| = n\}$ and that to the first n generations as $\tau_{[n]} = \{u \in \tau : |u| \le n\}$. Furthermore, let us denote by $\Omega_n$ the set of trees of height at most n, where the height of a tree is defined as the generation of the oldest individual in the tree.
Marked trees and definition of P x .A marked tree is a tree τ ∈ Ω with a collection (x u ; u ∈ τ ) of marks with values in E. Let us define a random marked tree [T, (X u ; u ∈ T )] inductively as follows, that corresponds to the branching process with offspring reproduction point processes (Ξ(x); x ∈ E).
Start from a single individual ∅ with mark X ∅ = x.Conditional on the first n generations T [n] and their marks (X u ; u ∈ T [n] ), consider a collection of independent point processes (Ξ u ; u ∈ T n ), where Ξ u ∼ Ξ(X u ).
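Let us write (ξ ui ; 1 ≤ i ≤ k u ) for the atoms of Ξ u , where k u is the number of atoms. Then define the next generation as $T_{n+1} := \{ui : u \in T_n,\ 1 \le i \le k_u\}$ with marks given by ∀ui ∈ T n+1 , X ui = ξ ui .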
Let T = ∪ n≥1 T n be the whole tree, and define P x as the law of the random marked tree [T, (X u ; u ∈ T )] obtained through the previous procedure, and P N x as the law of its restriction to the first N generations.
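The inductive construction can be sketched as a generation-by-generation simulation. The Python code below is a minimal illustration, not the authors' implementation: `simulate` accepts any offspring rule, and `recombination_offspring` instantiates it with the interval model of the introduction, assuming a Poisson(1 + |I|/N) number of children and a recombination probability (2|I|/N)/(1 + |I|/N). All function names are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Sample a Poisson variable (Knuth's method; adequate for small means)."""
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def recombination_offspring(interval, N, rng):
    """Marks of the children of an individual carrying `interval` (assumed rates)."""
    a, b = interval
    length = b - a
    r = (2 * length / N) / (1 + length / N)  # assumed recombination probability
    marks = []
    for _ in range(poisson(1 + length / N, rng)):
        if rng.random() < r:
            cut = rng.uniform(a, b)  # uniform breakpoint on the parental interval
            marks.append((a, cut) if rng.random() < 0.5 else (cut, b))
        else:
            marks.append((a, b))  # no recombination: full interval inherited
    return marks


def simulate(x, offspring, generations, rng):
    """Return a dict mapping Ulam-Harris labels (tuples) to marks."""
    tree = {(): x}
    current = [()]
    for _ in range(generations):
        nxt = []
        for u in current:
            # the atoms of the offspring point process become the children of u
            for i, mark in enumerate(offspring(tree[u], rng), start=1):
                tree[u + (i,)] = mark
                nxt.append(u + (i,))
        current = nxt
    return tree


rng = random.Random(1)
tree = simulate((0.0, 1.0), lambda I, g: recombination_offspring(I, 10.0, g), 6, rng)
```

Storing the tree as a dictionary keyed by Ulam-Harris labels keeps the genealogy implicit in the keys, so ancestors and common ancestors can be read off by taking prefixes.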

Ultrametric trees
From now on, we consider a fixed, focal generation N . In this section we construct the measure Qk,N x obtained by grafting some independent subtrees on the k-spine tree. This construction relies on the notion of (discrete) ultrametric trees.
The set of all ultrametric trees of height N with k leaves is denoted by U k,N . For τ ∈ U k,N , let us denote by (ℓ 1 , . . ., ℓ k ) the leaves of τ in lexicographical order, that is, such that ℓ 1 ≤ · · · ≤ ℓ k .
The previous description of an ultrametric tree as an element of Ω N is not suitable to describe the large N limit of the k-spine tree. To derive such a limit, we need to encode elements of U k,N as a sequence (g 1 , . . ., g k−1 ) giving the branch times between successive leaves in the tree. This construction is sometimes referred to as a coalescent point process (CPP) [39,29].
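More precisely, define the map $\Phi : \tau \mapsto (|\ell_1 \wedge \ell_2|, \ldots, |\ell_{k-1} \wedge \ell_k|)$, where ℓ 1 ≤ · · · ≤ ℓ k are the leaves of τ in planar order. The following straightforward result shows that the tree τ can be recovered from the vector of coalescence times Φ(τ ).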
Lemma 2. The map Φ is a bijection from the set of ultrametric trees U k,N to the set of vectors {0, . . ., N − 1} k−1 .
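To make the encoding of Lemma 2 concrete, the sketch below rebuilds the genealogy of the k leaves from a vector (g 1 , . . ., g k−1 ). It assumes that g i records the generation of the most recent common ancestor of the i-th and (i+1)-th leaves, so that the common ancestor of leaves i < j lives at generation min{g i , . . ., g j−1 }; this reading of Φ, like the function names, is an assumption made for illustration.

```python
from itertools import product


def mrca_generations(g):
    """Matrix of MRCA generations between the k = len(g) + 1 leaves."""
    k = len(g) + 1
    depth = [[None] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            # leaves i < j coalesce at the shallowest branch point between them
            depth[i][j] = depth[j][i] = min(g[i:j])
    return depth


def branch_times(depth):
    """Inverse map: read the vector g off the consecutive leaves."""
    return tuple(depth[i][i + 1] for i in range(len(depth) - 1))


def is_ultrametric(depth, N):
    """Check the strong triangle inequality for the distance N - (MRCA generation)."""
    k = len(depth)

    def dist(i, j):
        return 0 if i == j else N - depth[i][j]

    return all(
        dist(i, j) <= max(dist(i, l), dist(l, j))
        for i in range(k) for j in range(k) for l in range(k)
    )


N, k = 3, 3
vectors = list(product(range(N), repeat=k - 1))  # the N^(k-1) possible vectors
```

Enumerating `vectors` for small N and k gives a quick sanity check that distinct vectors yield distinct ultrametric genealogies.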
The k-spine tree. Let (W 1 , . . ., W k−1 ) and (X 1 , . . ., X k ) have distribution Q k,N x . We formally define the ν-CPP tree illustrated in Figure 4 as the random tree S := Φ −1 (W 1 , . . ., W k−1 ). Note that the CPP tree associated to the uniform distribution is uniform on U k,N . The processes (X 1 , . . ., X k ) can now be used to construct a collection of marks (Y u ; u ∈ S) as follows. Each u ∈ S is of the form u = (ℓ i (1), . . ., ℓ i (n)) for some leaf ℓ i and n ≤ N . Define the mark of such a u as $Y_u := X_i(n)$. (It is not hard to see that Y u is well-defined in that it does not depend on the choice of i if u is ancestral to several leaves.) The marked tree [S, (Y u )] is the k-spine tree encoded by the r.v. (W 1 , . . ., W k−1 ) and (X 1 , . . ., X k ).

Construction of Qk,N x . Let [S, (Y u )] be the k-spine tree constructed above. We attach to S some subtrees distributed as P x to define a larger marked tree [T, (X u )]. This yields a random tree with k spines originating from the k leaves of S at generation N . The distribution of these random variables will be denoted by Qk,N x .
To construct T from the spine, we first specify the number of subtrees that need to be attached to each vertex u of the spine. We will distinguish the degree of a vertex in S and that in the larger tree T . We denote by d u the number of children in S of u. (The degree of u in T will be denoted by k u as previously.) We work conditional on [S, (Y u )] and assume them to be fixed. Let (K u ; u ∈ S, |u| < N ) be independent variables such that K u has the distribution of K(Y u ), biased by its d u -th factorial moment. That is, $P(K_u = n) = \frac{n(n-1)\cdots(n-d_u+1)\, P(K(Y_u) = n)}{E[K(Y_u)(K(Y_u)-1)\cdots(K(Y_u)-d_u+1)]}$. Among the K u children of u in T , d u are distinguished as they correspond to the children of u in S. Let C u1 < • • • < C udu be the labels of these distinguished children, and let us assume that they are uniformly chosen among the $\binom{K_u}{d_u}$ possibilities.
We can now define the subtree corresponding to S in the larger tree T by an inductive relabelling of the nodes. For u ∈ S, define Ψ(u) inductively as follows
Finally, let us attach the subtrees to [S, (Y u )]. For u ∈ S, consider a sequence [T ui , (X ui,v ; v); i ≥ 1] of i.i.d. marked trees with the original distribution P Xui,∅ , but with random initial mark X ui,∅ distributed as p(Y u , •). The final tree T is defined as and for v ∈ T ui , the mark of Ψ(u)iv is
Informally, for each of the K u − d u children of u that are not in S, we realize one step of the Markov chain with kernel (p(x, y); x, y ∈ E) and then attach a whole subtree T ui to that child. The resulting tree T has k distinguished leaves, Ψ(ℓ 1 ), . . ., Ψ(ℓ k ), corresponding to the k leaves of S. Let us finally define ∀i ≤ k, V i = Ψ(ℓ σi ) for an independent uniform permutation σ of {1, . . ., k}. The distribution of the triple [T is the restriction of T to the first N generations.
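The two random ingredients attached to each spine vertex, namely an offspring number biased by a factorial moment and a uniform choice of the distinguished children, can be sketched as follows. This is an illustration for an offspring law with finite support given as a dictionary; the names `factorial_biased_pmf` and `sample_distinguished` are hypothetical.

```python
import random
from fractions import Fraction


def falling_factorial(n, d):
    """n (n - 1) ... (n - d + 1), equal to 1 when d = 0."""
    result = 1
    for j in range(d):
        result *= n - j
    return result


def factorial_biased_pmf(pmf, d):
    """Bias a finite-support law {n: P(K = n)} by its d-th factorial moment."""
    weights = {
        n: p * falling_factorial(n, d)
        for n, p in pmf.items()
        if n >= d and p > 0
    }
    total = sum(weights.values())  # the d-th factorial moment of K
    if total == 0:
        raise ValueError("the d-th factorial moment vanishes")
    return {n: w / total for n, w in weights.items()}


def sample_distinguished(K, d, rng):
    """Labels C_1 < ... < C_d chosen uniformly among the subsets of {1, ..., K}."""
    return sorted(rng.sample(range(1, K + 1), d))


offspring = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
biased_once = factorial_biased_pmf(offspring, 1)   # usual size-biasing
biased_twice = factorial_biased_pmf(offspring, 2)  # branch point of the spine
```

For a Poisson offspring number, this biasing amounts to adding d to an independent Poisson variable with the same mean, which can be checked directly from the probability mass function.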

The spinal decomposition theorem
Our final objective in this section is to connect P x and Qk,N x to derive our many-to-few formula. We assume ν n > 0 for all n ∈ {0, . . ., N − 1}. Recall the expression of ∆ k from (3). Our spinal decomposition theorem states that, if P x is biased by the k-th factorial moment of its size at generation N and k uniformly chosen individuals (V 1 , . . ., V k ) are distinguished from that generation, the corresponding marked tree [T N , (X u ), (V i )] is distributed as Qk,N x biased by ∆ k .
Theorem 3 (Spinal decomposition). Consider a tree τ with height N and k distinct vertices (v 1 , . . ., v k ) ∈ τ N . Let h be a harmonic function for the branching process with law (P x ; x ∈ E). Then, for any test function ϕ, we have
Proof. It is enough to prove the result for the uniform CPP by noting that ∏ u∈S 1/(N ν u ) is the Radon-Nikodym derivative of the uniform CPP with respect to the ν-CPP.
The natural state space for P N x is the space of all marked trees with height at most N , that is, Using that the offspring distribution on E has a density w.r.t. some measure Λ, it is clear that P N x has a density w.r.t. a dominating measure defined as which is given by where k u stands for the number of children of u.
Let s denote the subtree spanned by (v 1 , . . ., v k ), that is We can decompose (5) into a product on s and on the subtrees attached to s. The branching property shows that For u ∈ s, let d u denote the number of children of u that belong to s, that is, Let us make the following change in the previous equality Let us also write the second term in the product as Putting both expressions together, we obtain that The result now follows upon identifying each term in this product. The first term is ∆ k . The second term is the density of the marks (x u ; u ∈ s) along the k-spine. The last product is made of three terms. The first is the probability that the d u children of u that belong to s have a given birth rank. The second is the probability that the final degree of u is k u given that it has d u children in s (see (4)). The last is the density of the marked trees attached to u. The N k−1 k! term in the statement of the theorem is simply the probability of observing a given ultrametric tree and labeling of the leaves.
Proof of Proposition 2. Let τ be some fixed tree and v 1 , . . ., v k be distinct vertices at height N of τ . Using Theorem 3 yields Summing over all (v 1 , . . ., v k ) first, then over all τ , and recalling that V i = σi for an independent uniform permutation σ of {1, . . ., k} proves the result.
4 Convergence of marked branching processes

4.1 The marked Gromov-weak topology
Deriving the scaling limit of the genealogy and types in a branching process requires one to envision it as a random marked metric measure space. In this work we equip the set of all such spaces with the marked Gromov-weak topology [10]. This section is a brief reminder of the basic properties of this topology; a more thorough account can be found in [10,18]. We do not restrict our attention to trees and try to follow the notation in [10] as closely as possible, so that some notation in this section might be inconsistent with the rest of the paper.
Let (E, d E ) be a fixed complete separable metric space, referred to as the mark space. In our application, E = [0, ∞) is endowed with the usual distance on the real line. A marked metric measure space (mmm-space for short) is a triple [X, d, µ], where (X, d) is a complete separable metric space, and µ is a finite measure on X × E.
To define a topology on the set of mmm-spaces, for each k ≥ 1 consider the map $R_k : (X \times E)^k \to \mathbb{R}_+^{k^2} \times E^k$, $((x_i, e_i);\ i \le k) \mapsto ((d(x_i, x_j);\ i, j \le k),\ (e_i;\ i \le k))$, that maps k points in X × E to the matrix of pairwise distances and vector of marks. We denote by ν k,X = µ ⊗k • R −1 k the k-th marked distance matrix distribution of [X, d, µ], which is the pushforward of µ ⊗k by the map R k . (Note that µ is not necessarily a probability distribution.) For some k ≥ 1 and some continuous bounded test function $\varphi : \mathbb{R}_+^{k^2} \times E^k \to \mathbb{R}$, define $\Phi(X, d, \mu) := \int \varphi \, d\nu_{k,X}$. Functionals of the previous form are called polynomials (k is the degree or order of the polynomial), and the set of all polynomials, obtained by varying k and ϕ, is denoted by Π.
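For a space with finitely many atoms, a polynomial is a finite weighted sum over k-tuples of atoms. The sketch below assumes that a polynomial integrates ϕ against the k-fold product of the sampling measure, pushed forward by the map to pairwise distances and marks; the representation of the space as a list of `(point, mark, mass)` triples and the function name are choices made for illustration.

```python
from itertools import product


def polynomial(atoms, dist, k, phi):
    """Evaluate a degree-k polynomial on a finite marked metric measure space.

    atoms: list of (point, mark, mass) triples describing the finite measure
    dist:  function giving the distance between two points
    phi:   test function of (distance matrix, list of marks)
    """
    total = 0.0
    for sample in product(atoms, repeat=k):
        weight = 1.0
        for _, _, mass in sample:
            weight *= mass  # mass of the k-tuple under the product measure
        distances = [[dist(a[0], b[0]) for b in sample] for a in sample]
        marks = [a[1] for a in sample]
        total += weight * phi(distances, marks)
    return total


atoms = [("a", 0.5, 2.0), ("b", 1.5, 1.0)]


def discrete_dist(x, y):
    return 0.0 if x == y else 1.0


total_mass_squared = polynomial(atoms, discrete_dist, 2, lambda D, m: 1.0)
mean_distance = polynomial(atoms, discrete_dist, 2, lambda D, m: D[0][1])
first_mark_moment = polynomial(atoms, discrete_dist, 1, lambda D, m: m[0])
```

Taking ϕ ≡ 1 returns the k-th power of the total mass, which is why polynomials also capture moments of the population size when the measure is not normalized.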
Definition 2. The marked Gromov-weak topology is the topology on mmm-spaces induced by Π.A random mmm-space is a r.v. with values in the set of (equivalence classes of) mmm-spaces, endowed with the Gromov-weak topology and the associated Borel σ-field.
Remark 3. Formally, the marked Gromov-weak topology should be defined on equivalence classes of mmm-spaces, where two spaces belong to the same class iff there is a measure-preserving isometry between the supports of their measures that also preserves marks, see [10,Definition 2.1]. This distinction has little consequence in practice, so we often omit it.
There is a unique equivalence class of all mmm-spaces with a null sampling measure, which acts as the null mmm-space and that we denote by 0. It follows from the definition of the Gromov-weak topology that a sequence of mmm-spaces (
Remark 4 (Polar decomposition). An mmm-space [X, d, µ] ≠ 0 can be seen as a pair (μ, [X, d, μ]) where μ = µ(X × E) is the total mass of µ and μ = µ/μ is the renormalized probability measure. This is the so-called polar decomposition of [X, d, µ] [9]. The space of all polar decompositions is naturally endowed with the product topology, where the space of all probability mmm-spaces is endowed with the more standard marked Gromov-weak topology restricted to probability mmm-spaces [10]. It is not hard to see that the map taking non-null mmm-spaces to their polar decompositions is a homeomorphism.
An important consequence of this remark is that the convergence in distribution of a sequence of mmm-spaces [X n , d n , µ n ] implies that of [X n , d n , μn ], provided that the limit mmm-space is a.s. non-null. In particular, for ultrametric spaces, it implies the convergence in distribution of the genealogy of k individuals sampled from [X n , d n , µ n ] according to μn .
Many properties of the marked Gromov-weak topology are derived in [10] under the further assumption that µ is a probability measure. Relaxing this assumption to account for finite measures is quite straightforward but requires some caution, as the total mass of µ can now drift to zero or infinity. In particular, the following result shows that Π forms a convergence determining class only when the limit satisfies a moment condition, which is a well-known criterion for a real variable to be identified by its moments, see for instance [14,Theorem 3.3.25]. This result was already stated for metric measure spaces without marks in [9, Lemma 2.7].
Then, for a sequence [X n , d n , µ n ] of random mmm-spaces to converge in distribution for the marked Gromov-weak topology to [X, d, µ] it is sufficient that $\lim_{n \to \infty} E[\Phi(X_n, d_n, \mu_n)] = E[\Phi(X, d, \mu)]$ for all Φ ∈ Π.
Proof. Let us prove this result carefully. Fix a polynomial Φ of degree k associated to a non-negative continuous bounded functional ϕ. Recall the notation μn for the total mass of [X n , d n , µ n ], which is a r.v. with values in [0, ∞), and μn = µ n /μ n . Introduce a new measure M Φ n on [0, ∞) such that for any continuous bounded function The key observation is now that, applying Fubini's theorem, [X, d, µ] → μp Φ(X, d, µ) is again a polynomial of the form (6) (of degree p + k). Therefore, our assumption entails that, for any integer p ≥ 0, where we have defined M Φ in a similar way to M Φ n using the limiting random variable [X, d, µ]. Now, the usual method of moments on [0, ∞), see for instance [14,Theorem 3.3.26], entails that for any continuous bounded function f We have used that M Φ fulfills the moment growth condition of [14, Theorem 3.3.26] since and (7) holds. By taking linear combinations, (8) holds for any polynomial, not only non-negative ones.
Let f : [0, ∞) → R be continuous bounded and have its support bounded away from 0. Since x → f (x)/x k is continuous bounded, applying (8) to this map and using that Φ( Standard arguments show that the above convergence also holds for f (x) = g(x)1 {x≥ε} for any continuous bounded g and ε > 0 such that P(μ = ε) = 0. Since [10, Theorem 5] ensures that polynomials are convergence determining on mmm-spaces with a probability sampling measure, we can use [15, Proposition 4.6, Chapter 3] to obtain that for any continuous bounded functional F on the space of mmm-spaces, (Here we have applied the result to the polar decomposition of the mmm-space, and used that the polar decomposition defines a homeomorphism.) To end the proof, we note that by the Portmanteau theorem Finally, we write take a limit n → ∞ first, then ε → 0, and use (9) to estimate the first term and (10) to control the second one to obtain which is the desired result.
Bi-metric measure spaces. The branching process with recombination is naturally endowed with two metrics: the genealogical distance and the chromosomic distance. Therefore, for the purpose of this application only, let us say that [X, d, D, µ] is a marked bi-metric measure space if both d and D are metrics that make (X, d) and (X, D) Polish spaces, and if µ is a finite measure on X × E, where X is endowed with the σ-field generated by the union of the open balls of d and D.
A polynomial of a marked bi-metric measure space is a functional of the form for some k and some ϕ. Accordingly, we define the Gromov-weak topology for these spaces as the topology induced by the polynomials. It is straightforward to check that all the results stated for mmm-spaces carry over to marked bi-metric measure spaces, up to replacing the polynomials in (6) by those in (11).

Convergence of ultrametric spaces
Using Proposition 3 requires one to have prior knowledge of the limit [X, d, µ]. A stronger version of this result would be that the convergence of each (E[Φ(X n , d n , µ n )]; n ≥ 1) implies the existence of a random mmm-space to which (X n , d n , µ n ) converges in distribution (under a moment condition similar to (7)). Such a result cannot hold in the current formulation of the marked Gromov-weak topology. This is a consequence of the fact that some limits of distance matrix distributions cannot be expressed as the distance matrix distribution of a separable metric space, see for instance [18, Example 2.12 (ii)]. To overcome this issue, it is necessary to relax the separability assumption in the definition of an mmm-space. Deriving a meaningful extension of the Gromov-weak topology to non-separable metric spaces is not a straightforward task, since it raises many measure-theoretic difficulties. However, when restricting our attention to genealogies, as is the purpose of this work, the specific tree structure of these objects can be used to define such an extension. We follow the framework introduced in [17, Section 4], but see also [20]. The results contained in this section are not necessary for the analysis of the branching process with recombination and can be skipped.
(ii) The σ-field U verifies: where B(x, t) is the open ball of radius t and center x, and B(U ) is the Borel σ-field associated to (U, d); (iii) The measure µ is a finite measure on U × E, defined on the product σ-field U ⊗ B(E).
Remark 5. While this definition might be surprising at first sight, note that if (U, d) is separable and ultrametric, points (i) and (ii) of the definition are fulfilled when U is chosen to be the usual Borel σ-field. Therefore, a separable marked UMS in the sense of Definition 3 is an ultrametric mmm-space in the sense of Section 4.1. When no σ-field is prescribed, U is assumed to be the Borel σ-field. Using a naive definition of a marked UMS as a complete metric space with a finite measure on the corresponding Borel σ-field raises some deep measure-theoretic issues related to the Banach-Ulam problem, which are avoided by Definition 3, see [17,Section 4] for a discussion.
Point (i) of the above definition ensures that each map R k is measurable, so that we can define the marked distance matrix distribution ν k,U and the polynomials Φ(U, d, U , µ) of a marked UMS (U, d, U , µ) as in the previous section. Analogously to mmm-spaces, we define the marked Gromov-weak topology on the set of marked UMS as the topology induced by the set of polynomials.
Remark 6. Again, for the topology to be separated we need to work with equivalence classes of marked UMS. For non-separable spaces, the correct notion of equivalence is that of weak isometry provided in [17,Definition 4.11]. We do not make the distinction between marked UMS and their equivalence class in practice.
We can now state a stronger version of Proposition 3 for ultrametric spaces. In the statement of the theorem we will need some mild tightness conditions. For a marked UMS [U, d, U , µ], define the maps r and π E as and the corresponding pushforward measures is random, these are random measures, and we denote their intensity measures by E[w U ] and E[m U ], which are deterministic measures on R + and E respectively.
Theorem 4. Let (U n , d n , U n , µ n ) be a sequence of random marked UMS such that for any polynomial Φ ∈ Π, lim exists, and fulfill (compare with (7)) lim sup Suppose also that the sequences (E[w Un ]; n ≥ 1) and (E[m Un ]; n ≥ 1) are relatively compact, as measures on R + and E. Then there exists a random marked UMS [U, d, U , µ] such that (U n , d n , U n , µ n ) converges to that limit in the marked Gromov-weak topology. Moreover the limit is characterized by
Remark 7. The previous result suggests the following simple method to prove convergence in distribution in the (usual) sense of separable ultrametric mmm-spaces. First prove that the conditions of Theorem 4 are fulfilled, then check that the limiting marked UMS is a.s. separable. The two compactness conditions on (E[w Un ]) and (E[m Un ]) ensure, in combination with the convergence of the moments, that the sequence of mmm-spaces is tight. Compare this to checking, on top of the previous assumptions, the tightness criterion in [18,Theorem 2 (ii)] that ensures that no mass of the sampling measure is accumulating on isolated points. This condition is not needed here because we have enlarged the state space of mmm-spaces to include non-separable metric spaces.
The proof of the above result is based on a characterization of all exchangeable ultrametric matrices. We call a random pair (d ij ; i, j ≥ 1) and (Y i ; i ≥ 1) a marked exchangeable ultrametric matrix if
• (d ij ; i, j ≥ 1) is a.s. an ultrametric on N;
• its distribution is invariant by the action of any permutation σ of N with finite support, that is, (d σ(i)σ(j) ; i, j ≥ 1) and (Y σ(i) ; i ≥ 1) have the same joint distribution as (d ij ; i, j ≥ 1) and (Y i ; i ≥ 1).
A typical way to obtain such an ultrametric matrix is to consider an i.i.d. sample (X i , Y i ; i ≥ 1) from a marked UMS (U, d, U , µ) with µ(U ) = 1 a.s., and define d ij := d(X i , X j ) for i, j ≥ 1. (13) The next result shows that all exchangeable marked matrices are obtained in this way. It can be seen as a version of Kingman's representation theorem of exchangeable partitions [26] for ultrametric matrices.
Theorem 5 ([17]). Let (d ij ; i, j ≥ 1) and (Y i ; i ≥ 1) be an exchangeable marked ultrametric matrix. There exists a random marked probability UMS [U, d, U , µ] (that is, µ(U ) = 1 a.s.) such that the exchangeable marked ultrametric matrix obtained by sampling from it as in (13) is distributed as (d ij ; i, j ≥ 1) and (Y i ; i ≥ 1). Moreover this marked UMS is unique in distribution.
Proof. This result is a straightforward extension of [17,Theorem 1.8] that deals with the case without marks. To guide the reader, let us mention the crucial modifications that need to be made.
The proof relies on encoding some marginals of (d ij ; i, j ≥ 1) as an exchangeable sequence of r.v. (ξ i ; i ≥ 1) in [0, 1] p and using a de Finetti-type argument, see [17,Appendix B]. The same argument should be applied to the exchangeable sequence of r.v. (ξ
Proof of Theorem 4. We prove the result by a tightness and uniqueness argument. To prove tightness, we embed the space of marked UMS into a space of measures, using the marked distance matrices, and use known tightness arguments for random measures. More precisely, the map ι : [U, d, U , µ] → (ν k,U ; k ≥ 1) is an injection. This is a consequence of the uniqueness part of Theorem 5. For each k ≥ 1, ν k,U lives in the space of finite measures on $\mathbb{R}_+^{k^2} \times E^k$, which can be endowed with the weak topology. If the space of sequences (ν k,U ; k ≥ 1) is endowed with the product topology, it follows readily from the definition of the Gromov-weak topology that ι is a homeomorphism from the space of marked UMS to its image. We claim that the image of ι is closed in this product topology. If this is the case, the space of marked UMS is homeomorphic to a closed subset of the space of sequences of measures, and clearly, where in the right-hand side each ν k,U n is a random measure, and tightness is with respect to the weak topology. For the collection of random measures (ν k,Un ; n ≥ 1) to be tight it is sufficient that the collection of intensity measures (E[ν k,Un ]; n ≥ 1) is relatively compact [8,Lemma 3.2.8]. For and p i (y) = u i be the projection maps. It is sufficient to show that the pushforward of E[ν k,Un ] through each projection is relatively compact. By definition, for a Borel set A ⊆ R + and i ≠ j, by exchangeability, The relative compactness of (E[ν k,Un ] • p −1 ij ; n ≥ 1) now follows from that of (E[w Un ]; n ≥ 1) and from the uniform integrability of μk−2 n . In a similar way, for a Borel set B ⊆ E, The desired compactness follows from that of (E[m Un ]; n ≥ 1) and from the uniform integrability of μk−1 n .
We now go back to our claim that the image of ι is closed. For each k, n ≥ 1, let ν k,n be the k-th marked distance matrix distribution of some marked UMS [U n , d n , U n , µ n ], and assume that it converges as n → ∞ to some ν k . We need to show that the limiting sequence of distance matrices can be obtained by sampling from a marked UMS. We can assume without loss of generality that ν k ≠ 0. Let νk be the probability measure obtained by renormalizing ν k , and define similarly νk,n . Since the projection of νk+1,n on $\mathbb{R}_+^{k^2} \times E^k$ is equal to νk,n , the same property holds for νk+1 and νk . Using Kolmogorov's extension theorem, we can extend consistently the measures (νk ; k ≥ 1) to a measure ν∞ on $\mathbb{R}_+^{\mathbb{N} \times \mathbb{N}} \times E^{\mathbb{N}}$ whose projections on finite-dimensional spaces are given by the measures (νk ; k ≥ 1). Quite clearly, ν∞ is the law of a marked exchangeable ultrametric matrix. (Exchangeability and almost sure ultrametricity hold for a fixed n, and pass to the limit.) Theorem 5 shows that we can find a marked UMS [U, d, U , μ] whose k-th marked distance matrix distribution is νk . Denote by μ the limit of the total mass of ν 1,n , and by µ = μμ. The k-th marked distance matrix distribution of the marked UMS [U, d, U , µ] is μk νk = ν k . This proves the claim.
Finally, we prove uniqueness. Let [U, d, U , µ] and [U′, d′, U′, µ′] be two random marked UMS that are limits in distribution of a subsequence of ([U n , d n , U n , µ n ]; n ≥ 1). We want to show that they have the same distribution. For any polynomial Φ ∈ Π, since Φ is continuous and (Φ(U n , d n , U n , µ n ); n ≥ 1) is uniformly integrable (it has uniformly bounded moments of all orders), the moments of the two limiting marked UMS coincide and verify (12), namely, Introducing the same measure M Φ as in the proof of Proposition 3, the method of moments on R + shows that, for any continuous bounded f : R + → R and any polynomial Φ, On the event {μ > 0}, let (d ij , Y i ; i, j ≥ 1) be the marked exchangeable ultrametric matrix obtained from an i.i.d. sample from [U, d, U , μ], and define The identity (14) can be written as This equation shows that μ and μ′ have the same distribution, and for μ-a.e. x, the law of (d ij , Y i ; i, j ≤ k) conditional on μ = x is the same as that of (d′ ij , Y′ i ; i, j ≤ k) conditional on μ′ = x. The uniqueness part of Theorem 5 shows that the law of [U, d, U , μ] conditional on μ = x is the same as that of [U′, d′, U′, μ′] conditional on μ′ = x. Combining this with the fact that μ and μ′ have the same distribution and that the polar decomposition is a homeomorphism proves that the two marked UMS have the same distribution.

Moments of some continuous trees
In this section we compute the moments of some usual random tree models, namely CPP trees and Λ-coalescents, to illustrate the type of expression that can arise for the limiting mmm-space of Proposition 3.
Continuous coalescent point processes. Coalescent point process trees are a class of continuous random trees that correspond to the scaling limit of the genealogy of various branching processes [11,27,39]. Of particular interest is the Brownian CPP described in Section 2.3 that corresponds to the scaling limit of critical Galton-Watson processes, and also corresponds to the limit of the rescaled genealogy of the branching process with recombination. Consider a Poisson point process P on [0, ∞) × (0, ∞), with intensity dt ⊗ ν(dx). We make the further assumptions that For some x 0 > 0, let Y denote the first atom of P whose second coordinate exceeds x 0 , that is, Y := inf{t ≥ 0 : (t, z) ∈ P, z > x 0 }. The CPP tree at height x 0 associated to ν is the random metric measure space [(0, Y ), d P , Leb] with ∀x ≤ y, d P (x, y) = sup{z : (t, z) ∈ P, x ≤ t ≤ y}.
Proposition 4. Let [(0, Y ), d P , Leb] be the CPP tree at height x 0 associated to the measure ν. Then for any continuous bounded function ϕ with associated polynomial Φ, we have and σ is an independent uniform permutation of {1, . . ., k}.
The following direct computation shows that this has the required distribution, It is clear from the definition of d P that for i < j, Therefore, if σ denotes the unique permutation of [k] such that
In this work, the scaling limit of the genealogy is given by the Brownian CPP, which is the CPP with height 1 associated to the measure
Corollary 1. The moments of the Brownian CPP are given by where for i < j, H i,j = H j,i = max{H i , . . ., H j−1 }, the r.v. (H 1 , . . ., H k−1 ) are i.i.d. uniform on (0, 1), and σ is an independent uniform permutation of {1, . . ., k}.
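The random matrix appearing in Corollary 1 is easy to sample, which gives a way to estimate expectations of functionals of (H σi,σj ) by Monte Carlo. The sketch below only samples this matrix: it does not include any normalizing constant from the moment formula, it sets the diagonal to 0 by convention, and its function names are hypothetical.

```python
import random


def sample_cpp_matrix(k, rng):
    """Sample (H_{sigma_i, sigma_j}) with H_1, ..., H_{k-1} i.i.d. uniform on (0, 1)."""
    H = [rng.random() for _ in range(k - 1)]  # H[i] sits between leaves i and i + 1
    sigma = list(range(k))
    rng.shuffle(sigma)  # independent uniform permutation of the leaves

    def h(i, j):
        if i == j:
            return 0.0  # convention for the diagonal
        lo, hi = min(i, j), max(i, j)
        return max(H[lo:hi])  # deepest branch point between leaves lo and hi

    return [[h(sigma[i], sigma[j]) for j in range(k)] for i in range(k)]


def monte_carlo(phi, k, n_samples, rng):
    """Estimate E[phi(H_{sigma_i, sigma_j})] by averaging independent samples."""
    return sum(phi(sample_cpp_matrix(k, rng)) for _ in range(n_samples)) / n_samples


rng = random.Random(0)
matrix = sample_cpp_matrix(4, rng)
mean_pair_distance = monte_carlo(lambda M: M[0][1], 2, 20000, rng)
```

For k = 2 the only off-diagonal entry is a single uniform variable, so `mean_pair_distance` should be close to 1/2.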
Proof. A direct computation shows that
Metric measure spaces with independent types. In our model and in many other settings, the types in the population become independent of the genealogy in the limit of large population size. Typically, this situation arises when the time between the ancestors of two typical individuals in the population is large, so that the dynamics of the types along the lineages has time to reach some form of equilibrium and to forget about its starting point (the type of the ancestor). For an mmm-space [X, d, µ], the independence between the types and the genealogy corresponds to having a product sampling measure of the form µ = µ X ⊗ µ E , where µ X is a measure on X, and µ E a probability measure on the type space E. The moments of such product mmm-spaces are easily expressed in terms of the (unmarked) metric measure space [X, d, µ X ].
Proposition 5. Let [X, d, µ] be a random mmm-space with a sampling measure of the form µ = µ X ⊗ µ E , where µ E is a deterministic probability measure on E. Then, for any polynomial Φ ∈ Π, we have where (Y 1 , . . ., Y k ) are i.i.d., distributed as µ E , and independent of [X, d, µ X ].
Proof. By definition of a polynomial and applying Fubini's theorem for a.s. all realizations of the random measure,
Λ-coalescents. A Λ-coalescent is a process with values in the partitions of N such that for any n, its restriction to {1, . . ., n} is a Markov process with the following transitions. When the process has b blocks, any k blocks merge at rate λ b,k where $\lambda_{b,k} = \int_{[0,1]} x^{k-2} (1-x)^{b-k} \, \Lambda(dx)$ for some finite measure Λ. These processes were introduced in [38,42], and provide the limit of the genealogy of several celebrated population models with fixed population size [34,43]. A Λ-coalescent can be seen as a random ultrametric space on N. It is possible to take an appropriate completion of this space to define an ultrametric d Λ on (0, 1) that encodes the metric structure of the coalescent, see [18,Section 4] for the separable case, and [17, Section 3] for the general case. More precisely, there exists a random ultrametric d Λ such that if (V i ; i ≥ 1) is an independent sequence of i.i.d. uniform r.v. on (0, 1), and Π t is the partition defined through the equivalence relation
Define the mark measure on T N × E as The triple [T N , d T , µ N ] is the mmm-space associated to the branching process [T, (X u )]. The polynomial of degree k corresponding to a functional ϕ can be written as The aim of this section is to provide a general convergence criterion for a rescaling of the sequence of mmm-spaces [T N , d T , µ N ; N ≥ 1] that only involves computation on the k-spine tree. For each N ≥ 1, consider a rescaling parameter α N for the population size, β N : E → E for the mark space, and γ N : R + → R + for the genealogical distances. We assume that γ N is increasing so that γ N • d T is also an ultrametric, and that α N → ∞.
Theorem 6. Suppose that for any k ≥ 1 and any continuous bounded function ϕ, the sequence converges and that the limit fulfills (12). Then there exists a random marked UMS [U, d, U , µ] such that conditional on Z N > 0, lim holds in distribution for the marked Gromov-weak topology.
Proof. According to Theorem 4 it is sufficient to prove that the following moments converge, Let us denote by By the many-to-few formula, Proposition 2, for an independent uniform permutation σ of [k]. Taking ϕ ≡ 1, the assumption of the result readily implies that the convergence of each M N follows from that of M N and the result is proved.

Convergence of the k-spine
Theorem 6 shows that convergence of the branching process in the Gromov-weak topology can be deduced from the convergence of some functionals of the k-spine tree. We now provide a general convergence result for the k-spine tree that will be used to compute the limit of (15) for the branching process with recombination.
We work under the measure Q k,N x and define Since our work involves working under various measures, for a sequence (P n ; n ≥ 1) of probability measures and a sequence (Y n ; n ≥ 1) of r.v., we will use the notation to mean that the distribution of Y n , under the measure P n , converges to the distribution of Y .

W.
(ii) There exists a limiting Feller process X such that, if X N 1 (0) → X(0), in the Skorohod topology.
There exist equivalent formulations of the second point involving generators or semigroups, see for instance [25,Theorem 19.28]. In the next result, we use the notation for the concatenation of f and g at time t.
Proposition 7. Suppose that (A1) holds. Then where,
• the r.v. (W 1 , . . ., W k−1 ) are i.i.d. copies of the limiting r.v. W ;
• X 1 is distributed as X started from x and is independent of (W 1 , . . ., W k−1 );
• for each i, conditional on (W 1 , . . ., W k−1 ) and (X 1 , . . ., X i ), where (X (t); t ≥ 0) is distributed as X started from X i (W i ).
Proof. Let us work inductively, and assume that the convergence holds for some k ≥ 1. Let X N k+1 be distributed as X, started from X N k (W N k ). Obviously, W N k converges to W k , a copy of W independent of (W 1 , . . ., W k−1 ) and of (X 1 , . . ., X k ). Then it follows from the fact that X has no fixed time discontinuity that X N k (W N k ) converges to X k (W k ). Using the assumption (A1), this entails that X N k+1 converges to a limiting process X k+1 , which is distributed as X started from X k (W k ).
Recalling that, by definition of the discrete spine under the claim is a consequence of the a.s. continuity of the concatenation map, which is proved in Lemma 6.

The recombination spine
We now focus on the branching process with recombination. In this first section we derive the properties of its 1-spine.
Using the formalism of the previous section, the branching process with recombination can be constructed as a random marked tree, where the mark space is the set of intervals of R. According to the description of the branching process with recombination, an individual with mark I = [a, b] gives birth to K(I) children, with K(I) ∼ Poisson(1 + |I|/N ). Then, each newborn experiences a recombination event with probability r N (I) := (2|I|/N ) / (1 + |I|/N ).
In the case of a recombination, the offspring inherits the interval [a, U ] or [U, b] with equal probability, where U is uniformly distributed over I. As in the previous section, we denote by Ξ(I) the offspring point process of a mother with interval I. The objective of this section is to compute and characterize the distribution Q 1,N I of the intervals along the 1-spine and its large N limit Q 1 I .
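The reproduction mechanism just described can be sketched in a few lines. This is a minimal illustration rather than the construction used in the proofs: the recombination probability is read as (2|I|/N)/(1 + |I|/N), intervals are represented as pairs (a, b), and the function names and the Poisson sampler (Knuth's multiplication method) are choices made here for self-containedness.

```python
import math
import random


def recombination_probability(length: float, N: float) -> float:
    """r_N(I) read as (2|I|/N) / (1 + |I|/N)."""
    return (2.0 * length / N) / (1.0 + length / N)


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means such as 1 + |I|/N."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_offspring(interval, N, rng):
    """Sample the intervals carried by the children of a mother carrying `interval`."""
    a, b = interval
    length = b - a
    children = []
    for _ in range(sample_poisson(1.0 + length / N, rng)):
        if rng.random() < recombination_probability(length, N):
            u = rng.uniform(a, b)  # crossover point, uniform on I
            children.append((a, u) if rng.random() < 0.5 else (u, b))
        else:
            children.append((a, b))  # no recombination: full parental interval
    return children


rng = random.Random(0)
print(sample_offspring((0.0, 1.0), 10.0, rng))
```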
The h-transformed mark process. Defining Q 1,N I first requires one to find an adequate harmonic function for the branching process. In the branching process with recombination, a simple calculation shows that the length of the intervals is harmonic. Let us now compute the distribution Q 1,N I of the h-transformed process. According to (2), under Q 1,N I , the probability of experiencing no recombination in one time-step when carrying interval I is (1 + |I|/N )(1 − r N (I)) = 1 − |I|/N . When experiencing a recombination event, according to (2) the resulting interval is biased by h, that is, biased by its length. This leads to the following description of the distribution of the intervals along the spine.
where U * has the size-biased uniform distribution on

Large N convergence of the spine. As in the previous section, let I N denote the rescaled process ∀t ≥ 0, I N (t) = I(⌊N t⌋).
We show that its large N limit is given by the following process.
Definition 5. Let Q 1 I denote the distribution of the continuous-time Markov process (I(t); t ≥ 0) started from I and such that jumps:

converges in distribution for the Skorohod topology to (I(t); t ≥ 0) under Q 1 I .

Proof. The two processes (I N (t); t ≥ 0) and (I(t); t ≥ 0) visit the same sequence of states, in distribution. Therefore, convergence in the Skorohod topology amounts to convergence of the jump times. Started from [a, b], the time before the first jump of (I N (t); t ≥ 0) is distributed as T N /N , where T N is geometrically distributed with success probability (b − a)/N . It is clear that T N /N converges in distribution to an exponentially distributed variable with mean 1/(b − a). Applying this convergence to the successive jump times of (I N (t); t ≥ 0) readily proves the result.
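The convergence of the rescaled geometric jump time to an exponential variable can be checked numerically on the tail functions. The sketch below assumes that T N takes values in {1, 2, . . .}, so that P(T N /N > t) = (1 − (b − a)/N)^⌊N t⌋, and compares it with the exponential tail exp(−(b − a)t). The function names are illustrative.

```python
import math


def geometric_tail(t: float, length: float, N: int) -> float:
    """P(T_N / N > t) for T_N geometric on {1, 2, ...} with success probability length / N."""
    p = length / N
    return (1.0 - p) ** math.floor(N * t)


def exponential_tail(t: float, length: float) -> float:
    """P(E > t) for E exponential with mean 1 / length."""
    return math.exp(-length * t)


# The gap between the two tails shrinks as N grows (here b - a = 2 and t = 0.5).
for N in (10, 100, 10_000):
    gap = abs(geometric_tail(0.5, 2.0, N) - exponential_tail(0.5, 2.0))
    print(N, gap)
```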

R
We are interested in the large R properties of the spine. In this section, we prove that the spine has a unique entrance law at infinity, which can be constructed from a homogeneous Poisson point process. This construction will also provide a coupling for the distribution of (I(t); t ≥ 0) started from any initial condition.
First, for an interval I = [a, b], it will be convenient to use the notation for any reals λ, µ. Consider a homogeneous Poisson point process P on [0, ∞) × R. For any t ≥ 0, let P t be the point process on R formed by the atoms of P with time coordinate in [0, t], defined as ∀A, P t (A) = P ([0, t] × A).
The atoms of P t split the real line into infinitely many subintervals. We are interested in the subinterval covering the origin. More precisely, let (x i ; i ∈ Z) be the atoms of P t , labeled in such a way that . . . < x −1 < x 0 < 0 < x 1 < . . .

and define I
The following proposition shows that (I P (t); t ≥ 0) corresponds to the distribution of (I(t); t ≥ 0), started from infinity.

Proposition 9. Let M be uniformly distributed on [0, 1] and independent of P . Then, for any R ≥ 0, the process (I R (t); t ≥ 0) defined as Moreover, for any t, M is uniformly distributed on I R (t).

The proposition will follow from the next simple result.

Lemma 4. Let U and V be independent uniform r.v. on [0, 1]. Define the interval

In this work we will not make use of this connection with pssMp since all computations can be carried out directly from the Poisson construction. However, this link could be used to generalize our results to a larger class of branching processes on the intervals, with more general fragmentation rules for the offspring distribution.
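The interval covering the origin in the Poisson construction is easy to simulate, which may help to visualise the nested structure of the process. The sketch below makes two assumptions that are not taken from the text: the Poisson point process has unit intensity, and space is truncated to a window [−half_width, half_width] so that only finitely many atoms need to be sampled. Atoms are generated in increasing time order, and the interval at time t is delimited by the closest atoms on each side of the origin with time coordinate at most t.

```python
import random


def sample_atoms(horizon: float, half_width: float, rng: random.Random):
    """Atoms (time, position) of a unit-intensity Poisson point process
    on [0, horizon] x [-half_width, half_width] (truncation is an assumption)."""
    atoms, time = [], 0.0
    rate = 2.0 * half_width  # total arrival rate over the spatial window
    while True:
        time += rng.expovariate(rate)
        if time > horizon:
            return atoms
        atoms.append((time, rng.uniform(-half_width, half_width)))


def interval_covering_origin(atoms, t: float, half_width: float):
    """Subinterval containing 0 cut out by the atoms with time coordinate <= t."""
    left, right = -half_width, half_width
    for time, position in atoms:
        if time <= t:
            if position < 0.0:
                left = max(left, position)
            else:
                right = min(right, position)
    return left, right


rng = random.Random(1)
atoms = sample_atoms(5.0, 10.0, rng)
for t in (0.5, 1.0, 2.0, 5.0):
    print(t, interval_covering_origin(atoms, t, 10.0))
```

By construction the intervals are nested and shrink as t increases, each new atom that falls inside the current interval cutting off the side that does not contain the origin.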

Convergence of the rescaled spine
Recall the definition of F R and F −1 R from (1).The following result provides the limit of the spine after rescaling time according to F where the γ i 's are independent Gamma r.v.'s with parameter (2, 1).
Proof. We show the result by induction on n. Recall that X(F −1 R (u)) has the same distribution as with a probability going to 1. This establishes the result at stage 1.
Let us now assume that the property is satisfied at stage n.Conditional on the process X up to time F −1 R (u n ), the Markov property implies that the spine at , and Y 1 , Y 2 are independent and exponentially distributed with mean as in the case n = 1, this implies that F −1 R (X(u n+1 )) is converging to an independent γ n+1 random variable.

The recombination k-spine tree
In the previous section we have characterized the large N , large R behavior of the process giving the marks along a single spine. We now provide a similar characterization for the k-spine tree. We start with the following definition.

Definition 6. Let us denote by Q k I the law of some r.v. (I 1 , . . ., I k ) and (W 1 , . . ., W k−1 ) such that: We also use the shorter notation

Convergence of the tree
According to Proposition 10, it is natural to rescale time using F R as follows.
The following result is a straightforward extension of Proposition 10.
(ii) Conditional on the W i 's where the γ i 's and γi 's are independent Gamma r.v.'s with parameter (1, 2).
Proof. The proof follows the same lines as that of Proposition 10 and is left to the interested reader.

Convergence of the chromosomic distance
In this section, we prove that the genealogical distance and the rescaled chromosomic distance coincide in the large R limit under Q k R . For i < j, set W i,j = W j,i = min{W i , . . ., W j−1 } to be the time at which branches i and j split. Define ∀i, j ≤ k, d(i, j) = 1 − W i,j , which is the genealogical distance between the leaves of the k-spine tree. Conditional on (I 1 , . . ., I k ) and (W 1 , . . ., W k−1 ), let M i be uniformly distributed on I i (1). We define ∀i, j ≤ k, D(i, j) = |M i − M j |, which is the chromosomic distance between the leaves. For later use, we also introduce the corresponding rescaled distances, ∀i, j ≤ k, dR (i, j) = 1 − F R (W i,j ), DR (i, j) = log D(i, j) ∨ 2 log R .
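The two unrescaled distance matrices can be computed directly from the branch times and the marked points. The sketch below uses 0-based indices, so that `branch_times[i]` plays the role of W i+1 , the split time between consecutive leaves; the numerical values are arbitrary illustrations.

```python
def genealogical_distances(branch_times):
    """d(i, j) = 1 - min(W_i, ..., W_{j-1}) for the k = len(branch_times) + 1 leaves."""
    k = len(branch_times) + 1
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            split = min(branch_times[i:j])  # time at which branches i and j split
            d[i][j] = d[j][i] = 1.0 - split
    return d


def chromosomic_distances(positions):
    """D(i, j) = |M_i - M_j| for the marked points M_i of the leaves."""
    return [[abs(x - y) for y in positions] for x in positions]


d = genealogical_distances([0.5, 0.25, 0.75])
D = chromosomic_distances([0.1, 0.4, 2.0, 2.2])
for row in d:
    print(row)
```

Since W i,j is a minimum over consecutive branch times, the matrix d is automatically ultrametric, whereas D is only a metric on the line.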
The next result provides an interesting relation between the genealogy of the branching process and the "geography" along the chromosome. Namely, on a logarithmic scale, the distance between two segments on the chromosome is directly related to the genealogy of the two segments.

Proof. Let us work under Q k R and let i < j. By construction of the k-spine, conditional on I i (W i,j ), (I i (t+W i,j ); t ≥ 0) and (I j (t+W i,j ); t ≥ 0) are independent and distributed as Q 1 Ii(Wi,j ) . We know from the Poisson construction that, conditional on I i (W i,j ), M i and M j are independent uniform variables on that interval. Therefore, |M i − M j |/X i (W i,j ) is a Beta(1, 2) r.v. Write log D i,j = log|M i − M j | = log(|M i − M j |/X i (W i,j )) + log(W i,j X i (W i,j )) − log W i,j .
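The Beta(1, 2) law used in the proof is that of the distance between two independent uniform points on an interval, divided by the length of the interval. By scaling it suffices to take the interval [0, 1], where |U − V| has distribution function 1 − (1 − x)². The following Monte Carlo check, with an arbitrary seed and sample size, compares the empirical and theoretical distribution functions.

```python
import random


def beta12_cdf(x: float) -> float:
    """Distribution function of a Beta(1, 2) variable on [0, 1]."""
    return 1.0 - (1.0 - x) ** 2


def empirical_cdf(samples, x: float) -> float:
    """Fraction of the samples that are at most x."""
    return sum(s <= x for s in samples) / len(samples)


rng = random.Random(0)
samples = [abs(rng.random() - rng.random()) for _ in range(100_000)]
for x in (0.1, 0.5, 0.9):
    print(x, empirical_cdf(samples, x), beta12_cdf(x))
```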
From the previous point and Proposition 10, The result follows by noting that the r.v. on the left-hand side has the same distribution under Q k R as log D i,j d i,j log R under Q k R .

Proof of the main result
We can now proceed to the proof of our main result.
Proof of Theorem 2. To ease the exposition, we only prove the result for t = 1, but the proof is easily adapted to general t > 0.
Recall that Q 1,N R denotes the distribution of the 1-spine provided in Definition 5. Let Q k,N R be the corresponding k-spine distribution, with i.i.d. branch times (W 1 , . . ., W k−1 ) such that for the function F R defined in (1). Set where the M i are uniformly distributed on the I i (N ). In order to use Theorem 6, we need to compute the limit of where for the branching process with recombination, Therefore, lim , DN R (i, j), X N i (1); i, j ≤ k exists and we have lim Applying Theorem 6 for the large N limit, then Proposition 3 for the large R limit, and finally Proposition 5 and Proposition 4 for the polynomials of the Brownian CPP with independent marks proves the result. Note that since the total mass of the Brownian CPP is exponentially distributed, it fulfills the moment condition (7). For the large N limit, the size of the branching process with recombination is stochastically dominated by a Galton-Watson process with Poisson(1 + R/N ) offspring distribution. The size of this process, conditional on survival at time N and rescaled by N , is well-known to converge to an exponential distribution, see for instance [35, Theorem 2.1]. This readily shows that (12) is also fulfilled by the large N limit, for each fixed R.
Remark 9. In the above proof we have made use of Theorem 4 to identify the large N limit of the genealogy. It is possible to prove the result without relying on our extension of the Gromov-weak topology by constructing the branching process with recombination by superimposing on a Galton-Watson tree with Poisson(1 + R/N ) offspring distribution a process along the branches describing the recombination events as in [3]. The large N limit could then be expressed by means of the superprocess limit associated to that branching model.
We now apply the previous estimate inductively to the k-spine, first to be able to take the large N limit, then to take the large R limit.Recall the notation X N i for the rescaled marks along branch i and the notation W N i for the rescaled i-th branch time, and the notation ∀u ∈ [0, 1], Φ(u) = u + 1 R − 1 .
Corollary 3. For any R > 1, k ≥ 1, and β ∈ (1, 2),

Proof. We only prove the result in the continuum setting. The discrete case can be proved along the same lines. Applying the Markov property to X k+1 at time W k and using Lemma 7 yields where in the last inequality, we used the fact that Φ(u) ≤ 2 for every u ∈ [0, 1]. Let p, q ≥ 0 be such that 1/p + 1/q = 1 and take q close enough to 1 such that qβ ∈ (1, 2). By Hölder's inequality, the second term on the r.h.s. is bounded from above by Further, The result follows by a straightforward induction since the case k = 1 was proved in Lemma 7.

Figure 1: When a recombination occurs, a point is chosen along the sequence, called the crossover point and represented by a dashed line. Two new chromosomes are formed by swapping the parts of the parental chromosomes on one side of the crossover point. The offspring inherits one of these two chromosomes.

Figure 3: Top: simulation of a Brownian CPP. The black vertical lines represent the atoms of P , and the corresponding tree is pictured in grey. Bottom: geometry of the blocks of ancestral material corresponding to the top CPP. Each block is represented by a black stripe. The correspondence between the blocks and the tree is shown for some blocks by grey segments joining the two. The distance between two consecutive stripes is the logarithm of their distance on the chromosome. Note that this induces a strong deformation of the intuitive linear scale.

x
and P x are connected through the random change of measure ∆ k . It is proved in Section 3.4. The remaining sections provide a rigorous construction of the measures P x and Q k,N x and the proof of the spinal decomposition theorem.

Proposition 6 .
then (Π t ; t ≥ 0) is distributed as a Λ-coalescent. In particular, this leads to the following expression for the moments of the metric measure space [(0, 1), d Λ , Leb]. Let [(0, 1), d Λ , Leb] be a Λ-coalescent tree. Then E[Φ((0, 1), d Λ , Leb)] = E[ϕ(d ij ; i, j ≤ k)] where ∀i, j ≤ k, d ij = inf{t ≥ 0 : i ∼ Πt j} for a realization (Π t ; t ≥ 0) of a Λ-coalescent.

4.4 Relating spine convergence to Gromov-weak convergence

Let [T, (X u )] be the random marked tree with distribution P x constructed in Section 3.2, and let Z N = |T N | denote the population size at generation N . Recall that T can be endowed with the graph distance d T , and that T N denotes the N -th generation of the process. The metric d T restricted to T N encodes the genealogy of the population, and has the simple expression ∀u, v ∈ T N , d T (u, v) = N − |u ∧ v|.
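The expression N − |u ∧ v| is straightforward to evaluate once individuals are encoded as words. The sketch below assumes an Ulam-Harris type labelling, in which an individual of generation N is a tuple of N positive integers and u ∧ v is the longest common prefix of u and v; this encoding and the function names are assumptions made for the illustration.

```python
def common_ancestor_depth(u: tuple, v: tuple) -> int:
    """Generation |u ∧ v| of the most recent common ancestor (longest common prefix)."""
    depth = 0
    for x, y in zip(u, v):
        if x != y:
            break
        depth += 1
    return depth


def genealogical_distance(u: tuple, v: tuple) -> int:
    """d_T(u, v) = N - |u ∧ v| for two individuals of the same generation N."""
    if len(u) != len(v):
        raise ValueError("both individuals must belong to the same generation")
    return len(u) - common_ancestor_depth(u, v)


print(genealogical_distance((1, 2, 1), (1, 2, 2)))
print(genealogical_distance((1, 2, 1), (2, 1, 1)))
```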

Lemma 3. The function h : I → |I| is harmonic for the family of point processes (Ξ(I)).
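The harmonicity of the length can be verified by a one-line computation, reproduced here in exact rational arithmetic. The sketch reads the model as follows: a mother with interval of length |I| has on average 1 + |I|/N children, each child recombines with probability r N (I) = (2|I|/N)/(1 + |I|/N), and a recombined child inherits on average half of the parental length since the crossover point is uniform and the side is chosen with equal probability.

```python
from fractions import Fraction


def expected_offspring_length(length: Fraction, N: Fraction) -> Fraction:
    """Expected total length of the intervals carried by the children of one mother."""
    mean_children = 1 + length / N
    r = (2 * length / N) / (1 + length / N)  # recombination probability
    mean_child_length = (1 - r) * length + r * length / 2
    return mean_children * mean_child_length


# Harmonicity: the expected total offspring length equals the parental length.
for length in (Fraction(1, 3), Fraction(3, 2), Fraction(7)):
    print(length, expected_offspring_length(length, Fraction(10)))
```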

Definition 4. The distribution Q 1,N I of the intervals along the 1-spine in the branching process with recombination is that of the discrete-time Markov chain (I(n); n ≥ 0) verifying I(0) = I, and conditional on I(n) = [a, b],
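A possible simulation of the spine chain is sketched below. The transition probabilities are not reproduced from the displayed formula; they are assumptions inferred from the surrounding text: the chain jumps with probability (b − a)/N per step, in line with the geometric jump times used for the large N limit, and at a jump the new interval is biased by its length. The length bias is implemented here by keeping the fragment that contains an independent uniform point, which is one way of realising it.

```python
import random


def spine_step(interval, N: float, rng: random.Random):
    """One step of the 1-spine chain (assumed transitions, requires N >= b - a)."""
    a, b = interval
    if rng.random() >= (b - a) / N:
        return interval           # no recombination along the spine
    cut = rng.uniform(a, b)       # crossover point
    mark = rng.uniform(a, b)      # independent uniform point selecting the fragment
    return (a, cut) if mark < cut else (cut, b)


def spine_path(interval, N: float, steps: int, rng: random.Random):
    """Successive intervals I(0), ..., I(steps) along the spine."""
    path = [interval]
    for _ in range(steps):
        path.append(spine_step(path[-1], N, rng))
    return path


rng = random.Random(2)
path = spine_path((0.0, 3.0), 10.0, 200, rng)
print(path[-1])
```

Conditional on the crossover point, the left fragment is kept with probability proportional to its length, so the pair (crossover point, side) is biased by the length of the selected fragment.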

Figure 5: Illustration of the Poisson construction of Q 1 R . Atoms of P are represented with dark circles. At each time t, the vertical slice of the shaded region gives I R (t).