Ancestral reproductive bias in continuous-time branching trees under various sampling schemes

Cheek and Johnston (JMB 86:70, 2023) consider a continuous-time Bienaymé-Galton-Watson tree conditioned on being alive at time T. They study the reproduction events along the ancestral lineage of an individual randomly sampled from all those alive at time T. We give a short proof of an extension of their main results (Cheek and Johnston in JMB 86:70, 2023, Theorems 2.3 and 2.4) to the more general case of Bellman-Harris processes. Our proof also sheds light on the probabilistic structure of the rate of the reproduction events. A similar method will be applied to explain (i) the different ancestral reproduction bias appearing in work by Geiger (JAP 36:301–309, 1999) and (ii) the fact that the sampling rule considered by Chauvin et al. (SPA 39:117–130, 1991, Theorem 1) leads to a time-homogeneous process along the ancestral lineage.


Introduction
Consider a continuous-time branching process with $N_t$ individuals alive at time $t$, started with one individual at time 0. At the end of its lifetime, an individual is replaced by a random number of independent offspring with distribution $(p_k)_{k \ge 0}$. When the lifetimes of the individuals are i.i.d. with an arbitrary distribution $\mu$ on $\mathbb{R}_+$, the resulting process is called a Bellman-Harris process [BH48]. In the special case of exponentially distributed lifetimes, this process is a continuous-time (Bienaymé-)Galton-Watson process, which is also called a one-dimensional continuous-time Markov branching process, see [AN72, Chapter 3]. For those processes, Cheek and Johnston [CJ23] study the process of reproduction times and family sizes along the ancestral lineage of an individual sampled from all those alive at a given time $T > 0$, conditioned on the event $\{N_T > 0\}$.

We give a short and conceptual probabilistic proof of the main results of [CJ23] in the more general Bellman-Harris setting. The core idea of this proof is as follows: on the event $\{N_T > 0\}$, we assign to the individuals alive at time $T$ independent random variables, which will be called markers, uniformly distributed on $[0, 1]$. Then the individual whose marker is largest constitutes a uniformly distributed random pick from all the individuals alive at time $T$. As we will see, the argument $s$ of the generating functions that appear in the analytic arguments of [CJ23] corresponds to the realisation of the largest marker. Sections 2 to 4 will be devoted to formulating and proving Theorem 2.1.
Relating to work of Chauvin, Rouault and Wakolbinger [CRW91], in Section 5 we will consider the case of potentially dependent but identically distributed markers with an atomless distribution, and condition on one marker taking the prescribed value $s$. In contrast to the above, in this case one does not observe a time-inhomogeneity along the sampled ancestral lineage.
In Section 6 we will consider a planar embedding of the Bellman-Harris tree conditioned to survive up to time $T$, and analyse the leftmost ancestral lineage among those surviving until time $T$. Here we follow Geiger [Gei99], who gave a representation of discrete-time Galton-Watson processes conditioned to survive up to a given number of generations. With this sampling rule we observe a time-inhomogeneity of ancestral reproduction events that is different from the one in [CJ23].
In Section 7 we briefly resume the discussion from [CJ23] on a possible relation between the ancestral rate bias and the rate of mutations per cell division in embryogenesis, and illustrate the various sampling schemes from a more biological perspective.

2020 Mathematics Subject Classification. Primary 60J80; secondary 60K05, 92D10.
Key words and phrases. Branching processes, spines, reproductive bias, inspection paradox, sampling schemes.
We thank Anton Wakolbinger for bringing the work [CJ23] to our attention. We are grateful to him and also to Matthias Birkner, Götz Kersting and Marius Schmidt for stimulating discussions and valuable hints. A substantial part of this work was done during the 2023 seminar week of the Frankfurt probability group in Haus Bergkranz.
2. Sampling an ancestral line at random

Figure 1. An example of a realisation of the random variables $S, L_1, L_2, T_1, T_2$ in the sampling regime described in Section 2.
Recall that to each individual at time $T$, we have associated a uniform marker in $[0, 1]$. On the event $\{N_T > 0\}$, let the individual $V$ be sampled as described in the Introduction, and let $S$ be its marker. We define the process $(N_t)_{t \ge 0}$ to be right continuous with left limits. As a consequence, if $T_1$ is the lifetime of the root individual, then $N_{T_1}$ has distribution $(p_k)_{k \ge 0}$. Let $J$ be the random number of reproduction events and $0 < T_1 < T_2 < \dots < T_J \le T$ be the random times of reproduction events along the ancestral lineage of $V$. Let $L_1, \dots, L_J$ be the offspring sizes in these reproduction events and let $0 < \tau_1 < \tau_2 < \dots$ be the random arrival times in a renewal process with interarrival time distribution $\mu$. See Figure 1 for a sample realisation. Denote by $\mathbf{P}$ and $\mathbf{E}$ the probability measure and expectation for $N_0 = 1$.
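The sampling regime just described can be simulated directly. The following is a minimal sketch, not part of the original argument: the function name `simulate_lineage`, its parameters, and the depth-first traversal are illustrative assumptions. It grows one Bellman-Harris tree up to time $T$, attaches independent uniform markers to the individuals alive at time $T$, and returns the largest marker $S$ together with the pairs $(T_i, L_i)$ along the ancestral lineage of the individual carrying it.

```python
import random

def simulate_lineage(T, offspring_probs, lifetime, rng):
    """Simulate one Bellman-Harris tree up to time T (illustrative sketch).

    offspring_probs[k] plays the role of p_k; lifetime(rng) draws one
    lifetime from mu. Returns None if nobody is alive at time T, and
    otherwise (S, events), where S is the largest marker and events lists
    (T_i, L_i) along the ancestral lineage of the sampled individual V.
    """
    sizes = list(range(len(offspring_probs)))
    best = None                      # (marker, lineage events) of current maximum
    stack = [(0.0, ())]              # entries: (birth time, events on the lineage)
    while stack:
        birth, events = stack.pop()
        death = birth + lifetime(rng)
        if death > T:
            # Individual is alive at time T: give it a Unif[0,1] marker.
            marker = rng.random()
            if best is None or marker > best[0]:
                best = (marker, events)
            continue
        # Reproduction event at time `death` with k offspring.
        k = rng.choices(sizes, weights=offspring_probs)[0]
        for _ in range(k):
            stack.append((death, events + ((death, k),)))
    if best is None:
        return None
    return best[0], list(best[1])
```

For instance, `simulate_lineage(2.0, [0.0, 0.0, 1.0], lambda g: g.expovariate(1.0), random.Random(1))` treats binary splitting with exponential lifetimes of rate 1. In supercritical cases the tree size grows exponentially in $T$, so the sketch is only practical for moderate time horizons.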
When the lifetime distribution $\mu$ is the exponential distribution with parameter $r$, then $\tau_1, \tau_2, \dots$ are the points of a rate $r$ Poisson point process. In this case Corollary 2.2 together with (2.1) becomes a reformulation of the statements of [CJ23, Theorems 2.3 and 2.4], and at the same time reveals the probabilistic role of the mixing parameter $s$ in the mixture of biased compound Poisson processes that appears in the "Cox process representation" of [CJ23].
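Let us write (as in [CJ23]) $F_t(s) := \mathbf{E}[s^{N_t}]$, and abbreviate

$$B(t, T, \ell) := \int_0^1 F_{T-t}(s)^{\ell-1}\, \frac{F_T'(s)}{1 - F_T(0)}\, ds. \tag{2.2}$$

[CJ23, Theorem 2.4] (as well as Theorem 2.1) says that the rate of size $\ell$ reproduction along the uniform ancestral lineage at time $t$ is

$$r \ell p_\ell\, B(t, T, \ell).$$

This can be obtained from Corollary 2.2 by noting that $S$ has density

$$\mathbf{P}(S \in ds \mid N_T > 0) = \frac{F_T'(s)}{1 - F_T(0)}\, ds, \qquad s \in [0, 1].$$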
In this sense the factor $B(t, T, \ell)$ can be interpreted as an (ancestral) rate bias, on top of the classical term $r \ell p_\ell$. Indeed, the factor $B(t, T, \ell)$ is absent in trees that are biased with respect to their size at time $T$. Galton-Watson trees of this kind have been investigated (also in the multitype case) by Georgii and Baake [GB03, Section 4]; they are continuous-time analogues of the size-biased trees analysed by Lyons et al. [LPP95] and Kurtz et al. [KLPP97].
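To get a feeling for the size of the rate bias, one can evaluate it numerically in a case where the generating function is explicit. The sketch below rests on two assumptions that are not taken verbatim from the text: first, that the rate bias has the form $B(t, T, \ell) = \int_0^1 F_{T-t}(s)^{\ell-1} F_T'(s) / (1 - F_T(0))\, ds$, that is, the expectation of $F_{T-t}(S)^{\ell-1}$ given $\{N_T > 0\}$; second, that the tree is a Yule tree (binary splitting at rate $r$, no deaths), for which $F_u(s) = s e^{-ru} / (1 - s(1 - e^{-ru}))$ and $F_T(0) = 0$. The function names are hypothetical.

```python
import math

def yule_pgf(u, s, r=1.0):
    """F_u(s) = E[s^{N_u}] for a Yule process: N_u is geometric with parameter e^{-ru}."""
    a = math.exp(-r * u)
    return s * a / (1.0 - s * (1.0 - a))

def yule_pgf_derivative(u, s, r=1.0):
    """Derivative of F_u(s) with respect to s for the Yule process."""
    a = math.exp(-r * u)
    return a / (1.0 - s * (1.0 - a)) ** 2

def rate_bias(t, T, ell, r=1.0, n=20000):
    """Midpoint-rule approximation of the assumed formula for B(t, T, ell).

    For the Yule process F_T(0) = 0, so the normalising constant equals 1.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += yule_pgf(T - t, s, r) ** (ell - 1) * yule_pgf_derivative(T, s, r)
    return total * h
```

Under these assumptions $B(0, T, 2) = \int_0^1 F_T(s) F_T'(s)\, ds = 1/2$, so the root splits at rate $2r \cdot 1/2 = r$ as it should, and `rate_bias(t, T, 2)` is increasing in `t` because $F_u(s)$ is decreasing in $u$.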
In the critical and supercritical case one can check that, for all fixed $u < T$ and $\ell \in \mathbb{N}$, one has the convergence $B(T - u, T, \ell) \to 1$ as $T \to \infty$, because $S$ converges to 1 in probability. In the supercritical case this stabilisation along the sampled ancestral lineage corresponds to the "retrospective viewpoint" that has been taken in [GB03] and, in the more general situation of Crump-Mode-Jagers processes, by Jagers and Nerman [JN96].

The choice $\mu = \delta_1$ renders the case of discrete-time Galton-Watson processes, starting with one individual at time 0 and with reproduction events at times $1, 2, \dots$. Then, with $T = n \in \mathbb{N}$, and $L_1, \dots, L_n$ being the family sizes along the ancestral lineage of the sampled individual $V$, the formula (2.1) specialises to

random markers
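As a preparation for the short probabilistic proof of Theorem 2.1 given in the next section, we recall the following well-known fact.

Lemma 3.1. Denote by $\mathrm{Unif}[0, 1]$ the uniform distribution on the interval $[0, 1]$. For $\ell \in \mathbb{N}$, let $S$ be the maximum of $\ell$ independent $\mathrm{Unif}[0, 1]$-distributed random variables $U_1, \dots, U_\ell$. Then the density of $S$ is

$$\mathbf{P}(S \in ds) = \ell s^{\ell-1}\, ds, \qquad s \in [0, 1]. \tag{3.1}$$

Indeed, because of exchangeability,

$$\mathbf{P}(S \in ds) = \ell\, \mathbf{P}(U_1 \in ds,\ U_2 \le s, \dots, U_\ell \le s),$$

which equals the r.h.s. of (3.1).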
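The fact about the maximum of uniform markers is easy to check by simulation. The sketch below is only an illustration with hypothetical function names: it estimates $\mathbf{P}(S \le s)$, which should be close to $s^\ell$ (the integral of the density $\ell s^{\ell-1}$), and it counts how often each index carries the maximum, which by exchangeability should be close to $1/\ell$.

```python
import random

def max_marker_cdf(ell, s, trials, rng):
    """Empirical probability that the maximum of ell uniform markers is <= s."""
    hits = 0
    for _ in range(trials):
        if max(rng.random() for _ in range(ell)) <= s:
            hits += 1
    return hits / trials

def argmax_frequencies(ell, trials, rng):
    """Empirical frequencies with which each of the ell indices carries the maximum."""
    counts = [0] * ell
    for _ in range(trials):
        markers = [rng.random() for _ in range(ell)]
        counts[markers.index(max(markers))] += 1
    return [c / trials for c in counts]
```

With `ell = 3` and `s = 0.7`, the first function should return a value near $0.7^3 = 0.343$, and the second should return three frequencies near $1/3$. This is the reason why picking the individual with the largest marker amounts to a uniform pick.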
The following corollary is immediate.
Corollary 3.2. Let $L$ be an $\mathbb{N}_0$-valued random variable that is independent of all the random variables appearing in Lemma 3.1, with $\mathbf{P}(L = \ell) = p_\ell$, $\ell \in \mathbb{N}_0$. Then we have for all $\ell \in \mathbb{N}_0$,

Proof of Theorem 2.1
We prove the statement (2.1) by induction over $j$, simultaneously over all time horizons $T > 0$. We write $\mathbf{P}_T$ for the probability referring to time horizon $T$; this will be helpful in the induction step, where we will encounter two different time horizons.

For $j = 0$, both sides of (2.1) are equal to $\mu((T, \infty))\, ds$. For $j = 1$, on the event $\{T_1 \in dt_1\}$, we can directly apply Corollary 3.2 to the markers of the $L_1$ subtrees produced in this event. These subtrees live $T - t_1$ long and thus have sizes distributed as $N_{T - t_1}$. So the left side of (2.1) equals

which is

using the $j = 0$ case. This is equal to the right hand side of (2.1).

Now assume we have proved (2.1) for all time horizons $T'$ with $j - 1$ (in place of $j$), for all times $t'_1, \dots, t'_{j-1} \le T'$, sizes $\ell'_1, \dots, \ell'_{j-1} \in \mathbb{N}$ and $s \in [0, 1]$. On the event $\{T_1 \in dt_1, L_1 = \ell_1\}$ the descendants of the $\ell_1$ siblings in the first branching event form $\ell_1$ independent and identically distributed trees on the time interval $[t_1, T]$. Thus, using Corollary 3.2 and setting $t'_1 := t_2 - t_1, \dots, t'_{j-1} := t_j - t_1$, we obtain that the left hand side of (2.1) equals

By the induction assumption, this is equal to

(4.1)

where $\tau'_1, \tau'_2, \dots$ have the same distribution as $(\tau_1, \tau_2, \dots)$. Obviously (4.1) equals the r.h.s. of (2.1). This completes the induction step and concludes the proof.

Conditioning on a marker value
Chauvin, Rouault and Wakolbinger [CRW91] consider a Markov process with an atomless transition probability indexed by a continuous-time Galton-Watson tree, and they then condition on an individual at time $T$ being at a given location.
Figure 2. An example of a realisation of the random variables $L_1, L_2, T_1, T_2$ in the sampling regime described in Section 5.
To relate this to the framework described in the Introduction, we assume that each individual alive at time $T$ in the Bellman-Harris tree carries a marker in some standard Borel space $E$, and that these random markers have the following properties:
(M1) Their marginal distributions (denoted by $\nu$) are identical and do not depend on the reproduction events.
(M2) A.s. no pair of markers is equal.
Think for example of branching Brownian motion: the positions of the different particles clearly depend on each other via the genealogy; however, at time $t$ the position of each particle is marginally a centered Gaussian random variable with variance $t$, irrespective of its past genealogical events in the underlying continuous-time Galton-Watson tree. Thus (M1) is fulfilled. Since two correlated Gaussian random variables are a.s. not equal if the correlation coefficient is not equal to one, (M2) is also fulfilled.
We now condition on $\{N_T > 0\}$ and, for given $s \in E$, on one of the $N_T$ individuals having marker value $s$. Recall the previous notation: denote by $V$ the individual having marker $s$. Let $J$ be the random number of reproduction events along the ancestral lineage of $V$ and $0 < T_1 < T_2 < \dots < T_J < T$ be the random times of these reproduction events. Let $L_1, \dots, L_J$ be the offspring sizes in these reproduction events and let $0 < \tau_1 < \tau_2 < \dots$ be the random arrival times in a renewal process with interarrival time distribution $\mu$. Figure 2 depicts a sample realisation. The following theorem generalises (part of) [CRW91, Theorem 2] to general lifetime distributions.
Theorem 5.1. For $j \ge 0$, $0 < t_1 < \dots < t_j < T$ and $\ell_1, \dots, \ell_j \in \mathbb{N}$ we have for $\nu$-almost all $s$

Proof. Because of properties (M1), (M2) we have

Hence (5.1) is equivalent to

As in the proof of Theorem 2.1, we prove the statement (5.2) by induction over $j$, simultaneously over all time horizons $T > 0$. As before we write $\mathbf{P}_T$ for the probability referring to time horizon $T$. For $j = 0$ the statement is true, since the root individual must then still be alive at time $T$, so that $\mathbf{P}_T(J = 0, N_T > 0, \exists\, \text{marker} \in ds) = \mathbf{P}(\tau_1 > T)\, \nu(ds)$.
Assume we have proved (5.2) for all time horizons $T'$ with $j - 1$ (in place of $j$), for all times $t'_1, \dots, t'_{j-1} \le T'$, sizes $\ell'_1, \dots, \ell'_{j-1} \in \mathbb{N}$ and marker distributions with the same marginal $\nu$ that satisfy conditions (M1), (M2). Turning to (5.2) as it stands, we note that on $\{T_1 = t_1, L_1 = \ell_1\}$, the descendants of the $\ell_1$ siblings in the first branching event form $\ell_1$ independent and identically distributed trees on the time interval $[t_1, T]$. Let $U_k$, $k = 1, \dots, \ell_1$, be the set of markers of the individuals at time $T$ that descend from the $k$-th sibling. By randomly permuting these $\ell_1$ siblings, we can assume that the set-valued random variables $U_k$, $k = 1, \dots, \ell_1$, are exchangeable. Note that the markers in each $U_k$ satisfy conditions (M1), (M2). Because the markers are a.s. pairwise different by assumption, the marker $s$ belongs to at most one of those $U_k$, so

Note that for the sake of intuition we use a differential notation for what formally is an (integral) equality for the distribution of the random point measure formed by the individuals' markers, which by assumption (M2) can be seen as a random set of points. Putting $t'_1 := t_2 - t_1, \dots, t'_{j-1} := t_j - t_1$ we thus infer, using the branching property of the Bellman-Harris tree, that the left hand side of (5.2) equals

By the induction assumption this is equal to

(5.3)

where $(\tau'_1, \tau'_2, \dots)$ have the same distribution as $(\tau_1, \tau_2, \dots)$. Obviously (5.3) equals the r.h.s. of (5.2), which completes the induction step and concludes the proof.
Remark 5.2. If $\mu$ is the exponential distribution with parameter $r$, then $\tau_1, \tau_2, \dots$ are again the points of a rate $r$ Poisson point process, and (5.1) implies that reproduction events along the ancestral lineage of $V$ happen according to a time-homogeneous Poisson process with rate $r \sum_\ell \ell p_\ell$. This corresponds to the description of the events along the ancestral line of $V$ given in [CRW91, Theorem 1].
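The time-homogeneous rates of Remark 5.2 can be tabulated for a concrete offspring law. The following sketch assumes, in line with the classical term $r \ell p_\ell$ from Section 2, that size $\ell$ events occur at rate $r \ell p_\ell$, so that the total rate is $r \sum_\ell \ell p_\ell$ and the family size at an event follows the size-biased law $\ell p_\ell / \sum_k k p_k$. The function name and the list representation of $(p_k)$ are illustrative choices.

```python
def lineage_rates(r, p):
    """Rates along the ancestral lineage for exponential lifetimes (sketch).

    p[l] plays the role of p_l. Returns (rates, total, size_biased), where
    rates[l] = r * l * p[l], total is the sum of all rates, and
    size_biased[l] is the probability that an event has family size l.
    """
    rates = {l: r * l * pl for l, pl in enumerate(p) if l > 0 and pl > 0}
    total = sum(rates.values())
    size_biased = {l: rate / total for l, rate in rates.items()}
    return rates, total, size_biased
```

For example, with `r = 2` and `p = [0.25, 0.25, 0.5]` the mean offspring number is 1.25, the total rate is 2.5, and an event has family size 2 with probability 0.8 rather than 0.5.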

Sampling the left-most ancestral lineage
Figure 3. An example of a realisation of markers and random variables $L_1, L_2, K_1, K_2, T_1, T_2$ in the sampling regime described in Section 6.
We now aim to obtain results about what Geiger [Gei99] calls the leftmost surviving ancestral lineage in a planar embedding of the tree: at any reproduction event we assign independent markers, uniformly distributed on $[0, 1]$, to all children. An individual can now be uniquely determined by the markers along its ancestral lineage. On the event $\{N_T > 0\}$, let $V$ be the individual whose sequence of markers along its entire ancestral lineage comes first in the lexicographic ordering. Let $J$ be the random number of reproduction events and $0 < T_1 < T_2 < \dots < T_J \le T$ be the random times of reproduction events along the ancestral lineage of $V$.
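The leftmost sampling rule can also be sketched as a simulation. The code below is an illustration under stated assumptions, with a hypothetical function name: every child receives an independent uniform marker at birth, each individual alive at time $T$ is identified with the tuple of markers along its lineage, and the individual whose tuple is smallest in lexicographic order is returned together with the reproduction times and family sizes along its lineage. Python compares tuples lexicographically, which is what the sketch relies on.

```python
import random

def leftmost_lineage(T, offspring_probs, lifetime, rng):
    """Reproduction events along the leftmost lineage alive at time T (sketch).

    offspring_probs[k] plays the role of p_k; lifetime(rng) draws from mu.
    Returns None if nobody is alive at time T, otherwise a list of
    (T_i, L_i) for the individual with the lexicographically first markers.
    """
    sizes = list(range(len(offspring_probs)))
    best = None                          # (marker tuple, lineage events)
    stack = [(0.0, (), ())]              # (birth time, markers, events)
    while stack:
        birth, markers, events = stack.pop()
        death = birth + lifetime(rng)
        if death > T:
            # Alive at time T: compare marker sequences lexicographically.
            if best is None or markers < best[0]:
                best = (markers, events)
            continue
        k = rng.choices(sizes, weights=offspring_probs)[0]
        for _ in range(k):
            # Each child gets its own independent Unif[0,1] marker.
            child_markers = markers + (rng.random(),)
            stack.append((death, child_markers, events + ((death, k),)))
    return None if best is None else list(best[1])
```

Two distinct individuals alive at time $T$ never have marker tuples of which one is a prefix of the other, since their lineages split at some reproduction event, so the comparison is decided at the first event where they differ.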
Let $L_1, \dots, L_J$ be the offspring sizes in these reproduction events and let $0 < \tau_1 < \tau_2 < \dots$ be the random arrival times in a renewal process with interarrival time distribution $\mu$. Denote by $K_i$ the number of siblings born at reproduction event number $i$ along the ancestral lineage of $V$ which have a lower lexicographic order than $V$ and whose descendants hence die out before time $T$. Figure 3 shows a realisation for this sampling rule.

Theorem 6.1. For $j \ge 0$, $0 < t_1 < \dots < t_j < T$, $\ell_1, \dots, \ell_j \in \mathbb{N}$ and $k_i \in \{1, \dots, \ell_i - 1\}$ we have

Proof. The proof of the theorem works in analogy to the one of Theorem 2.1, but uses the following analogue of Lemma 3.1.

Biological perspectives
Cheek and Johnston [CJ23, Section 5] discuss recent studies ([PMK+21], [CTAS+19]) which suggest that certain mutation rates are elevated for the earliest cell divisions in embryogenesis. Under the assumptions that (1) cell division times vary and (2) mutations arise not only at but also between cell divisions, Cheek and Johnston argue that this early rate elevation might be parsimoniously explained by their finding that, in the supercritical case with no deaths, the rate of branching events along a uniformly chosen ancestral lineage is increasing in $t \in [0, T]$ (which is a corollary to their Theorem 2.4).
The two-stage sampling rule
• first sample a random tree ("an adult") that survives up to time $T$,
• then sample an individual from this tree ("a cell from this adult") at time $T$
seems adequate for the situation discussed in Cheek and Johnston [CJ23, Section 5]. In other modeling situations, again with a large collection of i.i.d. Galton-Watson trees, one may think of a different sampling rule: choose individuals at time $T$ uniformly from the union of all time-$T$ individuals in all of the trees. This makes it more probable that the sampled individuals belong to larger trees, and in fact corresponds to the size-biasing of the random trees at time $T$ ([GB03, Section 4]). In the two-stage sampling rule we see the different rate bias (2.2), discussed at the end of Section 2.
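The difference between the two sampling rules can be made concrete through the law of the size of the tree to which the sampled individual belongs. The sketch below is an illustration with a hypothetical function name. It assumes that the law $q_n = \mathbf{P}(N_T = n)$ is given as a finite list, and that the collection of trees is so large that pooled sampling picks a tree with probability proportional to its size at time $T$.

```python
def tree_size_laws(q):
    """Law of the sampled individual's tree size under both rules (sketch).

    q[n] plays the role of P(N_T = n). Returns (two_stage, pooled):
    two_stage[n] = q[n] / P(N_T > 0)   (tree first, then an individual),
    pooled[n]    = n * q[n] / E[N_T]   (uniform pick from all individuals).
    """
    alive = sum(qn for n, qn in enumerate(q) if n > 0)
    mean = sum(n * qn for n, qn in enumerate(q))
    two_stage = {n: qn / alive for n, qn in enumerate(q) if n > 0 and qn > 0}
    pooled = {n: n * qn / mean for n, qn in enumerate(q) if n > 0 and qn > 0}
    return two_stage, pooled
```

For `q = [0.5, 0.25, 0.25]`, the two-stage rule gives tree sizes 1 and 2 probability 1/2 each, whereas the pooled rule gives them probabilities 1/3 and 2/3, which is the size-biasing mentioned above.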
As can be seen from [CRW91, Theorem 1] (and Theorem 5.1), the rate bias (2.2) is also absent along the ancestral lineage of an individual whose marker has a prescribed value $s$. This applies if one considers a situation in which a neutral marker evolves along the trees in small (continuous) mutation steps, and if, for the prescribed value $s$, one takes the collection of trees so large that one individual at time $T$ has a marker value close to (ideally: precisely at) $s$.
The sampling rule that appears in [Gei99] (and Theorem 6.1) leads to a rate (and reproduction size) bias along the ancestral lineage that is different from the ones we just discussed. This sampling rule can be defined via i.i.d. real-valued neutral markers that are created at each birth and passed to the offspring. The individual sampled at time $T$ (from the tree conditioned to survive up to time $T$) is the one whose marker sequence is the largest in lexicographic order among the individuals that live in the tree at time $T$. This interpretation appears to be of less biological relevance, except in the pure birth (or cell division) case, where one might think of one single marker that is passed on in each generation to a randomly chosen daughter cell.