Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation ($\mathbb{S}$), gene duplication ($\mathbb{D}$), gene loss ($\mathbb{L}$), and horizontal gene transfer ($\mathbb{T}$). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics.
However, unlike in the multispecies coalescent evolutionary model, which considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the $\mathbb{D}\mathbb{L}\mathbb{T}$-model). In this work, we introduce the notion of an evolutionary history, defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the $\mathbb{D}\mathbb{L}\mathbb{T}$-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories.
We also show that, within ranked species trees, the number of evolutionary histories in the $\mathbb{D}\mathbb{L}\mathbb{T}$-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.


Introduction
A gene tree represents the evolution of a gene family, a group of genes assumed to descend from a single ancestral gene. The reconstruction of gene trees from molecular sequence data is a central but difficult problem in computational biology. Indeed, while species are mostly expected to evolve through speciation, gene families evolve through a wider variety of mechanisms including gene duplication, gene loss, horizontal gene transfer (HGT) and incomplete lineage sorting (ILS). As a result, it is common to observe an incongruence between gene trees and species trees [32]. This discrepancy has motivated intense research activity on the problem of reconstructing the gene tree of a gene family, conditional on a given species tree for the considered species. We refer to [43,45] for extensive reviews that discuss how gene trees evolve within a species tree and describe existing models and methods for reconstructing gene trees within species trees.
In the case where a gene family contains a single gene per species, observed incongruences between a gene tree and a species tree can be analyzed through the prism of ILS in the multispecies coalescent model [11]. The natural question is then to compute the probability of coalescent histories conditional on the given species tree [12,35,49,50]. For gene families that might contain duplicate copies (or no copy) of a gene in a given species, the multispecies coalescent model is not appropriate, and gene trees need to be inferred in a model including gene duplication, gene loss and, ideally, transfers. Most methods developed to understand the evolution of gene families in this context rely on the concept of gene tree-species tree reconciliation, illustrated in Fig. 1. In this framework, given a gene tree $G$ and a species tree $S$, one aims to embed $G$ within $S$, often optimizing a parsimony or probabilistic criterion with regard to the considered evolutionary model.
Fig. 1: A species tree $S$ (left), a DL-history for $S$ (center) and its associated gene tree (right). Green squares (resp. blue circles, red diamonds, black rectangles) correspond to nodes $x$ such that $e(x) = \mathbb{D}$ (resp. $e(x) = \mathbb{S}$, $e(x) = \mathbb{L}$, $e(x) = \mathrm{Extant}$). The mapping $s$ is represented by the location of the internal nodes of the history within the species tree in the center tree and by the species names in the nodes in the right tree.
Early reconciliation methods were developed for an evolutionary model considering only gene duplications and gene losses (the DL-model), and considered a parsimony criterion. This problem, introduced by Goodman et al. [26], is computationally tractable through dynamic programming. Extending the model to include HGT, while ensuring that HGT events are time-consistent, makes the problem of computing a most parsimonious reconciliation intractable in general [34,47]. However, if the provided species tree is ranked, i.e. is provided with a total ordering of its internal nodes describing the order of speciation events, the reconciliation problem becomes tractable (see the discussion in [19]). Over the last 20 years, various efficient dynamic programming algorithms were designed to compute a parsimonious reconciliation, and implemented in widely used phylogenomics packages [6,22,31,39]. Similar to parsimony-based methods, probabilistic reconciliation methods were first developed in a model considering only gene duplication and gene loss [1,2,27,29], before being extended to include HGTs [40,44].
Most methods that reconstruct a gene tree, conditional on a species tree, rely on the exploration of the space of possible evolutionary histories. It is then important to develop conceptual tools that can describe this combinatorial space and further enable its efficient exploration. This naturally raises two questions: computing the size of the space of evolutionary histories for a given gene family and a given species tree, and sampling such histories. Both questions are naturally related, as precise counting results often translate into efficient sampling algorithms [24,48]. The former (counting) question has been studied by Rosenberg et al. in the case of the multispecies coalescent model [13-17,38]. However, similar questions have not been explored as thoroughly for evolutionary models including gene duplication, gene loss and HGT. In this framework, dynamic programming equations aimed at computing a parsimonious reconciled gene tree can be turned into a specification of the corresponding search space [28,36]. This then leads to efficient algorithms for counting or sampling parsimonious reconciliations [5,18] or sampling reconciled gene trees under the Boltzmann probability distribution [31]. However, to the best of our knowledge, such questions have not been considered in the case where a gene tree is not specified at first, i.e. we are only given a species tree and a gene family.
This paper provides analytic and algorithmic answers to those questions. We show that, for a given species tree, whether ranked or unranked, the space of all possible evolutionary histories of a fixed size in the DLT-model can be described using a formal grammar. This allows us to compute, in polynomial time and space, for a given species tree and gene family size, the number of evolutionary histories of this size conditional on the given species tree, as well as to sample among these histories under the uniform probability distribution. Using these algorithms, we can provide estimates of the exponential growth factor of the number of histories in the DL-model and DLT-model.
We show that, as expected, including HGT in a model results in an exponential increase in the number of histories. We also notice that with a ranked species tree, the exponential growth factor of the number of histories in the DLT-model seems to be almost independent of the chosen species tree. Finally, using enumerative and analytic combinatorics, we provide exact values for the asymptotic number of histories for two specific species trees: the rooted caterpillar tree and the rooted complete binary tree.
Model: gene families evolutionary histories
In this section, we introduce the combinatorial objects modeling the evolution of a gene family within a given species tree, which we call histories.
Preliminaries on trees. For a given rooted tree $T$, we say it is uniquely labeled if every node has a label and no two nodes have the same label. For a node $x$ in $T$, we denote by $T_x$ the subtree of $T$ rooted at $x$. In this work, we consider only binary and unary-binary trees: in a binary tree, every internal node has exactly two children, while in a unary-binary tree, an internal node can have either one child or two children. If a uniquely labeled tree $T$ is unordered, we take advantage of the node labeling to see it as an ordered tree, with the two children of an internal node $x$ being ordered from left to right in increasing order of their labels; so from now on all trees we consider are ordered. If an internal node $x$ of a tree $T$ is binary, we denote by $x_\ell$ the left child of $x$ and by $x_r$ its right child; if $x$ is unary, i.e. has a single child, we denote this child by $x_c$. We denote by $r(T)$ the root of $T$. For a node $x$ of $T$, we denote by $p(x)$ its parent in $T$. The size of a tree $T$ is the number of its leaves.
A rooted tree describes a partial order on the set of its nodes, and two nodes are said to be comparable if one is an ancestor of the other one and incomparable otherwise. For a node $u$, we denote by $C(u)$ the set of nodes that are incomparable with $u$.

Ranked trees.
A ranking of a tree $T$ of size $n$ is a mapping $\pi$ from the nodes of $T$ to $\{1, \ldots, n\}$ such that (1) $\pi(x) = n$ if $x$ is a leaf, (2) $\pi(x) \neq \pi(y)$ if $x$ and $y$ are distinct internal nodes, and (3) $\pi(x) < \pi(y)$ if $x$ is an ancestor of $y$.
A tree augmented with a ranking is called a ranked tree; in our context it models the evolution of a set of species, the ranking providing the relative order of speciation events, under the assumption that no two speciations can occur at the same time.
Given a binary tree $T$ and a ranking $\pi$, we define an unranked unary-binary tree $T_\pi$ that encodes the ranking information as follows: for each internal node $u$, considered iteratively in increasing ranking order, and for every edge $(p(v), v)$ such that $\pi(p(v)) < \pi(u) < \pi(v)$, we subdivide the edge $(p(v), v)$ into two edges $(p(v), v_u)$ and $(v_u, v)$, thereby adding a unary node $v_u$ on this edge. We denote by $t(u)$ the set of all unary nodes created in this way and we call this set of nodes together with $u$ a time slice. Additionally, we also define the set of all leaves as a time slice (see Figure 2). Note that in this way we create $n$ different time slices which correspond to the $n$ different values of the ranking. We modify the notion of incomparability for such unary-binary trees as follows: for a node $u$ of $T_\pi$, the set $C(u)$ consists of the nodes other than $u$ that belong to the same time slice as $u$.
Gene Families Evolutionary Histories. The objects we study in this work model the evolution of a gene family within a species tree. A species tree, which will be denoted by $S$ from now on, is a uniquely labeled rooted binary tree that represents the evolution of a set of species through speciation events; $S$ can be either unranked or ranked. A gene family evolves within $S$ from a single ancestral gene, present in the species $r(S)$, through four possible kinds of evolutionary events:
- Speciation $\mathbb{S}$: a gene $x$ present in species $u$ has two descendant genes, $x_\ell$ present in species $u_\ell$ and $x_r$ present in species $u_r$.
- Duplication $\mathbb{D}$: a gene $x$ present in species $u$ is duplicated, with a new copy $x_d$ of $x$ appearing in species $u$; $x$ is said to be the original gene while $x_d$ is the novel gene.
- Loss $\mathbb{L}$: a gene $x$ present in species $u$ has exactly one descendant, either $x_\ell$ or $x_r$, implying that after a speciation at species $u$, exactly one of the two resulting genes is lost along the branch toward either $u_\ell$ or $u_r$.
- Horizontal Gene Transfer $\mathbb{T}$ (HGT): this is similar to a duplication but the novel copy, denoted $x_t$ here, appears in a species $v$ different from $u$ and incomparable with $u$, called the receiver of the HGT, while $u$ is called the donor of the HGT. If $S$ is ranked, with ranking $\pi$, the receiver species $v$ is required to exist at the same time as $u$, i.e. to satisfy the two ranking constraints $\pi(p(v)) < \pi(u) < \pi(v)$.
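The ranking constraint $\pi(p(v)) < \pi(u) < \pi(v)$ that defines both the time slices and the admissible HGT receivers can be illustrated with a short Python sketch. The dictionary-based encoding of the tree (`parent`, `rank`) and the function name `time_slices` are assumptions made for this illustration only; they do not come from the original text.

```python
def time_slices(parent, rank):
    """For each internal node u, list the nodes v whose incoming edge
    (p(v), v) is subdivided by the time slice of u, i.e. the nodes v
    with rank[p(v)] < rank[u] < rank[v].

    parent: dict mapping each node to its parent (None for the root).
    rank:   dict mapping each node to its rank; all leaves share the
            largest rank n (hypothetical encoding of the ranking pi).
    Returns (slices, leaves), where the leaves form the last time slice.
    """
    n = max(rank.values())
    leaves = [v for v in rank if rank[v] == n]
    slices = {}
    for u, rank_u in rank.items():
        if rank_u == n:          # leaves are handled as one common slice
            continue
        slices[u] = [
            v for v in rank
            if parent[v] is not None and rank[parent[v]] < rank_u < rank[v]
        ]
    return slices, leaves


# Caterpillar ((a, b), c): root r has rank 1, node x = (a, b) has rank 2.
parent = {"r": None, "x": "r", "c": "r", "a": "x", "b": "x"}
rank = {"r": 1, "x": 2, "a": 3, "b": 3, "c": 3}
slices, leaves = time_slices(parent, rank)
print(slices)   # the edge (r, c) is cut by the time slice of x
```

In this small example the root forms a time slice on its own, and the number of time slices (one per internal node, plus the slice of the leaves) equals the size of the tree.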
Definition 2.1 An evolutionary history for a gene family within a species tree $S$ is a unary-binary ordered rooted tree $T$ together with two mappings $s : V(T) \to V(S)$ and $e : V(T) \to \{\mathbb{S}, \mathbb{D}, \mathbb{L}, \mathbb{T}, \mathrm{Extant}\}$ satisfying the following constraints:
- if $x$ is a leaf, $e(x) \in \{\mathrm{Extant}, \mathbb{L}\}$;
- if $x$ is internal and binary, $e(x) \in \{\mathbb{S}, \mathbb{D}, \mathbb{T}\}$;
- if $x$ is internal and unary, then $e(x) = \mathbb{S}$;
- if $e(x) = \mathbb{S}$ and $s(x) = u$ is binary, then $s(x_\ell) = u_\ell$ and $s(x_r) = u_r$;
- if $e(x) = \mathbb{S}$ and $s(x) = u$ is unary, then $s(x_c) = u_c$;
- if $e(x) = \mathbb{D}$, then $s(x_\ell) = s(x_r) = s(x)$;
- if $e(x) = \mathbb{T}$, then $s(x_\ell) = s(x)$ and $s(x_r) \in C(s(x))$.
The size of a history is the number of leaves x such that e(x) = Extant.
Intuitively, this definition states that a history is represented by a tree where each node corresponds to a gene present in a species, either extant or ancestral (the mapping $s$), and each ancestral gene either was lost ($e(x) = \mathbb{L}$) or evolved toward extant genes through a duplication ($e(x) = \mathbb{D}$), an HGT to an incomparable receiver species ($e(x) = \mathbb{T}$) or a speciation ($e(x) = \mathbb{S}$), while extant genes belong to extant species; the constraints on the species mapping $s$ ensure that this history can be embedded within $S$ as illustrated in Figure 1. By convention, for duplications, we consider that the novel copy of a gene $x$ is its right child $x_r$, with $x_\ell$ representing the original copy. Histories in the DL-model, which allows duplications and losses (resp. in the DLT-model, which allows duplications, losses and HGTs), are called DL-histories (resp. DLT-histories).
Remark 2.1 By modeling the evolution of a gene family with ordered trees we differ from the classical notion of reconciliation, which also models the evolution of a gene family but considers that when a gene duplication occurs, the original gene and the novel gene are indistinguishable. As a result, the children of a duplication are ordered within a history, whereas they are not in a reconciliation.
Remark 2.2 Gene losses are modeled as speciation events with one disappearing gene. As a consequence, we cannot have a duplication or an HGT in which one of the two resulting gene copies is lost. This is necessary to avoid creating an infinite number of histories of a given size, due to an arbitrary number of duplications within a species, each followed by a loss, or an arbitrarily long sequence of HGTs, again each followed by a loss, leading to at most one extant gene.
Time Consistency of DLT-histories. Given an unranked species tree $S$, a DLT-history as defined above is time inconsistent if there exists a gene $x$ belonging to a species $u$ such that one of its ancestors belongs to a species $v$ and one of its descendants belongs to a species $v'$ ancestral to $v$. This pattern can be observed because, in the definition of a DLT-history, the choice of the receiver species $v$ of an HGT of gene $x$ belonging to species $u$ is not restricted to the set of species that are also incomparable with all species containing genes that are ancestral to $x$; see Figure 3.
The problem of computing gene family evolutionary scenarios that are both parsimonious and time-consistent has been shown to be intractable when such scenarios are modeled by reconciliations with an unranked species tree [34,47], while, when the provided species tree $S$ is ranked, the problem becomes tractable (see [19] and references therein). Similarly, when $S$ is ranked, we can ensure time-consistency of evolutionary histories by requiring that the donor and receiver of any HGT belong to the same time slice in $S_\pi$, i.e. the receiver of an HGT of a gene belonging to a species $u$ belongs to $C(u) = t(u) - \{u\}$.

Methods
Our results (counting and sampling algorithms) are based on the design of formal grammars specifying, for a given species tree $S$, the combinatorial families of DL-histories and DLT-histories constrained by $S$. These grammars are then used as templates to design dynamic programming algorithms for counting the histories of a fixed size and sampling them under the uniform distribution. Moreover, these grammars are amenable to techniques of analytic combinatorics that allow us to compute the asymptotic growth constant for the number of histories. We first describe our grammars, then the counting and sampling algorithms, and finally the asymptotic analysis of these grammars.

General grammars specifying DL-histories and DLT-histories
In this section we describe grammars specifying histories evolving within a species tree using the formalism developed in [23]. We describe grammars for DLT-histories, for both an unranked and a ranked species tree; these grammars can then be specialized into grammars for DL-histories by omitting the rules related to HGT.
Let $S$ be a species tree. If $S$ is unranked, it is a binary tree; otherwise, if it comes with a ranking $\pi$, we consider the unary-binary species tree $S_\pi$. So in the statements below, when mentioning a ranked species tree we mean the unary-binary tree $S_\pi$ defined by the ranking.
We denote by $\mathcal{H}_u$ the set of DLT-histories for the tree $S_u$. In the most general setting, following [23], these grammars contain both terminal symbols, corresponding to atomic elements of the histories (nodes), and non-terminal symbols, corresponding to combinatorial operators applied to sets of histories. We use the terminal $Z_u$ to encode a gene present in extant species $u$; moreover, we use $X_u$ for a gene lost at species $u$, $Y_u$ for a duplication at species $u$ and $W_u$ for an HGT with donor species $u$. We consider two combinatorial operators: the disjoint union $\cup$ and the Cartesian product $\times$.
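Theorem 3.1 The set $\mathcal{H}_{r(S)}$ defined by the grammar below specifies the set of all DLT-histories for a species tree $S$.
$$
\begin{aligned}
\mathcal{H}_u &= \mathcal{S}_u \cup \mathcal{D}_u \cup \mathcal{T}_u && \text{if } u \text{ is internal} && (1)\\
\mathcal{H}_u &= \{Z_u\} \cup \mathcal{D}_u \cup \mathcal{T}_u && \text{if } u \text{ is a leaf} && (2)\\
\mathcal{S}_u &= (\mathcal{H}_{u_\ell} \times \mathcal{H}_{u_r}) \cup (\mathcal{H}_{u_\ell} \times \{X_{u_r}\}) \cup (\{X_{u_\ell}\} \times \mathcal{H}_{u_r}) && \text{if } u \text{ is internal and binary} && (3)\\
\mathcal{S}_u &= \mathcal{H}_{u_c} && \text{if } u \text{ is internal and unary} && (4)\\
\mathcal{D}_u &= \{Y_u\} \times \mathcal{H}_u \times \mathcal{H}_u && && (5)\\
\mathcal{T}_u &= \bigcup_{v \in C(u)} \{W_u\} \times \mathcal{H}_u \times \mathcal{H}_v && && (6)
\end{aligned}
$$
where $C(u)$ is the set of nodes that are incomparable with $u$ in $S$. The set of DL-histories is specified by the same grammar where rule (6) is removed and the terms $\mathcal{T}_u$ are removed from rules (1) and (2).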
Proof The grammar follows the definition of histories, Definition 2.1. Rule (1) simply states that the root (i.e. the first evolutionary event of the history) of a DLT-history within the subtree $S_u$, assuming it is not reduced to a leaf, is either a speciation, a duplication or a transfer of the ancestral gene present in species $u$: the non-terminals $\mathcal{S}_u$, $\mathcal{D}_u$ and $\mathcal{T}_u$ represent respectively these three subsets of $\mathcal{H}_u$. Rule (2) addresses the case where $S_u$ is composed of a single leaf, in which case there cannot be a speciation event; the speciation term is replaced by the history reduced to a single extant gene in species $u$. Rule (3) describes a speciation event at a binary species $u$. The ancestral gene can either evolve into a gene in each of the two children of $u$ (first term of the union) or into a gene in a single child of $u$ due to a gene loss in the other child of $u$. Rule (4) covers the case where $u$ is unary (due to being a node created by the time slicing in a ranked $S$): the ancestral gene evolves into a copy in the unique child $u_c$ of $u$.
Rule (5) addresses the case of a duplication. It results in two ordered independent histories starting at species $u$: the first one being the history of the original copy of the starting ancestral gene and the second one the history rooted at the novel gene created by the duplication.
Last, Rule (6) addresses the case of histories starting with an HGT. Generally, an HGT has a structure similar to a duplication, except that the novel gene appears in a species that is incomparable with $u$.
These various rules cover all cases for describing the possible first event of a history and are mutually exclusive, thus providing a complete recursive specification of DLT-histories for a given species tree $S$. It follows immediately that removing the rule and non-terminals associated with HGT gives a grammar specifying DL-histories for $S$.
Remark 3.1 The above grammar can be greatly simplified if one is interested only in the number of histories of a given size, as opposed to the specific species where gene duplication, gene loss and HGT events occur and the precise gene content of extant species. In this case, one simply identifies all terminals $Z_u$ (resp. $X_u$, $Y_u$, $W_u$) with a single variable $Z$ (resp. $X$, $Y$, $W$). From now on, we follow this approach.

Counting and sampling algorithms
The grammar defined above can naturally be turned into a dynamic programming algorithm computing the number of histories of a given size. This algorithm computes tables $H$, $D$, $S$, $T$ where, for a given node $u$ of $S$ and a given history size $n$, $H[u, n]$ (respectively $D[u, n]$, $S[u, n]$ and $T[u, n]$) is the number of DLT-histories of size $n$ evolving within $S_u$ (respectively, starting with a duplication, a speciation, and an HGT). We illustrate this in the case of DLT-histories with an unranked species tree $S$.
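As a concrete illustration of how the grammar translates into counting recurrences, the following Python sketch counts DL-histories, and optionally DLT-histories, for an unranked species tree. It is a minimal sketch under stated assumptions rather than the authors' implementation: the nested-tuple encoding of the species tree, the function name `count_histories` and the single merged table `H` (instead of separate tables for speciations, duplications and transfers) are choices made for this example. It relies on the fact that every history has size at least 1, so a product term of total size $n$ only involves strictly smaller sizes.

```python
def count_histories(tree, n_max, transfers=False):
    """Count DL-histories (DLT-histories if transfers=True) of sizes
    1..n_max for an unranked species tree given as nested 2-tuples
    whose leaves are strings, e.g. (("a", "b"), "c")."""
    children = []  # children[i]: (left, right) for internal nodes, None for leaves

    def build(t):
        # Post-order numbering: children get smaller indices than their parent.
        if isinstance(t, tuple):
            pair = (build(t[0]), build(t[1]))
            children.append(pair)
        else:
            children.append(None)
        return len(children) - 1

    root = build(tree)
    k = len(children)

    # below[u] = u together with all of its descendants.
    below = [{u} for u in range(k)]
    for u in range(k):
        if children[u] is not None:
            left, right = children[u]
            below[u] |= below[left] | below[right]
    # C(u): nodes that are neither ancestors nor descendants of u.
    incomparable = [
        [v for v in range(k) if v not in below[u] and u not in below[v]]
        for u in range(k)
    ]

    H = [[0] * (n_max + 1) for _ in range(k)]

    def conv(a, b, n):
        # Ordered pairs (history for a, history for b) of total size n.
        return sum(H[a][m] * H[b][n - m] for m in range(1, n))

    for n in range(1, n_max + 1):
        for u in range(k):  # post-order: H[child][n] is known before H[u][n]
            if children[u] is None:
                total = 1 if n == 1 else 0  # a single extant gene
            else:
                left, right = children[u]
                # Speciation: both copies kept, or exactly one copy lost.
                total = conv(left, right, n) + H[left][n] + H[right][n]
            total += conv(u, u, n)  # duplication
            if transfers:
                total += sum(conv(u, v, n) for v in incomparable[u])  # HGT
            H[u][n] = total
    return H[root][1:]


cherry = ("a", "b")
print(count_histories(cherry, 3))                  # DL-model
print(count_histories(cherry, 3, transfers=True))  # DLT-model
```

For the two-leaf species tree this gives 2, 7, 34 histories of sizes 1, 2, 3 in the DL-model and 2, 9, 56 in the unranked DLT-model; for a single-leaf species tree the DL counts reduce to the Catalan numbers.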
Table 1: Leading terms for the time ($\Phi(n, k)$) and space ($\Psi(n, k)$) complexities incurred by the evaluation of the counting recurrences for histories consisting of $n$ genes in a species tree of size $k$.
A random generation algorithm can then be adapted from the counting recurrences, resulting in an instance of the so-called recursive method [48]. Right-hand sides of the counting equations are split into sums of multiplicative terms. Starting from the initial state $H[r(S), n]$, the algorithm randomly chooses a term from the right-hand side of the current state, with probability proportional to its contribution to the count. When the selected term is a product of two terms, the size $n$ needs to be distributed across the two terms, and a pair of sizes $(m, n - m)$ is chosen with probability proportional to the associated count. For efficiency, the various alternatives can be explored in Boustrophedon order, ensuring an overall $O(n \log(n))$ worst-case complexity [24]. Recursive calls are then performed over the states associated with the chosen term, until a leaf is chosen (term 1). This leads to the following result.
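The recursive method can be sketched on the simplest instance of the grammar: a species tree reduced to a single leaf in the DL-model, where a history is an extant gene or a duplication followed by two independent histories, so that histories of size $n$ are exactly the ordered binary trees with $n$ leaves. The Python code below is an illustrative sketch under these assumptions; the function names and the tuple encoding of histories are hypothetical, and the split sizes are scanned sequentially rather than in Boustrophedon order.

```python
import random


def duplication_counts(n_max):
    """b[n] = number of histories of size n made of duplications only
    (ordered binary trees with n leaves), computed by the recurrence
    b[1] = 1 and b[n] = sum over m of b[m] * b[n - m]."""
    b = [0] * (n_max + 1)
    if n_max >= 1:
        b[1] = 1
    for n in range(2, n_max + 1):
        b[n] = sum(b[m] * b[n - m] for m in range(1, n))
    return b


def sample_history(n, b, rng):
    """Draw a history of size n uniformly at random (recursive method)."""
    if n == 1:
        return "Extant"
    # Choose the split (m, n - m) with probability b[m] * b[n - m] / b[n].
    r = rng.randrange(b[n])
    for m in range(1, n):
        weight = b[m] * b[n - m]
        if r < weight:
            return ("D", sample_history(m, b, rng), sample_history(n - m, b, rng))
        r -= weight
    raise AssertionError("unreachable: the weights sum to b[n]")


def size(history):
    """Number of extant genes of a sampled history."""
    if history == "Extant":
        return 1
    return size(history[1]) + size(history[2])


rng = random.Random(0)
counts = duplication_counts(30)
history = sample_history(30, counts, rng)
print(size(history))
```

Uniformity follows from the counting table: each split is chosen with probability equal to the fraction of histories of size $n$ that realize it, and the two sub-histories are then drawn independently.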

Theorem 3.2
The number of histories of size $n$ constrained by a species tree of size $k$ can be computed in polynomial time $O(\Phi(n, k))$ and space $O(\Psi(n, k))$, where $\Phi(n, k)$ and $\Psi(n, k)$ both depend on the model (DL or DLT) and the ranked/unranked nature of the species tree, as summarized in Table 1.
The uniform random generation of h histories of size n can be performed in time

Asymptotic number of histories in the DL-model
The grammar given in Theorem 3.1 defines a combinatorial specification of the set of histories for a given species tree in a given evolutionary model. In this section, we derive the asymptotic number of histories in the DL-model and use it later on two specific species trees: the caterpillar and complete binary trees. The following theorem is the main result of this section and describes their asymptotic growth for $n$ tending to infinity.
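Theorem 3.3 For any given species tree $S$, the number of histories in the unranked DL-model given by Equations (1)-(5) is, for large $n$, equal to
$$\gamma_S \, \frac{\rho_S^{-n}}{n^{3/2}} \left(1 + O\left(\frac{1}{n}\right)\right) \tag{12}$$
for explicitly computable constants $\gamma_S > 0$ and $\rho_S \in (0, 1/4]$.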
In the remainder of this section we prove this theorem. The grammars are amenable to enumerative and analytic combinatorics techniques. We follow the general approach presented in Flajolet and Sedgewick [23] and Drmota [20]. It consists mainly in translating the combinatorial specification of a combinatorial family into equations defining its counting generating function. Then, its analytic properties lead to precise asymptotic formulas for its coefficients. We provide an overview of this approach in Example 3.1.
Example 3.1 Consider the class $\mathcal{B}$ of rooted binary trees. Such a tree is either a leaf, or it consists of a root with two children which are each the root of a binary tree. Let us mark each leaf with the variable $\mathcal{Z}$. Then, the grammar is given by
$$\mathcal{B} = \mathcal{Z} \cup (\mathcal{B} \times \mathcal{B}).$$
Let $b_n$ be the number of binary trees with $n$ leaves and let $B(z) = \sum_{n \geq 1} b_n z^n$ be the counting generating function of binary trees. The symbolic method [23, Part A] translates this grammar directly into an equation for the generating function:
$$B(z) = z + B(z)^2. \tag{13}$$
Its generating function is thus given by
$$B(z) = \frac{1 - \sqrt{1 - 4z}}{2}.$$
The general method of singularity analysis from analytic combinatorics [23, Chapter VI] allows us to directly get the asymptotics of the coefficients. First, by the Cauchy-Hadamard theorem, the asymptotic growth is directly connected with the dominant singularities (and the radius of convergence) of the counting generating function.
Here, the generating function $B(z)$ becomes singular at $z = 1/4$, which is also the unique singular point. Hence, the coefficients $b_n$ grow like $4^n$. Second, using transfer theorems of analytic combinatorics [23, Theorem VI.1 and Theorem VI.3] we also get the subexponential terms and recover the well-known result for Catalan numbers
$$b_n = \frac{1}{n} \binom{2n-2}{n-1} = \frac{4^{n-1}}{\sqrt{\pi}\, n^{3/2}} \left(1 + O\left(\frac{1}{n}\right)\right)$$
for $n \to \infty$.
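The asymptotic behavior of the number of binary trees can be checked numerically. The short Python sketch below compares the exact number of binary trees with $n$ leaves, given by the closed form $\frac{1}{n}\binom{2n-2}{n-1}$ of the Catalan numbers, with the leading term $4^{n-1}/(\sqrt{\pi}\, n^{3/2})$ obtained from singularity analysis. The function names are chosen for this illustration only.

```python
import math


def binary_trees(n):
    """Exact number of ordered rooted binary trees with n leaves."""
    return math.comb(2 * n - 2, n - 1) // n


def leading_term(n):
    """Leading asymptotic term 4^(n-1) / (sqrt(pi) * n^(3/2))."""
    return 4.0 ** (n - 1) / (math.sqrt(math.pi) * n ** 1.5)


for n in (10, 100, 200):
    # The ratio tends to 1 as n grows, with a relative error of order 1/n.
    print(n, binary_trees(n) / leading_term(n))
```

The ratio approaches 1 at a rate compatible with a relative error term of order $1/n$.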
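We will now describe this approach applied to the grammar specifying the DL-histories with an unranked species tree $S$. Let $h_{u,n}$ be the number of DL-histories of $S_u$ consisting of $n$ genes, each extant gene being marked in the generating function by the formal variable $z$. We define the counting generating functions
$$H_u(z) = \sum_{n \geq 1} h_{u,n} z^n.$$
The coefficients $h_{u,n}$ represent the number of histories of size $n$ associated with the species tree $S_u$, irrespective of the number of losses or duplications. These generating functions (one per species $u$ of $S$) are strongly related to the generating function of binary trees $B(z)$ introduced in Example 3.1.
Lemma 3.1 For a given species tree $S$ the counting generating function $H_{r(S)}(z)$ for histories in the unranked DL-model is defined by the system of functional equations
$$H_u(z) = \begin{cases} B(z) & \text{if } u \text{ is a leaf}, \\ B\big(H_{u_\ell}(z) + H_{u_r}(z) + H_{u_\ell}(z) H_{u_r}(z)\big) & \text{if } u \text{ is internal}, \end{cases} \tag{14}$$
over all nodes $u$ of $S$, where
$$B(z) = \frac{1 - \sqrt{1 - 4z}}{2}.$$
Proof The symbolic method [23, Part A] translates the unranked DL-grammar of Equations (1)-(5) directly into a system of equations for the generating functions. We get
$$H_u(z) = \begin{cases} z + H_u(z)^2 & \text{if } u \text{ is a leaf}, \\ H_{u_\ell}(z) H_{u_r}(z) + H_{u_\ell}(z) + H_{u_r}(z) + H_u(z)^2 & \text{if } u \text{ is internal}. \end{cases} \tag{15}$$
Comparing these equations with the one for binary trees from Equation (13), the claim follows.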
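As a small worked example of Lemma 3.1, consider the species tree with two leaves $v$ and $w$ attached to the root $u$ (this example tree is chosen here for illustration). Expanding the nested expression to third order gives the first coefficients of $H_u(z)$, using only $B(z) = z + z^2 + 2z^3 + O(z^4)$ and $B(z)^2 = B(z) - z$:

```latex
\begin{align*}
H_v(z) = H_w(z) &= B(z) = z + z^2 + 2z^3 + O(z^4), \\
H_v(z) + H_w(z) + H_v(z) H_w(z) &= 2B(z) + B(z)^2 = 2z + 3z^2 + 6z^3 + O(z^4), \\
H_u(z) &= B\big(2z + 3z^2 + 6z^3 + O(z^4)\big) \\
       &= (2z + 3z^2 + 6z^3) + (4z^2 + 12z^3) + 2 \cdot 8z^3 + O(z^4) \\
       &= 2z + 7z^2 + 34z^3 + O(z^4).
\end{align*}
```

So this species tree admits 2, 7 and 34 DL-histories of sizes 1, 2 and 3, respectively.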
The advantage of a generating function approach is that we are able to identify the subexponential growth as $n^{-3/2}$, and that we are able to explicitly compute the exponential growth $\rho_S^{-1}$ and the constant $\gamma_S$ for a fixed species tree $S$. We will compute the involved constants explicitly for the caterpillar tree in Section 4.1.1 and for the complete binary tree in Section 4.1.2.
By basic principles of analytic combinatorics, the asymptotic growth of a counting sequence is directly related to the radius of convergence of the corresponding generating function. In particular, its dominant singularity (i.e. the one closest to the origin) defines its asymptotic growth. By the construction in terms of nested radicals, the generating function $H_u(z)$ is singular if and only if at least one of its radicands becomes zero. Therefore, we make the structure of nested radicals visible. Writing the explicit form of the outermost $B(z)$ in (14) gives
$$H_u(z) = \frac{1 - \sqrt{R_u(z)}}{2}, \tag{16}$$
where $R_u(z)$ denotes the radicand of the outermost square root. Then, the radicands satisfy the following recurrence
$$R_u(z) = \begin{cases} 1 - 4z & \text{if } u \text{ is a leaf}, \\ 5 - \left(3 - \sqrt{R_{u_\ell}(z)}\right)\left(3 - \sqrt{R_{u_r}(z)}\right) & \text{if } u \text{ is internal}. \end{cases} \tag{17}$$
The recurrence can be used to determine the nature of the radii of convergence. For a node $u$ we define $\rho_u$ as the radius of convergence of $H_u(z)$.
Lemma 3.2 Let $u$ be the parent of $v$ in $S$. Then, $\rho_u < \rho_v$ and $\rho_u \in (0, 1/4]$ with $\rho_u = 1/4$ if $u$ is a leaf. Furthermore, $R_u(z)$ is the only radicand that vanishes at $z = \rho_u$ and $\rho_u$ is a simple root.
Proof By combinatorial construction $H_u(z)$ is built of nested radicals and does not include any poles. Therefore, its dominant singularity must be at a point where (at least) one of its radicands vanishes.
We continue by induction on the depth of the subtree $S_u$ rooted at $u$, where the depth is the length of the longest path from the root to any leaf. As a first step, we prove that $R_u(0) = 1$ and that $\rho_u \leq 1/4$. For a leaf $u$ it is clear from Relation (17) that $R_u(0) = 1$ and that $\rho_u = 1/4$.
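Next, let $v$ and $w$ be the children of $u$ such that $\rho_v \leq \rho_w$. By the induction hypothesis we directly get
$$R_u(0) = 5 - (3 - 1)(3 - 1) = 1 \quad \text{and} \quad R_u(\rho_v) = 5 - 3\left(3 - \sqrt{R_w(\rho_v)}\right) < 0.$$
In order to continue, note that $R_u(z)$ is monotonically decreasing on $[0, +\infty]$, because from the decomposition in (16) and (15) we see that
$$R_u(z) = 1 - \sum_{n \geq 1} a_n z^n \tag{18}$$
for certain non-negative numbers $a_n$.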
Thus, on the one hand, by the intermediate value theorem $R_u(z)$ must have at least one zero in the interval $(0, \rho_v)$. On the other hand, as $R_u(z)$ is monotonically decreasing it has at most one zero in $(0, \rho_v)$. Hence, this zero is equal to $\rho_u$. Finally, the above reasoning implies that among the nested radicals of $H_u(z)$ the outermost one is the first one that vanishes, and no other radical vanishes at the same time. Thus, $\rho_u$ is the radius of convergence of $H_u(z)$.
Moreover, by (18) we see that the derivative $R_u'(z)$ has non-positive coefficients, with $a_1 > 0$, so that $R_u'(\rho_u) < 0$. Hence, $\rho_u$ is a simple root.
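The proof of Lemma 3.2 suggests a direct numerical procedure for the unranked DL-model: since the outermost radicand equals 1 at the origin, is decreasing, and is negative at the smallest radius of the children, its unique zero can be located by bisection. The Python sketch below follows this idea, assuming the radicands satisfy $R_u(z) = 1 - 4z$ at a leaf and $R_u(z) = 5 - (3 - \sqrt{R_{u_\ell}(z)})(3 - \sqrt{R_{u_r}(z)})$ at an internal node, as obtained by substituting $H = (1 - \sqrt{R})/2$ into Lemma 3.1. The nested-tuple tree encoding and the function name are assumptions made for this illustration.

```python
import math


def radicand_and_radius(tree):
    """Return (R_u, rho_u) for the root u of a species tree given as
    nested 2-tuples with string leaves, in the unranked DL-model."""
    if not isinstance(tree, tuple):
        return (lambda z: 1.0 - 4.0 * z), 0.25

    R_left, rho_left = radicand_and_radius(tree[0])
    R_right, rho_right = radicand_and_radius(tree[1])

    def R(z):
        # Clamp at 0 to guard against tiny negative values from rounding.
        s_left = math.sqrt(max(R_left(z), 0.0))
        s_right = math.sqrt(max(R_right(z), 0.0))
        return 5.0 - (3.0 - s_left) * (3.0 - s_right)

    # R(0) = 1 > 0 and R is negative at the smaller child radius,
    # so bisection on this interval converges to the unique zero.
    lo, hi = 0.0, min(rho_left, rho_right)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if R(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return R, (lo + hi) / 2.0


_, rho_cherry = radicand_and_radius(("a", "b"))
_, rho_caterpillar = radicand_and_radius((("a", "b"), "c"))
print(rho_cherry, 1.0 / rho_cherry)            # radius and growth factor
print(rho_caterpillar, 1.0 / rho_caterpillar)
```

For the two-leaf species tree the zero can also be found by hand: solving $(3 - \sqrt{1 - 4z})^2 = 5$ gives $\rho = (6\sqrt{5} - 13)/4 \approx 0.1041$, i.e. an exponential growth factor of about 9.61.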
Let us briefly digress and discuss in a more general context how to numerically compute the exponential growth for the coefficients of the generating function with the fastest exponential growth that is defined by a system of functional equations involving generating functions $B_1, \ldots, B_k$ of the form
$$B_i(z) = \Phi_i\big(z, B_1(z), \ldots, B_k(z)\big), \qquad i = 1, \ldots, k,$$
where the $\Phi_i$ are polynomials with non-negative integer coefficients in $k + 1$ variables. Note that the grammar given in Theorem 3.1 is of this shape. In order to decide which of the $B_i$'s has this specific exponential growth, further information on the problem, like in our case given by Lemma 3.2, is needed. By Banach's fixed point theorem, these equations admit a unique solution vector $(B_1, \ldots, B_k) \in (\mathbb{C}[[z]])^k$ with respect to the formal topology [23, Section A.5]. Furthermore, each $B_i(z)$ has non-negative coefficients in its expansion around 0 (which is already clear from the combinatorial nature of the problem). Then, the multivariate version of the implicit function theorem implies that each of them has a non-zero radius of convergence which we call $\rho_i$. By Pringsheim's Theorem [23, Theorem IV.6], $\rho_i \in [0, +\infty]$ is a singularity of $B_i(z)$. Moreover, as $B_i(z)$ is an ordinary generating function of an infinite combinatorial class, we must have $\rho_i \in [0, 1]$. Finally, in order to compute the radius of convergence, we find the minimal point $z \in [0, 1]$ where the implicit function theorem fails. To be more precise, we numerically compute solutions $\rho \in [0, 1]$ and $b_1, \ldots, b_k \in [0, +\infty)$ of the following system
$$b_i = \Phi_i(\rho, b_1, \ldots, b_k), \quad i = 1, \ldots, k, \qquad \det\left(\delta_{i,j} - \frac{\partial \Phi_i}{\partial B_j}(\rho, b_1, \ldots, b_k)\right)_{1 \leq i,j \leq k} = 0,$$
where $\delta_{i,j}$ is the Kronecker symbol: $\delta_{i,i} = 1$, and $\delta_{i,j} = 0$ for $i \neq j$.

Remark 3.2
The unranked DL-grammars lead to the following specific shape . . .
We actually know by Lemma 3.2 that the outermost square root vanishes, which gives $b_k = B_k(\rho) = 1/2$. Additionally, we can also directly deduce from this system that $\rho_k \le \rho_{k-1}$.
In the unranked DLT-model the system looks like . . .
where the last equation is the only one involving $B_k$, as the root cannot be a receiver of an HGT. Note that the subsystem of the first $k - 1$ equations is strongly connected but still does not satisfy the a-properness condition (i.e. it is not a contraction in the formal topology) of the Drmota-Lalley-Woods Theorem [23, Theorem VII.6], which would directly imply a square-root singularity. Thus, we conjecture that the dominant singularity still comes solely from the outermost square root of $B_k$, implying $b_k = 1/2$.
In the ranked DLT-model we are dealing with blocks of strongly connected components that correspond to the time slices. Note that the root is contained in a singleton time slice. Experiments suggest the same behavior as in the previous cases. However, one thing is certain in all models: we always have $\rho_{r(S)} \le \rho_u$ for all other subtrees with root $u$ of the species tree. Hence, there will always be a dominant minimal singularity in $[0, 1]$ that can be (numerically) computed. Note, however, that the determinant computation soon becomes extremely heavy.
After determining the radius of convergence, we must determine the number of singularities on it. As shown in the case of λ-terms in [8, Lemma 8], there can only be one dominant singularity $\rho_u$. Let us quickly repeat this argument here. Assume that there exists a root $z_0 = \rho_u e^{i\theta}$ of the same modulus. Substituting this value into $R_u(z)$ from (18) gives an identity which can only hold if $e^{in\theta} = 1$ whenever $a_n \neq 0$. Now, due to $a_1 \neq 0$ we have $z_0 = \rho_u$. Hence, $\rho_u$ is the unique dominant real singularity of $H_u(z)$.
Combining the previous results, we have shown for a family of constants $\gamma_{u,i}$ the following local singular expansion . . . The fact that $R_u(z)$ has a simple root at $z = \rho_u$ shows that $\gamma_{u,0} > 0$. Then, by transfer theorems of analytic combinatorics [23, Theorem VI.1 and Theorem VI.3], we get the claimed asymptotic expansion of Equation (12), where $\gamma_T = \frac{\gamma_{u,0}}{2\sqrt{\pi}} > 0$, and this ends the proof of Theorem 3.3.
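The effect of the transfer theorems can be checked numerically on the simplest square-root singularity. The sketch below is an illustration under our own assumptions: it takes $H(z) = (1 - \sqrt{1 - 4z})/2$, so that $\rho = 1/4$ and the coefficient of the square-root term is $1/2$, and compares the exact coefficients (shifted Catalan numbers) with the leading term $\frac{\gamma_0}{2\sqrt{\pi}}\, \rho^{-n} n^{-3/2}$. The function names are hypothetical.

```python
import math


def catalan_shifted(n):
    """[z^n] of (1 - sqrt(1 - 4z)) / 2, i.e. the Catalan number C_{n-1}, for n >= 1."""
    return math.comb(2 * n - 2, n - 1) // n


def transfer_estimate(n, gamma0=0.5, rho=0.25):
    """Leading term gamma0 / (2 sqrt(pi)) * rho^(-n) * n^(-3/2) of the coefficient asymptotics."""
    return gamma0 / (2.0 * math.sqrt(math.pi)) * rho ** (-n) * n ** (-1.5)


# Ratio exact / estimate; it should approach 1 as n grows.
ratios = {n: catalan_shifted(n) / transfer_estimate(n) for n in (10, 100, 400)}
```

The ratios approach 1 with an error of order $1/n$, which is the behavior expected from a local expansion in powers of $\sqrt{1 - z/\rho}$.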

Remark 3.3
There are several possible extensions of the previous approach. First of all, it is straightforward to extend it to the ranked DL-model. In that case one only needs to incorporate unary nodes arising from the time slices. Second, an extension to the DLT-model is also possible, yet the computations are more involved as the binary tree structure leading to Lemma 3.1 does not hold anymore. However, it can still be modeled with colored binary trees, where the number of colors depends on the size of the set of incomparable nodes (in the current time slice). Third, it is also possible to consider the distribution of certain parameters, such as the number of gene losses or the number of gene duplications; see e.g. [4,10,25] for related results on lattice paths and trees. Using multivariate generating functions and marking each such event by an additional variable, as in the general grammar of Theorem 3.1, the above results for the DL-model directly generalize to the respective ones on multivariate generating functions. All these generalizations are interesting future research directions.
The counting and sampling algorithms described above have been implemented in Python, and are available at https://github.com/cchauve/DLTcount.
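To give a concrete idea of what counting by dynamic programming looks like in the unranked DL-model, here is a minimal sketch. It is not the code of the repository above. It assumes the functional equations $H_u = z + H_u^2$ at a leaf and $H_u = H_u^2 + H_v H_w + H_v + H_w$ at an internal node $u$ with children $v$ and $w$ (duplication, speciation keeping both copies, or speciation followed by one loss), which are not reproduced in this section. The nested-tuple encoding and the name `count_histories` are hypothetical.

```python
LEAF = None  # hypothetical encoding: a leaf is None, an internal node is a pair (left, right)


def count_histories(tree, max_size):
    """Return [h_0, ..., h_max_size], where h_n counts histories with n extant genes.

    Assumed equations (labelled as an assumption, not taken from the text):
      leaf:     H = z + H^2
      internal: H = H^2 + Hv*Hw + Hv + Hw
    """
    if tree is LEAF:
        a = [0] * (max_size + 1)
        if max_size >= 1:
            a[1] = 1  # a single extant gene
    else:
        hv = count_histories(tree[0], max_size)
        hw = count_histories(tree[1], max_size)
        # a(z) = Hv*Hw + Hv + Hw: speciation keeping both copies, or losing one of them
        a = [
            hv[n] + hw[n] + sum(hv[i] * hw[n - i] for i in range(n + 1))
            for n in range(max_size + 1)
        ]
    h = [0] * (max_size + 1)
    for n in range(1, max_size + 1):
        # H = a + H^2: either the event counted in a, or a duplication splitting n = i + (n - i)
        h[n] = a[n] + sum(h[i] * h[n - i] for i in range(1, n))
    return h


leaf_counts = count_histories(LEAF, 5)
cherry_counts = count_histories((LEAF, LEAF), 4)
```

For a single species the counts are the Catalan numbers shifted by one. Isomorphic subtrees are recomputed in this sketch; caching the result per subtree shape would avoid the repeated work.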

Results
Over the next two sections, we will apply Theorem 3.3 to the special cases of the caterpillar and complete species tree in the unranked DL-model, and explicitly determine the constants involved in the asymptotic expansion.
Then, we apply our dynamic programming counting and sampling algorithms to study properties of random evolutionary histories.

Asymptotic expansion for extremal species trees in the DL-model
Our experimental results (Section 4.2) suggest that for a given k, the species trees having the largest (resp. smallest) number of DL-histories are the caterpillar tree (resp. the balanced binary tree) (Conjecture 4.1), defined below. In the present section, our main results are the explicit computation of the asymptotic growth and the leading constant of Theorem 3.3 for the caterpillar species tree (Propositions 4.1 and 4.2) and for the complete binary species tree, the special case of balanced trees when k is a power of 2 (Propositions 4.3 and 4.4, see also Table 2). The rooted caterpillar tree $CT_k$ can be defined as follows: $CT_1$ is the tree reduced to a single leaf, while $CT_k$ ($k \ge 2$) is the tree formed by a left subtree equal to $CT_{k-1}$ and a right subtree equal to $CT_1$. Observe that every subtree of a caterpillar tree is itself a caterpillar tree, see Figure 4.
The complete binary tree $CB_h$ with $k = 2^h$ leaves can be defined as follows: $CB_0$ is the tree reduced to a single leaf, while $CB_h$ ($h \ge 1$) is the tree formed by a left and a right subtree both equal to $CB_{h-1}$. Observe again that every subtree is itself a complete binary tree, see Figure 4. The complete binary tree is a special case of the class of balanced trees, defined as trees where, for each node, the number of leaves in the left subtree differs from the number of leaves in the right subtree by at most one. Complete binary trees are the only balanced trees in which the number of leaves is a power of two.
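The two recursive definitions translate directly into code. The following sketch builds $CT_k$ and $CB_h$ and counts their distinct subtree shapes, a quantity that the discussion of grammar sizes relies on. The nested-tuple encoding, the function names, and the choice to compare subtrees as unordered shapes are assumptions made for this illustration.

```python
LEAF = "leaf"  # hypothetical encoding: a leaf is a string, an internal node is a pair


def caterpillar(k):
    """CT_1 is a leaf; CT_k has left subtree CT_{k-1} and right subtree CT_1."""
    tree = LEAF
    for _ in range(k - 1):
        tree = (tree, LEAF)
    return tree


def complete_binary(h):
    """CB_0 is a leaf; CB_h has two subtrees equal to CB_{h-1}."""
    tree = LEAF
    for _ in range(h):
        tree = (tree, tree)
    return tree


def count_leaves(tree):
    return 1 if tree == LEAF else count_leaves(tree[0]) + count_leaves(tree[1])


def distinct_subtrees(tree):
    """Number of distinct subtree shapes, leaves included, ignoring left/right order."""
    shapes = set()

    def canon(t):
        if t == LEAF:
            c = LEAF
        else:
            a, b = canon(t[0]), canon(t[1])
            # Sort the two children by their textual form so mirror images coincide.
            c = (a, b) if repr(a) <= repr(b) else (b, a)
        shapes.add(c)
        return c

    canon(tree)
    return len(shapes)
```

With this encoding, `distinct_subtrees(caterpillar(k))` equals k, while `distinct_subtrees(complete_binary(h))` equals h + 1, which illustrates the two extremes for trees with $k = 2^h$ leaves.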
We can observe that the number of DL-histories grows much faster for the caterpillar tree than for the complete binary tree. This is actually unsurprising given that the number of DL-histories can be linked to the size of the grammar, which itself depends on the structure of the species tree. More precisely, the size of the grammar depends on the number of unique subtrees of the considered species tree S. Each such subtree may be identified by its root u and corresponds to one set of rules (1)-(6), while subtrees having the same topology lead to isomorphic subgrammars with the same counting generating functions. The caterpillar (resp. complete binary) tree has the largest (resp. smallest) number of unique subtrees within the set of species trees of the same size (when k is a power of 2 for the complete binary tree), compare also Table 2.
For the caterpillar $CT_k$, the general grammar of DL-histories, where extant genes are marked by a single terminal Z, is the following: . . . Let $f_{k,n}$ be the number of DL-histories of the caterpillar $CT_k$ consisting of n genes. The corresponding counting generating function is given by $F_k(z) = \sum_{n \ge 0} f_{k,n} z^n$ and, by Lemma 3.1, it is defined by the functional equation . . . In Table 3 we computed the first few initial terms for $k = 1, \ldots, 5$. Note that none but the first one was found in the OEIS [41] before we added them. Applying Theorem 3.3, the asymptotic expansion of the coefficients for $n \to \infty$ is . . . for some constants $\alpha_k > 0$ and $\lambda_k > 0$ that are made explicit below.
Proposition 4.1 Let $X_k$ be the minimal positive real solution of the fixed point equation . . . Then, the dominant singularity of $F_k(z)$ can be found at . . .
Proof We need to analyze the nested radicals of $F_k(z)$ in more detail. Therefore, as done in Equation (16) for the general case, we define the decomposition . . .
Thus, we directly get the specialized version of the recurrence for the radicands from Equation (17) by . . . The dominant singularity $\lambda_k$ is given by the minimal positive root of $P_k(z)$. This already proves the case k = 1. We introduce the shorthand $X = \sqrt{1 - 4z}$ and use it from now on as our new variable. This directly gives . . . Hence, this equation is zero if and only if . . . For k = 2 this proves the claim as $\sqrt{P_1(X)} = X$. Now we proceed by induction. Squaring this equation and substituting the known expression for $P_{k-1}(X)$ gives . . . Repeating this process proves the claim.
Proposition 4.2 Using the notation of Proposition 4.1, the constant $\alpha_k$ is equal to . . . In particular, $\alpha_k > 0$.
Proof We will prove that $P_k(X)$ admits the following expansion in a neighborhood of $X_k$: . . . where the derivative is with respect to X. Note that this derivative exists, as . . . and we know from Lemma 3.2 that $X_{k-1} < X_k$.
Next, recall the shorthand $X = \sqrt{1 - 4z}$ and that by the chain rule $\partial_z P_k(X) = \partial_X P_k(X)\, \partial_z X$. Then, the transfer theorems of analytic combinatorics [23] directly show that the n-th coefficient of $F_k(z)$ satisfies the form (12) with $\alpha_k = \sqrt{\lambda_k P_k'(X_k) / (8 \pi X_k)}$. Therefore, it remains to find an expression for $P_k'(X_k)$.
Let us take the derivative of Equation (25). We get . . . In the proof of Proposition 4.1 we have seen that P i (X k ) = s k−i+1 (X k ). Iterating this equation until $P_1'(X) = 2X$ shows the claim. Finally, the positivity of the constant holds as all terms are positive.
With these formulas it is easy to compute explicit values for the constant $\alpha_k$ and the asymptotic growth factor $\lambda_k^{-1}$. We show the first few values in Table 2.
Let $H_{CB_h}$ be the set of DL-histories associated with the complete binary tree $CB_h$. Then, the respective grammar, considering again only terminals Z marking extant genes, is the following: . . . Let $g_{h,n}$ be the number of histories over the complete binary tree $CB_h$ consisting of n genes represented by z. As before, we analyze the counting generating function, which is given by $G_h(z) = \sum_{n \ge 0} g_{h,n} z^n$ and, by Lemma 3.1, it is defined by the functional equation . . . As before, we computed the first few initial terms in Table 4. Again, none but the first one was found in the OEIS [41] before we added them. Applying Theorem 3.3 gives the asymptotic expansion of the coefficients for $n \to \infty$ as . . . where $\beta_h > 0$ and $\mu_h > 0$ are constants computed as follows.
4 , where . . . Furthermore, $q_h$ and $\mu_h$ are algebraic numbers of degree $2^h$.
Proof As for the caterpillar tree, we need to analyze the nested radicals. To make this structure visible, we again define . . . Then, the radicands satisfy the following recurrence . . . When comparing it with the recurrence of radicands for the caterpillar grammar in (24) we notice a major difference: the coefficients are independent of z.
Then, the reasoning follows the same lines as the proof of Proposition 4.1. Yet, due to the independence of the coefficients of z, the induction yields an explicit expression. Note that . . . In a similar way we are also able to compute the constant $\beta_h$ explicitly.
Proposition 4.4 Using the notation of Proposition 4.3, the constant $\beta_h$ is equal to . . .
Proof By Equation (30) the singularity of $G_h(z)$ is determined by the smallest root $\mu_h$ of $Q_h(z)$. The constant is determined by the expansion for $z \to \mu_h$: . . . By the recursive definition, . . . Iterating this relation and applying . . . As before, we computed the first few explicit values for the constant $\beta_h$ and the asymptotic growth factor $\mu_h^{-1}$, where h is a power of 2, and show them in Table 2.

Empirical investigations and open questions
In this section we present empirical results and observations derived using the counting and sampling algorithms described in Section 3.2.These results provide the first detailed view, especially in the DL-model, of the general question: in how many ways can n genes have evolved from a single ancestral gene, for a given species tree?
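To illustrate how counting tables can drive uniform sampling, the sketch below applies the classical recursive method to the simplest case, a species tree reduced to a single leaf, where a history with n extant genes is an ordered binary tree with n leaves whose internal nodes are duplications. It is an illustration under our own assumptions (the equation $H = z + H^2$, the tuple encoding, and the function names are hypothetical) and not the authors' sampler.

```python
import random


def catalan_counts(max_size):
    """c[n] = number of ordered binary trees with n leaves (c[1] = 1, c[n] = sum c[i] c[n-i])."""
    c = [0] * (max_size + 1)
    if max_size >= 1:
        c[1] = 1
    for n in range(2, max_size + 1):
        c[n] = sum(c[i] * c[n - i] for i in range(1, n))
    return c


def sample_history(n, counts, rng):
    """Draw a history with n extant genes uniformly at random.

    'Z' is an extant gene; a pair (left, right) is a duplication.
    The split n = i + (n - i) is chosen with probability c[i] c[n-i] / c[n].
    """
    if n == 1:
        return "Z"
    r = rng.randrange(counts[n])
    for i in range(1, n):
        block = counts[i] * counts[n - i]
        if r < block:
            return (sample_history(i, counts, rng), sample_history(n - i, counts, rng))
        r -= block
    raise AssertionError("unreachable: the counts table is inconsistent")


def size(history):
    return 1 if history == "Z" else size(history[0]) + size(history[1])


counts = catalan_counts(10)
example = sample_history(10, counts, random.Random(0))
```

The same principle extends to richer grammars: each production is selected with probability proportional to the number of objects it generates, and sizes are split accordingly.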

Counting histories for random species trees
We are first interested in computing the number of histories in a given evolutionary model. We considered the following models: DL-histories with an unranked or ranked species tree (called respectively models uDL and rDL from now on), DLT-histories with an unranked species tree or a ranked species tree (called respectively models uDLT and rDLT from now on).
For a given evolutionary model and species tree S of size k, let h S (n) be the number of histories of size n.
As shown in Equation (12) for the uDL-model, this number grows asymptotically with n as follows . . . where $\gamma_S$ and $\rho_S$ both depend only on S. From now on, we denote by $\alpha_S = \rho_S^{-1}$ the exponential growth factor for the number $h_S(n)$. In the uDL-model, as discussed in Section 3.3, we can compute the growth factor precisely from the grammar specifying the DL-histories for the given species tree S. For other models, we can estimate $\alpha_S$ from the number $h_S(n)$ of histories of size n as follows: . . . the precision of this estimate naturally increasing with n.
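The estimation formula itself is not reproduced here, so the following sketch only shows two natural candidates; neither is claimed to be formula (32). Both assume that $h_S(n)$ behaves like $\gamma\, n^{-3/2} \alpha^n$. The first is the plain ratio of consecutive counts; the second divides out the $n^{-3/2}$ factor. The test sequence (shifted Catalan numbers, for which $\alpha = 4$) and the function names are our own choices.

```python
import math


def ratio_estimate(h, n):
    """Plain ratio h(n) / h(n-1); it converges to alpha, but only at rate O(1/n)."""
    return h[n] / h[n - 1]


def corrected_ratio_estimate(h, n):
    """Ratio with the polynomial factor removed.

    If h(n) ~ gamma * n^(-3/2) * alpha^n, then
    h(n) / h(n-1) ~ alpha * ((n-1)/n)^(3/2), hence the correction below.
    """
    return h[n] / h[n - 1] * (n / (n - 1)) ** 1.5


# Test sequence: h(n) = Catalan(n-1), whose exponential growth factor is 4.
h = [0] + [math.comb(2 * n - 2, n - 1) // n for n in range(1, 51)]
plain = ratio_estimate(h, 50)
corrected = corrected_ratio_estimate(h, 50)
```

On this test sequence the plain ratio at n = 50 is still about 3% below the true value, while the corrected ratio is within 0.02% of it, which shows why the subexponential factor matters at moderate sizes.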
DL-models. We considered species trees of size ranging from k = 3 to k = 25 and, for each species tree size k, we generated 98 random species trees of size k under the uniform distribution, using the RANRUT algorithm described in [33], and we completed this set of species trees by adding the caterpillar species tree with k leaves and the balanced tree with k leaves; so for small values of k, the same species tree can occur several times in the sample of 100 trees. When working in the rDLT-model, we generated, for each species tree, 10 random rankings under the uniform distribution, using the algorithm described in [8]. Then, for each instance, we computed the number of histories of size n = 50 in the models uDL, uDLT and rDLT and used these numbers to estimate the growth factor using (32).
Figure 5 shows the exponential growth factor in the uDL-model obtained using the exact approach described in Section 3.3 and the ratio between this exact growth factor and the growth factor estimated using the experimental approach described above.A first observation from Figure 5 is that estimating the growth factor from the number of histories of size n = 50 approximates well the exact growth factor in the uDL-model; we believe it is also the case in the other models (data not shown).
Moreover, following up on the results shown in Table 2, our experiments lead to the following conjecture, characterizing the species trees leading to extreme growth factors for a given value of k.
Conjecture 4.1 For a given k, and n large enough, the unranked species tree of size k having the largest number of DL-histories of size n is the caterpillar tree; moreover, the exponential growth factor of the number of histories for a caterpillar of size k grows superlinearly as a function of k. Species trees having the smallest number of DL-histories are balanced species trees of size k, and the exponential growth factor of the number of histories for a balanced tree of size k grows linearly as a function of k.
We verified that the conjecture is true for all values of k in our experiments. We investigated several proof ideas, in particular linking the exponential growth factor to the number of unique subtrees in a species tree. Indeed, this is a feature for which caterpillar and balanced trees reach extreme values for a given value of k; actually the caterpillar is the unique tree with the maximum number of subtrees, while balanced trees have the minimum number of subtrees, although if k is not a power of 2, some unbalanced trees can have the same number of subtrees as balanced ones. We did find examples of pairs of species trees for which the one with the larger (resp. smaller) number of unique subtrees has a smaller (resp. larger) exponential growth factor. There are also species trees with the same number of unique subtrees as balanced trees of the same size that show a larger exponential growth rate. So the number of unique subtrees is not the determinant leading to an extreme growth factor. We observed similar examples when considering the height of the species tree, another feature for which caterpillar and balanced trees attain extreme values. More generally, the question of understanding which features of species trees of the same size make one tree have more DL-histories than another is open.
DLT-models. Next, we consider models including HGT; in Figure 6 we show the estimated growth constants in the uDLT- and rDLT-models.
An observation that addresses one of the main questions motivating our work is that the number of histories in models involving HGT grows much faster than in models excluding HGT; this is apparent by comparing the growth factors in the uDL and uDLT models, but even more through Figure 7, which shows the ratio of the number of DLT-histories over the number of DL-histories for selected pairs (k, n), considered over all randomly chosen ranked or unranked species trees. We can observe that the ratios grow as large as $10^{40}$ in the unranked model and $10^{29}$ in the ranked model for histories of size 50 over a species tree of size 25, which correspond to parameters of realistic phylogenomics datasets. It is nevertheless interesting to observe that considering ranked species trees significantly tames the magnitude of the search space explosion when introducing HGT in a model.
Finally, we can observe that in the rDLT-model, the growth factor seems to be almost independent of the topology of the chosen species tree and ranking (Figure 6 (Bottom)). Intuitively, this can be explained by the fact that a ranked species tree can almost be seen as a sequence of time slices, each composed of a set of branches (from 1 branch for the time slice containing the root of S to k branches for the time slice containing all leaves), with exactly one ending with a speciation node while all others end with a unary node. Within each time slice, the genes can evolve freely by duplication and HGT, where a duplication can be seen as equivalent to an HGT within the same branch. Thus, the number of histories is dominated by the number of evolutionary events taking place in each time slice, with some variability being introduced by the number of genes leaving a time slice right after the only speciation node it contains, which can create extra gene copies entering the next time slice. In order to understand this phenomenon, we investigated a reduced evolutionary model, in which every speciation is followed by a random loss, i.e. does not create an extra gene copy entering the next time slice; we name this model the rDT-SL-model, where SL stands for Speciation-Loss. In this model, we are able to prove the independence of the chosen species trees.
Theorem 4.1 In the rDT-SL-model, the number of histories of size n is the same for every ranked species tree of size k.
Proof Let a ranked species tree of size k be given, and consider the unary-binary tree induced by its time slices. We then transform this tree into a directed graph called the events graph describing the possible events of duplication, HGT, and speciation in the following way:
1. Label the leaves from 1 to k.
2. Label each internal node with a set containing the labels of the leaves of its induced subtree. These labels are the possible leaves reachable by speciation.
3. Encode speciation events by super edges called speciation edges, which consist of the one (unary) or two (binary) edges leading to the children of a node. By doing so, the two edges are treated as a single edge.
4. Encode duplication events by adding loops called duplication edges to each node.
5. Encode HGT events by adding edges called transfer edges from each node to each other node within the same time slice.
An example of this transformation is shown in Figure 8. Let us briefly state some properties of the events graph. The labels of the nodes of each time slice form a set partition of {1, . . ., k} by construction. Due to the rankings, each time slice contains one node more than the previous one and every path from the root to the previous leaves contains k − 1 speciation edges.
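The five construction steps can be sketched in a few lines. The code below is an illustration, not the authors' implementation; it assumes that a ranked species tree is given as its list of time slices, each slice being the list of leaf-label sets of its nodes (so slice t, counted from 1, has t nodes). This input format and the function name `events_graph` are hypothetical.

```python
def events_graph(slices):
    """Build the events graph from time slices listed from the root slice to the leaf slice.

    slices[t] lists the leaf-label sets of the nodes in time slice t; a node is the
    pair (t, frozenset of labels). Returns nodes, duplication loops, transfer edges,
    and speciation super edges (node -> tuple of its one or two children).
    """
    nodes = [(t, frozenset(block)) for t, blocks in enumerate(slices) for block in blocks]
    # Step 4: one duplication loop per node.
    duplication = [(x, x) for x in nodes]
    # Step 5: transfer edges between distinct nodes of the same time slice.
    transfer = [(x, y) for x in nodes for y in nodes if x[0] == y[0] and x != y]
    # Step 3: a single super edge per node, grouping the edges to its children,
    # i.e. the nodes of the next slice whose label set is contained in its own.
    speciation = {}
    for t, label in nodes:
        if t + 1 < len(slices):
            speciation[(t, label)] = tuple(
                (t + 1, frozenset(b)) for b in slices[t + 1] if frozenset(b) <= label
            )
    return nodes, duplication, transfer, speciation


# A ranked tree on 3 leaves: the root splits into {1, 2} and {3}, then {1, 2} splits.
example_slices = [[{1, 2, 3}], [{1, 2}, {3}], [{1}, {2}, {3}]]
nodes, duplication, transfer, speciation = events_graph(example_slices)
```

In this example there are 1 + 2 + 3 = 6 nodes, and the node labeled {3} in the second slice has a single child, which corresponds to a unary node of the unary-binary tree induced by the time slices.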
The main idea of the proof is that we can encode a history H for a species tree S of size k by an ordered unary-binary tree $H_e$ whose nodes are labeled by nodes of the events graph, and that encodes H unambiguously, and then show that in the rDT-SL-model, given the events graph $E'$ of another ranked species tree $S'$ of the same size, we can transform $H_e$ into an ordered unary-binary tree $H'_e$ whose nodes are labeled by nodes of $E'$ and that encodes a unique history for $S'$. This establishes a one-to-one correspondence between the sets of histories for two arbitrary ranked species trees of size k, S and $S'$, and thus proves the stated result.
The principle of the encoding is to associate each internal node of a history with a (deterministic) label which is a node of the events graph. Let E be the events graph of S. The encoding works as follows: for a node x of a history H for species tree S, if t is the time slice it belongs to and i its left-most leaf (defined in a depth-first traversal of the ordered tree representing the history), then we label x by the unique node of E in the time slice t that contains i. Extant leaves stay labeled by their extant species. After deleting leaves corresponding to gene losses from the history, speciation-loss nodes become unary, while duplication and HGT nodes stay binary. Call $H_e$ the ordered unary-binary tree for history H. The original history H can be unambiguously recovered from $H_e$ and E, by reinserting these losses and removing the labels, as any edge of $H_e$ corresponds to an edge of E, and so defines an evolutionary event.
Next, let $S'$ be another ranked species tree of the same size k as S and $E'$ its events graph. We transform $H_e$ into $H'_e$ as follows: for every node x, whose left-most leaf is u and that belongs to time slice t, replace its label by the unique node of time slice t of $E'$ that contains u. This is always possible, as, by construction of the events graph in models with HGT, any leaf is reachable from any node. We claim that $H'_e$ unambiguously defines a history for $S'$. The key argument to prove this claim is that, by the way we constructed $E'$ and $H'_e$, for any edge in $H'_e$ the labels of its two nodes, which are either in the same time slice or in consecutive time slices, are adjacent in $E'$: if both nodes are in the same time slice, then by construction of $E'$ they are either the same node (so linked by a duplication edge) or linked by a transfer edge, while if they are in consecutive time slices, they contain a common species and so are linked by a speciation edge. It follows that $H'_e$ encodes a history $H'$ for $S'$. The construction from H to $H'$ is deterministic and reversible, which provides a one-to-one correspondence between the histories of S and the histories of $S'$ in the rDT-SL-model.
Note that this construction does not work in models with no duplication, no HGT, or unrestricted speciation, as the key argument that any edge in $H'_e$ can be found in $E'$ does not hold anymore, which prevents transforming $H'_e$ into a history for $S'$.
Remark 4.1 From the previous proof we can also deduce an iterative tree growing algorithm for the histories, offering an alternative explanation for Theorem 4.1. Every internal node gets a label that is a pair consisting of its time slice and the number of its left-most leaf. Note that this uniquely identifies a node in the species tree.
We start with a root node labeled by the first time slice and an arbitrary number from {1, . . ., k}. At every step, choose a leaf of the current history and consider the corresponding node in the events graph. Then traverse one of its edges and perform the action of this edge: If it is a speciation edge, then add a new node with a label consisting of the successive time slice and the same number as only child. If it is a duplication or transfer edge, then add a left child with the same label as the root and a right child labeled with the current time slice and an arbitrary number from the set the edge is pointing to. Once all leaves correspond to extant nodes, the tree is a valid history.
Fig. 8: Transformation of a ranked species tree into an events graph (right) in the rDT-SL-model used in the proof of Theorem 4.1.

Remark 4.2
The construction of the events graph in Theorem 4.1 can be adapted to all models.If there are no duplication events, the duplication edges are removed; if there are no HGT events, the transfer edges are removed.
The characteristics of the SL dynamics are not encoded in the events graph but in the bijection or the history growing algorithm.

On the parsimony and profile of random histories.
We also looked at the distribution of the evolutionary score for randomly sampled histories, where the score of a history is the sum of the number of duplications, losses and HGT, for k = 16 and n = 30, over 50 random unranked species trees, sampling 10,000 random histories for each species tree.
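The score defined above is straightforward to compute once a history is available as a tree of events. The sketch below assumes a hypothetical encoding in which an internal node is a tuple whose first entry is the event type ("D" for duplication, "T" for HGT, "S" for speciation) followed by its children, and a leaf is either "Z" (extant gene) or "L" (loss). This encoding and the function names are ours, chosen for illustration.

```python
from collections import Counter


def event_counts(history):
    """Count node types in a history given as nested tuples (iterative traversal)."""
    counts = Counter()
    stack = [history]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            counts[node] += 1        # leaf: "Z" (extant gene) or "L" (loss)
        else:
            counts[node[0]] += 1     # internal node: "D", "T" or "S"
            stack.extend(node[1:])   # push the children
    return counts


def score(history):
    """Evolutionary score: duplications + losses + HGT (speciations are not counted)."""
    c = event_counts(history)
    return c["D"] + c["L"] + c["T"]


# One duplication at the root; one copy undergoes a speciation followed by a loss.
example = ("D", ("S", "Z", "L"), ("S", "Z", "Z"))
example_counts = event_counts(example)
example_score = score(example)
```

Applying `score` to every sampled history and aggregating the results per species tree gives the kind of empirical distribution discussed in this section.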
Figure 9 below suggests that the space of histories for a given species tree is dominated by histories with a relatively high score and that, as expected, for a given species tree including HGT in the evolutionary model leads to a significant decrease of the evolutionary score of histories.
In fact, when looking at the distribution of the number of duplications in the uDLT-model (results not shown), we observed that the duplication number drops significantly in the uDLT-model compared to the uDL-model. We can also note that, when comparing the score of histories in the uDL-model and the number of duplications, most of the score is due to gene losses (Figure 10), a characteristic we also see in the uDLT-model, where the number of duplications (resp. HGT) rarely exceeds 5 (resp. 25) in the sampled histories.

Conclusion and perspectives
Our work introduces the first results on counting and sampling evolutionary scenarios in models accounting for gene duplication, gene loss and HGT.The originality of our work, compared to previous work in the reconciliation framework, is that we only consider the species tree to be given, and thus consider all possible evolutionary histories of a given size, i.e. leading to a given number of genes.Our results include formal grammars describing this combinatorial space, together with counting and sampling algorithms, obtained using either dynamic programming or enumerative and analytic combinatorics methods.These results complement a growing body of work developed over the last few years in the case of matching gene and species trees.
Using our method, we were able to obtain precise asymptotics on the number of histories for the two specific species trees, the rooted caterpillar and the complete binary tree in the unranked DL-model, although our method also applies to any given species tree in this model. Our counting and sampling algorithms allowed us to complement these results for other models, especially models accounting for HGT. Our experimental results provide a first global view of the space of potential evolutionary histories for a given species tree. They confirm the expected fact that introducing HGT in a model results in a dramatic increase of the space of possible histories; they also lead to the interesting observation that in the ranked DLT-model, the total number of histories is asymptotically almost independent of the given species tree.
Fig. 9: Distribution of the score (number of duplications plus losses plus HGT) over 50 random species trees of size 16 and 10,000 random histories of size 30 per tree in the uDL- and uDLT-models.
Our work suggests several avenues for further research. First, our notion of evolutionary history assumes that gene trees are ordered, i.e. that gene copies created by a gene duplication are distinguishable; this differs from the notion of reconciled gene trees, where duplicated copies are not distinguishable. While our assumption follows naturally from an evolutionary biology point of view, it would be interesting to see if our approach could be applied to count and sample reconciliations instead of histories. Next, the last few years have seen the development of more comprehensive models of gene family evolution, accounting for example for genes appearing at a given species by an HGT from an unsampled or extinct species [46], incomplete lineage sorting (ILS) [3,21,37,42,51,52], or gene conversion [30]. In these models, reconciled gene trees can be computed using dynamic programming algorithms and it is natural to ask if such algorithms could be turned into grammars for the corresponding space of evolutionary scenarios. Last, from an applied point of view, a limitation of our work lies in the fact that histories are parameterized by their size, i.e. the number of extant genes, while in applications, the genes of a gene family are assigned to specific extant species. Ideally, in order to explore (through counting or sampling) the space of all possible evolutionary scenarios for a gene family whose distribution of genes in extant species is given, we would need to parameterize our algorithms by this distribution, which leads to dynamic programming algorithms with a much higher time and space complexity, dependent on the number of extant species. However, we believe that advanced combinatorial sampling, especially multiparametric combinatorial samplers [7,9], can be used within the framework we developed in the present work to provide efficient counting and sampling algorithms.

Fig. 5: Box-plot of the distribution of the growth factor over the 100 random species trees of each size k in the uDL-model. (Top) Exact growth factor; (Bottom) Box-plot of the distribution, for each species tree, of the ratio between the exact growth factor and the estimated growth factor.

Fig. 6: Box-plot of the distribution of the growth factor over the 100 random species trees of each size k in the uDLT (Top) and rDLT (Bottom) models. The growth factor is estimated from the number of DLT-histories of size n = 50 using formula (32).

Fig. 7: Box-plots of the distribution of the ratio of the number of DLT-histories over the number of DL-histories over all species trees of size k and histories of size n for selected pairs (k, n). The distributions are obtained, for each (k, n), over 100 (resp. 1000) randomly chosen unranked (resp. ranked) species trees.

Funding:
The first author is supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (RGPIN-2017-03986). This research was enabled in part by support provided by Westgrid (https://www.westgrid.ca/) and Compute Canada (https://www.computecanada.ca) through a Resource Allocation (ID 838) to the first author. The third author was supported by the Exzellenzstipendium of the Austrian

Fig. 10: Distribution of the ratio Duplications/Losses in the uDL-model (Top) and of the ratios HGT/score, Duplications/score and Losses/score in the uDLT-model (Bottom). For both figures the distribution is over 50 random species trees of size 16 and 10,000 random histories of size 30 per tree.
for an illustration.

Table 2: Leading constants and exponential growth factors for the number of DL-histories consistent with the unranked caterpillar and complete species tree. Their closed forms are given in Propositions 4.1-4.4.

Table 3: DL-history counting sequences of the caterpillar species trees $CT_k$.
4.1.1 Counting DL-histories associated with the caterpillar species tree. Denote by $H_{CT_k}$ the set of DL-histories over the caterpillar $CT_k$, then the general grammar of DL-histories,

Table 4: DL-history counting sequences of the complete species trees $CB_h$ with $k = 2^h$ leaves.