Embedding small digraphs and permutations in binary trees and split trees

We investigate the number of permutations that occur in random labellings of trees. This is a generalisation of the number of subpermutations occurring in a random permutation. It also generalises some recent results on the number of inversions in randomly labelled trees. We consider complete binary trees as well as random split trees, a large class of random trees of logarithmic height introduced by Devroye in 1998. Split trees consist of nodes (bags) which can contain balls and are generated by a random trickle-down process of balls through the nodes. For complete binary trees we show that asymptotically the cumulants of the number of occurrences of a fixed permutation in the random node labelling have explicit formulas. Our other main theorem shows that for a random split tree, with high probability the cumulants of the number of occurrences are asymptotically an explicit parameter of the split tree. For the proof of the second theorem we show some results on the number of embeddings of digraphs into split trees which may be of independent interest.


Introduction and statement of results
Our two main results give the distribution of the number of appearances of a fixed permutation in random labellings of complete binary trees and split trees. Theorem 1.3 gives the distribution of the number of appearances of a fixed permutation in a random labelling of a complete binary tree. A split tree, see Section 1.3, is a random tree consisting of a random number and arrangement of nodes and a non-negative number of balls within each node. We say an event $E_n$ occurs with high probability (whp) if $P(E_n) \to 1$ as $n \to \infty$. Theorem 1.6 shows that for a random split tree with high probability, a result similar to Theorem 1.3 holds for the number of appearances of a fixed permutation in a random labelling of the balls of the tree. We first give a complete introduction and statement of results in terms of complete binary trees, and then define split trees and state our results for split trees. This paper extends the conference paper [1].

Patterns in labelled trees
Let V denote the node set of a tree T n with n nodes. Define a partial ordering on the nodes of the tree by saying that a < b if a is an ancestor of b. Suppose we have a labelling of the nodes π : V → [n].
We say that nodes a and b form an inversion if a < b and π(a) > π(b). The enumeration of labelled trees with a fixed number of inversions has been studied by Gessel et al. [8], Mallows and Riordan [13] and Yan [16].
Write π(u) ≈ α to indicate the induced order is the same: for example 527 ≈ 213. Permutations in labelled trees have been studied before: Anders et al. [2] and Chauve et al. [4] enumerated labelled trees avoiding permutations in the labels.
We shall be interested in the number of permutations in random labellings of trees. From now on, for fixed trees we let $\pi : V \to [n]$ be a node labelling chosen uniformly from the $n!$ possible labellings (for split trees $\pi$ is a uniformly random ball labelling). The (random) number of inversions in random node labellings of fixed trees as well as some random models of trees were studied in [7,14] and extended in a recent paper [3]. The nice paper [12] by Lackner and Panholzer studied runs in labelled trees, i.e. the permutations $12 \ldots k$ and $k \ldots 21$ for constant $k$. Their paper gives both enumeration results and a central limit law for runs in randomly labelled random rooted trees. The present paper finds approximate extensions to some of the results in [3].
We now define the notation we will use. The number of inverted triples in a fixed tree $T$ is the random variable
$$R(321, T) = \sum_{u_1 < u_2 < u_3} \mathbf{1}[\pi(u_1) > \pi(u_2) > \pi(u_3)],$$
where the sum runs over all triples of nodes in $T$ such that $u_1$ is an ancestor of $u_2$ and $u_2$ an ancestor of $u_3$. For a tree $T$ and uniformly random node labelling define
$$R(\alpha, T) = \sum_{u_1 < \ldots < u_{|\alpha|}} \mathbf{1}[\pi(u) \approx \alpha],$$
so in particular $R(21, T)$ counts the number of inversions in a random labelling of $T$. (For split trees we take $\pi$ to be a uniformly random ball labelling and the balls get a partial relation of ancestor induced by the nodes: see Section 1.3 for details.)
Let $d(v)$ denote the depth of $v$, i.e., the distance from $v$ to the root $\rho$. For any $u_1 < \ldots < u_{|\alpha|}$ we have $P[\pi(u) \approx \alpha] = 1/|\alpha|!$, and since each node $v$ is the last node of exactly $\binom{d(v)}{|\alpha|-1}$ such chains it immediately follows that
$$E[R(\alpha, T)] = \frac{1}{|\alpha|!} \sum_{v} \binom{d(v)}{|\alpha| - 1}. \qquad (1.1)$$
For length two permutations, e.g. inversions, $E[R(21, T)] = \frac{1}{2} \Upsilon(T)$, where the tree parameter $\Upsilon(T) \overset{\mathrm{def}}{=} \sum_v d(v)$ is called the total path length of $T$. We will state our results in terms of a tree parameter $\Upsilon^k_r(T)$ which generalises the notion of total path length.
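The expectation of $R(\alpha, T)$ can be checked by brute force on a small tree. The following Python sketch is an illustration only: it assumes the complete binary tree on $n = 7$ nodes in heap numbering (node $v$ has children $2v$ and $2v+1$), and the helper names `chains`, `pattern` and `expected_R` are hypothetical. It enumerates all $n!$ labellings and compares the exact mean with the number of ancestor chains divided by $|\alpha|!$, using that each chain is in relative order $\alpha$ with probability $1/|\alpha|!$.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial


def chains(n, k):
    """All k-tuples u_1 < ... < u_k of nodes in 1..n, ordered by ancestry."""
    out = []
    for v in range(1, n + 1):
        path = []          # strict ancestors of v
        u = v >> 1
        while u >= 1:
            path.append(u)
            u >>= 1
        path.reverse()     # root first, so combinations keep ancestor order
        for above in combinations(path, k - 1):
            out.append(above + (v,))
    return out


def pattern(values):
    """Relative order of distinct values, e.g. (5, 2, 7) -> (2, 1, 3)."""
    ranks = sorted(values)
    return tuple(ranks.index(x) + 1 for x in values)


def expected_R(alpha, n):
    """Exact E[R(alpha, T)] over all n! labellings of the heap-numbered tree."""
    ch = chains(n, len(alpha))
    total = 0
    for perm in permutations(range(1, n + 1)):
        label = (0,) + perm            # label[v] is the label of node v
        total += sum(pattern([label[u] for u in c]) == alpha for c in ch)
    return Fraction(total, factorial(n))


def chain_count_formula(alpha, n):
    """Number of chains divided by |alpha|!; depth of v is v.bit_length() - 1."""
    k = len(alpha)
    count = sum(comb(v.bit_length() - 1, k - 1) for v in range(1, n + 1))
    return Fraction(count, factorial(k))
```

For instance, `expected_R((2, 1), 7)` and `chain_count_formula((2, 1), 7)` can be compared directly; the total path length of this tree is 10.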
Defining $\Upsilon^k_r(T)$ will allow us to generalize (1.1) to higher moments of $R(\alpha, T)$. For $r$ nodes $v_1, \ldots, v_r$ let $c(v_1, \ldots, v_r)$ be the number of ancestors that they share, so $c(v_1, \ldots, v_r) \overset{\mathrm{def}}{=} |\{u \in V : u \le v_1, v_2, \ldots, v_r\}|$, which is also the depth of the least common ancestor plus one. That is, $c(v_1, \ldots, v_r) = d(v_1 \vee \ldots \vee v_r) + 1$, where we write $v_1 \vee v_2$ for the least common ancestor of $v_1$ and $v_2$. The 'off by one error' is because the root is in the set of common ancestors for any subset of nodes but we use the convention that the root has depth 0. Also define where the sum is over all ordered $r$-tuples of nodes in the tree and with the convention i.e., we recover the usual notion of total path length. The $k = 2$ case recovers the $r$-total common ancestors [3]. Indeed the distribution of the number of inversions in a fixed tree has already been studied in [3].
Similarly to the way one can describe a distribution by giving all finite moments, we may also describe a distribution via its cumulant moments. The cumulants, which we denote by $\kappa_r = \kappa_r(X)$, are the coefficients in the Taylor expansion of the log of the moment generating function of $X$ about the origin (provided they exist),
$$\log E(e^{\xi X}) = \sum_r \kappa_r \xi^r / r!,$$
thus $\kappa_1(X) = E[X]$ and $\kappa_2(X) = \mathrm{Var}(X)$. For more information on cumulants see for example [11, Section 6.1].
Theorem 1.1 (Cai et al. [3]). Let $T$ be a fixed tree, and denote by $\kappa_r = \kappa_r(R(21, T))$ the $r$-th cumulant of $R(21, T)$. Then for $r \ge 2$, where $B_r$ denotes the $r$-th Bernoulli number.
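As an illustration of the definition of cumulants, the following Python sketch converts raw moments $m_r = E[X^r]$ into cumulants with the standard recursion $\kappa_r = m_r - \sum_{i=1}^{r-1} \binom{r-1}{i-1} \kappa_i m_{r-i}$. The function name `cumulants_from_moments` is hypothetical, and the toy example is chosen here for illustration: for a root with two children, $R(21, T)$ is the number of children whose label is smaller than the root's, which is uniform on $\{0, 1, 2\}$.

```python
from fractions import Fraction
from math import comb


def cumulants_from_moments(moments):
    """Given moments[r-1] = E[X^r] for r = 1..R, return [kappa_1, ..., kappa_R]."""
    kappas = []
    for r in range(1, len(moments) + 1):
        # kappa_r = m_r - sum_{i=1}^{r-1} C(r-1, i-1) * kappa_i * m_{r-i}
        correction = sum(
            comb(r - 1, i - 1) * kappas[i - 1] * moments[r - i - 1]
            for i in range(1, r)
        )
        kappas.append(moments[r - 1] - correction)
    return kappas


# Raw moments of the uniform distribution on {0, 1, 2}.
toy_moments = [Fraction(0**r + 1**r + 2**r, 3) for r in range(1, 5)]
toy_cumulants = cumulants_from_moments(toy_moments)
```

Here `toy_cumulants[0]` is the mean and `toy_cumulants[1]` the variance; exact rational arithmetic avoids rounding in the higher cumulants.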
Remark 1.2. In essence Theorem 1.1 (Cai et al. [3]) shows the $r$-th cumulant of the number of inversions is a constant times $\Upsilon^2_r(T)$. Our main result on complete binary trees, Theorem 1.3 (respectively Theorem 1.6 on split trees), shows that for any fixed permutation $\alpha$ of length $k$ for complete binary trees (and whp for split trees) the $r$-th cumulant is a constant times $\Upsilon^k_r(T_n)$ asymptotically. The exact constant is defined in Equation (6.1) and is a little more involved than for inversions, but observe it is a function only of the moment $r$ and the length $k = |\alpha|$ together with the first element $\alpha_1$ of the permutation $\alpha = \alpha_1 \ldots \alpha_k$.

Complete Binary trees
We move on to stating our results. For the case of $T$ a complete binary tree on $n$ vertices we asymptotically recover Theorem 1.1 [3] for large $n$. Moreover we extend it to cover any fixed permutation $\alpha$ for complete binary trees.
The first of our theorems gives the distribution of the number of occurrences of $\alpha$ in a random labelling of the nodes in a complete binary tree. This result formed Theorem 2 in the extended abstract version of the paper; however there was an error in the definition of the constant $D_{\alpha,r}$ for $r > 2$, which has now been corrected.
Theorem 1.3. Let $T_n$ be the complete binary tree with $n$ nodes and fix a permutation $\alpha = \alpha_1 \ldots \alpha_k$ of length $k$. Let $\kappa_r = \kappa_r(R(\alpha, T_n))$ be the $r$-th cumulant of $R(\alpha, T_n)$. Then for $r \ge 2$, there exists a constant $D_{\alpha,r}$ depending only on $\alpha$ and $r$ such that,
An explicit formula for $D_{\alpha,r}$ is derived in Equation (6.1) and in the Appendix we list values of $D_{\alpha,r}$ for permutations $\alpha$ of length at most 6 and moments $r \in \{1, \ldots, 5\}$. The explicit formula (6.1) implies the following corollary.
Corollary 1.4. Let $T_n$ be the complete binary tree with $n$ nodes. For permutations $\alpha$ of length 3, the variance is (1)) for $\alpha = 213, 231$ and more generally for $\alpha = \alpha_1 \alpha_2 \ldots \alpha_k$,
Remark 1.5. The methods in the proofs are very different for inversions and general permutations. In [3], the method takes advantage of a nice independence property of inversions. For a node $u$ let $I_u$ be the number of inversions involving $u$ as the top node: $I_u = |\{w : u < w, \pi(u) > \pi(w)\}|$. Then the $\{I_u\}_u$ are independent random variables and $I_u$ is uniformly distributed on $\{0, \ldots, |T_u| - 1\}$, where $T_u$ is the subtree rooted at $u$ (so that $|T_u| - 1$ is the number of proper descendants of $u$), see Lemma 1.1 of [3]. Without a similar independence property for general permutations our route instead uses properties of the number of embeddings of small digraphs in both complete binary trees and, whp, in split trees. These properties allow us to calculate the $r$-th moment of $R(\alpha, T)$ directly from a sum of products of indicator variables, as most terms in the sum are zero or negligible by the embedding property.
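The independence property for inversions recalled in Remark 1.5 can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: the tree is the complete binary tree in heap numbering, $z_u$ denotes the number of nodes in the subtree rooted at $u$ (including $u$), and the function names are hypothetical. If the $I_u$ are independent and uniform on $\{0, \ldots, z_u - 1\}$, then $\mathrm{Var}(R(21, T)) = \sum_u (z_u^2 - 1)/12$, which the code compares with the exact variance over all $n!$ labellings.

```python
from fractions import Fraction
from itertools import permutations


def subtree_size(v, n):
    """Number of nodes in the subtree rooted at v (children are 2v and 2v + 1)."""
    if v > n:
        return 0
    return 1 + subtree_size(2 * v, n) + subtree_size(2 * v + 1, n)


def inversions(label, n):
    """Number of pairs (u, v) with u a strict ancestor of v and label[u] > label[v]."""
    count = 0
    for v in range(2, n + 1):
        u = v >> 1
        while u >= 1:
            count += label[u] > label[v]
            u >>= 1
    return count


def exact_variance(n):
    """Variance of R(21, T) by enumerating every labelling."""
    values = [inversions((0,) + p, n) for p in permutations(range(1, n + 1))]
    mean = Fraction(sum(values), len(values))
    return Fraction(sum(x * x for x in values), len(values)) - mean**2


def variance_from_independence(n):
    """Sum of variances of uniform variables on {0, ..., z_u - 1}."""
    return sum(Fraction(subtree_size(v, n) ** 2 - 1, 12) for v in range(1, n + 1))
```

Enumeration is only feasible for very small $n$; for $n = 7$ there are 5040 labellings.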

Split trees
Split trees were first defined in [5] and were introduced to encompass many families of trees that are frequently used in algorithm analysis, e.g., binary search trees [9], m-ary search trees [15] and quad trees [6]. The full definition is given below, but note that a split tree is a random tree which consists of nodes (bags) each of which contains a number of balls. We will study the number of occurrences of a fixed subpermutation $\alpha$ in a random ball labelling of the split tree.
The random split tree $T_n$ has parameters $b, s, s_0, s_1, V$ and $n$. The integers $b, s, s_0, s_1$ are required to satisfy the inequalities
$$2 \le b, \quad 0 < s, \quad 0 \le s_0 \le s, \quad 0 \le b s_1 \le s + 1 - s_0. \qquad (1.3)$$
We define $T_n$ algorithmically. Consider the infinite $b$-ary tree $U$, and view each node as a bucket or bag with capacity $s$. Each node (bag) $u$ is assigned an independent copy $V_u$ of the random split vector $V$. Let $C(u)$ denote the number of balls in node (bag) $u$, initially setting $C(u) = 0$ for all $u$. Say that $u$ is a leaf if $C(u) > 0$ and $C(v) = 0$ for all children $v$ of $u$, and internal if $C(v) > 0$ for some proper descendant $v$, i.e., $v > u$. We add $n$ balls labeled $\{1, \ldots, n\}$ to $U$ one by one. The $j$-th ball is added by the following "trickle-down" procedure.
1. Add j to the root.
2. While $j$ is at an internal node (bag) $u$, choose child $i$ with probability $V_{u,i}$, where $V_u = (V_{u,1}, \ldots, V_{u,b})$ is the split vector at $u$, and move $j$ to child $i$.
3. If j is at a leaf u with C(u) < s, then j stays at u and we set C(u) ← C(u) + 1.
If j is at a leaf with C(u) = s, then the balls at u are distributed among u and its children as follows. We select s 0 ≤ s of the balls uniformly at random to stay at u. Among the remaining s + 1 − s 0 balls, we uniformly at random distribute s 1 balls to each of the b children of u. Each of the remaining s + 1 − s 0 − bs 1 balls is placed at a child node chosen independently at random according to the split vector assigned to u. This splitting process is repeated for any child which receives more than s balls.
Once all $n$ balls have been placed in $U$, we obtain $T_n$ by deleting all nodes $u$ such that the subtree rooted at $u$ contains no balls. Note that an internal node (bag) of $T_n$ contains exactly $s_0$ balls, while a leaf contains a random number in $\{1, \ldots, s\}$. We can assume that the components $V_i$ of the split vector $V$ are identically distributed. If this were not the case they could anyway be made identically distributed by using a random permutation, see [5]. Let $V$ be a random variable with this distribution. We assume, as previous authors, that $P\{\exists i : V_i = 1\} < 1$. For this paper we will also require that the internal node (bag) capacity $s_0$ is at least one so that there are some internal balls to receive labels.
For example, if we let b = 2, s = s 0 = 1, s 1 = 0 and V have the distribution of (U, 1 − U ) where U ∼ Unif[0, 1], then we get the well-known binary search tree.
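The trickle-down procedure above can be simulated directly. The following Python sketch is one possible reading of the algorithmic definition, not the authors' code: nodes of the infinite $b$-ary tree are represented as tuples of child indices (the root is the empty tuple), split vectors are sampled lazily by a user-supplied function, and the names `random_split_tree` and `bst_split_vector` are hypothetical. It assumes the parameters satisfy $s_0 \le s$ and $b s_1 \le s + 1 - s_0$.

```python
import random


def random_split_tree(n, b, s, s0, s1, sample_split_vector, rng):
    """Return (bags, internal): bags maps node -> list of balls stored there,
    internal is the set of nodes that have split."""
    balls = {}      # node -> balls currently held at that node
    vectors = {}    # node -> its split vector V_u, sampled on first use
    internal = set()

    def random_child(u):
        if u not in vectors:
            vectors[u] = sample_split_vector(rng)
        i = rng.choices(range(b), weights=vectors[u])[0]
        return u + (i,)

    for j in range(1, n + 1):
        u = ()                          # step 1: add ball j to the root
        while u in internal:            # step 2: trickle down past internal nodes
            u = random_child(u)
        balls.setdefault(u, []).append(j)
        pending = [u]                   # step 3: split any bag holding more than s balls
        while pending:
            v = pending.pop()
            if len(balls[v]) <= s:
                continue
            group = balls[v]
            rng.shuffle(group)
            balls[v] = group[:s0]       # s0 uniformly chosen balls stay at v
            internal.add(v)
            rest = group[s0:]
            for i in range(b):          # s1 balls go to each child
                for _ in range(s1):
                    balls.setdefault(v + (i,), []).append(rest.pop())
            for ball in rest:           # the others follow the split vector of v
                balls.setdefault(random_child(v), []).append(ball)
            pending.extend(v + (i,) for i in range(b) if v + (i,) in balls)
    # Internal nodes with s0 = 0 hold no balls; they are recorded only in `internal`.
    return {u: bag for u, bag in balls.items() if bag}, internal


def bst_split_vector(rng):
    """Split vector (U, 1 - U) with U uniform on [0, 1), as in the binary search tree example."""
    x = rng.random()
    return (x, 1.0 - x)


bst_bags, bst_internal = random_split_tree(200, 2, 1, 1, 0, bst_split_vector, random.Random(1))
```

With the binary search tree parameters every bag of the returned tree holds exactly one ball, and the parent of every non-root bag is an internal node.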
An alternate definition of the random split tree is as follows. Consider an infinite b-ary tree U . The split tree T n is constructed by distributing n balls (pieces of information) among nodes of U . For a node u, let n u be the number of balls stored in the subtree rooted at u. Once n u are all decided, we take T n to be the largest subtree of U such that n u > 0 for all u ∈ T n . Let V u = (V u,1 , . . . ,V u,b ) be the independent copy of V assigned to u. Let u 1 , . . . , u b be the child nodes of u. Conditioning on n u and V u , if n u ≤ s, then n u i = 0 for all i; if n u > s, then where Mult denotes multinomial distribution, and b, s, s 0 , s 1 are integers satisfying (1.3). Note that we have ∑ b i=1 n u i ≤ n (hence the "splitting"). Naturally for the root ρ, n ρ = n. Thus the distribution of (n u , V u ) u∈V (U ) is completely defined.
The balls inherit a partial order from the partial ordering of the nodes in the split tree. We write $u_1 < u_2$ if node $u_1$ is an ancestor of node $u_2$, $u_1 > u_2$ if $u_2$ is an ancestor of $u_1$ and finally $u_1 \perp u_2$ if neither $u_1$ nor $u_2$ is an ancestor of the other node. For balls $j_1, j_2$ in nodes (bags) $u_1, u_2$ respectively, $j_1 < j_2$ if $u_1 < u_2$ and $j_1 \perp j_2$ if $u_1 \perp u_2$. We also say that balls $j_1, j_2$ are incomparable, $j_1 \perp j_2$, if they are in the same node (bag).
This next theorem is our other main result. We determine the distribution of the number of occurrences of a fixed subpermutation in a random ball labelling of the split tree. Denote the random variable for the number of occurrences of $\alpha$ in a uniformly random ball labelling of split tree $T_n$ by $R(\alpha, T_n)$.
Theorem 1.6. Fix a permutation $\alpha = \alpha_1 \ldots \alpha_k$ of length $k$. Let $T_n$ be a split tree with split vector $V = (V_1, \ldots, V_b)$ and $n$ balls. Let $\kappa_r = \kappa_r(R(\alpha, T_n))$ be the $r$-th cumulant of $R(\alpha, T_n)$. For $r \ge 2$ the constant $D_{\alpha,r}$ is defined in Equation (6.1). Whp the split tree $T_n$ has the following property.
Our theorem says the following. Generate a random split tree $T_n$; whp it has the property that the random number of occurrences of any fixed subpermutation in a random ball labelling of $T_n$ has variance and higher cumulant moments approximately a constant times a 'simple' tree parameter of $T_n$.
Remark 1.7. We may contrast this with Theorem 1.12 of [3]. That theorem states the distribution of the number of inversions in a random split tree, where the distribution is expressed as the solution of a system of fixed point equations. Determining the distribution of $\Upsilon^k_r(T_n)$ would extend Theorem 1.12 of [3] about inversions to general permutations.

Embeddings of small digraphs
Certain classes of digraphs, defined below, will be important in the proof of Theorem 1.3. Loosely the digraphs we will consider are those that may be obtained by taking r copies of the directed path P k and iteratively fusing pairs of vertices together. It will also matter how many embeddings each digraph has into the complete binary tree. In Proposition 4.1 we show the counts for most digraphs in such a class are of smaller order than the counts of a particular set of digraphs in the class. The main work in the proof of this proposition is to show that the number of embeddings of any digraph H, up to a constant factor, depends only on the numbers of two types of vertices in H. We separate this result out as a theorem, Theorem 1.8, which we prove in Section 2.
We now define the particular notion of embedding small digraphs into a tree which will be important. Define a digraph to be a simple graph together with a direction on each edge. We shall consider only acyclic digraphs i.e. those without a directed cycle.
In the complete binary tree we have a natural partial order, the ancestor relation, where the root is the ancestor of all other nodes. Any fixed acyclic digraph also induces a partial order on its vertices where v < u if there is a directed path from v to u. For an acyclic digraph H, define [H] T n to be the number of embeddings ι of H to distinct nodes in T n such that the partial order of vertices in H is respected by the embedding to nodes in T n under the ancestor relation.
Observe that the inverse $\iota^{-1}$ of an embedding need not respect relations.
For an example of this take the digraph and denote by $P_\ell$ the rooted path on $\ell$ nodes. Notice that in this digraph two of the vertices are incomparable, but the vertices of the digraph can be embedded into the nodes of a path, which are totally ordered. The counts are $[\ ]_{P_4} = 2$ and in general $[\ ]_{P_\ell} = 2\binom{\ell}{4}$.
A particular star-like digraph $S_{k,r}$ will be important. This is the digraph obtained by taking $r$ directed paths of length $k$ and fusing their source vertices into a single vertex. Alternatively the theorem can be stated in terms of star counts as [S |α|,r ] T n = ϒ
A vertex in a directed graph is a sink if it has zero out-degree. Define $A_0(H) \subseteq V(H)$ to be the set of sinks in digraph $H$. Recall that a directed acyclic graph defines a partial order on the vertices: define $A_1(H) \subseteq V(H)$ to be the vertices with exactly one descendant which is a sink. We will call vertices in $A_1$ ancestors as they are ancestors of a single sink. Define $A_2(H)$ to be the remainder $A_2(H) = V(H) \setminus \{A_0 \cup A_1\}$. We call those in $A_2$ common-ancestors as they are the common ancestor of at least two sinks (see Figure 1). Observe that if $H$ is a directed forest then the sinks are the leaves. However, $H$ need not be a forest and indeed a sink may have indegree more than one, as in the rightmost sink in Figure 1.
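The embedding counts and the three vertex classes can be computed by brute force for small examples. The Python sketch below is illustrative only: the tree is given by a hypothetical `parent` map, the function names are our own, and the four-vertex "diamond" digraph (edges $0 \to 1$, $0 \to 2$, $1 \to 3$, $2 \to 3$) is an assumed stand-in with two incomparable vertices; it is not necessarily the digraph pictured in the text.

```python
from itertools import permutations


def descendants(vertices, edges):
    """desc[v] = vertices reachable from v by a directed path of length >= 1."""
    desc = {v: set() for v in vertices}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            new = ({b} | desc[b]) - desc[a]
            if new:
                desc[a] |= new
                changed = True
    return desc


def strict_ancestors(parent, u):
    out = set()
    while parent[u] is not None:
        u = parent[u]
        out.add(u)
    return out


def count_embeddings(vertices, edges, parent):
    """Number of injective maps iota with v < w in H implying iota(v) an ancestor of iota(w)."""
    desc = descendants(vertices, edges)
    anc = {u: strict_ancestors(parent, u) for u in parent}
    order = list(vertices)
    total = 0
    for image in permutations(parent, len(order)):
        iota = dict(zip(order, image))
        if all(iota[v] in anc[iota[w]] for v in order for w in desc[v]):
            total += 1
    return total


def classify(vertices, edges):
    """Return (A_0, A_1, A_2): sinks, ancestors of exactly one sink, and the rest."""
    desc = descendants(vertices, edges)
    a0 = {v for v in vertices if not desc[v]}
    a1 = {v for v in vertices if v not in a0 and len(desc[v] & a0) == 1}
    return a0, a1, set(vertices) - a0 - a1


def path(length):
    """Rooted path on `length` nodes, node 0 being the root."""
    return {i: (i - 1 if i > 0 else None) for i in range(length)}


diamond_vertices = [0, 1, 2, 3]
diamond_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
```

For the diamond, `count_embeddings(diamond_vertices, diamond_edges, path(4))` counts the two ways of ordering the middle vertices along the path, consistent with a count of the form $2\binom{\ell}{4}$ on $P_\ell$.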
For the split tree T n and an acyclic digraph H, define [H] T n to be the number of embeddings ι of vertices in H to distinct balls in T n such that the partial order of vertices in H is respected by the embedding to balls in T n under the ancestor relation.
In the extended abstract version of this paper [1], in Lemma 7, we proved the weaker upper bound that for constant c ′′ whp [H] T n ≤ c ′′ n |A 0 | (ln n) |A 1 | (ln ln n) |A 2 | , i.e. a dependence also on the number of 'common-ancestor' (red) vertices in H. It is a little trickier to prove the new upper bound. However, we are rewarded by a tighter bound on the number of embeddings; the expected number of embeddings is now determined only by the numbers of sink (green) and 'ancestor' (blue) vertices up to constant factors. It would be interesting to obtain tail bounds on the number of embeddings of small digraphs in a random split tree and we leave this as an open question.

Embeddings of small digraphs into the complete binary tree
In this section we prove Theorem 1.8 concerning upper and lower bounds on the number of embeddings of a fixed digraph H, thought of as constant, into a complete binary tree T n with n vertices.
We prove the lower bound of Theorem 1.8 first as the upper bound will require some preparatory lemmas.
Proof. (of lower bound of Theorem 1.8) We restrict attention to embeddings where all 'common-ancestors' of $H$ are embedded very near the root of $T_n$, the sink vertices are embedded to leaves of $T_n$ and the 'ancestor' vertices are placed on the path between the root of $T_n$ and the leaf to which their descendant sink was embedded (see Figure 2). There are sufficiently many such embeddings to obtain the lower bound. In fact we restrict a little further to make it easy to check all the embeddings are valid.
The first task is to embed the vertices in A 2 close to the root in such a way that A 2 is embedded to ancestors of the nodes to which A 1 and A 0 are embedded and also such that the ordering within the vertices in A 2 is preserved. As H is an acyclic digraph the directed edges define a partial order on all vertices of H and in particular for those in A 2 . Thus this relation can be extended to a total order. Fix such a total order < * on V (H), one which extends the partial order on V (H), and relabel vertices in A 2 so that v 1 < * . . . < * v |A 2 | . Thus we may embed v 1 to the root ρ in T n and each v i+1 to a child of the node to which v i was embedded and the relation between vertices in H will be preserved by their embedding in T n ; i.e. we may embed A 2 to the nodes on the path from ρ to some u * at depth |A 2 | − 1. Fix such a node u * and let T * be the subtree of T n from u * .
Label the sinks $A_0 = \{s_1, \ldots, s_{|A_0|}\}$ and vertices in $A_1$ according to which sink they are the ancestors of, writing $A_1^i$ for the set of vertices in $A_1$ which are ancestors of $s_i$. We obtain a subcount of $[H]_{T_n}$ by embedding $A_2$ onto the path from $\rho$ to $u^*$, embedding $A_0$ to leaves of $T^*$ and then for each $i$ in turn embedding vertices in $A_1^i$ on the path from $u^*$ to the embedding of $s_i$. There are $m - |A_2| - 1$ vertices on the path from $s_i$ to $u^*$ and at most $|A_1|$ of them already have an ancestor vertex embedded onto them (i.e. from $A_1^j$ for some $j < i$). Thus where the first binomial coefficient counts the number of ways to embed $A_0$ and the $i$-th binomial coefficient in the product counts the ways to embed $A_1^i$. Hence for large $m$ the RHS of Equation (2.1) has first term of order $\Theta(2^{m|A_0|})$ and the product over
The key observation to prove the upper bound in Theorem 1.8 is that for most pairs of nodes in a complete binary tree their least 'common ancestor' is very near the root. We make the required condition precise in the assumption of the next lemma, and show it implies the upper bound on the number of embeddings of $H$. It then suffices to prove that the condition holds for complete binary trees. This allows us to recycle the lemma to show the corresponding result in split trees.
Define c(u 1 , u 2 ) to be the number of 'common ancestors' of nodes u 1 and u 2 .
where the sum is over ordered pairs of distinct nodes in T n .
Proof. Label the sinks $A_0 = \{s_1, \ldots, s_{|A_0|}\}$ and vertices in $A_1$ according to which sink they are the ancestors of, writing $A_1^i$ for the set of vertices in $A_1$ which are ancestors of $s_i$. Similarly partition 'common-ancestor' vertices into disjoint sets $\{A_2^{i,j}\}_{1 \le i < j \le |A_0|}$ according to the lexicographically least pair of sinks $s_i$ and $s_j$ for which it is an ancestor. Formally a vertex $v \in A_2$ is in $A_2^{i,j}$ if $v$ is the ancestor of sinks $s_i$ and $s_j$ but not an ancestor of a sink $s_k$ for $k < \max\{i, j\}$.
Suppose sinks $s_i$ and $s_j$ are embedded to vertices $u_i$ and $u_j$ in $T_n$. Then to complete the embedding of ancestors of $s_i$, vertices in $A_1^i$ must be embedded to ancestors of $u_i$ in $T_n$ and there are at most $d(u_i)$ options. Likewise vertices in $A_2^{i,j}$, i.e. 'common-ancestors' of sinks $s_i$ and $s_j$, must be embedded to a common ancestor of $u_i$ and $u_j$ in the tree. Thus, recalling $c(u_i, u_j)$ denotes the number of common ancestors of $u_i$ and $u_j$, where the sum is over distinct nodes $u_1, \ldots, u_{|A_0|}$ and the product $i \ne j$ is over pairs $u_i, u_j$ in $u_1, \ldots, u_{|A_0|}$. Fix a particular embedding of the sinks to $u_1, \ldots, u_{|A_0|}$ and we shall bound both terms in the product in (2.2). Recall that for the (blue) 'ancestor' vertices, It will suffice to use the trivial bound that all vertices have depth at most the height of the tree, i.e. $\max_i d(u_i) \le m$. And so, Similarly, for the (red) 'common-ancestor' vertices Hence substituting the bounds above into the expression in (2.2), which is the required result.
There is one more result we need and then the upper bound in Theorem 1.8 will follow very fast.
where the sum is over ordered pairs of distinct nodes in $T_n$.
Proof. Associate with each vertex $v \in V(T_n)$ a binary string of length at most $m$ in the usual way: the root has string $\emptyset$, children of the root are labelled 0 and 1, and two vertices in the same subtree at depth $d$ have the same initial $d$-length substring. Now $\sum_{u_1, u_2} \mathbf{1}[c(u_1, u_2) \ge \ell]$ is precisely the number of ordered pairs which share a common $(\ell - 1)$-length initial substring in their labels; i.e. ordered pairs with both vertices in the same depth $(\ell - 1)$ subtree.
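The counting argument in this proof can be checked on a small tree. The sketch below is an illustration under stated assumptions: the complete binary tree has height `height` (so $2^{\text{height}+1} - 1$ nodes) in heap numbering, where the binary expansion of a node index plays the role of its string label, and the function names are hypothetical. It compares the brute-force count of ordered pairs of distinct nodes with $c(u_1, u_2) \ge \ell$ against the number of ordered pairs lying in a common subtree rooted at depth $\ell - 1$.

```python
def depth(v):
    """Depth of node v in heap numbering (the root is node 1 at depth 0)."""
    return v.bit_length() - 1


def common_ancestors(a, b):
    """c(a, b): number of nodes u with u <= a and u <= b, i.e. depth of the LCA plus one."""
    while depth(a) > depth(b):
        a >>= 1
    while depth(b) > depth(a):
        b >>= 1
    while a != b:
        a >>= 1
        b >>= 1
    return depth(a) + 1


def pairs_brute_force(height, ell):
    """Ordered pairs of distinct nodes with at least ell common ancestors."""
    n = 2 ** (height + 1) - 1
    return sum(
        1
        for a in range(1, n + 1)
        for b in range(1, n + 1)
        if a != b and common_ancestors(a, b) >= ell
    )


def pairs_same_subtree(height, ell):
    """2^(ell-1) subtrees rooted at depth ell-1, each contributing size*(size-1) ordered pairs."""
    size = 2 ** (height - ell + 2) - 1
    return 2 ** (ell - 1) * size * (size - 1)
```

The second function is valid for $1 \le \ell \le \text{height} + 1$; the count decays geometrically in $\ell$, which is the feature the upper bound relies on.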

Embeddings of small digraphs into the split trees
In this section we prove Theorem 1.9 concerning upper and lower bounds on the number of embeddings of a fixed digraph H, thought of as constant, into a random split tree with n balls. We begin by briefly listing some results on split trees from the literature that will be useful for us.
We will use Proposition 3.1, as well as the property that most pairs of balls have their least common ancestor node very close to the root, which we prove in Lemma 3.4.
We begin with the lower bound; the upper bound is proven at the end of this section.
Proof. (of the lower bound of Theorem 1.9) We describe a strategy to embed H into T n . The details of the proof are then to show that whp this strategy can be followed to obtain a valid embedding of H and that there are sufficiently many different such embeddings to achieve the lower bound.
The idea is as follows: first embed 'common-ancestor' vertices along a path to some node $u^*$ near the root of $T_n$ so that the subtree from $u^*$ has $\tilde{n}$ balls, where this $\tilde{n}$ is a constant proportion of the total number of balls $n$. Now consider the split tree with $\tilde{n}$ balls and embed 'ancestor' and sink vertices into that. Embed sink vertices to 'good' balls in the tree (i.e. depth very close to the expected depth) and the 'ancestor' vertices to balls which are in nodes on the path between $u^*$ and the embedding of that ancestor's descendant. See Figure 3.
We embed the 'common-ancestor' vertices, $A_2(H)$, to the balls in the nodes on the path between a node, $u^*$ say, at depth $|A_2| - 1$ and the root, using one ball per node. This is so far effectively the same as in the binary case. We will later embed the sink and 'ancestor' vertices to balls in the subtree $T_{u^*}$.
We need to confirm there is some node $u^*$ at depth $L = |A_2| - 1$ with $\tilde{n}$ balls in its subtree. Each node (bag) has capacity at most $s_0$ (internal nodes) or $s$ (leaves) and there are at most $(b^{L+1} - 1)$ nodes, a constant number, at depth less than $L$, so there are $n - O(1)$ balls remaining. These balls are shared between $b^L$,
[Figure 3: the root $\rho$, whose subtree has $n$ balls, and the node $u^*$, whose subtree has $\tilde{n}$ balls.]
The rest of this section is devoted to proving the upper bound of Theorem 1.9. To prove the upper bound on the expected number of embeddings of a fixed digraph into a split tree we begin by proving the split tree analogue of Lemma 2.1, which was for complete binary trees. Define $c_n(b_1, b_2)$ to be the number of node common ancestors of balls $b_1$ and $b_2$. The lemma shows that the number of embeddings of $H$ to balls in $T_n$ can be bounded above by a function of the number of balls, the height of the tree and the number of node common ancestors. Note that the following lemma is deterministic and is true for any instance of a split tree.
[H] T n ≤ s
Suppose sinks $s_i$ and $s_j$ are embedded to balls $b_i$ and $b_j$ in $T_n$. Then to complete the embedding, ancestors of $s_i$, i.e. vertices in $A_1^i$, must be embedded to balls in node ancestors of $b_i$ in $T_n$ and there are at most $s_0 d(b_i)$ options as each node ancestor of $b_i$ has $s_0$ balls. Likewise vertices in $A_2^{i,j}$, i.e. common-ancestors of sinks $s_i$ and $s_j$, must be embedded to balls in common ancestor nodes of $b_i$ and $b_j$ in the tree. Thus, where the sum is over distinct balls $b_1, \ldots, b_{|A_0|}$ and the product $i \ne i'$ is over pairs The expression above is very similar to Equation (2.3) in the proof of Lemma 2.1 and the proof follows now in an identical way so we omit the details. Notice the upper bound for split trees simply picks up an additional factor of s

Lemma 3.3. Let $j$ and $j'$ be any two distinct balls, and $v$ a node with split vector $V_v = (V_1, \ldots, V_b)$. Let $y$ be the probability that balls $j$ and $j'$ pass to the same child node of node $v$ conditional on the event that both balls reach node $v$. (We say a ball passes to a child node whether it stays at that child or continues further down the tree via that child node.) Then, $y \le \sum_i V_i^2$.
Proof. If a ball $j$ reaches node $v$ there are three possible scenarios:
• (i) ball $j$ is chosen as one of the $s_0$ balls to remain at node $v$ when all $n$ balls have been added to the tree.
• (ii) ball j is chosen as one of the bs 1 balls which are distributed uniformly so each child of v receives s 1 of them.
• (iii) ball j chooses a child of v with probabilities given by the split vector V v .
For each of these possible scenarios we give the probability that balls j, j ′ pass to the same child of node v. Observe that swapping the scenarios for j, j ′ gives the same probability so we list only one possibility. We summarise these in a table and then provide the proof of each line below the table.
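Scenario of $j$ | Scenario of $j'$ | Probability that $j, j'$ pass to same child
(i) | (i) | $0$
(i) | (ii) | $0$
(i) | (iii) | $0$
(ii) | (ii) | $(s_1 - 1)/(b s_1 - 1)$
(ii) | (iii) | $1/b$
(iii) | (iii) | $\sum_i V_i^2$
Now, if either or both of the balls stay at node $v$ then self-evidently they cannot pass to the same child of $v$, thus the situations indicated in the first three rows have probability zero.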
The first interesting case is if both balls are in situation (ii), i.e. are both chosen to be part of the $b s_1$ balls that are distributed uniformly such that each child receives $s_1$ balls. Fix a child of $v$; the number of ways both $j, j'$ pass to that child is $\binom{s_1}{2}$, and thus there are $b s_1 (s_1 - 1)/2$ ways for $j, j'$ to pass to the same child of $v$. Then simply divide by $b s_1 (b s_1 - 1)/2$, the number of ways to choose two of the $b s_1$ balls, to get the probability $(s_1 - 1)/(b s_1 - 1)$ that $j, j'$ pass to the same child of $v$. This finishes this case.
The next interesting case is if ball $j$ is in situation (ii) and ball $j'$ is in situation (iii). In this case ball $j'$ goes to each child of $v$ with the probability indicated by the split vector. The probability that ball $j$ goes to the same child as $j'$ is $1/b$, and indeed it does not matter with what probability $j'$ passes to each child of $v$.
The last case to consider is if both $j, j'$ are in situation (iii), i.e. they pass to child $i$ of node $v$ with probability $V_i$ as given by the split vector. Thus the probability they both go to child $i$ of node $v$ is $V_i^2$, and the probability they pass to the same child of $v$ is then simply the sum $\sum_i V_i^2$ over the children of $v$, as required.
After justifying each line in the table it now suffices only to show that $\frac{s_1 - 1}{b s_1 - 1} \le \frac{1}{b}$ and $\frac{1}{b} \le \sum_i V_i^2$. The first is immediate, and the second follows by Jensen's inequality.
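The case analysis in this proof is small enough to tabulate mechanically. The following Python sketch is illustrative only: scenarios (i), (ii), (iii) are encoded as the integers 1, 2, 3, the function name `same_child_probability` is hypothetical, and the split vector is passed as exact fractions. It returns the probability that two balls pass to the same child for each pair of scenarios, so that every entry can be compared with $\sum_i V_i^2$.

```python
from fractions import Fraction


def same_child_probability(scenario_j, scenario_k, b, s1, split_vector):
    """Probability that two balls reaching a node pass to the same child,
    given the scenario (1, 2 or 3 for (i), (ii), (iii)) of each ball."""
    pair = tuple(sorted((scenario_j, scenario_k)))
    if 1 in pair:
        # A ball that stays at the node passes to no child at all.
        return Fraction(0)
    if pair == (2, 2):
        # Both among the b*s1 balls dealt evenly; needs at least two such balls.
        if b * s1 < 2:
            raise ValueError("scenario (ii) for both balls requires b * s1 >= 2")
        return Fraction(s1 - 1, b * s1 - 1)
    if pair == (2, 3):
        # The evenly dealt ball lands in any fixed child with probability 1/b.
        return Fraction(1, b)
    # Both balls choose independently according to the split vector.
    return sum((v * v for v in split_vector), Fraction(0))


example_vector = (Fraction(1, 5), Fraction(4, 5))
bound = sum(v * v for v in example_vector)
table = {
    (p, q): same_child_probability(p, q, 2, 3, example_vector)
    for p in (1, 2, 3)
    for q in (1, 2, 3)
}
```

For the example vector every entry of `table` is at most `bound`, as the two inequalities above require.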
We write c n ( j, j ′ ) to denote the number of nodes which are common ancestors of balls j, j ′ and c n ( j) the number of nodes which are ancestors of ball j, including the node containing ball j. Similarly, write c n (u) to be the number of nodes which are ancestors of node u including node u itself. Lastly denote by j ∨ n j ′ the node which is the least common-ancestor of balls j and j ′ ; note if j and j ′ are in the same node then this node is j ∨ n j ′ . Observe that the number of nodes which are ancestors of a ball is one more than the depth c n ( j) = d( j) + 1 and similarly c n ( j, j ′ ) = d( j ∨ n j ′ ) + 1.
After recalling this notation, we can use it to express the probability $y$ in the statement of Lemma 3.3. Observe that the event that the balls $j$ and $j'$ both reach node $v$ can be expressed as $j, j' \ge v$ or equivalently $(j \vee_n j') \ge v$.
Now y was defined as the probability that balls j and j ′ pass to the same child node of node v conditional on the event that both balls reach node v and conditional on node v having split vector We may now also state the required lemma for split trees (this lemma plays a very similar role to the bound proven for ∑ u 1 ,u 2 1[c(u 1 , u 2 ) ≥ ℓ] in the proof of Theorem 1.8 for complete binary trees).

Lemma 3.4. Let j, j ′ be any two distinct balls in the split tree with split vector
Proof. The idea is to establish, using Lemma 3.3, the probability that two balls follow the same path through the tree to some specified level given they followed the same path through the tree to the level before. We condition on {V v } v the set of all split vectors in the split tree. For ℓ ≥ 1 The first term is less than ∑ i (V u i ) 2 by Lemma 3.3. For the second term note the following. If balls j and j ′ have at least ℓ common ancestors then their least common ancestor, the node j ∨ n j ′ must have at least ℓ common ancestors. In particular j ∨ n j ′ itself or a node on the path from j ∨ n j ′ to the root must have precisely ℓ ancestors and so, (3.1) (Another way to see this is that for j and j ′ to have at least ℓ common ancestors there must be some node u which is an ancestor of both j and j ′ such that node u has precisely ℓ ancestors.) Hence we get that where ∑ u p u = 1 and also the p u depend only on split vectors for nodes v with c n (v) < ℓ, i.e. closer to the root than node u and so the p u are independent of the {V w } w : c n (w)=ℓ . We can now calculate the probability that balls j, j ′ have ℓ + 1 ancestors conditioned on having ℓ by taking expectations (over split vectors) and using the tower property of expectations.
where the inequality in the third line follows from (3.2). We are essentially done. Notice that the root is an ancestor of any two balls, so the event c n ( j, j ′ ) ≥ 1 has probability one, which gives our 'base case'. Hence
The previous lemma implies the next proposition almost immediately.
Proposition 3.5. Let C > 0 be any constant and let T n be a split tree with n balls. Then there exists a constant β > 0 such that where the sum is over balls b 1 , b 2 .
Proof. By Lemma 3.4, there exists a constant a < 1 such that for any positive integer ℓ, hence as earlier in the proof of the upper bound in Theorem 1.8 this implies and again since C and a < 1 are constants the sum ∑ ∞ ℓ=1 a ℓ ℓ C converges to a constant, say β = β (a,C) and we are done.
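The convergence of the series used at the end of this proof can be seen from the ratio test. The following short derivation is a sketch that additionally assumes 0 < a < 1 (the text only states a < 1):

```latex
\[
\lim_{\ell\to\infty}\frac{a^{\ell+1}(\ell+1)^{C}}{a^{\ell}\,\ell^{C}}
  = \lim_{\ell\to\infty} a\Bigl(1+\frac{1}{\ell}\Bigr)^{C}
  = a < 1,
\qquad\text{so}\qquad
\beta(a,C)=\sum_{\ell=1}^{\infty} a^{\ell}\,\ell^{C} < \infty .
\]
```

In particular β depends only on a and C and not on n.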
We are now ready to prove our upper bound on the expected number of embeddings.
Proof. (of the upper bound of Theorem 1.9) Fix a digraph H, and we will show that there exists a constant c = c(H) such that It is important to have a strong bound on the likely height of the split tree. We apply Proposition 3.1.
Choose K ′ such that P(h(T n ) > K ′ ln n) ≤ n −|H|−1 . Let B denote the (bad) event that h(T n ) > K ′ ln n, and denote by B c the complement of this event.
Define the random variable X = X (T n ) to be X = ∑ b 1 ,b 2 c n (b 1 , b 2 ) |A 2 | . Observe that, because X is nonnegative and by the law of total expectation, E [X | B c ] ≤ E [X ]/P(B c ) and so, by Proposition 3.5, for n large enough, In particular, by conditioning on B c , the event that the height is at most K ′ ln n, and by Equa-
Figure caption: Colours and shapes of nodes indicate sink (green •), 'ancestor' (blue) and 'common-ancestor' (red) nodes respectively. These labelled directed acyclic graphs appear in variance calculations of R(α) for |α| = 3.

Embeddings: stars are more frequent than other connected digraphs
Having proved some properties of embedding counts for our two classes of trees, complete binary trees and split trees, we show that these imply the desired results on cumulants of the number of appearances of a permutation in the node labellings of complete binary trees and in the ball labellings of split trees, respectively.
Say a sequence of trees T n with n nodes (respectively balls) is explosive if for any fixed acyclic digraph H Ω(n |A 0 | (ln n) |A 1 | ) = [H] T n = o(n |A 0 | (ln n) |A 1 |+1 ).
Thus Section 2 was devoted to showing that complete binary trees are explosive, and Section 3 to showing that split trees are explosive whp. This section proves the cumulant results using only this explosive property of the tree classes. The first result, Proposition 4.1, shows that the number of embeddings of most digraphs we will need to consider is of smaller order than the number of embeddings of a particular digraph, the 'star' S k,r , which we define below. The other result of this section, Lemma 4.2, shows that the number of embeddings of S k,r is asymptotically the same as our extended notion of path length ϒ k r (T n ).
The set G k,r is the set of acyclic digraphs which may be obtained by taking r copies of the path P k and iteratively fusing pairs of vertices together. Likewise, the labelled digraphs H ′ in G ′ k,r are those obtained by fusing together r labelled paths P k , keeping both sets of labels when a pair of vertices is fused. The set G ′ 3,2 is illustrated in Figure 4.
Formally, let G k,r be the set of directed acyclic graphs H on (k − 1)r edges (allowing parallel edges) such that the edge set can be partitioned into r directed paths P 1 , . . . , P r , each on k − 1 edges. For H ∈ G k,r write H ′ for H together with a labelling V 1 , . . . ,V r , where V i is the set of k vertices in P i (note that some vertices have multiple labels). Likewise write G ′ k,r for the set of labelled graphs. Denote by S k, j the digraph formed by taking j copies of the path P k and fusing the j source vertices into a single vertex. We shall refer to this as a star graph, but note that it is a star in the usual sense only if k = 2.

Proposition 4.1. Fix k, r and let H be a connected digraph in the set G k,r . If T n is explosive and H ≠ S k,r then
[H] T n = o([S k,r ] T n ).
Proof. First observe that S k,r has r sink vertices, (k − 2)r ancestor vertices and exactly one common-ancestor vertex. Thus, by the explosive property of T n , [S k,r ] T n = Ω(n r (ln n) (k−2)r ).
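The vertex counts for S k,r quoted above can be reproduced with a small script. This is a sketch under an assumption: the definitions of the three vertex classes are not restated in this section, so the code classifies a non-sink vertex as an 'ancestor' if exactly one sink is reachable from it and as a 'common-ancestor' if at least two are. The helper names `star_digraph` and `classify` are hypothetical.

```python
def star_digraph(k, r):
    """Edge list of S_{k,r}: r directed paths on k vertices sharing the source 'rho'."""
    edges = []
    for i in range(r):
        prev = "rho"
        for j in range(2, k + 1):
            v = (i, j)  # j-th vertex on ray i
            edges.append((prev, v))
            prev = v
    return edges


def classify(edges):
    """Count sinks, ancestors and common-ancestors (assumed definitions, see lead-in)."""
    children = {}
    vertices = set()
    for u, v in edges:
        children.setdefault(u, []).append(v)
        vertices.update((u, v))

    def reachable_sinks(u):
        # A vertex with no out-edges is a sink and reaches only itself.
        if u not in children:
            return {u}
        found = set()
        for w in children[u]:
            found |= reachable_sinks(w)
        return found

    counts = {"sink": 0, "ancestor": 0, "common-ancestor": 0}
    for u in vertices:
        if u not in children:
            counts["sink"] += 1
        elif len(reachable_sinks(u)) == 1:
            counts["ancestor"] += 1
        else:
            counts["common-ancestor"] += 1
    return counts
```

For r ≥ 2 this returns r sinks, (k − 2)r ancestors and one common-ancestor. For r = 1 the digraph is a single path and, under the assumed definitions, has no common-ancestor vertex.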
Now fix H ∈ G k,r \S k,r and fix a labelling V 1 , . . . ,V r on H. Again by the explosive property
We will also need the following lemma in the proof of Proposition 6.1. Recall that the tree parameter ϒ k r (T n ), defined in Equation (1.2), extends the notion of total path length of a tree.

Lemma 4.2. Fix k, r. If T n is explosive then [S k,r ] T n = ϒ k r (T n )(1 + o(1)).
Proof. The star S k,r consists of r directed paths of length k (rays) with their source vertices fused to a common vertex. Let ρ denote the common vertex, and label all other vertices v i, j for 1 ≤ i ≤ r and 2 ≤ j ≤ k, where (ρ, v i,2 , v i,3 , . . . , v i,k ) makes up ray i. As a warm-up we count the number of ways to embed S k,r into a tree T n . Suppose the leaves v 1,k , v 2,k , . . . , v r,k are mapped to u 1 , . . . , u r in T n . Then ρ must be mapped to one of the c(u 1 , . . . , u r ) common ancestors of u 1 , . . . , u r . Having done this, for each i we choose k − 2 vertices between u i and ι(ρ), to which we map v i,2 , . . . , v i,k−1 . So the total number of ways is
We now show that (4.2) is asymptotically ϒ k r (T n ). The directed star S k,r can be constructed by taking r directed paths of length k and fusing their source vertices together to a common vertex. Let F k,r be the set of graphs obtained by taking r directed paths of length k, fusing one non-sink vertex from each path together to a common vertex, and possibly fusing additional pairs of vertices from paths where the vertices were at or above this common vertex. So S k,r ∈ F k,r , but since for k > 2 the common fused vertex need not be the source vertex of each path, there may be many other digraphs in F k,r .
We now count the number of ways to embed H ∈ F k,r into a tree T n . Let ρ denote the common vertex to all paths. Label all other unlabelled vertices v i, j for 1 ≤ i ≤ r and 1 ≤ j ≤ k, where (v i,1 , ρ, v i,3 , . . . , v i,k ) makes up ray i if it was the second vertex of path i that was fused.
Recall that for any H ∈ F k,r the sinks of the paths are not fused. Suppose the sinks/leaves v 1,k , v 2,k , . . . , v r,k are mapped to u 1 , . . . , u r in T n . Then ρ must be mapped to one of the c(u 1 , . . . , u r ) common ancestors of u 1 , . . . , u r . Having done this, for each i we choose k − 2 vertices between the root of T n and u i to which we map v i,2 , . . . , v i,k−1 . (How many of these k − 2 vertices are mapped above and below ι(ρ) depends on which vertex of path i was the common vertex in H.) Thus, However, there are only finitely many digraphs in F k,r and all of these are connected digraphs in the set G k,r . Therefore by Proposition 4.1 and we are done.

Labelling stars
In the proof of Proposition 6.1 where we calculate the moments of the distribution of the number of α that occur in a random labelling of our tree we will consider indicators over small subsets of vertices. A star S k,ℓ can be formed by fusing together ℓ length k paths at their source vertices. For S k,ℓ with a uniform labelling, we calculate the probability each of the ℓ paths is labelled with respect to α in Proposition 5.1.
Proposition 5.1. Let α be a permutation of length k, let S k,ℓ be the digraph defined earlier and let λ : V (S k,ℓ ) → [(k − 1)ℓ + 1] be a uniform random labelling of the vertices of S k,ℓ . Then the probability that every V i induces a labelling of relative order α is,
Proof. First note that for each V i to induce the relative order α, i.e. a 'correct' labelling, there is only one possible label for the root ρ. This is obvious if α 1 = 1 since then the root must receive the label '1'. For general α 1 , each V i \ρ must have α 1 − 1 labels less than the label at the root λ (ρ) and k − α 1 labels greater than λ (ρ); hence we must have λ (ρ) = (α 1 − 1)ℓ + 1. Note that we may choose a uniform labelling λ by first choosing the label at the root λ (ρ) and then choosing uniformly from all labellings of S k,ℓ \ρ with the remaining labels. Thus, as there is only one possible label for the root, the probability it is labelled correctly is ((k−1)ℓ + 1) −1 .
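The probability in Proposition 5.1 can be checked by exhaustive enumeration for small k and ℓ. The sketch below counts, over all labellings of S k,ℓ with the labels 1, . . . , (k − 1)ℓ + 1, those in which every ray has relative order α. It also compares the result with a closed form that is our own completion of the counting argument and is not quoted from the text: once the root label is fixed, the (α 1 − 1)ℓ smaller labels and the (k − α 1 )ℓ larger labels are distributed among the rays, and each ray then admits exactly one correct arrangement. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def relative_order(values):
    """Pattern of a sequence of distinct values: the rank (1-based) of each entry."""
    ranked = sorted(values)
    return tuple(ranked.index(v) + 1 for v in values)


def star_probability_bruteforce(alpha, ell):
    """P(every ray of S_{k,ell} has relative order alpha) under a uniform labelling."""
    k = len(alpha)
    n = (k - 1) * ell + 1
    # Vertex 0 is the root; ray i consists of the root followed by k - 1 private vertices.
    rays = [[0] + list(range(1 + i * (k - 1), 1 + (i + 1) * (k - 1)))
            for i in range(ell)]
    good = 0
    for labels in permutations(range(1, n + 1)):
        if all(relative_order([labels[v] for v in ray]) == tuple(alpha)
               for ray in rays):
            good += 1
    return Fraction(good, factorial(n))


def star_probability_closed_form(alpha, ell):
    """Assumed closed form (our derivation, see lead-in); not taken from the source."""
    k = len(alpha)
    below = alpha[0] - 1   # labels per ray smaller than the root label
    above = k - alpha[0]   # labels per ray larger than the root label
    numerator = factorial(below * ell) * factorial(above * ell)
    denominator = factorial((k - 1) * ell + 1) * (factorial(below) * factorial(above)) ** ell
    return Fraction(numerator, denominator)
```

For example, with α = 213 and ℓ = 2 the root must receive label 3, and 4 of the 120 labellings are correct. The enumeration is factorial in (k − 1)ℓ + 1, so it is only practical for very small cases.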

Cumulants moments
By exploiting only the explosive property of complete binary trees and (whp) of split trees, we will prove the moments result for both classes at once, using Proposition 4.1. In particular, observe that Theorems 1.3 and 1.6 are both implied by Proposition 6.1 together with the lemmas proving that complete binary trees are explosive and that split trees are whp explosive.
To define the constant D α,r used in Proposition 6.1 and Theorems 1.3 and 1.6 we use some basic notation for partitions. We write P(r) for the set of all partitions of [r]; note that {{1}, {2, 3, 4}} and {{2}, {1, 3, 4}} are different partitions of [4]. Given a partition π = {s 1 , . . . , s ℓ } of {1, . . . , r} with set sizes r i = |s i |, we let |π| = ℓ denote the number of parts in π. Recalling that a |α|,ℓ (α) is the constant defined in Proposition 5.1, we may now define D α,r by D α,r = ∑ π∈P(r) (−1) |π|−1 (|π| − 1)! ∏ |π| i=1 a |α|,r i (α).
Proof. We fix a permutation α with |α| = k and an explosive tree T n on n nodes, and consider the random variable X = ∑ U 1[π(U ) ≈ α], where we sum over vertex sets U ⊆ T n of size |U | = |α| which are ordered under the partial ordering of T n , i.e. U = {u 1 , . . . , u k } with u 1 < · · · < u k . In order to calculate the cumulants of X , we use mixed cumulants (see e.g. [11, Section 6.1]). Given a set of random variables X 1 , . . . , X r , we denote the mixed cumulant by κ(X 1 , . . . , X r ). For now, we only need the following properties.
We then have κ r (X ) = κ(X , X , . . . , X ) = κ(∑ U 1[π(U ) ≈ α], . . . , ∑ U 1[π(U ) ≈ α]) = ∑ U 1 ,...,U r κ(1[π(U 1 ) ≈ α], . . . , 1[π(U r ) ≈ α]) by multilinearity of mixed cumulants.
Let {U 1 , . . . ,U r } be a connected family. We can write U i = {u i,1 , . . . , u i,k } with u i,1 < · · · < u i,k for each i. Let H be the graph on vertex set U = U 1 ∪ · · · ∪ U r with an edge from u i, j to u i, j+1 for each i and j < k. The graph H is a connected member of G k,r . As the term κ(1[π(U 1 ) ≈ α], . . . , 1[π(U r ) ≈ α]) only depends on the labels of vertices in U , it is a function of H which we denote by κ(H). Then κ r (X ) = ∑ H∈G k,r connected [H] T n κ(H).
By Proposition 4.1, this sum is dominated by the term corresponding to H = S k,r . We conclude that κ r (X ) = (1 + o(1))[S k,r ] T n κ(S k,r ).
Let V 1 , . . . ,V r denote the vertex sets of the r "rays" of S k,r ; each V i has size k and induces a path of length k, V 1 ∪ · · · ∪V r covers S k,r , and the V i intersect only at the root of S k,r . We have κ(S k,r ) = κ(1[π(V 1 ) ≈ α], . . . , 1[π(V r ) ≈ α]), and need to establish E ∏ j∈I 1[π(V j ) ≈ α] for any I ⊆ [r]. By symmetry, this is determined by the size of I, and so for 1 ≤ ℓ ≤ r, a k,ℓ (α) = E ∏ ℓ j=1 1[π(V j ) ≈ α] is the probability that, under a labelling of S k,ℓ chosen uniformly at random, each ray respects the permutation α, which we calculated in Proposition 5.1. Hence we have κ(S k,r ) = ∑ π∈P(r) (−1) |π|−1 (|π| − 1)! ∏ p∈π E ∏ j∈p 1[π(V j ) ≈ α]. This may now be written as κ(S k,r ) = ∑ π (−1) |π|−1 (|π| − 1)! ∏ p∈π a k,|p| , summing over partitions π of [r], which is the constant D α,r as required.
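The partition sum for κ(S k,r ) = D α,r can be evaluated mechanically for small r. The sketch below enumerates all set partitions of [r] and applies the formula above to a user-supplied function `a(size)` standing for a k,ℓ (α); the helper names are hypothetical. As a worked case it uses α = 12 (so k = 2), where every ray of S 2,ℓ is correctly labelled exactly when the root receives the smallest of the ℓ + 1 labels, giving a 2,ℓ (12) = 1/(ℓ + 1).

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def cumulant_constant(a, r):
    """Sum over partitions pi of [r] of (-1)^(|pi|-1) (|pi|-1)! prod_{p in pi} a(|p|)."""
    total = Fraction(0)
    for partition in set_partitions(list(range(r))):
        parts = len(partition)
        term = Fraction((-1) ** (parts - 1) * factorial(parts - 1))
        for block in partition:
            term *= a(len(block))
        total += term
    return total


def a_increasing_pair(ell):
    """a_{2,ell}(12): probability that the root of S_{2,ell} gets the smallest label."""
    return Fraction(1, ell + 1)
```

With `a_increasing_pair` the first values are 1/2, 1/12, 0 and −1/120 for r = 1, 2, 3, 4. The number of partitions grows as the Bell numbers, so this direct enumeration is intended only for small r.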