1 Introduction

We consider the process of uncovering the vertices of a random tree: starting either from one of the \(n^{n-2}\) unrooted or from one of the \(n^{n-1}\) rooted unordered labeled trees of size n (i.e., with n vertices) chosen uniformly at random, we uncover the vertices one by one in order of their labels. This yields a growing sequence of forests induced by the uncovered vertices, and we are interested in the evolution of these forests from the first vertex to the point that all vertices are uncovered. See Fig. 1 for an illustration of this process on a tree with 100 vertices.

Fig. 1

A few snapshots of the uncover process applied to a random labeled tree of size 100. From left to right and top to bottom, there are 12, 23, 34, ..., 89, and 100 uncovered vertices in the figures, respectively. Vertex labels are omitted for the sake of readability, and vertices are colored per connected component

This model is motivated by stochastic models known as coalescent models for particle coalescence, most notably the additive and the multiplicative coalescent [3] and the Kingman coalescent [13]. To make the distinction between these classical coalescent models and our model more explicit, let us briefly revisit the additive coalescent model (see [2]) as a prototypical example. This model describes a Markov process on a state space consisting of tuples \((x_1, x_2, \ldots )\) with \(x_1 \ge x_2 \ge \cdots \ge 0\) and \(\sum _{i\ge 1} x_i = 1\) that model the fragmentation of a unit mass into clusters of mass \(x_i\). Pairs of clusters with masses \(x_i\) and \(x_j\) then merge into a new cluster of mass \(x_i + x_j\) at rate \(x_i + x_j\). In the corresponding discrete time version of the process, exactly two clusters are merged in every time step. There are various combinatorial settings in which this discrete additive coalescent model appears, for example in the evolution of parking blocks in parking schemes related to “hashing with linear probing” [6] and in a certain scheme for merging forests by uncovering one edge in every time step [20]. There is also a rich literature on random (edge) cutting of trees, starting with the work of Meir and Moon [17], see also [1, 7, 11, 15]. Fragmentation processes on trees have also been studied extensively in a continuous setting, see, e.g., [4, 18, 19, 25]. Lastly, the special case of our model in which the underlying tree is always a path (with random labels) rather than a random labeled tree was considered by Janson in [12].

While the incarnation of the additive coalescent in which edges are uncovered successively is very much related in spirit to our vertex uncover model, the underlying processes are fundamentally different: these classical coalescent models rely on the fact that exactly two clusters are merged in every time step, which is not the case in our model. When uncovering a new vertex, a more or less arbitrary number of edges (including none at all) can be uncovered. There are coalescent models like the \(\Lambda \)-coalescent, a generalization of the Kingman coalescent [21], which allow for more than two clusters being merged—however, at present, we are not aware of any known coalescent model that is able to capture the behavior of the vertex uncover process.

Overview. Different aspects of the uncover process on labeled trees are investigated in this work. In Sect. 2, we study the stochastic process given by the number of uncovered edges. The corresponding main result, a full characterization of the process and its limiting behavior, is given in Theorem 3.

Sections 3 and 4 are both concerned with cluster sizes, i.e., with the sizes of the connected components that are created throughout the process. In particular, in Sect. 3, we shift our attention to rooted labeled trees in order to study the behavior of the component containing a fixed vertex. The expected size of the root cluster is analyzed in Theorem 7. Furthermore, we show that the number of rooted trees whose root cluster has a given size is given by a rather simple enumeration formula—which, in turn, leads to Theorem 9, a characterization of the different limiting distributions for the root cluster size depending on the number of uncovered vertices.

Finally, in Sect. 4, we use the results on the root cluster to draw conclusions regarding the size of the largest cluster in the tree.

Notation. Throughout this work, we use the notation \([n] = \{1,\ldots ,n\}\) and \([k, \ell ] = \{k, k+1, \ldots , \ell \}\) for discrete intervals, and \(x^{\underline{j}} = x (x-1) \cdots (x-j+1)\) for the falling factorials. The floor and ceiling functions are denoted by \(\lfloor x \rfloor \) and \(\lceil x \rceil \), respectively. Furthermore, we use \(\mathcal {T}\) and \(\mathcal {T}^{\bullet }\) for the combinatorial classes of labeled trees and rooted labeled trees, respectively, and \(\mathcal {T}_n\) and \(\mathcal {T}^{\bullet }_n\) for the classes of labeled and rooted labeled trees of size n, i.e., with n vertices. Finally, we use \(X_n \xrightarrow {d} X\) and \(X_n \xrightarrow {p} X\) to denote convergence in distribution resp. probability of a sequence of random variables (r.v.) \((X_{n})_{n\ge 0}\) to the r.v. X.

2 The Number of Uncovered Edges

In this section, our main interest is the behavior of the number of uncovered edges in the uncover process. We begin by introducing a formal parameter for this quantity.

Definition 1

Let T be a labeled tree with vertex set \(V(T) = [n]\). For \(1\le j \le n\), we let \(k_j(T) \,{:=}\,\Vert T[1, 2, \ldots , j] \Vert \) denote the number of edges in the subgraph of T induced by the vertices in [j]. We refer to the sequence \((k_j(T))_{1\le j\le n}\) as the (edge) uncover sequence.
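For concreteness, the uncover sequence is easy to compute from an edge list. The following short Python sketch (our own illustration, not part of the formal development; all helper names are ours) samples a uniform labeled tree via a random Prüfer sequence and returns \((k_j(T))_{1\le j\le n}\).

```python
import heapq
import random

def random_labeled_tree(n, rng=random):
    """Sample a uniform labeled tree on {1, ..., n} by decoding a random Pruefer sequence."""
    if n < 2:
        return []
    pruefer = [rng.randrange(1, n + 1) for _ in range(n - 2)]
    degree = [1] * (n + 1)                       # degree[0] is unused
    for v in pruefer:
        degree[v] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in pruefer:
        leaf = heapq.heappop(leaves)             # smallest current leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def uncover_sequence(n, edges):
    """k_j(T) for j = 1, ..., n: number of edges with both endpoints among the first j labels."""
    return [sum(1 for a, b in edges if a <= j and b <= j) for j in range(1, n + 1)]

if __name__ == "__main__":
    T = random_labeled_tree(10)
    print(T)
    print(uncover_sequence(10, T))               # starts with 0 and ends with 9 = n - 1
```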

We start with a few simple observations. First, for any labeled tree of order n, we have \(k_1(T) = 0\), as well as \(k_n(T) = n-1\). Second, as any induced subgraph of a tree is a forest, and as forests have the elementary property that the number of edges together with the number of connected components gives the number of vertices in the forest, we find that \(j - k_j(T)\) is the number of connected components after uncovering the first j vertices of T. Figure 2 illustrates the progression of the number of edges and the number of connected components for \(1 \le j\le 1000\) in a randomly chosen labeled tree on 1000 vertices.

Fig. 2

Progression of the number of edges (left) and the number of connected components (right) when sequentially uncovering a random labeled tree on 1000 vertices

Moreover, the fact that the first j vertices of the tree induce a forest also yields the sharp bound \(0 \le k_j(T) \le j-1\) for all \(1\le j \le n-1\). The lower bound is attained by the star with central vertex n, and the upper bound is attained by the (linearly ordered) path. We can also observe that as soon as \(k_{n-1}(T) > 0\), the set of edges added in the last uncover step is not determined uniquely. Thus, the star with central vertex n is the only tree that is fully determined by its uncover sequence.

The following theorem provides an explicit formula for the (multivariate) generating function that tracks the number of edge increments over specified discrete time intervals of the uncover process.

Theorem 1

Let r be a fixed positive integer with \(1\le r < n-1\), and let \(j_1\), \(j_2\), ..., \(j_r\) be positive integers with \(1< j_1< j_2< \cdots< j_r < n\). Additionally, let \(j_0 = 1\). Then, the multivariate generating function tracking the number of uncovered edges when uncovering vertices in \([j_i + 1, j_{i+1}]\) for \(0\le i < r\) in the edge uncover process is given by

$$\begin{aligned} E_n(z_1, z_2, \ldots , z_r) = n^{n-j_r-1} \prod _{i=1}^{r} \Big ( n-j_r + j_iz_i + \sum _{h=i+1}^r (j_h-j_{h-1})z_h \Big )^{j_i - j_{i-1}}. \end{aligned}$$
(1)

In other words, given a non-decreasing sequence of non-negative integers \(0 = a_0 \le a_1 \le \cdots \le a_r \le n-1\) with \(a_i < j_i\) for all \(1\le i\le r\), the coefficient of the monomial \(z_1^{a_1} z_2^{a_2 - a_1} \ldots z_r^{a_r - a_{r-1}}\) in the expansion of \(E_n(z_1, \ldots , z_r)\) is the number of labeled trees T of order n with \(k_{j_i}(T) = a_i\) for all \(1\le i \le r\).

Remark 2

By specifying the integers \(j_1\), \(j_2\), ..., \(j_r\), the uncover process is effectively partitioned into intervals. This is also reflected by the quantities occurring in the product in (1): the difference \(j_i - j_{i-1}\) corresponds to the number of vertices uncovered in the i-th interval, \(j_i\) represents the total number of vertices uncovered up to and including the i-th interval, and \(n - j_r\) corresponds to the number of vertices that are only uncovered after the last interval.
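For small n, the statement of Theorem 1 can also be checked mechanically: expand the right-hand side of (1) and compare its coefficients with a brute-force tally over all \(n^{n-2}\) Prüfer sequences. A possible sketch of ours (assuming SymPy is available; all function names are illustrative) for \(r = 2\):

```python
from collections import Counter
from itertools import product
import sympy as sp

def decode_pruefer(seq, n):
    """Edge list of the labeled tree encoded by the Pruefer sequence seq over {1, ..., n}."""
    degree = [1] * (n + 1)
    for v in seq:
        degree[v] += 1
    edges, avail = [], set(range(1, n + 1))
    for v in seq:
        leaf = min(u for u in avail if degree[u] == 1)
        edges.append((leaf, v))
        avail.discard(leaf)
        degree[v] -= 1
    edges.append(tuple(sorted(avail)))
    return edges

def k_j(edges, j):
    """Number of edges with both endpoints in {1, ..., j}."""
    return sum(1 for a, b in edges if a <= j and b <= j)

n, j1, j2 = 6, 3, 5                               # r = 2, with 1 < j1 < j2 < n and j0 = 1
z1, z2 = sp.symbols("z1 z2")
E = sp.expand(n**(n - j2 - 1)
              * (n - j2 + j1 * z1 + (j2 - j1) * z2)**(j1 - 1)    # factor i = 1 of (1)
              * (n - j2 + j2 * z2)**(j2 - j1))                   # factor i = 2 of (1)

counts = Counter()
for seq in product(range(1, n + 1), repeat=n - 2):               # all n^(n-2) labeled trees
    T = decode_pruefer(seq, n)
    counts[(k_j(T, j1), k_j(T, j2))] += 1

assert sum(counts.values()) == n**(n - 2) == E.subs({z1: 1, z2: 1})
for (a1, a2), c in counts.items():
    assert E.coeff(z1, a1).coeff(z2, a2 - a1) == c
print("coefficients of E_n(z1, z2) match the brute-force counts")
```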

Proof of Theorem 1

We begin by observing that when the process uncovers the vertex with label j, edges to all adjacent vertices whose label is less than j are uncovered as well. To determine the generating function of the edge increments, we assign the weight \(x_i y_j\) to the edge connecting vertex i and vertex j with \(i < j\), and consider the generating function \(\sum _{|T| = n} w(T)\) of the tree weights w(T), where w(T) is defined as the product of the edge weights; the function \(E_n(z_1,\ldots , z_r)\) is then obtained from it by a suitable substitution.

Following Martin and Reiner [16, Theorem 4] or Remmel and Williamson [22, Equation (8)], the generating function of the tree weights w(T) has the explicit formula

$$\begin{aligned} \sum _{|T| = n} w(T) = x_1 y_n \prod _{j=2}^{n-1} \Big ( \sum _{i=1}^n x_{\min (i,j)} y_{\max (i,j)} \Big ). \end{aligned}$$
(2)

As initially observed, edges that are counted by \(k_{j_i}(T)\) are precisely those that induce a factor \(y_{\ell }\) for some \(\ell \le j_i\). Thus we make the following substitutions: \(x_{\ell } = 1\) for all \(\ell \), \(y_{\ell } = z_i\) if and only if \(j_{i-1} < \ell \le j_i\) (where \(j_0 = 1\)), and \(y_{\ell } = 1\) if \(\ell > j_r\). To deal with the sum over \(y_{\max (i,j)}\), observe that we can rewrite it as

$$\begin{aligned} \sum _{i=1}^{n} y_{\max (i, j)} = n - j_r + \sum _{i=1}^{j_1} y_{\max (i, j)} + \cdots + \sum _{i=j_{r-1}+1}^{j_r} y_{\max (i, j)}. \end{aligned}$$

In this form, the different values assumed by the sum when j moves through the ranges \(1 < j\le j_1\), \(j_1 < j \le j_2\), etc. can be determined directly. For some j with \(j_{i-1} < j \le j_{i}\), the contribution to the product in (2) is

$$\begin{aligned} n-j_r + j_iz_i + \sum _{h=i+1}^r (j_h-j_{h-1})z_h, \end{aligned}$$

and for \(j_r < j \le n-1\) all y-variables are replaced by 1, so that the contribution to the product is a factor n. Putting both of these observations together shows that the right-hand side of (2) can be rewritten as the right-hand side of (1) and thus proves the theorem. \(\square \)

With a formula for the generating function of edge increments in the uncover process at our disposal, an explicit formula for the number of trees with given (partial) uncover sequence follows as a simple consequence.

Corollary 2

In the setting of Theorem 1, the number of labeled trees T of order n that satisfy \(k_{j_i}(T) = a_i\) for all \(1\le i\le r\) is given by

$$\begin{aligned} (n-j_r)^{j_r-a_r-1}n^{n-j_r-1} \times \prod _{i=1}^r \bigg ( \sum _{h=0}^{a_i-a_{i-1}} \left( {\begin{array}{c}j_{i-1}-a_{i-1}-1\\ h\end{array}}\right) \left( {\begin{array}{c}j_i-j_{i-1}\\ a_i-a_{i-1}-h\end{array}}\right) (j_i-j_{i-1})^h j_i^{a_i-a_{i-1}-h} \bigg ). \end{aligned}$$
(3)

Furthermore, there are

$$\begin{aligned} \prod _{i=1}^{n-2} \biggl (\left( {\begin{array}{c}i - a_i - 1\\ a_{i+1} - a_i - 1\end{array}}\right) (i+1) + \left( {\begin{array}{c}i - a_i - 1\\ a_{i+1} - a_i\end{array}}\right) \biggr ) \end{aligned}$$
(4)

trees with a fully specified uncover sequence \((0, a_2, a_3, \ldots , a_{n-1}, n-1)\).

Proof

The formulas follow from extracting the coefficient of \(z_1^{a_1}z_2^{a_2-a_1} \cdots z_r^{a_r-a_{r-1}}\) from the corresponding generating function (1), which is done step by step, starting with \(z_1\). \(\square \)
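Formula (4) lends itself to a similar brute-force check for small n; the sketch below (ours, purely illustrative) tallies the uncover sequences of all labeled trees of size 6 and compares the counts with (4).

```python
from collections import Counter
from itertools import product
from math import comb

def decode_pruefer(seq, n):
    """Edge list of the labeled tree encoded by the Pruefer sequence seq over {1, ..., n}."""
    degree = [1] * (n + 1)
    for v in seq:
        degree[v] += 1
    edges, avail = [], set(range(1, n + 1))
    for v in seq:
        leaf = min(u for u in avail if degree[u] == 1)
        edges.append((leaf, v))
        avail.discard(leaf)
        degree[v] -= 1
    edges.append(tuple(sorted(avail)))
    return edges

def uncover_sequence(edges, n):
    return tuple(sum(1 for a, b in edges if a <= j and b <= j) for j in range(1, n + 1))

def C(m, h):
    """Binomial coefficient, extended by 0 for a negative lower index."""
    return comb(m, h) if h >= 0 else 0

def count_by_formula(a):
    """Right-hand side of (4) for a full uncover sequence a = (a_1, ..., a_n)."""
    n = len(a)
    result = 1
    for i in range(1, n - 1):                    # i = 1, ..., n-2; a[i-1] is a_i
        result *= C(i - a[i - 1] - 1, a[i] - a[i - 1] - 1) * (i + 1) + C(i - a[i - 1] - 1, a[i] - a[i - 1])
    return result

n = 6
tally = Counter(uncover_sequence(decode_pruefer(seq, n), n)
                for seq in product(range(1, n + 1), repeat=n - 2))
assert all(count_by_formula(a) == c for a, c in tally.items())
print(f"formula (4) matches the tally for all {len(tally)} uncover sequences, n = {n}")
```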

2.1 A Closer Look at the Stochastic Process

The exceptionally nice formula for the generating function of edge increments can be used to study the stochastic process that describes the number of uncovered edges in more detail. Let the sequence of random variables \((K_j^{(n)})_{1\le j\le n}\) be the discrete stochastic process modeling the number of uncovered edges after uncovering the first j vertices in a labeled tree of size n chosen uniformly at random. The expected number of uncovered edges can be determined by a simple argument: with j uncovered vertices, \(\left( {\begin{array}{c}j\\ 2\end{array}}\right) \) of the \(\left( {\begin{array}{c}n\\ 2\end{array}}\right) \) possible positions for the edges have been uncovered. As every position is, due to symmetry and the uniform choice of the labeled tree, equally likely to hold one of the \(n-1\) edges, we find

$$\begin{aligned} \mathbb {E}K_j^{(n)} = (n-1) \frac{\left( {\begin{array}{c}j\\ 2\end{array}}\right) }{\left( {\begin{array}{c}n\\ 2\end{array}}\right) } = \frac{j(j-1)}{n}. \end{aligned}$$
(5)

To motivate our investigations further, consider the illustrations in Fig. 3. The rescaled deviation from the mean is reminiscent of a stochastic process known as a Brownian bridge.

In order to define this process formally, recall first that the Wiener process \((W(t))_{t\in [0,1]}\) is the unique stochastic process that satisfies

  • \(W(0) = 0\),

  • W has independent, stationary increments,

  • \(W(t) \sim \mathcal {N}(0, t)\) for all \(t > 0\), and

  • \(t\mapsto W(t)\) is almost surely continuous,

see [14, Definition 21.8]. A Brownian bridge can then be defined by setting

$$\begin{aligned} B(t) = (1-t) W(t/(1-t)), \end{aligned}$$
(6)

see, e.g., [23, Exercise 3.10]. The term “bridge” results from the fact that we have \(B(0) = B(1) = 0\).

While the (normalized) deviation from the mean looks like it might converge to a Brownian bridge, we will prove that this is only almost the case. The following theorem characterizes the stochastic process. For technical purposes, we set \(K_0^{(n)} = 0\) and introduce the linearly interpolated process \(({\tilde{K}}_t^{(n)})_{t\in [0,1]}\), where

$$\begin{aligned} {\tilde{K}}_t^{(n)}:= (1 + \lfloor tn\rfloor - tn) K_{\lfloor tn\rfloor }^{(n)} + (tn - \lfloor tn\rfloor ) K_{\lceil tn\rceil }^{(n)}, \end{aligned}$$
(7)

which by construction has continuous paths.

Fig. 3

A path of the rescaled stochastic process \((K_{\lfloor tn\rfloor }^{(n)}/(n-1))_{t\in [0,1]}\) (left-hand side) and the corresponding (rescaled) deviation \(\big (\frac{{\tilde{K}}_{t}^{(n)} - t^2 n}{\sqrt{n}}\big )_{t\in [0,1]}\) for a random labeled tree of size \(n = 10000\)

Theorem 3

Let \((Z^{(n)}(t))_{t\in [0,1]}\) be the continuous stochastic process resulting from centering and rescaling the linearly interpolated process \(({\tilde{K}}_{t}^{(n)})_{t\in [0,1]}\) in the form of

$$\begin{aligned} Z^{(n)}(t) \,{:=}\,\frac{{\tilde{K}}_{t}^{(n)} - t^2 n}{\sqrt{n}}, \end{aligned}$$

and let \((W(t))_{t\in [0,1]}\) be the standard Wiener process. Then, for \(n\rightarrow \infty \), the rescaled process converges weakly with respect to the \(\sup \)-norm on C([0, 1]) to a limiting process \(Z^{\infty }(t)\) that is given by

$$\begin{aligned} Z^{\infty }(t) = (1-t) W\big (t^2 / (1-t)\big ). \end{aligned}$$
(8)

Furthermore, for s, \(t\in [0,1]\) with \(s < t\), the limiting process satisfies

$$\begin{aligned} \mathbb {E}Z^{\infty }(t) = 0,\quad \mathbb {V}Z^{\infty }(t) = t^2 (1 - t), \quad \text { and }\quad {{\,\textrm{Cov}\,}}(Z^{\infty }(s), Z^{\infty }(t)) = s^2 (1 - t). \end{aligned}$$
(9)

Remark 3

While the limiting process \((Z^{\infty }(t))_{t\in [0,1]}\) is not a Brownian bridge (the corresponding variances and covariances as given in (9) do not match), it is closely related. Comparing the characterization of \(Z^{\infty }(t)\) in (8) to (6), we see that the processes only differ by the square in the numerator of the argument of the Wiener process.
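The covariance structure (9) can be illustrated by simulation: sample many uniform labeled trees, evaluate the rescaled process at two fixed times, and compare the empirical second moments with \(t^2(1-t)\) and \(s^2(1-t)\). A rough sketch of ours (it ignores the linear interpolation in (7), which only contributes terms of order \(n^{-1/2}\)):

```python
import bisect
import heapq
import random

def random_tree_edge_maxima(n, rng=random):
    """Sorted list of max-endpoints of the edges of a uniform labeled tree on {1, ..., n}."""
    pruefer = [rng.randrange(1, n + 1) for _ in range(n - 2)]
    degree = [1] * (n + 1)
    for v in pruefer:
        degree[v] += 1
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)
    maxima = []
    for v in pruefer:
        leaf = heapq.heappop(leaves)
        maxima.append(max(leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    maxima.append(max(heapq.heappop(leaves), heapq.heappop(leaves)))
    maxima.sort()
    return maxima

def Z(n, times, rng=random):
    """Approximate Z^(n)(t) for one random tree; K_j is the number of edge maxima <= j."""
    maxima = random_tree_edge_maxima(n, rng)
    return [(bisect.bisect_right(maxima, int(t * n)) - t * t * n) / n**0.5 for t in times]

if __name__ == "__main__":
    n, trials, s, t = 1000, 3000, 0.5, 0.75
    samples = [Z(n, (s, t)) for _ in range(trials)]
    ms = sum(a for a, _ in samples) / trials
    mt = sum(b for _, b in samples) / trials
    var_t = sum((b - mt) ** 2 for _, b in samples) / (trials - 1)
    cov_st = sum((a - ms) * (b - mt) for a, b in samples) / (trials - 1)
    print("Var Z(t):       ", round(var_t, 4), " vs t^2(1-t) =", round(t * t * (1 - t), 4))
    print("Cov(Z(s), Z(t)):", round(cov_st, 4), " vs s^2(1-t) =", round(s * s * (1 - t), 4))
```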

Two main ingredients are required to prove this result (cf. [14, Theorem 21.38]): the fact that the sequence of stochastic processes is tight on the one hand, and information on the finite-dimensional joint distributions of \(({\tilde{K}}_t^{(n)})_{t\in [0,1]}\) on the other hand.

By Prohorov’s theorem ([14, Theorem 13.29]), tightness is equivalent to the sequence being weakly relatively sequentially compact. We prove that this is the case by checking Kolmogorov’s criterion [14, Theorem 21.42] for which we have to verify that the family of initial distributions \(({\tilde{Z}}^{(n)}(0))_{n\in \mathbb {Z}_{\ge 0}}\) is tight and that the paths of \((Z^{(n)}(t))_{t\in [0,1]}\) cannot change too fast.

Let us begin with some probabilistic observations that will be very useful in the proof. Consider the generating function derived in (1). Given Cayley’s well-known tree enumeration formula, the corresponding probability generating function for the complete uncover sequence, i.e., when we choose our integer vector as \(\textbf{j} = (2, 3, \ldots , n-1)\), is

$$\begin{aligned} P_n(z_2, \ldots , z_{n-1}) = \prod _{i=2}^{n-1} \left( \frac{1}{n} + \frac{i}{n}z_i + \sum _{h=i+1}^{n-1}\frac{1}{n} z_h\right) . \end{aligned}$$
(10)

This suggests modeling the process with \(n-2\) independent random variables, each representing an edge increment. The factorization suggests that the j-th increment (which corresponds to the factor with \(i = j+1\)) happens with probability \((j+1)/n\) when the vertex with label \(j+1\) is uncovered, or with probability 1/n every time one of the subsequent vertices is uncovered. This probabilistic point of view can be used to construct a recursive characterization for the number of uncovered edges, namely

$$\begin{aligned} K_{j+1}^{(n)} = K_{j}^{(n)} + {\text {Ber}}\Bigl (\frac{j+1}{n}\Bigr ) + {\text {Bin}}\Bigl (j - 1 - K_j^{(n)}, \frac{1}{n-j}\Bigr ). \end{aligned}$$
(11)

The Bernoulli variable indicates whether the j-th edge increment is added when uncovering the vertex with label \(j+1\), and the binomial variable models all of the remaining, not yet uncovered edge increments.

We can actually use a similar approach to determine an explicit characterization of the distribution of \(K_j^{(n)}\). Instead of constructing the probability generating function for a complete uncover sequence, we consider the probability generating function for \(K_j^{(n)}\), obtained by normalizing the generating function from (1) for \(r = 1\) and \(j_1 = j\). We find

$$\begin{aligned} P_n(z) = \frac{n^{n-j-1} (n-j + jz)^{j-1}}{n^{n-2}} = \Bigl (\frac{n-j}{n} + \frac{j}{n} z\Bigr )^{j-1}, \end{aligned}$$

which is exactly the probability generating function of a binomially distributed random variable. This proves \(K_j^{(n)} \sim {\text {Bin}}(j-1, j/n)\) and can, for example, be used to determine the variance of \(K_j^{(n)}\) as

$$\begin{aligned} \mathbb {V}K_j^{(n)} = \frac{(n-j)(j-1)j}{n^2}. \end{aligned}$$
(12)
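Both the recursion (11) and the binomial law \(K_j^{(n)} \sim {\text {Bin}}(j-1, j/n)\) are easy to cross-check numerically; the following sketch (ours) simulates (11) and compares the empirical mean and variance of \(K_j^{(n)}\) with (5) and (12).

```python
import random
from statistics import mean, variance

def sample_path(n, rng=random):
    """One path (K_1, ..., K_n) generated by the recursion (11)."""
    K = [0]                                                                # K_1 = 0
    for j in range(1, n):
        k = K[-1]
        step = int(rng.random() < (j + 1) / n)                             # Ber((j+1)/n)
        step += sum(rng.random() < 1 / (n - j) for _ in range(j - 1 - k))  # Bin(j-1-K_j, 1/(n-j))
        K.append(k + step)
    return K

if __name__ == "__main__":
    n, j, trials = 50, 20, 20000
    samples = [sample_path(n)[j - 1] for _ in range(trials)]               # value of K_j
    print("mean:", mean(samples), " vs j(j-1)/n =", j * (j - 1) / n)
    print("var :", variance(samples), " vs (n-j)(j-1)j/n^2 =", (n - j) * (j - 1) * j / n**2)
```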

Now let us consider a centered and rescaled version of the process \((K_j^{(n)})_{1\le j\le n}\) by defining

$$\begin{aligned} Y_j^{(n)} \,{:=}\,\frac{K_j^{(n)} - \frac{j(j-1)}{n}}{n-j}. \end{aligned}$$
(13)

With the help of the recursive description in (11), we can show that \((Y_j^{(n)})_{1\le j\le n-1}\) is a martingale by computing

$$\begin{aligned} \begin{aligned} \mathbb {E}(Y_{j+1}^{(n)} | Y_j^{(n)})&= \frac{\mathbb {E}(K_{j+1}^{(n)} | K_j^{(n)})}{n-j-1} - \frac{j(j+1)}{n (n-j-1)}\\ {}&= \frac{K_j^{(n)} + \frac{j+1}{n} + \frac{j-1- K_j^{(n)}}{n-j}}{n-j-1}-\frac{j(j+1)}{n (n-j-1)}\\ {}&= \frac{K_j^{(n)}}{n-j} - \frac{(j-1)j}{n(n-j)} = Y_j^{(n)}. \end{aligned} \end{aligned}$$

As an immediate consequence of the construction of \(Y_j^{(n)}\) and (12) we find

$$\begin{aligned} \mathbb {V}Y_j^{(n)} = \frac{\mathbb {V}K_j^{(n)}}{(n-j)^2} = \frac{(j-1)j}{(n-j) n^2}. \end{aligned}$$

Let us now consider the behavior of the finite-dimensional distributions.

Lemma 4

Let r be a fixed positive integer, and let \(\textbf{t} = (t_1,\ldots , t_r)\in (0,1)^r\) with \(t_1< t_2< \cdots < t_r\). Then for \(n\rightarrow \infty \), the random vector

$$\begin{aligned} \textbf{K}_{\lfloor \textbf{t} n\rfloor }^{(n)} \,{:=}\,\left( K_{\lfloor t_1 n\rfloor }^{(n)}, K_{\lfloor t_2 n\rfloor }^{(n)}, \ldots , K_{\lfloor t_r n\rfloor }^{(n)}\right) \end{aligned}$$

converges, after centering and rescaling, for \(n\rightarrow \infty \) in distribution to a multivariate normal distribution,

$$\begin{aligned} \frac{\textbf{K}_{\lfloor \textbf{t} n\rfloor }^{(n)} - \mathbb {E}\textbf{K}_{\lfloor \textbf{t} n\rfloor }^{(n)}}{\sqrt{n}} \xrightarrow [n\rightarrow \infty ]{d} \mathcal {N}(\textbf{0}, \Sigma ), \end{aligned}$$

where the expectation vector \(\mathbb {E}\textbf{K}_{\lfloor \textbf{t} n\rfloor }^{(n)}\) satisfies

$$\begin{aligned} \mathbb {E}\textbf{K}_{\lfloor \textbf{t} n\rfloor }^{(n)} = n(t_1^2, t_2^2, \ldots , t_r^2) + \mathcal {O}(1), \end{aligned}$$
(14)

and the entries of the variance–covariance matrix \(\Sigma = (\sigma _{i,j})_{1\le i,j\le r}\) are

$$\begin{aligned} \sigma _{i,j} = {\left\{ \begin{array}{ll} t_i^2 (1 - t_j) &{} \text { if } i \le j,\\ t_j^2 (1 - t_i) &{} \text { if } i > j. \end{array}\right. } \end{aligned}$$
(15)

Proof

As a consequence of Theorem 1 and Cayley’s well-known enumeration formula for labeled trees of size n, we find that the probability generating function of the number of edge increments after \(1< j_1< j_2< \cdots< j_r < n\) steps, respectively, is given by

$$\begin{aligned} P_n(z_1, z_2, \ldots , z_r)&= \frac{E_n(z_1, z_2, \ldots , z_r)}{n^{n-2}} \nonumber \\&= \prod _{i=1}^r \Bigl (1 - \frac{j_r}{n} + \frac{j_i}{n}z_i + \sum _{h=i+1}^r \frac{j_h - j_{h-1}}{n} z_h\Bigr )^{j_i - j_{i-1}}, \end{aligned}$$
(16)

where \(j_0 = 1\) for the sake of convenience. Given that this probability generating function factors nicely, we could use a general result like the multidimensional quasi-power theorem (cf. [10]) to prove that the corresponding random vector

$$\begin{aligned} \Delta _{\textbf{j}}^{(n)} = (K_{j_1}^{(n)}, K_{j_2}^{(n)} - K_{j_1}^{(n)}, \ldots , K_{j_r}^{(n)} - K_{j_{r-1}}^{(n)}) \end{aligned}$$

converges, after suitable rescaling, to a multivariate Gaussian limiting distribution. However, there is a simple probabilistic argument: Observe that \(\Delta _{\textbf{j}}^{(n)}\) can be seen as a marginal distribution of the sum of r independent, multinomially distributed random vectors: write \(t_i = j_i/n\) and consider \(M_i \sim {\text {Multi}}(j_i - j_{i-1}, \textbf{p}_i)\) where

$$\begin{aligned} \textbf{p}_i = (p_{i,0}, p_{i,1}, \ldots , p_{i, r}) \in [0,1]^{r+1} \quad \text {such that} \quad p_{i,h} = {\left\{ \begin{array}{ll} 1 - t_r &{} \text { if } h = 0,\\ 0 &{} \text { if } 0< h < i,\\ t_i &{} \text { if } h = i,\\ t_{h} - t_{h-1} &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
(17)

By construction, the probability generating function of \(M_i\) is then given by

$$\begin{aligned} \Bigl ((1 - t_r)z_0 + t_i z_i + \sum _{h=i+1}^{r} (t_h - t_{h-1}) z_h \Bigr )^{j_i - j_{i-1}}, \end{aligned}$$

so that the probability generating function of the sum \(M_1 + \cdots + M_r\) is a product that is very similar (and actually equal if we set \(z_0 = 1\), which corresponds to marginalizing out the first component) to (16). In order to make the following arguments formally easier to read, and as the first component is not relevant for us at all, we slightly abuse notation and let \(M_i\) for \(1\le i\le r\) denote the corresponding marginalized multinomial distributions instead.

For the sake of convenience, we make a slight adjustment: instead of fixing the integer vector \(\textbf{j} = (j_1, \ldots , j_r)\), we fix \(\textbf{t} = (t_1, \ldots , t_r)\) with \(0< t_1< \cdots< t_r < 1\) and define \(\textbf{j} = \lfloor \textbf{t} n\rfloor \). Here, n is considered to be sufficiently large so that the conditions for the corresponding integer vector, \(1< \lfloor t_1 n\rfloor< \cdots< \lfloor t_r n\rfloor < n\), are still satisfied.

By the multivariate central limit theorem, it is well known that a multinomially distributed random vector \(M\sim {\text {Multi}}(n, \textbf{p})\) converges, for \(n\rightarrow \infty \) and after appropriate scaling, in distribution to a multivariate normal distribution,

$$\begin{aligned} \frac{M - n\textbf{p}}{\sqrt{n}} \xrightarrow {d} \mathcal {N}(\textbf{0}, {{\,\textrm{diag}\,}}(\textbf{p}) - \textbf{p}^{\top } \textbf{p}). \end{aligned}$$
(18)

As a consequence, we find that

$$\begin{aligned} \frac{\Delta _{\lfloor n\textbf{t}\rfloor }^{(n)} - \mathbb {E}\Delta _{\lfloor n\textbf{t}\rfloor }^{(n)}}{\sqrt{n}}&= \frac{(M_1 + \cdots + M_r) - \mathbb {E}(M_1 + \cdots + M_r)}{\sqrt{n}} \\&= (\sqrt{t_1} + \mathcal {O}(n^{-1})) \frac{M_1 - \mathbb {E}M_1}{\sqrt{\lfloor t_1 n\rfloor }} \\&\quad + \cdots + (\sqrt{t_r - t_{r-1}} + \mathcal {O}(n^{-1}))\frac{M_r - \mathbb {E}M_r}{\sqrt{\lfloor t_r n\rfloor - \lfloor t_{r-1} n\rfloor }}\\&\xrightarrow [n\rightarrow \infty ]{d} \sqrt{t_1}\mathcal {N}(\textbf{0}, \Sigma _1) + \cdots + \sqrt{t_r - t_{r-1}} \mathcal {N}(\textbf{0}, \Sigma _r)\\&= \mathcal {N}(\textbf{0}, t_1\Sigma _1 + \cdots + (t_r - t_{r-1})\Sigma _r), \end{aligned}$$

where the variance–covariance matrices are given by

$$\begin{aligned} \Sigma _j = {{\,\textrm{diag}\,}}(\textbf{p}_j) - \textbf{p}_j^{\top } \textbf{p}_j. \end{aligned}$$

By a straightforward (linear) transformation consisting of taking partial sums, the random vector of increments \(\Delta _{\lfloor \textbf{t}n\rfloor }^{(n)}\) can be transformed into \(\textbf{K}_{\lfloor \textbf{t}n\rfloor }^{(n)}\). This proves that \(\textbf{K}_{\lfloor \textbf{t}n\rfloor }^{(n)}\) converges, after centering and rescaling, to a multivariate normal distribution.

The entries of the corresponding variance–covariance matrix can either be determined mechanically from the entries of \(t_1 \Sigma _1 + \cdots + (t_r - t_{r-1}) \Sigma _r\) by taking the partial summation into account, or alternatively, our observations concerning the martingale \(Y_j^{(n)}\) can be used. In particular, using (13), we find, for fixed \(s,t\in [0,1]\) with \(s < t\), that

$$\begin{aligned} \begin{aligned} {{\,\text {Cov}}}\left( \frac{K_{\lfloor sn\rfloor }^{(n)} - \mathbb {E}K_{\lfloor sn\rfloor }^{(n)}}{\sqrt{n}}, \frac{K_{\lfloor tn\rfloor }^{(n)} - \mathbb {E}K_{\lfloor tn\rfloor }^{(n)}}{\sqrt{n}}\right)&= \frac{(n - \lfloor tn\rfloor )(n - \lfloor sn\rfloor )}{n} \mathbb {E}\Big (Y_{\lfloor sn\rfloor }^{(n)} Y_{\lfloor tn\rfloor }^{(n)}\Big )\\ {}&= \Big (n(1-t)(1-s) + \mathcal {O}(1)\Big )\,\mathbb {E}\Big ({Y_{\lfloor sn\rfloor }^{(n)}}^2\Big )\\ {}&= s^2 (1 - t) + \mathcal {O}(n^{-1}), \end{aligned} \end{aligned}$$

where we made use of the martingale property, and the fact that the second moment of \(Y_j^{(n)}\) is equal to the variance \(n^{-2} j (j-1) / (n-j)\). Ultimately, this verifies (15) and thus completes the proof. \(\square \)

The last remaining piece required to prove that the sequence of processes \((Z^{(n)})_{n\ge 1}\) is tight is a bound on the growth of the corresponding paths.

Lemma 5

There is a constant \(\lambda \) such that the following inequality holds for all \(s, t\in [0,1]\) and all \(n\in \mathbb {Z}_{> 0}\):

$$\begin{aligned} \mathbb {E}\bigl (\bigl |Z^{(n)}(t) - Z^{(n)}(s)\bigr |^4 \bigr ) \le \lambda \, |t-s|^2. \end{aligned}$$
(19)

Proof

Let us first consider the case where both \(tn = \ell \) and \(sn = m\) are integers. Assume that \(\ell > m\). We can follow the argument in the proof of Lemma 4 to see that the random variable \(K_{\ell }^{(n)} - K_{m}^{(n)}\) is distributed like the last component in the sum of two independent multinomial distributions, which gives us

$$\begin{aligned} K_{\ell }^{(n)} - K_{m}^{(n)} \sim {\text {Bin}} \Big (m - 1, \frac{\ell -m}{n} \Big ) + {\text {Bin}} \Big (\ell -m, \frac{\ell }{n} \Big ). \end{aligned}$$

Let us write \({\text {Bin}}^*(n,p)\) for a centered binomial distribution, i.e., a binomial distribution \({\text {Bin}}(n,p)\) with the mean np subtracted. Then we have

$$\begin{aligned} K_{\ell }^{(n)} - K_{m}^{(n)} - \frac{\ell ^2-m^2}{n} \sim {\text {Bin}}^* \Big (m - 1, \frac{\ell -m}{n} \Big ) + {\text {Bin}}^* \Big (\ell -m, \frac{\ell }{n} \Big ) - \frac{\ell -m}{n}. \end{aligned}$$
(20)

The fourth moment of a \({\text {Bin}}^*(n,p)\)-distributed random variable, which is the fourth centered moment of a \({\text {Bin}}(n,p)\)-distributed random variable, is

$$\begin{aligned} np(1-p)\big (1+(3n-6)p(1-p)\big ) \le np(1+3np). \end{aligned}$$

Note that for the \({\text {Bin}}^*\)-variables in (20), we have \(\frac{(m-1)(\ell -m)}{n} \le \ell -m\) and \(\frac{(\ell -m)\ell }{n} \le \ell -m\). Thus the fourth moments of the two centered binomial random variables are bounded above by

$$\begin{aligned} (\ell -m)(1+3(\ell -m)) \le 4(\ell -m)^2. \end{aligned}$$

So \(K_{\ell }^{(n)} - K_{m}^{(n)} - \frac{\ell ^2-m^2}{n}\) is the sum of three random variables (the third one actually constant) whose fourth moments are bounded above by \(4(\ell -m)^2\), \(4(\ell -m)^2\), and \((\ell -m)^4n^{-4} \le (\ell -m)^2\), respectively. Applying the inequality \(\mathbb {E}((X+Y+Z)^4) \le 27(\mathbb {E}(X^4) + \mathbb {E}(Y^4) + \mathbb {E}(Z^4))\), which is a simple consequence of Jensen’s inequality, we get

$$\begin{aligned} \mathbb {E}\Big ( K_{\ell }^{(n)} - K_{m}^{(n)} - \frac{\ell ^2-m^2}{n} \Big )^4 \le 27 \cdot (4+4+1)(\ell -m)^2 = 243(\ell -m)^2. \end{aligned}$$

So if \(tn = \ell \) and \(sn = m\) are integers, we have

$$\begin{aligned} \mathbb {E}\bigl (\bigl |Z^{(n)}(t) - Z^{(n)}(s)\bigr |^4 \bigr )&= \mathbb {E}\Bigg ( \frac{K_{\ell }^{(n)} - \frac{\ell ^2}{n}}{\sqrt{n}} - \frac{K_{m}^{(n)} - \frac{m^2}{n}}{\sqrt{n}} \Bigg )^4 \\&\le \frac{243(\ell -m)^2}{n^2} = 243(t-s)^2. \end{aligned}$$

Second, consider the case that tn and sn lie between two consecutive integers m and \(m+1\): \({\tilde{s}} n = m \le sn \le tn \le m+1 = {\tilde{t}} n\). In this case, we can express the difference \(Z^{(n)}(t) - Z^{(n)}(s)\) (using the linear interpolation in the definition of \({\tilde{K}}_t^{(n)}\)) as

$$\begin{aligned} Z^{(n)}(t) - Z^{(n)}(s) = (t-s)n \bigl ( Z^{(n)}({\tilde{t}}) - Z^{(n)}({\tilde{s}}) \bigr ) + (t-s) \sqrt{n} ({\tilde{t}} - t - s + {\tilde{s}}). \end{aligned}$$

The fourth moment of the first term is

$$\begin{aligned} \mathbb {E}\Big ( (t-s)n \bigl ( Z^{(n)}({\tilde{t}}) - Z^{(n)}({\tilde{s}}) \bigr ) \Big )^4 \le (t-s)^4n^4 \cdot 243 ({\tilde{t}} - {\tilde{s}})^2 \le 243(t-s)^2 \end{aligned}$$

since \(|t-s| \le |{\tilde{t}} - {\tilde{s}}| = \frac{1}{n}\). Likewise, the fourth power of the second term is easily seen to be bounded above by \((t-s)^2\). So the elementary inequality \(\mathbb {E}((X+Y)^4) \le 8(\mathbb {E}(X^4) + \mathbb {E}(Y^4))\) yields

$$\begin{aligned} \mathbb {E}\bigl (\bigl |Z^{(n)}(t) - Z^{(n)}(s)\bigr |^4 \bigr ) \le 8(243+1)(t-s)^2 = 1952(t-s)^2. \end{aligned}$$

Finally, in the general case that t and s are arbitrary real numbers in the interval [0, 1] such that tn and sn do not lie between consecutive integers, we can write

$$\begin{aligned} Z^{(n)}(t) - Z^{(n)}(s)= & {} \bigl ( Z^{(n)}(t) - Z^{(n)}(u_1) \bigr ) + \bigl ( Z^{(n)}(u_1) - Z^{(n)}(u_2) \bigr ) \\{} & {} + \bigl ( Z^{(n)}(u_2) - Z^{(n)}(s) \bigr ) \end{aligned}$$

for some real numbers \(u_1,u_2\) with \(s \le u_2 \le u_1 \le t\) such that \(u_1n\) and \(u_2n\) are integers and \(tn \le u_1n+1\) as well as \(sn \ge u_2n-1\). Combining the bounds from above and using again the inequality \(\mathbb {E}((X+Y+Z)^4) \le 27(\mathbb {E}(X^4) + \mathbb {E}(Y^4) + \mathbb {E}(Z^4))\), we obtain

$$\begin{aligned} \mathbb {E}\bigl (\bigl |Z^{(n)}(t) - Z^{(n)}(s)\bigr |^4 \bigr )&\le 27 \bigl ( 1952(t-u_1)^2 + 243(u_1-u_2)^2 + 1952(u_2-s)^2 \bigr ) \\&\le 27 \cdot 1952 (t-u_1+u_1-u_2+u_2-s)^2 = 52704(t-s)^2, \end{aligned}$$

completing the proof of the lemma with \(\lambda = 52704\). \(\square \)

All that remains now is to combine the two ingredients to prove our main result on the limiting process.

Proof of Theorem 3

The proof relies on the well-known result asserting that given tightness of the sequence of corresponding probability measures as well as convergence of the finite-dimensional probability distributions, a sequence of stochastic processes converges to a limiting process (see [5, Theorems 7.1, 7.5]).

Tightness is implied (see [14, Theorems 13.29, 21.42]) by tightness of the initial distributions (which is satisfied in our case as every \({\tilde{Z}}^{(n)}(0)\) for \(n\in \mathbb {Z}_{\ge 0}\) is degenerate and assumes value 0 with probability 1 as a consequence of the uncover process deterministically starting with no uncovered edges) and the moment bound in Lemma 5. The (limiting) behavior of the finite-dimensional distributions of the original process \((K_{\lfloor tn\rfloor }^{(n)})_{t\in [0,1]}\) is characterized by Lemma 4. This characterization carries over to the linearly interpolated process by an application of Slutsky’s theorem [14, Theorem 13.18] after observing that

$$\begin{aligned} \mathbb {P}\left( \left| {Z^{(n)}(t) - \frac{K_{\lfloor tn\rfloor }^{(n)} - t^2 n}{\sqrt{n}}}\right|> \varepsilon \right)&= \mathbb {P}\left( \frac{|{\tilde{K}}_{t}^{(n)} - K_{\lfloor tn\rfloor }^{(n)}|}{\sqrt{n}} > \varepsilon \right) \le \frac{ \mathbb {E}(({\tilde{K}}_{t}^{(n)} - K_{\lfloor tn\rfloor }^{(n)})^2) }{n\varepsilon ^2}\\&\le \frac{ \mathbb {E}( (K_{\lfloor tn\rfloor +1}^{(n)} - K_{\lfloor tn\rfloor }^{(n)})^2 )}{n \varepsilon ^{2}} \xrightarrow {n\rightarrow \infty } 0, \end{aligned}$$

as a mechanical computation shows that \(\mathbb {E}( (K_{\lfloor tn\rfloor +1}^{(n)} - K_{\lfloor tn\rfloor }^{(n)})^2 ) = \mathcal {O}(1)\) (this also follows from Lemma 5).

Note that as the finite-dimensional distributions converge to Gaussian distributions, the limiting process \((Z^{\infty }(t))_{t\in [0,1]}\) is Gaussian itself—which means that it is fully characterized by its first and second order moments. As a consequence of Lemma 4, we find for all \(s, t\in [0,1]\) with \(s < t\) that

$$\begin{aligned} \mathbb {E}Z^{\infty }(t) = 0,\qquad \mathbb {V}Z^{\infty }(t) = t^2 (1 - t), \qquad {{\,\textrm{Cov}\,}}(Z^{\infty }(s), Z^{\infty }(t)) = s^2 (1-t). \end{aligned}$$

It can be checked that if \((W(t))_{t\in [0,1]}\) is a standard Wiener process, the Gaussian process \(((1-t)W(t^2/(1-t)))_{t\in [0,1]}\) has the same first- and second-order moments and therefore also the same distribution as \(Z^{\infty }\). \(\square \)

While we only needed to show tightness of the initial distributions of the processes \((Z^{(n)}(t))_{t\in [0, 1]}\) to prove convergence to \(Z^{\infty }(t)\), we can actually prove a much stronger result. A uniform bound (that implies tightness of \(Z^{(n)}(t)\) for every fixed t) reads as follows.

Proposition 6

For any real \(C > 1\) and any positive integer n, the process \((Z^{(n)}(t))_{t\in [0,1]}\) satisfies the bound

$$\begin{aligned} \mathbb {P}(\sup _{t\in [0,1]} |Z^{(n)}(t)| \ge C) \le 4 (C-1)^{-2}, \end{aligned}$$
(21)

so that for \(C \rightarrow \infty \), the probability for the process to exceed C in absolute value converges to 0 uniformly in terms of n.

Proof

In order to obtain this bound, we first show that it can be reduced to an inequality for the martingale \((Y_j^{(n)})_{1\le j\le n-1}\) introduced earlier in this section. To this end, let us write \(tn = j + \eta \), with \(j \in \mathbb {Z}\) and \(\eta \in [0,1)\). A simple calculation shows that

$$\begin{aligned} Z^{(n)}(t)&= \frac{{\tilde{K}}_{t}^{(n)} - t^2 n}{\sqrt{n}} \\&= \frac{(1-\eta )K_j^{(n)} + \eta K_{j+1}^{(n)} - (j+\eta )^2/n}{\sqrt{n}} \\&= \frac{(1-\eta )(K_j^{(n)} - j(j-1)/n) + \eta (K_{j+1}^{(n)} - j(j+1)/n) - (j+\eta ^2)/n}{\sqrt{n}} \\&= (1-\eta )\frac{K_j^{(n)} - j(j-1)/n}{\sqrt{n}} + \eta \frac{K_{j+1}^{(n)} - j(j+1)/n}{\sqrt{n}} - \frac{j+\eta ^2}{n^{3/2}}. \end{aligned}$$

The final fraction is bounded by 1, since \(j+\eta ^2 \le j+\eta = tn \le n\). It follows that

$$\begin{aligned} \sup _{t\in [0,1]} |Z^{(n)}(t)| \le \sup _{0 \le j \le n} \left| {\frac{K_j^{(n)} - j(j-1)/n}{\sqrt{n}}}\right| + 1, \end{aligned}$$

so

$$\begin{aligned} \mathbb {P}\bigl (\sup _{t\in [0,1]} |Z^{(n)}(t)| \ge C\bigr )&\le \mathbb {P}\left( \sup _{0 \le j \le n} \left| {\frac{K_j^{(n)} - j(j-1)/n}{\sqrt{n}}}\right| \ge C-1 \right) \nonumber \\&= \mathbb {P}\left( \sup _{1 \le j \le n-1} \left| {\frac{Y_j^{(n)} (n - j)}{\sqrt{n}}}\right| \ge C-1\right) . \end{aligned}$$
(22)

Note here that we need not consider \(j=0\) and \(j=n\) in the supremum, since \(K_j^{(n)} - j(j-1)/n = 0\) in either case. Since \((Y_j^{(n)})_{1\le j\le n-1}\) is a martingale, we can use Doob’s maximal inequality [14, Theorem 11.2]. For any real \(C > 0\) and any fixed integer k with \(1\le k\le n-1\), we have

$$\begin{aligned} \mathbb {P}\bigl (\sup _{1\le j\le k} |Y_j^{(n)}| \ge C \bigr ) \le \frac{\mathbb {V}Y_k^{(n)}}{C^2} = \frac{k (k-1)}{C^2 (n-k) n^2}. \end{aligned}$$

With this, we have all required prerequisites to prove the bound. We partition the interval over which the supremum is taken in (22), apply the martingale inequality, and then obtain the desired result after summing over all these upper bounds. For every integer \(i > 0\), let \(I_i^{(n)} \,{:=}\,[2^{-i} n, 2^{-i+1}n] \cap \mathbb {Z}\). We find

$$\begin{aligned} \mathbb {P}\left( \sup _{n-j\in I_i^{(n)}} \left| {\frac{Y_j^{(n)} (n - j)}{\sqrt{n}}}\right| \ge C-1\right)&\le \mathbb {P}\left( \sup _{n-j\in I_i^{(n)}} |Y_j^{(n)} 2^{-i+1} \sqrt{n}| \ge C-1\right) \\&= \mathbb {P}\left( \sup _{n-j\in I_i^{(n)}} |Y_j^{(n)}| \ge \frac{2^{i-1} (C-1)}{\sqrt{n}}\right) \\&\le \frac{n}{2^{2i-2} (C-1)^2} \mathbb {V}(Y_{n - \lceil 2^{-i} n\rceil }^{(n)}) \\&\le \frac{n}{2^{2i-2} (C-1)^2} \cdot \frac{2^i}{n} = \frac{4}{2^i (C-1)^2}, \end{aligned}$$

where in the last inequality we bounded the variance as follows:

$$\begin{aligned} \mathbb {V}(Y_{n - \lceil 2^{-i} n\rceil }^{(n)}) = \frac{\mathbb {V}(K_{n - \lceil 2^{-i} n\rceil }^{(n)})}{\lceil 2^{-i} n\rceil ^2} = \frac{(n - \lceil 2^{-i} n\rceil )(n - \lceil 2^{-i} n\rceil - 1) \lceil 2^{-i} n\rceil }{n^2 \lceil 2^{-i} n\rceil ^2} \le \frac{2^i}{n}. \end{aligned}$$

Finally, the union bound together with the observation that \(\sum _{i\ge 1} \frac{4}{2^i (C-1)^2} = 4 (C-1)^{-2}\) yields the upper bound in (21) and therefore completes the proof. \(\square \)

3 Size of the Root Cluster

We now shift our attention from the number of uncovered edges to the sizes of the connected components (or clusters) appearing in the graph throughout the uncover process. It will prove convenient to change our tree model to rooted labeled trees, as the nature of rooted trees allows us to focus our investigation on one particular cluster – the one containing the root vertex. In case the root vertex has not yet been uncovered, we will consider the size of the root cluster to be 0. Formally, we let the random variable \(R_{n}^{(k)}\) be the size of the root cluster of a (uniformly) random rooted labeled tree of size n with k uncovered vertices.

Using the symbolic method for labeled structures (cf. [9, Chapter II]), we can set up a formal specification for the corresponding combinatorial classes and subsequently extract functional equations for the associated generating functions. Let \(\mathcal {T}^{\bullet }\) be the class of rooted labeled trees, and let \(\mathcal {G}\) be a refinement of \(\mathcal {T}^{\bullet }\) where the vertices can either be covered or uncovered, and where uncovered vertices are marked with a marker U. Finally, let \(\mathcal {F}\) be a further refinement of \(\mathcal {G}\) where all uncovered vertices in the root cluster are additionally marked with marker V. A straightforward “top-down approach,” i.e., a decomposition of the members of the tree family w.r.t. the root vertex, yields the formal specification

$$\begin{aligned} \mathcal {F} = \mathcal {Z} *\textsc {Set}(\mathcal {G}) + \mathcal {Z} *\{UV\} *\textsc {Set}(\mathcal {F}), \qquad \mathcal {G} = \mathcal {Z} *\textsc {Set}(\mathcal {G}) + \mathcal {Z} *\{U\} *\textsc {Set}(\mathcal {G}). \end{aligned}$$

Note that the first summand in the formal description of \(\mathcal {F}\) corresponds to the case where the root vertex is covered, thus the size of the root cluster is zero.

Introducing the corresponding exponential generating functions \(F:=F(z,u,v)\) and \(G:=G(z,u)\),

$$\begin{aligned} F(z,u,v)&:= \sum _{T \in \mathcal {F}} \frac{z^{|T|} u^{{} \# U \hbox { in } T} v^{{} \# V \hbox { in } T}}{|T|!} \\&= \sum _{n \ge 1} \sum _{0 \le k \le n} \sum _{m \ge 0} \frac{n^{n-1}}{n!} \left( {\begin{array}{c}n\\ k\end{array}}\right) \mathbb {P}\{R_{n}^{(k)} = m\} z^{n} u^{k} v^{m},\\ G(z,u)&:= \sum _{T \in \mathcal {G}} \frac{z^{|T|} u^{{} \# U \hbox { in } T}}{|T|!} = \sum _{n \ge 1} \sum _{0 \le k \le n} \frac{n^{n-1}}{n!} \left( {\begin{array}{c}n\\ k\end{array}}\right) z^{n} u^{k}, \end{aligned}$$

we obtain the characterizing equations

$$\begin{aligned} F = z e^{G} + zuv e^{F}, \qquad G = z (1+u) e^{G}. \end{aligned}$$
(23)

Of course, \(G(z,u) = T^{\bullet }(z(1+u))\), where \(T^{\bullet }\) is the exponential generating function associated with \(\mathcal {T}^{\bullet }\), the Cayley tree function. Starting with (23), the following results on \(R_{n}^{(k)}\) can be deduced.

Theorem 7

The expectation \(\mathbb {E}(R_{n}^{(k)})\) is, for \(0 \le k \le n\) and \(n \ge 1\), given by

$$\begin{aligned} \mathbb {E}(R_{n}^{(k)}) = \sum _{j=1}^{k} \frac{j \, k^{\underline{j}}}{n^{j}}. \end{aligned}$$
(24)

Depending on the growth of \(k=k(n)\), \(\mathbb {E}(R_{n}^{(k)})\) has the following asymptotic behavior:

$$\begin{aligned} \mathbb {E}(R_{n}^{(k)}) \sim {\left\{ \begin{array}{ll} \frac{k}{n}, &{} \hbox {for}\; k=o(n), \quad (k \text { small}),\\ \frac{\alpha }{(1-\alpha )^{2}}, &{} \hbox {for}\; k \sim \alpha n, \hbox { with } 0< \alpha < 1, \quad (k \text { in central region}),\\ \frac{n^{2}}{d^{2}}, &{} \hbox {for}\; k=n-d, \hbox { with } d = \omega (\sqrt{n}) \hbox { and } d = o(n),\\ &{} \quad (k\; \text {subcritically large}),\\ \kappa n, &{} \text {with} \quad \kappa = 1-c e^{\frac{c^{2}}{2}} \int _{c}^{\infty } e^{-\frac{t^{2}}{2}} \textrm{d}t,\\ &{} \hbox {for } k=n-d, \hbox { with } d \sim c \sqrt{n} \hbox { and } c > 0,\\ &{} \quad (k\; \text {critically large}),\\ n - \sqrt{\frac{\pi }{2}} d \sqrt{n}, &{} \hbox {for}\; k=n-d, \;\hbox {with}\; d = o(\sqrt{n}),\\ &{} \quad (k \text { supercritically large}). \end{array}\right. } \end{aligned}$$
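Before turning to the proof, the different regimes can be illustrated numerically by evaluating the exact sum (24) in floating point (with early termination once the falling-factorial weights underflow) and comparing it with the corresponding asymptotic expression; a small sketch of ours:

```python
from math import erfc, exp, pi, sqrt

def expected_root_cluster(n, k):
    """E(R_n^(k)) via the exact sum (24), evaluated in floating point."""
    total, p = 0.0, 1.0                 # p = falling factorial k^(j) / n^j
    for j in range(1, k + 1):
        p *= (k - j + 1) / n
        if p == 0.0:                    # remaining terms underflow to 0
            break
        total += j * p
    return total

n = 10**6
alpha, c = 0.5, 1.0
checks = [
    ("central, k ~ alpha*n", int(alpha * n), alpha / (1 - alpha) ** 2),
    ("subcritical, d = n^(3/4)", n - int(n ** 0.75), (n / int(n ** 0.75)) ** 2),
    ("critical, d = c*sqrt(n)",
     n - int(c * sqrt(n)),
     n * (1 - c * exp(c * c / 2) * sqrt(pi / 2) * erfc(c / sqrt(2)))),
    ("supercritical, d = n^(1/4)",
     n - int(n ** 0.25), n - sqrt(pi / 2) * int(n ** 0.25) * sqrt(n)),
]
for label, k, predicted in checks:
    print(f"{label:28s} exact = {expected_root_cluster(n, k):12.1f}  asymptotic = {predicted:12.1f}")
```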

Proof

After introducing

$$\begin{aligned} E:= E(z,u) = \left. \frac{\partial }{\partial v}F(z,u,v)\right| _{v=1} = \sum _{n,k\ge 0} \frac{n^{n-1}}{n!} \left( {\begin{array}{c}n\\ k\end{array}}\right) \mathbb {E}(R_{n}^{(k)}) z^n u^k, \end{aligned}$$

considering the partial derivative of (23) with respect to v easily yields

$$\begin{aligned} E = \frac{1}{1-\frac{u}{1+u}G} - 1. \end{aligned}$$

Extracting coefficients of E by an application of the Lagrange inversion formula (see, e.g., [9, Theorem A.2]) yields

$$\begin{aligned}{}[z^{n}] E&= \frac{1}{n} [G^{n-1}] \frac{u}{(1+u) (1-\frac{u}{1+u} G)^{2}} \cdot (1+u)^{n} e^{n G}\\&= \sum _{j=0}^{n-1} (j+1) u^{j+1} (1+u)^{n-1-j} \cdot \frac{n^{n-2-j}}{(n-1-j)!}, \end{aligned}$$

and further

$$\begin{aligned}{}[z^{n} u^{k}] E = \sum _{j=0}^{k-1} (j+1) \left( {\begin{array}{c}n-1-j\\ k-1-j\end{array}}\right) \frac{n^{n-2-j}}{(n-1-j)!}. \end{aligned}$$

Using \(\mathbb {E}(R_{n}^{(k)}) = \frac{[z^{n} u^{k}] E}{[z^{n} u^{k}] G} = \frac{n! \, [z^{n} u^{k}] E}{n^{n-1} \left( {\begin{array}{c}n\\ k\end{array}}\right) }\), we obtain the result stated in (24):

$$\begin{aligned} \mathbb {E}(R_{n}^{(k)})&= n! \sum _{j=0}^{k-1} (j+1) \frac{\left( {\begin{array}{c}n-1-j\\ k-1-j\end{array}}\right) }{\left( {\begin{array}{c}n\\ k\end{array}}\right) } \cdot \frac{n^{-1-j}}{(n-1-j)!} = n! \sum _{j=0}^{k-1} \frac{j+1}{n^{1+j} (n-1-j)!} \cdot \frac{k^{\underline{j+1}}}{n^{\underline{j+1}}}\\&= \sum _{j=1}^{k} \frac{j \, k^{\underline{j}}}{n^{j}}. \end{aligned}$$

In order to analyze the asymptotic behavior of \(\mathbb {E}(R_{n}^{(k)})\), the following integral representation turns out to be advantageous,

$$\begin{aligned} \mathbb {E}(R_{n}^{(k)}) = \int _{0}^{\infty } (x-1) \, e^{-x} \Big (1+\frac{x}{n}\Big )^{k}\textrm{d}x. \end{aligned}$$
(25)

It might be considered as a variation of the integral representation of the so-called Ramanujan Q-function, \(Q(n) = \sum _{j=0}^{n-1} \frac{(n-1)^{\underline{j}}}{n^{j}} = \int _{0}^{\infty } e^{-x} (1+\frac{x}{n})^{n-1} \textrm{d}x\), which can be traced back to Ramanujan himself (see, e.g., [8]).

Equation (25) can be verified in a straightforward way by using the integral representation of the Gamma function:

$$\begin{aligned}&\int _{0}^{\infty } (x-1) \, e^{-x} \Big (1+\frac{x}{n}\Big )^{k} \textrm{d}x = \int _{0}^{\infty } (x-1) e^{-x} \sum _{j=0}^{k} \left( {\begin{array}{c}k\\ j\end{array}}\right) \frac{x^{j}}{n^{j}} \textrm{d}x\\&\quad = \sum _{j=0}^{k} \frac{\left( {\begin{array}{c}k\\ j\end{array}}\right) }{n^{j}} \left( \int _{0}^{\infty } e^{-x} x^{j+1} \textrm{d}x - \int _{0}^{\infty } e^{-x} x^{j} \textrm{d}x\right) = \sum _{j=0}^{k} \frac{\left( {\begin{array}{c}k\\ j\end{array}}\right) }{n^{j}} \big ((j+1)! - j!\big )\\&\quad = \sum _{j=0}^{k} \frac{k^{\underline{j}} \, j \, j!}{j! \, n^{j}} = \sum _{j=0}^{k} \frac{j \, k^{\underline{j}}}{n^{j}}. \end{aligned}$$
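Numerically, the agreement between (24) and (25) is also easy to confirm; a small sketch of ours (assuming SciPy is available), with the integrand written as \((x-1)\exp(-x + k\log(1+x/n))\) to avoid overflow:

```python
from math import exp, log1p
from scipy.integrate import quad

def expectation_sum(n, k):
    """E(R_n^(k)) from the sum (24)."""
    total, p = 0.0, 1.0                 # p = falling factorial k^(j) / n^j
    for j in range(1, k + 1):
        p *= (k - j + 1) / n
        total += j * p
    return total

def expectation_integral(n, k):
    """E(R_n^(k)) from the integral representation (25)."""
    integrand = lambda x: (x - 1) * exp(-x + k * log1p(x / n))
    value, _ = quad(integrand, 0, float("inf"), limit=200)
    return value

for n, k in [(30, 5), (30, 20), (200, 150)]:
    print(n, k, expectation_sum(n, k), expectation_integral(n, k))
```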

In order to evaluate the integral (25) asymptotically, we first show that for the range \(x \ge n^{\frac{1}{2} + \epsilon }\), with arbitrary but fixed \(\epsilon > 0\), the contribution to the integral is exponentially small and thus negligible. Namely, when considering the integrand and setting \(t = \frac{x}{n}\), we obtain the following estimate, uniformly for \(k \in [0,n]\):

$$\begin{aligned} e^{-x} \Big (1+\frac{x}{n}\Big )^{k} \le e^{-x} \Big (1+\frac{x}{n}\Big )^{n} = e^{-x + n \ln (1+\frac{x}{n})} = e^{-n \big (t-\ln (1+t)\big )}. \end{aligned}$$

Simple monotonicity considerations yield \(t - \ln (1+t) \ge \frac{t}{4}\), for \(t \ge 1\) (thus \(x \ge n\)), and \(t - \ln (1+t) \ge \frac{t^{2}}{4}\), for \(t \in [0,1]\) (thus \(x \in [0,n]\)). Due to these estimates and by evaluating the resulting integral, we get the following bounds on the integral for the respective ranges:

$$\begin{aligned} \int _{n}^{\infty } (x-1) e^{-x} \Big (1+\frac{x}{n}\Big )^{k} \textrm{d}x&\le \int _{n}^{\infty } x e^{-\frac{x}{4}} \textrm{d}x = 16 \big (1+\frac{n}{4}\big ) e^{-\frac{n}{4}},\\ \int _{n^{\frac{1}{2} + \epsilon }}^{n} (x-1) e^{-x} \Big (1+\frac{x}{n}\Big )^{k} \textrm{d}x&\le \int _{n^{\frac{1}{2} + \epsilon }}^{\infty } x e^{-\frac{x^{2}}{4n}} \textrm{d}x = 2n e^{-\frac{1}{4} n^{2 \epsilon }}, \end{aligned}$$

thus, by combining both cases, uniformly for \(k \in [0,n]\) and \(\epsilon \in (0, \frac{1}{2}]\):

$$\begin{aligned} \int _{n^{\frac{1}{2} + \epsilon }}^{\infty } (x-1) e^{-x} \Big (1+\frac{x}{n}\Big )^{k} \textrm{d}x = \mathcal {O}\big (n e^{-\frac{1}{4} n^{2 \epsilon }}\big ). \end{aligned}$$

Furthermore, we note that (roughly speaking) if k is sufficiently far away from n the integration range with negligible contribution can be extended. Namely, setting \(\delta = 1-\frac{k}{n}\) and \(x=nt\), we obtain for the integrand

$$\begin{aligned} e^{-x} \Big (1+\frac{x}{n}\Big )^{k} = e^{-n \big (t-\frac{k}{n}\ln (1+t)\big )} = e^{-n \big (t-(1-\delta )\ln (1+t)\big )} \le e^{-n \delta t} = e^{-\delta x} = e^{-(1-\frac{k}{n})x}, \end{aligned}$$

where we used the trivial estimate \(\ln (1+t) \le t\), for \(t \ge 0\). E.g., if \(\delta \ge n^{-\frac{1}{4}}\), i.e., \(k \le n - n^{\frac{3}{4}}\), one easily obtains from this estimate that the contribution to the integral from the range \(x \ge n^{\frac{1}{4} + \epsilon }\) is asymptotically negligible:

$$\begin{aligned} \int _{n^{\frac{1}{4} + \epsilon }}^{n^{\frac{1}{2} + \epsilon }} (x-1) e^{-x} \Big (1+\frac{x}{n}\Big )^{k} \textrm{d}x\le & {} \int _{n^{\frac{1}{4} + \epsilon }}^{\infty } x e^{-(1-\frac{k}{n})x} \textrm{d}x \\\le & {} \int _{n^{\frac{1}{4} + \epsilon }}^{\infty } x e^{-n^{-\frac{1}{4}}x} \textrm{d}x = \mathcal {O}\big (n e^{-n^{\epsilon }}\big ). \end{aligned}$$

To get the asymptotic expressions for the integral stated in the theorem, we will take into account the growth of k w.r.t. n, consider suitable expansions of the integrand for the range \(x \le n^{\frac{1}{2}+\epsilon }\) (or \(x \le n^{\frac{1}{4}+\epsilon }\), respectively) and evaluate the resulting integrals, where we apply the “tail exchange technique,” i.e., we may extend the integration range to \(x \ge 0\), since only asymptotically negligible contributions are added.

  • k small or in the central region: assuming \(k \le n - n^{\frac{3}{4}}\), an expansion of the integrand for \(x \le n^{\frac{1}{4} + \epsilon }\) leads to the expansion (with a uniform bound in this range):

    $$\begin{aligned} e^{-x} \Big (1+\frac{x}{n}\Big )^{k}&= e^{-x + k \ln (1+\frac{x}{n})} = e^{-x (1-\frac{k}{n}) + \mathcal {O}(\frac{k x^{2}}{n^{2}})} = e^{-x(1-\frac{k}{n})} \cdot \Big (1+\mathcal {O}\big ({\textstyle {\frac{k x^{2}}{n^{2}}}}\big )\Big )\\&= e^{-x(1-\frac{k}{n})} \cdot \Big (1+\mathcal {O}\big (n^{-\frac{1}{2}+2\epsilon }\big )\Big ), \end{aligned}$$

    and thus to the following evaluation of the integral:

    $$\begin{aligned} \int _{0}^{n^{\frac{1}{4}+\epsilon }} (x-1) e^{-x} \Big (1+\frac{x}{n}\Big )^{k} \textrm{d}x&= \int _{0}^{n^{\frac{1}{4}+\epsilon }} (x-1) e^{-x(1-\frac{k}{n})} \textrm{d}x \cdot \Big (1+\mathcal {O}\big (n^{-\frac{1}{2}+2\epsilon }\big )\Big ) \\&= \int _{0}^{\infty } (x-1) e^{-x(1-\frac{k}{n})} \textrm{d}x \cdot \Big (1+\mathcal {O}\big (n^{-\frac{1}{2}+2\epsilon }\big )\Big ) \\ {}&= \frac{\frac{k}{n}}{(1-\frac{k}{n})^{2}} \cdot \Big (1+\mathcal {O}\big (n^{-\frac{1}{2}+2\epsilon }\big )\Big ), \end{aligned}$$

    where we used the formula

    $$\begin{aligned} \int _{0}^{\infty } (x-1) e^{-\nu x} \textrm{d}x = \frac{1-\nu }{\nu ^{2}}, \quad \hbox { for}\ \nu > 0. \end{aligned}$$
    (26)

    Of course, this gives in particular

    $$\begin{aligned} \mathbb {E}(R_{n}^{(k)}) \sim {\left\{ \begin{array}{ll} \frac{k}{n}, &{} \quad \hbox { for}\ k = o(n),\\ \frac{\alpha }{(1-\alpha )^{2}}, &{} \quad \hbox {for } k \sim \alpha n, \hbox { with } 0<\alpha < 1. \end{array}\right. } \end{aligned}$$
  • k subcritically large: in the following, we treat cases with \(\frac{k}{n} \rightarrow 1\); there we have to distinguish according to the growth behavior of the difference \(d=n-k\). First we examine the region \(\sqrt{n} \ll d \ll n\), i.e., \(d = o(n)\), but \(d = \omega (\sqrt{n})\), for which we get the following expansion of the integrand (for \(x \le n^{\frac{1}{2} + \epsilon }\)):

    $$\begin{aligned} e^{-x} \Big (1+\frac{x}{n}\Big )^{k}&= e^{-x + (n-d) \ln (1+\frac{x}{n})} = e^{-\frac{d}{n} x + \mathcal {O}(\frac{x^{2}}{n}) + \mathcal {O}(\frac{x^{3}}{n^{2}})}\\&= e^{-\frac{dx}{n}} \cdot \Big (1+\mathcal {O}\big ({\textstyle {\frac{x^{2}}{n}}}\big ) + \mathcal {O}\big ({\textstyle {\frac{x^{3}}{n^{2}}}}\big )\Big ). \end{aligned}$$

    Considering the corresponding integral (and applying tail exchange), we obtain

    $$\begin{aligned} \mathbb {E}(R_{n}^{(k)})= & {} \int _{0}^{\infty } (x-1) e^{-\frac{dx}{n}} \textrm{d}x\\{} & {} + \mathcal {O}\left( \frac{1}{n} \cdot \int _{0}^{\infty } (x-1) x^{2} e^{-\frac{dx}{n}} \textrm{d}x\right) + \mathcal {O}\left( \frac{1}{n^{2}} \cdot \int _{0}^{\infty } (x-1) x^{3} e^{-\frac{dx}{n}} \textrm{d}x\right) . \end{aligned}$$

    Using (26) and

    $$\begin{aligned} \int _{0}^{\infty } (x-1) x^{\ell } e^{-\nu x} \textrm{d}x = \mathcal {O}\big ({\textstyle {\frac{1}{\nu ^{\ell +2}}}}\big ), \quad \hbox {for } \ell \ge 0 \hbox { and } \nu \in (0,1), \end{aligned}$$

    we obtain the stated result:

    $$\begin{aligned} \mathbb {E}(R_{n}^{(k)}) = \frac{n^{2}}{d^{2}} + \mathcal {O}\big ({\textstyle {\frac{n}{d}}}\big ) + \mathcal {O}\big ({\textstyle {\frac{n^{3}}{d^{4}}}}\big ) = \frac{n^{2}}{d^{2}} \cdot \Big (1+\mathcal {O}\big ({\textstyle {\frac{d}{n}}}\big ) + \mathcal {O}\big ({\textstyle {\frac{n}{d^{2}}}}\big )\Big ) \sim \frac{n^{2}}{d^{2}}. \end{aligned}$$
  • k critically large: for the case that the difference \(d=n-k\) is of order \(\Theta (\sqrt{n})\), we obtain the following expansion of the integrand:

    $$\begin{aligned} e^{-x} \Big (1+\frac{x}{n}\Big )^{k} = e^{-\frac{dx}{n} - \frac{x^{2}}{2n}} \cdot \Big (1+\mathcal {O}\big ({\textstyle {\frac{dx^{2}}{n^{2}}}}\big ) + \mathcal {O}\big ({\textstyle {\frac{x^{3}}{n^{2}}}}\big )\Big ). \end{aligned}$$

    Thus, after completing the integrals occurring, we get

    $$\begin{aligned} \mathbb {E}\big (R_{n}^{(k)}\big )&= \int _{0}^{\infty } (x-1) e^{-\frac{dx}{n} - \frac{x^{2}}{2n}} \, \textrm{d}x\\&\quad + \mathcal {O}\Big (\frac{d}{n^{2}} \cdot \int _{0}^{\infty } x^{3} e^{-\frac{dx}{n} - \frac{x^{2}}{2n}} \, \textrm{d}x\Big ) + \mathcal {O}\Big (\frac{1}{n^{2}} \cdot \int _{0}^{\infty } x^{4} e^{-\frac{dx}{n} - \frac{x^{2}}{2n}} \, \textrm{d}x\Big ). \end{aligned}$$

    Since, for \(\ell \ge 0\),

    $$\begin{aligned} \int _{0}^{\infty } x^{\ell } e^{-\frac{dx}{n} - \frac{x^{2}}{2n}} \textrm{d}x = \mathcal {O}\Big ( \int _{0}^{\infty } x^{\ell } e^{- \frac{x^{2}}{2n}} \textrm{d}x \Big ) = \mathcal {O}\big (n^{\frac{\ell +1}{2}}\big ), \end{aligned}$$

    this yields

    $$\begin{aligned} \mathbb {E}\big (R_{n}^{(k)}\big ) = \int _{0}^{\infty } x \, e^{-\frac{dx}{n} - \frac{x^{2}}{2n}} \textrm{d}x + \mathcal {O}(\sqrt{n}). \end{aligned}$$

    Setting \(d = c \sqrt{n}\) and applying the substitution \(t=c+\frac{x}{\sqrt{n}}\), we evaluate the integral obtaining the stated result:

    $$\begin{aligned} \int _{0}^{\infty } x \, e^{-\frac{dx}{n} - \frac{x^{2}}{2n}} \, \textrm{d}x = n \int _{c}^{\infty } (t-c) \, e^{\frac{c^{2}}{2} - \frac{t^{2}}{2}} \textrm{d}t = n \left( 1-c e^{\frac{c^{2}}{2}} \cdot \int _{c}^{\infty } e^{-\frac{t^{2}}{2}} \textrm{d}t\right) . \end{aligned}$$
  • k supercritically large: for \(d=n-k = o(\sqrt{n})\), we get the expansion

    $$\begin{aligned} e^{-x} \Big (1+\frac{x}{n}\Big )^{k} = e^{-\frac{x^{2}}{2n}} \cdot \big (1-\frac{dx}{n}\big ) \cdot \Big (1+\mathcal {O}\big ({\textstyle {\frac{d^{2} x^{2}}{n^{2}}}}\big ) + \mathcal {O}\big ({\textstyle {\frac{x^{3}}{n^{2}}}}\big )\Big ). \end{aligned}$$

    Computations analogous to the previous ones, using

    $$\begin{aligned} \int _{0}^{\infty } x^{\ell } e^{-\frac{x^{2}}{2n}} \, \textrm{d}x = 2^{\frac{\ell -1}{2}} \Gamma \big ({\textstyle {\frac{\ell +1}{2}}}\big ) \cdot n^{\frac{\ell +1}{2}}, \quad \hbox { for}\ \ell \ge 0, \end{aligned}$$

    lead to the stated result:

    $$\begin{aligned} \mathbb {E}\big (R_{n}^{(k)}\big )&= \int _{0}^{\infty } x e^{-\frac{x^{2}}{2n}} \textrm{d}x - \frac{d}{n} \int _{0}^{\infty } x^{2} e^{-\frac{x^{2}}{2n}} \textrm{d}x + \mathcal {O}(d^{2}) + \mathcal {O}(\sqrt{n})\\&= n - \frac{\sqrt{\pi }}{\sqrt{2}} d \sqrt{n} + \mathcal {O}(d^{2}) + \mathcal {O}(\sqrt{n}). \end{aligned}$$

\(\square \)

We can even obtain the exact distribution of \(R_{n}^{(k)}\). There are two different approaches we want to briefly sketch: for one, an explicit formula for the generating function \(F=F(z,u,v)\) can be found either by manipulating the recursive description (23), or directly by decomposing the objects of \(\mathcal {F}\) into the tree formed by the uncovered root cluster together with an attached forest of subtrees with covered roots. Either way, this yields

$$\begin{aligned} F = T^{\bullet }\big (v X e^{-X}\big ) + \frac{G}{1+u}, \quad \text {with} \quad X= \frac{u G}{1+u}. \end{aligned}$$

Note that the second summand, \(\frac{G}{1+u} = z e^{G}\), corresponds to the case where the root vertex is covered. Extracting coefficients via an application of the Lagrange inversion formula then yields an explicit formula for \(F_{n,k,m}:= n! [z^{n} u^{k} v^{m}] F(z,u,v)\), i.e., the number of labeled rooted trees with n vertices of which k are uncovered and m belong to the root cluster (for \(0 \le m \le k \le n\) and \(n \ge 1\)):

$$\begin{aligned} F_{n,k,m} = {\left\{ \begin{array}{ll} \left( {\begin{array}{c}n-1\\ k\end{array}}\right) n^{n-1}, &{} \quad m =0,\\ \left( {\begin{array}{c}n\\ m\end{array}}\right) \left( {\begin{array}{c}n-m-1\\ k-m\end{array}}\right) n^{n-k-1} m^{m} (n-m)^{k-m},&\quad m \ge 1. \end{array}\right. } \end{aligned}$$

From this formula, the probabilities \(\mathbb {P}(R_{n}^{(k)} = m) = \frac{F_{n,k,m}}{n^{n-1} \left( {\begin{array}{c}n\\ k\end{array}}\right) }\) given in Theorem 9 can be obtained directly. We omit these straightforward, but somewhat lengthy computations, since in the following the results are deduced in a more general and elegant way.
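As a quick consistency check (a sketch of ours, not part of the derivation), the explicit formula for \(F_{n,k,m}\) can be turned directly into the probabilities \(\mathbb {P}(R_{n}^{(k)} = m)\); these should sum to 1 and reproduce the expectation (24).

```python
from math import comb

def prob_root_cluster(n, k, m):
    """P(R_n^(k) = m) = F_{n,k,m} / (n^(n-1) * C(n, k)), using the explicit F_{n,k,m}."""
    if m == 0:
        f = comb(n - 1, k) * n ** (n - 1)
    else:
        f = comb(n, m) * comb(n - m - 1, k - m) * n ** (n - k - 1) * m ** m * (n - m) ** (k - m)
    return f / (n ** (n - 1) * comb(n, k))

n, k = 12, 7
probs = [prob_root_cluster(n, k, m) for m in range(k + 1)]

expectation_24, p = 0.0, 1.0            # E(R_n^(k)) from the sum (24)
for j in range(1, k + 1):
    p *= (k - j + 1) / n
    expectation_24 += j * p

print("sum of probabilities:        ", sum(probs))              # should be 1
print("E(R) from the probabilities: ", sum(m * q for m, q in enumerate(probs)))
print("E(R) from (24):              ", expectation_24)
```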

Alternatively, these probabilities can be determined by a more combinatorial approach: there is an elementary formula enumerating trees in which specified sets of vertices form clusters.

Lemma 8

Let n and k be positive integers, and let \(r_1,r_2,\ldots ,r_{\ell }\) be fixed positive integers with \(r_1 +\cdots + r_{\ell } \le k\). Moreover, fix disjoint subsets \(R_1,\ldots ,R_{\ell }\) of [k] with \(|R_i| = r_i\) for all i. The number of n-vertex labeled trees for which \(R_1,\ldots ,R_{\ell }\) are components of the forest induced by the vertices with labels in [k] is given by

$$\begin{aligned} n^{n-k-1}\big (n-r_1-\cdots -r_{\ell }\big )^{k-r_1-\cdots -r_{\ell }-1} (n-k)^{\ell } r_1^{r_1-1}\cdots r_{\ell }^{r_{\ell }-1}. \end{aligned}$$
(27)

Proof

We interpret each such tree T as a spanning tree of a complete graph K with n vertices. Set \(r = r_1 + \cdots +r_{\ell }\). The vertices of K can be divided into the sets \(R_1,\ldots ,R_{\ell }\), the remaining \(k - r\) vertices in \(\{1,2,\ldots ,k\}\) forming a set Q, and the \(n-k\) vertices with label greater than k forming a set S. Note first that the components induced by the sets \(R_1,\ldots ,R_{\ell }\) can be chosen in \(r_1^{r_1-2} \cdots r_{\ell }^{r_\ell -2}\) ways. If the vertex sets corresponding to \(R_1,\ldots ,R_{\ell }\) are contracted to single vertices \(v_1,\ldots ,v_{\ell }\), K becomes a multigraph \(K'\) with \(n - r + \ell \) vertices, and the tree T becomes a spanning tree \(T'\) of \(K'\) upon contraction. Note that there are \(r_i\) edges from \(v_i\) to every other vertex in \(K'\), and that \(T'\) cannot contain any edges from \(v_i\) to another vertex in \(\{v_1,\ldots ,v_{\ell } \} \cup Q\). Thus \(T'\) remains a spanning tree if all such edges are removed from \(K'\) to obtain a multigraph \(K''\). Conversely, if we take an arbitrary spanning tree of \(K''\) and replace the vertices \(v_1,\ldots ,v_{\ell }\) by spanning trees of \(R_1,\ldots ,R_{\ell }\), respectively, we obtain a labeled tree with n vertices that has the desired properties. It remains to count spanning trees of \(K''\), which has an adjacency matrix of the block form

$$\begin{aligned} A = \begin{bmatrix} O &{} O &{} \textbf{r} \textbf{1}^T \\ O &{} E - I &{} E \\ \textbf{1} \textbf{r}^T &{} E &{} E - I \end{bmatrix} \end{aligned}$$

Here, \(\textbf{1}\) denotes a (column) vector of 1s, \(\textbf{r}\) a (column) vector whose entries are \(r_1,\ldots ,r_{\ell }\), O a matrix of 0s, E a matrix of 1s, and I an identity matrix. The blocks correspond to \(\ell \), \(k-r\) and \(n-k\) rows/columns, respectively. The number of spanning trees can now be determined by means of the matrix-tree theorem: the Laplacian matrix is given by

$$\begin{aligned} L = \begin{bmatrix} (n-k) D &{} O &{} - \textbf{r} \textbf{1}^T \\ O &{} (n-r) I - E &{} -E \\ -\textbf{1} \textbf{r}^T &{} -E &{} n I - E \end{bmatrix}, \end{aligned}$$

where D is a diagonal matrix with diagonal entries \(r_1,\ldots ,r_{\ell }\). Our goal is to compute the determinant of L with the first row and column removed; let this matrix be \(L_1\). If we add \(\frac{1}{n-k}\) times the first \(\ell -1\) rows of \(L_1\) to each of its last \(n-k\) rows, we obtain a matrix in which all entries in the first \(\ell -1\) columns, except those on the diagonal, are 0. Thus the determinant is equal to the product of these diagonal entries \(r_2(n-k),\ldots ,r_{\ell }(n-k)\) times the determinant of a matrix of the block form

$$\begin{aligned} \begin{bmatrix} (n-r) I - E &{} -E \\ -E &{} n I - \big ( 1 + \tfrac{r-r_1}{n-k} \big )E \end{bmatrix}, \end{aligned}$$

where the blocks have length \(k-r\) and \(n-k\), respectively. This matrix has \(n-r\) as an eigenvalue of multiplicity \(k-r-1\), since subtracting \(n-r\) times the identity yields a matrix with \(k-r\) identical rows. For the same reason, n is an eigenvalue of multiplicity \(n-k-1\). It remains to determine the remaining two eigenvalues. The corresponding eigenvectors can be constructed as follows: let the first \(k-r\) entries (corresponding to the first block) be equal to a, and the remaining entries equal to b. It is easy to verify that this becomes an eigenvector for the eigenvalue \(\lambda \) if the simultaneous equations

$$\begin{aligned} (n-k)a - (n-k)b&= \lambda a, \\ -(k-r)a + (k-r+r_1)b&= \lambda b, \end{aligned}$$

are satisfied. The two solutions are the eigenvalues of the \(2 \times 2\)-coefficient matrix of this system, and their product is the determinant of this \(2 \times 2\) matrix, which is

$$\begin{aligned} (n-k)(k-r+r_1) - (n-k)(k-r) = (n-k)r_1. \end{aligned}$$

It finally follows that the determinant of \(L_1\), thus the number of spanning trees of \(K''\) is equal to

$$\begin{aligned}{} & {} r_2(n-k) \cdots r_{\ell }(n-k) \cdot (n-r)^{k-r-1} n^{n-k-1} (n-k)r_1 \\{} & {} \quad = n^{n-k-1} (n-r)^{k-r-1} (n-k)^{\ell } r_1\cdots r_{\ell }. \end{aligned}$$

Multiplying by \(r_1^{r_1-2} \cdots r_{\ell }^{r_{\ell }-2}\) (the number of possibilities for the spanning trees induced in the components \(R_1,\ldots ,R_{\ell }\)), we obtain the desired formula. \(\square \)
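
The determinant computation at the heart of this proof can be reproduced numerically. The following Python sketch (using NumPy; the parameter values are only illustrative) assembles the Laplacian L in the block form given above, removes its first row and column, and compares the resulting cofactor, multiplied by the Cayley factors \(r_i^{r_i-2}\), with formula (27).

```python
import numpy as np

# Illustrative parameters; any choice with r_1 + ... + r_l <= k < n works.
n, k = 12, 7
r = [3, 2]                                 # prescribed component sizes r_1, ..., r_l
l, rsum = len(r), sum(r)
rcol = np.array(r, dtype=float).reshape(-1, 1)

# Laplacian of K'' in the block form from the proof (blocks of sizes l, k - r, n - k).
L = np.block([
    [(n - k) * np.diag(r).astype(float), np.zeros((l, k - rsum)), -rcol @ np.ones((1, n - k))],
    [np.zeros((k - rsum, l)), (n - rsum) * np.eye(k - rsum) - np.ones((k - rsum, k - rsum)),
     -np.ones((k - rsum, n - k))],
    [-np.ones((n - k, 1)) @ rcol.T, -np.ones((n - k, k - rsum)),
     n * np.eye(n - k) - np.ones((n - k, n - k))],
])

# Matrix-tree theorem: the cofactor counts the spanning trees of K''.
spanning_trees = int(round(np.linalg.det(L[1:, 1:])))

# Multiply by the number of spanning trees inside each prescribed component (Cayley's formula).
total = spanning_trees
for ri in r:
    total *= ri ** (ri - 2)

# Formula (27).
formula = n ** (n - k - 1) * (n - rsum) ** (k - rsum - 1) * (n - k) ** l
for ri in r:
    formula *= ri ** (ri - 1)

print(total, formula)                      # both values coincide
```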

As a consequence of Lemma 8 for \(\ell = 1\), the probability \(\mathbb {P}(R_{n}^{(k)} = r)\) can be obtained by multiplying \(n^{n-k-1} (n-r)^{k-r-1} (n-k) r^{r-1}\) with \(r \left( {\begin{array}{c}k\\ r\end{array}}\right) \) (which gives the number of rooted labeled trees on n vertices whose root is contained in a cluster of size r among the first k uncovered vertices), and then normalizing by \(n^{n-1}\), the number of labeled rooted trees on n vertices.

Theorem 9

The exact distribution of \(R_{n}^{(k)}\) is characterized by its probability mass function (p.m.f.), which is given by the following formula for \(0 \le m \le k \le n\) and \(n \ge 1\) (and is equal to 0 otherwise):

$$\begin{aligned} \mathbb {P}(R_{n}^{(k)} = m) = {\left\{ \begin{array}{ll} 1-\frac{k}{n}, &{} \quad \hbox { for}\ m=0,\\ \frac{m^{m} (n-k) (n-m)^{k-m-1}}{n^{k}} \left( {\begin{array}{c}k\\ m\end{array}}\right) , &{} \quad \hbox { for}\ 1 \le m \le k < n,\\ 1, &{} \quad \hbox { for}\ m=k=n. \end{array}\right. } \end{aligned}$$

Depending on the growth of \(k=k(n)\), we obtain the following limiting behavior:

  • k small, i.e., \(k = o(n)\):

    $$\begin{aligned} R_{n}^{(k)} \xrightarrow {p} 0. \end{aligned}$$
  • k in central region, i.e., \(k \sim \alpha n\) with \(0< \alpha < 1\):

    $$\begin{aligned} R_{n}^{(k)} \xrightarrow {d} R_{\alpha }, \quad \text {where the discrete r.v.}\ R_{\alpha }\ \text { is characterized by its p.m.f.}\\ \mathbb {P}(R_{\alpha } = m) =: p_{m} = {\left\{ \begin{array}{ll} 1-\alpha , &{} \quad m = 0,\\ \frac{m^{m}}{m!} (1-\alpha ) \alpha ^{m} e^{-\alpha m}, &{} \quad m \ge 1, \end{array}\right. } \end{aligned}$$

    or alternatively by the probability generating function \(p(v) = \sum _{m \ge 0} p_{m} v^{m} = \frac{1-\alpha }{1-T^{\bullet }(v \alpha e^{-\alpha })}\).

  • k subcritically large, i.e., \(k=n-d\) with \(d=\omega (\sqrt{n})\) and \(d=o(n)\):

    $$\begin{aligned} \Big (\frac{d}{n}\Big )^{2} \cdot R_{n}^{(k)} \xrightarrow {d} {\textsc {Gamma} }\Big (\frac{1}{2},\frac{1}{2}\Big ), \end{aligned}$$

    where \(\textsc {Gamma} (\frac{1}{2},\frac{1}{2})\) is a Gamma-distribution characterized by its density \(f(x) = \frac{1}{\sqrt{2 \pi x}} \, e^{-\frac{x}{2}}\), for \(x > 0\).

  • k critically large, i.e., \(k=n-d\) with \(d \sim c \sqrt{n}\) and \(c > 0\):

    $$\begin{aligned} \frac{1}{n} \cdot R_{n}^{(k)} \xrightarrow {d} R(c), \end{aligned}$$

    where the continuous r.v. R(c) is characterized by its density \(f_{c}(x) = \frac{1}{\sqrt{2 \pi }} \frac{c}{\sqrt{x} (1-x)^{\frac{3}{2}}} e^{-\frac{c^{2} x}{2(1-x)}}\), for \(0< x < 1\).

  • k supercritically large, i.e., \(k=n-d\) with \(d=\omega (1)\) and \(d = o(\sqrt{n})\):

    $$\begin{aligned} \frac{1}{d^{2}} \cdot \big (n - R_{n}^{(k)}\big ) \xrightarrow {d} D, \end{aligned}$$

    where the continuous r.v. D is characterized by its density \(f(x) = \frac{1}{\sqrt{2 \pi } \, x^{\frac{3}{2}}} \, e^{-\frac{1}{2x}}\), \(x > 0\).

  • k supercritically large with fixed difference, i.e., \(k=n-d\) with d fixed:

    $$\begin{aligned} n - d - R_{n}^{(k)} \xrightarrow {d} D(d), \end{aligned}$$

    where the discrete r.v. D(d) is characterized by the p.m.f.

    $$\begin{aligned} \mathbb {P}(D(d) = j) =: p_{j} = e^{-d} \cdot \frac{d (d+j)^{j-1}}{j!} \cdot e^{-j}, \quad j \ge 0, \end{aligned}$$

    or alternatively via the probability generating function \(p(v) = \sum _{j \ge 0} p_{j} v^{j} = e^{d (T^{\bullet }(\frac{v}{e})-1)}\).

Proof

The probability mass function of \(R_{n}^{(k)}\) follows from the considerations made before the statement of the theorem. Due to the explicit nature of this formula, the limiting distribution results stated above can be obtained in a rather straightforward way by applying Stirling’s formula to the factorials after distinguishing several cases. \(\square \)
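
Both the agreement with the expression derived from Lemma 8 and the fact that the stated probabilities sum to 1 are easily checked with exact rational arithmetic. A minimal Python sketch (the ranges of n and k are illustrative) reads as follows.

```python
from fractions import Fraction
from math import comb

def pmf(n, k, m):
    """p.m.f. of R_n^(k) from Theorem 9 (cases m = 0 and 1 <= m <= k < n)."""
    if m == 0:
        return 1 - Fraction(k, n)
    return Fraction(m ** m * (n - k) * comb(k, m), n ** k) * Fraction(n - m) ** (k - m - 1)

def pmf_from_lemma(n, k, m):
    """The same probability obtained from Lemma 8 (l = 1), as described before Theorem 9."""
    if m == 0:
        return 1 - Fraction(k, n)
    rooted = Fraction(n ** (n - k - 1) * (n - k) * m ** m * comb(k, m)) * Fraction(n - m) ** (k - m - 1)
    return rooted / n ** (n - 1)

for n in range(2, 12):
    for k in range(1, n):                 # k = n is the deterministic case R = n
        assert sum(pmf(n, k, m) for m in range(k + 1)) == 1
        assert all(pmf(n, k, m) == pmf_from_lemma(n, k, m) for m in range(k + 1))
print("p.m.f. of R_n^(k) checked for all n < 12")
```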

Remark 4

Of course, for labeled trees, the distribution of \(R_{n}^{(k)}\) coincides with the distribution of the cluster size of a random vertex. Furthermore, by conditioning, one can easily transfer the results of Theorem 9 to results for the size \(S_{n}^{(k)}\) of the cluster of the k-th uncovered vertex: \(\mathbb {P}(S_{n}^{(k)} = m) = \mathbb {P}(R_{n}^{(k)} = m \mid R_{n}^{(k)} > 0) = \frac{n}{k} \cdot \mathbb {P}(R_{n}^{(k)}=m)\), for \(m \ge 1\).

Remark 5

Preliminary considerations indicate that the approaches used to characterize the distribution of \(R_{n}^{(k)}\) could be extended, at least in principle, to obtain joint distributions of the size of the root cluster at several times during the uncover process. However, it seems that using them to deduce functional limit theorems for the size of the root cluster, e.g., in the critical region, might be a daunting task.

Remark 6

The distributions R(c) and D occurring in the critical and supercritical region, resp., are related to the Lévy-distribution Lévy\((\gamma )\), \(\gamma >0\), with density

$$\begin{aligned} f_{\gamma }(x)=\sqrt{\frac{\gamma }{2\pi }}\frac{e^{-\gamma /(2x)}}{x^{3/2}},\quad x>0. \end{aligned}$$

Actually, D is a standard Lévy-distributed random variable (\(\gamma =1\)), whereas R(c) can be obtained as the reciprocal of a shifted Lévy-distributed random variable \(L {\mathop {=}\limits ^{d}} \text {L\'evy}(c^{2})\):

$$\begin{aligned} R(c) {\mathop {=}\limits ^{d}} \frac{1}{1+L}. \end{aligned}$$
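
Indeed, the density of D stated in Theorem 9 is exactly \(f_{1}\), and the relation for R(c) follows from a short change of variables: writing \(x = \frac{1}{1+y}\), i.e., \(y = \frac{1-x}{x}\), the density of \(\frac{1}{1+L}\) at \(x \in (0,1)\) equals

$$\begin{aligned} f_{c^{2}}\Big (\frac{1-x}{x}\Big ) \cdot \Big | \frac{\textrm{d}}{\textrm{d}x}\, \frac{1-x}{x} \Big | = \frac{c}{\sqrt{2\pi }} \Big (\frac{x}{1-x}\Big )^{3/2} e^{-\frac{c^{2} x}{2(1-x)}} \cdot \frac{1}{x^{2}} = \frac{1}{\sqrt{2\pi }}\, \frac{c}{\sqrt{x}\, (1-x)^{3/2}}\, e^{-\frac{c^{2} x}{2(1-x)}} = f_{c}(x). \end{aligned}$$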

4 Size of the Largest Uncovered Component

With knowledge about the behavior of the root cluster at our disposal, we return to unrooted labeled trees and study the size of the largest cluster. To this end, we introduce the random variable \(X_{n, r}^{(k)}\), which models the number of components of size r after uncovering the vertices 1 to k in a uniformly random labeled tree of size n.

Formally, \(X_{n, r}^{(k)}:\mathcal {T}_n \rightarrow \mathbb {Z}_{\ge 0}\). Note that we have, for all labeled trees \(T\in \mathcal {T}_n\),

$$\begin{aligned} \sum _{r=1}^n r\cdot X_{n, r}^{(k)}(T) = k. \end{aligned}$$
(28)

Theorem 10

Let \(n, k, r\in \mathbb {Z}_{\ge 0}\) with \(1\le r \le k \le n\). The expected number of connected components of size r after uncovering k vertices of a labeled tree of size n chosen uniformly at random is

$$\begin{aligned} \mathbb {E}X_{n,r}^{(k)} = \left( {\begin{array}{c}k\\ r\end{array}}\right) \Big (\frac{r}{n}\Big )^{r-1} \Bigl (1 - \frac{k}{n}\Bigr ) \Bigl (1 - \frac{r}{n}\Bigr )^{k-r-1}. \end{aligned}$$
(29)

Proof

Observe that \(X_{n,r}^{(k)}\) can be written as a sum of Bernoulli random variables

$$\begin{aligned} X_{n, r}^{(k)} = \sum _{\begin{array}{c} S \subseteq [k] \\ |S| = r \end{array}} X_{n, S}^{(k)}, \end{aligned}$$

with \(X_{n, S}^{(k)}\) being the indicator random variable of the event that the vertices in S form a cluster after k uncover steps. By symmetry and linearity of the expected value, we have

$$\begin{aligned} \mathbb {E}X_{n, r}^{(k)} = \sum _{\begin{array}{c} S \subseteq [k] \\ |S| = r \end{array}} \mathbb {E}X_{n, S}^{(k)} = \left( {\begin{array}{c}k\\ r\end{array}}\right) \mathbb {E}X_{n, [r]}^{(k)}. \end{aligned}$$

The expected value on the right-hand side is the probability that the vertices \(1,\ldots ,r\) form a cluster after k uncover steps, which is obtained from Lemma 8 with \(\ell = 1\) after normalizing by \(n^{n-2}\); this proves the theorem. \(\square \)

In the spirit of the observation in (28), the formula in Lemma 8 provides a combinatorial proof for the following summation identity.

Corollary 11

Let \(n, k \in \mathbb {Z}_{\ge 0}\) with \(0\le k\le n\). Then, the identity

$$\begin{aligned} \sum _{r = 1}^{k} \left( {\begin{array}{c}k\\ r\end{array}}\right) r^r n^{n-k-1} (n-r)^{k-r-1} (n-k) = k n^{n-2} \end{aligned}$$
(30)

holds.

Proof

The right-hand side enumerates the vertices in [k] in all labeled trees on n vertices. The left-hand side does the same, with the summands enumerating the vertices in connected components of size r. \(\square \)

Remark 7

Observe that the identity in (30) can be rewritten as

$$\begin{aligned} \sum _{r = 1}^{k} \left( {\begin{array}{c}k\\ r\end{array}}\right) r^r (n-r)^{k-r-1} = \frac{k}{n-k} n^{k-1}, \end{aligned}$$

which is a special case of Abel’s Binomial Theorem, a classical and well-known result; see, e.g., [24].
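
Both forms of the identity are easy to confirm by direct computation for small parameters; a minimal Python check using exact arithmetic (the ranges are illustrative) might look as follows.

```python
from fractions import Fraction
from math import comb

for n in range(2, 15):
    for k in range(1, n):
        # identity (30); the term r = k is handled via Fraction to allow the exponent -1
        lhs = sum(Fraction(comb(k, r) * r ** r * n ** (n - k - 1) * (n - k)) * Fraction(n - r) ** (k - r - 1)
                  for r in range(1, k + 1))
        assert lhs == k * n ** (n - 2)
        # rewritten (Abel-type) form
        abel = sum(Fraction(comb(k, r) * r ** r) * Fraction(n - r) ** (k - r - 1)
                   for r in range(1, k + 1))
        assert abel == Fraction(k * n ** (k - 1), n - k)
print("identity (30) and its Abel-type form confirmed for n < 15")
```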

For a tree \(T\in \mathcal {T}\), let \(c_{\max }^{(k)}(T)\) denote the size of the largest connected component of T after uncovering the first k vertices.

Theorem 12

Let \(n\in \mathbb {Z}_{\ge 0}\), and let \(T_n\in \mathcal {T}_n\) be a tree chosen uniformly at random. Then the behavior of the random variable \(c_{\max }^{(k)}(T_n)\) as \(n\rightarrow \infty \) can be described as follows:

  • for \(k = n-d\) with \(d = \omega (\sqrt{n})\) (subcritical case), we have \(c_{\max }^{(k)}(T_n)/n \xrightarrow {p} 0\).

  • for \(k = n-d\) with \(d = o(\sqrt{n})\) (supercritical case), we have \(c_{\max }^{(k)}(T_n)/n \xrightarrow {p} 1\).

Informally, we can interpret this result as follows: as long as substantially more than order \(\sqrt{n}\) vertices are left to be uncovered, with high probability there is no “giant” component (i.e., a component that contains at least a fixed percentage of all vertices). If, on the other hand, substantially fewer than order \(\sqrt{n}\) vertices are left to be uncovered, then with high probability there is one such “giant” component, and its size is asymptotically equal to n.
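
This dichotomy can also be illustrated numerically via Theorem 10: the sum \(\sum _{r \ge \epsilon n} \mathbb {E}X_{n,r}^{(k)}\) is an upper bound for the probability that a component of size at least \(\epsilon n\) exists, and it also approximates the expected number of such components. The following Python sketch (the choices of n, \(\epsilon \) and d are purely illustrative) evaluates this sum via log-gamma functions for one subcritical and one supercritical choice of d.

```python
from math import exp, lgamma, log, log1p

def log_expected_components(n, k, r):
    """log of E X_{n,r}^{(k)} as given in (29), computed via log-gamma to avoid overflow."""
    log_binom = lgamma(k + 1) - lgamma(r + 1) - lgamma(k - r + 1)
    return (log_binom + (r - 1) * (log(r) - log(n))
            + log1p(-k / n) + (k - r - 1) * log1p(-r / n))

def expected_large_components(n, d, eps):
    """Expected number of components of size >= eps*n after uncovering k = n - d vertices."""
    k = n - d
    return sum(exp(log_expected_components(n, k, r)) for r in range(int(eps * n), k + 1))

n, eps = 10 ** 6, 0.1
print(expected_large_components(n, d=round(n ** 0.75), eps=eps))   # subcritical: essentially 0
print(expected_large_components(n, d=round(n ** 0.25), eps=eps))   # supercritical: close to 1
```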

Proof of Theorem 12

For the subcritical case, we use the expected root cluster size from Theorem 7. Since a cluster of size r contains the root with probability \(\frac{r}{n}\), we have

$$\begin{aligned} \frac{n^2}{d^2}&\sim \mathbb {E}R_{n}^{(n-d)} = \sum _{r = 0}^{n-d} \mathbb {E}X_{n,r}^{(n-d)} \cdot r \cdot \frac{r}{n} \ge \sum _{r = m}^{n-d} \mathbb {E}X_{n,r}^{(n-d)} \frac{r^2}{n} \ge \frac{m^2}{n} \sum _{r=m}^{n-d} \mathbb {E}X_{n,r}^{(n-d)}\\&\ge \frac{m^2}{n} \mathbb {P}(c_{\max }^{(n-d)}(T_n) \ge m). \end{aligned}$$

This implies that

$$\begin{aligned} \mathbb {P}(c_{\max }^{(n-d)}(T_n) \ge m) = \mathcal {O}\Bigl (\frac{n^3}{d^2 m^2}\Bigr ), \end{aligned}$$

so if \(m = \epsilon n\) for any fixed \(\epsilon > 0\), we have \(\mathbb {P}(c_{\max }^{(n-d)}(T_n) \ge m) \rightarrow 0\).

In the supercritical case, we recall the corresponding case for the size of the root cluster from Theorem 7. Using Markov’s inequality yields, for any \(\epsilon > 0\),

$$\begin{aligned} \mathbb {P}(n - R_n^{(k)} \ge \epsilon n) \le \frac{n - \mathbb {E}(R_{n}^{(k)})}{\epsilon n} = \mathcal {O}\Big (\frac{(d+1) \sqrt{n}}{\epsilon n}\Big ) \xrightarrow {n\rightarrow \infty } 0. \end{aligned}$$

Thus, with high probability, the root cluster has size \(\sim n\) and is in particular the largest cluster. Translating this from rooted to unrooted trees proves the theorem. \(\square \)

In the critical case where \(n - k \sim c\sqrt{n}\) for a constant c, we are also able to characterize the behavior of \(c_{\max }^{(k)}(T_n)/n\): this variable converges weakly to a continuous limiting distribution. In order to describe the distribution, we first require the following lemma.

Lemma 13

Suppose that \(n - k \sim c \sqrt{n}\), and fix \(\alpha > 0\). The probability that the forest induced by the vertices with labels in [k] of a uniformly random labeled tree \(T_n\) with n vertices contains two components, each with at least \(\alpha n\) vertices, whose sizes are equal, goes to 0 as \(n \rightarrow \infty \).

Proof

Suppose that there are two components with a vertices each, where \(a \ge \alpha n\). By Lemma 8, the probability for this to happen is bounded above by

$$\begin{aligned} P(a) = \frac{\left( {\begin{array}{c}k\\ a,a,k-2a\end{array}}\right) n^{n-k-1}(n-2a)^{k-2a-1} (n-k)^2 a^{2a-2}}{n^{n-2}}, \end{aligned}$$

provided that \(k \ge 2a\) (otherwise, the probability is trivially 0). The initial multinomial coefficient accounts for the choices of the labels of the two components (and, since ordered pairs of label sets are counted, yields an upper bound), and the denominator is simply the total number of labeled trees. Our aim is to estimate this expression. The case \(k = 2a\) is easy to deal with separately, so assume that \(k > 2a\). Then by Stirling’s formula, we have, for some constant \(C_1\),

$$\begin{aligned} P(a)&\le \frac{C_1 k^{k+1/2}}{a^{2a+1}(k-2a)^{k-2a+1/2}} \cdot \frac{n^{n-k-1}(n-2a)^{k-2a-1} (n-k)^2 a^{2a-2}}{n^{n-2}} \\&= \frac{C_1 k^{1/2}(n-k)^2n}{a^3(n-2a)(k-2a)^{1/2}} \Big ( 1 + \frac{n-k}{k} \Big )^{-k} \Big ( 1 + \frac{n-k}{k-2a} \Big )^{k-2a}. \end{aligned}$$

It is well known that \((1+\beta /x)^x\) is increasing in x for fixed \(\beta \). So if \(k - 2a \le n^{3/4}\), we have, using the assumption that \(n-k \sim c \sqrt{n}\),

$$\begin{aligned}&\Big ( 1 + \frac{n-k}{k} \Big )^{-k} \Big ( 1 + \frac{n-k}{k-2a} \Big )^{k-2a} \\&\quad \le \Big ( 1 + \frac{n-k}{k} \Big )^{-k} \Big ( 1 + \frac{n-k}{n^{3/4}} \Big )^{n^{3/4}} \\&\quad = \exp \Big ({-k} \log \Big ( 1 + \frac{n-k}{k} \Big ) + n^{3/4} \log \Big ( 1 + \frac{n-k}{n^{3/4}} \Big ) \Big ) \\&\quad = \exp \Big ({-k} \Big ( \frac{n-k}{k} + \mathcal {O}(n^{-1}) \Big ) + n^{3/4} \Big ( \frac{n-k}{n^{3/4}} - \frac{(n-k)^2}{2n^{3/2}} + \mathcal {O}(n^{-3/4}) \Big )\Big ) \\&\quad = \exp \Big ( {-\frac{c^2}{2}} n^{1/4} + o(n^{1/4}) \Big ). \end{aligned}$$

In this case, P(a) goes to 0 faster than any power of n. Otherwise, i.e., if \(k - 2a > n^{3/4}\), we have \(k-2a \sim n-2a\), and using the assumption that \(a \ge \alpha n\) as well as the same Taylor expansion as above, we obtain

$$\begin{aligned} P(a) \le \frac{C_2(n-k)^2}{n^{3/2}(n-2a)^{3/2}} \exp \Big ({- \frac{(n-k)^2}{2(n-2a)} }\Big ) \end{aligned}$$

for a constant \(C_2\). We can rewrite this as

$$\begin{aligned} P(a) \le \frac{C_2}{n^{3/2}(n-k)} f \Big ( \frac{(n-k)^2}{n-2a} \Big ) \end{aligned}$$

with \(f(x) = x^{3/2} e^{-x/2}\). Since this function is bounded, we have proven that \(P(a) = \mathcal {O}(n^{-2})\), uniformly in a. Summing over all possible values of a, it follows that the probability in question is \(\mathcal {O}(n^{-1})\). In particular, it goes to 0. \(\square \)

Now we are able to prove the following description of the limiting distribution of \(c_{\max }^{(k)}(T_n)/n\).

Theorem 14

Suppose that \(n - k \sim c \sqrt{n}\), and fix \(\alpha > 0\). The probability that the forest induced by the vertices with labels in [k] of a uniformly random labeled tree \(T_n\) with n vertices contains a component with at least \(\alpha n\) vertices tends to

$$\begin{aligned} \sum _{j \ge 1} \frac{(-1)^{j-1} c^j}{(2\pi )^{j/2}} \idotsint \limits _{\begin{array}{c} \alpha \le t_1< \cdots< t_j \\ \tau _j = t_1 + \cdots + t_j < 1 \end{array}} \Big ( \prod _{i=1}^j t_i^{-3/2} \Big ) (1-\tau _j)^{-3/2} \exp \Big ( - \frac{c^2\tau _j}{2 (1-\tau _j)} \Big ) \,\textrm{d}t_1 \cdots \textrm{d}t_j \end{aligned}$$

as \(n \rightarrow \infty \).

Proof

Let \(r_1,\ldots ,r_\ell \) be positive integers with \(\alpha n \le r_1< \cdots < r_\ell \) and \(r_1+\cdots +r_\ell \le k\). By Lemma 8, the probability that the forest induced by vertices with labels in [k] has components of sizes \(r_1,\ldots ,r_\ell \) is given by the following formula, with \(r = r_1 + \cdots + r_{\ell }\):

$$\begin{aligned} P(r_1,\ldots ,r_{\ell }) = \frac{\left( {\begin{array}{c}k\\ r_1,\ldots ,r_{\ell },k-r\end{array}}\right) n^{n-k-1}(n-r)^{k-r-1} (n-k)^{\ell } r_1^{r_1-1}\cdots r_{\ell }^{r_{\ell }-1}}{n^{n-2}}, \end{aligned}$$

and the same argument as in Lemma 13 shows that this probability is \(\mathcal {O}(n^{-\ell })\), uniformly in \(r_1,\ldots ,r_\ell \). Moreover, if we set \(r_i = t_i n\), Stirling’s formula gives us the following asymptotic formula for this probability after some manipulations: with \(t = t_1 + \cdots + t_{\ell }\), it is

$$\begin{aligned} P(r_1,\ldots ,r_{\ell }) \sim \frac{c^\ell }{n^{\ell } (2\pi )^{\ell /2}} \Big ( \prod _{i=1}^\ell t_i^{-3/2} \Big ) (1-t)^{-3/2} \exp \Big ( - \frac{c^2t}{2 (1-t)} \Big ). \end{aligned}$$

Moreover, by the inclusion–exclusion principle, the probability that there is at least one component of size at least \(\alpha n\) is given by

$$\begin{aligned}{} & {} \sum _{\alpha n \le r_1 \le k} P(r_1) - \sum _{\begin{array}{c} \alpha n \le r_1< r_2 \\ r_1+r_2 \le k \end{array}} P(r_1,r_2) \\{} & {} \quad + \cdots + (-1)^{j-1} \sum _{\begin{array}{c} \alpha n \le r_1< \cdots < r_j \\ r_1 + \cdots + r_j \le k \end{array}} P(r_1,\ldots ,r_j) + \cdots + \mathcal {O}(n^{-1}). \end{aligned}$$

The final error term accounts for the possibility that there are two components of the same size; the probability of this event is \(\mathcal {O}(n^{-1})\) by Lemma 13. Note that we actually only need a finite number of terms, as the sums become empty for \(j \alpha > 1\). If we plug in the asymptotic formula for \(P(r_1,\ldots ,r_{\ell })\) and pass to the limit, the sums become integrals, and we obtain the desired formula. \(\square \)
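
For \(\alpha > \frac{1}{2}\), at most one component of size at least \(\alpha n\) can exist, so only the term \(j = 1\) contributes, and the limit reduces to a single integral that can be evaluated numerically. A short Python sketch (using scipy; the values of \(\alpha \) and c are illustrative) reads as follows.

```python
from math import exp, pi, sqrt
from scipy.integrate import quad

def integrand(t, c):
    # j = 1 integrand from Theorem 14; it tends to 0 as t -> 1
    if t >= 1.0:
        return 0.0
    return t ** -1.5 * (1.0 - t) ** -1.5 * exp(-c * c * t / (2.0 * (1.0 - t)))

alpha, c = 0.6, 1.0                        # illustrative values with alpha > 1/2
value, _ = quad(integrand, alpha, 1.0, args=(c,))
limit_probability = c / sqrt(2.0 * pi) * value
print(limit_probability)                   # limiting probability of a component of size >= alpha*n
```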