1 Introduction

The motivation of this paper is a new section in Donald Knuth’s The Art of Computer Programming [16], which is dedicated to Depth-First Search (DFS) in a digraph. Briefly, the DFS starts with an arbitrary vertex and explores the arcs from that vertex one by one. When an arc is found leading to a vertex that has not been seen before, the DFS explores the arcs from that vertex in the same way, in a recursive fashion, before returning to the next arc from its parent. This eventually yields a tree containing all descendants of the first vertex (which is the root of the tree). If there still are some unseen vertices, the DFS starts again with one of them and finds a new tree, and so on until all vertices are found. We refer to [16] for details as well as for historical notes. (See also the pseudo-code below and S1–S2 in Sect. 7.) Note that the digraphs in [16] and here are multi-digraphs, where loops and multiple arcs are allowed. (Although in our random model, these are few and usually not important.) The DFS algorithm generates a spanning forest (the depth-first forest) in the digraph, with all arcs in the forest directed away from the roots. Our main purpose is to study the properties of the depth-first forest, starting with a random digraph G; in particular we study the distribution of the depth of vertices in the depth-first forest.

The random digraph model that we consider (following Knuth [16]) has n vertices and a given outdegree distribution \(\textbf{P}\), which in the main part of the present paper is a geometric distribution \({\text {Ge}}(1-p)\) for some fixed \(0<p<1\). (See Sect. 1.1 for definitions of the two versions \({\text {Ge}}(1-p)\) and \({\text {Ge}}_1(1-p)\) of geometric distributions.) The outdegrees (number of outgoing arcs) of the n vertices are independent random numbers with this distribution. The endpoint of each arc is uniformly selected at random among the n vertices, independently of all other arcs. (Therefore, an arc can loop back to the starting vertex, and multiple arcs can occur.) We consider asymptotics as \(n\rightarrow \infty \) for a fixed outdegree distribution.

In the present paper, we study the case of a geometric outdegree distribution in detail; we also (in Sect. 6) briefly give corresponding results for the shifted geometric outdegree distribution \({\text {Ge}}_1(1-p)\) and discuss the similarities and differences between the two cases. The case of a general outdegree distribution (with finite variance) will be studied in a forthcoming paper [12], where we use a somewhat different method which allows us to extend many (but not all) of the results in the present paper and obtain similar, but partly weaker, results; see also Sect. 7. One reason for studying the geometric case separately is that its lack-of-memory property leads to interesting features and simplifications not present for general outdegree distributions; this is seen both in [16] and in the proofs and results below. In particular, the depth process studied in Sect. 2 will be a Markov chain, which is the basis of our analysis.

In addition to studying the depth-first forest, we also give (in Sect. 5) some results on the numbers of arcs of different types in the depth-first jungle; the jungle is defined in [16] as the original digraph with its arcs classified by the DFS algorithm into the following five types (see Fig. 1 for examples):

  • loops;

  • tree arcs, the arcs in the resulting depth-first forest;

  • back arcs, the arcs which point to an ancestor of the current vertex in the current tree;

  • forward arcs, the arcs which point to an already discovered descendant of the current vertex in the current tree;

  • cross arcs, all other arcs (these point to an already discovered vertex which is neither a descendant nor an ancestor of the current vertex and might be in another tree).

(See further the exercises in [16].)

Fig. 1

Example of a depth-first forest (jungle) from [16], by courtesy of Donald Knuth. Tree arcs are solid (e.g., \(\textcircled {9}\rightarrow \textcircled {3}\)). For example, \(\textcircled {3}\dasharrow \textcircled {3}\) is a loop, \(\textcircled {2}\dasharrow \textcircled {3}\) is a back arc, \(\textcircled {9}\dasharrow \textcircled {7}\) is a forward arc, and \(\textcircled {8}\dasharrow \textcircled {4}\) and \(\textcircled {0}\dasharrow \textcircled {2}\) are cross arcs

For completeness of the presentation, we show a pseudo-code of the depth-first search with indications of the arc classifications. The stack at time t is the chain of the arguments of the successive calls of the function \(\textsc {Deep}{}\) which have not terminated at time t. Thus a vertex u is an ancestor of a vertex v in the depth-first forest if u is still in the stack when v is discovered. In the pseudo-code, the instructions which are not necessary for the functioning of the algorithm are prefixed, as comments, with a #; for example, the updates of the parameters t and d. (To be precise, d(t) is set each time t is changed.) \({{\mathcal {N}}}(u)\) denotes the multiset of children of u, i.e., the endpoints of the arcs starting at u. (This is a multiset since there may be multiple arcs.)

[The two algorithm boxes with the pseudo-code of the DFS (function \(\textsc {Deep}\)) are not reproduced here.]
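
In their place, the following Python sketch (ours, for illustration only, not the authors' algorithm boxes) implements the DFS just described on the random multi-digraph model, including the arc classification and the counters t and d. The names ge, random_digraph, dfs and deep, and the parameter values in the example, are our own choices.

```python
import random
import sys

def ge(q):
    """Sample Ge(q) on {0, 1, 2, ...}: P(k) = (1 - q)^k * q."""
    k = 0
    while random.random() >= q:
        k += 1
    return k

def random_digraph(n, p):
    """The random multi-digraph of the paper: i.i.d. Ge(1-p) outdegrees,
    each arc endpoint uniform on the n vertices (loops and multiple arcs allowed)."""
    return [[random.randrange(n) for _ in range(ge(1 - p))] for _ in range(n)]

def dfs(N):
    """Depth-first search; returns the depths d(1),...,d(n) in discovery order
    and the numbers of arcs of the five types."""
    n = len(N)
    disc = [None] * n        # discovery time t of each vertex (1-based)
    active = [False] * n     # True while deep(u) has not yet returned (u is on the stack)
    depth = [0] * n
    d = []                   # d[t-1] = depth of the t-th discovered vertex
    arcs = {"loop": 0, "tree": 0, "back": 0, "forward": 0, "cross": 0}
    t = 0

    def deep(u):
        nonlocal t
        t += 1
        disc[u] = t
        active[u] = True
        d.append(depth[u])
        for w in N[u]:                      # explore the arcs from u one by one
            if w == u:
                arcs["loop"] += 1
            elif disc[w] is None:           # unseen endpoint: tree arc, recurse
                arcs["tree"] += 1
                depth[w] = depth[u] + 1
                deep(w)
            elif active[w]:                 # w is an ancestor of u: back arc
                arcs["back"] += 1
            elif disc[w] > disc[u]:         # w was found inside u's subtree: forward arc
                arcs["forward"] += 1
            else:                           # neither ancestor nor descendant: cross arc
                arcs["cross"] += 1
        active[u] = False

    for v in range(n):                      # start a new tree at every unseen vertex
        if disc[v] is None:
            depth[v] = 0
            deep(v)
    return d, arcs

if __name__ == "__main__":
    sys.setrecursionlimit(1_000_000)        # heights are linear in n when lambda > 1
    random.seed(1)
    d, arcs = dfs(random_digraph(10_000, 0.6))
    print(max(d), arcs)
```

With \(p=0.6\) (so \(\lambda =1.5\)), the printed height and arc counts illustrate the linear-in-n behaviour derived in the following sections.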

Remark 1.1

Some related results for DFS in an undirected Erdős–Rényi graph \(G(n,\lambda /n)\) are proved by Faraud and Ménard [9] and Diskin and Krivelevich [8], and DFS in a random Erdős–Rényi digraph has been studied for example in the proof of [17, Theorem 3]. These models are closely related to our model with a Poisson outdegree distribution \(\textbf{P}\); they will therefore be further discussed in [12].

Remark 1.2

We consider only the case of a fixed outdegree distribution \(\textbf{P}\). The results can be extended to distributions \(\textbf{P}_n\) depending on n, under suitable conditions. This is particularly interesting in the critical case, with expectations \(\lambda _n\rightarrow 1\) (where \(\lambda _n\) is the expectation of \(\textbf{P}_n\)); however, this is outside the scope of the present paper.

The main results for a geometric outdegree distribution are stated and proved in Sects. 2–5. We analyze the process d(t) of depths of the vertices, in the order they are found by the DFS. For a geometric outdegree distribution (but not in general), d(t) is a Markov chain, and we find its first-order limit by a martingale argument; moreover, we show that the fluctuations are of order \(\sqrt{n}\) and, in a subrange, asymptotically Gaussian. The first-order limit of the function d(t)/n is an explicit and simple function of t/n and \(\lambda \), namely \(\left[ \frac{t}{n}+\frac{1}{\lambda }\log (1-\frac{t}{n})\right] ^+\), where \([x]^+\) is the positive part of x (Theorem 2.4). This leads to results on the height of the depth-first forest (Corollary 2.5 and Theorem 3.4), the average depth (Corollary 2.6), the number of trees in the forest (Theorem 4.1), the size of the largest tree in the forest (Theorem 4.3), and the numbers of arcs of the different types defined above (Theorems 5.1 and 5.3); the latter includes verifying some conjectures from previous versions of [16].

In Sect. 6, we briefly study the case of a shifted geometric outdegree distribution. The same method as in the previous sections works in this case too, but the explicit results are somewhat different (Theorem 6.1). In particular, the first-order limit of d(t)/n is again an explicit function of t/n and \(\lambda \), namely \(\frac{\lambda }{\lambda -1}\left[ \frac{t}{n}+\frac{1}{\lambda }\log (1-\frac{t}{n})\right] ^+\), which differs from the limit for the unshifted geometric distribution. One motivation for this section is to show that some of the relations found in Sects. 2–5 for a geometric outdegree distribution do not hold for arbitrary outdegree distributions.

We end in Sect. 7 with some comments on the case of general outdegree distributions.

Appendix A gives a generalized version of Donsker’s theorem in a form convenient to use in our proofs.

1.1 Some Notation

We denote the given outdegree distribution by \(\textbf{P}\). Recall that our standing assumption is that the outdegrees of the vertices are i.i.d. (independent and identically distributed).

The mean outdegree, i.e., the expectation of \(\textbf{P}\), is denoted by \(\lambda \). In analogy with branching processes, we say that the random digraph is subcritical if \(\lambda <1\), critical if \(\lambda =1\), and supercritical if \(\lambda >1\).

As usual, w.h.p.  means with high probability, i.e., with probability \(1-o(1)\) as \({n\rightarrow \infty }\). We use \(\overset{\textrm{p}}{\longrightarrow }\) for convergence in probability, and \(\overset{\textrm{d}}{\longrightarrow }\) for convergence in distribution of random variables.

Moreover, let \((a_n)\) be a sequence of positive numbers, and \(X_n\) be a sequence of random variables. We write \(X_n=o_{\textrm{p}}(a_n)\) if, as \({n\rightarrow \infty }\), \(X_n/a_n\overset{\textrm{p}}{\longrightarrow }0\), i.e., if for every \(\varepsilon >0\), we have \({\mathbb {P}}(|X_n|>\varepsilon a_n)\rightarrow 0\). Note that this is equivalent to the existence of a sequence \(\varepsilon _n\rightarrow 0\) such that \({\mathbb {P}}(|X_n|>\varepsilon _n a_n)\rightarrow 0\), or in other words \(|X_n|\le \varepsilon _na_n\) w.h.p. (This is sometimes denoted “\(X_n=o(a_n)\) w.h.p. ”, but we will not use this notation.)

Furthermore, \(X_n=O_{L^2}(a_n)\) means \({\mathbb {E}}\bigl [|X_n/a_n|^2\bigr ]=O(1)\). Note that \(X_n=O_{L^2}(a_n)\) implies \(X_n=o_{\textrm{p}}(\omega _na_n)\), for any sequence \(\omega _n\rightarrow \infty \). Note also that \(X_n=O_{L^2}(a_n)\) implies \({\mathbb {E}}X_n=O(a_n)\); thus error terms of this type imply immediately estimates for expectations and second moments. In particular, for the most common case below, \(X_n=O_{L^2}(n^{1/2})\) is equivalent to \({\mathbb {E}}X_n=O(n^{1/2})\) and \({\text {Var}}X_n= O(n)\).

\({\text {Ge}}(1-p)\) denotes the geometric distribution on \(\{0,1,\dots \}\); thus \(\eta \sim {\text {Ge}}(1-p)\) means that \(\eta \) is a random variable with

$$\begin{aligned} {\mathbb {P}}(\eta =k)=p^k(1-p), \qquad k\ge 0. \end{aligned}$$
(1.1)

Similarly, \({\text {Ge}}_1(1-p)\) denotes the shifted geometric distribution on \(\{1,2,\dots \}\); thus \(\eta \sim {\text {Ge}}_1(1-p)\) means

$$\begin{aligned} {\mathbb {P}}(\eta =k)=p^{k-1}(1-p), \qquad k\ge 1. \end{aligned}$$
(1.2)

\({\text {Po}}(\lambda )\) denotes a Poisson distribution with mean \(\lambda \).

We define \(\rho _0(x)\), for \(x\ge 0\), as the largest solution in [0, 1) to

$$\begin{aligned} 1-\rho _0= e^{-x\rho _0} . \end{aligned}$$
(1.3)

As is well known, \(\rho _0(x)\) is the survival probability of a Bienaymé–Galton–Watson process with a Poisson offspring distribution \({\text {Po}}(x)\) with mean x. We have \(\rho _0(x)=0\) for \(x\le 1\) and \(0<\rho _0(x)<1\) for \(x>1\). (See e.g.  [4, Theorem I.5.1].)
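
For illustration only (this is not needed for the analysis), \(\rho _0(x)\) is easy to evaluate numerically from (1.3) by fixed-point iteration; the following small Python sketch, with our own naming and tolerances, does this.

```python
import math

def rho0(x, tol=1e-12, max_iter=100_000):
    """Largest solution in [0, 1) of 1 - rho = exp(-x * rho), cf. (1.3).

    The iteration rho <- 1 - exp(-x * rho), started at 1, decreases monotonically
    to the largest root; for x <= 1 this root is 0."""
    rho = 1.0
    for _ in range(max_iter):
        new = 1.0 - math.exp(-x * rho)
        if abs(new - rho) < tol:
            return new
        rho = new
    return rho

# rho0(2.0) is about 0.7968, while rho0(0.8) returns (essentially) 0.
```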

For a real number x, we write \(x^+:=\max \{x,0\}\). Let \([n]:=\{1,\dots ,n\}\). All logarithms are natural. C and c are sometimes used for positive constants.

Remark 1.3

We state many results with error estimates in \(L^2\), which means estimates on the second moment; we conjecture that the results extend to higher moments and to estimates in \(L^p\) for any \(p<\infty \), but we have not pursued this.

2 Depth Analysis with Geometric Outdegree Distribution

In this section and the next ones (until we explicitly say otherwise in Sect. 6), we assume that the outdegree distribution is geometric \({\text {Ge}}(1-p)\) for some fixed \(0<p<1\) and thus has mean

$$\begin{aligned} \lambda :=\frac{p}{1-p}. \end{aligned}$$
(2.1)

When doing the DFS on a random digraph of the type studied in this paper, it is natural to reveal the outdegree of a vertex as soon as we find it. (See S1–S2 in Sect. 7.) However, for a geometric outdegree distribution, because of its lack-of-memory property, we do not have to immediately reveal the outdegree when we find a new vertex v. Instead, we only check whether there is at least one outgoing arc (probability p), and if so, we find its endpoint and explore this endpoint if it has not already been visited; eventually, we return to v and then we check whether there is another outgoing arc (again probability p, by the lack-of-memory property), and so on. This will yield the important Markov property in the construction in the next subsection. In the arguments below, we use only this version of the DFS.

By a future arc from some vertex, we mean an arc from that vertex that at the current time has not yet been seen by the DFS (using the version just described). Note that this is a temporary designation which changes as the DFS proceeds, unlike the permanent classification into five types discussed in the introduction.

2.1 Depth Markov Chain

Our aim is to track the evolution of the search depth as a function of the number t of discovered vertices. Let \(v_t\) be the t-th vertex discovered by the DFS (\(t=1,\dots ,n\)), and let d(t) be the depth of \(v_t\) in the resulting depth-first forest, i.e., the number of tree edges that connect the root of the current tree to \(v_t\). The first found vertex \(v_1\) is a root, and thus \(d(1)=0\).

The quantity d(t) follows a Markov chain with transitions (\(1\le t<n\)):

  (i)

    \(d(t+1)=d(t)+1\).

    This happens if, for some \(k\ge 1\), \(v_t\) has at least k outgoing arcs, the first \(k-1\) arcs lead to vertices already visited, and the kth arc leads to a new vertex (which then becomes \(v_{t+1}\)). The probability of this is

    $$\begin{aligned} \sum _{k=1}^\infty p^k \Bigl (\frac{t}{n}\Bigr )^{k-1}\Bigl (1-\frac{t}{n}\Bigr ) =\frac{(1-t/n)p}{1-pt/n}. \end{aligned}$$
    (2.2)
  (ii)

    \(d(t+1)=d(t)\), assuming \(d(t)>0\).

    This holds if all arcs from \(v_t\) lead to already visited vertices, i.e., (i) does not happen, and furthermore, the parent of \(v_t\) has at least one future arc leading to an unvisited vertex. These two events are independent. Moreover, by the lack-of-memory property, the number of future arcs from the parent of \(v_t\) also has the distribution \({\text {Ge}}(1-p)\). Hence, the probability that one of these future arcs leads to an unvisited vertex equals the probability in (2.2). The probability of (ii) is thus

    $$\begin{aligned} \Bigl (1-\frac{(1-t/n)p}{1-pt/n}\Bigr )\frac{(1-t/n)p}{1-pt/n}. \end{aligned}$$
    (2.3)
  (iii)

    \(d(t+1)=d(t)-\ell \), assuming \(d(t)>\ell \ge 1\).

    This happens if all arcs from \(v_t\) lead to already visited vertices, and so do all future arcs from the \(\ell \) nearest ancestors of \(v_t\), while the \((\ell +1)\)th ancestor has at least one future arc leading to an unvisited vertex. The argument in (ii) generalizes and shows that this has probability

    $$\begin{aligned} \Bigl (1-\frac{(1-t/n)p}{1-pt/n}\Bigr )^{\ell +1}\frac{(1-t/n)p}{1-pt/n}. \end{aligned}$$
    (2.4)
  (iv)

    \(d(t+1)=d(t)-\ell \), assuming \(d(t)=\ell \ge 0\).

    By the same argument as in (ii) and (iii), except that the \((\ell +1)\)th ancestor does not exist and we ignore it, we obtain the probability

    $$\begin{aligned} \Bigl (1-\frac{(1-t/n)p}{1-pt/n}\Bigr )^{\ell +1}. \end{aligned}$$
    (2.5)

Note that (iv) is the case when \(d(t+1)=0\) and thus \(v_{t+1}\) is the root of a new tree in the depth-first forest.
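
As a quick sanity check (ours, not part of the argument), the probabilities (2.2)–(2.5) sum to 1 for any current depth d(t), since the cases (i)–(iv) form a geometric pattern; this is easily verified numerically for arbitrary illustrative values of p, t/n and d(t).

```python
p, theta, d = 0.6, 0.3, 7                                      # arbitrary illustrative values
pi = (1 - theta) * p / (1 - p * theta)                         # the probability in (2.2)

prob_i = pi                                                    # case (i)
prob_ii_iii = sum((1 - pi) ** (l + 1) * pi for l in range(d))  # cases (ii)-(iii): ell = 0, ..., d-1
prob_iv = (1 - pi) ** (d + 1)                                  # case (iv)
assert abs(prob_i + prob_ii_iii + prob_iv - 1.0) < 1e-12
```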

We can summarize (i)–(iv) in the formula

$$\begin{aligned} d(t+1) = \bigl (d(t) +1-\xi _t\bigr )^+, \end{aligned}$$
(2.6)

where \(\xi _t\) is a random variable, independent of the history, with the distribution

$$\begin{aligned} {\mathbb {P}}(\xi _t=k)=(1-\pi _t)^k\pi _t, \quad k\ge 0, \end{aligned}$$
(2.7)

where

$$\begin{aligned} \pi _t:= \frac{(1-t/n)p}{1-pt/n} =1-\frac{1-p}{1-pt/n}. \end{aligned}$$
(2.8)

In other words, \(\xi _t\) has the geometric distribution \({\text {Ge}}(\pi _t)\). Define

$$\begin{aligned} \widetilde{d}(t):=\sum _{i=1}^{t-1}(1-\xi _i), \qquad t\ge 1, \end{aligned}$$
(2.9)

(in particular, \(\widetilde{d}(1)=0\)) and note that (2.9) is a sum of independent random variables. Then d(t) can be recovered from the simpler process \(\widetilde{d}(t)\) as follows.

Lemma 2.1

We have

$$\begin{aligned} d(t)=\widetilde{d}(t) - \min _{1\le j\le t} \widetilde{d}(j), \qquad 1\le t \le n .\end{aligned}$$
(2.10)

Proof

We use induction on t. Evidently, (2.10) holds for \(t=1\) since \(d(1)=\widetilde{d}(1)=0\).

Suppose that (2.10) holds for some \(t<n\). Then (2.9) yields

$$\begin{aligned} \widetilde{d}(t+1)=\widetilde{d}(t)+1-\xi _t = d(t)+1-\xi _t+\min _{1\le j\le t} \widetilde{d}(j). \end{aligned}$$
(2.11)

If \(d(t)+1-\xi _t\ge 0\), then (2.11) shows that \(\widetilde{d}(t+1)\ge \min _{1\le j\le t} \widetilde{d}(j)\), and thus \(\min _{1\le j\le t+1} \widetilde{d}(j)=\min _{1\le j\le t} \widetilde{d}(j)\); furthermore, \(d(t+1)=d(t)+1-\xi _t\) by (2.6), and it follows that (2.10) holds for \(t+1\).

On the other hand, if \(d(t)+1-\xi _t<0\), then (2.11) shows that \(\widetilde{d}(t+1)<\min _{1\le j\le t} \widetilde{d}(j)\), and thus \(\min _{1\le j\le t+1} \widetilde{d}(j)=\widetilde{d}(t+1)\). In this case, \(d(t+1)=0\) by (2.6), and it follows that (2.10) holds for \(t+1\) in this case too. \(\square \)

Remark 2.2

Similar formulas have been used for other, related, problems with random graphs and trees, where trees have been coded as walks, see for example [3, Section 1.3]. Note that in our case, unlike, e.g., [3], \(\widetilde{d}(t)\) may have negative jumps of arbitrary size.
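
The recursion (2.6)–(2.9) and the reflection formula (2.10) are straightforward to simulate; the following Python sketch (ours, for illustration only) generates the chain directly from the geometric variables \(\xi _t\) and checks (2.10) along the way.

```python
import random

def ge(q):
    """Ge(q) on {0, 1, 2, ...}: number of failures before the first success."""
    k = 0
    while random.random() >= q:
        k += 1
    return k

def depth_chain(n, p, seed=0):
    """Simulate d(t) via the recursion (2.6) and check Lemma 2.1 along the way."""
    random.seed(seed)
    d, dtilde = [0], [0]                      # d(1) = \tilde d(1) = 0
    running_min = 0                           # min_{j <= t} \tilde d(j)
    for t in range(1, n):                     # steps t = 1, ..., n-1
        pi_t = (1 - t / n) * p / (1 - p * t / n)       # (2.8)
        xi = ge(pi_t)                                  # xi_t ~ Ge(pi_t)
        d.append(max(d[-1] + 1 - xi, 0))               # (2.6)
        dtilde.append(dtilde[-1] + 1 - xi)             # (2.9)
        running_min = min(running_min, dtilde[-1])
        assert d[-1] == dtilde[-1] - running_min       # (2.10)
    return d

# d = depth_chain(100_000, 0.6)   # lambda = 1.5; max(d) is then roughly 0.063*n (cf. Corollary 2.5 below)
```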

2.2 Main Result for Depth Analysis

Note first that (2.8) implies that, using \(\lambda =p/(1-p)\),

$$\begin{aligned} \mu _t:={\mathbb {E}}\xi _t=\frac{1-\pi _t}{\pi _t} =\frac{1-p}{p(1-t/n)} =\frac{1}{\lambda (1-t/n)} . \end{aligned}$$
(2.12)

Hence, (2.9) implies that the expectation of \(\widetilde{d}(t)\) is

$$\begin{aligned} {\mathbb {E}}\bigl [\widetilde{d}(t)\bigr ]&= \sum _{i=1}^{t-1} (1-{\mathbb {E}}\xi _i) = \sum _{i=1}^{t-1} (1-\mu _i) = \sum _{i=1}^{t-1} \Bigl (1-\frac{1}{\lambda (1-i/n)}\Bigr ) . \end{aligned}$$
(2.13)

Let \(\theta :=t/n\). We fix a \(\theta ^*<1\) and obtain that, uniformly for \(\theta \le \theta ^*\),

$$\begin{aligned} {\mathbb {E}}\bigl [\widetilde{d}(t)\bigr ] = \int _0^t\Bigl (1-\frac{1}{\lambda (1- x/n)}\Bigr ) \,\textrm{d}x + O(1) =n\widetilde{\ell }(\theta )+O(1), \end{aligned}$$
(2.14)

where

$$\begin{aligned} \widetilde{\ell }(\theta ) := \int _0^\theta \Bigl (1-\frac{1}{\lambda (1-x)}\Bigr ) \,\textrm{d}x=\theta +\lambda ^{-1}\log (1-\theta ) . \end{aligned}$$
(2.15)

Note that the derivative \(\widetilde{\ell }'(\theta )=1-\lambda ^{-1}/(1-\theta )\) is (strictly) decreasing on (0, 1), i.e., \(\widetilde{\ell }\) is concave. Moreover, if \(\lambda >1\) (i.e., \(p>\frac{1}{2}\)) (the supercritical case), then \(\widetilde{\ell }'(0)>0\), and (2.15) shows that \(\widetilde{\ell }(\theta )\) is positive and increasing for \(\theta <\theta _0:=1-\lambda ^{-1}=(2p-1)/p\). After the maximum at \(\theta _0\), \(\widetilde{\ell }(\theta )\) decreases and tends to \(-\infty \) as \(\theta \nearrow 1\). Hence, there exists a \(\theta _0<\theta _1<1\) such that \(\widetilde{\ell }(\theta _1)=0\); we then have \(\widetilde{\ell }(\theta )>0\) for \(0<\theta <\theta _1\) and \(\widetilde{\ell }(\theta )<0\) for \(\theta >\theta _1\). We will see that in this case the depth-first forest w.h.p.  contains a giant tree, of order and height both linear in n, while all other trees are small.

On the other hand, if \(\lambda \le 1\) (i.e., \(p\le \frac{1}{2}\)) (the subcritical and critical cases), then \(\widetilde{\ell }'(0)\le 0\) and \(\widetilde{\ell }(\theta )\) is negative and decreasing for all \(\theta \in (0,1)\). In this case, we define \(\theta _0:=\theta _1:=0\) and note that the properties just stated for \(\widetilde{\ell }\) still hold (rather trivially). We will see that in this case w.h.p.  all trees in the depth-first forest are small.

Note that in all cases,

$$\begin{aligned} \theta _0:=\bigl (1-\lambda ^{-1}\bigr )^+ = {\left\{ \begin{array}{ll} 1-\lambda ^{-1},&{} \lambda >1,\\ 0,&{}\lambda \le 1, \end{array}\right. } \end{aligned}$$
(2.16)

and that \(\theta _1\) is the largest solution in [0, 1) to

$$\begin{aligned} \log (1-\theta _1)=-\lambda \theta _1. \end{aligned}$$
(2.17)

Remark 2.3

The equation (2.17) may also be written \(1-\theta _1=\exp (-\lambda \theta _1)\), which shows that

$$\begin{aligned} \theta _1=\rho _0(\lambda ), \end{aligned}$$
(2.18)

the survival probability, defined in (1.3), of a Bienaymé–Galton–Watson process with \({\text {Po}}(\lambda )\) offspring distribution.

We define \(\widetilde{\ell }^+(\theta ):=\bigl (\widetilde{\ell }(\theta )\bigr )^+\). Thus, by (2.15) and the comments above,

$$\begin{aligned} \widetilde{\ell }^+(\theta )= {\left\{ \begin{array}{ll} \theta +\lambda ^{-1}\log (1-\theta ) ,&{} 0\le \theta \le \theta _1,\\ 0,&{}\theta _1\le \theta \le 1. \end{array}\right. } \end{aligned}$$
(2.19)

We can now state one of our main results.

Theorem 2.4

We have

$$\begin{aligned} \max _{1\le t\le n} \bigl |d(t)- n\widetilde{\ell }^+(t/n)\bigr |=O_{L^2}(n^{1/2}). \end{aligned}$$
(2.20)

Proof

Since (2.9) is a sum of independent random variables, \(\widetilde{d}(t)-{\mathbb {E}}\widetilde{d}(t)\) (\(t=1,\dots ,n\)) is a martingale, and Doob’s inequality [10, Theorem 10.9.4] yields, for all \(T\le n\),

$$\begin{aligned} {\mathbb {E}}\bigl [\max _{t\le T}|\widetilde{d}(t)-{\mathbb {E}}\widetilde{d}(t)|^2\bigr ] \le 4 {\mathbb {E}}\bigl [|\widetilde{d}(T)-{\mathbb {E}}\widetilde{d}(T)|^2\bigr ] =4 \sum _{i=1}^{T-1} {\text {Var}}(\xi _i). \end{aligned}$$
(2.21)

As above, fix \(\theta ^*<1\), and assume, as we may, that \(\theta ^*>\theta _1\). Let \(T^*:=\lfloor n\theta ^*\rfloor \), and consider first \(t\le T^*\). For \(i<T^*\), we have \({\text {Var}}\xi _i = O(1)\), and thus, for \(T=T^*\), the sum in (2.21) is \(O(T^*)=O(n)\). Consequently, (2.21) yields

$$\begin{aligned} \max _{t\le T^*}\bigl |\widetilde{d}(t)-{\mathbb {E}}\widetilde{d}(t)\bigr | =O_{L^2}(n^{1/2}). \end{aligned}$$
(2.22)

Hence, by (2.14),

$$\begin{aligned} M^*:= \max _{t\le T^*}\bigl |\widetilde{d}(t)-n\widetilde{\ell }(t/n)\bigr | =O_{L^2}(n^{1/2}). \end{aligned}$$
(2.23)

(Note that \(T^*\) and \(M^*\) depend on the choice of \(\theta ^*\).) For \(t\le T^*\), the definition of \(M^*\) in (2.23) implies \(\bigl |\widetilde{d}(j)-n\widetilde{\ell }(j/n)\bigr |\le M^*\) for \(1\le j\le t\), and thus

$$\begin{aligned} \Bigl |\min _{1\le j\le t}\widetilde{d}(j)- n\min _{1\le j\le t}\widetilde{\ell }(j/n)\Bigr |\le M^*. \end{aligned}$$
(2.24)

Moreover, for \(t/n\le \theta _1\), we have \(0\le \min _{1\le j\le t}\widetilde{\ell }(j/n)\le \widetilde{\ell }(1/n)=O(1/n)\), while for \(t/n\ge \theta _1\), we have \(\min _{1\le j\le t}\widetilde{\ell }(j/n)=\widetilde{\ell }(t/n)\). Hence, for all \(t\le T^*\),

$$\begin{aligned} \min _{1\le j\le t}\widetilde{\ell }(j/n)=\widetilde{\ell }(t/n)-\widetilde{\ell }^+(t/n) + O(1/n), \end{aligned}$$
(2.25)

and thus, by (2.24),

$$\begin{aligned} \Bigl |\min _{1\le j\le t}\widetilde{d}(j)- n\widetilde{\ell }(t/n)+n\widetilde{\ell }^+(t/n)\Bigr |\le M^*+O(1/n). \end{aligned}$$
(2.26)

Finally, by (2.10), (2.23) and (2.26),

$$\begin{aligned} \bigl |d(t)- n\widetilde{\ell }^+(t/n)\bigr |\le 2M^*+O(1/n). \end{aligned}$$
(2.27)

This holds uniformly for \(t\le T^*\), and thus, by (2.23),

$$\begin{aligned} \max _{1\le t\le T^*} \bigl |d(t)- n\widetilde{\ell }^+(t/n)\bigr |=O_{L^2}(n^{1/2}). \end{aligned}$$
(2.28)

It remains to consider \(T^*<t\le n\). Then the argument above does not quite work, because \(\pi _t\searrow 0\) and thus \({\text {Var}}\xi _t\nearrow \infty \) as \(t\nearrow n\). We therefore modify \(\xi _t\). We define \({\widehat{\pi }}_t:=\max \{\pi _t,\pi _{T^*}\}\); thus \({\widehat{\pi }}_t=\pi _t\) for \(t\le T^*\) and \({\widehat{\pi }}_t>\pi _t\) for \(t>T^*\). We may then define independent random variables \({\widehat{\xi }}_t\) such that \({\widehat{\xi }}_t\sim {\text {Ge}}({\widehat{\pi }}_t)\) and \({\widehat{\xi }}_t\le \xi _t\) for all \(t< n\). (Thus, \({\widehat{\xi }}_t=\xi _t\) for \(t\le T^*\). For \(t>T^*\), we may assume that \(\xi _t:=\min \{j:U_{t,j}<\pi _t\}-1\) for an array of independent U(0, 1) random variables \((U_{t,j})_{j,t}\) and then define \({\widehat{\xi }}_t:=\min \{j:U_{t,j}<{\widehat{\pi }}_t\}-1\).)

In analogy with (2.9)–(2.10), we further define

$$\begin{aligned} \widehat{\widetilde{d}}(t)&:=\sum _{i=1}^{t-1}\bigl (1-{\widehat{\xi }}_i\bigr ), \end{aligned}$$
(2.29)
$$\begin{aligned} \widehat{d}(t)&:=\widehat{\widetilde{d}}(t)-\min _{1\le j\le t}\widehat{\widetilde{d}}(j) =\max _{1\le j\le t}\sum _{i=j}^{t-1}\bigl (1-{\widehat{\xi }}_i\bigr ). \end{aligned}$$
(2.30)

Since \({\widehat{\xi }}_i\le \xi _i\), (2.30) implies that \(\widehat{d}(t)\ge d(t)\) for all t.

We have \({\text {Var}}\bigl [{\widehat{\xi }}_t\bigr ]=O(1)\), uniformly for all \(t<n\), and thus the argument above yields

$$\begin{aligned} \max _{1\le t\le n} \bigl |\widehat{d}(t)- n[\widehat{\widetilde{\ell }}(t/n)]^+\bigr |=O_{L^2}(n^{1/2}) ,\end{aligned}$$
(2.31)

where

$$\begin{aligned} \widehat{\widetilde{\ell }}(\theta ) := \int _0^\theta \min \Bigl \{\Bigl (1-\frac{1}{\lambda (1-x)}\Bigr ) ,\Bigl (1-\frac{1}{\lambda (1-\theta ^*)}\Bigr )\Bigr \} \,\textrm{d}x.\end{aligned}$$
(2.32)

We have \(\widehat{\widetilde{\ell }}(\theta )=\widetilde{\ell }(\theta )\) for \(\theta \le \theta ^*\), and for \(\theta \ge \theta ^*\), \(\widehat{\widetilde{\ell }}(\theta )\) is negative and decreasing (since \(\theta ^*>\theta _1\)). Hence, \([\widehat{\widetilde{\ell }}(\theta )]^+=\widetilde{\ell }^+(\theta )\) for all \(0<\theta \le 1\). In particular, \([\widehat{\widetilde{\ell }}(\theta )]^+=\widetilde{\ell }^+(\theta )=0\) for all \(\theta \ge \theta ^*\), and (2.31) implies

$$\begin{aligned} \max _{T^*<t\le n} \widehat{d}(t) = O_{L^2}(n^{1/2}). \end{aligned}$$
(2.33)

Recalling \(0\le d(t)\le \widehat{d}(t)\), we thus have

$$\begin{aligned} \max _{T^*<t\le n} \bigl |d(t) -n\widetilde{\ell }^+(t/n)\bigr | = \max _{T^*<t\le n} d(t) \le \max _{T^*<t\le n} \widehat{d}(t) = O_{L^2}(n^{1/2}), \end{aligned}$$
(2.34)

which completes the proof. \(\square \)

Corollary 2.5

The height \(\Upsilon \) of the depth-first forest is

$$\begin{aligned} \Upsilon := \max _{1\le t\le n}d(t)= \upsilon n+O_{L^2}(n^{1/2}), \end{aligned}$$
(2.35)

where

$$\begin{aligned} \upsilon = \upsilon (p):= \widetilde{\ell }^+(\theta _0) = {\left\{ \begin{array}{ll} 0, &{} 0<\lambda \le 1, \\ 1-\lambda ^{-1}-\lambda ^{-1}\log \lambda , &{} \lambda >1 . \end{array}\right. } \end{aligned}$$
(2.36)

Proof

Immediate from Theorem 2.4 and (2.15), since we have \(\max _t\widetilde{\ell }^+(t/n)=\max _\theta \widetilde{\ell }^+(\theta )+O(1/n)\) and \(\max _\theta \widetilde{\ell }^+(\theta )=\widetilde{\ell }^+(\theta _0)=\widetilde{\ell }(\theta _0)\). \(\square \)

In Sect. 3, we will improve this when \(\lambda >1\) and show that the height \(\Upsilon \) is then asymptotically normally distributed (Theorem 3.4).

Corollary 2.6

The average depth \({\overline{d}}\) in the depth-first forest is

$$\begin{aligned} {\overline{d}}:= \frac{1}{n}\sum _{t=1}^n d(t) = \alpha n + O_{L^2}(n^{1/2}), \end{aligned}$$
(2.37)

where \(\alpha =0\) if \(\lambda \le 1\), and, in general,

$$\begin{aligned} \alpha =\alpha (p):= \frac{1}{2}\theta _1^2 -\frac{1}{\lambda }\Bigl ((1-\theta _1)\log (1-\theta _1)+\theta _1\Bigr ) =\frac{\lambda -1}{\lambda }\,\theta _1-\frac{1}{2}\theta _1^2 . \end{aligned}$$
(2.38)

Proof

By Theorem 2.4, using that (2.19) shows that \(\widetilde{\ell }^+(\theta )\) is a Lipschitz function on \([0,1]\),

$$\begin{aligned} \frac{1}{n}\sum _{t=1}^n d(t)&= \sum _{t=1}^n \widetilde{\ell }^+(t/n) + O_{L^2}\bigl (n^{1/2}\bigr ) = \int _0^n \widetilde{\ell }^+\Bigl (\frac{\lceil s\rceil }{n}\Bigr )\,\textrm{d}s + O_{L^2}\bigl (n^{1/2}\bigr ) \nonumber \\&= \int _0^n \Bigl ( \widetilde{\ell }^+\Bigl (\frac{s}{n}\Bigr )+O\Bigl (\frac{1}{n}\Bigr )\Bigr )\,\textrm{d}s + O_{L^2}\bigl (n^{1/2}\bigr ) =n \alpha + O_{L^2}\bigl (n^{1/2}\bigr ), \end{aligned}$$
(2.39)

where

$$\begin{aligned} \alpha&:= \int _0^1 \widetilde{\ell }^+(x)\,\textrm{d}x= \int _0^{\theta _1} \widetilde{\ell }(x)\,\textrm{d}x=\int _0^{\theta _1}\Bigl (x+\lambda ^{-1}\log (1-x)\Bigr )\,\textrm{d}x\nonumber \\&=\frac{1}{2}\theta _1^2 -\lambda ^{-1}\Bigl ((1-\theta _1)\log (1-\theta _1)+\theta _1\Bigr ) , \end{aligned}$$
(2.40)

which yields (2.38), using (2.17). \(\square \)
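
For concreteness, the following small sketch (ours) evaluates \(\theta _1\), \(\upsilon \) and \(\alpha \) for an example value of \(\lambda >1\), checking the closed forms (2.36) and (2.38) against a direct maximization and integration of \(\widetilde{\ell }^+\); the helper names and tolerances are arbitrary choices.

```python
import math

def theta1(lam, tol=1e-14):
    """Largest root in [0, 1) of log(1 - t) = -lam * t, i.e. theta_1 = rho_0(lam), cf. (2.17)-(2.18)."""
    t = 1.0
    for _ in range(100_000):
        new = 1.0 - math.exp(-lam * t)
        if abs(new - t) < tol:
            break
        t = new
    return t

def ell_plus(theta, lam, th1):
    """The limit profile (2.19)."""
    return theta + math.log(1.0 - theta) / lam if theta < th1 else 0.0

lam = 1.5
th1 = theta1(lam)
theta0 = 1.0 - 1.0 / lam

# Corollary 2.5: the height constant upsilon is the maximum of ell_plus, attained at theta0, cf. (2.36).
upsilon = 1.0 - 1.0 / lam - math.log(lam) / lam
assert abs(ell_plus(theta0, lam, th1) - upsilon) < 1e-12

# Corollary 2.6: the average-depth constant alpha is the integral of ell_plus, cf. (2.38) and (2.40).
alpha = (lam - 1.0) / lam * th1 - 0.5 * th1 ** 2
m = 200_000
midpoint_sum = sum(ell_plus((k + 0.5) / m, lam, th1) for k in range(m)) / m
assert abs(midpoint_sum - alpha) < 1e-6

print(th1, upsilon, alpha)
```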

Remark 2.7

When \(\lambda >1\), the height \(\Upsilon \) and average depth \({\overline{d}}\) are thus linear in n, unlike many other types of random trees. This might imply a rather slow performance of algorithms that operate on the depth-first forest if it is built explicitly in a computer’s memory.

3 Asymptotic Normality

In this section, we show that in the supercritical case \(\lambda >1\), Theorem 2.4 can be improved to yield convergence of d(t) (after rescaling) to a Gaussian process, at least on \([0,\theta _1)\). As a consequence, we show that the height \(\Upsilon \) is asymptotically normal.

Recall that for an interval \(I\subseteq \mathbb {R}\), D(I) is the space of functions \(I\rightarrow \mathbb {R}\) that are right-continuous with left limits (càdlàg) equipped with the Skorohod topology. For definitions of the topology see e.g.  [5, 11, 15, Appendix A.2], or [13]; for our purposes it is enough to know that convergence in D(I) to a continuous limit is equivalent to uniform convergence on compact subsets of I. (Note that it thus matters if the endpoints are included in I or not; for example, convergence in \(D[0,1)\) and \(D[0,1]\) mean different things.)

We define \(d(0):=\widetilde{d}(0):=0\).

Lemma 3.1

Assume \(\lambda >1\). Then

$$\begin{aligned} n^{-1/2}\bigl (\widetilde{d}(\lfloor n\theta \rfloor ) -n\widetilde{\ell }(\theta )\bigr )\overset{\textrm{d}}{\longrightarrow }Z(\theta ) \qquad \text {in}\, D[0,1), \end{aligned}$$
(3.1)

where \(Z(\theta )\) is a continuous Gaussian process on \([0,1)\) with mean \({\mathbb {E}}Z(\theta )=0\) and covariance \({\text {Cov}}\bigl (Z(x),Z(y)\bigr )=g\bigl (\min \{x,y\}\bigr )\), where

$$\begin{aligned} g(\theta ):=\frac{(1-p)^2\theta }{p^2(1-\theta )}-\frac{1-p}{p}\log (1-\theta ) =\lambda ^{-2}\frac{\theta }{1-\theta }-\lambda ^{-1}\log (1-\theta ). \end{aligned}$$
(3.2)

Equivalently, \(Z(\theta )=B\bigl (g(\theta )\bigr )\) for a Brownian motion B(x).

Proof

Since the random variables \(\xi _t\) are independent, (2.9) and (2.7)–(2.8) yield, similarly to (2.13),

$$\begin{aligned} {\text {Var}}\bigl [\widetilde{d}(t)\bigr ]&=\sum _{i=1}^{t-1}{\text {Var}}\xi _i =\sum _{i=1}^{t-1}\frac{1-\pi _i}{\pi _i^2} =\sum _{i=1}^{t-1}\frac{(1-p)(1-pi/n)}{p^2(1-i/n)^2} . \end{aligned}$$
(3.3)

Hence, uniformly for \(t/n\le \theta ^*\) for any \(\theta ^*<1\),

$$\begin{aligned} {\text {Var}}\bigl [\widetilde{d}(t)\bigr ] = ng(t/n)+O(1), \end{aligned}$$
(3.4)

with

$$\begin{aligned} g(\theta ):= \frac{1-p}{p^2}\int _0^{\theta }\frac{1-px}{(1-x)^2}\,\textrm{d}x =\frac{(1-p)^2\theta }{p^2(1-\theta )}-\frac{1-p}{p}\log (1-\theta ), \end{aligned}$$
(3.5)

in agreement with (3.2). (Note that this definition of \(g(\theta )\) as an integral shows that \(g(\theta )\) is strictly increasing in \(\theta \).) Since also \({\mathbb {E}}\widetilde{d}\bigl (\lfloor n\theta \rfloor \bigr )=n\widetilde{\ell }(\theta )+O(1)\) by (2.14), the marginal convergence for a fixed \(\theta \) in (3.1) follows by the classical central limit theorem for independent (not identically distributed) variables, e.g.  using Lyapounov’s condition [10, Theorem 7.2.2].

The functional limit (3.1) is thus a version of Donsker’s theorem [5, Theorem 16.1], extended from the i.i.d. case to the non-identically distributed variables \(\xi _i\). Such extensions exist, but since we have not found a version in the literature that can be immediately applied here, we give such a version in Appendix A. The result (3.1) follows by applying Theorem A.1 to the variables \(n^{-1/2}\bigl ((1-\xi _{i-1})-{\mathbb {E}}(1-\xi _{i-1})\bigr )\). \(\square \)

Lemma 3.2

Assume \(\lambda >1\) and let \(0<\theta ^*<\theta _1\). Then

$$\begin{aligned} \min _{1\le j\le \lfloor n\theta ^*\rfloor }\widetilde{d}(j) =o_{\textrm{p}}\bigl (n^{1/2}\bigr ). \end{aligned}$$
(3.6)

Proof

Let \(t_n:=\lceil n^{2/3}\rceil \). If n is large enough, then \(t_n<n\theta ^*\), and, since \(\widetilde{\ell }'(0)=1-\lambda ^{-1}>0\) by (2.15),

$$\begin{aligned} \min _{t_n/n\le \theta \le \theta ^*}\widetilde{\ell }(\theta ) =\widetilde{\ell }(t_n/n) \ge ct_n/n \ge c n^{-1/3} \end{aligned}$$
(3.7)

for some constant \(c>0\). Furthermore, (2.23) implies

$$\begin{aligned} \max _{t_n\le t\le n\theta ^*} \bigl |\widetilde{d}(t)- n\widetilde{\ell }(t/n)\bigr |=O_{L^2}(n^{1/2})= o_{\textrm{p}}\bigl (n^{2/3}\bigr ) \end{aligned}$$
(3.8)

(recall that \(O_{L^2}(a_n)\) implies \(o_{\textrm{p}}(\omega _na_n)\) for any \(a_n\) and any \(\omega _n\rightarrow \infty \)). It follows from (3.7)–(3.8) that w.h.p.  \( \bigl |\widetilde{d}(t)- n\widetilde{\ell }(t/n)\bigr | < n\widetilde{\ell }(t/n)\) for all \(t\in [t_n,n\theta ^*]\). Hence, w.h.p. , \(\widetilde{d}(t)>0=\widetilde{d}(1)\) for all \(t\in [t_n,n\theta ^*]\). Consequently, w.h.p. ,

$$\begin{aligned} \min _{1\le t\le n\theta ^*}\widetilde{d}(t)=\min _{1\le t\le t_n}\widetilde{d}(t). \end{aligned}$$
(3.9)

For \(t\le t_n\), we use Doob’s inequality in the form (2.21) again. Since \(\min _{t\le t_n}\widetilde{d}(t)\le \widetilde{d}(1)=0\) and \({\mathbb {E}}\widetilde{d}(t)\ge 0\) for \(t\le t_n\) (for n large), we have \(\bigl | \min _{t\le t_n}\widetilde{d}(t)\bigr |\le \max _{t\le t_n}|\widetilde{d}(t)-{\mathbb {E}}\widetilde{d}(t)|\) and thus (2.21) yields

$$\begin{aligned} {\mathbb {E}}\bigl | \min _{t\le t_n}\widetilde{d}(t)\bigr |^2 \le 4 \sum _{i=1}^{t_n-1} {\text {Var}}(\xi _i) =O(t_n) = O\bigl (n^{2/3}\bigr ). \end{aligned}$$
(3.10)

Hence,

$$\begin{aligned} \min _{t\le t_n}\widetilde{d}(t) = O_{L^2}\bigl (n^{1/3}\bigr ) = o_{\textrm{p}}\bigl (n^{1/2}\bigr ). \end{aligned}$$
(3.11)

The proof is completed by combining (3.9) and (3.11). \(\square \)

Theorem 3.3

Assume \(\lambda >1\). Then

$$\begin{aligned} n^{-1/2}\bigl (d(\lfloor n\theta \rfloor ) -n\widetilde{\ell }(\theta )\bigr )\overset{\textrm{d}}{\longrightarrow }Z(\theta ) \qquad \text {in}\, D[0,\theta _1) \end{aligned}$$
(3.12)

where \(Z(\theta )\) is the continuous Gaussian process defined in Lemma 3.1.

Proof

By (2.10) and Lemma 3.2, for any \(\theta ^*<\theta _1\),

$$\begin{aligned} \max _{0\le t\le \lfloor n\theta ^*\rfloor } \bigl | d(t)-\widetilde{d}(t)\bigr | =\Bigl | \min _{1\le j\le \lfloor n\theta ^*\rfloor }\widetilde{d}(j)\Bigr | =o_{\textrm{p}}\bigl (n^{1/2}\bigr ). \end{aligned}$$
(3.13)

The theorem now follows from Lemma 3.1. \(\square \)

Theorem 3.4

Let \(\lambda >1\). Then the height \(\Upsilon \) of the depth-first forest has an asymptotic normal distribution:

$$\begin{aligned} \frac{\Upsilon -\upsilon n}{\sqrt{n}}\overset{\textrm{d}}{\longrightarrow }N\bigl (0,\sigma ^2\bigr ) \end{aligned}$$
(3.14)

with \(\upsilon \) given by (2.36), and

$$\begin{aligned} \sigma ^2:= \lambda ^{-1}-\lambda ^{-2}+\lambda ^{-1}\log \lambda . \end{aligned}$$
(3.15)

Proof

Fix some \(\theta ^*\in (\theta _0,\theta _1)\). By Theorem 3.3 and the Skorohod coupling theorem [15, Theorem 4.30], we may assume that the random variables for different n are coupled such that the limit in (3.12) holds (almost) surely. Since \(Z(\theta )\) is continuous, this implies uniform convergence on \([0,\theta ^*]\), i.e.,

$$\begin{aligned} d\bigl (\lfloor n\theta \rfloor \bigr )=n\widetilde{\ell }(\theta )+n^{1/2}Z(\theta ) + o\bigl (n^{1/2}\bigr ), \end{aligned}$$
(3.16)

uniformly on \([0,\theta ^*]\). (The \(o(n^{1/2})\) here are random, but uniform in \(\theta \).) For \(|\theta -\theta _0|\le n^{-1/6}\), we have \(Z(\theta )=Z(\theta _0)+o(1)\), since Z is continuous, and thus (3.16) yields, almost surely,

$$\begin{aligned} d\bigl (\lfloor n\theta \rfloor \bigr )=n\widetilde{\ell }(\theta )+n^{1/2}Z(\theta _0) + o\bigl (n^{1/2}\bigr ), \qquad |\theta -\theta _0|\le n^{-1/6}. \end{aligned}$$
(3.17)

Since \(\max _\theta \widetilde{\ell }(\theta )=\widetilde{\ell }(\theta _0)\), it follows that

$$\begin{aligned} \max _{ |\theta -\theta _0|\le n^{-1/6}} d\bigl (\lfloor n\theta \rfloor \bigr ) =n\widetilde{\ell }(\theta _0)+n^{1/2}Z(\theta _0) + o\bigl (n^{1/2}\bigr ). \end{aligned}$$
(3.18)

On the other hand, for \(|\theta -\theta _0|\ge n^{-1/6}\), we have by a Taylor expansion, for some \(c>0\),

$$\begin{aligned} \widetilde{\ell }(\theta )\le \widetilde{\ell }(\theta _0) - c (\theta -\theta _0)^2 \le \widetilde{\ell }(\theta _0) - c n^{-1/3}. \end{aligned}$$
(3.19)

Hence, (2.20) implies

$$\begin{aligned} \max _{|\theta -\theta _0|\ge n^{-1/6}} d\bigl (\lfloor n\theta \rfloor \bigr ) \le n \max _{|\theta -\theta _0|\ge n^{-1/6}} \widetilde{\ell }(\theta ) + O_{L^2}(n^{1/2})\le n \widetilde{\ell }(\theta _0)-c n^{2/3} + O_{L^2}(n^{1/2}). \end{aligned}$$
(3.20)

Comparing (3.18) and (3.20), we see that w.h.p.  the maximum in (3.18) is larger than the one in (3.20), and thus

$$\begin{aligned} \Upsilon =\max _{0\le \theta \le 1}d\bigl (\lfloor n\theta \rfloor \bigr ) =n\widetilde{\ell }(\theta _0)+n^{1/2}Z(\theta _0) + o\bigl (n^{1/2}\bigr ). \end{aligned}$$
(3.21)

Hence, w.h.p. ,

$$\begin{aligned} \frac{\Upsilon -\widetilde{\ell }(\theta _0) n}{\sqrt{n}} = Z(\theta _0)+o(1), \end{aligned}$$
(3.22)

which implies

$$\begin{aligned} \frac{\Upsilon -\widetilde{\ell }(\theta _0) n}{\sqrt{n}} \overset{\textrm{d}}{\longrightarrow }Z(\theta _0) \sim N\bigl (0,g(\theta _0)\bigr ). \end{aligned}$$
(3.23)

Since \(\widetilde{\ell }(\theta _0)=\upsilon \) by (2.36), this shows (3.14) with \(\sigma ^2:=g(\theta _0)\), which gives (3.15) by (3.2) and \(\theta _0:=1-\lambda ^{-1}\). \(\square \)
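
For reference, a small numerical sketch (ours) of the constants in Theorem 3.4, checking that \(\sigma ^2\) in (3.15) is indeed \(g(\theta _0)\) from (3.2); the example value of \(\lambda \) is arbitrary.

```python
import math

def g(theta, lam):
    """The variance function (3.2) of the Gaussian limit process Z."""
    return theta / (lam ** 2 * (1 - theta)) - math.log(1 - theta) / lam

lam = 2.0                                                 # e.g. p = 2/3
theta0 = 1 - 1 / lam
upsilon = 1 - 1 / lam - math.log(lam) / lam               # (2.36)
sigma2 = 1 / lam - 1 / lam ** 2 + math.log(lam) / lam     # (3.15)
assert abs(g(theta0, lam) - sigma2) < 1e-12

# Theorem 3.4: (Upsilon - upsilon * n) / sqrt(n) -> N(0, sigma2); e.g. for n = 10**6 the height
# is about upsilon * n (about 153400), with fluctuations of order sqrt(sigma2 * n) (about 770).
print(upsilon, sigma2)
```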

4 The Trees in the Forest

Theorem 4.1

Let N be the number of trees in the depth-first forest. Then

$$\begin{aligned} N=\psi n + O_{L^2}(n^{1/2}), \end{aligned}$$
(4.1)

where

$$\begin{aligned} \psi =\psi (p):= 1-\theta _1-\frac{\lambda }{2}(1-\theta _1)^2. \end{aligned}$$
(4.2)

Figure 2 shows the parameter \(\psi \) as a function of the average degree \(\lambda \).

Fig. 2

\(\psi \), as a function of \(\lambda \)

Proof

Let \(J_t:={\varvec{1}}\{d(t)=0\}\), the indicator that the vertex \(v_t\) is a root and thus starts a new tree. Thus \(N=\sum _1^n J_t\).

If \(\theta _1>0\) (i.e., \(\lambda >1\)), then Theorem 2.4 shows that w.h.p.  \(d(t)>0\) in the interval \((1,n\theta _1)\), except possibly close to the endpoints. Thus the DFS will find one giant tree of order \(\approx \theta _1 n\), possibly preceded by a few small trees, and, as we will see later in the proof, followed by many small trees. To obtain a precise estimate, we note that there exists a constant \(c>0\) such that \(\widetilde{\ell }(\theta )\ge \min \{c\theta ,c(\theta _1-\theta )\}\) for \(\theta \in [0,\theta _1]\). Furthermore, if \(d(t)=0\), then (2.10) yields \(\widetilde{d}(t)=\min _{1\le j\le t}\widetilde{d}(j)\le \widetilde{d}(1)=0\), and thus, recalling (2.23), if also \(t\le n\theta _1\) and thus \(t\le T^*\), we have \(M^*\ge n\widetilde{\ell }(t/n)\). Hence, if \(t\le n\theta _1\) and \(d(t)=0\), then

$$\begin{aligned} M^*\ge n\widetilde{\ell }(t/n) \ge c\min \{t,n\theta _1-t\}. \end{aligned}$$
(4.3)

Consequently, \(d(t)=0\) with \(t\le n\theta _1\) implies \(t\in [1,c^{-1}M^*] \cup [n\theta _1-c^{-1}M^*,n\theta _1]\). The number of such t is thus \(O(M^*+1)=O_{L^2}(n^{1/2})\), using (2.23).

Let \(T_1:=\lceil n\theta _1\rceil \). We have just shown that (the case \(\theta _1=0\) is trivial)

$$\begin{aligned} \sum _{t=1}^{T_1-1} J_t = O_{L^2}(n^{1/2}). \end{aligned}$$
(4.4)

It remains to consider \(t\ge T_1\). For any integer \(k\ge 0\), the conditional distribution of \(\xi _t-k\) given \(\xi _t\ge k\) equals the distribution of \(\xi _t\). Hence, recalling (2.12),

$$\begin{aligned} {\mathbb {E}}\bigl [(\xi _t-k)^+\bigr ] ={\mathbb {E}}\bigl [\xi _t-k\mid \xi _t\ge k\bigr ]{\mathbb {P}}(\xi _t\ge k) =\mu _t{\mathbb {P}}(\xi _t-k\ge 0). \end{aligned}$$
(4.5)

We use again the stochastic recursion (2.6). Let \({\mathcal {F}}_t\) be the \(\sigma \)-field generated by \(\xi _1,\dots ,\xi _{t-1}\). Then d(t) is \({\mathcal {F}}_t\)-measurable, while \(\xi _t\) is independent of \({\mathcal {F}}_t\). Hence, (2.6) and (4.5) yield

$$\begin{aligned} {\mathbb {E}}\bigl [d(t+1)\mid {\mathcal {F}}_t\bigr ]&= {\mathbb {E}}\bigl [d(t)+1-\xi _t\mid {\mathcal {F}}_t\bigr ] + {\mathbb {E}}\bigl [(\xi _t-1-d(t))^+\mid {\mathcal {F}}_t\bigr ] \nonumber \\&=d(t)+1-\mu _t+\mu _t{\mathbb {P}}\bigl [\xi _t-1-d(t)\ge 0\mid {\mathcal {F}}_t\bigr ] \nonumber \\&=d(t)+1-\mu _t+\mu _t{\mathbb {P}}\bigl [d(t+1)=0\mid {\mathcal {F}}_t\bigr ] \nonumber \\&=d(t)+1-\mu _t+\mu _t{\mathbb {E}}\bigl [J_{t+1}\mid {\mathcal {F}}_t\bigr ] . \end{aligned}$$
(4.6)

We write \(\Delta d(t):=d(t+1)-d(t)\) and \(\overline{J}_t:=1-J_t\). Then (4.6) yields

$$\begin{aligned} {\mathbb {E}}\bigl [\Delta d(t) -1 + \mu _t\overline{J}_{t+1}\mid {\mathcal {F}}_t\bigr ]=0. \end{aligned}$$
(4.7)

Define

$$\begin{aligned} {\mathcal {M}}_t: =\sum _{i=1}^{t-1}\mu _i^{-1}\bigl (\Delta d(i)-1+\mu _i\overline{J}_{i+1}\bigr ) =\sum _{i=1}^{t-1}\Bigl (\mu _i^{-1}\Delta d(i)-\mu _i^{-1}+\overline{J}_{i+1}\Bigr ). \end{aligned}$$
(4.8)

Then \({\mathcal {M}}_t\) is \({\mathcal {F}}_t\)-measurable, and (4.7) shows that \({\mathcal {M}}_t\) is a martingale. We have, with \(\Delta {\mathcal {M}}_t:={\mathcal {M}}_{t+1}-{\mathcal {M}}_t\), using (2.6),

$$\begin{aligned} |\Delta {\mathcal {M}}_t|\le \mu _t^{-1}\bigl |d(t+1)-d(t)-1\bigr |+\overline{J}_{t+1} \le \mu _t^{-1}\xi _t+1, \end{aligned}$$
(4.9)

and thus, since \(\pi _t\le p<1\) for all t by (2.8) (and using the inequality \((a+b)^2\le 2a^2+2b^2\) for any real ab),

$$\begin{aligned} {\mathbb {E}}|\Delta {\mathcal {M}}_t|^2 \le 2\mu _t^{-2}{\mathbb {E}}\xi _t^2+2 =2\Bigl (\frac{\pi _t}{1-\pi _t}\Bigr )^2\frac{1-\pi _t+(1-\pi _t)^2}{\pi _t^2}+2=O(1). \end{aligned}$$
(4.10)

Hence, uniformly for all \(T\le n\),

$$\begin{aligned} {\mathbb {E}}{\mathcal {M}}_T^2 =\sum _{t=1}^{T-1}{\mathbb {E}}|\Delta {\mathcal {M}}_t|^2=O(T)=O(n). \end{aligned}$$
(4.11)

The definition (4.8) yields

$$\begin{aligned} {\mathcal {M}}_n-{\mathcal {M}}_{T_1}&= \sum _{t=T_1}^{n-1}\mu _t^{-1}\Delta d(t) - \sum _{t=T_1}^{n-1}\mu _t^{-1}+\sum _{t=T_1}^{n-1}\overline{J}_{t+1} . \end{aligned}$$
(4.12)

By a summation by parts, and interpreting \(\mu _n^{-1}:=0\),

$$\begin{aligned} \sum _{t=T_1}^{n-1}\mu _t^{-1}\Delta d(t) =\sum _{t=T_1+1}^{n}\bigl (\mu _{t-1}^{-1}-\mu _t^{-1}\bigr ) d(t) -\mu _{T_1}^{-1}d(T_1). \end{aligned}$$
(4.13)

As t increases, \(\mu _t\) increases by (2.12), and thus \(\mu _{t-1}^{-1}-\mu _t^{-1}>0\). Hence, (4.13) implies

$$\begin{aligned} \Bigl |\sum _{t=T_1}^{n-1}\mu _t^{-1}\Delta d(t)\Bigr |&\le \sum _{t=T_1+1}^{n}\bigl (\mu _{t-1}^{-1}-\mu _t^{-1}\bigr )\sup _{i> T_1}|d(i)| +\mu _{T_1}^{-1}|d(T_1)| \le 2 \mu _{T_1}^{-1}\sup _{i\ge T_1}|d(i)| \nonumber \\&=O_{L^2}(n^{1/2})\end{aligned}$$
(4.14)

by (2.20), since \(\widetilde{\ell }^+(t/n)=0\) for \(t\ge T_1\ge n\theta _1\). Furthermore, (4.11) shows that \({\mathcal {M}}_n,{\mathcal {M}}_{T_1}=O_{L^2}(n^{1/2})\). Hence, (4.12) yields, using (2.12),

$$\begin{aligned} \sum _{t=T_1+1}^n J_t&= n-T_1 - \sum _{t=T_1+1}^n \overline{J}_t =n-T_1- \sum _{t=T_1}^{n-1}\mu _t^{-1}+ O_{L^2}(n^{1/2})\nonumber \\&=n-T_1- \sum _{t=T_1}^{n-1}\lambda \Bigl (1-\frac{t}{n}\Bigr ) + O_{L^2}(n^{1/2}). \end{aligned}$$
(4.15)

We have

$$\begin{aligned} \sum _{t=T_1}^{n-1}\lambda \Bigl (1-\frac{t}{n}\Bigr )&=\int _{T_1}^n\lambda \Bigl (1-\frac{\lfloor s\rfloor }{n}\Bigr )\,\textrm{d}s =\int _{T_1}^n \lambda \Bigl (1-\frac{x}{n}\Bigr )\,\textrm{d}x+O(1), \end{aligned}$$
(4.16)

and thus (4.15) yields

$$\begin{aligned} \sum _{t=T_1+1}^n J_t&=n\psi +O_{L^2}(n^{1/2}), \end{aligned}$$
(4.17)

where

$$\begin{aligned} \psi := 1-\theta _1 -\int _{\theta _1}^1\lambda {(1-x)}\,\textrm{d}x=1-\theta _1-\frac{\lambda }{2}(1-\theta _1)^2. \end{aligned}$$
(4.18)

The result follows by (4.17) and (4.4). \(\square \)

The arguments in the proof of Theorem 4.1 show that in the supercritical case \(\lambda >1\), the DFS w.h.p.  first finds possibly a few small trees, then a giant tree containing all \(v_t\) with \(O_{L^2}(n^{1/2})\le t\le \theta _1n+O_{L^2}(n^{1/2})\), and then a large number of small trees. We give some details in the following lemma and theorem.

Lemma 4.2

Let (ab) be a fixed interval with \(0\le a<b\le 1\) and \(b>\theta _1\). Then w.h.p.  there exists a root \(v_t\) in the depth-first forest with \(t/n\in (a,b)\).

Proof

By increasing a, we may assume that \(\theta _1<a<b\le 1\). Then, cf. (2.15), \(\widetilde{\ell }'(a)<\widetilde{\ell }'(\theta _1)\le 0\) and thus \( \lambda (1-a)<1\). Hence, the argument yielding (4.15) in the proof of Theorem 4.1 yields also

$$\begin{aligned} \sum _{t=\lceil an\rceil }^{\lfloor bn\rfloor } J_t&=bn-an- \sum _{t=\lceil an\rceil }^{\lfloor bn\rfloor }\lambda (1-t/n) + O_{L^2}(n^{1/2})\ge cn+O_{L^2}(n^{1/2}), \end{aligned}$$
(4.19)

with \(c:=(b-a)(1-\lambda (1-a))>0\). Hence, w.h.p.  there are many roots \(v_t\) with \(t\in (an,bn)\). \(\square \)

Theorem 4.3

Let \(\textbf{T}_1\) be the largest tree in the depth-first forest.

  (i)

    If \(\lambda \le 1\), then \(|\textbf{T}_1|=o_{\textrm{p}}(n)\).

  (ii)

    If \(\lambda >1\), then \(|\textbf{T}_1|=\theta _1 n+O_{L^2}(n^{1/2})\). Furthermore, the second-largest tree has order \(|\textbf{T}_2|=o_{\textrm{p}}(n)\).

Proof

Let \(\varepsilon >0\). By covering \([\theta _1,1]\) with a finite number of intervals of length \(<\varepsilon /2\), it follows from Lemma 4.2 that w.h.p.  every tree \(\textbf{T}\) having a root \(v_t\) with \(t>(\theta _1-\varepsilon /2)n\) has \(|\textbf{T}|\le \varepsilon n\).

In particular, if \(\lambda \le 1\), so \(\theta _1=0\), this applies to all trees, and thus w.h.p.  \(|\textbf{T}_1|\le \varepsilon n\), which proves (i).

Suppose now \(\lambda >1\). Consider the tree \(\textbf{T}\) in the depth-first forest that contains \(v_{\lfloor n\theta _0\rfloor }\), denote its root by \(v_r\) and let \(v_s\) be its last vertex. By the proof of Theorem 4.1, \(d(t)>0\) for \(O_{L^2}(n^{1/2})\le t\le \theta _1 n-O_{L^2}(n^{1/2})\), and thus \(r=O_{L^2}(n^{1/2})\) and \(s\ge \theta _1n-O_{L^2}(n^{1/2})\).

On the other hand, let \(\theta ^*\in (\theta _1,1)\). If \(s\ge \theta _1n\), let \(u:=\min \{s,\lfloor \theta ^*n\rfloor \}\). Since \(r/n\le \theta _0\), we have \(\widetilde{\ell }(r/n)\ge 0\). Furthermore, (2.10) implies that \(\widetilde{d}(t)>\min _{j\le t}\widetilde{d}(j)=\widetilde{d}(r)\) for \(t\in (r,s]\), and thus \(\widetilde{d}(u)>\widetilde{d}(r)\). Hence, by (2.23),

$$\begin{aligned} -n\widetilde{\ell }(u/n) \le n\widetilde{\ell }(r/n)-n\widetilde{\ell }(u/n) \le 2M^*+ \widetilde{d}(r)-\widetilde{d}(u) \le 2M^*=O_{L^2}(n^{1/2}). \end{aligned}$$
(4.20)

Since \(\widetilde{\ell }(\theta _1)=0\) and \(\widetilde{\ell }'(\theta )\le -c<0\) for \(\theta \ge \theta _1\), it follows that \(u\le \theta _1n+O_{L^2}(n^{1/2})\), and thus \(s\le \theta _1n+O_{L^2}(n^{1/2})\).

Consequently, \(s= \theta _1n+O_{L^2}(n^{1/2})\), and thus \(|\textbf{T}|=s-r+1=\theta _1n+O_{L^2}(n^{1/2})\). Furthermore, any tree found before \(\textbf{T}\) has order \(\le r=o_{\textrm{p}}(n)\). The first part of the proof now shows that for every \(\varepsilon >0\), there is w.h.p.  no tree other than \(\textbf{T}\) of order \(>\varepsilon n\). Hence, w.h.p.  \(\textbf{T}\) is the largest tree, and (ii) follows. \(\square \)

Remark 4.4

As said in Remark 2.3, \(\theta _1\), the asymptotic fraction of vertices in the giant tree, equals the survival probability of a Bienaymé–Galton–Watson process with \({\text {Po}}(\lambda )\) offspring distribution. Heuristically, this may be explained by the following argument, well known from similar situations. Start at a random vertex and follow the arcs backwards. The indegree of a given vertex is asymptotically \({\text {Po}}(\lambda )\), and the process of exploring backwards from a vertex may be approximated by a Bienaymé–Galton–Watson process with this offspring distribution. Hence, the probability of a “large” backwards process converges to the survival probability \(\theta _1\). It seems reasonable that most vertices in the giant tree have a large backwards process, while most vertices outside the giant tree have a small backwards process.

Note also that the asymptotic size of the giant tree thus equals the asymptotic size of the giant component in an undirected Erdős–Rényi random graph \(G(n,\lambda /n)\), which heuristically is given by the same argument. (See also Remark 1.1 and [12].)
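
This heuristic is easy to illustrate numerically; the following Monte Carlo sketch (ours, not part of the argument) estimates the survival probability of a \({\text {Po}}(\lambda )\) Bienaymé–Galton–Watson process and compares it with \(\theta _1=\rho _0(\lambda )\) from (1.3). The population cutoff cap and the number of trials are arbitrary choices.

```python
import math
import random

def po(lam):
    """Poisson(lam) sample (Knuth's multiplication method; fine for moderate lam)."""
    L = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= random.random()
        if prod <= L:
            return k
        k += 1

def survives(lam, cap=1_000):
    """One Po(lam) Bienaymé-Galton-Watson process; reaching cap individuals counts as survival."""
    z = 1
    while 0 < z < cap:
        z = sum(po(lam) for _ in range(z))
    return z >= cap

lam, trials = 1.5, 2_000
random.seed(0)
estimate = sum(survives(lam) for _ in range(trials)) / trials

rho = 1.0                                   # theta_1 = rho_0(lam) by fixed-point iteration, cf. (1.3)
for _ in range(500):
    rho = 1.0 - math.exp(-lam * rho)

print(estimate, rho)                        # the two values should be close (rho is about 0.583)
```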

5 Types of Arcs

Recall from the introduction the classification of the arcs in the digraph G. Since we assume that the outdegrees are \({\text {Ge}}(1-p)\) and independent, the total number of arcs, M say, has a negative binomial distribution with mean \(\lambda n\) and variance \(np/(1-p)^2=O(n)\); hence

$$\begin{aligned} M=\lambda n + O_{L^2}(n^{1/2}). \end{aligned}$$
(5.1)

In the following theorem, we give the asymptotics of the number of arcs of each type.

Theorem 5.1

Let L, T, B, F, and C be the numbers of loops, tree arcs, back arcs, forward arcs, and cross arcs in the random digraph. Then

$$\begin{aligned} L&= O_{L^2}(1), \end{aligned}$$
(5.2)
$$\begin{aligned} T&= \tau n + O_{L^2}(n^{1/2}), \end{aligned}$$
(5.3)
$$\begin{aligned} B&= \beta n + O_{L^2}(n^{1/2}), \end{aligned}$$
(5.4)
$$\begin{aligned} F&=\varphi n + O_{L^2}(n^{1/2}), \end{aligned}$$
(5.5)
$$\begin{aligned} C&= \chi n + O_{L^2}(n^{1/2}), \end{aligned}$$
(5.6)

where

$$\begin{aligned} \tau&:= \chi :=1-\psi =\theta _1+\frac{\lambda }{2}(1-\theta _1)^2, \end{aligned}$$
(5.7)
$$\begin{aligned} \beta&:= \varphi :=\lambda \alpha =(\lambda -1)\theta _1-\frac{\lambda }{2}\theta _1^2 . \end{aligned}$$
(5.8)

Proof

Let \(\eta _t\) be the number of arcs from \(v_t\), and let \(\eta ^<_t,\eta ^=_t,\eta ^>_t\) be the numbers of these arcs that lead to some \(v_u\) with \(u<t\), \(u=t\) and \(u>t\), respectively. Then

$$\begin{aligned} L=\sum _{t=1}^n\eta ^=_t. \end{aligned}$$
(5.9)

Furthermore, an arc \(v_tv_u\) with \(u>t\) is either a tree arc or a forward arc; conversely, every tree arc or forward arc is of this type. Consequently,

$$\begin{aligned} T+F&=\sum _{t=1}^n\eta ^>_t. \end{aligned}$$
(5.10)

Similarly, or by (5.9) and (5.10),

$$\begin{aligned} B+C&=\sum _{t=1}^n\eta ^<_t. \end{aligned}$$
(5.11)

Conditioned on \(\eta _t\), \(\eta ^<_t\) has a binomial distribution \({\text {Bin}}(\eta _t,(t-1)/n)\), since each arc has probability \((t-1)/n\) to go to a vertex \(v_u\) with \(u<t\). In general, if \(X\sim {\text {Bin}}(m,p)\), then \({\mathbb {E}}X = mp\) and \({\mathbb {E}}X^2={\text {Var}}X + ({\mathbb {E}}X)^2=mp(1-p)+(m p)^2\). Hence, by first conditioning on \(\eta _t\),

$$\begin{aligned} {\mathbb {E}}\eta ^<_t&= {\mathbb {E}}\Bigl [\eta _t\frac{t-1}{n}\Bigr ] =\lambda \frac{t-1}{n}, \end{aligned}$$
(5.12)
$$\begin{aligned} {\text {Var}}\eta ^<_t&\le {\mathbb {E}}\bigl [(\eta ^<_t)^2\bigr ] ={\mathbb {E}}\Bigl [\eta _t \frac{t-1}{n}\Bigl (1-\frac{t-1}{n}\Bigr ) +\eta _t^2\Bigl (\frac{t-1}{n}\Bigr )^2\Bigr ] = O(1). \end{aligned}$$
(5.13)

Furthermore, the random variables \(\eta ^<_t\), \(t=1,\dots ,n\), are independent. Hence, (5.11) yields

$$\begin{aligned} {\mathbb {E}}B + {\mathbb {E}}C&= \sum _{t=1}^n{\mathbb {E}}\eta ^<_t = \lambda \sum _{t=1}^n\frac{t-1}{n} =\frac{\lambda }{2}(n-1), \end{aligned}$$
(5.14)
$$\begin{aligned} {\text {Var}}[B + C]&= \sum _{t=1}^n{\text {Var}}\eta ^<_t = O(n), \end{aligned}$$
(5.15)

and thus

$$\begin{aligned} B+C ={\mathbb {E}}[B+C]+O_{L^2}(n^{1/2})= \frac{\lambda }{2}n+O_{L^2}(n^{1/2}). \end{aligned}$$
(5.16)

The same argument with (5.10) yields

$$\begin{aligned} {\mathbb {E}}T + {\mathbb {E}}F&=\frac{\lambda }{2}(n-1), \end{aligned}$$
(5.17)
$$\begin{aligned} {\text {Var}}[F + T]&= O(n), \end{aligned}$$
(5.18)
$$\begin{aligned} F+T&= \frac{\lambda }{2}n+O_{L^2}(n^{1/2}). \end{aligned}$$
(5.19)

Similarly, conditioned on \(\eta _t\), we have \(\eta ^=_t\sim {\text {Bin}}(\eta _t,1/n)\), and we find

$$\begin{aligned} {\mathbb {E}}L&= \sum _{t=1}^n{\mathbb {E}}\eta ^=_t = n\lambda \frac{1}{n} =\lambda , \end{aligned}$$
(5.20)
$$\begin{aligned} {\text {Var}}{L}&= \sum _{t=1}^n{\text {Var}}\eta ^=_t = O(1), \end{aligned}$$
(5.21)
$$\begin{aligned} L&= \lambda +O_{L^2}(1) = O_{L^2}(1). \end{aligned}$$
(5.22)

This proves (5.2). We prove (5.3)–(5.6) one by one.

T In any forest, the number of vertices equals the number of edges plus the number of trees. Hence, \(T=n-N\), where N is the number of trees in the depth-first forest, and thus Theorem 4.1 implies (5.3) with \(\tau =1-\psi \) given by (5.7).

B Let \(B_t\) be the number of back arcs from \(v_t\); thus \(B=\sum _1^n B_t\). Let \({\mathcal {F}}_t\) be the \(\sigma \)-field generated by all arcs from \(v_i\), \(i\le t\) (i.e., by the outdegrees \(\eta _i\) and the endpoints of all these arcs); note that this includes complete information on the DFS until \(v_{t+1}\) is found, but also on some further arcs (the future arcs from the ancestors of \(v_{t+1}\)). Then d(t) is \({\mathcal {F}}_{t-1}\)-measurable and \(B_t\) is \({\mathcal {F}}_t\)-measurable. Moreover, \(\eta _t\) is independent of \({\mathcal {F}}_{t-1}\). Thus, conditioned on \({\mathcal {F}}_{t-1}\), we still have \(\eta _t\sim {\text {Ge}}(1-p)\); we also know d(t), and each arc from \(v_t\) is a back arc with probability d(t)/n. Hence, \({\mathbb {E}}\bigl [B_t\mid {\mathcal {F}}_{t-1},\eta _t\bigr ]=\eta _t d(t)/n\), and consequently

$$\begin{aligned} {\mathbb {E}}\bigl [B_t\mid {\mathcal {F}}_{t-1}\bigr ] ={\mathbb {E}}\Bigl [\eta _t\frac{d(t)}{n}\mid {\mathcal {F}}_{t-1}\Bigr ] =\lambda \frac{d(t)}{n}. \end{aligned}$$
(5.23)

Similarly, since, as above, \(X\sim {\text {Bin}}(m,p)\) implies \({\mathbb {E}}X^2=mp(1-p)+(m p)^2\),

$$\begin{aligned} {\mathbb {E}}\bigl [B_t^2\mid {\mathcal {F}}_{t-1}\bigr ] ={\mathbb {E}}\Bigl [\eta _t\frac{d(t)}{n}\Bigl (1-\frac{d(t)}{n}\Bigr ) +\Bigl (\eta _t\frac{d(t)}{n}\Bigr )^2\mid {\mathcal {F}}_{t-1}\Bigr ] =O(1). \end{aligned}$$
(5.24)

Define \(\Delta Z_t:=B_t-\lambda d(t)/n\) and \(Z_t:=\sum _1^t \Delta Z_i\). Then (5.23) shows that \({\mathbb {E}}\bigl [\Delta Z_t\mid {\mathcal {F}}_{t-1}\bigr ]=0\), and thus \((Z_i)_0^n\) is a martingale, with \(Z_0=0\). Hence, \({\mathbb {E}}Z_n=0\). Furthermore, (5.23) implies \({\mathbb {E}}\bigl [(\Delta Z_t)^2\bigr ]={\mathbb {E}}\bigl [(B_t-{\mathbb {E}}[B_t\mid {\mathcal {F}}_{t-1}])^2\bigr ] \le {\mathbb {E}}\bigl [B_t^2\bigr ]\), and thus by (5.24),

$$\begin{aligned} {\mathbb {E}}\bigl [Z_n^2\bigr ] = {\text {Var}}\bigl [Z_n\bigr ] =\sum _{t=1}^n {\mathbb {E}}\bigl [(\Delta Z_t)^2\bigr ] \le \sum _{t=1}^n {\mathbb {E}}\bigl [B_t^2\bigr ] = O(n). \end{aligned}$$
(5.25)

Consequently, \(Z_n=O_{L^2}(n^{1/2})\), and thus

$$\begin{aligned} B= \sum _{t=1}^n B_t =Z_n+\sum _{t=1}^n \lambda \frac{d(t)}{n} =\lambda {\overline{d}}+ Z_n =\lambda {\overline{d}}+O_{L^2}(n^{1/2}). \end{aligned}$$
(5.26)

Finally, (5.26) and Corollary 2.6 yield

$$\begin{aligned} B=\lambda {\overline{d}}+ O_{L^2}(n^{1/2})=\lambda \alpha n+O_{L^2}(n^{1/2}), \end{aligned}$$
(5.27)

which shows (5.4) with \(\beta =\lambda \alpha \) as in (5.8), recalling (2.38).

F By (5.19) and (5.3), we have (5.5) with \(\varphi =\frac{\lambda }{2}-\tau \), which agrees with (5.8) by (5.7) and a simple calculation. In particular, \(\varphi =\beta \).

C Similarly, it follows from (5.16) and (5.4) that we have (5.6) with \(\chi :=\lambda /2-\beta \). Since we have found \(\beta =\varphi \), and we always have \(\tau +\varphi =\lambda /2=\beta +\chi \), see (5.16) and (5.19), we thus have \(\chi =\tau \), and thus (5.7) holds. \(\square \)
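
To make Theorem 5.1 concrete, the following sketch (ours) evaluates the densities \(\psi \), \(\tau \) and \(\beta \) for an example value of \(\lambda >1\) and checks the identities \(\tau =1-\psi \) and \(\tau +\varphi =\beta +\chi =\lambda /2\) used above.

```python
import math

lam = 1.5                                   # example value (lambda > 1; theta_1 = 0 if lambda <= 1)

t1 = 1.0                                    # theta_1 = rho_0(lam) by fixed-point iteration, cf. (2.17)
for _ in range(500):
    t1 = 1.0 - math.exp(-lam * t1)

psi = 1 - t1 - lam / 2 * (1 - t1) ** 2      # (4.2): density of trees in the forest
tau = t1 + lam / 2 * (1 - t1) ** 2          # (5.7): tree arcs (= chi, the cross arcs)
beta = (lam - 1) * t1 - lam / 2 * t1 ** 2   # (5.8): back arcs (= phi, the forward arcs)

assert abs(tau - (1 - psi)) < 1e-12         # tau = 1 - psi
assert abs(2 * (tau + beta) - lam) < 1e-12  # tau + phi = beta + chi = lambda / 2

print(psi, tau, beta)                       # for lam = 1.5: roughly 0.287, 0.713, 0.037
```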

Note that \(T+F\) and \(B+C\) are asymptotically normal; this follows immediately from (5.10) and (5.11) by the central limit theorem.

Conjecture 5.2

All four variables TBFC are (jointly) asymptotically normal.

The equalities \(\tau =\chi \) and \(\beta =\varphi \) mean asymptotic equality of the corresponding expectations of numbers of arcs. In fact, there are exact equalities.

Theorem 5.3

For any n, \({\mathbb {E}}T = {\mathbb {E}}C\) and \({\mathbb {E}}B = {\mathbb {E}}F=\lambda {\mathbb {E}}{\overline{d}}\).

Proof

Let v, w be two distinct vertices. If the DFS finds w as a descendant of v, then there will later be \({\text {Ge}}(1-p)\) arcs from w, and each has probability 1/n of being a back arc to v. Similarly, there will be \({\text {Ge}}(1-p)\) future arcs from v, and each has probability 1/n of being a forward arc to w. Hence, if \(I_{vw}\) is the indicator that w is a descendant of v, and \(B_{wv}\) [\(F_{vw}\)] is the number of back [forward] arcs wv [vw], then

$$\begin{aligned} {\mathbb {E}}B_{wv} = {\mathbb {E}}F_{vw} = \frac{\lambda }{n} {\mathbb {E}}I_{vw}. \end{aligned}$$
(5.28)

Summing over all pairs of distinct v and w, we obtain

$$\begin{aligned} {\mathbb {E}}B = {\mathbb {E}}F = \frac{\lambda }{n} {\mathbb {E}}\sum _{w} \sum _{v\ne w} I_{vw} = \frac{\lambda }{n} {\mathbb {E}}\sum _{w} d(w) =\lambda {\mathbb {E}}{\overline{d}}. \end{aligned}$$
(5.29)

Finally, \({\mathbb {E}}T+{\mathbb {E}}F = {\mathbb {E}}C + {\mathbb {E}}B\) by (5.17) and (5.14), and thus (5.29) implies \({\mathbb {E}}T = {\mathbb {E}}C\). \(\square \)

Remark 5.4

Knuth [16] conjectured, based on exact calculation of generating functions for small n, that, much more strongly, B and F have the same distribution for every n. (Note that T and C do not have the same distribution; we have \(T\le n-1\), while C may take arbitrarily large values.) This conjecture has recently been proved by Nie [20], using a reformulation in [14].
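For readers who wish to experiment, the following Monte Carlo sketch (ours, not from [16]; all function names are hypothetical) generates the random multi-digraph with \({\text {Ge}}(1-p)\) outdegrees, runs the DFS, and tallies the five arc types. Averaging over repetitions, the empirical means of T and C, and of B and F, agree closely, illustrating Theorem 5.3; the empirical distributions of B and F are also close, consistent with Nie's result.

    import random
    from collections import Counter

    def geom0(rng, p):
        # Ge(1-p): P(k) = (1-p) p^k, k = 0, 1, 2, ...
        k = 0
        while rng.random() < p:
            k += 1
        return k

    def dfs_arc_types(n, p, rng):
        """One DFS over the random multi-digraph; returns counts of the five arc types."""
        out = [[rng.randrange(n) for _ in range(geom0(rng, p))] for _ in range(n)]
        disc = [None] * n                      # discovery order of each vertex
        done = [False] * n                     # True once all arcs from the vertex are explored
        cnt, clock = Counter(), 0
        for root in range(n):
            if disc[root] is not None:
                continue
            disc[root] = clock; clock += 1
            stack = [(root, iter(out[root]))]  # explicit stack to avoid deep recursion
            while stack:
                u, arcs = stack[-1]
                descended = False
                for v in arcs:
                    if v == u:
                        cnt['loop'] += 1
                    elif disc[v] is None:      # v found via this arc: tree arc
                        cnt['tree'] += 1
                        disc[v] = clock; clock += 1
                        stack.append((v, iter(out[v])))
                        descended = True
                        break
                    elif not done[v]:          # v is an ancestor of u: back arc
                        cnt['back'] += 1
                    elif disc[v] > disc[u]:    # v is an already-finished descendant: forward arc
                        cnt['forward'] += 1
                    else:                      # earlier, non-ancestral vertex: cross arc
                        cnt['cross'] += 1
                if not descended:
                    done[u] = True
                    stack.pop()
        return cnt

    rng = random.Random(1)
    n, p, reps = 2000, 0.6, 20
    total = Counter()
    for _ in range(reps):
        total += dfs_arc_types(n, p, rng)
    for key in ('tree', 'cross', 'back', 'forward', 'loop'):
        print(key, total[key] / reps)          # E T = E C and E B = E F (Theorem 5.3)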

Remark 5.5

A simple argument with generating functions shows that the number of loops at a given vertex v is \({\text {Ge}}(1-p/(n-np+p))\); these numbers are independent, and thus \(L\sim {\text {NegBin}}\bigl (n,1-p/(n-np+p)\bigr )\) with \({\mathbb {E}}L = p/(1-p)=\lambda =O(1)\) and \({\text {Var}}(L)=p(1-p+p/n)/(1-p)^2=\lambda (1+\lambda /n)=O(1)\) [16]. Moreover, it is easily seen that asymptotically, L has a Poisson distribution, \(L\overset{\textrm{d}}{\longrightarrow }{\text {Po}}(\lambda )\) as \({n\rightarrow \infty }\).
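As a quick numerical illustration of these formulas (a sketch only; it assumes SciPy's parametrization of the negative binomial, in which nbinom(n, q) counts failures before the n-th success with success probability q):

    from scipy.stats import nbinom, poisson

    n, p = 1000, 0.6
    lam = p / (1 - p)
    q = p / (n - n * p + p)              # per-vertex loop count is Ge(1 - q)
    L = nbinom(n, 1 - q)                 # total number of loops over all n vertices
    print(L.mean(), lam)                 # both equal λ
    print(L.var(), lam * (1 + lam / n))  # both equal λ(1 + λ/n)
    # total variation distance to Po(λ), truncated at k = 50 (the tails are negligible)
    tv = 0.5 * sum(abs(L.pmf(k) - poisson.pmf(k, lam)) for k in range(50))
    print(tv)                            # small, illustrating L -> Po(λ) in distribution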

6 Depth, Trees and Arc Analysis in the Shifted Geometric Outdegree Distribution

In this section, the outdegree distribution is \({\text {Ge}}_1(1-p)=1+{\text {Ge}}(1-p)\). Thus we now have the mean

$$\begin{aligned} \lambda =\frac{1}{1-p}. \end{aligned}$$
(6.1)

Thus \(\lambda >1\), and only the supercritical case occurs. As in Sect. 2, the depth d(t) is a Markov chain given by (2.6), but the distribution of \(\xi _t\) is now different. The probability (2.2) is replaced by \((1-t/n)/(1-pt/n)\), but the number of future arcs from an ancestor is still \({\text {Ge}}(1-p)\), and, with \(\theta :=t/n\),

$$\begin{aligned} {\mathbb {P}}\bigl (\xi _t=k\bigr )= {\left\{ \begin{array}{ll} {\overline{\pi }}_t:=\frac{1-\theta }{1-p\theta }, &{} k=0,\\ (1-{\overline{\pi }}_t)(1-\pi _t)^{k-1}\pi _t, &{} k\ge 1, \end{array}\right. } \end{aligned}$$
(6.2)

where \(\pi _t=p{\overline{\pi }}_t\) is as in (2.8). The rest of the analysis does not change, and the results in Theorems 2.4–5.1 still hold, but we get different values for many of the constants.

We now have

$$\begin{aligned} {\mathbb {E}}\xi _t = \overline{\mu }_t:=\frac{(1-p)\theta }{p(1-\theta )} \end{aligned}$$
(6.3)

and instead of (2.14), we have \({\mathbb {E}}\widetilde{d}(t) = n\widetilde{\ell }(\theta ) + O(1)\) where now \(\widetilde{\ell }(\theta )\) takes the new value

$$\begin{aligned} \widetilde{\ell }(\theta )&:=\int _0^\theta \left( 1-\frac{(1-p)x}{p(1-x)} \right) \,\textrm{d}x =\frac{1}{p}\theta +\frac{1-p}{p}\log (1-\theta ) \nonumber \\&=\frac{1}{p}\bigl (\theta +\lambda ^{-1}\log (1-\theta )\bigr ). \end{aligned}$$
(6.4)

Note that \(\widetilde{\ell }(\theta )\) in (6.4) is proportional to (2.15) for the (unshifted) geometric distribution with the same \(\lambda \), but larger by a factor 1/p. Figures 3 and 4 show \(\widetilde{\ell }(\theta )\) for both geometric distributions with the same p (0.6) and the same \(\lambda \) (2.0), respectively.

Fig. 3  \(\widetilde{\ell }(\theta )\), the asymptotic search depth, for the geometric distribution \({\text {Ge}}(1-p)\) (solid) and the shifted geometric distribution \({\text {Ge}}_1(1-p)\) (dashed) with \(p=0.6\), and thus \(\lambda =1.5\) and \(\lambda =2.5\), respectively

Fig. 4  \(\widetilde{\ell }(\theta )\), the asymptotic search depth, for the geometric distribution \({\text {Ge}}(1-p)\) (solid) and the shifted geometric distribution \({\text {Ge}}_1(1-p)\) (dashed) with \(p=2/3\) and \(p=1/2\), respectively, and thus \(\lambda =2\) in both cases
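The two curves are easy to reproduce from the formulas; here is a minimal plotting sketch (matplotlib assumed; it uses (6.4) for the shifted case and, for the unshifted case, the form \(\theta +\lambda ^{-1}\log (1-\theta )\) of (2.15) implied by the remark above):

    import numpy as np
    import matplotlib.pyplot as plt

    def ell_geo(theta, p):       # (2.15): unshifted Ge(1-p), with lambda = p/(1-p)
        lam = p / (1 - p)
        return theta + np.log(1 - theta) / lam

    def ell_geo1(theta, p):      # (6.4): shifted Ge_1(1-p), with lambda = 1/(1-p)
        lam = 1 / (1 - p)
        return (theta + np.log(1 - theta) / lam) / p

    theta = np.linspace(0, 0.99, 400)
    # Fig. 3: same p = 0.6 (lambda = 1.5 and 2.5); Fig. 4: same lambda = 2 (p = 2/3 and 1/2)
    plt.plot(theta, ell_geo(theta, 0.6), label='Ge(1-p), p = 0.6')
    plt.plot(theta, ell_geo1(theta, 0.6), '--', label='Ge_1(1-p), p = 0.6')
    plt.xlabel('theta'); plt.ylabel('asymptotic depth'); plt.legend(); plt.show()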

Note that the equation \(\widetilde{\ell }(\theta _1)=0\) still yields the formulas (2.17) and (2.18) for \(\theta _1\), now with \(\lambda =1/(1-p)\) as in (6.1); since \(\lambda >1\), we have \(\theta _1>0\) for every p. Differentiating (6.4) shows that the maximum point is \(\theta _0=p>0\), which again is given by (2.16). Straightforward calculations yield

$$\begin{aligned} \upsilon&:=\widetilde{\ell }(p)= 1+\frac{1-p}{p}\log (1-p) =1-\frac{1}{\lambda -1}\log \lambda , \end{aligned}$$
(6.5)
$$\begin{aligned} \alpha&:=\frac{1}{p}\left( \frac{\theta _1^2}{2} -\frac{1}{\lambda }\bigl ((1-\theta _1)\log (1-\theta _1)+\theta _1\bigr )\right) =\theta _1-\frac{\theta _1^2}{2p} . \end{aligned}$$
(6.6)

Furthermore, (6.2) yields by a simple calculation, with \(\theta :=t/n\),

$$\begin{aligned} {\text {Var}}\xi _t =\frac{(1-p)^2}{p^2(1-\theta )^2} + \frac{1-p}{p(1-\theta )} - \frac{1-p}{p^2}. \end{aligned}$$
(6.7)

Hence, (3.4) holds with (3.5) replaced by

$$\begin{aligned} g(\theta )&:= \int _0^\theta \Bigl (\frac{(1-p)^2}{p^2(1-x)^2} + \frac{1-p}{p(1-x)} - \frac{1-p}{p^2}\Bigr )\,\textrm{d}x \nonumber \\&= \frac{(1-p)^2\theta }{p^2(1-\theta )} -\frac{1-p}{p}\log (1-\theta ) -\frac{1-p}{p^2}\theta , \end{aligned}$$
(6.8)

and then Lemma 3.1 and Theorem 3.3 hold with this \(g(\theta )\).

Consequently, Theorem 3.4 holds with

$$\begin{aligned} \sigma ^2:=g(p) = -\frac{1-p}{p}\log (1-p). \end{aligned}$$
(6.9)

In the proof of Theorem 4.1, (4.5) for \(k\ge 1\) still holds with \(\mu _t\) given by (2.12) (except for the formula involving \(\lambda \)); thus, using (6.3), (4.6) is replaced by

$$\begin{aligned} {\mathbb {E}}\bigl [d(t+1)\mid {\mathcal {F}}_t\bigr ]&=d(t)+1-\overline{\mu }_t+\mu _t{\mathbb {E}}\bigl [J_{t+1}\mid {\mathcal {F}}_t\bigr ] . \end{aligned}$$
(6.10)

The rest of the proof carries over with minor modifications and, with \(\theta :=t/n\), leads to the following replacement of (4.15):

$$\begin{aligned} \sum _{t=T_1+1}^n J_t&=\sum _{t=T_1}^{n-1}\frac{\overline{\mu }_t-1}{\mu _t}+O_{L^2}(n^{1/2})=\sum _{t=T_1}^{n-1}\frac{\frac{(1-p)\theta }{p(1-\theta )}-1}{\frac{1-p}{p(1-\theta )}} +O_{L^2}(n^{1/2})\nonumber \\&=\sum _{t=T_1}^{n-1}\bigl (1-\lambda (1-\theta )\bigr ) + O_{L^2}(n^{1/2}), \end{aligned}$$
(6.11)

and thus Theorem 4.1 holds with

$$\begin{aligned} \psi :=\int _{\theta _1}^1\bigl (1-\lambda (1-x)\bigr )\,\textrm{d}x =1-\theta _1-\frac{\lambda }{2}(1-\theta _1)^2, \end{aligned}$$
(6.12)

just as in (4.2).

In the proof of Theorem 5.1, (5.27) still holds, and we obtain (5.4) with \(\beta =\lambda \alpha \), and then (5.6) with \(\chi =\lambda /2-\beta \), just as before (but recall that \(\alpha \) now has a different value). On the other hand, the expected numbers of back and forward arcs now differ: when a vertex w is found as a descendant of v, all of its arcs are still in the future, so \({\mathbb {E}}B = \lambda {\mathbb {E}}{\overline{d}}\sim \lambda \alpha n\), while the average number of future arcs at a vertex after a descendant has been created is \(\lambda -1\), so \({\mathbb {E}}F=(\lambda -1){\mathbb {E}}{\overline{d}}\sim (\lambda -1)\alpha n\). The asymptotic formula (5.3) holds as above with \(\tau :=1-\psi \); hence (5.17) implies that (5.5) holds too, with \(\varphi =\lambda /2-\tau \); as just noted, we now have \(\varphi =(\lambda -1)\alpha \ne \beta \). Collecting these constants, we see that Theorem 5.1 holds with

$$\begin{aligned} \tau&:=1-\psi =\theta _1+\frac{\lambda }{2}(1-\theta _1)^2, \end{aligned}$$
(6.13)
$$\begin{aligned} \beta&:=\lambda \alpha = \lambda \theta _1-\frac{\lambda }{2p}\theta _1^2 = \lambda \theta _1-\frac{\lambda ^2}{2(\lambda -1)}\theta _1^2, \end{aligned}$$
(6.14)
$$\begin{aligned} \varphi&:=(\lambda -1)\alpha = (\lambda -1)\theta _1-\frac{\lambda }{2}\theta _1^2 =\frac{\lambda }{2}-\tau , \end{aligned}$$
(6.15)
$$\begin{aligned} \chi&:=\frac{\lambda }{2}-\beta =\frac{\lambda }{2}(1-\theta _1)^2 + \frac{\lambda }{2(\lambda -1)}\theta _1^2 . \end{aligned}$$
(6.16)

Thus the equality \(\beta =\varphi \), and hence the equality of the expected numbers of back and forward arcs in Theorems 5.1 and 5.3, was an artifact of the geometric outdegree distribution. Similarly, \(\chi =\lambda /2-\beta <\lambda /2-\varphi =\tau \), so the equality of the expected numbers of tree arcs and cross arcs in Theorem 5.3 also fails here.
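These constants are easy to evaluate numerically. The following sketch (variable names and the use of SciPy's brentq root-finder are ours) solves \(\widetilde{\ell }(\theta _1)=0\) from (6.4) and evaluates (6.6) and (6.13)–(6.16); it confirms \(\tau +\varphi =\beta +\chi =\lambda /2\) (so the four constants sum to \(\lambda \)), while \(\beta \ne \varphi \) and \(\chi \ne \tau \).

    import math
    from scipy.optimize import brentq

    p = 0.6
    lam = 1 / (1 - p)                                    # (6.1)
    ell = lambda t: (t + math.log(1 - t) / lam) / p      # (6.4)
    theta1 = brentq(ell, 1e-9, 1 - 1e-9)                 # unique root in (0,1); positive since lambda > 1
    alpha = theta1 - theta1**2 / (2 * p)                 # (6.6)
    tau = theta1 + lam / 2 * (1 - theta1)**2             # (6.13)
    beta = lam * alpha                                   # (6.14)
    phi = (lam - 1) * alpha                              # (6.15)
    chi = lam / 2 - beta                                 # (6.16)
    print(theta1, tau, beta, phi, chi)
    print(tau + phi, beta + chi)                         # both equal lambda/2 = 1.25
    print(beta - phi, tau - chi)                         # both positive: the equalities of Sect. 5 fail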

We summarize the results above.

Theorem 6.1

Let the outdegree distribution \(\textbf{P}\) be the shifted geometric distribution \({\text {Ge}}_1(1-p)\) with \(p\in (0,1)\). Then Theorems 2.4–5.1 hold, with the constants now having the values described above (and always \(\lambda >1\)), while Theorem 5.3 does not hold.

7 A General Outdegree Distribution: Stack Index

In this section, we consider a general outdegree distribution \(\textbf{P}\), with mean \(\lambda \) and finite variance.

When the outdegree distribution is general, the depth no longer follows a simple Markov chain, since we would have to keep track of the number of children seen so far at each level of the branch of the tree leading to the current vertex. We recover a Markov chain if, instead of the depth d(t) of the current vertex, we consider the stack index I(t), defined as follows.

The DFS can be regarded as keeping a stack of unexplored arcs, for which we have seen the start vertex but not the endpoint. Again, let \(v_t\) be the t-th vertex seen by the DFS, and let I(t) be the size of this stack when \(v_t\) is found (but before the arcs from \(v_t\) are added to the stack). The stack index I(t) is given by the following modification of the pseudo-code for \(\textsc {Deep}{}\) (with the index initialized to zero):

[figure c: pseudo-code for the modified \(\textsc {Deep}{}\), maintaining the stack index I]

This recursive pseudo-code looks very similar to our first recursive pseudo-code, although there is a subtle difference. In the first pseudo-code, there was no need to load \({{\mathcal {N}}}(u)\) as a local variable, since the current neighbor could simply be accessed via a local index; the stack memory requirement is therefore proportional to the number of recursive calls to \(\textsc {Deep}{}\), and hence of order at most the number of vertices in the graph. In the second pseudo-code, we load the full neighborhood \({{\mathcal {N}}}(u)\) as a local variable, which typically increases the stack memory requirement to the order of the number of arcs in the graph, which can be more than the square of the number of vertices. This is even more visible in the following non-recursive version. For our random model, with a general outdegree distribution, the evolution of the stack can be described as follows, with the stack initially empty (a code sketch implementing these two steps is given after the list):

  S1. If the stack is empty, pick a new vertex v that has not been seen before (if there is no such vertex, we have finished). Otherwise, pop the last arc from the stack and reveal its endpoint v (which is uniformly random over all vertices). If v has already been seen, repeat S1.

  S2. (v is now a new vertex.) Reveal the outdegree \(\eta \) of v and add to the stack \(\eta \) new arcs from v, with unspecified endpoints. GOTO S1.
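A minimal sketch of this stack evolution for our random model (all names are ours; outdeg can be any sampler for the outdegree distribution \(\textbf{P}\)):

    import random

    def stack_dfs(n, outdeg, rng):
        """Run steps S1-S2 on the random multi-digraph with n vertices.
        Returns the list of stack indices I(t), t = 1, ..., n: the stack size
        when the t-th vertex is found, before its own arcs are pushed."""
        seen = [False] * n
        I = []
        stack_size = 0                      # arcs on the stack (endpoints not yet revealed)
        fresh = list(range(n))
        rng.shuffle(fresh)                  # candidates for "a new vertex" when the stack is empty
        while True:
            # S1: find the next vertex v
            if stack_size == 0:
                while fresh and seen[fresh[-1]]:
                    fresh.pop()
                if not fresh:
                    return I                # all vertices found
                v = fresh.pop()
            else:
                stack_size -= 1             # pop the last arc ...
                v = rng.randrange(n)        # ... and reveal its uniformly random endpoint
                if seen[v]:
                    continue                # v already seen: repeat S1
            # S2: v is a new vertex
            seen[v] = True
            I.append(stack_size)
            stack_size += outdeg(rng)       # push the arcs from v, endpoints unspecified

    def ge1(rng, p=0.6):                    # example: shifted geometric Ge_1(1-p)
        k = 1
        while rng.random() < p:
            k += 1
        return k

    idx = stack_dfs(2000, ge1, random.Random(0))
    print(len(idx), max(idx), sum(idx) / len(idx))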

It is easily seen that the stack index I(t) is a Markov chain, similar (but not identical) to the depth process d(t) in the geometric case studied above. Moreover, it is possible to recover the depths of the vertices from the stack index process, which makes it possible to extend many of the results above, although sometimes with less precision. For details, see [12].