On Weighted Depths in Random Binary Search Trees
Abstract
Following the model introduced by Aguech et al. (Probab Eng Inf Sci 21:133–141, 2007), the weighted depth of a node in a labelled rooted tree is the sum of all labels on the path connecting the node to the root. We analyse weighted depths of nodes with given labels, the last inserted node, nodes ordered as visited by the depth first search process, the weighted path length and the weighted Wiener index in a random binary search tree. We establish three regimes of nodes depending on whether the second-order behaviour of their weighted depths follows from fluctuations of the keys on the path, the depth of the nodes or both. Finally, we investigate a random distribution function on the unit interval arising as scaling limit for weighted depths of nodes with at most one child.
Keywords
Analysis of algorithms, Data structures, Binary search trees, Central limit theorems, Contraction method, Random probability measures
Mathematical Subject Classification (2010)
60F05 68P05 68Q25
1 Introduction
Fig. 1 Binary search tree constructed from the list 4, 2, 6, 5, 7, 3, 1
Properties of binary search trees are typically analysed under the random permutation model where the data \(x_1, \ldots , x_n\) are generated by a uniformly chosen permutation of the first n integers. Among the quantities studied in binary search trees, one finds depths of and distances between nodes related to the performance of search queries and finger searches in the database, the (total) path length measuring the cost of constructing the tree as well as the Wiener index. Further, more complex parameters such as the height corresponding to worst case search times, the saturation level and the profile have been studied thoroughly. We review the literature relevant in the context of our work below.
In this note, we complement the wide literature on random binary search trees by the analysis of depths of nodes, path length and Wiener index in their weighted versions as introduced by Aguech et al. [1]. Here, the weighted depth of a node is the sum of all keys stored on the path to the root. In [1], results about weighted depths of extremal paths have been obtained. Kuba and Panholzer [19, 20] studied the problem in random increasing trees covering the random recursive tree and the random plane-oriented recursive tree. Weighted depths of nodes and the weighted height were also studied by Broutin and Devroye [3] in a more general tree model, which relies on assigning weights to the edges of the tree. Further, the weighted path length in this model was investigated by Rüschendorf and Schopp [29]. Note that we deviate from the notation introduced in [1, 19] using the term weighted depth for what is called weighted path length there since we also study a weighted version of the (total) path length of binary search trees.
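To illustrate the definition, the following minimal Python sketch builds the binary search tree from the list 4, 2, 6, 5, 7, 3, 1 and records the depth and the weighted depth of every node. The function name `depths_and_weighted_depths` and the dictionary-based representation of the tree are illustrative choices of ours and are not taken from [1]; keys are assumed to be distinct.

```python
def depths_and_weighted_depths(keys):
    """Insert distinct keys in the given order into a binary search tree.

    Returns a dict mapping each key to (depth, weighted depth), where the
    weighted depth is the sum of all keys on the path to the root,
    both endpoints included.
    """
    left, right = {}, {}            # child pointers, indexed by parent key
    root = keys[0]
    info = {root: (0, root)}        # the root has depth 0 and weight equal to its key
    for key in keys[1:]:
        node, depth, weight = root, 0, key
        while True:
            depth += 1
            weight += node          # add the key of the ancestor just visited
            children = left if key < node else right
            if node not in children:
                children[node] = key
                break
            node = children[node]
        info[key] = (depth, weight)
    return info


info = depths_and_weighted_depths([4, 2, 6, 5, 7, 3, 1])
for key in sorted(info):
    print(key, info[key])
```

For instance, the node labelled 5 is reached via 4 and 6, so it has depth 2 and weighted depth 4 + 6 + 5 = 15.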
2 Preliminaries
We introduce some notation. By the size of a finite binary tree, we refer to its number of nodes. Upon embedding a finite rooted binary tree in the complete infinite binary tree, a node is called external if its graph distance to the binary tree is one. Any node on level \(k \ge 1\) in a rooted binary tree is associated with a vector \(v_1 v_2 \ldots v_k \in \{0,1\}^k\) where \(v_i = 0\) if and only if the path from the root to the node continues in the left subtree upon reaching level \(i-1\).
Let \(n \ge 1\) and \(1 \le k \le n\). Under the random permutation model (short: permutation model), let \(D_k(n)\) be the depth of the node labelled k. By \(W_k(n)\) we denote the sum of all keys on the path from the root to the node labelled k including the labels of both endpoints. For \(x = x_1 x_2 \ldots \in \{0,1\}^\infty \), let \(B_n(x)\) be the maximal depth among nodes of the form \(x_1 \ldots x_k, k \ge 0\). We use \(X_n\) (\({\mathbb {X}}_n\)) to denote the (weighted) depth of the nth inserted node. Further, we define the height of the tree as the maximal depth of a node, that is, \(H_n = \max \{D_k(n) : 1 \le k \le n\}\).
Finally, we use the Landau notations little–o, big–O, little–\(\omega \), big–\(\varOmega \) and big–\(\varTheta \) as \(n \rightarrow \infty \).
2.1 Depths and Height
2.2 Path Length and Wiener Index
2.3 The i.i.d. Model
We also consider binary search trees of size n where the data are chosen as the first n values of a sequence of independent random variables \(U_1, U_2, \ldots \) each having the uniform distribution on [0, 1]. Since the vector \((\text {rank}(U_1), \ldots , \text {rank}(U_n))\) constitutes a uniformly chosen permutation, in distribution, both the permutation model and the i.i.d. model lead to the same unlabelled tree. We use the same notation as in the permutation model for quantities not involving the labels of nodes, that is, \(X_n, h_n, H_n, P_n, W_n\) and \(B_n(x)\). Further, we define the weighted path length \(\mathscr {P}_n\) as the sum of all weighted depths, and the weighted Wiener index \(\mathscr {W}_n\) as the sum over all pairs of weighted distances. Here, the weighted distance between two nodes equals the sum of all labels on the path connecting them, labels of endpoints included. (Notice that the weighted distance between a node and itself is equal to its label.) Finally, analogously to \(B_n(x)\), we define \({\mathscr {B}}_n(x)\) as the weighted depth of the node of largest depth on the path x. We call \(\{{\mathscr {B}}_n(x): x \in \{0,1 \}^\infty \}\), the weighted silhouette of the tree (at time n).
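The relation between the two models can be checked numerically. The sketch below, a minimal illustration under our own naming (`insertion_profile`, `same_shape` and the seed are hypothetical choices), builds one tree from uniform keys and one from their ranks, compares the 0/1 vectors of the inserted nodes, and evaluates the path length and the weighted path length in the i.i.d. model.

```python
import random


def insertion_profile(keys):
    """Return [(path, weighted depth)] of the nodes in insertion order.

    `path` is the 0/1 vector of the node as a string (0 = left, 1 = right);
    its length is the depth of the node. Keys are assumed to be distinct.
    """
    root = keys[0]
    left, right = {}, {}
    profile = [("", root)]
    for key in keys[1:]:
        node, path, weight = root, "", key
        while True:
            weight += node                      # label of the ancestor just visited
            go_left = key < node
            path += "0" if go_left else "1"
            children = left if go_left else right
            if node not in children:
                children[node] = key
                break
            node = children[node]
        profile.append((path, weight))
    return profile


rng = random.Random(2024)                       # fixed seed, for reproducibility only
n = 500
us = [rng.random() for _ in range(n)]
rank = {u: r for r, u in enumerate(sorted(us), start=1)}
ranks = [rank[u] for u in us]                   # the induced permutation

iid = insertion_profile(us)
perm = insertion_profile(ranks)

# Both constructions place the i-th inserted node at the same position.
same_shape = [p for p, _ in iid] == [p for p, _ in perm]
path_length = sum(len(p) for p, _ in iid)
weighted_path_length = sum(w for _, w in iid)
print(same_shape, path_length, weighted_path_length)
```

Since all labels lie in [0, 1], the weighted depth of a node never exceeds its depth plus one, so the weighted path length is bounded by the path length plus n.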
3 Main Results
Our main results are divided into two groups: Theorems 1 and 2 hold in the permutation model, while Theorems 3 and 4 are formulated in the i.i.d. model.
3.1 Results in the Permutation Model
Theorem 1
The asymptotic behaviour of weighted depths of small nodes is to be compared with the corresponding results in [19]. Here, another phase transition occurs when \(k = \varTheta (n / \log n)\).
Theorem 2
3.2 Results in the i.i.d. Model
Any \(x \in \{0,1\}^\infty \) corresponds to a unique value \(x \in [0,1]\) by \(x = \sum _{i=1}^\infty x_i 2^{-i}\). This identification becomes one-to-one upon allowing only those \(x \in \{0,1\}^\infty \) which contain infinitely many zeros and \(x = \mathbf {1}\). In the i.i.d. model, for any \(x \in \{0,1\}^{\infty }, k \ge 1\), the node \(x_1 \ldots x_k\) eventually appears in the sequence of binary search trees and we write \(\varXi _k(x)\) for its ultimate label. The following theorem about the behaviour of \({\mathscr {B}}_n(x)\) involves a random continuous distribution function arising as the almost sure limit of \(\varXi _k(x), x \in [0,1],\) as \(k \rightarrow \infty \). We believe that this process is of independent interest and state some of its properties in Proposition 1 in Sect. 3.3. The simulations of \(\varXi _{15}\) presented in Fig. 2 illustrate the scaling limit.
Theorem 3
Fig. 2 Two simulations of \(\varXi _{15}\), the dotted line being the graph of the identity function
The next theorem extends the distributional convergence result in Theorem 1.1 in [25], that is (9), by central limit theorems for the weighted path length and the weighted Wiener index.
Theorem 4
Conclusions
We have seen that there exist three types of nodes showing significantly different behaviour with respect to their weighted depths. By Theorem 1, for \(k = \omega (n/\sqrt{\log n})\), second-order fluctuations of weighted depths are due to variations of the depth of nodes. In the second regime, when \(k = \varTheta (n/\sqrt{\log n})\), variations of weighted depths are determined by two independent contributions, one for the depths and one for the keys on the paths. Finally, when \(k = o(n/\sqrt{\log n})\), only fluctuations of labels on paths influence second-order terms of weighted depths. The third regime can be further subdivided with respect to the first-order terms of \(W_k(n)\) and \(k D_k(n)\): for \(k = \omega (n / \log n)\), they coincide, for \(k = \varTheta (n/\log n)\), they are of the same magnitude, whereas, for \(k = o(n/ \log n)\), they are of different scale. By Theorem 3, the weighted silhouette behaves considerably differently. Here, the lack of concentration around the mean leads to an interesting random distribution function on the unit interval as scaling limit.
3.3 Further Results and Remarks
The limit process \(\varXi \)
The process \(\varXi \) in Theorem 3 is a random distribution function. In particular, it can be regarded as an element in the set of càdlàg functions \({\mathscr {D}}[0,1]\) consisting of all \(f: [0,1] \rightarrow \mathbb {R}\), such that, for all \(t \in [0,1]\), \(f(t) = \lim _{s \downarrow t} f(s)\) and \(\lim _{s \uparrow t} f(s)\) exists. The supremum norm of f is defined by \(\sup \{|f(t)| : t \in [0,1]\}\). Endowed with Skorokhod’s topology \(J_1\), \({\mathscr {D}}[0,1]\) becomes a Polish space. We refer to Chapter 3 in Billingsley’s book [2] for detailed information on this matter.
Proposition 1
- (i) \(\mathbf {E} \left[ \varXi (t) \right] = t\) for all \(t \in (0,1)\);
- (ii) \(\mathscr {L}( (\varXi (t))_{t \in [0,1]} ) = \mathscr {L}( (1 - \varXi (1-t))_{t \in [0,1]})\);
- (iii) \(\varXi (\xi )\) has the arcsine distribution with density
$$\begin{aligned} \frac{1}{\pi \sqrt{x(1-x)}}, \quad x \in (0,1), \end{aligned}$$
where \(\varXi , \xi \) are independent and \(\xi \) has the uniform distribution on [0, 1];
- (iv) for \(t \in (0,1)\), \(\mathscr {L}(\varXi (t))\) has a smooth density \(f_t: (0,1) \rightarrow (0, \infty )\);
- (v) for \(t \in (0,1/2)\), \(x f'_t(x) = - f_{2t}(x)\), \(x \in (0,1)\), \(f_t\) is strictly monotonically decreasing and \(\lim _{x \uparrow 1} f_t(x) = 0\);
- (vi) with \(\alpha ^{(i)}_t := \lim _{x \downarrow 0} f^{(i)}_t(x)\), \(i = 0,1, t \in (0,1/2)\) and \(\gamma _0 = 1/4, \gamma _1 = 5/16\), we have \(\alpha ^{(i)}_t = (-1)^i \infty \) for \(0 < t \le \gamma _i\), \(|\alpha ^{(i)}_t| < \infty \) for \(\gamma _i< t < 1/2\) and \(|\alpha ^{(i)}_t| \uparrow \infty \) as \(t \downarrow \gamma _i\).
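Property (i) can be explored by simulation. The following minimal sketch samples \(\varXi _k(t)\) by following the first k binary digits of t. It relies on the standard fact, which we assume here rather than derive, that in the i.i.d. model the ultimate label of a node is uniformly distributed on the interval delimited by the labels of its ancestors. The name `sample_xi`, the truncation level k = 40, the sample size and the seed are illustrative choices.

```python
import random


def sample_xi(t, k, rng):
    """Sample the ultimate label of the node x_1 ... x_k, where x is the binary expansion of t.

    Assumption: given the labels of its ancestors, the label of a node is
    uniform on the interval (low, high) of keys that can reach its position.
    """
    low, high = 0.0, 1.0
    label = rng.uniform(low, high)      # label of the root
    for _ in range(k):
        t *= 2
        bit = int(t)                    # next binary digit of t
        t -= bit
        if bit == 0:
            high = label                # continue in the left subtree
        else:
            low = label                 # continue in the right subtree
        label = rng.uniform(low, high)
    return label


def empirical_mean(t, k=40, samples=20000, seed=1):
    """Monte Carlo estimate of E[Xi_k(t)]."""
    rng = random.Random(seed)
    return sum(sample_xi(t, k, rng) for _ in range(samples)) / samples


for t in (0.25, 1 / 3, 0.75):
    print(t, empirical_mean(t))
```

For large k, the empirical means should be close to t, in line with Proposition 1 (i).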
Random recursive trees
A random recursive tree is constructed as follows: starting with the root labelled one, in the kth step, \(k \ge 2\), a node labelled k is inserted in the tree and connected to an already existing node chosen uniformly at random. Weighted depths in random binary search trees differ substantially from those in random recursive trees analysed in [19] where all nodes show an asymptotic behaviour comparable to that of nodes labelled \(k = o(n/\sqrt{\log n})\) in the binary search tree. The difference is highlighted by the weighted path length. Being of the same order as the path length in binary search trees, it follows from results in [19] that the weighted path length \({\mathscr {Q}}_n\) in a random recursive tree of size n is of order \(n^2\). The same is valid for its standard deviation. We conjecture that the sequence \((n^{-2} {\mathscr {Q}}_n)\) converges in distribution to a non-trivial limit; however, the recursive approach worked out in the proof of Theorem 4, which also applies to the analysis of the path length in random recursive trees, seems not to be fruitful in this context.
Outline
All results are proved in Sect. 4 starting with the proofs of Theorems 1 and 2 as well as (21) in Sect. 4.1. Here, most arguments are based on representations of (weighted) depths as sums of bounded independent random variables which go back to Devroye and Neininger [9]. Theorem 3 and Proposition 1 are proved in Sect. 4.2. In this part, the construction of the limiting process relies on suitable uniform \(L_1\)-bounds on the increments of the process \((\varXi _k(x))_{x \in [0,1]}, k \ge 1,\) while the properties of the limit laws formulated in Proposition 1 follow from the distributional fixed-point equation (22). Finally, the proof of Theorem 4 relying on the contraction method is worked out in Sect. 4.3.
4 Proofs
4.1 Weighted Depths of Labelled Nodes
In the permutation model, let \(A_{j,k}\) be the event that the node labelled k is in the subtree of the node labelled j. Then, \(D_k(n) = \sum _{j=1}^n \mathbf {1}_{ A_{j,k} } -1 \) and \(W_k(n) = \sum _{j=1}^n j \mathbf {1}_{ A_{j,k} }\). It is easy to see that \(A_{1, k}, \ldots , A_{k-1,k}\) and \(A_{k+1,k}, \ldots , A_{n,k}\) are two families of independent events; however, there exist subtle dependencies between the sets. Following the approach in [9], let \(B_{j,k} = A_{j,k-1}\) for \(j < k\) and \(B_{j,k} = A_{j,k+1}\) for \(j > k\). For convenience, let \(B_{k,k}\) be an almost sure event. The following lemma summarizes results in [9], and we refer to this paper for a proof. In this context, note that Devroye [8] gives distributional representations as sums of independent (or m-dependent) indicator variables for quantities growing linearly in n, such as the number of leaves.
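The indicator representations of \(D_k(n)\) and \(W_k(n)\) can be verified on a sample permutation. The sketch below is a minimal check under our own naming (`path_to`, `precedes_interval`). It uses the well-known characterisation, not stated explicitly above, that \(A_{j,k}\) occurs if and only if j is inserted before every other key lying between j and k.

```python
import random


def path_to(perm, k):
    """Keys on the path from the root to the node labelled k, both endpoints included."""
    low, high = float("-inf"), float("inf")
    path = []
    for key in perm:
        # A key is an ancestor of k iff it falls into the current gap around k.
        if low < key < high:
            path.append(key)
            if key == k:
                break
            if key < k:
                low = key
            else:
                high = key
    return path


def precedes_interval(perm, j, k):
    """Indicator of A_{j,k}: j is inserted no later than any key between j and k."""
    pos = {key: i for i, key in enumerate(perm)}
    lo, hi = min(j, k), max(j, k)
    return all(pos[j] <= pos[i] for i in range(lo, hi + 1))


rng = random.Random(7)                  # fixed seed, for reproducibility only
n = 30
perm = list(range(1, n + 1))
rng.shuffle(perm)

k = 12
indicators = [precedes_interval(perm, j, k) for j in range(1, n + 1)]
depth = sum(indicators) - 1                                     # D_k(n)
weighted_depth = sum(j for j, a in enumerate(indicators, start=1) if a)   # W_k(n)
print(path_to(perm, k), depth, weighted_depth)
```

The values obtained from the indicators agree with those read off the path returned by `path_to`.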
Lemma 1
4.1.1 Weighted Depths of Large Nodes
4.1.2 Weighted Depths of Small Nodes
4.1.3 Proof of (21)
4.2 The Weighted Silhouette
We prove Theorem 3 and Proposition 1.
Proof of Theorem 3
Proof of Proposition 1
The curvature
We make a concluding remark about the curvature of \(f_t, t \in (0,1/2)\). First, since \(x f^{''}_t(x) = - f_{2t}'(x) - f_t'(x)\), for \(0 < t \le 1/4\), the function \(f_t\) is convex. From (28) it is easy to deduce \(f_{1/3}(x) = 2(1-x)\). Since \(f_{1/3}'' = f_{1/2}'' = 0\), it is plausible to conjecture that \(f_t\) is convex for \(t \le 1/3\) and concave for \(1/3 \le t < 1/2\). Concavity at rational points with small denominator such as \(t = 3/8\) or \(t = 5/12\) can be verified by hand using (28).
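For completeness, we spell out the step behind the identity for the second derivative, which follows by differentiating the relation in Proposition 1 (v):
$$\begin{aligned} \frac{d}{dx}\bigl[ x f_t'(x) \bigr] = f_t'(x) + x f_t''(x) = - f_{2t}'(x) \quad \Longrightarrow \quad x f_t''(x) = - f_{2t}'(x) - f_t'(x). \end{aligned}$$
For \(0 < t < 1/4\), both \(f_t\) and \(f_{2t}\) are decreasing by Proposition 1 (v), so the right-hand side is non-negative. As a worked case, take \(t = 1/4\) and assume \(f_{1/2} \equiv 1\) on (0, 1); this assumption is not stated explicitly in this section, but it is consistent with \(f_{1/2}'' = 0\) and with Proposition 1 (i). Then Proposition 1 (v) gives
$$\begin{aligned} x f_{1/4}'(x) = -1, \quad \lim _{x \uparrow 1} f_{1/4}(x) = 0 \quad \Longrightarrow \quad f_{1/4}(x) = - \log x, \quad x \in (0,1), \end{aligned}$$
which is convex and diverges as \(x \downarrow 0\), in agreement with Proposition 1 (vi) for \(\gamma _0 = 1/4\).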
4.3 Weighted Path Length and Wiener Index
In order to obtain mean and variance for the weighted path length and the weighted Wiener index, we use the reflection argument from the proof of Proposition 1 (ii). To this end, let \(\mathscr {P}_n^*\) and \(\mathscr {W}_n^*\) denote weighted path length and weighted Wiener index in the binary search tree built from the sequence \(U_1^* = 1-U_1, U_2^* = 1-U_2, \ldots \) Then, \(\mathscr {P}_n + \mathscr {P}_n^* = P_n + n\) and \(\mathscr {W}_n + \mathscr {W}_n^* = W_n + {n \atopwithdelims ()2}\) providing the claimed expansions for \(\mathbf {E} \left[ \mathscr {P}_n \right] \) and \(\mathbf {E} \left[ \mathscr {W}_n \right] \) upon recalling (7) and (8).
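The first identity of the reflection argument is easy to confirm numerically. The sketch below, a minimal illustration with hypothetical names (`path_lengths`, `reflected`), builds the trees from \(U_1, \ldots , U_n\) and from \(1-U_1, \ldots , 1-U_n\) and compares \(\mathscr {P}_n + \mathscr {P}_n^*\) with \(P_n + n\). The Wiener index is not treated here.

```python
import random


def path_lengths(keys):
    """Return (path length, weighted path length) of the BST built from distinct keys."""
    root = keys[0]
    left, right = {}, {}
    total_depth, total_weight = 0, root      # the root contributes depth 0 and its own label
    for key in keys[1:]:
        node, depth, weight = root, 0, key
        while True:
            depth += 1
            weight += node                   # label of the ancestor just visited
            children = left if key < node else right
            if node not in children:
                children[node] = key
                break
            node = children[node]
        total_depth += depth
        total_weight += weight
    return total_depth, total_weight


rng = random.Random(11)                      # fixed seed, for reproducibility only
n = 200
us = [rng.random() for _ in range(n)]
reflected = [1.0 - u for u in us]            # the sequence U_1^*, U_2^*, ...

p, wp = path_lengths(us)
p_star, wp_star = path_lengths(reflected)
print(p, p_star, wp + wp_star, p + n)
```

Reflection mirrors the tree, so both trees have the same path length, and each path with d + 1 nodes contributes exactly d + 1 to the sum of the two weighted depths.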
Notes
Acknowledgements
The first author is grateful to the King Saud University, Deanship of Scientific Research, College of Science Research Center. The research of the third author was supported by a Feodor Lynen Fellowship of the Alexander von Humboldt-Foundation. The authors also thank two anonymous referees for their valuable comments.
References
- 1. Aguech, R., Lasmar, N., Mahmoud, H.: Extremal weighted path lengths in random binary search trees. Probab. Eng. Inform. Sci. 21(1), 133–141 (2007)
- 2. Billingsley, P.: Convergence of Probability Measures. Wiley Series in Probability and Statistics: Probability and Statistics. Wiley, New York (1999)
- 3. Broutin, N., Devroye, L.: Large deviations for the weighted height of an extended class of trees. Algorithmica 46(3–4), 271–297 (2006)
- 4. Brown, G.G., Shubert, B.O.: On random binary trees. Math. Oper. Res. 9(1), 43–65 (1984)
- 5. Chen, R., Lin, E., Zame, A.: Another arc sine law. Sankhyā Ser. A 43(3), 371–373 (1981)
- 6. Devroye, L.: A note on the height of binary search trees. J. Assoc. Comput. Mach. 33(3), 489–498 (1986)
- 7. Devroye, L.: Applications of the theory of records in the study of random trees. Acta Inform. 26(1–2), 123–130 (1988)
- 8. Devroye, L.: Limit laws for local counters in random binary search trees. Random Struct. Algorithms 2(3), 303–315 (1991)
- 9. Devroye, L., Neininger, R.: Distances and finger search in random binary search trees. SIAM J. Comput. 33(3), 647–658 (2004)
- 10. Dickman, K.: On the frequency of numbers containing prime factors of a certain relative magnitude. Arkiv för Matematik, Astronomi och Fysik 22A(10), 1–14 (1930)
- 11. Dvoretzky, A., Kiefer, J., Wolfowitz, J.: Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Stat. 27, 642–669 (1956)
- 12. Grübel, R., Rösler, U.: Asymptotic distribution theory for Hoare’s selection algorithm. Adv. Appl. Probab. 28(1), 252–269 (1996)
- 13. Grübel, R.: On the silhouette of binary search trees. Ann. Appl. Probab. 19(5), 1781–1802 (2009)
- 14. Grübel, R., Stefanoski, N.: Mixed Poisson approximation of node depth distributions in random binary search trees. Ann. Appl. Probab. 15(1A), 279–297 (2005)
- 15. Hildebrand, A., Tenenbaum, G.: Integers without large prime factors. J. Théor. Nombres Bordx. 5(2), 411–484 (1993)
- 16. Hoare, C.A.R.: Quicksort. Comput. J. 5, 10–15 (1962)
- 17. Hwang, H.-K., Tsai, T.-H.: Quickselect and the Dickman function. Combin. Probab. Comput. 11(4), 353–371 (2002)
- 18. Knuth, D.E.: The Art of Computer Programming: Sorting and Searching, vol. 3. Addison-Wesley, Reading (1973)
- 19. Kuba, M., Panholzer, A.: On weighted path lengths and distances in increasing trees. Probab. Eng. Inf. Sci. 21(3), 419–433 (2007)
- 20. Kuba, M., Panholzer, A.: On edge-weighted recursive trees and inversions in random permutations. Discrete Math. 308(4), 529–540 (2008)
- 21. Mahmoud, H.M.: Evolution of Random Search Trees. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York (1992)
- 22. Mahmoud, H.M., Neininger, R.: Distribution of distances in random binary search trees. Ann. Appl. Probab. 13(1), 253–276 (2003)
- 23. Mahmoud, H.M., Pittel, B.: On the most probable shape of a search tree grown from a random permutation. SIAM J. Algebr. Discrete Methods 5(1), 69–81 (1984)
- 24. Neininger, R.: On a multivariate contraction method for random recursive structures with applications to quicksort. Random Struct. Algorithms 19(3–4), 498–524 (2001). Analysis of algorithms (Krynica Morska, 2000)
- 25. Neininger, R.: The Wiener index of random trees. Combin. Probab. Comput. 11(6), 587–597 (2002)
- 26. Panholzer, A., Prodinger, H.: Spanning tree size in random binary search trees. Ann. Appl. Probab. 14(2), 718–733 (2004)
- 27. Régnier, M.: A limiting distribution for quicksort. RAIRO Inform. Théor. Appl. 23(3), 335–343 (1989)
- 28. Rösler, U.: A limit theorem for “Quicksort”. RAIRO Inform. Théor. Appl. 25, 85–100 (1991)
- 29. Rüschendorf, L., Schopp, E.-M.: Note on the weighted internal path length of \(b\)-ary trees. Discrete Math. Theor. Comput. Sci. 9(1), 1–6 (2007)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

