Summary
Several average performance measures are presented for large B-trees formed from insertions, where large refers to the number of keys. Formulas are first derived for the expected number of nodes of each size on the bottom level of the tree, where the size of a node is the number of keys currently contained in the node. This is followed by formulas for the probability of making an insertion into a node of a given size, the probability of a split during an insertion into a node, and the expected number of splits during an insertion into the tree. It is shown that for large trees of high order m, the expected number of splits per insertion is approximately 1/((ln 2) m). A formula is presented for the average storage utilization, and it is shown that this average approaches ln 2 as m approaches infinity. A simpler formula is derived for the average storage utilization at the bottom level of the tree, and it is shown that this formula is an increasing function of m ranging from 2/3 to ln 2. It is shown that the expected tree height and the expected search path length are approximately logarithmic to the base (ln 2) m. Simulation results are presented to corroborate the theoretical analysis.
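The two closed-form approximations quoted above can be evaluated numerically to get a feel for their magnitude. The following is a minimal sketch, not the paper's derivation: the function names are hypothetical, the precise definition of the order m is the paper's and is not restated here, and the height estimate keeps only the leading logarithmic term, ignoring any additive constants the full analysis may include.

```python
import math


def approx_splits_per_insertion(m: int) -> float:
    """Approximation 1/((ln 2) m) for the expected number of splits
    per insertion, stated for large trees of high order m."""
    return 1.0 / (math.log(2) * m)


def approx_height(num_keys: int, m: int) -> float:
    """Leading-order height estimate: the logarithm of num_keys to the
    base (ln 2) m. Assumption: lower-order terms are dropped."""
    base = math.log(2) * m
    if base <= 1.0:
        raise ValueError("(ln 2) m must exceed 1 for the logarithm to be meaningful")
    return math.log(num_keys) / math.log(base)


if __name__ == "__main__":
    print(f"limiting storage utilization ln 2 = {math.log(2):.4f}")
    for m in (10, 100, 1000):
        splits = approx_splits_per_insertion(m)
        height = approx_height(10**6, m)
        print(f"m={m:5d}  splits/insertion ~ {splits:.5f}  height(10^6 keys) ~ {height:.2f}")
```

Because both quantities depend on m only through the product (ln 2) m, the sketch suggests reading (ln 2) m as an effective fan-out: roughly 69% of the nominal node capacity is in use on average in a large tree.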
Cite this article
Wright, W.E. Some average performance measures for the B-tree. Acta Informatica 21, 541–557 (1985). https://doi.org/10.1007/BF00289710
DOI: https://doi.org/10.1007/BF00289710