## Abstract

Bucket Sort is known to run in expected linear time when the input keys are distributed independently and uniformly at random in the interval [0, 1). The analysis holds even when a quadratic time algorithm is used to sort the keys in each bucket. We show how to obtain linear time guarantees on the running time of Bucket Sort that hold with *very high probability*. Specifically, we investigate the asymptotic behavior of the exponent in the upper tail probability of the running time of Bucket Sort. We consider large additive deviations from the expectation, of the form *cn* for large enough (constant) *c*, where *n* is the number of keys that are sorted.
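As a rough illustration of the algorithm under analysis, the following is a minimal sketch of Bucket Sort with a quadratic-time sort inside each bucket. It assumes *n* equal-width buckets for *n* keys in [0, 1) and uses insertion sort as the quadratic algorithm; the function names `insertion_sort` and `bucket_sort` are illustrative and are not taken from the paper.

```python
import random


def insertion_sort(bucket):
    """Sort a list in place; quadratic time in the worst case."""
    for i in range(1, len(bucket)):
        key = bucket[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and bucket[j] > key:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key


def bucket_sort(keys):
    """Sort keys from [0, 1) using n equal-width buckets (assumed layout)."""
    n = len(keys)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in keys:
        # Key x goes to bucket floor(x * n); min() guards the top edge.
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)
        result.extend(bucket)
    return result


# Usage: sort 1000 independent uniform keys.
rng = random.Random(0)
data = [rng.random() for _ in range(1000)]
sorted_data = bucket_sort(data)
```

With uniform keys each bucket holds one key on average, which is why the quadratic inner sort still gives expected linear time overall.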

Our analysis shows a profound difference between variants of Bucket Sort that use a quadratic time algorithm within each bucket and variants that use a \(\varTheta (b\log b)\) time algorithm for sorting *b* keys in a bucket. When a quadratic time algorithm is used to sort the keys in a bucket, the probability that Bucket Sort takes *cn* more time than expected is exponential in \(\varTheta (\sqrt{n}\log n)\). When a \(\varTheta (b\log b)\) algorithm is used to sort the keys in a bucket, the exponent becomes \(\varTheta (n)\). We prove this latter theorem by showing an upper bound on the tail of a random variable defined on tries, a result which we believe is of independent interest. This result also enables us to analyze the upper tail probability of a well-studied trie parameter, the external path length, and show that the probability that it deviates from its expected value by an additive factor of *cn* is exponential in \(\varTheta (n)\).
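The quantities compared above can be explored numerically with a small simulation. The sketch below is an assumption-laden illustration, not the paper's method: it throws *n* keys into *n* buckets, evaluates the sum of squared occupancies and the sum of \(b\log b\) terms as proxies for the two running times (constants are ignored), and computes the external path length of a trie built over *n* independent, uniformly random infinite bit strings. All function names are hypothetical.

```python
import math
import random


def bucket_occupancies(n, rng):
    """Occupancy vector after throwing n uniform keys into n buckets."""
    counts = [0] * n
    for _ in range(n):
        counts[min(int(rng.random() * n), n - 1)] += 1
    return counts


def quadratic_cost(counts):
    """Proxy for the quadratic variant: sum of squared bucket sizes."""
    return sum(b * b for b in counts)


def loglinear_cost(counts):
    """Proxy for the b log b variant (base-2 logarithm)."""
    return sum(b * math.log2(b) for b in counts if b > 1)


def external_path_length(n, rng, depth=0):
    """Sum of leaf depths in a trie over n random infinite bit strings.

    A node shared by at least two strings is internal and splits its
    strings by their next bit; a node with one string is a leaf.
    """
    if n <= 1:
        return depth * n
    left = sum(1 for _ in range(n) if rng.random() < 0.5)
    return (external_path_length(left, rng, depth + 1)
            + external_path_length(n - left, rng, depth + 1))


# Usage: one random sample of each quantity for n = 1024.
rng = random.Random(1)
n = 1024
counts = bucket_occupancies(n, rng)
sample = (quadratic_cost(counts), loglinear_cost(counts),
          external_path_length(n, rng))
```

For *n* uniform keys in *n* buckets the expected sum of squared occupancies is \(2n-1\), so repeating the experiment and recording how often a sample exceeds that value by *cn* gives an empirical, if crude, view of the upper tail discussed here.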

This research was supported by a grant from the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel, and the United States National Science Foundation (NSF).

A full version of this paper can be found at [1].

## Notes

1. Throughout the paper, \(\ln x\) denotes the natural logarithm of *x* and \(\log x\) denotes the base-2 logarithm of *x*.
2. One should not confuse this analysis with concentration bounds that address small deviations from the expectation.
3. The threshold *C* depends on: (1) the constant that appears in the sorting algorithm used within each bucket, and (2) the constant that appears in the expected running time of \(b^2\)-Bucket Sort.
4. Interestingly, the sum of squares of bin occupancies, i.e., \(f(\mathbf {b})\), also appears in the FKS perfect hashing construction [8].
5. Formally, *T*(*L*) may contain a subset of these *n* nodes. If a node \(v_j\) at depth \(\log n\) is not chosen by any string, then define \(C_j=0\).
6. Note that the RVs \(\left\{ \varDelta _i\right\} _i\) are not independent and probably not even negatively associated. Hence, standard concentration bounds do not apply to \(\sum \varDelta _i\).

## References

1. Bercea, I.O., Even, G.: Upper tail analysis of bucket sort and random tries. CoRR abs/2002.10499 (2020). https://arxiv.org/abs/2002.10499
2. Clément, J., Flajolet, P., Vallée, B.: Dynamical sources in information theory: a general analysis of trie structures. Algorithmica **29**(1–2), 307–369 (2001)
3. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2009)
4. Devroye, L.: Lecture Notes on Bucket Algorithms, vol. 12. Birkhäuser Boston (1986)
5. Doerr, B.: Probabilistic tools for the analysis of randomized optimization heuristics. CoRR abs/1801.06733 (2018). http://arxiv.org/abs/1801.06733
6. Dubhashi, D.P., Panconesi, A.: Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, Cambridge (2009)
7. Fill, J.A., Janson, S.: Quicksort asymptotics. J. Algorithms **44**(1), 4–28 (2002)
8. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(o(1)\) worst case access time. In: 23rd Annual Symposium on Foundations of Computer Science, pp. 165–169. IEEE (1982)
9. Jacquet, P., Regnier, M.: Normal limiting distribution for the size and the external path length of tries (1988)
10. Janson, S.: On the tails of the limiting quicksort distribution. Electron. Commun. Prob. **20** (2015)
11. Janson, S.: Tail bounds for sums of geometric and exponential variables. Stat. Prob. Lett. **135**, 1–6 (2018)
12. Kirschenhofer, P., Prodinger, H., Szpankowski, W.: On the variance of the external path length in a symmetric digital trie. Discrete Appl. Math. **25**(1–2), 129–143 (1989)
13. Knuth, D.E.: The Art of Computer Programming, vol. III, 2nd edn. Addison-Wesley, Boston (1998)
14. Mahmoud, H., Flajolet, P., Jacquet, P., Régnier, M.: Analytic variations on bucket selection and sorting. Acta Informatica **36**(9–10), 735–760 (2000)
15. Mahmoud, H.M., Lueker, G.S.: Evolution of Random Search Trees, vol. 200. Wiley, New York (1992)
16. McDiarmid, C., Hayward, R.: Large deviations for quicksort. J. Algorithms **21**(3), 476–507 (1996). https://doi.org/10.1006/jagm.1996.0055
17. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, Cambridge (2017)
18. Régnier, M.: A limiting distribution for quicksort. RAIRO-Theoretical Inform. Appl.-Informatique Théorique et Appl. **23**(3), 335–343 (1989)
19. Sanders, P., Mehlhorn, K., Dietzfelbinger, M., Dementiev, R.: Sorting and selection. In: Sequential and Parallel Algorithms and Data Structures, pp. 153–210. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25209-0_5
20. Sedgewick, R., Flajolet, P.: An Introduction to the Analysis of Algorithms. Pearson Education India, Chennai (2013)
21. Seidel, R.: Data-specific analysis of string sorting. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1278–1286. Society for Industrial and Applied Mathematics (2010)
22. Szpankowski, W.: Average Case Analysis of Algorithms on Sequences, vol. 50. John Wiley & Sons, New York (2011)
23. Vitter, J.S., Flajolet, P.: Average-case analysis of algorithms and data structures. In: Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, pp. 431–524 (1990)

## Copyright information

© 2021 Springer Nature Switzerland AG

## About this paper

### Cite this paper

Bercea, I.O., Even, G. (2021). Upper Tail Analysis of Bucket Sort and Random Tries. In: Calamoneri, T., Corò, F. (eds) Algorithms and Complexity. CIAC 2021. Lecture Notes in Computer Science, vol 12701. Springer, Cham. https://doi.org/10.1007/978-3-030-75242-2_8

DOI: https://doi.org/10.1007/978-3-030-75242-2_8

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-75241-5

Online ISBN: 978-3-030-75242-2

eBook Packages: Computer Science (R0)