On the Variety of Shapes in Digital Trees

Journal of Theoretical Probability

Abstract

We study the joint distribution of the numbers of occurrences of members of a collection of nonoverlapping motifs in digital data. We deal with finite and countably infinite collections. For infinite collections, the setting requires that we be very explicit about the specification of the underlying measure-theoretic formulation. We show that, under appropriate normalization, any linear combination of the numbers of occurrences of the motifs of such a collection in the data has a limiting normal distribution. In many instances, this can be interpreted in terms of the numbers of occurrences of individual motifs: jointly, they have a limiting multivariate normal distribution. The methods of proof include combinatorics on words, integral transforms, and poissonization.
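
As a rough illustration of why poissonization is convenient (a sketch under stated assumptions, not the paper's construction): when a fixed number n of keys is split between two subtrees by one independent coin flip per key, the two subtree sizes are negatively correlated, whereas with a Poisson(n) number of keys they are independent. The helper names `poisson`, `split_counts`, `covariance`, and `experiment`, and the parameter choices n = 20 and p = 1/2, are hypothetical and chosen only for this demonstration.

```python
import math
import random


def poisson(rng, mean):
    """Sample Poisson(mean) by Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def split_counts(rng, total, p):
    """Send each of `total` keys to the left subtree with probability p."""
    left = sum(1 for _ in range(total) if rng.random() < p)
    return left, total - left


def covariance(pairs):
    """Empirical covariance of the two coordinates of a list of pairs."""
    m = len(pairs)
    mean_x = sum(x for x, _ in pairs) / m
    mean_y = sum(y for _, y in pairs) / m
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / m


def experiment(n=20, p=0.5, trials=20000, seed=1):
    """Compare left/right size covariance: fixed population vs. Poisson population."""
    rng = random.Random(seed)
    fixed = [split_counts(rng, n, p) for _ in range(trials)]
    pois = [split_counts(rng, poisson(rng, n), p) for _ in range(trials)]
    # Theory: fixed model gives -n*p*(1-p); Poisson model gives 0.
    return covariance(fixed), covariance(pois)


if __name__ == "__main__":
    fixed_cov, pois_cov = experiment()
    print("fixed population covariance:", fixed_cov)
    print("Poisson population covariance:", pois_cov)
```

In the fixed population model the covariance is exactly -np(1-p), while in the Poisson model it is zero, which is the kind of independence that makes the Poisson model easier to analyze.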


Notes

  1. A set of probabilities \(p_1, \ldots , p_m\) is said to be periodic when \(\log p_j / \log p_k\) is rational for every \(1 \le j, k\le m\).

  2. In the aperiodic case, the o(n) estimate can be improved to \(O(n^{1-\varepsilon })\), for some \(0< \varepsilon < 1\).

  3. In our case, the variance \(\sigma ^{2}_\mathcal {C}(n)\) will always be strictly positive. For a more in-depth consideration of the variance for shape parameters in random tries, see Schachinger [23].

  4. We take an infinite-dimensional random vector to have a multivariate normal distribution when every nonzero finite linear combination of its components has a univariate normal distribution.

  5. The same is not true in the fixed population model. That is, in case (i), \(I_{n, T_\kappa ,v}\) and \(I_{n, T_\nu ,w}\) can be dependent. So, we see the advantage of quickly switching to a Poisson model, rather than transforming recurrences in the fixed population model.
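
The periodicity condition in Note 1 can be explored with exact rational arithmetic. The sketch below assumes the probabilities are given as rational numbers strictly between 0 and 1; the function names `rational_log_ratio` and `looks_periodic` and the search bound `max_exp` are hypothetical. Because the search is bounded, a negative answer is only evidence of aperiodicity, not a proof.

```python
from fractions import Fraction
from itertools import product


def rational_log_ratio(p, q, max_exp=12):
    """Search for positive integers a, b with p**b == q**a.

    Such a pair means log p / log q = a / b. Returns that ratio as a
    Fraction, or None if no pair with a, b <= max_exp exists.
    """
    p, q = Fraction(p), Fraction(q)
    for b, a in product(range(1, max_exp + 1), repeat=2):
        if p ** b == q ** a:
            return Fraction(a, b)
    return None


def looks_periodic(probs, max_exp=12):
    """Check every log-ratio against the first probability.

    If log p_j / log p_1 is rational for all j, then every ratio
    log p_j / log p_k is rational as well.
    """
    first = probs[0]
    return all(rational_log_ratio(p, first, max_exp) is not None for p in probs)


if __name__ == "__main__":
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    print(looks_periodic([half, quarter, quarter]))          # log(1/4)/log(1/2) = 2
    print(looks_periodic([Fraction(1, 3), Fraction(2, 3)]))  # no integer relation found
```

For the probabilities (1/3, 2/3) no relation can exist at any bound: \(3^{-b} = (2/3)^{a}\) would force \(2^{a} = 3^{a-b}\), which is impossible for positive integers \(a, b\), so that source is aperiodic.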

References

  1. De La Briandais, R.: File searching using variable length keys. In: Proceedings of the Western Joint Computer Conference, pp. 295–298. AFIPS, San Francisco, California (1959)

  2. Dobrow, R., Fill, J.: Multiway trees of maximum and minimum probability under the random permutation model. Comb. Probab. Comput. 5, 351–371 (1996)


  3. Fagin, R., Nievergelt, J., Pippenger, N., Strong, H.: Extendible hashing—a fast access method for dynamic files. ACM Trans. Database Syst. 4, 315–344 (1979)


  4. Feng, Q., Mahmoud, H.: On the variety of shapes on the fringe of a random recursive tree. J. Appl. Probab. 47, 191–200 (2010)


  5. Fill, J.: On the distribution of binary search trees under the random permutation model. Random Struct. Algorithms 8, 1–25 (1996)


  6. Fill, J., Kapur, N.: Transfer theorems and asymptotic distributional results for \(m\)-ary search trees. Random Struct. Algorithms 26, 359–391 (2005)


  7. Flajolet, P., Gourdon, X., Dumas, P.: Mellin transforms and asymptotics: harmonic sums. Theor. Comput. Sci. 144, 3–58 (1995)


  8. Flajolet, P., Gourdon, X., Martínez, C.: Patterns in random binary search trees. Random Struct. Algorithms 11, 223–244 (1997)


  9. Flajolet, P., Roux, M., Vallée, B.: Digital trees and memoryless sources: from arithmetics to analysis. In: 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA ’10); DMTCS Proceedings, AM, pp. 233–260 (2010)

  10. Fredkin, E.: Trie memory. Commun. ACM 3, 490–499 (1960)


  11. Fuchs, M., Hwang, H.K., Zacharovas, V.: An analytic approach to the asymptotic variance of trie statistics and related structures. Theor. Comput. Sci. 527, 1–36 (2014)


  12. Fuchs, M., Lee, C.-K.: A general central limit theorem for shape parameters of \(m\)-ary tries and PATRICIA tries. Electron. J. Combin. 21(1), 1–68 (2014)


  13. Gaither, J., Ward, M.D.: The variance of the number of 2-protected nodes in a trie. In: Nebel, E., Szpankowski, W. (eds.) Proceedings of the 10th Meeting on Analytic Algorithmics and Combinatorics, pp. 43–51. ANALCO 2013, New Orleans, Louisiana, USA (2013)

  14. Gopaladesikan, M., Mahmoud, H., Ward, M.D.: Asymptotic joint normality of counts of uncorrelated motifs in recursive trees. Methodol. Comput. Appl. Probab. 16, 863–884 (2014)


  15. Gopaladesikan, M., Wagner, S., Ward, M.D.: On the asymptotic probability of forbidden motifs on the fringe of recursive trees. Exp. Math. 25, 237–245 (2016)


  16. Hwang, H.K., Fuchs, M., Zacharovas, V.: Asymptotic variance of random symmetric digital search trees. Discrete Math. Theor. Comput. Sci. 12, 103–166 (2010)


  17. Jacquet, P., Régnier, M.: Trie partitioning process: limiting distributions. In: Lecture Notes in Computer Science, vol. 214, pp. 196–210. Springer, New York (1986)

  18. Jacquet, P., Szpankowski, W.: Analytical depoissonization and its applications. Theor. Comput. Sci. 201, 1–62 (1998)


  19. Knuth, D.: The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd edn. Addison-Wesley, Reading, Massachusetts (1998)


  20. Mahmoud, H.: Sorting: A Distribution Theory. Wiley, New York (2000)


  21. Mahmoud, H., Ward, M.D.: Average-case analysis of cousins in \(m\)-ary tries. J. Appl. Probab. 45, 888–900 (2008)


  22. Pittel, B.: Asymptotical growth of a class of random trees. Ann. Probab. 13, 414–427 (1985)


  23. Schachinger, W.: On the variance of a class of inductive valuations of data structures for digital search. Theor. Comput. Sci. 144, 251–275 (1995)


  24. Szpankowski, W.: Average Case Analysis of Algorithms on Sequences. Wiley, New York (2001)



Acknowledgments

The authors sincerely thank an anonymous referee for detailed and insightful comments about the entire paper. We acknowledge the referee for improving the paper in many ways. M. D. Ward’s research is supported by NSF Grant DMS-1246818, and by the NSF Science & Technology Center for Science of Information Grant CCF-0939370.

Author information

Corresponding author

Correspondence to Mark Daniel Ward.

Cite this article

Gaither, J., Mahmoud, H. & Ward, M.D. On the Variety of Shapes in Digital Trees. J Theor Probab 30, 1225–1254 (2017). https://doi.org/10.1007/s10959-016-0700-x

  • DOI: https://doi.org/10.1007/s10959-016-0700-x
