Abstract
We study the joint distribution of the numbers of occurrences of members of a collection of nonoverlapping motifs in digital data. We deal with finite and countably infinite collections. For infinite collections, we must specify the underlying measure-theoretic formulation explicitly. We show that, under appropriate normalization, any linear combination of the numbers of occurrences of the motifs in such a collection has a limiting normal distribution. In many instances, this can be interpreted in terms of the numbers of occurrences of the individual motifs: in the limit, they have a multivariate normal distribution. The methods of proof include combinatorics on words, integral transforms, and poissonization.
Notes
A set of probabilities \(p_1, \ldots , p_m\) is said to be periodic when \(\log p_j / \log p_k\) is rational for every \(1 \le j, k\le m\).
In the aperiodic case, the o(n) estimate can be improved to \(O(n^{1-\varepsilon })\), for some \(0< \varepsilon < 1\).
In our case, the variance \(\sigma ^{2}_\mathcal {C}(n)\) will always be strictly positive. For a more in-depth consideration of the variance for shape parameters in random tries, see Schachinger [23].
We take an infinite-dimensional random vector to have a multivariate normal distribution when every nonzero finite linear combination of its components has a univariate normal distribution.
The same is not true in the fixed population model. That is, in case (i), \(I_{n, T_\kappa ,v}\) and \(I_{n, T_\nu ,w}\) can be dependent. So, we see the advantage of quickly switching to a Poisson model, rather than transforming recurrences in the fixed population model.
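As an illustration of the periodicity condition stated in the first note, the following worked example may help. The specific probability values are assumptions chosen for illustration and do not come from the paper.

```latex
% Illustrative values only (not taken from the paper).
% Periodic case: (p_1, p_2, p_3) = (1/2, 1/4, 1/4).
% Every ratio of logarithms is rational:
\[
\frac{\log p_2}{\log p_1} = \frac{\log (1/4)}{\log (1/2)} = 2, \qquad
\frac{\log p_1}{\log p_2} = \frac{1}{2}, \qquad
\frac{\log p_3}{\log p_2} = 1 .
\]
% Aperiodic case: (p_1, p_2) = (1/3, 2/3).
% Suppose \log p_1 / \log p_2 = a/b for positive integers a and b.
% Then p_1^{b} = p_2^{a}, that is,
\[
3^{-b} = 2^{a}\, 3^{-a}
\quad\Longleftrightarrow\quad
2^{a} = 3^{a-b},
\]
% which is impossible for a >= 1: the left side is an even integer,
% while the right side is either an odd integer or smaller than 1.
% Hence the ratio is irrational, and the set is not periodic.
```

In this sketch, the first set of probabilities falls under the periodic case and the second under the aperiodic case, where the second note's improved error estimate applies.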
References
De La Briandais, R.: File searching using variable length keys. In: Proceedings of the Western Joint Computer Conference, pp. 295–298. AFIPS, San Francisco, California (1959)
Dobrow, R., Fill, J.: Multiway trees of maximum and minimum probability under the random permutation model. Comb. Probab. Comput. 5, 351–371 (1996)
Fagin, R., Nievergelt, J., Pippenger, N., Strong, H.: Extendible hashing—a fast access method for dynamic files. ACM Trans. Database Syst. 4, 315–344 (1979)
Feng, Q., Mahmoud, H.: On the variety of shapes on the fringe of a random recursive tree. J. Appl. Probab. 47, 191–200 (2010)
Fill, J.: On the distribution of binary search trees under the random permutation model. Random Struct. Algorithms 8, 1–25 (1996)
Fill, J., Kapur, N.: Transfer theorems and asymptotic distributional results for \(m\)-ary search trees. Random Struct. Algorithms 26, 359–391 (2004)
Flajolet, P., Gourdon, X., Dumas, P.: Mellin transforms and asymptotics: harmonic sums. Theor. Comput. Sci. 144, 3–58 (1995)
Flajolet, P., Gourdon, X., Martínez, C.: Patterns in random binary search trees. Random Struct. Algorithms 11, 223–244 (1997)
Flajolet, P., Roux, M., Vallée, B.: Digital trees and memoryless sources: from arithmetics to analysis. In: 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA ’10); DMTCS Proceedings, AM, pp. 233–260 (2010)
Fredkin, E.: Trie memory. Commun. ACM 3, 490–499 (1960)
Fuchs, M., Hwang, H.-K., Zacharovas, V.: An analytic approach to the asymptotic variance of trie statistics and related structures. Theor. Comput. Sci. 527, 1–36 (2014)
Fuchs, M., Lee, C.-K.: A general central limit theorem for shape parameters of \(m\)-ary tries and PATRICIA tries. Electron. J. Combin. 21(1), 1–68 (2014)
Gaither, J., Ward, M.D.: The variance of the number of 2-protected nodes in a trie. In: Nebel, E., Szpankowski, W. (eds.) Proceedings of the 10th Meeting on Analytic Algorithmics and Combinatorics, pp. 43–51. ANALCO 2013, New Orleans, Louisiana, USA (2013)
Gopaladesikan, M., Mahmoud, H., Ward, M.D.: Asymptotic joint normality of counts of uncorrelated motifs in recursive trees. Methodol. Comput. Appl. Probab. 16, 863–884 (2014)
Gopaladesikan, M., Wagner, S., Ward, M.D.: On the asymptotic probability of forbidden motifs on the fringe of recursive trees. Exp. Math. 25, 237–245 (2016)
Hwang, H.-K., Fuchs, M., Zacharovas, V.: Asymptotic variance of random symmetric digital search trees. Discrete Math. Theor. Comput. Sci. 12, 103–166 (2010)
Jacquet, P., Régnier, M.: Trie partitioning process: limiting distributions. In: Lecture Notes in Computer Science, vol. 214, pp. 196–210. Springer, New York (1986)
Jacquet, P., Szpankowski, W.: Analytical depoissonization and its applications. Theor. Comput. Sci. 201, 1–62 (1998)
Knuth, D.: The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd edn. Addison-Wesley, Reading, Massachusetts (1998)
Mahmoud, H.: Sorting: A Distribution Theory. Wiley, New York (2000)
Mahmoud, H., Ward, M.D.: Average-case analysis of cousins in \(m\)-ary tries. J. Appl. Probab. 45, 888–900 (2008)
Pittel, B.: Asymptotical growth of a class of random trees. Ann. Probab. 13, 414–427 (1985)
Schachinger, W.: On the variance of a class of inductive valuations of data structures for digital search. Theor. Comput. Sci. 144, 251–275 (1995)
Szpankowski, W.: Average Case Analysis of Algorithms on Sequences. Wiley, New York (2001)
Acknowledgments
The authors sincerely thank an anonymous referee for detailed and insightful comments about the entire paper. We acknowledge the referee for improving the paper in many ways. M. D. Ward’s research is supported by NSF Grant DMS-1246818, and by the NSF Science & Technology Center for Science of Information Grant CCF-0939370.
About this article
Cite this article
Gaither, J., Mahmoud, H. & Ward, M.D. On the Variety of Shapes in Digital Trees. J Theor Probab 30, 1225–1254 (2017). https://doi.org/10.1007/s10959-016-0700-x
DOI: https://doi.org/10.1007/s10959-016-0700-x
Keywords
- Analysis of algorithms
- Random trees
- Digital trees
- Recurrence
- Functional equation
- Mellin transform
- Poissonization
- Digital data
- Combinatorics on words
- Similarity of strings
- Motif