Abstract
A suffix tree of a word is a digital tree built from the suffixes of that word. We consider words that are random sequences of independent symbols over a finite alphabet. Our main finding is that the depths in a suffix tree are asymptotically equivalent to the depths in a digital tree that stores independent keys (i.e., independent digital trees, also known as tries). More precisely, we prove that the depths in a suffix tree built from the first n suffixes of a random word are normally distributed with mean asymptotically equivalent to (1/h₁)·log n and variance α·log n, where h₁ is the entropy of the alphabet and α is a parameter of the probabilistic model. Our results provide new insights into the asymptotic properties of compression schemes, and therefore find direct applications in computer science and telecommunications, most notably in coding theory, the theory of languages, and the design and analysis of algorithms.
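The leading term of the mean depth can be explored numerically. The following is a minimal sketch, not the authors' method: it assumes the uncompacted (trie) view of the suffix tree, in which the depth of a suffix is one plus the length of its longest common prefix with any other of the first n suffixes, and it computes that depth by brute force. The function names, the word length of 2n, and the seed are illustrative choices, not taken from the paper.

```python
import math
import random


def entropy(probs):
    """Entropy h_1 = -sum p*log(p) of a memoryless source (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def lcp(word, i, j):
    """Length of the longest common prefix of the suffixes starting at i and j."""
    k = 0
    while i + k < len(word) and j + k < len(word) and word[i + k] == word[j + k]:
        k += 1
    return k


def suffix_depths(word, n):
    """Depths of the first n suffixes (n >= 2) in the uncompacted suffix trie.

    Assumption: depth of suffix i = 1 + max over j != i of lcp(i, j).
    Brute force, quadratic in n; meant for small experiments only.
    """
    return [1 + max(lcp(word, i, j) for j in range(n) if j != i) for i in range(n)]


def mean_depth_vs_prediction(n, probs, seed=0):
    """Return (observed mean depth, leading term log(n) / h_1) for one random word."""
    rng = random.Random(seed)
    # The word has length 2n so that each of the first n suffixes has at
    # least n symbols left and is unlikely to run off the end of the word.
    word = rng.choices(range(len(probs)), weights=probs, k=2 * n)
    depths = suffix_depths(word, n)
    return sum(depths) / n, math.log(n) / entropy(probs)


if __name__ == "__main__":
    observed, predicted = mean_depth_vs_prediction(300, [0.5, 0.5])
    print(f"observed mean depth: {observed:.2f}, (1/h_1) log n: {predicted:.2f}")
```

Because (1/h₁)·log n is only the leading term, the observed mean at small n should be expected to agree with it roughly rather than exactly. The variance constant α is not specified in the abstract, so the sketch does not attempt to check it.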
This research was primarily supported by NATO Collaborative Grant 0057/89.
This research was primarily done while the author was visiting INRIA in Rocquencourt, France. Support was provided in part by NATO Collaborative Grant 0057/89, in part by NSF Grants NCR-8702115 and CCR-8900305, in part by Grant AFOSR-90-0107, and in part by Grant R01 LM05118 from the National Library of Medicine.
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jacquet, P., Szpankowski, W. (1991). What can we learn about suffix trees from independent tries?. In: Dehne, F., Sack, JR., Santoro, N. (eds) Algorithms and Data Structures. WADS 1991. Lecture Notes in Computer Science, vol 519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0028265
DOI: https://doi.org/10.1007/BFb0028265
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54343-5
Online ISBN: 978-3-540-47566-8
eBook Packages: Springer Book Archive