What can we learn about suffix trees from independent tries?

Jacquet, Philippe; Szpankowski, Wojciech

doi:10.1007/BFb0028265

Philippe Jacquet¹ &
Wojciech Szpankowski²

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 519)

Abstract

A suffix tree of a word is a digital tree built from the suffixes of that word. We consider words that are random sequences of independent symbols over a finite alphabet. Our main finding is that the depths in a suffix tree are asymptotically equivalent to the depths in a digital tree that stores independent keys (i.e., an independent digital tree, also known as a trie). More precisely, we prove that the depths in a suffix tree built from the first $n$ suffixes of a random word are asymptotically normally distributed, with mean asymptotically equivalent to $\frac{1}{h_1}\log n$ and variance $\alpha \log n$, where $h_1$ is the entropy of the alphabet and $\alpha$ is a parameter of the probabilistic model. Our results provide new insights into the asymptotic properties of compression schemes, and therefore find direct applications in computer science and telecommunications, most notably in coding theory, the theory of languages, and the design and analysis of algorithms.
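The central claim can be checked empirically on small inputs. The following Python sketch is our own illustration, not the authors' method: it takes the first $n$ suffixes of one random memoryless word and, separately, $n$ independent random keys from the same source, computes the depth of every key, and compares the average depths with $\frac{1}{h_1}\log n$, where $h_1 = -\sum_i p_i \log p_i$. Assumptions: depth is measured in a non-compact trie, so a key's depth is one more than the longest prefix it shares with any other key; suffixes of an infinite word are approximated by a finite word padded with `pad` extra symbols; logarithms are natural, so $h_1$ is in nats. The function names `lcp`, `trie_depths`, `random_word` and `compare` are hypothetical.

```python
import math
import random
import statistics
from typing import Dict, List


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def trie_depths(keys: List[str]) -> List[int]:
    """Depth of each key in a non-compact trie built from `keys`.

    depth(i) = 1 + max over j != i of lcp(keys[i], keys[j]).  In sorted
    order that maximum is reached at one of the two neighbours of keys[i].
    """
    order = sorted(range(len(keys)), key=keys.__getitem__)
    depth = [0] * len(keys)
    for pos, i in enumerate(order):
        best = 0
        if pos > 0:
            best = max(best, lcp(keys[i], keys[order[pos - 1]]))
        if pos + 1 < len(order):
            best = max(best, lcp(keys[i], keys[order[pos + 1]]))
        depth[i] = best + 1
    return depth


def random_word(length: int, probs: Dict[str, float], rng: random.Random) -> str:
    """Memoryless source: each symbol is drawn independently with `probs`."""
    return "".join(rng.choices(list(probs), weights=list(probs.values()), k=length))


def compare(n: int, probs: Dict[str, float], pad: int = 200, seed: int = 1) -> Dict[str, float]:
    """Depth statistics for a suffix trie vs. an independent trie (hypothetical helper)."""
    rng = random.Random(seed)
    # First n suffixes of one random word, padded to mimic infinite suffixes.
    word = random_word(n + pad, probs, rng)
    suffix_depths = trie_depths([word[i:] for i in range(n)])
    # n independent keys from the same source: an ordinary random trie.
    indep_depths = trie_depths([random_word(pad, probs, rng) for _ in range(n)])
    # Entropy of the alphabet in nats.
    h1 = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return {
        "suffix_tree_mean": statistics.mean(suffix_depths),
        "trie_mean": statistics.mean(indep_depths),
        "suffix_tree_var": statistics.pvariance(suffix_depths),
        "trie_var": statistics.pvariance(indep_depths),
        "log_n_over_h1": math.log(n) / h1,
    }


if __name__ == "__main__":
    for probs in ({"0": 0.5, "1": 0.5}, {"a": 0.7, "b": 0.3}):
        print(probs, compare(2000, probs))
```

Sorting the keys reduces the depth computation to comparing each key with its two lexicographic neighbours, since the longest prefix a key shares with any other key is attained at one of them. The sketch reports sample variances next to the means but does not try to estimate the constant $\alpha$.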

This research was primarily supported by NATO Collaborative Grant 0057/89.

This research was primarily done while the author was visiting INRIA in Rocquencourt, France. Support was provided in part by NATO Collaborative Grant 0057/89, in part by NSF Grants NCR-8702115 and CCR-8900305, in part by Grant AFOSR-90-0107, and in part by Grant R01 LM05118 from the National Library of Medicine.

Author information

Authors and Affiliations

INRIA Rocquencourt, 78153, Le Chesnay Cedex, France
Philippe Jacquet
Department of Computer Science, Purdue University, 47907, W. Lafayette, IN, U.S.A.
Wojciech Szpankowski

Editor information

Frank Dehne, Jörg-Rüdiger Sack, Nicola Santoro

About this paper

Cite this paper

Jacquet, P., Szpankowski, W. (1991). What can we learn about suffix trees from independent tries? In: Dehne, F., Sack, JR., Santoro, N. (eds) Algorithms and Data Structures. WADS 1991. Lecture Notes in Computer Science, vol 519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0028265

DOI: https://doi.org/10.1007/BFb0028265
Published: 17 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54343-5
Online ISBN: 978-3-540-47566-8
eBook Packages: Springer Book Archive
