Solving the String Statistics Problem in Time \( \mathcal{O}(n \log n) \)

  • Gerth Stølting Brodal
  • Rune B. Lyngsø
  • Anna Östlin
  • Christian N. S. Pedersen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2380)

Abstract

The string statistics problem consists of preprocessing a string of length \( n \) such that given a query pattern of length \( m \), the maximum number of non-overlapping occurrences of the query pattern in the string can be reported efficiently. Apostolico and Preparata introduced the minimal augmented suffix tree (MAST) as a data structure for the string statistics problem, and showed how to construct the MAST in time \( \mathcal{O}(n \log^2 n) \) and how it supports queries in time \( \mathcal{O}(m) \) for constant sized alphabets. A subsequent theorem by Fraenkel and Simpson stating that a string has at most a linear number of distinct squares implies that the MAST requires space \( \mathcal{O}(n) \). In this paper we improve the construction time for the MAST to \( \mathcal{O}(n \log n) \) by extending the algorithm of Apostolico and Preparata to exploit properties of efficient joining and splitting of search trees together with a refined analysis.
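
To make the query concrete, the quantity the MAST reports can be computed naively by a greedy left-to-right scan: taking the leftmost occurrence and then resuming the search just past its end yields a maximum set of non-overlapping occurrences. The sketch below is only a brute-force baseline for checking small cases, not the MAST or the algorithm of the paper; the function name `max_nonoverlapping` is hypothetical, and each query here scans the whole string rather than running in time \( \mathcal{O}(m) \).

```python
def max_nonoverlapping(text: str, pattern: str) -> int:
    """Return the maximum number of non-overlapping occurrences of
    `pattern` in `text`, using a greedy leftmost-first scan.

    Illustrative baseline only: no preprocessing is done, so every
    query rescans the text.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    pos = text.find(pattern)  # leftmost occurrence, or -1 if none
    while pos != -1:
        count += 1
        # Resume just past the end of the occurrence we committed to,
        # so the next match cannot overlap it.
        pos = text.find(pattern, pos + len(pattern))
    return count


if __name__ == "__main__":
    # "aba" occurs at positions 0, 2 and 4, but at most two fit
    # without overlapping (positions 0 and 4).
    print(max_nonoverlapping("abababa", "aba"))
```

The greedy choice is safe because any occurrence that ends earliest leaves the most room for later ones, so replacing the first occurrence of an optimal solution by the leftmost one never reduces the count.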

Keywords

Search Tree; Internal Node; Query Pattern; Maximal Sequence; Event Queue
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. A. Apostolico and A. Ehrenfeucht. Efficient detection of quasiperiodicities in strings. Theoretical Computer Science, 119:247–265, 1993.
  2. A. Apostolico and F. P. Preparata. Optimal off-line detection of repetitions in a string. Theoretical Computer Science, 22:297–315, 1983.
  3. A. Apostolico and F. P. Preparata. Data structures and algorithms for the string statistics problem. Algorithmica, 15:481–494, 1996.
  4. G. S. Brodal, R. Lyngsø, C. N. S. Pedersen, and J. Stoye. Finding maximal pairs with bounded gap. Journal of Discrete Algorithms, Special Issue of Matching Patterns, 1(1):77–104, 2000.
  5. G. S. Brodal, R. B. Lyngsø, A. Östlin, and C. N. S. Pedersen. Solving the string statistics problem in time O(n log n). Technical Report RS-02-13, BRICS, Department of Computer Science, University of Aarhus, 2002.
  6. G. S. Brodal and C. N. S. Pedersen. Finding maximal quasiperiodicities in strings. In Proc. 11th Combinatorial Pattern Matching, volume 1848 of Lecture Notes in Computer Science, pages 397–411. Springer Verlag, Berlin, 2000.
  7. M. R. Brown and R. E. Tarjan. A fast merging algorithm. Journal of the ACM, 26(2):211–226, 1979.
  8. M. Farach. Optimal suffix tree construction with large alphabets. In Proc. 38th Ann. Symp. on Foundations of Computer Science (FOCS), pages 137–143, 1997.
  9. A. S. Fraenkel and J. Simpson. How many squares can a string contain? Journal of Combinatorial Theory, Series A, 82(1):112–120, 1998.
  10. D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
  11. K. Hoffmann, K. Mehlhorn, P. Rosenstiehl, and R. E. Tarjan. Sorting Jordan sequences in linear time using level-linked search trees. Information and Control, 86(1–3):170–184, 1986.
  12. S. Huddleston and K. Mehlhorn. A new data structure for representing sorted lists. Acta Informatica, 17:157–184, 1982.
  13. F. K. Hwang and S. Lin. A simple algorithm for merging two disjoint linearly ordered sets. SIAM Journal of Computing, 1(1):31–39, 1972.
  14. D. E. Knuth, J. H. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM Journal of Computing, 6:323–350, 1977.
  15. E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2):262–272, 1976.
  16. K. Mehlhorn. Sorting and Searching, volume 1 of Data Structures and Algorithms. Springer Verlag, Berlin, 1984.
  17. J. Stoye and D. Gusfield. Simple and flexible detection of contiguous repeats using a suffix tree. Theoretical Computer Science, 270:843–856, 2002.
  18. E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249–260, 1995.
  19. P. Weiner. Linear pattern matching algorithms. In Proc. 14th Symposium on Switching and Automata Theory, pages 1–11, 1973.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Gerth Stølting Brodal (1)
  • Rune B. Lyngsø (3)
  • Anna Östlin (1)
  • Christian N. S. Pedersen (1, 2)

  1. BRICS, Department of Computer Science, University of Aarhus, Århus C, Denmark
  2. BiRC, University of Aarhus, Århus C, Denmark
  3. Department of Statistics, Oxford University, Oxford, UK
