String matching in hypertext

  • Kunsoo Park
  • Dong Kyue Kim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 937)

Abstract

In this paper we consider the string matching problem in hypertext, which is a nonlinear structure of text. We model the hypertext as a directed graph G = (V, E), where each node v ∈ V has text T_v associated with it and each link (v, w) ∈ E connects the end of text T_v to the start of text T_w. We define the string matching problem in hypertext as follows: Given a graph G modeling a hypertext and a pattern P, find all occurrences of the pattern in graph G. The pattern length is m and the sum of the lengths of all texts T_v in G is N. The main difficulty in the hypertext string matching problem is that the pattern may occur across links.
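To make the problem statement concrete, the following is a minimal brute-force sketch of the model, not the algorithm of the paper. It assumes every text T_v is nonempty (so the recursion terminates even on cyclic graphs) and reports an occurrence as a pair (node, offset) of its starting position; the names `find_occurrences`, `texts`, and `links` are illustrative only. Its running time can be far from linear, which is the cost the paper's algorithms aim to avoid.

```python
from typing import Dict, List, Set, Tuple


def find_occurrences(texts: Dict[str, str],
                     links: Dict[str, List[str]],
                     pattern: str) -> Set[Tuple[str, int]]:
    """Return every (node, offset) at which an occurrence of pattern starts.

    texts maps a node v to its text T_v (assumed nonempty).
    links maps a node v to the nodes w with a link (v, w).
    """
    m = len(pattern)

    def matches_from(node: str, pos: int, k: int) -> bool:
        # Try to match pattern[k:] against texts[node][pos:].
        text = texts[node]
        while k < m and pos < len(text):
            if text[pos] != pattern[k]:
                return False
            pos += 1
            k += 1
        if k == m:
            return True
        # The text of this node is exhausted: a link joins its end to the
        # start of the next text, so continue at offset 0 of each successor.
        return any(matches_from(w, 0, k) for w in links.get(node, []))

    return {(v, i)
            for v, text in texts.items()
            for i in range(len(text))
            if matches_from(v, i, 0)}
```

For example, with texts `{"a": "hyper", "b": "text"}` and the single link from `"a"` to `"b"`, the pattern `"rt"` occurs only across the link, starting at offset 4 of node `"a"`.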

There is a linear time algorithm for the case when graph G is a tree. In this paper we present a linear O(N + |E|) time algorithm when n_v = length(T_v) is larger than or equal to m for all v, and a more involved algorithm that takes O(N + |E|m) time when there exist some nodes v with n_v < m. To obtain the results, we combine the notion of witnesses and duels with the suffix tree, which enables us to eliminate possible occurrences of any substring of the pattern.
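As background for the witness and duel terminology, here is a small sketch of the classical notion on an ordinary linear text. It is an assumption-laden illustration: it does not involve the suffix tree and does not reproduce how the paper combines the two. The witness table is built naively in O(m^2) time for clarity, and the function names are hypothetical. A witness for shift i is a position w with P[w] ≠ P[w + i]; a duel between two candidate starts a < b with b - a = i inspects the single text character at b + w and eliminates at least one candidate.

```python
from typing import List, Optional


def witness_table(p: str) -> List[Optional[int]]:
    """wit[i] is some w with p[w] != p[w + i], or None if i is a period of p."""
    m = len(p)
    wit: List[Optional[int]] = [None] * m
    for i in range(1, m):
        for w in range(m - i):
            if p[w] != p[w + i]:
                wit[i] = w
                break
    return wit


def duel(text: str, p: str, wit: List[Optional[int]],
         a: int, b: int) -> Optional[int]:
    """Duel between candidate starts a < b, assuming 0 < b - a < len(p)
    and b + len(p) <= len(text).

    Returns the candidate that survives, or None when the shift b - a is a
    period of p, in which case no witness exists and neither is eliminated.
    A survivor is not thereby confirmed as an occurrence.
    """
    w = wit[b - a]
    if w is None:
        return None
    # An occurrence at b needs text[b + w] == p[w]; an occurrence at a needs
    # text[b + w] == p[w + (b - a)], which differs from p[w]. One comparison
    # therefore rules out at least one of the two candidates.
    return b if text[b + w] == p[w] else a
```

With pattern `"abab"` and text `"aababab"`, a duel between candidates 0 and 1 leaves 1, while candidates 1 and 3 differ by a period of the pattern and both survive.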

Keywords

Source Node, Parent Node, String Match, Suffix Tree, Pattern Occurrence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Kunsoo Park (1)
  • Dong Kyue Kim (1)
  1. Department of Computer Engineering, Seoul National University, Seoul, Korea
