String matching in hypertext
In this paper we consider the string matching problem in hypertext, a nonlinear text structure. We model the hypertext as a directed graph G = (V, E), where each node v ∈ V has an associated text T_v and each link (v, w) ∈ E connects the end of text T_v to the start of text T_w. We define the string matching problem in hypertext as follows: given a graph G modeling a hypertext and a pattern P, find all occurrences of the pattern in G. The pattern length is m, and the sum of the lengths of all texts T_v in G is N. The main difficulty in hypertext string matching is that an occurrence of the pattern may span links.
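To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not the algorithm of this paper. It assumes the hypertext is given as a dictionary `texts` mapping each node to its text and a dictionary `links` mapping each node to its successors; these names, and the function `find_occurrences`, are hypothetical and chosen only for illustration. An occurrence is reported as the pair (node, offset) at which it starts, and it may continue across any outgoing link once the end of a node's text is reached.

```python
def find_occurrences(texts, links, pattern):
    """Brute-force hypertext matching (illustrative only, not linear time).

    texts:   dict node -> string (the text T_v of each node)
    links:   dict node -> iterable of successor nodes
    pattern: non-empty string P
    Returns a sorted list of (node, offset) pairs where an occurrence starts.
    """
    m = len(pattern)

    def matches_from(v, i, j, seen):
        # Try to match pattern[j:] starting at offset i of texts[v].
        t = texts[v]
        while j < m and i < len(t):
            if t[i] != pattern[j]:
                return False
            i += 1
            j += 1
        if j == m:
            return True
        # The text of v is exhausted: continue at the start of each successor.
        for w in links.get(v, ()):
            # The outcome from (w, j) is fixed, so each state is tried once.
            # This also prevents looping on cycles of empty texts.
            if (w, j) in seen:
                continue
            seen.add((w, j))
            if matches_from(w, 0, j, seen):
                return True
        return False

    result = []
    for v, t in texts.items():
        for i in range(len(t)):
            if matches_from(v, i, 0, set()):
                result.append((v, i))
    return sorted(result)


# A small hypothetical hypertext with three nodes.
texts = {"a": "xxab", "b": "cab", "c": "cd"}
links = {"a": ["b", "c"], "b": ["a"]}
print(find_occurrences(texts, links, "abc"))
```

In this example the pattern "abc" starts at offset 2 of node "a" and ends in a successor node, which is exactly the kind of occurrence a per-node search would miss. The sketch may revisit the same text many times, so its running time is far from the bounds stated in the abstract.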
There is a linear time algorithm for the case when graph G is a tree. In this paper we present a linear O(N + |E|) time algorithm when n_v = length(T_v) is larger than or equal to m for all v, and a more involved algorithm that takes O(N + |E|m) time when there exist some nodes v with n_v < m. To obtain the results, we combine the notion of witnesses and duels with the suffix tree, which enables us to eliminate possible occurrences of any substring of the pattern.
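The abstract does not spell out how witnesses and duels work, so the following Python sketch illustrates the standard notions for a single linear text only. It is an assumption-laden illustration, not the paper's hypertext algorithm: the witness table is computed naively in quadratic time, no suffix tree is involved, and the function names `witnesses` and `duel` are hypothetical. A witness for shift d is a position k with P[k] ≠ P[k + d]; a duel between two candidate positions i < j at distance d inspects the single text character at j + k and rules out one of the two candidates.

```python
def witnesses(pattern):
    """Naive O(m^2) witness table (illustrative only).

    wit[d] is some k with pattern[k] != pattern[k + d], or None if no such k
    exists, i.e. d is a period of the pattern. wit[0] is always None.
    """
    m = len(pattern)
    wit = [None] * m
    for d in range(1, m):
        for k in range(m - d):
            if pattern[k] != pattern[k + d]:
                wit[d] = k
                break
    return wit


def duel(text, pattern, wit, i, j):
    """Duel between candidate start positions i < j with j - i < len(pattern).

    Assumes i + len(pattern) <= len(text), so position j + k lies in the text.
    Returns the surviving candidate (the other one cannot be an occurrence),
    or None if no witness exists for the shift j - i and both may occur.
    The survivor is not guaranteed to be an occurrence; it is merely not
    excluded by this comparison.
    """
    d = j - i
    k = wit[d]
    if k is None:
        return None
    # An occurrence at j needs text[j + k] == pattern[k];
    # an occurrence at i needs text[j + k] == pattern[k + d].
    # These two pattern characters differ, so at most one candidate survives.
    return j if text[j + k] == pattern[k] else i


wit = witnesses("aab")
print(wit)
print(duel("aaab", "aab", wit, 0, 1))
```

Each duel costs one character comparison, which is what makes the technique attractive for eliminating candidate occurrences quickly; how the paper applies it to substrings of the pattern via the suffix tree is not reproduced here.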
Keywords: Source Node, Parent Node, String Match, Suffix Tree, Pattern Occurrence