Cross-Document Pattern Matching

  • Gregory Kucherov
  • Yakov Nekrich
  • Tatiana Starikovskaya
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7354)

Abstract

We study a new variant of the string matching problem, called cross-document string matching: index a collection of documents so as to support efficient search for a pattern in a selected document, where the pattern itself is a substring of another document in the collection. Several variants of this problem are considered, and we propose efficient linear-space solutions whose query times either do not depend on the pattern length at all or depend on it only doubly logarithmically. As a side result, we propose an improved solution to the weighted level ancestor problem.
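To make the query semantics concrete, here is a minimal Python sketch of a naive baseline. It is not the paper's index. It assumes that documents are plain strings, that the pattern is given as T_i[s..e] with 0-based inclusive bounds, and that the decision, counting and reporting variants mean what their names suggest; the class and method names are hypothetical. Each query simply scans T_j, so its cost grows with both |T_j| and the pattern length, whereas the bounds stated above are independent of the pattern length or only doubly logarithmic in it.

```python
from typing import List


class NaiveCrossDocumentMatcher:
    """Hypothetical naive baseline for cross-document pattern matching.

    Documents are stored as plain strings and every query scans the target
    document T_j directly. This only pins down what a query asks; it is not
    the linear-space index described in the abstract.
    """

    def __init__(self, documents: List[str]) -> None:
        self.docs = list(documents)

    def pattern(self, i: int, s: int, e: int) -> str:
        """Return the pattern T_i[s..e] (0-based, inclusive bounds)."""
        if not 0 <= s <= e < len(self.docs[i]):
            raise ValueError("need 0 <= s <= e < len(T_i)")
        return self.docs[i][s:e + 1]

    def report(self, i: int, s: int, e: int, j: int) -> List[int]:
        """All starting positions of T_i[s..e] in T_j, overlaps included."""
        p = self.pattern(i, s, e)
        t = self.docs[j]
        positions = []
        pos = t.find(p)
        while pos != -1:
            positions.append(pos)
            # Restart one character later so overlapping occurrences are kept.
            pos = t.find(p, pos + 1)
        return positions

    def count(self, i: int, s: int, e: int, j: int) -> int:
        """Counting variant: number of occurrences of T_i[s..e] in T_j."""
        return len(self.report(i, s, e, j))

    def exists(self, i: int, s: int, e: int, j: int) -> bool:
        """Decision variant: does T_i[s..e] occur in T_j at all?"""
        return self.pattern(i, s, e) in self.docs[j]


if __name__ == "__main__":
    m = NaiveCrossDocumentMatcher(["abracadabra", "cadabra", "abcabcab"])
    print(m.pattern(0, 4, 6))    # cad
    print(m.report(0, 4, 6, 1))  # [0]
    print(m.report(0, 0, 1, 2))  # [0, 3, 6]
    print(m.exists(0, 4, 6, 2))  # False
```

For example, with documents "abracadabra", "cadabra" and "abcabcab", the query (i=0, s=4, e=6, j=1) asks for "cad" in "cadabra" and reports position 0. An index for this problem must answer the same questions without rescanning T_j for every query.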

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Gregory Kucherov (1)
  • Yakov Nekrich (2)
  • Tatiana Starikovskaya (3, 1)

  1. Laboratoire d’Informatique Gaspard Monge, Université Paris-Est & CNRS, Paris, France
  2. Department of Computer Science, University of Chile, Santiago, Chile
  3. Lomonosov Moscow State University, Moscow, Russia
