Skip to main content

Faster Subsequence and Don’t-Care Pattern Matching on Compressed Texts

  • Conference paper
Combinatorial Pattern Matching (CPM 2011)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6661))

Included in the following conference series:

Abstract

Subsequence pattern matching problems on compressed text were first considered by Cégielski et al. (Window Subsequence Problems for Compressed Texts, Proc. CSR 2006, LNCS 3967, pp. 127–136), where the principal problem is: given a string T represented as a straight line program (SLP) \(\mathcal{T}\) of size n, a string P of size m, compute the number of minimal subsequence occurrences of P in T. We present an O(nm) time algorithm for solving all variations of the problem introduced by Cégielski et al.. This improves the previous best known algorithm of Tiskin (Towards approximate matching in compressed strings: Local subsequence recognition, Proc. CSR 2011), which runs in O(nmlogm) time. We further show that our algorithms can be modified to solve a wider range of problems in the same O(nm) time complexity, and present the first matching algorithms for patterns containing VLDC (variable length don’t care) symbols, as well as for patterns containing FLDC (fixed length don’t care) symbols, on SLP compressed texts.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Baeza-Yates, R.A.: Searching subsequences. Theoretical Computer Science 78(2), 363–376 (1991)

    Article  MathSciNet  MATH  Google Scholar 

  2. Baturo, P., Rytter, W.: Compressed string-matching in standard sturmian words. Theoretical Computer Science 410(30–32), 2804–2810 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  3. Cégielski, P., Guessarian, I., Lifshits, Y., Matiyasevich, Y.: Window subsequence problems for compressed texts. In: Grigoriev, D., Harrison, J., Hirsch, E.A. (eds.) CSR 2006. LNCS, vol. 3967, pp. 127–136. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  4. Claude, F., Navarro, G.: Self-indexed text compression using straight-line programs. In: Královič, R., Niwiński, D. (eds.) MFCS 2009. LNCS, vol. 5734, pp. 235–246. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  5. Hermelin, D., Landau, G.M., Landau, S., Weimann, O.: A unified algorithm for accelerating edit-distance computation via text-compression. In: Proc. STACS 2009, pp. 529–540 (2009)

    Google Scholar 

  6. Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic Journal of Computing 4, 172–186 (1997)

    MathSciNet  MATH  Google Scholar 

  7. Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  8. Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Proc. Data Compression Conference 1999, pp. 296–305. IEEE Computer Society Press, Los Alamitos (1999)

    Google Scholar 

  9. Lifshits, Y., Lohrey, M.: Querying and embedding compressed texts. In: Královič, R., Urzyczyn, P. (eds.) MFCS 2006. LNCS, vol. 4162, pp. 681–692. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  10. Mannila, H., Toivonen, H., Verkamo, A.I.: Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery 1(3), 259–289 (1997)

    Article  Google Scholar 

  11. Miyazaki, M., Shinohara, A., Takeda, M.: An improved pattern matching algorithm for strings in terms of straight-line programs. In: CPM 1997. LNCS, vol. 1264, pp. 1–11. Springer, Heidelberg (1997)

    Chapter  Google Scholar 

  12. Nevill-Manning, C.G., Witten, I.H., Maulsby, D.L.: Compression by induction of hierarchical grammars. In: Data Compression Conference 1994, pp. 244–253. IEEE Computer Society Press, Los Alamitos (1994)

    Google Scholar 

  13. Rytter, W.: Grammar compression, LZ-encodings, and string algorithms with implicit input. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 15–27. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  14. Tiskin, A.: Faster subsequence recognition in compressed strings. J. Math. Sci. 158(5), 759–769 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  15. Tiskin, A.: Towards approximate matching in compressed strings: Local subsequence recognition. In: Proc. CSR 2011 (to appear, 2011)

    Google Scholar 

  16. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory IT-23(3), 337–349 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  17. Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M. (2011). Faster Subsequence and Don’t-Care Pattern Matching on Compressed Texts. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_27

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-21458-5_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21457-8

  • Online ISBN: 978-3-642-21458-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics