Circular Pattern Matching with k Mismatches

  • Conference paper
Fundamentals of Computation Theory (FCT 2019)

Abstract

The k-mismatch problem consists in computing the Hamming distance between a pattern P of length m and every length-m substring of a text T of length n, if this distance is no more than k. In many real-world applications, any cyclic shift of P is a relevant pattern, and thus one is interested in computing the minimal distance between every length-m substring of T and any cyclic shift of P. This is the circular pattern matching with k mismatches (k-CPM) problem. A multitude of papers have been devoted to solving this problem but, to the best of our knowledge, only average-case upper bounds are known. In this paper, we present the first non-trivial worst-case upper bounds for the k-CPM problem. Specifically, we show an \(\mathcal{O}(nk)\)-time algorithm and an \(\mathcal{O}(n+\frac{n}{m}k^5)\)-time algorithm. The latter algorithm extends a technique that was very recently developed for the k-mismatch problem [Bringmann et al., SODA 2019].
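To make the two problem definitions concrete, here is a minimal brute-force sketch in Python. It is not either of the algorithms described in the paper: it simply compares every cyclic shift of P against every length-m window of T, which takes \(\mathcal{O}(nm^2)\) time in the worst case. The function names `k_mismatch` and `k_cpm`, and the choice to report results as a dictionary from starting position to distance, are hypothetical and made only for illustration.

```python
from typing import Dict


def k_mismatch(pattern: str, text: str, k: int) -> Dict[int, int]:
    """Naive k-mismatch (illustrative sketch): map each start i to the Hamming
    distance between pattern and text[i:i+m], keeping only distances <= k."""
    m, n = len(pattern), len(text)
    result: Dict[int, int] = {}
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[i:i + m]):
            if a != b:
                mismatches += 1
                if mismatches > k:  # stop early: this window cannot qualify
                    break
        if mismatches <= k:
            result[i] = mismatches
    return result


def k_cpm(pattern: str, text: str, k: int) -> Dict[int, int]:
    """Naive k-CPM (illustrative sketch): map each start i to the minimum
    Hamming distance between text[i:i+m] and any cyclic shift of pattern,
    keeping only values <= k."""
    m = len(pattern)
    best: Dict[int, int] = {}
    for x in range(m):
        rotation = pattern[x:] + pattern[:x]  # cyclic shift of P by x positions
        for i, d in k_mismatch(rotation, text, k).items():
            if d < best.get(i, k + 1):
                best[i] = d
    return dict(sorted(best.items()))


print(k_cpm("abc", "cabxbca", 1))  # {0: 0, 1: 1, 3: 1, 4: 0}
```

In this example, the windows `cab` and `bca` match rotations of P exactly, `abx` and `xbc` are each one mismatch away from `abc`, and `bxb` is excluded because even its closest rotation has two mismatches.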

P. Charalampopoulos—Supported by a Studentship from the Faculty of Natural and Mathematical Sciences at King’s College London and an A. G. Leventis Foundation Educational Grant.

T. Kociumaka—Supported by ISF grants no. 824/17 and 1278/16 and by an ERC grant MPM under the EU’s Horizon 2020 Research and Innovation Programme (grant no. 683064).

J. Radoszewski and J. Straszyński—Supported by the “Algorithms for text processing with errors and uncertainties” project carried out within the HOMING program of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.

Notes

  1. The modulo operation is used to handle the trivial rotation with \(x=0\).

References

  1. Abrahamson, K.R.: Generalized string matching. SIAM J. Comput. 16(6), 1039–1051 (1987). https://doi.org/10.1137/0216067

  2. Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with \(k\) mismatches. J. Algorithms 50(2), 257–275 (2004). https://doi.org/10.1016/S0196-6774(03)00097-X

  3. Ayad, L.A.K., Barton, C., Pissis, S.P.: A faster and more accurate heuristic for cyclic edit distance computation. Pattern Recognit. Lett. 88, 81–87 (2017). https://doi.org/10.1016/j.patrec.2017.01.018

  4. Ayad, L.A.K., Pissis, S.P.: MARS: improving multiple circular sequence alignment using refined sequences. BMC Genomics 18(1), 86 (2017). https://doi.org/10.1186/s12864-016-3477-5

  5. Azim, M.A.R., Iliopoulos, C.S., Rahman, M.S., Samiruzzaman, M.: A filter-based approach for approximate circular pattern matching. In: Harrison, R., Li, Y., Măndoiu, I. (eds.) ISBRA 2015. LNCS, vol. 9096, pp. 24–35. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19048-8_3

  6. Azim, M.A.R., Iliopoulos, C.S., Rahman, M.S., Samiruzzaman, M.: A fast and lightweight filter-based algorithm for circular pattern matching. In: Baldi, P., Wang, W. (eds.) 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2014, pp. 621–622. ACM (2014). https://doi.org/10.1145/2649387.2660804

  7. Barton, C., Iliopoulos, C.S., Kundu, R., Pissis, S.P., Retha, A., Vayani, F.: Accurate and efficient methods to improve multiple circular sequence alignment. In: Bampis, E. (ed.) SEA 2015. LNCS, vol. 9125, pp. 247–258. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20086-6_19

  8. Barton, C., Iliopoulos, C.S., Pissis, S.P.: Fast algorithms for approximate circular string matching. Algorithms Mol. Biol. 9, 9 (2014). https://doi.org/10.1186/1748-7188-9-9

  9. Barton, C., Iliopoulos, C.S., Pissis, S.P.: Average-case optimal approximate circular string matching. In: Dediu, A.-H., Formenti, E., Martín-Vide, C., Truthe, B. (eds.) LATA 2015. LNCS, vol. 8977, pp. 85–96. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15579-1_6

  10. Bille, P., Fagerberg, R., Gørtz, I.L.: Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts. ACM Trans. Algorithms 6(1), 3:1–3:14 (2009). https://doi.org/10.1145/1644015.1644018

  11. Bringmann, K., Wellnitz, P., Künnemann, M.: Few matches or almost periodicity: faster pattern matching with mismatches in compressed texts. In: Chan, T.M. (ed.) 30th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pp. 1126–1145. SIAM (2019). https://doi.org/10.1137/1.9781611975482.69

  12. Chang, W.I., Marr, T.G.: Approximate string matching and local similarity. In: Crochemore, M., Gusfield, D. (eds.) CPM 1994. LNCS, vol. 807, pp. 259–273. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58094-8_23

  13. Clifford, R., Fontaine, A., Porat, E., Sach, B., Starikovskaya, T.: The \(k\)-mismatch problem revisited. In: Krauthgamer, R. (ed.) 27th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, pp. 2039–2052. SIAM (2016). https://doi.org/10.1137/1.9781611974331.ch142

  14. Clifford, R., Kociumaka, T., Porat, E.: The streaming \(k\)-mismatch problem. In: Chan, T.M. (ed.) 30th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pp. 1106–1125. SIAM (2019). https://doi.org/10.1137/1.9781611975482.68

  15. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007). https://doi.org/10.1017/cbo9780511546853

  16. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(\mathcal{O}(1)\) worst case access time. J. ACM 31(3), 538–544 (1984). https://doi.org/10.1145/828.1884

  17. Fredriksson, K., Navarro, G.: Average-optimal single and multiple approximate string matching. ACM J. Exp. Algorithmics 9(1.4), 1–47 (2004). https://doi.org/10.1145/1005813.1041513

  18. Galil, Z., Giancarlo, R.: Parallel string matching with \(k\) mismatches. Theor. Comput. Sci. 51, 341–348 (1987). https://doi.org/10.1016/0304-3975(87)90042-9

  19. Gawrychowski, P., Straszak, D.: Beating \(\mathcal{O}(nm)\) in approximate LZW-compressed pattern matching. In: Cai, L., Cheng, S.-W., Lam, T.-W. (eds.) ISAAC 2013. LNCS, vol. 8283, pp. 78–88. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45030-3_8

  20. Gawrychowski, P., Uznański, P.: Order-preserving pattern matching with \(k\) mismatches. Theor. Comput. Sci. 638, 136–144 (2016). https://doi.org/10.1016/j.tcs.2015.08.022

  21. Gawrychowski, P., Uznański, P.: Towards unified approximate pattern matching for Hamming and \({L}_1\) distance. In: Chatzigiannakis, I., Kaklamanis, C., Marx, D., Sannella, D. (eds.) Automata, Languages, and Programming, ICALP 2018. LIPIcs, vol. 107, pp. 62:1–62:13. Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2018). https://doi.org/10.4230/LIPIcs.ICALP.2018.62

  22. Grossi, R., Iliopoulos, C.S., Mercas, R., Pisanti, N., Pissis, S.P., Retha, A., Vayani, F.: Circular sequence comparison: algorithms and applications. Algorithms Mol. Biol. 11, 12 (2016). https://doi.org/10.1186/s13015-016-0076-6

  23. Hazay, C., Lewenstein, M., Sokol, D.: Approximate parameterized matching. ACM Trans. Algorithms 3(3), 29 (2007). https://doi.org/10.1145/1273340.1273345

  24. Hirvola, T., Tarhio, J.: Bit-parallel approximate matching of circular strings with k mismatches. ACM J. Exp. Algorithmics 22, 1–5 (2017). https://doi.org/10.1145/3129536

  25. Iliopoulos, C.S., Pissis, S.P., Rahman, M.S.: Searching and indexing circular patterns. In: Elloumi, M. (ed.) Algorithms for Next-Generation Sequencing Data, pp. 77–90. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59826-0_3

  26. Kociumaka, T.: Efficient data structures for internal queries in texts. Ph.D. thesis, University of Warsaw, October 2018. https://www.mimuw.edu.pl/~kociumaka/files/phd.pdf

  27. Kociumaka, T., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Efficient representation and counting of antipower factors in words (2018). http://arxiv.org/abs/1812.08101

  28. Kociumaka, T., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W.: Efficient representation and counting of antipower factors in words. In: Martín-Vide, C., Okhotin, A., Shapira, D. (eds.) LATA 2019. LNCS, vol. 11417, pp. 421–433. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13435-8_31

  29. Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in a text and applications. In: Indyk, P. (ed.) 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pp. 532–551. SIAM (2015). https://doi.org/10.1137/1.9781611973730.36

  30. Kosaraju, S.: Efficient string matching (1987, manuscript)

  31. Landau, G.M., Vishkin, U.: Efficient string matching with \(k\) mismatches. Theor. Comput. Sci. 43, 239–249 (1986). https://doi.org/10.1016/0304-3975(86)90178-7

  32. Palazón-González, V., Marzal, A.: On the dynamic time warping of cyclic sequences for shape retrieval. Image Vision Comput. 30(12), 978–990 (2012). https://doi.org/10.1016/j.imavis.2012.08.012

  33. Palazón-González, V., Marzal, A.: Speeding up the cyclic edit distance using LAESA with early abandon. Pattern Recognit. Lett. 62, 1–7 (2015). https://doi.org/10.1016/j.patrec.2015.04.013

  34. Palazón-González, V., Marzal, A., Vilar, J.M.: On hidden Markov models and cyclic strings for shape recognition. Pattern Recognit. 47(7), 2490–2504 (2014). https://doi.org/10.1016/j.patcog.2014.01.018

  35. Porat, B., Porat, E.: Exact and approximate pattern matching in the streaming model. In: 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, pp. 315–323. IEEE Computer Society (2009). https://doi.org/10.1109/FOCS.2009.11

  36. Ružić, M.: Constructing efficient dictionaries in close to sorting time. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008. LNCS, vol. 5125, pp. 84–95. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70575-8_8

  37. Tiskin, A.: Threshold approximate matching in grammar-compressed strings. In: Holub, J., Zdárek, J. (eds.) Prague Stringology Conference 2014, PSC 2014, pp. 124–138. Department of Theoretical Computer Science, Faculty of Information Technology, Czech Technical University in Prague (2014). http://www.stringology.org/event/2014/p12.html

Author information

Correspondence to Wiktor Zuba.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Charalampopoulos, P. et al. (2019). Circular Pattern Matching with k Mismatches. In: Gąsieniec, L., Jansson, J., Levcopoulos, C. (eds) Fundamentals of Computation Theory. FCT 2019. Lecture Notes in Computer Science, vol 11651. Springer, Cham. https://doi.org/10.1007/978-3-030-25027-0_15

  • DOI: https://doi.org/10.1007/978-3-030-25027-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-25026-3

  • Online ISBN: 978-3-030-25027-0

  • eBook Packages: Computer Science, Computer Science (R0)