Skip to main content

String Matching with Variable Length Gaps

  • Conference paper
String Processing and Information Retrieval (SPIRE 2010)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6393))

Included in the following conference series:

Abstract

We consider string matching with variable length gaps. Given a string T and a pattern P consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in T that match P. This problem is a basic primitive in computational biology applications. Let m and n be the lengths of P and T, respectively, and let k be the number of strings in P. We present a new algorithm achieving time O((n + m)logk + α) and space O(m + A), where A is the sum of the lower bounds of the lengths of the gaps in P and α is the total number of occurrences of the strings in P within T. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of m, n, k, A, and α. Our algorithm is surprisingly simple and straightforward to implement.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  2. Bille, P.: New algorithms for regular expression matching. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4051, pp. 643–654. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  3. Bille, P., Thorup, M.: Faster regular expression matching. In: Proc. 36th ICALP, pp. 171–182 (2009)

    Google Scholar 

  4. Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proc. 21st SODA (2010)

    Google Scholar 

  5. Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proc. 2nd ISMB, pp. 53–61 (1994)

    Google Scholar 

  6. Crochemore, M., Iliopoulos, C., Makris, C., Rytter, W., Tsakalidis, A., Tsichlas, K.: Approximate string matching with gaps. Nordic J. of Computing 9(1), 54–65 (2002)

    MathSciNet  MATH  Google Scholar 

  7. Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008)

    Article  Google Scholar 

  8. Fredriksson, K., Grabowski, S.: Nested counters in bit-parallel string matching. In: Dediu, A.H., Ionescu, A.M., Martín-Vide, C. (eds.) LATA 2009. LNCS, vol. 5457, pp. 338–349. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  9. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The prosite database, its status in. Nucleic Acids Res. (27), 215–219 (1999)

    Google Scholar 

  10. Knuth, D.E., James, J., Morris, H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  11. Lee, I., Apostolico, A., Iliopoulos, C.S., Park, K.: Finding approximate occurrences of a pattern that contains gaps. In: Proc. 14th AWOCA, pp. 89–100 (2003)

    Google Scholar 

  12. Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Bio. 12(8), 1065–1082 (2005)

    Article  Google Scholar 

  13. Myers, E.W.: Approximate matching of network expressions with spacers. J. Comput. Bio. 3(1), 33–51 (1992)

    Article  Google Scholar 

  14. Myers, E.W.: A four-russian algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)

    Article  MathSciNet  MATH  Google Scholar 

  15. Myers, G., Mehldau, G.: A system for pattern matching applications on biosequences. CABIOS 9(3), 299–314 (1993)

    Google Scholar 

  16. Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Bio. 10(6), 903–923 (2003)

    Article  Google Scholar 

  17. Navarro, G., Raffinot, M.: New techniques for regular expression searching. Algorithmica 41(2), 89–116 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  18. Rahman, M.S., Iliopoulos, C.S., Lee, I., Mohamed, M., Smyth, W.F.: Finding patterns with variable length gaps or don’t cares. In: Chen, D.Z., Lee, D.T. (eds.) COCOON 2006. LNCS, vol. 4112, pp. 146–155. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  19. Thompson, K.: Regular expression search algorithm. Commun. ACM 11, 419–422 (1968)

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bille, P., Li Gørtz, I., Vildhøj, H.W., Wind, D.K. (2010). String Matching with Variable Length Gaps. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_40

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-16321-0_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16320-3

  • Online ISBN: 978-3-642-16321-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics