From regular expression matching to parsing

  • Original Article
Acta Informatica

Abstract

Given a regular expression R and a string Q, the regular expression parsing problem is to determine whether Q matches R and, if so, how it matches, i.e., to produce a mapping of the characters of Q to the characters in R. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP addresses in internet traffic analysis or extracting subparts of genomes from genetic databases. We present a new general technique for efficiently converting a large class of algorithms that determine if a string Q matches a regular expression R into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear-space solutions for regular expression parsing.
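To make the problem statement concrete, the following is a minimal sketch of a naive parser, not the technique of the article. It assumes a hypothetical tuple encoding of R (`"chr"`, `"cat"`, `"alt"`, `"star"`), numbers the character occurrences of R from left to right (a position automaton in the style of Glushkov [12]), and stores one layer of back-pointers per character of Q. The function names `glushkov` and `regex_parse` are illustrative. Because it keeps all layers, the sketch uses space proportional to |Q| times |R| in the worst case, which is exactly the cost that linear-space solutions avoid.

```python
def glushkov(node, chars, follow):
    """Return (nullable, first, last) for `node`.

    Side effects: appends each character occurrence of the expression to
    `chars` (its index is its position) and fills `follow`, which maps a
    position to the set of positions that may come directly after it.
    """
    kind = node[0]
    if kind == "chr":
        p = len(chars)
        chars.append(node[1])
        follow[p] = set()
        return False, {p}, {p}
    if kind == "star":
        _, first, last = glushkov(node[1], chars, follow)
        for p in last:
            follow[p] |= first          # loop back for another iteration
        return True, first, last
    n1, f1, l1 = glushkov(node[1], chars, follow)
    n2, f2, l2 = glushkov(node[2], chars, follow)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    # kind == "cat": the end of the left part may continue into the right part
    for p in l1:
        follow[p] |= f2
    first = f1 | f2 if n1 else f1
    last = l1 | l2 if n2 else l2
    return n1 and n2, first, last


def regex_parse(regex, q):
    """Return a list mapping each character of q to a position in the
    expression, or None if q does not match. Ambiguous expressions may
    admit several mappings; one of them is returned."""
    chars, follow = [], {}
    nullable, first, last = glushkov(regex, chars, follow)
    if not q:
        return [] if nullable else None
    # layers[i]: positions reachable after reading q[:i+1], each mapped
    # to one predecessor position (None for the first character).
    layers = [{p: None for p in first if chars[p] == q[0]}]
    for ch in q[1:]:
        cur = {}
        for p in layers[-1]:
            for s in follow[p]:
                if chars[s] == ch:
                    cur.setdefault(s, p)
        layers.append(cur)
    end = next((p for p in layers[-1] if p in last), None)
    if end is None:
        return None
    # Walk the back-pointers from the last character to the first.
    mapping = [end]
    for i in range(len(q) - 1, 0, -1):
        mapping.append(layers[i][mapping[-1]])
    return mapping[::-1]


# R = (a|b)*c with positions a=0, b=1, c=2
R = ("cat", ("star", ("alt", ("chr", "a"), ("chr", "b"))), ("chr", "c"))
print(regex_parse(R, "abac"))   # [0, 1, 0, 2]
print(regex_parse(R, "abca"))   # None
```

In the first call, each character of Q = abac is mapped to the occurrence in R that it matches; in the second, Q does not match R, so no mapping exists.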

Notes

  1. Sometimes NFAs are allowed a set of accepting states, but this is not necessary for our purposes.

  2. Note that the time bound in the original paper has an additional \(m \log m\) term [4]. Using atomic heaps [9] to represent dictionaries for micro-TNFAs, this term is straightforward to improve to \(O(m)\). See also Bille and Thorup [6, Appendix A].

References

  1. Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston (1986)

  2. Backurs, A., Indyk, P.: Which regular expression patterns are hard to match? In: Proc. 57th FOCS, pp. 457–466 (2016)

  3. Bille, P.: New algorithms for regular expression matching. In: Proc. of the 33rd ICALP, pp. 643–654 (2006)

  4. Bille, P., Farach-Colton, M.: Fast and compact regular expression matching. Theor. Comput. Sci. 409(3), 486–496 (2008)

  5. Bille, P., Gørtz, I.L.: From regular expression matching to parsing. In: Proc. 44th MFCS (2019)

  6. Bille, P., Thorup, M.: Faster regular expression matching. In: Proc. 36th ICALP, pp. 171–182 (2009). Full version with appendix available at http://www2.compute.dtu.dk/~phbi/files/publications/2009fremC.pdf

  7. Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proc. 21st SODA, pp. 1297–1308 (2010)

  8. Dubé, D., Feeley, M.: Efficiently building a parse tree from a regular expression. Acta Inform. 37(2), 121–144 (2000)

  9. Fredman, M.L., Willard, D.E.: Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. Syst. Sci. 48(3), 533–551 (1994)

  10. Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Proc. 31st ICALP, vol. 3142, pp. 618–629 (2004)

  11. Garofalakis, M.N., Rastogi, R., Shim, K.: SPIRIT: sequential pattern mining with regular expression constraints. In: Proc. 25th VLDB, pp. 223–234 (1999)

  12. Glushkov, V.M.: The abstract theory of automata. Russ. Math. Surv. 16(5), 1–53 (1961)

  13. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975)

  14. Johnson, T., Muthukrishnan, S., Rozenbaum, I.: Monitoring regular expressions on out-of-order streams. In: Proc. 23rd ICDE, pp. 1315–1319 (2007)

  15. Jordan, C.: Sur les assemblages des lignes. J. Reine Angew. Math. 70, 185–190 (1869)

  16. Kearns, S.M.: Extending regular expressions with context operators and parse extraction. Softw. Pract. Exp. 21(8), 787–804 (1991)

  17. Kin, K., Hartmann, B., DeRose, T., Agrawala, M.: Proton: multitouch gestures as regular expressions. In: Proc. SIGCHI, pp. 2885–2894 (2012)

  18. Kumar, S., Dharmapurikar, S., Yu, F., Crowley, P., Turner, J.: Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In: Proc. SIGCOMM, pp. 339–350 (2006)

  19. Laurikari, V.: NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In: Proc. 7th SPIRE, pp. 181–187 (2000)

  20. Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. In: Proc. 27th VLDB, pp. 361–370 (2001)

  21. McNaughton, R., Yamada, H.: Regular expressions and state graphs for automata. IRE Trans. Electron. Comput. 9(1), 39–47 (1960)

  22. Murata, M.: Extended path expressions of XML. In: Proc. 20th PODS, pp. 126–137 (2001)

  23. Myers, E.W.: A four-Russian algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)

  24. Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)

  25. Nielsen, L., Henglein, F.: Bit-coded regular expression parsing. In: Proc. 5th LATA, pp. 402–413 (2011)

  26. Sulzmann, M., Lu, K.Z.M.: Regular expression sub-matching using partial derivatives. In: Proc. 14th PPDP, pp. 79–90 (2012)

  27. Thompson, K.: Regular expression search algorithm. Commun. ACM 11, 419–422 (1968)

  28. Yu, F., Chen, Z., Diao, Y., Lakshman, T.V., Katz, R.H.: Fast and memory-efficient regular expression matching for deep packet inspection. In: Proc. ANCS, pp. 93–102 (2006)

Acknowledgements

We thank the anonymous reviewers whose comments and suggestions significantly improved the presentation of the paper.

Funding

Funding was provided by Teknologi og Produktion, Det Frie Forskningsråd (Grant No. 4005-00267) and Natur og Univers, Det Frie Forskningsråd (DK) (Grant No. 1323-00178).

Author information

Corresponding author

Correspondence to Philip Bille.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

An extended abstract appeared at the 44th International Symposium on Mathematical Foundations of Computer Science (MFCS) [5].

About this article

Cite this article

Bille, P., Gørtz, I.L. From regular expression matching to parsing. Acta Informatica 59, 709–724 (2022). https://doi.org/10.1007/s00236-022-00420-6

  • DOI: https://doi.org/10.1007/s00236-022-00420-6
