Abstract
Given a regular expression R and a string Q, the regular expression parsing problem is to determine whether Q matches R and, if so, how it matches, i.e., to produce a mapping of the characters of Q to the characters of R. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP addresses in internet traffic analysis or subparts of genomes from genetic databases. We present a new general technique for efficiently converting a large class of algorithms that determine whether a string Q matches a regular expression R into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear-space solutions for regular expression parsing.
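To make the difference between matching and parsing concrete, here is a minimal Python sketch. It assumes a toy syntax with literal characters, concatenation, `|`, `*` and parentheses, and no escapes. It builds Glushkov's position automaton, simulates it on Q while recording for every character of Q the position it was reached from, and then backtracks to report, for each character of Q, the index of the character of R it matches. The function names `parse_regex`, `glushkov` and `match_mapping` are hypothetical. Storing predecessors for every step takes O(|Q|·|R|) space, so this is the straightforward approach and not the linear-space technique of the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy syntax: literal characters, concatenation, "|", "*" and
# parentheses, no escapes. AST nodes are tuples:
#   ("eps",)  ("chr", c, index_in_R)  ("cat", a, b)  ("alt", a, b)  ("star", a)


def parse_regex(r: str):
    pos = 0
    def peek() -> Optional[str]:
        return r[pos] if pos < len(r) else None
    def alt():
        nonlocal pos
        node = cat()
        while peek() == "|":
            pos += 1
            node = ("alt", node, cat())
        return node
    def cat():
        node = ("eps",)
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, star())
        return node
    def star():
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node
    def atom():
        nonlocal pos
        if peek() == "(":
            pos += 1
            node = alt()
            if peek() != ")":
                raise ValueError("unbalanced parenthesis")
            pos += 1
            return node
        pos += 1
        return ("chr", r[pos - 1], pos - 1)  # a position is the character's index in R
    tree = alt()
    if pos != len(r):
        raise ValueError("unexpected ')'")
    return tree


def glushkov(node, follow: Dict[int, Set[int]]) -> Tuple[bool, Set[int], Set[int]]:
    # Returns (nullable, first, last) for node and fills in the follow sets.
    kind = node[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "chr":
        follow.setdefault(node[2], set())
        return False, {node[2]}, {node[2]}
    if kind == "star":
        _, first, last = glushkov(node[1], follow)
        for p in last:  # loop back: a last position may be followed by a first one
            follow[p] |= first
        return True, first, last
    n1, f1, l1 = glushkov(node[1], follow)
    n2, f2, l2 = glushkov(node[2], follow)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:  # concatenation: last of left is followed by first of right
        follow[p] |= f2
    return n1 and n2, f1 | f2 if n1 else f1, l2 | l1 if n2 else l2


def match_mapping(r: str, q: str) -> Optional[List[int]]:
    # For each character of q, the index in r it matches, or None if no match.
    follow: Dict[int, Set[int]] = {}
    nullable, first, last = glushkov(parse_regex(r), follow)
    if not q:
        return [] if nullable else None
    START = -1  # virtual initial state
    parents: List[Dict[int, int]] = []  # parents[i][p]: state that reached p on q[i]
    active: Set[int] = {START}
    for ch in q:
        step: Dict[int, int] = {}
        for s in active:
            for p in (first if s == START else follow[s]):
                if r[p] == ch and p not in step:
                    step[p] = s
        if not step:
            return None
        parents.append(step)
        active = set(step)
    ends = [p for p in active if p in last]
    if not ends:
        return None
    mapping = [0] * len(q)
    p = min(ends)
    for i in range(len(q) - 1, -1, -1):  # backtrack through recorded predecessors
        mapping[i] = p
        p = parents[i][p]
    return mapping
```

For example, `match_mapping("a(b|c)*d", "abcd")` returns `[0, 2, 4, 7]`: the `b` of Q maps to index 2 of R and the `c` to index 4. When R is ambiguous, the sketch returns an arbitrary valid mapping rather than one selected by a greedy or POSIX disambiguation rule.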
Notes
Sometimes NFAs are allowed a set of accepting states, but this is not necessary for our purposes.
References
Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston (1986)
Backurs, A., Indyk, P.: Which regular expression patterns are hard to match? In: Proc. 57th FOCS, pp. 457–466 (2016)
Bille, P.: New algorithms for regular expression matching. In: Proc. of the 33rd ICALP, pp. 643–654 (2006)
Bille, P., Farach-Colton, M.: Fast and compact regular expression matching. Theor. Comput. Sci. 409(3), 486–496 (2008)
Bille, P., Gørtz, I.L.: From regular expression matching to parsing. In: Proc. 44th MFCS (2019)
Bille, P., Thorup, M.: Faster regular expression matching. In: Proc. 36th ICALP, pp. 171–182 (2009). Full version with appendix available at http://www2.compute.dtu.dk/~phbi/files/publications/2009fremC.pdf
Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proc. 21st SODA, pp. 1297–1308 (2010)
Dubé, D., Feeley, M.: Efficiently building a parse tree from a regular expression. Acta Inform. 37(2), 121–144 (2000)
Fredman, M.L., Willard, D.E.: Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. Syst. Sci. 48(3), 533–551 (1994)
Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Proc. 31st ICALP, vol. 3142, pp. 618–629 (2004)
Garofalakis, M.N., Rastogi, R., Shim, K.: SPIRIT: sequential pattern mining with regular expression constraints. In: Proc. 25th VLDB, pp. 223–234 (1999)
Glushkov, V.M.: The abstract theory of automata. Russ. Math. Surv. 16(5), 1–53 (1961)
Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975)
Johnson, T., Muthukrishnan, S., Rozenbaum, I.: Monitoring regular expressions on out-of-order streams. In: Proc. 23rd ICDE, pp. 1315–1319 (2007)
Jordan, C.: Sur les assemblages des lignes. J. Reine Angew. Math. 70, 185–190 (1869)
Kearns, S.M.: Extending regular expressions with context operators and parse extraction. Softw. Pract. Exp. 21(8), 787–804 (1991)
Kin, K., Hartmann, B., DeRose, T., Agrawala, M.: Proton: multitouch gestures as regular expressions. In: Proc. SIGCHI, pp. 2885–2894 (2012)
Kumar, S., Dharmapurikar, S., Yu, F., Crowley, P., Turner, J.: Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In: Proc. SIGCOMM, pp. 339–350 (2006)
Laurikari, V.: NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In: Proc. 7th SPIRE, pp. 181–187 (2000)
Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. In: Proc. 27th VLDB, pp. 361–370 (2001)
McNaughton, R., Yamada, H.: Regular expressions and state graphs for automata. IRE Trans. Electron. Comput. 9(1), 39–47 (1960)
Murata, M.: Extended path expressions of XML. In: Proc. 20th PODS, pp. 126–137 (2001)
Myers, E.W.: A four-Russian algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)
Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)
Nielsen, L., Henglein, F.: Bit-coded regular expression parsing. In: Proc. 5th LATA, pp. 402–413 (2011)
Sulzmann, M., Lu, K.Z.M.: Regular expression sub-matching using partial derivatives. In: Proc. 14th PPDP, pp. 79–90 (2012)
Thompson, K.: Regular expression search algorithm. Commun. ACM 11, 419–422 (1968)
Yu, F., Chen, Z., Diao, Y., Lakshman, T.V., Katz, R.H.: Fast and memory-efficient regular expression matching for deep packet inspection. In: Proc. ANCS, pp. 93–102 (2006)
Acknowledgements
We thank the anonymous reviewers whose comments and suggestions significantly improved the presentation of the paper.
Funding
The funding was provided by Teknologi og Produktion, Det Frie Forskningsråd (Grant No. 4005-00267) and Natur og Univers, Det Frie Forskningsråd (DK) (Grant No. 1323-00178).
Additional information
An extended abstract appeared at the 44th International Symposium on Mathematical Foundations of Computer Science (MFCS 2019) [5].
About this article
Cite this article
Bille, P., Gørtz, I.L. From regular expression matching to parsing. Acta Informatica 59, 709–724 (2022). https://doi.org/10.1007/s00236-022-00420-6
DOI: https://doi.org/10.1007/s00236-022-00420-6