Formalising Boost POSIX Regular Expression Matching

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 11187)

Abstract

Whereas Perl-compatible regular expression matchers typically exhibit some variation of leftmost-greedy semantics, those conforming to the POSIX standard are prescribed leftmost-longest semantics. However, the POSIX standard leaves some room for interpretation, and Fowler and Kuklewicz have done experimental work to confirm differences between various POSIX matchers. The Boost library has an interesting take on the POSIX standard, where it maximises the leftmost match not with respect to subexpressions of the regular expression pattern, but rather, with respect to capturing groups. In our work, we provide the first formalisation of Boost semantics, and we analyse the complexity of regular expression matching when using Boost semantics.
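The contrast between leftmost-greedy and leftmost-longest matching can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `leftmost_greedy` and `leftmost_longest` are hypothetical, Python's backtracking `re` module stands in for a Perl-style matcher, and the leftmost-longest result is found by brute force over candidate spans. The sketch covers only the overall match; it does not model how POSIX or Boost choose between submatches, which is where the semantics studied in the paper differ.

```python
import re


def leftmost_greedy(pattern, text):
    """Overall match reported by Python's backtracking engine (Perl-style)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def leftmost_longest(pattern, text):
    """Brute-force overall match: earliest start first, then longest end."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate span first for this start position.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None


if __name__ == "__main__":
    # The first alternative wins under backtracking semantics.
    print(leftmost_greedy("a|ab", "xab"))   # prints "a"
    # The longest match at the leftmost position wins here.
    print(leftmost_longest("a|ab", "xab"))  # prints "ab"
```

Both functions agree on where the match starts; they disagree only on how far it extends when an earlier alternative is a prefix of a later one.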

Keywords

  • Regular expression matching
  • POSIX
  • Boost

Notes

  1. Fowler [8] identifies the terms “subpattern” and “subexpression” as particular targets of abuse in the POSIX standard, especially since they are “central to the description of the matching algorithm”. He goes on to note that, whereas “subpattern” is used but once, “subexpression” is used 70 times and always appears in the context of grouping.

  2. The matching engine should also reject syntax or operators that are not permitted, as not all PCRE-style features make sense in the POSIX context. However, the parsing and validation of the expression are outside the scope of this discussion.

References

  1. PCRE: Perl compatible regular expressions. https://www.pcre.org/. Accessed 26 May 2018

  2. Portable Operating System Interface (POSIX) Base Specifications, Issue 7. IEEE Standard 1003.1-2017 (2017). (Revision of IEEE Standard 1003.1-2008) https://doi.org/10.1109/ieeestd.2008.4694976

  3. Regex(3) BSD Library Functions Manual, September 2011. As available on macOS 10.11.6

  4. Regular expression routines: OpenBSD library functions manual, May 2016. http://man.openbsd.org/regexec

  5. Berglund, M., van der Merwe, B.: On the semantics of regular expression parsing in the wild. Theor. Comput. Sci. 679, 69–82 (2017). https://doi.org/10.1016/j.tcs.2016.09.006

  6. Berglund, M., van der Merwe, B.: Re-examining regular expressions with backreferences. In: Holub, J., Žďárek, J. (eds.) Proceedings of Prague Stringology Conference, PSC 2017, Prague, August 2017, pp. 30–41. Czech Technical University Prague (2017). http://www.stringology.org/event/2017/p04.html

  7. Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Inf. Comput. 140(2), 229–253 (1998). https://doi.org/10.1006/inco.1997.2688

  8. Fowler, G.: An interpretation of the POSIX regex standard. Technical report, AT&T Research, Florham Park, NJ (2003). http://gsf.cococlyde.org/download

  9. Friedl, J.E.F.: Mastering Regular Expressions, 3rd edn. O’Reilly, Sebastopol (2006)

  10. Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 618–629. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27836-8_53

  11. Houston, G.: Henry Spencer’s regular expression libraries. Git repositories. https://garyhouston.github.io/regex/. Accessed 26 May 2018

  12. Kearns, S.M.: Extending regular expressions with context operators and parse extraction. Softw. Pract. Exp. 21(8), 787–804 (1991). https://doi.org/10.1002/spe.4380210803

  13. Kuklewicz, C.: Regex-TDFA. https://hackage.haskell.org/package/regex-tdfa. Accessed 26 May 2018

  14. Kuklewicz, C.: Summoned: Response to blog entry on lambda the ultimate: the programming languages weblog, February 2007. http://lambda-the-ultimate.org/node/2064. Accessed 26 May 2018

  15. Kuklewicz, C.: regex-posix-unittest (2009). https://hackage.haskell.org/package/regex-posix-unittest. Accessed 26 May 2018

  16. Kuklewicz, C.: Regex Posix. Haskell Wiki, March 2017. https://wiki.haskell.org/Regex_Posix. Accessed 26 May 2018

  17. Laurikari, V.: NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In: Proceedings of 7th International Symposium on String Processing and Information Retrieval, SPIRE 2000, A Coruña, September 2000, pp. 181–187. IEEE (2000). https://doi.org/10.1109/spire.2000.878194

  18. Laurikari, V.: TRE: The free and portable regex matching library. Git repository. https://github.com/laurikari/tre/. Accessed 26 May 2018

  19. Laurikari, V.: TRE documentation. https://laurikari.net/tre/documentation/regex-syntax/. Accessed 26 May 2018

  20. Laurikari, V.: Efficient submatch addressing for regular expressions. Master’s thesis, Helsinki University of Technology, November 2001

  21. Maddock, J.: Boost.Regex (2013). https://www.boost.org/doc/libs/1_67_0/libs/regex/doc/html/index.html. Accessed 26 May 2018

  22. Okui, S., Suzuki, T.: Disambiguation in regular expression matching via position automata with augmented transitions. In: Domaratzki, M., Salomaa, K. (eds.) CIAA 2010. LNCS, vol. 6482, pp. 231–240. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18098-9_25

  23. Sulzmann, M., Lu, K.Z.M.: POSIX regular expression parsing with derivatives. In: Codish, M., Sumii, E. (eds.) FLOPS 2014. LNCS, vol. 8475, pp. 203–220. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07151-0_13

  24. Sulzmann, M., Lu, K.Z.M.: Derivative-based diagnosis of regular expression ambiguity. In: Han, Y.-S., Salomaa, K. (eds.) CIAA 2016. LNCS, vol. 9705, pp. 260–272. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40946-7_22

  25. Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968). https://doi.org/10.1145/363347.363387

  26. Weideman, N., van der Merwe, B., Berglund, M., Watson, B.: Analyzing matching time behavior of backtracking regular expression matchers by using ambiguity of NFA. In: Han, Y.-S., Salomaa, K. (eds.) CIAA 2016. LNCS, vol. 9705, pp. 322–334. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40946-7_27

Author information

Corresponding author

Correspondence to Willem Bester.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Berglund, M., Bester, W., van der Merwe, B. (2018). Formalising Boost POSIX Regular Expression Matching. In: Fischer, B., Uustalu, T. (eds) Theoretical Aspects of Computing – ICTAC 2018. ICTAC 2018. Lecture Notes in Computer Science, vol 11187. Springer, Cham. https://doi.org/10.1007/978-3-030-02508-3_6

  • DOI: https://doi.org/10.1007/978-3-030-02508-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02507-6

  • Online ISBN: 978-3-030-02508-3

  • eBook Packages: Computer Science, Computer Science (R0)