Formalising Boost POSIX Regular Expression Matching

  • Martin Berglund
  • Willem Bester
  • Brink van der Merwe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11187)

Abstract

Whereas Perl-compatible regular expression matchers typically exhibit some variation of leftmost-greedy semantics, those conforming to the POSIX standard are prescribed leftmost-longest semantics. However, the POSIX standard leaves some room for interpretation, and Fowler and Kuklewicz have done experimental work to confirm differences between various POSIX matchers. The Boost library has an interesting take on the POSIX standard, where it maximises the leftmost match not with respect to subexpressions of the regular expression pattern, but rather, with respect to capturing groups. In our work, we provide the first formalisation of Boost semantics, and we analyse the complexity of regular expression matching when using Boost semantics.
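The difference between leftmost-greedy and leftmost-longest matching can be illustrated with a minimal Python sketch. It is not the formalisation given in the paper and does not model Boost's treatment of capturing groups; it only contrasts the overall match chosen under the two policies. The helper names `leftmost_greedy` and `leftmost_longest` are hypothetical, and the leftmost-longest variant is a brute-force search assumed to be adequate only for small, anchor-free patterns.

```python
import re


def leftmost_greedy(pattern, text):
    """Perl-style: the first match the backtracking engine finds,
    trying alternatives in the order they are written."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def leftmost_longest(pattern, text):
    """POSIX-style overall match: earliest start position, then the
    longest substring from that position that the pattern matches.
    Brute force over all substrings; for illustration only."""
    for start in range(len(text) + 1):
        # Try the longest candidate first, so the first hit is the longest.
        for end in range(len(text), start - 1, -1):
            if re.fullmatch(pattern, text[start:end]):
                return text[start:end]
    return None


print(leftmost_greedy("a|ab", "xab"))   # a
print(leftmost_longest("a|ab", "xab"))  # ab
```

Both policies agree on where the match starts; they differ in that the greedy policy commits to the first alternative that succeeds, while the longest policy compares all candidates at that start position.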

Keywords

Regular expression matching, POSIX, Boost

References

  1. PCRE: Perl compatible regular expressions. https://www.pcre.org/. Accessed 26 May 2018
  2. Portable Operating System Interface (POSIX) Base Specifications, Issue 7. IEEE Standard 1003.1-2017 (2017). (Revision of IEEE Standard 1003.1-2008) https://doi.org/10.1109/ieeestd.2008.4694976
  3. Regex(3) BSD Library Functions Manual, September 2011. As available on macOS 10.11.6
  4. Regular expression routines: OpenBSD library functions manual, May 2016. http://man.openbsd.org/regexec
  5. Berglund, M., van der Merwe, B.: On the semantics of regular expression parsing in the wild. Theor. Comput. Sci. 679, 69–82 (2017). https://doi.org/10.1016/j.tcs.2016.09.006
  6. Berglund, M., van der Merwe, B.: Re-examining regular expressions with backreferences. In: Holub, J., Žďárek, J. (eds.) Proceedings of Prague Stringology Conference, PSC 2017, Prague, August 2017, pp. 30–41. Czech Technical University Prague (2017). http://www.stringology.org/event/2017/p04.html
  7. Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Inf. Comput. 140(2), 229–253 (1998). https://doi.org/10.1006/inco.1997.2688
  8. Fowler, G.: An interpretation of the POSIX regex standard. Technical report, AT&T Research, Florham Park, NJ (2003). http://gsf.cococlyde.org/download
  9. Friedl, J.E.F.: Mastering Regular Expressions, 3rd edn. O’Reilly, Sebastopol (2006)
  10. Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 618–629. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27836-8_53
  11. Houston, G.: Henry Spencer’s regular expression libraries. Git repositories. https://garyhouston.github.io/regex/. Accessed 26 May 2018
  12. Kearns, S.M.: Extending regular expressions with context operators and parse extraction. Softw. Pract. Exp. 21(8), 787–804 (1991). https://doi.org/10.1002/spe.4380210803
  13. Kuklewicz, C.: Regex-TDFA. https://hackage.haskell.org/package/regex-tdfa. Accessed 26 May 2018
  14. Kuklewicz, C.: Summoned: Response to blog entry on lambda the ultimate: the programming languages weblog, February 2007. http://lambda-the-ultimate.org/node/2064. Accessed 26 May 2018
  15. Kuklewicz, C.: regex-posix-unittest (2009). https://hackage.haskell.org/package/regex-posix-unittest. Accessed 26 May 2018
  16. Kuklewicz, C.: Regex Posix. Haskell Wiki, March 2017. https://wiki.haskell.org/Regex_Posix. Accessed 26 May 2018
  17. Laurikari, V.: NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In: Proceedings of 7th International Symposium on String Processing and Information Retrieval, SPIRE 2000, A Coruña, September 2000, pp. 181–187. IEEE (2000). https://doi.org/10.1109/spire.2000.878194
  18. Laurikari, V.: TRE: The free and portable regex matching library. Git repository. https://github.com/laurikari/tre/. Accessed 26 May 2018
  19. Laurikari, V.: TRE documentation. https://laurikari.net/tre/documentation/regex-syntax/. Accessed 26 May 2018
  20. Laurikari, V.: Efficient submatch addressing for regular expressions. Master’s thesis, Helsinki University of Technology, November 2001
  21. Maddock, J.: Boost.Regex (2013). https://www.boost.org/doc/libs/1_67_0/libs/regex/doc/html/index.html. Accessed 26 May 2018
  22. Okui, S., Suzuki, T.: Disambiguation in regular expression matching via position automata with augmented transitions. In: Domaratzki, M., Salomaa, K. (eds.) CIAA 2010. LNCS, vol. 6482, pp. 231–240. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18098-9_25
  23. Sulzmann, M., Lu, K.Z.M.: POSIX regular expression parsing with derivatives. In: Codish, M., Sumii, E. (eds.) FLOPS 2014. LNCS, vol. 8475, pp. 203–220. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07151-0_13
  24. Sulzmann, M., Lu, K.Z.M.: Derivative-based diagnosis of regular expression ambiguity. In: Han, Y.-S., Salomaa, K. (eds.) CIAA 2016. LNCS, vol. 9705, pp. 260–272. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40946-7_22
  25. Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968). https://doi.org/10.1145/363347.363387
  26. Weideman, N., van der Merwe, B., Berglund, M., Watson, B.: Analyzing matching time behavior of backtracking regular expression matchers by using ambiguity of NFA. In: Han, Y.-S., Salomaa, K. (eds.) CIAA 2016. LNCS, vol. 9705, pp. 322–334. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40946-7_27

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Information Science and Centre for AI Research, University of Stellenbosch, Stellenbosch, South Africa
  2. Division of Computer Science, University of Stellenbosch, Stellenbosch, South Africa