On the Semantics of Atomic Subgroups in Practical Regular Expressions

  • Martin Berglund
  • Brink van der Merwe
  • Bruce Watson
  • Nicolaas Weideman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10329)

Abstract

Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, matching engines in, for example, Perl, Java, Ruby and .NET also provide operators, such as atomic operators, that constrain the backtracking behavior of the engine. The most common use is to prevent needless backtracking, but the operators will often also change the language accepted. It is therefore essential to develop a theoretically sound basis for the matching semantics of regular expressions with atomic operators. We establish here that atomic operators preserve regularity, but that expressions using them are exponentially more succinct for some languages. Further, we investigate the state complexity of deterministic and non-deterministic finite automata accepting the language corresponding to a regular expression with atomic operators, and show that emptiness testing is PSPACE-complete.
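The claim that an atomic operator can change the accepted language, rather than only prune backtracking, can be illustrated with a small sketch. It is not taken from the paper: the helper names `atomic` and `accepts` are hypothetical, and the atomic group `(?>sub)` is emulated with the common lookahead-plus-backreference idiom so that the example does not depend on the native `(?>...)` syntax, which Python's `re` module only supports from version 3.11. The emulation relies on lookahead assertions in `re` not being re-entered once they have succeeded.

```python
import re


def atomic(sub: str, name: str = "atom") -> str:
    """Emulate the atomic group (?>sub).

    The lookahead captures the first way `sub` can match, and the named
    backreference then consumes exactly that text. Because the engine does
    not backtrack into a completed lookahead, the choice is committed.
    (Hypothetical helper for illustration only.)
    """
    return f"(?=(?P<{name}>{sub}))(?P={name})"


def accepts(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


# Ordinary group: the engine may backtrack from "a" to "ab".
plain = r"(?:a|ab)c"
# Atomic variant: once "a" has matched, the alternative "ab" is never tried.
atom = atomic(r"a|ab") + "c"

for word in ["ac", "abc"]:
    print(word, accepts(plain, word), accepts(atom, word))
```

Under these assumptions both patterns accept "ac", but only the ordinary group accepts "abc": the atomic variant commits to the first alternative "a" and then fails on the following "b". The same effect makes the atomic analogue of `a*a` reject every string, since `a*` consumes all available occurrences of "a" and gives none back.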

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Martin Berglund (1, 3)
  • Brink van der Merwe (2)
  • Bruce Watson (1, 3)
  • Nicolaas Weideman (2, 3)

  1. Department of Information Science, Stellenbosch University, Stellenbosch, South Africa
  2. Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa
  3. Center for AI Research, CSIR, Stellenbosch University, Stellenbosch, South Africa
