A Flexible and Efficient ML Lexer Tool Based on Extended Regular Expression Submatching

  • Martin Sulzmann
  • Pippijn van Steenhoven
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8409)


In programming language processing, lexical analysis has many applications beyond the first phase of compilation. We argue that extended regular expressions combined with the ability to extract submatch information significantly increase the expressiveness of lexer specifications. We show that such expressive lexical analysis can be performed efficiently using novel automata-based methods. The approach has been implemented in an ML lexer tool which is compatible with ocamllex. Experimental results confirm that our approach is competitive with existing ML lexer tools.
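To make "extended regular expressions" concrete, here is a minimal OCaml sketch of regular expressions extended with intersection (`And`) and complement (`Not`), matched using Brzozowski's derivatives. This is an illustrative assumption, not the partial-derivative submatching algorithm of the paper. It does no submatch extraction and no term simplification, and every name in it (`re`, `nullable`, `deriv`, `matches`, `lit`, `lower`, `ident_not_kw`) is hypothetical.

```ocaml
(* Hypothetical extended regex type: standard operators plus And/Not. *)
type re =
  | Empty              (* matches no string *)
  | Eps                (* matches only the empty string *)
  | Chr of char        (* matches a single character *)
  | Cat of re * re     (* concatenation *)
  | Alt of re * re     (* union *)
  | Star of re         (* Kleene star *)
  | And of re * re     (* intersection: the "extended" part *)
  | Not of re          (* complement w.r.t. all strings: the "extended" part *)

(* nullable r holds iff r matches the empty string. *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Cat (r, s) | And (r, s) -> nullable r && nullable s
  | Alt (r, s) -> nullable r || nullable s
  | Not r -> not (nullable r)

(* deriv c r denotes { w | c w is matched by r } (Brzozowski derivative). *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Cat (r, s) ->
      let d = Cat (deriv c r, s) in
      if nullable r then Alt (d, deriv c s) else d
  | Alt (r, s) -> Alt (deriv c r, deriv c s)
  | Star r -> Cat (deriv c r, Star r)
  | And (r, s) -> And (deriv c r, deriv c s)
  | Not r -> Not (deriv c r)

(* Whole-string match: take one derivative per character, then test nullability. *)
let matches r s =
  let cur = ref r in
  String.iter (fun c -> cur := deriv c !cur) s;
  nullable !cur

(* Hypothetical helper: a literal string as a chain of concatenations. *)
let lit s =
  let r = ref Eps in
  String.iter (fun c -> r := Cat (!r, Chr c)) s;
  !r

(* Hypothetical character class [a-z] built as a union of characters. *)
let lower =
  let r = ref Empty in
  for i = Char.code 'a' to Char.code 'z' do
    r := Alt (!r, Chr (Char.chr i))
  done;
  !r

(* Lexer-style rule: non-empty lowercase identifiers, except the keyword "if". *)
let ident_not_kw = And (Cat (lower, Star lower), Not (lit "if"))
```

With intersection and complement, a rule like "an identifier that is not the keyword `if`" is a single expression instead of a rule-ordering trick. A practical implementation would additionally simplify terms (for example, `Alt (Empty, r)` to `r`) and cache derivatives as automaton states, since the unsimplified terms above grow with the input.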





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Martin Sulzmann, Hochschule Karlsruhe - Technik und Wirtschaft, Germany
  • Pippijn van Steenhoven, Hochschule Karlsruhe - Technik und Wirtschaft, Germany
