Elkhound: A Fast, Practical GLR Parser Generator

  • Scott McPeak
  • George C. Necula
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2985)

Abstract

The Generalized LR (GLR) parsing algorithm is attractive for use in parsing programming languages because it is asymptotically efficient for typical grammars, and can parse with any context-free grammar, including ambiguous grammars. However, adoption of GLR has been slowed by high constant-factor overheads and the lack of a general, user-defined action interface.

In this paper we present algorithmic and implementation enhancements to GLR to solve these problems. First, we present a hybrid algorithm that chooses between GLR and ordinary LR on a token-by-token basis, thus achieving competitive performance for deterministic input fragments. Second, we describe a design for an action interface and a new worklist algorithm that can guarantee bottom-up execution of actions for acyclic grammars. These ideas are implemented in the Elkhound GLR parser generator.

To demonstrate the effectiveness of these techniques, we describe our experience using Elkhound to write a parser for C++, a language notorious for being difficult to parse. Our C++ parser is small (3500 lines), efficient and maintainable, employing a range of disambiguation strategies.
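
The token-by-token choice between LR and GLR mentioned in the abstract can be illustrated with a minimal sketch. This is not Elkhound's implementation: the grammar (E -> E + E | n), the hand-built tables, and the names lr_step, glr_step and parse are all assumptions made for illustration. The general path simply copies whole parse stacks instead of using a graph-structured stack, and no user actions are run. The sketch only shows the dispatch idea: stay on a single mutable LR stack while the table offers exactly one action, and fall back to the general procedure when a conflict appears or several stacks are alive.

```python
# Assumed toy grammar, deliberately ambiguous:  E -> E + E | n
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule id -> (left-hand side, right-hand-side length)
ACTION = {
    (0, "n"): [("shift", 2)],
    (1, "+"): [("shift", 3)],
    (1, "$"): [("accept", 0)],
    (2, "+"): [("reduce", 2)],
    (2, "$"): [("reduce", 2)],
    (3, "n"): [("shift", 2)],
    (4, "+"): [("shift", 3), ("reduce", 1)],  # shift/reduce conflict
    (4, "$"): [("reduce", 1)],
}
GOTO = {(0, "E"): 1, (3, "E"): 4}


def lr_step(stack, tok):
    """Deterministic fast path: mutate one stack in place.

    Returns 'shifted', 'accepted', 'error', or 'conflict' (GLR needed).
    """
    while True:
        actions = ACTION.get((stack[-1], tok), [])
        if not actions:
            return "error"
        if len(actions) > 1:
            return "conflict"
        kind, arg = actions[0]
        if kind == "shift":
            stack.append(arg)
            return "shifted"
        if kind == "accept":
            return "accepted"
        lhs, size = RULES[arg]          # the only remaining kind is 'reduce'
        del stack[-size:]
        stack.append(GOTO[(stack[-1], lhs)])


def glr_step(stacks, tok):
    """General path: explore every action on every stack (stacks are copied)."""
    work = [list(s) for s in stacks]
    shifted, accepts = [], 0
    while work:
        stack = work.pop()
        for kind, arg in ACTION.get((stack[-1], tok), []):
            if kind == "shift":
                shifted.append(stack + [arg])
            elif kind == "accept":
                accepts += 1
            else:
                lhs, size = RULES[arg]
                base = stack[:-size]
                work.append(base + [GOTO[(base[-1], lhs)]])
    return shifted, accepts


def parse(tokens):
    """Return (number of parses, tokens handled on the LR fast path)."""
    stacks, fast = [[0]], 0
    for tok in list(tokens) + ["$"]:    # '$' is the end-of-input marker
        if len(stacks) == 1:
            result = lr_step(stacks[0], tok)
            if result == "shifted":
                fast += 1
                continue
            if result == "accepted":
                return 1, fast + 1
            if result == "error":
                return 0, fast
            # 'conflict': reductions done so far were forced, so fall through
        stacks, accepts = glr_step(stacks, tok)
        if tok == "$" or not stacks:
            return accepts, fast
    return 0, fast
```

With these tables, parse(["n", "+", "n"]) stays on the fast path for every token, while parse(["n", "+", "n", "+", "n"]) switches to the general path at the second "+" and reports two parses, one per grouping of the ambiguous expression.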

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Scott McPeak (1)
  • George C. Necula (1)
  1. University of California, Berkeley
