Faster Scannerless GLR Parsing

  • Giorgios Economopoulos
  • Paul Klint
  • Jurgen Vinju
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5501)

Abstract

Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages, which is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short because tokenization based on regular expressions cannot cope with multiple lexical grammars. In such cases scannerless parsing provides a viable solution: it uses the power of context-free grammars to deal with a wide variety of issues in parsing lexical syntax. However, this comes at a cost in efficiency. The structure of tokens is obtained using a more powerful but more time- and memory-intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, which increases the burden on the parsing algorithm even further.
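
The limitation of regular-expression tokenization can be illustrated with a small sketch. The token classes, the regular expression, and the input string below are assumptions made up for illustration and do not come from the paper. A lexer with one fixed rule set must classify `select` the same way everywhere, even where the host language would treat it as an ordinary identifier.

```python
import re

# Hypothetical token classes: SQL-like keywords embedded in a host language.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<KEYWORD>select|from)\b|(?P<IDENT>[A-Za-z_]\w*)|(?P<OP>[=;.]))"
)

def tokenize(text):
    """Classify tokens with one fixed set of regular expressions, ignoring context."""
    text = text.rstrip()
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        # lastgroup names the single alternative that matched.
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out

# In the (assumed) host language, "select" and "from" are plain variable names here,
# yet the context-free lexer still reports them as keywords.
tokens = tokenize("select = from.select;")
kinds = [kind for kind, _ in tokens]
```

A scannerless parser avoids this fixed classification because the character-level grammar rule that applies depends on the parse context.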

In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, but is 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java and Python the average speedup is 16%.
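
As background on the "right-nulled" idea, the sketch below computes which rule positions have a suffix that can derive the empty string. Right-nulled GLR parsers allow a reduction at such positions instead of only at the end of a rule. The toy grammar and the function names are assumptions for illustration. This is not the SRNGLR algorithm itself, only the nullability analysis that right-nulled parse tables rely on.

```python
# Toy grammar (an assumption for illustration):
#   S ::= a B C      B ::= b | ε      C ::= ε
GRAMMAR = {
    "S": [["a", "B", "C"]],
    "B": [["b"], []],
    "C": [[]],
}

def nullable_nonterminals(grammar):
    """Fixed-point computation of the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A head is nullable if some body consists only of nullable symbols.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def right_nulled_positions(grammar):
    """List (head, body, dot) with dot before the rule end and a nullable suffix."""
    nullable = nullable_nonterminals(grammar)
    found = []
    for head, bodies in grammar.items():
        for body in bodies:
            for dot in range(len(body)):
                if all(sym in nullable for sym in body[dot:]):
                    found.append((head, tuple(body), dot))
    return found

nullable = nullable_nonterminals(GRAMMAR)
positions = right_nulled_positions(GRAMMAR)
```

For this grammar the extra reduction points are after `a` and after `a B` in `S ::= a B C`, since both `B` and `C` can derive ε.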

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Giorgios Economopoulos (1)
  • Paul Klint (1)
  • Jurgen Vinju (1)
  1. Centrum voor Wiskunde en Informatica (CWI), Amsterdam, The Netherlands
