Faster Scannerless GLR Parsing
Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages, which is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short because tokenization based on regular expressions is limited when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution: it uses the power of context-free grammars to deal with a wide variety of issues in parsing lexical syntax. However, this comes at the price of reduced efficiency. The structure of tokens is obtained using a more powerful, but more time- and memory-intensive, parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, which increases the burden on the parsing algorithm even further.
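To see why character-level grammars are more non-deterministic than tokenized ones, consider a small hypothetical grammar in which identifiers are built from characters by the grammar itself rather than by a scanner. The sketch below counts derivations with a memoized brute-force recognizer. It is not GLR, SGLR, or SRNGLR, and the grammar and function names are illustrative assumptions. Without a lexical disambiguation rule, `abc` can be split into identifiers in several ways. A follow restriction in the spirit of SDF's `Id -/- [a-z]` (an identifier may not be directly followed by a letter) enforces longest match and leaves a single parse.

```python
from functools import lru_cache

# Hypothetical character-level grammar (no separate tokenizer):
#   Ids    ::= Id | Id Layout Ids
#   Id     ::= [a-z]+
#   Layout ::= " "*        (may be empty, which causes the ambiguity)


def is_letter(c: str) -> bool:
    return "a" <= c <= "z"


def count_parses(text: str, longest_match: bool = False) -> int:
    """Count derivations of `text` from Ids.

    With longest_match=True, reject any Id that is directly followed by a
    letter (a follow restriction, assumed here for illustration).
    """
    n = len(text)

    @lru_cache(maxsize=None)
    def ids_from(i: int) -> int:
        """Number of ways the suffix text[i:] derives Ids."""
        total = 0
        j = i
        # Try every possible end j of the first Id, i.e. Id = text[i:j].
        while j < n and is_letter(text[j]):
            j += 1
            if longest_match and j < n and is_letter(text[j]):
                continue  # follow restriction rejects this split
            if j == n:
                total += 1  # Ids ::= Id
            # Ids ::= Id Layout Ids, where Layout = text[j:k] (spaces only)
            k = j
            while k < n:
                total += ids_from(k)
                if text[k] != " ":
                    break
                k += 1
        return total

    return ids_from(0)


print(count_parses("abc"))                      # 4 splits: abc, a|bc, ab|c, a|b|c
print(count_parses("abc", longest_match=True))  # 1
```

The number of unfiltered parses grows as 2^(n-1) for an n-letter word, which is the kind of extra non-determinism a scannerless parser and its disambiguation filters must absorb.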
In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, and 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java, and Python, the average speedup is 16%.