Abstract
Analysis and renovation of large software portfolios require syntax analysis of multiple, usually embedded, languages, which is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short because tokenization based on regular expressions is too limited to handle multiple lexical grammars. In such cases scannerless parsing provides a viable solution: it uses the power of context-free grammars to deal with a wide variety of issues in parsing lexical syntax. However, this comes at the price of reduced efficiency. The structure of tokens is obtained using a more powerful but more time- and memory-intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, which increases the burden on the parsing algorithm even further.
In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, and 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java, and Python the average speedup is 16%.
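The limitation of regular-expression tokenization with multiple lexical grammars can be illustrated with a toy sketch. This is not the SGLR or SRNGLR algorithm; the keyword sets, function names, and the sample fragment are all hypothetical and chosen only for illustration. A scanner that merges the keyword tables of a host language and an embedded language has no syntactic context, so it misclassifies a host identifier that happens to be an embedded keyword. A classifier that knows which lexical grammar applies at the current position, which is the kind of decision a scannerless parser derives from the context-free grammar, avoids the problem.

```python
import re

# Hypothetical keyword sets for a host language and an embedded query language.
HOST_KEYWORDS = {"int", "return"}
EMBEDDED_KEYWORDS = {"select", "from"}

# Words are matched with a single regular expression, as a conventional scanner would.
WORD = re.compile(r"[A-Za-z_]\w*")


def tokenize_merged(text):
    """Context-free scanner: one merged keyword table for both languages."""
    merged = HOST_KEYWORDS | EMBEDDED_KEYWORDS
    return [("KEYWORD" if w in merged else "ID", w) for w in WORD.findall(text)]


def tokenize_in_context(text, keywords):
    """Classify words using only the keywords valid in the current context."""
    return [("KEYWORD" if w in keywords else "ID", w) for w in WORD.findall(text)]


# In host code, "from" is meant to be an ordinary identifier.
host_fragment = "int from = 0"
print(tokenize_merged(host_fragment))
print(tokenize_in_context(host_fragment, HOST_KEYWORDS))
```

In this sketch the merged scanner reports `from` as a keyword in host code, while the context-aware classification treats it as an identifier. The context is passed in by hand here; in a scannerless setting it would follow from the production being parsed.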
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Economopoulos, G., Klint, P., Vinju, J. (2009). Faster Scannerless GLR Parsing. In: de Moor, O., Schwartzbach, M.I. (eds) Compiler Construction. CC 2009. Lecture Notes in Computer Science, vol 5501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00722-4_10
DOI: https://doi.org/10.1007/978-3-642-00722-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00721-7
Online ISBN: 978-3-642-00722-4
eBook Packages: Computer Science, Computer Science (R0)