Island Grammar-Based Parsing Using GLL and Tom
Extending a language by embedding another language within it presents significant parsing challenges, especially if the embedding is recursive. The composite grammar is likely to be nondeterministic because some tokens are valid in both the host and the embedded language. In this paper we examine the challenges of embedding the Tom language into a variety of general-purpose high-level languages. Tom provides syntax and semantics for advanced pattern matching and tree rewriting facilities. Embedded Tom constructs are translated into the host language by a preprocessor, the output of which is a composite program written purely in the host language. Tom implementations exist for Java, C, C#, Python and Caml. The current parser is complex and difficult to maintain. We describe how Tom can be parsed using island grammars implemented with the Generalised LL (GLL) parsing algorithm. The grammar is, as might be expected, ambiguous. Extracting the correct derivation relies on our disambiguation strategy, which is based on pattern matching within the parse forest. We describe different classes of ambiguity and propose patterns for resolving them.
Keywords: GLL, Tom, island grammars, parsing, disambiguation
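The ambiguity described in the abstract can be illustrated with a toy example. The Python sketch below is not the paper's GLL parser or its disambiguation patterns. It assumes a pre-tokenised input and uses `%match` followed by a brace-delimited block as a stand-in island construct. The segment labels and the "prefer islands" rule are hypothetical choices made for illustration. Because braces are valid in both host and island text, every island can also be read as plain water, so the input has more than one derivation and a filter must pick one.

```python
from typing import List, Tuple

Segment = Tuple[str, Tuple[str, ...]]  # ("water" | "island", tokens)

ISLAND_START = "%match"  # assumed island opener, for illustration only


def island_end(tokens: List[str], start: int) -> int:
    """Return the index just past the balanced '{...}' block of an island
    beginning at `start`, or -1 if no well-formed island starts there."""
    if tokens[start] != ISLAND_START:
        return -1
    if start + 1 >= len(tokens) or tokens[start + 1] != "{":
        return -1
    depth = 0
    for i in range(start + 1, len(tokens)):
        if tokens[i] == "{":
            depth += 1
        elif tokens[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


def derivations(tokens: List[str], pos: int = 0) -> List[List[Segment]]:
    """Enumerate every reading of tokens[pos:]: each token may be water, and
    a well-formed island may alternatively be read as one island segment."""
    if pos == len(tokens):
        return [[]]
    results: List[List[Segment]] = []
    # Reading 1: the current token is host-language water.
    for rest in derivations(tokens, pos + 1):
        results.append([("water", (tokens[pos],))] + rest)
    # Reading 2: an island starts at the current token.
    end = island_end(tokens, pos)
    if end != -1:
        for rest in derivations(tokens, end):
            results.append([("island", tuple(tokens[pos:end]))] + rest)
    return results


def prefer_islands(forest: List[List[Segment]]) -> List[Segment]:
    """Hypothetical disambiguation rule: keep the reading with the fewest
    water segments, i.e. recognise islands wherever possible."""
    return min(forest, key=lambda d: sum(1 for kind, _ in d if kind == "water"))


tokens = "int x ; %match { a { b } } return".split()
forest = derivations(tokens)
best = prefer_islands(forest)
islands = [toks for kind, toks in best if kind == "island"]
```

Here `forest` holds both readings of the input (all water, or water around a single island), and `prefer_islands` selects the second. Enumerating derivations explicitly grows exponentially with the number of islands. A generalised parser avoids this by sharing structure in a parse forest, which is why disambiguation is usually phrased as filtering that forest rather than listing every tree.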