Island Grammar-Based Parsing Using GLL and Tom

  • Ali Afroozeh
  • Jean-Christophe Bach
  • Mark van den Brand
  • Adrian Johnstone
  • Maarten Manders
  • Pierre-Etienne Moreau
  • Elizabeth Scott
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7745)

Abstract

Extending a language by embedding another language within it presents significant parsing challenges, especially if the embedding is recursive. The composite grammar is likely to be nondeterministic because some tokens are valid in both the host and the embedded language. In this paper we examine the challenges of embedding the Tom language into a variety of general-purpose high-level languages. Tom provides syntax and semantics for advanced pattern matching and tree rewriting facilities. Embedded Tom constructs are translated into the host language by a preprocessor, the output of which is a composite program written purely in the host language. Tom implementations exist for Java, C, C#, Python and Caml. The current Tom parser is complex and difficult to maintain. We describe how Tom can be parsed using island grammars implemented with the Generalised LL (GLL) parsing algorithm. The grammar is, as might be expected, ambiguous. Extracting the correct derivation relies on our disambiguation strategy, which is based on pattern matching within the parse forest. We describe different classes of ambiguity and propose patterns for resolving them.

Keywords

GLL, Tom, island grammars, parsing, disambiguation

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ali Afroozeh (1)
  • Jean-Christophe Bach (2, 3, 4)
  • Mark van den Brand (1)
  • Adrian Johnstone (5)
  • Maarten Manders (1)
  • Pierre-Etienne Moreau (2, 3, 4)
  • Elizabeth Scott (5)

  1. Eindhoven University of Technology, Eindhoven, The Netherlands
  2. Inria, Villers-lès-Nancy, France
  3. LORIA, UMR 7503, Université de Lorraine, Vandœuvre-lès-Nancy, France
  4. LORIA, UMR 7503, CNRS, Vandœuvre-lès-Nancy, France
  5. Royal Holloway, University of London, Surrey, UK
