Layout-Sensitive Generalized Parsing

  • Sebastian Erdweg
  • Tillmann Rendel
  • Christian Kästner
  • Klaus Ostermann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7745)

Abstract

The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages is layout-sensitive languages, in which the structure of code depends on indentation and whitespace. For example, Python, Haskell, F#, and Markdown use indentation instead of curly braces to determine the block structure of code. Their parsers (and lexers) are not declaratively specified but hand-tuned to account for layout-sensitivity.

To support declarative specifications of layout-sensitive languages, we propose a parsing framework in which a user can annotate layout in a grammar. Annotations take the form of constraints on the relative positioning of tokens in the parsed subtrees. For example, a user can declare that a block consists of statements that all start on the same column. We have integrated layout constraints into SDF and implemented a layout-sensitive generalized parser as an extension of generalized LR parsing. We evaluate the correctness and performance of our parser by parsing 33 290 open-source Haskell files. Layout-sensitive generalized parsing is easy to use, and its performance overhead compared to layout-insensitive parsing is small enough for practical application.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Bravenboer, M., Vermaas, R., Vinju, J.J., Visser, E.: Generalized Type-Based Disambiguation of Meta Programs with Concrete Object Syntax. In: Glück, R., Lowry, M. (eds.) GPCE 2005. LNCS, vol. 3676, pp. 157–172. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  2. 2.
    Dijkstra, A., Fokker, J., Swierstra, S.D.: The architecture of the Utrecht Haskell compiler. In: Proceedings of Haskell Symposium, pp. 93–104. ACM (2009)Google Scholar
  3. 3.
    Erdweg, S., Giarrusso, P.G., Rendel, T.: Language composition untangled. In: Proceedings of Workshop on Language Descriptions, Tools and Applications, LDTA (to appear, 2012)Google Scholar
  4. 4.
    Erdweg, S., Rendel, T., Kästner, C., Ostermann, K.: SugarJ: Library-based syntactic language extensibility. In: Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 391–406. ACM (2011)Google Scholar
  5. 5.
    Erdweg, S., Rieger, F., Rendel, T., Ostermann, K.: Layout-sensitive language extensibility with SugarHaskell. In: Proceedings of Haskell Symposium, pp. 149–160. ACM (2012)Google Scholar
  6. 6.
    Ford, B.: Packrat parsing: Simple, powerful, lazy, linear time, functional pearl. In: Proceedings of International Conference on Functional Programming (ICFP), pp. 36–47. ACM (2002)Google Scholar
  7. 7.
    Ford, B.: Parsing expression grammars: A recognition-based syntactic foundation. In: Proceedings of Symposium on Principles of Programming Languages (POPL), pp. 111–122. ACM (2004)Google Scholar
  8. 8.
    Heering, J., Hendriks, P.R.H., Klint, P., Rekers, J.: The syntax definition formalism SDF – reference manual. SIGPLAN Notices 24(11), 43–75 (1989)CrossRefGoogle Scholar
  9. 9.
    Jim, T., Mandelbaum, Y., Walker, D.: Semantics and algorithms for data-dependent grammars. In: Proceedings of Symposium on Principles of Programming Languages (POPL), pp. 417–430. ACM (2010)Google Scholar
  10. 10.
    Kats, L.C.L., de Jonge, M., Nilsson-Nyman, E., Visser, E.: Providing rapid feedback in generated modular language environments: Adding error recovery to scannerless generalized-LR parsing. In: Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 445–464. ACM (2009)Google Scholar
  11. 11.
    Kats, L.C.L., Visser, E., Wachsmuth, G.: Pure and declarative syntax definition: Paradise lost and regained. In: Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 918–932. ACM (2010)Google Scholar
  12. 12.
    Landin, P.J.: The next 700 programming languages. Communication of the ACM 9(3), 157–166 (1966)MATHCrossRefGoogle Scholar
  13. 13.
    Leijen, D., Meijer, E.: Parsec: Direct style monadic parser combinators for the real world. Technical Report UU-CS-2001-27, Universiteit Utrecht (2001)Google Scholar
  14. 14.
    Marlow, S. (ed.): Haskell 2010 language report (2010), http://www.haskell.org/onlinereport/haskell2010
  15. 15.
    Mason, T., Brown, D.: Lex & yacc. O’Reilly (1990)Google Scholar
  16. 16.
    McBride, C.: Epigram: Practical Programming with Dependent Types. In: Vene, V., Uustalu, T. (eds.) AFP 2004. LNCS, vol. 3622, pp. 130–170. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  17. 17.
    Parr, T., Fisher, K.: LL(*): The foundation of the ANTLR parser generator. In: Proceedings of Conference on Programming Language Design and Implementation (PLDI), pp. 425–436. ACM (2011)Google Scholar
  18. 18.
    Parr, T., Quong, R.W.: ANTLR: A predicated-LL(k) parser generator. Software Practice and Experience 25, 789–810 (1994)CrossRefGoogle Scholar
  19. 19.
    Tomita, M.: An efficient augmented-context-free parsing algorithm. Computational Linguistics 13(1-2), 31–46 (1987)Google Scholar
  20. 20.
    den van Brand, M.G.J., Scheerder, J., Vinju, J.J., Visser, E.: Disambiguation Filters for Scannerless Generalized LR Parsers. In: Nigel Horspool, R. (ed.) CC 2002. LNCS, vol. 2304, pp. 143–158. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  21. 21.
    Visser, E.: Scannerless generalized-LR parsing. Technical Report P9707, Programming Research Group, University of Amsterdam (1997)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sebastian Erdweg
    • 1
  • Tillmann Rendel
    • 1
  • Christian Kästner
    • 1
  • Klaus Ostermann
    • 1
  1. 1.University of MarburgGermany

Personalised recommendations