1 EAGs and Pregmatic

In 1985, I followed the course “Vertalerbouw 2” (Compiler Construction 2) at the Radboud University Nijmegen. In this course, the Extended Affix Grammar (EAG) formalism was introduced. EAGs were closely related to Attribute Grammars; they allowed the syntactic and semantic definition of programming languages. Besides this topic, we were free to choose a compiler-construction-related topic to write an essay on, and I chose Programming Environment Generators. For this purpose, I read the PhD thesis of Thomas Reps on Generating Language-Based Environments [1]. This was the starting point of a long journey in software language engineering and tool development. During my MSc graduation project, I developed a prototype of a programming environment generator based on EAGs. This work was the basis for my PhD thesis “Pregmatic: A Generator For Incremental Programming Environments” [2]. The use of EAGs allowed me to experiment with semantics-directed parsing in relation to generated programming environments. EAGs supported both the definition of the syntax of a language and the definition of semantic functions. The exchange of values between nodes in the syntax tree was facilitated via affixes. The semantic functions, in combination with the affixes, provided a mechanism to specify simple semantic rules, mainly related to type checking, but more complex semantic rules turned out to be challenging.
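
As an impression of this mechanism, the sketch below shows, in plain Python rather than actual EAG notation, how affix-like type values can flow between syntax-tree nodes to express a simple type-checking rule; the node classes and names are purely illustrative.

    from dataclasses import dataclass

    @dataclass
    class Num:
        value: int

    @dataclass
    class Var:
        name: str

    @dataclass
    class Add:
        left: object
        right: object

    def type_of(node, env):
        # Each call synthesizes a type value (an "affix") for its node;
        # env plays the role of an inherited affix mapping names to types.
        if isinstance(node, Num):
            return "int"
        if isinstance(node, Var):
            return env.get(node.name, "error")  # undeclared name: error affix
        if isinstance(node, Add):
            lt = type_of(node.left, env)
            rt = type_of(node.right, env)
            return "int" if lt == rt == "int" else "error"
        return "error"

    # x is declared as int, so the affix value of (x + 1) is "int".
    print(type_of(Add(Var("x"), Num(1)), {"x": "int"}))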

2 ASF+SDF Meta-Environment

After my PhD, I moved to the University of Amsterdam and started working in the group of Paul Klint. The group was developing the ASF+SDF Meta-Environment [3], a programming environment generator built on top of the Centaur System [4]. The ASF+SDF Meta-Environment supported the definition of (programming) languages, where the syntax was defined via the Syntax Definition Formalism (SDF) and the semantics via the Algebraic Specification Formalism (ASF). The Meta-Environment was a fully integrated environment that provided editors for the SDF and ASF parts. It was a closed environment from a language definition point of view, because there were no back doors for using other languages to define aspects of a language.
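
To give a flavour of this division of labour, the sketch below (plain Python, not SDF or ASF notation; the term shapes and rules are illustrative) treats the semantics as equation-like rewrite rules over terms that a syntax definition would produce.

    def rewrite(term):
        # Terms are nested tuples, e.g. ("and", ("true",), ("false",)).
        # Subterms are rewritten first; then equation-like rules for "and"
        # are applied, in the spirit of ASF equations over parse trees.
        if not isinstance(term, tuple):
            return term
        head, *args = term
        args = [rewrite(a) for a in args]
        if head == "and":
            left, right = args
            if left == ("true",):   # equation: and(true, B)  = B
                return right
            if left == ("false",):  # equation: and(false, B) = false
                return ("false",)
        return (head, *args)

    # and(true, and(false, true)) reduces to the normal form ("false",).
    print(rewrite(("and", ("true",), ("and", ("false",), ("true",)))))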

The ASF+SDF Meta-Environment was used for research on, among other topics, legacy software and domain-specific languages (DSLs) [5]. My contributions were on pretty printing [6] and later on the development of the ASF2C compiler [7]. In both cases, I used ASF+SDF and the Meta-Environment as implementation vehicles.

After five years at the University of Amsterdam, my scientific journey continued at CWI, where I contributed to a new version of the ASF+SDF Meta-Environment that was no longer based on Centaur. I also became more involved in research on domain-specific languages. The new version of the ASF+SDF Meta-Environment was used in research, education, and industrial projects related to DSL design and reverse engineering [5].

The learning curve for ASF+SDF was rather steep. The SDF formalism was a great vehicle for defining the syntax of languages because of its modularity and the underlying powerful SGLR parsing technology [8]. However, the definition of the semantics of languages still proved to be the real challenge. Although ASF allowed for abstraction and modularity in the definition of semantic aspects of a language, it was often tedious to define the full semantics, including a code generator.

3 LDTA, SLE, and language workbenches

At the beginning of the 2000s, the scientific community on software language engineering started to take shape and to become independent of the regular Compiler Construction community. In 2001, the workshop series on Language Descriptions, Tools, and Applications (LDTA) started. In 2008, this workshop was merged with the International Workshop on Language Engineering (ATEM) and continued as the International Conference on Software Language Engineering (SLE). LDTA and SLE were stepping stones in creating a community of researchers and practitioners working together on software language engineering. In these workshops and conferences, software language engineering researchers and tool builders gathered to present their latest research results and to showcase the tools they developed to improve and ease the adoption of software language engineering, such as Spoofax [9], JastAdd [10], MontiCore [11], Silver [12], MPS [13], GEMOC [14], Rascal [15], and ANTLRWorks/Xtext [16].

The software language engineering tools became language workbenches (LWBs), a term coined by Martin Fowler [17] around 2005. The term was immediately picked up by the software language engineering community because it brought together the developments in domain-specific languages and integrated development environments (IDEs). The SLE community organized a few language workbench challenges [18, 19], with the goal of showing the strengths (and weaknesses) of the various tools. The aforementioned list of language engineering tools, or LWBs, is a mix of academic tools and tools used in industry, such as MPS and Xtext. The transfer of features explored in the academic tools to the industrial tools could have been stronger.

4 DSLs in the high-tech industry

In 2006, I moved from CWI to the Eindhoven University of Technology. This move was not only geographical: I also moved from being a developer of language engineering tools to being a user of them. First, I started teaching a course on Generic Language Technology, initially based on ASF+SDF and the ASF+SDF Meta-Environment; later, I moved to Eclipse and Xtext, and finally to Rascal. For each of these language workbenches, students experienced a steep learning curve. However, once students grasped the basic principles, they were able to create interesting and challenging (small) DSLs. The Eindhoven area is characterized by a large number of companies working on high-tech equipment. This turned out to be a perfect environment for promoting software language engineering. Several companies were interested in adopting DSLs. These companies developed and used DSLs because of a need to raise the level of abstraction, moving, for instance, from C code to models in a DSL [20]. This involves developing non-trivial code generators, which is costly and time-consuming; the generators need to be tested and validated, and the lack of a properly defined semantics for the DSL hinders their development and testing.
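
As a purely illustrative impression of what such a generator involves (the model shape, the names, and the generated C fragment below are hypothetical; industrial generators are orders of magnitude larger), a minimal Python sketch that turns a toy state-machine model into C declarations could look as follows.

    # Toy "model" standing in for a DSL instance; real models are richer.
    model = {
        "name": "heater_control",
        "states": ["Idle", "Heating", "Error"],
    }

    def generate_c(model):
        # Emit a C enum for the states and a prototype for a step function.
        states = ",\n    ".join(s.upper() for s in model["states"])
        return (
            "typedef enum {\n"
            f"    {states}\n"
            f"}} {model['name']}_state_t;\n\n"
            f"{model['name']}_state_t {model['name']}_step("
            f"{model['name']}_state_t current);\n"
        )

    print(generate_c(model))

Even at this toy scale, every change to the model or to the templates affects the generated code, which hints at why testing and validating real generators is so time-consuming.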

Collaborations between local industry and our university started and led to interesting research projects. There are two examples that I want to highlight. The first is the work by Ulyana Tikhonova, who worked on the development of semantic building blocks for DSLs within ASML [21]. In this project, she applied these semantic building blocks to define the dynamic semantics of a DSL used for describing the scheduling of tasks related to the processing of wafers. The semantic building blocks were mapped to Event-B, which allowed the dynamic semantics of the DSL to be verified using the Event-B tools [22].

The other example is the use of MPS at Canon Production Printing (The Netherlands) and their way of working. They have a group of engineers with broad knowledge of MPS and software language engineering. When a new DSL is required for a specific domain or application area within Canon Production Printing, these engineers are teamed up with the domain experts [23].

5 My final reflections

A lot of research has been performed in the area of semantics of programming languages (e.g., denotational semantics, operational semantics, and action semantics), with very interesting and useful results. The transfer of these results to DSL design and language workbenches has been hampered by the fact that the descriptions and implementations are too large. Ulyana Tikhonova’s work was a small step towards the creation of reusable semantic building blocks to formalize the semantics of DSLs in a very specific application area. Eelco Visser followed a different approach in Spoofax, defining small DSLs to describe different semantic aspects such as name resolution, scoping rules, type checking, and dynamic semantic rules [24, 25]. The current generation of language workbenches still has a strong focus on syntax, and the mechanisms to define the (static and dynamic) semantics of DSLs are still experimental, although GEMOC [14] offers facilities for executing and debugging DSLs, which in other LWBs are either lacking or primitive.

The Canon Production Printing example illustrated another shortcoming in relation to the design and implementation of DSLs. Although multiple textbooks on programming languages, language design, and software language engineering exist, among others [17, 26, 27], there is no textbook that proposes a methodology for designing and developing DSLs. Before creating a DSL, it is important to identify and understand the application area or (problem) domain, the goal of the language, the stakeholders involved, and the technical environment. There is an overlap with regular requirements engineering, except that software language engineering makes it more complicated: for instance, which language concepts does the domain need, and what is the best (textual or graphical) representation of these concepts for the end users? Multiple iterations may be needed to obtain a usable DSL. Most of the existing literature focuses solely on the technical challenges of creating DSLs. The existing language workbenches are excellent tools to specify a DSL, but they do not support the above-mentioned steps that precede the actual creation of a DSL. The current generation of language workbenches has a strong focus on the language engineer as toolsmith but does not really support end users. Recent developments in the area of language workbenches for block-based languages are a promising step [28].