1 Introduction

Even very different programming languages often share similar constructs. Consider OCaml’s conditional ‘if \(E_1\) then \(E_2\) else \(E_3\)’ and the conditional operator ‘\(E_1\) ? \(E_2\) : \(E_3\)’ in C. These constructs have different concrete syntax but similar semantics, with some variation in details. We would like to exploit this similarity when defining formal semantics for both languages by reusing the parts common to the OCaml and C specifications. With traditional approaches to semantics, reuse through ‘copy-paste-and-edit’ is usually the only option available. By default, this is also the case with the K Framework [9, 13]. This style of specification reuse is unsystematic and error-prone.

The semantic framework currently being developed by the PLanCompS project (footnote 1) provides fundamental constructs (funcons) that address the issues of reusability in a systematic manner. Funcons are small semantic entities which express essential concepts of programming languages. These formally specified components can be composed to capture the semantics of concrete programming language constructs. A specification of Caml Light has been developed as an initial case study [3] and a case study on C# is in progress.

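For example, a single conditional funcon can be used to specify OCaml’s conditional expression. Its semantics is given by defining a translation from the concrete construct to the corresponding funcon term, in which the funcon is applied to the translations of the three subexpressions.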

Since the conditional operator in C uses integer-valued expressions as the condition, its translation reflects this by first testing whether the value of the condition is non-zero.

We could also define a dedicated funcon that would match the C conditional semantics exactly. However, the translation using the existing conditional funcon is so simple that there would not be much advantage in doing so. We can reuse the funcon, and with it, its semantic definition. This way, we also make the difference between the OCaml and C conditional constructs explicit. Section 2 provides more information on funcons.
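As a rough illustration of the two translations described above, the following Python sketch models funcon terms as nested tuples. The names `if_true`, `not` and `equal` are assumptions made for this sketch; they are not taken from the funcon repository, and the evaluator is only a toy stand-in for the formal semantics.

```python
# Funcon terms are nested tuples: (funcon_name, arg1, arg2, ...).
# All funcon names here are illustrative stand-ins, not the paper's notation.

def translate_ocaml_if(cond, then_branch, else_branch):
    """OCaml 'if E1 then E2 else E3': the condition is already boolean."""
    return ("if_true", cond, then_branch, else_branch)

def translate_c_conditional(cond, then_branch, else_branch):
    """C 'E1 ? E2 : E3': the integer condition is first compared with zero."""
    is_nonzero = ("not", ("equal", cond, 0))
    return ("if_true", is_nonzero, then_branch, else_branch)

def evaluate(term):
    """Evaluate a funcon term; plain Python values evaluate to themselves."""
    if not isinstance(term, tuple):
        return term
    name, *args = term
    if name == "if_true":
        # Strict only in the condition: exactly one branch is evaluated.
        cond, then_branch, else_branch = args
        return evaluate(then_branch if evaluate(cond) is True else else_branch)
    if name == "not":
        return not evaluate(args[0])
    if name == "equal":
        return evaluate(args[0]) == evaluate(args[1])
    raise ValueError(f"unknown funcon: {name}")
```

Both translations produce a term headed by the same conditional funcon; the only difference is the extra non-zero test wrapped around the C condition, which is exactly the difference between the two language constructs.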

PLanCompS uses MSOS [10], a modular variant of structural operational semantics [11], to formally define individual funcons. However, the funcon approach can be seamlessly integrated with other sufficiently modular specification frameworks. We have tested the use of funcons with the K Framework by giving a specification of CinK [8, 9], a pedagogical subset of C++. We have defined both the translation of CinK to funcons and the semantics of the funcons using K’s rewrite rules. The complete prototype specification is available online, together with the CinK test programs which we have used to test our specification (footnote 2). Interested readers may run these programs themselves using the K tool.

In this paper, we present our specification of the CinK translation (Sect. 3) and illustrate the definition of the semantics of funcons involved in it (Sect. 4). Section 5 offers an overview of related work and alternative approaches. We conclude and suggest directions of future work in Sect. 6.

2 Fundamental Constructs

As mentioned in the Introduction, the PLanCompS project is developing an open-ended collection of fundamental programming constructs, or ‘funcons’. Many funcons correspond closely to simplified programming language constructs. However, each funcon has fixed syntax and semantics. For example, the assignment funcon, applied to arguments \(E_1\) and \(E_2\), has the effect of evaluating \(E_1\) to a variable and \(E_2\) to a value (in any order), then assigning the value to the variable; it is well-typed only if \(E_1\) has the type of variables storing values of the type of \(E_2\). In contrast, the language construct written ‘\(E_1\) = \(E_2\)’ may be interpreted as an assignment or as an equality test (and its well-typedness changes accordingly) depending on the language.

The syntax or signature of a funcon determines its name, how many arguments it takes (if any), the sort of each argument, and the sort of the result. The following computation sorts reflect fundamental conceptual and semantic distinctions in programming languages.

  • The sort of commands is for funcons that are executed only for their effects; on normal termination, a command computes a single fixed value.

  • The sort of expressions is for funcons that compute values.

  • The sort of declarations is for funcons that compute environments, which represent sets of bindings between identifiers and values.

All computation sorts include their sorts of computed values as subsorts: a value takes no steps at all to compute itself.

One of the aims of the PLanCompS project is to establish an online repository of funcons (and data types) for anybody to use ‘off-the-shelf’ as components of language specifications. The project is currently testing the reusability of existing funcons and developing new ones in connection with some major case studies (including Caml Light, C#, and Java). Because individual funcons are meant to represent fundamental concepts in programming languages, many funcons (expressing, e.g., sequencing, conditionals, variable lookup and dereferencing) have a high potential for reuse. In fact, many funcons used in the Caml Light case study appear in the semantics of CinK presented in the following section.

The nomenclature and notation for the existing funcons are still evolving, and they will be finalised only when the case studies have been completed, in connection with the publication of the repository. Observant readers are likely to notice some (minor) differences between the funcon names used in this paper and in previous papers (e.g. [3]).

Regardless of the details of funcon notation, funcons can be algebraically composed to form funcon terms, according to their argument and result sorts (strictly lifted to corresponding computation sorts). Well-formedness of funcon terms is context-free: the application of a funcon to arguments is a well-formed funcon term whenever the arguments are well-formed funcon terms of the required sorts. In contrast, well-typedness of funcon terms is generally context-sensitive. For example, a funcon term that assigns an integer to the variable bound to an identifier is well-typed only in the scope of a declaration that binds that identifier to an integer variable. Dynamic semantics is defined for all well-formed terms; execution of ill-typed terms may fail.

The composability of funcons does not depend on features such as whether they might have side effects, terminate abruptly, diverge, spawn processes, interact, etc. This is crucial for the reusability of the funcons. The semantics of each funcon has to be specified without regard to the context in which it might be used, which requires a highly modular specification framework. Funcon specifications have previously been given in MSOS, Rewriting Logic, ASF+SDF, and action notation. Here, we explore specifying funcons in K, following Roşu (footnote 3).

A component-based semantics of a programming language is specified by a context-free grammar for an abstract syntax for the language, together with a family of inductively specified functions translating abstract syntax trees to funcon terms. The static and dynamic semantics of a program is given by that of the resulting funcon term. As mentioned above, funcons have fixed syntax and semantics. Thus, evolution of a language is expressed as changes to translation functions. If the syntax or semantics of the programming language changes, the definition of the translation function has to be updated to reflect this.

Tool support for translating programs to funcon terms, and for executing the static and dynamic semantics of such terms, has previously been developed in Prolog [2], Maude [1] and ASF+SDF. We now present our experiment with K, focusing on dynamic semantics.

3 A Funcon Specification of CinK

This section presents an overview of our CinK specification using funcons. We include examples from the K sources of the specification. A selection of definitions of funcons involved in the specification can be found in Sect. 4.

CinK is a pedagogical subset of C++ [8, 9] used for experimentation with the K Framework. The original report [8] presents the language in seven iterations. The first specifies a basic imperative language; subsequent iterations extend it with threads, model-checking, references, pointers, and uni-dimensional and multi-dimensional arrays. Our specification starts with only an expression language, which we extend with declarations, statements, functions, threads, references, pointers, and arrays. The extensions follow the order of the CinK iterations; however, we omit support for model-checking.

The grammar which we have used for our specification is a simplified grammar matching CinK, derived from the grammar found in the C++ standard [7, Appendix A].

We invite the reader to compare our specification by translation to funcons with the original K specification of CinK in [8]. Our hope is that our translation functions, together with the suggestive naming of funcons, give a rough understanding of the semantics of language constructs, even before looking at the semantics of funcons themselves.

3.1 Simple Expressions

To give semantics for expressions we use a translation function for expressions. It produces a funcon term (of the expression sort) which, when executed, evaluates the argument expression.

Definitions for arithmetic expressions in CinK can be given straightforwardly using data operations, which all extend to strict funcons on expressions. For example, the semantics of the multiplication operator is expressed as the application of the integer multiplication operation to the translations of the operand expressions (numeric types in CinK are limited to integers with some common operations):

figure a

The ‘short-circuit and’ operator can be readily expressed using a conditional funcon, which is strict only in its first argument. The (obvious) K definition of this funcon can be found in Sect. 4.

figure b

We will use the same generic conditional funcon later in this section to define the conditional statement.
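The short-circuit behaviour described above can be sketched in Python as follows. The funcon name `if_true` and the `probe` helper are assumptions introduced only for this illustration; `probe` exists solely to record which operands are actually evaluated.

```python
# Terms are tuples (funcon_name, args...); all names are hypothetical.
evaluated = []  # records which operands were actually evaluated

def translate_and(left, right):
    """'E1 && E2' becomes: if E1 then E2 else false."""
    return ("if_true", left, right, False)

def evaluate(term):
    if not isinstance(term, tuple):
        return term
    name, *args = term
    if name == "if_true":
        # Strict only in the condition; the untaken branch is never evaluated.
        cond, then_branch, else_branch = args
        return evaluate(then_branch) if evaluate(cond) else evaluate(else_branch)
    if name == "probe":
        # Illustration-only helper: log the label, then yield the wrapped value.
        label, value = args
        evaluated.append(label)
        return value
    raise ValueError(f"unknown funcon: {name}")

result = evaluate(translate_and(("probe", "left", False), ("probe", "right", True)))
```

Because the conditional is strict only in its first argument, the right operand is never touched when the left operand is false.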

3.2 Variables, Blocks and Scope

Bindings and Variables. The semantics of declarations is given using a separate translation function for declarations. A binding funcon binds an identifier to a value, producing a ‘small’ environment containing only the newly created binding. To allocate a new variable of a specified type we use an allocation funcon. In Caml Light, the binding funcon was used for individual name-value bindings in let-expressions, and the allocation funcon for reference data types.

figure c

In relation to variables, CinK (following C++) distinguishes between two general categories of expressions: lvalue- and rvalue-expressions. We express this distinction by having different translation functions for expressions in lvalue and rvalue contexts: in addition to the default expression translation, we define an lvalue translation and an rvalue translation. The default function produces terms evaluating lvalue and rvalue expressions according to their category. When an expression is expected to evaluate to an lvalue, we use the lvalue translation. When an rvalue is expected, we use the rvalue translation, which produces terms evaluating all expressions into rvalues. For lvalue expressions it returns the corresponding stored value, i.e., it serves as an lvalue-to-rvalue conversion.

The addition of variables also affects our translations of simple expressions, which need to be updated. For example, numeric operations expect an rvalue and thus the operands are now translated using the rvalue translation.

To obtain the variable bound to an identifier in the current environment we use a lookup funcon. A variable is dereferenced using a funcon that returns its stored value. The semantics for an identifier appearing in an lvalue or rvalue context is thus:

figure d
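The distinction between the two contexts can be illustrated with a small Python sketch. The function names `bound_value` and `stored_value`, and the representation of environments and stores as dictionaries, are assumptions made for this sketch rather than the names or data structures of the specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variable:
    """A storage location; variables are themselves values."""
    address: int

def bound_value(env, identifier):
    """Look up the value (here, a variable) bound to an identifier."""
    return env[identifier]

def stored_value(store, variable):
    """Dereference a variable to obtain the value it currently holds."""
    return store[variable.address]

def lval_identifier(env, store, identifier):
    """Identifier in an lvalue context: the variable itself."""
    return bound_value(env, identifier)

def rval_identifier(env, store, identifier):
    """Identifier in an rvalue context: lvalue-to-rvalue conversion."""
    return stored_value(store, lval_identifier(env, store, identifier))

env = {"x": Variable(0)}
store = {0: 42}
```

In this model, the lvalue translation of `x` yields the variable at address 0, while the rvalue translation yields the integer stored there.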

Blocks and Controlling Scope. We distinguish between declaration statements and other statements within a block using a scoping funcon and a sequencing funcon. The scoping funcon takes a declaration and a computation, and evaluates the computation in the current environment overridden with the environment computed by the declaration. A declaration statement within a block produces a new environment that is valid until the end of the block:

figure e

A further translation function translates statements to funcon commands.

For all other kinds of statements in a block we use the simple sequencing funcon, which executes its first argument (a command) for its side effects, then executes its second argument.

figure f

To accumulate multiple declarations into one environment we use an accumulating funcon. It is similar to the scoping funcon, except that its second argument is also a declaration, and its result is the environment computed by the first declaration overridden with the environment produced by elaborating the second. This matches the semantics of a multi-variable declaration:

figure g

Note that this funcon is strict only in its first argument, so the correct order of evaluation is enforced.
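The scoping and accumulating behaviour described in this subsection can be sketched in Python by modelling declarations as functions from an environment to a small environment. The names `scope`, `accum`, `bind_value`, `bind_copy` and `lookup` are assumptions for this sketch, and identifiers are bound directly to values rather than to allocated variables to keep the example short.

```python
def bind_value(identifier, value):
    """Declaration producing a one-binding environment."""
    return lambda env: {identifier: value}

def bind_copy(identifier, source):
    """Declaration binding 'identifier' to whatever 'source' is bound to."""
    return lambda env: {identifier: env[source]}

def lookup(identifier):
    """Computation returning the value bound to an identifier."""
    return lambda env: env[identifier]

def scope(declaration, body):
    """Run body in the current environment overridden by the declaration's bindings."""
    return lambda env: body({**env, **declaration(env)})

def accum(first, second):
    """Elaborate 'second' in the scope of 'first'; return both sets of
    bindings, with those of 'second' taking precedence."""
    def run(env):
        first_bindings = first(env)
        second_bindings = second({**env, **first_bindings})
        return {**first_bindings, **second_bindings}
    return run

# The second declaration reads 'a', so the first must be elaborated before it.
declarations = accum(bind_value("a", 1), bind_copy("b", "a"))
block = scope(declarations, lookup("b"))
```

The second declaration can only be elaborated once the first has produced its binding, which is why evaluation order matters here.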

Although Caml Light and CinK are quite different languages, all the funcons we needed here so far for CinK are reused from [3].

3.3 Assignment and Control Statements

The basic construct for updating variables in CinK/C++ is the assignment expression ‘\(E_1\) = \(E_2\)’, where the expression \(E_1\) is expected to evaluate to an lvalue, to which the rvalue of \(E_2\) will be assigned. The value of the whole expression is the lvalue of \(E_1\). The semantics of assignment is a rather simple translation using the assignment funcon (defined in Sect. 4.4):

figure h

The assignment funcon is strict in both arguments but not sequentially, so the arguments are evaluated in an unspecified order. The funcon assigns the value given as its second argument to the variable given as its first argument and returns this variable as its result.
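A minimal Python sketch of an assignment operation that returns the assigned variable is given below. The `Store` class and the `assign` function are assumptions for illustration. Python evaluates arguments left to right, so the unspecified evaluation order mentioned above is not modelled.

```python
class Store:
    """A minimal store mapping integer addresses to values."""
    def __init__(self):
        self.cells = {}

    def allocate(self):
        address = len(self.cells)
        self.cells[address] = None
        return address  # the address plays the role of a variable

def assign(store, variable, value):
    """Write value into variable and return the variable (an lvalue)."""
    store.cells[variable] = value
    return variable

store = Store()
x = store.allocate()
# '(x = 1) = 2' is meaningful when assignment yields an lvalue:
# the inner assignment returns x, which the outer one then overwrites.
result = assign(store, assign(store, x, 1), 2)
```

Returning the variable rather than the assigned value is what allows an assignment expression to appear where an lvalue is expected.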

CinK has boolean-valued conditions and the translations of while- and if-statements are trivial:

figure i

3.4 Function Definition and Calling

We represent functions as abstraction values, which wrap any computation as a value. An abstraction can be passed as a parameter, bound to an identifier, or stored like any other value. To turn a funcon term into an abstraction, we use a value constructor for abstractions. An application funcon applies an abstraction to a value, and the abstraction may refer to the passed value using a dedicated funcon. Multiple parameters can be passed as a tuple constructed using tuple value constructors.

A function call expression simply applies the abstraction to translated arguments:

figure j

At this stage the language only supports call-by-value semantics, so each parameter is evaluated to an rvalue before being passed to a function. A translation function for parameter lists (defined in terms of the rvalue translation) recurses through the parameter expressions and constructs a tuple.

figure k

We have introduced the auxiliary abstract syntax to ensure that parameters separated by commas are not interpreted as a comma-operator expression.

We use patterns as translations of function parameters. Patterns themselves are abstractions which compute an environment when applied to a matching value. The pattern for passing a single parameter by value allocates a variable of the corresponding type and binds it to an identifier; then it assigns the parameter value to the variable and returns the resulting environment.

figure l

Here we use a funcon that allows a command to be used as a declaration. It is an abbreviation for a longer funcon term.

Roughly, the semantics of a function definition is to allocate storage for an abstraction of the corresponding type, bind it to the function name, and use it to store an abstraction of the function body. Looking closer, the definition has to deal with some more details:

figure m

Within the abstraction we match the passed value against the pattern tuple constructed from the individual parameter patterns. The translation of the function body is evaluated in the environment produced by this matching. Since a return statement abruptly terminates a function, returning a value, we represent return statements as exceptions containing a value tagged with a distinguished atom, and wrap the function body in a handler. The handler funcon catches the exception and the handling abstraction retrieves the tagged value, making it the return value of the whole function. If there is no return statement in the body of the function, we throw the same kind of tagged exception with the value used for parameterless returns. Finally, we form a closure of the abstraction with respect to the definition-time environment, to ensure static scopes for bindings.

As mentioned above, an explicit return statement translates to throwing a value tagged with this atom. A parameterless return throws the tag with a fixed value in place of a result.

figure n
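The treatment of return statements as tagged exceptions can be sketched in Python as follows. The tag name `"returned"`, the class `Thrown`, and the helper functions are assumptions for this sketch. Where the specification throws a tagged value when the body ends without a return, the sketch simply yields `None`.

```python
class Thrown(Exception):
    """An exception carrying a (tag, value) pair."""
    def __init__(self, tag, value):
        super().__init__(tag)
        self.tag = tag
        self.value = value

RETURN_TAG = "returned"  # assumed atom name, for illustration only

def return_statement(value=None):
    """Model 'return E;' (or a parameterless 'return;' when value is None)."""
    raise Thrown(RETURN_TAG, value)

def call(body, *arguments):
    """Run a function body inside a handler that turns a tagged throw into the result."""
    try:
        body(*arguments)
    except Thrown as thrown:
        if thrown.tag != RETURN_TAG:
            raise  # not a return: propagate to an outer handler
        return thrown.value
    # Falling off the end of the body behaves like a parameterless return.
    return None

def absolute_value(n):
    if n < 0:
        return_statement(-n)
    return_statement(n)
```

Here `call(absolute_value, -3)` terminates the body abruptly at the first `return_statement`, and the handler extracts the carried value as the result of the call.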

As a simple way of allowing self- and mutually recursive function definitions, we pre-allocate function variables and bind all function names declared at the top level in a global environment. Then we combine this environment with the elaboration of full function definitions and other declarations. The program’s entry function is called in the scope of the global environment.

figure o

Because function identifiers are already bound when the full function definition is elaborated, the full definition only assigns the abstraction to the pre-allocated variable.
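The two-phase scheme described above (pre-allocate and bind, then assign the abstractions) can be sketched in Python. The function `elaborate_program` and the representation of the store as a dictionary are assumptions made for this illustration.

```python
def elaborate_program(definitions):
    """definitions maps each function name to a maker taking (env, store)."""
    store = {}
    # Phase 1: allocate one (still empty) variable per top-level function name.
    global_env = {name: address for address, name in enumerate(definitions)}
    for address in global_env.values():
        store[address] = None
    # Phase 2: elaborate each full definition, assigning the resulting
    # abstraction into the pre-allocated variable.
    for name, make_function in definitions.items():
        store[global_env[name]] = make_function(global_env, store)
    return global_env, store

def make_is_even(env, store):
    return lambda n: True if n == 0 else store[env["is_odd"]](n - 1)

def make_is_odd(env, store):
    return lambda n: False if n == 0 else store[env["is_even"]](n - 1)

env, store = elaborate_program({"is_even": make_is_even, "is_odd": make_is_odd})
```

Each function body looks up the other function through the store only when it is called, by which time both variables have been assigned, so mutual recursion works without any special recursive binding construct.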

3.5 Threads

The second iteration in the original CinK report adds very basic thread support to the language. Spawning a thread in CinK mimics the syntax of using the thread class from the C++ standard library. However, instead of referring to the standard library, semantics is given to the construct directly.

figure p

The thread-spawning funcon creates a new thread in which the given abstraction will be applied. In our case the abstraction contains a function call corresponding to the parameters given to the thread constructor.

3.6 References

A reference in C++ is an alias for a variable, i.e., it introduces a new name for an already existing variable.

figure q

The initialising expression is expected to compute an lvalue, and we bind the resulting variable to the declared identifier. We assume that the input program is statically correct and thus the variable will have the right type.

A reference parameter pattern simply binds to the given variable.

figure r

Before introducing references, we evaluated function parameters to rvalues. Now the parameter translation function has to be redefined in terms of the default expression translation instead of the rvalue translation. Dereferencing is handled conditionally inside the parameter pattern.

figure s

The conditional dereferencing funcon dereferences its argument if it is a variable (lvalue); otherwise it returns the argument itself.
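Conditional dereferencing is simple enough to state directly. In the Python sketch below, the name `current_value` and the `Variable` class are assumptions for illustration, and the store is a plain dictionary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variable:
    """A storage location identified by its address."""
    address: int

def current_value(store, value):
    """Dereference variables; pass every other value through unchanged."""
    if isinstance(value, Variable):
        return store[value.address]
    return value

store = {0: 5}
```

With this operation, a by-value parameter pattern can accept either an lvalue or an rvalue argument: `current_value(store, Variable(0))` and `current_value(store, 5)` both yield 5.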

3.7 Pointers

Pointer variables either hold a reference to another variable or are null otherwise. In this iteration we introduce auxiliary syntax for types, which we use to extract type information from declarations. Our type syntax is not part of the original language. It mostly resembles the original syntax, except for function types which are expressed using a functional (arrow) notation. Here we extract types from a pointer declaration and a function declaration:

figure t

We translate these intermediate types into funcon types (just as we do with simple types). The funcon type of pointers is parameterised by the type of the variables pointed to:

figure u

To illustrate, consider a declaration of a pointer to a pointer to an integer variable. The type of the declared variable is first expressed in our auxiliary syntax and then translated to the corresponding funcon type.

Pointer variables are allocated in the same manner as other variables: we simply pass the type of the pointer variable as the argument to the allocation funcon.

Explicit dereferencing of a pointer variable in an expression amounts to retrieving the value stored in the pointer. This value is the location to which the pointer is pointing. This is expressed in our translation:

figure v

If the pointer is null, dereferencing it or assigning to it will result in a stuck computation.

3.8 Arrays

This extension adds uni-dimensional and multi-dimensional array declarations and expressions to the specification. We analyse CinK arrays, which are indexed from zero, in terms of vectors. Similarly to pointers, we use auxiliary syntax for array types.

figure w

The arguments of the vector type constructor are the length of the vector and the type of its elements. To allocate an array of a given type, we use a dedicated vector-allocation funcon:

figure x

Vectors allocated in this way are composed of the appropriate number of individual variables. These are read from and assigned to separately.

The semantics of accessing an array element via its index is given using an indexing funcon. An array access expression in an lvalue position has the following semantics:

figure y

In CinK, multi-dimensional arrays are specified as vectors of vectors. As an illustration of translating array types, consider a declaration of a two-dimensional integer array with dimensions 2 and 3. The type of the declared array is first expressed in our auxiliary syntax; the translated type is vectors(2, vectors(3, variables(integers))). The allocation construct properly allocates variables for such multi-dimensional vectors and returns a compound value of the appropriate type.
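The translation of array types and the element-wise allocation can be sketched in Python. The type names `vectors`, `variables` and `integers` follow the translated type shown above; the functions `array_type`, `show` and `allocate`, and the dictionary store, are assumptions for this sketch.

```python
def array_type(element_type, dimensions):
    """Build the funcon type for an array with the given dimensions (outermost first)."""
    funcon_type = ("variables", element_type)
    for length in reversed(dimensions):
        funcon_type = ("vectors", length, funcon_type)
    return funcon_type

def show(funcon_type):
    """Render a funcon type in the notation used in the text."""
    if isinstance(funcon_type, tuple):
        head, *args = funcon_type
        return f"{head}({', '.join(show(arg) for arg in args)})"
    return str(funcon_type)

def allocate(funcon_type, store):
    """Allocate storage for a type: one fresh variable per scalar element."""
    if funcon_type[0] == "vectors":
        _, length, element = funcon_type
        return [allocate(element, store) for _ in range(length)]
    address = len(store)
    store[address] = None  # uninitialised variable
    return address

matrix_type = array_type("integers", [2, 3])
store = {}
matrix = allocate(matrix_type, store)
```

Allocating the 2-by-3 type creates six separate variables arranged as two vectors of three, so each element can be read and assigned independently.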

A Note on Reuse. The complete funcon definition of CinK available online uses 27 funcons. Of these, 19 have been previously used in the specification of Caml Light and only 8 were introduced in the present work, 3 of which are just abbreviations for longer funcon terms. It is thus possible to conclude that the degree of reuse of funcons between the Caml Light and CinK specifications is high, even if the languages are quite different.

3.9 Configuration

The configuration of the final iteration of our specification is as follows:

figure z

It appears that this configuration could be generated from the K rules defining the funcons used in our specification of CinK. It is unclear to us whether inference of K configurations from arbitrary K rules is possible, and whether it would be consistent with the K configuration abstraction algorithm.

3.10 Sequencing of Side Effects

Following the C++ standard [7], CinK decouples side effects of some constructs to allow delaying memory writes until after an expression value has been returned. This gives compilers more freedom for optimisation and code generation. The newest standard uses a ‘sequenced before’ relation to define how side effects are to be ordered with respect to each other and to value evaluation. The original CinK specification in K [8] uses auxiliary constructs for side effects and a bag to collect them. An auxiliary sequence point construct forces finalisation of the side effects in the bag.

We have experimented with funcons to express decoupled side effects and have developed a preliminary K specification of the relevant funcons. Our solution is based on a pair of funcons. The first funcon encapsulates an expression, which can potentially request to defer side effects. It also maintains a set of deferred side effects which are computed interleaved with the encapsulated expression. Finally, it ensures that all side effect computations have finished before returning the value of the original expression. The other funcon serves to defer a side effect: it signals to the encapsulating funcon that a computation is to be interleaved with the evaluation of the original expression.
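The interplay of the two funcons can be approximated in Python. The class `EffectScope`, its methods, and the `post_increment` example are assumptions for this sketch. The sketch runs all deferred effects after the expression has produced its value, which is only one of the orderings that true interleaving would permit.

```python
class EffectScope:
    """Encapsulates one expression evaluation and the side effects it defers."""
    def __init__(self):
        self.deferred = []

    def defer(self, effect):
        """Request that 'effect' run at some point before the scope finishes."""
        self.deferred.append(effect)

    def evaluate(self, expression):
        """Evaluate the expression, then complete every deferred effect."""
        value = expression(self)
        while self.deferred:
            self.deferred.pop(0)()
        return value

store = {"x": 1}

def post_increment(name):
    """'x++': yield the old value now, defer the write."""
    def expression(scope):
        old = store[name]
        scope.defer(lambda: store.__setitem__(name, old + 1))
        return old
    return expression

value = EffectScope().evaluate(post_increment("x"))
```

The encapsulating scope guarantees that the deferred write has completed by the time the value of the whole expression is returned, even though the value itself was computed before the write.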

4 Funcons in K

We now illustrate our K specification of the syntax and semantics of the funcons and value types used in our component-based analysis of CinK. We specify each funcon and value type in a separate module, to facilitate selective reuse. Since modularity is a significant feature of our specifications, we show some of the specified imports. The complete specifications are available online, together with the K specification of the translation of CinK programs to funcons.

4.1 Expressions

Expressions compute values:

figure aa

Our specifications of value types lift the usual value operations to expression funcons, each of which is strict in all its arguments:

figure ab

In contrast, the conditional expression funcon

figure ac

is strict only in its first argument, and its rules involve unevaluated expression arguments:

figure ad

We specify a corresponding funcon for conditional commands separately, since it appears that K modules cannot have parametric sorts (although the rules above could be generalised to arbitrary K arguments).

4.2 Declarations

figure ae

Bindings are values corresponding to environments (mapping identifiers to values), and come equipped with some operations that can be used to compose declarations:

figure af

We could have included the funcon as an operation in the above module, since it is strict in its only expression argument:

figure ag

In contrast, the following funcons involve inspecting or (temporarily) changing the current environment, which is assumed to be in an accompanying cell:

figure ah
figure ai
figure aj

The auxiliary operation

figure ak

preserves the result of K when resetting the current environment to M:

figure al

The K argument could be of any of the computation sorts (expressions, declarations, or commands). Since we do not use

figure am

directly in the translation of CinK to funcons, the fact that

figure an

is (semantically) of the same sort as K is irrelevant.

4.3 Commands

figure ao

In contrast to the usual style in K specifications, commands compute the unique value

figure ap

on normal termination, rather than dissolving. However, this difference does not affect the translation of programs to funcons.

figure aq

As with the conditional funcon, this funcon is essentially generic in its argument, but its syntax needs to be specified separately for each argument sort. In contrast, the sort of

figure ar

is independent of the sort of its argument, and we can specify it generically:

figure as

The specification of

figure at

illustrates reuse between funcon specifications:

figure au

4.4 Variables

Variables are themselves treated as values:

figure av

The specifications of the funcons for allocating, assigning to, and inspecting the values stored in variables are much as usual. For example, the assignment funcon assigns a value to a variable and then returns the variable:

figure aw

4.5 Vector Allocation

The vector-allocation funcon serves to allocate a vector of variables. It uses the variable-allocation funcon to allocate the element variables.

figure ax

4.6 Functions

figure ay

The operation

figure az

constructs a value from an unevaluated expression. It can then be closed to obtain static bindings for the identifiers in that expression (the K specification of the funcon

figure ba

is unsurprising, and omitted here).

figure bb

The funcon

figure bc

makes the argument value available to the evaluation of the abstraction body:

figure bd
figure be

The specifications of the funcons for throwing and handling exceptions assume that all cells used to represent the current context of a computation are grouped under a unique context cell. This gives improved modularity: the specification remains the same when further contextual cells are required. In other respects, the specification follows the usual style in the K literature, using a stack of exception handlers:

figure bf
figure bg

These two funcons have the most complicated definitions of all, yet they are still modest in size and complexity.

5 Related Work

The work in this paper was inspired by a basic specification of the IMP example language in funcons using K by Roşu. IMP contains arithmetic and boolean expressions, variables, if- and while-statements, and blocks. The translation to funcons is specified directly using K rewrite rules without defining sorted translation functions. The example can be found in the stable K distribution (footnote 4).

CinK, the sublanguage of C++ that we use as a case study in this paper, is taken from a technical report by Lucanu and Şerbănuţă [8]. We have limited ourselves to the same subset of C++.

SIMPLE [12] is another K example language which is fairly similar to CinK. The language is presented in two variants: an untyped and a typed one. The definition of typed SIMPLE uses a different syntax and only specifies static semantics. With the component-based approach, we specify a single translation of language constructs to funcons. The MSOS of the funcons defines separate relations for typing and evaluation; in K, it seems we would need to provide a separate static semantics module for each funcon, since the strictness annotations and the computation rules differ.

K specifications scale up to real-world languages, as illustrated by Ellison’s semantics of C [4]. The PLanCompS project is currently carrying out major case studies (C#, Java) to examine how the funcon-based approach scales up to large languages, and to test the reusability of the funcon specifications.

Specification of individual language constructs in separate K modules was proposed by Hills and Roşu [6] and further developed by Hills [5, Chap. 5]. They obtained reusable rules by inferring the transformations needed for the rules to match the overall K configuration. The reusability of their modules was limited by their dependence on language syntax, and by the fact that the semantics of individual language constructs is generally more complicated than that of individual funcons.

6 Conclusion

We have given a component-based specification of CinK, using K to define the translation of CinK to funcons as well as the (dynamic) semantics of the funcons themselves. This experiment confirms the feasibility of integrating component-based semantics with the K Framework.

The K specification of each funcon is an independent module. Funcons are significantly simpler than constructs of languages such as CinK, and it was pleasantly straightforward to specify their K rules. However, we would have preferred the K configurations for combination of funcons to be generated automatically.

Many of the funcons used here for CinK were introduced in the component-based specification of Caml Light [3], demonstrating their reusability. The names of the funcons are suggestive of their intended interpretation, so the translation specification alone should convey a first impression of the CinK semantics. Readers are invited to browse the complete K specifications of our funcons online, then compare our translation of CinK to funcons with its direct specification in K [8].

In the future, we are aiming to define the static semantics of funcons in K, so our translation would induce a static semantics for CinK.