Soundness Conditions for Big-Step Semantics

We propose a general proof technique to show that a predicate is sound, that is, prevents stuck computation, with respect to a big-step semantics. This result may look surprising, since in big-step semantics there is no difference between non-terminating and stuck computations, hence soundness cannot even be expressed. The key idea is to define constructions yielding an extended version of a given arbitrary big-step semantics, where the difference is made explicit. The extended semantics are exploited in the meta-theory, notably they are necessary to show that the proof technique works. However, they remain transparent when using the proof technique, since it consists in checking three conditions on the original rules only, as we illustrate by several examples.


Introduction
The semantics of programming languages or software systems specifies, for each program/system configuration, its final result, if any. When no final result exists, there are two possibilities: either the computation stops with no final result and no means to compute further (stuck computation), or the computation never stops (non-termination).
There are two main styles to define operationally a semantic relation: the small-step style [34,35], on top of a reduction relation representing single computation steps, or directly by a set of rules as in the big-step style [28]. Within a small-step semantics it is straightforward to make the distinction between stuck and non-terminating computations, while a typical drawback of the big-step style is that they are not distinguished (no judgement is derived in both cases).
For this reason, even though big-step semantics is generally more abstract, and sometimes more intuitive to design and therefore to debug and extend, in the literature much more effort has been devoted to studying the meta-theory of small-step semantics, providing properties and related proof techniques. Notably, the soundness of a type system (typing prevents stuck computation) can be proved by progress and subject reduction (also called type preservation) [40].
Our quest is then to provide a general proof technique to prove the soundness of a predicate with respect to an arbitrary big-step semantics. How can we achieve this result, given that in the big-step formulation soundness cannot even be expressed, since non-termination is modelled as the absence of a final result exactly like stuck computation? The key idea is the following: 1. We define constructions yielding an extended version of a given arbitrary big-step semantics, where the difference between stuckness and non-termination is made explicit. In a sense, these constructions show that the distinction was "hidden" in the original semantics. 2. We provide a general proof technique by identifying three sufficient conditions on the original big-step rules to prove soundness.
Keypoint (2)'s three sufficient conditions are local preservation, ∃-progress, and ∀-progress. For proving the result that the three conditions actually ensure soundness, the setting up of the extended semantics from the given one is necessary, since otherwise, as said above, we could not even express the property.
However, the three conditions deal only with the original rules of the given big-step semantics. This means that, practically, in order to use the technique there is no need to deal with the extended semantics. This implies, in particular, that our approach does not increase the original number of rules. Moreover, the sufficient conditions are checked only on single rules, which makes explicit the proof fragments typically needed in a proof of soundness. Even though this is not exploited in this paper, this form of locality means modularity, in the sense that adding a new rule implies adding the corresponding proof fragment only.
As an important by-product, in order to formally define and prove correct the keypoints (1) and (2), we propose a formalisation of "what is a big-step semantics" which captures its essential features. Moreover, we support our approach by presenting several examples, demonstrating that: on the one hand, their soundness proof can be easily rephrased in terms of our technique, that is, by directly reasoning on big-step rules; on the other hand, our technique is essential when the property to be checked (for instance, the soundness of a type system) is not preserved by intermediate computation steps, whereas it holds for the final result. On a side note, our examples concern type systems, but the meta-theory we present in this work holds for any predicate.
We describe now in more detail the constructions of keypoint (1). Starting from an arbitrary big-step judgment c ⇒ r that evaluates configurations c into results r , the first construction produces an enriched judgement c ⇒ tr t where t is a trace, that is, the (finite or infinite) sequence of all the (sub)configurations encountered during the evaluation. In this way, by interpreting coinductively the rules of the extended semantics, an infinite trace models divergence (whereas no result corresponds to stuck computation). The second construction is in a sense dual. It is the algorithmic version of the well-known technique presented in Exercise 3.5.16 from the book [33] of adding a special result wrong explicitly modelling stuck computations (whereas no result corresponds to divergence).
By trace semantics and wrong semantics we can express two flavours of soundness, soundness-may and soundness-must, respectively, and show the correctness of the corresponding proof technique. This achieves our original aim, and it should be noted that we define soundness with respect to a big-step semantics within a big-step formulation, without resorting to a small-step style (indeed, the two extended semantics are themselves big-step).
Lastly, we consider the issue of justifying on a formal basis that the two constructions are correct with respect to their expected meaning. For instance, for the wrong semantics we would like to be sure that all the cases are covered. To this end, we define a third construction, dubbed pev for "partial evaluation", which makes explicit the computations of a big-step semantics, intended as the sequences of execution steps of the naturally associated evaluation algorithm. Formally, we obtain a reduction relation on approximated proof trees, so termination, non-termination and stuckness can be defined as usual. Then, the correctness of traces and wrong constructions is proved by showing they are equivalent to pev for diverging and stuck computations, respectively.
In Sect. 2 we illustrate the meta-theory on a running example. In Sect. 3 we define the trace and wrong constructions. In Sect. 4 we express soundness in the must and may flavours, introduce the proof technique, and prove its correctness. In Sect. 5 we show in detail how to apply the technique to the running example, and other significant examples. In Sect. 6 we introduce the third construction and state that the three constructions are equivalent. Finally, in Sect. 7 and Sect. 8 we discuss related and further work and summarise our contribution. An extended version including an additional example, proofs omitted for lack of space, and technical details on the pev semantics, can be found at http://arxiv.org/abs/2002.08738.

A meta-theory for big-step semantics
We introduce a formalisation of "what is a big-step semantics" that captures its essential features, subsuming a large class of examples (as testified in Sect. 5). This enables a general formal reasoning on an arbitrary big-step semantics.
A big-step semantics is a triple ⟨C, R, R⟩ where:
- C is a set of configurations c.
- R ⊆ C is a set of results r. We define judgments j ≡ c ⇒ r, meaning that configuration c evaluates to result r. Set C(j) = c and R(j) = r.
- R is a set of rules ρ, each with premises j1 ... jn, jn+1 and conclusion c ⇒ R(jn+1), with c ∈ C\R, written rule(j1 ... jn, jn+1, c) in inline format, where j1 ... jn are the dependencies and jn+1 is the continuation. Set C(ρ) = c and, for i ∈ 1..n+1, C(ρ, i) = C(ji) and R(ρ, i) = R(ji).
- For each result r ∈ R, we implicitly assume a single axiom r ⇒ r. Hence, the only derivable judgment for r is r ⇒ r, which we will call a trivial judgment.
We will use the inline format, more concise and manageable, for the development of the meta-theory, e.g., in constructions.
A rule corresponds to the following evaluation process for a non-result configuration: first, dependencies are evaluated in the given order, then the continuation is evaluated and its result is returned as result of the entire computation.
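This evaluation process can be made concrete with a small executable sketch (ours, not part of the paper's formal development): a rule for a non-result configuration is encoded as a Python generator that yields the configuration of each premise in order (dependencies first, continuation last) and receives back the corresponding results; the generic evaluator is parameterised by the rule-selection function and by the result predicate.

```python
def eval_conf(c, rules, is_result):
    """Evaluate configuration c: run the rule for c by evaluating its
    dependencies in order, then its continuation, whose result is returned."""
    if is_result(c):
        return c                          # implicit axiom r => r
    premises = rules(c)                   # generator encoding rule(j1..jn, j_{n+1}, c)
    result = None
    try:
        sub = premises.send(None)         # configuration of the first premise
        while True:
            result = eval_conf(sub, rules, is_result)
            sub = premises.send(result)   # feed back the result, get next premise
    except StopIteration:
        return result                     # result of the continuation

# A tiny instance: configurations are numbers (results) or ('add', c1, c2).
def add_rules(c):
    assert c[0] == 'add'
    def rule():                           # dependencies: the two operands;
        v1 = yield c[1]                   # continuation: their sum (already
        v2 = yield c[2]                   # a result, so the axiom applies)
        yield v1 + v2
    return rule()
```

For instance, `eval_conf(('add', ('add', 1, 2), 4), add_rules, lambda c: isinstance(c, int))` yields `7`.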
Rules as defined above specify an inference system [1,30], whose inductive interpretation is, as usual, the semantic relation. However, they carry slightly more structure with respect to standard inference rules. Notably, premises are a sequence rather than a set, and the last premise plays a special role. Such additional structure does not affect the semantic relation defined by the rules, but allows abstract reasoning about an arbitrary big-step semantics; in particular, it is relevant for defining the three constructions. In the following, we will write R ⊢ c ⇒ r when the judgment c ⇒ r is derivable in R.
We will use the inline format, more concise and manageable, for the development of the meta-theory, e.g., in constructions.
As customary, the (infinite) set of rules R is described by a finite set of metarules, each one with a finite number of premises. As a consequence, the number of premises of rules is not only finite but bounded. Since we have no notion of metarule, we model this feature (relevant in the following) as an explicit assumption:
(BP) there exists b ∈ N such that, for each ρ ≡ rule(j1 ... jn, jn+1, c), n < b.
Example of big-step semantics We end this section illustrating the above definitions and conditions by a simple example: a λ-calculus with natural constants, successor and non-deterministic choice, shown in Fig. 1. We present this example as an instance of our definition:
- Configurations and results are expressions and values, respectively.
- To have the set of (meta-)rules in our required shape, abbreviated in inline format in the bottom section of the figure:
  • axiom (val) can be omitted (it is implicitly assumed)
  • in (app) we consider premises as a sequence rather than a set (the third premise is the continuation)
  • in (succ), which has no continuation, we add a dummy continuation
  • on the contrary, in (choice) there is only the continuation (dependencies are the empty sequence, denoted ε in the inline format).
Note that (app) corresponds to the standard left-to-right evaluation order. We could have chosen the right-to-left order instead: (app-r) rule(e2 ⇒ v2 e1 ⇒ λx.e, e[v2/x] ⇒ v, e1 e2), or even opt for a non-deterministic approach by taking both rules (app) and (app-r). As said above, these different choices do not affect the semantic relation c ⇒ r defined by the inference system, which is always the same. However, they will affect the way the extended semantics distinguishing stuck computation and non-termination is constructed. Indeed, if the evaluation of e1 and e2 is stuck and non-terminating, respectively, we should obtain stuck computation with rule (app) and non-termination with rule (app-r).
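As a concrete illustration (ours, with a hypothetical tuple encoding of expressions), the semantics of Fig. 1 can be run directly as a recursive Python function, where the `pick` parameter resolves (choice) deterministically; swapping the two dependency evaluations in the `'app'` branch would give (app-r), with no effect on the results produced.

```python
# Expressions: ('var', x), ('lam', x, e), ('num', n), ('app', e1, e2),
# ('succ', e), ('choice', e1, e2).  Values: ('num', n) and ('lam', x, e).
def is_value(e):
    return e[0] in ('num', 'lam')

def subst(e, x, v):
    """e[v/x]; v is assumed closed, so no capture can occur."""
    tag = e[0]
    if tag == 'var':
        return v if e[1] == x else e
    if tag == 'lam':
        return e if e[1] == x else ('lam', e[1], subst(e[2], x, v))
    if tag == 'num':
        return e
    return (tag,) + tuple(subst(sub, x, v) for sub in e[1:])

def eval_big(e, pick=0):
    if is_value(e):
        return e                                    # implicit axiom v => v
    tag = e[0]
    if tag == 'app':                                # (app), left-to-right
        f = eval_big(e[1], pick)                    # first dependency
        v2 = eval_big(e[2], pick)                   # second dependency
        assert f[0] == 'lam'                        # otherwise: stuck
        return eval_big(subst(f[2], f[1], v2), pick)  # continuation
    if tag == 'succ':
        v = eval_big(e[1], pick)
        assert v[0] == 'num'                        # otherwise: stuck
        return ('num', v[1] + 1)                    # dummy continuation
    if tag == 'choice':                             # only a continuation
        return eval_big(e[1 + pick], pick)
    raise AssertionError('no rule for ' + repr(e))  # stuck (e.g. free variable)
```

For example, `eval_big(('app', ('lam', 'x', ('succ', ('var', 'x'))), ('num', 2)))` evaluates to `('num', 3)`.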
In summary, to see a typical big-step semantics as an instance of our definition, it is enough to assume an order (or more than one) on premises, make implicit the axiom for results, and add a dummy continuation when needed. In the examples (Sect. 5), we will assume a left-to-right order on premises, and omit dummy continuations to keep a more familiar style. In the technical part (Sect. 3, Sect. 4 and Sect. 6) we will adopt the inline format.

Extended semantics
In the following, we assume a big-step semantics ⟨C, R, R⟩ and describe two constructions which make the distinction between non-termination and stuck computation explicit. In both cases, the approach is based on well-known ideas; the novel contribution is that, thanks to the meta-theory in Sect. 2, we provide a general construction working on an arbitrary big-step semantics.

Traces
We denote by C ⋆ , C ω , and C ∞ = C ⋆ ∪C ω , respectively, the sets of finite, infinite, and possibly infinite traces, that is, sequences of configurations. We write t · t ′ for concatenation of t∈C ⋆ with t ′ ∈C ∞ .
We derive, from the judgement c ⇒ r, an enriched big-step judgement c ⇒tr t with t ∈ C∞. Intuitively, t keeps trace of all the configurations visited during the evaluation, starting from c itself. To define the trace semantics, we construct, starting from R, a new set of rules Rtr, which are of two kinds:
trace introduction These rules enrich the standard semantics by finite traces: for each ρ ≡ rule(j1 ... jn, jn+1, c) in R, and finite traces t1, ..., tn+1 ∈ C⋆, we add a rule with premises C(ji) ⇒tr ti · R(ji), for i ∈ 1..n+1, and conclusion c ⇒tr c · t1 · R(j1) · ... · tn+1 · R(jn+1). We denote this rule by trace(ρ, t1, ..., tn+1), to highlight the relationship with the original rule ρ. We also add one axiom r ⇒tr r for each result r.

Fig. 2. Trace semantics for application
divergence propagation These rules propagate divergence from a premise: for each ρ ≡ rule(j1 ... jn, jn+1, c) in R, index i ∈ 1..n+1, finite traces t1, ..., ti−1 ∈ C⋆, and infinite trace t ∈ Cω, we add a rule whose premises state that the first i−1 premises converge, with traces t1, ..., ti−1, and that C(ji) diverges with trace t, and whose conclusion is c ⇒tr c · t1 · R(j1) · ... · ti−1 · R(ji−1) · t. We denote this rule by prop(ρ, i, t1, ..., ti−1, t) to highlight the relationship with the original rule ρ. These rules derive judgements c ⇒tr t with t ∈ Cω, modelling diverging computations.
The inference system R tr must be interpreted coinductively, to properly model diverging computations. Indeed, since there is no axiom introducing an infinite trace, they can be derived only by an infinite proof tree. We write R tr ⊢ c ⇒ tr t when the judgment c ⇒ tr t is derivable in R tr .
We show in Fig. 2 the rules obtained starting from meta-rule (app) of the example (for other meta-rules the outcome is analogous).
Note that only the judgment Ω ⇒tr tΩ can be derived, that is, the trace semantics of Ω ≡ ω ω, with ω ≡ λx.x x, is uniquely determined to be tΩ, since the infinite proof tree forces the equation tΩ = Ω · ω · ω · tΩ. This example is a cyclic proof, but there are divergent computations with no circular derivation. The trace construction is conservative with respect to the original semantics, that is, converging computations are not affected.
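For converging computations the trace construction can be mirrored operationally; the sketch below (ours, restricted for brevity to constants, successor and choice, with plain integers as results) returns, together with the result, the finite trace built exactly as in the trace-introduction rules: the conclusion configuration, followed by each premise's trace ending in that premise's result.

```python
def eval_trace(e, pick=0):
    """Return (result, trace) for a converging evaluation; `pick`
    resolves ('choice', e1, e2) deterministically."""
    if isinstance(e, int):
        return e, [e]                    # axiom r =>tr r
    if e[0] == 'succ':
        v, t = eval_trace(e[1], pick)    # dependency trace, ends with v
        cont = v + 1                     # dummy continuation n+1 =>tr n+1
        return cont, [e] + t + [cont]
    if e[0] == 'choice':                 # no dependencies, only continuation
        v, t = eval_trace(e[1 + pick], pick)
        return v, [e] + t
    raise AssertionError('stuck: ' + repr(e))
```

For instance, `eval_trace(('succ', ('succ', 0)))` returns `(2, [('succ', ('succ', 0)), ('succ', 0), 0, 1, 2])`: every visited (sub)configuration appears, in evaluation order, ending with the final result.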

Wrong
A well-known technique [33] (Exercise 3.5.16) to distinguish between stuck and diverging computations, in a sense "dual" to the previous one, is to add a special result wrong, so that c ⇒ wrong means that the evaluation of c goes stuck.
In this case, defining an "automatic" version of the construction, starting from ⟨C, R, R⟩, is a non-trivial problem. Our solution is based on a relation on rules, modelling equality up to a certain index i, also used for other aims in the following. Consider ρ ≡ rule(j1 ... jn, jn+1, c), ρ′ ≡ rule(j′1 ... j′m, j′m+1, c′), and an index i ∈ 1..min(n+1, m+1); then ρ ∼i ρ′ holds if c = c′ and jh = j′h for all h < i. Intuitively, this means that rules ρ and ρ′ model the same computation until the i-th premise.
Using this relation, we derive, from the judgment c ⇒ r, an enriched big-step judgement c ⇒ rwr where rwr ∈ R ∪ {wrong}, defined by a set of rules Rwr containing all rules in R and two other kinds of rules:
wrong introduction These rules derive wrong whenever the (sub)configuration in a premise of a rule reduces to a result which is not admitted in such (or any equivalent) rule: for each ρ ≡ rule(j1 ... jn, jn+1, c) in R, index i ∈ 1..n+1, and result r ∈ R, if for all rules ρ′ such that ρ ∼i ρ′, R(ρ′, i) ≠ r, then we add the rule wrong(ρ, i, r), with premises j1, ..., ji−1, C(ji) ⇒ r and conclusion c ⇒ wrong. We also add an axiom c ⇒ wrong for each configuration c which is not the conclusion of any rule.
wrong propagation These rules propagate wrong analogously to those for divergence propagation: for each ρ ≡ rule(j1 ... jn, jn+1, c) in R, and index i ∈ 1..n+1, we add the rule prop(ρ, i, wrong), with premises j1, ..., ji−1, C(ji) ⇒ wrong and conclusion c ⇒ wrong.
We write Rwr ⊢ c ⇒ rwr when the judgment c ⇒ rwr is derivable in Rwr. We show in Fig. 3 the meta-rules for wrong introduction and propagation constructed starting from those for application and successor. For instance, rule (wrong-app) is introduced since in the original semantics there is rule (app) with e1 e2 in the consequence and e1 in the first premise, but there is no equivalent rule (that is, with e1 e2 in the consequence and e1 in the first premise) such that the result in the first premise is n.
The wrong construction is conservative as well.
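Operationally, the wrong construction corresponds to an evaluator that returns the extra result as soon as a premise produces a result no (equivalent) rule admits, or no rule applies at all; a sketch (ours, for the calculus of Fig. 1, with a hypothetical tuple encoding of expressions):

```python
WRONG = ('wrong',)

def is_value(e):
    return e[0] in ('num', 'lam')

def subst(e, x, v):
    """e[v/x]; v is assumed closed, so no capture can occur."""
    tag = e[0]
    if tag == 'var':
        return v if e[1] == x else e
    if tag == 'lam':
        return e if e[1] == x else ('lam', e[1], subst(e[2], x, v))
    if tag == 'num':
        return e
    return (tag,) + tuple(subst(sub, x, v) for sub in e[1:])

def eval_wr(e):
    if is_value(e):
        return e
    tag = e[0]
    if tag == 'app':
        f = eval_wr(e[1])
        if f is WRONG: return WRONG           # wrong propagation
        if f[0] != 'lam': return WRONG        # wrong introduction: no rule
        a = eval_wr(e[2])                     #   admits a non-lambda here
        if a is WRONG: return WRONG
        return eval_wr(subst(f[2], f[1], a))
    if tag == 'succ':
        v = eval_wr(e[1])
        if v is WRONG: return WRONG           # wrong propagation
        if v[0] != 'num': return WRONG        # wrong introduction
        return ('num', v[1] + 1)
    if tag == 'choice':
        return eval_wr(e[1])                  # one computation, for brevity
    return WRONG   # axiom c => wrong: no rule has c in the conclusion
```

Note that the check on the first premise of (app) fires before e2 is evaluated, matching the construction: if e1 goes stuck and e2 diverges, the outcome with (app) is wrong.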

Expressing and proving soundness
A predicate (for instance, a typing judgment) is sound when, informally, a program satisfying the predicate (e.g., a well-typed program) cannot go wrong, following Robin Milner's slogan [31]. In small-step style, as first formulated in [40], this is naturally expressed as follows: well-typed programs never reduce to terms which neither are values nor can be further reduced (called stuck terms). The standard technique to ensure soundness is by subject reduction (well-typedness is preserved by reduction) and progress (a well-typed term is not stuck).
We discuss how soundness can be expressed for the two approaches previously presented and we introduce sufficient conditions. In other words, we provide a proof technique to show the soundness of a predicate with respect to a big-step semantics. As mentioned in the Introduction, the extended semantics is only needed to prove the correctness of the technique, whereas to apply the technique for a given big-step semantics it is enough to reason on the original rules.

Expressing soundness
In the following, we assume a big-step semantics ⟨C, R, R⟩, and an indexed predicate on configurations, that is, a family Π = (Πι)ι∈I, for I a set of indexes, with Πι ⊆ C. A representative case is that, as in the examples of Sect. 5, the predicate is a typing judgment and the indexes are types; however, the proof technique could be applied to other kinds of predicates. When there is no ambiguity, we also denote by Π the corresponding predicate ⋃ι∈I Πι on C (e.g., to be well-typed with an arbitrary type).
To discuss how to express soundness of Π, first of all note that, in the non-deterministic case (that is, there is possibly more than one computation for a configuration), we can distinguish two flavours of soundness [21]:
soundness-must (or simply soundness) no computation can be stuck
soundness-may at least one computation is not stuck
Soundness-must is the standard soundness in small-step semantics, and can be expressed in the wrong extension as follows:
soundness-must (wrong) If c ∈ Π, then Rwr ⊬ c ⇒ wrong
Instead, soundness-must cannot be expressed in the trace extension, since stuck computations are not explicitly modelled there. Conversely, soundness-may can be expressed in the trace extension as follows:
soundness-may (traces) If c ∈ Π, then there is t such that Rtr ⊢ c ⇒tr t
whereas it cannot be expressed in the wrong semantics, since diverging computations are not modelled.
Of course soundness-must and soundness-may coincide in the deterministic case. Finally, note that indexes (e.g., the specific types of configurations) do not play any role in the above statements. However, they are relevant in the notion of strong soundness, introduced by [40]. Strong soundness holds if, for configurations satisfying Π ι (e.g., having a given type), computation cannot be stuck, and moreover, produces a result satisfying Π ι (e.g., of the same type) if terminating. Note that soundness alone does not even guarantee to obtain a result satisfying Π (e.g., a well-typed result). The three conditions introduced in the following section actually ensure strong soundness.
In Sect. 4.2 we provide sufficient conditions for soundness-must, showing that they actually ensure soundness in the wrong semantics (Theorem 3). Then, in Sect. 4.3, we provide (weaker) sufficient conditions for soundness-may, and show that they actually ensure soundness-may in the trace semantics (Theorem 4).
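The must/may distinction can be observed concretely by collecting all outcomes over the branches of the non-deterministic choice; in this sketch (ours, for a fragment of the calculus of Fig. 1) soundness-must demands that `('wrong',)` never appears in the outcome set of a configuration satisfying the predicate, while soundness-may only demands that something else does.

```python
WRONG = ('wrong',)

def is_value(e):
    return e[0] in ('num', 'lam')

def outcomes(e):
    """All possible outcomes of e, exploring both branches of every choice."""
    if is_value(e):
        return {e}
    tag = e[0]
    if tag == 'succ':
        return {('num', v[1] + 1) if v[0] == 'num' else WRONG
                for v in outcomes(e[1])}
    if tag == 'choice':
        return outcomes(e[1]) | outcomes(e[2])
    return {WRONG}          # no rule applies: stuck computation

# One branch is stuck, the other converges: soundness-may can still hold
# for this configuration while soundness-must fails.
mixed = ('choice', ('succ', ('num', 0)), ('succ', ('lam', 'x', ('var', 'x'))))
```

Here `outcomes(mixed)` is `{('num', 1), ('wrong',)}`.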

Conditions ensuring soundness-must
The three conditions which ensure the soundness-must property are local preservation, ∃-progress, and ∀-progress. The names suggest that the former plays the role of the type preservation (subject reduction) property, and the latter two of the progress property in small-step semantics. However, as we will see, the correspondence is only rough, since the reasoning here is different.
Considering the first condition more closely, we use the name preservation rather than type preservation since, as already mentioned, the proof technique can be applied to arbitrary predicates. More importantly, local means that the condition is on single rules rather than on the semantic relation as a whole, as standard subject reduction is. The same holds for the other two conditions.
Definition 1 (S1: Local Preservation). For each ρ ≡ rule(j1 ... jn, jn+1, c), if c ∈ Πι, then there exist ι1, ..., ιn+1 ∈ I, with ιn+1 = ι, such that, for all k ∈ 1..n+1: if R(jh) ∈ Πιh for all h < k, then C(jk) ∈ Πιk.
Thinking of the paradigmatic case where the indexes are types: for each rule ρ, if the configuration c in the consequence has type ι, we have to find types ι1, ..., ιn+1 which can be assigned to (the configurations in) the premises, in particular the same type as c for the continuation. More precisely, we start finding type ι1, and successively find the type ιk for (the configuration in) the k-th premise assuming that the results of all the previous premises have the expected types. Indeed, if all such previous premises are derivable, then the expected type should be preserved by their results; if some premise is not derivable, the considered rule is "useless". For instance, considering (an instantiation of) meta-rule (app), the continuation e[v2/x] has the type T of e1 e2 under the assumption that λx.e has type T′ → T and v2 has type T′ (see the proof example in Sect. 5.1 for more details). A counter-example to condition S1 is discussed at the beginning of Sect. 5.3.
The following lemma states that local preservation actually implies preservation of the semantic relation as a whole.
Lemma 1 (Preservation). Let R and Π satisfy condition S1. If c ∈ Πι and R ⊢ c ⇒ r, then r ∈ Πι.
The following proposition is a form of local preservation where indexes (e.g., specific types) are not relevant, simpler to use in the proofs of Theorems 3 and 4. Proposition 1. Let R and Π satisfy condition S1. For each rule(j 1 . . . j n , j n+1 , c) and k ∈ 1..n + 1, if c ∈ Π and, for all h < k, R ⊢ j h , then C (j k ) ∈ Π.
The second condition, named ∃-progress, ensures that, for configurations satisfying the predicate Π (e.g., well-typed), we can start constructing a proof tree.
Definition 2 (S2: ∃-progress). For each c ∈ Π\R, there is a rule ρ ∈ R such that C(ρ) = c.
The third condition, named ∀-progress, ensures that, for configurations satisfying Π, we can continue constructing the proof tree. This condition uses the notion of rules equivalent up-to an index introduced at the beginning of Sect. 3.2.
Definition 3 (S3: ∀-progress). For each ρ ≡ rule(j1 ... jn, jn+1, c), if c ∈ Π and, for some k ∈ 1..n+1, R ⊢ jh for all h < k and R ⊢ C(jk) ⇒ r, then there is a rule ρ′ such that ρ ∼k ρ′ and R(ρ′, k) = r.
We have to check, for each rule ρ, the following: if the configuration c in the consequence satisfies the predicate (e.g., is well-typed), then, for each k, if the configuration in premise k evaluates to some result r (that is, R ⊢ C(jk) ⇒ r), then there is a rule (ρ itself or another rule with the same configuration in the consequence and the same first k−1 premises) with such judgment as k-th premise. This check can be done under the assumption that all the previous premises are derivable. For instance, consider again (an instantiation of) the meta-rule (app). Assuming that e1 evaluates to some v1, we have to check that there is a rule with first premise e1 ⇒ v1, in practice, that v1 is a λ-abstraction; in general, checking S3 for a (meta-)rule amounts to showing that (sub)configurations in the premises evaluate to results with the required shape (see also the proof example in Sect. 5.1).
Soundness-must in wrong semantics Recall that Rwr is the extension of R with wrong (Sect. 3.2). We prove the claim of soundness-must with respect to Rwr.
Theorem 3. Let R and Π satisfy conditions S1, S2 and S3. If c ∈ Π, then Rwr ⊬ c ⇒ wrong.
Proof. To prove the statement, we assume R wr ⊢ c ⇒ wrong and look for a contradiction. The proof is by induction on the derivation of c ⇒ wrong. If the last applied rule is an axiom, then, by construction, there is no rule ρ ∈ R such that C (ρ) = c, and this violates condition S2, since c ∈ Π. If the last applied rule is wrong(ρ, i, r ), with ρ ≡ rule(j 1 . . . j n , j n+1 , c), then, by hypothesis, for all k < i, R wr ⊢ j k , and R wr ⊢ C (j i ) ⇒ r , and these judgments can also be derived in R by conservativity (Theorem 2). Furthermore, by construction of this rule, we know that there is no other rule ρ ′ ∼ i ρ such that R(ρ ′ , i) = r , and this violates condition S3, since c ∈ Π. If the last applied rule is prop(ρ, i, wrong), with ρ ≡ rule(j 1 . . . j n , j n+1 , c), then, by hypothesis, for all k < i, R wr ⊢ j k , and these judgments can also be derived in R by conservativity. Then, by Prop. 1 (which requires condition S1), since c ∈ Π, we have C (j i ) ∈ Π, hence we get the thesis by induction hypothesis.
Sect. 5.1 ends with examples not satisfying properties S2 and S3.

Conditions ensuring soundness-may
As discussed in Sect. 4.1, in the trace semantics we can only express a weaker form of soundness: at least one computation is not stuck (soundness-may). As the reader can expect, to ensure this property weaker sufficient conditions are enough: namely, condition S1, and another condition named progress-may and defined below.
We say that c does not converge if there is no r such that R ⊢ c ⇒ r.
Definition 4 (S4: progress-may). For each c ∈ Π\R, there is ρ ≡ rule(j1 ... jn, jn+1, c) such that: if there is a (first) k ∈ 1..n+1 such that jk is not derivable in R and, for all h < k, R ⊢ jh, then C(jk) does not converge.
This condition can be informally understood as follows: we have to show that there is an either finite or infinite computation for c. If we find a rule where all premises are derivable (no such k), then there is a finite computation. Otherwise, c does not converge. In this case, we should find a rule where the configuration in the first non-derivable premise k does not converge as well. Indeed, by coinductive reasoning (use of Lemma 2 below), we obtain that c diverges. The following proposition states that this condition is indeed a weakening of S2 and S3.
Proposition 2. Conditions S2 and S3 imply condition S4.
Soundness-may in trace semantics Recall that R tr is the extension of R with traces, defined in Sect. 3.1, where judgements have shape c ⇒ tr t, with t ∈ C ∞ .
The following lemma provides a proof principle useful to coinductively show that a property ensures the existence of an infinite trace, in particular to show Theorem 4. It is a slight variation of an analogous principle presented in [8].
Lemma 2. Let S ⊆ C be a set. If, for all c ∈ S, there are ρ ≡ rule(j 1 . . . j n , j n+1 , c) and k ∈ 1..n + 1 such that 1. for all h < k, R ⊢ j h , and 2. C (j k ) ∈ S then, for all c ∈ S, there is t ∈ C ω such that R tr ⊢ c ⇒ tr t.
Theorem 4. Let R and Π satisfy conditions S1 and S4. If c ∈ Π, then there is t such that R tr ⊢ c ⇒ tr t.
Proof. First note that, thanks to Theorem 1, the statement is equivalent to the following: if c ∈ Π and c does not converge, then there is t ∈ Cω such that Rtr ⊢ c ⇒tr t. Then, the proof follows from Lemma 2. We define S = {c | c ∈ Π and c does not converge}, and show that, for all c ∈ S, there are ρ ≡ rule(j1 ... jn, jn+1, c) and k ∈ 1..n+1 such that, for all h < k, R ⊢ jh, and C(jk) ∈ S.
Consider c ∈ S; then, by S4, there is ρ ≡ rule(j1 ... jn, jn+1, c). By definition of S, c does not converge, hence there exists a (first) k ∈ 1..n+1 such that jk is not derivable, since, otherwise, we would have R ⊢ c ⇒ R(jn+1). Then, since k is the first index with this property, for all h < k, we have R ⊢ jh, hence, again by condition S4, C(jk) does not converge. Finally, since for all h < k we have R ⊢ jh, by Prop. 1, we get C(jk) ∈ Π, hence C(jk) ∈ S, as needed.

Examples
Sect. 5.1 explains in detail how a typical soundness proof can be rephrased in terms of our technique, by reasoning directly on big-step rules. Sect. 5.2 shows a case where this is advantageous, since the property to be checked is not preserved by intermediate computation steps, whereas it holds for the final result. Sect. 5.3 considers a more sophisticated type system, with intersection and union types. Finally, Sect. 5.4 shows another example where subject reduction is not preserved, whereas soundness can be proved with our technique. This example is intended as a preliminary step towards a more challenging case.

Simply-typed λ-calculus with recursive types
As a first example, we take the λ-calculus with natural constants, successor, and choice used in Sect. 2 (Fig. 1). We consider a standard simply-typed version with recursive types, obtained by interpreting the production in Fig. 4 coinductively. Introducing recursive types makes the calculus non-normalising and makes it possible to write interesting programs such as Ω (see Sect. 3.1).
The typing rules are recalled in Fig. 4. Type environments, written Γ , are finite maps from variables to types, and Γ {T /x } denotes the map which returns T on x and coincides with Γ elsewhere. We write ⊢ e : T for ∅ ⊢ e : T .
Let R1 be the big-step semantics defined in Fig. 1, and let Π1T(e) hold if ⊢ e : T, for T defined in Fig. 4. We prove soundness by checking the three conditions of Sect. 4.2.
Theorem 5 (Soundness). The big-step semantics R1 and the indexed predicate Π1 satisfy the conditions S1, S2 and S3 of Sect. 4.2.
Since the aim of this first example is to illustrate the proof technique, we provide a proof where we explain the reasoning in detail.
Proof of S1. We should prove this condition for each (instantiation of meta-)rule.
(app): Assume that ⊢ e1 e2 : T holds. We have to find types for the premises, notably T for the last one. We proceed as follows: by inversion (Lemma 3), ⊢ e1 : T′ → T and ⊢ e2 : T′ hold for some T′, so we find T′ → T and T′ as types for the first two premises; for the continuation, assuming that λx.e has type T′ → T and v2 has type T′, by inversion x:T′ ⊢ e : T holds, hence, by a standard substitution lemma, ⊢ e[v2/x] : T holds as well.
(succ): This rule has an implicit continuation n+1 ⇒ n+1. Assume that ⊢ succ e : T holds. By Lemma 3 (5), T = Nat, and ⊢ e : Nat, hence we find Nat as type for the first premise. Moreover, ⊢ n+1 : Nat holds by rule (t-const).
(choice): Assume that ⊢ e 1 ⊕ e 2 : T holds. By Lemma 3 (6), we have ⊢ e i : T , with i ∈ 1, 2. Hence we find T as type for the premise.
Proof of S2. We should prove that, for each non-result configuration (here, an expression e which is not a value) such that ⊢ e : T holds for some T, there is a rule with this configuration in the consequence. The expression e cannot be a variable, since a variable cannot be typed in the empty environment. Application, successor and choice all appear as the consequence of some evaluation rule in Fig. 1.
Proof of S3. We should prove this condition for each (instantiation of meta-)rule.
(app): Assume ⊢ e1 e2 : T, so that, by inversion (Lemma 3), ⊢ e1 : T′ → T and ⊢ e2 : T′ hold for some T′. 1. First premise: if e1 ⇒ v is derivable, then there should be a rule with e1 e2 in the consequence and e1 ⇒ v as first premise. Since we proved S1, by preservation (Lemma 1) ⊢ v : T′ → T holds. Then, by Lemma 5 (1), v has shape λx.e, hence the required rule exists. As noted in Sect. 4.2, in practice checking S3 for a (meta-)rule amounts to showing that (sub)configurations in the premises evaluate to results which have the required shape (to be a λ-abstraction in this case). 2. Second premise: if e1 ⇒ λx.e and e2 ⇒ v2, then there should be a rule with e1 e2 in the consequence and e1 ⇒ λx.e, e2 ⇒ v2 as first two premises. This is trivial since the meta-variable v2 can be freely instantiated in the meta-rule.
(succ): Assuming ⊢ succ e : T , again by Lemma 3 (5) we get ⊢ e : Nat. If e ⇒ v is derivable, there should be a rule with succ e in the consequence and e ⇒ v as first premise. Indeed, by preservation (Lemma 1) and Lemma 5 (2), v has shape n. For the second premise, if n + 1 ⇒ v is derivable, then v is necessarily n + 1.
(choice): Trivial since the meta-variable v can be freely instantiated.
An interesting remark is that, differently from the standard approach, there is no induction in the proof: everything is by cases. This is a consequence of the fact that, as discussed in Sect. 4.2, the three conditions are local, that is, they are conditions on single rules. Induction is "hidden" in the proof that those three conditions are sufficient to ensure soundness.
If we drop in Fig. 1 rule (succ), then condition S2 fails, since there is no longer a rule for the well-typed non-result configuration succ n. If we add the (fool) rule ⊢ 0 0 : Nat, then condition S3 fails for rule (app), since 0 ⇒ 0 is derivable, but there is no rule with 0 0 in the conclusion and 0 ⇒ 0 as first premise.
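The predicate Π1 can itself be sketched as an executable checker for the simple-types fragment (ours; it ignores recursive types, and λ-abstractions carry a hypothetical explicit parameter-type annotation, which the calculus itself does not need):

```python
def typeof(e, env=None):
    """Return the type of e ('Nat' or ('->', T1, T2)), or None if ill-typed."""
    env = env or {}
    tag = e[0]
    if tag == 'num':
        return 'Nat'
    if tag == 'var':
        return env.get(e[1])
    if tag == 'succ':
        return 'Nat' if typeof(e[1], env) == 'Nat' else None
    if tag == 'lam':                     # ('lam', x, T1, body), annotated
        body_t = typeof(e[3], {**env, e[1]: e[2]})
        return ('->', e[2], body_t) if body_t is not None else None
    if tag == 'app':                     # argument type must match exactly
        t1, t2 = typeof(e[1], env), typeof(e[2], env)
        return t1[2] if isinstance(t1, tuple) and t1[:2] == ('->', t2) else None
    if tag == 'choice':                  # both branches must agree
        t1 = typeof(e[1], env)
        return t1 if t1 is not None and t1 == typeof(e[2], env) else None
    return None
```

Soundness then predicts that any closed e with `typeof(e)` different from `None` cannot get stuck.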

MiniFJ&λ
In this example, the language is a subset of FJ&λ [12], a calculus extending Featherweight Java (FJ) with the λ-abstractions and intersection types introduced in Java 8. To keep the example small, we do not consider intersections and focus on one key typing feature: λ-abstractions can only be typed when occurring in a context requiring a given type (called the target type). In a small-step semantics, this poses a problem: reduction can move λ-abstractions into arbitrary contexts, leading to intermediate terms which would be ill-typed. To maintain subject reduction, in [12] λ-abstractions are decorated with their initial target type. In a big-step semantics, there is no need for intermediate terms or annotations.
The syntax is given in the first part of Fig. 5. We assume sets of variables x , class names C, interface names I, J, field names f, and method names m.
Interfaces which have exactly one method (dubbed functional interfaces) can be used as target types. Expressions are those of FJ, plus λ-abstractions, and types are class and interface names. In λxs.e we assume that xs is not empty and that e is not a λ-abstraction. For simplicity, we only consider upcasts, which have no runtime effect, but are important to allow the programmer to use λ-abstractions, as exemplified when discussing the typing rules.
To be concise, the class table is abstractly modelled as follows:
- fields(C) gives the sequence of field declarations T1 f1; ... Tn fn; for class C
- mtype(T, m) gives, for each method m in class or interface T, the pair T1 ... Tn → T′ consisting of the parameter types and return type
- mbody(C, m) gives, for each method m in class C, the pair x1 ... xn, e consisting of the parameters and body
- <: is the reflexive and transitive closure of the union of the extends and implements relations
- mtype(I) gives, for each functional interface I, mtype(I, m), where m is the only method of I.
The big-step semantics is given in the last part of Fig. 5. MiniFJ&λ shows an example of instantiation of the framework where configurations include an auxiliary structure, rather than being just language terms. In this case, the structure is an environment e (a finite map from variables to values) modelling the current stack frame. Results are values, which are either objects, of shape [vs]C, or λ-abstractions.
Rules for FJ constructs are straightforward. Note that, since we only consider upcasts, casts have no runtime effect. Indeed, they are guaranteed to succeed on well-typed expressions. Rule (λ-invk) shows that, when the receiver of a method is a λ-abstraction, the method name is not significant at runtime, and the effect is that the body of the function is evaluated as in the usual application.
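As an illustration of evaluation with environments, the following Haskell sketch (a toy rendering with a hypothetical one-method class table, not the paper's formal rules) implements object construction, method invocation on an object receiver, and the (λ-invk) behaviour, where the method name is ignored when the receiver is a λ-abstraction. We assume λ-bodies use only their parameters, so closures need not capture the environment.

```haskell
import qualified Data.Map as Map

-- Toy MiniFJ&λ-like fragment: configurations pair an expression with an
-- environment (stack frame); values are objects [vs]C or λ-abstractions.
data Expr = Var String | New String [Expr] | Invk Expr String [Expr]
          | Lam [String] Expr
  deriving (Eq, Show)

data Value = Obj String [Value] | Clos [String] Expr
  deriving (Eq, Show)

type Env = Map.Map String Value

-- mbody(C, m): parameters and body of m in class C.  Hypothetical class
-- table: class C has a single method id returning its argument.
mbody :: String -> String -> Maybe ([String], Expr)
mbody "C" "id" = Just (["y"], Var "y")
mbody _   _    = Nothing

eval :: Env -> Expr -> Maybe Value
eval env (Var x)    = Map.lookup x env
eval env (New c es) = Obj c <$> mapM (eval env) es
eval _   (Lam xs e) = Just (Clos xs e)  -- bodies only use their parameters
eval env (Invk e m es) = do
  recv <- eval env e
  vs   <- mapM (eval env) es
  case recv of
    -- (invk): run the body in a fresh frame binding parameters and 'this'
    Obj c _   -> do (xs, body) <- mbody c m
                    eval (Map.fromList (("this", recv) : zip xs vs)) body
    -- (λ-invk): the method name is irrelevant; just apply the function
    Clos xs b -> eval (Map.fromList (zip xs vs)) b
```

Note how the second Invk branch never inspects the method name, matching the informal reading of rule (λ-invk) above.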
The type system is given in Fig. 6. Method bodies are expected to be well-typed with respect to method types. Formally, mbody(C, m) and mtype(C, m) are either both defined or both undefined; in the first case mbody(C, m) = x1 ... xn, e, mtype(C, m) = T1 ... Tn → T, and x1:T1, ..., xn:Tn, this:C ⊢ e : T. Moreover, we assume other standard FJ constraints on the class table, such as no field hiding, no method overloading, and the same parameter and return types in overriding.
Besides the standard typing features of FJ, the MiniFJ&λ type system ensures the following.
- A functional interface I can be assigned as type to a λ-abstraction which has the functional type of the method, see rule (t-λ).
- A λ-abstraction should have a target type determined by the context where the λ-abstraction occurs. More precisely, see [25] page 602, a λ-abstraction in our calculus can only occur as the return expression of a method or as an argument of a constructor, method call or cast. Hence, in some contexts a λ-abstraction cannot be typed: in our calculus, when occurring as the receiver in a field access or method invocation. These cases should be prevented: this is implicit in rule (t-field-access), since the type of the receiver should be a class name, whereas it is explicitly forbidden in rule (t-invk). For the same reason, a λ-abstraction cannot be the main expression to be evaluated.
- A λ-abstraction with a given target type J should have type exactly J: a subtype I of J is not enough. Consider, for instance, the following program: Here, the λ-abstraction has target type J, which is not a functional interface, hence the expression is ill-typed in Java (the compiler has no functional type against which to typecheck the λ-abstraction). On the other hand, in the body of method m, the parameter y of type I can be passed, as usual, to method n expecting a supertype. For instance, the main expression new C().m(λx.x) is well-typed, since the λ-abstraction has target type I, and can be safely passed to method n, since it is not used as a function there. To formalise this behaviour, it is forbidden to apply subsumption to λ-abstractions, see rule (t-sub).
- However, λ-abstractions occurring as results rather than in source code (that is, in the environment and as fields of objects) are allowed to have a subtype of the required type, see the explicit side condition in rules (t-conf) and (t-object).
For instance, if C is a class with one field J f, the expression new C((I)λx.x) is well-typed, whereas new C(λx.x) is ill-typed, since rule (t-sub) cannot be applied to λ-abstractions. When the expression is evaluated, the result is [λx.x]C, which is well-typed.
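The role of the (t-sub) restriction can be rendered as a small checker (our own toy approximation of the type system, assuming a hypothetical class C with one field of type J, and a functional interface I with I <: J, as in the example above): subsumption is available everywhere except on λ-abstractions, which must be checked against exactly a functional target type.

```haskell
-- Toy types and expressions; not the paper's formal system.
data Type = ClassT String | IfaceT String deriving (Eq, Show)
data Expr = New String [Expr] | Lam String Expr | Cast Type Expr
          | Var String
  deriving (Eq, Show)

-- Assumed subtyping: interface I extends interface J.
subtype :: Type -> Type -> Bool
subtype t u = t == u || (t, u) == (IfaceT "I", IfaceT "J")

-- Assumed functional interfaces: only I has exactly one method.
functional :: Type -> Bool
functional t = t == IfaceT "I"

-- check e t: can e be used where type t is required?
check :: Expr -> Type -> Bool
check (Lam _ _)  t = functional t             -- (t-λ): exact target type,
                                              -- no subsumption on λ
check (Cast u e) t = check e u && subtype u t -- upcast, then subsume
check (New "C" args) t =                      -- class C has one field J f
  subtype (ClassT "C") t && all (`check` IfaceT "J") args
check _ _ = False
```

The checker accepts new C((I)λx.x) and rejects new C(λx.x), matching the example in the text: the cast supplies the functional target type I, which can then be subsumed to J.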
As mentioned at the beginning, the obvious small-step semantics would produce untypable expressions. In the above example, we get new C((I)λx.x) → new C(λx.x) → [λx.x]C, and new C(λx.x) has no type, while new C((I)λx.x) and [λx.x]C have type C.
We write Γ ⊢ e :<: T as short for Γ ⊢ e : T′ and T′ <: T for some T′. In order to state soundness, let R2 be the big-step semantics defined in Fig. 5, and let Π2_T(⟨e, e⟩) hold if ⊢ ⟨e, e⟩ :<: T, and Π2_T(v) hold if ⊢ v :<: T, for T defined in Fig. 5.
Theorem 6 (Soundness). The big-step semantics R2 and the indexed predicate Π2 satisfy the conditions S1, S2 and S3 of Sect. 4.2.

Intersection and union types
We enrich the type system of Fig. 4 by adding intersection and union type constructors and the corresponding typing rules, see Fig. 7. As usual, we require an infinite number of arrows in each infinite path of the trees representing types. Intersection types for the λ-calculus have been widely studied [11]. Union types naturally model conditionals [26] and non-deterministic choice [22]. The typing rules for the introduction and the elimination of intersection and union are standard, except for the absence of the union elimination rule

(∨E)  Γ ⊢ e′ : T1 ∨ T2   Γ, x:T1 ⊢ e : V   Γ, x:T2 ⊢ e : V
      -----------------------------------------------------
                      Γ ⊢ e[e′/x] : V

As a matter of fact, rule (∨E) is unsound for ⊕. For example, let us split the type Nat into Even and Odd, and add the expected typings for natural numbers. The prefix addition + has type (Even → Even → Even) ∧ (Odd → Odd → Even), and we derive x:Even ⊢ + x x : Even and x:Odd ⊢ + x x : Even. Since ⊢ 1 ⊕ 2 : Even ∨ Odd, rule (∨E) would give ⊢ + (1 ⊕ 2) (1 ⊕ 2) : Even. We cannot assign the type Even to 3, which is a possible result, so strong soundness is lost. In the small-step approach, we cannot assign Even to the intermediate term + 1 2, so subject reduction fails. In the big-step approach there is no such intermediate term; however, condition S1 fails for the evaluation rule for +. Indeed, considering the instantiation of the rule with consequence + (1 ⊕ 2) (1 ⊕ 2) ⇒ 3, and the type Even for the consequence, we cannot assign this type to the (configuration in the) last premise (continuation).
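The counterexample can be replayed mechanically. The sketch below (our own toy model of ⊕ and +, not the paper's calculus) enumerates all possible results of a non-deterministic evaluator: the two occurrences of 1 ⊕ 2 are evaluated independently and may choose differently, so + (1 ⊕ 2) (1 ⊕ 2) can produce the odd result 3.

```haskell
-- Toy expressions: numerals, non-deterministic choice, and addition.
data Expr = Num Integer | Choice Expr Expr | Add Expr Expr
  deriving (Eq, Show)

-- All possible big-step results.  Each occurrence of a Choice is
-- resolved independently, which is exactly what makes (∨E) unsound:
-- the two copies of the "same" subexpression need not agree.
results :: Expr -> [Integer]
results (Num n)        = [n]
results (Choice e1 e2) = results e1 ++ results e2
results (Add e1 e2)    = [n1 + n2 | n1 <- results e1, n2 <- results e2]
```

Here results (Add (1 ⊕ 2) (1 ⊕ 2)) contains both even and odd numbers, so no derivation can soundly assign the type Even to the whole expression.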
Intersection types allow us to derive meaningful types also for expressions containing variables applied to themselves; for example, we can derive ⊢ λx.x x : ((T → S) ∧ T) → S. With union types, all non-deterministic choices between typable expressions can be typed too, since we can derive Γ ⊢ e1 ⊕ e2 : T1 ∨ T2 from Γ ⊢ e1 : T1 and Γ ⊢ e2 : T2.
In order to state soundness, let Π3_T(e) hold if ⊢ e : T, for T defined in Fig. 7.
Theorem 7 (Soundness). The big-step semantics R1 and the indexed predicate Π3 satisfy the conditions S1, S2 and S3 of Sect. 4.2.
As the example shows, the key problem is that rule (∨E) can be applied to an expression e where the same subexpression e′ occurs more than once. In the non-deterministic case, as shown by the example in the previous section, this is unsound, since e′ can reduce to different values. In the deterministic case, instead, this is sound, but cannot be proved by subject reduction. Since with big-step semantics there are no intermediate steps to be typed, our approach seems very promising for investigating an alternative proof of soundness. Whereas we leave this challenging problem to future work, here, as a first step, we describe a (hypothetical) calculus with a much simpler version of the problematic feature.
The calculus is a variant of FJ [27] with intersection and union types. Methods have intersection types with the same return type and different parameter types, modelling a form of overloading. Union types enhance typability of conditionals. The most interesting feature is the possibility of replacing an arbitrary number of parameters with the same expression having a union type. We dub this calculus MiniFJ&O.

Fig. 8 gives the syntax, big-step semantics and typing rules of MiniFJ&O. We omit the standard big-step rule for conditional, and the typing rules for boolean constants. The subtyping relation <: is the reflexive and transitive closure of the union of the extends relation and the standard rules for union: T1 <: T1 ∨ T2 and T1 <: T2 ∨ T1. On the other hand, method types (results of the mtype function) are now intersection types, and the subtyping relation on them is the reflexive and transitive closure of the standard rules for intersection: T1 ∧ T2 <: T1 and T1 ∧ T2 <: T2. The functions fields and mbody are defined as for MiniFJ&λ. Instead, mtype(C, m) gives, for each method m in class C, an intersection type. We assume mbody(C, m) and mtype(C, m) to be either both defined or both undefined: in the first case mbody(C, m) = x1 ... xn, e and mtype(C, m) = ∧_{1≤i≤m}(C^i_1 ... C^i_n → C^i), with the body e well-typed with respect to each conjunct, as for MiniFJ&λ.

In order to state soundness, let R4 be the big-step semantics defined in Fig. 8, and let Π4_T(e) hold if ⊢ e : T, for T defined in Fig. 8.
Theorem 8 (Soundness). The big-step semantics R4 and the indexed predicate Π4 satisfy the conditions S1, S2 and S3 of Sect. 4.2.
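As an illustration, the union subtyping rules quoted above can be rendered as a small decision procedure (a sketch with a hypothetical extends relation where class B extends class A; chains of extends longer than one are not handled in this toy version).

```haskell
-- Toy types: class names and binary union; not the paper's relation.
data Type = Cls String | Union Type Type deriving (Eq, Show)

-- Assumed extends relation: class B extends A.
extends :: String -> String -> Bool
extends "B" "A" = True
extends _   _   = False

subtype :: Type -> Type -> Bool
subtype t u | t == u  = True               -- reflexivity
subtype (Cls c) (Cls d)  = extends c d     -- extends step
subtype t (Union u1 u2)  =                 -- T <: T1 ∨ T2 via either side
  subtype t u1 || subtype t u2
subtype (Union t1 t2) u  =                 -- a union is below u iff
  subtype t1 u && subtype t2 u             -- both components are
```

For instance, B <: A ∨ C holds via the left component, while A ∨ B <: A holds because both A and B are subtypes of A under the assumed extends relation.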

The partial evaluation construction
In this section, our aim is to provide a formal justification that the constructions in Sect. 3 are correct. For instance, for the wrong semantics we would like to be sure that all cases are covered. To this end, we define a third construction, dubbed pev for "partial evaluation", which makes explicit the computations of a big-step semantics, intended as the sequences of execution steps of the naturally associated evaluation algorithm. Formally, we obtain a reduction relation on approximated proof trees in which non-termination and stuck computation are distinguished, so that both soundness-must and soundness-may can be expressed.
To this end, first of all we introduce a special result ?, so that a judgment c ⇒ ? (called incomplete, whereas a judgment in R is complete) means that the evaluation of c is not completed yet. Analogously to the previous constructions, we define an augmented set of rules R? for the judgment extended with ?.

? introduction rules These rules derive ? whenever a rule is partially applied: for each rule ρ ≡ rule(j1 ... jn, jn+1, c) in R, index i ∈ 1..n + 1, and result r ∈ R, we define the rule intro?(ρ, i, r), whose premises are the first i − 1 judgments of ρ followed by the i-th configuration evaluated to r, and whose conclusion is c ⇒ ?. We also add an axiom c ⇒ ? for each configuration c ∈ C.
? propagation rules These rules propagate ?, analogously to those for divergence and wrong propagation: for each ρ ≡ rule(j1 ... jn, jn+1, c) in R and index i ∈ 1..n + 1, we add the rule prop(ρ, i, ?), whose premises are the first i − 1 judgments of ρ followed by the i-th configuration evaluating to ?, and whose conclusion is c ⇒ ?.

Finally, we consider the set T of the (finite) proof trees τ in R?. Each τ can be thought of as a partial proof, or partial evaluation, of the root configuration. In particular, we say it is complete if it is a proof tree in R (that is, it only contains complete judgments), and incomplete otherwise. We define a reduction relation on T such that, starting from the initial proof tree c ⇒ ?, we derive a sequence where, intuitively, at each step we detail the proof (evaluation). In this way, a sequence ending with a complete tree with root c ⇒ r models a terminating computation, whereas an infinite sequence (tending to an infinite proof tree) models divergence, and a stuck sequence models a stuck computation.
The one-step reduction relation →_R on T is inductively defined by the rules in Fig. 9. In this figure, #ρ denotes the number of premises of ρ, and r(τ) the root of τ. We set R?(c ⇒ u) = u where u ∈ R ∪ {?}. Finally, ∼_i is the equivalence up to an index of rules, introduced at the beginning of Sect. 3.2. As said above, each reduction step makes the proof tree "less incomplete". Notably, reduction rules apply to nodes with consequence c ⇒ ?, whereas subtrees with root c ⇒ r represent terminated evaluation. In detail:
- If the last applied rule is an axiom, and the configuration is a result r, then we can evaluate r to itself. Otherwise, we have to find a rule ρ with c in the consequence and start evaluating the first premise of such rule.
- If the last applied rule is intro?(ρ, i, r), then all subtrees are complete, hence, to continue the evaluation, we have to find another rule ρ′ having, for each k ∈ 1..i, as k-th premise the root of τ_k. Then there are two possibilities: if there is an (i + 1)-th premise, we start evaluating it; otherwise, we propagate the result r of τ_i to the conclusion.
- If the last applied rule is a propagation rule prop(ρ, i, ?), then we simply propagate the step made by τ_i.
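To convey the flavour of these reduction steps, here is a much-simplified sketch (our own toy reconstruction for a successor-only language, not the formal definition of Fig. 9): nodes carry a configuration together with either a result or the incomplete marker (here Nothing plays the role of ?), and each step opens a premise, steps inside an incomplete premise, or propagates a completed premise to the conclusion, making the tree less incomplete.

```haskell
-- Toy configurations: numerals and successor.
data Expr = Num Integer | Succ Expr deriving (Eq, Show)

-- A partial proof tree: root configuration, subtrees, and Maybe result
-- (Nothing plays the role of the incomplete result ?).
data Tree = Node Expr [Tree] (Maybe Integer) deriving (Eq, Show)

step :: Tree -> Maybe Tree
step (Node _ _ (Just _))       = Nothing       -- complete: no step
step (Node (Num n) [] Nothing) =               -- a result evaluates to itself
  Just (Node (Num n) [] (Just n))
step (Node (Succ e) [] Nothing) =              -- open the (only) premise
  Just (Node (Succ e) [Node e [] Nothing] Nothing)
step (Node (Succ e) [t@(Node _ _ (Just n))] Nothing) =
  Just (Node (Succ e) [t] (Just (n + 1)))      -- propagate premise's result
step (Node (Succ e) [t] Nothing) =             -- step inside the premise
  do t' <- step t
     Just (Node (Succ e) [t'] Nothing)
step _ = Nothing                               -- stuck

-- Iterate until the tree is complete or stuck.
run :: Tree -> Tree
run t = maybe t run (step t)
```

Starting from the initial tree with root succ (succ 0) ⇒ ?, iterating step produces a complete tree whose root result is 2, modelling a terminating computation; a diverging configuration would yield an ever-growing sequence of incomplete trees.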
In Fig. 10 we report an example of pev reduction. We end by stating that the three constructions are equivalent to each other, thus providing a coherence result for the approach. In particular, first we show that pev is conservative with respect to R, which ensures the three constructions are equivalent for finite computations. Then, we prove the traces and wrong constructions to be equivalent to pev for diverging and stuck computations, respectively, which ensures they cover all possible cases.

Related work
Modelling divergence The issue of modelling divergence in big-step semantics dates back to [18], where a stratified approach with a separate coinductive judgment for divergence is proposed; this approach is also investigated in [30].
In [5] the authors model divergence by interpreting standard big-step rules coinductively and also considering non-well-founded values. In [17] a similar technique is exploited, adding a special result modelling divergence. Flag-based big-step semantics [36] captures divergence by interpreting the same semantic rules both inductively and coinductively. In all these approaches, spurious judgments can be derived for diverging computations.
Other proposals [32,3] are inspired by the notion of definitional interpreter [37], where a counter limits the number of steps of a computation. Thus, divergence can be modelled on top of an inductive judgment: a program diverges if the timeout is raised for every value of the counter, hence divergence is not directly modelled in the definition. Instead, [20] provides a way to directly model divergence using definitional interpreters, relying on the coinductive partiality monad [16].
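The counter-based technique can be conveyed by a small sketch (our own toy example, not the actual definitional interpreters of [32,3]): the interpreter consumes fuel, so divergence appears as a timeout for every fuel value, while a stuck computation yields a distinct error outcome.

```haskell
-- Toy language: numerals, a boolean constant, successor, and a
-- configuration Loop that diverges.
data Expr = Num Integer | Tru | Succ Expr | Loop deriving (Eq, Show)

data Outcome = Value Expr | Timeout | Error deriving (Eq, Show)

-- A fuel-indexed (definitional) interpreter: Timeout when the counter
-- runs out, Error on stuck configurations.
eval :: Int -> Expr -> Outcome
eval 0    _        = Timeout
eval _    (Num n)  = Value (Num n)
eval _    Tru      = Value Tru
eval fuel (Succ e) = case eval (fuel - 1) e of
  Value (Num n) -> Value (Num (n + 1))
  Value _       -> Error          -- stuck: succ of a non-numeral
  other         -> other          -- propagate Timeout/Error
eval fuel Loop     = eval (fuel - 1) Loop   -- diverging configuration
```

Here eval k Loop is Timeout for every k, which is how divergence is recognised "externally", whereas eval k (Succ Tru) stabilises to Error once k is large enough.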
The trace semantics in Sect. 3.1 has been inspired by [29]. Divergence propagation rules are very similar to those used in [8,9] to define a big-step judgment which directly includes divergence as result. However, this direct definition relies on a non-standard notion of inference system, allowing corules [7,19], whereas for the trace semantics presented in this work standard coinduction is enough, since all rules are productive, that is, they always add an element to the trace.
Differently from all the previously cited papers, which consider specific examples, the work [2] shares with us the aim of providing a generic construction to model non-termination, based on an arbitrary big-step semantics. Ager considers a class of big-step semantics identified by a specific shape of rules, and defines, in a small-step style, a proof-search algorithm which follows the big-step rules; in this way, converging, diverging and stuck computations are distinguished. This approach is somewhat similar to our pev semantics, even though the transition system we propose is directly defined on proof trees.
There is an extensive body of work on coalgebraic techniques, where the difference between semantics can be simply expressed by a change of functor. In this paper we take a set-theoretic approach, simple and accessible to a large audience. Furthermore, as far as we know [38], coalgebras abstract several kinds of transition systems, thus being closer to a small-step approach. In our understanding, the coalgebra models a single computation step with possible effects, and from this it is possible to derive a unique morphism into the final coalgebra modelling the "whole" semantics. Our trace semantics, being big-step, seems to correspond roughly to directly obtaining this whole semantics. In other words, we do not have a coalgebra structure on configurations.
Proving soundness As we have discussed, proving (type) soundness with respect to a big-step semantics is also a challenging task, and several approaches have been proposed in the literature. In [24], to show soundness with respect to a big-step semantics, the authors prove a coverage lemma, which ensures that the rules cover all cases, including error situations. In [30] the authors prove a soundness property similar to Theorem 4, but using a separate judgment to represent divergence, thus avoiding traces. In [5] there is a proof of soundness of a coinductive type system with respect to a coinductive big-step semantics for a Java-like language, defining a relation between derivations in the type system and in the big-step semantics. In [8] there is a proof principle, used to show type soundness with respect to a big-step semantics defined by an inference system with corules [7]. In [4] the proof of type soundness of a calculus formalising path-dependent types relies on a big-step semantics, while in [3] soundness is shown for the polymorphic type system F<: and for the DOT calculus, using definitional interpreters to model the semantics. In both cases the authors extend the original semantics by adding error and timeout, and adopt inductive proof strategies, as in [39]. A similar approach is followed by [32] to show type soundness for Core ML.
Also [6] proposes an inductive proof of type soundness for the big-step semantics of a Java-like language, but relying on a notion of approximation of infinite derivations in the big-step semantics.
Pretty big-step semantics [17] aims at providing an efficient representation of big-step semantics, so that it can be easily extended without duplication of meta-rules. In order to define and prove soundness, the authors propose a generic error rule based on a progress judgment, whose definition can be easily derived manually from the set of evaluation rules. This is partly similar to our wrong extension, with two main differences. First, by factorising rules, they introduce intermediate steps as in small-step semantics; hence there are similar problems when intermediate steps are ill-typed (as in Sect. 5.2 and Sect. 5.4). Second, wrong introduction is handled by the progress judgment, that is, at the level of side conditions. Moreover, in [13] there is a formalisation of the pretty-big-step rules for performing generic reasoning on big-step semantics by means of abstract interpretation. However, the authors state that they interpret rules inductively, hence non-terminating computations are not modelled.
Finally, some (but not all) infinite trees of our trace semantics can be seen as cyclic proof trees; see the end of Sect. 3.1. Proof systems supporting cyclic proofs can be found, e.g., in [14,15] for classical first-order logic with inductive definitions.

Conclusion and future work
The most important contribution is a general approach for reasoning about soundness with respect to a big-step operational semantics. The conditions can be proved by a case analysis on the semantic (meta-)rules, avoiding small-step-style intermediate configurations. This can be crucial, since there are calculi where the property to be checked is not preserved by such intermediate configurations, whereas it holds for the final result, as illustrated in Sect. 5.
In future work, we plan to use the meta-theory in Sect. 2 as a basis to investigate yet other constructions, notably the approach relying on corules [8,9], and the one based on a timeout counter [32,3].
We also plan to compare our proof technique for proving soundness with the standard one for small-step semantics: if a predicate satisfies progress and subject reduction with respect to a small-step semantics, does it satisfy our soundness conditions with respect to an equivalent big-step semantics? To formally prove such a statement, the first step will be to express the equivalence between small-step and big-step semantics. The converse, instead, does not hold, as shown by the examples in Sect. 5.2 and Sect. 5.4. As for significant applications, we plan to use the approach to prove soundness for the λ-calculus with full reduction and intersection/union types [10]. The interest of this example lies in the failure of subject reduction, as discussed in Sect. 5.4. In another direction, we want to enhance MiniFJ&O with λ-abstractions, allowing intersection and union types everywhere [23]. This will extend the typability of shared expressions. We also plan to apply our approach to the big-step semantics of the statically typed virtual classes calculus developed in [24], taking into account also the non-terminating computations not considered there.
With regard to the proofs, which are mostly omitted here and can be found in the extended version at http://arxiv.org/abs/2002.08738, we plan to investigate whether we can simplify them by means of enhanced coinductive techniques.
As a proof-of-concept, we provided a mechanisation in Agda of Lemma 1. The mechanisation of the other proofs is similar. However, as future work, we think it would be more interesting to provide a tool for writing big-step definitions and checking that the soundness conditions hold.