1 Introduction

Probability theory offers computer science models which enable system abstraction (at the price of introducing uncertainty), but which can also be seen as a way to compute, as in randomized computation. Domains in which probabilistic models play a key role include machine learning [27], robotics [34], and linguistics [24]. In cryptography, on the other hand, having access to a source of uniform randomness is essential to achieving security, e.g., in the public-key setting [20]. This has stimulated the development of concrete and abstract programming languages, which most often are extensions of their deterministic siblings. Among the many ways probabilistic choice can be captured in programming, the simplest one consists in endowing the language of programs with an operator modeling the flipping of a fair coin. This renders program evaluation a probabilistic process, and under mild assumptions the language becomes universal for probabilistic computation. Particularly fruitful in this sense has been the line of work on the functional paradigm, both at a theoretical [22, 26, 29] and at a more practical level [21].

We are still far, however, from a satisfactory understanding of higher-order probabilistic computation. As an example, little is known about how much of the classic, beautiful theory underlying the \(\lambda \)-calculus [1] can be lifted to probabilistic \(\lambda \)-calculi, although the latter have been known for forty years now [30]. Until the beginning of this decade, indeed, most investigations were directed towards domain theory, which has proved to be much more involved in the presence of probabilistic choice than in a deterministic scenario [23]. In the last ten years, however, some promising results have appeared. As an example, both quantitative semantics and applicative bisimilarity have been shown to coincide with context equivalence for certain kinds of probabilistic \(\lambda \)-calculi [5, 14]. This not only provides us with new proof methodologies for program equivalence, but also sheds new light on the very nature of probabilistic higher-order computation. As an example, recent results tell us that program equivalence in the presence of probabilistic choice lies somewhere in between determinism and non-determinism [5].

But are equivalences the most appropriate way to compare terms? Actually, this really depends on what the underlying observable is. If observables are boolean, then equivalences (and preorders) are indeed natural choices: two programs are dubbed equivalent if they give rise to the same observable (of which there are just two!) in any context. If, on the other hand, the observable is an element of a metric space, which happens for example when we observe (the probability of) convergence in a probabilistic setting, one may wonder whether replacing equivalences with metrics makes sense. This question has recently been given a positive answer in the affine setting [6], i.e., in a \(\lambda \)-calculus in which copying is simply not available. More specifically, a notion of context distance has been shown to model differences between terms satisfactorily, and has also been shown to be characterized by notions of trace metrics, and to be approximated from below by behavioral metrics.

Affine \(\lambda \)-calculi are very poor in terms of the computations they are able to model. Measuring the distance between terms in the presence of copying, however, is bound to be problematic. On the one hand, allowing contexts to copy their argument carries the risk of trivializing the underlying metric. On the other hand, finding handier characterizations of the obtained notion of metric in the style of behavioral or trace metrics is inherently hard. A more thorough discussion of these issues can be found in Sect. 2 below.

In this paper, we attack the problem of analyzing the distance between \(\lambda \)-terms in its full generality. More specifically, the contributions of this paper are fourfold:

  • First of all, we define a linear probabilistic \(\lambda \)-calculus, called \(\varLambda _\oplus ^{!,\parallel }\), in which copying and a nonstandard construct, namely Plotkin’s parallel disjunction, are both available. A very liberal type system prevents deadlocks, but nevertheless leaves the expressive power of the calculus very high. This choice is motivated by our aim to place ourselves in the most general setting, so as to be able to talk about different fragments. The calculus is endowed with a notion of context distance, in Morris’ style. This is covered in Sect. 3 below.

  • We study trivialization of the obtained notion(s) of metric for different fragments of \(\varLambda _\oplus ^{!,\parallel }\), showing that both parallel disjunction and strong normalization give us precisely the kind of discriminating power we need to arbitrarily amplify distances, while in the most natural fragment, namely \(\varLambda _\oplus ^{!}\), trivialization does not hold. This is the subject of Sect. 4.

  • In Sect. 5, we prove that context distance can be characterized as a coinductively-defined distance on a labeled Markov chain of tuples. The way (tuples of) terms interact with their environment makes proofs of soundness laborious and different from their affine counterparts from [6]. An up-to-context notion of bisimulation is proved to be sound, and to be quite useful when evaluating the distance between concrete programs.

  • Finally, we show that the results from Sect. 5 can be lifted back to ordinary probabilistic \(\lambda \)-calculi from the literature [5, 10]. Both when call-by-name and call-by-value evaluation are considered, our framework can be naturally adapted, and helps in facilitating concrete proofs. This is in Sect. 6.

More details can be found in a long version of this paper, available online [7].

2 Metrics and Trivialization, Informally

The easiest way to render the pure \(\lambda \)-calculus a universal probabilistic computation model [10] consists in endowing it with a binary construct \(\oplus \) for probabilistic choice. The term \(M\oplus N\) evolves as either \(M\) or \(N\), each with probability \(\frac{1}{2}\). The obtained calculus can be given meaning by an operational semantics which puts terms in correspondence with distributions of values. The natural notion of observation, at least in an untyped setting like the one we will consider in this paper, is thus the probability of convergence of the observed term \(M\), which will be denoted as \({\sum \nolimits _{[\![M ]\!]}}\). One could then define a notion of context equivalence following Morris’ pattern, and stipulate that two terms \(M\) and \(N\) should be equivalent whenever they terminate with exactly the same probability when put in any context:

$$\begin{aligned} M\equiv N\;\Leftrightarrow \;\forall C.\;{\sum \nolimits _{[\![C[M] ]\!]}}={\sum \nolimits _{[\![C[N] ]\!]}}. \end{aligned}$$
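As a sanity check on the observable just defined, the convergence probability of \(\oplus \)-terms can be computed by directly unfolding the equation \([\![M\oplus N ]\!]=\frac{1}{2}[\![M ]\!]+\frac{1}{2}[\![N ]\!]\). The following sketch is our own illustration (the tuple-based term representation is an assumption, not part of the calculus), over two leaf terms: a converging \(I\) and a diverging \(\varOmega \):

```python
from fractions import Fraction

# Minimal sketch (illustrative only): terms are either the leaves "I"
# (converging with probability 1) and "Omega" (probability 0), or a tuple
# ("oplus", M, N) representing the fair probabilistic choice M (+) N.

def prob_convergence(term):
    """Probability of convergence, i.e. the observable: the sum over [[M]]."""
    if term == "I":
        return Fraction(1)
    if term == "Omega":
        return Fraction(0)
    tag, left, right = term
    assert tag == "oplus"
    # [[M (+) N]] = 1/2 [[M]] + 1/2 [[N]]
    return (prob_convergence(left) + prob_convergence(right)) / 2

# I (+) Omega converges with probability 1/2; nesting choices refines it.
print(prob_convergence(("oplus", "I", "Omega")))                  # 1/2
print(prob_convergence(("oplus", "I", ("oplus", "I", "Omega"))))  # 3/4
```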

The anatomy of the obtained notion of equivalence has recently been studied extensively, the by-products of this study being powerful proof techniques in the style of bisimilarity and logical relations [3, 5, 9].

As observed by various authors (see, e.g., [25] for a nice account), probabilistic programs and processes are naturally compared by metrics rather than equivalences: the latter do not give any quantitative information about how different two non-equivalent programs are. Given that the underlying notion of observation is inherently quantitative, on the other hand, generalizing context equivalence to a pseudometric turns out to be relatively simple:

$$\begin{aligned} \delta (M,N)=\sup _{C}\left| {\sum \nolimits _{[\![C[M] ]\!]}}-{\sum \nolimits _{[\![C[N] ]\!]}}\right| . \end{aligned}$$

Observe that the obtained notion of context distance between two terms is a real number between 0 and 1, which is minimal precisely when the considered terms are context equivalent. It is the least discriminating pseudometric which is non-expansive and adequate, and as such it provides some quite precise information about how far the two argument programs are, observationally. A similar notion has recently been studied by the authors [6], but only in a purely affine setting.

Let us now consider two prototypical examples of non-equivalent terms, namely \(I=\lambda x.x\) (the identity) and \(\varOmega \) (the always-divergent term). The context distance \(\delta ^c(I,\varOmega )\) between them is maximal: when applied, e.g., to the trivial context \([\cdot ]\), they converge with probability 1 and 0, respectively. A term which is conceptually “in the middle” of them is \(M=I\oplus \varOmega \). Indeed, in a purely affine \(\lambda \)-calculus, \(\delta ^c(I,M)=\delta ^c(M,\varOmega )=\frac{1}{2}\).

If we render the three terms duplicable (by putting them in the scope of a !-operator), however, the situation becomes much more complicated. Consider the terms !I and \(!(I\oplus \varOmega )\). One can easily define a family of contexts \(\{C_n\}_{n\in \mathbb {N}}\) such that the probabilities of convergence of \(C_n[!I]\) and \(C_n[!(I\oplus \varOmega )]\) tend to 1 and 0 (respectively) when n tends to infinity. It suffices to take \(C_n\) as \((\lambda !x.\underbrace{x\ldots x}_{\text{ n } \text{ times }})[\cdot ]\). Allowing contexts to duplicate their argument thus seems to give them the power to arbitrarily amplify distances. Indeed, the argument above also works when \((I\oplus \varOmega )\) is replaced by any term which behaves as \(\varOmega \) with probability \(\varepsilon \) and as I with probability \(1-\varepsilon \), provided of course \(\varepsilon >0\). But how about \(!\varOmega \) and \(!(I\oplus \varOmega )\)? Are they at maximal distance, i.e., is it the case that \(\delta ^c(!\varOmega ,!(I\oplus \varOmega ))=1\)? As it turns out, this is not the case. The previously defined contexts \(C_n\) cannot amplify the “linear” distance between the two terms above, namely \(\frac{1}{2}\), up to 1. But what is the distance between \(!\varOmega \) and \(!(I\oplus \varOmega )\), then? Evaluating it is hard, since one needs to consider all contexts, which do not have a nice structure. In Sect. 5, we will introduce a different, better behaved notion of distance, this way being able to prove that, indeed, \(\delta ^c(!\varOmega ,!(I\oplus \varOmega ))=\frac{1}{2}\).
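The quantitative behavior of the contexts \(C_n\) can be checked numerically. In the sketch below (an illustration of ours, not part of the formal development), `obs_Cn(p, n)` stands for the convergence probability of \(C_n[!M]\) when each copy of the boxed term converges with probability p; since all n copies must converge, it equals \(p^n\):

```python
# Sketch (our own illustration): the context C_n copies its boxed argument
# n times and applies the copies; all n copies must converge, so a box whose
# content converges with probability p yields overall probability p**n.

def obs_Cn(p, n):
    """Convergence probability of C_n[!M], each copy converging with probability p."""
    return p ** n

for n in (1, 2, 10):
    gap_I = abs(obs_Cn(1.0, n) - obs_Cn(0.5, n))  # !I vs !(I (+) Omega): tends to 1
    gap_O = abs(obs_Cn(0.0, n) - obs_Cn(0.5, n))  # !Omega vs !(I (+) Omega): tends to 0
    print(n, gap_I, gap_O)
```

The first gap tends to 1 (amplification), while the second is \((1/2)^n\) and thus shrinks: these contexts cannot take \(!\varOmega \) and \(!(I\oplus \varOmega )\) beyond distance \(\frac{1}{2}\).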

All this hints at even more difficult examples, like the one in which \(M_\varepsilon = !{(\varOmega \oplus ^\varepsilon I)}\), where \(\oplus ^\varepsilon \) is the natural generalization of \(\oplus \) to a possibly unfair coin flip, and one is interested in evaluating \(\delta ^c(M_{\varepsilon },M_{\mu })\). In that case, we can easily see that the “linear” distance between them is \(|\varepsilon - \mu |\). In some cases, it is possible to amplify it: the most natural way is again to consider the contexts \(C_n\) defined above. Indeed, we see that the probabilities of convergence of \(C_n[M_\varepsilon ]\) and \(C_n[M_\mu ]\) are \(\varepsilon ^n\) and \(\mu ^n\), respectively. It follows that \(\delta ^c(M_\varepsilon , M_\mu )\ge \sup _{n \in \mathbb {N}}{|\varepsilon ^n - \mu ^n |}\). For some \(\varepsilon \) and \(\mu \) (for example if \(\varepsilon + \mu >1\)), the context distance can be greater than \(|\varepsilon -\mu |\). But there is no easy way to know how far amplification can lead us. The terms \(M_\varepsilon \) and \(M_\mu \) will be running examples in the course of this paper. Despite their simplicity, evaluating the distance between them is quite challenging.
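The lower bound \(\sup _{n}|\varepsilon ^n-\mu ^n|\) given by the copying contexts can be explored numerically. The sketch below is our own (the helper `copying_lower_bound` is a hypothetical name); it shows that when \(\varepsilon +\mu >1\) the bound strictly exceeds the “linear” distance \(|\varepsilon -\mu |\):

```python
# Sketch: lower bound on the context distance between M_eps and M_mu given by
# the copying contexts C_n, namely sup_n |eps**n - mu**n| (truncated at max_n).

def copying_lower_bound(eps, mu, max_n=1000):
    return max(abs(eps ** n - mu ** n) for n in range(1, max_n + 1))

eps, mu = 0.9, 0.8                    # eps + mu > 1: amplification beats |eps - mu|
print(abs(eps - mu))                  # the "linear" distance
print(copying_lower_bound(eps, mu))   # strictly larger, attained at some n > 1
```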

We are also going to consider the case in which contexts can evaluate terms in parallel, converging if and only if at least one of the copies converges. This behavior is not expressible in the usual \(\lambda \)-calculus, but is captured by well-known constructs, in particular by Plotkin’s parallel disjunction [28]. In Sect. 4 below, we prove that all this is not accidental: the presence of parallel disjunction turns a non-trivializing metric into a trivializing one. The proof, by the way, relies on building certain amplifying contexts, which are then shown to be universal using tools from functional analysis.

3 A Linear Probabilistic \(\lambda \)-Calculus

In this section, we present the syntax and operational semantics of our language \(\varLambda _\oplus ^{!,\parallel }\), on which we will later define metrics. \(\varLambda _\oplus ^{!,\parallel }\) is a probabilistic and linear \(\lambda \)-calculus, designed not only to allow copying, but also to provide better control over it. It is based on a probabilistic variation of the calculus defined in [33], whose main feature is never to reduce inside exponential boxes. As we will see in Sect. 6, the calculus is capable of encoding both call-by-value and call-by-name fully-fledged probabilistic \(\lambda \)-calculi. We add a parallel disjunction construct to the calculus, inspired by Plotkin’s parallel disjunction [28]. Noticeably, it has recently been shown [8] that adding parallel disjunction to a (non-linear) \(\lambda \)-calculus increases the expressive power of contexts to the point of enabling coincidence between the contextual preorder and applicative similarity. The choice of studying such a general calculus is motivated by our desire to cover as many features as possible. This being said, many of our results hold only in the absence of parallel disjunction.

Definition 1

We assume a countable set of variables \(\mathscr {X}\). The set of terms of \(\varLambda _\oplus ^{!,\parallel }\) (denoted \(\mathscr {T}\)) is defined by the following grammar:
$$\begin{aligned} M,N,L \,{:}{:=}\,x \;\mid \;MN \;\mid \;\lambda x. M \;\mid \;\lambda {!{x}}. M \;\mid \;!{M} \;\mid \;M \,\oplus \, N \;\mid \;\, \left( [M \parallel N ] \rightarrowtail L\right) \, \end{aligned}$$

where \(x\in \mathscr {X}\). The fragment of \(\varLambda _\oplus ^{!,\parallel }\) without the \(\, \left( [\cdot \parallel \cdot ] \rightarrowtail \cdot \right) \,\) construct will be indicated as \(\varLambda _\oplus ^{!}\). Values are those terms derived from the following grammar:
$$\begin{aligned} V,W \,{:}{:=}\,\lambda x. M \;\mid \;\lambda {!{x}}. M \;\mid \;!{M} \end{aligned}$$

As already mentioned, \(M \,\oplus \, N\) can evolve to either \(M\) or \(N\), each with probability \(\frac{1}{2}\). The term \(!{M}\) is a duplicable version of \(M\), often called an (exponential) box. We have two distinct abstraction operators: \(\lambda x. M\) is a linear abstraction, while the non-linear abstraction \(\lambda {!{x}}. M\) requires exponential boxes as arguments. The term \(\, \left( [M \parallel N ] \rightarrowtail L\right) \,\) behaves as \(L\) if either \(M\) or \(N\) converges. Please observe that both abstractions and boxes are values—our notion of reduction is weak and surface [33].

We are now going to define an operational semantics for the closed terms of \(\varLambda _\oplus ^{!,\parallel }\), in a way similar to the one developed for a (non-linear) \(\lambda \)-calculus in [10]. We first need to define a family of approximation semantics, and then take the semantics of a term as the least upper bound of all its approximations. The approximation semantics relation is denoted \(M \Rightarrow \mathcal {D}\), where \(M\) is a closed term of \(\varLambda _\oplus ^{!,\parallel }\), and \(\mathcal {D}\) is a (sub)distribution on values with finite support, i.e., a function from \(\mathscr {V}\) to \(\mathbb {R}_{[0,1]}\) which sums to a real number \(\sum _\mathcal {D}\le 1\). For any distribution \(\mathcal {D}\) on a set X, we call the support of \(\mathcal {D}\), denoted \(\textsf {S}(\mathcal {D})\), the set \( \{x \in X \mid \mathcal {D}(x)>0 \}\). We say that \(\mathcal {D}\) is finite if \(\textsf {S}(\mathcal {D})\) is a finite set.

The rules deriving the approximation semantics relation are given in Fig. 1, and are based on the notion of an evaluation context, which is an expression generated from the following grammar:
$$\begin{aligned} E \,{:}{:=}\,[\cdot ] \;\mid \;EM \;\mid \;VE \;\mid \;\, \left( [E \parallel M ] \rightarrowtail L\right) \, \;\mid \;\, \left( [M \parallel E ] \rightarrowtail L\right) \, \end{aligned}$$

As usual, \(E[M]\) stands for the term obtained by filling the sole occurrence of \([\cdot ]\) in \(E\) with \(M\). In Fig. 1 and elsewhere in this paper, we indicate the distribution assigning probability \(p_i\) to \(V_i\) for every \(i\in \{1,\ldots ,n\}\) as \(\{V_1^{p_1},\ldots ,V_n^{p_n}\}\). We proceed similarly for the expression \(\{V_i^{p_i}\}_{i\in I}\), where I is any countable index set. Observe how we first define a one-step reduction relation \(\cdot \rightarrow \cdot \) between closed terms and sequences of terms, only later extending it to a small-step reduction relation \(\cdot \Rightarrow \cdot \) between closed terms and distributions on values. A reduction step can be a linear or non-linear \(\beta \)-reduction, or a probabilistic choice. Moreover, there can be more than one active redex in any closed term \(M\), due to the presence of parallel disjunction. For any term \(M\), the set of sub-distributions \(\mathcal {D}\) such that \(M \Rightarrow \mathcal {D}\) is a countable directed set. Since the set of sub-distributions (with potentially infinite support) is an \(\omega \)-complete partial order, we can define the semantics of a term \(M\) as \([\![M ]\!]= \sup \{\mathcal {D}\mid M \Rightarrow \mathcal {D}\} \). We could alternatively define a big-step semantics, in the same way as for the probabilistic \(\lambda \)-calculus considered in [10].

Fig. 1.
figure 1

Approximation semantics for \(\varLambda _\oplus ^{!}\)

Fig. 2.
figure 2

Equality of types

Not all irreducible terms are values in \(\varLambda _\oplus ^{!,\parallel }\), e.g. \((\lambda {!{x}}. x) (\lambda x. x)\). We thus need a type system which guarantees the absence of deadlocks. Since we want to be as general as possible, we consider recursive types as formulated in [2], which are expressive enough to type the image of the embeddings we will study in Sect. 6. The grammar of types is the following:
$$\begin{aligned} \sigma ,\tau \,{:}{:=}\,\alpha \;\mid \;\sigma \multimap \tau \;\mid \;!{\sigma } \;\mid \;\mu {\alpha }. \sigma \end{aligned}$$

Types are defined up to the equality \(=^{\mathscr {A}}\), defined in Fig. 2. \(\sigma [\alpha \rightarrow \tau ]\) stands for the type obtained by substituting all free occurrences of \(\alpha \) by \(\tau \) in \(\sigma \). An environment is a set of expressions in the form \(x:\sigma \) or \(!{x}:!{\sigma }\) in which any variable occurs at most once. Environments are often indicated with metavariables like \(! \varGamma \), which stands for an environment in which all variables occur as \(!{x}\), or \(\varDelta \) in which, on the contrary, variables can only occur with the shape \(x\), so that \(\varDelta \) is of the form \(x_1:\sigma _1,\ldots ,x_n:\sigma _n\). Typing judgments are thus of the form \(! \varGamma ,\varDelta \vdash M : \sigma \). The typing system is given in Fig. 3. The role of this type system is not to guarantee termination, but rather to guarantee a form of type soundness:

Lemma 1

If \( \vdash M : \sigma \) and \(M \Rightarrow \mathcal {D}\), then \( \vdash V : \sigma \) for every \(V\) in the support of \(\mathcal {D}\). Moreover, if \( \vdash M : \sigma \) and \(M\) is irreducible (i.e. \(M \not \rightarrow N\) for every \(N\)), then \(M\) is a value.

Fig. 3.
figure 3

Typing rules

Example 1

The term \(I = \lambda x. x\) can be typed as \( \vdash I : \sigma \multimap \sigma \) for every \(\sigma \in \mathscr {A}\). We define \({\varOmega _!}\) to be the term \((\lambda {!{x}}. x{!{x}})(!{(\lambda {!{x}}. x{!{x}})})\), which is the counterpart in our linear calculus of the prototypical diverging term of the \(\lambda \)-calculus, namely \(\varOmega =(\lambda x.xx)(\lambda x.xx)\). We can type this divergent term with any possible type: indeed, if we take \(\tau =\mu {\alpha }. !{\alpha } \multimap \sigma \), then \(\tau =^{\mathscr {A}}!{\tau } \multimap \sigma \) and \( \vdash \lambda {!{x}}. x{!{x}} : \tau \). Using that, we can see that \( \vdash {\varOmega _!} : \sigma \) for every type \(\sigma \). We will see in Sect. 6 that, more generally, there are several ways to turn any pure \(\lambda \)-term \(M\) into a \(\varLambda _\oplus ^{!}\) term in such a way as to obtain meaningful typing and semantics: \(\varLambda _\oplus ^{!}\) is actually at least as powerful as the usual untyped probabilistic \(\lambda \)-calculus [10].

Termination could in principle be guaranteed if one considers strictly positive types, as we will do in Sect. 4.1 below. Let \(\mathbb {D}\) be the set of dyadic numbers, i.e. those rational numbers of the form \(\frac{n}{2^m}\) with \(n,m\in \mathbb {N}\) and \(n\le 2^m\). It is easy to derive, for every \(\varepsilon \in \mathbb {D}\), a new binary operator on terms \(\cdot \oplus ^{\varepsilon } \cdot \) such that \([\![M \oplus ^{\varepsilon } N ]\!]=(1-\varepsilon )[\![M ]\!]+ \varepsilon [\![N ]\!]\) for every closed \(M,N\).
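One standard way to realize \(\oplus ^{\varepsilon }\) for a dyadic \(\varepsilon =\frac{n}{2^m}\) out of the fair \(\oplus \) is to nest m fair coin flips, building a uniform m-bit number and comparing it with n. The following sketch is our own illustration (the encoding as nested flips is an assumption about one possible derivation, not the paper's construction):

```python
import random

# Sketch (assumption: one standard way to derive (+)^eps from the fair (+)):
# for eps = n / 2**m, flip m fair coins to draw u uniformly from {0,...,2**m - 1}
# and pick the right branch iff u < n, which happens with probability eps.

def choose(num, m, left, right, rng=random):
    """Return right() with probability num / 2**m, else left()."""
    u = 0
    for _ in range(m):               # m fair coin flips build a uniform m-bit number
        u = 2 * u + rng.randrange(2)
    return right() if u < num else left()

# M (+)^{3/8} N: the right branch is taken with probability 3/8.
samples = [choose(3, 3, lambda: "M", lambda: "N") for _ in range(100_000)]
print(samples.count("N") / len(samples))  # close to 0.375
```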

Example 2

We define here a family of terms that we use as a running example. We consider terms of the form \(M_\varepsilon = \,!{(\varOmega _! \oplus ^{\varepsilon } I)}\), for \(\varepsilon \in \mathbb {D}\). It holds that \( \vdash M_\varepsilon : !{(\sigma \multimap \sigma )}\) for every \(\sigma \). \(M_\varepsilon \) corresponds to a duplicable term, each copy of which behaves as I with probability \(\varepsilon \), and does not terminate with probability \(1-\varepsilon \).

3.1 Some Useful Terminology and Notation

In this paper, we will make heavy use of sequences of terms and types. It is thus convenient to introduce some terminology and notation about them.

A finite (ordered) sequence whose elements are \(e_1,\ldots ,e_n\) will be indicated as \(\varvec{e}=[e_1,\ldots ,e_n]\), and called an n-sequence. Metavariables for sequences are boldface variations of the metavariables for their elements. Whenever \(E=\{i_1,\ldots ,i_m\}\subseteq \{1,\ldots ,n\}\) and \(i_1<\ldots <i_m\), the sub-sequence \([e_{i_1},\ldots ,e_{i_m}]\) of an n-sequence \(\varvec{e}\) will be indicated as \(\varvec{e}_E\). If the above holds, \(E\) will be called an n-set. If \(\varvec{e}\) is an n-sequence, and \(\varphi \) is a permutation on \(\{1, \ldots ,n\}\), we denote by \(\varvec{e}_\varphi \) the n-sequence \([e_{\varphi (1)}, \ldots , e_{\varphi (n)}]\). We can turn an n-sequence into an \((n+1)\)-sequence by adding an element at the end: this is the role of the semicolon operator. We denote by \([e^n]\) the n-sequence in which all components are equal to \(e\).

Whenever this does not cause ambiguity, notations like the ones above will be used in conjunction with syntactic constructions. For example, if \(\varvec{\sigma }\) is an n-sequence of types, then \(!{\varvec{\sigma }}\) stands for the sequence \([!{\sigma _1},\ldots ,!{\sigma _n}]\). As another example, if \(\varvec{\sigma }\) is an n-sequence of types and \(E\) is an n-set, then \(\varvec{x}_E:\varvec{\sigma }_E\) stands for the environment assigning type \(\sigma _i\) to \(x_i\) for every \(i\in E\). As a final example, if \(\varvec{M}\) is an n-sequence of terms and \(\varvec{\sigma }\) is an n-sequence of types, \( \vdash \varvec{M} : \varvec{\sigma }\) holds iff \( \vdash M_i : \sigma _i\) is provable for every \(i\in \{1,\ldots ,n\}\).
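In programming terms, the notation above amounts to ordinary list operations. The following sketch is our own mirror of it (0-based rather than 1-based, purely illustrative):

```python
# Sketch of the sequence notation, with Python lists standing for n-sequences.

e = ["e1", "e2", "e3", "e4"]              # a 4-sequence

E = [0, 2]                                 # an n-set (0-based here), indices increasing
sub = [e[i] for i in E]                    # the sub-sequence e_E

phi = [3, 2, 1, 0]                         # a permutation of positions
permuted = [e[phi[i]] for i in range(4)]   # the permuted sequence e_phi

extended = e + ["e5"]                      # the semicolon operator: append one element
constant = ["e"] * 4                       # [e^4]: four copies of the same element

print(sub, permuted, len(extended), constant)
```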

3.2 Context Distance

A context of type \(\sigma \) for terms of type \(\tau \) is a term \(C\) which can be typed as \( hole \,:\,\tau \vdash C : \sigma \), where \( hole \) is a distinguished variable. \(\mathcal {C}^{\tau }_{\sigma }\) collects all such terms. If \(C\in \mathcal {C}^{\tau }_{\sigma }\) and \(M\) is a closed term of type \(\tau \), then the closed term \(C\{ hole /M\}\) has type \(\sigma \) and is often indicated as \(C[M]\).

The context distance [6] is the natural quantitative refinement of context equivalence. Intuitively, it corresponds to the maximum separation that contexts can induce between two terms. Following [6], we take as observable the probability of convergence: for any term \(M\), we define its observable \(\text {Obs}(M)\) as \({\sum \nolimits _{[\![M ]\!]}}\). Then, for any terms \(M\), \(N\) such that \( \vdash M : \sigma \) and \( \vdash N : \sigma \), we define:

$$\begin{aligned} {\delta _{\sigma ,!{}, \parallel }^{c}}(M,N)= \sup _{C\in \mathcal {C}^{\tau }_{\sigma }} {|{\text {Obs}(C[M]) - \text {Obs}(C[N])}|}. \end{aligned}$$

Please observe that this distance is a pseudometric, and that moreover we can recover context equivalence by considering its kernel, that is the set of pairs of terms which are at distance 0. The binary operator \(\delta ^{c}_{\sigma ,!}\) is defined similarly, but referring to terms (and contexts) from \(\varLambda _\oplus ^{!}\).

Example 3

What can we say about \({\delta _{\sigma ,!{}, \parallel }^{c}}(M_{\varepsilon },M_{\mu })\)? Not much, apparently, since all contexts should be considered. Even if we put ourselves in the fragment \(\varLambda _\oplus ^{!}\), the best we can do is to conclude that \({\delta ^{c}_{\sigma ,!}}(M_{\varepsilon },M_{\mu })\ge \sup _{n\in \mathbb {N}}|\varepsilon ^n-\mu ^n|\), as explained in Sect. 2.

4 On Trivialization

As we have already mentioned, there can well be classes of terms such that the context distance collapses to context equivalence, due to the copying abilities of the language. The question of trivialization can in fact be seen as a question about the expressive power of contexts: given two duplicable terms, how much can a context amplify the observable differences between their behaviors?

More precisely, we would like to identify trivializing fragments of \(\varLambda _\oplus ^{!,\parallel }\), that is to say fragments such that for any pair of duplicable terms, their context distance (with respect to the fragment) is either 0 or 1. This is not the case in \(\varLambda _\oplus ^{!}\) (see Example 8 below).

In fact, a sufficient condition for trivialization is the existence of amplification contexts: for every observable type \( \sigma \), for all distinct \(\alpha , \beta \in [0,1]\), and for every \(\gamma >0\), we want to have a context \(C_\sigma ^{\alpha ,\beta , \gamma }\) such that:

$$\begin{aligned} \left. \begin{array}{r} \vdash M, N : \sigma \\ \text {Obs}(M)= \alpha \\ \text {Obs}(N)= \beta \end{array} \right\} \Rightarrow \left|\text {Obs}(C_\sigma ^{\alpha , \beta , \gamma }[!{M}]) -\text {Obs}(C_\sigma ^{\alpha , \beta , \gamma }[!{N}]) \right|\ge 1-\gamma . \end{aligned}$$

Fact 1

Any fragment of \(\varLambda _\oplus ^{!,\parallel }\) admitting all amplification contexts trivializes.

4.1 Strictly Positive Types

First, let us consider the fragment \(\varLambda _\oplus ^{!,\downarrow }\) of \(\varLambda _\oplus ^{!}\) obtained by considering strictly positive types only (in a similar way to [2]), and by dropping parallel disjunction. Every term \(M\) of \(\varLambda _\oplus ^{!,\downarrow }\) is terminating (i.e. \(\sum [\![M ]\!]=1\)), so we need to adapt our notion of observation: we define the type \({\mathbb {B}} =\, !{\alpha } \multimap !{\alpha } \multimap \alpha \), which can be seen as a boolean type using a variant of the usual boolean encoding in \(\lambda \)-calculi. Our new notion of observation, defined only at type \(\mathbb {B}\), is \(\text {Obs}(M)= {\sum \nolimits _{[\![M!{I}!{{\varOmega _!}} ]\!]}}\), which corresponds to the probability that \(M\) evaluates to \(\mathbf{true }\). While this notion of observation uses the full power of \(\varLambda _\oplus ^{!}\), the context distance \(\delta _{!{},\downarrow }^{c}\) based on it only considers contexts in \(\varLambda _\oplus ^{!,\downarrow }\).

Theorem 1

\(\delta _{!{},\downarrow }^{c}\) trivializes.

The proof of Theorem 1 is based on the construction of amplification contexts. We are going to use Bernstein’s constructive proof of the Stone-Weierstrass theorem. Indeed, Bernstein showed that for every continuous function \(f: [0,1]\rightarrow \mathbb {R}\), the following sequence of polynomials converges uniformly to f:

$$\begin{aligned} P_n^f(x) = \sum \nolimits _{0 \le k \le n} f\left( \frac{k}{n}\right) \cdot B_k^n(x), \text { where }B_k^n(x) = \left( {\begin{array}{c}n\\ k\end{array}}\right) \cdot x^k\cdot (1-x)^{n-k}. \end{aligned}$$

Let us consider the following continuous function: we fix \(f(\alpha ) = 0\), \(f(\beta ) = 1\), and f defined elsewhere in such a way that it is continuous, that it has values in [0, 1], and that moreover \(f(\mathbb {Q})\subseteq \mathbb {Q}\). We can easily implement \(P_n^f\) by a context, i.e. define \(C\) such that for every \(M\), \(\text {Obs}(C[M]) = P_n^f(\text {Obs}(M)) \). In \(\varLambda _\oplus ^{!,\downarrow }\), we can indeed copy an argument n times, then evaluate it, and then for every k between 0 and n, if the number of \(\mathbf{true }\)s obtained is exactly k, return the term \(\mathbf{false } \oplus ^{f(\frac{k}{n})} \mathbf{true }\) (that corresponds to a term returning \(\mathbf{true }\) with probability \({f \left( \frac{k}{n}\right) }\)). Please observe that this construction works only because in \(\varLambda _\oplus ^{!,\downarrow }\) all terms converge with probability 1.
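The amplification argument can be checked numerically: taking an f with \(f(\alpha )=0\) and \(f(\beta )=1\), the Bernstein polynomials \(P_n^f\) drive the gap between the two observables towards 1. In the sketch below, the particular f (a clamped linear interpolation) is our own choice, one of many continuous functions satisfying the constraints:

```python
from math import comb

# Sketch of the amplification argument: Bernstein polynomials P_n^f applied to
# an f with f(alpha) = 0 and f(beta) = 1 drive the observable gap towards 1.

def bernstein(f, n, x):
    """P_n^f(x) = sum_k f(k/n) * C(n, k) * x**k * (1 - x)**(n - k)."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

alpha, beta = 0.3, 0.7
# Our own concrete choice of a continuous f with f(alpha) = 0, f(beta) = 1,
# taking values in [0, 1]: a clamped linear interpolation between alpha and beta.
f = lambda t: min(1.0, max(0.0, (t - alpha) / (beta - alpha)))

for n in (10, 100, 1000):
    print(n, abs(bernstein(f, n, beta) - bernstein(f, n, alpha)))  # tends to 1
```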

4.2 Parallel Disjunction

As we have seen, trivialization can be enforced by restricting the class of terms, but we can also take the opposite road, namely increasing the discriminating power of contexts. Indeed, consider the full language \(\varLambda _\oplus ^{!,\parallel }\), with the usual notion of observation.

We can first see, through a simple example, how parallel disjunction increases the expressive power of the calculus. Consider the following two terms: \( M=\, !{\varOmega _!}\) and \(N= \,!{(\varOmega _! \,\oplus \, I)}\). We will see later that these two terms are the simplest example of non-trivialization in \(\varLambda _\oplus ^{!}\): indeed \({\delta ^{c}_{!{(\tau \multimap \tau )},!}}(M,N)= \frac{1}{2}\), while \( {\delta _{!{(\tau \multimap \tau )},!{}, \parallel }^{c}}(M,N)= 1\). In \(\varLambda _\oplus ^{!,\parallel }\), we are able to define a family of contexts \((C_n)_{n\in \mathbb {N}}\) as follows:

$$\begin{aligned} C_n =\left( \lambda {!{x}}. \, \left( [x \parallel \, \left( [x \parallel \ldots ] \rightarrowtail I\right) \, ] \rightarrowtail I\right) \, \right) [\cdot ]. \end{aligned}$$

Essentially, \(C_n\) makes n copies of its argument, and then converges towards I if at least one of these copies itself converges. When we apply the context \(C_n\) to \(M\) and \(N\), we can see that the convergence probability of \(C_n[M]\) is always 0 independently of n, whereas the convergence probability of \(C_n[N]\) tends towards 1 when n tends to infinity.
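This behavior is immediate to check numerically: \(C_n[!M]\) converges iff at least one of the n independent copies of the boxed term does, i.e. with probability \(1-(1-p)^n\) where p is the convergence probability of one copy. A one-line sketch (our own illustration):

```python
# Sketch: with parallel disjunction, C_n[!M] converges iff at least one of the
# n copies of M converges, i.e. with probability 1 - (1 - p)**n.

def obs_disjunction(p, n):
    return 1 - (1 - p) ** n

print(obs_disjunction(0.0, 50))  # C_n[!Omega_!]: stays 0 for every n
print(obs_disjunction(0.5, 50))  # C_n[!(Omega_! (+) I)]: tends to 1
```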

Theorem 2

\(\delta _{!{}, \parallel }^{c}\) trivializes.

The proof is based on the construction of amplification contexts \(C_\sigma ^{\alpha , \beta , \gamma }\). If \(\max (\alpha , \beta ) = 1\), we can extend the informal argument from Sect. 2, by taking contexts that copy an arbitrary number of times their argument. If \(\min (\alpha , \beta ) = 0\), we can use the same idea as in the example above, by taking contexts that do an arbitrary number of disjunctions. What remains to be done to obtain the trivialization result is treating the case in which \(0< \alpha , \beta < 1\). The overall idea is to somehow mix the contexts we use in the previous cases. More precisely, we define a family of contexts \((C_n^m)_{n, m \in \mathbb {N}}\) as follows:

$$\begin{aligned} C_n^m = \lambda {!{y}}. \left( {{\bigwedge }^{n}({\bigvee }^{m}(y, \ldots , y), \ldots ,{{\bigvee }^{m}(y, \ldots , y)})} \right) {[\cdot ]} \end{aligned}$$

where

$$\begin{aligned} {\bigvee }^{n}(M_1, \ldots , M_n)&= \, \left( [M_1 \parallel \, \left( [M_2 \parallel \ldots ] \rightarrowtail I\right) \, ] \rightarrowtail I\right) \, ;\\ {\bigwedge }^{n}(M_1, \ldots , M_n)&= {(\lambda z_1. \ldots \lambda z_n. \lambda y. (y z_1 \ldots z_n))}\,{M_1 \ldots M_n}. \end{aligned}$$

The term \({\bigvee }^{n}(M_1, \ldots , M_n)\) behaves as an n-ary disjunction: it terminates if at least one of the \(M_i\) terminates. On the other hand, \({\bigwedge }^{n}(M_1, \ldots , M_n)\) can be seen as an n-ary conjunction: it terminates if all the \(M_i\) terminate. The contexts \(C_n^m\) thus compute n-ary conjunctions of m-ary disjunctions. Now, let \(\iota \) be such that \(\alpha< \iota < \beta \). We need to show that for every n, we can choose \(m(n, \iota ) \in \mathbb {N}\) such that:

$$\begin{aligned} \lim _{n \rightarrow \infty } \text {Obs}(C_n^{m(n, \iota )}[!{M}]) = {\left\{ \begin{array}{ll}1 \text { if } \text {Obs}(M)> \iota ; \\ 0 \text { if } \text {Obs}(M)< \iota .\end{array}\right. } \end{aligned}$$

We can express this problem purely in terms of functional analysis, by observing that \( \text {Obs}(C_n^m[!{M}]) = (1-(1-\text {Obs}(M))^m)^n\). Then the result is proved by applying the dominated convergence theorem to a well-chosen sequence of functions.
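The thresholding behavior can be observed concretely. In the sketch below (our own illustration, not necessarily the exact choice made in the proof), we pick \(m(n, \iota )\approx \log n / \log \frac{1}{1-\iota }\), one natural choice making \(n(1-\iota )^{m}\approx 1\), so that the iterated limit separates cleanly around the threshold \(\iota \):

```python
import math

def obs(p: float, n: int, m: int) -> float:
    # Obs(C_n^m[!M]) = (1 - (1 - Obs(M))^m)^n, with p = Obs(M)
    return (1.0 - (1.0 - p) ** m) ** n

def m_of(n: int, iota: float) -> int:
    # one possible choice of m(n, iota): make n * (1 - iota)^m about 1
    return max(1, math.ceil(math.log(n) / math.log(1.0 / (1.0 - iota))))

iota = 0.5
n = 1_000_000
m = m_of(n, iota)
assert obs(0.6, n, m) > 0.95    # above the threshold: close to 1
assert obs(0.4, n, m) < 1e-10   # below the threshold: close to 0
```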

5 Tuples and Full Abstraction

This section is structured as follows: first, we define a labeled Markov chain (LMC) which expresses the semantics of our calculus in an interactive way, and then we use it to give a coinductively-defined notion of distance on a labeled transition system (LTS) of distributions, which coincides with the context distance defined in Sect. 3.2. We do not consider parallel disjunction here: the motivation for this restriction should be clear from Theorem 2.

5.1 A Labeled Markov Chain over Tuples

Labeled Markov chains are the probabilistic analogues of labeled transition systems. Formally, an LMC is a triple \(\mathscr {M}= (\mathcal S,\mathcal A, \mathcal P)\), where \(\mathcal S\) is a countable set of states, \(\mathcal A\) is a countable set of labels, and \(\mathcal P: \mathcal S\times \mathcal A\rightarrow {\text {Distr}(\mathcal S)}\) is a transition probability matrix (where \(\text {Distr}(X)\) is the set of all distributions over X).
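Concretely, the transition probability matrix of a finite LMC can be represented as a map from state–label pairs to distributions. A minimal sketch (the states and labels here are toy names, not taken from the calculus):

```python
from typing import Dict, Tuple

State, Label = str, str
Distr = Dict[State, float]  # a probability distribution over states

# A toy LMC with two labeled states: performing "eval" from state "M"
# leads to "V" or "W" with equal probability.
P: Dict[Tuple[State, Label], Distr] = {
    ("M", "eval"): {"V": 0.5, "W": 0.5},
    ("V", "apply"): {"M": 1.0},
}

# Every entry of the matrix is a distribution: its weights sum to 1.
assert all(abs(sum(d.values()) - 1.0) < 1e-9 for d in P.values())
```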

Following [9], the interactive behavior of probabilistic \(\lambda \)-terms can be represented by a LMC whose states are the terms of the language, whose actions are values, and where performing the action \(V\) starting from a state \(M\) corresponds to applying the value \(V\) to \(M\). This approach is bound not to work in the presence of pairs when metrics take the place of equivalences, due to the unsoundness of projective actions. In [6], this observation led us to introduce a new LMC whose states are tuples of terms, and whose actions include one splitting a pair: \(\mathcal P(\left[ \langle M,N\rangle \right] )({\text {destruct}}) = {\{{\left[ M, N\right] }^1\}} \). This turns out to work well in an affine setting [6]. We are going to define a LMC \(\mathscr {M}_\oplus ^{!{}}= (\mathcal {S}_{\mathscr {M}_\oplus ^{!{}}}, \mathcal {A}_{\mathscr {M}_\oplus ^{!{}}}, \mathcal {P}_{\mathscr {M}_\oplus ^{!{}}})\) which extends the one from [6], and which is adapted to a language with copying capabilities. The idea is to treat exponentials in the spirit of Milner’s Law: \(!{A} \multimap A \otimes !{A}\).

States. Tuples are pairs of the form \(K= (\varvec{M},{\varvec{V}})\) where \({\varvec{M}}\) and \(\varvec{V}\) are sequences of terms and values, respectively. The set of all such tuples is indicated as \( \mathscr {U} \). The first component of a tuple is called its exponential part, while the second one is called its linear part. We write \( \vdash (\varvec{M}, \varvec{V}) : (\varvec{\sigma }, \varvec{\tau })\) if \( \vdash \varvec{M} : \varvec{\sigma }\) and \( \vdash \varvec{V} : \varvec{\tau }\). We denote by \(\mathbf T\) the set of pairs \(A= (\varvec{\sigma },\varvec{\tau })\), and we call tuple types the elements of \(\mathbf T\). Moreover, we say that \((\varvec{\sigma },\varvec{\tau })\) is an (n, m) tuple type if \(\varvec{\sigma }\) and \(\varvec{\tau }\) are, respectively, an n-sequence and an m-sequence. To any term \(M\), we associate a tuple in a natural way: we denote by \(\dot{M}\) the tuple \((\left[ \right] ,\left[ M\right] )\), and similarly, if \(\sigma \) is a type, we indicate the tuple type \((\left[ \right] , \left[ \sigma \right] )\) as \(\dot{\sigma }\). Please observe that if \( \vdash M : \sigma \), then it holds that \( \vdash \dot{M} : \dot{\sigma }\).

A sequence of the form \((E, F, \varvec{\sigma }, \varvec{\tau }, M, \gamma )\) is said to be an applicative typing judgment when \(\varvec{\sigma }\) and \(\varvec{\tau }\) are, respectively, an n-sequence and an m-sequence of types, \(E\) and \(F\) are respectively an n-set and an m-set, and moreover it holds that \(!{\varvec{x}_E} : !{\varvec{\sigma }_E}, \varvec{y}_F: \varvec{\tau }_F \vdash M : \gamma \). Intuitively, this means that if we have a tuple \(K= (\varvec{N}, \varvec{V})\) of type \((\varvec{\sigma }, \varvec{\tau })\), we can replace free variables of \(M\) by some of the terms from \(K\). More precisely, we can replace variables in linear position by the \(V_i\) with \(i \in F\), and variables in non-linear position by the \(N_j\) with \(j \in E\). We denote by \(M[K]\) the closed term of type \(\gamma \) that we obtain this way. We denote by \(\mathscr {J}\) the set of all applicative typing judgments. We are especially interested in those judgments \((E, F, \varvec{\sigma }, \varvec{\tau }, M, \gamma )\) in \(\mathscr {J}\) such that for every tuple \(K\), the resulting term \(M[K]\) is a value: that is, when either \(M= y_i\) for some \(i \in \mathbb {N}\), or \(M\) is of the form \(\lambda z. N\), \(\lambda {!{z}}. N\), or \(!{N}\). We denote by \(\mathscr {J}^{\mathscr {V}}\) the set of those judgments.

We are now in a position to define \(\mathscr {M}_\oplus ^{!{}}\) formally. The set of its states is indeed defined as \(\mathcal {S}_{\mathscr {M}_\oplus ^{!{}}}= \{(K, A) \mid K\in \mathscr {U} ,\, A\in \mathbf T, \, \, \vdash K : A\}\).

Labels and Transitions. How do states in \(\mathcal {S}_{\mathscr {M}_\oplus ^{!{}}}\) interact with the environment? This is captured by the labels in \(\mathcal {A}_{\mathscr {M}_\oplus ^{!{}}}\), and the associated probability matrix. We define \(\mathcal {A}_{\mathscr {M}_\oplus ^{!{}}}\) as the disjoint union of \({\mathcal A_{?}}\), \(\mathcal A_{{!}}\) and \(\mathcal A_{@}\), where:

$$\begin{aligned} \mathcal A_{{!}}={\mathcal A_{?}}= \{i \mid i \in \mathbb {N}\}; \qquad \qquad \mathcal A_{@}= \{ (\kappa , i) \mid i \in \mathbb {N}, \, \kappa \in \mathscr {J}^{\mathscr {V}}\}. \end{aligned}$$

In order to distinguish actions in \(\mathcal A_{{!}}\) and \({\mathcal A_{?}}\), we write the action \(i\in \mathbb {N}\) as \(({?}^{i})\) if it comes from \({\mathcal A_{?}}\), and as \((!^{i})\) if it comes from \(\mathcal A_{{!}}\). The action \((\kappa ,i)\in \mathcal A_{@}\) is often indicated as \( @^{i}_{\kappa }\). The probability matrix \(\mathcal {P}_{\mathscr {M}_\oplus ^{!{}}}\) is defined formally in Fig. 4. We give below some intuitions about it. The general idea is that \(\mathscr {M}_\oplus ^{!{}}\) is designed to express every possible effect that a context can have on tuples. \({\mathcal A_{?}}\) and \(\mathcal A_{{!}}\) are designed to model copying capabilities, while \(\mathcal A_{@}\) corresponds to applicative interactions.

Actions in \({\mathcal A_{?}}\) take any term of the form \(!{M}\) from the linear part of the underlying tuple, unbox it and transfer \(M\) to the exponential part of the tuple. Please observe that this action is in fact deterministic: the resulting tuple is uniquely determined. Labels in \(\mathcal A_{{!}}\), on the other hand, model the act of copying terms in the exponential part. We call its elements Milner’s actions. More specifically, the action \((!^{i})\) takes the i-th term in the exponential part of the tuple, makes a copy of it, evaluates it and adds the result to the linear part. Please observe that, contrary to \(({?}^{i})\), this action can have a probabilistic outcome: the transferred term is evaluated.

Labels in \(\mathcal A_{@}\) are analogues of the applicative actions from applicative bisimulation over terms (see, e.g., [9]). As such, they model environments passing arguments to programs. Here, we have to adapt this idea to our tuple framework: indeed, we can see the tuple as a collection of programs available to the environment, which is free to choose which of the programs to interact with by passing it an argument. This argument, however, could depend on other components of the tuple, which then need to be removed from the tuple if they lie in its linear part. Finally, please observe that all this should respect types. Labels in \(\mathcal A_{@}\) are indeed defined as pairs of an index i, corresponding to the position in the tuple of the term the environment chooses, and an applicative typing judgment, used to specify the argument. Please observe that in the definition of the probability matrix for applicative actions, in Fig. 4, the condition on i implies that the i-th linear component of the tuple is not used to construct the argument term.

Fig. 4. Definition of \(\mathcal {P}_{\mathscr {M}_\oplus ^{!{}}}\)

Example 4

We give in Fig. 5 a fragment of \(\mathscr {M}_\oplus ^{!{}}\) illustrating our definitions on an example. Let \(\tau \) be an element of \(\mathscr {A}\). We consider terms of the form \(M_\varepsilon = !{(\varOmega _! \oplus ^{\varepsilon } I)}\), for \(\varepsilon \in \mathbb {D}\), and we look at some of the possible evolutions in \(\mathscr {M}_\oplus ^{!{}}\) from the associated state \((\dot{M_\varepsilon }, \dot{!{({\tau \multimap \tau })}}) = ((\left[ \right] , \left[ M_\varepsilon \right] ),(\left[ \right] , \left[ !{({\tau \multimap \tau })}\right] ))\). In Fig. 5, we denote by \(\sigma \) the type \(\tau \multimap \tau \).

Fig. 5. A fragment of \(\mathcal {P}_{\mathscr {M}_\oplus ^{!{}}}\)

5.2 Distributions as States

Now that we have a LMC \(\mathscr {M}_\oplus ^{!{}}\) modeling the interaction between (tuples of) terms and their environment, we could define notions of metric following one of the abstract definitions from the literature, e.g. by defining the trace distance or the behavioral distance between terms. This is, by the way, the approach followed in [6]. We prefer, however, to first turn \(\mathscr {M}_\oplus ^{!{}}\) into a transition system \(\mathscr {L}_\oplus ^{!{}}\) whose states are distributions of tuples. This supports not only a simple, coinductive presentation of the trace distance, but also up-to techniques, as we will see in Sect. 5.5 below. Both will be very convenient when evaluating the distance between concrete terms, and in particular for our running example.

It turns out that the usual notion of LTS is not sufficient for our purposes, since it lacks a way to expose the observables of each state, i.e., its sum. We thus adopt the following definition:

Definition 2

A weighted labeled transition system (WLTS for short) is a quadruple in the form \(\mathscr {L}= (\mathcal S, \mathcal A, {}{\mathop {\rightarrow }\limits ^{\cdot }}{} , w) \) where:

  • \(\mathcal S\) is a set of states and \(\mathcal A\) is a countable set of actions,

  • \({}{\mathop {\rightarrow }\limits ^{\cdot }}{} \) is a transition function such that, for every \(t\in \mathcal S\) and \(a\in \mathcal A\), there exists at most one \(s\in \mathcal S\) such that \({t}{\mathop {\rightarrow }\limits ^{a}}{s} \),

  • \(w : \mathcal S\rightarrow [0,1] \) is a function.

Please observe how WLTSs are deterministic transition systems. We define the WLTS \(\mathscr {L}_\oplus ^{!{}}\) by the equations of Fig. 6.
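In code, a WLTS is just a partial deterministic transition function together with a weight map. A minimal sketch (toy states, our own naming):

```python
from typing import Dict, Optional, Tuple

State, Action = str, str

class WLTS:
    """Deterministic labeled transitions plus a weight in [0, 1] per state."""
    def __init__(self, trans: Dict[Tuple[State, Action], State],
                 weight: Dict[State, float]):
        self.trans, self.weight = trans, weight

    def step(self, t: State, a: Action) -> Optional[State]:
        # determinism: at most one successor for each (state, action) pair
        return self.trans.get((t, a))

lts = WLTS({("t", "a"): "u"}, {"t": 1.0, "u": 0.5})
assert lts.step("t", "a") == "u"
assert lts.step("u", "a") is None  # no successor: the action is unavailable
```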

Fig. 6. The WLTS \(\mathscr {L}_\oplus ^{!{}}=(\mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}, \mathcal {A}_{\mathscr {L}_\oplus ^{!{}}}, {}{\mathop {\rightarrow }\limits ^{\cdot }}{} ,w) \)

If \(t= (\mathcal {D}, A)\) is in \(\mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\), we say that \(t\) is an \(A\)-state. It is easy to check that \(\mathscr {L}_\oplus ^{!{}}\) is nothing more than the natural way to turn \(\mathscr {M}_\oplus ^{!{}}\) into a deterministic transition system. We illustrate this idea in Fig. 7, by giving a fragment of \(\mathscr {L}_\oplus ^{!{}}\) corresponding to (part of) the fragment of \(\mathscr {M}_\oplus ^{!{}}\) given in Example 4.

Fig. 7. A fragment of \(\mathscr {L}_\oplus ^{!{}}\)

5.3 A Coinductively-Defined Metric

Following Desharnais et al. [13], we use a quantitative notion of bisimulation on \(\mathscr {L}_\oplus ^{!{}}\) to define a distance between terms. The idea is to stipulate that, for any \(\varepsilon \in [0,1]\), a relation \(\mathrel {R}\) is an \(\varepsilon \)-bisimulation if it is, in a sense, \(\varepsilon \)-close to a bisimulation. The distance between two states \(t\) and \(s\) is just the smallest \(\varepsilon \) such that \(t\) and \(s\) are \(\varepsilon \)-bisimilar. However, while in [13] the notion of \(\varepsilon \)-bisimulation is local, we want ours to be more restrictive, reflecting the global deviation we are willing to accept along arbitrary sequences of actions.

Definition 3

Let \(\mathscr {L}= (\mathcal S, \mathcal A, {}{\mathop {\rightarrow }\limits ^{\cdot }}{} ,w)\) be a WLTS. Let \(\mathrel {R}\) be a symmetric and reflexive relation on \(\mathcal S\), and \(\varepsilon \in [0,1]\). \(\mathrel {R}\) is an \(\varepsilon \)-bisimulation whenever the following two conditions hold:

  • if \(t\,{\mathrel {R}}\,s\), and \({t}{\mathop {\rightarrow }\limits ^{a}}{u} \), then there exists \(v\) such that \({s}{\mathop {\rightarrow }\limits ^{a}}{v} \), and it holds that \(u\,{\mathrel {R}}\,v\).

  • if \(t\,{\mathrel {R}}\,s\), then \(|w(t)- w(s)|\le \varepsilon \).

For every \(\varepsilon \in [0,1]\), there exists a largest \(\varepsilon \)-bisimulation, which we indicate as \(\mathrel {R}^\varepsilon \). Please observe that it is not in general an equivalence relation (since it is not transitive). We can now define a metric on \(\mathcal S\): \({\delta ^{b}_{\mathscr {L}}}(t,s)= \inf {\left\{ \varepsilon \mid t\,{\mathrel {R}^\varepsilon }\,s\right\} }\). The greatest lower bound is in fact attained, as there exists a \({\delta ^{b}_{\mathscr {L}}}(t,s)\)-bisimulation relating \(t\) and \(s\) [7].
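On a finite WLTS, the two conditions of Definition 3 can be checked directly. A sketch under our own toy encoding (transitions as a dictionary; `R` is a set of pairs assumed symmetric and reflexive):

```python
from typing import Dict, Set, Tuple

State, Action = str, str
Trans = Dict[Tuple[State, Action], State]

def is_eps_bisimulation(trans: Trans, w: Dict[State, float],
                        R: Set[Tuple[State, State]], eps: float) -> bool:
    for (t, s) in R:
        # second condition: observables (weights) differ by at most eps
        if abs(w.get(t, 0.0) - w.get(s, 0.0)) > eps:
            return False
        # first condition: every move of t is matched by s, staying in R
        for (t2, a), u in trans.items():
            if t2 == t:
                v = trans.get((s, a))
                if v is None or (u, v) not in R:
                    return False
    return True

# Two one-step systems whose end states have observables 0.25 and 0.5.
trans = {("t", "a"): "u", ("s", "a"): "v"}
w = {"t": 1.0, "s": 1.0, "u": 0.25, "v": 0.5}
R = {("t", "s"), ("s", "t"), ("u", "v"), ("v", "u"),
     ("t", "t"), ("s", "s"), ("u", "u"), ("v", "v")}
assert is_eps_bisimulation(trans, w, R, 0.25)       # 0.25-close everywhere
assert not is_eps_bisimulation(trans, w, R, 0.2)    # 0.2 is too demanding
```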

How can we turn \(\delta ^{b}_{\mathscr {L}}\) into a metric on terms? The idea is to consider the distribution over tuples one naturally gets when evaluating the term. To every term \(M\) of type \(\sigma \), we associate \(\hat{s}_{\sigma }(M)\in \mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\), defined as \( (\{{\dot{V}}^{[\![M ]\!](V)}\}_{V\in \mathscr {V}}, \dot{\sigma })\).

Definition 4

For every terms \(M\) and \(N\) such that \( \vdash M : \sigma \) and \( \vdash N : \sigma \), we set \({\delta _{\sigma ,!{}}^b}(M,N)= {\delta ^{b}_{\mathscr {L}_\oplus ^{!{}}}}(\hat{s}_{\sigma }(M),\hat{s}_{\sigma }(N))\).

Example 5

Consider again the terms \(M_\varepsilon \) and \(M_{\mu }\) from Example 2. We fix a type \(\tau \), and define \(\sigma = \tau \multimap \tau \). As mentioned in Example 2, it holds that \( \vdash M_\varepsilon : !{\sigma }\). Let now \(\varepsilon , \mu , \alpha \) be in [0, 1], and let \(\mathrel {R}\) be any \(\alpha \)-bisimulation, such that \(\hat{s}_{!{\sigma }}(M_{\varepsilon })\,{\mathrel {R}}\,\hat{s}_{!{\sigma }}(M_{\mu }) \). Let \(\{t_i\}_{i\in \mathbb {N}}\) and \(\{s_i\}_{i\in \mathbb {N}}\) be families from \(\mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\) such that \({\hat{s}_{!{\sigma }}(M_{\varepsilon })}{\mathop {\rightarrow }\limits ^{({?}^{1})}}{{t_0}{\mathop {\rightarrow }\limits ^{(!^{1})}}{{\ldots }{\mathop {\rightarrow }\limits ^{(!^{1})}}{t_i \ldots } } } \) and \( {\hat{s}_{!{\sigma }}(M_{\mu })}{\mathop {\rightarrow }\limits ^{({?}^{1})}}{{s_0}{\mathop {\rightarrow }\limits ^{(!^{1})}}{{\ldots }{\mathop {\rightarrow }\limits ^{(!^{1})}}{s_i \ldots } } } \). Since \(\mathrel {R}\) is an \(\alpha \)-bisimulation, for every i, it holds that \(t_i\,{\mathrel {R}}\,s_i\). Looking at the definition of \(\mathscr {L}_\oplus ^{!{}}\), it is easy to realize that:

$$\begin{aligned} t_i&= \{{(\left[ \varOmega _! \oplus ^{\varepsilon } I\right] , \left[ I,\ldots , I\right] ),(\left[ \sigma \right] , \left[ \sigma , \ldots ,\sigma \right] )}^{\varepsilon ^i}\}_{i\in \mathbb {N}};\\ s_i&= \{{(\left[ \varOmega _! \oplus ^{\mu } I\right] , \left[ I,\ldots , I\right] ) ,(\left[ \sigma \right] , \left[ \sigma , \ldots ,\sigma \right] )}^{\mu ^i}\}_{i\in \mathbb {N}}. \end{aligned}$$

By the definition of an \(\alpha \)-bisimulation, this implies that \(\alpha \ge |\varepsilon ^i - \mu ^i |\) for every i. Since this reasoning applies to every \(\alpha \) such that \(M_{\varepsilon }\) and \(M_{\mu }\) are \(\alpha \)-bisimilar, it follows that: \({\delta _{!{\sigma },!{}}^b}(M_{\varepsilon },M_{\mu }) \ge \sup _{i \in \mathbb {N}} |\varepsilon ^i - \mu ^i |\). Moreover, if we consider the special case where \(\varepsilon = 0\), we can actually construct a \(\mu \)-bisimulation by taking

$$\begin{aligned} {\mathrel {R}} = \{(\hat{s}_{!{\sigma }}(M_{0}), \hat{s}_{!{\sigma }}(M_{\mu }))\}\cup \{(t_0 ,s_0 ) \} \cup \{((0, A), (\mathcal {D}, A)) \mid {\sum \nolimits _{\mathcal {D}}}\le \mu \}. \end{aligned}$$

We can easily check that \(\mathrel {R}\) is indeed a \(\mu \)-bisimulation, which tells us that \({\delta _{!{\sigma },!{}}^b}(M_{0},M_{\mu }) = \mu \).
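The lower bound from this example is easy to evaluate numerically; a small sketch (the truncation bound `imax` is our own device for approximating the supremum):

```python
def sup_gap(eps: float, mu: float, imax: int = 1000) -> float:
    # lower bound on the distance from Example 5: sup over i of |eps^i - mu^i|
    return max(abs(eps ** i - mu ** i) for i in range(imax + 1))

# In the special case eps = 0, the supremum is reached at i = 1,
# matching the exact distance mu computed above.
assert sup_gap(0.0, 0.25) == 0.25
```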

5.4 Full Abstraction

In this section, we prove that \(\delta _{\sigma ,!{}}^b\) coincides with \(\delta ^{c}_{\sigma ,!}\). We first of all show that the metric \(\delta _{\sigma ,!{}}^b\) is sound with respect to \(\delta ^{c}_{\sigma ,!}\), i.e. that \(\delta _{\sigma ,!{}}^b\) discriminates at least as much as \(\delta ^{c}_{\sigma ,!}\):

Theorem 3

(Soundness). For any terms \(M\) and \(N\) of \(\varLambda _\oplus ^{!}\), such that \( \vdash M : \sigma \) and \( \vdash N : \sigma \), it holds that \({\delta ^{c}_{\sigma ,!}}(M,N)\le {\delta _{\sigma ,!{}}^b}(M,N)\).

Please remember that our definition of the tuple distance is based on the notion of \(\varepsilon \)-bisimulation. Proving the soundness theorem, thus, requires us to show that for any terms \(M\) and \(N\) of type \(\sigma \) such that \(\hat{s}_{\sigma }(M)\) and \(\hat{s}_{\sigma }(N)\) are \(\varepsilon \)-bisimilar, and for any context \(C\), it holds that \(\mid {\sum \nolimits _{[\![C[M] ]\!]}} - {\sum \nolimits _{[\![C[N] ]\!]}}\mid \, \le \varepsilon \). Our proof strategy is based on the fact that we can decompose every evaluation path of a term of the form \(C[L]\) into external reduction steps (that is, steps that do not affect \(L\)), and internal reduction steps (that is, reduction steps affecting \(L\), but which can be shown to correspond only to actions from \(\mathscr {L}_\oplus ^{!{}}\)). Intuitively, if we reduce \(C[M]\) and \(C[N]\) in parallel, we will encounter steps where only the context is modified (and the modification does not depend on whether we are considering the first program or the second), and steps where the internal part is modified; the latter, however, cannot induce too much of a difference between the two programs, since the two internal terms are \(\varepsilon \)-bisimilar.

We first of all need to generalize the notion of context to deal with tuples rather than with terms. In particular, we need contexts with multiple holes having types which match those of the tuple (or, more precisely, the \(A\)-state) they are meant to be paired with. More formally:

Definition 5

(Tuple Contexts). Tuple contexts are triples of the form \((C, A, \gamma )\), where \(C\) is an open term, \(A= (\varvec{\sigma }, \varvec{\tau })\) is an (n, m)-tuple type, and \(\gamma \) is a type such that \(!{\varvec{x}_{\{ 1, \ldots ,n\}}} : !{\varvec{\sigma }}, \varvec{y}_{\{1, \ldots , m \}} : {\varvec{\tau }} \vdash C : \gamma \). We denote by \(\mathscr {C}^{\mathbf {T}}\) the set of tuple contexts. A tuple context \((C, A, \gamma )\) is said to be an open value if \(C\) is of one of the following four forms: \(\lambda x. M\), \(\lambda {!{x}}. M\), \(!{M}\), \(y_i\) (where \(i \in \mathbb {N}\)).

We now want to define when a tuple context and an \(A\)-state can be paired together, and the operational semantics of such an object, which will be derived from that of \(\varLambda _\oplus ^{!}\)-terms. This is the purpose of the following definition:

Definition 6

(Tuple Context Pairs). We say that a pair \(u= (C, t)\) is a tuple context pair iff \(t= (\mathcal {D}, A)\) is an \(A\)-state, and \(\exists \gamma \in \mathscr {A}, \,(C, A, \gamma )\in \mathscr {C}^{\mathbf {T}}\). We indicate as \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) the set of tuple context pairs. Moreover, given such a pair \(u= (C,(\mathcal {D}, A))\), we define \(\mathbf F (u)\) as the (potentially infinite) distribution over \(\mathscr {T}\) given by:

$$\begin{aligned} \mathbf F (u) = \{{C\{\varvec{x}/\varvec{M}\}\{\varvec{y}/\varvec{N}\}}^{\mathcal {D}(\varvec{M},\varvec{N})}\}_{(\varvec{M}, \varvec{N}) \in \textsf {S}(\mathcal {D})}. \end{aligned}$$

Giving a notion of context distance for \(A\)-states is now quite easy and natural, since we know what contexts for such objects look like. For the sake of generality, this notion of distance is parametric on a set of tuple contexts \(\mathscr {C}\subseteq \mathscr {C}^{\mathbf {T}}\).

Definition 7

Let \(\mathscr {C}\subseteq \mathscr {C}^{\mathbf {T}}\), \(A\in \mathbf T\), and \(t, s\) be two \(A\)-states. We define:

$$\begin{aligned} {\delta ^{\text {c}}_{\mathscr {C}}}(t,s)= \sup _{(C, A, \sigma ) \in \mathscr {C}}\left|{\sum \nolimits _{[\![\mathbf F (C, t) ]\!]}}- {\sum \nolimits _{[\![\mathbf F (C, s) ]\!]}} \right|\end{aligned}$$

Unsurprisingly, the context distance between terms equals \(\delta ^{\text {c}}_{\mathscr {C}^{\mathbf {T}}}\) when applied to \(A\)-states obtained through \(\hat{s}_{\sigma }(\cdot )\):

Proposition 1

If \( \vdash M, N : \sigma \), then \({\delta ^{c}_{\sigma ,!}}(M,N) = {\delta ^{\text {c}}_{\mathscr {C}^{\mathbf {T}}}}(\hat{s}_{\sigma }(M),\hat{s}_{\sigma }(N))\).

But why did we introduce \(\mathbf { C \times \varDelta ( \mathscr {U} )}\)? These pairs allow a fine-grained analysis of how tuples behave when put in a context, which is precisely what we need to prove Theorem 3. This analysis, however, is not possible without endowing \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) itself with an operational semantics, which is what we do in the next paragraphs.

Two relations need to be defined. On the one hand, we need a one-step labeled transition relation \(\rightarrow _{\mathbf { C \times \varDelta ( \mathscr {U} )}}\) which turns an element of \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) into a distribution over \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) by performing an action. Intuitively, one step of reduction in \(\rightarrow _{\mathbf { C \times \varDelta ( \mathscr {U} )}}\) corresponds to at most one step of reduction in \(\mathscr {L}_\oplus ^{!{}}\). If that step exists (i.e., if the term is reduced), then the label is the same; otherwise (i.e., if only the context is reduced), the label is just \(\tau \). We also need a multi-step approximation semantics \({\Rightarrow }_{\mathbf { C \times \varDelta ( \mathscr {U} )}}\) between elements of \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) and subdistributions over the same set. The latter is based on the former, and both are formally defined in Fig. 8, where

  • \(E\) is an evaluation context;

  • \(t\) is an (n, m)-state from \(\mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\);

  • \(h\) is a tuple-context pair from \(\mathbf { C \times \varDelta ( \mathscr {U} )}\);

  • For every context \(C\), \(C_{\text {remove}(E)}\) stands for the context

    $$\begin{aligned} C\{y_1/y_{1-\#\{j \mid j \in E\wedge j<1\}}\} \ldots \{y_n/y_{n-\#\{j \mid j \in E\wedge j <n\}}\} \end{aligned}$$
Fig. 8. Rules for \({} {\Rightarrow }_{\mathbf { C \times \varDelta ( \mathscr {U} )}} \)

We first show that this definition can indeed be related to the usual semantics for terms. This takes the form of the following lemma:

Lemma 2

Let \(u\in \mathbf { C \times \varDelta ( \mathscr {U} )}\). Then:

  • \(\{\mathcal {D}\mid {u} {\Rightarrow }_{\mathbf { C \times \varDelta ( \mathscr {U} )}} \mathcal {D}\}\) is a directed set. We define \({[\![u ]\!]}^{\mathbf { C \times \varDelta ( \mathscr {U} )}}\) as its least upper bound;

  • \(\mathbf F (\cdot ):\text {Distr}(\mathbf { C \times \varDelta ( \mathscr {U} )}) \rightarrow \text {Distr}(\mathscr {T})\) is continuous;

  • \([\![\mathbf F (u) ]\!] = \mathbf F ({[\![u ]\!]}^{\mathbf { C \times \varDelta ( \mathscr {U} )}})\).

Before proceeding, we need to understand how any reflexive and symmetric relation on \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) can be turned into a relation on distributions over \(\mathbf { C \times \varDelta ( \mathscr {U} )}\). If \(\mathrel {R}\) is a reflexive and symmetric relation on \(\mathbf { C \times \varDelta ( \mathscr {U} )}\), we lift it to distributions over \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) by stipulating that \(\mathcal {D}\mathrel {R}\mathcal {E}\) whenever there exist a countable set I, a family \((p_i)_{i \in I}\) of positive reals of sum at most 1, and families \((h_i)_{i \in I}, (k_i)_{i \in I}\) in \(\mathbf { C \times \varDelta ( \mathscr {U} )}\), such that \(\mathcal {D}=\{{h_i}^{p_i}\}_{i \in I}\), \(\mathcal {E}= \{{k_i}^{p_i}\}_{i \in I}\), and moreover \(h_i \mathrel {R}k_i\) for every \(i \in I\).
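This lifting is a coupling-style construction: any weighted family of related pairs witnesses a pair of related distributions. A minimal sketch (our own encoding, with pair components as strings):

```python
from typing import Dict, List, Tuple

def lift(pairs: List[Tuple[str, str, float]]
         ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Given related pairs (h_i, k_i) with weights p_i (summing to at most 1),
    build the two lifted distributions they witness."""
    assert sum(p for (_, _, p) in pairs) <= 1.0 + 1e-9
    D: Dict[str, float] = {}
    E: Dict[str, float] = {}
    for h, k, p in pairs:
        D[h] = D.get(h, 0.0) + p  # accumulate weight on the left component
        E[k] = E.get(k, 0.0) + p  # and the same weight on the right one
    return D, E

# Two related pairs with weights 0.5 and 0.25 witness two subdistributions.
D, E = lift([("h1", "k1", 0.5), ("h2", "k2", 0.25)])
assert D == {"h1": 0.5, "h2": 0.25} and E == {"k1": 0.5, "k2": 0.25}
```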

We now want to precisely capture when a relation on \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) can be used to evaluate the distance between tuple-context pairs.

Definition 8

Let \(\mathrel {R}\) be a reflexive and symmetric relation on \(\mathbf { C \times \varDelta ( \mathscr {U} )}\).

  • We say that \(\mathrel {R}\) is preserved by \(\rightarrow _{\mathbf { C \times \varDelta ( \mathscr {U} )}}\) if, for any \(h, k\in \mathbf { C \times \varDelta ( \mathscr {U} )}\) such that \(h\mathrel {R}k\), if \({h}{\mathop {\rightarrow }\limits ^{a}}_{\mathbf { C \times \varDelta ( \mathscr {U} )}} {\mathcal {D}} \), then there exists \(\mathcal {E}\) such that \({k}{\mathop {\rightarrow }\limits ^{a}}_{\mathbf { C \times \varDelta ( \mathscr {U} )}} {\mathcal {E}} \), and that \(\mathcal {D}\mathrel {R}\mathcal {E}\).

  • We say that \(\mathrel {R}\) is \(\varepsilon \)-bounding if \(h\mathrel {R}k\) implies \(|{\sum \nolimits _\mathbf{F (h)}}- {\sum \nolimits _\mathbf{F (k)}}|\le \varepsilon \).

  • Let \(\mathscr {C}\) be a set of tuple contexts, and \(t, s\in \mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\) be two \(A\)-states. We say that \(\mathrel {R}\) is \(\mathscr {C}\)-closed with respect to \(t\) and \(s\) if, for every \(C\) and \(\gamma \) such that \((C, A, \gamma ) \in \mathscr {C}\), it holds that \((C, t)\mathrel {R}(C, s)\).

Please observe how any relation that is preserved by \(\rightarrow _{\mathbf { C \times \varDelta ( \mathscr {U} )}}\) and \(\varepsilon \)-bounding can be seen as a kind of \(\varepsilon \)-bisimulation, but on tuple-context pairs. The way we defined the lifting, however, makes it an even stronger notion, and hence the ideal candidate for an intermediate step towards soundness.

As a first step, the conditions from Definition 8 are enough to guarantee that two terms are at context distance at most \(\varepsilon \).

Proposition 2

Let \(M, N\) be two terms of type \(\sigma \). Suppose there exists a reflexive and symmetric relation \(\mathrel {R}\) on \(\mathbf { C \times \varDelta ( \mathscr {U} )}\), which is preserved by \({}{\mathop {\rightarrow }\limits ^{}}_{\mathbf { C \times \varDelta ( \mathscr {U} )}} {} \), \(\varepsilon \)-bounding, and \(\mathscr {C}^{\mathbf {T}}\)-closed with respect to \(\hat{s}_{\sigma }(M)\) and \(\hat{s}_{\sigma }(N)\). Then \({\delta ^{c}_{\sigma ,!}}(M,N)\le \varepsilon \).

What remains to be done, then, is to show that if two states are related by \(\mathrel {R}^\varepsilon \), then they give rise to a relation satisfying Definition 8. A prerequisite is showing that any \(\varepsilon \)-bisimulation can at least be turned into a relation on \(\mathbf { C \times \varDelta ( \mathscr {U} )}\). We need to do that, in particular, in a way that guarantees the \(\mathscr {C}\)-closure of the resulting relation, and thus considers all possible tuple contexts from \(\mathscr {C}\):

Definition 9

Let \(\mathrel {R}\) be a reflexive and symmetric relation on \(\mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\), and let \(\mathscr {C}\) be a set of tuple contexts. We define its contextual lifting \(\widehat{\mathrel {R}}^{\mathscr {C}}\) to \(\mathbf { C \times \varDelta ( \mathscr {U} )}\) with respect to \(\mathscr {C}\) as the following binary relation on \(\mathbf { C \times \varDelta ( \mathscr {U} )}\): \((C, t)\,\widehat{\mathrel {R}}^{\mathscr {C}}\,(C, s)\) whenever \(t\) and \(s\) are \(A\)-states with \(t\mathrel {R}s\), and \((C, A, \gamma ) \in \mathscr {C}\) for some type \(\gamma \).

The following result tells us that, indeed, any \(\varepsilon \)-bisimulation can be turned into a relation satisfying Definition 8:

Proposition 3

Let \(\mathrel {R}\) be an \(\varepsilon \)-bisimulation. Then \(\widehat{\mathrel {R}}^{\mathscr {C}^{\mathbf {T}}}\) is preserved by \({}{\mathop {\rightarrow }\limits ^{}}_{\mathbf { C \times \varDelta ( \mathscr {U} )}} {} \) and \(\varepsilon \)-bounding, and \(\mathscr {C}^{\mathbf {T}}\)-closed with respect to every \(t, s\) such that \(t\mathrel {R}s\).

We are finally ready to give a proof of soundness:

Proof

(of Theorem 3). Consider two terms \(M\) and \(N\) of type \(\sigma \). Let \(\varepsilon \) be \({\delta _{\sigma ,!{}}^b}(M,N)\). We take \(\mathrel {R}^\varepsilon \) (defined in Definition 3 as the largest \(\varepsilon \)-bisimulation), and we see that \(\hat{s}_{\sigma }(M)\,{\mathrel {R}^{\varepsilon }}\,\hat{s}_{\sigma }(N)\). Proposition 3 tells us that we can apply Proposition 2 to \(M\), \(N\), and \(\widehat{(\mathrel {R}^\varepsilon )}^{\mathscr {C}^{\mathbf {T}}}\). Doing so, we obtain that \({\delta ^{c}_{\sigma ,!}}(M,N)\le \varepsilon \), which is exactly the statement of the theorem.    \(\square \)

We can actually show (see [7]) that \(\delta _{\sigma ,!{}}^b\) is also complete with respect to the contextual distance:

Theorem 4

(Full Abstraction). For every \(\sigma \), \({\delta ^{c}_{\sigma ,!}} = \delta _{\sigma ,!{}}^b\).

5.5 On an Up-to-Context Technique

As we have just shown, context distance can be characterized as a coinductively-defined metric, which turns out to be useful when evaluating the distance between terms. In this section, we will go even further, and show how an up-to-context [31] notion of \(\varepsilon \)-bisimulation is precisely what we need to handle our running example.

We first of all need to generalize our definition of a tuple: an open tuple is a pair \((\varvec{M}, \varvec{N})\), where \(\varvec{M}\) and \(\varvec{N}\) are sequences of (not necessarily closed) typable terms.

Definition 10

If \(K= (\varvec{M}, \varvec{N})\) is an open tuple, and \(A= (\varvec{\gamma }, \varvec{\eta })\) is a tuple type, we say that \((\varvec{\sigma }, \varvec{\tau }, K, A)\) is a substitution judgment iff:

  • \(!{\varvec{x}}: !{\varvec{\sigma }} \vdash {\varvec{M}_i} : {\varvec{\gamma }}_i\) for every i;

  • if n and m are such that \(\varvec{\tau }\) is an n-sequence, and \(\varvec{N}\) an m-sequence, then there exists a partition \(\{E_1, \ldots , E_m\}\) of \(\{1,\ldots , n \}\) such that \(\varvec{y}_{E_j}: \varvec{\tau }_{E_j} \vdash N_j : \eta _j \) for every \(j \in \{1, \ldots ,m\}\).

\(\mathscr {J}^{\text {subst}}\) is the set of all substitution judgments.

If \(\kappa = (\varvec{\sigma }, \varvec{\tau }, K, A) \in \mathscr {J}^{\text {subst}}\), and \(H\in \mathscr {U} \) is of type \((\varvec{\sigma }, \varvec{\tau })\), then there is a natural way to form a tuple \(\kappa [H]\), namely by substituting the free variables of \(K\) by the components of \(H\). In the following, we restrict \(\mathscr {J}^{\text {subst}}\) to those judgments \(\kappa \) such that for every \(H\), the terms in the linear part of \(\kappa [H]\) are values. Observe that we always have \( \vdash \kappa [H] : A\). We extend the notation \(\kappa [H]\) to distributions over \( \mathscr {U} \): if \(\mathcal {D}\) is a distribution over tuples of type \((\varvec{\sigma }, \varvec{\tau })\), we write \(\kappa [\mathcal {D}]= \{{\kappa [H]}^{\mathcal {D}(H)}\}_{H\in \mathscr {U} }\), which is a distribution over tuples of type \(A\). Moreover, we want to be able to apply our substitution judgments to the states of \(\mathscr {L}_\oplus ^{!{}}\). If \(t= (\mathcal {D}, (\varvec{\sigma }, \varvec{\tau }))\in \mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\), and \(\kappa = (\varvec{\sigma }, \varvec{\tau }, K, A)\), the state of \(\mathscr {L}_\oplus ^{!{}}\) defined by \(( \kappa [\mathcal {D}], A)\) will often be indicated as \(\kappa [t]\).

Example 6

We illustrate the use of substitution judgments on a simple example. Let \(\tau \) be any type. Consider \(\varvec{\sigma }= \left[ \tau \multimap \tau \right] \), and \(\varvec{\tau }= \left[ \right] \). Moreover, let \(K= (\left[ x_1\right] , \left[ I\right] )\) and \(A= (\left[ \tau \multimap \tau \right] , \left[ \tau \multimap \tau \right] )\). Then \(\kappa = (\varvec{\sigma }, \varvec{\tau }, K, A)\) is a substitution judgment. We now consider a tuple of type \((\varvec{\sigma }, \varvec{\tau })\); in fact, we take a tuple that will be useful to analyze our running example: \(H=(\left[ \varOmega _! \oplus ^{\varepsilon } I\right] , \left[ \right] ) \). By substituting \(H\) in \(\kappa \), we obtain \(\kappa [H]= (\left[ \varOmega _! \oplus ^{\varepsilon } I\right] , \left[ I\right] )\), and it is easy to check that this is indeed a tuple of type \(A\).

The main idea behind up-to-context bisimulation is to allow contexts to be discarded when proving that a relation is a bisimulation. This is captured by the following definition:

Definition 11

Let \(\mathrel {R}\) be a relation on \(\mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\). \(\mathrel {R}\) is an \(\varepsilon \)-bisimulation up-to-context if for every \(t\) and \(s\) such that \(t\mathrel {R}s\), the following holds:

  • there exists \(C\in \mathbf T\) such that \(t= (\mathcal {D}, C) \), \(s= (\mathcal {E}, C)\), and \(|{\sum \nolimits _{\mathcal {D}}}- {\sum \nolimits _{\mathcal {E}}}|\le \varepsilon \).

  • for any \(a\in \mathcal {A}_{\mathscr {M}_\oplus ^{!{}}}\), if \({t}{\mathop {\rightarrow }\limits ^{a}}{u} = (\mathcal {D}, A)\) and \({s}{\mathop {\rightarrow }\limits ^{a}}{v} = (\mathcal {E}, A) \), then there exists a finite set \(I \subseteq \mathbb {N}\) such that:

    • there is a family of rationals \((p_i)_{i\in I}\) such that \(\sum _{i \in I} p_i \le 1\);

    • there are families \(\varvec{\sigma }^i\), \(\varvec{\tau }^i\), and \(K^i\), such that \(\kappa _i = (\varvec{\sigma }^i, \varvec{\tau }^i, K^i, A)\) is a substitution judgment for every \(i\in I\);

    • there are distributions over tuples \(\mathcal {D}_i\), \(\mathcal {E}_i\) such that \((\mathcal {D}_i, B_i) \mathrel {R}(\mathcal {E}_i, B_i)\), where \(B_i = (\varvec{\sigma }^i, \varvec{\tau }^i)\);

    and moreover \(\mathcal {D}= \sum \nolimits _{i \in I}p_i \cdot \kappa _i[\mathcal {D}_i] \), and \(\mathcal {E}= \sum \nolimits _{i \in I} p_i \cdot \kappa _i[\mathcal {E}_i].\)

The proof method we just introduced is indeed quite useful when handling our running example.

Example 7

We show that up-to bisimulations can handle our running example. Recall the definition of \(M_\varepsilon \) given in Example 2. First, we can see that, for every a and every type \(\tau \), \(\hat{s}_{!{(\tau \multimap \tau )}}(M_{a}) = ({\{{(\left[ \right] , \left[ !{\varOmega _! \oplus ^{a} I}\right] )}^1\}} ,(\left[ \right] , \left[ !{\tau \multimap \tau }\right] ) )\). We define a relation \(\mathrel {R}\) on \(\mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\) containing \((\hat{s}_{!{(\tau \multimap \tau )}}(M_{\varepsilon }), \hat{s}_{!{(\tau \multimap \tau )}}(M_{\mu }))\), and we show that it is a \(\gamma \)-bisimulation up-to-context for an appropriate \(\gamma \). To simplify notation, we define \(B=({\left[ \tau \multimap \tau \right] },\left[ \right] )\), and \(t_n, s_n \in \mathcal {S}_{\mathscr {L}_\oplus ^{!{}}}\) as:

$$\begin{aligned} t_n = ({\{{({\left[ ({\varOmega _!} \oplus ^{\varepsilon }I)\right] },\left[ \right] )}^{({\varepsilon }^n)}\}} , B), \qquad s_n = ({\{{({\left[ ({\varOmega _!} \oplus ^{\mu }I)\right] },\left[ \right] )}^{({\mu }^n )}\}} , B). \end{aligned}$$

Then, we define the relation \(\mathrel {R}\) as \( \mathrel {R}= \left\{ \left( \hat{s}_{\sigma }(M), \hat{s}_{\sigma }(N)\right) \right\} \cup \left\{ (t_n ,s_n) \mid n \in \mathbb {N}\right\} . \) One can check that \(\mathrel {R}\) is indeed a \(\gamma \)-bisimulation up-to-context (where \(\gamma =\sup _{n\in \mathbb {N}}|\varepsilon ^n-\mu ^n |\)) by carefully analyzing every possible action [7]. The proof is based on the following observations:

  • The only action starting from \(\hat{s}_{\sigma }(M)\) or \(\hat{s}_{\sigma }(N)\) is \(a= ({?}^{1})\), which passes a term to the exponential part of the tuple; we then end up in \(t_0\) and \(s_0\), respectively.

  • If we start from \(t_n\) or \(s_n\), the only relevant action is \(a= (!^{1})\), which consists in taking a copy of the term in the exponential part, evaluating it, and putting the result in the linear part. We can see (using the substitution judgment \(\kappa \) defined in Example 6) that \({t_n}{\mathop {\rightarrow }\limits ^{a}}{\kappa [t_{n+1}]} \), and similarly \({s_n}{\mathop {\rightarrow }\limits ^{a}}{\kappa [s_{n+1}]} \), and the result follows.

Bisimulations up-to-context would be useless without a correctness result such as the following one:

Theorem 5

If \(\mathrel {R}\) is an \(\varepsilon \)-bisimulation up-to-context, then \(\mathrel {R}\subseteq \mathrel {R}^\varepsilon \).

The proof is an extension of that of Theorem 3 (although technically more involved), and can be found in [7].

Example 8

We can exploit the soundness of up-to-context bisimulation to obtain the contextual distance for our running example. This allows us to conclude that \({\delta ^{c}_{!{(\tau \multimap \tau )},!}}(M_{\varepsilon },M_{\mu }) = \sup _{n\in \mathbb {N}} |\varepsilon ^n - \mu ^n |\). The context distance between \(M_{\varepsilon }\) and \(M_{\mu }\) is thus strictly between 0 and 1 whenever \(0<|\varepsilon -\mu |<1\).
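As a quick sanity check of the closed form \(\sup _{n\in \mathbb {N}} |\varepsilon ^n - \mu ^n |\), one can approximate the supremum numerically: for \(\varepsilon ,\mu \in [0,1)\) both sequences tend to 0, so the supremum is attained at a finite n and a bounded scan suffices. An illustrative Python sketch (not part of the formal development; the bound `n_max` is our choice):

```python
def ctx_distance(eps, mu, n_max=10_000):
    """Approximate sup_{n >= 0} |eps**n - mu**n| by a finite scan.
    For eps, mu in [0, 1) both sequences vanish as n grows, so the
    supremum is attained at some finite n and n_max bounds the search."""
    return max(abs(eps ** n - mu ** n) for n in range(n_max))
```

For instance, `ctx_distance(0.5, 0.9)` returns 0.604 (attained at n = 3): a value strictly between 0 and 1, in accordance with Example 8.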

6 Probabilistic \(\lambda \)-Calculi, in Perspective

The calculus \(\varLambda _\oplus ^{!,\parallel }\) we analyzed in this paper is, at least apparently, nonstandard, both because of the presence of parallel disjunction and because of the linear refinement it is based on. In this section, we reconcile what we have done so far with calculi in the literature, in particular with untyped probabilistic \(\lambda \)-calculi akin to those studied, e.g., in [5, 9].

We consider a language \(\varLambda _\oplus \) defined by the following grammar:

$$\begin{aligned} M, N \,{:}{:}{=}\, x \mid \lambda x. M \mid MN \mid M \,\oplus \, N. \end{aligned}$$

6.1 On Stable Fragments of \(\mathscr {M}_\oplus ^{!{}}\)

Our objective in this section is to characterize various notions of context distance for \(\varLambda _\oplus \) by way of appropriate embeddings into \(\varLambda _\oplus ^{!}\), and thus by the LMC \(\mathscr {M}_\oplus ^{!{}}\). It is quite convenient, then, to understand when a fragment of \(\mathscr {M}_\oplus ^{!{}}\) is sufficiently robust to be self-contained:

Definition 12

We say that the pair \((\hat{\mathcal S}, \hat{\mathcal A})\), where \(\hat{\mathcal S}\subseteq \mathcal {S}_{\mathscr {M}_\oplus ^{!{}}}\) and \(\hat{\mathcal A}\subseteq \mathbf T\times \mathcal {A}_{\mathscr {M}_\oplus ^{!{}}}\), is a stable fragment of \(\mathscr {M}_\oplus ^{!{}}\) iff for every pair \((A,a) \in \hat{\mathcal A}\), for every \(A\)-state \(t\), and for every \(s\in \mathcal S\) such that \(\mathcal {P}_{\mathscr {M}_\oplus ^{!{}}}(t, a, s)>0\), it holds that \(s\in \hat{\mathcal S}\).

Using a stable fragment of \(\mathscr {M}_\oplus ^{!{}}\), we can restrict the WLTS \(\mathscr {L}_\oplus ^{!{}}\) in a meaningful way. The idea is that we only consider some of the states of \(\mathscr {L}_\oplus ^{!{}}\), and we are able to choose the possible actions depending on the type of the state of \(\mathscr {L}_\oplus ^{!{}}\) we consider.

Definition 13

If \(\mathscr {F}= (\hat{\mathcal S}, \hat{\mathcal A})\) is a stable fragment of \(\mathscr {M}_\oplus ^{!{}}\), we define a WLTS by \(\mathscr {L}_{\mathscr {F}} = (\mathcal {S}_{\mathscr {L}_{\mathscr {F}}}, \mathcal {A}_{ \mathscr {L}_{\mathscr {F}}}, {}{\mathop {\rightarrow }\limits ^{\cdot }}{_{\mathscr {F}}} ,w_{{\mathscr {F}}})\), as

$$\begin{aligned} \mathcal {S}_{\mathscr {L}_{\mathscr {F}}} = \bigcup \nolimits _{A\in \mathbf T} \text {Distr}(\{K\mid (K, A)\in \hat{\mathcal S}\})\times \{ A\}; \qquad \mathcal {A}_{ \mathscr {L}_{\mathscr {F}}} = \bigcup \nolimits _{(A, a) \in \hat{\mathcal A}}\{a\} \cup \mathbf T; \end{aligned}$$
$$\begin{aligned} {}{\mathop {\rightarrow }\limits ^{\cdot }}_{\mathscr {F}}{} = {}{\mathop {\rightarrow }\limits ^{\cdot }}{} \cap {\{((\mathcal {D}, A), a, s) \mid \textsf {S}(\mathcal {D})\subseteq \hat{\mathcal S}, \, (A, a) \in \hat{\mathcal A}\}}; \end{aligned}$$

and \(w_{{\mathscr {F}}}\) is defined as expected.

We want to be able to define a notion of distance on a fragment of the original language \(\varLambda _\oplus ^{!}\), in such a way that it satisfies the soundness property for a restricted set of contexts. To do that, we need the restricted set of contexts \(\mathscr {C}\) to be preserved by the stable fragment:

Definition 14

Let \(\mathscr {F}= (\hat{\mathcal S}, \hat{\mathcal A})\) be a stable fragment of \(\mathscr {M}_\oplus ^{!{}}\). Let \(\mathscr {C}\) be a set of tuple contexts. We say that \(\mathscr {C}\) is preserved by \(\mathscr {F}\) if the following holds: for any \((C, A,\gamma ) \in \mathscr {C}\) that is not an open value and for any \(A\)-state \(t\) in \(\mathcal {S}_{\mathscr {L}_{\mathscr {F}}}\), there exists \(a\) such that \((A,a) \in \hat{\mathcal A}\cup (\mathbf T\times \{ \tau \})\), \({(C, t)}{\mathop {\rightarrow }\limits ^{a}}_{\mathbf { C \times \varDelta ( \mathscr {U} )}} {\mathcal {E}} \), and moreover:

$$\begin{aligned} \textsf {S}(\mathcal {E})\subseteq \bigcup \nolimits _{B\in \mathbf T} \{(D,s) \mid s\text { a }B\text {-state} \wedge \exists \eta \text { s.t. }(D, B,\eta ) \in \mathscr {C}\} \end{aligned}$$

We are now able to provide guarantees that the contextual distance \(\delta ^{\text {c}}_{\mathscr {C}}\) with respect to our restricted set of contexts \(\mathscr {C}\) is smaller than the distance defined on the WLTS \(\mathscr {L}_{\mathscr {F}}\) induced by our stable fragment \(\mathscr {F}\). This is the spirit of the following proposition.

Proposition 4

Let \(\mathscr {F}= (\hat{\mathcal S}, \hat{\mathcal A})\) be a stable fragment of \(\mathscr {M}_\oplus ^{!{}}\), \(\mathscr {C}\) be a set of tuple contexts preserved by \(\mathscr {F}\), and \(t, s\in \mathcal {S}_{\mathscr {L}_{\mathscr {F}}}\). Then \({\delta ^{\text {c}}_{\mathscr {C}}}(t,s)\le {\delta ^{b}_{\mathscr {L}_{\mathscr {F}}}}(t,s)\).

In the following, we make use of Proposition 4 on stable fragments corresponding to embeddings of \(\varLambda _\oplus \) into \(\varLambda _\oplus ^{!}\). We will consider two different encodings depending on the underlying notion of evaluation.

6.2 Call-by-Name

\(\varLambda _\oplus \) can first of all be endowed with call-by-name semantics, as in Fig. 9. We use it to define an approximation semantics exactly in the same way as in Fig. 1, and we take as usual the semantics of a term to be the least upper bound of its approximated semantics. Moreover, we denote by \(\delta _{\text {cbn}}^{c}\) the context distance on \(\varLambda _\oplus \), defined in the natural way. In the remainder of this section, we use our results about \(\varLambda _\oplus ^{!}\) to obtain a characterization of \(\delta _{\text {cbn}}^{c}\).

Fig. 9. One-step call-by-name semantics

The Call-By-Name Embedding. Girard’s translation [19] gives us an embedding \(\langle \cdot \rangle ^{\text {cbn}}:\varLambda _\oplus \rightarrow \varLambda _\oplus ^{!}\), defined as follows:

$$\begin{aligned} \langle x \rangle ^{\text {cbn}}&= x&\langle \lambda x. M \rangle ^{\text {cbn}}&=\lambda {!{x}}. \langle M \rangle ^{\text {cbn}} \\ \langle MN \rangle ^{\text {cbn}}&=\langle M \rangle ^{\text {cbn}}!{\langle N \rangle ^{\text {cbn}}}&\langle M \,\oplus \, N \rangle ^{\text {cbn}}&= \langle M \rangle ^{\text {cbn}} \,\oplus \, \langle N \rangle ^{\text {cbn}} \end{aligned}$$

Observe that \(\langle \cdot \rangle ^{\text {cbn}}\) respects typing, in the sense that, when we define \(\sigma ^{\text {cbn}}= \mu {\alpha }. !{\alpha } \multimap \alpha \), it holds that for every term \(M\) of \(\varLambda _\oplus \) whose free variables are in \(\{ x_1, \ldots , x_n\}\), we can show that \(!{x}_1: !{\sigma ^{\text {cbn}}}, \ldots , !{x}_n : !{\sigma ^{\text {cbn}}} \vdash \langle M \rangle ^{\text {cbn}} : \sigma ^{\text {cbn}}\).
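Girard's translation is directly implementable as a structural recursion over the term syntax. Below is an illustrative Python sketch over a tuple-encoded AST; the tags `'bang'` and `'blam'` (encoding \(!M\) and \(\lambda {!{x}}. M\)) are our own encoding, not the paper's:

```python
def cbn(t):
    """Call-by-name embedding <.>^cbn of Lambda_oplus into Lambda_oplus^!.
    Source terms: ('var', x), ('lam', x, M), ('app', M, N), ('oplus', M, N).
    The target adds ('bang', M) for !M and ('blam', x, M) for lambda !x. M."""
    tag = t[0]
    if tag == 'var':                       # <x> = x
        return t
    if tag == 'lam':                       # <lam x. M> = lam !x. <M>
        return ('blam', t[1], cbn(t[2]))
    if tag == 'app':                       # <M N> = <M> !<N>
        return ('app', cbn(t[1]), ('bang', cbn(t[2])))
    if tag == 'oplus':                     # <M + N> = <M> + <N>
        return ('oplus', cbn(t[1]), cbn(t[2]))
    raise ValueError(f"unknown constructor {tag!r}")
```

For example, `cbn(('lam', 'x', ('app', ('var', 'x'), ('var', 'y'))))` yields `('blam', 'x', ('app', ('var', 'x'), ('bang', ('var', 'y'))))`, mirroring the four equations above.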

Metrics for \({{\varvec{\varLambda }}_{\varvec{\oplus }}}\). It is very tempting to define a metric on \(\varLambda _\oplus \) just as follows: \( {\delta _{\text {cbn}}^b}(M,N) = {\delta _{!{\sigma ^{\text {cbn}}},!{}}^b}(!{\langle M \rangle ^{\text {cbn}}},!{\langle N \rangle ^{\text {cbn}}})\). We can easily see that it is sound with respect to the context distance for \(\varLambda _\oplus \), since any context of this language can be seen, through \(\langle \cdot \rangle ^{\text {cbn}}\), as a context in \(\varLambda _\oplus ^{!}\). However, it is not complete, as shown by the following example:

Example 9

We consider \(M= \varOmega \,\oplus \, (\lambda x. \varOmega )\) and \( N= {(\lambda x. \varOmega )}\). We can see that \({\delta _{!{\sigma ^{\text {cbn}}},!{}}^b}( !{\langle M \rangle ^{\text {cbn}}},!{\langle N \rangle ^{\text {cbn}}}) = 1 \): indeed, when we define a sequence of \(\varLambda _\oplus ^{!}\)-contexts by \(C_n = \lambda {!{x}}. \left( (\lambda y_1. \ldots \lambda y_n. (\lambda z. z y_1 \ldots y_n))x\ldots x\right) [] \), we see that \(\text {Obs}(C_n[!{\langle M \rangle ^{\text {cbn}}}]) = 1/{2^n}\) while \(\text {Obs}(C_n[!{\langle N \rangle ^{\text {cbn}}}]) = 1\). But these contexts \(C_n\) have more expressive power than any context in \(\langle \varLambda _\oplus \rangle ^{\text {cbn}}\), since they do something that no context from \(\varLambda _\oplus \) can do: they evaluate a copy of the term, and then shift their focus to another copy of the term. This can be seen in the embedding: a term in \(\langle \varLambda _\oplus \rangle ^{\text {cbn}}\) never has several redexes in linear position. It can actually be shown (see [7]) that \({\delta _{\text {cbn}}^{c}}(M,N) = \frac{1}{2} < {\delta _{\text {cbn}}^b}(M,N) \).
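The quantitative gap exploited by the contexts \(C_n\) comes from elementary probability: each copy of \(\langle M \rangle ^{\text {cbn}}\) converges independently with probability 1/2, so forcing n copies to converge succeeds with probability \((1/2)^n\), while \(N\) always converges. A small sketch using exact rationals (illustrative only; the function name is ours):

```python
from fractions import Fraction

def obs_n_copies(p_conv, n):
    """Probability that n independent copies of a term all converge,
    when each single copy converges with probability p_conv.  This models
    the observable produced by the contexts C_n of Example 9."""
    return Fraction(p_conv) ** n

# the observation gap 1 - (1/2)^n between M and N grows towards 1 with n,
# which is why the naive metric delta^b_cbn evaluates to 1 on this pair
gap = [1 - obs_n_copies(Fraction(1, 2), n) for n in range(4)]
```

No context of \(\varLambda _\oplus \) can force more than one copy this way, which is why the genuine context distance stops at 1/2.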

The way out consists in using the notion of stable fragment to refine the Markov Chain \(\mathscr {M}_\oplus ^{!{}}\) by keeping only the states and actions we are interested in.

Definition 15

We define a stable fragment \(\mathscr {F}^{\text {cbn}}\) as specified in Fig. 10, and a distance \(\delta _{\text {cbn}}\) on \(\varLambda _\oplus \) as \({\delta _{\text {cbn}}}(M,N) = {\delta ^{b}_{\mathscr {L}_{\mathscr {F}^{\text {cbn}}}}}(\hat{s}^{\text {cbn}}(M),\hat{s}^{\text {cbn}}(N))\), where \(\hat{s}^{\text {cbn}}(M)= ({\{{([\langle M \rangle ^{\text {cbn}}], [])}^1\}} ,A^{0})\).

Fig. 10. The stable fragment \(\mathscr {F}^{\text {cbn}}= (\mathcal S_{\mathscr {M}_\oplus ^{\text {cbn}}},\mathcal A_{\mathscr {M}_\oplus ^{\text {cbn}}})\).

We now need to define a set of tuple contexts preserved by \(\mathscr {F}^{\text {cbn}}\), with the aim of applying Proposition 4.

Definition 16

\(\mathscr {C}_{\text {cbn}}\) is the smallest set of tuple contexts such that:

  • If \(M\in \varLambda _\oplus \) with \(FV(M)\subseteq \{x_1 \}\), then \((\langle M \rangle ^{\text {cbn}}, A^{0}, \sigma ^{\text {cbn}}) \in \mathscr {C}_{\text {cbn}}\);

  • If \((C, A^{0}, \sigma ^{\text {cbn}}) \in \mathscr {C}_{\text {cbn}}\), and \(C= E[x_1]\), it holds that \((E[y_1] ,A^{1}, \sigma ^{\text {cbn}}) \in \mathscr {C}_{\text {cbn}}\).

\(\mathscr {C}_{\text {cbn}}\) is designed to allow us to link \(\delta _{\text {cbn}}^{c}\) and \(\delta ^{\text {c}}_{\mathscr {C}_{\text {cbn}}}\): for any closed terms \(M, N\in \varLambda _\oplus \), it holds that \({\delta _{\text {cbn}}^{c}}(M,N)= {\delta ^{\text {c}}_{\mathscr {C}_{\text {cbn}}}}(\hat{s}^{\text {cbn}}(M),\hat{s}^{\text {cbn}}(N))\). Moreover, \(\mathscr {C}_{\text {cbn}}\) is preserved by the stable fragment \(\mathscr {F}^{\text {cbn}}\) (the proof can be found in [7]).

Theorem 6

(Call-by-Name Full Abstraction). \({\delta _{\text {cbn}}^{c}}\) and \({\delta _{\text {cbn}}}\) coincide.

Proof

We first show that \(\delta _{\text {cbn}}\) is at least as discriminating as \(\delta _{\text {cbn}}^{c}\). Let \(M, N\in \varLambda _\oplus \). By definition of \(\mathscr {L}_{\mathscr {F}^{\text {cbn}}}\), we know that \(\hat{s}^{\text {cbn}}(M), \hat{s}^{\text {cbn}}(N)\in \mathcal {S}_{\mathscr {L}_{\mathscr {F}^{\text {cbn}}}}\). Moreover, we know that \(\mathscr {C}_{\text {cbn}}\) is preserved by \(\mathscr {F}^{\text {cbn}}\). We can thus apply Proposition 4, and we see that \({\delta ^{\text {c}}_{\mathscr {C}_{\text {cbn}}}}(\hat{s}^{\text {cbn}}(M),\hat{s}^{\text {cbn}}(N)) \le {\delta _{\text {cbn}}}(M,N)\), and soundness follows. When proving the completeness part, we rely on an “intrinsic” characterization of \(\delta _{\text {cbn}}\). The details can be found in [7].    \(\square \)

6.3 Call-by-Value

In a similar way, we can endow \(\varLambda _\oplus \) with a call-by-value semantics, and embed it into \(\varLambda _\oplus ^{!}\). We are then able to define a suitable fragment of \(\mathscr {M}_\oplus ^{!{}}\) and a suitable set of tuple contexts preserving it, from which a characterization of a call-by-value context distance for \(\varLambda _\oplus \) follows [7]. While the construction of the stable fragment (and of the set of tuple contexts to consider) is more involved than in the call-by-name case, the characterization we obtain bears some similarities to the way environmental bisimulation for a call-by-value probabilistic \(\lambda \)-calculus was defined in [32].

7 Related Work

This is definitely not the first work on metrics in the context of programming language semantics. A very nice introduction to the topic, together with a comprehensive (although outdated) list of references, can be found in [35]. One of the many uses of metrics is as an alternative to order-theoretic semantics. This has also been applied to higher-order languages, and to deterministic \(\mathsf {PCF}\) [15].

If one focuses on probabilistic programming languages, the first attempts at using metrics as a way to measure “how far” two programs are, algebraically or behaviorally, are due to Giacalone et al. [18] and Desharnais et al. [11, 12], who both consider process algebras in the style of Milner’s CCS. Most further work in this direction has focused on concurrent specifications. Among the recent advances in this direction (and without any hope of being comprehensive), we can cite Gebler et al.’s work on uniform continuity as a way to enforce compositionality in metric reasoning [16, 17]. Great inspiration for this work came from the many contributions on metrics for labeled Markov chains and processes that have appeared in the last twenty years (e.g. [13, 36]).

8 Conclusions

We have shown how the context distance can be characterized so as to simplify concrete proofs, and to what extent this metric trivializes. All this has been done in a universal linear \(\lambda \)-calculus for probabilistic computation. This clarifies to what extent refining equivalences into metrics is worthwhile in such a scenario. The tuple-based techniques in Sect. 5.5 are potentially very interesting in view of possible applications to cryptography, as hinted in [4]. This is indeed what we are currently working on.