Decomposing Probabilistic Lambda-Calculi

A notion of probabilistic lambda-calculus usually comes with a prescribed reduction strategy, typically call-by-name or call-by-value, as the calculus is non-confluent and these strategies yield different results. This is a break with one of the main advantages of lambda-calculus: confluence, which means results are independent from the choice of strategy. We present a probabilistic lambda-calculus where the probabilistic operator is decomposed into two syntactic constructs: a generator, which represents a probabilistic event; and a consumer, which acts on the term depending on a given event. The resulting calculus, the Probabilistic Event Lambda-Calculus, is confluent, and interprets the call-by-name and call-by-value strategies through different interpretations of the probabilistic operator into our generator and consumer constructs. We present two notions of reduction, one via fine-grained local rewrite steps, and one by generation and consumption of probabilistic events. Simple types for the calculus are essentially standard, and they convey strong normalization. We demonstrate how we can encode call-by-name and call-by-value probabilistic evaluation.


Introduction
Probabilistic lambda-calculi [24,22,17,11,18,9,15] extend the standard lambda-calculus with a probabilistic choice operator N ⊕_p M, which chooses N with probability p and M with probability 1 − p (throughout this paper, we let p be 1/2 and will omit it). Duplication of N ⊕ M, as is wont to happen in lambda-calculus, raises a fundamental question about its semantics: do the duplicate occurrences represent the same probabilistic event, or different ones with the same probability? For example, take the term ⊤ ⊕ ⊥ that represents a coin flip between boolean values true ⊤ and false ⊥. If we duplicate this term, do the copies represent two distinct coin flips with possibly distinct outcomes, or do they represent a single coin flip that determines the outcome for both copies? Put differently again, when we duplicate ⊤ ⊕ ⊥, do we duplicate the event, or only its outcome?
In probabilistic lambda-calculus, these two interpretations are captured by the evaluation strategies of call-by-name ( cbn ), which duplicates events, and call-by-value ( cbv ), which evaluates any probabilistic choice before it is duplicated, and thus only duplicates outcomes. Consider the following example, where = tests equality of boolean values.
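To make the difference concrete, the following minimal sketch computes the outcome distribution of the test x = x applied to a fair coin flip under both readings. It is an illustration only: the function names `coin`, `call_by_name` and `call_by_value` are hypothetical, and the code models the two strategies by direct enumeration rather than by any reduction relation from this paper.

```python
from fractions import Fraction
from itertools import product


def coin():
    """Outcomes of a fair coin flip between True (top) and False (bottom)."""
    return [(True, Fraction(1, 2)), (False, Fraction(1, 2))]


def call_by_name():
    """Duplicate the event: each occurrence of x flips its own coin."""
    dist = {}
    for (left, p), (right, q) in product(coin(), coin()):
        result = left == right
        dist[result] = dist.get(result, Fraction(0)) + p * q
    return dist


def call_by_value():
    """Duplicate the outcome: flip once, then compare the value with itself."""
    dist = {}
    for value, p in coin():
        result = value == value
        dist[result] = dist.get(result, Fraction(0)) + p
    return dist
```

Under the call-by-name reading the two copies agree only half of the time, whereas under the call-by-value reading the test always succeeds.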
This situation is not ideal, for several, related reasons. Firstly, it demonstrates how probabilistic lambda-calculus is non-confluent, negating one of the central properties of the lambda-calculus, and one of the main reasons why it is the prominent model of computation that it is. Secondly, it means that a probabilistic lambda-calculus must derive its semantics from a prescribed reduction strategy, and its terms only have meaning in the context of that strategy. Thirdly, combining different kinds of probabilities becomes highly involved [15], as it would require specialized reduction strategies. These issues present themselves even in a more general setting, namely that of commutative (algebraic) effects, which in general do not commute with copying.
We address these issues by a decomposition of the probabilistic operator into a generator $\boxed{a}$ and a choice $\overset{a}{\oplus}$, as follows.
$$N \oplus M \;=\; \boxed{a}.\; N \overset{a}{\oplus} M$$
Semantically, $\boxed{a}$ represents a probabilistic event that generates a boolean value recorded as a. The choice $N \overset{a}{\oplus} M$ is simply a conditional on a, choosing N if a is false and M if a is true. Syntactically, a is a boolean variable with an occurrence in $\overset{a}{\oplus}$, and $\boxed{a}$ acts as a probabilistic quantifier, binding all occurrences in its scope. (To capture a non-equal chance, one would attach a probability p to a generator, as $\boxed{a}_p$, though we will not do so in this paper.) The resulting probabilistic event lambda-calculus Λ PE, which we present in this paper, is confluent. Our decomposition allows us to separate duplicating an event, represented by the generator $\boxed{a}$, from duplicating only its outcome a, through having multiple choice operators $\overset{a}{\oplus}$. In this way our calculus may interpret both original strategies, call-by-name and call-by-value, by different translations of standard probabilistic terms into Λ PE: call-by-name by the above decomposition (see also Section 2), and call-by-value by a different one (see Section 7). For our initial example, we get the following translations and reductions.
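As an illustration of how the two translations of the term (λx. x = x)(⊤ ⊕ ⊥) might look, here is a sketch consistent with the description above; the intermediate steps and the arrow annotations are our assumption. The call-by-name translation places the generator inside the argument, so that it is duplicated along with it, while the call-by-value translation places it outside the redex, so that only the choice is duplicated:

$$
\begin{aligned}
\text{cbn:}\quad & (\lambda x.\, x = x)\,(\boxed{a}.\, \top \overset{a}{\oplus} \bot) \;\to_\beta\; (\boxed{a}.\, \top \overset{a}{\oplus} \bot) = (\boxed{a}.\, \top \overset{a}{\oplus} \bot) \;\twoheadrightarrow\; \top \oplus \bot \\
\text{cbv:}\quad & \boxed{a}.\, (\lambda x.\, x = x)\,(\top \overset{a}{\oplus} \bot) \;\twoheadrightarrow\; \boxed{a}.\, (\top = \top) \overset{a}{\oplus} (\bot = \bot) \;\twoheadrightarrow\; \top
\end{aligned}
$$

In the second line, the choice first permutes out of the argument position, both branches then evaluate to ⊤, and the choice and its generator become redundant.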
We present two reduction relations for our probabilistic constructs, both independent of beta-reduction. Our main focus will be on permutative reduction (Sections 2, 3), a small-step local rewrite relation which is computationally inefficient but gives a natural and very fine-grained operational semantics. Projective reduction (Section 6) is a more standard reduction, following the intuition that $\boxed{a}$ generates a coin flip to evaluate $\overset{a}{\oplus}$, and is coarser but more efficient. We further prove confluence (Section 4), and we give a system of simple types and prove strong normalization for typed terms by reducibility (Section 5). Omitted proofs can be found in [7], the long version of this paper.

Related Work
Probabilistic λ-calculi are a topic of study since the pioneering work by Saheb-Djaromi [24], the first to give the syntax and operational semantics of a λ-calculus with binary probabilistic choice. Giving well-behaved denotational models for probabilistic λ-calculi has proved to be challenging, as witnessed by the many contributions spanning the last thirty years: from Jones and Plotkin's early study of the probabilistic powerdomain [17], to Jung and Tix's remarkable (and mostly negative) observations [18], to the very recent encouraging results by Goubault-Larrecq [16]. A particularly well-behaved model for probabilistic λ-calculus can be obtained by taking a probabilistic variation of Girard's coherent spaces [10], this way getting full abstraction [13].
On the operational side, one could mention a study about the various ways the operational semantics of a calculus with binary probabilistic choice can be specified, namely by small-step or big-step semantics, or by inductively or coinductively defined sets of rules [9]. Termination and complexity analysis of higher-order probabilistic programs seen as λ-terms have been studied by way of type systems in a series of recent results about size [6], intersection [4], and refinement type disciplines [1]. Contextual equivalence on probabilistic λ-calculi has been studied, and compared with equational theories induced by Böhm Trees [19], applicative bisimilarity [8], or environmental bisimilarity [25].
In all the aforementioned works, probabilistic λ-calculi have been taken as implicitly endowed with either call-by-name or call-by-value strategies, for the reasons outlined above. There are only a few exceptions, namely some works on Geometry of Interaction [5], Probabilistic Coherent Spaces [14], and Standardization [15], which achieve, in different contexts, a certain degree of independence from the underlying strategy, thus accommodating both call-by-name and call-by-value evaluation. The way this is achieved, however, invariably relies on Linear Logic or related concepts. This is deeply different from what we do here.
Some words of comparison with Faggian and Ronchi Della Rocca's work on confluence and standardization [15] are also in order. The main difference between their approach and the one we pursue here is that the operator ! in their calculus Λ ! ⊕ plays both the roles of a marker for duplicability and of a checkpoint for any probabilistic choice "flowing out" of the term (i.e. being fired). In our calculus, we do not control duplication, but we definitely make use of checkpoints. Saying it another way, Faggian and Ronchi Della Rocca's work is inspired by linear logic, while our approach is inspired by deep inference, even though this is, on purpose, not evident in the design of our calculus.
Probabilistic λ-calculi can also be seen as vehicles for expressing probabilistic models in the sense of Bayesian programming [23,3]. This, however, requires an operator for modeling conditioning, which complicates the metatheory considerably, and which we do not consider here.
Our permutative reduction is a refinement of that for the call-by-name probabilistic λ-calculus [20], and is an implementation of the equational theory of (ordered) binary decision trees via rewriting [27]. Probabilistic decision trees have been proposed with a primitive binary probabilistic operator [22], but not with a decomposition as we explore here.

The Probabilistic Event λ-Calculus Λ PE
Definition 1. The probabilistic event λ-calculus (Λ PE) is given by the following grammar, with from left to right: a variable (denoted by x, y, z, ...), an abstraction, an application, a (labeled) choice, and a (probabilistic) generator.
$$M, N \;::=\; x \;\mid\; \lambda x.\, N \;\mid\; N\, M \;\mid\; N \overset{a}{\oplus} M \;\mid\; \boxed{a}.\, N$$
In a term λx. M the abstraction λx binds the free occurrences of the variable x in its scope M, and in $\boxed{a}.\,N$ the generator $\boxed{a}$ binds the label a in N. The calculus features a decomposition of the usual probabilistic sum ⊕, as follows.
$$N \oplus M \;=\; \boxed{a}.\; N \overset{a}{\oplus} M \tag{3}$$
The generator $\boxed{a}$ represents a probabilistic event, whose outcome, a binary value in {0, 1} represented by the label a, is used by the choice operator $\overset{a}{\oplus}$. That is, $\boxed{a}$ flips a coin setting a to 0 (resp. 1), and depending on this $N \overset{a}{\oplus} M$ reduces to N (resp. M). We will use the unlabeled choice ⊕ as in (3). This convention also gives the translation from a call-by-name probabilistic λ-calculus into Λ PE (the interpretation of a call-by-value probabilistic λ-calculus is in Section 7).
Reduction. Reduction in Λ PE will consist of standard β-reduction →β plus an evaluation mechanism for generators and choice operators, which implements probabilistic choice. We will present two such mechanisms: projective reduction →π and permutative reduction →p. While projective reduction implements the given intuition for the generator and choice operator, we relegate it to Section 6 and make permutative reduction our main evaluation mechanism, for the reason that it is more fine-grained, and thus more general.
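Permutative reduction is based on the idea that any operator distributes over the labeled choice operator (see the reduction steps in Figure 1), even other choice operators, as below.
$$(N \overset{a}{\oplus} M) \overset{b}{\oplus} P \;\sim\; (N \overset{b}{\oplus} P) \overset{a}{\oplus} (M \overset{b}{\oplus} P)$$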
To orient this as a rewrite rule, we need to give priority to one label over another. Fortunately, the relative position of the associated generators $\boxed{a}$ and $\boxed{b}$ provides just that. Then to define →p, we will want every choice to belong to some generator, and make the order of generators explicit.
Definition 2. The set fl(N) of free labels of a term N is defined inductively by:
$$
\begin{aligned}
\mathsf{fl}(x) &= \emptyset & \mathsf{fl}(\lambda x.\,N) &= \mathsf{fl}(N) \\
\mathsf{fl}(N\,M) &= \mathsf{fl}(N) \cup \mathsf{fl}(M) & \mathsf{fl}(\boxed{a}.\,N) &= \mathsf{fl}(N) \setminus \{a\} \\
\mathsf{fl}(N \overset{a}{\oplus} M) &= \mathsf{fl}(N) \cup \mathsf{fl}(M) \cup \{a\}
\end{aligned}
$$
A term N is label-closed if fl(N) = ∅. From here on, we consider only label-closed terms (we implicitly assume this, unless otherwise stated). All terms are identified up to renaming of their bound variables and labels. Given some terms M and N and a variable x, M[N/x] is the capture-avoiding (for both variables and labels) substitution of N for the free occurrences of x in M. We speak of a representative M of a term when M is not considered up to such a renaming. A representative M of a term is well-labeled if for every occurrence of a generator $\boxed{a}$ in M there is no other generator $\boxed{a}$ occurring in its scope.
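The free labels of Definition 2 can be sketched in code as follows. This is a minimal illustration, assuming a hypothetical encoding of terms as tagged tuples: `("var", x)`, `("lam", x, N)`, `("app", N, M)`, `("choice", a, N, M)` for a labeled choice, and `("gen", a, N)` for a generator. Neither the encoding nor the function names come from the calculus itself.

```python
def free_labels(term):
    """Free labels of a term: a choice uses its label, a generator binds it."""
    tag = term[0]
    if tag == "var":                      # ("var", x)
        return frozenset()
    if tag == "lam":                      # ("lam", x, N)
        return free_labels(term[2])
    if tag == "app":                      # ("app", N, M)
        return free_labels(term[1]) | free_labels(term[2])
    if tag == "choice":                   # ("choice", a, N, M)
        return (frozenset({term[1]})
                | free_labels(term[2])
                | free_labels(term[3]))
    if tag == "gen":                      # ("gen", a, N)
        return free_labels(term[2]) - {term[1]}
    raise ValueError(f"unknown term: {term!r}")


def is_label_closed(term):
    """A term is label-closed when it has no free labels."""
    return not free_labels(term)
```

For instance, the choice on its own has the free label a, while prefixing it with the generator for a yields a label-closed term.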

Definition 3 (Order for labels). Let M be a well-labeled representative of a term. We define an order <_M for the labels occurring in M as follows: a <_M b if and only if $\boxed{b}$ occurs in the scope of $\boxed{a}$.
For a well-labeled and label-closed representative M, <_M is a finite tree order.
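The order on labels can be computed by a single traversal that remembers the enclosing generators. The sketch below assumes a hypothetical tagged-tuple encoding of terms, with `("var", x)`, `("lam", x, N)`, `("app", N, M)`, `("choice", a, N, M)` and `("gen", a, N)`, and reads "b occurs in the scope of the generator for a" as "the generator for b occurs in that scope"; the function name `label_order` is ours.

```python
def label_order(term, enclosing=()):
    """Return the set of pairs (a, b) such that a <_M b.

    Raises ValueError if the representative is not well-labeled, i.e. if
    a generator occurs in the scope of a generator with the same label.
    """
    tag = term[0]
    if tag == "var":
        return set()
    if tag == "lam":
        return label_order(term[2], enclosing)
    if tag == "app":
        return label_order(term[1], enclosing) | label_order(term[2], enclosing)
    if tag == "choice":
        return label_order(term[2], enclosing) | label_order(term[3], enclosing)
    if tag == "gen":
        b = term[1]
        if b in enclosing:
            raise ValueError(f"not well-labeled: nested generator {b!r}")
        pairs = {(a, b) for a in enclosing}
        return pairs | label_order(term[2], enclosing + (b,))
    raise ValueError(f"unknown term: {term!r}")
```

Generators in different branches remain incomparable, which is what makes the result a tree order rather than a linear one.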
Definition 4. Reduction → = →β ∪ →p in Λ PE consists of β-reduction →β and permutative or p-reduction →p, both defined as the contextual closure of the rules given in Figure 1. We write ↠ for the reflexive-transitive closure of →, with a corresponding notation for reduction to normal form; similarly for →β and →p. We write =p for the symmetric and reflexive-transitive closure of →p. Two example reductions are (1)-(2) in the Introduction; a third, complete reduction is in Figure 2. The crucial feature of p-reduction is that a choice $\overset{a}{\oplus}$ does permute out of the argument position of an application, but a generator $\boxed{a}$ does not, as below. Since the argument of a redex may be duplicated, this is how we characterize the difference between the outcome of a probabilistic event, whose duplicates may be identified, and the event itself, whose duplicates may yield different outcomes.
By inspection of the rewrite rules in Figure 1, we can then characterize the normal forms of →p and → as follows.
Proposition 5 (Normal forms). The normal forms P_0 of →p, respectively N_0 of →, are characterized by the following grammars.

Properties of Permutative Reduction
We will prove strong normalization and confluence of →p. For strong normalization, the obstacle is the interaction between different choice operators, which may duplicate each other, creating super-exponential growth. Fortunately, Dershowitz's recursive path orders [12] seem tailor-made for our situation.
Observe that the set Λ PE endowed with →p is a first-order term rewriting system over a countably infinite set of variables and the signature Σ given by:
• the binary function symbol $\overset{a}{\oplus}$, for any label a;
• the unary function symbol $\boxed{a}$, for any label a;
• the unary function symbol λx, for any variable x;
• the binary function symbol @, letting @(M, N) stand for M N.
Definition 6. Let M be a well-labeled representative of a label-closed term, and let Σ_M be the set of signature symbols occurring in M. We define ≺_M as the (strict) partial order on Σ_M generated by the following rules.
Lemma 7. The reduction →p is strongly normalizing.
Proof. For the first-order term rewriting system (Λ PE, →p) we derive a well-founded recursive path ordering < from ≺_M following [12, p. 289]. Let f and g range over function symbols, let [N_1, ..., N_n] denote a multiset and extend < to multisets by the standard multiset ordering, and let N = f(N_1, ..., N_n) and
While ≺_M is defined only relative to Σ_M, reduction may only reduce the signature. Inspection of Figure 1 then shows that M →p N implies N < M.
Confluence of Permutative Reduction. With strong normalization, confluence of →p requires only local confluence. We reduce the number of cases to consider, by casting the permutations of $\overset{a}{\oplus}$ as instances of a common shape.
Observe that the six reduction rules ⊕λ through ⊕ in Figure 1 are all of the following form. We refer to these collectively as ⊕ .
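One way to write such a common shape is sketched below, under the assumption that each of the six rules moves a single constructor past a choice. Here C[ ] stands for a context consisting of one constructor with a hole: an abstraction, an application with the hole in function or in argument position, another choice with the hole on either side, or a generator. The notation C[ ] is ours.

$$C[\,N \overset{a}{\oplus} M\,] \;\to_p\; C[N] \overset{a}{\oplus} C[M]$$

Each instance remains subject to the side condition of the individual rule. For example, when C[ ] is itself a choice with label b, the step requires the priority of a over b discussed above, and when C[ ] is a generator for b, the labels a and b must differ.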
Lemma 9 (Confluence of →p). Reduction →p is confluent.
Proof. By Newman's lemma and strong normalization of →p (Lemma 7), confluence follows from local confluence. The proof of local confluence consists of joining all critical pairs given by →p. Details are in the Appendix of [7].
Definition 10. We denote the unique p-normal form of a term N by N_p.

Confluence
We aim to prove that → = →β ∪ →p is confluent. We will use the standard technique of parallel β-reduction [26], a simultaneous reduction step on a number of β-redexes, which we define via a labeling of the redexes to be reduced. The central point is to find a notion of reduction that is diamond, i.e. every critical pair can be closed in one (or zero) steps. This will be our complete reduction, which consists of parallel β-reduction followed by p-reduction to normal form.
Definition 11. A labeled term $P^\bullet$ is a term P with chosen β-redexes annotated as $(\lambda x.\,N)^\bullet M$. The unique labeled β-step $P^\bullet \Rightarrow_\beta P_\bullet$ from $P^\bullet$ to the labeled reduct $P_\bullet$ reduces every labeled redex, and is defined inductively as follows.
Note that $P_\bullet$ is an unlabeled term, since all labels are removed in the reduction. For the empty labeling, $P^\bullet = P_\bullet = P$, so parallel reduction is reflexive: $P \Rightarrow_\beta P$.
Proof. By induction on the labeled term P • generating P β P • .
Proof. Let P • β P • and P • β P • be two labeled reduction steps on a term P . We annotate each step with the label of the other, preserved by reduction, to give the span from the doubly labeled term P •• = P •• below left. Reducing the remaining labels will close the diagram, as below right.
This is proved by induction on P •• , where only two cases are not immediate: those where a redex carries one but not the other label. One case follows by the below diagram; the other case is symmetric. Below, for the step top right, induction on N • shows that

Parallel Reduction and Permutative Reduction
For the commutation of (parallel) β-reduction with p-reduction, we run into the minor issue that a permuting generator or choice operator may block a redex: in both cases below, before p the term has a redex, but after p it is blocked.
We address this by an adaptation p of p-reduction on labeled terms, which is a strategy in p that permutes past a labeled redex in one step.

Definition 14. A labeled p-reduction on labeled terms, from N^• to M^•, is a p-reduction of one of the forms or a single p-step →p on unlabeled constructors in N^•.
Lemma 15. Reduction to normal form in p is equal to p (on labeled terms).
Proof. Observe that p and p have the same normal forms. Then in one direction, since p ⊆ p we have p ⊆ p . Conversely, let N p M . On this reduction, let P p Q be the first step such that P p Q. Then there is an R such that P p R and Q p R. Note that we have N p R. By confluence, R p M , and by induction on the sum length of paths in p from R (smaller than from N ) we have R p M , and hence N p M .
The following lemmata then give the required commutation properties of the relations p , p , and β . Figure 3 illustrates these by commuting diagrams.
Proof. By induction on the rewrite step p . The two interesting cases are: How the critical pairs in the above diagrams are joined shows that we cannot use the Hindley-Rosen Lemma [2, Prop. 3.3.5] to prove confluence of β ∪ p .
Proof. Using Lemma 15 we decompose N • p N • p as

Complete Reduction
To obtain a reduction strategy with the diamond property for →, we combine parallel β-reduction with permutative reduction to normal form into a notion of complete reduction. We will show that it is diamond (Lemma 19), and that any step in → maps onto a complete step of p-normal forms (Lemma 20). Confluence of → (Theorem 21) then follows: any two paths map onto complete paths on p-normal forms, which then converge by the diamond property.
Definition 18. A complete reduction step N N •p is a parallel β-step followed by p-reduction to normal form:

Strong Normalization for Simply-Typed Terms
In this section, we prove that the relation → enjoys strong normalization in simply typed terms. Our proof of strong normalization is based on the classic reducibility technique, and inherently has to deal with label-open terms. It thus makes great sense to turn the order <_M from Definition 3 into something more formal, at the same time allowing terms to be label-open. This is in Figure 4. It is easy to realize that, of course modulo label α-equivalence, for every term M there is at least one θ such that θ ⊢L M. An easy fact to check is that if θ ⊢L M and M →θ N, then θ ⊢L N. It thus makes sense to parametrize → on a sequence of labels θ, i.e., one can define a family of reduction relations →θ on pairs of the form (M, θ). The set of strongly normalizable terms, and the number of steps to normal form, become themselves parametric:
• The set SN_θ of those terms M such that θ ⊢L M and (M, θ) is strongly normalizing modulo →θ;
• The function sn_θ assigning to any term in SN_θ the maximal number of →θ steps to normal form.
Please notice that the type structure is precisely the one of the usual, vanilla, simply-typed λ-calculus (although terms are of course different), and we can thus reuse most of the usual proof of strong normalization, for example in the version given by Ralph Loader's notes [21], page 17.
Lemma 22. The closure rules in Figure 6 are all sound.
Since the structure of the type system is the one of plain, simple types, the definition of reducibility sets is the classic one:
Before proving that all terms are reducible, we need some auxiliary results.
Proof. This is an induction on the structure of the term M:
• If M is a variable, necessarily one among y_1, ..., y_n, then the result is trivial.
• If M is a sum $L \overset{a}{\oplus} P$, we can make use of Lemma 23 and the induction hypothesis, and conclude.
• If M is a generator $\boxed{a}.\,P$, we can make use of Lemma 23 and the induction hypothesis. We should however observe that a · θ ⊢L P, since θ ⊢L M.
We now have all the ingredients for our proof of strong normalization:
Proof. Suppose that x_1 : ρ_1, ..., x_n : ρ_n ⊢ M : τ. Since x_1 : ρ_1, ..., x_n : ρ_n ⊢ x_i : ρ_i for all i, and clearly θ ⊢L x_i for every i, we can apply Lemma 24 and obtain that (Γ, θ, M[x/x]) ∈ Red_τ from which, via Lemma 23, one gets the thesis.

Projective Reduction
Permutative reduction →p evaluates probabilistic sums purely by rewriting. Here we look at a more standard projective notion of reduction, which conforms more closely to the intuition that $\boxed{a}$ generates a probabilistic event to determine the choice $\overset{a}{\oplus}$. Using + for an external probabilistic sum, we expect to reduce $\boxed{a}.\,N$ to $N_0 + N_1$ where each $N_i$ is obtained from N by projecting every subterm $M_0 \overset{a}{\oplus} M_1$ to $M_i$. The question is, in what context should we admit this reduction? We first limit ourselves to reducing in head position.
Definition 26. The a-projections $\pi^a_0(N)$ and $\pi^a_1(N)$ are defined as follows:
Definition 27. A head context H[ ] is given by the following grammar.
Definition 28. Projective head reduction →πh is given by
$$H[\boxed{a}.\,N] \;\to_{\pi h}\; H[\pi^a_0(N)] + H[\pi^a_1(N)]$$
We can simulate →πh by permutative reduction if we interpret the external sum + by an outermost ⊕ (taking special care if the label does not occur).
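The projections and a single projective head step can be sketched as follows. The sketch assumes a hypothetical tagged-tuple encoding of terms, with `("var", x)`, `("lam", x, N)`, `("app", N, M)`, `("choice", a, N, M)` and `("gen", a, N)`, and it assumes that head contexts are built from abstractions and applications with the hole in function position. The external sum is represented simply as a pair of terms, one per outcome.

```python
def project(term, a, i):
    """a-projection: replace every choice on label a by its i-th branch."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "lam":
        return ("lam", term[1], project(term[2], a, i))
    if tag == "app":
        return ("app", project(term[1], a, i), project(term[2], a, i))
    if tag == "choice":
        if term[1] == a:
            # outcome 0 selects the left branch, outcome 1 the right branch
            return project(term[2 + i], a, i)
        return ("choice", term[1],
                project(term[2], a, i), project(term[3], a, i))
    if tag == "gen":
        if term[1] == a:
            return term  # an inner generator rebinds a: nothing to project
        return ("gen", term[1], project(term[2], a, i))
    raise ValueError(f"unknown term: {term!r}")


def head_step(term):
    """Reduce a generator in head position; return both outcomes or None."""
    tag = term[0]
    if tag == "gen":
        a, body = term[1], term[2]
        return project(body, a, 0), project(body, a, 1)
    if tag == "lam":
        step = head_step(term[2])
        return None if step is None else tuple(("lam", term[1], t) for t in step)
    if tag == "app":
        step = head_step(term[1])
        return None if step is None else tuple(("app", t, term[2]) for t in step)
    return None
```

A generator in argument position is not in head position, so `head_step` leaves such a term alone and returns `None`.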
Proposition 29. Permutative reduction simulates projective head reduction: if a ∉ fl(N) (H[N]). By induction on N, if a is minimal in N (i.e. a ∈ fl(N) and a ≤ b for all b ∈ fl(N)) then $N \twoheadrightarrow_p \pi^a_0(N) \overset{a}{\oplus} \pi^a_1(N)$. As required,
A gap remains between which generators will not be duplicated, which we should be able to reduce, and which generators projective head reduction does reduce. In particular, to interpret call-by-value probabilistic reduction in Section 7, we would like to reduce under other generators. However, permutative reduction does not permit exchanging generators, and so only simulates reducing in head position. While (independent) probabilistic events are generally considered interchangeable, it is a question whether the below equivalence is desirable.
$$\boxed{a}.\,\boxed{b}.\,N \;\sim\; \boxed{b}.\,\boxed{a}.\,N \tag{4}$$
We elide the issue by externalizing probabilistic events, and reducing with reference to a predetermined binary stream s ∈ {0, 1}^ℕ representing their outcomes.
In this way, we will preserve the intuitions of both permutative and projective reduction: we obtain a qualified version of the equivalence (4) (see (5) below), and will be able to reduce any generator on the spine of a term: under (other) generators and choices as well as under abstractions and in function position.
Definition 30. The set of streams is S = {0, 1}^ℕ, ranged over by r, s, t, and i · s denotes a stream with i ∈ {0, 1} as first element and s as the remainder.
Definition 31. The stream labeling N^s of a term N with a stream s ∈ S, which annotates generators as $\boxed{a}^i$ with i ∈ {0, 1} and variables as x^s with a stream s, is given inductively below.
We lift β-reduction to stream-labeled terms by introducing a substitution case for stream-labeled variables:
Definition 32. Projective reduction →π on stream-labeled terms is the rewrite relation given by
Observe that in N^s a generator that occurs under n other generators on the spine of N is labeled with the element of s at position n + 1. Generators in argument position remain unlabeled, until a β-step places them on the spine, in which case they become labeled by the new substitution case. We allow annotating a term with a finite prefix of a stream, e.g. N^i with a singleton i, so that only part of the spine is labeled. Subsequent labeling of a partly labeled term is then by (N^r)^s = N^{r·s} (abusing notation). To introduce streams via the external probabilistic sum, and to ignore an unused remaining stream after completing a probabilistic computation, we adopt the following equation.
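The observation about positions on the spine can be illustrated with a small sketch of stream labeling over a finite prefix. It assumes a hypothetical tagged-tuple encoding of terms, with `("var", x)`, `("lam", x, N)`, `("app", N, M)`, `("choice", a, N, M)` and `("gen", a, N)`; a labeled generator is written `("gen", a, i, N)` and a labeled variable `("var", x, rest)`. That both branches of a choice receive the same remaining prefix is our reading of the text, not a quoted definition.

```python
def stream_label(term, bits):
    """Annotate the generators on the spine of `term` with successive bits.

    `bits` is a finite prefix of a stream, given as a tuple of 0s and 1s.
    """
    tag = term[0]
    if tag == "var":
        # the variable records the part of the prefix that is still unused
        return ("var", term[1], tuple(bits))
    if tag == "lam":
        return ("lam", term[1], stream_label(term[2], bits))
    if tag == "app":
        # only the function position is on the spine; the argument stays as is
        return ("app", stream_label(term[1], bits), term[2])
    if tag == "choice":
        return ("choice", term[1],
                stream_label(term[2], bits), stream_label(term[3], bits))
    if tag == "gen":
        if not bits:
            return term  # prefix exhausted: the rest of the spine is unlabeled
        return ("gen", term[1], bits[0], stream_label(term[2], bits[1:]))
    raise ValueError(f"unknown term: {term!r}")
```

In the labeled result, the generator reached after n other generators on the spine carries the element at position n + 1 of the prefix, and generators in argument position are untouched.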
Proposition 33. Projective reduction generalizes projective head reduction:
Returning to the interchangeability of probabilistic events, we refine (4) by exchanging the corresponding elements of the annotating streams:
Stream-labeling externalizes all probabilities, making reduction deterministic. This is expressed by the following proposition, that stream-labeling commutes with reduction: if a generator remains unlabeled in M and becomes labeled after a reduction step M → N, what label it receives is predetermined. The deep reason is that stream labeling assigns an outcome to each generator in a way that corresponds to a call-by-name strategy for probabilistic reduction.
Proposition 34. If M → N by a step other than the rule $\boxed{a}.\,N \to_p N$ ($a \notin \mathsf{fl}(N)$) then M^s → N^s.
Remark 35. The statement is false for the rule $\boxed{a}.\,N \to_p N$ ($a \notin \mathsf{fl}(N)$), as it removes a generator but not an element from the stream. Arguably, for this reason the rule should be excluded from the calculus. On the other hand, the rule is necessary to implement idempotence of ⊕, rather than just $\overset{a}{\oplus}$, as follows.
The below proposition then expresses that projective reduction is an invariant for permutative reduction. If N →p M by a step (other than the rule discussed in Remark 35) on a labeled generator $\boxed{a}^i$ or a corresponding choice $\overset{a}{\oplus}$, then N and M reduce to a common term, $N \twoheadrightarrow_\pi P \twoheadleftarrow_\pi M$, by the projective steps evaluating $\boxed{a}^i$.
Proposition 36. Projective reduction is an invariant for permutative reduction, as follows (with a case for c 2 symmetric to c 1 , and where D[ ] is a context).
We consider the interpretation of a call-by-value probabilistic λ-calculus. For simplicity we will allow duplicating (or deleting) β-redexes, and only restrict duplicating probabilities; our values V are then just deterministic terms (i.e. terms without choices), possibly applications and not necessarily β-normal (so that our βv is actually β-reduction on deterministic terms, unlike [9]). We evaluate the internal probabilistic choice ⊕_v to an external probabilistic choice +.
The interpretation N_v of a call-by-value term N into Λ PE is given as follows. First, we translate N to a label-open term N_open = θ ⊢L P by replacing each choice ⊕_v with one $\overset{a}{\oplus}$ with a unique label, where the label-context θ collects the labels used. Then N_v is the label closure of θ ⊢L P, which prefixes P with a generator $\boxed{a}$ for every a in θ.
The label closure θ L P is given inductively as follows.
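The two stages of the interpretation can be sketched as follows. The sketch assumes a hypothetical tagged-tuple encoding in which a call-by-value choice is written `("vchoice", N, M)` and the target constructs are `("choice", a, N, M)` and `("gen", a, N)`. The fresh label names `a0`, `a1`, ... and the left-to-right order in which labels are collected and generators are prefixed are our assumptions.

```python
def translate_open(term):
    """Replace each call-by-value choice by a choice with a unique label.

    Returns the list of labels used (the label context) and the open term.
    """
    labels = []

    def go(t):
        tag = t[0]
        if tag == "var":
            return t
        if tag == "lam":
            return ("lam", t[1], go(t[2]))
        if tag == "app":
            return ("app", go(t[1]), go(t[2]))
        if tag == "vchoice":
            a = f"a{len(labels)}"  # hypothetical fresh-label scheme
            labels.append(a)
            return ("choice", a, go(t[1]), go(t[2]))
        raise ValueError(f"unknown term: {t!r}")

    body = go(term)
    return labels, body


def label_closure(labels, body):
    """Prefix the open term with one generator per collected label."""
    for a in reversed(labels):
        body = ("gen", a, body)  # the first label ends up outermost
    return body


def translate_cbv(term):
    """Call-by-value interpretation: open translation, then label closure."""
    labels, body = translate_open(term)
    return label_closure(labels, body)
```

All generators end up at the front of the term, outside every β-redex, so a β-step can duplicate a choice but never the generator that decides it.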
Our call-by-value reduction may choose an arbitrary order in which to evaluate the choices ⊕_v in a term N, but the order of generators in the interpretation N_v is necessarily fixed. Then to simulate a call-by-value reduction, we cannot choose a fixed context stream a priori; all we can say is that for every reduction, there is some stream that allows us to simulate it. Specifically, a reduction step a Λ PE -context, and θ giving rise to the sequence of generators ... $\boxed{a}.\,\boxed{b}.\,\boxed{c}$ ... in the call-by-value translation. To simulate the reduction step, if b occupies the n-th position in θ, then the n-th position in the context stream s must be the element j. Since β-reduction survives the translation and labeling process intact, we may simulate call-by-value probabilistic reduction by projective and β-reduction.
Theorem 38. If $N \twoheadrightarrow_{v,\beta v} V$ then $N_v^{\,s} \twoheadrightarrow_{\pi,\beta} V_v$ for some stream s ∈ S.

Conclusions and Future Work
We believe our decomposition of probabilistic choice in λ-calculus to be an elegant and compelling way of restoring confluence, one of the core properties of the λ-calculus. Our probabilistic event λ-calculus captures traditional call-by-name and call-by-value probabilistic reduction, and offers finer control beyond those strategies. Permutative reduction implements a natural and fine-grained equivalence on probabilistic terms as internal rewriting, while projective reduction provides a complementary and more traditional external perspective.
There are a few immediate areas for future work. Firstly, within probabilistic λ-calculus, it is worth exploring if our decomposition opens up new avenues in semantics. Secondly, our approach might apply to probabilistic reasoning more widely, outside the λ-calculus. Most importantly, we will explore if our approach can be extended to other computational effects. Our use of streams interprets probabilistic choice as a read operation from an external source, which means other read operations can be treated similarly. A complementary treatment of write operations would allow us to express a considerable range of effects, including input/output and state.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.