Relational Reasoning for Markov Chains in a Probabilistic Guarded Lambda Calculus

We extend the simply-typed guarded $\lambda$-calculus with discrete probabilities and endow it with a program logic for reasoning about relational properties of guarded probabilistic computations. This provides a framework for programming and reasoning about infinite stochastic processes like Markov chains. We prove the logic sound by interpreting its judgements in the topos of trees and by using probabilistic couplings for the semantics of relational assertions over distributions on discrete types. The program logic is designed to support syntax-directed proofs in the style of relational refinement types, but retains the expressiveness of higher-order logic extended with discrete distributions, and the ability to reason relationally about expressions that have different types or syntactic structure. In addition, our proof system leverages a well-known theorem from the coupling literature to justify better proof rules for relational reasoning about probabilistic expressions. We illustrate these benefits with a broad range of examples that were beyond the scope of previous systems, including shift couplings and lumped couplings between random walks.


Introduction
Stochastic processes are often used in mathematics, physics, biology or finance to model the evolution of systems with uncertainty. In particular, Markov chains are "memoryless" stochastic processes, in the sense that the evolution of the system depends only on the current state and not on its history. Perhaps the most emblematic example of a (discrete time) Markov chain is the simple random walk over the integers, which starts at 0 and on each step moves one position either left or right with uniform probability. Let $p_i$ be the position at time $i$. Then, this Markov chain can be described as:
$$p_0 = 0, \qquad p_{i+1} = \begin{cases} p_i + 1 & \text{with probability } 1/2 \\ p_i - 1 & \text{with probability } 1/2 \end{cases}$$
The goal of this paper is to develop a programming and reasoning framework for probabilistic computations over infinite objects, such as Markov chains. Although programming and reasoning frameworks for infinite objects and probabilistic computations are well-understood in isolation, their combination is challenging. In particular, one must develop a proof system that is powerful enough for proving interesting properties of probabilistic computations over infinite objects, and practical enough to support effective verification of these properties.
Modelling probabilistic infinite objects. A first challenge is to model probabilistic infinite objects. We focus on the case of Markov chains, due to their importance. A (discrete-time) Markov chain is a sequence of random variables $\{X_i\}$ over some fixed type $T$ satisfying some independence property. Thus, the straightforward way of modelling a Markov chain is as a stream of distributions over $T$. Going back to the simple example outlined above, it is natural to think about this kind of discrete-time Markov chain as characterized by the sequence of positions $\{p_i\}_{i \in \mathbb{N}}$, which in turn can be described as an infinite set indexed by the natural numbers. This suggests that a natural way to model such a Markov chain is to use streams in which each element is produced probabilistically from the previous one. However, there are some downsides to this representation. First of all, it requires explicit reasoning about probabilistic dependency, since $X_{i+1}$ depends on $X_i$. Also, we might be interested in global properties of the executions of the Markov chain, such as "The probability of passing through the initial state infinitely many times is 1". These properties are naturally expressed as properties of the whole stream. For these reasons, we want to represent Markov chains as distributions over streams. Seemingly, one downside of this representation is that the set of streams is not countable, which suggests the need for introducing heavy measure-theoretic machinery in the semantics of the programming language, even when the underlying type is discrete or finite.
Fortunately, measure-theoretic machinery can be avoided (for discrete distributions) by developing a probabilistic extension of the simply-typed guarded λ-calculus and giving a semantic interpretation in the topos of trees [1]. Informally, the simply-typed guarded λ-calculus [1] extends the simply-typed lambda calculus with a later modality, denoted by ⊲. The type ⊲A ascribes expressions that are available one unit of logical time in the future. The ⊲ modality allows one to model infinite types by using "finite" approximations. For example, a stream of natural numbers is represented by the sequence of its (increasing) prefixes in the topos of trees. The prefix containing the first $i$ elements has the type $S_i \triangleq \mathbb{N} \times \triangleright\mathbb{N} \times \ldots \times \triangleright^{(i-1)}\mathbb{N}$, representing that the first element is available now, the second element a unit time in the future, and so on. This is the key to representing probability distributions over infinite objects without measure-theoretic semantics: we model probability distributions over non-discrete sets as discrete distributions over approximations of those sets. For example, a distribution over streams of natural numbers (which a priori would be non-discrete since the set of streams is uncountable) would be modelled by a sequence of distributions over the finite approximations $S_1, S_2, \ldots$ of streams. Importantly, since each $S_i$ is countable, each of these distributions can be discrete.
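To make the approximation idea concrete, the following sketch (plain Python, outside the calculus of this paper) represents the simple random walk as a family of discrete distributions over prefixes of length $i$, and checks that forgetting the last element of a prefix of length $i+1$ gives back the distribution over prefixes of length $i$. The names `prefix_dist` and `truncate` are hypothetical and only serve the illustration.

```python
from fractions import Fraction


def prefix_dist(i):
    """Discrete distribution over length-i prefixes of the simple random walk from 0.

    Hypothetical helper: prefixes are tuples, probabilities are exact fractions.
    """
    if i < 1:
        raise ValueError("prefixes have at least one element")
    dist = {(0,): Fraction(1)}
    for _ in range(i - 1):
        nxt = {}
        for prefix, pr in dist.items():
            for move in (-1, 1):
                # Each extension of a prefix is distinct, so no accumulation is needed.
                nxt[prefix + (prefix[-1] + move,)] = pr / 2
        dist = nxt
    return dist


def truncate(dist):
    """Push a distribution over length-(i+1) prefixes forward to length-i prefixes."""
    out = {}
    for prefix, pr in dist.items():
        out[prefix[:-1]] = out.get(prefix[:-1], Fraction(0)) + pr
    return out


# The approximations are compatible: truncating level 4 yields level 3.
consistent = truncate(prefix_dist(4)) == prefix_dist(3)
```

Each level is a finite (hence discrete) distribution, even though the set of all infinite streams is uncountable.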
Reasoning about probabilistic computations. Probabilistic computations exhibit a rich set of properties. One natural class of properties is related to probabilities of events, saying, for instance, that the probability of some event $E$ (or of an indexed family of events) increases at every iteration. However, several interesting properties of probabilistic computation, such as stochastic dominance or convergence (defined below), are relational, in the sense that they refer to two runs of two processes. In principle, both classes of properties can be proved using a higher-order logic for probabilistic expressions, e.g. the internal logic of the topos of trees, suitably extended with an axiomatization of finite distributions. However, we contend that an alternative approach inspired by refinement types is desirable and provides better support for effective verification. More specifically, reasoning in a higher-order logic, e.g. in the internal logic of the topos of trees, does not exploit the structure of programs for non-relational reasoning, nor the structural similarities between programs for relational reasoning. As a consequence, reasoning is more involved. To address this issue, we define a relational proof system that exploits the structure of the expressions and supports syntax-directed proofs, with necessary provisions for escaping the syntax-directed discipline when the expressions do not have the same structure. The proof system manipulates judgements of the form:
$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : A_1 \sim t_2 : A_2 \mid \phi$$
where ∆ and Γ are two typing contexts, Σ and Ψ respectively denote sets of assertions over variables in these two contexts, $t_1$ and $t_2$ are well-typed expressions of type $A_1$ and $A_2$, and φ is an assertion that may contain the special variables $r_1$ and $r_2$ that respectively correspond to the values of $t_1$ and $t_2$.
The contexts ∆ and Γ, the terms $t_1$ and $t_2$ and the types $A_1$ and $A_2$ provide a specification, while Σ, Ψ, and φ are useful for reasoning about relational properties over $t_1$, $t_2$, their inputs and their outputs. This form of judgement is similar to that of Relational Higher-Order Logic [2], from which our system draws inspiration.
In more detail, our relational logic comes with typing rules that allow one to reason about relational properties by exploiting as much as possible the syntactic similarities between $t_1$ and $t_2$, and to fall back on pure logical reasoning when these are not available. In order to apply relational reasoning to guarded computations, the logic provides relational rules for the later modality ⊲ and for a related modality □, called "constant". These rules allow the verification of general relational properties that go beyond the traditional notion of program equivalence and, moreover, they allow the verification of properties of guarded computations over different types. The ability to reason about computations of different types provides significant benefits over alternative formalisms for relational reasoning. For example, it enables reasoning about relations between programs working on different data structures, e.g. a relation between a program working on a stream of natural numbers and a program working on a stream of pairs of natural numbers, or having different structures, e.g. a relation between an application and a case expression.
Importantly, our approach for reasoning formally about probabilistic computations is based on probabilistic couplings, a standard tool from the analysis of Markov chains [3,4]. From a verification perspective, probabilistic couplings go beyond equivalence properties of probabilistic programs, which have been studied extensively in the verification literature, and yet support compositional reasoning [5,6]. The main attractive feature of coupling-based reasoning is that it limits the need for explicit reasoning about probabilities, which avoids complex verification conditions. We provide sound proof rules for reasoning about probabilistic couplings. Our rules make several improvements over prior relational verification logics based on couplings. First, we support reasoning over probabilistic processes of different types. Second, we use Strassen's theorem [7], a remarkable result about probabilistic couplings, to achieve greater expressivity. Previous systems required proving a bijection between the sampling spaces to show the existence of a coupling [5,6]; Strassen's theorem gives a way to show their existence which is applicable in settings where the bijection-based approach cannot be applied. Third, we support reasoning with what are called shift couplings, i.e. couplings that relate the states of two Markov chains at possibly different times (more explanations below).
Case studies. We show the flexibility of our formalism by verifying several examples of relational properties of probabilistic computations, and Markov chains in particular. These examples cannot be verified with existing approaches.
First, we verify a classic example of probabilistic non-interference which requires reasoning about computations at different types. Second, in the context of Markov chains, we verify an example about stochastic dominance which exercises our more general rule for proving the existence of couplings modelled by expressions of different types. Finally, we verify an example involving shift relations in an infinite computation. This style of reasoning is motivated by "shift" couplings in Markov chains. In contrast to a standard coupling, which relates the states of two Markov chains at the same time $t$, a shift coupling relates the states of two Markov chains at possibly different times. Our specific example relates a standard random walk (described earlier) to a variant called a lazy random walk; the verification requires relating the state of the standard random walk at time $t$ to the state of the lazy random walk at time $2t$. We note that this kind of reasoning is impossible with conventional relational proof rules even in a non-probabilistic setting. Therefore, we provide a novel family of proof rules for reasoning about shift relations. At a high level, the rules combine a careful treatment of the later and constant modalities with a refined treatment of fixpoint operators, allowing us to relate different iterates of function bodies.

Summary of contributions
With the aim of providing a general framework for programming and reasoning about Markov chains, the three main contributions of this work are:
1. A probabilistic extension of the guarded λ-calculus that enables the definition of Markov chains as discrete probability distributions over streams.
2. A relational logic based on couplings to reason in a syntax-directed manner about (relational) properties of Markov chains. This logic supports reasoning about programs that have different types and structures. Additionally, this logic uses results from the coupling literature to achieve greater expressivity than previous systems.
3. An extension of the relational logic that makes it possible to relate the states of two streams at possibly different times. This extension supports reasoning principles, such as shift couplings, that escape conventional relational logics.

Mathematical preliminaries
This section reviews the definition of discrete probability distributions and introduces probabilistic couplings.
Definition 1 (Discrete probability distribution). Let $C$ be a discrete (i.e., finite or countable) set. A (total) distribution over $C$ is a function $\mu : C \to [0, 1]$ such that $\sum_{x \in C} \mu(x) = 1$. The support of a distribution $\mu$ is the set of points with non-zero probability, $\mathrm{supp}\,\mu \triangleq \{x \in C \mid \mu(x) > 0\}$. We denote the set of distributions over $C$ as $D(C)$. Given a subset $E \subseteq C$, the probability of sampling from $\mu$ a point in $E$ is denoted $\Pr_{x \leftarrow \mu}[x \in E]$, and is equal to $\sum_{x \in E} \mu(x)$.
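Definition 2 (Marginals). Let $\mu$ be a distribution over a product space $C_1 \times C_2$. The first (second) marginal of $\mu$ is another distribution $D(\pi_1)(\mu)$ ($D(\pi_2)(\mu)$) over $C_1$ ($C_2$) defined as:
$$D(\pi_1)(\mu)(x) = \sum_{y \in C_2} \mu(x, y) \qquad D(\pi_2)(\mu)(y) = \sum_{x \in C_1} \mu(x, y)$$
Probabilistic couplings. Probabilistic couplings are a fundamental tool in the analysis of Markov chains. When analyzing a relation between two probability distributions it is sometimes useful to consider instead a distribution over the product space that somehow "couples" the randomness in a convenient manner. Consider for instance the case of the following Markov chain, which counts the total number of tails observed when repeatedly tossing a biased coin with probability of tails $p$:
$$n_0 = 0, \qquad n_{i+1} = \begin{cases} n_i + 1 & \text{with probability } p \\ n_i & \text{with probability } 1 - p \end{cases}$$
If we have two biased coins with probabilities of tails $p$ and $q$ with $p \le q$ and we respectively observe $\{n_i\}$ and $\{m_i\}$, we would expect that, in some sense, $n_i \le m_i$ should hold for all $i$ (this property is known as stochastic dominance). A formal proof of this fact using elementary tools from probability theory would require computing the cumulative distribution functions for $n_i$ and $m_i$ and then comparing them. The coupling method reduces this proof to showing a way to pair the coin flips so that if the first coin shows tails, so does the second coin. We now review the definition of couplings and state relevant properties.
Definition 3 (Couplings). Let $\mu_1 \in D(C_1)$ and $\mu_2 \in D(C_2)$, and let $R \subseteq C_1 \times C_2$. A distribution $\mu \in D(C_1 \times C_2)$ is a coupling for $\mu_1$ and $\mu_2$ iff its first and second marginals coincide with $\mu_1$ and $\mu_2$ respectively. It is an $R$-coupling iff, in addition, $\mathrm{supp}\,\mu \subseteq R$. We write $\diamond_{\mu_1,\mu_2}.R$ when an $R$-coupling for $\mu_1$ and $\mu_2$ exists.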
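Couplings always exist. For instance, the product distribution of two distributions is always a coupling. Going back to the example about the two coins, it can be proven by computation that the following is a coupling that lifts the less-or-equal relation (0 indicating heads and 1 indicating tails):
$$\mu(0,0) = 1 - q \qquad \mu(0,1) = q - p \qquad \mu(1,0) = 0 \qquad \mu(1,1) = p$$
The following theorem in [7] gives a necessary and sufficient condition for the existence of $R$-couplings between two distributions.
Theorem 1 (Strassen's theorem). Let $\mu_1 \in D(C_1)$, $\mu_2 \in D(C_2)$ and $R \subseteq C_1 \times C_2$. Then $\diamond_{\mu_1,\mu_2}.R$ iff for every $X \subseteq C_1$, $\Pr_{x_1 \leftarrow \mu_1}[x_1 \in X] \le \Pr_{x_2 \leftarrow \mu_2}[x_2 \in R(X)]$, where $R(X) = \{y \in C_2 \mid \exists x \in X.\ R\,x\,y\}$ is the image of $X$ under $R$.
The theorem is remarkable in the sense that it proves an equivalence between an existential property (namely the existence of a particular coupling) and a universal property (checking, for each event, an inequality between probabilities).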
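The two sides of this equivalence can be checked by brute force on the two-coin example. The sketch below (plain Python; all function names are hypothetical) builds the joint distribution that puts mass $1-q$, $q-p$, $0$ and $p$ on $(0,0)$, $(0,1)$, $(1,0)$ and $(1,1)$, verifies that it is a coupling of $B(p)$ and $B(q)$ supported on the less-or-equal relation, and separately checks the universal condition event by event. It assumes finite distributions represented as dictionaries.

```python
from fractions import Fraction
from itertools import chain, combinations


def marginals(mu):
    """First and second marginals of a joint distribution {(x, y): prob}."""
    m1, m2 = {}, {}
    for (x, y), pr in mu.items():
        m1[x] = m1.get(x, Fraction(0)) + pr
        m2[y] = m2.get(y, Fraction(0)) + pr
    return m1, m2


def bernoulli(p):
    """B(p): probability p for 1 (tails) and 1 - p for 0 (heads)."""
    return {0: 1 - p, 1: p}


def leq_coupling(p, q):
    """Joint distribution for the two coins, assuming p <= q."""
    return {(0, 0): 1 - q, (0, 1): q - p, (1, 0): Fraction(0), (1, 1): p}


def is_coupling(mu, mu1, mu2, rel):
    """Existential side: marginals match and the support lies inside rel."""
    m1, m2 = marginals(mu)
    in_rel = all(rel(x, y) for (x, y), pr in mu.items() if pr > 0)
    return m1 == mu1 and m2 == mu2 and in_rel


def strassen_condition(mu1, mu2, rel):
    """Universal side: Pr_mu1[X] <= Pr_mu2[R(X)] for every set X of points of mu1."""
    points = list(mu1)
    subsets = chain.from_iterable(
        combinations(points, k) for k in range(len(points) + 1)
    )
    for xs in subsets:
        image = {y for y in mu2 if any(rel(x, y) for x in xs)}
        if sum(mu1[x] for x in xs) > sum(mu2[y] for y in image):
            return False
    return True


def leq(x, y):
    return x <= y


p, q = Fraction(1, 3), Fraction(1, 2)
has_witness = is_coupling(leq_coupling(p, q), bernoulli(p), bernoulli(q), leq)
condition_holds = strassen_condition(bernoulli(p), bernoulli(q), leq)
```

Swapping the two coins makes the universal condition fail (take $X = \{1\}$), which matches the intuition that no less-or-equal coupling exists when the first coin shows tails more often.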
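Lemma 1 (Sequential composition couplings). Let $\mu_1 \in D(C_1)$, $\mu_2 \in D(C_2)$, $M_1 : C_1 \to D(D_1)$ and $M_2 : C_2 \to D(D_2)$. Moreover, let $R \subseteq C_1 \times C_2$ and $S \subseteq D_1 \times D_2$. Assume: (1) $\diamond_{\mu_1,\mu_2}.R$; and (2) for every $x_1 \in C_1$ and $x_2 \in C_2$ such that $R\,x_1\,x_2$, we have $\diamond_{M_1(x_1),M_2(x_2)}.S$. Then $\diamond_{(\mathrm{bind}\ \mu_1\ M_1),(\mathrm{bind}\ \mu_2\ M_2)}.S$, where $\mathrm{bind}\ \mu\ M$ is defined as
$$(\mathrm{bind}\ \mu\ M)(y) = \sum_{x} \mu(x) \cdot M(x)(y)$$
We conclude this section with the following lemma, which follows from Strassen's theorem:
Lemma 2 (Fundamental lemma of couplings). Let $R \subseteq C_1 \times C_2$, $E_1 \subseteq C_1$ and $E_2 \subseteq C_2$ be such that for every $x_1 \in E_1$ and $x_2 \in C_2$, $R\,x_1\,x_2$ implies $x_2 \in E_2$, i.e. $R(E_1) \subseteq E_2$. Moreover, let $\mu_1 \in D(C_1)$ and $\mu_2 \in D(C_2)$ be such that $\diamond_{\mu_1,\mu_2}.R$. Then $\Pr_{x_1 \leftarrow \mu_1}[x_1 \in E_1] \le \Pr_{x_2 \leftarrow \mu_2}[x_2 \in E_2]$.
This lemma can be used to prove probabilistic inequalities from the existence of suitable couplings. In the example at the beginning of the section, the property we want to prove is precisely that, for every $k$ and $i$, the following holds:
$$\Pr_{x_1 \leftarrow n_i}[x_1 \ge k] \le \Pr_{x_2 \leftarrow m_i}[x_2 \ge k]$$
Since we have a ≤-coupling, this proof is immediate. This example is formalized in subsection 3.3.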
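As a sanity check of the inequality for the coin example, the following sketch computes the distributions of the tail counts exactly by iterating a monadic bind on finite distributions, and compares tail probabilities for two biases. It is an illustration in plain Python rather than a proof; `bind` follows the usual definition $(\mathrm{bind}\ \mu\ M)(y) = \sum_x \mu(x) \cdot M(x)(y)$, and the other names (`coin_step`, `tails_count`, `tail_prob`, `dominated`) are hypothetical.

```python
from fractions import Fraction


def bind(mu, kernel):
    """(bind mu M)(y) = sum over x of mu(x) * M(x)(y), for finite distributions."""
    out = {}
    for x, px in mu.items():
        for y, py in kernel(x).items():
            out[y] = out.get(y, Fraction(0)) + px * py
    return out


def coin_step(p):
    """Kernel that adds 1 (tails) with probability p and 0 (heads) otherwise."""
    return lambda n: {n: 1 - p, n + 1: p}


def tails_count(p, i):
    """Exact distribution of the number of tails after i tosses."""
    mu = {0: Fraction(1)}
    for _ in range(i):
        mu = bind(mu, coin_step(p))
    return mu


def tail_prob(mu, k):
    """Pr[x >= k] for x sampled from mu."""
    return sum((pr for x, pr in mu.items() if x >= k), Fraction(0))


def dominated(p, q, steps):
    """Check Pr[n_i >= k] <= Pr[m_i >= k] for all i <= steps and k <= steps + 1."""
    return all(
        tail_prob(tails_count(p, i), k) <= tail_prob(tails_count(q, i), k)
        for i in range(steps + 1)
        for k in range(steps + 2)
    )


holds = dominated(Fraction(1, 3), Fraction(1, 2), 6)
```

The coupling argument replaces this explicit computation of cumulative probabilities by a single pairing of coin flips, which is what makes it scale to the infinite process.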

Overview of the system
In this section we give a high-level overview of our system, with the details in Sections 4, 5 and 6. We start by presenting the base logic, and then we show how to extend it with probabilities and how to build a relational reasoning system on top of it.

Base logic: Guarded Higher-Order Logic
Our starting point is the Guarded Higher-Order Logic [1] (Guarded HOL) inspired by the topos of trees. In addition to the usual constructs of HOL to reason about lambda terms, this logic features the ⊲ and □ modalities to reason about infinite terms, in particular streams. The ⊲ modality is used to reason about objects that will be available in the future, such as tails of streams. For instance, suppose we want to define an All(s, φ) predicate, expressing that all elements of a stream s ≡ n :: xs satisfy a property φ. This can be axiomatized as follows:
$$\forall (xs : \triangleright\mathrm{Str}_{\mathbb{N}})\,(n : \mathbb{N}).\ \phi[n/x] \Rightarrow \triangleright[s \leftarrow xs].\,\mathrm{All}(s, x.\phi) \Rightarrow \mathrm{All}(n :: xs, x.\phi)$$
We use x.φ to denote that the formula φ depends on a free variable x, which will get replaced by the first argument of All. We have two antecedents. The first one states that the head n satisfies φ. The second one, ⊲[s ← xs]. All(s, x.φ), states that all elements of xs satisfy φ. Formally, xs is the tail of the stream and will be available in the future, so it has type ⊲Str_N. The delayed substitution ⊲[s ← xs] replaces s of type Str_N with xs of type ⊲Str_N inside All and shifts the whole formula one step into the future. In other words, ⊲[s ← xs]. All(s, x.φ) states that All(−, x.φ) will be satisfied by xs in the future, once it is available.

A system for relational reasoning
When proving relational properties it is often convenient to build proofs guided by the syntactic structure of the two expressions to be related. This style of reasoning is particularly appealing when the two expressions have the same structure and control-flow, and is close to the traditional style of reasoning supported by refinement types. At the same time, a strict adherence to the syntax-directed discipline is detrimental to the expressiveness of the system; for instance, it makes it difficult or even impossible to reason about structurally dissimilar terms. To achieve the best of both worlds, we present a relational proof system built on top of Guarded HOL, which we call Guarded RHOL. Judgements have the shape:
$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : A_1 \sim t_2 : A_2 \mid \phi$$
where φ is a logical formula that may contain two distinguished variables $r_1$ and $r_2$ that respectively represent the expressions $t_1$ and $t_2$. This judgement subsumes two typing judgements on $t_1$ and $t_2$ and a relation φ on these two expressions. However, this form of judgement does not tie the logical property to the type of the expressions, and is key to achieving flexibility while supporting syntax-directed proofs whenever needed. The proof system combines rules of two different flavours: two-sided rules, which relate expressions with the same top-level constructs, and one-sided rules, which operate on a single expression. We then extend Guarded HOL with a modality ⋄ that lifts assertions over discrete types $C_1$ and $C_2$ to assertions over $D(C_1)$ and $D(C_2)$. Concretely, we define for every assertion φ, variables $x_1$ and $x_2$ of type $C_1$ and $C_2$ respectively, and expressions $t_1$ and $t_2$ of type $D(C_1)$ and $D(C_2)$ respectively, the modal assertion $\diamond_{[x_1 \leftarrow t_1, x_2 \leftarrow t_2]}\phi$, which holds iff the interpretations of $t_1$ and $t_2$ are related by the probabilistic lifting of the interpretation of φ. We call this new logic Probabilistic Guarded HOL.
We accordingly extend the relational proof system to support reasoning about probabilistic expressions by adding judgements of the form:
$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : D(C_1) \sim t_2 : D(C_2) \mid \diamond_{[x_1 \leftarrow r_1, x_2 \leftarrow r_2]}\phi$$
expressing that $t_1$ and $t_2$ are distributions related by a φ-coupling. We call this proof system Probabilistic Guarded RHOL. These judgements can be built by using the [COUPLING] rule, which lifts relational judgements over discrete types $C_1$ and $C_2$ to judgements over distribution types $D(C_1)$ and $D(C_2)$ when the premises of Strassen's theorem are satisfied.
Recall that (discrete time) Markov chains are "memoryless" probabilistic processes, whose specification is given by a (discrete) set $C$ of states, an initial state $s_0$ and a probabilistic transition function step : $C \to D(C)$, where $D(C)$ represents the set of discrete distributions over $C$. As explained in the introduction, a convenient modelling of Markov chains is by means of probabilistic
The random walk vs lazy random walk (shift coupling) cannot be proved in prior systems because it requires either asynchronous reasoning or code rewriting. Finally, the biased coin example (stochastic dominance) cannot be proved in prior work because it requires Strassen's formulation of the existence of coupling (rather than a bijection-based formulation) or code rewriting. We give additional details below.
One-time pad/probabilistic non-interference. Non-interference [8] is a baseline information flow policy that is often used to model confidentiality of computations. In its simplest form, non-interference distinguishes between public (or low) and private (or high) variables and expressions, and requires that the result of a public expression not depend on the value of its private parameters. This definition naturally extends to probabilistic expressions, except that in this case the evaluation of an expression yields a distribution rather than a value. There are deep connections between probabilistic non-interference and several notions of (information-theoretic) security from cryptography. In this paragraph, we illustrate different flavours of security properties for one-time pad encryption. Similar reasoning can be carried out for proving (passive) security of secure multiparty computation algorithms in the 3-party or multi-party setting [9,10]. One-time pad is a perfectly secure symmetric encryption scheme. Its space of plaintexts, ciphertexts and keys is the set $\{0,1\}^\ell$ of fixed-length bitstrings of size $\ell$. The encryption algorithm is parametrized by a key $k$, sampled uniformly over the set of bitstrings $\{0,1\}^\ell$, and maps every plaintext $m$ to the ciphertext $c = k \oplus m$, where the operator $\oplus$ denotes bitwise exclusive-or on bitstrings. We let otp denote the expression $\lambda m.\ \mathrm{let}\ k = U_{\{0,1\}^\ell}\ \mathrm{in}\ \mathrm{munit}(k \oplus m)$, where $U_X$ is the uniform distribution over a finite set $X$.
One-time pad achieves perfect security, i.e. the distribution of ciphertexts is independent of the plaintext. Perfect security can be captured as a probabilistic non-interference property:
$$\vdash \mathrm{otp} \sim \mathrm{otp} \mid \forall m_1\, m_2.\ r_1\, m_1 \overset{\diamond}{=} r_2\, m_2$$
where $e_1 \overset{\diamond}{=} e_2$ is used as a shorthand for $\diamond_{[y_1 \leftarrow e_1, y_2 \leftarrow e_2]}\, y_1 = y_2$. The crux of the proof is to establish, using the [COUPLING] rule, a coupling of the two key samplings under which the two ciphertexts coincide. It suffices to observe that the assertion induces a bijection, so the image of an arbitrary set $X$ under the relation has the same cardinality as $X$, and hence their probabilities w.r.t. the uniform distributions are equal. One can then conclude the proof by applying the rules for monadic sequencing ([MLET]) and abstraction (rule [ABS] in appendix), using algebraic properties of $\oplus$.
Interestingly, one can prove a stronger property: rather than proving that the ciphertext is independent of the plaintext, one can prove that the distribution of ciphertexts is uniform. This can again be captured by a relational judgement, a style of modelling uniformity that is inspired by [11]. The proof is similar to the previous one and omitted. However, it is arguably more natural to model uniformity of the distribution of ciphertexts by the judgement:
$$\vdash \mathrm{otp} : \{0,1\}^\ell \to D(\{0,1\}^\ell) \sim U_{\{0,1\}^\ell} : D(\{0,1\}^\ell) \mid \forall m.\ r_1\, m \overset{\diamond}{=} r_2$$
This judgement is closer to the simulation-based notion of security that is used pervasively in cryptography, and notably in Universal Composability [12]. Specifically, the statement captures the fact that the one-time pad algorithm can be simulated without access to the message. It is interesting to note that the judgement above (and more generally simulation-based security) could not be expressed in prior works, since the two expressions of the judgement have different types. Note that in this specific case the right expression is a distribution, but in the general case the right expression will also be a function, and its domain will be a projection of the domain of the left expression.
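Both properties can be confirmed by exhaustive enumeration for small key lengths. The sketch below is plain Python and independent of the proof system; `bitstrings`, `xor`, `otp` and `uniform` are hypothetical helper names, and bitstrings are modelled as tuples of bits. It computes the ciphertext distribution of the one-time pad for a fixed message and compares it with the uniform distribution.

```python
from fractions import Fraction
from itertools import product


def bitstrings(length):
    """All bitstrings of the given length, as tuples of 0/1."""
    return list(product((0, 1), repeat=length))


def xor(a, b):
    """Bitwise exclusive-or of two equal-length bitstrings."""
    return tuple(x ^ y for x, y in zip(a, b))


def otp(message):
    """Distribution of ciphertexts k XOR m for a uniformly sampled key k."""
    keys = bitstrings(len(message))
    dist = {}
    for key in keys:
        c = xor(key, message)
        dist[c] = dist.get(c, Fraction(0)) + Fraction(1, len(keys))
    return dist


def uniform(length):
    """Uniform distribution over bitstrings of the given length."""
    space = bitstrings(length)
    return {s: Fraction(1, len(space)) for s in space}


# Uniformity (and hence independence of the plaintext) for all 3-bit messages.
uniform_for_all = all(otp(m) == uniform(3) for m in bitstrings(3))
```

The check succeeds because, for a fixed message, the map from keys to ciphertexts is a bijection, which is the same observation that drives the coupling-based proof.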
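The proof proceeds as follows. First, we prove, using the [COUPLING] rule, that the key sampling on the left is coupled with the uniform sampling on the right. Then, we apply the [MLET] rule to obtain the corresponding coupling between the body of otp and the uniform distribution. We conclude by applying the one-sided rule for abstraction.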
Stochastic dominance. Stochastic dominance defines a partial order between random variables whose underlying set is itself a partial order; it has many different applications in statistical biology (e.g. in the analysis of birth-and-death processes), statistical physics (e.g. in percolation theory), and economics. First-order stochastic dominance, which we define below, is also an important application of probabilistic couplings. We demonstrate how to use our proof system for proving (first-order) stochastic dominance for a simple Markov process which samples biased coins. While the example is elementary, the proof method extends to more complex examples of stochastic dominance, and illustrates the benefits of Strassen's formulation of the coupling rule over alternative formulations stipulating the existence of bijections (explained later).
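We start by recalling the definition of (first-order) stochastic dominance for the $\mathbb{N}$-valued case. The definition extends to arbitrary partial orders.
Definition 4 (Stochastic dominance). Let $\mu_1, \mu_2 \in D(\mathbb{N})$. We say that $\mu_2$ stochastically dominates $\mu_1$, written $\mu_1 \le_{SD} \mu_2$, iff for every $n \in \mathbb{N}$, $\Pr_{x \leftarrow \mu_1}[x \ge n] \le \Pr_{x \leftarrow \mu_2}[x \ge n]$.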
We now turn to the definition of the Markov chain. For $p \in [0, 1]$, we consider the parametric $\mathbb{N}$-valued Markov chain coins ≜ markov(0, h), with initial state 0 and (parametric) step function:
$$h \triangleq \lambda x.\ \mathrm{let}\ y = B(p)\ \mathrm{in}\ \mathrm{munit}(x + y)$$
where, for $p \in [0, 1]$, $B(p)$ is the Bernoulli distribution on $\{0, 1\}$ with probability $p$ for 1 and $1 - p$ for 0. Our goal is to establish that coins is monotonic, i.e. for every $p_1, p_2 \in [0, 1]$, $p_1 \le p_2$ implies coins $p_1 \le_{SD}$ coins $p_2$. We formalize this statement as a relational judgement between coins $p_1$ and coins $p_2$. The crux of the proof is to establish stochastic dominance for the Bernoulli distribution:
$$p_1 \le p_2 \vdash B(p_1) \sim B(p_2) \mid r_1 \overset{\diamond}{\le} r_2$$
where we use $e_1 \overset{\diamond}{\le} e_2$ as shorthand for $\diamond_{[y_1 \leftarrow e_1, y_2 \leftarrow e_2]}\, y_1 \le y_2$. This is proved directly by the [COUPLING] rule, checking by simple calculations that the premise of the rule is valid.
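We briefly explain how to conclude the proof. Let $h_1$ and $h_2$ be the step functions for $p_1$ and $p_2$ respectively. It is clear from the above that (context omitted) $h_1$ and $h_2$ map states related by ≤ to distributions related by a ≤-coupling, which, by the definition of All, is what is needed to relate the two streams elementwise. So, we can conclude by applying the [Markov] rule.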
It is instructive to compare our proof with prior formalizations, and in particular with the proof in [5]. Their proof is carried out in the pRHL logic, whose [COUPLING] rule is based on the existence of a bijection that satisfies some property, rather than on our formalization based on Strassen's Theorem. Their rule is motivated by applications in cryptography, and works well for many examples, but is inconvenient for our example at hand, which involves non-uniform probabilities. Indeed, their proof is based on code rewriting, and is done in two steps. First, they prove equivalence between sampling and returning x 1 from B(p 1 ); and sampling z 1 from B(p 2 ), z 2 from B( p1 / p2 ) and returning z = z 1 ∧ z 2 . Then, they find a coupling between z and B(p 2 ).
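The rewriting step used in the bijection-based proof can be checked numerically. The sketch below (plain Python; `bernoulli` and `and_of_samples` are hypothetical names) computes the distribution of $z_1 \wedge z_2$ for independent $z_1 \sim B(p_2)$ and $z_2 \sim B(p_1/p_2)$ and compares it with $B(p_1)$. It assumes $0 < p_2$ and $p_1 \le p_2$ so that $p_1/p_2$ is a valid probability.

```python
from fractions import Fraction


def bernoulli(p):
    """B(p): probability p for 1 and 1 - p for 0."""
    return {0: 1 - p, 1: p}


def and_of_samples(p1, p2):
    """Distribution of z1 AND z2, with z1 ~ B(p2) and z2 ~ B(p1 / p2) independent.

    Assumes 0 < p2 and p1 <= p2.
    """
    out = {0: Fraction(0), 1: Fraction(0)}
    for z1, pr1 in bernoulli(p2).items():
        for z2, pr2 in bernoulli(p1 / p2).items():
            out[z1 & z2] += pr1 * pr2
    return out


p1, p2 = Fraction(1, 3), Fraction(1, 2)
same_distribution = and_of_samples(p1, p2) == bernoulli(p1)
```

After this rewriting, $z = z_1 \wedge z_2 \le z_1$ holds on every sample, so pairing $z_1$ with the sample from $B(p_2)$ gives the desired coupling; the Strassen-based rule reaches the same conclusion without rewriting the program.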
Shift coupling: random walk vs lazy random walk. The previous example is an instance of a lockstep coupling, in that it relates the k-th element of the first chain with the k-th element of the second chain. Many examples from the literature follow this lockstep pattern; however, it is not always possible to establish lockstep couplings. Shift couplings are a relaxation of lockstep couplings where we relate elements of the first and second chains without the requirement that their positions coincide.
We consider a simple example that motivates the use of shift couplings. Consider the random walk and lazy random walk (which, at each time step, either chooses to move or stay put), both defined as Markov chains over $\mathbb{Z}$. For simplicity, assume that both walks start at position 0. It is not immediate to find a coupling between the two walks, since the two walks necessarily get desynchronized whenever the lazy walk stays put. Instead, the trick is to consider a lazy random walk that moves two steps instead of one. The random walk and the lazy random walk of step 2 are defined by the step functions:
$$\mathrm{step} \triangleq \lambda x.\ \mathrm{let}\ z = U_{\{-1,1\}}\ \mathrm{in}\ \mathrm{munit}(z + x)$$
$$\mathrm{lstep2} \triangleq \lambda x.\ \mathrm{let}\ (z, b) = U_{\{-1,1\} \times \{0,1\}}\ \mathrm{in}\ \mathrm{munit}(x + 2 \cdot z \cdot b)$$
After 2 iterations of step, the position has either changed two steps to the left or to the right, or has returned to the initial position, which is the same behaviour lstep2 has on every iteration. Therefore, the coupling we want to find should equate the elements at position $2i$ in step with the elements at position $i$ in lstep2. The details on how to prove the existence of this coupling are in Section 6.
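A necessary consequence of such a shift coupling is that the random walk at time $2i$ and the lazy walk of step 2 at time $i$ have the same distribution. The sketch below checks this for small $i$ by exact computation in plain Python. The kernels `step` and `lstep2` are written directly from the behaviour described above (one move of $\pm 1$ with probability 1/2 each; a move of $\pm 2$ with probability 1/4 each, or no move with probability 1/2), and `bind` and `iterate` are hypothetical helpers.

```python
from fractions import Fraction


def bind(mu, kernel):
    """Sequential composition of a finite distribution with a transition kernel."""
    out = {}
    for x, px in mu.items():
        for y, py in kernel(x).items():
            out[y] = out.get(y, Fraction(0)) + px * py
    return out


def step(x):
    """One move of the simple random walk from position x."""
    return {x - 1: Fraction(1, 2), x + 1: Fraction(1, 2)}


def lstep2(x):
    """One move of the lazy random walk of step 2 from position x."""
    return {x - 2: Fraction(1, 4), x: Fraction(1, 2), x + 2: Fraction(1, 4)}


def iterate(kernel, start, n):
    """Distribution of the state after n transitions from a fixed start state."""
    mu = {start: Fraction(1)}
    for _ in range(n):
        mu = bind(mu, kernel)
    return mu


# Position 2i of the random walk matches position i of the lazy walk of step 2.
shifted_match = all(iterate(step, 0, 2 * i) == iterate(lstep2, 0, i) for i in range(6))
```

Equality of these marginal distributions is weaker than the coupling itself, which relates the two whole streams; it only illustrates why positions $2i$ and $i$ are the right ones to pair.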
Lumped coupling: random walks in 3 and 4 dimensions. A Markov chain is recurrent if it has probability 1 of returning to its initial state, and transient otherwise. It is relatively easy to show that the random walk over $\mathbb{Z}$ is recurrent. One can also show that the random walk over $\mathbb{Z}^2$ is recurrent. However, the random walk over $\mathbb{Z}^3$ is transient.
For higher dimensions, we can use a coupling argument to prove transience. Specifically, we can define a coupling between a lazy random walk in n dimensions and a random walk in n + m dimensions, and derive transience of the latter from transience of the former. We define the (lazy) random walks below, and sketch the coupling arguments.
Specifically, we show here the particular case of the transience of the 4-dimensional random walk from the transience of the 3-dimensional lazy random walk. We start by defining the stepping functions: markov(0, step$_4$), and the lazy walk of dimension 3 is modelled by lwalk3 ≜ markov(0, step$_3$). We want to prove: where $\mathrm{pr}^{n_2}_{n_1}$ denotes the standard projection from $\mathbb{Z}^{n_2}$ to $\mathbb{Z}^{n_1}$. We apply the [Markov] rule. The only interesting premise requires proving that the transition function preserves the coupling: To prove this, we need to find the appropriate coupling, i.e., one that preserves the equality. The idea is that the step in $\mathbb{Z}^3$ must be the projection of the step in $\mathbb{Z}^4$. This corresponds to the following judgement: which by simple equational reasoning is the same as We want to build a coupling such that if we sample $(0, 0, 0, 1)$ or $(0, 0, 0, -1)$ from U U3 , then we sample 0 from $B(3/4)$, and otherwise if we sample ( Formally, we prove this with the [COUPLING] rule. Given $X : U_4 \to \mathbb{B}$, by simple computation we show that:
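The intuition that the 3-dimensional lazy step is the projection of the 4-dimensional step can be checked by enumeration. The sketch below is plain Python and rests on two assumptions that are not spelled out in the text above: the 4-dimensional walk moves by a uniformly chosen unit vector of $\mathbb{Z}^4$, and the lazy 3-dimensional walk moves by a uniformly chosen unit vector of $\mathbb{Z}^3$ with probability 3/4 and stays put otherwise. The names `unit_steps`, `projected_step4` and `lazy_step3` are hypothetical.

```python
from fractions import Fraction


def unit_steps(n):
    """The 2n unit vectors of Z^n with a single +1 or -1 entry."""
    steps = []
    for axis in range(n):
        for sign in (-1, 1):
            steps.append(tuple(sign if j == axis else 0 for j in range(n)))
    return steps


def projected_step4():
    """Distribution of the first three coordinates of a uniform unit step in Z^4."""
    out = {}
    for v in unit_steps(4):
        out[v[:3]] = out.get(v[:3], Fraction(0)) + Fraction(1, 8)
    return out


def lazy_step3():
    """Assumed lazy step in Z^3: uniform unit step with probability 3/4, else stay."""
    out = {(0, 0, 0): Fraction(1, 4)}
    for v in unit_steps(3):
        out[v] = Fraction(3, 4) * Fraction(1, 6)
    return out


projection_matches = projected_step4() == lazy_step3()
```

Under these assumptions the two steps along the fourth axis, $(0,0,0,1)$ and $(0,0,0,-1)$, carry total probability 1/4 and project to the zero step, which is exactly the event of sampling 0 from $B(3/4)$.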

Probabilistic Guarded Lambda Calculus
To ensure that a function on infinite datatypes is well-defined, one must check that it is productive. This means that any finite prefix of the output can be computed in finite time. For instance, consider the following function on streams:
$$\mathrm{bad}\ (x : xs) = x : \mathrm{tail}\ (\mathrm{bad}\ xs)$$
This function is not productive since only the first element can be computed. We can argue this as follows: Suppose that the tail of a stream is available one unit of time after its head, and that x:xs is available at time 0. How much time does it take for bad to start outputting its tail? Assume it takes k units of time. This means that tail(bad xs) will be available at time k + 1, since xs is only available at time 1. But tail(bad xs) is exactly the tail of bad(x:xs), and this is a contradiction, since x:xs is available at time 0 and therefore the tail of bad(x:xs) should be available at time k. Therefore, the tail of bad will never be available. The guarded lambda calculus solves the productivity problem by distinguishing at type level between data that is available now and data that will be available in the future, and restricting when fixpoints can be defined. Specifically, the guarded lambda calculus extends the usual simply typed lambda calculus with two modalities: ⊲ (pronounced later) and □ (constant). The later modality represents data that will be available one step in the future, and is introduced and removed by the term formers ⊲ and prev respectively. This modality is used to guard recursive occurrences, so for the calculus to remain productive, we must restrict when it can be eliminated. This is achieved via the constant modality, which expresses that all the data is available at all times. In the remainder of this section we present a probabilistic extension of this calculus.
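The failure of productivity can be observed operationally. The following sketch models lazy streams in plain Python as a head together with a thunk for the tail, and defines a function whose head is the head of its argument and whose tail is the tail of its own recursive call on the argument's tail, as in the argument above. The `Stream` class and the helper names are hypothetical; the guarded lambda calculus rules such definitions out statically, whereas here the problem only shows up at run time as unbounded recursion.

```python
class Stream:
    """A lazy stream: a head plus a thunk producing the tail on demand."""

    def __init__(self, head, tail_thunk):
        self.head = head
        self._tail_thunk = tail_thunk

    def tail(self):
        return self._tail_thunk()


def ones():
    """The constant stream 1, 1, 1, ..."""
    return Stream(1, ones)


def bad(s):
    """Head of s, followed by the tail of bad applied to the tail of s."""
    return Stream(s.head, lambda: bad(s.tail()).tail())


def tail_is_available(s):
    """Try to force the tail; report whether it can be computed at all."""
    try:
        s.tail()
        return True
    except RecursionError:
        return False


head_ok = bad(ones()).head == 1
tail_ok = tail_is_available(bad(ones()))
```

Forcing the tail of `bad(ones())` requires the tail of `bad` on the tail, which in turn requires the tail of `bad` on the next tail, and so on without ever producing an element.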
Syntax. Types of the calculus are defined by a grammar in which b ranges over a collection of base types. Str A is the type of guarded streams of elements of type A. Formally, the type Str A is isomorphic to A × ⊲ Str A . This isomorphism gives a way to introduce streams with the function (::) : A → ⊲ Str A → Str A and to eliminate them with the functions hd : Str A → A and tl : Str A → ⊲ Str A . D(C) is the type of distributions over discrete types C. Discrete types are defined by a grammar in which b 0 ranges over discrete base types, e.g., Z.
Note that, in particular, arrow types are not discrete but streams are. This is due to the semantics of streams as sets of finite approximations, which we describe in the next subsection. Also note that □ Str A is not discrete, since it makes the full infinite streams available.
We also need to distinguish between arbitrary types A, B and constant types S, T , which are defined by a grammar in which b C ranges over a collection of constant base types. Note in particular that for any type A the type □A is constant.
The terms of the language t are defined by a grammar in which the terms c are constants corresponding to the base types used, and munit(t) and let x = t in t are the introduction and sequencing constructs for probability distributions. The meta-variable µ stands for base distributions like U C and B(p).
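To make the role of munit and let concrete, here is a minimal sketch of the two constructs for finitely supported distributions, represented as dictionaries from outcomes to exact probabilities. It illustrates the standard unit and bind of the discrete distribution monad and is not the calculus itself; the function names `mlet`, `uniform` and `bernoulli` are assumptions made for this example.

```python
from fractions import Fraction


def munit(x):
    """Point distribution: returns x with probability 1."""
    return {x: Fraction(1)}


def mlet(mu, f):
    """Sequencing 'let x = mu in f(x)': sample x from mu, then run f(x)."""
    out = {}
    for x, p in mu.items():
        for y, q in f(x).items():
            out[y] = out.get(y, Fraction(0)) + p * q
    return out


def uniform(points):
    """Base distribution U_C on a finite list of points."""
    points = list(points)
    return {x: Fraction(1, len(points)) for x in points}


def bernoulli(p):
    """Base distribution B(p): 1 with probability p, 0 otherwise."""
    return {1: p, 0: 1 - p}


def step(pos):
    """One step of the simple random walk from the Introduction."""
    return mlet(uniform([-1, 1]), lambda z: munit(pos + z))


# Position after two steps, starting from 0.
two_steps = mlet(step(0), step)
```

Exact rational arithmetic (`Fraction`) is used so that equalities between distributions can be checked without rounding issues.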
Delayed substitutions were introduced in [13] in a dependent type theory to be able to work with types dependent on terms of type ⊲A. In the setting of a simple type theory, such as the one considered in this paper, delayed substitutions are equivalent to having the applicative structure [14] ⊛ for the ⊲ modality. However, delayed substitutions extend uniformly to the level of propositions, and thus we choose to use them in this paper in place of the applicative structure.
Denotational semantics The meaning of terms is given by a denotational model in the category S of presheaves over ω, the first infinite ordinal. This category S is also known as the topos of trees [15]. In previous work [1], it was shown how to model most of the constructions of the guarded lambda calculus and its internal logic, with the notable exception of the probabilistic features. Below we give an elementary presentation of the semantics.
Informally, the idea behind the topos of trees is to represent (infinite) objects from their finite approximations, which we observe incrementally as time passes. Given an object x, we can consider a sequence {x i } of its finite approximations observable at time i. These are trivial for finite objects, such as a natural number, since for any number n, n i = n at every i. But for infinite objects such as streams, the ith approximation is the prefix of length i + 1.
Concretely, the category S consists of:
- Objects X: families of sets $\{X_i\}_{i \in \mathbb{N}}$ together with restriction functions $r^X_n : X_{n+1} \to X_n$. We will write simply $r_n$ if X is clear from the context.
- Morphisms X → Y: families of functions $\alpha_n : X_n \to Y_n$ commuting with restriction functions in the sense of $r^Y_n \circ \alpha_{n+1} = \alpha_n \circ r^X_n$.

The full interpretation of types of the calculus can be found in Figure 8 in the appendix. The main points we want to highlight are:
- Streams over a type A are interpreted as sequences of finite prefixes of elements of A with the restriction functions of A.
- Distributions over a discrete object C are defined as a sequence of distributions over each $\llbracket C \rrbracket_i$, where $D(\llbracket C \rrbracket_i)$ is the set of (probability density) functions $\mu : \llbracket C \rrbracket_i \to [0, 1]$ such that $\sum_{x \in \llbracket C \rrbracket_i} \mu(x) = 1$, and $D(r_i)$ adds the probability density of all the points in $\llbracket C \rrbracket_{i+1}$ that are sent by $r_i$ to the same point in $\llbracket C \rrbracket_i$. In other words, $D(r_i)(\mu)(x) = \sum_{y \in r_i^{-1}(x)} \mu(y)$.

An important property of the interpretation is that discrete types are interpreted as objects X such that $X_i$ is finite or countably infinite for every i. This allows us to define distributions on these objects without the need for measure theory. In particular, the type of guarded streams Str A is discrete provided A is, which is clear from the interpretation of the type Str A . Conceptually this holds because $\llbracket \mathrm{Str}_A \rrbracket_i$ is an approximation of real streams, consisting of only the first i + 1 elements.
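As an illustration of the action of D on restriction functions, the following sketch computes the image of a distribution on stream prefixes of length i + 2 along the map that forgets the last element. It is a toy model under stated assumptions: prefixes are Python tuples, stage i holds prefixes of length i + 1, and the names `restrict`, `d_restrict` and `fair_bits` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def restrict(prefix):
    """Restriction r_i on prefixes: forget the last observed element."""
    return prefix[:-1]


def d_restrict(mu):
    """D(r_i): add up the mass of all prefixes with the same restriction."""
    out = {}
    for prefix, p in mu.items():
        key = restrict(prefix)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def fair_bits(i):
    """Uniform distribution on bit prefixes of length i + 1 (stage i)."""
    points = list(product((0, 1), repeat=i + 1))
    return {x: Fraction(1, len(points)) for x in points}
```

In this example the family `fair_bits(0)`, `fair_bits(1)`, ... is compatible with restriction: applying `d_restrict` to the stage i + 1 distribution gives the stage i distribution.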
An object X of S is constant if all its restriction functions are bijections. Constant types are interpreted as constant objects of S, and for a constant type A the objects $\llbracket \Box A \rrbracket$ and $\llbracket A \rrbracket$ are isomorphic in S.
Typing rules Terms are typed under a dual context ∆ | Γ , where Γ is a usual context that binds variables to a type, and ∆ is a constant context containing variables bound to types that are constant. The term letc x ← u in t allows us to shift variables between constant and non-constant contexts. The typing rules can be found in Figure 2.
The semantics of such a dual context ∆ | Γ is given as the product of types in ∆ and Γ , except that we implicitly add □ in front of every type in ∆. In the particular case when both contexts are empty, the semantics of the dual context corresponds to the terminal object 1, which is the singleton set { * } at each time.
The interpretation of the well-typed term ∆ | Γ ⊢ t : A is defined by induction on the typing derivation, and can be found in Figure 9 in the appendix.
Applicative structure of the later modality. As in previous work, we can define the operator ⊛ satisfying the following typing rule: if ∆ | Γ ⊢ t : ⊲(A → B) and ∆ | Γ ⊢ u : ⊲A, then ∆ | Γ ⊢ t ⊛ u : ⊲B.

Example: Modelling Markov chains. As an application of ⊛ and an example of how to use guardedness and probabilities together, we now give the precise definition of the markov construct that we used to model Markov chains earlier: The guardedness condition gives f the type ⊲(C → (C → D(C)) → D(Str C )) in the body of the fixpoint. Therefore, it needs to be applied functorially (via ⊛) to ⊲z and ⊲h, which gives us a term of type ⊲D(Str C ). To complete the definition we need to build a term of type D(⊲ Str C ) and then sequence it with :: to build a term of type D(Str C ). To achieve this, we use the primitive operator $\mathrm{swap}^{C}_{⊲D}$ : ⊲D(C) → D(⊲C), which witnesses the isomorphism between ⊲D(C) and D(⊲C). For this isomorphism to exist, it is crucial that distributions be total (i.e., we cannot use subdistributions).
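The behaviour of a Markov chain at a finite stage can be sketched as follows. This is an illustrative model only: it assumes that the chain started at x with transition function h emits x and then continues from a state sampled from h(x), and it computes the exact distribution over prefixes of length i + 1. The names `markov_prefix` and `step` are hypothetical and are not the paper's markov term.

```python
from fractions import Fraction


def markov_prefix(x, h, i):
    """Distribution over length-(i + 1) prefixes of the chain started at x."""
    if i == 0:
        return {(x,): Fraction(1)}
    out = {}
    for z, p in h(x).items():                       # sample the next state
        for tail, q in markov_prefix(z, h, i - 1).items():
            key = (x,) + tail                       # emit x, then the rest
            out[key] = out.get(key, Fraction(0)) + p * q
    return out


def step(pos):
    """Transition function of the simple random walk on the integers."""
    return {pos - 1: Fraction(1, 2), pos + 1: Fraction(1, 2)}


# Prefixes of length 3 of the simple random walk started at 0.
stage2 = markov_prefix(0, step, 2)
```

Because every transition is a total distribution, the masses at each stage sum to 1, which mirrors the requirement that distributions be total.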

Guarded higher-order logic
We now introduce Guarded HOL (GHOL), which is a higher-order logic to reason about terms of the guarded lambda calculus. The logic is essentially that of [1], but presented with the dual context formulation analogous to the dual-context typing judgement of the guarded lambda calculus. Compared to standard intuitionistic higher-order logic, the logic GHOL has two additional constructs, corresponding to additional constructs in the guarded lambda calculus. These are the later modality (⊲) on propositions, with delayed substitutions, which expresses that a proposition holds one time unit into the future, and the "always" modality □, which expresses that a proposition holds at all times. Formulas are defined by the grammar:

The basic judgement of the logic is ∆ | Σ | Γ | Ψ ⊢ φ, where Σ is a logical context for ∆ (that is, a list of formulas well-formed in ∆) and Ψ is another logical context for the dual context ∆ | Γ . The formulas in context Σ must be constant propositions. We say that a proposition φ is constant if it is well-typed in context ∆ | · and moreover every occurrence of the later modality in φ is under the □ modality. Selected rules are displayed in Figure 3. We highlight [Loeb] induction, which is the key to reasoning about fixpoints: to prove that φ holds now, one can assume that it holds in the future.

The interpretation of the formula ∆ | Γ ⊢ φ is a subobject of the interpretation $\llbracket ∆ \mid Γ \rrbracket$, that is, a family of subsets $A_i \subseteq \llbracket ∆ \mid Γ \rrbracket_i$. This family must satisfy the property that if $x \in A_{i+1}$ then $r_i(x) \in A_i$, where $r_i$ are the restriction functions of $\llbracket ∆ \mid Γ \rrbracket$. The interpretation of formulas is defined by induction on the typing derivation. In the interpretation of the context ∆ | Σ | Γ | Ψ the formulas in Σ are interpreted with the added □ modality. Moreover, all formulas φ in Σ are typeable in the context ∆ | · ⊢ φ and thus their interpretations are subsets of $\llbracket \Box ∆ \rrbracket$. We treat these as subsets of $\llbracket ∆ \mid Γ \rrbracket$ in the obvious way.
The cases for the semantics of the judgement ∆ | Γ ⊢ φ can be found in the appendix. It can be shown that this logic is sound with respect to its model in the topos of trees.

Theorem 2 (Soundness of the semantics). The semantics of guarded higher-order logic is sound.
In addition, Guarded HOL is expressive enough to axiomatize standard probabilities over discrete sets. This axiomatization can be used to define the ⋄ modality directly in Guarded HOL (as opposed to our relational proof system, where we use it as a primitive). Furthermore, we can derive from this axiomatization additional rules to reason about couplings, which can be seen in Figure 4. These rules will be the key to proving the soundness of the probabilistic fragment of the relational proof system, and can be shown to be sound themselves.
Proposition 2 (Soundness of derived rules). The additional rules are sound.

Relational proof system
We complete the formal description of the system by describing the proof rules for the non-probabilistic fragment of the relational proof system (the rules of the probabilistic fragment were described in Section 3.2).

Proof rules
The rules for core λ-calculus constructs are identical to those of [2]; for convenience, we present a selection of the main rules in Figure 7 in the appendix.

Fig. 3. Selected Guarded Higher-Order Logic rules

Fig. 4. Derived rules for probabilistic constructs
We briefly comment on the two-sided rules for the new constructs (Figure 5). The notation Ω abbreviates a context ∆ | Σ | Γ | Ψ . The rule [Next] relates two terms that have a ⊲ term constructor at the top level. We require that both have one term in the delayed substitutions and that they are related pairwise. Then this relation is used to prove another relation between the main terms. This rule can be generalized to terms with more than one term in the delayed substitution. The rule [Prev] proves a relation between terms from the same delayed relation by applying prev to both terms. The rule [Box] proves a relation between two boxed terms if the same relation can be proven in a constant context. Dually, [LetBox] uses a relation between two boxed terms to prove a relation between their unboxings. [LetConst] is similar to [LetBox], but it requires a relation between two constant terms, rather than explicitly □-ed terms. The rule [Fix] relates two fixpoints following the [Loeb] rule from Guarded HOL. Notice that in the premise, the fixpoints need to appear in the delayed substitution so that the inductive hypothesis is well-formed. The rule [Cons] proves relations on streams from relations between their heads and tails, while [Head] and [Tail] behave as converses of [Cons].

Figure 6 contains the one-sided versions of the rules. We only present the left-sided versions as the right-sided versions are completely symmetric. The rule [Next-L] relates at φ a term that has a ⊲ with a term that does not have a ⊲. First, a unary property φ ′ is proven on the term u in the delayed substitution, and it is then used as a premise to prove φ on the terms with delays removed. Rules for proving unary judgements can be found in the appendix. Similarly, [LetBox-L] proves a unary property on the term that gets unboxed and then uses it as a precondition. The rule [Fix-L] builds a fixpoint just on the left, and relates it with an arbitrary term t 2 at a property φ.
Since φ may contain the variable r 2 which is not in the context, it has to be replaced when adding ⊲φ to the logical context in the premise of the rule. The remaining rules are similar to their two-sided counterparts.

Metatheory
We review some of the most interesting metatheoretical properties of our relational proof system, highlighting the equivalence with Guarded HOL.
Theorem 3 (Equivalence with Guarded HOL). For all contexts ∆, Γ ; types σ 1 , σ 2 ; terms t 1 , t 2 ; sets of assertions Σ, Ψ ; and assertions φ: The forward implication follows by induction on the given derivation. The reverse implication is immediate from the rule that allows falling back on Guarded HOL in relational proofs (rule [SUB] in the appendix). The full proof is in the appendix. The consequence of this theorem is that the syntax-directed, relational proof system we have built on top of Guarded HOL does not lose expressiveness.
The intended semantics of a judgement ∆ | Σ | Γ | Ψ ⊢ t 1 : A 1 ∼ t 2 : A 2 | φ is that, for every valuation δ |= ∆, γ |= Γ , if Σ and Ψ hold at (δ) and (δ, γ) respectively, then φ holds when r 1 and r 2 are interpreted as the denotations of t 1 and t 2 at (δ, γ).

Since Guarded HOL is sound with respect to its semantics in the topos of trees, and our relational proof system is equivalent to Guarded HOL, we obtain that our relational proof system is also sound in the topos of trees.

Shift couplings revisited
We give further details on how to prove the example with shift couplings from Section 3.3.
(Additional examples of relational reasoning on non-probabilistic streams can be found in the appendix.) Recall the step functions: We axiomatize the predicate All 2,1 , which relates the element at position 2i in one stream to the element at position i in another stream, as follows.
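To give some intuition for the All 2,1 predicate, the sketch below checks it on finite prefixes: given a binary relation, it tests that the element at position 2i of the first sequence is related to the element at position i of the second, for every i at which both positions are observed. The function name `all_2_1` and the restriction to finite lists are assumptions of this example; the predicate in the logic is axiomatized on guarded streams.

```python
def all_2_1(xs, ys, rel):
    """Check rel(xs[2*i], ys[i]) for every i where both positions exist."""
    n = min((len(xs) + 1) // 2, len(ys))  # number of comparable positions
    return all(rel(xs[2 * i], ys[i]) for i in range(n))


# Example: a sequence that moves one unit per step, observed every two
# steps, against a sequence that moves two units per step.
fast = [0, 1, 2, 3, 4]
slow = [0, 2, 4]
```

With `rel` taken to be equality, `all_2_1(fast, slow, ...)` holds for the example above, which is the shape of statement a shift coupling establishes between two chains running at different speeds.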
We can now express the existence of a shift coupling by the statement: For the proof, we need to introduce an asynchronous rule for Markov chains: This asynchronous rule for Markov chains shares the motivations of the rule for loops proposed in [6]. Note that one can define a rule [Markov-m-n] for arbitrary m and n to prove a judgement of the form All m,n on two Markov chains. We show the proof of the shift coupling. By equational reasoning, we get: and the only interesting premise of [Markov-2-1] is: Couplings between z 1 and z 2 and between z ′ 1 and b 2 can be found by simple computations. This completes the proof.

Related work
Our probabilistic guarded λ-calculus and the associated logic Guarded HOL build on top of the guarded λ-calculus and its internal logic [1]. The guarded λ-calculus has been extended to guarded dependent type theory [13], which can be understood as a theory of guarded refinement types and as a foundation for proof assistants based on guarded type theory. These systems do not reason about probabilities, and do not support syntax-directed (relational) reasoning, both of which we support.
Relational models for higher-order programming languages are often defined using logical relations. [16] showed how to use second-order logic to define and reason about logical relations for the second-order lambda calculus. Recent work has extended this approach to logical relations for higher-order programming languages with computational effects such as nontermination, general references, and concurrency [17,18,19,20]. The logics used in loc. cit. are related to our work in two ways: (1) the logics in loc. cit. make use of the later modality for reasoning about recursion, and (2) the models of the logics in loc. cit. can in fact be defined using guarded type theory. Our work is more closely related to Relational Higher Order Logic [2], which applies the idea of logic-enriched type theories [21,22] to a relational setting. There exist alternative approaches for reasoning about relational properties of higher-order programs; for instance, [23] have recently proposed to use monadic reification for reducing relational verification of F * to proof obligations in higher-order logic.
A series of works develops reasoning methods for probabilistic higher-order programs for different variations of the lambda calculus. One line of work has focused on operationally-based techniques for reasoning about contextual equivalence of programs. The methods are based on probabilistic bisimulations [24,25] or on logical relations [26]. Most of these approaches have been developed for languages with discrete distributions, but recently there has also been work on languages with continuous distributions [27,28]. Another line of work has focused on denotational models, starting with the seminal work in [29]. Recent work includes support for relational reasoning about equivalence of programs with continuous distributions for a total programming language [30]. Our approach is most closely related to prior work based on relational refinement types for higher-order probabilistic programs. These were initially considered by [31] for a stateful fragment of F * , and later by [32,33] for a pure language. Both systems are specialized to building probabilistic couplings; however, the latter support approximate probabilistic couplings, which yield a natural interpretation of differential privacy [34], both in its vanilla and approximate forms (i.e., ε- and (ε, δ)-privacy). Technically, approximate couplings are modelled as a graded monad, where the index of the monad tracks the privacy budget (ε or (ε, δ)). Both systems are strictly syntax-directed, and cannot reason about computations that have different types or syntactic structures, while our system can.

Conclusion
We have developed a probabilistic extension of the (simply typed) guarded λ-calculus, and proposed a syntax-directed proof system for relational verification. Moreover, we have verified a series of examples that are beyond the reach of prior work. Finally, we have proved the soundness of the proof system with respect to the topos of trees.
There are several natural directions for future work. A first direction is to enhance the expressiveness of the underlying simply typed language. For instance, it would be interesting to introduce clock variables and some type dependency as in [13], and extend the proof system accordingly. This would allow us, for example, to type the function taking the n-th element of a guarded stream, which cannot be done in the current system. Another exciting direction is to consider approximate couplings, as in [32,33], and to develop differential privacy for infinite streams; preliminary work in this direction, such as [35], considers very large lists, but not arbitrary streams. A final direction would be to extend our approach to continuous distributions to support other application domains.

Types and terms in context
The meaning of terms is given by the denotational model in the category S of presheaves over ω, the first infinite ordinal. This category S is also known as the topos of trees [15]. In previous work [1] it was shown how to model most of the constructions of the guarded lambda calculus and the associated logic, with the notable exception of the probabilistic features. Below we give an elementary and self-contained presentation of the semantics.

Concretely, objects X of S are families of sets $X_i$ indexed over $\mathbb{N}$ together with functions $r^X_n : X_{n+1} \to X_n$. These are called restriction functions. We will write simply $r_n$ if X is clear from the context. Moreover, if $x \in X_i$ and $j \le i$ we will write $x{\upharpoonright}_j$ for the element $r_j(\cdots(r_{i-1}(x))\cdots) \in X_j$. Morphisms X → Y are families of functions $\alpha_n : X_n \to Y_n$ commuting with restriction functions in the sense of $r^Y_n \circ \alpha_{n+1} = \alpha_n \circ r^X_n$. One can see the restriction function $r_n : X_{n+1} \to X_n$ as mapping elements of $X_{n+1}$ to their approximations at time n.
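Semantics of types can be found in Figure 8, where $G(\llbracket A \rrbracket)$ consists of sequences $\{x_n\}_{n \in \mathbb{N}}$ such that $x_i \in \llbracket A \rrbracket_i$ and $r_i(x_{i+1}) = x_i$ for all i, i.e., $G(\llbracket A \rrbracket)$ is the set of so-called global sections of $\llbracket A \rrbracket$.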
The semantics of a dual context ∆ | Γ is given as the product of types in ∆ and Γ , except that we implicitly add □ in front of every type in ∆. In the particular case when both contexts are empty, the semantics of the dual context corresponds to the terminal object 1, which is the singleton set { * } at each stage. A term in context ∆ | Γ ⊢ t : τ is interpreted as a family of functions $\llbracket t \rrbracket_n : \llbracket ∆ \mid Γ \rrbracket_n \to \llbracket τ \rrbracket_n$ commuting with restriction functions of $\llbracket ∆ \mid Γ \rrbracket$ and $\llbracket τ \rrbracket$. The semantics of products, coproducts, and natural numbers is pointwise as in sets, so we omit it. The cases for the other constructs are in Figure 9, where munit and mlet are the standard unit and bind operations on discrete probabilities. The functions π 0 and π 1 are the first and second projections, respectively.
The denotational semantics validates the following equational theory in addition to the standard equational theory of the simply typed lambda calculus with sums and natural numbers.

Rules for fixed points, always modality and streams
hd (x :: xs) ≡ x
tl (x :: xs) ≡ xs
hd t :: tl t ≡ t

Rules for delayed substitutions
Monad laws for distributions

In particular, notice that fix does not reduce as usual, but instead the whole term is delayed before the substitution is performed.

B.3 Logical judgements
The cases for the semantics of the judgement ∆ | Γ ⊢ φ of the non-probabilistic fragment are as follows (we omit writing the contexts if they are clear):

C Additional background
One consequence of Strassen's theorem is that couplings are closed under convex combinations.
Lemma 3 (Convex combinations of couplings). Let $(\mu_i)_{i \in I}$ and $(\nu_i)_{i \in I}$ be two families of distributions on $C_1$ and $C_2$ respectively, and let $(p_i)_{i \in I} \in [0, 1]$ be such that $\sum_{i \in I} p_i = 1$. If $\diamond_{\mu_i, \nu_i}.R$ for all $i \in I$, then $\diamond_{(\sum_{i \in I} p_i \mu_i), (\sum_{i \in I} p_i \nu_i)}.R$, where the convex combination $\sum_{i \in I} p_i \mu_i$ is defined by the clause $(\sum_{i \in I} p_i \mu_i)(x) = \sum_{i \in I} p_i \mu_i(x)$. One obtains an asymmetric version of the lemma by observing that if $\mu_i = \mu$ for every $i \in I$, then $\sum_{i \in I} p_i \mu_i = \mu$.
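The lemma can be checked on a small instance. The sketch below assumes the usual reading of ⋄: a coupling of µ and ν for R is a joint distribution whose marginals are µ and ν and whose support lies in R. All function names and the concrete distributions are hypothetical choices for this example, with R taken to be "same parity".

```python
from fractions import Fraction


def marginals(coupling):
    """Left and right marginals of a joint distribution on pairs."""
    left, right = {}, {}
    for (x, y), p in coupling.items():
        left[x] = left.get(x, Fraction(0)) + p
        right[y] = right.get(y, Fraction(0)) + p
    return left, right


def convex(weights, dists):
    """Convex combination sum_i p_i * d_i of distributions."""
    out = {}
    for w, d in zip(weights, dists):
        for k, p in d.items():
            out[k] = out.get(k, Fraction(0)) + w * p
    return out


def is_coupling(c, mu, nu, rel):
    """c has marginals mu and nu, and its support is contained in rel."""
    left, right = marginals(c)
    in_rel = all(rel(x, y) for (x, y), p in c.items() if p > 0)
    return left == mu and right == nu and in_rel


half = Fraction(1, 2)
same_parity = lambda x, y: (x - y) % 2 == 0
mu1, nu1 = {0: half, 2: half}, {0: Fraction(1)}
c1 = {(0, 0): half, (2, 0): half}
mu2, nu2 = {1: Fraction(1)}, {1: half, 3: half}
c2 = {(1, 1): half, (1, 3): half}
weights = [Fraction(1, 3), Fraction(2, 3)]
```

Mixing the two couplings with the same weights as the distributions yields a coupling of the mixtures for the same relation, as the lemma states.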
One can also show that couplings are closed under relation composition.
We can also add by [SUB] the same substitutions to the ⊲ in the conclusion, since the substituted variables do not appear in the formula. Then we can apply the [Next] rule, which has the premises: The first four can be proven simply by instantiating and then delaying one of the axioms. The last one is proven by applying [App] three times. This concludes the proof.

E.3 Proof of Cassini's identity
We continue building on the idea from the previous example of using streams to represent series of numbers. This time, we prove a classical identity of the Fibonacci sequence. Since the example requires observing the stream at different times, we will also have to deal with some asynchronicity on the delayed substitutions.
Let $F_n$ be the nth Fibonacci number. Cassini's identity states that $F_{n-1} \cdot F_{n+1} - F_n^2 = (-1)^n$. Cassini's identity can be stated as a stream problem as follows. First, let F be the Fibonacci stream (1, 1, 2, 3, 5, . . .) and A be the stream (1, −1, 1, −1, . . .). Let ⊕ and ⊗ be infix functions that add and multiply two streams pointwise. Cassini's identity can then be informally written as: In order to formalize Cassini's identity in our system, we first define: Then we define F and A as the fixpoints of the equations: A ≜ fix A. 1 :: ⊲(−1 :: A) We prove (using prefix notation for ⊕ and ⊗): The proof combines applications of two-sided rules and one-sided rules; in particular, we use the rule [NEXT-L] to proceed with the proof for a judgement where the left expression is delayed twice and the right expression is delayed once.
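The stream form of Cassini's identity can be tested on finite prefixes. The sketch below assumes one natural reading of the statement, namely that F ⊗ tl(tl F) equals (tl F ⊗ tl F) ⊕ A pointwise; this phrasing, and the generator names, are assumptions of the example rather than the formal judgement proved in the text.

```python
from itertools import islice


def fib():
    """The Fibonacci stream 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b


def alternating():
    """The stream 1, -1, 1, -1, ..."""
    s = 1
    while True:
        yield s
        s = -s


def cassini_sides():
    """Yield pairs (F * tl(tl F), tl F * tl F + A) position by position."""
    f0 = fib()
    f1 = islice(fib(), 1, None)  # tl F
    f2 = islice(fib(), 2, None)  # tl (tl F)
    for x, y, z, a in zip(f0, f1, f2, alternating()):
        yield x * z, y * y + a
```

Note that the left-hand side looks two positions ahead in F while the right-hand side looks only one position ahead, which is the source of the asynchronous delays discussed in the proof.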
By conversion, in the logic we can prove the following equalities: Using these equalities, and desugaring the applications, the judgment we want to prove is (omitting constant contexts): Notice that on the left, since we want to apply tail twice to F , we need to delay the term twice so that F and tl tl F have the same type. On the right, we just need to delay the term once. As for the logical conclusion, r 1 needs to be delayed twice, while r 2 only once. The way to do this is by having r 1 appear on the two substitutions but r 2 only on the inner one.
We start by applying [NEXT-L], which has the two following premises: The first premise is trivial. We continue by applying [NEXT] to the second, which has the following premises: Again, the first premise is trivial. We apply [APP] twice to the second, and we have to prove: The first two premises are easy to prove. We will show how to prove the last one. For this, we need a stronger induction hypothesis for ⊕ and ⊗. We propose the following: ∀g 1 , g 2 , b 1 , G, B. G = g 1 :: g 2 :: (G ⊕ (tl G)) ∧ b 1 = g 2 We then use the [SUB] rule to strengthen the inductive hypothesis, and now the new judgement to prove is: Let Γ ′ , Ψ ′ and Φ IH denote respectively the typing context, logical context and logical conclusion of the previous judgement. The premise of the [FIX] rule is: Let Φ E denote the existential clause in Φ IH . After applying [ABS] twice, we have: And then we apply [Cons] to prove equality on the heads and the tails: To prove the first one we notice that $\mathrm{hd}\,X_1 \cdot \mathrm{hd}\,Y_1 = g_1 (g_1 + g_2) = g_1^2 + g_2 g_1 = g_2^2 + g_1^2 + g_1 g_2 - g_2^2 = \mathrm{hd}\,X_2 \cdot \mathrm{hd}\,Y_2$. To prove the second one we need to check that tl X 1 , tl Y 1 , tl X 2 , tl Y 2 satisfy the precondition of the inductive hypothesis. In particular, we need to check that which can be proven by arithmetic computation.

F Unary fragment
In this section we introduce a unary system to prove properties about a single term of the guarded lambda calculus. We start by adding some definitions to Guarded HOL for the unary diamond monad, followed by the derivation rules for both the non-probabilistic and the probabilistic system, plus the metatheory and an example.

F.1 Unary fragment of GHOL
The unary semantics of the diamond monad are: The rules are in Figure 10.

F.2 Guarded UHOL
We start by defining the Guarded UHOL system, which allows us to prove logical properties of a term of the Guarded Lambda Calculus. More concretely, judgements have the form ∆ | Σ | Γ | Ψ ⊢ t : σ | φ, where t is a term well-typed in the dual context ∆ | Γ and φ is a logical formula well-typed in the context ∆ | Γ, r : σ and that can refer to t via the special variable r. The logical contexts Σ and Ψ consist respectively of refinements over the contexts ∆ and Γ .

F.3 Derivation rules
The rule [Next] corresponds to the introduction of the later modality. A refinement Φ i is proven on every term in the substitution, and using those as a premise, a refinement Φ is proven on t. In the notation ⊲ [r ← r] .Φ the first r is the variable bound by the delayed substitution inside Φ while the second r is the distinguished variable in the refinement that refers to the term that is being typed. The rules for hd and tl respectively prove a property on the head and the tail of a stream from a property on the full stream.

The intended meaning for a judgment ∆ | Σ | Γ | Ψ ⊢ t : τ | φ is: "For all valuations δ, γ of ∆ and Γ , $\llbracket ∆ \mid Γ ⊢ Σ \rrbracket(δ, γ) \wedge \llbracket ∆ \mid Γ ⊢ Ψ \rrbracket(δ, γ) \Rightarrow \llbracket ∆ \mid Γ, r : τ ⊢ φ \rrbracket(δ, γ, \llbracket ∆ \mid Γ ⊢ t \rrbracket(δ, γ))$".

F.4 Metatheory
We now review the most interesting metatheoretical properties of Guarded UHOL. In particular, Guarded UHOL is equivalent to Guarded HOL: Theorem 4 (Equivalence with Guarded HOL). For all contexts ∆, Γ , types σ, terms t, sets of assertions Σ, Ψ and assertions φ, the following are equivalent: The proof is analogous to the relational case. The previous result allows us to lift the soundness result from Guarded HOL to Guarded UHOL. Finally, we prove an embedding lemma for Guarded UHOL. The proof can be carried out by induction on the structure of derivations, or using the equivalence between Guarded UHOL and Guarded HOL (Theorem 4).

F.6 Unary example: Every two
We define the every2 function, which receives a stream and returns another stream consisting of the elements at even positions in the input stream. Note that this function, while productive, cannot be given the type Str → Str, since we need to take the tail of the argument twice, which yields a term of type ⊲ ⊲ Str, from which a Str cannot be built. Instead, we need to use the constant modality as follows: every2 : □Str → Str, every2 ≜ fix every2. λs. ĥd(tl s) :: (every2 ⊛ next(tl(tl s))), where the ĥd and t̂l functions are not the native ones, but rather they are defined as: Premise (1) is a consequence of the properties of ones. To prove premise (3) we reduce the letbox with the box inside ones, and do some reasoning using the definition of the fixpoint. To prove premise (2) we first desugar the term we are typing: every2 ⊛ ⊲(tl(tl s)) ≜ ⊲ [g ← every2, t ← ⊲(tl(tl s))] . g t and then we apply [Next], which has the following premises: The first premise is just an application of the [Var] rule. The second premise can be proven as a consequence of the properties of ones. Finally, the third premise can be proven with some simple logical reasoning in HOL. This concludes the proof.
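The computational content of every2 can be sketched with Python iterators. This is only an analogy: a Python iterator gives access to the whole stream at once, which loosely corresponds to a constant (□) stream, and nothing here enforces guardedness. Following the definition above, the output starts with hd(tl s) and then continues on tl(tl s).

```python
from itertools import count, islice, repeat


def every2(s):
    """Yield every second element of s, starting with the second one."""
    it = iter(s)
    while True:
        try:
            next(it)        # drop hd s
            yield next(it)  # emit hd (tl s)
        except StopIteration:
            return          # finite input exhausted


# The constant stream of ones is mapped to itself, which is the kind of
# property established for every2 in the unary example.
ones_prefix = list(islice(every2(repeat(1)), 5))
```

On the naturals 0, 1, 2, ... the sketch produces 1, 3, 5, ..., i.e., the elements at even positions when positions are counted from 1.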