Automatic Alignment in Higher-Order Probabilistic Programming Languages

Probabilistic Programming Languages (PPLs) allow users to encode statistical inference problems and automatically apply an inference algorithm to solve them. Popular inference algorithms for PPLs, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC), are built around checkpoints -- relevant events for the inference algorithm during the execution of a probabilistic program. Deciding the location of checkpoints is, in current PPLs, not done optimally. To solve this problem, we present a static analysis technique that automatically determines checkpoints in programs, relieving PPL users of this task. The analysis identifies a set of checkpoints that execute in the same order in every program run -- they are aligned. We formalize alignment, prove the correctness of the analysis, and implement the analysis as part of the higher-order functional PPL Miking CorePPL. By utilizing the alignment analysis, we design two novel inference algorithm variants: aligned SMC and aligned lightweight MCMC. We show, through real-world experiments, that they significantly improve inference execution time and accuracy compared to standard PPL versions of SMC and MCMC.

Sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) are general-purpose families of inference algorithms often used for PPL implementations. These algorithms share the concept of checkpoints: relevant execution events for the inference algorithm. For SMC, the checkpoints are likelihood updates [47,14] and determine the resampling of executions. Alternatively, users must sometimes manually annotate or write the probabilistic program in a certain way to make resampling explicit [25,30]. For MCMC, checkpoints are instead random draws, which allow the inference algorithm to manipulate these draws to construct a Markov chain over program executions [46,37]. When designing SMC and MCMC algorithms for universal PPLs, both the placement and handling of checkpoints are critical to making the inference both efficient and accurate.
For SMC, a standard inference approach is to resample at all likelihood updates [14,47]. This approach produces correct results asymptotically [24] but is highly problematic for certain models [38]. Such models require non-trivial and SMC-specific manual program rewrites to force good resampling locations and make SMC tractable. Overall, choosing the likelihood updates at which to resample significantly affects SMC execution time and accuracy.
For MCMC, a standard approach for inference in universal PPLs is lightweight MCMC [46], which constructs a Markov chain over random draws in programs.
The key idea is to use an addressing transformation and a runtime database of random draws. Specifically, the database enables matching and reusing random draws between executions according to their stack traces, even though any particular random draw may or may not occur in a given execution due to randomness. However, the dynamic approach of looking up random draws in the database through their stack traces is expensive and introduces significant runtime overhead.
To overcome the SMC and MCMC problems in universal PPLs, we present a static analysis technique for higher-order functional PPLs that automatically determines checkpoints in a probabilistic program that always occur in the same order in every program execution-they are aligned. We formally define alignment, formalize the alignment analysis, and prove the soundness of the analysis with respect to the alignment definition. The novelty and challenge in developing the static analysis technique is to capture alignment properties through the identification of expressions in programs that may evaluate to stochastic values and expressions that may evaluate due to stochastic branching. Stochastic branching results from if expressions with stochastic values as conditions or function applications where the function itself is stochastic. Stochastic values and branches pose a significant challenge when proving the soundness of the analysis.
We design two new inference algorithms that improve accuracy and execution time compared to current approaches. Unlike the standard SMC algorithm for PPLs [47,14], aligned SMC only resamples at aligned likelihood updates. Resampling only at aligned likelihood updates guarantees that each SMC execution resamples the same number of times, which makes expensive global termination checks redundant [25]. We evaluate aligned SMC on two diversification models from Ronquist et al. [38] and a state-space model for aircraft localization, demonstrating significantly improved inference accuracy and execution time compared to traditional SMC. Both models-constant rate birth-death (CRBD) and cladogenetic diversification rate shift (ClaDS)-are used in real-world settings and are of considerable interest to evolutionary biologists [32,27]. The documentation of both Anglican [47] and Turing [12] acknowledges the importance of alignment for SMC and states that all likelihood updates must be aligned. However, Turing and Anglican neither formalize nor enforce this property-it is up to the users to manually guarantee it, often requiring non-standard program rewrites [38].
We also design aligned lightweight MCMC, a new version of lightweight MCMC [46]. Aligned lightweight MCMC constructs a Markov chain over the program using the aligned random draws as synchronization points to match and reuse aligned random draws and a subset of unaligned draws between executions. Aligned lightweight MCMC does not require a runtime database of random draws and therefore reduces runtime overhead. We evaluate aligned lightweight MCMC for latent Dirichlet allocation (LDA) [5] and CRBD [38], demonstrating significantly reduced execution times and no decrease in inference accuracy. Furthermore, automatic alignment is orthogonal to and easily combines with the lightweight MCMC optimizations introduced by Ritchie et al. [37].
We implement the analysis, aligned SMC, and aligned lightweight MCMC in Miking CorePPL [25,7]. In addition to analyzing stochastic if-branching, the implementation analyzes stochastic branching at a standard pattern-matching construct. Compared to if expressions, the pattern-matching construct requires a more sophisticated analysis of the pattern and the value matched against it to determine if the pattern-matching causes a stochastic branch.
In summary, we make the following contributions.
-We invent and formalize alignment for PPLs. Aligned parts of a program occur in the same order in every execution (Section 4.1).
-We formalize and prove the soundness of a novel static analysis technique that determines stochastic value flow and stochastic branching, and in turn alignment, in higher-order probabilistic programs (Section 4.2).
-We design aligned SMC inference that only resamples at aligned likelihood updates, improving execution time and inference accuracy (Section 5.1).
-We design aligned lightweight MCMC inference that only reuses aligned random draws, improving execution time (Section 5.2).

Aligned SMC
Likelihood weighting can only handle the simplest of programs. In Fig. 1a, a problem with likelihood weighting is that we assign the weight 0 to many executions at line 8. These executions contribute nothing to the final distribution. SMC solves this by executing many program instances concurrently and occasionally resampling them (with replacement) based on their current likelihoods. Resampling discards executions with lower weights (in the worst case, 0) and replaces them with executions with higher weights. The most common approach in popular PPLs is to resample just after likelihood updates (i.e., calls to weight).
Resampling at all calls to weight in Fig. 1a is suboptimal. The best option is instead to only resample at line 12. This is because executions encounter lines 5 and 8 a random number of times due to the stochastic branch at line 3, while they encounter line 12 a fixed number of times. As a result of resampling at lines 5 and 8, executions become unaligned; in each resampling, executions can have reached either line 5, line 8, or line 12. On the other hand, if we resample only at line 12, all executions will always have reached line 12 for the same iteration of iter in every resampling. Intuitively, this is a sensible approach since, when resampling, executions have progressed the same distance through the program. We say that the weight at line 12 is aligned, and resampling only at aligned weights results in our new inference approach called aligned SMC. Fig. 1d visualizes the weight alignment for two sample executions of Fig. 1a.

Aligned Lightweight MCMC
Another improvement over likelihood weighting is to construct a Markov chain over program executions. It is beneficial to propose new executions in the Markov chain by making small, rather than large, modifications to the previous execution. The lightweight MCMC [46] algorithm does this by redrawing a single random draw in the previous execution, and then reusing as many other random draws as possible. Random draws in the current and previous executions match through stack traces-the sequences of applications leading up to a random draw. Consider the random draw at line 13 in Fig. 1a. It is called exactly three times in every execution. If we identify applications and assumes by line numbers, we get the stack traces [17,13], [17,15,13], and [17,15,15,13] for these three assumes in every execution. Consequently, lightweight MCMC can reuse these draws by storing them in a database indexed by stack traces.
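The database mechanism can be sketched in a few lines. The class and its API below are a hypothetical illustration of the idea, not the implementation from [46]:

```python
import random

class TraceDB:
    """Toy sketch of the lightweight MCMC draw database: random draws are
    stored and reused keyed by their stack traces, encoded here as tuples
    of call-site labels. Hypothetical API, for illustration only."""

    def __init__(self, prev=None):
        self.prev = dict(prev or {})  # draws from the previous execution
        self.curr = {}                # draws recorded by this execution

    def assume(self, trace, sample):
        # Reuse the previous draw if the same stack trace occurred before;
        # otherwise draw fresh. Either way, record it for the next run.
        val = self.prev[trace] if trace in self.prev else sample()
        self.curr[trace] = val
        return val

# The three draws at line 13 of Fig. 1a have stack traces
# [17,13], [17,15,13], and [17,15,15,13] in every execution.
traces = [(17, 13), (17, 15, 13), (17, 15, 15, 13)]
db1 = TraceDB()
first = [db1.assume(t, random.random) for t in traces]
db2 = TraceDB(prev=db1.curr)  # a new execution reuses draws via traces
second = [db2.assume(t, random.random) for t in traces]
# first == second: all three draws are reused
```

The lookup cost and the bookkeeping of trace keys on every draw are the runtime overhead that the next paragraphs discuss.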
The stack trace indexing in lightweight MCMC is overly complicated when reusing aligned random draws. Note that the assumes at lines 1 and 13 in Fig. 1a are aligned, while the assume at line 4 is unaligned. Fig. 1e visualizes the assume alignment for two sample executions of Fig. 1a. Aligned random draws occur in the same order in every execution, and are therefore trivial to match and reuse between executions through indexing by counting. The appeal of stack trace indexing is that it additionally allows reusing a subset of unaligned draws.
A key insight in this paper is that aligned random draws can also act as synchronization points in the program to allow reusing unaligned draws without a stack trace database. After an aligned draw, we reuse unaligned draws occurring up until the next aligned draw, as long as they syntactically originate at the same assume as the corresponding unaligned draws in the previous execution. As soon as an unaligned draw does not originate from the same assume as in the previous execution, we redraw all remaining unaligned draws up until the next aligned draw. Instead of a trace-indexed database, this approach requires storing a list of unaligned draws (tagged with identifiers of the assumes at which they originated) for each execution segment in between aligned random draws. For example, for the execution s 1 in Fig. 1e, we store lists of unaligned Bernoulli random draws from line 4 for each execution segment in between the three aligned random draws at line 13. If a Poisson random draw n at line 13 does not change or decreases, we can reuse the stored unaligned Bernoulli draws up until the next Poisson random draw, because survives then executes n or fewer times. If the drawn n instead increases to n ′ , we can again reuse all stored Bernoulli draws, but must supplement them with new Bernoulli draws to reach n ′ draws in total.
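The reuse rule for one segment between two aligned draws can be sketched as follows; the identifiers and the function API are our own hypothetical illustration, not the paper's algorithm:

```python
def reuse_unaligned(prev_segment, curr_ids, sample_fresh):
    """Sketch of unaligned-draw reuse between two aligned synchronization
    points. prev_segment: list of (assume_id, value) pairs recorded by the
    previous execution for this segment. curr_ids: the assume identifiers
    the current execution encounters in the same segment. Draws are reused
    while they originate at the same assume; from the first mismatch on,
    everything remaining is redrawn with sample_fresh."""
    out, reusing = [], True
    for i, aid in enumerate(curr_ids):
        if reusing and i < len(prev_segment) and prev_segment[i][0] == aid:
            out.append(prev_segment[i][1])  # same origin: reuse the value
        else:
            reusing = False                 # mismatch: redraw from here on
            out.append(sample_fresh(aid))
    return out

# Previous segment: three Bernoulli draws from the assume labeled 'b4'.
prev = [('b4', 1), ('b4', 0), ('b4', 1)]
fresh = lambda aid: 9  # stand-in for a real random draw

reuse_unaligned(prev, ['b4', 'b4'], fresh)   # n decreased: [1, 0]
reuse_unaligned(prev, ['b4'] * 4, fresh)     # n increased: [1, 0, 1, 9]
reuse_unaligned(prev, ['c2', 'b4'], fresh)   # mismatch: [9, 9]
```

The three calls mirror the Poisson example in the text: a smaller n reuses a prefix, a larger n ′ reuses everything and supplements, and a different assume origin forces redrawing.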
As we show in Section 7, using aligned draws as synchronization points works very well in practice and avoids the runtime overhead of the lightweight MCMC database. However, manually identifying aligned parts of programs and rewriting them so that inference can make use of alignment is, if even possible, tedious, error-prone, and impractical for large programs. This paper presents an automated approach to identifying aligned parts of programs. Combining static alignment analysis and using aligned random draws as synchronization points form the key ideas of the new algorithm that we call aligned lightweight MCMC.

Syntax and Semantics
In preparation for the alignment analysis in Section 4, we require an idealized base calculus capturing the key features of expressive PPLs. This section introduces such a calculus with a formal syntax (Section 3.1) and semantics (Section 3.2). We assume a basic understanding of the lambda calculus (see, e.g., Pierce [36] for a complete introduction). Section 6 further describes extending the idealized calculus and the analysis in Section 4 to a full-featured PPL.

Syntax
We use the untyped lambda calculus as the base for our calculus. We also add let expressions for convenience, and if expressions to allow intrinsic booleans to affect control flow. The calculus is a subset of the language used in Fig. 1a. We inductively define terms t and values v as follows.

Definition 1 (Terms and values).

t ::= x | c | λx. t | t t | let x = t in t | if t then t else t | assume t | weight t
v ::= c | ⟨λx. t, ρ⟩    (1)

Here, x ∈ X, where X is a countable set of variable names, c ∈ C, where C is a set of intrinsic values and operations, and D ⊂ C is a set of probability distributions. The set P contains all evaluation environments ρ, that is, partial functions mapping names in X to values v. We use T and V to denote the set of all terms and values, respectively.
Values v are intrinsics or closures, where closures are abstractions with an environment binding free variables in the abstraction body. We require that C include booleans, the unit value (), and real numbers. The reason is that weight takes a real number as its argument and returns (), and that if expression conditions are booleans. Furthermore, probability distributions are often over booleans and real numbers. For example, we can include the normal distribution constructor N ∈ C, which takes real numbers as arguments and produces normal distributions over real numbers; for instance, N 0 1 ∈ D is the standard normal distribution. We often write functions in C in infix position or with standard function application syntax for readability. For example, 1 + 2 with + ∈ C means + 1 2, and N (0, 1) means N 0 1. Additionally, we use the shorthand t 1 ; t 2 for let _ = t 1 in t 2 , where _ is the do-not-care symbol. That is, t 1 ; t 2 evaluates t 1 for side effects only before evaluating t 2 . Finally, the untyped lambda calculus supports recursion through fixed-point combinators. We encapsulate this in the shorthand let rec f = λx.t 1 in t 2 to conveniently define recursive functions.
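As a side note on the let rec shorthand, recursion via a fixed-point combinator can be illustrated in Python. This is a hypothetical rendering for intuition only; the actual desugaring in the calculus may use a different combinator:

```python
# The call-by-value fixed-point combinator Z. Applying Z to a functional f
# yields a recursive function without the function naming itself.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# "let rec fact = λn. ..." encoded without fact appearing in its own body:
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
# fact(5) == 120
```

The eta-expansion (lambda v: x(x)(v)) is what makes the combinator safe under call-by-value evaluation, which is also the evaluation order of the calculus here.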
The assume and weight constructs are PPL-specific. We define random variables from intrinsic probability distributions with assume (also known as sample in PPLs with sampling-based inference). For example, the term let x = assume N (0, 1) in t defines x as a random variable with a standard normal distribution in t. Boolean random variables combined with if expressions result in stochastic branching-causing the alignment problem. Lastly, weight (also known as factor or score) is a standard construct for likelihood updating (see, e.g., Borgström et al. [6]). Next, we illustrate and formalize a semantics for (1).

[Fig. 2, from [25], illustrates (1): (a) gives the program, and (b) the corresponding probability distributions. In (b), the y-axis gives the probability and the x-axis the outcome (the number of coin flips). The upper part of (b) excludes the shaded weight at line 4 in (a).]

Semantics
Consider the small probabilistic program t geo ∈ T in Fig. 2a. The program encodes the standard geometric distribution via a function geometric, which recursively flips a fair coin (a Bernoulli(0.5) distribution) at line 2 until the outcome is false (i.e., tails). At that point, the program returns the total number of coin flips, including the last tails flip. The upper part of Fig. 2b illustrates the result distribution for an infinite number of program runs with line 4 ignored.
To illustrate the effect of weight, consider t geo with line 4 included. This weight modifies the likelihood with a factor 1.5 each time the flip outcome is true (or, heads). Intuitively, this emphasizes larger return values, illustrated in the lower part of Fig. 2b. Specifically, the (unnormalized) probability of seeing n coin flips is 0.5 n · 1.5 n−1 , compared to 0.5 n for the unweighted version. The factor 1.5 n−1 is the result of the calls to weight.
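The unnormalized masses above are straightforward to tabulate; a small sketch (our own illustration, not part of the paper's semantics):

```python
def geo_mass(n, weighted=True):
    """Unnormalized probability that t_geo returns n, i.e., that the coin
    shows heads n-1 times and then tails once. With line 4 included, each
    heads outcome multiplies the likelihood by 1.5."""
    mass = 0.5 ** n              # density of the n Bernoulli(0.5) draws
    if weighted:
        mass *= 1.5 ** (n - 1)   # the calls to weight on the n-1 heads
    return mass

geo_mass(1)                  # 0.5 (no heads, so weight never fires)
geo_mass(2)                  # 0.5**2 * 1.5 = 0.375
geo_mass(2, weighted=False)  # 0.25
```

Comparing the two columns for growing n shows the emphasis on larger return values: the weighted mass decays by a factor 0.75 per extra flip instead of 0.5.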
We now introduce a big-step operational semantics for single runs of programs t. Such a semantics is essential to formalize the probability distributions encoded by probabilistic programs (e.g., Fig. 2b for Fig. 2a) and to prove the correctness of PPL inference algorithms. For example, Borgström et al. [6] define a PPL calculus and semantics similar to this paper's and formally prove the correctness of an MCMC algorithm. Another example is Lundén et al. [24], who also define a similar calculus and semantics and prove the correctness of PPL SMC algorithms. In particular, the correctness of our aligned SMC algorithm (Section 5.1) follows from this proof. The purpose of the semantics in this paper is to formalize alignment and prove the soundness of our analysis in Section 4. We use a big-step semantics as the finer granularity of a small-step semantics is redundant for our purposes. We begin with a definition for intrinsics. We use · to denote multiplication.
Definition 3 (Traces). The set S of traces is the set such that, for all s ∈ S, s is a sequence of intrinsics from C with arity 0.
In the following, we use the notation [c 1 , c 2 , . . . , c n ] for sequences and an infix operator for sequence concatenation. Fig. 3 presents the semantics as a relation ρ ⊢ t ⇓ v over P × T × S × R × L × V, where each evaluation additionally carries a trace s ∈ S, a weight w ∈ R, and a binding sequence l ∈ L. L is the set of sequences over X, i.e., sequences of names. For example, [x, y, z] ∈ L, where x, y, z ∈ X. We use l ∈ L to track the sequence of let-bindings during evaluation. For example, evaluating let x = 1 in let y = 2 in x + y results in l = [x, y]. In Section 4, we use the sequence of encountered let-bindings to define alignment. For simplicity, from now on we assume that bound variables are always unique (i.e., variable shadowing is impossible).
It is helpful to think of ρ, t, and s as the input to ⇓, and w, l, and v as the output. In the environment ρ and with trace s, the term t evaluates to v, encountering the sequence of let bindings l and accumulating the weight w. The trace s is the sequence of all random draws, and each random draw in (Assume) consumes precisely one element of s. The rule (Let) tracks the sequence of bindings by adding x at the correct position in l. The number w is the likelihood of the execution: the probability density of all draws in the program, registered at (Assume), combined with direct likelihood modifications, registered at (Weight). The remaining aspects of the semantics are standard (see, e.g., Kahn [20]). As an example of the semantics, consider the particular execution of t geo that makes three recursive calls. Next, we formalize and apply the alignment analysis to (1).
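To make the bookkeeping of the trace s, the weight w, and the binding sequence l concrete, here is a toy evaluator for a small fragment of the calculus. The tuple-based term encoding and the Bernoulli-only assume are our own simplifying assumptions, not the semantics of Fig. 3:

```python
def eval_t(env, t, s):
    """Toy big-step evaluation in the spirit of the relation above.
    Returns (value, weight, l, remaining trace)."""
    kind = t[0]
    if kind == 'const':
        return t[1], 1.0, [], s
    if kind == 'var':
        return env[t[1]], 1.0, [], s
    if kind == 'let':                      # rule (Let): record x in l
        _, x, t1, t2 = t
        v1, w1, l1, s = eval_t(env, t1, s)
        v2, w2, l2, s = eval_t({**env, x: v1}, t2, s)
        return v2, w1 * w2, l1 + [x] + l2, s
    if kind == 'if':
        _, c, tt, te = t
        vc, wc, lc, s = eval_t(env, c, s)
        v, w, l, s = eval_t(env, tt if vc else te, s)
        return v, wc * w, lc + l, s
    if kind == 'assume':                   # consumes one element of s and
        p = t[1]                           # weights by its density
        outcome, s = s[0], s[1:]
        return outcome, p if outcome else 1.0 - p, [], s
    if kind == 'weight':                   # direct likelihood modification
        return (), t[1], [], s
    raise ValueError(kind)

# let x = assume Bernoulli(0.5) in let y = weight 1.5 in x, trace [True]
prog = ('let', 'x', ('assume', 0.5),
        ('let', 'y', ('weight', 1.5), ('var', 'x')))
v, w, l, s_rest = eval_t({}, prog, [True])
# v == True, w == 0.5 * 1.5 == 0.75, l == ['x', 'y'], s_rest == []
```

The returned l is exactly the let-binding sequence that Section 4 restricts to define alignment.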

Alignment Analysis
This section presents the main contribution of this paper: automatic alignment in PPLs. Section 4.1 introduces A-normal form and gives a precise definition of alignment. Section 4.2 formalizes and proves the correctness of the alignment analysis. Lastly, Section 4.3 discusses a dynamic version of alignment.

A-Normal Form and Alignment
To reason about all subterms t ′ of a program t and to enable the analysis in Section 4.2, we need to uniquely label all subterms. A straightforward approach is to use variable names within the program itself as labels (remember that we assume bound variables are always unique). This leads us to the standard A-normal form (ANF) representation of programs [11].
We use T ANF to denote the set of all terms t ANF . Unlike t ∈ T , t ANF ∈ T ANF enforces that a variable bound by a let labels each subterm in the program. Furthermore, we can automatically transform any program in T to a semantically equivalent T ANF program, and T ANF ⊂ T . Therefore, we assume in the remainder of the paper that all terms are in ANF.

Given the importance of alignment in universal PPLs, it is somewhat surprising that there are no previous attempts to give a formal definition of its meaning. Here, we give a first such formal definition, but before defining alignment, we require a way to restrict, or filter, sequences.
Definition 5 (Restriction of sequences). For all l ∈ L and Y ⊆ X, l| Y (the restriction of l to Y ) is the subsequence of l with all elements not in Y removed.
Definition 6 (Alignment). For t ∈ T ANF , let X t denote all variables that occur in t. The sets A t ∈ 𝔸 t , A t ⊆ X t , are the largest sets such that, for arbitrary evaluations ∅ ⊢ t ⇓ v 1 and ∅ ⊢ t ⇓ v 2 with binding sequences l 1 and l 2 , it holds that l 1 | At = l 2 | At .

For a given A t , the aligned expressions-expressions bound by a let to a variable name in A t -are those that occur in the same order in every execution, regardless of random draws. We seek the largest sets, as A t = ∅ is always a trivial solution. Assume we have a program with X t = {x, y, z} such that l = [x, y, x, z, x] and l = [x, y, x, z, x, y] are the only possible sequences of let bindings. Then, A t = {x, z} is the only possibility. It is also possible to have multiple choices for A t . For example, if l = [x, y, z] and l = [x, z, y] are the only possibilities, then 𝔸 t = {{x, z}, {x, y}}. Next, assume that we transform the programs in Fig. 2a and Fig. 1a to ANF. The expression labeled by x in Fig. 2a is then clearly not aligned, as random draws determine how many times it executes (l could be, e.g., [x, x] or [x, x, x, x]). Conversely, the expression n (line 13) in Fig. 1a is aligned, as its number and order of evaluations do not depend on any random draws.
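Definition 6 can be explored concretely by brute force over a finite sample of binding sequences. The sketch below is our own illustration (the real definition quantifies over all executions, not a sample) and reproduces the two examples above:

```python
from itertools import combinations

def restrict(l, Y):
    """l|Y: the subsequence of l with all elements not in Y removed."""
    return [x for x in l if x in Y]

def largest_aligned_sets(seqs, names):
    """Among all subsets Y of names, keep those whose restriction is the
    same for every observed binding sequence, and return the largest."""
    ok = [set(c)
          for r in range(len(names) + 1)
          for c in combinations(sorted(names), r)
          if len({tuple(restrict(l, set(c))) for l in seqs}) == 1]
    m = max(len(y) for y in ok)
    return [y for y in ok if len(y) == m]

# The two examples from the text:
largest_aligned_sets([['x', 'y', 'x', 'z', 'x'],
                      ['x', 'y', 'x', 'z', 'x', 'y']],
                     {'x', 'y', 'z'})   # [{'x', 'z'}]
largest_aligned_sets([['x', 'y', 'z'], ['x', 'z', 'y']],
                     {'x', 'y', 'z'})   # [{'x', 'y'}, {'x', 'z'}]
```

The exponential enumeration is of course only viable for toy examples; the point of the static analysis in the next section is to approximate an aligned set without enumerating executions at all.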
Definition 6 is context insensitive: for a given A t , each x is either aligned or unaligned. One could also consider a context-sensitive definition of alignment in which x can be aligned in some contexts and unaligned in others. A context could, for example, be the sequence of function applications (i.e., the call stack) leading up to an expression. Considering different contexts for x is complicated and difficult to take full advantage of. We justify the choice of context-insensitive alignment with the real-world models in Section 7, neither of which requires a context-sensitive alignment.
With alignment defined, we now move on to the static alignment analysis.

Alignment Analysis
The basis for the alignment analysis is 0-CFA [33,41]-a static analysis framework for higher-order functional programs. The prefix 0 indicates that 0-CFA is context insensitive. There is also a family of analyses k-CFA [29] that adds increasing amounts (with k ∈ N) of context sensitivity to 0-CFA. We could use such analyses with a context-sensitive version of Definition 6. However, the potential benefit of k-CFA is offset by its worst-case exponential time complexity, already at k = 1. In contrast, the time complexity of 0-CFA is polynomial (cubic in the worst case). The alignment analysis for the models in Section 7 runs instantaneously, justifying that the time complexity is not a problem in practice. The extensions to 0-CFA required to analyze alignment are non-trivial to design, but the resulting formalization is surprisingly simple. The challenge is instead to prove that the extensions correctly capture the alignment property from Definition 6. We extend 0-CFA to analyze stochastic values and alignment in programs t ∈ T ANF . As with most static analyses, our analysis is sound but conservative (i.e., sound but incomplete)-the analysis may mark aligned expressions of programs as unaligned, but not vice versa. That the analysis is conservative does not degrade the alignment analysis results for any model in Section 7, which justifies the approach.

Fig. 4 gives the contrived but illustrative example program t example (lines 1-11 shown):

1  let n1 = ¬ in let n2 = ¬ in
2  let one = 1 in
3  let half = 0.5 in let c = true in
4  let f1 = λx1. let t1 = weight one in x1 in
5  let f2 = λx2. let t2 = weight one in t2 in
6  let f3 = λx3. let t3 = weight one in t3 in
7  let f4 = λx4. let t4 = weight one in t4 in
8  let bern = Bernoulli in
9  let d1 = bern half in
10 let a1 = assume d1 in
11 let v1 = f1 one in ...

We divide the formal analysis into two algorithms. Algorithm 1 generates constraints for t that a valid analysis solution must satisfy. This section describes Algorithm 1 and the generated constraints.
Appendix B.1 provides the second algorithm, Algorithm 4, that computes a solution satisfying the generated constraints. We provide examples of applying Algorithm 4 here, but defer the complete description to Appendix B.1.
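To give a flavor of the propagation that a constraint solver like Algorithm 4 performs, the following sketch iterates simple subset and conditional rules to a fixed point. It is a hypothetical illustration in the style of 0-CFA solving, not the actual Algorithm 4:

```python
def propagate(rules, S, unaligned):
    """Naive fixed-point propagation: re-run every rule until none of them
    changes the state (S, unaligned). Each rule returns True on change."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            changed = rule(S, unaligned) or changed
    return S, unaligned

def subset(src, dst):
    """Constraint S[src] ⊆ S[dst]."""
    def rule(S, unaligned):
        missing = S[src] - S[dst]
        S[dst] |= missing
        return bool(missing)
    return rule

def stoch_cond(cond, branch_vars):
    """Constraint: stoch ∈ S[cond] implies the branch names are unaligned."""
    def rule(S, unaligned):
        if 'stoch' in S[cond] and not branch_vars <= unaligned:
            unaligned |= branch_vars
            return True
        return False
    return rule

# A fragment in the spirit of Fig. 4: a1 = assume ...; c aliases a1;
# an if on c whose branches bind t5.
S = {'a1': {'stoch'}, 'c': set(), 't5': set()}
S, unaligned = propagate([subset('a1', 'c'), stoch_cond('c', {'t5'})],
                         S, set())
# 'stoch' propagates to S['c'], which in turn marks t5 unaligned
```

Because abstract-value sets only grow and there are finitely many abstract values and variables, this iteration always terminates at the least solution.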
For soundness of the analysis, we require that ⟨λx. t, ρ⟩ ∉ C (recall that C is the set of intrinsics). That is, closures are not in C. By Definition 3, this implies that closures are not in the sample space of probability distributions in D and that evaluating intrinsics never produces closures (allowing this would unnecessarily complicate the analysis without any benefit).
In addition to standard 0-CFA constraints, Algorithm 1 generates new constraints for stochastic values and unalignment. We use the contrived but illustrative program in Fig. 4 as an example. Note that, while omitted from Fig. 4 for ease of presentation, the analysis also supports recursion introduced through let rec. Stochastic values are values in the program affected by random variables. Stochastic values initially originate at assume and then propagate through programs via function applications and if expressions. For example, a 1 (line 10) is stochastic because of assume. We subsequently use a 1 to define v 2 via n 1 (line 12), which is then also stochastic. Similarly, a 1 is the condition for the if resulting in f 5 (line 14), and the function f 5 is therefore also stochastic. When we apply f 5 , it results in yet another stochastic value, v 4 (line 18). In conclusion, the stochastic values are a 1 , v 2 , f 5 , and v 4 .
Consider the flow of unalignment in Fig. 4. We mark expressions that may execute due to stochastic branching as unaligned. From our analysis of stochastic values, the program's only stochastic if condition is at line 15, and we determine that all expressions directly within the branches are unaligned. That is, the expression labeled by t 5 is unaligned. Furthermore, we apply the variable f 4 when defining t 5 . Thus, all expressions in bodies of lambdas that flow to f 4 are unaligned. Here, it implies that t 4 is unaligned. Finally, we established that the function f 5 produced at line 15 is stochastic. Due to the application at line 18, all names bound by lets in bodies of lambdas that flow to f 5 are unaligned. Here, it implies that t 2 and t 3 are unaligned. In conclusion, the unaligned expressions are named by t 2 , t 3 , t 4 , and t 5 . Aligned SMC therefore resamples at the weight at t 1 , but not at the weights at t 2 , t 3 , and t 4 .
Consider the program in Fig. 1a again, and assume it is transformed to ANF. The alignment analysis must mark all names bound within the stochastic if at line 3 as unaligned because a stochastic value flows to its condition. In particular, the weight expressions at lines 5 and 8 are unaligned (and the weight at line 12 is aligned). Thus, aligned SMC resamples only at line 12.
To formalize the flow of stochastic values, we define abstract values a ∈ A, that flow within the program, as follows.

a ::= stoch | λx. y | const n
The stoch abstract value is new and represents stochastic values. The λx.y and const n abstract values are standard and represent abstract closures and intrinsics, respectively. For each variable name x in the program, we define a set S x containing abstract values that may occur at x. For example, in Fig. 4, we have stoch ∈ S a1 , (λx 2 .t 2 ) ∈ S f2 , and (const 1) ∈ S n1 . The abstract value λx 2 .t 2 represents all closures originating at λx 2 , and const 1 represents intrinsic functions in C of arity 1 (in our example, ¬). The body of the abstract lambda is the variable name labeling the body, not the body itself. For example, t 2 labels the body let t 2 = one in t 2 of λx 2 . Due to ANF, all terms have a label, which the function name in Algorithm 1 formalizes.
We also define booleans unaligned x that state whether or not the expression labeled by x is unaligned. For example, we previously reasoned that unaligned x = true for x ∈ {t 2 , t 3 , t 4 , t 5 } in Fig. 4. The alignment analysis aims to determine minimal sets S x and boolean assignments of unaligned x for every program variable x ∈ X. A trivial solution is that all abstract values (there is a finite number of them in the program) flow to each program variable and that unaligned x = true for all x ∈ X. This solution is sound but useless. To compute a more precise solution, we follow the rules given by constraints c ∈ R (see Appendix B for a formal definition).
We present the constraints through the generateConstraints function in Algorithm 1, using the example in Fig. 4. There are no constraints for variables that occur at the end of ANF let sequences (line 2 in Algorithm 1); the case for let expressions (lines 3-36) instead produces all constraints. The cases for aliases (line 6), intrinsics (line 7), assume (line 35), and weight (line 36) are the simplest. Aliases of the form let x = y in t 2 establish S y ⊆ S x . That is, all abstract values at y are also in x. Intrinsic operations result in a const abstract value. For example, the definition of n 1 at line 1 in Fig. 4 results in the constraint const 1 ∈ S n1 . Applications of assume are the source of stochastic values. For example, the definition of a 1 at line 10 results in the constraint stoch ∈ S a1 . Note that assume cannot produce any other abstract values, as we only allow distributions over intrinsics with arity 0 (see Definition 3). Finally, we use weight only for its side effect (likelihood updating), and therefore weights do not produce any abstract values and consequently no constraints.

[Algorithm 1: constraint generation function for t ∈ T ANF . We denote the power set of a set E with P(E).]
The cases for abstractions (line 9), applications (line 13), and ifs (line 26) are more complex. The abstraction at line 4 in Fig. 4 generates (omitting the recursively generated constraints for the abstraction body t y ) two constraints. The first constraint is standard: the abstract lambda λx 1 .x 1 flows to S f1 . The second constraint states that if the abstraction is unaligned, all expressions in its body (here, only t 1 ) are unaligned. We define the sets of expressions within abstraction bodies and if branches through the names function in Algorithm 1 (line 43).
The application f 5 one at line 18 in Fig. 4 generates six constraints. The first constraint is standard: if an abstract value λz.y flows to f 5 , the abstract values of one (the right-hand side) flow to z. Furthermore, the result of the application, given by the body name y, must flow to the result v 4 of the application. The second constraint is also relatively standard: if an intrinsic function of arity n is applied, it produces a const of arity n − 1. The other constraints are new and specific to stochastic values and unalignment. The third constraint states that if the function is stochastic, the result is stochastic. The fourth constraint states that if we apply an intrinsic function to a stochastic argument, the result is stochastic. We could also make the analysis of intrinsic applications less conservative through intrinsic-specific constraints. The fifth and sixth constraints state that if the expression (labeled by v 4 ) is unaligned or the function is stochastic, all abstract lambdas that flow to the function are unaligned.
The if resulting in f 5 at line 14 in Fig. 4 generates (omitting the recursively generated constraints for the branches t t and t e ) five constraints. The first two constraints are standard and state that the results of the branches flow to the result of the if expression. The remaining constraints are new. The third constraint states that if the condition is stochastic, the result is stochastic. The last two constraints state that if the if is unaligned or if the condition is stochastic, all names in the branches (here, only t 5 ) are unaligned.

Given constraints for a program, we need to compute a solution satisfying all constraints. We do this by repeatedly iterating through all the constraints and propagating abstract values accordingly. We terminate when we reach a fixed point, i.e., when no constraint results in an update of either S x or unaligned x for any x in the program. Algorithm 4 in Appendix B.1 formalizes our extension of the 0-CFA constraint propagation algorithm that also handles the constraints generated for tracking stochastic values and unalignment. The analysis function analyzeAlign : T ANF → ((X → P(A)) × P(X)) returns a map associating each variable with a set of abstract values, together with a set of unaligned variables. In other words, analyzeAlign computes a solution to S x and unaligned x for each x in the analyzed program. Applying analyzeAlign to t example confirms our earlier intuition: an intrinsic (¬) flows to n 1 , stoch flows to a 1 , f 5 is stochastic and originates at either (λx 2 .t 2 ) or (λx 3 .t 3 ), and the unaligned variables are t 2 , t 3 , t 4 , and t 5 . We now give soundness results.
Theorem 1 (Alignment analysis soundness). Assume t ∈ T ANF , A t from Definition 6, and an assignment to S x and unaligned x for x ∈ X according to analyzeAlign(t). Let Ât = {x ∈ X | ¬unaligned x }. Then, Ât ⊆ A t .
Proof. Follows by Lemma 3 in Appendix B.2 with t′ = t and ρ 1 = ρ 2 = ∅. The proof uses simultaneous structural induction over the two evaluation derivations of t. At corresponding stochastic branches or stochastic function applications in the two derivations, a separate structural induction argument shows that the let-sequences l′ 1 and l′ 2 of the two stochastic subderivations contain no aligned expressions. Combined, the two arguments give the result.

⊓ ⊔
The result Ât ⊆ A t (cf. Definition 6) shows that the analysis is conservative: every expression that the analysis marks as aligned is indeed aligned.
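As an illustration, the fixed-point iteration over constraints can be sketched in Python. The constraint encodings (`const`, `flow`, `stoch-if`, `unalign-branch`) and variable names below are illustrative simplifications, not the paper's Algorithm 4:

```python
def solve(constraints, variables):
    S = {x: set() for x in variables}           # abstract values per variable
    unaligned = set()                           # unaligned variables
    changed = True
    while changed:                              # iterate to a fixed point
        changed = False
        for c in constraints:
            kind = c[0]
            if kind == "const":                 # a is in S_x
                _, a, x = c
                if a not in S[x]:
                    S[x].add(a); changed = True
            elif kind == "flow":                # S_src is a subset of S_dst
                _, src, dst = c
                if not S[src] <= S[dst]:
                    S[dst] |= S[src]; changed = True
            elif kind == "stoch-if":            # stochastic condition makes
                _, cond, res = c                # the result stochastic
                if "stoch" in S[cond] and "stoch" not in S[res]:
                    S[res].add("stoch"); changed = True
            elif kind == "unalign-branch":      # unaligned if, or stochastic
                _, cond, res, names = c         # condition: branches unaligned
                if res in unaligned or "stoch" in S[cond]:
                    new = set(names) - unaligned
                    if new:
                        unaligned |= new; changed = True
    return S, unaligned
```

On the constraints for a stochastic if, the solver marks the if's result as stochastic and the names in its branches as unaligned, matching the intuition above.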

Dynamic Alignment
An alternative to static alignment is dynamic alignment, which we explored in the early stages of developing the alignment analysis. Dynamic alignment is fully context sensitive and amounts to introducing variables in programs that track (at runtime) when evaluation enters stochastic branching. To identify these stochastic branches, dynamic alignment also requires a runtime data structure that keeps track of the stochastic values. Similarly to k-CFA, dynamic alignment is potentially more precise than the 0-CFA approach. However, we discovered that dynamic alignment introduces significant runtime overhead. Again, we note that the models in Section 7 do not require a context-sensitive analysis, justifying the choice of 0-CFA over dynamic alignment and k-CFA.
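A minimal sketch of the runtime bookkeeping that dynamic alignment requires, with illustrative names; a real mechanism would additionally track which runtime values are stochastic:

```python
class DynamicAlign:
    """Runtime tracker: a checkpoint counts as aligned only when
    evaluation is not nested inside any stochastic branch."""

    def __init__(self):
        self.stoch_depth = 0          # nesting depth of stochastic branches

    def enter_branch(self, condition_is_stochastic):
        # Called when evaluation enters an if/match branch; returns a token
        # to pass back to exit_branch.
        if condition_is_stochastic:
            self.stoch_depth += 1
        return condition_is_stochastic

    def exit_branch(self, was_stochastic):
        if was_stochastic:
            self.stoch_depth -= 1

    def aligned(self):
        return self.stoch_depth == 0
```

Every branch entry and exit must consult this tracker, which is one source of the runtime overhead mentioned above.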

Aligned SMC and MCMC
This section presents detailed algorithms for aligned SMC (Section 5.1) and aligned lightweight MCMC (Section 5.2). For a more pedagogical introduction to the algorithms, see Section 2. We assume a basic understanding of SMC and Metropolis-Hastings MCMC algorithms (see, e.g., Bishop [4]).

Aligned SMC
We saw in Section 2.1 that SMC operates by executing many instances of t concurrently, and resampling them at calls to weight. Critically, resampling requires that the inference algorithm can both suspend and resume executions.
Here, we assume that we can create execution instances e of the probabilistic program t, and that we can arbitrarily suspend and resume the instances. The technical details of suspension are beyond the scope of this paper. See Goodman and Stuhlmüller [14], Wood et al. [47], and Lundén et al. [25] for further details.
Algorithm 2 Aligned SMC. The input is a program t ∈ T ANF and the number of execution instances n.
1. Run the alignment analysis on t, resulting in Ât (see Theorem 1).
2. Initialize n execution instances e i of t.
3. Execute all e i and suspend execution upon reaching an aligned weight (i.e., let x = weight w in t with x ∈ Ât) or when the execution terminates naturally. The result is a new set of execution instances e′ i with weights w′ i accumulated from unaligned weights and the single final aligned weight during execution.
4. If all executions have terminated, stop and return the weighted samples. Otherwise, resample the instances according to the weights w′ i and go to 3.

Algorithm 2 presents all steps for the aligned SMC inference algorithm. After running the alignment analysis and setting up the n execution instances, the algorithm iteratively executes and resamples the instances. Note that the algorithm resamples only at aligned weights (see Section 2.1).
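To make the loop concrete, the following Python sketch mimics the execute-and-resample iteration on a toy representation of execution instances. Real suspendable executions, log-weights, and the choice of resampling scheme are all simplified here; the representation and names are assumptions for illustration only:

```python
import random

def aligned_smc(make_instance, n, rng):
    # Each "instance" is a toy stand-in for a suspendable execution: a pair
    # (pending, value), where `pending` lists the weights the instance will
    # report at its remaining aligned checkpoints (unaligned weights are
    # assumed folded into these), and `value` is its final result.
    instances = [make_instance() for _ in range(n)]
    z_est = 1.0
    # Alignment guarantees every instance hits the same number of aligned
    # checkpoints, so a single loop condition suffices.
    while instances[0][0]:
        # Execute each instance to its next aligned weight.
        weights = [pending.pop(0) for pending, _ in instances]
        z_est *= sum(weights) / n     # running normalizing constant estimate
        # Resample instances in proportion to their weights (multinomial).
        chosen = rng.choices(range(n), weights=weights, k=n)
        instances = [(list(instances[j][0]), instances[j][1]) for j in chosen]
    return [value for _, value in instances], z_est
```

With deterministic toy instances reporting weights 0.5 and 2.0 at two aligned checkpoints, the normalizing constant estimate is 0.5 · 2.0 = 1.0 regardless of the resampling outcomes.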
We conjecture that aligned SMC is preferable over unaligned SMC for all practically relevant models, as the evaluation in Section 7 supports. However, it is possible to construct contrived programs in which unaligned SMC has the advantage. Consider the programs in Fig. 5, both encoding Bernoulli(0.5) distributions in a contrived way using weights. Fig. 5a takes one of two branches with equal probability. Unaligned SMC resamples at the first weight in each branch, while aligned SMC does not because the branch is stochastic. Due to the difference in likelihood, many more else executions survive resampling than then executions. However, due to the final weights in each branch, the branch likelihoods even out. That is, resampling at the first weights is detrimental, and unaligned SMC performs worse than aligned SMC. Fig. 5b also takes one of two branches, but now with unequal probabilities. However, the two branches still have equal posterior probability due to the weights. The nested if in the then branch does not modify the overall branch likelihood, but adds variance. Aligned SMC does not resample at any weight within the branches, as the branch is stochastic. Consequently, only 10% of the executions in aligned SMC take the then branch, while half of the executions take the then branch in unaligned SMC (after resampling at the first weight). Therefore, unaligned SMC better explores the then branch and reduces the variance due to the nested if, which results in overall better inference accuracy. We are not aware of any real model with the property in Fig. 5b. In practice, it seems best to always resample when using weight to condition on observed data. Such conditioning is, in practice, always done outside of stochastic branches, justifying the benefit of aligned SMC.

Algorithm 3 Aligned lightweight MCMC. The input is a program t ∈ T ANF , the number of steps n, and the global step probability g > 0.
1. Run the alignment analysis on t, resulting in Ât (see Theorem 1).

function Run() = Run t and do the following:
- Record the total weight w i accumulated from calls to weight.
- Record the final value v i .
- At unaligned terms let c = assume d in t (c ∉ Ât), do the following. In the program, bind c to the value x and resume execution.
- At aligned terms let c = assume d in t (c ∈ Ât), do the following. In the program, bind c to the value x and resume execution.

3. Set k ← k + 1, l ← 1, and reuse ← true.
5. Compute the Metropolis-Hastings acceptance ratio.

Aligned Lightweight MCMC
Aligned lightweight MCMC is a version of lightweight MCMC [46] in which the alignment analysis provides information about how to reuse random draws between executions. Algorithm 3, a Metropolis-Hastings algorithm in the context of PPLs, presents the details. Essentially, the algorithm executes the program repeatedly using the Run function and redraws one aligned random draw in each step, while reusing all other aligned draws and as many unaligned draws as possible (illustrated in Section 2.2). We provide a derivation of the Metropolis-Hastings acceptance ratio in step 5 in Appendix E. A key property of Algorithm 3 due to alignment (Definition 6) is that the length of s i (and p i ) is constant, as executing t always results in the same number of aligned random draws. In addition to redrawing only one aligned random draw, each step has a probability g > 0 of being global, meaning that inference redraws every random draw in the program. Occasional global steps fix problems related to slow mixing and ergodicity of lightweight MCMC identified by Kiselyov [21]. In a global step, the Metropolis-Hastings acceptance ratio reduces to A = min(1, w i / w i−1 ).
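The following Python sketch shows a single toy transition in this style. It assumes the aligned draws can be represented as a fixed-length list of uniform draws on [0, 1] with prior-resampling proposals, so the acceptance ratio reduces to a weight ratio; the model interface, names, and the omission of unaligned draws and proposal densities are simplifying assumptions, not the paper's full Algorithm 3:

```python
import math
import random

def mh_step(model, s, w, g, rng):
    # `s` is the fixed-length list of aligned random draws (alignment
    # guarantees the length is constant); `w` is the previous run's weight.
    if rng.random() < g:
        s_new = [rng.random() for _ in s]          # global step: redraw all
    else:
        s_new = list(s)                            # local step: redraw one
        s_new[rng.randrange(len(s))] = rng.random()
    w_new = model(s_new)
    # With uniform prior draws and prior-resampling proposals, the
    # Metropolis-Hastings ratio reduces to the weight ratio.
    accept = min(1.0, w_new / w)
    if rng.random() < accept:
        return s_new, w_new
    return s, w
```

Iterating `mh_step` produces a Markov chain over program executions; the invariant that the stored weight always matches the stored draws is what a real implementation maintains across steps.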

Implementation
We implement the alignment analysis (Section 4), aligned SMC (Section 5.1), and aligned lightweight MCMC (Section 5.2) for the functional PPL Miking CorePPL [25], implemented as part of the Miking framework [7]. We implement the alignment analysis as a core component in the Miking CorePPL compiler, and then use the analysis when compiling to two Miking CorePPL backends: RootPPL and Miking Core. RootPPL is a low-level PPL with built-in, highly efficient SMC inference [25], and we extend the CorePPL-to-RootPPL compiler introduced by Lundén et al. [25] to support aligned SMC inference. Furthermore, we implement aligned lightweight MCMC inference standalone as a translation from Miking CorePPL to Miking Core. Miking Core is the general-purpose programming language of the Miking framework, currently compiling to OCaml.

The idealized calculus in (1) does not capture all features of Miking CorePPL. In particular, the alignment analysis implementation must support records, variants, sequences, and pattern matching over these. Extending 0-CFA to such language features is not new, but it does introduce a critical challenge for the alignment analysis: identifying all possible stochastic branches. Determining stochastic ifs is straightforward, as we simply check whether stoch flows to the condition. However, complications arise when we add a match construct (and, in general, any type of branching construct). Consider the extension

t ::= . . . | match t with p then t else t | {k 1 = x 1 , . . . , k n = x n }
p ::= x | true | false | {k 1 = p, . . . , k n = p}

of (1), where x, x 1 , . . . , x n ∈ X, k 1 , . . . , k n ∈ K, and n ∈ N, adding records and simple pattern matching. K is a set of record keys. Assume we also extend the abstract values as a ::= . . . | {k 1 = X 1 , . . . , k n = X n }, where X 1 , . . . , X n ⊆ X. That is, we add an abstract record tracking the names in the program that flow to its entries. Consider the program match t 1 with {a = x 1 , b = false} then t 2 else t 3 .
This match is, similarly to ifs, stochastic if stoch ∈ S t1 . It is also, however, stochastic in other cases. Assume we have two program variables, x and y, such that stoch ∈ S x and stoch ∈ S y . Now, the match is also stochastic if, e.g., {a = {y}, b = {x}} ∈ S t1 , because the stochastic value flowing from x to the pattern false may or may not match due to its randomness. The randomness of y, on the other hand, does not influence whether or not the match is stochastic: the variable pattern x 1 for label a always matches.
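A sketch of this check for flat record patterns, with illustrative data representations (abstract record entries as sets of variable names, patterns as tagged pairs); nested patterns would recurse analogously:

```python
def stochastic_match(abs_record, pattern, S):
    # abs_record: {key: set of variable names flowing to that entry}
    # pattern:    {key: ("var", name) or ("const", value)}
    # S:          {variable name: set of abstract values}
    for key, pat in pattern.items():
        if pat[0] == "var":
            continue                      # variable patterns always match
        if any("stoch" in S[x] for x in abs_record[key]):
            return True                   # stochastic value meets a constant
    return False
```

On the example above, the pattern false at label b receives the stochastic x, so the match is stochastic; if x were not stochastic, the stochasticity of y alone would not make it so.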
Our alignment analysis implementation handles the intricacies of identifying stochastic match cases for nested record, variant, and sequence patterns. In total, the alignment analysis, aligned SMC, and aligned lightweight MCMC implementations consist of approximately 1000 lines of code directly contributed as part of this paper. The code is available on GitHub [2].

Evaluation
This section evaluates aligned SMC and aligned lightweight MCMC on a set of models encoded in Miking CorePPL: CRBD [32,38] in Sections 7.1 and 7.5, ClaDS [27,38] in Section 7.2, state-space aircraft localization in Section 7.3, and latent Dirichlet allocation (LDA) in Section 7.4. CRBD and ClaDS are non-trivial models of considerable interest in evolutionary biology and phylogenetics [38]. Similarly, LDA is a non-trivial topic model [5]. Running the alignment analysis took approximately 5–30 ms for each of the models considered in the experiments, showing that analysis time is not a problem in practice.
We compare aligned SMC with standard unaligned SMC [14], which is identical to Algorithm 2, except that it resamples at every call to weight (see Appendix C). We carefully checked that automatic alignment corresponds to previous manual alignments of each model. For all SMC experiments, we estimate the normalizing constant produced as a by-product of SMC inference rather than the complete posterior distributions. The normalizing constant, also known as marginal likelihood or model evidence, frequently appears in Bayesian inference and gives the probability of the observed data averaged over the prior. The normalizing constant is useful for model comparison as it measures how well different probabilistic models fit the data (a larger normalizing constant indicates a better fit).
We ran aligned and unaligned SMC with Miking CorePPL and the RootPPL backend configured for a single core (compiled with GCC 7.5.0). Lundén et al. [25] show that the RootPPL backend is significantly more efficient than other state-of-the-art PPL SMC implementations. We ran aligned and unaligned SMC inference 300 times (with 3 warmup runs) for each experiment for 10⁴, 10⁵, and 10⁶ executions (also known as particles in the SMC literature).
We compare aligned lightweight MCMC to standard lightweight MCMC (see Appendix D). We implement both versions as compilers from Miking CorePPL to Miking Core, which in turn compiles to OCaml (version 4.12). The lightweight MCMC databases are functional-style maps from the OCaml Map library. We set the global step probability to 0.1 for both aligned lightweight MCMC and lightweight MCMC. We ran aligned lightweight and lightweight MCMC inference 300 times for each experiment. We discarded the first 10% of samples as burn-in in all MCMC runs.
For all experiments, we used an Intel Xeon Gold 6136 CPU (12 cores) and 64 GB of memory running Ubuntu 18.04.5.

SMC: Constant Rate Birth-Death (CRBD)
This experiment considers the CRBD diversification model from [38] applied to the Alcedinidae phylogeny (Kingfisher birds, 54 extant species) [19]. We use fixed diversification rates to simplify the model, as unaligned SMC inference accuracy is too poor for the full model with priors over diversification rates. Aligned SMC is accurate for both the full and simplified models. We provide the source code for the complete model in Appendix A.1. The unaligned SMC normalizing constant estimate has not yet converged to the correct value −304.75 (available for this particular model due to the fixed diversification rates) even with 10⁶ particles, while aligned SMC produces precise estimates already at 10⁴ particles. Excess resampling is a significant factor in the increase in execution time for unaligned SMC, as each execution encounters far more resampling checkpoints than in aligned SMC.

SMC: Cladogenetic Diversification Rate Shift (ClaDS)
A limitation of CRBD is that the diversification rates are constant. ClaDS [27,38] is a set of diversification models that allow rates to shift over phylogenies. We evaluate the ClaDS2 model for the Alcedinidae phylogeny. As in CRBD, we use fixed (initial) diversification rates to simplify the model on account of unaligned SMC. The source code for the complete model is available in Listing 2 of Appendix A.2 (147 lines of code). Automatic alignment simplifies the ClaDS2 model significantly, as manual alignment requires collecting and passing weights around in unaligned parts of the program, which are later consumed by aligned weights. The total experiment execution time was 67 hours. Fig. 7 presents the experiment results. Twelve unaligned runs for 10⁶ particles and nine runs for 10⁵ particles ran out of the preallocated stack memory for each particle (10 kB). We omit these runs from Fig. 7. The consequence of not aligning SMC is more severe than for CRBD. Aligned SMC is now almost seven times faster than unaligned SMC, and the unaligned SMC normalizing constant estimates are significantly worse than the aligned SMC estimates. The unaligned SMC estimates do not even improve when moving from 10⁴ to 10⁶ particles (we need even more particles to see improvements). Again, aligned SMC produces precise estimates already at 10⁴ particles. (Figure: (a) shows execution times in seconds for aligned (gray) and unaligned (white) SMC; error bars show one standard deviation. (b) shows box plots of log normalizing constant estimates for aligned (gray) and unaligned (white) SMC. The average estimate for aligned SMC with 10⁶ particles is −61.26.)

SMC: State-Space Aircraft Localization
This experiment considers an artificial but non-trivial state-space model for aircraft localization. Appendix A.3 presents the model as well as the source code in Listing 3 (62 lines of code). The total experiment execution time was 1 hour. Fig. 8 presents the experiment results. The execution time difference is not as significant as for CRBD and ClaDS. However, the unaligned SMC normalizing constant estimates are again much less precise. Aligned SMC is accurate (centered at approximately −61.26) already at 10⁴ particles. The model's straightforward control flow explains the less dramatic difference in execution time: there are at most ten unaligned likelihood updates in the aircraft model, while the number is, in theory, unbounded for CRBD and ClaDS. Therefore, the cost of extra resampling compared to aligned SMC is not as significant.

MCMC: Latent Dirichlet Allocation (LDA)

This experiment considers latent Dirichlet allocation (LDA). Note that we are not using methods based on collapsed Gibbs sampling [17], and the inference task is therefore computationally challenging even with a rather small number of words and documents. The source code for the complete model is available in Listing 4 of Appendix A.4 (31 lines of code). The total experiment execution time was 41 hours.
The LDA model consists of only aligned random draws. As a consequence, aligned lightweight MCMC and lightweight MCMC reduce to the same inference algorithm, and we can compare them by considering execution times alone. We justify the correctness of both algorithms in Appendix A.4. Fig. 9 presents the experiment results. Aligned lightweight MCMC is almost three times faster than lightweight MCMC. To put these execution times in context, we also implemented and ran the experiment with lightweight MCMC in WebPPL [14] for 10⁵ iterations, repeated 50 times (with 3 warmup runs). The mean execution time was 383 s with standard deviation 5 s. We used WebPPL version 0.9.15 and Node version 16.18.0.

MCMC: Constant Rate Birth-Death (CRBD)
This experiment again considers CRBD. MCMC is not as suitable for CRBD as SMC, and therefore we use a simple synthetic phylogeny with six leaves and an age span of 5 age units (Alcedinidae used for the SMC experiment has 54 leaves and an age span of 35 age units). The source code for the complete model is the same as in Section 7.1, but we now allow the use of proper prior distributions for the diversification rates. The total experiment execution time was 7 hours.
Unlike LDA, the CRBD model contains both unaligned and aligned random draws. Because of this, aligned lightweight MCMC and standard lightweight MCMC do not reduce to the same algorithm. To judge the difference in inference accuracy, we consider the mean estimates of the birth diversification rate produced by the two algorithms, in addition to execution times. The experiment results show that the posterior distribution over the birth rate is unimodal (see Appendix A.5), which motivates using the posterior mean as a measure of accuracy. Fig. 10 presents the experiment results. Aligned lightweight MCMC is approximately 3.5 times faster than lightweight MCMC. There is no obvious difference in accuracy. To justify the execution times and correctness of our implementations, we also implemented and ran the experiment with lightweight MCMC in WebPPL [14] for 3·10⁶ iterations, repeated 50 times (with 3 warmup runs). The mean estimates agreed with Fig. 10. The mean execution time was 37.1 s with standard deviation 0.8 s. The speedup compared to standard lightweight MCMC in Miking CorePPL is likely explained by the use of early termination in WebPPL, which benefits CRBD. Early termination easily combines with alignment but relies on execution suspension, which we do not currently use in our implementations. Note that aligned lightweight MCMC is faster than WebPPL even without early termination.
In conclusion, the experiments clearly demonstrate the need for alignment.

Related Work
Lundén et al. [24] briefly mention the general problem of selecting optimal resampling locations in PPLs for SMC but do not consider the alignment problem in particular. They also acknowledge the overhead resulting from not all SMC executions resampling the same number of times, which alignment avoids.
Many other PPLs exist, such as Gen [10], Venture [28], Edward [43], Stan [8], and AugurV2 [18]. Gen, Venture, and Edward focus on simplifying the joint specification of a model and its inference to give users low-level control, and do not consider automatic alignment specifically. However, the incremental inference approach [9] in Gen does use the addressing approach of Wingate et al. [46]. Stan and AugurV2 have less expressive modeling languages that allow more powerful inference; alignment holds by construction due to the reduced expressiveness.

Conclusion
This paper gives, for the first time, a formal definition of alignment in PPLs. Furthermore, we introduce a static analysis technique and use it to align checkpoints in PPLs and apply it to SMC and MCMC inference. We formalize the alignment analysis, prove its correctness, and implement it in Miking CorePPL. We also implement aligned SMC and aligned lightweight MCMC, and evaluate the implementations on non-trivial CRBD and ClaDS models from phylogenetics, the LDA topic model, and a state-space model, demonstrating significant improvements compared to standard SMC and lightweight MCMC.

A Evaluation, Continued
This section presents further details related to the evaluation in Section 7. In particular, we attach code listings for the experiment models. Note that these listings only give the model code. The code for the analysis itself and all inference algorithms is available on GitHub [2].

A.3 SMC: State-Space Aircraft Localization

An aircraft flies along a one-dimensional axis in discrete time steps, and the crew needs to estimate the aircraft's current position using noisy satellite position data available for the ten most recent time steps (defined at line 1). A second model component, the aircraft's altitude, further complicates the model, as the crew cannot observe it (the altimeter is not functioning). The aircraft's velocity and the precision of the satellite observations depend on the altitude, as dictated by the functions velocity (defined at line 13) and positionObsStDev (defined at line 18). The velocity (in meters per second) increases linearly with increasing altitude (less air resistance) but is capped to the range [100, 500]. On the other hand, the observation standard deviation (in meters) decreases linearly with increasing altitude (less interference between the satellites and the aircraft) but is never less than ten.
Lines 25 to 44 define the main function simulate iterating over the ten data items. The critical component illustrating the need for alignment is the weight 0.5 at line 32. This weight encodes that the pilot adjusts the aircraft's pitch when air traffic control signals altitude deviations more than 100 feet from the assigned altitude of 35 000 feet. Each time step where the actual altitude deviates more than 100 feet from the assigned altitude thus gives a penalty factor of 0.5. Unlike the weight at line 29, this weight is unaligned.
The simulation also accounts for variations in, e.g., wind resistance when updating the position at line 34 through a standard deviation of positionStDev  meters. Similarly, the altitude varies with a standard deviation of altitudeStDev feet when updating the altitude at line 39.
We generated the ten data points used for the experiment in Section 7.3 by running the model (ignoring line 32) and sampling from N(position, σ²) at line 29.
Listing 3 gives the Miking CorePPL source code used for the case study model in Section 7.3.

A.4 MCMC: Latent Dirichlet Allocation (LDA)
Listing 4 gives the Miking CorePPL source code used for the case study model in Section 7.4. Furthermore, we conduct an additional LDA experiment justifying the correctness of the aligned lightweight MCMC and lightweight MCMC implementations. The experiment uses a simplified generated data set with only two topics, a vocabulary of two words, and three documents with 10 words each. To generate the data, we use the true values θ 1 = 0.95, θ 2 = 0.05, and θ 3 = 0.5 for the document topic distributions, and φ 1 = 0.99 and φ 2 = 0.01 for the word distributions within the two topics. Note that the true proportions above are uniquely determined by the proportion of the first topic and the first word, as there are only two topics and two words in the vocabulary. The simplicity of the model and the rather extreme true values used to generate the data allow for easy verification of the inference results.

B Alignment Analysis, Continued
This section presents the full alignment constraint propagation algorithm (Section B.1) and proof of soundness of the alignment analysis (Section B.2).

B.1 Algorithm
Algorithm 4 presents the full alignment algorithm that produces a solution to the constraints generated by Algorithm 1. For reference, we now also give a more formal definition of constraints c.
The main function analyzeAlign consists of two steps: initialization and iteration. In the initialization step, generateConstraints provides constraints to the initializeConstraint function, which initializes the maps data and edges, and the set unaligned. The map data contains the sets of abstract values for all program variables and is initially empty. At termination, data(x) is a sound approximation of S x for each x (Lemma 1). The map edges associates a set of constraints with each variable in the program. Specifically, we must propagate the constraints associated with a variable x after updating data(x) with new information. Finally, the set unaligned tracks unaligned expressions and is initially empty. At termination, unaligned contains the set of all unaligned variables identified by the analysis. This set is sound according to Lemma 1. The iteration step iter propagates constraints with propagateConstraint for all variables updated with new abstract values or unalignment since their last propagation. We store these updated variables in the sequence worklist, which, when empty, signals fixpoint and termination. Note that, e.g., the lambda application constraint at line 67 initializes new constraints dynamically during propagation, depending on which abstract lambdas flow to the left-hand side of the application.
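The worklist scheme can be sketched as follows in Python; `data`, `edges`, and `worklist` mirror the names above, while representing constraints as closures is an illustrative simplification of Algorithm 4's structured constraints:

```python
from collections import defaultdict

def solve_worklist(variables, register):
    data = {x: set() for x in variables}   # abstract values per variable
    edges = defaultdict(list)              # constraints to re-run per variable
    worklist = []

    def update(x, values):
        # Add abstract values to data(x); schedule x only if anything is new,
        # so constraints are propagated just for variables with new information.
        new = values - data[x]
        if new:
            data[x] |= new
            worklist.append(x)

    register(data, edges, update)          # initialization step
    while worklist:                        # iteration step: empty = fixpoint
        x = worklist.pop()
        for constraint in edges[x]:
            constraint(x)
    return data
```

For instance, registering the flow constraints a ⊆ b and b ⊆ c and seeding data(a) with stoch propagates stoch to c in two worklist rounds, touching only the variables that actually changed.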

B.2 Correctness Proof
This section presents the correctness proof that is ultimately used to prove Theorem 1.
Throughout this section, t 1 = t 2 means that the terms t 1 and t 2 are alpha equivalent. For constant comparisons c 1 = c 2 , we assume the prior existence of an equality function over constants. We first require a specific equality relation on values.
Note, in particular, that this value equality relation treats closures as equal even if their environments differ. As we will see, this property is critical in the proof of Lemma 2.
Next, we formally define subterms.
Definition 10 (Subterms). We say that t ′ is a subterm of t iff and t ′ is a subterm of either t 1 , t 2 , or t 3 .
In the following, we assume a fixed t ∈ T ANF and an assignment to S x and unaligned x for x ∈ X from analyzeAlign(t). We begin with a lemma concerning unaligned expressions in single evaluations of ⇓.
(R1) We clearly have The result now follows immediately from (R1 ′ ).
(C1 ′ ) First, it is clear that λy.t y is a subterm of t and that (C1) holds for ρ. Lastly, Lemma 1 also gives λy.name(t y ) ∈ S x .
Subcase t 1 = if y then t t else t e The possible derivations are Without loss of generality, we only consider (If-True). Note that for the sub- below, holds immediately by the induction hypothesis as (C1) holds for ρ.
(R1) Assume we have unaligned n for all n ∈ names(t ′ ) (including unaligned x ).
The result follows immediately.
We have w ∈ R and |w| = 0. The result follows immediately.
(R1) We clearly have The result now follows immediately from (R1 ′ ).

⊓ ⊔
With Lemma 2 established, we now give the main lemma used to prove Theorem 1.
Subcase t 1 = if y then t t else t e The possible derivations are We first establish (C2 ′ )-(C4 ′ ).
(C2 ′ ) Holds in all four cases by repeating the corresponding argument for (C1 ′ ) in Lemma 2. (C3 ′ ) Assume stoch ∈ S x . By Lemma 1, clearly stoch ∈ S y and both derivations are either (If-True) or (If-False). Without loss of generality, assume both derivations are (If-True). The induction hypothesis directly applies to ρ 1 ⊢ t t s11 ⇓ w11 l11 v t1 and ρ 2 ⊢ t t s21 ⇓ w21 l21 v t2 , and we get the result (R3 t )-(R5 t ). By Lemma 1, name(t t ) ⊆ S x . The result now follows from (R4 t ). (C4 ′ ) Assume first that stoch ∈ S y . Then stoch ∈ S x by Lemma 1, and the result is immediate. Therefore, assume stoch ∈ S y . Again, both derivations are either (If-True) or (If-False) and we assume, without loss of generality, that both are (If-True). The induction hypothesis directly applies to ρ 1 ⊢ t t s11 ⇓ w11 l11 v t1 and ρ 2 ⊢ t t s21 ⇓ w21 l21 v t2 , and we get the result (R3 t )-(R5 t ). By Lemma 1, name(t t ) ⊆ S x . The result now follows from (R5 t ). We now apply the induction hypothesis and get (R3 ′ )-(R5 ′ ).
If stoch ∈ S y , then by Lemma 1, unaligned nt for all n t ∈ names(t t ) and unaligned ne for all n e ∈ names(t e ). By repeating Lemma 2 twice, we get l 11 | At = l 21 | At = [] and the result follows. Assume stoch ∈ S y . Again, both derivations are either (If-True) or (If-False) and we assume, without loss of generality, that both are (If-True). The induction hypothesis directly applies to ρ 1 ⊢ t t s11 ⇓ w11 l11 v t1 and ρ 2 ⊢ t t s21 ⇓ w21 l21 v t2 , and we get the result (R3 t )-(R5 t ). By (R3 t ), l 11 | At = l 21 | At . Subcase t 1 = assume y The derivations are We first establish (C2 ′ )-(C4 ′ ).

C Unaligned SMC
Algorithm 5 presents the unaligned SMC algorithm. It is in many ways similar to Algorithm 2.

D Lightweight MCMC
Algorithm 6 presents the lightweight MCMC algorithm. The algorithm is in many ways similar to Algorithm 3, but relies on databases, represented by D i (random draws) and p i (probability densities or masses of the draws), to reuse random draws. The Run function keeps track of the current stack trace at all times and uses it to index the databases.
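A Python sketch of such a stack-trace-indexed database, restricted to uniform draws; the class and method names are illustrative, not the paper's implementation:

```python
import random

class TraceDB:
    """Toy database in the style of lightweight MCMC: each assume is keyed
    by the current stack trace, so a draw is reused between runs only when
    the same address recurs."""

    def __init__(self, rng):
        self.rng = rng
        self.db = {}              # address -> stored random draw
        self.stack = []           # current stack trace

    def push(self, call_site):
        self.stack.append(call_site)

    def pop(self):
        self.stack.pop()

    def assume_uniform(self, site):
        # Key the draw by the full stack trace plus the assume's own site.
        addr = (*self.stack, site)
        if addr not in self.db:
            self.db[addr] = self.rng.random()
        return self.db[addr]
```

The same assume reached through the same call stack yields the same reused draw, while a different call stack yields a fresh one; this address sensitivity is precisely what the alignment analysis lets aligned lightweight MCMC avoid for aligned draws.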