Multiscale stochastic optimization: modeling aspects and scenario generation

Real-world multistage stochastic optimization problems are often characterized by the fact that the decision maker may take actions only at specific points in time, even if relevant data can be observed much more frequently. In such a case there are not only multiple decision stages present but also several observation periods between consecutive decisions, where profits/costs occur contingent on the stochastic evolution of some uncertainty factors. We refer to such multistage decision problems with encapsulated multiperiod random costs as multiscale stochastic optimization problems. In this article, we present a tailor-made modeling framework for such problems, which allows for a computational solution. We first establish new results related to the generation of scenario lattices and then incorporate the multiscale feature by leveraging the theory of stochastic bridge processes. All necessary ingredients of our proposed modeling framework are elaborated explicitly for various popular examples, including both diffusion and jump models.

• Multiperiod models: Decisions are made at the very beginning, whereas the consequences of the decisions depend on the development of a process over time. A typical example is a buy-and-hold portfolio strategy.
• Multistage models: Decisions can be made at regular moments in time. Typical examples are active portfolio strategies.
Stochastic multiperiod models are simple in structure. In contrast, multistage stochastic models are objects of intensive research; see the book of Pflug and Pichler [32]. The purpose of this paper is to introduce models that incorporate the properties of both multistage and multiperiod models, where the multiperiod aspect captures the development between the decision stages. Examples of such problems involving different time scales include:
• Supply network extension problems, where major decisions (such as whether to defer, to stage, to mothball, or to abandon a certain infrastructure investment opportunity; cf. [28]) can only be made at strategic time points (say, once every few years), but resulting profits/costs are subject to daily fluctuations of market prices.
• Inventory control problems with limited storage capacity and backlogged/lost demand due to out-of-stock events, where procurement of goods is restricted by logistical constraints/time delays.
• Structured portfolio investment problems, where rebalancing is possible only at given time points (say, once every few weeks due to product terms and conditions), but contained barrier features make profits/losses depend on the full trajectory of asset prices.
• Power plant management problems, where operating plans need to be fixed for a certain period ahead (say, once every few days due to physical constraints preventing instant reaction to market conditions), but actual profits/losses depend on each tick of the energy market.
To the best of our knowledge, the existing literature does not offer a computational modeling framework designed specifically towards the solution of such multistage stochastic optimization problems, where two different time scales related to one underlying stochastic process are present. The novel approach suggested in this article consists of two parts, each dealing with one of the two time scales. The general idea is to first construct a coarse lattice model for the decision scale and then use a consistent simulation procedure to compute expected profits/costs on a fine time granularity between the decisions. The proposed approach is illustrated in Fig. 1.
Looking only at the coarser decision scale, the requirements on the discrete structure are the same as for any standard multistage stochastic optimization problem. In general, there are three different strategies for the generation of discrete scenarios out of a sample of observed data, as illustrated in Fig. 2. Fans are not an appropriate structure for multistage decision problems, as they cannot reflect the evolution of information. Scenario trees are a popular tool in the literature. However, scenario trees are practically intractable for problems involving a large number of decision stages, due to their exponential growth over time. Therefore, one often reverts to scenario lattices in such cases. While the literature on the construction of scenario trees is relatively rich (see, e.g., [16,18,23,32,33]), the lattice construction literature is rather sparse. The state-of-the-art approach is based on the minimization of a distance measure between the targeted distribution and its discretization ("optimal quantization"), see [3,25,32].
In this article, we study a lattice generation method along the uppermost path of Fig. 2. More precisely, it is a "direct" method for the case when a time-homogeneous Markovian diffusion model is selected in the first step. The approach is based purely on the infinitesimal drift and diffusion coefficient functions of a diffusion model and directly provides a scenario lattice, without requiring a simulation/quantization procedure. While the idea of such a discretization technique appeared already in an early paper by Pflug and Swietanowski [34], it has not been analyzed (or used) yet in the stochastic optimization literature (cf. the review article of Löhndorf [24]). We complete the approach in this paper by proving a stability result and error estimate for the optimal value of a generic multistage stochastic optimization problem. In particular, we show that the approximation error regarding the optimal value in the continuous (state space) diffusion model can be controlled when the suggested lattice generation method is applied.

Once the decision time scale has been discretized with a scenario tree/lattice model, a coherent approach for the finer observation time scale requires an interpolation that respects the laws of the underlying stochastic process. This brings us to the theory of stochastic bridges, i.e., processes pinned to a given value at the beginning and the end of a certain time period. We suggest using a simulation engine to generate a set of paths of the bridge process, and then computing expected profits/costs between decisions based on a Monte-Carlo simulation. This requires a simulatable form of the bridge process. The stochastic processes literature seems to offer mainly abstract theory in this respect. There are some articles on simulation methods (mainly acceptance-rejection methods) for diffusion bridges and jump-diffusion bridges in the statistical analysis literature since the early 2000s, see [8,9,14,30,36].
However, these methods are inefficient due to a possibly large rejection rate. To make our suggested modeling approach directly applicable, we work out explicitly the bridge process dynamics for some popular diffusion models, including geometric Brownian motion, the Vašíček model, and the Cox-Ingersoll-Ross model. Based on these dynamics, efficient simulation is possible by means of standard discretization schemes for stochastic differential equations. Moreover, we present a simulation scheme for the example of geometric Brownian motion, which operates directly on generated paths of the unconditioned process and thus enables an even more efficient generation of bridge process trajectories. If the cost function is particularly amenable (e.g., linear), a simulation might not even be required, as expected costs can be computed analytically in some models. We also include jump processes in our analysis, as we propose a simulation algorithm for compound Poisson bridges in the case of Normally, Exponentially, or Gamma distributed jump sizes. In particular, we discuss the simulation of the number of jumps of the bridge process and derive the conditional distribution of each jump size given both the final value of the bridge process and the number of jumps in the interval.
The general contribution of this article is to propose a modeling framework and a corresponding scenario generation method, such that an efficient computational solution of multiscale stochastic optimization problems is possible. The details of this contribution are threefold. First, it consists of the general modeling idea, which is based on a consistent but separate scenario generation approach for the two involved time scales. Second, we analyze theoretically a largely unknown direct method for the construction of scenario lattices when the underlying stochastic model is of the diffusion type; this is purely related to the coarser decision time scale. Third, as regards the finer observation time scale, we elaborate the details of a consistent interpolation procedure for a number of popular modeling choices. This includes the presentation of a novel simulation algorithm for compound Poisson bridges.
The outline of the paper is as follows. Section 2 deals with the generation of discrete scenarios as a model for the information flow over the decision stages. In Sect. 3, we present the details related to the suggested interpolation approach for the information flow through the intermediate observation periods. Section 4 illustrates our modeling framework with a simple multiscale inventory control problem. Moreover, we discuss the applicability and the benefits of the proposed approach. We conclude in Sect. 5.

Scenario lattice generation for decision stages
Computational methods for stochastic optimization problems require discrete structures. For multistage problems, scenario trees are the standard models for the evolution of uncertainty over time. Scenario trees allow for general path-dependent solutions, as for each node there exists a unique path from the root of the tree. However, scenario trees grow exponentially in the number of stages, a fact that easily overwhelms any computer's memory when it comes to practically-sized problems. Therefore, if the underlying stochastic model is a Markov process, one typically discretizes it in the form of a scenario lattice. Lattice models are special cases of graph-based models, where a node does not necessarily have a unique predecessor. Different paths may then merge at a certain node at some later stage. In this way, one can obtain a rich set of paths with relatively few nodes.
The construction of scenario lattices typically works in a two-step procedure. First, one discretizes the marginal distributions for all stages. In a second step, one decides about allowed state transitions and determines conditional transition probabilities between consecutive stages. The state-of-the-art method for such a lattice generation procedure is based on the stagewise minimization of the (Wasserstein) distance between the modeled distribution (which is typically continuous) and its discretization on the lattice. A detailed description of this approach can be found in Löhndorf and Wozabal [25, Sect. 3.2].
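As an illustration of this two-step procedure, the following sketch (function and variable names are ours) quantizes each stage's marginal from simulated sample paths with a few Lloyd (k-means) iterations, a simple heuristic for the stagewise distance minimization, and then estimates transition probabilities by empirical counting:

```python
import numpy as np

def build_lattice(paths, n_nodes, n_iter=50):
    """Two-step lattice construction sketch.
    paths: array (n_scenarios, n_stages); n_nodes: nodes per stage.
    Step 1: quantize each stage's marginal sample (Lloyd iterations).
    Step 2: count empirical transitions between consecutive stages."""
    n_sc, T = paths.shape
    nodes, trans = [], []
    for t in range(T):
        x = np.sort(paths[:, t])
        # initialize centers at sample quantiles, then run Lloyd iterations
        c = np.quantile(x, (np.arange(n_nodes) + 0.5) / n_nodes)
        for _ in range(n_iter):
            idx = np.argmin(np.abs(x[:, None] - c[None, :]), axis=1)
            c = np.array([x[idx == j].mean() if np.any(idx == j) else c[j]
                          for j in range(n_nodes)])
        nodes.append(c)
    for t in range(T - 1):
        a = np.argmin(np.abs(paths[:, t][:, None] - nodes[t][None, :]), axis=1)
        b = np.argmin(np.abs(paths[:, t + 1][:, None] - nodes[t + 1][None, :]), axis=1)
        P = np.zeros((n_nodes, n_nodes))
        for i, j in zip(a, b):
            P[i, j] += 1.0
        # normalize each row to conditional transition probabilities
        P /= np.maximum(P.sum(axis=1, keepdims=True), 1.0)
        trans.append(P)
    return nodes, trans
```

A production implementation would optimize the quantizers more carefully (see [25, Sect. 3.2]); the sketch only conveys the structure of the two steps.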
We will now study an alternative lattice generation method, which is not based on optimal quantization theory but rather relies on Markov chain approximation results. In particular, this approach allows one to construct a scenario lattice directly from the dynamics of a Markovian diffusion process.

Markov chain approximation for diffusion processes
Birth-and-death Markov chains are discrete stochastic processes defined on the integer grid, where each transition depends only on the current state and allows for three possibilities: to remain in the current state, to move one unit up, or to move one unit down. Many such Markov chains can be approximated by a diffusion process; the approximation works via a transformation of the time scale and a renormalization of the state variable. The idea is explained, e.g., in the book of Karlin and Taylor [20, Ch. 15]. Pflug and Swietanowski [34] have looked at the problem from the converse perspective. They elaborate, without providing error estimates, that any diffusion process possessing a stationary distribution can be approximated by a birth-and-death Markov chain in the following way.
Consider a one-dimensional recurrent Markov process X_t, as defined by the stochastic differential equation

dX_t = μ(X_t) dt + σ(X_t) dW_t, X_0 = x_0, (1)

where W denotes a standard Brownian motion and the initial value x_0 is a given constant. Throughout the paper, the coefficient functions μ(·) and σ(·) are (as usual) assumed to be square-integrable functions satisfying the Lipschitz growth conditions

|μ(x) − μ(y)| ≤ L |x − y|, |σ(x) − σ(y)| ≤ L |x − y|,

for some L > 0. Notice that the Lipschitz continuity implies that one may specify a constant L_μ such that μ(x) ≤ L_μ + L · |x|.
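For reference, trajectories of a diffusion of the form (1) can be generated with a standard Euler-Maruyama scheme; this generic sketch (our naming) is not part of the lattice construction itself, but serves as the simulation baseline against which the discrete approximation below can be compared:

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, t_grid, rng):
    """Euler-Maruyama discretization of dX_t = mu(X_t) dt + sigma(X_t) dW_t
    on the given time grid, returning one sample path."""
    x = np.empty(len(t_grid))
    x[0] = x0
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        dw = rng.normal(scale=np.sqrt(dt))  # Brownian increment
        x[k + 1] = x[k] + mu(x[k]) * dt + sigma(x[k]) * dw
    return x
```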
Algorithm 2.1 (Markov chain approximation method for diffusion processes) For a diffusion process X as given by (1), define its N -th Markov chain approximation as the process constructed along the following scheme.
1. Choose a strictly monotonic, three times differentiable function H(x) with H(0) ≤ M < ∞, for some constant M, as well as functions g(x) and τ(x) with |τ(x)| ≤ 1 for all x, in such a way that the drift and diffusion coefficient functions in (1) are matched.
2. Determine the initial state i_0 such that H(i_0 / 2^N) = x_0.
3. Define the transition probabilities for jumping up, jumping down, and remaining in the current state, where [x]_0^1 := min{max{x, 0}, 1}.
4. Define the piecewise constant (continuous-time) process X̃^N, where X̃^N_t := X^N_{⌊2^{2N} t⌋} lives in the states H(i / 2^N); the floor function is denoted by ⌊·⌋.
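The exact specification of H, g, and τ in step 1 is omitted above. As a hedged illustration of the moment-matching idea behind such schemes, the following simplified trinomial rule (our own simplification, not the algorithm's exact formulas) matches drift and diffusion locally on a grid of spacing h with time step dt:

```python
def trinomial_probs(x, mu, sigma, h, dt):
    """Birth-and-death (trinomial) transition probabilities at state x,
    for grid spacing h and time step dt, matching the local moments:
    E[dX] = mu(x) dt and Var[dX] ~ sigma(x)**2 dt as h -> 0."""
    p_up = 0.5 * sigma(x) ** 2 * dt / h ** 2 + max(mu(x), 0.0) * dt / h
    p_dn = 0.5 * sigma(x) ** 2 * dt / h ** 2 + max(-mu(x), 0.0) * dt / h
    p_stay = 1.0 - p_up - p_dn
    # dt must be small enough for p_stay to be a probability
    assert p_stay >= 0.0, "choose dt <= h**2 / (sigma**2 + h*|mu|)"
    return p_up, p_stay, p_dn
```

One can check directly that the expected increment is h·(p_up − p_dn) = μ(x)·dt, so the drift is matched exactly and the variance up to O(h).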
While the idea of Algorithm 2.1 was originally presented in the early paper [34], it has not been analyzed yet in the context of stochastic optimization. We now make the approach complete by deriving an error estimate for the optimal value of a generic multistage stochastic optimization problem, when the underlying diffusion model is approximated by the method of Algorithm 2.1. We start with some preliminary results required for the proof.

Lemma 2.1 Let X̃^N_t be constructed according to Algorithm 2.1, starting in x_0 at time t_0. Then, for t_1 ≥ t_0, the second moment of X̃^N_{t_1} satisfies a bound in which the constants K_1, K_2 ∈ R depend only on the Lipschitz(-like) constants controlling the growth of the coefficient functions in (1).

Proof
The conditional expected increment of the squared process can be computed explicitly. Using the estimate H(x) ≤ 1 + H²(x), we obtain a linear growth bound with K_1 := 2L_μ + 2L + L² and K_2 := L² + 2L_μ, and the claim follows by the tower property of the expected value.

Bounds for diffusion processes can be found in the literature. We will use the following result. We then establish that weak convergence implies convergence in Wasserstein distance, if the second moments of all involved probability measures are bounded.

Lemma 2.2 Consider a probability measure P and a sequence of probability measures (P_n), all supported on a common bounded set. Then, weak convergence implies convergence in Wasserstein distance.

Proof By Billingsley and Topsøe [7], the weak convergence P_n →w P implies that sup_{g∈G} |∫ g dP_n − ∫ g dP| → 0 holds if and only if conditions (2) and (3) are satisfied. As P_n and P have bounded support, it holds that W(P_n, P) = sup_{g∈Lip(1)} (∫ g dP_n − ∫ g dP).

Theorem 2.1 Consider a continuous probability measure P and a sequence of probability measures (P_n).
Suppose that conditions (4) and (5) hold, for some constant M < ∞. Then, weak convergence implies convergence in Wasserstein distance.

Proof Denote the cdf of P_n by F_n and that of P by F. By a version of Čebyšëv's inequality, one obtains the tail bounds (6) and (7). Now choose K large enough such that M/K ≤ ε holds. Define the probability measure P^K_n as P_n conditioned on the interval [−K, K], where we know P_n([−K, K]) ≥ 1 − 2ε by (6) and (7), and define P^K analogously. By Lemma 2.2, it holds that W(P^K_n, P^K) → 0, so for any ε > 0 we can make n so large that W(P^K_n, P^K) ≤ ε. Combining the tail and bulk estimates yields the claim.

The subsequent result bounds the difference in the value of a diffusion process at a certain future time, if it starts from different values at time zero.

Proposition 2.2 Consider the process X defined by (1), and suppose that the growth condition on the coefficients is satisfied a.e. Then, the following stability of the diffusion process with respect to its starting value holds, for any p and q such that 1/p + 1/q = 1.

Proof See Cox et al. [12, Cor. 2.19].
With the above auxiliary results in hand, we now define a generic multistage stochastic optimization problem. The approximation quality of its optimal value, when the uncertainty process is modeled by a diffusion but approximated on the basis of Algorithm 2.1, is the object that we eventually want to analyze.

Definition 2.1 (GenMSP) Define a generic multistage stochastic optimization problem (GenMSP) to be of the form (10). The feasible sets X_t are assumed to be convex. For the scenario process ξ, we assume ξ ∈ L¹(R, Q). The decision process x is required to be adapted to the filtration σ(ξ) generated by the scenario process, denoted by x ◁ σ(ξ). Moreover, assume that the cost function C_t(·, ·) is convex in the decisions (for any fixed scenario) and Lipschitz continuous (with constant L) w.r.t. the scenario process (for any fixed decision policy). Denote the optimal value of (10), as a function of the underlying probability model, by v*(·).

To interpret problem (10), the objective is to select a nonanticipative (constraint (10b)) decision policy x, which fulfills certain additional constraints (10a), in such a way that cumulative expected costs are minimized. One may think, for instance, in terms of portfolio losses C_t resulting from the stochastic evolution of the financial market ξ_t as well as the selected portfolio composition x_t. Short-selling restrictions would then be an example of "additional constraints" on the decision process.
The concept of the Wasserstein distance 2 between probability measures will be a key ingredient for our analysis of Algorithm 2.1 in terms of its approximation quality with respect to the optimal solution of GenMSP. In particular, we will rely on the following general stability result for the optimal solution of GenMSP, when the underlying probability model varies.

Proposition 2.3
Consider a GenMSP as defined in Definition 2.1 above. Let the distance between two paths ξ^(1)_{0:t} and ξ^(2)_{0:t} up to time t ≤ T be defined accordingly. Assume that, for all t = 0, ..., T − 1, there exist constants κ_{t+1} and ε_{t+1} such that the Wasserstein distances W of the corresponding single-stage conditional transition probability measures P̃_{t+1}(·|ξ_t) and P̂_{t+1}(·|ξ_t) satisfy conditions (11) and (12) uniformly for all paths ξ^(1)_{0:t}, ξ^(2)_{0:t}. Then, an explicit upper bound for the difference between the optimal values v*(P) and v*(P̂) holds.

Theorem 2.2 Let P denote the law of the diffusion model (1) and P̂^N the law of its N-th lattice approximation from Algorithm 2.1. Then the conditions of Proposition 2.3 are satisfied with ε_t ↓ 0 as N increases, so that the approximation error of the optimal value can be controlled.

Proof We want to show that P and P̂^N satisfy the conditions (11) and (12), with ε_t ↓ 0 as N increases. Then, the statement follows readily from Proposition 2.3. The diffusion model satisfies condition (11) by Proposition 2.2. Moreover, for N large enough, the convergence of the finite-dimensional distributions follows; see [20, p. 169]. Since we have constructed the lattice in such a way that each atom of the distribution of X̃^N_t is also an atom of the distribution of X̃^M_t, for all M ≥ N, the weak convergence of all conditional probabilities also follows. By Theorem 2.1, the latter implies convergence in Wasserstein distance, as the conditions (4) and (5) hold by Proposition 2.1 and Lemma 2.1, respectively. Thus, condition (12) is shown to be satisfied.
Remark The rescaling of time was necessary in the construction of Algorithm 2.1 in order for Theorem 2.2 to hold. However, notice that the method in essence specifies a ternary transition rule. While blindly using the directly resulting ternary lattice would not rely on any supporting theory, it might still be interesting to test its performance, especially for problems with multiple observation periods but relatively few decisions.

Interpolating bridge processes
In Sect. 2, we discussed the generation of discrete scenario trees/lattices out of continuous parametric models, as is typically required for the computational solution of any multistage stochastic optimization problem. For multiscale problems, a discretization of the information flow through all decision stages is not enough, as the stochasticity of the costs between the decision stages is an important factor. In such cases, we suggest drawing on the theory of stochastic bridge processes in order to simulate the behavior of the uncertainty process (with arbitrary granularity of the time increment) between consecutive decisions. In particular, this approach ensures the consistency of the finer multiperiod observation scale and the coarser decision scale by simulating trajectories for the multiperiod costs that connect two decision nodes with each other in a tree/lattice model. In this section, we make our proposed modeling approach directly applicable by working out the details for several popular examples of stochastic models. In particular, we present a new simulation algorithm for compound Poisson bridges and derive the dynamics for a few diffusion bridge examples in explicit form. From the latter dynamics, a simulation engine can easily be implemented on the basis of any discretization scheme for stochastic differential equations.

Diffusion processes
We start with a generic multi-dimensional model with drift and multiple factors. Afterwards, we derive the bridge process dynamics explicitly for several special cases that are frequently used in the literature. The general theory for diffusion bridges is well-established (see [4,13,37]), but the literature is quite abstract. In particular, we are not aware of any standard textbook that offers explicit examples apart from the basic Brownian bridge. Our relatively simple proof of the subsequent theorem is a generalization and elaboration of the derivations contained in an unpublished manuscript by Lyons [26], which we found online.
Consider the n-dimensional diffusion

dX_t = μ(X_t) dt + σ(X_t) dW_t,

where μ(·) : R^n → R^n, σ(·) : R^n → R^{n×m}, and W is an m-dimensional Brownian motion. Then, for t ∈ [t_1, t_2], the dynamics of X conditioned on both the starting value x_1 at time t_1 and the final value x_2 at time t_2 are given by

dX̂_t = [ μ(X̂_t) + σ(X̂_t) σ(X̂_t)^T ∇_x log f_{t_2}(x_2 | t, X̂_t) ] dt + σ(X̂_t) dW_t,

where f_{t_2} denotes the transition density of X at time t_2.
Proof For any t ∈ [t_1, t_2], denote the density function of the random variable X̂_t by f̂_t(x | X_{t_1} = x_1, X_{t_2} = x_2). Due to Bayes' theorem and the fact that solutions of SDEs are Markov processes, we may rewrite this function as a ratio of transition densities. Then, for any Lipschitz-continuous test function h, the multi-dimensional Itô lemma and the Kolmogorov backward equation yield the representation (15). Differentiation with respect to the time parameter gives (16). The function h is Lipschitz by assumption and thus its gradient is bounded; it can be seen from (15) that the boundary terms vanish as any x_i → ±∞. Therefore, integrating (16) twice by parts gives (17), from which we can deduce the stated dynamics. Equation (17) corresponds to the Fokker-Planck equation of the conditioned diffusion process.

We subsequently focus on the one-dimensional case. Let X_{t_2} = x_2 be fixed for all examples below.
General state-dependent parameters For a general univariate diffusion process X described by the SDE

dX_t = μ(X_t) dt + σ(X_t) dW_t,

the dynamics of the associated bridge process are given by

dX̂_t = [ μ(X̂_t) + σ²(X̂_t) ∂_x log f_{t_2}(x_2 | t, X̂_t) ] dt + σ(X̂_t) dW_t,

where f_{t_2}(· | t, x) denotes the transition density of X at time t_2, given X_t = x.
Vašíček model/Ornstein-Uhlenbeck process The model presented in Vašíček [45] is considered the first stochastic model for the term structure of interest rates. It is a one-factor model for the short rate, featuring mean reversion. In Vašíček's model, the instantaneous rate r is described by a Gaussian Ornstein-Uhlenbeck process, i.e., as the solution of the SDE

dr_t = κ(θ − r_t) dt + σ dW_t,

where the parameter θ can be interpreted as the long-term mean, κ determines the speed of mean reversion, and W is a standard Brownian motion; the volatility is specified by the (constant) parameter σ. For s ≤ t, the transition density of r is that of a Normal distribution, i.e.,

r_t | r_s ∼ N( θ + (r_s − θ) e^{−κ(t−s)}, σ²(1 − e^{−2κ(t−s)})/(2κ) ).

Hence, the derivative of the logarithmized transition density is a closed-form expression, and the bridge process r̂ associated with r, pinned to the value r_{t_2} = x_2, is described by the dynamics

dr̂_t = [ κ(θ − r̂_t) + (2κ e^{−κ(t_2−t)} / (1 − e^{−2κ(t_2−t)})) (x_2 − θ − (r̂_t − θ) e^{−κ(t_2−t)}) ] dt + σ dW_t.
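Given such dynamics, an Euler discretization yields bridge trajectories directly. The following sketch (function names are ours) implements the Vašíček bridge drift, using the closed-form score of the Gaussian transition density:

```python
import numpy as np

def vasicek_bridge(kappa, theta, sigma, r1, r2, t1, t2, n_steps, rng):
    """Euler scheme for the Vasicek/OU bridge pinned to r(t2) = r2.
    The extra drift is sigma^2 * d/dr log f_{t2}(r2 | t, r), which for
    the Gaussian OU transition density has the closed form used below."""
    t = np.linspace(t1, t2, n_steps + 1)
    r = np.empty(n_steps + 1)
    r[0] = r1
    for k in range(n_steps):
        dt = t[k + 1] - t[k]
        tau = t2 - t[k]
        e = np.exp(-kappa * tau)
        # pinning term: 2*kappa*e*(x2 - conditional mean) / (1 - e^2)
        pin = 2.0 * kappa * e * (r2 - theta - (r[k] - theta) * e) / (1.0 - e * e)
        drift = kappa * (theta - r[k]) + pin
        r[k + 1] = r[k] + drift * dt + sigma * np.sqrt(dt) * rng.normal()
    return t, r
```

Because the Euler endpoint is only approximately pinned, the final value matches r2 up to a term of order σ·sqrt(dt).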

Cox-Ingersoll-Ross (CIR) model/square-root diffusion
The second classical interest rate term structure model was introduced in Cox et al. [11] and is typically referred to as the CIR model. Its square-root diffusion process is used as an improvement of the Vašíček model. The transition density f(·|·, ·) of the square-root diffusion process is a cumbersome object, but still an analytic expression. Hence, the bridge process associated with the CIR model is described by tractable dynamics. In particular, for 0 < s < t < T, the transition density can be expressed in terms of

ν(x) := 4κ √(x x_s) e^{κ(t+s)/2} / (σ² (e^{κt} − e^{κs})), q := 2κθ/σ² − 1,

where I_α(·) denotes the modified Bessel function of the first kind appearing in the density. The bridge process dynamics for the CIR model then follow from the general formula above.

Geometric Brownian motion (GBM) For a GBM X, described by

dX_t = μ X_t dt + σ X_t dW_t,

the transition density is available as an analytic (lognormal) expression. Thus, the dynamics of the associated bridge process take the explicit form

dX̂_t = X̂_t [ log(x_2 / X̂_t) / (t_2 − t) + σ²/2 ] dt + σ X̂_t dW_t.

Brownian motion with drift In the simplest case of a Brownian motion with constant drift and volatility, i.e.,

dX_t = μ dt + σ dW_t,

the associated bridge process is the well-known Brownian bridge. Its dynamics are given by

dX̂_t = (x_2 − X̂_t)/(t_2 − t) dt + σ dW_t.

Pathwise construction of the bridge process for GBM
The subsequent result shows how to translate a set of Brownian motion trajectories into a set of geometric Brownian bridge trajectories. Thus, simulation of the GBM bridge is straightforward and requires only the generation of Gaussian random variables.

Proposition 3.1 Let X be a GBM as in (20), assume that it starts in X_{t_1} = x_1, and let it be pinned to the value x_2 at time t_2. Then, for t_1 ≤ t ≤ t_2, the bridge process X̂ is given by

X̂_t = exp( [(t_2 − t) log x_1 + (t − t_1) log x_2] / (t_2 − t_1) + σ ( W_{t−t_1} − ((t − t_1)/(t_2 − t_1)) W_{t_2−t_1} ) ). (21)

Proof Obviously, X̂_{t_1} = x_1 as well as X̂_{t_2} = x_2 hold. Moreover, let us check that the structure of the original process is indeed preserved by this bridge process. Denote the exponent in (21) by Y_t and consider x_2 = e^{Y_{t_2}} as a lognormally distributed random variable, where Var(Y_{t_2}) = σ²(t_2 − t_1). Define Δ_{t_1}(t) := σ²(t − t_1). Then, for t_1 ≤ s_1 ≤ s_2 ≤ t_2, a computation of the covariance structure confirms that Y is again a Brownian motion and thus X̂ is a GBM. Figure 3 shows a collection of sample paths simulated via (21).
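The pathwise construction can be sketched as follows (a sketch with our own function names): simulate plain Brownian increments, pin them into a standard Brownian bridge, and exponentiate around the log-linear interpolation between x_1 and x_2:

```python
import numpy as np

def gbm_bridge_paths(x1, x2, sigma, t1, t2, n_steps, n_paths, rng):
    """Pathwise GBM bridge: log-linear interpolation between log(x1)
    and log(x2), plus sigma times a Brownian bridge that is built from
    unconditioned Brownian motion paths."""
    tau = t2 - t1
    s = np.linspace(0.0, tau, n_steps + 1)          # time elapsed since t1
    dw = rng.normal(scale=np.sqrt(np.diff(s)), size=(n_paths, n_steps))
    w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dw, axis=1)], axis=1)
    bridge = w - (s / tau) * w[:, -1:]              # pinned to 0 at both ends
    log_line = ((tau - s) * np.log(x1) + s * np.log(x2)) / tau
    return t1 + s, np.exp(log_line + sigma * bridge)
```

Note that both endpoints are matched exactly, since the Brownian bridge term vanishes at s = 0 and s = t_2 − t_1.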

Jump processes
Stochastic processes that do not fluctuate in a continuous manner but rather by sudden jumps are popular models for a variety of applications. The majority of typical jump models belongs to the class of Lévy processes. Lévy processes are stochastic processes characterized by independent and stationary increments as well as stochastic continuity. In addition to their prominence in the physical sciences, there is a particularly vast literature on Lévy processes as a model for the random evolution of variables present in the financial markets. As we are dealing with bridge processes here, let us mention the fact that the Markov property of Lévy bridges is inherited from the Markov property of Lévy processes [19, Proposition 2.3.1].

Compound Poisson bridges
The most fundamental and prominent jump process is the Poisson process, counting the number of occurrences of some random event. For the modeling of a situation where not only the number of those (quantifiable) events but also their size matters, the compound Poisson process is a natural extension. It is extensively used, e.g., in actuarial applications, as insurance companies are naturally interested not only in the number of claims happening to their customers but even more importantly in the claim sizes. We present a method to simulate sample paths from a compound Poisson bridge process, i.e., a compound Poisson process with given initial and final value (and time). For jump-size distribution families that are closed under convolution, or where convolution results in another tractable parametric family, some ingredients of our simulation scheme can be derived analytically and thus efficient simulation is possible. We carry out this exercise for the most popular representatives of jump-size distributions, i.e., the Normal distribution, the Exponential distribution, and the Gamma distribution. For distributions that do not allow for a tractable representation of the required convolution objects, one will have to resort to statistical procedures such as acceptance-rejection methods.
Consider a compound Poisson process X with intensity γ and jump-size distribution given by the density f. To avoid notational conflicts, we reserve the lower index in X_t to describe the process X at time t, and use an upper index to enumerate individual jumps (as random variables). The realization of the i-th jump X^i is denoted by x^i. Consider now the process X_t on the interval [t_1, t_2], where we are given the values X_{t_1} and X_{t_2}, and define c := X_{t_2} − X_{t_1}. We suggest performing the simulation of the bridge process in the following three steps.
I: Simulation of the number of jumps As a first step, simulate from the conditional distribution of the number of jumps N given the value of the sum Σ_{i=1}^N X^i = c. This yields a realization of the number of jumps occurring over the considered time interval [t_1, t_2]. The probability function of this object satisfies

P[N = j | Σ_{i=1}^N X^i = c] ∝ P[N = j] · f^{*j}(c),

where f^{*j} denotes the j-fold convolution of the jump-size density. For simulation purposes, we cut the support of this conditional distribution of N to an interval [0, N̄] ⊆ N_0, in such a way that P[N > N̄ | Σ_{i=1}^N X^i = c] < ε, for some small value of ε.
Consider the Normal distribution as jump-size distribution, i.e., X^i ∼iid N(μ, σ²), and let c > 0. The convolution of j iid N(μ, σ²) distributions is an N(jμ, jσ²) distribution, which yields an explicit upper bound for the tail probability P[N > N̄ | Σ X^i = c]; requiring this bound to be smaller than ε gives a criterion for N̄. The convolution of j independent Exponential distributions with parameter λ gives an Erlang distribution with parameters λ and j, and the convolution of j Gamma(α, θ) distributions gives a Gamma(jα, θ) distribution. The Erlang distribution is a special case of the Gamma distribution for integer-valued shape parameters; hence, a criterion for N̄ associated with Gamma distributed jumps also covers Exponentially distributed jumps, and an analogous requirement can be imposed in the Gamma distribution case.

Proposition 3.2 (Normally distributed jumps) Let F ∼ N(μ, σ²). Then, F̂^N_k, the conditional distribution of the k-th jump given the remaining sum C_k, is again Normal, for any 1 ≤ k ≤ n.

Proof Using the convolution properties of the Normal distribution, F̂^N_k is characterized by the density function proportional to f(x) f^{*(n−k)}(C_k − x).
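The conditional jump-count distribution for Normal jump sizes can be evaluated numerically as follows (a sketch with our own names; the truncation argument n_max plays the role of N̄):

```python
import numpy as np
from math import lgamma

def jump_count_pmf(c, rate, dt, mu, sig, n_max):
    """Conditional pmf of the number of jumps N given sum = c, for
    N(mu, sig^2) jump sizes: weight_j is proportional to
    Poisson(j; rate*dt) * NormalDensity(c; j*mu, j*sig^2).
    The case j = 0 is only feasible when c == 0."""
    lam = rate * dt
    w = np.zeros(n_max + 1)
    if c == 0.0:
        w[0] = np.exp(-lam)
    for j in range(1, n_max + 1):
        log_pois = -lam + j * np.log(lam) - lgamma(j + 1)
        log_norm = (-0.5 * np.log(2.0 * np.pi * j * sig ** 2)
                    - (c - j * mu) ** 2 / (2.0 * j * sig ** 2))
        w[j] = np.exp(log_pois + log_norm)
    return w / w.sum()
```

Working in log space before exponentiating keeps the weights numerically stable for moderately large j.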

Proposition 3.3 (Exponentially distributed jumps) Let F ∼ Exp(λ). Then, F̂^Exp_k is a Lomax distribution, for any 1 ≤ k ≤ n. 8

Proof Using the convolution properties of the Exponential distribution, F̂^Exp_k is characterized by the corresponding density function.

Remark The Lomax distribution (sometimes also called Pareto type II distribution) is a special case of the generalized Pareto distribution (GP). In particular, it holds that

8 The density function of the Lomax(α, β) distribution is given by f(y) = αβ(1 + βy)^{−(α+1)}, y ≥ 0, where α is a shape parameter and 1/β is a scale parameter.
Lomax(α, β) ∼ GP(0, 1/α, β/α). The GP distribution is typically available in commercial software packages, and built-in functions can then be used for straightforward simulation of the Lomax distribution.
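Alternatively to the generalized-Pareto route, the Lomax distribution can be sampled directly by inverting its cdf; a minimal sketch, using the parametrization of the footnote (shape α, scale 1/β):

```python
import numpy as np

def sample_lomax(alpha, beta, size, rng):
    """Inverse-cdf sampling for Lomax(alpha, beta) with shape alpha and
    scale 1/beta: F(y) = 1 - (1 + beta*y)**(-alpha), hence
    F^{-1}(u) = ((1 - u)**(-1/alpha) - 1) / beta."""
    u = rng.uniform(size=size)
    return ((1.0 - u) ** (-1.0 / alpha) - 1.0) / beta
```

For α > 1, the mean of this parametrization is 1/(β(α − 1)), which gives a quick sanity check on the sampler.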
Remark Observe in passing, as a quick cross-check of the above results, that in both the Normal and the Exponential distribution case, the derived conditional distributions F̂_n of the last jump X^n have expectation C_n and zero variance.

Proposition 3.4 (Gamma distributed jumps) Let F ∼ Gamma(α, θ). Then, F̂_k is a generalized Beta distribution of the first kind, for any 1 ≤ k < n. 9

Proof Using the convolution properties of the Gamma distribution, F̂_k is characterized by the corresponding density function.

Remark As the Gamma function Γ(·) is only defined for strictly positive arguments, the case k = n is not covered in Proposition 3.4 above. However, we have generally addressed the latter case before.
The simulation scheme for compound Poisson bridge processes that has been elaborated in this section is summarized in Algorithm 1 below. It covers the Normal, the Exponential, and the Gamma distribution for the jump size. Figure 4 visualizes sample paths generated on the basis of this algorithm.

9 The density function of the generalized Beta distribution of the first kind, with arguments (y; a, b, p, q), is given by f(y) = |a| y^{ap−1} (1 − (y/b)^a)^{q−1} / (b^{ap} B(p, q)), where B(·, ·) denotes Euler's Beta function.

Algorithm 1 A simulation scheme for compound Poisson bridge processes.
Given: X_{t1}, X_{t2}. Define: c := X_{t2} − X_{t1}, C_k := c − Σ_{j=1}^{k−1} x_j.
I: Simulate a value n for the number of jumps N
 1: Determine N̂, using the convolution properties N(μ, σ²) ∗ N(μ, σ²) = N(2μ, 2σ²), Exp(γ) ∗ Exp(γ) = Erl(γ, 2), Gam(α, θ) ∗ Gam(α, θ) = Gam(2α, θ).
 2: Obtain a value n by simulating from the resulting distribution, i.e., by computing its cdf and applying the inverse to a sample from the Uniform distribution on [0, 1].
II: Simulate the jumping times
 1: Simulate n random variables from the Uniform distribution on [t1, t2].
III: Simulate the jump sizes
 1: For k = 1, …, n, simulate from the conditional distributions F̂_k derived above.

Remark While efficient simulation of trajectories of compound Poisson bridges is indeed possible (given a tractable jump-size distribution), the distribution of the bridge process at some time t ∈ (t1, t2) is generally an intractable object. Its cdf consists of an infinite sum over products of a Poisson distribution with parameter λ(t2 − t1), a Binomial distribution with probability parameter t/(t2 − t1), and a complicated multidimensional integral over the conditional densities (using a shorthand notation) given in (23).
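For Normally distributed jumps, the scheme can be sketched compactly. We draw the number of jumps from the Poisson weights reweighted by the N(jμ, jσ²) density of the increment c (truncated at some n_max playing the role of N̂), and sample the jump sizes sequentially: by standard Gaussian conditioning, given a remaining budget C_k to be covered by m = n − k + 1 jumps, X_k ∼ N(C_k/m, σ²(m − 1)/m), which for the last jump has mean C_n and zero variance, in line with the cross-check remark above. This is our own reading of the scheme, not reference code:

```python
import math
import random

def simulate_cpp_bridge(x1, x2, t1, t2, lam, mu, sigma, rng, n_max=200):
    """Simulate one compound Poisson bridge from (t1, x1) to (t2, x2)
    with N(mu, sigma^2) jump sizes and jump intensity lam."""
    c = x2 - x1
    # I: number of jumps; P(N = j | increment c) is proportional to the
    # Poisson(lam * (t2 - t1)) weight times the N(j*mu, j*sigma^2) density at c.
    weights = []
    for j in range(1, n_max + 1):
        log_pois = -lam * (t2 - t1) + j * math.log(lam * (t2 - t1)) - math.lgamma(j + 1)
        var = j * sigma ** 2
        log_norm = -0.5 * math.log(2 * math.pi * var) - (c - j * mu) ** 2 / (2 * var)
        weights.append(math.exp(log_pois + log_norm))
    total = sum(weights)
    u, acc, n = rng.random() * total, 0.0, n_max
    for j, w in enumerate(weights, start=1):
        acc += w
        if u <= acc:
            n = j
            break
    # II: jump times are sorted uniforms on [t1, t2].
    times = sorted(rng.uniform(t1, t2) for _ in range(n))
    # III: jump sizes via sequential Gaussian conditioning on the remaining budget.
    jumps, C = [], c
    for k in range(1, n + 1):
        m = n - k + 1                                 # remaining jumps, current one included
        mean, std = C / m, sigma * math.sqrt((m - 1) / m)
        x = rng.gauss(mean, std) if m > 1 else C      # last jump is deterministic
        jumps.append(x)
        C -= x
    return times, jumps

rng = random.Random(7)
times, jumps = simulate_cpp_bridge(0.0, 3.0, 0.0, 1.0, lam=5.0, mu=0.5, sigma=0.3, rng=rng)
```

By construction, the sampled jump sizes add up exactly to the prescribed increment c, so the bridge always hits X_{t2}.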

Remark (Further Lévy processes)
For most Lévy processes, the density function at a given future time is not available in (semi-)closed form. However, in some special cases, bridge processes turn out to be of a surprisingly tractable nature. In the dissertation of Hoyle [19], one can find results for 1/2-stable processes, Inverse Gaussian processes and Cauchy processes, which imply that simulation of the associated bridges can be performed in a straightforward way: in the first two cases, by applying a deterministic function to a random draw from the standard Normal distribution; in the third case, the cumulative distribution function is given in terms of standard functions.

Illustration by example
In this section, we illustrate the proposed modeling approach with a prototypical example. Moreover, we report on an implementation in the context of a real-world industrial application. In order to focus on the essential characteristics of the class of multiscale stochastic optimization problems, we keep the purely illustrative example as simple as possible.

A simple inventory control problem
Consider a business where some (perishable) goods can be sold for a unit price a. The stock can be replenished each Monday morning for the price b per unit. During the week, the products are sold but the stock cannot be replenished. The demand varies.
If the business runs out of stock, then costs c occur, depending on the remaining time until the next opportunity to fill the stock. For products left in stock at the end of the week, we assume that only 30% can still be used in the next week, while 70% need to be thrown away. As a model for the demand, we use the Vašíček model [see (18) in Sect. 3.1]; in particular, for the sake of simplicity, we do not consider any seasonal patterns. Let the parameters of the Vašíček model be given by θ = 105, κ = 0.5, σ = 10, and the starting value x_0 = 100. Three-stage problems are the smallest instances involving all issues that are typically connected to multistage decision making under uncertainty. Hence, the objective in the subsequent illustrative example is to maximize expected profits over the two upcoming weeks.

Modeling the problem
Denote the demand at time s ∈ [0, 2] by (the continuous random variable) X_s, the stock level at the beginning of week t ∈ {1, 2} by S_t, and the remaining stock level at the end of week t by R_t. One may interpret S_t as the post-decision state with respect to a decision π_{t−1}, which is made before the first random demand of week t is observed. On the other hand, R_t corresponds to the amount left in stock after all the demands of week t have been observed, i.e., R_t is the pre-decision state with respect to π_t. The state transition rule is given by S_t = 0.3 · R_{t−1} + π_{t−1}, where we assume that the stock is empty in the beginning, that is, R_0 = 0. We optimize over replenishing policies {π_0, π_1}. The profit f_t(π_{t−1}, X_{t−1:t}) during week t ∈ {1, 2} then collects the sales revenue (at unit price a), the replenishment costs b · π_{t−1}, and the out-of-stock costs driven by c, where the stopping time τ_t := inf{s > 0 : ∫_t^{t+s} X_u du > S_t} represents the first time of week t + 1 when the business runs out of stock. Remaining stock items at the end of the planning horizon enter the model with their value, i.e., we add 0.3 · R_2 · a to f_2(π_1, X_1:2). The problem can then be summarized as the maximization of the expected total profit over all policies adapted to σ(X), where σ(X) denotes the filtration generated by the demand process X. We set the problem parameters to a = 10, b = 7 and c = 1000. We observe the demand on an hourly basis, 24/7.
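Under our reading of the model, the profit of a single week can be evaluated from the post-decision stock level and a discretized demand path as follows. The helper below is purely illustrative (hourly Riemann sum for the demand integral, cost c charged in proportion to the out-of-stock time); all names are ours:

```python
def week_profit(stock, demand_path, order, a=10.0, b=7.0, c=1000.0):
    """Profit of one week: revenue a per unit sold, replenishment cost b per
    ordered unit, and cost c scaled by the fraction of the week spent out of
    stock. `demand_path` holds the demand rate at each observation time."""
    dt = 1.0 / len(demand_path)   # hourly observations, week normalized to [0, 1]
    cum = 0.0                     # cumulative demand, i.e. the integral of X_u
    tau = 1.0                     # stock-out time; stays 1.0 if no stock-out occurs
    for k, x in enumerate(demand_path):
        cum += x * dt
        if cum > stock:
            tau = (k + 1) * dt
            break
    sold = min(cum, stock)
    remaining = max(stock - cum, 0.0)
    return a * sold - b * order - c * (1.0 - tau), remaining

# constant demand rate of 168 units/week, observed hourly (24/7 -> 168 hours)
profit, remaining = week_profit(100.5, [168.0] * 168, order=0.0)
```

The example makes the path-dependency tangible: with stock 100.5 and this demand rate, the stock-out occurs after 101 of 168 hours, and the cost c is charged for the remaining 67 hours of the week.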
The key observation here is that profits depend (in a highly nonlinear fashion) on the whole demand trajectory, while a replenishment decision for the stock can only be made once a week. The path-dependency is due to the presence of the stopping times τ_t in the objective. We apply our suggested methodology and generate a collection of paths between each pair of consecutive decision nodes. In this way, expected profits during the week can be computed by a simple Monte Carlo simulation. The SDE describing the Vašíček bridge process is given in (19); the lattice construction is reported in Table 1. Notice that this lattice construction serves for the discretization of the decision stages only. Hence, the probabilities of different paths between two stages which end up in the same node can be summed up, and the intermediate nodes and paths do not have to be stored. If one wanted to store the full lattice construction, this would correspond to (t · 4^N + 1)^2 nodes, (3^{t·4^N + 1} − 1)/2 conditional probabilities, and a total of 3^{t·4^N} paths up to time t, for the N-th iteration.
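We note in passing that the Vašíček bridge can also be sampled exactly without discretizing the SDE (19): the process is Gaussian and Markov, so at each grid point the law of the next value, given the current value and the pinned endpoint, is Gaussian, with moments obtained by combining the two Ornstein–Uhlenbeck transition densities. A sketch of this sequential-conditioning alternative (our own implementation, using the parameters of the illustrative example; function names are ours):

```python
import math
import random

def ou_moments(x, dt, theta, kappa, sigma):
    """Mean and variance of the Vasicek transition over a time step dt."""
    a = math.exp(-kappa * dt)
    mean = theta + (x - theta) * a
    var = sigma ** 2 * (1.0 - a * a) / (2.0 * kappa)
    return mean, var

def vasicek_bridge(x_start, x_end, t1, t2, n_steps, theta, kappa, sigma, rng):
    """Exact simulation of a Vasicek bridge from (t1, x_start) to (t2, x_end)."""
    dt = (t2 - t1) / n_steps
    path = [x_start]
    x, t = x_start, t1
    for _ in range(1, n_steps):
        rem = t2 - (t + dt)                      # time left after the next grid point
        m1, v1 = ou_moments(x, dt, theta, kappa, sigma)        # forward transition
        a2 = math.exp(-kappa * rem)
        v2 = sigma ** 2 * (1.0 - a2 * a2) / (2.0 * kappa)
        # Condition N(m1, v1) on reaching x_end: the backward factor is Gaussian
        # in the current value with center mu2 and variance v2 / a2^2.
        mu2 = (x_end - theta * (1.0 - a2)) / a2
        prec = 1.0 / v1 + a2 * a2 / v2
        mean = (m1 / v1 + mu2 * a2 * a2 / v2) / prec
        x = rng.gauss(mean, math.sqrt(1.0 / prec))
        t += dt
        path.append(x)
    path.append(x_end)
    return path

rng = random.Random(1)
# hourly grid over one week, normalized to [0, 1], with the example's parameters
path = vasicek_bridge(100.0, 110.0, 0.0, 1.0, n_steps=168,
                      theta=105, kappa=0.5, sigma=10, rng=rng)
```

The endpoints are matched exactly by construction, which is the property that allows the Monte Carlo paths to be attached consistently between two decision nodes of the lattice.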
For ease of exposition, we keep the discrete model as small as possible for the current example and focus on illustrating the suggested methodology regarding the multiscale issue. Thus, we use a simple binary tree for the decision stages, which we obtain by a standard optimal quantization algorithm. The tree is visualized in Fig. 5. For industrial applications, trees/lattices on the order of 10^5–10^6 nodes are often used. The suggested lattice construction becomes increasingly attractive the more decision stages are involved.

Comparison with other modeling approaches
One might consider alternative discrete structures to model multiscale stochastic optimization problems. However, none of the approaches used for similar purposes in the stochastic programming literature is really comparable to what has been suggested in the present paper. This is due to the following reasons.
• Large trees/lattices In principle, one can understand any multiscale problem as a standard multistage problem, where constraints rule out that decisions are made on the finer observation scale. After all, both scales are associated with the same underlying process. Then, one might simply use a very large tree/lattice model for the uncertainty process, which branches at each observation time. However, this typically results in computational intractability. Even for the very small illustrative example discussed in this section, where there are only two decision points, hourly observation would already require a structure with 336 branching times. Any tree model would clearly explode even for much smaller instances, and a ternary lattice model would involve more than 100 thousand nodes. Compared to our approach, it would require massive resources to construct such a lattice, store it, and compute a solution on it.
• Reduced trees If multistage problems grow too large to be modeled on regular scenario trees, it is popular in the applied stochastic programming literature to use trees that branch only irregularly, i.e., certain branches remain constant up to/after a certain time (cf. [15–17, 29]). However, this means using clairvoyant branches, where a computed policy does not reflect the uncertainty faced by the decision maker. In fact, using such degenerate trees violates the fundamentals of multistage stochastic optimization, which is based precisely on the idea of (direct) stochastic lookahead policies. Our approach, on the other hand, does not turn the decision maker clairvoyant up to/after a certain time and is hence perfectly aligned with the fundamental paradigm that a decision policy must reflect the uncertainty faced by the decision maker at any point in time. For our example, the reduction of a tree with 336 branching times to a computational instance would need to be so massive that essentially a fan with very few branchings would remain.
• Deterministic interpolation function Given a tree/lattice model for the information flow over the decision stages, one might simply choose a rudimentary interpolation approach, such as a constant or linear interpolation function, to compute the multiperiod costs between decisions. However, this is inconsistent, as both the decision and the observation scale are actually associated with one and the same uncertainty process. It would mean removing the stochasticity between decisions completely, whereas our approach takes into account the random fluctuations along the way from one decision node to the other. In the context of our example, a constant interpolation would be completely meaningless, as it would correspond to assuming that all the selling activity occurs in a single instant of time each week. A linear interpolation would simply not be in line with the essence of the problem, namely that one does not know in advance if/when one will run out of stock during the week.
• Multi-horizon stochastic programming A solution approach for a class of problems which are of a similar flavour, yet crucially different in nature, is called multi-horizon stochastic programming (see [21, 27, 41, 42, 46, 48]). Infrastructure planning problems, the original motivation of Kaut et al. [21], typically involve (rarely happening) strategic decisions as well as operational tasks (daily business). To overcome the above mentioned memory issue resulting from frequently branching scenario trees, the authors of [21] suggest starting with a tree for the strategic scale only. In a second step, they attach another tree to each node of the strategic-scale tree. The key assumption for the multi-horizon stochastic programming approach to be appropriate is that the strategic scale and all operational scales are independent of each other. In contrast, the approach suggested in the present paper is designed for problems where the two scales are clearly related to the same uncertainty process.
Our approach therefore ensures that different scenarios in between consecutive decisions are eventually bundled in one node (by leveraging the theory of stochastic bridges). Moreover, for our illustrative example, each of the "operational" trees would still require 168 branching times, such that again serious simplification would be required to make the approach computationally tractable.
To summarize, the major strengths of our model are that it
• respects the stochasticity in between decisions,
• ensures the consistency of all involved scales with respect to a single uncertainty process, and
• keeps the problem computationally tractable.
None of the other approaches mentioned above offers those three aspects, which are all essential characteristics of a useful modeling framework for the computational solution of multiscale stochastic optimization problems.

Numerical illustration
We have discussed above the qualitative strengths of our modeling approach. The simple example used to illustrate our explanations shall now serve to demonstrate numerically two important aspects. First, an appropriate modeling approach becomes increasingly important the stronger the path-dependency of the multiperiod costs is. In our example, this path-dependency grows with the value of the parameter c, which represents the costs that occur during the time span in which the agent is out of stock. The second aspect is that the way the multiperiod costs are modeled has a considerable impact on the resulting optimal value, even if the cost structure does not depend heavily on a particular path: even if we set c = 0, we observe an overestimation of the optimal value of almost 3% if we use an ad-hoc linear interpolation instead of our consistent modeling approach. Notice that this impact arises in a problem with only two decision stages. Table 2 illustrates the above two observations with numbers. The second column shows the optimal values obtained using our modeling approach with bridge processes. The third column shows the values obtained using a simple linear interpolation rule for the intra-week evolution of the demand. The resulting change of the optimal value, in percent, is given in the last column. Our implementation is based on a simple backward dynamic programming algorithm. Between each pair of consecutive decision nodes, we have used 10,000 simulations of the bridge process.

A real-world application
We have implemented the modeling approach suggested in this paper in the context of an industrial project dealing with the valuation of a thermal power plant. While the focus of that project lay on the incorporation of model ambiguity into a value function approximation policy, the valuation problem itself was presented to us in the form of a classical multiscale stochastic optimization problem: operating plans for the power plant must be fixed on a weekly basis (for management purposes of all the involved resources), but the intra-week profits resulting from the most recent decision depend on (uncertain) market prices that are observed in 4-h blocks. It is thus required to model a weekly decision scale with 42 observation periods within each week. A classical tree model over all observation periods would be intractable even for a single week. If we use a ternary lattice model, the first week alone involves 1764 nodes, the second week requires 5292 additional nodes, and modeling a quarter of a year with a ternary lattice involves about 300 thousand nodes in total. With our approach, on the other hand, the lattice model is related to a much coarser time granularity, discretizing only the information flow on the weekly decision scale. A time horizon of a quarter of a year then involves only 196 nodes on a (ternary) lattice. Considering the finer observation scale, such a lattice involves 507 different arcs, along which an interpolation is required. An inconsistent interpolation approach would distort the expected costs in each such intra-week segment.
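The node and arc counts quoted above follow from elementary arithmetic, assuming a recombining ternary lattice in which level l holds 2l + 1 nodes (so the first L levels hold L² nodes in total, and every node below the last level emits three arcs). A quick sanity check of the figures (our own bookkeeping, not code from the project):

```python
def ternary_lattice_nodes(levels):
    """Total number of nodes in a recombining ternary lattice:
    level l holds 2*l + 1 nodes, for l = 0, ..., levels - 1."""
    return sum(2 * l + 1 for l in range(levels))  # equals levels**2

def ternary_lattice_arcs(levels):
    """Every node below the last level emits three arcs."""
    return 3 * ternary_lattice_nodes(levels - 1)

# fine observation scale: 42 four-hour blocks per week
week1 = ternary_lattice_nodes(42)              # nodes needed for the first week
week2 = ternary_lattice_nodes(2 * 42) - week1  # additional nodes for the second week
quarter_fine = ternary_lattice_nodes(13 * 42)  # a quarter of a year, block by block

# coarse weekly decision scale: 13 weekly steps plus the root
quarter_coarse = ternary_lattice_nodes(14)
arcs = ternary_lattice_arcs(14)                # intra-week segments requiring interpolation
```

These reproduce the counts in the text: 1764 and 5292 nodes for the first two weeks, 298,116 (about 300 thousand) nodes for a quarter on the fine scale, versus 196 nodes and 507 arcs on the weekly scale.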
The multiscale modeling approach of the present paper proved to be very useful for this practically-sized problem. In fact, the power plant model, as it was presented to us by our industry partner, turned out to be of such a tractable form that expected intra-week costs could even be calculated by an analytical formula, based on the derived bridge process dynamics for the underlying uncertainty process. If a simulation is required instead, this obviously slows down the computation. Still, the approach allows for a scenario-wise decomposition with respect to the decision time scale, i.e., of the tree/lattice model. Thus, computational tractability is typically not limited by the multiscale feature of a problem when our modeling approach is applied.
The studied valuation problem involves an extensive model of the power plant and is based on real data provided to us by the operating energy company. Thus, we refer the reader to our separate paper [44] for all the details.
Conclusion

In this paper, we have considered multistage stochastic optimization problems with encapsulated multiperiod random costs and named the subject of the study of this problem class multiscale stochastic optimization. The suggested approach is based on a separation between the (standard) multistage decision problem and the problem of determining path-dependent costs between two consecutive decisions. The paper contains contributions to both parts. One section was dedicated to the construction of scenario lattices as a discrete structure representing a time-homogeneous Markovian diffusion model. In particular, we examined a Markov-chain approximation approach and showed that the approximation error with respect to the optimal value of a generic multistage stochastic optimization problem can be controlled with the suggested methodology. In a second part, we suggested leveraging the theory of stochastic bridges in order to tackle the embedded multiperiod problem, which takes place on a much finer time scale than the decision scale. We elaborated explicitly several examples of popular diffusion models and proposed a new simulation algorithm for compound Poisson bridges. A simple multiscale inventory control problem finally served to illustrate the proposed methodology and to discuss it in the context of a concrete example. Moreover, we reported on an implementation as part of a real-world industrial project, where our approach turned out to be very convenient. The latter may be seen as a proof of concept.