Dynamic probabilistic constraints under continuous random distributions

The paper investigates analytical properties of dynamic probabilistic constraints (chance constraints). The underlying random distribution is supposed to be continuous. In the first part, a general multistage model with decision rules depending on past observations of the random process is analyzed. Basic properties like (weak sequential) (semi-) continuity of the probability function or existence of solutions are studied. It turns out that the results differ significantly according to whether decision rules are embedded into Lebesgue or Sobolev spaces. In the second part, the simplest meaningful two-stage model with decision rules from L^2 is investigated. More specific properties like Lipschitz continuity and differentiability of the probability function are considered. Explicitly verifiable conditions for these properties are provided along with explicit gradient formulae in the Gaussian case. The application of such formulae in the context of necessary optimality conditions is discussed and a concrete identification of solutions presented.


Overview
The application of probabilistic constraints (or: chance constraints) to engineering problems and their numerical solution is nowadays standard. (This work is dedicated to the memory of Shabbir Ahmed.) Introduced by Charnes et al. [5] in a simple form (individual constraints) in 1958, their systematic theoretical and algorithmic investigation was pioneered by Prékopa and his students starting in the Seventies (see [15] and references therein). The typical form of a probabilistic constraint is the inequality

    P( g(x, ξ) ≤ 0 ) ≥ p,    (1)

where x is a decision vector, ξ is a random vector, P a probability measure and g a random constraint mapping with finitely many components. The meaning of (1) is to define a decision x as feasible if the random inequality system g(x, ·) ≤ 0 is satisfied at least with probability p ∈ (0, 1]. A modern theoretical treatment of probabilistic constraints can be found in the monograph [16, chapter 4]. The algorithmic solution of optimization problems subject to constraints (1) has been tremendously advanced within the last twenty years. Rather than providing a detailed list of references here, we want to emphasize the contribution to this development by Shabbir Ahmed (e.g., [12,13]). At the same time, the traditional model (1) has been extended to broader settings such as PDE constrained optimization ([6,7,9]) or infinite random inequality systems (probust constraints, [17]). A challenge of a different nature consists in considering dynamic aspects in probabilistic constraints. Observe that (1) is a static model by nature: the decision x ('here-and-now decision') has to be taken before the randomness ξ is observed. Such a model would apply, for instance, in the design of a mechanical construction (encoded by x) which is done once and for all and has to resist unknown future random forces ξ with high probability. Many decisions, however, are time dependent. The components of x and ξ could refer to discrete time decision and random processes, respectively.
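For intuition, feasibility in the static model (1) can be checked by simple Monte Carlo sampling. The constraint mapping g, the distribution of ξ and the level p below are illustrative assumptions, not data from the paper; this is only a minimal sketch of what (1) means operationally.

```python
import numpy as np

def chance_constraint_satisfied(x, g, xi_samples, p):
    """Monte Carlo estimate of P(g(x, xi) <= 0) and feasibility check for (1).

    g(x, xi_samples) returns an (N, m) array; the event requires that all m
    components of the random inequality system hold jointly.
    """
    vals = g(x, xi_samples)
    prob = float(np.mean(np.all(vals <= 0.0, axis=1)))
    return prob, prob >= p

rng = np.random.default_rng(42)
xi = rng.standard_normal((100_000, 2))   # assumption: xi ~ N(0, I_2)

# Illustrative mapping g(x, xi) = xi - x, i.e. the event {xi_i <= x_i, i = 1, 2};
# by independence the exact probability is Phi(x_1) * Phi(x_2).
g = lambda x, xi: xi - x
prob, feasible = chance_constraint_satisfied(np.array([1.5, 1.5]), g, xi, p=0.8)
```

With 100,000 samples the estimate is accurate to roughly three decimals, which is enough to decide feasibility for levels p that are not too close to the true probability.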
In the control of a hydro reservoir, for instance, one is faced with an alternating sequence of decisions x_t (referring to water release) and realizations of randomness ξ_t (water inflow) according to the chronology

    x_1 → ξ_1 → x_2 → ξ_2 → ··· → x_T (→ ξ_T).    (2)

Whether this sequence ends with a decision (final recourse action) or with the observation of randomness without the possibility of finally reacting to it depends on the choice between a model of multistage stochastic optimization and one of multistage probabilistic programming. This distinction requires some care, because sometimes the term 'two-stage probabilistic constraint' is used for the addition of a probabilistic constraint (relaxing the almost sure existence of a recourse action) in a setting of two-stage stochastic programming. Such a model was first considered in [14] and is still of much interest (e.g., [11]). Here, the chronology is the one of (2) with T = 2 (without the final term in parentheses): x_1 → ξ_1 → x_2, i.e., it is a special two-stage stochastic optimization problem. In our understanding, it is not a two-stage probabilistic constraint, which would end with the term in parentheses in (2): x_1 → ξ_1 → x_2 → ξ_2. In this way one obtains a logical generalization of conventional one-stage (static) probabilistic constraints of type x_1 → ξ_1 and keeps the idea that in a probabilistic constraint one is always faced with a final unknown realization of some random vector. This idea follows a remark in [8]: '... a well-formed probabilistic constraint contains at least one coefficient that depends on a random variable realized after the last decision is taken'.
It is clear that in (2) the dynamic character of the decision making process expresses itself by assuming that all decisions are functions of past observations, in order to take advantage of the gain of information obtained from the realizations of the random vector. Hence, instead of static (constant) decisions x_t, one admits decision rules or policies x_2(ξ_1), x_3(ξ_1, ξ_2), etc. When considering continuously distributed random vectors, this approach takes the problem to infinite dimensions even though time is discrete, because policies are elements of appropriate function spaces. One may circumvent this difficulty by restricting policies to a parameterized class, linear decision rules in the simplest case. Then one gets back to a static problem where the decisions are the parameters of the policies. Several aspects of modeling linear decision rules in the context of (linear) multistage probabilistic constraints are discussed in [10]. It is not guaranteed, however, that the chosen class contains the optimal policy. Another idea to reduce the problem to a finite-dimensional one would consist in a discrete approximation of the random distribution. A conceptual framework for dealing with dynamic probabilistic constraints without restricting the class of policies and keeping the continuous character of the given (multivariate Gaussian) distribution was presented in [4], along with applications to two- and three-stage probabilistic control of a water reservoir. Using stochastic dynamic programming rather than direct nonlinear programming, a similar problem was later analyzed and numerically solved in [2] for a significantly larger number of stages, however with a discrete random distribution.
The focus in this paper is not on the numerical solution of problems subject to dynamic probabilistic constraints but rather on analytical properties of the arising probability function. Here we assume the underlying random distribution to be continuous and keep the decision rules general as elements of some Lebesgue or Sobolev space. In Sect. 2, a general multistage model is analyzed. Basic properties like (weak sequential) (semi-) continuity of the probability function or existence of solutions are studied. In Sect. 3, the simplest meaningful two-stage model with decision rules from L 2 is investigated. More specific properties like Lipschitz continuity and differentiability of the probability function are considered. Explicitly verifiable conditions for these properties are provided along with explicit gradient formulae in the Gaussian case. The application of such formulae in the context of necessary optimality conditions is discussed and a concrete identification of solutions presented.

The general setting
In this paper we study optimization problems of the type

    min { f(x) | ϕ(x) ≥ p, x ∈ C }.    (3)

Here, the space of decisions X is one of the following Lebesgue or Sobolev spaces with q ∈ [1, ∞):

    X := R × L^q(R) × L^q(R^2) × ··· × L^q(R^{T−1}),
    X_1 := R × W^{1,q}(R) × W^{1,q}(R^2) × ··· × W^{1,q}(R^{T−1}).

The subset C ⊆ X (or C ⊆ X_1) is meant to represent some abstract constraint on the decision, e.g., nonnegativity or bounds for the components. The focus of our attention will be on the inequality constraint ϕ(x) ≥ p, which we will assume to represent a so-called joint dynamic chance constraint. More precisely, p ∈ (0, 1] is some given safety level and ϕ : X → [0, 1] denotes a probability function defined for x ∈ X as

    ϕ(x) := P( h_i( x_1, x_2(ξ_1), . . . , x_T(ξ_1, . . . , ξ_{T−1}), ξ ) ≤ 0  (i = 1, . . . , k) ),    (4)

where h_i : R^T × R^T → R and ξ := (ξ_1, . . . , ξ_T) is a T-dimensional discrete time process on some probability space (Ω, A, P). Observe that, with each component x_t of the decision x depending on the past outcomes (ξ_1, . . . , ξ_{t−1}) only, x represents an adapted decision process. We endow X and X_1 with the maximum norm with respect to the usual norms in the coordinate spaces. Doing so, X and X_1 are Banach spaces.

A motivating example
To illustrate applications of problem (3), we present a decision management optimization problem on a single water reservoir for hydroelectricity generation. Given a set of future time intervals 1, 2, . . . , T, the problem of the operator is to decide on an optimal release policy (x_1, . . . , x_T) of water, considering technical, economical and environmental aspects. By ξ = (ξ_1, . . . , ξ_T) we denote the random vector indicating the stochastic water inflow (e.g., precipitation, snow melt) to the reservoir at the corresponding time intervals. The main role of the reservoir is to generate electricity. At the same time, lower and upper limits l_*, l^* for the water level have to be satisfied in the reservoir, say for flood protection or for ecological reasons. By the random nature of the inflows, the time dependent water level l_t(x, ξ) induced by the controlled water release x is a random variable too. Hence, the mentioned limits cannot be satisfied in a deterministic way. Rather, it is reasonable to impose them in a probabilistic way:

    P( l_* ≤ l_t(x, ξ) ≤ l^*  (t = 1, . . . , T) ) ≥ p.    (5)

Here, p ∈ [0, 1] denotes a probability level at which the random constraints are supposed to hold true. The current water level after time interval t is clearly given as the initial level plus the cumulated inflow minus the cumulated release so far:

    l_t(x, ξ) = l_0 + Σ_{τ=1}^{t} (ξ_τ − x_τ).

Sometimes, one decides on the future water release in complete ignorance of future water inflow. This is the case, for instance, in day-ahead markets, when energy production (water release) for each hour of the next day is fixed one day ahead. Then, decisions are just scalars for each time interval and the probabilistic constraint (5) becomes

    P( l_* ≤ l_0 + Σ_{τ=1}^{t} (ξ_τ − x_τ) ≤ l^*  (t = 1, . . . , T) ) ≥ p    (x_1, . . . , x_T ∈ R).    (6)

Such a static model does not take into account the temporal gain of information while the random inflow process unfolds. In longer term planning problems one therefore admits from the beginning that future decisions on water release are functions of past observations of the random inflow. Hence, rather than deciding on scalars x_1, . . . , x_T, one is looking for functions x_1, x_2(·), x_3(·, ·), . . ., so-called policies. In this dynamic setting better solutions of the underlying optimization problem can be expected (the static model being included as a special case with constant policies, e.g., x_2(·) ≡ x_2 etc.). Hence, we adjust our static chance constraint above to a dynamic one, where x ∈ X, X_1:

    P( l_* ≤ l_0 + Σ_{τ=1}^{t} ( ξ_τ − x_τ(ξ_1, . . . , ξ_{τ−1}) ) ≤ l^*  (t = 1, . . . , T) ) ≥ p.

A possible objective in a corresponding optimization problem might consist in the maximization of the expected overall water release (representing the amount of energy produced):

    E( Σ_{t=1}^{T} x_t(ξ_1, . . . , ξ_{t−1}) ).

Then, the optimization problem is of the form (3) with the probability function ϕ defined in (4) via the constraint mapping h : R^T × R^T → R^{2T}. The latter has k := 2T components

    h_t(u, z) := l_* − l_0 − Σ_{τ=1}^{t} (z_τ − u_τ),    h_{T+t}(u, z) := l_0 + Σ_{τ=1}^{t} (z_τ − u_τ) − l^*    (t = 1, . . . , T).
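The reservoir dynamics above are easy to simulate. The following sketch estimates, by Monte Carlo, the joint probability that the water level stays within its bounds under an adapted policy; all numerical data (initial level, bounds, inflow distribution) and the simple decision rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def level_probability(policy, inflows, l0, l_low, l_up):
    """Estimate P(l_low <= l_t(x, xi) <= l_up, t = 1..T) by Monte Carlo.

    policy(t, past_inflows) returns the release x_t as a function of the
    inflows observed so far (this encodes the adaptedness of the decisions).
    """
    N, T = inflows.shape
    ok = np.ones(N, dtype=bool)
    level = np.full(N, l0, dtype=float)
    for t in range(T):
        past = inflows[:, :t]                    # xi_1, ..., xi_{t-1}
        release = policy(t, past)
        level = level + inflows[:, t] - release  # l_t = l_{t-1} + xi_t - x_t
        ok &= (l_low <= level) & (level <= l_up)
    return float(ok.mean())

rng = np.random.default_rng(0)
inflows = rng.normal(loc=2.0, scale=0.5, size=(50_000, 3))

# A made-up adapted policy: release the long-run mean inflow first, then the
# average of the inflows observed so far.
def policy(t, past):
    if t == 0:
        return 2.0
    return past.mean(axis=1)

prob = level_probability(policy, inflows, l0=5.0, l_low=3.0, l_up=7.0)
```

Replacing `policy` by a constant-release rule recovers the static model (6) as a special case.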

Basic structural properties of the general model
In this section we are going to collect some basic structural properties of the chance constraint ϕ(x) ≥ p in (3) and of the involved probability function ϕ in (4). For convenience, we introduce the notation u_{[i]} := (u_1, . . . , u_i) for vectors u ∈ R^n and 1 ≤ i ≤ n. With the policy x ∈ X we associate the joint policy (whose components have a common domain) as a mapping [x] : R^T → R^T defined by

    [x](z) := ( x_1, x_2(z_{[1]}), . . . , x_T(z_{[T−1]}) ),

with the convention x_1(z_{[0]}) = x_1. Finally, we introduce the maximum function related to the mapping h:

    h_max(u, z) := max_{i=1,...,k} h_i(u, z).    (8)

Then, the probability function in (4) can be compactly written as

    ϕ(x) = P( h_max([x](ξ), ξ) ≤ 0 ).    (9)

We first check that this expression is well defined. In order to ensure this, we make the following basic assumptions on (4) throughout this paper:

    (BA)  the components h_i of h are Borel measurable and ξ possesses a density f_ξ with respect to the Lebesgue measure,

so that it is justified to speak of the probability of the event appearing in the definition of (9). It remains to show that this probability is independent of the representative of x ∈ X. To see this, let x^{(1)}, x^{(2)} ∈ X be such that x_1^{(1)} = x_1^{(2)} and

    x_t^{(1)}(z) = x_t^{(2)}(z)  for all z ∈ B_{t−1}  (t = 2, . . . , T),

where B_t ⊆ R^t are Lebesgue measurable subsets with λ^t(R^t \ B_t) = 0 (λ^t is the Lebesgue measure on R^t). Define

    C := ∪_{t=2}^{T} { z ∈ R^{T−1} | z_{[t−1]} ∉ B_{t−1} }.

Then, λ^{T−1}(C) = 0 and [x^{(1)}](z) = [x^{(2)}](z) for all z ∈ R^T with z_{[T−1]} ∉ C. Since ξ possesses a density f_ξ, it follows from (9) that

    ϕ(x^{(1)}) = ∫ χ_{{h_max([x^{(1)}](z), z) ≤ 0}}(z) f_ξ(z) dz = ∫ χ_{{h_max([x^{(2)}](z), z) ≤ 0}}(z) f_ξ(z) dz = ϕ(x^{(2)}).
This shows that the value of ϕ does not depend on the representative of x ∈ X. We will commence our analysis with some (lower) semicontinuity properties and then derive consequences later on. The following proposition turns out to be a crucial technical tool in this context.

Proposition 1 In addition to the basic assumptions (BA), let h have components h_i which are lower semicontinuous in their first argument vector. Consider a sequence x^{(n)} ⊆ X and some x ∈ X such that x_1^{(n)} → x_1 and x_t^{(n)} → x_t almost everywhere (t = 2, . . . , T). Then,

    limsup_n ϕ(x^{(n)}) ≤ ϕ(x).    (10)

Moreover, if h has components h_i which are upper semicontinuous in their first argument vector and in addition

    λ^T( { z ∈ R^T | h_max([x](z), z) = 0 } ) = 0,    (11)

then

    liminf_n ϕ(x^{(n)}) ≥ ϕ(x).    (12)

Proof We start with the first assertion (10). The function h_max in (8) is lower semicontinuous in its first argument vector because the h_i are assumed to be so. By assumption, we have that x_1^{(n)} → x_1 and x_t^{(n)}(z) → x_t(z) for all z ∈ B_{t−1} (t = 2, . . . , T), where B_t ⊆ R^t are Lebesgue measurable with λ^t(R^t \ B_t) = 0. Without loss of generality (by passing to a superset whose difference with B_t has Lebesgue measure zero), we may assume that the B_t are Borel measurable. Repeating the construction from the beginning of this section, we find a subset C ⊆ R^{T−1} which now is Borel measurable and is such that λ^{T−1}(C) = 0 and [x^{(n)}](z) → [x](z) for all z with z_{[T−1]} ∉ C. Consider the event sets

    A_n := { ω ∈ Ω | h_max([x^{(n)}](ξ(ω)), ξ(ω)) ≤ 0 },
    A := { ω ∈ Ω | h_max([x](ξ(ω)), ξ(ω)) ≤ 0 },
    Γ := { ω ∈ Ω | ξ_{[T−1]}(ω) ∉ C }.

Fix an arbitrary ω ∈ (Ω \ A) ∩ Γ. Then, the lower semicontinuity of h_max in its first argument vector yields that

    liminf_n h_max([x^{(n)}](ξ(ω)), ξ(ω)) ≥ h_max([x](ξ(ω)), ξ(ω)) > 0.

Denote by χ_Q the characteristic function of a set Q. The relation above then implies

    χ_{A_n}(ω) → 0  for all ω ∈ (Ω \ A) ∩ Γ.    (13)

In other words, since P(Γ) = 1, χ_{A_n} converges pointwise P-almost surely to χ_A on the set Ω \ A. Since χ_{A_n} ≤ 1, the dominated convergence theorem provides that ∫_{Ω\A} χ_{A_n} dP → 0. Now, let x^{(n_l)} be a subsequence realizing the limsup in (10) as a limit. Then, in view of the relation above, we arrive at (10):

    limsup_n ϕ(x^{(n)}) = lim_l P(A_{n_l}) = lim_l ( ∫_A χ_{A_{n_l}} dP + ∫_{Ω\A} χ_{A_{n_l}} dP ) ≤ P(A) = ϕ(x).

As for (12), observe first that, with the components h_i being upper semicontinuous in their first argument vector, h_max is upper semicontinuous in its first argument vector, and hence −h_max is lower semicontinuous in it. Denote by φ̃ the probability function in (9) associated with −h_max rather than with h_max. Then, by the just proven relation (10), we have that

    limsup_n φ̃(x^{(n)}) ≤ φ̃(x).

It now follows that

    liminf_n ϕ(x^{(n)}) ≥ liminf_n ( 1 − φ̃(x^{(n)}) ) = 1 − limsup_n φ̃(x^{(n)}) ≥ 1 − φ̃(x) = P( h_max([x](ξ), ξ) < 0 ).

From (11) and the basic assumption (BA) that ξ possesses a density, we infer that

    P( h_max([x](ξ), ξ) = 0 ) = 0.

Hence, we may continue the previous chain of (in)equalities, in order to arrive at (12):

    liminf_n ϕ(x^{(n)}) ≥ P( h_max([x](ξ), ξ) < 0 ) = ϕ(x) − P( h_max([x](ξ), ξ) = 0 ) = ϕ(x).

The following lemma will allow us to derive from Proposition 1 the announced (semi)continuity properties for ϕ. We do not claim that this lemma is new but are not able to provide a reference.
Lemma 1 Consider a sequence x (n) ⊆ X 1 which converges weakly to x ∈ X 1 . Then, there exists a subsequence x (n k ) which converges almost everywhere to x.
Proof Consider {x^{(n)}} ⊆ X_1 which converges weakly to x ∈ X_1. Since our space X_1 is a product space, it is enough to prove that each coordinate has a subsequence with the desired property.
Let us fix i ∈ {2, . . . , T} (the case i = 1 is trivial). For simplicity of notation, let us denote f_n := x_i^{(n)} and f := x_i. For r ∈ N\{0}, define the domain U_r := B_r ⊆ R^{i−1}, the Euclidean ball centered at zero with radius r. The restrictions of f_n and f to U_r belong to W^{1,1}(U_r) and, since U_r is bounded, f_n and f belong to W^{1,q}(U_r). Now, by the Rellich-Kondrachov Theorem (see, e.g., [1, Theorem 6.3, Part I] and [1, p. 84]) we can extract a subsequence f_{n_k} which converges in norm and almost everywhere to some z ∈ L^1(U_r). Moreover, since f_{n_k} also converges weakly to f, we have that z = f almost everywhere on U_r. Finally, using induction and a diagonal argument, we are done.

Theorem 1 In addition to the basic assumptions (BA), suppose that h in (4) has components h_i which are lower semicontinuous in their first argument vector (related with x). Then, ϕ : X → [0, 1] is upper semicontinuous in the norm topology of X and its restriction ϕ|_{X_1} : X_1 → [0, 1] is sequentially upper semicontinuous with respect to the weak topology of X_1. If, in addition, the components h_i are upper semicontinuous in their first argument vector and (11) is satisfied, then ϕ : X → [0, 1] is lower semicontinuous in the norm topology of X and its restriction ϕ|_{X_1} is sequentially lower semicontinuous with respect to the weak topology of X_1.

Proof Let x^{(n)} → x converge in the norm topology of X and let x^{(n_k)} be a subsequence realizing limsup_n ϕ(x^{(n)}) as a limit. It is well known that there exists a further subsequence x^{(n_{k_l})} converging almost everywhere to x (see, e.g., [3, Theorem 13.6]). Then, by (10),

    limsup_n ϕ(x^{(n)}) = lim_l ϕ(x^{(n_{k_l})}) ≤ ϕ(x),    (14)

which shows the upper semicontinuity of ϕ in the norm topology of X. Next, let x^{(n)} ⊆ X_1 be a sequence weakly converging to some x ∈ X_1. Then, repeating the previous argument, this time justifying almost everywhere convergence of a subsequence on the basis of Lemma 1, we derive in the same way inequality (14), thus proving the sequential upper semicontinuity of ϕ|_{X_1} with respect to the weak topology of X_1.
Under the additional assumption (11), the same argumentation as above can be repeated along with (12), in order to derive the remaining assertions.

Corollary 1 Let

    M(p) := { x ∈ X | ϕ(x) ≥ p },    M_1(p) := { x ∈ X_1 | ϕ(x) ≥ p }

denote the sets of feasible decisions in problem (3) defined by the dynamic probabilistic constraint. In addition to the basic assumptions (BA), suppose that h in (4) has components h_i which are lower semicontinuous in their first argument vector (related with x). Then, M(p) is strongly closed in X and M_1(p) is weakly sequentially closed in X_1.
Corollary 2 In addition to the basic assumptions (BA) and to condition (11) suppose that h in (4) has components h i which are continuous in their first argument vector (related with x). Then, ϕ : X → [0, 1] defined in (4) is continuous in the norm topology of X . Its restriction ϕ| X 1 : X 1 → [0, 1] is sequentially continuous with respect to the weak topology of X 1 .
We are now in a position to prove with standard arguments the existence of solutions to problem (3).

Theorem 2 In addition to the basic assumptions (BA), suppose that
1. q ∈ (1, ∞);
2. C ⊆ X_1 is convex, bounded and closed;
3. f is weakly sequentially lower semicontinuous;
4. h in (4) has components h_i which are lower semicontinuous in their first argument vector;
5. the feasible set { x ∈ C | ϕ(x) ≥ p } of (3) is nonempty.
Then, problem (3), posed in the space X_1, admits a solution.

Proof As a consequence of 1., X_1 is a reflexive Banach space. Therefore, 2. implies that C is weakly sequentially compact. By 4. and Corollary 1, the set M_1(p) = { x ∈ X_1 | ϕ(x) ≥ p } is weakly sequentially closed. Hence, with 5., the feasible set of (3) is nonempty and weakly sequentially compact. Now, with 3., the Weierstrass Theorem guarantees the existence of a solution to (3).
The following example illustrates that, under the assumptions of Corollary 1, M(p) cannot be expected to be weakly sequentially closed in X (in contrast with M_1(p) in X_1), and that therefore existence of solutions as in Theorem 2 cannot be expected in the space X.

Example 1 The h_i are continuous, so that our basic assumptions (BA) are satisfied and Corollary 1 guarantees that M(p) is strongly closed in X. Now define a sequence x^{(n)} weakly converging to x := (4π, χ_{[2π,4π]}). Moreover, by the definition of h and ξ and by (4), it holds that ϕ(x^{(n)}) ≥ p, and therefore x^{(n)} ∈ M(p). On the other hand, ϕ(x) < p, so it follows that x ∉ M(p), whence M(p) fails to be weakly sequentially closed.
We finish this section by briefly addressing the issue of convexity of the feasible set defined by the probabilistic constraint ϕ(x) ≥ p in (3). Assume first that we deal with a joint static probabilistic constraint, which means that the decision policies x are supposed to be constants (7). Assume further that ξ has a logconcave density (e.g., multivariate Gaussian) and that the mapping h is affine linear: h(x, z) = Ax + Bz + b. This is the case, for instance, for the reservoir problem with static probabilistic constraint (6). Then, thanks to a result by Prékopa [15, Th. 10.2.1.], the inequality ϕ(x) ≥ p defines a convex set of feasible decisions x for any right-hand side probability level p. Unfortunately, a similar convexity result gets lost in the dynamic setting. Indeed, we may revisit Example 1, where the density of the given uniform distribution is constant on the rectangle and zero outside, hence logconcave (in the extended-valued sense). Moreover, the mapping h(x, z) = z − x is linear. As for the feasible set M(p) := { x ∈ X | ϕ(x) ≥ p }, we have seen in Example 1 that it is strongly closed but fails to be weakly sequentially closed. If it were convex, then closedness would imply weak closedness, hence weak sequential closedness, a contradiction.

Properties of the probability function in a simple two-stage model
In this section, we are going to investigate analytical properties (continuity, Lipschitz continuity, differentiability including explicit derivatives) of the probability function ϕ in (4) in the framework of the simplest meaningful dynamic setting. More precisely, we consider the two-stage model (T = 2) with the following joint and separated probabilistic constraint:

    ϕ(x) = ϕ(x_1, x_2) = P( ξ_1 ≤ x_1, ξ_2 ≤ x_2(ξ_1) ).    (15)

This corresponds to the choice of the mapping h(u, z) := (z_1 − u_1, z_2 − u_2) in (4). We will choose X with index q = 2 to be the base space of decisions, which means that x_2 ∈ L^2(R). In all results hereafter, we shall explicitly work with a given density of ξ. By continuity of h, our basic assumptions (BA) are then automatically satisfied.
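Before turning to the analysis, a quick numerical sketch of the probability function (15) may help. The special case below assumes independent standard Gaussian components, so that ϕ(x_1, x_2) = ∫_{−∞}^{x_1} Φ(x_2(r)) f(r) dr; the integration bounds and grid are illustrative choices.

```python
import numpy as np
from math import erf, sqrt, pi

# Standard normal cdf and pdf (assumption: xi has independent N(0,1) components).
Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
f = lambda t: np.exp(-t * t / 2.0) / sqrt(2.0 * pi)

def phi(x1, x2, lo=-8.0, n=20_001):
    """Trapezoidal approximation of phi(x1, x2) for a policy x2: R -> R."""
    r = np.linspace(lo, x1, n)
    y = np.array([Phi(x2(t)) for t in r]) * f(r)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(r)) / 2.0)

# A constant policy x2 = 0 recovers the static case: phi = Phi(0) * Phi(x1).
val = phi(1.0, lambda t: 0.0)
```

The constant-policy check gives a simple way to validate the quadrature before feeding in genuine decision rules.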

Continuity and Lipschitz continuity
Proposition 2 If ξ has a density, then the probability function ϕ : R × L 2 (R) → R is continuous.
Proof Since h is continuous, it suffices by Corollary 2 to check condition (11) at an arbitrary x ∈ X. Since { z | h_max([x](z), z) = 0 } ⊆ { z | z_1 = x_1 } ∪ { z | z_2 = x_2(z_1) }, it suffices to verify that both sets on the right-hand side have λ^2-measure zero. For the first component h_1 of h the condition reads as

    λ^2( { z ∈ R^2 | z_1 = x_1 } ) = 0,

which is evidently true. For the second component we observe, using Fubini's Theorem, that

    λ^2( { z ∈ R^2 | z_2 = x_2(z_1) } ) = ∫_R λ^1( { x_2(z_1) } ) dz_1 = 0,

because each section is a singleton.
Before extending the previous result on continuity to the stronger Lipschitz continuity, we introduce the following two assumptions on the density g_ξ of a two-dimensional random vector ξ:

    g_{ξ_1} := ∫_R g_ξ(·, s) ds ∈ L^∞(R),    (16)
    sup_{s∈R} g_ξ(·, s) ∈ L^2(R).    (17)

Note that (16) means that the first marginal density of ξ (which is the density g_{ξ_1} of the first component of ξ) is bounded.
Proposition 3 If the density g_ξ of ξ satisfies (16) and (17), then the probability function ϕ in (15) is Lipschitz continuous.

Proof Consider an arbitrary couple x, y ∈ X. We start with the obvious estimate

    |ϕ(x) − ϕ(y)| ≤ |ϕ(x_1, x_2) − ϕ(y_1, x_2)| + |ϕ(y_1, x_2) − ϕ(y_1, y_2)|.    (18)

Without loss of generality, assume that x_1 ≤ y_1. Now, by (15), and taking into account assumption (16), we have that

    |ϕ(x_1, x_2) − ϕ(y_1, x_2)| = ∫_{x_1}^{y_1} ∫_{−∞}^{x_2(r)} g_ξ(r, s) ds dr ≤ ∫_{x_1}^{y_1} g_{ξ_1}(r) dr ≤ ‖g_{ξ_1}‖_∞ |x_1 − y_1|.

Likewise, exploiting (17), the fact that x_2, y_2 ∈ L^2(R) and the Cauchy-Schwarz inequality, we obtain

    |ϕ(y_1, x_2) − ϕ(y_1, y_2)| ≤ ∫_{−∞}^{y_1} | ∫_{y_2(r)}^{x_2(r)} g_ξ(r, s) ds | dr ≤ ∫_R |x_2(r) − y_2(r)| sup_{s∈R} g_ξ(r, s) dr ≤ ‖sup_{s∈R} g_ξ(·, s)‖_{L^2} ‖x_2 − y_2‖_{L^2}.

Along with (18), we conclude that ϕ is Lipschitz continuous.

The following example shows that the assumptions of Proposition 3 are not strong enough to guarantee the differentiability of ϕ.

Example 2 Let ξ have a bivariate standard Gaussian distribution (uncorrelated components with mean zero and unit variance). By Proposition 6, the assumptions (16) and (17) of Proposition 3 are satisfied and, hence, ϕ is Lipschitz continuous. On the other hand, ϕ fails to be differentiable. To see this, we fix x̄_2 := χ_{[0,1]} ∈ L^2(R) and observe that the partial real function φ̄(x_1) := ϕ(x_1, x̄_2) fails to be differentiable. Indeed, the following explicit representation can be immediately verified, where Φ refers to the cumulative distribution function of the one-dimensional standard Gaussian distribution:

    φ̄(x_1) = Φ(0)Φ(x_1)  for x_1 ≤ 0,
    φ̄(x_1) = Φ(0)^2 + Φ(1)( Φ(x_1) − Φ(0) )  for 0 ≤ x_1 ≤ 1,
    φ̄(x_1) = Φ(0)^2 + Φ(1)( Φ(1) − Φ(0) ) + Φ(0)( Φ(x_1) − Φ(1) )  for x_1 ≥ 1.

The graph of this function is shown in Fig. 1. Clearly, φ̄ is Lipschitz continuous because ϕ is so. On the other hand, it fails to be differentiable at x_1 = 0 and x_1 = 1. This can be seen for x_1 = 0, for instance, by deriving the first two expressions above at 0. With f denoting the density of the standard Gaussian distribution, the derivative of the first expression, yielding the left directional derivative of φ̄ at 0, gives Φ(0) f(0). On the other hand, the derivative of the second expression, yielding the right directional derivative of φ̄ at 0, gives Φ(1) f(0) ≠ Φ(0) f(0). We shall see in the next section that the reason for the failure of differentiability of ϕ in Example 2 is the discontinuity of the second stage policy x̄_2 := χ_{[0,1]} at which the derivative is considered.
More precisely, this circumstance concerns just the partial differentiability of ϕ with respect to its first argument x 1 , whereas the partial differentiability of ϕ with respect to x 2 remains unaffected by a possible discontinuity of x 2 .
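The kink of Example 2 can be reproduced numerically: the one-sided difference quotients of φ̄ at x_1 = 0 approach the two distinct values Φ(0) f(0) ≈ 0.1995 and Φ(1) f(0) ≈ 0.3357. The quadrature grid and step size below are illustrative choices.

```python
import numpy as np
from math import erf, sqrt, pi

# Example 2 setup: independent standard Gaussian components,
# second-stage policy x2 = indicator function of [0, 1].
Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))

def phi_bar(x1, n=100_001):
    """phi_bar(x1) = int_{-inf}^{x1} Phi(chi_[0,1](r)) f(r) dr (trapezoidal)."""
    r = np.linspace(-8.0, x1, n)
    y = np.where((r >= 0.0) & (r <= 1.0), Phi(1.0), Phi(0.0)) \
        * np.exp(-r * r / 2.0) / sqrt(2.0 * pi)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(r)) / 2.0)

h = 1e-2
left = (phi_bar(0.0) - phi_bar(-h)) / h    # one-sided quotient from the left
right = (phi_bar(h) - phi_bar(0.0)) / h    # one-sided quotient from the right
```

The gap `right - left` ≈ (Φ(1) − Φ(0)) f(0) quantifies the nondifferentiability caused by the jump of the policy at 0.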

Differentiability
Before verifying the partial differentiability of ϕ with respect to its first argument, we shall prove the following

Lemma 2 Let a bivariate probability density g satisfy the following technical (uniform calmness) condition: for every r̄ ∈ R there exist ε > 0 and l ∈ L^1(R) such that

    |g(r, s) − g(r′, s)| ≤ l(s) |r − r′|  for all r, r′ ∈ (r̄ − ε, r̄ + ε) and almost every s ∈ R.    (19)

Assume further that f : R → R is continuous. Then, the function

    α(r) := ∫_{−∞}^{f(r)} g(r, s) ds

is finite-valued and continuous.
Proof Fix an arbitrary r̄ ∈ R and consider an arbitrary sequence r_n → r̄. We are going to show that α(r_n) → α(r̄). We observe first that f(r_n) → f(r̄) by the continuity of f. Therefore, by (19), for n large enough and almost every s ∈ R,

    g(r_n, s) ≤ l(s) |r_n − r̄| + g(r̄, s).

We show that g(r̄, ·) ∈ L^1(R): Indeed, as g ∈ L^1(R^2) (being a probability density), Fubini's Theorem yields that g(r, ·) ∈ L^1(R) for almost every r ∈ R. Hence, there exists some r̃ ∈ (r̄ − ε, r̄ + ε), with ε from (19), such that g(r̃, ·) ∈ L^1(R) and

    g(r̄, s) ≤ l(s) |r̄ − r̃| + g(r̃, s)  a.e. s ∈ R.

Consequently, g(r̄, ·) ∈ L^1(R), so that α(r̄) is finite, and the dominated convergence theorem applied to the integrands above yields α(r_n) → α(r̄).
The preceding Lemma allows us to formulate the desired result on partial differentiability of ϕ with respect to its first argument:

Proposition 4
Let the density g_ξ of ξ satisfy (19) and fix x̄_2 ∈ L^2(R) such that x̄_2 is continuous. Then, the partial derivative of ϕ with respect to x_1 exists at any (x̄_1, x̄_2) and equals

    (∂ϕ/∂x_1)(x̄_1, x̄_2) = ∫_{−∞}^{x̄_2(x̄_1)} g_ξ(x̄_1, s) ds.

Moreover, it depends continuously on x_1.

Proof Let x̄_1 be arbitrary. By (15), we have that

    ϕ(x_1, x̄_2) = ∫_{−∞}^{x_1} ∫_{−∞}^{x̄_2(r)} g_ξ(r, s) ds dr = ∫_{−∞}^{x_1} α(r) dr,

with α defined in Lemma 2 upon setting f(r) := x̄_2(r) and g := g_ξ. Since x̄_2 is supposed to be continuous, the assumptions of Lemma 2 are satisfied. Thus, taking into account that α is continuous according to Lemma 2, we arrive at

    (∂ϕ/∂x_1)(x̄_1, x̄_2) = α(x̄_1) = ∫_{−∞}^{x̄_2(x̄_1)} g_ξ(x̄_1, s) ds.

Continuity of (∂ϕ/∂x_1)(·, x̄_2) = α follows once more from the continuity of α.
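The formula of Proposition 4 can be checked numerically. In the special case of independent standard Gaussian components it reduces to ∂ϕ/∂x_1(x_1, x_2) = f(x_1) Φ(x_2(x_1)); the continuous policy x_2(r) = sin(r) and the evaluation point are illustrative choices.

```python
import numpy as np
from math import erf, sqrt, pi, exp, sin

# Assumption: xi has independent N(0,1) components.
Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
f = lambda t: exp(-t * t / 2.0) / sqrt(2.0 * pi)

def phi(x1, x2, n=100_001):
    """Trapezoidal approximation of phi(x1, x2) = int_{-inf}^{x1} Phi(x2(r)) f(r) dr."""
    r = np.linspace(-8.0, x1, n)
    y = np.array([Phi(x2(t)) * f(t) for t in r])
    return float(np.sum((y[1:] + y[:-1]) * np.diff(r)) / 2.0)

x1 = 0.5
h = 1e-3
numeric = (phi(x1 + h, sin) - phi(x1 - h, sin)) / (2.0 * h)  # central quotient
analytic = f(x1) * Phi(sin(x1))                              # Proposition 4
```

For a discontinuous policy such as the indicator of Example 2, the central quotient would not stabilize at a single value, in line with the failure of differentiability noted above.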
Observe that a full continuity result (with respect to x_1 and x_2 simultaneously) cannot be expected for the partial derivative ∂ϕ/∂x_1 because, by virtue of Example 2, it may not even be defined for discontinuous x_2 approaching the continuous policy x̄_2. In contrast to the partial derivative with respect to x_1, the partial derivative of ϕ with respect to x_2 does not require any assumptions on the fixed second-stage policy x̄_2 but rather some additional assumptions on the density g_ξ:

Proposition 5 Let the density g_ξ of ξ satisfy assumption (17) as well as the assumption of being Lipschitz continuous in the second argument uniformly in the first argument: there exists some L ≥ 0 such that

    |g_ξ(r, s) − g_ξ(r, s′)| ≤ L |s − s′|  for all r, s, s′ ∈ R.    (23)

Fix an arbitrary (x̄_1, x̄_2) ∈ X = R × L^2(R). Then, the partial derivative ∇_{x_2}ϕ exists at (x̄_1, x̄_2), it is given by the expression

    ∇_{x_2}ϕ(x̄_1, x̄_2) = g_ξ(·, x̄_2(·)) χ_{(−∞, x̄_1]},    (24)

and it is continuous in (x_1, x_2).
Proof Define γ(x_2) := ϕ(x̄_1, x_2) for all x_2 ∈ L^2(R); we show that this function is Fréchet differentiable at x̄_2. Define the linear function

    A u := ∫_{−∞}^{x̄_1} g_ξ(r, x̄_2(r)) u(r) dr  (u ∈ L^2(R)).    (25)

From (17) we infer that g_ξ(·, x̄_2(·)) ∈ L^2(R), whence, by the Cauchy-Schwarz inequality,

    |A u| ≤ ‖g_ξ(·, x̄_2(·))‖_{L^2} ‖u‖_{L^2}.

Consequently, A is a continuous linear functional. Hence, the Fréchet differentiability of γ at x̄_2 will be proven once we can show that

    lim_{‖u‖_{L^2} → 0} ( γ(x̄_2 + u) − γ(x̄_2) − A u ) / ‖u‖_{L^2} = 0.    (26)

Indeed, the definition of γ and (15) entail that

    γ(x̄_2 + u) − γ(x̄_2) − A u = ∫_{−∞}^{x̄_1} ∫_{x̄_2(r)}^{x̄_2(r)+u(r)} ( g_ξ(r, s) − g_ξ(r, x̄_2(r)) ) ds dr.

By (23), we have that

    | ∫_{x̄_2(r)}^{x̄_2(r)+u(r)} ( g_ξ(r, s) − g_ξ(r, x̄_2(r)) ) ds | ≤ (L/2) u^2(r)  for all r.

Consequently, we derive the relation

    | γ(x̄_2 + u) − γ(x̄_2) − A u | ≤ (L/2) ‖u‖^2_{L^2},

implying (26).
It follows that ∇_{x_2}ϕ(x̄_1, x̄_2) = ∇γ(x̄_2) = A. Since A in (25) has been shown to be a continuous linear functional on L^2(R), it can be identified with the function g_ξ(·, x̄_2(·)) χ_{(−∞, x̄_1]} ∈ L^2(R). This entails the asserted formula (24). It remains to show that the expression given there depends continuously on (x_1, x_2). To this aim, consider a sequence (x_1^{(n)}, x_2^{(n)}) → (x̄_1, x̄_2) in X. We have to verify that

    ∇_{x_2}ϕ(x_1^{(n)}, x_2^{(n)}) → ∇_{x_2}ϕ(x̄_1, x̄_2)  in L^2(R).    (27)

We will do this by showing the equivalent fact that every subsequence of (x_1^{(n)}, x_2^{(n)}) contains a further subsequence along which (27) holds. So, let (x_1^{(n_k)}, x_2^{(n_k)}) be an arbitrary subsequence. Observe first that the strong convergence x_2^{(n_k)} → x̄_2 in L^2(R) entails almost everywhere pointwise convergence along a further subsequence:

    x_2^{(n_{k_l})}(r) → x̄_2(r)  a.e. r ∈ R.    (28)

As g_ξ is continuous in its second argument by (23), it follows from (28) that

    g_ξ(r, x_2^{(n_{k_l})}(r)) → g_ξ(r, x̄_2(r))  a.e. r ∈ R.

Moreover, χ_{(−∞, x_1^{(n_{k_l})}]}(r) → χ_{(−∞, x̄_1]}(r) for all r ≠ x̄_1. We conclude from (24) that

    ∇_{x_2}ϕ(x_1^{(n_{k_l})}, x_2^{(n_{k_l})})(r) → ∇_{x_2}ϕ(x̄_1, x̄_2)(r)

for almost every r ∈ R. On the other hand, by (17), all these functions are dominated almost everywhere by sup_{s∈R} g_ξ(·, s) ∈ L^2(R). Therefore, Lebesgue's Dominated Convergence Theorem (for L^2(R)) yields the asserted convergence (27) in L^2(R).

Distributions satisfying the assumptions
In this Section we are going to specify the results of the preceding sections to the special case of a bivariate Gaussian distribution and a uniform distribution on a rectangle. First, we verify that all relevant assumptions are satisfied in the Gaussian case:

Theorem 3 Let ξ be a bivariate random vector distributed according to N (μ, Σ)
with regular Σ. Then, the probability function ϕ in (15) is Lipschitz continuous and has a partial derivative with respect to x_2 at an arbitrary (x̄_1, x̄_2) ∈ X = R × L^2(R), which is given by the explicit formula

    ∇_{x_2}ϕ(x̄_1, x̄_2)(r) = (2π)^{−1} (det Σ)^{−1/2} exp( −(1/2) ( (r, x̄_2(r)) − μ )^T Σ^{−1} ( (r, x̄_2(r)) − μ ) ) χ_{(−∞, x̄_1]}(r).    (31)

Here, ∇_{x_2}ϕ depends continuously (in the norm of X) on x = (x_1, x_2). Moreover, ϕ has a partial derivative with respect to x_1 at an arbitrary (x̄_1, x̄_2) ∈ X = R × L^2(R) with continuous x̄_2, which is given by the explicit formula

    (∂ϕ/∂x_1)(x̄_1, x̄_2) = (2πΣ_{11})^{−1/2} exp( −(x̄_1 − μ_1)^2 / (2Σ_{11}) ) Φ( ( x̄_2(x̄_1) − μ_2 − Σ_{12}Σ_{11}^{−1}(x̄_1 − μ_1) ) / ( Σ_{22} − Σ_{12}^2 Σ_{11}^{−1} )^{1/2} ),    (32)

where Φ(t) := (2π)^{−1/2} ∫_{−∞}^{t} e^{−s²/2} ds refers to the cumulative distribution function of the one-dimensional standard Gaussian distribution N(0, 1).

Proof The Lipschitz continuity, the existence of partial derivatives and the corresponding continuity statements follow from Propositions 3, 4 and 5 via Proposition 6. Relation (31) is obtained by specifying (24) for the density of N(μ, Σ) (see (29)). Concerning (32), we recall the formula derived in Proposition 4:

    (∂ϕ/∂x_1)(x̄_1, x̄_2) = ∫_{−∞}^{x̄_2(x̄_1)} g_ξ(x̄_1, s) ds = g_{ξ_1}(x̄_1) G_{ξ_2|ξ_1=x̄_1}( x̄_2(x̄_1) ),    (33)

where g_{ξ_2|ξ_1=x̄_1} and G_{ξ_2|ξ_1=x̄_1} refer to the conditional density and cumulative distribution function, respectively, of ξ_2 given ξ_1 = x̄_1. As is well known in the Gaussian case assumed here, the conditioned random variable ξ_2|ξ_1 = x̄_1 has a one-dimensional Gaussian distribution with

    mean μ_2 + Σ_{12}Σ_{11}^{−1}(x̄_1 − μ_1)  and variance Σ_{22} − Σ_{12}^2 Σ_{11}^{−1}.

After normalization, we get that

    G_{ξ_2|ξ_1=x̄_1}(t) = Φ( ( t − μ_2 − Σ_{12}Σ_{11}^{−1}(x̄_1 − μ_1) ) / ( Σ_{22} − Σ_{12}^2 Σ_{11}^{−1} )^{1/2} ),

where Φ is the cumulative distribution function of N(0, 1). Now the asserted formula (32) follows from (33) upon plugging in the formula for the first marginal density g_{ξ_1} of ξ_1 having distribution N(μ_1, Σ_{11}).
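The conditional decomposition used in the proof of Theorem 3 can be cross-checked numerically: the one-dimensional integral ∫_{−∞}^{x_1} g_{ξ_1}(r) Φ((x_2(r) − m(r))/s) dr should agree with a Monte Carlo estimate of P(ξ_1 ≤ x_1, ξ_2 ≤ x_2(ξ_1)). The data (μ, Σ, the affine policy, the grid) are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt, pi

Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
x2 = lambda r: 1.0 + 0.25 * r            # an affine second-stage policy
x1 = 0.8

# Conditional parameters of xi_2 | xi_1 = r for the bivariate Gaussian.
m = lambda r: mu[1] + Sigma[0, 1] / Sigma[0, 0] * (r - mu[0])
s = sqrt(Sigma[1, 1] - Sigma[0, 1] ** 2 / Sigma[0, 0])

# One-dimensional quadrature over the first component.
r = np.linspace(mu[0] - 8.0, x1, 100_001)
g1 = np.exp(-(r - mu[0]) ** 2 / (2 * Sigma[0, 0])) / sqrt(2 * pi * Sigma[0, 0])
y = g1 * np.array([Phi((x2(t) - m(t)) / s) for t in r])
phi_int = float(np.sum((y[1:] + y[:-1]) * np.diff(r)) / 2.0)

# Monte Carlo estimate of the same probability.
rng = np.random.default_rng(1)
xi = rng.multivariate_normal(mu, Sigma, size=400_000)
phi_mc = float(np.mean((xi[:, 0] <= x1) & (xi[:, 1] <= x2(xi[:, 0]))))
```

Differentiating the quadrature formula in x_1 reproduces exactly the right-hand side of (32).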
Corresponding results can be expected for many other bivariate distributions having a continuous density. As a contrast, we briefly refer to uniform distributions over rectangles, for which no differentiability results for ϕ but at least Lipschitz continuity can be expected:

Proposition 7 Let ξ have a uniform distribution over a two-dimensional rectangle. Then, the probability function ϕ in (15) is Lipschitz continuous.
Proof The density g_ξ is constant on the rectangle and zero outside. Hence, it satisfies the assumptions (16) and (17) thanks to the easy-to-verify relations that the marginal density g_{ξ_1} is bounded and that sup_{s∈R} g_ξ(·, s) is a multiple of the characteristic function of a bounded interval and therefore belongs to L^2(R). Now, the assertion follows from Proposition 3.
Note that a uniform distribution as in the previous proposition cannot satisfy relations (19) and (23) because of the discontinuity of its density. Therefore, no differentiability results as in Propositions 4 and 5 can be expected, and counterexamples are easily constructed.

Application to an optimization problem
In the following, we consider the simple dynamic probabilistic constraint (15) as part of the two-stage optimization problem

    min_{x_1 ∈ R, x_2 ∈ L^2(R)}  c_1 x_1 + c_2 E( (x_2 χ_{(−∞, x_1]})(ξ_1) )  subject to  ϕ(x_1, x_2) ≥ p,    (34)

where ξ (occurring in the definition of ϕ) is a bivariate random vector distributed according to N(μ, Σ). The objective is linear in the decisions; it could represent, for instance, linear costs. Since the second stage decision is random, its costs are represented as an expected value. Note, however, that considering the full expected value E x_2(ξ_1) would not make much sense: Indeed, since function values of x_2(ξ_1) for arguments ξ_1 > x_1 do not affect the probability ϕ(x) (see (15)), one could drive the expected value E x_2(ξ_1) to −∞ while keeping the decision x feasible. Therefore, we measure the costs of x_2 by ignoring in the objective its values beyond x_1 and rather considering the expected value of (x_2 χ_{(−∞, x_1]})(ξ_1). In a first step, one might be interested in deriving some information from necessary optimality conditions for this problem. Here, one has to take into account that ϕ is not continuously differentiable (see Example 2). However, ϕ is continuously partially differentiable with respect to x_2 thanks to Proposition 5. This suggests considering the decomposed version of problem (34):

    min_{x_1 ∈ R} ( c_1 x_1 + min_{x_2 ∈ L^2(R)} { c_2 E( (x_2 χ_{(−∞, x_1]})(ξ_1) ) | ϕ(x_1, x_2) ≥ p } ).    (35)

Here, the one-dimensional outer minimization over x_1 can be realized by elementary numerical approaches. Therefore, our interest will focus on the inner minimization problem over x_2 ∈ L^2(R) for some fixed x̄_1 ∈ R:

    min_{x_2 ∈ L^2(R)} { c_2 E( (x_2 χ_{(−∞, x̄_1]})(ξ_1) ) | ϕ(x̄_1, x_2) ≥ p }.    (36)

For this inner optimization problem, the data (objective and constraint) are continuously differentiable and one can formulate necessary optimality conditions at some fixed x̄_2 ∈ L^2(R) provided that ∇_{x_2}ϕ(x̄_1, x̄_2) ≠ 0. This, however, is an immediate consequence of (31). Hence, one may formulate the following necessary optimality condition:

Proposition 8 Let x̄_2 ∈ L^2(R) be a solution of the optimization problem (36) (with some fixed x̄_1 ∈ R).
Then, x̄_2 is affine linear on the set (−∞, x̄_1] with the explicit value Σ_12/Σ_11 for its slope.

Proof Without loss of generality, we may assume that c_2 = 1 in (36), because the solution of the problem is not affected by the value of c_2. The gradient of the objective evaluated at x̄_2 has to be a multiple of the gradient ∇ϕ(x̄_1, x̄_2) of the constraint in (36), also evaluated at x̄_2. Clearly, the objective (with g_{ξ_1} referring to the density of ξ_1) has a gradient which is given by the function g_{ξ_1} χ_{(−∞,x̄_1]}. Hence, there exists some multiplier λ such that this function equals λ ∇_{x_2} ϕ(x̄_1, x̄_2). Since the one-dimensional Gaussian density g_{ξ_1} is strictly positive, we infer that λ > 0. Given the explicit formula for g_{ξ_1} as well as for ∇_{x_2} ϕ(x̄_1, x̄_2) in (31), we derive the existence of constants K_1, K_2 > 0 (where the latter already incorporates the multiplier λ) such that (37) holds for almost every r ≤ x̄_1. We fix an arbitrary r for which (37) holds true. Using the correlation ρ between the two components ξ_1 and ξ_2, the inverse covariance matrix can be written in terms of ρ, Σ_11 and Σ_22. Taking the log in (37) and rearranging terms, one arrives at an identity which, upon introducing suitable abbreviations α and β, can be resolved for α. Resubstituting for α and β gives our assertion on the structure of x̄_2: it is affine linear on (−∞, x̄_1] with slope Σ_12/Σ_11. Unfortunately, since an affine linear function cannot belong to L^2(R) unless it is identically zero, we draw the following negative conclusion from Proposition 8:
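The explicit matrix identity has been lost in this excerpt; the following is a sketch of the standard 2×2 inversion presumably used at this point, assuming Σ is the covariance matrix of (ξ_1, ξ_2) with correlation ρ = Σ_12/√(Σ_11 Σ_22):

```latex
\[
\Sigma =
\begin{pmatrix} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{12} & \Sigma_{22} \end{pmatrix},
\qquad
\Sigma^{-1} = \frac{1}{1-\rho^{2}}
\begin{pmatrix}
\Sigma_{11}^{-1} & -\rho\,(\Sigma_{11}\Sigma_{22})^{-1/2}\\[2pt]
-\rho\,(\Sigma_{11}\Sigma_{22})^{-1/2} & \Sigma_{22}^{-1}
\end{pmatrix}.
\]
```

In particular, the asserted slope can equivalently be written as Σ_12/Σ_11 = ρ √(Σ_22/Σ_11), which makes visible how it vanishes exactly when the components are uncorrelated.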

Corollary 3
If the components ξ_1 and ξ_2 of ξ are not independent, then problem (36) has no local, much less global, solution.

Proof Since ξ is Gaussian, the dependence of its components implies that Σ_12 ≠ 0. Hence, if (36) had a local solution x̄_2 ∈ L^2(R), then this solution would be affine linear on (−∞, x̄_1] with nonzero slope Σ_12/Σ_11 by Proposition 8. Therefore it could not belong to L^2(R), a contradiction.
Before deriving a remedy to the negative outcome of Corollary 3, we want to illustrate the use of the gradient information collected in (31) in a numerical context. We consider problem (36) with the following data: Using the explicit representation of the gradients of the objective and the constraint (see proof of Proposition 8), we apply a simple projected gradient algorithm in order to improve the second-stage decision x_2 in (36). The left diagram of Fig. 2 shows some iterates of this algorithm. All plotted policies realize exactly the desired probability p = 0.8 in the definition of the chance constraint in (36). The starting point for x_2 was chosen as a simple step function ("1"), which after the first iteration turned into a nonlinear, still discontinuous, policy ("2"). After some further iterations, the policy becomes continuous. Interestingly, after seven iterations ("3"), the policy is affine linear on a certain subdomain. It turns out that on this subdomain the policy perfectly coincides with the affine linear policy ("4") satisfying the necessary optimality condition of Proposition 8. The latter is easily identified by its slope, which according to Proposition 8 calculates as Σ_12/Σ_11 = 0.25, and by its intercept, which has to be chosen so as to match the probability level p = 0.8. Observe that all iterates decay to zero at the left end of the negative axis in order to belong to L^2(R). The right diagram of Fig. 2 plots the objective for the first seven iterates and for the affine linear policy from Proposition 8 (isolated point). Evidently, the necessary optimality condition from Proposition 8 still carries some information on optimal solutions, even though the affine linear policy itself does not belong to the space L^2(R).
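The concrete problem data have been lost in this excerpt, so the following sketch uses ASSUMED values (μ = (0, 0), Σ_11 = Σ_22 = 1, Σ_12 = 0.25, x_1 = 1.5), chosen only so that Σ_12/Σ_11 = 0.25 as stated in the text; it is not the paper's configuration. It determines, by bisection, the intercept of the affine policy x_2(r) = a + 0.25 r that matches the probability level p = 0.8, mimicking the identification of policy "4".

```python
# Bisection on the intercept of the affine policy with slope
# Sigma_12 / Sigma_11 = 0.25 so that phi(x1, x2) = p = 0.8.
# Data (mu, Sigma, x1) are illustrative assumptions, not the paper's.
import math
import random


def phi_mc(intercept, x1=1.5, n=100_000, seed=1):
    """Monte Carlo estimate of phi(x1, x2) for x2(r) = intercept + 0.25 r,
    with xi ~ N(0, Sigma), Sigma_11 = Sigma_22 = 1, Sigma_12 = 0.25."""
    rng = random.Random(seed)
    l22 = math.sqrt(1.0 - 0.25 ** 2)  # Cholesky factor for the covariance
    hits = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xi1, xi2 = z1, 0.25 * z1 + l22 * z2
        if xi1 <= x1 and xi2 <= intercept + 0.25 * xi1:
            hits += 1
    return hits / n


# phi is nondecreasing in the intercept and tends to
# P(xi1 <= 1.5) ~ 0.933 > 0.8 as the intercept grows, so a root exists.
lo, hi = -5.0, 5.0
for _ in range(30):
    mid = 0.5 * (lo + hi)
    if phi_mc(mid) < 0.8:
        lo = mid
    else:
        hi = mid
intercept = 0.5 * (lo + hi)
print(intercept)
```

Under these assumed data the event {ξ_2 ≤ a + 0.25 ξ_1} is independent of ξ_1 (since ξ_2 − 0.25 ξ_1 is uncorrelated with ξ_1), which is why a unique matching intercept exists; the same one-dimensional search is what fixes the intercept of policy "4" in the text.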
ϕ(x_1, y) = P(η_1 ≤ x_1) P(η_2 ≤ α(x_1)), where the last equality follows from the independence of η_1 and η_2. Again, by the well-known transformation laws of Gaussian distributions as well as by (39), it holds that P(η_1 ≤ x_1) P(η_2 ≤ α(x_1)) = p. Consequently, we arrive at ϕ(x_1, y) = p. Hence, x_2 := y is feasible with respect to the constraint ϕ(x_1, x_2) ≥ p. Next, we verify that y ∈ M. By definition of y, M and α, it suffices to show that α(x_1) ≥ μ_2. Indeed, the assumption that α(x_1) < μ_2 would lead, via the fact that the values of Φ are strictly smaller than one, to the contradiction p ≤ P(η_2 ≤ α(x_1)) < P(η_2 ≤ μ_2) ≤ 1/2 with our assumption that p ≥ 1/2. Summarizing, y ∈ X* defined by (43) is a feasible second-stage policy in problem (44).
This, along with (43) and (48), allows us to lead (49) to a contradiction. This proves our initial claim that y in (43) is a global solution to (44). Accordingly, for each x_1 ∈ R satisfying (42), we have that

min_{x_2 ∈ X*} { c_2 E[x_2(ξ_1) χ_{(−∞,x_1]}(ξ_1)] | ϕ(x_1, x_2) ≥ p; x_2 ∈ M }

Fig. 3 Illustration of a solution to problem (38): optimal first-stage decision x*_1 as minimizer of the function c_1 t + c_2 f(t) (left) and optimal second-stage decision x*_2(r) as affine linear function with slope and intercept as indicated in Theorem 4 (right)

This proves our assertion on an optimal solution x*_1. As shown in the first part of this proof, the optimal second-stage decision in (44) associated with the first-stage decision x*_1 is defined in (43) and yields the asserted formula for x*_2 in the statement of the theorem.