1 Introduction

1.1 Overview

The application of probabilistic constraints (or: chance constraints) to engineering problems and their numerical solution is nowadays standard. Introduced by Charnes et al. [5] in a simple form (individual constraints) in 1958, their systematic theoretical and algorithmic investigation has been pioneered by Prékopa and his students starting in the Seventies (see [15] and references therein). The typical form of a probabilistic constraint is the inequality

$$\begin{aligned} {\mathbb {P}}(g_i(x,\xi )\le 0\quad (i=1,\ldots ,k))\ge p, \end{aligned}$$
(1)

where x is a decision vector, \(\xi \) is a random vector, \({\mathbb {P}}\) a probability measure and g a random constraint mapping with finitely many components. The meaning of (1) is to define a decision x as feasible if the random inequality system \(g(x,\cdot )\le 0\) is satisfied with probability at least \(p\in (0,1]\). A modern theoretical treatment of probabilistic constraints can be found in the monograph [16, chapter 4]. The algorithmic solution of optimization problems subject to constraints (1) has been tremendously advanced within the last twenty years. Rather than providing a detailed list of references here, we want to emphasize the contribution to this development by Shabbir Ahmed (e.g., [12, 13]). At the same time the traditional model (1) has been extended to broader settings such as PDE constrained optimization ([6, 7, 9]) or infinite random inequality systems (probust constraints, [17]).

A challenge of a different nature consists in considering dynamic aspects in probabilistic constraints. Observe that (1) is a static model by nature: the decision x (‘here-and-now decision’) has to be taken before the randomness \(\xi \) is observed. Such a model would apply, for instance, in the design of a mechanical construction (encoded by x) which is done once and for all and has to resist unknown future random forces \(\xi \) with high probability. Many decisions, however, are time dependent. The components of x and \(\xi \) could refer to discrete time decision and random processes, respectively. In the control of a hydro reservoir, for instance, one is faced with an alternating sequence of decisions \(x_t\) (referring to water release) and realizations of randomness \(\xi _t\) (water inflow) according to the chronology

$$\begin{aligned} x_1\curvearrowright \xi _1\curvearrowright x_2\curvearrowright \xi _2\curvearrowright \cdots \curvearrowright x_T (\curvearrowright \xi _T). \end{aligned}$$
(2)

Whether or not this sequence ends with a decision (final recourse action) or with the observation of randomness without the possibility of finally reacting to it, depends on the choice of a model of multistage stochastic optimization or of multistage probabilistic programming. This distinction requires some care because sometimes the term ‘two-stage probabilistic constraint’ is used for the addition of a probabilistic constraint (relaxing the almost sure existence of a recourse action) in a setting of two-stage stochastic programming. Such a model was first considered in [14] and is still of much interest (e.g., [11]). Here, the chronology is the one of (2) with \(T=2\) (without the final term in parentheses): \(x_1\curvearrowright \xi _1\curvearrowright x_2\), i.e., it is a special two-stage stochastic optimization problem. In our understanding, it is not a two-stage probabilistic constraint, which would end with the term in parentheses in (2): \(x_1\curvearrowright \xi _1\curvearrowright x_2\curvearrowright \xi _2\). In this way one obtains a logical generalization of conventional one-stage (static) probabilistic constraints of type \(x_1\curvearrowright \xi _1\) and keeps the idea that in a probabilistic constraint one is always faced with a final unknown realization of some random vector. This idea follows a remark in [8]: ‘... a well-formed probabilistic constraint contains at least one coefficient that depends on a random variable realized after the last decision is taken’.

It is clear that in (2) the dynamic character of the decision making process expresses itself in the assumption that all decisions are functions of past observations, so as to take advantage of the gain of information obtained from the realizations of the random vector. Hence, instead of static (constant) decisions \(x_t\) one admits decision rules or policies \(x_2(\xi _1), x_3(\xi _1,\xi _2)\), etc. When considering continuously distributed random vectors, this approach takes the problem to infinite dimensions even though time is discrete, because policies are elements of appropriate function spaces. One may circumvent this difficulty by restricting policies to a parameterized class, linear decision rules in the simplest case. Then, one gets back to a static problem where the decisions are the parameters of the policies. Several aspects of modeling linear decision rules in the context of (linear) multistage probabilistic constraints are discussed in [10]. It is not guaranteed, however, that the chosen class contains the optimal policy. Another idea to reduce the problem to a finite-dimensional one would consist in a discrete approximation of the random distribution. A conceptual framework for dealing with dynamic probabilistic constraints without restricting the class of policies and keeping the continuous character of the given (multivariate Gaussian) distribution was presented in [4], along with applications to two- and three-stage probabilistic control of a water reservoir. Using stochastic dynamic programming rather than direct nonlinear programming, a similar problem was later analyzed and numerically solved in [2] for a significantly larger number of stages, however with a discrete random distribution.

The focus in this paper is not on the numerical solution of problems subject to dynamic probabilistic constraints but rather on analytical properties of the arising probability function. Here we assume the underlying random distribution to be continuous and keep the decision rules general as elements of some Lebesgue or Sobolev space. In Sect. 2, a general multistage model is analyzed. Basic properties like (weak sequential) (semi-) continuity of the probability function or existence of solutions are studied. In Sect. 3, the simplest meaningful two-stage model with decision rules from \(L^2\) is investigated. More specific properties like Lipschitz continuity and differentiability of the probability function are considered. Explicitly verifiable conditions for these properties are provided along with explicit gradient formulae in the Gaussian case. The application of such formulae in the context of necessary optimality conditions is discussed and a concrete identification of solutions presented.

1.2 The general setting

In this paper we study optimization problems of the type

$$\begin{aligned} \min \limits _{x\in {\mathbb {X}}}\{J(x)\mid x\in C,\,\,\varphi (x)\ge p\} \end{aligned}$$
(3)

Here, the space of decisions \({\mathbb {X}}\) is one of the following Lebesgue or Sobolev spaces with \(q\in [1,\infty )\)

$$\begin{aligned} {\mathcal {X}}:= & {} {\mathbb {R}}\times L^{q}({\mathbb {R}})\times L^{q}({\mathbb {R}} ^{2})\times \cdots \times L^{q}({\mathbb {R}}^{T-1}) \\ {\mathcal {X}}^{1}:= & {} {\mathbb {R}}\times W^{1,q}({\mathbb {R}})\times W^{1,q}( {\mathbb {R}}^{2})\times \cdots \times W^{1,q}({\mathbb {R}}^{T-1}). \end{aligned}$$

The subset \(C\subseteq {\mathcal {X}}\) (or \(C\subseteq {\mathcal {X}}^{1}\)) is meant to represent some abstract constraint on the decision, e.g., nonnegativity or bounds for the components. The focus of our attention will be on the inequality constraint \(\varphi (x)\ge p\), which we will assume to represent a so-called joint dynamic chance constraint. More precisely, \(p\in \left( 0,1\right] \) is some given safety level and \(\varphi :{\mathcal {X}}\rightarrow \left[ 0,1\right] \) denotes a probability function defined for \(x\in {\mathcal {X}}\) as follows:

$$\begin{aligned}&\varphi (x) :={\mathbb {P}}\left( h_{i}\left( x_{1},x_{2}\left( \xi _{1}\right) ,\ldots ,x_{T}\left( \xi _{1},\ldots ,\xi _{T-1}\right) ,\xi _{1},\ldots ,\xi _{T}\right) \right. \nonumber \\&\quad \left. \le 0\quad (i=1,\ldots ,k)\right) , \end{aligned}$$
(4)

where \(h_{i}:{\mathbb {R}}^{T}\times {\mathbb {R}}^{T}\rightarrow {\mathbb {R}}\) and \( \xi :=(\xi _{1},\ldots ,\xi _{T})\) is a T-dimensional discrete time process on some probability space \(\left( \varOmega ,{\mathcal {A}},{\mathbb {P}} \right) \). Observe that with each component \(x_{t}\) of the decision x depending on past outcomes \(\left( \xi _{1},\ldots ,\xi _{t-1}\right) \) only, x represents an adapted decision process. We endow \({\mathcal {X}}\) and \({\mathcal {X}}^{1}\) with the maximum norm with respect to the usual norms in the coordinate spaces. Doing so, \({\mathcal {X}}\) and \({\mathcal {X}}^{1}\) are Banach spaces.

1.3 A motivating example

To illustrate applications of problem (3), we present a decision management optimization problem on a single water reservoir for hydroelectricity generation. Given a set of future time intervals \(1,2,\dots ,T\), the problem of the operator is to decide on an optimal release policy \( \left( x_{1},\ldots ,x_{T}\right) \) of water, considering technical, economic and environmental aspects. By \(\xi =(\xi _{1},\dots ,\xi _{T})\), we denote the random vector indicating the stochastic water inflow (e.g., precipitation, snow melt) to the reservoir at the corresponding time intervals. The main role of the reservoir is to generate electricity. At the same time, lower and upper limits \(l_{*}\), \(l^{*}\) for the water level have to be satisfied in the reservoir, say for flood protection or for ecological reasons. By the random nature of the inflows, the time dependent water level \(l_{t}\left( x,\xi \right) \) induced from the controlled water release x is a random variable too. Hence, the mentioned limits cannot be satisfied in a deterministic way. Rather, it is reasonable to impose them in a probabilistic way:

$$\begin{aligned} {\mathbb {P}}\left( l_{*}\le l_{t}\left( x,\xi \right) \le l^{*}\quad \left( t=1,\dots ,T\right) \right) \ge p. \end{aligned}$$
(5)

Here, \(p\in \left( 0,1\right] \) denotes a probability level at which the random constraints are supposed to hold true. The current water level after time interval t is clearly given as the initial level plus the cumulated inflow minus the cumulated release so far:

$$\begin{aligned} l_{t}\left( x,\xi \right) =l_{0}+\xi _{1}+\cdots +\xi _{t}-x_{1}-\cdots -x_{t}\quad \left( t=1,\dots ,T\right) . \end{aligned}$$

Sometimes, one decides on the future water release in complete ignorance of the future water inflow. This is the case, for instance, in day-ahead markets, when energy production (water release) for each hour of the next day is fixed one day ahead. Then, decisions are just scalars for each time interval and the probabilistic constraint (5) becomes

$$\begin{aligned} {\mathbb {P}}\left( l_{*}\le l_{0}+\sum \limits _{\tau =1}^{t}\xi _{\tau }-\sum \limits _{\tau =1}^{t}x_{\tau }\le l^{*}\quad \left( t=1,\dots ,T\right) \right) \ge p\quad \left( x\in {\mathbb {R}}^{T}\right) . \end{aligned}$$
(6)

Such a static model does not take into account the temporal gain of information while the random inflow process unfolds. In longer term planning problems one therefore admits from the beginning that future decisions on water release are functions of past observations of the random inflow. Hence, rather than deciding on scalars \(x_{1},\ldots ,x_{T}\), one is looking for functions \(x_{1},x_{2}(\cdot ),x_{3}(\cdot ,\cdot )\), so-called policies. In this dynamic setting better solutions of the underlying optimization problem can be expected (the static model being included as a special case with constant policies, e.g., \(x_{2}(\cdot )\equiv x_{2}\) etc.). Hence, we adjust our static chance constraint above to a dynamic one, where \(x\in {\mathcal {X}}\) or \(x\in {\mathcal {X}}^{1}\):

$$\begin{aligned} {\mathbb {P}}\left( l_{*}\le l_{0}+\sum \limits _{\tau =1}^{t}\xi _{\tau }-\sum \limits _{\tau =1}^{t}x_{\tau }\left( \xi _{1},\ldots ,\xi _{\tau -1}\right) \le l^{*}\quad \left( t=1,\dots ,T\right) \right) \ge p. \end{aligned}$$

A possible objective in a corresponding optimization problem might consist in the maximization of the expected overall water release (representing the amount of energy produced):

$$\begin{aligned} J(x):=-{\mathbb {E}}\sum \limits _{t=1}^{T}x_{t}\left( \xi _{1},\ldots ,\xi _{t-1}\right) . \end{aligned}$$

Then, the optimization problem is of the form (3) with the probability function \(\varphi \) defined in (4) via the constraint mapping \(h:{\mathbb {R}}^{T}\times {\mathbb {R}}^{T}\rightarrow {\mathbb {R}}^{2T}\). The latter has \(k:=2T\) components

$$\begin{aligned} h_{t}\left( u,v\right):= & {} l_{0}-l^{*}+\sum \limits _{\tau =1}^{t}v_{\tau }-\sum \limits _{\tau =1}^{t}u_{\tau }\quad \left( t=1,\dots ,T\right) \\ h_{T+t}\left( u,v\right):= & {} l_{*}-l_{0}+\sum \limits _{\tau =1}^{t}u_{\tau }-\sum \limits _{\tau =1}^{t}v_{\tau }\quad \left( t=1,\dots ,T\right) . \end{aligned}$$
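Although the numerical solution of such problems is beyond the scope of this paper, the probability function (4) for the reservoir model can be approximated by straightforward sampling. The following Python sketch (not part of the model above; the Gaussian inflow distribution and all numerical parameters are illustrative assumptions) estimates \(\varphi \) by Monte Carlo for an adapted policy whose components are arbitrary functions of past inflows:

```python
# A minimal Monte Carlo sketch estimating the probability function phi in (4)
# for the reservoir example; all parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, l0, l_min, l_max = 4, 5.0, 2.0, 10.0

def phi(policy, n_samples=200_000):
    """Estimate P(l_min <= l_t(x, xi) <= l_max for t = 1, ..., T).

    policy[0] is the scalar first-stage decision x_1; for t >= 1, policy[t]
    maps the observed inflows (xi_1, ..., xi_t) to the release x_{t+1}.
    """
    xi = rng.normal(loc=1.0, scale=0.5, size=(n_samples, T))  # inflow scenarios
    level = np.full(n_samples, l0)
    ok = np.ones(n_samples, dtype=bool)
    for t in range(T):
        x_t = policy[0] if t == 0 else policy[t](xi[:, :t])   # adapted decision
        level = level + xi[:, t] - x_t                        # water balance
        ok &= (l_min <= level) & (level <= l_max)
    return ok.mean()

# A static policy (constants, as in (6)) versus a simple adapted policy that
# releases the most recently observed inflow:
static = [1.0] + [lambda past: np.full(len(past), 1.0) for _ in range(1, T)]
adapted = [1.0] + [lambda past: past[:, -1] for _ in range(1, T)]
print(phi(static), phi(adapted))
```

For constant policies the estimator reduces to the static constraint (6), so the sketch also illustrates the gain in probability that adapted policies may achieve.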

2 Basic structural properties of the general model

In this section we are going to collect some basic structural properties of the chance constraint \(\varphi (x)\ge p\) in (3) and the involved probability function \(\varphi \) in (4). For convenience, we introduce the notation \(u_{\left[ i\right] }:=(u_{1},\ldots ,u_{i})\) for vectors \(u\in {\mathbb {R}}^{n}\) and \(1\le i\le n\). With the policy \(x\in {\mathcal {X}}\) we associate the joint policy (whose components have a common domain) as a mapping \(\left[ x\right] :{\mathbb {R}}^{T}\rightarrow {\mathbb {R}}^{T}\) defined by

$$\begin{aligned} \left[ x\right] (z):=\left( x_{t}(z_{\left[ t-1\right] }) \right) _{t=1,\ldots ,T}\quad \left( z\in {\mathbb {R}}^{T}\right) , \end{aligned}$$
(7)

with the convention \(x_1(z_{[0]}) =x_1\). Finally, we introduce the maximum function related to the mapping h:

$$\begin{aligned} h^{\max }:=\max _{i=1,\ldots ,k}h_{i}. \end{aligned}$$
(8)

Then, the probability function in (4) can be compactly written as

$$\begin{aligned} \varphi (x)={\mathbb {P}}\left( h^{\max }\left( \left[ x\right] (\xi ),\xi \right) \le 0\right) . \end{aligned}$$
(9)

We first check that this expression is well-defined. In order to ensure this, we make the following basic assumptions in (4) throughout this paper:

$$\begin{aligned} \left. \begin{array}{c} \xi \,\,\text {possesses a density} \\ h\,\,\text {is Borel measurable} \end{array} \right\} . \end{aligned}$$
(BA)

Observe first that for given \(x\in {\mathcal {X}}\) each component \(x_{t}:\) \( {\mathbb {R}}^{t-1}\rightarrow {\mathbb {R}}\) is Borel measurable, whence the mapping \(\left[ x\right] \) is Borel measurable. Then, \(h^{\max }\left( \left[ x\right] (z),z\right) \) is a Borel measurable function of z because each \( h_{i}\) is so thanks to (BA). This implies that

$$\begin{aligned} \left\{ \omega \in \varOmega |h^{\max }\left( \left[ x\right] (\xi \left( \omega \right) ),\xi \left( \omega \right) \right) \le 0\right\} \in {\mathcal {A}}, \end{aligned}$$

so that it is justified to speak of the probability of this event appearing in the definition of (9). It remains to show that this probability is independent of the representative of \(x\in {\mathcal {X}}\). To see this, let \(x^{(1)},x^{(2)}\in {\mathcal {X}}\) be such that \( x_{1}^{(1)}=x_{1}^{(2)}\) and

$$\begin{aligned} x_{t+1}^{(1)}(u)=x_{t+1}^{(2)}(u)\quad \forall u\in B_{t}\,\,\forall t=1,\ldots ,T-1, \end{aligned}$$

where \(B_{t}\subseteq {\mathbb {R}}^{t}\) are Lebesgue measurable subsets with \( \lambda _{t}\left( {\mathbb {R}}^{t}\backslash B_{t}\right) =0\) (\(\lambda _{t}\) is the Lebesgue measure in \({\mathbb {R}}^{t}\)). Define

$$\begin{aligned} C:=\bigcup \limits _{t=1}^{T-1}C_{t}\text {, where }C_{t}:=\left( {\mathbb {R}} ^{t}\backslash B_{t}\right) \times {\mathbb {R}}^{T-t-1}\subseteq {\mathbb {R}} ^{T-1}\quad \forall t=1,\ldots ,T-1. \end{aligned}$$

Then, \(\lambda _{T-1}\left( C\right) =0\) and

$$\begin{aligned} \left[ x^{(1)}\right] (z)=\left[ x^{(2)}\right] (z)\quad \forall z\in \left( {\mathbb {R}}^{T-1}\backslash C\right) \times {\mathbb {R}}. \end{aligned}$$

Since \(\xi \) possesses a density \(f_{\xi }\), it follows from (9) that

$$\begin{aligned} \varphi (x^{(1)})= & {} \int \limits _{\left\{ z|h^{\max }\left( \left[ x^{(1)} \right] (z),z\right) \le 0\right\} }f_{\xi }\left( z\right) dz\\= & {} \int \limits _{\left\{ z|h^{\max }\left( \left[ x^{(1)}\right] (z),z\right) \le 0\right\} \cap \left\{ \left( {\mathbb {R}}^{T-1}\backslash C\right) \times {\mathbb {R}}\right\} }f_{\xi }\left( z\right) dz \\= & {} \int \limits _{\left\{ z|h^{\max }\left( \left[ x^{(2)}\right] (z),z\right) \le 0\right\} \cap \left\{ \left( {\mathbb {R}}^{T-1}\backslash C\right) \times {\mathbb {R}}\right\} }f_{\xi }\left( z\right) dz\\= & {} \int \limits _{\left\{ z|h^{\max }\left( \left[ x^{(2)}\right] (z),z\right) \le 0\right\} }f_{\xi }\left( z\right) dz =\varphi (x^{(2)}). \end{aligned}$$

This shows that the value of \(\varphi \) does not depend on the representative of \(x\in {\mathcal {X}}\).

We will commence our analysis with some (lower-) semicontinuity properties and then derive consequences later on. The following Proposition turns out to be a crucial technical tool in this context:

Proposition 1

In addition to the basic assumptions (BA), suppose that h in (4) has components \(h_{i}\) which are lower semicontinuous in their first argument vector (related with x). Consider a sequence \( x^{(n)}\) in \({\mathcal {X}}\) which converges componentwise almost everywhere to some \(x\in {\mathcal {X}}\). Then,

$$\begin{aligned} \underset{n\rightarrow \infty }{\lim \sup }\,\,\varphi \left( x^{(n)}\right) \le \varphi \left( x\right) . \end{aligned}$$
(10)

Moreover, if h has components \(h_{i}\) which are upper semicontinuous in their first argument vector and in addition

$$\begin{aligned} \lambda _{T}\left( \left\{ z\in {\mathbb {R}}^{T}|h_{i}\left( \left[ x\right] (z),z\right) =0\right\} \right) =0\quad i=1,\ldots ,k, \end{aligned}$$
(11)

then

$$\begin{aligned} \underset{n\rightarrow \infty }{\lim \inf }\,\,\varphi \left( x^{(n)}\right) \ge \varphi \left( x\right) . \end{aligned}$$
(12)

Proof

We start with the first assertion (10). The function \(h^{\max }\) in (8) is lower semicontinuous in its first argument vector because the \(h_{i}\) are assumed to be so. By assumption, we have that \( x_{1}^{(n)}\rightarrow _{n}x_{1}\) and

$$\begin{aligned} x_{t+1}^{(n)}(u)\rightarrow _{n}x_{t+1}(u)\quad \forall u\in B_{t}\,\,\forall t=1,\ldots ,T-1, \end{aligned}$$

for some Lebesgue measurable subsets \(B_{t}\subseteq {\mathbb {R}}^{t}\) with \( \lambda _{t}\left( {\mathbb {R}}^{t}\backslash B_{t}\right) =0\). Without loss of generality (by passing to a superset whose difference with \(B_t\) has Lebesgue measure zero), we may assume that the \(B_t\) are Borel measurable. Repeating the construction from the beginning of this section, we find a subset \(C\subseteq {\mathbb {R}}^{T-1}\) which now is Borel measurable and is such that \(\lambda _{T-1}\left( C\right) =0\) and

$$\begin{aligned} \left[ x^{(n)}\right] (z)\rightarrow _{n}\left[ x\right] (z)\quad \forall z\in \left( {\mathbb {R}}^{T-1}\backslash C\right) \times {\mathbb {R}}. \end{aligned}$$

Denote \(\varGamma :=\xi ^{-1}\left( \left( {\mathbb {R}}^{T-1}\backslash C\right) \times {\mathbb {R}}\right) \in {\mathcal {A}}\) and observe that

$$\begin{aligned} {\mathbb {P}}\left( \varGamma \right) =\int \limits _{\left( {\mathbb {R}} ^{T-1}\backslash C\right) \times {\mathbb {R}}}f_{\xi }\left( z\right) dz=1,\quad \left[ x^{(n)}\right] (\xi \left( \omega \right) )\rightarrow _{n} \left[ x\right] (\xi \left( \omega \right) )\quad \forall \omega \in \varGamma . \end{aligned}$$

Consider the event sets

$$\begin{aligned} A_{n}:= & {} \left\{ \omega \in \varOmega |h^{\max }\left( \left[ x^{(n)}\right] (\xi \left( \omega \right) ),\xi \left( \omega \right) \right) \le 0\right\} \quad \left( n\in {\mathbb {N}}\right) \\ A:= & {} \left\{ \omega \in \varOmega |h^{\max }\left( \left[ x\right] (\xi \left( \omega \right) ),\xi \left( \omega \right) \right) \le 0\right\} . \end{aligned}$$

Fix an arbitrary \(\omega \in \left( \varOmega \backslash A\right) \cap \varGamma \). Then, the lower semicontinuity of \(h^{\max }\) in its first argument vector yields that

$$\begin{aligned} \underset{n\rightarrow \infty }{\lim \inf }\,\,h^{\max }\left( \left[ x^{(n)} \right] (\xi \left( \omega \right) ),\xi \left( \omega \right) \right) \ge h^{\max }\left( \left[ x\right] (\xi \left( \omega \right) ),\xi \left( \omega \right) \right) >0. \end{aligned}$$

Consequently, for any \(\omega \in \left( \varOmega \backslash A\right) \cap \varGamma \), there exists some \(n_{0}\left( \omega \right) \in {\mathbb {N}}\) such that

$$\begin{aligned} h^{\max }\left( \left[ x^{(n)}\right] (\xi \left( \omega \right) ),\xi \left( \omega \right) \right) >0\quad \forall n\ge n_{0}\left( \omega \right) \end{aligned}$$
(13)

Denote by \(\chi _{Q}\) the characteristic function of a set Q. Now, (13) entails that \({\chi }_{A_{n}}\left( \omega \right) \rightarrow _{n}0\) for all \(\omega \in \left( \varOmega \backslash A\right) \cap \varGamma \). In other words, since \({\mathbb {P}}\left( \varGamma \right) =1\), \(\chi _{A_{n}}\) converges pointwise \({\mathbb {P}}\)-almost surely to \({\chi }_{A}\) on the set \(\varOmega \backslash A\). Since \({\chi }_{A_{n}}\le 1\), the dominated convergence theorem provides that

$$\begin{aligned} \int \limits _{\varOmega \backslash A}\chi _{A_{n}}d{\mathbb {P}} \rightarrow _{n}0. \end{aligned}$$

Now, let \(x^{(n_{l})}\) be a subsequence realizing the limsup in (10) as a limit. Then, in view of the relation above, we arrive at (10):

$$\begin{aligned} \underset{n\rightarrow \infty }{\lim \sup }\,\,\varphi \left( x^{(n)}\right)= & {} \underset{l\rightarrow \infty }{\lim }\varphi \left( x^{(n_{l})}\right) = \underset{l\rightarrow \infty }{\lim }{\mathbb {P}}\left( h^{\max }\left( \left[ x^{(n_{l})}\right] (\xi ),\xi \right) \le 0\right) \\= & {} \underset{l\rightarrow \infty }{\lim }{\mathbb {P}}\left( A_{n_{l}}\right) = \underset{l\rightarrow \infty }{\lim }\int \limits _{\varOmega }\chi _{A_{n_{l}}}d{\mathbb {P}}\\\le & {} \underset{l\rightarrow \infty }{\lim \sup } \int \limits _{\varOmega \backslash A}\chi _{A_{n_{l}}}d{\mathbb {P}}+ \underset{l\rightarrow \infty }{\lim \sup }\int \limits _{A}\chi _{A_{n_{l}}}d{\mathbb {P}}\\= & {} \underset{l\rightarrow \infty }{\lim \sup } \int \limits _{A}\chi _{A_{n_{l}}}d{\mathbb {P}} \le \underset{l\rightarrow \infty }{\lim \sup }\int \limits _{A}d{\mathbb {P}}={\mathbb {P}}\left( A\right) \\= & {} {\mathbb {P}}\left( h^{\max }\left( \left[ x\right] (\xi ),\xi \right) \le 0\right) =\varphi \left( x\right) . \end{aligned}$$

As for (12), observe first that with the components \(h_{i}\) being upper semicontinuous in their first argument vector, \(h^{\max }\) is upper semicontinuous in its first argument vector and, hence, the single-component mapping \(-h^{\max }\) is lower semicontinuous in its first argument vector. Denote by \({\tilde{\varphi }}\) the probability function in (4) or (9), respectively, associated with \(-h^{\max }\) rather than with h. Then, by the just proven relation (10), we have that

$$\begin{aligned} \underset{n\rightarrow \infty }{\lim \sup }\,\,{\mathbb {P}}\left( -h^{\max }\left( \left[ x^{(n)}\right] (\xi ),\xi \right) \le 0\right)= & {} \underset{ n\rightarrow \infty }{\lim \sup }\,\,{\tilde{\varphi }}\left( x^{(n)}\right) \\\le & {} {\tilde{\varphi }}\left( x\right) ={\mathbb {P}}\left( -h^{\max }\left( \left[ x\right] (\xi ),\xi \right) \le 0\right) . \end{aligned}$$

It now follows that

$$\begin{aligned} \liminf \limits _{n\rightarrow \infty } \varphi \left( x^{(n)}\right)= & {} \liminf \limits _{n\rightarrow \infty } {\mathbb {P}}\left( h^{\max }\left( \left[ x^{(n)}\right] (\xi ),\xi \right) \le 0\right) \\\ge & {} \liminf \limits _{n\rightarrow \infty } {\mathbb {P}}\left( -h^{\max }\left( \left[ x^{(n)}\right] (\xi ),\xi \right)>0\right) \\= & {} - \limsup \limits _{n\rightarrow \infty } -{\mathbb {P}}\left( -h^{\max }\left( \left[ x^{(n)}\right] (\xi ),\xi \right) >0\right) \\= & {} - \limsup \limits _{n\rightarrow \infty } \left( {\mathbb {P}}\left( -h^{\max }\left( \left[ x^{(n)}\right] (\xi ),\xi \right) \le 0\right) -1\right) \\= & {} 1- \limsup \limits _{n\rightarrow \infty } {\mathbb {P}}\left( -h^{\max }\left( \left[ x^{(n)}\right] (\xi ),\xi \right) \le 0\right) \\\ge & {} 1-{\mathbb {P}}\left( -h^{\max }\left( \left[ x\right] (\xi ),\xi \right) \le 0\right) ={\mathbb {P}}\left( h^{\max }\left( \left[ x\right] (\xi ),\xi \right) <0\right) . \end{aligned}$$

From (11) and the basic assumption (BA) that \(\xi \) possesses a density, we infer that

$$\begin{aligned} {\mathbb {P}}\left( h^{\max }\left( \left[ x\right] (\xi ),\xi \right) =0\right)= & {} \lambda _{T}\left( \left\{ z\in {\mathbb {R}}^{T}|h^{\max }\left( \left[ x\right] (z),z\right) =0\right\} \right) \\\le & {} \sum _{i=1}^{k}\lambda _{T}\left( \left\{ z\in {\mathbb {R}} ^{T}|h_{i}\left( \left[ x\right] (z),z\right) =0\right\} \right) =0. \end{aligned}$$

Hence, we may continue the previous chain of (in-)equalities, in order to arrive at (12):

$$\begin{aligned} \underset{n\rightarrow \infty }{\lim \inf }\,\,\varphi \left( x^{(n)}\right) \ge {\mathbb {P}}\left( h^{\max }\left( \left[ x\right] (\xi ),\xi \right) \le 0\right) =\varphi \left( x\right) . \end{aligned}$$

\(\square \)

The following Lemma will allow us to derive from Proposition 1 the announced (semi-)continuity properties of \(\varphi \). We do not claim that this Lemma is new, but we are not able to provide a reference.

Lemma 1

Consider a sequence \(\left\{ x^{(n)}\right\} \subseteq {\mathcal {X}}^{1}\) which converges weakly to \(x\in {\mathcal {X}}^{1}\). Then, there exists a subsequence \(\left\{ x^{(n_{k})}\right\} \) which converges almost everywhere to x.

Proof

Consider \(\{ x^{(n)} \}\subseteq {\mathcal {X}}^1\) which converges weakly to \(x \in {\mathcal {X}}^1\). Since our space \({\mathcal {X}}^{1}\) is a product space, it is enough to prove that each coordinate has a subsequence with the desired property.

Let us fix \(i \in \{2,\ldots ,T \}\) (the case \(i=1\) is trivial). For simplicity of notation let us denote \(f_n := x_{i}^{(n)}\), \(f := x_{i}\). Since \(f_n\) converges weakly to f, we have that \(f_n\) is bounded in \(W^{1,q}({\mathbb {R}}^{i-1})\).

Consider \(r\in {\mathbb {N}} \backslash \{ 0\}\) and define the domain \(U_r:={\mathbb {B}}_r\subseteq {\mathbb {R}}^{i-1}\), the Euclidean ball centered at zero with radius r. The restrictions of \(f_n\) and f to \(U_r\) belong to \(W^{1,q}(U_r)\), and since \(U_r\) is bounded, they belong to \(W^{1,1}(U_r)\) as well. Now, by Rellich–Kondrachov’s Theorem (see, e.g., [1, Theorem 6.3, Part I] and [1, p. 84]) we can extract a subsequence \(f_{n_k} \) which converges in norm and almost everywhere to some \(z \in L^{1}(U_r)\). Moreover, since \(f_{n_k}\) also converges weakly to f, we have that \(z=f\) almost everywhere on \(U_r\). Finally, using induction and a diagonal argument we are done. \(\square \)

Theorem 1

In addition to the basic assumptions (BA), suppose that h in (4) has components \(h_{i}\) which are lower semicontinuous in their first argument vector (related with x). Then, \(\varphi :{\mathcal {X}} \rightarrow \left[ 0,1\right] \) defined in (4) is upper semicontinuous in the norm topology of \({\mathcal {X}}\). Its restriction \( \varphi |_{{\mathcal {X}}^{1}}:{\mathcal {X}}^{1}\rightarrow \left[ 0,1\right] \) is sequentially upper semicontinuous with respect to the weak topology of \({\mathcal {X}}^{1}\). If h in (4) has components \( h_{i}\) which are upper semicontinuous in their first argument vector and condition (11) is satisfied, then \(\varphi :{\mathcal {X}} \rightarrow \left[ 0,1\right] \) is lower semicontinuous in the norm topology of \({\mathcal {X}}\) and its restriction \(\varphi |_{{\mathcal {X}}^{1}}:{\mathcal {X}} ^{1}\rightarrow \left[ 0,1\right] \) is sequentially lower semicontinuous with respect to the weak topology of \({\mathcal {X}}^{1}\).

Proof

Let \(\left\{ x^{(n)}\right\} \subseteq {\mathcal {X}}\) be a sequence strongly converging to some \(x\in {\mathcal {X}}\). Consider a subsequence \(\left\{ x^{(n_{k})}\right\} \) such that

$$\begin{aligned} \underset{n\rightarrow \infty }{\lim \sup }\,\,\varphi \left( x^{(n)}\right) =\lim _{k\rightarrow \infty }\,\,\varphi \left( x^{(n_{k})}\right) . \end{aligned}$$

It is well known that there exists a further subsequence \(\left\{ x^{(n_{k_{l}})}\right\} \) converging almost everywhere to x (see, e.g., [3, Theorem 13.6]). Then, by (10),

$$\begin{aligned} \underset{n\rightarrow \infty }{\lim \sup }\,\,\varphi \left( x^{(n)}\right) =\lim _{l\rightarrow \infty }\,\,\varphi \left( x^{(n_{k_{l}})}\right) \le \varphi \left( x\right) \end{aligned}$$
(14)

which shows the upper semicontinuity of \(\varphi \) in the norm topology of \( {\mathcal {X}}\).

Next, let \(\left\{ x^{(n)}\right\} \subseteq {\mathcal {X}}^{1}\) be a sequence weakly converging to some \(x\in {\mathcal {X}}^{1}\). Then, repeating the previous argument—this time justifying almost everywhere convergence of a subsequence on the basis of Lemma 1—we derive in the same way inequality (14), thus proving the sequential upper semicontinuity of \(\varphi |_{{\mathcal {X}}^{1}}\) with respect to the weak topology of \({\mathcal {X}}^{1}\).

Under the additional assumption (11), the same argumentation as above can be repeated along with (12), in order to derive the remaining assertions. \(\square \)

Corollary 1

Denote by

$$\begin{aligned} M(p):=\left\{ x\in {\mathcal {X}}|\varphi (x)\ge p\right\} ;\quad M^{1}(p):=\left\{ x\in {\mathcal {X}}^{1}|\varphi (x)\ge p\right\} \end{aligned}$$

the sets of feasible decisions in problem (3) defined by the dynamic probabilistic constraint. In addition to the basic assumptions (BA), suppose that h in (4) has components \( h_{i} \) which are lower semicontinuous in their first argument vector (related with x). Then, M(p) is strongly closed in \({\mathcal {X}}\) and \(M^{1}(p)\) is weakly sequentially closed in \({\mathcal {X}}^{1}\).

Corollary 2

In addition to the basic assumptions (BA) and to condition (11) suppose that h in (4) has components \(h_{i}\) which are continuous in their first argument vector (related with x). Then, \(\varphi :{\mathcal {X}}\rightarrow \left[ 0,1\right] \) defined in (4) is continuous in the norm topology of \({\mathcal {X}}\). Its restriction \(\varphi |_{{\mathcal {X}}^{1}}:{\mathcal {X}}^{1}\rightarrow \left[ 0,1\right] \) is sequentially continuous with respect to the weak topology of \({\mathcal {X}}^{1}\).

We are now in a position to prove, with standard arguments, the existence of solutions to problem (3) posed in the space \( {\mathcal {X}}^{1}\) of decisions:

Theorem 2

Consider the optimization problem (3) with \( {\mathcal {X}}^{1}\) as the space of decisions. In addition to the basic assumptions (BA), we suppose that

  1. The index q in the definition of the space \({\mathcal {X}}^{1}\) satisfies \(1<q<\infty \).

  2. The abstract constraint set \(C\subseteq {\mathcal {X}}^{1}\) is norm closed, bounded and convex.

  3. The objective function J is weakly sequentially lower semicontinuous.

  4. The mapping h in (4) has components \(h_{i}\) which are lower semicontinuous in their first argument vector (related with x).

  5. The set of feasible decisions of problem (3) is nonempty.

Then, (3) admits a solution.

Proof

As a consequence of 1., \({\mathcal {X}}^{1}\) is a reflexive Banach space. Therefore, 2. implies that C is weakly sequentially compact. By 4. and Corollary 1, the set \(M^{1}(p)=\left\{ x\in {\mathcal {X}} ^{1}|\varphi (x)\ge p\right\} \) is weakly sequentially closed. Hence, with 5., the feasible set \(C\cap M^{1}(p)\) of (3) is nonempty and weakly sequentially compact. Now, with 3., the Weierstrass Theorem guarantees the existence of a solution to (3). \(\square \)

The following example illustrates that, under the assumptions of Corollary 1, M(p) cannot be expected to be weakly sequentially closed in \({\mathcal {X}}\) (in contrast with \(M^{1}(p)\) and \( {\mathcal {X}}^{1}\)) and therefore existence of solutions as in Theorem 2 cannot be expected in the space \({\mathcal {X}}\):

Example 1

Let \(T=2,k=2,q=2,p=0.5+(2\pi )^{-1}\) and let \(\xi \) have a uniform distribution on the rectangle \(\left[ 0,4\pi \right] \times \left[ 0,1\right] \). Define the mapping h by \(h_{i}\left( x_{1},x_{2},z_{1},z_{2}\right) =z_{i}-x_{i}\) for \(i=1,2\). Then, the \(h_{i}\) are continuous, so that our basic assumptions (BA) are satisfied and Corollary 1 guarantees that M(p) is strongly closed in \( {\mathcal {X}}\). Now, define the sequence

$$\begin{aligned} x^{(n)}:= & {} \left( x_{1}^{(n)},x_{2}^{(n)}\right) \in {\mathcal {X}}={\mathbb {R}}\times L^{2}( {\mathbb {R}})\quad \text{ by }\\ x_{1}^{(n)}:= & {} 4\pi ;\quad x_{2}^{(n)}(t):=\left\{ \begin{array}{cc} 0 &{}\quad t\in \left( -\infty ,0\right] \cup \left( 4\pi ,\infty \right) \\ \sin \left( nt\right) &{}\quad t\in \left( 0,2\pi \right] \\ 1 &{}\quad t\in \left( 2\pi ,4\pi \right] \end{array} \right. . \end{aligned}$$

Then, \(x^{(n)}\) weakly converges to \(x:=\left( 4\pi ,\chi _{\left[ 2\pi ,4\pi \right] }\right) \). Moreover, by definition of h and \(\xi \) and by (4), it holds that

$$\begin{aligned} \varphi \left( x^{(n)}\right)= & {} {\mathbb {P}}\left( \xi _{1}\le x_{1}^{(n)},\xi _{2}\le x_{2}^{(n)}\left( \xi _{1}\right) \right) \\= & {} {\mathbb {P}}\left( 0\le \xi _{1}<2\pi ,0\le \xi _{2}\le \sin \left( n\xi _{1}\right) \right) +{\mathbb {P}}\left( 2\pi \le \xi _{1}\le 4\pi ,0\le \xi _{2}\le 1\right) \\= & {} (2\pi )^{-1}+0.5=p. \end{aligned}$$

Therefore, \(x^{(n)}\in M(p)\). On the other hand,

$$\begin{aligned} \varphi \left( x\right)= & {} {\mathbb {P}}\left( \xi _{1}\le 4\pi ,\xi _{2}\le \chi _{\left[ 2\pi ,4\pi \right] }\left( \xi _{1}\right) \right) ={\mathbb {P}} \left( 2\pi \le \xi _{1}\le 4\pi ,0\le \xi _{2}\le 1\right) \\= & {} 0.5<p. \end{aligned}$$

It follows that \(x\notin M(p)\), whence M(p) fails to be weakly sequentially closed.
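The probabilities appearing in Example 1 are easily confirmed numerically. The following sketch (sample size and seed are arbitrary choices) estimates \(\varphi (x^{(n)})\) and \(\varphi (x)\) by Monte Carlo:

```python
# Illustrative Monte Carlo check of Example 1: phi(x^(n)) = 0.5 + 1/(2*pi)
# for every n, while phi of the weak limit x is only 0.5 < p.
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
xi1 = rng.uniform(0.0, 4 * np.pi, N)          # xi uniform on [0,4*pi] x [0,1]
xi2 = rng.uniform(0.0, 1.0, N)
p = 0.5 + 1 / (2 * np.pi)                     # ~ 0.6592

def x2_n(t, n):
    """Second-stage policy x_2^(n) from Example 1."""
    vals = np.where((t > 0) & (t <= 2 * np.pi), np.sin(n * t), 0.0)
    return np.where((t > 2 * np.pi) & (t <= 4 * np.pi), 1.0, vals)

def phi(x2_at_xi1):
    return np.mean((xi1 <= 4 * np.pi) & (xi2 <= x2_at_xi1))

for n in (1, 10, 100):
    print(n, phi(x2_n(xi1, n)))               # all approximately p
print(phi(np.where(xi1 >= 2 * np.pi, 1.0, 0.0)))  # weak limit: ~ 0.5
```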

We finish this Section by briefly addressing the issue of convexity of the feasible set defined by the probabilistic constraint \(\varphi (x)\ge p\) in (3). Assume first that we deal with a joint static probabilistic constraint, which means that the decision policies x are supposed to be constant: \(\left[ x\right] (z)\equiv x\in {\mathbb {R}}^{T}\) in (7). Assume further that \(\xi \) has a logconcave density (e.g., multivariate Gaussian) and that the mapping h is affine linear: \(h\left( x,z\right) =Ax+Bz+b\). This is the case, for instance, for the reservoir problem with static probabilistic constraint (6). Then, thanks to a result by Prékopa [15, Th. 10.2.1.], the inequality \(\varphi (x)\ge p\) defines a convex set of feasible decisions x for any right-hand side probability level p. Unfortunately, a similar convexity result gets lost in the dynamic setting. Indeed, we may revisit Example 1, where the density of the given uniform distribution is constant on the rectangle and zero outside, hence logconcave (in the extended-valued meaning). Moreover, the mapping \(h\left( x,z\right) =z-x\) is linear. As for the feasible set \(M(p):=\left\{ x\in {\mathcal {X}}|\varphi (x)\ge p\right\} \), we have seen in Example 1 that it is strongly closed but fails to be weakly sequentially closed. If it were convex, then closedness would imply weak closedness, hence weak sequential closedness, which is a contradiction.

3 Properties of the probability function in a simple two-stage model

In this section, we are going to investigate analytical properties (continuity, Lipschitz continuity, differentiability including explicit derivatives) of the probability function \(\varphi \) in (4) in the framework of the simplest meaningful dynamic setting. More precisely, we consider a two-stage model (\(T=2\)) with the following joint probabilistic constraint in separated form:

$$\begin{aligned} \varphi \left( x\right) :={\mathbb {P}}\left( \xi _{1}\le x_{1},\xi _{2}\le x_{2}(\xi _{1})\right) \ge p. \end{aligned}$$
(15)

This corresponds to the choice of the mapping \(h:{\mathbb {R}}^{2}\times {\mathbb {R}}^{2}\rightarrow {\mathbb {R}}^{2}\) defined by \(h(x,z)=z-x\) in (4). We will choose \({\mathcal {X}}\) with index \(q=2\) to be the base space of decisions, which means that \(x_{2}\in L^{2}({\mathbb {R}})\). In all results hereafter, we shall explicitly work with a given density of \(\xi \). By continuity of h, our basic assumptions (BA) will be automatically satisfied then.

3.1 Continuity and Lipschitz continuity

Proposition 2

If \(\xi \) has a density, then the probability function \(\varphi :{\mathbb {R}} \times L^{2}({\mathbb {R}})\rightarrow {\mathbb {R}}\) is continuous.

Proof

Since h is continuous, it suffices by Corollary 2 to check condition (11) at an arbitrary \(x\in {\mathcal {X}}\). For the first component \(h_{1}\) of h it reads as

$$\begin{aligned} \lambda _{2}\left( \left\{ z\in {\mathbb {R}}^{2}|z_{1}=x_{1}\right\} \right) =0 \end{aligned}$$

which is evidently true. For the second component we observe that

$$\begin{aligned}&\lambda _{2}\left( \left\{ z\in {\mathbb {R}}^{2}|z_{2}=x_{2}(z_{1})\right\} \right) =0\\&\quad \Longleftrightarrow \lambda _{1}\left( \left\{ z_{2}\in {\mathbb {R}} |z_{2}=x_{2}(z_{1})\right\} \right) =0 \quad \text {a.e. } z_{1}\in {\mathbb {R}} \end{aligned}$$

and that the right-hand side is evidently true. \(\square \)

Before extending the previous result on continuity to the stronger Lipschitz continuity, we introduce the following two assumptions on the density \(g_{\xi }\) of a two-dimensional random vector \(\xi \):

$$\begin{aligned}&\exists C\ge 0:g_{\xi _{1}}(r)\left( =\int _{{\mathbb {R}}}g_{\xi }\left( r,s\right) ds\right) \le C\quad \text {a.e. }r\in {\mathbb {R}} \end{aligned}$$
(16)
$$\begin{aligned}&\sup _{s\in {\mathbb {R}}}g_{\xi }\left( \cdot ,s\right) \in L^{2}\left( {\mathbb {R}}\right) \end{aligned}$$
(17)

Note that (16) means that the first marginal density of \(\xi \) (which is the density \(g_{\xi _{1}}\) of the first component of \(\xi \)) is bounded.

Proposition 3

Let the density \(g_{\xi }\) of \(\xi \) satisfy (16) and (17). Then, \(\varphi \) is Lipschitz continuous.

Proof

Consider an arbitrary couple \(x,y\in {\mathcal {X}}\). We start with the obvious estimate

$$\begin{aligned} \left| \varphi (x)-\varphi (y)\right| \le \left| \varphi (x_{1},x_{2})-\varphi (y_{1},x_{2})\right| +\left| \varphi (y_{1},x_{2})-\varphi (y_{1},y_{2})\right| . \end{aligned}$$
(18)

Without loss of generality, assume that \(x_{1}\le y_{1}\). Now, by (15), and taking into account assumption (16), we have that

$$\begin{aligned} \left| \varphi (x_{1},x_{2})-\varphi (y_{1},x_{2})\right|= & {} \left| {\mathbb {P}}\left( \xi _{1}\le x_{1},\xi _{2}\le x_{2}(\xi _{1})\right) -{\mathbb {P}}\left( \xi _{1}\le y_{1},\xi _{2}\le x_{2}(\xi _{1})\right) \right| \\= & {} {\mathbb {P}}\left( x_{1}<\xi _{1}\le y_{1},\xi _{2}\le x_{2}(\xi _{1})\right) \\= & {} \int _{x_{1}}^{y_{1}}\int _{-\infty }^{x_{2}(r)}g_{\xi }\left( r,s\right) dsdr\le \int _{x_{1}}^{y_{1}}\int _{-\infty }^{\infty }g_{\xi }\left( r,s\right) dsdr \\\le & {} C\left| y_{1}-x_{1}\right| . \end{aligned}$$

Likewise, exploiting (17), the fact that \(x_{2},y_{2}\in L^{2}\left( {\mathbb {R}}\right) \) and the Cauchy-Schwarz inequality, we obtain

$$\begin{aligned} \left| \varphi (y_{1},x_{2})-\varphi (y_{1},y_{2})\right|= & {} \left| {\mathbb {P}}\left( \xi _{1}\le y_{1},\xi _{2}\le x_{2}(\xi _{1})\right) -{\mathbb {P}}\left( \xi _{1}\le y_{1},\xi _{2}\le y_{2}(\xi _{1})\right) \right| \\= & {} \left| \int _{-\infty }^{y_{1}}\left( \int _{-\infty }^{x_{2}(r)}g_{\xi }\left( r,s\right) ds-\int _{-\infty }^{y_{2}(r)}g_{\xi }\left( r,s\right) ds\right) dr\right| \\\le & {} \int _{-\infty }^{y_{1}}\left| \int _{-\infty }^{x_{2}(r)}g_{\xi }\left( r,s\right) ds-\int _{-\infty }^{y_{2}(r)}g_{\xi }\left( r,s\right) ds\right| dr \\= & {} \int _{-\infty }^{y_{1}}\int _{\min \{x_{2}(r),y_{2}(r)\}}^{\max \{x_{2}(r),y_{2}(r)\}}g_{\xi }\left( r,s\right) dsdr \\\le & {} \int _{-\infty }^{\infty }\sup _{s\in {\mathbb {R}}}g_{\xi }\left( r,s\right) \left| x_{2}(r)-y_{2}(r)\right| dr \\\le & {} \left\| \sup _{s\in {\mathbb {R}}}g_{\xi }\left( \cdot ,s\right) \right\| _{L^{2}\left( {\mathbb {R}}\right) }\left\| x_{2}-y_{2}\right\| _{L^{2}\left( {\mathbb {R}}\right) }={\tilde{C}}\left\| x_{2}-y_{2}\right\| _{L^{2}\left( {\mathbb {R}}\right) }. \end{aligned}$$

Along with (18), we conclude that

$$\begin{aligned} \left| \varphi (x)-\varphi (y)\right| \le (C+{\tilde{C}})\left\| x-y\right\| _{{\mathcal {X}}}. \end{aligned}$$

\(\square \)
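For a concrete distribution, the Lipschitz estimate of Proposition 3 can be tested numerically. The sketch below assumes \(\xi \sim {\mathcal {N}}\left( 0,I_{2}\right) \), in which case (16) and (17) hold with \(C=f(0)\) and \({\tilde{C}}=f(0)\left\| f\right\| _{L^{2}}\) (f being the standard Gaussian density; see also Proposition 6 below), and compares \(\left| \varphi (x)-\varphi (y)\right| \) with the bound \((C+{\tilde{C}})\left\| x-y\right\| _{{\mathcal {X}}}\) for two illustrative policies:

```python
# Numerical illustration of the Lipschitz bound from Proposition 3 under the
# assumption xi ~ N(0, I_2); the two policies below are arbitrary choices.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f, Phi = norm.pdf, norm.cdf
C = f(0)                                            # bound on the marginal density
C_tilde = f(0) * np.sqrt(1 / (2 * np.sqrt(np.pi)))  # = f(0) * ||f||_{L^2}

def phi(x1, x2):                                    # phi from (15) for g = f(r)f(s)
    return quad(lambda r: f(r) * Phi(x2(r)), -np.inf, x1)[0]

x1, y1 = 0.5, -0.2
x2 = lambda r: np.exp(-r ** 2)
y2 = lambda r: 0.5 * np.exp(-(r - 1.0) ** 2)
dist = max(abs(x1 - y1),                            # maximum norm on X (cf. Sect. 1.2)
           np.sqrt(quad(lambda r: (x2(r) - y2(r)) ** 2, -np.inf, np.inf)[0]))
print(abs(phi(x1, x2) - phi(y1, y2)), (C + C_tilde) * dist)
```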

The following example shows that the assumptions of Proposition 3 are not strong enough to guarantee the differentiability of \( \varphi \):

Example 2

Let \(\xi \sim {\mathcal {N}}\left( 0,I_{2}\right) \) have a bivariate standard Gaussian distribution (uncorrelated components with mean zero and unit variance). By Proposition 6, the assumptions (16) and (17) of Proposition 3 are satisfied and, hence, \(\varphi \) is Lipschitz continuous. On the other hand, \(\varphi \) fails to be differentiable. To see this, we fix \({\hat{x}}_{2}:=\chi _{\left[ 0,1\right] }\in L^{2}\left( {\mathbb {R}}\right) \) and observe that the partial real function \({\tilde{\varphi }}(x_{1}):=\varphi \left( x_{1},{\hat{x}} _{2}\right) \) fails to be differentiable. Indeed, the following explicit representation can be immediately verified, where \(\varPhi \) refers to the cumulative distribution function of the one-dimensional standard Gaussian distribution:

$$\begin{aligned} {\tilde{\varphi }}(x_{1})=\left\{ \begin{array}{ll} \varPhi \left( 0\right) \varPhi \left( x_{1}\right) &{} x_{1}\le 0 \\ \varPhi ^{2}\left( 0\right) +\varPhi \left( 1\right) \left( \varPhi \left( x_{1}\right) -\varPhi \left( 0\right) \right) &{} x_{1}\in \left[ 0,1\right] \\ \varPhi ^{2}\left( 0\right) +\varPhi \left( 1\right) \left( \varPhi \left( 1\right) -\varPhi \left( 0\right) \right) +\varPhi \left( 0\right) \left( \varPhi \left( x_{1}\right) -\varPhi \left( 1\right) \right) &{} x_{1}\ge 1 \end{array} \right. \end{aligned}$$

The graph of this function is shown in Fig. 1. Clearly, \({\tilde{\varphi }}\) is Lipschitz continuous because \(\varphi \) is so. On the other hand, it fails to be differentiable at \(x_{1}=0\) and \(x_{1}=1\). This can be seen for \(x_{1}=0\), for instance, by differentiating the first two expressions above at 0. With f denoting the density of the standard Gaussian distribution, the derivative of the first expression (yielding the left directional derivative of \({\tilde{\varphi }}\) at 0) gives \(\varPhi \left( 0\right) f\left( 0\right) \), whereas the derivative of the second expression (yielding the right directional derivative of \({\tilde{\varphi }}\) at 0) gives \(\varPhi \left( 1\right) f\left( 0\right) \). Since \(\varPhi \left( 1\right) >\varPhi \left( 0\right) \) and \(f\left( 0\right) >0\), both values are different, hence \({\tilde{\varphi }}\) fails to be differentiable at 0.

Fig. 1 Plot of the function \(\tilde{\varphi }\) from Example 2
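The plot in Fig. 1 and the two one-sided derivatives at \(x_{1}=0\) can be reproduced with a few lines of code (a sketch assuming availability of scipy; the step size of the difference quotients is an arbitrary choice):

```python
# Illustrative check of Example 2: the left and right difference quotients of
# tilde-phi at 0 approach Phi(0)*f(0) and Phi(1)*f(0), respectively.
import numpy as np
from scipy.stats import norm

Phi, f = norm.cdf, norm.pdf

def phi_tilde(x1):
    """Piecewise closed form of tilde-phi from Example 2."""
    if x1 <= 0:
        return Phi(0) * Phi(x1)
    if x1 <= 1:
        return Phi(0) ** 2 + Phi(1) * (Phi(x1) - Phi(0))
    return (Phi(0) ** 2 + Phi(1) * (Phi(1) - Phi(0))
            + Phi(0) * (Phi(x1) - Phi(1)))

eps = 1e-7
left = (phi_tilde(0.0) - phi_tilde(-eps)) / eps    # -> Phi(0)*f(0) ~ 0.1995
right = (phi_tilde(eps) - phi_tilde(0.0)) / eps    # -> Phi(1)*f(0) ~ 0.3356
print(left, right, Phi(0) * f(0), Phi(1) * f(0))
```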

We shall see in the next section that the reason for the failure of differentiability of \(\varphi \) in Example 2 is the discontinuity of the second-stage policy \({\hat{x}}_{2}=\chi _{\left[ 0,1\right] }\) at which the derivative is considered. More precisely, this circumstance concerns just the partial differentiability of \(\varphi \) with respect to its first argument \(x_{1}\), whereas the partial differentiability of \( \varphi \) with respect to \(x_{2}\) remains unaffected by a possible discontinuity of \(x_{2}\).

3.2 Differentiability

Before verifying the partial differentiability of \(\varphi \) with respect to its first argument, we shall prove the following

Lemma 2

Let a bivariate probability density g satisfy the following technical (uniform calmness) condition:

$$\begin{aligned} \begin{array}{c} \forall {\bar{r}}\in {\mathbb {R}}\,\,\exists l\in L^{1}\left( {\mathbb {R}}\right) ,\varepsilon >0:\\ \left| g\left( r,s\right) -g\left( {\bar{r}},s\right) \right| \le l(s)\left| r-{\bar{r}}\right| \quad \forall r\in \left( {\bar{r}}-\varepsilon ,{\bar{r}}+\varepsilon \right) \text { a.e. }s\in {\mathbb {R}}. \end{array} \end{aligned}$$
(19)

Assume further that \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is continuous. Then, the function

$$\begin{aligned} \alpha \left( r\right) :=\int _{-\infty }^{f(r)}g\left( r,s\right) ds\quad \left( r\in {\mathbb {R}}\right) \end{aligned}$$
(20)

is finite-valued and continuous.

Proof

Fix an arbitrary \({\bar{r}}\in {\mathbb {R}}\) and consider an arbitrary sequence \( r_{n}\rightarrow {\bar{r}}\). We are going to show that \(\alpha \left( r_{n}\right) \rightarrow \alpha \left( {\bar{r}}\right) \). We observe first that

$$\begin{aligned} g\left( r_{n},s\right) \chi _{(-\infty ,f(r_{n})]}(s)\rightarrow g\left( {\bar{r}},s\right) \chi _{(-\infty ,f({\bar{r}})]}(s)\quad \text {a.e. }s\in {\mathbb {R}}. \end{aligned}$$
(21)

Indeed, if \(s<f({\bar{r}})\), then \(s<f(r_{n})\), and if \(s>f({\bar{r}})\), then \( s>f(r_{n})\) for n large enough by continuity of f. Hence, for each \( s\ne f({\bar{r}})\), one has that \(\chi _{(-\infty ,f(r_{n})]}(s)=\chi _{(-\infty ,f({\bar{r}})]}(s)\) for n large enough. By (19), there exists a subset \(A\subseteq {\mathbb {R}}\) such that \(\lambda _{1}\left( A\right) =0\) and \(g\left( r_{n},s\right) \rightarrow g\left( {\bar{r}} ,s\right) \) for all \(s\in {\mathbb {R}}\backslash A\). Consequently, (21) holds true for all \(s\in {\mathbb {R}}\backslash \left( A\cup \left\{ f({\bar{r}})\right\} \right) \), where \(\lambda _{1}\left( A\cup \left\{ f({\bar{r}})\right\} \right) =0\).

From (19) we conclude that, for n large enough and almost every \(s\in {\mathbb {R}}\),

$$\begin{aligned} \left| g\left( r_{n},s\right) -g\left( {\bar{r}},s\right) \right| \le l(s)\left| r_{n}-{\bar{r}}\right| \le l(s). \end{aligned}$$

Therefore, for n large enough and almost every \(s\in {\mathbb {R}}\),

$$\begin{aligned} g\left( r_{n},s\right) \chi _{(-\infty ,f(r_{n})]}(s)\le g\left( r_{n},s\right) \le l(s)+g\left( {\bar{r}},s\right) . \end{aligned}$$
(22)

We show that \(g\left( {\bar{r}},\cdot \right) \in L^{1}\left( {\mathbb {R}} \right) \): Indeed, as \(g\in L^{1}\left( {\mathbb {R}}^{2}\right) \) (as a probability density), Fubini’s Theorem yields that \(g\left( r,\cdot \right) \in L^{1}\left( {\mathbb {R}}\right) \) for almost every \(r\in {\mathbb {R}}\). Hence, there exists some \({\tilde{r}}\in \left( {\bar{r}}-\varepsilon ,{\bar{r}} +\varepsilon \right) \) with \(\varepsilon \) from (19) such that \( g\left( {\tilde{r}},\cdot \right) \in L^{1}\left( {\mathbb {R}}\right) \) and

$$\begin{aligned} g\left( {\bar{r}},s\right) \le l(s)\left| {\tilde{r}}-{\bar{r}}\right| +g\left( {\tilde{r}},s\right) \quad \text {a.e. }s\in {\mathbb {R}}. \end{aligned}$$

Since \(l\in L^{1}\left( {\mathbb {R}}\right) \) and \(\left| {\tilde{r}}-{\bar{r}} \right| \le \varepsilon \), it follows that \(g\left( {\bar{r}},\cdot \right) \in L^{1}\left( {\mathbb {R}}\right) \). Hence, by (22), \( l+g\left( {\bar{r}},\cdot \right) \) is an integrable majorant for the sequence of functions \(g\left( r_{n},\cdot \right) \chi _{(-\infty ,f(r_{n})]}\), which by (21) converges pointwise almost everywhere to the function \(g\left( {\bar{r}},\cdot \right) \chi _{(-\infty ,f({\bar{r}})]}\). Therefore, by Lebesgue’s dominated convergence theorem, the value

$$\begin{aligned} \alpha \left( {\bar{r}}\right) =\int g\left( {\bar{r}},s\right) \chi _{(-\infty ,f({\bar{r}})]}(s)ds<\infty \end{aligned}$$

in (20) is finite and it holds that

$$\begin{aligned} \alpha \left( r_{n}\right) =\int g\left( r_{n},s\right) \chi _{(-\infty ,f(r_{n})]}(s)ds\rightarrow _{n}\alpha \left( {\bar{r}}\right) . \end{aligned}$$

Since \({\bar{r}}\) was chosen arbitrarily, we have shown that \(\alpha \) is finite-valued and continuous. \(\square \)

The preceding Lemma allows us to formulate the desired result on partial differentiability of \(\varphi \) with respect to its first argument:

Proposition 4

Let the density \(g_{\xi }\) of \(\xi \) satisfy (19) and fix \({\bar{x}}_{2}\in L^{2}({\mathbb {R}})\) such that \({\bar{x}}_{2}\) is continuous. Then, the partial derivative of \(\varphi \) w.r.t. \(x_{1}\) exists at any \(\left( {\bar{x}}_{1},{\bar{x}}_{2}\right) \) and equals

$$\begin{aligned} \frac{\partial \varphi }{\partial x_{1}}\left( {\bar{x}}_{1},{\bar{x}} _{2}\right) =\int _{-\infty }^{{\bar{x}}_{2}({\bar{x}}_{1})}g_{\xi }\left( {\bar{x}} _{1},s\right) ds. \end{aligned}$$

Moreover, it depends continuously on \(x_{1}\).

Proof

Let \({\bar{x}}_{1}\) be arbitrary. By (15), we have that

$$\begin{aligned} \varphi \left( {\bar{x}}_{1},{\bar{x}}_{2}\right) =\int _{-\infty }^{{\bar{x}} _{1}}\int _{-\infty }^{{\bar{x}}_{2}(r)}g_{\xi }\left( r,s\right) dsdr= \int _{-\infty }^{{\bar{x}}_{1}}\alpha \left( r\right) dr \end{aligned}$$

with \(\alpha \) defined in Lemma 2 upon setting \(f(r):={\bar{x}}_{2}(r)\) and \(g:=g_{\xi }\). Since \({\bar{x}}_{2}\) is supposed to be continuous, the assumptions of Lemma 2 are satisfied. Thus, by taking into account that \(\alpha \) is continuous according to Lemma 2, we arrive at

$$\begin{aligned} \frac{\partial \varphi }{\partial x_{1}}\left( {\bar{x}}_{1},{\bar{x}} _{2}\right) =\lim _{h\rightarrow 0}\frac{\varphi \left( {\bar{x}}_{1}+h,{\bar{x}} _{2}\right) -\varphi \left( {\bar{x}}_{1},{\bar{x}}_{2}\right) }{h}=\alpha ({\bar{x}}_{1})=\int _{-\infty }^{{\bar{x}}_{2}({\bar{x}}_{1})}g_{\xi }\left( {\bar{x}} _{1},s\right) ds. \end{aligned}$$

Continuity of \(\frac{\partial \varphi }{\partial x_{1}}\left( \cdot ,{\bar{x}} _{2}\right) =\alpha \) follows once more from the continuity of \(\alpha \). \(\square \)
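For \(\xi \sim {\mathcal {N}}\left( 0,I_{2}\right) \) (an illustrative assumption under which \(g_{\xi }(r,s)=f(r)f(s)\) factorizes and (19) holds by Proposition 6), the formula of Proposition 4 is easily checked against a difference quotient; the continuous policy \({\bar{x}}_{2}(r)=e^{-r^{2}}\) below is an arbitrary element of \(L^{2}({\mathbb {R}})\):

```python
# Sanity check of Proposition 4 for independent standard Gaussian components:
# d(phi)/d(x1) at (x1, x2) should equal f(x1) * Phi(x2(x1)).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f, Phi = norm.pdf, norm.cdf
x2 = lambda r: np.exp(-r ** 2)           # continuous policy in L^2(R)

def phi(x1):                             # int_{-inf}^{x1} f(r) * Phi(x2(r)) dr
    return quad(lambda r: f(r) * Phi(x2(r)), -np.inf, x1)[0]

x1, eps = 0.3, 1e-5
fd = (phi(x1 + eps) - phi(x1 - eps)) / (2 * eps)   # central difference quotient
formula = f(x1) * Phi(x2(x1))            # = int_{-inf}^{x2(x1)} g_xi(x1, s) ds
print(fd, formula)                       # the two values agree closely
```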

Observe that a full continuity result (with respect to \(x_{1}\) and \(x_{2}\) simultaneously) cannot be expected for the partial derivative \( \frac{\partial \varphi }{\partial x_{1}}\) because, by virtue of Example 2, it may not even be defined for discontinuous \(x_{2}\) approaching the continuous policy \({\bar{x}}_{2}\). In contrast to the partial derivative w.r.t. \(x_{1}\), the partial derivative of \(\varphi \) with respect to \(x_{2}\) does not require any assumptions on the fixed second-stage policy \({\bar{x}} _{2}\), but rather some additional assumptions on the density \(g_{\xi }\):

Proposition 5

Let the density \(g_{\xi }\) of \(\xi \) satisfy assumption (17) as well as the assumption of being Lipschitz continuous in the second argument uniformly in the first argument:

$$\begin{aligned} \exists C>0:\left| g_{\xi }\left( r,s\right) -g_{\xi }\left( r,t\right) \right| \le C\left| s-t\right| \quad \forall r,s,t\in {\mathbb {R}} . \end{aligned}$$
(23)

Fix an arbitrary \(\left( {\bar{x}}_{1},{\bar{x}}_{2}\right) \in {\mathcal {X}}= {\mathbb {R}}\times L^{2}\left( {\mathbb {R}}\right) \). Then, the partial derivative \(\nabla _{x_{2}}\varphi \) exists at \(\left( {\bar{x}}_{1},{\bar{x}} _{2}\right) \), it is given by the expression

$$\begin{aligned} \nabla _{x_{2}}\varphi \left( {\bar{x}}_{1},{\bar{x}}_{2}\right) =g_{\xi }\left( \cdot ,{\bar{x}}_{2}(\cdot )\right) \chi _{(-\infty ,{\bar{x}}_{1}]}. \end{aligned}$$
(24)

and it is continuous in \(\left( x_{1},x_{2}\right) \).

Proof

We put \(\gamma \left( x_{2}\right) :=\varphi \left( {\bar{x}}_{1},x_{2}\right) \) for all \(x_{2}\in L^{2}\left( {\mathbb {R}}\right) \) and show that this function is Fréchet differentiable at \({\bar{x}}_{2}\). Define the linear functional

$$\begin{aligned} A(h):=\int _{-\infty }^{{\bar{x}}_{1}}g_{\xi }\left( r,{\bar{x}}_{2}(r)\right) h(r)dr\quad \left( h\in L^{2}\left( {\mathbb {R}}\right) \right) . \end{aligned}$$
(25)

From (17) we infer that \(g_{\xi }\left( \cdot ,{\bar{x}}_{2}(\cdot )\right) \in L^{2}\left( {\mathbb {R}}\right) \), whence by the Cauchy-Schwarz inequality,

$$\begin{aligned} \left| A(h)\right| \le \int _{-\infty }^{\infty }g_{\xi }\left( r, {\bar{x}}_{2}(r)\right) \left| h(r)\right| dr\le \left\| g_{\xi }\left( \cdot ,{\bar{x}}_{2}(\cdot )\right) \right\| _{L^{2}}\left\| h\right\| _{L^{2}}. \end{aligned}$$

Consequently, A is a continuous linear functional. Hence, the Fréchet differentiability of \(\gamma \) at \({\bar{x}}_{2}\) will be proven, once we can show that

$$\begin{aligned} \lim _{\left\| h\right\| _{L^{2}}\rightarrow 0}\left\| h\right\| _{L^{2}}^{-1}\left( \gamma \left( {\bar{x}}_{2}+h\right) -\gamma \left( {\bar{x}} _{2}\right) -A(h)\right) =0. \end{aligned}$$
(26)

Indeed, the definition of \(\gamma \) and (15) entail that

$$\begin{aligned}&\gamma \left( {\bar{x}}_{2}+h\right) -\gamma \left( {\bar{x}}_{2}\right) -A(h) \\&\quad =\int _{-\infty }^{{\bar{x}}_{1}}\int _{-\infty }^{{\bar{x}}_{2}\left( r\right) +h\left( r\right) }g_{\xi }\left( r,s\right) dsdr-\int _{-\infty }^{{\bar{x}} _{1}}\int _{-\infty }^{{\bar{x}}_{2}\left( r\right) }g_{\xi }\left( r,s\right) dsdr\nonumber \\&\qquad - \int _{-\infty }^{{\bar{x}}_{1}}g_{\xi }\left( r,{\bar{x}}_{2}(r)\right) h(r)dr \\&\quad =\int _{-\infty }^{{\bar{x}}_{1}}\left( \left( \mathrm {sgn\,}h\left( r\right) \right) \int _{\min \{{\bar{x}}_{2}\left( r\right) ,{\bar{x}}_{2}(r)+h\left( r\right) \}}^{\max \{{\bar{x}}_{2}\left( r\right) ,{\bar{x}}_{2}(r)+h\left( r\right) \}}g_{\xi }\left( r,s\right) ds-g_{\xi }\left( r,{\bar{x}} _{2}(r)\right) h(r)\right) dr \\&\quad = \int _{-\infty }^{{\bar{x}}_{1}}\left( \mathrm {sgn\,}h\left( r\right) \right) \int _{\min \{{\bar{x}}_{2}\left( r\right) ,{\bar{x}}_{2}(r)+h\left( r\right) \}}^{\max \{{\bar{x}}_{2}\left( r\right) ,{\bar{x}}_{2}(r)+h\left( r\right) \}}\left( g_{\xi }\left( r,s\right) -g_{\xi }\left( r,{\bar{x}}_{2}(r)\right) \right) dsdr. \end{aligned}$$

By (23), we have that

$$\begin{aligned} \begin{array}{l} \left| g_{\xi }\left( r,s\right) -g_{\xi }\left( r,{\bar{x}} _{2}(r)\right) \right| \le C\left| h\left( r\right) \right| \\ \quad \forall r\in {\mathbb {R}}\,\,\forall s\in \left[ \min \{{\bar{x}}_{2}\left( r\right) ,{\bar{x}}_{2}(r)+h\left( r\right) \},\max \{{\bar{x}}_{2}\left( r\right) ,{\bar{x}}_{2}(r)+h\left( r\right) \}\right] . \end{array} \end{aligned}$$

Consequently, we derive the following relation, which implies (26):

$$\begin{aligned} \left| \gamma \left( {\bar{x}}_{2}+h\right) -\gamma \left( {\bar{x}} _{2}\right) -A(h)\right| \le C\int _{-\infty }^{{\bar{x}}_{1}}\left| h\left( r\right) \right| ^{2}dr=C\left\| h\right\| _{L^{2}}^{2}. \end{aligned}$$

It follows that \(\nabla _{x_{2}}\varphi \left( {\bar{x}}_{1},{\bar{x}}_{2}\right) =\nabla \gamma \left( {\bar{x}}_{2}\right) =A\). Since A in (25) has been shown to be a continuous linear functional on \(L^{2}\left( {\mathbb {R}}\right) \), it can be identified with the function \(g_{\xi }\left( \cdot ,{\bar{x}}_{2}(\cdot )\right) \chi _{(-\infty ,{\bar{x}}_{1}]}\in L^{2}\left( {\mathbb {R}}\right) \). This entails the asserted formula (24). It remains to show that the expression given there depends continuously on \(\left( x_{1},x_{2}\right) \). To this aim, consider a sequence \((x_{1}^{(n)},x_{2}^{(n)})\) in \({\mathcal {X}}\) strongly converging to \(\left( {\bar{x}}_{1},{\bar{x}}_{2}\right) \in {\mathcal {X}}\). We have to show that

$$\begin{aligned} \nabla _{x_{2}}\varphi (x_{1}^{(n)},x_{2}^{(n)})\rightarrow _{n}\nabla _{x_{2}}\varphi \left( {\bar{x}}_{1},{\bar{x}}_{2}\right) \end{aligned}$$

in \(L^{2}\left( {\mathbb {R}}\right) \). We will do this by showing the equivalent fact that every subsequence \((x_{1}^{(n_{k})},x_{2}^{(n_{k})})\) of \((x_{1}^{(n)},x_{2}^{(n)})\) has again a subsequence \( (x_{1}^{(n_{k_{l}})},x_{2}^{(n_{k_{l}})})\) such that

$$\begin{aligned} \nabla _{x_{2}}\varphi (x_{1}^{(n_{k_{l}})},x_{2}^{(n_{k_{l}})})\rightarrow _{l}\nabla _{x_{2}}\varphi \left( {\bar{x}}_{1},{\bar{x}}_{2}\right) \end{aligned}$$
(27)

in \(L^{2}\left( {\mathbb {R}}\right) \). So, let \( (x_{1}^{(n_{k})},x_{2}^{(n_{k})})\) be such an arbitrary subsequence. Observe first that the strong convergence \((x_{1}^{(n_{k})},x_{2}^{(n_{k})}) \rightarrow _{k}\left( {\bar{x}}_{1},{\bar{x}}_{2}\right) \) in \({\mathbb {R}}\times L^{2}\left( {\mathbb {R}}\right) \) implies the almost everywhere pointwise convergence of a further subsequence:

$$\begin{aligned} (x_{1}^{(n_{k_{l}})},x_{2}^{(n_{k_{l}})}(r))\rightarrow _{l}\left( {\bar{x}} _{1},{\bar{x}}_{2}(r)\right) \quad \text {a.e. }r\in {\mathbb {R}}. \end{aligned}$$
(28)

As \(g_{\xi }\) is continuous in its second argument by (23), it follows from (28) that

$$\begin{aligned} g_{\xi }(r,x_{2}^{(n_{k_{l}})}(r))\rightarrow _{l}g_{\xi }(r,{\bar{x}} _{2}(r))\quad \text {a.e. }r\in {\mathbb {R}}. \end{aligned}$$

Moreover,

$$\begin{aligned} \chi _{(-\infty ,x_{1}^{(n_{k_{l}})}]}(r)\rightarrow _{l}\chi _{\left( -\infty ,{\bar{x}}_{1}\right] }(r)\quad \forall r\in {\mathbb {R}}\setminus \left\{ {\bar{x}}_{1}\right\} . \end{aligned}$$

We conclude from (24) that

$$\begin{aligned}&\nabla _{x_{2}}\varphi (x_{1}^{(n_{k_{l}})},x_{2}^{(n_{k_{l}})})(r)=g_{\xi }(r,x_{2}^{(n_{k_{l}})}(r))\chi _{(-\infty ,x_{1}^{(n_{k_{l}})}]}(r)\\&\quad \rightarrow _{l} g_{\xi }(r,{\bar{x}}_{2}(r))\chi _{\left( -\infty ,{\bar{x}}_{1}\right] }(r)=\nabla _{x_{2}}\varphi \left( {\bar{x}}_{1},{\bar{x}}_{2}\right) (r) \end{aligned}$$

for almost every \(r\in {\mathbb {R}}\). On the other hand, by (17)

$$\begin{aligned} \nabla _{x_{2}}\varphi (x_{1}^{(n_{k_{l}})},x_{2}^{(n_{k_{l}})})(\cdot )=g_{\xi }(\cdot ,x_{2}^{(n_{k_{l}})}(\cdot ))\chi _{(-\infty ,x_{1}^{(n_{k_{l}})}]}(\cdot )\le \sup _{s\in {\mathbb {R}}}g_{\xi }\left( \cdot ,s\right) \in L^{2}\left( {\mathbb {R}}\right) . \end{aligned}$$

Therefore, Lebesgue’s Dominated Convergence Theorem (for \(L^{2}\left( {\mathbb {R}}\right) \)) yields the asserted convergence (27) in \( L^{2}\left( {\mathbb {R}}\right) \). \(\square \)

3.3 Distributions satisfying the assumptions

In this section, we specialize the results of the preceding sections to the case of a bivariate Gaussian distribution and of a uniform distribution on a rectangle. First, we verify that all relevant assumptions are satisfied in the Gaussian case:

Proposition 6

Let \(\xi \) be a bivariate random vector distributed according to \(\xi \sim {\mathcal {N}}\left( \mu ,\varSigma \right) \) with regular \(\varSigma \). Then its density \(g_{\xi }\) satisfies the assumptions (16), (17), (19) and (23).

Proof

The first marginal density of \(g_{\xi }\) is the density \(g_{\xi _{1}}\) of its first component \(\xi _{1}\sim {\mathcal {N}}\left( \mu _{1},\varSigma _{11}\right) \), which is clearly bounded. Hence, (16) holds true. To show (17), recall that

$$\begin{aligned} g_{\xi }\left( r,s\right) =C\exp \left( -\frac{1}{2}\left( r-\mu _{1},s-\mu _{2}\right) \varSigma ^{-1}\left( {\begin{array}{c}r-\mu _{1}\\ s-\mu _{2}\end{array}}\right) \right) , \end{aligned}$$
(29)

where C is some normalizing factor. With \(C_{2}>0\) denoting the smallest eigenvalue of \(\varSigma ^{-1}\), we infer that, for all \(s\in {\mathbb {R}}\),

$$\begin{aligned} g_{\xi }\left( r,s\right)\le & {} C\exp \left( -\frac{C_{2}}{2}\left( r-\mu _{1}\right) ^{2}-\frac{C_{2}}{2}\left( s-\mu _{2}\right) ^{2}\right) \nonumber \\\le & {} C\exp \left( -\frac{C_{2}}{2}\left( r-\mu _{1}\right) ^{2}\right) \in L^{2}\left( {\mathbb {R}}\right) \end{aligned}$$
(30)

which implies (17). In order to verify (19) and (23), we first calculate the gradient of \(g_{\xi }\):

$$\begin{aligned} \nabla g_{\xi }\left( r,s\right) =-g_{\xi }\left( r,s\right) \varSigma ^{-1} \left( {\begin{array}{c}r-\mu _{1}\\ s-\mu _{2}\end{array}}\right) \quad \forall r,s\in {\mathbb {R}}. \end{aligned}$$

Denoting \(h_{i}(r):=\exp \left( -\frac{C_{2}}{2}\left( r-\mu _{i}\right) ^{2}\right) \) for \(i=1,2\), it follows from (30) that

$$\begin{aligned} \left\| \nabla g_{\xi }\left( r,s\right) \right\| \le {\tilde{C}} h_{1}(r)h_{2}(s)\left\| \left( r-\mu _{1},s-\mu _{2}\right) \right\| \quad \forall r,s\in {\mathbb {R}}, \end{aligned}$$

where \({\tilde{C}}:=C\left\| \varSigma ^{-1}\right\| \). Since the function \( h_{1}(r)h_{2}(s)\left\| \left( r-\mu _{1},s-\mu _{2}\right) \right\| ^{2}\) is bounded from above, we have that, for some \(C_{3}>0\),

$$\begin{aligned} \sqrt{h_{1}(r)h_{2}(s)}\left\| \left( r-\mu _{1},s-\mu _{2}\right) \right\| \le C_{3}\quad \forall r,s\in {\mathbb {R}}. \end{aligned}$$

Hence, thanks to \(h_{1},h_{2}\le 1\), we get that

$$\begin{aligned} \left\| \nabla g_{\xi }\left( r,s\right) \right\| \le {\tilde{C}}C_{3} \sqrt{h_{1}(r)h_{2}(s)}\le {\tilde{C}}C_{3}\sqrt{h_{2}(s)}\le {\tilde{C}} C_{3}\quad \forall r,s\in {\mathbb {R}}. \end{aligned}$$

Then, (23) follows from the Mean Value Theorem:

$$\begin{aligned} \left| g_{\xi }\left( r,s\right) -g_{\xi }\left( r,t\right) \right| \le {\tilde{C}}C_{3}\left| s-t\right| \quad \forall r,s,t\in {\mathbb {R}}. \end{aligned}$$

Similarly, for arbitrarily fixed \({\bar{r}}\in {\mathbb {R}}\)

$$\begin{aligned} \left| g_{\xi }\left( r,s\right) -g_{\xi }\left( {\bar{r}},s\right) \right| \le {\tilde{C}}C_{3}\sqrt{h_{2}(s)}\left| r-{\bar{r}} \right| \quad \forall r,s\in {\mathbb {R}}, \end{aligned}$$

where \(\sqrt{h_{2}}\in L^{1}\left( {\mathbb {R}}\right) \). This proves (19). \(\square \)
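
The estimates used in this proof are easy to probe numerically. The following small script (a sketch assuming NumPy and SciPy; the concrete mean and covariance are illustrative choices, not data from the text) estimates the Lipschitz modulus of \(g_{\xi }\) in its second argument on a grid, which is consistent with the existence of a finite constant as in (23):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch: empirical check of the Lipschitz estimate (23) for a concrete
# bivariate Gaussian; mean and covariance below are illustrative choices.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.25], [0.25, 1.0]])
g = multivariate_normal(mean=mu, cov=Sigma).pdf

# evaluate the density on a grid and form difference quotients in s
r = np.linspace(-5.0, 5.0, 201)
s = np.linspace(-5.0, 5.0, 201)
R, S = np.meshgrid(r, s, indexing="ij")
vals = g(np.dstack([R, S]))
ds = s[1] - s[0]

# a finite value here is consistent with the uniform bound behind (23)
print("empirical Lipschitz modulus in s:", np.abs(np.diff(vals, axis=1)).max() / ds)
```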

Theorem 3

Let \(\xi \) be a bivariate random vector distributed according to \({\mathcal {N}}\left( \mu ,\varSigma \right) \) with regular \(\varSigma \). Then, the probability function \(\varphi \) in (15) is Lipschitz continuous and has a partial derivative with respect to \(x_{2}\) at an arbitrary \(\left( {\bar{x}}_{1},{\bar{x}}_{2}\right) \in {\mathcal {X}}= {\mathbb {R}}\times L^{2}\left( {\mathbb {R}}\right) \), which is given by the explicit formula

$$\begin{aligned} \nabla _{x_{2}}\varphi \left( {\bar{x}}_{1},{\bar{x}}_{2}\right) (r)=\left\{ \begin{array}{cc} \frac{1}{2\pi \sqrt{\det \varSigma }}\exp \left( -\frac{1}{2} \left( {\begin{array}{c}r-\mu _{1}\\ {\bar{x}} _{2}(r)-\mu _{2}\end{array}}\right) ^\top \varSigma ^{-1} \left( {\begin{array}{c}r-\mu _{1}\\ {\bar{x}} _{2}(r)-\mu _{2}\end{array}}\right) \right) &{} \text { if } r\le {\bar{x}}_{1} \\ &{} \\ 0 &{} \text { if } r>{\bar{x}}_{1} \end{array} \right. . \end{aligned}$$
(31)

Here, \(\nabla _{x_{2}}\varphi \) depends continuously (in the norm of \( {\mathcal {X}}\)) on \(x=\left( x_{1},x_{2}\right) \). Moreover, \(\varphi \) has a partial derivative with respect to \(x_{1}\) at an arbitrary \(\left( {\bar{x}}_{1},{\bar{x}} _{2}\right) \in {\mathcal {X}}= {\mathbb {R}}\times L^{2}\left( {\mathbb {R}} \right) \) with continuous \({\bar{x}}_{2}\), which is given by the explicit formula

$$\begin{aligned} \frac{\partial \varphi }{\partial x_{1}}\left( {\bar{x}}_{1},{\bar{x}}_{2}\right) =\frac{1}{\sqrt{2\pi \varSigma _{11}}}\exp \left( -\frac{1}{2\varSigma _{11}}\left( {\bar{x}}_{1}-\mu _{1}\right) ^{2}\right) \varPhi \left( \frac{{\bar{x}}_{2}({\bar{x}}_{1})-\mu _{2}-\varSigma _{11}^{-1}\varSigma _{12}\left( {\bar{x}}_{1}-\mu _{1}\right) }{\sqrt{\varSigma _{22}-\varSigma _{11}^{-1}\varSigma _{12}^{2}}}\right) , \end{aligned}$$
(32)

where \(\varPhi (t):=\left( 2\pi \right) ^{-1/2}\int _{-\infty }^{t}e^{-s^{2}/2}ds \) refers to the cumulative distribution function of the one-dimensional standard Gaussian distribution \({\mathcal {N}}\left( 0,1\right) \) . Here \(\frac{\partial \varphi }{\partial x_{1}}\left( \cdot ,{\bar{x}} _{2}\right) \) is continuous.

Proof

The Lipschitz continuity, the existence of partial derivatives and the corresponding continuity statements follow from Propositions 3, 4 and 5 via Proposition 6. Relation (31) is obtained by specializing (24) to the density of \( {\mathcal {N}}\left( \mu ,\varSigma \right) \) (see (29) with \( C:=\left( 2\pi \sqrt{\det \varSigma }\right) ^{-1}\)). Concerning (32), we recall the formula derived in Proposition 4:

$$\begin{aligned} \frac{\partial \varphi }{\partial x_{1}}\left( {\bar{x}}_{1},{\bar{x}} _{2}\right)= & {} \int _{-\infty }^{{\bar{x}}_{2}({\bar{x}}_{1})}g_{\xi }\left( {\bar{x}}_{1},s\right) ds= g_{\xi _{1}}\left( {\bar{x}}_{1}\right) \int _{-\infty }^{ {\bar{x}}_{2}({\bar{x}}_{1})}\frac{g_{\xi }\left( {\bar{x}}_{1},s\right) }{g_{\xi _{1}}\left( {\bar{x}}_{1}\right) }ds \nonumber \\= & {} g_{\xi _{1}}\left( {\bar{x}}_{1}\right) \int _{-\infty }^{{\bar{x}}_{2}({\bar{x}} _{1})}g_{\xi _{2}|\xi _{1}={\bar{x}}_{1}}\left( s\right) ds\nonumber \\= & {} g_{\xi _{1}}\left( {\bar{x}}_{1}\right) G_{\xi _{2}|\xi _{1}={\bar{x}}_{1}}\left( {\bar{x}}_{2}({\bar{x}}_{1})\right) , \end{aligned}$$
(33)

where \(g_{\xi _{2}|\xi _{1}={\bar{x}}_{1}}\) and \(G_{\xi _{2}|\xi _{1}={\bar{x}} _{1}}\) refer to the conditional density and cumulative distribution function, respectively, of \(\xi _{2}\) given \(\xi _{1}={\bar{x}}_{1}\). As is well known in the Gaussian case assumed here, the conditional random variable \(\xi _{2}|\xi _{1}={\bar{x}}_{1}\) has a one-dimensional Gaussian distribution:

$$\begin{aligned} \left( \xi _{2}|\xi _{1}={\bar{x}}_{1}\right) \sim {\mathcal {N}}\left( \mu _{2}+\varSigma _{11}^{-1}\varSigma _{12}\left( {\bar{x}}_{1}-\mu _{1}\right) ,\varSigma _{22}-\varSigma _{11}^{-1}\varSigma _{12}^{2}\right) . \end{aligned}$$

After normalization, we get that

$$\begin{aligned} \eta :=\frac{\left( \xi _{2}|\xi _{1}={\bar{x}}_{1}\right) -\mu _{2}-\varSigma _{11}^{-1}\varSigma _{12}\left( {\bar{x}}_{1}-\mu _{1}\right) }{\sqrt{\varSigma _{22}-\varSigma _{11}^{-1}\varSigma _{12}^{2}}}\sim {\mathcal {N}}\left( 0,1\right) . \end{aligned}$$

Now, the definition of \(G_{\xi _{2}|\xi _{1}={\bar{x}}_{1}}\) yields that

$$\begin{aligned} G_{\xi _{2}|\xi _{1}={\bar{x}}_{1}}\left( {\bar{x}}_{2}({\bar{x}}_{1})\right)= & {} {\mathbb {P}}\left( \left( \xi _{2}|\xi _{1}={\bar{x}}_{1}\right) \le {\bar{x}} _{2}({\bar{x}}_{1})\right) \\= & {} {\mathbb {P}}\left( \eta \le \frac{{\bar{x}}_{2}({\bar{x}}_{1})-\mu _{2}-\varSigma _{11}^{-1}\varSigma _{12}\left( {\bar{x}}_{1}-\mu _{1}\right) }{\sqrt{\varSigma _{22}-\varSigma _{11}^{-1}\varSigma _{12}^{2}}}\right) \\= & {} \varPhi \left( \frac{{\bar{x}}_{2}({\bar{x}}_{1})-\mu _{2}-\varSigma _{11}^{-1}\varSigma _{12}\left( {\bar{x}}_{1}-\mu _{1}\right) }{\sqrt{\varSigma _{22}-\varSigma _{11}^{-1}\varSigma _{12}^{2}}}\right) , \end{aligned}$$

where \(\varPhi \) is the cumulative distribution function of \({\mathcal {N}}\left( 0,1\right) \). Now the asserted formula (32) follows from (33) upon plugging in the first marginal density \( g_{\xi _{1}}\) of \(g_{\xi }\), which corresponds to the distribution \({\mathcal {N}}\left( \mu _{1},\varSigma _{11}\right) \). \(\square \)
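
Formula (32) lends itself to a simple numerical validation. The following sketch (assuming SciPy; the policy \(x_{2}=\cos \) and the point \(x_{1}=0.7\) are arbitrary illustrative choices) evaluates \(\varphi \) through the conditional decomposition used in (33) and compares the analytic partial derivative (32) with a central finite difference:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Sketch: numerical validation of formula (32); the data, the point x1 and
# the continuous policy x2 = cos are illustrative assumptions.
mu1, mu2 = 0.0, 0.0
S11, S12, S22 = 1.0, 0.25, 1.0
sig = np.sqrt(S22 - S12**2 / S11)      # conditional standard deviation

def cond_cdf(r, t):
    # P(xi_2 <= t | xi_1 = r) for the bivariate Gaussian
    return norm.cdf((t - mu2 - (S12 / S11) * (r - mu1)) / sig)

def phi(x1, x2):
    # probability function (15) via the conditional decomposition in (33)
    integrand = lambda r: norm.pdf(r, mu1, np.sqrt(S11)) * cond_cdf(r, x2(r))
    return quad(integrand, -np.inf, x1)[0]

x2, x1 = np.cos, 0.7

# partial derivative with respect to x1 according to (32)
analytic = norm.pdf(x1, mu1, np.sqrt(S11)) * cond_cdf(x1, x2(x1))

# central finite difference of phi in x1; both values should agree closely
eps = 1e-5
numeric = (phi(x1 + eps, x2) - phi(x1 - eps, x2)) / (2 * eps)
print(analytic, numeric)
```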

Corresponding results can be expected for many other bivariate distributions having a continuous density. By contrast, we briefly refer to uniform distributions over rectangles, for which no differentiability of \(\varphi \), but at least Lipschitz continuity, can be expected:

Proposition 7

Let \(\xi \) be a bivariate random vector having a uniform distribution over the rectangle \(\left[ a,b\right] \times \left[ c,d\right] \). Then, the probability function \(\varphi \) in (15) is Lipschitz continuous.

Proof

The density \(g_{\xi }\) satisfies the assumptions (16) and (17) thanks to the following easily verified relations:

$$\begin{aligned} g_{\xi _{1}}=\frac{1}{b-a}\chi _{\left[ a,b\right] };\quad \sup _{s\in {\mathbb {R}}}g_{\xi }\left( \cdot ,s\right) =\frac{1}{\left( b-a\right) \left( d-c\right) }\chi _{\left[ a,b\right] }. \end{aligned}$$

Now, the assertion follows from Proposition 3. \(\square \)

Note that a uniform distribution as in the previous proposition cannot satisfy relations (19) and (23) because of the discontinuity of its density. Therefore, no differentiability results as in Propositions 4 and 5 can be expected, and counterexamples are easily constructed.
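
The failure of differentiability is already visible in the simplest instance. In the following sketch (with ad hoc data: \(\xi \) uniform on \([0,1]^{2}\), \(x_{1}=1\) and constant policies \(x_{2}\equiv c\)), the probability function reduces to \(\varphi (c)=\min \{\max \{c,0\},1\}\), which is Lipschitz continuous but has kinks at \(c=0\) and \(c=1\):

```python
import numpy as np

# Sketch with ad hoc data: xi uniform on [0,1]^2, x1 = 1, constant policies
# x2 = c. Then phi(c) = P(xi_2 <= c) = min(max(c, 0), 1): Lipschitz
# continuous, but with kinks at c = 0 and c = 1, so not differentiable.
c = np.linspace(-0.5, 1.5, 9)
phi = np.clip(c, 0.0, 1.0)
print(np.column_stack([c, phi]))
```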

3.4 Application to an optimization problem

In the following, we consider the simple dynamic probabilistic constraint defined via (15) as part of the following two-stage optimization problem:

$$\begin{aligned} \min _{x\in {\mathcal {X}}}\left\{ c_{1}x_{1}+c_2{\mathbb {E}}x_{2}\left( \xi _{1}\right) \chi _{(-\infty ,x_{1}]}\left( \xi _{1}\right) |\varphi (x)\ge p\right\} , \end{aligned}$$
(34)

where \(\xi \) (occurring in the definition of \(\varphi \)) is a bivariate random vector distributed according to \({\mathcal {N}}\left( \mu ,\varSigma \right) \). The objective is linear in the decisions; it could represent, for instance, linear costs. Since the second stage decision is random, its costs are represented as an expected value. Note, however, that considering the full expected value \({\mathbb {E}}x_{2}\left( \xi _{1}\right) \) would not make much sense: Indeed, since function values of \(x_{2}\left( \xi _{1}\right) \) for arguments \(\xi _{1}>x_{1}\) do not affect the probability \(\varphi (x)\) (see (15)), one could drive the expected value \({\mathbb {E}} x_{2}\left( \xi _{1}\right) \) to \(-\infty \) while keeping the decision x feasible. Therefore, we measure the costs of \(x_{2}\) by ignoring in the objective its values beyond \(x_{1}\) and rather considering the expected value of \(x_{2}\chi _{(-\infty ,x_{1}]}\).
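
The following sketch (with illustrative data; the cutoff value K is a free parameter of the demonstration) makes this effect tangible: lowering \(x_{2}\) to \(-K\) beyond \(x_{1}\) leaves \(\varphi \) unchanged while driving the full expectation to \(-\infty \):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Sketch with illustrative data (standardized Gaussian components, x1 = 2):
# lowering x2 to -K beyond x1 leaves phi unchanged but drives the full
# expectation E x2(xi_1) to -infinity; K is a free demonstration parameter.
S11, S12, S22 = 1.0, 0.25, 1.0
sig = np.sqrt(S22 - S12**2 / S11)
x1 = 2.0

def phi(x2):
    # probability function (15); only values of x2 on (-inf, x1] matter
    f = lambda r: norm.pdf(r) * norm.cdf((x2(r) - (S12 / S11) * r) / sig)
    return quad(f, -np.inf, x1)[0]

def full_expectation(x2):
    # split the integral at the discontinuity of x2
    left = quad(lambda r: x2(r) * norm.pdf(r), -np.inf, x1)[0]
    right = quad(lambda r: x2(r) * norm.pdf(r), x1, np.inf)[0]
    return left + right

for K in [0.0, 10.0, 100.0]:
    x2 = lambda r, K=K: 1.0 if r <= x1 else -K
    print(K, phi(x2), full_expectation(x2))   # phi constant, E decreasing
```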

In a first step, one might be interested in deriving some information from necessary optimality conditions for this problem. Here, one has to take into account that \(\varphi \) is not continuously differentiable (see Example 2). However, \(\varphi \) is continuously partially differentiable with respect to \(x_{2}\) thanks to Proposition 5. This suggests considering the decomposed version of problem (34):

$$\begin{aligned} \min _{x_{1}\in {\mathbb {R}}}\left\{ c_{1}x_{1}+\min _{x_{2}\in L^{2}\left( {\mathbb {R}}\right) }\left\{ c_{2}{\mathbb {E}}x_{2}\left( \xi _{1}\right) \chi _{(-\infty ,x_{1}]}\left( \xi _{1}\right) |\varphi (x_{1},x_{2})\ge p\right\} \right\} . \end{aligned}$$
(35)

Here, the one-dimensional outer minimization over \(x_{1}\) can be realized by elementary numerical approaches. Therefore, our interest will focus on the inner minimization problem over \(x_{2}\in L^{2}\left( {\mathbb {R}} \right) \) for some fixed \({\bar{x}}_{1}\in {\mathbb {R}}\):

$$\begin{aligned} \min _{x_{2}\in L^{2}\left( {\mathbb {R}}\right) }\left\{ c_{2}{\mathbb {E}} x_{2}\left( \xi _{1}\right) \chi _{(-\infty ,{\bar{x}}_{1}]}\left( \xi _{1}\right) |\varphi ({\bar{x}}_{1},x_{2})\ge p\right\} . \end{aligned}$$
(36)

For this inner optimization problem, the data (objective and constraint) are continuously differentiable, and one can formulate necessary optimality conditions at some fixed \({\bar{x}}_{2}\in L^{2}\left( {\mathbb {R}}\right) \) provided that \(\nabla _{x_{2}}\varphi ({\bar{x}}_{1},{\bar{x}}_{2})\ne 0\). The latter is an immediate consequence of (31), whose right-hand side is strictly positive for \(r\le {\bar{x}}_{1}\). Hence, one may formulate the following necessary optimality condition:

Proposition 8

Let \({\bar{x}}_{2}\in L^{2}\left( {\mathbb {R}}\right) \) be a solution of the optimization problem (36) (with some fixed \({\bar{x}}_{1}\in {\mathbb {R}}\)). Then, \({\bar{x}}_{2}\) is affine linear on the set \((-\infty ,{\bar{x}}_{1}]\) with slope \(\varSigma _{12}/\varSigma _{11}\).

Proof

Without loss of generality, we may assume that \(c_{2}=1\) in (36) because the solution of the problem is not affected by the value of \(c_{2}\). The gradient of the objective evaluated at \({\bar{x}}_{2}\) has to be a multiple of the gradient \(\nabla _{x_{2}}\varphi ({\bar{x}}_{1},{\bar{x}}_{2})\) of the constraint in (36), also evaluated at \({\bar{x}}_{2}\). Clearly, the objective

$$\begin{aligned} {\mathbb {E}}x_{2}\left( \xi _{1}\right) \chi _{(-\infty ,{\bar{x}}_{1}]}\left( \xi _{1}\right) =\int x_{2}\left( r\right) \chi _{(-\infty ,{\bar{x}}_{1}]}\left( r\right) g_{\xi _{1}}(r)dr \end{aligned}$$

(with \(g_{\xi _{1}}\) referring to the density of \(\xi _{1}\)) has a gradient which is given by the function \(\chi _{(-\infty ,{\bar{x}}_{1}]}g_{\xi _{1}}\). Hence, there exists some multiplier \(\lambda \) such that

$$\begin{aligned} g_{\xi _{1}}(r)=\lambda \nabla _{x_{2}}\varphi ({\bar{x}}_{1},{\bar{x}} _{2})\left( r\right) \quad \text {a.e. }r\le {\bar{x}}_{1}. \end{aligned}$$

Since the one-dimensional Gaussian density \(g_{\xi _{1}}\) is strictly positive, we infer that \(\lambda >0\). Given the explicit formula for \(g_{\xi _{1}}\) as well as for \(\nabla _{x_{2}}\varphi ({\bar{x}}_{1},{\bar{x}}_{2})\) in (31), we derive the existence of constants \(K_{1},K_{2}>0\) (where the latter already incorporates the multiplier \(\lambda \)) such that for almost every \(r\le {\bar{x}}_{1}\):

$$\begin{aligned}&K_{1}\exp \left( -\frac{1}{2}\frac{\left( r-\mu _{1}\right) ^{2}}{\varSigma _{11}}\right) \nonumber \\&\quad = K_{2}\exp \left( -\frac{1}{2}\left( r-\mu _{1},{\bar{x}} _{2}(r)-\mu _{2}\right) \varSigma ^{-1}\left( {\begin{array}{c}r-\mu _{1}\\ {\bar{x}}_{2}(r)-\mu _{2}\end{array}}\right) \right) \end{aligned}$$
(37)

We fix an arbitrary r for which (37) holds true. Using the correlation \(\rho \) between the two components \(\xi _{1}\) and \(\xi _{2}\), the inverse covariance matrix can be written as

$$\begin{aligned} \varSigma ^{-1}=\frac{1}{1-\rho ^{2}}\left( \begin{array}{cc} \frac{1}{\varSigma _{11}} &{} -\frac{\rho }{\sqrt{\varSigma _{11}\varSigma _{22}}} \\ -\frac{\rho }{\sqrt{\varSigma _{11}\varSigma _{22}}} &{} \frac{1}{\varSigma _{22}} \end{array} \right) ;\quad \left( \rho :=\frac{\varSigma _{12}}{\sqrt{\varSigma _{11}\varSigma _{22}}}\right) . \end{aligned}$$

Taking the log in (37) and rearranging terms, one arrives at

$$\begin{aligned} \log \frac{K_{1}}{K_{2}}&=\frac{1}{1-\rho ^{2}}\left( \frac{-\rho ^{2}}{ 2\varSigma _{11}}\left( r-\mu _{1}\right) ^{2}+\frac{\rho }{\sqrt{\varSigma _{11}\varSigma _{22}}}\left( r-\mu _{1}\right) \left( {\bar{x}}_{2}(r)-\mu _{2}\right) \right. \\&\qquad \left. -\frac{1}{2\varSigma _{22}}\left( {\bar{x}}_{2}(r)-\mu _{2}\right) ^{2}\right) . \end{aligned}$$

Putting

$$\begin{aligned} \alpha :=\frac{{\bar{x}}_{2}(r)-\mu _{2}}{\sqrt{\varSigma _{22}}};\quad \beta :=2\left( 1-\rho ^{2}\right) \log \frac{K_{1}}{K_{2}}, \end{aligned}$$

the last identity can be rewritten as

$$\begin{aligned} \alpha ^{2}-\frac{2\rho }{\sqrt{\varSigma _{11}}}\left( r-\mu _{1}\right) \alpha +\frac{\rho ^{2}}{\varSigma _{11}}\left( r-\mu _{1}\right) ^{2}+\beta =0. \end{aligned}$$

Solving for \(\alpha \) yields

$$\begin{aligned} \alpha =\frac{\rho }{\sqrt{\varSigma _{11}}}\left( r-\mu _{1}\right) \pm \sqrt{ -\beta }. \end{aligned}$$

Resubstituting for \(\alpha \) and \(\beta \) gives our assertion on the structure of \({\bar{x}}_{2}\):

$$\begin{aligned} {\bar{x}}_{2}(r)=\frac{\sqrt{\varSigma _{22}}\rho }{\sqrt{\varSigma _{11}}}r+\mu _{2}-\frac{\sqrt{\varSigma _{22}}\rho }{\sqrt{\varSigma _{11}}}\mu _{1}\pm \sqrt{ \varSigma _{22}}\sqrt{2\left( 1-\rho ^{2}\right) \log \frac{K_{2}}{K_{1}}} \end{aligned}$$

\(\square \)
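
The stationarity relation (37) behind this proof can also be checked numerically: along an affine policy with slope \(\varSigma _{12}/\varSigma _{11}\), the ratio between \(\nabla _{x_{2}}\varphi \) from (31) and the marginal density \(g_{\xi _{1}}\) is constant in r. The following sketch (assuming SciPy; covariance data and intercept are illustrative choices) confirms this:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Sketch: numerical check of the stationarity relation (37); covariance and
# intercept are illustrative. Along an affine policy with slope S12/S11 the
# ratio of the gradient (31) to the marginal density g_{xi_1} is constant.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.25], [0.25, 1.0]])
joint = multivariate_normal(mean=mu, cov=Sigma).pdf

slope = Sigma[0, 1] / Sigma[0, 0]                  # Sigma_12 / Sigma_11
intercept = 0.7                                    # arbitrary value
x2 = lambda r: slope * r + intercept

r = np.linspace(-3.0, 2.0, 6)
grad_phi = joint(np.column_stack([r, x2(r)]))      # formula (31) for r <= x1
g1 = norm.pdf(r, mu[0], np.sqrt(Sigma[0, 0]))      # marginal density of xi_1
print(grad_phi / g1)                               # constant vector
```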

Unfortunately, since an affine linear function cannot belong to \(L^{2}\left( {\mathbb {R}}\right) \) unless it is identically zero, we draw the following negative conclusion from Proposition 8:

Corollary 3

If the components \(\xi _{1}\) and \(\xi _{2}\) of \(\xi \) are not independent, then problem (36) has no local, much less global, solution.

Proof

The assumption that \(\xi _{1}\) and \(\xi _{2}\) are not independent implies that \(\varSigma _{12}\ne 0\). Hence, if (36) had a local solution \({\bar{x}}_{2}\in L^{2}\left( {\mathbb {R}} \right) \), then this solution would be an affine linear function (on the interval from \( -\infty \) to \({\bar{x}}_{1}\)) with nonzero slope by Proposition 8. Therefore it could not belong to \(L^{2}\left( {\mathbb {R}}\right) \), a contradiction. \(\square \)

Before deriving a remedy for the negative outcome of Corollary 3, we want to illustrate the use of the gradient information collected in (31) in a numerical context. We consider problem (36) with the following data:

$$\begin{aligned} c_{2}=1;\quad {\bar{x}}_{1}=2;\quad p=0.8;\quad \xi \sim {\mathcal {N}}\left( \left( 0,0\right) ,\left( \begin{array}{cc} 1 &{} 0.25 \\ 0.25 &{} 1 \end{array} \right) \right) . \end{aligned}$$

Using the explicit representation of the gradients for the objective and the constraint (see proof of Proposition 8), we apply a simple projected gradient algorithm in order to improve the second stage decision \(x_{2}\) in (36).
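
The following minimal sketch indicates how such an iteration can be set up (assuming NumPy/SciPy and a grid discretization of \(x_{2}\); the grid, the step size and the bisection used to restore the probability level are ad hoc choices for illustration, not the exact implementation behind Fig. 2):

```python
import numpy as np
from scipy.stats import norm

# Sketch of a projected gradient iteration for (36) on a grid; all
# numerical choices below (grid, step size, bisection) are illustrative.
S11, S12, S22 = 1.0, 0.25, 1.0
xbar1, p = 2.0, 0.8
sig = np.sqrt(S22 - S12**2 / S11)

r = np.linspace(-8.0, xbar1, 2000)          # grid for r <= xbar1
dr = r[1] - r[0]
g1 = norm.pdf(r)                            # density of xi_1 ~ N(0,1)

def phi(x2):
    # probability function (15) via conditional cdfs, cf. (33)
    return np.sum(g1 * norm.cdf((x2 - (S12 / S11) * r) / sig)) * dr

def grad_phi(x2):
    # gradient formula (31): g_xi(r, x2(r)) on the grid
    return g1 * norm.pdf((x2 - (S12 / S11) * r) / sig) / sig

def project(x2):
    # restore phi(x2) = p by moving along grad_phi (bisection in t)
    d = grad_phi(x2)
    lo, hi = -50.0, 50.0
    for _ in range(60):
        t = 0.5 * (lo + hi)
        lo, hi = (t, hi) if phi(x2 + t * d) < p else (lo, t)
    return x2 + 0.5 * (lo + hi) * d

x2 = project(np.full_like(r, 1.5))          # feasible starting step function
for it in range(7):
    x2 = project(x2 - 5.0 * g1)             # gradient step on the objective
    print(it, phi(x2), np.sum(x2 * g1) * dr)  # probability level and cost
```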

Fig. 2

Plot of several iterates for a projected gradient algorithm applied to problem (36) (left) and of associated values of the objective (right)

The left diagram of Fig. 2 shows some iterates of this algorithm. All plotted policies realize exactly the desired probability \(p=0.8\) in the definition of the chance constraint in (36). The starting point for \(x_{2}\) was chosen as a simple step function “1”, which after the first iteration turned into a nonlinear, still discontinuous, policy “2”. After some further iterations, the policy becomes continuous. Interestingly, after seven iterations “3”, the policy is affine linear on a certain subdomain. It turns out that on this subdomain, the policy perfectly coincides with the affine linear policy “4” satisfying the necessary optimality condition in Proposition 8. The latter is easily identified by its slope, which according to Proposition 8 calculates as \(\varSigma _{12}/\varSigma _{11}=0.25\), and by its intercept, which has to be chosen so as to match the probability level \(p=0.8\). Observe that all iterates decay to zero on the left end of the negative axis in order to belong to \(L^2({\mathbb {R}})\). The right diagram of Fig. 2 plots the objective for the first seven iterates and for the affine linear policy from Proposition 8 (isolated point). Evidently, the necessary optimality condition from Proposition 8 still carries some information on optimality even though the associated policy does not belong to the \(L^2\) space.

3.5 Derivation of a global solution

Motivated by the negative result of Corollary 3, we consider now the optimization problem

$$\begin{aligned} \min _{\left( x_{1},x_{2}\right) \in {\mathbb {R}}\times {\mathcal {X}}^{*}}\left\{ c_{1}x_{1}+c_{2}{\mathbb {E}}x_{2}\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) |\varphi \left( x_{1},x_{2}\right) \ge p;\,x_{2}\in M\right\} , \end{aligned}$$
(38)

where \({\mathcal {X}}^{*}\) is the set of all Borel measurable functions \(f: {\mathbb {R}}\rightarrow {\mathbb {R}}\), \(c_{1},c_{2}>0\) are some positive cost coefficients, \(\xi \sim {\mathcal {N}}\left( \mu ,\varSigma \right) \) is a bivariate Gaussian random vector, \(\varphi \) is the probability function defined in (15), \(p\in \left( 0,1\right] \) is some probability level and the additional constraint set is given by

$$\begin{aligned} M:=\left\{ x_{2}\in {\mathcal {X}}^{*}\,|\,x_{2}\left( r\right) \ge \mu _{2}+ \frac{\varSigma _{12}}{\varSigma _{11}}\left( r-\mu _{1}\right) \quad \forall r\in {\mathbb {R}}\right\} . \end{aligned}$$

This problem differs from (34) first in that the space of second stage decisions is much larger than the space \(L^{2}\left( {\mathbb {R}}\right) \) considered before, so that it also includes affine linear functions. At the same time, we add the technical constraint \(x_{2}\in M\) in order to identify a global solution by a direct argument rather than by necessary optimality conditions. However, we shall make use of the result obtained before in Proposition 8 to get the right guess for a candidate optimal second stage solution (an affine linear function with slope \(\frac{\varSigma _{12}}{\varSigma _{11} }\)). Note that we do not require the expectation of \(x_{2}\chi _{\left( -\infty ,x_{1}\right] }\) in (38) to be finite. It turns out that the solution of (38) can be reduced to a one-dimensional optimization. Before stating the result, we introduce the real functions

$$\begin{aligned} \alpha \left( t\right)&:=\varPhi ^{-1}\left( \frac{p}{\varPhi \left( \frac{ t-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) }\right) \sqrt{\varSigma _{22}-\frac{ \varSigma _{12}^{2}}{\varSigma _{11}}}+\mu _{2}-\frac{\varSigma _{12}}{\varSigma _{11}} \mu _{1} \end{aligned}$$
(39)
$$\begin{aligned}&\quad \left( t>\sqrt{\varSigma _{11}}\varPhi ^{-1}\left( p\right) +\mu _{1}\right) , \end{aligned}$$
(40)
$$\begin{aligned} \beta \left( t\right)&:=\mu _{1}-\sqrt{\varSigma _{11}}\frac{\phi \left( \frac{t-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) }{\varPhi \left( \frac{ t-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) } \end{aligned}$$
(41)

where \(\mu _{i}\) and \(\varSigma _{ij}\) refer to the corresponding components of \(\mu \) and \(\varSigma \), respectively, and \(\varPhi \) denotes as before the cumulative distribution function of the one-dimensional standard Gaussian distribution \({\mathcal {N}}\left( 0,1\right) \). Recall that \(\varPhi \) is invertible with inverse \(\varPhi ^{-1}\), called the quantile function of \( {\mathcal {N}}\left( 0,1\right) \). Since \(\varPhi ^{-1}\) is defined only on the open interval \(\left( 0,1\right) \), (39) is well defined only if \(p<1\) (which we shall impose in the theorem below) and if

$$\begin{aligned} \varPhi \left( \frac{t-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) >p, \end{aligned}$$

which leads to the restricted domain of definition recorded in (40). Finally, the function

$$\begin{aligned} \phi \left( t\right) :=\frac{1}{\sqrt{2\pi }}e^{-t^{2}/2}, \end{aligned}$$

appearing in (41), is the density of the one-dimensional standard Gaussian distribution.

Theorem 4

Let \(p\in \left[ \frac{1}{2},1\right) \) be given. Then, a global solution of problem (38) is given by \(\left( x_{1}^{*},x_{2}^{*}\right) \in {\mathbb {R}}\times {\mathcal {X}}^{*}\), where \(x_{1}^{*}\) is a minimizer of the real function \(c_{1}t+c_{2}f\left( t\right) \) over the open interval \(\left( \sqrt{\varSigma _{11}}\varPhi ^{-1}\left( p\right) +\mu _{1},\infty \right) \), with

$$\begin{aligned} f\left( t\right) :=\left[ \frac{\varSigma _{12}}{\varSigma _{11}}\beta \left( t\right) +\alpha \left( t\right) \right] \varPhi \left( \frac{t-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) \quad \left( t>\sqrt{\varSigma _{11}}\varPhi ^{-1}\left( p\right) +\mu _{1}\right) \end{aligned}$$

and

$$\begin{aligned} x_{2}^{*}\left( r\right) :=\frac{\varSigma _{12}}{\varSigma _{11}}r+\alpha \left( x_{1}^{*}\right) \quad \left( r\in {\mathbb {R}}\right) . \end{aligned}$$

Proof

We start our proof with an intermediate result. To this aim, fix an arbitrary

$$\begin{aligned} x_{1}>\sqrt{\varSigma _{11}}\varPhi ^{-1}\left( p\right) +\mu _{1}, \end{aligned}$$
(42)

in order to make the value \(\alpha \left( x_{1}\right) \) well defined in (39). We claim that the second stage policy defined by

$$\begin{aligned} y\left( r\right) :=\frac{\varSigma _{12}}{\varSigma _{11}}r+\alpha \left( x_{1}\right) \quad \left( r\in {\mathbb {R}}\right) \end{aligned}$$
(43)

is a global solution to the problem

$$\begin{aligned} \min _{x_{2}\in {\mathcal {X}}^{*}}\left\{ c_{2}{\mathbb {E}}x_{2}\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) |\varphi \left( x_{1},x_{2}\right) \ge p;\,x_{2}\in M\right\} . \end{aligned}$$
(44)

In a first step we check the feasibility of y in this problem. Introducing the linear transformation

$$\begin{aligned} \eta _{1}:=\xi _{1};\quad \eta _{2}:=\xi _{2}-\frac{\varSigma _{12}}{\varSigma _{11}}\xi _{1} \end{aligned}$$
(45)

of random variables, we observe that according to well-known rules the random vector \(\eta :=\left( \eta _{1},\eta _{2}\right) \) obeys a bivariate Gaussian law \({\mathcal {N}}\left( {\tilde{\mu }},{\tilde{\varSigma }}\right) \) with parameters

$$\begin{aligned} {\tilde{\mu }}=\left( \mu _{1},\mu _{2}-\frac{\varSigma _{12}}{\varSigma _{11}}\mu _{1}\right) ,\quad {\tilde{\varSigma }}=\left( \begin{array}{ll} \varSigma _{11} &{}\quad 0 \\ 0 &{}\quad \varSigma _{22}-\frac{\varSigma _{12}^{2}}{\varSigma _{11}} \end{array} \right) . \end{aligned}$$

In particular, the components \(\eta _{1},\eta _{2}\)—having zero covariance—are independent. It follows from the definition (15) of \(\varphi \) and from (43) that

$$\begin{aligned} \varphi \left( x_{1},y\right)= & {} {\mathbb {P}}\left( \xi _{1}\le x_{1},\xi _{2}\le y\left( \xi _{1}\right) \right) ={\mathbb {P}}\left( \eta _{1}\le x_{1},\eta _{2}\le \alpha \left( x_{1}\right) \right) \\= & {} {\mathbb {P}}\left( \eta _{1}\le x_{1}\right) {\mathbb {P}}\left( \eta _{2}\le \alpha \left( x_{1}\right) \right) , \end{aligned}$$

where the last equality follows from the independence of \(\eta _{1}\) and \( \eta _{2}\). Again, by the well-known transformation laws of Gaussian distributions as well as by (39), it holds that

$$\begin{aligned} {\mathbb {P}}\left( \eta _{1}\le x_{1}\right)= & {} \varPhi \left( \frac{x_{1}- {\tilde{\mu }}_{1}}{\sqrt{{\tilde{\varSigma }}_{11}}}\right) =\varPhi \left( \frac{ x_{1}-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) ; \nonumber \\ {\mathbb {P}}\left( \eta _{2}\le \alpha \left( x_{1}\right) \right)= & {} \varPhi \left( \frac{\alpha \left( x_{1}\right) -{\tilde{\mu }}_{2}}{\sqrt{{\tilde{\varSigma }}_{22}}}\right) =\varPhi \left( \frac{\alpha \left( x_{1}\right) -\mu _{2}+ \frac{\varSigma _{12}}{\varSigma _{11}}\mu _{1}}{\sqrt{\varSigma _{22}-\frac{\varSigma _{12}^{2}}{\varSigma _{11}}}}\right) \nonumber \\= & {} \frac{p}{\varPhi \left( \frac{x_{1}-\mu _{1} }{\sqrt{\varSigma _{11}}}\right) }. \end{aligned}$$
(46)

Consequently, we arrive at \(\varphi \left( x_{1},y\right) =p\). Hence, \( x_{2}:=y\) is feasible with respect to the constraint \(\varphi \left( x_{1},x_{2}\right) \ge p\). Next, we verify that \(y\in M\). By definition of y, M and \(\alpha \), it suffices to show that

$$\begin{aligned} \alpha \left( x_{1}\right) \ge \mu _{2}-\frac{\varSigma _{12}}{\varSigma _{11}} \mu _{1}={\tilde{\mu }}_{2}. \end{aligned}$$
(47)

Indeed, the assumption that \(\alpha \left( x_{1}\right) <{\tilde{\mu }}_{2}\) would lead—via the fact that the values of \(\varPhi \) are strictly smaller than one—to the contradiction

$$\begin{aligned} p<\frac{p}{\varPhi \left( \frac{x_{1}-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) }= {\mathbb {P}}\left( \eta _{2}\le \alpha \left( x_{1}\right) \right) <{\mathbb {P}} \left( \eta _{2}\le {\tilde{\mu }}_{2}\right) \le \frac{1}{2} \end{aligned}$$

with our assumption that \(p\ge \frac{1}{2}\). Summarizing, \(y\in {\mathcal {X}} ^{*}\) defined by (43) is a feasible second stage policy in problem (44).

In the last step of the proof of our initial claim, we show that there is no other feasible second stage decision in (44) that would yield a strictly smaller objective value than y. Indeed, assume to the contrary that some function \({\tilde{y}}\in M\) with \(\varphi \left( x_{1},{\tilde{y}} \right) \ge p\) would realize in (44) a strictly smaller objective value than y. Then, since \(c_{2}>0\),

$$\begin{aligned} {\mathbb {E}}{\tilde{y}}\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) <{\mathbb {E}}y\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) . \end{aligned}$$
(48)

Now, by (45) and with \(g_{\eta }\) denoting the density of \(\eta \),

$$\begin{aligned} 0\le & {} \varphi \left( x_{1},{\tilde{y}}\right) -\varphi \left( x_{1},y\right) ={\mathbb {P}}\left( \xi _{1}\le x_{1},\xi _{2}\le {\tilde{y}} \left( \xi _{1}\right) \right) -{\mathbb {P}}\left( \xi _{1}\le x_{1},\xi _{2}\le y\left( \xi _{1}\right) \right) \\= & {} {\mathbb {P}}\left( \eta _{1}\le x_{1},\eta _{2}\le {\tilde{y}}\left( \eta _{1}\right) -\frac{\varSigma _{12}}{\varSigma _{11}}\eta _{1}\right) -{\mathbb {P}} \left( \eta _{1}\le x_{1},\eta _{2}\le y\left( \eta _{1}\right) -\frac{ \varSigma _{12}}{\varSigma _{11}}\eta _{1}\right) \\= & {} \int _{-\infty }^{x_{1}}\int _{-\infty }^{{\tilde{y}}\left( r\right) -\frac{ \varSigma _{12}}{\varSigma _{11}}r}g_{\eta }\left( s,r\right) dsdr-\int _{-\infty }^{x_{1}}\int _{-\infty }^{y\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}} r}g_{\eta }\left( s,r\right) dsdr. \end{aligned}$$

Recalling that, by independence of the components \(\eta _{1}\) and \(\eta _{2}\) , we may write \(g_{\eta }\left( s,r\right) =g_{\eta _{1}}\left( r\right) g_{\eta _{2}}\left( s\right) \), where \(g_{\eta _{1}}\),\(g_{\eta _{2}}\) refer to the one-dimensional densities of \(\eta _{1}\) and \(\eta _{2}\), we may—taking into account (43)—continue as

$$\begin{aligned} 0\le & {} \int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) \left( \int _{-\infty }^{{\tilde{y}}\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}} r}g_{\eta _{2}}\left( s\right) ds\right) dr\nonumber \\&-\int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) \left( \int _{-\infty }^{\alpha \left( x_{1}\right) }g_{\eta _{2}}\left( s\right) ds\right) dr \nonumber \\= & {} \int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) F_{\eta _{2}}\left( {\tilde{y}}\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}}r\right) dr-\int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) F_{\eta _{2}}\left( \alpha \left( x_{1}\right) \right) dr \nonumber \\= & {} \int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) \left[ F_{\eta _{2}}\left( {\tilde{y}}\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}} r\right) -F_{\eta _{2}}\left( \alpha \left( x_{1}\right) \right) \right] dr, \end{aligned}$$
(49)

where \(F_{\eta _{2}}\) refers to the cumulative distribution function of \( \eta _{2}\). Since one-dimensional Gaussian distribution functions are concave to the right of their mean (their second derivative coincides with the first derivative of the density and is therefore negative to the right of the mean), we have the relation

$$\begin{aligned} F_{\eta _{2}}\left( t\right) \le F_{\eta _{2}}\left( s\right) +F_{\eta _{2}}^{\prime }\left( s\right) \left( t-s\right) \quad \forall s,t\ge {\tilde{\mu }}_{2}. \end{aligned}$$

Now, \(\alpha \left( x_{1}\right) \ge {\tilde{\mu }}_{2}\) by (47) and also, because of \({\tilde{y}}\in M\),

$$\begin{aligned} {\tilde{y}}\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}}r\ge \mu _{2}- \frac{\varSigma _{12}}{\varSigma _{11}}\mu _{1}={\tilde{\mu }}_2. \end{aligned}$$

Therefore, we may conclude that

$$\begin{aligned} F_{\eta _{2}}\left( {\tilde{y}}\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}}r\right) \le F_{\eta _{2}}\left( \alpha \left( x_{1}\right) \right) +\varDelta \left( {\tilde{y}}\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}} r-\alpha \left( x_{1}\right) \right) , \end{aligned}$$

where, by positivity of Gaussian densities,

$$\begin{aligned} \varDelta :=F_{\eta _{2}}^{\prime }\left( \alpha \left( x_{1}\right) \right) =g_{\eta _{2}}\left( \alpha \left( x_{1}\right) \right) >0. \end{aligned}$$

This allows us, along with (43) and (48), to continue (49) to the contradiction

$$\begin{aligned} 0\le & {} \varDelta \int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) \left( {\tilde{y}}\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}}r-\alpha \left( x_{1}\right) \right) dr=\varDelta \int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) \left[ {\tilde{y}}\left( r\right) -y(r)\right] dr \\= & {} \varDelta \left[ {\mathbb {E}}{\tilde{y}}\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) -{\mathbb {E}}y\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) \right] <0. \end{aligned}$$

This proves our initial claim that y in (43) is a global solution to (44). Accordingly, for each \(x_{1}\in {\mathbb {R}}\) satisfying (42), we have that

$$\begin{aligned}&\min _{x_{2}\in {\mathcal {X}}^{*}}\left\{ c_{2}{\mathbb {E}}x_{2}\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) |\varphi \left( x_{1},x_{2}\right) \ge p;\,x_{2}\in M\right\} \\&\quad =c_{2}{\mathbb {E}}y\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) =c_{2}{\mathbb {E}}\left[ \frac{\varSigma _{12}}{\varSigma _{11}}\xi _{1}+\alpha \left( x_{1}\right) \right] \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) \\&\quad =c_{2}\left[ \frac{\varSigma _{12}}{\varSigma _{11}}\,{\mathbb {E}}\left( \xi _{1}\,|\,\xi _{1}\le x_{1}\right) +\alpha \left( x_{1}\right) \right] {\mathbb {E}}\chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) . \end{aligned}$$

It is well known that for the Gaussian random variable \(\xi _{1}\) with mean \( \mu _{1}\) and standard deviation \(\sqrt{\varSigma _{11}}\), the mean conditioned to \(\xi _{1}\le x_{1}\) calculates as \({\mathbb {E}}\left( \xi _{1}\,|\,\xi _{1}\le x_{1}\right) =\beta \left( x_{1}\right) \), where \(\beta \) is defined in (41). Since, moreover, by (46),

$$\begin{aligned} {\mathbb {E}}\chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) = {\mathbb {P}}\left( \xi _{1}\le x_{1}\right) ={\mathbb {P}}\left( \eta _{1}\le x_{1}\right) =\varPhi \left( \frac{x_{1}-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) , \end{aligned}$$

we may conclude that, with the function f introduced in the statement of the theorem,

$$\begin{aligned} \min _{x_{2}\in {\mathcal {X}}^{*}}\left\{ c_{2}{\mathbb {E}}x_{2}\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) |\varphi \left( x_{1},x_{2}\right) \ge p;\,x_{2}\in M\right\} =c_{2}f\left( x_{1}\right) . \end{aligned}$$
(50)

Now, finally, we turn to our given problem (38) and decompose it in much the same way as in (35):

$$\begin{aligned} \min _{x_{1}\in {\mathbb {R}}}\left\{ c_{1}x_{1}+\min _{x_{2}\in {\mathcal {X}} ^{*}}\left\{ c_{2}{\mathbb {E}}x_{2}\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) |\varphi \left( x_{1},x_{2}\right) \ge p;\,x_{2}\in M\right\} \right\} . \end{aligned}$$

Observe first that for arbitrary \(x_{1}\) with \(x_{1}\le \sqrt{\varSigma _{11}} \varPhi ^{-1}\left( p\right) +\mu _{1}\), the feasible set of the inner problem above is empty. Indeed, if there existed some \(x_{2}\in {\mathcal {X}}^{*}\) with \(\varphi \left( x_{1},x_{2}\right) \ge p\), then, repeating the argument leading to (49), we could establish the contradiction

$$\begin{aligned} p\le & {} \varphi \left( x_{1},x_{2}\right) ={\mathbb {P}}\left( \xi _{1}\le x_{1},\xi _{2}\le x_{2}\left( \xi _{1}\right) \right) =\int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) F_{\eta _{2}}\left( x_{2}\left( r\right) -\frac{\varSigma _{12}}{\varSigma _{11}}r\right) dr \\< & {} \int _{-\infty }^{x_{1}}g_{\eta _{1}}\left( r\right) dr={\mathbb {P}}\left( \eta _{1}\le x_{1}\right) ={\mathbb {P}}\left( \xi _{1}\le x_{1}\right) \\= & {} {\mathbb {P}}\left( \frac{\xi _{1}-\mu _{1}}{\sqrt{\varSigma _{11}}}\le \frac{ x_{1}-\mu _{1}}{\sqrt{\varSigma _{11}}}\right) =\varPhi \left( \frac{x_{1}-\mu _{1} }{\sqrt{\varSigma _{11}}}\right) \le \varPhi \left( \varPhi ^{-1}\left( p\right) \right) =p. \end{aligned}$$

Here, the strict inequality follows from the fact that \(g_{\eta _{1}}>0\) and \(F_{\eta _{2}}<1\), while the last inequality is a consequence of \(\varPhi \) being nondecreasing. Hence, for \(x_{1}\le \sqrt{\varSigma _{11}}\varPhi ^{-1}\left( p\right) +\mu _{1}\), the infimum of the objective over the empty feasible set equals infinity. Therefore, such \(x_{1}\) can be ignored in the outer minimization, and we can write our problem, thanks to (50), as

$$\begin{aligned} \begin{array}{l} \min \limits _{x_{1}>\sqrt{\varSigma _{11}}\varPhi ^{-1}\left( p\right) +\mu _{1}}\\ \left\{ c_{1}x_{1}+\min _{x_{2}\in {\mathcal {X}}^{*}}\left\{ c_{2}{\mathbb {E}} x_{2}\left( \xi _{1}\right) \chi _{\left( -\infty ,x_{1}\right] }\left( \xi _{1}\right) |\varphi \left( x_{1},x_{2}\right) \ge p;\,x_{2}\in M\right\} \right\} \\ \quad =\min _{x_{1}>\sqrt{\varSigma _{11}}\varPhi ^{-1}\left( p\right) +\mu _{1}}\left\{ c_{1}x_{1}+c_{2}f\left( x_{1}\right) \right\} . \end{array} \end{aligned}$$

This proves our assertion on an optimal solution \(x_{1}^{*}\). As shown in the first part of this proof, the optimal second stage decision in (44) associated with the first stage decision \(x_{1}^{*}\) is defined in (43), which yields the asserted formula for \(x_{2}^{*}\) in the statement of the theorem. \(\square \)

Figure 3 illustrates the solution of problem (38) for the data

$$\begin{aligned} c_1=1;\,\,c_2=2;\,\, p=0.8;\,\,\mu =(0,0);\,\,\varSigma =\left( \begin{array}{cc} 1&{}\quad 0.25\\ 0.25&{}\quad 1 \end{array}\right) \end{aligned}$$
Fig. 3

Illustration of a solution to problem (38): optimal first stage decision \(x_1^*\) as minimizer of the function \(c_1t+c_2f(t)\) (left) and optimal second stage decision \(x_2^*(r)\) as an affine linear function with slope and intercept as indicated in Theorem 4 (right)
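
The construction of Theorem 4 is straightforward to implement. The following sketch (assuming SciPy; the upper bound of the search interval is an ad hoc choice) computes \(x_{1}^{*}\) and the parameters of \(x_{2}^{*}\) for the data of Fig. 3 by evaluating (39), (41) and the inner optimal value function f from (50):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

# Sketch: solving (38) for the data of Fig. 3 along the lines of Theorem 4;
# the upper bound of the search interval is an ad hoc choice.
c1, c2, p = 1.0, 2.0, 0.8
mu1, mu2 = 0.0, 0.0
S11, S12, S22 = 1.0, 0.25, 1.0
sig = np.sqrt(S22 - S12**2 / S11)
t0 = np.sqrt(S11) * norm.ppf(p) + mu1        # left end of the domain (40)

def alpha(t):
    # formula (39)
    Fz = norm.cdf((t - mu1) / np.sqrt(S11))
    return norm.ppf(p / Fz) * sig + mu2 - (S12 / S11) * mu1

def beta(t):
    # formula (41): conditional mean of xi_1 given xi_1 <= t
    z = (t - mu1) / np.sqrt(S11)
    return mu1 - np.sqrt(S11) * norm.pdf(z) / norm.cdf(z)

def f(t):
    # inner optimal value (50)
    Fz = norm.cdf((t - mu1) / np.sqrt(S11))
    return ((S12 / S11) * beta(t) + alpha(t)) * Fz

res = minimize_scalar(lambda t: c1 * t + c2 * f(t),
                      bounds=(t0 + 1e-6, t0 + 10.0), method="bounded")
x1_star = res.x
print("x1* =", x1_star)
print("x2*(r) = %.4f * r + %.4f" % (S12 / S11, alpha(x1_star)))
```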