1 Introduction

Let \({\mathcal {X}}\subset {\mathbb {R}}^n\) be a compact set equipped with the infinity norm \(\Vert x\Vert _{\infty }=\max _{i=1,\ldots ,n}|x_i|,\ x\in {\mathcal {X}}\). Denote by \(D>0\) the diameter of \({\mathcal {X}}\), i.e., \(D:=\sup \{\Vert x-x'\Vert _{\infty }:x,x'\in {\mathcal {X}}\}\). In this paper, we assume that \({\mathcal {X}}\) can be specified as \({\mathcal {X}}=\{x\in {\mathbb {R}}^n:g(x)\le {\varvec{0}}^{n_g}\}\), where \(g:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^{n_g}\) is a continuously differentiable constraint function. We make the following assumption on g throughout the paper.

Assumption 1

The Cottle Constraint Qualification (CCQ) holds at every point of \({\mathcal {X}}\). Namely, for any \(x\in {\mathcal {X}}\), there is a \(d\in {\mathbb {R}}^n\) such that

$$\begin{aligned} \nabla g(x)^\top d<{\varvec{0}}^{n_g} \end{aligned}$$
(1)

holds.

Let \({\mathscr {B}}({\mathcal {X}})\) be the Borel \(\sigma \)-algebra on the metric space \({\mathcal {X}}\); throughout, \({\mathscr {B}}(\cdot )\) denotes the Borel \(\sigma \)-algebra on a metric space. Notice that \(\left( {\mathcal {X}},{\mathscr {B}}({\mathcal {X}})\right) \) is a Borel space. Let \(\mu \) be a Borel probability measure on \({\mathscr {B}}({\mathcal {X}})\) and let \(M({\mathcal {X}})\) be the space of Borel probability measures on the metric space \({\mathcal {X}}\). Let \(\delta \) be a random vector with support \(\varDelta \subseteq {\mathbb {R}}^s\) and let \({\mathbb {P}}\{\cdot \}\) be the probability measure defined on the Borel \(\sigma \)-algebra \({\mathscr {B}}(\varDelta )\) on \(\varDelta \). Let \(p(\delta )\) be the probability density function associated with \({\mathbb {P}}\{\cdot \}\). Given a scalar function \(J:{\mathcal {X}}\rightarrow {\mathbb {R}}\) and a vector-valued function \(h:{\mathcal {X}}\times \varDelta \rightarrow {\mathbb {R}}^m\), a chance-constrained linear program in probability measure space is formulated as:

$$\begin{aligned} P_{\alpha }:\quad \underset{\mu \in M({\mathcal {X}})}{\textsf{min}}\ \int _{{\mathcal {X}}}J(x){\textsf{d}}\mu \quad \text {s.t.}\quad \int _{{\mathcal {X}}}F(x){\textsf{d}}\mu \ge 1-\alpha , \end{aligned}$$

where \(\alpha \in (0,1)\) is a given probability level and F(x) is defined by

$$\begin{aligned} F(x):=\int _{\varDelta }{\mathbb {I}}\{h(x,\delta )\}{\textsf{d}}{\mathbb {P}}\{\delta \}=\int _{\varDelta }{\mathbb {I}}\{h(x,\delta )\}p(\delta ){\textsf{d}}\delta . \end{aligned}$$
(2)

Here, \({\mathbb {I}}\{y\}\) denotes the indicator function, defined as

$$\begin{aligned} {\mathbb {I}}\{y\}=\left\{ \begin{array}{ll} 1,&{}\quad \text {if}\; y\le 0,\\ 0,&{}\quad \text {if}\; y>0. \end{array} \right. \end{aligned}$$

Note that F(x) is the probability of having \(h(x,\delta )\le 0\) for given x. Throughout the paper, we assume the following conditions on J(x) and \(h(x,\delta )\).

Assumption 2

For the functions J(x) and \(h(x,\delta )\), the following conditions are assumed to hold:

  a.

    J(x) is continuously differentiable with respect to x;

  b.

    \(h(x,\delta )\) is continuously differentiable with respect to x for any \(\delta \in \varDelta \);

  c.

    For every \(x\in {\mathcal {X}}\), \(h(x,\delta )\) is continuous with respect to \(\delta \);

  d.

    The probability density function \(p(\delta )\) is continuous with respect to \(\delta \);

  e.

    Let \({\bar{h}}(x,\delta ):=\max _{i}h_i(x,\delta )\), \(\textsf{supp}\ p:={{\textsf{c}}}{{\textsf{l}}}\{\delta \in \varDelta :p(\delta )>0\}\) (\({{\textsf{c}}}{{\textsf{l}}}\{\cdot \}\) denotes the closure), and for each \(x\in {\mathcal {X}}\),

    $$\begin{aligned} \varDelta ^{\textsf{supp}}(x):=\{\delta \in \textsf{supp}\, p:{\bar{h}}(x,\delta )=0\}. \end{aligned}$$

    For each \(x\in {\mathcal {X}}\), the following is assumed to be true:

    $$\begin{aligned} {\mathbb {P}}\{\varDelta ^{\textsf{supp}}(x)\}=0. \end{aligned}$$

    Besides, suppose that \(h(x,\delta )\) has a continuous probability density function for every \(x\in {\mathcal {X}}\);

  f.

    There exists \(L>0\) such that

    $$\begin{aligned} \Vert h(x,\delta )-h(x',\delta )\Vert _{\infty }\le L\Vert x-x'\Vert _{\infty },\quad \forall x,x'\in {\mathcal {X}}\;\text {and}\; \forall \delta \in \varDelta , \end{aligned}$$

    and

    $$\begin{aligned} |J(x)-J(x')|\le L\Vert x-x'\Vert _{\infty },\quad \forall x,x'\in {\mathcal {X}}. \end{aligned}$$

In fact, by the argument on pp. 78–79 of [18], the continuity of F(x) follows from Assumption 2.

Denote the feasible region of \(P_{\alpha }\) as \(M_{\alpha }({\mathcal {X}}):=\{\mu \in M({\mathcal {X}}):\int _{{\mathcal {X}}}F(x) {\textsf{d}}\mu \ge 1-\alpha \}\). The optimal objective function value of \(P_{\alpha }\) is

$$\begin{aligned} \bar{{\mathcal {J}}}_{\alpha }:=\textsf{min}\left\{ \int _{{\mathcal {X}}}J(x) {\textsf{d}}\mu :\mu \in M_{\alpha }({\mathcal {X}})\right\} . \end{aligned}$$
(3)

The optimal solution set of \(P_{\alpha }\) is therefore written as

$$\begin{aligned} A_{\alpha }:=\left\{ \mu \in M_{\alpha }({\mathcal {X}}):\int _{{\mathcal {X}}}J(x) {\textsf{d}}\mu = \bar{{\mathcal {J}}}_{\alpha }\right\} , \end{aligned}$$
(4)

A measure \({\bar{\mu }}_{\alpha }\in A_{\alpha }\) is called an optimal measure for \(P_\alpha \).

1.1 Motivation

The motivation for addressing chance-constrained linear programs in probability measure space comes from seeking an optimal stochastic policy for optimal control problems with chance constraints, which is vital for deploying reliable autonomous systems whose control algorithms are robust to model misspecification and external disturbances [2, 10, 28]. The optimal control problem with chance constraints aims at maximizing a reward function or minimizing a cost function under the constraint that the system state remains in the safe region with a required probability. A deterministic policy takes a fixed value in the decision domain at every time index. In contrast, a stochastic policy provides a probability measure on the decision domain at every time index. A deterministic policy can be regarded as a particular case of a stochastic policy obtained by concentrating the probability measure on a fixed value in the decision domain. The existing techniques for addressing optimal control problems with chance constraints do not touch the essential parts of the problem and may require application-specific assumptions. For example, [10, 17] enforce pointwise chance constraints that require the independent satisfaction of each chance constraint at each time step, which leads to a more conservative solution. In general, joint chance constraints are desired, which require all chance constraints to be satisfied jointly at all times. However, it is challenging to tackle the joint chance-constrained optimal control problem since the distribution of the state trajectory needs to be considered fully. It is possible to address the joint chance-constrained optimal control problem by using Boole’s inequality [2, 24, 36] or by performing robust optimization within bounded model parameters obtained by specifying a confidence set [19]. However, both methods are conservative. More investigation from the viewpoint of optimization theory is needed to enable new breakthroughs in optimal control with chance constraints.

Obtaining open-loop stochastic optimal policies under chance constraints can essentially be written as a chance-constrained linear program in probability measure space [32]. Open-loop stochastic policies are stochastic policies that depend only on the initial state. Unfortunately, to our knowledge, there is still no research on solving chance-constrained linear programs in probability measure space. Investigating chance-constrained linear programs in probability measure space is therefore vital and can give more insight into optimal control with chance constraints.

1.2 Related Works

Optimization with finitely many chance constraints in finite-dimensional vector space is generally challenging due to the nonconvexity of the feasible set and intractable reformulations [9, 27]. The existing research has two major streams: (1) assume that the constraint functions or the distribution of the random variables has some special structure, for example, linear or convex constraint functions [23], a finite sample space of the random variables [21], or elliptically symmetric Gaussian-like distributions [33]; or (2) extract samples [5,6,7, 20, 25, 26, 29, 31] or use smooth functions [13] to approximate the chance constraints. Among sample-based methods, the most prominent approach in the control field is the scenario approach [5,6,7,8, 28]. The scenario approach generates a deterministic optimization problem as an approximation of the original one by extracting samples from the sample space of the random variables. The probability that the approximate solution is feasible rapidly increases to one as the number of samples increases. However, the convergence of the optimality of the approximate solution is not discussed. In another sample-based method, the sample-average approach [13, 20, 26, 29], both feasibility and optimality of the approximate solution are established. However, neither the scenario approach nor the sample-average approach can be directly used to solve chance-constrained linear programs in probability measure space, since the convergence analysis of both approaches assumes that the dimension of the decision variable is finite.

Optimization with chance/robust constraints in finite-dimensional vector space, in which the number of chance constraints is infinite, has also been intensively studied [1, 11, 34, 35]. In [34], the generalized differentiation of the probability function of infinitely many constraints is investigated, and an optimality condition with an explicit formulation of subdifferentials is given. In [35], variational tools are applied to formulate the generalized differentiation of chance/robust constraints, and a method for obtaining explicit outer estimates of the subdifferentials from data is established. An adaptive grid refinement algorithm is developed to solve optimization with chance/robust constraints in [1]. However, the above research on optimization with chance/robust constraints in finite-dimensional vector space proves convergence only when the dimension of the decision variable is finite.

Recently, chance constraints in infinite dimensions have attracted considerable attention. In [12, 14, 15], essential properties, such as convexity and semicontinuity, are generalized to chance constraints in infinite dimensions. However, the results in [12] assume that the random variable has a log-concave density to ensure semicontinuity. In [15], the continuity of the probability function appearing in the chance constraints is proved under the assumption of continuous random distributions. The properties of chance constraints in infinite dimensions are crucial for constructing optimality conditions and carrying out convergence analysis for optimization with chance constraints in infinite dimensions. In [14], chance-constrained optimization of elliptic partial differential equation systems is addressed by an inner–outer approximation. It is proved that the inner and outer approximations converge to the original problem and can provide approximate solutions with ensured convergence. However, the proof of convergence requires the assumption that the state domain is convex. Besides, it concerns a specific problem in partial differential equation systems.

1.3 Overview of Proposed Method and Contributions

This paper extends the sample-based approximation method to solve chance-constrained linear programs in probability measure space. We show the relationship between chance-constrained optimization in finite-dimensional vector space and the chance-constrained linear program in probability measure space. By solving a chance-constrained linear program in probability measure space, we can obtain a stochastic policy that further improves the expectation of the optimal value. We also show that the optimal objective values of the chance-constrained linear program in probability measure space and of chance-constrained optimization in finite-dimensional vector space are equal if the constraints involving random variables are required to be satisfied with probability 1. Namely, in this case, by concentrating the probability measure on an optimal solution of the chance-constrained optimization in finite-dimensional vector space, we can obtain an optimal measure for the chance-constrained linear program in probability measure space. Besides, a sample approximate problem and a Gaussian mixture model approximate problem of problem \(P_{\alpha }\) are proposed, by solving which an approximate solution of \(P_{\alpha }\) can be obtained. The convergence of both approximate problems is investigated. Numerical examples are implemented to validate the proposed methods.

The chance-constrained linear program in probability measure space involves chance constraints in infinite dimensions. Our work differs from [12, 15] in that our purpose is to provide numerical methods for solving chance-constrained linear programs in probability measure space. The properties of chance constraints in infinite dimensions are essential for the convergence analysis.

The rest of this paper is organized as follows: Sect. 2 presents two approximate problems of \(P_\alpha \) and gives the main results on the convergence for each approximate problem. The proofs of the main results are presented in Sect. 3. Section 4 presents the results of two numerical examples, which show the effectiveness of our proposed methods. Section 5 concludes the whole paper.

2 Main Results

This section introduces two approximate problems of \(P_{\alpha }\). We also present the convergence for each approximate problem. The proofs are presented in Sect. 3.

2.1 Chance-Constrained Optimization in Finite Space

Chance-constrained optimization \(Q_{\alpha }\) is an optimization problem with chance constraints in a finite-dimensional vector space. The problem is written as

$$\begin{aligned} Q_{\alpha }:\quad \underset{x\in {\mathcal {X}}}{\textsf{min}}\ J(x)\quad \text {s.t.}\quad F(x)\ge 1-\alpha , \end{aligned}$$

where \(\alpha \in (0,1)\) is a given probability level.

Let \({\mathcal {X}}_{\alpha }:=\{x\in {\mathcal {X}}:F(x)\ge 1-\alpha \}\) be the feasible domain of \(Q_{\alpha }\). Denote by \({\bar{J}}_{\alpha }:=\textsf{min}\{J(x):x\in {\mathcal {X}}_{\alpha }\}\) the optimal objective value of \(Q_{\alpha }\) and by \(X_{\alpha }:=\{x\in {\mathcal {X}}_{\alpha }:J(x)={\bar{J}}_{\alpha }\}\) the optimal solution set of \(Q_{\alpha }\). We make the following assumption on \(Q_{\alpha }\) throughout the paper.

Assumption 3

There exists a globally optimal solution \({\bar{x}}\) of \(Q_{\alpha }\) such that for any \(\varepsilon >0\) there is \(x\in {\mathcal {X}}\) such that \(0<\Vert x-{\bar{x}}\Vert \le \varepsilon \) and \(F(x)> 1-\alpha \).

The presence of chance constraints gives rise to several difficulties. First, the structural properties of \(h(x,\delta )\) might not carry over to the constraint \(F(x)\ge 1-\alpha \). The feasible set \({\mathcal {X}}_\alpha \) can be equivalently written as

$$\begin{aligned} {\mathcal {X}}_\alpha =\bigcup _{\varDelta _s\in {\mathcal {F}}}\bigcap _{\delta \in \varDelta _s}{\mathcal {X}}_{\delta }, \end{aligned}$$
(5)

where \({\mathcal {X}}_{\delta }:=\{x\in {\mathcal {X}}:h(x,\delta )\le 0\}\) and \({\mathcal {F}}:=\{\varDelta _s\in {\mathscr {B}}(\varDelta ):{\mathbb {P}}\{\varDelta _s\}\ge 1-\alpha \}\). Even if \(h_i(x,\delta ),i=1,\ldots ,m\) are all linear in x for every \(\delta \in \varDelta \), the feasible set \({\mathcal {X}}_\alpha \) may not be convex due to the infinite union operations. Second, it is difficult to obtain a tractable analytical expression of F(x) to describe the constraint or to find a numerically efficient way to compute it. In most applications, \(p(\delta )\) is unknown, and only samples of \(\delta \) are available. We briefly review the sample-based approximation method presented in [20, 25, 26]. Let \({\mathcal {D}}_N=\{\delta ^{(1)},\ldots ,\delta ^{(N)}\}\) be a set of samples randomly extracted from \(\varDelta \), where \(N\in {\mathbb {N}}\). Suppose the samples are extracted independently and identically distributed. Then, \({\mathcal {D}}_N\) can be regarded as a random variable from the augmented sample space \(\varDelta ^N\) with probability measure \({\mathbb {P}}^N\{\cdot \}\) defined on the Borel \(\sigma \)-algebra \({\mathscr {B}}(\varDelta ^N)\). Given \({\mathcal {D}}_N\), \(\epsilon \in [0,\alpha )\), and \(\gamma >0\), a sample average approximate problem of \(Q_{\alpha }\), denoted by \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\), is written as:

$$\begin{aligned} {\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N):\quad \underset{x\in {\mathcal {X}}}{\textsf{min}}\ J(x)\quad \text {s.t.}\quad \frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x,\delta ^{(j)})+\gamma \}\ge 1-\epsilon . \end{aligned}$$

The feasible region of \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) is defined by

$$\begin{aligned} \tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N):=\left\{ x\in {\mathcal {X}}:\frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x,\delta ^{(j)})+\gamma \}\ge 1-\epsilon \right\} . \end{aligned}$$

Denote \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N):=\textsf{min}\{J(x):x\in \tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\}\) for the optimal objective function value of \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) and \({\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N):=\{x\in \tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N):J(x)={\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\}\) for the optimal solution set of \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\). We can regard \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) as a function \({\tilde{J}}_{\epsilon ,\gamma }:\varDelta ^N\rightarrow {\mathbb {R}}\) for given \(\epsilon \) and \(\gamma \). Since \({\mathcal {D}}_N\) is a random variable from \(\varDelta ^N\), \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) is consequently a random variable. The sets \(\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) and \({\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) also depend on \({\mathcal {D}}_N\) and can be regarded as \(\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }:\varDelta ^N\rightarrow {\mathscr {B}}({\mathcal {X}})\) and \({\tilde{X}}_{\epsilon ,\gamma }:\varDelta ^N\rightarrow {\mathscr {B}}({\mathcal {X}})\). \(\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) and \({\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) are called random sets [22]. In [20, 26], the convergence analysis on \(\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N),{\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N),{\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) is given. We summarize Theorem 10 of [20] and Theorem 3.5 of [26] as Lemma 1.
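
As an illustration, the following minimal Python sketch evaluates the empirical constraint of \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\), i.e., the sample average of the indicator in (2) with margin \(\gamma \), and performs a brute-force search for \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) over a finite candidate set. The helper names and the brute-force search are our own simplifications for illustration, not the algorithms analyzed in [20, 26].

```python
import numpy as np

def in_sample_feasible_set(x, h, delta_samples, eps, gamma):
    """Check x in the sampled feasible set: the fraction of samples delta^(j) with
    h(x, delta^(j)) + gamma <= 0 (componentwise) must be at least 1 - eps."""
    frac = np.mean([float(np.all(h(x, d) + gamma <= 0.0)) for d in delta_samples])
    return frac >= 1.0 - eps

def brute_force_sample_average(J, h, candidates, delta_samples, eps, gamma):
    """Minimize J over candidate points satisfying the sampled chance constraint;
    returns None if no candidate is feasible."""
    feasible = [x for x in candidates
                if in_sample_feasible_set(x, h, delta_samples, eps, gamma)]
    return min(feasible, key=J) if feasible else None
```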

Lemma 1

Suppose that Assumptions 2 and 3 hold. Let \(\epsilon \in [0,\alpha ),\beta \in (0,\alpha -\epsilon )\) and \(\gamma >0\). Then,

$$\begin{aligned} {{\mathbb {P}}^N}\{\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\subseteq {\mathcal {X}}_{\alpha }\}\ge 1-\bigg \lceil \frac{1}{\eta }\bigg \rceil \bigg \lceil \frac{2LD}{\gamma }\bigg \rceil ^{n}\exp \{-2N(\alpha -\epsilon -\beta )^2\}. \end{aligned}$$

Besides, \({\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\rightarrow X_{\alpha }\) and \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\rightarrow {\bar{J}}_{\alpha }\) with probability 1 when \(N\rightarrow \infty \), \(\epsilon \rightarrow \alpha ,\gamma \rightarrow 0\).

According to Lemma 1, we can obtain the solution of \(Q_{\alpha }\) with probability 1 when \(N\rightarrow \infty ,\, \epsilon \rightarrow \alpha ,\, \gamma \rightarrow 0\). A natural question arises: can we use the solution of \(Q_{\alpha }\) to obtain an optimal probability measure for \(P_{\alpha }\)? Let \({\bar{x}}_{\alpha }\in X_{\alpha }\) be an optimal solution of \(Q_{\alpha }\). Notice that we have \(\{{\bar{x}}_{\alpha }\}\in {\mathscr {B}}({\mathcal {X}})\) and thus it is possible to define a probability measure \(\mu _{{\bar{x}}_{\alpha }}\) which satisfies that \(\mu _{{\bar{x}}_{\alpha }}(\{{\bar{x}}_{\alpha }\})=\mu _{{\bar{x}}_{\alpha }}({\mathcal {X}})=1\). Then,

$$\begin{aligned} \int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _{{\bar{x}}_{\alpha }}=\int _{\{{\bar{x}}_{\alpha }\}}J(x){\textsf{d}}\mu _{{\bar{x}}_{\alpha }}={\bar{J}}_{\alpha } \end{aligned}$$

and

$$\begin{aligned} \int _{{\mathcal {X}}}F(x) {\textsf{d}}\mu _{{\bar{x}}_{\alpha }}=\int _{\{{\bar{x}}_{\alpha }\}}F(x) {\textsf{d}}\mu _{{\bar{x}}_{\alpha }}=F({\bar{x}}_{\alpha })\ge 1-\alpha . \end{aligned}$$

Thus, \(\mu _{{\bar{x}}_{\alpha }}\) is a feasible solution for \(P_{\alpha }\) with objective value \({\bar{J}}_{\alpha }\). However, \(\mu _{{\bar{x}}_{\alpha }}\) is not guaranteed to lie in \(A_{\alpha }\). Only when \(\alpha =0\) do we have \(\mu _{{\bar{x}}_{\alpha }}\in A_{\alpha }\). Notice that it is not ensured that the set \(X_{\alpha }\) is a Borel measurable set. However, it is possible to find a subset \(X_{\alpha }^{\text {m}}\subseteq X_{\alpha }\) that is Borel measurable. A particular example is to choose \(X_{\alpha }^{\text {m}}=\{{\bar{x}}_{\alpha }\}\), where \({\bar{x}}_{\alpha }\in X_{\alpha }\) is one element of the optimal solution set. In this paper, without loss of generality, we assume that \(X_{\alpha }\) is Borel measurable for all \(\alpha \in [0,1]\). Besides, we also assume that \({\mathcal {X}}_0\ne \emptyset \). Then, \({\mathcal {X}}_\alpha \ne \emptyset \) holds for all \(\alpha \in [0,1]\). The above content is formally summarized in Theorem 1.

Theorem 1

Suppose that \({\mathcal {X}}_{\alpha }\) is measurable for all \(\alpha \in [0,1]\) and \({\mathcal {X}}_0\ne \emptyset \). The optimal value of problem \(P_{\alpha }\) satisfies \(\bar{{\mathcal {J}}}_{\alpha }\le {\bar{J}}_{\alpha }\). Besides, if \(\alpha =0\), we have

$$\begin{aligned} \bar{{\mathcal {J}}}_{0}={\bar{J}}_{0} \end{aligned}$$

and

$$\begin{aligned} A_{0}=\{\mu \in M({\mathcal {X}}):\mu ({\mathcal {X}})=\mu (X_0)=1\} \end{aligned}$$
(6)

with probability 1.

The proof of Theorem 1 is given in Sect. 3.1.

Remark 1

Theorem 1 implies that a deterministic policy is optimal for robust optimal control, where \(\alpha =0\).

2.2 Sample-Based Approximation

Let \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) be the set of all interior points of \({\mathcal {X}}\). By using the Hit-and-Run algorithm [30] and the Billiard Walk algorithm [16], uniform samples can be generated from \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). For a positive integer \(S\in {\mathbb {N}}\), let \({\mathcal {C}}_S:=\{x^{(1)},\ldots ,x^{(S)}\}\) be a set of uniform samples independently extracted from \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). The set \({\mathcal {C}}_S\) is an element of the augmented space \(\left( {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\right) ^S\). Since each element \(x^{(i)},i=1,\ldots ,S\) of \({\mathcal {C}}_S\) is extracted independently, we define an S-fold probability measure \({\mathbb {P}}^S_{\textsf{uni}}\) (\(={\mathbb {P}}_{\textsf{uni}}\times \cdots \times {\mathbb {P}}_{\textsf{uni}}\), S times) on \(\left( {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\right) ^S\). Here, \( {\mathbb {P}}_{\textsf{uni}}\) is the probability measure of the uniform distribution on \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\).
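
For concreteness, the following is a minimal sketch of the Hit-and-Run idea for the special case where \({\mathcal {X}}\) is a bounded polytope \(\{x:Ax\le b\}\), so that the chord through the current point has closed-form endpoints; the general algorithm and its mixing properties are described in [30], and the Billiard Walk in [16]. The function name and the burn-in heuristic are our own choices.

```python
import numpy as np

def hit_and_run_polytope(A, b, x0, n_samples, burn_in=500, rng=None):
    """Minimal Hit-and-Run sketch for approximately uniform sampling from the interior
    of a bounded polytope {x : A x <= b}, started from a strictly feasible point x0."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for k in range(burn_in + n_samples):
        d = rng.normal(size=x.size)
        d /= np.linalg.norm(d)                 # random direction on the unit sphere
        Ad, slack = A @ d, b - A @ x           # chord condition: t * (A d) <= b - A x
        t_hi = np.min(slack[Ad > 1e-12] / Ad[Ad > 1e-12])
        t_lo = np.max(slack[Ad < -1e-12] / Ad[Ad < -1e-12])
        x = x + rng.uniform(t_lo, t_hi) * d    # uniform point on the chord through x
        if k >= burn_in:
            samples.append(x.copy())
    return np.array(samples)

# Example: the unit box [0, 1]^2 written as A x <= b
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.hstack([np.ones(2), np.zeros(2)])
C_S = hit_and_run_polytope(A, b, x0=np.array([0.5, 0.5]), n_samples=100)
```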

With \({\mathcal {C}}_S\) and \({\mathcal {D}}_N\), we can obtain a sample approximate problem of \(P_{\alpha }\) defined by \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\):

$$\begin{aligned} {\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N):\quad \underset{\mu \in U_S}{\textsf{min}}\ \sum _{i=1}^{S}J(x^{(i)})\mu (i)\quad \text {s.t.}\quad \sum _{i=1}^S\mu (i)\frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x^{(i)},\delta ^{(j)})\}\ge 1-\alpha , \end{aligned}$$

where \(U_S:=\{\mu \in {\mathbb {R}}^S:\sum _{i=1}^{S}\mu (i)=1,\, \mu (i)\ge 0,\; \forall i=1,\ldots ,S\}\). Define \({\mathcal {F}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N):=\{\mu \in U_S:\sum _{i=1}^S\mu (i)\frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x^{(i)},\delta ^{(j)})\}\ge 1-\alpha \}\) as the feasible set of \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\). Denote the optimal objective function value as

$$\begin{aligned} \tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N):=\textsf{min}\left\{ \sum _{i=1}^{S} J(x^{(i)})\mu (i):\mu \in {\mathcal {F}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\right\} . \end{aligned}$$

Denote the optimal solution set for \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) as

$$\begin{aligned} {\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N):=\left\{ \mu \in {\mathcal {F}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N):\sum _{i=1}^{S} J(x^{(i)})\mu (i)= \tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\right\} . \end{aligned}$$

Let \({\tilde{\mu }}_{\alpha }\in {\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) be an optimal measure. The optimal value \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) depends on \({\mathcal {C}}_S\) and \({\mathcal {D}}_N\), and thus it can be regarded as a function \(\tilde{{\mathcal {J}}}_{\alpha }:{\mathcal {X}}^S\times \varDelta ^{N}\rightarrow {\mathbb {R}}\). Then, \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) is a random variable. Besides, \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) is a random set.
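
Because the decision variable of \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) is the weight vector \(\mu \in U_S\), the approximate problem is a finite-dimensional linear program. Below is a minimal sketch that assembles and solves it with scipy.optimize.linprog; the function name and the solver choice are ours and only illustrate one possible implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_sampled_measure_lp(J, h, x_samples, delta_samples, alpha):
    """Solve the sample approximate problem: min sum_i J(x^(i)) mu(i) over the simplex,
    subject to sum_i mu(i) * (1/N) sum_j I{h(x^(i), delta^(j)) <= 0} >= 1 - alpha."""
    # Empirical probability for each candidate point x^(i)
    F_hat = np.array([np.mean([float(np.all(h(x, d) <= 0.0)) for d in delta_samples])
                      for x in x_samples])
    c = np.array([J(x) for x in x_samples])            # objective coefficients
    res = linprog(
        c,
        A_ub=-F_hat[None, :], b_ub=[-(1.0 - alpha)],   # chance constraint in <= form
        A_eq=np.ones((1, len(x_samples))), b_eq=[1.0], # weights sum to one
        bounds=(0.0, None),                            # mu(i) >= 0
        method="highs",
    )
    return (res.x, res.fun) if res.success else (None, None)
```

The returned weight vector defines a discrete measure supported on \({\mathcal {C}}_S\), playing the role of \({\tilde{\mu }}_{\alpha }\) in this sketch.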

The derivation of the convergence of \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) and \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) requires another assumption on \(P_\alpha \). We state the assumption after a brief introduction of weak convergence.

Define a space of continuous \({\mathbb {R}}\)-valued functions by

$$\begin{aligned} {\mathscr {C}}({\mathcal {X}},{\mathbb {R}}):=\{f:{\mathcal {X}}\rightarrow {\mathbb {R}} |f\, \text {is continuous}\}. \end{aligned}$$
(7)

A metric on \({\mathscr {C}}({\mathcal {X}},{\mathbb {R}})\) can be defined by

$$\begin{aligned} \tau (f,f'):=\Vert f-f'\Vert _\infty , \end{aligned}$$
(8)

where \(\Vert f\Vert _\infty \) is defined as

$$\begin{aligned} \Vert f\Vert _\infty :=\sup _{x\in {\mathcal {X}}}|f(x)|. \end{aligned}$$

The metric \(\tau (\cdot ,\cdot )\) turns \({\mathscr {C}}({\mathcal {X}},{\mathbb {R}})\) into a complete metric space.

The weak convergence of probability measures is defined as follows [4].

Definition 1

Let \(\{\mu _k\}_{k=0}^{\infty }\) be a sequence in \(M({\mathcal {X}})\). We say that \(\{\mu _k\}_{k=0}^{\infty }\) converges weakly to \(\mu \) if

$$\begin{aligned} \lim _{k\rightarrow \infty }\left| \int _{{\mathcal {X}}}f(x){\textsf{d}}\mu _k-\int _{{\mathcal {X}}}f(x){\textsf{d}}\mu \right| =0,\quad \text {for all}\quad f\in {\mathscr {C}}({\mathcal {X}},{\mathbb {R}}). \end{aligned}$$
(9)

Since \({\mathcal {X}}\) is compact, \(M({\mathcal {X}})\) can be proved to be weakly compact by the Riesz representation theorem [4]. Therefore, given any sequence \(\{\mu _k\}_{k=0}^{\infty }\subset M({\mathcal {X}})\), there is a subsequence which converges weakly to some \(\mu \in M({\mathcal {X}})\) in the sense of Definition 1. By Assumption 2, J(x) and F(x) are continuous with respect to x. Therefore, if \(\{\mu _k\}_{k=0}^{\infty }\) converges weakly to \(\mu \), (9) also holds with \(f\) replaced by J(x) or F(x). We give the following assumption on Problem \(P_\alpha \).

Assumption 4

There exists a globally optimal solution \(\mu ^*\in A_\alpha \) of Problem \(P_\alpha \) such that for any \(\varepsilon >0\) there is \(\mu \in M({\mathcal {X}})\) such that \(\int _{{\mathcal {X}}}F(x){\textsf{d}}\mu >1-\alpha \) and \({\mathcal {W}}(\mu ,\mu ^*)\le \varepsilon \), where \({\mathcal {W}}(\mu ,\mu ^*)\) is defined by

$$\begin{aligned} {\mathcal {W}}(\mu ,\mu ^*)=\left| \int _{{\mathcal {X}}}J(x){\textsf{d}}\mu -\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu ^*\right| . \end{aligned}$$
(10)

As \(S,N\rightarrow \infty \), the convergence analysis on \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) and \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) is summarized in Theorem 2.

Theorem 2

Consider Problem \(P_\alpha \) with \(\alpha >0\). Suppose Assumptions 1, 2, 3, and 4 hold. As \(S,N\rightarrow \infty \), we have

$$\begin{aligned} \liminf _{S,N\rightarrow \infty } \tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)=\bar{{\mathcal {J}}}_{\alpha }, \end{aligned}$$

with probability 1. Besides, as \(S,N\rightarrow \infty \), we have \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\subset M_{\alpha }({\mathcal {X}}):=\{\mu \in M({\mathcal {X}}):\int _{{\mathcal {X}}}F(x) {\textsf{d}}\mu \ge 1-\alpha \}\) with probability 1.

The proof of Theorem 2 is given in Sect. 3.2.

2.3 Gaussian Mixture Model-Based Approximation

Another approximation option is to restrict the choice of \(\mu \) to \(M_{\theta }({\mathcal {X}})\subseteq M({\mathcal {X}})\). Here, \(M_\theta ({\mathcal {X}})\) is defined as

$$\begin{aligned} M_{\theta }({\mathcal {X}}):=\left\{ \mu \in M({\mathcal {X}}): \mu (X)=\int _X p_{\theta }(x){\textsf{d}}x,\; \forall X\subseteq {\mathcal {X}}\right\} , \end{aligned}$$

where the probability density function \(p_{\theta }(x)\) is written as

$$\begin{aligned} p_{\theta }(x)=\sum _{i=1}^{L}\omega _i\phi (x,m_i,\varSigma _i). \end{aligned}$$
(11)

Here, \(\omega _i\in [0,1],\ \forall i=1,\ldots ,L\), \(\sum _{i=1}^{L}\omega _i=1\), and \(\phi (x,m_i,\varSigma _i)\) is the multivariate Gaussian density given by

$$\begin{aligned} \phi (x,m_i,\varSigma _i)=\frac{1}{(2\pi )^{n/2}|\varSigma _i|^{1/2}} \exp \left( -\frac{1}{2}(x-m_i)^\top \varSigma _i^{-1}(x-m_i)\right) . \end{aligned}$$

The notation \(\theta \) denotes the parameter vector, including all the unknown parameters in \(\omega _i,m_i,\varSigma _i,\forall i=1,\ldots ,L\). Denote the dimension of \(\theta \) as \(n_{\theta }\). The feasible domain of \(\theta \) is denoted by

$$\begin{aligned} \varTheta :=\left\{ \theta \in {\mathbb {R}}^{n_\theta }:\sum _{i=1}^{L}\omega _i=1,\; \omega _i\ge 0\right\} . \end{aligned}$$
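
As a small sketch, the mixture density \(p_{\theta }(x)\) in (11) can be evaluated with scipy.stats.multivariate_normal; the variable names and the two-component example are ours and purely illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    """Evaluate the Gaussian mixture density p_theta(x) of (11); theta collects the
    weights omega_i, means m_i, and covariances Sigma_i."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=C)
               for w, m, C in zip(weights, means, covs))

# Hypothetical two-component mixture on R^2
weights = [0.3, 0.7]
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_density(np.array([0.5, 0.5]), weights, means, covs))
```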

Then, given a data set \({\mathcal {D}}_N\) and the number of Gaussian distributions L, we can obtain a Gaussian mixture model-based approximate problem of \(P_{\alpha }\) defined by \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\):

$$\begin{aligned} {\hat{P}}_{\alpha }(L,{\mathcal {D}}_N):\quad \underset{\theta \in \varTheta }{\textsf{min}}\ \int _{{\mathcal {X}}}J(x) p_{\theta }(x){\textsf{d}}x\quad \text {s.t.}\quad \int _{{\mathcal {X}}}\sum _{j=1}^N\frac{1}{N}{\mathbb {I}}\{h(x,\delta ^{(j)})\}p_{\theta }(x){\textsf{d}}x\ge 1-\alpha . \end{aligned}$$

Denote the feasible set of \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\) as

$$\begin{aligned} \varTheta _{\alpha }(L,{\mathcal {D}}_N):=\left\{ \theta \in \varTheta :\int _{{\mathcal {X}}}\sum _{j=1}^N\frac{1}{N}{\mathbb {I}}\{h(x,\delta ^{(j)})\}p_{\theta }(x){\textsf{d}}x\ge 1-\alpha \right\} , \end{aligned}$$

and the optimal objective value as

$$\begin{aligned} \hat{{\mathcal {J}}}_{\alpha }(L,{\mathcal {D}}_N):=\textsf{min}\left\{ \int _{{\mathcal {X}}}J(x) p_{\theta }(x){\textsf{d}}x:\theta \in \varTheta _{\alpha }(L,{\mathcal {D}}_N)\right\} . \end{aligned}$$

Besides, the optimal solution set is

$$\begin{aligned} {\hat{\varTheta }}_{\alpha }(L,{\mathcal {D}}_N):=\left\{ \theta \in \varTheta _{\alpha }(L,{\mathcal {D}}_N):\int _{{\mathcal {X}}}J(x) p_{\theta }(x){\textsf{d}}x=\hat{{\mathcal {J}}}_{\alpha }(L,{\mathcal {D}}_N)\right\} . \end{aligned}$$

The optimal objective value \(\hat{{\mathcal {J}}}_{\alpha }(L,{\mathcal {D}}_N)\) depends on the number of Gaussian models used and the data set \({\mathcal {D}}_N\). Since the data set \({\mathcal {D}}_N\) is essentially a random variable with support \(\varDelta ^N\), \(\hat{{\mathcal {J}}}_{\alpha }(L,{\mathcal {D}}_N)\) is also a random variable. The set \({\hat{\varTheta }}_{\alpha }(L,{\mathcal {D}}_N)\) is a random set.
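
In practice, the objective and constraint integrals of \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\) can be estimated by Monte Carlo sampling from \(p_{\theta }\). The following sketch evaluates a candidate \(\theta \) in this way; it is our own illustration (samples falling outside \({\mathcal {X}}\) would need rejection or truncation in a full implementation), not the solution algorithm used in the numerical examples.

```python
import numpy as np

def sample_gmm(weights, means, covs, n, rng):
    """Draw n points from the Gaussian mixture defined by (weights, means, covs)."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in comp])

def evaluate_gmm_candidate(theta, J, h, delta_samples, n_mc=5000, rng=None):
    """Monte Carlo estimates of the objective and the empirical chance constraint of
    the GMM approximate problem for a candidate theta = (weights, means, covs)."""
    rng = np.random.default_rng() if rng is None else rng
    weights, means, covs = theta
    xs = sample_gmm(np.asarray(weights), means, covs, n_mc, rng)
    obj = np.mean([J(x) for x in xs])                       # estimates the objective integral
    prob = np.mean([np.mean([float(np.all(h(x, d) <= 0.0)) for d in delta_samples])
                    for x in xs])                           # estimates the constraint value
    return obj, prob  # theta is (approximately) feasible if prob >= 1 - alpha
```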

As \(L,N\rightarrow \infty \), optimality and feasibility of using the optimal solution of \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\) are summarized in Theorem 3.

Theorem 3

Consider Problem \(P_\alpha \) with \(\alpha >0\). Suppose Assumptions 1, 2, 3, and 4 hold. As \(L,N\rightarrow \infty \), we have

$$\begin{aligned} \liminf _{L,N\rightarrow \infty } \hat{{\mathcal {J}}}_{\alpha }(L,{\mathcal {D}}_N)=\bar{{\mathcal {J}}}_{\alpha }, \end{aligned}$$

with probability 1. Besides, let \({\hat{\theta }}\in {\hat{\varTheta }}_{\alpha }(L,{\mathcal {D}}_N)\) be an optimal solution of \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\). The corresponding probability density function is \(p_{{\hat{\theta }}}(x)\), and the obtained probability measure is

$$\begin{aligned} \mu _{{\hat{\theta }}}(X):=\int _{X}p_{{\hat{\theta }}}(x){{\textsf{d}}}x,\quad \forall X\subseteq {\mathcal {X}}. \end{aligned}$$

We have \(\mu _{{\hat{\theta }}}\in M_{\alpha }({\mathcal {X}}):=\{\mu \in M({\mathcal {X}}):\int _{{\mathcal {X}}}F(x) { {\textsf{d}}}\mu \ge 1-\alpha \}\) with probability 1 as \(L,N\rightarrow \infty \).

The proof of Theorem 3 is given in Sect. 3.3.

3 Proofs of Main Results

3.1 Proof of Theorem 1

Proof

(Theorem 1) Define a measure \({\bar{\mu }}_{\alpha }(\cdot )\) that satisfies \({\bar{\mu }}_{\alpha }(X_{\alpha })=1\). Then, we have

$$\begin{aligned} \int _{{\mathcal {X}}_{\alpha }} J(x){\textsf{d}}{\bar{\mu }}_{\alpha }=\int _{X_{\alpha }} J(x){\textsf{d}}{\bar{\mu }}_{\alpha }={\bar{J}}_{\alpha }. \end{aligned}$$

Besides, for the constraint, we have

$$\begin{aligned} \int _{{\mathcal {X}}}F(x) {\textsf{d}}{\bar{\mu }}_{\alpha }(x)=\int _{X_{\alpha }}F(x){\textsf{d}}{\bar{\mu }}_{\alpha }(x)\ge 1-\alpha . \end{aligned}$$

Then, \({\bar{\mu }}_{\alpha }(\cdot )\in M_{\alpha }({\mathcal {X}})\) holds. Thus, we have \(\bar{{\mathcal {J}}}_{\alpha }\le \int _{{\mathcal {X}}} J(x){\textsf{d}}{\bar{\mu }}_{\alpha }={\bar{J}}_{\alpha }\).

When \(\alpha =0\), let \({\mathcal {X}}^c_{0}=\{x\in {\mathcal {X}}:F(x)<1\}\) be the complement set of \({\mathcal {X}}_0\), namely, \({\mathcal {X}}^c_{0}\bigcup {\mathcal {X}}_0={\mathcal {X}}\) and \({\mathcal {X}}^c_{0}\bigcap {\mathcal {X}}_0=\emptyset \). Notice that \({\mathcal {X}}^c_{0}\) is Borel measurable since \({\mathcal {X}}_0\) is Borel measurable. Suppose that there is \({\tilde{\mu }}(\cdot )\in M_{0}({\mathcal {X}})\) such that \({\tilde{\mu }}({\mathcal {X}}^c_{0})>0\). Then,

$$\begin{aligned} \int _{{\mathcal {X}}}F(x) {\textsf{d}}{\tilde{\mu }}(x)=\int _{{\mathcal {X}}_0}F(x){\textsf{d}}{\tilde{\mu }}(x)+\int _{{\mathcal {X}}_0^c}F(x){\textsf{d}}{\tilde{\mu }}(x)<{\tilde{\mu }}({\mathcal {X}}_0)+{\tilde{\mu }}({\mathcal {X}}_0^c)= 1, \end{aligned}$$
(12)

which contradicts \({\tilde{\mu }}\in M_0({\mathcal {X}})\). Therefore, we have \(\mu ({\mathcal {X}}^c_{0})=0\) for all \(\mu \in M_0({\mathcal {X}})\), which implies that \(\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu =\int _{{\mathcal {X}}_0}J(x){\textsf{d}}\mu \) for all \(\mu \in M_0({\mathcal {X}})\).

Notice that \(X_0\) is a Borel measurable set. Let \(\mu ^*_0(\cdot )\in A_0\) be an optimal probability measure for \(P_0\) and suppose \(\mu ^*_{0}(X_{0})<1\) to derive a contradiction. Thus, \(\mu ^*_{0}({\mathcal {X}}{\setminus } X_{0})>0\). The corresponding objective function value is

$$\begin{aligned} \int _{{\mathcal {X}}} J(x){\textsf{d}}\mu ^*_{0}&= \int _{{\mathcal {X}}_{0}} J(x){\textsf{d}}\mu ^*_{0} \nonumber \\&= \int _{X_{0}}J(x){\textsf{d}}\mu ^*_{0} + \int _{{\mathcal {X}}_{0}{\setminus } X_{0}}J(x){\textsf{d}}\mu ^*_{0}\nonumber \\&= \int _{X_{0}}{\bar{J}}_0{\textsf{d}}\mu ^*_{0} + \int _{{\mathcal {X}}_{0}{\setminus } X_{0}}J(x){\textsf{d}}\mu ^*_{0}\quad \left( \because J(x)={\bar{J}}_0,\ \forall x\in X_{0}\right) \nonumber \\&= {\bar{J}}_0\int _{X_{0}}{\textsf{d}}\mu ^*_{0} + \int _{{\mathcal {X}}_{0}{\setminus } X_{0}}J(x){\textsf{d}}\mu ^*_{0} \nonumber \\&=\mu ^*_{0}(X_{0})\cdot {\bar{J}}_{0} + \int _{{\mathcal {X}}_{0}{\setminus } X_{0}}J(x){\textsf{d}}\mu ^*_{0}. \end{aligned}$$
(13)

Consider a measure \({\bar{\mu }}_{0}(\cdot )\) that satisfies \({\bar{\mu }}_{0}(X_{0})=1\). Then, we have

$$\begin{aligned} \int _{{\mathcal {X}}} J(x){\textsf{d}}{\bar{\mu }}_{0}-\int _{{\mathcal {X}}} J(x){\textsf{d}}\mu ^*_{0}&=\int _{{\mathcal {X}}_{0}} J(x){\textsf{d}}{\bar{\mu }}_{0}-\int _{{\mathcal {X}}_{0}} J(x){\textsf{d}}\mu ^*_{0} \nonumber \\&= {\bar{J}}_{0} - \mu ^*_{0}(X_{0}) {\bar{J}}_{0} - \int _{{\mathcal {X}}_{0}{\setminus } X_{0}}J(x){\textsf{d}}\mu ^*_{0}\nonumber \\&=(1-\mu ^*_{0}(X_{0}))\cdot {\bar{J}}_{0}-\int _{{\mathcal {X}}_{0}{\setminus } X_{0}}J(x){\textsf{d}}\mu ^*_{0}\nonumber \\&=\int _{{\mathcal {X}}_{0}{\setminus } X_{0}}({\bar{J}}_{0}-J(x)){\textsf{d}}\mu ^*_{0}\nonumber \\&<\int _{{\mathcal {X}}_{0}{\setminus } X_{0}}(J(x)-J(x)){\textsf{d}}\mu ^*_{0}=0. \end{aligned}$$
(14)

Thus, \(\mu ^*_{0}(\cdot )\) is not an optimal measure, which is a contradiction. Therefore, (6) holds, which leads to \(\bar{{\mathcal {J}}}_0={\bar{J}}_0\). \(\square \)

3.2 Proof of Theorem 2

Lemma 2

Suppose that Assumption 1 holds. For any \(x\in {\mathcal {X}}\), denote a set as

$$\begin{aligned} {\mathcal {B}}_{\varepsilon }(x):=\{y\in {\mathcal {X}}:\Vert x-y\Vert \le \varepsilon \} \end{aligned}$$

where \(\varepsilon >0\) is the radius. For any \(\varepsilon >0\), we have

$$\begin{aligned} \lim _{S\rightarrow \infty } {\mathbb {P}}^S_{\textsf{uni}}\left\{ {\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \right\} =1. \end{aligned}$$
(15)

Proof

(Lemma 2) First, we show that the interior point set \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) is not empty when Assumption 1 holds. Let \({\bar{x}}\in {\mathcal {X}}\); then we have

$$\begin{aligned} g({\bar{x}})\le {\varvec{0}}^{n_g}. \end{aligned}$$
(16)

By Assumption 1, CCQ holds at \({\bar{x}}\). Thus, there exists \(d\in {\mathbb {R}}^n\) such that

$$\begin{aligned} \nabla g({\bar{x}})^\top d<{\varvec{0}}^{n_g}. \end{aligned}$$
(17)

Notice that (16) and (17) directly give

$$\begin{aligned} g({\bar{x}}) + \nabla g({\bar{x}})^\top d<{\varvec{0}}^{n_g}. \end{aligned}$$
(18)

Since \(g(\cdot )\) is continuously differentiable, there exists a small enough \({\bar{\xi }}>0\) such that \(g({\bar{x}}+\xi d)<0\) holds for any \(\xi \in (0,{\bar{\xi }})\) and thus \({\bar{x}}+\xi d\in {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). It implies that \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) is not empty.

We start by discussing \({\mathbb {P}}^S_{\textsf{uni}}\{{\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \}\) for \(x\in {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). Notice that \({\mathcal {X}}\) is compact and \({\mathcal {C}}_S\) is a set of uniform samples extracted from \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). Thus, for any \(x\in {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\), the probability that a sample \(x^{(i)}\in {\mathcal {C}}_S,\ i=1,\ldots ,S\) lies in \({\mathcal {B}}_\varepsilon (x)\) satisfies

$$\begin{aligned} {\mathbb {P}}_{\textsf{uni}}\{x^{(i)}\in {\mathcal {B}}_\varepsilon (x)\}>0. \end{aligned}$$

Then,

$$\begin{aligned} {\mathbb {P}}^S_{\textsf{uni}}\left\{ {\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \right\}&=1-{\mathbb {P}}^S_{\textsf{uni}}\left\{ {\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)=\emptyset \right\} \nonumber \\&\ge 1-\left( 1-{\mathbb {P}}_{\textsf{uni}}\{x^{(i)}\in {\mathcal {B}}_\varepsilon (x)\}\right) ^S. \end{aligned}$$
(19)

As \(S\rightarrow \infty \), the right-hand side of (19) tends to 1, which implies (15).

Then, we discuss \({\mathbb {P}}^S_{\textsf{uni}}\{{\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \}\) for \(x\in \partial {\mathcal {X}}\), where \(\partial {\mathcal {X}}\) denotes the boundary of \({\mathcal {X}}\). Let \(x\in \partial {\mathcal {X}}\) be a boundary point. Again, by Assumption 1, the CCQ holds at x. By replacing \({\bar{x}}\) in (16) and (18) with x, we have that there exists a small enough \({\bar{\xi }}>0\) such that \(g(x+\xi d)<0\) holds for any \(\xi \in (0,{\bar{\xi }})\), and thus \(x+\xi d\in {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). Let \(\varepsilon _1\in (0,{\bar{\xi }})\); then we can find \(x':=x+\xi d\in {\mathcal {B}}_{\varepsilon _1}(x)\bigcap {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) with a small enough \(\xi \). Besides, the probability that a sample \(x^{(i)}\in {\mathcal {C}}_S,\ i=1,\ldots ,S\) lies in \({\mathcal {B}}_{\varepsilon _1}(x')\) satisfies \({\mathbb {P}}_{\textsf{uni}}\{x^{(i)}\in {\mathcal {B}}_{\varepsilon _1}(x')\}>0\). Since \({\mathcal {B}}_{\varepsilon _1}(x')\subseteq {\mathcal {B}}_{2\varepsilon _1}(x)\), we have \({\mathbb {P}}_{\textsf{uni}}\{x^{(i)}\in {\mathcal {B}}_{2\varepsilon _1}(x)\}>0\). Letting \(\varepsilon _1=\varepsilon /2\), we obtain (19) for a boundary point of \({\mathcal {X}}\), which completes the proof. \(\square \)
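
As a quick numerical illustration of the bound (19), the following sketch estimates \({\mathbb {P}}^S_{\textsf{uni}}\{{\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \}\) by Monte Carlo for the hypothetical case where \({\mathcal {X}}\) is the unit box and the ball is taken in the infinity norm; the setup is ours and is not part of the proof.

```python
import numpy as np

def coverage_probability(S, x, eps, n_trials=2000, dim=2, rng=None):
    """Monte Carlo estimate of P^S_uni{ C_S intersects B_eps(x) } when X = [0, 1]^dim
    and C_S consists of S independent uniform samples."""
    rng = np.random.default_rng(0) if rng is None else rng
    hits = 0
    for _ in range(n_trials):
        C_S = rng.uniform(0.0, 1.0, size=(S, dim))
        if np.any(np.max(np.abs(C_S - x), axis=1) <= eps):  # infinity-norm ball check
            hits += 1
    return hits / n_trials

x = np.array([0.5, 0.5])
for S in (10, 100, 1000):
    print(S, coverage_probability(S, x, eps=0.05))  # approaches 1 as S grows, as in (15)
```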

With sample set \({\mathcal {C}}_{S}=\{x^{(1)},\ldots ,x^{(S)}\}\), a sample average approximate problem of \(P_{\alpha }\), defined by \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\), is written as:

$$\begin{aligned} \breve{P}_{\alpha }({\mathcal {C}}_{S}):\quad \underset{\mu \in U_S}{\textsf{min}}\ \sum _{i=1}^{S}J(x^{(i)})\mu (i)\quad \text {s.t.}\quad \sum _{i=1}^{S}\mu (i)F(x^{(i)})\ge 1-\alpha , \end{aligned}$$

where \(U_S:=\{\mu \in {\mathbb {R}}^S:\sum _{i=1}^{S}\mu (i)=1,\; \mu (i)\ge 0,\; \forall i=1,\ldots ,S\}\). Denote the feasible region of problem \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\) as

$$\begin{aligned} \breve{{\mathcal {F}}}_{\alpha }({\mathcal {C}}_S):=\left\{ \mu \in U_S:\sum _{i=1}^{S}\mu (i)F(x^{(i)})\ge 1-\alpha \right\} . \end{aligned}$$

Then, the optimal objective function value of \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\) is defined by

$$\begin{aligned} \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S):=\textsf{min}\left\{ \sum _{i=1}^SJ(x^{(i)})\mu (i):\mu \in \breve{{\mathcal {F}}}_{\alpha }({\mathcal {C}}_S)\right\} . \end{aligned}$$

The optimal solution set for \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\) is therefore defined by

$$\begin{aligned} \breve{A}_{\alpha }({\mathcal {C}}_S):=\left\{ \mu \in \breve{{\mathcal {F}}}_{\alpha }({\mathcal {C}}_S):\sum _{i=1}^SJ(x^{(i)})\mu (i)=\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\right\} . \end{aligned}$$

A measure \(\breve{\mu }_{\alpha }\in \breve{A}_{\alpha }({\mathcal {C}}_S)\) is called an optimal measure for \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\).

Theorem 4

For given sample sets \({\mathcal {C}}_{S}\) and \({\mathcal {D}}_{N}\), define two functions of \(\mu \in U_S\) as

$$\begin{aligned} \breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S}):=\sum _{i=1}^{S}\mu (i)F(x^{(i)})=\sum _{i=1}^{S}\mu (i)\int _{\varDelta } {\mathbb {I}}\{h(x^{(i)},\delta )\} p(\delta ){\textsf{d}}\delta , \end{aligned}$$

and

$$\begin{aligned} {\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N}):=\sum _{i=1}^{S}\mu (i)\frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x^{(i)},\delta ^{(j)})\}. \end{aligned}$$

Then, \({\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})\) uniformly converges to \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})\) on \(U_S\) w.p. 1, i.e.,

$$\begin{aligned} \sup _{\mu \in U_S}\left| {\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})-\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})\right| \rightarrow 0,\; \text {w.p. 1}\; \text {as}\; N\rightarrow \infty . \end{aligned}$$

Proof

(Theorem 4) For any given \(x^{(i)}\), \({\mathbb {I}}\{h(x^{(i)},\delta )\}\) is a measurable function of \(\delta \). According to the strong Law of Large Numbers (LLN) [3], we have

$$\begin{aligned} \frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x^{(i)},\delta ^{(j)})\}-{\mathbb {E}}\{{\mathbb {I}}\{h(x^{(i)},\delta )\}\}\rightarrow 0,\; \text {w.p. 1}\; \text {as}\; N\rightarrow \infty , \end{aligned}$$

where

$$\begin{aligned} {\mathbb {E}}\{{\mathbb {I}}\{h(x^{(i)},\delta )\}\}=\int _{\varDelta } {\mathbb {I}}\{h(x^{(i)},\delta )\} p(\delta ){\textsf{d}}\delta . \end{aligned}$$

Thus, for every \(\mu \in U_S\), we have

$$\begin{aligned} {\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})-\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})&=\sum _{i=1}^S \mu (i)\left( \frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x^{(i)},\delta ^{(j)})\}-{\mathbb {E}}\{{\mathbb {I}}\{h(x^{(i)},\delta )\}\}\right) \\&\rightarrow \sum _{i=1}^S \mu (i)\times 0=0,\; \text {w.p. 1}\; \text {as}\; N\rightarrow \infty . \end{aligned}$$

Since the difference is a linear function of \(\mu \) whose \(S\) coefficients each converge to zero w.p. 1, and \(U_S\) is compact, the convergence is uniform on \(U_S\). \(\square \)

Next, we show that \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) and \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) converge to \(\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) and \(\breve{A}_{\alpha }({\mathcal {C}}_S)\), respectively, with probability 1 as \(N\rightarrow \infty \).

Theorem 5

Consider Problem \(P_\alpha \) with \(\alpha >0\). Assume that there exists an \(x^{(i)}\in {\mathcal {C}}_S\) that satisfies \(F(x^{(i)})>1-\alpha \). As \(N\rightarrow \infty \), \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\rightarrow \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) and \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\rightarrow \breve{A}_{\alpha }({\mathcal {C}}_S)\) w.p. 1.

Proof

(Theorem 5) The set \(U_S\) is a compact set. The objective function \(\sum _{i=1}^SJ(x^{(i)})\mu (i)\) is a linear function of \(\mu \in U_S\). Besides, \(F(x^{(i)})\) is a constant value within [0, 1] for a fixed \(x^{(i)}\), which makes the constraint function \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})\) a linear function of \(\mu \in U_S\). Therefore, \(\breve{P}_{\alpha }({\mathcal {C}}_S)\) is a linear program. Due to the assumption that there exists \(x^{(i)}\in {\mathcal {C}}_{S}\) such that \(F(x^{(i)})> 1-\alpha \), there is \(\mu \in U_S\) such that \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})>1-\alpha \), and thus \(\breve{A}_{\alpha }({\mathcal {C}}_S)\) is nonempty. Since \({\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})\) converges to \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})\) w.p. 1 by Theorem 4, there exists \(N_0\) large enough such that \({\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})\ge 1-\alpha \) for all \(N\ge N_0\) w.p. 1. Because \({\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})\) is a linear function of \(\mu \) and \(U_S\) is compact, the feasible set of \({\tilde{P}}_{\alpha }({\mathcal {C}}_{S},{\mathcal {D}}_{N})\) is compact as well, and hence \({\tilde{A}}_{\alpha }({\mathcal {C}}_{S},{\mathcal {D}}_{N})\) is nonempty w.p. 1 for all \(N\ge N_0\).

Let \(\{N_k\}^{\infty }_{k=1}\) be a sequence such that \(N_k\rightarrow \infty \) and \(N_k\ge N_0\) for every \(k=1,\ldots \). Let \({\tilde{\mu }}_k\in {\tilde{A}}_{\alpha }({\mathcal {C}}_{S},{\mathcal {D}}_{N_k})\), so that \({\tilde{G}}_{\alpha }({\tilde{\mu }}_k,{\mathcal {C}}_S,{\mathcal {D}}_{N_k})\ge 1-\alpha \) and \(\sum _{i=1}^SJ(x^{(i)}){\tilde{\mu }}_k(i)=\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N_k})\). Let \({\tilde{\mu }}\) be any cluster point of \(\{{\tilde{\mu }}_k\}_{k=1}^{\infty }\) and let \(\{{\tilde{\mu }}_{t}\}_{t=1}^{\infty }\) be a subsequence converging to \({\tilde{\mu }}\). By Theorem 4, we have

$$\begin{aligned} \breve{G}_{\alpha }({\tilde{\mu }},{\mathcal {C}}_S)=\lim _{t\rightarrow \infty }{\tilde{G}}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S,{\mathcal {D}}_{N_t}),\; \text {w.p. 1}. \end{aligned}$$

Therefore, \(\breve{G}_{\alpha }({\tilde{\mu }},{\mathcal {C}}_S)\ge 1-\alpha \) and \({\tilde{\mu }}\) is feasible for problem \(\breve{P}_{\alpha }({\mathcal {C}}_S)\) which implies \(\sum _{i=1}^SJ(x^{(i)}){\tilde{\mu }}(i)\ge \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\). Note that \({\tilde{\mu }}_t\rightarrow {\tilde{\mu }}\) w.p. 1, which implies that

$$\begin{aligned} \lim _{t\rightarrow \infty } \tilde{{\mathcal {J}}}_\alpha ({\mathcal {C}}_S,{\mathcal {D}}_{N_t})=\lim _{t\rightarrow \infty } \sum _{i=1}^SJ(x^{(i)}){\tilde{\mu }}_t(i)=\sum _{i=1}^SJ(x^{(i)}){\tilde{\mu }}(i)\ge \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S),\; \text {w.p. 1}. \end{aligned}$$

Since this is true for an arbitrary cluster point of \(\{{\tilde{\mu }}_k\}_{k=1}^{\infty }\) in the compact set \(U_S\), we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \tilde{{\mathcal {J}}}_\alpha ({\mathcal {C}}_S,{\mathcal {D}}_{N_k})=\lim _{k\rightarrow \infty } \sum _{i=1}^SJ(x^{(i)}){\tilde{\mu }}_k(i)\ge \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S),\; \text {w.p. 1}. \end{aligned}$$
(20)

Besides, we know that there exists a globally optimal solution \(\mu ^*\) of \(\breve{P}_{\alpha }({\mathcal {C}}_S)\) such that for any \(\varepsilon >0\) there is \(\mu \in U_S\) such that \(0<\Vert \mu -\mu ^*\Vert \le \varepsilon \) and \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_S)>1-\alpha \). Namely, there exists a sequence \(\{{\tilde{\mu }}_t\}_{t=1}^{\infty }\subseteq U_S\) that converges to an optimal solution \(\mu ^*\) such that \(\breve{G}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S)>1-\alpha \) for all \(t\in {\mathbb {N}}\). Notice that \({\tilde{G}}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S,{\mathcal {D}}_{N_k})\) converges to \(\breve{G}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S)\) w.p. 1. Then, for any fixed t, \(\exists K(t)\) such that \({\tilde{G}}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S,{\mathcal {D}}_{N_k})\ge 1-\alpha \) for every \(k\ge K(t)\) w.p. 1. We can assume \(K(t)<K(t+1)\) for every t and define the sequence \(\{{\hat{\mu }}_k\}_{k=K(1)}^{\infty }\) by setting \({\hat{\mu }}_k={\tilde{\mu }}_t\) for all k and t with \(K(t)\le k<K(t+1)\). Then, \({\tilde{G}}_{\alpha }({\hat{\mu }}_k,{\mathcal {C}}_S,{\mathcal {D}}_{N_k})\ge 1-\alpha \), which implies \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N_k})\le \sum _{i=1}^SJ(x^{(i)}){\hat{\mu }}_k(i)\) for all \(k\ge K(1)\). Thus, we have

$$\begin{aligned} \lim _{k\rightarrow \infty } \tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N_k})\le \sum _{i=1}^SJ(x^{(i)})\mu ^*(i)=\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S),\; \text {w.p. 1}. \end{aligned}$$
(21)

With (20) and (21), we conclude that \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N})\rightarrow \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) w.p. 1 as \(N\rightarrow \infty \).

For the proof of \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\rightarrow \breve{A}_{\alpha }({\mathcal {C}}_S)\), we refer to Theorem 5.3 of [27]. \(\square \)

Next, we show that \(\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) converges to \(\bar{{\mathcal {J}}}_{\alpha }\) with probability 1 as S increases.

Theorem 6

Suppose Assumptions 2 and 4 hold. As \(S\rightarrow \infty \), with probability 1, we have

$$\begin{aligned} \liminf _{S\rightarrow \infty } \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)=\bar{{\mathcal {J}}}_{\alpha }. \end{aligned}$$
(22)

Proof

(Theorem 6) The outline of the proof of Theorem 6 is summarized as follows:

  A.

    Prove that the limit inferior of \(\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) is no smaller than \(\bar{{\mathcal {J}}}_{\alpha }\), which gives (23);

  B.

    Prove that the limit superior of \(\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) is no larger than \(\bar{{\mathcal {J}}}_{\alpha }\), which gives (38);

    B1.

      Find a sequence \(\{\mu _k\}_{k=1}^{\infty }\) that converges weakly to an optimal solution \(\mu ^*\) of \(P_\alpha \);

    B2.

      Show that \(\int _{{\mathcal {X}}}F(x){\textsf{d}}\mu _k(x)\) and \(\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)\) can be approximated by a discrete probability measure on \({\mathcal {C}}_S\), as in (34) and (35);

    B3.

      Show that the optimal discrete probability measure on \({\mathcal {C}}_S\) for \(\breve{P}_\alpha ({\mathcal {C}}_S)\) has an objective value no larger than that of the discrete probability measure approximating any \(\mu _k\) in B2. Then, we obtain (38).

Then, we give the details of the proof.

For any discrete probability measure \(\mu ^S\in \breve{{\mathcal {F}}}_{\alpha }({\mathcal {C}}_S)\), we have

$$\begin{aligned} \int _{{\mathcal {X}}}F(x) {\textsf{d}}\mu ^S(x)=\sum _{i=1}^{S}\mu ^S(\{x^{(i)}\})F(x^{(i)})\ge 1-\alpha . \end{aligned}$$

Thus, \(\mu ^S\in M_{\alpha }({\mathcal {X}})\). Then, it holds that

$$\begin{aligned} \sum _{i=1}^S J(x^{(i)})\mu ^S(\{x^{(i)}\})=\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu ^S(x)\ge \bar{{\mathcal {J}}}_{\alpha },\quad \forall \mu ^S\in \breve{{\mathcal {F}}}_{\alpha }({\mathcal {C}}_S). \end{aligned}$$

Furthermore, with probability 1, we have

$$\begin{aligned} \liminf _{S\rightarrow \infty } \breve{{\mathcal {J}}}_\alpha ({\mathcal {C}}_S)\ge \bar{{\mathcal {J}}}_{\alpha }. \end{aligned}$$
(23)

Assumption 4 implies that there exists a sequence \(\{\mu _k\}_{k=1}^{\infty }\subseteq M({\mathcal {X}})\) that converges weakly to an optimal solution \(\mu ^*\) such that

$$\begin{aligned} \int _{{\mathcal {X}}}F(x) {\textsf{d}}\mu _k(x)> 1-\alpha \end{aligned}$$
(24)

for all \(k\in {\mathbb {N}}\). Since \(\{\mu _k\}_{k=1}^{\infty }\) converges weakly to \(\mu ^*\), we have

$$\begin{aligned} \lim _{k\rightarrow \infty }\left| \int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)-\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu ^*(x)\right| =\lim _{k\rightarrow \infty }{\mathcal {W}}(\mu _k,\mu ^*)=0. \end{aligned}$$
(25)

Notice that \(\bar{{\mathcal {J}}}_\alpha =\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu ^*(x)\) by (3).

For any given \(\varepsilon _J>0\), there exists \(K(\varepsilon _J)\) such that, if \(k\ge K(\varepsilon _J)\), then

$$\begin{aligned} \int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)-\bar{{\mathcal {J}}}_{\alpha }\le \varepsilon _J. \end{aligned}$$

Let \(\tilde{{\mathcal {C}}}_{{\tilde{S}}}^k:=\{{\tilde{x}}^{(1)}_k,\ldots ,{\tilde{x}}^{({\tilde{S}})}_k\}\) be a sample set obtained by sampling from \({\mathcal {X}}\) according to the probability measure \(\mu _k\). By the Law of Large Numbers (p. 457 of [27]), for any \(f\in {\mathscr {C}}({\mathcal {X}},{\mathbb {R}})\), as \({\tilde{S}}\rightarrow \infty \), with probability 1, we have

$$\begin{aligned} \frac{1}{{\tilde{S}}}\sum _{i=1}^{{\tilde{S}}} f({\tilde{x}}^{(i)}_k)\rightarrow {\mathbb {E}}_{x\sim \mu _k}\left\{ f(x)\right\} =\int _{{\mathcal {X}}}f(x){\textsf{d}}\mu _k(x). \end{aligned}$$
(26)

Since \(J(\cdot )\) and \(F(\cdot )\) are also elements of \({\mathscr {C}}({\mathcal {X}},{\mathbb {R}})\), (26) also holds with \(f(\cdot )\) replaced by either \(J(\cdot )\) or \(F(\cdot )\). Namely, for any \({\tilde{\varepsilon }}_1>0\), there exists \({\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) such that, if \({\tilde{S}}\ge {\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\), with probability 1, the following hold:

$$\begin{aligned} \left| \frac{1}{{\tilde{S}}}\sum _{i=1}^{{\tilde{S}}} F({\tilde{x}}^{(i)}_k)-\int _{{\mathcal {X}}}F(x){\textsf{d}}\mu _k(x)\right|\le & {} {\tilde{\varepsilon }}_1, \end{aligned}$$
(27)
$$\begin{aligned} \left| \frac{1}{{\tilde{S}}}\sum _{i=1}^{{\tilde{S}}} J({\tilde{x}}^{(i)}_k)-\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)\right|\le & {} {\tilde{\varepsilon }}_1. \end{aligned}$$
(28)

On the other hand, according to Lemma 2, as \(S\rightarrow \infty \), for any \({\tilde{s}}\in \{1,\ldots ,{\tilde{S}}\}\) and \({\tilde{\varepsilon }}_r>0\), with probability 1, there exists a sample \(x^{(i_{{\tilde{s}}})}\in {\mathcal {C}}_S:=\{x^{(1)},\ldots ,x^{(S)}\}\) such that

$$\begin{aligned} x^{(i_{{\tilde{s}}})}\in {\mathcal {B}}_{{\tilde{\varepsilon }}_r}({\tilde{x}}^{({\tilde{s}})}_{k}). \end{aligned}$$
(29)

With a slight abuse of notation, let \(x^{(i_{{\tilde{s}}})}\) be the closest sample to \({\tilde{x}}^{({\tilde{s}})}_k\), namely, \(x^{(i_{{\tilde{s}}})}\in \arg \min \{\Vert x^{(i)}-{\tilde{x}}^{({\tilde{s}})}_k\Vert :x^{(i)}\in {\mathcal {C}}_S\}\). Define \(I_{{\tilde{S}}}:=\{i_1,\ldots ,i_{{\tilde{S}}}\}\) as the set of indices corresponding to the points \(x^{(i_{{\tilde{s}}})}\). Without loss of generality, we assume that \(x^{(i_{{\tilde{s}}})}\ne x^{(j_{{\tilde{s}}})}\) if \(i_{{\tilde{s}}}\ne j_{{\tilde{s}}},\ i_{{\tilde{s}}},j_{{\tilde{s}}}\in I_{{\tilde{S}}}\). The relationship between \({\mathcal {C}}_S\) and \(\tilde{{\mathcal {C}}}^k_{{\tilde{S}}}\) is illustrated in Fig. 1.

Define a discrete probability measure \(\mu ^{S}_k\in {\mathbb {R}}^S\) such that

$$\begin{aligned} \mu ^S_k(i)= & {} \frac{1}{{\tilde{S}}},\quad \forall i\in I_{{\tilde{S}}}, \end{aligned}$$
(30)
$$\begin{aligned} \mu ^S_k(i)= & {} 0,\quad \forall i\notin I_{{\tilde{S}}}. \end{aligned}$$
(31)

For any given positive integer \({\tilde{S}}\) and positive number \({\tilde{\varepsilon }}_2\), due to the continuity of \(J(\cdot )\) and \(F(\cdot )\), there exists \(S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\) such that, if \(S>S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\), with probability 1, the following hold:

$$\begin{aligned}{} & {} \left| \sum _{i=1}^{S} \mu ^S_k(i)F(x^{(i)})-\frac{1}{{\tilde{S}}}\sum _{i=1}^{{\tilde{S}}} F({\tilde{x}}^{(i)}_k)\right| \le {\tilde{\varepsilon }}_2, \end{aligned}$$
(32)
$$\begin{aligned}{} & {} \left| \sum _{i=1}^{S} \mu ^S_k(i)J(x^{(i)})-\frac{1}{{\tilde{S}}}\sum _{i=1}^{{\tilde{S}}} J({\tilde{x}}^{(i)}_k)\right| \le {\tilde{\varepsilon }}_2. \end{aligned}$$
(33)

By combining (27) with (32) and (28) with (33), for given \({\tilde{\varepsilon }}_1,{\tilde{\varepsilon }}_2>0\), there exist \({\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) and \(S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\) such that, if \({\tilde{S}}>{\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) and \(S>S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\), with probability 1, the following hold:

$$\begin{aligned}{} & {} \left| \sum _{i=1}^{S} \mu ^S_k(i)F(x^{(i)})-\int _{{\mathcal {X}}}F(x){\textsf{d}}\mu _k(x)\right| \le {\tilde{\varepsilon }}_1+{\tilde{\varepsilon }}_2, \end{aligned}$$
(34)
$$\begin{aligned}{} & {} \left| \sum _{i=1}^{S} \mu ^S_k(i)J(x^{(i)})-\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)\right| \le {\tilde{\varepsilon }}_1+{\tilde{\varepsilon }}_2. \end{aligned}$$
(35)
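For clarity, (34) is obtained from (27) and (32) via the triangle inequality (and (35) from (28) and (33) in the same way):

$$\begin{aligned} \left| \sum _{i=1}^{S} \mu ^S_k(i)F(x^{(i)})-\int _{{\mathcal {X}}}F(x){\textsf{d}}\mu _k(x)\right| \le \left| \sum _{i=1}^{S} \mu ^S_k(i)F(x^{(i)})-\frac{1}{{\tilde{S}}}\sum _{i=1}^{{\tilde{S}}} F({\tilde{x}}^{(i)}_k)\right| +\left| \frac{1}{{\tilde{S}}}\sum _{i=1}^{{\tilde{S}}} F({\tilde{x}}^{(i)}_k)-\int _{{\mathcal {X}}}F(x){\textsf{d}}\mu _k(x)\right| \le {\tilde{\varepsilon }}_2+{\tilde{\varepsilon }}_1. \end{aligned}$$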

According to (24) and (34), we can find \({\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) and \(S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\) such that, if \({\tilde{S}}>{\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) and \(S>S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\), with probability 1, the following holds

$$\begin{aligned} \sum _{i=1}^{S} \mu ^S_k(i)F(x^{(i)})\ge 1-\alpha . \end{aligned}$$
(36)

Hence, \(\mu ^S_k\) is a feasible solution of Problem \({\tilde{P}}_\alpha ({\mathcal {C}}_S)\), and therefore

$$\begin{aligned} \sum _{i=1}^{S} \mu ^S_k(i)J(x^{(i)})\ge \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S). \end{aligned}$$
(37)

Since \(\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)\) converges to \(\bar{{\mathcal {J}}}_\alpha \) with probability 1 as \(k\rightarrow \infty \), combining (35) and (37) yields

$$\begin{aligned} \limsup _{S\rightarrow \infty } \breve{{\mathcal {J}}}_\alpha ({\mathcal {C}}_S)\le \bar{{\mathcal {J}}}_{\alpha }. \end{aligned}$$
(38)

With (23) and (38), we have (22).\(\square \)

Fig. 1 The intuitive explanation of the relationship between \({\mathcal {C}}_S\) and \(\tilde{{\mathcal {C}}}^k_{{\tilde{S}}}\)

The proof of Theorem 2 follows immediately from the results of Theorems 5 and 6 and is therefore omitted here.

3.3 Proof of Theorem 3

The main results of [37] are summarized as follows:

Lemma 3

Let \({\mathcal {X}}^+\) be a compact set and let \(p:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a probability density function on \({\mathbb {R}}^n\). If there exists a positive number \(\rho '>0\) such that \(p(x)\ge \rho '\) for all \(x\in {\mathcal {X}}^+\), then there exists \(p_{\theta }(x)\) defined by (11) such that

$$\begin{aligned} \lim _{L\rightarrow \infty } \int _{{\mathcal {X}}^+}\left( p(x)-p_{\theta }(x)\right) ^{2}{\textsf{d}}x =0, \end{aligned}$$

where the positive integer L is the number of Gaussian kernels in (11).
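The following sketch only illustrates the flavour of Lemma 3; it is not the construction used in [37]. It fits an L-component Gaussian mixture by expectation-maximization (scikit-learn's GaussianMixture, an assumption of this sketch rather than the parametrization (11)) to a stand-in density on \([0,1]\) that is bounded below by \(\rho '=0.5\), and numerically estimates the \(L^2\) error on a grid.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# stand-in target density on X^+ = [0, 1]: 0.5*Uniform(0,1) + 0.5*Beta(2,2),
# which satisfies p(x) >= rho' = 0.5 on X^+
grid = np.linspace(0.0, 1.0, 2001)
p = 0.5 * 1.0 + 0.5 * 6.0 * grid * (1.0 - grid)

# draw samples from the target mixture for fitting
n = 20000
from_uniform = rng.random(n) < 0.5
samples = np.where(from_uniform, rng.random(n), rng.beta(2.0, 2.0, n)).reshape(-1, 1)

for L in (1, 2, 4, 8, 16):
    gmm = GaussianMixture(n_components=L, random_state=0).fit(samples)
    p_theta = np.exp(gmm.score_samples(grid.reshape(-1, 1)))   # fitted density p_theta(x)
    l2_err = np.trapz((p - p_theta) ** 2, grid)                # approximate L2 error on X^+
    print(f"L = {L:2d}:  int (p - p_theta)^2 dx ~ {l2_err:.4f}")
```

The EM fit here is only a convenient surrogate: Lemma 3 asserts the existence of a suitable \(p_{\theta }\), not that it is obtained by maximum likelihood, so the printed errors merely illustrate the trend.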

Proof

(Theorem 3) For given \({\mathcal {C}}_S\), \({\mathcal {D}}_{N}\), and L, we have problems \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N})\) and \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\). Let \({\mathcal {X}}_{p,i}\), \(i=1,\ldots ,S\), be a partition of \({\mathcal {X}}\) satisfying the following conditions:

(a) \(x^{(i)}\in {\mathcal {X}}_{p,i}\);

(b) \(\bigcup _{i=1}^{S} {\mathcal {X}}_{p,i}={\mathcal {X}}\);

(c) \({\mathcal {X}}_{p,i}\bigcap {\mathcal {X}}_{p,i'}=\emptyset \) with probability 1 if \(i\ne i'\).

For any \(\mu ^{S}\in U\), we can define a corresponding Dirac measure on \({\mathcal {X}}\) as

$$\begin{aligned} \mu ^{S}_{{\textsf{d}}}(x)=\mu ^{S}(x^{(i)})\quad \text {if}\quad x\in {\mathcal {X}}_{p,i}. \end{aligned}$$

Define the index set \(I^+=\{i:\mu ^{S}(x^{(i)})>0\}\). Then, we can define a compact set

$$\begin{aligned} {\mathcal {X}}^+=\bigcup _{i\in I^+} {\mathcal {X}}_{p,i}. \end{aligned}$$

According to Lemma 3, there exists a sequence \(\left\{ p_{\theta }(x)\right\} _L\) such that

$$\begin{aligned} \lim _{L\rightarrow \infty } \int _{{\mathcal {X}}^+}\left( \mu ^{S}_{{\textsf{d}}}(x)-p_{\theta }(x)\right) ^{2}{\textsf{d}}x =0. \end{aligned}$$

Thus, we have

$$\begin{aligned} \lim _{L\rightarrow \infty }\int _{{\mathcal {X}}}J(x) {\textsf{d}}p_{\theta }(x)=\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu ^{S}_{{\textsf{d}}}(x) \end{aligned}$$

and

$$\begin{aligned} \lim _{L\rightarrow \infty }\int _{{\mathcal {X}}}\frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x,\delta ^{(j)})\}p_{\theta }(x){\textsf{d}}x=\int _{{\mathcal {X}}}\frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x,\delta ^{(j)})\}\mu ^{S}_{{\textsf{d}}}(x){\textsf{d}}x. \end{aligned}$$

For any S and N, by applying Lemma 3, we can find a sequence \(\left\{ p^*_{\theta }(x)\right\} _L\) such that

$$\begin{aligned} \lim _{L\rightarrow \infty } \int _{{\mathcal {X}}}J(x) {\textsf{d}}p_{\theta }^{*}(x)=\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N) \end{aligned}$$
(39)

and

$$\begin{aligned} \lim _{L\rightarrow \infty }\int _{{\mathcal {X}}}\sum _{j=1}^{N}\frac{1}{N}{\mathbb {I}}\{h(x,\delta ^{(j)})\}p_{\theta }^{*}(x){\textsf{d}}x=\sum _{i=1}^{S}\mu ^{S}(x^{(i)})\sum _{j=1}^{N}\frac{1}{N}{\mathbb {I}}\{h(x^{(i)},\delta ^{(j)})\}\ge 1-\alpha . \end{aligned}$$
(40)

Hence, \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\) converges to \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) as \(L\rightarrow \infty \), and Theorem 3 then follows from Theorem 2. One point should be clarified here: in Theorem 2, the convergence holds as \(S\rightarrow \infty \), whereas in Theorem 3 we use \(L\rightarrow \infty \) instead, since (39) and (40) hold for any S increasing to infinity.\(\square \)

4 Numerical Examples

This section provides the results of two numerical examples to validate the proposed methods. All computations were implemented in MATLAB R2021b and executed on Windows 10 with 32 GB RAM and an Intel(R) Core(TM) i7-1065G7 CPU running at 1.30 GHz. We compare the performance of the following methods:

1. Dirac-Delta: solving the sample average approximate problem \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) of \(Q_\alpha \);

2. Sample: solving the sample-based approximate problem \({\tilde{P}}_\alpha ({\mathcal {C}}_S,{\mathcal {D}}_N)\) of \(P_\alpha \);

3. GMM: solving the GMM-based approximate problem \({\hat{P}}_\alpha (L,{\mathcal {D}}_N)\).

We use the terminology Dirac-Delta for the method of solving the sample average approximate problem \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) of \(Q_\alpha \) because it is equivalent to constraining the measure to be a Dirac delta, namely, a measure concentrated on a single fixed solution.

Fig. 2 Results of numerical example 1: a profile of J(x) and the optimal solution obtained by Dirac-Delta; b optimal measure obtained by Sample; c optimal probability density function obtained by GMM

4.1 One-Dimensional Example

The first numerical example is an extremely simple one used to demonstrate the concepts of Theorems 1, 2, and 3. The compact set \({\mathcal {X}}\) is defined by \({\mathcal {X}}=\{x\in {\mathbb {R}}:x\in [-1,1]\}\). Moreover, the cost function J(x) is

$$\begin{aligned} J(x)=-(x+0.6)^2+2. \end{aligned}$$
(41)

The constraint function \(h(x,\delta )\) is

$$\begin{aligned} h(x,\delta )=x^2+\delta -2 \end{aligned}$$
(42)

where \(\delta \sim {\mathcal {N}}(m_{\delta },\varSigma _{\delta })\) with \(m_{\delta }=0\) and \(\varSigma _{\delta }=1\). The probability level \(\alpha \) is 0.05. The optimal solution obtained by Dirac-Delta is \(x^*_{\alpha }=0.595\) with optimal objective value 0.572, plotted in Fig. 2a; here we set \(\epsilon =\alpha \), \(N=2000\), and \(\gamma =0.01\). Figure 2b, c shows the discrete measure obtained by Sample and the probability density function obtained by GMM, respectively. For Sample, we choose the samples \(-1, -0.98, -0.96,\ldots , 0.96, 0.98,1\) from \({\mathcal {X}}\) (\(S=201\)) and 2000 randomly extracted samples from \(\varDelta \) (\(N=2000\)). For GMM, we randomly extracted 2000 samples from \(\varDelta \) and chose \(L=6\). The solutions of Sample and GMM satisfy the chance constraint. For the objective function, Sample achieves 0.5601 and GMM achieves 0.5615, both better than the optimal objective value achieved by Dirac-Delta.
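As an illustration of how the Sample method can be solved for this instance, the sketch below builds the empirical satisfaction probabilities from the \(\delta \) samples and solves the resulting linear program over the weights of the discrete measure with SciPy. The equally spaced grid, the random seed, and the minimization form of the objective are assumptions of this sketch chosen to be consistent with the values reported above.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
alpha, S, N = 0.05, 201, 2000

x = np.linspace(-1.0, 1.0, S)                 # decision-variable samples C_S on X = [-1, 1]
delta = rng.normal(0.0, 1.0, size=N)          # uncertainty samples D_N, delta ~ N(0, 1)

J = -(x + 0.6) ** 2 + 2                       # cost (41)
# empirical satisfaction probability F_hat(x_i) = (1/N) sum_j I{x_i^2 + delta_j - 2 <= 0}
F_hat = np.mean(x[:, None] ** 2 + delta[None, :] - 2 <= 0, axis=1)

# LP over weights mu_i >= 0:
#   minimize sum_i mu_i J(x_i)   s.t.   sum_i mu_i F_hat(x_i) >= 1 - alpha,  sum_i mu_i = 1
res = linprog(c=J,
              A_ub=-F_hat[None, :], b_ub=[-(1.0 - alpha)],
              A_eq=np.ones((1, S)), b_eq=[1.0],
              bounds=(0, None), method="highs")
mu = res.x
print("expected cost  :", float(J @ mu))
print("measure support:", x[mu > 1e-8])
```

Since the LP has only two constraints besides nonnegativity, its basic optimal solutions are supported on at most two grid points, which is why the optimal measure is concentrated on a few samples.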

Table 1 Statistics of CPU time for the one-dimensional example

A more comprehensive analysis of CPU time and sample numbers is summarized in Table 1. For each method, the CPU time increases as the sample size increases. Unsurprisingly, Sample has a very fast computation time since it only needs to solve a linear program. Because this example is one-dimensional, only a few samples are required to obtain good decision-variable samples in Sample or to approximate the probability integration in GMM; acceptable accuracy is achieved with only 50 samples. However, if the dimension of x increases, the “Curse of Dimensionality” will emerge, as shown in the second example.

4.2 Quadrotor System Control

The second example considers a quadrotor system control problem in turbulent conditions. The control problem is expressed as follows:

figure g

where A, B(m), \(d(x_t,\varphi )\) are written by

$$\begin{aligned} A= \begin{bmatrix} 1 &{} \varDelta t &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} \varDelta t \\ 0 &{} 0 &{} 0 &{} 1 \end{bmatrix},\; B(m)=\frac{1}{m} \begin{bmatrix} \frac{\varDelta t^2}{2} &{} 0 \\ \varDelta t &{} 0 \\ 0 &{} \frac{\varDelta t^2}{2} \\ 0 &{} \varDelta t \end{bmatrix},\; d(x_t,\varphi )=-\varphi \begin{bmatrix} \frac{\varDelta t^2|v_x|v_x}{2} \\ \varDelta t|v_x|v_x \\ \frac{\varDelta t^2|v_y|v_y}{2} \\ \varDelta t|v_y|v_y \end{bmatrix}, \end{aligned}$$

and \(\varDelta t\) is the sampling time. The state of the system is denoted by \(x_t=[p_{x,t},v_{x,t},p_{y,t},v_{y,t}]\in {\mathbb {R}}^4\), and the control input is \(u_t=[u_{x,t},u_{y,t}]\) constrained to \({\mathcal {U}}:=\{u_t\in {\mathbb {R}}^2:-10\le u_{x,t}\le 10,\,-10\le u_{y,t}\le 10\}\). The state and control trajectories are denoted by \(x=(x_t)_{t=1}^{T}\) and \(u=(u_t)_{t=1}^{T-1}\). The system starts from the initial point \(x_0=[-0.5,0,-0.5,0]\) and is expected to reach the destination set \({\mathcal {X}}_{\text {goal}}=\{x\in {\mathbb {R}}^4:\Vert (p_x-10,p_y-10)\Vert \le 2\}\) at time \(T=10\) while avoiding the two polytopic obstacles \({\mathcal {O}}\) shown in Fig. 3. \({\mathcal {O}}\) is defined by the following constraints:

$$\begin{aligned}{} & {} p_{x,t}\le 6.35,\; p_{y,t}\ge 3.35,\; p_{x,t}-0.2-p_{y,t}\ge 0,\\{} & {} p_{x,t}\ge 3.35,\; p_{y,t}\le 6.35,\; p_{x,t}+0.2-p_{y,t}\le 0. \end{aligned}$$

The dynamics are parametrized by the uncertain parameter vector \(\delta _t=[m,\varphi ]^\top \), where \(m>0\) represents the system's mass and \(\varphi >0\) is an uncertain drag coefficient. The components of \(\delta \) are uncorrelated random variables such that \((m-0.75)/0.5\sim \text {Beta}(2,2)\) and \((\varphi -0.4)/0.2\sim \text {Beta}(2,5)\), where \(\text {Beta}(a,b)\) denotes a Beta distribution with shape parameters (a, b). Moreover, \(\omega _t\in {\mathbb {R}}^4\) is the uncertain disturbance at time step t, which obeys a multivariate normal distribution with zero mean and covariance matrix

$$\begin{aligned} \varSigma = \begin{bmatrix} 0.01 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0.75 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0.01 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0.75 \end{bmatrix}. \end{aligned}$$

For the cost function, we adopt

$$\begin{aligned} \ell ^x(x)= & {} \frac{1}{T}\sum _{t=0}^{T-1}\left( (p_{x,t+1}-p_{x,t})^2+(p_{y,t+1}-p_{y,t})^2\right) ,\\ \ell ^u(u)= & {} \frac{0.1}{T}\sum _{t=0}^{T-1}\left( u_{1,t}^2+u_{2,t}^2\right) . \end{aligned}$$
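To make the Monte Carlo evaluation below concrete, the following sketch simulates one open-loop rollout of the assumed dynamics \(x_{t+1}=Ax_t+B(m)u_t+d(x_t,\varphi )+\omega _t\) and checks goal attainment and obstacle avoidance. The sampling time \(\varDelta t=1\), the placeholder input sequence, and the interpretation of each obstacle as the set where all three listed inequalities hold simultaneously are assumptions of this sketch.

```python
import numpy as np

dt, T = 1.0, 10                                   # sampling time (assumed) and horizon
Sigma_w = np.diag([0.01, 0.75, 0.01, 0.75])       # disturbance covariance

def rollout(u, rng):
    """One MC rollout of x_{t+1} = A x_t + B(m) u_t + d(x_t, phi) + w_t (assumed form)."""
    m = 0.75 + 0.5 * rng.beta(2.0, 2.0)           # (m - 0.75)/0.5 ~ Beta(2, 2)
    phi = 0.4 + 0.2 * rng.beta(2.0, 5.0)          # (phi - 0.4)/0.2 ~ Beta(2, 5)
    A = np.array([[1, dt, 0, 0], [0, 1, 0, 0], [0, 0, 1, dt], [0, 0, 0, 1]], dtype=float)
    B = (1.0 / m) * np.array([[dt**2 / 2, 0], [dt, 0], [0, dt**2 / 2], [0, dt]])
    x = np.array([-0.5, 0.0, -0.5, 0.0])
    traj = [x]
    for t in range(T):
        vx, vy = x[1], x[3]
        d = -phi * np.array([dt**2 * abs(vx) * vx / 2, dt * abs(vx) * vx,
                             dt**2 * abs(vy) * vy / 2, dt * abs(vy) * vy])
        x = A @ x + B @ u[t] + d + rng.multivariate_normal(np.zeros(4), Sigma_w)
        traj.append(x)
    return np.array(traj)

def violated(traj):
    """True if the rollout misses the goal at t = T or enters either obstacle."""
    px, py = traj[:, 0], traj[:, 2]
    reached = np.hypot(px[-1] - 10.0, py[-1] - 10.0) <= 2.0
    in_obs1 = (px <= 6.35) & (py >= 3.35) & (px - 0.2 - py >= 0)
    in_obs2 = (px >= 3.35) & (py <= 6.35) & (px + 0.2 - py <= 0)
    return (not reached) or bool(np.any(in_obs1[1:] | in_obs2[1:]))

# usage: estimate the violation probability of a fixed (here, placeholder) input sequence
rng = np.random.default_rng(0)
u = np.zeros((T, 2))                              # stand-in open-loop inputs
rate = np.mean([violated(rollout(u, rng)) for _ in range(5000)])
print(f"estimated violation probability: {rate:.3f}")
```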

Results for the different methods with \(\alpha \) set to 15% are shown in Fig. 3, which displays 5000 Monte Carlo (MC) simulations of the quadrotor system under the open-loop policies computed by Dirac-Delta (\(\epsilon =\alpha ,\gamma =0.01, N=2000\)), Sample (\(S=5.1\times 10^6,N=2000\)), and GMM (\(L=6,N=2000\)). Dirac-Delta gives a deterministic control policy that satisfies the desired success probability \(1-\alpha \), whereas Sample and GMM give stochastic control policies that satisfy the desired success probability \(1-\alpha \). Under the stochastic policies, the control inputs that generate trajectories passing through the riskier middle corridor between the obstacles are selected randomly. The costs obtained by Sample and GMM are reduced by 8.2% and 7.9%, respectively, compared with Dirac-Delta. This shows that our approach can compute a policy that solves the problem at a lower cost than a deterministic policy.

Fig. 3 Solutions from different methods for the tolerable failure probability threshold \(\alpha =15\%\). Blue trajectories from Monte Carlo (MC) simulations denote feasible trajectories that reach the goal set \({\mathcal {X}}_{\textrm{goal}}\) and avoid the obstacles \({\mathcal {O}}\); red trajectories violate the constraints. The MC percentage denotes the violation probability observed in the MC simulations: a Dirac-Delta (\(\text {MC}=11.6\%\)); b Sample (\(\text {MC}=12.8\%\)); c GMM (\(\text {MC}=11.2\%\))

Fig. 4 The statistics of the control performance: a reduction of cost; b required samples; c computation time

A more comprehensive comparison between the GMM-based and sample-based approximations is plotted in Fig. 4. Five cases are considered with different numbers of samples used to extract the control input. Figure 4a shows that the two algorithms reduce the optimal objective function value similarly. Figure 4b shows the number S of decision-variable samples used in each case. Comparing Fig. 4a and b, we can see that enough samples are required to ensure the performance of the approximations. As shown in Fig. 4c, the computation time increases dramatically as the sample number increases. In this comparison, we choose \(L=6\) for GMM, and the probability integration is approximated using the same samples as in Sample; the computation time of GMM is even longer than that of Sample. One way to decrease the computation time of GMM is to develop fast algorithms for the probability integration, which we leave for future work. In this example, the dimension of the decision variable is 20. If the dimension increases, the required sample number, and consequently the computation time, will increase for both Sample and GMM. We leave the issue of the “Curse of Dimensionality” for future work.

5 Conclusions

In conclusion, the chance-constrained linear program in probability measure space has been addressed using sample approximation and function approximation. We established optimization problems in finite-dimensional vector spaces as approximate problems of the chance-constrained linear program in probability measure space. By solving the approximate problems, we can obtain an approximate solution of the chance-constrained linear program in probability measure space. Numerical examples have been implemented to validate the performance of the proposed methods. Future work will focus on the following points:

  • To implement the sample approximation method \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\), samples of the decision variable are required. As the dimension of the decision variable increases, the sample number required for a good approximation also increases, bringing the issue of the “Curse of Dimensionality.” To overcome this issue, it is important to develop efficient sampling algorithms that produce “good but small” sample sets, ensuring good approximation performance while mitigating the computational burden;

  • For the Gaussian mixture model-based approximation method \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\), the remaining issue is how to approximate the probability integration with fast algorithms when the problem has a complex cost function and constraint functions in a high-dimensional space.