Abstract
A risk-aware decision-making problem can be formulated as a chance-constrained linear program in probability measure space. Such programs are intractable, and no numerical method has existed to solve them. This paper presents, for the first time, numerical methods for solving chance-constrained linear programs in probability measure space. We propose two solvable optimization problems as approximations of the original problem and prove the uniform convergence of each approximate problem. Moreover, numerical experiments validate the proposed methods.
1 Introduction
Let \({\mathcal {X}}\subset {\mathbb {R}}^n\) be a compact set with the infinity norm defined by \(\Vert x\Vert _{\infty }=\max _{i=1,\ldots ,n}|x_i|,x\in {\mathcal {X}}\). Let \(D:=\sup \{\Vert x-x'\Vert _{\infty }:x,x'\in {\mathcal {X}}\}>0\) denote the diameter of \({\mathcal {X}}\). In this paper, we assume that \({\mathcal {X}}\) can be specified as \({\mathcal {X}}=\{x\in {\mathbb {R}}^n:g(x)\le {\varvec{0}}^{n_g}\}\), where \(g:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^{n_g}\) is a continuously differentiable constraint function. We make the following assumption on g throughout the paper.
Assumption 1
Cottle Constraint Qualification (CCQ) holds at every point of \({\mathcal {X}}\). Namely, for any \(x\in {\mathcal {X}}\), there is a \(d\in {\mathbb {R}}^n\) such that
$$\begin{aligned} \nabla g_i(x)^{\top }d<0,\quad \forall i\in \{i\in \{1,\ldots ,n_g\}:g_i(x)=0\} \end{aligned}$$
holds.
Let \({\mathscr {B}}({\mathcal {X}})\) be the Borel \(\sigma \)-algebra on the metric space \({\mathcal {X}}\). This paper uses \({\mathscr {B}}(\cdot )\) to denote the Borel \(\sigma \)-algebra on a metric space. Notice that \(\left( {\mathcal {X}},{\mathscr {B}}({\mathcal {X}})\right) \) is a Borel space. Let \(\mu \) be a Borel probability measure on \({\mathscr {B}}({\mathcal {X}})\). Let \(M({\mathcal {X}})\) be the space of Borel probability measures on the metric space \({\mathcal {X}}\). Let \(\delta \) be a random vector with support \(\varDelta \subseteq {\mathbb {R}}^s\) and \({\mathbb {P}}\{\cdot \}\) be the probability measure defined on the Borel \(\sigma \)-algebra \({\mathscr {B}}(\varDelta )\) on \(\varDelta \). Let \(p(\delta )\) be the probability density function associated with \({\mathbb {P}}\{\cdot \}\). Given a scalar function \(J:{\mathcal {X}}\rightarrow {\mathbb {R}}\) and a vector-valued function \(h:{\mathcal {X}}\times \varDelta \rightarrow {\mathbb {R}}^m\), a chance-constrained linear program in probability measure space is formulated as:
$$\begin{aligned} P_{\alpha }:\quad \underset{\mu \in M({\mathcal {X}})}{\textsf{min}}\;\int _{{\mathcal {X}}}J(x)\,{\textsf{d}}\mu \quad \text {s.t.}\quad \int _{{\mathcal {X}}}F(x)\,{\textsf{d}}\mu \ge 1-\alpha , \end{aligned}$$
where \(\alpha \in (0,1)\) is a given probability level and F(x) is defined by
$$\begin{aligned} F(x):={\mathbb {P}}\{h(x,\delta )\le 0\}=\int _{\varDelta }{\mathbb {I}}\{h(x,\delta )\}p(\delta )\,{\textsf{d}}\delta . \end{aligned}$$
Here, \({\mathbb {I}}\{y\}\) denotes the indicator function, written as
$$\begin{aligned} {\mathbb {I}}\{y\}:={\left\{ \begin{array}{ll} 1,&{}\text {if }y\le {\varvec{0}}^m,\\ 0,&{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
Note that F(x) is the probability of having \(h(x,\delta )\le 0\) for given x. Throughout the paper, we assume the following conditions on J(x) and \(h(x,\delta )\).
Assumption 2
For functions J(x) and \(h(x,\delta )\), the following are assumed to hold:
a.
J(x) is continuously differentiable with respect to x;
b.
\(h(x,\delta )\) is continuously differentiable with respect to x for any \(\delta \in \varDelta \);
c.
For every \(x\in {\mathcal {X}}\), \(h(x,\delta )\) is continuous with respect to \(\delta \);
d.
The probability density function \(p(\delta )\) is continuous with respect to \(\delta \);
e.
Let \({\bar{h}}(x,\delta ):=\max _{i}h_i(x,\delta )\), \(\textsf{supp}\ p:={{\textsf{c}}}{{\textsf{l}}}\{\delta \in \varDelta :p(\delta )>0\}\) (\({{\textsf{c}}}{{\textsf{l}}}\{\cdot \}\) denotes the closure), and for each \(x\in {\mathcal {X}}\),
$$\begin{aligned} \varDelta ^{\textsf{supp}}(x):=\{\delta \in \textsf{supp}\, p:{\bar{h}}(x,\delta )=0\}. \end{aligned}$$
For each \(x\in {\mathcal {X}}\), the following is assumed to be true:
$$\begin{aligned} {\mathbb {P}}\{\varDelta ^{\textsf{supp}}(x)\}=0. \end{aligned}$$
Besides, suppose that \(h(x,\delta )\) has a continuous probability density function for every \(x\in {\mathcal {X}}\);
f.
There exists \(L>0\) such that
$$\begin{aligned} \Vert h(x,\delta )-h(x',\delta )\Vert _{\infty }\le L\Vert x-x'\Vert _{\infty },\quad \forall x,x'\in {\mathcal {X}}\;\text {and}\; \forall \delta \in \varDelta , \end{aligned}$$
and
$$\begin{aligned} |J(x)-J(x')|\le L\Vert x-x'\Vert _{\infty },\quad \forall x,x'\in {\mathcal {X}}. \end{aligned}$$
In fact, by the arguments on pp. 78–79 of [18], the continuity of F(x) follows from Assumption 2.
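Since F(x) rarely admits a closed form, it is typically estimated from samples of \(\delta \). The following sketch is a minimal Monte Carlo estimator of F(x); the constraint function h and the Gaussian distribution of \(\delta \) below are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def estimate_F(x, h, delta_samples):
    """Monte Carlo estimate of F(x) = P{h(x, delta) <= 0}: the mean of
    the indicator that every component of h(x, delta) is non-positive."""
    hits = [float(np.all(h(x, d) <= 0.0)) for d in delta_samples]
    return float(np.mean(hits))

# Illustrative choice of ours: scalar decision x, h(x, delta) = x + delta - 1,
# delta ~ N(0, 1), so F(x) = P{delta <= 1 - x}.
rng = np.random.default_rng(0)
samples = rng.standard_normal(10_000)
F_hat = estimate_F(0.0, lambda x, d: np.atleast_1d(x + d - 1.0), samples)
# True value is Phi(1), roughly 0.8413.
```

With 10,000 samples the standard error of the estimate is below 0.005, so the estimator lands close to \(\varPhi (1)\).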
Denote the feasible region of \(P_{\alpha }\) as \(M_{\alpha }({\mathcal {X}}):=\{\mu \in M({\mathcal {X}}):\int _{{\mathcal {X}}}F(x) {\textsf{d}}\mu \ge 1-\alpha \}\). The optimal objective function value of \(P_{\alpha }\) is
$$\begin{aligned} \bar{{\mathcal {J}}}_{\alpha }:=\textsf{min}\left\{ \int _{{\mathcal {X}}}J(x)\,{\textsf{d}}\mu :\mu \in M_{\alpha }({\mathcal {X}})\right\} . \end{aligned}$$
The optimal solution set of \(P_{\alpha }\) is therefore written as
$$\begin{aligned} A_{\alpha }:=\left\{ \mu \in M_{\alpha }({\mathcal {X}}):\int _{{\mathcal {X}}}J(x)\,{\textsf{d}}\mu =\bar{{\mathcal {J}}}_{\alpha }\right\} . \end{aligned}$$
\({\bar{\mu }}_{\alpha }\in A_{\alpha }\) is called an optimal measure for \(P_\alpha \).
1.1 Motivation
The motivation for addressing chance-constrained linear programs in probability measure space stems from seeking an optimal stochastic policy for optimal control problems with chance constraints, which is vital for deploying reliable autonomous systems with control algorithms that are robust to model misspecifications and external disturbances [2, 10, 28]. An optimal control problem with chance constraints aims at maximizing a reward function or minimizing a cost function under the constraint that the system state remains in a safe region with a required probability. A deterministic policy takes a fixed value in the decision domain at every time index. In contrast, a stochastic policy provides a probability measure on the decision domain at every time index. A deterministic policy can be regarded as a particular case of a stochastic policy that concentrates the probability measure on a fixed value in the decision domain. The existing techniques for optimal control problems with chance constraints do not touch the essential parts of the problem and may require application-specific assumptions. For example, [10, 17] enforce pointwise chance constraints that require each chance constraint to be satisfied independently at each time step, which leads to a more conservative solution. In general, joint chance constraints are desired, which require all chance constraints to be satisfied jointly at all times. However, the joint chance-constrained optimal control problem is challenging to tackle since the distribution of the state trajectory must be considered in full. It is possible to address the joint chance-constrained optimal control problem by using Boole's inequality [2, 24, 36] or by performing robust optimization within the bounded model parameters obtained by specifying a confidence set [19]. However, both methods are conservative.
Further investigation from the viewpoint of optimization theory is needed to enable new breakthroughs in optimal control with chance constraints.
Obtaining open-loop stochastic optimal policies under chance constraints can essentially be written as a chance-constrained linear program in probability measure space [32]. An open-loop stochastic policy depends only on the initial state. Unfortunately, to our knowledge, there is still no research on solving chance-constrained linear programs in probability measure space. Investigating such programs is vital and can give more insight into optimal control with chance constraints.
1.2 Related Works
Optimization with finitely many chance constraints in finite-dimensional vector space is generally challenging due to the nonconvexity of the feasible set and intractable reformulations [9, 27]. The existing research has two major streams: (1) assume that the constraint functions or the distribution of the random variables has some special structure, for example, linear or convex constraint functions [23], a finite sample space of random variables [21], or elliptically symmetric Gaussian-like distributions [33]; or (2) extract samples [5,6,7, 20, 25, 26, 29, 31] or use smooth functions [13] to approximate the chance constraints. Among sample-based methods, the best-known approach in the control field is the scenario approach [5,6,7,8, 28]. The scenario approach generates a deterministic optimization problem approximating the original one by extracting samples from the sample space of the random variables. The probability that the approximate solution is feasible rapidly increases to one as the sample number increases. However, the convergence of the optimality of the approximate solution is not discussed. In another sample-based method, the sample-average approach [13, 20, 26, 29], both the feasibility and the optimality of the approximate solution are established. However, neither the scenario approach nor the sample-average approach can be directly used to solve chance-constrained linear programs in probability measure space, since the convergence proofs of both approaches assume that the dimension of the decision variable is finite.
Optimization with chance/robust constraints in finite-dimensional vector space, in which the number of chance constraints is infinite, has also been intensively studied [1, 11, 34, 35]. In [34], the generalized differentiation of the probability function of infinitely many constraints is investigated, and an optimality condition with an explicit formulation of subdifferentials is given. In [35], variational tools are applied to formulate the generalized differentiation of chance/robust constraints, and a method for obtaining explicit outer estimates of the subdifferentials from data is established. An adaptive grid refinement algorithm for optimization with chance/robust constraints is developed in [1]. However, the above research on optimization with chance/robust constraints in finite-dimensional vector space proves convergence only when the dimension of the decision variable is finite.
Recently, chance constraints in infinite dimensions have attracted considerable attention. In [12, 14, 15], essential properties, such as convexity and semi-continuity, are extended to chance constraints in infinite dimensions. However, the results in [12] require the random variable to have a log-concave density to ensure semicontinuity. In [15], the continuity of the probability function serving as a chance constraint is proved under the assumption of continuous random distributions. The properties of chance constraints in infinite dimensions are crucial for constructing optimality conditions and carrying out convergence analysis for optimization with chance constraints in infinite dimensions. In [14], chance-constrained optimization of elliptic partial differential equation systems is addressed by inner–outer approximation. It is proved that the inner and outer approximations converge to the original problem and provide approximate solutions with guaranteed convergence. However, the convergence proof requires the state domain to be convex. Besides, it concerns a specific problem in partial differential equation systems.
1.3 Overview of Proposed Method and Contributions
This paper extends the sample-based approximation method to solve chance-constrained linear programs in probability measure space. We show the relationship between chance-constrained optimization in finite-dimensional vector space and the chance-constrained linear program in probability measure space. By solving a chance-constrained linear program in probability measure space, we can obtain a stochastic policy that further improves the expectation of the optimal value. We also show that the optimal objective values of the chance-constrained linear program in probability measure space and chance-constrained optimization in finite-dimensional vector space are equal if the constraints involving random variables are required to be satisfied with probability 1. Namely, in this case, by concentrating the probability measure on an optimal solution of the chance-constrained optimization in finite-dimensional vector space, we obtain an optimal measure for the chance-constrained linear program in probability measure space. Besides, a sample approximate problem and a Gaussian mixture model approximate problem of problem \(P_{\alpha }\) are proposed, solving either of which yields an approximate solution of \(P_{\alpha }\). The convergence of both approximate problems is investigated. Numerical examples validate the proposed methods.
Chance-constrained linear programs in probability measure space involve chance constraints in infinite dimensions. Our work differs from [12, 15] in that our purpose is to provide numerical methods for solving chance-constrained linear programs in probability measure space; the properties of chance constraints in infinite dimensions are essential for our convergence analysis.
The rest of this paper is organized as follows: Sect. 2 presents two approximate problems of \(P_\alpha \) and gives the main results on the convergence for each approximate problem. The proofs of the main results are presented in Sect. 3. Section 4 presents the results of two numerical examples, which show the effectiveness of our proposed methods. Section 5 concludes the whole paper.
2 Main Results
This section introduces two approximate problems of \(P_{\alpha }\). We also present the convergence for each approximate problem. The proofs are presented in Sect. 3.
2.1 Chance-Constrained Optimization in Finite Space
Chance-constrained optimization \(Q_{\alpha }\) is an optimization problem with chance constraints in a finite-dimensional vector space. The problem is written as
$$\begin{aligned} Q_{\alpha }:\quad \underset{x\in {\mathcal {X}}}{\textsf{min}}\;J(x)\quad \text {s.t.}\quad F(x)\ge 1-\alpha , \end{aligned}$$
where \(\alpha \in (0,1)\) is a given probability level.
Let \({\mathcal {X}}_{\alpha }:=\{x\in {\mathcal {X}}:F(x)\ge 1-\alpha \}\) be the feasible domain of \(Q_{\alpha }\). Denote \({\bar{J}}_{\alpha }:=\textsf{min}\{J(x):x\in {\mathcal {X}}_{\alpha }\}\) for the optimal objective value of \(Q_{\alpha }\) and \(X_{\alpha }:=\{x\in {\mathcal {X}}_{\alpha }:J(x)={\bar{J}}_{\alpha }\}\) for the optimal solution set of \(Q_{\alpha }\). We make the following assumption on \(Q_{\alpha }\) throughout the paper.
Assumption 3
There exists a globally optimal solution \({\bar{x}}\) of \(Q_{\alpha }\) such that for any \(\varepsilon >0\) there is \(x\in {\mathcal {X}}\) such that \(0<\Vert x-{\bar{x}}\Vert \le \varepsilon \) and \(F(x)> 1-\alpha \).
The existence of chance constraints gives rise to several difficulties. First, the structural properties of \(h(x,\delta )\) might not carry over to \(F(x)\ge 1-\alpha \). The feasible set \({\mathcal {X}}_\alpha \) can be equivalently obtained as
$$\begin{aligned} {\mathcal {X}}_\alpha =\bigcup _{\varDelta _s\in {\mathcal {F}}}\,\bigcap _{\delta \in \varDelta _s}{\mathcal {X}}_{\delta }, \end{aligned}$$
where \({\mathcal {X}}_{\delta }:=\{x\in {\mathcal {X}}:h(x,\delta )\le 0\}\) and \({\mathcal {F}}:=\{\varDelta _s\in {\mathscr {B}}(\varDelta ):{\mathbb {P}}\{\varDelta _s\}\ge 1-\alpha \}\). Even if \(h_i(x,\delta ),i=1,\ldots ,m\) are all linear in x for every \(\delta \in \varDelta \), the feasible set \({\mathcal {X}}_\alpha \) may not be convex due to the infinite union operations. Second, it is difficult to obtain a tractable analytical function F(x) to describe the constraint or to find a numerically efficient way to compute it. In most applications, \(p(\delta )\) is unknown, and only samples of \(\delta \) are available. We briefly review the sample-based approximation method presented in [20, 25, 26]. Let \({\mathcal {D}}_N=\{\delta ^{(1)},\ldots ,\delta ^{(N)}\}\) be a set of samples randomly extracted from \(\varDelta \), where \(N\in {\mathbb {N}}\). Suppose the samples are extracted in an independent and identically distributed manner. Then, \({\mathcal {D}}_N\) can be regarded as a random variable from the augmented sample space \(\varDelta ^N\) with probability measure \({\mathbb {P}}^N\{\cdot \}\) defined on the Borel \(\sigma \)-algebra \({\mathscr {B}}(\varDelta ^N)\). Given \({\mathcal {D}}_N\), \(\epsilon \in [0,\alpha )\), and \(\gamma >0\), a sample average approximate problem of \(Q_{\alpha }\), defined by \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\), is written as:
The feasible region of \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) is defined by
Denote \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N):=\textsf{min}\{J(x):x\in \tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\}\) for the optimal objective function value of \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) and \({\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N):=\{x\in \tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N):J(x)={\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\}\) for the optimal solution set of \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\). We can regard \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) as a function \({\tilde{J}}_{\epsilon ,\gamma }:\varDelta ^N\rightarrow {\mathbb {R}}\) for given \(\epsilon \) and \(\gamma \). Since \({\mathcal {D}}_N\) is a random variable from \(\varDelta ^N\), \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) is consequently a random variable. The sets \(\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) and \({\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) also depend on \({\mathcal {D}}_N\) and can be regarded as \(\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }:\varDelta ^N\rightarrow {\mathscr {B}}({\mathcal {X}})\) and \({\tilde{X}}_{\epsilon ,\gamma }:\varDelta ^N\rightarrow {\mathscr {B}}({\mathcal {X}})\). \(\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) and \({\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) are called random sets [22]. In [20, 26], the convergence analysis on \(\tilde{{\mathcal {X}}}_{\epsilon ,\gamma }({\mathcal {D}}_N),{\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N),{\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) is given. We summarize Theorem 10 of [20] and Theorem 3.5 of [26] as Lemma 1.
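To make the sample-average construction concrete, the sketch below tests whether a point lies in a feasible set of a common sample-average form, where the empirical probability of satisfying the constraints with margin \(\gamma \) must reach \(1-\epsilon \); the exact set used by \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) may differ in detail, and the constraint function below is our own illustrative choice.

```python
import numpy as np

def in_sample_feasible_set(x, h, deltas, eps, gamma):
    """Empirical chance-constraint check (one common sample-average form):
    the fraction of samples with h(x, delta) <= -gamma componentwise
    must be at least 1 - eps."""
    sat = np.mean([float(np.all(h(x, d) + gamma <= 0.0)) for d in deltas])
    return bool(sat >= 1.0 - eps)

# Illustrative scalar example of ours: h(x, delta) = x + delta - 1,
# delta ~ N(0, 1), so F(x) = P{delta <= 1 - x}.
rng = np.random.default_rng(0)
deltas = rng.standard_normal(5000)
h = lambda x, d: np.atleast_1d(x + d - 1.0)
feasible_0 = in_sample_feasible_set(0.0, h, deltas, eps=0.2, gamma=0.01)
infeasible = in_sample_feasible_set(0.5, h, deltas, eps=0.2, gamma=0.01)
```

Here x = 0 satisfies the empirical constraint (true probability about 0.84 against the threshold 0.8), while x = 0.5 does not (about 0.69).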
Lemma 1
Suppose that Assumptions 2 and 3 hold. Let \(\epsilon \in [0,\alpha ),\beta \in (0,\alpha -\epsilon )\) and \(\gamma >0\). Then,
Besides, \({\tilde{X}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\rightarrow X_{\alpha }\) and \({\tilde{J}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\rightarrow {\bar{J}}_{\alpha }\) with probability 1 when \(N\rightarrow \infty \), \(\epsilon \rightarrow \alpha ,\gamma \rightarrow 0\).
According to Lemma 1, we can obtain the solution of \(Q_{\alpha }\) with probability 1 when \(N\rightarrow \infty ,\, \epsilon \rightarrow \alpha ,\, \gamma \rightarrow 0\). A natural question arises: can we use the solution of \(Q_{\alpha }\) to obtain an optimal probability measure for \(P_{\alpha }\)? Let \({\bar{x}}_{\alpha }\in X_{\alpha }\) be an optimal solution of \(Q_{\alpha }\). Notice that we have \(\{{\bar{x}}_{\alpha }\}\in {\mathscr {B}}({\mathcal {X}})\) and thus it is possible to define a probability measure \(\mu _{{\bar{x}}_{\alpha }}\) which satisfies that \(\mu _{{\bar{x}}_{\alpha }}(\{{\bar{x}}_{\alpha }\})=\mu _{{\bar{x}}_{\alpha }}({\mathcal {X}})=1\). Then,
and
Thus, \(\mu _{{\bar{x}}_{\alpha }}\) is a feasible solution of \(P_{\alpha }\) with objective value \({\bar{J}}_{\alpha }\). However, \(\mu _{{\bar{x}}_{\alpha }}\) is not guaranteed to lie in \(A_{\alpha }\); only when \(\alpha =0\) do we have \(\mu _{{\bar{x}}_{\alpha }}\in A_{\alpha }\). Notice that the set \(X_{\alpha }\) is not ensured to be Borel measurable. However, it is possible to find a Borel measurable subset \(X_{\alpha }^{\text {m}}\subseteq X_{\alpha }\). A particular example is to choose \(X_{\alpha }^{\text {m}}=\{{\bar{x}}_{\alpha }\}\), where \({\bar{x}}_{\alpha }\in X_{\alpha }\) is one element of the optimal solution set. In this paper, without loss of generality, we assume that \(X_{\alpha }\) is Borel measurable for all \(\alpha \in [0,1]\). Besides, we also assume that \({\mathcal {X}}_0\ne \emptyset \). Then, \({\mathcal {X}}_\alpha \ne \emptyset \) holds for all \(\alpha \in [0,1]\). The above discussion is formally summarized in Theorem 1.
Theorem 1
Suppose that \(X_{\alpha }\) is Borel measurable for all \(\alpha \in [0,1]\) and \({\mathcal {X}}_0\ne \emptyset \). The optimal value of problem \(P_{\alpha }\) satisfies \(\bar{{\mathcal {J}}}_{\alpha }\le {\bar{J}}_{\alpha }\). Besides, if \(\alpha =0\), we have
and
with probability 1.
The proof of Theorem 1 is given in Sect. 3.1.
Remark 1
Theorem 1 implies that a deterministic policy is optimal for robust optimal control, where \(\alpha =0\).
2.2 Sample-Based Approximation
Let \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) be the set of all interior points of \({\mathcal {X}}\). By using the Hit-and-Run algorithm [30] or the Billiard Walk algorithm [16], uniform samples can be generated from \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). For a positive integer \(S\in {\mathbb {N}}\), let \({\mathcal {C}}_S:=\{x^{(1)},\ldots ,x^{(S)}\}\) be a set of uniform samples independently extracted from \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). The set \({\mathcal {C}}_S\) is an element of the augmented space \(\left( {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\right) ^S\). Since each element \(x^{(i)},i=1,\ldots ,S\) of \({\mathcal {C}}_S\) is extracted independently, we define an S-fold probability measure \({\mathbb {P}}^S_{\textsf{uni}}\) (\(={\mathbb {P}}_{\textsf{uni}}\times \cdots \times {\mathbb {P}}_{\textsf{uni}}\), S times) on \(\left( {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\right) ^S\). Here, \({\mathbb {P}}_{\textsf{uni}}\) is the probability measure of the uniform distribution on \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\).
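As a concrete illustration of the samplers cited above, the sketch below runs Hit-and-Run on the open unit Euclidean ball, where the chord through the current point along a random direction has closed-form endpoints; a general set \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) would require a numerical chord-finding routine instead, and this toy domain is our own choice.

```python
import numpy as np

def hit_and_run_ball(x0, n_steps, rng):
    """Hit-and-Run on the unit ball {x : ||x||_2 < 1}: pick a uniform
    random direction, compute the chord through x along it, and move
    to a uniform point on that chord. The chain is asymptotically
    uniform on the ball."""
    x = np.asarray(x0, dtype=float)
    chain = []
    for _ in range(n_steps):
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)          # uniform direction on the sphere
        # Chord endpoints solve ||x + t d||^2 = 1, i.e. t^2 + 2bt + c = 0.
        b = x @ d
        c = x @ x - 1.0
        disc = np.sqrt(b * b - c)       # positive since ||x|| < 1
        t = rng.uniform(-b - disc, -b + disc)
        x = x + t * d
        chain.append(x.copy())
    return np.array(chain)

rng = np.random.default_rng(1)
chain = hit_and_run_ball(np.zeros(2), 1000, rng)
```

Every iterate stays inside the ball by construction, since the step is drawn from the interior of the chord.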
With \({\mathcal {C}}_S\) and \({\mathcal {D}}_N\), we can obtain a sample approximate problem of \(P_{\alpha }\) defined by \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\):
where \(U_S:=\{\mu \in {\mathbb {R}}^S:\sum _{i=1}^{S}\mu (i)=1,\, \mu (i)\ge 0,\; \forall i=1,\ldots ,S\}\). Define \({\mathcal {F}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N):=\{\mu \in U_S:\sum _{i=1}^S\mu (i)\frac{1}{N}\sum _{j=1}^N{\mathbb {I}}\{h(x^{(i)},\delta ^{(j)})\}\ge 1-\alpha \}\) as the feasible set of \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\). Denote the optimal objective function value as
Denote the optimal solution set for \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) as
Let \({\tilde{\mu }}_{\alpha }\in {\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) be an optimal measure. The optimal value \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) depends on \({\mathcal {C}}_S\) and \({\mathcal {D}}_N\), and thus it can be regarded as a function \(\tilde{{\mathcal {J}}}_{\alpha }:{\mathcal {X}}^S\times \varDelta ^{N}\rightarrow {\mathbb {R}}\). Then, \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) is a random variable. Besides, \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) is a random set.
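For fixed \({\mathcal {C}}_S\) and \({\mathcal {D}}_N\), \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) is a finite linear program over the simplex \(U_S\). The sketch below solves it with SciPy's `linprog`; the toy costs \(J(x^{(i)})\) and empirical probabilities are our own illustration.

```python
import numpy as np
from scipy.optimize import linprog

def solve_sample_approx(J_vals, F_hat, alpha):
    """Solve the finite LP: minimize sum_i mu(i) J(x^(i)) over the
    simplex subject to sum_i mu(i) F_hat(i) >= 1 - alpha."""
    S = len(J_vals)
    res = linprog(
        c=J_vals,                      # objective sum_i mu(i) J(x^(i))
        A_ub=-np.atleast_2d(F_hat),    # chance constraint, negated for <=
        b_ub=[-(1.0 - alpha)],
        A_eq=np.ones((1, S)),          # probabilities sum to one
        b_eq=[1.0],
        bounds=[(0.0, None)] * S,      # mu(i) >= 0
    )
    return res.x, res.fun

# Toy instance of ours: two candidate points with J = [0, 1] and
# empirical probabilities F_hat = [0.7, 1.0], alpha = 0.1.
mu, val = solve_sample_approx(np.array([0.0, 1.0]),
                              np.array([0.7, 1.0]), 0.1)
# Optimal mixture: mu = [1/3, 2/3], value 2/3.
```

In this toy instance the cheap point alone violates the chance constraint and the safe point alone costs 1, while the optimal mixture attains cost 2/3: exactly the situation in which a stochastic policy outperforms every deterministic one.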
The deduction of the convergences of \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) and \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) requires another assumption on \(P_\alpha \). We state the assumption after a brief introduction of weak convergence.
Define a space of continuous \({\mathbb {R}}\)-valued functions by
$$\begin{aligned} {\mathscr {C}}({\mathcal {X}},{\mathbb {R}}):=\left\{ f:{\mathcal {X}}\rightarrow {\mathbb {R}}\;\big |\;f\;\text {is continuous}\right\} . \end{aligned}$$
A metric on \({\mathscr {C}}({\mathcal {X}},{\mathbb {R}})\) can be defined by
$$\begin{aligned} \tau (f_1,f_2):=\Vert f_1-f_2\Vert _\infty , \end{aligned}$$
where \(\Vert f\Vert _\infty \) is defined as
$$\begin{aligned} \Vert f\Vert _\infty :=\sup _{x\in {\mathcal {X}}}|f(x)|. \end{aligned}$$
The metric \(\tau (\cdot ,\cdot )\) turns \({\mathscr {C}}({\mathcal {X}},{\mathbb {R}})\) into a complete metric space.
The weak convergence of probability measures is defined as follows [4].
Definition 1
Let \(\{\mu _k\}_{k=0}^{\infty }\) be a sequence in \(M({\mathcal {X}})\). We say that \(\{\mu _k\}_{k=0}^{\infty }\) converges weakly to \(\mu \) if
$$\begin{aligned} \lim _{k\rightarrow \infty }\int _{{\mathcal {X}}}f(x)\,{\textsf{d}}\mu _k=\int _{{\mathcal {X}}}f(x)\,{\textsf{d}}\mu ,\quad \forall f\in {\mathscr {C}}({\mathcal {X}},{\mathbb {R}}). \end{aligned}$$
Since \({\mathcal {X}}\) is compact, \(M({\mathcal {X}})\) can be proved to be weakly compact by the Riesz representation theorem [4]. Therefore, given any sequence \(\{\mu _k\}_{k=0}^{\infty }\subset M({\mathcal {X}})\), there is a subsequence that converges weakly to some \(\mu \in M({\mathcal {X}})\) in the sense of Definition 1. By Assumption 2, J(x) and F(x) are continuous with respect to x. Therefore, if \(\{\mu _k\}_{k=0}^{\infty }\) converges weakly to \(\mu \), (9) also holds with f replaced by J(x) or F(x). We give the following assumption on Problem \(P_\alpha \).
Assumption 4
There exists a globally optimal solution \(\mu ^*\in A_\alpha \) of Problem \(P_\alpha \) such that for any \(\delta >0\) there is \(\mu \in M({\mathcal {X}})\) such that \(\int _{{\mathcal {X}}}F(x){\textsf{d}}\mu >1-\alpha \) and \({\mathcal {W}}(\mu ,\mu ^*)\le \delta \), where \({\mathcal {W}}(\mu ,\mu ^*)\) is defined by
As \(S,N\rightarrow \infty \), the convergence analysis on \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) and \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) is summarized in Theorem 2.
Theorem 2
Consider Problem \(P_\alpha \) with \(\alpha >0\). Suppose Assumptions 1, 2, 3, and 4 hold. As \(S,N\rightarrow \infty \), we have
with probability 1. Besides, as \(S,N\rightarrow \infty \), we have \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\subset M_{\alpha }({\mathcal {X}}):=\{\mu \in M({\mathcal {X}}):\int _{{\mathcal {X}}}F(x) {\textsf{d}}\mu \ge 1-\alpha \}\) with probability 1.
The proof of Theorem 2 is given in Sect. 3.2.
2.3 Gaussian Mixture Model-Based Approximation
Another option for approximation is to restrict the choice of \(\mu \) to \(M_{\theta }({\mathcal {X}})\subseteq M({\mathcal {X}})\). Here, \(M_\theta ({\mathcal {X}})\) is defined as
where the probability density function \(p_{\theta }(x)\) is written as
Here, \(\omega _i\in [0,1],\forall i=1,\ldots ,L\), \(\sum _{i=1}^{L}\omega _i=1\), and \(\phi (x,m_i,\varSigma _i)\) is the multivariate Gaussian density, written as
The notation \(\theta \) denotes the parameter vector, including all the unknown parameters in \(\omega _i,m_i,\varSigma _i,\forall i=1,\ldots ,L\). Denote the dimension of \(\theta \) as \(n_{\theta }\). The feasible domain of \(\theta \) is denoted by
Then, given a data set \({\mathcal {D}}_N\) and the number of Gaussian distributions L, we can obtain a Gaussian mixture model-based approximate problem of \(P_{\alpha }\) defined by \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\):
Denote the feasible set of \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\) as
and the optimal objective value as
Besides, the optimal solution set is
The optimal objective value \(\hat{{\mathcal {J}}}_{\alpha }(L,{\mathcal {D}}_N)\) depends on the number of Gaussian models used and the data set \({\mathcal {D}}_N\). Since the data set \({\mathcal {D}}_N\) is essentially a random variable with support \(\varDelta ^N\), \(\hat{{\mathcal {J}}}_{\alpha }(L,{\mathcal {D}}_N)\) is also a random variable. The set \({\hat{\varTheta }}_{\alpha }(L,{\mathcal {D}}_N)\) is a random set.
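To ground the parameterization, the sketch below evaluates a mixture density \(p_{\theta }(x)\) and draws samples from \(\mu _{\theta }\); such samples are one way to Monte Carlo the objective and constraint integrals inside a generic nonlinear solver. The concrete weights, means, and covariances are our own toy values.

```python
import numpy as np

def gmm_density(x, weights, means, covs):
    """Evaluate p_theta(x) = sum_i w_i * phi(x, m_i, Sigma_i),
    the Gaussian mixture density parameterizing mu_theta."""
    x = np.atleast_2d(x)
    total = np.zeros(x.shape[0])
    for w, m, S in zip(weights, means, covs):
        d = x - m
        k = m.size
        quad = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)
        norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(S))
        total += w * np.exp(-0.5 * quad) / norm
    return total

def sample_gmm(n, weights, means, covs, rng):
    """Draw n samples from mu_theta: pick a component by its weight,
    then sample from the corresponding Gaussian."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])

# Sanity checks on our own toy parameters: a single standard normal
# component has density 1/sqrt(2*pi) at the origin.
val0 = gmm_density(np.zeros((1, 1)), [1.0], [np.zeros(1)], [np.eye(1)])[0]
rng = np.random.default_rng(0)
pts = sample_gmm(100, [0.6, 0.4], [np.zeros(2), np.ones(2)],
                 [np.eye(2), np.eye(2)], rng)
```

A practical caveat: \(\mu _{\theta }\) as written has support on all of \({\mathbb {R}}^n\), so an implementation must additionally handle the restriction of the mixture to \({\mathcal {X}}\) (e.g., by rejection of samples outside \({\mathcal {X}}\)).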
As \(L,N\rightarrow \infty \), the optimality and feasibility of the optimal solution of \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\) are summarized in Theorem 3.
Theorem 3
Consider Problem \(P_\alpha \) with \(\alpha >0\). Suppose Assumptions 1, 2, 3, and 4 hold. As \(L,N\rightarrow \infty \), we have
with probability 1. Besides, let \({\hat{\theta }}\in {\hat{\varTheta }}_{\alpha }(L,{\mathcal {D}}_N)\) be an optimal solution of \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\). The corresponding probability density function is \(p_{{\hat{\theta }}}(x)\), and the obtained probability measure is
We have \(\mu _{{\hat{\theta }}}\in M_{\alpha }({\mathcal {X}}):=\{\mu \in M({\mathcal {X}}):\int _{{\mathcal {X}}}F(x) { {\textsf{d}}}\mu \ge 1-\alpha \}\) with probability 1 as \(L,N\rightarrow \infty \).
3 Proofs of Main Results
3.1 Proof of Theorem 1
Proof
(Theorem 1) Define a measure \({\bar{\mu }}_{\alpha }(\cdot )\) satisfying \({\bar{\mu }}_{\alpha }(X_{\alpha })=1\). Then, we have
Besides, for the constraint, we have
Then, \({\bar{\mu }}_{\alpha }(\cdot )\in M_{\alpha }({\mathcal {X}})\) holds. Thus, we have \(\bar{{\mathcal {J}}}_{\alpha }\le \int _{{\mathcal {X}}} J(x){\textsf{d}}{\bar{\mu }}_{\alpha }={\bar{J}}_{\alpha }\).
When \(\alpha =0\), let \({\mathcal {X}}^c_{0}=\{x\in {\mathcal {X}}:F(x)<1\}\) be the complement set of \({\mathcal {X}}_0\), namely, \({\mathcal {X}}^c_{0}\bigcup {\mathcal {X}}_0={\mathcal {X}}\) and \({\mathcal {X}}^c_{0}\bigcap {\mathcal {X}}_0=\emptyset \). Notice that \({\mathcal {X}}^c_{0}\) is Borel measurable since \({\mathcal {X}}_0\) is Borel measurable. Suppose that there is \({\tilde{\mu }}(\cdot )\in M_{0}({\mathcal {X}})\) such that \({\tilde{\mu }}({\mathcal {X}}^c_{0})>0\). Then,
which contradicts \({\tilde{\mu }}\in M_0({\mathcal {X}})\). Therefore, we have \(\mu ({\mathcal {X}}^c_{0})=0\) for all \(\mu \in M_0({\mathcal {X}})\), which implies that \(\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu =\int _{{\mathcal {X}}_0}J(x){\textsf{d}}\mu \) for all \(\mu \in M_0({\mathcal {X}})\).
Notice that \(X_0\) is a Borel measurable set. Let \(\mu ^*_0(\cdot )\in A_0\) be an optimal probability measure for \(P_0\) and suppose, for contradiction, that \(\mu ^*_{0}(X_{0})<1\). Thus, \(\mu ^*_{0}({\mathcal {X}}{\setminus } X_{0})>0\). The corresponding objective value is
Define a measure \({\bar{\mu }}_{0}(\cdot )\) satisfying \({\bar{\mu }}_{0}(X_{0})=1\). Then, we have
Thus, \(\mu ^*_{0}(\cdot )\) is not an optimal measure, a contradiction. Therefore, (6) holds, which leads to \(\bar{{\mathcal {J}}}_0={\bar{J}}_0\). \(\square \)
3.2 Proof of Theorem 2
Lemma 2
Suppose that Assumption 1 holds. For any \(x\in {\mathcal {X}}\), denote a set as
$$\begin{aligned} {\mathcal {B}}_\varepsilon (x):=\{x'\in {\mathcal {X}}:\Vert x'-x\Vert _{\infty }\le \varepsilon \}, \end{aligned}$$
where \(\varepsilon >0\) is the radius. For any \(\varepsilon >0\), we have
$$\begin{aligned} \lim _{S\rightarrow \infty }{\mathbb {P}}^S_{\textsf{uni}}\left\{ {\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \right\} =1. \end{aligned}$$
Proof
(Lemma 2) First, we show that the interior point set \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) is not empty when Assumption 1 holds. Let \({\bar{x}}\in {\mathcal {X}}\) and thus we have
By Assumption 1, CCQ holds at \({\bar{x}}\). Thus, there exists \(d\in {\mathbb {R}}^n\) such that
Notice that (16) and (17) directly give
Since \(g(\cdot )\) is continuously differentiable, there exists a small enough \({\bar{\xi }}>0\) such that \(g({\bar{x}}+\xi d)<0\) holds for any \(\xi \in (0,{\bar{\xi }})\) and thus \({\bar{x}}+\xi d\in {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). It implies that \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) is not empty.
We first discuss \({\mathbb {P}}^S_{\textsf{uni}}\{{\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \}\) for \(x\in {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). Notice that \({\mathcal {X}}\) is compact and \({\mathcal {C}}_S\) is a set of uniform samples extracted from \({\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). Thus, for any \(x\in {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\), the probability that a sample \(x^{(i)}\in {\mathcal {C}}_S,\; i=1,\ldots ,S\) lies in \({\mathcal {B}}_\varepsilon (x)\) is
Then,
If \(S\rightarrow \infty \), we have \({\mathbb {P}}^S_{\textsf{uni}}\{{\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \}\rightarrow 1\), which implies (15).
Then, we discuss \({\mathbb {P}}^S_{\textsf{uni}}\{{\mathcal {C}}_S\bigcap {\mathcal {B}}_\varepsilon (x)\ne \emptyset \}\) for \(x\in \partial {\mathcal {X}}\), where \(\partial {\mathcal {X}}\) denotes the boundary of \({\mathcal {X}}\). Let \(x\in \partial {\mathcal {X}}\) be a boundary point. Again, by Assumption 1, x satisfies the CCQ. By replacing \({\bar{x}}\) in (16) and (18) with x, we have that there exists a small enough \({\bar{\xi }}>0\) such that \(g(x+\xi d)<0\) holds for any \(\xi \in (0,{\bar{\xi }})\) and thus \(x+\xi d\in {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\). Let \(\varepsilon _1\in (0,{\bar{\xi }})\); then we can find \(x':=x+\xi d\in {\mathcal {B}}_{\varepsilon _1}(x)\bigcap {\mathcal {X}}^{{{\textsf{i}}}{{\textsf{n}}}}\) with a small enough \(\xi \). Besides, the probability that a sample \(x^{(i)}\in {\mathcal {C}}_S,\; i=1,\ldots ,S\) lies in \({\mathcal {B}}_{\varepsilon _1}(x')\) satisfies \({\mathbb {P}}_{\textsf{uni}}\{x^{(i)}\in {\mathcal {B}}_{\varepsilon _1}(x')\}>0\). Thus, we have \({\mathbb {P}}_{\textsf{uni}}\{x^{(i)}\in {\mathcal {B}}_{2\varepsilon _1}(x)\}>0\). Let \(\varepsilon _1=\varepsilon /2\), and we obtain (19) for a boundary point of \({\mathcal {X}}\), which completes the proof. \(\square \)
With sample set \({\mathcal {C}}_{S}=\{x^{(1)},\ldots ,x^{(S)}\}\), a sample average approximate problem of \(P_{\alpha }\), defined by \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\), is written as:
where \(U_S:=\{\mu \in {\mathbb {R}}^S:\sum _{i=1}^{S}\mu (i)=1,\; \mu (i)\ge 0,\; \forall i=1,\ldots ,S\}\). Denote the feasible region of problem \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\) as
Then, the optimal objective function value of \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\) is defined by
The optimal solution set for \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\) is therefore defined by
A measure \(\breve{\mu }_{\alpha }\in \breve{A}_{\alpha }({\mathcal {C}}_S)\) is called an optimal measure for \(\breve{P}_{\alpha }({\mathcal {C}}_{S})\).
Theorem 4
For given sample sets \({\mathcal {C}}_{S}\) and \({\mathcal {D}}_{N}\), define two functions of \(\mu \in U_S\) as
and
Then, \({\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})\) uniformly converges to \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})\) on \(U_S\) w.p. 1, i.e.,
Proof
(Theorem 4) For any given \(x^{(i)}\), \({\mathbb {I}}\{h(x^{(i)},\delta )\le {\varvec{0}}^{m}\}\) is a measurable function of \(\delta \). According to the strong Law of Large Numbers (LLN) [3], we have
where
Thus, for every \(\mu \in U_S\), we have
Since both functions are linear in \(\mu \) and \(U_S\) is a compact set with finitely many vertices, pointwise convergence at the vertices ensures uniform convergence on \(U_S\). \(\square \)
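Because \({\tilde{G}}_{\alpha }\) and \(\breve{G}_{\alpha }\) are both linear in \(\mu \), the supremum of their gap over \(U_S\) is attained at a vertex, i.e., it equals \(\max _i|{\hat{F}}_N(x^{(i)})-F(x^{(i)})|\). The sketch below checks this numerically on a hypothetical instance (assumed here, not from the paper) with \(h(x,\delta )=\delta -x\) and \(\delta \sim {\mathcal {N}}(0,1)\), so that \(F(x)=\varPhi (x)\), the standard normal CDF.

```python
import math
import random

rng = random.Random(1)
# Hypothetical instance: h(x, delta) = delta - x with delta ~ N(0, 1),
# so F(x) = P{h(x, delta) <= 0} = Phi(x), the standard normal CDF.
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
C_S = [-1.0, 0.0, 0.5, 1.0, 2.0]              # fixed sample set C_S

def sup_gap(N):
    """sup over the simplex U_S of |G_tilde - G_breve|.  Both functions are
    linear in mu, so the supremum is attained at a vertex of U_S, i.e. it
    equals max_i |F_hat_N(x_i) - F(x_i)|."""
    D_N = [rng.gauss(0, 1) for _ in range(N)]
    F_hat = [sum(d <= x for d in D_N) / N for x in C_S]
    return max(abs(fh - Phi(x)) for fh, x in zip(F_hat, C_S))

gaps = [sup_gap(N) for N in (100, 10000, 200000)]
print(gaps)                                    # shrinks roughly like 1/sqrt(N)
```

The uniform gap shrinks as N grows, as Theorem 4 asserts; reducing the simplex supremum to a maximum over vertices is exactly what makes the check cheap.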
Next, we show that \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) and \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) converge to \(\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) and \(\breve{A}_{\alpha }({\mathcal {C}}_S)\), respectively, with probability 1 as \(N\rightarrow \infty \).
Theorem 5
Consider Problem \(P_\alpha \) with \(\alpha >0\). Assume that there exists an \(x^{(i)}\in {\mathcal {C}}_S\) that satisfies \(F(x^{(i)})>1-\alpha \). As \(N\rightarrow \infty \), \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\rightarrow \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) and \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\rightarrow \breve{A}_{\alpha }({\mathcal {C}}_S)\) w.p. 1.
Proof
(Theorem 5) The set \(U_S\) is compact. The objective function \(\sum _{i=1}^SJ(x^{(i)})\mu (i)\) is a linear function of \(\mu \in U_S\). Besides, \(F(x^{(i)})\) is a constant value within [0, 1] for a fixed \(x^{(i)}\), which makes the constraint function \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})\) a linear function of \(\mu \in U_S\). Therefore, \(\breve{P}_{\alpha }({\mathcal {C}}_S)\) is a linear program. Due to the assumption that there exists \(x^{(i)}\in {\mathcal {C}}_{S}\) such that \(F(x^{(i)})> 1-\alpha \), there is \(\mu \in U_S\) such that \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})>1-\alpha \) and thus \(\breve{A}_{\alpha }({\mathcal {C}}_S)\) is nonempty. Since \({\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})\) converges to \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_{S})\) w.p. 1 by Theorem 4, there exists \(N_0\) large enough such that \({\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})\ge 1-\alpha \) for all \(N\ge N_0\) w.p. 1. Because \({\tilde{G}}_{\alpha }(\mu ,{\mathcal {C}}_{S},{\mathcal {D}}_{N})\) is a linear function of \(\mu \) and \(U_S\) is compact, the feasible set of \({\tilde{P}}_{\alpha }({\mathcal {C}}_{S},{\mathcal {D}}_{N})\) is compact as well, and hence \({\tilde{A}}_{\alpha }({\mathcal {C}}_{S},{\mathcal {D}}_{N})\) is nonempty w.p. 1 for all \(N\ge N_0\).
Let \(\{N_k\}^{\infty }_{k=1}\) be a sequence such that \(N_k\rightarrow \infty \) and \(N_k\ge N_0\) holds for every \(k\in {\mathbb {N}}\). Let \({\tilde{\mu }}_k\in {\tilde{A}}_{\alpha }({\mathcal {C}}_{S},{\mathcal {D}}_{N_k})\) such that \({\tilde{G}}_{\alpha }({\tilde{\mu }}_k,{\mathcal {C}}_S,{\mathcal {D}}_{N_k})\ge 1-\alpha \) and \(\sum _{i=1}^SJ(x^{(i)}){\tilde{\mu }}_k(i)=\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N_k})\). Let \({\tilde{\mu }}\) be any cluster point of \(\{{\tilde{\mu }}_k\}_{k=1}^{\infty }\). Let \(\{{\tilde{\mu }}_{t}\}_{t=1}^{\infty }\) be a subsequence converging to \({\tilde{\mu }}\). By Theorem 4, we have
Therefore, \(\breve{G}_{\alpha }({\tilde{\mu }},{\mathcal {C}}_S)\ge 1-\alpha \) and \({\tilde{\mu }}\) is feasible for problem \(\breve{P}_{\alpha }({\mathcal {C}}_S)\) which implies \(\sum _{i=1}^SJ(x^{(i)}){\tilde{\mu }}(i)\ge \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\). Note that \({\tilde{\mu }}_t\rightarrow {\tilde{\mu }}\) w.p. 1, which implies that
Since this is true for an arbitrary cluster point of \(\{{\tilde{\mu }}_k\}_{k=1}^{\infty }\) in the compact set \(U_S\), we have
Besides, we know that there exists a globally optimal solution \(\mu ^*\) of \(\breve{P}_{\alpha }({\mathcal {C}}_S)\) such that, for any \(\varepsilon >0\), there is \(\mu \in U_S\) with \(0<\Vert \mu -\mu ^*\Vert \le \varepsilon \) and \(\breve{G}_{\alpha }(\mu ,{\mathcal {C}}_S)>1-\alpha \). Namely, there exists a sequence \(\{{\tilde{\mu }}_t\}_{t=1}^{\infty }\subseteq U_S\) that converges to an optimal solution \(\mu ^*\) such that \(\breve{G}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S)>1-\alpha \) for all \(t\in {\mathbb {N}}\). Notice that \({\tilde{G}}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S,{\mathcal {D}}_{N_k})\) converges to \(\breve{G}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S)\) w.p. 1. Then, for any fixed t, there exists \(K(t)\) such that \({\tilde{G}}_{\alpha }({\tilde{\mu }}_t,{\mathcal {C}}_S,{\mathcal {D}}_{N_k})\ge 1-\alpha \) for every \(k\ge K(t)\) w.p. 1. We can assume \(K(t)<K(t+1)\) for every t and define the sequence \(\{{\hat{\mu }}_k\}_{k=K(1)}^{\infty }\) by setting \({\hat{\mu }}_k={\tilde{\mu }}_t\) for all k and t with \(K(t)\le k<K(t+1)\). Then, \({\tilde{G}}_{\alpha }({\hat{\mu }}_k,{\mathcal {C}}_S,{\mathcal {D}}_{N_k})\ge 1-\alpha \), which implies \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N_k})\le \sum _{i=1}^SJ(x^{(i)}){\hat{\mu }}_k(i)\) for all \(k\ge K(1)\). Thus, we have that
With (20) and (21), we conclude that \(\tilde{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N})\rightarrow \breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) w.p. 1 as \(N\rightarrow \infty \).
The proof of \({\tilde{A}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\rightarrow \breve{A}_{\alpha }({\mathcal {C}}_S)\) follows the argument of Theorem 5.3 in [27]. \(\square \)
Next, we show that \(\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) converges to \(\bar{{\mathcal {J}}}_{\alpha }\) with probability 1 as S increases.
Theorem 6
Suppose Assumptions 2 and 4 hold. As \(S\rightarrow \infty \), with probability 1, we have
Proof
(Theorem 6) The outline of the proof of Theorem 6 is summarized as follows:

A. Prove that the limit of the lower bound of \(\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) is no smaller than \(\bar{{\mathcal {J}}}_{\alpha }\) by (23);

B. Prove that the limit of the upper bound of \(\breve{{\mathcal {J}}}_{\alpha }({\mathcal {C}}_S)\) is no larger than \(\bar{{\mathcal {J}}}_{\alpha }\) by (38):

B1. Find a sequence \(\{\mu _k\}_{k=1}^{\infty }\) that converges weakly to an optimal solution \(\mu ^*\) of \(P_\alpha \);

B2. Show that \(\int _{{\mathcal {X}}}F(x){\textsf{d}}\mu _k(x)\) and \(\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)\) can be approximated by using discrete probability measures on \({\mathcal {C}}_S\), which refers to (34) and (35);

B3. Show that the optimal discrete probability measure on \({\mathcal {C}}_S\) for \(\breve{P}_\alpha ({\mathcal {C}}_S)\) has a smaller objective value than the discrete probability measure approximating any \(\mu _k\) in B2, which yields (38).

Then, we give the details of the proof.
For any discrete probability measure \(\mu ^S\in \breve{{\mathcal {F}}}_{\alpha }({\mathcal {C}}_S)\), we have
Thus, \(\mu ^S\in M_{\alpha }(x)\). Then, it holds that
Furthermore, with probability 1, we have
Assumption 4 implies that there exists a sequence \(\{\mu _k\}_{k=1}^{\infty }\subseteq M({\mathcal {X}})\) that converges weakly to an optimal solution \(\mu ^*\) such that
for all \(k\in {\mathbb {N}}\). Since \(\{\mu _k\}_{k=1}^{\infty }\) converges weakly to \(\mu ^*\), we have
Notice that \(\bar{{\mathcal {J}}}_\alpha =\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu ^*(x)\) by (3).
For any given \(\varepsilon _J>0\), there exists \(K(\varepsilon _J)\) such that, if \(k\ge K(\varepsilon _J)\),
Let \(\tilde{{\mathcal {C}}}_{{\tilde{S}}}^k:=\{{\tilde{x}}^{(1)}_k,\ldots ,{\tilde{x}}^{({\tilde{S}})}_k\}\) be a sample set obtained by sampling from \({\mathcal {X}}\) according to probability measure \(\mu _k\). By the Law of Large Numbers (p. 457 of [27]), for any \(f\in {\mathscr {C}}({\mathcal {X}},{\mathbb {R}})\), as \({\tilde{S}}\rightarrow \infty \), with probability 1, we have
Since \(J(\cdot )\) and \(F(\cdot )\) are also elements of \({\mathscr {C}}({\mathcal {X}},{\mathbb {R}})\), (26) also holds with \(f(\cdot )\) replaced by either \(J(\cdot )\) or \(F(\cdot )\). Namely, for any \({\tilde{\varepsilon }}_1>0\), there exists \({\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) such that, if \({\tilde{S}}\ge {\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\), with probability 1, the following hold:
On the other hand, according to Lemma 2, as \(S\rightarrow \infty \), for any \({\tilde{s}}\in \{1,\ldots ,{\tilde{S}}\}\) and \({\tilde{\varepsilon }}_r>0\), with probability 1, there exists a sample \(x^{(i_{{\tilde{s}}})}\in {\mathcal {C}}_S:=\{x^{(1)},\ldots ,x^{(S)}\}\) such that
With a little abuse of notation, let \(x^{(i_{{\tilde{s}}})}\) be the closest sample to \({\tilde{x}}^{({\tilde{s}})}_k\), namely, \(x^{(i_{{\tilde{s}}})}\in \arg \min \{\Vert x^{(i)}-{\tilde{x}}^{({\tilde{s}})}_k\Vert :x^{(i)}\in {\mathcal {C}}_S\}\). Define \(I_{{\tilde{S}}}:=\{i_1,\ldots ,i_{{\tilde{S}}}\}\) as the index set corresponding to the samples \(x^{(i_{{\tilde{s}}})}\). Without loss of generality, we assume that \(x^{(i_{{\tilde{s}}})}\ne x^{(j_{{\tilde{s}}})}\) if \(i_{{\tilde{s}}}\ne j_{{\tilde{s}}},\; i_{{\tilde{s}}},j_{{\tilde{s}}}\in I_{{\tilde{S}}}\). The intuitive explanation of the relationship between \({\mathcal {C}}_S\) and \(\tilde{{\mathcal {C}}}^k_{{\tilde{S}}}\) is illustrated in Fig. 1.
Define a discrete probability measure \(\mu ^{S}_k\in {\mathbb {R}}^S\) such that
For any given positive integer \({\tilde{S}}\) and positive number \({\tilde{\varepsilon }}_2\), due to the continuity of \(J(\cdot )\) and \(F(\cdot )\), there exists \(S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\) such that, if \(S>S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\), with probability 1, the following hold:
By combining (27) with (32) and (28) with (33), for given \({\tilde{\varepsilon }}_1,{\tilde{\varepsilon }}_2\), there exist \({\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) and \(S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\) such that, if \({\tilde{S}}>{\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) and \(S>S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\), with probability 1, the following holds:
According to (24) and (34), we can find \({\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) and \(S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\) such that, if \({\tilde{S}}>{\tilde{S}}_{{\textsf{l}}}({\tilde{\varepsilon }}_1)\) and \(S>S_{{\textsf{l}}}({\tilde{S}},{\tilde{\varepsilon }}_2)\), with probability 1, the following holds
Thus, \(\mu ^S_k\) is a feasible solution of Problem \(\breve{P}_\alpha ({\mathcal {C}}_S)\), and hence
Since \(\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)\) converges to \(\bar{{\mathcal {J}}}_\alpha \) w.p. 1 as \(k\rightarrow \infty \), considering (35) and (37), we have
With (23) and (38), we have (22).\(\square \)
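Step B2 of the proof replaces draws from \(\mu _k\) by their nearest samples in \({\mathcal {C}}_S\). A minimal sketch of this quantization, on a hypothetical one-dimensional instance (a truncated normal \(\mu _k\), a uniform grid \({\mathcal {C}}_S\), and a quadratic cost J, all assumed for illustration), shows the discrete expectation tracking \(\int _{{\mathcal {X}}}J(x){\textsf{d}}\mu _k(x)\) up to the grid spacing:

```python
import random

rng = random.Random(7)
# Step B2 sketch on a hypothetical 1-D instance: mu_k is a distribution on
# X = [-1, 1] and C_S is a uniform grid; each draw from mu_k is snapped to
# its nearest grid point, which defines the discrete measure mu_k^S.
S = 201
grid = [-1 + 2 * i / (S - 1) for i in range(S)]     # C_S, spacing 0.01
J = lambda x: x * x                                  # hypothetical cost

def sample_mu_k():
    """Draw from a hypothetical mu_k: N(0.3, 0.2^2) truncated to X."""
    while True:
        x = rng.gauss(0.3, 0.2)
        if -1 <= x <= 1:
            return x

S_tilde = 20000
draws = [sample_mu_k() for _ in range(S_tilde)]
weights = [0.0] * S                                  # mu_k^S on C_S
for x in draws:
    i = round((x + 1) * (S - 1) / 2)                 # nearest grid index
    weights[i] += 1.0 / S_tilde

exact = sum(J(x) for x in draws) / S_tilde           # MC estimate of ∫ J dmu_k
quantized = sum(J(grid[i]) * w for i, w in enumerate(weights))
print(exact, quantized)      # agree up to half the grid spacing times |J'|
```

Since each draw moves by at most half the grid spacing, the two expectations differ by at most the Lipschitz constant of J times that distance, which is the continuity argument behind (32) and (33).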
The proof of Theorem 2 follows immediately from the results of Theorems 5 and 6 and is omitted here.
3.3 Proof of Theorem 3
The main results of [37] are summarized as follows:
Lemma 3
Let \({\mathcal {X}}^+\) be a compact set. Let \(p:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a probability density function on the domain \({\mathbb {R}}^n\). If there exists a positive number \(\rho '>0\) such that \(p(x)\ge \rho '\) for all \(x\in {\mathcal {X}}^+\), then there exists \(p_{\theta }(x)\) defined by (11) such that
where the positive integer L is the number of Gaussian kernels in (11).
Proof
(Theorem 3) For given \({\mathcal {C}}_S,\ {\mathcal {D}}_{N}\) and L, we have problems \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_{N})\) and \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\). Let \({\mathcal {X}}_{p,i},\; i=1,\ldots ,S\), be the cells of a partition of \({\mathcal {X}}\), which satisfy
(a) \(x^{(i)}\in {\mathcal {X}}_{p,i}\);
(b) \(\bigcup _{i=1}^{S} {\mathcal {X}}_{p,i}={\mathcal {X}}\);
(c) \({\mathcal {X}}_{p,i}\bigcap {\mathcal {X}}_{p,i'}=\emptyset \) with probability 1 if \(i\ne i'\).
For any \(\mu ^{S}\in U_S\), we can correspondingly define a Dirac measure on \({\mathcal {X}}\) as
Define an index set \(I^+=\{i:\mu ^{S}(x^{(i)})>0\}\). Then, we can define a compact set
According to Lemma 3, there exists a sequence \(\left\{ p_{\theta }(x)\right\} _L\) such that
Thus, we have
and
For any S and N, by applying Lemma 3, we can find a sequence \(\left\{ p^*_{\theta }(x)\right\} _L\) such that
and
Hence, \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\) converges to \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\) as \(L\rightarrow \infty \), and Theorem 3 can be obtained by using Theorem 2. One point should be clarified here: in Theorem 2, the convergence holds as \(S\rightarrow \infty \), whereas Theorem 3 uses \(L\rightarrow \infty \) instead, since (39) and (40) hold for any S increasing to infinity.\(\square \)
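The idea behind (39) and (40) can be illustrated by representing a discrete measure as a Gaussian mixture with one kernel per support point: as the bandwidth shrinks, expectations under \(p_{\theta }\) approach the discrete sums. The support points, weights, and cost J below are hypothetical choices for the sketch, not quantities from the paper.

```python
import random

rng = random.Random(3)
# Sketch of the idea behind (39)-(40): a discrete measure mu supported on
# points x^(i) is approximated by a Gaussian mixture p_theta with one kernel
# per support point; as the bandwidth sigma shrinks, expectations under
# p_theta approach the discrete sums.  Support, weights, and J are hypothetical.
support = [0.2, 0.6]
mu = [0.6, 0.4]
J = lambda x: x * x

discrete = sum(J(x) * w for x, w in zip(support, mu))   # sum_i J(x_i) mu(i)

def gmm_expectation(sigma, n=200000):
    """Monte Carlo estimate of the integral of J(x) p_theta(x) dx."""
    total = 0.0
    for _ in range(n):
        k = 0 if rng.random() < mu[0] else 1
        total += J(rng.gauss(support[k], sigma))
    return total / n

for sigma in (0.5, 0.1, 0.01):
    print(sigma, gmm_expectation(sigma), "discrete:", discrete)
```

For this quadratic J the mixture expectation exceeds the discrete sum by exactly \(\sigma ^2\), so the gap vanishes as the kernels narrow, mirroring the role of L and the bandwidth in Lemma 3.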
4 Numerical Examples
This section provides the results of two numerical examples to validate our proposed methods. All computations were executed on Windows 10 with 32 GB RAM and an Intel(R) Core(TM) i7-1065G7 CPU running at 1.30 GHz. The algorithm and all computations were implemented in MATLAB R2021b. We check the performance of the following methods:
1. Dirac-Delta: solving the sample average approximate problem \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) of \(Q_\alpha \);
2. Sample: solving the sample-based approximate problem \({\tilde{P}}_\alpha ({\mathcal {C}}_S,{\mathcal {D}}_N)\) of \(P_\alpha \);
3. GMM: solving the GMM-based approximate problem \({\hat{P}}_\alpha (L,{\mathcal {D}}_N)\).
We use the terminology Dirac-Delta for the method of solving the sample average approximate problem \({\tilde{Q}}_{\epsilon ,\gamma }({\mathcal {D}}_N)\) of \(Q_\alpha \) since it is equivalent to constraining the measure to be a Dirac delta, namely, a measure concentrated on one fixed solution.
4.1 One-Dimension Example
In the first numerical example, we use an extremely simple problem to demonstrate the concepts of Theorems 1, 2, and 3. The compact set \({\mathcal {X}}\) is defined by \({\mathcal {X}}=\{x\in {\mathbb {R}}:x\in [-1,1]\}\). Moreover, the cost function J(x) is
The constraint function \(h(x,\delta )\) is
where \(\delta \sim {\mathcal {N}}(m_{\delta },\varSigma _{\delta }), m_{\delta }=0\), and \(\varSigma _{\delta }=1\). The probability level \(\alpha \) is 0.05. The optimal solution from method Dirac-Delta is \(x^*_{\alpha }=0.595\) and the optimal objective value is 0.572, which is plotted in Fig. 2a. In Dirac-Delta, we set \(\epsilon =\alpha \), \(N=2000\), and \(\gamma =0.01\). Besides, Fig. 2b, c shows the discrete measure obtained by Sample and the probability density function obtained by GMM, respectively. For Sample, we choose samples \(-1, -0.98, -0.96,\ldots , 0.96, 0.98,1\) from \({\mathcal {X}}\) (\(S=201\)) and 2000 randomly extracted samples from \(\varDelta \) (\(N=2000\)). For GMM, we extracted 2000 samples from \(\varDelta \) randomly. Besides, we choose \(L=6\). The solutions of Sample and GMM satisfy the chance constraints. For the objective function, Sample achieves 0.5601 and GMM achieves 0.5615, which are better than the optimal objective value achieved by Dirac-Delta.
A more comprehensive analysis of CPU time and sample numbers is summarized in Table 1. The CPU time increases as the sample size increases for each method. Unsurprisingly, Sample has a very fast computation time since it only needs to solve a linear program. Because this example is one-dimensional, the number of samples required for obtaining good samples in Sample or for approximating the probability integration in GMM is small; acceptable accuracy is achieved with only 50 samples. However, as the dimension of x increases, the “Curse of Dimensionality” emerges, as we show in the second example.
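The sample count behind the “Curse of Dimensionality” remark can be made concrete: keeping the per-axis resolution of the one-dimensional example (spacing 0.02 on \([-1,1]\), i.e., 101 points per axis) requires \(S=101^n\) grid points in dimension n. The arithmetic below is a back-of-the-envelope sketch, not a claim about the sampling scheme actually used later.

```python
# Keeping the 1-D example's per-axis resolution (spacing 0.02 on [-1, 1],
# i.e. 101 grid points per axis) needs S = 101**n samples in dimension n.
for n in (1, 2, 4, 20):
    print(n, 101 ** n)
```

Already at n = 4 the grid exceeds \(10^8\) points, which is why uniform gridding cannot scale and efficient sampling is listed as future work.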
4.2 Quadrotor System Control
The second example considers a quadrotor system control problem in turbulent conditions. The control problem is expressed as follows:
where A, B(m), \(d(x_t,\varphi )\) are written by
and \(\varDelta t\) is the sampling time, the state of the system is denoted as \(x_t=[p_{x,t},v_{x,t},p_{y,t},v_{y,t}]\in {\mathbb {R}}^4\), the control input of the system is \(u_t=[u_{x,t},u_{y,t}]\) within \({\mathcal {U}}:=\{u_t\in {\mathbb {R}}^2:-10\le u_{x,t}\le 10,-10\le u_{y,t}\le 10\}\), and the state and control trajectories are denoted as \(x=(x_t)_{t=1}^{T}\) and \(u=(u_t)_{t=1}^{T-1}\). The system starts from an initial point \(x_0=[-0.5,0,-0.5,0]\). The system is expected to reach the destination set \({\mathcal {X}}_{\text {goal}}=\{x\in {\mathbb {R}}^4|\Vert (p_x-10,p_y-10)\Vert \le 2\}\) at time \(T=10\) while avoiding two polytopic obstacles \({\mathcal {O}}\) shown in Fig. 3. \({\mathcal {O}}\) is defined by the following constraints:
The dynamics are parametrized by the uncertain parameter vector \(\delta _t=[m,\varphi ]^\top \), where \(m>0\) represents the system’s mass and \(\varphi >0\) is an uncertain drag coefficient. The components of \(\delta _t\) are uncorrelated random variables such that \((m-0.75)/0.5\sim \text {Beta}(2,2)\) and \((\varphi -0.4)/0.2\sim \text {Beta}(2,5)\), where \(\text {Beta}(a,b)\) denotes a Beta distribution with shape parameters (a, b). \(\omega _t\in {\mathbb {R}}^4\) is the uncertain disturbance at time step t, which obeys a multivariate normal distribution with zero mean and covariance matrix
For the cost function, we adopt
Results are shown in Fig. 3 for the different methods with \(\alpha =15\%\). Figure 3 shows 5000 Monte Carlo (MC) simulations of the quadrotor system using the open-loop policy computed by Dirac-Delta (\(\epsilon =\alpha ,\gamma =0.01, N=2000\)), Sample (\(S=5.1\times 10^6,N=2000\)), and GMM (\(L=6,N=2000\)). When using Dirac-Delta, the algorithm gives a deterministic control policy that satisfies the desired success probability \(1-\alpha \). When using Sample or GMM, the algorithm gives a stochastic control policy that satisfies the desired success probability \(1-\alpha \). For the stochastic control policies, the control inputs that generate trajectories passing through the riskier middle corridor between the obstacles are selected randomly. The costs obtained by Sample and GMM are reduced by 8.2% and 7.9%, respectively, compared with Dirac-Delta. This shows that our approach can compute a policy that solves the problem at a lower cost than a deterministic policy.
A more comprehensive comparison between the GMM-based and sample-based approximations is plotted in Fig. 4. Five cases are considered with different sample numbers for extracting the control input. Figure 4a shows that the two algorithms reduce the optimal objective function value similarly. Figure 4b shows the number S of decision-variable samples used in each case. By comparing Fig. 4a, b, we can see that enough samples are required to ensure the performance of the approximations. As shown in Fig. 4c, the computation time increases dramatically as the sample number increases. In this comparison, for GMM, we choose \(L=6\), and the probability integration is approximated by using the same samples as Sample. The computation time of GMM is even longer than that of Sample. One way to decrease the computation time of GMM is to develop fast algorithms for probability integration. We leave this for future work. In this example, the dimension of the decision variable is 20. If the dimension increases, the required sample number will increase, and the computation time will consequently increase for both Sample and GMM. We leave the issue of the “Curse of Dimensionality” for future work.
5 Conclusions
In conclusion, the chance-constrained linear program in probability measure space has been addressed using sample approximation and function approximation. We establish optimization problems in finite-dimensional vector spaces as approximate problems of the chance-constrained linear program in probability measure space. By solving the approximate problems, we can obtain an approximate solution of the chance-constrained linear program in probability measure space. Numerical examples have been implemented to validate the performance of the proposed methods. Future work will focus on the following points:
- To implement the sample approximation method \({\tilde{P}}_{\alpha }({\mathcal {C}}_S,{\mathcal {D}}_N)\), samples of the decision variable are required. As the dimension of the decision variable increases, the sample number required for a good approximation also increases, bringing the issue of the “Curse of Dimensionality.” To overcome it, it is important to develop efficient sampling algorithms that obtain “good but small samples” to ensure good approximation performance and mitigate the computational burden;
- For the Gaussian mixture model-based approximation method \({\hat{P}}_{\alpha }(L,{\mathcal {D}}_N)\), the remaining issue is how to approximate the probability integration with fast algorithms when the problem has a complex cost function and constraint functions in a high-dimensional space.
References
Berthold, H., Heitsch, H., Henrion, R., Schwienteck, J.: On the algorithmic solution of optimization problems subject to probabilistic/robust (probust) constraints. Math. Methods Oper. Res. 96, 1–37 (2022)
Blackmore, L., Ono, M., Williams, B.C.: Chance-constrained optimal path planning with obstacles. IEEE Trans. Rob. 27(6), 1080–1094 (2011)
Bertsekas, D.P., Tsitsiklis, J.N.: Introduction to Probability. Athena Scientific, Belmont (2002)
Billingsley, P.: Probability and Measure. John Wiley & Sons, New York (1995)
Calafiore, G., Campi, M.C.: The scenario approach to robust control design. IEEE Trans. Autom. Control 51(5), 742–753 (2006)
Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncertain convex programs. SIAM J. Optim. 19(3), 1211–1230 (2008)
Campi, M.C., Garatti, S.: A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality. J. Optim. Theory Appl. 148(2), 257–280 (2011)
Campi, M.C., Garatti, S.: A general scenario theory for nonconvex optimization and decision making. IEEE Trans. Autom. Control 63(12), 4067–4078 (2015)
Campi, M.C., Garatti, S., Ramponi, F.A.: Introduction to the Scenario Approach. MOS-SIAM Series on Optimization, Philadelphia (2019)
Castillo-Lopez, M., Ludivig, P., Sajadi-Alamdari, S.A.: A real-time approach for chance-constrained motion planning with dynamic obstacles. IEEE Robotics Autom. Lett. 5(2), 3620–3625 (2020)
Chen, P., Ghattas, O.: Taylor approximation for chance constrained optimization problems governed by partial differential equations with high-dimensional random parameters. SIAM/ASA J. Uncertain. Quantif. 9(4), 1381–1410 (2021)
Farshbaf-Shaker, M.H., Henrion, R., Homber, D.: Properties of chance constraints in infinite dimensions with an application to PDE constrained optimization. Set Valued Var. Anal. 26, 821–841 (2018)
Geletu, A., Hoffmann, A., Kloppel, M., Li, P.: An inner-outer approximation approach to chance constrained optimization. SIAM J. Optim. 27(3), 1834–1857 (2017)
Geletu, A., Hoffmann, A., Schmidt, P., Li, P.: Chance constrained optimization of elliptic PDE systems with a smoothing convex approximation. ESAIM Control Optim. Calc. Var. 26(70) (2020)
Grandon, T.G., Henrion, R., Perez-Aros, P.: Dynamic probabilistic constraints under continuous random distributions. Math. Program. 196, 1065–1096 (2022)
Gryazina, E., Polyak, B.: Random sampling: Billiard walk algorithm. Eur. J. Oper. Res. 238, 497–504 (2014)
Hewing, L., Kabzan, J., Zeilinger, M.N.: Cautious model predictive control using Gaussian process regression. IEEE Trans. Control Syst. Technol. 28(6), 2736–2743 (2020)
Kibzun, A.I., Kan, Y.S.: Stochastic Programming Problems. Wiley, West Sussex (1996)
Lew, T., Sharma, A., Harrison, J., Bylard, A., Pavone, M.: Safe active dynamics learning and control: a sequential exploration-exploitation framework. IEEE Trans. Robotics 38(5), 2888–2907 (2022)
Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM J. Optim. 19(2), 674–699 (2008)
Luedtke, J., Ahmed, S., Nemhauser, G.L.: An integer programming approach for linear programs with probabilistic constraints. Math. Program. 122, 247–272 (2010)
Molchanov, I.: Theory of Random Sets. Springer, London (2005)
Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17, 969–996 (2007)
Ono, M., Pavone, M., Kuwata, Y., Balaram, J.: Chance-constrained dynamic programming with application to risk-aware robotic space exploration. Auton. Robots 39(4), 555–571 (2015)
Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Computational study of a chance constrained portfolio selection problem. J. Optim. Theory Appl. 142(2), 399–416 (2009)
Pena-Ordieres, A., Luedtke, J., Wachter, A.: Solving chance-constrained problems via a smooth sample-based nonlinear approximation. SIAM J. Optim. 30(3), 2221–2250 (2020)
Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming: Modeling and Theory, 2nd edn. SIAM, Philadelphia (2014)
Shen, X., Ouyang, T., Zhang, Y., Zhang, X.: Computing probabilistic bounds on state trajectories for uncertain systems. IEEE Trans. Emerg. Top. Comput. Intell. 7(1), 285–290 (2023)
Shen, X., Ouyang, T., Yang, N., Zhuang, J.: Sample-based neural approximation approach for probabilistic constrained programs. IEEE Trans. Neural Netw. Learn. Syst. 34(2), 1058–1065 (2023)
Smith, R.L.: Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions. Oper. Res. 32(6), 1296–1308 (1984)
Sun, Y., Aw, G., Loxton, R., Teo, K.L.: Chance-constrained optimization for pension fund portfolios in the presence of default risk. Eur. J. Oper. Res. 256(1), 205–214 (2017)
Thorpe, A.J., Lew, T., Oishi, M.M.K., Pavone, M.: Data-driven chance constrained control using kernel distribution embeddings. Proc. Mach. Learn. Res. 144, 1–13 (2022)
van Ackooij, W., Henrion, R.: Gradient formulae for nonlinear probabilistic constraints with Gaussian and Gaussian-like distributions. SIAM J. Optim. 24, 1864–1889 (2014)
van Ackooij, W., Perez-Aros, P.: Generalized differentiation of probability functions acting on an infinite system of constraints. SIAM J. Optim. 29(3), 2179–2210 (2019)
van Ackooij, W., Henrion, R., Perez-Aros, P.: Generalized gradients for probabilistic/robust (probust) constraints. Optimization 69(7–8), 1451–1479 (2020)
Wu, C., Teo, K.L., Wu, S.: Min-max optimal control of linear systems with uncertainty and terminal state constraints. Automatica 49(6), 1809–1815 (2013)
Zeevi, A.J., Meir, R.: Density estimation through convex combinations of densities: approximation and estimation bounds. SIAM J. Optim. 10(1), 99–109 (1997)
Acknowledgements
We thank two anonymous reviewers for taking the precious time and effort to review our manuscript and give us valuable suggestions.
Funding
Open access funding provided by Osaka University.
Communicated by Nobuo Yamashita.
Shen, X., Ito, S. Approximate Methods for Solving Chance-Constrained Linear Programs in Probability Measure Space. J Optim Theory Appl 200, 150–177 (2024). https://doi.org/10.1007/s10957-023-02342-w