Abstract
We propose a stochastic approximation method for approximating the efficient frontier of chance-constrained nonlinear programs. Our approach is based on a bi-objective viewpoint of chance-constrained programs that seeks solutions on the efficient frontier of optimal objective value versus risk of constraint violation. To this end, we construct a reformulated problem whose objective is to minimize the probability of constraint violation subject to deterministic convex constraints (which include a bound on the objective function value). We adapt existing smoothing-based approaches for chance-constrained problems to derive a convergent sequence of smooth approximations of our reformulated problem, and apply a projected stochastic subgradient algorithm to solve it. In contrast with exterior sampling-based approaches (such as sample average approximation) that approximate the original chance-constrained program with one having finite support, our proposal converges to stationary solutions of a smooth approximation of the original problem, thereby avoiding poor local solutions that may be an artefact of a fixed sample. Our proposal also includes a tailored implementation of the smoothing-based approach that chooses key algorithmic parameters based on problem data. Computational results on four test problems from the literature indicate that our proposed approach can efficiently determine good approximations of the efficient frontier.
Notes
We make this notion more precise in Sect. 4.2.
Although Davis and Drusvyatskiy [21] establish that mini-batching is not necessary for the convergence of the projected stochastic subgradient algorithm, a small mini-batch can greatly enhance the performance of the algorithm in practice.
The scenario approximation problems solved to global optimality using SCIP in Case study 4 were run on a different laptop running Ubuntu 16.04 with a 2.6 GHz four-core Intel i7 CPU and 8 GB of RAM, due to interfacing issues on Windows.
References
Adam, L., Branda, M., Heitsch, H., Henrion, R.: Solving joint chance constrained problems using regularization and Benders’ decomposition. Ann. Oper. Res., pp. 1–27 (2018), https://doi.org/10.1007/s10479-018-3091-9
Adam, L., Branda, M.: Machine learning approach to chance-constrained problems: An algorithm based on the stochastic gradient descent (2018). http://www.optimization-online.org/DB_HTML/2018/12/6983.html (Last accessed April 1, 2019)
Adam, L., Branda, M.: Nonlinear chance constrained problems: optimality conditions, regularization and solvers. J. Optim. Theory Appl. 170(2), 419–436 (2016)
Amestoy, P.R., Duff, I.S., L’Excellent, J.Y., Koster, J.: MUMPS: a general purpose distributed memory sparse solver. In: International Workshop on Applied Parallel Computing. Springer, pp. 121–130 (2000)
Andrieu, L., Cohen, G., Vázquez-Abad, F.: Stochastic programming with probability constraints (2007). arXiv preprint arXiv:0708.0281
Benders, J.F.: Partitioning procedures for solving mixed-variables programming problems. Numer. Math. 4(1), 238–252 (1962)
Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)
Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: A fresh approach to numerical computing. SIAM Rev. 59(1), 65–98 (2017)
Bienstock, D., Chertkov, M., Harnett, S.: Chance-constrained optimal power flow: risk-aware network control under uncertainty. SIAM Rev. 56(3), 461–495 (2014)
Calafiore, G., Campi, M.C.: Uncertain convex programs: randomized solutions and confidence levels. Math. Program. 102(1), 25–46 (2005)
Calafiore, G.C., Dabbene, F., Tempo, R.: Research on probabilistic methods for control system design. Automatica 47(7), 1279–1293 (2011)
Campi, M.C., Garatti, S.: A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality. J. Optim. Theory Appl. 148(2), 257–280 (2011)
Cao, Y., Zavala, V.: A sigmoidal approximation for chance-constrained nonlinear programs (2017). http://www.optimization-online.org/DB_FILE/2017/10/6236.pdf. Last accessed: April 1, 2019
Censor, Y., Chen, W., Combettes, P.L., Davidi, R., Herman, G.T.: On the effectiveness of projection methods for convex feasibility problems with linear inequality constraints. Comput. Optim. Appl. 51(3), 1065–1088 (2012)
Charnes, A., Cooper, W.W., Symonds, G.H.: Cost horizons and certainty equivalents: an approach to stochastic programming of heating oil. Manag. Sci. 4(3), 235–263 (1958)
Chen, W., Sim, M., Sun, J., Teo, C.P.: From CVaR to uncertainty set: implications in joint chance-constrained optimization. Oper. Res. 58(2), 470–485 (2010)
Clarke, F.H.: Optimization and Nonsmooth Analysis, vol 5. SIAM, Philadelphia (1990)
Condat, L.: Fast projection onto the simplex and the \(\ell _1\) ball. Math. Program. 158(1–2), 575–585 (2016)
Curtis, F.E., Wächter, A., Zavala, V.M.: A sequential algorithm for solving nonlinear optimization problems with chance constraints. SIAM J. Optim. 28(1), 930–958 (2018)
Davis, D., Drusvyatskiy, D., Kakade, S., Lee, J.D.: Stochastic subgradient method converges on tame functions (2018). arXiv preprint arXiv:1804.07795
Davis, D., Drusvyatskiy, D.: Stochastic subgradient method converges at the rate \(O\left(k^{-1/4}\right)\) on weakly convex functions (2018). arXiv preprint arXiv:1802.02988
Dentcheva, D., Martinez, G.: Regularization methods for optimization problems with probabilistic constraints. Math. Program. 138(1–2), 223–251 (2013)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1311-3
Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
Ermoliev, Y.M.: Stochastic quasigradient methods. In: Ermoliev, Y.M., Wets, R.J. (eds). Numerical Techniques for Stochastic Optimization. Springer, Berlin, pp. 141–185 (1988)
Ermoliev, Y.M., Norkin, V.I.: On nonsmooth and discontinuous problems of stochastic systems optimization. Eur. J. Oper. Res. 101(2), 230–244 (1997)
Ermoliev, Y.M., Norkin, V.: Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization. Cybern. Syst. Anal. 34(2), 196–215 (1998)
Ermoliev, Y.M., Norkin, V.I., Wets, R.J.: The minimization of semicontinuous functions: mollifier subgradients. SIAM J. Control Optim. 33(1), 149–167 (1995)
Geletu, A., Hoffmann, A., Kloppel, M., Li, P.: An inner–outer approximation approach to chance constrained optimization. SIAM J. Optim. 27(3), 1834–1857 (2017)
Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155(1–2), 267–305 (2016)
Gleixner, A., Bastubbe, M., Eifler, L., Gally, T., Gamrath, G., Gottwald, R.L., Hendel, G., Hojny, C., Koch, T., Lübbecke, M.E., Maher, S.J., Miltenberger, M., Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D., Schlösser, F., Schubert, C., Serrano, F., Shinano, Y., Viernickel, J.M., Walter, M., Wegscheider, F., Witt, J.T., Witzig, J.: The SCIP Optimization Suite 6.0. Technical report, Optimization Online (2018). http://www.optimization-online.org/DB_HTML/2018/07/6692.html
Gotzes, C., Heitsch, H., Henrion, R., Schultz, R.: On the quantification of nomination feasibility in stationary gas networks with random load. Math. Methods Oper. Res. 84(2), 427–457 (2016)
Gurobi Optimization LLC: Gurobi Optimizer Reference Manual (2018). http://www.gurobi.com
Hong, L.J., Yang, Y., Zhang, L.: Sequential convex approximations to joint chance constrained programs: a Monte Carlo approach. Oper. Res. 59(3), 617–630 (2011)
Hu, Z., Hong, L.J., Zhang, L.: A smooth Monte Carlo approach to joint chance-constrained programs. IIE Trans. 45(7), 716–735 (2013)
Jiang, R., Guan, Y.: Data-driven chance constrained stochastic program. Math. Program. 158(1–2), 291–327 (2016)
Lagoa, C.M., Li, X., Sznaier, M.: Probabilistically constrained linear programs and risk-adjusted controller design. SIAM J. Optim. 15(3), 938–951 (2005)
Lepp, R.: Extremum problems with probability functions: Kernel type solution methods. In: Floudas CA, Pardalos PM (eds) Encyclopedia of Optimization. Springer, Berlin, pp. 969–973 (2009). https://doi.org/10.1007/978-0-387-74759-0_170
Li, P., Arellano-Garcia, H., Wozny, G.: Chance constrained programming approach to process optimization under uncertainty. Comput. Chem. Eng. 32(1–2), 25–45 (2008)
Luedtke, J.: A branch-and-cut decomposition algorithm for solving chance-constrained mathematical programs with finite support. Math. Program. 146(1–2), 219–244 (2014)
Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM J. Optim. 19(2), 674–699 (2008)
Miller, B.L., Wagner, H.M.: Chance constrained programming with joint constraints. Oper. Res. 13(6), 930–945 (1965)
Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17(4), 969–996 (2006)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, London (1983)
Norkin, V.I.: The analysis and optimization of probability functions. Tech. rep., IIASA Working Paper, WP-93-6 (1993)
Nurminskii, E.: The quasigradient method for the solving of the nonlinear programming problems. Cybernetics 9(1), 145–150 (1973)
Peña-Ordieres, A., Luedtke, J.R., Wächter, A.: Solving chance-constrained problems via a smooth sample-based nonlinear approximation (2019)
Prékopa, A.: On probabilistic constrained programming. In: Proceedings of the Princeton Symposium on Mathematical Programming. Princeton, pp. 113–138 (1970)
Prékopa, A.: Stochastic Programming, vol. 324. Springer, Berlin (1995)
Rafique, H., Liu, M., Lin, Q., Yang, T.: Non-convex min–max optimization: provable algorithms and applications in machine learning (2018). arXiv preprint arXiv:1810.02060
Rengarajan, T., Morton, D.P.: Estimating the efficient frontier of a probabilistic bicriteria model. In: Winter Simulation Conference, pp. 494–504 (2009)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Berlin (2009)
Ruben, H.: Probability content of regions under spherical normal distributions, IV: the distribution of homogeneous and non-homogeneous quadratic functions of normal variables. Ann. Math. Stat. 33(2), 542–570 (1962)
Shan, F., Zhang, L., Xiao, X.: A smoothing function approach to joint chance-constrained programs. J. Optim. Theory Appl. 163(1), 181–199 (2014)
Shan, F., Xiao, X., Zhang, L.: Convergence analysis on a smoothing approach to joint chance constrained programs. Optimization 65(12), 2171–2193 (2016)
Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia (2009)
van Ackooij, W., Henrion, R.: Gradient formulae for nonlinear probabilistic constraints with Gaussian and Gaussian-like distributions. SIAM J. Optim. 24(4), 1864–1889 (2014)
van Ackooij, W., Henrion, R.: (Sub-)Gradient formulae for probability functions of random inequality systems under Gaussian distribution. SIAM/ASA J. Uncertain. Quantif. 5(1), 63–87 (2017)
van Ackooij, W., Frangioni, A., de Oliveira, W.: Inexact stabilized Benders’ decomposition approaches with application to chance-constrained problems with finite support. Comput. Optim. Appl. 65(3), 637–669 (2016)
van Ackooij, W., Berge, V., de Oliveira, W., Sagastizábal, C.: Probabilistic optimization via approximate p-efficient points and bundle methods. Comput. Oper. Res. 77, 177–193 (2017)
Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Zhang, S., He, N.: On the convergence rate of stochastic mirror descent for nonsmooth nonconvex optimization (2018). arXiv preprint arXiv:1806.04781
Zhang, H., Li, P.: Chance constrained programming for optimal power flow under uncertainty. IEEE Trans. Power Syst. 26(4), 2417–2424 (2011)
Acknowledgements
The authors thank the anonymous reviewers for suggestions that improved the paper. R.K. also thanks Rui Chen, Eli Towle, and Clément Royer for helpful discussions.
This research is supported by the Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Contract No. DE-AC02-06CH11357.
Appendices
Algorithm outlines
1.1 Implementation of the scenario approximation algorithm
Algorithm 7 details our implementation of the tuned scenario approximation algorithm. Instead of solving each scenario approximation problem instance using an off-the-shelf solver by imposing all of the sampled constraints at once, we adopt a cutting-plane approach that iteratively adds some of the violated scenario constraints at a candidate solution to the list of enforced constraints in a bid to reduce the overall computational effort (cf. Algorithm 2.6 in Bienstock et al. [9]). We use \(M = 50\) iterations and \(R = 20\) replicates for our computational experiments. Settings for sample sizes \(N_i\), constraints added per iteration \(N_c\), and number of samples \(N_{MC}\) are problem dependent and provided below. Line 19 in Algorithm 7, which estimates the risk level of a candidate solution, may be replaced, if possible, by one that computes the true analytical risk level or a numerical estimate of it [59].
- \(N_i = \lceil 10^{a_i}\rceil \), \(i = 1,\ldots ,50\), where: for Case study 1: \(a_i = 1 + \frac{5}{49}(i-1)\); for Case study 2: \(a_i = 1 + \frac{4}{49}(i-1)\); for Case study 3: \(a_i = 1 + \frac{\log _{10}(50000)-1}{49}(i-1)\); for Case study 4: \(a_i = 1 + \frac{4}{49}(i-1)\); and for Case study 5: \(a_i = 1 + \frac{4}{49}(i-1)\) (the upper bounds on \(N_i\) were determined based on memory requirements)
- \(N_c\): Case study 1: 1000; Case study 2: 100000; Case study 3: 10; Case study 4: 5; and Case study 5: 10 (these values were tuned for good performance)
- \(N_{MC}\): See Sect. 5.1 of the paper (we use the same Monte Carlo samples that were used by our proposed method to estimate risk levels).
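The cutting-plane loop described above can be sketched generically as follows. This is an illustrative skeleton (not the authors' implementation): `solve_relaxation` and `violated` are hypothetical callbacks standing in for the scenario approximation subproblem solve and the constraint-violation check, respectively.

```python
def scenario_cutting_plane(solve_relaxation, violated, scenarios, N_c, max_rounds=100):
    """Generic cutting-plane loop for a scenario approximation problem.

    solve_relaxation(enforced) -- returns a candidate solution of the problem
        with only the scenario constraints in `enforced` imposed.
    violated(x, s) -- True if scenario constraint s is violated at x.
    Each round adds at most N_c violated scenarios to the working set.
    """
    enforced = set()
    x = solve_relaxation(enforced)
    for _ in range(max_rounds):
        # collect scenarios still violated at the current candidate
        viol = [s for s in scenarios if s not in enforced and violated(x, s)]
        if not viol:
            return x, enforced          # x satisfies all sampled constraints
        enforced.update(viol[:N_c])     # add up to N_c violated scenarios
        x = solve_relaxation(enforced)
    return x, enforced
```

For instance, for a one-dimensional toy problem "minimize x subject to x >= s for each sampled s", `solve_relaxation` just returns the largest enforced threshold, and the loop terminates once all sampled thresholds are covered.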
1.2 Solving (CCP) for a fixed risk level
Algorithm 8 adapts Algorithm 2 to solve (CCP) for a given risk level \(\hat{\alpha } \in (0,1)\). An initial point \(\bar{x}^0\) and an upper bound \(\nu _{up}\) on the optimal objective value can be obtained in a manner similar to Algorithm 2, whereas an initial lower bound \(\nu _{low}\) on the optimal objective value can be obtained either by using lower bounding techniques (see [41, 43]) or by trial and error. Note that the sequence of approximations in line 9 of Algorithm 8 need not be solved until termination if Algorithm 3 determines that \(\bar{\alpha }^i < \hat{\alpha }\) before its termination criteria have been satisfied.
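The driver behind such an adaptation is a bisection on the objective bound. The sketch below is our own illustration of that idea, not the paper's Algorithm 8: `estimate_min_risk(nu)` is a hypothetical stand-in for a run of the stochastic approximation method that returns (an estimate of) the smallest attainable risk level under the objective bound \(\nu \); since the attainable risk is nonincreasing in \(\nu \), bisection applies.

```python
def solve_fixed_risk(alpha_hat, nu_low, nu_up, estimate_min_risk, tol=1e-3):
    """Bisection on the objective bound nu to hit a target risk alpha_hat.

    Maintains [nu_low, nu_up] such that the risk target is unattainable at
    nu_low but attainable at nu_up, and shrinks the bracket by bisection.
    """
    while nu_up - nu_low > tol * max(1.0, abs(nu_up)):
        nu = 0.5 * (nu_low + nu_up)
        if estimate_min_risk(nu) <= alpha_hat:
            nu_up = nu      # bound is loose enough: tighten from above
        else:
            nu_low = nu     # risk target unattainable at this bound
    return nu_up            # smallest bound found that meets the risk target
```

As a sanity check, with the toy risk curve \(\alpha (\nu ) = \max \{0, 1-\nu \}\) and target \(\hat{\alpha } = 0.3\), the bisection recovers \(\nu \approx 0.7\).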
1.3 Implementation of the sigmoidal approximation algorithm
We used the following ‘tuned’ settings when solving the sigmoidal approximation problem at each iteration (see Section 4 of Cao and Zavala [13]) using IPOPT: tol \(= 10^{-4}\), max_iter = 10000, hessian_approximation = limited_memory, jac_c_constant=yes, and max_cpu_time=3600 s. We terminated the loop of Algorithm SigVar-Alg of Cao and Zavala [13] when the objective improved by less than \(0.01\%\) relative to the previous iteration. In what follows, we use the notation of Algorithm SigVar-Alg of Cao and Zavala [13]. For our ‘adaptive \(\gamma \)’ setting, we use the proposal of Cao and Zavala [13] to specify \(\gamma \) when the solution of the CVaR problem corresponds to \(t_c(\alpha ) < 0\). When \(t_c(\alpha ) = 0\) at the CVaR solution returned by Gurobi, we estimate a good value of \(t_c(\alpha )\) by examining the true distribution of \(g(x_c(\alpha ),\xi )\).
1.4 Tailored projection step for Case study 2
Because Algorithm 2 requires projecting onto the sets \(X_{\nu } := \left\{ x \in \varDelta _N : {x}^\text {T} \varSigma x \le \nu \right\} \) many times for Case study 2 with different values of \(\nu \), we develop a tailored projection routine that is computationally more efficient than using a general-purpose NLP solver to solve these quadratically-constrained quadratic programs. To project a point \(y \in {\mathbb {R}}^N\) onto \(X_{\nu }\) for some \(\nu > 0\), note that the KKT conditions for the projection problem \({\mathop {\min }\nolimits _{x \in X_{\nu }}} 0.5\Vert x-y\Vert ^2\) (with \(\varSigma = \text {diag}\left( \sigma ^2_1,\ldots ,\sigma ^2_N\right) \)) yield:
\[
\begin{aligned}
&x_i - y_i + \mu + 2\lambda \sigma ^2_i x_i - \pi _i = 0, \quad \pi _i x_i = 0, \quad \pi _i \ge 0, \quad x_i \ge 0, \quad \forall i \in \{1,\ldots ,N\},\\
&\sum _{i=1}^{N} x_i = 1, \qquad \lambda \ge 0, \qquad \sum _{i=1}^{N} \sigma ^2_i x_i^2 \le \nu , \qquad \lambda \left( \sum _{i=1}^{N} \sigma ^2_i x_i^2 - \nu \right) = 0,
\end{aligned}
\]
where \(\mu \), \(\lambda \), and \((\pi _1,\ldots ,\pi _N)\) denote KKT multipliers. Note that the inequalities in the first row can be simplified as \(x_i = \max \left\{ 0,\frac{y_i-\mu }{1+2\sigma ^2_i\lambda }\right\} \), \(\forall i \in \{1,\ldots ,N\}\). We first check whether \(\lambda = 0\) satisfies the above system of equations (i.e., whether the quadratic constraint is inactive at the solution) by setting x to be the projection of y onto the unit simplex. If \(\lambda = 0\) is infeasible, we solve the following system of equations by using binary search over \(\lambda \):
\[
x_i = \max \left\{ 0,\frac{y_i-\mu }{1+2\sigma ^2_i\lambda }\right\} , \quad \forall i \in \{1,\ldots ,N\}, \qquad \sum _{i=1}^{N} x_i = 1, \qquad \sum _{i=1}^{N} \sigma ^2_i x_i^2 = \nu ,
\]
where for each fixed \(\lambda > 0\), the first two sets of equations in x and \(\mu \) are solved by adapting the algorithm of Condat [18] for projection onto the unit simplex. Once solved, if the solution (say \(x^*\)) satisfies \(\sum _{i=1}^{N} \sigma ^2_i (x^*_i)^2 < \nu \), then we conclude the current \(\lambda \) is too large, and if \(\sum _{i=1}^{N} \sigma ^2_i (x^*_i)^2 > \nu \) then we conclude the current \(\lambda \) is too small.
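A minimal NumPy sketch of this routine (our own illustration, not the authors' code, assuming a diagonal \(\varSigma \) with entries \(\sigma ^2_i\) and that \(X_{\nu }\) has nonempty interior) is:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the unit simplex (sort-based variant of
    the family of algorithms that includes Condat's method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    mu = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - mu, 0.0)

def project_X_nu(y, sigma2, nu, tol=1e-10, n_bisect=200):
    """Project y onto {x in simplex : sum_i sigma2[i] * x_i^2 <= nu}."""
    x = project_simplex(y)                       # first try lambda = 0
    if np.dot(sigma2, x**2) <= nu + tol:
        return x                                 # quadratic constraint inactive

    def x_of(lam):
        # For fixed lam: x_i = max{0, (y_i - mu)/(1 + 2 sigma_i^2 lam)} with
        # sum_i x_i = 1; solve for mu by bisection on the simplex condition.
        denom = 1.0 + 2.0 * sigma2 * lam
        mu_lo = np.min(y) - np.max(denom)        # guarantees sum_i x_i >= 1
        mu_hi = np.max(y)                        # gives x = 0, sum < 1
        for _ in range(100):
            mu = 0.5 * (mu_lo + mu_hi)
            if np.maximum((y - mu) / denom, 0.0).sum() > 1.0:
                mu_lo = mu
            else:
                mu_hi = mu
        return np.maximum((y - mu_hi) / denom, 0.0)

    lam_lo, lam_hi = 0.0, 1.0
    while np.dot(sigma2, x_of(lam_hi) ** 2) > nu:
        lam_hi *= 2.0                            # grow until constraint holds
    for _ in range(n_bisect):                    # binary search over lambda
        lam = 0.5 * (lam_lo + lam_hi)
        if np.dot(sigma2, x_of(lam) ** 2) > nu:
            lam_lo = lam                         # current lambda too small
        else:
            lam_hi = lam                         # current lambda too large
    return x_of(lam_hi)
```

The inner bisection on \(\mu \) exploits the fact that \(\sum _i \max \{0,(y_i-\mu )/(1+2\sigma ^2_i\lambda )\}\) is nonincreasing in \(\mu \), and the outer bisection the fact that \(\sum _i \sigma ^2_i x_i(\lambda )^2\) is nonincreasing in \(\lambda \).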
Proof of Proposition 6
We first outline the proof (note that the proof itself will proceed in a different order, for reasons that will become evident). We will establish that (see Chapters 4 and 5 of Rockafellar and Wets [54] for definitions of the technical terms)
Then, because of the outer semicontinuity of the normal cone mapping, it suffices to prove that the outer limit of \(\partial {\hat{p}}_k(x)\) is a subset of \(\{\nabla p(\bar{x})\}\). To demonstrate this, we first show that \(\partial {\hat{p}}_k(x) = - \partial \int _{-\infty }^{\infty } F(x,\eta ) d\phi \left( \eta ;\tau _c^{k-1}\right) d\eta \), where F is the cumulative distribution function of \(\max \left[ \bar{g}(x,\xi )\right] \). We then split this integral into two parts: one accounting for tail contributions, which we show vanishes when we take the outer limit, and the other accounting for the contributions of F near \((x,\eta ) = (\bar{x},0)\), which we show satisfies the desired outer semicontinuity property.
Since \({\hat{p}}_k(x)\) can be rewritten as \({\hat{p}}_k(x) = {\mathbb {E}}\big [{\phi \left( \max \left[ \bar{g}(x,\xi )\right] ;\tau _c^{k-1}\right) }\big ]\) by Assumptions 4B and 6, we have
where the second line is to be interpreted as a Lebesgue–Stieltjes integral, and the final step follows by integrating by parts and Assumption 4. This yields (see Proposition 5 and Assumption 6 for the existence of these quantities)
by the properties of the Clarke generalized gradient, where \(\{\varepsilon _k\}\) is defined in Assumption 6.
Let \(\{x_k\}\) be any sequence of points in \(X_{\nu }\) converging to \(\bar{x} \in X_{\nu }\). Suppose \(v_k \in - \partial \int _{|\eta | \ge \varepsilon _k} F(x_k,\eta ) d\phi \left( \eta ;\tau _c^{k-1}\right) d\eta \) with \(v_k \rightarrow v \in {\mathbb {R}}^n\). We would like to show that \(v = 0\). Note that by an abuse of notation (where we actually take norms of the integrable selections that define \(v_k\))
where the first step follows from Theorem 2.7.2 of Clarke [17] (whose assumptions are satisfied by virtue of Assumption 6B), the second inequality follows from Proposition 2.1.2 of Clarke [17], and the final equality follows from Assumption 6C.
The above arguments establish that \(\limsup \nolimits _{\begin{array}{c} x \rightarrow \bar{x} \\ k \rightarrow \infty \end{array}}\, - \partial \int _{|\eta | \ge \varepsilon _k} F(x,\eta ) d\phi \left( \eta ;\tau _c^{k-1}\right) d\eta = \{0\}\). Consequently, we have \(\limsup \nolimits _{\begin{array}{c} x \rightarrow \bar{x} \\ k \rightarrow \infty \end{array}}\, \partial {\hat{p}}_k(x) \subset \limsup \nolimits _{\begin{array}{c} x \rightarrow \bar{x} \\ k \rightarrow \infty \end{array}}\, - \partial \int _{-\varepsilon _k}^{\varepsilon _k} F(x,\eta ) d\phi \left( \eta ;\tau _c^{k-1}\right) d\eta \) from Eq. (4). We now consider the outer limit \(\limsup \nolimits _{\begin{array}{c} x \rightarrow \bar{x} \\ k \rightarrow \infty \end{array}}\, - \partial \int _{-\varepsilon _k}^{\varepsilon _k} F(x,\eta ) d\phi \left( \eta ;\tau _c^{k-1}\right) d\eta \). Suppose \(w_k \in - \partial \int _{-\varepsilon _k}^{\varepsilon _k} F(x_k,\eta ) d\phi \left( \eta ;\tau _c^{k-1}\right) d\eta \) with \(w_k \rightarrow w \in {\mathbb {R}}^n\). We wish to show that \(w = \nabla p(\bar{x})\). Invoking Theorem 2.7.2 of Clarke [17] once again, we have that
where for each \(k \in {\mathbb {N}}\) large enough, \(\hat{\phi }_k:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is defined as \(\hat{\phi }_k\left( y\right) = \dfrac{\phi \left( y;\tau _c^{k-1}\right) }{\int _{-\varepsilon _k}^{\varepsilon _k} d\phi \left( z;\tau _c^{k-1}\right) dz}\), the first step follows (by an abuse of notation) from the fact that \(\varepsilon _k < \theta \) for k large enough (see Assumption 6B), and the third and fourth steps follow from Assumption 6C. Noting that
for some constants \(\omega _{i,k} \in (-\varepsilon _k,\varepsilon _k)\), \(i = 1,\ldots ,n\), by virtue of Assumption 6B and the first mean value theorem for definite integrals, we have
where the first equality above follows from Assumption 6B, and the second equality follows from the fact that \(p(x) = 1 - F(x,0)\) for each \(x \in X_{\nu }\). Reconciling our progress with Eq. (4), we obtain
To establish the desired equality in Eq. (3), it suffices to show that \(\partial {\hat{p}}_k(x_k) \subset C\) for k large enough, where \(C \subset {\mathbb {R}}^n\) is independent of k. From Eq. (5), we have that the first term in the right-hand side of Eq. (4) is contained in a bounded set that is independent of \(k \in {\mathbb {N}}\). From Eq. (6), we have that, for any \(\delta > 0\), any element of the second term in the right-hand side of Eq. (4) is bounded above in norm by \(\max \nolimits _{\begin{array}{c} x \in {\text {cl}}\left( B_{\delta }(\bar{x})\right) \\ \eta \in [-0.5\theta ,0.5\theta ] \end{array}} \Vert \nabla _x F(x,\eta )\Vert \) for k large enough. The above arguments in conjunction with Eq. (4) establish Eq. (3). The desired result then follows from the outer semicontinuity of the normal cone mapping; see Proposition 6.6 of Rockafellar and Wets [54]. \(\square \)
Recourse formulation
As noted in Sect. 1 of the paper, Problem (CCP) can also be used to model chance-constrained programs with static recourse decisions, e.g., by defining g through the solution of the following auxiliary optimization problem for each \((x,\xi ) \in X \times \varXi \):
where \(T: {\mathbb {R}}^n \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{m_r}\) is continuously differentiable, and \(W:{\mathbb {R}}^d \rightarrow {\mathbb {R}}^{m_r \times n_y}\). Note that g can be recast in the form of a joint chance constraint by appealing to linear programming duality, viz.,
where \(\mathrm {EXT}(V(\xi ))\) denotes the set of extreme points of the polytope
Reformulating the recourse constraints into the above explicit form is generally impractical, since Algorithms 4 and 5 rely on stepping through each of the constraint functions and the cardinality of the set of extreme points of \(V(\xi )\) may be huge. Therefore, we consider the case when the approximations \({\hat{p}}_k\) are constructed using a single smoothing function, i.e., \({\hat{p}}_k(x) = {\mathbb {E}}\big [{\phi _k(g(x,\xi ))}\big ]\), where \(\{\phi _k\}\) is a sequence of smooth scalar functions that approximate the step function.
Throughout this section, we assume that g can be reformulated as \(g(x,\xi ) := \underset{j \in J}{\max } \,\, h_j(x,\xi )\), where J is a (potentially large) finite index set and \(h_j:{\mathbb {R}}^n \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) are continuously differentiable functions that are Lipschitz continuous and have Lipschitz continuous gradients in the sense of Assumption 3. We let \(L^{'}_h(\xi )\) denote the Lipschitz constant of \(\nabla h(\cdot ,\xi )\) on \(X_{\nu }\), with \({\mathbb {E}}\big [{\left( L^{'}_h(\xi )\right) ^2}\big ] < +\infty \). While the theory in this section also applies to the ordinary joint chance-constrained setting, it is usually advantageous in that case to use individual smoothing functions for each chance constraint function whenever feasible.
We first characterize the Clarke generalized gradient of the approximation \({\hat{p}}_k\) under this setup.
Proposition 11
Suppose Assumptions 4 and 5 and the above assumptions on \(g(x,\xi ) = \max \left[ h(x,\xi )\right] \) and \(\phi _k\) hold. Then \(\partial {\hat{p}}_k(x) = {\mathbb {E}}\big [{d\phi _k\left( \max \left[ h(x,\xi )\right] \right) \times \partial _x \max \left[ h(x,\xi )\right] }\big ]\).
Proof
Note that for any \((x,\xi ) \in X \times \varXi \), we have \(\partial _x \phi _k(g(x,\xi )) = d\phi _k(g(x,\xi )) \, \partial _x \max \left[ h(x,\xi )\right] \), see Theorem 2.3.9 of Clarke [17]. A justification for swapping the derivative and expectation operators then follows from Proposition 5 of the paper and the fact that \({\hat{p}}_k(x) = {\mathbb {E}}\big [{\phi _k(g(x,\xi ))}\big ] = {\mathbb {E}}\big [{\underset{j}{\max }\left[ \phi _k\left( g_j(x,\xi )\right) \right] }\big ]\) since \(\phi _k\) is monotonically nondecreasing on \({\mathbb {R}}\). \(\square \)
The following result establishes that the approximation \({\hat{p}}_k\) continues to enjoy the weak convexity property for the above setup under mild assumptions.
Proposition 12
Suppose Assumptions 1, 4, and 5 hold. Additionally, suppose \(g(x,\xi ) = \max \left[ h(x,\xi )\right] \) satisfies the above conditions and \(\phi _k\) is a scalar smoothing function. Then \({\hat{p}}_k(\cdot )\) is \(\bar{L}_k\)-weakly convex for some constant \(\bar{L}_k\) that depends on \(L^{'}_h(\xi )\), \(L^{'}_{\phi ,k}\), \(M^{'}_{\phi ,k}\), and the diameter of \(X_{\nu }\).
Proof
First, note that \(g(\cdot ,\xi ) := \max \left[ h(\cdot ,\xi )\right] \) is \(L^{'}_h(\xi )\)-weakly convex on \(X_{\nu }\) for each \(\xi \in \varXi \), see Lemma 4.2 of Drusvyatskiy and Paquette [23]. In particular, this implies that for any \(y, z \in X_{\nu }\),
for any \(s_y(\xi ) \in \partial \max \left[ h(y,\xi )\right] \). The Lipschitz continuity of \(d\phi _k(\cdot )\) implies that for any scalars v and w:
From the monotonicity Assumption 4B and by recursively applying the above result, we have for any \(\xi \in \varXi \) and \(y, z \in X_{\nu }\):
Taking expectation on both sides and noting that \(\Vert s_y(\xi )\Vert \le L^{'}_h(\xi )\), we get
Therefore, \({\hat{p}}_k\) is \(\bar{L}_k\)-weakly convex on \(X_{\nu }\) with
\(\square \)
We now outline our proposal for estimating the weak convexity parameter \(\bar{L}_k\) of \({\hat{p}}_k\) in the above setting. First, estimates of the constants \(L^{'}_{\phi ,k}\) and \(M^{'}_{\phi ,k}\) can be obtained from Proposition 7. Next, we propose to replace the unknown constant \(\text {diam}\left( X_{\nu }\right) \) with the diameter 2r of the sampling ball in Algorithm 5. Finally, estimating the Lipschitz constant \(L^{'}_h(\xi )\) for any \(\xi \in \varXi \) is difficult, since it would in general involve examining all \(|J|\) constraints, which is precisely what we wish to avoid. To circumvent this, we propose to replace \(L^{'}_h(\xi )\) in \(\bar{L}_k\) with a sampling-based estimate of the local Lipschitz constant of our choice of the B-subdifferential element of \(g(\cdot ,\xi )\) (similar to the proposal in Algorithm 5). When g is defined through the solution of the recourse formulation considered above, we can estimate a B-subdifferential element by appealing to linear programming duality and use it as a proxy for the ‘gradient’ of g. Note that a crude estimate of the step length does not affect the theoretical convergence guarantee of the stochastic subgradient method of Davis and Drusvyatskiy [21].
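The sampling-based estimate described above can be sketched as follows; `subgrad` is a hypothetical callable returning the chosen subgradient selection of \(g(\cdot ,\xi )\), and the sampling neighborhood of radius r is an illustrative stand-in for the sampling ball of Algorithm 5:

```python
import numpy as np

def estimate_lipschitz(subgrad, x_bar, r, n_pairs=100, rng=None):
    """Estimate the local Lipschitz constant of a subgradient selection
    subgrad(.) near x_bar by sampling point pairs in a neighborhood of
    radius r and taking the largest observed difference quotient
    ||subgrad(y) - subgrad(z)|| / ||y - z||."""
    rng = np.random.default_rng() if rng is None else rng
    n = x_bar.size
    best = 0.0
    for _ in range(n_pairs):
        # two random perturbations of x_bar within the sampling neighborhood
        y, z = x_bar + r * rng.uniform(-1.0, 1.0, size=(2, n)) / np.sqrt(n)
        denom = np.linalg.norm(y - z)
        if denom > 1e-12:
            best = max(best, np.linalg.norm(subgrad(y) - subgrad(z)) / denom)
    return best
```

For a smooth quadratic such as \(1.5\Vert x\Vert ^2\), whose gradient map has Lipschitz constant 3, the estimator recovers the constant exactly because the difference quotient is constant.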
Additional computational results
We present results of our replication experiments for the four case studies in the paper, and also consider a variant of Case study 3 that has a known analytical solution to benchmark our proposed approach.
Figure 7 presents the enclosure of the trajectories of the EF generated by ten replicates of the proposed approach when applied to Case study 1. For each value of the objective bound \(\nu \), we plot the smallest and largest risk level determined by Algorithm 3 at that bound over the different replicates. We find that the risk levels returned by the proposed algorithm do not vary significantly across the different replicates, with the maximum difference in the risk levels across the 26 points on the EF being a factor of 1.48. Figure 8 presents the corresponding plot for Case study 2. We again find that the risk levels returned by the proposed algorithm do not vary significantly across the different replicates, with the maximum difference in the risk levels across the 20 points on the EF being a factor of 1.006. Figure 9 presents the corresponding plot for Case study 3. The maximum difference in the risk levels at the 31 points on the EF for this case is a factor of 1.22 across the ten replicates. Figure 10 presents the corresponding plot for Case study 4. The maximum difference in the risk levels at the 26 points on the EF for this case is a factor of 1.25 across the ten replicates.
Case study 5
We consider Case study 3 when the random variables \(\xi _{ij} \sim {\mathcal {P}}:= \mathcal {N}(0,1)\) are i.i.d., the number of variables \(n = 100\), the number of constraints \(m = 100\), and bound \(U = 100\). The EF can be computed analytically in this case, see Section 5.1.1 of Hong et al. [34]. Figure 11 compares a typical EF obtained using our approach against the analytical EF and the solutions generated by the tuned scenario approximation algorithm. Our proposed approach is able to converge to the analytical EF, whereas the scenario approximation method is only able to determine a suboptimal EF. Our proposal takes 2332 s on average (and a maximum of 2379 s) to approximate the EF using 32 points, whereas it took the scenario approximation a total of 16,007 s to generate its 1000 points in Fig. 11. We note that about \(60\%\) of the reported time for our method is spent in generating random numbers because the random variable \(\xi \) is high-dimensional.
Figure 12 presents an enclosure of the trajectories of the EF generated by the proposed approach over ten replicates for this case. We find that the risk levels returned by the proposed algorithm do not vary significantly across the different replicates, with the maximum difference in the risk levels across the 32 points on the EF being a factor of 1.001.
Kannan, R., Luedtke, J.R. A stochastic approximation method for approximating the efficient frontier of chance-constrained nonlinear programs. Math. Prog. Comp. 13, 705–751 (2021). https://doi.org/10.1007/s12532-020-00199-y