Chance-constrained problems and rare events: an importance sampling approach


We study chance-constrained problems in which the constraints involve the probability of a rare event. We discuss the relevance of such problems and show that the existing sampling-based algorithms cannot be applied directly in this case, since they require an impractical number of samples to yield reasonable solutions. We argue that importance sampling (IS) techniques, combined with a Sample Average Approximation (SAA) approach, can be effectively used in such situations, provided that variance can be reduced uniformly with respect to the decision variables. We give sufficient conditions to obtain such uniform variance reduction, and prove asymptotic convergence of the combined SAA-IS approach. As often happens with IS techniques, the practical performance of the proposed approach relies on exploiting the structure of the problem under study; in our case, we work with a telecommunications problem with Bernoulli input distributions, and show how variance can be reduced uniformly over a suitable approximation of the feasibility set by choosing proper parameters for the IS distributions. Although some of the results are specific to this problem, we are able to draw general insights that can be useful for other classes of problems. We present numerical results to illustrate our findings.


Figures 1–3 appear in the published article.


Notes

  1. Random lower semicontinuous functions are called normal integrands in [36].


References

  1. Adas, A.: Traffic models in broadband networks. IEEE Commun. Mag. 35(7), 82–89 (1997)
  2. Andrieu, L., Henrion, R., Römisch, W.: A model for dynamic chance constraints in hydro power reservoir management. Eur. J. Oper. Res. 207(2), 579–589 (2010)
  3. Artstein, Z., Wets, R.J.B.: Consistency of minimizers and the SLLN for stochastic programs. J. Convex Anal. 2(1–2), 1–17 (1996)
  4. Asmussen, S., Glynn, P.: Stochastic Simulation. Springer, New York (2007)
  5. Beraldi, P., Ruszczyński, A.: The probabilistic set-covering problem. Oper. Res. 50(6), 956–967 (2002)
  6. Bonami, P., Lejeune, M.: An exact solution approach for portfolio optimization problems under stochastic and integer constraints. Oper. Res. 57(3), 650–670 (2009)
  7. Calafiore, G., Campi, M.C.: Uncertain convex programs: randomized solutions and confidence levels. Math. Program. 102(1), 25–46 (2005)
  8. Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncertain convex programs. SIAM J. Optim. 19(3), 1211–1230 (2008)
  9. Campi, M.C., Garatti, S.: A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality. J. Optim. Theory Appl. 148(2), 257–280 (2011)
  10. Campi, M.C., Garatti, S., Prandini, M.: The scenario approach for systems and control design. Ann. Rev. Control 33(2), 149–157 (2009)
  11. Carniato, A., Camponogara, E.: Integrated coal-mining operations planning: modeling and case study. Int. J. Coal Prep. Util. 31(6), 299–334 (2011)
  12. Charnes, A., Cooper, W.W., Symonds, G.H.: Cost horizons and certainty equivalents: an approach to stochastic programming of heating oil. Manag. Sci. 4, 235–263 (1958)
  13. Chung, K.L.: A Course in Probability Theory, 2nd edn. Academic Press, New York (1974)
  14. Dantzig, G.B., Glynn, P.W.: Parallel processors for planning under uncertainty. Ann. Oper. Res. 22(1), 1–21 (1990)
  15. Dentcheva, D., Prékopa, A., Ruszczyński, A.: Concavity and efficient points of discrete distributions in probabilistic programming. Math. Program. 89(1), 55–77 (2000)
  16. Dorfleitner, G., Utz, S.: Safety first portfolio choice based on financial and sustainability returns. Eur. J. Oper. Res. 221(1), 155–164 (2012)
  17. Duckett, W.: Risk analysis and the acceptable probability of failure. Struct. Eng. 83(15), 25–26 (2005)
  18. Ermoliev, Y.M., Ermolieva, T.Y., MacDonald, G., Norkin, V.: Stochastic optimization of insurance portfolios for managing exposure to catastrophic risks. Ann. Oper. Res. 99(1–4), 207–225 (2000)
  19. Henrion, R., Römisch, W.: Metric regularity and quantitative stability in stochastic programs with probabilistic constraints. Math. Program. 84(1), 55–88 (1999)
  20. Homem-de-Mello, T., Bayraksan, G.: Monte Carlo methods for stochastic optimization. Surv. Oper. Res. Manag. Sci. 19(1), 56–85 (2014)
  21. Infanger, G.: Monte Carlo (importance) sampling within a Benders decomposition algorithm for stochastic linear programs. Ann. Oper. Res. 39(1), 69–95 (1992)
  22. Jiang, R., Guan, Y.: Data-driven chance constrained stochastic program (2012)
  23. Kahn, H., Harris, T.: Estimation of particle transmission by random sampling. Nat. Bur. Stand. Appl. Math. Ser. 12, 27–30 (1951)
  24. L'Ecuyer, P., Mandjes, M., Tuffin, B.: Importance sampling in rare event simulation. In: Rubino, G., Tuffin, B. (eds.) Rare Event Simulation using Monte Carlo Methods, Chap. 2. Wiley, New York (2009)
  25. Lejeune, M.: Pattern definition of the p-efficiency concept. Ann. Oper. Res. 200(1), 23–36 (2012)
  26. Li, W.L., Zhang, Y., So, A.C., Win, Z.: Slow adaptive OFDMA systems through chance constrained programming. IEEE Trans. Signal Process. 58(7), 3858–3869 (2010)
  27. Liu, Y., Guo, H., Zhou, F., Qin, X., Huang, K., Yu, Y.: Inexact chance-constrained linear programming model for optimal water pollution management at the watershed scale. J. Water Resour. Plan. Manag. 134(4), 347–356 (2008)
  28. Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM J. Optim. 19(2), 674–699 (2008)
  29. Minoux, M.: Discrete cost multicommodity network optimization problems and exact solution methods. Ann. Oper. Res. 106(1–4), 19–46 (2001)
  30. Minoux, M.: Multicommodity network flow models and algorithms in telecommunications. In: Resende, M., Pardalos, P. (eds.) Handbook of Optimization in Telecommunications, pp. 163–184. Springer, Berlin (2006)
  31. Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17(4), 969–996 (2006)
  32. Pagnoncelli, B., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: theory and applications. J. Optim. Theory Appl. 142(2), 399–416 (2009)
  33. Pagnoncelli, B.K., Reich, D., Campi, M.C.: Risk-return trade-off with the scenario approach in practice: a case study in portfolio selection. J. Optim. Theory Appl. 155(2), 707–722 (2012)
  34. Prékopa, A.: Probabilistic programming. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming, vol. 10, pp. 267–351. Elsevier, Amsterdam (2004)
  35. Ramaswami, R., Sivarajan, K., Sasaki, G.: Optical Networks: A Practical Perspective. Morgan Kaufmann, Los Altos (2009)
  36. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, A Series of Comprehensive Studies in Mathematics, vol. 317. Springer, Berlin (1998)
  37. Römisch, W., Schultz, R.: Stability analysis for stochastic programs. Ann. Oper. Res. 30(1), 241–266 (1991)
  38. Rosenbluth, M.N., Rosenbluth, A.W.: Monte Carlo calculation of the average extension of molecular chains. J. Chem. Phys. 23, 356 (1955)
  39. Rubinstein, R.Y.: Cross-entropy and rare events for maximal cut and partition problems. ACM Trans. Model. Comput. Simul. 12(1), 27–53 (2002)
  40. Rubinstein, R.Y., Shapiro, A.: Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method. Wiley, Chichester (1993)
  41. Shapiro, A.: Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming, Handbooks in Operations Research and Management Science, vol. 10. Elsevier, Amsterdam (2003)
  42. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory, vol. 9. SIAM (2009)
  43. Soekkha, H.M.: Aviation Safety: Human Factors, System Engineering, Flight Operations, Economics, Strategies, Management. VSP, Utrecht (1997)
  44. Thieu, Q.T., Hsieh, H.Y.: Use of chance-constrained programming for solving the opportunistic spectrum sharing problem under Rayleigh fading. In: 9th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 1792–1797 (2013)
  45. Tran, Q.K., Parpas, P., Rustem, B., Ustun, B., Webster, M.: Importance sampling in stochastic programming: a Markov chain Monte Carlo approach (2013)
  46. Vallejos, R., Zapata-Beghelli, A., Albornoz, V., Tarifeño, M.: Joint routing and dimensioning of optical burst switching networks. Photon Netw. Commun. 17(3), 266–276 (2009)



Acknowledgments

The authors acknowledge the financial support of Grant Anillo ACT-88, Basal Center CMM-UCh, CIRIC-INRIA Chile (J.B., E.M., G.C.), Programa Iniciativa Cientifica Milenio NC130062 (J.B.) and FONDECYT Grants 1120244 (T.H., B.P.) and 1130681 (E.M.).

Author information



Corresponding author

Correspondence to Javiera Barrera.


Appendix 1: MIP formulation for \(\hat{p}^{\text {IS}_0}_a\) estimator under heterogeneous demand

We can formulate an integer linear programming model for this problem:

$$\begin{aligned} \min \sum \limits _{a\in A} w_a&\ \end{aligned}$$
$$\begin{aligned} \mathcal {N}y^c = d^c&\quad \quad \forall c=1,\ldots ,C \end{aligned}$$
$$\begin{aligned} \sum \limits _{c=1}^{C} \hat{\xi }^s_c y_{c,a} \le w_a + \sum \limits _{k=1}^C k\cdot u_{a,s,k}&\ \quad \quad \forall a\in A, \quad \forall s= 1,\ldots ,N \qquad (54) \end{aligned}$$
$$\begin{aligned} \sum \limits _{c=1}^{C} \hat{\xi }^s_c y_{c,a} \ge \sum \limits _{k=1}^C k\cdot u_{a,s,k}&\ \quad \quad \forall a\in A, \quad \forall s= 1,\ldots ,N \qquad (55) \end{aligned}$$
$$\begin{aligned} \sum \limits _{s=1}^N \sum \limits _{k=1}^C e^{-k\lambda } u_{a,s,k} \le \alpha N \sum \limits _{k=0}^C G_a(k) v_{a,k}&\ \quad \quad \forall a\in A \qquad (56) \end{aligned}$$
$$\begin{aligned} \sum \limits _{c=1}^{C} y_{c,a} = \sum \limits _{k=0}^C k v_{a,k}&\ \quad \quad \forall a\in A \qquad (57) \end{aligned}$$
$$\begin{aligned} \sum \limits _{k=0}^C v_{a,k} = 1&\ \quad \quad \forall a\in A \qquad (58) \end{aligned}$$
$$\begin{aligned} \sum \limits _{k=1}^C u_{a,s,k} \le 1&\ \quad \quad \forall a\in A, \quad \forall s= 1,\ldots ,N \qquad (59) \end{aligned}$$
$$\begin{aligned} w_a\in \mathbb {N},\ y_{c,a} \in \{0,1\}, u_{a,s,k}\in \{0,1\},\nonumber \\ v_{a,k}\in \{0,1\}&\quad \quad \forall a\in A, \quad \forall c=1,\ldots ,C,\nonumber \\&\qquad \forall s= 1,\ldots ,N \end{aligned}$$

The binary variables \(v_{a,k}\), together with Eqs. (57) and (58), ensure that \(v_{a,k}=1\) if and only if \(\sum \nolimits _{c=1}^C y_{c,a}=k\). The role of the binary variables u is explained in the following lemma.

Lemma 2

Let \((x,w,u,v)\) be an optimal solution of the previous formulation. Then there exists an optimal solution \((x,w,\hat{u},v)\) such that

  1. 1.

    \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} \le w_a\) if and only if \(\hat{u}_{a,s,k}=0\) for all \(k=1,\ldots ,C\).

  2. 2.

    if \(\hat{u}_{a,s,k}=1\) then \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} =k\),


$$\begin{aligned} \sum _{k=1}^C e^{-k\lambda } \hat{u}_{a,s,k} = e^{-\lambda \sum _{c=1}^C \hat{\xi }^s_c y_{c,a}} {\mathbbm {1}}_{\left\{ \sum \limits _{c=1}^C \hat{\xi }^s_c y_{c,a} > w_a\right\} } \end{aligned}$$


Proof First, note that constraint (54) imposes that if \(u_{a,s,k}=0\) for all k then \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} \le w_a\). Suppose that \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} \le w_a\) but \(u_{a,s,k'}=1\) for some \(k'\). Defining \(\hat{u}_{a,s,k}=0\) for all k and \(\hat{u}=u\) for the remaining variables, \(\hat{u}\) still satisfies Eqs. (54) and (55), and since \(\hat{u}\le u\) it also satisfies Eqs. (56) and (59); hence \((x,w,\hat{u},v)\) is also optimal. Repeating this procedure, we obtain a solution that satisfies condition (1). For the second condition, suppose that \(u_{a,s,k}=1\) for some k but \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} > k\). Let \(\hat{k}=\sum _{c=1}^C \hat{\xi }^s_c y_{c,a}\) and define \(\hat{u}_{a,s,\hat{k}}=1\), \(\hat{u}_{a,s,k}=0\) \(\forall k\ne \hat{k}\) and \(\hat{u}=u\) for the remaining variables. By definition, \((x,w,\hat{u},v)\) satisfies (54) and (59), and since \(\hat{k}>k\) it also satisfies (55). On the other hand, since \(\lambda >0\) we have \(e^{-\lambda k}>e^{-\lambda \hat{k}}\), so it also satisfies (56), and thus \((x,w,\hat{u},v)\) is also optimal. Repeating this procedure, we obtain a solution that satisfies condition (2). \(\square \)

Lemma 2 shows that the optimal solution (yw) of this MIP formulation satisfies

$$\begin{aligned} \frac{1}{N} \sum _{s= 1}^N e^{-\lambda \sum _{c=1}^C \hat{\xi }^s_c y_{c,a}} {\mathbbm {1}}_{\left\{ \sum \limits _{c= 1}^C\hat{\xi }^s_c y_{c,a} > w_a\right\} } \le \alpha \cdot G_a\left( \sum _{c=1}^C y_{c,a} \right) \quad \forall a\in A, \end{aligned}$$

which is the desired approximation of the constraint \(\hat{p}^{\text {IS}_0}_a \le \alpha \).
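As a quick sanity check of the identity in Lemma 2, the following sketch (our illustration, not part of the paper; all values hypothetical) constructs \(\hat{u}\) for a single arc a and sample s following the rules of the lemma, and verifies that \(\sum _k e^{-k\lambda }\hat{u}_{a,s,k}\) equals the exponential tilt factor \(e^{-\lambda \sum _c \hat{\xi }^s_c y_{c,a}}\) exactly when the capacity \(w_a\) is exceeded:

```python
import math

lam = 0.7                        # tilt parameter lambda > 0 (hypothetical)
C, w = 5, 2                      # number of channels and capacity w_a
xi_y = [1, 0, 1, 1, 0]           # products xi^s_c * y_{c,a} for one arc/sample
s_val = sum(xi_y)                # sum_c xi^s_c y_{c,a}

# Construct u-hat per Lemma 2: a single 1 at k = s_val, but only when the
# realized load exceeds the capacity w; otherwise all entries are 0.
u_hat = [0] * (C + 1)            # index k = 0..C (k = 0 unused)
if s_val > w:
    u_hat[s_val] = 1

# Left-hand side of the identity: the term appearing in constraint (56).
lhs = sum(math.exp(-k * lam) * u_hat[k] for k in range(1, C + 1))
# Right-hand side: tilt factor times the overflow indicator.
rhs = math.exp(-lam * s_val) * (1.0 if s_val > w else 0.0)
assert abs(lhs - rhs) < 1e-12
```

With the load 3 exceeding the capacity 2, both sides equal \(e^{-3\lambda }\); if the load were at most 2, both sides would vanish.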

Appendix 2: Proofs of results

Proof of Lemma 1

Lemma 1

Suppose that the set-function \(I_x\) is such that \(G(x,\cdot )\) is \(I_x\)-determined for each \(x \in X\). Given an i.i.d. sample \((\hat{\xi }^1,\ldots ,\hat{\xi }^N)\) from the distribution of \(\hat{\xi }\), let

$$\begin{aligned} \hat{p}^{\text {IS}_0}(x)\ :=\ \frac{1}{N} \sum _{j=1}^{N} {\mathbbm {1}}_{\{G\big (x,\hat{\xi }^j\big )>0\}}L_x(\hat{\xi }^j). \end{aligned}$$

Then \(\hat{p}^{\text {IS}_0}(x)\) is also an unbiased estimator of p(x). Moreover,

$$\begin{aligned} \text {Var}(\hat{p}^{\text {IS}_0}(x))\ = \ \text {Var}(\hat{p}^{\text {IS}}(x)) - \frac{1}{N}\mathbb {E}_{\hat{\xi }}\left[ \text {Var}\left( {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })\,|\,(\hat{\xi }_i)_{i\in I_x}\right) \right] \end{aligned}$$


Proof First, let us prove that the estimator \(\hat{p}^{\text {IS}_0}(x)\) is unbiased, for which it suffices to show that \(\mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L_x(\hat{\xi })\right] =p(x)\). Indeed, we have

$$\begin{aligned} \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L_x(\hat{\xi })\right]= & {} \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}\mathbb {E}_{\hat{\xi }}\left[ L(\hat{\xi })~|(\hat{\xi }_i)_{i \in I_x}\right] \right] \nonumber \\= & {} \mathbb {E}_{\hat{\xi }}\left[ \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })~|~(\hat{\xi }_i)_{i \in I_x}\right] \right] \end{aligned}$$
$$\begin{aligned}= & {} \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })\right] = p(x), \end{aligned}$$

where the second equality follows from the assumption that \(G(x,\cdot )\) is \(I_x\)-determined, which implies that \(G(x,\hat{\xi })\) is measurable with respect to the sigma-algebra generated by \((\hat{\xi }_i)_{i \in I_x}\).

For the second assertion of the lemma, note that

$$\begin{aligned} \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}^2 L_x(\hat{\xi })^2\right]&= \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}} \left( \mathbb {E}_{\hat{\xi }}\left[ L(\hat{\xi })~|~(\hat{\xi }_i)_{i \in I_x}\right] \right) ^2\right] \\&= \mathbb {E}_{\hat{\xi }}\left[ \left( \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })~|~(\hat{\xi }_i)_{i \in I_x}\right] \right) ^2\right] \\&= \ \mathbb {E}_{\hat{\xi }}\left[ \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })^2~|~(\hat{\xi }_i)_{i \in I_x}\right] \right. \\&\qquad - \left. \text {Var}\left( {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })~|~(\hat{\xi }_i)_{i \in I_x}\right) \right] \\&= \ \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })^2\right] \\&\qquad - \mathbb {E}_{\hat{\xi }}\left[ \text {Var}\left( {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })~|~(\hat{\xi }_i)_{i \in I_x}\right) \right] \end{aligned}$$

and therefore

$$\begin{aligned} N \text {Var}(\hat{p}^{\text {IS}_0}(x))&= \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}^2 L_x(\hat{\xi })^2\right] - p(x)^2 = \ \mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })^2\right] \\&\quad - \mathbb {E}_{\hat{\xi }}\left[ \text {Var}\left( {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })~|~(\hat{\xi }_i)_{i \in I_x}\right) \right] - p(x)^2 \\&= N \text {Var}(\hat{p}^{\text {IS}}(x)) - \mathbb {E}_{\hat{\xi }}\left[ \text {Var}\left( {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L(\hat{\xi })\,|\,(\hat{\xi }_i)_{i\in I_x}\right) \right] . \end{aligned}$$

\(\square \)
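The variance reduction in Lemma 1 can be observed numerically. The sketch below is our illustration, not from the paper: it takes independent Bernoulli inputs twisted to a common IS parameter, a function \(G\) that depends only on the first d coordinates, and compares the sample variances of \(\hat{p}^{\text {IS}}\) (full likelihood ratio \(L\)) and \(\hat{p}^{\text {IS}_0}\) (conditional ratio \(L_x\), here the product over the first d coordinates only). All parameter values are hypothetical.

```python
import math
import random

random.seed(0)

m, d = 12, 4            # total coordinates; G depends only on the first d
rho, w = 0.2, 2         # nominal Bernoulli parameter; threshold in G
rho_hat = 0.6           # twisted (IS) parameter, pushing the rare event up
N = 20000               # sample size

def lr_factor(xi):
    # per-coordinate likelihood ratio dP/dQ for a Bernoulli coordinate
    return (rho / rho_hat) if xi else ((1 - rho) / (1 - rho_hat))

vals_full, vals_cond = [], []
for _ in range(N):
    xi = [1 if random.random() < rho_hat else 0 for _ in range(m)]
    hit = 1.0 if sum(xi[:d]) > w else 0.0          # indicator {G(x, xi) > 0}
    L_full = math.prod(lr_factor(x) for x in xi)       # L(xi)
    L_cond = math.prod(lr_factor(x) for x in xi[:d])   # L_x(xi) = E[L | xi_1..d]
    vals_full.append(hit * L_full)
    vals_cond.append(hit * L_cond)

def mean_var(v):
    mu = sum(v) / len(v)
    return mu, sum((x - mu) ** 2 for x in v) / (len(v) - 1)

mu_f, var_f = mean_var(vals_full)   # plain IS estimator
mu_c, var_c = mean_var(vals_cond)   # conditional (IS_0) estimator
p_exact = sum(math.comb(d, k) * rho**k * (1 - rho)**(d - k)
              for k in range(w + 1, d + 1))        # P{Bin(d, rho) > w}
```

Conditioning out the coordinates that do not affect \(G\) removes their likelihood-ratio noise, which is exactly the subtracted term in the lemma's variance identity; in runs of this sketch `var_c` is far below `var_f`.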

Proof of Proposition 3

Proposition 3

Let \(\zeta _1,\ldots ,\zeta _m\) be \(m\ge 1\) independent Bernoulli random variables with \(\mathbb {P}\{\zeta _i=1\}=p_i\), and suppose that \(0<p_i<1\) for all i. Let \(Z:= \sum _{i=1}^m \zeta _i\), and define \(\delta := \min _i\, p_i(1-p_i) > 0\). Then, we have

$$\begin{aligned} \mathbb {P}\left\{ Z > \mathbb {E}[Z]\right\} \ > \ \frac{\delta }{2m}. \end{aligned}$$


Proof Let \(u:[0,m]\rightarrow \mathbb {R}\) be the function defined by \(u(t):= m^2 - t^2\). Since \(u(\cdot )\) is nonnegative and decreasing on [0, m], we have that

$$\begin{aligned} \mathbb {P}\left\{ Z \le \mathbb {E}[Z]\right\}&= \mathbb {P}\left\{ u(Z) \ge u(\mathbb {E}[Z])\right\} \\&= \ \mathbb {P}\left\{ m^2 -Z^2 \ge m^2 - (\mathbb {E}[Z])^2\right\} \\&\le \ \frac{\mathbb {E}\left[ m^2 - Z^2\right] }{m^2 - (\mathbb {E}[Z])^2}, \end{aligned}$$

where the last inequality follows from Markov’s inequality. Thus, we have

$$\begin{aligned} \mathbb {P}\left\{ Z > \mathbb {E}[Z]\right\}&\ge \ 1 - \frac{\mathbb {E}\left[ m^2 - Z^2\right] }{m^2 - (\mathbb {E}[Z])^2} \ =\ \frac{\mathbb {E}\left[ Z^2\right] -(\mathbb {E}[Z])^2}{m^2 - (\mathbb {E}[Z])^2} \nonumber \\&=\ \frac{\text {Var}(Z)}{(m + \mathbb {E}[Z])(m - \mathbb {E}[Z])} . \qquad (66) \end{aligned}$$

Next, notice that independence of \(\{\zeta _i\}\) implies that \(\text {Var}(Z)=\sum _{i=1}^m p_i(1-p_i)\). Moreover, since \(0<\mathbb {E}[Z]<m\) we have that \(m + \mathbb {E}[Z] < 2m\), \(m - \mathbb {E}[Z]< m\) and thus from (66) we have that

$$\begin{aligned} \mathbb {P}\left\{ Z > \mathbb {E}[Z]\right\}&> \ \frac{\sum _{i=1}^m p_i(1-p_i)}{2m^2} \ \ge \ \frac{\delta m}{2m^2} \ = \ \frac{\delta }{2m}. \end{aligned}$$

\(\square \)
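For small m, the bound of Proposition 3 can be checked exactly by enumerating all outcomes. The sketch below (our illustration, with hypothetical probability vectors) computes \(\mathbb {P}\{Z>\mathbb {E}[Z]\}\) exactly and compares it with \(\delta /(2m)\):

```python
import itertools
import math

def check_bound(p):
    """Exact P{Z > E[Z]} for independent Bernoulli(p_i) vs. the bound delta/(2m)."""
    m = len(p)
    mean = sum(p)                                  # E[Z]
    delta = min(pi * (1 - pi) for pi in p)         # min_i p_i (1 - p_i)
    prob = 0.0
    # enumerate all 2^m outcomes (feasible only for small m)
    for bits in itertools.product([0, 1], repeat=m):
        if sum(bits) > mean:
            prob += math.prod(pi if b else 1 - pi for pi, b in zip(p, bits))
    return prob, delta / (2 * m)

lhs, rhs = check_bound([0.1, 0.35, 0.5, 0.72, 0.9])
assert lhs > rhs    # the strict inequality of Proposition 3
```

The bound is quite loose in practice (here the exact probability exceeds the bound by orders of magnitude), which is consistent with its role as a worst-case guarantee rather than an approximation.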

Proof of Theorem 1

Theorem 1

Suppose that \(0<\rho _c<1\) for all \(c=1,\ldots , C\). Let \(x=(y,w)\) be such that \(w \in \mathbb {N}\) satisfies \(\sum _{c=1}^C \rho _c y_c < w \le \sum _{c=1}^C y_c-1\). Then the function \(B_{x}(\varvec{\lambda })\) is convex and there exists \(\lambda ^*_{x}\in \mathbb {R}_+\cup \{\infty \}\) such that the vector \(\varvec{\lambda }\) defined by \(\lambda _c=\lambda ^*_{x}\) for all \(c=1,\ldots ,C\) minimizes \(B_{x}(\varvec{\lambda })\). If \(w=\sum _{c=1}^C y_c-1\), then \(\lambda ^*_{x}=\infty \) and \(\hat{\rho }_c(\lambda ^*_{x})=1\); otherwise, \(\lambda ^*_{x}\) and \(\hat{\rho }_c(\lambda ^*_{x})\) satisfy

$$\begin{aligned} \sum _{c=1}^C \hat{\rho }_c(\lambda ^*_{x}) y_c = w+1 \qquad \text{ and } \qquad \hat{\rho }_c(\lambda ^*_{x}) =\frac{e^{\lambda ^*_{x}} \rho _c}{e^{\lambda ^*_{x}} \rho _c +(1-\rho _c)}. \end{aligned}$$

To prove the theorem, we need the following lemma, whose proof is given after that of the theorem.

Lemma 3

For \(n\ge 1\), let \(\rho _i\), \(i=1,\ldots ,n\) be numbers such that \(\rho _i \in (0,1)\) and \(\rho _1 \ge \rho _2 \ge \ldots \ge \rho _n\). Given an integer w such that \(0 \le w \le n-1\), consider problem (P) defined as follows:

$$\begin{aligned} \min _{\lambda \in \mathbb {R}^n_+} \max _{\begin{array}{c} z_i\in \{0,1\}^n\\ \sum _i z_i =w+1 \end{array}} -\sum _{i=1}^n z_i \lambda _i + \sum _{i=1}^n \log (e^{\lambda _i} \rho _i + (1-\rho _i)) . \end{aligned}$$

Then, there exists an optimal solution to (P) that satisfies \(\lambda _1 \le \lambda _2 \le \ldots \le \lambda _n\).


Proof (of Theorem 1) Let \(n=\sum _{c=1}^C y_c\). Without loss of generality, to simplify notation, assume that the set \(\{c: y_c=1\}\) is \(\{1,\ldots ,n\}\). Since the \(\log \) function is increasing, we have that

$$\begin{aligned} \log (B_{x}(\varvec{\lambda }))\ =\ \max _{\begin{array}{c} z_i\in \{0,1\}^n\\ \sum _i z_i =w+1 \end{array}} -\sum _{i=1}^n z_i \lambda _i + \sum _{i=1}^n \log (e^{\lambda _i} \rho _i + (1-\rho _i)) \end{aligned}$$

By Lemma 3, minimizing \(\log (B_{x}(\varvec{\lambda }))\) amounts to solving the following problem:

$$\begin{aligned} \min _{\varvec{\lambda }\in \mathbb {R}^n}\, \psi (\varvec{\lambda }):= -\sum _{i=1}^{w+1} \lambda _i +&\sum _{i=1}^n \log (e^{\lambda _i} \rho _i + (1-\rho _i)) \qquad (67) \end{aligned}$$
$$\begin{aligned} \lambda _i \le \lambda _{i+1}&\quad i=1\ldots n-1 \qquad (68) \end{aligned}$$
$$\begin{aligned} \lambda _1 \ge 0&\qquad (69) \end{aligned}$$

Note that the objective function of the above problem is strictly convex in \(\varvec{\lambda }\). In fact, its second derivatives are

$$\begin{aligned} \frac{\partial ^2 \psi }{\partial \lambda _i^2} = \frac{e^{\lambda _i} \rho _i (1- \rho _i)}{(e^{\lambda _i}\rho _i + (1-\rho _i))^2}>0, \qquad \frac{\partial ^2 \psi }{\partial \lambda _i\partial \lambda _j}=0. \end{aligned}$$

Since \(B_x(\varvec{\lambda })=\exp (\log (B_x(\varvec{\lambda })))\) and \(\log (B_x(\varvec{\lambda }))\) is convex—though not strictly convex due to the components \(\lambda _c\) such that \(y_c=0\)—it follows that \(B_x\) is convex in \(\varvec{\lambda }\). Of course, the components \(\lambda _c\) such that \(y_c=0\) do not affect the value of \(B_x(\varvec{\lambda })\).

Suppose first that \(w=n-1\). Then, the first derivative of the objective function in (67) is given by

$$\begin{aligned} \frac{\partial \psi }{\partial \lambda _i} \ = \ -1 + \frac{e^{\lambda _i} \rho _i}{e^{\lambda _i} \rho _i + (1- \rho _i)}, \quad i=1,\ldots ,n, \end{aligned}$$

so \(\nabla \psi (\varvec{\lambda })\rightarrow 0\) as \(\varvec{\lambda }\rightarrow \infty \); in particular, we can take \(\lambda _i=\lambda \) for all i and let \(\lambda \rightarrow \infty \). That is, in this case the optimal solution of (67)–(69) is \(\lambda _i=\infty \), \(i=1,\ldots ,n\).

Consider now the case \(w<n-1\). We will show that problem (67)–(69) has a unique optimal solution, which can be found by writing the Karush-Kuhn-Tucker conditions as follows:

$$\begin{aligned} -1_{(i\le w+1)} + \frac{e^{\lambda _i} \rho _i}{e^{\lambda _i} \rho _i +(1-\rho _i)} +\mu _{i}-\mu _{i-1}&= 0&i=1\ldots n-1 \qquad (70) \end{aligned}$$
$$\begin{aligned} \frac{e^{\lambda _n} \rho _n}{e^{\lambda _n} \rho _n +(1-\rho _n)} -\mu _{n-1}&= 0&\qquad (71) \end{aligned}$$
$$\begin{aligned} \mu _i (\lambda _{i+1}-\lambda _i)&= 0&i=1\ldots n-1 \qquad (72) \end{aligned}$$
$$\begin{aligned} \mu _0 \lambda _1&= 0&\qquad (73) \end{aligned}$$
$$\begin{aligned} \mu _i&\ge 0&i=0\ldots n-1 \qquad (74) \end{aligned}$$

where \(\varvec{\mu }=(\mu _i)\) is the vector of Lagrange multipliers of constraints (68) and \(\mu _0\) is the Lagrange multiplier of constraint (69).

Consider now a particular choice of vectors \(\varvec{\mu }\) and \(\varvec{\lambda }\) defined as follows. All components of \(\varvec{\lambda }\) are identical, with \(\lambda _i=\lambda ^*\), where \(\lambda ^*\in \mathbb {R}_{+}\) solves the equation

$$\begin{aligned} \varphi (\lambda ^*):= \sum _{i=1}^{n} \frac{e^{\lambda ^*} \rho _i}{e^{\lambda ^*} \rho _i +(1-\rho _i)}\ = \ w+1. \qquad (75) \end{aligned}$$

Note that we can always find such \(\lambda ^*\), since the function \(\varphi (\lambda )\) is continuous and increasing, and

$$\begin{aligned} \varphi (0)&= \ \sum _{i=1}^n \rho _i\ < w \ < \ w+1 \qquad (76) \end{aligned}$$
$$\begin{aligned} \lim _{\lambda \rightarrow \infty } \varphi (\lambda )&= \ n\ > \ w+1. \qquad (77) \end{aligned}$$

The inequalities in (76) follow from the assumptions of the theorem on w and the fact that we are analyzing the case \(w<n-1\). The components of \(\varvec{\mu }\) are defined as

$$\begin{aligned} \mu _0&:= 0 \qquad (78) \end{aligned}$$
$$\begin{aligned} \mu _i&:= \min \{i,w+1\} - \sum _{j=1}^i \frac{e^{\lambda ^*} \rho _j}{e^{\lambda ^*} \rho _j +(1-\rho _j)} \quad i=1,\ldots ,n-1. \qquad (79) \end{aligned}$$

We claim that \(\varvec{\mu }\) and \(\varvec{\lambda }\) satisfy the KKT conditions (70)–(74) laid out above. To see that, observe that Eqs. (78)–(79) imply (70). Equation (71) follows from (75), since we have

$$\begin{aligned} \frac{e^{\lambda ^*} \rho _n}{e^{\lambda ^*} \rho _n +(1-\rho _n)}\ = \ w+1- \sum _{i=1}^{n-1} \frac{e^{\lambda ^*} \rho _i}{e^{\lambda ^*} \rho _i +(1-\rho _i)} \end{aligned}$$

and the latter term coincides with \(\mu _{n-1}\) defined in (79). Equations (72) and (73) are trivially satisfied. Finally, we show that (74) holds, with strict inequality if \(i\ge 1\). Indeed, (75) implies that

$$\begin{aligned} \sum _{j=1}^{i} \frac{e^{\lambda ^*} \rho _j}{e^{\lambda ^*} \rho _j +(1-\rho _j)}\ < \ w+1 \quad i=1,\ldots ,n-1 \end{aligned}$$

and we clearly have

$$\begin{aligned} \sum _{j=1}^{i} \frac{e^{\lambda ^*} \rho _j}{e^{\lambda ^*} \rho _j +(1-\rho _j)}\ < \ i \quad i=1,\ldots ,n \end{aligned}$$

as each term in the sum is less than 1. \(\square \)
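Since \(\varphi \) is continuous and strictly increasing, the equation \(\varphi (\lambda ^*)=w+1\) characterizing the optimal tilt can be solved numerically, e.g., by bisection. The sketch below is a minimal illustration with hypothetical \(\rho _c\) values, not the paper's implementation; it finds \(\lambda ^*\) and the corresponding twisted probabilities \(\hat{\rho }_c(\lambda ^*)\) for the channels with \(y_c=1\):

```python
import math

def twisted(r, lam):
    # exponentially tilted Bernoulli parameter rho-hat_c(lambda)
    e = math.exp(lam)
    return e * r / (e * r + (1 - r))

def solve_lambda(rho, w, tol=1e-12):
    """Bisection for phi(lam) = sum_c twisted(rho_c, lam) = w + 1.
    Assumes sum(rho) < w < n - 1 with n = len(rho), as in Theorem 1."""
    lo, hi = 0.0, 1.0
    # grow the bracket until phi(hi) >= w + 1 (phi is increasing to n)
    while sum(twisted(r, hi) for r in rho) < w + 1:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(twisted(r, mid) for r in rho) < w + 1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho = [0.1, 0.2, 0.15, 0.25, 0.1, 0.3]   # hypothetical demand probabilities
w = 4                                     # capacity, with sum(rho) < w < n - 1
lam_star = solve_lambda(rho, w)
rho_hat = [twisted(r, lam_star) for r in rho]
```

The twisted distribution places the expected load at \(w+1\), i.e., just above capacity, which is the standard way exponential tilting centers the sampling distribution on the rare overflow event.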


Proof (of Lemma 3) Suppose that \(\varvec{\lambda }=(\lambda _1, \ldots , \lambda _n)\) is an optimal solution and that there exists some \(j<n\) such that \(\lambda _j>\lambda _{j+1}\). We will show that \(\bar{\varvec{\lambda }}\), defined by \(\bar{\lambda }_j=\lambda _{j+1}\), \(\bar{\lambda }_{j+1}=\lambda _j\) and \(\bar{\lambda }_i=\lambda _i\) for \(i\notin \{j,j+1\}\), attains an objective value no worse than that of \(\varvec{\lambda }\). Let \(\varDelta \) be the difference in objective value between \(\varvec{\lambda }\) and \(\bar{\varvec{\lambda }}\), i.e.,

$$\begin{aligned} \varDelta =&\max _{\begin{array}{c} z_i\in \{0,1\}^n\\ \sum _i z_i =w+1 \end{array}} -\sum _{i=1}^n z_i \lambda _i + \sum _{i=1}^n \log (e^{\lambda _i} \rho _i + (1-\rho _i)) \end{aligned}$$
$$\begin{aligned}&\ \ -\left( \max _{\begin{array}{c} z_i\in \{0,1\}^n\\ \sum _i z_i =w+1 \end{array}} -\sum _{i=1}^n z_i \bar{\lambda }_i + \sum _{i=1}^n \log (e^{\bar{\lambda }_i} \rho _i + (1-\rho _i)) \right) . \end{aligned}$$

We will prove that \(\varDelta \ge 0\), showing that \({\bar{\varvec{\lambda }}}\) is no worse than \(\varvec{\lambda }\). Note initially that

$$\begin{aligned} \max _{\begin{array}{c} z_i\in \{0,1\}^n\\ \sum _i z_i =w+1 \end{array}} -\sum _{i=1}^n z_i \lambda _i \ = \ \max _{\begin{array}{c} z_i\in \{0,1\}^n\\ \sum _i z_i =w+1 \end{array}} -\sum _{i=1}^n z_i \bar{\lambda }_i, \end{aligned}$$

since the maximum on both sides equals minus the sum of the smallest \(w+1\) components of the vector \(\varvec{\lambda }\). Thus, we only need to compare the remaining part of the objective function, i.e., we have

$$\begin{aligned} \varDelta&=\ \sum _{i=1}^n \log (e^{\lambda _i} \rho _i + (1-\rho _i)) -\sum _{i=1}^n \log (e^{\bar{\lambda }_i} \rho _i + (1-\rho _i)) \\&=\ \log (e^{\lambda _j} \rho _j + (1-\rho _j))+\log (e^{\lambda _{j+1}} \rho _{j+1} + (1-\rho _{j+1})) \\&\qquad - \log (e^{\bar{\lambda }_j} \rho _j + (1-\rho _j))-\log (e^{\bar{\lambda }_{j+1}} \rho _{j+1} + (1-\rho _{j+1})). \end{aligned}$$

Since \(\bar{\lambda }_{j}=\lambda _{j+1}\) and \(\bar{\lambda }_{j+1}=\lambda _{j}\), it follows that

$$\begin{aligned} \varDelta&=\ \log \left( \frac{e^{\lambda _j} \rho _j + (1-\rho _j)}{e^{\lambda _{j+1}} \rho _j + (1-\rho _j)}\right) -\log \left( \frac{e^{\lambda _{j}} \rho _{j+1} + (1-\rho _{j+1})}{e^{\lambda _{j+1}} \rho _{j+1} + (1-\rho _{j+1})}\right) \\&=\ \log \left( \frac{e^{\lambda _{j}}-e^{\lambda _{j+1}}}{e^{\lambda _{j+1}} + \frac{1}{\rho _{j}}-1}+1\right) - \log \left( \frac{e^{\lambda _{j}}-e^{\lambda _{j+1}}}{e^{\lambda _{j+1}} + \frac{1}{\rho _{j+1}}-1}+1\right) . \end{aligned}$$

Note that \(e^{\lambda _{j}}-e^{\lambda _{j+1}}>0\), since \(\lambda _j>\lambda _{j+1}\). Moreover, since \(\rho _j \ge \rho _{j+1}\), we have \(1/\rho _{j}-1\le 1/\rho _{j+1}-1\), so the argument of the first \(\log \) is at least as large as that of the second, and we conclude that \(\varDelta \ge 0\). \(\quad \square \)


Cite this article

Barrera, J., Homem-de-Mello, T., Moreno, E. et al. Chance-constrained problems and rare events: an importance sampling approach. Math. Program. 157, 153–189 (2016).



Keywords

  • Chance-constrained programming
  • Sample average approximation
  • Importance sampling
  • Rare-event simulation

Mathematics Subject Classification

  • 90C15 (Mathematical Programming/Stochastic programming)
  • 65C05 (Numerical Analysis/Probabilistic Methods/Monte Carlo methods)
  • 68M10 (Computer Science/Network design and communication)