Robust sample average approximation

Abstract

Sample average approximation (SAA) is a widely popular approach to data-driven decision-making under uncertainty. Under mild assumptions, SAA is tractable and enjoys strong asymptotic performance guarantees. Similar guarantees, however, do not typically hold in finite samples. In this paper, we propose a modification of SAA, which we term Robust SAA, that retains SAA’s tractability and asymptotic properties and, additionally, enjoys strong finite-sample performance guarantees. The key to our method is linking SAA, distributionally robust optimization, and hypothesis testing of goodness-of-fit. Beyond Robust SAA, this connection provides a unified perspective that enables us to characterize the finite-sample and asymptotic guarantees of various other data-driven procedures based on distributionally robust optimization. This analysis provides insight into the practical performance of these methods in real applications. We present examples from inventory management and portfolio allocation, and demonstrate numerically that our approach outperforms other data-driven approaches in these applications.


Figs. 1–10 (not reproduced here)

Notes

  1. Indeed, nonparametric comparison tests focus on other location parameters, such as the median.

  2. Such a hypothesis is called simple. By contrast, a composite hypothesis is defined not by a single distribution but by a family of distributions, and asserts that the data-generating distribution F is some member of this family. An example of a composite hypothesis is that F is normal (with some unknown mean and variance). We do not consider composite hypotheses in this work.

  3. Note that the standard definitions of these statistics take the form \(X_N^2\) and \(G_N^2\), with thresholds given by the \(\chi ^2\) distribution. We use a nonstandard but equivalent definition so that the thresholds scale as \(O(1/\sqrt{N})\), matching the other tests presented.

  4. The only nuance is that Proposition 3.4 of [46] requires a generalized Slater point. We use the empirical distribution function, \(\hat{F}_N\), as the generalized Slater point in the space of distributions.

  5. This is most easily seen by noting:

    $$\begin{aligned} \mathbb {E}\left[ \hat{z}_{{\text {SAA}}}\right]&= \mathbb {E}\left[ \min _{x \in X}\frac{1}{N} \sum _{j=1}^N c(x; \xi ^j)\right] \le \min _{x \in X}\frac{1}{N}\sum _{j=1}^N\mathbb {E}\left[ c(x; \xi ^j)\right] = z_{{\text {stoch}}} \le \mathbb {E}\left[ c(x_{{\text {SAA}}};\xi )\right] . \end{aligned}$$
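As a numerical sanity check of the two inequalities in this note, the Monte Carlo sketch below assumes a toy cost \(c(x;\xi )=(x-\xi )^2\) with \(\xi \sim N(0,1)\). This choice is purely illustrative: the SAA solution is then the sample mean and \(z_{{\text {stoch}}}=1\). With N = 5, the in-sample value \(\hat{z}_{{\text {SAA}}}\) should average about \((N-1)/N=0.8\), below \(z_{{\text {stoch}}}\), while the out-of-sample cost of \(x_{{\text {SAA}}}\) should average about \(1+1/N=1.2\), above it. The function name is hypothetical.

```python
import numpy as np

def saa_bias_demo(n=5, reps=20000, seed=0):
    """Monte Carlo check of E[z_SAA] <= z_stoch <= E[c(x_SAA; xi)].

    Toy assumption: c(x; xi) = (x - xi)**2 with xi ~ N(0, 1), so that
    z_stoch = min_x E[(x - xi)**2] = Var(xi) = 1 and x_SAA is the sample mean.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))               # reps independent data sets of size n
    x_saa = samples.mean(axis=1)                            # SAA minimizer for each data set
    z_saa = ((samples - x_saa[:, None]) ** 2).mean(axis=1)  # in-sample SAA optimal value
    xi_new = rng.standard_normal(reps)                      # fresh draw from F per data set
    out_of_sample = (x_saa - xi_new) ** 2                   # realized cost c(x_SAA; xi)
    return z_saa.mean(), 1.0, out_of_sample.mean()

in_sample, z_stoch, out_sample = saa_bias_demo()
print(f"E[z_SAA] ~ {in_sample:.3f} <= z_stoch = {z_stoch} <= E[c(x_SAA; xi)] ~ {out_sample:.3f}")
```

The gap on both sides shrinks like \(1/N\) in this toy model, which is the optimistic in-sample bias of SAA that the note refers to.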

References

  1. Bassamboo, A., Zeevi, A.: On a data-driven method for staffing large call centers. Oper. Res. 57(3), 714–726 (2009)
  2. Bayraksan, G., Love, D.K.: Data-driven stochastic programming using phi-divergences. In: Tutorials in Operations Research, pp. 1–19 (2015)
  3. Ben-Tal, A., den Hertog, D., De Waegenaere, A., Melenberg, B., Rennen, G.: Robust solutions of optimization problems affected by uncertain probabilities. Manag. Sci. 59(2), 341–357 (2013)
  4. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. SIAM, Philadelphia (2001)
  5. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
  6. Bertsimas, D., Doan, X.V., Natarajan, K., Teo, C.P.: Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res. 35(3), 580–602 (2010)
  7. Bertsimas, D., Gupta, V., Kallus, N.: Data-driven robust optimization. Preprint arXiv:1401.0212 (2013)
  8. Bertsimas, D., Popescu, I.: Optimal inequalities in probability theory: a convex optimization approach. SIAM J. Optim. 15(3), 780–804 (2005)
  9. Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1999)
  10. Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York (2011)
  11. Birge, J.R., Wets, R.J.B.: Designing approximation schemes for stochastic optimization problems, in particular for stochastic programs with recourse. Math. Program. Study 27, 54–102 (1986)
  12. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
  13. Calafiore, G.C., El Ghaoui, L.: On distributionally robust chance-constrained linear programs. J. Optim. Theory Appl. 130(1), 1–22 (2006)
  14. D’Agostino, R.B., Stephens, M.A.: Goodness-of-Fit Techniques. Dekker, New York (1986)
  15. Delage, E., Ye, Y.: Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3), 595–612 (2010)
  16. DeMiguel, V., Garlappi, L., Nogales, F.J., Uppal, R.: A generalized approach to portfolio optimization: improving performance by constraining portfolio norms. Manag. Sci. 55(5), 798–812 (2009)
  17. Dodge, Y.: The Oxford Dictionary of Statistical Terms. Oxford University Press, Oxford (2006)
  18. Dudley, R.M.: Real Analysis and Probability, vol. 74. Cambridge University Press, Cambridge (2002)
  19. Dupačová, J.: The minimax approach to stochastic programming and an illustrative application. Stoch. Int. J. Probab. Stoch. Process. 20(1), 73–88 (1987)
  20. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
  21. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, New York (2001)
  22. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)
  23. Grötschel, M., Lovász, L., Schrijver, A.: Geometric Algorithms and Combinatorial Optimization. Springer, New York (1993)
  24. Gurobi Optimization Inc.: Gurobi optimizer reference manual. http://www.gurobi.com (2013)
  25. Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
  26. Homem-de-Mello, T., Bayraksan, G.: Monte Carlo sampling-based methods for stochastic optimization. Surv. Oper. Res. Manag. Sci. 19(1), 56–85 (2014)
  27. Jiang, R., Guan, Y.: Data-driven chance constrained stochastic program. Technical report, University of Florida. Available at Optimization Online http://www.optimization-online.org (2013)
  28. King, A.J., Wets, R.J.B.: Epiconsistency of convex stochastic programs. Stoch. Stoch. Rep. 34(1–2), 83–92 (1991)
  29. Klabjan, D., Simchi-Levi, D., Song, M.: Robust stochastic lot-sizing by means of histograms. Prod. Oper. Manag. 22(3), 691–710 (2013)
  30. Kleywegt, A.J., Shapiro, A., Homem-de-Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2), 479–502 (2002)
  31. Kullback, S.: A lower bound for discrimination information in terms of variation. IEEE Trans. Inf. Theory 13(1), 126–127 (1967)
  32. Levi, R., Perakis, G., Uichanco, J.: The data-driven newsvendor problem: new bounds and insights. Oper. Res. (2015). doi:10.1287/opre.2015.1422
  33. Lim, A.E., Shanthikumar, J.G., Vahn, G.Y.: Conditional value-at-risk in portfolio optimization: coherent but fragile. Oper. Res. Lett. 39(3), 163–171 (2011)
  34. Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming. Linear Algebra Appl. 284(1), 193–228 (1998)
  35. Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24(1), 47–56 (1999)
  36. Noether, G.E.: Note on the Kolmogorov statistic in the discrete case. Metrika 7(1), 115–116 (1963)
  37. Popescu, I.: Robust mean-covariance solutions for stochastic optimization. Oper. Res. 55(1), 98–112 (2007)
  38. Prékopa, A.: Stochastic Programming. Kluwer Academic Publishers, Dordrecht (1995)
  39. Reed, M., Simon, B.: Methods of Modern Mathematical Physics, Vol. 1: Functional Analysis. Academic Press, New York (1981)
  40. Rice, J.: Mathematical Statistics and Data Analysis. Thomson/Brooks/Cole, Belmont (2007)
  41. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
  42. Rohlf, F.J., Sokal, R.R.: Statistical Tables, 4th edn. Macmillan, New York (2012)
  43. Scarf, H.: A min–max solution of an inventory problem. In: Arrow, K.J., Karlin, S., Scarf, H. (eds.) Studies in the Mathematical Theory of Inventory and Production, pp. 201–209. Stanford University Press, Stanford (1958)
  44. Scarsini, M.: Multivariate convex orderings, dependence, and stochastic equality. J. Appl. Probab. 35(1), 93–103 (1998)
  45. Shaked, M., Shanthikumar, J.G.: Stochastic Orders. Springer, New York (2007)
  46. Shapiro, A.: On duality theory of conic linear problems. In: Goberna, M.A., López, M.A. (eds.) Semi-Infinite Programming: Recent Advances, pp. 135–165. Kluwer Academic Publishers, Dordrecht (2001)
  47. Shapiro, A., Ruszczyński, A. (eds.): Handbooks in Operations Research and Management Science: Vol. 10. Stochastic Programming. Elsevier, Amsterdam (2003)
  48. Shawe-Taylor, J., Cristianini, N.: Estimating the moments of a random vector with applications (2003). http://eprints.soton.ac.uk/260372/1/EstimatingTheMomentsOfARandomVectorWithApplications.pdf
  49. Stephens, M.A.: Use of the Kolmogorov–Smirnov, Cramér–Von Mises and related statistics without extensive tables. J. R. Stat. Soc. B 32(1), 115–122 (1970)
  50. Thas, O.: Comparing Distributions. Springer, New York (2009)
  51. Tikhonov, A.N.: On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)
  52. Vapnik, V.: Principles of risk minimization for learning theory. In: NIPS, pp. 831–838 (1991)
  53. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
  54. Wächter, A., Biegler, L.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
  55. Wang, Z., Glynn, P., Ye, Y.: Likelihood robust optimization for data-driven problems. Preprint arXiv:1307.6279 (2013)
  56. Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62(6), 1358–1376 (2014)
  57. Žáčková, J.: On minimax solutions of stochastic linear programming problems. Časopis pro pěstování matematiky 91(4), 423–430 (1966)


Acknowledgements

The authors would like to thank the anonymous reviewers and associate editor for their extremely helpful suggestions and very thorough review of the paper. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374.

Author information

Correspondence to Nathan Kallus.

Appendix

Proof of Theorem 1

Proof

We will show that \(\mathcal {C}(x;\mathcal {F})\) is continuous in x and that when \(c(x;\xi )\) satisfies the coerciveness conditions, \(\mathcal {C}(x;\mathcal {F})\) is also coercive. The result will then follow from the usual Weierstrass extreme value theorem for deterministic optimization [5].

Let \(S=\{x\in X:\mathcal {C}\left( x;\mathcal {F}\right) <\infty \}\). By assumption \(x_0\in S\), so \(S\ne \varnothing \). Consequently, we can restrict to minimizing over S instead of over X.

Fix any \(x\in X\). Let \(\epsilon >0\) be given. By equicontinuity of the cost at x there is a \(\delta >0\) such that any \(y\in X\) with \(\left| \left| x-y\right| \right| \le \delta \) has \(\left| c\left( x;\xi \right) -c\left( y;\xi \right) \right| \le \epsilon \) for all \(\xi \in \varXi \). Fix any such y. Then

$$\begin{aligned} \mathcal {C}\left( y;\mathcal {F}\right)= & {} \sup _{F_0\in \mathcal {F}}\mathbb {E}_{F_0}[c(y;\xi )]\le \sup _{F_0\in \mathcal {F}}\mathbb {E}_{F_0}[c(x;\xi )]+\epsilon =\mathcal {C}\left( x;\mathcal {F}\right) +\epsilon , \end{aligned}$$
(48)
$$\begin{aligned} \mathcal {C}\left( x;\mathcal {F}\right)= & {} \sup _{F_0\in \mathcal {F}}\mathbb {E}_{F_0}[c(x;\xi )]\le \sup _{F_0\in \mathcal {F}}\mathbb {E}_{F_0}[c(y;\xi )]+\epsilon =\mathcal {C}\left( y;\mathcal {F}\right) +\epsilon . \end{aligned}$$
(49)

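Note that (49) implies that S is closed relative to X: if \(x\in X\) lies within distance \(\delta \) of some \(y\in S\), then \(\mathcal {C}\left( x;\mathcal {F}\right) \le \mathcal {C}\left( y;\mathcal {F}\right) +\epsilon <\infty \), so \(x\in S\). Since X is itself closed, S is closed. Furthermore, (48) and (49) imply that \(\mathcal {C}\left( x;\mathcal {F}\right) \) is continuous in x on S.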

If S is compact, the usual Weierstrass extreme value theorem provides that the continuous \(\mathcal {C}(x;\mathcal {F})\) attains its minimal (finite) value at an \(x\in S\subseteq X\).

Suppose S is not compact. Since S is closed, it must be unbounded; hence X is unbounded and therefore not compact. Then, by assumption, \(c(x;\xi )\) satisfies the coerciveness assumption. Let \(x_i\in S\) be any sequence such that \(\lim _{i\rightarrow \infty }\left| \left| x_0-x_i\right| \right| =\infty \) (such sequences exist because S is unbounded). Then, by the coerciveness assumption, \(c_i(\xi )=c(x_i;\xi )\) diverges pointwise to infinity. Fix any \(F_0\in \mathcal {F}\). Let \(c'_i(\xi )=\inf _{j\ge i}c_j(\xi )\), which is pointwise monotone nondecreasing and pointwise divergent to infinity. Then, by Lebesgue’s monotone convergence theorem, \(\lim _{i\rightarrow \infty }\mathbb {E}_{F_0}[c'_i(\xi )]=\infty \). Since \(c'_i\le c_j\) pointwise for any \(j\ge i\), we have \(\mathbb {E}_{F_0}[c'_i(\xi )]\le \inf _{j\ge i}\mathbb {E}_{F_0}[c_j(\xi )]\) and therefore

$$\begin{aligned} \infty =\lim _{i\rightarrow \infty }\mathbb {E}_{F_0}[c'_i(\xi )]\le \lim _{i\rightarrow \infty }\inf _{j\ge i}\mathbb {E}_{F_0}[c_j(\xi )]=\liminf _{i\rightarrow \infty }\mathbb {E}_{F_0}[c_i(\xi )]. \end{aligned}$$

Since the divergent sequence \(x_i\) was arbitrary and \(\mathcal {C}(x;\mathcal {F})\ge \mathbb {E}_{F_0}[c(x;\xi )]\) for every x, \(\mathcal {C}(x;\mathcal {F})\) is also coercive in x over S. Then the usual Weierstrass extreme value theorem provides that the continuous \(\mathcal {C}(x;\mathcal {F})\) attains its minimal (finite) value at an \(x\in S\subseteq X\).\(\square \)

Proof of Proposition 2

Proof

Suppose that \(c(x;\xi )\rightarrow \infty \) as \(\xi \rightarrow \infty \). The case of unboundedness in the negative direction is similar. Choose \(\rho >0\) small so that \(\xi ^{(i)}-\xi ^{(i-1)}>2\rho \) for all i. For \(\delta >0\) and \(\xi '\ge \xi ^{(N)}+\rho \), let \(F_{\delta ,\xi '}\) be the measure with density function

$$\begin{aligned} f(\xi ;\delta ,\xi ')=\left\{ \begin{array}{ll}1/(2N\rho )&{}\quad \xi ^{(i)}-\rho \le \xi \le \xi ^{(i)}+\rho \text { for }1\le i\le N-1,\\ 1/(2N\rho )&{}\quad \xi ^{(N)}-\rho +\delta \rho \le \xi \le \xi ^{(N)}+\rho -\delta \rho ,\\ 1/(2N\rho )&{}\quad \xi '\le \xi \le \xi '+2\delta \rho ,\\ 0&{}\quad \text {otherwise}.\end{array}\right. \end{aligned}$$

In words, this density spreads mass \(1/N\) uniformly over a \(\rho \)-neighborhood of every point \(\xi ^{(i)}\), except \(\xi ^{(N)}\). From the neighborhood of \(\xi ^{(N)}\), we “steal” a fraction \(\delta \) of its mass (i.e., mass \(\delta /N\)) and place it on \([\xi ',\xi '+2\delta \rho ]\). Notice that for any \(\xi '\), \(F_{0,\xi '}\) (i.e., taking \(\delta =0\)) minimizes \(S_N(F_0)\) over distributions \(F_0\). Since \(\alpha >0\), \(Q_{S_N}(\alpha )\) is strictly greater than this minimum. Since \(S_N(F_{\delta ,\xi '})\) increases continuously with \(\delta \), independently of \(\xi '\), there exists \(\delta >0\) small enough that \(F_{\delta ,\xi '}\in \mathcal {F}_{S_N}^\alpha \) for every \(\xi '>\xi ^{(N)}+\rho \).

Let any \(M>0\) be given. Since \(c(x;\xi )\rightarrow \infty \) as \(\xi \rightarrow \infty \), there exists \(\xi '>\xi ^{(N)}+\rho \) sufficiently large such that \(c(x;\xi )\ge M N/\delta \) for all \(\xi \ge \xi '\). Since \(F_{\delta ,\xi '}\) assigns probability \(\delta /N\) to \(\{\xi \ge \xi '\}\), we have \(\mathcal {C}\left( x;\mathcal {F}_{S_N}^\alpha \right) \ge \mathbb {E}_{F_{\delta ,\xi '}} \left[ c(x;\xi )\right] \ge \mathbb {P}\left( \xi \ge \xi '\right) M N/\delta =M\).

Since we have shown this for every \(M>0\), we have \(\mathcal {C}\left( x;\mathcal {F}_{S_N}^\alpha \right) =\infty \).\(\square \)
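To make the construction concrete, the sketch below lists the intervals and masses of the density \(f(\cdot ;\delta ,\xi ')\) for given data, using exact rational arithmetic from Python's `fractions` module. The helper name `stolen_mass_intervals` is hypothetical. It checks the separation condition on \(\rho \) and lets one verify that the masses sum to one and that the mass placed near \(\xi '\) equals \(\delta /N\), the quantity that drives the lower bound M above.

```python
from fractions import Fraction

def stolen_mass_intervals(data, rho, delta, xi_prime):
    """Intervals (lo, hi) with their masses under the density f(.; delta, xi') of the proof.

    Assumes distinct data whose sorted values are more than 2*rho apart,
    0 <= delta <= 1, and xi_prime >= max(data) + rho.
    """
    xs = sorted(data)
    n = len(xs)
    assert all(b - a > 2 * rho for a, b in zip(xs, xs[1:])), "rho too large"
    assert xi_prime >= xs[-1] + rho, "xi' must lie beyond the last neighborhood"
    height = 1 / (2 * n * rho)               # common density value 1/(2 N rho)
    pieces = [(x - rho, x + rho) for x in xs[:-1]]          # first N-1 order statistics
    last = xs[-1]                                           # shrunken neighborhood of xi^(N)
    pieces.append((last - rho + delta * rho, last + rho - delta * rho))
    pieces.append((xi_prime, xi_prime + 2 * delta * rho))   # stolen mass near xi'
    return [(lo, hi, height * (hi - lo)) for lo, hi in pieces]

data = [Fraction(0), Fraction(1), Fraction(3)]
rho, delta = Fraction(1, 4), Fraction(1, 10)
pieces = stolen_mass_intervals(data, rho, delta, xi_prime=Fraction(10))
total = sum(mass for _, _, mass in pieces)
tail = pieces[-1][2]
print(total, tail)  # 1 1/30
```

With \(c(x;\xi )\ge MN/\delta \) on the last interval, its contribution to the expectation is `tail * M * N / delta`, which equals M exactly.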

Computing a threshold \(Q_{C_N}(\alpha )\)

We provide two ways to compute \(Q_{C_N}(\alpha )\) for use with the LCX-based GoF test. The first is an exact closed-form formula, which may, however, be loose for moderate N. The second uses the bootstrap to compute a tighter but approximate threshold.

The theorem below employs a bound on \(\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \) to provide a valid threshold. This bound can come either from known bounds on the support or from changing (15) to a two-sided hypothesis with a two-sided confidence interval, using the lower bound as in (17) and, as the upper bound, the quantity \(\overline{Q_{R_N}}\left( \alpha _2\right) \) appearing in (50) below.

Theorem 16

Let \(N\ge 2\). Suppose that with probability at least \(1-\alpha _2\), \(\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \le \overline{Q_{R_N}}\left( \alpha _2\right) \). Let \(\alpha _1\in (0,1)\) be given and suppose \(F_0\preceq _\text {LCX}F\). Then, with probability at least \(1-\alpha _1-\alpha _2\),

$$\begin{aligned}&\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \le \overline{Q_{R_N}}\left( \alpha _2\right) \quad \text { and}\nonumber \\&C_N(F_0) \le \left( 1+\overline{Q_{R_N}}\left( \alpha _2\right) \right) \left( 1+\frac{p}{2-p}\right) \frac{2^{\frac{1}{2}+\frac{1}{p}}}{N^{1-\frac{1}{p}}}\nonumber \\&\qquad \sqrt{d+1+(d+1)\log \left( \frac{N}{d+1}\right) +\log \left( \frac{4}{\alpha _1}\right) }, \end{aligned}$$
(50)

where

$$\begin{aligned} p=\frac{1}{2}\left( \sqrt{\log (256)+8\log \left( N\right) +\left( \log \left( 2N\right) \right) ^2}-\log \left( 2N\right) \right) \in (1,2). \end{aligned}$$
(51)

Hence, defining \(Q_{C_N}(\alpha _1)\) to be the right-hand side of (50), we obtain a valid threshold for \(C_N\) in testing \(F_0 \preceq _{\text {LCX}} F\) at level \(\alpha _1\).
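The closed-form threshold is straightforward to evaluate. The sketch below computes p from (51) and the right-hand side of (50). The function names and the argument `second_moment_bound`, which stands in for \(\overline{Q_{R_N}}\left( \alpha _2\right) \), are illustrative assumptions, and the sketch assumes N is large enough relative to d that the quantity under the square root is positive.

```python
import math

def optimal_p(n):
    """Exponent p from (51)."""
    big_l = math.log(2 * n)
    return 0.5 * (math.sqrt(math.log(256) + 8 * math.log(n) + big_l ** 2) - big_l)

def lcx_threshold(n, d, alpha1, second_moment_bound):
    """Right-hand side of (50).

    second_moment_bound plays the role of the upper confidence bound
    on E_F[||xi||_2^2] (assumed to be supplied by the caller).
    """
    if n < 2:
        raise ValueError("Theorem 16 requires N >= 2")
    p = optimal_p(n)
    scale = (1 + second_moment_bound) * (1 + p / (2 - p))
    rate = 2 ** (0.5 + 1 / p) / n ** (1 - 1 / p)
    root = math.sqrt(d + 1 + (d + 1) * math.log(n / (d + 1)) + math.log(4 / alpha1))
    return scale * rate * root

print(optimal_p(100), lcx_threshold(100, d=2, alpha1=0.05, second_moment_bound=1.0))
```

Evaluating the threshold for a few values of N is a quick way to see how loose the exact bound is at moderate sample sizes, which motivates the bootstrap alternative.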

Proof

Observe

$$\begin{aligned} C_N(F_0)\le&\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \mathbb {E}_{F_0}[\max \left\{ a^T\xi -b,0\right\} ]-\mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]\right) \nonumber \\ {}&+\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]-\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) \nonumber \\\le&\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]-\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) , \end{aligned}$$
(52)

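where the first inequality follows by bounding the supremum of a sum by the sum of the suprema, and the second follows because \(F_0 \preceq _{\text {LCX}} F\) and the convexity of \(t\mapsto \max \left\{ t-b,0\right\} \) give \(\mathbb {E}_{F_0}[\max \left\{ a^T\xi -b,0\right\} ]\le \mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]\), so the first supremum is nonpositive. Next, we provide a probabilistic bound on the last supremum, which is the maximal difference over \(\left| \left| a\right| \right| _1+\left| b\right| \le 1\) between the true expectation of the hinge function \(\max \left\{ a^T\xi -b,0\right\} \) and its empirical expectation.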

The class of level sets of such functions, \(\mathcal {S} =\left\{ \left\{ \xi \in \varXi :\max \left\{ a^T\xi -b,0\right\} \le t\right\} :\left| \left| a\right| \right| _1+\left| b\right| \le 1,\,t\in \mathbb {R}\right\} \), is contained in the class consisting of the empty set, \(\varXi \) itself, and the intersections of \(\varXi \) with halfspaces: for \(t<0\) the level set is empty, and for \(t\ge 0\) it is \(\left\{ \xi \in \varXi :a^T\xi \le b+t\right\} \). Therefore, it has Vapnik–Chervonenkis dimension at most \(d+1\) (cf. [53]). Consequently, Theorem 5.2 and equation (5.12) of [53] provide that, for any \(\epsilon >0\) and \(p\in (1,2]\),

$$\begin{aligned}&\mathbb {P}\Biggl (\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\frac{1}{D_p(a,b)}\Biggl (\mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]-\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \Biggr )>\epsilon \Biggr )\\&\quad <4\exp \left( \left( \frac{(d+1)(\log \left( \frac{N}{d+1}\right) +1)}{N^{2-2/p}}-\frac{\epsilon ^2}{2^{1+2/p}}\right) N^{2-2/p}\right) ,\nonumber \end{aligned}$$
(53)

where \(D_p(a,b)=\int _0^\infty \left( \mathbb {P}_F\left( \max \left\{ a^T\xi -b,0\right\} >t\right) \right) ^{1/p}dt\).

Notice that for any \(\left| \left| a\right| \right| _1+\left| b\right| \le 1\), we have \(0\le \max \left\{ a^T\xi -b,0\right\} \le \max \left\{ 1,\left| \left| \xi \right| \right| _\infty \right\} \le \max \left\{ 1,\left| \left| \xi \right| \right| _2\right\} \) and hence \(\mathbb {E}_F\left[ \max \left\{ a^T\xi -b,0\right\} ^2\right] \le \mathbb {E}_F\left[ \max \left\{ 1,\left| \left| \xi \right| \right| _2^2\right\} \right] \le 1+\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \). These observations, combined with Markov’s inequality applied to \(\max \left\{ a^T\xi -b,0\right\} ^2\) for \(t\ge 1\), yield that, for any \(\left| \left| a\right| \right| _1+\left| b\right| \le 1\) and \(p\in (1,2)\), we have

$$\begin{aligned}&D_p(a,b)=\int _0^\infty \left( \mathbb {P}_F\left( \max \left\{ a^T\xi -b,0\right\} >t\right) \right) ^{1/p}dt\le 1\\&\qquad \qquad \qquad +\int _{1}^\infty \frac{\left( \mathbb {E}_F\left[ \max \left\{ a^T\xi -b,0\right\} ^2\right] \right) ^{1/p}}{t^{2/p}}dt\\&\quad \le \left( 1+\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \right) ^{1/p}\left( 1+\frac{p}{2-p}\right) \le \left( 1+\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \right) \left( 1+\frac{p}{2-p}\right) . \end{aligned}$$

This yields a bound on \(D_p(a,b)\) that is independent of \((a,b)\). Using this bound to take \(D_p(a,b)\) out of the supremum in (53), and bounding \(\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \le \overline{Q_{R_N}}\left( \alpha _2\right) \) (which holds with probability at least \(1-\alpha _2\)), we conclude from (52) and (53) that (50) holds with probability at least \(1-\alpha _1-\alpha _2\) for any \(p\in (1,2)\); setting the right-hand side of (53) equal to \(\alpha _1\) and solving for \(\epsilon \) gives the last two factors of (50). The p given in (51) minimizes the bound over \(p\in (1,2)\) for \(N\ge 2\).

\(\square \)
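The claim that the p in (51) optimizes the bound can be checked directly. Only two factors of (50) depend on p; writing \(L=\log (2N)\), they combine and are minimized as follows:

$$\begin{aligned} g(p)&=\left( 1+\frac{p}{2-p}\right) \frac{2^{\frac{1}{2}+\frac{1}{p}}}{N^{1-\frac{1}{p}}}=\frac{2^{3/2}}{N}\cdot \frac{(2N)^{1/p}}{2-p},\\ \frac{d}{dp}\log g(p)&=-\frac{L}{p^2}+\frac{1}{2-p}=0\iff p^2+Lp-2L=0\iff p=\frac{1}{2}\left( \sqrt{L^2+8L}-L\right) . \end{aligned}$$

Since \(8L=\log (256)+8\log (N)\), this root coincides with (51). The second derivative \(2L/p^3+1/(2-p)^2\) of \(\log g\) is positive, so the stationary point is the minimizer. Moreover, the first derivative equals \(1-L<0\) at \(p=1\) for \(N\ge 2\) (as \(L\ge \log 4>1\)) and diverges to \(+\infty \) as \(p\rightarrow 2\), so the root lies in \((1,2)\).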

Next, we show how to bootstrap an approximate threshold \(Q_{C_N}(\alpha )\). Recall that we seek a threshold \(Q_{C_N}(\alpha )\) such that \(\mathbb {P}\left( C_N(F_0)>Q_{C_N}(\alpha )\right) \le \alpha \) whenever \(F_0\preceq _{{\text {LCX}}}F\). By (52), a sufficient threshold is the \((1-\alpha )\)th quantile of

$$\begin{aligned} \sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]-\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) , \end{aligned}$$

where \(\xi ^i\) are drawn IID from F. The bootstrap [20] approximates this by replacing F with the empirical distribution \(\hat{F}_N\). In particular, given an iteration count B, for \(t=1,\dots , B\) it sets

$$\begin{aligned} Q^t=\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} -\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\tilde{\xi }^{t,i}-b,0\right\} \right) \end{aligned}$$
(54)

where \(\tilde{\xi }^{t,i}\) are drawn IID from \(\hat{F}_N\), i.e., IID random choices from \(\{\xi ^1,\dots ,\xi ^N\}\). The bootstrap then approximates \(Q_{C_N}(\alpha )\) by the \((1-\alpha )\)th quantile of \(\{Q^1,\dots ,Q^B\}\). However, (54) may be difficult to compute because the problem is nonconvex. Fortunately, (54) can be solved with a standard MILP formulation or by discretizing the feasible set of \((a,b)\) and enumerating (the objective is Lipschitz).

In particular, our bootstrap algorithm for computing \(Q_{C_N}(\alpha )\) is as follows:

[Algorithm figure not reproduced: bootstrap computation of \(Q_{C_N}(\alpha )\).]
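The bootstrap procedure described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes one-dimensional data (d = 1) and replaces the exact supremum in (54) by enumeration over a finite grid of \((a,b)\), the discretization option mentioned above, so each computed \(Q^t\) may slightly underestimate the true supremum. All function and parameter names are hypothetical. Because the hinge function is positively homogeneous, the objective of (54) scales linearly along rays in \((a,b)\), so it suffices to search the boundary \(\left| a\right| +\left| b\right| =1\) and compare with 0.

```python
import numpy as np

def hinge_means(sample, a, b):
    """Empirical mean of max(a*xi - b, 0) over `sample`, for each (a, b) pair."""
    return np.maximum(np.outer(sample, a) - b, 0.0).mean(axis=0)

def boundary_grid(num_points=2001):
    """Grid on {(a, b): |a| + |b| = 1} for scalar xi (d = 1)."""
    a = np.linspace(-1.0, 1.0, num_points)
    b_abs = 1.0 - np.abs(a)
    return np.concatenate([a, a]), np.concatenate([b_abs, -b_abs])

def bootstrap_lcx_threshold(data, alpha, iterations=1000, num_points=2001, seed=0):
    """Bootstrap approximation of Q_{C_N}(alpha) for one-dimensional data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    a_grid, b_grid = boundary_grid(num_points)
    empirical = hinge_means(data, a_grid, b_grid)  # first sum in (54), fixed across t
    q = np.empty(iterations)
    for t in range(iterations):
        resample = rng.choice(data, size=data.size, replace=True)  # IID draws from F_hat_N
        gap = empirical - hinge_means(resample, a_grid, b_grid)
        q[t] = max(0.0, gap.max())  # (a, b) = (0, 0) contributes 0
    return float(np.quantile(q, 1.0 - alpha))

rng = np.random.default_rng(1)
xi = rng.exponential(size=50)
print(bootstrap_lcx_threshold(xi, alpha=0.1, iterations=200))
```

For d > 1, a grid over the \(\ell _1\)-sphere grows quickly with dimension, which is where the MILP formulation mentioned above becomes the more practical choice.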

Proof of Proposition 4

Proof

We first prove that a uniformly consistent test is consistent. Let \(G_0 \ne F\) be given. Denote by d the Lévy-Prokhorov metric, which metrizes weak convergence (cf. [9]), and observe that \(d(G_0, F) > 0\).

Define \(R_N=\sup _{F_0\in \mathcal {F}_N}d(F_0,F)\). We claim that if the test is uniformly consistent, then \(\mathbb {P}\left( R_N \rightarrow 0\right) = 1\). Suppose that, for some sample path, \(R_N \not \rightarrow 0\). By the definition of the supremum, there must exist \(\delta > 0\) and a sequence \(F_N \in \mathcal {F}_N\) such that \(d(F_N, F) \ge \delta \) i.o. Since d metrizes weak convergence, this means that \(F_N\) does not converge to F. However, \(F_N \in \mathcal {F}_N\) for all N, i.e., it is never rejected. Therefore, by uniform consistency of the test, the event that \(R_N \not \rightarrow 0\) must have probability 0. That is, \(R_N \rightarrow 0\) a.s.

Since a.s. convergence implies convergence in probability, we have that \(\mathbb {P}\left( R_N<\epsilon \right) \rightarrow 1\) for every \(\epsilon > 0\), and, in particular, for \(\epsilon = d(G_0, F)>0\). Hence we have,

$$\begin{aligned} \mathbb {P}\left( G_0\in \mathcal {F}_N\right) \le \mathbb {P}\left( d(G_0,F)\le R_N\right) \rightarrow 0. \end{aligned}$$

This proves the first part of the proposition.

For the second part, we describe a test which is consistent but not uniformly consistent. Consider testing a continuous distribution F with the following univariate GoF test:

$$\begin{aligned}&\text {Given data }\xi ^1,\dots ,\xi ^N\text { drawn from }F\text { and a hypothetical continuous distribution }F_0:\\&\quad \text {Let }\ell =\lfloor \log _2 N\rfloor ,\, k=N-2^\ell .\\&\quad \text {If }\frac{k}{2^\ell }\le F_0(\xi ^1) \le \frac{k+1}{2^\ell }\text { then }F_0\text { is not rejected.}\\&\quad \text {Otherwise, reject }F_0\text { if it is rejected by the KS test at level }\frac{\alpha }{1-2^{-\ell }}\\&\qquad \text { applied to the data }\xi ^2, \ldots , \xi ^N. \end{aligned}$$

Notice that under the null-hypothesis, the probability of rejection is

$$\begin{aligned} \mathbb {P}(F_0 \text { rejected } )= & {} \mathbb {P} \left( F_0( \xi ^1 ) \not \in \left[ \frac{k}{2^\ell }, \frac{k+1}{2^\ell } \right] \right) \mathbb {P}( F_0 \text { is rejected by the KS test})\\= & {} (1 - 2^{-\ell }) \frac{\alpha }{1-2^{-\ell }} = \alpha , \end{aligned}$$

where we have used that \(\xi ^1\) is independent of the rest of the sample and that \(F_0(\xi ^1)\) is uniformly distributed on [0, 1] for continuous \(F_0\). Consequently, the test is a valid GoF test with significance level \(\alpha \).

We claim this test is also consistent. Specifically, consider any \(F_0 \ne F\). By continuity of \(F_0\) and consistency of the KS test,

$$\begin{aligned} \mathbb {P}\left( F_0\text { is rejected}\right) =\mathbb {P} \left( F_0( \xi ^1 ) \not \in \left[ \frac{k}{2^\ell }, \frac{k+1}{2^\ell } \right] \right) \mathbb {P}\left( F_0\text { is rejected by the KS test}\right) \longrightarrow 1. \end{aligned}$$

However, the test is not uniformly consistent. Fix any continuous \(F_0\ne F\) and let

$$\begin{aligned} F_N=\left\{ \begin{array}{ll}F_0&{}\quad \text {if }\frac{k}{2^\ell }\le F_0(\xi ^1)\le \frac{k+1}{2^\ell },\\ \hat{F}_N&{}\quad \text {otherwise.}\end{array}\right. \end{aligned}$$

Observe that \(0\le F_0(\xi ^1)\le 1\) and \([0,1]=\bigcup _{k=0}^{2^\ell -1}\left[ \frac{k}{2^\ell },\,\frac{k+1}{2^\ell }\right] \). That is, for every \(\ell \), \(F_N=F_0\) for at least one \(N\in \{2^\ell ,\dots ,2^{\ell +1}-1\}\). Therefore \(F_N = F_0\) i.o., so \(F_N\) does not converge weakly to F. However, by construction, \(F_N\) is never rejected by the above test. Since this holds on every sample path, the test cannot be uniformly consistent. \(\square \)
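The bookkeeping behind this counterexample can be sketched in Python. The functions below (hypothetical names) compute the acceptance window for \(F_0(\xi ^1)\), the adjusted KS level \(\alpha /(1-2^{-\ell })\), and the covering property used above; the KS test itself is deliberately not implemented.

```python
import math

def window(n):
    """Return (lo, hi, ell): the window [k/2**ell, (k+1)/2**ell] for F_0(xi^1) at size n >= 1."""
    ell = n.bit_length() - 1          # floor(log2(n)) for n >= 1
    k = n - 2 ** ell
    return k / 2 ** ell, (k + 1) / 2 ** ell, ell

def ks_level(n, alpha):
    """Level of the KS test applied to xi^2, ..., xi^N (requires N >= 2, so ell >= 1)."""
    _, _, ell = window(n)
    return alpha / (1 - 2.0 ** -ell)

def accepted_by_window(n, u):
    """True if u = F_0(xi^1) falls in the window, so F_0 is not rejected at size n."""
    lo, hi, _ = window(n)
    return lo <= u <= hi

def block_covers(ell, u):
    """True if some N in {2**ell, ..., 2**(ell+1) - 1} has u in its window."""
    return any(accepted_by_window(n, u) for n in range(2 ** ell, 2 ** (ell + 1)))

print(window(5))  # (0.25, 0.5, 2)
```

Under the null, the rejection probability is \((1-2^{-\ell })\) times `ks_level(n, alpha)`, which equals \(\alpha \) exactly, while `block_covers(ell, u)` holds for every u in [0, 1], which is why \(F_N=F_0\) recurs in every dyadic block of sample sizes.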

Proofs of Theorems 2 and 3

We first establish two useful results.

Proposition 8

Suppose \(\mathcal {F}_N\) is the confidence region of a uniformly consistent test and that Assumptions 1 and 3 hold. Then, almost surely, \(\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_F[c(x;\xi )]\) for every \(x\in X\) and every sequence \(F_N\in \mathcal {F}_N\).

Proof

Restrict to the a.s. event that \(\left( F_N\not \rightarrow F\implies F_N\notin \mathcal {F}_N\text { i.o.}\right) \). Fix \(F_N\in \mathcal {F}_N\). Then the contrapositive gives \(F_N\rightarrow F\). Fix x. If \(\varXi \) is bounded (Assumption 3a) then the result follows from the portmanteau lemma (see for example Theorem 2.1 of [9]). Suppose otherwise (Assumption 3b). Then \(\mathbb {E}_{F_N}[\phi (\xi )]\rightarrow \mathbb {E}_F[\phi (\xi )]\). By Theorem 3.6 of [9], \(\phi (\xi )\) is uniformly integrable over \(\{F_1,F_2,\dots \}\). Since \(c(x;\xi )=O(\phi (\xi ))\), it is also uniformly integrable over \(\{F_1,F_2,\dots \}\). Then the result follows by Theorem 3.5 of [9].\(\square \)

The following is a restatement of the equivalence between local uniform convergence of continuous functions and convergence of evaluations along a convergent path. We include the proof for completeness.

Proposition 9

Suppose Assumption 1 holds and \(\mathcal {C}(x_N;\mathcal {F}_N)\rightarrow \mathbb {E}_F[c(x;\xi )]\) for any convergent sequence \(x_N\rightarrow x\). Then (18) holds.

Proof

Let a compact set \(E\subseteq X\) be given and suppose, to the contrary, that \(\sup _{x\in E}\left| \mathcal {C}(x;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \not \rightarrow 0\). Then \(\exists \epsilon >0\) and \(x_N\in E\) such that \(\left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathbb {E}_F[c(x_N;\xi )]\right| \ge \epsilon \) i.o. This, combined with compactness, means that there exists a subsequence \(N_1<N_2<\cdots <N_k\rightarrow \infty \) such that \(x_{N_k}\rightarrow x\in E\) and \(\left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x_{N_k};\xi )]\right| \ge \epsilon \) \(\forall k\). Then,

$$\begin{aligned} 0<\epsilon\le & {} \left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x_{N_k};\xi )]\right| \le \left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x;\xi )]\right| \\&+\left| \mathbb {E}_F[c(x;\xi )]-\mathbb {E}_F[c(x_{N_k};\xi )]\right| . \end{aligned}$$

By assumption, \(\exists k_1\) such that \(\left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x;\xi )]\right| \le \epsilon /4\) \(\forall k\ge k_1\). By equicontinuity and \(x_{N_k}\rightarrow x\), \(\exists k_2\) such that \(\left| c(x;\xi )-c(x_{N_k};\xi )\right| \le \epsilon /4\) \(\forall \xi ,\,k\ge k_2\). Then,

$$\begin{aligned} \left| \mathbb {E}_F [c(x;\xi )]-\mathbb {E}_F [c(x_{N_k};\xi )]\right| \le \mathbb {E}_F[\left| c(x;\xi )-c(x_{N_k};\xi )\right| ]\le \epsilon /4\quad \forall k\ge k_2. \end{aligned}$$

Combining and considering \(k=\max \left\{ k_1,k_2\right\} \), we get the contradiction \(\epsilon \le \epsilon /2\) for strictly positive \(\epsilon \).\(\square \)

We prove the “if” and “only if” sides of Theorem 2 separately.

Proof

(Proofs of Theorem 3 and the “only if” side of Theorem 2) For either theorem restrict to the a.s. event that

$$\begin{aligned} \mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_F[c(x;\xi )]\text { for every }x\in X\text { and sequences } F_N\in \mathcal {F}_N \end{aligned}$$
(54)

(using Proposition 8 for Theorem 2 or by assumption of c-consistency for Theorem 3).

Let any convergent sequence \(x_N\rightarrow x\) and \(\epsilon >0\) be given. By equicontinuity and \(x_N\rightarrow x\), \(\exists N_1\) such that \(\left| c(x_N;\xi )-c(x;\xi )\right| \le \epsilon /2\) \(\forall \xi ,\,N\ge N_1\). Then, \(\forall N\ge N_1\),

$$\begin{aligned} \left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathcal {C}(x;\mathcal {F}_N)\right|\le & {} \sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ c(x_N;\xi )-c(x;\xi )\right] \right| \\\le & {} \sup _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ \left| c(x_N;\xi )-c(x;\xi )\right| \right] \le \epsilon /2. \end{aligned}$$

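By definition of supremum, \(\exists F_N\in \mathcal {F}_N\) such that \(\mathcal {C}(x;\mathcal {F}_N)\le \mathbb {E}_{F_N}[c(x;\xi )]+\epsilon /4\). By (54), \(\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_{F}[c(x;\xi )]\). Hence, \(\exists N_2\) such that \(\left| \mathbb {E}_{F_N}[c(x;\xi )]-\mathbb {E}_{F}[c(x;\xi )]\right| \le \epsilon /4\) \(\forall N\ge N_2\). Since also \(\mathcal {C}(x;\mathcal {F}_N)\ge \mathbb {E}_{F_N}[c(x;\xi )]\), this gives \(\mathbb {E}_F[c(x;\xi )]-\epsilon /4\le \mathcal {C}(x;\mathcal {F}_N)\le \mathbb {E}_F[c(x;\xi )]+\epsilon /2\), i.e., \(\left| \mathcal {C}(x;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \le \epsilon /2\) \(\forall N\ge N_2\). Combining these bounds with the triangle inequality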

$$\begin{aligned} \left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \le \left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathcal {C}(x;\mathcal {F}_N)\right| +\left| \mathcal {C}(x;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| , \end{aligned}$$

we get

$$\begin{aligned} \left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \le \epsilon \quad \forall N\ge \max \left\{ N_1,N_2\right\} . \end{aligned}$$

Thus, by Proposition 9, we get that (18) holds.

Let \(A_N={\text {arg}}\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)\). We now show that \(\bigcup _N A_N\) is bounded. If X is compact (Assumption 2a) then this is trivial. Suppose X is not compact (Assumption 2b). Using the same arguments as in the proof of Theorem 1, we have in particular that \(\lim _{\left| \left| x\right| \right| \rightarrow \infty }\mathbb {E}_F[c(x;\xi )]=\infty \), \(z_\text {stoch}=\min _{x\in X}\mathbb {E}_F[c(x;\xi )]<\infty \), that \(A={\text {arg}}\min _{x\in X}\mathbb {E}_F[c(x;\xi )]\) is compact, and that each \(A_N\) is compact. Let \(x^*\in A\). Fix \(\epsilon >0\). By definition of supremum \(\exists F_N\in \mathcal {F}_N\) such that \(\mathcal {C}(x^*;\mathcal {F}_N)\le \mathbb {E}_{F_N}[c(x^*;\xi )]+\epsilon \). By (54), \(\mathbb {E}_{F_N}[c(x^*;\xi )]\rightarrow \mathbb {E}_{F}[c(x^*;\xi )]=z_\text {stoch}\). As this is true for any \(\epsilon \) and since \(\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)\le \mathcal {C}(x^*;\mathcal {F}_N)\), we have \(\limsup _{N\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)\le z_\text {stoch}\). Now, suppose for contradiction that \(\bigcup _N A_N\) is unbounded, i.e., there is a subsequence \(N_1<N_2<\cdots <N_k\rightarrow \infty \) and \(x_{N_k}\in A_{N_k}\) such that \(\left| \left| x_{N_k}\right| \right| \rightarrow \infty \). Let \(\delta '=\limsup _{k\rightarrow \infty }\inf _{\xi \notin D}c(x_{N_k};\xi )\ge \liminf _{N\rightarrow \infty }\inf _{\xi \notin D}c(x_{N};\xi )>-\infty \) and \(\delta =\min \left\{ 0,\delta '\right\} \). By D-uniform coerciveness, \(\exists k_0\) such that \(c(x_{N_k};\xi )\ge (z_\text {stoch}+1-\delta )/F(D)\) \(\forall \xi \in D,\,k\ge k_0\). In the case of Theorem 2, let \(F_N\in \mathcal {F}_N\) be arbitrary. In the case of Theorem 3, let \(F_N\) be the empirical distribution \(F_N=\hat{F}_N\in \mathcal {F}_N\). In either case, we get \(F_N\rightarrow F\) weakly. In particular, \(F_N(D)\rightarrow F(D)\). 
Then, since \(F_{N_k}\in \mathcal {F}_{N_k}\) and \(x_{N_k}\in A_{N_k}\), \(\min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N_k})=\mathcal {C}(x_{N_k};\mathcal {F}_{N_k})\ge \mathbb {E}_{F_{N_k}}[c(x_{N_k};\xi )]\ge F_{N_k}(D)\times (z_\text {stoch}+1-\delta )/F(D)+\min \left\{ 0,\inf _{\xi \notin D}c(x_{N_k};\xi )\right\} \) \(\forall k\ge k_0\). Thus \(\limsup _{N\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N})\ge \limsup _{k\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N_k})\ge z_\text {stoch}+1-\delta +\delta =z_\text {stoch}+1\), yielding the contradiction \(z_\text {stoch}+1\le z_\text {stoch}\).

Thus \(\exists A_\infty \) compact such that \(A\subseteq A_\infty \) and \(A_N\subseteq A_\infty \) for all N. Then, by (18),

$$\begin{aligned} \delta _N=\left| \min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)-\min _{x\in X}\mathbb {E}_F[c(x;\xi )]\right|&=\left| \min _{x\in A_\infty }\mathcal {C}(x;\mathcal {F}_N)-\min _{x\in A_\infty }\mathbb {E}_F[c(x;\xi )]\right| \\ {}&\le \sup _{x\in A_\infty }\left| \mathcal {C}(x;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \rightarrow 0, \end{aligned}$$

yielding (19). Let \(x_N\in A_N\). Since \(A_\infty \) is compact, \(x_N\) has at least one convergent subsequence. Let \(x_{N_k}\rightarrow x\) be any convergent subsequence. Suppose for contradiction \(x\notin A\), i.e., \(\epsilon =\mathbb {E}_F[c(x;\xi )]-z_\text {stoch}>0\). Since \(x_{N_k}\rightarrow x\) and by equicontinuity, \(\exists k_1\) such that \(\left| c(x_{N_k};\xi )-c(x;\xi )\right| \le \epsilon /4\) \(\forall \xi ,\,k\ge k_1\). Then, \(\left| \mathbb {E}_F[c(x_{N_k};\xi )]-\mathbb {E}_F[c(x;\xi )]\right| \le \mathbb {E}_F[\left| c(x_{N_k};\xi )-c(x;\xi )\right| ]\le \epsilon /4\) \(\forall k\ge k_1\). Let \(\varDelta _N=\sup _{x'\in A_\infty }\left| \mathcal {C}(x';\mathcal {F}_N)-\mathbb {E}_F[c(x';\xi )]\right| \), which tends to 0 by (18). Hence \(\exists k_2\) such that \(\varDelta _{N_k}\le \epsilon /4\) \(\forall k\ge k_2\). Then, since \(x_{N_k}\in A_{N_k}\subseteq A_\infty \), for \(k\ge \max \left\{ k_1,k_2\right\} \),

$$\begin{aligned} \min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N_k})= & {} \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})\ge \mathbb {E}_F[c(x_{N_k};\xi )]-\varDelta _{N_k}\ge \mathbb {E}_F[c(x;\xi )]\\&-\,\epsilon /2= z_\text {stoch}+\epsilon /2. \end{aligned}$$

Taking limits, we contradict (19).\(\square \)

Proof

(Proof of the “if” side of Theorem 2) Consider any \(\varXi \) bounded (\(R=\sup _{\xi \in \varXi }\left| \left| \xi \right| \right| <\infty \)). Let \(X=\mathbb {R}^{d}\), and

$$\begin{aligned}&c_1(x;\xi )=\left| \left| x\right| \right| \left( 2+{\text {cos}}\left( {x^T\xi }\right) \right) ,\quad c_2(x;\xi )=\left| \left| x\right| \right| \left( 2-{\text {cos}}\left( {x^T\xi }\right) \right) ,\\&c_3(x;\xi )=\left| \left| x\right| \right| \left( 2+{\text {sin}}\left( {x^T\xi }\right) \right) ,\quad c_4(x;\xi )=\left| \left| x\right| \right| \left( 2-{\text {sin}}\left( {x^T\xi }\right) \right) . \end{aligned}$$

Since \(\left| c_i(x;\xi )\right| \le 3\left| \left| x\right| \right| \), expectations exist. The gradient of each \(c_i\) at x has magnitude bounded by \(R\left| \left| x\right| \right| +3\) uniformly over \(\xi \), so Assumption 1 is satisfied. Also, \(\lim _{\left| \left| x\right| \right| \rightarrow \infty }c_i(x;\xi )\ge \lim _{\left| \left| x\right| \right| \rightarrow \infty }\left| \left| x\right| \right| =\infty \) uniformly over all \(\xi \in \varXi \) and \(c_i(x;\xi )\ge 0\), so Assumption 2 is satisfied.

Restrict to the a.s. event that (18) applies simultaneously for \(c_1,c_2,c_3,c_4\). Let any sequence \(F_N\not \rightarrow F\) be given. Let I denote the imaginary unit. Then, by the Lévy continuity theorem and Cramér-Wold device, there exists x such that \(\mathbb {E}_{F_N}\left[ e^{Ix^T\xi }\right] \not \rightarrow \mathbb {E}_{F}\left[ e^{Ix^T\xi }\right] \). On the other hand, by (18),

$$\begin{aligned} 2\left| \left| x\right| \right| +\left| \left| x\right| \right| \sup _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ {\text {cos}}\left( {x^T\xi }\right) \right] \longrightarrow 2\left| \left| x\right| \right| +\left| \left| x\right| \right| \mathbb {E}_{F}\left[ {\text {cos}}\left( {x^T\xi }\right) \right] ,\\ 2\left| \left| x\right| \right| -\left| \left| x\right| \right| \inf _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ {\text {cos}}\left( {x^T\xi }\right) \right] \longrightarrow 2\left| \left| x\right| \right| -\left| \left| x\right| \right| \mathbb {E}_{F}\left[ {\text {cos}}\left( {x^T\xi }\right) \right] ,\\ 2\left| \left| x\right| \right| +\left| \left| x\right| \right| \sup _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ {\text {sin}}\left( {x^T\xi }\right) \right] \longrightarrow 2\left| \left| x\right| \right| +\left| \left| x\right| \right| \mathbb {E}_{F}\left[ {\text {sin}}\left( {x^T\xi }\right) \right] ,\\ 2\left| \left| x\right| \right| -\left| \left| x\right| \right| \inf _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ {\text {sin}}\left( {x^T\xi }\right) \right] \longrightarrow 2\left| \left| x\right| \right| -\left| \left| x\right| \right| \mathbb {E}_{F}\left[ {\text {sin}}\left( {x^T\xi }\right) \right] . \end{aligned}$$

The first two limits imply, after dividing by \(\left| \left| x\right| \right| \) (here \(x\ne 0\), since every characteristic function equals 1 at the origin), that \(\sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ {\text {cos}}\left( x^T\xi \right) \right] -\mathbb {E}_{F}\left[ {\text {cos}}\left( x^T\xi \right) \right] \right| \rightarrow 0\) and the second two imply that \(\sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ {\text {sin}}\left( x^T\xi \right) \right] -\mathbb {E}_{F}\left[ {\text {sin}}\left( x^T\xi \right) \right] \right| \rightarrow 0\). Together, recalling that \(e^{It}={\text {cos}}(t)+I{\text {sin}}(t)\), this implies that

$$\begin{aligned} \sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ e^{Ix^T\xi }\right] -\mathbb {E}_{F}\left[ e^{Ix^T\xi }\right] \right| \rightarrow 0. \end{aligned}$$

However, since \(\mathbb {E}_{F_N}\left[ e^{Ix^T\xi }\right] \not \rightarrow \mathbb {E}_{F}\left[ e^{Ix^T\xi }\right] \), it must be that \(F_N\notin \mathcal {F}_N\) i.o.\(\square \)

Proof of Theorem 4

Proof

In the case of finite support \(\varXi =\{\hat{\xi }^1,\dots ,\hat{\xi }^n\}\), total variation metrizes weak convergence:

$$\begin{aligned} d_\text {TV}(q,q')=\frac{1}{2}\sum _{j=1}^n\left| q(j)-q'(j)\right| . \end{aligned}$$

Restrict to the almost sure event \(d_\text {TV}(\hat{p}_N,p)\rightarrow 0\) (see Theorem 11.4.1 of [18]). Since \(d_\text {TV}(p_0,p)\le d_\text {TV}(\hat{p}_N,p_0)+d_\text {TV}(\hat{p}_N,p)\), we need only show that \(\sup _{p_0\in \mathcal {F}_N}d_\text {TV}(\hat{p}_N,p_0)\rightarrow 0\), which yields the contrapositive of the uniform consistency condition.

By an application of the Cauchy–Schwarz inequality (cf. [22]),

$$\begin{aligned} d_\text {TV}(\hat{p}_N,p_0)= & {} \frac{1}{2}\sum _{j=1}^n\frac{\left| \hat{p}_N(j)-p_0(j)\right| }{\sqrt{p_0(j)}}\times \sqrt{p_0(j)}\\\le & {} \frac{1}{2}\left( \sum _{j=1}^n\frac{\left( \hat{p}_N(j)-p_0(j)\right) ^2}{{p_0(j)}}\right) ^{1/2}=\frac{X_N(p_0)}{2}. \end{aligned}$$

By Pinsker’s inequality (cf. [31]),

$$\begin{aligned} d_\text {TV}(\hat{p}_N,p_0)\le \frac{1}{\sqrt{2}}\left( \sum _{j=1}^n\hat{p}_N(j)\log \left( \hat{p}_N(j)/p_0(j)\right) \right) ^{1/2}=\frac{G_N(p_0)}{2}. \end{aligned}$$

Since both the \(\chi ^2\) and G-tests use a rejection threshold equal to \(\sqrt{Q/N}\), where Q is the \((1-\alpha )\)th quantile of a \(\chi ^2\) distribution with \(n-1\) degrees of freedom (Q is independent of N), the two bounds above give \(\sup _{p_0\in \mathcal {F}_N}d_\text {TV}(\hat{p}_N,p_0)\le \frac{1}{2}\sqrt{Q/N}\rightarrow 0\), i.e., \(d_\text {TV}(\hat{p}_N,p_0)\) is uniformly bounded over \(p_0\in \mathcal {F}_N\) by a quantity vanishing with N.\(\square \)
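A minimal numerical sanity check of the two bounds above, written as a sketch in Python (standard library only). It assumes the normalized statistics implied by the displays, \(X_N(p_0)=(\sum _j(\hat{p}_N(j)-p_0(j))^2/p_0(j))^{1/2}\) and \(G_N(p_0)=(2\sum _j\hat{p}_N(j)\log (\hat{p}_N(j)/p_0(j)))^{1/2}\), draws random pairs of pmfs on n = 3 support points, and checks \(d_\text {TV}\le X_N/2\) and \(d_\text {TV}\le G_N/2\). For \(n-1=2\) degrees of freedom the \(\chi ^2\) quantile has the closed form \(Q=-2\log \alpha \), which is used to tabulate the uniform bound \(\frac{1}{2}\sqrt{Q/N}\) without a statistics library. The function names are illustrative and not taken from the paper.

```python
import math
import random


def d_tv(p, q):
    """Total variation distance between two pmfs on the same finite support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def x_stat(p_hat, p0):
    """Normalized Pearson statistic sqrt(sum_j (p_hat_j - p0_j)^2 / p0_j)."""
    return math.sqrt(sum((a - b) ** 2 / b for a, b in zip(p_hat, p0)))


def g_stat(p_hat, p0):
    """Normalized G statistic sqrt(2 * KL(p_hat || p0)), natural logarithm."""
    kl = sum(a * math.log(a / b) for a, b in zip(p_hat, p0) if a > 0)
    return math.sqrt(max(2.0 * kl, 0.0))


def random_pmf(n, rng):
    """Random pmf with strictly positive entries."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]


rng = random.Random(0)
for _ in range(10_000):
    p_hat, p0 = random_pmf(3, rng), random_pmf(3, rng)
    tv = d_tv(p_hat, p0)
    assert tv <= x_stat(p_hat, p0) / 2 + 1e-12  # Cauchy-Schwarz step
    assert tv <= g_stat(p_hat, p0) / 2 + 1e-12  # Pinsker step

# Uniform bound (1/2) sqrt(Q/N) on d_TV over the confidence region, n = 3.
alpha = 0.05
Q = -2.0 * math.log(alpha)  # (1 - alpha) quantile of chi^2 with 2 degrees of freedom
for N in (10, 100, 1_000, 10_000):
    print(f"N={N:>6}: sup d_TV <= {0.5 * math.sqrt(Q / N):.4f}")
```

The printed bound decreases like \(N^{-1/2}\), which is all the proof needs; for other n the quantile would come from a \(\chi ^2\) table or a statistics library.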

Proof of Theorem 5

Proof

In the case of univariate support, the Lévy metric metrizes weak convergence:

$$\begin{aligned} d_{\mathrm{L}\acute{e}\mathrm{vy}}(G,G')=\inf \{\epsilon >0:G(\xi -\epsilon )-\epsilon \le G'(\xi )\le G(\xi +\epsilon )+\epsilon \ \forall \xi \in \mathbb {R}\}. \end{aligned}$$

Restrict to the almost sure event \(d_{\mathrm{L}\acute{e}\mathrm{vy}}(\hat{F}_N,F)\rightarrow 0\) (see Theorem 11.4.1 of [18]). We need only show that now \(\sup _{F_0\in \mathcal {F}_N}d_{\mathrm{L}\acute{e}\mathrm{vy}}(\hat{F}_N,F_0)\rightarrow 0\), yielding the contrapositive of the uniform consistency condition.

Fix \(F_0\) and let \(0\le \epsilon <d_{\mathrm{L}\acute{e}\mathrm{vy}}(\hat{F}_N,F_0)\). Then \(\exists \xi _0\) such that either (1) \(\hat{F}_N(\xi _0-\epsilon )-\epsilon >F_0(\xi _0)\) or (2) \(\hat{F}_N(\xi _0+\epsilon )+\epsilon <F_0(\xi _0)\). Since \(F_0\) is monotonically non-decreasing, (1) implies \(D_N(F_0)\ge \hat{F}_N(\xi _0-\epsilon )-F_0(\xi _0-\epsilon )>\epsilon \) and (2) implies \(D_N(F_0)\ge F_0(\xi _0+\epsilon )-\hat{F}_N(\xi _0+\epsilon )>\epsilon \). Hence \(d_{\mathrm{L}\acute{e}\mathrm{vy}}(\hat{F}_N,F_0)\le D_N(F_0)\). Moreover, \(D_N\le V_N\) by definition. Since \(\sup _{F_0\in \mathcal {F}_{S_N}^\alpha }S_N(F_0)\le Q_{S_N}(\alpha )=O(N^{-1/2})\) for either statistic, both the KS and Kuiper tests are uniformly consistent.
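The KS and Kuiper statistics in this argument are computed from order statistics in the usual way. The sketch below (Python, standard library only) assumes a continuous hypothesized CDF \(F_0\); it returns \(D_N\) and \(V_N\), checks \(D_N\le V_N\), and prints \(D_N\) for a correctly specified and a misspecified \(F_0\), so that the \(N^{-1/2}\) decay under \(F_0=F\) can be compared with the non-vanishing value otherwise. The function names and the normal and shifted-normal example are illustrative assumptions.

```python
import math
import random


def ks_kuiper(sample, cdf):
    """Return (D_N, V_N) for a sample against a continuous hypothesized CDF.

    With z_(i) = F0(xi^(i)) for the sorted sample:
      D+ = max_i (i/N - z_(i)),  D- = max_i (z_(i) - (i-1)/N),
      D_N = max(D+, D-) (Kolmogorov-Smirnov),  V_N = D+ + D- (Kuiper).
    """
    z = sorted(cdf(x) for x in sample)
    n = len(z)
    d_plus = max(i / n - zi for i, zi in enumerate(z, start=1))
    d_minus = max(zi - (i - 1) / n for i, zi in enumerate(z, start=1))
    return max(d_plus, d_minus), d_plus + d_minus


def normal_cdf(x, mu=0.0):
    """CDF of a normal distribution with mean mu and unit variance."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))


rng = random.Random(1)
for n in (100, 1_000, 10_000):
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d_true, v_true = ks_kuiper(data, normal_cdf)                   # F0 = F
    d_wrong, _ = ks_kuiper(data, lambda x: normal_cdf(x, mu=0.5))  # F0 != F
    assert d_true <= v_true
    print(f"N={n:>6}: D_N(F)={d_true:.4f}, "
          f"sqrt(N)*D_N(F)={math.sqrt(n) * d_true:.3f}, D_N(wrong F0)={d_wrong:.4f}")
```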

Consider \(D'_N(F_0)=\max _{i=1,\,\dots ,\,N}\left| F_0(\xi ^{(i)})-\frac{2i-1}{2N}\right| =\sigma \left( F_0(\xi ^{(j)})-\frac{2j-1}{2N}\right) \), where j and \(\sigma \) are the maximizing index and sign, respectively. Suppose \(D'_N(F_0)\ge 1/\sqrt{N}+1/N\). If \(\sigma =+1\), then, since \(F_0\le 1\), this necessarily means that \(1-\frac{2j-1}{2N}\ge 1/\sqrt{N}+1/N\), i.e., \(N-j\ge \sqrt{N}+1/2\), and therefore, as \(N-j\) is an integer, \(N-j\ge \lceil \sqrt{N}\rceil \). By monotonicity of \(F_0\) we have for \(0\le k\le \lceil \sqrt{N}\rceil \) that \(j+k\le N\) and

$$\begin{aligned} F_0(\xi ^{(j+k)})-\frac{2(j+k)-1}{2N}\ge F_0(\xi ^{(j)})-\frac{2j-1}{2N}-\frac{k}{N}=D'_N(F_0)-\frac{k}{N}\ge 0. \end{aligned}$$

If instead \(\sigma =-1\), this necessarily means that \(\frac{2j-1}{2N}\ge 1/\sqrt{N}+1/N\) and therefore \(j\ge \lceil \sqrt{N}\rceil +1\). By monotonicity of \(F_0\) we have for \(0\le k\le \lceil \sqrt{N}\rceil \) that \(j-k\ge 1\) and

$$\begin{aligned} \frac{2(j-k)-1}{2N}-F_0(\xi ^{(j-k)})\ge \frac{2j-1}{2N}-F_0(\xi ^{(j)})-\frac{k}{N}=D'_N(F_0)-\frac{k}{N}\ge 0. \end{aligned}$$

In either case we have that

$$\begin{aligned} W_N^2= & {} \frac{1}{12N^2}+\frac{1}{N}\sum _{i=1}^N\left( F_0(\xi ^{(i)})-\frac{2i-1}{2N}\right) ^2\ge \frac{1}{12N^2}\\&+\, \frac{1}{N}\sum _{k=0}^{\lceil \sqrt{N}\rceil }\left( D'_N-\frac{k}{N}\right) ^2\ge \frac{D_N^2}{\sqrt{N}}-\frac{2}{N} \end{aligned}$$

using \(D'_N(F_0)\ge 1/\sqrt{N}+1/N\) and \(\left| D'_N(F_0)-D_N(F_0)\right| \le 1/(2N)\) in the last inequality. Therefore,

$$\begin{aligned} D_N^2(F_0)\le \max \left\{ \frac{1}{\sqrt{N}}+\frac{3}{2N},\,\sqrt{N}W_N^2(F_0)+\frac{2}{\sqrt{N}}\right\} . \end{aligned}$$

Since \(F_0(\xi )(1-F_0(\xi ))\le 1/4\le 1\), the integral formulations of CvM and AD (see [50]) give \(W^2_N(F_0)\le A^2_N(F_0)\), so the same bound holds with \(W^2_N\) replaced by \(A^2_N\). For either the CvM or the AD statistic, \(Q_{S_N}(\alpha )=O(N^{-1/2})\) and hence \(\sup _{F_0\in \mathcal {F}_{S_N}^\alpha }S^2_N(F_0)=O(N^{-1})\). Substituting into the last display gives \(\sup _{F_0\in \mathcal {F}_{S_N}^\alpha }D^2_N(F_0)=O(N^{-1/2})\), so, by the first part of the proof, both the CvM and AD tests are uniformly consistent.

$$\begin{aligned} W_N^2-U_N^2&=\left( \frac{1}{N}\sum _{i=1}^NF_0(\xi ^{(i)})-\frac{1}{2}\right) ^2\\&\le \max \left\{ \left( \frac{1}{N}\sum _{i=1}^N\min \left\{ 1,\frac{2i-1}{2N}+D'_N(F_0)\right\} -\frac{1}{2}\right) ^2,\,\right. \\&\quad \left. \left( \frac{1}{N}\sum _{i=1}^N\max \left\{ 0,\frac{2i-1}{2N}-D'_N(F_0)\right\} -\frac{1}{2}\right) ^2\right\} . \end{aligned}$$

Letting \(M=\lfloor \frac{1}{2}+N(1-D'_N(F_0))\rfloor \) we have \(\sum _{i=1}^N\min \left\{ 1,\frac{2i-1}{2N}+D'_N(F_0)\right\} =\frac{M^2}{2N}+MD'_N(F_0)+N-M\) so that in the case of \(D'_N(F_0)\ge 1/\sqrt{N}+1/N\), \(\left( \frac{1}{N}\sum _{i=1}^N\min \left\{ 1,\frac{2i-1}{2N}+D'_N(F_0)\right\} -\frac{1}{2}\right) ^2=O(1/N)\). Thus, the Watson test is also uniformly consistent.\(\square \)
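Several identities used in this proof can be checked numerically. The sketch below (Python, standard library only) computes the normalized statistics used above, i.e., the standard CvM, Watson, and AD statistics divided by N, assuming a continuous hypothesized CDF. It confirms that in fact \(D_N=D'_N+1/(2N)\) exactly (which gives \(\left| D'_N-D_N\right| \le 1/(2N)\)), that \(U_N^2\le W_N^2\), that \(A_N^2\ge 4W_N^2\) (because \(F_0(1-F_0)\le 1/4\)), and the closed form \(\sum _{i=1}^N\min \{1,\frac{2i-1}{2N}+D'_N\}=\frac{M^2}{2N}+MD'_N+N-M\). All names, and the deliberately misspecified normal example, are illustrative.

```python
import math
import random


def edf_stats(sample, cdf):
    """Return (D, D', W^2, U^2, A^2), each standard EDF statistic divided by N."""
    z = sorted(cdf(x) for x in sample)
    n = len(z)
    mid = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]
    d = max(max(i / n - zi, zi - (i - 1) / n) for i, zi in enumerate(z, start=1))
    d_prime = max(abs(zi - m) for zi, m in zip(z, mid))
    w2 = 1 / (12 * n * n) + sum((zi - m) ** 2 for zi, m in zip(z, mid)) / n
    u2 = w2 - (sum(z) / n - 0.5) ** 2  # Watson: W^2 minus squared mean offset
    # Standard computational form of Anderson-Darling, then divided by n.
    a2_std = -n - sum(
        (2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
        for i in range(1, n + 1)
    ) / n
    return d, d_prime, w2, u2, a2_std / n


def clipped_sum(n, dp):
    """Direct sum and closed form of sum_i min{1, (2i-1)/(2n) + dp}."""
    direct = sum(min(1.0, (2 * i - 1) / (2 * n) + dp) for i in range(1, n + 1))
    m = math.floor(0.5 + n * (1.0 - dp))
    closed = m * m / (2 * n) + m * dp + n - m
    return direct, closed


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rng = random.Random(2)
for n in (5, 50, 500):
    data = [rng.gauss(0.3, 1.2) for _ in range(n)]  # F0 = N(0,1) is misspecified
    d, dp, w2, u2, a2 = edf_stats(data, normal_cdf)
    assert abs(d - (dp + 1 / (2 * n))) < 1e-12
    assert u2 <= w2 and a2 >= 4 * w2 - 1e-9
    direct, closed = clipped_sum(n, rng.uniform(0.0, 1.0 - 1 / (2 * n)))
    assert abs(direct - closed) < 1e-9
```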

Proof of Proposition 5

Proof

Apply Theorem 2 to each i and restrict to the almost sure event that (18) holds for all i. Fix \(F_N\) such that \(F_N\in \mathcal {F}_N\) eventually. Then, (18) yields \(\mathbb {E}_{F_N}[c_i(x;\xi _i)]\rightarrow \mathbb {E}_{F}[c_i(x;\xi _i)]\) for every \(x\in X\). Summing over i yields the contrapositive of the c-consistency condition.\(\square \)

Proof of Proposition 6

Proof

Restrict to a sample path in the almost sure event \(\mathbb {E}_{\hat{F}_N}[\xi _i]\rightarrow \mathbb {E}_{F}[\xi _i]\), \(\mathbb {E}_{\hat{F}_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{F}[\xi _i\xi _j]\) for all i, j. Consider any \(F_N\) such that \(F_N\in \mathcal {F}^\alpha _{\text {CEG},N}\) eventually. Then clearly \(\mathbb {E}_{F_N}[\xi _i]\rightarrow \mathbb {E}_{F}[\xi _i]\), \(\mathbb {E}_{F_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{F}[\xi _i\xi _j]\).

Consider any \(F_N\) such that \(F_N\in \mathcal {F}^\alpha _{\text {DY},N}\) eventually. Because covariances exist, we may restrict to N large enough so that \(\left| \left| \hat{\varSigma }_N\right| \right| _2\le M\) (operator norm) for some constant \(M<\infty \) and \(F_N\in \mathcal {F}^\alpha _{\text {DY},N}\). Then we get

$$\begin{aligned} \left| \left| \mathbb {E}_{F_N}[\xi ]-\hat{\mu }_N\right| \right| \le M\gamma _{1,N}(\alpha )\rightarrow 0 \end{aligned}$$

and

$$\begin{aligned} \left( \gamma _{3,N}(\alpha )-1\right) \hat{\varSigma }_N\preceq \mathbb {E}_{F_N}[\left( \xi -\hat{\mu }_N\right) \left( \xi -\hat{\mu }_N\right) ^T]-\hat{\varSigma }_N\preceq \left( \gamma _{2,N}(\alpha )-1\right) \hat{\varSigma }_N, \end{aligned}$$

which gives \(\left| \left| \mathbb {E}_{F_N}[\left( \xi -\hat{\mu }_N\right) \left( \xi -\hat{\mu }_N\right) ^T]-\hat{\varSigma }_N\right| \right| _2\le M\max \left\{ \gamma _{2,N}(\alpha )-1,\right. \left. 1-\gamma _{3,N}(\alpha )\right\} \rightarrow 0\). Then again, we have \(\mathbb {E}_{F_N}[\xi _i]\rightarrow \mathbb {E}_{ F}[\xi _i]\), \(\mathbb {E}_{F_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{ F}[\xi _i\xi _j]\).

In either case we get \(\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_{F}[c(x;\xi )]\) for any x due to factorability as in (26). This yields the contrapositive of the c-consistency condition.\(\square \)

Proof of Theorem 6

Proof

If \(F_0\ne F\) then Theorem 1 of [44] yields that either \(F_0\npreceq _\text {LCX}F\) or there is some \(j=1,\dots ,d\) such that \(\mathbb {E}_{F_0} [\xi _j^2]\ne \mathbb {E}_{F} [\xi _j^2]\). If \(F_0\npreceq _\text {LCX}F\) then probability of rejection approaches one since \(C_N>0\) but \(Q_{C_N}(\alpha _1)\rightarrow 0\). Otherwise, \(F_0\preceq _\text {LCX}F\) yields \(\mathbb {E}_{F_0} [\xi _i^2]\le \mathbb {E}_{F} [\xi _i^2]\) for all i via (12) using \(a=e_i\) and \(\phi (\zeta )=\zeta ^2\). Then \(\mathbb {E}_{F_0} [\xi _j^2]\ne \mathbb {E}_{F} [\xi _j^2]\) must mean that \(\mathbb {E}_{F_0}\left[ \left| \left| \xi \right| \right| _2^2\right] <\mathbb {E}_{F}\left[ \left| \left| \xi \right| \right| _2^2\right] \) and probability of rejection goes to one.\(\square \)

Proof of Theorem 7

Proof

Let \(R=\sup _{\xi \in \varXi }\left| \left| \xi \right| \right| _2<\infty \). Restrict to the almost sure event that \(\hat{F}_N\rightarrow F\). Consider \(F_N\) such that \(F_N\in \mathcal {F}_N\) eventually. Let N be large enough so that it is so. Fix \(\left| \left| a\right| \right| _2=1\). Let \(a_1=a\) and complete an orthonormal basis for \(\mathbb {R}^{d}\): \(a_1,a_2,\dots ,a_d\). On the one hand we have \(Q_{R_N}(\alpha _2)\ge \mathbb {E}_{\hat{F}_N}\left[ \sum _{i=1}^d(a_i^T\xi )^2\right] -\mathbb {E}_{F_N}\left[ \sum _{i=1}^d(a_i^T\xi )^2\right] \). On the other hand, for each i,

$$\begin{aligned}&\mathbb {E}_{\hat{F}_N}\left[ (a_i^T\xi )^2\right] -\mathbb {E}_{ F_N}\left[ (a_i^T\xi )^2\right] \\&\quad =~2\int _{b=-R}^0\left( \mathbb {E}_{\hat{F}_N}[\max \left\{ b-a_i^T\xi ,0\right\} ]-\mathbb {E}_{ F_N}[\max \left\{ b-a_i^T\xi ,0\right\} ]\right) db\\&\qquad +2\int _{b=0}^R\left( \mathbb {E}_{\hat{F}_N}[\max \left\{ a_i^T\xi -b,0\right\} ]-\mathbb {E}_{ F_N}[\max \left\{ a_i^T\xi -b,0\right\} ]\right) db\\&\quad \ge ~-4\int _{b=0}^R(\left| \left| a_i\right| \right| _1+\left| b\right| )Q_{C_N}(\alpha _1)db\ge -4\left( R\sqrt{d}+R^2/2\right) Q_{C_N}(\alpha _1)=-p_N. \end{aligned}$$

Therefore, \(q_N=Q_{R_N}(\alpha _2)+(d-1)p_N\ge \mathbb {E}_{\hat{F}_N}\left[ (a^T\xi )^2\right] -\mathbb {E}_{F_N}\left[ (a^T\xi )^2\right] \) and \(Q_{R_N}(\alpha _2),Q_{C_N}(\alpha _1),p_N,q_N\rightarrow 0\). Let \(G_N(t)=F_N(\left\{ \xi :a^T\xi \le t\right\} )\in [0,1]\) and \(\hat{G}_N(t)=\hat{F}_N(\left\{ \xi :a^T\xi \le t\right\} )\in [0,1]\) be the CDFs of \(a^T\xi \) under \(F_N\) and \(\hat{F}_N\), respectively. Then,

$$\begin{aligned} q_N&\ge \mathbb {E}_{\hat{F}_N}\left[ (a^T\xi )^2\right] -\mathbb {E}_{F_N}\left[ (a^T\xi )^2\right] \\&\quad =~2\int _{b=-R}^0\left( \mathbb {E}_{\hat{F}_N}[\max \left\{ b-a^T\xi ,0\right\} ]-\mathbb {E}_{ F_N}[\max \left\{ b-a^T\xi ,0\right\} ]\right) db\\&\qquad +2\int _{b=0}^R\left( \mathbb {E}_{\hat{F}_N}[\max \left\{ a^T\xi -b,0\right\} ]-\mathbb {E}_{ F_N}[\max \left\{ a^T\xi -b,0\right\} ]\right) db\\&\quad =~2\int _{b=-R}^0\int _{t=-R}^{b}\left( \hat{G}_N(t)-G_N(t)\right) dt\,db\\&\qquad +2\int _{b=0}^R\int _{t=b}^{R}\left( G_N(t)-\hat{G}_N(t)\right) dt\,db\ge -p_N,\\&\quad \int _{t=-R}^{b}\left( \hat{G}_N(t)-G_N(t)\right) dt\ge -(\sqrt{d}+R)Q_{C_N}(\alpha _1)~~\forall b\in [-R,0],\\&\quad \int _{t=b}^{R}\left( G_N(t)-\hat{G}_N(t)\right) dt\ge -(\sqrt{d}+R)Q_{C_N}(\alpha _1)~~\forall b\in [0,R]. \end{aligned}$$

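Each inner integral in the last two lines is 1-Lipschitz in b and bounded below by \(-(\sqrt{d}+R)Q_{C_N}(\alpha _1)\rightarrow 0\), while, by the first relation in the display, the sum of their integrals over b is at most \(q_N/2\rightarrow 0\). Hence both inner integrals converge to zero uniformly in b, which gives \(\int _{t=-R}^{b}\left( G_N(t)-\hat{G}_N(t)\right) dt\rightarrow 0\) for every \(b\in [-R,R]\). Because \(\hat{F}_N\rightarrow F\), we get \(\hat{G}_N(t)\rightarrow F(\left\{ \xi :a^T\xi \le t\right\} )\) at every continuity point t; since pointwise convergence of the convex functions \(b\mapsto \int _{t=-R}^{b}G_N(t)dt\) implies convergence of their derivatives wherever the limit is differentiable, also \(G_N(t)\rightarrow F(\left\{ \xi :a^T\xi \le t\right\} )\) at every continuity point t. Because this is true for every a, the Cramér-Wold device yields \(F_N\rightarrow F\). This is the contrapositive of the uniform consistency condition.\(\square \)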
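The proof rests on two elementary representations: \(u^2=2\int _{-R}^0\max \{b-u,0\}\,db+2\int _0^R\max \{u-b,0\}\,db\) for \(\left| u\right| \le R\), and \(\mathbb {E}[\max \{b-U,0\}]=\int _{-R}^bG(t)\,dt\) for a random variable U supported in \([-R,R]\) with CDF G. The sketch below (Python, standard library only) checks both with a composite midpoint rule; U is taken to be an empirical distribution, standing in for the projection \(a^T\xi \) under \(\hat{F}_N\). The sample values, R, and grid size are arbitrary choices for illustration.

```python
import random


def midpoint(f, lo, hi, m=20_000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / m
    return h * sum(f(lo + (k + 0.5) * h) for k in range(m))


def square_via_hinges(u, R):
    """2*int_{-R}^0 max(b-u,0) db + 2*int_0^R max(u-b,0) db; equals u^2 if |u| <= R."""
    left = midpoint(lambda b: max(b - u, 0.0), -R, 0.0)
    right = midpoint(lambda b: max(u - b, 0.0), 0.0, R)
    return 2.0 * left + 2.0 * right


R = 2.0
for u in (-1.7, -0.3, 0.0, 0.8, 1.9):
    assert abs(square_via_hinges(u, R) - u * u) < 1e-6

# Integrated CDF: E[max(b - U, 0)] = int_{-R}^b G(t) dt for an empirical U on [-R, R].
rng = random.Random(3)
us = [rng.uniform(-R, R) for _ in range(25)]


def G(t):
    """Empirical CDF of the sample us."""
    return sum(v <= t for v in us) / len(us)


for b in (-1.5, -0.5, 0.0, 1.0):
    lhs = sum(max(b - v, 0.0) for v in us) / len(us)
    rhs = midpoint(G, -R, b)
    assert abs(lhs - rhs) < 1e-3  # midpoint error on a step function is at most h/2
```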

Proof of Theorem 8

Proof

The proof amounts to dualizing a finite-support phi-divergence as done in Theorem 1 of [3] and is included here only for the sake of self-containment.

Let \(\phi _{X}(t)=(t-1)^2/t\) and \(\phi _{G}(t)=-\log (t)+t-1\) (corresponding to “\(\chi ^2\)-distance” and “Burg entropy” in Table 2 of [3], respectively). Then we can write

$$\begin{aligned} \mathcal {F}_{X_N}^\alpha= & {} \left\{ p_0\ge 0:\,\sum _{j=1}^np_0(j)=1,\,\sum _{j=1}^n\hat{p}_N(j)\phi _X(p_0(j)/\hat{p}_N(j))\le (Q_{X_N}(\alpha ))^2\right\} ,\\ \mathcal {F}_{G_N}^\alpha= & {} \left\{ p_0\ge 0:\,\sum _{j=1}^np_0(j)=1,\,\sum _{j=1}^n\hat{p}_N(j)\phi _G(p_0(j)/\hat{p}_N(j))\le \frac{1}{2}(Q_{G_N}(\alpha ))^2\right\} . \end{aligned}$$

Therefore, letting \(c_j=c(x;\hat{\xi }^j)\) and fixing \(\phi \) and Q appropriately, the inner problem (4) is equal to

$$\begin{aligned} \max _{p_0\in \mathbb {R}^{n}_+}\quad&c^Tp_0\\ \text {s.t.}\quad&\sum _{j=1}^np_0(j)=1\\&\sum _{j=1}^n\hat{p}_N(j)\phi (p_0(j)/\hat{p}_N(j))\le Q^2. \end{aligned}$$

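By Lagrangian duality (strong duality holds since \(p_0=\hat{p}_N\) strictly satisfies the divergence constraint whenever \(Q>0\)), the above is equal to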

$$\begin{aligned}&\min _{r\in \mathbb {R},\,s\in \mathbb {R}_+}\max _{p_0\in \mathbb {R}^{n}_+}\quad c^Tp_0+r(1-e^Tp_0)+s(Q^2-\sum _{j=1}^n\hat{p}_N(j)\phi (p_0(j)/\hat{p}_N(j))) \\&\quad =\min _{r\in \mathbb {R},\,s\in \mathbb {R}_+} r+Q^2s+\sum _{j=1}^n\hat{p}_N(j)s\max _{\rho \in \mathbb {R}_+}\left( \frac{c_j-r}{s}\rho -\phi (\rho )\right) \\&\quad =\min _{r\in \mathbb {R},\,s\in \mathbb {R}_+} r+Q^2s+\sum _{j=1}^n \hat{p}_N(j)s\phi ^*\left( \frac{c_j-r}{s}\right) , \end{aligned}$$

where \(\phi ^*(\tau )=\sup _{\rho \ge 0}(\tau \rho -\phi (\rho ))\) is the convex conjugate. Therefore, the inner problem (4) is equal to

$$\begin{aligned} \min _{r,s,t}\quad&r+Q^2s-\sum _{j=1}^n\hat{p}_N(j)t_j\\ \text {s.t.}\quad&r\in \mathbb {R},\,s\in \mathbb {R}_+,\,t\in \mathbb {R}^{n} ,\,c\in \mathbb {R}^{n}\\&t_j\le -s\phi ^*\left( \frac{c_j-r}{s}\right)&\forall j=1,\dots ,n \\&c_j\ge c(x;\hat{\xi }^j)&\forall j=1,\dots ,n. \end{aligned}$$

It is easy to verify that

$$\begin{aligned} \phi ^*_{X}(\tau )=\left\{ \begin{array}{ll}2-2\sqrt{1-\tau }&{}\quad \tau \le 1\\ \infty &{}\quad \text {otherwise}\end{array}\right. ,\quad \text {and}\quad \phi ^*_{G}(\tau )=\left\{ \begin{array}{ll}-\log (1-\tau )&{}\quad \tau < 1\\ \infty &{}\quad \text {otherwise}\end{array}\right. \end{aligned}$$

(see e.g. Table 4 of [3]). In the case of \(X_N\), since \(s\ge 0\), we get

$$\begin{aligned} \begin{array}{c} t_j\le -s\phi _X^*\left( \frac{c_j-r}{s}\right) \end{array}&\iff \begin{array}{c} c_j-r\le s,\\ 2s+t_j\le 2\sqrt{s(s-c_j+r)} \end{array}\\ {}&\iff \exists y_j:\begin{array}{c} c_j-r\le s,\,y_j\ge 0,\\ 2s+t_j\le y_j,\\ y_j^2\le (2s)(2s-2c_j+2r) \end{array}\\ {}&\iff \exists y_j:\begin{array}{c} c_j-r\le s,\,y_j\ge 0,\\ 2s+t_j\le y_j,\\ y_j^2+(r-c_j)^2\le (2s-c_j+r)^2. \end{array} \end{aligned}$$

In the case of \(G_N\), since \(s\ge 0\), we get

$$\begin{aligned} t_j\le -s\phi _G^*\left( \frac{c_j-r}{s}\right)&\iff \begin{array}{c} c_j-r\le s,\\ se^{t_j/s}\le s+r-c_j \end{array}\\ {}&\iff \left( t_j,\,s,\,s+r-c_j\right) \in C_\text {XC}. \end{aligned}$$

\(\square \)
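The conjugates and conic reformulations above can be sanity-checked numerically. The sketch below (Python, standard library only) approximates \(\phi ^*(\tau )=\sup _{\rho >0}(\tau \rho -\phi (\rho ))\) on a grid and compares it with the closed forms for \(\phi _X\) and \(\phi _G\). It then samples random \((t_j,s,r,c_j)\) with \(s>0\) and checks that \(t_j\le -s\phi ^*((c_j-r)/s)\) holds exactly when the second-order-cone form (for \(X_N\)) or the exponential-cone form (for \(G_N\)) does. In the cone check the largest admissible \(y_j\) is used, which is a convenience of this sketch rather than part of the formulation; grid bounds and sampling ranges are arbitrary.

```python
import math
import random


def phi_x(rho):
    return (rho - 1.0) ** 2 / rho  # chi^2-distance generator


def phi_g(rho):
    return -math.log(rho) + rho - 1.0  # Burg-entropy generator


def conj_numeric(phi, tau, hi=50.0, step=1e-3):
    """Grid approximation of sup_{0 < rho <= hi} (tau * rho - phi(rho))."""
    return max(tau * (k * step) - phi(k * step) for k in range(1, int(hi / step) + 1))


def conj_x(tau):
    return 2.0 - 2.0 * math.sqrt(1.0 - tau) if tau <= 1.0 else math.inf


def conj_g(tau):
    return -math.log(1.0 - tau) if tau < 1.0 else math.inf


for tau in (-2.0, 0.0, 0.5, 0.9):
    assert abs(conj_numeric(phi_x, tau) - conj_x(tau)) < 1e-4
    assert abs(conj_numeric(phi_g, tau) - conj_g(tau)) < 1e-4


def soc_form(t, s, r, c):
    """c - r <= s and exists y >= 0 with 2s + t <= y, y^2 + (r - c)^2 <= (2s - c + r)^2."""
    if c - r > s:
        return False
    y_max = math.sqrt(max((2 * s - c + r) ** 2 - (r - c) ** 2, 0.0))
    return 2 * s + t <= y_max


def exp_cone_form(t, s, r, c):
    """(t, s, s + r - c) in the exponential cone, for s > 0."""
    return s * math.exp(t / s) <= s + r - c


rng = random.Random(4)
for _ in range(20_000):
    t, s = rng.uniform(-5, 5), rng.uniform(0.1, 3)
    r, c = rng.uniform(-3, 3), rng.uniform(-3, 3)
    tau = (c - r) / s
    assert (t <= -s * conj_x(tau)) == soc_form(t, s, r, c)
    assert (t <= -s * conj_g(tau)) == exp_cone_form(t, s, r, c)
```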

Proof of Theorem 9

Proof

Problem (3) is equal to the optimization problems of Theorem 8 augmented with the variable \(x\in X\), and weak optimization is polynomially reducible to weak separation (see [23]). Tractable weak separation for all constraints except \(x\in X\) and (28) is given by the tractable weak optimization over these standard conic-affine constraints. A weak separation oracle is assumed given for \(x\in X\). We polynomially reduce separation over \(c_j\ge \max _kc_{jk}(x)\) at a fixed query point \((c'_j,x')\) to the oracles. We first call the evaluation oracle for each k to check for a violation. If there is one, we take \(k^*\in \arg \max _kc_{jk}(x')\), call the subgradient oracle to get \(s\in \partial c_{jk^*}(x')\) with \(\left| \left| s\right| \right| _\infty \le 1\), and produce the separating hyperplane \(0\ge c_{jk^*}(x')-c_j+s^T(x-x')\), which is violated at \((c'_j,x')\) and satisfied by every feasible \((c_j,x)\) since s is a subgradient of \(c_{jk^*}\) at \(x'\).

\(\square \)
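The separation step above can be written down directly. The sketch below (Python, standard library only) assumes each piece \(c_{jk}\) is convex and is supplied as a pair of hypothetical callables, an evaluation oracle and a subgradient oracle; given a query point \((c'_j,x')\) it either certifies \(c'_j\ge \max _kc_{jk}(x')\) or returns the data of the cut \(0\ge c_{jk^*}(x')-c_j+s^T(x-x')\). Representing a cut as a (value, subgradient, point) triple is a choice made only for this illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Cut = Tuple[float, Vector, Vector]  # (c_jk*(x'), s, x'): cut is 0 >= value - c_j + s.(x - x')


def separate_max(
    c_query: float,
    x_query: Vector,
    evals: Sequence[Callable[[Vector], float]],
    subgrads: Sequence[Callable[[Vector], Vector]],
) -> Optional[Cut]:
    """Separation oracle for the epigraph constraint c_j >= max_k c_jk(x)."""
    values = [f(x_query) for f in evals]  # one evaluation-oracle call per k
    k_star = max(range(len(values)), key=values.__getitem__)
    if c_query >= values[k_star]:
        return None  # the query point satisfies the constraint
    s = subgrads[k_star](x_query)  # subgradient of the active piece at x'
    return values[k_star], s, list(x_query)


def cut_lhs(cut: Cut, c_j: float, x: Vector) -> float:
    """Evaluate value - c_j + s.(x - x'); the cut requires this to be <= 0."""
    value, s, x0 = cut
    return value - c_j + sum(si * (xi - x0i) for si, xi, x0i in zip(s, x, x0))


# Two hypothetical convex pieces: c_j1(x) = x0 + 2*x1 and c_j2(x) = |x0|.
evals = [lambda x: x[0] + 2 * x[1], lambda x: abs(x[0])]
subgrads = [lambda x: [1.0, 2.0], lambda x: [1.0 if x[0] >= 0 else -1.0, 0.0]]

cut = separate_max(2.0, [1.0, 1.0], evals, subgrads)
assert cut is not None and cut_lhs(cut, 2.0, [1.0, 1.0]) > 0  # query point is cut off
assert separate_max(4.0, [1.0, 1.0], evals, subgrads) is None   # feasible query
```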

Proof of Theorem 10

Proof

Substituting the given formulas for \(K_{S_N},\,A_{S_N},\,b_{S_N,\alpha }\) for each \(S_N\in \{D_N,V_N,W_N,U_N,A_N\}\) in \(A_{S_N}\zeta -b_{S_N,\alpha }\in K_{S_N}\) we obtain exactly \(S_N(\zeta _1,\dots ,\zeta _N)\le Q_{S_N}(\alpha )\) for \(S_N\) as defined in (8). We omit the detailed arithmetic.\(\square \)

Proof of Theorem 13

Proof

Under these assumptions (3) is equal to the optimization problems of Theorem 11 or 12 augmented with the variable x and weak optimization is polynomially reducible to weak separation (see [23]). Tractable weak separation for all constraints except \(x\in X\) and (30) is given by the tractable weak optimization over these standard conic-affine constraints. A weak separation oracle is assumed given for \(x\in X\). By continuity and given structure of \(c(x;\xi )\), we may rewrite (30) as

$$\begin{aligned} c_i\ge \max _{\xi \in [\xi ^{(i-1)},\xi ^{(i)}]} c_k(x;\xi )\quad \forall k=1,\dots ,K. \end{aligned}$$
(55)

We polynomially reduce weak \(\delta \)-separation over the kth constraint at fixed \(c'_{i},x'\) to the oracles. We call the \(\delta \)-optimization oracle to find \(\xi '\in [\xi ^{(i-1)},\xi ^{(i)}]\) such that \(c_k(x';\xi ')\ge \max _{\xi \in [\xi ^{(i-1)},\xi ^{(i)}]} c_k(x';\xi )-\delta \). If \(c_i'\ge c_k(x';\xi ')\), then \((c_i'+\delta ,x')\) satisfies the constraint and is within \(\delta \) of \((c_i',x')\). If \(c_i'< c_k(x';\xi ')\), then we call the subgradient oracle to get \(s\in \partial _x c_{k}(x';\xi ')\) with \(\left| \left| s\right| \right| _\infty \le 1\) and produce the hyperplane \(c_i\ge c_k(x';\xi ')+s^T(x-x')\). This hyperplane is violated by \((c_i',x')\), while for any \((c_i,x)\) satisfying (55) (in particular if it is in the \(\delta \)-interior) we have \(c_i\ge \max _{\xi \in [\xi ^{(i-1)},\xi ^{(i)}]} c_k(x;\xi )\ge c_k(x;\xi ')\ge c_k(x';\xi ')+s^T(x-x')\) since s is a subgradient of \(c_k(\cdot ;\xi ')\) at \(x'\). The case for constraints (31) is similar.

\(\square \)
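A sketch of this \(\delta \)-separation step for a single piece \(c_k\) and interval \([\xi ^{(i-1)},\xi ^{(i)}]\) follows (Python, standard library only). As an assumption for illustration, the \(\delta \)-optimization oracle is a uniform grid search over the interval; this is a valid \(\delta \)-oracle only if \(c_k(x';\cdot )\) is Lipschitz with some constant L and the grid spacing is at most \(2\delta /L\). The cost function, its x-subgradient, and all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def delta_separate(
    c_query: float,
    x_query: Vector,
    lo: float,
    hi: float,
    cost: Callable[[Vector, float], float],
    subgrad_x: Callable[[Vector, float], Vector],
    grid: int = 1001,
) -> Optional[Tuple[float, Vector, Vector]]:
    """Weak delta-separation for c >= max_{xi in [lo, hi]} cost(x; xi).

    Returns None when (c_query + delta, x_query) is feasible (under the grid
    oracle assumption); otherwise (cost(x'; xi'), s, x') describing the cut
    c >= cost(x'; xi') + s.(x - x').
    """
    # Assumed delta-optimization oracle: grid search over [lo, hi].
    xis = [lo + (hi - lo) * k / (grid - 1) for k in range(grid)]
    xi_best = max(xis, key=lambda xi: cost(x_query, xi))
    value = cost(x_query, xi_best)
    if c_query >= value:
        return None
    return value, subgrad_x(x_query, xi_best), list(x_query)


def cost(x: Vector, xi: float) -> float:
    """Hypothetical piece, convex in x for each fixed xi."""
    return (x[0] - xi) ** 2 + x[1] * xi


def subgrad_x(x: Vector, xi: float) -> Vector:
    """Gradient of cost with respect to x."""
    return [2.0 * (x[0] - xi), xi]


cut = delta_separate(5.0, [0.0, 1.0], 1.0, 2.0, cost, subgrad_x)
assert cut is not None
value, s, x0 = cut
assert value == 6.0 and s == [-4.0, 2.0]
assert delta_separate(7.0, [0.0, 1.0], 1.0, 2.0, cost, subgrad_x) is None
```

Any point satisfying the original constraint also satisfies the returned cut, because the cost is convex in x for the fixed \(\xi '\) found by the oracle.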

Proof of Proposition 7

Proof

According to Theorem 11, the observations in Example 2, and by renaming variables, the DRO (3) is given by

$$\begin{aligned} (P) \min&y+\sum _{i=1}^N\left( {Q_{D_N}(\alpha )}+\frac{i-1}{N}\right) s_i+\sum _{i=1}^N\left( {Q_{D_N}(\alpha )}-\frac{i}{N}\right) t_i\\&{\text {s.t.}} \;x\in \mathbb {R}_+,\,y\in \mathbb {R},\,s\in \mathbb {R}^{N}_+,\,t\in \mathbb {R}^{N}_+\\&(r-c)x+y+\sum _{i=j}^N(s_i-t_i)\ge (r-c)\xi ^{(j)}&\forall j=1,\dots ,N+1\\&-(c-b)x+y+\sum _{i=j}^N(s_i-t_i)\ge -(c-b)\xi ^{(j-1)}&\forall j=1,\dots ,N+1. \end{aligned}$$

Applying linear optimization duality we get that its dual is

$$\begin{aligned} (D)\ \max \;&(r-c)\sum _{i=1}^{N+1}\xi ^{(i)}p_i-(c-b)\sum _{i=1}^{N+1}\xi ^{(i-1)}q_i\\ {\text {s.t.}} \;&p\in \mathbb {R}^{N+1}_+,\,q\in \mathbb {R}^{N+1}_+\\&(r-c)\sum _{i=1}^{N+1}p_i-(c-b)\sum _{i=1}^{N+1}q_i\le 0\\&\sum _{i=1}^{N+1}p_i+\sum _{i=1}^{N+1}q_i=1\\&\sum _{i=1}^{j}p_i+\sum _{i=1}^{j}q_i\le {Q_{D_N}(\alpha )}+\frac{j-1}{N}&\forall j=1,\dots ,N\\&-\sum _{i=1}^{j}p_i-\sum _{i=1}^{j}q_i\le {Q_{D_N}(\alpha )}-\frac{j}{N}&\forall j=1,\dots ,N. \end{aligned}$$

It can be directly verified that the following primal and dual solutions are feasible for (P) and (D), respectively:

$$\begin{aligned} x&=(1-\theta )\xi ^{(i_\text {lo})}+\theta \xi ^{(i_\text {hi})},\\ y&=(r-c)\xi ^{(N+1)}-(r-c)x,\\ s_i&=\left\{ \begin{array}{ll}(c-b)\left( \xi ^{(i)}-\xi ^{(i-1)}\right) &{}\quad i\le i_\text {lo}\\ 0&{}\quad \text {otherwise}\end{array}\right.&\forall i=1,\dots ,N,\\ t_i&=\left\{ \begin{array}{ll}(r-c)\left( \xi ^{(i+1)}-\xi ^{(i)}\right) &{}\quad i\ge i_\text {hi}\\ 0&{}\quad \text {otherwise}\end{array}\right.&\forall i=1,\dots ,N,\\ p_i&=\left\{ \begin{array}{ll} 0&{}\quad i\le i_\text {hi}-1\\ i/N-\theta -{Q_{D_N}(\alpha )}&{}\quad i=i_\text {hi}\\ 1/N&{}\quad N\ge i\ge i_\text {hi}+1\\ {Q_{D_N}(\alpha )}&{}\quad i=N+1 \end{array}\right.&\forall i=1,\dots ,N+1,\\ q_i&=\left\{ \begin{array}{ll} {Q_{D_N}(\alpha )}&{}\quad i=1\\ 1/N&{}\quad 2\le i\le i_\text {lo}\\ \theta -{Q_{D_N}(\alpha )}-(i-2)/N&{}\quad i=i_\text {lo}+1\\ 0&{}\quad i\ge i_\text {lo}+2 \end{array}\right.&\forall i=1,\dots ,N+1 \end{aligned}$$

and that both have objective cost in their respective programs of

$$\begin{aligned} z=&-(c-b){Q_{D_N}(\alpha )}\xi ^{(0)}-\frac{c-b}{N}\sum _{i=1}^{i_\text {lo}-1}\xi ^{(i)}-(c-b)\left( \theta -{Q_{D_N}(\alpha )}-\frac{i_\text {lo}-1}{N}\right) \xi ^{(i_\text {lo})}\\ {}&+ (r-c){Q_{D_N}(\alpha )}\xi ^{(N+1)}+\frac{r-c}{N}\sum _{i=i_\text {hi}+1}^N\xi ^{(i)}+(r-c)\left( \frac{i_\text {hi}}{N}-{Q_{D_N}(\alpha )}-\theta \right) \xi ^{(i_\text {hi})}. \end{aligned}$$

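Since these feasible primal and dual solutions attain the same objective value, weak duality implies that both are optimal; in particular, x is optimal. Adding \(0=(c-b)\theta x-(r-c)(1-\theta )x\) to the above yields the form of the optimal objective given in the statement of the result. \(\square \)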

Proof of Theorem 14

Proof

Fix x. Let \(S=\{(a,b)\in \mathbb {R}^{d+1}:\left| \left| a\right| \right| _1+\left| b\right| \le 1\}\). Using the notation of [46], letting C be the cone of nonnegative measures on \(\varXi \) and \(C'\) the cone of nonnegative measures on S, we write the inner problem as

$$\begin{aligned} \sup _F\quad&\left\langle F,c(x;\xi )\right\rangle _\varXi \\ {\text {s.t.}}\quad&F\in C,\,\left\langle F,1\right\rangle _\varXi =1\\&{\frac{1}{N}\sum _{i=1}^N\max \{a^T\xi ^i-b,0\}+Q_{C_N}(\alpha _1)-\left\langle F,\max \{a^T\xi -b,0\}\right\rangle _\varXi }\\&\ge 0\quad \forall (a,b)\in S\\&\left\langle F,\left| \left| \xi \right| \right| _2^2\right\rangle _\varXi \ge Q_{R_N}^{\alpha _2} \end{aligned}$$

Invoking Proposition 2.8 of [46] (with the generalized Slater point equal to the empirical distribution), we have that the strongly dual minimization problem is

$$\begin{aligned} \min _{G,\tau ,\theta }\quad&\theta +\left\langle G,Q_{C_N}(\alpha _1)+\frac{1}{N}\sum _{i=1}^N\max \{a^T\xi ^i-b,0\}\right\rangle _S-Q_{R_N}^{\alpha _2}\tau \\ {\text {s.t.}}\quad&G\in C',\,\tau \in \mathbb {R}_+,\,\theta \in \mathbb {R}\nonumber \\&\inf _{\xi \in \varXi }\left( \left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S-c(x;\xi )-\tau \left| \left| \xi \right| \right| _2^2\right) \ge -\theta . \end{aligned}$$
(57)

If (40) is true, we can infer from constraint (57) that

$$\begin{aligned} \tau\le & {} \inf _{\xi \in \varXi }\frac{\theta +\left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S-c(x;\xi )}{\left| \left| \xi \right| \right| _2^2}\\\le & {} \liminf _{i\rightarrow \infty }\frac{\theta +{G}\{S\}\max \{\left| \left| \xi '_i\right| \right| _\infty ,1\}-c(x;\xi '_i)}{\left| \left| \xi '_i\right| \right| _2^2}=0. \end{aligned}$$

This shows that \(\tau =0\) is the only feasible choice.

In general, \(\tau =0\) is a feasible choice, and fixing it so provides an upper bound on the dual problem above and hence, by weak duality, on the original inner problem. Moreover, plugging \(\xi _0\) into (57) we conclude that

$$\begin{aligned} \tau \le \frac{\theta +{G}\{S\}\max \{\left| \left| \xi _0\right| \right| _\infty ,1\}-c(x;\xi _0)}{\left| \left| \xi _0\right| \right| _2^2} \le \frac{1}{R^2}\theta +\frac{R+1}{R^2}G\{S\}-\frac{1}{R^2}c(x;\xi _0). \end{aligned}$$

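Hence, setting \(\tau =0\) in constraint (57), which only relaxes it because \(-\tau \left| \left| \xi \right| \right| _2^2\le 0\), and replacing \(\tau \) in the objective by the above bound provides a lower bound on (56) and hence, by strong duality, on the original inner problem.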

To treat both cases at once, and in the latter case both the upper and the lower bound, we consider for the rest of the proof the following general problem, parametrized by \(\eta \) and \(\nu \):
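For concreteness, here is one way to match the three problems above to (58) and (59). This bookkeeping is our own sketch, obtained by direct substitution into the objective of (57). Under (40), and for the upper bound, \(\tau =0\), and the objective of (57) already has the form (58) with \(\eta =1\) and \(\nu =Q_{C_N}(\alpha _1)\). For the lower bound, substituting the above bound on \(\tau \) and writing \(G\{S\}=\left\langle G,1\right\rangle _S\) gives

$$\begin{aligned}&\theta +\left\langle G,Q_{C_N}(\alpha _1)+\frac{1}{N}\sum _{i=1}^N\max \{a^T\xi ^i-b,0\}\right\rangle _S-Q_{R_N}^{\alpha _2}\left( \frac{1}{R^2}\theta +\frac{R+1}{R^2}G\{S\}-\frac{1}{R^2}c(x;\xi _0)\right) \\&\quad =\eta \theta +\left\langle G,\nu +\frac{1}{N}\sum _{i=1}^N\max \{a^T\xi ^i-b,0\}\right\rangle _S+\frac{Q_{R_N}^{\alpha _2}}{R^2}c(x;\xi _0),\\&\eta =1-\frac{Q_{R_N}^{\alpha _2}}{R^2},\qquad \nu =Q_{C_N}(\alpha _1)-\frac{(R+1)\,Q_{R_N}^{\alpha _2}}{R^2}. \end{aligned}$$

The lower bound is therefore the optimal value of (58) and (59) with these \(\eta \) and \(\nu \), plus the constant \(Q_{R_N}^{\alpha _2}c(x;\xi _0)/R^2\).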

$$\begin{aligned} \min _{G,\theta }\quad&\eta \theta +\left\langle G,\nu +\frac{1}{N}\sum _{i=1}^N\max \{a^T\xi ^i-b,0\}\right\rangle _S\end{aligned}$$
(58)
$$\begin{aligned} {\text {s.t.}}\quad&G\in C',\,\theta \in \mathbb {R}\nonumber \\&\inf _{\xi \in \varXi }\left( \left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S-c(x;\xi )\right) \ge -\theta . \end{aligned}$$
(59)

We first rewrite (59) using the representation (37):

$$\begin{aligned} \inf _{\xi \in \varXi }\left( \left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S-c_k(x;\xi )\right) \ge -\theta \quad \forall k=1,\dots ,K. \end{aligned}$$
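The equivalence between (59) and these \(K\) constraints is a pointwise-maximum argument. A short sketch follows, assuming, as the rewriting suggests, that (37) represents \(c(x;\cdot )\) as the pointwise maximum of the closed concave functions \(c_k(x;\cdot )\). Write \(\varphi (\xi )=\left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S\):

$$\begin{aligned} \inf _{\xi \in \varXi }\left( \varphi (\xi )-\max _{k}c_k(x;\xi )\right) =\inf _{\xi \in \varXi }\min _{k}\left( \varphi (\xi )-c_k(x;\xi )\right) =\min _{k=1,\dots ,K}\inf _{\xi \in \varXi }\left( \varphi (\xi )-c_k(x;\xi )\right) . \end{aligned}$$

The minimum over \(k\) is at least \(-\theta \) exactly when each of its \(K\) terms is.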

Next, invoking [46] and employing the concave conjugate, we rewrite the kth of these constraints as follows:

$$\begin{aligned} -\theta&\le \inf _{\xi \in \varXi ,\,g(\cdot )\ge 0}\sup _{H_k\in C'}\left( \left\langle G,g\right\rangle _S-c_k(x;\xi )+\left\langle H_k,a^T\xi -b-g\right\rangle _S\right) \\&=\sup _{H_k\in C',\,G-H_k\in C'}\left( \left\langle H_k,-b\right\rangle _S+c_{k*}(x;\left\langle H_k,a\right\rangle _S)\right) . \end{aligned}$$
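The equality above rests on two elementary infima, applied after regrouping the terms and exchanging \(\inf \) and \(\sup \) as licensed by [46]. Here \(c_{k*}(x;y)=\inf _{\xi \in \varXi }\left( y^T\xi -c_k(x;\xi )\right) \) denotes the concave conjugate in \(\xi \):

$$\begin{aligned} \inf _{g(\cdot )\ge 0}\left\langle G-H_k,g\right\rangle _S=\begin{cases}0,&G-H_k\in C',\\ -\infty ,&\text {otherwise},\end{cases}\qquad \inf _{\xi \in \varXi }\left( \left\langle H_k,a\right\rangle _S^T\xi -c_k(x;\xi )\right) =c_{k*}\left( x;\left\langle H_k,a\right\rangle _S\right) . \end{aligned}$$

The remaining term \(\left\langle H_k,-b\right\rangle _S\) does not depend on \(\xi \) or \(g\).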

Introducing the variables \(H_k\) and these equivalent constraints into problem (58) and invoking [46], we find that (58) equals the following dual problem:

$$\begin{aligned} \max _{p,\,q,\,\psi }\quad&\sum _{k=1}^Kp_kc_{k**}(x;\,q_k/p_k)\\ {\text {s.t.}}\quad&p\in \mathbb {R}^{K}_+,\,q\in \mathbb {R}^{K\times d},\,\inf _{(a,b)\in S}\psi _k(a,b)\ge 0\,\forall k=1,\dots ,K\\&\sum _{k=1}^Kp_k=\eta \\&\inf _{(a,b)\in S}\left( \nu +\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} -\sum _{k=1}^K\psi _k(a,b)\right) \ge 0\\&\inf _{(a,b)\in S}\left( \psi _k(a,b)-a^Tq_k+p_kb\right) \ge 0\quad \forall k=1,\dots ,K. \end{aligned}$$

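Since \(c_k(x;\xi )\) is closed and concave in \(\xi \), \(c_{k**}(x;\xi )=c_k(x;\xi )\). Moreover, each \(\psi _k\) is bounded below pointwise by \(0\) and by \(a^Tq_k-p_kb\), and appears elsewhere only with a negative sign in the constraint involving \(\nu \). Hence the pointwise smallest feasible choice, \(\psi _k(a,b)=\max \left\{ a^Tq_k-p_kb,0\right\} \), is optimal, and we can rewrite the above problem as: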

$$\begin{aligned} \max _{p,\,q}\quad&\sum _{k=1}^Kp_kc_{k}(x;\,q_k/p_k) \end{aligned}$$
(60)
$$\begin{aligned} {\text {s.t.}}\quad&p\in \mathbb {R}^{K}_+,\,q\in \mathbb {R}^{K\times d}\nonumber \\&\sum _{k=1}^Kp_k=\eta \nonumber \\&\inf _{(a,b)\in S}\left( \nu +\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} -\sum _{k=1}^K\max \left\{ a^Tq_k-p_kb,0\right\} \right) \ge 0. \end{aligned}$$
(61)

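Next, we rewrite (61) by splitting the sum of K maxima into its branches. Each branch is indexed by a vector \(\gamma \in \{0,1\}^K\) that selects, for every k, either \(a^Tq_k-p_kb\) (if \(\gamma _k=1\)) or 0 (if \(\gamma _k=0\)). Since the all-zeros branch is trivial, the constraint becomes the following, where \(\mathcal {G}\) denotes the remaining branches: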

$$\begin{aligned} \nu \ge \sup _{(a,b)\in S}\left( \sum _{k=1}^K\gamma _k\left( a^Tq_k-p_kb\right) -\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) \quad \forall \gamma \in \mathcal {G}. \end{aligned}$$
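The splitting rests on the identity \(\sum _k\max \{t_k,0\}=\max _{\gamma \in \{0,1\}^K}\sum _k\gamma _kt_k\), where one chooses \(\gamma _k=1\) exactly when \(t_k>0\). Below is a small Python check of this identity, with \(t_k\) standing for \(a^Tq_k-p_kb\) at a fixed \((a,b)\). The function names are ours and purely illustrative:

```python
import itertools
import random


def sum_of_positive_parts(t):
    """Left-hand side: sum_k max{t_k, 0}."""
    return sum(max(tk, 0.0) for tk in t)


def max_over_branches(t):
    """Right-hand side: max over gamma in {0,1}^K of sum_k gamma_k * t_k."""
    best = None
    for gamma in itertools.product((0, 1), repeat=len(t)):
        value = sum(g * tk for g, tk in zip(gamma, t))
        best = value if best is None else max(best, value)
    return best


# Random spot check of the identity for K = 4 (2^4 = 16 branches).
random.seed(0)
for _ in range(1000):
    t = [random.uniform(-1.0, 1.0) for _ in range(4)]
    assert abs(sum_of_positive_parts(t) - max_over_branches(t)) < 1e-12
```

The enumeration is exponential in \(K\), which mirrors the \(2^K-1\) constraints indexed by \(\mathcal {G}\).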

Next, we invoke linear optimization duality to rewrite the \(\gamma \)th constraint as follows:

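For reference, one self-contained way to carry out this linear duality step is sketched below. It is our own derivation: it uses a single hypothetical multiplier vector \(\lambda _\gamma \in \mathbb {R}^N\) in place of the variables \(\mu _\gamma ,\rho _\gamma ,u_\gamma ,v_\gamma ,u'_\gamma ,v'_\gamma \) named in the text, so it need not coincide term by term with the paper's form. The derivation proceeds in three steps. First, write each hinge term with an epigraph variable \(t_i\ge 0\). Second, dualize \(t_i\ge a^T\xi ^i-b\) with \(\lambda _{\gamma ,i}\ge 0\); the supremum over \(t\ge 0\) is then finite only if \(\lambda _{\gamma ,i}\le 1/N\). Third, use the fact that the supremum of a linear function over the \(\ell _1\)-ball \(S\) is the \(\ell _\infty \)-norm of its coefficient vector. Together these give

$$\begin{aligned}&\sup _{(a,b)\in S}\left( \sum _{k=1}^K\gamma _k\left( a^Tq_k-p_kb\right) -\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) \\&\quad =\min _{\lambda _\gamma \in [0,1/N]^N}\max \left\{ \left| \left| \sum _{k=1}^K\gamma _kq_k-\sum _{i=1}^N\lambda _{\gamma ,i}\xi ^i\right| \right| _\infty ,\ \left| \sum _{i=1}^N\lambda _{\gamma ,i}-\sum _{k=1}^K\gamma _kp_k\right| \right\} . \end{aligned}$$

Thus the \(\gamma \)th constraint holds if and only if some \(\lambda _\gamma \in [0,1/N]^N\) satisfies \(\left| \left| \sum _k\gamma _kq_k-\sum _i\lambda _{\gamma ,i}\xi ^i\right| \right| _\infty \le \nu \) and \(\left| \sum _i\lambda _{\gamma ,i}-\sum _k\gamma _kp_k\right| \le \nu \). These are linear inequalities in \((p,q,\lambda _\gamma )\).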

Introducing the variables \(\mu _\gamma ,\rho _\gamma ,u_\gamma ,v_\gamma ,u'_\gamma ,v'_\gamma \) and this equivalent constraint into (60) and invoking linear optimization duality yields \(\mathcal {C}'(x;\nu ,\eta )\) and the result.\(\square \)

Proof of Theorem 15

Proof

Under these assumptions, weak optimization is polynomially reducible to weak separation (see [23]), and separation oracles are available for all constraints except (39). We therefore polynomially reduce weak \(\delta \)-separation over the kth constraint of (39) at a fixed point \((x',h'_k,g'_k)\). First, we call the concave conjugate oracle to find \(\xi '\) such that \( {h_k'}^T\xi '-c_k(x';\xi ')\le c_{k*}(x';h_k')+\delta . \) If \(g_k'\le {h_k'}^T\xi '-c_k(x';\xi ')\), then \((x',h'_k,g'_k-\delta )\) satisfies the constraint and is within \(\delta \) of the given point. Otherwise, we call the subgradient oracle to obtain \(s\in \partial _x c_{k}(x';\xi ')\) with \(\left| \left| s\right| \right| _\infty \le 1\) and produce the hyperplane \(g_k\le h_k^T\xi '-c_k(x';\xi ')-s^T(x-x')\). The given point violates this hyperplane, and every point satisfying the original constraint (in particular, every point in its \(\delta \)-interior) satisfies it.\(\square \)
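The reduction can be sketched in Python as a single separation routine. Several pieces are our own assumptions for illustration: the oracle interfaces (`conj_oracle`, `subgrad_oracle`), the function name, and the reading of the kth constraint in (39) as \(g_k\le c_{k*}(x;h_k)\). So is the toy instance used to exercise the routine, \(\varXi =[-1,1]^d\) with \(c_k(x;\xi )=x^T\xi \). For this instance \(c_{k*}(x;h)=-\left| \left| h-x\right| \right| _1\), an exact conjugate minimizer is \(\xi '=-\mathrm {sign}(h-x)\), and \(\xi '\) itself is a subgradient in \(x\). The returned cut is the hyperplane above, rearranged as a linear inequality in \((x,h_k,g_k)\).

```python
import numpy as np


def separate_conjugate_constraint(x_p, h_p, g_p, c, conj_oracle, subgrad_oracle, delta):
    """Weak delta-separation for the (assumed) constraint g <= c_*(x; h).

    c(x, xi)                 -> value of c_k(x; xi)
    conj_oracle(x, h, delta) -> xi' with h^T xi' - c(x, xi') <= c_*(x; h) + delta
    subgrad_oracle(x, xi)    -> s in the subdifferential of c(., xi) at x
    """
    xi = conj_oracle(x_p, h_p, delta)
    value = h_p @ xi - c(x_p, xi)  # within delta of c_*(x_p; h_p)
    if g_p <= value:
        # Then g_p - delta <= c_*(x_p; h_p): a feasible point within delta of the query.
        return "near_feasible", (x_p, h_p, g_p - delta)
    s = subgrad_oracle(x_p, xi)
    # Cut g <= h^T xi - c(x_p, xi) - s^T (x - x_p), rearranged as
    #   1 * g + (-xi)^T h + s^T x <= -c(x_p, xi) + s^T x_p.
    cut = {"g": 1.0, "h": -xi, "x": s, "rhs": -c(x_p, xi) + s @ x_p}
    return "cut", cut


# Hypothetical toy instance: Xi = [-1, 1]^d and c_k(x; xi) = x^T xi.
def c_toy(x, xi):
    return x @ xi


def conj_toy(x, h, delta):
    # Exact minimizer of (h - x)^T xi over the box, so delta is not needed here.
    return -np.sign(h - x)


def subgrad_toy(x, xi):
    # c_toy is linear in x, so its gradient xi is a subgradient (with |xi|_inf <= 1).
    return xi
```

For example, at \(x'=(0.5,-0.5)\) and \(h'_k=(1,1)\) one has \(c_{k*}(x';h'_k)=-2\). The query \(g'_k=-3\) is therefore reported near-feasible, while \(g'_k=0\) yields a cut.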

Cite this article

Bertsimas, D., Gupta, V. & Kallus, N. Robust sample average approximation. Math. Program. 171, 217–282 (2018). https://doi.org/10.1007/s10107-017-1174-z

Keywords

  • Sample average approximation of stochastic optimization
  • Data-driven optimization
  • Goodness-of-fit testing
  • Distributionally robust optimization
  • Conic programming
  • Inventory management
  • Portfolio allocation

Mathematics Subject Classification

  • 90C15
  • 62G10
  • 90C47
  • 90C34
  • 90C25