Robust sample average approximation

Bertsimas, Dimitris; Gupta, Vishal; Kallus, Nathan

doi:10.1007/s10107-017-1174-z

Robust sample average approximation

Full Length Paper
Series A
Published: 29 June 2017

Volume 171, pages 217–282, (2018)
Cite this article

Mathematical Programming Submit manuscript

5160 Accesses
142 Citations
1 Altmetric
Explore all metrics

Abstract

Sample average approximation (SAA) is a widely popular approach to data-driven decision-making under uncertainty. Under mild assumptions, SAA is both tractable and enjoys strong asymptotic performance guarantees. Similar guarantees, however, do not typically hold in finite samples. In this paper, we propose a modification of SAA, which we term Robust SAA, which retains SAA’s tractability and asymptotic properties and, additionally, enjoys strong finite-sample performance guarantees. The key to our method is linking SAA, distributionally robust optimization, and hypothesis testing of goodness-of-fit. Beyond Robust SAA, this connection provides a unified perspective enabling us to characterize the finite sample and asymptotic guarantees of various other data-driven procedures that are based upon distributionally robust optimization. This analysis provides insight into the practical performance of these various methods in real applications. We present examples from inventory management and portfolio allocation, and demonstrate numerically that our approach outperforms other data-driven approaches in these applications.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Data-driven robust optimization

Article 25 February 2017

The State of Robust Optimization

Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations

Article Open access 07 July 2017

Notes

Indeed, nonparametric comparison tests focus on other location parameters such as median.
Such a hypothesis is called simple. By contrast, a composite hypothesis is not defined by a single distribution, but rather a family of distributions, and asserts that the data-generating distribution F is some member of this family. An example of a composite hypothesis is that F is normally distributed (with some, unknown mean and variance). We do not consider composite hypotheses in this work.
Note that standard definitions of the statistics have the form of $X_N^2$ and $G_N^2$ with thresholds given by the $\chi ^2$ distribution. Our nonstandard but equivalent definition is so that thresholds have the rate $O(1/\sqrt{N})$ to match other tests presented.
The only nuance is that Proposition 3.4 of [46] requires a generalized Slater point. We use the empirical distribution function, $\hat{F}_N$, as the generalized Slater point in the space of distributions.
This is most easily seen by noting:
$$\begin{aligned} \mathbb {E}\left[ \hat{z}_{{\text {SAA}}}\right]&= \mathbb {E}\left[ \min _{x \in X}\frac{1}{N} \sum _{j=1}^N c(x; \xi ^j)\right] \le \min _{x \in X}\frac{1}{N}\sum _{j=1}^N\mathbb {E}\left[ c(x; \xi ^j)\right] = z_{{\text {stoch}}} \le \mathbb {E}\left[ c(x_{{\text {SAA}}};\xi )\right] . \end{aligned}$$

References

Bassamboo, A., Zeevi, A.: On a data-driven method for staffing large call centers. Oper. Res. 57(3), 714–726 (2009)
Article MATH Google Scholar
Bayraksan, G., Love, D.K.: Data-driven stochastic programming using phi-divergences. In: Tutorials in Operations Research, pp. 1–19 (2015)
Ben-Tal, A., den Hertog, D., De Waegenaere, A., Melenberg, B., Rennen, G.: Robust solutions of optimization problems affected by uncertain probabilities. Manag. Sci. 59(2), 341–357 (2013)
Article Google Scholar
Ben-Tal, A., Nemirovski, A.: Lectures on modern convex optimization. In: Society for Industrial and Applied Mathematics. SIAM, Philadelphia (2001)
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
MATH Google Scholar
Bertsimas, D., Doan, X.V., Natarajan, K., Teo, C.P.: Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res. 35(3), 580–602 (2010)
Article MathSciNet MATH Google Scholar
Bertsimas, D., Gupta, V., Kallus, N.: Data-driven robust optimization. Preprint arXiv:1401.0212 (2013)
Bertsimas, D., Popescu, I.: Optimal inequalities in probability theory: a convex optimization approach. SIAM J. Optim. 15(3), 780–804 (2005)
Article MathSciNet MATH Google Scholar
Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1999)
Book MATH Google Scholar
Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York (2011)
Book MATH Google Scholar
Birge, J.R., Wets, R.J.B.: Designing approximation schemes for stochastic optimization problems, in particular for stochastic programs with recourse. Math. Program. Study 27, 54–102 (1986)
Article MathSciNet MATH Google Scholar
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Book MATH Google Scholar
Calafiore, G.C., El Ghaoui, L.: On distributionally robust chance-constrained linear programs. J. Optim. Theory Appl. 130(1), 1–22 (2006)
Article MathSciNet MATH Google Scholar
D’Agostino, R.B., Stephens, M.A.: Goodness-of-Fit Techniques. Dekker, New York (1986)
MATH Google Scholar
Delage, E., Ye, Y.: Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 55(3), 98–112 (2010)
MathSciNet MATH Google Scholar
DeMiguel, V., Garlappi, L., Nogales, F.J., Uppal, R.: A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Manag. Sci. 55(5), 798–812 (2009)
Article MATH Google Scholar
Dodge, Y.: The Oxford Dictionary of Statistical Terms. Oxford University Press, Oxford (2006)
MATH Google Scholar
Dudley, R.M.: Real Analysis and Probability, vol. 74. Cambridge University Press, Cambridge (2002)
Book MATH Google Scholar
Dupačová, J.: The minimax approach to stochastic programming and an illustrative application. Stoch. Int. J. Probab. Stoch. Process. 20(1), 73–88 (1987)
MathSciNet MATH Google Scholar
Efron, B.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
Book MATH Google Scholar
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, New York (2001)
MATH Google Scholar
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)
Article MATH Google Scholar
Grotschel, M., Lovasz, L., Schrijver, A.: Geometric Algorithms and Combinatorial Optimization. Springer, New York (1993)
Book MATH Google Scholar
Gurobi Optimization Inc.: Gurobi optimizer reference manual. http://www.gurobi.com (2013)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
Article MATH Google Scholar
Homem-de Mello, T., Bayraksan, G.: Monte Carlo sampling-based methods for stochastic optimization. Surv. Oper. Res. Manag. Sci. 19(1), 56–85 (2014)
MathSciNet Google Scholar
Jiang, R., Guan, Y.: Data-driven chance constrained stochastic program. Tech. rep., Technical report, University of Florida. Available at: Optimization Online http://www.optimization-online.org (2013)
King, A.J., Wets, R.J.B.: Epiconsistency of convex stochastic programs. Stoch. Stoch. Rep. 34(1–2), 83–92 (1991)
Article MATH Google Scholar
Klabjan, D., Simchi-Levi, D., Song, M.: Robust stochastic lot-sizing by means of histograms. Prod. Oper. Manag. 22(3), 691–710 (2013)
Article Google Scholar
Kleywegt, A.J., Shapiro, A., Homem-de Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2), 479–502 (2002)
Article MathSciNet MATH Google Scholar
Kullback, S.: A lower bound for discrimination information in terms of variation. IEEE Trans. Inf. Theory 13(1), 126–127 (1967)
Article Google Scholar
Levi, R., Perakis, G., Uichanco, J.: The data-driven newsvendor problem: new bounds and insights. Oper. Res. (2015). doi:10.1287/opre.2015.1422
MathSciNet MATH Google Scholar
Lim, A.E., Shanthikumar, J.G., Vahn, G.Y.: Conditional value-at-risk in portfolio optimization: coherent but fragile. Oper. Res. Lett. 39(3), 163–171 (2011)
Article MathSciNet MATH Google Scholar
Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming. Linear Algebra Appl. 284(1), 193–228 (1998)
Article MathSciNet MATH Google Scholar
Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24(1), 47–56 (1999)
Article MathSciNet MATH Google Scholar
Noether, G.E.: Note on the Kolmogorov statistic in the discrete case. Metrika 7(1), 115–116 (1963)
Article MathSciNet MATH Google Scholar
Popescu, I.: Robust mean-covariance solutions for stochastic optimization. Oper. Res. 55(1), 98–112 (2007)
Article MathSciNet MATH Google Scholar
Prékopa, A.: Stochastic Programming. Kluwer Academic Publishers, Dordrecht (1995)
Book MATH Google Scholar
Reed, M., Simon, B.: Methods of Modern Mathematical Physics, Vol. 1: Functional Analysis. Academic Press, New York (1981)
MATH Google Scholar
Rice, J.: Mathematical Statistics and Data Analysis. Thomson/Brooks/Cole, Belmont (2007)
Google Scholar
Rockafellar, T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
Article Google Scholar
Rohlf, F.J., Sokal, R.R.: Statistical Tables, 4th edn. Macmillan, New York (2012)
MATH Google Scholar
Scarf, H.: A min–max solution of an inventory problem. In: Arrow, K.J., Karlin, S., Scarf, H. (eds.) Studies in the Mathematical Theory of Inventory and Production, pp. 201–209. Stanford University Press, Stanford (1958)
Google Scholar
Scarsini, M.: Multivariate convex orderings, dependence, and stochastic equality. J. Appl. Probab. 35(1), 93–103 (1998)
Article MathSciNet MATH Google Scholar
Shaked, M., Shanthikumar, J.G.: Stochastic Orders. Springer, New York (2007)
Book MATH Google Scholar
Shapiro, A.: On duality theory of conic linear problems. In: Goberna, M.A., López, M.A. (eds.) Semi-Infinite Programming: Recent Advances, pp. 135–165. Kluwer Academic Publishers, Dordrecht (2001)
Chapter Google Scholar
Shapiro, A., Ruszczyński, A. (eds.): Handbooks in Operations Research and Management Science: Vol. 10. Stochastic Programming. Elsevier, Amsterdam (2003)
Google Scholar
Shawe-Taylor, J., Cristianini, N.: Estimating the moments of a random vector with applications (2003). http://eprints.soton.ac.uk/260372/1/EstimatingTheMomentsOfARandomVectorWithApplications.pdf
Stephens, M.A.: Use of the Kolmogorov–Smirnov, Cramér–Von Mises and related statistics without extensive tables. J. R. Stat. Soc. B 32(1), 115–122 (1970)
MATH Google Scholar
Thas, O.: Comparing Distributions. Springer, New York (2009)
MATH Google Scholar
Tikhonov, A.N.: On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)
MathSciNet Google Scholar
Vapnik, V.: Principles of risk minimization for learning theory. In: NIPS, pp. 831–838 (1991)
Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
MATH Google Scholar
Wächter, A., Biegler, L.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Article MathSciNet MATH Google Scholar
Wang, Z., Glynn, P., Ye, Y.: Likelihood robust optimization for data-driven problems. Preprint arXiv:1307.6279 (2013)
Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62(6), 1358–1376 (2014)
Article MathSciNet MATH Google Scholar
Žáčková, J.: On minimax solutions of stochastic linear programming problems. Časopis pro pěstování matematiky 91(4), 423–430 (1966)
MathSciNet MATH Google Scholar

Download references

Acknowledgements

The authors would like to thank the anonymous reviewers and associate editor for their extremely helpful suggestions and very thorough review of the paper. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374.

Author information

Authors and Affiliations

Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Dimitris Bertsimas
Marshall School of Business, University of Southern California, Los Angeles, CA, 90029, USA
Vishal Gupta
School of Operations Research and Information Engineering and Cornell Tech, Cornell University, New York, NY, 10011, USA
Nathan Kallus

Authors

Dimitris Bertsimas
View author publications
You can also search for this author in PubMed Google Scholar
Vishal Gupta
View author publications
You can also search for this author in PubMed Google Scholar
Nathan Kallus
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nathan Kallus.

Appendix

1.1 Proof of Theorem 1

Proof

We will show that $\mathcal {C}(x;\mathcal {F})$ is continuous in x and that when $c(x;\xi )$ satisfies the coerciveness conditions, $\mathcal {C}(x;\mathcal {F})$ is also coercive. The result will then follow from the usual Weierstrass extreme value theorem for deterministic optimization [5].

Let $S=\{x\in X:\mathcal {C}\left( x;\mathcal {F}\right) <\infty \}$. By assumption $x_0\in S$, so $S\ne \varnothing $. Consequently, we can restrict to minimizing over S instead of over X.

Fix any $x\in X$. Let $\epsilon >0$ be given. By equicontinuity of the cost at x there is a $\delta >0$ such that any $y\in X$ with $\left| \left| x-y\right| \right| \le \delta $ has $\left| c\left( x;\xi \right) -c\left( y;\xi \right) \right| \le \epsilon $ for all $\xi \in \varXi $. Fix any such y. Then

$$\begin{aligned} \mathcal {C}\left( y;\mathcal {F}\right)= & {} \sup _{F_0\in \mathcal {F}}\mathbb {E}_{F_0}[c(y;\xi )]\le \sup _{F_0\in \mathcal {F}}\mathbb {E}_{F_0}[c(x;\xi )]+\epsilon =\mathcal {C}\left( x;\mathcal {F}\right) +\epsilon , \end{aligned}$$

(48)

$$\begin{aligned} \mathcal {C}\left( x;\mathcal {F}\right)= & {} \sup _{F_0\in \mathcal {F}}\mathbb {E}_{F_0}[c(x;\xi )]\le \sup _{F_0\in \mathcal {F}}\mathbb {E}_{F_0}[c(y;\xi )]+\epsilon =\mathcal {C}\left( y;\mathcal {F}\right) +\epsilon . \end{aligned}$$

(49)

Note that (48) implies that S is closed relative to X, which is itself closed. Hence, S is closed. Furthermore, (47) and (48) imply that $\mathcal {C}\left( x;\mathcal {F}\right) $ is continuous in x on S.

If S is compact, the usual Weierstrass extreme value theorem provides that the continuous $\mathcal {C}(x;\mathcal {F})$ attains its minimal (finite) value at an $x\in S\subseteq X$.

Suppose S is not compact. Since $S\subseteq X$ is closed, this must mean S is unbounded and then X is unbounded and hence not compact. Then, by assumption, $c(x;\xi )$ satisfies the coerciveness assumption. Because S is unbounded, there exists a sequence $x_i\in S$ such that $\lim _{i\rightarrow \infty }\left| \left| x_0-x_i\right| \right| =\infty $. Then by the coerciveness assumption, $c_i(\xi )=c(x_i;\xi )$ diverges pointwise to infinity. Fix any $F_0\in \mathcal {F}$. Let $c'_i(\xi )=\inf _{j\ge i}c_j(\xi )$, which is then pointwise monotone nondecreasing and pointwise divergent to infinity. Then, by Lebesgue’s monotone convergence theorem, $\lim _{i\rightarrow \infty }\mathbb {E}_{F_0}[c'_i(\xi )]=\infty $. Since $c'_i\le c_j$ pointwise for any $j\ge i$, we have $\mathbb {E}_{F_0}[c'_i(\xi )]\le \inf _{j\ge i}\mathbb {E}_{F_0}[c_j(\xi )]$ and therefore

$$\begin{aligned} \infty =\lim _{i\rightarrow \infty }\mathbb {E}_{F_0}[c'_i(\xi )]\le \lim _{i\rightarrow \infty }\inf _{j\ge i}\mathbb {E}_{F_0}[c_i(\xi )]=\liminf _{i\rightarrow \infty }\mathbb {E}_{F_0}[c_i(\xi )]. \end{aligned}$$

Thus $\mathcal {C}(x;\mathcal {F})\ge \mathbb {E}_{F_0}[c(x;\xi )]$ is also coercive in x over S. Then, the usual Weierstrass extreme value theorem provides that the continuous $\mathcal {C}(x;\mathcal {F})$ attains its minimal (finite) value at an $x\in S\subseteq X$.$\square $

1.2 Proof of Proposition 2

Proof

Suppose that $c(x;\xi )\rightarrow \infty $ as $\xi \rightarrow \infty $. The case of unboundedness in the negative direction is similar. Choose $\rho >0$ small so that $\xi ^{(i)}-\xi ^{(i-1)}>2\rho $ for all i. For $\delta >0$ and $\xi '\ge \xi ^{(N)}+\rho $, let $F_{\delta ,\xi '}$ be the measure with density function

$$\begin{aligned} f(\xi ;\delta ,\xi ')=\left\{ \begin{array}{ll}1/(2N\rho )&{}\quad \xi ^{(i)}-\rho \le \xi \le \xi ^{(i)}+\rho \text { for }1\le i\le N-1,\\ 1/(2N\rho )&{}\quad \xi ^{(N)}-\rho +\delta \rho \le \xi \le \xi ^{(N)}+\rho -\delta \rho ,\\ 1/(2N\rho )&{}\quad \xi '\le \xi \le \xi '+2\delta \rho ,\\ 0&{}\quad \text {otherwise}.\end{array}\right. \end{aligned}$$

In words, this density equally distributes mass on a $\rho $-neighborhood of every point $\xi ^{(i)}$, except $\xi ^{(N)}$. For $\xi ^{(N)}$, we “steal” $\delta $ of the mass to place around $\xi '$. Notice that for any $\xi '$, $F_{0,\xi '}$ (i.e., take $\delta =0$) minimizes $S_N(F_0)$ over distributions $F_0$. Since $\alpha >0$, $Q_{S_N}(\alpha )$ is strictly greater than this minimum. Since $S_N(F_{\delta ,\xi '})$ increases continuously with $\delta $ independently of $\xi '$, there must exist $\delta >0$ small enough so that $F_{\delta ,\xi '}\in \mathcal {F}_{S_N}^\alpha $ for any $\xi '>\xi ^{(N)}+\rho $.

Let any $M>0$ be given. By infinite limit of the cost function, there exists $\xi '>\xi ^{(N)}+\rho $ sufficiently large such that $c(x;\xi )\ge M N/\delta $ for all $\xi \ge \xi '$. Then, we have $\mathcal {C}\left( x;\mathcal {F}_{S_N}^\alpha \right) \ge \mathbb {E}_{F_{\delta ,\xi '}} \left[ c(x;\xi )\right] \ge \mathbb {P}\left( \xi \ge \xi '\right) M N/\delta =M$.

Since we have shown this for every $M>0$, we have $\mathcal {C}\left( x;\mathcal {F}_{S_N}^\alpha \right) =\infty $.$\square $

1.3 Computing a threshold $Q_{C_N}(\alpha )$

We provide two ways to compute $Q_{C_N}(\alpha )$ for use with the LCX-based GoF test. One is an exact, closed form formula, but which may be loose for moderate N. Another uses the bootstrap to compute a tighter, but approximate threshold.

The theorem below employs a bound on $\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] $ to provide a valid threshold. This bound could either stem from known support bounds or from changing (15) to a two-sided hypothesis with two-sided confidence interval, using the lower bound as in (17) and the upper bound in (49) given below.

Theorem 16

Let $N\ge 2$. Suppose that with probability at least $1-\alpha _2$, $\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \le \overline{Q_{R_N}}\left( \alpha _2\right) $. Let $\alpha _1\in (0,1)$ be given and suppose $F_0\preceq _\text {LCX}F$. Then, with probability at least $1-\alpha _1-\alpha _2$,

$$\begin{aligned}&\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \le \overline{Q_{R_N}}\left( \alpha _2\right) \quad \text { and}\nonumber \\&C_N(F_0) \le \left( 1+\overline{Q_{R_N}}\left( \alpha _2\right) \right) \left( 1+\frac{p}{2-p}\right) \frac{2^{\frac{1}{2}+\frac{1}{p}}}{N^{1-\frac{1}{p}}}\nonumber \\&\qquad \sqrt{d+1+(d+1)\log \left( \frac{N}{d+1}\right) +\log \left( \frac{4}{\alpha _1}\right) }, \end{aligned}$$

(50)

where

$$\begin{aligned} p=\frac{1}{2}\left( \sqrt{\log (256)+8\log \left( N\right) +\left( \log \left( 2N\right) \right) ^2}-\log \left( 2N\right) \right) \in (1,2). \end{aligned}$$

(51)

Hence, defining $Q_{C_N}(\alpha _1)$ equal to the right-hand side of (49), we get a valid threshold for $C_N$ in testing $F_0 \preceq _{\text{ L }CX} F$ at level $\alpha _1$.

Proof

Observe

$$\begin{aligned} C_N(F_0)\le&\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \mathbb {E}_{F_0}[\max \left\{ a^T\xi -b,0\right\} ]-\mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]\right) \nonumber \\ {}&+\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]-\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) \nonumber \\\le&\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]-\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) , \end{aligned}$$

(52)

where the first inequality follows by distributing the sup and the second inequality follows because $F_0 \preceq _{LCX} F$. Next, we provide a probabilistic bound on the last sup, which is the maximal difference over $\left| \left| a\right| \right| _1+\left| b\right| \le 1$ between the true expectation of the hinge function $\max \left\{ a^T\xi -b,0\right\} $ and its empirical expectation.

The class of level sets of such functions, i.e., $\mathcal {S} =\left\{ \left\{ \xi \in \varXi :\max \left\{ a^T\xi -b,0\right\} \le t\right\} :\right. \left. \left| \left| a\right| \right| _1+\left| b\right| \le 1,\,t\in \mathbb {R}\right\} $, is contained in the class of the empty set and all halfspaces. Therefore, it has Vapnik-Chervonenkis dimension at most $d+1$ (cf. [53]). Therefore, Theorem 5.2 and equation (5.12) of [53] provide that for any $\epsilon >0$ and $p\in (1,2]$,

$$\begin{aligned}&\mathbb {P}\Biggl (\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\frac{1}{D_p(a,b)}\Biggl (\mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]-\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \Biggr )>\epsilon \Biggr )\\&\quad <4\exp \left( \left( \frac{(d+1)(\log \left( \frac{N}{d+1}\right) +1)}{N^{2-2/p}}-\frac{\epsilon ^2}{2^{1+2/p}}\right) N^{2-2/p}\right) ,\nonumber \end{aligned}$$

(53)

where $D_p(a,b)=\int _0^\infty \left( \mathbb {P}_F\left( \max \left\{ a^T\xi -b,0\right\} >t\right) \right) ^{1/p}dt$.

Notice that for any $\left| \left| a\right| \right| _1+\left| b\right| \le 1$, we have $0\le \max \left\{ a^T\xi -b,0\right\} \le \max \left\{ 1,\left| \left| \xi \right| \right| _\infty \right\} \le \max \left\{ 1,\left| \left| \xi \right| \right| _2\right\} $ and hence $\mathbb {E}_F\left[ \max \left\{ a^T\xi -b,0\right\} ^2\right] \le \max \left\{ 1^2,\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \right\} \le 1+\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] $. These observations combined with Markov’s inequality yields that, for any $\left| \left| a\right| \right| _1+\left| b\right| \le 1$ and $p\in (1,2)$, we have

$$\begin{aligned}&D_p(a,b)=\int _0^\infty \left( \mathbb {P}_F\left( \max \left\{ a^T\xi -b,0\right\} >t\right) \right) ^{1/p}dt\le 1\\&\qquad \qquad \qquad +\int _{1}^\infty \frac{\left( \mathbb {E}_F\left[ \max \left\{ a^T\xi -b,0\right\} ^2\right] \right) ^{1/p}}{t^{2/p}}dt\\&\quad \le \left( 1+\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \right) ^{1/p}\left( 1+\frac{p}{2-p}\right) \le \left( 1+\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \right) \left( 1+\frac{p}{2-p}\right) . \end{aligned}$$

This yields a bound on $D_p(a,b)$ that is independent of (a, b). Using this bound takes $D_p(a,b)$ out of the sup in (52) and by bounding $\mathbb {E}_F\left[ \left| \left| \xi \right| \right| _2^2\right] \le \overline{Q_{R_N}}\left( \alpha _2\right) $ (which holds with probability $1-\alpha _2$), we conclude from (52) that (49) holds with probability $1-\alpha _1-\alpha _2$ for any $p\in (1,2)$. The p given in (50) optimizes the bound for $N\ge 2$.

$\square $

Next we show how to bootstrap an approximate threshold $Q_{C_N}(\alpha )$. Recall that we seek a threshold $Q_{C_N}(\alpha )$ such that $\mathbb {P}\left( C_N(F_0)>Q_{C_N}(\alpha )\right) \le \alpha $ whenever $F_0\preceq _{{\text {LCX}}}F$. Employing (51), we see that a sufficient threshold is the $(1-\alpha )$th quantile of

$$\begin{aligned} \sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \mathbb {E}_{F}[\max \left\{ a^T\xi -b,0\right\} ]-\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) , \end{aligned}$$

where $\xi ^i$ are drawn IID from F. The bootstrap [20] approximates this by replacing F with the empirical distribution $\hat{F}_N$. In particular, given an iteration count B, for $t=1,\dots , B$ it sets

$$\begin{aligned} Q^t=\sup _{\left| \left| a\right| \right| _1+\left| b\right| \le 1}\left( \frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} -\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\tilde{\xi }^{t,i}-b,0\right\} \right) \end{aligned}$$

(54)

where $\tilde{\xi }^{t,i}$ are drawn IID from $\hat{F}_N$, i.e., IID random choices from $\{\xi ^1,\dots ,\xi ^N\}$. Then the bootstrap approximates $Q_{C_N}(\alpha )$ by the $(1-\alpha )$th quantile of $\{Q^1,\dots ,Q^B\}$. However, it may be difficult to compute (53) as the problem is non-convex. Fortunately (53) can be solved with a standard MILP formulation or by discretizing the space and enumerating (the objective is Lipschitz).

In particular, our bootstrap algorithm for computing $Q_{C_N}(\alpha )$ is as follows:

1.4 Proof of Proposition 4

Proof

We first prove that a uniformly consistent test is consistent. Let $G_0 \ne F$ be given. Denote by d the Lévy-Prokhorov metric, which metrizes weak convergence (cf. [9]), and observe that $d(G_0, F) > 0$.

Define $R_N=\sup _{F_0\in \mathcal {F}_N}d(F_0,F)$. We claim that if the test is uniformly consistent, then $\mathbb {P}\left( R_N \rightarrow 0\right) = 1$. Suppose that for some sample path, $R_N \not \rightarrow 0$. By the definition of the supremum, there must exist $\delta > 0$ and a sequence $F_N \in \mathcal {F}_N$ such that $d(F_N, F) \ge \delta $ i.o. Since d metrizes weak convergence, this means that $F_N$ does not converge to F. However, $F_N \in \mathcal {F}_N$ for all N, i.e. it is never rejected. Therefore, by uniform consistency of the test, the event that $R_N \not \rightarrow 0$ must have probability 0. I.e., $R_N \rightarrow 0$ a.s.

Since a.s. convergence implies convergence in probability, we have that $\mathbb {P}\left( R_N<\epsilon \right) \rightarrow 1$ for every $\epsilon > 0$, and, in particular, for $\epsilon = d(G_0, F)>0$. Hence we have,

$$\begin{aligned} \mathbb {P}\left( G_0\in \mathcal {F}_N\right) \le \mathbb {P}\left( d(G_0,F)\le R_N\right) \rightarrow 0. \end{aligned}$$

This proves the first part of the proposition.

For the second part, we describe a test which is consistent but not uniformly consistent. Consider testing a continuous distribution F with the following univariate GoF test:

$$\begin{aligned}&\text {Given data }\xi ^1,\dots ,\xi ^N\text { drawn from }F\text { and a hypothetical continuous distribution }F_0:\\&\quad \text {Let }\ell =\lfloor \log _2 N\rfloor ,\, k=N-2^\ell .\\&\quad \text {If }\frac{k}{2^\ell }\le F_0(\xi ^1) \le \frac{k+1}{2^\ell }\text { then }F_0\text { is not rejected.}\\&\quad \text {Otherwise, reject }F_0\text { if it is rejected by the KS test at level }\frac{\alpha }{1-2^{-\ell }}\\&\qquad \text { applied to the data }\xi ^2, \ldots , \xi ^N. \end{aligned}$$

Notice that under the null-hypothesis, the probability of rejection is

$$\begin{aligned} \mathbb {P}(F_0 \text { rejected } )= & {} \mathbb {P} \left( F_0( \xi ^1 ) \not \in \left[ \frac{k}{2^\ell }, \frac{k+1}{2^\ell } \right] \right) \mathbb {P}( F_0 \text { is rejected by the KS test})\\= & {} (1 - 2^{-\ell }) \frac{\alpha }{1-2^{-\ell }} = \alpha , \end{aligned}$$

where we’ve used that $\xi ^1$ is independent of the rest of the sample, and $F_0(\xi ^1)$ is uniformly distributed for $F_0$ continuous. Consequently, the test is a valid GoF test and it has significance $\alpha $.

We claim this test is also consistent. Specifically, consider any $F_0 \ne F$. By continuity of $F_0$ and consistency of the KS test,

$$\begin{aligned} \mathbb {P}\left( F_0\text { is rejected}\right) =\mathbb {P} \left( F_0( \xi ^1 ) \not \in \left[ \frac{k}{2^\ell }, \frac{k+1}{2^\ell } \right] \right) \mathbb {P}\left( F_0\text { is rejected by the KS test}\right) \longrightarrow 1. \end{aligned}$$

However, the test is not uniformly consistent. Fix any continuous $F_0\ne F$ and let

$$\begin{aligned} F_N=\left\{ \begin{array}{ll}F_0&{}\quad \text {if }\frac{k}{2^\ell }\le F_0(\xi ^1)\le \frac{k+1}{2^\ell },\\ \hat{F}_N&{}\quad \text {otherwise.}\end{array}\right. \end{aligned}$$

Observe that $0\le F_0(\xi ^1)\le 1$ and $[0,1]=\bigcup _{k=0}^{2^\ell -1}\left[ \frac{k}{2^\ell },\,\frac{k+1}{2^\ell }\right] $. That is, for every $\ell $, $F_N=F_0$ at least once for $N\in \{2^\ell ,\dots ,2^{\ell +1}-1\}$. Therefore $F_N = F_0$ i.o., so it does not converge weakly to F. However, as constructed, $F_N$ is never rejected by the above test. This is done for every sample path so the test cannot be uniformly consistent. $\square $

1.5 Proofs of Theorems 2 and 3

We first establish two useful results.

Proposition 8

Suppose $\mathcal {F}_N$ is the confidence region of a uniformly consistent test and that Assumptions 1 and 3 hold. Then, almost surely, $\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow E_F[c(x;\xi )]$ for any $x\in X$ and sequences $F_N\in \mathcal {F}_N$.

Proof

Restrict to the a.s. event that $\left( F_N\not \rightarrow F\implies F_N\notin \mathcal {F}_N\text { i.o.}\right) $. Fix $F_N\in \mathcal {F}_N$. Then the contrapositive gives $F_N\rightarrow F$. Fix x. If $\varXi $ is bounded (Assumption 3a) then the result follows from the portmanteau lemma (see for example Theorem 2.1 of [9]). Suppose otherwise (Assumption 3b). Then $\mathbb {E}_{F_N}[\phi (\xi )]\rightarrow \mathbb {E}_F[\phi (\xi )]$. By Theorem 3.6 of [9], $\phi (\xi )$ is uniformly integrable over $\{F_1,F_2,\dots \}$. Since $c(x;\xi )=O(\phi (\xi ))$, it is also uniformly integrable over $\{F_1,F_2,\dots \}$. Then the result follows by Theorem 3.5 of [9].$\square $

The following is a restatement of the equivalence between local uniform convergence of continuous functions and convergence of evaluations along a convergent path. We include the proof for completeness.

Proposition 9

Suppose Assumption 1 holds and $\mathcal {C}(x_N;\mathcal {F}_N)\rightarrow \mathbb {E}_F[c(x;\xi )]$ for any convergent sequence $x_N\rightarrow x$. Then (18) holds.

Proof

Let $E\subseteq X$ compact be given and suppose to the contrary that $\sup _{x\in E}\left| \mathcal {C}(x;\mathcal {F}_N)\right. \left. -\mathbb {E}_F[c(x;\xi )]\right| \not \rightarrow 0.$ Then $\exists \epsilon >0$ and $x_N\in E$ such that $\left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathbb {E}_F[c(x_N;\xi )]\right| \ge \epsilon $ i.o. This, combined with compactness, means that there exists a subsequence $N_1<N_2<\cdots <N_k\rightarrow \infty $ such that $x_{N_k}\rightarrow x\in E$ and $\left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x_{N_k};\xi )]\right| \ge \epsilon $ $\forall k$. Then,

$$\begin{aligned} 0<\epsilon\le & {} \left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x_{N_k};\xi )]\right| \le \left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x;\xi )]\right| \\&+\left| \mathbb {E}_F[c(x;\xi )]-\mathbb {E}_F[c(x_{N_k};\xi )]\right| . \end{aligned}$$

By assumption, $\exists k_1$ such that $\left| \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})-\mathbb {E}_F[c(x;\xi )]\right| \le \epsilon /4$ $\forall k\ge k_1$. By equicontinuity and $x_{N_k}\rightarrow x$, $\exists k_2$ such that $\left| c(x;\xi )-c(x_{N_k};\xi )\right| \le \epsilon /4$ $\forall \xi ,\,k\ge k_2$. Then,

$$\begin{aligned} \left| \mathbb {E}_F [c(x;\xi )]-\mathbb {E}_F [c(x_{N_k};\xi )]\right| \le \mathbb {E}_F[\left| c(x;\xi )-c(x_{N_k};\xi )\right| ]\le \epsilon /4\quad \forall \xi ,\,k\ge k_2. \end{aligned}$$

Combining and considering $k=\max \left\{ k_1,k_2\right\} $, we get the contradiction $\epsilon \le \epsilon /2$ for strictly positive $\epsilon $.$\square $

We prove the “if” and “only if” sides of Theorem 2 separately.

Proof

(Proofs of Theorem 3 and the “only if” side of Theorem 2) For either theorem restrict to the a.s. event that

$$\begin{aligned} \mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_F[c(x;\xi )]\text { for every }x\in X\text { and sequences } F_N\in \mathcal {F}_N \end{aligned}$$

(55)

(using Proposition 8 for Theorem 2 or by assumption of c-consistency for Theorem 3).

Let any convergent sequence $x_N\rightarrow x$ and $\epsilon >0$ be given. By equicontinuity and $x_N\rightarrow x$, $\exists N_1$ such that $\left| c(x_N;\xi )-c(x;\xi )\right| \le \epsilon /2$ $\forall \xi ,\,N\ge N_1$. Then, $\forall N\ge N_1$,

$$\begin{aligned} \left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathcal {C}(x;\mathcal {F}_N)\right|\le & {} \sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ c(x_N;\xi )-c(x;\xi )\right] \right| \\\le & {} \sup _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ \left| c(x_N;\xi )-c(x;\xi )\right| \right] \le \epsilon /2. \end{aligned}$$

By definition of supremum, $\exists F_N\in \mathcal {F}_N$ such that $\mathcal {C}(x;\mathcal {F}_N)\le \mathbb {E}_{F_N}[c(x;\xi )]+\epsilon /4$. By (54), $\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_{F}[c(x;\xi )]$. Hence, $\exists N_2$ such that $\left| \mathbb {E}_{F_N}[c(x;\xi )]\right. \left. -\mathbb {E}_{F}[c(x;\xi )]\right| \le \epsilon /4$ $\forall N\ge N_2$. Combining these with the triangle inequality

$$\begin{aligned} \left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \le \left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathcal {C}(x;\mathcal {F}_N)\right| +\left| \mathcal {C}(x;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| , \end{aligned}$$

we get

$$\begin{aligned} \left| \mathcal {C}(x_N;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \le \epsilon \quad \forall N\ge \max \left\{ N_1,N_2\right\} . \end{aligned}$$

Thus, by Proposition 9, we get that (18) holds.

Let $A_N={\text {arg}}\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)$. We now show that $\bigcup _N A_N$ is bounded. If X is compact (Assumption 2a) then this is trivial. Suppose X is not compact (Assumption 2b). Using the same arguments as in the proof of Theorem 1, we have in particular that $\lim _{\left| \left| x\right| \right| \rightarrow \infty }\mathbb {E}_F[c(x;\xi )]=\infty $, $z_\text {stoch}=\min _{x\in X}\mathbb {E}_F[c(x;\xi )]<\infty $, that $A={\text {arg}}\min _{x\in X}\mathbb {E}_F[c(x;\xi )]$ is compact, and each $A_N$ is compact. Let $x^*\in A$. Fix $\epsilon >0$. By definition of supremum $\exists F_N\in \mathcal {F}_N$ such that $\mathcal {C}(x^*;\mathcal {F}_N)\le \mathbb {E}_{F_N}[c(x^*;\xi )]+\epsilon $. By (54), $\mathbb {E}_{F_N}[c(x^*;\xi )]\rightarrow \mathbb {E}_{F}[c(x^*;\xi )]=z_\text {stoch}$. As this is true for any $\epsilon $ and since $\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)\le \mathcal {C}(x^*;\mathcal {F}_N)$, we have $\limsup _{N\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)\le z_\text {stoch}$. Now, suppose for contradiction that $\bigcup _N A_N$ is unbounded, i.e. there is a subsequence $N_1<N_2<\cdots <N_k\rightarrow \infty $ and $x_{N_k}\in A_{N_k}$ such that $\left| \left| x_{N_k}\right| \right| \rightarrow \infty $. Let $\delta '=\limsup _{k\rightarrow \infty }\inf _{\xi \notin D}c(x_{N_k};\xi )\ge \liminf _{N\rightarrow \infty }\inf _{\xi \notin D}c(x_{N};\xi )>-\infty $ and $\delta =\min \left\{ 0,\delta '\right\} $. By D-uniform coerciveness, $\exists k_0$ such that $c(x_{N_k};\xi )\ge (z_\text {stoch}+1-\delta )/F(D)$ $\forall \xi \in D,\,k\ge k_0$. In the case of Theorem 2, let $F_N$ be any $F_N\in \mathcal {F}_N$. In the case of Theorem 3, let $F_N$ be the empirical distribution $F_N=\hat{F}_N\in \mathcal {F}_N$. In either case, we get $F_N\rightarrow F$ weakly. In particular, $F_N(D)\rightarrow F(D)$. Then $\mathbb {E}_{F_N}[c(x_{N_k};\xi )]\ge F_N(D)\times (z_\text {stoch}+1-\delta )/F(D)+\min \left\{ 0,\inf _{\xi \notin D}c(x_{N_k};\xi )\right\} $ $\forall k\ge k_0$. Thus $\limsup _{N\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N})\ge \limsup _{k\rightarrow \infty }\min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N_k})\ge z_\text {stoch}+1-\delta +\delta =z_\text {stoch}+1$, yielding the contradiction $z_\text {stoch}+1\le z_\text {stoch}$.

Thus $\exists A_\infty $ compact such that $A\subseteq A_\infty $, $A_N\subseteq A_\infty $. Then, by (18),

$$\begin{aligned} \delta _N=\left| \min _{x\in X}\mathcal {C}(x;\mathcal {F}_N)-\min _{x\in X}\mathbb {E}_F[c(x;\xi )]\right|&=\left| \min _{x\in A_\infty }\mathcal {C}(x;\mathcal {F}_N)-\min _{x\in A_\infty }\mathbb {E}_F[c(x;\xi )]\right| \\ {}&\le \sup _{x\in A_\infty }\left| \mathcal {C}(x;\mathcal {F}_N)-\mathbb {E}_F[c(x;\xi )]\right| \rightarrow 0, \end{aligned}$$

yielding (19). Let $x_N\in A_N$. Since $A_\infty $ is compact, $x_N$ has at least one convergent subsequence. Let $x_{N_k}\rightarrow x$ be any convergent subsequence. Suppose for contradiction $x\notin A$, i.e., $\epsilon =\mathbb {E}_F[c(x;\xi )]-z_\text {stoch}>0$. Since $x_{N_k}\rightarrow x$ and by equicontinuity, $\exists k_1$ such that $\left| c(x_{N_k};\xi )-c(x;\xi )\right| \le \epsilon /4$ $\forall \xi ,\,k\ge k_1$. Then, $\left| \mathbb {E}_F[c(x_{N_k};\xi )]-\mathbb {E}_F[c(x;\xi )]\right| \le \mathbb {E}_F[\left| c(x_{N_k};\xi )-c(x;\xi )\right| ]\le \epsilon /4$ $\forall k\ge k_1$. Also $\exists k_2$ such that $\delta _{N_k}\le \epsilon /4$ $\forall k\ge k_2$. Then, for $k\ge \max \left\{ k_1,k_2\right\} $,

$$\begin{aligned} \min _{x\in X}\mathcal {C}(x;\mathcal {F}_{N_k})= & {} \mathcal {C}(x_{N_k};\mathcal {F}_{N_k})\ge \mathbb {E}_F[c(x_{N_k};\xi )]-\delta _{N_k}\ge \mathbb {E}_F[c(x;\xi )]\\&-\,\epsilon /2\ge z_\text {stoch}+\epsilon /2. \end{aligned}$$

Taking limits, we contradict (19).$\square $

Proof

(Proof of the “if” side of Theorem 2) Consider any $\varXi $ bounded ($R=\sup _{\xi \in \varXi }\left| \left| \xi \right| \right| <\infty $). Let $X=\mathbb {R}^{d}$, and

$$\begin{aligned}&c_1(x;\xi )=\left| \left| x\right| \right| \left( 2+{\text {cos}}\left( {x^T\xi }\right) \right) ,\quad c_2(x;\xi )=\left| \left| x\right| \right| \left( 2-{\text {cos}}\left( {x^T\xi }\right) \right) ,\\&c_3(x;\xi )=\left| \left| x\right| \right| \left( 2+{\text {sin}}\left( {x^T\xi }\right) \right) ,\quad c_4(x;\xi )=\left| \left| x\right| \right| \left( 2-{\text {sin}}\left( {x^T\xi }\right) \right) . \end{aligned}$$

Since $\left| c_i(x;\xi )\right| \le 3\left| \left| x\right| \right| $, expectations exist. The gradient of each $c_i$ at x has magnitude bounded by $R\left| \left| x\right| \right| +3$ uniformly over $\xi $, so Assumption 1 is satisfied. Also, $\lim _{\left| \left| x\right| \right| \rightarrow \infty }c_i(x;\xi )\ge \lim _{\left| \left| x\right| \right| \rightarrow \infty }\left| \left| x\right| \right| =\infty $ uniformly over all $\xi \in \varXi $ and $c_i(x;\xi )\ge 0$, so Assumption 2 is satisfied.

Restrict to the a.s. event that (18) applies simultaneously for $c_1,c_2,c_3,c_4$. Let any sequence $F_N\not \rightarrow F$ be given. Let I denote the imaginary unit. Then, by the Lévy continuity theorem and Cramér-Wold device, there exists x such that $\mathbb {E}_{F_N}\left[ e^{Ix^T\xi }\right] \not \rightarrow \mathbb {E}_{F}\left[ e^{Ix^T\xi }\right] $. On the other hand, by (18),

$$\begin{aligned} 2\left| \left| x\right| \right| +\left| \left| x\right| \right| \sup _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ {\text {cos}}\left( {x^T\xi }\right) \right] \longrightarrow 2\left| \left| x\right| \right| +\left| \left| x\right| \right| \mathbb {E}_{F}\left[ {\text {cos}}\left( {x^T\xi }\right) \right] ,\\ 2\left| \left| x\right| \right| -\left| \left| x\right| \right| \inf _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ {\text {cos}}\left( {x^T\xi }\right) \right] \longrightarrow 2\left| \left| x\right| \right| -\left| \left| x\right| \right| \mathbb {E}_{F}\left[ {\text {cos}}\left( {x^T\xi }\right) \right] ,\\ 2\left| \left| x\right| \right| +\left| \left| x\right| \right| \sup _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ {\text {sin}}\left( {x^T\xi }\right) \right] \longrightarrow 2\left| \left| x\right| \right| +\left| \left| x\right| \right| \mathbb {E}_{F}\left[ {\text {sin}}\left( {x^T\xi }\right) \right] ,\\ 2\left| \left| x\right| \right| -\left| \left| x\right| \right| \inf _{F_0\in \mathcal {F}_N}\mathbb {E}_{F_0}\left[ {\text {sin}}\left( {x^T\xi }\right) \right] \longrightarrow 2\left| \left| x\right| \right| -\left| \left| x\right| \right| \mathbb {E}_{F}\left[ {\text {sin}}\left( {x^T\xi }\right) \right] . \end{aligned}$$

The first two limits imply that $\sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ {\text {cos}}\left( x^T\xi \right) \right] -\mathbb {E}_{F}\left[ {\text {cos}}\left( x^T\xi \right) \right] \right| \rightarrow 0$ and the second two imply that $\sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ {\text {sin}}\left( x^T\xi \right) \right] -\mathbb {E}_{F}\left[ {\text {sin}}\left( x^T\xi \right) \right] \right| \rightarrow 0$. Together, recalling that $e^{It}={\text {cos}}(t)+I{\text {sin}}(t)$, this implies that

$$\begin{aligned} \sup _{F_0\in \mathcal {F}_N}\left| \mathbb {E}_{F_0}\left[ e^{Ix^T\xi }\right] -\mathbb {E}_{F}\left[ e^{Ix^T\xi }\right] \right| \rightarrow 0. \end{aligned}$$

However, since, $\mathbb {E}_{F_N}\left[ e^{Ix^T\xi }\right] \not \rightarrow \mathbb {E}_{F}\left[ e^{Ix^T\xi }\right] $, it must be that $F_N\notin \mathcal {F}_N$ i.o.$\square $

1.6 Proof of Theorem 4

Proof

In the case of finite support $\varXi =\{\hat{\xi }^1,\dots ,\hat{\xi }^n\}$, total variation metrizes weak convergence:

$$\begin{aligned} d_\text {TV}(q,q')=\frac{1}{2}\sum _{j=1}^n\left| q(j)-q'(j)\right| . \end{aligned}$$

Restrict to the almost sure event $d_\text {TV}(\hat{p}_N,p)\rightarrow 0$ (see Theorem 11.4.1 of [18]). We need only show that now $\sup _{p_0\in \mathcal {F}_N}d_\text {TV}(\hat{p}_N,p_0)\rightarrow 0$, yielding the contrapositive of the uniform consistency condition.

By an application of the Cauchy–Schwartz inequality (cf. [22]),

$$\begin{aligned} d_\text {TV}(\hat{p}_N,p_0)= & {} \frac{1}{2}\sum _{j=1}^n\frac{\left| \hat{p}_N(j)-p_0(j)\right| }{\sqrt{p_0(j)}}\times \sqrt{p_0(j)}\\\le & {} \frac{1}{2}\left( \sum _{j=1}^n\frac{\left( \hat{p}_N(j)-p_0(j)\right) ^2}{{p_0(j)}}\right) ^{1/2}=\frac{X_N(p_0)}{2}. \end{aligned}$$

By Pinsker’s inequality (cf. [31]),

$$\begin{aligned} d_\text {TV}(\hat{p}_N,p_0)\le \frac{1}{\sqrt{2}}\left( \sum _{j=1}^n\sum _{j=1}^n\hat{p}_N(j)\log \left( \hat{p}_N(j)/p_0(j)\right) \right) ^{1/2}=\frac{G_N(p_0)}{2}. \end{aligned}$$

Since both the $\chi ^2$ and G-tests use a rejection threshold equal to $\sqrt{Q/N}$ where Q is the $(1-\alpha )$th quantile of a $\chi ^2$ distribution with $n-1$ degrees of freedom (Q is independent of N), we have that $d_\text {TV}(\hat{p}_N,p_0)$ is uniformly bounded over $p_0\in \mathcal {F}_N$ by a quantity diminishing with N.$\square $

1.7 Proof of Theorem 5

Proof

In the case of univariate support, the Lévy metric metrizes weak convergence:

$$\begin{aligned} d_{\mathrm{L}\acute{e}\mathrm{vy}}(G,G')=\inf \{\epsilon >0:G(\xi -\epsilon )-\epsilon \le G'(\xi )\le G(\xi +\epsilon )+\epsilon \ \forall \xi \in \mathbb {R}\}. \end{aligned}$$

Restrict to the almost sure event $d_{\mathrm{L}\acute{e}\mathrm{vy}}(\hat{F}_n,F)\rightarrow 0$ (see Theorem 11.4.1 of [18]). We need only show that now $\sup _{F_0\in \mathcal {F}_N}d_{\mathrm{L}\acute{e}\mathrm{vy}}(\hat{F}_N,F_0)\rightarrow 0$, yielding the contrapositive of the uniform consistency condition.

Fix $F_0$ and let $0\le \epsilon <d_{\mathrm{L}\acute{e}\mathrm{vy}}(\hat{F}_N,F_0)$. Then $\exists \xi _0$ such that either (1) $\hat{F}_N(\xi _0-\epsilon )-\epsilon >F_0(\xi _0)$ or (2) $\hat{F}_N(\xi _0+\epsilon )+\epsilon <F_0(\xi _0)$. Since $F_0$ is monotonically non-decreasing, (1) implies $D_N(F_0)\ge \hat{F}_N(\xi _0-\epsilon )-F_0(\xi _0-\epsilon )>\epsilon $ and (2) implies $D_N(F_0)\ge F_0(\xi _0+\epsilon )-\hat{F}_N(\xi _0+\epsilon )>\epsilon $. Hence $d_{\mathrm{L}\acute{e}\mathrm{vy}}(\hat{F}_N,F_0)\le D_N(F_0)$. Moreover, $D_N\le V_N$ by definition. Since $\sup _{F_0\in \mathcal {F}_{S_N}^\alpha }S_N(F_0)=Q_{S_N}(\alpha )=O(N^{-1/2})$ for either statistic, both the KS and Kuiper tests are uniformly consistent.

Consider $D'_N(F_0)=\max _{i=1,\,\dots ,\,N}\left| F_0(\xi ^{(i)})-\frac{2i-1}{2N}\right| =\sigma \left( F_0(\xi ^{(j)})-\frac{2j-1}{2N}\right) $, where j and $\sigma $ are the maximizing index and sign, respectively. Suppose $D'_N(F_0)\ge 1/\sqrt{N}+1/N$. If $\sigma =+1$, this necessarily means that $1-\frac{2j-1}{2N}\ge 1/\sqrt{N}+1/N$ and therefore $N-j\ge \lceil \sqrt{N}\rceil +1$. By monotonicity of $F_0$ we have for $0\le k\le \lceil \sqrt{N}\rceil $ that $j+k\le N$ and

$$\begin{aligned} F_0(\xi ^{(j+k)})-\frac{2(j+k)-1}{2N}\ge F_0(\xi ^{(j)})-\frac{2j-1}{2N}-\frac{k}{N}=D'_N(F_0)-\frac{k}{N}\ge 0. \end{aligned}$$

If instead $\sigma =-1$, this necessarily means that $\frac{2j-1}{2N}\ge 1/\sqrt{N}+1/N$ and therefore $j\ge \lceil \sqrt{N}\rceil +1$. By monotonicity of $F_0$ we have for $0\le k\le \lceil \sqrt{N}\rceil $ that $j-k\ge 1$ and

$$\begin{aligned} \frac{2(j-k)-1}{2N}-F_0(\xi ^{(j-k)})\ge \frac{2j-1}{2N}-F_0(\xi ^{(j)})-\frac{k}{N}=D'_N(F_0)-\frac{k}{N}\ge 0. \end{aligned}$$

In either case we have that

$$\begin{aligned} W_N^2= & {} \frac{1}{12N^2}+\frac{1}{N}\sum _{i=1}^N\left( F_0(\xi ^{(i)})-\frac{2i-1}{2N}\right) ^2\ge \frac{1}{12N^2}\\&+\, \frac{1}{N}\sum _{k=0}^{\lceil \sqrt{N}\rceil }\left( D'_N-\frac{k}{N}\right) ^2\ge \frac{D_N^2}{\sqrt{N}}-\frac{2}{N} \end{aligned}$$

using $D'_N(F_0)\ge 1/\sqrt{N}+1/N$ and $\left| D'_N(F_0)-D_N(F_0)\right| \le 1/(2N)$ in the last inequality. Therefore,

$$\begin{aligned} D_N^2(F_0)\le \max \left\{ \frac{1}{\sqrt{N}}+\frac{3}{2N},\,\sqrt{N}W_N^2(F_0)+\frac{2}{\sqrt{N}}\right\} . \end{aligned}$$

Since $F_0(\xi )(1-F_0(\xi ))\le 1$ and by using the integral formulation of CvM and AD (see [50]) the same is true replacing $W^2_N$ by $A^2_N$. For either of the CvM or AD statistic $Q_{S_N}(\alpha )=O(N^{-1/2})$ and hence $\sup _{F_0\in \mathcal {F}_{S_N}^\alpha }S^2_N(F_0)=O(N^{-1})$. Therefore, both the CvM and AD tests are uniformly consistent.

$$\begin{aligned} W_N^2-U_N^2&=\left( \frac{1}{N}\sum _{i=1}^NF_0(\xi ^{(i)})-\frac{1}{2}\right) ^2\\&\le \max \left\{ \left( \frac{1}{N}\sum _{i=1}^N\min \left\{ 1,\frac{2i-1}{2N}+D'_N(F_0)\right\} -\frac{1}{2}\right) ^2,\,\right. \\&\quad \left. \left( \frac{1}{N}\sum _{i=1}^N\max \left\{ 0,\frac{2i-1}{2N}-D'_N(F_0)\right\} -\frac{1}{2}\right) ^2\right\} . \end{aligned}$$

Letting $M=\lfloor \frac{1}{2}+N(1-D'_N(F_0))\rfloor $ we have $\sum _{i=1}^N\min \left\{ 1,\frac{2i-1}{2N}+D'_N(F_0)\right\} =\frac{M^2}{2N}+MD'_N(F_0)+N-M$ so that in the case of $D'_N(F_0)\ge 1/\sqrt{N}+1/N$, $\left( \frac{1}{N}\sum _{i=1}^N\min \left\{ 1,\frac{2i-1}{2N}+D'_N(F_0)\right\} -\frac{1}{2}\right) ^2=O(1/N)$. Thus, the Watson test is also uniformly consistent.$\square $

1.8 Proof of Proposition 5

Proof

Apply Theorem 2 to each i and restrict to the almost sure event that (18) holds for all i. Fix $F_N$ such that $F_N\in \mathcal {F}_N$ eventually. Then, (18) yields $\mathbb {E}_{F_N}[c_i(x;\xi _i)]\rightarrow \mathbb {E}_{F}[c_i(x;\xi _i)]$ for every $x\in X$. Summing over i yields the contrapositive of the c-consistency condition.$\square $

1.9 Proof of Proposition 6

Proof

Restrict to a sample path in the almost sure event $\mathbb {E}_{\hat{F}_N}[\xi _i]\rightarrow \mathbb {E}_{\hat{F}}[\xi _i]$, $\mathbb {E}_{\hat{F}_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{\hat{F}}[\xi _i\xi _j]$ for all i, j. Consider any $F_N$ such that $F_N\in \mathcal {F}^\alpha _{\text {CEG},N}$ eventually. Then clearly $\mathbb {E}_{F_N}[\xi _i]\rightarrow \mathbb {E}_{F}[\xi _i]$, $\mathbb {E}_{F_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{F}[\xi _i\xi _j]$.

Consider any $F_N$ such that $F_N\in \mathcal {F}^\alpha _{\text {DY},N}$ eventually. Because covariances exist, we may restrict to N large enough so that $\left| \left| \hat{\varSigma }_N\right| \right| _2\le M$ (operator norm) and $F_N\in \mathcal {F}^\alpha _{\text {DY},N}$. Then we get

$$\begin{aligned} \left| \left| \mathbb {E}_{F_N}[\xi ]-\hat{\mu }_N\right| \right| \le M\gamma _{1,N}(\alpha )\rightarrow 0 \end{aligned}$$

and

$$\begin{aligned} \left( \gamma _{3,N}(\alpha )-1\right) \hat{\varSigma }_N\preceq \mathbb {E}_{F_0}[\left( \xi -\hat{\mu }_N\right) \left( \xi -\hat{\mu }_N\right) ^T]-\hat{\varSigma }_N\preceq \left( \gamma _{2,N}(\alpha )-1\right) \hat{\varSigma }_N, \end{aligned}$$

which gives $\left| \left| \mathbb {E}_{F_0}[\left( \xi -\hat{\mu }_N\right) \left( \xi -\hat{\mu }_N\right) ^T]-\hat{\varSigma }_N\right| \right| _2\le M\max \left\{ \gamma _{2,N}(\alpha )-1,\right. \left. 1-\gamma _{3,N}(\alpha )\right\} \rightarrow 0$. Then again, we have $\mathbb {E}_{F_N}[\xi _i]\rightarrow \mathbb {E}_{ F}[\xi _i]$, $\mathbb {E}_{F_N}[\xi _i\xi _j]\rightarrow \mathbb {E}_{ F}[\xi _i\xi _j]$.

In either case we get $\mathbb {E}_{F_N}[c(x;\xi )]\rightarrow \mathbb {E}_{F_N}[c(x;\xi )]$ for any x due to factorability as in (26). This yields the contrapositive of the c-consistency condition.$\square $

1.10 Proof of Theorem 6

Proof

If $F_0\ne F$ then Theorem 1 of [44] yields that either $F_0\npreceq _\text {LCX}F$ or there is some $j=1,\dots ,d$ such that $\mathbb {E}_{F_0} [\xi _j^2]\ne \mathbb {E}_{F} [\xi _j^2]$. If $F_0\npreceq _\text {LCX}F$ then probability of rejection approaches one since $C_N>0$ but $Q_{C_N}(\alpha _1)\rightarrow 0$. Otherwise, $F_0\preceq _\text {LCX}F$ yields $\mathbb {E}_{F_0} [\xi _i^2]\le \mathbb {E}_{F} [\xi _i^2]$ for all i via (12) using $a=e_i$ and $\phi (\zeta )=\zeta ^2$. Then $\mathbb {E}_{F_0} [\xi _j^2]\ne \mathbb {E}_{F} [\xi _j^2]$ must mean that $\mathbb {E}_{F_0}\left[ \left| \left| \xi \right| \right| _2^2\right] <\mathbb {E}_{F}\left[ \left| \left| \xi \right| \right| _2^2\right] $ and probability of rejection goes to one.$\square $

1.11 Proof of Theorem 7

Proof

Let $R=\sup _{\xi \in \varXi }\left| \left| \xi \right| \right| _2<\infty $. Restrict to the almost sure event that $\hat{F}_N\rightarrow F$. Consider $F_N$ such that $F_N\in \mathcal {F}_N$ eventually. Let N be large enough so that it is so. Fix $\left| \left| a\right| \right| _2=1$. Let $a_1=a$ and complete an orthonormal basis for $\mathbb {R}^{d}$: $a_1,a_2,\dots ,a_d$. On the one hand we have $Q_{R_N}(\alpha _2)\ge \mathbb {E}_{\hat{F}_N}\left[ \sum _{i=1}^d(a_i^T\xi )^2\right] -\mathbb {E}_{F_N}\left[ \sum _{i=1}^d(a_i^T\xi )^2\right] $. On the other hand, for each i,

$$\begin{aligned}&\mathbb {E}_{\hat{F}_N}\left[ (a_i^T\xi )^2\right] -\mathbb {E}_{ F_N}\left[ (a_i^T\xi )^2\right] \\&\quad =~2\int _{b=-R}^0\left( \mathbb {E}_{\hat{F}_N}[\max \left\{ b-a_i^T\xi ,0\right\} ]-\mathbb {E}_{ F_N}[\max \left\{ b-a_i^T\xi ,0\right\} ]\right) db\\&\qquad +2\int _{b=0}^R\left( \mathbb {E}_{\hat{F}_N}[\max \left\{ a_i^T\xi -b,0\right\} ]-\mathbb {E}_{ F_N}[\max \left\{ a_i^T\xi -b,0\right\} ]\right) db\\&\quad \ge ~4\int _{b=0}^R(\left| \left| a\right| \right| _1+\left| b\right| )Q_{C_N}(\alpha _1)db\ge 4\left( \sqrt{d}+R^2/2\right) Q_{C_N}(\alpha _1)=p_N. \end{aligned}$$

Therefore, $q_N=Q_{R_N}(\alpha _2)+(d-1)p_N\ge \mathbb {E}_{\hat{F}_N}\left[ (a^T\xi )^2\right] -\mathbb {E}_{F_N}\left[ (a^T\xi )^2\right] $ and $Q_{R_N}(\alpha _2),Q_{C_N}(\alpha _1),p_N,q_N\rightarrow 0$. Let $G_N(t)=F_N(\left\{ \xi :a^T\xi \le t\right\} )\in [0,1]$ and $\hat{G}_N(t)=\hat{F}_N(\left\{ \xi :a^T\xi \le t\right\} )\in [0,1]$ be the CDFs of $a^T\xi $ under $F_N$ and $\hat{F}_N$, respectively. Then,

$$\begin{aligned} q_N&\ge \mathbb {E}_{\hat{F}_N}\left[ (a^T\xi )^2\right] -\mathbb {E}_{F_N}\left[ (a^T\xi )^2\right] \\&\quad =~2\int _{b=-R}^0\left( \mathbb {E}_{\hat{F}_N}[\max \left\{ b-a^T\xi ,0\right\} ]-\mathbb {E}_{ F_N}[\max \left\{ b-a^T\xi ,0\right\} ]\right) db\\&\qquad +2\int _{b=0}^R\left( \mathbb {E}_{\hat{F}_N}[\max \left\{ a^T\xi -b,0\right\} ]-\mathbb {E}_{ F_N}[\max \left\{ a^T\xi -b,0\right\} ]\right) db\\&\quad =~2\int _{b=-R}^0\int _{t=-R}^{b}\left( \hat{G}_N(t)-G_N(t)\right) dt\,db\\&\qquad +2\int _{b=0}^R\int _{t=b}^{R}\left( G_N(t)-\hat{G}_N(t)\right) dt\,db\ge p_N,\\&\quad \int _{t=-R}^{b}\left( \hat{G}_N(t)-G_N(t)\right) dt\ge -(\sqrt{d}+R)Q_{C_N}(\alpha )~~\forall b\in [-R,0],\\&\quad \int _{t=b}^{R}\left( G_N(t)-\hat{G}_N(t)\right) dt\ge -(\sqrt{d}+R)Q_{C_N}(\alpha )~~\forall b\in [0,R], \end{aligned}$$

Because $\hat{F}_N\rightarrow F$, we get $\hat{G}_N(t)\rightarrow F(\left\{ \xi :a^T\xi \le t\right\} )$ and therefore $G_N(t)\rightarrow F(\left\{ \xi :a^T\xi \le t\right\} )$ at every continuity point t. Because this is true for every a, the Cramer-Wold device yields $F_N\rightarrow F$. This is the contrapositive of the uniform consistency condition.$\square $

1.12 Proof of Theorem 8

Proof

The proof amounts to dualizing a finite-support phi-divergence as done in Theorem 1 of [3] and is included here only for the sake of self-containment.

Let $\phi _{X}(t)=(t-1)^2/t$ and $\phi _{G}(t)=-\log (t)+t-1$ (corresponding to “$\chi ^2$-distance” and “Burg entropy” in Table 2 of [3], respectively). Then we can write

$$\begin{aligned} \mathcal {F}_{X_N}^\alpha= & {} \left\{ p_0\ge 0:\,\sum _{j=1}^np_0(j)=1,\,\sum _{j=1}^n\hat{p}(j)\phi _X(p_0(j)/\hat{p}_N(j))\le (Q_{X_N}(\alpha ))^2\right\} ,\\ \mathcal {F}_{G_N}^\alpha= & {} \left\{ p_0\ge 0:\,\sum _{j=1}^np_0(j)=1,\,\sum _{j=1}^n\hat{p}(j)\phi _G(p_0(j)/\hat{p}_N(j))\le \frac{1}{2}(Q_{G_N}(\alpha ))^2\right\} . \end{aligned}$$

Therefore, letting $c_j=c(x;\hat{\xi }^j)$ and fixing $\phi $ and Q appropriately, the inner problem (4) is equal to

$$\begin{aligned} \max _{p_0\in \mathbb {R}^{n}_+}\quad&c^Tp_0\\ \text {s.t.}\quad&\sum _{j=1}^np_0(j)=1\\&\sum _{j=1}^n\hat{p}_N(j)\phi (p_0(j)/\hat{p}_N(j))\le Q^2. \end{aligned}$$

By Fenchel duality, the above is equal to

$$\begin{aligned}&\min _{r\in \mathbb {R},\,s\in \mathbb {R}_+}\max _{p_0\in \mathbb {R}^{n}_+}\quad c^Tp_0+r(1-e^Tp_0)+s(Q^2-\sum _{j=1}^n\hat{p}_N(j)\phi (p_0(j)/\hat{p}_N(j))) \\&\quad =\min _{r\in \mathbb {R},\,s\in \mathbb {R}_+} r+Q^2s+\sum _{j=1}^n\hat{p}_N(j)s\max _{\rho \in \mathbb {R}_+}\left( \frac{c_j-r}{s}\rho -\phi (\rho )\right) \\&\quad =\min _{r\in \mathbb {R},\,s\in \mathbb {R}_+} r+Q^2s+\sum _{j=1}^n \hat{p}_N(j)s\phi ^*\left( \frac{c_j-r}{s}\right) , \end{aligned}$$

where $\phi ^*(\tau )=\sup _{\rho \ge 0}(\tau \rho -\phi (\rho ))$ is the convex conjugate. Therefore, the inner problem (4) is equal to

$$\begin{aligned} \min _{r,s,t}\quad&r+Q^2s-\sum _{j=1}^n\hat{p}_N(j)t_j\\ \text {s.t.}\quad&r\in \mathbb {R},\,s\in \mathbb {R}_+,\,t\in \mathbb {R}^{n} ,\,c\in \mathbb {R}^{n}\\&t_j\le -s\phi ^*\left( \frac{c_j-r}{s}\right)&\forall j=1,\dots ,N \\&c_j\ge c(x;\hat{\xi }^j)&\forall j=1,\dots ,N. \end{aligned}$$

It is easy to verify that

$$\begin{aligned} \phi ^*_{X}(\tau )=\left\{ \begin{array}{ll}2-2\sqrt{1-\tau }&{}\quad \tau \le 1\\ \infty &{}\quad \text {otherwise}\end{array}\right. ,\quad \text {and}\quad \phi ^*_{G}(\tau )=\left\{ \begin{array}{ll}-\log (1-\tau )&{}\quad \tau \le 1\\ \infty &{}\quad \text {otherwise}\end{array}\right. \end{aligned}$$

(see e.g. Table 4 of [3]). In the case of $X_N$, since $s\ge 0$, we get

$$\begin{aligned} \begin{array}{c} t_j\le -s\phi _X^*\left( \frac{c_j-r}{s}\right) \end{array}&\iff \begin{array}{c} c_j-r\le s,\\ 2s+t_j\le 2\sqrt{s(s-c_j+r)} \end{array}\\ {}&\iff \exists y_j:\begin{array}{c} c_j-r\le s,\,y_j\ge 0,\\ 2s+t_j\le y_j,\\ y_j^2\le (2s)(2s-2c_j+2r) \end{array}\\ {}&\iff \exists y_j:\begin{array}{c} c_j-r\le s,\,y_j\ge 0,\\ 2s+t_j\le y_j,\\ y_j^2+(r-c_j)^2\le (2s-c_j+r)^2. \end{array} \end{aligned}$$

In the case of $G_N$, since $s\ge 0$, we get

$$\begin{aligned} t_j\le -s\phi _G^*\left( \frac{c_j-r}{s}\right)&\iff \begin{array}{c} c_j-r\le s,\\ se^{t_j/s}\le s+r-c_j \end{array}\\ {}&\iff \left( t_j,\,s,\,s+r-c_j\right) \in C_\text {XC}. \end{aligned}$$

$\square $

1.13 Proof of Theorem 9

Proof

Problem (3) is equal to the optimization problems of Theorem 8 augmented with the variable $x\in X$ and weak optimization is polynomially reducible to weak separation (see [23]). Tractable weak separation for all constraints except $x\in X$ and (28) is given by the tractable weak optimization over these standard conic-affine constraints. A weak separation oracle is assumed given for $x\in X$. We polynomially reduce separation over $c_j\ge \max _kc_{jk}(x)$ for fixed $c'_j,x'$ to the oracles. We first call the evaluation oracle for each k to check violation and if there is a violation and $k^*\in \arg \max _kc_{jk}(x')$ then we call the subgradient oracle to get $s\in \partial c_{jk^*}(x')$ with $\left| \left| s\right| \right| _\infty \le 1$ and produce the separating hyperplane $0\ge c_{jk^*}(x')-c_j+s^T(x-x')$.

$\square $

1.14 Proof of Theorem 10

Proof

Substituting the given formulas for $K_{S_N},\,A_{S_N},\,b_{S_N,\alpha }$ for each $S_N\in \{D_N,V_N,W_N,U,N,A_N\}$ in $A_{S_N}\zeta -b_{S_N,\alpha }\in K_{S_N}$ we obtain exactly $S_N(\zeta _1,\dots ,\zeta _N)\le Q_{S_N}(\alpha )$ for $S_N$ as defined in (8). We omit the detailed arithmetic.$\square $

1.15 Proof of Theorem 13

Proof

Under these assumptions (3) is equal to the optimization problems of Theorem 11 or 12 augmented with the variable x and weak optimization is polynomially reducible to weak separation (see [23]). Tractable weak separation for all constraints except $x\in X$ and (30) is given by the tractable weak optimization over these standard conic-affine constraints. A weak separation oracle is assumed given for $x\in X$. By continuity and given structure of $c(x;\xi )$, we may rewrite (30) as

$$\begin{aligned} c_i\ge \max _{\xi \in [\xi ^{(i-1)},\xi ^{(i)}]} c_k(x;\xi )\quad \forall k=1,\dots ,K. \end{aligned}$$

(56)

We polynomially reduce weak $\delta $-separation over the kth constraint at fixed $c'_{i},x'$ to the oracles. We call the $\delta $-optimization oracle to find $\xi '\in [\xi ^{(i-1)},\xi ^{(i)}]$ such that $c_k(x';\xi ')\ge \max _{\xi \in [\xi ^{(i-1)},\xi ^{(i)}]} c_k(x;\xi )-\delta $. If $c_i'\ge c_k(x';\xi ')$ then $(c_i'+\delta ,x')$ satisfy the constraint and is within $\delta $ of $(c_i',x')$. If $c_i'< c_k(x';\xi ')$ then we call the subgradient oracle to get $s\in \partial _x c_{k}(x',\xi ')$ with $\left| \left| s\right| \right| _\infty \le 1$ and produce the hyperplane $c_i\ge c(x';\xi ')+s^T(x-x')$ that is violated by $(c_i',x')$ and for any $(c_i,x)$ satisfying (55) (in particular if it is in the $\delta $-interior) we have $c_i\ge \max _{\xi \in [\xi ^{(i-1)},\xi ^{(i)}]} c_k(x;\xi )\ge c_k(x;\xi ')\ge c_k(x';\xi ')+s^T(x-x')$ since s is a subgradient. The case for constraints (31) is similar.

$\square $

1.16 Proof of Proposition 7

Proof

According to Theorem 11, the observations in Example 2, and by renaming variables, the DRO (3) is given by

$$\begin{aligned} (P) \min&y+\sum _{i=1}^N\left( {Q_{D_N}(\alpha )}+\frac{i-1}{N}\right) s_i+\sum _{i=1}^N\left( {Q_{D_N}(\alpha )}-\frac{i}{N}\right) t_i\\&{\text {s.t.}} \;x\in \mathbb {R}_+,\,y\in \mathbb {R},\,s\in \mathbb {R}^{N}_+,\,t\in \mathbb {R}^{N}_+\\&(r-c)x+y+\sum _{i=j}^N(s_i-t_i)\ge (r-c)\xi ^{(j)}&\forall j=1,\dots ,N+1\\&-(c-b)x+y+\sum _{i=j}^N(s_i-t_i)\ge -(c-b)\xi ^{(j-1)}&\forall j=1,\dots ,N+1. \end{aligned}$$

Applying linear optimization duality we get that its dual is

$$\begin{aligned} (D)\ \max \;&(r-c)\sum _{i=1}^{N+1}\xi ^{(i)}p_i-(c-b)\sum _{i=1}^{N+1}\xi ^{(i-1)}q_i\\ {\text {s.t.}} \;&p\in \mathbb {R}^{N+1}_+,\,q\in \mathbb {R}^{N+1}_+\\&(r-c)\sum _{i=1}^{N+1}p_i-(c-b)\sum _{i=1}^{N+1}q_i\le 0\\&\sum _{i=1}^{N+1}p_i+\sum _{i=1}^{N+1}q_i=1\\&\sum _{i=1}^{j}p_i+\sum _{i=1}^{j}q_i\le {Q_{D_N}(\alpha )}+\frac{j-1}{N}&\forall j=1,\dots ,N\\&-\sum _{i=1}^{j}p_i-\sum _{i=1}^{j}q_i\le {Q_{D_N}(\alpha )}-\frac{j}{N}&\forall j=1,\dots ,N. \end{aligned}$$

It can be directly verified that the following primal and dual solutions are respectively feasible

$$\begin{aligned} x&=(1-\theta )\xi ^{(i_\text {lo})}+\theta \xi ^{(i_\text {hi})},\\ y&=(r-c)\xi ^{(N+1)}-(r-c)x,\\ s&_i=\left\{ \begin{array}{ll}(c-b)\left( \xi ^{(i)}-\xi ^{(i-1)}\right) &{}\quad i\le i_\text {lo}\\ 0&{}\quad \text {otherwise}\end{array}\right.&\forall i=1,\dots ,N,\\ t&_i=\left\{ \begin{array}{ll}(r-c)\left( \xi ^{(i+1)}-\xi ^{(i)}\right) &{}\quad i\ge i_\text {hi}\\ 0&{}\quad \text {otherwise}\end{array}\right.&\forall i=1,\dots ,N\\ p_i=&\left\{ \begin{array}{ll} 0&{}\quad i\le i_\text {hi}-1\\ i/N-\theta -{Q_{D_N}(\alpha )}&{}\quad i=i_\text {hi}\\ 1/N&{}\quad N\ge i\ge i_\text {hi}+1\\ {Q_{D_N}(\alpha )}&{}\quad i=N+1 \end{array}\right. ,&\forall i=1,\dots ,N\\ q_i=&\left\{ \begin{array}{ll} {Q_{D_N}(\alpha )}&{}\quad i=1\\ 1/N&{}\quad 2\le i\le i_\text {lo}\\ \theta -{Q_{D_N}(\alpha )}-(i-2)/N&{}\quad i=i_\text {lo}+1\\ 0&{}i\ge i_\text {lo}+2 \end{array}\right.&\forall i=1,\dots ,N \end{aligned}$$

and that both have objective cost in their respective programs of

$$\begin{aligned} z=&-(c-b){Q_{D_N}(\alpha )}\xi ^{(0)}-\frac{c-b}{N}\sum _{i=1}^{i_\text {lo}-1}\xi ^{(i)}-(c-b)\left( \theta -{Q_{D_N}(\alpha )}-\frac{i_\text {lo}-1}{N}\right) \xi ^{(i_\text {lo})}\\ {}&+ (r-c){Q_{D_N}(\alpha )}\xi ^{(N+1)}+\frac{r-c}{N}\sum _{i=i_\text {hi}+1}^N\xi ^{(i)}+(r-c)\left( \frac{i_\text {hi}}{N}-{Q_{D_N}(\alpha )}-\theta \right) \xi ^{(i_\text {hi})}. \end{aligned}$$

This proves optimality of x. Adding $0=(c-b)\theta x-(r-c)(1-\theta )x$ to the above yields the form of the optimal objective given in the statement of the result. $\square $

1.17 Proof of Theorem 14

Proof

Fix x. Let $S=\{(a,b)\in \mathbb {R}^{d+1}:\left| \left| a\right| \right| _1+\left| b\right| \le 1\}$. Using the notation of [46], letting C be the cone of nonnegative measures on $\varXi $ and $C'$ the cone of nonnegative measures on S, we write the inner problem as

$$\begin{aligned} \sup _F\quad&\left\langle F,c(x;\xi )\right\rangle _\varXi \\ {\text {s.t.}}\quad&F\in C,\,\left\langle F,1\right\rangle _\varXi =1\\&{\frac{1}{N}\sum _{i=1}^N\max \{a^T\xi ^i-b,0\}+Q_{C_N}(\alpha _1)-\left\langle F,\max \{a^T\xi -b,0\}\right\rangle _\varXi }\\&\ge 0\quad \forall (a,b)\in S\\&\left\langle F,\left| \left| \xi \right| \right| _2^2\right\rangle _\varXi \ge Q_{R_N}^{\alpha _2} \end{aligned}$$

Invoking Proposition 2.8 of [46] (with the generalized Slater point equal to the empirical distribution), we have that the strongly dual minimization problem is

$$\begin{aligned} \min _{G,\tau ,\theta }\quad&\theta +\left\langle G,Q_{C_N}(\alpha _1)+\frac{1}{N}\sum _{i=1}^N\max \{a^T\xi ^i-b,0\}\right\rangle _S-Q_{R_N}^{\alpha _2}\tau \\ {\text {s.t.}}\quad&G\in C',\,\tau \in \mathbb {R}_+,\,\theta \in \mathbb {R}\nonumber \\&\inf _{\xi \in \varXi }\left( \left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S-c(x;\xi )-\tau \left| \left| \xi \right| \right| _2^2\right) \ge -\theta . \end{aligned}$$

(57)

If (40) is true, we can infer from constraint (57) that

$$\begin{aligned} \tau\le & {} \inf _{\xi \in \varXi }\frac{\theta +\left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S-c(x;\xi )}{\left| \left| \xi \right| \right| _2^2}\\\le & {} \liminf _{i\rightarrow \infty }\frac{\theta +{G}\{S\}\max \{\left| \left| \xi '_i\right| \right| _\infty ,1\}-c(x;\xi '_i)}{\left| \left| \xi '_i\right| \right| _2^2}=0. \end{aligned}$$

This shows that $\tau =0$ is the only feasible choice.

In general, $\tau =0$ is a feasible choice and fixing it so provides an upper bound on (56) and hence on the original inner problem by weak duality. Moreover, plugging $\xi _0$ in to (57) we conclude that

$$\begin{aligned} \tau \le \frac{\theta +{G}\{S\}\max \{\left| \left| \xi _0\right| \right| _\infty ,1\}-c(x;\xi _0)}{\left| \left| \xi _0\right| \right| _2^2} \le \frac{1}{R^2}\theta +\frac{R+1}{R^2}G\{S\}-\frac{1}{R^2}c(x;\xi _0). \end{aligned}$$

Hence, setting $\tau =0$ in (57) and replacing $\tau $ by the above bound in the objective provides a lower bound on (56) and hence on the original inner problem.

In order to study both cases, and both the upper and lower bounds in the latter case, we consider for the rest of the proof the following general problem given $\eta $ and $\nu $,

$$\begin{aligned} \min _{G,\theta }\quad&\eta \theta +\left\langle G,\nu +\frac{1}{N}\sum _{i=1}^N\max \{a^T\xi ^i-b,0\}\right\rangle _S\end{aligned}$$

(58)

$$\begin{aligned} {\text {s.t.}}\quad&G\in C',\,\theta \in \mathbb {R}\nonumber \\&\inf _{\xi \in \varXi }\left( \left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S-c(x;\xi )\right) \ge -\theta . \end{aligned}$$

(59)

We first rewrite (59) using the representation (37):

$$\begin{aligned} \inf _{\xi \in \varXi }\left( \left\langle G,\max \{a^T\xi -b,0\}\right\rangle _S-c_k(x;\xi )\right) \ge -\theta \quad \forall k=1,\dots ,K. \end{aligned}$$

Next, invoking [46], and employing the concave conjugate, we rewrite the kth of these constraints as follows:

$$\begin{aligned} -\theta&\le \inf _{\xi \in \varXi ,\,g(\cdot )\ge 0}\sup _{H_k\in C'}\left( \left\langle G,g\right\rangle _S-c_k(x;\xi )+\left\langle H_k,a^T\xi -b-g\right\rangle _S\right) \\&=\sup _{H_k\in C',\,G-H_k\in C'}\left( \left\langle H_k,-b\right\rangle _S+c_{k*}(x;\left\langle H_k,a\right\rangle _S)\right) . \end{aligned}$$

Introducing the variables $H_k$ and this equivalent constraint into the problem (58) and invoking [46] we find that (58) is equal to the dual problem:

$$\begin{aligned} \max _{p,\,q\,\psi }\quad&\sum _{k=1}^Kp_kc_{k**}(x,\,q_k/p_k)\\ {\text {s.t.}}\quad&p_k\in \mathbb {R}^{K}_+,\,q\in \mathbb {R}^{K\times d},\,\inf _{(a,b)\in S}\psi _k(a,b)\ge 0\,\forall k=1,\dots ,K\\&\sum _{k=1}^Kp_k=\eta \\&\inf _{(a,b)\in S}\left( \nu +\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} -\sum _{k=1}^K\psi _k(a,b)\right) \ge 0\\&\inf _{(a,b)\in S}\left( \psi _k(a,b)-a^Tq_k+p_kb\right) \ge 0\quad \forall k=1,\dots ,K. \end{aligned}$$

Since $c_k(x;\xi )$ is closed concave in $\xi $, $c_{k**}(x;\xi )=c_k(x;\xi )$. Moreover, recognizing that $\psi _k=\max \left\{ a^Tq_k-p_kb,0\right\} $ is optimal, we can rewrite the above problem as:

$$\begin{aligned} \max _{p,\,q}\quad&\sum _{k=1}^Kp_kc_{k}(x,\,q_k/p_k) \end{aligned}$$

(60)

$$\begin{aligned} {\text {s.t.}}\quad&p_k\in \mathbb {R}^{K}_+,\,q\in R{K\times d}\nonumber \\&\sum _{k=1}^Kp_k=\eta \nonumber \\&\inf _{(a,b)\in S}\left( \nu +\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} -\sum _{k=1}^K\max \left\{ a^Tq_k-p_kb,0\right\} \right) \ge 0. \end{aligned}$$

(61)

Next, we rewrite (61) by splitting it over the different branches on the sum of K maxima, noting that the all-zeros branch is trivial:

$$\begin{aligned} \nu \ge \sup _{(a,b)\in S}\left( \sum _{k=1}^k\gamma _k\left( a^Tq_k-p_kb\right) -\frac{1}{N}\sum _{i=1}^N\max \left\{ a^T\xi ^i-b,0\right\} \right) \quad \forall \gamma \in \mathcal {G}. \end{aligned}$$

Next, we invoke linear optimization duality to rewrite the $\gamma $th constraint as follows:

Introducing the variables $\mu _\gamma ,\rho _\gamma ,u_\gamma ,v_\gamma ,u'_\gamma ,v'_\gamma $ and this equivalent constraint into (60) and invoking linear optimization duality yields $\mathcal {C}'(x;\nu ,\eta )$ and the result.$\square $

1.18 Proof of Theorem 15

Proof

Under these assumptions, weak optimization is polynomially reducible to weak separation (see [23]) and separation is given for all constraints but (39). We polynomially reduce weak $\delta $-separation over the kth constraint at fixed $x',h'_k,g'_k$. First we call the concave conjugate oracle to find $\xi '$ such that $ {h_k'}^T\xi '-c_k(x';\xi ')\le c_{k*}(x';h_k')+\delta . $ If $g_k'\le {h_k'}^T\xi -c_k(x';\xi ')$ then $x',h'_k,g'_k-\delta $ satisfies the constraint and is within $\delta $ of the given point. Otherwise, we call the subgradient oracle to get $s\in \partial _x c_{k}(x',\xi ')$ with $\left| \left| s\right| \right| _\infty \le 1$ and produce the hyperplane $g_k\le h_k^T\xi '-c_k(x';\xi ')-s^T(x-x')$, which is violated by the given point and is valid whenever the original constraint is valid (in particular in its $\delta $-interior).$\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bertsimas, D., Gupta, V. & Kallus, N. Robust sample average approximation. Math. Program. 171, 217–282 (2018). https://doi.org/10.1007/s10107-017-1174-z

Download citation

Received: 11 August 2014
Accepted: 21 June 2017
Published: 29 June 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10107-017-1174-z

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Robust sample average approximation

Abstract

Access this article

Similar content being viewed by others

Data-driven robust optimization

The State of Robust Optimization

Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendix

Appendix

1.1 Proof of Theorem 1

Proof

1.2 Proof of Proposition 2

Proof

1.3 Computing a threshold \(Q_{C_N}(\alpha )\)

Theorem 16

Proof

1.4 Proof of Proposition 4

Proof

1.5 Proofs of Theorems 2 and 3

Proposition 8

Proof

Proposition 9

Proof

Proof

Proof

1.6 Proof of Theorem 4

Proof

1.7 Proof of Theorem 5

Proof

1.8 Proof of Proposition 5

Proof

1.9 Proof of Proposition 6

Proof

1.10 Proof of Theorem 6

Proof

1.11 Proof of Theorem 7

Proof

1.12 Proof of Theorem 8

Proof

1.13 Proof of Theorem 9

Proof

1.14 Proof of Theorem 10

Proof

1.15 Proof of Theorem 13

Proof

1.16 Proof of Proposition 7

Proof

1.17 Proof of Theorem 14

Proof

1.18 Proof of Theorem 15

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation