Abstract
Upper semicontinuous (usc) functions arise in the analysis of maximization problems, distributionally robust optimization, and function identification, which includes many problems of nonparametric statistics. We establish that every usc function is the limit of a hypo-converging sequence of piecewise affine functions of the difference-of-max type and illustrate resulting algorithmic possibilities in the context of approximate solution of infinite-dimensional optimization problems. In an effort to quantify the ease with which classes of usc functions can be approximated by finite collections, we provide upper and lower bounds on covering numbers for bounded sets of usc functions under the Attouch-Wets distance. The result is applied in the context of stochastic optimization problems defined over spaces of usc functions. We establish confidence regions for optimal solutions based on sample average approximations and examine the accompanying rates of convergence. Examples from nonparametric statistics illustrate the results.
Notes
We stress that \(\nu \) is an index and not the power of q.
Recall that “open” here is according to the metric space \((S,\Vert \cdot -\cdot \Vert _\infty ).\)
For the significance of entropy integrals we refer to [44].
This reference states results only for finite dimensions, but since \((F,{\mathbb {d}})\) is a complete separable metric space, with compact balls, the proofs of the required results carry over nearly verbatim.
On \((F,{\mathbb {d}})\), we adopt the Borel sigma-algebra.
For measurable \(h:\Xi \rightarrow {\overline{{\mathbb {R}}}}\), \(\int h(\xi )d{\mathbb {P}}(\xi ) = \int \max \{0,h(\xi )\}d{\mathbb {P}}(\xi ) - \int \max \{0, -h(\xi )\}d{\mathbb {P}}(\xi )\), with \(\infty -\infty = \infty \).
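The convention in this footnote can be made concrete for a discrete measure. The sketch below is a hypothetical helper, not from the paper; a finite list of atoms stands in for a general probability measure, and the positive and negative parts are integrated separately exactly as in the displayed identity, with \(\infty - \infty = \infty \).

```python
import math

def extended_integral(h_values, probs):
    """Integral of an extended-real-valued h against a discrete measure P,
    following the footnote's convention: integrate the positive part and the
    negative part separately, and resolve inf - inf as inf.
    (Hypothetical helper; a discrete P stands in for a general measure.)"""
    pos = sum(p * max(0.0, h) for h, p in zip(h_values, probs) if p > 0)
    neg = sum(p * max(0.0, -h) for h, p in zip(h_values, probs) if p > 0)
    if math.isinf(pos):   # the convention inf - inf = inf applies here
        return math.inf
    if math.isinf(neg):
        return -math.inf
    return pos - neg
```

For example, a function taking the values \(+\infty \) and \(-\infty \) each with probability 1/2 integrates to \(+\infty \) under this convention, whereas naive floating-point subtraction would produce NaN.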
A random variable Y is sub-exponential if for some \(\lambda \ge 0\), \({\mathbb {E}}[\exp (\tau (Y-{\mathbb {E}}Y))] \le \exp (\tau ^2\lambda ^2/2)\) for all \(|\tau |\le 1/\lambda \). Another assumption that ensures a Bernstein-type large-deviation result could have been substituted here.
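The definition can be checked on a standard example. The sketch below uses \(Y \sim \text{Exp}(1)\), whose centered moment generating function is known in closed form, \({\mathbb {E}}[\exp (\tau (Y - {\mathbb {E}}Y))] = e^{-\tau }/(1-\tau )\) for \(\tau < 1\); the parameter value \(\lambda = 2\) is an assumed (textbook-standard) choice, not taken from the paper.

```python
import math

# Y ~ Exp(1) is a standard sub-exponential random variable. Its centered MGF
# is E[exp(t(Y - EY))] = exp(-t)/(1 - t) for t < 1, and lambda = 2 (an assumed
# standard parameter) makes the bound exp(t^2 lambda^2/2) hold for |t| <= 1/lambda.
def centered_exp_mgf(t):
    return math.exp(-t) / (1.0 - t)

lam = 2.0
worst_gap = max(
    centered_exp_mgf(t) - math.exp(t * t * lam * lam / 2.0)
    for t in (k / (200.0 * lam) for k in range(-200, 201))  # sweep |t| <= 1/lam
)
# worst_gap <= 0 confirms the sub-exponential bound along this grid
```

The gap is maximized at \(\tau = 0\), where both sides equal one, confirming that the bound holds with equality only at the origin on this grid.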
References
Balabdaoui, F., Wellner, J.A.: Estimation of a k-monotone density: characterizations, consistency and minimax lower bounds. Stat. Neerl. 64(1), 45–70 (2010)
Bampou, D., Kuhn, D.: Polynomial approximations for continuous linear programs. SIAM J. Optim. 22, 628–648 (2012)
Bartlett, P.L., Kulkarni, S.R., Posner, S.E.: Covering numbers for real-valued function classes. IEEE Trans. Inf. Theory 43(5), 1721–1724 (1997)
Bayraksan, G., Morton, D.P.: Assessing solution quality in stochastic programs. Math. Program. 108, 495–514 (2006)
Birman, M.S., Solomjak, M.Z.: Piecewise-polynomial approximation of functions of the classes \(W_p^\alpha \). Math. USSR Sbornik 73, 295–317 (1967)
Bronshtein, E.M.: \(\epsilon \)-Entropy of convex sets and functions. Sib. Math. J. 17(3), 393–398 (1976)
Brudnyi, A.: On covering numbers of sublevel sets of analytic functions. J. Approx. Theory 162(1), 72–93 (2010)
Cui, Y., Pang, J.-S., Sen, B.: Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4), 3344–3374 (2018)
Cule, M., Samworth, R.J., Stewart, M.: Maximum likelihood estimation of a multi-dimensional log-concave density. J. R. Stat. Soc. Ser. B 72, 545–600 (2010)
Devolder, O., Glineur, F., Nesterov, Y.: Solving infinite-dimensional optimization problems by polynomial approximation. In: Diehl, M., Glineur, F., Jarlebring, E., Michiels, W. (eds.) Recent Advances in Optimization and its Applications in Engineering, pp. 31–40. Springer, Berlin (2010)
Dudley, R.M.: Metric entropy of some classes of sets with differentiable boundaries. J. Approx. Theory 10(3), 227–236 (1974)
Georghiou, A., Wiesemann, W., Kuhn, D.: Generalized decision rule approximations for stochastic programming via liftings. Math. Program. 152(1–2), 301–338 (2015)
Groeneboom, P., Jongbloed, G., Wellner, J.A.: Estimation of a convex function: characterizations and asymptotic theory. Ann. Stat. 29, 1653–1698 (2001)
Guntuboyina, A., Sen, B.: Covering numbers for convex functions. IEEE Trans. Inf. Theory 59(4), 1957–1965 (2013)
Guntuboyina, A., Sen, B.: Global risk bounds and adaptation in univariate convex regression. Probab. Theory Relat. Fields 163, 379–411 (2015)
Guo, Y., Bartlett, P.L., Shawe-Taylor, J., Williamson, R.C.: Covering numbers for support vector machines. IEEE Trans. Inf. Theory 48(1), 239–250 (2002)
Hanasusanto, G.A., Wiesemann, W., Kuhn, D.: K-adaptability in two-stage robust binary programming. Oper. Res. 63(4), 877–891 (2015)
Hartman, P.: On functions representable as a difference of convex functions. Pac. J. Math. 9, 707–713 (1959)
Higle, J.L., Sen, S.: Statistical verification of optimality conditions for stochastic programs with recourse. Ann. Oper. Res. 30, 215–240 (1991)
Higle, J.L., Sen, S.: Duality and statistical tests of optimality for two stage stochastic programs. Math. Program. 75, 257–275 (1996)
Horst, R., Thoai, N.V.: DC programming: overview. J. Optim. Theory Appl. 103(1), 1–43 (1999)
Kim, A.K.H., Samworth, R.J.: Global rates of convergence in log-concave density estimation. Ann. Stat. 44, 2756–2779 (2016)
Kolmogorov, A.N., Tikhomirov, V.M.: Epsilon-entropy and epsilon-capacity of sets in functional spaces. Am. Math. Soc. Transl. Ser. 2(17), 277–364 (1961)
Kühn, T.: Covering numbers of Gaussian reproducing kernel Hilbert spaces. J. Complex. 27(5), 489–499 (2011)
Lamm, M., Lu, S.: Generalized conditioning based approaches to computing confidence intervals for solutions to stochastic variational inequalities. Math. Program. B 174, 99–127 (2018)
Lu, S., Liu, Y., Yin, L., Zhang, K.: Confidence intervals and regions for the lasso by using stochastic variational inequality techniques in optimization. J. R. Stat. Soc. Ser. B 79, 589–611 (2017)
Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24, 47–56 (1999)
Miller, M.: Binary classification using piecewise affine functions. Master’s thesis, Naval Postgraduate School, Monterey, CA, June (2019)
Norkin, V.I., Pflug, G.C., Ruszczynski, A.: A branch and bound method for stochastic global optimization. Math. Program. 83, 425–450 (1998)
Pontil, M.: A note on different covering numbers in learning theory. J. Complex. 19(5), 665–671 (2003)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Grundlehren der mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998). (3rd printing-2009 edition)
Royset, J.O.: Optimality functions in stochastic programming. Math. Program. 135(1), 293–321 (2012)
Royset, J.O.: Approximations and solution estimates in optimization. Math. Program. 170(2), 479–506 (2018)
Royset, J.O., Wets, R.J.-B.: From data to assessments and decisions: epi-spline technology. In: Newman, A. (ed.) INFORMS Tutorials. INFORMS, Catonsville (2014)
Royset, J.O., Wets, R.J.-B.: Multivariate epi-splines and evolving function identification problems. Set-Valued and Variational Analysis 24(4), 517–545 (2016). (Erratum: pp. 547–549)
Royset, J.O., Wets, R.J.-B.: Variational theory for optimization under stochastic ambiguity. SIAM J. Optim. 27(2), 1118–1149 (2017)
Royset, J.O., Wets, R.J.-B.: On univariate function identification problems. Math. Program. B 168(1–2), 449–474 (2018)
Royset, J.O., Wets, R.J.-B.: Variational analysis of constrained M-estimators. ArXiv e-prints (2018)
Salinetti, G., Wets, R.J.-B.: On the convergence in distribution of measurable multifunctions (random sets), normal integrands, stochastic processes and stochastic infima. Math. Oper. Res. 11(3), 385–419 (1986)
Salinetti, G., Wets, R.J.-B.: On the hypo-convergence of probability measures. In: Conti, R., De Giorgi, E., Gianessi, F. (eds.) Optimization and Related Fields, Proceedings, Erice 1984, Lecture Notes in Mathematics, vol. 1190, pp. 371–395. Springer, Berlin (1986)
Seijo, E., Sen, B.: Nonparametric least squares estimation of a multivariate convex regression. Ann. Stat. 39, 1633–1657 (2011)
Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming: Modeling and Theory, 2nd edn. SIAM, Philadelphia (2014)
Shapiro, A., Homem-de-Mello, T.: A simulation-based approach to two-stage stochastic programming with recourse. Math. Program. 81, 301–325 (1998)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, Berlin (1996). (2nd printing 2000 edition)
van de Geer, S.: Empirical Processes in M-Estimation. Cambridge University Press, Cambridge (2000)
Wang, J., Huang, H., Luo, Z., Chen, B.: Estimation of covering number in learning theory. In: Proceedings of the Fifth International Conference on Semantics, Knowledge and Grid 2009, pp. 388–391 (2009)
Zhang, Z., Yang, X., Oseledets, I.V., Karniadakis, G.E., Daniel, L.: Enabling high-dimensional hierarchical uncertainty quantification by anova and tensor-train decomposition. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 34(1), 63–76 (2015)
Zhou, D.-X.: The covering number in learning theory. J. Complex. 18(3), 739–767 (2002)
Acknowledgements
This work is supported in part by DARPA under Grants HR0011-14-1-0060 and HR0011-8-34187, and by the Office of Naval Research (Science of Autonomy Program) under Grant N00014-17-1-2372.
Appendix
Proof of Theorem 4.4
Let \(\rho >0\) and \(F = \{f\in {\text {usc-fcns}}({\mathbb {R}}^n)~|~f(x)\ge -\rho \text{ for at least one } x\in [0,\rho ]^n\}\). We show that F cannot be covered by fewer balls than stipulated. Clearly, \(\mathop {\mathrm{dist}}\nolimits _\infty (0,\mathrm{hypo} \;f) \le \rho \) for all \(f\in F\). Thus, in view of (3), \({\mathbb {d}}(0,f) \le \rho + 1\) for all \(f\in F\), where 0 is the zero function on \({\mathbb {R}}^n\), and F is therefore bounded.
Next, let \(\varepsilon \in (0,\rho e^{-\rho }/6]\). We discretize \([0,\rho ]^n\) by defining \(x_i^k = k \rho /\nu _\varepsilon \), \(k = 1, \ldots , \nu _\varepsilon -1\) and \(i=1, \ldots , n\), where \(\nu _\varepsilon = \lfloor \rho e^{-\rho }/(3\varepsilon ) \rfloor \), with \(\lfloor a \rfloor \) being the largest integer not exceeding a. The discretization of \([0,\rho ]^n\) then contains the points \((x_1^{k_1}, x_2^{k_2}, \ldots , x_n^{k_n})\), with \(k_i \in \{1, 2, \ldots , \nu _\varepsilon -1\}\) and \(i=1, \ldots , n\). Clearly, the distance between any two such points in the sup-norm is at least \(\rho /\nu _\varepsilon \ge 3\varepsilon e^\rho \). We carry out a similar discretization of \([-\rho ,0]\) and define \(y^l = -l \rho / \nu _\varepsilon \), \(l=1, \ldots , \nu _\varepsilon \). Let \(F_{\varepsilon }\) be the collection of functions that are finite on the discretization points of \([0,\rho ]^n\), with value at each such point equal to \(y^l\) for some l, and with value minus infinity elsewhere.
Certainly, \(F_{\varepsilon } \subset F\). Next, for each \(f\in F_{\varepsilon }\), let \(G_\varepsilon (f)\) be the set of all \(g\in {\text {usc-fcns}}({\mathbb {R}}^n)\) with \(\hat{\mathbb {d}}_\rho (f,g) \le \varepsilon e^\rho \).
We establish that \(G_\varepsilon (f) \cap G_\varepsilon (f') = \emptyset \) for \(f,f'\in F_{\varepsilon }\), \(f\ne f'\). Suppose for the sake of contradiction that there is a g with \(g\in G_\varepsilon (f)\) and \(g\in G_\varepsilon (f')\) for some \(f,f'\in F_\varepsilon \), \(f\ne f'\). Then, \(\hat{\mathbb {d}}_\rho (f,g) \le \varepsilon e^\rho \) and \(\hat{\mathbb {d}}_\rho (f',g) \le \varepsilon e^\rho \). However, since \(f\ne f'\), there exists a point \(x\in [0,\rho ]^n\) with \(|f(x) - f'(x)| \ge 3 \varepsilon e^\rho \). Without loss of generality, suppose that \(f(x) \ge f'(x) + 3\varepsilon e^\rho \). Since \(f(z) = f'(z) = -\infty \) for all \(z\ne x\) with \(\Vert z-x\Vert _\infty < 3\varepsilon e^\rho \), the inequality \(\hat{\mathbb {d}}_\rho (f,g) \le \varepsilon e^\rho \) implies that \(g(z) \ge f(x) - \varepsilon e^\rho \) for some \(z\in {\mathbb {B}}(x,\varepsilon e^\rho )\). Moreover, \(\hat{\mathbb {d}}_\rho (f',g) \le \varepsilon e^\rho \) implies that \(g(z) \le f'(x) + \varepsilon e^\rho \le f(x) - 3\varepsilon e^\rho + \varepsilon e^\rho = f(x) - 2\varepsilon e^\rho \) for all \(z\in {\mathbb {B}}(x,\varepsilon e^\rho )\). These two requirements on g are incompatible, and we have reached a contradiction. Thus, \(G_\varepsilon (f) \cap G_\varepsilon (f') = \emptyset \) for \(f,f'\in F_{\varepsilon }\), \(f\ne f'\).
By Lemma 4.1, for any \(f,g\in {\text {usc-fcns}}({\mathbb {R}}^n)\), \({\mathbb {d}}(f,g) \le \varepsilon \) implies \(\hat{\mathbb {d}}_\rho (f,g) \le \varepsilon e^\rho \). Hence, for \(f\in F_{\varepsilon }\), a \({\mathbb {d}}\)-ball with radius \(\varepsilon \) that contains f must be centered at some \(g\in G_\varepsilon (f)\). Since the sets \(G_\varepsilon (f)\), \(f\in F_{\varepsilon }\), are nonoverlapping, a cover of \(F_{\varepsilon }\) by \({\mathbb {d}}\)-balls with radius \(\varepsilon \) must involve at least as many balls as there are functions in \(F_{\varepsilon }\), namely \(\nu _\varepsilon ^{m_\varepsilon }\), where \(m_\varepsilon = (\nu _\varepsilon -1)^n\). Thus,
Let \(c_1 = |\log (\rho e^{-\rho }/4)|\) and \({{\bar{\varepsilon }}} = \min \{\rho e^{-\rho }/12, e^{-2c_1}\}\). Continuing from (11), we then find that
Since \(\log \varepsilon ^{-1} \ge 2|\log (\rho e^{-\rho }/4)|\) for \(\varepsilon \in (0, {{\bar{\varepsilon }}}]\), we have that
and the conclusion is reached. \(\square \)
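The count underlying the lower bound can be reproduced numerically. The sketch below assumes \(\nu _\varepsilon = \lfloor \rho e^{-\rho }/(3\varepsilon ) \rfloor \), one choice consistent with the stated requirement \(\rho /\nu _\varepsilon \ge 3\varepsilon e^\rho \); it returns \(\nu _\varepsilon \) and the logarithm of the number of grid functions \(\nu _\varepsilon ^{m_\varepsilon }\), \(m_\varepsilon = (\nu _\varepsilon -1)^n\).

```python
import math

def packing_lower_bound(rho, eps, n):
    """Count the functions in F_eps from the proof of Theorem 4.4.
    Assumes nu_eps = floor(rho * exp(-rho) / (3 * eps)), a choice consistent
    with the proof's requirement rho / nu_eps >= 3 * eps * e^rho.
    Returns (nu_eps, log of nu_eps ** ((nu_eps - 1) ** n))."""
    assert 0 < eps <= rho * math.exp(-rho) / 6.0   # range of eps in the proof
    nu = math.floor(rho * math.exp(-rho) / (3.0 * eps))
    assert rho / nu >= 3.0 * eps * math.exp(rho)   # grid-spacing requirement
    m = (nu - 1) ** n                              # grid points in [0, rho]^n
    return nu, m * math.log(nu)                    # log of nu_eps^{m_eps}

nu, log_count = packing_lower_bound(rho=1.0, eps=0.005, n=2)
```

Even for these modest values (\(\rho = 1\), \(\varepsilon = 0.005\), \(n = 2\)), the covering-number lower bound exceeds \(e^{1500}\), illustrating the rapid growth in \(\varepsilon ^{-n}\).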
Cite this article
Royset, J.O. Approximations of semicontinuous functions with applications to stochastic optimization and statistical estimation. Math. Program. 184, 289–318 (2020). https://doi.org/10.1007/s10107-019-01413-z
Keywords
- Hypo-convergence
- Attouch-Wets distance
- Approximation theory
- Solution stability
- Stochastic optimization
- Epi-splines
- Rate of convergence