Abstract
Upper semicontinuous (usc) functions arise in the analysis of maximization problems, distributionally robust optimization, and function identification, which includes many problems of nonparametric statistics. We establish that every usc function is the limit of a hypo-converging sequence of piecewise affine functions of the difference-of-max type and illustrate resulting algorithmic possibilities in the context of approximate solution of infinite-dimensional optimization problems. In an effort to quantify the ease with which classes of usc functions can be approximated by finite collections, we provide upper and lower bounds on covering numbers for bounded sets of usc functions under the Attouch-Wets distance. The result is applied in the context of stochastic optimization problems defined over spaces of usc functions. We establish confidence regions for optimal solutions based on sample average approximations and examine the accompanying rates of convergence. Examples from nonparametric statistics illustrate the results.
Notes
We stress that \(\nu \) is an index and not the power of q.
Recall that “open” here is according to the metric space \((S,\Vert \cdot -\cdot \Vert _\infty ).\)
For the significance of entropy integrals we refer to [44].
This reference states results only for finite dimensions, but since \((F,{\mathbb {d}})\) is a complete separable metric space, with compact balls, the proofs of the required results carry over nearly verbatim.
On \((F,{\mathbb {d}})\), we adopt the Borel sigma-algebra.
For measurable \(h:\Xi \rightarrow {\overline{{\mathbb {R}}}}\), \(\int h(\xi )d{\mathbb {P}}(\xi ) = \int \max \{0,h(\xi )\}d{\mathbb {P}}(\xi ) - \int \max \{0, -h(\xi )\}d{\mathbb {P}}(\xi )\), with \(\infty -\infty = \infty \).
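The convention in this footnote can be made concrete for a discrete measure. The sketch below is a hypothetical helper, not from the paper; a finite list of atoms stands in for a general probability measure, and the positive and negative parts are integrated separately exactly as in the displayed identity, with \(\infty - \infty = \infty \).

```python
import math

def extended_integral(h_values, probs):
    """Integral of an extended-real-valued h against a discrete measure P,
    following the footnote's convention: integrate the positive part and the
    negative part separately, and resolve inf - inf as inf.
    (Hypothetical helper; a discrete P stands in for a general measure.)"""
    pos = sum(p * max(0.0, h) for h, p in zip(h_values, probs) if p > 0)
    neg = sum(p * max(0.0, -h) for h, p in zip(h_values, probs) if p > 0)
    if math.isinf(pos):   # the convention inf - inf = inf applies here
        return math.inf
    if math.isinf(neg):
        return -math.inf
    return pos - neg
```

For example, a function taking the values \(+\infty \) and \(-\infty \) each with probability 1/2 integrates to \(+\infty \) under this convention, whereas naive floating-point subtraction would produce NaN.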
A random variable Y is sub-exponential if for some \(\lambda \ge 0\), \({\mathbb {E}}[\exp (\tau (Y-{\mathbb {E}}Y))] \le \exp (\tau ^2\lambda ^2/2)\) for all \(|\tau |\le 1/\lambda \). Another assumption that ensures a Bernstein-type large-deviation result could have been substituted here.
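The definition can be checked on a standard example. The sketch below uses \(Y \sim \text{Exp}(1)\), whose centered moment generating function is known in closed form, \({\mathbb {E}}[\exp (\tau (Y - {\mathbb {E}}Y))] = e^{-\tau }/(1-\tau )\) for \(\tau < 1\); the parameter value \(\lambda = 2\) is an assumed (textbook-standard) choice, not taken from the paper.

```python
import math

# Y ~ Exp(1) is a standard sub-exponential random variable. Its centered MGF
# is E[exp(t(Y - EY))] = exp(-t)/(1 - t) for t < 1, and lambda = 2 (an assumed
# standard parameter) makes the bound exp(t^2 lambda^2/2) hold for |t| <= 1/lambda.
def centered_exp_mgf(t):
    return math.exp(-t) / (1.0 - t)

lam = 2.0
worst_gap = max(
    centered_exp_mgf(t) - math.exp(t * t * lam * lam / 2.0)
    for t in (k / (200.0 * lam) for k in range(-200, 201))  # sweep |t| <= 1/lam
)
# worst_gap <= 0 confirms the sub-exponential bound along this grid
```

The gap is maximized at \(\tau = 0\), where both sides equal one, confirming that the bound holds with equality only at the origin on this grid.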
References
Balabdaoui, F., Wellner, J.A.: Estimation of a k-monotone density: characterizations, consistency and minimax lower bounds. Stat. Neerl. 64(1), 45–70 (2010)
Bampou, D., Kuhn, D.: Polynomial approximations for continuous linear programs. SIAM J. Optim. 22, 628–648 (2012)
Bartlett, P.L., Kulkarni, S.R., Posner, S.E.: Covering numbers for real-valued function classes. IEEE Trans. Inf. Theory 43(5), 1721–1724 (1997)
Bayraksan, G., Morton, D.P.: Assessing solution quality in stochastic programs. Math. Program. 108, 495–514 (2006)
Birman, M.S., Solomjak, M.Z.: Piecewise-polynomial approximation of functions of the classes \(W_p^\alpha \). Math. USSR Sbornik 73, 295–317 (1967)
Bronshtein, E.M.: \(\epsilon \)-Entropy of convex sets and functions. Sib. Math. J. 17(3), 393–398 (1976)
Brudnyi, A.: On covering numbers of sublevel sets of analytic functions. J. Approx. Theory 162(1), 72–93 (2010)
Cui, Y., Pang, J.-S., Sen, B.: Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4), 3344–3374 (2018)
Cule, M., Samworth, R.J., Stewart, M.: Maximum likelihood estimation of a multi-dimensional log-concave density. J. R. Stat. Soc. Ser. B 72, 545–600 (2010)
Devolder, O., Glineur, F., Nesterov, Y.: Solving infinite-dimensional optimization problems by polynomial approximation. In: Diehl, M., Glineur, F., Jarlebring, E., Michiels, W. (eds.) Recent Advances in Optimization and its Applications in Engineering, pp. 31–40. Springer, Berlin (2010)
Dudley, R.M.: Metric entropy of some classes of sets with differentiable boundaries. J. Approx. Theory 10(3), 227–236 (1974)
Georghiou, A., Wiesemann, W., Kuhn, D.: Generalized decision rule approximations for stochastic programming via liftings. Math. Program. 152(1–2), 301–338 (2015)
Groeneboom, P., Jongbloed, G., Wellner, J.A.: Estimation of a convex function: characterizations and asymptotic theory. Ann. Stat. 29, 1653–1698 (2001)
Guntuboyina, A., Sen, B.: Covering numbers for convex functions. IEEE Trans. Inf. Theory 59(4), 1957–1965 (2013)
Guntuboyina, A., Sen, B.: Global risk bounds and adaptation in univariate convex regression. Probab. Theory Relat. Fields 163, 379–411 (2015)
Guo, Y., Bartlett, P.L., Shawe-Taylor, J., Williamson, R.C.: Covering numbers for support vector machines. IEEE Trans. Inf. Theory 48(1), 239–250 (2002)
Hanasusanto, G.A., Wiesemann, W., Kuhn, D.: K-adaptability in two-stage robust binary programming. Oper. Res. 63(4), 877–891 (2015)
Hartman, P.: On functions representable as a difference of convex functions. Pac. J. Math. 9, 707–713 (1959)
Higle, J.L., Sen, S.: Statistical verification of optimality conditions for stochastic programs with recourse. Ann. Oper. Res. 30, 215–240 (1991)
Higle, J.L., Sen, S.: Duality and statistical tests of optimality for two stage stochastic programs. Math. Program. 75, 257–275 (1996)
Horst, R., Thoai, N.V.: DC programming: overview. J. Optim. Theory Appl. 103(1), 1–43 (1999)
Kim, A.K.H., Samworth, R.J.: Global rates of convergence in log-concave density estimation. Ann. Stat. 44, 2756–2779 (2016)
Kolmogorov, A.N., Tikhomirov, V.M.: Epsilon-entropy and epsilon-capacity of sets in functional spaces. Am. Math. Soc. Transl. Ser. 2(17), 277–364 (1961)
Kühn, T.: Covering numbers of Gaussian reproducing kernel Hilbert spaces. J. Complex. 27(5), 489–499 (2011)
Lamm, M., Lu, S.: Generalized conditioning based approaches to computing confidence intervals for solutions to stochastic variational inequalities. Math. Program. B 174, 99–127 (2018)
Lu, S., Liu, Y., Yin, L., Zhang, K.: Confidence intervals and regions for the lasso by using stochastic variational inequality techniques in optimization. J. R. Stat. Soc. Ser. B 79, 589–611 (2017)
Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24, 47–56 (1999)
Miller, M.: Binary classification using piecewise affine functions. Master’s thesis, Naval Postgraduate School, Monterey, CA, June (2019)
Norkin, V.I., Pflug, G.C., Ruszczynski, A.: A branch and bound method for stochastic global optimization. Math. Program. 83, 425–450 (1998)
Pontil, M.: A note on different covering numbers in learning theory. J. Complex. 19(5), 665–671 (2003)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Grundlehren der mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998). (3rd printing-2009 edition)
Royset, J.O.: Optimality functions in stochastic programming. Math. Program. 135(1), 293–321 (2012)
Royset, J.O.: Approximations and solution estimates in optimization. Math. Program. 170(2), 479–506 (2018)
Royset, J.O., Wets, R.J.-B.: From data to assessments and decisions: epi-spline technology. In: Newman, A. (ed.) INFORMS Tutorials. INFORMS, Catonsville (2014)
Royset, J.O., Wets, R.J.-B.: Multivariate epi-splines and evolving function identification problems. Set-Valued and Variational Analysis 24(4), 517–545 (2016). (Erratum: pp. 547–549)
Royset, J.O., Wets, R.J.-B.: Variational theory for optimization under stochastic ambiguity. SIAM J. Optim. 27(2), 1118–1149 (2017)
Royset, J.O., Wets, R.J.-B.: On univariate function identification problems. Math. Program. B 168(1–2), 449–474 (2018)
Royset, J.O., Wets, R.J.-B.: Variational analysis of constrained M-estimators. ArXiv e-prints (2018)
Salinetti, G., Wets, R.J.-B.: On the convergence in distribution of measurable multifunctions (random sets), normal integrands, stochastic processes and stochastic infima. Math. Oper. Res. 11(3), 385–419 (1986)
Salinetti, G., Wets, R.J.-B.: On the hypo-convergence of probability measures. In: Conti, R., De Giorgi, E., Gianessi, F. (eds.) Optimization and Related Fields, Proceedings, Erice 1984, Lecture Notes in Mathematics, vol. 1190, pp. 371–395. Springer, Berlin (1986)
Seijo, E., Sen, B.: Nonparametric least squares estimation of a multivariate convex regression. Ann. Stat. 39, 1633–1657 (2011)
Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming: Modeling and Theory, 2nd edn. SIAM, Philadelphia (2014)
Shapiro, A., Homem-de-Mello, T.: A simulation-based approach to two-stage stochastic programming with recourse. Math. Program. 81, 301–325 (1998)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, Berlin (1996). (2nd printing 2000 edition)
van de Geer, S.: Empirical Processes in M-Estimation. Cambridge University Press, Cambridge (2000)
Wang, J., Huang, H., Luo, Z., Chen, B.: Estimation of covering number in learning theory. In: Proceedings of the Fifth International Conference on Semantics, Knowledge and Grid 2009, pp. 388–391 (2009)
Zhang, Z., Yang, X., Oseledets, I.V., Karniadakis, G.E., Daniel, L.: Enabling high-dimensional hierarchical uncertainty quantification by anova and tensor-train decomposition. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 34(1), 63–76 (2015)
Zhou, D.-X.: The covering number in learning theory. J. Complex. 18(3), 739–767 (2002)
Acknowledgements
This work is supported in part by DARPA under Grants HR0011-14-1-0060 and HR0011-8-34187, and by the Office of Naval Research (Science of Autonomy Program) under Grant N00014-17-1-2372.
Appendix
Proof of Theorem 4.4
Let \(\rho >0\) and \(F = \{f\in {\text {usc-fcns}}({\mathbb {R}}^n)~|~f(x)\ge -\rho \text{ for at least one } x\in [0,\rho ]^n\}\). We show that F cannot be covered by fewer balls than stipulated. Clearly, \(\mathop {\mathrm{dist}}\nolimits _\infty (0,\mathrm{hypo} \;f) \le \rho \) for all \(f\in F\). Thus, in view of (3), \({\mathbb {d}}(0,f) \le \rho + 1\) for all \(f\in F\), where 0 is the zero function on \({\mathbb {R}}^n\), and F is therefore bounded.
Next, let \(\varepsilon \in (0,\rho e^{-\rho }/6]\). We discretize \([0,\rho ]^n\) by defining \(x_i^k = k \rho /\nu _\varepsilon \), \(k = 1, \ldots , \nu _\varepsilon -1\) and \(i=1, \ldots , n\), where \(\nu _\varepsilon = \lfloor \rho e^{-\rho }/(3\varepsilon ) \rfloor \), with \(\lfloor a \rfloor \) being the largest integer not exceeding a. The discretization of \([0,\rho ]^n\) then contains the points \((x_1^{k_1}, x_2^{k_2}, \ldots , x_n^{k_n})\), with \(k_i \in \{1, 2, \ldots , \nu _\varepsilon -1\}\) and \(i=1, \ldots , n\). Clearly, the distance between any two such points in the sup-norm is at least \(\rho /\nu _\varepsilon \ge 3\varepsilon e^\rho \). We carry out a similar discretization of \([-\rho ,0]\) and define \(y^l = -l \rho / \nu _\varepsilon \), \(l=1, \ldots , \nu _\varepsilon \). Let \(F_{\varepsilon }\) be the collection of functions that are finite on the discretization points of \([0,\rho ]^n\), with value at each such point equal to \(y^l\) for some l, and with value minus infinity elsewhere.
Certainly, \(F_{\varepsilon } \subset F\). Next, for each \(f\in F_{\varepsilon }\), let \(G_\varepsilon (f)\) be the set of all \(g\in {\text {usc-fcns}}({\mathbb {R}}^n)\) with \(\hat{\mathbb {d}}_\rho (f,g) \le \varepsilon e^\rho \).
We establish that \(G_\varepsilon (f) \cap G_\varepsilon (f') = \emptyset \) for \(f,f'\in F_{\varepsilon }\), \(f\ne f'\). Suppose for the sake of contradiction that there is a g with \(g\in G_\varepsilon (f)\) and \(g\in G_\varepsilon (f')\) for some \(f,f'\in F_\varepsilon \), \(f\ne f'\). Then, \(\hat{\mathbb {d}}_\rho (f,g) \le \varepsilon e^\rho \) and \(\hat{\mathbb {d}}_\rho (f',g) \le \varepsilon e^\rho \). However, since \(f\ne f'\), there exists a point \(x\in [0,\rho ]^n\) with \(|f(x) - f'(x)| \ge 3 \varepsilon e^\rho \). Without loss of generality, suppose that \(f(x) \ge f'(x) + 3\varepsilon e^\rho \). Since \(f(z) = f'(z) = -\infty \) for all \(z\ne x\) with \(\Vert z-x\Vert _\infty < 3\varepsilon e^\rho \), the inequality \(\hat{\mathbb {d}}_\rho (f,g) \le \varepsilon e^\rho \) implies that \(g(z) \ge f(x) - \varepsilon e^\rho \) for some \(z\in {\mathbb {B}}(x,\varepsilon e^\rho )\). Moreover, \(\hat{\mathbb {d}}_\rho (f',g) \le \varepsilon e^\rho \) implies that \(g(z) \le f'(x) + \varepsilon e^\rho \le f(x) - 3\varepsilon e^\rho + \varepsilon e^\rho = f(x) - 2\varepsilon e^\rho \) for all \(z\in {\mathbb {B}}(x,\varepsilon e^\rho )\). These two requirements on g are incompatible, and we have reached a contradiction. Thus, \(G_\varepsilon (f) \cap G_\varepsilon (f') = \emptyset \) for \(f,f'\in F_{\varepsilon }\), \(f\ne f'\).
By Lemma 4.1, for any \(f,g\in {\text {usc-fcns}}({\mathbb {R}}^n)\), \({\mathbb {d}}(f,g) \le \varepsilon \) implies \(\hat{\mathbb {d}}_\rho (f,g) \le \varepsilon e^\rho \). Hence, for \(f\in F_{\varepsilon }\), a \({\mathbb {d}}\)-ball with radius \(\varepsilon \) that contains f must be centered at some \(g\in G_\varepsilon (f)\). Since the sets \(G_\varepsilon (f)\), \(f\in F_{\varepsilon }\), are nonoverlapping, a cover of \(F_{\varepsilon }\) by \({\mathbb {d}}\)-balls with radius \(\varepsilon \) must involve at least as many balls as there are functions in \(F_{\varepsilon }\), namely \(\nu _\varepsilon ^{m_\varepsilon }\), where \(m_\varepsilon = (\nu _\varepsilon -1)^n\). Thus,
Let \(c_1 = |\log (\rho e^{-\rho }/4)|\) and \({{\bar{\varepsilon }}} = \min \{\rho e^{-\rho }/12, e^{-2c_1}\}\). Continuing from (11), we then find that
Since \(\log \varepsilon ^{-1} \ge 2|\log (\rho e^{-\rho }/4)|\) for \(\varepsilon \in (0, {{\bar{\varepsilon }}}]\), we have that
and the conclusion is reached. \(\square \)
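The count underlying the lower bound can be reproduced numerically. The sketch below assumes \(\nu _\varepsilon = \lfloor \rho e^{-\rho }/(3\varepsilon ) \rfloor \), one choice consistent with the stated requirement \(\rho /\nu _\varepsilon \ge 3\varepsilon e^\rho \); it returns \(\nu _\varepsilon \) and the logarithm of the number of grid functions \(\nu _\varepsilon ^{m_\varepsilon }\), \(m_\varepsilon = (\nu _\varepsilon -1)^n\).

```python
import math

def packing_lower_bound(rho, eps, n):
    """Count the functions in F_eps from the proof of Theorem 4.4.
    Assumes nu_eps = floor(rho * exp(-rho) / (3 * eps)), a choice consistent
    with the proof's requirement rho / nu_eps >= 3 * eps * e^rho.
    Returns (nu_eps, log of nu_eps ** ((nu_eps - 1) ** n))."""
    assert 0 < eps <= rho * math.exp(-rho) / 6.0   # range of eps in the proof
    nu = math.floor(rho * math.exp(-rho) / (3.0 * eps))
    assert rho / nu >= 3.0 * eps * math.exp(rho)   # grid-spacing requirement
    m = (nu - 1) ** n                              # grid points in [0, rho]^n
    return nu, m * math.log(nu)                    # log of nu_eps^{m_eps}

nu, log_count = packing_lower_bound(rho=1.0, eps=0.005, n=2)
```

Even for these modest values (\(\rho = 1\), \(\varepsilon = 0.005\), \(n = 2\)), the covering-number lower bound exceeds \(e^{1500}\), illustrating the rapid growth in \(\varepsilon ^{-n}\).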
Cite this article
Royset, J.O. Approximations of semicontinuous functions with applications to stochastic optimization and statistical estimation. Math. Program. 184, 289–318 (2020). https://doi.org/10.1007/s10107-019-01413-z
Keywords
- Hypo-convergence
- Attouch-Wets distance
- Approximation theory
- Solution stability
- Stochastic optimization
- Epi-splines
- Rate of convergence