Radial duality part II: applications and algorithms

Grimmer, Benjamin

doi:10.1007/s10107-023-01974-0

Full Length Paper
Series A
Published: 26 May 2023

Volume 205, pages 69–105, (2024)
Mathematical Programming

Benjamin Grimmer ORCID: orcid.org/0000-0002-7003-8448¹

Abstract

The first part of this work established the foundations of a radial duality between nonnegative optimization problems, inspired by the work of Renegar (SIAM J Optim 26(4): 2649-2676, 2016). Here we utilize our radial duality theory to design and analyze projection-free optimization algorithms that operate by solving a radially dual problem. In particular, we consider radial subgradient, smoothing, and accelerated methods that are capable of solving a range of constrained convex and nonconvex optimization problems and that can scale up more efficiently than their classic counterparts. These algorithms enjoy the same benefits as their predecessors, avoiding Lipschitz continuity assumptions and costly orthogonal projections, in our newfound, broader context. Our radial duality further allows us to understand the effects and benefits of smoothness and growth conditions on the radial dual and consequently on our radial algorithms.

Notes

Instead of using a re-parameterization, one can explicitly include equality constraints in our model. The details of this approach are given in Sect. 2.2.1, where we see that equality constraints are unaffected by the radial dual.
Our calculation of the radial dual of the quadratic objective follows by definition as
$$\begin{aligned} (1-\frac{1}{2}x^T Qx - c^Tx)^\varGamma _+(y)&= \sup \left\{ v>0 \mid v\left( 1-\frac{y^T Qy}{2v^2} - \frac{c^Ty}{v}\right) \le 1 \right\} \\&= \sup \left\{ v>0 \mid v^2-\frac{1}{2}y^T Qy - (c^Ty+1)v\le 0 \right\} \\&=\left( \frac{c^Ty+1 + \sqrt{(c^Ty+1)^2 +2y^TQy}}{2}\right) _+. \end{aligned}$$
Quadratic programming was the original motivating setting for Frank-Wolfe [13].
The source code is available at github.com/bgrimmer/Radial-Duality-QP-Example.
For example, if either $\{a_i\}$ spans $\mathbb {R}^n$ or the regularizer r(x) has bounded level sets.
Note this hypograph is not actually star convex since $(0,0)\not \in \mathrm {hypo\ }f$.
This is essentially by definition, as $v\cdot \hat{\iota }_S(y/v)$ is nondecreasing in v if and only if S is star-convex w.r.t. the origin. Then it is simple to check that this function is upper semicontinuous and is vacuously strictly increasing on its effective domain, since $\mathrm {dom\ }\hat{\iota }_S = \emptyset $.
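
The closed form for the radial dual of the quadratic objective given in the notes above can be checked numerically. The following is a minimal Python sketch, assuming Q is positive semidefinite (so the square root is real) and using hypothetical helper names `quad_form`, `radial_dual_quadratic`, and `perspective` that do not come from the paper or its repository. It evaluates the closed form and confirms that $v\cdot f(y/v)$ equals 1 at the returned value of v, which is the boundary of the set inside the supremum.

```python
import math

def quad_form(Q, y):
    """Return y^T Q y for a square matrix Q stored as a list of rows."""
    n = len(y)
    return sum(y[i] * Q[i][j] * y[j] for i in range(n) for j in range(n))

def radial_dual_quadratic(Q, c, y):
    """Positive part of the larger root of v^2 - (c^T y + 1) v - y^T Q y / 2 = 0.

    Assumes Q is positive semidefinite so the discriminant is nonnegative.
    """
    s = sum(ci * yi for ci, yi in zip(c, y)) + 1.0
    disc = s * s + 2.0 * quad_form(Q, y)
    return max(0.0, (s + math.sqrt(disc)) / 2.0)

def perspective(Q, c, y, v):
    """Evaluate v * f(y / v) for f(x) = 1 - x^T Q x / 2 - c^T x and v > 0."""
    cy = sum(ci * yi for ci, yi in zip(c, y))
    return v - quad_form(Q, y) / (2.0 * v) - cy

# Illustrative data (not from the paper).
Q = [[2.0, 0.0], [0.0, 1.0]]
c = [1.0, -1.0]
y = [0.5, 0.25]
v_star = radial_dual_quadratic(Q, c, y)
print(v_star, perspective(Q, c, y, v_star))
```

For positive semidefinite Q the map $v \mapsto v\cdot f(y/v)$ is nondecreasing on $v>0$, so the supremum is attained at the larger root, which is what the check at `v_star` reflects.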

References

Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017). https://doi.org/10.1287/moor.2016.0817
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22, 557–580 (2012)
Bertero, M., Boccacci, P., Desiderà, G., Vicidomini, G.: Image deblurring with poisson data: from cells to galaxies. Inverse Prob. 25(12), 123006 (2009). https://doi.org/10.1088/0266-5611/25/12/123006
Bolte, J., Daniilidis, A., Lewis, A.: The łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007). https://doi.org/10.1137/050644641
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017). https://doi.org/10.1007/s10107-016-1091-6
Burke, J.V., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control. Optim. 31(5), 1340–1359 (1993). https://doi.org/10.1137/0331063
Chandrasekaran, K., Dadush, D., Vempala, S.: Thin Partitions: Isoperimetric Inequalities and a Sampling Algorithm for Star Shaped Bodies, pp. 1630–1645. https://doi.org/10.1137/1.9781611973075.133
Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth Analysis and Control Theory. Springer-Verlag, Berlin, Heidelberg (1998)
Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019). https://doi.org/10.1137/18M1178244
Dorn, W.S.: Duality in quadratic programming. Q. Appl. Math. 18(2), 155–162 (1960)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001). https://doi.org/10.1198/016214501753382273
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956). https://doi.org/10.1002/nav.3800030109
Freund, R.M.: Dual gauge programs, with applications to quadratic programming and the minimum-norm problem. Math. Program. 38, 47–67 (1987). https://doi.org/10.1007/BF02591851
Gao, H.Y., Bruce, A.G.: Wave shrink with firm shrinkage. Stat. Sin. 7(4), 855–874 (1997)
Grimmer, B.: Radial subgradient method. SIAM J. Optim. 28(1), 459–469 (2018). https://doi.org/10.1137/17M1122980
Grimmer, B.: Radial Duality Part I: Foundations. arXiv e-prints arXiv:2104.11179 (2021)
Guminov, S., Gasnikov, A.: Accelerated Methods for $\alpha $-Weakly-Quasi-Convex Problems. arXiv e-prints arXiv:1710.00797 (2017)
Guminov, S., Nesterov, Y., Dvurechensky, P., Gasnikov, A.: Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems. Dokl. Math. 99, 125–128 (2019). https://doi.org/10.1134/S1064562419020042
He, N., Harchaoui, Z., Wang, Y., Song, L.: Fast and simple optimization for poisson likelihood models. CoRR abs/1608.01264 (2016). http://arxiv.org/abs/1608.01264
Hinder, O., Sidford, A., Sohoni, N.: Near-optimal methods for minimizing star-convex functions and beyond. In: J. Abernethy, S. Agarwal (eds.) Proceedings of Thirty Third Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 125, pp. 1894–1938. PMLR (2020). http://proceedings.mlr.press/v125/hinder20a.html
Johnstone, P.R., Moulin, P.: Faster subgradient methods for functions with hölderian growth. Math. Program. 180(1), 417–450 (2020). https://doi.org/10.1007/s10107-018-01361-0
Klein Haneveld, W.K., van der Vlerk, M.H., Romeijnders, W.: Chance Constraints, pp. 115–138. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-29219-5_5
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier 48(3), 769–783 (1998). http://eudml.org/doc/75302
Lacoste-Julien, S., Schmidt, M., Bach, F.R.: A simpler approach to obtaining an o(1/t) convergence rate for the projected stochastic subgradient method. CoRR abs/1212.2002 (2012). http://arxiv.org/abs/1212.2002
Lee, J.C., Valiant, P.: Optimizing star-convex functions. In: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 603–614 (2016). https://doi.org/10.1109/FOCS.2016.71
Liu, M., Yang, T.: Adaptive accelerated gradient converging method under holderian error bound condition. In: Advances in Neural Information Processing Systems, vol. 30 (2017). https://proceedings.neurips.cc/paper/2017/file/2612aa892d962d6f8056b195ca6e550d-Paper.pdf
Lojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles 117, 87–89 (1963)
Łojasiewicz, S.: Sur la géométrie semi-et sous-analytique. In: Annales de l’institut Fourier, vol. 43, pp. 1575–1595 (1993)
Mukkamala, M.C., Fadili, J., Ochs, P.: Global convergence of model function based bregman proximal minimization algorithms. (2020) arXiv:2012.13161
Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17(4), 969–996 (2007). https://doi.org/10.1137/050622328
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $o(1/k^2)$. Soviet Math. Doklady 27(2), 372–376 (1983)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005). https://doi.org/10.1007/s10107-004-0552-5
Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152(1–2), 381–404 (2015). https://doi.org/10.1007/s10107-014-0790-0
Nesterov, Y., Polyak, B.: Cubic regularization of newton method and its global performance. Math. Program. 108, 177–205 (2006). https://doi.org/10.1007/s10107-006-0706-8
Polyak, B.T.: Minimization of unsmooth functionals. USSR Comput. Math. Math. Phys. 9(3), 14–29 (1969). https://doi.org/10.1016/0041-5553(69)90061-5
Polyak, B.T.: Sharp minima. Institute of Control Sciences Lecture Notes,Moscow, USSR. Presented at the IIASA Workshop on Generalized Lagrangians and Their Applications, IIASA, Laxenburg, Austria. (1979)
Renegar, J.: “Efficient” subgradient methods for general convex optimization. SIAM J. Optim. 26(4), 2649–2676 (2016). https://doi.org/10.1137/15M1027371
Renegar, J.: Accelerated first-order methods for hyperbolic programming. Math. Program. 173(1–2), 1–35 (2019). https://doi.org/10.1007/s10107-017-1203-y
Renegar, J., Grimmer, B.: A simple nearly-optimal restart scheme for speeding-up first order methods. To appear in foundations of computational mathematics (2021)
Roulet, V., d’Aspremont, A.: Sharpness, restart, and acceleration. SIAM J. Optim. 30(1), 262–289 (2020). https://doi.org/10.1137/18M1224568
Rousseeuw, P.J.: Least median of squares regression. J. Am. Stat. Assoc. 79(388), 871–880 (1984). https://doi.org/10.1080/01621459.1984.10477105
Rubinov, A., Yagubov, A.: The space of star-shaped sets and its applications in nonsmooth optimization. Math. Program. Stud. 29 (1986). https://doi.org/10.1007/BFb0121146
Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: an operator splitting solver for quadratic programs. Math. Program. Comput. 12(4), 637–672 (2020). https://doi.org/10.1007/s12532-020-00179-2
Wen, F., Chu, L., Liu, P., Qiu, R.C.: A survey on nonconvex regularization-based sparse and low-rank recovery in signal processing, statistics, and machine learning. IEEE Access 6, 69883–69906 (2018). https://doi.org/10.1109/ACCESS.2018.2880454
Yang, T., Lin, Q.: Rsg: Beating subgradient method without smoothness and strong convexity. J. Mach. Learn. Res. 19(6), 1–33 (2018). http://jmlr.org/papers/v19/17-016.html
Yu, J., Eriksson, A., Chin, T.J., Suter, D.: An adversarial optimization approach to efficient outlier removal. J. Math. Imag. Vis 48, 451–466 (2014). https://doi.org/10.1007/s10851-013-0418-7
Yuan, Y., Li, Z., Huang, B.: Robust optimization approximation for joint chance constrained optimization problem. J. Glob. Optim. 67, 805–827 (2017). https://doi.org/10.1007/s10898-016-0438-0
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010). https://doi.org/10.1214/09-AOS729

Acknowledgements

The author thanks Jim Renegar broadly for inspiring this work and concretely for providing feedback on an early draft, and Rob Freund for constructive thoughts that helped focus this work. Additionally, two anonymous referees and the associate editor provided useful feedback that much improved this work’s presentation and clarity.

Author information

Authors and Affiliations

Johns Hopkins University, Baltimore, MD, USA
Benjamin Grimmer

Authors

Benjamin Grimmer

Corresponding author

Correspondence to Benjamin Grimmer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1650441. This work was partially done while the author was visiting the Simons Institute for the Theory of Computing. It was partially supported by the DIMACS/Simons Collaboration on Bridging Continuous and Discrete Optimization through NSF grant #CCF-1740425.

Appendices

Appendix A: LogSumExp gradients and QP optimality certificates

In our quadratic programming example (8) and our generalized setting (37), we consider smoothings of a finite maximum. Given smooth convex functions $f_i:\mathcal {E}\rightarrow \mathbb {R}$, we considered the smoothing of $\max \{f_i\}$ with parameter $\eta >0$ given by $ f_\eta (x):= \eta \log \left( \sum _{i=0}^n \exp (f_i(x)/\eta )\right) .$ Its gradient is given by $ \nabla f_\eta (x) = \sum _i \lambda _i \nabla f_i(x)$ where $\lambda _i = \exp (f_i(x)/\eta ) / \sum _j \exp (f_j(x)/\eta )$. Evaluating this computationally requires mild care to avoid overflow when exponentiating potentially large numbers. It is numerically stable to instead compute these coefficients via the equivalent formula

$$\begin{aligned} \lambda _i = \frac{\exp ((f_i(x) - \max \{f_k(x)\})/\eta )}{\sum _j \exp ((f_j(x) - \max \{f_k(x)\})/\eta )} \ . \end{aligned}$$
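
The shifted formula above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `smoothed_max` and `softmax_weights` are hypothetical, and the inputs are the already-evaluated values $f_i(x)$. Subtracting the maximum makes every exponent nonpositive, so no call to `exp` can overflow.

```python
import math

def smoothed_max(values, eta):
    """LogSumExp smoothing eta * log(sum_i exp(f_i / eta)), computed stably."""
    m = max(values)
    # Factor exp(m / eta) out of the sum so every exponent is <= 0.
    return m + eta * math.log(sum(math.exp((f - m) / eta) for f in values))

def softmax_weights(values, eta):
    """Coefficients lambda_i computed from values shifted by their maximum."""
    m = max(values)
    shifted = [math.exp((f - m) / eta) for f in values]
    total = sum(shifted)
    return [s / total for s in shifted]

# With eta = 0.1 the unshifted formula would need exp(10010), which overflows
# a double; the shifted version only evaluates exp of nonpositive numbers.
vals = [1000.0, 1001.0, 999.0]
print(softmax_weights(vals, 0.1))
print(smoothed_max(vals, 0.1))
```

The gradient of the smoothing is then the weighted sum of the component gradients $\nabla f_i(x)$ using these coefficients.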

Next, we specialize this formula to the setting of quadratic programming for $g_\eta $ in (8). Observe the gradient of the objective component is given by

$$\begin{aligned} \nabla \left( \frac{c^Ty+1 + \sqrt{(c^Ty+1)^2 +2y^TQy}}{2}\right) _+\ (y)= \frac{Qx +c}{1+\frac{1}{2}x^TQx} \end{aligned}$$

where $x=y/\left( \frac{c^Ty+1 + \sqrt{(c^Ty+1)^2 +2y^TQy}}{2}\right) _+$ by using the gradient formula (25). The gradients of the transformed constraints are simply $ \nabla a_i^T y /b_i = a_i/b_i $. Then the gradient of the smoothing overall is given by

$$\begin{aligned} \nabla g_\eta (y) = \lambda _0 \frac{Qx +c}{1+\frac{1}{2}x^TQx} + \sum _{i=1}^n \lambda _i a_i/b_i \ . \end{aligned}$$

This gradient can be computed using two matrix-vector multiplications with A: Ay is needed to compute the coefficients $\lambda _i$, then $A^T[\lambda _1/b_1 \dots \lambda _n/b_n]$ is needed for the summation above. This gradient formula indicates a reasonable selection of dual multipliers $ v_i = \frac{\lambda _i (1+\frac{1}{2}x^TQx)}{\lambda _0b_i}$ as we then have $\nabla g_\eta (y)$ proportional to $Qx+c + A^Tv$.
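
The gradient and multiplier formulas of this appendix can be sketched as follows. Since (8) is not reproduced here, the sketch assumes that $g_\eta$ is the LogSumExp smoothing of the maximum of the radially dual objective and the transformed constraints $a_i^Ty/b_i$, and that the objective component is strictly positive at y. All function names are hypothetical and the matrices are plain lists of rows.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]

def component_values(Q, c, A, b, y):
    """f_0 = radial dual objective (positive part), f_i = a_i^T y / b_i."""
    s = dot(c, y) + 1.0
    u = max(0.0, (s + math.sqrt(s * s + 2.0 * dot(y, matvec(Q, y)))) / 2.0)
    return [u] + [ay / bi for ay, bi in zip(matvec(A, y), b)]

def g_eta(Q, c, A, b, y, eta):
    """Assumed form of the smoothing; used here to check the gradient."""
    vals = component_values(Q, c, A, b, y)
    m = max(vals)
    return m + eta * math.log(sum(math.exp((f - m) / eta) for f in vals))

def grad_g_eta(Q, c, A, b, y, eta):
    """Return (gradient at y, dual multipliers v, primal point x)."""
    vals = component_values(Q, c, A, b, y)
    u = vals[0]
    if u <= 0.0:
        raise ValueError("sketch assumes a positive objective component")
    # Stable softmax coefficients lambda_0, ..., lambda_n.
    m = max(vals)
    w = [math.exp((f - m) / eta) for f in vals]
    total = sum(w)
    lam = [wi / total for wi in w]
    # Objective term: lambda_0 (Qx + c) / (1 + x^T Q x / 2) with x = y / u.
    x = [yi / u for yi in y]
    Qx = matvec(Q, x)
    denom = 1.0 + 0.5 * dot(x, Qx)
    grad = [lam[0] * (q + ci) / denom for q, ci in zip(Qx, c)]
    # Constraint terms: sum_i lambda_i a_i / b_i.
    for lam_i, a_i, b_i in zip(lam[1:], A, b):
        grad = [g + lam_i * a / b_i for g, a in zip(grad, a_i)]
    # Multipliers v_i; this divides by lambda_0, which must not underflow.
    v = [lam_i * denom / (lam[0] * b_i) for lam_i, b_i in zip(lam[1:], b)]
    return grad, v, x

# Illustrative data (not from the paper).
Q = [[2.0, 0.0], [0.0, 1.0]]
c = [1.0, -1.0]
A = [[1.0, 1.0], [1.0, -2.0]]
b = [2.0, 3.0]
grad, v, x = grad_g_eta(Q, c, A, b, [0.5, 0.25], 0.5)
print(grad, v, x)
```

With these multipliers the returned gradient equals $\frac{\lambda _0}{1+\frac{1}{2}x^TQx}(Qx+c+A^Tv)$, so a small gradient norm corresponds to a small residual $Qx+c+A^Tv$ up to that scaling factor.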

Appendix B: Calculation of nonconvex subgradient method guarantee (41)

Here we derive (41) from the convergence theory of [10, Theorem 3.1], which assumes the function g being minimized is uniformly M-Lipschitz and $\rho $-weakly convex (defined as $g+\frac{\rho }{2}\Vert \cdot \Vert ^2$ being convex). Equation [10, (3.4)] ensures the subgradient method $y_{k+1} = y_k - \alpha _k\zeta _k$ for $\zeta _k\in \partial _P g(y_k)$ has some $y_k$ with

$$\begin{aligned} \Vert \nabla g_{1/\bar{\rho }}(y_k)\Vert ^2 \le \frac{\bar{\rho }}{\bar{\rho }- \rho } \frac{(g(y_0) - \inf g) + \frac{\bar{\rho }M^2}{2}\sum _{t=0}^T\alpha _t^2 }{\sum _{t=0}^T \alpha _t} \end{aligned}$$

where $g_{1/\bar{\rho }}$ is the Moreau envelope of g (and using that $g_{1/\bar{\rho }}(y_0) \le g(y_0)$). Given the method will be run for T steps, setting $\bar{\rho }= 2\rho $ and the constant step size $\alpha _k = \sqrt{\frac{g(y_0)-\inf g}{\rho M^2(T+1)}}$, this bound becomes

$$\begin{aligned} \Vert \nabla g_{1/2\rho }(y_k)\Vert ^2 \le 4\sqrt{\frac{\rho M^2(g(y_0)-\inf g)}{T+1}} \end{aligned}$$
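
The step-size substitution above is simple arithmetic and can be checked numerically. The following Python sketch assumes the general bound carries the sum of squared step sizes in its numerator, as the substitution requires. The function names and the numerical values are hypothetical.

```python
import math

def general_bound(gap, rho, rho_bar, M, alphas):
    """Right-hand side of the general bound for step sizes alpha_0, ..., alpha_T.

    gap stands for g(y_0) - inf g; the numerator sums squared step sizes.
    """
    numerator = gap + 0.5 * rho_bar * M ** 2 * sum(a * a for a in alphas)
    return rho_bar / (rho_bar - rho) * numerator / sum(alphas)

def constant_step(gap, rho, M, T):
    """Constant step size sqrt(gap / (rho * M^2 * (T + 1)))."""
    return math.sqrt(gap / (rho * M ** 2 * (T + 1)))

def simplified_bound(gap, rho, M, T):
    """Closed form 4 * sqrt(rho * M^2 * gap / (T + 1))."""
    return 4.0 * math.sqrt(rho * M ** 2 * gap / (T + 1))

# Illustrative values (not from the paper).
gap, rho, M, T = 3.0, 0.7, 2.5, 99
alphas = [constant_step(gap, rho, M, T)] * (T + 1)
print(general_bound(gap, rho, 2.0 * rho, M, alphas))
print(simplified_bound(gap, rho, M, T))
```

With $\bar{\rho } = 2\rho $ the prefactor is 2, and the chosen step size makes both numerator terms equal to $g(y_0)-\inf g$, which is where the constant 4 comes from.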

Following the discussion of [10, Page 210], this implies $y_k$ has a nearby y that is nearly stationary on g. In particular, $\Vert y-y_k\Vert \le (1/2\rho )\Vert \nabla g_{1/2\rho }(y_k)\Vert $ and $\textrm{dist}(0,\partial _P g(y))\le \Vert \nabla g_{1/2\rho }(y_k)\Vert $. Combining this with the above bound, we conclude

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Grimmer, B. Radial duality part II: applications and algorithms. Math. Program. 205, 69–105 (2024). https://doi.org/10.1007/s10107-023-01974-0

Received: 13 May 2021
Accepted: 22 April 2023
Published: 26 May 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s10107-023-01974-0
