
Radial duality part II: applications and algorithms

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

The first part of this work established the foundations of a radial duality between nonnegative optimization problems, inspired by the work of Renegar (SIAM J Optim 26(4): 2649–2676, 2016). Here we utilize our radial duality theory to design and analyze projection-free optimization algorithms that operate by solving a radially dual problem. In particular, we consider radial subgradient, smoothing, and accelerated methods that are capable of solving a range of constrained convex and nonconvex optimization problems and that can scale up more efficiently than their classic counterparts. These algorithms enjoy the same benefits as their predecessors, avoiding Lipschitz continuity assumptions and costly orthogonal projections, in our newfound, broader context. Our radial duality further allows us to understand the effects and benefits of smoothness and growth conditions on the radial dual and consequently on our radial algorithms.

Notes

  1. Instead of using a re-parameterization, one can explicitly include equality constraints in our model. The details of this approach are given in Sect. 2.2.1, where we see that equality constraints are unaffected by the radial dual.

  2. Our calculation of the radial dual of the quadratic objective follows by definition (a numerical sanity check of the resulting closed form is sketched just after these notes) as

    $$\begin{aligned} (1-\frac{1}{2}x^T Qx - c^Tx)^\varGamma _+(y)&= \sup \left\{ v>0 \mid v\left( 1-\frac{y^T Qy}{2v^2} - \frac{c^Ty}{v}\right) \le 1 \right\} \\&= \sup \left\{ v>0 \mid v^2-\frac{1}{2}y^T Qy - (c^Ty+1)v\le 0 \right\} \\&=\left( \frac{c^Ty+1 + \sqrt{(c^Ty+1)^2 +2y^TQy}}{2}\right) _+. \end{aligned}$$
  3. Quadratic programming was the original motivating setting for the Frank–Wolfe method [13].

  4. The source code is available at github.com/bgrimmer/Radial-Duality-QP-Example.

  5. For example, if either \(\{a_i\}\) spans \(\mathbb {R}^n\) or the regularizer r(x) has bounded level sets.

  6. Note this hypograph is not actually star convex since \((0,0)\not \in \mathrm {hypo\ }f\).

  7. This is essentially by definition as \(v\cdot \hat{\iota }_S(y/v)\) is nondecreasing in v if and only if S is star-convex w.r.t. the origin. Then it is simple to check this function is upper semicontinuous and is vacuously strictly increasing on its effective domain \(\mathrm {dom\ }\hat{\iota }_S\), which is empty.
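
Returning to note 2, the closed form derived there can be sanity-checked numerically. The following minimal sketch is our own (the function names and the brute-force scan are purely illustrative, not from the paper's released code); it compares the closed-form root against a direct search over \(v>0\):

```python
import numpy as np

def radial_dual_quadratic(Q, c, y):
    """Closed form from note 2: the positive part of the larger root of
    v^2 - (c^T y + 1) v - y^T Q y / 2 <= 0."""
    lin = c @ y + 1.0
    root = (lin + np.sqrt(lin**2 + 2.0 * (y @ Q @ y))) / 2.0
    return max(root, 0.0)

def radial_dual_by_scan(Q, c, y, v_max=100.0, n=1_000_000):
    """Brute-force approximation of sup{ v > 0 : v(1 - y^T Q y/(2 v^2) - c^T y/v) <= 1 }."""
    v = np.linspace(1e-8, v_max, n)
    lhs = v - (y @ Q @ y) / (2.0 * v) - (c @ y)  # equals v*(1 - yQy/(2v^2) - c^T y/v)
    feasible = v[lhs <= 1.0]
    return feasible.max() if feasible.size else 0.0

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Q, c, y = B @ B.T, rng.standard_normal(5), rng.standard_normal(5)  # Q positive semidefinite
print(radial_dual_quadratic(Q, c, y), radial_dual_by_scan(Q, c, y))  # should agree to ~1e-4
```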

References

  1. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017). https://doi.org/10.1287/moor.2016.0817

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)

  3. Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22, 557–580 (2012)

  4. Bertero, M., Boccacci, P., Desiderà, G., Vicidomini, G.: Image deblurring with Poisson data: from cells to galaxies. Inverse Prob. 25(12), 123006 (2009). https://doi.org/10.1088/0266-5611/25/12/123006

  5. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007). https://doi.org/10.1137/050644641

  6. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017). https://doi.org/10.1007/s10107-016-1091-6

  7. Burke, J.V., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control. Optim. 31(5), 1340–1359 (1993). https://doi.org/10.1137/0331063

  8. Chandrasekaran, K., Dadush, D., Vempala, S.: Thin partitions: isoperimetric inequalities and a sampling algorithm for star shaped bodies, pp. 1630–1645. https://doi.org/10.1137/1.9781611973075.133

  9. Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth Analysis and Control Theory. Springer-Verlag, Berlin, Heidelberg (1998)

  10. Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019). https://doi.org/10.1137/18M1178244

  11. Dorn, W.S.: Duality in quadratic programming. Q. Appl. Math. 18(2), 155–162 (1960)

  12. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001). https://doi.org/10.1198/016214501753382273

  13. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956). https://doi.org/10.1002/nav.3800030109

  14. Freund, R.M.: Dual gauge programs, with applications to quadratic programming and the minimum-norm problem. Math. Program. 38, 47–67 (1987). https://doi.org/10.1007/BF02591851

  15. Gao, H.Y., Bruce, A.G.: WaveShrink with firm shrinkage. Stat. Sin. 7(4), 855–874 (1997)

  16. Grimmer, B.: Radial subgradient method. SIAM J. Optim. 28(1), 459–469 (2018). https://doi.org/10.1137/17M1122980

  17. Grimmer, B.: Radial duality part I: foundations. arXiv e-prints arXiv:2104.11179 (2021)

  18. Guminov, S., Gasnikov, A.: Accelerated methods for \(\alpha \)-weakly-quasi-convex problems. arXiv e-prints arXiv:1710.00797 (2017)

  19. Guminov, S., Nesterov, Y., Dvurechensky, P., Gasnikov, A.: Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems. Dokl. Math. 99, 125–128 (2019). https://doi.org/10.1134/S1064562419020042

  20. He, N., Harchaoui, Z., Wang, Y., Song, L.: Fast and simple optimization for Poisson likelihood models. CoRR abs/1608.01264 (2016). http://arxiv.org/abs/1608.01264

  21. Hinder, O., Sidford, A., Sohoni, N.: Near-optimal methods for minimizing star-convex functions and beyond. In: Abernethy, J., Agarwal, S. (eds.) Proceedings of the Thirty Third Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 125, pp. 1894–1938. PMLR (2020). http://proceedings.mlr.press/v125/hinder20a.html

  22. Johnstone, P.R., Moulin, P.: Faster subgradient methods for functions with Hölderian growth. Math. Program. 180(1), 417–450 (2020). https://doi.org/10.1007/s10107-018-01361-0

  23. Klein Haneveld, W.K., van der Vlerk, M.H., Romeijnders, W.: Chance Constraints, pp. 115–138. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-29219-5_5

  24. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l'institut Fourier 48(3), 769–783 (1998). http://eudml.org/doc/75302

  25. Lacoste-Julien, S., Schmidt, M., Bach, F.R.: A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method. CoRR abs/1212.2002 (2012). http://arxiv.org/abs/1212.2002

  26. Lee, J.C., Valiant, P.: Optimizing star-convex functions. In: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 603–614 (2016). https://doi.org/10.1109/FOCS.2016.71

  27. Liu, M., Yang, T.: Adaptive accelerated gradient converging method under Hölderian error bound condition. In: Advances in Neural Information Processing Systems, vol. 30 (2017). https://proceedings.neurips.cc/paper/2017/file/2612aa892d962d6f8056b195ca6e550d-Paper.pdf

  28. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles 117, 87–89 (1963)

  29. Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Annales de l'institut Fourier 43, 1575–1595 (1993)

  30. Mukkamala, M.C., Fadili, J., Ochs, P.: Global convergence of model function based Bregman proximal minimization algorithms. arXiv:2012.13161 (2020)

  31. Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17(4), 969–996 (2007). https://doi.org/10.1137/050622328

  32. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Soviet Math. Doklady 27(2), 372–376 (1983)

  33. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005). https://doi.org/10.1007/s10107-004-0552-5

  34. Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152(1–2), 381–404 (2015). https://doi.org/10.1007/s10107-014-0790-0

  35. Nesterov, Y., Polyak, B.: Cubic regularization of Newton method and its global performance. Math. Program. 108, 177–205 (2006). https://doi.org/10.1007/s10107-006-0706-8

  36. Polyak, B.T.: Minimization of unsmooth functionals. USSR Comput. Math. Math. Phys. 9(3), 14–29 (1969). https://doi.org/10.1016/0041-5553(69)90061-5

  37. Polyak, B.T.: Sharp minima. Institute of Control Sciences Lecture Notes, Moscow, USSR. Presented at the IIASA Workshop on Generalized Lagrangians and Their Applications, IIASA, Laxenburg, Austria (1979)

  38. Renegar, J.: “Efficient” subgradient methods for general convex optimization. SIAM J. Optim. 26(4), 2649–2676 (2016). https://doi.org/10.1137/15M1027371

  39. Renegar, J.: Accelerated first-order methods for hyperbolic programming. Math. Program. 173(1–2), 1–35 (2019). https://doi.org/10.1007/s10107-017-1203-y

  40. Renegar, J., Grimmer, B.: A simple nearly optimal restart scheme for speeding up first-order methods. To appear in Foundations of Computational Mathematics (2021)

  41. Roulet, V., d'Aspremont, A.: Sharpness, restart, and acceleration. SIAM J. Optim. 30(1), 262–289 (2020). https://doi.org/10.1137/18M1224568

  42. Rousseeuw, P.J.: Least median of squares regression. J. Am. Stat. Assoc. 79(388), 871–880 (1984). https://doi.org/10.1080/01621459.1984.10477105

  43. Rubinov, A., Yagubov, A.: The space of star-shaped sets and its applications in nonsmooth optimization. Math. Program. Stud. 29 (1986). https://doi.org/10.1007/BFb0121146

  44. Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: an operator splitting solver for quadratic programs. Math. Program. Comput. 12(4), 637–672 (2020). https://doi.org/10.1007/s12532-020-00179-2

  45. Wen, F., Chu, L., Liu, P., Qiu, R.C.: A survey on nonconvex regularization-based sparse and low-rank recovery in signal processing, statistics, and machine learning. IEEE Access 6, 69883–69906 (2018). https://doi.org/10.1109/ACCESS.2018.2880454

  46. Yang, T., Lin, Q.: RSG: beating subgradient method without smoothness and strong convexity. J. Mach. Learn. Res. 19(6), 1–33 (2018). http://jmlr.org/papers/v19/17-016.html

  47. Yu, J., Eriksson, A., Chin, T.J., Suter, D.: An adversarial optimization approach to efficient outlier removal. J. Math. Imag. Vis. 48, 451–466 (2014). https://doi.org/10.1007/s10851-013-0418-7

  48. Yuan, Y., Li, Z., Huang, B.: Robust optimization approximation for joint chance constrained optimization problem. J. Glob. Optim. 67, 805–827 (2017). https://doi.org/10.1007/s10898-016-0438-0

  49. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010). https://doi.org/10.1214/09-AOS729


Acknowledgements

The author thanks Jim Renegar broadly for inspiring this work and concretely for providing feedback on an early draft, and Rob Freund for constructive thoughts that helped focus this work. Additionally, two anonymous referees and the associate editor provided useful feedback that much improved this work's presentation and clarity.

Author information

Correspondence to Benjamin Grimmer.

Additional information

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1650441. This work was partially done while the author was visiting the Simons Institute for the Theory of Computing. It was partially supported by the DIMACS/Simons Collaboration on Bridging Continuous and Discrete Optimization through NSF grant #CCF-1740425.

Appendices

Appendix A: LogSumExp gradients and QP optimality certificates

In our quadratic programming example (8) and our generalized setting (37), we consider smoothings of a finite maximum. Given smooth convex functions \(f_i:\mathcal {E}\rightarrow \mathbb {R}\), we consider the smoothing of \(\max \{f_i\}\) with parameter \(\eta >0\) given by \( f_\eta (x):= \eta \log \left( \sum _{i=0}^n \exp (f_i(x)/\eta )\right) .\) Its gradient is given by \( \nabla f_\eta (x) = \sum \lambda _i \nabla f_i(x)\) where \(\lambda _i = \exp (f_i(x)/\eta ) / \sum _j \exp (f_j(x)/\eta )\). Computing these coefficients directly requires mild care to avoid precision issues from exponentiating potentially large numbers. It is numerically stable to instead compute them via the equivalent formula

$$\begin{aligned} \lambda _i = \frac{\exp ((f_i(x) - \max \{f_k(x)\})/\eta )}{\sum _j \exp ((f_j(x) - \max \{f_k(x)\})/\eta )} \ . \end{aligned}$$
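
As a concrete illustration of this max-subtraction trick, here is a minimal sketch (our own, not the paper's released code) computing the weights \(\lambda _i\) and the smoothed value \(f_\eta (x)\) stably:

```python
import numpy as np

def smoothing_weights(f_vals, eta):
    """Weights lambda_i for the log-sum-exp smoothing of max{f_i}, computed stably
    by subtracting max_k f_k(x) before exponentiating (all exponents become <= 0)."""
    f_vals = np.asarray(f_vals, dtype=float)
    w = np.exp((f_vals - f_vals.max()) / eta)
    return w / w.sum()

def smoothed_max(f_vals, eta):
    """f_eta(x) = eta * log(sum_i exp(f_i(x)/eta)), evaluated with the same shift."""
    f_vals = np.asarray(f_vals, dtype=float)
    m = f_vals.max()
    return m + eta * np.log(np.sum(np.exp((f_vals - m) / eta)))
```

Since every shifted exponent is nonpositive, overflow cannot occur, and the weights agree with the unshifted definition up to floating-point error; the gradient \(\nabla f_\eta (x) = \sum _i \lambda _i \nabla f_i(x)\) is then assembled from these weights.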

Next, we specialize this formula to the setting of quadratic programming for \(g_\eta \) in (8). Observe that the gradient of the objective component is given by

$$\begin{aligned} \nabla \left( \frac{c^Ty+1 + \sqrt{(c^Ty+1)^2 +2y^TQy}}{2}\right) _+\ (y)= \frac{Qx +c}{1+\frac{1}{2}x^TQx} \end{aligned}$$

where \(x=y/\left( \frac{c^Ty+1 + \sqrt{(c^Ty+1)^2 +2y^TQy}}{2}\right) _+\) by using the gradient formula (25). The gradients of the transformed constraints are simply \( \nabla \left( a_i^T y /b_i\right) = a_i/b_i \). Then the gradient of the smoothing overall is given by

$$\begin{aligned} \nabla g_\eta (y) = \lambda _0 \frac{Qx +c}{1+\frac{1}{2}x^TQx} + \sum _{i=1}^n \lambda _i a_i/b_i \ . \end{aligned}$$

This gradient can be computed using two matrix multiplications with A: Ay is needed to compute the coefficients \(\lambda _i\), then \(A^T[\lambda _1/b_1 \dots \lambda _n/b_n]\) is needed for the summation above. This gradient formula indicates a reasonable selection of dual multipliers \( v_i = \frac{\lambda _i (1+\frac{1}{2}x^TQx)}{\lambda _0b_i}\) as we then have \(\nabla g_\eta (y)\) proportional to \(Qx+c + A^Tv\).
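
To make this two-product structure concrete, here is a minimal sketch (our own, with illustrative names; the released code linked in note 4 may differ). It assumes \(g_\eta \) in (8) is the log-sum-exp smoothing of the dual objective component above together with the transformed constraints \(a_i^Ty/b_i\), that the rows of A are \(a_i^T\) with each \(b_i>0\), and that the dual objective value is positive at y:

```python
import numpy as np

def grad_g_eta(y, Q, c, A, b, eta):
    """Sketch of nabla g_eta(y): smoothed max of the dual objective and a_i^T y / b_i."""
    # Dual objective component and the corresponding primal point x.
    lin = c @ y + 1.0
    dual_obj = (lin + np.sqrt(lin**2 + 2.0 * (y @ Q @ y))) / 2.0  # assumed positive here
    x = y / dual_obj

    # Transformed constraint values; first product with A.
    cons = (A @ y) / b

    # Numerically stable softmax weights over the n + 1 components.
    f_vals = np.concatenate(([dual_obj], cons))
    lam = np.exp((f_vals - f_vals.max()) / eta)
    lam /= lam.sum()

    # Objective gradient from (25), plus constraint gradients; second product, with A^T.
    grad_obj = (Q @ x + c) / (1.0 + 0.5 * (x @ Q @ x))
    return lam[0] * grad_obj + A.T @ (lam[1:] / b)
```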

Appendix B: Calculation of nonconvex subgradient method guarantee (41)

Here we derive (41) from the convergence theory of [10, Theorem 3.1], which assumes the function being minimized g is uniformly M-Lipschitz and \(\rho \)-weakly convex (defined as \(g+\frac{\rho }{2}\Vert \cdot \Vert ^2\) being convex). Equation [10, (3.4)] ensures the subgradient method \(y_{k+1} = y_k - \alpha _k\zeta _k\) for \(\zeta _k\in \partial _P g(y_k)\) has some iterate \(y_k\) with

$$\begin{aligned} \Vert \nabla g_{1/\bar{\rho }}(y_k)\Vert ^2 \le \frac{\bar{\rho }}{\bar{\rho }- \rho }\, \frac{(g(y_0) - \inf g) + \frac{\bar{\rho }M^2}{2}\sum _{t=0}^T\alpha _t^2 }{\sum _{t=0}^T \alpha _t} \end{aligned}$$

where \(g_{1/\bar{\rho }}\) is the Moreau envelope of g (and using that \(g_{1/\bar{\rho }}(y_0) \le g(y_0)\)). Given the method will be run for T steps, setting \(\bar{\rho }= 2\rho \) and \(\alpha _t = \sqrt{\frac{g(y_0)-\inf g}{\rho M^2(T+1)}}\), this bound becomes

$$\begin{aligned} \Vert \nabla g_{1/2\rho }(y_k)\Vert ^2 \le 4\sqrt{\frac{\rho M^2(g(y_0)-\inf g)}{T+1}}. \end{aligned}$$
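
For completeness, the arithmetic behind this substitution: with \(\bar{\rho }=2\rho \) the factor \(\frac{\bar{\rho }}{\bar{\rho }-\rho }\) equals 2, \(\frac{\bar{\rho }M^2}{2}=\rho M^2\), and plugging in the constant stepsize \(\alpha _t\) chosen above gives

$$\begin{aligned} 2\left( \frac{g(y_0)-\inf g}{(T+1)\alpha _t} + \rho M^2\alpha _t\right) = 2\left( \sqrt{\frac{\rho M^2(g(y_0)-\inf g)}{T+1}} + \sqrt{\frac{\rho M^2(g(y_0)-\inf g)}{T+1}}\right) = 4\sqrt{\frac{\rho M^2(g(y_0)-\inf g)}{T+1}}. \end{aligned}$$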

Following the discussion of [10, Page 210], this implies \(y_k\) has a nearby point \(y\) that is nearly stationary for g. In particular, \(\Vert y-y_k\Vert \le \frac{1}{2\rho }\Vert \nabla g_{1/2\rho }(y_k)\Vert \) and \(\textrm{dist}(0,\partial _P g(y))\le \Vert \nabla g_{1/2\rho }(y_k)\Vert \). Combining this with the above bound, we conclude

$$\begin{aligned} \textrm{dist}(0,\partial _P g(y)) \le \Vert \nabla g_{1/2\rho }(y_k)\Vert \le 2\left( \frac{\rho M^2(g(y_0)-\inf g)}{T+1}\right) ^{1/4}, \end{aligned}$$

which is the claimed guarantee (41).

About this article

Grimmer, B. Radial duality part II: applications and algorithms. Math. Program. 205, 69–105 (2024). https://doi.org/10.1007/s10107-023-01974-0