Solution refinement at regular points of conic problems

Abstract

Many numerical methods for conic problems use the homogeneous primal–dual embedding, which yields a primal–dual solution or a certificate establishing primal or dual infeasibility. Following Themelis and Patrinos (IEEE Trans Autom Control, 2019), we express the embedding as the problem of finding a zero of a mapping containing a skew-symmetric linear function and projections onto cones and their duals. We focus on the special case when this mapping is regular, i.e., differentiable with nonsingular derivative matrix, at a solution point. While this is not always the case, it is a very common occurrence in practice. In this paper we do not aim for new theoretical results. Rather, we propose a simple method that uses LSQR, a variant of conjugate gradients for least squares problems, together with the derivative of the residual mapping to refine an approximate solution, i.e., to increase its accuracy. LSQR is a matrix-free method, i.e., it requires only the evaluation of the derivative mapping and its adjoint, and so avoids forming or storing large matrices. This makes it efficient even for cone problems in which the data matrices are given and dense, and also allows the method to extend to cone programs in which the data are given as abstract linear operators. Numerical examples show that the method improves an approximate solution of a conic program, often dramatically, at a computational cost that is typically small compared to the cost of obtaining the original approximate solution. For completeness we describe methods for computing the derivative of the projection onto the cones commonly used in practice: nonnegative, second-order, semidefinite, and exponential cones. The paper is accompanied by an open source implementation.
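As a rough illustration of the matrix-free refinement idea, the sketch below (not the paper's implementation; the residual map F, its root, and the finite-difference Jacobian–vector product are toy assumptions for this example) performs one Gauss–Newton-style correction step with SciPy's LSQR, which needs only the action of the derivative and its adjoint:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

b = np.array([1.0, 8.0, 27.0])

def F(z):
    # Toy residual map standing in for the paper's conic residual;
    # F vanishes at z = (1, 2, 3).
    return z**3 - b

def refine(z, eps=1e-7):
    # Matrix-free derivative: a forward-difference Jacobian-vector product.
    # Here the Jacobian of F is diagonal (3 z^2), hence symmetric, so the
    # same routine serves as the adjoint (rmatvec).
    r = F(z)
    jvp = lambda v: (F(z + eps * v) - r) / eps
    J = LinearOperator((3, 3), matvec=jvp, rmatvec=jvp)
    dz = lsqr(J, -r)[0]  # least-squares correction step
    return z + dz

z0 = np.array([1.1, 2.1, 3.1])  # approximate solution to be refined
z1 = refine(z0)
```

In the paper's setting, F would be the residual of the homogeneous embedding, and the derivative action and its adjoint would come from the projection-derivative formulas described in the appendices.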

Fig. 1
Fig. 2

References

  1. Ali, A., Wong, E., Kolter, J.: A semismooth Newton method for fast, generic convex programming. In: Proceedings of the 34th International Conference on Machine Learning, pp. 272–279 (2017)

  2. Boyd, S., Busseti, E., Diamond, S., Kahn, R., Koh, K., Nystrup, P., Speth, J.: Multi-period trading via convex optimization. Found. Trends Optim. 3(1), 1–76 (2017)

  3. Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)

  4. Busseti, E., Ryu, E., Boyd, S.: Risk-constrained Kelly gambling. J. Invest. 25(3), 118–134 (2016)

  5. Browder, F.: Convergence theorems for sequences of nonlinear operators in Banach spaces. Math. Z. 100(3), 201–225 (1967)

  6. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. SIAM, Philadelphia (2001)

  7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  8. Boyd, S., Vandenberghe, L.: Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares. Cambridge University Press, Cambridge (2018)

  9. Chen, X., Qi, H.D., Tseng, P.: Analysis of nonsmooth symmetric-matrix-valued functions with applications to semidefinite complementarity problems. SIAM J. Optim. 13(4), 960–985 (2003)

  10. Diamond, S., Boyd, S.: CVXPY: a Python-embedded modeling language for convex optimization. J. Mach. Learn. Res. 17(83), 1–5 (2016)

  11. Domahidi, A., Chu, E., Boyd, S.: ECOS: an SOCP solver for embedded systems. In: 2013 European Control Conference, pp. 3071–3076. IEEE (2013)

  12. Evans, L., Gariepy, R.: Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton (1992)

  13. El Ghaoui, L., Lebret, H.: Robust solutions to least-squares problems with uncertain data. SIAM J. Matrix Anal. Appl. 18(4), 1035–1064 (1997)

  14. Fu, A., Narasimhan, B., Boyd, S.: CVXR: an R package for disciplined convex optimization. J. Stat. Softw. (2019) (to appear)

  15. Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pp. 95–110. Springer (2008)

  16. Grant, M., Boyd, S.: CVX: MATLAB software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014)

  17. Gardiner, J., Laub, A., Amato, J., Moler, C.: Solution of the Sylvester matrix equation \(AXB^{T}+CXD^{T}=E\). ACM Trans. Math. Softw. 18(2), 223–231 (1992)

  18. Jiang, H.: Global convergence analysis of the generalized Newton and Gauss–Newton methods of the Fischer–Burmeister equation for the complementarity problem. Math. Oper. Res. 24(3), 529–543 (1999)

  19. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: open source scientific tools for Python. http://www.scipy.org/ (2001). Accessed 4 Mar 2019

  20. Kanzow, C., Ferenczi, I., Fukushima, M.: On the local convergence of semismooth Newton methods for linear and nonlinear second-order cone programs without strict complementarity. SIAM J. Optim. 20(1), 297–320 (2009)

  21. Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: Proceedings of the IEEE International Symposium on Computer Aided Control Systems Design, pp. 284–289 (2004)

  22. Lasdon, L., Mitter, S., Waren, A.: The conjugate gradient method for optimal control problems. IEEE Trans. Autom. Control 12(2), 132–138 (1967)

  23. Moreau, J.-J.: Décomposition orthogonale d’un espace hilbertien selon deux cônes mutuellement polaires. Bulletin de la Société Mathématique de France 93, 273–299 (1965)

  24. MOSEK ApS: The MOSEK optimization toolbox for MATLAB manual, version 8.0 (revision 57) (2017)

  25. Malick, J., Sendov, H.: Clarke generalized Jacobian of the projection onto the cone of positive semidefinite matrices. Set-Valued Anal. 14(3), 273–293 (2006)

  26. Nash, S.: A survey of truncated-Newton methods. J. Comput. Appl. Math. 124(1–2), 45–59 (2000)

  27. Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, Berlin (2006)

  28. Numba Development Team: Numba. http://numba.pydata.org (2015). Accessed 4 Mar 2019

  29. O’Donoghue, B., Chu, E., Parikh, N., Boyd, S.: Conic optimization via operator splitting and homogeneous self-dual embedding. J. Optim. Theory Appl. 169(3), 1042–1068 (2016)

  30. Oliphant, T.: A Guide to NumPy, vol. 1. Trelgol Publishing, Spanish Fork (2006)

  31. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2014)

  32. Permenter, F., Friberg, H.A., Andersen, E.D.: Solving conic optimization problems via self-dual embedding and facial reduction: a unified approach. SIAM J. Optim. 27(3), 1257–1282 (2017)

  33. Paige, C., Saunders, M.: LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Softw. 8(1), 43–71 (1982)

  34. Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58(3, Ser. A), 353–367 (1993)

  35. Qi, L., Sun, D.: A survey of some nonsmooth equations and smoothing Newton methods. In: Progress in Optimization, Applied Optimization, vol. 30, pp. 121–146. Kluwer (1999)

  36. Rockafellar, R.: Convex Analysis. Princeton University Press, Princeton (1970)

  37. Rockafellar, R., Wets, R.: Variational Analysis. Springer, Berlin (1998)

  38. Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: an operator splitting solver for quadratic programs. ArXiv e-prints (2017)

  39. SCS: splitting conic solver, version 1.1.0. https://github.com/cvxgrp/scs (2015)

  40. Sun, D., Sun, J.: Löwner’s operator and spectral functions in Euclidean Jordan algebras. Math. Oper. Res. 33(2), 421–445 (2008)

  41. Sturm, J.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11(1–4), 625–653 (1999)

  42. Sylvester, J.: Sur l’équation linéaire trinôme en matrices d’un ordre quelconque. Comptes Rendus de l’Académie des Sciences 99, 527–529 (1884)

  43. Taylor, J.: Convex Optimization of Power Systems. Cambridge University Press, Cambridge (2015)

  44. Themelis, A., Patrinos, P.: SuperMann: a superlinearly convergent algorithm for finding fixed points of nonexpansive operators. IEEE Trans. Autom. Control (2019). https://doi.org/10.1109/TAC.2019.2906393

  45. Udell, M., Mohan, K., Zeng, D., Hong, J., Diamond, S., Boyd, S.: Convex optimization in Julia. In: SC14 Workshop on High Performance Technical Computing in Dynamic Languages (2014)

  46. Wright, S., Holt, J.: An inexact Levenberg–Marquardt method for large sparse nonlinear least squares. ANZIAM J. 26(4), 387–403 (1985)

  47. Ye, Y., Todd, M., Mizuno, S.: An \({O}(\sqrt{n}{L})\)-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res. 19(1), 53–67 (1994)


Acknowledgements

The authors thank Yinyu Ye, Michael Saunders, Nicholas Moehle, and Steven Diamond for useful discussions.

Author information

Corresponding author

Correspondence to Walaa M. Moursi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Differentiability properties of the residual map

Let C be a nonempty closed convex subset of \({\mathbf{R}}^n\). It is well known that the projection \(\Pi _C\) onto C is (firmly) nonexpansive (see, e.g., [5, Proposition 2]), hence it is Lipschitz continuous with a Lipschitz constant at most 1. Consequently, if \(A: {\mathbf{R}}^n\rightarrow {\mathbf{R}}^m\) is linear then the composition \(A\circ \Pi _C\) is also Lipschitz continuous. Therefore, by the Rademacher theorem (see, e.g., [37, Theorem 9.60] or [12, Theorem 3.2]) both \(\Pi _C\) and \(A\circ \Pi _C\) are differentiable almost everywhere. This allows us to conclude that the residual map (7) is differentiable almost everywhere. Moreover, let \(z\in {\mathbf{R}}^{m+n+1}\). Clearly \({\mathcal {R}}\) is differentiable at z if \(\Pi \) is differentiable at z.
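A quick numerical spot-check of the nonexpansiveness underlying this argument, for the simple case of the nonnegative orthant \(C = {\mathbf{R}}^n_+\) (an illustrative sketch, not part of the paper):

```python
import numpy as np

# Projection onto the nonnegative orthant C = R^n_+.
proj = lambda u: np.maximum(u, 0.0)

rng = np.random.default_rng(0)
for _ in range(1000):
    u, v = rng.normal(size=5), rng.normal(size=5)
    # Nonexpansiveness: ||Pi_C(u) - Pi_C(v)|| <= ||u - v||.
    assert np.linalg.norm(proj(u) - proj(v)) <= np.linalg.norm(u - v) + 1e-12
```

Nonexpansiveness is exactly the Lipschitz-constant-1 property used above, which feeds the Rademacher argument for almost-everywhere differentiability.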

Appendix B

Semi-definite cone projection derivative

Let \(X\in {\mathbf{S}}^n\), let \(X=U\mathbf {diag} (\lambda ) U^T\) be an eigendecomposition of X, and suppose that \(\det (X)\ne 0\). Without loss of generality, we can and do assume that the entries of \(\lambda \) are in increasing order. That is, there exists \(k\in \{1,\ldots , n\}\) such that

$$\begin{aligned} \lambda _1\le \cdots \le \lambda _k<0<\lambda _{k+1} \le \cdots \le \lambda _n. \end{aligned}$$
(17)

We also note that

$$\begin{aligned} \Pi X - X = U\mathbf {diag} (\lambda _- ) U^T, \end{aligned}$$
(18)

where \(\lambda _- = -\min (\lambda , 0)\). It follows from (11), (18), and the orthogonality of U that

$$\begin{aligned} U^T\Pi X U = \mathbf {diag} (\lambda _+ ), \quad U^T(\Pi X -X)U= \mathbf {diag} (\lambda _-). \end{aligned}$$
(19)

Note that

$$\begin{aligned} \Pi X (\Pi X - X) =U \mathbf {diag}(\lambda _+ ) \mathbf {diag} (\lambda _- ) U^T= 0. \end{aligned}$$
(20)

Let \({\mathsf {D}} {\Pi }(X): {\mathbf{S}}^n \rightarrow {\mathbf{S}}^n\) be the derivative of \(\Pi \) at X, and let \({\widetilde{X}}\in {\mathbf{S}}^n\). We now show that (12) holds.

Indeed, using the first order Taylor approximation of \(\Pi \) around X, for \(\Delta X\in {\mathbf{S}}^n\) such that \(||\Delta X||_F\) is sufficiently small (here \(||\cdot ||_F\) denotes the Frobenius norm) we have

$$\begin{aligned} \Pi (X + \Delta X) \approx \Pi X + {\mathsf {D}} \Pi (X)(\Delta X). \end{aligned}$$
(21)

To simplify the notation, we set \(\Delta Y={\mathsf {D}} \Pi (X)(\Delta X)\). Now

$$\begin{aligned} 0&=\Pi (X + \Delta X) (\Pi (X + \Delta X) - X - \Delta X) \end{aligned}$$
(22a)
$$\begin{aligned}&\approx (\Pi X + \Delta Y) (\Pi X + \Delta Y - X - \Delta X) \end{aligned}$$
(22b)
$$\begin{aligned}&=\Pi X(\Pi X-X) +\Delta Y(\Pi X-X) +\Pi X(\Delta Y-\Delta X) +\Delta Y (\Delta Y-\Delta X) \nonumber \\&\approx \Pi X(\Delta Y-\Delta X) + \Delta Y(\Pi X-X) \end{aligned}$$
(22c)
$$\begin{aligned}&\approx U^T\Pi X(\Delta Y-\Delta X)U + U^T\Delta Y(\Pi X-X)U \end{aligned}$$
(22d)
$$\begin{aligned}&=(U^T\Pi XU) U^T(\Delta Y-\Delta X)U + U^T\Delta YU (U^T(\Pi X-X)U) \end{aligned}$$
(22e)
$$\begin{aligned}&=\mathbf {diag} (\lambda _+ ) U^T(\Delta Y-\Delta X)U + U^T\Delta YU (\mathbf {diag} (\lambda _- )). \end{aligned}$$
(22f)

Here, (22a) follows from applying (20) with X replaced by \(X + \Delta X\), (22b) follows from combining (22a) and (21), (22c) follows from (20) by neglecting second order terms, (22d) follows from multiplying (22c) from the left by \(U^T\) and from the right by U, (22e) follows from the fact that \(UU^T=I\), and finally (22f) follows from (19). We rewrite the Sylvester equation [17, 42] (22f) as

$$\begin{aligned} \mathbf {diag} (\lambda _+ ) U^T\Delta YU + U^T\Delta YU \mathbf {diag} (\lambda _-) \approx \mathbf {diag} (\lambda _+ ) U^T\Delta X U. \end{aligned}$$
(23)

Using (23), we learn that for any \(i \in \{1, \ldots , n\}\) and \(j \in \{1, \ldots , n\}\), we have

$$\begin{aligned} ((\lambda _-)_j +(\lambda _+)_i)(U^T\Delta Y U)_{ij} \approx (\lambda _+)_i(U^T\Delta X U)_{ij} . \end{aligned}$$

Recalling (17), if \(i \le k, \, j > k\) we have \((\lambda _-)_j = (\lambda _+)_i=0\). Otherwise, \((\lambda _-)_j +(\lambda _+)_i\ne 0 \) and

$$\begin{aligned} (U^T\Delta Y U)_{ij} \approx \underbrace{\frac{(\lambda _+)_i}{(\lambda _-)_j +(\lambda _+)_i}}_{=B_{ij}} (U^T\Delta X U)_{ij} . \end{aligned}$$
(24)

Proceeding by cases in view of (17), and using that \(\Delta Y \) is symmetric (so is \(U^T\Delta Y U\)), we conclude that

$$\begin{aligned} B_{ij} = {\left\{ \begin{array}{ll} 0, &{}~~\text {if}~~i \le k, \, j \le k; \\ \frac{(\lambda _+)_i}{(\lambda _-)_j+(\lambda _+)_i}, &{} ~~\text {if}~~i> k, \, j \le k; \\ \frac{(\lambda _+)_j}{(\lambda _-)_i+(\lambda _+)_j}, &{}~~\text {if}~~i \le k, \, j> k; \\ 1,&{}~~\text {if}~~i> k, \, j > k.\\ \end{array}\right. } \end{aligned}$$

Therefore, combining with (24) we obtain

$$\begin{aligned} U^T\Delta Y U \approx B \circ (U^T\Delta X U ), \end{aligned}$$

where “\(\circ \)” denotes the Hadamard (i.e., entrywise) product. Recalling the definition of \(\Delta Y\) and using that \(UU^T=I\) we conclude that

$$\begin{aligned} {\mathsf {D}} \Pi (X)(\Delta X) \approx U (B \circ (U^T\Delta X U )) U^T. \end{aligned}$$

Letting \(||\Delta X||_F\rightarrow 0\) and applying the implicit function theorem, we conclude that (12) holds.
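For concreteness, the formula above can be sketched in NumPy as follows (an illustrative implementation for nonsingular X with both positive and negative eigenvalues; the function names are ours, not the paper's):

```python
import numpy as np

def proj_psd(X):
    """Projection of a symmetric matrix onto the PSD cone."""
    lam, U = np.linalg.eigh(X)
    return (U * np.maximum(lam, 0.0)) @ U.T

def dproj_psd(X, dX):
    """Apply the derivative of the PSD projection at nonsingular X to dX,
    via D Pi(X)(dX) = U (B o (U^T dX U)) U^T."""
    lam, U = np.linalg.eigh(X)              # eigenvalues in increasing order
    lam_p = np.maximum(lam, 0.0)            # lambda_+
    lam_m = np.maximum(-lam, 0.0)           # lambda_-
    den = lam_p[:, None] + lam_m[None, :]   # (lambda_+)_i + (lambda_-)_j
    B = np.where(den > 0, lam_p[:, None] / np.where(den > 0, den, 1.0), 0.0)
    B = np.where(den > 0, B, B.T)           # fill the 0/0 block by symmetry of B
    return U @ (B * (U.T @ dX @ U)) @ U.T
```

The `np.where` pair handles the block where both \((\lambda _+)_i\) and \((\lambda _-)_j\) vanish, filling it from the transposed entries exactly as in the case table for B.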

Appendix C

Exponential cone projection derivative

The Lagrangian of the constrained optimization problem (13) is

$$\begin{aligned} \tfrac{1}{2}||(x,y,z) - ({\overline{x}},{\overline{y}},{\overline{z}})||^2 +\mu ({\overline{y}} e^{{\overline{x}}/{\overline{y}}}-{\overline{z}}), \end{aligned}$$

where \(\mu \in {\mathbf{R}}\) is the dual variable. The KKT conditions at a solution \((x^*,y^*,z^*,\mu ^*)\) are

$$\begin{aligned} x^*-x+\mu ^*e^{x^*/y^*}&=0\nonumber \\ y^*-y+\mu ^*e^{x^*/y^*}\big (1-\tfrac{x^*}{y^*}\big )&=0\nonumber \\ z^*-z-\mu ^*&=0\nonumber \\ y^*e^{x^*/y^*}-z^*&=0. \end{aligned}$$
(25)

Considering the differentials \(dx, dy, dz\) and \(dx^*, dy^*, dz^*, d\mu ^*\) of the KKT conditions in (25), the authors of [1, Lemma 3.6] obtain the system of equations

$$\begin{aligned} \underbrace{ \begin{bmatrix} 1+\tfrac{\mu ^*e^{x^*/y^*}}{y^*}&-\tfrac{\mu ^*x^*e^{x^*/y^*}}{{y^*}^2}&0&e^{x^*/y^*} \\ -\tfrac{\mu ^*x^*e^{x^*/y^*}}{{y^*}^2}&1+\tfrac{\mu ^*{x^*}^2e^{x^*/y^*}}{{y^*}^3}&0&(1-x^*/y^*)e^{x^*/y^*} \\ 0&0&1&-1 \\ e^{x^*/y^*}&(1-x^*/y^*)e^{x^*/y^*}&-1&0 \end{bmatrix} }_{D} \underbrace{ \begin{bmatrix} dx^*\\ dy^*\\ dz^*\\ d\mu ^* \end{bmatrix} }_{du^*} =\underbrace{\begin{bmatrix} dx\\ dy\\ dz\\ 0 \end{bmatrix} }_{du}.\nonumber \\ \end{aligned}$$
(26)

Note that, since (13) is feasible, D is invertible. Therefore, \(du^*=D^{-1}(du)\). Consequently, the upper left \(3\times 3 \) block matrix of \(D^{-1}\) is the Jacobian of the projection at \((x, y, z)\) in Case 4.
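The recipe above can be sketched numerically as follows (illustrative; the helper name and the test point are ours). We assemble D at a boundary point with \(y^*>0\) and multiplier \(\mu ^*\), and read off the upper-left \(3\times 3\) block of \(D^{-1}\):

```python
import numpy as np

def exp_proj_jacobian(xs, ys, mu):
    # Assemble the KKT-differential matrix D of (26) at a solution whose
    # projection is (xs, ys, zs) with zs = ys * exp(xs / ys), dual variable mu.
    e = np.exp(xs / ys)
    D = np.array([
        [1 + mu * e / ys,      -mu * xs * e / ys**2,         0.0,  e],
        [-mu * xs * e / ys**2,  1 + mu * xs**2 * e / ys**3,  0.0,  (1 - xs / ys) * e],
        [0.0,                   0.0,                         1.0, -1.0],
        [e,                     (1 - xs / ys) * e,          -1.0,  0.0],
    ])
    # The Jacobian of the projection is the upper-left 3x3 block of D^{-1}.
    return np.linalg.inv(D)[:3, :3]
```

Since D is symmetric, so is \(D^{-1}\), and hence so is the extracted Jacobian block, consistent with the projection being the gradient of a convex function.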


About this article


Cite this article

Busseti, E., Moursi, W.M. & Boyd, S. Solution refinement at regular points of conic problems. Comput Optim Appl 74, 627–643 (2019). https://doi.org/10.1007/s10589-019-00122-9


Keywords

  • Conic programming
  • Homogeneous self-dual embedding
  • Projection operator
  • Residual map