
A Bregman–Kaczmarz method for nonlinear systems of equations

Computational Optimization and Applications

A Correction to this article was published on 05 April 2024

This article has been updated

Abstract

We propose a new randomized method for solving systems of nonlinear equations, which can find sparse solutions or solutions under certain simple constraints. The scheme only takes gradients of component functions and uses Bregman projections onto the solution space of a Newton equation. In the special case of Euclidean projections, the method is known as the nonlinear Kaczmarz method. Furthermore, if the component functions are nonnegative, we are in the setting of optimization under the interpolation assumption and the method reduces to SGD with the recently proposed stochastic Polyak step size. For general Bregman projections, our method is a stochastic mirror descent with a novel adaptive step size. We prove that in the convex setting each iteration of our method results in a smaller Bregman distance to exact solutions than the standard Polyak step. Our generalization to Bregman projections comes at the price that a convex one-dimensional optimization problem needs to be solved in each iteration, which can typically be done with globalized Newton iterations. Convergence is proved in two classical settings of nonlinearity: for convex nonnegative functions and locally for functions which fulfill the tangential cone condition. Finally, we show examples in which the proposed method outperforms similar methods with the same memory requirements.
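To make the Euclidean special case concrete, the following minimal sketch implements the nonlinear Kaczmarz update, i.e. SGD with the stochastic Polyak step size. The interface (Python lists of component functions and gradients) and all names are our own illustrative choices and not the authors' implementation; see the repository linked below for the actual code.

```python
import numpy as np

def nonlinear_kaczmarz(f, grad_f, x0, n_iter=1000, seed=0):
    """Euclidean special case: at each step, project the iterate onto the
    solution space of the Newton equation f_i(x) + <grad f_i(x), y - x> = 0
    for a randomly chosen component i (a Polyak-type step)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        i = rng.integers(len(f))
        g = grad_f[i](x)
        denom = np.dot(g, g)
        if denom > 0.0:                    # skip components with vanishing gradient
            x -= (f[i](x) / denom) * g     # step size f_i(x) / ||grad f_i(x)||^2
    return x
```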


Data availability

We do not analyze or generate any datasets, because our work proceeds within a theoretical and mathematical approach. However, the code that generates the figures in this article can be found at https://github.com/MaxiWk/Bregman-Kaczmarz.

Notes

  1. The typical setting in convergence analysis will be that \(\varphi \) is \(\sigma \)-strongly convex with respect to a norm \(\Vert \cdot \Vert \), and \(\Vert \cdot \Vert _*\) will be its dual norm.

References

1. Alber, Y., Butnariu, D.: Convergence of Bregman projection methods for solving consistent convex feasibility problems in reflexive Banach spaces. J. Optim. Theory Appl. 92(1), 33–61 (1997)
2. Arora, R., Gupta, M.R., Kapila, A., Fazel, M.: Similarity-based clustering by left-stochastic matrix factorization. J. Mach. Learn. Res. 14(1), 1715–1746 (2013)
3. Azizan, N., Lale, S., Hassibi, B.: Stochastic mirror descent on overparameterized nonlinear models. IEEE Trans. Neural Netw. Learn. Syst. 33(12), 7717–7727 (2021)
4. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42(2), 596–636 (2003)
5. Bauschke, H.H., Borwein, J.M.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4(1), 27–67 (1997)
6. Bauschke, H.H., Combettes, P.L.: Iterating Bregman retractions. SIAM J. Optim. 13(4), 1159–1173 (2003)
7. Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces, vol. 408. Springer, Berlin (2011)
8. Beck, A.: First-order methods in optimization. SIAM (2017)
9. Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)
10. Boţ, R.I., Hein, T.: Iterative regularization with a general penalty term: theory and application to L1 and TV regularization. Inverse Prob. 28(10), 104010 (2012)
11. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)
12. Brucker, P.: An O(n) algorithm for quadratic knapsack problems. Oper. Res. Lett. 3(3), 163–166 (1984)
13. Butnariu, D., Iusem, A.N., Zalinescu, C.: On uniform convexity, total convexity and convergence of the proximal point and outer Bregman projection algorithm in Banach spaces. J. Convex Anal. 10(1), 35–62 (2003)
14. Butnariu, D., Resmerita, E.: The outer Bregman projection method for stochastic feasibility problems in Banach spaces. In: Studies in Computational Mathematics, vol. 8, pp. 69–86. Elsevier, Amsterdam (2001)
15. Butnariu, D., Resmerita, E.: Bregman distances, totally convex functions, and a method for solving operator equations in Banach spaces. In: Abstract and Applied Analysis, vol. 2006. Hindawi (2006)
16. Censor, Y., Elfving, T., Herman, G.: Averaging strings of sequential iterations for convex feasibility problems. In: Studies in Computational Mathematics, vol. 8, pp. 101–113. Elsevier, Amsterdam (2001)
17. Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim. Theory Appl. 34(3), 321–353 (1981)
18. Censor, Y., Reich, S.: Iterations of paracontractions and firmly nonexpansive operators with applications to feasibility and optimization. Optimization 37(4), 323–339 (1996)
19. Cetin, A.E.: Reconstruction of signals from Fourier transform samples. Signal Process. 16(2), 129–148 (1989)
20. Cetin, A.E.: An iterative algorithm for signal reconstruction from bispectrum. IEEE Trans. Signal Process. 39(12), 2621–2628 (1991)
21. Dhillon, I.S., Tropp, J.A.: Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl. 29(4), 1120–1146 (2008)
22. Doikov, N., Nesterov, Y.: Gradient regularization of Newton method with Bregman distances. arXiv preprint arXiv:2112.02952 (2021)
23. D'Orazio, R., Loizou, N., Laradji, I., Mitliagkas, I.: Stochastic mirror descent: convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. arXiv preprint arXiv:2110.15412 (2021)
24. Duchi, J., Shalev-Shwartz, S., Singer, Y., Chandra, T.: Efficient projections onto the \(\ell _1\)-ball for learning in high dimensions. In: Proceedings of the 25th International Conference on Machine Learning, pp. 272–279 (2008)
25. Fedotov, A.A., Harremoës, P., Topsoe, F.: Refinements of Pinsker's inequality. IEEE Trans. Inf. Theory 49(6), 1491–1498 (2003)
26. Gafni, E.M., Bertsekas, D.P.: Two-metric projection methods for constrained optimization. SIAM J. Control Optim. 22(6), 936–964 (1984)
27. Gower, R.M., Blondel, M., Gazagnadou, N., Pedregosa, F.: Cutting some slack for SGD with adaptive Polyak stepsizes. arXiv preprint arXiv:2202.12328 (2022)
28. Gu, R., Han, B., Tong, S., Chen, Y.: An accelerated Kaczmarz type method for nonlinear inverse problems in Banach spaces with uniformly convex penalty. J. Comput. Appl. Math. 385, 113211 (2021)
29. Hanke, M., Neubauer, A., Scherzer, O.: A convergence analysis of the Landweber iteration for nonlinear ill-posed problems. Numer. Math. 72(1), 21–37 (1995)
30. Iusem, A.N., Solodov, M.V.: Newton-type methods with generalized distances for constrained optimization. Optimization 41(3), 257–278 (1997)
31. Jarman, B., Yaniv, Y., Needell, D.: Online signal recovery via heavy ball Kaczmarz. arXiv preprint arXiv:2211.06391 (2022)
32. Jin, Q.: Landweber-Kaczmarz method in Banach spaces with inexact inner solvers. Inverse Prob. 32(10), 104005 (2016)
33. Jin, Q., Lu, X., Zhang, L.: Stochastic mirror descent method for linear ill-posed problems in Banach spaces. Inverse Prob. 39(6), 065010 (2023)
34. Jin, Q., Wang, W.: Landweber iteration of Kaczmarz type with general non-smooth convex penalty functionals. Inverse Prob. 29(8), 085011 (2013)
35. Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Internat. Acad. Polon. Sci. Lettres A, pp. 355–357 (1937)
36. Kostic, V., Salzo, S.: The method of Bregman projections in deterministic and stochastic convex feasibility problems. arXiv preprint arXiv:2101.01704 (2021)
37. Lin, T., Ho, N., Jordan, M.: On efficient optimal transport: an analysis of greedy and accelerated mirror descent algorithms. In: International Conference on Machine Learning, pp. 3982–3991. PMLR (2019)
38. Loizou, N., Vaswani, S., Laradji, I.H., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: an adaptive learning rate for fast convergence. In: International Conference on Artificial Intelligence and Statistics, pp. 1306–1314. PMLR (2021)
39. Lorenz, D.A., Schöpfer, F., Wenger, S.: The linearized Bregman method via split feasibility problems: analysis and generalizations. SIAM J. Imag. Sci. 7(2), 1237–1262 (2014)
40. Ma, S., Bassily, R., Belkin, M.: The power of interpolation: understanding the effectiveness of SGD in modern over-parametrized learning. In: International Conference on Machine Learning, pp. 3325–3334. PMLR (2018)
41. Maaß, P., Strehlow, R.: An iterative regularization method for nonlinear problems based on Bregman projections. Inverse Prob. 32(11), 115013 (2016)
42. Mishchenko, K.: Regularized Newton method with global \(O(1/k^2)\) convergence. arXiv preprint arXiv:2112.02089 (2021)
43. Nedić, A.: Random algorithms for convex minimization problems. Math. Program. 129(2), 225–253 (2011)
44. Nemirovskij, A.S., Yudin, D.B.: Problem complexity and method efficiency in optimization. Wiley, Hoboken (1983)
45. Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
46. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4(2), 460–489 (2005)
47. Pinsker, M.S.: Information and information stability of random variables and processes (in Russian). Holden-Day (1964)
48. Polyak, B., Tremba, A.: New versions of Newton method: step-size choice, convergence domain and under-determined equations. Optim. Methods Softw. 35(6), 1272–1303 (2020)
49. Polyak, B., Tremba, A.: Sparse solutions of optimal control via Newton method for under-determined systems. J. Global Optim. 76(3), 613–623 (2020)
50. Reem, D., Reich, S., De Pierro, A.: Re-examination of Bregman functions and new properties of their divergences. Optimization 68(1), 279–348 (2019)
51. Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Optimizing Methods in Statistics, pp. 233–257. Elsevier, Amsterdam (1971)
52. Rockafellar, R.T.: Convex analysis, vol. 36. Princeton University Press, Princeton (1970)
53. Schöpfer, F., Lorenz, D.A.: Linear convergence of the randomized sparse Kaczmarz method. Math. Program. 173(1), 509–536 (2019)
54. Schöpfer, F., Lorenz, D.A., Tondji, L., Winkler, M.: Extended randomized Kaczmarz method for sparse least squares and impulsive noise problems. Linear Algebra Appl. 652, 132–154 (2022)
55. Schöpfer, F., Louis, A.K., Schuster, T.: Nonlinear iterative methods for linear ill-posed problems in Banach spaces. Inverse Prob. 22(1), 311 (2006)
56. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14(1), 567–599 (2013)
57. Tondji, L., Lorenz, D.A.: Faster randomized block sparse Kaczmarz by averaging. Numerical Algorithms, pp. 1–35 (2022)
58. Wang, Q., Li, W., Bao, W., Gao, X.: Nonlinear Kaczmarz algorithms and their convergence. J. Comput. Appl. Math. 399, 113720 (2022)
59. Wang, W., Carreira-Perpinán, M.A.: Projection onto the probability simplex: an efficient algorithm with a simple proof, and an application. arXiv preprint arXiv:1309.1541 (2013)
60. You, J.-K., Cheng, H.-C., Li, Y.-H.: Minimizing quantum Rényi divergences via mirror descent with Polyak step size. In: 2022 IEEE International Symposium on Information Theory (ISIT), pp. 252–257. IEEE (2022)
61. You, J.-K., Li, Y.-H.: Two Polyak-type step sizes for mirror descent. arXiv preprint arXiv:2210.01532 (2022)
62. Yuan, R., Lazaric, A., Gower, R.M.: Sketched Newton-Raphson. SIAM J. Optim. 32(3), 1555–1583 (2022)
63. Zalinescu, C.: Convex analysis in general vector spaces. World Scientific (2002)
64. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)
65. Zhou, Z., Mertikopoulos, P., Bambos, N., Boyd, S., Glynn, P.W.: Stochastic mirror descent in variationally coherent optimization problems. Adv. Neural Inf. Process. Syst. 30, 7040–7049 (2017)

Author information

Corresponding author

Correspondence to Maximilian Winkler.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: a typo in Theorem 4.20 has been corrected.

Appendix A: Newton’s method for line search problem (11)

We compute the Newton update for problem (11) for general \(\varphi \) with \(C^2\)-smooth conjugate \(\varphi ^*\). The function \(g_{i_k,x_k^*}\) from (19) has first derivative

$$\begin{aligned} g_{i_k,x_k^*}'(t)&= \big \langle \nabla \varphi ^*(x_k^*-t\nabla f_{i_k}(x_k)), -\nabla f_{i_k}(x_k)\big \rangle + \beta _k \\&=\big \langle x_k - \nabla \varphi ^*(x_k^* - t\nabla f_{i_k}(x_k)), \ \nabla f_{i_k}(x_k) \big \rangle - f_{i_k}(x_k) \end{aligned}$$

and second derivative

$$\begin{aligned} g_{i_k,x_k^*}''(t) = \big \langle \nabla ^2 \varphi ^*(x_k^* - t\nabla f_{i_k}(x_k)) \nabla f_{i_k}(x_k), \ \nabla f_{i_k}(x_k) \big \rangle \ge 0. \end{aligned}$$

If \(g_{i_k,x_k^*}''(t)>0\) holds, Newton’s method for (11) reads

$$\begin{aligned} t_{k,l+1} = t_{k,l} - \frac{g_{i_k,x_k^*}'(t_{k,l})}{g_{i_k,x_k^*}''(t_{k,l})}. \end{aligned}$$

As an initial value we use the step size \(t_{k,0}:= \frac{ f_{i_k}(x_k)}{\Vert \nabla f_{i_k}(x_k)\Vert _2^2}\) from the \(\ell _2\)-projection of \(x_k\) onto \(H_k\). We propose to stop the method if \(|g_{i_k,x_k^*}'(t_{k,l})|<\epsilon \). Typical values we used for our numerical examples were \(\epsilon \in \{ 10^{-5}, 10^{-6}, 10^{-9}, 10^{-15}\}\).
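A compact sketch of this plain Newton iteration, using the \(\ell _2\)-projection step size as initial value and the stopping rule \(|g_{i_k,x_k^*}'(t)|<\epsilon \), could look as follows. The argument names are ours; `grad_phi_conj` and `hess_phi_conj` stand for \(\nabla \varphi ^*\) and \(\nabla ^2\varphi ^*\), which have to be supplied for the chosen \(\varphi \).

```python
import numpy as np

def newton_step_size(xk, xk_dual, grad_fi, fi, grad_phi_conj, hess_phi_conj,
                     eps=1e-9, max_iter=50):
    """Newton's method for the one-dimensional dual problem (11).
    xk: current primal iterate, xk_dual: current dual iterate,
    grad_fi = grad f_{i_k}(xk), fi = f_{i_k}(xk)."""
    def d1(t):   # g'(t) = <xk - grad phi*(xk_dual - t*grad_fi), grad_fi> - fi
        return np.dot(xk - grad_phi_conj(xk_dual - t * grad_fi), grad_fi) - fi
    def d2(t):   # g''(t) = <hess phi*(xk_dual - t*grad_fi) grad_fi, grad_fi> >= 0
        return np.dot(hess_phi_conj(xk_dual - t * grad_fi) @ grad_fi, grad_fi)
    t = fi / np.dot(grad_fi, grad_fi)        # initial value: l2-projection step size
    for _ in range(max_iter):
        if abs(d1(t)) < eps:                 # stopping criterion |g'(t)| < eps
            break
        t -= d1(t) / d2(t)                   # Newton update (assumes g''(t) > 0)
    return t
```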

Problem (11) may be ill-conditioned, in which case the Newton iterates \(t_{k,l}\) can diverge quickly to \(\pm \infty \) or alternate between two values. We have observed this, for example, for the left stochastic decomposition problem in Subsection 5.3 when the number \(m\) of rows of the matrix \(X\) is small.

In case the Newton method diverges, we used the recently proposed globalized Newton method from [42], which reads

$$\begin{aligned} t_{k,l+1} = t_{k,l} - \frac{g_{i_k,x_k^*}'(t_{k,l})}{H\cdot \sqrt{|g_{i_k,x_k^*}'(t_{k,l})|} + g_{i_k,x_k^*}''(t_{k,l})} \end{aligned}$$

with a fixed constant \(H>0\). Also here, we stop if \(|g_{i_k,x_k^*}'(t_{k,l})|<\epsilon \). If \(\varphi ^*\) is strongly convex, i.e. if \(\varphi \) is everywhere finite with Lipschitz continuous gradient, then convergence of the \(t_{k,l}\) for \(l\rightarrow \infty \) is guaranteed; if \(\varphi ^*\) additionally has a Lipschitz continuous Hessian, the values \(g_{i_k,x_k^*}(t_{k,l})\) are guaranteed to converge to the minimum value [42]. We have also observed good convergence with this method for the negative entropy function on \(\mathbb {R}_{\ge 0}^d\) when Newton’s method is unstable. For problems constrained to the probability simplex \(\Delta ^{d-1}\), the globalized Newton method converged more slowly than the vanilla Newton method. For the problem in Subsection 5.3 with \((r,m)=(3,100)\) we chose \(H=0.1\). In addition, we performed a relaxed Bregman projection (line 10 of Algorithm 1) with step size (12) if \(|t_{k,l}|>100\).
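As an illustration only, and reusing the derivative callables `d1` and `d2` from the sketch above (names are our own), the damped update from [42] could be implemented as follows:

```python
def globalized_newton_step_size(d1, d2, t0, H=0.1, eps=1e-9, max_iter=200):
    """Globalized Newton method from [42] for the same one-dimensional problem:
    the second derivative is regularized by H * sqrt(|g'(t)|), which damps the
    steps and prevents the divergence that plain Newton iterations may exhibit."""
    t = t0
    for _ in range(max_iter):
        grad = d1(t)
        if abs(grad) < eps:                  # same stopping rule as before
            break
        t -= grad / (H * abs(grad) ** 0.5 + d2(t))
    return t
```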

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gower, R., Lorenz, D.A. & Winkler, M. A Bregman–Kaczmarz method for nonlinear systems of equations. Comput Optim Appl 87, 1059–1098 (2024). https://doi.org/10.1007/s10589-023-00541-9
