Abstract
We propose a new randomized method for solving systems of nonlinear equations, which can find sparse solutions or solutions under certain simple constraints. The scheme only takes gradients of component functions and uses Bregman projections onto the solution space of a Newton equation. In the special case of Euclidean projections, the method is known as the nonlinear Kaczmarz method. Furthermore, if the component functions are nonnegative, we are in the setting of optimization under the interpolation assumption and the method reduces to SGD with the recently proposed stochastic Polyak step size. For general Bregman projections, our method is a stochastic mirror descent with a novel adaptive step size. We prove that in the convex setting each iteration of our method results in a smaller Bregman distance to exact solutions than the standard Polyak step. Our generalization to Bregman projections comes at the price that a convex one-dimensional optimization problem has to be solved in each iteration. This can typically be done with globalized Newton iterations. Convergence is proved in two classical settings of nonlinearity: for convex nonnegative functions and locally for functions which fulfill the tangential cone condition. Finally, we show examples in which the proposed method outperforms similar methods with the same memory requirements.
Data availability
We do not analyze or generate any datasets, since this work is purely theoretical and mathematical. However, the code that generates the figures in this article can be found at https://github.com/MaxiWk/Bregman-Kaczmarz.
Notes
The typical setting in convergence analysis will be that \(\varphi \) is \(\sigma \)-strongly convex with respect to a norm \(\Vert \cdot \Vert \), and \(\Vert \cdot \Vert _*\) will be its dual norm.
References
Alber, Y., Butnariu, D.: Convergence of Bregman projection methods for solving consistent convex feasibility problems in reflexive Banach spaces. J. Optim. Theory Appl. 92(1), 33–61 (1997)
Arora, R., Gupta, M.R., Kapila, A., Fazel, M.: Similarity-based clustering by left-stochastic matrix factorization. J. Mach. Learn. Res. 14(1), 1715–1746 (2013)
Azizan, N., Lale, S., Hassibi, B.: Stochastic mirror descent on overparameterized nonlinear models. IEEE Trans. Neural Netw. Learn. Systems 33(12), 7717–7727 (2021)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control. Optim. 42(2), 596–636 (2003)
Bauschke, H.H., Borwein, J.M., et al.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4(1), 27–67 (1997)
Bauschke, H.H., Combettes, P.L.: Iterating Bregman retractions. SIAM J. Optim. 13(4), 1159–1173 (2003)
Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces, vol. 408. Springer, Berlin (2011)
Beck, A.: First-order methods in optimization. SIAM (2017)
Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)
Boţ, R.I., Hein, T.: Iterative regularization with a general penalty term-theory and application to L1 and TV regularization. Inverse Prob. 28(10), 104010 (2012)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)
Brucker, P.: An O(n) algorithm for quadratic knapsack problems. Oper. Res. Lett. 3(3), 163–166 (1984)
Butnariu, D., Iusem, A.N., Zalinescu, C.: On uniform convexity, total convexity and convergence of the proximal point and outer Bregman projection algorithm in Banach spaces. J. Convex Anal. 10(1), 35–62 (2003)
Butnariu, D., Resmerita, E.: The outer Bregman projection method for stochastic feasibility problems in Banach spaces. In: Studies in computational mathematics, vol. 8, pp. 69–86. Elsevier, Amsterdam (2001)
Butnariu, D., Resmerita, E.: Bregman distances, totally convex functions, and a method for solving operator equations in Banach spaces. In: Abstract and Applied Analysis, vol. 2006. Hindawi (2006)
Censor, Y., Elfving, T., Herman, G.: Averaging strings of sequential iterations for convex feasibility problems. In: Studies in computational mathematics, vol. 8, pp. 101–113. Elsevier, Amsterdam (2001)
Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim. Theory Appl. 34(3), 321–353 (1981)
Censor, Y., Reich, S.: Iterations of paracontractions and firmly nonexpansive operators with applications to feasibility and optimization. Optimization 37(4), 323–339 (1996)
Cetin, A.E.: Reconstruction of signals from Fourier transform samples. Signal Process. 16(2), 129–148 (1989)
Cetin, A.E.: An iterative algorithm for signal reconstruction from bispectrum. IEEE Trans. Signal Process. 39(12), 2621–2628 (1991)
Dhillon, I.S., Tropp, J.A.: Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl. 29(4), 1120–1146 (2008)
Doikov, N., Nesterov, Y.: Gradient regularization of Newton method with Bregman distances. arXiv preprint arXiv:2112.02952, (2021)
D’Orazio, R., Loizou, N., Laradji, I., Mitliagkas, I.: Stochastic mirror descent: convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. arXiv preprint arXiv:2110.15412, (2021)
Duchi, J., Shalev-Shwartz, S., Singer, Y., Chandra, T.: Efficient projections onto the \(\ell _1\)-ball for learning in high dimensions. In: Proceedings of the 25th international conference on Machine learning, pp. 272–279, (2008)
Fedotov, A.A., Harremoës, P., Topsoe, F.: Refinements of Pinsker’s inequality. IEEE Trans. Inf. Theory 49(6), 1491–1498 (2003)
Gafni, E.M., Bertsekas, D.P.: Two-metric projection methods for constrained optimization. SIAM J. Control. Optim. 22(6), 936–964 (1984)
Gower, R. M., Blondel, M., Gazagnadou, N., Pedregosa, F.: Cutting some slack for SGD with adaptive Polyak stepsizes. arXiv:2202.12328 (2022)
Gu, R., Han, B., Tong, S., Chen, Y.: An accelerated Kaczmarz type method for nonlinear inverse problems in Banach spaces with uniformly convex penalty. J. Comput. Appl. Math. 385, 113211 (2021)
Hanke, M., Neubauer, A., Scherzer, O.: A convergence analysis of the Landweber iteration for nonlinear ill-posed problems. Numer. Math. 72(1), 21–37 (1995)
Iusem, N.A., Solodov, V.M.: Newton-type methods with generalized distances for constrained optimization. Optimization 41(3), 257–278 (1997)
Jarman, B., Yaniv, Y., Needell, D.: Online signal recovery via heavy ball Kaczmarz. arXiv preprint arXiv:2211.06391 (2022)
Jin, Q.: Landweber-Kaczmarz method in Banach spaces with inexact inner solvers. Inverse Prob. 32(10), 104005 (2016)
Jin, Q., Lu, X., Zhang, L.: Stochastic mirror descent method for linear ill-posed problems in Banach spaces. Inverse Prob. 39(6), 065010 (2023)
Jin, Q., Wang, W.: Landweber iteration of Kaczmarz type with general non-smooth convex penalty functionals. Inverse Prob. 29(8), 085011 (2013)
Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Internat. Acad. Polon. Sci. Lettres A, pp. 355–357 (1937)
Kostic, V., Salzo, S.: The method of Bregman projections in deterministic and stochastic convex feasibility problems. arXiv preprint arXiv:2101.01704 (2021)
Lin, T., Ho, N., Jordan, M.: On efficient optimal transport: an analysis of greedy and accelerated mirror descent algorithms. In: International conference on machine learning, pp. 3982–3991. PMLR (2019)
Loizou, N., Vaswani, S., Laradji, I. H., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. In: International Conference on Artificial Intelligence and Statistics, PMLR pp. 1306–1314 (2021)
Lorenz, D.A., Schöpfer, F., Wenger, S.: The linearized Bregman method via split feasibility problems: analysis and generalizations. SIAM J. Imag. Sci. 7(2), 1237–1262 (2014)
Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: International conference on machine learning, pp 3325–3334. PMLR (2018)
Maaß, P., Strehlow, R.: An iterative regularization method for nonlinear problems based on Bregman projections. Inverse Prob. 32(11), 115013 (2016)
Mishchenko, K.: Regularized Newton method with global \( {O} (1/k^2) \) convergence. arXiv preprint arXiv:2112.02089 (2021)
Nedić, A.: Random algorithms for convex minimization problems. Math. Program. 129(2), 225–253 (2011)
Nemirovskij, A.S., Yudin, D.B.: Problem complexity and method efficiency in optimization. Wiley, Hoboken (1983)
Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multisc. Model. Simul 4(2), 460–489 (2005)
Pinsker, M.S.: Information and information stability of random variables and processes (in Russian). Holden-Day (1964)
Polyak, B., Tremba, A.: New versions of Newton method: step-size choice, convergence domain and under-determined equations. Optimiz. Methods Software 35(6), 1272–1303 (2020)
Polyak, B., Tremba, A.: Sparse solutions of optimal control via Newton method for under-determined systems. J. Global Optim. 76(3), 613–623 (2020)
Reem, D., Reich, S., De Pierro, A.: Re-examination of Bregman functions and new properties of their divergences. Optimization 68(1), 279–348 (2019)
Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Optimizing methods in statistics, pp. 233–257. Elsevier, Amsterdam (1971)
Rockafellar, R.T.: Convex analysis, vol. 36. Princeton University Press, Princeton (1970)
Schöpfer, F., Lorenz, D.A.: Linear convergence of the randomized sparse Kaczmarz method. Math. Program. 173(1), 509–536 (2019)
Schöpfer, F., Lorenz, D.A., Tondji, L., Winkler, M.: Extended randomized Kaczmarz method for sparse least squares and impulsive noise problems. Linear Algebra Appl. 652, 132–154 (2022)
Schöpfer, F., Louis, A.K., Schuster, T.: Nonlinear iterative methods for linear ill-posed problems in Banach spaces. Inverse Prob. 22(1), 311 (2006)
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14(1), 567–599 (2013)
Tondji, L., Lorenz, D. A.: Faster randomized block sparse Kaczmarz by averaging. Numerical Algorithms, pp. 1–35 (2022)
Wang, Q., Li, W., Bao, W., Gao, X.: Nonlinear Kaczmarz algorithms and their convergence. J. Comput. Appl. Math. 399, 113720 (2022)
Wang, W., Carreira-Perpinán, M. A.: Projection onto the probability simplex: an efficient algorithm with a simple proof, and an application. arXiv preprint arXiv:1309.1541 (2013)
You, J.-K., Cheng, H.-C., Li, Y.-H.: Minimizing quantum Rényi divergences via mirror descent with Polyak step size. In: 2022 IEEE International symposium on information theory (ISIT), IEEE, pp 252–257 (2022)
You, J.-K., Li, Y.-H.: Two Polyak-type step sizes for mirror descent. arXiv preprint arXiv:2210.01532 (2022)
Yuan, R., Lazaric, A., Gower, R.M.: Sketched Newton-Raphson. SIAM J. Optim. 32(3), 1555–1583 (2022)
Zalinescu, C.: Convex analysis in general vector spaces. World Scientific (2002)
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)
Zhou, Z., Mertikopoulos, P., Bambos, N., Boyd, S., Glynn, P.W.: Stochastic mirror descent in variationally coherent optimization problems. Adv. Neural. Inf. Process. Syst. 30, 7040–7049 (2017)
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Appendix A: Newton’s method for line search problem (11)
We compute the Newton update for problem (11) for general \(\varphi \) with \(C^2\)-smooth conjugate \(\varphi ^*\). The function \(g_{i_k,x_k^*}\) from (19) has first derivative
and second derivative
If \(g_{i_k,x_k^*}''(t)>0\), Newton’s method for (11) reads
\(t_{k,l+1} = t_{k,l} - \frac{g_{i_k,x_k^*}'(t_{k,l})}{g_{i_k,x_k^*}''(t_{k,l})}\), \(l = 0,1,\dots\)
As an initial value we use the step size \(t_{k,0}:= \frac{ f_{i_k}(x_k)}{\Vert \nabla f_{i_k}(x_k)\Vert _2^2}\) from the \(\ell _2\)-projection of \(x_k\) onto \(H_k\). We propose to stop the method if \(|g_{i_k,x_k^*}'(t_{k,l})|<\epsilon \). Typical values we used for our numerical examples were \(\epsilon \in \{ 10^{-5}, 10^{-6}, 10^{-9}, 10^{-15}\}\).
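As an illustration, the following Python sketch carries out this plain Newton iteration with the \(\ell _2\)-projection step size as initial value and the stopping criterion \(|g_{i_k,x_k^*}'(t_{k,l})|<\epsilon \). The derivatives \(g_{i_k,x_k^*}'\) and \(g_{i_k,x_k^*}''\) are assumed to be supplied by the caller (their concrete form depends on the chosen \(\varphi \)), and the function and parameter names are ours, not those of the reference implementation in the linked repository.

```python
import numpy as np

def newton_line_search(g_prime, g_second, f_val, grad, eps=1e-9, max_iter=100):
    """Plain Newton iteration for the one-dimensional line search (11).

    g_prime, g_second -- callables returning g'(t) and g''(t) for the
        objective g = g_{i_k, x_k^*}; their form depends on varphi.
    f_val, grad -- f_{i_k}(x_k) and nabla f_{i_k}(x_k), used only to
        build the l2-projection step size as initial value.
    """
    t = f_val / np.dot(grad, grad)   # t_{k,0} = f_{i_k}(x_k) / ||grad||_2^2
    for _ in range(max_iter):
        d1 = g_prime(t)
        if abs(d1) < eps:            # stopping criterion |g'(t_{k,l})| < eps
            break
        d2 = g_second(t)
        if d2 <= 0:                  # Newton step not well defined; give up
            break
        t -= d1 / d2                 # Newton update
    return t
```

In the Euclidean case \(\varphi = \tfrac{1}{2}\Vert \cdot \Vert _2^2\), the Bregman projection coincides with the \(\ell _2\)-projection, so the initial value \(t_{k,0}\) already solves (11) and the loop stops immediately; this gives a quick consistency check for user-supplied derivatives.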
It may happen that problem (11) is ill-conditioned, in which case the Newton iterates \(t_{k,l}\) may diverge quickly to \(\pm \infty \) or alternate between two values. We observed this, for example, for the left stochastic decomposition problem in Subsection 5.3 when the number m of rows of the matrix X is small.
If the Newton method diverges, we used the recently proposed globalized Newton method from [42], which reads
\(t_{k,l+1} = t_{k,l} - \frac{g_{i_k,x_k^*}'(t_{k,l})}{g_{i_k,x_k^*}''(t_{k,l}) + \sqrt{H\,|g_{i_k,x_k^*}'(t_{k,l})|}}\)
with a fixed constant \(H>0\). Also here, we stop if \(|g_{i_k,x_k^*}'(t_{k,l})|<\epsilon \). Convergence of the \(t_{k,l}\) for \(l\rightarrow \infty \) is guaranteed if \(\varphi ^*\) is strongly convex, i.e. if \(\varphi \) is everywhere finite with Lipschitz continuous gradient; moreover, the values \(g_{i_k,x_k^*}(t_{k,l})\) are guaranteed to converge to the minimum value if \(\varphi ^*\) has a Lipschitz continuous Hessian [42]. We have also observed good convergence with this method for the negative entropy function on \(\mathbb {R}_{\ge 0}^d\) when Newton’s method is unstable. For problems constrained to the probability simplex \(\Delta ^{d-1}\), the globalized Newton method converged more slowly than the vanilla Newton method. For the problem in Subsection 5.3 with \((r,m)=(3,100)\) we chose \(H=0.1\). In addition, we performed a relaxed Bregman projection (line 10 of Algorithm 1) with step size (12) if \(|t_{k,l}|>100\).
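For completeness, here is a sketch of the globalized variant under the assumption that the one-dimensional analogue of the update in [42] damps \(g_{i_k,x_k^*}''(t)\) by \(\sqrt{H\,|g_{i_k,x_k^*}'(t)|}\); the safeguard threshold and the returned flag are illustrative, and the relaxed Bregman projection with step size (12) is not reproduced here.

```python
import numpy as np

def globalized_newton_line_search(g_prime, g_second, f_val, grad,
                                  H=0.1, eps=1e-9, max_iter=500, t_max=100.0):
    """Globalized Newton iteration for (11) in the spirit of [42] (sketch).

    Damps the curvature g''(t) by sqrt(H * |g'(t)|) with a fixed H > 0.
    Returns (t, converged); if |t| exceeds t_max, the caller should fall
    back to the relaxed Bregman projection with step size (12).
    """
    t = f_val / np.dot(grad, grad)   # same l2-projection initial value
    for _ in range(max_iter):
        d1 = g_prime(t)
        if abs(d1) < eps:            # stopping criterion |g'(t_{k,l})| < eps
            return t, True
        t -= d1 / (g_second(t) + np.sqrt(H * abs(d1)))
        if abs(t) > t_max:           # safeguard against divergence
            return t, False
    return t, abs(g_prime(t)) < eps
```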
Cite this article
Gower, R., Lorenz, D.A. & Winkler, M. A Bregman–Kaczmarz method for nonlinear systems of equations. Comput Optim Appl 87, 1059–1098 (2024). https://doi.org/10.1007/s10589-023-00541-9