We propose a new randomized method for solving systems of nonlinear equations, which can find sparse solutions or solutions under certain simple constraints. The scheme only takes gradients of component functions and uses Bregman projections onto the solution space of a Newton equation. In the special case of Euclidean projections, the method is known as the nonlinear Kaczmarz method. Furthermore, if the component functions are nonnegative, we are in the setting of optimization under the interpolation assumption and the method reduces to SGD with the recently proposed stochastic Polyak step size. For general Bregman projections, our method is a stochastic mirror descent method with a novel adaptive step size. We prove that in the convex setting each iteration of our method results in a smaller Bregman distance to exact solutions than the standard Polyak step. Our generalization to Bregman projections comes at the price that a convex one-dimensional optimization problem needs to be solved in each iteration. This can typically be done with globalized Newton iterations. Convergence is proved in two classical settings of nonlinearity: for convex nonnegative functions and locally for functions which fulfill the tangential cone condition. Finally, we show examples in which the proposed method outperforms similar methods with the same memory requirements.
Data availability
We do not analyze or generate any datasets, because our work proceeds within a theoretical and mathematical approach. However, the code that generates the figures in this article can be found at https://github.com/MaxiWk/Bregman-Kaczmarz.
Change history
05 April 2024
A Correction to this paper has been published: https://doi.org/10.1007/s10589-024-00570-y
The typical setting in convergence analysis will be that \(\varphi \) is \(\sigma \)-strongly convex with respect to a norm \(\Vert \cdot \Vert \), and \(\Vert \cdot \Vert _*\) will be its dual norm.
Appendix A: Newton’s method for line search problem (11)
We compute the Newton update for problem (11) for general \(\varphi \) with \(C^2\)-smooth conjugate \(\varphi ^*\), using the first derivative \(g_{i_k,x_k^*}'\) and the second derivative \(g_{i_k,x_k^*}''\) of the function \(g_{i_k,x_k^*}\) from (19). If \(g_{i_k,x_k^*}''(t_{k,l})>0\), Newton’s method for (11) reads \(t_{k,l+1} = t_{k,l} - g_{i_k,x_k^*}'(t_{k,l}) / g_{i_k,x_k^*}''(t_{k,l})\).
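For orientation, the derivatives entering the Newton update can be sketched as follows. The sketch assumes that \(g_{i_k,x_k^*}\) is the objective of a Bregman projection onto a hyperplane, written in the dual variable, and that \(x_k = \nabla \varphi ^*(x_k^*)\). The shorthands \(a_k\) and \(\beta _k\) are introduced here for illustration only, and the exact form of (19) should be checked against the main text.

```latex
% Assumed form of the line search objective; a_k and \beta_k are illustrative shorthands.
\begin{align*}
  a_k &:= \nabla f_{i_k}(x_k), \qquad
  \beta_k := \langle a_k, x_k\rangle - f_{i_k}(x_k), \\
  g_{i_k,x_k^*}(t) &= \varphi^*(x_k^* - t\,a_k) + t\,\beta_k, \\
  g_{i_k,x_k^*}'(t) &= \beta_k - \langle a_k, \nabla\varphi^*(x_k^* - t\,a_k)\rangle, \\
  g_{i_k,x_k^*}''(t) &= \langle a_k, \nabla^2\varphi^*(x_k^* - t\,a_k)\,a_k\rangle .
\end{align*}
```

Under these assumptions, \(g_{i_k,x_k^*}'(0) = -f_{i_k}(x_k)\), and for \(\varphi = \tfrac{1}{2}\Vert \cdot \Vert _2^2\) the derivative is affine in \(t\) with root \(t = f_{i_k}(x_k)/\Vert \nabla f_{i_k}(x_k)\Vert _2^2\), which is the \(\ell _2\)-projection step size.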
As an initial value we use the step size \(t_{k,0}:= \frac{ f_{i_k}(x_k)}{\Vert \nabla f_{i_k}(x_k)\Vert _2^2}\) from the \(\ell _2\)-projection of \(x_k\) onto \(H_k\). We propose to stop the method if \(|g_{i_k,x_k^*}'(t_{k,l})|<\epsilon \). Typical values we used for our numerical examples were \(\epsilon \in \{ 10^{-5}, 10^{-6}, 10^{-9}, 10^{-15}\}\).
It may happen that problem (11) is ill-conditioned, in which case the Newton iterates \(t_{k,l}\) may diverge quickly to \(\pm \infty \) or alternate between two values. We have observed that this can happen, e.g., for the problem on left stochastic decomposition in Subsection 5.3 if the number \(m\) of rows of the matrix \(X\) in the problem is small.
In case the Newton method diverges, we used the recently proposed globalized Newton method from [42], which involves a fixed constant \(H>0\). Here too, we stop if \(|g_{i_k,x_k^*}'(t_{k,l})|<\epsilon \). Convergence of the \(t_{k,l}\) for \(l\rightarrow \infty \) is guaranteed if \(\varphi ^*\) is strongly convex, i.e. if \(\varphi \) is everywhere finite with Lipschitz continuous gradient. Moreover, the values \(g_{i_k,x_k^*}(t_{k,l})\) are guaranteed to converge to the minimum value if \(\varphi ^*\) has Lipschitz continuous Hessian [42]. We have also observed good convergence for the negative entropy function on \(\mathbb {R}_{\ge 0}^d\) with this method when Newton’s method is unstable. For problems constrained to the probability simplex \(\Delta ^{d-1}\), the globalized Newton method converged more slowly than the vanilla Newton method. For the problem in Subsection 5.3 with \((r,m)=(3,100)\) we chose \(H=0.1\). In addition, we performed a relaxed Bregman projection (line 10 of Algorithm 1) with step size (12) if \(|t_{k,l}|>100\).
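A safeguarded line search of this kind can be sketched as follows. The sketch assumes that the update from [42] is a gradient-regularized Newton step of the form \(t_{k,l+1} = t_{k,l} - g'(t_{k,l}) / \big(g''(t_{k,l}) + \sqrt{H\,|g'(t_{k,l})|}\big)\); this form, the function name, the status strings, and the test function \(g(t)=\sqrt{1+t^2}\) are assumptions made for illustration and are not taken from the paper.

```python
import math

def regularized_newton_line_search(dg, d2g, t0, H=0.1, eps=1e-9,
                                   max_iter=1000, t_max=100.0):
    """1D regularized Newton iteration (assumed form of the method in [42]):

        t <- t - g'(t) / (g''(t) + sqrt(H * |g'(t)|)).

    Returns (t, status) with status in {"converged", "fallback", "max_iter"}.
    "fallback" signals |t| > t_max, where the caller would switch to a
    relaxed projection step instead of continuing the line search.
    """
    t = t0
    for _ in range(max_iter):
        d = dg(t)
        if abs(d) < eps:
            return t, "converged"
        # For convex g the denominator is positive, since |d| >= eps > 0 here.
        t = t - d / (d2g(t) + math.sqrt(H * abs(d)))
        if abs(t) > t_max:
            return t, "fallback"
    return t, "max_iter"

# Hypothetical test function (not from the paper): g(t) = sqrt(1 + t^2).
# Plain Newton maps t to -t^3 here, so it diverges for |t| > 1 and
# alternates between +1 and -1 when started at |t| = 1.
dg_toy = lambda t: t / math.sqrt(1.0 + t * t)
d2g_toy = lambda t: (1.0 + t * t) ** -1.5

t_reg, status = regularized_newton_line_search(dg_toy, d2g_toy, t0=2.0, H=0.1)
```

The regularization term vanishes as \(g'\) approaches zero, so the step approaches the plain Newton step near the minimizer while remaining damped far from it.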
