Abstract
In this work, we study the method of randomized Bregman projections for stochastic convex feasibility problems, possibly with an infinite number of sets, in Euclidean spaces. Under very general assumptions, we prove almost sure convergence of the iterates to a random almost common point of the sets. We then analyze in depth the case of affine sets, showing that the iterates converge Q-linearly and also providing global and local rates of convergence. This work generalizes recent developments in randomized methods for the solution of linear systems based on orthogonal projection methods. We provide several applications: sketch & project methods for solving linear systems of equations, the positive definite matrix completion problem, gossip algorithms for network consensus, the assessment of robust stability of dynamical systems, and computational solutions for multimarginal optimal transport.
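For the affine case mentioned in the abstract, the Euclidean specialization (ϕ = ∥⋅∥²/2) of the randomized Bregman projection method is the randomized Kaczmarz / sketch & project iteration of Strohmer–Vershynin and Gower–Richtárik. The following is a minimal illustrative sketch of that special case (function name and iteration count are our own, not the paper's notation):

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters=2000, seed=0):
    """Randomized Kaczmarz: at each step, orthogonally project the iterate
    onto a randomly chosen hyperplane {x : <a_i, x> = b_i}. This is the
    Euclidean (phi = ||.||^2 / 2) special case of randomized Bregman
    projections onto affine sets."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    m = A.shape[0]
    # sample rows with probability proportional to ||a_i||^2 (Strohmer-Vershynin)
    p = np.sum(A**2, axis=1)
    p = p / p.sum()
    for _ in range(iters):
        i = rng.choice(m, p=p)
        a = A[i]
        # orthogonal projection onto the i-th hyperplane
        x += (b[i] - a @ x) / (a @ a) * a
    return x
```

On a consistent system the iterates converge linearly in expectation to the projection of the initial point onto the solution set, which is the behavior the paper's affine-case analysis generalizes to arbitrary Legendre functions.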
Data Availability
Data used in this article is available in a public repository.
Notes
Note that Dϕ is not a distance in the sense of metric topology, and even when ϕ(x) = (1/2)∥x∥2 it is one half of the square of the distance between x and y.
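As a small numeric illustration of this note (helper names are ours), for ϕ = ∥⋅∥²/2 the Bregman divergence is one half of the squared Euclidean distance, while for the negative entropy it is the Kullback–Leibler divergence, which is not symmetric:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Energy: phi(x) = 0.5 ||x||^2  ->  D_phi(x, y) = 0.5 ||x - y||^2.
energy = (lambda x: 0.5 * (x @ x), lambda x: x)

# Negative entropy: phi(x) = sum x_i log x_i (x > 0)
#   -> D_phi is the Kullback-Leibler divergence on the simplex (asymmetric).
negent = (lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1.0)
```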
This means that \(\nu (A,x) = {\int \limits }_{A} [D_{C_{i}}(x)/\overline {D}_{C}(x)] \mu (d i)\) if x∉C and ν(A,x) = μ(A) if x ∈ C.
References
Ash, R.B., Doléans-Dade, C.A.: Probability & Measure Theory. Academic Press, San Diego, CA, USA (2000)
Azizan, N., Hassibi, B.: Stochastic gradient/mirror descent: minimax optimality and implicit regularization. In: International Conference on Learning Representations (ICLR), pp. 1–18 (2019)
Banerjee, A., Merugu, S., Dhillon, I., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3, 615–647 (2001)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)
Bauschke, H.H., Combettes, P.L.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4, 27–67 (1997)
Bauschke, H.H., Combettes, P.L.: Iterating Bregman retractions. SIAM J. Optim. 13, 1159–1173 (2003)
Bauschke, H.H., Wang, X., Ye, J., Yuan, X.: Bregman distances and Chebyshev sets. J. Approx. Theory 159, 3–25 (2009)
Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37, A1111–A1138 (2015)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)
Butnariu, D., Censor, Y., Reich, S.: Iterative averaging of entropic projections for solving stochastic convex feasibility problems. Comput. Optim. Appl. 8, 21–39 (1997)
Butnariu, D., Flåm, S.D.: Strong convergence of expected-projection methods in Hilbert spaces. Numer. Funct. Anal. Optim. 16, 601–636 (1995)
Butnariu, D., Iusem, A., Burachik, R.: Iterative methods of solving stochastic convex feasibility problems and applications. Comput. Optim. Appl. 15, 269–307 (2000)
Calafiore, G., Polyak, B.T.: Stochastic algorithms for exact and approximate feasibility of robust LMIs. IEEE Trans. Autom. Control 46(11), 1755–1759 (2001)
Castaing, C., Valadier, M.: Convex Analysis and Measurable Multifunctions. Lecture Notes in Mathematics, vol. 580. Springer, New York (1977)
Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim. Theory Appl. 34, 321–353 (1981)
Censor, Y., Reich, S.: Iteration of paracontractions and firmly nonexpansive operators with applications to feasibility optimization. Optimization 37, 323–339 (1996)
Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, New York (1997)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3, 538–543 (1993)
Cimmino, G.: Calcolo approssimato per le soluzioni di sistemi di equazioni lineari. La Ricerca Scientifica Anno IX(2), 326–333 (1938)
Combettes, P.L.: The foundations of set theoretic estimation. Proc. IEEE 81, 182–208 (1993)
Combettes, P.L., Pesquet, J.-C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping II: mean-square and linear convergence. Math. Program. Ser. B 174(1), 433–451 (2019)
Dessein, A., Papadakis, N., Rouas, J.-L.: Regularized optimal transport and the ROT mover’s distance. J. Mach. Learn. Res. 19, 1–53 (2018)
Deutsch, F.: The method of alternating orthogonal projections. In: Singh, S. (ed.) Approximation Theory, Spline Functions and Applications. Kluwer Academic (1992)
Dhillon, I.S., Tropp, J.A.: Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl. 29, 1120–1146 (2007)
Duff, I.S., Grimes, R.G., Lewis, J.G.: Users’ guide for the Harwell-Boeing sparse matrix collection (Release I) (1992)
Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge University Press, New York (2010)
Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the β-divergence. Neural Comput. 23, 2421–2456 (2011)
Gower, R., Molitor, D., Moorman, J., Needell, D.: Adaptive sketch-and-project methods for solving linear systems. SIAM J. Matrix Anal. Appl. 42(2), 954–989 (2021)
Gower, R., Richtarik, P.: Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl. 36, 1660–1690 (2015)
Halperin, I.: The product of projection operators. Acta Sci. Math. (Szeged) 23, 96–99 (1962)
Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression. Wiley (2005)
Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci., Cl. Sci. Math., Ser. A, Sci. Math. 35, 355–357 (1937)
Kostic, V.R., Miedlar, A., Stolwijk, J.: On matrix nearness problems: distance to delocalization. SIAM J. Matrix Anal. Appl. 36, 435–460 (2015)
Hermer, N., Luke, D.R., Sturm, A.: Random function iterations for consistent stochastic feasibility. Numer. Funct. Anal. Optim. 40, 386–420 (2019)
Jelasity, M., Montresor, A., Babaoglu, O.: Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst. 23(3), 219–252 (2005)
Loizou, N., Richtárik, P.: Revisiting randomized gossip algorithms: general framework, convergence rates and novel block and accelerated protocols, pp. 1-44. arXiv:1905.08645 (2019)
Mangesius, H., Xue, X.D., Hirche, S.: Consensus driven by the geometric mean. IEEE Trans. Control Netw. Syst. 5(1), 251–261 (2016)
Martinsson, P.-G., Tropp, J.: Randomized numerical linear algebra: foundations & algorithms. Acta Numer. 29, 403–572 (2020)
Mavroforakis, C., Erdös, D., Crovella, M., Terzi, E.: Active positive-definite matrix completion. In: Proceedings of the 2017 SIAM international conference on data mining, pp. 264–272 (2017)
Mazko, A.: Matrix equations, spectral problems and stability of dynamic systems. Stability, oscillations and optimization of systems. Cambridge Scientific Publishers (2008)
Muzellec, B., Nock, R., Patrini, G., Nielsen, F.: Tsallis regularized optimal transport and ecological inference. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, pp. 2387–2393 (2017)
Necoara, I., Richtárik, P., Patrascu, A.: Randomized projection methods for convex feasibility: conditioning and convergence rates. SIAM J. Optim. 29, 2814–2852 (2019)
Nedić, A.: Random projection algorithms for convex set intersection problems. In: 49th IEEE Conference on Decision and Control (CDC), pp. 7655–7660 (2010)
Needell, D., Rebrova, E.: On block Gaussian sketching for the Kaczmarz method. Numer. Algorithms 86(1), 443–473 (2021)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)
Peyré, G., Cuturi, M.: Computational Optimal Transport: With Applications to Data Science. Now Publishers, USA (2019)
Reem, D., Reich, S., De Pierro, A.: Re-examination of Bregman functions and new properties of their divergences. Optimization 68, 279–348 (2019)
Richtárik, P., Takáč, M.: Stochastic reformulations of linear systems: algorithms and convergence theory. SIAM J. Matrix Anal. Appl. 41, 487–524 (2020)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rüschendorf, L.: Convergence of the iterative proportional fitting procedure. Ann. Statist. 23, 1160–1174 (1995)
Steinerberger, S.: Randomized Kaczmarz converges along small singular vectors. SIAM J. Matrix Anal. Appl. 42(2), 608–615 (2021)
Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262–278 (2009)
Sun, T., Tran-Dinh, Q.: Generalized self-concordant functions: a recipe for Newton-type methods. Math. Program. 178, 145–213 (2019)
Acknowledgements
We wish to thank three anonymous referees whose helpful comments led to the improvement of the originally submitted version.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Vladimir R. Kostić and Saverio Salzo contributed equally to this work.
Appendices
Appendix: A. Basic facts on the Bregman projection method
In the following, we collect a few important facts about Bregman distances generated by Legendre functions (see [4, 6, 7, 48]). Note that item (xi) follows from Taylor’s formula for ϕ.
Fact A.1
Let ϕ be a Legendre function. Then the following properties hold.
(i) (∀x ∈domϕ)(∀y ∈int(domϕ)) Dϕ(x,y) = ϕ(x) + ϕ∗(∇ϕ(y)) −〈x,∇ϕ(y)〉.

(ii) (∀y ∈int(domϕ)) Dϕ(⋅,y) is strictly convex on int(domϕ) and coercive.

(iii) (∀x,y ∈int(domϕ)) Dϕ(x,y) = 0 ⇔ x = y.

(iv) (∀x,y ∈int(domϕ)) Dϕ(x,y) + Dϕ(y,x) = 〈x − y,∇ϕ(x) −∇ϕ(y)〉≥ 0.

(v) (Three-Point Identity [19]) For every x ∈ X and y,z ∈int(domϕ), we have
$$ D_{\phi}(x,z) = D_{\phi}(x,y)+D_{\phi}(y,z) - \langle{x- y, \nabla\phi(z)-\nabla\phi(y)}\rangle. $$ (A1)

(vi) (∀x,y ∈int(domϕ)) \(D_{\phi }(x,y)=D_{\phi ^{*}}(\nabla \phi (y),\nabla \phi (x))\).

(vii) Dϕ is continuous on int(domϕ) ×int(domϕ).

(viii) Suppose that ϕ is twice differentiable on int(domϕ). Then
$$ \big(\forall x\in\textup{int}(\textup{dom}\phi),\ \nabla^{2}\phi(x)\text{ is invertible}\big)\Leftrightarrow \big(\phi^{*}\text{ is twice differentiable}\big). $$ (A2)

(ix) Suppose that domϕ∗ is open. Then, for every x ∈int(domϕ), the sublevel sets of Dϕ(x,⋅) are compact, and hence Dϕ(x,⋅) is lower semicontinuous.

(x) Suppose that domϕ∗ is open. Then, for every x ∈int(domϕ) and every sequence \((y_{k})_{k \in \mathbb {N}}\) in int(domϕ),
$$ D_{\phi}(x,y_{k}) \to 0\ \Rightarrow\ y_{k} \to x. $$ (A3)
Consequently, for every x ∈int(domϕ) and ε > 0, there exists δ > 0 such that, for every y ∈int(domϕ), Dϕ(x,y) < δ ⇒ ∥x − y∥ < ε.

(xi) If ϕ is twice differentiable on int(domϕ), then for every x,y ∈int(domϕ) there exists ξ ∈ [x,y] such that
$$ D_{\phi}(x,y)=\frac{1}{2}\langle{\nabla^{2}\phi(\xi)(x-y),x-y}\rangle. $$ (A4)
Moreover, for every y ∈int(domϕ) and every ε > 0 there exists δ > 0 such that, for every x ∈int(domϕ) with x − y∉Ker(∇2ϕ(y)),
$$ \|{x - y}\| \leq \delta\ \Rightarrow\ \left| \frac{D_{\phi}(x,y) - \frac 1 2 \langle{\nabla^{2} \phi(y) (x-y),x-y}\rangle}{\frac 1 2\langle{\nabla^{2} \phi(y) (x - y), x - y}\rangle}\right| \leq \varepsilon. $$ (A5)
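Items (iv)–(vi) of Fact A.1 can be sanity-checked numerically for the negative-entropy Legendre pair, whose conjugate formulas ϕ∗(y) = Σ exp(yᵢ − 1), ∇ϕ∗(y) = exp(y − 1) are standard (the test points below are arbitrary positive vectors of our choosing):

```python
import numpy as np

# Legendre pair: phi(x) = sum x_i log x_i (x > 0),
# phi*(y) = sum exp(y_i - 1), grad phi(x) = log x + 1, grad phi*(y) = exp(y - 1).
phi   = lambda x: np.sum(x * np.log(x))
gphi  = lambda x: np.log(x) + 1.0
phis  = lambda y: np.sum(np.exp(y - 1.0))
gphis = lambda y: np.exp(y - 1.0)

D  = lambda x, y: phi(x) - phi(y) - gphi(y) @ (x - y)    # D_phi
Ds = lambda u, v: phis(u) - phis(v) - gphis(v) @ (u - v)  # D_{phi*}

x = np.array([0.2, 0.5]); y = np.array([0.7, 0.1]); z = np.array([0.4, 0.9])

# (v) three-point identity (A1)
lhs = D(x, z)
rhs = D(x, y) + D(y, z) - (x - y) @ (gphi(z) - gphi(y))
# (vi) duality: D_phi(x, y) = D_{phi*}(grad phi(y), grad phi(x))
dual = Ds(gphi(y), gphi(x))
```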
In addition to the above facts, we will also use the following.
Fact A.2
Let A: X → Y be a linear operator and let A† be its Moore–Penrose pseudoinverse. Then AA† = A(A∗A)†A∗ is the orthogonal projector onto Im(A), and \(\|{A^{\dagger }}\|^{-1} = \inf _{z\in \textup {Ker}(A)^{\perp }\setminus \{0\}} \|{Az}\|/\|{z}\|\) is the smallest positive singular value of A.
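Both identities in Fact A.2 can be verified with NumPy's Moore–Penrose pseudoinverse (`np.linalg.pinv`) on a rank-deficient matrix of our choosing:

```python
import numpy as np

A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])      # rank 2: third row = first + second
Ad = np.linalg.pinv(A)

P = A @ Ad                            # claimed orthogonal projector onto Im(A)
P_alt = A @ np.linalg.pinv(A.T @ A) @ A.T   # A (A* A)^dagger A*

s = np.linalg.svd(A, compute_uv=False)
sigma_min_pos = np.min(s[s > 1e-10])  # smallest positive singular value
```

An orthogonal projector is idempotent and symmetric and fixes the range; the spectral norm of A† is the reciprocal of the smallest positive singular value.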
Fact A.3 ([27, Example 5.1.5])
Let ζ1 and ζ2 be independent random variables with values in the measurable spaces \(\mathcal {Z}_{1}\) and \(\mathcal {Z}_{2}\) respectively. Let \(\varphi \colon \mathcal {Z}_{1}\times \mathcal {Z}_{2} \to \mathbb {R}\) be measurable and suppose that \(\mathbb {E}[|{\varphi (\zeta _{1},\zeta _{2})}|]<+\infty \). Then \(\mathbb {E}[\varphi (\zeta _{1},\zeta _{2}) | \zeta _{1}] = \psi (\zeta _{1})\), where for all \(z_{1} \in \mathcal {Z}_{1}\), \(\psi (z_{1}) = \mathbb {E}[\varphi (z_{1}, \zeta _{2})]\).
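Fact A.3 can be illustrated exactly on finite distributions, where both the conditional expectation and the frozen-argument average ψ are finite sums (the toy variables and the function φ below are ours):

```python
import numpy as np
from itertools import product

# zeta1 uniform on {1, 2}, zeta2 uniform on {10, 20, 30}, independent;
# phi(z1, z2) = z1 * z2.
Z1, Z2 = [1, 2], [10, 20, 30]
phi = lambda z1, z2: z1 * z2

# psi(z1) = E[phi(z1, zeta2)]: freeze the first argument, average the second.
psi = lambda z1: np.mean([phi(z1, z2) for z2 in Z2])

# E[phi | zeta1 = z1] computed directly from the joint (product) distribution.
def cond_exp(z1):
    vals = [phi(a, b) for a, b in product(Z1, Z2) if a == z1]
    return np.mean(vals)
```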
Fact A.4 ([27, Theorem 3.2.4])
Let \((x_{k})_{k \in \mathbb {N}}\) be a sequence of X-valued random variables and let x be an X-valued random variable. Then the following hold.
(i) Suppose that the xk are uniformly essentially bounded, i.e., \(\sup _{k \in \mathbb {N}} \textup {esssup} \|{x_{k}}\|<+\infty \). Then xk → x \(\mathbb {P}\)-a.s. ⇒ \(\mathbb {E}[\|{x_{k}-x}\|^{2}]\to 0\).

(ii) Suppose that xk ∈ U ⊂ X \(\mathbb {P}\)-a.s. and T : U → Y is continuous. Then xk → x in distribution ⇒ T(xk) → T(x) in distribution.
Proof of Lemma 2.1
Let \(x_{i} = P_{C_{i}}(x)\), i = 1, 2 and z ∈ C2. Then using Fact 2.2 (iii), Dϕ(x2,x1) + Dϕ(x1,x) = Dϕ(x2,x) ≤ Dϕ(z,x) = Dϕ(z,x1) + Dϕ(x1,x), which yields Dϕ(x2,x1) ≤ Dϕ(z,x1). Hence \(x_{2} = P_{C_{2}}(x_{1})\) and \(D_{C_{2}}(x_{1}) + D_{C_{1}}(x) = D_{C_{2}}(x)\). □
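For the Euclidean case ϕ = ∥⋅∥²/2 and nested affine sets C2 ⊂ C1, the two conclusions of the proof above, \(P_{C_{2}}(P_{C_{1}}(x)) = P_{C_{2}}(x)\) and \(D_{C_{2}}(x_{1}) + D_{C_{1}}(x) = D_{C_{2}}(x)\), can be checked directly (the sets and the test point are an illustrative toy choice):

```python
import numpy as np

# Nested affine sets in R^3: C1 = {x : x3 = 0}, C2 = {x : x2 = x3 = 0} ⊂ C1.
P1 = lambda x: x * np.array([1., 1., 0.])   # orthogonal projection onto C1
P2 = lambda x: x * np.array([1., 0., 0.])   # orthogonal projection onto C2
D  = lambda u, v: 0.5 * np.sum((u - v)**2)  # D_phi for phi = ||.||^2 / 2

x  = np.array([3., -2., 5.])
x1 = P1(x)
```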
Proof of Lemma 2.2
Since Az = b, it follows from (11) and Fact A.1(vi) that
Moreover, it follows from Fact A.1(v) that
which together with (A6) yields (i). Next, since PC(x) ∈ C, weak duality yields \(D_{\phi }(P_{C}(x),x) \geq - {\Psi }^{x}_{C}(\lambda )\). Then, by (i), Dϕ(PC(x),x) ≥ Dϕ(z,x) − Dϕ(z,∇ϕ∗(∇ϕ(x) + A∗λ)). Statement (ii) follows from the Pythagoras-type identity given in Proposition 2.2(iii). □
Appendix: B. D-Fejér monotone sequences [5, 18]
Let C ⊂ X be a nonempty closed convex set. Let ϕ be a Legendre function such that \(C \cap \text {int}(\text {dom} \phi ) \neq \varnothing \). A sequence \((x_{k})_{k \in \mathbb {N}}\) in int(domϕ) is Bregman monotone, or D-Fejér monotone, w.r.t. C if
$$ (\forall x \in C \cap \textup{dom}\phi)(\forall k \in \mathbb{N})\quad D_{\phi}(x, x_{k+1}) \leq D_{\phi}(x, x_{k}). $$
For D-Fejér monotone sequences, the following properties are known [5, Proposition 4.1, Example 4.7, and Theorem 4.1(i)].
Proposition B.1
Let \((x_{k})_{k \in \mathbb {N}}\) be a D-Fejér monotone sequence with respect to C. Then the following hold.

(i) \((\forall x \in C\cap \textup {dom} \phi )\quad (D_{\phi }(x,x_{k}))_{k \in \mathbb {N}}\) is decreasing.

(ii) \((D_{C}(x_{k}))_{k \in \mathbb {N}}\) is decreasing.

(iii) \((\forall k \in \mathbb {N})(\forall p \in \mathbb {N})\quad D_{C}(x_{k+p}) \leq D_{C}(x_{k}) - D_{\phi }(P_{C} (x_{k}),P_{C}(x_{k+p}))\).

(iv) \((\forall x \in C\cap \textup {dom} \phi )(\forall x^{\prime } \in C\cap \textup {dom} \phi ) \quad (\langle {x - x^{\prime }, \nabla \phi (x_{k})}\rangle )_{k \in \mathbb {N}}\) is convergent.

(v) Suppose that domϕ∗ is open. Then \((x_{k})_{k \in \mathbb {N}}\) is bounded.

(vi) If all cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C, then \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C ∩int(domϕ).
Concerning Proposition B.1(vi), we now give a result ensuring that the cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C. In the next result, we will consider the so-called sequential consistency assumption [5].
Proposition B.2
Suppose that DC(xk) → 0 and that, for all bounded sequences \((z_{k})_{k \in \mathbb {N}}\) and \((y_{k})_{k \in \mathbb {N}}\) in int(domϕ),
$$ D_{\phi}(z_{k}, y_{k}) \to 0\ \Rightarrow\ z_{k} - y_{k} \to 0. $$ (B8)
Then \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C ∩int(domϕ).
Proof
Let x ∈ C ∩int(domϕ). It follows from Proposition B.1(i) and Fact A.1(ix) that \((x_{k})_{k \in \mathbb {N}}\) is contained in the compact set {Dϕ(x,⋅) ≤ Dϕ(x,x0)}⊂int(domϕ). Hence, the set of cluster points of \((x_{k})_{k \in \mathbb {N}}\) is nonempty and contained in int(domϕ). Moreover, it follows from Proposition B.1(iii) (with k = 0) and Fact A.1(ix) that \((P_{C}(x_{p}))_{p \in \mathbb {N}}\) is contained in the compact set {Dϕ(PC(x0),⋅) ≤ DC(x0)} and hence is bounded. Let x be a cluster point of \((x_{k})_{k \in \mathbb {N}}\) and let \((x_{n_{k}})_{k \in \mathbb {N}}\) be a subsequence such that \(x_{n_{k}} \to x\). As shown above, x ∈int(domϕ). Moreover, \(D_{\phi }(P_{C}(x_{n_{k}}), x_{n_{k}}) = D_{C}(x_{n_{k}})\to 0\) and hence, by virtue of (B8), we have that \(P_{C}(x_{n_{k}}) - x_{n_{k}} \to 0\). Therefore, \(P_{C}(x_{n_{k}}) \to x\), which implies that x ∈ C, since C is closed. Thus, all cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C ∩int(domϕ) and therefore, by Proposition B.1(vi), \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C. □
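The monotonicity in Proposition B.1(ii) and the convergence mechanism of Proposition B.2 can be observed numerically in the Euclidean case with alternating projections onto two lines through the origin, where C = {0} and D_C(x) = ½∥x∥² (an illustrative sketch, not the paper's general setting):

```python
import numpy as np

# Two lines through the origin in R^2, spanned by unit vectors u1, u2;
# C = their intersection = {0}. Projection onto a line: <u, x> u.
u1 = np.array([1., 0.])
u2 = np.array([1., 1.]) / np.sqrt(2.0)
proj = lambda u, x: (u @ x) * u

x = np.array([2., 5.])
dists = []                       # D_C(x_k) = 0.5 ||x_k||^2 along the iteration
for k in range(40):
    x = proj(u1, x) if k % 2 == 0 else proj(u2, x)
    dists.append(0.5 * (x @ x))
```

The recorded sequence \((D_C(x_k))\) is decreasing, and the iterates converge to the point of C, in line with Proposition B.1(ii) and (vi).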
Appendix: C. Proof of Lemma 3.1(ii) under assumption H1
It follows from Proposition 3.1(iii) and H1 that there exists a \(\mathbb {P}\)-negligible set N ⊂Ω such that \(C = \bigcap _{\omega \in {\Omega }\setminus N} C_{\xi (\omega )}\) and \(\sup _{\omega \in {\Omega }\setminus N}\|{A_{\xi (\omega )}^{*}(A_{\xi (\omega )} H^{2} A_{\xi (\omega )}^{*})^{\dagger } A_{\xi (\omega )}}\|\leq M<+\infty \). Note that, if x ∈int(domϕ) ∩ C, then \(0=D_{C}(x)=D_{C}(P_{C_{\xi (\omega )}}(x))\) for every ω ∈Ω∖ N, hence (22) holds trivially. Therefore, we let x ∈int(domϕ) ∖ C and set x⋆ = PC(x), y = ∇ϕ(x), y⋆ = ∇ϕ(x⋆). Now, let ω ∈Ω∖ N. For brevity, we write i = ξ(ω), \(x_{i} = P_{C_{i}}(x)\), yi = ∇ϕ(xi), and H = [∇2ϕ∗(y⋆)]1/2. We proceed in six steps.
Step 1: We have
$$ (\forall v_{i} \in \text{Im}(A_{i}^{*}))\quad y + v_{i} \in \text{int}(\text{dom} \phi^{*})\ \implies\ D_{C}(x_{i}) \leq D_{\phi^{*}}(y + v_{i}, y_{\star}). $$ (C9)
Indeed, Lemma 2.1 yields \(D_{C}(x_{i}) = D_{\phi }(P_{C}(x_{i}), x_{i}) = D_{\phi }(P_{C}(x),x_{i})= D_{\phi }(x_{\star },P_{C_{i}}(x))\), with x⋆ ∈ Ci. Hence, using Lemma 2.2 and Fact A.1(vi), (C9) follows.
Step 2: There exists \(\tilde {w} \in \text {Im}(A^{*})\) such that \(\|{H \tilde {w}}\| = 1\) and, for all τ > 0,
$$ u_{\tau}:= H(y_{\star} - y + \tau \tilde w) \in V(x_{\star})\setminus\{0\}. $$ (C10)
Indeed, first recall that Im(HA∗) = V (x⋆)≠{0}. It follows from (12) that y⋆ − y ∈Im(A∗). Now, if H(y⋆ − y)≠ 0, we define \(\tilde {w}=(y_{\star } - y)/\|{H(y_{\star } - y)}\|\) and (C10) follows. Otherwise, since Im(HA∗)≠{0}, we can pick \(\tilde {w} \in \text {Im}(A^{*})\) such that \(\|{H \tilde {w}}\|=1\) and again (C10) follows.
Step 3: Suppose that \(HA_{i}^{*}\neq 0\). We prove that, for every τ > 0, there exists \(v_{i,\tau } \in \text {Im}(A_{i}^{*})\) such that y + vi,τ − y⋆∉Ker(H) and
$$ \|{y+v_{i,\tau} - y_{\star}}\|\leq (1+M\|{H}\|^{2})\|{y_{\star}-y}\| + 3\tau M \|{H}\|. $$ (C11)
Indeed, since \(HA_{i}^{*}\neq 0\), there exists \(w_{i} \in \text {Im}(HA^{*}_{i})\) such that ∥wi∥ = 2. Now, note that
$$ Q_{i}(x_{\star}) = H A_{i}^{*}[A_{i} H^{2} A_{i}^{*}]^{\dagger} A_{i} H $$ (C12)
and let, for every τ > 0,
$$ v_{i,\tau}:=A_{i}^{*}[A_{i} H^{2} A_{i}^{*}]^{\dagger} A_{i} H (u_{\tau} +\tau w_{i})\in\text{Im}(A_{i}^{*}). $$ (C13)
Then, recalling (C10), (C12), and the fact that \(w = H \tilde {w}\), we have
$$ \begin{array}{@{}rcl@{}} H(y_{\star}-y-v_{i,\tau}) &=& H(y_{\star}-y) - Q_{i}(x_{\star})(u_{\tau} + \tau w_{i})\\ &=& [I - Q_{i}(x_{\star})]H(y_{\star}-y) - \tau Q_{i}(x_{\star})(w+w_{i}) \end{array} $$
and, since Qi(x⋆) is the projector onto \(\text {Im}(HA_{i}^{*})\) and \(w_{i} \in \text {Im}(HA_{i}^{*})\), we have
$$ \|{H(y+v_{i,\tau} - y_{\star})}\|^{2} = \|{[I - Q_{i}(x_{\star})]H(y_{\star}-y)}\|^{2} + \tau^{2} \|{Q_{i}(x_{\star})w + w_{i}}\|^{2}. $$ (C14)
In the above formula we have Qi(x⋆)w≠ − wi, since ∥wi∥ = 2 while ∥Qi(x⋆)w∥≤∥w∥ = 1. Therefore ∥H(y + vi,τ − y⋆)∥2 > 0 and hence y + vi,τ − y⋆∉Ker(H). Finally, inequality (C11) follows by bounding ∥vi,τ∥ using (C13), (C10), assumption H1, and the fact that ∥wi∥ = 2 and \(\|{H \tilde {w}}\|=1\).
Step 4: Suppose that \(HA_{i}^{*}\neq 0\) and let \(\varepsilon \in \left ]0,1\right [\). We prove that, for τ > 0 sufficiently small,
$$ \frac{1 - \varepsilon}{2} \|{u_{\tau}}\|^{2} \leq D_{C}(x)\quad \text{and}\quad D_{C}(x_{i}) \leq \frac{1 + \varepsilon}{2} \Big(\|{[I - Q_{i}(x_{\star})]u_{\tau}}\|^{2} + \sqrt{\tau} D_{C}(x) \Big). $$ (C15)
Indeed, it follows from the second part of Fact A.1(xi), applied to \(D_{\phi ^{*}}\), that there exists \(\tilde \delta >0\) such that, if \(\|{\tilde y-y_{\star }}\|<\tilde \delta \) and \(\tilde y-y_{\star }\not \in \text {Ker}(H)\), then
$$ \frac{\sqrt{1 - \varepsilon}}{2} \langle{H^{2} (\tilde y - y_{\star}), \tilde y - y_{\star}}\rangle \leq D_{\phi^{*}}(\tilde y, y_{\star}) \leq \frac{1 + \varepsilon}{2} \langle{H^{2} (\tilde y - y_{\star}), \tilde y - y_{\star}}\rangle. $$
Therefore, setting \(\beta _{\star }= 1 + \max \{3 M\|{H}\|^{2}+M\|{H}\|,\|{\tilde w}\|\}>1\), it follows from the inequality \(\|{y_{\star }-y-\tau \tilde w}\| \leq \|{y_{\star }-y}\| + \tau \|{\tilde w}\|\) and (C11) that, if τ ≤∥y − y⋆∥ and \(\|{y-y_{\star }}\|\leq \tilde \delta / \beta _{\star }\), we have
$$ D_{\phi^{*}}(y+v_{i,\tau}, y_{\star}) \leq \frac{1 + \varepsilon}{2} \langle{H^{2} (y+v_{i,\tau} - y_{\star}), y+v_{i,\tau} - y_{\star}}\rangle $$ (C16)
and
$$ D_{\phi^{*}}(y+\tau \tilde w, y_{\star}) \geq \frac{\sqrt{1 - \varepsilon}}{2} \langle{H^{2} (y +\tau \tilde w - y_{\star}), y +\tau \tilde w - y_{\star}}\rangle. $$ (C17)
Now, the continuity of ∇ϕ and Fact A.1(x) yield that there exists δ > 0 such that, if DC(x) < δ, then \(\|{y-y_{\star }}\|<\tilde \delta / \beta _{\star }\), and hence, collecting (C9) and (C16), we obtain \(D_{C}(x_{i}) \leq D_{\phi ^{*}}\big (y+v_{i,\tau },y_{\star } \big )\leq ((1 + \varepsilon )/2) \|{ H(y+v_{i,\tau } - y_{\star })}\|^{2}\). However, it also holds that ∥H(y + vi,τ − y⋆)∥ = ∥[I − Qi(x⋆)]uτ + τ(w − wi)∥≤∥[I − Qi(x⋆)]uτ∥ + 3τ. Therefore, since \(\|{u_{\tau }}\| \leq \|{H(y-y_{\star })}\| + \tau \|{H \tilde w}\| \leq (\|{H}\| + 1) \|{y - y_{\star }}\|\),
$$ \begin{array}{@{}rcl@{}} D_{C}(x_{i}) &\leq& \frac{1 + \varepsilon}{2}\left( \|{[I-Q_{i}(x_{\star})]u_{\tau}}\|^{2} + 9 \tau^{2} +6 \tau \|{u_{\tau}}\| \right)\\ &\leq& \frac{1 + \varepsilon}{2}\left( \|{[I-Q_{i}(x_{\star})]u_{\tau}}\|^{2} + 3 \tau(2 \|{H}\| + 5) \|{y - y_{\star}}\| \right) \end{array} $$
which, for \(\tau \leq \tau _{\star }^{(1)}:= \min \{\|{y_{\star }-y}\|,9^{-1}D_{C}(x)^{2}\|{y_{\star }-y}\|^{-2}(2 \|{H}\| + 5)^{-2}\}\), gives
$$ D_{C}(x_{i}) \leq \frac{1 + \varepsilon}{2} \Big(\|{[I - Q_{i}(x_{\star})]u_{\tau}}\|^{2} + \sqrt{\tau} D_{C}(x) \Big). $$ (C18)
On the other hand, \(D_{C}(x) = D_{\phi }(x_{\star }, x) = D_{\phi ^{*}}(y, y_{\star }) \neq 0\). Hence, using the continuity of \(D_{\phi ^{*}}(\cdot ,y_{\star })\), there exists \(\tau _{\star }^{(2)}>0\) such that, for every \(\tau \leq \tau _{\star }^{(2)}\), \(D_{C}(x)\geq \sqrt {1-\varepsilon } D_{\phi ^{*}}(y+\tau \tilde w, y_{\star })\). So, (C17) yields
$$ D_{C}(x)\geq \frac{1 - \varepsilon}{2} \langle{H^{2} (y+ \tau \tilde{w} - y_{\star}), y + \tau \tilde{w}- y_{\star}}\rangle = \frac{1 - \varepsilon}{2} \|{u_{\tau}}\|^{2}. $$ (C19)

Step 5: For \(\tau \leq \min \{\tau _{\star }^{(1)},\tau _{\star }^{(2)}\}\), we have
$$ \frac{D_{C}(x_{i})}{D_{C}(x)} \leq \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{Q_{i}(x_{\star}) u_{\tau}}\|^{2}}{\|{u_{\tau}}\|^{2}} \bigg)+ \frac{1 + \varepsilon}{2}\sqrt{\tau}. $$ (C20)
This follows from (C15) when \(HA_{i}^{*}\neq 0\). However, (C20) holds even when \(\text {Im}(HA_{i}^{*})=\{0\}\). Indeed, in this case, recalling the definition of Qi(x⋆), we have Qi(x⋆) ≡ 0. Hence, since DC(xi) ≤ DC(x), inequality (C20) actually holds for every τ > 0.
Step 6: Note that inequality (C20) holds with i = ξ(ω) for every ω ∈Ω∖ N, and that \(\tau _{\star }^{(1)}\) and \(\tau _{\star }^{(2)}\) are independent of i = ξ(ω). Therefore, the above inequality implies that
$$ \frac{D_{C}(P_{C_{\xi}}(x))}{D_{C}(x)} \leq \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{Q_{\xi}(x_{\star}) u_{\tau}}\|^{2}}{\|{u_{\tau}}\|^{2}} \bigg)+ \frac{1+\varepsilon}{2}\sqrt{\tau} \quad \mathbb{P}\text{-a.s.} $$
So, taking the expectation and recalling definition (21), we have
$$ \begin{array}{@{}rcl@{}} \frac{\mathbb{E}[D_{C}(P_{C_{\xi}}(x))]}{D_{C}(x)} &\leq& \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{\overline{Q}(x_{\star}) u}\|^{2}}{\|{u}\|^{2}} \bigg) + \frac{1+\varepsilon}{2}\sqrt{\tau}\\ &\leq& \frac{1+\varepsilon}{1-\varepsilon} [1 - \gamma_{\mathcal{C}}(x_{\star})]+ \frac{1+\varepsilon}{2}\sqrt{\tau}. \end{array} $$
Finally, letting τ → 0 in the above inequality, the statement follows. □
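In the Euclidean affine case, the one-step expected contraction of D_C established above can be computed exactly for a small system, since the expectation is a finite sum over the rows (the matrix and the point below are an arbitrary toy choice; the sampling probabilities are proportional to the squared row norms):

```python
import numpy as np

# One-step expected decrease of D_C for Euclidean randomized row projections
# (randomized Kaczmarz): with rows sampled w.p. ||a_i||^2 / ||A||_F^2,
#   E[d(x_{k+1}, C)^2] <= (1 - sigma_min(A)^2 / ||A||_F^2) d(x_k, C)^2.
A = np.array([[2., 1.], [1., 3.], [1., -1.]])
x_true = np.array([1., 2.])
b = A @ x_true                        # consistent system: C = {x_true}
p = np.sum(A**2, axis=1); p = p / p.sum()

x = np.array([5., -4.])
d2 = np.sum((x - x_true)**2)          # squared distance to C before the step
exp_d2 = 0.0
for i in range(A.shape[0]):           # exact expectation: only 3 outcomes
    a = A[i]
    xi = x + (b[i] - a @ x) / (a @ a) * a
    exp_d2 += p[i] * np.sum((xi - x_true)**2)

rate = 1.0 - np.min(np.linalg.svd(A, compute_uv=False))**2 / np.sum(A**2)
```

Algebraically, the expected squared distance after one step equals d² − ∥A(x − x_true)∥²/∥A∥_F², which is bounded by rate·d²; the test below checks both the exact value and the contraction bound.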
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kostić, V.R., Salzo, S. The method of randomized Bregman projections for stochastic feasibility problems. Numer Algor 93, 1269–1307 (2023). https://doi.org/10.1007/s11075-022-01468-8
Keywords
- Stochastic convex feasibility problem
- Bregman projection method
- Linear convergence
- Randomized algorithm