
The method of randomized Bregman projections for stochastic feasibility problems

  • Original Paper · Published in Numerical Algorithms

Abstract

In this work, we study the method of randomized Bregman projections for stochastic convex feasibility problems, possibly with an infinite number of sets, in Euclidean spaces. Under very general assumptions, we prove almost sure convergence of the iterates to a random almost common point of the sets. We then analyze in depth the case of affine sets, showing that the iterates converge Q-linearly and providing both global and local rates of convergence. This work generalizes recent developments in randomized methods for the solution of linear systems based on orthogonal projections. We provide several applications: sketch & project methods for solving linear systems of equations, the positive definite matrix completion problem, gossip algorithms for network consensus, the assessment of robust stability of dynamical systems, and computational solutions for multimarginal optimal transport.
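For the special case ϕ(x) = (1/2)∥x∥² and affine sets Cᵢ = {x : ⟨aᵢ,x⟩ = bᵢ}, randomized Bregman projections reduce to the randomized Kaczmarz method, one of the sketch & project instances mentioned above. The following minimal Python sketch (an illustration, not the authors' code; the squared-row-norm sampling probabilities are one standard choice) runs this instance on a consistent random system.

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters=5000, seed=0):
    """Randomized Bregman projections with phi(x) = 0.5*||x||^2:
    each step is the Euclidean projection of the current iterate onto one
    randomly chosen hyperplane <a_i, x> = b_i (randomized Kaczmarz)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    # sample row i with probability ||a_i||^2 / ||A||_F^2 (a standard choice)
    p = (A ** 2).sum(axis=1)
    p /= p.sum()
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=p)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a  # projection onto the i-th affine set
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))   # overdetermined, full column rank a.s.
x_true = rng.standard_normal(5)
b = A @ x_true                     # consistent system
x = randomized_kaczmarz(A, b, np.zeros(5))
print(np.linalg.norm(x - x_true))  # tiny: iterates converge to the solution
```

The expected convergence factor per step is 1 − σ_min(A)²/∥A∥²_F, matching the Q-linear rates discussed for affine sets.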

Fig. 1
Fig. 2
Fig. 3


Data Availability

The data used in this article are available in a public repository.

Notes

  1. Note that Dϕ is not a distance in the sense of metric topology; moreover, even when ϕ(x) = (1/2)∥x∥², it is one half of the square of the distance between x and y.

  2. This means that \(\nu (A,x) = {\int \limits }_{A} [D_{C_{i}}(x)/\overline {D}_{C}(x)] \mu (d i)\) if x ∉ C and ν(A,x) = μ(A) if x ∈ C.

References

  1. Ash, R.B., Doléans-Dade, C.A.: Probability & Measure Theory. Academic Press, San Diego (2000)

  2. Azizan, N., Hassibi, B.: Stochastic gradient/mirror descent: minimax optimality and implicit regularization. In: International Conference on Learning Representations (ICLR), pp. 1–18 (2019)

  3. Banerjee, A., Merugu, S., Dhillon, I., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)

  4. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3, 615–647 (2001)

  5. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)

  6. Bauschke, H.H., Combettes, P.L.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4, 27–67 (1997)

  7. Bauschke, H.H., Combettes, P.L.: Iterating Bregman retractions. SIAM J. Optim. 13, 1159–1173 (2003)

  8. Bauschke, H.H., Wang, X., Ye, J., Yuan, X.: Bregman distances and Chebyshev sets. J. Approx. Theory 159, 3–25 (2009)

  9. Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37, A1111–A1138 (2015)

  10. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)

  11. Butnariu, D., Censor, Y., Reich, S.: Iterative averaging of entropic projections for solving stochastic convex feasibility problems. Comput. Optim. Appl. 8, 21–39 (1997)

  12. Butnariu, D., Flåm, S.D.: Strong convergence of expected-projection methods in Hilbert spaces. Numer. Funct. Anal. Optim. 16, 601–636 (1995)

  13. Butnariu, D., Iusem, A., Burachik, R.: Iterative methods of solving stochastic convex feasibility problems and applications. Comput. Optim. Appl. 15, 269–307 (2000)

  14. Calafiore, G., Polyak, B.T.: Stochastic algorithms for exact and approximate feasibility of robust LMIs. IEEE Trans. Autom. Control 46(11), 1755–1759 (2001)

  15. Castaing, C., Valadier, M.: Convex Analysis and Measurable Multifunctions. Lecture Notes in Mathematics, vol. 580. Springer, New York (1977)

  16. Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim. Theory Appl. 34, 321–353 (1981)

  17. Censor, Y., Reich, S.: Iterations of paracontractions and firmly nonexpansive operators with applications to feasibility and optimization. Optimization 37, 323–339 (1996)

  18. Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, New York (1997)

  19. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3, 538–543 (1993)

  20. Cimmino, G.: Calcolo approssimato per le soluzioni di sistemi di equazioni lineari. La Ricerca Scientifica IX(2), 326–333 (1938)

  21. Combettes, P.L.: The foundations of set theoretic estimation. Proc. IEEE 81, 182–208 (1993)

  22. Combettes, P.L., Pesquet, J.-C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping II: mean-square and linear convergence. Math. Program. 174(1), 433–451 (2019)

  23. Dessein, A., Papadakis, N., Rouas, J.-L.: Regularized optimal transport and the ROT mover's distance. J. Mach. Learn. Res. 19, 1–53 (2018)

  24. Deutsch, F.: The method of alternating orthogonal projections. In: Singh, S. (ed.) Approximation Theory, Spline Functions and Applications. Kluwer Academic (1992)

  25. Dhillon, I.S., Tropp, J.A.: Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl. 29, 1120–1146 (2007)

  26. Duff, I.S., Grimes, R.G., Lewis, J.G.: Users' guide for the Harwell-Boeing sparse matrix collection (Release I) (1992)

  27. Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge University Press, New York (2010)

  28. Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the β-divergence. Neural Comput. 23, 2421–2456 (2011)

  29. Gower, R., Molitor, D., Moorman, J., Needell, D.: Adaptive sketch-and-project methods for solving linear systems. SIAM J. Matrix Anal. Appl. 42(2), 954–989 (2021)

  30. Gower, R., Richtárik, P.: Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl. 36, 1660–1690 (2015)

  31. Halperin, I.: The product of projection operators. Acta Sci. Math. (Szeged) 23, 96–99 (1962)

  32. Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression. Wiley (2005)

  33. Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci., Cl. Sci. Math., Ser. A, Sci. Math. 35, 355–357 (1937)

  34. Kostić, V.R., Miedlar, A., Stolwijk, J.: On matrix nearness problems: distance to delocalization. SIAM J. Matrix Anal. Appl. 36, 435–460 (2015)

  35. Hermer, N., Luke, D.R., Sturm, A.: Random function iterations for consistent stochastic feasibility. Numer. Funct. Anal. Optim. 40, 386–420 (2019)

  36. Jelasity, M., Montresor, A., Babaoglu, O.: Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst. 23(3), 219–252 (2005)

  37. Loizou, N., Richtárik, P.: Revisiting randomized gossip algorithms: general framework, convergence rates and novel block and accelerated protocols, pp. 1–44. arXiv:1905.08645 (2019)

  38. Mangesius, H., Xue, X.D., Hirche, S.: Consensus driven by the geometric mean. IEEE Trans. Control Netw. Syst. 5(1), 251–261 (2016)

  39. Martinsson, P.-G., Tropp, J.: Randomized numerical linear algebra: foundations and algorithms. Acta Numer. 29, 403–572 (2020)

  40. Mavroforakis, C., Erdös, D., Crovella, M., Terzi, E.: Active positive-definite matrix completion. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 264–272 (2017)

  41. Mazko, A.: Matrix Equations, Spectral Problems and Stability of Dynamic Systems. Stability, Oscillations and Optimization of Systems. Cambridge Scientific Publishers (2008)

  42. Muzellec, B., Nock, R., Patrini, G., Nielsen, F.: Tsallis regularized optimal transport and ecological inference. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 2387–2393 (2017)

  43. Necoara, I., Richtárik, P., Patrascu, A.: Randomized projection methods for convex feasibility: conditioning and convergence rates. SIAM J. Optim. 29, 2814–2852 (2019)

  44. Nedić, A.: Random projection algorithms for convex set intersection problems. In: 49th IEEE Conference on Decision and Control (CDC), pp. 7655–7660 (2010)

  45. Needell, D., Rebrova, E.: On block Gaussian sketching for the Kaczmarz method. Numer. Algorithms 86(1), 443–473 (2019)

  46. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)

  47. Peyré, G., Cuturi, M.: Computational Optimal Transport: With Applications to Data Science. Now Publishers, USA (2019)

  48. Reem, D., Reich, S., De Pierro, A.: Re-examination of Bregman functions and new properties of their divergences. Optimization 68, 279–348 (2019)

  49. Richtárik, P., Takáč, M.: Stochastic reformulations of linear systems: algorithms and convergence theory. SIAM J. Matrix Anal. Appl. 41, 487–524 (2020)

  50. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  51. Rüschendorf, L.: Convergence of the iterative proportional fitting procedure. Ann. Statist. 23, 1160–1174 (1995)

  52. Steinerberger, S.: Randomized Kaczmarz converges along small singular vectors. SIAM J. Matrix Anal. Appl. 42(2), 608–615 (2021)

  53. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262–278 (2009)

  54. Sun, T., Tran-Dinh, Q.: Generalized self-concordant functions: a recipe for Newton-type methods. Math. Program. 178, 145–213 (2019)


Acknowledgements

We wish to thank three anonymous referees whose helpful comments led to the improvement of the originally submitted version.

Author information


Corresponding author

Correspondence to Vladimir R. Kostić.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Vladimir R. Kostić and Saverio Salzo contributed equally to this work.

Appendices

Appendix: A. Basic facts on Bregman projection method

In the following, we collect a few important facts about Bregman distances generated by Legendre functions (see [4, 6, 7, 48]). Note that item (xi) follows from Taylor's formula for ϕ.

Fact A.1

Let ϕ be a Legendre function. Then the following properties hold.

  1. (i)

    (∀x ∈ domϕ)(∀y ∈ int(domϕ)) Dϕ(x,y) = ϕ(x) + ϕ∗(∇ϕ(y)) −〈x,∇ϕ(y)〉.

  2. (ii)

    (∀y ∈ int(domϕ)) Dϕ(⋅,y) is strictly convex on int(domϕ) and coercive.

  3. (iii)

    (∀x,y ∈int(domϕ)) Dϕ(x,y) = 0 ⇔ x = y.

  4. (iv)

    (∀x,y ∈ int(domϕ)) Dϕ(x,y) + Dϕ(y,x) = 〈x − y, ∇ϕ(x) −∇ϕ(y)〉≥ 0.

  5. (v)

    (Three-Point Identity [19]) For every x ∈ X and y,z ∈ int(domϕ), we have

    $$ D_{\phi}(x,z) = D_{\phi}(x,y)+D_{\phi}(y,z) - \langle{x- y, \nabla\phi(z)-\nabla\phi(y)}\rangle. $$
    (A1)
  6. (vi)

    (∀x,y ∈ int(domϕ)) \(D_{\phi }(x,y)=D_{\phi ^{*}}(\nabla \phi (y),\nabla \phi (x))\).

  7. (vii)

    Dϕ is continuous on int(domϕ) ×int(domϕ).

  8. (viii)

    Suppose that ϕ is twice differentiable on int(domϕ). Then

    $$ \big(\forall x\!\in\textup{int}(\textup{dom}\phi), \nabla^{2}\phi(x)\text{ is invertible}\big)\Leftrightarrow \big(\phi^{*}\!\text{ is twice differentiable}\big). $$
    (A2)
  9. (ix)

    Suppose that domϕ is open. Then, for every x ∈ int(domϕ), the sublevel sets of Dϕ(x,⋅) are compact, and hence Dϕ(x,⋅) is lower semicontinuous.

  10. (x)

    Suppose that domϕ is open. Then, for every x ∈int(domϕ), and every sequence \((y_{k})_{k \in \mathbb {N}}\) in int(domϕ)

    $$ D_{\phi}(x,y_{k}) \to 0\ \Rightarrow\ y_{k} \to x. $$
    (A3)

    Consequently, for every x ∈ int(domϕ) and ε > 0, there exists δ > 0 such that for every y ∈ int(domϕ), Dϕ(x,y) < δ ⇒ ∥x − y∥ < ε.

  11. (xi)

    If ϕ is twice differentiable on int(domϕ), then for every x,y ∈int(domϕ) there exists ξ ∈ [x,y] such that

    $$ D_{\phi}(x,y)=\frac{1}{2}\langle{\nabla^{2}\phi(\xi)(x-y),x-y}\rangle. $$
    (A4)

    Moreover, for every y ∈ int(domϕ) and every ε > 0 there exists δ > 0 such that, for every x ∈ int(domϕ) with x − y ∉ Ker(∇2ϕ(y)),

    $$ \|{x - y}\| \leq \delta\ \Rightarrow\ \left| \frac{D_{\phi}(x,y) - \frac 1 2 \langle{\nabla^{2} \phi(y) (x-y),x-y}\rangle}{\frac 1 2\langle{\nabla^{2} \phi(y) (x - y), x - y}\rangle}\right| \leq \varepsilon. $$
    (A5)
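Several items of Fact A.1 can be verified numerically for a concrete Legendre function. The sketch below (an illustration, not part of the paper) uses the negative entropy ϕ(x) = Σᵢ xᵢ log xᵢ, whose conjugate ϕ∗(y) = Σᵢ e^(yᵢ−1) and both gradients are available in closed form, and checks the three-point identity (A1) and the duality formula in (vi).

```python
import numpy as np

# Legendre function: negative entropy phi(x) = sum_i x_i*log(x_i) on R_{++}^n
phi           = lambda x: np.sum(x * np.log(x))
grad_phi      = lambda x: 1.0 + np.log(x)
phi_star      = lambda y: np.sum(np.exp(y - 1.0))   # convex conjugate
grad_phi_star = lambda y: np.exp(y - 1.0)

def bregman(f, grad_f, x, y):
    """Bregman distance D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - grad_f(y) @ (x - y)

rng = np.random.default_rng(0)
x, y, z = rng.uniform(0.1, 2.0, (3, 4))   # three points in int(dom phi)

# (v) three-point identity (A1)
lhs = bregman(phi, grad_phi, x, z)
rhs = (bregman(phi, grad_phi, x, y) + bregman(phi, grad_phi, y, z)
       - (x - y) @ (grad_phi(z) - grad_phi(y)))
print(abs(lhs - rhs))   # ~ 0 (up to rounding)

# (vi) duality: D_phi(x, y) = D_{phi*}(grad phi(y), grad phi(x))
d1 = bregman(phi, grad_phi, x, y)
d2 = bregman(phi_star, grad_phi_star, grad_phi(y), grad_phi(x))
print(abs(d1 - d2))     # ~ 0 (up to rounding)
```

For this ϕ, the Bregman distance is the Kullback–Leibler divergence Σᵢ xᵢ log(xᵢ/yᵢ) − xᵢ + yᵢ, which makes the closed forms easy to cross-check by hand.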

In addition to the above facts, we will use the following ones, too.

Fact A.2

Let A : X → Y be a linear operator and let A† be its Moore–Penrose pseudoinverse. Then A†A = A∗(AA∗)†A is the orthogonal projector onto Im(A∗) = Ker(A)⊥, and \(\|{A^{\dagger }}\|{}^{-1} = \inf _{z\in \textup {Ker}(A)^{\perp }\setminus \{0\}} \|{Az}\|/\|{z}\|\) is the smallest positive singular value of A.
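Both claims of Fact A.2 are easy to check with NumPy's pseudoinverse (a numerical illustration on a randomly generated matrix, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))      # a wide matrix, full row rank a.s.
A_dag = np.linalg.pinv(A)            # Moore-Penrose pseudoinverse

P = A_dag @ A                        # claimed orthogonal projector onto Im(A*)
assert np.allclose(P, A.T @ np.linalg.pinv(A @ A.T) @ A)  # A^dag A = A*(AA*)^dag A
assert np.allclose(P @ P, P) and np.allclose(P, P.T)      # idempotent and symmetric
assert np.allclose(P @ A.T, A.T)                          # fixes Im(A*)

# ||A^dag||^{-1} equals the smallest positive singular value of A
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(1.0 / np.linalg.norm(A_dag, 2), s[s > 1e-10].min())
print("Fact A.2 verified numerically")
```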

Fact A.3 ([27, Example 5.1.5])

Let ζ1 and ζ2 be independent random variables with values in the measurable spaces \(\mathcal {Z}_{1}\) and \(\mathcal {Z}_{2}\) respectively. Let \(\varphi \colon \mathcal {Z}_{1}\times \mathcal {Z}_{2} \to \mathbb {R}\) be measurable and suppose that \(\mathbb {E}[|{\varphi (\zeta _{1},\zeta _{2})}|]<+\infty \). Then \(\mathbb {E}[\varphi (\zeta _{1},\zeta _{2}) | \zeta _{1}] = \psi (\zeta _{1})\), where for all \(z_{1} \in \mathcal {Z}_{1}\), \(\psi (z_{1}) = \mathbb {E}[\varphi (z_{1}, \zeta _{2})]\).
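On a finite probability space, Fact A.3 can be checked exactly by enumeration, since conditioning an independent pair amounts to integrating out the second variable (a toy illustration; the laws and the test function below are arbitrary choices):

```python
from fractions import Fraction

# laws of two independent finite-valued random variables: value -> probability
p1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}    # law of zeta_1
p2 = {-1: Fraction(1, 2), 2: Fraction(1, 2)}   # law of zeta_2
f = lambda z1, z2: z1 * z2 + z2 ** 2           # an integrable test function phi

# psi(z1) = E[phi(z1, zeta_2)]: integrate out zeta_2 only
psi = lambda z1: sum(f(z1, z2) * q for z2, q in p2.items())

# conditional expectation from the joint law: by independence,
# P(zeta_2 = z2 | zeta_1 = z1) = P(zeta_2 = z2), so the two computations agree
for z1 in p1:
    cond = sum(f(z1, z2) * p1[z1] * q for z2, q in p2.items()) / p1[z1]
    assert cond == psi(z1)
print(psi(0), psi(1))  # exact rational values: 5/2 and 3
```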

Fact A.4 ([27, Theorem 3.2.4])

Let \((x_{k})_{k \in \mathbb {N}}\) be a sequence of X-valued random variables and let x be an X-valued random variable. Then the following hold.

  1. (i)

    Suppose that the xk are uniformly essentially bounded, i.e., \(\sup _{k \in \mathbb {N}} \textup {esssup} \|{x_{k}}\|<+\infty \). Then xk → x \(\mathbb {P}\)-a.s. ⇒ \(\mathbb {E}[\|{x_{k}-x}\|^{2}]\to 0\).

  2. (ii)

    Suppose that xk ∈ U ⊂ X \(\mathbb {P}\)-a.s. and that T : U → Y is continuous. Then xk → x in distribution ⇒ T(xk) → T(x) in distribution.

Proof of Lemma 2.1

Let \(x_{i} = P_{C_{i}}(x)\), i = 1, 2, and let z ∈ C2. Then, using Fact 2.2(iii), Dϕ(x2,x1) + Dϕ(x1,x) = Dϕ(x2,x) ≤ Dϕ(z,x) = Dϕ(z,x1) + Dϕ(x1,x), which yields Dϕ(x2,x1) ≤ Dϕ(z,x1). Hence \(x_{2} = P_{C_{2}}(x_{1})\) and \(D_{C_{2}}(x_{1}) + D_{C_{1}}(x) = D_{C_{2}}(x)\). □

Proof of Lemma 2.2

Since Az = b, it follows from (11) and Fact A.1(vi) that

$$ \begin{array}{@{}rcl@{}} {\Psi}^{x}_{C}(\lambda) &=& \phi^{*}(\nabla \phi(x) + A^{*}\lambda) - \phi^{*}(\nabla \phi(x)) - \langle z, A^{*}\lambda \rangle \\ &=& D_{\phi^{*}}(\nabla \phi(x)+ A^{*} \lambda, \nabla \phi(x)) + \langle x- z, A^{*}\lambda \rangle \\ &=& D_{\phi}(x, \nabla \phi^{*}(\nabla \phi(x)+ A^{*} \lambda)) + \langle x- z, A^{*}\lambda \rangle. \end{array} $$
(A6)

Moreover, it follows from Fact A.1(v) that

$$ D_{\phi}(z, \nabla \phi^{*}(\nabla \phi(x) + A^{*} \lambda)) = D_{\phi}(z,x) + D_{\phi}(x,\nabla \phi^{*}(\nabla \phi(x) + A^{*} \lambda)) + \langle x - z, A^{*} \lambda \rangle, $$

which together with (A6) yields (i). Next, since PC(x) ∈ C, weak duality yields \(D_{\phi }(P_{C}(x),x) \geq - {\Psi }^{x}_{C}(\lambda )\). Then, by (i), Dϕ(PC(x),x) ≥ Dϕ(z,x) − Dϕ(z,∇ϕ∗(∇ϕ(x) + A∗λ)). Statement (ii) follows by the Pythagoras identity given in Proposition 2.2(iii). □

Appendix: B. D-Fejér monotone sequences [5, 18]

Let C ⊂ X be a nonempty closed convex set. Let ϕ be a Legendre function such that \(C \cap \text {int}(\text {dom} \phi ) \neq \varnothing \). A sequence \((x_{k})_{k \in \mathbb {N}}\) in int(domϕ) is Bregman monotone, or D-Fejér monotone, w.r.t. C if

$$ (\forall x \in C)(\forall k \in \mathbb{N})\qquad D_{\phi}(x, x_{k+1}) \leq D_{\phi}(x,x_{k}). $$
(B7)
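For ϕ(x) = (1/2)∥x∥², the Bregman distance is (1/2)∥x − y∥² and (B7) is ordinary Fejér monotonicity; cyclic orthogonal projections onto affine sets generate such a sequence. A small numerical illustration (the two hyperplanes below are arbitrary choices, not from the paper):

```python
import numpy as np

# phi(x) = 0.5*||x||^2, so D_phi(x, y) = 0.5*||x - y||^2 and Bregman
# projections coincide with orthogonal projections.
def proj_hyperplane(a, b, x):
    """Orthogonal projection of x onto {z : <a, z> = b}."""
    return x + (b - a @ x) / (a @ a) * a

a1, b1 = np.array([1.0, 0.0]), 1.0    # C1: first coordinate = 1
a2, b2 = np.array([1.0, 1.0]), 3.0    # C2: x + y = 3, so C1 ∩ C2 = {(1, 2)}
x_common = np.array([1.0, 2.0])

x = np.array([10.0, -7.0])
dists = []
for k in range(50):                   # cyclic projections: C1, C2, C1, C2, ...
    x = proj_hyperplane(a1, b1, x) if k % 2 == 0 else proj_hyperplane(a2, b2, x)
    dists.append(0.5 * np.linalg.norm(x_common - x) ** 2)

# (B7): D_phi(x_common, x_{k+1}) <= D_phi(x_common, x_k) for x_common in C
assert all(d2 <= d1 + 1e-12 for d1, d2 in zip(dists, dists[1:]))
print(dists[-1])  # essentially 0: the iterates approach the common point
```

Here each projection decreases the distance to every point of C by Pythagoras' identity, which is exactly the Euclidean instance of (B7).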

For D-Fejér monotone sequences, the following properties are known [5, Proposition 4.1, Example 4.7, and Theorem 4.1(i)].

Proposition B.1

Let \((x_{k})_{k \in \mathbb {N}}\) be a D-Fejér monotone sequence with respect to C. Then the following hold.

  1. (i)

    \((\forall x \in C\cap \textup {dom} \phi )\quad (D_{\phi }(x,x_{k}))_{k \in \mathbb {N}}\) is decreasing.

  2. (ii)

    \((D_{C}(x_{k}))_{k \in \mathbb {N}}\) is decreasing.

  3. (iii)

    \((\forall k \in \mathbb {N})(\forall p \in \mathbb {N})\quad D_{C}(x_{k+p}) \leq D_{C}(x_{k}) - D_{\phi }(P_{C} (x_{k}),P_{C}(x_{k+p}))\).

  4. (iv)

    \((\forall x \in C\cap \textup {dom} \phi )(\forall x^{\prime } \in C\cap \textup {dom} \phi ) \quad \langle {x - x^{\prime }, \nabla \phi (x_{k})}\rangle \) is convergent.

  5. (v)

    Suppose that domϕ is open. Then \((x_{k})_{k \in \mathbb {N}}\) is bounded.

  6. (vi)

    If all cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C, then \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C ∩int(domϕ).

Concerning Proposition B.1(vi), we now give a result ensuring that the cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C. In the next result, we will consider the so-called sequential consistency assumption [5].

Proposition B.2

Suppose that DC(xk) → 0 and that for all bounded sequences \((z_{k})_{k \in \mathbb {N}}\) and \((y_{k})_{k \in \mathbb {N}}\) in int(domϕ)

$$ D_{\phi}(z_{k},y_{k}) \to 0\ \Rightarrow\ z_{k} - y_{k} \to 0. $$
(B8)

Then \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C ∩int(domϕ).

Proof

Let x ∈ C ∩ int(domϕ). It follows from Proposition B.1(i) and Fact A.1(ix) that \((x_{k})_{k \in \mathbb {N}}\) is contained in the compact set {Dϕ(x,⋅) ≤ Dϕ(x,x0)}⊂ int(domϕ). Hence, the set of cluster points of \((x_{k})_{k \in \mathbb {N}}\) is nonempty and contained in int(domϕ). Moreover, it follows from Proposition B.1(iii) (with k = 0) and Fact A.1(ix) that \((P_{C}(x_{p}))_{p \in \mathbb {N}}\) is contained in the compact set {Dϕ(PC(x0),⋅) ≤ DC(x0)} and hence it is bounded. Let x be a cluster point of \((x_{k})_{k \in \mathbb {N}}\) and let \((x_{n_{k}})_{k \in \mathbb {N}}\) be a subsequence such that \(x_{n_{k}} \to x\). As shown above, x ∈ int(domϕ). Moreover, \(D_{\phi }(P_{C}(x_{n_{k}}), x_{n_{k}}) = D_{C}(x_{n_{k}})\to 0\) and hence, by virtue of (B8), \(P_{C}(x_{n_{k}}) - x_{n_{k}} \to 0\). Therefore, \(P_{C}(x_{n_{k}}) \to x\), which implies that x ∈ C, since C is closed. Thus, all cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C ∩ int(domϕ) and therefore, by Proposition B.1(vi), \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C. □

Appendix: C. Proof of Lemma 3.1(ii) under assumption H1

It follows from Proposition 3.1(iii) and H1 that there exists a \(\mathbb {P}\)-negligible set N ⊂ Ω such that \(C = \bigcap _{\omega \in {\Omega }\setminus N} C_{\xi (\omega )}\) and \(\sup _{\omega \in {\Omega }\setminus N}\|{A_{\xi (\omega )}^{*}(A_{\xi (\omega )} H^{2} A_{\xi (\omega )}^{*})^{\dagger } A_{\xi (\omega )}}\|\leq M<+\infty \). Note that, if x ∈ int(domϕ) ∩ C, then \(0=D_{C}(x)=D_{C}(P_{C_{\xi (\omega )}}(x))\) for every ω ∈ Ω ∖ N, hence (22) holds trivially. Therefore, we let x ∈ int(domϕ) ∖ C and set x⋆ = PC(x), y = ∇ϕ(x), and y⋆ = ∇ϕ(x⋆). Now, let ω ∈ Ω ∖ N. For the sake of brevity, we denote i = ξ(ω), \(x_{i} = P_{C_{i}}(x)\), yi = ∇ϕ(xi), and \(H = [\nabla ^{2}\phi ^{*}(y_{\star })]^{1/2}\). We proceed through six steps.

Step 1:

We have

$$ (\forall v_{i} \in \text{Im}(A_{i}^{*}))\quad y + v_{i} \in \text{int}(\text{dom} \phi^{*})\ \implies\ D_{C}(x_{i}) \leq D_{\phi^{*}}(y + v_{i}, y_{\star}). $$
(C9)

Indeed, Lemma 2.1 yields \(D_{C}(x_{i}) = D_{\phi }(P_{C}(x_{i}), x_{i}) = D_{\phi }(P_{C}(x),x_{i})= D_{\phi }(x_{\star },P_{C_{i}}(x))\), with x⋆ ∈ Ci. Hence, using Lemma 2.2 and Fact A.1(vi), (C9) follows.

Step 2:

There exists \(\tilde {w} \in \text {Im}(A^{*})\) such that \(\|{H \tilde {w}}\| = 1\) and, for all τ > 0,

$$ u_{\tau}:= H(y_{\star} - y + \tau \tilde w) \in V(x_{\star})\!\setminus\!\{0\}. $$
(C10)

Indeed, first recall that Im(HA∗) = V(x⋆) ≠ {0}. It follows from (12) that y⋆ − y ∈ Im(A∗). Now, if H(y⋆ − y) ≠ 0, we define \(\tilde {w}=(y_{\star } - y)/\|{H(y_{\star } - y)}\|\) and (C10) follows. Otherwise, since Im(HA∗) ≠ {0}, we can pick \(\tilde {w} \in \text {Im}(A^{*})\) such that \(\|{H \tilde {w}}\|=1\) and again (C10) follows.

Step 3:

Suppose that \(HA_{i}^{*}\neq 0\). We prove that, for every τ > 0, there exists \(v_{i,\tau } \in \text {Im}(A_{i}^{*})\) such that y + vi,τ − y⋆ ∉ Ker(H) and

$$ \|{y+v_{i,\tau} - y_{\star}}\|\leq (1+M\|{H}\|^{2})\|{y_{\star}-y}\| + 3\tau M \|{H}\|. $$
(C11)

Indeed, since \(HA_{i}^{*}\neq 0\), there exists \(w_{i} \in \text {Im}(HA^{*}_{i})\) such that ∥wi∥ = 2. Now, note that

$$ Q_{i}(x_{\star}) = H A_{i}^{*}[A_{i} H^{2} A_{i}^{*}]^{\dagger} A_{i} H $$
(C12)

and let, for every τ > 0,

$$ v_{i,\tau}:=A_{i}^{*}[A_{i} H^{2} A_{i}^{*}]^{\dagger} A_{i} H (u_{\tau} +\tau w_{i})\in\text{Im}(A_{i}^{*}). $$
(C13)

Then, recalling (C10) and (C12), and setting \(w = H \tilde {w}\), we have

$$ \begin{array}{@{}rcl@{}} H(y_{\star}-y-v_{i,\tau}) &=& H(y_{\star}-y) - Q_{i}(x_{\star})(u_{\tau} + \tau w_{i})\\ &=& [I - Q_{i}(x_{\star})]H(y_{\star}-y) - \tau Q_{i}(x_{\star})(w+w_{i}) \end{array} $$

and, since Qi(x⋆) is the projector onto \(\text {Im}(HA_{i}^{*})\) and \(w_{i} \in \text {Im}(HA_{i}^{*})\), we have

$$ \|{H(y+v_{i,\tau} - y_{\star})}\|^{2} = \|{[I - Q_{i}(x_{\star})]H(y_{\star}-y)}\|^{2} + \tau^{2} \|{Q_{i}(x_{\star})w + w_{i}}\|^{2}. $$
(C14)

In the above formula we have Qi(x⋆)w ≠ −wi, since ∥wi∥ = 2 while ∥Qi(x⋆)w∥ ≤ ∥w∥ = 1. Therefore ∥H(y + vi,τ − y⋆)∥2 > 0 and hence y + vi,τ − y⋆ ∉ Ker(H). Finally, inequality (C11) follows by bounding ∥vi,τ∥ using (C13), (C10), assumption H1, and the fact that ∥wi∥ = 2 and \(\|{H \tilde {w}}\|=1\).

Step 4:

Suppose that \(HA_{i}^{*}\neq 0\) and let \(\varepsilon \in \left ]0,1\right [\). We prove that for τ > 0 sufficiently small

$$ \frac{1 - \varepsilon}{2} \|{u_{\tau}}\|^{2} \leq D_{C}(x)\ \text{and}\ D_{C}(x_{i}) \leq \frac{1 + \varepsilon}{2} \Big(\|{[I - Q_{i}(x_{\star})]u_{\tau}}\|^{2} + \sqrt{\tau} D_{C}(x) \Big). $$
(C15)

Indeed, it follows from the second part of Fact A.1(xi), applied to \(D_{\phi ^{*}}\), that there exists \(\tilde \delta >0\) such that if \(\|{\tilde y-y_{\star }}\|<\tilde \delta \) and \(\tilde y-y_{\star }\not \in \text {Ker}(H)\), then

$$ \frac{\sqrt{1 - \varepsilon}}{2} \langle{H^{2} (\tilde y - y_{\star}), \tilde y - y_{\star}}\rangle \leq D_{\phi^{*}}(\tilde y, y_{\star}) \leq \frac{1 + \varepsilon}{2} \langle{H^{2} (\tilde y - y_{\star}), \tilde y - y_{\star}}\rangle. $$

Therefore, setting \(\beta _{\star }= 1 + \max \limits \{3 M\|{H}\|^{2}+M\|{H}\|,\|{\tilde w}\|\}>1\), it follows from the inequality \(\|{y_{\star }-y-\tau \tilde w}\| \leq \|{y_{\star }-y}\| + \tau \|{\tilde w}\|\) and (C11) that if τ ≤∥yy∥ and \(\|{y-y_{\star }}\|\leq \tilde \delta / \beta _{\star }\), we have

$$ D_{\phi^{*}}(y+v_{i,\tau}, y_{\star}) \leq \frac{1 + \varepsilon}{2} \langle{H^{2} (y+v_{i,\tau} - y_{\star}), y+v_{i,\tau} - y_{\star}}\rangle $$
(C16)

and

$$ D_{\phi^{*}}(y+\tau \tilde w, y_{\star}) \geq \frac{\sqrt{1 - \varepsilon}}{2} \langle{H^{2} (y +\tau \tilde w - y_{\star}), y +\tau \tilde w - y_{\star}}\rangle. $$
(C17)

Now, the continuity of ∇ϕ and Fact A.1(x) yield that there exists δ > 0 such that if DC(x) < δ then \(\|{y-y_{\star }}\|<\tilde \delta / \beta _{\star }\), and hence, collecting (C9) and (C16), we obtain \(D_{C}(x_{i}) \leq D_{\phi ^{*}}\big (y+v_{i,\tau },y_{\star } \big )\leq ((1 + \varepsilon )/2) \|{ H(y+v_{i,\tau } - y_{\star })}\|{}^{2}\). However, it also holds that ∥H(y + vi,τ − y⋆)∥ = ∥[I − Qi(x⋆)]uτ − τ(w + wi)∥ ≤ ∥[I − Qi(x⋆)]uτ∥ + 3τ. Therefore, since \(\|{u_{\tau }}\| \leq \|{H(y-y_{\star })}\| + \tau \|{H \tilde w}\| \leq (\|{H}\| + 1) \|{y - y_{\star }}\|\),

$$ \begin{array}{@{}rcl@{}} D_{C}(x_{i}) &\leq& \frac{1 + \varepsilon}{2}\left( \|{[I-Q_{i}(x_{\star})]u_{\tau}}\|^{2} + 9 \tau^{2} +6 \tau \|{u_{\tau}}\| \right)\\ &\leq& \frac{1 + \varepsilon}{2}\left( \|{[I-Q_{i}(x_{\star})]u_{\tau}}\|^{2} + 3 \tau(2 \|{H}\| + 5) \|{y - y_{\star}}\| \right) \end{array} $$

which, for \(\tau \leq \tau _{\star }^{(1)}:= \min \limits \{\|{y_{\star }-y}\|,9^{-1}D_{C}(x)^{2}\|{y_{\star }-y}\|^{-2}(2 \|{H}\| + 5)^{-2}\}\), gives

$$ D_{C}(x_{i}) \leq \frac{1 + \varepsilon}{2} \Big(\|{[I - Q_{i}(x_{\star})]u_{\tau}}\|^{2} + \sqrt{\tau} D_{C}(x) \Big). $$
(C18)

On the other hand, \(D_{C}(x) = D_{\phi }(x_{\star }, x) = D_{\phi ^{*}}(y, y_{\star }) \neq 0\). Hence, using the continuity of \(D_{\phi ^{*}}(\cdot ,y_{\star })\), we have that there exists \(\tau _{\star }^{(2)}>0\) such that for every \(\tau \leq \tau _{\star }^{(2)}\), \(D_{C}(x)\geq \sqrt {1-\varepsilon } D_{\phi ^{*}}(y+\tau \tilde w, y_{\star })\). So, (C17) yields

$$ D_{C}(x)\geq \frac{1 - \varepsilon}{2} \langle{H^{2} (y+ \tau \tilde{w} - y_{\star}), y + \tau \tilde{w}- y_{\star}}\rangle = \frac{1 - \varepsilon}{2} \|{u_{\tau}}\|^{2}. $$
(C19)
Step 5:

For \(\tau \leq \min \limits \{\tau _{\star }^{(1)},\tau _{\star }^{(2)}\}\), we have

$$ \frac{D_{C}(x_{i})}{D_{C}(x)} \leq \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{Q_{i}(x_{\star}) u_{\tau}}\|^{2}}{\|{u_{\tau}}\|^{2}} \bigg)+ \frac{1 + \varepsilon}{2}\sqrt{\tau}. $$
(C20)

This follows from (C15) when \(HA_{i}^{*}\neq 0\). However, (C20) holds even when \(\text {Im}(HA_{i}^{*})=\{0\}\): in that case, recalling the definition of Qi(x⋆), we have Qi(x⋆) ≡ 0 and, since DC(xi) ≤ DC(x), inequality (C20) actually holds for every τ > 0.

Step 6:

Note that inequality (C20) holds with i = ξ(ω) for every ω ∈ Ω ∖ N, and that \(\tau _{\star }^{(1)}\) and \(\tau _{\star }^{(2)}\) are independent of i = ξ(ω). Therefore, the above inequality implies that

$$ \frac{D_{C}(P_{C_{\xi}}(x))}{D_{C}(x)} \leq \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{Q_{\xi}(x_{\star}) u_{\tau}}\|{}^{2}}{\|{u_{\tau}}\|^{2}} \bigg)+ \frac{1+\varepsilon}{2}\sqrt{\tau}, \mathbb{P}\text{-a.s.} $$

So, taking the expectation and recalling definition (21), we have

$$ \begin{array}{@{}rcl@{}} \frac{\mathbb{E}[D_{C}(P_{C_{\xi}}(x))]}{D_{C}(x)} &\leq& \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{\overline{Q}(x_{\star}) u}\|^{2}}{\|{u}\|^{2}} \bigg) + \frac{1+\varepsilon}{2}\sqrt{\tau}\\ &\leq& \frac{1+\varepsilon}{1-\varepsilon} [1 - \gamma_{\mathcal{C}}(x_{\star})]+ \frac{1+\varepsilon}{2}\sqrt{\tau}. \end{array} $$

Finally, letting τ → 0 in the above inequality, the statement follows. □

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kostić, V.R., Salzo, S. The method of randomized Bregman projections for stochastic feasibility problems. Numer Algor 93, 1269–1307 (2023). https://doi.org/10.1007/s11075-022-01468-8
