Abstract
In this work, we study the method of randomized Bregman projections for stochastic convex feasibility problems, possibly with an infinite number of sets, in Euclidean spaces. Under very general assumptions, we prove almost sure convergence of the iterates to a random almost common point of the sets. We then analyze in depth the case of affine sets, showing that the iterates converge Q-linearly and also providing global and local rates of convergence. This work generalizes recent developments in randomized methods for the solution of linear systems based on orthogonal projection methods. We provide several applications: sketch & project methods for solving linear systems of equations, the positive definite matrix completion problem, gossip algorithms for network consensus, the assessment of robust stability of dynamical systems, and computational solutions for multimarginal optimal transport.
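For the affine case mentioned in the abstract, the Euclidean specialization (ϕ = ∥⋅∥²/2) of the randomized Bregman projection method is the randomized Kaczmarz / sketch & project iteration of Strohmer–Vershynin and Gower–Richtárik. The following is a minimal illustrative sketch of that special case (function name and iteration count are our own, not the paper's notation):

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters=2000, seed=0):
    """Randomized Kaczmarz: at each step, orthogonally project the iterate
    onto a randomly chosen hyperplane {x : <a_i, x> = b_i}. This is the
    Euclidean (phi = ||.||^2 / 2) special case of randomized Bregman
    projections onto affine sets."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    m = A.shape[0]
    # sample rows with probability proportional to ||a_i||^2 (Strohmer-Vershynin)
    p = np.sum(A**2, axis=1)
    p = p / p.sum()
    for _ in range(iters):
        i = rng.choice(m, p=p)
        a = A[i]
        # orthogonal projection onto the i-th hyperplane
        x += (b[i] - a @ x) / (a @ a) * a
    return x
```

On a consistent system the iterates converge linearly in expectation to the projection of the initial point onto the solution set, which is the behavior the paper's affine-case analysis generalizes to arbitrary Legendre functions.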
Data Availability
Data used in this article is available in a public repository.
Notes
Note that Dϕ is not a distance in the sense of metric topology, and even when ϕ(x) = (1/2)∥x∥2 it is one half of the square of the distance between x and y.
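As a small numeric illustration of this note (helper names are ours), for ϕ = ∥⋅∥²/2 the Bregman divergence is one half of the squared Euclidean distance, while for the negative entropy it is the Kullback–Leibler divergence, which is not symmetric:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Energy: phi(x) = 0.5 ||x||^2  ->  D_phi(x, y) = 0.5 ||x - y||^2.
energy = (lambda x: 0.5 * (x @ x), lambda x: x)

# Negative entropy: phi(x) = sum x_i log x_i (x > 0)
#   -> D_phi is the Kullback-Leibler divergence on the simplex (asymmetric).
negent = (lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1.0)
```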
This means that \(\nu (A,x) = {\int \limits }_{A} [D_{C_{i}}(x)/\overline {D}_{C}(x)] \mu (d i)\) if x∉C and ν(A,x) = μ(A) if x ∈ C.
References
Ash, R.B., Doléans-Dade, C.A.: Probability & Measure Theory. Academic Press, San Diego, CA, USA (2000)
Azizan, N., Hassibi, B.: Stochastic gradient/mirror descent: minimax optimality and implicit regularization. In: International Conference on Learning Representations (ICLR), pp. 1–18 (2019)
Banerjee, A., Merugu, S., Dhillon, I., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3, 615–647 (2001)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)
Bauschke, H.H., Combettes, P.L.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4, 27–67 (1997)
Bauschke, H.H., Combettes, P.L.: Iterating Bregman retractions. SIAM J. Optim. 13, 1159–1173 (2003)
Bauschke, H.H., Wang, X., Ye, J., Yuan, X.: Bregman distances and Chebyshev sets. J. Approx. Theory 159, 3–25 (2009)
Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37, A1111–A1138 (2015)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)
Butnariu, D., Censor, Y., Reich, S.: Iterative averaging of entropic projections for solving stochastic convex feasibility problems. Comput. Optim. Appl. 8, 21–39 (1997)
Butnariu, D., Flåm, S.D.: Strong convergence of expected-projection methods in Hilbert spaces. Numer. Funct. Anal. Optim. 16, 601–636 (1995)
Butnariu, D., Iusem, A., Burachik, R.: Iterative methods of solving stochastic convex feasibility problems and applications. Comput. Optim. Appl. 15, 269–307 (2000)
Calafiore, G., Polyak, B.T.: Stochastic algorithms for exact and approximate feasibility of robust LMIs. IEEE Trans. Autom. Control 46(11), 1755–1759 (2001)
Castaing, C., Valadier, M.: Convex Analysis and Measurable Multifunctions. Lecture Notes in Mathematics, vol. 580. Springer, New York (1977)
Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim. Theory Appl. 34, 321–353 (1981)
Censor, Y., Reich, S.: Iteration of paracontractions and firmly nonexpansive operators with applications to feasibility optimization. Optimization 37, 323–339 (1996)
Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, New York (1997)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3, 538–543 (1993)
Cimmino, G.: Calcolo approssimato per le soluzioni di sistemi di equazioni lineari. La Ricerca Scientifica Anno IX(2), 326–333 (1938)
Combettes, P.L.: The foundations of set theoretic estimation. Proc. IEEE 81, 182–208 (1993)
Combettes, P.L., Pesquet, J.-C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping II: mean-square and linear convergence. Math. Program. Ser. B 174(1), 433–451 (2019)
Dessein, A., Papadakis, N., Rouas, J.-L.: Regularized optimal transport and the ROT mover’s distance. J. Mach. Learn. Res. 19, 1–53 (2018)
Deutsch, F.: The method of alternating orthogonal projections. In: Singh, S. (ed.) Approximation Theory, Spline Functions and Applications. Kluwer Academic (1992)
Dhillon, I.S., Tropp, J.A.: Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl. 29, 1120–1146 (2007)
Duff, I.S., Grimes, R.G., Lewis, J.G.: Users’ guide for the Harwell-Boeing sparse matrix collection (Release I) (1992)
Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge University Press, New York (2010)
Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the β-divergence. Neural Comput. 23, 2421–2456 (2011)
Gower, R., Molitor, D., Moorman, J., Needell, D.: Adaptive sketch-and-project methods for solving linear systems. SIAM J. Matrix Anal. Appl. 42(2), 954–989 (2021)
Gower, R., Richtarik, P.: Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl. 36, 1660–1690 (2015)
Halperin, I.: The product of projection operators. Acta Sci. Math. (Szeged) 23, 96–99 (1962)
Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression. Wiley (2005)
Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci., Cl. Sci. Math., Ser. A, Sci. Math. 35, 355–357 (1937)
Kostic, V.R., Miedlar, A., Stolwijk, J.: On matrix nearness problems: distance to delocalization. SIAM J. Matrix Anal. Appl. 36, 435–460 (2015)
Hermer, N., Luke, D.R., Sturm, A.: Random function iterations for consistent stochastic feasibility. Numer. Funct. Anal. Optim. 40, 386–420 (2019)
Jelasity, M., Montresor, A., Babaoglu, O.: Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst. 23(3), 219–252 (2005)
Loizou, N., Richtárik, P.: Revisiting randomized gossip algorithms: general framework, convergence rates and novel block and accelerated protocols, pp. 1-44. arXiv:1905.08645 (2019)
Mangesius, H., Xue, X.D., Hirche, S.: Consensus driven by the geometric mean. IEEE Trans. Control Netw. Syst. 5(1), 251–261 (2016)
Martinsson, P.-G., Tropp, J.: Randomized numerical linear algebra: foundations & algorithms. Acta Numer. 29, 403–572 (2020)
Mavroforakis, C., Erdös, D., Crovella, M., Terzi, E.: Active positive-definite matrix completion. In: Proceedings of the 2017 SIAM international conference on data mining, pp. 264–272 (2017)
Mazko, A.: Matrix equations, spectral problems and stability of dynamic systems. Stability, oscillations and optimization of systems. Cambridge Scientific Publishers (2008)
Muzellec, B., Nock, R., Patrini, G., Nielsen, F.: Tsallis regularized optimal transport and ecological inference. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, pp. 2387–2393 (2017)
Necoara, I., Richtárik, P., Patrascu, A.: Randomized projection methods for convex feasibility: conditioning and convergence rates. SIAM J. Optim. 29, 2814–2852 (2019)
Nedić, A.: Random projection algorithms for convex set intersection problems. In: 49th IEEE Conference on Decision and Control (CDC), pp. 7655–7660 (2010)
Needell, D., Rebrova, E.: On block Gaussian sketching for the Kaczmarz method. Numer. Algorithms 86(1), 443–473 (2021)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)
Peyré, G., Cuturi, M.: Computational Optimal Transport: With Applications to Data Science. Now Publishers, USA (2019)
Reem, D., Reich, S., De Pierro, A.: Re-examination of Bregman functions and new properties of their divergences. Optimization 68, 279–348 (2019)
Richtárik, P., Takáč, M.: Stochastic reformulations of linear systems: algorithms and convergence theory. SIAM J. Matrix Anal. Appl. 41, 487–524 (2020)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rüschendorf, L.: Convergence of the iterative proportional fitting procedure. Ann. Statist. 23, 1160–1174 (1995)
Steinerberger, S.: Randomized Kaczmarz converges along small singular vectors. SIAM J. Matrix Anal. Appl. 42(2), 608–615 (2021)
Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15, 262–278 (2009)
Sun, T., Tran-Dinh, Q.: Generalized self-concordant functions: a recipe for Newton-type methods. Math. Program. 178, 145–213 (2019)
Acknowledgements
We wish to thank three anonymous referees whose helpful comments led to the improvement of the originally submitted version.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Vladimir R. Kostić and Saverio Salzo contributed equally to this work.
Appendices
Appendix: A. Basic facts on the Bregman projection method
In the following, we collect a few important facts about Bregman distances generated by Legendre functions (see [4, 6, 7, 48]). Note that item (xi) follows from Taylor’s formula for ϕ.
Fact A.1
Let ϕ be a Legendre function. Then the following properties hold.
(i) (∀x ∈domϕ)(∀y ∈int(domϕ)) Dϕ(x,y) = ϕ(x) + ϕ∗(∇ϕ(y)) −〈x,∇ϕ(y)〉.

(ii) (∀y ∈int(domϕ)) Dϕ(⋅,y) is strictly convex on int(domϕ) and coercive.

(iii) (∀x,y ∈int(domϕ)) Dϕ(x,y) = 0 ⇔ x = y.

(iv) (∀x,y ∈int(domϕ)) Dϕ(x,y) + Dϕ(y,x) = 〈x − y,∇ϕ(x) −∇ϕ(y)〉≥ 0.

(v) (Three-Point Identity [19]) For every x ∈ X and y,z ∈int(domϕ), we have
$$ D_{\phi}(x,z) = D_{\phi}(x,y)+D_{\phi}(y,z) - \langle{x- y, \nabla\phi(z)-\nabla\phi(y)}\rangle. $$ (A1)

(vi) (∀x,y ∈int(domϕ)) \(D_{\phi }(x,y)=D_{\phi ^{*}}(\nabla \phi (y),\nabla \phi (x))\).

(vii) Dϕ is continuous on int(domϕ) ×int(domϕ).

(viii) Suppose that ϕ is twice differentiable on int(domϕ). Then
$$ \big(\forall x\in\textup{int}(\textup{dom}\phi),\ \nabla^{2}\phi(x)\text{ is invertible}\big)\Leftrightarrow \big(\phi^{*}\text{ is twice differentiable}\big). $$ (A2)

(ix) Suppose that domϕ∗ is open. Then, for every x ∈int(domϕ), the sublevel sets of Dϕ(x,⋅) are compact, and hence Dϕ(x,⋅) is lower semicontinuous.

(x) Suppose that domϕ∗ is open. Then, for every x ∈int(domϕ) and every sequence \((y_{k})_{k \in \mathbb {N}}\) in int(domϕ),
$$ D_{\phi}(x,y_{k}) \to 0\ \Rightarrow\ y_{k} \to x. $$ (A3)
Consequently, for every x ∈int(domϕ) and ε > 0, there exists δ > 0 such that, for every y ∈int(domϕ), Dϕ(x,y) < δ ⇒ ∥x − y∥ < ε.

(xi) If ϕ is twice differentiable on int(domϕ), then for every x,y ∈int(domϕ) there exists ξ ∈ [x,y] such that
$$ D_{\phi}(x,y)=\frac{1}{2}\langle{\nabla^{2}\phi(\xi)(x-y),x-y}\rangle. $$ (A4)
Moreover, for every y ∈int(domϕ) and every ε > 0 there exists δ > 0 such that, for every x ∈int(domϕ) with x − y∉Ker(∇2ϕ(y)),
$$ \|{x - y}\| \leq \delta\ \Rightarrow\ \left| \frac{D_{\phi}(x,y) - \frac 1 2 \langle{\nabla^{2} \phi(y) (x-y),x-y}\rangle}{\frac 1 2\langle{\nabla^{2} \phi(y) (x - y), x - y}\rangle}\right| \leq \varepsilon. $$ (A5)
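Items (iv)–(vi) of Fact A.1 can be sanity-checked numerically for the negative-entropy Legendre pair, whose conjugate formulas ϕ∗(y) = Σ exp(yᵢ − 1), ∇ϕ∗(y) = exp(y − 1) are standard (the test points below are arbitrary positive vectors of our choosing):

```python
import numpy as np

# Legendre pair: phi(x) = sum x_i log x_i (x > 0),
# phi*(y) = sum exp(y_i - 1), grad phi(x) = log x + 1, grad phi*(y) = exp(y - 1).
phi   = lambda x: np.sum(x * np.log(x))
gphi  = lambda x: np.log(x) + 1.0
phis  = lambda y: np.sum(np.exp(y - 1.0))
gphis = lambda y: np.exp(y - 1.0)

D  = lambda x, y: phi(x) - phi(y) - gphi(y) @ (x - y)    # D_phi
Ds = lambda u, v: phis(u) - phis(v) - gphis(v) @ (u - v)  # D_{phi*}

x = np.array([0.2, 0.5]); y = np.array([0.7, 0.1]); z = np.array([0.4, 0.9])

# (v) three-point identity (A1)
lhs = D(x, z)
rhs = D(x, y) + D(y, z) - (x - y) @ (gphi(z) - gphi(y))
# (vi) duality: D_phi(x, y) = D_{phi*}(grad phi(y), grad phi(x))
dual = Ds(gphi(y), gphi(x))
```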
In addition to the above facts, we will also use the following.
Fact A.2
Let A: X → Y be a linear operator and let A† be its Moore–Penrose pseudoinverse. Then AA† = A(A∗A)†A∗ is the orthogonal projector onto Im(A), and \(\|{A^{\dagger }}\|^{-1} = \inf _{z\in \textup {Ker}(A)^{\perp }\setminus \{0\}} \|{Az}\|/\|{z}\|\) is the smallest positive singular value of A.
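Both identities in Fact A.2 can be verified with NumPy's Moore–Penrose pseudoinverse (`np.linalg.pinv`) on a rank-deficient matrix of our choosing:

```python
import numpy as np

A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])      # rank 2: third row = first + second
Ad = np.linalg.pinv(A)

P = A @ Ad                            # claimed orthogonal projector onto Im(A)
P_alt = A @ np.linalg.pinv(A.T @ A) @ A.T   # A (A* A)^dagger A*

s = np.linalg.svd(A, compute_uv=False)
sigma_min_pos = np.min(s[s > 1e-10])  # smallest positive singular value
```

An orthogonal projector is idempotent and symmetric and fixes the range; the spectral norm of A† is the reciprocal of the smallest positive singular value.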
Fact A.3 ([27, Example 5.1.5])
Let ζ1 and ζ2 be independent random variables with values in the measurable spaces \(\mathcal {Z}_{1}\) and \(\mathcal {Z}_{2}\) respectively. Let \(\varphi \colon \mathcal {Z}_{1}\times \mathcal {Z}_{2} \to \mathbb {R}\) be measurable and suppose that \(\mathbb {E}[|{\varphi (\zeta _{1},\zeta _{2})}|]<+\infty \). Then \(\mathbb {E}[\varphi (\zeta _{1},\zeta _{2}) | \zeta _{1}] = \psi (\zeta _{1})\), where for all \(z_{1} \in \mathcal {Z}_{1}\), \(\psi (z_{1}) = \mathbb {E}[\varphi (z_{1}, \zeta _{2})]\).
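Fact A.3 can be illustrated exactly on finite distributions, where both the conditional expectation and the frozen-argument average ψ are finite sums (the toy variables and the function φ below are ours):

```python
import numpy as np
from itertools import product

# zeta1 uniform on {1, 2}, zeta2 uniform on {10, 20, 30}, independent;
# phi(z1, z2) = z1 * z2.
Z1, Z2 = [1, 2], [10, 20, 30]
phi = lambda z1, z2: z1 * z2

# psi(z1) = E[phi(z1, zeta2)]: freeze the first argument, average the second.
psi = lambda z1: np.mean([phi(z1, z2) for z2 in Z2])

# E[phi | zeta1 = z1] computed directly from the joint (product) distribution.
def cond_exp(z1):
    vals = [phi(a, b) for a, b in product(Z1, Z2) if a == z1]
    return np.mean(vals)
```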
Fact A.4 ([27, Theorem 3.2.4])
Let \((x_{k})_{k \in \mathbb {N}}\) be a sequence of X-valued random variables and let x be an X-valued random variable. Then the following hold.
(i) Suppose that the xk are uniformly essentially bounded, i.e., \(\sup _{k \in \mathbb {N}} \textup {esssup} \|{x_{k}}\|<+\infty \). Then xk → x \(\mathbb {P}\)-a.s. ⇒ \(\mathbb {E}[\|{x_{k}-x}\|^{2}]\to 0\).

(ii) Suppose that xk ∈ U ⊂ X \(\mathbb {P}\)-a.s. and T : U → Y is continuous. Then xk → x in distribution ⇒ T(xk) → T(x) in distribution.
Proof of Lemma 2.1
Let \(x_{i} = P_{C_{i}}(x)\), i = 1, 2 and z ∈ C2. Then using Fact 2.2 (iii), Dϕ(x2,x1) + Dϕ(x1,x) = Dϕ(x2,x) ≤ Dϕ(z,x) = Dϕ(z,x1) + Dϕ(x1,x), which yields Dϕ(x2,x1) ≤ Dϕ(z,x1). Hence \(x_{2} = P_{C_{2}}(x_{1})\) and \(D_{C_{2}}(x_{1}) + D_{C_{1}}(x) = D_{C_{2}}(x)\). □
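For the Euclidean case ϕ = ∥⋅∥²/2 and nested affine sets C2 ⊂ C1, the two conclusions of the proof above, \(P_{C_{2}}(P_{C_{1}}(x)) = P_{C_{2}}(x)\) and \(D_{C_{2}}(x_{1}) + D_{C_{1}}(x) = D_{C_{2}}(x)\), can be checked directly (the sets and the test point are an illustrative toy choice):

```python
import numpy as np

# Nested affine sets in R^3: C1 = {x : x3 = 0}, C2 = {x : x2 = x3 = 0} ⊂ C1.
P1 = lambda x: x * np.array([1., 1., 0.])   # orthogonal projection onto C1
P2 = lambda x: x * np.array([1., 0., 0.])   # orthogonal projection onto C2
D  = lambda u, v: 0.5 * np.sum((u - v)**2)  # D_phi for phi = ||.||^2 / 2

x  = np.array([3., -2., 5.])
x1 = P1(x)
```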
Proof of Lemma 2.2
Since Az = b, it follows from (11) and Fact A.1(vi) that
Moreover, it follows from Fact A.1(v) that
which together with (A6) yields (i). Next, since PC(x) ∈ C, weak duality yields \(D_{\phi }(P_{C}(x),x) \geq - {\Psi }^{x}_{C}(\lambda )\). Then, by (i), Dϕ(PC(x),x) ≥ Dϕ(z,x) − Dϕ(z,∇ϕ∗(∇ϕ(x) + A∗λ)). Statement (ii) follows from the Pythagoras-type identity given in Proposition 2.2(iii). □
Appendix: B. D-Fejér monotone sequences [5, 18]
Let C ⊂ X be a nonempty closed convex set. Let ϕ be a Legendre function such that \(C \cap \text {int}(\text {dom} \phi ) \neq \varnothing \). A sequence \((x_{k})_{k \in \mathbb {N}}\) in int(domϕ) is Bregman monotone, or D-Fejér monotone, w.r.t. C if
$$ (\forall x \in C \cap \textup{dom}\phi)(\forall k \in \mathbb{N})\quad D_{\phi}(x, x_{k+1}) \leq D_{\phi}(x, x_{k}). $$
For D-Fejér monotone sequences, the following properties are known [5, Proposition 4.1, Example 4.7, and Theorem 4.1(i)].
Proposition B.1
Let \((x_{k})_{k \in \mathbb {N}}\) be a D-Fejér monotone sequence with respect to C. Then the following hold.

(i) \((\forall x \in C\cap \textup {dom} \phi )\quad (D_{\phi }(x,x_{k}))_{k \in \mathbb {N}}\) is decreasing.

(ii) \((D_{C}(x_{k}))_{k \in \mathbb {N}}\) is decreasing.

(iii) \((\forall k \in \mathbb {N})(\forall p \in \mathbb {N})\quad D_{C}(x_{k+p}) \leq D_{C}(x_{k}) - D_{\phi }(P_{C} (x_{k}),P_{C}(x_{k+p}))\).

(iv) \((\forall x \in C\cap \textup {dom} \phi )(\forall x^{\prime } \in C\cap \textup {dom} \phi ) \quad (\langle {x - x^{\prime }, \nabla \phi (x_{k})}\rangle )_{k \in \mathbb {N}}\) is convergent.

(v) Suppose that domϕ∗ is open. Then \((x_{k})_{k \in \mathbb {N}}\) is bounded.

(vi) If all cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C, then \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C ∩int(domϕ).
Concerning Proposition B.1(vi), we now give a result ensuring that the cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C. In the next result, we will consider the so-called sequential consistency assumption [5].
Proposition B.2
Suppose that DC(xk) → 0 and that, for all bounded sequences \((z_{k})_{k \in \mathbb {N}}\) and \((y_{k})_{k \in \mathbb {N}}\) in int(domϕ),
$$ D_{\phi}(z_{k}, y_{k}) \to 0\ \Rightarrow\ z_{k} - y_{k} \to 0. $$ (B8)
Then \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C ∩int(domϕ).
Proof
Let x ∈ C ∩int(domϕ). It follows from Proposition B.1(i) and Fact A.1(ix) that \((x_{k})_{k \in \mathbb {N}}\) is contained in the compact set {Dϕ(x,⋅) ≤ Dϕ(x,x0)}⊂int(domϕ). Hence, the set of cluster points of \((x_{k})_{k \in \mathbb {N}}\) is nonempty and contained in int(domϕ). Moreover, it follows from Proposition B.1(iii) (with k = 0) and Fact A.1(ix) that \((P_{C}(x_{p}))_{p \in \mathbb {N}}\) is contained in the compact set {Dϕ(PC(x0),⋅) ≤ DC(x0)} and hence is bounded. Let x be a cluster point of \((x_{k})_{k \in \mathbb {N}}\) and let \((x_{n_{k}})_{k \in \mathbb {N}}\) be a subsequence such that \(x_{n_{k}} \to x\). As shown above, x ∈int(domϕ). Moreover, \(D_{\phi }(P_{C}(x_{n_{k}}), x_{n_{k}}) = D_{C}(x_{n_{k}})\to 0\) and hence, by virtue of (B8), we have that \(P_{C}(x_{n_{k}}) - x_{n_{k}} \to 0\). Therefore, \(P_{C}(x_{n_{k}}) \to x\), which implies that x ∈ C, since C is closed. Thus, all cluster points of \((x_{k})_{k \in \mathbb {N}}\) lie in C ∩int(domϕ) and therefore, by Proposition B.1(vi), \((x_{k})_{k \in \mathbb {N}}\) converges to some point in C. □
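The monotonicity in Proposition B.1(ii) and the convergence mechanism of Proposition B.2 can be observed numerically in the Euclidean case with alternating projections onto two lines through the origin, where C = {0} and D_C(x) = ½∥x∥² (an illustrative sketch, not the paper's general setting):

```python
import numpy as np

# Two lines through the origin in R^2, spanned by unit vectors u1, u2;
# C = their intersection = {0}. Projection onto a line: <u, x> u.
u1 = np.array([1., 0.])
u2 = np.array([1., 1.]) / np.sqrt(2.0)
proj = lambda u, x: (u @ x) * u

x = np.array([2., 5.])
dists = []                       # D_C(x_k) = 0.5 ||x_k||^2 along the iteration
for k in range(40):
    x = proj(u1, x) if k % 2 == 0 else proj(u2, x)
    dists.append(0.5 * (x @ x))
```

The recorded sequence \((D_C(x_k))\) is decreasing, and the iterates converge to the point of C, in line with Proposition B.1(ii) and (vi).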
Appendix: C. Proof of Lemma 3.1(ii) under assumption H1
It follows from Proposition 3.1(iii) and H1 that there exists a \(\mathbb {P}\)-negligible set N ⊂Ω such that \(C = \bigcap _{\omega \in {\Omega }\setminus N} C_{\xi (\omega )}\) and \(\sup _{\omega \in {\Omega }\setminus N}\|{A_{\xi (\omega )}^{*}(A_{\xi (\omega )} H^{2} A_{\xi (\omega )}^{*})^{\dagger } A_{\xi (\omega )}}\|\leq M<+\infty \). Note that, if x ∈int(domϕ) ∩ C, then \(0=D_{C}(x)=D_{C}(P_{C_{\xi (\omega )}}(x))\) for every ω ∈Ω∖ N, hence (22) holds trivially. Therefore, we let x ∈int(domϕ) ∖ C and set x⋆ = PC(x), y = ∇ϕ(x), y⋆ = ∇ϕ(x⋆). Now, let ω ∈Ω∖ N. For brevity, we write i = ξ(ω), \(x_{i} = P_{C_{i}}(x)\), yi = ∇ϕ(xi), and H = [∇2ϕ∗(y⋆)]1/2. We proceed in six steps.
Step 1: We have
$$ (\forall v_{i} \in \text{Im}(A_{i}^{*}))\quad y + v_{i} \in \text{int}(\text{dom} \phi^{*})\ \implies\ D_{C}(x_{i}) \leq D_{\phi^{*}}(y + v_{i}, y_{\star}). $$ (C9)
Indeed, Lemma 2.1 yields \(D_{C}(x_{i}) = D_{\phi }(P_{C}(x_{i}), x_{i}) = D_{\phi }(P_{C}(x),x_{i})= D_{\phi }(x_{\star },P_{C_{i}}(x))\), with x⋆ ∈ Ci. Hence, using Lemma 2.2 and Fact A.1(vi), (C9) follows.
Step 2: There exists \(\tilde {w} \in \text {Im}(A^{*})\) such that \(\|{H \tilde {w}}\| = 1\) and, for all τ > 0,
$$ u_{\tau}:= H(y_{\star} - y + \tau \tilde w) \in V(x_{\star})\setminus\{0\}. $$ (C10)
Indeed, first recall that Im(HA∗) = V (x⋆)≠{0}. It follows from (12) that y⋆ − y ∈Im(A∗). Now, if H(y⋆ − y)≠ 0, we define \(\tilde {w}=(y_{\star } - y)/\|{H(y_{\star } - y)}\|\) and (C10) follows. Otherwise, since Im(HA∗)≠{0}, we can pick \(\tilde {w} \in \text {Im}(A^{*})\) such that \(\|{H \tilde {w}}\|=1\) and again (C10) follows.
Step 3: Suppose that \(HA_{i}^{*}\neq 0\). We prove that, for every τ > 0, there exists \(v_{i,\tau } \in \text {Im}(A_{i}^{*})\) such that y + vi,τ − y⋆∉Ker(H) and
$$ \|{y+v_{i,\tau} - y_{\star}}\|\leq (1+M\|{H}\|^{2})\|{y_{\star}-y}\| + 3\tau M \|{H}\|. $$ (C11)
Indeed, since \(HA_{i}^{*}\neq 0\), there exists \(w_{i} \in \text {Im}(HA^{*}_{i})\) such that ∥wi∥ = 2. Now, note that
$$ Q_{i}(x_{\star}) = H A_{i}^{*}[A_{i} H^{2} A_{i}^{*}]^{\dagger} A_{i} H $$ (C12)
and let, for every τ > 0,
$$ v_{i,\tau}:=A_{i}^{*}[A_{i} H^{2} A_{i}^{*}]^{\dagger} A_{i} H (u_{\tau} +\tau w_{i})\in\text{Im}(A_{i}^{*}). $$ (C13)
Then, recalling (C10), (C12), and the fact that \(w = H \tilde {w}\), we have
$$ \begin{array}{@{}rcl@{}} H(y_{\star}-y-v_{i,\tau}) &=& H(y_{\star}-y) - Q_{i}(x_{\star})(u_{\tau} + \tau w_{i})\\ &=& [I - Q_{i}(x_{\star})]H(y_{\star}-y) - \tau Q_{i}(x_{\star})(w+w_{i}) \end{array} $$
and, since Qi(x⋆) is the projector onto \(\text {Im}(HA_{i}^{*})\) and \(w_{i} \in \text {Im}(HA_{i}^{*})\), we have
$$ \|{H(y+v_{i,\tau} - y_{\star})}\|^{2} = \|{[I - Q_{i}(x_{\star})]H(y_{\star}-y)}\|^{2} + \tau^{2} \|{Q_{i}(x_{\star})w + w_{i}}\|^{2}. $$ (C14)
In the above formula we have Qi(x⋆)w≠ − wi, since ∥wi∥ = 2 while ∥Qi(x⋆)w∥≤∥w∥ = 1. Therefore ∥H(y + vi,τ − y⋆)∥2 > 0 and hence y + vi,τ − y⋆∉Ker(H). Finally, inequality (C11) follows by bounding ∥vi,τ∥ using (C13), (C10), assumption H1, and the fact that ∥wi∥ = 2 and \(\|{H \tilde {w}}\|=1\).
Step 4: Suppose that \(HA_{i}^{*}\neq 0\) and let \(\varepsilon \in \left ]0,1\right [\). We prove that, for τ > 0 sufficiently small,
$$ \frac{1 - \varepsilon}{2} \|{u_{\tau}}\|^{2} \leq D_{C}(x)\quad \text{and}\quad D_{C}(x_{i}) \leq \frac{1 + \varepsilon}{2} \Big(\|{[I - Q_{i}(x_{\star})]u_{\tau}}\|^{2} + \sqrt{\tau} D_{C}(x) \Big). $$ (C15)
Indeed, it follows from the second part of Fact A.1(xi), applied to \(D_{\phi ^{*}}\), that there exists \(\tilde \delta >0\) such that, if \(\|{\tilde y-y_{\star }}\|<\tilde \delta \) and \(\tilde y-y_{\star }\not \in \text {Ker}(H)\), then
$$ \frac{\sqrt{1 - \varepsilon}}{2} \langle{H^{2} (\tilde y - y_{\star}), \tilde y - y_{\star}}\rangle \leq D_{\phi^{*}}(\tilde y, y_{\star}) \leq \frac{1 + \varepsilon}{2} \langle{H^{2} (\tilde y - y_{\star}), \tilde y - y_{\star}}\rangle. $$
Therefore, setting \(\beta _{\star }= 1 + \max \{3 M\|{H}\|^{2}+M\|{H}\|,\|{\tilde w}\|\}>1\), it follows from the inequality \(\|{y_{\star }-y-\tau \tilde w}\| \leq \|{y_{\star }-y}\| + \tau \|{\tilde w}\|\) and (C11) that, if τ ≤∥y − y⋆∥ and \(\|{y-y_{\star }}\|\leq \tilde \delta / \beta _{\star }\), we have
$$ D_{\phi^{*}}(y+v_{i,\tau}, y_{\star}) \leq \frac{1 + \varepsilon}{2} \langle{H^{2} (y+v_{i,\tau} - y_{\star}), y+v_{i,\tau} - y_{\star}}\rangle $$ (C16)
and
$$ D_{\phi^{*}}(y+\tau \tilde w, y_{\star}) \geq \frac{\sqrt{1 - \varepsilon}}{2} \langle{H^{2} (y +\tau \tilde w - y_{\star}), y +\tau \tilde w - y_{\star}}\rangle. $$ (C17)
Now, the continuity of ∇ϕ and Fact A.1(x) yield that there exists δ > 0 such that, if DC(x) < δ, then \(\|{y-y_{\star }}\|<\tilde \delta / \beta _{\star }\), and hence, collecting (C9) and (C16), we obtain \(D_{C}(x_{i}) \leq D_{\phi ^{*}}\big (y+v_{i,\tau },y_{\star } \big )\leq ((1 + \varepsilon )/2) \|{ H(y+v_{i,\tau } - y_{\star })}\|^{2}\). However, it also holds that ∥H(y + vi,τ − y⋆)∥ = ∥[I − Qi(x⋆)]uτ + τ(w − wi)∥≤∥[I − Qi(x⋆)]uτ∥ + 3τ. Therefore, since \(\|{u_{\tau }}\| \leq \|{H(y-y_{\star })}\| + \tau \|{H \tilde w}\| \leq (\|{H}\| + 1) \|{y - y_{\star }}\|\),
$$ \begin{array}{@{}rcl@{}} D_{C}(x_{i}) &\leq& \frac{1 + \varepsilon}{2}\left( \|{[I-Q_{i}(x_{\star})]u_{\tau}}\|^{2} + 9 \tau^{2} +6 \tau \|{u_{\tau}}\| \right)\\ &\leq& \frac{1 + \varepsilon}{2}\left( \|{[I-Q_{i}(x_{\star})]u_{\tau}}\|^{2} + 3 \tau(2 \|{H}\| + 5) \|{y - y_{\star}}\| \right) \end{array} $$
which, for \(\tau \leq \tau _{\star }^{(1)}:= \min \{\|{y_{\star }-y}\|,9^{-1}D_{C}(x)^{2}\|{y_{\star }-y}\|^{-2}(2 \|{H}\| + 5)^{-2}\}\), gives
$$ D_{C}(x_{i}) \leq \frac{1 + \varepsilon}{2} \Big(\|{[I - Q_{i}(x_{\star})]u_{\tau}}\|^{2} + \sqrt{\tau} D_{C}(x) \Big). $$ (C18)
On the other hand, \(D_{C}(x) = D_{\phi }(x_{\star }, x) = D_{\phi ^{*}}(y, y_{\star }) \neq 0\). Hence, using the continuity of \(D_{\phi ^{*}}(\cdot ,y_{\star })\), there exists \(\tau _{\star }^{(2)}>0\) such that, for every \(\tau \leq \tau _{\star }^{(2)}\), \(D_{C}(x)\geq \sqrt {1-\varepsilon } D_{\phi ^{*}}(y+\tau \tilde w, y_{\star })\). So, (C17) yields
$$ D_{C}(x)\geq \frac{1 - \varepsilon}{2} \langle{H^{2} (y+ \tau \tilde{w} - y_{\star}), y + \tau \tilde{w}- y_{\star}}\rangle = \frac{1 - \varepsilon}{2} \|{u_{\tau}}\|^{2}. $$ (C19)

Step 5: For \(\tau \leq \min \{\tau _{\star }^{(1)},\tau _{\star }^{(2)}\}\), we have
$$ \frac{D_{C}(x_{i})}{D_{C}(x)} \leq \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{Q_{i}(x_{\star}) u_{\tau}}\|^{2}}{\|{u_{\tau}}\|^{2}} \bigg)+ \frac{1 + \varepsilon}{2}\sqrt{\tau}. $$ (C20)
This follows from (C15) when \(HA_{i}^{*}\neq 0\). However, (C20) holds even when \(\text {Im}(HA_{i}^{*})=\{0\}\). Indeed, in this case, recalling the definition of Qi(x⋆), we have Qi(x⋆) ≡ 0. Hence, since DC(xi) ≤ DC(x), inequality (C20) actually holds for every τ > 0.
Step 6: Note that inequality (C20) holds with i = ξ(ω) for every ω ∈Ω∖ N, and that \(\tau _{\star }^{(1)}\) and \(\tau _{\star }^{(2)}\) are independent of i = ξ(ω). Therefore, the above inequality implies that
$$ \frac{D_{C}(P_{C_{\xi}}(x))}{D_{C}(x)} \leq \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{Q_{\xi}(x_{\star}) u_{\tau}}\|^{2}}{\|{u_{\tau}}\|^{2}} \bigg)+ \frac{1+\varepsilon}{2}\sqrt{\tau} \quad \mathbb{P}\text{-a.s.} $$
So, taking the expectation and recalling definition (21), we have
$$ \begin{array}{@{}rcl@{}} \frac{\mathbb{E}[D_{C}(P_{C_{\xi}}(x))]}{D_{C}(x)} &\leq& \frac{1+\varepsilon}{1-\varepsilon} \bigg(1 - \frac{\|{\overline{Q}(x_{\star}) u}\|^{2}}{\|{u}\|^{2}} \bigg) + \frac{1+\varepsilon}{2}\sqrt{\tau}\\ &\leq& \frac{1+\varepsilon}{1-\varepsilon} [1 - \gamma_{\mathcal{C}}(x_{\star})]+ \frac{1+\varepsilon}{2}\sqrt{\tau}. \end{array} $$
Finally, letting τ → 0 in the above inequality, the statement follows. □
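In the Euclidean affine case, the one-step expected contraction of D_C established above can be computed exactly for a small system, since the expectation is a finite sum over the rows (the matrix and the point below are an arbitrary toy choice; the sampling probabilities are proportional to the squared row norms):

```python
import numpy as np

# One-step expected decrease of D_C for Euclidean randomized row projections
# (randomized Kaczmarz): with rows sampled w.p. ||a_i||^2 / ||A||_F^2,
#   E[d(x_{k+1}, C)^2] <= (1 - sigma_min(A)^2 / ||A||_F^2) d(x_k, C)^2.
A = np.array([[2., 1.], [1., 3.], [1., -1.]])
x_true = np.array([1., 2.])
b = A @ x_true                        # consistent system: C = {x_true}
p = np.sum(A**2, axis=1); p = p / p.sum()

x = np.array([5., -4.])
d2 = np.sum((x - x_true)**2)          # squared distance to C before the step
exp_d2 = 0.0
for i in range(A.shape[0]):           # exact expectation: only 3 outcomes
    a = A[i]
    xi = x + (b[i] - a @ x) / (a @ a) * a
    exp_d2 += p[i] * np.sum((xi - x_true)**2)

rate = 1.0 - np.min(np.linalg.svd(A, compute_uv=False))**2 / np.sum(A**2)
```

Algebraically, the expected squared distance after one step equals d² − ∥A(x − x_true)∥²/∥A∥_F², which is bounded by rate·d²; the test below checks both the exact value and the contraction bound.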
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kostić, V.R., Salzo, S. The method of randomized Bregman projections for stochastic feasibility problems. Numer Algor 93, 1269–1307 (2023). https://doi.org/10.1007/s11075-022-01468-8
Keywords
- Stochastic convex feasibility problem
- Bregman projection method
- Linear convergence
- Randomized algorithm