
Stochastic proximal splitting algorithm for composite minimization

  • Original Paper
  • Published in Optimization Letters

Abstract

Supported by recent contributions in multiple domains, first-order splitting methods have become the algorithms of choice for structured nonsmooth optimization. Large-scale, noisy settings make stochastic information on the objective function available, and thus the extension of proximal gradient schemes to stochastic oracles relies heavily on the tractability of the proximal operator of the nonsmooth component, a property extensively exploited in the literature. However, open questions remain about the complexity of composite models with proximally intractable terms. In this paper we tackle composite optimization problems, assuming only access to stochastic information on both the smooth and the nonsmooth components, through a stochastic proximal first-order scheme with stochastic proximal updates. We provide sublinear \(\mathcal {O}\left( \frac{1}{k} \right) \) convergence rates (in expectation of the squared distance to the optimal set) under a strong convexity assumption on the objective function. Moreover, linear convergence is achieved for convex feasibility problems. The empirical behavior is illustrated by numerical tests on parametric sparse representation models.
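To make the scheme described above concrete, the following is a minimal Python sketch of one common form of such an iteration: a stochastic gradient step on a sampled smooth component followed by a proximal step on a sampled nonsmooth component, with a vanishing step size \(\mu _k = \mu _0 / k^{\gamma }\). The problem instance, the component splitting and all names (stochastic_proximal_splitting, soft_threshold, lam, mu0, gamma) are illustrative assumptions, not the authors' SSPG implementation.

```python
# Minimal sketch (illustrative assumptions, not the authors' SSPG code):
# minimize f(x) + h(x) with f(x) = E_i[0.5*(a_i^T x - b_i)^2] and
# h(x) = E_j[lam * n * |x_j|] (so h(x) = lam * ||x||_1), using a stochastic
# gradient step on a sampled smooth component followed by a stochastic
# proximal step on a sampled nonsmooth component.
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t*|.|, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def stochastic_proximal_splitting(A, b, lam=0.1, mu0=1.0, gamma=0.6,
                                  iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for k in range(1, iters + 1):
        mu = mu0 / k**gamma                        # vanishing step size mu_k
        i = rng.integers(m)                        # sample a smooth component
        g = (A[i] @ x - b[i]) * A[i]               # its stochastic gradient
        y = x - mu * g                             # forward (gradient) step
        j = rng.integers(n)                        # sample a nonsmooth component
        y[j] = soft_threshold(y[j], mu * lam * n)  # stochastic proximal step
        x = y
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    x_true = np.zeros(50)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    print(stochastic_proximal_splitting(A, b)[:8])
```

The point of the sketch is that each iteration only resolves proximally a sampled piece of the nonsmooth term, so the full proximal operator of the nonsmooth component is never required.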


Notes

  1. Data generating code available at https://github.com/pirofti/SSPG.


Acknowledgements

The research of A. Patrascu was supported by a grant of the Romanian Ministry of Education and Research, CNCS - UEFISCDI, project number PN-III-P1-1.1-PD-2019-1123, within PNCDI III. The research of P. Irofti was supported by a grant of the Romanian Ministry of Education and Research, CNCS - UEFISCDI, project number PN-III-P1-1.1-PD-2019-0825, within PNCDI III.

Author information

Corresponding author

Correspondence to Andrei Patrascu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


Proof (of Corollary 3)

For simplicity, denote \(\theta _k = (1 - \mu _k\sigma _{f})\); then Theorem 2 implies that:

$$\begin{aligned} \mathbb {E}\left[ \Vert x^{k+1}-x^*\Vert ^2 \right]&\le \left( \prod _{i=0}^k \theta _i\right) \Vert x^0-x^*\Vert ^2 + \varSigma \sum \limits _{i=0}^k \left( \prod \limits _{j=i+1}^{k} \theta _j\right) \mu _i^2. \end{aligned}$$

Using the Bernoulli inequality \( 1- tx \le \frac{1}{1 + tx} \le (1 + x)^{-t}\) for \(t \in [0,1]\) and \(x \ge 0\), we have:

$$\begin{aligned} \prod \limits _{i=l}^u \theta _i&= \prod \limits _{i=l}^u \left( 1 - \frac{\mu _0}{i^{\gamma }} \sigma _{f}\right) \le \prod \limits _{i=l}^u (1 + \mu _0 \sigma _f)^{-1/i^{\gamma }} = (1 + \mu _0 \sigma _{f})^{- \sum \limits _{i=l}^u \frac{1}{i^{\gamma }}}. \end{aligned}$$
(18)
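As an added worked check of this chain (not part of the original argument), take \(t = \frac{1}{2}\) and \(x = 1\):

$$\begin{aligned} 1 - tx = \frac{1}{2} \le \frac{1}{1 + tx} = \frac{2}{3} \le (1 + x)^{-t} = 2^{-1/2} \approx 0.707. \end{aligned}$$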

On the other hand, we use the lower bound

$$\begin{aligned} \sum \limits _{i=l}^u \frac{1}{i^{\gamma }} \ge \int \limits _{l}^{u + 1} \frac{1}{\tau ^{\gamma }} d\tau = \varphi _{1-\gamma }(u+1) - \varphi _{1-\gamma }(l). \end{aligned}$$
(19)
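Here and in what follows, \(\varphi _{\alpha }\) is used with the convention \(\varphi _{\alpha }(t) = \frac{t^{\alpha } - 1}{\alpha }\) (the additive constant cancels in the difference above), consistent with its appearance in (20) below. As an added numerical illustration of (19), not part of the original argument, take \(l = 1\), \(u = 4\) and \(\gamma = \frac{1}{2}\):

$$\begin{aligned} \sum \limits _{i=1}^{4} \frac{1}{\sqrt{i}} \approx 2.785 \ge \int \limits _{1}^{5} \frac{d\tau }{\sqrt{\tau }} = 2(\sqrt{5} - 1) \approx 2.472 = \varphi _{1/2}(5) - \varphi _{1/2}(1). \end{aligned}$$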

Combining (18) and (19), we can finally derive:

$$\begin{aligned}&\sum \limits _{i=0}^k \left( \prod \limits _{j=i+1}^{k} \theta _j\right) \mu _i^2 = \sum \limits _{i=0}^m \left( \prod \limits _{j=i+1}^{k} \theta _j\right) \mu _i^2 + \sum \limits _{i=m+1}^k \left( \prod \limits _{j=i+1}^{k} \theta _j\right) \mu _i^2\\&\quad \overset{(18) + (19)}{\le } \sum \limits _{i=0}^m (1 + \mu _0 \sigma _f)^{ \varphi _{1-\gamma }(i+1) - \varphi _{1-\gamma }(k) } \mu _i^2 + \mu _{m+1} \sum \limits _{i=m+1}^k \left[ \prod \limits _{j=i+1}^{k} (1 - \mu _j\sigma _f) \right] \mu _i \\&\quad \le (1 + \mu _0 \sigma _f)^{ \varphi _{1-\gamma }(m) - \varphi _{1-\gamma }(k) } \sum \limits _{i=0}^m \mu _i^2\\&\qquad + \frac{\mu _{m+1}}{\sigma _f} \sum \limits _{i=m+1}^k \left[ \prod \limits _{j=i+1}^{k} (1 - \mu _j\sigma _f) \right] (1 - (1- \sigma _f\mu _i)) \\&\quad = (1 + \mu _0 \sigma _f)^{ \varphi _{1-\gamma }(m) - \varphi _{1-\gamma }(k) } \mu _0^2 \sum \limits _{i=0}^m \frac{1}{i^{2\gamma }} \\&\qquad +\frac{\mu _{m+1}}{\sigma _f} \sum \limits _{i=m+1}^k \left[ \prod \limits _{j=i+1}^{k} (1 - \mu _j\sigma _f) - \prod \limits _{j=i}^{k} (1 - \mu _j\sigma _f) \right] \\&\quad \le (1 + \mu _0 \sigma _f)^{ \varphi _{1-\gamma }(m) - \varphi _{1-\gamma }(k) } \frac{m^{1- 2\gamma } - 1}{1 - 2\gamma } + \frac{\mu _{m+1}}{\sigma _f} \left[ 1 - \prod \limits _{j=m+1}^{k} (1 - \mu _j\sigma _f) \right] \\&\quad \le (1 + \mu _0 \sigma _f)^{ \varphi _{1-\gamma }(m) - \varphi _{1-\gamma }(k) } \varphi _{1 - 2\gamma }(m) + \frac{\mu _{m+1}}{\sigma _f}. \end{aligned}$$

Denoting \(\tilde{\theta }_0 = \frac{1}{1+\mu _0 \sigma _f}\), the last relation implies the following bound:

$$\begin{aligned} \mathbb {E}\left[ \Vert x^{k+1}-x^*\Vert ^2\right] \le \tilde{\theta }_0^{\varphi _{1-\gamma }(k)} \Vert x^{0}-x^*\Vert ^2 + \tilde{\theta }_0^{ \varphi _{1-\gamma }(k) - \varphi _{1-\gamma }(m) } \varphi _{1 - 2\gamma }(m)\varSigma + \frac{\mu _{m+1}}{\sigma _f} \varSigma . \end{aligned}$$

Denote \(r_k^2 = \mathbb {E}[\Vert x^k-x^*\Vert ^2]\). To derive an explicit order of the convergence rate, we analyze upper bounds involving the function \(\varphi \).

(i) First assume that \(\gamma \in (0, \frac{1}{2})\). This implies that \(1 - 2\gamma > 0\) and that:

$$\begin{aligned} \varphi _{1-2\gamma }\left( \left\lfloor \frac{k}{2} \right\rfloor \right) \le \varphi _{1-2\gamma }\left( \frac{k}{2}\right) = \frac{\left( \frac{k}{2} \right) ^{1-2\gamma } - 1}{1-2\gamma }\le \frac{\left( \frac{k}{2} \right) ^{1-2\gamma }}{1-2\gamma }. \end{aligned}$$
(20)

On the other hand, by using the inequality \(e^{-x} \le \frac{1}{1 + x}\) for all \(x \ge 0\), we obtain:

$$\begin{aligned}&\tilde{\theta }_0^{\varphi _{1-\gamma }(k) - \varphi _{1-\gamma }(\frac{k-2}{2})} \varphi _{1-2\gamma }\left( \frac{k}{2}\right) = e^{(\varphi _{1-\gamma }(k) - \varphi _{1-\gamma }(\frac{k-2}{2}))\ln {\tilde{\theta }_0}} \varphi _{1-2\gamma }\left( \frac{k}{2} \right) \\&\quad \le \frac{\varphi _{1-2\gamma }\left( \frac{k}{2} \right) }{1 + [\varphi _{1-\gamma }(k) - \varphi _{1-\gamma }(\frac{k}{2}-1)]\ln {\frac{1}{\tilde{\theta }_0}}} \overset{(20)}{\le } \frac{\frac{k^{1-2\gamma }}{2^{1-2\gamma } (1-2\gamma )} }{\frac{1}{1-\gamma }[k^{1-\gamma } - (\frac{k}{2}-1)^{1-\gamma }]\ln {\frac{1}{\tilde{\theta }_0}}} \\&\quad = \frac{\frac{k^{1-2\gamma }}{2^{1-2\gamma } (1-2\gamma )}}{\frac{k^{1-\gamma }}{1-\gamma }[1 - (\frac{1}{6})^{1-\gamma }]\ln {\frac{1}{\tilde{\theta }_0}}} = \frac{1-\gamma }{1-2\gamma }\frac{2^{\gamma }k^{-\gamma }}{2^{1-2\gamma }[1 - (\frac{1}{6})^{1-\gamma }]\ln {\frac{1}{\tilde{\theta }_0}}} = \mathcal {O}\left( \frac{1}{k^{\gamma }}\right) . \end{aligned}$$

Therefore, in this case, the overall rate will be given by:

$$\begin{aligned} r_{k+1}^2 \le \tilde{\theta }_0^{\mathcal {O}(k^{1-\gamma })}r_0^2 + \mathcal {O}\left( \frac{1}{k^{\gamma }}\right) \approx \mathcal {O}\left( \frac{1}{k^{\gamma }}\right) . \end{aligned}$$

If \(\gamma = \frac{1}{2}\), then the definition of \(\varphi _{1-2\gamma }(\frac{k}{2})\) gives:

$$\begin{aligned} r_{k+1}^2 \le \tilde{\theta }_0^{\mathcal {O}(\sqrt{k})}r_0^2 + \tilde{\theta }_0^{\mathcal {O}(\sqrt{k})}\mathcal {O}(\ln {k}) + \mathcal {O}\left( \frac{1}{\sqrt{k}}\right) \approx \mathcal {O}\left( \frac{1}{\sqrt{k}}\right) . \end{aligned}$$

When \(\gamma \in (\frac{1}{2}, 1)\), it is clear that \(\varphi _{1-2\gamma }\left( \frac{k}{2}\right) \le \frac{1}{2\gamma - 1}\), and therefore the order of the convergence rate becomes:

$$\begin{aligned} r_{k+1}^2 \le \tilde{\theta }_0^{\mathcal {O}(k^{1-\gamma })}[r_0^2 + \mathcal {O}(1)] + \mathcal {O}\left( \frac{1}{k^{\gamma }}\right) \approx \mathcal {O}\left( \frac{1}{k^{\gamma }}\right) . \end{aligned}$$

(ii) Lastly, if \(\gamma = 1\), then by using \(\tilde{\theta }_0^{\ln (k+1)} \le \left( \frac{1}{k}\right) ^{\ln {\frac{1}{\tilde{\theta }_0}}}\) we obtain the second part of our result. \(\square \)
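As a hedged numerical sanity check of the \(\mathcal {O}\left( \frac{1}{k^{\gamma }}\right) \) order derived above, one can iterate the recursion behind Theorem 2 directly; the constants sigma_f, Sigma, mu0 and gamma in the sketch below are arbitrary illustrative choices, not values taken from the paper.

```python
# Hedged illustration: iterate the recursion behind Theorem 2,
#   r_{k+1}^2 <= (1 - mu_k * sigma_f) * r_k^2 + Sigma * mu_k^2,
# with mu_k = mu0 / k^gamma, and monitor r_k^2 * k^gamma.
# All constants are arbitrary illustrative choices.
sigma_f, Sigma, mu0, gamma = 1.0, 1.0, 0.5, 0.6
r2 = 10.0  # r_0^2
for k in range(1, 100001):
    mu = mu0 / k**gamma
    r2 = (1.0 - mu * sigma_f) * r2 + Sigma * mu**2
    if k in (10, 100, 1000, 10000, 100000):
        print(f"k = {k:6d}   r_k^2 = {r2:.3e}   r_k^2 * k^gamma = {r2 * k**gamma:.3f}")
```

For \(\gamma \in (\frac{1}{2}, 1)\), the printed product \(r_k^2 \, k^{\gamma }\) should roughly stabilize, in line with the \(\mathcal {O}\left( \frac{1}{k^{\gamma }}\right) \) rate.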

About this article

Cite this article

Patrascu, A., Irofti, P. Stochastic proximal splitting algorithm for composite minimization. Optim Lett 15, 2255–2273 (2021). https://doi.org/10.1007/s11590-021-01702-7
