
Stochastic Primal Dual Fixed Point Method for Composite Optimization


Abstract

In this paper, we propose a stochastic primal dual fixed point method for minimizing the sum of two proper lower semi-continuous convex functions, one of which is composed with a linear transform. The method is based on the primal dual fixed point method proposed in Chen et al. (Inverse Probl 29:025011, 2013), which does not require solving subproblems. Under mild conditions, convergence is established under two sets of assumptions, bounded and unbounded gradients, and the convergence rate of the expected error of the iterates is of order \({\mathcal {O}}(k^{-\alpha })\), where k is the iteration number and \(\alpha \in (0,1]\). Finally, numerical examples on graphical Lasso and logistic regression are given to demonstrate the effectiveness of the proposed algorithm.
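For reference, the composite model treated here (as it appears in Eq. (7.1) of the Appendix) is

$$\begin{aligned} \min _{x \in {\mathbb {R}}^d} \ (f_1 \circ B)(x) + f_2(x), \end{aligned}$$

where \(f_2\) is differentiable and \(B\) is a linear transform.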


Notes

  1. http://www.cs.nyu.edu/~roweis/data.html.

  2. http://www.cs.nyu.edu/~roweis/data.html.

References

  1. Chen, P., Huang, J., Zhang, X.: A primal–dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Probl. 29, 025011 (2013)

  2. Tibshirani, R.J.: The Solution Path of the Generalized Lasso. Stanford University, Stanford (2011)

  3. Vapnik, V.: The Nature of Statistical Learning Theory. Springer Science & Business Media (2003)

  4. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)

  5. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  6. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). In: Doklady Akademii Nauk SSSR, vol. 269, pp. 543–547 (1983)

  7. Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2, 649–664 (1992)

  8. Rosasco, L., Villa, S., Vũ, B.: Convergence of stochastic proximal gradient algorithm (2014). arXiv preprint arXiv:1403.5074

  9. Duchi, J., Singer, Y.: Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10, 2899–2934 (2009)

  10. Shalev-Shwartz, S., Zhang, T.: Proximal stochastic dual coordinate ascent (2012). arXiv preprint arXiv:1211.2717

  11. Shalev-Shwartz, S., Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In: International Conference on Machine Learning, pp. 64–72 (2014)

  12. Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24, 2057–2075 (2014)

  13. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)

  14. Setzer, S.: Split Bregman algorithm, Douglas–Rachford splitting and frame shrinkage. In: International Conference on Scale Space and Variational Methods in Computer Vision, pp. 464–476. Springer (2009)

  15. He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)

  16. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66, 889–916 (2016)

  17. Zhu, M., Chan, T.F.: An Efficient Primal–Dual Hybrid Gradient Algorithm for Total Variation Image Restoration. CAM Report 08-34, UCLA, Los Angeles, CA (2008)

  18. Esser, E., Zhang, X., Chan, T.F.: A General Framework for a Class of First Order Primal–Dual Algorithms for Convex Optimization in Imaging Science. CAM Report 08-34, UCLA, Los Angeles, CA (2008)

  19. Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  20. Micchelli, C.A., Shen, L., Xu, Y.: Proximity algorithms for image models: denoising. Inverse Probl. 27, 045009 (2011)

  21. Ouyang, H., He, N., Tran, L., Gray, A.: Stochastic alternating direction method of multipliers. In: International Conference on Machine Learning, pp. 80–88 (2013)

  22. Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543–2596 (2010)

  23. Duchi, J., Singer, Y.: Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10, 2873–2908 (2009)

  24. Suzuki, T.: Dual averaging and proximal gradient descent for online alternating direction multiplier method. In: International Conference on Machine Learning, pp. 392–400 (2013)

  25. Zhong, W., Kwok, J.: Fast stochastic alternating direction method of multipliers. In: International Conference on Machine Learning, pp. 46–54 (2014)

  26. Le Roux, N., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2012)

  27. Zhao, S.-Y., Li, W.-J., Zhou, Z.-H.: Scalable stochastic alternating direction method of multipliers (2015). arXiv preprint arXiv:1502.03529

  28. Suzuki, T.: Stochastic dual coordinate ascent with alternating direction method of multipliers. In: International Conference on Machine Learning, pp. 736–744 (2014)

  29. Zheng, S., Kwok, J.T.: Fast-and-light stochastic ADMM. In: IJCAI, pp. 2407–2613 (2016)

  30. Chambolle, A., Ehrhardt, M.J., Richtarik, P., Schonlieb, C.-B.: Stochastic primal–dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM J. Optim. 28(4), 2783–2808 (2018)

  31. Chen, Y., Lan, G., Ouyang, Y.: Optimal primal–dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)

  32. Moulines, E., Bach, F.R.: Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In: Advances in Neural Information Processing Systems, pp. 451–459 (2011)

  33. Polyak, B.T.: Introduction to Optimization, vol. 1. Optimization Software, Inc., Publications Division, New York (1987)

  34. Kim, S., Sohn, K.-A., Xing, E.P.: A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics 25(12), i204–i212 (2009)

  35. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)

  36. Banerjee, O., El Ghaoui, L., d'Aspremont, A.: Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9, 485–516 (2008)

  37. Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008)

Author information

Corresponding author

Correspondence to Xiaoqun Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Here we give the details of the aforementioned lemmas. We start with a lemma that will be used in the proof of Lemma 3.2.

Lemma 7.1

Let \(r > 0\) and \(h(x) = r f_0(x/r)\), \(x \in {\mathbb {R}}^d\). Then, for all \(y \in {\mathbb {R}}^d\),

$$\begin{aligned} {\mathrm {Prox}}_h(y) = r {\mathrm {Prox}}_{r^{-1}f_0}(y/r). \end{aligned}$$

Proof

The assertion can be proved by using the definition of \({\mathrm {Prox}}_{f_0}(\cdot )\) and a change of variables; the computation is sketched after the proof.

\(\square \)
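For completeness, here is a minimal sketch of that change of variables (writing \(x = rz\)):

$$\begin{aligned} {\mathrm {Prox}}_h(y)&= \underset{x \in {\mathbb {R}}^d}{\arg \min }\ \frac{1}{2}\Vert x - y \Vert _2^2 + r f_0(x/r) \\&= r \underset{z \in {\mathbb {R}}^d}{\arg \min }\ \frac{r^2}{2}\Vert z - y/r \Vert _2^2 + r f_0(z) \\&= r \underset{z \in {\mathbb {R}}^d}{\arg \min }\ \frac{1}{2}\Vert z - y/r \Vert _2^2 + r^{-1} f_0(z) = r {\mathrm {Prox}}_{r^{-1}f_0}(y/r), \end{aligned}$$

where the second line substitutes \(x = rz\) and the third line divides the objective by \(r^2\), which does not change the minimizer.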

1.1 Proof of Lemma 3.2

Proof

By the first-order optimality condition of problem (1.1), we have

$$\begin{aligned} \begin{aligned} x^{*}&= \underset{x \in {\mathbb {R}}^d}{\arg \min } (f_1 \circ B)(x) + f_2(x) \\&\Leftrightarrow 0 \in -\gamma _k \nabla f_2(x^*) - \gamma _k \partial (f_1 \circ B )(x^*) \\&\Leftrightarrow x^* \in x^* -\gamma _k \nabla f_2(x^*) - \gamma _k B^T \partial f_1(Bx^*) \\&\Leftrightarrow x^* \in x^* -\gamma _k \nabla f_2(x^*) - \lambda B^T \circ \frac{\gamma _k}{\lambda } \partial f_1(Bx^*). \end{aligned} \end{aligned}$$
(7.1)

Letting

$$\begin{aligned} v^* \in \partial f_1(Bx^*), \end{aligned}$$
(7.2)

then Eq. (7.1) can be rewritten as

$$\begin{aligned} x^* = x^* - \gamma _k \nabla f_2(x^*) - \gamma _k B^T v^* . \end{aligned}$$
(7.3)

From Eq. (7.2), we also have

$$\begin{aligned} \frac{\gamma _k}{\lambda }v^* \in \partial \frac{\gamma _k}{\lambda }f_1(Bx^*), \end{aligned}$$
(7.4)

which means that

$$\begin{aligned} \begin{aligned} Bx^*&= {\mathrm {Prox}}_{\frac{\gamma _k}{\lambda }f_1}\left( Bx^* + \frac{\gamma _k}{\lambda }v^*\right) \\&\Leftrightarrow \left( Bx^* + \frac{\gamma _k}{\lambda }v^*\right) - \frac{\gamma _k}{\lambda }v^* = {\mathrm {Prox}}_{\frac{\gamma _k}{\lambda }f_1}\left( Bx^* + \frac{\gamma _k}{\lambda }v^*\right) \\&\Leftrightarrow \frac{\gamma _k}{\lambda }v^* = \left( I - {\mathrm {Prox}}_{\frac{\gamma _k}{\lambda }f_1}\right) \left( Bx^* + \frac{\gamma _k}{\lambda }v^*\right) \\&\Leftrightarrow v^* = \frac{\lambda }{\gamma _k}\left( I - {\mathrm {Prox}}_{\frac{\gamma _k}{\lambda }f_1}\right) \left( Bx^* + \frac{\gamma _k}{\lambda }v^*\right) \\&\Leftrightarrow v^* = \frac{\lambda }{\gamma _k}Bx^* + v^* - \frac{\lambda }{\gamma _k}{\mathrm {Prox}}_{\frac{\gamma _k}{\lambda }f_1}\left( Bx^* + \frac{\gamma _k}{\lambda }v^*\right) \\&\Leftrightarrow v^* = \frac{\lambda }{\gamma _k}Bx^* + v^* - \frac{\lambda }{\gamma _k}{\mathrm {Prox}}_{\left( \frac{\lambda }{\gamma _k}\right) ^{-1}f_1}\Big (\frac{\frac{\lambda }{\gamma _k}Bx^* + v^*}{\frac{\lambda }{\gamma _k}}\Big ) \\&\Leftrightarrow v^* = \frac{\lambda }{\gamma _k}Bx^* + v^* - {\mathrm {Prox}}_{h^k}\Big (\frac{\lambda }{\gamma _k}Bx^* + v^*\Big ) \\&\Leftrightarrow v^* = (I- {\mathrm {Prox}}_{h^k})\Big (\frac{\lambda }{\gamma _k}Bx^* + v^*\Big ) \end{aligned} , \end{aligned}$$
(7.5)

where, in the second-to-last equality, we let \(h^k(x) = \frac{\lambda }{\gamma _k}f_1(\frac{x}{\frac{\lambda }{\gamma _k}}) = \frac{\lambda }{\gamma _k}f_1(\frac{\gamma _k}{\lambda }x)\) and use Lemma 7.1.

Inserting Eq. (7.3) into the last equality of Eq. (7.5), we have

$$\begin{aligned} \begin{aligned} v^*&= (I - {\mathrm {Prox}}_{h^k})\Big (\frac{\lambda }{\gamma _k}B\left( x^* - \gamma _k \nabla f_2(x^*)\right) + \left( I - \lambda BB^T\right) v^* \Big ) \\&\Leftrightarrow v^* = \frac{\lambda }{\gamma _k}\left( I - {\mathrm {Prox}}_{\frac{\gamma _k}{\lambda }f_1}\right) \Big (B\left( x^* - \gamma _k \nabla f_2(x^*)\right) + \left( I - \lambda BB^T\right) \frac{\gamma _k}{\lambda }v^* \Big ). \end{aligned} \end{aligned}$$
(7.6)

Combining (7.3), (7.5) and (7.6), we obtain

$$\begin{aligned} \left\{ \begin{aligned} v^*&= (I - {\mathrm {Prox}}_{h^k})\Big (\frac{\lambda }{\gamma _k}B\left( x^* - \gamma _k \nabla f_2(x^*)\right) + \left( I - \lambda BB^T\right) v^* \Big ) \\&= T_0^{(k)}(x^*,v^*) \\ x^*&= x^* - \gamma _k \nabla f_2(x^*) - \gamma _k B^T T_0^{(k)}(x^*,v^*). \end{aligned} \right. \end{aligned}$$

The converse can be similarly verified. This completes the proof. \(\square \)
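Before turning to Lemma 4.1, the following Python sketch illustrates, for intuition only, how the above fixed-point characterization translates into one step of the stochastic iteration analyzed there, with the full gradient \(\nabla f_2\) replaced by a sampled gradient \(\nabla f_2^{[i_k]}\). It is not the authors' implementation; the choice \(f_1 = \mu \Vert \cdot \Vert _1\) and the helper names soft_threshold and grad_f2_i are assumptions made for illustration.

    import numpy as np

    def soft_threshold(z, t):
        # Proximity operator of t * ||.||_1 (assumed choice of f1).
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def spdfp_step(x, v, B, grad_f2_i, gamma, lam, mu):
        # One illustrative iteration of the update suggested by Lemma 3.2 and
        # the proof of Lemma 4.1, assuming f1 = mu * ||.||_1.
        #   gamma : step size gamma_k
        #   lam   : parameter lambda with 0 < lam <= 1 / rho_max(B B^T)
        g = x - gamma * grad_f2_i                   # g = x_k - gamma_k * grad f2^{[i_k]}(x_k)
        # Argument of (I - Prox_{h^k}): (lam/gamma) B g + (I - lam B B^T) v
        w = (lam / gamma) * (B @ g) + v - lam * (B @ (B.T @ v))
        # Lemma 7.1: Prox_{h^k}(w) = (lam/gamma) * Prox_{(gamma/lam) f1}((gamma/lam) w)
        prox_hk = (lam / gamma) * soft_threshold((gamma / lam) * w, (gamma / lam) * mu)
        v_next = w - prox_hk                        # v_{k+1} = (I - Prox_{h^k})(w)
        x_next = g - gamma * (B.T @ v_next)         # x_{k+1} = g - gamma_k * B^T v_{k+1}
        return x_next, v_next

When grad_f2_i is the full gradient \(\nabla f_2(x_k)\), the step reduces to the deterministic fixed-point update characterized by Lemma 3.2.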

1.2 Proof of Lemma 4.1

Proof

Let \((x^*,v^*)\) be as in Lemma 3.2, let \((x_k,v_k)\) be the iterate in Algorithm 1, and let \(T_1^{(k)}(\cdot ) = (I - {\mathrm {Prox}}_{h^k})(\cdot )\), where \(h^k(x) = \frac{\lambda }{\gamma _k}f_1(\frac{x}{\frac{\lambda }{\gamma _k}}) = \frac{\lambda }{\gamma _k}f_1(\frac{\gamma _k}{\lambda }x)\). We denote

$$\begin{aligned} \begin{aligned} \varphi _{1,i}^{(k)}(x,y)&= \frac{\lambda }{\gamma _k}B(x - \gamma _k\nabla f_2^{[i]}(x))+ (I - \lambda B B^T)y \\&= \frac{\lambda }{\gamma _k}Bg_{k,i}^{(1)}(x) + My \\ \varphi _2^{(k)}(x,y)&= \frac{\lambda }{\gamma _k}B(x - \gamma _k\nabla f_2(x))+ (I - \lambda B B^T)y \\&= \frac{\lambda }{\gamma _k}Bg_k^{(2)}(x) + My \\ \end{aligned} \end{aligned}$$

Here, \(g_{k,i}^{(1)}(x) = x - \gamma _k\nabla f_2^{[i]}(x)\), \(g_{k}^{(2)}(x) = x - \gamma _k\nabla f_2(x)\), and \(M = I - \lambda B B^T\).

  • (i) Estimation of \(\Vert v_{k + 1} - v^* \Vert _2^2\):

    $$\begin{aligned} \begin{aligned} \Vert v_{k + 1} - v^* \Vert _2^2&= \left\Vert T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}(x_k,v_k)\right) - v^* \right\Vert _2^2 \\&= \left\Vert T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}(x_k,v_k)\right) - T_1^{(k)}\left( \varphi _2^{(k)}(x^*,v^*)\right) \right\Vert _2^2 \\&\le \big< T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}(x_k,v_k)\right) - T_1^{(k)}\left( \varphi _2^{(k)}(x^*,v^*)\right) ,\varphi _{1,i_k}^{(k)}(x_k,v_k) - \varphi _2^{(k)}(x^*,v^*) \big> \\&= \frac{\lambda }{\gamma _k}\big< T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}(x_k,v_k)\right) - T_1^{(k)}\left( \varphi _2^{(k)}(x^*,v^*)\right) ,B\left( g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*)\right) \big> \\&\quad + \big < T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}(x_k,v_k)\right) - T_1^{(k)}\left( \varphi _2^{(k)}(x^*,v^*)\right) , M(v_k - v^*) \big >. \end{aligned} \end{aligned}$$
    (7.7)

    The second equality follows from Eq. (3.2) and the inequality follows from the firm non-expansiveness of \(T_1^{(k)}\) (see Definition 2.3).

    Here, and in what follows, for convenience, we denote \(T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) = T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}(x_k,v_k)\right) \) and \(T_1^{(k)}\left( \varphi _2^{(k)}\right) = T_1^{(k)}\left( \varphi _2^{(k)}(x^*,v^*)\right) \).

  • (ii) Estimation of \(\Vert x_{k + 1} - x^* \Vert _2^2:\)

    $$\begin{aligned}&\Vert x_{k + 1} - x^* \Vert _2^2 \nonumber \\&= \left\Vert x_k - \gamma _k \nabla f_2^{[i_k]}(x_k) - \gamma _k B^T \circ T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - \big (x^* - \gamma _k \nabla f_2(x^*) - \gamma _k B^T \circ T_1^{(k)}\left( \varphi _2^{(k)}\right) \big ) \right\Vert _2^2 \nonumber \\&= \left\Vert g_{k,i_k}^{(1)}(x_k) - \gamma _k B^T \circ T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - \big (g_k^{(2)}(x^*) - \gamma _k B^T \circ T_1^{(k)}\left( \varphi _2^{(k)}\right) \big ) \right\Vert _2^2 \nonumber \\&= \left\Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) - \gamma _k B^T \circ \big (T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \big )\right\Vert _2^2 \nonumber \\&= \Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2 - 2\gamma _k \big< B^T \circ \big ( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \big ), g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \big>\nonumber \\&\quad + \frac{\gamma _k^2}{\lambda ^2}\left\Vert \lambda B^T \circ \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _2^2 \nonumber \\&= \Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2 - 2\gamma _k \big < B^T \circ \big ( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \big ), g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \big > \nonumber \\&\quad - \frac{\gamma _k^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _M^2 + \frac{\gamma _k^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _2^2. \end{aligned}$$
    (7.8)

    The first equality follows from Eq. (3.2). In the last equality, we use the definition \(M = I - \lambda BB^T\) and \(\Vert y \Vert _M = \sqrt{\langle y,My\rangle }\).

  • (iii) From (7.7) and (7.8), we have

    $$\begin{aligned}&\Vert x_{k + 1} - x^* \Vert _2^2 + \frac{\gamma _{k + 1}^2}{\lambda }\Vert v_{k + 1} - v^* \Vert _2^2 \nonumber \\&= \Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2 - 2\gamma _k \big< B^T \circ \big ( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \big ), g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \big> \nonumber \\&\quad - \frac{\gamma _{k}^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _M^2 + \frac{\gamma _{k}^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _2^2 \nonumber \\&\quad + \frac{\gamma _{k + 1}^2}{\lambda }\left\Vert \left( T_1\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _2^2 \nonumber \\&\le \Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2 - 2\gamma _k \big< B^T \circ \big ( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \big ), g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \big> \nonumber \\&\quad - \frac{\gamma _{k}^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _M^2 + 2\frac{\gamma _{k}^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _2^2 \nonumber \\&\le \Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2 - \underline{2\gamma _k \big< B^T \circ \big ( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \big ), g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \big>} \nonumber \\&\quad - \frac{\gamma _{k}^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _M^2 \nonumber \\&\quad + \underline{2\frac{\gamma _{k}^2}{\lambda } \frac{\lambda }{\gamma _k}\big< T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) ,B\left( g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*)\right) \big>}\nonumber \\&\quad + 2\frac{\gamma _{k}^2}{\lambda } \big< T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) , M(v_k - v^*) \big>\nonumber \\&= \Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2 - \underline{2\gamma _k \big< B^T \circ \big ( T_1^{(k)}\left( \varphi _1^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \big ), g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \big>} \nonumber \\&\quad - \frac{\gamma _{k}^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _M^2\nonumber \\&\quad + \underline{2\gamma _{k}\big< T_1^{(k)}\left( \varphi _1^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) ,B\left( g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*)\right) \big>} \nonumber \\&\quad + 2\frac{\gamma _{k}^2}{\lambda } \big< T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) , M(v_k - v^*) \big>\nonumber \\&= \Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2 + 2\frac{\gamma _{k}^2}{\lambda }\big < T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) , M(v_k - v^*) \big > \nonumber \\&\quad - \frac{\gamma _{k}^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) \right\Vert _M^2 \nonumber \\&= 
\Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2 + \frac{\gamma _{k}^2}{\lambda }\Vert v_k - v^* \Vert _M^2 \nonumber \\&\quad - \frac{\gamma _{k}^2}{\lambda }\left\Vert \left( T_1^{(k)}\left( \varphi _{1,i_k}^{(k)}\right) - T_1^{(k)}\left( \varphi _2^{(k)}\right) \right) - (v_k - v^*) \right\Vert _M^2 \nonumber \\&\le \left\Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \right\Vert _2^2 + \frac{\gamma _{k}^2}{\lambda }\left( 1 - \lambda \rho _{min}(B B^T)\right) \left\Vert v_k - v^* \right\Vert _2^2, \end{aligned}$$
    (7.9)

    where the first equality uses Eq. (7.8) and the second equality of (7.7). The first inequality uses the fact that \(\gamma _k\) is decreasing in k. The second inequality uses Eq. (7.7). The last inequality uses the fact that \(0 < \lambda \le \frac{1}{\rho _{max}(B B^T)} \), which implies \(0 \preceq M \preceq (1 - \lambda \rho _{min}(B B^T)) I\), since \(\langle y, My\rangle = \Vert y \Vert _2^2 - \lambda \Vert B^T y \Vert _2^2\) for any y.

Taking expectations on both sides of inequality (7.9), we obtain

$$\begin{aligned}&{\mathbb {E}}^{(k + 1)}\left( \Vert x_{k + 1} - x^* \Vert _2^2 + \frac{\gamma _{k+1}^2}{\lambda }\Vert v_{k + 1} - v^* \Vert _2^2 \right) \nonumber \\&\le {\mathbb {E}}^{(k + 1)}\left( \Vert g_{k,i_k}^{(1)}(x_k) - g_k^{(2)}(x^*) \Vert _2^2\right) + \frac{\gamma _{k}^2}{\lambda }\left( 1 - \lambda \rho _{min}(B B^T)\right) {\mathbb {E}}^{(k)}\left( \Vert v_k - v^* \Vert _2^2 \right) \nonumber \\&= {\mathbb {E}}^{(k + 1)}\Big (\Vert x_k - x^* - \gamma _k\left( \nabla f_2^{[i_k]}(x_k) - \nabla f_2(x^*)\right) \Vert _2^2 \Big ) \nonumber \\&\quad + \frac{\gamma _{k}^2}{\lambda }\left( 1 - \lambda \rho _{min}(B B^T)\right) {\mathbb {E}}^{(k)}\left( \Vert v_k - v^* \Vert _2^2 \right) \nonumber \\&= {\mathbb {E}}^{(k)}\big (\Vert x_k - x^* \Vert _2^2\big ) + \frac{\gamma _{k}^2}{\lambda }\left( 1 - \lambda \rho _{min}(B B^T)\right) {\mathbb {E}}^{(k)}\left( \Vert v_k - v^* \Vert _2^2 \right) \nonumber \\&\quad - 2\gamma _k {\mathbb {E}}^{(k)}\big < \nabla f_2(x_k) - \nabla f_2(x^*), x_k - x^*\big > + \gamma _k^2 {\mathbb {E}}^{(k + 1)}\left( \Vert \nabla f_2^{[i_k]}(x_k) - \nabla f_2(x^*) \Vert _2^2\right) , \end{aligned}$$
(7.10)

where, in the third term of the last equality, we use the fact that

$$\begin{aligned} {\mathbb {E}}^{(k + 1)} \big (\nabla f_2^{[i_k]}(x_k)\big ) = {\mathbb {E}}^{(k)}\big (\nabla f_2(x_k)\big ). \end{aligned}$$

This completes the proof. \(\square \)
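The expectation identity used in the last step holds, for instance, under the standard sampling model (an assumption of this remark, not restated from the paper) that \(f_2 = \frac{1}{n}\sum _{i=1}^{n} f_2^{[i]}\) and that \(i_k\) is drawn uniformly from \(\{1,\dots ,n\}\) independently of \(x_k\), since then

$$\begin{aligned} {\mathbb {E}}^{(k + 1)} \big (\nabla f_2^{[i_k]}(x_k)\big ) = {\mathbb {E}}^{(k)}\Big (\frac{1}{n}\sum _{i = 1}^{n}\nabla f_2^{[i]}(x_k)\Big ) = {\mathbb {E}}^{(k)}\big (\nabla f_2(x_k)\big ). \end{aligned}$$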

Cite this article

Zhu, YN., Zhang, X. Stochastic Primal Dual Fixed Point Method for Composite Optimization. J Sci Comput 84, 16 (2020). https://doi.org/10.1007/s10915-020-01265-2
