Accelerated Doubly Stochastic Gradient Descent for Tensor CP Decomposition

Abstract

In this paper, we focus on accelerating the doubly stochastic gradient descent method for computing the CANDECOMP/PARAFAC (CP) decomposition of tensors. The underlying optimization problem has N blocks, where N is the order of the tensor. Under the doubly stochastic framework, each block subproblem is solved by the vanilla stochastic gradient method. However, the existing convergence analysis requires the variance to converge to zero, which is hard to verify in practice and may not hold in some implementations. In this paper, we propose to accelerate the stochastic gradient method with momentum acceleration and a variance reduction technique, yielding an algorithm we call DS-MVR. Theoretically, the convergence of DS-MVR only requires the variance to be bounded. Under mild conditions, we show that DS-MVR converges to a stochastic \(\varepsilon \)-stationary solution in \(\tilde{\mathcal {O}}(N^{3/2}\varepsilon ^{-3})\) iterations with varying stepsizes and in \(\mathcal {O}(N^{3/2}\varepsilon ^{-3})\) iterations with constant stepsizes, respectively. Numerical experiments on four real-world datasets show that the proposed algorithm achieves better results than the baselines.
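
The following minimal sketch (an illustration only, not the authors' implementation) shows the doubly stochastic structure combined with a momentum-based variance-reduced estimator of the form \(G^{k}=g^{k}+(1-\beta ^{k-1})(G^{k-1}-g^{k-1})\), where \(g^{k}\) and \(g^{k-1}\) are stochastic gradients of the sampled block computed with the same minibatch at the current and previous iterates. The oracle stoch_grad, the momentum weight, and the stepsize schedule below are illustrative assumptions.

# Illustrative sketch of a DS-MVR-style update (not the authors' code).
# A factor block is sampled uniformly, a minibatch stochastic gradient is
# drawn for that block, and the momentum-based variance-reduced estimator
#     G_k = g_k + (1 - beta_k) * (G_{k-1} - g'_{k-1})
# drives the update, where g_k and g'_{k-1} use the SAME minibatch at the
# current and previous iterates.  `stoch_grad`, the momentum weight, and
# the stepsize schedule are assumptions made for illustration only.
import numpy as np

def ds_mvr(factors, stoch_grad, num_iters=1000, eta0=0.1, seed=0):
    rng = np.random.default_rng(seed)
    N = len(factors)
    G = [None] * N                               # one estimator per block
    prev_factors = [A.copy() for A in factors]   # previous iterate
    for k in range(num_iters):
        eta = eta0 / (k + 3) ** (1.0 / 3.0)      # varying stepsize (illustrative)
        beta = min(1.0, 2.0 * eta)               # momentum weight (illustrative)
        n = rng.integers(N)                      # sample a factor block
        batch = rng.integers(0, 10_000, size=64) # sampled fiber indices (illustrative)
        g_new = stoch_grad(factors, n, batch)          # gradient at current point
        if G[n] is None:
            G[n] = g_new
        else:
            g_old = stoch_grad(prev_factors, n, batch) # same batch, previous point
            G[n] = g_new + (1.0 - beta) * (G[n] - g_old)
        prev_factors = [A.copy() for A in factors]     # snapshot before the step
        factors[n] = factors[n] - eta * G[n]           # (a proximal step may follow)
    return factors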

Notes

  1. There are three ways to set the stepsize in general: an adaptive stepsize (such as Adam [15], Adagrad [9], or RMSProp [11]), a varying stepsize (see (29) for an example), and a constant stepsize (see (34) for an example); an illustrative varying-stepsize schedule is also sketched after these notes.

  2. The originally proposed rALS algorithm [3] minimizes the least-squares loss without the regularization term. To produce a proper solution in our experiments, we additionally apply the proximal mapping defined in (10). Hence, our comparison may not fully reflect the advantages of rALS.

  3. http://www.ece.uwaterloo.ca/~z70wang/research/ssim/

  4. http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes.

  5. http://trace.eas.asu.edu/yuv/
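
As an illustration of the varying-stepsize option mentioned in Note 1 (the concrete forms in (29) and (34) are not reproduced here, so the schedule below is only an assumed example consistent with (43)), one may take

$$\begin{aligned} \eta ^{k}=\frac{\eta }{(k+3)^{\frac{1}{3}}},\qquad k=0,1,\dots ,K-1, \end{aligned}$$

so that \((\eta ^{k})^{4}=\eta ^{4}(k+3)^{-\frac{4}{3}}\), which is exactly the quantity summed in (43); the constant-stepsize variant simply fixes \(\eta ^{k}\equiv \eta \) for all \(k\).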

References

  1. Acar, E., Dunlavy, D.M., Kolda, T.G.: A scalable optimization approach for fitting Canonical tensor decompositions. J. Chemom. 25(2), 67–86 (2011)

  2. Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. Math. Program. (2022)

  3. Battaglino, C., Ballard, G., Kolda, T.G.: A practical randomized CP tensor decomposition. SIAM J. Matrix Anal. Appl. 39(2), 876–901 (2018)

  4. Beutel, A., Talukdar, P.P., Kumar, A., Faloutsos, C., Papalexakis, E.E., Xing, E.P.: FlexiFaCT: Scalable flexible factorization of coupled tensors on Hadoop. In: Proceedings of the 2014 SIAM International Conference on Data Mining, pp. 109–117 (2014)

  5. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: 19th International Conference on Computational Statistics, pp. 177–186 (2010)

  6. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)

  7. Cutkosky, A., Orabona, F.: Momentum-based variance reduction in non-convex SGD. Adv. Neural Inf. Process. Syst. 32 (2019)

  8. Carroll, J.D., Chang, J.J.: Analysis of individual differences in multidimensional scaling via an N-way generalization of “Eckart-Young” decomposition. Psychometrika 35(3), 283–319 (1970)

  9. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(61), 2121–2159 (2011)

  10. Fu, X., Ibrahim, S., Wai, H., Gao, C., Huang, K.: Block-randomized stochastic proximal gradient for low-rank tensor factorization. IEEE Trans. Signal Process. 68, 2170–2185 (2020)

  11. Hinton, G., Srivastava, N., Swersky, K.: Neural networks for machine learning: Lecture 6e, rmsprop: Divide the gradient by a running average of its recent magnitude. University of Toronto Lecture (2006)

  12. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  13. Hitchcock, F.L.: The expression of a tensor or a Polyadic as a sum of products. J. Math. Phys. 6, 164–189 (1927)

  14. Hong, D., Kolda, T.G., Duersch, J.A.: Generalized Canonical Polyadic tensor decomposition. SIAM Rev. 62(1), 133–163 (2020)

  15. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations (2015)

  16. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)

  17. Kolda, T.G., Hong, D.: Stochastic gradients for large-scale tensor decomposition. SIAM J. Math. Data Sci. 2(4), 1066–1095 (2020)

  18. Lan, G.: First-Order and Stochastic Optimization Methods for Machine Learning. Springer, Switzerland AG (2020)

  19. Lebedev, V., Ganin, Y., Rakhuba, M., Oseledets, I., Lempitsky, V.: Speeding-up convolutional neural networks using fine-tuned CP-decomposition. In: 3rd International Conference on Learning Representations (2015)

  20. Li, Z., Li, J.: A simple proximal stochastic gradient method for nonsmooth nonconvex optimization. Adv. Neural Inf. Process. Syst. 31, 5569–5579 (2018)

  21. Li, Z., Li, J.: Simple and optimal stochastic gradient methods for nonsmooth nonconvex optimization. J. Mach. Learn. Res. 23, 239:1-239:61 (2022)

  22. Lim, L., Comon, P.: Nonnegative approximations of nonnegative tensors. J. Chemom. 23, 432–441 (2009)

  23. Lin, Z., Li, H., Fang, C.: Accelerated Optimization for Machine Learning. Springer, Singapore (2020)

  24. Liu, J., Musialski, P., Wonka, P., Ye, J.: Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 208–220 (2012)

  25. Liu, Y., Liu, J., Long, Z., Zhu, C.: Tensor Computation for Data Analysis. Springer, Switzerland AG (2022)

  26. Maehara, T., Hayashi, K., Kawarabayashi, K.: Expected tensor decomposition with stochastic gradient descent. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 1919–1925 (2016)

  27. Nesterov, Y.E.: A method for unconstrained convex minimization problem with the rate of convergence \({O}(1/k^{2})\). Sov. Math. Doklady 27(2), 372–376 (1983)

  28. Nguyen, L.M., Liu, J., Scheinberg, K., Takác, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning, pp. 2613–2621 (2017)

  29. Paatero, P.: Construction and analysis of degenerate PARAFAC models. J. Chemom. 14(3), 285–299 (2000)

  30. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  31. Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization. J. Mach. Learn. Res. 21, 110:1-110:48 (2020)

  32. Phan, A.H., Tichavský, P., Cichocki, A.: Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations. IEEE Trans. Signal Process. 61(19), 4834–4846 (2013)

  33. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  34. Qi, L., Luo, Z.: Tensor Analysis: Spectral Theory and Special Tensors. Society for Industrial and Applied Mathematics, Philadelphia (2017)

  35. Qian, N.: On the momentum term in gradient descent learning algorithms. Neural Netw. 12(1), 145–151 (1999)

  36. Reddi, S.J., Sra, S., Póczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Adv. Neural Inf. Process. Syst. 29, 1145–1153 (2016)

  37. Reynolds, M.J., Doostan, A., Beylkin, G.: Randomized alternating least squares for Canonical tensor decompositions: application to a PDE with random data. SIAM J. Sci. Comput. 38(5), 2634–2664 (2016)

  38. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin, Heidelberg (1998)

  39. Sidiropoulos, N.D., Bro, R.: On the uniqueness of multilinear decomposition of N-way arrays. J. Chemom. 14(3), 229–239 (2000)

  40. Sidiropoulos, N.D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E.E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65(13), 3551–3582 (2017)

  41. Silva, V.D., Lim, L.: Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl. 30(3), 1084–1127 (2008)

  42. Sorber, L., Van Barel, M., De Lathauwer, L.: Optimization-based algorithms for tensor decompositions: canonical polyadic decomposition, decomposition in rank-\((l_r, l_r,1)\) terms, and a new generalization. SIAM J. Optim. 23(2), 695–720 (2013)

  43. Veganzones, M.A., Cohen, J.E., Cabral, F.R., Chanussot, J., Comon, P.: Nonnegative tensor CP decomposition of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 54(5), 2577–2588 (2016)

  44. Vervliet, N., De Lathauwer, L.: A randomized block sampling approach to Canonical Polyadic decomposition of large-scale tensors. IEEE J. Select. Top. Signal Process. 10(2), 284–295 (2016)

  45. Wang, Q., Cui, C., Han, D.: A momentum block-randomized stochastic algorithm for low-rank tensor CP decomposition. Pac. J. Optim. 17(3), 433–452 (2021)

  46. Xu, Y., Xu, Y.: Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization. J. Optim. Theory Appl. 196, 266–297 (2023)

  47. Zhang, Z., Batselier, K., Liu, H., Daniel, L., Wong, N.: Tensor computation: a new framework for high-dimensional problems in EDA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 36(4), 521–536 (2016)

  48. Zhou, D., Xu, P., Gu, Q.: Stochastic nested variance reduction for nonconvex optimization. J. Mach. Learn. Res. 21, 103:1-103:63 (2020)

  49. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their insightful comments and constructive suggestions that improved the quality of our paper.

Author information

Corresponding author

Correspondence to Deren Han.

Additional information

Communicated by Lam M. Nguyen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research is supported by the National Natural Science Foundation of China (NSFC) grants 12131004, 11926358, 12126608, the Fundamental Research Funds for the Central Universities (Grant No. YWF-22-T-204).

Appendix A: Several Lemmas for Theoretical Analysis in Algorithm 1

Lemma A.1

Under Assumption 4.3, we have

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{\zeta ^{k}}\Big [\Big \Vert G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^k+(1-\beta ^{k-1})\Big (\nabla _{A_{\xi ^{k}}}f^{k-1}-G_{\xi ^{k}}^{k-1}\Big )\Big \Vert ^{2}\vert \mathcal {B}^{k},\xi ^{k}\Big ]\\ \le&2(\beta ^{k-1})^{2}\mathbb {E}_{\zeta ^{k}}\left[ \left\| G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^k\right\| ^{2}\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&+2(1-\beta ^{k-1})^{2} \mathbb {E}_{\zeta ^{k}}\left[ \left\| G_{\xi ^{k}}^{k}- G_{\xi ^{k}}^{k-1} \right\| ^{2}\Big \vert \mathcal {B}^{k},\xi ^{k}\right] , \end{aligned} \end{aligned}$$
(40)

where \(\nabla _{A_{\xi ^{k}}}f^k=\nabla _{A_{\xi ^{k}}}f\left( A_{1}^{k},\dots ,A_{N}^{k}\right) \) and \(\nabla _{A_{\xi ^{k}}}f^{k-1} = \nabla _{A_{\xi ^{k}}}f\left( A_{1}^{k-1},\dots ,A_{N}^{k-1}\right) \).

Proof

The following inequality holds:

$$\begin{aligned} \begin{aligned}&\left\| G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^k+(1-\beta ^{k-1})\left( \nabla _{A_{\xi ^{k}}}f^{k-1}-G_{\xi ^{k}}^{k-1}\right) \right\| ^{2}\\ =&\quad \Big \Vert (1-\beta ^{k-1})\left( \nabla _{A_{\xi ^{k}}}f^{k-1}-G_{\xi ^{k}}^{k-1}+G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^k\right) +\beta ^{k-1}\left( G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^k\right) \Big \Vert ^{2}\\ \le&\quad 2(1-\beta ^{k-1})^{2}\left\| \nabla _{A_{\xi ^{k}}}f^{k-1}-G_{\xi ^{k}}^{k-1}+G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^k\right\| ^{2}\\&\quad +2(\beta ^{k-1})^{2}\left\| G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^k\right\| ^{2}, \end{aligned}\nonumber \\ \end{aligned}$$
(41)

where the inequality follows from Young’s inequality. From Assumption 4.3, we have that

$$\begin{aligned}&\mathbb {E}_{\zeta ^{k}}\left[ \left\| \nabla _{A_{\xi ^{k}}}f^{k-1}-G_{\xi ^{k}}^{k-1}+G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^{k}\right\| ^{2}\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&=\mathbb {E}_{\zeta ^{k}}\left[ \left\| \nabla _{A_{\xi ^{k}}}f^{k-1}-G_{\xi ^{k}}^{k-1}\right\| ^{2}+\left\| G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^{k}\right\| ^{2}\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&+2\mathbb {E}_{\zeta ^{k}}\left[ \left<\nabla _{A_{\xi ^{k}}}f^{k-1}-G_{\xi ^{k}}^{k-1},G_{\xi ^{k}}^{k}-\nabla _{A_{\xi ^{k}}}f^{k}\right>\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&=\mathbb {E}_{\zeta ^{k}}\left[ \left\| \nabla _{A_{\xi ^{k}}}f^{k-1}\right\| ^{2}+\left\| G_{\xi ^{k}}^{k-1}\right\| ^{2}\Big \vert \mathcal {B}^{k},\xi ^{k}\right] -2\mathbb {E}_{\zeta ^{k}}\left[ \left<\nabla _{A_{\xi ^{k}}}f^{k-1},G_{\xi ^{k}}^{k-1}\right>\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&+\mathbb {E}_{\zeta ^{k}}\left[ \left\| \nabla _{A_{\xi ^{k}}}f^{k}\right\| ^{2}+\left\| G_{\xi ^{k}}^{k}\right\| ^{2}-2\left<\nabla _{A_{\xi ^{k}}}f^{k},G_{\xi ^{k}}^{k}\right>\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&+2\mathbb {E}_{\zeta ^{k}}\left[ -\left<\nabla _{A_{\xi ^{k}}}f^{k-1},\nabla _{A_{\xi ^{k}}}f^{k}\right>\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&+2\mathbb {E}_{\zeta ^{k}}\left[ -\left<G_{\xi ^{k}}^{k-1},G_{\xi ^{k}}^{k}\right>+\left<G_{\xi ^{k}}^{k-1},\nabla _{A_{\xi ^{k}}}f^{k}\right>\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&+2\mathbb {E}_{\zeta ^{k}}\left[ \left<\nabla _{A_{\xi ^{k}}}f^{k-1},G_{\xi ^{k}}^{k}\right>\vert \mathcal {B}^{k},\xi ^{k}\right] \\&=\mathbb {E}_{\zeta ^{k}}\left[ \left\| \nabla _{A_{\xi ^{k}}}f^{k-1}\right\| ^{2}+\left\| G_{\xi ^{k}}^{k-1}\right\| ^{2}-2\Vert \nabla _{A_{\xi ^{k}}}f^{k-1}\Vert ^2 \Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&+\mathbb {E}_{\zeta ^{k}}\left[ \left\| \nabla _{A_{\xi ^{k}}}f^{k}\right\| ^{2}+\left\| G_{\xi ^{k}}^{k}\right\| ^{2}-2\Vert \nabla _{A_{\xi ^{k}}}f^{k}\Vert ^2 \Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&+2\mathbb {E}_{\zeta ^{k}}\left[ -\left<G_{\xi ^{k}}^{k-1},G_{\xi ^{k}}^{k}\right>+\left<\nabla _{A_{\xi ^{k}}}f^{k-1},\nabla _{A_{\xi ^{k}}}f^{k}\right>\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&=\mathbb {E}_{\zeta ^{k}}\left[ \left\| G_{\xi ^{k}}^{k}- G_{\xi ^{k}}^{k-1} \right\| ^{2}-\left\| \nabla _{A_{\xi ^{k}}}f^{k}-\nabla _{A_{\xi ^{k}}}f^{k-1}\right\| ^{2}\Big \vert \mathcal {B}^{k},\xi ^{k}\right] \\&\le \mathbb {E}_{\zeta ^{k}}\left[ \left\| G_{\xi ^{k}}^{k}- G_{\xi ^{k}}^{k-1} \right\| ^{2}\Big \vert \mathcal {B}^{k},\xi ^{k}\right] , \end{aligned}$$

where the third equality follows from Assumption 4.3. \(\square \)
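
As a quick sanity check of the key step above, the following illustrative snippet (not part of the paper) verifies by Monte-Carlo simulation that \(\mathbb {E}\Vert (\nabla f^{k-1}-G^{k-1})+(G^{k}-\nabla f^{k})\Vert ^{2}=\mathbb {E}\Vert G^{k}-G^{k-1}\Vert ^{2}-\Vert \nabla f^{k}-\nabla f^{k-1}\Vert ^{2}\) whenever \(G^{k-1}\) and \(G^{k}\) are unbiased estimators built from the same sample; the dimension, noise model, and sample size are arbitrary choices.

# Illustrative Monte-Carlo check (not part of the paper's analysis) of
#   E||(grad_old - G_old) + (G_new - grad_new)||^2
#     = E||G_new - G_old||^2 - ||grad_new - grad_old||^2,
# which holds for unbiased estimators G_old, G_new built from the same
# sample.  The dimension, noise model, and sample size are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
M, d = 100_000, 20
grad_old = rng.standard_normal(d)                          # stands in for grad f^{k-1}
grad_new = rng.standard_normal(d)                          # stands in for grad f^{k}
noise_old = rng.standard_normal((M, d))                    # zero-mean noise, old point
noise_new = 0.5 * noise_old + rng.standard_normal((M, d))  # correlated, still zero-mean
G_old, G_new = grad_old + noise_old, grad_new + noise_new
lhs = np.mean(np.sum(((grad_old - G_old) + (G_new - grad_new)) ** 2, axis=1))
rhs = np.mean(np.sum((G_new - G_old) ** 2, axis=1)) - np.sum((grad_new - grad_old) ** 2)
print(lhs, rhs)  # the two values agree up to Monte-Carlo error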

Lemma A.2

Suppose that the assumptions in Theorem 4.1 hold. Then, we have

$$\begin{aligned} \begin{aligned}&76^{2}\frac{2L^{4}}{\eta }\sum _{k=0}^{K-1}(k+4)^{\frac{1}{3}}(\eta ^{k})^{4}+\frac{2}{\eta }\sum _{k=0}^{K-1}(k+4)^{\frac{1}{3}}\left( 1-\left( \frac{k+3}{k+4}\right) ^{\frac{1}{3}}\right) ^{2}\\ \le&12938\eta ^{3}L^{4}\log (K+2)+\frac{1}{3\eta }. \end{aligned} \end{aligned}$$
(42)

Proof

We first observe that

$$\begin{aligned} \begin{aligned} \sum _{k=0}^{K-1}(k+4)^{\frac{1}{3}}(\eta ^{k})^{4}=&\eta ^{4}\sum _{k=0}^{K-1}(k+4)^{\frac{1}{3}}(k+3)^{-\frac{4}{3}}\\ =&\eta ^{4}\sum _{k=0}^{K-1}\left( \frac{k+4}{k+3}\right) ^{\frac{1}{3}}(k+3)^{-1}\\ \le&\eta ^{4}\left( \frac{4}{3}\right) ^{\frac{1}{3}}\sum _{k=0}^{K-1}(k+3)^{-1}\\ \le&\eta ^{4}\left( \frac{4}{3}\right) ^{\frac{1}{3}}\log (K+2). \end{aligned} \end{aligned}$$
(43)

Further, we have

$$\begin{aligned} \begin{aligned} 1-\left( \frac{k+3}{k+4}\right) ^{\frac{1}{3}}=&(k+4)^{-\frac{1}{3}}((k+4)^{\frac{1}{3}}-(k+3)^{\frac{1}{3}})\\ =&\frac{(k+4)^{-\frac{1}{3}}}{(k+4)^{\frac{2}{3}}+(k+4)^{\frac{1}{3}}(k+3)^{\frac{1}{3}}+(k+3)^{\frac{2}{3}}}, \end{aligned} \end{aligned}$$
(44)

where the last equality follows from the identity \(a^{3}-b^{3}=(a-b)(a^{2}+ab+b^{2})\) for any \(a,b\in \mathbb {R}\). From the above equation, we have

$$\begin{aligned} \begin{aligned}&\sum _{k=0}^{K-1}(k+4)^{\frac{1}{3}}\left( 1-\left( \frac{k+3}{k+4}\right) ^{\frac{1}{3}}\right) ^{2}\\ =&\sum _{k=0}^{K-1}\frac{1}{(k+4)^{\frac{1}{3}}\left( (k+4)^{\frac{2}{3}}+(k+4)^{\frac{1}{3}}(k+3)^{\frac{1}{3}}+(k+3)^{\frac{2}{3}}\right) ^{2}}\\ \le&\frac{1}{9}\sum _{k=0}^{K-1}(k+3)^{-\frac{5}{3}}\\ \le&\frac{1}{6}. \end{aligned} \end{aligned}$$
(45)

where the first inequality uses \((k+4)^{\frac{1}{3}}\left( (k+4)^{\frac{2}{3}}+(k+4)^{\frac{1}{3}}(k+3)^{\frac{1}{3}}+(k+3)^{\frac{2}{3}}\right) ^{2}\ge 9(k+3)^{\frac{5}{3}}\), and the last inequality follows from \(\sum _{k=0}^{K-1}(k+3)^{-\frac{5}{3}}\le \sum _{j=3}^{+\infty }j^{-\frac{3}{2}}=\sum _{j=1}^{+\infty }j^{-\frac{3}{2}}-1-2^{-\frac{3}{2}}\approx 1.26\le \frac{3}{2}\), where \(\sum _{k=1}^{+\infty }k^{-\frac{3}{2}}\approx 2.612\).

Combining (43) and (45), we obtain (42). This completes the proof. \(\square \)
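
The two series bounds used in the proof can also be spot-checked numerically; the snippet below (illustrative only, not part of the paper) evaluates the sums in (43) and (45) for several values of \(K\) and compares them with their stated upper bounds.

# Illustrative numerical spot-check (not part of the paper) of the series
# bounds used in the proof of Lemma A.2:
#   (43): sum_{k=0}^{K-1} (k+4)^{1/3} (k+3)^{-4/3}                <= (4/3)^{1/3} * log(K+2)
#   (45): sum_{k=0}^{K-1} (k+4)^{1/3} (1 - ((k+3)/(k+4))^{1/3})^2 <= 1/6
import numpy as np

for K in (10, 10**3, 10**6):
    k = np.arange(K, dtype=float)
    s43 = np.sum((k + 4) ** (1 / 3) * (k + 3) ** (-4 / 3))
    s45 = np.sum((k + 4) ** (1 / 3) * (1 - ((k + 3) / (k + 4)) ** (1 / 3)) ** 2)
    print(K, s43 <= (4 / 3) ** (1 / 3) * np.log(K + 2), s45 <= 1 / 6)
# Both comparisons should print True for every K tested.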

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Q., Cui, C. & Han, D. Accelerated Doubly Stochastic Gradient Descent for Tensor CP Decomposition. J Optim Theory Appl 197, 665–704 (2023). https://doi.org/10.1007/s10957-023-02193-5
