Abstract
In this paper, we focus on accelerating the doubly stochastic gradient descent method for computing the CANDECOMP/PARAFAC (CP) decomposition of tensors. This optimization problem has N blocks, where N is the order of the tensor. Under the doubly stochastic framework, each block subproblem is solved by the vanilla stochastic gradient method. However, the convergence analysis requires the variance to converge to zero, which is hard to verify in practice and may not hold in some implementations. In this paper, we propose to accelerate the stochastic gradient method by momentum acceleration and a variance reduction technique, and denote the resulting method by DS-MVR. Theoretically, the convergence of DS-MVR only requires the variance to be bounded. Under mild conditions, we show that DS-MVR converges to a stochastic \(\varepsilon \)-stationary solution in \(\tilde{\mathcal {O}}(N^{3/2}\varepsilon ^{-3})\) iterations with varying stepsizes and in \(\mathcal {O}(N^{3/2}\varepsilon ^{-3})\) iterations with constant stepsizes, respectively. Numerical experiments on four real-world datasets show that the proposed algorithm achieves better results than the baselines.
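To make the algorithmic idea concrete, the following is a minimal sketch of a doubly stochastic block update combined with a momentum-based variance-reduced (STORM-style) gradient estimator for a third-order CP model. It is an illustration under our own assumptions, not the paper's DS-MVR implementation: the row-sampling scheme, stepsize, momentum parameter `beta`, and all function names are hypothetical.

```python
import numpy as np

def stochastic_block_grad(T, factors, n, rows):
    """Stochastic gradient of 0.5*||T - [[A_1, A_2, A_3]]||_F^2 w.r.t. factor n,
    estimated from a sampled subset of rows of the mode-n unfolding (3-way case)."""
    R = factors[n].shape[1]
    others = [F for i, F in enumerate(factors) if i != n]
    # Khatri-Rao product of the two remaining factor matrices.
    KR = np.einsum('jr,kr->jkr', *others).reshape(-1, R)
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)          # mode-n unfolding
    G = np.zeros_like(factors[n])
    scale = T.shape[n] / len(rows)                              # rescale so the estimator is unbiased
    G[rows] = scale * (factors[n][rows] @ KR.T - Tn[rows]) @ KR
    return G

def ds_mvr_sketch(T, R, iters=2000, step=1e-2, beta=0.1, batch=32, seed=0):
    """Doubly stochastic updates: one random block (mode) and one random row batch
    per iteration, combined with a recursive variance-reduced gradient estimator."""
    rng = np.random.default_rng(seed)
    factors = [0.1 * rng.standard_normal((dim, R)) for dim in T.shape]
    prev = [F.copy() for F in factors]          # previous iterates, per block
    d = [np.zeros_like(F) for F in factors]     # gradient estimators, per block
    first = [True] * len(factors)
    for _ in range(iters):
        n = int(rng.integers(len(factors)))                     # sample a block
        rows = rng.choice(T.shape[n], size=min(batch, T.shape[n]), replace=False)
        g_new = stochastic_block_grad(T, factors, n, rows)
        if first[n]:
            d[n] = g_new                                        # plain stochastic gradient on first visit
            first[n] = False
        else:
            g_old = stochastic_block_grad(T, prev, n, rows)     # same sample, previous iterate
            d[n] = g_new + (1.0 - beta) * (d[n] - g_old)        # momentum + variance reduction
        prev[n] = factors[n].copy()
        factors[n] = factors[n] - step * d[n]
    return factors
```

With a small synthetic tensor, e.g. `T = np.einsum('ir,jr,kr->ijk', A, B, C)` for random factor matrices, the sketch drives the residual down; the algorithm analyzed in the paper differs in its sampling strategy, stepsize schedule, and the exact form of the estimator.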
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their insightful comments and constructive suggestions that improved the quality of our paper.
Additional information
Communicated by Lam M. Nguyen.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research is supported by the National Natural Science Foundation of China (NSFC) under grants 12131004, 11926358, and 12126608, and by the Fundamental Research Funds for the Central Universities (Grant No. YWF-22-T-204).
Appendix A: Several Lemmas for Theoretical Analysis in Algorithm 1
Lemma A.1
Under Assumption 4.3, we have
where \(\nabla _{A_{\xi ^{k}}}f^k=\nabla _{A_{\xi ^{k}}}f\left( A_{1}^{k},\dots ,A_{N}^{k}\right) \) and \(\nabla _{A_{\xi ^{k}}}f^{k-1} = \nabla _{A_{\xi ^{k}}}f\left( A_{1}^{k-1},\dots ,A_{N}^{k-1}\right) \).
Proof
We have the following inequality
where the last inequality follows from Young’s inequality. From Assumption 4.3, we have that
where the third equality follows from Assumption 4.3. \(\square \)
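For completeness, the form of Young's inequality typically invoked in such variance bounds is the standard fact (stated here for illustration; the exact constant \(\beta \) used in the proof above is not reproduced)
\[
\Vert a+b\Vert ^{2}\;\le \;(1+\beta )\Vert a\Vert ^{2}+\Bigl (1+\tfrac{1}{\beta }\Bigr )\Vert b\Vert ^{2},\qquad \forall \,a,b,\ \beta >0,
\]
which follows from expanding \(\Vert a+b\Vert ^{2}\) and bounding the cross term by \(2\langle a,b\rangle \le \beta \Vert a\Vert ^{2}+\beta ^{-1}\Vert b\Vert ^{2}\).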
Lemma A.2
Suppose the assumption in Theorem 4.1 holds. Then, we have
Proof
We first observe that
Further, we have
where the last inequality follows from the identity \(a^{3}-b^{3}=(a-b)(a^{2}+ab+b^{2})\), valid for any \(a,b\in \mathbb {R}\). From the above equation, we have
where the last inequality follows from the fact that \(\sum _{k=1}^{+\infty }\frac{1}{k^{3/2}}\approx 2.612\) is finite.
Combining (43) and (45), we obtain (42). This completes the proof.
\(\square \)
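As a brief aside on the numerical constant used in the last step of the proof, the finiteness of the series follows from a standard integral comparison,
\[
\sum _{k=1}^{+\infty }\frac{1}{k^{3/2}}\;\le \;1+\int _{1}^{+\infty }x^{-3/2}\,dx\;=\;1+2\;=\;3,
\]
and its exact value is \(\zeta (3/2)\approx 2.612\), which is the constant quoted above.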
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Q., Cui, C. & Han, D. Accelerated Doubly Stochastic Gradient Descent for Tensor CP Decomposition. J Optim Theory Appl 197, 665–704 (2023). https://doi.org/10.1007/s10957-023-02193-5
Keywords
- Tensor CANDECOMP/PARAFAC decomposition
- Doubly stochastic gradient descent
- Nonconvex optimization
- Variance reduction
- Momentum acceleration