Half-quadratic alternating direction method of multipliers for robust orthogonal tensor approximation

Abstract

Higher-order tensor canonical polyadic decomposition (CPD) with one or more of the latent factor matrices being columnwise orthonormal has been well studied in recent years. However, most existing models penalize the noise, if present, via the least squares loss, which may be sensitive to non-Gaussian noise or outliers, leading to biased estimates of the latent factors. In this paper, we derive a robust orthogonal tensor CPD model with the Cauchy loss, which is resistant to heavy-tailed noise, such as Cauchy noise, and to outliers. By exploiting the half-quadratic property of the model, we develop the so-called half-quadratic alternating direction method of multipliers (HQ-ADMM) to solve it. Each subproblem involved in HQ-ADMM admits a closed-form solution. Thanks to some nice properties of the Cauchy loss, we show that the whole sequence generated by the algorithm globally converges to a stationary point of the problem under consideration. Numerical experiments on synthetic and real data demonstrate the effectiveness of the proposed model and algorithm.
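To make the robustness mechanism concrete, the following minimal sketch (ours, not the authors' code) compares the half-quadratic weight that the Cauchy loss attaches to a residual with the constant weight of least squares. We assume the common parameterization ρ_δ(r) = (δ²/2)·log(1 + r²/δ²); its half-quadratic weight w(r) = δ²/(δ² + r²) matches the form of the weight W appearing in the appendix below.

```python
import numpy as np

# Illustration only (not from the paper): half-quadratic weights under the
# Cauchy loss, assuming rho_delta(r) = (delta^2/2) * log(1 + r^2/delta^2).
# Its minimizing weight w(r) = delta^2 / (delta^2 + r^2) is exactly the form
# of the weight W used in the appendix; least squares corresponds to w == 1.
delta = 1.0
residuals = np.array([0.1, 1.0, 10.0, 1000.0])
weights = delta**2 / (delta**2 + residuals**2)
for r, w in zip(residuals, weights):
    print(f"residual {r:>8.1f}: Cauchy HQ weight {w:.6f}, least squares weight 1.0")
# Weights come out to roughly 0.99, 0.5, 0.0099, 1e-06: gross outliers are
# essentially ignored, which is the source of the robustness to heavy-tailed
# noise claimed in the abstract.
```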


References

  1. Anandkumar, A., Jain, P., Shi, Y., Niranjan, U. N.: Tensor vs. matrix methods: robust tensor decomposition under block sparse perturbations. In: Artificial Intelligence and Statistics, pp. 268–276 (2016)

  2. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1-2), 91–129 (2013)

  3. Beaton, A., Tukey, J.: The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics 16(2), 147–185 (1974)

  4. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1-2), 459–494 (2014)

  5. Chen, J., Saad, Y.: On the tensor SVD and the optimal low rank orthogonal approximation of tensors. SIAM. J. Matrix Anal. Appl. 30(4), 1709–1734 (2009)

  6. Cheng, L., Wu, Y.C., Poor, H.V.: Probabilistic tensor canonical polyadic decomposition with orthogonal factors. IEEE Trans. Signal Process. 65(3), 663–676 (2016)

  7. Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., Phan, H.A.: Tensor decompositions for signal processing applications: from two-way to multiway component analysis. IEEE Signal Process. Mag. 32(2), 145–163 (2015)

  8. De Almeida, A.L.F., Kibangou, A.Y., Miron, S., Araújo, D.C.: Joint data and connection topology recovery in collaborative wireless sensor networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), pp. 5303–5307 (2013)

  9. De Lathauwer, L.: Algebraic methods after prewhitening. In: Handbook of Blind Source Separation, pp. 155–177. Elsevier (2010)

  10. De Lathauwer, L.: A short introduction to tensor-based methods for factor analysis and blind source separation. In: Proceedings of the IEEE International Symposium on Image and Signal Processing and Analysis (ISPA 2011), pp. 558–563 (2011)

  11. De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000)

  12. Ding, M., Huang, T.Z., Ma, T.H., Zhao, X.L., Yang, J.H.: Cauchy noise removal using group-based low-rank prior. Appl. Math. Comput. 372, 124971 (2020)

  13. Feng, Y., Fan, J., Suykens, J.: A statistical learning approach to modal regression. J. Mach. Learn. Res. 21(2), 1–35 (2020)

  14. Feng, Y., Huang, X., Shi, L., Yang, Y., Suykens, J.: Learning with the maximum correntropy criterion induced losses for regression. J. Mach. Learn. Res. 16, 993–1034 (2015)

  15. Ganan, S., McClure, D.: Bayesian image analysis: an application to single photon emission tomography. Amer. Statist. Assoc., 12–18 (1985)

  16. Goldfarb, D., Qin, Z.: Robust low-rank tensor recovery: models and algorithms. SIAM J. Matrix Anal. Appl. 35(1), 225–253 (2014)

  17. Guan, N., Liu, T., Zhang, Y., Tao, D., Davis, L.S.: Truncated Cauchy non-negative matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell. 41(1), 246–259 (2017)

  18. Guan, Y., Chu, D.: Numerical computation for orthogonal low-rank approximation of tensors. SIAM J. Matrix Anal. Appl. 40(3), 1047–1065 (2019)

  19. He, R., Zheng, W.S., Hu, B.G.: Maximum correntropy criterion for robust face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1561–1576 (2010)

  20. Hillar, C.J., Lim, L.H.: Most tensor problems are NP-hard. J. ACM 60(6), 45:1–45:39 (2013)

  21. Holland, P., Welsch, R.: Robust regression using iteratively reweighted least-squares. Commun. Stat.-Theory Methods 6(9), 813–827 (1977)

  22. Hong, D., Kolda, T.G., Duersch, J.A.: Generalized canonical polyadic tensor decomposition. SIAM Rev. 62(1), 133–163 (2020)

  23. Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)

  24. Hu, S., Li, G.: Convergence rate analysis for the higher order power method in best rank one approximations of tensors. Numer. Math. 140(4), 993–1031 (2018)

  25. Hu, S., Ye, K. (2019)

  26. Huber, P.J.: Robust statistics, vol. 523. Wiley, New York (2004)

  27. Kim, G., Cho, J., Kang, M.: Cauchy noise removal by weighted nuclear norm minimization. J. Sci. Comput. 83, 15 (2020)

  28. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009)

  29. Kovnatsky, A., Glashoff, K., Bronstein, M.M.: MADMM: a generic algorithm for non-smooth optimization on manifolds. In: European Conference on Computer Vision, pp. 680–696. Springer (2016)

  30. Li, G., Liu, T., Pong, T.K.: Peaceman–Rachford splitting for a class of nonconvex optimization problems. Comput. Optim. Appl. 68(2), 407–436 (2017)

  31. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)

  32. Li, G., Pong, T.K.: Douglas–Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159(1-2), 371–401 (2016)

  33. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18(5), 1199–1232 (2018)

  34. Li, J., Usevich, K., Comon, P.: Globally convergent Jacobi-type algorithms for simultaneous orthogonal symmetric tensor diagonalization. SIAM J. Matrix Anal. Appl. 39(1), 1–22 (2018)

  35. Li, J., Zhang, S.: Polar decomposition based algorithms on the product of Stiefel manifolds with applications in tensor approximation. arXiv:1912.10390 (2019)

  36. Li, X., Lu, Q., Dong, Y., Tao, D.: Robust subspace clustering by Cauchy loss function. IEEE Trans. Neural Netw. Learn. Syst. 30(7), 2067–2078 (2018)

  37. Liu, H., So, A.M.C., Wu, W.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Math. Program. 178(1), 215–262 (2019)

  38. Maronna, R., Bustos, O., Yohai, V.: Bias- and efficiency-robustness of general M-estimators for regression with random carriers. In: Smoothing Techniques for Curve Estimation, pp. 91–116. Springer (1979)

  39. Mei, J.J., Dong, Y., Huang, T.Z., Yin, W.: Cauchy noise removal by nonconvex ADMM with convergence guarantees. J. Sci. Comput. 74(2), 743–766 (2018)

  40. Pan, J., Ng, M.K.: Symmetric orthogonal approximation to symmetric tensors with applications to image reconstruction. Numer. Linear Algebra Appl. 25(5), e2180 (2018)

  41. Pravdova, V., Estienne, F., Walczak, B., Massart, D.L.: A robust version of the Tucker3 model. Chemometr. Intell. Lab. Syst. 59(1), 75–88 (2001)

  42. Savas, B., Lim, L.H.: Quasi-Newton methods on Grassmannians and multilinear approximations of tensors. SIAM J. Sci. Comput. 32(6), 3352–3393 (2010)

  43. Sciacchitano, F., Dong, Y., Zeng, T.: Variational approach for restoring blurred images with Cauchy noise. SIAM J. Imaging Sci. 8(3), 1894–1922 (2015)

  44. Shashua, A., Levin, A.: Linear image coding for regression and classification using the tensor-rank principle. In: CVPR, vol. 1, pp. I–I. IEEE (2001)

  45. Sidiropoulos, N.D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E.E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65(13), 3551–3582 (2017)

  46. Sidiropoulos, N.D., Giannakis, G.B., Bro, R.: Blind PARAFAC receivers for DS-CDMA systems. IEEE Trans. Signal Process. 48(3), 810–823 (2000)

  47. Signoretto, M., Dinh, Q.T., De Lathauwer, L., Suykens, J.A.K.: Learning with tensors: a framework based on convex optimization and spectral regularization. Mach. Learn. 94(3), 303–351 (2014)

  48. Sørensen, M., De Lathauwer, L., Comon, P., Icart, S., Deneire, L.: Canonical polyadic decomposition with a columnwise orthonormal factor matrix. SIAM J. Matrix Anal. Appl. 33(4), 1190–1213 (2012)

  49. Sørensen, M., De Lathauwer, L., Deneire, L.: PARAFAC with orthogonality in one mode and applications in DS-CDMA systems. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2010), pp. 4142–4145 (2010)

  50. Vervliet, N., Debals, O., Sorber, L., Van Barel, M., De Lathauwer, L.: Tensorlab 3.0. http://www.tensorlab.net. Available online (2016)

  51. Wang, L., Chu, M.T., Yu, B.: Orthogonal low rank tensor approximation: alternating least squares method and its global convergence. SIAM J. Matrix Anal. Appl. 36(1), 1–19 (2015)

  52. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)

  53. Yang, Y.: The epsilon-alternating least squares for orthogonal low-rank tensor approximation and its global convergence. SIAM J. Matrix Anal. Appl. 41(4), 1797–1825 (2020)

  54. Yang, Y., Feng, Y., Suykens, J.A.K.: Robust low-rank tensor recovery with regularized redescending M-estimator. IEEE Trans. Neural Netw. Learn. Syst. 27(9), 1933–1946 (2015)

  55. Ye, K., Hu, S.: When geometry meets optimization theory: partially orthogonal tensors. arXiv:2201.04824 (2022)

  56. Yu, P., Li, G., Pong, T.K.: Kurdyka–Łojasiewicz exponent via inf-projection. Found. Comput. Math. 1–47 (2021)

Acknowledgements

We thank the editor and the anonymous reviewers for their insightful comments and suggestions that helped improve this manuscript.

Funding

The first author was supported by the National Natural Science Foundation of China Grants 11801100 and 12171105, and the Fok Ying Tong Education Foundation Grant 171094. The second author was supported by the Simons Foundation Collaboration Grant 572064.

Corresponding author

Correspondence to Yuning Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Communicated by: Guoyin Li

This article belongs to the Topical Collection: Mathematics of Computation and Optimisation Guest Editors: Jerome Droniou, Andrew Eberhard, Guoyin Li, Russell Luke, Thanh Tran

Appendix

1.1 Proof of Theorem 4.1

To prove the convergence of a nonconvex ADMM, a key step is to upper bound the successive difference of the dual variables by that of the primal variables. Different from the nonconvex ADMMs in the literature, for HQ-ADMM the weight Wk complicates the estimation of this upper bound. Fortunately, this can be overcome by exploiting the relations between Wk, Tk and Tk− 1 given by Proposition 2.3, which is done in Lemma A.1. With the upper bound at hand, we can derive the decreasing inequality with respect to \(\{\tilde {L}_{\tau }^{k+1,k} \}\) (Lemma A.2), whose verification is somewhat similar to that of a nonconvex block coordinate descent. Then, the boundedness of the variables is established in Theorem A.1. Key to the above two results is to set the parameter \(\tau \geq \sqrt { 10}\). Combining the above pieces, the subsequential convergence will be proved at the end of this subsection using a standard argument.
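For orientation, here is a minimal numerical sketch (ours, not the authors' implementation) of one HQ-ADMM sweep for a third-order tensor, reconstructed from the conditions quoted in this appendix: the u-update in the proof of Lemma A.2, the T-stationarity and dual update around (A.1), and the σ- and W-representations recalled in Section 1.2. All names (hq_admm, rank1, contract_but) and default parameter values are ours; the sketch treats every mode like the unit-norm modes 1 ≤ j ≤ d − t, whereas the columnwise-orthonormal modes would instead use a polar-decomposition-type update, and the exact subproblem formulas in the main text (??) may differ in detail.

```python
import numpy as np

def rank1(u0, u1, u2):
    """Outer product u0 (x) u1 (x) u2 of three vectors."""
    return np.einsum('i,j,k->ijk', u0, u1, u2)

def contract_but(G, us, skip):
    """Contract 3-way tensor G with vectors us[l] on every mode l != skip."""
    for l in (2, 1, 0):              # highest axis first keeps indices valid
        if l != skip:
            G = np.tensordot(G, us[l], axes=([l], [0]))
    return G

def hq_admm(A, R, delta=1.0, tau=4.0, alpha=0.1, iters=200, seed=0):
    """One possible reading of the HQ-ADMM sweep for a 3rd-order tensor A."""
    rng = np.random.default_rng(seed)
    d = A.ndim                                   # d = 3 in this sketch
    U = [rng.standard_normal((A.shape[j], R)) for j in range(d)]
    for j in range(d):
        U[j] /= np.linalg.norm(U[j], axis=0)     # unit-norm columns
    sigma = np.ones(R)
    T, Y = A.copy(), np.zeros_like(A)
    W = delta**2 / (delta**2 + (T - A)**2)       # half-quadratic weight
    for _ in range(iters):
        G = Y + tau * T
        # u-update for the unit-norm modes, cf. the step analyzed in (A.5)
        for j in range(d):
            for i in range(R):
                us = [U[l][:, i] for l in range(d)]
                v = contract_but(G, us, skip=j)
                vt = sigma[i] * v + alpha * U[j][:, i]
                U[j][:, i] = vt / np.linalg.norm(vt)
        X = sum(sigma[i] * rank1(*(U[l][:, i] for l in range(d)))
                for i in range(R))               # [[sigma; U]]
        # T-update: stationarity W*(T - A) + Y - tau*(X - T) = 0, elementwise
        T = (W * A - Y + tau * X) / (W + tau)
        Y = -W * (T - A)                         # dual update, cf. (A.1)
        G = Y + tau * T
        # sigma as in Section 1.2: sigma_i = <Y + tau*T, rank-one term> / tau
        sigma = np.array([np.sum(G * rank1(*(U[l][:, i] for l in range(d))))
                          for i in range(R)]) / tau
        W = delta**2 / (delta**2 + (T - A)**2)   # weight refresh
    return sigma, U, T
```

For instance, `sigma, U, T = hq_admm(np.random.standard_normal((8, 8, 8)), R=2)` runs the sweep on a random 8×8×8 tensor.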

Lemma A.1

It holds that

$$ \| { {\varDelta}_{ {Y}}^{k+1,k} }\|_{F} \leq \| { {\varDelta}_{ {T}}^{k+1,k} } \|_{F} + \| { {\varDelta}_{ {T}}^{k,k-1} } \|_{F} . $$

Proof

From (??), we have

$$ W^{k}\circledast{ \left( {T}^{k+1}- {A} \right) } + {Y}^{k} - \tau { \left( [[\boldsymbol{ \sigma}^{k};\boldsymbol{U}^{k+1}]] - {T}^{k+1} \right) }=0, $$

which together with the definition of Yk+ 1 yields

$$ W^{k}\circledast{ \left( {T}^{k+1}- {A} \right) } + {Y}^{k+1} = 0. $$
(A.1)

Therefore, we have

$$ \begin{array}{@{}rcl@{}} \|{ {\varDelta}_{ {Y}}^{k+1,k} } \|_{F} &=& { \left\| W^{k}\circledast{ \left( {T}^{k+1}- {A} \right) } - W^{k-1}\circledast{ \left( {T}^{k}- {A} \right) } \right\|_{F} } \\ &= &{ \left\| W^{k}\circledast{ \left( {T}^{k+1}- {A} \right) } - W^{k}\circledast{ \left( {T}^{k}- {A} \right) } + W^{k}\circledast\left( { {T}^{k}- {A} }\right) - W^{k-1}\circledast\left( { {T}^{k}- {A} }\right) \right\|_{F} }\\ &\leq &{ \left\| W^{k}\circledast{ \left( {T}^{k+1}- {T}^{k} \right) } \right\|_{F} } + { \left\| (W^{k} - W^{k-1})\circledast{ \left( {T}^{k}- {A} \right) } \right\|_{F} }. \end{array} $$
(A.2)

Now, denote \( E_{1}:= { \left \| W^{k}\circledast { \left ({T}^{k+1}- {T}^{k} \right ) } \right \|_{F} } \) and \(E_{2}:={ \left \| (W^{k} - W^{k-1})\circledast { \left ({T}^{k}- {A} \right ) } \right \|_{F} } \). We first consider E1. From the definition of Wk, we easily see that \( W^{k}_{i_{1}{\cdots } i_{d}}\leq 1\) for each i1,…,id. Therefore,

$$ E_{1} \leq \| { {\varDelta}_{ {T}}^{k+1,k} } \|_{F}. $$
(A.3)

Next, we focus on E2. To simplify notation, we denote \(a_{i_{1}{\cdots } i_{d}}^{k}:= {T}^{k}_{i_{1}{\cdots } i_{d}} - {A}_{i_{1}{\cdots } i_{d}}\) and

$$e_{i_{1}{\cdots} i_{d}}:= \delta^{2} a_{i_{1}{\cdots} i_{d}}^{k} \left( \frac{1}{ \delta^{2} + \left( a_{i_{1}{\cdots} i_{d}}^{k}\right)^{2} } - \frac{1}{\delta^{2} + \left( a_{i_{1}{\cdots} i_{d}}^{k-1}\right)^{2} } \right) .$$

Then, E2 can be expressed as

$$ \begin{array}{@{}rcl@{}} {E_{2}^{2}} &=&\sum\limits^{n_{1},\ldots,n_{d}}_{i_{1}=1,\ldots,i_{d}=1} { \left( W_{i_{1}{\cdots} i_{d}}^{k} - W_{i_{1}{\cdots} i_{d}}^{k-1} \right) }^{2}{ \left( {T}^{k}_{i_{1}{\cdots} i_{d}} - {A}_{i_{1}{\cdots} i_{d}} \right) }^{2} \\ &=& \sum\limits^{n_{1},\ldots,n_{d}}_{i_{1}=1,\ldots,i_{d}=1} \left( \frac{1}{ \delta^{2} + \left( a_{i_{1}{\cdots} i_{d}}^{k}\right)^{2} } - \frac{1}{\delta^{2} + \left( a_{i_{1}\cdots i_{d}}^{k-1}\right)^{2} } \right)^{2}\delta^{4} \left( a_{i_{1}{\cdots} i_{d}}^{k}\right)^{2}\\ & =& \sum\limits_{i_{1}=1,\ldots,i_{d}=1}^{n_{1},\ldots,n_{d}} e_{i_{1}{\cdots} i_{d}}^{2}. \end{array} $$

It follows from Proposition 2.3 that

$$ |e_{i_{1}{\cdots} i_{d}}| \leq | a_{i_{1}{\cdots} i_{d}}^{k}-a_{i_{1}{\cdots} i_{d}}^{k-1} |, $$

and so

$$ E_{2} \leq \| {T}^{k}-{A} - ({T}^{k-1} - {A}) \|_{F} = \|{ {\varDelta}_{{T}}^{k,k-1} }\|_{F}. $$
(A.4)

Combining (A.2) with (A.3) and (A.4) yields the desired result. □
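As a quick numerical sanity check of the last step (ours, not part of the paper), the entrywise inequality \( |e_{i_{1}{\cdots} i_{d}}| \leq | a_{i_{1}{\cdots} i_{d}}^{k}-a_{i_{1}{\cdots} i_{d}}^{k-1} | \) from Proposition 2.3 can be probed on random scalars a, b standing for \(a^{k}\) and \(a^{k-1}\):

```python
import numpy as np

# Entrywise check (illustrative, not a proof) of |e| <= |a - b|, where
# e = delta^2 * a * (1/(delta^2 + a^2) - 1/(delta^2 + b^2)) and a, b play
# the roles of a^k and a^{k-1} in the proof of Lemma A.1 above.
rng = np.random.default_rng(0)
delta = 0.7
a = 5.0 * rng.standard_normal(1_000_000)
b = 5.0 * rng.standard_normal(1_000_000)
e = delta**2 * a * (1.0 / (delta**2 + a**2) - 1.0 / (delta**2 + b**2))
ratio = np.abs(e) / np.maximum(np.abs(a - b), 1e-15)
print("max |e| / |a - b| =", ratio.max())   # stays <= 1 on every draw
```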

With Lemma A.1, we then establish a decreasing inequality with respect to \(\{\tilde {L}_{\tau }^{k+1,k} \}\) defined in (??):

$$ \tilde L^{k+1,k}_{\tau} := \tilde L_{\tau}(\boldsymbol{ \sigma}^{k+1},\boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k+1}, {T}^{k}), \quad \tilde L_{\tau}(\boldsymbol{ \sigma},\boldsymbol{U}, {T}, {Y}, W, {T}^{\prime}) := L_{\tau}(\boldsymbol{ \sigma},\boldsymbol{U}, {T}, {Y}, W) + \frac{2 }{ \tau }\| {T} - {T}^{\prime}\|_{F}^{2}. $$

Key to the validity of the decreasing inequality is to set \(\tau \geq \sqrt {10}\).
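Concretely, the threshold comes from comparing the two coefficients of \({ \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} }^{2}\) in (A.11) below (our one-line arithmetic):

$$ \frac{\tau}{2} - \frac{2}{\tau} \geq \frac{2}{\tau} + \frac{1}{\tau} \iff \frac{\tau}{2} \geq \frac{5}{\tau} \iff \tau^{2} \geq 10. $$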

Lemma A.2

Let the parameter τ satisfy \( \tau \geq \sqrt {10}\). Then, there holds

$$ \tilde L_{\tau}^{k,k-1} - \tilde L_{\tau}^{k+1,k} \geq \frac{\alpha}{2}{\sum}_{j=1}^{d}{ \left\| { {\varDelta}_{U_j}^{k+1,k} } \right\|_{F} }^{2}+ \frac{1}{\tau} { \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} }^{2},~\forall k\geq 1, $$

where α > 0 is defined in (??) and (??).

Proof

We first consider the decrease caused by Uj. When 1 ≤ j ≤ d − t, according to the algorithm, using the expression of Lτ(⋅), the fact that \({ \left \| u^{k}_{j,i} \right \| }=1\), and recalling the definitions of \(u^{k+1}_{j,i}\), \(\mathbf {v}^{k+1}_{j,i}\), and \(\tilde {\mathbf {v}}^{k+1}_{j,i}\), we have

$$ \begin{array}{@{}rcl@{}} && L_{\tau}(\boldsymbol{ \sigma}^{k}, U^{k+1}_{1},\ldots,U^{k+1}_{j-1},{U^{k}_{j}},\ldots,{U^{k}_{d}} , {T}^{k}, {Y}^{k}, W^{k} ) - \\ &&~~~~~~~~~~~~~~~ L_{\tau}(\boldsymbol{ \sigma}^{k}, U^{k+1}_{1},\ldots, U^{k+1}_{j},U^{k}_{j+1},\ldots,{U^{k}_{d}}, {T}^{k}, {Y}^{k}, W^{k}) \\ =&& \sum\limits^{R}_{i=1}{ \left\langle {\sigma^{k}_{i}} \cdot ({Y}^{k} + \tau {T}^{k}) {u_{1,i}^{k+1}\otimes \cdots\otimes u_{j-1,i}^{k+1} \otimes u_{j+1,i}^{k} \otimes\cdots\otimes u_{d,i}^{k} } , u^{k+1}_{j,i} - u^{k}_{j,i}\right\rangle } \\ =&&\sum\limits^{R}_{i=1}{ \left\langle {\sigma^{k}_{i}}\cdot \mathbf{v}^{k+1}_{j,i} , u^{k+1}_{j,i} - u^{k}_{j,i} \right\rangle } \\ =&& \sum\limits^{R}_{i=1} \left( { \left\langle {\sigma^{k}_{i}}\cdot \mathbf{v}^{k+1}_{j,i} + \alpha{u}^{k}_{j,i} , u^{k+1}_{j,i} - u^{k}_{j,i} \right\rangle } + \frac{\alpha}{2}{ \left\| u^{k+1}_{j,i} - u^{k}_{j,i} \right\| }^{2} \right) \\ =&& \sum\limits^{R}_{i=1} \left( { \left\langle \tilde{\mathbf{v}}^{k+1}_{j,i} , \frac{\tilde{\mathbf{v}}^{k+1}_{j,i}}{{ \left\| \tilde{\mathbf{v}}^{k+1}_{j,i} \right\| } } -u^{k}_{j,i} \right\rangle }+ \frac{\alpha}{2}{ \left\| u^{k+1}_{j,i} - u^{k}_{j,i} \right\| }^{2} \right) \\ \geq && \frac{\alpha}{2}\sum\limits^{R}_{i=1}{ \left\| u^{k+1}_{j,i} - u^{k}_{j,i} \right\| }^{2} = \frac{\alpha}{2}{ \left\| { {\varDelta}_{U_j}^{k+1,k} } \right\| }^{2}_{F} , \end{array} $$
(A.5)

where the fourth equality follows from the definition of \(u^{k+1}_{j,i}\) and \(\tilde {\mathbf {v}}^{k+1}_{j,i}\), and the inequality is due to \({ \left \| \mathbf {v} \right \| }\geq {\left \langle \mathbf {v} , u\right \rangle }\) for any vectors u,v of the same size with \({ \left \| u \right \| }=1\).

The decrease of Uj when d − t + 1 ≤ j ≤ d is similar. From the definition of \(V^{k+1}_{j}\), it holds that

$$ \begin{array}{@{}rcl@{}} && L_{\tau}(\boldsymbol{ \sigma}^{k}, U^{k+1}_{1},\ldots,U^{k+1}_{j-1},{U^{k}_{j}},\ldots,{U^{k}_{d}} , {T}^{k}, {Y}^{k}, W^{k} ) - \\ &&~~~~~~~~~~~~~~~ L_{\tau}(\boldsymbol{ \sigma}^{k}, U^{k+1}_{1},\ldots, U^{k+1}_{j},U^{k}_{j+1},\ldots,{U^{k}_{d}}, {T}^{k}, {Y}^{k}, W^{k}) \\ &=& \sum\limits_{i=1}^{R}{ \left\langle {\sigma^{k}_{i}} \cdot ({Y}^{k} + \tau {T}^{k}) {u_{1,i}^{k+1}\otimes \cdots\otimes u_{j-1,i}^{k+1} \otimes u_{j+1,i}^{k} \otimes\cdots\otimes u_{d,i}^{k} } , u^{k+1}_{j,i} - u^{k}_{j,i}\right\rangle } \\ &=&{ \left\langle V^{k+1}_{j} \cdot \text{diag}(\boldsymbol{ \sigma}^{k}) , U^{k+1}_{j} - {U^{k}_{j}}\right\rangle } \\ &=&{ \left\langle V^{k+1}_{j} \cdot \text{diag}(\boldsymbol{ \sigma}^{k}) + \alpha {U^{k}_{j}} , U^{k+1}_{j} - {U^{k}_{j}}\right\rangle } + \frac{\alpha}{2}{ \left\| U^{k+1}_{j} - {U^{k}_{j}} \right\|_{F} }^{2} \\ & \geq& \frac{\alpha}{2}{ \left\| { {\varDelta}_{U_j}^{k+1,k} } \right\|_{F} }^{2} , \end{array} $$
(A.6)

where the inequality follows from the definition of \(U^{k+1}_{j}\) in (??). To show the decrease of T, note that Lτ(⋅) is strongly convex with respect to T, based on which we can easily deduce that

$$ L_{\tau}(\boldsymbol{ \sigma}^{k},\boldsymbol{U}^{k+1}, {T}^{k}, {Y}^{k}, W^{k}) - L_{\tau}(\boldsymbol{ \sigma}^{k}, \boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k}, W^{k}) \geq \frac{\tau}{2}{ \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\| }^{2}_{F}. $$
(A.7)

Next, it follows from the definition of Yk+ 1 and Lemma A.1 that

$$ \begin{array}{@{}rcl@{}} && L_{\tau}(\boldsymbol{ \sigma}^{k}, \boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k}, W^{k}) - L_{\tau}(\boldsymbol{ \sigma}^{k}, \boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k}) \\ =&& { \left\langle {Y}^{k+1} - {Y}^{k} , [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k+1} ]] - {T}^{k+1} \right\rangle } \\ =&& -\frac{1}{\tau}{ \left\| { {\varDelta}_{ {Y}}^{k+1,k} } \right\|_{F} }^{2}\\ \geq && -\frac{2}{\tau}{ \left( { \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} }^{2} + { \left\| { {\varDelta}_{ {T}}^{k,k-1} } \right\|_{F} }^{2} \right) }. \end{array} $$
(A.8)

Finally, it follows from the definition of σk+ 1 and Wk+ 1 that

$$ \begin{array}{@{}rcl@{}} &&\!\!\!\!\! L_{\tau}(\boldsymbol{ \sigma}^{k}, \boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k}) - L_{\tau}(\boldsymbol{ \sigma}^{k+1}, \boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k}) \geq 0, \\ &&\!\!\!\!\! L_{\tau}(\boldsymbol{ \sigma}^{k+1}, \boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k}) - L_{\tau}(\boldsymbol{ \sigma}^{k+1}, \boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k+1}) \geq 0. \end{array} $$
(A.9) (A.10)

As a result, summing up (A.5)–(A.10) yields

$$ \begin{array}{@{}rcl@{}} && L_{\tau}(\boldsymbol{ \sigma}^{k}, \boldsymbol{U}^{k}, {T}^{k}, {Y}^{k}, W^{k}) - L_{\tau}(\boldsymbol{ \sigma}^{k+1}, \boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k+1}) \\ \geq && \frac{\alpha}{2}\sum\limits_{j=1}^{d}{ \left\| { {\varDelta}_{U_j}^{k+1,k} } \right\|_{F}^{2}} + { \left( \frac{\tau}{2} - \frac{2}{\tau} \right) }{\left\| { {\varDelta}_{ {T}}^{k+1,k}} \right\|_{F}^{2}} - \frac{2}{\tau}{\left\|{{\varDelta}_{ {T}}^{k,k-1}} \right\|_{F}^{2}} \\ \geq && \frac{\alpha}{2}\sum\limits_{j=1}^{d}{ \left\| { {\varDelta}_{U_j}^{k+1,k} } \right\|_{F} }^{2} + { \left( \frac{2}{\tau} + \frac{1}{\tau} \right) }{ \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F}^{2}} - \frac{2}{\tau} {\left\| { {\varDelta}_{ {T}}^{k,k-1} } \right\|_{F}^{2}}, \end{array} $$
(A.11)

where the last inequality follows from the range of τ. Rearranging the terms of (A.11) gives the desired result. This completes the proof. □

We then show that \(\tilde L_{\tau }^{k,k-1}\) defined in Lemma A.2 is lower bounded and the sequence {σk,Uk,Tk,Yk,Wk} is bounded as well.

Theorem A.1

Under the setting of Lemma A.2, \(\{\tilde L_{\tau }^{k,k-1}\}\) is bounded. The sequence {σk,Uk,Tk,Yk,Wk} generated by Algorithm 1 is bounded as well.

Proof

Denote \(Q^{k}(\cdot ) := \frac {1}{2}{ \left \| \sqrt { W^{k}}\circledast { \left (\cdot - {A} \right ) } \right \|_{F} }^{2} \); thus, we have \(\nabla Q^{k}({T}) = W^{k}\circledast { \left ({T}- {A} \right ) }\). Since Qk(⋅) is quadratic and \( {Y}^{k} = - W^{k-1}\circledast { \left ({T}^{k}- {A} \right ) }\) by (A.1), it follows that

$$ \begin{array}{@{}rcl@{}} &&Q^{k-1}({T}^{k}) - Q^{k-1}([[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]])- { \left\langle {Y}^{k} , [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right\rangle } \\ =&& { \left\langle W^{k-1}\circledast{ \left( [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {A} \right) } , {T}^{k} - [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] \right\rangle } \\ &&~~~~~~~~~~~~+\frac{1}{2}{ \left\| \sqrt{ W^{k-1}}\circledast{ \left( [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right) } \right\|_{F} }^{2}- { \left\langle {Y}^{k} , [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right\rangle }\\ =&& \frac{1}{2}{ \left\| \sqrt{ W^{k-1}}\circledast{ \left( [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right) } \right\|_{F} }^{2} \\ &&~~~~~~~~~~~~+ { \left\langle W^{k-1}\circledast{ \left( [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {A} \right) } - W^{k-1}\circledast{ \left( {T}^{k}- {A} \right) } , {T}^{k} - [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] \right\rangle }\\ =&& - \frac{1}{2}{ \left\| \sqrt{ W^{k-1}}\circledast{ \left( [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right) } \right\|_{F} }^{2} \geq -\frac{1}{2}{ \left\| [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right\|_{F} }^{2}, \end{array} $$
(A.12)

where the last inequality uses the fact that \(0< W^{k-1}_{i_{1}{\cdots } i_{d}} \leq 1\). It thus follows that for any k ≥ 2,

$$ \begin{array}{@{}rcl@{}} && \tilde L_{\tau}^{k-1,k-2} = \tilde L_{\tau}(\boldsymbol{ \sigma}^{k-1},\boldsymbol{U}^{k-1}, {T}^{k-1}, {Y}^{k-1}, W^{k-1}, {T}^{k-2} )\\ &\geq& \tilde L_{\tau}(\boldsymbol{ \sigma}^{k},\boldsymbol{U}^{k}, {T}^{k}, {Y}^{k}, W^{k-1}, {T}^{k-1}) \\ &=& Q^{k-1}({T}^{k}) + \frac{\delta^{2}}{2}\sum\limits_{i_{1}=1,\ldots,i_{d}=1}^{n_{1},\ldots,n_{d}} {\varrho}( W^{k-1}_{i_{1}{\cdots} i_{d}} ) - { \left\langle {Y}^{k} , [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right\rangle } \\ && + \frac{\tau}{2}{ \left\| [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right\|_{F} }^{2} + \frac{2}{\tau}{ \left\| { {\varDelta}_{ {T}}^{k,k-1} } \right\|_{F} }^{2} \\ &\geq & Q^{k-1}([[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] ) + \frac{\tau -1}{2}{ \left\| [[ \boldsymbol{ \sigma}^{k}; \boldsymbol{U}^{k} ]] - {T}^{k} \right\|_{F} }^{2} \\ && + \frac{\delta^{2}}{2}\sum\limits_{i_{1}=1,\ldots,i_{d}=1}^{n_{1},\ldots,n_{d}} {\varrho}( W^{k-1}_{i_{1}{\cdots} i_{d}} ) + \frac{2}{\tau}{ \left\| { {\varDelta}_{ {T}}^{k,k-1} } \right\|_{F} }^{2} \\ &>& -\infty, \end{array} $$
(A.13)

where the first inequality follows from the proof of Lemma A.2 (summing up (A.5)–(A.9)), the second one comes from (A.12), and the last one is due to the range of τ and ϱ(⋅) ≥ 0. Thus, \(\{ \tilde L_{\tau }^{k,k-1} \}\) is a lower bounded sequence. This together with Lemma A.2 shows that \(\{ \tilde L_{\tau }^{k,k-1} \}\) is bounded. We then show the boundedness of {σk,Uk,Tk,Yk,Wk}. The boundedness of {Uk} and {Wk} is obvious. Next, denote g(σk) as the formulation in lines 5–6 of (A.13) with respect to σk. Proposition 2.1 shows that the rank-one terms \(\bigotimes _{j=1}^{d}u^{k}_{j,i}\) are orthonormal and hence \({ \| [[ \boldsymbol { \sigma }^{k}; \boldsymbol {U}^{k}]] - {T}^{k} \|_{F} }^{2}\) is strongly convex with respect to σk; this together with the convexity of Qk− 1([[σk; Uk]]) shows that g(σk) is strongly convex with respect to σk. Combining this with (A.13) gives the boundedness of {σk}. Quite similarly, we have that {Tk} is bounded. Finally, the boundedness of {Yk} follows from the expression of the T-subproblem (??). As a result, {σk,Uk,Tk,Yk,Wk} is a bounded sequence. This completes the proof. □
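To spell out the strong convexity step used in the proof above (our expansion, relying only on the orthonormality of the rank-one terms from Proposition 2.1):

$$ { \left\| [[ \boldsymbol{ \sigma}; \boldsymbol{U}^{k} ]] - {T}^{k} \right\|_{F} }^{2} = \sum\limits_{i=1}^{R}{\sigma_{i}^{2}} - 2\sum\limits_{i=1}^{R}\sigma_{i}{ \left\langle {T}^{k} , \bigotimes_{j=1}^{d}u^{k}_{j,i}\right\rangle } + { \left\| {T}^{k} \right\|_{F} }^{2}, $$

which is a quadratic in σ whose Hessian is twice the identity, hence strongly convex in σ.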

Proof of Theorem 4.1

Lemma A.2 in connection with Theorem A.1 yields points 1, 2, and (??); (??) together with Lemma A.1 and the definitions of Yk+ 1, σk+ 1, and Wk+ 1 gives (??). On the other hand, since the sequence is bounded, limit points exist. Assume that {σ∗,U∗,T∗,Y∗,W∗} is a limit point with

$$ \underset{l\rightarrow \infty}{\lim} \{ \boldsymbol{ \sigma}^{k_{l}}, \boldsymbol{U}^{k_{l}}, {T}^{k_{l}}, {Y}^{k_{l}}, W^{k_{l}}\}= \{ \boldsymbol{ \sigma}^{*}, \boldsymbol{U}^{*}, {T}^{*}, {Y}^{*}, W^{*}\}. $$

Then (??) and (??) imply that

$$ \underset{l\rightarrow \infty}{\lim} \{ \boldsymbol{ \sigma}^{k_{l}+1}, \boldsymbol{U}^{k_{l}+1}, {T}^{k_{l}+1}, {Y}^{k_{l}+1}, W^{k_{l}+1}\} = \{ \boldsymbol{ \sigma}^{*}, \boldsymbol{U}^{*}, {T}^{*}, {Y}^{*}, W^{*}\}. $$

Therefore, letting \(l\rightarrow \infty\) in the uj,i-subproblem (??) yields

$$ \mathbf{v}^{*}_{j,i}\sigma^{*}_{i} + \alpha{u}^{*}_{j,i} = { \left\| \tilde{\mathbf{v}}^{*}_{j,i} \right\| }u^{*}_{j,i},~1\leq j\leq d-t,~1\leq i\leq R. $$
(A.14)

Multiplying both sides by \((u^{*}_{j,i})^{\top }\) gives

$$ { \left\| \tilde{\mathbf{v}}^{*}_{j,i} \right\| } = \alpha + \sigma^{*}_{i} { \left\langle \mathbf{v}^{*}_{j,i} , u^{*}_{j,i}\right\rangle } = \alpha + \sigma^{*}_{i}{ \left\langle {Y}^{*} + \tau {T}^{*} , \bigotimes_{j=1}^{d}u^{*}_{j,i}\right\rangle } = \alpha + \tau(\sigma^{*}_{i})^{2}, $$
(A.15)

where the second equality follows from the definition of vj,i and the last one is given by passing the limit into the expression of \(\sigma ^{k_{l}+1}_{i}\) (??). Thus, (A.14) together with (A.15) gives

$$ ({Y}^{*} + \tau {T}^{*})\bigotimes_{l\neq j}^{d}u_{l,i}^{*} = \sigma_{i}^{*}\tau{u}_{j,i}^{*}, $$
(A.16)

i.e., the first equation of the stationary point system (??).

Letting \(l\rightarrow \infty\) in the Uj-subproblem (??) and noticing the expression (??), we get

$$ V^{*}_{j}\text{diag}(\boldsymbol{ \sigma}^{*}) + \alpha U^{*}_{j} = U^{*}_{j} H^{*}_{j}, $$

where \(H^{*}_{j}\) is a symmetric matrix. Writing it columnwise, we obtain

$$ \sigma^{*}_{i} { \left( {Y}^{*} + \tau {T}^{*} \right) }\bigotimes_{l\neq j}^{d}u^{*}_{l,i} = {\sum}_{r=1}^{R}(H^{*}_{j} )_{i,r}u^{*}_{j,r} - \alpha{u}^{*}_{j,i},~d-t+1\!\leq\! j\!\leq\! d,~1\leq i\leq R. $$

Denoting \({\Lambda }^{*}_{j}:= H^{*}_{j} - \alpha I\), the above is exactly the third equality of (??). On the other hand, passing the limit into the expressions of Tk (??) and Wk (??), respectively, gives the T- and W-formulas in (??). Finally, the first expression of (??) yields T∗ = [[σ∗; U∗]]. Taking the above pieces together, we have that {σ∗,U∗,T∗,Y∗,W∗} satisfies the stationary point system (??).

Next, we show that {σ∗,U∗} is also a stationary point of problem (??). We define its Lagrangian function as \(L_{\boldsymbol { {\varPhi }}} := \boldsymbol { {\varPhi }}_{\delta }(\boldsymbol { \sigma }, \boldsymbol {U}) - {\sum }_{j,i=1}^{d-t,R} \eta _{j,i}{ \left (u_{j,i}^{\top } u_{j,i} -1 \right ) } - {\sum }^{d}_{j=d-t+1}{ \left \langle {\Lambda }_{j} , U_{j}^{\top } U_{j} - I\right \rangle }\), similar to that in (??). Taking derivatives yields

$$ \left\{ \begin{array}{lr} \partial_{u_{j,i}} \boldsymbol{ {\varPhi}}_{\delta}(\boldsymbol{ \sigma},\boldsymbol{U}) = \eta_{j,i}u_{j,i} \Leftrightarrow W\circledast\left( [[\boldsymbol{ \sigma};\boldsymbol{U}]]- {A} \right) \cdot\sigma_{i}\bigotimes_{l\neq j}u_{l,i}= \eta_{j,i}u_{j,i} ,&\\ 1\leq j\leq d-t,~1\leq i\leq R,\\ \partial_{u_{j,i}}\boldsymbol{ {\varPhi}}_{\delta}(\boldsymbol{ \sigma},\boldsymbol{U}) = {\sum}^{R}_{r=1}({\Lambda}_{j})_{i,r}u_{j,r} \Leftrightarrow W\circledast\left( [[\boldsymbol{ \sigma};\boldsymbol{U}]]- {A} \right) \cdot\sigma_{i}\bigotimes_{l\neq j}u_{l,i}={\sum}^{R}_{r=1}({\Lambda}_{j})_{i,r}u_{j,r}, &\\ d-t+1\leq j\leq d,~1\leq i\leq R,\\ \partial_{\boldsymbol{ \sigma}}\boldsymbol{ {\varPhi}}_{\delta}(\boldsymbol{ \sigma},\boldsymbol{U}) =0\Leftrightarrow { \left\langle W\circledast{ \left( [[ \boldsymbol{ \sigma}; \boldsymbol{U} ]] - {A} \right) } , \bigotimes_{l=1}^{d} u_{l,i}\right\rangle } = 0,~1\leq i\leq R. \end{array} \right. $$
(A.17)

Comparing this system at {σ∗,U∗} with the stationary point system (??) verified above shows that {σ∗,U∗} is a stationary point of problem (??). This completes the proof. □

1.2 Proof of Theorem 4.2

To prove Theorem 4.2, we first recall some definitions from nonsmooth analysis. Denote \(\text {dom}f:=\{x\in \mathbb {R}^{n}\mid f(\mathbf {x})<+\infty \}\).

Definition 1 (cf. [2])

For x ∈ dom f, the Fréchet subdifferential of f at x, denoted as \(\hat \partial f(\mathbf {x})\), is the set of vectors \(z\in \mathbb R^{n}\) satisfying

$$ \underset{\mathbf{y} \neq \mathbf{x},\, \mathbf{y} \rightarrow \mathbf{x}}{\lim\inf} \frac{f(\mathbf{y})-f(\mathbf{x})-\langle \mathbf{z}, \mathbf{y}-\mathbf{x}\rangle}{\|\mathbf{y}-\mathbf{x}\|}\geq 0. $$
(A.18)

The subdifferential of f at x ∈ dom f, written ∂f(x), is defined as

$$ \partial f(\mathbf{x}):=\left\{\mathbf{z} \in \mathbb{R}^{n}: \exists \mathbf{x}^{k} \rightarrow \mathbf{x}, f\left( \mathbf{x}^{k}\right) \rightarrow f(\mathbf{x}), \mathbf{z}^{k} \in \hat{\partial} f\left( \mathbf{x}^{k}\right) \rightarrow \mathbf{z}\right\}. $$

It is known that \(\hat \partial f(\mathbf {x})\subset \partial f(\mathbf {x})\) for each \(x\in \mathbb R^{n}\) [4]. An extended-real-valued function is a function \(f:\mathbb {R}^{n}\rightarrow [-\infty ,\infty ]\), which is proper if \(f(\mathbf {x})>-\infty \) for all x and \(f(x)<\infty \) for at least one x. It is called closed if it is lower semi-continuous (l.s.c. for short). The global convergence relies on the Kurdyka–Łojasiewicz (KL) property given as follows:

Definition 2 (KL property and KL function, cf. [2, 4])

A proper function f is said to have the KL property at \(\overline {x}\in \text {dom}\partial f :=\{x\in \mathbb R^{n}\mid \partial f(x)\neq \emptyset \}\), if there exist \(\bar \epsilon \in (0,\infty ]\), a neighborhood N of \(\overline {x}\), and a continuous and concave function \(\psi : [0,\bar \epsilon ) \rightarrow \mathbb R_{+}\) which is continuously differentiable on \((0,\bar \epsilon )\) with positive derivatives and ψ(0) = 0, such that for all xN satisfying \(f(\overline {x}) <f({x}) < f(\overline {x}) + \bar \epsilon \), it holds that

$$ \psi^{\prime}(f(\mathbf{x}) - f(\overline{\mathbf{x}}))\text{dist}(0,\partial f(x)) \geq 1, $$

where dist(0,∂f(x)) denotes the distance from the origin to the set ∂f(x). If a proper and l.s.c. function f satisfies the KL property at each point of dom∂f, then f is called a KL function.
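As a simple illustration of Definition 2 (ours, not from the paper): \(f(x)=x^{2}\) on \(\mathbb R\) has the KL property at \(\overline{x}=0\) with \(\psi(s)=\sqrt{s}\), since for every \(x\neq 0\),

$$ \psi^{\prime}(f(x) - f(\overline{x}))\,\text{dist}(0,\partial f(x)) = \frac{1}{2\sqrt{x^{2}}}\cdot |2x| = 1 \geq 1. $$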

We then simplify \(\tilde L_{\tau }(\cdot )\) by eliminating the variables W and σ. First, from the definition of Wk+ 1 and Lemma 2.1, we have that

$$ { \left\| \sqrt{ W^{k+1}} \circledast{ \left( {T}^{k+1} - {A} \right) } \right\|_{F} }^{2} + \delta^{2}{\sum}^{n_{1},\ldots,n_{d}}_{i_{1}=1,\ldots,i_{d}=1}{\varrho}(W^{k+1}_{i_{1}{\cdots} i_{d}}) = \boldsymbol{ {\varPhi}}_{\delta}({T}^{k+1}- {A} ), $$

where Φδ(⋅) is defined in (??). This eliminates W from \(\tilde L_{\tau }(\cdot )\). On the other hand, it follows from the definition of σk+ 1 (??) that

$$ \begin{array}{@{}rcl@{}} &&-{ \left\langle {Y}^{k+1} , [[\boldsymbol{ \sigma}^{k+1};\boldsymbol{U}^{k+1} ]] - {T}^{k+1} \right\rangle } + \frac{\tau}{2}{ \left\| [[\boldsymbol{ \sigma}^{k+1};\boldsymbol{U}^{k+1} ]] - {T}^{k+1} \right\|_{F} }^{2} \\ =&& { \left\langle {Y}^{k+1} , {T}^{k+1}\right\rangle } + \frac{\tau}{2}{ \left\| {T}^{k+1} \right\|_{F} }^{2} - \frac{1}{2\tau}{\sum}_{i=1}^{R}{ \left( \left( { {Y}^{k+1}+ \tau {T}^{k+1}}\right)\bigotimes_{j=1}^{d}u_{j,i}^{k+1} \right) }^{2}. \end{array} $$

Thus, σ is also eliminated. In what follows, whenever necessary, \({\sigma ^{k}_{i}} \) still stands for the expression \( ({Y}^{k}+\tau {T}^{k})\bigotimes _{j=1}^{d}u^{k}_{j,i}/\tau \), but we treat it merely as shorthand rather than as a variable.

Then, \(\tilde L_{\tau }(\boldsymbol { \sigma }^{k+1}, \boldsymbol {U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k+1}, {T}^{k})\) can be equivalently written as

$$ \begin{array}{@{}rcl@{}} &&\tilde L_{\tau}(\boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, {T}^{k} ) \\ =&& \frac{1}{2}\boldsymbol{ {\varPhi}}_{\delta}({T}^{k+1} - {A}) + { \left\langle {Y}^{k+1} , {T}^{k+1}\right\rangle } + \frac{\tau}{2}{ \left\| {T}^{k+1} \right\|_{F} }^{2}\\ &&~~~~~~~~ - \frac{1}{2\tau}{\sum}_{i=1}^{R}{ \left( \left( { {Y}^{k+1}+ \tau {T}^{k+1}}\right)\bigotimes_{j=1}^{d}u_{j,i}^{k+1} \right) }^{2} + \frac{2}{\tau}{ \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} }^{2}. \end{array} $$

In addition, we denote

$$ \begin{array}{@{}rcl@{}} &&\tilde L_{\tau,\alpha}(\boldsymbol{U}, {T}, {Y}, {T}^{\prime}) := \tilde L_{\tau}(\boldsymbol{U}, {T}, {Y}, {T}^{\prime}) - \frac{\alpha}{2}{\sum}^{d}_{j=1}{ \left\| U_{j} \right\|_{F} }^{2} \\ &&~~~~~~~~~~~~+ {\sum}^{d-t,R}_{j=1,i=1}\iota_{{ \text{st}(n_{j},1) } }(u_{j,i}) + {\sum}^{d}_{j=d-t+1}\iota_{{ \text{st}(n_{j},R) }}(U_{j}). \end{array} $$

We can see that under the constraints of the optimization problem (??), \(\tilde L_{\tau ,\alpha }(\cdot ) = \tilde L_{\tau }(\cdot ) -\frac {\alpha d R}{2}\): each of the dR columns involved has unit norm, so \({\sum }_{j=1}^{d}\|U_{j}\|_{F}^{2} = dR\), while the indicator functions vanish. This together with Theorem 4.1 tells us that the sequence \(\{\tilde L_{\tau ,\alpha }(\boldsymbol {U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, {T}^{k})\}\) is also bounded and nonincreasing. In addition, we have that \(\tilde L_{\tau ,\alpha }(\cdot )\) is a KL function.

Proposition A.1

\(\tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) \) defined above is a proper, l.s.c., and KL function.

Proof

It is clear that \(\tilde L_{\tau ,\alpha }(\cdot )\) is proper and l.s.c. Next, since the constraint sets in (??) are all Stiefel manifolds, items 2 and 6 of [4, Example 2] tell us that they are semi-algebraic sets, and their indicator functions are semi-algebraic functions. Therefore, the indicator functions are KL functions [4, Theorem 3]. On the other hand, the remaining part of \(\tilde L_{\tau ,\alpha }\) (besides the indicator functions) is an analytic function and hence is KL [4]. As a result, \(\tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) \) is a KL function. □

In the sequel, we mainly rely on \(\tilde L_{\tau ,\alpha }(\cdot )\) to prove the global convergence. For convenience, we denote

$$ \begin{array}{@{}rcl@{}} \tilde L_{\tau,\alpha}^{k+1,k} &:=& \tilde L_{\tau,\alpha}(\boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, {T}^{k}), ~\text{and}\\ \partial \tilde L_{\tau,\alpha}^{k+1,k}&:=& \partial \tilde L_{\tau,\alpha}(\boldsymbol{U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, {T}^{k}); \end{array} $$

denote \({ {\varDelta }_{\boldsymbol {U}, {T}}^{k+1,k} }:= (\boldsymbol {U}^{k+1} , {T}^{k+1}) - (\boldsymbol {U}^{k}, {T}^{k})\), and

$$ { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } := \sqrt{{\sum}^{d}_{j=1}{ \left\| { {\varDelta}_{U_j}^{k+1,k} } \right\|_{F} }^{2} + { \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} }^{2} }. $$

Lemma A.3

There exists a large enough constant c0 > 0, such that

$$ \text{dist}(\boldsymbol{0}, \partial { \tilde L_{\tau,\alpha}^{k+1,k} } ) \leq c_{0}{ \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} } \right) }. $$
(A.19)

Proof

We first consider \(\partial _{u_{j,i}} { \tilde L_{\tau ,\alpha }^{k+1,k} } \), 1 ≤ j ≤ d − t, 1 ≤ i ≤ R, and \(\partial _{U_{j}} { \tilde L_{\tau ,\alpha }^{k+1,k} } \), d − t + 1 ≤ j ≤ d, respectively. In what follows, we denote

$$ \overline{\mathbf{v}}^{k+1}_{j,i}\!:=\! \sigma^{k+1}_{i} { \left( {Y}^{k+1} + \tau {T}^{k+1} \right) }\bigotimes_{l\neq j}^{d}\mathbf{u}^{k+1}_{l,i} + \alpha\mathbf{u}^{k+1}_{j,i}, ~\text{and}~ \overline V^{k+1}_{j} \!:=\! [\bar{\mathbf{v}}^{k+1}_{j,1},\ldots,\bar{\mathbf{v}}^{k+1}_{j,R} ]. $$

We also recall \(\mathbf {v}_{j,i}^{k+1}:= ({Y}^{k}+ \tau {T}^{k}){\mathbf {u}_{1,i}^{k+1}\otimes \cdots \otimes \mathbf {u}_{j-1,i}^{k+1} \otimes \mathbf {u}_{j+1,i}^{k} \otimes \cdots \otimes \mathbf {u}_{d,i}^{k} }\) and \(\tilde {\mathbf {v}}_{j,i}^{k+1} = {\sigma ^{k}_{i}} \mathbf {v}^{k+1}_{j,i} + \alpha \mathbf {u}^{k}_{j,i}\) for later use. In addition, denote \(\tilde V^{k+1}_{j} := [\tilde {\mathbf {v}}^{k+1}_{j,1},\ldots ,\tilde {\mathbf {v}}^{k+1}_{j,R}]\).

For 1 ≤ j ≤ d − t, one has

$$ \begin{array}{@{}rcl@{}} \partial_{\mathbf{u}_{j,i}}{ \tilde L_{\tau,\alpha}^{k+1,k} } &=& -\sigma^{k+1}_{i} { \left( {Y}^{k+1}+\tau {T}^{k+1} \right) }\bigotimes_{l\neq j}^{d}\mathbf{u}^{k+1}_{l,i}- \alpha\mathbf{u}^{k+1}_{j,i} + \partial \iota_{{ \text{st}(n_{j},1) }}(\mathbf{u}^{k+1}_{j,i})\\ &=& - \overline{\mathbf{v}}^{k+1}_{j,i} + \partial \iota_{{ \text{st}(n_{j},1) }}(\mathbf{u}^{k+1}_{j,i}). \end{array} $$
(A.20)

We then wish to show that

$$ \tilde{\mathbf{v}}^{k+1}_{j,i} \in \hat \partial \iota_{{ \text{st}(n_{j},1) } }(\mathbf{u}^{k+1}_{j,i}) \subset \partial \iota_{{ \text{st}(n_{j},1) } }(u^{k+1}_{j,i}). $$
(A.21)

The proof is similar to that of [53, Lemma 6.1]. First, from the definition of \(\iota _{{ \text {st}(n_{j},1) }}(\cdot ) \) and \(\hat \partial \iota _{{ \text {st}(n_{j},1) }}(\cdot )\) in (A.18), it is not hard to see that if y∉st(nj, 1), then (A.18) clearly holds when \({z} = \tilde {\mathbf {v}}^{k+1}_{j,i}\); otherwise, if y ∈st(nj, 1), i.e., ∥y∥ = 1, then from the definition of \(\mathbf {u}^{k+1}_{j,i}\), we see that

$$ \mathbf{u}^{k+1}_{j,i} = \arg\underset{ \|\mathbf{y}\|=1 }{\max} { \left\langle \mathbf{y} , \tilde{\mathbf{v}}^{k+1}_{j,i} \right\rangle }\Leftrightarrow \langle \tilde{{v}}^{k+1}_{j,i},\mathbf{u}^{k+1}_{j,i}-\mathbf{y}\rangle \geq 0,~\forall \|\mathbf{y}\|=1, $$

which together with \(\iota _{{ \text {st}(n_{j},1) }}(\mathbf {y}) = 0\) and \(\iota _{{ \text {st}(n_{j},1) }}(u^{k+1}_{j,i})=0\) gives

$$ \underset{\mathbf{y} \neq \mathbf{u}^{k+1}_{j,i},\, \mathbf{y} \rightarrow \mathbf{u}^{k+1}_{j,i} }{\lim\inf} \frac{ \iota_{{ \text{st}(n_{j},1) } }(\mathbf{y}) -\iota_{{ \text{st}(n_{j},1) }}(\mathbf{u}^{k+1}_{j,i}) -\langle \tilde{\mathbf{v}}^{k+1}_{j,i}, \mathbf{y} -\mathbf{u}^{k+1}_{j,i} \rangle}{\|\mathbf{y}- \mathbf{u}^{k+1}_{j,i} \|}\geq 0. $$

As a result, (A.21) is true, which together with (A.20) shows that

$$ \tilde{\mathbf{v}}^{k+1}_{j,i} - \overline{\mathbf{v}}^{k+1}_{j,i} \in \partial_{\mathbf{u}_{j,i}} { \tilde L_{\tau,\alpha}^{k+1,k} } , ~1\leq j\leq d-t,~1\leq i\leq R. $$

Let 0 denote the origin. Then, by using the triangle inequality and the boundedness of {σk,Uk,Tk,Yk}, and noticing the definition of \({ {\varDelta }_{\boldsymbol {U}, {T}}^{k+1,k} }\), there must exist large enough constants c1,c2 > 0, depending only on τ, α, and the size of {σk,Uk,Tk,Yk}, such that

$$ \begin{array}{@{}rcl@{}} && \quad\text{dist}(\boldsymbol{0}, \partial_{\mathbf{u}_{j,i}} { \tilde L_{\tau,\alpha}^{k+1,k} } ) \\ && \leq{ \left\| \tilde{\mathbf{v}}^{k+1}_{j,i} - \overline{\mathbf{v}}^{k+1}_{j,i} \right\| } \\ &&\leq c_{1}{ \left( \sum\limits^{d}_{j=1}{ \left\| { {\varDelta}_{U_j}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{ {Y}}^{k+1,k} } \right\|_{F} } \right) }\\ &&\leq c_{1}{ \left( \sum\limits^{d}_{j=1}{ \left\| { {\varDelta}_{U_j}^{k+1,k} } \right\|_{F} } + 2{ \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{ {T}}^{k,k-1} } \right\|_{F} } \right) } \\ &&\leq c_{2} { \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} } \right) },~1\leq j\leq d-t. \end{array} $$
(A.22)

On the other hand, for d − t + 1 ≤ j ≤ d, by noticing the definition of \(\overline { V}^{k+1}_{j}\), we have

$$ \partial_{U_{j}} { \tilde L_{\tau,\alpha}^{k+1,k} } = - \overline{V}^{k+1}_{j} + \partial \iota_{{ \text{st}(n_{j},R) } }(U^{k+1}_{j}). $$

From the definition of \(U^{k+1}_{j}\) in (??) and similar to the above argument, we can show that \(\tilde V^{k+1}_{j} \in \partial \iota _{{ \text {st}(n_{j},R) }}(U^{k+1}_{j}). \) Thus,

$$ \tilde V^{k+1}_{j} - \overline V^{k+1}_{j} \in \partial_{U_{j}} { \tilde L_{\tau,\alpha}^{k+1,k} } ,~d-t+1\leq j\leq d. $$

Similar to (A.22), there exists a large enough constant c3 > 0 such that

$$ \text{dist}(\boldsymbol{0}, \partial_{U_{j}} { \tilde L_{\tau,\alpha}^{k+1,k} } ) \leq c_{3}{ \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} } \right) },~d-t+1\leq j\leq d. $$
(A.23)

We then consider

$$ \nabla_{ {T}} { \tilde L_{\tau,\alpha}^{k+1,k} } = {W}^{k+1}\circledast{ \left( {T}^{k+1} - {A} \right) } + {Y}^{k+1} - \tau{ \left( [[ \boldsymbol{ \sigma}^{k+1};\boldsymbol{U}^{k+1} ]] - {T}^{k+1} \right) } + \frac{4}{\tau}{ \left( {T}^{k+1} - {T}^{k} \right) }. $$

Note that Wk+ 1 and σk+ 1 above are only representations instead of variables, standing for (??) and (??). From the expression of Yk+ 1 in (A.1), we have

$$ \begin{array}{@{}rcl@{}} { \left\| W^{k+1}\circledast{ \left( {T}^{k+1} - {A} \right) } + {Y}^{k+1} \right\|_{F} } &=& { \left\| { \left( W^{k+1} - W^{k} \right) }\circledast{ \left( {T}^{k+1} - {A} \right) } \right\|_{F} }\\ &\leq&{ \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} }, \end{array} $$

where the inequality follows from Proposition 2.3. On the other side,

$$ \begin{array}{@{}rcl@{}} \tau{ \left\| { [[ \boldsymbol{ \sigma}^{k+1};\boldsymbol{U}^{k+1} ]] - {T}^{k+1} } \right\|_{F} } &=& \tau{ \left\| [[\boldsymbol{\sigma}^{k+1};\boldsymbol{U}^{k+1} ]] - [[\boldsymbol{\sigma}^{k};\boldsymbol{U}^{k+1} ]] + [[\boldsymbol{\sigma}^{k };\boldsymbol{U}^{k+1} ]] - {T}^{k+1} \right\|_{F} } \\ &\leq& \tau { \left\| [[\boldsymbol{\sigma}^{k+1};\boldsymbol{U}^{k+1} ]] - [[\boldsymbol{\sigma}^{k};\boldsymbol{U}^{k+1} ]] \right\|_{F} } + { \left\| { {\varDelta}_{ {Y}}^{k+1,k} } \right\|_{F} }\\ &\leq& c_{4}{ \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} } \right) }, \end{array} $$
(A.24)

where c4 > 0 is large enough. Combining the above pieces shows that there exists a large enough constant c5 > 0 such that

$$ { \left\| \nabla_{ {T}} { \tilde L_{\tau,\alpha}^{k+1,k} } \right\|_{F} } \leq c_{5}{ \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} } \right) }. $$
(A.25)

Next, it follows from (A.24) that

$$ { \left\| \nabla_{ {Y}}{ \tilde L_{\tau,\alpha}^{k+1,k} } \right\|_{F} } = { \left\| { [[ \boldsymbol{ \sigma}^{k+1};\boldsymbol{U}^{k+1} ]] - {T}^{k+1} } \right\|_{F} } \leq \frac{c_{4}}{\tau}{ \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} } \right) }. $$
(A.26)

Finally,

$$ { \left\| \nabla_{ {T}^{\prime}} { \tilde L_{\tau,\alpha}^{k+1,k} } \right\|_{F} } = \frac{4}{\tau}{ \left\| { {\varDelta}_{ {T}}^{k+1,k} } \right\|_{F} }. $$
(A.27)

Combining (A.22), (A.23), (A.25), (A.26), (A.27), we get that there exists a large enough constant c0 > 0 independent of k, such that

$$ \text{dist}(\boldsymbol{0}, \partial { \tilde L_{\tau,\alpha}^{k+1,k} } ) \leq c_{0}{ \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} } \right) }, $$

as desired. □

Now, we can present the proof concerning global convergence.

Proof of Theorem 4.2

We have mentioned that \(\{ { \tilde L_{\tau ,\alpha }^{k+1,k} } \}\) inherits the properties of \(\{\tilde L_{\tau }^{k+1,k} \}\), i.e., it is bounded, nonincreasing, and convergent. We denote its limit as \(\tilde L^{*}_{\tau ,\alpha } = \lim _{k\rightarrow \infty } \tilde L^{k+1,k}_{\tau ,\alpha } = \tilde L_{\tau ,\alpha }(\boldsymbol {U}^{*}, {T}^{*}, {Y}^{*}, {T}^{*})\), where {U∗,T∗,Y∗,T∗} is a limit point. According to Definition 2 and Proposition A.1, there exist an 𝜖0 > 0, a neighborhood N of {U∗,T∗,Y∗,T∗}, and a continuous and concave function \(\psi (\cdot ):[0,\epsilon _{0}) \rightarrow \mathbb {R}_{+}\) such that for all \(\{\boldsymbol {U}, {T}, {Y}, {T}^{\prime }\} \in {N}\) satisfying \(\tilde L_{\tau ,\alpha }^{*} < \tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) <\tilde L_{\tau ,\alpha }^{*} + \epsilon _{0}\), there holds

$$ \psi^{\prime}(\tilde L_{\tau,\alpha}(\boldsymbol{U}, {T}, {Y}, {T}^{\prime}) -\tilde L_{\tau,\alpha}^{*} )\,\text{dist}(0,\partial \tilde L_{\tau,\alpha}(\boldsymbol{U}, {T}, {Y}, {T}^{\prime})) \geq 1. $$
(A.28)

Let 𝜖1 > 0 be such that

$$ \begin{array}{@{}rcl@{}} \mathbb B_{\epsilon_{1}} &:=& \{ { \left( \boldsymbol{U}, {T}, {Y}, {T}^{\prime} \right) }\mid { \left\| U_{j}-U^{*}_{j} \right\|_{F} } < \epsilon_{1},~1\leq j\leq d,~{ \left\| {T}- {T}^{*} \right\|_{F} }< \epsilon_{1}, \\ &&{ \left\| {Y}- {Y}^{*} \right\|_{F} }<2 \epsilon_{1},~{ \left\| {T}^{\prime} - {T}^{*} \right\|_{F} } <2\epsilon_{1} \} \subset N, \end{array} $$

and let \(\mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}:= \{ { \left (\boldsymbol {U}, {T} \right ) }\mid { \left \| U_{j} -U^{*}_{j} \right \|_{F} } < \epsilon _{1},1\leq j\leq d,{ \left \| {T}- {T}^{*} \right \|_{F} }<\epsilon _{1} \}\). From the stationary point system (??) and the expression of Yk+ 1 in (A.1), we have

$$ \begin{array}{@{}rcl@{}} { \left\| {Y}^{k} - {Y}^{*} \right\|_{F} } &=& { \left\| W^{k-1}\circledast{ \left( {T}^{k} - {A} \right) } - {W}^{*}\circledast { \left( {T}^{*} - {A} \right) } \right\|_{F} } \\ &\leq& { \left\| W^{k-1}\circledast{ \left( {T}^{k} - {A} \right) } - W^{k}\circledast{ \left( {T}^{k}- {A} \right) } \right\|_{F} } \\ &&+{ \left\| {W}^{k}\circledast{ \left( {T}^{k}- {A} \right) } - {W}^{*}\circledast { \left( {T}^{*}- {A} \right) } \right\|_{F} } \\ &\leq& { \left\| { {\varDelta}_{ {T}}^{k,k-1} } \right\|_{F} } + { \left\| { {\varDelta}_{ {T}}^{k,*} } \right\|_{F} } \end{array} $$
(A.29)

where the last inequality follows from Propositions 2.3 and 2.2. On the other hand,

$$ { \left\| {T}^{k-1} - {T}^{*} \right\|_{F} } \leq { \left\| { {\varDelta}_{ {T}}^{k,k-1} } \right\|_{F} } + \left\| { {\varDelta}_{ {T}}^{k,*} } \right\|_{F} . $$
(A.30)

As Theorem 4.1 shows that there exists k0 > 0 such that \({ \left \| { {\varDelta }_{ {T}}^{k,k-1} } \right \|_{F} }<\epsilon _{1}\) for all k ≥ k0, (A.29) and (A.30) tell us that if k ≥ k0 and \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb {B}^{\boldsymbol {U}, {T} }_{\epsilon _{1} }\), then \(\{\boldsymbol {U}^{k}, {T}^{k}, {Y}^{k}, {T}^{k-1} \} \in \mathbb {B}_{\epsilon _{1} } \subset N\). Such k0 must exist as {U∗,T∗,Y∗,T∗} is a limit point. In addition, denote \(c_{1}:=\min \limits \{\alpha /2,1/\tau \}\); then, there exists k1 ≥ k0 such that \( \{ \boldsymbol {U}^{k_{1}}, {T}^{k_{1}} \} \in \mathbb {B}^{\boldsymbol {U}, {T} }_{\epsilon _{1}/2} \) and

$$ \begin{array}{ll} & \frac{c_{0}}{2\sqrt{c_{1}}c_{2}} { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1},k_{1}-1} } \right\|_{F} } < \frac{\epsilon_{1}}{16},~ \frac{c_{0}}{2\sqrt{c_{1}}c_{2}} { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1}-1,k_{1}-2} } \right\|_{F} } <\frac{\epsilon_{1}}{16},~ \\ &\frac{c_{2}}{2\sqrt{c_{1}}} \psi(\tilde L_{\tau,\alpha}^{k_{1},k_{1}-1} - L^{*}_{\tau,\alpha} ) < \frac{\epsilon_{1}}{4},~L^{*}_{\tau,\alpha} < \tilde L_{\tau,\alpha}^{k_{1},k_{1}-1} < L^{*}_{\tau,\alpha} + \epsilon_{0}, \end{array} $$
(A.31)

where c0 is the constant appearing in Lemma A.3, and c2 is a constant such that \(c_{2} > 16c_{0}/\sqrt {c_{1}}\).

In what follows, we use induction to show that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\) for all k > k1. Since ψ(⋅) in Definition 2 is concave, it holds for any k that

$$ \psi^{\prime}(\tilde L^{k,k-1}_{\tau,\alpha} - L^{*}_{\tau,\alpha} )\left( (\tilde L^{k,k-1}_{\tau,\alpha} - \tilde L^{*}_{\tau,\alpha}) - (\tilde L^{k+1,k}_{\tau,\alpha} - \tilde L^{*}_{\tau,\alpha} ) \right) \!\leq\! \psi(\tilde L^{k,k-1}_{\tau,\alpha} \!-\tilde L^{*}_{\tau,\alpha}) - \psi(\tilde L^{k+1,k}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha} ); $$
(A.32)

on the other side, from the previous paragraph we see that \(\{\boldsymbol {U}^{k_{1}}, {T}^{k_{1}}\}\in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}/2}\), \(\{ \boldsymbol {U}^{k_{1}}, {T}^{k_{1}}, {Y}^{k_{1}}, {T}^{k_{1}-1} \} \in \mathbb {B}_{\epsilon _{1}} \subset {N}\), and so (A.28) holds at \(\{\boldsymbol {U}^{k_{1}}, {T}^{k_{1}}, {Y}^{k_{1}}, {T}^{k_{1}-1} \}\). Recall \(c_{1}=\min \limits \{\alpha /2,1/\tau \}\). From Lemma A.2 and the relation between \(\tilde L_{\tau }\) and \(\tilde L_{\tau ,\alpha }\), we obtain

$$ \begin{array}{@{}rcl@{}} c_{1}{ \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1}+1,k_{1}} } \right\|_{F} }^{2} &\leq& \tilde L^{k_{1},k_{1}-1}_{\tau,\alpha} - \tilde L_{\tau,\alpha}^{k_{1}+1,k_{1}} \\ &\leq& \frac{\psi(\tilde L^{k_{1},k_{1}-1}_{\tau,\alpha} - \tilde L^{*}_{\tau,\alpha}) - \psi(\tilde L^{k_{1}+1,k_{1}}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha} )}{\psi^{\prime}(\tilde L^{k_{1},k_{1}-1}_{\tau,\alpha} - \tilde L^{*}_{\tau,\alpha} )} \\ &\leq& c_{2}\left( \psi(\tilde L^{k_{1},k_{1}-1}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha}) - \psi(\tilde L^{k_{1}+1,k_{1}}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha} ) \right) \cdot c_{2}^{-1}\text{dist}(0, \partial \tilde L^{k_{1},k_{1}-1}_{\tau,\alpha} ), \end{array} $$

where the second inequality is due to (A.32), while the last one comes from (A.28). Using \(\sqrt {ab}\leq \frac {a+b}{2}\) for a ≥ 0, b ≥ 0, invoking (A.19), and noticing the range in (A.31), we obtain

$$ \begin{array}{@{}rcl@{}} \sqrt{c_{1}} { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1}+1,k_{1}} } \right\|_{F} } &\leq& \frac{c_{2}}{2}\left( \psi(\tilde L^{k_{1},k_{1}-1}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha}) - \psi(\tilde L^{k_{1}+1,k_{1}}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha} ) \right)\\ &&~~~~~~~~~~~~~~~~~~~~ + \frac{c_{0} }{2c_{2}}{ \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1},k_{1}-1} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1}-1,k_{1}-2} } \right\|_{F} } \right) }\\ &<&\frac{ \sqrt{c_{1}}\epsilon_{1}}{4} + \frac{ \sqrt{c_{1}}\epsilon_{1}}{8} < \frac{\sqrt{c_{1}}\epsilon_{1}}{2}, \end{array} $$

and so

$$ { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1}+1,*} } \right\|_{F} }\leq { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1}+1,k_{1}} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1},*} } \right\|_{F} } < \frac{\epsilon_{1}}{2} + \frac{\epsilon_{1}}{2}=\epsilon_{1}, $$

namely, \( \{\boldsymbol {U}^{k_{1}+1}, {T}^{k_{1}+1}\} \in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}}\).

Now, assume that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}}\) for k = k1,…,K. This implies that (A.28) is true at {Uk,Tk,Yk,Tk− 1}, and similarly to the above analysis, we have

$$ \begin{array}{@{}rcl@{}} &&\sqrt{c_{1}} { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } \leq \frac{c_{2}}{2}\left( \psi(\tilde L^{k,k-1}_{\tau,\alpha} - \tilde L^{*}_{\tau,\alpha}) - \psi(\tilde L^{k+1,k}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha} ) \right) \\ &&~~~~~~~~~~+ \frac{c_{0}}{2c_{2}}\left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} }+ { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k-1,k-2} } \right\|_{F} } \right),~k=k_{1},\ldots,K. \end{array} $$
(A.33)

We then show that \(\{\boldsymbol {U}^{K+1}, {T}^{K+1}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\). Summing (A.33) for k = k1,…,K yields

$$ \begin{array}{@{}rcl@{}} &&\sqrt{c_{1}} {\sum}^{K}_{k=k_{1}}{ \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } \\ &\leq& \frac{c_{2}}{2}\left( \psi(\tilde L^{k_{1},k_{1}-1}_{\tau,\alpha} - \tilde L^{*}_{\tau,\alpha}) - \psi(\tilde L^{K+1,K}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha} ) \right) \\ && + \frac{c_{0}}{2c_{2}}{\sum}^{K}_{k=k_{1}} \left( { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k,k-1} } \right\|_{F} } + { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k-1,k-2} } \right\|_{F} } \right)\\ &\leq& \frac{c_{2}}{2}\left( \psi(\tilde L^{k_{1},k_{1}-1}_{\tau,\alpha} - \tilde L^{*}_{\tau,\alpha}) - \psi(\tilde L^{K+1,K}_{\tau,\alpha} -\tilde L^{*}_{\tau,\alpha} ) \right) \\ &&+ \frac{c_{0}}{c_{2}}\sum\limits^{K-1}_{k=k_{1}}{ \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } + \frac{2c_{0}}{c_{2}}{ \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1},k_{1}-1} } \right\|_{F} }+ \frac{c_{0}}{c_{2}}{ \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1}-1,k_{1}-2} } \right\|_{F} } . \end{array} $$
(A.34)

Rearranging the terms, noticing (A.31), and noticing that \(\frac {c_{0}}{c_{2}} < \frac {\sqrt {c_{1}}}{16}\), we have

$$ \frac{15\sqrt{c_{1}}}{16} \sum\limits^{K}_{k=k_{1}} { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} } \leq \frac{\sqrt{c_{1}}}{4}\epsilon_{1} + \frac{\sqrt c_{1}\epsilon_{1}}{16} + \frac{\sqrt c_{1}\epsilon_{1}}{16}, $$

and so

$$ \begin{array}{@{}rcl@{}} { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{K+1,*} } \right\|_{F} }&\leq& { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{K+1,k_{1}} } \right\|_{F} }+ { \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k_{1},*} } \right\|_{F} }\\ &<& \sum\limits^{K}_{k=k_{1}}{ \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} }+ \frac{\epsilon_{1}}{2}\\ &< & \frac{3\epsilon_{1}}{8} + \frac{\epsilon_{1}}{2} < \epsilon_{1}. \end{array} $$

Thus, induction implies that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\) for all k ≥ k1, i.e., {Uk,Tk,Yk,Tk− 1}∈ N for all k ≥ k1. As a result, (A.33) holds for all k ≥ k1, and so does (A.34). Therefore, letting \(K\rightarrow \infty \) in (A.34) yields

$$ {\sum}^{\infty}_{k=1} \left\| { {\varDelta}_{\boldsymbol{U}, {T}}^{k+1,k} } \right\|_{F} <+\infty, $$

which shows that {Uk,Tk} is a Cauchy sequence and hence converges. Since {U∗,T∗} in Theorem 4.1 is a limit point, the whole sequence converges to {U∗,T∗}. This completes the proof. □

Cite this article

Yang, Y., Feng, Y. Half-quadratic alternating direction method of multipliers for robust orthogonal tensor approximation. Adv Comput Math 49, 24 (2023). https://doi.org/10.1007/s10444-023-10014-6
