Abstract
Higher-order tensor canonical polyadic decomposition (CPD) with one or more columnwise orthonormal latent factor matrices has been well studied in recent years. However, most existing models penalize noise via the least squares loss, which can be sensitive to non-Gaussian noise or outliers and thus lead to biased estimates of the latent factors. In this paper, we derive a robust orthogonal tensor CPD model with the Cauchy loss, which is resistant to heavy-tailed noise, such as Cauchy noise, and to outliers. By exploiting the half-quadratic property of the model, we develop the so-called half-quadratic alternating direction method of multipliers (HQ-ADMM) to solve it. Each subproblem involved in HQ-ADMM admits a closed-form solution. Thanks to some nice properties of the Cauchy loss, we show that the whole sequence generated by the algorithm globally converges to a stationary point of the problem under consideration. Numerical experiments on synthetic and real data demonstrate the effectiveness of the proposed model and algorithm.
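As a concrete illustration of the robustness mechanism described above, the following sketch contrasts the Cauchy loss with the least squares loss and shows the associated half-quadratic weight that downweights outliers. The scale parameter `gamma` and the exact scaling of the loss are assumptions for illustration; the paper's precise formulation may differ.

```python
import numpy as np

def cauchy_loss(r, gamma=1.0):
    # Cauchy (Lorentzian) loss: grows logarithmically in |r|, so large
    # residuals (outliers) are penalized far less than by least squares.
    return 0.5 * gamma**2 * np.log1p((r / gamma) ** 2)

def hq_weight(r, gamma=1.0):
    # Half-quadratic weight (multiplicative form), w = rho'(r)/r:
    # close to 1 for small residuals, close to 0 for outliers.
    return 1.0 / (1.0 + (r / gamma) ** 2)

r = np.array([0.1, 1.0, 10.0, 100.0])
print(cauchy_loss(r))   # grows like log|r|, not like r**2 / 2
print(hq_weight(r))     # ~1 for inliers, ~0 for outliers
```

The weight function is exactly what reappears as the tensor W in the half-quadratic reformulation: fixing the weights turns the robust objective into a weighted least-squares problem.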
References
Anandkumar, A., Jain, P., Shi, Y., Niranjan, U. N.: Tensor vs. matrix methods: robust tensor decomposition under block sparse perturbations. In: Artificial Intelligence and Statistics, pp. 268–276 (2016)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1-2), 91–129 (2013)
Beaton, A., Tukey, J.: The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics 16(2), 147–185 (1974)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1-2), 459–494 (2014)
Chen, J., Saad, Y.: On the tensor SVD and the optimal low rank orthogonal approximation of tensors. SIAM. J. Matrix Anal. Appl. 30(4), 1709–1734 (2009)
Cheng, L., Wu, Y.C., Poor, H.V.: Probabilistic tensor canonical polyadic decomposition with orthogonal factors. IEEE Trans. Signal Process. 65(3), 663–676 (2016)
Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., Phan, H.A.: Tensor decompositions for signal processing applications: from two-way to multiway component analysis. IEEE Signal Process. Mag. 32(2), 145–163 (2015)
De Almeida, A.L.F., Kibangou, A.Y., Miron, S., Araújo, D.C.: Joint data and connection topology recovery in collaborative wireless sensor networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), pp. 5303–5307 (2013)
De Lathauwer, L.: Algebraic methods after prewhitening. In: Handbook of Blind Source Separation, pp. 155–177. Elsevier (2010)
De Lathauwer, L.: A short introduction to tensor-based methods for factor analysis and blind source separation. In: Proceedings of the IEEE International Symposium on Image and Signal Processing and Analysis (ISPA 2011), pp. 558–563 (2011)
De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000)
Ding, M., Huang, T.Z., Ma, T.H., Zhao, X.L., Yang, J.H.: Cauchy noise removal using group-based low-rank prior. Appl. Math. Comput. 372, 124971 (2020)
Feng, Y., Fan, J., Suykens, J.: A statistical learning approach to modal regression. J. Mach. Learn. Res. 21(2), 1–35 (2020)
Feng, Y., Huang, X., Shi, L., Yang, Y., Suykens, J.: Learning with the maximum correntropy criterion induced losses for regression. J. Mach. Learn. Res. 16, 993–1034 (2015)
Ganan, S., McClure, D.: Bayesian image analysis: an application to single photon emission tomography. Amer. Statist. Assoc, 12–18 (1985)
Goldfarb, D., Qin, Z.: Robust low-rank tensor recovery: models and algorithms. SIAM J. Matrix Anal. Appl. 35(1), 225–253 (2014)
Guan, N., Liu, T., Zhang, Y., Tao, D., Davis, L.S.: Truncated Cauchy non-negative matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell. 41(1), 246–259 (2017)
Guan, Y., Chu, D.: Numerical computation for orthogonal low-rank approximation of tensors. SIAM J. Matrix Anal. Appl. 40(3), 1047–1065 (2019)
He, R., Zheng, W.S., Hu, B.G.: Maximum correntropy criterion for robust face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1561–1576 (2010)
Hillar, C.J., Lim, L.H.: Most tensor problems are NP-hard. J. ACM 60(6), 45:1–45:39 (2013)
Holland, P., Welsch, R.: Robust regression using iteratively reweighted least-squares. Commun. Stat.-Theory Methods 6(9), 813–827 (1977)
Hong, D., Kolda, T.G., Duersch, J.A.: Generalized canonical polyadic tensor decomposition. SIAM Rev. 62(1), 133–163 (2020)
Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
Hu, S., Li, G.: Convergence rate analysis for the higher order power method in best rank one approximations of tensors. Numer. Math. 140(4), 993–1031 (2018)
Hu, S., Ye, K. (2019)
Huber, P.J.: Robust statistics, vol. 523. Wiley, New York (2004)
Kim, G., Cho, J., Kang, M.: Cauchy noise removal by weighted nuclear norm minimization. J. Sci. Comput. 83, 15 (2020)
Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009)
Kovnatsky, A., Glashoff, K., Bronstein, M.M.: MADMM: a generic algorithm for non-smooth optimization on manifolds. In: European Conference on Computer Vision, pp. 680–696. Springer (2016)
Li, G., Liu, T., Pong, T.K.: Peaceman–Rachford splitting for a class of nonconvex optimization problems. Comput. Optim. Appl. 68(2), 407–436 (2017)
Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)
Li, G., Pong, T.K.: Douglas–Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159(1-2), 371–401 (2016)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18(5), 1199–1232 (2018)
Li, J., Usevich, K., Comon, P.: Globally convergent Jacobi-type algorithms for simultaneous orthogonal symmetric tensor diagonalization. SIAM J. Matrix Anal. Appl. 39(1), 1–22 (2018)
Li, J., Zhang, S.: Polar decomposition based algorithms on the product of Stiefel manifolds with applications in tensor approximation. arXiv:1912.10390 (2019)
Li, X., Lu, Q., Dong, Y., Tao, D.: Robust subspace clustering by cauchy loss function. IEEE Trans. Neural Netw. Learn. Syst. 30(7), 2067–2078 (2018)
Liu, H., So, A.M.C., Wu, W.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Math. Program. 178(1), 215–262 (2019)
Maronna, R., Bustos, O., Yohai, V.: Bias-and efficiency-robustness of general M-estimators for regression with random carriers. In: Smoothing Techniques for Curve Estimation, pp. 91–116. Springer (1979)
Mei, J.J., Dong, Y., Huang, T.Z., Yin, W.: Cauchy noise removal by nonconvex ADMM with convergence guarantees. J. Sci. Comput. 74(2), 743–766 (2018)
Pan, J., Ng, M.K.: Symmetric orthogonal approximation to symmetric tensors with applications to image reconstruction. Numer. Linear Algebra Appl. 25(5), e2180 (2018)
Pravdova, V., Estienne, F., Walczak, B., Massart, D.L.: A robust version of the Tucker3 model. Chemometr. Intell. Lab. Syst. 59(1), 75–88 (2001)
Savas, B., Lim, L.H.: Quasi-Newton methods on Grassmannians and multilinear approximations of tensors. SIAM J. Sci. Comput. 32(6), 3352–3393 (2010)
Sciacchitano, F., Dong, Y., Zeng, T.: Variational approach for restoring blurred images with Cauchy noise. SIAM J. Imag. Sci. 8(3), 1894–1922 (2015)
Shashua, A., Levin, A.: Linear image coding for regression and classification using the tensor-rank principle. In: CVPR, vol. 1, pp. I–I. IEEE (2001)
Sidiropoulos, N.D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E.E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65(13), 3551–3582 (2017)
Sidiropoulos, N.D., Giannakis, G.B., Bro, R.: Blind PARAFAC receivers for DS-CDMA systems. IEEE Trans. Signal Process. 48(3), 810–823 (2000)
Signoretto, M., Dinh, Q.T., De Lathauwer, L., Suykens, J.A.K.: Learning with tensors: a framework based on convex optimization and spectral regularization. Mach. Learn. 94(3), 303–351 (2014)
Sørensen, M., De Lathauwer, L., Comon, P., Icart, S., Deneire, L.: Canonical polyadic decomposition with a columnwise orthonormal factor matrix. SIAM J. Matrix Anal. Appl. 33(4), 1190–1213 (2012)
Sørensen, M., De Lathauwer, L., Deneire, L.: PARAFAC with orthogonality in one mode and applications in DS-CDMA systems. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2010), pp. 4142–4145 (2010)
Vervliet, N., Debals, O., Sorber, L., Van Barel, M., De Lathauwer, L.: Tensorlab 3.0. http://www.tensorlab.net. Available online (2016)
Wang, L., Chu, M.T., Yu, B.: Orthogonal low rank tensor approximation: alternating least squares method and its global convergence. SIAM J. Matrix Anal. Appl. 36(1), 1–19 (2015)
Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)
Yang, Y.: The epsilon-alternating least squares for orthogonal low-rank tensor approximation and its global convergence. SIAM J. Matrix Anal. Appl. 41(4), 1797–1825 (2020)
Yang, Y., Feng, Y., Suykens, J.A.K.: Robust low-rank tensor recovery with regularized redescending m-estimator. IEEE Trans. Neural Netw. Learn. Syst. 27(9), 1933–1946 (2015)
Ye, K., Hu, S.: When geometry meets optimization theory: partially orthogonal tensors. arXiv:2201.04824 (2022)
Yu, P., Li, G., Pong, T.K.: Kurdyka–Łojasiewicz exponent via inf-projection. Found. Comput. Math. 1–47 (2021)
Acknowledgements
We thank the editor and the anonymous reviewers for their insightful comments and suggestions that helped improve this manuscript.
Funding
The first author was supported by the National Natural Science Foundation of China Grants 11801100 and 12171105, and the Fok Ying Tong Education Foundation Grant 171094. The second author was supported by the Simons Foundation Collaboration Grant 572064.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by: Guoyin Li
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Mathematics of Computation and Optimisation Guest Editors: Jerome Droniou, Andrew Eberhard, Guoyin Li, Russell Luke, Thanh Tran
Appendix
1.1 Proof of Theorem 4.1
To prove the convergence of a nonconvex ADMM, a key step is to upper bound the successive difference of the dual variables by that of the primal variables. Unlike the nonconvex ADMMs in the literature, for HQ-ADMM the weight W^k creates obstacles in estimating this upper bound. Fortunately, this can be overcome by exploiting the relations between W^k, T^k, and T^{k−1} based on Proposition 2.3, which is given in Lemma A.1. With the upper bound at hand, we derive the decreasing inequality with respect to \(\{\tilde {L}_{\tau }^{k+1,k} \}\) (Lemma A.2), whose verification is similar to that of a nonconvex block coordinate descent. Then, the boundedness of the variables is established in Theorem A.1. Key to these two results is setting the parameter \(\tau \geq \sqrt { 10}\). Combining the above pieces, the subsequential convergence is proved at the end of this subsection by a standard argument.
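Before the formal proof, the half-quadratic mechanism that HQ-ADMM exploits can be illustrated on a hypothetical scalar analogue (this is not the paper's algorithm, only a one-variable caricature): fixing the Cauchy weights turns the robust objective into a weighted least-squares problem whose minimizer is available in closed form, mirroring how each HQ-ADMM subproblem admits a closed-form solution.

```python
import numpy as np

# Scalar caricature of the half-quadratic idea: minimize
# sum_i rho(t - a_i) with the Cauchy loss rho. Each iteration fixes
# weights w_i = 1/(1 + r_i^2/gamma^2) and then solves the resulting
# weighted least-squares problem exactly.
def hq_location(a, gamma=1.0, iters=50):
    t = np.median(a)                      # robust initial guess
    for _ in range(iters):
        r = t - a
        w = 1.0 / (1.0 + (r / gamma) ** 2)
        t = np.sum(w * a) / np.sum(w)     # closed-form weighted LS step
    return t

rng = np.random.default_rng(0)
a = np.concatenate([rng.normal(2.0, 0.1, 100), [80.0, -40.0]])  # 2 outliers
print(hq_location(a))   # stays close to the inlier center 2.0
print(np.mean(a))       # plain least-squares estimate, dragged by outliers
```

Each fixed-weight step decreases the Cauchy objective, which is the same monotonicity pattern the lemmas below establish for the full tensor problem.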
Lemma A.1
It holds that
Proof
From (??), we have
which together with the definition of Y^{k+1} yields
Therefore, we have
Now, denote \( E_{1}:= { \left \| W^{k}\circledast { \left ({T}^{k+1}- {T}^{k} \right ) } \right \|_{F} } \) and \(E_{2}:={ \left \| (W^{k} - W^{k-1})\circledast { \left ({T}^{k}- {A} \right ) } \right \|_{F} } \). We first consider \(E_1\). From the definition of W^k, we easily see that \( W^{k}_{i_{1}{\cdots } i_{d}}\leq 1\) for each \(i_1,\ldots,i_d\). Therefore,
Next, we focus on \(E_2\). To simplify notation, we denote \(a_{i_{1}{\cdots } i_{d}}^{k}:= {T}^{k}_{i_{1}{\cdots } i_{d}} - {A}_{i_{1}{\cdots } i_{d}}\) and
Then, \(E_2\) can be expressed as
It follows from Proposition 2.3 that
and so
Combining (A.34) with (A.35) and (A.36) yields the desired result. □
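The bound on E1 above ultimately rests on every entry of W^k lying in (0, 1], so the Hadamard product with W^k is norm-nonexpansive: ||W ∘ X||_F ≤ ||X||_F. A quick numeric sanity check of this fact (illustrative only, with randomly generated weights of the Cauchy form):

```python
import numpy as np

# Entrywise weights in (0, 1] can only shrink the Frobenius norm
# under the Hadamard product: ||W ∘ X||_F <= ||X||_F.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5, 3))                    # random 3rd-order tensor
W = 1.0 / (1.0 + rng.standard_normal(X.shape) ** 2)   # entries in (0, 1]
lhs = np.linalg.norm(W * X)   # Frobenius norm of the weighted tensor
rhs = np.linalg.norm(X)
print(lhs <= rhs)   # True
```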
With Lemma A.1, we then establish a decreasing inequality with respect to \(\{\tilde {L}_{\tau }^{k+1,k} \}\) defined in (??):
Key to the validity of the decreasing inequality is setting \(\tau \geq \sqrt {10}\).
Lemma A.2
Let the parameter τ satisfy \( \tau \geq \sqrt {10}\). Then, there holds
where α > 0 is defined in (??) and (??).
Proof
We first consider the decrease caused by \(U_j\). When 1 ≤ j ≤ d − t, by the algorithm, the expression of Lτ(⋅), the fact that \({ \left \| u^{k}_{j,i} \right \| }=1\), and the definitions of \(u^{k+1}_{j,i}\), \(\mathbf {v}^{k+1}_{j,i}\), and \(\tilde {\mathbf {v}}^{k+1}_{j,i}\), we have
where the fourth equality follows from the definition of \(u^{k+1}_{j,i}\) and \(\tilde {\mathbf {v}}^{k+1}_{j,i}\), and the inequality is due to \({ \left \| \mathbf {v} \right \| }\geq {\left \langle \mathbf {v} , u\right \rangle }\) for any vectors u,v of the same size with \({ \left \| u \right \| }=1\).
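The inequality ∥v∥ ≥ ⟨v, u⟩ invoked above also explains why normalizing v gives the maximizing unit vector in the closed-form u-update. A small numeric illustration (not the paper's code):

```python
import numpy as np

# For any unit vector u, <v, u> <= ||v|| (Cauchy–Schwarz), with
# equality exactly at u = v/||v||. This is why the closed-form update
# u^{k+1} = v/||v|| maximizes the linear term in the u-subproblem.
rng = np.random.default_rng(2)
v = rng.standard_normal(6)
u_opt = v / np.linalg.norm(v)          # the normalizing update
u_rand = rng.standard_normal(6)
u_rand /= np.linalg.norm(u_rand)       # an arbitrary unit vector
print(np.dot(v, u_opt))    # equals ||v||
print(np.dot(v, u_rand))   # no larger than ||v||
```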
The decrease for \(U_j\) when d − t + 1 ≤ j ≤ d is similar. From the definition of \(V^{k+1}_{j}\), it holds that
where the inequality follows from the definition of \(U^{k+1}_{j}\) in (??). To show the decrease of T, note that Lτ(⋅) is strongly convex with respect to T, from which we easily deduce that
Next, it follows from the definition of Y^{k+1} and Lemma A.1 that
Finally, it follows from the definitions of σ^{k+1} and W^{k+1} that
As a result, summing up (A.37)–(A.42) yields
where the last inequality follows from the range of τ. Rearranging the terms of (A.43) gives the desired result. This completes the proof. □
We then show that \(\tilde L_{\tau }^{k,k-1}\) defined in Lemma A.2 is lower bounded and that the sequence {σ^k, U^k, T^k, Y^k, W^k} is bounded as well.
Theorem A.1
Under the setting of Lemma A.2, \(\{\tilde L_{\tau }^{k,k-1}\}\) is bounded. The sequence {σ^k, U^k, T^k, Y^k, W^k} generated by Algorithm 1 is bounded as well.
Proof
Denote \(Q^{k}(\cdot ) := \frac {1}{2}{ \left \| \sqrt { W^{k}}\circledast { \left (\cdot - {A} \right ) } \right \|_{F} }^{2} \); thus, we have \(\nabla Q^{k}({T}) = W^{k}\circledast { \left ({T}- {A} \right ) }\). It then follows from the quadraticity of Q^k(⋅) and \( {Y}^{k} = - W^{k-1}\circledast { \left ({T}^{k}- {A} \right ) }\) from (A.33) that
where the last inequality uses the fact that \(0< W^{k-1}_{i_{1}{\cdots } i_{d}} \leq 1\). It thus follows that for any k ≥ 2,
where the first inequality follows from the proof of Lemma A.2 (summing up (A.37)–(A.41)), the second comes from (A.44), and the last is due to the range of τ and ϱ(⋅) ≥ 0. Thus, \(\{ \tilde L_{\tau }^{k,k-1} \}\) is lower bounded; together with Lemma A.2, this shows that \(\{ \tilde L_{\tau }^{k,k-1} \}\) is bounded. We then show the boundedness of {σ^k, U^k, T^k, Y^k, W^k}. The boundedness of {U^k} and {W^k} is obvious. Next, denote by g(σ^k) the formulation in lines 5–6 of (A.45) with respect to σ^k. Proposition 2.1 shows that \(\bigotimes _{j=1}^{d}u^{k}_{j,i}\) is orthonormal and hence \({ \| [[ \boldsymbol { \sigma }^{k}; \boldsymbol {U}^{k}]] - {T}^{k} \|_{F} }^{2}\) is strongly convex with respect to σ^k; this together with the convexity of Q^{k−1}([[σ^k; U^k]]) shows that g(σ^k) is strongly convex with respect to σ^k. Combining this with (A.45) gives the boundedness of {σ^k}. Similarly, {T^k} is bounded. Finally, the boundedness of {Y^k} follows from the expression of the T-subproblem (??). As a result, {σ^k, U^k, T^k, Y^k, W^k} is a bounded sequence. This completes the proof. □
Proof of Theorem 4.1
Lemma A.2 in connection with Theorem A.1 yields points 1 and 2 and (??); (??) together with Lemma A.1 and the definitions of Y^{k+1}, σ^{k+1}, and W^{k+1} gives (??). On the other hand, since the sequence is bounded, limit points exist. Assume that {σ^*, U^*, T^*, Y^*, W^*} is a limit point with
(??) and (??) then imply that
Therefore, taking the limit in l with respect to the \(u_{j,i}\)-subproblem (??) yields
Multiplying both sides by \(u^{*}_{j,i}\) gives
where the second equality follows from the definition of \(\mathbf{v}_{j,i}\) and the last one is obtained by passing the limit into the expression of \(\sigma ^{k_{l}+1}_{i}\) (??). Thus, (A.46) together with (A.47) gives
i.e., the first equation of the stationary point system (??).
Taking the limit in l with respect to the \(U_j\)-subproblem (??) and noticing the expression (??), we get
where \(H^{*}_{j}\) is a symmetric matrix. Writing it columnwise, we obtain
Denoting \({\Lambda }^{*}_{j}:= H^{*}_{j} - \alpha I\), the above is exactly the third equality of (??). On the other hand, passing the limit into the expressions of T^k (??) and W^k (??), respectively, gives the T^*- and W^*-formulas in (??). Finally, the first expression of (??) yields T^* = [[σ^*; U^*]]. Putting the above pieces together, {σ^*, U^*, T^*, Y^*, W^*} satisfies the stationary point system (??).
Next, we show that {σ^*, U^*} is also a stationary point of problem (??). We define its Lagrangian function as \(L_{\boldsymbol { {\varPhi }}} := \boldsymbol { {\varPhi }}_{\delta }(\boldsymbol { \sigma }, \boldsymbol {U}) - {\sum }_{j,i=1}^{d-t,R} \eta _{j,i}{ \left (u_{j,i}^{\top } u_{j,i} -1 \right ) } - {\sum }^{d}_{j=d-t+1}{ \left \langle {\Lambda }_{j} , U_{j}^{\top } U_{j} - I\right \rangle }\), similar to that in (??). Taking derivatives yields
1.2 Proof of Theorem 4.2
To prove Theorem 4.2, we first recall some definitions from nonsmooth analysis. Denote \(\text {dom}f:=\{x\in \mathbb {R}^{n}\mid f(\mathbf {x})<+\infty \}\).
Definition 1 (cf. [2])
For x ∈domf, the Fréchet subdifferential, denoted as \(\hat \partial f(\mathbf {x})\), is the set of vectors \(z\in \mathbb R^{n}\) satisfying
The subdifferential of f at x ∈domf, written ∂f, is defined as
It is known that \(\hat \partial f(\mathbf {x})\subset \partial f(\mathbf {x})\) for each \(x\in \mathbb R^{n}\) [4]. An extended-real-valued function is a function \(f:\mathbb {R}^{n}\rightarrow [-\infty ,\infty ]\); it is proper if \(f(\mathbf {x})>-\infty \) for all x and \(f(x)<\infty \) for at least one x, and it is called closed if it is lower semi-continuous (l.s.c. for short). The global convergence relies on the Kurdyka–Łojasiewicz (KL) property given as follows:
Definition 2 (KL property and KL function, cf. [2, 4])
A proper function f is said to have the KL property at \(\overline {x}\in \text {dom}\partial f :=\{x\in \mathbb R^{n}\mid \partial f(x)\neq \emptyset \}\), if there exist \(\bar \epsilon \in (0,\infty ]\), a neighborhood N of \(\overline {x}\), and a continuous and concave function \(\psi : [0,\bar \epsilon ) \rightarrow \mathbb R_{+}\) which is continuously differentiable on \((0,\bar \epsilon )\) with positive derivatives and ψ(0) = 0, such that for all x ∈ N satisfying \(f(\overline {x}) <f({x}) < f(\overline {x}) + \bar \epsilon \), it holds that
where dist(0, ∂f(x)) denotes the distance from the origin to the set ∂f(x). If a proper and l.s.c. function f satisfies the KL property at each point of dom ∂f, then f is called a KL function.
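A standard textbook example of Definition 2 (not taken from this paper): f(x) = x² satisfies the KL inequality at x̄ = 0 with the desingularizing function ψ(s) = 2√s, since the product ψ′(f(x) − f(x̄)) · dist(0, ∂f(x)) is identically 2 ≥ 1 for every x ≠ 0. A quick numeric verification:

```python
import numpy as np

# KL inequality for f(x) = x^2 at xbar = 0 with psi(s) = 2*sqrt(s):
# psi'(s) = 1/sqrt(s) and dist(0, ∂f(x)) = |2x|, so
# psi'(f(x) - f(xbar)) * |2x| = (1/|x|) * 2|x| = 2 >= 1 for all x != 0.
def kl_product(x):
    f = x**2                       # f(x) - f(xbar), since f(xbar) = 0
    grad = 2.0 * abs(x)            # dist(0, ∂f(x)) = |f'(x)|
    psi_prime = 1.0 / np.sqrt(f)   # psi'(s) for psi(s) = 2*sqrt(s)
    return psi_prime * grad

for x in [1e-3, 0.1, 0.5]:
    print(kl_product(x))   # always 2.0
```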
We then simplify \(\tilde L_{\tau }(\cdot )\) by eliminating the variables W and σ. First, from the definition of W^{k+1} and Lemma 2.1, we have that
where Φδ(⋅) is defined in (??). This eliminates W from \(\tilde L_{\tau }(\cdot )\). On the other hand, it follows from the definition of σ^{k+1} (??) that
Thus, σ is also eliminated. In what follows, whenever necessary, \({\sigma ^{k}_{i}} \) still represents the expression \( ({Y}^{k}+\tau {T}^{k})\bigotimes _{j=1}^{d}u^{k}_{j,i}/\tau \), but we only treat it as a representation instead of a variable.
Then, \(\tilde L_{\tau }(\boldsymbol { \sigma }^{k+1}, \boldsymbol {U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k+1}, {T}^{k})\) can be equivalently written as
In addition, we denote
We can see that under the constraints of the optimization problem (??), \(\tilde L_{\tau ,\alpha }(\cdot ) = \tilde L_{\tau }(\cdot ) -\frac {\alpha d R}{2}\). This together with Theorem 4.1 tells us that the sequence \(\{\tilde L_{\tau ,\alpha }(\boldsymbol {U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, {T}^{k})\}\) is also bounded and nonincreasing. In addition, we have that \(\tilde L_{\tau ,\alpha }(\cdot )\) is a KL function.
Proposition A.1
\(\tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) \) defined above is a proper, l.s.c., and KL function.
Proof
It is clear that \(\tilde L_{\tau ,\alpha }(\cdot )\) is proper and l.s.c. Next, since the constraint sets in (??) are all Stiefel manifolds, items 2 and 6 of [4, Example 2] tell us that they are semi-algebraic sets and their indicator functions are semi-algebraic functions. Therefore, the indicator functions are KL functions [4, Theorem 3]. On the other hand, the remaining part of \(\tilde L_{\tau ,\alpha }\) (besides the indicator functions) is an analytic function and hence is KL [4]. As a result, \(\tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) \) is a KL function. □
In the sequel, we mainly rely on \(\tilde L_{\tau ,\alpha }(\cdot )\) to prove the global convergence. For convenience, we denote
denote \({ {\varDelta }_{\boldsymbol {U}, {T}}^{k+1,k} }:= (\boldsymbol {U}^{k+1} , {T}^{k+1}) - (\boldsymbol {U}^{k}, {T}^{k})\), and
Lemma A.3
There exists a large enough constant c0 > 0, such that
Proof
We first consider \(\partial _{u_{j,i}} { \tilde L_{\tau ,\alpha }^{k+1,k} } \), 1 ≤ j ≤ d − t, 1 ≤ i ≤ R, and \(\partial _{U_{j}} { \tilde L_{\tau ,\alpha }^{k+1,k} } \), d − t + 1 ≤ j ≤ d, respectively. In what follows, we denote
We also recall \(\mathbf {v}_{j,i}^{k+1}:= ({Y}^{k}+ \tau {T}^{k}){\mathbf {u}_{1,i}^{k+1}\otimes \cdots \otimes \mathbf {u}_{j-1,i}^{k+1} \otimes \mathbf {u}_{j+1,i}^{k} \otimes \cdots \otimes \mathbf {u}_{d,i}^{k} }\) and \(\tilde {\mathbf {v}}_{j,i}^{k+1} = {\sigma ^{k}_{i}} \mathbf {v}^{k+1}_{j,i} + \alpha \mathbf {u}^{k}_{j,i}\) for later use. In addition, denote \(\tilde V^{k+1}_{j} := [\tilde {\mathbf {v}}^{k+1}_{j,1},\ldots ,\tilde {\mathbf {v}}^{k+1}_{j,R}]\).
For 1 ≤ j ≤ d − t, one has
we then wish to show that
The proof is similar to that of [53, Lemma 6.1]. First, from the definitions of \(\iota _{{ \text {st}(n_{j},1) }}(\cdot ) \) and \(\hat \partial \iota _{{ \text {st}(n_{j},1) }}(\cdot )\) in (A.50), it is not hard to see that if y ∉ st(n_j, 1), then (A.50) clearly holds when \({z} = \tilde {\mathbf {v}}^{k+1}_{j,i}\); otherwise, if y ∈ st(n_j, 1), i.e., ∥y∥ = 1, then from the definition of \(\mathbf {u}^{k+1}_{j,i}\), we see that
which together with \(\iota _{{ \text {st}(n_{j},1) }}(\mathbf {y}) = 0\) and \(\iota _{{ \text {st}(n_{j},1) }}(u^{k+1}_{j,i})=0\) gives
As a result, (A.21) is true, which together with (A.20) shows that
Let 0 denote the origin. Then, by using the triangle inequality and the boundedness of {σ^k, U^k, T^k, Y^k}, and noticing the definition of \({ {\varDelta }_{\boldsymbol {U}, {T}}^{k+1,k} }\), there must exist large enough constants c1, c2 > 0, depending only on τ, α, and the size of {σ^k, U^k, T^k, Y^k}, such that
On the other hand, for d − t + 1 ≤ j ≤ d, by noticing the definition of \(\overline { V}^{k+1}_{j}\), we have
From the definition of \(U^{k+1}_{j}\) in (??) and similar to the above argument, we can show that \(\tilde V^{k+1}_{j} \in \partial \iota _{{ \text {st}(n_{j},R) }}(U^{k+1}_{j}). \) Thus,
Similar to (A.22), there exists a large enough constant c3 > 0 such that
We then consider
Note that W^{k+1} and σ^{k+1} above are only representations instead of variables; they represent (??) and (??). From the expression of Y^{k+1} in (A.33), we have
where the inequality follows from Proposition 2.3. On the other hand,
where c4 > 0 is large enough. Combining the above pieces shows that there exists a large enough constant c5 > 0 such that
Next, it follows from (A.24) that
Finally,
Combining (A.22), (A.23), (A.25), (A.26), (A.27), we get that there exists a large enough constant c0 > 0 independent of k, such that
as desired. □
Now, we can present the proof concerning global convergence.
Proof of Theorem 4.2
We have mentioned that \(\{ { \tilde L_{\tau ,\alpha }^{k+1,k} } \}\) inherits the properties of \(\{\tilde L_{\tau }^{k+1,k} \}\); i.e., it is bounded, nonincreasing, and convergent. We denote its limit by \(\tilde L^{*}_{\tau ,\alpha } = \lim _{k\rightarrow \infty } \tilde L^{k+1,k}_{\tau ,\alpha } = \tilde L_{\tau ,\alpha }(\boldsymbol {U}^{*}, {T}^{*}, {Y}^{*}, {T}^{*})\), where {U^*, T^*, Y^*, T^*} is a limit point. According to Definition 2 and Proposition A.1, there exist an \(\epsilon_0 > 0\), a neighborhood N of {U^*, T^*, Y^*, T^*}, and a continuous and concave function \(\psi (\cdot ):[0,\epsilon _{0}) \rightarrow \mathbb {R}_{+}\) such that for all \(\{\boldsymbol {U}, {T}, {Y}, {T}^{\prime }\} \in {N}\) satisfying \(\tilde L_{\tau ,\alpha }^{*} < \tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) <\tilde L_{\tau ,\alpha }^{*} + \epsilon _{0}\), there holds
Let 𝜖1 > 0 be such that
and let \(\mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}:= \{ { \left (\boldsymbol {U}, {T} \right ) }\mid { \left \| U_{j} -U^{*}_{j} \right \|_{F} } < \epsilon _{1},1\leq j\leq d,{ \left \| {T}- {T}^{*} \right \|_{F} }<\epsilon _{1} \}\). From the stationary point system (??) and the expression of Y^{k+1} in (A.33), we have
where the last inequality follows from Propositions 2.3 and 2.2. On the other hand,
Since Theorem 4.1 shows that there exists \(k_0 > 0\) such that \({ \left \| { {\varDelta }_{ {T}}^{k,k-1} } \right \|_{F} }<\epsilon _{1}\) for k ≥ k0, (A.29) and (A.30) tell us that if k ≥ k0 and \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb {B}^{\boldsymbol {U}, {T} }_{\epsilon _{1} }\), then \(\{\boldsymbol {U}^{k}, {T}^{k}, {Y}^{k}, {T}^{k-1} \} \in \mathbb {B}_{\epsilon _{1} } \subset N\). Such a k0 must exist since {U^*, T^*, Y^*, T^*} is a limit point. In addition, denote \(c_{1}:=\min \{\alpha /2,1/\tau \}\); then, there exists k1 ≥ k0 such that \( \{ \boldsymbol {U}^{k_{1}}, {T}^{k_{1}} \} \in \mathbb {B}^{\boldsymbol {U}, {T} }_{\epsilon _{1}/2} \) and
where c0 is the constant appearing in Lemma A.3, and c2 is a constant such that \(c_{2} > 16c_{0}/\sqrt {c_{1}}\).
In what follows, we use induction to show that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\) for all k > k1. Since ψ(⋅) in Definition 2 is concave, it holds for any k that
On the other hand, from the previous paragraph we see that \(\{\boldsymbol {U}^{k_{1}}, {T}^{k_{1}}\}\in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}/2}\) and \(\{ \boldsymbol {U}^{k_{1}}, {T}^{k_{1}}, {Y}^{k_{1}}, {T}^{k_{1}-1} \} \in \mathbb {B}_{\epsilon _{1}} \subset {N}\), and so (A.28) holds at \(\{\boldsymbol {U}^{k_{1}}, {T}^{k_{1}}, {Y}^{k_{1}}, {T}^{k_{1}-1} \}\). Recall \(c_{1}=\min \{\alpha /2,1/\tau \}\). From Lemma A.2 and the relation between \(\tilde L_{\tau }\) and \(\tilde L_{\tau ,\alpha }\), we obtain
where the second inequality is due to (A.32), while the last one comes from (A.28). Using \(\sqrt {ab}\leq \frac {a+b}{2}\) for a ≥ 0, b ≥ 0, invoking (A.19), and noticing the range in (A.31), we obtain
and so
namely, \( \{\boldsymbol {U}^{k_{1}+1}, {T}^{k_{1}+1}\} \in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}}\).
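The base case above leans on two elementary inequalities: the gradient inequality ψ(a) − ψ(b) ≥ ψ′(a)(a − b) for a concave ψ, and the AM-GM bound √(ab) ≤ (a + b)/2. A numeric spot check with the illustrative choice ψ(s) = √s (the actual desingularizing function in the proof is abstract):

```python
import numpy as np

# Gradient inequality for a concave function: psi(a) - psi(b)
# >= psi'(a) * (a - b), checked here for psi(s) = sqrt(s), together
# with the AM-GM bound sqrt(a*b) <= (a + b)/2 used in the same step.
def concave_gap(a, b):
    psi = np.sqrt
    psi_prime = lambda s: 0.5 / np.sqrt(s)
    return (psi(a) - psi(b)) - psi_prime(a) * (a - b)   # should be >= 0

rng = np.random.default_rng(3)
for _ in range(1000):
    a, b = rng.uniform(0.01, 10.0, 2)
    assert concave_gap(a, b) >= -1e-12            # concavity bound holds
    assert np.sqrt(a * b) <= (a + b) / 2 + 1e-12  # AM-GM holds
print("both inequalities verified")
```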
Now, assume that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}}\) for k = k1,…,K. This implies that (A.28) is true at {Uk,Tk,Yk,Tk− 1}, and similarly to the above analysis, we have
We then show that \(\{\boldsymbol {U}^{K+1}, {T}^{K+1}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\). Summing (A.33) for k = k1,…,K yields
Rearranging the terms, noticing (A.31) and that \(c_{2} > 16c_{0}/\sqrt {c_{1}}\), we have
and so
Thus, induction implies that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\) for all k ≥ k1; i.e., {U^k, T^k, Y^k, T^{k−1}} ∈ N for k ≥ k1. As a result, (A.33) holds for all k ≥ k1, and so does (A.34). Therefore, letting \(K\rightarrow \infty \) in (A.34) yields
which shows that {U^k, T^k} is a Cauchy sequence and hence converges. Since {U^*, T^*} in Theorem 4.1 is a limit point, the whole sequence converges to {U^*, T^*}. This completes the proof. □
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Y., Feng, Y. Half-quadratic alternating direction method of multipliers for robust orthogonal tensor approximation. Adv Comput Math 49, 24 (2023). https://doi.org/10.1007/s10444-023-10014-6