Abstract
Higher-order tensor canonical polyadic decomposition (CPD) with one or more columnwise orthonormal latent factor matrices has been well studied in recent years. However, most existing models penalize noise via the least squares loss, which can be sensitive to non-Gaussian noise or outliers and thus lead to biased estimates of the latent factors. In this paper, we derive a robust orthogonal tensor CPD model with the Cauchy loss, which is resistant to heavy-tailed noise, such as Cauchy noise, and to outliers. By exploiting the half-quadratic property of the model, we develop the so-called half-quadratic alternating direction method of multipliers (HQ-ADMM) to solve it. Each subproblem involved in HQ-ADMM admits a closed-form solution. Thanks to some nice properties of the Cauchy loss, we show that the whole sequence generated by the algorithm globally converges to a stationary point of the problem under consideration. Numerical experiments on synthetic and real data demonstrate the effectiveness of the proposed model and algorithm.
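As a concrete illustration of the robustness mechanism described above, the following sketch contrasts the Cauchy loss with the least squares loss and shows the associated half-quadratic weight that downweights outliers. The scale parameter `gamma` and the exact scaling of the loss are assumptions for illustration; the paper's precise formulation may differ.

```python
import numpy as np

def cauchy_loss(r, gamma=1.0):
    # Cauchy (Lorentzian) loss: grows logarithmically in |r|, so large
    # residuals (outliers) are penalized far less than by least squares.
    return 0.5 * gamma**2 * np.log1p((r / gamma) ** 2)

def hq_weight(r, gamma=1.0):
    # Half-quadratic weight (multiplicative form), w = rho'(r)/r:
    # close to 1 for small residuals, close to 0 for outliers.
    return 1.0 / (1.0 + (r / gamma) ** 2)

r = np.array([0.1, 1.0, 10.0, 100.0])
print(cauchy_loss(r))   # grows like log|r|, not like r**2 / 2
print(hq_weight(r))     # ~1 for inliers, ~0 for outliers
```

The weight function is exactly what reappears as the tensor W in the half-quadratic reformulation: fixing the weights turns the robust objective into a weighted least-squares problem.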
References
Anandkumar, A., Jain, P., Shi, Y., Niranjan, U. N.: Tensor vs. matrix methods: robust tensor decomposition under block sparse perturbations. In: Artificial Intelligence and Statistics, pp. 268–276 (2016)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1-2), 91–129 (2013)
Beaton, A., Tukey, J.: The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics 16(2), 147–185 (1974)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1-2), 459–494 (2014)
Chen, J., Saad, Y.: On the tensor SVD and the optimal low rank orthogonal approximation of tensors. SIAM. J. Matrix Anal. Appl. 30(4), 1709–1734 (2009)
Cheng, L., Wu, Y.C., Poor, H.V.: Probabilistic tensor canonical polyadic decomposition with orthogonal factors. IEEE Trans. Signal Process. 65(3), 663–676 (2016)
Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., Phan, H.A.: Tensor decompositions for signal processing applications: from two-way to multiway component analysis. IEEE Signal Process. Mag. 32(2), 145–163 (2015)
De Almeida, A.L.F., Kibangou, A.Y., Miron, S., Araújo, D.C.: Joint data and connection topology recovery in collaborative wireless sensor networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), pp. 5303–5307 (2013)
De Lathauwer, L.: Algebraic methods after prewhitening. In: Handbook of Blind Source Separation, pp. 155–177. Elsevier (2010)
De Lathauwer, L.: A short introduction to tensor-based methods for factor analysis and blind source separation. In: Proceedings of the IEEE International Symposium on Image and Signal Processing and Analysis (ISPA 2011), pp. 558–563 (2011)
De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000)
Ding, M., Huang, T.Z., Ma, T.H., Zhao, X.L., Yang, J.H.: Cauchy noise removal using group-based low-rank prior. Appl. Math. Comput. 372, 124971 (2020)
Feng, Y., Fan, J., Suykens, J.: A statistical learning approach to modal regression. J. Mach. Learn. Res. 21(2), 1–35 (2020)
Feng, Y., Huang, X., Shi, L., Yang, Y., Suykens, J.: Learning with the maximum correntropy criterion induced losses for regression. J. Mach. Learn. Res. 16, 993–1034 (2015)
Ganan, S., McClure, D.: Bayesian image analysis: an application to single photon emission tomography. Amer. Statist. Assoc, 12–18 (1985)
Goldfarb, D., Qin, Z.: Robust low-rank tensor recovery: models and algorithms. SIAM J. Matrix Anal. Appl. 35(1), 225–253 (2014)
Guan, N., Liu, T., Zhang, Y., Tao, D., Davis, L.S.: Truncated Cauchy non-negative matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell. 41(1), 246–259 (2017)
Guan, Y., Chu, D.: Numerical computation for orthogonal low-rank approximation of tensors. SIAM J. Matrix Anal. Appl. 40(3), 1047–1065 (2019)
He, R., Zheng, W.S., Hu, B.G.: Maximum correntropy criterion for robust face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1561–1576 (2010)
Hillar, C.J., Lim, L.H.: Most tensor problems are NP-hard. J. ACM 60(6), 45:1–45:39 (2013)
Holland, P., Welsch, R.: Robust regression using iteratively reweighted least-squares. Commun. Stat.-Theory Methods 6(9), 813–827 (1977)
Hong, D., Kolda, T.G., Duersch, J.A.: Generalized canonical polyadic tensor decomposition. SIAM Rev. 62(1), 133–163 (2020)
Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
Hu, S., Li, G.: Convergence rate analysis for the higher order power method in best rank one approximations of tensors. Numer. Math. 140(4), 993–1031 (2018)
Hu, S., Ye, K. (2019)
Huber, P.J.: Robust statistics, vol. 523. Wiley, New York (2004)
Kim, G., Cho, J., Kang, M.: Cauchy noise removal by weighted nuclear norm minimization. J. Sci. Comput. 83, 15 (2020)
Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009)
Kovnatsky, A., Glashoff, K., Bronstein, M.M.: MADMM: a generic algorithm for non-smooth optimization on manifolds. In: European Conference on Computer Vision, pp. 680–696. Springer (2016)
Li, G., Liu, T., Pong, T.K.: Peaceman–Rachford splitting for a class of nonconvex optimization problems. Comput. Optim. Appl. 68(2), 407–436 (2017)
Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)
Li, G., Pong, T.K.: Douglas–Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159(1-2), 371–401 (2016)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18(5), 1199–1232 (2018)
Li, J., Usevich, K., Comon, P.: Globally convergent Jacobi-type algorithms for simultaneous orthogonal symmetric tensor diagonalization. SIAM J. Matrix Anal. Appl. 39(1), 1–22 (2018)
Li, J., Zhang, S.: Polar decomposition based algorithms on the product of Stiefel manifolds with applications in tensor approximation. arXiv:1912.10390 (2019)
Li, X., Lu, Q., Dong, Y., Tao, D.: Robust subspace clustering by cauchy loss function. IEEE Trans. Neural Netw. Learn. Syst. 30(7), 2067–2078 (2018)
Liu, H., So, A.M.C., Wu, W.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Math. Program. 178(1), 215–262 (2019)
Maronna, R., Bustos, O., Yohai, V.: Bias-and efficiency-robustness of general M-estimators for regression with random carriers. In: Smoothing Techniques for Curve Estimation, pp. 91–116. Springer (1979)
Mei, J.J., Dong, Y., Huang, T.Z., Yin, W.: Cauchy noise removal by nonconvex ADMM with convergence guarantees. J. Sci. Comput. 74(2), 743–766 (2018)
Pan, J., Ng, M.K.: Symmetric orthogonal approximation to symmetric tensors with applications to image reconstruction. Numer. Linear Algebra Appl. 25(5), e2180 (2018)
Pravdova, V., Estienne, F., Walczak, B., Massart, D.L.: A robust version of the Tucker3 model. Chemometr. Intell. Lab. Syst. 59(1), 75–88 (2001)
Savas, B., Lim, L.H.: Quasi-Newton methods on Grassmannians and multilinear approximations of tensors. SIAM J. Sci. Comput. 32(6), 3352–3393 (2010)
Sciacchitano, F., Dong, Y., Zeng, T.: Variational approach for restoring blurred images with Cauchy noise. SIAM J. Imag. Sci. 8(3), 1894–1922 (2015)
Shashua, A., Levin, A.: Linear image coding for regression and classification using the tensor-rank principle. In: CVPR, vol. 1, pp. I–I. IEEE (2001)
Sidiropoulos, N.D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E.E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65(13), 3551–3582 (2017)
Sidiropoulos, N.D., Giannakis, G.B., Bro, R.: Blind PARAFAC receivers for DS-CDMA systems. IEEE Trans. Signal Process. 48(3), 810–823 (2000)
Signoretto, M., Dinh, Q.T., De Lathauwer, L., Suykens, J.A.K.: Learning with tensors: a framework based on convex optimization and spectral regularization. Mach. Learn. 94(3), 303–351 (2014)
Sørensen, M., De Lathauwer, L., Comon, P., Icart, S., Deneire, L.: Canonical polyadic decomposition with a columnwise orthonormal factor matrix. SIAM J. Matrix Anal. Appl. 33(4), 1190–1213 (2012)
Sørensen, M., De Lathauwer, L., Deneire, L.: PARAFAC with orthogonality in one mode and applications in DS-CDMA systems. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2010), pp. 4142–4145 (2010)
Vervliet, N., Debals, O., Sorber, L., Van Barel, M., De Lathauwer, L.: Tensorlab 3.0. http://www.tensorlab.net. Available online (2016)
Wang, L., Chu, M.T., Yu, B.: Orthogonal low rank tensor approximation: alternating least squares method and its global convergence. SIAM J. Matrix Anal. Appl. 36(1), 1–19 (2015)
Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)
Yang, Y.: The epsilon-alternating least squares for orthogonal low-rank tensor approximation and its global convergence. SIAM J. Matrix Anal. Appl. 41(4), 1797–1825 (2020)
Yang, Y., Feng, Y., Suykens, J.A.K.: Robust low-rank tensor recovery with regularized redescending m-estimator. IEEE Trans. Neural Netw. Learn. Syst. 27(9), 1933–1946 (2015)
Ye, K., Hu, S.: When geometry meets optimization theory: partially orthogonal tensors. arXiv:2201.04824 (2022)
Yu, P., Li, G., Pong, T.K.: Kurdyka–Łojasiewicz exponent via inf-projection. Found. Comput. Math. 1–47 (2021)
Acknowledgements
We thank the editor and the anonymous reviewers for their insightful comments and suggestions that helped improve this manuscript.
Funding
The first author was supported by the National Natural Science Foundation of China Grants 11801100 and 12171105, and the Fok Ying Tong Education Foundation Grant 171094. The second author was supported by the Simons Foundation Collaboration Grant 572064.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by: Guoyin Li
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Mathematics of Computation and Optimisation Guest Editors: Jerome Droniou, Andrew Eberhard, Guoyin Li, Russell Luke, Thanh Tran
Appendix
1.1 Proof of Theorem 4.1
To prove the convergence of a nonconvex ADMM, a key step is to upper bound the successive difference of the dual variables by that of the primal variables. Unlike the nonconvex ADMMs in the literature, for HQ-ADMM the weight W^k creates obstacles in estimating this upper bound. Fortunately, this can be overcome by exploiting the relations between W^k, T^k, and T^{k−1} based on Proposition 2.3, which is given in Lemma A.1. With the upper bound at hand, we derive the decreasing inequality with respect to \(\{\tilde {L}_{\tau }^{k+1,k} \}\) (Lemma A.2), whose verification is similar to that of a nonconvex block coordinate descent. Then, the boundedness of the variables is established in Theorem A.1. Key to these two results is setting the parameter \(\tau \geq \sqrt { 10}\). Combining the above pieces, the subsequential convergence is proved at the end of this subsection by a standard argument.
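Before the formal proof, the half-quadratic mechanism that HQ-ADMM exploits can be illustrated on a hypothetical scalar analogue (this is not the paper's algorithm, only a one-variable caricature): fixing the Cauchy weights turns the robust objective into a weighted least-squares problem whose minimizer is available in closed form, mirroring how each HQ-ADMM subproblem admits a closed-form solution.

```python
import numpy as np

# Scalar caricature of the half-quadratic idea: minimize
# sum_i rho(t - a_i) with the Cauchy loss rho. Each iteration fixes
# weights w_i = 1/(1 + r_i^2/gamma^2) and then solves the resulting
# weighted least-squares problem exactly.
def hq_location(a, gamma=1.0, iters=50):
    t = np.median(a)                      # robust initial guess
    for _ in range(iters):
        r = t - a
        w = 1.0 / (1.0 + (r / gamma) ** 2)
        t = np.sum(w * a) / np.sum(w)     # closed-form weighted LS step
    return t

rng = np.random.default_rng(0)
a = np.concatenate([rng.normal(2.0, 0.1, 100), [80.0, -40.0]])  # 2 outliers
print(hq_location(a))   # stays close to the inlier center 2.0
print(np.mean(a))       # plain least-squares estimate, dragged by outliers
```

Each fixed-weight step decreases the Cauchy objective, which is the same monotonicity pattern the lemmas below establish for the full tensor problem.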
Lemma A.1
It holds that
Proof
From (??), we have
which together with the definition of Y^{k+1} yields
Therefore, we have
Now, denote \( E_{1}:= { \left \| W^{k}\circledast { \left ({T}^{k+1}- {T}^{k} \right ) } \right \|_{F} } \) and \(E_{2}:={ \left \| (W^{k} - W^{k-1})\circledast { \left ({T}^{k}- {A} \right ) } \right \|_{F} } \). We first consider \(E_1\). From the definition of W^k, we easily see that \( W^{k}_{i_{1}{\cdots } i_{d}}\leq 1\) for each \(i_1,\ldots,i_d\). Therefore,
Next, we focus on \(E_2\). To simplify notation, we denote \(a_{i_{1}{\cdots } i_{d}}^{k}:= {T}^{k}_{i_{1}{\cdots } i_{d}} - {A}_{i_{1}{\cdots } i_{d}}\) and
Then, \(E_2\) can be expressed as
It follows from Proposition 2.3 that
and so
Combining (A.34) with (A.35) and (A.36) yields the desired result. □
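The bound on E1 above ultimately rests on every entry of W^k lying in (0, 1], so the Hadamard product with W^k is norm-nonexpansive: ||W ∘ X||_F ≤ ||X||_F. A quick numeric sanity check of this fact (illustrative only, with randomly generated weights of the Cauchy form):

```python
import numpy as np

# Entrywise weights in (0, 1] can only shrink the Frobenius norm
# under the Hadamard product: ||W ∘ X||_F <= ||X||_F.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5, 3))                    # random 3rd-order tensor
W = 1.0 / (1.0 + rng.standard_normal(X.shape) ** 2)   # entries in (0, 1]
lhs = np.linalg.norm(W * X)   # Frobenius norm of the weighted tensor
rhs = np.linalg.norm(X)
print(lhs <= rhs)   # True
```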
With Lemma A.1, we then establish a decreasing inequality with respect to \(\{\tilde {L}_{\tau }^{k+1,k} \}\) defined in (??):
Key to the validity of the decreasing inequality is setting \(\tau \geq \sqrt {10}\).
Lemma A.2
Let the parameter τ satisfy \( \tau \geq \sqrt {10}\). Then, there holds
where α > 0 is defined in (??) and (??).
Proof
We first consider the decrease caused by \(U_j\). When 1 ≤ j ≤ d − t, by the algorithm, the expression of Lτ(⋅), the fact that \({ \left \| u^{k}_{j,i} \right \| }=1\), and the definitions of \(u^{k+1}_{j,i}\), \(\mathbf {v}^{k+1}_{j,i}\), and \(\tilde {\mathbf {v}}^{k+1}_{j,i}\), we have
where the fourth equality follows from the definition of \(u^{k+1}_{j,i}\) and \(\tilde {\mathbf {v}}^{k+1}_{j,i}\), and the inequality is due to \({ \left \| \mathbf {v} \right \| }\geq {\left \langle \mathbf {v} , u\right \rangle }\) for any vectors u,v of the same size with \({ \left \| u \right \| }=1\).
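The inequality ∥v∥ ≥ ⟨v, u⟩ invoked above also explains why normalizing v gives the maximizing unit vector in the closed-form u-update. A small numeric illustration (not the paper's code):

```python
import numpy as np

# For any unit vector u, <v, u> <= ||v|| (Cauchy–Schwarz), with
# equality exactly at u = v/||v||. This is why the closed-form update
# u^{k+1} = v/||v|| maximizes the linear term in the u-subproblem.
rng = np.random.default_rng(2)
v = rng.standard_normal(6)
u_opt = v / np.linalg.norm(v)          # the normalizing update
u_rand = rng.standard_normal(6)
u_rand /= np.linalg.norm(u_rand)       # an arbitrary unit vector
print(np.dot(v, u_opt))    # equals ||v||
print(np.dot(v, u_rand))   # no larger than ||v||
```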
The decrease for \(U_j\) when d − t + 1 ≤ j ≤ d is similar. From the definition of \(V^{k+1}_{j}\), it holds that
where the inequality follows from the definition of \(U^{k+1}_{j}\) in (??). To show the decrease of T, note that Lτ(⋅) is strongly convex with respect to T, from which we easily deduce that
Next, it follows from the definition of Y^{k+1} and Lemma A.1 that
Finally, it follows from the definitions of σ^{k+1} and W^{k+1} that
As a result, summing up (A.37)–(A.42) yields
where the last inequality follows from the range of τ. Rearranging the terms of (A.43) gives the desired result. This completes the proof. □
We then show that \(\tilde L_{\tau }^{k,k-1}\) defined in Lemma A.2 is lower bounded and that the sequence {σ^k, U^k, T^k, Y^k, W^k} is bounded as well.
Theorem A.1
Under the setting of Lemma A.2, \(\{\tilde L_{\tau }^{k,k-1}\}\) is bounded. The sequence {σ^k, U^k, T^k, Y^k, W^k} generated by Algorithm 1 is bounded as well.
Proof
Denote \(Q^{k}(\cdot ) := \frac {1}{2}{ \left \| \sqrt { W^{k}}\circledast { \left (\cdot - {A} \right ) } \right \|_{F} }^{2} \); thus, we have \(\nabla Q^{k}({T}) = W^{k}\circledast { \left ({T}- {A} \right ) }\). It then follows from the quadraticity of Q^k(⋅) and \( {Y}^{k} = - W^{k-1}\circledast { \left ({T}^{k}- {A} \right ) }\) from (A.33) that
where the last inequality uses the fact that \(0< W^{k-1}_{i_{1}{\cdots } i_{d}} \leq 1\). It thus follows that for any k ≥ 2,
where the first inequality follows from the proof of Lemma A.2 (summing up (A.37)–(A.41)), the second comes from (A.44), and the last is due to the range of τ and ϱ(⋅) ≥ 0. Thus, \(\{ \tilde L_{\tau }^{k,k-1} \}\) is lower bounded; together with Lemma A.2, this shows that \(\{ \tilde L_{\tau }^{k,k-1} \}\) is bounded. We then show the boundedness of {σ^k, U^k, T^k, Y^k, W^k}. The boundedness of {U^k} and {W^k} is obvious. Next, denote by g(σ^k) the formulation in lines 5–6 of (A.45) with respect to σ^k. Proposition 2.1 shows that \(\bigotimes _{j=1}^{d}u^{k}_{j,i}\) is orthonormal and hence \({ \| [[ \boldsymbol { \sigma }^{k}; \boldsymbol {U}^{k}]] - {T}^{k} \|_{F} }^{2}\) is strongly convex with respect to σ^k; this together with the convexity of Q^{k−1}([[σ^k; U^k]]) shows that g(σ^k) is strongly convex with respect to σ^k. Combining this with (A.45) gives the boundedness of {σ^k}. Similarly, {T^k} is bounded. Finally, the boundedness of {Y^k} follows from the expression of the T-subproblem (??). As a result, {σ^k, U^k, T^k, Y^k, W^k} is a bounded sequence. This completes the proof. □
Proof of Theorem 4.1
Lemma A.2 in connection with Theorem A.1 yields points 1 and 2 and (??); (??) together with Lemma A.1 and the definitions of Y^{k+1}, σ^{k+1}, and W^{k+1} gives (??). On the other hand, since the sequence is bounded, limit points exist. Assume that {σ^*, U^*, T^*, Y^*, W^*} is a limit point with
(??) and (??) then imply that
Therefore, taking the limit in l with respect to the \(u_{j,i}\)-subproblem (??) yields
Multiplying both sides by \(u^{*}_{j,i}\) gives
where the second equality follows from the definition of \(\mathbf{v}_{j,i}\) and the last one is obtained by passing the limit into the expression of \(\sigma ^{k_{l}+1}_{i}\) (??). Thus, (A.46) together with (A.47) gives
i.e., the first equation of the stationary point system (??).
Taking the limit in l with respect to the \(U_j\)-subproblem (??) and noticing the expression (??), we get
where \(H^{*}_{j}\) is a symmetric matrix. Writing it columnwise, we obtain
Denoting \({\Lambda }^{*}_{j}:= H^{*}_{j} - \alpha I\), the above is exactly the third equality of (??). On the other hand, passing the limit into the expressions of T^k (??) and W^k (??), respectively, gives the T^*- and W^*-formulas in (??). Finally, the first expression of (??) yields T^* = [[σ^*; U^*]]. Putting the above pieces together, {σ^*, U^*, T^*, Y^*, W^*} satisfies the stationary point system (??).
Next, we show that {σ^*, U^*} is also a stationary point of problem (??). We define its Lagrangian function as \(L_{\boldsymbol { {\varPhi }}} := \boldsymbol { {\varPhi }}_{\delta }(\boldsymbol { \sigma }, \boldsymbol {U}) - {\sum }_{j,i=1}^{d-t,R} \eta _{j,i}{ \left (u_{j,i}^{\top } u_{j,i} -1 \right ) } - {\sum }^{d}_{j=d-t+1}{ \left \langle {\Lambda }_{j} , U_{j}^{\top } U_{j} - I\right \rangle }\), similar to that in (??). Taking derivatives yields
1.2 Proof of Theorem 4.2
To prove Theorem 4.2, we first recall some definitions from nonsmooth analysis. Denote \(\text {dom}f:=\{x\in \mathbb {R}^{n}\mid f(\mathbf {x})<+\infty \}\).
Definition 1 (cf. [2])
For x ∈domf, the Fréchet subdifferential, denoted as \(\hat \partial f(\mathbf {x})\), is the set of vectors \(z\in \mathbb R^{n}\) satisfying
The subdifferential of f at x ∈domf, written ∂f, is defined as
It is known that \(\hat \partial f(\mathbf {x})\subset \partial f(\mathbf {x})\) for each \(x\in \mathbb R^{n}\) [4]. An extended-real-valued function is a function \(f:\mathbb {R}^{n}\rightarrow [-\infty ,\infty ]\); it is proper if \(f(\mathbf {x})>-\infty \) for all x and \(f(x)<\infty \) for at least one x, and it is called closed if it is lower semi-continuous (l.s.c. for short). The global convergence relies on the Kurdyka–Łojasiewicz (KL) property given as follows:
Definition 2 (KL property and KL function, cf. [2, 4])
A proper function f is said to have the KL property at \(\overline {x}\in \text {dom}\partial f :=\{x\in \mathbb R^{n}\mid \partial f(x)\neq \emptyset \}\), if there exist \(\bar \epsilon \in (0,\infty ]\), a neighborhood N of \(\overline {x}\), and a continuous and concave function \(\psi : [0,\bar \epsilon ) \rightarrow \mathbb R_{+}\) which is continuously differentiable on \((0,\bar \epsilon )\) with positive derivatives and ψ(0) = 0, such that for all x ∈ N satisfying \(f(\overline {x}) <f({x}) < f(\overline {x}) + \bar \epsilon \), it holds that
where dist(0, ∂f(x)) denotes the distance from the origin to the set ∂f(x). If a proper and l.s.c. function f satisfies the KL property at each point of dom ∂f, then f is called a KL function.
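A standard textbook example of Definition 2 (not taken from this paper): f(x) = x² satisfies the KL inequality at x̄ = 0 with the desingularizing function ψ(s) = 2√s, since the product ψ′(f(x) − f(x̄)) · dist(0, ∂f(x)) is identically 2 ≥ 1 for every x ≠ 0. A quick numeric verification:

```python
import numpy as np

# KL inequality for f(x) = x^2 at xbar = 0 with psi(s) = 2*sqrt(s):
# psi'(s) = 1/sqrt(s) and dist(0, ∂f(x)) = |2x|, so
# psi'(f(x) - f(xbar)) * |2x| = (1/|x|) * 2|x| = 2 >= 1 for all x != 0.
def kl_product(x):
    f = x**2                       # f(x) - f(xbar), since f(xbar) = 0
    grad = 2.0 * abs(x)            # dist(0, ∂f(x)) = |f'(x)|
    psi_prime = 1.0 / np.sqrt(f)   # psi'(s) for psi(s) = 2*sqrt(s)
    return psi_prime * grad

for x in [1e-3, 0.1, 0.5]:
    print(kl_product(x))   # always 2.0
```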
We then simplify \(\tilde L_{\tau }(\cdot )\) by eliminating the variables W and σ. First, from the definition of W^{k+1} and Lemma 2.1, we have that
where Φδ(⋅) is defined in (??). This eliminates W from \(\tilde L_{\tau }(\cdot )\). On the other hand, it follows from the definition of σ^{k+1} (??) that
Thus, σ is also eliminated. In what follows, whenever necessary, \({\sigma ^{k}_{i}} \) still represents the expression \( ({Y}^{k}+\tau {T}^{k})\bigotimes _{j=1}^{d}u^{k}_{j,i}/\tau \), but we only treat it as a representation instead of a variable.
Then, \(\tilde L_{\tau }(\boldsymbol { \sigma }^{k+1}, \boldsymbol {U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, W^{k+1}, {T}^{k})\) can be equivalently written as
In addition, we denote
We can see that under the constraints of the optimization problem (??), \(\tilde L_{\tau ,\alpha }(\cdot ) = \tilde L_{\tau }(\cdot ) -\frac {\alpha d R}{2}\). This together with Theorem 4.1 tells us that the sequence \(\{\tilde L_{\tau ,\alpha }(\boldsymbol {U}^{k+1}, {T}^{k+1}, {Y}^{k+1}, {T}^{k})\}\) is also bounded and nonincreasing. In addition, we have that \(\tilde L_{\tau ,\alpha }(\cdot )\) is a KL function.
Proposition A.1
\(\tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) \) defined above is a proper, l.s.c., and KL function.
Proof
It is clear that \(\tilde L_{\tau ,\alpha }(\cdot )\) is proper and l.s.c. Next, since the constraint sets in (??) are all Stiefel manifolds, items 2 and 6 of [4, Example 2] tell us that they are semi-algebraic sets and their indicator functions are semi-algebraic functions. Therefore, the indicator functions are KL functions [4, Theorem 3]. On the other hand, the remaining part of \(\tilde L_{\tau ,\alpha }\) (besides the indicator functions) is an analytic function and hence is KL [4]. As a result, \(\tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) \) is a KL function. □
In the sequel, we mainly rely on \(\tilde L_{\tau ,\alpha }(\cdot )\) to prove the global convergence. For convenience, we denote
denote \({ {\varDelta }_{\boldsymbol {U}, {T}}^{k+1,k} }:= (\boldsymbol {U}^{k+1} , {T}^{k+1}) - (\boldsymbol {U}^{k}, {T}^{k})\), and
Lemma A.3
There exists a large enough constant c0 > 0, such that
Proof
We first consider \(\partial _{u_{j,i}} { \tilde L_{\tau ,\alpha }^{k+1,k} } \), 1 ≤ j ≤ d − t, 1 ≤ i ≤ R, and \(\partial _{U_{j}} { \tilde L_{\tau ,\alpha }^{k+1,k} } \), d − t + 1 ≤ j ≤ d, respectively. In what follows, we denote
We also recall \(\mathbf {v}_{j,i}^{k+1}:= ({Y}^{k}+ \tau {T}^{k}){\mathbf {u}_{1,i}^{k+1}\otimes \cdots \otimes \mathbf {u}_{j-1,i}^{k+1} \otimes \mathbf {u}_{j+1,i}^{k} \otimes \cdots \otimes \mathbf {u}_{d,i}^{k} }\) and \(\tilde {\mathbf {v}}_{j,i}^{k+1} = {\sigma ^{k}_{i}} \mathbf {v}^{k+1}_{j,i} + \alpha \mathbf {u}^{k}_{j,i}\) for later use. In addition, denote \(\tilde V^{k+1}_{j} := [\tilde {\mathbf {v}}^{k+1}_{j,1},\ldots ,\tilde {\mathbf {v}}^{k+1}_{j,R}]\).
For 1 ≤ j ≤ d − t, one has
we then wish to show that
The proof is similar to that of [53, Lemma 6.1]. First, from the definitions of \(\iota _{{ \text {st}(n_{j},1) }}(\cdot ) \) and \(\hat \partial \iota _{{ \text {st}(n_{j},1) }}(\cdot )\) in (A.50), it is not hard to see that if y ∉ st(n_j, 1), then (A.50) clearly holds when \({z} = \tilde {\mathbf {v}}^{k+1}_{j,i}\); otherwise, if y ∈ st(n_j, 1), i.e., ∥y∥ = 1, then from the definition of \(\mathbf {u}^{k+1}_{j,i}\), we see that
which together with \(\iota _{{ \text {st}(n_{j},1) }}(\mathbf {y}) = 0\) and \(\iota _{{ \text {st}(n_{j},1) }}(u^{k+1}_{j,i})=0\) gives
As a result, (A.21) is true, which together with (A.20) shows that
Let 0 denote the origin. Then, by using the triangle inequality and the boundedness of {σ^k, U^k, T^k, Y^k}, and noticing the definition of \({ {\varDelta }_{\boldsymbol {U}, {T}}^{k+1,k} }\), there must exist large enough constants c1, c2 > 0, depending only on τ, α, and the size of {σ^k, U^k, T^k, Y^k}, such that
On the other hand, for d − t + 1 ≤ j ≤ d, by noticing the definition of \(\overline { V}^{k+1}_{j}\), we have
From the definition of \(U^{k+1}_{j}\) in (??) and similar to the above argument, we can show that \(\tilde V^{k+1}_{j} \in \partial \iota _{{ \text {st}(n_{j},R) }}(U^{k+1}_{j}). \) Thus,
Similar to (A.22), there exists a large enough constant c3 > 0 such that
We then consider
Note that W^{k+1} and σ^{k+1} above are only representations instead of variables; they represent (??) and (??). From the expression of Y^{k+1} in (A.33), we have
where the inequality follows from Proposition 2.3. On the other hand,
where c4 > 0 is large enough. Combining the above pieces shows that there exists a large enough constant c5 > 0 such that
Next, it follows from (A.24) that
Finally,
Combining (A.22), (A.23), (A.25), (A.26), (A.27), we get that there exists a large enough constant c0 > 0 independent of k, such that
as desired. □
Now, we can present the proof concerning global convergence.
Proof of Theorem 4.2
We have mentioned that \(\{ { \tilde L_{\tau ,\alpha }^{k+1,k} } \}\) inherits the properties of \(\{\tilde L_{\tau }^{k+1,k} \}\); i.e., it is bounded, nonincreasing, and convergent. We denote its limit by \(\tilde L^{*}_{\tau ,\alpha } = \lim _{k\rightarrow \infty } \tilde L^{k+1,k}_{\tau ,\alpha } = \tilde L_{\tau ,\alpha }(\boldsymbol {U}^{*}, {T}^{*}, {Y}^{*}, {T}^{*})\), where {U^*, T^*, Y^*, T^*} is a limit point. According to Definition 2 and Proposition A.1, there exist an \(\epsilon_0 > 0\), a neighborhood N of {U^*, T^*, Y^*, T^*}, and a continuous and concave function \(\psi (\cdot ):[0,\epsilon _{0}) \rightarrow \mathbb {R}_{+}\) such that for all \(\{\boldsymbol {U}, {T}, {Y}, {T}^{\prime }\} \in {N}\) satisfying \(\tilde L_{\tau ,\alpha }^{*} < \tilde L_{\tau ,\alpha }(\boldsymbol {U}, {T}, {Y}, {T}^{\prime }) <\tilde L_{\tau ,\alpha }^{*} + \epsilon _{0}\), there holds
Let 𝜖1 > 0 be such that
and let \(\mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}:= \{ { \left (\boldsymbol {U}, {T} \right ) }\mid { \left \| U_{j} -U^{*}_{j} \right \|_{F} } < \epsilon _{1},1\leq j\leq d,{ \left \| {T}- {T}^{*} \right \|_{F} }<\epsilon _{1} \}\). From the stationary point system (??) and the expression of Y^{k+1} in (A.33), we have
where the last inequality follows from Propositions 2.3 and 2.2. On the other hand,
Since Theorem 4.1 shows that there exists \(k_0 > 0\) such that \({ \left \| { {\varDelta }_{ {T}}^{k,k-1} } \right \|_{F} }<\epsilon _{1}\) for k ≥ k0, (A.29) and (A.30) tell us that if k ≥ k0 and \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb {B}^{\boldsymbol {U}, {T} }_{\epsilon _{1} }\), then \(\{\boldsymbol {U}^{k}, {T}^{k}, {Y}^{k}, {T}^{k-1} \} \in \mathbb {B}_{\epsilon _{1} } \subset N\). Such a k0 must exist since {U^*, T^*, Y^*, T^*} is a limit point. In addition, denote \(c_{1}:=\min \{\alpha /2,1/\tau \}\); then, there exists k1 ≥ k0 such that \( \{ \boldsymbol {U}^{k_{1}}, {T}^{k_{1}} \} \in \mathbb {B}^{\boldsymbol {U}, {T} }_{\epsilon _{1}/2} \) and
where c0 is the constant appearing in Lemma A.3, and c2 is a constant such that \(c_{2} > 16c_{0}/\sqrt {c_{1}}\).
In what follows, we use induction to show that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\) for all k > k1. Since ψ(⋅) in Definition 2 is concave, it holds for any k that
On the other hand, from the previous paragraph we see that \(\{\boldsymbol {U}^{k_{1}}, {T}^{k_{1}}\}\in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}/2}\) and \(\{ \boldsymbol {U}^{k_{1}}, {T}^{k_{1}}, {Y}^{k_{1}}, {T}^{k_{1}-1} \} \in \mathbb {B}_{\epsilon _{1}} \subset {N}\), and so (A.28) holds at \(\{\boldsymbol {U}^{k_{1}}, {T}^{k_{1}}, {Y}^{k_{1}}, {T}^{k_{1}-1} \}\). Recall \(c_{1}=\min \{\alpha /2,1/\tau \}\). From Lemma A.2 and the relation between \(\tilde L_{\tau }\) and \(\tilde L_{\tau ,\alpha }\), we obtain
where the second inequality is due to (A.32), while the last one comes from (A.28). Using \(\sqrt {ab}\leq \frac {a+b}{2}\) for a ≥ 0, b ≥ 0, invoking (A.19), and noticing the range in (A.31), we obtain
and so
namely, \( \{\boldsymbol {U}^{k_{1}+1}, {T}^{k_{1}+1}\} \in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}}\).
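The base case above leans on two elementary inequalities: the gradient inequality ψ(a) − ψ(b) ≥ ψ′(a)(a − b) for a concave ψ, and the AM-GM bound √(ab) ≤ (a + b)/2. A numeric spot check with the illustrative choice ψ(s) = √s (the actual desingularizing function in the proof is abstract):

```python
import numpy as np

# Gradient inequality for a concave function: psi(a) - psi(b)
# >= psi'(a) * (a - b), checked here for psi(s) = sqrt(s), together
# with the AM-GM bound sqrt(a*b) <= (a + b)/2 used in the same step.
def concave_gap(a, b):
    psi = np.sqrt
    psi_prime = lambda s: 0.5 / np.sqrt(s)
    return (psi(a) - psi(b)) - psi_prime(a) * (a - b)   # should be >= 0

rng = np.random.default_rng(3)
for _ in range(1000):
    a, b = rng.uniform(0.01, 10.0, 2)
    assert concave_gap(a, b) >= -1e-12            # concavity bound holds
    assert np.sqrt(a * b) <= (a + b) / 2 + 1e-12  # AM-GM holds
print("both inequalities verified")
```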
Now, assume that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T} }_{\epsilon _{1}}\) for k = k1,…,K. This implies that (A.28) is true at {Uk,Tk,Yk,Tk− 1}, and similarly to the above analysis, we have
We then show that \(\{\boldsymbol {U}^{K+1}, {T}^{K+1}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\). Summing (A.33) for k = k1,…,K yields
Rearranging the terms, noticing (A.31) and that \(c_{2} > 16c_{0}/\sqrt {c_{1}}\), we have
and so
Thus, induction implies that \(\{\boldsymbol {U}^{k}, {T}^{k}\}\in \mathbb B^{\boldsymbol {U}, {T}}_{\epsilon _{1}}\) for all k ≥ k1; i.e., {U^k, T^k, Y^k, T^{k−1}} ∈ N for k ≥ k1. As a result, (A.33) holds for all k ≥ k1, and so does (A.34). Therefore, letting \(K\rightarrow \infty \) in (A.34) yields
which shows that {U^k, T^k} is a Cauchy sequence and hence converges. Since {U^*, T^*} in Theorem 4.1 is a limit point, the whole sequence converges to {U^*, T^*}. This completes the proof. □
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Y., Feng, Y. Half-quadratic alternating direction method of multipliers for robust orthogonal tensor approximation. Adv Comput Math 49, 24 (2023). https://doi.org/10.1007/s10444-023-10014-6