
An inexact augmented Lagrangian method for computing strongly orthogonal decompositions of tensors


Abstract

A strongly orthogonal decomposition of a tensor is a decomposition into rank one tensors such that, in each mode, the component vectors of any two rank one terms are either collinear or orthogonal. A strongly orthogonal decomposition with a small number of rank one tensors is favorable in applications; such a decomposition can be represented by a matrix-tensor multiplication with orthogonal factor matrices and a sparse tensor, and one with the minimum number of rank one tensors is a strongly orthogonal rank decomposition. Every tensor has a strongly orthogonal rank decomposition. In this article, computing a strongly orthogonal rank decomposition is equivalently reformulated as solving an optimization problem. In contrast to the usual optimization reformulation of the tensor rank decomposition problem, which is ill-posed, the optimization reformulation of the strongly orthogonal rank decomposition of a tensor is well-posed. Each feasible solution of the optimization problem gives a strongly orthogonal decomposition of the tensor, and a global optimizer gives a strongly orthogonal rank decomposition, which is, however, difficult to compute. An inexact augmented Lagrangian method is proposed to solve the optimization problem. The augmented Lagrangian subproblem is solved by a proximal alternating minimization method, with the advantage that each subproblem has a closed-form solution and the factor matrices remain orthogonal throughout the iterations. Thus, the algorithm always returns a feasible solution, and hence a strongly orthogonal decomposition, for any given tensor. Global convergence of this algorithm to a critical point is established without any further assumption. Extensive numerical experiments show that the proposed algorithm is quite promising in both efficiency and accuracy.
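
To fix ideas, the following minimal NumPy sketch (an illustration of our own, not code from the paper; all function and variable names are ours) builds a tensor \(\mathcal {A}=(U^{(1)},U^{(2)},U^{(3)})\cdot \mathcal {B}\) from orthogonal factor matrices and a sparse core tensor \(\mathcal {B}\). Each nonzero entry of \(\mathcal {B}\) contributes one rank one term, and any two such terms are, in each mode, either collinear or orthogonal.

import numpy as np

def mode_product(T, U, mode):
    # multiply tensor T by matrix U along the given mode
    T = np.moveaxis(T, mode, 0)
    shp = T.shape
    out = (U @ T.reshape(shp[0], -1)).reshape((U.shape[0],) + shp[1:])
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(0)
n = (4, 4, 4)
# orthogonal factor matrices, via QR of random Gaussian matrices
Us = [np.linalg.qr(rng.standard_normal((ni, ni)))[0] for ni in n]
# sparse core tensor: each nonzero entry B[i, j, l] yields the rank one term
# B[i, j, l] * (outer product of U1[:, i], U2[:, j], U3[:, l]); two terms are
# collinear in any mode where they share a column index, orthogonal otherwise
B = np.zeros(n)
B[0, 0, 0], B[1, 2, 3], B[0, 3, 1] = 3.0, -2.0, 1.5
A = B
for mode, U in enumerate(Us):
    A = mode_product(A, U, mode)
# orthogonality of the factors makes the representation norm-preserving
print(np.isclose(np.linalg.norm(A), np.linalg.norm(B)))  # True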


Notes

1. We can reformulate (23) as an optimization problem with a smooth objective function by packing \(\Vert \mathcal {B}\Vert _1\) into the constraints as well. Optimality conditions can then be derived as in [39]. However, it does not seem wise here to destroy the smooth nature of the constraints and to take on the heavy task of computing the normal cone of a feasible set whose constraints involve nonsmooth functions.

References

1. Absil, P.-A., Hosseini, S.: A collection of nonsmooth Riemannian optimization problems. In: Hosseini, S., Mordukhovich, B., Uschmajew, A. (eds.) Nonsmooth Optimization and Its Applications, International Series of Numerical Mathematics, vol. 170, pp. 1–15. Birkhäuser, Cham (2019)

2. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)

3. Anandkumar, A., Ge, R., Hsu, D., Kakade, S.M., Telgarsky, M.: Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. 15, 2773–2832 (2014)

4. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)

5. Bader, B.W., Kolda, T.G.: MATLAB Tensor Toolbox Version 2.6, February 2015. http://www.sandia.gov/~tgkolda/TensorToolbox/

6. Batselier, K., Liu, H., Wong, N.: A constructive algorithm for decomposing a tensor into a finite sum of orthonormal rank-1 terms. SIAM J. Matrix Anal. Appl. 36, 1315–1337 (2015)

7. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Belmont (1982)

8. Bochnak, J., Coste, M., Roy, M.-F.: Real Algebraic Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete, vol. 36. Springer, Berlin (1998)

9. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)

10. Chen, J., Saad, Y.: On the tensor SVD and the optimal low rank orthogonal approximation of tensors. SIAM J. Matrix Anal. Appl. 30, 1709–1734 (2009)

11. Chen, Y., Ye, Y., Wang, M.: Approximation hardness for a class of sparse optimization problems. J. Mach. Learn. Res. 20, 1–27 (2019)

12. Comon, P.: MA identification using fourth order cumulants. Signal Process. 26, 381–388 (1992)

13. Comon, P.: Independent component analysis, a new concept? Signal Process. 36, 287–314 (1994)

14. De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000)

15. De Silva, V., Lim, L.-H.: Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl. 30, 1084–1127 (2008)

16. Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell _1\)-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 59, 797–829 (2006)

17. Franc, A.: Etude Algébrique des Multitableaux: Apports de l'Algèbre Tensorielle. Thèse de Doctorat, Spécialité Statistiques, Univ. de Montpellier II, Montpellier (1992)

18. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)

19. Håstad, J.: Tensor rank is NP-complete. J. Algorithms 11, 644–654 (1990)

20. Hillar, C.J., Lim, L.-H.: Most tensor problems are NP-hard. J. ACM 60(6), 1–39 (2013)

21. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, New York (1985)

22. Hu, S.: Bounds on strongly orthogonal ranks of tensors, revised manuscript (2019)

23. Hu, S., Li, G.: Convergence rate analysis for the higher order power method in best rank one approximations of tensors. Numer. Math. 140, 993–1031 (2018)

24. Ishteva, M., Absil, P.-A., Van Dooren, P.: Jacobi algorithm for the best low multilinear rank approximation of symmetric tensors. SIAM J. Matrix Anal. Appl. 34, 651–672 (2013)

25. Jiang, B., Dai, Y.H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. 153, 535–575 (2015)

26. Jordan, C.: Essai sur la géométrie à \(n\) dimensions. Bull. Soc. Math. France 3, 103–174 (1875)

27. Kolda, T.G.: Orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl. 23, 243–255 (2001)

28. Kolda, T.G.: A counterexample to the possibility of an extension of the Eckart–Young low-rank approximation theorem for the orthogonal rank tensor decomposition. SIAM J. Matrix Anal. Appl. 24, 762–767 (2003)

29. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009)

30. Kroonenberg, P.M., De Leeuw, J.: Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 45, 69–97 (1980)

31. Kruskal, J.B.: Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl. 18, 95–138 (1977)

32. Landsberg, J.M.: Tensors: Geometry and Applications. Graduate Studies in Mathematics, vol. 128. AMS, Providence (2012)

33. Leibovici, D., Sabatier, R.: A singular value decomposition of a \(k\)-way array for principal component analysis of multiway data, PTA-\(k\). Linear Algebra Appl. 269, 307–329 (1998)

34. Liu, Y.F., Dai, Y.H., Luo, Z.Q.: On the complexity of leakage interference minimization for interference alignment. In: 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications, pp. 471–475 (2011)

35. Mangasarian, O.L., Fromovitz, S.: The Fritz John necessary optimality conditions in the presence of equality and inequality constraints. J. Math. Anal. Appl. 17, 37–47 (1967)

36. Martin, C.D.M., Van Loan, C.F.: A Jacobi-type method for computing orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl. 30, 1219–1232 (2008)

37. Nie, J.: Generating polynomials and symmetric tensor decompositions. Found. Comput. Math. 17, 423–465 (2017)

38. Robeva, E.: Orthogonal decomposition of symmetric tensors. SIAM J. Matrix Anal. Appl. 37, 86–102 (2016)

39. Rockafellar, R.T.: Lagrange multipliers and optimality. SIAM Rev. 35, 183–238 (1993)

40. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)

41. Zhang, T., Golub, G.H.: Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl. 23, 534–550 (2001)


Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Grant No. 11771328). The author is very grateful to the anonymous referees for their helpful suggestions and comments in revising this paper.

Author information


Correspondence to Shenglong Hu.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A. Convergence theorem for PAM

Let \(f : \mathbb {R}^{n_1}\times \dots \times \mathbb {R}^{n_k}\rightarrow \mathbb {R}\cup \{+\infty \}\) be a function of the following structure

$$\begin{aligned} f(\mathbf {x})=Q(\mathbf {x}_1,\dots ,\mathbf {x}_k)+\sum _{i=1}^kg_i(\mathbf {x}_i), \end{aligned}$$

where Q is a \(C^1\) (continuously differentiable) function with locally Lipschitz continuous gradient, and \(g_i : \mathbb {R}^{n_i}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a proper lower semicontinuous function for each \(i\in \{1,\dots ,k\}\).

We introduce the following algorithmic scheme to solve the optimization problem

$$\begin{aligned} \min _{\mathbf {x}\in \mathbb {R}^{n_1}\times \dots \times \mathbb {R}^{n_k}}f(\mathbf {x}). \end{aligned}$$

Since f is proper and lower semicontinuous, \(\mathbf {x}\) is an optimizer of this minimization problem only if it is a critical point of f, i.e., \(0\in \partial f(\mathbf {x})\).

Algorithm A.1 (general PAM)

Step 1 can be implemented in several ways. In particular, (36), (37) and (38) are fulfilled if, for all \(j\in \{1,\dots ,k\}\), \(\mathbf {x}^s_j\) is taken as a minimizer of the optimization problem

$$\begin{aligned} \min _{\mathbf {z}\in \mathbb {R}^{n_j}} g_j(\mathbf {z})+Q(\mathbf {x}_1^s,\dots ,\mathbf {x}_{j-1}^s,\mathbf {z},\mathbf {x}_{j+1}^{s-1},\dots ,\mathbf {x}_k^{s-1})+\frac{1}{2}\Vert \mathbf {z}-\mathbf {x}^{s-1}_j\Vert ^2_{P_j}. \end{aligned}$$
(39)
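
As an illustration of this scheme (a toy instance of our own in NumPy, not the paper's Algorithm 4.1; the objective and all parameter names are ours), the following sketch runs PAM with two blocks on a problem for which each subproblem (39) has a closed-form solution; the \(l_1\) block is solved by soft-thresholding, mirroring the closed formulas exploited in the paper.

import numpy as np

rng = np.random.default_rng(1)
b = rng.standard_normal(5)
rho, c = 2.0, 0.5  # coupling weight and proximal parameter (illustrative values)

def soft(v, t):
    # soft-thresholding: the proximal map of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# toy objective: f(x, y) = ||x||_1 + (rho/2)||x - y||^2 + (1/2)||y - b||^2,
# i.e., g_1(x) = ||x||_1, g_2(y) = 0, and Q(x, y) the two quadratic terms
x, y = np.zeros(5), np.zeros(5)
for s in range(200):
    # block 1: argmin_x g_1(x) + Q(x, y) + (c/2)||x - x_old||^2 (closed form)
    x = soft((rho * y + c * x) / (rho + c), 1.0 / (rho + c))
    # block 2: argmin_y Q(x, y) + (c/2)||y - y_old||^2, using the new x
    y = (rho * x + b + c * y) / (rho + 1.0 + c)
print(x)  # at the limit, x = soft(b, (rho + 1) / rho), a sparse shrinkage of b

The Gauss–Seidel order matters: each block update uses the latest value of the other block, in the spirit of the sufficient-decrease inequalities summed in the proof of Proposition 4.2 below.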

We now state the global convergence of Algorithm A.1 for a wide class of objective functions [4, Theorem 6.2].

Theorem A.2

(Proximal Alternating Minimization) Let f be a Kurdyka–Łojasiewicz function that is bounded from below. Let \(\{\mathbf {x}^s\}\) be a sequence produced by Algorithm A.1. If \(\{\mathbf {x}^s\}\) is bounded, then it converges to a critical point of f.

Appendix B. Nonsmooth Lagrange multiplier

The following materials can be found in [40, Chapter 10].

Let \(X\subseteq \mathbb {R}^n\) be nonempty and closed, \(f_0 : \mathbb {R}^n\rightarrow \mathbb {R}\) be locally Lipschitz continuous, \(F : \mathbb {R}^n\rightarrow \mathbb {R}^m\) with \(F:=(f_1,\dots ,f_m)\) and each \(f_i\) locally Lipschitz continuous, and \(\theta : \mathbb {R}^m\rightarrow \mathbb {R}\cup \{\pm \infty \}\) be proper, lower semicontinuous, convex with effective domain D.

Consider the following optimization problem

$$\begin{aligned} \min f_0(\mathbf {x})+\theta (F(\mathbf {x}))\ \text {s.t. }\mathbf {x}\in X. \end{aligned}$$
(40)

If \(\overline{\mathbf {x}}\) is a locally optimal solution to (40) at which the following constraint qualification is satisfied,

$$\begin{aligned} \mathbf {0}\in \partial (\mathbf {y}^\mathsf {T} F)(\overline{\mathbf {x}})+N_X(\mathbf {\overline{x}})\ \text {and }\mathbf {y}\in N_D(F(\mathbf {\overline{x}}))\Longrightarrow \mathbf {y}=\mathbf {0}, \end{aligned}$$
(41)

then there exists a vector \(\overline{\mathbf {y}}\) such that

$$\begin{aligned} \mathbf {0}\in \partial (f_0+\overline{\mathbf {y}}^\mathsf {T} F)(\overline{\mathbf {x}})+N_X(\mathbf {\overline{x}})\ \text {and }\overline{\mathbf {y}}\in \partial \theta (F(\mathbf {\overline{x}})). \end{aligned}$$
(42)

A vector \(\overline{\mathbf {y}}\) satisfying (42) is called a Lagrange multiplier, and the pair \((\overline{\mathbf {x}},\overline{\mathbf {y}})\) satisfying (42) is a Karush–Kuhn–Tucker pair with \(\overline{\mathbf {x}}\) a KKT point. Let \(M(\overline{\mathbf {x}})\) be the set of Lagrange multipliers for a KKT point \(\overline{\mathbf {x}}\). Under the constraint qualification (41), the set \(M(\overline{\mathbf {x}})\) is compact.

A particular case is \(\theta =\delta _{\{\mathbf {0}\}}\), the indicator function of the set \(\{\mathbf {0}\}\subset \mathbb {R}^m\). Then problem (40) reduces to

$$\begin{aligned} \min _{\mathbf {x}\in X} f_0(\mathbf {x})\ \text {s.t. }f_i(\mathbf {x})=0,\ \text {for all }i=1,\dots ,m. \end{aligned}$$
(43)

If each \(f_i\) is continuously differentiable for \(i\in \{1,\dots ,m\}\), then the constraint qualification (41) reduces to

$$\begin{aligned} y_1\nabla f_1(\overline{\mathbf {x}})+\dots +y_m\nabla f_m(\overline{\mathbf {x}})\in N_X(\overline{\mathbf {x}})\Longrightarrow \mathbf {y}=\mathbf {0}. \end{aligned}$$
(44)

This is the basic constraint qualification discussed in [39], an extension of the Mangasarian–Fromovitz constraint qualification [35].

The optimality condition (42) becomes

$$\begin{aligned} \mathbf {0}\in \partial f_0(\overline{\mathbf {x}})+\overline{y}_1\nabla f_1(\overline{\mathbf {x}})+\dots +\overline{y}_m\nabla f_m(\overline{\mathbf {x}})+N_X(\overline{\mathbf {x}}), \end{aligned}$$

or, in a more familiar form,

$$\begin{aligned} -\big (\mathbf {v}+\overline{y}_1\nabla f_1(\overline{\mathbf {x}})+\dots +\overline{y}_m\nabla f_m(\overline{\mathbf {x}})\big )\in N_X(\overline{\mathbf {x}})\ \text {for some }\mathbf {v}\in \partial f_0(\overline{\mathbf {x}}). \end{aligned}$$
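
To make these conditions concrete, consider the following small instance of (43) (an illustrative example of our own, not taken from the paper): take \(X=\mathbb {R}^2\), \(f_0(\mathbf {x})=|x_1|+|x_2|\) and the single constraint \(f_1(\mathbf {x})=x_1+x_2-1=0\). Since \(N_X(\overline{\mathbf {x}})=\{\mathbf {0}\}\), the constraint qualification (44) holds everywhere: \(y(1,1)^\mathsf {T}=\mathbf {0}\) forces \(y=0\). At the minimizer \(\overline{\mathbf {x}}=(1,0)^\mathsf {T}\) we have

$$\begin{aligned} \partial f_0(\overline{\mathbf {x}})=\{1\}\times [-1,1],\qquad \nabla f_1(\overline{\mathbf {x}})=(1,1)^\mathsf {T}, \end{aligned}$$

and the choice \(\overline{y}=-1\), \(\mathbf {v}=(1,1)^\mathsf {T}\in \partial f_0(\overline{\mathbf {x}})\) gives \(\mathbf {v}+\overline{y}\nabla f_1(\overline{\mathbf {x}})=\mathbf {0}\), so \(\overline{y}=-1\) is a Lagrange multiplier in the sense of (42).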

Appendix C. Proof of Proposition 3.3

Proof

It follows from (16) that \(U^{(i)}_s\in \mathbb {O}(n_i)\) for all \(i\in \{1,\dots ,k\}\) and \(s=1,2,\dots \) and hence the sequence \(\{\mathbb {U}_s\}\) is bounded.

Let \(\Xi _s\in \partial L_{\rho _s}(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X}_s)\) be such that \(\Vert \Xi _s\Vert \le \epsilon _s\), which is guaranteed by (17). Thus,

$$\begin{aligned} \Xi _s=\begin{bmatrix}B_s^{(1)}\\ \vdots \\ B_s^{(k)}\\ \mathcal {W}_s \end{bmatrix}+\rho _s\begin{bmatrix}U_s^{(1)}V_s^{(1)}\big [V_s^{(1)}\big ]^\mathsf {T}-\mathcal {B}_s^{({\text {f}},1)}\big [V_s^{(1)}\big ]^\mathsf {T}+\frac{1}{\rho _s}\mathcal {X}_s^{({\text {f}},1)}\big [V_s^{(1)}\big ]^\mathsf {T}\\ \vdots \\ U_s^{(k)}V_s^{(k)}\big [V_s^{(k)}\big ]^\mathsf {T}-\mathcal {B}_s^{({\text {f}},k)}\big [V_s^{(k)}\big ]^\mathsf {T}+\frac{1}{\rho _s}\mathcal {X}_s^{({\text {f}},k)}\big [V_s^{(k)}\big ]^\mathsf {T}\\ \mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\frac{1}{\rho _s}\mathcal {X}_s\end{bmatrix} \end{aligned}$$
(45)

for some \(\mathcal {W}_s\in \partial \Vert \mathcal {B}_s\Vert _1\), and \(B^{(i)}_s\in N_{\mathbb {O}(n_i)}(U^{(i)}_s)\) for all \(i\in \{1,\dots ,k\}\).

It follows from the last row in (45) and (17) that

$$\begin{aligned} \bigg \Vert \rho _s\big (\mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\frac{1}{\rho _s}\mathcal {X}_s\big )+ \mathcal {W}_s\bigg \Vert \le \Vert \Xi _s\Vert \le \epsilon _s. \end{aligned}$$
(46)

By the fact that \(\mathcal {W}_s\) is uniformly bounded (cf. (22)), and \(\epsilon _s\rightarrow 0\), we conclude that \(\rho _s\big (\mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\frac{1}{\rho _s}\mathcal {X}_s\big )\) is bounded. Therefore, the sequence \(\{\mathcal {X}_{s+1}\}\) is bounded by the multiplier update rule (18).

Since \(\mathcal {W}_s\) and \(\mathcal {X}_s\) are both bounded, it follows from (46) that \(\rho _s\big (\mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}\big )\) is bounded. As \(\{\rho _s\}\) is a nondecreasing sequence of positive numbers and \(\{\mathbb {U}_s\}\) is bounded, we must have that the sequence \(\{\mathcal {B}_s\}\) is bounded.

In conclusion, the sequence \(\{(\mathbb {U}_s,\mathcal {B}_s,\mathcal {X}_s)\}\) is bounded.

For the feasibility, note that \(\mathbb {U}_*\) satisfies the orthogonality constraints by (16). The rest of the proof is divided into two parts, according to whether the sequence \(\{\rho _s\}\) is bounded.

Part I. Suppose first that the penalty sequence \(\{\rho _s\}\) is bounded. By the penalty parameter update rule (19), it follows that \(\rho _s\) stabilizes after some \(s_0\), i.e., \(\rho _s=\rho _{s_0}\) for all \(s\ge s_0\). Thus,

$$\begin{aligned} \Vert (U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\mathcal {B}_s\Vert \le \tau \Vert (U_{s-1}^{(1)},\dots ,U_{s-1}^{(k)})\cdot \mathcal {A}-\mathcal {B}_{s-1}\Vert \ \text {for all }s\ge s_0+1. \end{aligned}$$
(47)

The feasibility result then follows from a standard continuity argument.

Part II. In the following, we assume that \(\rho _s\rightarrow \infty \) as \(s\rightarrow \infty \).

Likewise, it follows from the last row in (45) and (17) that

$$\begin{aligned} \bigg \Vert \mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\frac{1}{\rho _s}\mathcal {X}_s+\frac{1}{\rho _s}\mathcal {W}_s\bigg \Vert \le \frac{\Vert \Xi _s\Vert }{\rho _s}\le \frac{\epsilon _s}{\rho _s}. \end{aligned}$$

By the fact that \(\mathcal {W}_s\) and \(\mathcal {X}_s\) are both bounded, \(\rho _s\rightarrow \infty \), and \(\epsilon _s\rightarrow 0\), we have that

$$\begin{aligned} \big \Vert \mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}\big \Vert \rightarrow 0. \end{aligned}$$
(48)

Thus, by continuity, we have that \((\mathbb {U}_*,\mathcal {B}_*)\) is a feasible point.

In the following, we show that \((\mathbb {U}_*,\mathcal {B}_*)\) is a KKT point. Let

$$\begin{aligned} \mathcal {M}_s:=\rho _s\big ((U^{(1)}_s,\dots ,U^{(k)}_s)\cdot \mathcal {A}-\mathcal {B}_s\big ). \end{aligned}$$

It follows from the above analysis that \(\{\mathcal {M}_s\}\) is bounded. By the multiplier update rule (18), the system (45) can be rewritten as

$$\begin{aligned} \Xi _s=\begin{bmatrix}B_s^{(1)}\\ \vdots \\ B_s^{(k)}\\ \mathcal {W}_s \end{bmatrix}+\begin{bmatrix} \big (\mathcal {M}_s+\mathcal {X}_{s}\big )^{({\text {f}},1)}\big [V_s^{(1)}\big ]^\mathsf {T}\\ \vdots \\ \big (\mathcal {M}_s+\mathcal {X}_{s}\big )^{({\text {f}},k)}\big [V_s^{(k)}\big ]^\mathsf {T}\\ -\mathcal {M}_s-\mathcal {X}_{s}\end{bmatrix}. \end{aligned}$$
(49)

The boundedness of \(\{\mathbb {U}_s,\mathcal {B}_s,\mathcal {X}_s,\mathcal {M}_s\}\) and \(\{\Xi _s\}\) implies the boundedness of each \(\{B_s^{(i)}\}\) for all \(i\in \{1,\dots ,k\}\) as well. We assume without loss of generality that

$$\begin{aligned} \{\mathbb {U}_s,\mathcal {B}_s,\mathcal {X}_s,\mathcal {M}_s,\mathcal {W}_s,\mathbb {B}_s\}\rightarrow \{\mathbb {U}_*,\mathcal {B}_*,\mathcal {X}_*,\mathcal {M}_*,\mathcal {W}_*,\mathbb {B}_*\}\ \text {as }s\rightarrow \infty \ \text {and }s\in \mathcal {K} \end{aligned}$$

for an infinite index set \(\mathcal {K}\subseteq \{1,2,\dots \}\), where

$$\begin{aligned} \mathbb {B}_s:=(B^{(1)}_s,\dots ,B^{(k)}_s)\ \text {and }\mathbb {B}_*:=(B^{(1)}_*,\dots ,B^{(k)}_*). \end{aligned}$$

Taking limits on both sides of (49) along \(\mathcal {K}\), we obtain

$$\begin{aligned} \begin{bmatrix}\big (\mathcal {M}_*+\mathcal {X}_*\big )^{({\text {f}},1)}\big [V_*^{(1)}\big ]^\mathsf {T}\\ \vdots \\ \big (\mathcal {M}_*+\mathcal {X}_*\big )^{({\text {f}},k)}\big [V_*^{(k)}\big ]^\mathsf {T}\\ -(\mathcal {M}_*+\mathcal {X}_*)\end{bmatrix}=-\begin{bmatrix}B_*^{(1)}\\ \vdots \\ B_*^{(k)}\\ \mathcal {W}_*\end{bmatrix}, \end{aligned}$$

where \(V^{(i)}_*\) is defined as in (21) with the \(U^{(i)}\)’s replaced by the \(U^{(i)}_*\)’s. By the closedness of subdifferentials, we have

$$\begin{aligned} \mathcal {W}_*\in \partial \Vert \mathcal {B}_*\Vert _1, \end{aligned}$$

and

$$\begin{aligned} B_*^{(i)}\in N_{\mathbb {O}(n_i)}(U^{(i)}_*)\ \text {for all }i\in \{1,\dots ,k\}. \end{aligned}$$

Since each \(N_{\mathbb {O}(n_i)}(U^{(i)}_*)\) is a linear subspace, we have shown that \((\mathbb {U}_*,\mathcal {B}_*)\) is a KKT point of (12) with Lagrange multiplier \(\mathcal {X}_*+\mathcal {M}_*\) (cf. (25)). The proof is complete. \(\square \)
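
The fact invoked in the last step, that each \(N_{\mathbb {O}(n)}(U)\) is a linear subspace, is standard and can be recalled as follows (our own summary): since \(\mathbb {O}(n)\) is a smooth embedded manifold, its limiting normal cone at \(U\) coincides with the normal space of classical differential geometry,

$$\begin{aligned} T_{\mathbb {O}(n)}(U)=\{U\Omega : \Omega ^\mathsf {T}=-\Omega \},\qquad N_{\mathbb {O}(n)}(U)=T_{\mathbb {O}(n)}(U)^{\perp }=\{US : S^\mathsf {T}=S\}, \end{aligned}$$

which is indeed a linear subspace of \(\mathbb {R}^{n\times n}\).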

Appendix D. Proof of Proposition 4.2

Proof

It is known that for all \(i\in \{1,\dots ,k\}\), each orthogonal group \(\mathbb {O}(n_i)\) is an algebraic set, defined by a system of polynomial equations. Therefore, \(\mathbb {O}(n_i)\) is a semi-algebraic set and its indicator function is semi-algebraic [8]. The \(l_1\)-norm \(\Vert \cdot \Vert _1\) is also semi-algebraic. It is also known that every semi-algebraic function is a Kurdyka–Łojasiewicz function (cf. [9, Appendix]). Thus, as a sum of the \(l_1\)-norm, the indicator functions of the orthogonal groups, and polynomials, the augmented Lagrangian function \(L_{\rho }(\cdot ,\cdot ;\mathcal {X})\) is a Kurdyka–Łojasiewicz function.

If the iteration sequence \(\{(\mathbb {U}_s,\mathcal {B}_s)\}\) generated by Algorithm 4.1 is bounded, and the function \(L_{\rho }(\cdot ,\cdot ;\mathcal {X})\) is bounded from below, then the sequence \(\{(\mathbb {U}_s,\mathcal {B}_s)\}\) converges by Theorem A.2.

For any given \(\mathcal {X}\), it follows immediately from (14) that the function \(L_{\rho }(\cdot ,\cdot ;\mathcal {X})\) is bounded from below, since

$$\begin{aligned} L_{\rho }(\mathbb {U},\mathcal {B};\mathcal {X})=\Vert \mathcal {B}\Vert _1+\sum _{i=1}^k\delta _{\mathbb {O}(n_i)}(U^{(i)})+\frac{\rho }{2}\Big \Vert (U^{(1)},\dots ,U^{(k)})\cdot \mathcal {A}-\mathcal {B}+\frac{1}{\rho }\mathcal {X}\Big \Vert ^2-\frac{1}{2\rho }\Vert \mathcal {X}\Vert ^2. \end{aligned}$$
(50)
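
For concreteness, the following self-contained sketch (our own illustration with made-up data; the function names are ours) evaluates (50) for random orthogonal factors and confirms numerically that it is bounded below by \(-\frac{1}{2\rho }\Vert \mathcal {X}\Vert ^2\), uniformly in \((\mathbb {U},\mathcal {B})\).

import numpy as np

def mode_product(T, U, mode):
    # multiply tensor T by matrix U along the given mode
    T = np.moveaxis(T, mode, 0)
    shp = T.shape
    out = (U @ T.reshape(shp[0], -1)).reshape((U.shape[0],) + shp[1:])
    return np.moveaxis(out, 0, mode)

def L_rho(Us, B, X, A, rho):
    # value of (50) on the feasible set, where the indicator terms vanish
    UA = A
    for mode, U in enumerate(Us):
        UA = mode_product(UA, U, mode)
    r = UA - B + X / rho
    return np.abs(B).sum() + 0.5 * rho * np.sum(r * r) - np.sum(X * X) / (2.0 * rho)

rng = np.random.default_rng(2)
n = (3, 3, 3)
A, B, X = (rng.standard_normal(n) for _ in range(3))
Us = [np.linalg.qr(rng.standard_normal((ni, ni)))[0] for ni in n]
rho = 5.0
print(L_rho(Us, B, X, A, rho) >= -np.sum(X * X) / (2.0 * rho))  # True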

In the language of Appendix A, the variable \(\mathcal {B}\) is the \(0\)-th block variable, and \(U^{(j)}\) the \(j\)-th block for \(j\in \{1,\dots ,k\}\). Then,

$$\begin{aligned} g_0(\mathcal {B}):=\Vert \mathcal {B}\Vert _1,\ \text {and }g_j(U^{(j)})=\delta _{\mathbb {O}(n_j)}(U^{(j)})\ \text {for all }j\in \{1,\dots ,k\}, \end{aligned}$$

and the function Q is defined as the remaining smooth part of \(L_\rho \) in (50).

We first show that (38) is satisfied. Indeed, by (26), it holds with \(\alpha _j=\overline{c}\) for all \(j\in \{0,1,\dots ,k\}\).

It follows from (26), (28) and (29) that

$$\begin{aligned}&L_{\rho }(\mathbb {U}_{s-1},\mathcal {B}_s;\mathcal {X})+\frac{c_s^{(0)}}{2}\Vert \mathcal {B}_s-\mathcal {B}_{s-1}\Vert ^2\le L_{\rho }(\mathbb {U}_{s-1},\mathcal {B}_{s-1};\mathcal {X}),\\&\quad L_{\rho }\left( \left( U^{(1)}_s,U^{(2)}_{s-1},\dots ,U^{(k)}_{s-1}\right) ,\mathcal {B}_s;\mathcal {X}\right) +\frac{c_s^{(1)}}{2}\Vert U^{(1)}_s-U^{(1)}_{s-1}\Vert ^2\le L_{\rho }(\mathbb {U}_{s-1},\mathcal {B}_s;\mathcal {X}),\\&\quad \dots \\&\quad L_{\rho }(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X})+\frac{c_s^{(k)}}{2}\Vert U^{(k)}_s-U^{(k)}_{s-1}\Vert ^2\le L_{\rho }\left( \left( U^{(1)}_s,\dots ,U^{(k-1)}_{s},U^{(k)}_{s-1}\right) ,\mathcal {B}_s;\mathcal {X}\right) . \end{aligned}$$

Summing up these inequalities, we have

$$\begin{aligned} L_{\rho }(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X}) +\frac{\underline{c}}{2}\left( \sum _{i=1}^k\Vert U^{(i)}_s-U^{(i)}_{s-1}\Vert ^2 +\Vert \mathcal {B}_s-\mathcal {B}_{s-1}\Vert ^2\right) \le L_{\rho }(\mathbb {U}_{s-1},\mathcal {B}_{s-1};\mathcal {X}). \end{aligned}$$

Therefore, the sequence \(\{L_{\rho }(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X})\}\) monotonically decreases to a finite limit.

On the other hand, since each component matrix of \(\mathbb {U}_s\) is an orthogonal matrix, the sequence \(\{\mathbb {U}_s\}\) is bounded. Suppose that the sequence \(\{\mathcal {B}_s\}\) is unbounded. Then, it follows from (50) that the sequence \(\{L_{\rho }(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X})\}\) should diverge to infinity, which is an immediate contradiction. Thus, the iteration sequence \(\{(\mathbb {U}_s,\mathcal {B}_s)\}\) must be bounded, and hence converges by Theorem A.2.

In the following, we show that \(\Vert \Theta _s\Vert \rightarrow 0\) as \(s\rightarrow \infty \). First of all, we derive an upper bound estimate for \(\Vert V^{(j)}_s-\tilde{V}^{(j)}_s\Vert \) as

$$\begin{aligned}&\Vert V^{(j)}_s-\tilde{V}^{(j)}_s\Vert \\&\quad =\Vert \big [(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k)}_{s-1})\cdot \mathcal {A}\big ]^{({\text {f}},j)}\\&\qquad -\big [(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\big ]^{({\text {f}},j)}\Vert \\&\quad =\Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k)}_{s-1})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\quad \le \Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k)}_{s-1})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\qquad +\Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},U^{(k)}_{s})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\quad =\Vert (U^{(k)}_{s-1}-U^{(k)}_s)\big [(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},I)\cdot \mathcal {A}\big ]^{({\text {f}},k)}\Vert \\&\qquad +\Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},U^{(k)}_{s})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\quad \le \Vert \mathcal {A}\Vert \Vert U^{(k)}_{s-1}-U^{(k)}_s\Vert \\&\qquad +\Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},U^{(k)}_{s})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\quad \le \Vert \mathcal {A}\Vert \big (\Vert U^{(j+1)}_{s-1}-U^{(j+1)}_s\Vert +\dots +\Vert U^{(k)}_{s-1}-U^{(k)}_s\Vert \big )\\&\quad \le \Vert \mathcal {A}\Vert \Vert \mathbb {U}_s-\mathbb {U}_{s-1}\Vert , \end{aligned}$$

where the second inequality follows from the fact that \(U^{(i)}_t\in \mathbb {O}(n_i)\) for all \(i\in \{1,\dots ,k\}\) and \(t=1,2,\dots \), and the third from a standard induction.

Likewise, we have

$$\begin{aligned} \Vert (U^{(1)}_{s-1},\dots ,U^{(k)}_{s-1})\cdot \mathcal {A}-(U^{(1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \le \Vert \mathcal {A}\Vert \Vert \mathbb {U}_s-\mathbb {U}_{s-1}\Vert . \end{aligned}$$

Thus, we have

$$\begin{aligned} \Vert \Theta _s\Vert&\le (\rho +k\Vert \mathcal {X}\Vert +k\rho \Vert \mathcal {B}_s\Vert )\Vert \mathcal {A}\Vert \Vert \mathbb {U}_s -\mathbb {U}_{s-1}\Vert +c_s^{(0)}\Vert \mathcal {B}_s-\mathcal {B}_{s-1}\Vert \\&\quad +\sum _{i=1}^kc_s^{(i)}\Vert U^{(i)}_s-U^{(i)}_{s-1}\Vert \\&\le \big [(\rho +k\Vert \mathcal {X}\Vert +k\rho \Vert \mathcal {B}_s\Vert )\Vert \mathcal {A}\Vert +\overline{c}\big ]\Vert \mathbb {U}_s-\mathbb {U}_{s-1}\Vert +\overline{c}\Vert \mathcal {B}_s-\mathcal {B}_{s-1}\Vert . \end{aligned}$$

Since the iteration sequence \(\{(\mathbb {U}_s,\mathcal {B}_s)\}\) converges, we conclude that \(\Vert \Theta _s\Vert \rightarrow 0\) as \(s\rightarrow \infty \). As \(\epsilon >0\) is a given parameter, Algorithm 4.1 terminates after finitely many iterations. \(\square \)
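
As a numerical sanity check on the Lipschitz-type estimates used above (a sketch of our own, on random data), one can verify the telescoping bound \(\Vert (U^{(1)},\dots ,U^{(k)})\cdot \mathcal {A}-(V^{(1)},\dots ,V^{(k)})\cdot \mathcal {A}\Vert \le \Vert \mathcal {A}\Vert \sum _{i=1}^k\Vert U^{(i)}-V^{(i)}\Vert \) for orthogonal factors:

import numpy as np

def mode_product(T, U, mode):
    # multiply tensor T by matrix U along the given mode
    T = np.moveaxis(T, mode, 0)
    shp = T.shape
    out = (U @ T.reshape(shp[0], -1)).reshape((U.shape[0],) + shp[1:])
    return np.moveaxis(out, 0, mode)

def multilinear(A, Us):
    # (U1, ..., Uk) . A
    for mode, U in enumerate(Us):
        A = mode_product(A, U, mode)
    return A

rng = np.random.default_rng(3)
n = (3, 4, 5)
A = rng.standard_normal(n)
orth = lambda m: np.linalg.qr(rng.standard_normal((m, m)))[0]
Us, Vs = [orth(ni) for ni in n], [orth(ni) for ni in n]
lhs = np.linalg.norm(multilinear(A, Us) - multilinear(A, Vs))
rhs = np.linalg.norm(A) * sum(np.linalg.norm(U - V) for U, V in zip(Us, Vs))
print(lhs <= rhs)  # True; the gap reflects the slack in the triangle inequality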


Cite this article

Hu, S. An inexact augmented Lagrangian method for computing strongly orthogonal decompositions of tensors. Comput Optim Appl 75, 701–737 (2020). https://doi.org/10.1007/s10589-019-00128-3
