An inexact augmented Lagrangian method for computing strongly orthogonal decompositions of tensors

Hu, Shenglong

doi:10.1007/s10589-019-00128-3


Published: 31 August 2019

Volume 75, pages 701–737, (2020)

Computational Optimization and Applications

Shenglong Hu¹


Abstract

A strongly orthogonal decomposition of a tensor is a rank-one tensor decomposition in which, for any two rank-one tensors, the two component vectors in each mode are either collinear or orthogonal. A strongly orthogonal decomposition with a small number of rank-one tensors, which can be represented by a matrix-tensor multiplication with orthogonal factor matrices and a sparse tensor, is favorable in applications; such a decomposition with the minimum number of rank-one tensors is a strongly orthogonal rank decomposition. Any tensor has a strongly orthogonal rank decomposition. In this article, computing a strongly orthogonal rank decomposition is equivalently reformulated as solving an optimization problem. Unlike the usual optimization reformulation of the tensor rank decomposition problem, which is ill-posed, the optimization reformulation of the strongly orthogonal rank decomposition of a tensor is well-posed. Each feasible solution of the optimization problem gives a strongly orthogonal decomposition of the tensor, and a global optimizer gives a strongly orthogonal rank decomposition, which is, however, difficult to compute. An inexact augmented Lagrangian method is proposed to solve the optimization problem. The augmented Lagrangian subproblem is solved by a proximal alternating minimization method, with the advantage that each subproblem has a closed-form solution and the factor matrices remain orthogonal throughout the iteration. Thus, the algorithm can always return a feasible solution, and hence a strongly orthogonal decomposition, for any given tensor. Global convergence of this algorithm to a critical point is established without any further assumption. Extensive numerical experiments show that the proposed algorithm is quite promising in both efficiency and accuracy.


Notes

We can reformulate (23) as an optimization problem with a smooth objective function by moving $\Vert \mathcal {B}\Vert _1$ into the constraints as well. Optimality conditions can then be derived as in [39]. However, it does not seem wise here to destroy the smoothness of the constraints and to take on the heavy task of computing the normal cone of a feasible set whose constraints involve nonsmooth functions.

References

[1] Absil, P.-A., Hosseini, S.: A collection of nonsmooth Riemannian optimization problems. In: Hosseini, S., Mordukhovich, B., Uschmajew, A. (eds.) Nonsmooth Optimization and Its Applications, International Series of Numerical Mathematics, vol. 170, pp. 1–15. Birkhäuser, Cham (2019)
[2] Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)
[3] Anandkumar, A., Ge, R., Hsu, D., Kakade, S.M., Telgarsky, M.: Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. 15, 2773–2832 (2014)
[4] Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)
[5] Bader, B.W., Kolda, T.G.: MATLAB Tensor Toolbox Version 2.6, February 2015. http://www.sandia.gov/~tgkolda/TensorToolbox/
[6] Batselier, K., Liu, H., Wong, N.: A constructive algorithm for decomposing a tensor into a finite sum of orthonormal rank-1 terms. SIAM J. Matrix Anal. Appl. 36, 1315–1337 (2015)
[7] Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Belmont (1982)
[8] Bochnak, J., Coste, M., Roy, M.-F.: Real Algebraic Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete, vol. 36. Springer, Berlin (1998)
[9] Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
[10] Chen, J., Saad, Y.: On the tensor SVD and the optimal low rank orthogonal approximation of tensors. SIAM J. Matrix Anal. Appl. 30, 1709–1734 (2009)
[11] Chen, Y., Ye, Y., Wang, M.: Approximation hardness for a class of sparse optimization problems. J. Mach. Learn. Res. 20, 1–27 (2019)
[12] Comon, P.: MA identification using fourth order cumulants. Signal Process. 26, 381–388 (1992)
[13] Comon, P.: Independent component analysis, a new concept? Signal Process. 36, 287–314 (1994)
[14] De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000)
[15] De Silva, V., Lim, L.-H.: Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl. 30, 1084–1127 (2008)
[16] Donoho, D.L.: For most large underdetermined systems of linear equations the minimal $1$-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 59, 797–829 (2006)
[17] Franc, A.: Etude Algébrique des Multitableaux: Apports de l’Algèbre Tensorielle, Thèse de Doctorat, Spécialité Statistiques. Univ. de Montpellier II, Montpellier (1992)
[18] Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)
[19] Håstad, J.: Tensor rank is NP-complete. J. Algorithms 11, 644–654 (1990)
[20] Hillar, C.J., Lim, L.-H.: Most tensor problems are NP-hard. J. ACM 60(6), 1–39 (2013)
[21] Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, New York (1985)
[22] Hu, S.: Bounds on strongly orthogonal ranks of tensors, revised manuscript (2019)
[23] Hu, S., Li, G.: Convergence rate analysis for the higher order power method in best rank one approximations of tensors. Numer. Math. 140, 993–1031 (2018)
[24] Ishteva, M., Absil, P.-A., Van Dooren, P.: Jacobi algorithm for the best low multilinear rank approximation of symmetric tensors. SIAM J. Matrix Anal. Appl. 34, 651–672 (2013)
[25] Jiang, B., Dai, Y.H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. 153, 535–575 (2015)
[26] Jordan, C.: Essai sur la géométrie à n dimensions. Bull. Soc. Math. 3, 103–174 (1875)
[27] Kolda, T.G.: Orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl. 23, 243–255 (2001)
[28] Kolda, T.G.: A counterexample to the possibility of an extension of the Eckart–Young low-rank approximation theorem for the orthogonal rank tensor decomposition. SIAM J. Matrix Anal. Appl. 24, 762–767 (2003)
[29] Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009)
[30] Kroonenberg, P.M., De Leeuw, J.: Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 45, 69–97 (1980)
[31] Kruskal, J.B.: Three-way array: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl. 18, 95–138 (1977)
[32] Landsberg, J.M.: Tensors: Geometry and Applications, Graduate Studies in Mathematics, vol. 128. AMS, Providence (2012)
[33] Leibovici, D., Sabatier, R.: A singular value decomposition of a $k$-way array for principal component analysis of multiway data, PTA-k. Linear Algebra Appl. 269, 307–329 (1998)
[34] Liu, Y.F., Dai, Y.H., Luo, Z.Q.: On the complexity of leakage interference minimization for interference alignment. In: 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications, pp. 471–475 (2011)
[35] Mangasarian, O.L., Fromovitz, S.: The Fritz–John necessary optimality conditions in the presence of equality and inequality constraints. J. Math. Anal. Appl. 7, 34–47 (1967)
[36] Martin, C.D.M., Van Loan, C.F.: A Jacobi-type method for computing orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl. 30, 1219–1232 (2008)
[37] Nie, J.: Generating polynomials and symmetric tensor decompositions. Found. Comput. Math. 17, 423–465 (2017)
[38] Robeva, E.: Orthogonal decomposition of symmetric tensors. SIAM J. Matrix Anal. Appl. 37, 86–102 (2016)
[39] Rockafellar, R.T.: Lagrange multipliers and optimality. SIAM Rev. 35, 183–238 (1993)
[40] Rockafellar, R.T., Wets, R.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
[41] Zhang, T., Golub, G.H.: Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl. 23, 534–550 (2001)


Acknowledgements

This work is partially supported by National Science Foundation of China (Grant No. 11771328). The author is very grateful to the anonymous referees for their helpful suggestions and comments in revising this paper.

Author information

Authors and Affiliations

Department of Mathematics, School of Science, Hangzhou Dianzi University, Hangzhou, 310018, China
Shenglong Hu


Corresponding author

Correspondence to Shenglong Hu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A. Convergence theorem for PAM

Let $f : \mathbb {R}^{n_1}\times \dots \times \mathbb {R}^{n_k}\rightarrow \mathbb {R}\cup \{+\infty \}$ be a function of the following structure

$$\begin{aligned} f(\mathbf {x})=Q(\mathbf {x}_1,\dots ,\mathbf {x}_k)+\sum _{i=1}^kg_i(\mathbf {x}_i), \end{aligned}$$

where Q is a $C^1$ (continuously differentiable) function with locally Lipschitz continuous gradient, and $g_i : \mathbb {R}^{n_i}\rightarrow \mathbb {R}\cup \{+\infty \}$ is a proper lower semicontinuous function for each $i\in \{1,\dots ,k\}$.

We introduce the following algorithmic scheme to solve the optimization problem

$$\begin{aligned} \min _{\mathbf {x}\in \mathbb {R}^{n_1}\times \dots \times \mathbb {R}^{n_k}}f(\mathbf {x}). \end{aligned}$$

Since f is proper and lower semicontinuous, $\mathbf {x}$ is an optimizer of this minimization problem only if it is a critical point of f, i.e., $0\in \partial f(\mathbf {x})$.

Algorithm A.1

general PAM

Step 1 can be implemented through several methods. In particular, (36), (37) and (38) are fulfilled if for all $j\in \{1,\dots ,k\}$, $\mathbf {x}^s_j$ is taken as a minimizer of the optimization problem

$$\begin{aligned} \min _{\mathbf {z}\in \mathbb {R}^{n_j}} g_j(\mathbf {z})+Q(\mathbf {x}_1^s,\dots ,\mathbf {x}_{j-1}^s,\mathbf {z},\mathbf {x}_{j+1}^{s-1},\dots ,\mathbf {x}_k^{s-1})+\frac{1}{2}\Vert \mathbf {z}-\mathbf {x}^{s-1}_j\Vert _{P_j}^2. \end{aligned}$$

(39)

We now state the global convergence of Algorithm A.1 for a wide class of objective functions [4, Theorem 6.2].

Theorem A.2

(Proximal Alternating Minimization) Let f be a Kurdyka–Łojasiewicz function and bounded from below. Let $\{\mathbf {x}^s\}$ be a sequence produced by Algorithm A.1. If $\{\mathbf {x}^s\}$ is bounded, then it converges to a critical point of f.
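The alternating scheme can be illustrated on a toy problem. The sketch below is not the paper's Algorithm A.1 or Algorithm 4.1; it is a minimal two-block example, under the assumption that each block is updated by exactly minimizing its subproblem plus a proximal term $\frac{c}{2}\Vert \mathbf {z}-\mathbf {x}^{s-1}_j\Vert ^2$. The objective, the names `pam`, `soft_threshold` and `objective`, and the parameters `lam` and `c` are all hypothetical choices made for illustration: $g_1=\lambda \Vert \cdot \Vert _1$, $g_2$ is the indicator of the unit sphere, and $Q(\mathbf {x},\mathbf {y})=\frac{1}{2}\Vert \mathbf {x}-\mathbf {y}\Vert ^2$.

```python
import numpy as np

def soft_threshold(v, t):
    # Closed-form minimizer of t*||z||_1 + 0.5*||z - v||^2.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x, y, lam):
    # f(x, y) = lam*||x||_1 + 0.5*||x - y||^2, with y kept on the unit sphere.
    return lam * np.abs(x).sum() + 0.5 * np.sum((x - y) ** 2)

def pam(x, y, lam=0.1, c=1.0, iters=50):
    # Toy proximal alternating minimization (illustrative assumption,
    # not the paper's algorithm). y must start on the unit sphere.
    history = [objective(x, y, lam)]
    for _ in range(iters):
        # x-block: argmin_z lam*||z||_1 + 0.5*||z - y||^2 + (c/2)*||z - x||^2.
        x = soft_threshold((y + c * x) / (1.0 + c), lam / (1.0 + c))
        # y-block: argmin over ||z|| = 1 of 0.5*||x - z||^2 + (c/2)*||z - y||^2,
        # which maximizes <x + c*y, z> and hence normalizes x + c*y.
        d = x + c * y
        nd = np.linalg.norm(d)
        if nd > 0:
            y = d / nd
        history.append(objective(x, y, lam))
    return x, y, history
```

Both block updates have closed-form solutions and the constrained block stays feasible at every iteration, which mirrors the structure exploited in the paper; the objective values in `history` are nonincreasing because each exact block minimization with a proximal term cannot increase the objective.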

Appendix B. Nonsmooth Lagrange multiplier

The following material can be found in [40, Chapter 10].

Let $X\subseteq \mathbb {R}^n$ be nonempty and closed, $f_0 : \mathbb {R}^n\rightarrow \mathbb {R}$ be locally Lipschitz continuous, $F : \mathbb {R}^n\rightarrow \mathbb {R}^m$ with $F:=(f_1,\dots ,f_m)$ and each $f_i$ locally Lipschitz continuous, and $\theta : \mathbb {R}^m\rightarrow \mathbb {R}\cup \{\pm \infty \}$ be proper, lower semicontinuous, convex with effective domain D.

Consider the following optimization problem

$$\begin{aligned} \min f_0(\mathbf {x})+\theta (F(\mathbf {x}))\ \text {s.t. }\mathbf {x}\in X. \end{aligned}$$

(40)

If $\overline{\mathbf {x}}$ is a locally optimal solution to (40) at which the following constraint qualification is satisfied

$$\begin{aligned} \mathbf {0}\in \partial (\mathbf {y}^\mathsf {T} F)(\overline{\mathbf {x}})+N_X(\mathbf {\overline{x}})\ \text {and }\mathbf {y}\in N_D(F(\mathbf {\overline{x}}))\Longrightarrow \mathbf {y}=\mathbf {0}, \end{aligned}$$

(41)

then there exists a vector $\overline{\mathbf {y}}$ such that

$$\begin{aligned} \mathbf {0}\in \partial (f_0+\overline{\mathbf {y}}^\mathsf {T} F)(\overline{\mathbf {x}})+N_X(\mathbf {\overline{x}})\ \text {and }\overline{\mathbf {y}}\in \partial \theta (F(\mathbf {\overline{x}})). \end{aligned}$$

(42)

A vector $\overline{\mathbf {y}}$ satisfying (42) is called a Lagrange multiplier, and the pair $(\overline{\mathbf {x}},\overline{\mathbf {y}})$ satisfying (42) is a Karush–Kuhn–Tucker pair with $\overline{\mathbf {x}}$ a KKT point. Let $M(\overline{\mathbf {x}})$ be the set of Lagrange multipliers for a KKT point $\overline{\mathbf {x}}$. Under the constraint qualification (41), the set $M(\overline{\mathbf {x}})$ is compact.

A particular case is $\theta =\delta _{\{\mathbf {0}\}}$, the indicator function of the set $\{\mathbf {0}\}\subset \mathbb {R}^m$. Then problem (40) reduces to

$$\begin{aligned} \min _{\mathbf {x}\in X} f_0(\mathbf {x})\ \text {s.t. }f_i(\mathbf {x})=0,\ \text {for all }i=1,\dots ,m. \end{aligned}$$

(43)

If each $f_i$ is continuously differentiable for $i\in \{1,\dots ,m\}$, then the constraint qualification is

$$\begin{aligned} y_1\nabla f_1(\overline{\mathbf {x}})+\dots +y_m\nabla f_m(\overline{\mathbf {x}})\in N_X(\overline{\mathbf {x}})\Longrightarrow \mathbf {y}=\mathbf {0}. \end{aligned}$$

(44)

It is the basic constraint qualification discussed in [39], an extension of the Mangasarian–Fromovitz constraint qualification [35].

The optimality condition (42) becomes

$$\begin{aligned} \overline{y}_1\nabla f_1(\overline{\mathbf {x}})+\dots +\overline{y}_m\nabla f_m(\overline{\mathbf {x}})\in \partial f_0(\overline{\mathbf {x}})+ N_X(\overline{\mathbf {x}}), \end{aligned}$$

or in a more familiar form as

$$\begin{aligned} \mathbf {v}+y_1\nabla f_1(\overline{\mathbf {x}})+\dots +y_m\nabla f_m(\overline{\mathbf {x}})\in N_X(\overline{\mathbf {x}})\ \text {for some }\mathbf {v}\in \partial f_0(\overline{\mathbf {x}}). \end{aligned}$$
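As a small worked illustration (not taken from the paper), take $n=2$, $m=1$, $X=\mathbb {R}^2$, $f_0(\mathbf {x})=\Vert \mathbf {x}\Vert _1$ and $f_1(\mathbf {x})=x_1+x_2-1$, so that (43) reads $\min |x_1|+|x_2|$ subject to $x_1+x_2=1$. Since $X=\mathbb {R}^2$, we have $N_X(\overline{\mathbf {x}})=\{\mathbf {0}\}$, and the constraint qualification (44) holds at every point because

$$\begin{aligned} y_1\nabla f_1(\overline{\mathbf {x}})=y_1(1,1)^\mathsf {T}=\mathbf {0}\Longrightarrow y_1=0. \end{aligned}$$

At the optimal point $\overline{\mathbf {x}}=(1,0)^\mathsf {T}$, the subdifferential is $\partial f_0(\overline{\mathbf {x}})=\{1\}\times [-1,1]$, and condition (42) becomes

$$\begin{aligned} \mathbf {0}\in \{1\}\times [-1,1]+\overline{y}_1(1,1)^\mathsf {T}, \end{aligned}$$

which holds exactly for $\overline{y}_1=-1$ (take the element $(1,1)^\mathsf {T}$ of $\partial f_0(\overline{\mathbf {x}})$). Hence $M(\overline{\mathbf {x}})=\{-1\}$, which is compact, as expected under the constraint qualification.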

Appendix C. Proof of Proposition 3.3

Proof

It follows from (16) that $U^{(i)}_s\in \mathbb {O}(n_i)$ for all $i\in \{1,\dots ,k\}$ and $s=1,2,\dots $ and hence the sequence $\{\mathbb {U}_s\}$ is bounded.

Let $\Xi _s\in \partial L_{\rho _s}(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X}_s)$ be such that $\Vert \Xi _s\Vert \le \epsilon _s$ which is guaranteed by (17). Thus,

$$\begin{aligned} \Xi _s=\begin{bmatrix}B_s^{(1)}\\ \vdots \\ B_s^{(k)}\\ \mathcal {W}_s \end{bmatrix}+\rho _s\begin{bmatrix}U_s^{(1)}V_s^{(1)}\big [V_s^{(1)}\big ]^\mathsf {T}-\mathcal {B}_s^{({\text {f}},1)}\big [V_s^{(1)}\big ]^\mathsf {T}+\frac{1}{\rho _s}\mathcal {X}_s^{({\text {f}},1)}\big [V_s^{(1)}\big ]^\mathsf {T}\\ \vdots \\ U_s^{(k)}V_s^{(k)}\big [V_s^{(k)}\big ]^\mathsf {T}-\mathcal {B}_s^{({\text {f}},k)}\big [V_s^{(k)}\big ]^\mathsf {T}+\frac{1}{\rho _s}\mathcal {X}_s^{({\text {f}},k)}\big [V_s^{(k)}\big ]^\mathsf {T}\\ \mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\frac{1}{\rho _s}\mathcal {X}_s\end{bmatrix} \end{aligned}$$

(45)

for some $\mathcal {W}_s\in \partial \Vert \mathcal {B}_s\Vert _1$, and $B^{(i)}_s\in N_{\mathbb {O}(n_i)}(U^{(i)}_s)$ for all $i\in \{1,\dots ,k\}$.

It follows from the last row in (45) and (17) that

$$\begin{aligned} \bigg \Vert \rho _s\big (\mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\frac{1}{\rho _s}\mathcal {X}_s\big )+ \mathcal {W}_s\bigg \Vert \le \Vert \Xi _s\Vert \le \epsilon _s. \end{aligned}$$

(46)

By the fact that $\mathcal {W}_s$ is uniformly bounded (cf. (22)), and $\epsilon _s\rightarrow 0$, we conclude that $\rho _s\big (\mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\frac{1}{\rho _s}\mathcal {X}_s\big )$ is bounded. Therefore, the sequence $\{\mathcal {X}_{s+1}\}$ is bounded by the multiplier update rule (18).

Since $\mathcal {W}_s$ and $\mathcal {X}_s$ are both bounded, it follows from (46) that $\rho _s\big (\mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}\big )$ is bounded. As $\{\rho _s\}$ is a nondecreasing sequence of positive numbers and $\{\mathbb {U}_s\}$ is bounded, we must have that the sequence $\{\mathcal {B}_s\}$ is bounded.

In conclusion, the sequence $\{\mathbb {U}_s,\mathcal {B}_s,\mathcal {X}_s\}$ is bounded.

For the feasibility, note that $\mathbb {U}_*$ satisfies the orthogonality constraints by (16). The rest of the proof is divided into two parts, according to whether the sequence $\{\rho _s\}$ is bounded.

Part I. Suppose first that the penalty sequence $\{\rho _s\}$ is bounded. By the penalty parameter update rule (19), it follows that $\rho _s$ stabilizes after some $s_0$, i.e., $\rho _s=\rho _{s_0}$ for all $s\ge s_0$. Thus,

$$\begin{aligned} \Vert (U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\mathcal {B}_s\Vert \le \tau \Vert (U_{s-1}^{(1)},\dots ,U_{s-1}^{(k)})\cdot \mathcal {A}-\mathcal {B}_{s-1}\Vert \ \text {for all }s\ge s_0+1. \end{aligned}$$

(47)

The feasibility result then follows from a standard continuity argument.

Part II. In the following, we assume that $\rho _s\rightarrow \infty $ as $s\rightarrow \infty $.

Likewise, it follows from the last row in (45) and (17) that

$$\begin{aligned} \bigg \Vert \mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}-\frac{1}{\rho _s}\mathcal {X}_s+\frac{1}{\rho _s}\mathcal {W}_s\bigg \Vert \le \frac{\Vert \Xi _s\Vert }{\rho _s}\le \frac{\epsilon _s}{\rho _s}. \end{aligned}$$

By the fact that $\mathcal {W}_s$ and $\mathcal {X}_s$ are both bounded, $\rho _s\rightarrow \infty $, and $\epsilon _s\rightarrow 0$, we have that

$$\begin{aligned} \big \Vert \mathcal {B}_s-(U_s^{(1)},\dots ,U_s^{(k)})\cdot \mathcal {A}\big \Vert \rightarrow 0. \end{aligned}$$

(48)

Thus, by continuity, we have that $(\mathbb {U}_*,\mathcal {B}_*)$ is a feasible point.

In the following, we show that $(\mathbb {U}_*,\mathcal {B}_*)$ is a KKT point. Let

$$\begin{aligned} \mathcal {M}_s:=\rho _s\big ((U^{(1)}_s,\dots ,U^{(k)}_s)\cdot \mathcal {A}-\mathcal {B}_s\big ). \end{aligned}$$

It follows from the above analysis that $\{\mathcal {M}_s\}$ is bounded. By the multiplier update rule (18), the system (45) can be rewritten as

$$\begin{aligned} \Xi _s=\begin{bmatrix}B_s^{(1)}\\ \vdots \\ B_s^{(k)}\\ \mathcal {W}_s \end{bmatrix}+\begin{bmatrix} \big (\mathcal {M}_s+\mathcal {X}_{s}\big )^{({\text {f}},1)}\big [V_s^{(1)}\big ]^\mathsf {T}\\ \vdots \\ \big (\mathcal {M}_s+\mathcal {X}_{s}\big )^{({\text {f}},k)}\big [V_s^{(k)}\big ]^\mathsf {T}\\ -\mathcal {M}_s-\mathcal {X}_{s}\end{bmatrix}. \end{aligned}$$

(49)

The boundedness of $\{\mathbb {U}_s,\mathcal {B}_s,\mathcal {X}_s,\mathcal {M}_s\}$ and $\{\Xi _s\}$ implies the boundedness of each $\{B_s^{(i)}\}$ for all $i\in \{1,\dots ,k\}$ as well. We assume without loss of generality that

$$\begin{aligned} \{\mathbb {U}_s,\mathcal {B}_s,\mathcal {X}_s,\mathcal {M}_s,\mathcal {W}_s,\mathbb {B}_s\}\rightarrow \{\mathbb {U}_*,\mathcal {B}_*,\mathcal {X}_*,\mathcal {M}_*,\mathcal {W}_*,\mathbb {B}_*\}\ \text {as }s\rightarrow \infty \ \text {and }s\in \mathcal {K} \end{aligned}$$

for an infinite index set $\mathcal {K}\subseteq \{1,2,\dots \}$, where

$$\begin{aligned} \mathbb {B}_s:=(B^{(1)}_s,\dots ,B^{(k)}_s)\ \text {and }\mathbb {B}_*:=(B^{(1)}_*,\dots ,B^{(k)}_*). \end{aligned}$$

Taking limits on both sides of (49) within $\mathcal {K}$, we then have

$$\begin{aligned} \begin{bmatrix}\big (\mathcal {M}_*+\mathcal {X}_*\big )^{({\text {f}},1)}\big [V_*^{(1)}\big ]^\mathsf {T}\\ \vdots \\ \big (\mathcal {M}_*+\mathcal {X}_*\big )^{({\text {f}},k)}\big [V_*^{(k)}\big ]^\mathsf {T}\\ -(\mathcal {M}_*+\mathcal {X}_*)\end{bmatrix}=-\begin{bmatrix}B_*^{(1)}\\ \vdots \\ B_*^{(k)}\\ \mathcal {W}_*\end{bmatrix}, \end{aligned}$$

where $V^{(i)}_*$ is defined as in (21) with the $U^{(i)}$’s replaced by the $U^{(i)}_*$’s. By the closedness of subdifferentials, we have

$$\begin{aligned} \mathcal {W}_*\in \partial \Vert \mathcal {B}_*\Vert _1, \end{aligned}$$

and

$$\begin{aligned} B_*^{(i)}\in N_{\mathbb {O}(n_i)}(U^{(i)}_*)\ \text {for all }i\in \{1,\dots ,k\}. \end{aligned}$$

Since each $N_{\mathbb {O}(n_i)}(U^{(i)}_*)$ is a linear subspace, we have shown that $(\mathbb {U}_*,\mathcal {B}_*)$ is a KKT point of (12) with Lagrange multiplier $\mathcal {X}_*+\mathcal {M}_*$ (cf. (25)). The proof is complete. $\square $
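The last step relies on $N_{\mathbb {O}(n)}(U)$ being a linear subspace, so that $-B_*^{(i)}$ is again a normal vector. A standard description, stated here as background and not quoted from the paper, is that the tangent space of $\mathbb {O}(n)$ at $U$ is $\{U\Omega : \Omega ^\mathsf {T}=-\Omega \}$ and the normal space is $\{US : S^\mathsf {T}=S\}$. The following sketch checks this splitting numerically; the function names are hypothetical.

```python
import numpy as np

def random_orthogonal(n, rng):
    # Orthogonal factor of a QR factorization of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def normal_component(U, G):
    # Projection of G onto the normal space {U S : S symmetric}.
    S = 0.5 * (U.T @ G + G.T @ U)
    return U @ S

def tangent_component(U, G):
    # Projection of G onto the tangent space {U W : W skew-symmetric}.
    W = 0.5 * (U.T @ G - G.T @ U)
    return U @ W
```

For any matrix `G`, the two components add up to `G` and are orthogonal in the Frobenius inner product, and negating a normal vector leaves it in the normal space.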

Appendix D. Proof of Proposition 4.2

Proof

It is known that for all $i\in \{1,\dots ,k\}$ each orthogonal group $\mathbb {O}(n_i)$ is an algebraic set, defined by a system of polynomial equations. Therefore, $\mathbb {O}(n_i)$ is a semi-algebraic set and its indicator function is semi-algebraic [8]. The $l_1$-norm $\Vert \cdot \Vert _1$ is also semi-algebraic. It is also known that each semi-algebraic function is a Kurdyka–Łojasiewicz function (cf. [9, Appendix]). Thus, as a sum of the $l_1$-norm, the indicator functions of the orthogonal groups, and polynomials, the augmented Lagrangian function $L_{\rho }(\cdot ,\cdot ;\mathcal {X})$ is a Kurdyka–Łojasiewicz function.

If the iteration sequence $\{(\mathbb {U}_s,\mathcal {B}_s)\}$ generated by Algorithm 4.1 is bounded, and the function $L_{\rho }(\cdot ,\cdot ;\mathcal {X})$ is bounded from below, then the sequence $\{(\mathbb {U}_s,\mathcal {B}_s)\}$ converges by Theorem A.2.

For any given $\mathcal {X}$, it follows immediately from (14) that the function $L_{\rho }(\cdot ,\cdot ;\mathcal {X})$ is bounded from below, since

$$\begin{aligned} L_{\rho }(\mathbb {U},\mathcal {B};\mathcal {X})=\Vert \mathcal {B}\Vert _1+\sum _{i=1}^k\delta _{\mathbb {O}(n_i)}(U^{(i)})+\frac{\rho }{2}\Big \Vert (U^{(1)},\dots ,U^{(k)})\cdot \mathcal {A}-\mathcal {B}+\frac{1}{\rho }\mathcal {X}\Big \Vert ^2-\frac{1}{2\rho }\Vert \mathcal {X}\Vert ^2. \end{aligned}$$

(50)
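The lower bound $L_{\rho }\ge -\frac{1}{2\rho }\Vert \mathcal {X}\Vert ^2$ can be read off from (50) because every other term is nonnegative on the feasible orthogonal factors. The sketch below checks the completing-the-square identity behind (50) numerically. It assumes that the augmented Lagrangian has the usual form $\Vert \mathcal {B}\Vert _1+\langle \mathcal {X},\mathcal {T}-\mathcal {B}\rangle +\frac{\rho }{2}\Vert \mathcal {T}-\mathcal {B}\Vert ^2$ with $\mathcal {T}=(U^{(1)},\dots ,U^{(k)})\cdot \mathcal {A}$; the exact form of (14) is not reproduced in this excerpt, so this is an assumption, and the function names are hypothetical.

```python
import numpy as np

def al_linear_form(B, T, X, rho):
    # Assumed standard form: ||B||_1 + <X, T - B> + (rho/2)*||T - B||^2.
    R = T - B
    return np.abs(B).sum() + np.sum(X * R) + 0.5 * rho * np.sum(R ** 2)

def al_completed_square(B, T, X, rho):
    # Form (50): ||B||_1 + (rho/2)*||T - B + X/rho||^2 - ||X||^2/(2*rho).
    R = T - B
    return (np.abs(B).sum()
            + 0.5 * rho * np.sum((R + X / rho) ** 2)
            - np.sum(X ** 2) / (2.0 * rho))
```

The two expressions agree for any arrays of matching shape, and the second one makes the bound by $-\frac{1}{2\rho }\Vert \mathcal {X}\Vert ^2$ visible.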

In the language of Appendix A, the variable $\mathcal {B}$ refers to the $j=0$-th block variable, and $U^{(j)}$ the j-th block for $j\in \{1,\dots ,k\}$. Then,

$$\begin{aligned} g_0(\mathcal {B}):=\Vert \mathcal {B}\Vert _1,\ \text {and }g_j(U^{(j)})=\delta _{\mathbb {O}(n_j)}(U^{(j)})\ \text {for all }j\in \{1,\dots ,k\}, \end{aligned}$$

and the function Q is the remaining smooth part of $L_\rho $ in (50).
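With $g_j=\delta _{\mathbb {O}(n_j)}$, a proximal step on a $U$-block whose smooth part is linear in $U$ reduces to a projection onto the orthogonal group, which has a closed form through the singular value decomposition (the orthogonal Procrustes problem). The sketch below illustrates this standard fact; whether the updates (26), (28) and (29) of the paper take exactly this form is not shown in this excerpt, and the names `project_orthogonal` and `prox_step_orthogonal` are hypothetical.

```python
import numpy as np

def project_orthogonal(M):
    # Nearest orthogonal matrix to M in Frobenius norm:
    # if M = P diag(s) Q^T is an SVD, the projection is P Q^T.
    P, _, Qt = np.linalg.svd(M)
    return P @ Qt

def prox_objective(U, G, U_prev, c):
    # -<G, U> + (c/2)*||U - U_prev||^2, evaluated at an orthogonal U.
    return -np.sum(G * U) + 0.5 * c * np.sum((U - U_prev) ** 2)

def prox_step_orthogonal(G, U_prev, c):
    # argmin over orthogonal U of prox_objective. Since ||U||^2 = n is
    # constant on O(n), this maximizes <G + c*U_prev, U>.
    return project_orthogonal(G + c * U_prev)
```

The update returns an orthogonal matrix by construction, so the factor stays feasible without any separate re-orthogonalization.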

We first show that (38) is satisfied. By (26), we know that (38) is satisfied by $\alpha _j=\overline{c}$ for all $j\in \{0,1,\dots ,k\}$.

It follows from (26), (28) and (29) that

$$\begin{aligned}&L_{\rho }(\mathbb {U}_{s-1},\mathcal {B}_s;\mathcal {X})+\frac{c_s^{(0)}}{2}\Vert \mathcal {B}_s-\mathcal {B}_{s-1}\Vert ^2\le L_{\rho }(\mathbb {U}_{s-1},\mathcal {B}_{s-1};\mathcal {X}),\\&\quad L_{\rho }\left( \left( U^{(1)}_s,U^{(2)}_{s-1},\dots ,U^{(k)}_{s-1}\right) ,\mathcal {B}_s;\mathcal {X}\right) +\frac{c_s^{(1)}}{2}\Vert U^{(1)}_s-U^{(1)}_{s-1}\Vert ^2\le L_{\rho }(\mathbb {U}_{s-1},\mathcal {B}_s;\mathcal {X}),\\&\quad \dots \\&\quad L_{\rho }(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X})+\frac{c_s^{(k)}}{2}\Vert U^{(k)}_s-U^{(k)}_{s-1}\Vert ^2\le L_{\rho }\left( \left( U^{(1)}_s,\dots ,U^{(k-1)}_{s},U^{(k)}_{s-1}\right) ,\mathcal {B}_s;\mathcal {X}\right) . \end{aligned}$$

Summing up these inequalities, we have

$$\begin{aligned} L_{\rho }(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X}) +\frac{\underline{c}}{2}\left( \sum _{i=1}^k\Vert U^{(i)}_s-U^{(i)}_{s-1}\Vert ^2 +\Vert \mathcal {B}_s-\mathcal {B}_{s-1}\Vert ^2\right) \le L_{\rho }(\mathbb {U}_{s-1},\mathcal {B}_{s-1};\mathcal {X}). \end{aligned}$$

Therefore, the sequence $\{L_{\rho }(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X})\}$ monotonically decreases to a finite limit.

On the other hand, since each component matrix of $\mathbb {U}_s$ is an orthogonal matrix, the sequence $\{\mathbb {U}_s\}$ is bounded. Suppose that the sequence $\{\mathcal {B}_s\}$ is unbounded. Then, it follows from (50) that the sequence $\{L_{\rho }(\mathbb {U}_s,\mathcal {B}_s;\mathcal {X})\}$ should diverge to infinity, which is an immediate contradiction. Thus, the iteration sequence $\{(\mathbb {U}_s,\mathcal {B}_s)\}$ must be bounded, and hence converges by Theorem A.2.

In the following, we show that $\Vert \Theta _s\Vert \rightarrow 0$ as $s\rightarrow \infty $. First of all, we derive an upper bound estimate for $\Vert V^{(j)}_s-\tilde{V}^{(j)}_s\Vert $ as

$$\begin{aligned}&\Vert V^{(j)}_s-\tilde{V}^{(j)}_s\Vert \\&\quad =\Vert \big [(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k)}_{s-1})\cdot \mathcal {A}\big ]^{({\text {f}},j)}\\&\qquad -\big [(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\big ]^{({\text {f}},j)}\Vert \\&\quad =\Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k)}_{s-1})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\quad \le \Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k)}_{s-1})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\qquad +\Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},U^{(k)}_{s})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\quad =\Vert (U^{(k)}_{s-1}-U^{(k)}_s)\big [(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},I)\cdot \mathcal {A}\big ]^{({\text {f}},k)}\Vert \\&\qquad +\Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},U^{(k)}_{s})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\quad \le \Vert \mathcal {A}\Vert \Vert U^{(k)}_{s-1}-U^{(k)}_s\Vert \\&\qquad +\Vert (U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s-1},\dots ,U^{(k-1)}_{s-1},U^{(k)}_{s})\cdot \mathcal {A}\\&\qquad -(U^{(1)}_s,\dots ,U^{(j-1)}_s,I,U^{(j+1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \\&\quad \le \Vert \mathcal {A}\Vert \big (\Vert U^{(j+1)}_{s-1}-U^{(j+1)}_s\Vert +\dots +\Vert U^{(k)}_{s-1}-U^{(k)}_s\Vert \big )\\&\quad \le \Vert \mathcal {A}\Vert \Vert \mathbb {U}_s-\mathbb {U}_{s-1}\Vert , \end{aligned}$$

where the second inequality follows from the fact that $U^{(i)}_t\in \mathbb {O}(n_i)$ for all $i\in \{1,\dots ,k\}$ and $t=1,2,\dots $, and the third from a standard induction.
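The telescoping argument behind this estimate can be checked numerically for a third-order tensor. The sketch below assumes that $(U^{(1)},U^{(2)},U^{(3)})\cdot \mathcal {A}$ contracts the second index of each $U^{(i)}$ with the $i$-th index of $\mathcal {A}$, and it verifies the bound with the sum of spectral norms of the factor differences, which is the form produced directly by changing one factor at a time. The function names are hypothetical.

```python
import numpy as np

def random_orthogonal(n, rng):
    # Orthogonal factor of a QR factorization of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def multi_mode_product(mats, A):
    # (U1, U2, U3) . A for a third-order tensor A (assumed convention).
    U1, U2, U3 = mats
    return np.einsum('ia,jb,kc,abc->ijk', U1, U2, U3, A)

def perturbation_bound(Us, Vs, A):
    # Left side: ||(U1,U2,U3).A - (V1,V2,V3).A|| in the Frobenius norm.
    # Right side: ||A|| * sum_i ||U_i - V_i||_2, obtained by telescoping
    # and using that orthogonal factors preserve the Frobenius norm.
    lhs = np.linalg.norm(multi_mode_product(Us, A) - multi_mode_product(Vs, A))
    rhs = np.linalg.norm(A) * sum(np.linalg.norm(U - V, 2) for U, V in zip(Us, Vs))
    return lhs, rhs
```

Calling `perturbation_bound` on two tuples of orthogonal factors returns the two sides of the inequality; the left side never exceeds the right side, and multiplying by orthogonal factors leaves the norm of the tensor unchanged.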

Likewise, we have

$$\begin{aligned} \Vert (U^{(1)}_{s-1},\dots ,U^{(k)}_{s-1})\cdot \mathcal {A}-(U^{(1)}_{s},\dots ,U^{(k)}_{s})\cdot \mathcal {A}\Vert \le \Vert \mathcal {A}\Vert \Vert \mathbb {U}_s-\mathbb {U}_{s-1}\Vert . \end{aligned}$$

Thus, we have

$$\begin{aligned} \Vert \Theta _s\Vert&\le (\rho +k\Vert \mathcal {X}\Vert +k\rho \Vert \mathcal {B}_s\Vert )\Vert \mathcal {A}\Vert \Vert \mathbb {U}_s -\mathbb {U}_{s-1}\Vert +c_s^{(0)}\Vert \mathcal {B}_s-\mathcal {B}_{s-1}\Vert \\&\quad +\sum _{i=1}^kc_s^{(i)}\Vert U^{(i)}_s-U^{(i)}_{s-1}\Vert \\&\le \big [(\rho +k\Vert \mathcal {X}\Vert +k\rho \Vert \mathcal {B}_s\Vert )\Vert \mathcal {A}\Vert +\overline{c}\big ]\Vert \mathbb {U}_s-\mathbb {U}_{s-1}\Vert +\overline{c}\Vert \mathcal {B}_s-\mathcal {B}_{s-1}\Vert . \end{aligned}$$

Since the iteration sequence $\{(\mathbb {U}_s,\mathcal {B}_s)\}$ converges, we conclude that $\Vert \Theta _s\Vert \rightarrow 0$ as $s\rightarrow \infty $. As $\epsilon >0$ is a given parameter, Algorithm 4.1 terminates after finitely many iterations. $\square $

About this article

Cite this article

Hu, S. An inexact augmented Lagrangian method for computing strongly orthogonal decompositions of tensors. Comput Optim Appl 75, 701–737 (2020). https://doi.org/10.1007/s10589-019-00128-3


Received: 26 August 2018
Published: 31 August 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s10589-019-00128-3
