Abstract
Let \( * \) denote the t-product between two third-order tensors proposed by Kilmer and Martin (Linear Algebra Appl 435(3): 641–658, 2011). The purpose of this work is to study fundamental computation over the set \( \textrm{St}\left( n,p,l\right) := \{\mathcal {X} \in \mathbb R^{n\times p \times l} \mid \mathcal {X} ^{\top } * \mathcal {X} = \mathcal I \}\), where \(\mathcal {X} \) is a third-order tensor of size \(n\times p \times l\) (\(n\geqslant p\)) and \({\mathcal {I}}\) is the identity tensor. It is first verified that \( \textrm{St}\left( n,p,l\right) \) endowed with the Euclidean metric forms a Riemannian manifold, which is termed the (third-order) tensor Stiefel manifold in this work. We then derive the tangent space, Riemannian gradient, and Riemannian Hessian on \( \textrm{St}\left( n,p,l\right) \). In addition, formulas for various retractions based on the t-QR decomposition, the t-polar decomposition, the t-Cayley transform, and the t-exponential, as well as for vector transports, are presented. It is expected that, analogous to their matrix counterparts, the derived formulas may serve as building blocks for analyzing optimization problems over the tensor Stiefel manifold and for designing Riemannian algorithms.
Data Availability
We do not analyze or generate any datasets, because our work proceeds within a theoretical and mathematical approach.
Notes
For the QR factorization of a complex matrix, the R factor can be chosen to be upper triangular with real nonzero diagonal entries.
References
Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)
Comon, P.: Tensors: a brief introduction. IEEE Signal Process. Mag. 31(3), 44–53 (2014)
Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., Phan, H.A.: Tensor decompositions for signal processing applications: from two-way to multiway component analysis. IEEE Signal Process. Mag. 32(2), 145–163 (2015)
Sidiropoulos, N., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65(13), 3551–3582 (2017)
Braman, K.: Third-order tensors as linear operators on a space of matrices. Linear Algebra Appl. 433(7), 1241–1253 (2010)
Kilmer, M.E., Martin, C.D.: Factorization strategies for third-order tensors. Linear Algebra Appl. 435(3), 641–658 (2011)
Kilmer, M.E., Braman, K., Hao, N., Hoover, R.C.: Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 34(1), 148–172 (2013)
Lu, C., Feng, J., Chen, Y., Liu, W., Lin, Z., Yan, S.: Tensor robust principal component analysis with a new tensor nuclear norm. IEEE Trans. Pattern Anal. Mach. Intell. 42(4), 925–938 (2019)
Miao, Y., Qi, L., Wei, Y.: T-Jordan canonical form and t-Drazin inverse based on the t-product. Commun. Appl. Math. Comput. Sci. 3(2), 201–220 (2021)
Lund, K.: The tensor t-function: a definition for functions of third-order tensors. Numer. Linear Algebra Appl. 27(3), e2288 (2020)
Miao, Y., Qi, L., Wei, Y.: Generalized tensor function via the tensor singular value decomposition based on the T-product. Linear Algebra Appl. 590, 258–303 (2020)
Liu, W.H., Jin, X.Q.: A study on T-eigenvalues of third-order tensors. Linear Algebra Appl. 612, 357–374 (2020)
Zheng, M.M., Huang, Z.H., Wang, Y.: T-positive semidefiniteness of third-order symmetric tensors and T-semidefinite programming. Comput. Optim. Appl. 78(1), 239–272 (2021)
Qi, L., Luo, Z.: Tubal matrices (2021). arXiv:2105.00793
Huang, W., Absil, P.A., Gallivan, K.A.: A Riemannian BFGS method without differentiated retraction for nonconvex optimization problems. SIAM J. Optim. 28(1), 470–495 (2018)
Hu, J., Jiang, B., Lin, L., Wen, Z., Yuan, Y.X.: Structured quasi-Newton methods for optimization with orthogonality constraints. SIAM J. Sci. Comput. 41(4), A2239–A2269 (2019)
Chen, S., Ma, S., So, A.M.C., Zhang, T.: Proximal gradient method for nonsmooth optimization over the Stiefel manifold. SIAM J. Optim. 30(1), 210–239 (2020)
Huang, W., Wei, K.: Riemannian proximal gradient methods. Math. Program. 194, 371–413 (2022)
Gao, B., Liu, X., Chen, X., Yuan, Y.X.: A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. 28(1), 302–332 (2018)
Hu, J., Liu, X., Wen, Z.W., Yuan, Y.X.: A brief introduction to manifold optimization. J. Oper. Res. Soc. China 8(2), 199–248 (2020)
Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2009)
Tu, L.W.: An Introduction to Manifolds, 2nd edn. Springer, Universitext, New York (2011)
Boumal, N.: An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Cambridge (2022)
Uschmajew, A., Vandereycken, B.: The geometry of algorithms using hierarchical tensors. Linear Algebra Appl. 439(1), 133–166 (2013)
Holtz, S., Rohwedder, T., Schneider, R.: On manifolds of tensors of fixed TT-rank. Numer. Math. 120(4), 701–731 (2012)
Kressner, D., Steinlechner, M., Vandereycken, B.: Low-rank tensor completion by Riemannian optimization. BIT Numer. Math. 54(2), 447–468 (2014)
Heidel, G., Schulz, V.: A Riemannian trust-region method for low-rank tensor completion. Numer. Linear Algebra Appl. 25(6), e2175 (2018)
Steinlechner, M.: Riemannian optimization for high-dimensional tensor completion. SIAM J. Sci. Comput. 38(5), S461–S484 (2016)
Breiding, P., Vannieuwenhoven, N.: A Riemannian trust region method for the canonical tensor rank approximation problem. SIAM J. Optim. 28(3), 2435–2465 (2018)
Gilman, K., Tarzanagh, D.A., Balzano, L.: Grassmannian optimization for online tensor completion and tracking with the t-SVD. IEEE Trans. Signal Process. 70, 2152–2167 (2022)
Song, G.J., Wang, X.Z., Ng, M.K.: Riemannian conjugate gradient descent method for fixed multi rank third-order tensor completion. J. Comput. Appl. Math. 421, 114866 (2023)
Zhang, X., Yang, Z.P., Cao, C.G.: Inequalities involving Khatri–Rao products of positive semidefinite matrices. Appl. Math. E-Notes 2, 117–124 (2002)
Huang, W.: Optimization algorithms on Riemannian manifolds with applications. Ph.D. thesis, The Florida State University (2013)
Zhu, X.: A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Comput. Optim. Appl. 67(1), 73–110 (2017)
Edelman, A., Arias, T., Smith, S.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)
Bunse-Gerstner, A., Byers, R., Mehrmann, V.: Numerical methods for simultaneous diagonalization. SIAM J. Matrix Anal. Appl. 14(4), 927–949 (1993)
Pesquet-Popescu, B., Pesquet, J.C., Petropulu, A.P.: Joint singular value decomposition-a new tool for separable representation of images. In: International Conference on Image Processing. vol. 2, pp. 569–572. IEEE, Thessaloniki, Greece (2001)
Shashua, A., Levin, A.: Linear image coding for regression and classification using the tensor-rank principle. In: International Conference on Artificial Intelligence and Statistics. vol. 1, pp. I–42–I–49. IEEE Computer Society, Kauai, HI, USA (2001)
Allen, G.I.: Sparse higher-order principal components analysis. In: International Conference on Artificial Intelligence and Statistics. vol. 22, pp. 27–36. PMLR, La Palma, Canary Islands (2012)
Wang, Y., Dong, M., Xu, Y.: A sparse rank-1 approximation algorithm for high-order tensors. Appl. Math. Lett. 102, 106140 (2020)
Mao, X., Yang, Y.: Several approximation algorithms for sparse best rank-1 approximation to higher-order tensors. J. Glob. Optim. (2022). https://doi.org/10.1007/s10898-022-01140-4
Kwak, N.: Principal component analysis based on \(\ell _1\)-norm maximization. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1672–1680 (2008)
Hao, N., Kilmer, M.E., Braman, K., Hoover, R.C.: Facial recognition using tensor–tensor decompositions. SIAM J. Imaging Sci. 6(1), 437–463 (2013)
Schönemann, P.H.: A generalized solution of the orthogonal procrustes problem. Psychometrika 31(1), 1–10 (1966)
Lin, J., Huang, T.Z., Zhao, X.L., Jiang, T.X., Zhuang, L.: A tensor subspace representation-based method for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 59(9), 7739–7757 (2020)
Xu, S.S., Huang, T.Z., Lin, J., Chen, Y.: T-hy-demosaicing: hyperspectral reconstruction via tensor subspace representation under orthogonal transformation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 4842–4853 (2021)
Xu, T., Huang, T.Z., Deng, L.J., Yokoya, N.: An iterative regularization method based on tensor subspace representation for hyperspectral image super-resolution. IEEE Trans. Geosci. Remote Sens. 60, 1–16 (2022)
Hoover, R.C., Caudle, K., Braman, K.: Multilinear discriminant analysis through tensor-tensor eigendecomposition. In: ICMLA. pp. 578–584. IEEE, Orlando, FL (2018)
Ozdemir, C., Hoover, R.C., Caudle, K., Braman, K.: High-order multilinear discriminant analysis via order-\(n\) tensor eigendecomposition. Technical report, SSRN (2022). https://dx.doi.org/10.2139/ssrn.4203431
Vervliet, N., Debals, O., Sorber, L., Van Barel, M., De Lathauwer, L.: Tensorlab 3.0 (2016). http://www.tensorlab.net
Lu, C.: Tensor-Tensor Product Toolbox. Carnegie Mellon University, Pittsburgh (2018)
Iannazzo, B., Porcelli, M.: The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA J. Numer. Anal. 38(1), 495–517 (2018)
Kilmer, M.E., Horesh, L., Avron, H., Newman, E.: Tensor–tensor algebra for optimal representation and compression of multiway data. Proc. Natl. Acad. Sci. U.S.A. 118(28), e2015851118 (2021)
Kernfeld, E., Kilmer, M., Aeron, S.: Tensor–tensor products with invertible linear transforms. Linear Algebra Appl. 485, 545–570 (2015)
Hall, B.C.: Lie Groups, Lie Algebras, and Representations. Springer, Cham (2015)
Van Loan, C.: Computing integrals involving the matrix exponential. IEEE Trans. Autom. Control 23(3), 395–404 (1978)
Van Loan, C.F.: The ubiquitous Kronecker product. J. Comput. Appl. Math. 123(1–2), 85–100 (2000)
Kolda, T.G.: Multilinear operators for higher-order decompositions. Tech. Rep. SAND2006-2081, 923081, Citeseer (2006)
Author information
Contributions
X.-P. Mao, Y. Wang, and Y.-N. Yang derived the theory, designed the algorithms, performed the numerical experiments, drafted the manuscript, and read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This work was supported by the National Natural Science Foundation of China (No. 12171105), Fok Ying Tong Education Foundation (No. 171094), and the special foundation for Guangxi Bagui Scholars.
Appendices
Appendix
A Preliminaries on Riemannian Manifold
Basic definitions and properties concerning Riemannian manifolds can be found in the books [21,22,23]. For convenience, and to keep the paper self-contained, we summarize the necessary ones in this section.
Definition 13
[21, 22] A topological manifold \(\mathscr {M}\) of dimension n is a Hausdorff, second countable, locally Euclidean space of dimension n. Let \(\mathscr {N}\) be a submanifold of \(\mathscr {M}\). If the manifold topology of \(\mathscr {N}\) coincides with its subspace topology induced from the topological space \(\mathscr {M}\), then \(\mathscr {N}\) is called an embedded submanifold of the manifold \(\mathscr {M}\).
Definition 14
[21] A tangent vector \(\xi _{x}\) to \(\mathscr {M}\) at x is defined as a mapping from \(\mathfrak {F}_{x}(\mathscr {M})\), the set of smooth real-valued functions defined on a neighborhood of x, to \(\mathbb {R}\) such that \( \xi _{x} f:=\dot{\gamma }(0) f:= \frac{\textrm{d}}{\textrm{d} t}f(\gamma (t))\mid _{t=0}, \quad \forall f \in \mathfrak {F}_{x}(\mathscr {M}), \) for some smooth curve \(\gamma (t)\) on \(\mathscr {M}\) with \(\gamma (0)=x\). The tangent space \(T_{x} \mathscr {M}\) to \(\mathscr {M}\) at x is defined as the set of all tangent vectors to \(\mathscr {M}\) at x. The set \(T\mathscr {M}:=\bigcup _{x \in \mathscr {M}} T_{x} \mathscr {M}\) is called the tangent bundle of the manifold.
Definition 15
[21] The differential of \(F: \mathscr {M} \rightarrow \mathscr {N}\) at x is a linear operator \(\textrm{D}F(x): T_{x} \mathscr {M} \rightarrow T_{F(x)} \mathscr {N}\) defined by: \( \textrm{D} F(x)[v]:=\frac{\textrm{d}}{\textrm{d} t} F(\gamma (t))\mid _{t=0}, \) where \(\gamma (t)\) is any curve on the manifold that satisfies \(\gamma (0)=x\) and \(\dot{\gamma }(0)=v\).
Definition 16
[21] A Riemannian metric g assigns to each tangent space \(T_{x} \mathscr {M}\) an inner product \(g_{x}: T_{x} \mathscr {M} \times T_{x} \mathscr {M} \rightarrow \mathbb {R}\), written \( g_{x}(\eta , \xi )=\langle \eta , \xi \rangle _{x} \) for \(\eta , \xi \in T_{x} \mathscr {M}\), that varies smoothly with x. A Riemannian manifold is the pair \((\mathscr {M}, g)\).
Definition 17
[33] The geodesic \(\gamma (t)\) defined by an affine connection is a curve that satisfies \( \ddot{\gamma }(t):=\frac{\textrm{D}^{2}}{\textrm{d} t^{2}} \gamma (t):=\frac{\textrm{D}}{\textrm{d} t} \dot{\gamma }(t) =0, \) where \(\frac{\textrm{D}}{\textrm{d} t}\) is the induced covariant derivative (see [23, Thm. 5.29]).
Definition 18
[21] The Riemannian gradient \({\text {grad}}f(x)\) of a function f at x is the unique vector in \(T_x\mathscr {M}\) satisfying \(\langle {\text {grad}} f(x), \xi _x\rangle _{x}=\textrm{D} f(x)[\xi _x], \quad \forall \xi _x \in T_{x} \mathscr {M}.\) The Riemannian Hessian \( {\text {Hess}}f(x)\) is a mapping from the tangent space \(T_{x} \mathscr {M}\) to itself: \( {\text {Hess}} f(x)[\xi ]:={\nabla }_{\xi } {\text {grad}} f(x), \) where \({\nabla }\) is the Riemannian connection on \(\mathscr {M}\) (see [21, Thm. 5.3.1]).
Lemma 8
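[23] Let \(\mathscr {M}\) be a Riemannian submanifold of a Euclidean space \(\mathscr {E}\) and let \(f: \mathscr {M} \rightarrow \mathbb {R}\) be a smooth function with \(G:={\text {grad}} f\). Then,
\( {\text {grad}} f(x) = \varvec{P}_x\left( {\text {grad}} \bar{f}(x)\right) , \qquad {\text {Hess}} f(x)[\xi ] = \varvec{P}_x\left( \textrm{D} \bar{G}(x)[\xi ]\right) , \quad \forall \xi \in T_{x} \mathscr {M}, \)
where \(\textrm{D}\) is the Euclidean derivative, \(\varvec{P}_x(y) \) denotes the orthogonal projection of \(y\in \mathscr {E}\) onto \(T_{x} \mathscr {M}\), and the smooth scalar field \(\bar{f}\) (vector field \(\bar{G}\)) is any smooth extension of f (G) to a neighborhood of \(\mathscr {M}\) in \(\mathscr {E}\).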
A retraction provides a way to map a tangent vector to the next iterate on the manifold.
Definition 19
(cf. [21, Def. 4.1.1]) A retraction on a manifold \(\mathscr {M}\) is a smooth mapping R from the tangent bundle \(T\mathscr {M} \) onto \(\mathscr {M}\) with the following properties. Let \(R_x\) denote the restriction of R to \(T_x\mathscr {M}\). Then \((i)~R_x(0_x) = x\), where \(0_x\) denotes the zero element of \(T_x\mathscr {M} \), and \((ii)~\textrm{D} R_x(0_x):T_x\mathscr {M}\rightarrow T_x\mathscr {M}\) is the identity map: \(\textrm{D} R_x(0_x)[v] = v\).
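As a concrete illustration of Definition 19, the following sketch checks both retraction properties numerically for the QR-based retraction on the matrix Stiefel manifold, i.e., the case \(l=1\). The helper names `qf`, `retract_qr`, and `project_tangent` are illustrative, and the tangent-space projection \(Z - X\,\textrm{sym}(X^{\top }Z)\) is the standard matrix formula from [21], not a result of this appendix.

```python
import numpy as np

def qf(A):
    """Q factor of the thin QR decomposition, normalized so that diag(R) > 0."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # flip column signs to match a positive diagonal of R

def retract_qr(X, xi):
    """QR-based retraction R_X(xi) = qf(X + xi)."""
    return qf(X + xi)

def project_tangent(X, Z):
    """Orthogonal projection of Z onto the tangent space at X."""
    S = X.T @ Z
    return Z - X @ (S + S.T) / 2

rng = np.random.default_rng(0)
X = qf(rng.standard_normal((6, 3)))                   # a point with X^T X = I
V = project_tangent(X, rng.standard_normal((6, 3)))   # a tangent vector at X

# (i) R_X(0_X) = X
err_zero = np.linalg.norm(retract_qr(X, np.zeros_like(V)) - X)

# (ii) D R_X(0_X)[V] = V, approximated by a central difference
t = 1e-6
fd = (retract_qr(X, t * V) - retract_qr(X, -t * V)) / (2 * t)
err_diff = np.linalg.norm(fd - V)
```

Both `err_zero` and `err_diff` should be small; the second is limited by the finite-difference step.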
For the embedded submanifold of a vector space, a simple way to construct retractions is specified in the following.
Lemma 9
(cf. [21, Prop. 4.1.2]) Let \(\mathscr {M}\) be an embedded submanifold of a vector space \(\mathscr {E}\) and let \(\mathscr {N}\) be an abstract manifold such that \(\dim {{\mathscr {M}}}+\dim {{\mathscr {N}}}=\dim {{\mathscr {E}}}\). Assume that there is a diffeomorphism \( \phi :\mathscr {M} \times \mathscr {N}\rightarrow \mathscr {E}_{*}:(F,G)\mapsto \phi (F,G), \) where \(\mathscr {E}_{*}\) is an open subset of \(\mathscr {E}\) (thus \(\mathscr {E}_{*}\) is an open submanifold of \(\mathscr {E}\)), with a neutral element \(I \in \mathscr {N}\) satisfying \( \phi (F,I) = F,~~ \forall F \in \mathscr {M}. \) Then the mapping \( R_{X}(\xi ) = \pi _1 \left( \phi ^{-1}(X+\xi ) \right) , \) where \( \pi _1:\mathscr {M} \times \mathscr {N}\rightarrow \mathscr {M}:(F,G)\mapsto F\) is the projection onto the first component, defines a retraction on \(\mathscr {M}\).
To compare tangent vectors at distinct points on the manifold, a vector transport associated with a retraction R gives us a way to transport a tangent vector \(\xi \in T_x\mathscr {M}\) to the tangent space \(T_{R_x(\eta )}\mathscr {M} \) for some \(\eta \in T_x\mathscr {M} \).
Definition 20
(cf. [21, Def. 8.1.1]) A vector transport \(\mathcal {T} : T\mathscr {M} \oplus T\mathscr {M} \rightarrow T\mathscr {M}:(\eta _x,\xi _x ) \mapsto \mathcal {T}_{\eta _x}\xi _x\) associated with a retraction R is a smooth mapping satisfying the following properties for all \(x\in \mathscr {M}\): (i) \(\mathcal {T} _{\eta _x}\xi _x\in T_{R_x(\eta _x)}\mathscr {M}\), (ii) \(\mathcal {T} _{0_{x}}\xi _x = \xi _x\) for all \(\xi _x\in T_x\mathscr {M}\), and (iii) \(\mathcal {T} _{\eta _{x}}(a\xi _x+b\zeta ) = a\mathcal {T}_{\eta _{x}}\xi _x+b\mathcal {T}_{\eta _{x}}\zeta \). Vector transport by differentiated retraction is defined as \( \mathcal {T}_{\eta _x}\xi _x := \textrm{D} R_x(\eta _x)[\xi _x] = \frac{\textrm{d}}{\textrm{d} t} R_x(\eta _x + t\xi _x)\mid _{t=0}. \)
Lemma 10
(cf. [21, Sect. 8.1.3]) A vector transport on \(\mathscr {M}\) associated with a retraction R is given by the orthogonal projection onto the tangent space, i.e., \( \mathcal {T}_{\eta _x}{\xi _x} = \varvec{P}_{R_x(\eta _x)}\xi _x. \)
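The transport by projection in Lemma 10 can be illustrated in the matrix case \(l=1\). The sketch below assumes the QR-based retraction and the standard tangent-space projection of the matrix Stiefel manifold from [21]; the names `qf`, `project_tangent`, and `transport_by_projection` are illustrative. It checks properties (i) and (ii) of Definition 20 and shows that the projection does not increase the norm, so this transport is in general not isometric in the sense of Definition 21.

```python
import numpy as np

def qf(A):
    """Q factor of the thin QR decomposition, normalized so that diag(R) > 0."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def project_tangent(X, Z):
    """Orthogonal projection of Z onto the tangent space at X."""
    S = X.T @ Z
    return Z - X @ (S + S.T) / 2

def transport_by_projection(X, eta, xi):
    """Move xi from T_X to T_Y with Y = R_X(eta), by projecting onto T_Y."""
    Y = qf(X + eta)
    return Y, project_tangent(Y, xi)

rng = np.random.default_rng(0)
X = qf(rng.standard_normal((6, 3)))
eta = project_tangent(X, rng.standard_normal((6, 3)))
xi = project_tangent(X, rng.standard_normal((6, 3)))

Y, T = transport_by_projection(X, eta, xi)
tangency = np.linalg.norm(Y.T @ T + T.T @ Y)   # (i): T is tangent at Y
_, T0 = transport_by_projection(X, np.zeros_like(eta), xi)
identity_gap = np.linalg.norm(T0 - xi)         # (ii): transport along 0_X is the identity
norm_ratio = np.linalg.norm(T) / np.linalg.norm(xi)
```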
Definition 21
[34] A vector transport \(\mathcal {T}\) is called isometric if for all \(\eta , \xi \in T_{x} \mathscr {M}\), it satisfies \( \left\langle \mathcal {T}_{\eta }(\xi ), \mathcal {T}_{\eta }(\xi )\right\rangle _{R_{x}(\eta )}=\langle \xi , \xi \rangle _{x} \) where R is the retraction associated with \(\mathcal {T}\).
B Proofs of Theorems, Propositions and Lemmas in Sect. 2
B.1 Proof of Proposition 2
Proof
Taking the conjugate transpose of both sides of the equation in item (i) of Proposition 1, then multiplying both sides by \((F_l\otimes I_n)\), we get
where we use the following property ([9, Lem. 3]): \({\text {bcirc}}(\mathcal {A})^{\top } = {\text {bcirc}}(\mathcal {A}^{\top }) \). Taking the first column of the block matrix on both sides of the above equation yields
which, combined with Definition 6, gives \(L(\mathcal {A} ^{\top }) = {\text {fold}}\left( (\hat{A}^{(i)})^{H}: i \in [l]\right) .\)
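The conclusion of this proof can be checked numerically. The sketch below assumes that \(L(\cdot )\) is the discrete Fourier transform along the third mode, as computed by `numpy.fft.fft`, and that the t-transpose transposes each frontal slice and reverses the order of slices 2 through \(l\); `t_transpose` is an illustrative helper name.

```python
import numpy as np

def t_transpose(A):
    """Transpose every frontal slice and reverse the order of slices 2..l."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))

# Left-hand side: Fourier transform of the t-transpose.
lhs = np.fft.fft(t_transpose(A), axis=2)

# Right-hand side: conjugate transpose of every Fourier-domain slice of A.
A_hat = np.fft.fft(A, axis=2)
rhs = np.conj(np.transpose(A_hat, (1, 0, 2)))

gap = np.max(np.abs(lhs - rhs))
```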
B.2 Proof of Theorem 3
Proof
t-QR was proposed in [7, Sect. 2.5].
For \(i= 1, \cdots , \lceil \frac{l+1}{2}\rceil ,\) let \( \hat{A}^{(i)} = \hat{Q}^{(i)}\cdot \hat{R}^{(i)}\) be the QR decomposition of \(\hat{A}^{(i)}\in \mathbb C^{n\times p} \) (see Note 1), where \(\hat{Q}^{(i)}\in \mathbb C^{n\times p} , (\hat{Q}^{(i)})^H\cdot \hat{Q}^{(i)} = I_p\), \(\hat{R}^{(i)}\in \mathbb C^{p\times p} _{\textrm{upp}}\) and \({\text {diag}}(\hat{R}^{(i)})\in \mathbb R^{p\times p} \), namely, the diagonal entries of \({\hat{R}}^{(i)}\) are real. For \(i=1+ \lceil \frac{l+1}{2} \rceil ,\cdots ,l\), set \( {\hat{A}}^{(i)} = \textrm{conj}\left( {\hat{A}}^{(l+2-i)}\right) , {\hat{Q}}^{(i)} = \textrm{conj}\left( {\hat{Q}}^{(l+2-i)}\right) , {\hat{R}}^{(i)} = \textrm{conj}\left( {\hat{R}}^{(l+2-i)}\right) . \) It follows from Remarks 3 and 4 that \({\mathcal {Q}} \in \textrm{St}\left( n,p,l\right) \) and \({\mathcal {R}}\in \mathbb {R}_{\textrm{upp}}^{p\times p \times l} \). Here \({\mathcal {R}}\) is real by Remark 3 and direct computation. Using Remark 3 again, we further have \(\hat{A}^{(1)}\in \mathbb R^{n\times p} , \hat{Q}^{(1)}\in \mathbb R^{n\times p} , \hat{R}^{(1)}\in \mathbb R^{p\times p} \).
We then show the uniqueness of the decomposition. Recall that, for the QR decomposition of a matrix \(\hat{A}^{(i)}\in \mathbb C^{n\times p} \) with \(n\geqslant p\), if \(\hat{A}^{(i)}, i \in [l]\) are all of full rank p, namely, \(\hat{\mathcal {A} }\in \mathbb C^{n\times p \times l} _*\), then each QR decomposition \(\hat{A}^{(i)} = \hat{Q}^{(i)}\hat{R}^{(i)}\) is unique provided that the diagonal entries of \(\hat{R}^{(i)}\) are required to be positive, namely, \(\hat{\mathcal {R} }\in \mathbb {C}_{\textrm{upp}+}^{p\times p \times l} \). Since the Fourier transform is bijective, the uniqueness of the matrix QR decomposition leads to the uniqueness of the t-QR decomposition.
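The construction in this proof can be sketched in NumPy. The code below is a minimal illustration, not the authors' implementation: it assumes the Fourier-domain slices are given by `numpy.fft.fft` along the third mode (with 0-based indices, so slices `0..l//2` are factorized and the rest are filled in by conjugation), and that every slice has full column rank so that the diagonal of each R factor can be normalized to be positive. The names `t_product`, `t_transpose`, and `t_qr` are illustrative.

```python
import numpy as np

def t_product(A, B):
    """t-product computed slice-wise in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ijk,jmk->imk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_transpose(A):
    """Transpose every frontal slice and reverse the order of slices 2..l."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_qr(A):
    """t-QR of a real n x p x l tensor whose Fourier slices have full column rank."""
    n, p, l = A.shape
    A_hat = np.fft.fft(A, axis=2)
    Q_hat = np.zeros((n, p, l), dtype=complex)
    R_hat = np.zeros((p, p, l), dtype=complex)
    half = l // 2 + 1  # slices that are factorized explicitly
    for i in range(half):
        Q, R = np.linalg.qr(A_hat[:, :, i])
        d = np.diag(R)
        phase = d / np.abs(d)                         # unit-modulus scalings
        Q_hat[:, :, i] = Q * phase                    # rescale the columns of Q
        R_hat[:, :, i] = np.conj(phase)[:, None] * R  # diag(R) becomes positive
    for i in range(half, l):
        # Remaining slices follow from conjugate symmetry.
        Q_hat[:, :, i] = np.conj(Q_hat[:, :, l - i])
        R_hat[:, :, i] = np.conj(R_hat[:, :, l - i])
    Q = np.real(np.fft.ifft(Q_hat, axis=2))
    R = np.real(np.fft.ifft(R_hat, axis=2))
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3, 4))
Q, R = t_qr(A)
identity = np.zeros((3, 3, 4))
identity[:, :, 0] = np.eye(3)
```

With these definitions, `t_product(Q, R)` should reproduce `A`, `t_product(t_transpose(Q), Q)` should match the identity tensor, and every frontal slice of `R` should be upper triangular, all up to rounding error.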
B.3 Proof of Lemma 3
Proof
The proof of Theorem 3 shows that S is isomorphic to
If l is even, then it holds that
Then we examine the matrices \(\hat{R}^{(i)}, i \in [l]\), which contain the free variables. There are two real upper triangular \(p\times p\) matrices, each of dimension \(\frac{(1+p)p}{2}\); there are \(\frac{l-2}{2}\) complex upper triangular \(p\times p\) matrices with positive diagonal elements, each of dimension \(\frac{(p-1)p}{2}\times 2 + p\). Hence the dimension of \(\hat{S}\) is \(2\times \frac{(1+p)p}{2}+\frac{l-2}{2}\times \big (\frac{(p-1)p}{2}\times 2 + p\big )=\frac{p^2l}{2}+p\).
If l is odd, then it holds that
There is one real upper triangular \(p\times p\) matrix of dimension \(\frac{(1+p)p}{2}\); there are \(\frac{l-1}{2}\) complex upper triangular \(p\times p\) matrices with positive diagonal elements, each of dimension \(\frac{(p-1)p}{2}\times 2 + p\). Hence the dimension of \(\hat{S}\) is \( \frac{(1+p)p}{2}+\frac{l-1}{2}\times \big (\frac{(p-1)p}{2}\times 2 + p\big )=\frac{p^2l+p}{2}\).
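The two counts above can be sanity-checked with a few lines of plain Python. The function name `dim_s_hat` is illustrative; it simply mirrors the bookkeeping of the proof (real blocks plus independent complex blocks) and compares it with the closed forms \(\frac{p^2l}{2}+p\) for even l and \(\frac{p^2l+p}{2}\) for odd l.

```python
def dim_s_hat(p, l):
    """Number of free real parameters, counted block by block as in the proof."""
    real_block = p * (p + 1) // 2     # real upper triangular p x p matrix
    complex_block = p * (p - 1) + p   # complex strict upper part plus real diagonal
    if l % 2 == 0:
        return 2 * real_block + (l - 2) // 2 * complex_block
    return real_block + (l - 1) // 2 * complex_block

def closed_form_times_two(p, l):
    """Twice the closed-form dimension, kept as an integer to avoid fractions."""
    return p * p * l + (2 * p if l % 2 == 0 else p)

checks = [
    2 * dim_s_hat(p, l) == closed_form_times_two(p, l)
    for p in range(1, 8)
    for l in range(1, 12)
]
```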
B.4 Proof of Theorem 4
Proof
Let \({\mathcal {A}}= {\mathcal {U}} * \mathcal S * {\mathcal {V}}^{\top }\) be the compact t-SVD of \({\mathcal {A}}\). Let \({\mathcal {P}}:=\mathcal U * {\mathcal {V}}^{\top }\) and \({\mathcal {H}}:= \mathcal V * {\mathcal {S}} * {\mathcal {V}}^{\top }\). Then it is clear that (8) is satisfied. To see that \({\mathcal {H}}\in \textrm{Sym}(\mathbb R_+^{p\times p \times l}) \), first we show that \({\mathcal {S}}\in \textrm{Sym}(\mathbb R_+^{p\times p \times l}) \). This is obvious, as each \({\hat{S}}^{(i)}\) is diagonal with nonnegative entries, and so \({\mathcal {S}}\in \textrm{Sym}(\mathbb R_+^{p\times p \times l}) \), according to Remark 5. By [13, Thm. 7], there is a unique \({\mathcal {T}}\) such that \({\mathcal {T}} * {\mathcal {T}}^{\top }= \mathcal S\). Then \({\mathcal {H}} \) can be written as \({\mathcal {H}} = \mathcal V * {\mathcal {T}} * \left( {\mathcal {V}} * \mathcal T \right) ^{\top }\), which together with [13, Thm. 8] shows that \({\mathcal {H}} \in \textrm{Sym}(\mathbb R_+^{p\times p \times l}) \).
To show the uniqueness of \({\mathcal {H}}\), note that \( \mathcal A^{\top } * {\mathcal {A}}={\mathcal {H}} * {\mathcal {H}}\), which by [13, Thm. 8] is symmetric positive semidefinite. Invoking [13, Thm. 7] again gives the uniqueness of \({\mathcal {H}}\).
If \({\mathcal {A}}^{\top } * {\mathcal {A}}\in \textrm{Sym}(\mathbb R_{++}^{p\times p \times l}) \), [13, Thm. 8] shows that \({\mathcal {H}}\) is nonsingular (invertible, Def. 5), and so \({\mathcal {P}} = \mathcal A * {\mathcal {H}}^{-1}\), which is unique.
Remark 16
The proof of Theorem 4 gives the way to obtain t-PD from the compact t-SVD. This is analogous to the matrix case.
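A minimal NumPy sketch of this route from the compact t-SVD to the t-PD is given below. It assumes the t-SVD is computed slice-wise in the Fourier domain via `numpy.fft.fft` and `numpy.linalg.svd`, and that all Fourier slices have full column rank so that the orthogonal factor is unique and the result is real. The names `t_product`, `t_transpose`, and `t_polar` are illustrative.

```python
import numpy as np

def t_product(A, B):
    """t-product computed slice-wise in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ijk,jmk->imk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_transpose(A):
    """Transpose every frontal slice and reverse the order of slices 2..l."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_polar(A):
    """Return (P, H) with P = U * V^T and H = V * S * V^T from slice-wise thin SVDs."""
    n, p, l = A.shape
    A_hat = np.fft.fft(A, axis=2)
    P_hat = np.empty((n, p, l), dtype=complex)
    H_hat = np.empty((p, p, l), dtype=complex)
    for i in range(l):
        U, s, Vh = np.linalg.svd(A_hat[:, :, i], full_matrices=False)
        P_hat[:, :, i] = U @ Vh
        H_hat[:, :, i] = (Vh.conj().T * s) @ Vh  # V diag(s) V^H
    P = np.real(np.fft.ifft(P_hat, axis=2))
    H = np.real(np.fft.ifft(H_hat, axis=2))
    return P, H

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3, 4))
P, H = t_polar(A)
identity = np.zeros((3, 3, 4))
identity[:, :, 0] = np.eye(3)
```

Here `t_product(P, H)` should reproduce `A`, `P` should satisfy \(\mathcal {P}^{\top } * \mathcal {P} = \mathcal {I}\), and `H` should equal its own t-transpose, all up to rounding error.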
B.5 Proof of Proposition 6
Proof
This can be easily derived from the proof of Theorem 4. Here the root of a symmetric positive definite tensor was defined in [13, Thm. 7].
B.6 Proof of Proposition 7
Proof
If \(\hat{\mathcal {A}}\in \mathbb {C}_*^{n\times p \times l}\), then \((\hat{A}^{(i)})^H\hat{A}^{(i)}, i \in [l]\) are Hermitian positive definite. Note that [13, Thm. 5] shows that \((\hat{A}^{(i)})^H\hat{A}^{(i)}, i \in [l]\) are Hermitian positive definite if and only if \( \mathcal {A}^{\top }*\mathcal {A}\in \textrm{Sym}(\mathbb R_{++}^{p\times p \times l}) \).
B.7 Proof of Theorem 5
Proof
Let \({\mathcal {D}}:={\mathcal {U}}^{\top }\in \mathbb R^{p\times n \times l} \). Then for any \({\mathcal {P}}\in \textrm{St}\left( n,p,l\right) \),
where we let \({\hat{W}}^{(i)}:= {\hat{D}}^{(i)}{\hat{P}}^{(i)}{\hat{V}}^{(i)} \in \mathbb C^{p\times p}\). Note that \({\hat{D}}^{(i)} ({\hat{D}}^{(i)})^H = I_p\), \(({\hat{P}}^{(i)})^H {\hat{P}}^{(i)} = I_p\), \(({\hat{V}}^{(i)})^H{\hat{V}}^{(i)} = I_p\). Thus \( \mid ({\hat{W}}^{(i)})_{jj} \mid \leqslant 1 \), \(i \in [l]\), \(j \in [p]\). Therefore,
where \({\hat{S}}^{(i)}\geqslant 0\). On the other hand, take \(\mathcal P:={\mathcal {U}} * {\mathcal {V}}^{\top }\). It is easy to see that
namely, the upper bound is tight, which is achieved when \({\mathcal {P}} = {\mathcal {U}} * {\mathcal {V}}^{\top }\). This gives the desired result.
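Theorem 5 is not restated in this appendix; reading the proof, it appears to say that the inner product \(\langle \mathcal {A}, \mathcal {P}\rangle \) over \(\mathcal {P}\in \textrm{St}\left( n,p,l\right) \) is maximized by \(\mathcal {P} = \mathcal {U} * \mathcal {V}^{\top }\). Under that assumption, the sketch below compares this candidate with randomly generated feasible points. It further assumes the entrywise (Euclidean) inner product and the unnormalized FFT of NumPy, for which the optimal value equals the sum of the singular values of the Fourier slices divided by l. The names `polar_factor` and `inner` are illustrative.

```python
import numpy as np

def polar_factor(A):
    """P = U * V^T, computed slice-wise in the Fourier domain from thin SVDs."""
    A_hat = np.fft.fft(A, axis=2)
    P_hat = np.empty_like(A_hat)
    for i in range(A.shape[2]):
        U, _, Vh = np.linalg.svd(A_hat[:, :, i], full_matrices=False)
        P_hat[:, :, i] = U @ Vh
    return np.real(np.fft.ifft(P_hat, axis=2))

def inner(A, B):
    """Entrywise Euclidean inner product of two real tensors."""
    return float(np.sum(A * B))

rng = np.random.default_rng(1)
n, p, l = 5, 3, 4
A = rng.standard_normal((n, p, l))

# Value attained by the candidate maximizer.
best = inner(A, polar_factor(A))

# Values attained at other feasible points (polar factors of random tensors).
trials = [inner(A, polar_factor(rng.standard_normal((n, p, l)))) for _ in range(200)]

# Sum of the singular values of all Fourier slices, divided by l.
A_hat = np.fft.fft(A, axis=2)
sigma_sum = sum(np.linalg.svd(A_hat[:, :, i], compute_uv=False).sum() for i in range(l)) / l
```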
B.8 Proof of the Well-Defined Property of (10)
Proof
For convenience, we use the notation \( \Delta \) for the frontal-slice-wise product (cf. [54, Def. 2.1]) between two tensors in the Fourier domain, i.e., if \({\hat{C}}^{(i)} = {\hat{A}}^{(i)}{\hat{B}}^{(i)}, i \in [l],\) then it holds that \(L(\mathcal {A} )\Delta L(\mathcal {B} ) = {\text {fold}}\left( \hat{A}^{(i)}\hat{B}^{(i)}: i \in [l]\right) \); in other words,
Using this notation, we have
Thus for any \(N\),
Letting \(N \rightarrow \infty \), it holds that
since the series defining the matrix exponential is convergent [55, Prop. 2.1].
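The argument can be illustrated numerically. The sketch below assumes that (10), which is not reproduced in this appendix, defines the t-exponential as the power series \(\sum _{k\geqslant 0} \mathcal {A}^{k}/k!\) with powers taken under the t-product. It compares a truncated series computed directly with the t-product against the same truncated series applied to each Fourier-domain slice. All function names are illustrative, and the truncation at 30 terms is only adequate for tensors of moderate norm.

```python
import numpy as np

def t_product(A, B):
    """t-product computed slice-wise in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ijk,jmk->imk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_exp_series(A, terms=30):
    """Partial sum of sum_k A^k / k! with powers under the t-product."""
    p, _, l = A.shape
    term = np.zeros((p, p, l))
    term[:, :, 0] = np.eye(p)          # identity tensor
    total = term.copy()
    for k in range(1, terms):
        term = t_product(term, A) / k  # A^k / k!
        total = total + term
    return total

def matrix_exp_series(M, terms=30):
    """Partial sum of the matrix exponential series."""
    term = np.eye(M.shape[0], dtype=complex)
    total = term.copy()
    for k in range(1, terms):
        term = term @ M / k
        total = total + term
    return total

def t_exp_fourier(A, terms=30):
    """Apply the matrix series to every Fourier slice, then transform back."""
    A_hat = np.fft.fft(A, axis=2)
    slices = [matrix_exp_series(A_hat[:, :, i], terms) for i in range(A.shape[2])]
    return np.real(np.fft.ifft(np.stack(slices, axis=2), axis=2))

rng = np.random.default_rng(0)
A = 0.1 * rng.standard_normal((3, 3, 4))
gap = np.max(np.abs(t_exp_series(A) - t_exp_fourier(A)))
```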
B.9 Proof of Equivalence of (9) and (11)
Proof
Using (9) and item (i) of Proposition 1, we have
where the third equality is due to the following property of the matrix exponential ([55, Prop. 2.3, 6]): If \(X^{\top }X=I\), then \( {\text {exp}} \left[ XAX^{\top } \right] = X {\text {exp}} \left[ A \right] X^{\top },\) and the fifth equality comes from the following formula which follows immediately from definition: \( {\text {exp}} \left[ {\text {Diag}}\left( D_{i}: i \in [l]\right) \right] = {\text {Diag}}\left( {\text {exp}} \left[ D_i \right] : i \in [l]\right) \), and the last equality follows from (5).
B.10 Proof of Proposition 8
Proof
Since the t-exponential mapping
is the composite of the matrix exponential mapping and linear mappings and the matrix exponential is smooth ([55, Prop. 2.16]), we conclude that the t-exponential mapping is smooth.
B.11 Proof of Proposition 9
Proof
Using the corresponding property of the matrix exponential [55, Prop. 2.4], we obtain
where the first equality comes from (11), while (A2) gives the last two equalities. Similarly, we can show that \(\frac{\textrm{d}}{\textrm{d}t} {\text {exp}} \left[ t\mathcal {A} \right] =\mathcal {A}* {\text {exp}} \left[ t\mathcal {A} \right] \).
B.12 Proof of Proposition 10
Proof
Applying the corresponding property in the matrix case [55, Prop. 2.3, 6] and (A2), it follows that
where the first equality comes from (11).
B.13 Proof of Proposition 11
Proof
We denote \(\mathcal {A} = {\text {Diag}}\left( \mathcal {D}_j: j \in [p]\right) \) and \(\mathcal {B} = {\text {Diag}}\left( {\text {exp}} \left[ \mathcal {D}_j \right] : j \in [p]\right) \). Applying (11), we get
where the third equality is due to the property of the matrix exponential [56]: \( {\text {exp}} \left[ {\text {Diag}}\left( C_{i}: i \in [l]\right) \right] = {\text {Diag}}\left( {\text {exp}} \left[ C_i \right] : i \in [l]\right) \).
B.14 Proof of Proposition 12
Proof
It follows from Proposition 2 that
where the second equality comes from the corresponding property in the matrix case [55, Prop. 2.3, 2].
B.15 Proof of Proposition 13
Proof
Using (A2), we have
where the third equality comes from the corresponding property of the matrix exponential [55, Prop. 2.3, 5].
C Proofs of the Analytical Solution of the t-Sylvester Equation in Theorem 17
Lemma 11
Let \(\mathcal {A}\in \mathbb {R}^{m\times n \times l},\mathcal {B}\in \mathbb {R}^{n\times k\times l} \). Then
Proof
By definition, the left-hand side part is
where \(B^{(i)}_{:j}\) is the jth column of \(B^{(i)}, i\in [l]\) and the right-hand side part is
We observe that the \((q,1)\)-th block of the partitioned matrix on the left-hand side is
where
Meanwhile, the \((q,1)\)-th block of the partitioned matrix on the right-hand side is \(\sum \nolimits _{i=1}^{l}A^{(h_i)}B^{(i)}_{:j},\) which is equal to (A3).
Lemma 12
[57] Let \(C\in \mathbb R^{m\times n} ,X\in \mathbb R^{n\times p} ,B\in \mathbb R^{k\times p} \). Then
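The displayed identity of Lemma 12 is not reproduced here. Given the stated dimensions and the source [57], it is presumably the standard Kronecker-product identity \({\text {vec}}(CXB^{\top }) = (B\otimes C)\,{\text {vec}}(X)\) with column-major vectorization; this reading is an assumption. The sketch below checks that identity numerically with NumPy.

```python
import numpy as np

def vec(M):
    """Column-major vectorization (stack the columns of M)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
m, n, p, k = 2, 3, 4, 5
C = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
B = rng.standard_normal((k, p))

lhs = vec(C @ X @ B.T)        # vec of an m x k matrix
rhs = np.kron(B, C) @ vec(X)  # (km x pn) times a vector of length np
gap = np.max(np.abs(lhs - rhs))
```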
Lemma 13
Let \(\mathcal {A}\in \mathbb {R}^{m\times n \times l},\mathcal {B}\in \mathbb {R}^{n\times k\times l},\mathcal {C}\in \mathbb {R}^{m\times k\times l}\). Then
Proof
We observe that \({\text {vec}}(\mathcal {C}) = {\text {vec}}([C^{(1)},\cdots ,C^{(l)}])\). Since \({\text {unfold}}(\mathcal {C}) = {\text {bcirc}}(\mathcal {A}){\text {unfold}}(\mathcal {B}),\) i.e., \([C^{(1)},\cdots ,C^{(l)}] = [A^{(1)},\cdots ,A^{(l)}]\widetilde{{\text {bcirc}}}(\mathcal {B})\), we have
where the third equality comes from Lemma 12. Similarly, by Lemma 11, there holds
where the third equality follows from Lemma 12.
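The relation \({\text {unfold}}(\mathcal {C}) = {\text {bcirc}}(\mathcal {A}){\text {unfold}}(\mathcal {B})\) used in this proof can be sketched in NumPy as follows. The helpers `bcirc`, `unfold`, and `fold` follow the usual conventions of [6] (first block column of the block circulant matrix equal to the stacked frontal slices); the result is compared with an FFT-based t-product. All function names are illustrative.

```python
import numpy as np

def bcirc(A):
    """Block circulant matrix whose first block column stacks the frontal slices of A."""
    l = A.shape[2]
    return np.block([[A[:, :, (i - j) % l] for j in range(l)] for i in range(l)])

def unfold(A):
    """Stack the frontal slices of A vertically."""
    return np.concatenate([A[:, :, k] for k in range(A.shape[2])], axis=0)

def fold(M, l):
    """Inverse of unfold: split M into l row blocks and use them as frontal slices."""
    return np.stack(np.split(M, l, axis=0), axis=2)

def t_product_fft(A, B):
    """t-product computed slice-wise in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ijk,jmk->imk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2, 5))

C_bcirc = fold(bcirc(A) @ unfold(B), 5)
C_fft = t_product_fft(A, B)
gap = np.max(np.abs(C_bcirc - C_fft))
```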
Proof
Applying Lemma 13, the tensor Sylvester equation (36) can be rewritten in the form
D The Euclidean Gradient \({\text {grad}}f(\mathcal {X} )\) and the Euclidean directional derivative \(Df(\mathcal {X} )[\mathcal {H} ]\) in Sect. 3.2
Similar to [13, Def. 4], for a third-order tensor \(\mathcal {X}\in \mathbb R^{n\times p \times l} \), we can introduce the Euclidean gradient \({\text {grad}}f(\mathcal {X} )\) and the Euclidean Hessian \({\text {Hess}}f(\mathcal {X} )\) via Fréchet differentiability.
Definition 22
Let \(f: \mathcal {U} \subseteq \mathbb {R}^{n \times p \times l} \rightarrow \mathbb {R}\) be a continuous map. Then, we say f is t-differentiable at \(\mathcal {X} \in \mathcal {U} \) if and only if there exists a third-order tensor \({\text {grad}}f(\mathcal {X})\in \mathbb R^{n\times p \times l} \) such that
where \({\text {grad}}f(\mathcal {X})\) is called the gradient of f at \(\mathcal {X}\) and \(Df(\mathcal {X} )[\mathcal {H} ]= \left\langle {\text {grad}}f(\mathcal {X}), \mathcal {H}\right\rangle \) is called the directional derivative of f at \(\mathcal {X}\) along \(\mathcal {H}\). We say f is twice t-differentiable at \(\mathcal {X} \in \mathcal {U}\) if and only if f is continuously t-differentiable and there exists a bounded linear operator \({\text {Hess}}f(\mathcal {X}):\mathbb R^{n\times p \times l} \rightarrow \mathbb R^{n\times p \times l} \) such that
Furthermore, we say f is t-differentiable (twice t-differentiable) on \(\mathcal {U} \) if and only if f is t-differentiable (twice t-differentiable) at every \(\mathcal {X} \in \mathcal {U} \).
Theorem 21
Let f be a continuous map from \(\mathcal {U} \subseteq \mathbb {R}^{n \times p \times l}\) to \(\mathbb {R}\). Then f is t-differentiable on \(\mathcal {U}\) if and only if \(\frac{\mathrm {\partial } f(\mathcal {X})}{\partial [{\text {vec}}(\mathcal {X})]}\) exists for every \(\mathcal {X} \in \mathcal {U} \), where \(\frac{\partial f(\mathcal {X})}{\partial [{\text {vec}}(\mathcal {X})]}\) is a vector in \(\mathbb {R}^{npl}\) with \(\left( \frac{\partial f(\mathcal {X})}{\partial [{\text {vec}}(\mathcal {X})]}\right) _{i}=\frac{\partial f(\mathcal {X})}{\partial \left( [{\text {vec}}(\mathcal {X})]_{i}\right) }\) for any \(i \in [npl]\). In particular, for any \(\mathcal {X} \in \mathcal {U} ,\)
where \(\varvec{v}=\textrm{vec}(\mathcal {A} )\) denotes the vectorization of the tensor \(\mathcal {A} \) and \(\textrm{vec}^{-1}(\varvec{v})=\mathcal {A} \) denotes the inverse operator that converts the vector \(\varvec{v}\) back to the tensor \(\mathcal {A} \); both can be implemented with the MATLAB functions reshape, permute, and ipermute (cf. [58]).
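For readers working in Python rather than MATLAB, the two operators can be sketched with `numpy.reshape`. The sketch assumes the column-major (Fortran-order) element ordering that MATLAB's reshape uses, and it assumes that the omitted display reads \({\text {grad}}f(\mathcal {X}) = \textrm{vec}^{-1}\big (\partial f(\mathcal {X})/\partial [{\text {vec}}(\mathcal {X})]\big )\). The test function \(f(\mathcal {X}) = \frac{1}{2}\Vert \mathcal {X}\Vert _F^2\), whose gradient is \(\mathcal {X}\) itself, and all function names are illustrative.

```python
import numpy as np

def vec(X):
    """Column-major vectorization of a tensor."""
    return X.reshape(-1, order="F")

def vec_inv(v, shape):
    """Inverse of vec for a tensor of the given shape."""
    return v.reshape(shape, order="F")

def f(X):
    """Test function with known gradient grad f(X) = X."""
    return 0.5 * np.sum(X ** 2)

def numerical_gradient(func, X, h=1e-6):
    """Central-difference partial derivatives w.r.t. vec(X), folded back into a tensor."""
    v = vec(X).copy()
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (func(vec_inv(v + e, X.shape)) - func(vec_inv(v - e, X.shape))) / (2 * h)
    return vec_inv(g, X.shape)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 2))
roundtrip = vec_inv(vec(X), X.shape)
grad_fd = numerical_gradient(f, X)
```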
Proof
The proof is similar to that of [13, Thm. 1] and is omitted.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mao, XP., Wang, Y. & Yang, YN. Computation over t-Product Based Tensor Stiefel Manifold: A Preliminary Study. J. Oper. Res. Soc. China (2024). https://doi.org/10.1007/s40305-023-00522-z
DOI: https://doi.org/10.1007/s40305-023-00522-z