
Computation over t-Product Based Tensor Stiefel Manifold: A Preliminary Study


Abstract

Let \( * \) denote the t-product between two third-order tensors proposed by Kilmer and Martin (Linear Algebra Appl 435(3): 641–658, 2011). The purpose of this work is to study fundamental computation over the set \( \textrm{St}\left( n,p,l\right) := \{\mathcal {X} \in \mathbb R^{n\times p \times l} \mid \mathcal {X} ^{\top } * \mathcal {X} = \mathcal I \}\), where \(\mathcal {X} \) is a third-order tensor of size \(n\times p \times l\) (\(n\geqslant p\)) and \({\mathcal {I}}\) is the identity tensor. It is first verified that \( \textrm{St}\left( n,p,l\right) \) endowed with the Euclidean metric forms a Riemannian manifold, which is termed the (third-order) tensor Stiefel manifold in this work. We then derive the tangent space, Riemannian gradient, and Riemannian Hessian on \( \textrm{St}\left( n,p,l\right) \). In addition, formulas of various retractions based on t-QR, t-polar decomposition, t-Cayley transform, and t-exponential, as well as vector transports, are presented. It is expected that, analogous to their matrix counterparts, the derived formulas may serve as building blocks for analyzing optimization problems over the tensor Stiefel manifold and designing Riemannian algorithms.


Data Availability

We did not analyze or generate any datasets, because this work is purely theoretical and mathematical.

Notes

  1. For the QR factorization of a complex matrix, we can choose the R factor to be upper triangular with real nonzero diagonal entries.

References

  1. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)

  2. Comon, P.: Tensors: a brief introduction. IEEE Signal Process. Mag. 31(3), 44–53 (2014)

  3. Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., Phan, H.A.: Tensor decompositions for signal processing applications: from two-way to multiway component analysis. IEEE Signal Process. Mag. 32(2), 145–163 (2015)

  4. Sidiropoulos, N., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65(13), 3551–3582 (2017)

  5. Braman, K.: Third-order tensors as linear operators on a space of matrices. Linear Algebra Appl. 433(7), 1241–1253 (2010)

  6. Kilmer, M.E., Martin, C.D.: Factorization strategies for third-order tensors. Linear Algebra Appl. 435(3), 641–658 (2011)

  7. Kilmer, M.E., Braman, K., Hao, N., Hoover, R.C.: Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 34(1), 148–172 (2013)

  8. Lu, C., Feng, J., Chen, Y., Liu, W., Lin, Z., Yan, S.: Tensor robust principal component analysis with a new tensor nuclear norm. IEEE Trans. Pattern Anal. Mach. Intell. 42(4), 925–938 (2019)

  9. Miao, Y., Qi, L., Wei, Y.: T-Jordan canonical form and t-Drazin inverse based on the t-product. Commun. Appl. Math. Comput. 3(2), 201–220 (2021)

  10. Lund, K.: The tensor t-function: a definition for functions of third-order tensors. Numer. Linear Algebra Appl. 27(3), e2288 (2020)

  11. Miao, Y., Qi, L., Wei, Y.: Generalized tensor function via the tensor singular value decomposition based on the T-product. Linear Algebra Appl. 590, 258–303 (2020)

  12. Liu, W.H., Jin, X.Q.: A study on T-eigenvalues of third-order tensors. Linear Algebra Appl. 612, 357–374 (2020)

  13. Zheng, M.M., Huang, Z.H., Wang, Y.: T-positive semidefiniteness of third-order symmetric tensors and T-semidefinite programming. Comput. Optim. Appl. 78(1), 239–272 (2021)

  14. Qi, L., Luo, Z.: Tubal matrices (2021). arXiv:2105.00793

  15. Huang, W., Absil, P.A., Gallivan, K.A.: A Riemannian BFGS method without differentiated retraction for nonconvex optimization problems. SIAM J. Optim. 28(1), 470–495 (2018)

  16. Hu, J., Jiang, B., Lin, L., Wen, Z., Yuan, Y.X.: Structured quasi-Newton methods for optimization with orthogonality constraints. SIAM J. Sci. Comput. 41(4), A2239–A2269 (2019)

  17. Chen, S., Ma, S., So, A.M.C., Zhang, T.: Proximal gradient method for nonsmooth optimization over the Stiefel manifold. SIAM J. Optim. 30(1), 210–239 (2020)

  18. Huang, W., Wei, K.: Riemannian proximal gradient methods. Math. Program. 194, 371–413 (2022)

  19. Gao, B., Liu, X., Chen, X., Yuan, Y.X.: A new first-order algorithmic framework for optimization problems with orthogonality constraints. SIAM J. Optim. 28(1), 302–332 (2018)

  20. Hu, J., Liu, X., Wen, Z.W., Yuan, Y.X.: A brief introduction to manifold optimization. J. Oper. Res. Soc. China 8(2), 199–248 (2020)

  21. Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2009)

  22. Tu, L.W.: An Introduction to Manifolds, 2nd edn. Universitext. Springer, New York (2011)

  23. Boumal, N.: An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Cambridge (2022)

  24. Uschmajew, A., Vandereycken, B.: The geometry of algorithms using hierarchical tensors. Linear Algebra Appl. 439(1), 133–166 (2013)

  25. Holtz, S., Rohwedder, T., Schneider, R.: On manifolds of tensors of fixed TT-rank. Numer. Math. 120(4), 701–731 (2012)

  26. Kressner, D., Steinlechner, M., Vandereycken, B.: Low-rank tensor completion by Riemannian optimization. BIT Numer. Math. 54(2), 447–468 (2014)

  27. Heidel, G., Schulz, V.: A Riemannian trust-region method for low-rank tensor completion. Numer. Linear Algebra Appl. 25(6), e2175 (2018)

  28. Steinlechner, M.: Riemannian optimization for high-dimensional tensor completion. SIAM J. Sci. Comput. 38(5), S461–S484 (2016)

  29. Breiding, P., Vannieuwenhoven, N.: A Riemannian trust region method for the canonical tensor rank approximation problem. SIAM J. Optim. 28(3), 2435–2465 (2018)

  30. Gilman, K., Tarzanagh, D.A., Balzano, L.: Grassmannian optimization for online tensor completion and tracking with the t-SVD. IEEE Trans. Signal Process. 70, 2152–2167 (2022)

  31. Song, G.J., Wang, X.Z., Ng, M.K.: Riemannian conjugate gradient descent method for fixed multi rank third-order tensor completion. J. Comput. Appl. Math. 421, 114866 (2023)

  32. Zhang, X., Yang, Z.P., Cao, C.G.: Inequalities involving Khatri–Rao products of positive semidefinite matrices. Appl. Math. E-Notes 2, 117–124 (2002)

  33. Huang, W.: Optimization algorithms on Riemannian manifolds with applications. Ph.D. thesis, The Florida State University (2013)

  34. Zhu, X.: A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Comput. Optim. Appl. 67(1), 73–110 (2017)

  35. Edelman, A., Arias, T., Smith, S.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)

  36. Bunse-Gerstner, A., Byers, R., Mehrmann, V.: Numerical methods for simultaneous diagonalization. SIAM J. Matrix Anal. Appl. 14(4), 927–949 (1993)

  37. Pesquet-Popescu, B., Pesquet, J.C., Petropulu, A.P.: Joint singular value decomposition - a new tool for separable representation of images. In: International Conference on Image Processing. vol. 2, pp. 569–572. IEEE, Thessaloniki, Greece (2001)

  38. Shashua, A., Levin, A.: Linear image coding for regression and classification using the tensor-rank principle. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). vol. 1, pp. I–42–I–49. IEEE Computer Society, Kauai, HI, USA (2001)

  39. Allen, G.I.: Sparse higher-order principal components analysis. In: International Conference on Artificial Intelligence and Statistics. vol. 22, pp. 27–36. PMLR, La Palma, Canary Islands (2012)

  40. Wang, Y., Dong, M., Xu, Y.: A sparse rank-1 approximation algorithm for high-order tensors. Appl. Math. Lett. 102, 106140 (2020)

  41. Mao, X., Yang, Y.: Several approximation algorithms for sparse best rank-1 approximation to higher-order tensors. J. Glob. Optim. (2022). https://doi.org/10.1007/s10898-022-01140-4

  42. Kwak, N.: Principal component analysis based on \(\ell _1\)-norm maximization. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1672–1680 (2008)

  43. Hao, N., Kilmer, M.E., Braman, K., Hoover, R.C.: Facial recognition using tensor–tensor decompositions. SIAM J. Imaging Sci. 6(1), 437–463 (2013)

  44. Schönemann, P.H.: A generalized solution of the orthogonal procrustes problem. Psychometrika 31(1), 1–10 (1966)

  45. Lin, J., Huang, T.Z., Zhao, X.L., Jiang, T.X., Zhuang, L.: A tensor subspace representation-based method for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 59(9), 7739–7757 (2020)

  46. Xu, S.S., Huang, T.Z., Lin, J., Chen, Y.: T-hy-demosaicing: hyperspectral reconstruction via tensor subspace representation under orthogonal transformation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 14, 4842–4853 (2021)

  47. Xu, T., Huang, T.Z., Deng, L.J., Yokoya, N.: An iterative regularization method based on tensor subspace representation for hyperspectral image super-resolution. IEEE Trans. Geosci. Remote Sens. 60, 1–16 (2022)

  48. Hoover, R.C., Caudle, K., Braman, K.: Multilinear discriminant analysis through tensor-tensor eigendecomposition. In: IEEE International Conference on Machine Learning and Applications (ICMLA). pp. 578–584. IEEE, Orlando, FL (2018)

  49. Ozdemir, C., Hoover, R.C., Caudle, K., Braman, K.: High-order multilinear discriminant analysis via order-\(n\) tensor eigendecomposition. Technical report, SSRN (2022). https://dx.doi.org/10.2139/ssrn.4203431

  50. Vervliet, N., Debals, O., Sorber, L., Van Barel, M., De Lathauwer, L.: Tensorlab 3.0 (2016). http://www.tensorlab.net

  51. Lu, C.: Tensor-Tensor Product Toolbox. Carnegie Mellon University, Pittsburgh (2018)

  52. Iannazzo, B., Porcelli, M.: The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA J. Numer. Anal. 38(1), 495–517 (2018)

  53. Kilmer, M.E., Horesh, L., Avron, H., Newman, E.: Tensor–tensor algebra for optimal representation and compression of multiway data. Proc. Natl. Acad. Sci. U.S.A. 118(28), e2015851118 (2021)

  54. Kernfeld, E., Kilmer, M., Aeron, S.: Tensor–tensor products with invertible linear transforms. Linear Algebra Appl. 485, 545–570 (2015)

  55. Hall, B.C.: Lie Groups, Lie Algebras, and representations. Springer, Cham (2015)

  56. Van Loan, C.: Computing integrals involving the matrix exponential. IEEE Trans. Autom. Control 23(3), 395–404 (1978)

  57. Van Loan, C.F.: The ubiquitous Kronecker product. J. Comput. Appl. Math. 123(1–2), 85–100 (2000)

  58. Kolda, T.G.: Multilinear operators for higher-order decompositions. Tech. Rep. SAND2006-2081, Sandia National Laboratories (2006)


Author information

Contributions

X.-P. Mao, Y. Wang and Y.-N. Yang deduced the theories, designed the algorithms, performed the numerical experiments, drafted the manuscript, and read and approved the final manuscript.

Corresponding author

Correspondence to Yu-Ning Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This work was supported by the National Natural Science Foundation of China (No. 12171105), Fok Ying Tong Education Foundation (No. 171094), and the special foundation for Guangxi Bagui Scholars.

Appendices

A Preliminaries on Riemannian Manifold

Basic definitions and properties concerning Riemannian manifolds can be found in the books [21,22,23]. For convenience and to make the paper self-contained, we summarize the necessary ones in this section.

Definition 13

[21, 22] A topological manifold \(\mathscr {M}\) of dimension n is a Hausdorff, second countable topological space that is locally Euclidean of dimension n. Let \(\mathscr {N}\) be a submanifold of \(\mathscr {M}\). If the manifold topology of \(\mathscr {N}\) coincides with its subspace topology induced from the topological space \(\mathscr {M}\), then \(\mathscr {N}\) is called an embedded submanifold of the manifold \(\mathscr {M}\).

Definition 14

[21] A tangent vector \(\xi _{x}\) to \(\mathscr {M}\) at x is defined as a mapping from \(\mathfrak {F}_{x}(\mathscr {M})\) to \(\mathbb {R}\) such that \( \xi _{x} f:=\dot{\gamma }(0) f:= \frac{\textrm{d}}{\textrm{d} t}f(\gamma (t))\mid _{t=0}, \quad \forall f \in \mathfrak {F}_{x}(\mathscr {M}), \) for some smooth curve \(\gamma (t)\) on \(\mathscr {M}\) with \(\gamma (0)=x\). The tangent space \(T_{x} \mathscr {M}\) to \(\mathscr {M}\) at x is defined as the set of all tangent vectors to \(\mathscr {M}\) at x. The set \(T\mathscr {M}:=\bigcup _{x \in \mathscr {M}} T_{x} \mathscr {M}\) is called the tangent bundle of the manifold.

Definition 15

[21] The differential of \(F: \mathscr {M} \rightarrow \mathscr {N}\) at x is a linear operator \(\textrm{D}F(x): T_{x} \mathscr {M} \rightarrow T_{F(x)} \mathscr {N}\) defined by: \( \textrm{D} F(x)[v]:=\frac{\textrm{d}}{\textrm{d} t} F(\gamma (t))\mid _{t=0}, \) where \(\gamma (t)\) is any curve on the manifold that satisfies \(\gamma (0)=x\) and \(\dot{\gamma }(0)=v\).

Definition 16

[21] A Riemannian metric g assigns to each point \(x\in \mathscr {M}\) an inner product \(g_{x}: T_{x} \mathscr {M} \times T_{x} \mathscr {M} \rightarrow \mathbb {R}\), written \( g_{x}(\eta , \xi )=\langle \eta , \xi \rangle _{x} \) for \(\eta , \xi \in T_{x} \mathscr {M}\). A Riemannian manifold is the pair \((\mathscr {M}, g)\).

Definition 17

[33] The geodesic \(\gamma (t)\) defined by an affine connection is a curve that satisfies \( \ddot{\gamma }(t):=\frac{\textrm{D}^{2}}{\textrm{d} t^{2}} \gamma (t):=\frac{\textrm{D}}{\textrm{d} t} \dot{\gamma }(t) =0, \) where \(\frac{\textrm{D}}{\textrm{d} t}\) is the induced covariant derivative (see [23, Thm. 5.29]).

Definition 18

[21] The Riemannian gradient \({\text {grad}}f(x)\) of a function f at x is the unique vector in \(T_x\mathscr {M}\) satisfying \(\langle {\text {grad}} f(x), \xi _x\rangle _{x}=\textrm{D} f(x)[\xi _x], \quad \forall \xi _x \in T_{x} \mathscr {M}.\) The Riemannian Hessian \( {\text {Hess}}f(x)\) is a mapping from the tangent space \(T_{x} \mathscr {M}\) to itself: \( {\text {Hess}} f(x)[\xi ]:={\nabla }_{\xi } {\text {grad}} f(x), \) where \({\nabla }\) is the Riemannian connection on \(\mathscr {M}\) (see [21, Thm. 5.3.1]).

Lemma 8

[23] Let \(\mathscr {M}\) be a Riemannian submanifold of a Euclidean space \(\mathscr {E}\) and let \(f: \mathscr {M} \rightarrow \mathbb {R}\) be a smooth function. Then,

$$\begin{aligned} {\text {grad}} f(x)&=\varvec{P}_x({\text {grad}} {\bar{f}}(x)), \\ {\text {Hess}} f(x)[u]&=\varvec{P}_x(\textrm{D} {\bar{G}}(x)[u]), \quad u \in T_{x} \mathscr {M}, \end{aligned}$$

where \(\textrm{D}\) is the Euclidean derivative, \(\varvec{P}_x \) denotes the orthogonal projection from \(\mathscr {E}\) onto \(T_{x} \mathscr {M}\), \(\bar{f}\) is any smooth extension of f to a neighborhood of \(\mathscr {M}\) in \(\mathscr {E}\), and \(\bar{G}\) is any smooth extension of the vector field \(G:={\text {grad}} f\).
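To make Lemma 8 concrete, here is a minimal NumPy sketch of our own (not code from the paper) for the simplest embedded submanifold, the unit sphere \(S^{n-1}\subset \mathbb R^n\), with \(f(x)=\frac{1}{2} x^{\top }Ax\); all function names are hypothetical.

```python
import numpy as np

# Riemannian gradient and Hessian on the unit sphere via Lemma 8.
# Euclidean data: grad fbar(x) = A x for f(x) = 0.5 x^T A x.

def proj(x, y):
    """Orthogonal projection of y onto the tangent space T_x S^{n-1}."""
    return y - np.dot(x, y) * x

def riem_grad(A, x):
    return proj(x, A @ x)                        # grad f(x) = P_x(A x)

def riem_hess(A, x, u):
    # P_x(D Gbar(x)[u]) with Gbar(x) = A x - (x^T A x) x reduces to
    # Hess f(x)[u] = P_x(A u) - (x^T A x) u on the sphere.
    return proj(x, A @ u) - np.dot(x, A @ x) * u

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
x = rng.standard_normal(n); x /= np.linalg.norm(x)
u = proj(x, rng.standard_normal(n))              # a tangent vector at x
print(np.dot(x, riem_grad(A, x)))                # ~0: gradient is tangent
print(np.dot(x, riem_hess(A, x, u)))             # ~0: Hessian maps into T_x
```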

A retraction provides a way to map a tangent vector to the next iterate on the manifold.

Definition 19

(cf. [21, Def. 4.1.1]) A retraction on a manifold \(\mathscr {M}\) is a smooth mapping R from the tangent bundle \(T\mathscr {M} \) onto \(\mathscr {M}\) with the following properties, where \(R_x\) denotes the restriction of R to \(T_x\mathscr {M}\): \((i)~R_x(0_x) = x\), where \(0_x\) denotes the zero element of \(T_x\mathscr {M} \); \((ii)~\textrm{D} R_x(0_x):T_x\mathscr {M}\rightarrow T_x\mathscr {M}\) is the identity map: \(\textrm{D} R_x(0_x)[v] = v\).

For an embedded submanifold of a vector space, a simple way to construct retractions is given by the following lemma.

Lemma 9

(cf. [21, Prop. 4.1.2]) Let \(\mathscr {M}\) be an embedded submanifold of a vector space \(\mathscr {E}\) and let \(\mathscr {N}\) be an abstract manifold such that \(\dim {{\mathscr {M}}}+\dim {{\mathscr {N}}}=\dim {{\mathscr {E}}}\). Assume that there is a diffeomorphism \( \phi :\mathscr {M} \times \mathscr {N}\rightarrow \mathscr {E}_{*}:(F,G)\mapsto \phi (F,G), \) where \(\mathscr {E}_{*}\) is an open subset of \(\mathscr {E}\) (thus \(\mathscr {E}_{*}\) is an open submanifold of \(\mathscr {E}\)), with a neutral element \(I \in \mathscr {N}\) satisfying \( \phi (F,I) = F,~~ \forall F \in \mathscr {M}. \) Then the mapping \( R_{X}(\xi ) = \pi _1 \left( \phi ^{-1}(X+\xi ) \right) , \) where \( \pi _1:\mathscr {M} \times \mathscr {N}\rightarrow \mathscr {M}:(F,G)\mapsto F\) is the projection onto the first component, defines a retraction on \(\mathscr {M}\).
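In the matrix case, Lemma 9 with \(\phi (Q,R)=QR\) yields the familiar QR retraction on the Stiefel manifold \(\textrm{St}(n,p)\); the following is a minimal NumPy sketch of our own (names hypothetical), not code from the paper.

```python
import numpy as np

# QR retraction on St(n, p) = {X : X^T X = I_p}: R_X(xi) = qf(X + xi),
# where qf() returns the Q factor of the thin QR factorization,
# normalized so that the R factor has positive diagonal (making qf unique).

def qf(Y):
    Q, R = np.linalg.qr(Y)
    d = np.sign(np.diag(R)); d[d == 0] = 1.0
    return Q * d

def retract_qr(X, xi):
    return qf(X + xi)

rng = np.random.default_rng(1)
n, p = 6, 3
X = qf(rng.standard_normal((n, p)))               # a point on St(n, p)
W = rng.standard_normal((p, p))
xi = X @ (W - W.T)                                # a tangent vector at X
Y = retract_qr(X, xi)
print(np.linalg.norm(Y.T @ Y - np.eye(p)))        # ~0: Y stays on St(n, p)
```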

To compare tangent vectors at distinct points on the manifold, a vector transport associated with a retraction R gives us a way to transport a tangent vector \(\xi \in T_x\mathscr {M}\) to the tangent space \(T_{R_x(\eta )}\mathscr {M} \) for some \(\eta \in T_x\mathscr {M} \).

Definition 20

(cf. [21, Def. 8.1.1]) A vector transport \(\mathcal {T} : T\mathscr {M} \oplus T\mathscr {M} \rightarrow T\mathscr {M}:(\eta _x,\xi _x ) \mapsto \mathcal {T}_{\eta _x}\xi _x\) associated with a retraction R is a smooth mapping satisfying the following properties for all \(x\in \mathscr {M}\): (i) \(\mathcal {T} _{\eta _x}\xi _x\in T_{R_x(\eta _x)}\mathscr {M}\), (ii) \(\mathcal {T} _{0_{x}}\xi _x = \xi _x\) for all \(\xi _x\in T_x\mathscr {M}\), and (iii) \(\mathcal {T} _{\eta _{x}}(a\xi _x+b\zeta _x) = a\mathcal {T}_{\eta _{x}}\xi _x+b\mathcal {T}_{\eta _{x}}\zeta _x\). Vector transport by differentiated retraction is defined as

$$\begin{aligned} \mathcal {T}_{\eta _x}\xi _x:= \textrm{D} R_{x}(\eta _x)[\xi _x] = \frac{\textrm{d}}{\textrm{d}t}R_{x}(\eta _x+t\xi _x)\mid _{t=0}. \end{aligned}$$
(A1)

Lemma 10

(cf. [21, Sect. 8.1.3]) A vector transport on \(\mathscr {M}\) associated with a retraction R is given by the orthogonal projection onto the tangent space, i.e., \( \mathcal {T}_{\eta _x}{\xi _x} = \varvec{P}_{R_x(\eta _x)}\xi _x. \)

Definition 21

[34] A vector transport \(\mathcal {T}\) is called isometric if for all \(\eta , \xi \in T_{x} \mathscr {M}\), it satisfies \( \left\langle \mathcal {T}_{\eta }(\xi ), \mathcal {T}_{\eta }(\xi )\right\rangle _{R_{x}(\eta )}=\langle \xi , \xi \rangle _{x} \) where R is the retraction associated with \(\mathcal {T}\).

B Proofs of Theorems, Propositions and Lemmas in Sect. 2

B.1 Proof of Proposition 2

Proof

Taking the conjugate transpose of both sides of the equation in item (i) of Proposition 1, then multiplying both sides on the left by \((F_l\otimes I_p)\), we get

$$\begin{aligned} (F_l\otimes I_p){\text {bcirc}}(\mathcal {A}^{\top }) ={\text {Diag}}\left( (\hat{A}^{(i)})^H: i \in [l]\right) (F_l\otimes I_n), \end{aligned}$$

where we use the following property ( [9, Lem. 3]): \({\text {bcirc}}(\mathcal {A})^{\top } = {\text {bcirc}}(\mathcal {A}^{\top }) \). Taking the first block column on both sides of the above equation yields

$$\begin{aligned} (F_l\otimes I_p){\text {unfold}}(\mathcal {A}^{\top })&= \frac{1}{\sqrt{l}}{\text {Diag}}\left( (\hat{A}^{(i)})^H: i \in [l]\right) {\text {Vec}}\left( I_n: i \in [l]\right) \\&= \frac{1}{\sqrt{l}} {\text {Vec}}\left( (\hat{A}^{(i)})^H: i \in [l]\right) , \end{aligned}$$

which, combined with Definition 6, gives \(L(\mathcal {A} ^{\top }) = {\text {fold}}\left( (\hat{A}^{(i)})^{H}: i \in [l]\right) .\)

B.2 Proof of Theorem 3

Proof

t-QR was proposed in [7, Sect. 2.5].

For \(i= 1, \cdots , \lceil \frac{l+1}{2}\rceil ,\) let \( \hat{A}^{(i)} = \hat{Q}^{(i)}\cdot \hat{R}^{(i)}\) be the QR decomposition of \(\hat{A}^{(i)}\in \mathbb C^{n\times p} \) (see Note 1), where \(\hat{Q}^{(i)}\in \mathbb C^{n\times p} \), \((\hat{Q}^{(i)})^H\cdot \hat{Q}^{(i)} = I_p\), \(\hat{R}^{(i)}\in \mathbb C^{p\times p} _{\textrm{upp}}\), and the diagonal entries of \({\hat{R}}^{(i)}\) are real. For \(i=1+ \lceil \frac{l+1}{2} \rceil ,\cdots ,l\), set \( {\hat{A}}^{(i)} = \textrm{conj}\left( {\hat{A}}^{(l+2-i)}\right) , {\hat{Q}}^{(i)} = \textrm{conj}\left( {\hat{Q}}^{(l+2-i)}\right) , {\hat{R}}^{(i)} = \textrm{conj}\left( {\hat{R}}^{(l+2-i)}\right) . \) It follows from Remarks 3 and 4 that \({\mathcal {Q}} \in \textrm{St}\left( n,p,l\right) \) and \({\mathcal {R}}\in \mathbb {R}_{\textrm{upp}}^{p\times p \times l} \); that \({\mathcal {R}}\) is real follows from Remark 3 and direct computation. Using Remark 3 again, we further have \(\hat{A}^{(1)}\in \mathbb R^{n\times p} , \hat{Q}^{(1)}\in \mathbb R^{n\times p} , \hat{R}^{(1)}\in \mathbb R^{p\times p} \).

We then show the uniqueness of the decomposition. For the QR decomposition of a matrix \(\hat{A}^{(i)}\in \mathbb C^{n\times p} \) with \(n\geqslant p\), if \(\hat{A}^{(i)}, i \in [l]\), are of full rank p, namely, \(\hat{\mathcal {A} }\in \mathbb C^{n\times p \times l} _*\), then the QR decompositions \(\hat{A}^{(i)} = \hat{Q}^{(i)}\hat{R}^{(i)}\) are unique if we require that the diagonal entries of \(\hat{R}^{(i)}\) are all positive, namely, \(\hat{\mathcal {R} }\in \mathbb {C}_{\textrm{upp}+}^{p\times p \times l} \). Since the Fourier transform is bijective, the uniqueness of the matrix QR decomposition leads to the uniqueness of the t-QR decomposition.
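The construction in this proof translates directly into code. Below is a minimal NumPy sketch of our own (not the authors' implementation; names hypothetical): FFT along the third mode, QR of the first \(\lceil \frac{l+1}{2}\rceil \) Fourier-domain slices with phases absorbed so that \(\textrm{diag}(\hat{R}^{(i)})\) is real and positive, conjugate symmetry for the remaining slices, and an inverse FFT.

```python
import numpy as np

def t_qr(A):
    """t-QR of a real n x p x l tensor, following the proof of Theorem 3."""
    n, p, l = A.shape
    Ah = np.fft.fft(A, axis=2)
    Qh = np.zeros((n, p, l), dtype=complex)
    Rh = np.zeros((p, p, l), dtype=complex)
    for i in range((l + 2) // 2):            # slices 1, ..., ceil((l+1)/2)
        Q, R = np.linalg.qr(Ah[:, :, i])
        dg = np.diag(R).copy(); dg[dg == 0] = 1.0
        ph = dg / np.abs(dg)                 # unit-modulus phases of diag(R)
        Qh[:, :, i] = Q * ph                 # absorb phases into Q ...
        Rh[:, :, i] = ph.conj()[:, None] * R # ... so diag(R) is real and > 0
    for i in range((l + 2) // 2, l):         # remaining slices by symmetry
        Qh[:, :, i] = np.conj(Qh[:, :, l - i])
        Rh[:, :, i] = np.conj(Rh[:, :, l - i])
    return np.fft.ifft(Qh, axis=2).real, np.fft.ifft(Rh, axis=2).real

Q, R = t_qr(np.random.default_rng(0).standard_normal((5, 3, 4)))
Qh = np.fft.fft(Q, axis=2)                   # check Q is in St(n, p, l)
print(max(np.linalg.norm(Qh[:, :, i].conj().T @ Qh[:, :, i] - np.eye(3))
          for i in range(4)))                # ~0
```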

B.3 Proof of Lemma 3

Proof

The proof of Theorem 3 shows that S is isomorphic to the set \(\hat{S}\) described below.

If l is even, then it holds that

$$\begin{aligned} \begin{aligned} \hat{S} = \Biggl \{\hat{\mathcal {R} }~\bigg |~&\hat{R}^{(1)},\hat{R}^{(\frac{l}{2}+1)}\in \mathbb R^{p\times p} _{\textrm{upp}+}, \hat{R}^{(i)}\in \mathbb C^{p\times p} _{\textrm{upp}+}, i \in [l]\setminus \{1,\frac{l}{2}+1\},\\&\hat{R}^{(i)} = {\text {conj}}(\hat{R}^{(l+2-i)}), i = 2,\cdots ,\frac{l}{2} \Biggr \}. \end{aligned} \end{aligned}$$

We then examine which matrices \(\hat{R}^{(i)}, i \in [l]\), contain free variables. There are two real upper triangular \(p\times p\) matrices, each of dimension \(\frac{(1+p)p}{2}\), and \(\frac{l-2}{2}\) complex upper triangular \(p\times p\) matrices with positive diagonal elements, each of dimension \(\frac{(p-1)p}{2}\times 2 + p\). Hence the dimension of \(\hat{S}\) is \(2\times \frac{(1+p)p}{2}+\frac{l-2}{2}\times \big (\frac{(p-1)p}{2}\times 2 + p\big )=\frac{p^2l}{2}+p\).

If l is odd, then it holds that

$$\begin{aligned} \begin{aligned} \hat{S} = \Biggl \{\hat{\mathcal {R} }~\bigg |~&\hat{R}^{(1)}\in \mathbb R^{p\times p} _{\textrm{upp}+}, \hat{R}^{(i)}\in \mathbb C^{p\times p} _{\textrm{upp}+}, i \in [l]\setminus \{1\},\\&\hat{R}^{(i)} = {\text {conj}}(\hat{R}^{(l+2-i)}), i = 2,\cdots ,\frac{l+1}{2} \Biggr \}. \end{aligned} \end{aligned}$$

There is one real upper triangular \(p\times p\) matrix of dimension \(\frac{(1+p)p}{2}\), and \(\frac{l-1}{2}\) complex upper triangular \(p\times p\) matrices with positive diagonal elements, each of dimension \(\frac{(p-1)p}{2}\times 2 + p\). Hence the dimension of \(\hat{S}\) is \( \frac{(1+p)p}{2}+\frac{l-1}{2}\times \big (\frac{(p-1)p}{2}\times 2 + p\big )=\frac{p^2l+p}{2}\).

B.4 Proof of Theorem 4

Proof

Let the compact t-SVD of \({\mathcal {A}}\) be \({\mathcal {A}}= {\mathcal {U}} * \mathcal S * {\mathcal {V}}^{\top }\). Let \({\mathcal {P}}:=\mathcal U * {\mathcal {V}}^{\top }\) and \({\mathcal {H}}:= \mathcal V * {\mathcal {S}} * {\mathcal {V}}^{\top }\). Then it is clear that (8) is satisfied. To see that \({\mathcal {H}}\in \textrm{Sym}(\mathbb R_+^{p\times p \times l}) \), first note that \({\mathcal {S}}\in \textrm{Sym}(\mathbb R_+^{p\times p \times l}) \): each \({\hat{S}}^{(i)}\) is diagonal with nonnegative entries, so the claim follows from Remark 5. By [13, Thm. 7], there is a unique \({\mathcal {T}}\) such that \({\mathcal {T}} * {\mathcal {T}}^{\top }= \mathcal S\). Then \({\mathcal {H}} \) can be written as \({\mathcal {H}} = \mathcal V * {\mathcal {T}} * \left( {\mathcal {V}} * \mathcal T \right) ^{\top }\), which together with [13, Thm. 8] shows that \({\mathcal {H}} \in \textrm{Sym}(\mathbb R_+^{p\times p \times l}) \).

To show the uniqueness of \({\mathcal {H}}\), note that \( \mathcal A^{\top } * {\mathcal {A}}={\mathcal {H}} * {\mathcal {H}}\), which by [13, Thm. 8] is clearly symmetric positive semidefinite. Invoking again [13, Thm. 7] gives the uniqueness of \({\mathcal {H}}\).

If \({\mathcal {A}}^{\top } * {\mathcal {A}}\in \textrm{Sym}(\mathbb R_{++}^{p\times p \times l}) \), [13, Thm. 8] shows that \({\mathcal {H}}\) is nonsingular (invertible, Def. 5), and so \({\mathcal {P}} = \mathcal A * {\mathcal {H}}^{-1}\), which is unique.

Remark 16

The proof of Theorem 4 gives a way to obtain the t-PD from the compact t-SVD. This is analogous to the matrix case.
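The following is a minimal NumPy sketch of our own (names hypothetical) of this route from the compact t-SVD to the t-PD: in the Fourier domain, a slicewise SVD \(\hat{A}^{(i)} = U_i S_i V_i^H\) gives \(\hat{P}^{(i)} = U_i V_i^H\) and \(\hat{H}^{(i)} = V_i S_i V_i^H\).

```python
import numpy as np

def t_polar(A):
    """t-polar decomposition A = P * H via slicewise SVDs in the Fourier
    domain.  For generic full-rank A the slicewise polar factors inherit
    the conjugate symmetry of the FFT, so the inverse FFT is real."""
    n, p, l = A.shape
    Ah = np.fft.fft(A, axis=2)
    Ph = np.zeros((n, p, l), dtype=complex)
    Hh = np.zeros((p, p, l), dtype=complex)
    for i in range(l):
        U, s, Vh = np.linalg.svd(Ah[:, :, i], full_matrices=False)
        Ph[:, :, i] = U @ Vh
        Hh[:, :, i] = Vh.conj().T @ np.diag(s) @ Vh
    return np.fft.ifft(Ph, axis=2).real, np.fft.ifft(Hh, axis=2).real

A = np.random.default_rng(2).standard_normal((5, 3, 4))
P, H = t_polar(A)
Ph, Hh = np.fft.fft(P, axis=2), np.fft.fft(H, axis=2)
Ah = np.fft.fft(A, axis=2)
print(max(np.linalg.norm(Ah[:, :, i] - Ph[:, :, i] @ Hh[:, :, i])
          for i in range(4)))                 # ~0: A = P * H slicewise
```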

B.5 Proof of Proposition 6

Proof

This can be easily derived from the proof of Theorem 4. Here the root of a symmetric positive definite tensor was defined in [13, Thm. 7].

B.6 Proof of Proposition 7

Proof

If \(\hat{\mathcal {A}}\in \mathbb {C}_*^{n\times p \times l}\), then \((\hat{A}^{(i)})^H\hat{A}^{(i)}, i \in [l]\), are Hermitian positive definite. Note that [13, Thm. 5] shows that \((\hat{A}^{(i)})^H\hat{A}^{(i)}, i \in [l]\), are Hermitian positive definite if and only if \( \mathcal {A}^{\top }*\mathcal {A}\in \textrm{Sym}(\mathbb R_{++}^{p\times p \times l}) \).

B.7 Proof of Theorem 5

Proof

Let \({\mathcal {D}}:={\mathcal {U}}^{\top }\in \mathbb R^{p\times n \times l} \). Then for any \({\mathcal {P}}\in \textrm{St}\left( n,p,l\right) \),

$$\begin{aligned} l \left\langle {\mathcal {A}} , {\mathcal {P}}\right\rangle&= \textrm{tr}\left( {\mathcal {A}}^{\top } * {\mathcal {P}} \right) \\&= \textrm{tr}\left( {\mathcal {V}} * {\mathcal {S}} * {\mathcal {U}}^{\top } * {\mathcal {P}} \right) \\&= \textrm{tr}\left( {\mathcal {S}} * {\mathcal {D}} * {\mathcal {P}} * {\mathcal {V}} \right) \\&= \textrm{tr}\left( {\hat{S}} {\hat{D}}{\hat{P}}{\hat{V}} \right) \\&= \sum ^l_{i=1} \textrm{tr}\left( {\hat{S}}^{(i)} {\hat{D}}^{(i)} {\hat{P}}^{(i)} {\hat{V}}^{(i)} \right) = \sum ^l_{i=1} \textrm{tr}\left( {\hat{S}}^{(i)} {\hat{W}}^{(i)} \right) , \end{aligned}$$

where we let \({\hat{W}}^{(i)}:= {\hat{D}}^{(i)}{\hat{P}}^{(i)}{\hat{V}}^{(i)} \in \mathbb C^{p\times p}\). Note that \({\hat{D}}^{(i)} ({\hat{D}}^{(i)})^H = I_p\), \(({\hat{P}}^{(i)})^H {\hat{P}}^{(i)} = I_p\), \(({\hat{V}}^{(i)})^H{\hat{V}}^{(i)} = I_p\). Thus \( \mid ({\hat{W}}^{(i)})_{jj} \mid \leqslant 1 \), \(i \in [l]\), \(j \in [p]\). Therefore,

$$\begin{aligned} \sum ^l_{i=1} \textrm{tr}\left( {\hat{S}}^{(i)} {\hat{W}}^{(i)} \right)&= \sum ^l_{i=1}\sum ^p_{j=1} ({\hat{S}}^{(i)})_{jj} ({\hat{W}}^{(i)})_{jj} \\&\leqslant \sum ^l_{i=1}\sum ^p_{j=1}({\hat{S}}^{(i)})_{jj} \mid {\hat{W}}^{(i)}_{jj}\mid \\&\leqslant \sum ^l_{i=1} \textrm{tr}\left( {\hat{S}}^{(i)} \right) = \textrm{tr}\left( {\hat{S}} \right) , \end{aligned}$$

where \({\hat{S}}^{(i)}\geqslant 0\). On the other hand, take \(\mathcal P:={\mathcal {U}} * {\mathcal {V}}^{\top }\). It is easy to see that

$$\begin{aligned} l \left\langle {\mathcal {A}} , {\mathcal {P}}\right\rangle = \textrm{tr}\left( {\hat{S}} \right) , \end{aligned}$$

namely, the upper bound is tight, which is achieved when \({\mathcal {P}} = {\mathcal {U}} * {\mathcal {V}}^{\top }\). This gives the desired result.
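A small numerical check of Theorem 5, as a sketch of our own reusing the hypothetical t_polar and t_qr functions from the earlier sketches: the maximizer \({\mathcal {P}} = {\mathcal {U}} * {\mathcal {V}}^{\top }\) is exactly the orthogonal factor of the t-polar decomposition of \({\mathcal {A}}\).

```python
import numpy as np

# Compare <A, P> at P = U * V^T (the t-polar orthogonal factor) against
# random feasible points on St(n, p, l) generated by t-QR.

rng = np.random.default_rng(3)
n, p, l = 5, 3, 4
A = rng.standard_normal((n, p, l))
P_opt, _ = t_polar(A)                   # P_opt = U * V^T
best = np.sum(A * P_opt)                # Euclidean inner product <A, P_opt>
for _ in range(200):
    Q, _ = t_qr(rng.standard_normal((n, p, l)))
    assert np.sum(A * Q) <= best + 1e-10
print("maximum of <A, P> over sampled points:", best)
```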

B.8 Proof of the Well-Defined Property of (10)

Proof

For convenience, we will use the notation \( \Delta \) for the frontal-slice-wise product (cf. [54, Def. 2.1]) between two tensors in the Fourier domain, i.e., if \({\hat{C}}^{(i)} = {\hat{A}}^{(i)}{\hat{B}}^{(i)}, i \in [l],\) then it holds that \(L(\mathcal {A} )\Delta L(\mathcal {B} ) = {\text {fold}}\left( \hat{A}^{(i)}\hat{B}^{(i)}: i \in [l]\right) \); in other words,

$$\begin{aligned} L(\mathcal {C} ) = L(\mathcal {A} )\Delta L(\mathcal {B} ) \Leftrightarrow {\hat{C}}^{(i)} = {\hat{A}}^{(i)}{\hat{B}}^{(i)}, i \in [l] \Leftrightarrow \mathcal {C} = {\mathcal {A}} * {\mathcal {B}}. \end{aligned}$$
(A2)

Using this notation, we have

$$\begin{aligned} \mathcal {A}^k = L^{-1} \left( L(\mathcal {A})\Delta \cdots \Delta L(\mathcal {A}) \right) = L^{-1} \left( {\text {fold}}\left( (\hat{{A}}^{(i)})^k: i \in [l]\right) \right) . \end{aligned}$$

Thus for any \(N\),

$$\begin{aligned} \sum \limits _{k=0}^{N} \frac{1}{k!} \mathcal {A}^k&=\sum \limits _{k=0}^{N} \frac{1}{k!} L^{-1} \left( {\text {fold}}\left( (\hat{{A}}^{(i)})^k: i \in [l]\right) \right) \\&=L^{-1} \left( {\text {fold}}\left( \sum _{k=0}^N \frac{1}{k!} (\hat{{A}}^{(i)})^k: i \in [l]\right) \right) . \end{aligned}$$

Letting \(N \rightarrow \infty \), it holds that

$$\begin{aligned} {\text {exp}} \left[ \mathcal {A} \right]&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ \hat{{A}}^{(i)} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ (L(\mathcal {A} ))^{(i)} \right] : i \in [l]\right) \right) ,\end{aligned}$$

since the series defining the matrix exponential is convergent [55, Prop. 2.1].
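A minimal NumPy/SciPy sketch of our own of the t-exponential in this slicewise form (names hypothetical):

```python
import numpy as np
from scipy.linalg import expm

def t_exp(A):
    """t-exponential: matrix exponential of each frontal slice in the
    Fourier domain, then an inverse FFT.  Since expm(conj(M)) =
    conj(expm(M)), the conjugate symmetry of the FFT of a real tensor is
    preserved and the result is real."""
    Ah = np.fft.fft(A, axis=2)
    Eh = np.stack([expm(Ah[:, :, i]) for i in range(A.shape[2])], axis=2)
    return np.fft.ifft(Eh, axis=2).real

p, l = 3, 4
A = np.random.default_rng(4).standard_normal((p, p, l))
Eh = np.fft.fft(t_exp(A), axis=2)
Fh = np.fft.fft(t_exp(-A), axis=2)      # exp[A] * exp[-A] = I (Prop. 13)
print(max(np.linalg.norm(Eh[:, :, i] @ Fh[:, :, i] - np.eye(p))
          for i in range(l)))           # ~0
```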

B.9 Proof of Equivalence of (9) and (11)

Proof

Using (9) and item (i) of Proposition 1, we have

$$\begin{aligned} {\text {exp}} \left[ \mathcal {A} \right]&= \textrm{fold} \left( {\text {exp}} \left[ \textrm{bcirc}(\mathcal {A}) \right] \textrm{unfold}(\mathcal {I}) \right) \\&= \textrm{fold} \left( {\text {exp}} \left[ \big (F^H_l \otimes I_n\big ) {\text {Diag}}\left( \hat{{A}}^{(i)}: i \in [l]\right) (F_{l} \otimes I_n) \right] \textrm{unfold}(\mathcal {I}) \right) \\&= \textrm{fold} \left( \big (F^H_l \otimes I_n\big ) {\text {exp}} \left[ {\text {Diag}}\left( \hat{{A}}^{(i)}: i \in [l]\right) \right] (F_{l} \otimes I_n)\textrm{unfold}(\mathcal {I}) \right) \\&= \textrm{fold} \left( \big (F^H_l \otimes I_n\big ) {\text {exp}} \left[ {\text {Diag}}\left( \hat{{A}}^{(i)}: i \in [l]\right) \right] \frac{1}{\sqrt{l}} {\text {Vec}}\left( I_n: i \in [l]\right) \right) \\&= \textrm{fold} \left( \frac{1}{\sqrt{l}} \big (F^H_l \otimes I_n\big ) {\text {Diag}}\left( {\text {exp}} \left[ \hat{{A}}^{(i)} \right] : i \in [l]\right) {\text {Vec}}\left( I_n: i \in [l]\right) \right) \\&= \textrm{fold} \left( \frac{1}{\sqrt{l}} \big (F^H_l \otimes I_n\big ) {\text {Vec}}\left( {\text {exp}} \left[ \hat{{A}}^{(i)} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ \hat{{A}}^{(i)} \right] : i \in [l]\right) \right) , \end{aligned}$$

where the third equality is due to the following property of the matrix exponential ([55, Prop. 2.3, 6]): if \(X^{\top }X=I\), then \( {\text {exp}} \left[ XAX^{\top } \right] = X {\text {exp}} \left[ A \right] X^{\top }\); the fifth equality comes from the formula \( {\text {exp}} \left[ {\text {Diag}}\left( D_{i}: i \in [l]\right) \right] = {\text {Diag}}\left( {\text {exp}} \left[ D_i \right] : i \in [l]\right) \), which follows immediately from the definition; and the last equality follows from (5).

B.10 Proof of Proposition 8

Proof

Since the t-exponential mapping

$$\begin{aligned} {\text {exp}} \left[ \mathcal {A} \right] = L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ L(\mathcal {A})^{(i)} \right] : i \in [l]\right) \right) \end{aligned}$$

is the composition of linear mappings with the matrix exponential mapping, and the matrix exponential is smooth ( [55, Prop. 2.16]), we conclude that the t-exponential mapping is smooth.

B.11 Proof of Proposition 9

Proof

Using the corresponding property of the matrix exponential [55, Prop. 2.4], we obtain

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t} {\text {exp}} \left[ t\mathcal {A} \right]&= \frac{\textrm{d}}{\textrm{d}t} L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ t\hat{{A}}^{(i)} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( \frac{\textrm{d}}{\textrm{d}t} {\text {exp}} \left[ t\hat{{A}}^{(i)} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ t\hat{{A}}^{(i)} \right] \hat{{A}}^{(i)}: i \in [l]\right) \right) \\&= L^{-1} \left( L( {\text {exp}} \left[ t\mathcal {A} \right] )\Delta L(\mathcal {A}) \right) = {\text {exp}} \left[ t\mathcal {A} \right] *\mathcal {A}, \end{aligned}$$

where the first equality comes from (11), while (A2) gives the last two equalities. Similarly, we can show that \(\frac{\textrm{d}}{\textrm{d}t} {\text {exp}} \left[ t\mathcal {A} \right] =\mathcal {A}* {\text {exp}} \left[ t\mathcal {A} \right] \).

B.12 Proof of Proposition 10

Proof

Applying the corresponding property in the matrix case [55, Prop. 2.3,  6] and (A2), it follows that

$$\begin{aligned} {\text {exp}} \left[ \mathcal {X} *\mathcal {A}*\mathcal {X} ^{\top } \right]&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ \left( L(\mathcal {X} *\mathcal {A}*\mathcal {X} ^{\top }) \right) ^{(i)} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ (L(\mathcal {X} )\Delta L(\mathcal {A}) \Delta L(\mathcal {X} ^{\top }))^{(i)} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ \hat{{X}}^{(i)}\hat{{A}}^{(i)}(\hat{{X}}^{(i)})^{H} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( \hat{{X}}^{(i)} {\text {exp}} \left[ \hat{{A}}^{(i)} \right] (\hat{{X}}^{(i)})^{H}: i \in [l]\right) \right) \\&= L^{-1} \left( L(\mathcal {X} )\Delta L( {\text {exp}} \left[ \mathcal {A} \right] )\Delta L(\mathcal {X} ^{\top }) \right) =\mathcal {X} * {\text {exp}} \left[ \mathcal {A} \right] *\mathcal {X} ^{\top }, \end{aligned}$$

where the first equality comes from (11).

B.13 Proof of Proposition 11

Proof

We denote \(\mathcal {A} = {\text {Diag}}\left( \mathcal {D}_j: j \in [p]\right) \) and \(\mathcal {B} = {\text {Diag}}\left( {\text {exp}} \left[ \mathcal {D}_j \right] : j \in [p]\right) \). Applying (11), we get

$$\begin{aligned} {\text {exp}} \left[ \mathcal {A} \right]&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ \hat{A}^{(i)} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ {\text {Diag}}\left( \hat{{D}_j}^{(i)}: j \in [p]\right) \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {Diag}}\left( {\text {exp}} \left[ \hat{{D}_j}^{(i)} \right] : j \in [p]\right) : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( \hat{B}^{(i)}: i \in [l]\right) \right) =\mathcal {B} , \end{aligned}$$

where the third equality is due to the property of the matrix exponential [56]: \( {\text {exp}} \left[ {\text {Diag}}\left( C_{i}: i \in [l]\right) \right] = {\text {Diag}}\left( {\text {exp}} \left[ C_i \right] : i \in [l]\right) \).

B.14 Proof of Proposition 12

Proof

It follows from Proposition 2 that

$$\begin{aligned} ( {\text {exp}} \left[ \mathcal {A} \right] )^{\top }&= L^{-1} \left( {\text {fold}}\left( \left( {\text {exp}} \left[ \hat{A}^{(i)} \right] \right) ^H: i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ (\hat{{A}}^{(i)})^H \right] : i \in [l]\right) \right) = {\text {exp}} \left[ \mathcal {A}^{\top } \right] , \end{aligned}$$

where the second equality comes from the corresponding property in the matrix case [55, Prop. 2.3,  2].

B.15 Proof of Proposition 13

Proof

Using (A2), we have

$$\begin{aligned} {\text {exp}} \left[ \mathcal {A} \right] * {\text {exp}} \left[ \mathcal {B} \right]&= L^{-1}(L( {\text {exp}} \left[ \mathcal {A} \right] )\Delta L( {\text {exp}} \left[ \mathcal {B} \right] ))\\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ \hat{{A}}^{(i)} \right] {\text {exp}} \left[ \hat{{B}}^{(i)} \right] : i \in [l]\right) \right) \\&= L^{-1} \left( {\text {fold}}\left( {\text {exp}} \left[ \hat{{A}}^{(i)}+\hat{{B}}^{(i)} \right] : i \in [l]\right) \right) = {\text {exp}} \left[ \mathcal {A}+\mathcal {B} \right] , \end{aligned}$$

where the third equality comes from the corresponding property of the matrix exponential [55, Prop. 2.3, 5].

C Proof of the Analytical Solution of the t-Sylvester Equation in Theorem 17

Lemma 11

Let \(\mathcal {A}\in \mathbb {R}^{m\times n \times l},\mathcal {B}\in \mathbb {R}^{n\times k\times l} \). Then

$$\begin{aligned} (I_{kl}\otimes [A^{(1)},\cdots ,A^{(l)}]){\text {vec}}(\widetilde{{\text {bcirc}}}(\mathcal {B})) = ([I_k]_{l\times l}\circledast {\text {bcirc}}(\mathcal {A}) ){\text {vec}}(\mathcal {B}). \end{aligned}$$

Proof

By definition, the left-hand side is

$$\begin{aligned} \begin{bmatrix} [A^{(1)},\cdots ,A^{(l)}] & & \\ & \ddots & \\ & & [A^{(1)},\cdots ,A^{(l)}] \end{bmatrix}_{kl} \cdot \begin{bmatrix} \begin{bmatrix} B^{(1)}\\ B^{(l)}\\ \vdots \\ B^{(2)} \end{bmatrix}_{:1}\\ \vdots \\ \begin{bmatrix} B^{(1)}\\ B^{(l)}\\ \vdots \\ B^{(2)} \end{bmatrix}_{:k}\\ \vdots \\ \begin{bmatrix} B^{(l)}\\ B^{(l-1)}\\ \vdots \\ B^{(1)} \end{bmatrix}_{:1}\\ \vdots \\ \begin{bmatrix} B^{(l)}\\ B^{(l-1)}\\ \vdots \\ B^{(1)} \end{bmatrix}_{:k} \end{bmatrix}, \end{aligned}$$

where \(B^{(i)}_{:j}\) denotes the jth column of \(B^{(i)}\), \(i\in [l]\), and the right-hand side is

$$\begin{aligned} \begin{bmatrix} I_k\otimes A^{(1)} & I_k\otimes A^{(l)} & \cdots & I_k\otimes A^{(2)}\\ I_k\otimes A^{(2)} & I_k\otimes A^{(1)} & \cdots & I_k\otimes A^{(3)}\\ \vdots & \vdots & \ddots & \vdots \\ I_k\otimes A^{(l)} & I_k\otimes A^{(l-1)} & \cdots & I_k\otimes A^{(1)} \end{bmatrix} \cdot \begin{bmatrix} \begin{bmatrix} (B^{(1)})_{:1}\\ \vdots \\ (B^{(1)})_{:k} \end{bmatrix}\\ \begin{bmatrix} (B^{(2)})_{:1}\\ \vdots \\ (B^{(2)})_{:k} \end{bmatrix}\\ \vdots \\ \begin{bmatrix} (B^{(l)})_{:1}\\ \vdots \\ (B^{(l)})_{:k} \end{bmatrix} \end{bmatrix}, \end{aligned}$$

where \(I_k\otimes A^{(i)}\) denotes the block diagonal matrix with k diagonal blocks equal to \(A^{(i)}\).

We observe that the \((q,1)\)th block of the partitioned matrix on the left-hand side is

$$\begin{aligned} \sum \nolimits _{i=1}^{l}A^{(i)}B^{(h_i)}_{:j}, \qquad q = (p-1)k+j\in [kl], ~ j\in [k],~p\in [l], \end{aligned}$$
(A3)

where

$$\begin{aligned} h_i = {\left\{ \begin{array}{ll} l+p+1-i,&{}\quad i> p, \\ p+1-i,&{}\quad i\leqslant p. \end{array}\right. } \end{aligned}$$

The \((q,1)\)th block of the partitioned matrix on the right-hand side is \(\sum \nolimits _{i=1}^{l}A^{(h_i)}B^{(i)}_{:j},\) which is equal to (A3). This proves the lemma.

Lemma 12

[57] Let \(C\in \mathbb R^{m\times n} ,X\in \mathbb R^{n\times p} ,B\in \mathbb R^{k\times p} \). Then

$$\begin{aligned} Y = CXB^{\top }\Leftrightarrow {\text {vec}}({Y}) = (B\otimes C){\text {vec}}({X}). \end{aligned}$$
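A quick NumPy check of this identity (our own sketch; note that vec stacks columns, hence order='F'):

```python
import numpy as np

# vec(C X B^T) = (B kron C) vec(X)

rng = np.random.default_rng(5)
m, n, p, k = 3, 4, 5, 2
C = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
B = rng.standard_normal((k, p))
lhs = (C @ X @ B.T).reshape(-1, order='F')
rhs = np.kron(B, C) @ X.reshape(-1, order='F')
print(np.linalg.norm(lhs - rhs))                  # ~0
```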

Lemma 13

Let \(\mathcal {A}\in \mathbb {R}^{m\times n \times l},\mathcal {B}\in \mathbb {R}^{n\times k\times l},\mathcal {C}\in \mathbb {R}^{m\times k\times l}\). Then

$$\begin{aligned} \mathcal {C} = \mathcal {A}*\mathcal {B}\Leftrightarrow {\text {vec}}(\mathcal {C}) = (\widetilde{{\text {bcirc}}}(\mathcal {B})^{\top }\otimes I_m){\text {vec}}(\mathcal {A})=([I_k]_{l\times l}\circledast {\text {bcirc}}(\mathcal {A}) ){\text {vec}}(\mathcal {B}). \end{aligned}$$

Proof

We observe that \({\text {vec}}(\mathcal {C}) = {\text {vec}}([C^{(1)},\cdots ,C^{(l)}])\). Since \({\text {unfold}}(\mathcal {C}) = {\text {bcirc}}(\mathcal {A}){\text {unfold}}(\mathcal {B}),\) i.e., \([C^{(1)},\cdots ,C^{(l)}] = [A^{(1)},\cdots ,A^{(l)}]\widetilde{{\text {bcirc}}}(\mathcal {B})\), we have

$$\begin{aligned} {\text {vec}}(\mathcal {C})&= {\text {vec}}([C^{(1)},\cdots ,C^{(l)}])\\&= {\text {vec}}([A^{(1)},\cdots ,A^{(l)}]\widetilde{{\text {bcirc}}}(\mathcal {B}))\\&= (\widetilde{{\text {bcirc}}}(\mathcal {B})^{\top }\otimes I_m){\text {vec}}([A^{(1)},\cdots ,A^{(l)}])\\&= (\widetilde{{\text {bcirc}}}(\mathcal {B})^{\top }\otimes I_m){\text {vec}}(\mathcal {A}), \end{aligned}$$

where the third equality comes from Lemma 12. Similarly, by Lemma 11, there holds

$$\begin{aligned} {\text {vec}}(\mathcal {C})&= {\text {vec}}([C^{(1)},\cdots ,C^{(l)}])\\&= {\text {vec}}([A^{(1)},\cdots ,A^{(l)}]\widetilde{{\text {bcirc}}}(\mathcal {B}))\\&= (I_{kl}\otimes [A^{(1)},\cdots ,A^{(l)}]){\text {vec}}(\widetilde{{\text {bcirc}}}(\mathcal {B}))\\&= ([I_k]_{l\times l}\circledast {\text {bcirc}}(\mathcal {A}) ){\text {vec}}(\mathcal {B}), \end{aligned}$$

where the third equality follows from Lemma 12 and the last from Lemma 11.
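The identity \({\text {unfold}}(\mathcal {A}*\mathcal {B}) = {\text {bcirc}}(\mathcal {A}){\text {unfold}}(\mathcal {B})\) used above can be checked numerically; here is a minimal NumPy sketch of our own, with bcirc, unfold, and the t-product implemented from their definitions (names hypothetical):

```python
import numpy as np

def bcirc(A):
    """Block circulant matrix whose first block column holds A's slices."""
    m, n, l = A.shape
    M = np.zeros((m * l, n * l))
    for r in range(l):
        for c in range(l):
            M[r*m:(r+1)*m, c*n:(c+1)*n] = A[:, :, (r - c) % l]
    return M

def unfold(B):
    """Stack the frontal slices of B vertically."""
    return np.vstack([B[:, :, i] for i in range(B.shape[2])])

def t_prod(A, B):
    """t-product via slicewise products in the Fourier domain."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.fft.ifft(np.einsum('ijk,jmk->imk', Ah, Bh), axis=2).real

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 4, 5)); B = rng.standard_normal((4, 2, 5))
print(np.linalg.norm(unfold(t_prod(A, B)) - bcirc(A) @ unfold(B)))   # ~0
```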

Proof

Applying Lemma 13, the t-Sylvester equation (36) can be rewritten in the form

$$\begin{aligned} {\text {vec}}(\mathcal {C}) = \left( \widetilde{{\text {bcirc}}}(\mathcal {B})^{\top }\otimes I_k+[I_k]_{l\times l}\circledast {\text {bcirc}}(\mathcal {A}) \right) {\text {vec}}(\mathcal {X}). \end{aligned}$$
(A4)

This is a linear system in \({\text {vec}}(\mathcal {X})\), and solving it yields the analytical solution stated in Theorem 17.

D The Euclidean Gradient \({\text {grad}}f(\mathcal {X} )\) and the Euclidean Directional Derivative \(Df(\mathcal {X} )[\mathcal {H} ]\) in Sect. 3.2

Similar to [13, Def. 4], for a third-order tensor \(\mathcal {X}\in \mathbb R^{n\times p \times l} \), we can also introduce the Euclidean gradient \({\text {grad}}f(\mathcal {X} )\) and the Euclidean Hessian \({\text {Hess}}f(\mathcal {X} )\) via Fréchet differentiability.

Definition 22

Let \(f: \mathcal {U} \subseteq \mathbb {R}^{n \times p \times l} \rightarrow \mathbb {R}\) be a continuous map. Then, we say f is t-differentiable at \(\mathcal {X} \in \mathcal {U} \) if and only if there exists a third-order tensor \({\text {grad}}f(\mathcal {X})\in \mathbb R^{n\times p \times l} \) such that

$$\begin{aligned} \lim _{\mathcal {H} \rightarrow \mathcal {O}} \frac{\left| f(\mathcal {X}+\mathcal {H})-f(\mathcal {X})-\left\langle {\text {grad}}f(\mathcal {X}), \mathcal {H}\right\rangle \right| }{\Vert \mathcal {H}\Vert _{\textrm{F}}}=0, \end{aligned}$$

where \({\text {grad}}f(\mathcal {X})\) is called the gradient of f at \(\mathcal {X}\), and \(Df(\mathcal {X} )[\mathcal {H} ]= \left\langle {\text {grad}}f(\mathcal {X}), \mathcal {H}\right\rangle \) is called the directional derivative of f at \(\mathcal {X}\) along \(\mathcal {H}\). We say f is twice t-differentiable at \(\mathcal {X} \in \mathcal {U} \) if and only if f is continuously t-differentiable and there exists a bounded linear operator \({\text {Hess}}f(\mathcal {X}):\mathbb R^{n\times p \times l} \rightarrow \mathbb R^{n\times p \times l} \) such that

$$\begin{aligned} \lim _{\mathcal {H} \rightarrow \mathcal {O}} \frac{\left\| {\text {grad}}f(\mathcal {X}+\mathcal {H})-{\text {grad}}f(\mathcal {X})-{\text {Hess}}f(\mathcal {X})[\mathcal {H} ]\right\| _{\textrm{F}}}{\Vert \mathcal {H}\Vert _{\textrm{F}}}=0. \end{aligned}$$

Furthermore, we say f is t-differentiable (twice t-differentiable) on \(\mathcal {U} \) if and only if f is t-differentiable (twice t-differentiable) at every \(\mathcal {X} \in \mathcal {U} \).

Theorem 21

Let f be a continuous map from \(\mathcal {U} \subseteq \mathbb {R}^{n \times p \times l}\) to \(\mathbb {R}\). Then f is t-differentiable on \(\mathcal {U} \) if and only if \(\frac{\mathrm {\partial } f(\mathcal {X})}{\partial [{\text {vec}}(\mathcal {X})]}\) exists for every \(\mathcal {X} \in \mathcal {U} \), where \(\frac{\partial f(\mathcal {X})}{\partial [{\text {vec}}(\mathcal {X})]}\) is a vector in \(\mathbb {R}^{npl}\) with \(\left( \frac{\partial f(\mathcal {X})}{\partial [{\text {vec}}(\mathcal {X})]}\right) _{i}=\frac{\partial f(\mathcal {X})}{\partial \left( [{\text {vec}}(\mathcal {X})]_{i}\right) }\) for any \(i \in [npl]\). In particular, for any \(\mathcal {X} \in \mathcal {U} ,\)

$$\begin{aligned} {\text {grad}}f(\mathcal {X})={\text {vec}}^{-1} \left( \frac{\partial f(\mathcal {X})}{\partial [{\text {vec}}(\mathcal {X})]} \right) , \end{aligned}$$
(A5)

where \(\varvec{v}=\textrm{vec}(\mathcal {A} )\) denotes the vectorized tensor of \(\mathcal {A} \) and \(\textrm{vec}^{-1}(\varvec{v})=\mathcal {A} \) represents the operator that converts a vector \(\varvec{v}\) back to a tensor \(\mathcal {A} \), which can all be implemented with functions reshape, permute and ipermute of Matlab (cf. [58]).

Proof

The proof is similar to that of [13, Thm. 1] and is omitted.
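In code, \(\textrm{vec}\) and \(\textrm{vec}^{-1}\) amount to column-major reshapes; the following is a minimal NumPy sketch of our own, mirroring the Matlab remark above:

```python
import numpy as np

# vec stacks the entries of an n x p x l tensor column-major into a
# vector of length npl; vec_inv undoes it (order='F' matches Matlab).

def vec(A):
    return A.reshape(-1, order='F')

def vec_inv(v, shape):
    return v.reshape(shape, order='F')

A = np.arange(24, dtype=float).reshape((2, 3, 4), order='F')
print(np.array_equal(vec_inv(vec(A), A.shape), A))   # True
```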


Cite this article

Mao, XP., Wang, Y. & Yang, YN. Computation over t-Product Based Tensor Stiefel Manifold: A Preliminary Study. J. Oper. Res. Soc. China (2024). https://doi.org/10.1007/s40305-023-00522-z

