1 Introduction

Let m and \(n_1, \cdots , n_m\) be positive integers. A tensor \(\mathcal {F}\) of order m and dimension \(n_1 \times \cdots \times n_m\) is an array whose entries are labeled as

$$\begin{aligned} \mathcal {F}= (\mathcal {F}_{i_1,\cdots ,i_m})_{1\leqslant i_1 \leqslant n_1, \cdots , 1 \leqslant i_m \leqslant n_m}. \end{aligned}$$

Let \(\mathbb {F}\) be a field (either the real field \(\mathbb {R}\) or the complex field \(\mathbb {C}\)). The space of all tensors of order m and dimension \(n_1 \times \cdots \times n_m\) with entries in \(\mathbb {F}\) is denoted as \(\mathbb {F}^{n_1\times \cdots \times n_m}\). For vectors \(v_i \in \mathbb {F}^{n_i}\), \(i=1,\cdots ,m\), their outer product \(v_1 \otimes \cdots \otimes v_m\) is the tensor in \(\mathbb {F}^{n_1\times \cdots \times n_m}\) such that

$$\begin{aligned} (v_1 \otimes \cdots \otimes v_m)_{i_1,\cdots ,i_m} \, = \, (v_1)_{i_1} \cdots (v_m)_{i_m} \end{aligned}$$

for all labels in the corresponding range. A tensor like \(v_1 \otimes \cdots \otimes v_m\) is called a rank-1 tensor. For every tensor \(\mathcal {F}\in \mathbb {F}^{n_1\times \cdots \times n_m}\), there exist vector tuples \((v^{s,1}, \cdots , v^{s,m})\), \(s=1,\cdots ,r\), with \(v^{s,j} \in \mathbb {F}^{n_j}\), such that

$$\begin{aligned} \mathcal {F}= \sum _{s=1}^r v^{s,1} \otimes \cdots \otimes v^{s, m}. \end{aligned}$$
(1)

The smallest such r is called the \(\mathbb {F}\)-rank of \(\mathcal {F}\), denoted \(\text {rank}_{\mathbb {F}}(\mathcal {F})\). When r is minimum, Eq. (1) is called a rank-r decomposition over the field \(\mathbb {F}\). In the literature, this rank is sometimes referred to as the Candecomp/Parafac (CP) rank. We refer to [1, 2] for various notions of tensor ranks. Recent work on tensor decompositions can be found in [3,4,5,6,7,8,9]. Tensors are closely related to polynomial optimization [10,11,12,13,14]. Tensor decomposition has been widely used in temporal tensor analysis, including discovering patterns [15], predicting evolution [16], and identifying temporal communities [17], and in multirelational data analysis, including collective classification [18], word representation learning [19], and coherent subgraph learning [20]. Tensor approximation has been explored in signal processing applications [21] and in multidimensional, multivariate data analysis [22]. Various other applications of tensors can be found in [23,24,25]. Throughout the paper, we use the Hilbert-Schmidt norm for tensors:

$$\begin{aligned} \Vert \mathcal {F}\Vert = \sqrt{ \sum _{\begin{array}{c} 1\leqslant i_j \leqslant n_j, 1 \leqslant j \leqslant m \end{array} } |\mathcal {F}_{i_1,\cdots ,i_m}|^2 } . \end{aligned}$$
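To make these definitions concrete, here is a minimal numpy sketch (with hypothetical dimensions and random data) that forms a tensor as a sum of outer products, as in (1), and evaluates the Hilbert-Schmidt norm defined above.

```python
import numpy as np

# hypothetical dimensions n1 x n2 x n3 and target rank r
n, r = (4, 3, 5), 2
rng = np.random.default_rng(0)

# vector tuples (v^{s,1}, v^{s,2}, v^{s,3}), s = 1, ..., r
V = [[rng.standard_normal(ni) for ni in n] for _ in range(r)]

# F = sum of outer products, as in (1)
F = np.zeros(n)
for vs in V:
    F += np.einsum('i,j,k->ijk', *vs)

# Hilbert-Schmidt norm: square root of the sum of squared entry moduli
hs_norm = np.sqrt(np.sum(np.abs(F) ** 2))
assert np.isclose(hs_norm, np.linalg.norm(F.ravel()))
```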

The low rank tensor approximation (LRTA) problem is to find a tensor of low rank that is close to a given one. This is equivalent to a nonlinear least squares optimization problem. For a given tensor \(\mathcal {F}\in \mathbb {F}^{n_1\times \cdots \times n_m}\) and a given rank r, we look for r vector tuples \(v^{(s)} :=(v^{s,1}, \cdots , v^{s,m})\), \(s=1,\cdots ,r\), such that

$$\begin{aligned} \mathcal {F}\, \approx \, \sum \limits _{s = 1}^r v^{s,1} \otimes \cdots \otimes v^{s, m}, \quad v^{s,j} \in \mathbb {F}^{n_j} . \end{aligned}$$

This requires solving the following nonlinear least squares optimization problem:

$$\begin{aligned} \min \limits _{ v^{s,j} \in \mathbb {F}^{n_j}, \, j=1,\cdots ,m } \big \Vert \mathcal {F}- \sum \limits _{s = 1}^r v^{s,1} \otimes \cdots \otimes v^{s, m}\big \Vert ^2. \end{aligned}$$
(2)

When \(r = 1\), a best rank-1 approximating tensor always exists and computing it is equivalent to computing the spectral norm [26, 27]. When \(r>1\), a best rank-r tensor approximation may not exist [28]. Classical methods for solving low rank tensor approximation problems include the alternating least squares (ALS) method [29,30,31], higher-order power iterations [32], semidefinite relaxations [10, 12], SVD-based methods [33], and optimization-based methods [8, 34]. We refer to [3, 35] for recent work on low rank tensor approximations.

1.1 Contributions

In this paper, we extend the generating polynomial method in [7, 35] to compute tensor rank decompositions and low rank tensor approximations for nonsymmetric tensors. First, we estimate generating polynomials by solving linear least squares. Second, we find their approximately common zeros, which can be done by computing eigenvalue decompositions. Third, we get a tensor decomposition from their common zeros, by solving linear least squares. To find a low rank tensor approximation, we first apply the decomposition method to obtain a low rank approximating tensor and then use nonlinear optimization methods to improve the approximation. Our major conclusion is that if the tensor to be approximated is sufficiently close to a low rank one, then the obtained low rank tensor is a quasi-optimal low rank approximation. The proof is based on perturbation analysis of linear least squares and eigenvalue decompositions.

The paper is organized as follows. In Sect. 2, we review some basic results about tensors. In Sect. 3, we introduce the concept of generating polynomials and study their relations to tensor decompositions. In Sect. 4, we give an algorithm for computing tensor rank decompositions for low rank tensors. In Sect. 5, we give an algorithm for computing low rank approximations. The approximation error analysis is also given. In Sect. 6, we present numerical experiments. Some conclusions are made in Sect. 7.

2 Preliminaries

2.1 Notation

The symbol \({\mathbb {N}}\) (resp., \({\mathbb {R}}\), \({\mathbb {C}}\)) denotes the set of nonnegative integers (resp., real numbers, complex numbers). For an integer \(r>0\), denote the set \([r] :=\{1, \cdots , r\}\). Uppercase letters (e.g., A) denote matrices and \(A_{ij}\) denotes the \((i,j)\)th entry of the matrix A; calligraphic letters (e.g., \({\mathcal {F}}\)) denote tensors and \({\mathcal {F}}_{i_1,\cdots ,i_m}\) denotes the \((i_1,\cdots ,i_m)\)th entry of the tensor \({\mathcal {F}}\). For a complex matrix A, \(A^{\top }\) denotes its transpose and \(A^*\) denotes its conjugate transpose. The Kruskal rank of A, denoted \(\kappa _A\), is the largest number k such that every set of k columns of A is linearly independent. For a vector v, \((v)_i\) denotes its ith entry and \(\hbox {diag}(v)\) denotes the square diagonal matrix whose diagonal entries are given by the entries of v. The subscript \(v_{s:t}\) denotes the subvector of v whose entries are labeled from s to t. For a matrix A, the subscript notation \(A_{:,j}\) and \(A_{i,:}\), respectively, denotes its jth column and ith row. Similar subscript notation is used for tensors. For two matrices A, B, their classical Kronecker product is denoted as \(A \boxtimes B\). For a set S, its cardinality is denoted as |S|.

For a tensor decomposition of \(\mathcal {F}\) such that

$$\begin{aligned} \mathcal {F}\, = \, \sum _{s=1}^r u^{s,1}\otimes u^{s,2} \otimes \cdots \otimes u^{s,m}, \end{aligned}$$
(3)

we denote the matrices

$$\begin{aligned} U^{(j)} \, = \, [u^{1,j},\cdots , u^{r,j}], \quad \, j =1, \cdots , m. \end{aligned}$$

The matrix \(U^{(j)}\) is called the jth decomposing matrix for \(\mathcal {F}\). For convenience of notation, we denote

$$\begin{aligned} U^{(1)} \circ \cdots \circ U^{(m)} = \sum _{i=1}^r (U^{(1)})_{:,i} \otimes \cdots \otimes (U^{(m)})_{:,i}. \end{aligned}$$

Then the above tensor decomposition is equivalent to \(\mathcal {F}= U^{(1)} \circ \cdots \circ U^{(m)}\).

For a matrix \(V \in \mathbb {C}^{p \times n_t}\), the matrix-tensor product

$$\begin{aligned} \mathcal {A}:=V \times _t \mathcal {F}\end{aligned}$$

is defined to be the tensor in \(\mathbb {C}^{n_1\times \cdots \times n_{t-1} \times p \times n_{t+1} \times \cdots \times n_m}\) whose slices in the tth mode are

$$\begin{aligned} \mathcal {A}_{i_1,\cdots ,i_{t-1},:,i_{t+1},\cdots ,i_m} = V \mathcal {F}_{i_1,\cdots ,i_{t-1},:,i_{t+1},\cdots ,i_m}. \end{aligned}$$
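For instance, the product \(V \times _t \mathcal {F}\) can be computed by contracting the tth mode, as in the following numpy sketch (the helper name and sizes are ours, for illustration only).

```python
import numpy as np

def mode_product(F, V, t):
    # multiply mode t (0-based) of the tensor F by the matrix V on the left
    A = np.tensordot(V, F, axes=(1, t))  # new mode of size p comes first
    return np.moveaxis(A, 0, t)          # move it back to position t

# hypothetical sizes: F is 4 x 3 x 5, V is 2 x 3, so V x_2 F is 4 x 2 x 5
F = np.random.rand(4, 3, 5)
V = np.random.rand(2, 3)
A = mode_product(F, V, 1)
# slice-wise check of the defining identity
assert np.allclose(A[2, :, 4], V @ F[2, :, 4])
```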

2.2 Flattening matrices

We partition the index set \(\{1,2,\cdots ,m\}\) into two disjoint groups \(I_1\) and \(I_2\) such that the difference

$$\begin{aligned} \left| \prod _{i \in I_1 } n_{i} - \prod _{j \in I_2} n_{j} \right| \end{aligned}$$

is minimum. Up to a permutation of indices, we write \(I_1 = \{1, \cdots , k \}\), \(I_2 = \{k+1, \cdots , m \}.\) For convenience, denote that

$$\begin{aligned} \begin{array}{l} I = \left\{ \left( \imath _{1}, \cdots , \imath _{k} \right) : 1 \leqslant \imath _{j} \leqslant n_{j}, j=1, \cdots , k \right\} ,\\ J =\left\{ \left( \imath _{k+1}, \cdots , \imath _{m}\right) : 1 \leqslant \imath _{j} \leqslant n_{j}, j=k+1, \cdots , m\right\} . \end{array} \end{aligned}$$

For a tensor \(\mathcal {F}\in \mathbb {C}^{n_{1}\times \cdots \times n_{m}}\), the above partition gives the flattening matrix

$$\begin{aligned} \hbox {Flat} (\mathcal {F}) :=\left( \mathcal {F}_{\imath , \jmath }\right) _{\imath \in I, \jmath \in J}. \end{aligned}$$
(4)

This gives the most square flattening matrix for \(\mathcal {F}\). Let \(\sigma _{r}\) denote the closure of all rank-r tensors in \(\mathbb {C}^{n_{1}\times \cdots \times n_{m}}\), under the Zariski topology (see [36]). The set \(\sigma _r\) is an irreducible variety of \(\mathbb {C}^{n_1 \times \cdots \times n_{m}}\). For a given tensor \(\mathcal {F}\in \sigma _r\), it is possible that \({\text {rank}}(\mathcal {F}) > r.\) This fact motivates the notion of border rank:

$$\begin{aligned} {\text {rank}}_{B}(\mathcal {F})=\min \left\{ r: \mathcal {F}\in \sigma _r \right\} . \end{aligned}$$
(5)

For every tensor \(\mathcal {F}\in \mathbb {C}^{n_{1}\times \cdots \times n_{m}}\), one can show that

$$\begin{aligned} {\text {rank}}\hbox {Flat}(\mathcal {F}) \leqslant {\text {rank}}_{B}(\mathcal {F}) \leqslant {\text {rank}}(\mathcal {F}) . \end{aligned}$$
(6)

A property \(\texttt{P}\) is said to hold generically on \(\sigma _{r}\) if \(\texttt{P}\) holds on a Zariski open subset T of \(\sigma _r\). For such a property \(\texttt{P}\), each \(u \in T\) is called a generic point. Interestingly, the above three ranks are equal for generic points of \(\sigma _r\) for a range of values of r.

Lemma 1

Let s be the smaller of the two dimensions of the matrix \(\hbox {Flat}(\mathcal {F})\). For every \(r \leqslant s\), the equalities

$$\begin{aligned} {\text {rank}}\textrm{Flat}(\mathcal {F}) \,= \, {\text {rank}}_B(\mathcal {F}) \,= \, {\text {rank}}(\mathcal {F}) \end{aligned}$$
(7)

hold for tensors \(\mathcal {F}\) in a Zariski open subset of \(\sigma _r\).

Proof

Let \(\phi _{1}, \cdots , \phi _{\ell }\) be the \(r \times r\) minors of the matrix

$$\begin{aligned} \hbox {Flat} \left( \sum _{i=1}^r x^{i, 1} \otimes \cdots \otimes x^{i, m} \right) . \end{aligned}$$
(8)

They are homogeneous polynomials in \(x^{i, j}(i=1, \cdots , r, j=1, \cdots , m) .\) Let x denote the tuple \(\left( x^{1,1}, x^{1,2}, \cdots , x^{r, m}\right) .\) Define the projective variety in \(\mathbb {P}^{ r\left( n_{1}+\cdots +n_{m}\right) -1}\)

$$\begin{aligned} Z = \left\{ x: \phi _{1}(x)=\cdots =\phi _{\ell }(x)=0\right\} . \end{aligned}$$
(9)

Then \(Y:=\mathbb {P}^{r\left( n_{1}+\cdots +n_{m}\right) -1} \backslash Z\) is a Zariski open subset of full dimension. Consider the polynomial mapping \(\pi : Y \rightarrow \sigma _r\),

$$\begin{aligned} \left( x^{1,1}, x^{1,2}, \cdots , x^{r, m}\right) \mapsto \sum _{i=1}^{r}\left( x^{i, 1}\right) \otimes \cdots \otimes \left( x^{i, m}\right) . \end{aligned}$$
(10)

The image \(\pi (Y)\) is dense in the irreducible variety \(\sigma _{r}.\) So, \(\pi (Y)\) contains a Zariski open subset \(\mathscr {Y}\) of \(\sigma _r\) (see [37]). For each \(\mathcal {F}\in {\mathscr {Y}}\), there exists \(u \in Y\) such that \(\mathcal {F}= \pi (u)\). Because \(u \notin Z\), at least one of \(\phi _{1}(u), \cdots , \phi _{\ell }(u)\) is nonzero, and hence \({\text {rank}}\hbox {Flat}(\mathcal {F}) \geqslant r .\) By (6), we know (7) holds for all \(\mathcal {F}\in \mathscr {Y}\) since \({\text {rank}}(\mathcal {F}) \leqslant r .\) Since \(\mathscr {Y}\) is a Zariski open subset of \(\sigma _r\), the lemma holds.

By Lemma 1, if \(r \leqslant s\) and \(\mathcal {F}\) is a generic tensor in \(\sigma _{r}\), we can use \({\text {rank}}\hbox {Flat}(\mathcal {F})\) to estimate \({\text {rank}}(\mathcal {F})\). However, for a generic \({\mathcal {F}} \in \mathbb {C}^{n_{1} \times \cdots \times n_{m}}\) such that \({\text {rank}}\hbox {Flat}(\mathcal {F})=r\), we cannot conclude \(\mathcal {F}\in \sigma _r\).
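In floating point computations, the most square flattening and its numerical rank give a convenient rank estimate. The following numpy sketch (a contiguous mode split chosen by hand; sizes and names are hypothetical) illustrates Lemma 1 on a random rank-3 tensor.

```python
import numpy as np

def flatten_modes(F, k):
    # group the first k modes into rows and the rest into columns;
    # k is assumed to be chosen so the matrix is as square as possible
    n = F.shape
    return F.reshape(int(np.prod(n[:k])), int(np.prod(n[k:])))

# a random tensor of rank 3 in C^{4 x 4 x 3} (hypothetical sizes)
rng = np.random.default_rng(1)
U = [rng.standard_normal((ni, 3)) for ni in (4, 4, 3)]
F = np.einsum('is,js,ks->ijk', *U)

M = flatten_modes(F, 1)               # 4 x 12 flattening matrix
print(np.linalg.matrix_rank(M))       # prints 3, matching rank(F)
```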

2.3 Reshaping of tensor decompositions

A tensor \(\mathcal {F}\) of order greater than 3 can be reshaped to another tensor \(\widehat{\mathcal {F}}\) of order 3. A tensor decomposition of \(\widehat{\mathcal {F}}\) can be converted to a decomposition for \(\mathcal {F}\) under certain conditions. In the following, we assume a given tensor \(\mathcal {F}\) has decomposition (3). Suppose the set \(\{1,\cdots ,m\}\) is partitioned into 3 disjoint subsets

$$\begin{aligned} \{1,\cdots ,m\} \, = \, I_1 \cup I_2 \cup I_3. \end{aligned}$$

Let \(p_i=|I_i| \mathrm {~for~} i=1,2,3\). For the reshaped vectors

$$\begin{aligned} \left\{ \begin{array}{rcll} w^{s,1} &{} = &{} u^{s,i_1} \boxtimes \cdots \boxtimes u^{s,i_{p_1}} &{}\mathrm {~for~} I_1 \, = \, \{ i_1, \cdots , i_{p_1} \},\\ w^{s,2} &{}=&{} u^{s,j_1} \boxtimes \cdots \boxtimes u^{s,j_{p_2}} &{}\mathrm {~for~} I_2 \, = \, \{ j_1, \cdots , j_{p_2} \}, \\ w^{s,3} &{} = &{} u^{s,k_1} \boxtimes \cdots \boxtimes u^{s,k_{p_3}} &{}\mathrm {~for~} I_3 \, = \, \{ k_1, \cdots , k_{p_3} \}, \end{array} \right. \end{aligned}$$
(11)

we get the following tensor decomposition:

$$\begin{aligned} \widehat{\mathcal {F}} \, = \, \sum _{s=1}^r w^{s,1}\otimes w^{s,2} \otimes w^{s,3} . \end{aligned}$$
(12)

Conversely, for a decomposition like (12) for \(\widehat{\mathcal {F}}\), if all \(w^{s,1}, w^{s,2}, w^{s,3}\) can be expressed as Kronecker products of vectors as in (11), then Eq. (12) can be reshaped to a tensor decomposition for \(\mathcal {F}\) as in (3). When the reshaped tensor \(\widehat{\mathcal {F}}\) satisfies certain conditions, its tensor decomposition is unique. For such a case, we can obtain a tensor decomposition for \(\mathcal {F}\) through decomposition (12). A classical result about uniqueness is Kruskal's criterion [38].
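The following numpy sketch (hypothetical sizes) illustrates this reshaping for an order-4 CP tensor: with a row-major (C-order) reshape, grouping the modes in \(I_2=\{2,3\}\) matches the Kronecker products in (11).

```python
import numpy as np

# an order-4 tensor of rank 2 (hypothetical sizes)
rng = np.random.default_rng(2)
n1, n2, n3, n4, r = 3, 2, 2, 3, 2
U = [rng.standard_normal((ni, r)) for ni in (n1, n2, n3, n4)]
F = np.einsum('is,js,ks,ls->ijkl', *U)

# reshape with I1 = {1}, I2 = {2,3}, I3 = {4}; a row-major reshape merges
# the index pair (i2, i3) consistently with the Kronecker product in (11)
F_hat = F.reshape(n1, n2 * n3, n4)

G = np.zeros_like(F_hat)
for s in range(r):
    w2 = np.kron(U[1][:, s], U[2][:, s])     # w^{s,2} = u^{s,2} (x) u^{s,3}
    G += np.einsum('i,j,k->ijk', U[0][:, s], w2, U[3][:, s])

assert np.allclose(F_hat, G)                 # decomposition (12) holds
```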

Theorem 1

(Kruskal’s criterion, [38]) Let \({\mathcal {F}} = U^{(1)} \circ U^{(2)} \circ U^{(3)}\) be a tensor with each \(U^{(i)} \in \mathbb {C}^{n_i \times r}\). Let \(\kappa _i\) be the Kruskal rank of \(U^{(i)}\), for \(i=1,2,3\). If

$$ 2r +2 \leqslant \kappa _1+\kappa _2+\kappa _3, $$

then \({\mathcal {F}}\) has a unique rank-r tensor decomposition.

Kruskal's criterion can be generalized to a wider range of r, as in [39]. Assume the dimensions satisfy \(n_1 \geqslant n_2 \geqslant n_3 \geqslant 2\) and the rank r is such that

$$\begin{aligned} 2r + 2 \leqslant \textrm{min}(n_1,r)+\textrm{min}(n_2,r)+\textrm{min}(n_3,r), \end{aligned}$$

or equivalently, for \(\delta =n_2+n_3-n_1-2\), r is such that

$$\begin{aligned} r \leqslant n_1 +\textrm{min}\{ \frac{1}{2}\delta ,\delta \}. \end{aligned}$$

If \(\mathcal {F}\) is a generic tensor of rank r as above in the space \(\mathbb {C}^{n_1 \times n_2 \times n_3}\), then \(\mathcal {F}\) has a unique rank-r decomposition. The following is a uniqueness result for reshaped tensor decompositions.

Theorem 2

(Reshaped Kruskal Criterion, [39, Theorem 4.6]) For the tensor space \(\mathbb {C}^{n_1 \times n_2 \times \cdots \times n_m}\) with \(m \geqslant 3\), let \(I_1 \cup I_2 \cup I_3 =\{1,2,\cdots ,m\}\) be a union of disjoint sets and let

$$\begin{aligned} p_1 = \prod _{i \in I_1} n_i, \quad p_2 = \prod _{j \in I_2} n_j, \quad p_3 = \prod _{k \in I_3} n_k . \end{aligned}$$

Suppose \(p_1 \geqslant p_2 \geqslant p_3\) and let \(\delta =p_2+p_3-p_1-2\). Assume

$$\begin{aligned} r \leqslant p_1+\textrm{min}\{\frac{1}{2}\delta ,\delta \} . \end{aligned}$$
(13)

If \(\mathcal {F}\) is a generic tensor of rank r in \(\mathbb {C}^{n_1 \times n_2 \times \cdots \times n_m}\), then the reshaped tensor \(\widehat{\mathcal {F}}\in \mathbb {C}^{p_1 \times p_2 \times p_3}\) as in (12) has a unique rank-r decomposition.
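As an illustration of condition (13): for the tensor space \(\mathbb {C}^{5 \times 4 \times 3 \times 3}\) with the grouping \(I_1 = \{3,4\}\), \(I_2 = \{1\}\), \(I_3 = \{2\}\), we have \(p_1 = 9\), \(p_2 = 5\), \(p_3 = 4\) and \(\delta = 5+4-9-2 = -2\), so (13) requires \(r \leqslant 9 + \min \{-1, -2\} = 7\).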

3 Generating Polynomials

Generating polynomials can be used to compute tensor rank decompositions. We consider tensors whose rank r is not bigger than the largest dimension, that is, \(r\leqslant n_1\), where \(n_1\) is the largest of \(n_1, \cdots , n_m\). Denote the indeterminate vector variables

$$\begin{aligned} \mathbf {x_1} = (x_{1,1},\cdots , x_{1,n_1}), \, \cdots , \, \mathbf {x_m} = (x_{m,1},\cdots , x_{m,n_m}). \end{aligned}$$

A tensor in \(\mathbb {C}^{n_1 \times \cdots \times n_m}\) can be labeled by monomials like \(x_{1,i_1}x_{2,i_2}\cdots x_{m,i_m}\). Let

$$\begin{aligned} \begin{array}{rcl} \mathbb {M}:= & {} \big \{ x_{1,i_1}\cdots x_{m,i_m} \; | \;1\leqslant i_j \leqslant n_j \big \} . \end{array} \end{aligned}$$
(14)

For a subset \(J \subseteq \{1,2,\cdots ,m \}\), denote that

$$\begin{aligned} \boxed { \begin{array}{rcl} J^c &{} :=&{} \{1,2,\cdots ,m \} \backslash J, \\ \mathbb {M}_J &{} :=&{} \big \{ x_{1,i_1}\cdots x_{m,i_m} \; | \; x_{j,i_j} = 1\, ~\forall \, j\in J^c \big \}, \\ {\mathcal {M}}_{J} &{} :=&{} {\textrm{span}} \{\mathbb {M}_J\}. \end{array}} \end{aligned}$$
(15)

A label tuple \((i_1,\cdots ,i_m)\) is uniquely determined by the monomial \(x_{1,i_1} \cdots x_{m,i_m}\). So a tensor \(\mathcal {F}\in \mathbb {C}^{n_1\times \cdots \times n_m}\) can be equivalently labeled by monomials such that

$$\begin{aligned} \mathcal {F}_{x_{1,i_1}\cdots x_{m,i_m}} :=\mathcal {F}_{i_1,\cdots ,i_m}. \end{aligned}$$
(16)

With the new labeling by monomials, define the bi-linear product

$$\begin{aligned} \langle \sum _{\mu \in \mathbb {M}}c_{\mu }\mu ,\mathcal {F}\rangle \, :=\, \sum _{\mu \in \mathbb {M}}c_{\mu }\mathcal {F}_{\mu }. \end{aligned}$$
(17)

In the above, each \(c_{\mu }\) is a scalar and \({\mathcal {F}}\) is labeled by monomials as in (16).

Definition 1

([40, 41]) For a subset \(J \subseteq \{1,2,\cdots ,m\}\) and a tensor \({\mathcal {F}} \in \mathbb {C}^{n_1 \times \cdots \times n_m}\), a polynomial \(p \in \mathcal {M}_J\) is called a generating polynomial for \({\mathcal {F}}\) if

$$\begin{aligned} \langle pq,{\mathcal {F}} \rangle =0 \quad \hbox {for all} \, \, q \in \mathbb {M}_{J^c} . \end{aligned}$$
(18)

The following is an example of generating polynomials.

Example 1

Consider the cubic order tensor \({\mathcal {F}} \in \mathbb {C}^{3 \times 3 \times 3}\) given as

The following is a generating polynomial for \({\mathcal {F}}\):

$$\begin{aligned} p :=(2x_{1,1}-x_{1,2})(2x_{2,1}-x_{2,2}). \end{aligned}$$

Note that \(p \in \mathcal {M}_{\{1,2 \}}\) and for each \(i_3 = 1, 2, 3\),

$$\begin{aligned} p \cdot x_{3,i_3} = (2x_{1,1}-x_{1,2})(2x_{2,1}-x_{2,2}) x_{3,i_3}. \end{aligned}$$

One can check that for each \(i_3 = 1, 2, 3\),

$$\begin{aligned} 4{\mathcal {F}}_{1,1,i_3}-2{\mathcal {F}}_{1,2,i_3}- 2{\mathcal {F}}_{2,1,i_3}+{\mathcal {F}}_{2,2,i_3}=0. \end{aligned}$$

This is because \(\begin{bmatrix}4&-2&-2&1 \end{bmatrix}\) is orthogonal to

$$\begin{aligned} \begin{bmatrix} {\mathcal {F}}_{1,1,i_3}&{\mathcal {F}}_{1,2,i_3}&{\mathcal {F}}_{2,1,i_3}&{\mathcal {F}}_{2,2,i_3} \end{bmatrix} \end{aligned}$$

for each \(i_3=1,2,3\).
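Since the displayed tensor of Example 1 is not reproduced here, the following numpy sketch checks condition (18) on a different, hypothetical tensor constructed so that the same polynomial p is a generating polynomial for it (each rank-1 term has first factor proportional to \((1,2,*)\), so \(2x_{1,1}-x_{1,2}\) annihilates it).

```python
import numpy as np

# a hypothetical 3 x 3 x 3 tensor (NOT the one displayed in Example 1):
# every rank-1 term has first factor u with 2*u[0] - u[1] = 0, so the
# polynomial p = (2x_{1,1} - x_{1,2})(2x_{2,1} - x_{2,2}) annihilates it
rng = np.random.default_rng(3)
F = np.zeros((3, 3, 3))
for _ in range(2):
    u = np.array([1.0, 2.0, rng.standard_normal()])
    v, w = rng.standard_normal(3), rng.standard_normal(3)
    F += np.einsum('i,j,k->ijk', u, v, w)

# coefficients of p on the monomials x_{1,i} x_{2,j}, i, j = 1, 2
C = np.array([[4.0, -2.0],
              [-2.0, 1.0]])

# condition (18): <p * x_{3,i3}, F> = 0 for every i3
residuals = np.einsum('ij,ijk->k', C, F[:2, :2, :])
assert np.allclose(residuals, 0.0)
```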

Suppose the rank \(r \leqslant n_1\) is given. For convenience of notation, denote the label set

$$\begin{aligned} J :=\{(i,j,k):1\leqslant i \leqslant r, ~2 \leqslant j \leqslant m, ~2\leqslant k \leqslant n_j \}. \end{aligned}$$
(19)

For a matrix \(G \in \mathbb {C}^{[r]\times J}\) and a triple \(\tau =(i,j,k) \in J\), define the bi-linear polynomial

$$\begin{aligned} \phi [G,\tau ](x) :=\sum _{\ell =1}^{r} G(\ell ,\tau )x_{1,\ell }x_{j,1}-x_{1,i}x_{j,k} \,\, \in \, {\mathcal {M}}_{\{1,j\}}. \end{aligned}$$
(20)

The rows of G are labeled by \(\ell =1,2,\cdots ,r\), and the columns of G are labeled by \(\tau \in J\). We are interested in G such that \(\phi [G,\tau ]\) is a generating polynomial for a tensor \({\mathcal {F}} \in \mathbb {C}^{n_1 \times n_2 \times \cdots \times n_m}\). This requires that

$$\begin{aligned} \langle \phi [G,\tau ] \cdot \mu ,{\mathcal {F}} \rangle =0 \, \quad \hbox {for all} \, \, \mu \in \mathbb {M}_{\{1,j\}^c}. \end{aligned}$$

The above is equivalent to the equation (\({\mathcal {F}}\) labeled as in (16))

$$\begin{aligned} \sum _{\ell =1}^{r} G(\ell ,\tau ){\mathcal {F}}_{x_{1,\ell } \cdot \mu } ={\mathcal {F}}_{x_{1,i} x_{j,k} \cdot \mu } . \end{aligned}$$
(21)

Definition 2

([40, 41]) When (21) holds for all \(\tau \in J\), the matrix G is called a generating matrix for \({\mathcal {F}}\).

For given G, \(j \in \{ 2, \cdots , m \}\) and \(k \in \{2,\cdots ,n_j \}\), we denote the matrix

$$\begin{aligned} M^{j,k}[G] :=\begin{bmatrix} G(1,(1,j,k)) &{} G(2,(1,j,k)) &{} \cdots &{} G(r,(1,j,k)) \\ G(1,(2,j,k)) &{} G(2,(2,j,k)) &{} \cdots &{} G(r,(2,j,k)) \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ G(1,(r,j,k)) &{} G(2,(r,j,k)) &{} \cdots &{} G(r,(r,j,k)) \\ \end{bmatrix} . \end{aligned}$$
(22)

For each pair (j, k), define the matrices

$$\begin{aligned} \left\{ \begin{array}{rcl} A[\mathcal {F},j] &{} :=&{} \Big ( \mathcal {F}_{x_{1,\ell }\cdot \mu } \Big )_{\mu \in \mathbb {M}_{\{1,j\}^c}, 1\leqslant \ell \leqslant r} , \\ b[\mathcal {F},j,k] &{} :=&{} \Big (\mathcal {F}_{x_{1,\ell } \cdot x_{j,k}\cdot \mu } \Big )_{\mu \in \mathbb {M}_{\{1,j\}^c}, 1\leqslant \ell \leqslant r} . \end{array} \right. \end{aligned}$$
(23)

Equation (21) is then equivalent to

$$\begin{aligned} A[\mathcal {F},j] (M^{j,k}[G])^{\top } \, = \, b[\mathcal {F},j,k]. \end{aligned}$$
(24)

The following is a useful property for the matrices \(M^{j,k}[G]\).

Theorem 3

([40, 41]) Suppose \(\mathcal {F}=\sum _{s=1}^r u^{s,1} \otimes \cdots \otimes u^{s,m}\) for vectors \(u^{s,j} \in \mathbb {C}^{n_j}\). If \(r \leqslant n_1\), \((u^{s,2})_1\cdots (u^{s,m})_1 \ne 0\), and the first r rows of the first decomposing matrix

$$\begin{aligned} U^{(1)} :=[u^{1,1} \, \, \cdots \,\, u^{r,1} ] \end{aligned}$$

are linearly independent, then there exists a G satisfying (24) and satisfying (for all \(j \in \{ 2, \cdots , m \}\), \(k \in \{2,\cdots ,n_j \}\) and \(s=1,\cdots ,r\))

$$\begin{aligned} M^{j,k}[G]\cdot (u^{s,1})_{1:r}= (u^{s,j})_k\cdot (u^{s,1})_{1:r}. \end{aligned}$$
(25)
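The following numpy sketch (order 3, hypothetical sizes, random complex factors normalized so that the first entries of \(u^{s,2}, u^{s,3}\) equal 1, matching the scaling used later in (27)) verifies relation (25) numerically: the matrix \(M^{3,k}[G]\) recovered from the linear system (24) has the subvectors \((u^{s,1})_{1:r}\) as eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, n3, r = 5, 4, 4, 3
randc = lambda *sh: rng.standard_normal(sh) + 1j * rng.standard_normal(sh)
U1, U2, U3 = randc(n1, r), randc(n2, r), randc(n3, r)
U2, U3 = U2 / U2[0, :], U3 / U3[0, :]      # normalize first entries to 1
F = np.einsum('is,js,ks->ijk', U1, U2, U3)

# A[F,3] and b[F,3,k] as in (23): rows labeled by the monomials x_{2,mu}
k = 2                                      # 0-based; k = 3 in the paper's labels
A = F[:r, :, 0].T                          # entries F_{x_{1,l} x_{2,mu} x_{3,1}}
b = F[:r, :, k].T                          # entries F_{x_{1,i} x_{2,mu} x_{3,k}}

# solve A (M^{3,k})^T = b as in (24)
M = np.linalg.lstsq(A, b, rcond=None)[0].T

# check (25): M (u^{s,1})_{1:r} = (u^{s,3})_k (u^{s,1})_{1:r} for every s
for s in range(r):
    assert np.allclose(M @ U1[:r, s], U3[k, s] * U1[:r, s])
```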

4 Low Rank Tensor Decompositions

Without loss of generality, assume the dimensions are decreasing as

$$\begin{aligned} n_1 \geqslant n_2 \geqslant \cdots \geqslant n_m. \end{aligned}$$

We discuss how to compute tensor decomposition for a tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times \cdots \times n_m}\) when the rank r is not bigger than the highest dimension, i.e., \(r \leqslant n_1\). As in Theorem 3, the decomposing vectors \((u^{s,1})_{1:r}\) are common eigenvectors of the matrices \(M^{j,k}[G]\), with \((u^{s,j})_k\) being the eigenvalues, respectively. This implies that the matrices \(M^{j,k}[G]\) are simultaneously diagonalizable. This property can be used to compute tensor decompositions.

Suppose G is a matrix such that (24) holds and \(M^{j,k}[G]\) are simultaneously diagonalizable. That is, there is an invertible matrix \(P \in \mathbb {C}^{r \times r}\) such that all the products \(P^{-1} M^{j,k}[G] P\) are diagonal for all \(j=2,\cdots , m\) and for all \(k = 2, \cdots , n_j\). Suppose \(M^{j,k}[G]\) are diagonalized such that

$$\begin{aligned} P^{-1} M^{j,k}[G] P \, = \, \hbox {diag}[\lambda _{j,k,1},\lambda _{j,k,2},\cdots ,\lambda _{j,k,r}] \end{aligned}$$
(26)

with the eigenvalues \(\lambda _{j,k,s}\). For each \(s =1,\cdots ,r\) and \(j=2,\cdots , m\), denote the vectors

$$\begin{aligned} u^{s,j} \, :=\, (1, \lambda _{j,2, s}, \cdots , \lambda _{j,n_j,s} ). \end{aligned}$$
(27)

When \(\mathcal {F}\) is rank-r, there exist vectors \(u^{1,1}, \cdots , u^{r,1} \in \mathbb {C}^{n_1}\) such that

$$\begin{aligned} \mathcal {F}=\sum _{s=1}^r u^{s,1} \otimes u^{s,2} \otimes \cdots \otimes u^{s,m} . \end{aligned}$$
(28)

The vectors \(u^{s,1}\) can be found by solving linear equations after \(u^{s,j}\) are obtained for \(j=2,\cdots ,m\) and \(s=1,\cdots ,r\). The existence of vectors \(u^{s,1}\) satisfying tensor decomposition (28) is shown in the following theorem.

Theorem 4

Let \({\mathcal {F}} =V^{(1)} \circ \cdots \circ V^{(m)}\) be a rank-r tensor, for matrices \(V^{(i)} \in \mathbb {C}^{n_i \times r}\), such that the first r rows of \(V^{(1)}\) are linearly independent. Suppose G is a matrix satisfying (24) and \(P \in \mathbb {C}^{r \times r}\) is an invertible matrix such that all matrix products \(P^{-1} \cdot M^{j,k}[G] \cdot P\) are simultaneously diagonalized as in (26). For \(j=2,\cdots ,m\) and \(s=1,\cdots , r\), let \(u^{s,j}\) be vectors given as in (27). Then, there must exist vectors \(u^{1,1}, \cdots , u^{r,1} \in \mathbb {C}^{n_1}\) such that tensor decomposition (28) holds.

Proof

Since the matrix \(P = \begin{bmatrix} p_1&\cdots&p_r \end{bmatrix}\) is invertible, there exist scalars \(c_1,\cdots , c_r \in \mathbb {C}\) such that

$$\begin{aligned} \mathcal {F}_{1:r,1,\cdots ,1} = c_1 p_1 +c_2 p_2 + \cdots + c_r p_r . \end{aligned}$$
(29)

Consider the new tensor

$$\begin{aligned} \mathcal {H}\, :=\, \sum _{s=1}^r c_s p_s \otimes u^{s,2}\otimes \cdots \otimes u^{s,m} . \end{aligned}$$

In the following, we show that \(\mathcal {F}_{1:r,:,\cdots ,:} = \mathcal {H}\) and there exist vectors \(u^{1,1}, \cdots , u^{r,1} \in \mathbb {C}^{n_1}\) satisfying equation (28).

By Theorem 3, one can see that the generating matrix G for \(\mathcal {F}\) is also a generating matrix for \(\mathcal {H}\), so it holds that

$$\begin{aligned} \langle \phi [G,\tau ] p, \mathcal {F}\rangle = \langle \phi [G,\tau ] p, \mathcal {H}\rangle =0, \quad \hbox {for all} \,\, p \in \mathbb {M}_{\{1,j\}^c} . \end{aligned}$$
(30)

Therefore, we have

$$\begin{aligned} \langle \phi [G,\tau ] p, \mathcal {H}- \mathcal {F}\rangle =0, \quad \hbox {for all} \,\, p \in \mathbb {M}_{\{1,j\}^c}. \end{aligned}$$
(31)

By (29), one can see that

$$\begin{aligned} (\mathcal {H}- \mathcal {F})_{1:r,1,\cdots ,1}=0. \end{aligned}$$
(32)

In (31), for each \(\tau = (i, 2, k) \in J\) and \(p = 1\), we can get

$$\begin{aligned} (\mathcal {H}- \mathcal {F})_{1:r,:,1,\cdots ,1}=0. \end{aligned}$$

Similarly, for \(\tau = (i, 2, k) \in J\) and \(p=x_{3,j_3}\), we can get

$$\begin{aligned} (\mathcal {H}- \mathcal {F})_{1:r,:,:,1,\cdots ,1}=0. \end{aligned}$$

Continuing this, we can eventually get \(\mathcal {H}= \mathcal {F}_{1:r,:,:,\cdots , :}\). Since the matrix \((V^{(1)})_{1:r,:}\) is invertible, there exists a matrix \(W \in \mathbb {C}^{n_1\times r} \) such that \(V^{(1)} = W (V^{(1)})_{1:r,:}.\) Observe that

$$\begin{aligned} {\mathcal {F}} = W \times _1 \mathcal {F}_{1:r,:,:,\cdots , :}= W \times _1 \mathcal {H}. \end{aligned}$$

Let \(u^{s,1}=W \cdot (c_sp_s)\) for \(s=1,\cdots ,r\). Then tensor decomposition (28) holds.

4.1 An Algorithm for Computing Tensor Decompositions

Consider a tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times n_2 \times \cdots \times n_m}\) with a given rank r. Recall that the dimensions are ordered such that \(n_1 \geqslant n_2 \geqslant \cdots \geqslant n_m\). We discuss how to compute a rank-r tensor decomposition for \(\mathcal {F}\). Recall \(A[{\mathcal {F}},j]\), \(b[{\mathcal {F}},j,k]\) as in (23), for \(j > 1\). Note that \(A[{\mathcal {F}},j]\) has the dimension \(N_j \times r\), where

$$\begin{aligned} N_j \, :=\, \frac{n_2 \cdots n_m}{n_j} . \end{aligned}$$
(33)

If \(r \leqslant N_j\), then the matrix \(A[{\mathcal {F}},j]\) has full column rank for generic tensors. For instance, when \(m=3\) we have \(N_3 = n_2\), so \(r \leqslant N_3\) whenever \(r \leqslant n_2\). Since \(N_2\) is the smallest among \(N_2, \cdots , N_m\), the matrix \(A[{\mathcal {F}},2]\) is the one most likely to be rank deficient, so we only use the matrices \(A[{\mathcal {F}},j]\) for \(j \geqslant 3\). For convenience, denote the label set

$$\begin{aligned} \Upsilon \, :=\, \{(j, k): 3 \leqslant j \leqslant m, 2 \leqslant k \leqslant n_j \}. \end{aligned}$$
(34)

In the following, we consider the case that \(r \leqslant N_3\). For each pair \((j,k) \in \Upsilon \), linear system (24) has a unique solution, which we denote by

$$\begin{aligned} Y^{j,k} \, = \, M^{j,k}[G]. \end{aligned}$$

For \(j=2\), equation (24) may not have a unique solution if \(r > N_2\). In the following, we show how to get the tensor decomposition without using the matrices \(M^{2,k}[G]\). By Theorem 3, the matrices \(Y^{j,k}\) are simultaneously diagonalizable, that is, there is an invertible matrix \(P \in \mathbb {C}^{r \times r}\) such that all products \(P^{-1} Y^{j,k} P\) are diagonal for every \((j,k) \in \Upsilon \). Suppose they are diagonalized as

$$\begin{aligned} P^{-1} Y^{j,k} P \, = \, \hbox {diag}[\lambda _{j,k,1},\lambda _{j,k,2},\cdots ,\lambda _{j,k,r}] \end{aligned}$$
(35)

with the eigenvalues \(\lambda _{j,k,s}\). Write P in the column form

$$\begin{aligned} P \, = \, \begin{bmatrix} p_1&\cdots&p_r \end{bmatrix}. \end{aligned}$$

For each \(s =1,\cdots ,r\) and \(j=3,\cdots , m\), let

$$\begin{aligned} v^{s,j} \, :=\, (1, \lambda _{j,2, s}, \cdots , \lambda _{j,n_j,s} ). \end{aligned}$$
(36)

Suppose \({\mathcal {F}}\) has a rank-r decomposition

$$\begin{aligned} {\mathcal {F}} \, = \, \sum _{s=1}^r u^{s,1} \otimes \cdots \otimes u^{s,m}. \end{aligned}$$

Under the assumptions of Theorem 3, linear system (24) has a unique solution for each pair \((j,k) \in \Upsilon \). For every \(j \in \{3,\cdots ,m\}\), there exist scalars \(c_{s,j},c_{s,1}\) such that

$$\begin{aligned} u^{s,j} \, = \, c_{s,j} v^{s,j}, \quad u^{s,1} \, = \, c_{s,1} p_s . \end{aligned}$$

Then, we consider the sub-tensor equation in the vector variables \(y_1, \cdots , y_r \in \mathbb {C}^{n_2}\)

$$\begin{aligned} \mathcal {F}_{1:r,:,\cdots , :} \, = \, \sum _{s=1}^r p_s \otimes y_s \otimes v^{s,3} \otimes \cdots \otimes v^{s,m}. \end{aligned}$$
(37)

There are \(r n_2 \cdots n_m\) equations and \(rn_2\) unknowns. This overdetermined linear system has solutions such that

$$\begin{aligned} y_s=c_{s,2} u^{s,2}, ~\mathrm {for~some}~c_{s,2} \in \mathbb {C}. \end{aligned}$$

After all \(y_s\) are obtained, we solve the linear equation in \(z_1, \cdots , z_r \in \mathbb {C}^{n_1-r}\)

$$\begin{aligned} \mathcal {F}_{r+1:n_1,:,\cdots , :} =\sum _{s=1}^r z_s \otimes y_s \otimes v^{s,3} \otimes \cdots \otimes v^{s,m} . \end{aligned}$$
(38)

After all \(y_s, z_s\) are obtained, we choose the vectors (\(s=1,\cdots ,r\))

$$\begin{aligned} v^{s,1} = \begin{bmatrix} p_s \\ z_s \end{bmatrix}, \quad v^{s,2} = y_s. \end{aligned}$$

Then we get the tensor decomposition

$$\begin{aligned} \mathcal {F}\, = \, \sum _{s=1}^r v^{s,1} \otimes v^{s,2} \otimes \cdots \otimes v^{s,m} . \end{aligned}$$
(39)

Summarizing the above, we get the following algorithm for computing tensor decompositions when \(r\leqslant n_1\) and \(r \leqslant N_3\). Suppose the dimensions are ordered such that \(n_1 \geqslant n_2 \geqslant \cdots \geqslant n_m\).

Algorithm 1

(Rank-r tensor decomposition)

Input:

A tensor \({\mathcal {F}}\in \mathbb {C}^{n_1 \times \cdots \times n_m}\) with rank \(r \leqslant \min (n_1, N_3)\).

Step 1:

For each pair \((j,k) \in \Upsilon \), solve the matrix equation for the solution \(Y^{j,k}\):

$$\begin{aligned} A[\mathcal {F},j] (Y^{j,k})^{\top } \, = \, b[{\mathcal {F}},j,k]. \end{aligned}$$
(40)
Step 2:

Choose generic scalars \(\xi _{j,k} \). Then compute the eigenvalue decomposition \(P^{-1} Y P = D\) for the matrix

$$\begin{aligned} Y \, :=\, \frac{1}{\sum \limits _{ (j,k) \in \Upsilon } \xi _{j,k}} \sum \limits _{ (j,k) \in \Upsilon } \xi _{j,k} Y^{j,k} . \end{aligned}$$
Step 3:

For \(s = 1, \cdots , r\) and \(j \geqslant 3\), let \(v^{s,j}\) be the vectors as in (36).

Step 4:

Solve linear system (37) for vectors \(y_1, \cdots , y_r\).

Step 5:

Solve linear system (38) for vectors \(z_1, \cdots , z_r\).

Step 6:

For each \(s = 1, \cdots , r\), let \(v^{s,1} = \begin{bmatrix} p_s \\ z_s \end{bmatrix}\) and \(v^{s,2} = y_s\).

Output:

A tensor rank-r decomposition as in (39).

The correctness of Algorithm 1 is justified as follows.

Theorem 5

Suppose \(n_1 \geqslant n_2 \geqslant \cdots \geqslant n_m\) and \(r \leqslant \min (n_1, N_3)\) as in (33). For a generic tensor \(\mathcal {F}\) of rank r, Algorithm 1 produces a rank-r tensor decomposition for \(\mathcal {F}\).

Proof

This follows from Theorem 4.
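For concreteness, the following numpy sketch runs Steps 1–6 of Algorithm 1 on a random order-3 tensor of exact rank r over \(\mathbb {C}\); the sizes are hypothetical and the variable names are ours, not taken from any released implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2, n3, r = 6, 5, 4, 3
randc = lambda *sh: rng.standard_normal(sh) + 1j * rng.standard_normal(sh)
U_true = [randc(ni, r) for ni in (n1, n2, n3)]
F = np.einsum('is,js,ks->ijk', *U_true)

# Step 1: solve A[F,3] (Y^{3,k})^T = b[F,3,k] for k = 2, ..., n3
A = F[:r, :, 0].T
Ys = [np.linalg.lstsq(A, F[:r, :, k].T, rcond=None)[0].T for k in range(1, n3)]

# Step 2: eigenvalue decomposition of a generic combination of the Y^{3,k}
xi = rng.random(n3 - 1) + 1.0
_, P = np.linalg.eig(sum(x * Y for x, Y in zip(xi, Ys)) / xi.sum())

# Step 3: v^{s,3} = (1, lambda_{3,2,s}, ..., lambda_{3,n3,s})
Lam = np.array([np.diag(np.linalg.solve(P, Y @ P)) for Y in Ys])
V3 = np.vstack([np.ones(r), Lam])

# Step 4: solve (37) for the vectors y_s (the columns of V2)
K = np.einsum('is,ks->iks', P, V3).reshape(r * n3, r)
B = F[:r].transpose(0, 2, 1).reshape(r * n3, n2)
V2 = np.linalg.lstsq(K, B, rcond=None)[0].T

# Step 5: solve (38) for the vectors z_s
L = np.einsum('js,ks->jks', V2, V3).reshape(n2 * n3, r)
Z = np.linalg.lstsq(L, F[r:].reshape(n1 - r, n2 * n3).T, rcond=None)[0].T

# Step 6: v^{s,1} = [p_s; z_s]; verify the recovered rank-r decomposition
V1 = np.vstack([P, Z])
assert np.allclose(np.einsum('is,js,ks->ijk', V1, V2, V3), F)
```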

4.2 Tensor Decompositions Via Reshaping

A tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times \cdots \times n_m}\) can be reshaped as a cubic order tensor \(\widehat{\mathcal {F}}\) as in (12). One can apply Algorithm 1 to compute tensor decomposition (12) for \(\widehat{\mathcal {F}}\). If the decomposing vectors \(w^{s,1}, w^{s,2}, w^{s,3}\) can be reshaped to rank-1 tensors, then we can convert (12) to a tensor decomposition for \(\mathcal {F}\). This is justified by Theorem 2, under some assumptions. A benefit of doing this is that we may be able to compute tensor decompositions for the case that

$$ N_3 < r \leqslant p_2, $$

with the dimension \(p_2\) as in Theorem 2. This leads to the following algorithm for computing tensor decompositions.

Algorithm 2

(Tensor decompositions via reshaping.) Let \(p_1, p_2, p_3\) be dimensions as in Theorem 2.

Input:

A tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times \cdots \times n_m}\) with rank \(r \leqslant p_2\).

Step 1:

Reshape the tensor \(\mathcal {F}\) to a cubic tensor \(\widehat{\mathcal {F}} \in \mathbb {C}^{p_1 \times p_2 \times p_3}\) as in (12).

Step 2:

Use Algorithm 1 to compute the tensor decomposition

$$\begin{aligned} \widehat{\mathcal {F}} \, = \, \sum _{s=1}^r w^{s,1}\otimes w^{s,2} \otimes w^{s,3} . \end{aligned}$$
(41)
Step 3:

If all \(w^{s,1}, w^{s,2}, w^{s,3}\) can be expressed as Kronecker products of vectors as in (11), then output the tensor decomposition as in (3). If one of them cannot be expressed as in (11), then the reshaping does not produce a tensor decomposition for \(\mathcal {F}\).

Output:

A tensor decomposition for \(\mathcal {F}\) as in (3).

For Algorithm 2, we have a conclusion similar to Theorem 5. For the cleanness of the paper, we do not repeat it here.

5 Low Rank Tensor Approximations

When a tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times \cdots \times n_m}\) has rank greater than r, the linear systems in Algorithm 1 may not be consistent. However, we can compute linear least squares solutions for them. This gives an algorithm for computing low rank tensor approximations. Recall the label set \(\Upsilon \) as in (34). The following is the algorithm.

Algorithm 3

(Rank-r tensor approximation.)

Input:

A tensor \({\mathcal {F}}\in \mathbb {C}^{n_1 \times n_2 \times \cdots \times n_m} \) and a rank \(r \leqslant \min (n_1, N_3)\).

Step 1:

For each pair \((j,k) \in \Upsilon \), solve the linear least squares problem

$$\begin{aligned} \min \limits _{ Y^{j,k} \in \mathbb {C}^{r \times r} } \quad \bigg \Vert A[{\mathcal {F}},j] (Y^{j,k} )^T - b[{\mathcal {F}},j,k] \bigg \Vert ^2 . \end{aligned}$$
(42)

Let \(\hat{Y}^{j,k}\) be an optimizer.

Step 2:

Choose generic scalars \(\xi _{j,k}\) and let

$$\begin{aligned} {\hat{Y}}[\xi ] \, = \, \frac{1}{\sum \limits _{ (j,k) \in \Upsilon } \xi _{j,k}} \sum \limits _{ (j,k) \in \Upsilon } \xi _{j,k} {\hat{Y}}^{j,k} . \end{aligned}$$

Compute the eigenvalue decomposition \({\hat{P}}^{-1} \hat{Y}[\xi ] \hat{P} = \Lambda \) such that \(\hat{P} = \begin{bmatrix} \hat{p}_1&\cdots&\hat{p}_r \end{bmatrix}\) is invertible and \(\Lambda \) is diagonal.

Step 3:

For each pair \((j,k) \in \Upsilon \), select the diagonal entries

$$\begin{aligned} \hbox {diag}[\hat{\lambda }_{j,k,1} \,\, \hat{\lambda }_{j,k,2} \,\, \cdots \,\, \hat{\lambda }_{j,k,r} ] \, = \, \hbox {diag} (\hat{P}^{-1} \hat{Y}^{j,k} \hat{P}). \end{aligned}$$

For each \(s=1,\cdots ,r\) and \(j= 3, \cdots , m\), let

$$\begin{aligned} \hat{v}^{s,j} = (1, \hat{\lambda }_{j,2,s}, \cdots , \hat{\lambda }_{j,n_j,s} ). \end{aligned}$$
Step 4:

Let \((\hat{y}_1, \cdots , \hat{y}_r)\) be an optimizer for the following least squares:

$$\begin{aligned} \min \limits _{ (y_1, \cdots , y_r) } \quad \bigg \Vert \mathcal {F}_{1:r, :, \cdots , :} - \sum _{s=1}^r \hat{p}_s \otimes y_s \otimes \hat{v}^{s,3} \otimes \cdots \otimes \hat{v}^{s,m} \bigg \Vert ^2. \end{aligned}$$
(43)
Step 5:

Let \((\hat{z}_1, \cdots , \hat{z}_r)\) be an optimizer for the following least squares:

$$\begin{aligned} \min \limits _{ (z_1, \cdots , z_r) } \quad \bigg \Vert \mathcal {F}_{r+1:n_1, :, \cdots , :} - \sum _{s=1}^r z_s \otimes \hat{y}_s \otimes \hat{v}^{s,3} \otimes \cdots \otimes \hat{v}^{s,m} \bigg \Vert ^2. \end{aligned}$$
(44)
Step 6:

Let \(\hat{v}^{s,1} = \begin{bmatrix} \hat{p}_s \\ \hat{z}_s \end{bmatrix}\) and \(\hat{v}^{s,2} = \hat{y}_s\) for each \(s=1,\cdots , r\).

Output:

A rank-r approximation tensor

$$\begin{aligned} \mathcal {X}^{gp} \, :=\, \sum _{s=1}^r \hat{v}^{s,1} \otimes \hat{v}^{s,2} \otimes \cdots \otimes \hat{v}^{s,m}. \end{aligned}$$
(45)

If \({\mathcal {F}}\) is sufficiently close to a rank-r tensor, then \(\mathcal {X}^{gp}\) is expected to be a good rank-r approximation. Mathematically, the tensor \(\mathcal {X}^{gp}\) produced by Algorithm 3 may not be a best rank-r approximation. However, in computational practice, we can use (45) as a starting point to solve the nonlinear least squares optimization

$$\begin{aligned} \min \limits _{ (u^{s,1}, \cdots , u^{s,m}) } \bigg \Vert \mathcal {F}- \sum _{s=1}^r u^{s,1} \otimes u^{s,2} \otimes \cdots \otimes u^{s,m} \bigg \Vert ^2 \end{aligned}$$
(46)

to improve the approximation quality. Let \(\mathcal {X}^{opt}\) be a rank-r approximation tensor

$$\begin{aligned} \mathcal {X}^{opt} \, :=\, \sum _{s=1}^r u^{s,1} \otimes u^{s,2} \otimes \cdots \otimes u^{s,m} \end{aligned}$$
(47)

which is an optimizer of (46) obtained by nonlinear optimization methods with \(\mathcal {X}^{gp}\) as the initial point.
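As an illustration of this refinement step, the sketch below (over the real field for simplicity, with hypothetical helper names) feeds an initial CP factorization, playing the role of \(\mathcal {X}^{gp}\), into a generic nonlinear least squares solver for (46); in the experiments reported later, the Tensorlab function \(cpd\_nls\) is used instead.

```python
import numpy as np
from scipy.optimize import least_squares

def cp_residual(theta, F, r):
    # unpack stacked factor matrices and return the CP fitting residual
    mats, pos = [], 0
    for n in F.shape:
        mats.append(theta[pos:pos + n * r].reshape(n, r))
        pos += n * r
    return (F - np.einsum('is,js,ks->ijk', *mats)).ravel()   # order-3 case

def refine(F, U0):
    # nonlinear least squares (46), started from the factors U0 of X^gp
    r = U0[0].shape[1]
    sol = least_squares(cp_residual, np.concatenate([U.ravel() for U in U0]),
                        args=(F, r))
    out, pos = [], 0
    for n in F.shape:
        out.append(sol.x[pos:pos + n * r].reshape(n, r))
        pos += n * r
    return out

# a noisy rank-2 tensor and a rough initial guess (both hypothetical)
rng = np.random.default_rng(6)
U_true = [rng.standard_normal((n, 2)) for n in (4, 4, 3)]
F = np.einsum('is,js,ks->ijk', *U_true) + 1e-3 * rng.standard_normal((4, 4, 3))
U0 = [U + 1e-2 * rng.standard_normal(U.shape) for U in U_true]

U_opt = refine(F, U0)
print(np.linalg.norm(F - np.einsum('is,js,ks->ijk', *U_opt)))  # comparable to the noise level
```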

5.1 Approximation Error Analysis

Suppose the tensor \(\mathcal {F}\) has a best (or nearly best) rank-r approximation

$$\begin{aligned} \mathcal {X}^{bs} \, :=\, \sum _{s=1}^{r} (x^{s,1}) \otimes (x^{s,2}) \otimes \cdots \otimes (x^{s,m}). \end{aligned}$$
(48)

Let \(\mathcal {E}\) be the tensor such that

$$\begin{aligned} \mathcal {F}= \mathcal {X}^{bs} + \mathcal {E}. \end{aligned}$$
(49)

We analyze the approximation performance of \(\mathcal {X}^{gp}\) when the distance \(\epsilon = \Vert \mathcal {E}\Vert \) is small. For a generating matrix G and a generic \(\xi = (\xi _{j,k})_{(j,k) \in \Upsilon }\), denote that

$$\begin{aligned} M[\xi ,G] \, :=\, \frac{1}{\sum \limits _{ (j,k) \in \Upsilon } \xi _{j,k}} \sum \limits _{(j,k) \in \Upsilon } \xi _{j,k} M^{j,k}[G]. \end{aligned}$$
(50)

Recall the \(A[{\mathcal {F}},j]\), \(b[{\mathcal {F}},j,k]\) as in (23). Note that

$$\begin{aligned} \begin{aligned} A[{\mathcal {F}},j]=&~A[\mathcal {X}^{bs},j]+A[\mathcal {\mathcal {E}},j], \\ b[{\mathcal {F}},j,k]=&~b[\mathcal {X}^{bs},j,k]+b[\mathcal {\mathcal {E}},j,k]. \end{aligned} \end{aligned}$$
(51)

Suppose \(\left( x^{s, j}\right) _{1} \ne 0\) for \(j=2, \cdots , m \).

Theorem 6

Let \(\mathcal {X}^{gp}\) be produced by Algorithm 3. Let \({\mathcal {F}},\mathcal {X}^{bs},\mathcal {X}^{opt},\mathcal {E},x^{s,j},\xi _{j,k}\) be as above. Assume the following conditions hold:

  1. (i)

    The subvectors \((x^{1,1})_{1:r}, \cdots , (x^{r,1})_{1:r}\) are linearly independent.

  2. (ii)

    All matrices \(A[ {\mathcal {F}},j ]\) and \(A[ \mathcal {X}^{bs},j ]\) (\(3 \leqslant j \leqslant m\)) have full column rank.

  3. (iii)

    The first entry \(\left( x^{s, j}\right) _{1} \ne 0\) for all \(j=2, \cdots , m \).

  4. (iv)

    The following scalars are pairwise distinct:

    $$\begin{aligned} \sum _{ (j,k) \in \Upsilon } \xi _{j, k}(x^{1,j})_k, \cdots , \sum _{ (j,k) \in \Upsilon } \xi _{j ,k}(x^{r,j})_k . \end{aligned}$$
    (52)

If the distance \(\epsilon = \Vert \mathcal {F}- \mathcal {X}^{bs} \Vert \) is sufficiently small, then

$$\begin{aligned} \Vert \mathcal {X}^{bs}-\mathcal {X}^{gp} \Vert = O(\epsilon ), \quad \Vert {\mathcal {F}}-\mathcal {X}^{gp} \Vert = O(\epsilon ), \end{aligned}$$
(53)

where the constants in the above \(O(\cdot )\) only depend on \({\mathcal {F}}\) and \(\xi \).

Proof

By conditions (i) and (iii) and by Theorem 3, there exists a generating matrix \(G^{bs}\) for \(\mathcal {X}^{bs}\) such that

$$\begin{aligned} A[\mathcal {X}^{b s}, j] (M^{j,k}[G^{bs}])^T \, = \, b[\mathcal {X}^{b s},j,k] \end{aligned}$$
(54)

for all \(j \in \{2,\cdots ,m\}\) and \(k \in \{2,\cdots ,n_j\}\). Note that \(Y^{j,k}\) is the least squares solution to (42), so for each \((j,k) \in \Upsilon \),

$$\begin{aligned} (Y^{j,k})^{\top }=A[{\mathcal {F}}, j]^{\dagger } \cdot b[{\mathcal {F}},j,k], \quad (M^{j,k}[G^{bs}])^{\top }=A[\mathcal {X}^{b s}, j]^{\dagger } \cdot b[\mathcal {X}^{b s}, j,k ]. \end{aligned}$$

(The superscript \(^\dagger \) denotes the pseudoinverse of a matrix.) By (49), for \(j=2,\cdots ,m\), we have

$$\begin{aligned} \begin{array}{l} \left\| A[{\mathcal {F}}, j]-A[\mathcal {X}^{b s}, j]\right\| _{F} \leqslant \left\| \mathcal {F}- \mathcal {X}^{bs} \right\| \leqslant \varepsilon , \\ \left\| b[{\mathcal {F}}, j,k ]-b[\mathcal {X}^{b s}, j,k ]\right\| _{F} \leqslant \left\| \mathcal {F}- \mathcal {X}^{bs} \right\| \leqslant \varepsilon . \end{array} \end{aligned}$$
(55)

Hence, by the condition (ii), if \(\varepsilon >0\) is small enough, we have

$$\begin{aligned} \left\| Y^{j, k} - M^{j, k}[G^{bs}]\right\| = O(\varepsilon ) \end{aligned}$$
(56)

for all \((j,k) \in \Upsilon \). This follows from perturbation analysis for linear least squares (see [42, Theorem 3.4]).

By (48) and Theorem 3, for \(s=1, \cdots , r\) and \((j,k) \in \Upsilon \), it holds that

$$\begin{aligned} M^{j, k}[G^{bs}] \left( x^{s, 1}\right) _{1:r} \quad = \quad \left( x^{s, j}\right) _{k} \left( x^{s, 1}\right) _{1:r} . \end{aligned}$$

This means that each \(\left( x^{s, 1}\right) _{1: r}\) is an eigenvector of \(M^{j, k}[G^{bs}]\), associated with the eigenvalue \(\left( x^{s, j}\right) _{k}\), for each \(s=1, \cdots , r\). The matrices \(M^{j, k}[G^{bs}]\) are simultaneously diagonalizable, by the condition (i), so \(M[\xi ,G^{bs}]\) is also diagonalizable. Note that the eigenvalues of \(M[\xi ,G^{bs}]\) are the sums in (52), up to the common scaling \(1/\sum _{(j,k) \in \Upsilon } \xi _{j,k}\); they are distinct from each other by the condition (iv). Hence, when \(\epsilon >0\) is small enough, the perturbed matrix \(\hat{Y}[\xi ]\) also has distinct eigenvalues. Write that

$$\begin{aligned} Q = \begin{bmatrix} (x^{1,1})_{1:r}&\cdots&(x^{r,1})_{1:r} \end{bmatrix} . \end{aligned}$$

Note that \(Q^{-1} M[\xi ,G^{bs}]Q = D\) is an eigenvalue decomposition. Up to a scaling on \(\hat{P}\) in Algorithm 3, it holds that

$$\begin{aligned} \Vert \hat{p}_s - (x^{s,1})_{1:r} \Vert _{2}=O(\varepsilon ), \quad \Vert D - \Lambda \Vert _{F}=O(\varepsilon ). \end{aligned}$$
(57)

We refer to [43] for the perturbation bounds in (57). The constants in the above \(O(\cdot )\) eventually only depend on \(\mathcal {F},\xi \).

Note that \((\hat{y}_1, \cdots , \hat{y}_r)\) is the least squares solution to (43) and

$$\begin{aligned} \mathcal {X}^{bs}_{1:r, :, \cdots , :} = \sum _{s=1}^r x^{s,1} \otimes x^{s,2} \otimes x^{s,3} \otimes \cdots \otimes x^{s,m} . \end{aligned}$$
(58)

Due to perturbation analysis of linear least squares, we also have

$$\begin{aligned} \Vert \hat{y}_s - x^{s,2}\Vert _{2}=O(\varepsilon ) . \end{aligned}$$
(59)

Note that the subvectors \((x^{s,1})_{r+1:n_1}\) satisfy the equation

$$\begin{aligned} \mathcal {X}^{bs}_{r+1:n_1, :, \cdots , :} = \sum _{s=1}^r (x^{s,1})_{r+1:n_1} \otimes x^{s,2} \otimes \cdots \otimes x^{s,m} . \end{aligned}$$
(60)

Recall that \((\hat{z}_1, \cdots , \hat{z}_r)\) is the least squares solution to (44). Due to perturbation analysis of linear least squares, we further have the error bound

$$\begin{aligned} \Vert (x^{s, 1})_{r+1:n_1} - \hat{z}_s \Vert _{2}= O (\varepsilon ) . \end{aligned}$$
(61)

Summarizing the above, we eventually get \(\Vert \mathcal {X}^{g p}-\mathcal {X}^{bs} \Vert =O(\varepsilon )\), so

$$\begin{aligned} \left\| {\mathcal {F}}-\mathcal {X}^{g p}\right\| \leqslant \left\| {\mathcal {F}}-\mathcal {X}^{b s}\right\| + \left\| \mathcal {X}^{b s}-\mathcal {X}^{g p}\right\| = O(\varepsilon ) . \end{aligned}$$

The constant for the above \(O(\cdot )\) eventually only depends on \(\mathcal {F}\), \(\xi \).

5.2 Reshaping for Low Rank Approximations

Similar to the case of tensor decompositions, the reshaping trick in Sect. 4.2 can also be used for computing low rank tensor approximations. For \(m>3\), a tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times n_2 \times \cdots \times n_m}\) can be reshaped as a cubic tensor \(\widehat{\mathcal {F}} \in \mathbb {C}^{p_1 \times p_2 \times p_3}\) as in (12), and Algorithm 3 can then be applied to \(\widehat{\mathcal {F}}\). Suppose the computed rank-r approximating tensor for \(\widehat{\mathcal {F}}\) is

$$\begin{aligned} \widehat{\mathcal {X}}^{gp}:=\sum _{s=1}^{r} \hat{w}^{s,1} \otimes \hat{w}^{s,2} \otimes \hat{w}^{s,3}. \end{aligned}$$
(62)

Typically, the decomposing vectors \(\hat{w}^{s,1}, \hat{w}^{s,2}, \hat{w}^{s,3}\) cannot be reshaped exactly to rank-1 tensors. Suppose the reshaping is such that \(I_1 \cup I_2 \cup I_3 =\{1,2,\cdots ,m\}\) is a union of disjoint label sets and the reshaped dimensions are

$$\begin{aligned} p_1 = \prod _{i \in I_1} n_i, \quad p_2 = \prod _{i \in I_2} n_i, \quad p_3 = \prod _{i \in I_3} n_i . \end{aligned}$$

Let \(m_i = |I_i|\) for \(i=1,2,3\). By the reshaping, each vector \(\hat{w}^{s,i}\) can be reshaped back to a tensor \(\hat{W}^{s,i}\) of order \(m_i\). If \(m_i = 1\), then \(\hat{W}^{s,i}\) is already a vector. If \(m_i = 2\), we can find a best rank-1 matrix approximation for \(\hat{W}^{s,i}\) from its leading singular triplet. If \(m_i \geqslant 3\), we can apply Algorithm 3 with \(r=1\) to get a rank-1 approximation for \(\hat{W}^{s,i}\). In applications, we are mostly interested in reshapings such that all \(m_i \leqslant 2\). Finally, this produces a rank-r approximation for \(\mathcal {F}\).
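For the case \(m_i = 2\), the best rank-1 approximation of the reshaped matrix is given by its leading singular triplet; a minimal numpy sketch (hypothetical sizes and helper name) is as follows.

```python
import numpy as np

def best_rank1_matrix(w, na, nb):
    # reshape a length na*nb vector to an na x nb matrix (row-major order,
    # matching the Kronecker convention) and keep its leading singular triplet
    W = w.reshape(na, nb)
    U, S, Vh = np.linalg.svd(W, full_matrices=False)
    return S[0] * np.outer(U[:, 0], Vh[0, :])

# a vector that is nearly a Kronecker product of two vectors (hypothetical)
rng = np.random.default_rng(7)
a, b = rng.standard_normal(4), rng.standard_normal(3)
w = np.kron(a, b) + 1e-3 * rng.standard_normal(12)

W1 = best_rank1_matrix(w, 4, 3)
print(np.linalg.norm(W1 - np.outer(a, b)))   # small: close to the rank-1 part
```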

The following is a low rank tensor approximation algorithm via reshaping tensors.

Algorithm 4

(low rank tensor approximations via reshaping.)

Input:

A tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times n_2 \times \cdots \times n_m}\) and a rank r.

Step 1:

Reshape \(\mathcal {F}\) to a cubic order tensor \(\widehat{\mathcal {F}} \in \mathbb {C}^{p_1 \times p_2 \times p_3}\).

Step 2:

Use Algorithm 3 to compute a rank-r approximating tensor \(\widehat{\mathcal {X}}^{gp}\) as in (62) for \(\widehat{\mathcal {F}}\).

Step 3:

For each \(i=1,2,3\), reshape each vector \(\hat{w}^{s,i}\) back to a tensor \(\widehat{W}^{s,i}\) of order \(m_i\) as above.

Step 4:

For each \(i=1,2,3\), compute a rank-1 approximating tensor \(\widehat{X}^{s,i}\) for \(\widehat{W}^{s,i}\) of order \(m_i\) as above.

Output:

Reshape the sum \(\sum \limits _{s=1}^r \widehat{X}^{s,1} \otimes \widehat{X}^{s,2} \otimes \widehat{X}^{s,3}\) to a tensor in \(\mathbb {C}^{n_1 \times n_2 \times \cdots \times n_m}\), which is a rank-r approximation for \(\mathcal {F}\).

We can give a similar approximation error analysis for Algorithm 4 as for Theorem 6. For the cleanness of the paper, we do not repeat it here.

6 Numerical Experiments

In this section, we apply Algorithms 1 and 3 to compute tensor decompositions and low rank tensor approximations. We implement these algorithms in MATLAB 2020b on a workstation with Ubuntu 20.04.2 LTS, an Intel® Xeon® Gold 6248R CPU @ 3.00 GHz, and 1 TB of memory. For computing low rank tensor approximations, we use the function \(cpd\_nls\) provided in Tensorlab 3.0 [44] to solve nonlinear least squares optimization (46). Here, \(\mathcal {X}^{gp}\) denotes the approximating tensor returned by Algorithm 3, and \(\mathcal {X}^{opt}\) denotes the approximating tensor obtained by solving (46) with \(\mathcal {X}^{gp}\) as the initial point. In our numerical experiments, if the rank r is unknown, we estimate it by the rank of the most square flattening matrix, as in (4) and Lemma 1.

Example 2

Consider the tensor \({\mathcal {F}}\in \mathbb {C}^{4 \times 4 \times 3}\) whose slices \({\mathcal {F}}_{:,:,1}, {\mathcal {F}}_{:,:,2}, {\mathcal {F}}_{:,:,3}\) are, respectively

By Lemma 1, the estimated rank is \(r=4\).

Applying Algorithm 1 with \(r=4\), we get the rank-4 decomposition \(\mathcal {F}= U^{(1)} \circ U^{(2)} \circ U^{(3)}\), with

$$\begin{aligned} U^{(1)}= \begin{bmatrix} 8 &{} 6 &{} 4 &{} 9 \\ 8 &{} 12 &{} 16 &{} 12 \\ 4 &{} 6 &{} 4 &{} 12 \\ 4 &{} 12 &{} 8 &{} 9 \\ \end{bmatrix}, ~U^{(2)}= \begin{bmatrix} 1 &{} 1 &{} 1 &{} 1 \\ \frac{1}{2} &{} 1 &{} 3 &{} \frac{1}{3} \\ 1 &{} 1 &{} 3 &{} 1 \\ 1 &{} 4 &{} 1 &{} \frac{2}{3} \\ \end{bmatrix}, ~U^{(3)}= \begin{bmatrix} 1 &{} 1 &{} 1 &{} 1 \\ 2 &{} 1 &{} 1 &{} 2 \\ 1 &{} \frac{2}{3} &{} \frac{3}{4} &{} 3 \\ \end{bmatrix}. \end{aligned}$$

Example 3

Consider the tensor in \(\mathbb {C}^{5 \times 4 \times 3 \times 3}\)

$$\begin{aligned} {\mathcal {F}}= V^{(1)} \circ V^{(2)} \circ V^{(3)} \circ V^{(4)}, \end{aligned}$$

where the matrices \(V^{(i)}\) are

$$\begin{aligned} V^{(1)}=\begin{bmatrix} 10 &{} 5 &{}-9 &{} -5 &{} 7\\ 8 &{} 6 &{} -3 &{} -9 &{} 7\\ -9 &{} -1 &{} 7 &{} -3 &{} -1\\ 9 &{} -7 &{} -8 &{} 8 &{} -5\\ -1 &{} 10 &{} 7 &{} -3 &{} 10 \\ \end{bmatrix}, \quad V^{(2)}= \begin{bmatrix} -1 &{} 9 &{} -8 &{} 8 &{} 2\\ 0 &{} -1 &{} -4 &{} 6 &{} 8\\ 7 &{} -7 &{} -2 &{} 2 &{} 10\\ 2 &{} 10 &{} -3 &{} -1 &{} -3\\ \end{bmatrix},\\ V^{(3)}= \begin{bmatrix} 5 &{} 2 &{} -2 &{} -7 &{} 3\\ 9 &{} -3 &{} -7 &{} 7 &{} -2\\ 0 &{} -10 &{} 10 &{} 6 &{} 10\\ \end{bmatrix}, \quad V^{(4)}= \begin{bmatrix} 8 &{} 2 &{} -7 &{} 10 &{} -5 \\ 4 &{} -8 &{} 4 &{} -6 &{} -10\\ 5 &{} 0 &{} 7 &{} -1 &{} -2\\ \end{bmatrix}. \end{aligned}$$

By Lemma 1, the estimated rank is \(r=5\).

Applying Algorithm 1 with \(r=5\), we get the rank-5 tensor decomposition \(\mathcal {F}= U^{(1)} \circ U^{(2)} \circ U^{(3)} \circ U^{(4)}\), where the computed matrices \(U^{(i)}\) are

$$\begin{aligned}{} & {} U^{(1)}=\begin{bmatrix} -400 &{} 180 &{} 100\,8 &{} 280\,0 &{} -210 \\ -320 &{} 216 &{} 336 &{} 504\,0 &{} -210 \\ 360 &{} -36 &{} -784 &{} 168\,0 &{} 30 \\ -360 &{} -252 &{} 896 &{} -448\,0 &{} 150 \\ 40 &{} 360 &{} -784 &{} 168\,0 &{} -300 \\ \end{bmatrix}, \quad U^{(2)}=\begin{bmatrix} 1 &{} 1 &{} 1 &{} 1 &{} 1 \\ 0 &{} -\frac{1}{9} &{} \frac{1}{2} &{} \frac{3}{4} &{} 4 \\ -7 &{} -\frac{7}{9} &{} \frac{1}{4} &{} \frac{1}{4} &{} 5 \\ -2 &{} \frac{10}{9} &{} \frac{3}{8} &{} -\frac{1}{8} &{} -\frac{3}{2} \\ \end{bmatrix},\\{} & {} \quad U^{(3)}=\begin{bmatrix} 1 &{} 1 &{} 1 &{} 1 &{} 1 \\ \frac{9}{5} &{} -\frac{3}{2} &{} \frac{7}{2} &{} -1 &{} -\frac{2}{3} \\ 0 &{} -5 &{} -5 &{} -\frac{6}{7} &{} \frac{10}{3} \\ \end{bmatrix},\\{} & {} \quad U^{(4)}=\begin{bmatrix} 1 &{} 1 &{} 1 &{} 1 &{} 1 \\ \frac{1}{2} &{} -4 &{} -\frac{4}{7} &{} -\frac{3}{5} &{} 2 \\ \frac{5}{8} &{} 0 &{} -1 &{} -\frac{1}{10} &{} \frac{2}{5} \\ \end{bmatrix}. \end{aligned}$$

Example 4

Consider the tensor \({\mathcal {F}} \in \mathbb {C}^{5\times 5 \times 4}\) such that

$$\begin{aligned} {\mathcal {F}}_{i_1,i_2,i_3}=i_1+\frac{i_2}{2}+\frac{i_3}{3}+\sqrt{i_1^2+i_2^2+i_3^2} \end{aligned}$$

for all \(i_1,i_2,i_3\) in the corresponding range. The 5 biggest singular values of the flattening matrix \(\hbox {Flat}({\mathcal {F}})\) are

$$\begin{aligned} 109.7393,~~ 5.2500,~~ 0.1068,~~ 8.325 \times 10^{-3},~~ 3.401 \times 10^{-4}. \end{aligned}$$

Applying Algorithm 3 with rank \(r=2,3,4,5\), we get the approximation errors

$$\begin{aligned} \begin{array}{l|cccc} r & 2 & 3 & 4 & 5 \\ \hline \Vert {\mathcal {F}}-\mathcal {X}^{gp} \Vert & 5.1237\times 10^{-1} & 6.8647\times 10^{-2} & 1.0558\times 10^{-2} & 9.9449\times 10^{-3} \\ \Vert {\mathcal {F}}-\mathcal {X}^{opt} \Vert & 1.5410\times 10^{-1} & 1.3754\times 10^{-2} & 2.6625\times 10^{-3} & 4.9002\times 10^{-4} \end{array} \end{aligned}$$

For the case \(r=3\), the approximating tensor computed by Algorithm 3 and then refined by solving (46) is \(U^{(1)} \circ U^{(2)} \circ U^{(3)}\), with

$$\begin{aligned} U^{(1)}=&\begin{bmatrix} -0.497\,3 &{} -7.681\,3 &{} 11.746\,5 \\ -0.252\,5 &{} -6.965\,1 &{} 12.497\,0 \\ -0.087\,2 &{} -6.049\,7 &{} 13.285\,8 \\ -0.013\,2 &{} -5.052\,1 &{} 14.142\,3 \\ -0.001\,0 &{} -4.046\,9 &{} 15.077\,1 \\ \end{bmatrix},\\ U^{(2)}&= \begin{bmatrix} 1.000\,0 &{} 1.000\,0 &{} 1.000\,0 \\ 0.505\,8 &{} 0.921\,1 &{} 1.030\,6 \\ 0.171\,3 &{} 0.816\,7 &{} 1.064\,9 \\ 0.026\,2 &{} 0.700\,3 &{} 1.104\,2 \\ 0.013\,6 &{} 0.580\,7 &{} 1.149\,0 \\ \end{bmatrix}, \\ U^{(3)}=&\begin{bmatrix} 1.000\,0 &{} 1.000\,0 &{} 1.000\,0 \\ 0.507\,5 &{} 0.92\,8\,9 &{} 1.021\,6 \\ 0.175\,6 &{} 0.832\,3 &{} 1.046\,9 \\ 0.039\,9 &{} 0.723\,1 &{} 1.077\,1 \\ \end{bmatrix}. \end{aligned}$$

Example 5

Consider the tensor \({\mathcal {F}} \in \mathbb {C}^{6\times 6 \times 6 \times 5 \times 4}\) such that

$$\begin{aligned} {\mathcal {F}}_{i_1,i_2,i_3,i_4,i_5}= \ \hbox {arctan}(i_1 + 2i_2 + 3i_3 + 4i_4 + 5i_5), \end{aligned}$$

for all \(i_1,i_2,i_3,i_4,i_5\) in the corresponding range. The 5 biggest singular values of the flattening matrix \(\hbox {Flat}({\mathcal {F}})\) are

$$\begin{aligned} 101.71,~~ 7.7529\times 10^{-2},~~ 2.2870\times 10^{-3},~~ 7.2294\times 10^{-5},~~ 2.0633\times 10^{-6}. \end{aligned}$$

Applying Algorithm 3 with rank \(r=2,3,4,5\), we get the approximation errors as follows:

$$\begin{aligned} \begin{array}{l|cccc} r & 2 & 3 & 4 & 5 \\ \hline \Vert {\mathcal {F}}-\mathcal {X}^{gp} \Vert & 9.8148 \times 10^{-3} & 3.1987 \times 10^{-3} & 5.7945\times 10^{-3} & 1.0121 \times 10^{-5} \\ \Vert {\mathcal {F}}-\mathcal {X}^{opt} \Vert & 5.3111 \times 10^{-3} & 2.2623\times 10^{-4} & 3.0889 \times 10^{-5} & 1.7523 \times 10^{-6} \end{array} \end{aligned}$$

For the case \(r=3\), the approximating tensor computed by Algorithm 3 and then refined by solving (46) is \(U^{(1)} \circ U^{(2)} \circ U^{(3)} \circ U^{(4)} \circ U^{(5)}\), with

$$\begin{aligned} U^{(1)}=&\begin{bmatrix} -0.013\,4 &{} -0.034\,7 &{} 1.552\,4 \\ -0.011\,2 &{} -0.032\,9 &{} 1.552\,5 \\ -0.009\,4 &{} -0.031\,2 &{} 1.552\,6 \\ -0.007\,9 &{} -0.029\,5 &{} 1.552\,7 \\ -0.006\,6 &{} -0.028\,0 &{} 1.552\,8 \\ -0.005\,6 &{} -0.026\,5 &{} 1.552\,9 \\ \end{bmatrix}, U^{(2)}&=\begin{bmatrix} 1.000\,0 &{} 1.000\,0 &{} 1.000\,0 \\ 0.701\,1 &{} 0.899\,2 &{} 1.000\,1 \\ 0.493\,9 &{} 0.808\,0 &{} 1.000\,3 \\ 0.348\,5 &{} 0.726\,0 &{} 1.000\,4 \\ 0.245\,9 &{} 0.652\,3 &{} 1.000\,6 \\ 0.173\,4 &{} 0.586\,1 &{} 1.000\,7 \\ \end{bmatrix},\\ U^{(3)}=&\begin{bmatrix} 1.000\,0 &{} 1.000\,0 &{} 1.000\,0 \\ 0.588\,6 &{} 0.852\,3 &{} 1.000\,2 \\ 0.349\,0 &{} 0.725\,8 &{} 1.000\,4 \\ 0.206\,4 &{} 0.618\,3 &{} 1.000\,6 \\ 0.121\,4 &{} 0.526\,9 &{} 1.000\,8 \\ 0.071\,5 &{} 0.448\,9 &{} 1.001\,1 \\ \end{bmatrix}, U^{(4)}&=\begin{bmatrix} 1.000\,0 &{} 1.000\,0 &{} 1.000\,0 \\ 0.494\,9 &{} 0.807\,8 &{} 1.000\,3 \\ 0.246\,3 &{} 0.652\,1 &{} 1.000\,6 \\ 0.121\,1 &{} 0.526\,9 &{} 1.000\,8 \\ 0.059\,6 &{} 0.425\,6 &{} 1.001\,1 \\ \end{bmatrix}, \\ U^{(5)}=&\begin{bmatrix} 1.000\,0 &{} 1.000\,0 &{} 1.000\,0 \\ 0.416\,1 &{} 0.765\,6 &{} 1.000\,3 \\ 0.173\,0 &{} 0.586\,2 &{} 1.000\,7 \\ 0.071\,1 &{} 0.448\,9 &{} 1.001\,1 \\ \end{bmatrix}. \end{aligned}$$

Example 6

In Theorem 6, we have shown that if the tensor to be approximated is sufficiently close to a rank-r tensor, then the computed rank-r approximation \(\mathcal {X}^{gp}\) is quasi-optimal. It can be further improved to a better approximation \(\mathcal {X}^{opt}\) by solving nonlinear optimization (46). In this example, we explore the numerical performance of Algorithms 3 and 4 for computing low rank tensor approximations. For given dimensions \(n_{1}, \cdots , n_{m}\), we generate the tensor

$$\begin{aligned} \mathcal {R} \, = \, \sum _{s=1}^{r} u^{s, 1} \otimes u^{s, 2} \otimes \cdots \otimes u^{s, m}, \end{aligned}$$

where each \(u^{s, j} \in \mathbb {C}^{n_{j}}\) is a complex vector whose real and imaginary parts are generated randomly, obeying the Gaussian distribution. We perturb \(\mathcal {R}\) by another tensor \(\mathcal {E}\), whose entries are also generated with the Gaussian distribution. We scale the perturbing tensor \(\mathcal {E}\) to have a desired norm \(\epsilon \). The tensor \(\mathcal {F}\) is then generated as

$$\begin{aligned} {\mathcal {F}} \, = \, \mathcal {R}+\mathcal {E}. \end{aligned}$$

We choose \(\epsilon \) to be one of \(10^{-2}, 10^{-4}, 10^{-6}\), and use the relative errors

$$\begin{aligned} \rho \_\text {gp}= \frac{\left\| {\mathcal {F}}-\mathcal {X}^{\textrm{gp}}\right\| }{\Vert \mathcal {E}\Vert }, \quad \rho \_\text {opt}= \frac{\left\| {\mathcal {F}}-\mathcal {X}^{\textrm{opt}}\right\| }{\Vert \mathcal {E}\Vert } \end{aligned}$$

to measure the approximation quality of \(\mathcal {X}^{\text {gp}}\) and \(\mathcal {X}^{\text {opt}}\), respectively. For each case of \((n_1, \cdots , n_m)\), r, and \(\epsilon \), we generate 10 random instances of \(\mathcal {R}, {\mathcal {F}}, \mathcal {E}\). For the case \((n_1, \cdots , n_m) = (20,20,20,20,10)\), Algorithm 4 is used to compute \(\mathcal {X}^{\textrm{gp}}\); all other cases are solved by Algorithm 3. The computational results are reported in Table 1. For each case of \((n_{1}, \cdots , n_{m})\) and r, we list the median of the above relative errors and the average CPU time (in seconds), where \(\text {t}_\text {gp}\) and \(\text {t}_\text {opt}\) denote the average CPU time (in seconds) for Algorithm 3 or 4 and for solving (46), respectively.

Table 1 Computational performance of Algorithms 3 and 4 and of nonlinear optimization (46)

In the following, we give a comparison with the generalized eigenvalue decomposition (GEVD) method, which is a classical method for computing tensor decompositions when the rank \(r \leqslant n_2\). We refer to [45, 46] for work on the GEVD method. Consider a cubic order tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times n_2 \times n_3}\) with \(n_1 \geqslant n_2 \geqslant n_3\). Suppose \(\mathcal {F}= U^{(1)} \circ U^{(2)} \circ U^{(3)}\) is a rank-r decomposition and \(r \leqslant n_2\). Assume its first and second decomposing matrices \(U^{(1)}, U^{(2)}\) have full column rank and the third decomposing matrix \(U^{(3)}\) does not have collinear columns. Denote the slice matrices

$$\begin{aligned} F_1 \, :=\, {\mathcal {F}}_{1:r,1:r,1} ,\quad F_2 \, :=\, {\mathcal {F}}_{1:r,1:r,2}. \end{aligned}$$
(63)

One can show that

$$\begin{aligned} F_1=U^{(1)}_{1:r,:} \cdot \hbox {diag}(U^{(3)}_{1,:}) \cdot (U^{(2)}_{1:r,:})^{\top }, \quad F_2=U^{(1)}_{1:r,:} \cdot \hbox {diag}(U^{(3)}_{2,:}) \cdot (U^{(2)}_{1:r,:})^{\top }. \end{aligned}$$
(64)

This implies that the columns of \((U^{(1)}_{1:r,:})^{-\top }\) are generalized eigenvectors of the matrix pair \((F_1^{{\top }}, F_2^{{\top }})\). Consider the transformed tensor

$$\begin{aligned} \hat{{\mathcal {F}}} \, = \, (U^{(1)}_{1:r,:})^{-1}\times _1 {\mathcal {F}}_{1:r,:,:}. \end{aligned}$$
(65)

For each \(s=1,\cdots ,r\), the slice \(\hat{{\mathcal {F}}}_{s,:,:}=U^{(2)}_{:,s} \cdot (U^{(3)}_{:,s})^{\top }\) is a rank-1 matrix. The matrices \(U^{(2)}\), \(U^{(3)}\) can be obtained by computing rank-1 decompositions for the slices \(\hat{{\mathcal {F}}}_{s,:,:}\). After this is done, we can solve the linear system

$$\begin{aligned} U^{(1)} \circ U^{(2)} \circ U^{(3)} = \mathcal {F}\end{aligned}$$
(66)

to get the matrix \(U^{(1)}\). The following is the GEVD method for computing cubic order tensor decompositions when the rank \(r \leqslant n_2\).

Table 2 A comparison for the performance of Algorithms 1 and 5

Algorithm 5

(The GEVD method.) 

Input:

A tensor \(\mathcal {F}\in \mathbb {C}^{n_1 \times n_2 \times n_3}\) with the rank \(r \leqslant n_2\).

1.:

Formulate the tensor \(\hat{{\mathcal {F}}}\) as in (65).

2.:

For \(s=1,\cdots ,r,\) compute \(U^{(2)}_{:,s}\), \(U^{(3)}_{:,s}\) from the rank-1 decomposition of the matrix \(\hat{{\mathcal {F}}}_{s,:,:}\).

3.:

Solve linear system (66) to get \(U^{(1)}\).

Output:

The decomposing matrices \(U^{(1)},U^{(2)},U^{(3)}.\)
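A numpy sketch of Algorithm 5 for a random order-3 tensor of exact rank r is given below (sizes are hypothetical; the generalized eigenvectors are computed with scipy.linalg.eig and are determined only up to scaling and ordering, which does not affect the recovered decomposition).

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(8)
n1, n2, n3, r = 6, 5, 4, 3
randc = lambda *sh: rng.standard_normal(sh) + 1j * rng.standard_normal(sh)
U1, U2, U3 = randc(n1, r), randc(n2, r), randc(n3, r)
F = np.einsum('is,js,ks->ijk', U1, U2, U3)

# generalized eigenvectors of (F1^T, F2^T) give the columns of (U^{(1)}_{1:r,:})^{-T}
F1, F2 = F[:r, :r, 0], F[:r, :r, 1]
_, X = scipy.linalg.eig(F1.T, F2.T)

# transformed tensor (65): each slice is a rank-1 matrix
Fhat = np.einsum('si,ijk->sjk', X.T, F[:r])
V2 = np.zeros((n2, r), dtype=complex)
V3 = np.zeros((n3, r), dtype=complex)
for s in range(r):
    Us, Ss, Vhs = np.linalg.svd(Fhat[s])
    V2[:, s] = Ss[0] * Us[:, 0]          # rank-1 factors of the s-th slice
    V3[:, s] = Vhs[0, :]

# solve the linear system (66) for the first decomposing matrix
KR = np.einsum('js,ks->jks', V2, V3).reshape(n2 * n3, r)
V1 = np.linalg.lstsq(KR, F.reshape(n1, n2 * n3).T, rcond=None)[0].T
assert np.allclose(np.einsum('is,js,ks->ijk', V1, V2, V3), F)
```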

We compare the performance of Algorithm 1 and Algorithm 5 for randomly generated tensors with the rank \(r \leqslant n_2\). We generate \(\mathcal {F}= U^{(1)} \circ U^{(2)} \circ U^{(3)}\) such that each \(U^{(i)} \in \mathbb {C}^{n_i \times r}\). The entries of \(U^{(i)}\) are randomly generated complex numbers. Their real and imaginary parts are randomly generated, obeying the Gaussian distribution. For each case of \((n_1,\cdots ,n_m)\) and r, we generate 20 random instances of \(\mathcal {F}\). Algorithm 5 is implemented by the function \(cpd\_gevd\) in the software Tensorlab. All the tensor decompositions are computed correctly by both methods. The average CPU time (in seconds) for Algorithm 1 is denoted as time-gp, while the average CPU time for the GEVD method is denoted as time-gevd. The computational results are reported in Table 2. The numerical experiments show that Algorithm 1 is more computationally efficient than Algorithm 5.

7 Conclusions

This paper gives computational methods for low rank tensor decompositions and approximations. The proposed methods are based on generating polynomials. For a generic tensor of rank \(r\leqslant \textrm{min}(n_1,N_3)\), its tensor decomposition can be obtained by Algorithm 1. Under some general assumptions, we show that if a tensor is sufficiently close to a low rank one, then the low rank approximating tensor produced by Algorithm 3 is quasi-optimal. Numerical experiments are presented to show the efficiency of the proposed methods.