1 Introduction

It is well known that the minimal possible ratio between the spectral and Frobenius norms of a real \(n \times n\) matrix is \(1 / \sqrt{n}\), and that it is achieved exactly for matrices with identical singular values, that is, for multiples of orthogonal matrices. Since the spectral norm of a matrix measures the length of its best rank-one approximation, this statement has the geometric meaning that orthogonal matrices achieve the largest possible relative distance to the set of rank-one matrices. More generally, using the singular value decomposition, one can show that the minimal ratio between the spectral and Frobenius norms of a rank-\(k\) matrix is \(1/\sqrt{k}\) and is achieved when all nonzero singular values are equal.

There has been considerable interest in determining the minimal possible ratio between spectral norm \(\Vert A \Vert _\sigma\) and Frobenius norm \(\Vert A \Vert _F\) of an \(n_1 \times \dots \times n_d\) tensor A; see, e.g., [1, 8, 9, 11, 12, 13]. As in the matrix case, this ratio measures the distance of A to the set of rank-one tensors, and is hence of both theoretical and practical relevance in problems of low-rank approximation and entanglement. The precise relation between the spectral norm of A and its distance to rank-one tensors is as follows:

$$\begin{aligned} \min _{{{\,\mathrm{rank}\,}}B \le 1} \frac{\Vert A - B \Vert _F}{\Vert A \Vert _F} =\sqrt{ 1 - \frac{\Vert A \Vert ^2_\sigma }{\Vert A \Vert _F^2}}. \end{aligned}$$
(1)

Therefore, the minimal possible ratio \(\Vert A \Vert _\sigma / \Vert A \Vert _F\) that can be achieved is also called the best rank-one approximation ratio of the given tensor space [13]. By (1), it expresses the maximum relative distance of a tensor to the set of rank-one tensors.
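
As a quick numerical sanity check (not part of the argument), relation (1) can be verified in the matrix case, where a best rank-one approximation is available from the singular value decomposition; the following sketch, using numpy, is ours:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 7))

U, s, Vt = np.linalg.svd(A)
B = s[0] * np.outer(U[:, 0], Vt[0])         # best rank-one approximation
lhs = np.linalg.norm(A - B) / np.linalg.norm(A)
rhs = np.sqrt(1 - s[0]**2 / np.sum(s**2))   # sqrt(1 - ||A||_sigma^2 / ||A||_F^2)
print(np.isclose(lhs, rhs))                 # True
```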

Despite some recent progress achieved in the aforementioned references and others, determining the best rank-one approximation ratio for tensors remains a difficult problem in general and is largely open. One reason is the lack of a suitable analog of the singular value decomposition. Moreover, the best rank-one approximation ratio of tensors usually differs over the real and complex field, as well as for nonsymmetric and symmetric tensors of the same size.

The available results in the literature focus on the best rank-one approximation ratio in the full tensor space. As for matrices, it would, however, also be useful to estimate its value in dependence on the tensor rank. In this work, we take a first step in this direction. We determine the minimal ratio between the spectral and Frobenius norms of real rank-two tensors, and obtain that it is actually the same for symmetric and general tensors. Recall that for matrices this value equals \(1/\sqrt{2}\).

For tensors, one should also take into account that the set of tensors of rank at most two is not closed. Our main result is on symmetric tensors and reads as follows.

Theorem 1.1

Let A be a real symmetric tensor of order \(d\ge 3\) and rank at most two. Then,

$$\begin{aligned} \Vert A\Vert _\sigma >\left( 1 - \frac{1}{d} \right) ^{\frac{d-1}{2}}\Vert A\Vert _F \end{aligned}$$
(2)

and this bound is sharp. In particular,

$$\begin{aligned} \min _{\substack{A \ne 0\\ {{\,\mathrm{brank}\,}}A \le 2}} \frac{\Vert A \Vert _\sigma }{\Vert A\Vert _F} = \left( 1 - \frac{1}{d} \right) ^{\frac{d-1}{2}}, \end{aligned}$$

where \({{\,\mathrm{brank}\,}}\) denotes border rank, and the minimum is taken over real symmetric tensors. Up to orthogonal transformation and scaling, the minimum is achieved only for the tensor

$$\begin{aligned} W_d = \lim _{t \rightarrow 0}\frac{1}{t} \left[ (e_1+te_2)^d-e_1^d \right] = d e_1^{d-1} e_2. \end{aligned}$$

Here, \(e_1,e_2\) are two orthonormal vectors, \(u^d\) abbreviates \(u \otimes \dots \otimes u\) (d times), and \(u^{d-1}v\) denotes the symmetric part of \(u^{d-1} \otimes v\) (see below for notation).

The proof of Theorem 1.1 constitutes the main part of this work and is given in Sect. 2. The result, however, raises the question of whether the same bound holds for general nonsymmetric tensors of rank two. In Sect. 3, we show that the answer is affirmative by reducing the question to the symmetric case.

Theorem 1.2

Let A be a real \(n_1 \times \dots \times n_d\) tensor of rank at most two. Then,

$$\begin{aligned} \Vert A\Vert _\sigma >\left( 1 - \frac{1}{d} \right) ^{\frac{d-1}{2}}\Vert A\Vert _F \end{aligned}$$
(3)

and this bound is sharp. In particular, assuming \(n_i \ge 2\) for \(i=1,\dots ,d\),

$$\begin{aligned} \min _{\substack{A \ne 0\\ {{\,\mathrm{brank}\,}}A \le 2}} \frac{\Vert A \Vert _\sigma }{\Vert A\Vert _F} = \left( 1 - \frac{1}{d} \right) ^{\frac{d-1}{2}}, \end{aligned}$$

where \({{\,\mathrm{brank}\,}}\) denotes border rank, and the minimum is taken over real \(n_1 \times \dots \times n_d\) tensors.

Note that while for symmetric tensors the notions of rank and symmetric rank are not the same in general [14], they coincide for rank-two tensors; see, e.g., [15].

Due to relation (1), the theorems above are equivalent to the following statement on the maximum relative distance of a real rank-two tensor to the set of rank-one tensors.

Theorem 1.3

Let A be a real tensor of order \(d\ge 3\) and rank at most two. Then,

$$\begin{aligned} \min _{{{\,\mathrm{rank}\,}}B\le 1} \frac{\Vert A-B\Vert _F}{\Vert A\Vert _F}<\sqrt{ 1-\left( 1-\frac{1}{d}\right) ^{d-1}} \end{aligned}$$

and this bound is sharp for general as well as for symmetric tensors. Equality is achieved for the symmetric tensor \(W_d\) as above.

It is interesting to note that for \(d\rightarrow \infty\) our results imply

$$\begin{aligned} \min _{\substack{A \ne 0\\ {{\,\mathrm{brank}\,}}(A) \le 2}} \frac{\Vert A \Vert _\sigma }{\Vert A\Vert _F} \searrow \frac{1}{\sqrt{e}} \approx 0.6065 \end{aligned}$$

and

$$\begin{aligned} \max _{\substack{A \ne 0\\ {{\,\mathrm{brank}\,}}(A) \le 2}}\min _{{{\,\mathrm{rank}\,}}B\le 1} \frac{\Vert A-B\Vert _F}{\Vert A\Vert _F}\nearrow \sqrt{1-\frac{1}{e}}\approx 0.7951. \end{aligned}$$

In particular, both quantities are bounded independently of d.
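
Both limits are immediate from \((1-\frac{1}{d})^{d-1}\rightarrow e^{-1}\). A short numerical illustration (ours, assuming numpy) of the monotone convergence:

```python
import numpy as np

for d in (3, 5, 10, 100, 1000):
    ratio = (1 - 1/d) ** ((d - 1) / 2)          # minimal norm ratio
    dist = np.sqrt(1 - (1 - 1/d) ** (d - 1))    # maximal relative distance
    print(d, round(ratio, 4), round(dist, 4))
# ratio decreases to 1/sqrt(e) ~ 0.6065, dist increases to sqrt(1 - 1/e) ~ 0.7951
```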

1.1 Notation

We consider the subspace \({{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^n)\) of real symmetric \(n \times \dots \times n\) tensors \(A = [a_{i_1,\dots ,i_d}]\) of order d. It inherits the Euclidean inner product \(\langle A,B \rangle _F = \sum _{i_1,\dots ,i_d} a_{i_1 \dots i_d} b_{i_1\dots i_d}\) from the ambient space, which induces the Frobenius norm via

$$\begin{aligned} \Vert A \Vert _F^2 = \langle A, A \rangle _F. \end{aligned}$$

It will be convenient to introduce the notation

$$\begin{aligned} \pm u^d = \pm u \otimes \dots \otimes u \end{aligned}$$

for symmetric rank-one tensors, and similarly

$$\begin{aligned} u_1u_2\dots u_d=\frac{1}{d!}\sum _{\sigma \in \mathfrak {S}_d}u_{\sigma (1)}\otimes u_{\sigma (2)}\otimes \dots \otimes u_{\sigma (d)} \end{aligned}$$

for the symmetrization of a nonsymmetric rank-one tensor \(u_1 \otimes u_2 \otimes \dots \otimes u_d\). It equals the orthogonal projection of \(u_1 \otimes u_2 \otimes \dots \otimes u_d\) onto \({{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^n)\). Specifically, the notation \(u^k v^\ell\) denotes the symmetrization of the rank-one tensor \(u^{\otimes k} \otimes v^{\otimes \ell }\). For symmetric rank-one tensors \(u^d\) and \(v^d\), it holds that \(\langle u^d,v^d\rangle _F=\langle u,v\rangle ^d\) and, therefore, \(\Vert u^d\Vert _F=\Vert u\Vert ^d\).
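
The identity \(\langle u^d,v^d\rangle _F=\langle u,v\rangle ^d\) follows by expanding \(\sum _{i_1,\dots ,i_d} u_{i_1}v_{i_1}\cdots u_{i_d}v_{i_d}=(\sum _i u_iv_i)^d\), and is easy to confirm numerically; the helper `power` below is ours:

```python
import numpy as np
from functools import reduce

def power(w, d):
    # w^d = w ⊗ ... ⊗ w as a d-way array
    return reduce(np.multiply.outer, [w] * d)

rng = np.random.default_rng(1)
u, v, d = rng.standard_normal(3), rng.standard_normal(3), 4

lhs = np.sum(power(u, d) * power(v, d))      # <u^d, v^d>_F
print(np.isclose(lhs, np.dot(u, v) ** d))    # True
```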

To any symmetric tensor A, one associates a homogeneous polynomial

$$\begin{aligned} p_A(u) = \sum _{i_1,\dots ,i_d} a_{i_1 \dots i_d} u_{i_1} \dots u_{i_d}=\langle A, u^d\rangle _F. \end{aligned}$$

The spectral norm of A is then defined as

$$\begin{aligned} \Vert A \Vert _\sigma = \max _{u\ne 0}\frac{1}{\Vert u\Vert ^d}|\langle A,u^d \rangle _F|=\max _{u\ne 0} \frac{1}{\Vert u\Vert ^d}|p_A(u)|. \end{aligned}$$

Due to a result of Banach [2], this definition of spectral norm for symmetric tensors is consistent with the general one, which is given in (15). If w is a normalized maximizer of \(\frac{1}{\Vert w\Vert ^d}{\left|p_A(w) \right|}\), then \(\lambda w^d\) with \(\lambda = p_A(w)=\langle A,w^d\rangle _F\) is a best symmetric rank-one approximation of A in Frobenius norm, that is, it satisfies

$$\begin{aligned} \Vert A-\lambda w^d\Vert _F=\min _{u\in {{\,\mathrm{\mathbb {R}}\,}}^n,\mu \in {{\,\mathrm{\mathbb {R}}\,}}}\Vert A-\mu u^d\Vert _F, \end{aligned}$$

and vice versa.
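
The equivalence of maximizing \({\left|p_A \right|}\) and minimizing the rank-one residual is a consequence of the identity \(\Vert A-p_A(w)w^d\Vert _F^2=\Vert A\Vert _F^2-p_A(w)^2\) for unit vectors w. A small illustration (ours, with the helper `sym_power`) for a tensor in \({{\,\mathrm{Sym}\,}}_3({{\,\mathrm{\mathbb {R}}\,}}^2)\), scanning the unit circle:

```python
import numpy as np
from functools import reduce

def sym_power(w, d):
    return reduce(np.multiply.outer, [w] * d)

d = 3
A = sym_power(np.array([1.0, 0.0]), d) - 0.5 * sym_power(np.array([0.6, 0.8]), d)

th = np.linspace(0, 2 * np.pi, 20001)
p = np.array([np.sum(A * sym_power(np.array([np.cos(t), np.sin(t)]), d)) for t in th])
resid = np.sqrt(np.sum(A * A) - p**2)            # ||A - p_A(w) w^d||_F for unit w
print(np.max(np.abs(p)))                         # ~ ||A||_sigma
print(np.argmax(np.abs(p)) == np.argmin(resid))  # True: the same w does both
```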

A symmetric tensor of rank at most two takes the form

$$\begin{aligned} A = \alpha u^d -\beta v^d \end{aligned}$$

for vectors \(u,v\) and scalars \(\alpha ,\beta \ne 0\), and the rank is equal to two if and only if u and v are linearly independent. Note that the difference notation will turn out to be convenient later. Technically, this defines tensors of symmetric rank at most two. But since for rank two both notions of rank coincide [15], we can just use the word rank throughout. It is well known that the set of tensors of rank at most two is not closed [7]. This is also true when restricting to symmetric tensors. The tensors in the closure are said to have border rank at most two, denoted as \({{\,\mathrm{brank}\,}}A \le 2\).

2 Proof of the main result

For proving Theorem 1.1, we will determine the infimum value of the optimization problem

$$\begin{aligned} \inf _{\substack{\alpha , \beta \in {{\,\mathrm{\mathbb {R}}\,}}\\ \Vert u\Vert =\Vert v\Vert =1}} F(\alpha ,\beta ,u,v) = \frac{\Vert \alpha u^d - \beta v^d\Vert _\sigma ^2}{\Vert \alpha u^d-\beta v^d\Vert _F^2}. \end{aligned}$$
(4)

Here, we can always additionally assume that \(\langle u,v\rangle \ge 0\) and \(\alpha >0\). We will proceed in several steps. First, in Sect. 2.1, we verify that the tensor \(W_d\), which has symmetric border rank two, achieves equality in (2); hence the infimum in (4) cannot be larger than \((1-\frac{1}{d})^{d-1}\). We next consider in Sect. 2.2 the first-order necessary optimality condition for (4) and show that it cannot be fulfilled for rank-two tensors admitting a unique symmetric best rank-one approximation (Proposition 2.1). In other words, the potential candidates for achieving the infimum in (4) are rank-two tensors with more than one symmetric best rank-one approximation. In Sect. 2.3, we therefore derive a criterion for a symmetric rank-two tensor to have a unique symmetric best rank-one approximation (Proposition 2.3), and verify by hand in Sects. 2.4 and 2.5 that for tensors which do not satisfy this criterion the value of F is strictly larger than \((1-\frac{1}{d})^{d-1}\). It then remains to show in Sect. 2.6 that among the tensors of border rank two, and up to orthogonal transformation and scaling, only the tensor \(W_d\) achieves the infimum. Taken together, these steps provide a complete proof of Theorem 1.1.

In our proofs, we will frequently assume that \(\alpha u^d - \beta v^d\in {{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^2)\) since we can always restrict to \({{\,\mathrm{Sym}\,}}_d({{\,\mathrm{span}\,}}\{u,v\})\).

2.1 The ratio for tensor \(W_d\)

Recall that \(W_d= d e_1^{d-1}e_2=\frac{d}{dt}(e_1+te_2)^d|_{t=0}\). We have \(\Vert W_d\Vert _F^2=d\). The spectral norm is given by the following optimization problem:

$$\begin{aligned} \max \, d x^{d-1}y \quad \text {s.t.} \quad x^2+y^2=1. \end{aligned}$$

The KKT conditions for this problem lead to the relation

$$\begin{aligned} (d-1)x^{d-2}y^2-x^d=0, \end{aligned}$$

that is, either \(x=0\), or \(x^2=(d-1)y^2\). We find that \(x=\sqrt{\frac{d-1}{d}}\) and \(y=\frac{1}{\sqrt{d}}\) is a maximizer with the value \(\Vert W_d\Vert _\sigma =d \left( \frac{d-1}{d}\right) ^{(d-1)/2}\frac{1}{\sqrt{d}}\), and therefore

$$\begin{aligned} \frac{\Vert W_d\Vert ^2_\sigma }{\Vert W_d\Vert _F^2}=\left( 1-\frac{1}{d}\right) ^{d-1}. \end{aligned}$$
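
This computation is also easy to confirm numerically by forming \(W_d\) as the difference quotient from Theorem 1.1 with a small parameter t and scanning the circle; the sketch (ours, with the helper `sym_power`) assumes numpy:

```python
import numpy as np
from functools import reduce

def sym_power(w, d):
    return reduce(np.multiply.outer, [w] * d)

d, t = 5, 1e-6
e1, e2 = np.eye(2)
W = (sym_power(e1 + t * e2, d) - sym_power(e1, d)) / t   # ~ W_d

print(np.sum(W * W))                                     # ~ d = ||W_d||_F^2
th = np.linspace(0, np.pi, 20001)
p = [np.sum(W * sym_power(np.array([np.cos(a), np.sin(a)]), d)) for a in th]
print(np.max(np.abs(p))**2 / np.sum(W * W))              # ~ (1 - 1/d)^(d-1) = 0.4096
```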

2.2 Optimality condition for symmetric rank-two tensors

The target function in (4) can be written as a composition

$$\begin{aligned} F(\alpha ,\beta ,u,v) = G(\varphi (\alpha ,\beta ,u,v)) \end{aligned}$$

where

$$\begin{aligned} G :{{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^n) \rightarrow {{\,\mathrm{\mathbb {R}}\,}}, \quad G(A) = \frac{\Vert A \Vert _\sigma ^2}{\Vert A \Vert _F^2}, \end{aligned}$$

and

$$\begin{aligned} \varphi :{{\,\mathrm{\mathbb {R}}\,}}\times {{\,\mathrm{\mathbb {R}}\,}}\times {{\,\mathrm{\mathbb {R}}\,}}^n \times {{\,\mathrm{\mathbb {R}}\,}}^n \rightarrow {{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^n), \quad \varphi (\alpha ,\beta ,u,v) = \alpha u^d - \beta v^d. \end{aligned}$$

While \(\varphi\) is smooth, the map G is not differentiable at all points. However, it is the quotient of the convex function \(A\mapsto \Vert A\Vert _\sigma ^2\) and the smooth function \(A\mapsto \Vert A\Vert _F^2\). Therefore, the rules for generalized gradients of regular functions are applicable; see [5, Section 2.3]. It follows that the subdifferential of G at a point A can be computed using a quotient rule, which yields

$$\begin{aligned} \partial G(A) = \frac{2\Vert A\Vert _\sigma }{\Vert A\Vert _F^4} [ \partial (\Vert A\Vert _\sigma ^{})\Vert A\Vert _F^2-A\Vert A\Vert ^{}_\sigma ]. \end{aligned}$$

Here, \(\partial (\Vert A\Vert _\sigma ^{})\) denotes the subdifferential of the spectral norm in A. The derivative of \(\varphi\) equals

$$\begin{aligned} \varphi '(\alpha ,\beta ,u,v)[\delta \alpha , \delta \beta , \delta u, \delta v] = u^{d-1} (\alpha d\cdot \delta u +\delta \alpha \cdot u) - v^{d-1}(d \beta \cdot \delta v+\delta \beta \cdot v), \end{aligned}$$

which leads to

$$\begin{aligned} \begin{aligned}&\partial F(\alpha , \beta ,u,v)[\delta \alpha , \delta \beta , \delta u, \delta v]\\&=\frac{2\Vert A\Vert _\sigma }{\Vert A\Vert _F^4}\langle \partial (\Vert A\Vert _\sigma ^{})\Vert A\Vert _F^2-A\Vert A\Vert ^{}_\sigma ,u^{d-1} (\alpha d\cdot \delta u +\delta \alpha \cdot u) - v^{d-1}(d \beta \cdot \delta v+\delta \beta \cdot v ) \rangle _F \end{aligned} \end{aligned}$$
(5)

with \(A = \varphi (\alpha ,\beta ,u,v) = \alpha u^d - \beta v^d\). The subdifferential of the spectral norm can be characterized as

$$\begin{aligned} \partial (\Vert A\Vert _\sigma ) = {{\,\mathrm{conv}\,}}\,{{\,\mathrm{arg\,max}\,}}\{ \langle A, X \rangle _F :X\in {{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^n),\,{{\,\mathrm{rank}\,}}X = 1,\, \Vert X\Vert _F=1\}, \end{aligned}$$
(6)

see [4, Theorem 2.1] in general, and [1, Section 2.3] in particular. In words, \(\partial (\Vert A\Vert _\sigma )\) equals the convex hull of the normalized symmetric best rank-one approximations of A.

From (5) and (6), one concludes that the first-order optimality condition \(0\in \partial F(\alpha , \beta ,u,v)\) (see, e.g., [5, Proposition 2.3.2]) for problem (4) implies that there exists X in the convex set (6) such that

$$\begin{aligned} \langle X - \lambda A, u^{d-1} (\alpha d\cdot \delta u +\delta \alpha \cdot u) - v^{d-1}(d \beta \cdot \delta v+\delta \beta \cdot v ) \rangle _F = 0 \end{aligned}$$

for all \((\delta \alpha , \delta \beta ,\delta u, \delta v)\) and some \(\lambda \in {{\,\mathrm{\mathbb {R}}\,}}\). This is equivalent to requiring

$$\begin{aligned} \langle X - \lambda A, u^{d-1}\delta u+v^{d-1}\delta v \rangle _F = 0 \end{aligned}$$

for all \(\delta u\) and \(\delta v\). Let \(P_{u,v}\) denote the orthogonal projection onto the linear subspace \(\{u^{d-1}\delta u+v^{d-1}\delta v:\delta u,\delta v\in {{\,\mathrm{\mathbb {R}}\,}}^n\}\) of \({{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^n)\). Taking into account that \(P_{u,v} A = P_{u,v} (\alpha u^d-\beta v^d) = \alpha u^d-\beta v^d\), we conclude that the optimality condition can be written as

$$\begin{aligned} \lambda (\alpha u^d-\beta v^d)\in P_{u,v}{{\,\mathrm{conv}\,}}\, {{\,\mathrm{arg\,max}\,}}\{ \langle \alpha u^d- \beta v^d,X\rangle _F:X\in {{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^n),\,{{\,\mathrm{rank}\,}}X = 1,\, \Vert X\Vert _F=1\}. \end{aligned}$$
(7)

We now show that condition (7) cannot hold for tensors \(\alpha u^d -\beta v^d\) admitting a unique best symmetric rank-one approximation. This is an interesting analogy to the fact that matrices achieving a minimal ratio of spectral and Frobenius norm have equal singular values.

Proposition 2.1

Let \(A=\alpha u^d -\beta v^d\) have rank two. If A has a unique best symmetric rank-one approximation, then A is not a critical point of the optimization problem (4).

We use the following lemma that shows \(P_{u,v} w^d=au^{d-1}w+bv^{d-1}w\) for any \(w\in {{\,\mathrm{\mathbb {R}}\,}}^n\) with some \(a,b\in {{\,\mathrm{\mathbb {R}}\,}}\).

Lemma 2.2

Let \(\Vert u\Vert =\Vert v\Vert =1\). The projection \(P_{u,v}w^d\) is given by

$$\begin{aligned} \frac{1}{1-\langle u,v \rangle ^{2d-2}}\big [(\langle u,w \rangle ^{d-1}-\langle u,v \rangle ^{d-1}\langle v,w \rangle ^{d-1})u^{d-1} +(\langle v,w \rangle ^{d-1}-\langle u,v \rangle ^{d-1}\langle u,w \rangle ^{d-1})v^{d-1}\big ]w. \end{aligned}$$

Proof

This follows from the definition of orthogonal projection by a direct calculation. \(\square\)

Proof of Proposition 2.1

Let one of \(\pm w^d\) be the normalized best symmetric rank-one approximation of A. Since it is unique, the optimality condition becomes

$$\begin{aligned} \lambda (\alpha u^d-\beta v^d) = P_{u,v} w^d. \end{aligned}$$
(8)

From \(p_A(w)=\langle A, w^d\rangle _F\ne 0\) and \(A = \alpha u^d -\beta v^d\in \{u^{d-1}\delta u+v^{d-1}\delta v:\delta u,\delta v\in {{\,\mathrm{\mathbb {R}}\,}}^n\}\), we have \(P_{u,v}w^d\ne 0\), which excludes \(\lambda = 0\). By Lemma 2.2, \(P_{u,v}w^d= au^{d-1}w+bv^{d-1} w\) for some \(a,b\in {{\,\mathrm{\mathbb {R}}\,}}\). However, since u and v are linearly independent, we have the decomposition

$$\begin{aligned} \{u^{d-1}\delta u+v^{d-1}\delta v:\delta u,\delta v\in {{\,\mathrm{\mathbb {R}}\,}}^n\}=\{u^{d-1}\delta u:\delta u\in {{\,\mathrm{\mathbb {R}}\,}}^n\}\oplus \{v^{d-1}\delta v:\delta v\in {{\,\mathrm{\mathbb {R}}\,}}^n\} \end{aligned}$$

into two complementary subspaces. Therefore, (8) would only be possible if w were a multiple of both u and v, which contradicts the linear independence of u and v. \(\square\)

2.3 A condition for unique symmetric best rank-one approximation

We now present a class of symmetric rank-two tensors admitting unique best symmetric rank-one approximations. By the result of Proposition 2.1, these can then be excluded from the further discussion on the minimal norm ratio.

Proposition 2.3

Let

$$\begin{aligned} A=\alpha u^d-\beta v^d \end{aligned}$$

with \(u\ne v\), \(\Vert u\Vert =\Vert v\Vert =1\), \(\langle u,v\rangle \ge 0\) and \(\alpha> \beta >0\). Then, A has exactly one best symmetric rank-one approximation.

For the proof, we require auxiliary results. One is the following fact about polynomials.

Lemma 2.4

Let \(a, \gamma >0\) and \(b \ge 0\) and \(d\ge 2\). The equation \(x=\gamma (x-a)(x+b)^{d-1}\) has two real solutions if d is even, and three real solutions if d is odd.

Proof

Let \(p(x)=\gamma (x-a)(x+b)^{d-1}-x\). Then by the intermediate value theorem, p must have at least two real zeros, namely one in the interval \([-b,0]\) and another one in the interval \((a,\infty )\). On the other hand,

$$\begin{aligned} p'(x)= \gamma d(x+b)^{d-2}\left( x-\frac{(d-1)a-b}{d}\right) -1, \end{aligned}$$

has at most two sign changes, one at a value larger than \(\frac{(d-1)a-b}{d}\) and, if d is odd, another one at a value smaller than \(-b\). Therefore, p has at most three real zeros. The statement follows from the fact that the number of real zeros (counted with multiplicity) of a polynomial with real coefficients has the same parity as its degree. \(\square\)
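
A quick numerical spot check of the root count (ours, with generic parameter choices), using the companion-matrix roots in numpy:

```python
import numpy as np

def real_root_count(gamma, a, b, d):
    # p(x) = gamma*(x - a)*(x + b)^(d-1) - x
    p = gamma * np.poly1d([1.0, -a]) * np.poly1d([1.0, b]) ** (d - 1) \
        - np.poly1d([1.0, 0.0])
    return int(np.sum(np.abs(p.roots.imag) < 1e-9))

for d in (3, 4, 5, 6):
    print(d, real_root_count(gamma=0.7, a=1.3, b=0.4, d=d))
# prints 3 for odd d and 2 for even d, as the lemma asserts
```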

The second lemma narrows the possible locations of maximizers of the homogeneous form \({\left|p_A \right|}\).

Lemma 2.5

Under the assumptions of Proposition 2.3, let w be a maximizer of \({\left|p_A(w) \right|} = {\left|\langle \alpha u^d-\beta v^d,w^d \rangle _F \right|}\) subject to \(\Vert w\Vert =c>0\). Then, \({\left|\langle u,w\rangle \right|} \ge {\left|\langle v,w\rangle \right|}\).

Proof

Assume to the contrary that \({\left|\langle u,w\rangle \right|} < {\left|\langle v,w\rangle \right|}\) and without loss of generality \(\langle v,w\rangle >0\). Let Q be the symmetric orthogonal matrix mapping u to v and v to u (i.e., the Householder reflection \(Q = I - 2z z^T\) with \(z = (u-v)/\Vert u-v\Vert\)), and let \(\bar{w}=Qw\). Then, \(\langle u,w\rangle =\langle v,\bar{w}\rangle\) and \(\langle v,w\rangle =\langle u,\bar{w}\rangle\). By assumption, we then have

$$\begin{aligned} {\left|\langle \alpha u^d-\beta v^d,\bar{w}^d \rangle _F \right|}=\langle \alpha u^d-\beta v^d,\bar{w}^d \rangle _F. \end{aligned}$$

If \({\left|\langle \alpha u^d-\beta v^d,{w}^d \rangle _F \right|}=\langle \alpha u^d-\beta v^d,{w}^d\rangle _F\) , this yields \({\left|\langle \alpha u^d-\beta v^d,\bar{w}^d \rangle _F \right|}>{\left|\langle \alpha u^d-\beta v^d,{w}^d \rangle _F \right|}\) (by using \((\alpha +\beta )\langle v,w\rangle ^d>(\alpha +\beta )\langle u, w\rangle ^d\)) which contradicts the optimality of w. In the other case, \({\left|\langle \alpha u^d-\beta v^d,{w}^d \rangle _F \right|}= - \langle \alpha u^d-\beta v^d,{w}^d\rangle _F\), optimality implies \(\beta (\langle u,w\rangle ^d+\langle v,w\rangle ^d)>\alpha (\langle u,w\rangle ^d+\langle v,w\rangle ^d)\) which contradicts \(\alpha > \beta\). \(\square\)

We are now in the position to prove Proposition 2.3.

Proof of Proposition 2.3

We can assume that \(A \in {{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^2)\), so that \(u,v \in {{\,\mathrm{\mathbb {R}}\,}}^2\). Without loss of generality, since we can change coordinates, we can consider \(\alpha =1\), \(u=\begin{pmatrix}0\\ 1 \end{pmatrix}\) and \(\root d \of {\beta }v=\begin{pmatrix}a\\ b\end{pmatrix}\) with \(a>0\), \(b\ge 0\) (since \(\langle u,v\rangle \ge 0\)), and \(a^2+b^2<1\) (since \(\beta <\alpha =1\)). Writing \(w = \lambda \begin{pmatrix}x\\ y\end{pmatrix}\) for points on the unit circle, where \(\lambda > 0\) is a normalization constant, we then have

$$\begin{aligned} p_A(w) = \lambda ^d[y^d - (ax + b y)^d]. \end{aligned}$$
(9)

Critical points on the circle are characterized by the gradient \(\nabla p_A(w)\) being parallel to w, that is, by \(\langle w^\perp , \nabla p_A(w) \rangle = 0\) for \(w^\perp = \lambda \begin{pmatrix}y\\ -x\end{pmatrix}\), which means

$$\begin{aligned} y^{d-1}x-(bx-ay)(ax+by)^{d-1} = 0 \end{aligned}$$

independent of \(\lambda\). Note that \(y=0\) is not possible for a maximizer: if \(b>0\), this equation fails for \(y=0\) since \(a>0\), while if \(b=0\), a maximizer with \(y=0\) would satisfy \({\left|\langle u,w\rangle \right|}=0<{\left|\langle v,w\rangle \right|}\), contradicting Lemma 2.5. Recall that a symmetric best rank-one approximation of A is given as \(p_A(w)w^d\), where w maximizes \({\left|p_A(w) \right|}\) on the circle. Since \(p_A(-w)=(-1)^dp_A(w)\), in order to prove the assertion it suffices to show that \({\left|p_A(w) \right|}\) has exactly one maximizer w with \(y=1\). The optimality condition at such a w reduces to

$$\begin{aligned} x = (bx-a)(ax+b)^{d-1}. \end{aligned}$$
(10)

Hence, we only need to show that there is exactly one solution x of this equation corresponding to a global maximum of \({\left|p_A \right|}\) on the unit circle.

If \(y=1\), then \(p_A\) in (9) has a zero at \(x_0=\frac{1-b}{a}\). Then,

$$\begin{aligned} x_0=\frac{1-b}{a}>\frac{b-b^2-a^2}{a}= (bx_0-a)(ax_0+b)^{d-1}. \end{aligned}$$

This shows that (10) has at least one solution \(x^*>x_0\). We consider such a solution \(x^*\) such that the corresponding unit vector \(w=\lambda \begin{pmatrix} x^*\\ 1 \end{pmatrix}\) is a local maximum of \({\left|p_A \right|}\) on the unit circle. We have

$$\begin{aligned} {\left|\langle u,w\rangle \right|}=\lambda<\frac{\lambda }{\root d \of {\beta }}= \frac{\lambda }{\root d \of {\beta }}(ax_0+b) <\frac{\lambda }{\root d \of {\beta }}(ax^*+b) ={\left|\langle v,w\rangle \right|}. \end{aligned}$$

By Lemma 2.5, w is not a global maximum of \({\left|p_A \right|}\). If d is even, then by Lemma 2.4, equation (10) has exactly two solutions, and therefore only one corresponds to a global maximum. If d is odd, then by the same lemma, (10) has three solutions. Taking into account that \(p_A\) in (9) has only one zero for \(y=1\), one of these solutions corresponds to a local minimizer of \({\left|p_A \right|}\). Hence, there is only one global maximizer. \(\square\)

2.4 The case \(\alpha > 0\ge \beta\)

We show that \(\frac{\Vert \alpha u^d-\beta v^d\Vert _\sigma ^2}{\Vert \alpha u^d-\beta v^d\Vert _F^2} \ge \frac{1}{2}\) if \(\langle u^d,v^d\rangle _F\ge 0\) and \(\alpha > 0 \ge \beta\). This shows that for \(d> 2\) such tensors do not attain the infimum in (4) since \(\frac{1}{2}> \left( 1-\frac{1}{d}\right) ^{d-1}\). We formulate this statement without \(\alpha\) and \(\beta\) by removing the restriction \(\Vert u \Vert = \Vert v \Vert = 1\).

Proposition 2.6

Let \(u \ne v\) and \(\langle u,v \rangle \ge 0\). Then, \(\frac{\Vert u^d+v^d\Vert ^2_\sigma }{\Vert u^d+v^d\Vert ^2_F}\ge \frac{1}{2}\).

Proof

We can assume \(\Vert u\Vert \ge \Vert v\Vert\). Using that \(\Vert u^d+v^d\Vert _\sigma \ge \langle u^d+v^d,\frac{u^d}{\Vert u\Vert ^d}\rangle _F\), we have

$$\begin{aligned} \frac{\Vert u^d+v^d\Vert ^2_\sigma }{\Vert u^d+v^d\Vert ^2_F} \ge \frac{\Vert u\Vert ^{2d}+2\langle u,v\rangle ^d+\left( \frac{\langle u,v\rangle }{\Vert u\Vert }\right) ^{2d}}{\Vert u\Vert ^{2d}+2\langle u,v\rangle ^d+\Vert v\Vert ^{2d}} =1-\frac{\Vert v\Vert ^{2d}-\left( \frac{\langle u,v\rangle }{\Vert u\Vert }\right) ^{2d}}{\Vert u\Vert ^{2d}+2\langle u,v\rangle ^d+\Vert v\Vert ^{2d}}&\ge 1-\frac{\Vert v\Vert ^{2d}}{2\Vert v\Vert ^{2d}}=\frac{1}{2}, \end{aligned}$$

as asserted. \(\square\)
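
A randomized check of this bound (ours; the inputs are normalized so that the hypotheses \(\langle u,v\rangle \ge 0\) and \(\Vert u\Vert \ge \Vert v\Vert\) used in the proof hold):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
for _ in range(1000):
    u, v = rng.standard_normal(4), rng.standard_normal(4)
    if np.dot(u, v) < 0:
        v = -v                                  # enforce <u, v> >= 0
    if np.linalg.norm(v) > np.linalg.norm(u):
        u, v = v, u                             # test vector uses the larger factor
    c, nu, nv = np.dot(u, v), np.linalg.norm(u), np.linalg.norm(v)
    nF2 = nu**(2*d) + 2*c**d + nv**(2*d)        # ||u^d + v^d||_F^2
    s = nu**d + c**d / nu**d                    # <u^d + v^d, u^d / ||u||^d>_F
    assert s**2 / nF2 >= 0.5 - 1e-12
```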

2.5 The case \(\alpha =\beta >0\)

In this section, we verify by a direct calculation that the infimum in (4) is not attained for the difference of two rank-one tensors with the same norm, i.e., when \(\alpha = \beta\) in (4).

Proposition 2.7

Let \(u\ne v\), \(\Vert u\Vert =\Vert v\Vert \ne 0\), \(\langle u,v\rangle \ge 0\) and \(d\ge 3\). Then,

$$\begin{aligned} \frac{\Vert u^d-v^d\Vert _\sigma ^2}{\Vert u^d-v^d\Vert ^2_F}> \left( 1-\frac{1}{d}\right) ^{d-1}. \end{aligned}$$

We require the following version of Jensen’s inequality.

Lemma 2.8

Let \(f:[a,b]\rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) be convex and continuously differentiable. If \(a+b=a'+b'\) and \(a<a'<b'<b\), then

$$\begin{aligned} \frac{1}{b-a}\int _a^bf(x)\, dx\ge \frac{1}{b'-a'}\int _{a'}^{b'} f(x) \, dx\ge f\left( \frac{a+b}{2}\right) . \end{aligned}$$

The inequalities are strict if f is strictly convex.

Proof

Without loss of generality let \(a=-b\) and \(a'=-b'\). Then, using substitution, we have

$$\begin{aligned} \frac{1}{b}\int _{-b}^bf(x) \, dx&= \frac{1}{b'}\int _{-b'}^{b'} f\left( \frac{b}{b'}x\right) -f(x)+f(x)\,dx\\&=\frac{1}{b'}\int _{-b'}^{b'}f(x) \, dx+\frac{1}{b'}\int _0^{b'}\int _x^{\frac{bx}{b'}}f'(y)-f'(-y)\,dydx \ge \frac{1}{b'}\int _{-b'}^{b'}f(x) \, dx, \end{aligned}$$

by monotonicity of the derivative of a convex function. This shows the first of the asserted inequalities. The second inequality is just Jensen’s inequality, noting that \(\frac{a+b}{2}=\frac{a'+b'}{2}\). If f is strictly convex, then \(f'\) is strictly monotone and the inequalities are strict. \(\square\)
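
For instance, with the strictly convex function \(f(x)=x^4\) and the nested intervals \([-1,3]\supset [0.5,1.5]\) with equal midpoints, the three quantities in Lemma 2.8 can be compared directly; the snippet (ours) uses exact polynomial integration:

```python
import numpy as np

f = np.poly1d([1, 0, 0, 0, 0])           # f(x) = x^4, strictly convex
F = f.integ()                            # antiderivative x^5/5

a, b, ap, bp = -1.0, 3.0, 0.5, 1.5       # a < a' < b' < b with a + b = a' + b'
mean_ab = (F(b) - F(a)) / (b - a)        # 12.2
mean_apbp = (F(bp) - F(ap)) / (bp - ap)  # 1.5125
print(mean_ab > mean_apbp > f((a + b) / 2))   # True (midpoint value 1.0)
```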

Proof of Proposition 2.7

We can assume that \(A \in {{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^2)\), so that \(u,v \in {{\,\mathrm{\mathbb {R}}\,}}^2\). After rotation and rescaling, we have \(u=\left( \begin{array}{l}1\\ t \end{array}\right)\) and \(v=\left( \begin{array}{l}1\\ -t\end{array}\right)\) with \(t\in (0,1].\) Then,

$$\begin{aligned} \Vert u^d-v^d\Vert ^2_F=2(1+t^2)^d-2(1-t^2)^d=: g(t). \end{aligned}$$
(11)

First, we apply the estimate

$$\begin{aligned} \Vert u^d-v^d\Vert _\sigma \ge \left\langle u^d-v^d,\frac{u^d}{\Vert u\Vert ^d}\right\rangle _F=\frac{(1+t^2)^d-(1-t^2)^d}{\sqrt{1+t^2}^d}, \end{aligned}$$

which yields

$$\begin{aligned} \frac{\Vert u^d-v^d\Vert _\sigma ^2}{\Vert u^d-v^d\Vert _F^2}\ge \frac{(1+t^2)^d-(1-t^2)^d}{2(1+t^2)^d}=\frac{1}{2}\left( 1-\left( \frac{1-t^2}{1+t^2}\right) ^d\right) . \end{aligned}$$

The right-hand side is monotonically increasing in the interval (0, 1]. For \(t=\sqrt{\frac{1}{d-1}}\) it equals

$$\begin{aligned} \frac{1}{2}\left( 1-\left( \frac{d-2}{d}\right) ^d\right) =\frac{d^d-(d-2)^d}{2d^d}. \end{aligned}$$

This value is larger than \(\left( 1-\frac{1}{d}\right) ^{d-1}=\left( \frac{d-1}{d}\right) ^{d-1}\) since, using Lemma 2.8 with \(f(t)=t^{d-1}\), it holds that \(d^d-(d-2)^d> 2 d (d-1)^{d-1}\) for \(d\ge 3\). This shows that

$$\begin{aligned} \frac{\Vert u^d-v^d\Vert _\sigma ^2}{\Vert u^d-v^d\Vert _F^2}>\left( 1-\frac{1}{d}\right) ^{d-1} \end{aligned}$$

for all \(t\in \left[ \sqrt{\frac{1}{d-1}},1\right]\). It hence remains to verify this inequality for all \(t\in \left( 0,\sqrt{\frac{1}{d-1}}\right)\), which is a little bit more involved. The starting point is another lower bound for the spectral norm, namely

$$\begin{aligned} \Vert u^d-v^d\Vert _\sigma \ge \left\langle u^d-v^d, \begin{pmatrix}\sqrt{(d-1)/d}\\ 1/\sqrt{d}\end{pmatrix}^d\right\rangle _F=\frac{1}{\sqrt{d}^d}\Bigg (\left( \sqrt{d-1}+t\right) ^d-\left( \sqrt{d-1}-t\right) ^d\Bigg )=: h(t). \end{aligned}$$

Note that \(\frac{u^d-v^d}{\Vert u^d-v^d\Vert _F}\rightarrow \frac{W_d}{\Vert W_d\Vert _F}\) for \(t\rightarrow 0\). This can be seen by taking the limit of \(\frac{u^d-v^d}{t}\) and noting that \(g(t)=\Vert u^d-v^d\Vert ^2_F\) is of order \(t^2\) by (11). We therefore have

$$\begin{aligned} \lim _{t\rightarrow 0}\frac{h(t)^2}{g(t)}=\left\langle \frac{W_d}{\Vert W_d\Vert _F},\begin{pmatrix}\sqrt{(d-1)/d}\\ 1/\sqrt{d}\end{pmatrix}^d\right\rangle _F^2=\frac{\Vert W_d\Vert _\sigma ^2}{\Vert W_d\Vert ^2_F}=\left( 1 - \frac{1}{d}\right) ^{d-1}, \end{aligned}$$

where the second and third equalities are shown in Sect. 2.1. We now claim that

$$\begin{aligned} \frac{d}{dt}\frac{h(t)^2}{g(t)}>0\text { for } t\in \left( 0,\sqrt{\frac{1}{d-1}}\right) \end{aligned}$$

which then proves the assertion. This claim is equivalent to the positivity of

$$\begin{aligned}&\frac{\sqrt{d}^d}{4d}\big (2h'(t)g(t)-g'(t)h(t)\big ) \\&\quad = \Bigg [ \! \left( \sqrt{d-1}+t\right) ^{d-1}+\left( \sqrt{d-1}-t\right) ^{d-1}\Bigg ]\Bigg [(1+t^2)^d-(1-t^2)^d\Bigg ]\\&\quad -t\Bigg [\! \left( \sqrt{d-1}+t\right) ^{d}-\left( \sqrt{d-1}-t\right) ^{d}\Bigg ] \Bigg [(1+t^2)^{d-1}+(1-t^2)^{d-1}\Bigg ]. \end{aligned}$$

Elementary manipulations give

$$\begin{aligned}&\frac{\sqrt{d}^d}{4d}\big (2h'(t)g(t)-g'(t)h(t)\big )\nonumber \\&= \Bigg [\!\left( \sqrt{d-1}+t\right) ^{d-1} (1+t^2)^{d-1}-\left( \sqrt{d-1}-t\right) ^{d-1} (1-t^2)^{d-1}\Bigg ] \left( 1-t\sqrt{d-1}\right) \nonumber \\&\quad -\Bigg [\!\left( \sqrt{d-1}+t\right) ^{d-1} (1-t^2)^{d-1}-\left( \sqrt{d-1}-t\right) ^{d-1} (1+t^2)^{d-1}\Bigg ] \left( 1+t\sqrt{d-1}\right) \nonumber \\&\begin{aligned}&=\Bigg [\Big (\underbrace{\sqrt{d-1}+t+t^2\sqrt{d-1}+t^3}_{{}=: b}\Big )^{d-1}-\Big (\underbrace{\sqrt{d-1}-t-t^2\sqrt{d-1}+t^3}_{{}=: a}\Big )^{d-1}\Bigg ] \left( 1-t\sqrt{d-1}\right) \\&\quad -\Bigg [\Big (\underbrace{\sqrt{d-1}+t-t^2\sqrt{d-1}-t^3}_{=: b'}\Big )^{d-1}-\Big (\underbrace{\sqrt{d-1}-t+t^2\sqrt{d-1}-t^3}_{=: a'}\Big )^{d-1}\Bigg ] \left( 1+t\sqrt{d-1}\right) . \end{aligned} \end{aligned}$$
(12)

Note that for \(t\in \left( 0,\sqrt{\frac{1}{d-1}}\right)\) we have \(b> b'> a' > a\) and

$$\begin{aligned} b-a=2t\left( 1+t\sqrt{d-1}\right) ,\quad b'-a' =2t\left( 1-t\sqrt{d-1}\right) . \end{aligned}$$

Therefore, with \(f(x)=(d-1) x^{d-2}\), we can rewrite (12) as

$$\begin{aligned} \frac{1}{4d}\sqrt{d}^d\big (2h'(t)g(t)-g'(t)h(t)\big )=\frac{1}{2t}\left[ (b'-a')\int _a^b f(x) \, dx -(b-a) \int _{a'}^{b'} f(x) \, dx \right] . \end{aligned}$$

Moreover,

$$\begin{aligned} \frac{a+b}{2} =\sqrt{d-1}+t^3 > \sqrt{d-1}-t^3=\frac{a'+b'}{2}, \end{aligned}$$

and therefore \(a'':= \frac{a+b -(b'-a')}{2}>a'>a\) and \(b>b'':= \frac{a+b+(b'-a')}{2}>b'\). Since \({a''+b''}={a+b}\) and \(a''-b''=a'-b'\), Lemma 2.8 yields

$$\begin{aligned} (b'-a')\int _a^b f(x) \, dx \ge (b-a) \int _{a''}^{b''} f(x) \, dx >(b-a) \int _{a'}^{b'} f(x) \, dx, \end{aligned}$$

where the second inequality follows from monotonicity of f. This shows that (12) is positive. \(\square\)
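
Both estimates of this proof are easy to inspect numerically; the following sketch (ours, assuming numpy) evaluates the two lower bounds on the squared norm ratio over a grid in t and checks the claimed monotonicity of \(h(t)^2/g(t)\):

```python
import numpy as np

d = 6
t = np.linspace(1e-3, 1.0, 20000)
g = 2 * (1 + t**2)**d - 2 * (1 - t**2)**d                    # ||u^d - v^d||_F^2
b1 = ((1 + t**2)**d - (1 - t**2)**d) / np.sqrt(1 + t**2)**d  # first test vector
b2 = ((np.sqrt(d-1) + t)**d - (np.sqrt(d-1) - t)**d) / np.sqrt(d)**d  # h(t)
ratio = np.maximum(b1, b2)**2 / g
print(np.all(ratio > (1 - 1/d)**(d - 1)))                    # True
inc = np.diff(b2**2 / g)[t[:-1] < 1 / np.sqrt(d - 1)]
print(np.all(inc > 0))                # True: h^2/g increases on (0, 1/sqrt(d-1))
```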

2.6 Tensors of border rank two

We now consider tensors lying on the boundary of the set of symmetric rank-two tensors.

Proposition 2.9

Let A be a limit of symmetric rank-two tensors and \({{\,\mathrm{rank}\,}}A>2\). Then,

$$\begin{aligned} \frac{\Vert A\Vert _\sigma ^2}{\Vert A\Vert ^2_F}\ge \left( 1-\frac{1}{d}\right) ^{d-1}\!=\frac{\Vert W_d\Vert ^2_\sigma }{\Vert W_d\Vert ^2_F} \end{aligned}$$

and equality is attained if and only if \(A=u^{d-1}v\) for some orthogonal u and v, that is, for tensors arising from scaling and orthogonal transformations of the tensor \(W_d\).

The boundary of the set of rank-two tensors is well studied. We require the following well-known parametrization; see, e.g., [3]. We offer a self-contained proof for completeness.

Lemma 2.10

Let A be a limit of symmetric rank-two tensors and \({{\,\mathrm{rank}\,}}A>2\). Then, A is of the form

$$\begin{aligned} A =a u^d+bdu^{d-1}v \end{aligned}$$

with \(\langle u,v\rangle =0\) and \(\Vert u\Vert =\Vert v\Vert =1\).

Proof

Let \(A_n=u_n^d\pm v_n^d\) with \(\lim _{n\rightarrow \infty }A_n=A\) or \(\lim _{n\rightarrow \infty }A_n=-A\). It is not difficult to see that \(u_n\) and \(v_n\) must be unbounded since otherwise there is a subsequence of \(A_n\) converging to a tensor of rank at most two, contradicting \({{\,\mathrm{rank}\,}}A>2\). We write \(v_n=s_n u_n+t_n w_n\) with \(\Vert w_n\Vert =1\) and \(\langle u_n, w_n\rangle =0\). Then,

$$\begin{aligned} A_n=(1{\pm } s_n^d)u_n^d{\pm }\sum _{k=1}^d \left( {\begin{array}{c}d\\ k\end{array}}\right) s_n^{d-k} t_n^k u_n^{d-k}w_n^k, \end{aligned}$$

and it can be checked that all terms are pairwise orthogonal. Hence, since \(A_n\) converges, all terms must be bounded and by passing to a subsequence we can assume that all of them converge. Due to \(\Vert u_n\Vert \rightarrow \infty\) we have \(1\pm s_n^d\rightarrow 0\) for the first term, which implies that the sequence \(s_n\) is bounded. Therefore, considering the term \(k=1\), the sequence \(t_n\Vert u_n\Vert ^{d-1}\) is bounded which automatically implies \(t_n^k\Vert u_n\Vert ^{d-k}\rightarrow 0\) for all \(k>1\). We conclude that

$$\begin{aligned} \lim _{n\rightarrow \infty } A_n=\lim _{n\rightarrow \infty } ( 1\pm s_n^d) u_n^d+\lim _{n\rightarrow \infty }ds_n^{d-1}t_n u_n^{d-1}w_n = a u^d+bd u^{d-1} v \end{aligned}$$

which proves the assertion. \(\square\)

Proof of Proposition 2.9

Using Lemma 2.10, scaling and orthogonal transformations, we can assume \(A=a e_1 ^d+bd e_1^{d-1}e_2 \in {{\,\mathrm{Sym}\,}}_d({{\,\mathrm{\mathbb {R}}\,}}^2)\) with \(a,b\ge 0\). Then, \(\Vert A\Vert _F^2=a^2+b^2d\) since the tensors \(e_1^d\) and \(e_1^{d-1}e_2\) are orthogonal and \(\Vert de_1^{d-1}e_2\Vert _F^2=d\). We have the following two lower bounds for the spectral norm:

$$\begin{aligned} \Vert A\Vert _\sigma \ge \left\langle ae_1^d+bde_1^{d-1}e_2 ,\frac{1}{\sqrt{d}^d}\begin{pmatrix}\sqrt{d-1} \\ 1\end{pmatrix}^d \right\rangle _F =\frac{1}{\sqrt{d}^d}\left( a\sqrt{d-1}^d+bd\sqrt{d-1}^{d-1}\right) \end{aligned}$$
(13)

and

$$\begin{aligned} \Vert A\Vert _\sigma \ge \left\langle ae_1^d+bde_1^{d-1}e_2 ,e_1^d\right\rangle _F = a. \end{aligned}$$
(14)

We can restrict to tensors A with Frobenius norm \(\Vert A\Vert ^2_F=a^2+b^2d=1\) and need to show that

$$\begin{aligned} \Vert A\Vert _\sigma >\left( 1-\frac{1}{d}\right) ^{\frac{d-1}{2}} \end{aligned}$$

whenever \(a>0\). The first lower bound (13) implies that this is true whenever \(b> \frac{\sqrt{d}-a\sqrt{d-1}}{d}\). Together with \(1=a^2+b^2d\) and \(a,b\ge 0\) this verifies the claim for \(0< a< \frac{2\sqrt{d(d-1)}}{2d-1}\). If \(a\ge \frac{2\sqrt{d(d-1)}}{2d-1}\), then the second lower bound (14) yields the desired estimate

$$\begin{aligned} \Vert A\Vert _\sigma ^2\ge a^2\ge \left( \frac{2\sqrt{d(d-1)}}{2d-1}\right) ^2>\frac{d-1}{d}>\left( 1-\frac{1}{d}\right) ^{d-1} \end{aligned}$$

for \(d\ge 3\). \(\square\)

This concludes the proof of Theorem 1.1.

3 Approximation ratio for nonsymmetric rank-two tensors

Recall that the spectral norm for general \(n_1 \times \dots \times n_d\) tensors is defined as

$$\begin{aligned} \Vert A \Vert _\sigma = \max _{\Vert u_1 \Vert = \dots = \Vert u_d \Vert = 1 }\langle A, u_1 \otimes \dots \otimes u_d \rangle _F. \end{aligned}$$
(15)
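
Definition (15) is a maximization over a product of spheres; in practice it is often evaluated approximately by alternating maximization over the factors (a higher-order power iteration), which may only find a local maximizer. A minimal sketch (ours, assuming numpy; the function `spectral_norm_als` is not an algorithm from this paper):

```python
import numpy as np

def spectral_norm_als(A, iters=200, seed=0):
    # alternately maximize <A, u_1 ⊗ ... ⊗ u_d> over one unit factor at a time;
    # heuristic: the returned value is a lower bound on ||A||_sigma
    rng = np.random.default_rng(seed)
    us = [w / np.linalg.norm(w) for w in (rng.standard_normal(n) for n in A.shape)]
    for _ in range(iters):
        for k in range(A.ndim):
            T = A
            for j in range(A.ndim - 1, -1, -1):   # contract all factors except k
                if j != k:
                    T = np.tensordot(T, us[j], axes=(j, 0))
            us[k] = T / np.linalg.norm(T)
    T = A
    for j in range(A.ndim - 1, -1, -1):
        T = np.tensordot(T, us[j], axes=(j, 0))
    return float(T)

W3 = np.zeros((2, 2, 2))                 # W_3 = d e_1^{d-1} e_2 for d = 3
W3[1, 0, 0] = W3[0, 1, 0] = W3[0, 0, 1] = 1.0
print(spectral_norm_als(W3))             # ~ 2/sqrt(3) = 1.1547, cf. Sect. 2.1
```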

The result for symmetric tensors raises the question whether the inequality

$$\begin{aligned} \Vert A\Vert _\sigma >\left( 1-\frac{1}{d}\right) ^{\frac{d-1}{2}}\Vert A\Vert _F \end{aligned}$$

is also true for general real tensors of order \(d \ge 3\) and rank at most two. As stated in Theorem 1.2, the answer is indeed affirmative and a consequence of the following interesting fact.

Proposition 3.1

Let A be a real \(n_1 \times \dots \times n_d\) tensor of rank at most two. Then there is a symmetric tensor \(A_\mathsf S\in {{\,\mathrm{Sym}\,}}_d ({{\,\mathrm{\mathbb {R}}\,}}^2)\) of rank at most two with \(\Vert A\Vert _F=\Vert A_{\mathsf {S}}\Vert _F\) and \(\Vert A\Vert _\sigma \ge \Vert A_{\mathsf {S}}\Vert _\sigma\).

For the proof, we will require two lemmas. The first is on the behavior of successively taking geometric means of positive real numbers, and the second on the relation of Frobenius and spectral norm of two particular \(2\times 2\) matrices.

Lemma 3.2

Let \(x,z\ge 0\), \(k>1\), and define the sequence

$$\begin{aligned} y_0=x,\quad y_1=\left( x^{k-1}z\right) ^{\frac{1}{k}}, \quad y_{\ell +2}=\left( y_{\ell +1}^{k-1}y_{\ell }^{}\right) ^{\frac{1}{k}}. \end{aligned}$$

Then, \(\lim _{\ell \rightarrow \infty } y_\ell =\left( x^k z\right) ^{\frac{1}{k+1}}\).

Proof

We may assume \(x,z> 0\), otherwise the result follows immediately. We show via induction that

$$\begin{aligned} y_\ell =\left( x^{{k^{\ell +1}+(-1)^{\ell }}}z^{{k^{\ell }+(-1)^{\ell -1}}}\right) ^{\frac{1}{k^{\ell }(k+1)}}. \end{aligned}$$
(16)

The cases \(\ell =0\) and \(\ell =1\) follow directly. Now let (16) be true for \(1,\ldots ,\ell +1\). Then,

$$\begin{aligned} y_{\ell +2}&=\left( y_{\ell +1}^{k-1}y_\ell ^{}\right) ^{\frac{1}{k}} =x^{\left( \frac{(k-1)\left( k^{\ell +2}+(-1)^{\ell +1}\right) }{k^{\ell +2}(k+1)}+\frac{k^{\ell +1}+(-1)^{\ell }}{k^{\ell +1}(k+1)}\right) } z^{\left( \frac{(k-1)\left( k^{\ell +1}+(-1)^\ell \right) }{k^{\ell +1}(k+1)}+\frac{k^\ell +(-1)^{\ell -1}}{k^\ell (k+1)}\right) } \\&=\left( x^{{k^{\ell +3}+(-1)^{\ell +2}}} z^{{k^{\ell +2}+(-1)^{\ell +1}}}\right) ^{\frac{1}{k^{\ell +2}(k+1)}}, \end{aligned}$$

proving (16). Taking the limit \(\ell \rightarrow \infty\) gives the result. \(\square\)
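
The recursion is quickly confirmed numerically; the sketch (ours) iterates the definition and compares with the claimed limit:

```python
x, z, k = 2.0, 5.0, 3
y_prev, y = x, (x**(k - 1) * z)**(1 / k)
for _ in range(60):
    y_prev, y = y, (y**(k - 1) * y_prev)**(1 / k)
print(y, (x**k * z)**(1 / (k + 1)))   # both ~ 2.51487 = (x^k z)^(1/(k+1))
```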

Lemma 3.3

Let \(a,b \in {{\,\mathrm{\mathbb {R}}\,}}\) and \(0\le x_1,x_2\le 1\). Define the matrices

$$\begin{aligned} S=\begin{pmatrix} a+bx_1x_2 &{}b\sqrt{x_1x_2-x_1^2x_2^2} \\ b\sqrt{x_1x_2-x_1^2x_2^2} &{} b(1-x_1x_2) \end{pmatrix} \quad \! \text {and}\quad \! T=\begin{pmatrix} a+bx_1x_2 &{}bx_1\sqrt{1-x_2^2} \\ bx_2\sqrt{1-x_1^2} &{} b\sqrt{(1-x_1^2)(1-x_2^2)} \end{pmatrix}. \end{aligned}$$

Then, \(\Vert S\Vert _F=\Vert T\Vert _F\) and \(\Vert S\Vert _\sigma \le \Vert T\Vert _\sigma\).

Proof

A direct calculation shows that \(\Vert S\Vert _F=\Vert T\Vert _F\). The singular values of \(2\times 2\) matrices are given by \(\sigma _{1,2}^2={F^2}/{2}\pm \sqrt{{F^4}/{4}-{\left|D \right|}^2}\), where F is the Frobenius norm and D is the determinant of the matrix. We have

$$\begin{aligned} {\left|\det S \right|}^2=a^2b^2(1-2x_1x_2+x_1^2x_2^2) \quad \text {and}\quad {\left|\det T \right|}^2=a^2b^2(1-x_1^2-x_2^2+x_1^2x_2^2). \end{aligned}$$

Since \(2x_1x_2\le x_1^2+x_2^2\) implies \({\left|\det T \right|}^2 \le {\left|\det S \right|}^2\), the largest singular value of T, which equals its spectral norm, is greater than or equal to the largest singular value of S. \(\square\)
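
A randomized check of the lemma (ours, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(1000):
    a, b = rng.standard_normal(2)
    x1, x2 = rng.uniform(0, 1, 2)
    r = np.sqrt(x1 * x2 - (x1 * x2)**2)
    S = np.array([[a + b*x1*x2, b*r],
                  [b*r,         b*(1 - x1*x2)]])
    T = np.array([[a + b*x1*x2,             b*x1*np.sqrt(1 - x2**2)],
                  [b*x2*np.sqrt(1 - x1**2), b*np.sqrt((1 - x1**2)*(1 - x2**2))]])
    assert np.isclose(np.linalg.norm(S), np.linalg.norm(T))      # Frobenius norms
    assert np.linalg.norm(S, 2) <= np.linalg.norm(T, 2) + 1e-12  # spectral norms
```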

Proof of Proposition 3.1

Write \(A=\alpha U + \beta V\) where \(U=u_1 \otimes \dots \otimes u_d\) and \(V=v_1 \otimes \dots \otimes v_d\) with \(\Vert u_i\Vert =\Vert v_i\Vert =1\). Then, \(\Vert A\Vert _F^2=\alpha ^2 + 2\alpha \beta \langle U, V\rangle _F+\beta ^2\). We may assume that \(u_i,v_i\in {{\,\mathrm{\mathbb {R}}\,}}^2\) and after an orthogonal change of bases and possibly changing sign of \(\beta\), we may also assume that

$$\begin{aligned} u_i=e_1, \quad v^{}_i=x_i^{}e^{}_1+ \sqrt{1-x_i^2}e^{}_2 \quad \text {with }\quad 0\le x_i\le 1. \end{aligned}$$

Our goal is to show that replacing any k factors \(v_{i_1},\ldots ,v_{i_k}\) of V with the same unit norm vector v defined by

$$\begin{aligned} v=x e_1 +\sqrt{1-x^2} e_2 \quad \text {with} \quad x= \Bigg (\prod _{j=1}^k x_{i_j}\Bigg )^{1/k} \end{aligned}$$

leads to a tensor with the same Frobenius norm and a spectral norm that is no larger. Since Frobenius and spectral norm are invariant under permutation of tensor factors, it suffices to prove this for the case that the first k vectors \(v_1,\dots ,v_k\) are replaced in this way. The resulting tensor is denoted by \(A_k=\alpha U+\beta V_k\) with \(V_k= v \otimes \dots \otimes v \otimes v_{k+1} \otimes \dots \otimes v_d\) and since

$$\begin{aligned} \langle U, V_k\rangle _F=\prod _{i=1}^k \langle u_i,v\rangle \prod _{i=k+1}^d \langle u_i,v_i \rangle =x^k \prod _{i=k+1}^d x_i=\prod _{i=1}^d x_i=\prod _{i=1}^d \langle u_i,v_i \rangle =\langle U, V\rangle _F, \end{aligned}$$

the Frobenius norms of A and \(A_k\) indeed coincide. In the remainder of the proof, we show by induction that the spectral norm does not increase with k, i.e., \(\Vert A_{k+1}\Vert _\sigma \le \Vert A_{k}\Vert _\sigma \le \Vert A\Vert _\sigma\). For \(k=d\), this provides a symmetric tensor with the desired properties.

We start with \(k=2\). Let \(w_1,\ldots , w_d\) be the maximizers in

$$\begin{aligned} \max _{\Vert w_1\Vert =\cdots =\Vert w_d\Vert =1}\langle A_2,w_1 \otimes \dots \otimes w_d\rangle _F=\Vert A_2\Vert _\sigma . \end{aligned}$$

Let \(a=\alpha \prod _{i=3}^d\langle u_i,w_i \rangle\), \(b=\beta \prod _{i=3}^d\langle v_i,w_i \rangle\), and consider the matrices

$$\begin{aligned} T=a e_1^{} e_1^{T}+ b v_1^{}v_2^{T}=\begin{pmatrix} a+bx_1x_2 &{}bx_1\sqrt{1-x_2^2} \\ bx_2\sqrt{1-x_1^2} &{} b\sqrt{(1-x_1^2)(1-x_2^2)} \end{pmatrix} \end{aligned}$$

and

$$\begin{aligned} S=a e_1^{} e_1^{T}+ b v^{}v_{}^{T} = \begin{pmatrix} a+bx_1x_2 &{}b\sqrt{x_1x_2-x_1^2x_2^2} \\ b\sqrt{x_1x_2-x_1^2x_2^2} &{} b(1-x_1x_2) \end{pmatrix}. \end{aligned}$$

They represent the bilinear forms

$$\begin{aligned} {\tilde{w}}_1^T T {\tilde{w}}_2=\langle A, {\tilde{w}}_1\otimes {\tilde{w}}_2 \otimes w_3 \otimes \dots \otimes w_d \rangle _F\quad \text {and}\quad {\tilde{w}}_1^T S {\tilde{w}}_2=\langle A_2, {\tilde{w}}_1\otimes {\tilde{w}}_2 \otimes w_3 \otimes \dots \otimes w_d \rangle _F \end{aligned}$$

in \({\tilde{w}}_1\) and \({\tilde{w}}_2\). Clearly,

$$\begin{aligned} \Vert T\Vert _\sigma = \max _{\Vert {\tilde{w}}_1\Vert =\Vert {\tilde{w}}_2\Vert =1}\langle A, {\tilde{w}}_1\otimes {\tilde{w}}_2 \otimes w_3 \otimes \dots \otimes w_d \rangle _F\le \Vert A\Vert _\sigma \end{aligned}$$

and

$$\begin{aligned} \Vert S\Vert _\sigma = \max _{\Vert {\tilde{w}}_1\Vert =\Vert {\tilde{w}}_2\Vert =1}\langle A_2, {\tilde{w}}_1\otimes {\tilde{w}}_2 \otimes w_3 \otimes \dots \otimes w_d \rangle _F = \Vert A_2\Vert _\sigma . \end{aligned}$$

Lemma 3.3 implies \(\Vert S\Vert _\sigma \le \Vert T\Vert _\sigma\) and therefore \(\Vert A_2\Vert _\sigma \le \Vert A\Vert _\sigma\).

For the induction step, let \(2 \le k < d\) and assume that replacing any k factors of V in the described manner always results in a tensor with a smaller or equal spectral norm. Note that here V was in principle arbitrary. Starting from the given V, we now construct a sequence \({\widetilde{V}}_0, {\widetilde{V}}_1, \dots\) of rank-one tensors in which the first k factors and then the second to \((k+1)\)-st factors are successively replaced:

$$\begin{aligned} \widetilde{V}_0&= {\tilde{v}}_0\otimes \dots \otimes {\tilde{v}}_0 \otimes v_{k+1} \otimes (v_{k+2} \otimes \dots \otimes v_d), \\ \widetilde{V}_1&= {\tilde{v}}_0 \otimes {\tilde{v}}_1 \otimes \dots \otimes {\tilde{v}}_{1} \otimes (v_{k+2} \otimes \dots \otimes v_d), \\ \widetilde{V}_2&= {\tilde{v}}_2\otimes \dots \otimes {\tilde{v}}_2 \otimes {\tilde{v}}_{1} \otimes (v_{k+2} \otimes \dots \otimes v_d) \\&\vdots \end{aligned}$$

and so on (the term in brackets disappears when \(k=d-1\)). By induction hypothesis, the corresponding sequence \(B_\ell =\alpha U+\beta \widetilde{V}_\ell\) of tensors has nonincreasing spectral norm and in particular \(\Vert B_\ell \Vert _\sigma \le \Vert B_0 \Vert _\sigma = \Vert A_k \Vert _\sigma \le \Vert A \Vert _\sigma\). We claim the \(B_\ell\) converge to \(A_{k+1}\), which proves \(\Vert A_{k+1} \Vert _\sigma \le \Vert A_k \Vert _\sigma \le \Vert A \Vert _\sigma\) as desired. Indeed, the unit norm vectors

$$\begin{aligned} {\tilde{v}}^{}_\ell =y^{}_\ell e^{}_1+\sqrt{1-y_\ell ^2}e_2^{} \end{aligned}$$

above are constructed according to

$$\begin{aligned} y_0=x =\prod _{i=1}^k x_i^{1/k},\quad y_1= \left( x^{k-1} x_{k+1}\right) ^\frac{1}{k},\quad y_{\ell +2} =\left( y_{\ell +1}^{k-1} y^{}_{\ell }\right) ^\frac{1}{k}. \end{aligned}$$

By Lemma 3.2, this sequence converges to

$$\begin{aligned} \left( \left( \prod _{i=1}^k x_i^{1/k}\right) ^k x^{}_{k+1}\right) ^{1/(k+1)}=\prod _{i=1}^{k+1} x_i^{1/(k+1)}, \end{aligned}$$

that is, the \({\tilde{V}}_\ell\) converge to \(V_{k+1}\) and hence the \(B_\ell\) converge to \(A_{k+1}\). This concludes the proof. \(\square\)
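
For order three, the construction can be demonstrated end to end, since after fixing the first factor the maximization over the remaining two factors reduces to a matrix spectral norm; the following sketch (ours, with a brute-force scan of the first factor) symmetrizes a random rank-two \(2\times 2\times 2\) tensor in one step:

```python
import numpy as np
from functools import reduce

def outer(vs):
    return reduce(np.multiply.outer, vs)

def spec_norm_2x2x2(A, N=2000):
    th = np.linspace(0, np.pi, N)            # antipodal w_1 give the same value
    best = 0.0
    for c, s in zip(np.cos(th), np.sin(th)):
        M = np.tensordot(A, np.array([c, s]), axes=(0, 0))
        best = max(best, np.linalg.norm(M, 2))   # exact max over w_2, w_3
    return best

rng = np.random.default_rng(4)
alpha, beta = rng.standard_normal(2)
x = rng.uniform(0, 1, 3)
u = np.array([1.0, 0.0])
vs = [np.array([xi, np.sqrt(1 - xi**2)]) for xi in x]
A = alpha * outer([u] * 3) + beta * outer(vs)

xs = np.prod(x) ** (1 / 3)                               # geometric mean of the x_i
v = np.array([xs, np.sqrt(1 - xs**2)])
A_sym = alpha * outer([u] * 3) + beta * outer([v] * 3)

print(np.isclose(np.linalg.norm(A), np.linalg.norm(A_sym)))  # equal Frobenius norms
print(spec_norm_2x2x2(A_sym) <= spec_norm_2x2x2(A) + 1e-4)   # no larger spectral norm
```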

Based on Proposition 3.1, we obtain a proof of Theorem 1.2 for general real rank-two tensors directly from Theorem 1.1.

Proof of Theorem 1.2

Again, since A has rank at most two, it suffices to prove the statement for general (i.e., nonsymmetric) \(2 \times \dots \times 2\) tensors. Obviously, the minimal ratio \(\Vert A \Vert _\sigma / \Vert A \Vert _F\) over general \(2 \times \dots \times 2\) tensors is less than or equal to the minimum over symmetric ones. However, by Proposition 3.1, the reverse inequality also holds. The result hence follows from Theorem 1.1. \(\square\)

Theorem 1.2 suggests an interesting relation between results in [6, 10]. The authors in [6] found that the minimal possible ratio of spectral and Frobenius norm among all tensors in \({{\,\mathrm{\mathbb {C}}\,}}^2\otimes {{\,\mathrm{\mathbb {C}}\,}}^2\otimes {{\,\mathrm{\mathbb {C}}\,}}^2\) is \(\frac{2}{3}\), while in [10], it is shown that the minimal ratio for tensors in \({{\,\mathrm{\mathbb {R}}\,}}^2\otimes {{\,\mathrm{\mathbb {R}}\,}}^2\otimes {{\,\mathrm{\mathbb {R}}\,}}^2\) is only \(\frac{1}{2}\). However, Theorem 1.2 states that border rank-two tensors in \({{\,\mathrm{\mathbb {R}}\,}}^{2\times 2\times 2}\) have the minimal ratio \(\frac{2}{3}\). This might be related to the fact that tensors of real rank two and three both have positive volume in \({{\,\mathrm{\mathbb {R}}\,}}^2\otimes {{\,\mathrm{\mathbb {R}}\,}}^2\otimes {{\,\mathrm{\mathbb {R}}\,}}^2\), while almost all tensors in \({{\,\mathrm{\mathbb {C}}\,}}^2\otimes {{\,\mathrm{\mathbb {C}}\,}}^2\otimes {{\,\mathrm{\mathbb {C}}\,}}^2\) have complex rank two.