1 Introduction

With the advances in data collection and storage capabilities, massive multiway (tensor) data are being generated in a wide range of emerging applications [25]. Multilinear algebra and tensor computations have been playing increasingly important roles in dealing with such multiway data in recent years. Computing tensor norms is evidently essential in many tensor computation problems. However, most tensor norms are NP-hard to compute [19], such as the tensor spectral norm [17] and the tensor nuclear norm [12]. Just as block matrices provide a useful means to approximate matrix norms, the computation of tensor norms via block tensors is a natural extension that has become increasingly important within numerical linear algebra [9, 29, 41, 42]. When a tensor is partitioned into subtensors, not necessarily of the same size, the chosen norms of these subtensors form a tensor called a norm compression tensor. Norm compression inequalities for tensors relate a norm of this compressed tensor to the corresponding norm of the original tensor. Such inequalities immediately provide a handy tool to bound and estimate norms of large tensors via norms of smaller subtensors.

In the case of matrices, tensors of order two, norm compression inequalities have been well studied since Bhatia and Kittaneh [6]. Such inequalities have several applications in, for instance, quantum information theory [2, 5] and covariance estimation [7]. An overview of several norm compression inequalities for matrices can be found in [2] and references therein. One important result is due to King [23]: If a matrix M is partitioned into \(2\times 2\) blocks \(\left( {\begin{matrix} M_{11}&{}\quad M_{12}\\ M_{21}&{}\quad M_{22} \end{matrix}}\right) \), then

$$\begin{aligned} \begin{array}{l@{\quad }l} \Vert M\Vert _{p_s}\ge \left\| \left( \begin{array}{c@{\quad }c}\Vert M_{11}\Vert _{p_s} &{} \Vert M_{12}\Vert _{p_s} \\ \Vert M_{21}\Vert _{p_s} &{} \Vert M_{22}\Vert _{p_s} \\ \end{array}\right) \right\| _{p_s} &{}\quad 1\le p\le 2, \\ \Vert M\Vert _{p_s}\le \left\| \left( \begin{array}{c@{\quad }c}\Vert M_{11}\Vert _{p_s} &{} \Vert M_{12}\Vert _{p_s} \\ \Vert M_{21}\Vert _{p_s} &{} \Vert M_{22}\Vert _{p_s} \\ \end{array}\right) \right\| _{p_s}&\quad 2\le p\le \infty , \end{array} \end{aligned}$$
(1)

where \(\Vert {\cdot }\Vert _{p_s}\) stands for the Schatten p-norm of a matrix, i.e., the \(L_p\)-norm of the vector consisting of all the singular values of the matrix. However, there exists an example [3] of a partitioned \(3\times 3\) block matrix for which inequalities of type (1) fail to hold. Audenaert [3] conjectured that inequalities of this type hold for \(2\times m\) blocks and proved several special cases. Two notable special cases in which the \(2\times m\) version is known to hold are when the matrix M is positive semidefinite [23] and when the blocks of M are all diagonal matrices [24].
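As a numerical illustration of (1), the following sketch (assuming NumPy; not part of the original text, with arbitrary block sizes) compares both sides for a randomly generated \(2\times 2\)-blocked matrix and several values of p, computing each Schatten p-norm as the \(L_p\)-norm of the singular values.

```python
import numpy as np

def schatten(A, p):
    """Schatten p-norm: L_p-norm of the vector of singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.max(s) if np.isinf(p) else (s ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 8))
M11, M12 = M[:3, :5], M[:3, 5:]   # a 2 x 2 block partition (blocks need not be square)
M21, M22 = M[3:, :5], M[3:, 5:]

for p in [1.0, 1.5, 2.0, 3.0, np.inf]:
    compressed = np.array([[schatten(M11, p), schatten(M12, p)],
                           [schatten(M21, p), schatten(M22, p)]])
    lhs, rhs = schatten(M, p), schatten(compressed, p)
    # King's result (1): lhs >= rhs for 1 <= p <= 2 and lhs <= rhs for 2 <= p <= inf.
    print(f"p = {p}: ||M||_ps = {lhs:.4f}, compressed = {rhs:.4f}")
```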

Among all Schatten p-norms of a matrix, three are particularly important, namely the spectral norm (\(p=\infty \)), the Frobenius norm (\(p=2\)), and the nuclear norm (\(p=1\)). The Schatten 2-norm coincides with the Frobenius norm of a matrix, which makes the corresponding norm compression inequality trivial; it is actually an equality. In fact, the norm compression inequality of type (1) holds for any \(m_1\times m_2\) blocks when \(p=\infty \). This result, to the best of our knowledge, was not studied in the literature, and it is a special case of the main result in this paper. For higher order (order three or higher) tensors, the Schatten p-norms are not well defined unless \(p=\infty ,1\) [12], corresponding to the tensor spectral norm and nuclear norm, respectively. As both the spectral norm and the nuclear norm of a tensor are NP-hard to compute while those of a matrix can be computed in polynomial time, matrix unfoldings have become a major approach to dealing with various problems involving these tensor norms, both in theory and in practice. Relations between these norms of a tensor and those of its matrix unfoldings have been studied in [12, 17, 21]. A generalization of such relations under tensor unfoldings has been studied by Wang et al. [48].

Block tensors are becoming increasingly important. They have been used in large tensor factorizations [39], tensor decompositions [36], tensor optimization [47], and image processing [10]. Ragnarsson and Van Loan [41] developed an infrastructure that supports reasoning about block tensor computations. They [42] further applied block tensors to symmetric embeddings of tensors. Extending block tensors, Li [29] proposed more general concepts of tensor partitions and provided bounds of the spectral norm and the nuclear norm of a tensor via norms of subtensors in a regular partition of the tensor. The results were further generalized to the spectral p-norm and the nuclear p-norm of a tensor and to arbitrary partitions of the tensor [9]. This paper explores the structure of block tensors, instead of treating subtensors merely as elements as in [9, 29], and proposes more accurate estimates of the spectral norm of a tensor; block tensors are a special but the most common type of the regular partitions in [29] and the arbitrary partitions in [9]. It is worth mentioning that bounds of the tensor spectral p-norm have been extensively studied in the literature [16,17,18, 20, 38, 44, 48], in particular in the area of polynomial optimization [30].

In this paper, we study norm compression inequalities for tensors. We prove that for any block partitioned tensor, no matter how many blocks, the spectral norm of its norm compressed tensor is an upper bound of the spectral norm of the original tensor. The result can be generalized to a wider class of tensor spectral norms. These norm compression inequalities improve many existing bounds of tensor spectral norms in the literature, including the recent bounds via tensor partitions studied in [9, 29]. We discuss two important applications of our results. The first one is on the extremal ratio between the spectral norm and the Frobenius norm of a tensor space. We provide a general methodology to compute upper bounds of this ratio, and in particular to improve the current best upper bound for third order nonnegative tensors and symmetric tensors. The second one is to estimate the spectral norm of a large tensor or matrix via sequential norm compression inequalities. Some numerical evidence is provided to justify our methodology.

This paper is organized as follows. We start with the preparation of various notations, definitions and properties of tensor spectral norms in Sect. 2. In Sect. 3, we present our main result on norm compression inequalities for tensors, and in Sect. 4, we discuss how our main inequalities lead to various other bounds of tensor spectral norms in the literature. For applications, the study of the extremal ratio between the spectral norm and the Frobenius norm of a tensor space is presented in Sect. 5, and estimating the tensor and the matrix spectral norms is discussed in Sect. 6.

2 Preparation

Throughout this paper, we uniformly use the lower case letters (e.g., x), the boldface lower case letters (e.g., \(\varvec{x}=\left( x_i\right) \)), the capital letters (e.g., \(X=\left( x_{ij}\right) \)), and the calligraphic letters (e.g., \(\mathcal {X}=\left( x_{i_1i_2\ldots i_d}\right) \)) to denote scalars, vectors, matrices, and higher order (order three or more) tensors, respectively. Denote \(\mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\) to be the space of d-th order real tensors of dimension \(n_1\times n_2\times \cdots \times n_d\). The same notations apply for a vector space and a matrix space when \(d=1\) and \(d=2\), respectively. Unless otherwise specified, the order of a general tensor in this paper is always denoted by d and the dimension of its mode-k is always denoted by \(n_k\) for \(k=1,2,\ldots ,d\). Given a d-th order tensor space \(\mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\), we denote \(\mathbb {I}^k:=\left\{ 1,2,\ldots ,n_k\right\} \) to be the index set of mode-k for \(k=1,2,\ldots ,d\). Trivially, \(\mathbb {I}^1\times \mathbb {I}^2\times \cdots \times \mathbb {I}^d\) becomes the index set of the entries of a tensor in \(\mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\). Denote \(\mathbb {N}\) to be the set of positive integers and denote \(\mathbb {P}=[1,\infty ]\), the interval where the \(L_p\)-norm of a vector is well defined when \(1\le p\le \infty \).

The Frobenius inner product of two tensors \(\mathcal {U},\mathcal {V}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\) is defined as:

$$\begin{aligned} \langle \mathcal {U},\mathcal {V}\rangle :=\sum _{i_1=1}^{n_1}\sum _{i_2=1}^{n_2} \ldots \sum _{i_d=1}^{n_d} u_{i_1i_2\ldots i_d} v_{i_1i_2\ldots i_d}. \end{aligned}$$

Its induced Frobenius norm is naturally defined as \(\Vert \mathcal {T}\Vert _2:=\sqrt{\langle \mathcal {T},\mathcal {T}\rangle }\). When \(d=1\), the Frobenius norm is reduced to the Euclidean norm of a vector. In a similar vein, we may define the \(L_p\)-norm of a tensor (also known as the Hölder p-norm [33]) for \(p\in \mathbb {P}\) by looking at a tensor as a vector, as follows:

$$\begin{aligned} \Vert \mathcal {T}\Vert _p:=\left( \sum _{i_1=1}^{n_1}\sum _{i_2=1}^{n_2} \ldots \sum _{i_d=1}^{n_d} |t_{i_1i_2\ldots i_d}|^p\right) ^{\frac{1}{p}}. \end{aligned}$$
(2)

A rank-one tensor, also called a simple tensor, is a tensor that can be written as the outer product of vectors

$$\begin{aligned} \mathcal {T}=\varvec{x}^1\otimes \varvec{x}^2\otimes \cdots \otimes \varvec{x}^d. \end{aligned}$$

It is easy to verify that \(\Vert \mathcal {T}\Vert _p=\prod _{k=1}^d\Vert \varvec{x}^k\Vert _p\) for all \(p\in \mathbb {P}\). When \(d=2\), a rank-one tensor is reduced to the well known concept of a rank-one matrix.

The spectral norm of a tensor is an important measure of the tensor.

Definition 2.1

For a given tensor \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\), the spectral norm of \(\mathcal {T}\), denoted by \(\Vert \mathcal {T}\Vert _\sigma \), is defined as

$$\begin{aligned} \Vert \mathcal {T}\Vert _\sigma :=\max \left\{ \left\langle \mathcal {T}, \varvec{x}^1\otimes \varvec{x}^2\otimes \cdots \otimes \varvec{x}^d \right\rangle : \Vert \varvec{x}^k\Vert _2=1, \quad k=1,2,\ldots ,d\right\} , \end{aligned}$$

and the nuclear norm of \(\mathcal {T}\), denoted by \(\Vert \mathcal {T}\Vert _*\), is defined as

$$\begin{aligned} \Vert \mathcal {T}\Vert _*:= & {} \min \left\{ \sum _{i=1}^r|\lambda _i| : \mathcal {T}=\sum _{i=1}^r \lambda _i \varvec{x}^1_i\otimes \varvec{x}^2_i\otimes \cdots \otimes \varvec{x}^d_i, \Vert \varvec{x}^k_i\Vert _2=1\right. \nonumber \\&\qquad \left. \text{ for } \text{ all } k=1,2,...,d \quad \text{ and }\quad i=1,2,...,r \in \mathbb {N} \right\} . \end{aligned}$$
(3)

Essentially, \(\Vert \mathcal {T}\Vert _\sigma \) is the maximal value of the Frobenius inner product between \(\mathcal {T}\) and a rank-one tensor whose Frobenius norm is one. Computing the tensor spectral norm is also known as the Euclidean spherical constrained multilinear form maximization problem [30]. The tensor nuclear norm is the dual norm to the tensor spectral norm, and vice versa [11, 34], i.e.,

$$\begin{aligned} \Vert \mathcal {T}\Vert _\sigma =\max _{\Vert \mathcal {X}\Vert _*\le 1}\langle \mathcal {T},\mathcal {X}\rangle \quad \text{ and }\quad \Vert \mathcal {T}\Vert _*=\max _{\Vert \mathcal {X}\Vert _\sigma \le 1}\langle \mathcal {T},\mathcal {X}\rangle . \end{aligned}$$
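Since computing \(\Vert \mathcal {T}\Vert _\sigma \) exactly is NP-hard for \(d\ge 3\), in practice one often settles for a lower bound obtained by alternating maximization over the unit vectors \(\varvec{x}^1,\ldots ,\varvec{x}^d\), a standard higher-order power-method-type heuristic. The sketch below (assuming NumPy and a third order tensor; not part of the original text) is one such heuristic and returns a lower bound of the spectral norm, not its exact value.

```python
import numpy as np

def spectral_norm_lower_bound(T, iters=200, seed=0):
    """Alternating maximization of <T, x ⊗ y ⊗ z> over unit vectors x, y, z.

    Returns a lower bound of ||T||_sigma."""
    rng = np.random.default_rng(seed)
    n1, n2, n3 = T.shape
    y = rng.standard_normal(n2); y /= np.linalg.norm(y)
    z = rng.standard_normal(n3); z /= np.linalg.norm(z)
    for _ in range(iters):
        x = np.einsum('ijk,j,k->i', T, y, z); x /= np.linalg.norm(x)
        y = np.einsum('ijk,i,k->j', T, x, z); y /= np.linalg.norm(y)
        z = np.einsum('ijk,i,j->k', T, x, y); z /= np.linalg.norm(z)
    return float(np.einsum('ijk,i,j,k->', T, x, y, z))

T = np.random.default_rng(1).standard_normal((4, 5, 6))
print(spectral_norm_lower_bound(T))   # lower bound of ||T||_sigma
print(np.linalg.norm(T))              # ||T||_2, an upper bound since ||T||_sigma <= ||T||_2
```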

Apart from the \(L_p\)-norms defined via tensor entries in (2), there is another set of norms for matrices called Schatten p-norms, defined as the \(L_p\)-norm of the vector consisting of all singular values of a matrix. In particular, the spectral norm and the nuclear norm of a matrix are nothing but the Schatten \(\infty \)-norm and the Schatten 1-norm of the matrix, respectively. In fact, the study of norm compression inequalities for matrices in the literature is mostly on Schatten p-norms, including the spectral norm and the nuclear norm as special cases. However, one cannot obtain a Schatten p-norm for tensors in the manner of defining the nuclear norm of a tensor in (3). If the \(L_1\)-norm expression \(\sum _{i=1}^r|\lambda _i|\) in (3) is replaced by an \(L_p\)-norm expression \(\left( \sum _{i=1}^r|\lambda _i|^p\right) ^{\frac{1}{p}}\) for any \(1<p<\infty \), the minimum is always zero [12].

One can actually extend the tensor spectral norm and tensor nuclear norm in Definition 2.1 as follows.

Definition 2.2

For a given tensor \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\) and a vector \(\varvec{p}=(p_1,p_2,\ldots ,p_d)\in \mathbb {P}^d\), the spectral \(\varvec{p}\)-norm of \(\mathcal {T}\), denoted by \(\Vert \mathcal {T}\Vert _{\varvec{p}_\sigma }\), is defined as

$$\begin{aligned} \Vert \mathcal {T}\Vert _{\varvec{p}_\sigma }:=\max \left\{ \left\langle \mathcal {T}, \varvec{x}^1\otimes \varvec{x}^2\otimes \cdots \otimes \varvec{x}^d \right\rangle : \Vert \varvec{x}^k\Vert _{p_k}=1, \quad k=1,2,\ldots ,d\right\} , \end{aligned}$$

and the nuclear \(\varvec{p}\)-norm of \(\mathcal {T}\), denoted by \(\Vert \mathcal {T}\Vert _{\varvec{p}_*}\), is defined as

$$\begin{aligned} \Vert \mathcal {T}\Vert _{\varvec{p}_*}:= & {} \min \left\{ \sum _{i=1}^r|\lambda _i| : \mathcal {T}=\sum _{i=1}^r \lambda _i \varvec{x}^1_i\otimes \varvec{x}^2_i\otimes \cdots \otimes \varvec{x}^d_i, \Vert \varvec{x}^k_i\Vert _{p_k}=1\right. \\&\qquad \left. \text{ for } \text{ all } k=1,2,...,d \quad \text{ and }\quad i=1,2,...,r\in \mathbb {N}\right\} . \end{aligned}$$

In particular, the spectral \((2,2,\ldots ,2)\)-norm and the nuclear \((2,2,\ldots ,2)\)-norm of a tensor are the usual spectral norm and nuclear norm of the tensor, respectively. The tensor spectral \(\varvec{p}\)-norm was first defined, at the same time as the tensor spectral norm, by Lim [32] in 2005. Computation of the spectral \(\varvec{p}\)-norm for nonnegative tensors was discussed in [13]. When \(p_1=p_2=\cdots =p_d=p\), the tensor spectral p-norm and nuclear p-norm were studied in [9, 12]; they are denoted by \(\Vert {\cdot }\Vert _{p_\sigma }\) and \(\Vert {\cdot }\Vert _{p_*}\), respectively. Similar to the tensor spectral norm and nuclear norm, the two tensor norms in Definition 2.2 form a primal-dual pair.

Lemma 2.3

For given d-th order tensors \(\mathcal {T}\) and \(\mathcal {X}\) in the same tensor space and \(\varvec{p}\in \mathbb {P}^d\), it follows that

$$\begin{aligned} \langle \mathcal {T},\mathcal {X}\rangle \le \Vert \mathcal {T}\Vert _{\varvec{p}_\sigma } \Vert \mathcal {X}\Vert _{\varvec{p}_*}, \end{aligned}$$

and further

$$\begin{aligned} \Vert \mathcal {T}\Vert _{\varvec{p}_\sigma }&=\max _{\Vert \mathcal {X}\Vert _{\varvec{p}_*}\le 1}\langle \mathcal {T},\mathcal {X}\rangle , \\ \Vert \mathcal {T}\Vert _{\varvec{p}_*}&=\max _{\Vert \mathcal {X}\Vert _{\varvec{p}_\sigma }\le 1}\langle \mathcal {T},\mathcal {X}\rangle . \end{aligned}$$

This duality can be proved similarly to the case where all the \(p_k\)'s are equal [9, Lemma 2.5], and the proof is thus omitted.

The spectral \(\varvec{p}\)-norm and nuclear \(\varvec{p}\)-norm of a tensor are in general very difficult to compute. For the computational complexity for various \(\varvec{p}\)’s and orders of the tensor, one is referred to Friedland and Lim [12]. It is worth mentioning that computing these norms for a rank-one tensor admits a closed form.

Proposition 2.4

If a tensor \(\mathcal {T}\) is rank-one, say \(\mathcal {T}=\varvec{x}^1\otimes \varvec{x}^2\otimes \cdots \otimes \varvec{x}^d\), then for any \(\varvec{p}\in \mathbb {P}^d\),

$$\begin{aligned} \Vert \mathcal {T}\Vert _{\varvec{p}_\sigma }=\prod _{k=1}^d\Vert \varvec{x}^k\Vert _{q_k}, \\ \Vert \mathcal {T}\Vert _{\varvec{p}_*}=\prod _{k=1}^d\Vert \varvec{x}^k\Vert _{p_k}, \end{aligned}$$

where \(\frac{1}{p_k}+\frac{1}{q_k}=1\) for \(k=1,2,\ldots ,d\).

The proof is left to interested readers. Therefore, \(\Vert \mathcal {T}\Vert _{\varvec{p}_\sigma }\) can be taken as the maximal value of the Frobenius inner product between \(\mathcal {T}\) and a rank-one tensor whose spectral \(\varvec{q}\)-norm is one.
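For instance, when \(d=2\) and \(\varvec{p}=(2,2)\) (so that the conjugate exponents are also 2), Proposition 2.4 says that the spectral norm of a rank-one matrix \(\varvec{x}\otimes \varvec{y}\) equals \(\Vert \varvec{x}\Vert _2\Vert \varvec{y}\Vert _2\). A quick NumPy check of this special case (not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(7)
T = np.outer(x, y)                        # the rank-one matrix x ⊗ y

# For matrices the spectral norm is the largest singular value.
lhs = np.linalg.norm(T, 2)
rhs = np.linalg.norm(x) * np.linalg.norm(y)
print(abs(lhs - rhs) < 1e-10)             # True: ||x ⊗ y||_sigma = ||x||_2 ||y||_2
```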

3 Norm compression inequalities for tensors

To study norm compression inequalities for tensors, we first introduce tensor partitions. One important class of tensor partitions, block tensors, was proposed and studied by Ragnarsson and Van Loan [41]. It is a straightforward generalization of block matrices. Li [29] proposed three types of tensor partitions, namely, modal partitions (an alternative name for block tensors), regular partitions, and tensor partitions, with the latter generalizing the former. A more general class of partitions, called arbitrary partitions, was proposed and studied by Chen and Li [9].

Norm compression inequalities are established on block tensors, which are also called modal partitions as they are constructed by partitions of the index sets of tensor modes. Given a tensor \(\mathcal {T}=\left( t_{i_1i_2\ldots i_d}\right) \in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\), for every \(1\le k\le d\), the indices of its mode k can be partitioned into \(r_k\) nonempty sets, i.e.,

$$\begin{aligned} \mathbb {I}^k=\{1,2,\ldots ,n_k\}=\bigcup _{j=1}^{r_k} \mathbb {I}^k_j \quad \text{ and }\quad \mathbb {I}^k_i\bigcap \mathbb {I}^k_j =\emptyset \text{ if } i\ne j. \end{aligned}$$

For simplicity, we assume that the indices in \(\mathbb {I}^k_i\) are consecutive and that the \(\mathbb {I}^k_i\)'s increase monotonically as i increases, since this can be arranged easily via index relabeling without affecting tensor norms. Any \((\mathbb {J}_1,\mathbb {J}_2,\ldots ,\mathbb {J}_d)\) with \(\mathbb {J}_k\subseteq \mathbb {I}^k\) for \(k=1,2,\ldots ,d\) uniquely defines a subtensor of \(\mathcal {T}\) by keeping only the indices in \(\mathbb {J}_k\) for mode k of \(\mathcal {T}\), i.e.,

$$\begin{aligned} \mathcal {T}(\mathbb {J}_1,\mathbb {J}_2,\ldots ,\mathbb {J}_d) = \left( \left( t_{i_1i_2\ldots i_d}\right) _{i_k\in \mathbb {J}_k,\quad k=1,2,\ldots ,d}\right) \in \mathbb {R}^{|\mathbb {J}_1|\times |\mathbb {J}_2|\times \cdots \times |\mathbb {J}_d|}. \end{aligned}$$

Definition 3.1

The partition \(\left\{ \mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) : 1\le j_k\le r_k,\, k=1,2,\ldots ,d \right\} \) is called a modal partition of a tensor \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\), where \(\{\mathbb {I}^k_1, \mathbb {I}^k_2, \ldots , \mathbb {I}^k_{r_k}\}\) is a partition of \(\mathbb {I}^k\) for \(k=1,2,\ldots ,d\).

In other words, a partitioned block tensor is a tensor that has been modal partitioned. Trivially, for \(d=2\), a block matrix can be obtained by a modal partition, i.e., partitions of the row indices and the column indices. We remark that a subtensor in a modal partition of a tensor may not possess the same order as the original tensor. If some \(\mathbb {I}_j^k\) contains only one index, mode k disappears and the order of the subtensor is reduced by one. However, we still treat this subtensor as a d-th order tensor by keeping the dimension of mode k equal to one. For instance, we can always treat a scalar as a one-dimensional vector, or a one-by-one matrix.
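In terms of arrays, forming a modal partition amounts to slicing each mode's index set; the sketch below (assuming NumPy, with hypothetical index sets chosen only for illustration, not part of the original text) extracts all subtensors of a modal partition of a third order tensor and checks that they tile the original tensor.

```python
import numpy as np
from itertools import product

T = np.random.default_rng(0).standard_normal((4, 5, 6))

# A modal partition: split each mode's index set into consecutive groups.
mode_splits = [[range(0, 2), range(2, 4)],               # I^1_1, I^1_2
               [range(0, 3), range(3, 5)],               # I^2_1, I^2_2
               [range(0, 2), range(2, 4), range(4, 6)]]  # I^3_1, I^3_2, I^3_3

blocks = {idx: T[np.ix_(*(list(mode_splits[k][j]) for k, j in enumerate(idx)))]
          for idx in product(*(range(len(s)) for s in mode_splits))}

# The subtensors tile T: entry counts and squared Frobenius norms add up.
print(sum(B.size for B in blocks.values()) == T.size)                   # True
print(np.isclose(sum(np.linalg.norm(B) ** 2 for B in blocks.values()),
                 np.linalg.norm(T) ** 2))                               # True
```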

In order to present the proof of our main result (Theorem 3.3) clearly, as well as to provide a better picture of modal partitions, we now discuss tensor cuts. Given a d-th order tensor \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\), a mode-k tensor cut splits the tensor \(\mathcal {T}\) at mode k into two subtensors \(\mathcal {T}_1\) and \(\mathcal {T}_2\), denoted by \(\mathcal {T}=\mathcal {T}_1{\bigvee }_k\mathcal {T}_2\), where

$$\begin{aligned} \mathcal {T}_1\in \mathbb {R}^{n_1 \times \cdots \times n_{k-1} \times \ell _1 \times n_{k+1}\times \cdots \times n_d},~\mathcal {T}_2\in \mathbb {R}^{n_1 \times \cdots \times n_{k-1} \times \ell _2 \times n_{k+1}\times \cdots \times n_d},\quad \text{ and }\quad \ell _1+\ell _2=n_k. \end{aligned}$$

The same notation can be used to cut a matrix and to cut a vector. In particular, for a first order tensor, a vector \(\varvec{x}\in \mathbb {R}^n\), \(\varvec{x}=\varvec{x}_1{\bigvee }_1\varvec{x}_2\) is exactly the same as \(\varvec{x}^{\mathrm{T}}=\left( {\varvec{x}_1}^{\mathrm{T}},{\varvec{x}_2}^{\mathrm{T}}\right) \). The mode subscript of \({\bigvee }\) in a tensor cut is sometimes omitted for cleaner presentation. For instance, \(\mathcal {T}=\mathcal {T}_1{\bigvee }\mathcal {T}_2\) implies that there exists \(k\in \mathbb {N}\) such that \(\mathcal {T}=\mathcal {T}_1{\bigvee }_k\mathcal {T}_2\). Obviously, the operation \({\bigvee }\) is in general neither commutative nor associative. Once the notation \({\bigvee }_k\) is applied, the dimensions of its two operand tensors must be the same in every mode except mode k. With this handy notation, a block tensor via a modal partition (Definition 3.1) can be simply written as

$$\begin{aligned} \mathcal {T}= \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) . \end{aligned}$$

The following norm compression identity for vectors is straightforward.

Lemma 3.2

If a vector \(\varvec{x}={\bigvee }_{j=1}^r \varvec{x}_j\in \mathbb {R}^n\), then for any \(p\in \mathbb {P}\),

$$\begin{aligned} \Vert \varvec{x}\Vert _p=\left\| {\bigvee }_{j=1}^r \Vert \varvec{x}_j\Vert _p \right\| _p. \end{aligned}$$

Proof

Denote the vector

$$\begin{aligned} \varvec{y}={\bigvee }_{j=1}^r \Vert \varvec{x}_j\Vert _p = \left( \Vert \varvec{x}_1\Vert _p,\Vert \varvec{x}_2\Vert _p,\ldots ,\Vert \varvec{x}_r\Vert _p \right) ^{\mathrm{T}}. \end{aligned}$$

We have

$$\begin{aligned} \Vert \varvec{y}\Vert _p = \left( \sum _{j=1}^r {\Vert \varvec{x}_j\Vert _p}^p \right) ^{\frac{1}{p}} = \left( \sum _{i=1}^n|x_i|^p\right) ^{\frac{1}{p}} = \Vert \varvec{x}\Vert _p. \end{aligned}$$

\(\square \)

Though in practice one is often interested in the spectral norm of a tensor rather than general spectral \(\varvec{p}\)-norms of the tensor, we present our main result for the general case. It obviously applies to the tensor spectral norm when \(\varvec{p}=(2,2,\ldots ,2)\).

Theorem 3.3

If \(\mathcal {T}= \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) \) is a modal partition of a tensor \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\) where \(\{\mathbb {I}^k_1, \mathbb {I}^k_2, \ldots , \mathbb {I}^k_{r_k}\}\) is a partition of \(\mathbb {I}^k\) for \(k=1,2,\ldots ,d\), then for any \(\varvec{p}\in \mathbb {P}^d\),

$$\begin{aligned} \Vert \mathcal {T}\Vert _{\varvec{p}_\sigma }&=\left\| \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) \right\| _{\varvec{p}_\sigma }\nonumber \\&\le \left\| \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d} \left\| \mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) \right\| _{\varvec{p}_\sigma }\right\| _{\varvec{p}_\sigma }. \end{aligned}$$
(4)

Proof

Let \(\Vert \mathcal {T}\Vert _{\varvec{p}_\sigma }=\left\langle \mathcal {T}, \varvec{x}^1\otimes \varvec{x}^2\otimes \cdots \otimes \varvec{x}^d \right\rangle \) with \(\Vert \varvec{x}^k\Vert _{p_k}=1\) for \(k=1,2,\ldots ,d\) (by compactness, these \(\varvec{x}^k\)’s must exist). Denote

$$\begin{aligned} \mathcal {T}_{j_1j_2\ldots j_d}=\mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) \quad \forall \,1\le j_k\le r_k,\quad k=1,2,\ldots ,d, \end{aligned}$$

which leads to

$$\begin{aligned} \mathcal {T}=\mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}_{j_1j_2\ldots j_d}. \end{aligned}$$

Further, we denote

$$\begin{aligned} \varvec{x}^k_{j_k}=\varvec{x}^k(\mathbb {I}^k_{j_k})\quad \forall \,1\le j_k\le r_k,\quad k=1,2,\ldots ,d. \end{aligned}$$

Obviously we have

$$\begin{aligned} \varvec{x}^k={\bigvee }_{j_k=1}^{r_k}\varvec{x}^k_{j_k}\quad \forall \,k=1,2,\ldots ,d. \end{aligned}$$

First, for every \((j_1,j_2,\ldots ,j_d)\), if none of the vectors \(\varvec{x}^1_{j_1}, \varvec{x}^2_{j_2},\ldots , \varvec{x}^d_{j_d}\) is a zero vector, then

$$\begin{aligned}&\left\langle \mathcal {T}_{j_1j_2\ldots j_d}, \varvec{x}^1_{j_1}\otimes \varvec{x}^2_{j_2}\otimes \cdots \otimes \varvec{x}^d_{j_d} \right\rangle \nonumber \\&\quad = \left\langle \mathcal {T}_{j_1j_2\ldots j_d}, \frac{\varvec{x}^1_{j_1}}{\Vert \varvec{x}^1_{j_1}\Vert _{p_1}}\otimes \frac{\varvec{x}^2_{j_2}}{\Vert \varvec{x}^2_{j_2}\Vert _{p_2}}\otimes \ldots \otimes \frac{\varvec{x}^d_{j_d}}{\Vert \varvec{x}^d_{j_d}\Vert _{p_d}} \right\rangle \prod _{k=1}^d \Vert \varvec{x}^k_{j_k}\Vert _{p_k} \nonumber \\&\quad \le \Vert \mathcal {T}_{j_1j_2\ldots j_d}\Vert _{\varvec{p}_\sigma }\prod _{k=1}^d \Vert \varvec{x}^k_{j_k}\Vert _{p_k}. \end{aligned}$$
(5)

The above inequality trivially holds even if some of \(\varvec{x}^1_{j_1}, \varvec{x}^2_{j_2},\ldots , \varvec{x}^d_{j_d}\) are zero vectors, and thus it holds in general.

Next, by the norm compression identity for vectors in Lemma 3.2, we have

$$\begin{aligned} \left\| {\bigvee }_{j_k=1}^{r_k}\Vert \varvec{x}^k_{j_k}\Vert _{p_k}\right\| _{p_k} =\left\| {\bigvee }_{j_k=1}^{r_k}\varvec{x}^k_{j_k}\right\| _{p_k} =\Vert \varvec{x}^k\Vert _{p_k}=1 \quad \forall \,k=1,2,\ldots ,d. \end{aligned}$$
(6)

Therefore, we have

$$\begin{aligned} \Vert \mathcal {T}\Vert _{\varvec{p}_\sigma }&=\left\langle \mathcal {T}, \varvec{x}^1\otimes \varvec{x}^2\otimes \cdots \otimes \varvec{x}^d \right\rangle \\&=\left\langle \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}_{j_1j_2\ldots j_d}, \left( \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\varvec{x}^1_{j_1}\right) \otimes \left( \mathop {\bigvee \nolimits _1}_{j_2=1}^{r_2}\varvec{x}^2_{j_2} \right) \right. \\&\qquad \left. \otimes \cdots \otimes \left( \mathop {\bigvee \nolimits _1}_{j_d=1}^{r_d}\varvec{x}^d_{j_d}\right) \right\rangle \\&=\left\langle \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}_{j_1j_2\ldots j_d}, \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d} \varvec{x}^1_{j_1} \otimes \varvec{x}^2_{j_2} \otimes \cdots \otimes \varvec{x}^d_{j_d} \right\rangle \\&=\sum _{j_1=1}^{r_1}\sum _{j_2=1}^{r_2}\ldots \sum _{j_d=1}^{r_d} \left\langle \mathcal {T}_{j_1j_2\ldots j_d}, \varvec{x}^1_{j_1} \otimes \varvec{x}^2_{j_2} \otimes \cdots \otimes \varvec{x}^d_{j_d}\right\rangle \\&\le \sum _{j_1=1}^{r_1}\sum _{j_2=1}^{r_2}\ldots \sum _{j_d=1}^{r_d} \left( \Vert \mathcal {T}_{j_1j_2\ldots j_d}\Vert _{\varvec{p}_\sigma }\prod _{k=1}^d \Vert \varvec{x}^k_{j_k}\Vert _{p_k}\right) \\&=\left\langle \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\Vert \mathcal {T}_{j_1j_2\ldots j_d}\Vert _{\varvec{p}_\sigma }, \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d} \left( \prod _{k=1}^d \Vert \varvec{x}^k_{j_k}\Vert _{p_k}\right) \right\rangle \\&=\left\langle \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\Vert \mathcal {T}_{j_1j_2\ldots j_d}\Vert _{\varvec{p}_\sigma }, \left( \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\Vert \varvec{x}^1_{j_1}\Vert _{p_1}\right) \otimes \left( \mathop {\bigvee \nolimits _1}_{j_2=1}^{r_2}\Vert \varvec{x}^2_{j_2}\Vert _{p_2} \right) \right. \\&\qquad \left. \otimes \cdots \otimes \left( \mathop {\bigvee \nolimits _1}_{j_d=1}^{r_d}\Vert \varvec{x}^d_{j_d}\Vert _{p_d}\right) \right\rangle \\&\le \left\| \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\Vert \mathcal {T}_{j_1j_2\ldots j_d}\Vert _{\varvec{p}_\sigma } \right\| _{\varvec{p}_\sigma }, \end{aligned}$$

where the first inequality is due to (5), and the last inequality is due to (6) and Definition 2.2. \(\square \)
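For \(d=2\) and \(\varvec{p}=(2,2)\), both sides of (4) are just largest singular values and hence computable exactly, so Theorem 3.3 can be checked numerically. A minimal NumPy sketch (not part of the original text, with an arbitrary \(3\times 2\) block partition) follows.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((9, 10))
row_splits, col_splits = [0, 3, 6, 9], [0, 4, 10]    # a 3 x 2 block partition

compressed = np.array(
    [[np.linalg.norm(T[row_splits[i]:row_splits[i+1], col_splits[j]:col_splits[j+1]], 2)
      for j in range(len(col_splits) - 1)]
     for i in range(len(row_splits) - 1)])

# Theorem 3.3 with d = 2 and p = (2,2): the spectral norm of the norm
# compressed matrix upper bounds the spectral norm of the original matrix.
print(np.linalg.norm(T, 2) <= np.linalg.norm(compressed, 2) + 1e-12)   # True
```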

We remark that for \(d=2\), the case of matrices, Theorem 3.3 was not studied in the literature, to the best of our knowledge. The tightness of the norm compression inequality in Theorem 3.3 is in general hard to establish. We list below some special cases in which (4) holds with equality, although some of them are trivial:

  1. \(d=1\), which corresponds to the case of vectors and is essentially established in Lemma 3.2;

  2. \(\varvec{p}=(1,1,\ldots ,1)\), for which the spectral \(\varvec{p}\)-norm of a tensor is simply the \(L_\infty \)-norm of the tensor, i.e., the largest absolute-valued entry of the tensor;

  3. All but one of the subtensors are zero tensors;

  4. The original tensor \(\mathcal {T}\) is rank-one.

The last case actually includes the first one as a special case. Its proof can be obtained using Proposition 2.4 and is left to interested readers.

For the nuclear \(\varvec{p}\)-norm of a tensor, in light of the dual bounds for the nuclear norm shown in [29] and for the nuclear p-norm shown in [9], one hopes to establish the following dual inequality to (4):

$$\begin{aligned} \Vert \mathcal {T}\Vert _{\varvec{p}_*}&=\left\| \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) \right\| _{\varvec{p}_*}\nonumber \\&\ge \left\| \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d} \left\| \mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) \right\| _{\varvec{p}_*}\right\| _{\varvec{p}_*}. \end{aligned}$$
(7)

Unfortunately, this does not hold in general; see the example below. This actually makes the norm compression inequalities for tensors more interesting, and the result in Theorem 3.3 more valuable.

Example 3.4

Let \(M=\left( {\begin{matrix} -\,1&{}\quad 1&{}\quad 0\\ 1&{}\quad 0&{}\quad 1\\ 0&{}\quad 1&{}\quad 1 \end{matrix}}\right) \in \mathbb {R}^{3\times 3}\) and let the modal partition of M be the \(3\times 3\) entry-wise partition, so that the norm compressed matrix (with respect to the nuclear norm) is \(|M|=\left( {\begin{matrix} 1&{}\quad 1&{}\quad 0\\ 1&{}\quad 0&{}\quad 1\\ 0&{}\quad 1&{}\quad 1 \end{matrix}}\right) \). It follows that

$$\begin{aligned} \Vert M\Vert _* =2\sqrt{3} < 4= \Vert |M|\Vert _*, \end{aligned}$$

disproving (7) when \(d=2\) and \(\varvec{p}=(2,2)\).
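The two nuclear norms in Example 3.4 can be confirmed numerically; a NumPy sketch (not part of the original text):

```python
import numpy as np

M = np.array([[-1., 1., 0.],
              [ 1., 0., 1.],
              [ 0., 1., 1.]])

# Nuclear norm = sum of singular values (Schatten 1-norm).
print(np.linalg.norm(M, 'nuc'), 2 * np.sqrt(3))   # both ≈ 3.4641
print(np.linalg.norm(np.abs(M), 'nuc'))           # 4.0 > ||M||_*
```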

As far as we are aware, the only known nontrivial case in which (7) holds is that of \(2\times 2\)-blocked matrices with \(\varvec{p}=(2,2)\), due to King [23], i.e., (1) with \(p=1\). There is also a general, though rather trivial, case in which (7) holds with equality, namely \(\varvec{p}=(1,1,\ldots ,1)\), for which the nuclear \(\varvec{p}\)-norm of a tensor becomes the \(L_1\)-norm of the tensor.

To conclude this section, we provide some insights on (7) although we are unable to prove any general result. We believe that (7) holds for nonnegative tensors, i.e., tensors having all nonnegative entries. Another interesting question is to find the smallest \(\tau >0\) such that

$$\begin{aligned} \Vert \mathcal {T}\Vert _{\varvec{p}_*}&=\left\| \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) \right\| _{\varvec{p}_*}\\&\ge \tau \left\| \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d} \left\| \mathcal {T}\left( \mathbb {I}^1_{j_1},\mathbb {I}^2_{j_2},\ldots ,\mathbb {I}^d_{j_d}\right) \right\| _{\varvec{p}_*}\right\| _{\varvec{p}_*} \end{aligned}$$

holds in general. Numerical evidence shows that \(\tau \) may not be a universal constant and may depend on the dimensions of the tensor space.

4 Improved bounds on tensor and matrix norms

In this section, we discuss how our norm compression inequalities improve other known bounds on tensor and matrix norms in the literature.

4.1 Norm compression inequalities for matrices

As mentioned in Sect. 1, norm compression inequalities for matrices were studied mainly for Schatten p-norms, which unfortunately do not hold for general \(r_1\times r_2\) blocks [3]. However, there do exist two related results for general \(r\times r\) blocks. For their relevance to our results, we present them using our notation. Let \(T=\mathop {\bigvee \nolimits _1}_{i=1}^r\mathop {\bigvee \nolimits _2}_{j=1}^r T_{ij}\) be a modal partition of a matrix T. One result is due to Bhatia and Kittaneh [6, Theorem 1]:

$$\begin{aligned} r^{-2}\sum _{i=1}^r\sum _{j=1}^r{\Vert T_{ij}\Vert _\sigma }^2 \le {\Vert T\Vert _\sigma }^2 \le \sum _{i=1}^r\sum _{j=1}^r{\Vert T_{ij}\Vert _\sigma }^2, \end{aligned}$$
(8)

and the other is due to Bebendorf [4, Lemma 2.14]:

$$\begin{aligned} \Vert T\Vert _\sigma \le \left( \max _{1\le i\le r}\sum _{j=1}^r\Vert T_{ij}\Vert _\sigma \right) ^{\frac{1}{2}} \left( \max _{1\le j\le r}\sum _{i=1}^r\Vert T_{ij}\Vert _\sigma \right) ^{\frac{1}{2}}. \end{aligned}$$
(9)

A basic inequality between the spectral norm and the Frobenius norm of a matrix (see, e.g., [14]) states that

$$\begin{aligned} \Vert T\Vert _\sigma \le \Vert T\Vert _2. \end{aligned}$$

Therefore, according to (4) when \(d=2\) and \(\varvec{p}=(2,2)\), we get

$$\begin{aligned} \Vert T\Vert _\sigma \le \left\| \mathop {\bigvee \nolimits _1}_{i=1}^r\mathop {\bigvee \nolimits _2}_{j=1}^r \Vert T_{ij}\Vert _\sigma \right\| _\sigma \le \left\| \mathop {\bigvee \nolimits _1}_{i=1}^r\mathop {\bigvee \nolimits _2}_{j=1}^r \Vert T_{ij}\Vert _\sigma \right\| _2 = \left( \sum _{i=1}^r\sum _{j=1}^r{\Vert T_{ij}\Vert _\sigma }^2\right) ^{\frac{1}{2}}, \end{aligned}$$

providing a tighter upper bound of \(\Vert T\Vert _\sigma \) than that in (8).

To see how our inequality (4) improves the upper bound of \(\Vert T\Vert _\sigma \) in (9), we need to use a classical result due to Schur [43], which states that for any matrix \(T=(t_{ij})\in \mathbb {R}^{n_1\times n_2}\),

$$\begin{aligned} \Vert T\Vert _\sigma \le \left( \max _{1\le i\le n_1} \sum _{j=1}^{n_2} |t_{ij}|\right) ^{\frac{1}{2}} \left( \max _{1\le j\le n_2} \sum _{i=1}^{n_1} |t_{ij}|\right) ^{\frac{1}{2}}. \end{aligned}$$
(10)

By (4) when \(d=2\) and \(\varvec{p}=(2,2)\), we get

$$\begin{aligned} \Vert T\Vert _\sigma \le \left\| \mathop {\bigvee \nolimits _1}_{i=1}^r\mathop {\bigvee \nolimits _2}_{j=1}^r \Vert T_{ij}\Vert _\sigma \right\| _\sigma \le \left( \max _{1\le i\le r} \sum _{j=1}^r \Vert T_{ij}\Vert _\sigma \right) ^{\frac{1}{2}} \left( \max _{1\le j\le r} \sum _{i=1}^r \Vert T_{ij}\Vert _\sigma \right) ^{\frac{1}{2}}, \end{aligned}$$

providing a tighter upper bound of \(\Vert T\Vert _\sigma \) than that in (9).
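The following NumPy sketch (not part of the original text) compares, for a random \(4\times 4\)-blocked matrix, the exact spectral norm with the three upper bounds discussed above: the norm compression bound from (4), the square-root bound in (8), and the Bebendorf bound (9).

```python
import numpy as np

rng = np.random.default_rng(0)
r, b = 4, 5                                   # 4 x 4 blocks, each of size 5 x 5
T = rng.standard_normal((r * b, r * b))
S = np.array([[np.linalg.norm(T[i*b:(i+1)*b, j*b:(j+1)*b], 2) for j in range(r)]
              for i in range(r)])             # the norm compressed matrix

exact      = np.linalg.norm(T, 2)
compressed = np.linalg.norm(S, 2)                                 # bound from (4)
frobenius  = np.sqrt((S ** 2).sum())                              # upper bound in (8)
bebendorf  = np.sqrt(S.sum(axis=1).max() * S.sum(axis=0).max())   # bound (9)

print(exact, compressed, frobenius, bebendorf)
# One always has exact <= compressed <= min(frobenius, bebendorf).
```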

We remark that the result (10) of Schur [43] has been generalized to tensors by Hardy, Littlewood, and Pólya [15, Theorem 273], and so we can easily apply the norm compression inequality (4) to generalize inequality (9) from matrices to tensors. The details are left to interested readers.

4.2 Bounds on tensor norms via partitions

Li [29] first proposed bounds on the tensor spectral norm based on tensor partitions. Specifically, if \(\{\mathcal {T}_1,\mathcal {T}_2,\ldots ,\mathcal {T}_m\}\) is a regular partition of a tensor \(\mathcal {T}\), then

$$\begin{aligned} \left\| \left( \Vert \mathcal {T}_1\Vert _\sigma ,\Vert \mathcal {T}_2\Vert _\sigma ,\ldots ,\Vert \mathcal {T}_m\Vert _\sigma \right) \right\| _\infty \le \Vert \mathcal {T}\Vert _\sigma \le \left\| \left( \Vert \mathcal {T}_1\Vert _\sigma ,\Vert \mathcal {T}_2\Vert _\sigma ,\ldots ,\Vert \mathcal {T}_m\Vert _\sigma \right) \right\| _2. \end{aligned}$$
(11)

This result was later generalized to the most general class of partitions and to any tensor spectral p-norm by Chen and Li [9], i.e., if \(\left\{ \mathcal {T}_1,\mathcal {T}_2,\ldots ,\mathcal {T}_m\right\} \) is an arbitrary partition of a tensor \(\mathcal {T}\) and \(p\in \mathbb {P}\) with \(\frac{1}{p}+\frac{1}{q}=1\), then

$$\begin{aligned} \left\| \left( \Vert \mathcal {T}_1\Vert _{p_\sigma },\Vert \mathcal {T}_2\Vert _{p_\sigma },\ldots ,\Vert \mathcal {T}_m\Vert _{p_\sigma }\right) \right\| _\infty {\le } \Vert \mathcal {T}\Vert _{p_\sigma } {\le } \left\| \left( \Vert \mathcal {T}_1\Vert _{p_\sigma },\Vert \mathcal {T}_2\Vert _{p_\sigma },\ldots ,\Vert \mathcal {T}_m\Vert _{p_\sigma }\right) \right\| _q. \end{aligned}$$
(12)

Here we do not introduce regular partitions and arbitrary partitions of a tensor, but only mention that they are more general than modal partitions. However, modal partitions are the most commonly seen partitions in practice.

Corollary 4.1

If \(\mathcal {T}= \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d}\mathcal {T}_{j_1j_2\ldots j_d}\) is a modal partition of a tensor \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\) and \(p\in \mathbb {P}\) with \(\frac{1}{p}+\frac{1}{q}=1\), then

$$\begin{aligned} \Vert \mathcal {T}\Vert _{p_\sigma } {\le } \left\| \mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d} \left\| \mathcal {T}_{j_1j_2\ldots j_d}\right\| _{p_\sigma }\right\| _{p_\sigma } {\le } \left( \sum _{j_1=1}^{r_1}\sum _{j_2=1}^{r_2}\ldots \sum _{j_d=1}^{r_d} {\left\| \mathcal {T}_{j_1j_2\ldots j_d}\right\| _{p_\sigma }}^{q}\right) ^{\frac{1}{q}}. \end{aligned}$$
(13)

Proof

The first inequality in (13) is exactly (4) if we let \(\varvec{p}=(p,p,\ldots ,p)\). To see why the first upper bound in (13) is tighter than the second, we only need to apply the upper bound of (12) to the norm compressed tensor \(\mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d} \left\| \mathcal {T}_{j_1j_2\ldots j_d}\right\| _{p_\sigma }\), since \(\left\{ \left\| \mathcal {T}_{j_1j_2\ldots j_d}\right\| _{p_\sigma } : 1\le j_k\le r_k,\,k=1,2,\ldots ,d\right\} \) is a modal partition of this tensor. \(\square \)

The above result obviously improves the upper bound in (12) and when \(p=2\) improves the upper bound in (11). These improvements are made by considering positions of \(\left\| \mathcal {T}_{j_1j_2\ldots j_d}\right\| _{p_\sigma }\)’s in the norm compressed tensor \(\mathop {\bigvee \nolimits _1}_{j_1=1}^{r_1}\mathop {\bigvee \nolimits _2}_{j_2=1}^{r_2}\ldots \mathop {\bigvee \nolimits _d}_{j_d=1}^{r_d} \left\| \mathcal {T}_{j_1j_2\ldots j_d}\right\| _{p_\sigma }\) rather than treating all of them as entries of a vector.

4.3 Bounds on norms of matrix unfoldings

Matrix unfoldings of a tensor have been one of the main tools in tensor computations, partially because most tensor problems are NP-hard [19] while the corresponding matrix problems are much easier. For instance, computing the spectral norm and the nuclear norm of a tensor is NP-hard [12, 17] while that of a matrix can be done in polynomial time. The relation between norms of matrix unfoldings of a tensor and norms via certain partitions of the tensor has been investigated by Chen and Li [9]. As discussed in Sect. 4.2, the norm compression inequality in Theorem 3.3 improves (12) in [9]. Consequently, bounds of the tensor spectral p-norm can be improved in various ways by applying specific partitions of the tensor. Here we discuss one particular instance of this kind to illustrate the applicability of our general approach. It could be of special interest for bounding the spectral norm of a large matrix, analogous to the discussion in Sect. 6.

Let \(\mathcal {T}\in \mathbb {R}^{n \times n \times n \times n}\) be a fourth order tensor. The traditional matrix unfolding of \(\mathcal {T}\) flattens \(\mathcal {T}\) to an \(n\times n^3\) matrix, and this can be done via four different modes. Square matrix unfoldings, i.e., unfolding \(\mathcal {T}\) to an \(n^2\times n^2\) matrix, have appeared frequently in recent work, in particular in studying the largest eigenvalue of a fourth order tensor [22, 37]. Let \(T_{13,24}\in \mathbb {R}^{n^2\times n^2}\) be the square matrix unfolding of \(\mathcal {T}\) obtained by grouping modes 1 and 3 of \(\mathcal {T}\) into the rows and modes 2 and 4 of \(\mathcal {T}\) into the columns of \(T_{13,24}\). It is well known (see, e.g., Wang et al. [48]) that \(\Vert \mathcal {T}\Vert _\sigma \le \Vert T_{13,24}\Vert _\sigma \). Let

$$\begin{aligned} \mathcal {T}_{ij}=\mathcal {T}(\{i\},\{j\},\{1,2,\ldots ,n\},\{1,2,\ldots ,n\})\in \mathbb {R}^{1\times 1\times n\times n}\quad \forall \,1\le i,j\le n, \end{aligned}$$

and so we obtain a modal partition of \(\mathcal {T}\),

$$\begin{aligned} \mathcal {T}=\mathop {\bigvee \nolimits _1}_{i=1}^n\mathop {\bigvee \nolimits _2}_{j=1}^n \mathcal {T}_{ij}. \end{aligned}$$

As \(\mathcal {T}_{ij}\) is essentially a matrix, we use \(T_{ij}\in \mathbb {R}^{n \times n}\) to denote it. An important observation is that

$$\begin{aligned} T_{13,24}=\mathop {\bigvee \nolimits _1}_{i=1}^n\mathop {\bigvee \nolimits _2}_{j=1}^n T_{ij}. \end{aligned}$$

In fact, we also have \(T_{14,23}=\mathop {\bigvee \nolimits _1}_{i=1}^n\mathop {\bigvee \nolimits _2}_{j=1}^n (T_{ij})^{\mathrm{T}}\), and other square matrix unfoldings of \(\mathcal {T}\) can be modal partitioned similarly. The above discussion can be clearly verified by the following example.

Example 4.2

Let \(\mathcal {T}=(t_{ijk\ell })\in \mathbb {R}^{2\times 2\times 2\times 2}\), and we have

$$\begin{aligned} T_{13,24}=\left( \begin{array}{c@{\quad }c|c@{\quad }c} t_{1111} &{} t_{1112} &{} t_{1211} &{} t_{1212} \\ t_{1121} &{} t_{1122} &{} t_{1221} &{} t_{1222}\\ \hline t_{2111} &{} t_{2112} &{} t_{2211} &{} t_{2212} \\ t_{2121} &{} t_{2122} &{} t_{2221} &{} t_{2222} \\ \end{array} \right)&=\left( \begin{array}{c@{\quad }c} T_{11} &{} T_{12} \\ T_{21} &{} T_{22} \\ \end{array} \right) \in \mathbb {R}^{4\times 4}, \\ T_{14,23}=\left( \begin{array}{c@{\quad }c|c@{\quad }c} t_{1111} &{} t_{1121} &{} t_{1211} &{} t_{1221} \\ t_{1112} &{} t_{1122} &{} t_{1212} &{} t_{1222}\\ \hline t_{2111} &{} t_{2121} &{} t_{2211} &{} t_{2221} \\ t_{2112} &{} t_{2122} &{} t_{2212} &{} t_{2222} \\ \end{array} \right)&=\left( \begin{array}{c@{\quad }c} (T_{11})^{\mathrm{T}} &{} (T_{12})^{\mathrm{T}} \\ (T_{21})^{\mathrm{T}} &{} (T_{22})^{\mathrm{T}} \\ \end{array} \right) \in \mathbb {R}^{4\times 4}. \end{aligned}$$

Let us now apply Theorem 3.3, and we obtain

$$\begin{aligned} \Vert \mathcal {T}\Vert _\sigma \le \Vert T_{13,24}\Vert _\sigma \le \left\| \mathop {\bigvee \nolimits _1}_{i=1}^n\mathop {\bigvee \nolimits _2}_{j=1}^n \Vert T_{ij}\Vert _\sigma \right\| _\sigma \le \left( \sum _{i=1}^n\sum _{j=1}^n {\Vert T_{ij}\Vert _\sigma }^2 \right) ^{\frac{1}{2}}. \end{aligned}$$
(14)

The bound \(\left\| \mathop {\bigvee \nolimits _1}_{i=1}^n\mathop {\bigvee \nolimits _2}_{j=1}^n \Vert T_{ij}\Vert _\sigma \right\| _\sigma \) for both \(\Vert \mathcal {T}\Vert _\sigma \) and \(\Vert T_{13,24}\Vert _\sigma \), obtained from the norm compression inequality, improves \(\left( \sum _{i=1}^n\sum _{j=1}^n {\Vert T_{ij}\Vert _\sigma }^2 \right) ^{\frac{1}{2}}\), which is an instance of [9, Theorem 4.7]. In fact, the bound \(\left\| \mathop {\bigvee \nolimits _1}_{i=1}^n\mathop {\bigvee \nolimits _2}_{j=1}^n \Vert T_{ij}\Vert _\sigma \right\| _\sigma \) can be computed in polynomial time. In practice, if a given matrix \(T_{13,24}\) is very large, (14) can be used to bound its spectral norm from below by the spectral norm of \(\mathcal {T}\) and from above by the spectral norm of the norm compressed matrix \(\mathop {\bigvee \nolimits _1}_{i=1}^n\mathop {\bigvee \nolimits _2}_{j=1}^n \Vert T_{ij}\Vert _\sigma \). We will discuss this further in Sect. 6.
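The sketch below (assuming NumPy; not part of the original text) builds the square unfolding \(T_{13,24}\) of a random fourth order tensor via a transpose and a reshape, forms the blocks \(T_{ij}=\mathcal {T}(\{i\},\{j\},:,:)\) of the modal partition, and evaluates the computable part of the chain (14).

```python
import numpy as np

n = 6
T = np.random.default_rng(0).standard_normal((n, n, n, n))

# Square unfolding T_{13,24}: modes 1 and 3 index the rows, modes 2 and 4 the columns.
T1324 = T.transpose(0, 2, 1, 3).reshape(n * n, n * n)
# Its (i,j)-th n x n block is the matrix T_ij = T[i, j, :, :].
assert np.allclose(T1324[:n, n:2 * n], T[0, 1])

# Norm compressed matrix of the modal partition {T_ij}.
S = np.array([[np.linalg.norm(T[i, j], 2) for j in range(n)] for i in range(n)])

print(np.linalg.norm(T1324, 2))   # ||T_{13,24}||_sigma, an upper bound of ||T||_sigma
print(np.linalg.norm(S, 2))       # middle bound in (14)
print(np.sqrt((S ** 2).sum()))    # right-most bound in (14); (14) guarantees the ordering
```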

To conclude this section, we remark that the variety of modal partitions of a tensor serves various specific needs, such as bounding norms of tensor unfoldings, i.e., unfolding a given tensor to a tensor of a lower order [48]. The usefulness of norm compression inequalities for tensors (Theorem 3.3) rests on the following basic fact: For any tensor unfolding (including matrix unfoldings) of a given tensor, there exists a modal partition of the tensor which is also a modal partition of the tensor unfolding.

5 Extremal ratio between the spectral and Frobenius norms

Given a tensor space \(\mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\), the extremal ratio between the spectral norm and the Frobenius norm is defined as

$$\begin{aligned} \tau (\mathbb {R}^{n_1\times n_2\times \cdots \times n_d}):=\min \left\{ \frac{\Vert \mathcal {T}\Vert _\sigma }{\Vert \mathcal {T}\Vert _2}:\mathcal {T}\ne 0,\,\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d} \right\} . \end{aligned}$$
(15)

This natural quantity is easy to determine when \(d=2\) (matrices) but becomes very difficult when \(d\ge 3\). The concept was proposed by Qi [40], known as the best rank-one approximation ratio of a tensor space, although Kühn and Peetre [28] studied this ratio much earlier for some small \(n_i\)'s when \(d=3\). For a nonzero tensor \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\), consider the following projection problem:

$$\begin{aligned} \max \left\{ \langle \mathcal {T},\mathcal {X}\rangle : \Vert \mathcal {X}\Vert _2=1,\,\mathcal {X}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\right\} =\Vert \mathcal {T}\Vert _2, \end{aligned}$$
(16)

which obviously attains its optimum at \(\mathcal {X}=\frac{\mathcal {T}}{\Vert \mathcal {T}\Vert _2}\) by the Cauchy–Schwarz inequality. However, if we consider the projection onto rank-one tensors, then

$$\begin{aligned} \max \left\{ \langle \mathcal {T},\mathcal {X}\rangle : \Vert \mathcal {X}\Vert _2=1,\,\mathrm{rank}\,(\mathcal {X})=1,\,\mathcal {X}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\right\} =\Vert \mathcal {T}\Vert _\sigma . \end{aligned}$$
(17)

Therefore, viewed as optimization problems, (16) is a convex relaxation of (17) obtained by dropping the rank-one constraint. One is often interested in the gap of this relaxation; the worst-case ratio between the two optimal values is exactly the extremal ratio \(\tau (\mathbb {R}^{n_1\times n_2\times \cdots \times n_d})\). In the case of matrices, similar problems are known to be equivalence constants for matrix norms; see, e.g., Tonge [45]. In this sense, the problem is to determine the largest \(\tau >0\) such that \(\tau \Vert \mathcal {T}\Vert _2 \le \Vert \mathcal {T}\Vert _\sigma \) for all \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\). On the other hand, the ratio \(\frac{\Vert \mathcal {T}\Vert _\sigma }{\Vert \mathcal {T}\Vert _2}\) is at most one since \(\Vert \mathcal {T}\Vert _\sigma \le \Vert \mathcal {T}\Vert _2\), and equality holds if and only if \(\mathcal {T}\) is rank-one.
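For matrices (\(d=2\)), where the spectral norm is just the largest singular value, the ratio \(\frac{\Vert T\Vert _\sigma }{\Vert T\Vert _2}\) can be computed directly. The NumPy sketch below (not part of the original text) illustrates that it lies between \(\frac{1}{\sqrt{n_1}}\), attained when all \(n_1\) singular values are equal, and 1, attained by rank-one matrices, in line with the value \(\tau (\mathbb {R}^{n_1\times n_2})=\frac{1}{\sqrt{n_1}}\) recalled below.

```python
import numpy as np

def ratio(T):
    """||T||_sigma / ||T||_2 for a matrix T."""
    return np.linalg.norm(T, 2) / np.linalg.norm(T)

rng = np.random.default_rng(0)
n1, n2 = 4, 7
print(ratio(rng.standard_normal((n1, n2))))        # somewhere in (1/sqrt(n1), 1)
print(ratio(np.eye(n1, n2)), 1 / np.sqrt(n1))      # all singular values equal: ratio = 1/sqrt(n1)
print(ratio(np.outer(rng.standard_normal(n1),
                     rng.standard_normal(n2))))    # rank-one: ratio = 1
```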

As an application of this extremal ratio, one obtains an interpretation as a perturbed steepest descent method and can deduce a rate of convergence using bounds of the extremal ratio (see [46, Theorem 2] for details). Since the time it was posed as a conjecture [40, Sect. 7], determining this extremal ratio for a general tensor space has been a challenging task. Without loss of generality, we assume that \(2\le n_1\le n_2\le \cdots \le n_d\) holds in this section. Some known values of \(\tau (\mathbb {R}^{n_1\times n_2\times \cdots \times n_d})\) are: \(\tau (\mathbb {R}^{n_1\times n_2})=\frac{1}{\sqrt{n_1}}\), \(\tau (\mathbb {R}^{2\times n_2\times n_3})=\frac{1}{\sqrt{2n_2}}\) for even \(n_2\) [26], and \(\tau (\mathbb {R}^{n_1\times n_2\times \cdots \times n_d})= \frac{1}{\sqrt{n_1n_2\ldots n_{d-1}}}\) for \(n_d=2,4,8\) [31]. Note that

$$\begin{aligned} \tau \left( \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\right) \ge \frac{1}{\sqrt{n_1n_2\ldots n_{d-1}}} \end{aligned}$$
(18)

is a naive lower bound, but it is attained in various cases; see [31] for recent developments on this topic. In the space of symmetric tensors, i.e.,

$$\begin{aligned} \tau (\mathbb {R}_{\mathrm{sym}}^{n^d}):=\min \left\{ \frac{\Vert \mathcal {T}\Vert _\sigma }{\Vert \mathcal {T}\Vert _2}:\mathcal {T}\ne 0,\,\mathcal {T}\in \mathbb {R}_{\mathrm{sym}}^{n^d} \right\} , \end{aligned}$$

where \(\mathbb {R}_{\mathrm{sym}}^{n^d}\) denotes the set of d-th order symmetric real tensors, this extremal ratio has also been studied in [1, 26, 40, 49]. In particular, it was recently shown in [1] that the naive bound (18) is tight for symmetric tensors only if \(n_d=2\). One may also consider the extremal ratio for nonnegative tensors, i.e.,

$$\begin{aligned} \tau \left( \mathbb {R}_+^{n_1\times n_2\times \cdots \times n_d}\right) :=\min \left\{ \frac{\Vert \mathcal {T}\Vert _\sigma }{\Vert \mathcal {T}\Vert _2}:\mathcal {T}\ne 0,\quad \mathcal {T}\in \mathbb {R}_+^{n_1\times n_2\times \cdots \times n_d} \right\} , \end{aligned}$$

where \(\mathbb {R}_+^{n_1\times n_2\times \cdots \times n_d}\) denotes the set of d-th order nonnegative tensors.

In this section, we provide a general tool for investigating upper bounds of this extremal ratio via the norm compression inequality in Theorem 3.3. According to (15), the value of \(\frac{\Vert \mathcal {T}\Vert _\sigma }{\Vert \mathcal {T}\Vert _2}\) for any nonzero \(\mathcal {T}\) provides an upper bound of the extremal ratio. By recursively constructing modal partitions, we obtain the main result of this section.

Theorem 5.1

If \(\mathcal {T}\in \mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\) is a nonnegative tensor with \(\frac{\Vert \mathcal {T}\Vert _\sigma }{\Vert \mathcal {T}\Vert _2} = \tau \), then there exists a nonnegative tensor \(\mathcal {T}_m\in \mathbb {R}^{{n_1}^m\times {n_2}^m\times \cdots \times {n_d}^m}\) satisfying \(\frac{\Vert \mathcal {T}_m\Vert _\sigma }{\Vert \mathcal {T}_m\Vert _2} = \tau ^m\) for any \(m\in \mathbb {N}\). If \(\mathcal {T}\) is further symmetric, then \(\mathcal {T}_m\) is also symmetric.

Proof

Let \(\mathcal {T}=(t_{i_1i_2\ldots i_d})\) where the entry \(t_{i_1i_2\ldots i_d}\ge 0\) for all \(1\le i_k\le n_k\), \(k=1,2,\ldots ,d\), in other words, \(\mathcal {T}=\mathop {\bigvee \nolimits _1}_{i_1=1}^{n_1}\mathop {\bigvee \nolimits _2}_{i_2=1}^{n_2}\ldots \mathop {\bigvee \nolimits _d}_{i_d=1}^{n_d} t_{i_1i_2\ldots i_d}\). \(\mathcal {T}_m\) is defined recursively as follows,

$$\begin{aligned} \mathcal {T}_1:=\mathcal {T},\,\mathcal {T}_{m+1}:=\mathop {\bigvee \nolimits _1}_{i_1=1}^{n_1}\mathop {\bigvee \nolimits _2}_{i_2=1}^{n_2}\ldots \mathop {\bigvee \nolimits _d}_{i_d=1}^{n_d} (t_{i_1i_2\ldots i_d}\mathcal {T}_m) \quad m\ge 1. \end{aligned}$$

It is easy to see that \(\mathcal {T}_m\in \mathbb {R}^{{n_1}^m\times {n_2}^m\times \cdots \times {n_d}^m}\) is nonnegative, and is symmetric if \(\mathcal {T}\) is symmetric. The Frobenius norm of \(\mathcal {T}_m\) satisfies

$$\begin{aligned} {\Vert \mathcal {T}_{m+1}\Vert _2}^2= & {} \sum _{i_1=1}^{n_1}\sum _{i_2=1}^{n_2}\ldots \sum _{i_d=1}^{n_d} {\Vert t_{i_1i_2\ldots i_d} \mathcal {T}_m\Vert _2}^2 \\= & {} {\Vert \mathcal {T}_m\Vert _2}^2 \sum _{i_1=1}^{n_1}\sum _{i_2=1}^{n_2}\ldots \sum _{i_d=1}^{n_d} {t_{i_1i_2\ldots i_d}}^2 \\= & {} {\Vert \mathcal {T}_m\Vert _2}^2 {\Vert \mathcal {T}\Vert _2}^2, \end{aligned}$$

and so we get

$$\begin{aligned} \Vert \mathcal {T}_m\Vert _2 ={\Vert \mathcal {T}\Vert _2}^m. \end{aligned}$$

For the spectral norm of \(\mathcal {T}_m\), by noticing \(t_{i_1i_2\ldots i_d}\ge 0\) and applying (4) in Theorem 3.3,

$$\begin{aligned} \Vert \mathcal {T}_{m+1}\Vert _\sigma\le & {} \left\| \mathop {\bigvee \nolimits _1}_{i_1=1}^{n_1}\mathop {\bigvee \nolimits _2}_{i_2=1}^{n_2}\ldots \mathop {\bigvee \nolimits _d}_{i_d=1}^{n_d} \Vert t_{i_1i_2\ldots i_d}\mathcal {T}_m\Vert _\sigma \right\| _\sigma \\= & {} \Vert \mathcal {T}_m\Vert _\sigma \left\| \mathop {\bigvee \nolimits _1}_{i_1=1}^{n_1}\mathop {\bigvee \nolimits _2}_{i_2=1}^{n_2}\ldots \mathop {\bigvee \nolimits _d}_{i_d=1}^{n_d} t_{i_1i_2\ldots i_d} \right\| _\sigma \\= & {} \Vert \mathcal {T}_m\Vert _\sigma \Vert \mathcal {T}\Vert _\sigma , \end{aligned}$$

and we obtain

$$\begin{aligned} \Vert \mathcal {T}_m\Vert _\sigma \le {\Vert \mathcal {T}\Vert _\sigma }^m. \end{aligned}$$
(19)

On the other hand, let \(\mathcal {X}=\varvec{x}^1\otimes \varvec{x}^2\otimes \ldots \otimes \varvec{x}^d\) with \(\varvec{x}^k\in \mathbb {R}^{n_k}\) and \(\Vert \varvec{x}^k\Vert _2=1\) for \(k=1,2,\ldots ,d\), such that \(\langle \mathcal {T},\mathcal {X}\rangle =\Vert \mathcal {T}\Vert _\sigma \), i.e., \(\mathcal {X}\) is a best rank-one approximation of \(\mathcal {T}\). Since \(\mathcal {T}\) is nonnegative, \(\mathcal {X}\) can be chosen to be nonnegative, as replacing each \(\varvec{x}^k\) by its entry-wise absolute value does not decrease the inner product. Construct \(\mathcal {X}_m\) recursively in the same way as \(\mathcal {T}_m\), as follows,

$$\begin{aligned} \mathcal {X}_1:=\mathcal {X},\,\mathcal {X}_{m+1}:=\mathop {\bigvee \nolimits _1}_{i_1=1}^{n_1}\mathop {\bigvee \nolimits _2}_{i_2=1}^{n_2}\ldots \mathop {\bigvee \nolimits _d}_{i_d=1}^{n_d} (x_{i_1i_2\ldots i_d}\mathcal {X}_m) \quad m\ge 1. \end{aligned}$$

Similar to \(\mathcal {T}_m\), we have \(\Vert \mathcal {X}_m\Vert _2={\Vert \mathcal {X}\Vert _2}^m=1\). Moreover, \(\mathcal {X}_m\) is actually rank-one. To see why, we notice that \(\mathcal {X}_1\) is rank-one and so \(x_{i_1i_2\ldots i_d}=\prod _{k=1}^d x^k_{i_k}\). If \(\mathcal {X}_m\) is rank-one, say \(\mathcal {X}_m=\varvec{y}^1\otimes \varvec{y}^2\otimes \ldots \otimes \varvec{y}^d\), then

$$\begin{aligned} \mathcal {X}_{m+1}&= \mathop {\bigvee \nolimits _1}_{i_1=1}^{n_1}\mathop {\bigvee \nolimits _2}_{i_2=1}^{n_2}\ldots \mathop {\bigvee \nolimits _d}_{i_d=1}^{n_d} \left( \left( \prod _{k=1}^d x^k_{i_k}\right) \varvec{y}^1\otimes \varvec{y}^2\otimes \ldots \otimes \varvec{y}^d\right) \\&= \left( {\bigvee }_{i_1=1}^{n_1} (x^1_{i_1}\varvec{y}^1)\right) \otimes \left( {\bigvee }_{i_2=1}^{n_2} (x^2_{i_2}\varvec{y}^2)\right) \otimes \ldots \otimes \left( {\bigvee }_{i_d=1}^{n_d} (x^d_{i_d}\varvec{y}^d) \right) \end{aligned}$$

is rank-one. By the constructions of \(\mathcal {T}_m\) and \(\mathcal {X}_m\), we have

$$\begin{aligned} \langle \mathcal {T}_{m+1},\mathcal {X}_{m+1}\rangle&= \sum _{i_1=1}^{n_1}\sum _{i_2=1}^{n_2}\ldots \sum _{i_d=1}^{n_d} \langle t_{i_1i_2\ldots i_d}\mathcal {T}_m, x_{i_1i_2\ldots i_d}\mathcal {X}_m\rangle \\&= \langle \mathcal {T}_m, \mathcal {X}_m\rangle \sum _{i_1=1}^{n_1}\sum _{i_2=1}^{n_2}\ldots \sum _{i_d=1}^{n_d} (t_{i_1i_2\ldots i_d} \cdot x_{i_1i_2\ldots i_d})\\&= \langle \mathcal {T}_m, \mathcal {X}_m\rangle \langle \mathcal {T}, \mathcal {X}\rangle , \end{aligned}$$

implying that \(\langle \mathcal {T}_m,\mathcal {X}_m\rangle = \langle \mathcal {T}, \mathcal {X}\rangle ^m\). Therefore, since \(\mathcal {X}_m\) is rank-one and \(\Vert \mathcal {X}_m\Vert _2=1\),

$$\begin{aligned} \Vert \mathcal {T}_m\Vert _\sigma \ge \langle \mathcal {T}_m,\mathcal {X}_m\rangle = \langle \mathcal {T}, \mathcal {X}\rangle ^m = {\Vert \mathcal {T}\Vert _\sigma }^m, \end{aligned}$$

which combined with (19) leads to

$$\begin{aligned} \Vert \mathcal {T}_m\Vert _\sigma ={\Vert \mathcal {T}\Vert _\sigma }^m. \end{aligned}$$

Finally, the constructed \(\mathcal {T}_m\) satisfies

$$\begin{aligned} \frac{\Vert \mathcal {T}_m\Vert _\sigma }{\Vert \mathcal {T}_m\Vert _2} = \frac{{\Vert \mathcal {T}\Vert _\sigma }^m}{{\Vert \mathcal {T}\Vert _2}^m}=\tau ^m \end{aligned}$$

for any \(m\in \mathbb {N}\). \(\square \)

We remark that the construction of \(\mathcal {T}_m\) in the proof of Theorem 5.1 is essentially the Kronecker product of m copies of \(\mathcal {T}\). For the sake of simplicity, we do not introduce more notation at this point.
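A minimal sketch of this Kronecker-type construction for a third order tensor (assuming NumPy; not part of the original text): each step replaces every entry \(t_{i_1i_2i_3}\) of \(\mathcal {T}\) by the block \(t_{i_1i_2i_3}\mathcal {T}_m\), and the Frobenius norm identity \(\Vert \mathcal {T}_m\Vert _2=\Vert \mathcal {T}\Vert _2^m\) used in the proof can be checked directly.

```python
import numpy as np

def tensor_kron(A, B):
    """Tensor Kronecker product: block (i1,i2,i3) of the result is A[i1,i2,i3] * B."""
    (a1, a2, a3), (b1, b2, b3) = A.shape, B.shape
    return np.einsum('ijk,pqr->ipjqkr', A, B).reshape(a1 * b1, a2 * b2, a3 * b3)

# The tensor U of Example 5.2 below: u_112 = u_121 = u_211 = 1.
U = np.zeros((2, 2, 2))
U[0, 0, 1] = U[0, 1, 0] = U[1, 0, 0] = 1.0

Um = U
for _ in range(3):                    # build U_4 (Kronecker product of four copies of U)
    Um = tensor_kron(U, Um)

print(np.isclose(np.linalg.norm(Um), np.linalg.norm(U) ** 4))   # ||U_m||_2 = ||U||_2^m
# By Theorem 5.1, ||U_4||_sigma / ||U_4||_2 = (2/3)^4, although ||U_4||_sigma is not
# computed here (it is NP-hard to compute in general).
```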

Let us now apply Theorem 5.1 to get an improved upper bound for both \(\tau (\mathbb {R}_+^{n\times n\times n})\) and \(\tau (\mathbb {R}_{\mathrm{sym}}^{n\times n\times n})\). First we introduce the following example.

Example 5.2

Let \(\mathcal {U}\in \mathbb {R}^{2\times 2\times 2}\) be the nonnegative symmetric tensor with \(u_{112}=u_{121}=u_{211}=1\) and all other entries zero. We have \(\Vert \mathcal {U}\Vert _\sigma =\frac{2}{\sqrt{3}}\), \(\Vert \mathcal {U}\Vert _2=\sqrt{3}\), and \(\frac{\Vert \mathcal {U}\Vert _\sigma }{\Vert \mathcal {U}\Vert _2}=\frac{2}{3}\).

The value of \(\Vert \mathcal {U}\Vert _\sigma \) is easily obtained using the fact that a best rank-one approximation of a symmetric tensor can be attained at a symmetric rank-one tensor; see, e.g., [8, 49]. Since \(\mathcal {U}\) is symmetric, we have

$$\begin{aligned} \Vert \mathcal {U}\Vert _\sigma= & {} \max _{\Vert \varvec{x}^k\Vert _2=1}\left\langle \mathcal {U},\varvec{x}^1\otimes \varvec{x}^2\otimes \varvec{x}^3\right\rangle =\max _{\Vert \varvec{y}\Vert _2=1}\left\langle \mathcal {U},\varvec{y}\otimes \varvec{y}\otimes \varvec{y}\right\rangle =\max _{\Vert \varvec{y}\Vert _2=1}3{y_1}^2y_2 \\= & {} \max _{-1\le y_2\le 1}3\left( 1-{y_2}^2\right) y_2 =\frac{2}{\sqrt{3}}. \end{aligned}$$
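The maximum in the last step is attained at \(y_2=1/\sqrt{3}\) (setting the derivative \(3-9{y_2}^2\) to zero), which gives the value \(\frac{2}{\sqrt{3}}\). A quick numerical check of this one-dimensional maximization (our own sketch):

```python
import numpy as np

# Grid check of the one-dimensional maximization above.
y2 = np.linspace(-1.0, 1.0, 1_000_001)
print(np.max(3 * (1 - y2**2) * y2), 2 / np.sqrt(3))   # both approximately 1.1547
```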

Based on some numerical evidence, we believe that \(\frac{\Vert \mathcal {T}\Vert _\sigma }{\Vert \mathcal {T}\Vert _2}\ge \frac{2}{3}\) for any nonnegative tensor \(\mathcal {T}\in \mathbb {R}_+^{2\times 2\times 2}\), i.e., \(\tau (\mathbb {R}_+^{2\times 2\times 2})=\frac{2}{3}\), although we are unable to verify this theoretically.
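The kind of numerical evidence we have in mind can be reproduced with a simple heuristic such as the following sketch (ours, not the experiments reported in this paper), which lower-bounds the spectral norm of random nonnegative \(2\times 2\times 2\) tensors by the higher-order power method with random restarts; the name spectral_norm_lb is hypothetical.

```python
import numpy as np

def spectral_norm_lb(T, iters=100, starts=10, seed=0):
    """Heuristic lower bound on ||T||_sigma of a 3-way tensor via the
    higher-order power method with several random nonnegative restarts."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(starts):
        x, y, z = (rng.random(s) + 1e-3 for s in T.shape)   # positive starts
        for _ in range(iters):
            x = np.einsum('ijk,j,k->i', T, y, z); x /= np.linalg.norm(x)
            y = np.einsum('ijk,i,k->j', T, x, z); y /= np.linalg.norm(y)
            z = np.einsum('ijk,i,j->k', T, x, y); z /= np.linalg.norm(z)
        best = max(best, float(np.einsum('ijk,i,j,k->', T, x, y, z)))
    return best

# Ratios ||T||_sigma / ||T||_2 for random nonnegative 2x2x2 tensors; if the
# conjectured value 2/3 is correct, the printed minimum should stay around or
# above 2/3 (up to the heuristic nature of the lower bound).
rng = np.random.default_rng(1)
ratios = [spectral_norm_lb(T) / np.linalg.norm(T)
          for T in (rng.random((2, 2, 2)) for _ in range(50))]
print(min(ratios), 2 / 3)
```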

Theorem 5.3

It holds that

$$\begin{aligned} \frac{1}{n}\le {\tau \left( \mathbb {R}_+^{n\times n\times n}\right) , \tau \left( \mathbb {R}_{\mathrm{sym}}^{n\times n\times n}\right) }\le \frac{1.5}{n^{\frac{\ln 1.5}{\ln 2}}} \le O\left( \frac{1}{n^{0.584}}\right) . \end{aligned}$$

Proof

The lower bound is listed for reference only; it is the naive bound in (18) but currently the best known. For the upper bound, let \(\mathcal {U}\in \mathbb {R}^{2\times 2\times 2}\) be the tensor in Example 5.2 and let \(m\in \mathbb {N}\) be such that \(2^m\le n < 2^{m+1}\). By Theorem 5.1, there exists \(\mathcal {U}_m\in \mathbb {R}^{2^m\times 2^m\times 2^m}\), nonnegative and symmetric, such that

$$\begin{aligned} \tau (\mathbb {R}_{\mathrm{sym}}^{2^m\times 2^m\times 2^m})\le & {} \frac{\Vert \mathcal {U}_m\Vert _\sigma }{\Vert \mathcal {U}_m\Vert _2} =\left( \frac{\Vert \mathcal {U}\Vert _\sigma }{\Vert \mathcal {U}\Vert _2}\right) ^m=\left( \frac{2}{3}\right) ^m=\frac{3}{2}\cdot 2^{(m+1)\log _2 \frac{2}{3}} \\< & {} \frac{3}{2}\cdot n^{\log _2 \frac{2}{3}} = \frac{1.5}{n^{\frac{\ln 1.5}{\ln 2}}}. \end{aligned}$$

Finally, by the obvious fact that \(\tau (\mathbb {R}_{\mathrm{sym}}^{n\times n\times n})\) is nonincreasing as n increases, we get

$$\begin{aligned} \tau \left( \mathbb {R}_{\mathrm{sym}}^{n\times n\times n}\right) \le \tau \left( \mathbb {R}_{\mathrm{sym}}^{2^m\times 2^m\times 2^m}\right) \le \frac{1.5}{n^{\frac{\ln 1.5}{\ln 2}}}. \end{aligned}$$

The above argument obviously shows the same upper bound for \(\tau (\mathbb {R}_+^{n\times n\times n})\) as well. \(\square \)

The upper bound in Theorem 5.3 improves the existing one \(O\left( \frac{1}{n^{0.5}}\right) \) and, to the best of our knowledge, remains the best known for general \(\mathbb {R}_+^{n\times n\times n}\) and \(\mathbb {R}_{\mathrm{sym}}^{n\times n\times n}\). It may be possible to investigate other small-size nonnegative tensors, say \(\mathcal {T}\in \mathbb {R}_+^{3\times 3\times 3}\), to obtain new upper bounds of \(\tau (\mathbb {R}_+^{n\times n\times n})\) using Theorem 5.1. We are not sure whether this can beat the bound in Theorem 5.3, especially since finding \(\tau (\mathbb {R}_+^{3\times 3\times 3})\) is already hard. In general, this tool can certainly be used to find better upper bounds of the extremal ratio for fourth or higher order tensors.

6 Estimating the spectral norm

A straightforward application of norm compression inequalities is to estimate a norm of a large tensor via norms of small subtensors in a modal partition. This is because computing the spectral norm of a tensor is NP-hard in general, while computing that of small tensors can be done quite efficiently and accurately. Even for a matrix, computing the spectral norm can be costly when the size gets very large. Estimating matrix norms is an important topic in matrix computations; most methods in the literature are based on random sampling [35] and the power method [27]. In this section, we conduct a preliminary study of the norm compression approach, using the matrix spectral norm as an example, to give a picture of how fast the method runs theoretically and how good the approximation is numerically.

As is well known, computing the spectral norm of an \(n\times n\) matrix requires \(O(n^3)\) operations, which is essentially the complexity of the singular value decomposition. For simplicity, we do not consider numerical errors, which could introduce a factor of order \(\log \frac{1}{\epsilon }\). Suppose that we have a (large) matrix \(T\in \mathbb {R}^{n\times n}\) and need to compute some norm \(\Vert T\Vert \). The exact computation requires \(\alpha n^s\) operations, where \(\alpha >0\) and \(s>2\) are two universal constants. The following algorithm provides an estimate of \(\Vert T\Vert \) based on a norm compression hierarchy.

Algorithm 6.1

Approximating a matrix norm via a norm compression hierarchy.

  • Input: A matrix \(T\in \mathbb {R}^{n\times n}\), a level of hierarchy \(m\ge 1\), and a factorization of \(n=\prod _{k=1}^m n_k\) with \(n_k\ge 2\) for \(k=1,2,\ldots ,m\).

    1. Set \(T^0=T\) and \(\ell =1\).

    2. Denote

      $$\begin{aligned} \mathbb {I}_j^\ell =\{(j-1)n_{m-\ell +1}+1,(j-1)n_{m-\ell +1}+2,\ldots , j n_{m-\ell +1}\} \quad \forall \, 1\le j \le \prod _{k=1}^{m-\ell }n_k, \end{aligned}$$

      and compute the level-\(\ell \) norm compression matrix \(T^\ell \in \mathbb {R}^{\prod _{k=1}^{m-\ell }n_k\times \prod _{k=1}^{m-\ell }n_k}\) whose \((i,j)\)-th entry is

      $$\begin{aligned} t_{ij}^\ell = \left\| T^{\ell -1} \left( \mathbb {I}_i^\ell , \mathbb {I}_j^\ell \right) \right\| \quad \forall \,1\le i,j \le \prod _{k=1}^{m-\ell }n_k. \end{aligned}$$
    3. If \(\ell =m\), stop; otherwise increase \(\ell \) by one and go to the previous step.

  • Output: An approximation \(T^m\in \mathbb {R}\) of \(\Vert T\Vert \).
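For concreteness, the following is a minimal Python sketch of Algorithm 6.1 for the matrix spectral norm (our own illustration), assuming equal block sizes within each level as described above; the function name norm_compression and the test matrix are ours.

```python
import numpy as np

def norm_compression(T, factors, norm=lambda A: np.linalg.norm(A, 2)):
    """Algorithm 6.1 sketch: repeatedly replace blocks by their norms.

    T       : an n-by-n matrix with n = prod(factors)
    factors : (n_1, ..., n_m); level l uses blocks of size n_{m-l+1}
    Returns the scalar T^m.
    """
    n = T.shape[0]
    assert T.shape == (n, n) and n == int(np.prod(factors))
    for nk in reversed(factors):            # levels l = 1, ..., m
        q = T.shape[0] // nk                # dimension of the compressed matrix
        C = np.empty((q, q))
        for i in range(q):
            for j in range(q):
                C[i, j] = norm(T[i * nk:(i + 1) * nk, j * nk:(j + 1) * nk])
        T = C
    return float(T[0, 0])

# Usage: a 2-level hierarchy for a 1000 x 1000 nonnegative matrix, (n_1, n_2) = (100, 10).
A = np.abs(np.random.default_rng(0).standard_normal((1000, 1000)))
print(np.linalg.norm(A, 2), norm_compression(A, (100, 10)))
```

For the spectral norm, the returned scalar upper-bounds \(\Vert T\Vert _\sigma \), as stated in Proposition 6.2 below.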

Remark that the level-m norm compression matrix \(T^m\) is actually one-by-one, i.e., a scalar. It is not difficult to see that \(T^\ell \) is a norm compression matrix of \(T^{\ell -1}\) for \(\ell =1,2,\ldots ,m\). Therefore, according to Theorem 3.3, we have the following property for Algorithm 6.1.

Proposition 6.2

For the matrix spectral \(\varvec{p}\)-norm where \(\varvec{p}\in \mathbb {P}^2\), the matrices computed in Algorithm 6.1 satisfy

$$\begin{aligned} \Vert T\Vert _{\varvec{p}_\sigma } \le \Vert T^1\Vert _{\varvec{p}_\sigma }\le \cdots \le \Vert T^{m-1}\Vert _{\varvec{p}_\sigma } = T^m. \end{aligned}$$

Obviously, the higher the hierarchy, the less accurate the approximation is. However, the approximation quality is closely related to how the dimension n is factorized. Let us first study how to choose the \(n_k\)'s properly in order to optimize the computational complexity of Algorithm 6.1. The exact computation requires \(\alpha n^s\) operations for an \(n\times n\) matrix. The dimension of \(T^\ell \) is \(\prod _{k=1}^{m-\ell }n_k\times \prod _{k=1}^{m-\ell }n_k\), and obtaining each entry of \(T^\ell \) requires computing the norm of an \(n_{m-\ell +1}\times n_{m-\ell +1}\) matrix, which can be done in \(\alpha {n_{m-\ell +1}}^s\) operations. Therefore, the complexity of Algorithm 6.1 is

$$\begin{aligned} \sum _{\ell =1}^m \left( \alpha \,{n_{m-\ell +1}}^s\left( \prod _{k=1}^{m-\ell }n_k\right) ^2\right)&=\alpha \sum _{\ell =1}^m \left( {n_{m-\ell +1}}^{s-2}\prod _{k=1}^{m-\ell +1}{n_k}^2\right) \nonumber \\&=\alpha \sum _{\ell =1}^m \left( {n_\ell }^{s-2}\prod _{k=1}^\ell {n_k}^2\right) . \end{aligned}$$
(20)

In particular, if all the \(n_k\)'s are the same, we have \(n={n_1}^m\), and the complexity in (20) is dominated by the highest term in the summation, i.e., \(O\left( {n_1}^{s-2+2m}\right) = O\left( n^{\frac{2m+s-2}{m}}\right) \). As m increases, this complexity approaches \(O(n^{2+\epsilon })\) for any \(s>2\) and any \(\epsilon >0\).

Proposition 6.3

If the complexity to compute a norm of a general \(n\times n\) matrix is \(O(n^s)\) for some \(s>2\), then the complexity of Algorithm 6.1 is \(O\left( n^{\frac{2m+s-2}{m}}\right) \) if \(n_1=n_2=\cdots =n_m\).

For a fixed level of hierarchy m, in order to minimize the complexity of Algorithm 6.1, one should make every term in the summation in (20) of the same order of magnitude. To this end, we require, for any \(1\le \ell \le m-1\),

$$\begin{aligned} {n_\ell }^{s-2}\prod _{k=1}^\ell {n_k}^2={n_{\ell +1}}^{s-2}\prod _{k=1}^{\ell +1}{n_k}^2 \Longrightarrow n_{\ell +1}={n_{\ell }}^{\frac{s-2}{s}} \Longrightarrow n_{\ell }={n_1}^{\left( \frac{s-2}{s}\right) ^{\ell -1}}. \end{aligned}$$

This leads to

$$\begin{aligned} n=\prod _{k=1}^m n_k=\prod _{k=1}^m {n_1}^{\left( \frac{s-2}{s}\right) ^{k-1}}={n_1}^{\frac{s}{2}\left( 1-\left( \frac{s-2}{s}\right) ^m\right) }. \end{aligned}$$

Plugging the expressions of n and the \(n_k\)'s in terms of \(n_1\) into (20), we obtain the optimal complexity

$$\begin{aligned} \alpha \sum _{{\ell }=1}^m \left( {n_1}^{(s-2)\left( \frac{s-2}{s}\right) ^{{\ell }-1}} \prod _{k=1}^{{\ell }} {n_1}^{2\left( \frac{s-2}{s}\right) ^{k-1}}\right) =\alpha m{n_1}^s =\alpha m n^{\frac{2}{1-\left( \frac{s-2}{s}\right) ^m}}. \end{aligned}$$

Summarizing the above discussion, we have the following result.

Theorem 6.4

If the complexity to compute a norm of a general \(n\times n\) matrix is \(O(n^s)\) for some \(s>2\), then the complexity of Algorithm 6.1 is \(f_m(n):=O\left( m n^{\frac{2}{1-\left( \frac{s-2}{s}\right) ^m}}\right) \) by choosing \(n_k = O\left( n^{\frac{2\left( \frac{s-2}{s}\right) ^{k-1}}{s\left( 1-\left( \frac{s-2}{s}\right) ^m\right) } }\right) \) for \(k=1,2,\ldots ,m\), i.e., \(n=\prod _{k=1}^m n_k\) with \(n_{k+1}=O\left( {n_k}^{\frac{s-2}{s}}\right) \).

For any \(s>2\), this complexity can be made \(O(n^{2+\epsilon })\) even for small m. For instance, for the matrix spectral norm, where \(s=3\), the complexity \(f_m(n)\) and the corresponding best factorization of n for some small values of m are as follows:

$$\begin{aligned} \begin{array}{l@{\quad }l} f_1(n)=O(n^3) &{} \text{ when } n=n^1,\\ f_2(n)=O(n^{2.25})&{} \text{ when } n=n^{0.75}\cdot n^{0.25},\\ f_3(n)=O(n^{2.077})&{} \text{ when } n=n^{0.692}\cdot n^{0.231}\cdot n^{0.077},\\ f_4(n)=O(n^{2.025})&{} \text{ when } n=n^{0.675}\cdot n^{0.225}\cdot n^{0.075}\cdot n^{0.025}. \end{array} \end{aligned}$$
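These exponents follow directly from the formulas in Theorem 6.4; the following short script (ours) reproduces them for \(s=3\):

```python
# Exponent of f_m(n) and the factor exponents of n_1, ..., n_m from Theorem 6.4.
s = 3
r = (s - 2) / s
for m in (1, 2, 3, 4):
    total = 2 / (1 - r**m)
    parts = [2 * r**(k - 1) / (s * (1 - r**m)) for k in range(1, m + 1)]
    print(m, round(total, 3), [round(p, 3) for p in parts])
# 1 3.0   [1.0]
# 2 2.25  [0.75, 0.25]
# 3 2.077 [0.692, 0.231, 0.077]
# 4 2.025 [0.675, 0.225, 0.075, 0.025]
```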

We emphasize that the main purpose of Algorithm 6.1 is not to replace or compete with existing methods for matrix norm computation. Indeed, it cannot work without such methods, as it needs them to compute the norms of the submatrices. The role of Algorithm 6.1 is to speed up these methods, as illustrated in the above derivation of complexities, while at the same time maintaining good approximation quality, as shown below numerically.

We now conduct some preliminary numerical tests to examine the performance of Algorithm 6.1. In the first set of tests, we choose \(n\times n\) matrices with \(n=10^4\), a dimension that is reasonably large yet computationally tractable by MATLAB on a personal computer. According to Algorithm 6.1, if the level m is set to one, the output is simply the spectral norm of the original matrix. Using the optimal complexity setting in Theorem 6.4, for \(m=2\) we should set \((n_1,n_2)=(1000,10)\), and for \(m=3\) we should set \((n_1,n_2,n_3)=(625,8,2)\approx (586,8.39,2.03)\) in order to make \(n_1n_2n_3=n\). As mentioned above, the complexity for 2 levels is \(O(n^{2.25})\) and that for 3 levels is \(O(n^{2.077})\). For comparison, some classical upper bounds of the matrix spectral norm are used [14, Sect. 2.3]:

$$\begin{aligned} \Vert T\Vert _\sigma \le \Vert T\Vert _2, n \Vert T\Vert _\infty , \sqrt{t_1 t_\infty }, \sqrt{n}\, t_1, \sqrt{n}\, t_\infty , \end{aligned}$$

where

$$\begin{aligned} t_1:= \max _{1\le j\le n} \sum _{i=1}^{n} |t_{ij}| \quad \text{ and }\quad t_\infty : = \max _{1\le i\le n} \sum _{j=1}^{n} |t_{ij}|. \end{aligned}$$
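For reference, a small helper (ours; the name classical_bounds is hypothetical) that evaluates these classical bounds with numpy, with \(\Vert T\Vert _2\) the Frobenius norm and, as an assumption on notation, \(\Vert T\Vert _\infty \) the largest entry in absolute value:

```python
import numpy as np

def classical_bounds(T):
    """Classical upper bounds on the matrix spectral norm listed above."""
    n = T.shape[0]
    t1 = np.abs(T).sum(axis=0).max()     # largest absolute column sum
    tinf = np.abs(T).sum(axis=1).max()   # largest absolute row sum
    return {
        "frobenius": np.linalg.norm(T),
        "n_max_abs_entry": n * np.abs(T).max(),
        "sqrt_t1_tinf": np.sqrt(t1 * tinf),
        "sqrt_n_t1": np.sqrt(n) * t1,
        "sqrt_n_tinf": np.sqrt(n) * tinf,
    }
```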

Two types of data matrices are tested. The first type consists of randomly generated matrices, including matrices with entries drawn i.i.d. from the uniform distribution on [0, 1], from the standard normal distribution, from the absolute value of the standard normal distribution, and from Bernoulli distributions, as well as rank-r matrices obtained as sums of i.i.d. rank-one matrices for \(r=1,10,100\), and low-rank matrices plus noise. The second type consists of covariance matrices defined by the exponential covariance function \(\exp (\frac{-|i-j|}{s})\) and by the squared exponential covariance function \(\exp (\frac{-(i-j)^2}{2s^2})\) for several values of s. The spectral norm of the original matrix is computed and scaled to one for easy reference. In the results shown in Table 1, each entry for random matrices is the average over 10 randomly generated instances. The table clearly indicates that the bounds produced by norm compression hierarchies, whether with 2 levels or 3 levels, are good estimates of the true spectral norm and are in general better than the other classical upper bounds.

Table 1 Spectral norm and its upper bounds for \(n\times n\) matrices when \(n=10^4\)

In the second set of tests, we increase the number of levels in Algorithm 6.1 in order to investigate the effect of additional levels on the computed spectral norms. According to Proposition 6.2, the spectral norms in the norm compression hierarchy increase as the level increases in Algorithm 6.1. For this purpose, we test \(n\times n\) matrices with \(n=3^8=6561\), i.e., \(n_1=n_2=\cdots =n_8=3\). The same two types of data matrices are tested, and their spectral norms are scaled to one. The results of the second set are shown in Table 2, where each entry for random matrices is the average over 10 randomly generated instances. We find that increasing the number of compression levels has little effect on the computed spectral norms, while its benefit lies in decreasing the complexity of Algorithm 6.1, as can be seen from (20).

Table 2 Spectral norms of the norm compression hierarchy for \(n\times n\) matrices when \(n=3^8\)

As observed in both Tables 1 and 2, the norm compression hierarchy does not estimate well the spectral norms of random matrices generated from i.i.d. normal distributions. This is perhaps in the nature of these matrices, as the other classical upper bounds are not good either (Table 1). For nonnegative matrices, low-rank matrices, and covariance matrices, the bounds obtained by Algorithm 6.1 are very good, and the algorithm enjoys a low complexity.

To conclude this section, we remark that Algorithm 6.1 can easily be extended to nonsquare matrices (\(\mathbb {R}^{n_1\times n_2}\) with \(n_1\ne n_2\)), to higher order tensors (\(\mathbb {R}^{n_1\times n_2\times \cdots \times n_d}\) with \(d\ge 3\)), to different factorizations of the \(n_k\)'s of a tensor, and to modal partitions with subtensors of different sizes. We do not expand on these extensions here. At the very least, the message is that the matrix spectral norm can be estimated via norm compression hierarchies in \(O(n^{2+\epsilon })\) operations with good accuracy. More importantly, for any tensor or matrix norm computation method, whether existing or yet to be developed, the norm compression hierarchy approach can speed it up while keeping good approximation quality.

To briefly conclude the whole paper, we proposed norm compression inequalities for partitioned block tensors. For any spectral \(\varvec{p}\)-norm, the norm of a norm compression tensor is an upper bound of the norm of the original tensor. By applying these inequalities, various bounds of tensor and matrix spectral norms in the literature can be improved. Norm compression inequalities for tensors have shown good potential in studying the extremal ratio between the spectral norm and the Frobenius norm of a tensor space, and in estimating tensor and matrix norms via norm compression hierarchies. We believe this is a promising start and that the research can be further extended both in theory and in applications.