On norm compression inequalities for partitioned block tensors

When a tensor is partitioned into subtensors, some tensor norms of these subtensors form a tensor called a norm compression tensor. Norm compression inequalities for tensors focus on the relation of the norm of this compressed tensor to the norm of the original tensor. We prove that for the tensor spectral norm, the norm of the compressed tensor is an upper bound of the norm of the original tensor. This result can be extended to a general class of tensor spectral norms. We discuss various applications of norm compression inequalities for tensors. These inequalities improve many existing bounds of tensor norms in the literature, in particular tightening the general bound of the tensor spectral norm via tensor partitions. We study the extremal ratio between the spectral norm and the Frobenius norm of a tensor space, provide a general way to estimate its upper bound, and, in particular, improve the current best upper bound for third order nonnegative tensors and symmetric tensors. We also propose a faster approach to estimate the spectral norm of a large tensor or matrix via sequential norm compression inequalities, with theoretical and numerical evidence. For instance, the complexity of our algorithm for the matrix spectral norm is O(n^{2+ε}), where ε ranges from 0 to 1 depending on the partition, and the estimate ranges correspondingly from a close upper bound to the exact spectral norm.


Introduction
With the advances in data collection and storage capabilities, massive multiway (tensor) data are being generated in a wide range of emerging applications [25]. Multilinear algebra and tensor computations have been playing increasingly important roles in dealing with multiway data in recent years. Computing tensor norms is evidently essential in many tensor computation problems. However, most tensor norms are NP-hard to compute [19], such as the tensor spectral norm [17] and the tensor nuclear norm [12]. Analogous to approximating matrix norms via block matrices, computing tensor norms via block tensors is straightforward and has become increasingly important within the field of numerical linear algebra [9,29,41,42]. When a tensor is partitioned into subtensors, not necessarily of the same size, some tensor norms of these subtensors form a tensor called a norm compression tensor. Norm compression inequalities for tensors focus on the relation of a norm of this compressed tensor to the norm of the original tensor. These inequalities straightforwardly provide a handy tool to bound and estimate norms of large tensors via norms of smaller subtensors.
In the case of matrices, tensors of order two, norm compression inequalities have been well studied since Bhatia and Kittaneh [6]. Such inequalities have several applications in, for instance, quantum information theory [2,5] and covariance estimation [7]. An overview of several norm compression inequalities for matrices can be found in [2] and references therein. One important result is due to King [23], where ‖·‖_{S_p} stands for the Schatten p-norm of a matrix, i.e., the L_p-norm of the vector consisting of all the singular values of the matrix. However, there exists an example [3] of a partitioned 3 × 3 block matrix such that inequalities of type (1) fail to hold. A conjecture that inequalities of this type hold for 2 × m blocks was proposed, and several of its special cases were proven, by Audenaert [3]. There are two notable special cases for 2 × m blocks, namely when the matrix M is positive semidefinite [23] and when the blocks of M are all diagonal matrices [24]. Among all Schatten p-norms of a matrix, three are particularly important, namely the spectral norm (p = ∞), the Frobenius norm (p = 2), and the nuclear norm (p = 1). The Schatten 2-norm coincides with the Frobenius norm of a matrix, which makes the corresponding norm compression inequality trivial; it is actually an equality. In fact, the norm compression inequality of type (1) holds for any m_1 × m_2 blocks when p = ∞. This result, to the best of our knowledge, was not studied in the literature, and it is a special case of the main result in this paper. For higher order (order three or higher) tensors, the Schatten p-norms are not well defined unless p = ∞, 1 [12], corresponding to the tensor spectral norm and nuclear norm, respectively. As both the spectral norm and the nuclear norm of a tensor are NP-hard to compute while those of a matrix can be computed in polynomial time, matrix unfoldings have become a major approach to various problems involving these tensor norms, both in theory and in practice. Relations between these norms of a tensor and those of its matrix unfoldings have been studied in [12,17,21]. A generalization of such relations under tensor unfoldings has been studied by Wang et al. [48].
Block tensors are becoming increasingly important. They have been used in large tensor factorizations [39], tensor decompositions [36], tensor optimization [47], and image processing [10]. Ragnarsson and Van Loan [41] developed an infrastructure that supports reasoning about block tensor computations. They [42] further applied block tensors to symmetric embeddings of tensors. Extending block tensors, Li [29] proposed more general concepts of tensor partitions and provided bounds of the spectral norm and the nuclear norm of a tensor via norms of subtensors in a regular partition of the tensor. The results were further generalized to the spectral p-norm and the nuclear p-norm of a tensor and to arbitrary partitions of the tensor [9]. This paper explores the structure of block tensors instead of treating subtensors merely as elements as in [9,29], and proposes more accurate estimates of the spectral norm of a tensor, although block tensors are only a special, albeit the most common, type of the regular partitions of [29] and the arbitrary partitions of [9]. It is worth mentioning that bounds of the tensor spectral p-norm have been extensively studied in the literature [16-18,20,38,44,48], in particular in the area of polynomial optimization [30].
In this paper, we study norm compression inequalities for tensors. We prove that for any block partitioned tensor, no matter how many blocks, the spectral norm of its norm compressed tensor is an upper bound of the spectral norm of the original tensor. The result can be generalized to a wider class of tensor spectral norms. These norm compression inequalities improve many existing bounds of tensor spectral norms in the literature, including the recent bounds via tensor partitions studied in [9,29]. We discuss two important applications of our results. The first one is on the extremal ratio between the spectral norm and the Frobenius norm of a tensor space. We provide a general methodology to compute upper bounds of this ratio, and in particular to improve the current best upper bound for third order nonnegative tensors and symmetric tensors. The second one is to estimate the spectral norm of a large tensor or matrix via sequential norm compression inequalities. Some numerical evidence is provided to justify our methodology.
This paper is organized as follows. We start with the preparation of various notations, definitions, and properties of tensor spectral norms in Sect. 2. In Sect. 3, we present our main result on norm compression inequalities for tensors, and in Sect. 4, we discuss how our main inequalities lead to various other bounds of tensor spectral norms in the literature. For applications, the study of the extremal ratio between the spectral norm and the Frobenius norm of a tensor space is presented in Sect. 5, and estimating the tensor and the matrix spectral norms is discussed in Sect. 6.

Preparation
Throughout this paper, we uniformly use lower case letters (e.g., x), boldface lower case letters (e.g., x = (x_i)), capital letters (e.g., X = (x_{ij})), and calligraphic letters (e.g., X = (x_{i_1 i_2 ... i_d})) to denote scalars, vectors, matrices, and higher order (order three or more) tensors, respectively. Denote by R^{n_1×n_2×···×n_d} the space of d-th order real tensors of dimension n_1 × n_2 × ··· × n_d. The same notation applies to a vector space and a matrix space when d = 1 and d = 2, respectively. Unless otherwise specified, the order of a general tensor in this paper is always denoted by d and the dimension of its mode k is always denoted by n_k. Denote by N the set of positive integers, and by P = [1, ∞] the interval on which the L_p-norm of a vector is well defined for 1 ≤ p ≤ ∞.
The Frobenius inner product of two tensors T, X ∈ R^{n_1×n_2×···×n_d} is defined as ⟨T, X⟩ := Σ_{i_1=1}^{n_1} Σ_{i_2=1}^{n_2} ··· Σ_{i_d=1}^{n_d} t_{i_1 i_2 ... i_d} x_{i_1 i_2 ... i_d}.
Its induced Frobenius norm is naturally defined as ‖T‖_2 := √⟨T, T⟩. When d = 1, the Frobenius norm reduces to the Euclidean norm of a vector. In a similar vein, we may define the L_p-norm of a tensor (also known as the Hölder p-norm [33]) for p ∈ P by viewing the tensor as a vector, as follows: ‖T‖_p := (Σ_{i_1, i_2, ..., i_d} |t_{i_1 i_2 ... i_d}|^p)^{1/p}. (2) A rank-one tensor, also called a simple tensor, is a tensor that can be written as the outer product of vectors, T = x^1 ⊗ x^2 ⊗ ··· ⊗ x^d. It is easy to verify that ‖T‖_p = ∏_{k=1}^d ‖x^k‖_p for all p ∈ P. When d = 2, a rank-one tensor reduces to the well known concept of a rank-one matrix.
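As a quick sanity check of the product formula above, the following sketch (not from the paper; the tensor sizes and the use of NumPy are our own choices) builds a rank-one tensor as an outer product and compares its entrywise L_p-norm with the product of the factor norms.

```python
import numpy as np

# Check that for a rank-one tensor T = x^1 ⊗ x^2 ⊗ x^3 the entrywise
# L_p-norm factors as the product of the vector L_p-norms of its factors.
rng = np.random.default_rng(0)
x1, x2, x3 = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)
T = np.einsum('i,j,k->ijk', x1, x2, x3)   # outer product: a rank-one tensor

for p in (1.0, 2.0, 3.0):
    lhs = np.sum(np.abs(T) ** p) ** (1.0 / p)           # ||T||_p, tensor as a long vector
    rhs = np.prod([np.sum(np.abs(x) ** p) ** (1.0 / p) for x in (x1, x2, x3)])
    assert abs(lhs - rhs) < 1e-8 * max(1.0, rhs)
```

The same check works for any order d by extending the `einsum` subscripts.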
The spectral norm of a tensor is an important measure of the tensor.
Definition 2.1 For a given tensor T ∈ R^{n_1×n_2×···×n_d}, the spectral norm of T, denoted by ‖T‖_σ, is defined as
‖T‖_σ := max { ⟨T, x^1 ⊗ x^2 ⊗ ··· ⊗ x^d⟩ : ‖x^k‖_2 = 1, k = 1, 2, ..., d },
and the nuclear norm of T, denoted by ‖T‖_*, is defined as
‖T‖_* := min { Σ_{i=1}^r |λ_i| : T = Σ_{i=1}^r λ_i x^1_i ⊗ x^2_i ⊗ ··· ⊗ x^d_i, ‖x^k_i‖_2 = 1 for all k = 1, 2, ..., d and i = 1, 2, ..., r, r ∈ N }. (3)
Essentially, ‖T‖_σ is the maximal value of the Frobenius inner product between T and a rank-one tensor whose Frobenius norm is one. Computing the tensor spectral norm is also known as the Euclidean spherical constrained multilinear form maximization problem [30]. The tensor nuclear norm is the dual norm to the tensor spectral norm, and vice versa [11,34], i.e., ‖T‖_σ = max {⟨T, X⟩ : ‖X‖_* ≤ 1} and ‖T‖_* = max {⟨T, X⟩ : ‖X‖_σ ≤ 1}. Apart from the L_p-norms defined via tensor entries in (2), there is another set of norms for matrices called Schatten p-norms, defined by the L_p-norm of the vector consisting of all singular values of a matrix. In particular, the spectral norm and the nuclear norm of a matrix are nothing but the Schatten ∞-norm and the Schatten 1-norm of the matrix, respectively. In fact, the study of norm compression inequalities for matrices in the literature is mostly on Schatten p-norms, including the spectral norm and the nuclear norm as special cases. However, one cannot obtain a Schatten p-norm for tensors in the manner defining the nuclear norm of a tensor in (3). If the L_1-norm expression Σ_{i=1}^r |λ_i| in (3) is replaced by an L_p-norm expression (Σ_{i=1}^r |λ_i|^p)^{1/p} for any 1 < p < ∞, the minimum is always zero [12].
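For d = 2, Definition 2.1 recovers the largest singular value of a matrix, and the maximization over unit rank-one tensors can be carried out by alternately optimizing one factor at a time. The sketch below (our own illustration, assuming generic convergence of this alternating scheme) checks this against the SVD.

```python
import numpy as np

# Alternating maximization of <A, x ⊗ y> over unit vectors x, y.
# For matrices this is power iteration on A A^T and converges (generically)
# to the largest singular value, i.e., the spectral norm of Definition 2.1.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))

x = np.ones(8) / np.sqrt(8.0)
for _ in range(200):                  # fix x, optimize y; then fix y, optimize x
    y = A.T @ x; y /= np.linalg.norm(y)
    x = A @ y;  x /= np.linalg.norm(x)

approx = float(x @ A @ y)             # <A, x ⊗ y> at the final iterate
exact = np.linalg.norm(A, 2)          # largest singular value via SVD
assert abs(approx - exact) < 1e-6 * exact
```

For d ≥ 3 the same alternating idea applies mode by mode, though it is then only guaranteed to reach a stationary value, consistent with the NP-hardness of the tensor spectral norm.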
One can actually extend the tensor spectral norm and tensor nuclear norm in Definition 2.1 as follows.
Definition 2.2 For a given tensor T ∈ R^{n_1×n_2×···×n_d} and a vector p = (p_1, p_2, ..., p_d) ∈ P^d, the spectral p-norm of T, denoted by ‖T‖_{p σ}, is defined as
‖T‖_{p σ} := max { ⟨T, x^1 ⊗ x^2 ⊗ ··· ⊗ x^d⟩ : ‖x^k‖_{p_k} = 1, k = 1, 2, ..., d },
and the nuclear p-norm of T, denoted by ‖T‖_{p *}, is defined as
‖T‖_{p *} := min { Σ_{i=1}^r |λ_i| : T = Σ_{i=1}^r λ_i x^1_i ⊗ x^2_i ⊗ ··· ⊗ x^d_i, ‖x^k_i‖_{p_k} = 1 for all k = 1, 2, ..., d and i = 1, 2, ..., r, r ∈ N }.
In particular, the spectral (2, 2, ..., 2)-norm and the nuclear (2, 2, ..., 2)-norm of a tensor are the usual spectral norm and nuclear norm of the tensor, respectively. The tensor spectral p-norm was first defined at the same time as the tensor spectral norm by Lim [32] in 2005. Computation of the spectral p-norm for nonnegative tensors was discussed in [13]. When 1/p_k + 1/q_k = 1 for k = 1, 2, ..., d, the spectral p-norm and the nuclear q-norm are dual to each other, i.e., ‖T‖_{p σ} = max {⟨T, X⟩ : ‖X‖_{q *} ≤ 1}. This duality can be proved similarly to the case when all the p_k's are equal [9, Lemma 2.5] and is thus omitted. The spectral p-norm and nuclear p-norm of a tensor are in general very difficult to compute. For the computational complexity for various p's and orders of the tensor, one is referred to Friedland and Lim [12]. It is worth mentioning that computing these norms for a rank-one tensor admits a closed form.

Proposition 2.4 If a tensor T is rank-one, say T = x^1 ⊗ x^2 ⊗ ··· ⊗ x^d, then ‖T‖_{p σ} = ∏_{k=1}^d ‖x^k‖_{q_k} and ‖T‖_{p *} = ∏_{k=1}^d ‖x^k‖_{p_k}, where 1/p_k + 1/q_k = 1 for k = 1, 2, ..., d.
The proof is left to interested readers. Therefore, ‖T‖_{p σ} can be taken as the maximal value of the Frobenius inner product between T and a rank-one tensor whose spectral q-norm is one.

Norm compression inequalities for tensors
To study norm compression inequalities for tensors, we first introduce tensor partitions.
One important class of tensor partitions, block tensors, was proposed and studied by Ragnarsson and Van Loan [41]. It is a straightforward generalization of block matrices. Li [29] proposed three types of tensor partitions, namely modal partitions (an alternative name for block tensors), regular partitions, and tensor partitions, with the latter generalizing the former. A more general class of partitions, called arbitrary partitions, was proposed and studied by Chen and Li [9]. Norm compression inequalities are established on block tensors, which are also called modal partitions as they are constructed by partitions of the index sets of tensor modes. Given a tensor T ∈ R^{n_1×n_2×···×n_d}, the indices of its mode k can be partitioned into r_k nonempty sets, i.e., {1, 2, ..., n_k} = I^k_1 ∪ I^k_2 ∪ ··· ∪ I^k_{r_k}. For simplicity, we assume that the indices in each I^k_i are consecutive and that the I^k_i's monotonically increase as i increases, since this can be arranged easily via index relabeling without affecting tensor norms. Any (J_1, J_2, ..., J_d) with J_k ⊆ {1, 2, ..., n_k} for k = 1, 2, ..., d uniquely defines a subtensor of T by keeping only the indices in J_k for mode k of T. In other words, a partitioned block tensor is a tensor that has been modal partitioned. Trivially, for d = 2, a block matrix can be obtained by a modal partition, i.e., partitions of the row indices and the column indices. We remark that a subtensor in a modal partition of a tensor may not possess the same order as the original tensor. If some I^k_j contains only one index, this causes the disappearance of mode k and reduces the order of the subtensor by one. However, we still treat such a subtensor as a d-th order tensor by keeping the dimension of mode k equal to one. For instance, we can always treat a scalar as a one-dimensional vector, or a one-by-one matrix.
In order to present the proof of our main result (Theorem 3.3) clearly as well as to provide a better picture of modal partitions, we now discuss tensor cuts. Given a d-th order tensor T, we write T = T_1 ⊞_k T_2 if T is obtained by stacking two tensors T_1 and T_2 along mode k, where the dimensions of T_1 and T_2 agree in every mode except mode k. The same notation can be used to cut a matrix and to cut a vector. In particular, for a first order tensor, a vector x ∈ R^n, x = x_1 ⊞_1 x_2 is exactly the same as x^T = (x_1^T, x_2^T). The mode subscript of ⊞ in a tensor cut is sometimes omitted for clearer presentation. For instance, T = T_1 ⊞ T_2 implies that there exists k ∈ N such that T = T_1 ⊞_k T_2. Obviously, the operation ⊞ is in general neither commutative nor associative. Once the notation ⊞_k is applied, the dimensions of its associated two tensors must be the same in every mode except mode k. With this handy notation, a block tensor via a modal partition (Definition 3.1) can be simply written as nested cuts of its blocks along all d modes. The following norm compression identity for vectors is straightforward.
Lemma 3.2 If a vector x = (x_1^T, x_2^T, ..., x_r^T)^T ∈ R^n, then for any p ∈ P,
‖x‖_p = ‖(‖x_1‖_p, ‖x_2‖_p, ..., ‖x_r‖_p)‖_p.
Proof Denote the vector y := (‖x_1‖_p, ‖x_2‖_p, ..., ‖x_r‖_p)^T. For finite p we have ‖x‖_p^p = Σ_{j=1}^r ‖x_j‖_p^p = ‖y‖_p^p, and the case p = ∞ follows similarly.
Though in practice one is often interested in the spectral norm of a tensor rather than general spectral p-norms of the tensor, we present our main result for the general case. It obviously applies to the tensor spectral norm when p = (2, 2, ..., 2).
which leads to ... Further, we denote ... Obviously we have ... First, for every (j_1, j_2, ..., j_d), if none of the vectors x^1_{j_1}, x^2_{j_2}, ..., x^d_{j_d} is a zero vector, then (5) holds. This inequality trivially holds even if some of x^1_{j_1}, x^2_{j_2}, ..., x^d_{j_d} are zero vectors, and thus it holds in general.
Next, by the norm compression identity for vectors in Lemma 3.2, we have (6). Therefore, we have ...,
where the first inequality is due to (5), and the last inequality is due to (6) and Definition 2.2.
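The d = 2, p = (2, 2) case of the inequality can be checked numerically; the sketch below (an illustration with arbitrarily chosen block sizes) compresses a random matrix over a 2 × 2 modal partition and verifies the upper bound.

```python
import numpy as np

# Norm compression check for a matrix (d = 2, p = (2, 2)): partition T into
# a 2 x 2 modal partition, replace each block by its spectral norm, and
# verify ||T||_σ <= ||C||_σ for the norm compressed matrix C.
rng = np.random.default_rng(2)
T = rng.standard_normal((10, 12))
blocks = [[T[:6, :7], T[:6, 7:]],
          [T[6:, :7], T[6:, 7:]]]

C = np.array([[np.linalg.norm(B, 2) for B in row] for row in blocks])

assert np.linalg.norm(T, 2) <= np.linalg.norm(C, 2) + 1e-12
```

The block sizes are arbitrary; the inequality holds for any modal partition.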
We remark that for d = 2, the case of matrices, Theorem 3.3 was not studied in the literature, to the best of our knowledge. The tightness of the norm compression inequality in Theorem 3.3 is in general hard to establish. We list some special cases of equality below, albeit some of them are trivial: 1. d = 1, which corresponds to the case of vectors, essentially established in Lemma 3.2; 2. p = (1, 1, ..., 1), for which the spectral p-norm of a tensor is simply the L_∞-norm of the tensor, i.e., the largest absolute-valued entry of the tensor; 3. All but one of the subtensors are zero tensors; 4. The original tensor T is rank-one.
The last case actually includes the first case as a special one.Its proof can be obtained by using Proposition 2.4 and is left to interested readers.
For the nuclear p-norm of a tensor, like the nice dual bounds of the nuclear norm shown in [29] and of the nuclear p-norm shown in [9], one hopes to establish a dual inequality to (4) as follows: the nuclear p-norm of the norm compressed tensor is a lower bound of the nuclear p-norm of the original tensor. (7) Unfortunately, this does not hold in general; see the example below. This actually makes the norm compression inequalities for tensors more interesting, and the result in Theorem 3.3 more valuable.

Example 3.4 Let M ∈ R^{3×3} and let the modal partition of M be the 3 × 3 entry-wise partition, resulting in the nuclear norm compressed tensor |M|, the entry-wise absolute value of M. For the matrix M of this example one has ‖|M|‖_* > ‖M‖_*, so the dual inequality (7) fails. As far as we are aware, the only known nontrivial case for (7) to hold is for 2 × 2 blocked matrices with p = (2, 2), due to King [23], i.e., (1) for p = 1. There is a general, though somewhat trivial, case for (7) to hold as an equality: when p = (1, 1, ..., 1), the nuclear p-norm of a tensor becomes the L_1-norm of the tensor.
To conclude this section, we provide some insights on (7), although we are unable to prove any general result. We believe that (7) holds for nonnegative tensors, i.e., tensors whose entries are all nonnegative. Another interesting question is to find the smallest τ > 0 such that the nuclear p-norm of the norm compressed tensor is at most τ times that of the original tensor in general. Numerical evidence shows that τ may not be a universal constant and can depend on the dimensions of the tensor space.

Improved bounds on tensor and matrix norms
In this section, we discuss how our norm compression inequalities improve other known bounds on tensor and matrix norms in the literature.

Norm compression inequalities for matrices
As mentioned in Sect. 1, norm compression inequalities for matrices were studied mainly for the Schatten p-norms, which unfortunately do not hold for general r_1 × r_2 blocks [3]. However, there do exist two related results for general r × r blocks. For their relevancy to our results, we present them using our notation. Let {T_{ij} : i = 1, 2, ..., r_1, j = 1, 2, ..., r_2} be a modal partition of a matrix T. One result is due to Bhatia and Kittaneh [6, Theorem 1]: ‖T‖_σ ≤ (Σ_{i,j} ‖T_{ij}‖_σ²)^{1/2}, (8) and the other is due to Bebendorf [4, Lemma 2.14]: (9). A basic inequality between the spectral norm and the Frobenius norm of a matrix (see, e.g., [14]) states that ‖T‖_σ ≤ ‖T‖_2. Therefore, according to (4) with d = 2 and p = (2, 2), we get ‖T‖_σ ≤ ‖(‖T_{ij}‖_σ)‖_σ ≤ ‖(‖T_{ij}‖_σ)‖_2 = (Σ_{i,j} ‖T_{ij}‖_σ²)^{1/2}, providing a tighter upper bound of ‖T‖_σ than that in (8).
To see how our inequality (4) improves the upper bound of ‖T‖_σ in (9), we need to use a classical result due to Schur [43], which states that for any matrix A = (a_{ij}), ‖A‖_σ ≤ (max_i Σ_j |a_{ij}|)^{1/2} (max_j Σ_i |a_{ij}|)^{1/2}. (10) By (4) with d = 2 and p = (2, 2), we get ‖T‖_σ ≤ ‖(‖T_{ij}‖_σ)‖_σ, and applying (10) to the norm compressed matrix recovers the right-hand side of (9), providing a tighter upper bound of ‖T‖_σ than that in (9).
We remark that the result (10) of Schur [43] has been generalized to tensors by Hardy, Littlewood, and Pólya [15, Theorem 273], and so we can easily apply the norm compression inequality (4) to generalize inequality (9) from matrices to tensors. The details are left to interested readers.
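The chain of bounds discussed in this subsection can be observed numerically. The following sketch (with an arbitrarily chosen 3 × 3 partition into 4 × 4 blocks) verifies ‖T‖_σ ≤ ‖C‖_σ ≤ ‖C‖_2 ≤ ‖T‖_2 for the norm compressed matrix C, showing that the compressed-norm bound sits between the true spectral norm and the Frobenius-type bound.

```python
import numpy as np

# Compare bounds on the matrix spectral norm: the compressed-norm bound of
# (4) is sandwiched between the true norm and the Frobenius bound, since
# ||T||_σ <= ||C||_σ <= ||C||_2 <= ||T||_2 for the compressed matrix C.
rng = np.random.default_rng(4)
T = rng.standard_normal((12, 12))
b = 4                                   # 3 x 3 partition into 4 x 4 blocks
C = np.array([[np.linalg.norm(T[i*b:(i+1)*b, j*b:(j+1)*b], 2)
               for j in range(3)] for i in range(3)])

spec, comp = np.linalg.norm(T, 2), np.linalg.norm(C, 2)
comp_frob, frob = np.linalg.norm(C, 'fro'), np.linalg.norm(T, 'fro')
assert spec <= comp + 1e-12
assert comp <= comp_frob + 1e-12        # ||C||_σ <= ||C||_2
assert comp_frob <= frob + 1e-12        # since ||T_ij||_σ <= ||T_ij||_2 blockwise
```

The last assertion uses Σ‖T_{ij}‖_σ² ≤ Σ‖T_{ij}‖_2² = ‖T‖_2², which is exactly why the compressed bound improves on the plain Frobenius bound.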

Bounds on tensor norms via partitions
Li [29] first proposed bounds on the tensor spectral norm based on tensor partitions. Specifically, if {T_1, T_2, ..., T_m} is a regular partition of a tensor T, then max_{1≤j≤m} ‖T_j‖_σ ≤ ‖T‖_σ ≤ (Σ_{j=1}^m ‖T_j‖_σ²)^{1/2}. (11) This result was later generalized to the most general class of partitions and to any tensor spectral p-norm by Chen and Li [9]: if {T_1, T_2, ..., T_m} is an arbitrary partition of a tensor T and p ∈ P with 1/p + 1/q = 1, then max_{1≤j≤m} ‖T_j‖_{p σ} ≤ ‖T‖_{p σ} ≤ (Σ_{j=1}^m ‖T_j‖_{p σ}^q)^{1/q}. (12) Here we do not introduce regular partitions and arbitrary partitions of a tensor, but only mention that they are more general than modal partitions. However, modal partitions are the most common partitions in practice.
Proof The first inequality in (13) is exactly (4) if we let p = (p, p, ..., p). To see why one upper bound is tighter than the other in (13), we only need to apply the upper bound of (12) to the norm compressed tensor, since its entries ‖T_{j_1 j_2 ... j_d}‖_{p σ} form a modal partition of it into single entries.
The above result obviously improves the upper bound in (12) and, when p = 2, improves the upper bound in (11). These improvements are made by taking into account the positions of the ‖T_{j_1 j_2 ... j_d}‖_{p σ}'s in the norm compressed tensor rather than treating all of them as entries of a vector.

Bounds on norms of matrix unfoldings
Matrix unfoldings of a tensor have been one of the main tools in tensor computations, partially because most tensor problems are NP-hard [19] while the corresponding matrix problems are much easier. For instance, computing the spectral norm and the nuclear norm of a tensor is NP-hard [12,17], while that of a matrix can be done in polynomial time. The relation between norms of matrix unfoldings of a tensor and norms via certain partitions of the tensor has been investigated by Chen and Li [9]. As discussed in Sect. 4.2, the norm compression inequality in Theorem 3.3 improves (12) in [9]. Consequently, bounds of the tensor spectral p-norm can be improved in various ways by applying specific partitions of the tensor. Here we discuss one particular instance of this kind to illustrate the applicability of our general approach. It could be of special interest for bounding the spectral norm of a large matrix, analogous to the discussion in Sect. 6. Let T ∈ R^{n×n×n×n} be a fourth order tensor. The traditional matrix unfolding of T unfolds T to an n × n^3 matrix, and it can be done via four different modes. Square matrix unfoldings, i.e., unfolding T to an n^2 × n^2 matrix, have appeared frequently in recent years, in particular in studying the largest eigenvalue of a fourth order tensor [22,37]. Let T_{13,24} ∈ R^{n^2×n^2} be the square matrix unfolding of T obtained by grouping modes 1 and 3 of T into the rows and modes 2 and 4 of T into the columns of T_{13,24}. It is well known (see, e.g., Wang et al. [48]) that ‖T‖_σ ≤ ‖T_{13,24}‖_σ. Partitioning the indices of modes 1 and 2 of T entry-wise, we obtain a modal partition of T into subtensors T_{ij} with i, j = 1, 2, ..., n. As T_{ij} is essentially a matrix, we use T_{ij} ∈ R^{n×n} to denote it. An important observation is that T_{13,24} is exactly the block matrix whose (i, j) block is T_{ij}. In fact, we also have that T_{14,23} is the block matrix whose (i, j) block is (T_{ij})^T, and other square matrix unfoldings of T can also be modal partitioned similarly. The above discussion can be clearly verified by the following example.

Example 4.2 Let
Let us now apply Theorem 3.3; we obtain lower and upper bounds (14) for both ‖T‖_σ and ‖T_{13,24}‖_σ by the norm compression inequality, where every ‖T_{ij}‖_σ can be computed in polynomial time. In practice, if a given matrix T_{13,24} is very large, (14) can be used to lower and upper bound its spectral norm by the spectral norm of T and the spectral norm of the norm compressed matrix (‖T_{ij}‖_σ), respectively. We will discuss this further in Sect. 6. To conclude this section, we remark that the variety of modal partitions of a tensor serves various specific needs, such as norms of tensor unfoldings, i.e., unfolding a given tensor to a tensor of a lower order [48]. The usefulness of norm compression inequalities for tensors (Theorem 3.3) rests on the following fact: for any tensor unfolding (including matrix unfolding) of a given tensor, there exists a modal partition of the tensor which is also a modal partition of the tensor unfolding.
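The structural fact underlying this subsection can be verified numerically. The sketch below (assuming rows of T_{13,24} are indexed by (i_1, i_3) and columns by (i_2, i_4), both in lexicographic order) checks that T_{13,24} is the block matrix with blocks T_{ij} = T[i, j, :, :], and then checks the resulting norm compression bound.

```python
import numpy as np

# For a fourth order tensor T, the square unfolding T_{13,24} (modes 1,3 to
# rows, modes 2,4 to columns) is the block matrix with (i, j) block
# T_ij = T[i, j, :, :], up to the lexicographic index grouping.
rng = np.random.default_rng(3)
n = 3
T = rng.standard_normal((n, n, n, n))

# row index (i1, i3), column index (i2, i4)
M = T.transpose(0, 2, 1, 3).reshape(n * n, n * n)

blocks = np.block([[T[i, j] for j in range(n)] for i in range(n)])
assert np.allclose(M, blocks)

# Norm compression bound of Theorem 3.3 applied to this modal partition.
C = np.array([[np.linalg.norm(T[i, j], 2) for j in range(n)] for i in range(n)])
assert np.linalg.norm(M, 2) <= np.linalg.norm(C, 2) + 1e-12
```

The same construction with `transpose(0, 3, 1, 2)` and transposed blocks gives T_{14,23}.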

Extremal ratio between the spectral and Frobenius norms
Given a tensor space R^{n_1×n_2×···×n_d}, the extremal ratio between the spectral norm and the Frobenius norm is defined as τ(R^{n_1×n_2×···×n_d}) := min { ‖T‖_σ / ‖T‖_2 : T ∈ R^{n_1×n_2×···×n_d}, T ≠ 0 }. (15) Determining this quantity is easy when d = 2 (matrices) but becomes very difficult when d ≥ 3. The concept was proposed by Qi [40], known as the best rank-one approximation ratio of a tensor space, although Kühn and Peetre [28] studied this ratio much earlier for some small n_i's when d = 3. For a nonzero tensor T, the problem max {⟨T, X⟩ : ‖X‖_2 = 1} (16) obviously attains its optimum at X = T/‖T‖_2 by the Cauchy–Schwarz inequality. However, if we restrict the projection to rank-one tensors, then max {⟨T, X⟩ : ‖X‖_2 = 1, X rank-one} = ‖T‖_σ. (17) Therefore, as optimization problems, (16) becomes a convex relaxation of (17).
One is often interested in this relaxation gap, which is exactly the extremal ratio (15). In the case of matrices, similar problems are known as equivalence constants for matrix norms; see, e.g., Tonge [45]. In this sense, the problem is to determine the largest τ > 0 such that τ‖T‖_2 ≤ ‖T‖_σ for all T ∈ R^{n_1×n_2×···×n_d}. On the other hand, the smallest gap between the two norms is one, since ‖T‖_σ ≤ ‖T‖_2, with equality if and only if T is rank-one.
As an application of this extremal ratio, one obtains an interpretation as a perturbed steepest descent method and can deduce a rate of convergence using bounds of the extremal ratio (see [46, Theorem 2] for details). Since the time it was posted as a conjecture [40, Sect. 7], determining this extremal ratio for a general tensor space has been a challenging task. Without loss of generality, we assume that n_1 ≤ n_2 ≤ ··· ≤ n_d. The naive lower bound (18) is not tight in general, but it can be attained in various cases; see [31] for recent developments on the topic. In the space of symmetric tensors, i.e., τ(R^{n^d}_sym), where R^{n^d}_sym denotes the set of d-th order symmetric real tensors of dimension n, this extremal ratio has also been studied in [1,26,40,49]. In particular, it was recently shown in [1] that the naive bound (18) is tight for symmetric tensors only in special low-dimensional cases. One may also consider the extremal ratio for nonnegative tensors, i.e., τ(R^{n_1×n_2×···×n_d}_+), where R^{n_1×n_2×···×n_d}_+ denotes the set of d-th order nonnegative tensors. In this section, we provide a general tool for investigating upper bounds of this extremal ratio via the norm compression inequality in Theorem 3.3. According to (15), the value of ‖T‖_σ/‖T‖_2 for any nonzero T provides an upper bound of the extremal ratio. By recursively constructing modal partitions, we obtain the main result of this section.
T m is defined recursively as follows, . . .
and so we get the corresponding Frobenius norm identity. For the spectral norm of T_m, by noticing t_{i_1 i_2 ... i_d} ≥ 0 and applying (4) in Theorem 3.3, ...
and we obtain the desired upper bound. On the other hand, let X be a best rank-one approximation of T. It is easy to see that X must be nonnegative as T is nonnegative. Recursively construct X_m in the same manner as T_m. Moreover, X_m is actually rank-one. To see why, we notice that X_1 is rank-one and the recursive construction preserves rank-one tensors. By the constructions of T_m and X_m, we have ...
which, combined with (19), leads to the matching lower bound. Finally, the constructed T_m satisfies the claimed ratio for any m ∈ N.
We remark that the construction of T_m in the proof of Theorem 5.1 is essentially the Kronecker product of m copies of T. For the sake of simplicity, we do not introduce more notation at this point.
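In the matrix (d = 2) case, the mechanism behind this construction is easy to observe: both the spectral norm and the Frobenius norm are multiplicative under Kronecker products, so the ratio ‖T‖_σ/‖T‖_2 is preserved under Kronecker powers. A minimal sketch:

```python
import numpy as np

# Both the spectral norm and the Frobenius norm are multiplicative under
# Kronecker products, so ||A ⊗ A||_σ / ||A ⊗ A||_2 = (||A||_σ / ||A||_2)^2.
rng = np.random.default_rng(5)
A = rng.standard_normal((3, 4))
K = np.kron(A, A)                       # Kronecker square of A

assert np.isclose(np.linalg.norm(K, 2), np.linalg.norm(A, 2) ** 2)
assert np.isclose(np.linalg.norm(K, 'fro'), np.linalg.norm(A, 'fro') ** 2)
```

For higher order nonnegative tensors, Theorem 3.3 supplies the corresponding upper bound on the spectral norm of the Kronecker power, which is the nontrivial step in the proof of Theorem 5.1.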
Let us now apply Theorem 5.1 to get an improved upper bound for both τ(R^{n×n×n}_+) and τ(R^{n×n×n}_sym). First we introduce the following example.
Example 5.2 Let U ∈ R^{2×2×2} be the symmetric nonnegative tensor with u_112 = u_121 = u_211 = 1 and all other entries zero. Then ‖U‖_2 = √3 and ‖U‖_σ = 2/√3, so that ‖U‖_σ/‖U‖_2 = 2/3. The calculation of ‖U‖_σ can be easily obtained using the fact that the best rank-one approximation of a symmetric tensor can be attained at a symmetric rank-one tensor; see, e.g., [8,49].
We believe that ‖T‖_σ/‖T‖_2 ≥ 2/3 for any nonnegative tensor T ∈ R^{2×2×2}_+ based on some numerical evidence, although we are unable to verify it theoretically.
Theorem 5.3 It holds that τ(R^{n×n×n}_+) and τ(R^{n×n×n}_sym) are both O(n^{log_2(2/3)}) = O(n^{-0.584}).
Proof The lower bound is listed for reference only; it is the naive one in (18) but currently the best known. For the upper bound, let U ∈ R^{2×2×2} be the tensor in Example 5.2 and let m ∈ N be such that 2^m ≤ n < 2^{m+1}. By Theorem 5.1, there exists a symmetric nonnegative tensor T_m ∈ R^{2^m×2^m×2^m} with ‖T_m‖_σ/‖T_m‖_2 = (2/3)^m. Finally, by the obvious fact that τ(R^{n×n×n}_sym) is nonincreasing as n increases, we get τ(R^{n×n×n}_sym) ≤ (2/3)^m ≤ (3/2) n^{log_2(2/3)}.
The above argument obviously establishes the same upper bound for τ(R^{n×n×n}_+) as well.
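The norms in Example 5.2 can be checked numerically. The sketch below assumes U is the symmetric tensor with u_112 = u_121 = u_211 = 1 and zeros elsewhere (our reading of the example), and uses the fact that the best rank-one approximation of a symmetric tensor is attained at a symmetric rank-one tensor, reducing ‖U‖_σ to a one-dimensional maximization over the unit circle.

```python
import numpy as np

# For the symmetric tensor U with u_112 = u_121 = u_211 = 1 (assumed form),
# <U, v ⊗ v ⊗ v> = 3 v1^2 v2, so ||U||_σ is a maximization over the unit circle.
theta = np.linspace(0.0, np.pi, 200001)
v1, v2 = np.cos(theta), np.sin(theta)
f = 3.0 * v1**2 * v2                      # <U, v ⊗ v ⊗ v> with ||v||_2 = 1

sigma = f.max()                           # numerical ||U||_σ
frob = np.sqrt(3.0)                       # ||U||_2 = sqrt(3): three unit entries
assert abs(sigma - 2.0 / np.sqrt(3.0)) < 1e-6
assert abs(sigma / frob - 2.0 / 3.0) < 1e-6
```

The maximum is attained at v = (√(2/3), √(1/3)), giving ‖U‖_σ = 2/√3 and the ratio 2/3 used in Theorem 5.3.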
The upper bound in Theorem 5.3 improves the existing one O(n^{-0.5}) and remains the best known for general R^{n×n×n}_+ and R^{n×n×n}_sym, to the best of our knowledge. It may be possible to investigate other small-size nonnegative tensors, say T ∈ R^{3×3×3}_+, to obtain new upper bounds of τ(R^{n×n×n}_+) using Theorem 5.1. We are not sure whether this can beat the bound in Theorem 5.3, even though finding τ(R^{3×3×3}_+) is already hard. In general, this tool can certainly be used to find better upper bounds of the extremal ratio for fourth or higher order tensors.

Estimating the spectral norm
A straightforward application of norm compression inequalities is to estimate a norm of a large tensor via norms of small subtensors in a modal partition. This is because computing the spectral norm of a tensor is NP-hard in general, while computing that of small tensors can be done quite efficiently and accurately. Even for a matrix, computing the spectral norm can be costly when its size gets very large. Estimating matrix norms is an important topic in matrix computations. Most methods in the literature are based on random sampling [35] and the power method [27]. In this section, we conduct a preliminary study of the norm compression approach, using the matrix spectral norm as an example, to give a picture of how fast the method runs theoretically and how good the approximation is numerically.
As we know, computing the spectral norm of an n × n matrix requires O(n^3) operations, which is essentially the complexity of the singular value decomposition. For simplicity, we do not consider numerical errors, which could bring in factors of the sort log(1/ε). Suppose that we have a (large) matrix T ∈ R^{n×n} and need to compute some norm ‖T‖. The exact computation requires αn^s operations, where α > 0 and s > 2 are two universal constants. The following algorithm provides an estimate of ‖T‖ based on a norm compression hierarchy.
Algorithm 6.1
• Input: A matrix T ∈ R^{n×n} with a factorization n = n_1 n_2 ··· n_m.
1. Set T_0 = T and ℓ = 1.
2. Partition T_{ℓ-1} into square blocks of size n_{m-ℓ+1} × n_{m-ℓ+1} and let the level-ℓ norm compression matrix T_ℓ consist of the norms of these blocks.
3. If ℓ = m, stop; otherwise increase ℓ by one and go to the previous step.
• Output: An approximation T_m ∈ R.
Remark that the level-m norm compression matrix T^m is actually one-by-one, i.e., a scalar. It is not difficult to see that T^ℓ is a norm compression matrix of T^{ℓ−1} for ℓ = 1, 2, ..., m. Therefore, according to Theorem 3.3, we have the following property for Algorithm 6.1.

Proposition 6.2 For the matrix spectral p-norm where p ∈ P_2, the matrices computed in Algorithm 6.1 satisfy
‖T‖_{pσ} ≤ ‖T^1‖_{pσ} ≤ ‖T^2‖_{pσ} ≤ ··· ≤ ‖T^m‖_{pσ}.

Obviously, the higher the level of the hierarchy, the less accurate the approximation is. However, the approximation is closely related to the way the dimension n is factorized. Let us first study how to choose the n_k's properly in order to optimize the computational complexity of Algorithm 6.1. Suppose the exact computation requires αn^s operations for an n × n matrix. The dimension of T^ℓ is ∏_{k=1}^{m−ℓ} n_k × ∏_{k=1}^{m−ℓ} n_k, and obtaining every entry of T^ℓ requires computing the norm of an n_{m−ℓ+1} × n_{m−ℓ+1} matrix, which can be done in αn_{m−ℓ+1}^s operations. Therefore, the complexity of Algorithm 6.1 is

f_m(n) = α ∑_{ℓ=1}^{m} ( ∏_{k=1}^{m−ℓ} n_k )^2 n_{m−ℓ+1}^s.   (20)

In particular, if all the n_k's are the same, we have n = n_1^m and the complexity in (20) is bounded by the highest term in the summation, i.e., O(n^{2+(s−2)/m}).

For a fixed m, the level of the hierarchy, in order to minimize the complexity of Algorithm 6.1, one should let every term in the summation in (20) be of the same order of magnitude. To this end, we have for any 1 ≤ ℓ ≤ m − 1,

( ∏_{k=1}^{m−ℓ} n_k )^2 n_{m−ℓ+1}^s = ( ∏_{k=1}^{m−ℓ−1} n_k )^2 n_{m−ℓ}^s.

This leads to n_{k+1} = n_k^{(s−2)/s} for k = 1, 2, ..., m − 1. Plugging the expressions of n and the n_k's in terms of n_1 into (20), we obtain an optimal complexity

f_m(n) = O( m n^{2/(1−((s−2)/s)^m)} ).

For any s > 2, this complexity can be O(n^{2+ε}) even for small m. For instance, for the matrix spectral norm where s = 3, the optimal complexity is O(n^{2.25}) for m = 2 and O(n^{2.077}) for m = 3.

We emphasize that the main purpose of Algorithm 6.1 is not to replace or compete with existing methods for matrix norm computation. It cannot even work without these methods, as it needs them to compute the norms of submatrices. The main job of Algorithm 6.1 is to speed up these methods, as illustrated in the above derivation of complexities, and at the same time to maintain good approximability, which is shown below numerically.
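The bookkeeping above can be checked with a small sketch computing the optimal exponent 2/(1 − ((s−2)/s)^m) and the balanced factor sizes with n_{k+1} = n_k^{(s−2)/s}; the function names are ours, not from the paper.

```python
def complexity_exponent(s, m):
    # Exponent e such that the optimal complexity of Algorithm 6.1 is O(m * n^e).
    r = (s - 2.0) / s
    return 2.0 / (1.0 - r ** m)

def balanced_factors(n, s, m):
    # Real-valued factor sizes n_1 >= ... >= n_m satisfying
    # n_{k+1} = n_k^{(s-2)/s} and n_1 * ... * n_m = n; in practice one
    # rounds these to integers whose product is n.
    r = (s - 2.0) / s
    n1 = n ** (2.0 / (s * (1.0 - r ** m)))
    return [n1 ** (r ** k) for k in range(m)]
```

For the matrix spectral norm (s = 3) this reproduces the exponents 2.25 (m = 2) and about 2.077 (m = 3), and for n = 10^4 and m = 2 it yields the factorization (1000, 10).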
We now conduct some preliminary numerical tests to see the performance of Algorithm 6.1. In the first set of tests, we choose n × n matrices for n = 10^4, whose dimension is reasonably large yet computationally tractable by MATLAB on a personal computer. According to Algorithm 6.1, if the level m is set to one, it simply computes the spectral norm of the original matrix. Using the optimal complexity setting in Theorem 6.4, for m = 2 we should set (n_1, n_2) = (1000, 10), and for m = 3 we should set (n_1, n_2, n_3) = (625, 8, 2) ≈ (586, 8.39, 2.03) in order to make n_1 n_2 n_3 = n. As mentioned above, the complexity for 2 levels is O(n^{2.25}) and that for 3 levels is O(n^{2.077}). For comparison, some classical upper bounds of the matrix spectral norm are used [14, Sect. 2.3]. The test matrices include random matrices with i.i.d. normal entries, nonnegative matrices, low-rank matrices, and covariance matrices with covariance function exp(−(i−j)^2 / (2s^2)) for several s. The spectral norm of the original matrix is computed and scaled to one for easy reference. In the results shown in Table 1, each entry for random matrices is the average of 10 randomly generated instances. The results clearly indicate that the bounds obtained by norm compression hierarchies, whether with 2 levels or 3 levels, are good estimates of the true spectral norm, and are in general better than the other classical upper bounds.
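As an illustration of one of the test-matrix families above, the following sketch builds a covariance matrix with entries exp(−(i−j)^2/(2s^2)) and compares its true spectral norm with one classical upper bound of the kind referenced above, sqrt(‖M‖_1 ‖M‖_∞); the size n = 200 and the value s = 5 are ours, chosen small so the example runs quickly, and are not the settings of the experiments.

```python
import numpy as np

def covariance_matrix(n, s):
    # K[i, j] = exp(-(i - j)^2 / (2 s^2)), a symmetric positive
    # semidefinite covariance matrix.
    i = np.arange(n)
    return np.exp(-(i[:, None] - i[None, :]) ** 2 / (2.0 * s ** 2))

M = covariance_matrix(200, 5.0)
true_norm = np.linalg.norm(M, 2)  # spectral norm (largest singular value)
classical = np.sqrt(np.linalg.norm(M, 1) * np.linalg.norm(M, np.inf))
```

The classical bound always dominates the spectral norm, since ‖M‖_2 ≤ sqrt(‖M‖_1 ‖M‖_∞) for any matrix.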
In the second set of tests, we enlarge the number of levels in Algorithm 6.1 in order to investigate the effect of increasing the level on the computed spectral norms. According to Proposition 6.2, the spectral norms of the norm compression hierarchy increase as the level increases in Algorithm 6.1. For this purpose, we test n × n matrices with n = 3^8 = 6561, i.e., n_1 = n_2 = ··· = n_8 = 3. The same types of data matrices are tested, and their spectral norms are scaled to one. The results of the second set are shown in Table 2, where each entry for random matrices is the average of 10 randomly generated instances. We find that increasing the number of compression levels has little effect on the computed spectral norms, while the benefit of increasing the level lies in decreasing the complexity of Algorithm 6.1, as can be seen from (20). The estimates of the spectral norms of random matrices generated from i.i.d. normal distributions by the norm compression hierarchy are not good, as observed in both Tables 1 and 2. Perhaps this is in the nature of these matrices, as the other classical upper bounds are not good either (Table 1). For nonnegative matrices, low-rank matrices and covariance matrices, the bounds obtained by Algorithm 6.1 are very good, and the algorithm enjoys a low complexity.
To conclude this section, we remark that Algorithm 6.1 can be easily extended to nonsquare matrices (R^{n_1 × n_2} with n_1 ≠ n_2), to higher-order tensors (R^{n_1 × n_2 × ··· × n_d} with d ≥ 3), to different factorizations of the n_k's of a tensor, and to modal partitions with subtensors of different sizes. We do not expand on these extensions here. The least message is that estimating the matrix spectral norm via norm compression hierarchies can be done in O(n^{2+ε}) operations with good accuracy. A more important message is that for any tensor or matrix norm computation method, whether existing or yet to be developed, the approach via norm compression hierarchies can speed it up while keeping good approximability.
To briefly conclude the whole paper, we proposed norm compression inequalities for partitioned block tensors. For any spectral p-norm, the norm of a norm compression tensor is an upper bound of the norm of the original tensor. By applying these inequalities, various bounds of tensor and matrix spectral norms in the literature can be improved. Norm compression inequalities for tensors have shown good potential in studying the extremal ratio between the spectral norm and the Frobenius norm of a tensor space, and in estimating tensor and matrix norms via norm compression hierarchies. We believe this is a promising start and that the research can be further extended both in theory and in applications.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Summarizing the above discussion, we have the following result.

Theorem 6.4 If the complexity to compute a norm of a general n × n matrix is O(n^s) for some s > 2, and n is factorized as n = ∏_{k=1}^{m} n_k with n_{k+1} = O(n_k^{(s−2)/s}) for k = 1, 2, ..., m − 1, then the complexity of Algorithm 6.1 is f_m(n) := O(m n^{2/(1−((s−2)/s)^m)}).

Table 1
Spectral norm and its upper bounds for n × n matrices when n = 10^4

Table 2
Spectral norms of the norm compression hierarchy for n × n matrices when n = 3^8 = 6561