Abstract
For \(m,n\in \mathbb {N}\), let \(X=(X_{ij})_{i\le m,j\le n}\) be a random matrix, \(A=(a_{ij})_{i\le m,j\le n}\) a real deterministic matrix, and \(X_A=(a_{ij}X_{ij})_{i\le m,j\le n}\) the corresponding structured random matrix. We study the expected operator norm of \(X_A\) considered as a random operator between \(\ell _p^n\) and \(\ell _q^m\) for \(1\le p,q \le \infty \). We prove optimal bounds up to logarithmic terms when the underlying random matrix X has i.i.d. Gaussian entries, independent mean-zero bounded entries, or independent mean-zero \(\psi _r\) (\(r\in (0,2]\)) entries. In certain cases, we determine the precise order of the expected norm up to constants. Our results are expressed through a sum of operator norms of Hadamard products \(A\circ A\) and \((A\circ A)^T\).
1 Introduction and main results
With his work on the statistical analysis of large samples [69], Wishart initiated the systematic study of large random matrices. Ever since, random matrices have continuously entered more and more areas of mathematics and applied sciences beyond probability theory and statistics, for instance, in numerical analysis through the work of Goldstine and von Neumann [20, 65] and in quantum physics through the works of Wigner [66,67,68] on his famous semicircle law, which resulted in significant effort to understand spectral statistics of random matrices from an asymptotic point of view. Today, random matrix theory has grown into a vital area of probability theory and statistics, and within the last two decades, random matrices have come to play a major role in many areas of (algorithmic) computational mathematics, for instance, in questions related to sparsification methods [1, 54] and sparse approximation [57, 58], dimension reduction [4, 12, 44], or combinatorial optimization [46, 53]. We refer the reader to [5, 6, 60] for more information.
In this paper, we are interested in the non-asymptotic theory of (large) random matrices. This theory has played a fundamental role in geometric functional analysis since at least the 1970s, and the connection comes in various flavors. It is of particular importance in the geometry of Banach spaces and the theory of operator algebras [9, 10, 15, 18, 21, 30] and their applications to high-dimensional problems, for instance, in convex geometry [17, 22], compressed sensing [14, 16, 48, 63], information-based complexity [27, 28], or statistical learning theory [50, 64]. On the other hand, geometric functional analysis has had, and still has, an enduring influence on random matrix theory, as witnessed, for instance, by applications of measure concentration techniques; we refer to [15, 42] and the references cited therein. The quantity we focus on here is the expected operator norm of random matrices considered as operators between finite-dimensional \(\ell _p\) spaces; recall that \(\ell _p^n\) denotes the space \(\mathbb {R}^n\) equipped with the (quasi-)norm \(\Vert \cdot \Vert _p\), given by \(\Vert (x_j)_{j=1}^n\Vert _p = (\sum _{j=1}^n |x_j|^p)^{1/p}\) for \(0<p<\infty \) and \(\Vert (x_j)_{j=1}^n\Vert _\infty = \max _{j\le n}|x_j|\) if \(p=\infty \). We address the following problem: for \(1 \le p,q \le \infty \) and \(m,n\in \mathbb {N}\), determine the right order (up to constants that may depend on the parameters p and q) of
$$\begin{aligned} \mathbb {E}\Vert X_A :\ell _p^n \rightarrow \ell _q^m\Vert , \end{aligned}$$
where, given a deterministic real \(m\times n\) matrix \(A = (a_{ij})_{i\le m, j\le n}\) and a random matrix \(X = (X_{ij})_{i\le m, j\le n}\), we denote by
$$\begin{aligned} X_A {:=}(a_{ij}X_{ij})_{i\le m, j\le n} = A\mathbin {\circ }X \end{aligned}$$
the structured random matrix; the symbol \(\mathbin {\circ }\) stands for the Hadamard product of matrices (i.e., entrywise multiplication). The bounds on the expected operator norm should be of optimal order and expressed in terms of the coefficients \(a_{ij}\), \(i\le m,j\le n\). Understanding such expressions and related quantities is important, for instance, when studying the worst-case error of optimal algorithms which are based on random information in function approximation problems [28] (see also [33]) or the quality of random information for the recovery of vectors from an \(\ell _p\)-ellipsoid, where (the radius of) optimal information is given by Gelfand numbers of a diagonal operator [29].
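As a minimal illustration of this notation, the Python sketch below (standard library only; the helper names `p_norm`, `hadamard`, and `norm_l1_to_lq` are our own, not from the paper) forms \(G_A = A\mathbin {\circ }G\) for a Gaussian matrix G and evaluates the one operator norm with an elementary closed form: for \(p=1\), \(\Vert B:\ell _1^n\rightarrow \ell _q^m\Vert \) is the largest \(\ell _q\) norm of a column of B, because the extreme points of \(B_1^n\) are \(\pm e_j\) (a special case of Lemma 2.1 below). For general p, q the code does not attempt to compute the norm.

```python
import math
import random


def p_norm(x, p):
    """(Quasi-)norm ||x||_p for 0 < p < inf; the max norm for p = inf."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def hadamard(A, X):
    """Entrywise (Hadamard) product A o X of two m x n matrices given as lists of rows."""
    return [[a * x for a, x in zip(row_a, row_x)] for row_a, row_x in zip(A, X)]


def norm_l1_to_lq(B, q):
    """||B : l_1^n -> l_q^m|| = max_j ||(b_ij)_i||_q, because Ext(B_1^n) = {+-e_j}."""
    n = len(B[0])
    return max(p_norm([row[j] for row in B], q) for j in range(n))


# A structured Gaussian matrix G_A = A o G (fixed seed for reproducibility).
rng = random.Random(0)
A = [[1.0, 0.5, 0.0],
     [0.0, 2.0, 1.0]]
G = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]
G_A = hadamard(A, G)
print(norm_l1_to_lq(G_A, 2.0))
```

Zero coefficients of A force the corresponding entries of \(G_A\) to vanish, which is exactly how the structure enters.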
In the case where the random entries of X are i.i.d. standard Gaussians (then we write \(G_A\) instead of \(X_A\)) and \(1\le p,q \le \infty \), we will show the following bound, which is sharp up to logarithmic terms:
where \(D_1 {:=}\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\), \(D_2 {:=}\Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}\), and \(p^*\) denotes the Hölder conjugate of p defined by the relation \(1/p+1/p^*=1\). As will be explained later, we obtain sharp estimates in certain cases and derive results similar to (1.1) for other models of randomness.
1.1 History of the problem and known results
In what follows, \(A = (a_{ij})_{i,j}\) is a real deterministic matrix and \(G=(g_{ij})_{i,j}\) always stands for a random matrix with i.i.d. standard Gaussian entries (usually the matrices are of size \(m\times n\) unless explicitly stated otherwise). We use C(r), C(r, K), etc. for positive constants which may depend only on the parameters given in brackets and write \(C, C', c,c',\dots \) for positive absolute constants. The symbols \(\lesssim \), \(\lesssim _{r}\), \(\lesssim _{r, K}\), etc. denote that the inequality holds up to multiplicative constants depending only on the parameters given in the subscripts; we write \(a\asymp b\) if \(a\lesssim b\) and \(b\lesssim a\), and \(\asymp _r\), \(\asymp _{r,K}\), etc. if the constants may depend on the parameters given in the subscript.
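In 1975, Bennett, Goodman, and Newman [9] proved that if X is an \(m\times n\) random matrix with independent, mean-zero entries taking values in \([-1,1]\), and \(2\le q < \infty \), then
$$\mathbb {E}\Vert X:\ell ^n_2 \rightarrow \ell ^m_q\Vert \lesssim _q \max \{n^{1/2}, m^{1/q}\}. \tag{1.2}$$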
In fact, up to constants, this estimate is best possible: for any \(m\times n\) matrix \(X'\) with \(\pm 1\) entries one readily sees that \( \Vert X':\ell ^n_2 \rightarrow \ell ^m_q\Vert \ge \max \{n^{1/2}, m^{1/q}\}\); just use standard unit vectors and operator duality. Moreover, in this ‘unstructured’ case, where \(a_{ij}=1\) for all i, j, it is easy to extend (1.2) to the whole range of \(p, q\in [1,\infty ]\) (see [8, 13] or Remark 4.2 below). Also, if all entries are i.i.d. Rademacher random variables, then the bounds are two-sided, i.e., the expected operator norm is, up to constants, the same as the minimal norm for all p, q (see [8, Proposition 3.2] or [13, Satz 2]).
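The lower bound \(\max \{n^{1/2}, m^{1/q}\}\) for \(\pm 1\) matrices can be certified by two explicit unit vectors of \(\ell _2^n\). The sketch below (Python, standard library; `certified_lower_bounds` is a hypothetical helper name) evaluates \(\Vert X'x\Vert _q\) for \(x=e_1\) and for x equal to the normalized first row of \(X'\); the second choice is the operator-duality argument written out explicitly.

```python
import random


def q_norm(x, q):
    """||x||_q for 1 <= q < inf."""
    return sum(abs(t) ** q for t in x) ** (1.0 / q)


def certified_lower_bounds(X, q):
    """For an m x n matrix X with +-1 entries, return ||X x||_q for two unit vectors x of l_2^n.

    * x = e_1: X e_1 is the first column, so ||X e_1||_q = m^(1/q).
    * x = (first row of X)/sqrt(n): (X x)_1 = n/sqrt(n) = sqrt(n), hence ||X x||_q >= sqrt(n).
      This is ||X|| = ||X^T : l_{q*}^m -> l_2^n|| >= ||X^T e_1||_2 made explicit.
    """
    m, n = len(X), len(X[0])
    by_column = q_norm([X[i][0] for i in range(m)], q)
    x = [X[0][j] / n ** 0.5 for j in range(n)]
    Xx = [sum(X[i][j] * x[j] for j in range(n)) for i in range(m)]
    by_row = q_norm(Xx, q)
    return by_column, by_row


rng = random.Random(0)
m, n, q = 50, 30, 3.0
X = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
print(certified_lower_bounds(X, q), max(n ** 0.5, m ** (1 / q)))
```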
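The case most studied in the literature is that of the spectral norm, i.e., the \(\ell _2^n \rightarrow \ell _2^m\) operator norm. Seginer [51] proved in 2000 that if \(X = (X_{ij})_{i\le m, j\le n}\) is an \(m\times n\) random matrix with i.i.d. mean-zero entries, then its expected operator norm is of the same order as the sum of the expected maximal Euclidean norms of the rows and columns of X, i.e.,
$$\begin{aligned} \mathbb {E}\Vert X:\ell _2^n\rightarrow \ell _2^m\Vert \asymp \mathbb {E}\max _{i\le m}\Vert (X_{ij})_{j\le n}\Vert _2 + \mathbb {E}\max _{j\le n}\Vert (X_{ij})_{i\le m}\Vert _2 . \end{aligned}$$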
Riemer and Schütt [49] proved that, up to a logarithmic factor \(\ln (en)^2\), the same holds true for any random matrix with independent but not necessarily identically distributed mean-zero entries. Let us also mention that in the Gaussian setting one can use a non-commutative Khintchine bound (see, e.g., [59, Equation (4.9)]) to show that, up to a factor \(\sqrt{\ln n}\), the expected spectral norm is of the order of the largest Euclidean norm of its rows and columns.
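In the very same setting that was considered by Riemer and Schütt, Latała [37] had obtained a few years earlier the dimension-free estimate
$$\begin{aligned} \mathbb {E}\Vert X:\ell _2^n\rightarrow \ell _2^m\Vert \lesssim \max _{i\le m}\Bigl (\sum _{j\le n}\mathbb {E}X_{ij}^2\Bigr )^{1/2} + \max _{j\le n}\Bigl (\sum _{i\le m}\mathbb {E}X_{ij}^2\Bigr )^{1/2} + \Bigl (\sum _{i\le m, j\le n}\mathbb {E}X_{ij}^4\Bigr )^{1/4}. \end{aligned}$$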
This bound is superior to the Riemer–Schütt bound in the case of matrices with all entries equal to 1 and is optimal for Wigner matrices. In other cases, like the one of diagonal matrices, the Riemer–Schütt bound is better.
In the case of structured Gaussian matrices, Latała, van Handel, and Youssef [40], building upon earlier work of Bandeira and van Handel [7] (which combined the moment method with combinatorial considerations) as well as results proved by van Handel in [61] (which used Slepian’s lemma), obtained the precise behavior without any logarithmic terms in the dimension, namely
Their proof is based on a clever block decomposition of the underlying matrix (see [40, Fig. 3.1]). This result finally answered in the affirmative a conjecture made by Latała more than a decade before. We also refer the reader to the survey [62] discussing in quite some detail results prior to [40] and [61]—the latter work discusses the conjectures of Latała and van Handel and shows their equivalence.
Very recently, Latała and Świątkowski [39] investigated a similar problem when the underlying random matrix has Rademacher entries. They proved a lower bound which, up to a \(\ln \ln n\) factor, can be reversed for randomized \(n\times n\) circulant matrices.
In [23], Guédon, Hinrichs, Litvak, and Prochno studied our main and motivating question on the order of the expected operator norm of structured random matrices considered as operators between \(\ell _p^n\) and \(\ell _q^m\) in the special case where \( p\le 2\le q \) and the random entries are Gaussian. In this situation, where we are not dealing with the spectral norm, the moment method cannot be employed. The approach in [23] was therefore different and based on a majorizing measure construction combining the works [24] and [25]. In [23, Theorem 1.1], the authors proved that if \(1< p\le 2\le q < \infty \), then
where \(\gamma _r {:=}(\mathbb {E}|g|^r)^{1/r}\) for a standard Gaussian random variable g. Moreover, for \(p = 1\) and \(q \ge 2\), it was noted in [23, Remark 1.4] (see also [45, Twierdzenie 2]) that
Later, an extension of (1.5) to the case of matrices with i.i.d. isotropic log-concave rows was obtained by Strzelecka in [55].
Trying to extend the upper bound for \(\mathbb {E}\Vert G_A:\ell _p^n \rightarrow \ell _q^m\Vert \) to the whole range \(1\le p, q\le \infty \), one encounters two difficulties. First of all, the methods used in order to prove (1.5) fail if \(q< 2\) or \(p > 2\), because the majorizing measure construction used in [23] is restricted to the case \(q\ge 2\) and the assumption \(1<p\le 2\) is required in a Hölder bound. Moreover, when \(q< 2\) or \(p > 2\), the result cannot hold with a right-hand side of the same form as in (1.5) (see Remark 4.2 below for counterexamples to (1.5) in the cases \(q< 2\) and \(p > 2\)). This explains the different form of the expressions \(D_1\) and \(D_2\) in (1.1), which in the range \(p \le 2 \le q\) reduce to the maxima of norms on the right-hand side of (1.5); see (1.9) below.
1.2 Lower bounds and conjectures
By arguments similar to the ones used in order to prove the lower bound in (1.4), one can check that in the range considered in [23, 45] (i.e., \(1\le p\le 2\le q \le \infty \)) one has
Note that for \(p=1\),
which explains the simplified form of (1.6).
We remark that the proof of (1.7) is based merely on the observation that the operator norm is at least the maximal absolute entry of the matrix and at least the appropriate maximal norms of its rows and columns, combined with comparison of moments for Gaussian random vectors. A different but related way to proceed, valid for all \(1 \le p, q \le \infty \), is to exchange expectation and suprema over the \(\ell _p^n\) and \(\ell _{q^*}^m\) balls in the definition of the operator norm. We present the details in Sect. 5.1. In particular, Proposition 5.1 and Corollary 5.2 imply that, for \(1\le p,q \le \infty \),
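It is an easy observation (see Lemma 2.1 below) that for \(p\le 2 \le q\),
$$D_1 = \max _{j\le n}\Vert (a_{ij})_{i\le m}\Vert _q \qquad \text {and}\qquad D_2 = \max _{i\le m}\Vert (a_{ij})_{j\le n}\Vert _{p^*}. \tag{1.9}$$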
Thus, in the range \(1 \le p \le 2 \le q < \infty \) considered in [23, 45], the lower bounds (1.7) and (1.8) coincide.
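By the first part of Lemma 2.1 (applied with \(r=p/2\le 1\), \(s=q/2\ge 1\) to \(A\mathbin {\circ }A\), and with \(r=q^*/2\le 1\), \(s=p^*/2\ge 1\) to \((A\mathbin {\circ }A)^T\)), in the range \(p\le 2\le q\) the quantities \(D_1\), \(D_2\) are maxima of column and row norms, so all terms of the lower bound are easy to evaluate. The Python sketch below computes them for a given A; `D1_D2_in_range` and `expected_max_entry` are hypothetical helper names, and the last term is only a Monte Carlo estimate.

```python
import math
import random


def conj(p):
    """Hölder conjugate p* of p in [1, inf], defined by 1/p + 1/p* = 1."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)


def p_norm(x, p):
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def D1_D2_in_range(A, p, q):
    """D_1 = max_j ||(a_ij)_i||_q and D_2 = max_i ||(a_ij)_j||_{p*}; valid only for 1 <= p <= 2 <= q <= inf."""
    if not (1 <= p <= 2 <= q):
        raise ValueError("closed form requires 1 <= p <= 2 <= q")
    m, n = len(A), len(A[0])
    D1 = max(p_norm([A[i][j] for i in range(m)], q) for j in range(n))
    D2 = max(p_norm(A[i], conj(p)) for i in range(m))
    return D1, D2


def expected_max_entry(A, samples=20000, seed=0):
    """Monte Carlo estimate of E max_{i,j} |a_ij g_ij| with i.i.d. standard Gaussians g_ij."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += max(abs(a * rng.gauss(0.0, 1.0)) for row in A for a in row)
    return total / samples


A = [[1.0, 2.0], [3.0, 4.0]]
print(D1_D2_in_range(A, 1.5, 4.0), expected_max_entry(A))
```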
Although it would be natural to conjecture at this point that the bound (1.8) may be reversed up to a multiplicative constant depending only on p, q, such a reverse bound turns out not to be true in the case \(p\le q< 2\) (and in the dual one \(2< p\le q\)) as we shall show in Sect. 5.3.
In order to conjecture the right asymptotic behavior of \(\mathbb {E}\Vert G_A:\ell _p^n \rightarrow \ell _q^m\Vert \), one may take a look at the boundary values of p and q, i.e., \(p\in \{1,\infty \}\) or \(q\in \{1, \infty \}\). Note that (1.6) provides an asymptotic behavior of \(\mathbb {E}\Vert G_A:\ell _p^n \rightarrow \ell _q^m\Vert \) on a part of this boundary (i.e., for \(p=1\) and \(2\le q\le \infty \) and in the dual case \(q=\infty \) and \(1\le p\le 2\)). We provide sharp results on the remaining parts of the boundary of \([1,\infty ]\times [1,\infty ]\) (see dense lines on the boundary of Fig. 1 below):
where
and with \((x_1^{\downarrow {}}, \ldots ,x_n^{\downarrow {}})\) denoting the non-increasing rearrangement of \((|x_1|,\ldots , |x_n|)\) for a given \((x_j)_{j\le n}\in \mathbb {R}^n\). (For the precise formulation see Propositions 1.8 and 1.10, and Corollary 1.11 below.)
Moreover, in Sect. 5.1 we generalize the lower bounds from the boundary into the whole range \((p,q)\in [1,\infty ]\times [1,\infty ]\) (see Fig. 1 below), i.e., we prove
Let us now discuss the relation between the terms appearing above. We postpone the proofs of all the following claims to Sect. 5.
In the case \(p\le 2 \le q\), we have
where the matrices \((a_{ij}')_{i,j}\) and \((a_{ij}'')_{i,j}\) are obtained by permuting the columns and rows, respectively, of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _i a_{i1}'\ge \dots \ge \max _i a_{in}'\) and \(\max _j a_{1j}'' \ge \dots \ge \max _j a_{mj}''\). Therefore, in the range \(1\le p\le q \le \infty \) the right-hand side of (1.10) changes continuously with p and q (for a fixed matrix A).
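Obviously, \( \max _{j\le n}\sqrt{\ln (j+1)} b_j^{\downarrow {}} \ge \max _{i\le m, j\le n}\sqrt{\ln (j+1)} a_{ij}' \) and, in general, the former quantity may be of larger order than the latter one. In Sect. 5.3 we shall present a more subtle relation: for every \(1\le p\le q< 2\) we shall give an example showing that the right-hand side of (1.10) may be of larger order than \(D_1+D_2+\mathbb {E}\max _{i \le m,j\le n} |a_{ij}g_{ij}| \). Note that by duality, i.e., the fact that
$$\mathbb {E}\Vert G_A:\ell _p^n\rightarrow \ell _q^m\Vert = \mathbb {E}\Vert (G_A)^T:\ell _{q^*}^m\rightarrow \ell _{p^*}^n\Vert = \mathbb {E}\Vert G_{A^T}:\ell _{q^*}^m\rightarrow \ell _{p^*}^n\Vert , \tag{1.12}$$
where in the last term \(G_{A^T}\) is built from an \(n\times m\) matrix with i.i.d. standard Gaussian entries,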
the same holds in the case \(2< p\le q\). This suggests that the behavior of \(\mathbb {E}\Vert G_A:\ell _p^n\rightarrow \ell _q^m\Vert \) is different in the regions with horizontal or vertical lines than in the region with northeast lines.
Moreover, we have
(see Sect. 5.2). Note that this is not the case for \(p \le q\), as one can easily see by considering, e.g., A equal to the identity matrix. This suggests that, in the region with northwest lines, \(\mathbb {E}\Vert G_A:\ell _p^n\rightarrow \ell _q^m\Vert \) behaves in a simpler way than in the other regions.
Given the discussion above, the lower bounds presented in (1.10), and the fact that they can be reversed for all \(p\in [1,\infty ]\), \(q\in \{1,\infty \}\) (and for all \(q\in [1,\infty ]\), \(p\in \{1,\infty \}\)), it is natural to conjecture the following.
Conjecture 1
For all \(1\le p, q \le \infty \), we conjecture that
Remark 1.1
One could pose another natural conjecture, based on the potential generalization of the first line of the bound (1.4), namely that the inequality
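holds for all \(1\le p,q \le \infty \). The lower bound in (1.15) is indeed true with constant \(\frac{1}{2}\), since for every deterministic matrix X one has (testing X against \(e_j\) and, by duality, \(X^T\) against \(e_i\))
$$\begin{aligned} \Vert X:\ell _p^n\rightarrow \ell _q^m\Vert \ge \max \Bigl \{\max _{j\le n}\Vert (X_{ij})_{i\le m}\Vert _q,\ \max _{i\le m}\Vert (X_{ij})_{j\le n}\Vert _{p^*}\Bigr \} \ge \frac{1}{2}\Bigl (\max _{j\le n}\Vert (X_{ij})_{i\le m}\Vert _q + \max _{i\le m}\Vert (X_{ij})_{j\le n}\Vert _{p^*}\Bigr ). \end{aligned}$$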
However, as we prove in Sect. 5.4, this conjecture is wrong: although the right-hand sides of (1.14) and (1.15) are comparable in the range \(1\le p \le 2\le q\le \infty \), for every pair of p, q outside this range the right-hand side of (1.15) may be of smaller order than the left-hand side.
Let us now present a conjecture concerning the boundedness of linear operators given by infinite dimensional matrices. In what follows, we say that a matrix \(B= (b_{ij})_{i,j\in \mathbb {N}}\) defines a bounded operator from \(\ell _p(\mathbb {N})\) to \(\ell _q(\mathbb {N})\) if for all \(x \in \ell _p(\mathbb {N})\) the product Bx is well defined, belongs to \(\ell _q(\mathbb {N})\) and the corresponding linear operator is bounded.
Conjecture 2
Let \(A = (a_{ij})_{i,j\in \mathbb {N}}\) be an infinite matrix with real coefficients and let \(1\le p, q \le \infty \). We conjecture that the matrix \(G_A = (a_{ij}g_{ij})_{i,j\in \mathbb {N}}\) defines a bounded linear operator between \(\ell _p(\mathbb {N})\) and \(\ell _q(\mathbb {N})\) almost surely if and only if the matrix \(A\circ A\) defines a bounded linear operator between \(\ell _{p/2}(\mathbb {N})\) and \(\ell _{q/2}(\mathbb {N})\), the matrix \((A\circ A)^T\) defines a bounded linear operator between \(\ell _{q^*/2}(\mathbb {N})\) and \(\ell _{p^*/2}(\mathbb {N})\), and
- in the case \(p\le 2\le q\), \(\mathbb {E}\sup _{i,j\in \mathbb {N}} |a_{ij}g_{ij}|<\infty \),
- in the case \(p\le q\le 2\), \(\lim _{j\rightarrow \infty } b_j = 0\), and \(\sup _{j\in \mathbb {N}}\sqrt{\ln (j+1)} b_j^{\downarrow {}} <\infty \), where \(b_j = \Vert (a_{ij})_{i\in \mathbb {N}}\Vert _{2q/(2-q)}\), \(j\in \mathbb {N}\),
- in the case \(2\le p \le q\), \(\lim _{i\rightarrow \infty } d_i = 0\), and \(\sup _{i\in \mathbb {N}}\sqrt{\ln (i+1)} d_i^{\downarrow {}} <\infty \), where \(d_i {:=}\Vert (a_{ij})_{j\in \mathbb {N}} \Vert _{2p/(p-2)}\), \(i\in \mathbb {N}\),
- (in the case \(q<p\) we do not need to assume any additional conditions).
We remark that it suffices to prove Conjecture 1 in order to confirm Conjecture 2.
Proposition 1.2
Assume \(1\le p, q\le \infty \). Then (1.14) for this choice of p, q implies the assertion of Conjecture 2 for the same choice of p, q.
We postpone the proof of this proposition to Subsection 5.5.
In this article, in addition to the cases \(p=q=2\) obtained in [40] and \(p=1, q\ge 2\) proved in [23, 45], we confirm Conjecture 1 when \(p\in \{1,\infty \}\), \(q\in [1,\infty ] \) and when \(q\in \{1,\infty \}\), \(p\in [1,\infty ] \). In all the other cases, we are able to prove the upper bounds only up to logarithmic (in the dimensions m, n) multiplicative factors (see Corollary 1.4 below). In particular, Proposition 1.2 implies that Conjecture 2 holds for all \(p\in \{1,\infty \}\), \(q\in [1,\infty ] \) and for all \(q\in \{1,\infty \}\), \(p\in [1,\infty ] \).
Note that in the structured case we work with, interpolating the results obtained for the boundary cases \(p\in \{1, \infty \}\) or \(q\in \{1, \infty \}\) gives a bound with polynomial (in the dimensions) multiplicative constants which are much worse than logarithmic constants from Corollary 1.4 below. However, as we shall see in Remark 4.2 below, interpolation techniques work well in the non-structured case.
1.3 Main results valid for \(1\le p, q\le \infty \)
We start with general theorems valid for the whole range of p, q. Results which are based on methods working only for specific values of p, q, but yielding better logarithmic terms, are presented in the next subsection. A brief summary and comparison of all results can be found in Table 1.
Before stating our main results, we need to introduce additional notation. For a non-empty set \(J\subset \{1,\ldots ,n\}\), and \(1\le p\le \infty \), we define
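By \(\ell _p^J\) we denote the space \(\mathbb {R}^J{:=}\bigl \{(x_j)_{j\in J}: x_j\in \mathbb {R}\bigr \}\) equipped with the norm
$$\begin{aligned} \Vert x\Vert _{\ell _p^J} {:=}\Bigl (\sum _{j\in J}|x_j|^p\Bigr )^{1/p} \quad (1\le p<\infty ), \qquad \Vert x\Vert _{\ell _\infty ^J} {:=}\max _{j\in J}|x_j|, \end{aligned}$$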
whose unit ball is \(B_p^J\). Obviously, the space \(\ell _p^J\) can be identified with a subspace of \(\ell _p^n\). If \(A:\ell _p^n\rightarrow \ell _q^m\) is a linear operator, the notation \(A:\ell _p^J\rightarrow \ell _q^I\) means that A is restricted to the space \(\ell _p^J\) and composed with a projection onto \(\ell _q^I\). Moreover, for \(x=(x_1,\ldots , x_n)\in \mathbb {R}^n\), \(\sup _{J} \Vert x\Vert _{\ell _p^J} = \bigl (\sum _{j\le k} |x_j^{\downarrow {}}|^p\bigr )^{1/p}\), where the supremum is taken over all \(J\subset \{1,\ldots ,n\}\) with \(|J|=k\), and \((x_1^{\downarrow {}}, \ldots ,x_n^{\downarrow {}})\) is the non-increasing rearrangement of \((|x_1|,\ldots , |x_n|)\).
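The identity \(\sup _{|J|=k}\Vert x\Vert _{\ell _p^J} = (\sum _{j\le k} (x_j^{\downarrow {}})^p)^{1/p}\) says that the best support of size k consists of the k largest coordinates in absolute value. A small brute-force check in Python (standard library; both function names are our own) compares the two sides over all subsets of a short vector, for finite p.

```python
import itertools


def sup_over_supports(x, p, k):
    """Brute force: max over all J with |J| = k of ||x||_{l_p^J} (finite p)."""
    return max(sum(abs(x[j]) ** p for j in J) ** (1.0 / p)
               for J in itertools.combinations(range(len(x)), k))


def via_rearrangement(x, p, k):
    """(sum_{j<=k} (x_j^down)^p)^(1/p) with x^down the non-increasing rearrangement of |x|."""
    x_down = sorted((abs(t) for t in x), reverse=True)
    return sum(t ** p for t in x_down[:k]) ** (1.0 / p)


x = [0.3, -2.0, 1.5, 0.0, -0.7]
for p in (1.0, 1.5, 3.0):
    print(p, [round(via_rearrangement(x, p, k), 4) for k in range(1, len(x) + 1)])
```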
Theorem 1.3
(Main theorem in a general version with sets \( I_0\), \(J_0\)) Assume that \(m\le M\), \(n\le N\), \(1\le p,q \le \infty \), and \(G=(g_{ij})_{i\le M, j\le N}\) has i.i.d. standard Gaussian entries. Then
where the suprema are taken over all sets \(I_0\subset \{1,\ldots ,M\}\), \(J_0\subset \{1,\ldots ,N\}\) such that \(|I_0|=m\), \(|J_0|=n\).
The above theorem gives an estimate on the largest operator norm among all submatrices of \(G_A\) of fixed size. Let us remark that apart from being of intrinsic interest, quantities of this type (for \(p=q=2\)) have appeared in connection with the study of the restricted isometry property of random matrices with independent rows [2] or in the analysis of entropic uncertainty principles for random quantum measurements [3, 47].
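Let us now give an outline of the proof of Theorem 1.3. Note that
$$\sup _{I_0,J_0}\Vert G_A:\ell _p^{J_0}\rightarrow \ell _q^{I_0}\Vert = \sup _{I_0,J_0}\ \sup _{x\in B_p^{J_0}}\ \sup _{y\in B_{q^*}^{I_0}} \sum _{i\in I_0}\sum _{j\in J_0} y_i a_{ij}g_{ij}x_j. \tag{1.16}$$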
In the first step of our proof, we find polytopes L and K approximating (with accuracy depending logarithmically on the dimension) the unit balls in \(\ell _p^{J_0}\) and \(\ell _{q^*}^{I_0}\), respectively. The extreme points of the sets K and L have a special and simple structure: the absolute values of their non-zero coordinates are all equal to a constant depending only on the size of the support of a given point. Since K is close to \(B_{q^*}^{I_0}\) and L is close to \(B_p^{J_0}\), we may restrict the suprema in (1.16) to \(x\in {\text {Ext}}(L)\) and \(y\in {\text {Ext}}(K)\). Since the non-zero coordinates of \(x\in {\text {Ext}}(L)\) and \(y\in {\text {Ext}}(K)\), respectively, are all equal up to a sign, we may use a symmetrization argument and the contraction principle to remove x and y from the sum on the right-hand side of (1.16). Thus, in the next step of the proof we only need to estimate the expected value of
where I and J represent the potential supports of points in \({\text {Ext}}(K)\) and \({\text {Ext}}(L)\). To deal with this quantity, we first consider the suprema over subsets of fixed sizes and use Slepian’s lemma to compare the supremum of the Gaussian process above with the supremum of another Gaussian process, which is easy to bound. Then we use the factor \(|I|^{-1/q^*}|J|^{-1/p}\le 1\), which allows us to return to suprema over the sets \(B_p^ {J_0}\) and \(B_{q^*}^ {I_0}\). Finally, we use the Gaussian concentration inequality to remove the restriction on the sizes of the sets I and J and complete the proof.
Applying Theorem 1.3 with \(N=n\), \(M=m\) immediately yields the following result, which confirms Conjecture 1 up to some logarithmic terms.
Corollary 1.4
(Main theorem – \(\ell _p\) to \(\ell _q\) version) Assume that \(1\le p,q \le \infty \) and \(G=(g_{ij})_{i\le m, j\le n}\) has i.i.d. standard Gaussian entries. Then,
Moreover, we easily recover the same bound in the case of independent bounded entries. We state and prove a general version with sets \(I_0\) and \(J_0\) akin to Theorem 1.3 in Sect. 3.2.
Corollary 1.5
Assume that \(1\le p,q \le \infty \) and \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries taking values in \([-1,1]\). Then
We use the two results above to obtain an analogue in the case of \(\psi _r\) entries for \(r\in (0,2]\); these random variables are defined by (1.17).
This class contains, among others,
- log-concave random variables (which are \(\psi _1\)),
- heavy-tailed Weibull random variables (of shape parameter \(r\in (0,1)\), i.e., \(\mathbb {P}(|X_{ij}|\ge t)=e^{-t^r/L}\) for \(t\ge 0\)),
- random variables satisfying the condition
$$\begin{aligned} \Vert X_{ij}\Vert _{2\rho } \le \alpha \Vert X_{ij}\Vert _{\rho } \qquad \text {for all } \rho \ge 1; \end{aligned}$$these random variables are \(\psi _r\) with \(r=1/\log _2\alpha \). They were considered recently in [38].
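As a sanity check on the exponent \(r=1/\log _2\alpha \), one can look at the Weibull example from the list. Assuming \(\mathbb {P}(|X|\ge t)=e^{-t^r}\) (we take \(L=1\); rescaling does not change the ratio below), the substitution \(u=t^r\) gives \(\mathbb {E}|X|^\rho =\Gamma (1+\rho /r)\). The Python sketch below (our own helper names) computes \(\log _2(\Vert X\Vert _{2\rho }/\Vert X\Vert _\rho )\), i.e., the base-2 logarithm of the smallest admissible \(\alpha \) at a given \(\rho \), and shows it approaching \(1/r\) as \(\rho \) grows.

```python
import math


def weibull_lp_norm(rho, r):
    """||X||_rho = (E|X|^rho)^(1/rho) when P(|X| >= t) = exp(-t^r), t >= 0.

    Substituting u = t^r in E|X|^rho = int_0^inf rho t^(rho-1) exp(-t^r) dt gives Gamma(1 + rho/r).
    """
    return math.exp(math.lgamma(1.0 + rho / r) / rho)


def doubling_exponent(rho, r):
    """log_2(||X||_{2 rho} / ||X||_rho); 2**value is the smallest alpha that works at this rho."""
    return math.log2(weibull_lp_norm(2.0 * rho, r) / weibull_lp_norm(rho, r))


for r in (0.5, 1.0, 2.0):
    print(r, [round(doubling_exponent(rho, r), 4) for rho in (1, 10, 100, 10_000)])
```

The example only illustrates the scaling \(\Vert X\Vert _\rho \approx \rho ^{1/r}\); it does not prove the implication stated in the text.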
A general version of the following Corollary 1.6 with sets \(I_0\) and \(J_0\) is stated and proved in Subsection 3.2.
Corollary 1.6
Assume that \(K,L >0\), \(r\in (0,2]\), \(1\le p,q \le \infty \), and \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries satisfying
Then
1.4 Results for particular ranges of p, q
We continue with results for some specific ranges of p, q, where we are able to prove estimates with better logarithmic dependence (results which follow from them by duality (1.12) are stated in Table 1 to keep the presentation short). We postpone their proofs to Sect. 4. We start with the case of Gaussian random variables. Recall that \(\gamma _q = (\mathbb {E}|g|^q)^{1/q}\), where g is a standard Gaussian random variable.
Proposition 1.7
For all \(1\le p\le 2\) and \(1\le q< \infty \), we have
If \(q=1\) or \(p=\infty \), then we are able to get a result without logarithmic terms. Recall that for a sequence \((x_j)_{j\le n}\) we denote by \((x_j^{\downarrow {}})_{j\le n}\) the non-increasing rearrangement of \((|x_j|)_{j\le n}\).
Proposition 1.8
(i) For \(1< p \le \infty \), we have
$$\begin{aligned} & \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{1/2}\Vert ^{1/2} +\Vert (A\mathbin {\circ }A)^T :\ell ^m_{\infty } \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2} \lesssim \mathbb {E}\Vert G_A :\ell ^n_p\rightarrow \ell ^m_1\Vert \\ & \quad \le \gamma _1\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{1/2}\Vert ^{1/2} + 2 \gamma _{p^*} \Vert (A\mathbin {\circ }A)^T :\ell ^m_{\infty } \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}. \end{aligned}$$
(ii) Moreover,
$$\begin{aligned} \mathbb {E}\Vert G_A :\ell ^n_1\rightarrow \ell ^m_1\Vert \asymp \Vert A\mathbin {\circ }A :\ell ^n_{1/2} \rightarrow \ell ^m_{1/2}\Vert ^{1/2} + \max _{j\le n}\sqrt{\ln (j+1)} b_j^{\downarrow {}}, \end{aligned}$$where \(b_j {:=}\Vert (a_{ij})_{i\le m}\Vert _2\), \(j\le n\).
Note that (ii) shows, in particular, that the blow-up of the constant \(\gamma _{p^*}\) in the upper estimate (i) as \(p\rightarrow 1\) is necessary, since the rightmost summands in (i) and (ii) are not comparable.
Remark 1.9
It shall be clear from the proof that the upper bound in part (i) of Proposition 1.8 remains valid for any random matrix X (instead of G) with independent isotropic rows (i.e., rows with mean zero and the covariance matrix equal to the identity) such that
Note that the independence and the isotropicity of rows imply that also the columns of X are isotropic (since the coordinates of every column are independent and have mean zero and variance 1). Therefore, whenever \(p\ge 2\), condition (1.19) is always satisfied (because the \(p^*\)-integral norm is bounded above by the 2-integral norm, which is then equal to the right-hand side of (1.19), since the covariance matrix of each column is equal to the \(m\times m\) identity matrix).
The following proposition generalizes part (ii) of Proposition 1.8 to an arbitrary \(q \le 2\). We list it separately since we present a proof using different arguments. Recall that the case \(p=1\), \(q\ge 2\) was established before, see (1.6).
Proposition 1.10
If \(1\le q \le 2\), then
where \(b_j = \Vert (a_{ij})_{i\le m}\Vert _{2q/(2-q)}\) for \(j\le n\).
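The extra term in Proposition 1.10 involves the non-increasing rearrangement of the column norms \(b_j\). The Python sketch below (standard library; all function names are our own) computes \(b_j\), the weighted maximum \(\max _j\sqrt{\ln (j+1)}\,b_j^{\downarrow {}}\), and the row quantities \(d_i\) of the dual statement, obtained by applying the same function to \(A^T\) with q replaced by \(p^*\) (note \(2p^*/(2-p^*)=2p/(p-2)\)).

```python
import math


def b_vector(A, q):
    """b_j = ||(a_ij)_i||_s with s = 2q/(2 - q), for 1 <= q <= 2 (s = inf when q = 2)."""
    if not (1 <= q <= 2):
        raise ValueError("requires 1 <= q <= 2")
    cols = [list(col) for col in zip(*A)]
    if q == 2:
        return [max(abs(t) for t in c) for c in cols]
    s = 2.0 * q / (2.0 - q)
    return [sum(abs(t) ** s for t in c) ** (1.0 / s) for c in cols]


def log_weighted_max(b):
    """max_j sqrt(ln(j + 1)) * b_j^down, where b^down is the non-increasing rearrangement of |b|."""
    b_down = sorted((abs(t) for t in b), reverse=True)
    return max(math.sqrt(math.log(j + 1)) * t for j, t in enumerate(b_down, start=1))


def d_vector(A, p):
    """d_i = ||(a_ij)_j||_{2p/(p-2)} for 2 <= p <= inf, computed as b_vector(A^T, p*)."""
    p_star = 1.0 if p == math.inf else p / (p - 1.0)
    return b_vector([list(row) for row in zip(*A)], p_star)


A = [[1.0, 0.0, 3.0],
     [0.0, 2.0, 0.0]]
print(b_vector(A, 1.0), log_weighted_max(b_vector(A, 1.0)), d_vector(A, math.inf))
```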
Proposition 1.10 immediately implies its dual version.
Corollary 1.11
If \(2\le p \le \infty \), then
where \(d_i= \Vert (a_{ij})_{j\le n}\Vert _{2p^*/(2-p^*)} = \Vert (a_{ij})_{j\le n}\Vert _{2p/(p-2)} \) for \(i\le m\).
Remark 1.12
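Corollary 1.11 and the dual version of (1.6) provide the exact behavior of the expected norm of the Gaussian operator from \(\ell _p^n\) to \(\ell _q^m\) not only when \(q=\infty \), but also for \(q\ge c_0 \ln m\), as we explain now. For all \(q\ge q_0{:=}c_0 \ln m\) we have the following inequalities for norms on \(\mathbb {R}^m\),
$$\begin{aligned} \Vert x\Vert _\infty \le \Vert x\Vert _q \le m^{1/q}\Vert x\Vert _\infty \le m^{1/q_0}\Vert x\Vert _\infty = e^{1/c_0}\Vert x\Vert _\infty , \qquad x\in \mathbb {R}^m; \end{aligned}$$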
therefore,
Similarly,
Proposition 1.7 implies the following estimate for matrices with independent \(\psi _r\) entries, in the same way as Corollary 1.4 implies Corollary 1.6 (see Sect. 3.2).
Corollary 1.13
Assume that \(K,L >0\), \(r\in (0,2]\), and \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries satisfying
Then, for \(1\le p\le 2\), \(1\le q \le \infty \),
By Hoeffding’s inequality (Lemma 2.13), matrices with independent mean-zero entries taking values in \([-1,1]\) satisfy (1.20) with \(r=2\) and \(K=2=L\). In this special case of independent bounded random variables, one can also adapt the methods of [9] to prove, in the smaller range \(1\le p\le 2 \le q < \infty \), the following result with explicit numerical constants and improved dependence on n (note that the second logarithmic term is better than in Corollary 1.13, where the exponent equals \(1/2+1/p^*\)).
Proposition 1.14
Assume that \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries taking values in \([-1,1]\). Then, for \(1\le p\le 2\le q< \infty \),
where \(C(q) {:=}2 (q\Gamma (q/2))^{1/q} \asymp \sqrt{q}\).
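The asymptotics \(C(q)\asymp \sqrt{q}\) follow from Stirling's formula, which gives \(C(q)/\sqrt{q}\rightarrow 2/\sqrt{2e}=\sqrt{2/e}\approx 0.858\), while \(C(2)/\sqrt{2}=2\). The short Python sketch below (the function name `C` simply mirrors the notation) evaluates the constant in log-space so that large q does not overflow.

```python
import math


def C(q):
    """C(q) = 2 (q Gamma(q/2))^(1/q), evaluated in log-space to avoid overflow for large q."""
    return 2.0 * math.exp((math.log(q) + math.lgamma(q / 2.0)) / q)


for q in (2, 4, 10, 100, 1000, 10**6):
    print(q, round(C(q) / math.sqrt(q), 4))
```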
Finally, we have the following general result for matrices with independent \(\psi _r\) entries (cf. Corollary 1.6).
Theorem 1.15
Let \(K,L>0\), \(r\in (0,2]\), and assume that \(X=(X_{ij})_{i\le m, j\le n}\) has independent mean-zero entries satisfying
Then, for all \(1\le p\le 2\) and \(1\le q< \infty \),
Having in mind the strategy of proof described after Theorem 1.3, let us elaborate on the idea of the proof of Theorem 1.15. We shall split the matrix X into two parts \(X^{(1)}\) and \(X^{(2)}\), which we treat separately. In our decomposition, all entries of \(X^{(1)}\) are bounded by \(C\ln (mn)^{1/r}\) and the probability that \(X^{(2)} \ne 0\) is very small. We shall deal with \(X^{(2)}\) using a crude bound (Lemma 4.3), together with the fact that the probability that \(X^{(2)} \ne 0\) is small enough to compensate for it. In order to bound the expectation of the norm of \(X^{(1)}\), we require a cut-off version of Theorem 1.15 (Lemma 4.4). To obtain it, we shall replace \(B_p^n\) in the expression for the operator norm with a suitable polytope K (and leave \(\sup _{y\in B_{q^*}^m}\) as it is) and then apply a Gaussian-type concentration inequality to the function \(Z\mapsto F(Z) {:=}\Vert Z_A x\Vert _q\) for \(x\in {\text {Ext}}(K)\).
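To see why \(\mathbb {P}(X^{(2)}\ne 0)\) is small, here is a hedged Python sketch of the splitting step only. It assumes a tail condition of the form \(\mathbb {P}(|X_{ij}|\ge t)\le Ke^{-t^r/L}\) (our reading for this illustration) and reads the level as \(C\,(\ln (mn))^{1/r}\); the union bound then gives \(\mathbb {P}(X^{(2)}\ne 0)\le K(mn)^{1-C^r/L}\). The helpers `split_by_level` and `prob_X2_nonzero_bound` are hypothetical and are not the lemmas of Sect. 4.

```python
import math
import random


def split_by_level(X, level):
    """Hypothetical helper: X = X1 + X2 with X1 = X * 1{|X_ij| <= level}, X2 = X * 1{|X_ij| > level}."""
    X1 = [[x if abs(x) <= level else 0.0 for x in row] for row in X]
    X2 = [[x if abs(x) > level else 0.0 for x in row] for row in X]
    return X1, X2


def prob_X2_nonzero_bound(m, n, r, K, L, C):
    """Union bound for P(X2 != 0) at level C * ln(mn)^(1/r), assuming P(|X_ij| >= t) <= K exp(-t^r / L):
    m n K exp(-C^r ln(mn) / L) = K (mn)^(1 - C^r / L)."""
    return K * (m * n) ** (1.0 - C ** r / L)


# Symmetric Weibull entries with P(|X_ij| >= t) = exp(-t^r), i.e. K = L = 1; mean zero by symmetry.
rng = random.Random(0)
m, n, r, C = 40, 40, 1.0, 3.0
level = C * math.log(m * n) ** (1.0 / r)
X = [[rng.choice((-1, 1)) * rng.weibullvariate(1.0, r) for _ in range(n)] for _ in range(m)]
X1, X2 = split_by_level(X, level)
print(level, prob_X2_nonzero_bound(m, n, r, 1.0, 1.0, C))
```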
1.5 Tail bounds
All the bounds for \(\mathbb {E}\Vert X_A:\ell _p^n \rightarrow \ell _q^m\Vert \) provided in this work for random matrices X also yield a tail bound for \(\Vert X_A:\ell _p^n \rightarrow \ell _q^m\Vert \). (It is clear from the proof of Proposition 1.16—see Sect. 3.2—that the same applies to the estimates for \(\sup _{I_0, J_0} \Vert G_A: \ell _p^{J_0} \rightarrow \ell _q^{I_0} \Vert \), but we omit the details to keep the presentation clear.)
Proposition 1.16
(Tail bound) Assume that \(K,L \ge 1\), \(r\in (0,2]\), \(1\le p,q \le \infty \), and \(\gamma \ge 1\). Fix a deterministic \(m\times n\) matrix A and assume that
If for all random matrices \(X=(X_{ij})_{i\le m, j\le n}\) with independent mean-zero entries satisfying
we have
then, for all random matrices with independent mean-zero entries satisfying (1.21), we also have
and, for all \(t>0\),
Note that random variables taking values in \([-1,1]\) satisfy condition (1.21) with \(r=2\), \(K=e\), and \(L=1\). Thus, Proposition 1.16 applies also in the setting of bounded or Gaussian entries.
1.6 Organization of the paper
In Sect. 2 we gather various preliminary results we shall use in the sequel. Section 3 contains the proofs of the main results valid for all p, q (i.e., Theorem 1.3 and its corollaries) and the tail bound from Proposition 1.16. In Sect. 4 we prove the results for specific choices/ranges of p, q. In Sect. 5 we prove lower bounds on expected operator norms, showing in particular that our estimates are optimal up to logarithmic factors. We also prove other results justifying the proposed form of Conjecture 1. The last subsection of Sect. 5 is devoted to infinite dimensional Gaussian operators.
2 Preliminaries
2.1 General facts
We start with some easy lemmas which will be used repeatedly throughout the paper.
Lemma 2.1
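For any real \(m\times n\) matrix \(B = (b_{ij})_{i\le m, j\le n}\) and \(0<r\le 1\le s\le \infty \), we have
$$\begin{aligned} \Vert B:\ell _r^n\rightarrow \ell _s^m\Vert = \max _{j\le n}\Vert (b_{ij})_{i\le m}\Vert _s. \end{aligned}$$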
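Furthermore, for a real \(m\times n\) matrix \(A = (a_{ij})_{i\le m, j\le n}\) and \(1\le p\le 2\), \(p \le q\le \infty \),
$$\begin{aligned} \Vert A\mathbin {\circ }A:\ell ^n_{p/2}\rightarrow \ell ^m_{q/2}\Vert = \max _{j\le n}\Vert (a_{ij})_{i\le m}\Vert _q^2. \end{aligned}$$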
Proof
Since \(0<r\le 1\), we have \({\text {conv}}B_r^n = B_1^n\), where \({\text {conv}}S\) denotes the convex hull of the set S. Moreover, the extreme points of \(B_1^n\) are the signed standard unit vectors, i.e., \(\pm e_1,\dots ,\pm e_n\), and \(z\mapsto \Vert z\Vert _s\) is a convex function (since \(s\ge 1\)). Thus,
This immediately implies the result for the Hadamard product \(A\circ A=:B\) if \(1\le p\le 2\le q\le \infty \).
If, on the other hand, \(1\le p\le q\le 2\), then by the subadditivity of the function \(t\mapsto |t|^{q/2}\),
where in the last equality we used the first part of the lemma. Since we clearly have
we thus obtain
\(\square \)
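The first part of Lemma 2.1 says that, for \(0<r\le 1\le s\le \infty \), the norm \(\Vert B:\ell _r^n\rightarrow \ell _s^m\Vert \) equals the largest \(\ell _s\)-norm of a column of B. The following minimal Python sketch (standard library only; all helper names are ours and purely illustrative) checks this numerically: random, partly sparse test vectors never exceed the largest column norm, and the standard unit vectors attain it.

```python
import random

def lp_norm(v, p):
    """(Quasi-)norm ||v||_p for 0 < p <= inf."""
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def matvec(B, x):
    """Product Bx, with B stored as a list of rows."""
    return [sum(b * xj for b, xj in zip(row, x)) for row in B]

def max_column_norm(B, s):
    """max_j ||B e_j||_s, the value of ||B : l_r -> l_s|| given by Lemma 2.1."""
    n = len(B[0])
    return max(lp_norm([row[j] for row in B], s) for j in range(n))

def sampled_norm(B, r, s, trials=3000, seed=0):
    """Lower estimate of ||B : l_r^n -> l_s^m|| from random, partly sparse test vectors."""
    rng = random.Random(seed)
    n = len(B[0])
    best = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) if rng.random() < 0.5 else 0.0 for _ in range(n)]
        nx = lp_norm(x, r)
        if nx > 0:
            best = max(best, lp_norm(matvec(B, x), s) / nx)
    return best

rng = random.Random(42)
B = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
r, s = 0.5, 3.0
print(sampled_norm(B, r, s), "<=", max_column_norm(B, s))
```

Sparse test vectors are included because, by the convexity argument in the proof, the supremum is attained at signed unit vectors.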
Definition 2.2
A set \(K\subset \mathbb {R}^n\) is called unconditional if for every \((x_j)_{j\le n}\in K\) and every \((\varepsilon _j)_{j\le n} \in \{-1,1\}^n\) we have \((\varepsilon _ j x_j)_{j\le n} \in K\).
We shall use the following version of [49, Lemma 2.1].
Lemma 2.3
Assume that \(1\le p\le \infty \), \(n\in \mathbb {N}\), and define the convex set
Then \(B_p^n \subset \ln (en)^{1/p^*} K\).
Proof
Fix a vector \(x=(x_1,\dots ,x_n)\in \mathbb {R}^n\). We want to prove that \(\Vert x\Vert _K \le \ln (en)^{1/p^*} \Vert x\Vert _p\), where
denotes the norm generated by K, i.e., its Minkowski gauge. Since both K and \(B_p^n\) are permutationally invariant and unconditional (see Definition 2.2), we may and will assume that \(x_1\ge \dots \ge x_n\ge 0\). If we put \(x_{n+1}{:=}0\), then
Since \(\Vert e_1+\dots +e_j\Vert _K = j^{1/p}\) for \(1\le j \le n\), the triangle and Hölder inequalities yield
where we also used the elementary estimates \(j^{1/p}-(j-1)^{1/p}\le j^{\frac{1}{p}-1}\) and \(\sum _{j=1}^n \frac{1}{j}\le 1+\int _1^n \frac{1}{t} dt = \ln (en)\). This completes the proof. \(\square \)
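The proof above is constructive: for a non-increasingly ordered nonnegative x, the decomposition \(x=\sum _j (x_j-x_{j+1})(e_1+\dots +e_j)\) gives the explicit upper bound \(\sum _j (x_j-x_{j+1})\, j^{1/p}\) for \(\Vert x\Vert _K\). The Python sketch below (helper names are ours) evaluates this bound for a random vector and compares it with \(\ln (en)^{1/p^*}\Vert x\Vert _p\). It only illustrates the inequality and is not part of the argument.

```python
import math
import random

def lp_norm(x, p):
    if math.isinf(p):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def gauge_upper_bound(x, p):
    """Upper bound for ||x||_K taken from the proof of Lemma 2.3.

    By unconditionality and permutation invariance we sort |x| non-increasingly;
    with x_{n+1} = 0 we have x = sum_j (x_j - x_{j+1}) (e_1 + ... + e_j),
    and ||e_1 + ... + e_j||_K = j^{1/p}.
    """
    y = sorted((abs(t) for t in x), reverse=True) + [0.0]
    inv_p = 0.0 if math.isinf(p) else 1.0 / p
    return sum((y[j - 1] - y[j]) * j ** inv_p for j in range(1, len(y)))

def lemma_constant(n, p):
    """ln(en)^{1/p*}, where 1/p* = 1 - 1/p."""
    inv_pstar = 1.0 if math.isinf(p) else 1.0 - 1.0 / p
    return math.log(math.e * n) ** inv_pstar

rng = random.Random(7)
n = 200
x = [rng.expovariate(1.0) ** 3 * rng.choice((-1, 1)) for _ in range(n)]
for p in (1.0, 1.5, 2.0, 4.0, math.inf):
    print(p, gauge_upper_bound(x, p), "<=", lemma_constant(n, p) * lp_norm(x, p))
```

For p = 1 the bound reduces to \(\Vert x\Vert _1\), with constant 1, in line with \(1/p^*=0\).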
Remark 2.4
The term \(\ln (en)^{1/p^*}\) can be replaced by \(1+\frac{1}{p} \ln (en)^{1/p^*}\) by writing in the above proof
Here we used the estimates \(j^{1/p}-(j-1)^{1/p}\le \frac{1}{p} (j-1)^{\frac{1}{p}-1}\) for \(j>1\) (which follows from the concavity of the function \(t\mapsto t^{1/p}\)) and the trivial one \(x_1\le \Vert x\Vert _p\).
Remark 2.5
The constant \((\ln n)^{1/p^*}\) in Lemma 2.3 is sharp up to a constant depending on p for every \(1\le p<\infty \) (when \(p=\infty \), \(K=B_p^n\), and the constant depending on p degenerates as \(p \rightarrow \infty \)). More precisely, we shall prove that if \(B_p^n \subset C(p,n) K\), then \(C(p,n) \gtrsim _p (\ln n)^{1/p^*}\). Note that \(B_p^n \subset C(p,n) K\) if and only if
where \(\Vert \cdot \Vert _K^*\) is the norm dual to \(\Vert \cdot \Vert _K\).
Let \({\text {Ext}}K \) be the set of extreme points of K, and let \((y_j^{\downarrow {}})_{j\le n}\) be the non-increasing rearrangement of \((|y_j|)_{j\le n}\). For every \(y\in \mathbb {R}^n\),
Assume that \(p^*\ne 1\) and put \(y_j {:=}j^{-1/p^*}\). We get
whereas
so inequality (2.1) yields that \(C(p,n) \gtrsim _p (\ln n)^{1/p^*}\).
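The extremal example of Remark 2.5 can be evaluated directly. The sketch below assumes (as the description of \({\text {Ext}}(K)\) in the proof of Proposition 1.7 indicates) that the extreme points of K are the vectors \(k^{-1/p}\sum _{j\in J}\pm e_j\) with \(|J|=k\), so that \(\Vert y\Vert _K^*=\max _{k\le n} k^{-1/p}\sum _{j\le k} y_j^{\downarrow {}}\). For \(y_j=j^{-1/p^*}\) it returns the ratio \(\Vert y\Vert _{p^*}/\Vert y\Vert _K^*\), which is a lower bound for any admissible \(C(p,n)\). For p = 2 one checks by hand that \(\Vert y\Vert _K^*<2\) and \(\Vert y\Vert _2\ge \sqrt{\ln n}\), so the ratio exceeds \(\frac{1}{2}\sqrt{\ln n}\). Helper names are illustrative.

```python
import math

def dual_K_norm(y, p):
    """max_k k^{-1/p} * (sum of the k largest |y_j|): the maximum of <x, y>
    over the (assumed) extreme points x = k^{-1/p} sum_{j in J} +-e_j of K."""
    ys = sorted((abs(t) for t in y), reverse=True)
    best, partial = 0.0, 0.0
    for k, v in enumerate(ys, start=1):
        partial += v
        best = max(best, partial / k ** (1.0 / p))
    return best

def ratio_lower_bound(n, p):
    """||y||_{p*} / ||y||_K^* for y_j = j^{-1/p*}; requires 1 < p < infinity."""
    pstar = p / (p - 1.0)
    y = [j ** (-1.0 / pstar) for j in range(1, n + 1)]
    y_pstar = sum(v ** pstar for v in y) ** (1.0 / pstar)
    return y_pstar / dual_K_norm(y, p)

for n in (10**2, 10**3, 10**4, 10**5):
    print(n, ratio_lower_bound(n, 2.0), 0.5 * math.sqrt(math.log(n)))
```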
We shall also need the following standard lemma (see, e.g., [41, Sect. 1.3]). We will use the versions with \(r=1\) and \(r=2\).
Lemma 2.6
Let Z be a nonnegative random variable. If there exist \(a \ge 0\), \(b, \alpha , \beta , s_0> 0\), and \(r\ge 1\) such that
then
Proof
Integration by parts yields
\(\square \)
2.2 Contraction principles
Below we recall the well-known contraction principle due to Kahane and its extension by Talagrand (see, e.g., [64, Exercise 6.7.7] and [43, Theorem 4.4 and the proof of Theorem 4.12]).
Lemma 2.7
(Contraction principle) Let \((X,\Vert \cdot \Vert )\) be a normed space, \(n\in \mathbb {N}\), and \(\rho \ge 1\). Assume that \(x_1,\dots ,x_n\in X\) and \(\alpha {:=}(\alpha _1,\dots ,\alpha _n)\in \mathbb {R}^n\). Then, if \(\varepsilon _1,\dots ,\varepsilon _n\) are independent Rademacher random variables, we have
Lemma 2.8
(Contraction principle) Let T be a bounded subset of \(\mathbb {R}^n\). Assume that \(\varphi _i:\mathbb {R}\rightarrow \mathbb {R}\) are 1-Lipschitz and \(\varphi _i(0)=0\) for \(i=1,\ldots ,n\). Then, if \(\varepsilon _1,\dots ,\varepsilon _n\) are independent Rademacher random variables, we have
2.3 Gaussian random variables
The following result is fundamental to the theory of Gaussian processes and is referred to as Slepian’s inequality or Slepian’s lemma [52]. We use the following (slightly adapted) version taken from [11, Theorem 13.3].
Lemma 2.9
(Slepian’s lemma) Let \((X_t)_{t\in T}\) and \((Y_t)_{t\in T}\) be two Gaussian random vectors satisfying \(\mathbb {E}[X_t]=\mathbb {E}[Y_t]\) for all \(t\in T\). Assume that, for all \(s,t \in T\), we have \(\mathbb {E}[(X_s-X_t)^2] \le \mathbb {E}[(Y_s-Y_t)^2]\). Then
The next lemma is folklore. We include a short proof of an estimate with specific constants for the sake of completeness.
Lemma 2.10
Assume that \(k\ge 2\) and let \(g_i\), \(i\le k\), be standard Gaussian random variables (not necessarily independent). Then
Proof
Since the moment generating function of a standard Gaussian random variable is given by \(\mathbb {E}e^{tg_1} = e^{t^2/2}\), it follows from Jensen’s inequality that
By taking \(t=\sqrt{2\ln k}\), we get the first assertion. We apply this inequality with random variables \(g_1, -g_1, \ldots , g_k, -g_k\) to get the second assertion, namely
\(\square \)
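A quick Monte Carlo sanity check of Lemma 2.10 (Python, standard library only; function names are ours). The lemma needs no independence; this sketch only samples the independent case, where \(\mathbb {E}\max _{i\le k} g_i\) is noticeably smaller than \(\sqrt{2\ln k}\). The bound for absolute values is taken as \(\sqrt{2\ln (2k)}\), which is what the first bound gives for the 2k variables \(\pm g_i\).

```python
import math
import random

def mc_expected_max(k, trials=2000, seed=1, absolute=False):
    """Monte Carlo estimate of E max_i g_i (or E max_i |g_i|) for k i.i.d. N(0,1) variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(k)]
        total += max(abs(v) for v in g) if absolute else max(g)
    return total / trials

for k in (2, 10, 100):
    print(k, mc_expected_max(k), "<=", math.sqrt(2 * math.log(k)),
          "|", mc_expected_max(k, absolute=True), "<=", math.sqrt(2 * math.log(2 * k)))
```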
The next two lemmas are taken from [61]. Recall that \(b_1^{\downarrow {}} \ge \ldots \ge b_n^{\downarrow {}}\) is the non-increasing rearrangement of \((|b_j|)_{j\le n}\).
Lemma 2.11
([61, Lemma 2.3]) Assume that \((b_j)_{j\le n}\in \mathbb {R}^n\) and let \((X_j)_{j\le n}\) be random variables (not necessarily independent) satisfying
Then
Lemma 2.12
([61, Lemma 2.4]) Assume that \((b_j)_{j\le n}\in \mathbb {R}^n\) and let \((X_j)_{j\le n}\) be independent random variables with \(X_j \sim {\mathcal {N}}(0,b_j^{2})\) for \(j\le n\). Then
Lemma 2.13
(Hoeffding’s inequality, [32, Theorem 2]) Assume that \((b_j)_{j\le n}\in \mathbb {R}^n\) and let \(X_j\), \(j\le n\), be independent mean-zero random variables such that \(|X_j|\le 1\) a.s. Then, for all \(t\ge 0\),
2.4 Random variables with heavy tails
The following lemma is a special case of [34, Theorem 1].
Lemma 2.14
(Contraction principle) Let \(K, L>0\) and assume that \((\eta _i)_{i\le n}\) and \((\xi _i)_{i\le n}\) are two sequences of independent symmetric random variables satisfying for every \(i\le n\) and \(t\ge 0\),
Then, for every convex function \(\varphi \) and every \(a_1,\ldots , a_n \in \mathbb {R}\),
Lemma 2.15
([31, Theorem 6.2]) Assume that \(Z_1, \dots , Z_n\) are independent symmetric Weibull random variables with shape parameter \(r\in (0,1]\) and scale parameter 1, i.e., \(\mathbb {P}(|Z_i|\ge t)=e^{-t^r}\) for \(t\ge 0\). Then, for every \(\rho {}\ge 2\) and \(a\in \mathbb {R}^n\),
Remark 2.16
(Moments of Weibull random variables) Note that if Z is a symmetric random variable such that \(\mathbb {P}(|Z|\ge t)=e^{-t^r}\), \(r\in (0,2]\), then \(Y=|Z|^{r}{\text {sgn}}(Z)\) has the symmetric exponential distribution with parameter 1, so by Stirling’s formula we obtain, for all \(\rho \ge 1\),
with \(C\ge 1\).
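Since \(|Z|^r\) is a standard exponential variable, \(\Vert Z\Vert _\rho =\Gamma (1+\rho /r)^{1/\rho }\) exactly. As a concrete illustration of the growth \(\Vert Z\Vert _\rho \asymp _r \rho ^{1/r}\), the Python sketch below checks the elementary two-sided bound \((\rho /(er))^{1/r}\le \Vert Z\Vert _\rho \le 2^{1/\rho }(2\rho /(er))^{1/r}\), which follows from \(\Gamma (1+x)\ge \int _x^\infty t^x e^{-t}\,dt\ge (x/e)^x\) and from \(t^xe^{-t/2}\le (2x/e)^x\). These constants are ours and need not coincide with the ones used in the remark.

```python
import math

def weibull_moment(rho, r):
    """||Z||_rho for symmetric Z with P(|Z| >= t) = exp(-t^r).
    |Z|^r ~ Exp(1), hence E|Z|^rho = Gamma(1 + rho/r); lgamma avoids overflow."""
    return math.exp(math.lgamma(1.0 + rho / r) / rho)

def two_sided_bounds(rho, r):
    """(rho/(e r))^{1/r} <= ||Z||_rho <= 2^{1/rho} (2 rho/(e r))^{1/r}."""
    lower = (rho / (math.e * r)) ** (1.0 / r)
    upper = 2.0 ** (1.0 / rho) * (2.0 * rho / (math.e * r)) ** (1.0 / r)
    return lower, upper

for r in (0.5, 1.0, 2.0):
    for rho in (1, 4, 16, 64):
        lo, hi = two_sided_bounds(rho, r)
        print(r, rho, lo, weibull_moment(rho, r), hi)
```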
The three previous results easily imply the following estimate for integral norms of linear combinations of independent \(\psi _r\) random variables.
Proposition 2.17
Let \(K, L>0\), \(r\in (0,1]\) and assume that \(Z_1, \dots , Z_n\) are independent symmetric random variables satisfying \(\mathbb {P}(|Z_i|\ge t) \le Ke^{-t^r/L}\) for all \(t\ge 0\) and \(i\le n\). Then, for every \(\rho {}\ge 2\) and \(a\in \mathbb {R}^n\),
Proof
The first inequality is an immediate consequence of Lemma 2.14 (applied with \(\eta _i=Z_i\), independent Weibull variables \(\xi _i\) with shape parameter r and scale parameter 1, and with the convex function \(\varphi :t\mapsto |t|^{\rho }\)), Lemma 2.15, and Remark 2.16. The second inequality follows from
where in the last step we used the inequality between weighted arithmetic and geometric means. \(\square \)
The next lemma is standard and provides us with several equivalent formulations of the \(\psi _r\) property expressed through tail bounds, growth of moments, and the exponential moments, respectively. We provide a brief proof, since in the literature one usually finds versions for \(r\ge 1\) only.
Lemma 2.18
Assume that \(r\in (0,2]\). Let Z be a non-negative random variable. The following conditions are equivalent:
(i) There exist \(K_1,L_1>0\) such that
$$\begin{aligned} \mathbb {P}(Z\ge t) \le K_1 e^{- t^r/ L_1} \quad \text {for all } t\ge 0. \end{aligned}$$
(ii) There exists \(K_2\) such that
$$\begin{aligned} \Vert Z\Vert _{\rho {}} \le K_2 \rho {}^{1/r} \quad \text {for all } \rho {} \ge 1. \end{aligned}$$
(iii) There exist \(K_3, u>0\) such that
$$\begin{aligned} \mathbb {E}\exp (u Z^r) \le K_3. \end{aligned}$$
Here, (i) implies (ii) with \(K_2 = C(r)K_1L_1^{1/r}\), (ii) implies (iii) with \(K_3 = 1+e^{(2e r)^{-1}}\), \(u=(2erK_2^r)^{-1}\), and (iii) implies (i) with \(K_1 = K_3\), \(L_1=u^{-1}\).
Proof
Property (i) implies (ii) by Lemma 2.14 (applied with \(n=1\), \(\eta _1=Z\) and an independent Weibull variable \(\xi _1\) with parameter r) and Remark 2.16. Property (iii) implies (i) by Chebyshev’s inequality:
Assume now that (ii) holds and denote \(k_0 = \lfloor \frac{1}{r} \rfloor \). Then, for every \(k\in [1,k_0]\), we have \(kr\le 1\) and
while for \(k\ge k_0+1 \), we have \(kr \ge 1\) and, hence, property (ii) yields
Hence, by Stirling’s formula we have for \(u=(2erK_2^r)^{-1}\),
\(\square \)
The next lemma states that a linear combination of independent \(\psi _r\) random variables is a \(\psi _r\) random variable.
Lemma 2.19
Assume that \(K,L>0\), \(r\in (0,2]\), and let \((Z_i)_{i\le k}\) be independent symmetric random variables satisfying \(\mathbb {P}(|Z_i|\ge t) \le Ke^{-t^r/L}\) for all \(t\ge 0\). Then for every \(a\in \mathbb {R}^k\) the random variable \(Y{:=}\Vert a\Vert _2^{-1}\sum _{i=1}^k a_iZ_i\) satisfies, for all \(t\ge 0\),
where \(K'\), \(L'\) depend only on K, L, and r.
Proof
The case \(r\ge 1\) is standard (see, e.g., [14, Theorem 1.2.5]), so we skip its proof. (Alternatively, the argument below also works for \(r\ge 1\) if Lemma 2.15 is replaced by the result of Gluskin and Kwapień [19], used together with Lemma 2.14.)
Assume that \(r\in (0,1]\) and recall that \(Y=\Vert a\Vert _2^{-1}\sum _{i=1}^k a_iZ_i\). By Proposition 2.17,
Hence, Lemma 2.18 yields the assertion. \(\square \)
Lemma 2.20
Assume that \(r\in (0,2]\), \(\frac{1}{s}{:=}\frac{1}{r}-\frac{1}{2}\), Y is a non-negative random variable such that \(\mathbb {P}(Y\ge t)=e^{-t^s}\) for all \(t\ge 0\), and \(g\sim {\mathcal {N}}(0,1)\) is independent of Y. Then, for every \(t\ge 0\),
where \(c:=\sqrt{2/\pi }e^{-2}\).
Proof
In the case \(r=2\) we have \(s=\infty \), so \(Y=1\) almost surely and the assertion is trivial. Assume now that \(r<2\). By our assumptions \(r=\frac{2s}{2+s}\). Let \(x_0{:=}(2t^s)^{1/(2+s)}\). Note that \(x\ge x_0\) is equivalent to \(\frac{t^s}{x^s}\le \frac{x^2}{2}\). Thus,
where we used \(2^{{2/(2+s)}}\le 2\) and chose \(c{:=}\sqrt{2/\pi }e^{-2}\). \(\square \)
Lemma 2.21
Assume that \(K, L>0\), \(r\in (0,2]\) and that Z is a random variable satisfying \(\mathbb {P}(|Z|\ge t) \le Ke^{-t^r/L}\) for all \(t\ge 0\). Let Y, g, and \(c=\sqrt{2/\pi }e^{-2}\) be as in Lemma 2.20. Then there exist random variables \(U\sim |Z|\) and \(V\sim |g|Y\) such that
Proof
For \(t=0\) we have \(1=\mathbb {P}(|Z|\ge 0)\le K\), so \(K\ge 1\), and thus \(\ln (K/c)=\ln (Ke^2\sqrt{\pi /2})>0\). We use our assumptions, the inequality \((a+b)^r\ge (a^r+b^r)/2\), and Lemma 2.20 to obtain for any \(t\ge 0\),
Consider the version U of |Z| and the version V of |g|Y defined on the (common) probability space (0, 1) equipped with Lebesgue measure, constructed as the (generalised) inverses of cumulative distribution functions of |Z| and |g|Y, respectively. Then \((8L)^{-1/r} U - \bigl (\ln (K/c)/4\bigr )^{1/r} \le V \), which implies the assertion. \(\square \)
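The construction in the last paragraph is the standard quantile (monotone) coupling: both variables are generalised inverses of their distribution functions evaluated at the same point of (0, 1), so an inequality between tails becomes a pointwise inequality between the coupled variables. A minimal Python sketch, assuming for illustration the two distributions Exp(2) and Exp(1), for which \(\mathbb {P}(A\ge t)=e^{-2t}\le e^{-t}=\mathbb {P}(B\ge t)\); the helper names are ours.

```python
import math

def generalized_inverse(cdf, u, lo=0.0, hi=1e3, iters=200):
    """Approximate inf{x in [lo, hi] : cdf(x) >= u} by bisection.

    Assumes cdf is non-decreasing and cdf(hi) >= u.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

def quantile_coupling(cdf_a, cdf_b, us):
    """Realise A and B on (0, 1) with Lebesgue measure: omega -> (F_A^{-1}(omega), F_B^{-1}(omega))."""
    return [(generalized_inverse(cdf_a, u), generalized_inverse(cdf_b, u)) for u in us]

def cdf_a(t):
    return 1.0 - math.exp(-2.0 * t)  # A ~ Exp(2), lighter tail

def cdf_b(t):
    return 1.0 - math.exp(-t)        # B ~ Exp(1), heavier tail

us = [k / 100 for k in range(1, 100)]
pairs = quantile_coupling(cdf_a, cdf_b, us)
print(all(a <= b for a, b in pairs))
```

Bisection is used so that the sketch works for any non-decreasing distribution function, not only for those with a closed-form inverse.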
Lemma 2.22
Let \(K,L>0\), \(r\in (0,2]\) and \(k\ge 3\), and assume that \((Z_i)_{i\le k}\) are random variables satisfying \(\mathbb {P}(|Z_i|\ge t) \le Ke^{-t^r/L}\) for all \(t\ge 0\). Then
and
Proof
By a union bound and the assumptions we get, for every \(v\ge 1\),
where we used \(k\ge 3\) in the last step. We integrate by parts, change the variables, and use the above bound to obtain the second part of the assertion, i.e.,
\(\square \)
3 Proofs of the main results
After the preparation in the previous section, we shall now present the proofs of our main results.
3.1 General bound via Slepian’s lemma
In order to obtain Theorem 1.3 we first prove a weaker version of it, for \(p=\infty \) and \(q=1\) only. After that we shall use the polytope K from Lemma 2.3 and Gaussian concentration to see how Proposition 3.1 implies the general bound. The proof of this proposition relies on symmetrization together with the contraction principle, which allow us to get rid of \(y_i\) and \(x_j\), and on Slepian’s lemma.
Proposition 3.1
Assume that \(G=(g_{ij})_{i\le m, j\le n}\) has i.i.d. standard Gaussian entries and \(k \le m\), \(l\le n\). Then
where the suprema are taken over all sets \(I\subset \{1,\ldots ,m\}\), \(J\subset \{1,\ldots ,n\}\) such that \(|I| = k\), \(|J| = l\).
Proof
Throughout the proof, \(k \le m\) and \(l\le n\) are fixed and the suprema are taken over all index sets satisfying \(I\subset \{1,\ldots ,m\}\), \(|I| = k\) and \(J\subset \{1,\ldots ,n\}\), \(|J| = l\).
Let us denote by \(({\widetilde{g}}_{ij})_{i\le m, j\le n}\) an independent copy of \((g_{ij})_{i\le m, j\le n}\). Using the duality \((\ell _1^m)^*=\ell _\infty ^m\), centering the expression, noticing that \(\sum _{j\in J} a_{ij} {\widetilde{g}}_{ij} x_j \) is a centered Gaussian random variable with standard deviation \(\sqrt{\sum _{j\in J} a_{ij}^2 x_j^2}\), and using Jensen’s inequality, we see that
To estimate the expected value on the right-hand side, we use a symmetrization trick together with the contraction principle (Lemma 2.8). Let \((\varepsilon _i)_{i\le m}\) be a sequence of independent Rademacher random variables independent of all others. Since the random vectors
(where \(i\le m\)) are independent and symmetric, \((Z_i)_{i\le m}\) has the same distribution as \((\varepsilon _iZ_i)_{i\le m}\). Therefore,
Applying (conditionally, with the values of \(g_{ij}\)’s fixed) the contraction principle (i.e., Lemma 2.8) with the set
and the function \(u\mapsto |u|\) (which is 1-Lipschitz and takes the value 0 at the origin), we get
By proceeding similarly as in (3.1), we obtain
Observe that using symmetrization and the contraction principle similarly as in (3.2) and (3.3), we can estimate the first summand on the right-hand side of (3.4) as follows,
Altogether, the inequalities in (3.1) – (3.5) yield that
We shall now estimate the first summand on the right-hand side of (3.6) using Slepian’s lemma (i.e., Lemma 2.9). Denote
where \(g_i, i=1,\ldots ,m\), \( {\widetilde{g}}_j, j=1,\ldots ,n\) are independent standard Gaussian variables. The random variables \(X_{I,J}, Y_{I,J}\) clearly have zero mean. Thus, we only need to calculate and compare \(\mathbb {E}(X_{I,J} - X_{{\widetilde{I}}, {\widetilde{J}}})^2\) and \(\mathbb {E}(Y_{I,J} - Y_{{\widetilde{I}}, {\widetilde{J}}})^2\). In the calculations below it will be evident over which sets the index i (resp. j) runs, so in order to shorten the notation and improve readability, we use the notational convention
By independence,
By independence and the inequality \(2\sqrt{ab} \le a+b\) (valid for \(a,b\ge 0\)),
Thus, we clearly have
(cf. Remark 3.2 below). Hence, by Slepian’s lemma (Lemma 2.9) and Lemma 2.10 on the expected maxima of standard Gaussian random variables,
Recalling the estimate (3.6), we arrive at
which completes the proof of Proposition 3.1. \(\square \)
Remark 3.2
In the above proof, we also have
Therefore, by Slepian’s lemma (Lemma 2.9) we may reverse the estimate from the proof as follows:
Proof of Theorem 1.3
Recall that \(\sup _{I_0,J_0}\) stands for the supremum taken over all sets \(I_0\subset [M] {:=}\{1,\ldots , M \} \), \(J_0\subset [N] {:=}\{1,\ldots , N \}\) with \(|I_0|=m\), \(|J_0|=n\). Given such sets \(I_0\), \( {J_0}\), we introduce the sets
Then, by Lemma 2.3, \(B_{q^*}^{I_0} \subset \ln (em)^{1/q} K\) and \(B_{p}^{J_0}\subset \ln (en)^{1/p^*} L\). Therefore,
where we denoted
with the suprema here (and later on in this proof) being always taken over all sets \(I\subset [M], |I|=k\) and \(J\subset [N], |J|=l\).
By Proposition 3.1, we only know that for all \(k\le m\) and \(l\le n\),
but we shall use the Gaussian concentration and the union bound to obtain an estimate for \(\mathbb {E}\max _{k\le m, l\le n} Z_{k,l}.\)
Note first that \((k^{-1/q^*} {\textbf{1}}_{\{i\in I\}})_{i\in I_0}\in K(I_0)\subset B_{q^*}^{I_0}\) and \((l^{-1/p} {\textbf{1}}_{\{j\in J\}})_{j\in J_0}\in L(J_0)\subset B_{p}^{J_0}\), provided that \(|I|=k\), \(|J|=l\), \(I\subset I_0\), \(J\subset J_0\). Therefore,
and, similarly,
This together with the estimate in (3.8) gives
Note that by the Cauchy–Schwarz inequality, the function
is D-Lipschitz with
where in the last inequality we used the fact that \(k\le m\) and \(l\le n\). In order to estimate the right-hand side of the latter inequality, we consider the following two cases:
Case 1. If \(q^*\ge 2\), then \((q^*/2)^* = q/(2-q)\ge q/2\) and \(\Vert \cdot \Vert _{q/(2-q)} \le \Vert \cdot \Vert _{q/2}\). Consequently,
Case 2. If \(q^*\le 2\), then \(B_{q^*/2}^M \subset B_1^M\) and \(\Vert \cdot \Vert _{\infty } \le \Vert \cdot \Vert _{q/2}\). Thus,
In both cases we have
so the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]) implies that for all \(u\ge 0\), \(k\le m\), and \(l \le n\),
so
This, together with the union bound, implies that for \(u\ge \sqrt{2}\), we have
Hence, by Lemma 2.6 and the estimate in (3.9),
Recalling (3.7) yields the assertion. \(\square \)
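The exponent bookkeeping in Cases 1 and 2 of the proof above can be double-checked with exact rational arithmetic. The Python sketch below (our own helper, not taken from the paper) verifies \((q^*/2)^*=q/(2-q)\ge q/2\) for sample values \(q\in (1,2)\) (Case 1) and \(q^*/2\le 1\) for sample values \(q\ge 2\) (Case 2). The endpoint q = 1, where \(q^*=\infty \), is left out.

```python
from fractions import Fraction

def conj(a):
    """Hoelder conjugate a* = a / (a - 1), for rational a > 1."""
    return a / (a - 1)

def case1_holds(q):
    """Case 1 (1 < q < 2, i.e. q* > 2): (q*/2)* = q/(2 - q) >= q/2."""
    return conj(conj(q) / 2) == q / (2 - q) and q / (2 - q) >= q / 2

def case2_holds(q):
    """Case 2 (q >= 2, i.e. q* <= 2): q*/2 <= 1."""
    return conj(q) / 2 <= 1

case1 = [Fraction(k, 10) for k in range(11, 20)]
case2 = [Fraction(2), Fraction(5, 2), Fraction(4), Fraction(100)]
print(all(case1_holds(q) for q in case1), all(case2_holds(q) for q in case2))
```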
3.2 Coupling
In this subsection we use contraction principles and the coupling described in Lemma 2.21 to prove Corollaries 1.5 and 1.6, and Proposition 1.16. Below we state more general versions of the corollaries akin to Theorem 1.3 (the versions from the introduction follow by setting \(M=m\), \(N=n\)).
Theorem 3.3
(General version of Corollary 1.5) Assume that \(m\le M\), \(n\le N\), \(1\le p,q \le \infty \), and \(X=(X_{ij})_{i\le M, j\le N}\) has independent mean-zero entries taking values in \([-1,1]\). Then
where the suprema are taken over all sets \(I\subset \{1,\ldots ,M\}\), \(J\subset \{1,\ldots ,N\}\) such that \(|I|=m\), \(|J|=n\).
Remark 3.4
(Symmetrization of entries of a random matrix) Let \({\widetilde{Z}}\) be an independent copy of a random matrix Z with mean 0 entries. Then for any norm \(\Vert \cdot \Vert \), including the operator norm from \(\ell _p^n\) to \(\ell _q^m\), we have by Jensen’s inequality
Therefore, in many cases we may simply assume that we deal with matrices with symmetric (not only mean 0) entries. For example, in the setting of Theorem 3.3, the entries of \(X-{\widetilde{X}}\) are symmetric and take values in \([-2,2]\), so it suffices to prove the assertion of this theorem (with a two times smaller constant on the right-hand side) under the additional assumption that the entries of the given random matrix are symmetric.
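For completeness, the two-sided comparison behind this remark reads as follows (Jensen’s inequality is applied conditionally on Z, using \(\mathbb {E}{\widetilde{Z}}=0\)); in particular, \(\mathbb {E}\Vert Z-{\widetilde{Z}}\Vert \) and \(\mathbb {E}\Vert Z\Vert \) agree up to a factor 2.

```latex
\mathbb{E}\Vert Z\Vert
  = \mathbb{E}\bigl\Vert Z-\mathbb{E}\widetilde{Z}\bigr\Vert
  \le \mathbb{E}\Vert Z-\widetilde{Z}\Vert
  \le \mathbb{E}\Vert Z\Vert+\mathbb{E}\Vert\widetilde{Z}\Vert
  = 2\,\mathbb{E}\Vert Z\Vert .
```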
Proof of Theorem 3.3
By Remark 3.4 we may and do assume that the entries of X are symmetric—in this case we need to prove the assertion with a two times smaller constant.
Since the entries of X are independent and symmetric, X has the same distribution as \((\varepsilon _{ij}|X_{ij}|)_{i,j}\), where \((\varepsilon _{ij})_{i\le M, j\le N}\) is a random matrix with i.i.d. Rademacher entries, independent of all other random variables. Thus, the contraction principle (see Lemma 2.7) applied conditionally yields (below the suprema are taken over all sets \(I\subset \{1,\ldots ,M\}\), \(J\subset \{1,\ldots ,N\}\) such that \(|I|=m\), \(|J|=n\), and over all \(x\in B_p^ {J}, y\in B_{q^*}^ {I}\), and the sums run over all \(i\in I\) and \(j\in J\))
and the assertion follows from Theorem 1.3. \(\square \)
Theorem 3.5
(General version of Corollary 1.6) Assume that \(K,L >0\), \(r\in (0,2]\), \(m\le M\), \(n\le N\), \(1\le p,q \le \infty \), and \(X=(X_{ij})_{i\le M, j\le N}\) has independent mean-zero entries satisfying
Then
where the suprema are taken over all sets \(I\subset \{1,\ldots ,M\}\), \(J\subset \{1,\ldots ,N\}\) such that \(|I|=m\), \(|J|=n\).
Proof
Let \({\widetilde{X}}\) be an independent copy of X. Then
This means that the matrix \(X-{\widetilde{X}}\), whose entries are symmetric, satisfies the assumptions of Theorem 3.5. Hence, due to Remark 3.4, we may and do assume that the entries of X are symmetric.
Take the unique positive parameter s satisfying \(\frac{1}{r} = \frac{1}{2}+\frac{1}{s}\). For \(i\le M\), \(j\le N\), let \(g_{ij}\) be i.i.d. standard Gaussian variables, independent of other variables, and let \(Y_{ij}\) be i.i.d. non-negative Weibull random variables with shape parameter s and scale parameter 1 (i.e., \(\mathbb {P}(Y_{ij}\ge t)=e^{-t^s}\) for \(t\ge 0\)), independent of other variables. (In the case \(r=2\), we have \(s=\infty \) and then \(Y_{ij}=1\) almost surely.) Take
as in Lemma 2.21 (we pick a pair \((U_{ij}, V_{ij})\) separately for every (i, j), and then take such a version of each pair that the system of MN random pairs \((U_{ij}, V_{ij})\) is independent).
Let \((\varepsilon _{ij})_{i\le M, j\le N}\) be a random matrix with i.i.d. Rademacher entries, independent of all other random variables. Since the entries of X are symmetric and independent, X has the same distribution as \((\varepsilon _{ij}|X_{ij}|)_{ij}\). By Lemma 2.21 we know that
We use the contraction principle conditionally for \(\mathbb {E}_{\varepsilon }\), i.e., for \(U_{ij}\)’s and \(V_{ij}\)’s fixed. More precisely, we apply Lemma 2.7 to the space \(\textbf{X}\) of all \(M\times N\) matrices with real coefficients, equipped with the norm
(where the first supremum is taken over all sets \(I\subset \{1,\ldots ,M\}\), \(J\subset \{1,\ldots ,N\}\) such that \(|I|=m\), \(|J|=n\); recall that the second supremum is taken over all sets I, J as in the first supremum, and over all \(x\in B_p^ {J}, y\in B_{q^*}^ {I}\), and the sum runs over all \(i\in I\) and \(j\in J\)); note that we identify \(\textbf{X}\) with \(\mathbb {R}^{MN}\) (and MN plays the role of n from Lemma 2.7). We apply the contraction principle of Lemma 2.7 (conditionally, with the values of \(U_{ij}\)’s and \(V_{ij}\)’s fixed) with coefficients \(\alpha _{ij}{:=}\frac{U_{ij}}{C(r,K,L)(1+V_{ij})}\) and points \(\textbf{x}_{ij} {:=}\bigl (a_{kl}C(r,K,L)(1+V_{kl}){\textbf{1}}_{\{(k,l)=(i,j)\}}\bigr )_{kl} \in \textbf{X}\) to get
We may estimate the first term using Theorem 3.3 applied to the matrix \((\varepsilon _{ij})_{i\le M, j\le N}\) as follows,
Recall that \((\varepsilon _{ij} V_{ij})_{i\le M, j\le N}\overset{d}{\sim }(\varepsilon _{ij}g_{ij}Y_{ij})_{i\le M, j\le N}\) and that \(Y_{ij}\ge 0\) almost surely. Next we again use the contraction principle (applied conditionally for \(\mathbb {E}_\varepsilon \), i.e., for fixed \(Y_{ij}\)’s and \(g_{ij}\)’s) and get
Moreover, Theorem 1.3 and Lemma 2.22 (applied with \(r=s\), \(k=MN\), \(Z_{ij}=Y_{ij}\), and \(K=1=L\)) imply
Combining the estimates in (3.12)–(3.15) yields the assertion. \(\square \)
Finally, we prove that these estimates of the operator norms translate into tail bounds.
Proof of Proposition 1.16
Since (1.23) implies (1.24) (by Lemma 2.18), it suffices to prove inequality (1.23). By a symmetrization argument similar to the one from the first paragraph of the proof of Theorem 3.5, we may and will assume that X has independent and symmetric entries satisfying (1.21). By assumption (1.21) and the inequality \(2(a+b)^r\ge a^r+b^r\), we have for every \(t\ge 0\),
so (as in the proof of Lemma 2.21) there exists a random matrix \((Y_{ij})_{i\le m, j\le n}\) with i.i.d. entries with the symmetric Weibull distribution with shape parameter r and scale parameter 1 (i.e., \(\mathbb {P}( |Y_{ij}| \ge t) = e^{-t^r}\) for \(t\ge 0\)) satisfying
Let \((\varepsilon _{ij})_{i\le m, j\le n}\) be a matrix of independent Rademacher random variables independent of all others, and let \(\Vert \cdot \Vert \) denote the operator norm from \(\ell _{p}^n\) to \(\ell _q^m\). Let \(E_{ij}\) be the matrix with 1 at the intersection of the ith row and the jth column and with all other entries 0. The contraction principle (i.e., Lemma 2.7) applied conditionally, (3.16), and the triangle inequality yield for any \(\rho \ge 1\),
Therefore, it suffices to prove (1.23) for random matrices \((Y_{ij})_{ij}\) and \((\varepsilon _{ij})_{ij}\) instead of X.
Since by assumption \(K,L\ge 1\), both random matrices \((Y_{ij})_{ij}\) and \((\varepsilon _{ij})_{ij}\) satisfy (1.21), so for them inequality (1.22) holds. By the comparison of weak and strong moments [38, Theorem 1.1] (note that the random variables \(Y_{ij}\) satisfy the assumption \(\Vert Y_{ij}\Vert _{2\,s}\le \alpha \Vert Y_{ij}\Vert _s\) for all \(s\ge 2\) with \(\alpha =2^{1/r}\) by [38, Remark 1.5]), we have
Because of inequality (1.22), the first summand on the right-hand side may be estimated by \(\gamma D\). Lemma 2.19 and the implication (i)\(\implies \)(ii) from Lemma 2.18 yield
Moreover, by (3.10) and (3.11) (used with \(m=M\) and \(n=N\)) and our assumption that \(\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2} \le D\),
so the second summand on the right-hand side of (3.17) is bounded above (up to a multiplicative constant depending only on r, K, and L) by \(\rho ^{1/r}D\). Thus, (1.23) indeed holds for the random matrix \((Y_{ij})_{ij}\) instead of X. A similar reasoning shows that the same inequality holds also for the random matrix \(( \varepsilon _{ij})_{ij}\) (one may also simply use the Khintchine–Kahane inequality and assumption (1.22)). \(\square \)
4 Proofs of further results
4.1 Gaussian random variables
Proof of Proposition 1.7
Fix \(1\le p\le 2\) and \(1\le q\le \infty \). Let K be the set defined in Lemma 2.3 for which \(B_p^n\subset \ln (en)^{1/p^*} K\). Then
where \({\text {Ext}}(K)\) is the set of extreme points of K. We shall now estimate the expected value of the right-hand side of (4.1).
To this end, we first consider a fixed \(x=(x_j)_{j=1}^n \in {\text {Ext}}(K)\). Then there exists a non-empty index set \(J\subset \{1,\dots ,n\}\) of cardinality \(k\le n\) such that \(x_j = \frac{\pm 1}{k^{1/p}}\) for \(j\in J\) and \(x_j=0\) for \(j\notin J\). We have
Let us estimate the Lipschitz constant of the function
It follows from the Cauchy–Schwarz inequality (used in \(\mathbb {R}^{m\times n}\)) that
where we put
This shows that the function defined by (4.3) is \(\frac{ b_J}{k^{1/p}}\)-Lipschitz continuous. Therefore, by the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]), for any \(u\ge 0\),
We shall transform this inequality into a form which is more convenient to work with. We want to estimate \(\mathbb {E}\Vert G_A x\Vert _q \) independently of x and get rid of the dependence on J and p on the right-hand side. By (4.2) and the fact that \(x\in {\text {Ext}}(K)\subset B_p^n\), we obtain
We use the definition of \(b_J\), then interchange the sums, use the triangle inequality, and then the inequality between the arithmetic mean and the power mean of order \(p^*/2\ge 1\) (recall that \(|J|=k\) and \(p\le 2\)) to obtain
The two inequalities above, together with inequality (4.5) (applied with \(u=k^{1/p^* -1/2} b_J \sqrt{2\ln (en)} s\)), imply that
holds for any \(s\ge 0\) and all \(x\in {\text {Ext}}(K)\) with support of cardinality k.
For any \(k\le n\), there are \(2^k \left( {\begin{array}{c}n\\ k\end{array}}\right) \le 2^k n^k \le \exp ( k \ln (en))\) vectors in \({\text {Ext}}(K)\) with support of cardinality k. Therefore, using a union bound together with (4.7), we see that, for all \(s\ge \sqrt{2}\),
Hence, by Lemma 2.6 (applied with \(s_0{:=}\sqrt{2}\), \(\alpha {:=}e\), \(\beta {:=}1\), and \(r{:=}2\)),
Recalling (4.1) and the definitions of a and b yields the assertion. \(\square \)
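The counting step in the proof above (the same count reappears in the proof of Proposition 1.14) can be checked by brute force for small n. In the Python sketch below (helper names are ours), a vector in \({\text {Ext}}(K)\) with support of cardinality k is encoded by its support J and its sign pattern; the common factor \(k^{-1/p}\) plays no role in the count.

```python
import itertools
import math

def count_extreme_points(n, k):
    """Number of pairs (J, signs) with |J| = k, i.e. of vectors k^{-1/p} sum_{j in J} +-e_j."""
    return sum(1 for _support in itertools.combinations(range(n), k)
               for _signs in itertools.product((-1, 1), repeat=k))

def counting_bound_holds(n):
    """Check count = 2^k C(n, k) <= exp(k ln(en)) for every 1 <= k <= n."""
    for k in range(1, n + 1):
        c = count_extreme_points(n, k)
        if c != 2 ** k * math.comb(n, k) or c > math.exp(k * math.log(math.e * n)):
            return False
    return True

print(all(counting_bound_holds(n) for n in range(1, 9)))
```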
We now turn to the special case \(q=1\).
Proof of Proposition 1.8
Since the first part of this proof works for general \(q\ge 1\), we do not restrict our attention to \(q=1\) for now. First of all,
where \(X_i=(a_{ij}g_{ij})_{j=1}^n\) is the i-th row of the matrix \(G_A\). Centering this expression gives
We first take care of the second term on the right-hand side of (4.8). We have
In order to deal with the first term on the right-hand side of (4.8), we use a symmetrization trick together with the contraction principle; the latter is the reason why we need to work with \(q=1\) here. We start with the symmetrization. Denoting by \({\widetilde{X}}_1,\dots ,{\widetilde{X}}_m\) independent copies of \(X_1, \dots , X_m\) and by \((\varepsilon _i)_{i=1}^m\) a sequence of Rademacher random variables independent of all others, we obtain by Jensen’s and the triangle inequalities that
If \(q=1\), we may use the contraction principle (i.e., Lemma 2.8 applied with functions \(\varphi _i(t)=|t|\)) conditionally to obtain
For \(p > 1\), we have
Moreover, we have
Inequalities (4.10)–(4.13) give the estimate of the first term on the right-hand side of (4.8). This ends the proof of the upper bound for \(p > 1\).
If \(p = 1\), then letting \(g_1,\ldots ,g_n\) be i.i.d. standard Gaussian random variables, we have
where the last step follows from Lemmas 2.11 and 2.12 with \(b_j {:=}\Vert (a_{ij})_{i\le m}\Vert _2\), \(j\le n\). Putting together (4.8)–(4.11) and (4.14) completes the proof of the upper bound in the case \(p=1\).
The lower bound in the case \(p>1\) follows from Proposition 5.1 and Corollary 5.2 below. In the case \(p=1\), we use Proposition 5.1, note that
and use (4.14) to obtain a lower bound. \(\square \)
Now we deal with another special case, the one where \(p=1\).
Proof of Proposition 1.10
Recall that we deal with the range \(p=1\le q\le 2\). Using the structure of extreme points of \(B_1^n\) we get
Denote \(Z_j = \Vert (a_{ij}g_{ij})_{i\le m}\Vert _q\). By well-known tail estimates of norms of Gaussian variables with values in Banach spaces (see, e.g., [36, Corollary 1] for a more general formulation) we get for all \(t > 0\),
where c, C are universal positive constants, and
Inequality (4.15) shows in particular that the random variables \((Z_j - C \mathbb {E}Z_j)_+\) satisfy
for all \(t > 0\), thus by Lemma 2.11 we get
which together with the observation (following from Lemma 2.1 and the fact that \(1=p\le q\le 2\)) that
proves the upper estimate of the proposition.
Using comparison of moments of norms of Gaussian random vectors, we also get
so to end the proof it is enough to show that
This will follow by a straightforward adaptation of the argument from the proof of Lemma 2.12. We may and do assume that the sequence \((b_j)_{j\le n}\) is non-increasing in j. By (4.16) we have for any \(j\le n\) and \(k \ge 1\),
Thus, since \(b_j\ge b_k\) for all \(j\le k\), we have for any \(k\le n\),
Thus,
Taking maximum over \(k \le n\) gives (4.18) and ends the proof. \(\square \)
4.2 Bounded random variables
Here we show how one can adapt the methods of [9] to prove Proposition 1.14, i.e., a version of Corollary 1.13 in the special case of bounded random variables with better logarithmic terms and with explicit numerical constants. Following [9], we start with a lemma.
Lemma 4.1
Assume that X is as in Proposition 1.14. Let \((b_j)_{j\le n}\in \mathbb {R}^n\) and suppose that \(t_0\) is such that \(\bigl |\sum _{j=1}^n b_j X_{ij}\bigr |\le t_0\) almost surely. Then, for all \(q\ge 2\) and \(0\le t\le {t_0^{2-q}}(4 \sum _{j=1}^n b_j^2)^{-1} \),
where \(C(q) {:=}2(q \Gamma (q/2))^{1/q} \asymp \sqrt{q} \).
Proof
Without loss of generality we may and do assume that \(\sum _{j=1}^n b_j^2=1\).
Since \(q\ge 2\), for \(s\in [0,t_0]\) and \(t\in [0,\frac{1}{4} t_0^{2-q}]\) we have \(ts^q\le \frac{1}{4}s^2(s/t_0)^{q-2}\le \frac{1}{4}s^2\), i.e., \(ts^q - s^2/2 \le - s^2/4\). Thus, integration by parts, our assumption \(0\le \bigl |\sum _{j=1}^n b_j X_{ij}\bigr |\le t_0\) a.s., and Hoeffding’s inequality (i.e., Lemma 2.13) yield
\(\square \)
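The elementary inequality \(ts^q - s^2/2 \le -s^2/4\) used in the proof above can be spelled out as follows, under the normalization \(\sum _j b_j^2=1\) made there. This display is our reconstruction of the step and does not come from the original.

```latex
% For 0 \le s \le t_0, 0 \le t \le \tfrac14 t_0^{2-q}, and q \ge 2:
t s^{q} = t\, s^{2} s^{q-2}
  \le \tfrac14\, t_0^{2-q} s^{2} s^{q-2}
  = \tfrac14\, s^{2} \Bigl(\frac{s}{t_0}\Bigr)^{q-2}
  \le \tfrac14\, s^{2},
\qquad\text{hence}\qquad
t s^{q} - \tfrac12 s^{2} \le -\tfrac14 s^{2}.
```

The last inequality in the chain uses \(s\le t_0\) together with \(q-2\ge 0\).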
Proof of Proposition 1.14
We start with a bunch of reductions. Set
(The equalities follow from Lemma 2.1, since \(p/2\le 1\le q/2\) and \( q^*/2 \le 1\le p^*/2\)). Let K be the set defined in Lemma 2.3, so that \(B_p^n\subset \ln (en)^{1/p^*} K\). Then
where \({\text {Ext}}(K)\) is the set of extreme points of K.
Consider first a fixed \(x=(x_j)_{j=1}^n \in {\text {Ext}}(K) \subset B_p^n\). We have
Denote
Then, by the boundedness of \(X_{ij}\) and by Hölder’s inequality, for every \(i\le m\),
We can now apply, for every \(i\le m\), Lemma 4.1 (with t and \(t_0\) as above and with coefficients \(b_{ j} = a_{ij}x_j\)). Since the random variables \(\bigl |\sum _{j=1}^n a_{ij}x_j X_{ij}\bigr | \), \(i\le m\), are independent, using Lemma 4.1 yields
where in the last step we used the definition of \(a =\Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\) (and the fact that \(x\in B^n_{p}\)). By Chebyshev’s inequality and (4.21), we have, for every \(s\ge 0\),
Combining this with the previous estimate yields, for every \(s\ge 0\),
Recall that \(x\in {\text {Ext}}(K)\). Thus, there exists an index set \(J\subset \{1,\dots ,n\}\) of cardinality \(k\le n\), such that \(x_j = \frac{\pm 1}{k^{1/p}}\) for \(j\in J\) and \(x_j=0\) for \(j\notin J\). We use the definition of t and the inequality between the arithmetic mean and the power mean of order \(p^*/2\ge 1\) (recall that \(|J|=k\) and \(p\le 2\)) to get
Putting everything together, we obtain
for all \(s\ge 0\) and all \(x\in {\text {Ext}}(K)\) with support of cardinality k.
For any \(k\le n\), there are \(2^k \left( {\begin{array}{c}n\\ k\end{array}}\right) \le 2^k n^k \le \exp ( k \ln (en))\) vectors in \({\text {Ext}}(K)\) with support of cardinality k. Thus, using the union bound and (4.22), we see that, for all \(s\ge 2\),
Hence, by Lemma 2.6,
Recalling (4.20) and the definitions of a, b, and C(q) yields the assertion. \(\square \)
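The counting step in this proof, that there are \(2^k\binom{n}{k}\le \exp (k\ln (en))\) extreme points of K with support of cardinality k, can be sanity-checked by brute force for small n. In the Python sketch below, which is our own illustration, the hypothetical helper `count_by_enumeration` enumerates sign patterns in \(\{-1,0,1\}^n\). Scaled by \(k^{-1/p}\), these patterns give the points described in the proof. The helper `bounds_hold` checks the chain of inequalities on logarithms.

```python
import itertools
import math


def count_by_enumeration(n: int, k: int) -> int:
    """Count vectors in {-1, 0, 1}^n with exactly k nonzero coordinates.

    Multiplying such a vector by k**(-1/p) gives a point x with x_j = +-k^(-1/p)
    on its support J, |J| = k, as in the description of Ext(K) in the proof.
    """
    return sum(
        1
        for v in itertools.product((-1, 0, 1), repeat=n)
        if sum(c != 0 for c in v) == k
    )


def bounds_hold(n: int) -> bool:
    """Check 2^k C(n,k) <= 2^k n^k <= exp(k ln(en)) for all 1 <= k <= n."""
    for k in range(1, n + 1):
        exact = 2**k * math.comb(n, k)
        crude = 2**k * n**k
        # math.log accepts arbitrarily large ints, so the last comparison
        # is done on logarithms without overflow
        if exact > crude or math.log(crude) > k * math.log(math.e * n):
            return False
    return True


print([count_by_enumeration(4, k) for k in range(5)])  # [1, 8, 24, 32, 16]
```

The last inequality holds because \(2^k n^k = \exp (k\ln (2n))\) and \(2<e\).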
Remark 4.2
In the unstructured case, for \(X_{ij}\) which are independent, mean-zero, and take values in \([-1,1]\), it is easy to extend (1.2) to the whole range of \(p, q\in [1,\infty ]\) (see [8, 13]). Indeed, for \(p\ge 2\) and \(q\ge 2\),
Thus, for \(p\ge 2\) and \(1\le q\le 2\),
Suppose now that \(1\le p \le 2\le q \le \infty \) and \(1/p+1/q\le 1\) (i.e., \(q\ge p^*\)). Choose \(\theta \in [0,1]\) and \(r \ge 2\) so that \(\frac{1}{p} = \frac{\theta }{2} + \frac{1-\theta }{1}\) and \(\frac{1}{q} = \frac{\theta }{r} + \frac{1-\theta }{\infty }\), i.e., \(\theta = 2/p^*\) and \(r = 2q/p^*\). Using the Riesz–Thorin interpolation theorem, the fact that \(\Vert X:\ell ^n_1 \rightarrow \ell ^m_\infty \Vert \le 1\) (since the entries take values in \([-1,1]\)), and Jensen’s inequality, we arrive at
The estimates in the remaining ranges of p, q follow by duality (1.12). Moreover, up to constants, all these estimates are optimal, as they can be reversed for matrices with \(\pm 1\) entries (see [8, Proposition 3.2] or [13, Satz 2]).
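The interpolation parameters in Remark 4.2 can be verified with exact rational arithmetic. The Python sketch below is our own check, and the function names are hypothetical. For \(1<p\le 2\le q<\infty \) with \(q\ge p^*\), it computes \(\theta =2/p^*\) and \(r=2q/p^*\) and asserts \(\frac{1}{p}=\frac{\theta }{2}+\frac{1-\theta }{1}\), \(\frac{1}{q}=\frac{\theta }{r}\), \(\theta \in [0,1]\), and \(r\ge 2\). The endpoint p = 1 is excluded because \(p^*=\infty \) there.

```python
from fractions import Fraction


def conjugate(p: Fraction) -> Fraction:
    """Hölder conjugate p* with 1/p + 1/p* = 1 (requires p > 1)."""
    return p / (p - 1)


def riesz_thorin_params(p: Fraction, q: Fraction):
    """Return (theta, r) = (2/p*, 2q/p*) for 1 < p <= 2 <= q < oo with q >= p*.

    Exact rational arithmetic is used so the interpolation identities can be
    checked with ==.  The endpoint p = 1 (p* = oo) is excluded here.
    """
    ps = conjugate(p)
    if not (1 < p <= 2 <= q and q >= ps):
        raise ValueError("need 1 < p <= 2 <= q and q >= p*")
    theta = 2 / ps
    r = 2 * q / ps
    # l_p between l_2 (weight theta) and l_1 (weight 1 - theta)
    assert 1 / p == theta / 2 + (1 - theta) / 1
    # l_q between l_r (weight theta) and l_oo, which contributes (1 - theta)/oo = 0
    assert 1 / q == theta / r
    assert 0 <= theta <= 1 and r >= 2
    return theta, r


print(riesz_thorin_params(Fraction(3, 2), Fraction(4)))  # theta = 2/3, r = 8/3
```

The condition \(r\ge 2\) is equivalent to \(q\ge p^*\), which is exactly the assumption \(1/p+1/q\le 1\) made in the remark.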
4.3 \(\psi _r\) random variables
In this section, we prove Theorem 1.15. To this end we shall split the matrix X into two parts \(X^{(1)}\) and \(X^{(2)}\) such that all entries of \(X^{(1)}\) are bounded by \(C\ln (mn)^{1/r}\). Then, we shall deal with \(X^{(2)}\) using the following crude bound and the fact that the probability that \(X^{(2)} \ne 0\) is very small. In order to bound the expectation of the norm of \(X^{(1)}\) we need a cut-off version of Theorem 1.15 – see Lemma 4.4 below.
Lemma 4.3
Let \(r\in (0,2]\). Assume that \(X = (X_{ij})_{i\le m, j\le n}\) satisfies the assumptions of Theorem 1.15. Then
Proof
By a standard volumetric estimate (see, e.g., [64, Corollary 4.2.13]), we know that there exists (in the metric \(\Vert \cdot \Vert _p\)) a 1/2-net S in \(B_p^n\) of size at most \(5^n\). In other words, for any \(x\in B_p^n\) there exists \(y\in S\) such that \(x-y \in \frac{1}{2} B_p^n\). Thus, for any \(z\in \mathbb {R}^n\),
Hence,
Likewise, if we denote by T the 1/2-net in \(B_{q^*}^m\) (in the metric \(\Vert \cdot \Vert _{q^*}\)) of size at most \(5^m\), then
Combining these two estimates, we see that
Lemma 2.19 implies that for any \(x\in \mathbb {R}^n\), \(y\in \mathbb {R}^m\), the random variable
satisfies condition (i) in Lemma 2.18. Thus, Lemma 2.18 implies that
where \(c(r,K,L) \in (0,\infty )\) and \(C(r,K,L) \in (0,\infty )\) depend only on r, K, and L.
The function \(z\mapsto e^{z^{r/2}}\) is convex on \([(2r^{-1}-1)^{2/r},\infty )\). Therefore, by Jensen’s inequality, for any \(u> 0\) and any nonnegative random variable Z,
Hence,
Thus, when
we get by (4.26), (3.10), and (3.11),
where in the last two inequalities we also used inequalities \(|S| \le 5^n\) and \(|T|\le 5^m\), and the inclusions \(S\subset B_p^n\), \(T\subset B_{q^*}^m\). Recalling (4.25) completes the proof. \(\square \)
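The convexity claim for \(f(z)=e^{z^{r/2}}\) used in this proof can be double-checked by hand. With \(a=r/2\), one computes \(f''(z)=a z^{a-2}e^{z^a}\bigl((a-1)+a z^a\bigr)\), which is nonnegative exactly when \(z^a\ge 2/r-1\), i.e., when \(z\ge (2r^{-1}-1)^{2/r}\). The Python sketch below is our illustration, and its function names are ours. It evaluates this closed form, cross-checks it against a central finite difference, and tests the sign on either side of the threshold.

```python
import math


def second_derivative(z: float, r: float) -> float:
    """Closed form of f''(z) for f(z) = exp(z**(r/2)), z > 0.

    With a = r/2: f''(z) = a * z**(a-2) * exp(z**a) * ((a - 1) + a * z**a).
    """
    a = r / 2
    return a * z ** (a - 2) * math.exp(z**a) * ((a - 1) + a * z**a)


def finite_difference(z: float, r: float, h: float = 1e-4) -> float:
    """Central second difference of f, used only to cross-check the closed form."""
    f = lambda t: math.exp(t ** (r / 2))
    return (f(z + h) - 2 * f(z) + f(z - h)) / h**2


def threshold(r: float) -> float:
    """Left endpoint (2/r - 1)**(2/r) of the convexity interval (0 when r = 2)."""
    return (2 / r - 1) ** (2 / r)


for r in (0.5, 1.0, 1.5):
    z0 = threshold(r)
    print(f"r = {r}: z0 = {z0:.4f}, f''(0.9 z0) = {second_derivative(0.9 * z0, r):.3e}")
```

For r = 2 the threshold is 0 and \(f''(z)=e^z\), so the function is convex on the whole half-line.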
The following cut-off version of Theorem 1.15 can be proved similarly to Proposition 1.7.
Lemma 4.4
Let \(K,L, M>0\) and \(r\in (0,2]\). Assume \(X=(X_{ij})_{i\le m, j\le n}\) is a random matrix with independent symmetric entries taking values in \([-M,M]\) and satisfying the condition
Then, for \(1\le p\le 2\) and \(1\le q< \infty \), we have
Proof
Fix \(1\le p\le 2\) and \(1\le q< \infty \). Let K be the set defined in Lemma 2.3 so that \(B_p^n\subset \ln (en)^{1/p^*} K\). Then
where \({\text {Ext}}(K)\) is the set of extreme points of K. We shall now estimate the expected value of the right-hand side of (4.28).
To this end, we consider a fixed \(x=(x_j)_{j=1}^n \in {\text {Ext}}(K)\). This means that there exists a non-empty index set \(J\subset \{1,\dots ,n\}\) of cardinality \(k\le n\) such that \(x_j = \frac{\pm 1}{k^{1/p}}\) for \(j\in J\) and \(x_j=0\) for \(j\notin J\). We know from (4.4) that the Lipschitz constant of the convex function
is less than or equal to
Thus, Talagrand’s concentration for convex functions and random vectors with independent bounded coordinates (see [56, Theorem 6.6 and Eq. (6.18)]), together with the inequality \({\text {Med}}(|Z|)\le 2\mathbb {E}|Z|\), implies
Similarly to the proof in the Gaussian case (i.e., the proof of Proposition 1.7), we shall transform this into a more convenient form by getting rid of \(b_J\) and estimating \(\mathbb {E}\Vert X_A x\Vert _q \). Let us denote, for each \(i\in \{1,\dots ,m\}\),
From our assumption (4.27) as well as Lemmas 2.19 and 2.18, we obtain that \((\mathbb {E}|Z_i|^q)^{1/q}\lesssim _{r,K,L}q^{1/r}\sqrt{\sum _{j=1}^n a_{ij}^2 x_j^2}\). Hence,
From (4.6), we see that
The above two inequalities, together with estimate (4.29) (applied with \(t=4k^{\frac{1}{p^*}-\frac{1}{2}}b_J M\sqrt{\ln (en)} s\)), imply that
for every \(s\ge 0\) and any \(x\in {\text {Ext}}(K)\) with support of cardinality k.
For any \(k\le n\), there are \(2^k \left( {\begin{array}{c}n\\ k\end{array}}\right) \le 2^k n^k \le \exp ( k \ln (en))\) vectors in \({\text {Ext}}(K)\) with support of cardinality k. Thus, using the union bound and (4.30), we see that for \(s\ge \sqrt{2}\),
Hence, by Lemma 2.6,
Recalling (4.28) and the definitions of a and b yields the assertion. \(\square \)
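The inequality \({\text {Med}}(|Z|)\le 2\mathbb {E}|Z|\) used in the proof above follows directly from Markov's inequality. The following display spells out the argument for any median of \(|Z|\); it is our reconstruction and does not appear in the original.

```latex
\text{Suppose } m := \operatorname{Med}(|Z|) > 2\,\mathbb{E}|Z| \ (\text{so } m>0).
\text{ Then }\quad
\tfrac12 \le \mathbb{P}\bigl(|Z| \ge m\bigr) \le \frac{\mathbb{E}|Z|}{m} < \tfrac12,
```

This is a contradiction, so \({\text {Med}}(|Z|)\le 2\mathbb {E}|Z|\).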
Proof of Theorem 1.15
By a symmetrization argument (as in the first paragraph of the proof of Theorem 3.5), we may and do assume that all the entries \(X_{ij}\) are symmetric. Set \(M = (4\,L \ln (mn)/r)^{1/r}\). Denote \({\widehat{X}}_{ij} = X_{ij} {\textbf{1}}_{\{|X_{ij}|\le M\}}\) and let \({\widehat{X}}\) be the \(m\times n\) matrix with entries \({\widehat{X}}_{ij}\). We have
The random matrix \({\widehat{X}}\) satisfies the assumptions of Lemma 4.4. Thus, the first summand above can be estimated as follows:
For the second summand we write, using the Cauchy–Schwarz inequality and then Lemmas 4.3 and 2.22 (with \(k=mn\) and \(v=4/r\); recall that \(M = (4\,L \ln (mn)/r)^{1/r}\)),
Combining the above three inequalities ends the proof. \(\square \)
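The truncation step in the proof of Theorem 1.15 can be illustrated numerically. The Python sketch below is our own, and all function names in it are hypothetical. **Assumption:** we take the tail condition to have the form \(\mathbb {P}(|X_{ij}|\ge t)\le K e^{-t^r/L}\). The exact assumptions of Theorem 1.15 are not displayed in this section, but this form is consistent with the choice \(M=(4L\ln (mn)/r)^{1/r}\). Under it, a union bound gives \(\mathbb {P}({\widehat{X}}\ne X)\le mnKe^{-M^r/L}=K(mn)^{1-4/r}\), which is small because \(r\le 2\). The helper `split` forms \({\widehat{X}}\) and \(X-{\widehat{X}}\).

```python
import math
import random


def truncation_level(L: float, r: float, m: int, n: int) -> float:
    """M = (4 L ln(mn) / r)^(1/r), the cut-off level chosen in the proof."""
    return (4 * L * math.log(m * n) / r) ** (1 / r)


def union_bound(K: float, L: float, r: float, m: int, n: int) -> float:
    """Union bound for P(X_hat != X) under the ASSUMED tail condition
    P(|X_ij| >= t) <= K * exp(-t**r / L) for every entry (hypothetical form)."""
    M = truncation_level(L, r, m, n)
    return m * n * K * math.exp(-M**r / L)  # equals K * (mn)**(1 - 4/r)


def split(X, M):
    """Return (X_hat, X - X_hat) with X_hat_ij = X_ij * 1{|X_ij| <= M}."""
    X_hat = [[x if abs(x) <= M else 0.0 for x in row] for row in X]
    rest = [[x - xh for x, xh in zip(row, row_hat)] for row, row_hat in zip(X, X_hat)]
    return X_hat, rest


# Usage: symmetric Laplace entries, i.e. r = 1, K = 1, L = 1 in the assumed form.
rng = random.Random(0)
m, n = 30, 40
X = [[rng.choice((-1, 1)) * rng.expovariate(1.0) for _ in range(n)] for _ in range(m)]
M = truncation_level(L=1.0, r=1.0, m=m, n=n)
X_hat, rest = split(X, M)
print(f"M = {M:.2f}, bound on P(X_hat != X) = {union_bound(1.0, 1.0, 1.0, m, n):.2e}")
```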
5 Lower bounds and further discussion of conjectures
5.1 Lower bounds
Let us first provide lower bounds showing that the upper bounds obtained above are indeed sharp (up to logarithms).
Proposition 5.1
Let \(X = (X_{ij})_{i\le m, j\le n}\) be a random matrix with independent mean-zero entries satisfying \(\mathbb {E}|X_{ij}| \ge c\) for some \(c\in (0,\infty )\). Then, for all \(1\le p, q\le \infty \),
Using duality (1.12) we immediately obtain the following corollary.
Corollary 5.2
Let \(X = (X_{ij})_{i\le m, j\le n}\) be as in Proposition 5.1. Then, for all \(1\le p,q\le \infty \),
Proof of Proposition 5.1
Let \(\Vert \cdot \Vert \) denote the operator norm from \(\ell _{p}^n\) to \(\ell _q^m\). For \(i\in \{1,\dots ,m\}\) and \(j\in \{1,\dots ,n \}\), let us denote by \(E_{ij}\) the \(m\times n\) matrix with entry 1 at the intersection of the ith row and the jth column and with all other entries 0. By the symmetrization trick described in Remark 3.4, it suffices to consider matrices X with symmetric entries and to prove the assertion with a constant twice as large, namely \(c/\sqrt{2}\). Note that, also by Remark 3.4, the symmetrized entries still satisfy the same lower bound c on their absolute first moment.
If X has symmetric independent entries, it has the same distribution as \((\varepsilon _{ij}|X_{ij}|)_{ij}\), where \(\varepsilon _{ij}\), \(i\le m\), \(j\le n\), are i.i.d. Rademacher random variables, independent of all other random variables. Hence, by Jensen’s inequality and the contraction principle (Lemma 2.7 applied with \(\alpha _{ij} = 1/\mathbb {E}|X_{ij}| \le 1/c\) and \(x_{ij}=a_{ij}\mathbb {E}|X_{ij}| E_{ij}\)), we get
Thus, it suffices to estimate \(\mathbb {E}\Vert \sum _{i,j} \varepsilon _{ij}a_{ij} E_{ij} \Vert \) from below.
Since the \(\ell _q\) norm is unconditional, we obtain from the inequalities of Jensen and Khintchine (see [26]) that
This together with the estimate in (5.1) yields the assertion. \(\square \)
Since \(\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \ge \max _{i,j} |a_{ij}g_{ij}|\), it suffices to prove the following proposition in order to provide the lower bound in Conjecture 1.
Proposition 5.3
For the \(m\times n\) Gaussian matrix \(G_A\), we have
where \(b_j= \Vert (a_{ij})_{i\le m} \Vert _{2q/(2-q)}\) and \(d_i=\Vert (a_{ij})_{j\le n}\Vert _{2p/(p-2)}\).
Proof
Since \(B_1^n\subset B_p^n\) for \(p\ge 1\) and the \(b_j\)’s do not depend on p, it suffices to prove the first part of the assertion (in the range \(p\le q\le 2\)) only in the case \(p=1\le q\le 2\). In this case (5.2) follows from Proposition 1.10.
The assertion in the range \(2\le p \le q\) follows by duality (1.12). \(\square \)
5.2 The proof of inequalities (1.13) and (1.11)
Let us now show that in the case \(q<p\), the third term on the right-hand side in Conjecture 1 is not needed. To this end it suffices to prove (1.13) only in the case \(q<2\), since the case \(p>2\) follows by duality (1.12).
Proposition 5.4
Whenever \(1\le q<p \le \infty \) and \(q<2\), we have
where \(b_j= \Vert (a_{ij})_{i\le m} \Vert _{2q/(2-q)}\).
Proof
Since the right-hand side of (5.3) does not depend on p, and the left-hand side is non-decreasing with p, we may consider only the case \(1\le q<p\le 2\). By permuting the columns of A we may and do assume without loss of generality that the sequence \((b_j)_j\) is non-increasing.
Fix \(j_0\le n\). Let r be the midpoint of the non-empty interval \((\frac{2-p}{p}, \frac{2-q}{q})\). Take \(x=(x_j)_{j\le n}\) with \(x_j=\frac{1}{j^r}\). Since \(rp/(2-p)>1\), we have
so \(x\in C'(p,q)B^n_{p/(2-p)}=C'(p,q)B^n_{(p^*/2)^*}\). Therefore, the inequality \((q^*/2)^*= q/(2-q) \ge 1\) and the facts that \(b_j\ge b_{j_0}\) for all \(j\le j_0\) and that \(r<(2-q)/q\) imply
Taking the maximum over all \(j_0\le n\) completes the proof. \(\square \)
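The choice of the test vector in the proof of Proposition 5.4 can be checked numerically. The Python sketch below is ours, and its names are hypothetical. It takes r as the midpoint of \((\frac{2-p}{p},\frac{2-q}{q})\) and \(x_j=j^{-r}\), and compares \(\Vert x\Vert _{p/(2-p)}\) (finite for \(p<2\)) with \(C=(1+1/(rs-1))^{1/s}\), where \(s=p/(2-p)\). This C comes from the integral test \(\sum _{j\ge 1}j^{-a}\le 1+\frac{1}{a-1}\) for \(a>1\). It is one admissible choice of the constant and not necessarily the authors' \(C'(p,q)\). The sketch also checks the identity \((q^*/2)^*=q/(2-q)\) in exact arithmetic.

```python
from fractions import Fraction


def conjugate(p):
    """Hölder conjugate p* = p/(p-1), for p > 1."""
    return p / (p - 1)


def midpoint_exponent(p: float, q: float) -> float:
    """r = midpoint of ((2-p)/p, (2-q)/q); the interval is non-empty when q < p."""
    return ((2 - p) / p + (2 - q) / q) / 2


def truncated_norm_and_bound(p: float, q: float, n: int):
    """Return (||x||_s, C) with x_j = j^(-r) for j <= n, s = p/(2-p) (p < 2),
    and C = (1 + 1/(rs - 1))^(1/s), an integral-test bound for sum_j j^(-rs)."""
    r = midpoint_exponent(p, q)
    s = p / (2 - p)
    assert r * s > 1, "rp/(2-p) > 1 because r > (2-p)/p"
    norm = sum(j ** (-r * s) for j in range(1, n + 1)) ** (1 / s)
    bound = (1 + 1 / (r * s - 1)) ** (1 / s)
    return norm, bound


print(truncated_norm_and_bound(1.5, 1.2, 10**4))
```

Because the bound does not depend on n, the norm of x stays uniformly bounded in the dimension, which is what the proof needs.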
Now we turn to the proof of (1.11). Note that it suffices to prove only the first two-sided inequality in (1.11), since the second one follows from it by duality (1.12).
Proposition 5.5
For all \(1\le p, q\le \infty \), we have
where the matrix \((a_{ij}')_{i,j}\) is obtained by permuting the columns of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _i a_{i1}'\ge \dots \ge \max _i a_{in}'\).
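The matrix \((a_{ij}')\) can be formed mechanically. The Python sketch below is our illustration, and the name `sort_columns_by_max` is hypothetical. It takes absolute values and reorders the columns so that the column maxima are non-increasing. Ties are broken by the original column order, since Python's sort is stable even with `reverse=True`. The statement only requires the non-increasing property.

```python
def sort_columns_by_max(A):
    """Return A' = (|a_ij|) with columns permuted so that the column maxima
    max_i a'_{i1} >= max_i a'_{i2} >= ... are non-increasing.

    Ties are broken by original column order (Python's sort is stable)."""
    abs_A = [[abs(v) for v in row] for row in A]
    n = len(abs_A[0])
    col_max = [max(row[j] for row in abs_A) for j in range(n)]
    order = sorted(range(n), key=lambda j: col_max[j], reverse=True)
    return [[row[j] for j in order] for row in abs_A]


print(sort_columns_by_max([[1, -5, 2], [3, 0, -4]]))  # [[5, 2, 1], [0, 4, 3]]
```

The matrix \((a_{ij}'')\) used later is obtained the same way, with the roles of rows and columns exchanged.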
Proof
By permuting the columns of the matrix A, we can assume that the sequence \((\max _{i\le m} |a_{ij}|)_{j=1}^n\) is non-increasing. We have
The function \(y \mapsto \max _{i\le m} |a_{ij} y_i|\) is \(\max _{i\le m} |a_{ij}|\)-Lipschitz with respect to the Euclidean norm on \(\mathbb {R}^m\), so by Gaussian concentration (see, e.g., [41, Chapter 5.1]),
for all \(t\ge 0\), \(j\le n\). Thus, Lemma 2.11 and inequality (5.5) imply
We have
which, together with (5.6), provides the asserted upper bound.
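The Lipschitz claim for \(y\mapsto \max _{i\le m}|a_{ij}y_i|\) follows from \(\bigl |\max _i|a_{ij}y_i|-\max _i|a_{ij}z_i|\bigr |\le \max _i|a_{ij}|\,|y_i-z_i|\le \max _i|a_{ij}|\,\Vert y-z\Vert _2\). The Python sketch below, our own illustration with hypothetical names, tests this on random Gaussian pairs. It also shows that the constant \(\max _i|a_{ij}|\) is attained at a standard basis vector.

```python
import math
import random


def f(a_col, y):
    """y -> max_i |a_ij * y_i| for a fixed column a_col = (a_ij)_i."""
    return max(abs(a * yi) for a, yi in zip(a_col, y))


def worst_ratio(a_col, trials=2000, seed=1):
    """Largest observed |f(y) - f(z)| / ||y - z||_2 over random Gaussian pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        y = [rng.gauss(0.0, 1.0) for _ in a_col]
        z = [rng.gauss(0.0, 1.0) for _ in a_col]
        d = math.dist(y, z)
        if d > 0:
            worst = max(worst, abs(f(a_col, y) - f(a_col, z)) / d)
    return worst


a_col = [0.3, -2.0, 1.1, 0.0, 1.7]
L = max(abs(a) for a in a_col)  # claimed Lipschitz constant max_i |a_ij|
print(f"observed ratio {worst_ratio(a_col):.4f} <= L = {L}")
```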
On the other hand, if \((a_{l}^{\downarrow {}})_{l\le mn}\) denotes the non-increasing rearrangement of the sequence of all absolute values of entries of A, then Lemma 2.12 implies
which provides the asserted lower bound. \(\square \)
Note that the above proof shows in fact that
so
where the matrix \((a_{ij}'')_{i,j}\) is obtained by permuting the rows of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _j a_{1j}''\ge \dots \ge \max _j a_{mj}''\).
5.3 Counterexample to a seemingly natural conjecture
In this subsection we provide an example showing that for any \(p\le q <2\) the bound
cannot hold. By duality (1.12), it also cannot hold for any \(2<p\le q\). This explains why Conjecture 1 cannot be simplified to a form like the right-hand side of (1.8).
Let \(p\le q <2\), \(k, N\in \mathbb {N}\), and let \(A_1, \ldots , A_N\) be \(k\times k\) matrices with all entries equal to one. Consider a block matrix
of size \(kN \times kN\), with blocks \(A_1, \ldots , A_N\) on the diagonal and with all other entries equal to 0.
Note that since \(p\le q \le 2\),
and similarly, since \(2\le q^*\le p^{*}\),
The two bounds above and Lemma 2.10 imply that the right-hand side of (5.8) is bounded from above by
On the other hand, since for all \(j\le kN\), \(\Vert (a_{ij})_{i} \Vert _{2q/(2-q)} = k^{(2-q)/(2q)}\), we obtain from the lower bound (5.2) that
If we take \(N\asymp e^{e^k}\), then (5.10) is of larger order than (5.9) as \(k\rightarrow \infty \), so (5.8) cannot hold.
5.4 Discussion of another natural conjecture
In this subsection we prove all the assertions of Remark 1.1. We begin by showing that for every \(1\le p \le 2\le q \le \infty \),
and, in the case \(p,q\ge 2\),
where \( D_1 = \Vert A\mathbin {\circ }A :\ell ^n_{p/2} \rightarrow \ell ^m_{q/2}\Vert ^{1/2}\), \(D_2 = \Vert (A\mathbin {\circ }A)^T :\ell ^m_{q^*/2} \rightarrow \ell ^n_{p^*/2}\Vert ^{1/2}\), and \(d_i = \Vert (a_{ij})_{j\le n} \Vert _{2p/(p-2)}\). In other words, (5.11) shows that Conjecture 1 is equivalent to (1.15) as long as \(1\le p \le 2\le q \le \infty \).
Proof of (5.11) and (5.12)
Fix \(i\le m\) and let \(f(x)=\Vert (a_{ij}x_j)_j\Vert _{p^*}\) for \(x\in \mathbb {R}^n\). For \(p\ge 2\) we have \(p^*(2/p^*)^*= 2p/(p-2)\). Thus f is Lipschitz continuous with constant \(L_i\) equal to
Therefore, the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]) implies that for every \(t\ge 0\) and every \(i\le m\),
so by Lemma 2.11 we get
where the matrix \((a_{ij}'')_{i,j}\) is obtained by permuting the rows of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _j a_{1j}'' \ge \dots \ge \max _j a_{mj}''\).
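The exponent identity \(p^*(2/p^*)^*= 2p/(p-2)\) used for the Lipschitz constant above, together with the related identity \(2(p/2)^*= 2p/(p-2)\) used later in Example 5.7, can be checked in exact rational arithmetic for \(p>2\). The Python sketch below is our own, and its function names are hypothetical.

```python
from fractions import Fraction


def conjugate(p: Fraction) -> Fraction:
    """Hölder conjugate p* = p/(p-1), for p > 1."""
    return p / (p - 1)


def exponents(p: Fraction):
    """For p > 2 return (p* (2/p*)*, 2 (p/2)*, 2p/(p-2)); all three should agree."""
    ps = conjugate(p)
    return ps * conjugate(2 / ps), 2 * conjugate(p / 2), 2 * p / (p - 2)


print(exponents(Fraction(3)))  # all three equal 6
```

For \(p>2\), both \(2/p^*\) and \(p/2\) exceed 1, so the conjugates are well defined.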
Moreover, by Jensen’s inequality,
This together with the triangle inequality and (5.13) implies
and, by duality,
where \(b_j=\Vert (a_{ij})_i\Vert _{2q/(2-q)}\), and the matrix \((a_{ij}')_{i,j}\) is obtained by permuting the columns of the matrix \((|a_{ij}|)_{i,j}\) in such a way that \(\max _i a_{i1}'\ge \dots \ge \max _i a_{in}'\). This, together with Lemma 2.1 and (5.4) yields in the case \(p\le 2\le q\),
which implies the lower bound of (5.11). In the case \(2<p, q\) we additionally use (5.7) and the simple observation that
to get (5.12).
Now we move to the proof of the upper bound of (5.11) in the case \(p\le 2\le q\). Since the \(\ell _{p^*}^n\) norm is unconditional, we have by Jensen’s inequality and Lemma 2.1
and dually
Moreover, since \(\Vert \cdot \Vert _q\ge \Vert \cdot \Vert _{\infty }\),
which finishes the proof of the upper bound of (5.11). \(\square \)
Next, for every pair \((p,q)\in [1,\infty ]^2\) which does not satisfy the condition \(1\le p\le 2\le q\le \infty \) we shall give examples of \(m,n\in \mathbb {N}\), and \(m\times n\) matrices A, for which
when \(m,n \rightarrow \infty \). This shows that the natural conjecture (1.15) is wrong outside the range \(1\le p\le 2\le q\le \infty \). The case \(p=2=q\), when (1.15) is valid (cf. (1.4)), is in a sense a boundary case, for which (1.15) (i.e., a natural generalization of (1.4)) may hold.
Example 5.6
(for (5.14) in the case \(q<p\).) Let \(m=n\), and \(A={\text {Id}}_n\). Then by Lemmas 2.10 and 2.12 we have
whereas Proposition 5.1 and our assumption \(p/q>1\) imply
Since the cases \(2<p \le q\) and \(p\le q <2\) are dual (see (1.12)), we give an example only for the first case.
Example 5.7
(for (5.14) in the case \(2<p \le q\).) Fix p and q satisfying \(2<p \le q\). Let \(m,n \rightarrow \infty \) be such that \(m^{1/q}\gg n^{1/p^*}\), and let A be an \(m\times n\) matrix with all entries equal to 1. For \(p>2\) we have \(2(p/2)^*= 2p/(p-2)\). This together with (5.12) implies
On the other hand, Proposition 5.1 and our assumption \(p/2>1\) imply
5.5 Infinite dimensional Gaussian operators
In this subsection we prove Proposition 1.2 concerning infinite dimensional Gaussian operators. It allows us to see that Conjecture 1 implies Conjecture 2.
Proof of Proposition 1.2
We adapt the proof of [40, Corollary 1.2] to prove Proposition 1.2 in the case \(p\le 2\le q\); the remaining cases may be proven similarly. Fix \(1\le p \le 2\le q\le \infty \) for which (1.14) holds and a deterministic infinite matrix \(A=(a_{ij})_{i,j\in \mathbb {N}}\). Using the monotone convergence theorem one can show that a matrix \(B = (b_{ij})_{i,j\in \mathbb {N}}\) defines a bounded operator between \(\ell _p(\mathbb {N})\) and \(\ell _q(\mathbb {N})\) if and only if \(\sup _{n\in \mathbb {N}} \Vert (b_{ij})_{i,j\le n}:\ell _p^n\rightarrow \ell _q^n\Vert < \infty \). Interpreting \(\Vert B:\ell _p(\mathbb {N})\rightarrow \ell _q(\mathbb {N})\Vert \) as infinity for matrices which do not define a bounded operator, we have
and similarly
and
Therefore, (1.14) implies the following: \(\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty \) if and only if \(\Vert A\mathbin {\circ }A :\ell _{p/2}(\mathbb {N}) \rightarrow \ell _{q/2}(\mathbb {N}) \Vert <\infty \), \(\Vert (A\mathbin {\circ }A)^T :\ell _{q^*/2}(\mathbb {N}) \rightarrow \ell _{p^*/2}(\mathbb {N}) \Vert <\infty \), and \(\mathbb {E}\sup _{i, j \in \mathbb {N}}|a_{ij}g_{ij}| <\infty \). It thus suffices to prove the following claim: \(\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty \) almost surely if and only if \(\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty \).
If \(\mathbb {P}(\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert<\infty ) <1\), then \(\mathbb {P}(\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert =\infty ) >0\), so \(\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert =\infty \).
Assume now that \(\mathbb {P}(\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty ) =1\). By (4.23) and (4.24) we know that for every \(n\in \mathbb {N}\) there exist finite sets \(S_n\) and \(T_n\) such that
In particular, there exist Gaussian random variables \((\Gamma _k)_{k\in \mathbb {N}}\) such that
Therefore, we may apply [35, (1.2)] to see that there exists \(\varepsilon >0\) such that \(\mathbb {E}\exp (\varepsilon \Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert ^2 ) < \infty \), so \(\mathbb {E}\Vert G_A:\ell _p(\mathbb {N}) \rightarrow \ell _q(\mathbb {N}) \Vert <\infty \), which completes the proof of the claim. \(\square \)
Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Notes
By Jensen’s inequality, the expected norm of a matrix with i.i.d. Rademacher entries is less than or equal to \(\sqrt{\pi /2}\) times the expected norm of the matrix with Gaussian entries, so (1.5) for \(q\le 2\) or \(p\ge 2\) would imply the same (up to a constant) bound for \(\pm 1\) matrices, which does not hold in this range of (p, q) as we explain in Remark 4.2.
Here we also use the trivial observation that \(\Vert G_A:\ell _{p}^n\rightarrow \ell _q^m\Vert \ge \max _{i,j} |a_{ij}g_{ij}|\).
Indeed, \(j^{-1/p}(e_1+\dots +e_j)\in K\), so \(\Vert e_1+\dots +e_j\Vert _K \le j^{1/p}\); on the other hand, \(K\subset B_p^n\), so \(\Vert e_1+\dots +e_j\Vert _K \ge \Vert e_1+\dots +e_j\Vert _p = j^{1/p}\).
References
Achlioptas, D., McSherry, F.: Fast computation of low-rank matrix approximations. J. ACM 54(2), Art. 9 (2007)
Adamczak, R., Latała, R., Litvak, A.E., Pajor, A., Tomczak-Jaegermann, N.: Chevet type inequality and norms of submatrices. Studia Math. 210(1), 35–56 (2012)
Adamczak, R., Latała, R., Puchała, Z., Życzkowski, K.: Asymptotic entropic uncertainty relations. J. Math. Phys. 57(3), 032204 (2016)
Ailon, N., Chazelle, B.: The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 39(1), 302–322 (2009)
Akemann, G., Baik, J., Di Francesco, P. (eds.): The Oxford handbook of random matrix theory. Oxford University Press, Oxford (2015)
Anderson, G.W., Guionnet, A., Zeitouni, O.: An introduction to random matrices, Cambridge Studies in Advanced Mathematics, vol. 118. Cambridge University Press, Cambridge (2010)
Bandeira, A.S., van Handel, R.: Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Ann. Probab. 44(4), 2479–2506 (2016)
Bennett, G.: Schur multipliers. Duke Math. J. 44(3), 603–639 (1977)
Bennett, G., Goodman, V., Newman, C.M.: Norms of random matrices. Pac. J. Math. 59(2), 359–365 (1975)
Benyamini, Y., Gordon, Y.: Random factorization of operators between Banach spaces. J. Anal. Math. 39, 45–74 (1981)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. With a foreword by Michel Ledoux. Oxford University Press, Oxford (2013)
Bourgain, J., Dirksen, S., Nelson, J.: Toward a unified theory of sparse dimensionality reduction in Euclidean space. Geom. Funct. Anal. 25(4), 1009–1088 (2015)
Carl, B., Maurey, B., Puhl, J.: Grenzordnungen von absolut-\((r,\, p)\)-summierenden Operatoren. Math. Nachr. 82, 205–218 (1978)
Chafaï, D., Guédon, O., Lecué, G., Pajor, A.: Interactions between compressed sensing random matrices and high dimensional geometry, Panoramas et Synthèses [Panoramas and Syntheses], vol. 37. Société Mathématique de France, Paris (2012)
Davidson, K.R., Szarek, S.J.: Local Operator Theory, Random Matrices and Banach Spaces, Handbook of the Geometry of Banach Spaces, vol. I, pp. 317–366. North-Holland, Amsterdam (2001)
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Applied and Numerical Harmonic Analysis, Birkhäuser/Springer, New York (2013)
Friedland, O., Youssef, P.: Approximating matrices and convex bodies. Int. Math. Res. Not. IMRN 2019(8), 2519–2537 (2019)
Gluskin, E.D.: Norms of random matrices and diameters of finite-dimensional sets. Mat. Sb. (N.S.) 120(162)(2), 180–189, 286 (1983)
Gluskin, E.D., Kwapień, S.: Tail and moment estimates for sums of independent random variables with logarithmically concave tails. Studia Math. 114(3), 303–309 (1995)
Goldstine, H.H., von Neumann, J.: Numerical inverting of matrices of high order. II. Proc. Am. Math. Soc. 2, 188–202 (1951)
Gordon, Y.: Some inequalities for Gaussian processes and applications. Israel J. Math. 50(4), 265–289 (1985)
Gordon, Y., Litvak, A.E., Schütt, C., Werner, E.M.: Geometry of spaces between polytopes and related zonotopes. Bull. Sci. Math. 126(9), 733–762 (2002)
Guédon, O., Hinrichs, A., Litvak, A.E., Prochno, J.: On the expectation of operator norms of random matrices. In: Geometric Aspects of Functional Analysis, Lecture Notes in Math., vol. 2169, pp. 151–162. Springer, Cham (2017)
Guédon, O., Mendelson, S., Pajor, A., Tomczak-Jaegermann, N.: Majorizing measures and proportional subsets of bounded orthonormal systems. Rev. Mat. Iberoam. 24(3), 1075–1095 (2008)
Guédon, O., Rudelson, M.: \(L_p\)-moments of random vectors via majorizing measures. Adv. Math. 208(2), 798–823 (2007)
Haagerup, U.: The best constants in the Khintchine inequality, Studia Math. 70(3) (1981), 231–283 (1982)
Hinrichs, A., Krieg, D., Novak, E., Prochno, J., Ullrich, M.: On the power of random information. In: Hickernell, F.J., Kritzer, P. (eds.) Multivariate Algorithms and Information-Based Complexity, pp. 43–64. De Gruyter, Berlin/Boston (2020)
Hinrichs, A., Krieg, D., Novak, E., Prochno, J., Ullrich, M.: Random sections of ellipsoids and the power of random information. Trans. Am. Math. Soc. 374(12), 8691–8713 (2021)
Hinrichs, A., Prochno, J., Sonnleitner, M.: Random sections of \(\ell _p\)-ellipsoids, optimal recovery and Gelfand numbers of diagonal operators, (2021)
Hinrichs, A., Prochno, J., Vybíral, J.: Gelfand numbers of embeddings of Schatten classes. Math. Ann. 380(3–4), 1563–1593 (2021)
Hitczenko, P., Montgomery-Smith, S.J., Oleszkiewicz, K.: Moment inequalities for sums of certain independent symmetric random variables. Studia Math. 123(1), 15–42 (1997)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Statist. Assoc. 58, 13–30 (1963)
Krieg, D., Ullrich, M.: Function values are enough for \(L_2\)-approximation. Found. Comput. Math. 21(4), 1141–1151 (2021)
Kwapień, S.: Decoupling inequalities for polynomial chaos. Ann. Probab. 15(3), 1062–1071 (1987)
Landau, H.J., Shepp, L.A.: On the supremum of a Gaussian process. Sankhyā Ser. A 32, 369–378 (1970)
Latała, R.: Tail and moment estimates for sums of independent random vectors with logarithmically concave tails. Studia Math. 118(3), 301–304 (1996)
Latała, R.: Some estimates of norms of random matrices. Proc. Am. Math. Soc. 133(5), 1273–1282 (2005)
Latała, R., Strzelecka, M.: Comparison of weak and strong moments for vectors with independent coordinates. Mathematika 64(1), 211–229 (2018)
Latała, R., Świątkowski, W.: Norms of randomized circulant matrices. Electron. J. Probab. 27, Paper No. 80 (2022)
Latała, R., van Handel, R., Youssef, P.: The dimension-free structure of nonhomogeneous random matrices. Invent. Math. 214(3), 1031–1080 (2018)
Ledoux, M.: The Concentration of Measure Phenomenon, Mathematical Surveys and Monographs, vol. 89. American Mathematical Society, Providence, RI (2001)
Ledoux, M.: Deviation inequalities on largest eigenvalues. In: Geometric Aspects of Functional Analysis, Lecture Notes in Math., vol. 1910, pp. 167–219. Springer, Berlin (2007)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 23. Springer-Verlag, Berlin (1991)
Linial, N., London, E., Rabinovich, Y.: The geometry of graphs and some of its algorithmic applications. Combinatorica 15(2), 215–245 (1995)
Matlak, D.: Oszacowania norm macierzy losowych, Master’s thesis, Uniwersytet Warszawski, (2017)
Naor, A., Regev, O., Vidick, T.: Efficient rounding for the noncommutative Grothendieck inequality. Theory Comput. 10, 257–295 (2014)
Puchała, Z., Rudnicki, Ł., Życzkowski, K.: Majorization entropic uncertainty relations. J. Phys. A 46(27), 272002 (2013)
Rauhut, H.: Compressive sensing and structured random matrices. In: Theoretical Foundations and Numerical Methods for Sparse Recovery, Radon Ser. Comput. Appl. Math., vol. 9, pp. 1–92. Walter de Gruyter, Berlin (2010)
Riemer, S., Schütt, C.: On the expectation of the norm of random matrices with non-identically distributed entries. Electron. J. Probab. 18(29), 13 (2013)
Rudelson, M., Vershynin, R.: Sampling from large matrices: an approach through geometric functional analysis. J. ACM 54(4), Art. 21 (2007)
Seginer, Y.: The expected norm of random matrices. Combin. Probab. Comput. 9(2), 149–166 (2000)
Slepian, D.: The one-sided barrier problem for Gaussian noise. Bell Syst. Tech. J. 41, 463–501 (1962)
So, A.M.-C.: Moment inequalities for sums of random matrices and their applications in optimization. Math. Program. 130(1), Ser. A, 125–151 (2011)
Spielman, D.A., Teng, S.-H.: Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing (STOC ’04), pp. 81–90. Association for Computing Machinery, New York (2004)
Strzelecka, M.: Estimates of norms of log-concave random matrices with dependent entries. Electron. J. Probab. 24, Paper No. 107 (2019)
Talagrand, M.: A new look at independence. Ann. Probab. 24(1), 1–34 (1996)
Tropp, J.A.: Norms of random submatrices and sparse approximation. C. R. Math. Acad. Sci. Paris 346(23–24), 1271–1274 (2008)
Tropp, J.A.: On the conditioning of random subdictionaries. Appl. Comput. Harmon. Anal. 25(1), 1–24 (2008)
Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics 12(4), 389–434 (2012)
Tropp, J.A.: An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8(1–2), 1–230 (2015)
van Handel, R.: On the spectral norm of Gaussian random matrices. Trans. Am. Math. Soc. 369(11), 8161–8178 (2017)
van Handel, R.: Structured random matrices, Convexity and concentration, IMA Vol. Math. Appl., vol. 161, Springer, New York, (2017), pp. 107–156
Vershynin, R.: Introduction to the Non-asymptotic Analysis of Random Matrices, Compressed Sensing, pp. 210–268. Cambridge Univ. Press, Cambridge (2012)
Vershynin, R.: High-Dimensional Probability: An Introduction with Applications in Data Science. With a foreword by Sara van de Geer. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47. Cambridge University Press, Cambridge (2018)
von Neumann, J., Goldstine, H.H.: Numerical inverting of matrices of high order. Bull. Am. Math. Soc. 53(11), 1021–1099 (1947)
Wigner, E.P.: Characteristic vectors of bordered matrices with infinite dimensions. Ann. Math. (2) 62, 548–564 (1955)
Wigner, E.P.: Characteristic vectors of bordered matrices with infinite dimensions. II. Ann. Math. (2) 65, 203–207 (1957)
Wigner, E.P.: On the distribution of the roots of certain symmetric matrices. Ann. Math. (2) 67, 325–327 (1958)
Wishart, J.: The generalised product moment distribution in samples from a normal multivariate population. Biometrika 20A(1/2), 32–52 (1928)
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
R. Adamczak is partially supported by the National Science Center, Poland via the Sonata Bis grant no. 2015/18/E/ST1/00214. R. Adamczak was partially supported by the WTZ Grant PL 06/2018 of the OeAD. J. Prochno and M. Strzelecka are—and M. Strzelecki was — supported by the Austrian Science Fund (FWF) Project P32405 Asymptotic Geometric Analysis and Applications. M. Strzelecka was partially supported by the National Science Center, Poland, via the Maestro grant no. 2015/18/A/ST1/00553.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Adamczak, R., Prochno, J., Strzelecka, M. et al. Norms of structured random matrices. Math. Ann. 388, 3463–3527 (2024). https://doi.org/10.1007/s00208-023-02599-6
DOI: https://doi.org/10.1007/s00208-023-02599-6