1 Introduction

1.1 Background and studied objects

Given a matrix \(A \in {\mathbb {R}}^{n \times n}\) and a random vector \(X \in {\mathbb {R}}^n\), the Hanson–Wright inequality provides a tail bound for the chaos \(X^T A X - {\mathbb {E}} X^T A X\). In the original work [1], X was assumed to have independent subgaussian entries whose distributions are symmetric about 0.

This result has been improved and adapted to various settings in a number of works; for example, [2] gives a version that holds for vectors with general subgaussian entries, without the symmetry assumption on the distribution:

Theorem 1

(Theorem 1.1 from [2]) Let \(A \in {\mathbb {R}}^{n \times n}\). Let \(X \in {\mathbb {R}}^n\) be a random vector with independent entries such that \({\mathbb {E}} X = 0\) and such that each entry has subgaussian norm at most K. Then for every \(t \ge 0\),

$$\begin{aligned} {\mathbb {P}}(|X^T A X - {\mathbb {E}} X^T A X |> t) \le 2 \exp \left[ - c \min \left\{ \frac{t^2}{K^4 \Vert A\Vert _F^2}, \frac{t}{K^2 \Vert A\Vert _{2 \rightarrow 2} } \right\} \right] \end{aligned}$$

where \(\Vert A\Vert _F\) is the Frobenius and \(\Vert A\Vert _{2 \rightarrow 2}\) the spectral norm of A.
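
The shape of this bound is easy to explore numerically. The following minimal sketch (Python/NumPy; the sample size, the Rademacher choice and the constants \(c = K = 1\) are illustrative assumptions, not part of the theorem) compares the empirical tail of a subgaussian chaos with the two regimes appearing in the exponent.

import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20000
A = rng.standard_normal((n, n))

# Rademacher entries: independent, mean 0, variance 1, subgaussian.
X = rng.choice([-1.0, 1.0], size=(trials, n))
chaos = np.einsum('ti,ij,tj->t', X, A, X) - np.trace(A)   # X^T A X - E X^T A X

fro = np.linalg.norm(A, 'fro')      # ||A||_F
spec = np.linalg.norm(A, 2)         # ||A||_{2->2}
for t in [50.0, 100.0, 200.0]:
    empirical = np.mean(np.abs(chaos) > t)
    # Shape of the right-hand side with c = K = 1 (the theorem involves an
    # unspecified absolute constant c, so this is not a rigorous bound).
    shape = 2 * np.exp(-min(t**2 / fro**2, t / spec))
    print(t, empirical, shape)

For moderate t the minimum in the exponent is attained by the quadratic term \(t^2 / \Vert A\Vert _F^2\), while for large t the linear term \(t / \Vert A\Vert _{2 \rightarrow 2}\) takes over.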

Today, the Hanson–Wright inequality is an important probabilistic tool and can be found in various textbooks covering the basics of signal processing and probability theory, such as [3, 4]. It has found numerous applications, in particular it has been a key ingredient for the construction of fast Johnson–Lindenstrauss embeddings [5].

For subgaussian \(X \in {\mathbb {R}}^n\), linear expressions \(\sum _{k = 1}^n a_k X_k\) can be controlled by Hoeffding’s inequality, while quadratic (order 2) expressions \(X^T A X = \sum _{j, k = 1}^n A_{j, k} X_j X_k\) can be controlled by the Hanson–Wright inequality. Thus, it is natural to wonder to what extent such control extends to a higher-order subgaussian chaos of the form

$$\begin{aligned} \sum _{i_1, \ldots , i_d} A_{i_1, \ldots , i_d} X_{i_1} \cdots X_{i_d}. \end{aligned}$$
(1)

Expressions of this type for subgaussian vectors have been considered in [6], where they are controlled using specific tensor norms of the arrays of all expected partial derivatives of a certain degree with respect to the entries of X.

In contrast, for independent random vectors \(X^{(1)}, \ldots , X^{(d)}\), the decoupled chaos

$$\begin{aligned} \sum _{i_1, i_2, \ldots , i_d = 1}^n A_{i_1, \ldots , i_d} X^{(1)}_{i_1} \cdots X^{(d)}_{i_d}, \end{aligned}$$
(2)

can be controlled with simpler bounds and has been considered in multiple previous works for numerous different distributions of the random vectors [7,8,9].

In the course of adapting fast Johnson–Lindenstrauss embeddings to data with Kronecker structure as introduced in [10] (see also [11, 12]), one encounters expressions of the form \((X^{(1)} \otimes \cdots \otimes X^{(d)})^T A (X^{(1)} \otimes \cdots \otimes X^{(d)})\) which are somewhat intermediate between (1) and (2), as they can be expanded as

$$\begin{aligned} \sum _{i_1, \ldots , i_{2 d}=1}^n A_{i_1, \dots , i_d, i_{d + 1}, \ldots , i_{2 d}} X^{(1)}_{i_1} \cdots X^{(d)}_{i_d} X^{(1)}_{i_{d + 1}} \cdots X^{(d)}_{i_{2 d}}. \end{aligned}$$
(3)

In the case that A is of the form \(B^T B\) for a matrix B (suitably reindexed), this expression can be rewritten as the square of

$$\begin{aligned} \Vert B(X^{(1)} \otimes \cdots \otimes X^{(d)})\Vert _2, \end{aligned}$$
(4)

a quantity of interest studied by Roman Vershynin in the context of random tensors [13]. Hence, understanding (3) can also help understanding (4), see Sect. 3.4 below.

Even though (3) can be cast as a specific case of (1) for which [6] provides optimal bounds, these bounds are not straightforward to use in this specific situation since they are given in terms of partial derivatives and not in terms of the coefficients \(A_{i_1, \ldots , i_{2 d}}\).

The main results of this paper provide moment estimates for the semi-decoupled chaos process (3) that are easier to use as they are explicitly given in terms of the coefficients \(A_{i_1, \ldots , i_{2 d}}\). Our bounds imply improved estimates for (4) and lay the foundations for an order-optimal analysis of fast Kronecker-structured Johnson–Lindenstrauss embeddings. We refer the reader to our companion paper [14] for a discussion of the implications in this regard. We nevertheless expect that our results should find broader use beyond these specific applications.

1.2 Previous work

For the case where \(X^{(1)}, \ldots , X^{(d)}\) are independent Gaussian vectors, the concentration of (2) has been studied in [7], which provides upper and lower moment bounds that match up to a constant factor depending only on the order d. We will obtain our main results for subgaussian vectors by careful reduction to the Gaussian bounds.

Higher order chaos expressions have also been studied for distributions beyond Gaussian. Specifically, [15, Section 9] considers (1) for the case of Rademacher vectors. However, the bounds are more intricate than in [7], and the coefficient array \({\varvec{A}} = (A_{i_1, \ldots , i_{d}})_{i_1, \ldots , i_{d}=1}^n\) must satisfy a symmetry condition and be diagonal-free, i.e., \(A_{i_1, \ldots , i_d} = 0\) if any two of the indices \(i_1, \ldots , i_d\) coincide.

Upper and lower bounds on the moments of (2) are shown in [8, 9] for the case of symmetric random variables with logarithmically concave and convex tails, meaning that for a random variable \(X \in {\mathbb {R}}\), the function \(t \mapsto - \log {\mathbb {P}}(|X |\ge t)\) is convex or concave, respectively. However, for general subgaussian random variables, neither of these has to be the case. In addition, these works only consider the decoupled chaos (2) and provide a decoupling inequality to control (1) for diagonal-free \({\varvec{A}}\).

Upper moment bounds for general polynomials of independent subgaussian random variables are provided in [6]. Similar to our work, the authors utilize the decoupling techniques of [16]. Since (3) is a polynomial in the entries of \(X^{(1)}, \ldots , X^{(d)}\), it can also be controlled using the results from [6]. Because the aforementioned work also shows that these moment bounds are tight for the case of Gaussian vectors, one of the main results (Theorem 3) of our work can also be shown using their results. However, their result bounds the corresponding \(L_p\) norms in terms of norms of the array of all \(d' \le 2 d\) expected partial derivatives, meaning that significant additional work would be required to relate these derivatives to the expressions in Theorem 3. We believe that our approach is not much longer but is more insightful. In addition, it provides the decoupling result Theorem 4, which is of independent interest.

Further work on related topics includes [17, 18], where upper and lower bounds are shown for order 2 chaoses in positive random variables satisfying the moment condition \(\Vert X\Vert _{2 p} \le \alpha \Vert X\Vert _p\). The recent work [19] provides bounds similar to those in [6] for distributions of bounded \(\psi _\alpha \) norm with \(\alpha \in (0, 1]\) (or \(\alpha \in (0, 2]\) for some of their results), such as subexponential distributions. As in [6], their bounds are given in terms of partial derivatives, not directly in terms of the coefficients.

The decoupling technique used in many proofs of the standard Hanson–Wright inequality relates \(X^T A X\) to \(X^T A {\bar{X}}\) where \({\bar{X}}\) is an independent copy of X. This approach was first introduced in [20], already in a general higher-dimensional form. The general idea is to upper bound convex functions (e.g. moments) of (1) by the corresponding expressions of (2), up to a constant. Besides independent, symmetrically distributed entries of the random vectors, the result also requires the coefficient array to be symmetric and diagonal-free.

The subsequent work [21] has also shown the reverse decoupling bound, up to constant factors, proving that through (2), one can also provide lower bounds on the moments of (1) with the same assumptions on the coefficient array. However, in some applications it can be interesting to consider non-diagonal-free coefficient arrays. For example, in the scenario of \(\Vert B (X^{(1)} \otimes \dots \otimes X^{(d)})\Vert _2^2\), the coefficient array \(B^T B\) cannot be expected to fulfill the diagonal-free condition in general. The work in [16] lifts the restriction of a diagonal-free coefficient array and bounds the tails of slight modifications of (2) and (1) by each other up to certain constants in the case of Gaussian random variables.

The concentration of the norm (4) has recently been studied for the subgaussian case in [13]. It is shown that

$$\begin{aligned} {\mathbb {P}} \left( \left|\Vert B (X^{(1)} \otimes \cdots \otimes X^{(d)})\Vert _2 - \Vert B\Vert _F \right|> t \right) \le 2 \exp \left( - \frac{c t^2}{d n^{d - 1} \Vert B\Vert _{2 \rightarrow 2}^2 } \right) \end{aligned}$$
(5)

for an absolute constant c and for \(0 \le t \le 2 n^\frac{d}{2} \Vert B\Vert _{2 \rightarrow 2}\). This bound suggests that techniques like the chaos moment bounds in [7] could be applied to this problem, which is what we do in this work and leads to Theorem 6 below.

1.3 Overview of our contribution

The goal of this work is to provide upper and lower bounds for the moments of the deviation of (3) from its expectation for vectors with independent subgaussian entries (Theorem 3 below). Key steps of the proof include a decoupling inequality for expressions of the form (3), Theorem 4, and a comparison to Gaussian random vectors. Finally, based on our results for (3), we provide a concentration inequality for (4) as stated in Theorem 6 which extends previous results of [13].

Possible applications of such results include recent developments in norm-preserving maps for vectors with tensor structure in the context of machine learning methods using the kernel trick [10,11,12].

1.4 Notation

Our results on \(X^T A X\) where X is a Kronecker product of d random vectors will depend crucially on the structure of the coefficient matrix A rearranged as a higher-order (specifically order 2d) array. As such, we must establish sophisticated notation for such arrays and their indices.

Consider a vector of dimensions \({\varvec{n}} = (n_1, n_2, \ldots , n_d)\) and a subset \(I \subset [d]\). We call a function \({\varvec{i}}: I \rightarrow {\mathbb {N}}\) a partial index of order d on I if for all \(l \in I\), \({\varvec{i}}_l := {\varvec{i}}(l) \in [n_l]\). By convention, there is exactly one such function (the empty function) if \(I = \emptyset \). If \(I = [d]\), then \({\varvec{i}}\) is called an index of order d. We denote the set of all partial indices of order d on I as \({\varvec{J}}^{\varvec{n}}(I)\); the set of all indices of order d is denoted by \({\varvec{J}}^{\varvec{n}}:= {\varvec{J}}^{\varvec{n}}([d])\). \({\varvec{J}}^{\varvec{n}}\) can be identified with \([n_1] \times \cdots \times [n_d]\).

A function \({\varvec{B}}: {\varvec{J}}^{\varvec{n}}\rightarrow {\mathbb {R}}\) is called an array of order d. Because of the aforementioned identification, we also write \({\varvec{B}} \in {\mathbb {R}}^{n_1 \times \cdots \times n_d} =: {\mathbb {R}}^{{\varvec{n}}}\). For \(I \subset [d]\), we define \({\mathbb {R}}^{{\varvec{n}}}(I)\) to be the set of partial arrays \({\varvec{B}}: {\varvec{J}}^{\varvec{n}}(I) \rightarrow {\mathbb {R}}\). For \(I = [d]\), this is just the aforementioned array definition.

We denote

$$\begin{aligned} \Vert {\varvec{B}}\Vert _2 := \left[ \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I)} B_{{\varvec{i}}}^2 \right] ^\frac{1}{2} \end{aligned}$$

for the Frobenius norm of the (partial) array where \(B_{{\varvec{i}}} := {\varvec{B}}({\varvec{i}})\) are its entries.

For disjoint sets \(I,\,J \subset [d]\) and corresponding partial indices \({\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I)\), \({\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(J)\), define the partial index \({\varvec{i}} {\dot{\times }} {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I \cup J)\) by

$$\begin{aligned} ({\varvec{i}} {\dot{\times }} {\varvec{j}})_l = {\left\{ \begin{array}{ll} {\varvec{i}}_l &{} \text {if } l \in I \\ {\varvec{j}}_l &{} \text {if } l \in J. \end{array}\right. } \end{aligned}$$
(6)

We will often work with arrays of order 2d whose dimensions along the first d axes are the same as the dimensions along the remaining d ones. We use the notation \({\varvec{n}}^{\times 2} := (n_1, \ldots , n_d, n_1, \ldots , n_d)\) for the corresponding vector of dimensions, so that such arrays lie in \({\mathbb {R}}^{{\varvec{n}}^{\times 2}}\).

For sets \(I \subset [2 d]\), \(J \subset [d]\) such that \(I \cap (J + d) = \emptyset \) and for corresponding partial indices \({\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I)\), \({\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(J)\), define the partial index \({\varvec{i}} \dot{+} {\varvec{j}} \in {\varvec{J}}^{{\varvec{n}}^{\times 2}}(I \cup (J + d))\) by

$$\begin{aligned} ({\varvec{i}} \dot{+} {\varvec{j}})_l = {\left\{ \begin{array}{ll} {\varvec{i}}_l &{} \text {if } l \in I \\ {\varvec{j}}_{l - d} &{} \text {if } l \in J + d. \end{array}\right. } \end{aligned}$$
(7)

For \({\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I)\) and \(J \subset I\), define \({\varvec{i}}_J \in {\varvec{J}}^{\varvec{n}}(J)\) to be the restriction of \({\varvec{i}}\) to J, i.e., \(({\varvec{i}}_J)_l = {\varvec{i}}_l\) for all \(l \in J\).
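
As a concrete illustration of this index calculus, the following small sketch (Python; the helper names are ours and purely illustrative) represents a partial index on I as a dictionary with key set I and implements \({\dot{\times }}\), \(\dot{+}\) and the restriction \({\varvec{i}}_J\).

# A partial index on I is represented as a dict {l: i_l} with key set I (axes numbered from 1).

def times_dot(i, j):
    """i x-dot j: merge partial indices on disjoint sets I and J, cf. (6)."""
    assert not set(i) & set(j)
    return {**i, **j}

def plus_dot(i, j, d):
    """i +-dot j: keep i on its axes, shift the axes of j by d, cf. (7)."""
    return {**i, **{l + d: v for l, v in j.items()}}

def restrict(i, J):
    """The restriction i_J of a partial index to the axis set J."""
    return {l: i[l] for l in J}

# Example with d = 3: i lives on {1, 3}, j on {2}.
i, j = {1: 4, 3: 2}, {2: 7}
print(times_dot(i, j))    # {1: 4, 3: 2, 2: 7}, an index of order 3
print(plus_dot(i, j, 3))  # {1: 4, 3: 2, 5: 7}, a partial index on {1, 3, 5} of order 6
print(restrict(i, {3}))   # {3: 2}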

As suggested by the explanations above, our convention is to use bold letters for higher order arrays (e.g., \({\varvec{A}}\)) while their entries are denoted in non-bold letters (e.g., \(A_{{\varvec{i}}}\)). For some of our results, we will convert matrices into higher-order arrays by rearranging their entries. In these cases, we will denote the matrices in non-bold letters and use the same letter in bold for the array, e.g., A and \({\varvec{A}}\). For the entries, it will be clear from the indices which object is being referred to. Besides that, we will also always use bold letters for array indices (e.g., \({\varvec{i}}\)), for vectors of array dimensions (e.g. \({\varvec{n}}\)), and for the set \({\varvec{J}}^{\varvec{n}}\).

We write \(Id_n \in {\mathbb {R}}^{n \times n}\) for the identity matrix, \(\Vert A\Vert _F\) for the Frobenius norm of a matrix, and \(\Vert A\Vert _{2 \rightarrow 2}\) for the spectral norm of a matrix.

For a random variable \(Y \in {\mathbb {R}}\), we define \(\Vert Y\Vert _{L_p} := ({\mathbb {E}}|Y |^p)^{1 / p}\) and we define the subgaussian norm \(\Vert Y\Vert _{\psi _2} := \sup _{p \ge 1} \Vert Y\Vert _{L_p} / \sqrt{p}\). For a random vector \(X \in {\mathbb {R}}^n\), we define the subgaussian norm \(\Vert X\Vert _{\psi _2} := \sup _{v \in {\mathbb {R}}^n, \Vert v\Vert _2 = 1} \Vert \langle X, v \rangle \Vert _{\psi _2}\), and we call X isotropic if \({\mathbb {E}} X X^T = Id_n\).

1.5 Previous relevant results

Since our result is based on the bounds given by Latala in [7], we consider the norms used in that result. In our notation, they are stated as follows.

Definition 1

For \({\varvec{n}} \in {\mathbb {N}}^d\) and an array \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}}\), we define the following norms for any partition \(I_1, \dots , I_\kappa \) of [d].

$$\begin{aligned} \Vert {\varvec{A}}\Vert _{I_1, \ldots , I_\kappa } := \sup _{\begin{array}{c} {\varvec{\alpha }}^{(1)} \in {\mathbb {R}}^{{\varvec{n}}}(I_1), \ldots , {\varvec{\alpha }}^{(\kappa )} \in {\mathbb {R}}^{{\varvec{n}}}(I_\kappa ), \\ \Vert {\varvec{\alpha }}^{(1)}\Vert _2 = \cdots = \Vert {\varvec{\alpha }}^{(\kappa )}\Vert _2 = 1 \end{array}} \quad \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa )}_{{\varvec{i}}_{I_\kappa }}. \end{aligned}$$
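
For partitions into one or two blocks, these norms can be evaluated by flattening the array, which the following sketch illustrates (Python/NumPy; the small dimensions are arbitrary). For partitions into three or more blocks the supremum is a genuine tensor norm and would require numerical optimization, which we do not attempt here.

import numpy as np

rng = np.random.default_rng(1)
n1, n2, n3 = 3, 4, 5
A = rng.standard_normal((n1, n2, n3))        # array of order d = 3

# ||A||_{{1,2,3}}: a single block, i.e. the Frobenius norm of the array.
norm_123 = np.linalg.norm(A)

# ||A||_{{1,2},{3}}: two blocks, the spectral norm of the matricization in R^{(n1 n2) x n3}.
norm_12_3 = np.linalg.norm(A.reshape(n1 * n2, n3), 2)

# ||A||_{{1},{2,3}}: the spectral norm of the matricization in R^{n1 x (n2 n3)}.
norm_1_23 = np.linalg.norm(A.reshape(n1, n2 * n3), 2)

print(norm_123, norm_12_3, norm_1_23)
# Refining a partition restricts the test arrays to rank-one products and can
# only decrease the norm, e.g. norm_12_3 <= norm_123.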

For example, when \(d=2\), the array \({\varvec{A}}\) is a matrix and \(\Vert \cdot \Vert _{\{1,2\}}\) coincides with the Frobenius and \(\Vert \cdot \Vert _{\{1\}, \{2\}}\) with the spectral norm. Latala [7] proved the following upper and lower moment bounds for a decoupled Gaussian chaos of arbitrary order. Even though it is only shown for \(p \ge 2\) in [7], it holds for all \(p \ge 1\) as explained in Remark 1 below.

Theorem 2

(Theorem 1 in [7]) Let \({\varvec{n}} \in {\mathbb {N}}^d\), \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}}\), \(p \ge 1\).

Let \(S(d, \kappa )\) denote the set of partitions of [d] into \(\kappa \) nonempty disjoint subsets. Define

$$\begin{aligned} m_p({\varvec{A}}) := \sum _{\kappa = 1}^d p^{\kappa / 2} \sum _{(I_1, \ldots , I_\kappa ) \in S(d, \kappa )} \Vert {\varvec{A}}\Vert _{I_1, \ldots , I_\kappa }. \end{aligned}$$
(8)

Consider independent Gaussian random vectors \(g^{(1)} \sim N(0, Id_{{\varvec{n}}_1}), \ldots , g^{(d)} \sim N(0, Id_{{\varvec{n}}_d})\). Then

$$\begin{aligned} \frac{1}{C(d)} m_p({\varvec{A}}) \le \left\| \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}}} \prod _{l \in [d]} g^{(l)}_{{\varvec{i}}_l} \right\| _{L_p} \le C(d) m_p({\varvec{A}}), \end{aligned}$$

where \(C(d) > 0\) is a constant that only depends on d.
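
For orientation, consider the case \(d = 2\), in which \({\varvec{A}}\) is a matrix. There is exactly one partition for each \(\kappa \in \{1, 2\}\), so (8) reduces to

$$\begin{aligned} m_p({\varvec{A}}) = p^{1/2} \Vert {\varvec{A}}\Vert _{\{1, 2\}} + p \Vert {\varvec{A}}\Vert _{\{1\}, \{2\}} = p^{1/2} \Vert {\varvec{A}}\Vert _F + p \Vert {\varvec{A}}\Vert _{2 \rightarrow 2}, \end{aligned}$$

and Theorem 2 recovers the moment version of the Hanson–Wright inequality for the decoupled Gaussian chaos \(\sum _{j, k} A_{j k} g^{(1)}_j g^{(2)}_k\).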

Remark 1

Theorem 1 in [7] only shows this statement for \(p \ge 2\). However, by a small adjustment, we can see that it also holds for \(1 \le p \le 2\) with a possibly different C(d). Let \(X := \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}}} \prod _{l \in [d]} g^{(l)}_{{\varvec{i}}_l}\). For the upper bound we have for \(1 \le p \le 2\),

$$\begin{aligned} \Vert X\Vert _{L_p} \le \Vert X\Vert _{L_2} \le C(d) m_2({\varvec{A}}) \le 2^\frac{d}{2} C(d) m_p({\varvec{A}}). \end{aligned}$$

For the lower bound, we consider the recent work [22] about a generalized Gaussian chaos with values in an arbitrary Banach space. Theorem 2.1 in their work states the lower bound

$$\begin{aligned} \frac{1}{C(d)} \sum _{J \subset [d]} \sum _{{\mathcal {P}} \in {\mathcal {P}}(J)} p^{|{\mathcal {P}} |/ 2} {\left| \left| \left| {\varvec{A}} \right| \right| \right| }_{{\mathcal {P}}} \le \Vert X\Vert _{L_p}, \end{aligned}$$
(9)

for all \(p \ge 1\), where \({\mathcal {P}}(J)\) is defined as the set of all partitions of J (into non-empty, pairwise disjoint sets) and \({\left| \left| \left| {\varvec{A}} \right| \right| \right| }_{{\mathcal {P}}}\), defined in (2.2) of [22], is a non-negative expression that coincides with our definition of \(\Vert {\varvec{A}}\Vert _{I_1, \ldots , I_\kappa }\) if \({\mathcal {P}} = (I_1, \ldots , I_\kappa )\) is a partition of the entire set [d]. Therefore we can restrict the sum over J in (9) to the term \(J = [d]\) and obtain

$$\begin{aligned} \frac{1}{C(d)} m_p({\varvec{A}}) = \frac{1}{C(d)} \sum _{{\mathcal {P}} \in {\mathcal {P}}([d])} p^{|{\mathcal {P}} |/ 2} {\left| \left| \left| {\varvec{A}} \right| \right| \right| }_{{\mathcal {P}}} \le \Vert X\Vert _{L_p}. \end{aligned}$$

2 Main results

The main contribution of our work is the following result which gives a generalization of the Hanson–Wright inequality (Theorem 1) in terms of upper and lower moment bounds. Note that the operators \({\dot{\times }}\) and \(\dot{+}\) are defined in (6) and (7).

Theorem 3

For \(d \ge 1\), let \({\varvec{n}} = (n_1, \ldots , n_d)\) be a vector of dimensions, and let \(N = n_1 \ldots n_d\).

Let \(A \in {\mathbb {R}}^{N \times N}\) be a matrix and let \(X^{(1)} \in {\mathbb {R}}^{n_1}, \ldots , X^{(d)} \in {\mathbb {R}}^{n_d}\) be random vectors with independent, mean 0, variance 1 entries whose subgaussian norms are bounded by \(L \ge 1\). Define \(X := X^{(1)} \otimes \cdots \otimes X^{(d)}\). There exists a constant C(d), depending only on d, such that for all \(p \ge 1\),

$$\begin{aligned} \left\| X^T A X - {\mathbb {E}} X^T A X \right\| _{L_p} \le C(d) m_p. \end{aligned}$$

The numbers \(m_p\) are defined as follows. By rearranging its entries, regard A as an array \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\) of order 2d such that

$$\begin{aligned} X^T A X = \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l}. \end{aligned}$$

For any \(I \subset [d]\) and for \(I^c = [d] \backslash I\), define \({\varvec{A}}^{(I)} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}(I^c \cup (I^c + d))\) by

$$\begin{aligned} A^{(I)}_{{\varvec{i}} \dot{+} {\varvec{i}}'} = \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I)} A_{({\varvec{i}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{k}})} \end{aligned}$$
(10)

for all \({\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(I^c)\).

For \(T \subset [2 d]\) and \(1 \le \kappa \le 2d\), denote by \(S(T, \kappa )\) the set of partitions of T into \(\kappa \) sets. Then for any \(p \ge 1\), define

$$\begin{aligned} m_p := L^{2 d} \sum _{\kappa = 1}^{2 d} p^\frac{\kappa }{2} \sum _{\begin{array}{c} I \subset [d] \\ I \ne [d] \end{array}} \quad \sum _{(I_1, \ldots , I_\kappa ) \in S((I^c) \cup (I^c + d), \kappa )} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa }. \end{aligned}$$

If in addition \(X^{(1)} \sim N(0, Id_{n_1}), \dots , X^{(d)} \sim N(0, Id_{n_d})\) are normally distributed (i.e., L can be taken to be an absolute constant), and \({\varvec{A}}\) satisfies the symmetry condition that for all \(l \in [d]\) and any \({\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}([d] \backslash \{l\})\), \({\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(\{l\})\),

$$\begin{aligned} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}}')} = A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}') \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})}, \end{aligned}$$
(11)

then also the lower bound

$$\begin{aligned} {\tilde{C}}(d) m_p \le \left\| X^T A X - {\mathbb {E}} X^T A X \right\| _{L_p} \end{aligned}$$

holds for all \(p \ge 1\). Here, \({\tilde{C}}(d) > 0\) only depends on d.

Note that these upper bounds can directly be converted to tail bounds in the style of Theorems 1 or 6 using Lemma 13. After introducing the required tools, the proof of Theorem 3 will be split up into two parts. We will prove the upper bound in Sect. 3.2.2 and then the lower bound in Sect. 3.3.2.
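
To relate this back to Theorem 1, consider the case \(d = 1\). The only admissible set in the definition of \(m_p\) is \(I = \emptyset \), for which \({\varvec{A}}^{(\emptyset )}\) is the matrix A itself, and the inner sum runs over the partitions of \(\{1, 2\}\), so that

$$\begin{aligned} m_p = L^2 \left( p^{1/2} \Vert A\Vert _F + p \Vert A\Vert _{2 \rightarrow 2} \right) . \end{aligned}$$

Hence the upper bound of Theorem 3, converted into a tail bound via Lemma 13, recovers Theorem 1 up to constants.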

Remark 2

The symmetry condition required for the lower bound is not satisfied for all matrices. However, for any matrix A, we can find a matrix \({\tilde{A}}\) satisfying the symmetry condition and such that \(X^T A X = X^T {\tilde{A}} X\) always holds. To do this, in the array notation we can define \(\tilde{{\varvec{A}}}\) by transposing \({\varvec{A}}\) along all possible sets of axes and then taking the mean \({\tilde{A}}_{{\varvec{i}} \dot{+} {\varvec{i}}'} = \frac{1}{2^d} \sum _{I \subset [d]} A_{({\varvec{i}}_{I^c} {\dot{\times }} {\varvec{i}}'_{I}) \dot{+} ({\varvec{i}}_{I} {\dot{\times }} {\varvec{i}}'_{I^c})}\) for any \({\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}\). This is a generalization of taking \({\tilde{A}} = \frac{1}{2} (A + A^T)\) for \(d = 1\). Note, however, that \({\tilde{A}}\) might have significantly smaller norms than A, which is why the lower moment bounds in Theorem 3 might not hold for A directly.
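
The following minimal sketch (Python/NumPy; the function name symmetrize and the small dimensions are ours) carries out this symmetrization for \(d = 2\) by averaging over all swaps of the axis pairs \((l, l + d)\), and checks that the quadratic form is unchanged on a Kronecker-structured vector.

import numpy as np
from itertools import combinations

def symmetrize(A_arr, d):
    """Average of A over the 2^d ways of swapping a subset of the axis pairs (l, l + d), cf. Remark 2."""
    out = np.zeros_like(A_arr)
    axes = list(range(2 * d))
    for r in range(d + 1):
        for I in combinations(range(d), r):
            perm = axes.copy()
            for l in I:                          # swap axis l with axis l + d
                perm[l], perm[l + d] = perm[l + d], perm[l]
            out += np.transpose(A_arr, perm)
    return out / 2 ** d

n = (2, 3)
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))                  # N x N with N = 2 * 3
A_arr = A.reshape(*n, *n)                        # reindexed as an order-4 array
A_sym = symmetrize(A_arr, d=2)

x1, x2 = rng.standard_normal(2), rng.standard_normal(3)
X = np.kron(x1, x2)
print(np.allclose(X @ A @ X, X @ A_sym.reshape(6, 6) @ X))   # True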

A central part of our argument is the following specialized decoupling result for expressions as in (3) which might be of independent interest.

Theorem 4

Let \({\varvec{n}} = (n_1, \dots , n_d) \in {\mathbb {N}}^d\), \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\), \(X^{(1)} \in {\mathbb {R}}^{n_1}, \dots , X^{(d)} \in {\mathbb {R}}^{n_d}\) random vectors with independent mean 0, variance 1 entries and \({\bar{X}}^{(1)}, \dots , {\bar{X}}^{(d)}\) corresponding independent copies. Then for all \(p \ge 1\),

$$\begin{aligned}&\left\| \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l} - {\mathbb {E}} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p} \\&\quad \le \sum _{\begin{array}{c} I, J \subset [d]: \\ J \subset I,\, I \backslash J \ne [d] \end{array}} 4^{d - |I |} \left\| \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \\ {\varvec{k}}, {\varvec{k}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} A_{\begin{array}{c} ({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \\ \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}') \end{array}} \prod _{l \in J} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \prod _{l \in I^c} X^{(l)}_{{\varvec{k}}_l} {\bar{X}}^{(l)}_{{\varvec{k}}'_l} \right\| _{L_p} \end{aligned}$$
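
For orientation, we spell out the case \(d = 1\): the admissible pairs are \((I, J) = (\emptyset , \emptyset )\) and \((I, J) = (\{1\}, \{1\})\), and the inequality reads

$$\begin{aligned} \left\| \sum _{i, i' = 1}^{n_1} A_{i i'} X^{(1)}_i X^{(1)}_{i'} - {\mathbb {E}} \sum _{i, i' = 1}^{n_1} A_{i i'} X^{(1)}_i X^{(1)}_{i'} \right\| _{L_p} \le 4 \left\| \sum _{i, i' = 1}^{n_1} A_{i i'} X^{(1)}_i {\bar{X}}^{(1)}_{i'} \right\| _{L_p} + \left\| \sum _{i = 1}^{n_1} A_{i i} \left[ (X^{(1)}_i)^2 - 1 \right] \right\| _{L_p}, \end{aligned}$$

which is the decoupling step familiar from proofs of the classical Hanson–Wright inequality.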

Remark 3

Consider the special case in Theorem 4 of \(X^{(1)}, \dots , X^{(d)}\) being Rademacher vectors, i.e., having independent entries that are \(\pm 1\) with a probability of \(\frac{1}{2}\) each. Then any squared entry is 1 almost surely. This implies that the factor \(\prod _{l \in J} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \) is 0 unless \(J = \emptyset \). So on the right hand side of the inequality in Theorem 4, only the terms with \(J = \emptyset \) need to be considered.

Theorem 4 combines two aspects. On the one hand, there is the probabilistic aspect that the factors \(X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l}\) are replaced by \(X^{(l)}_{{\varvec{k}}_l} {\bar{X}}^{(l)}_{{\varvec{k}}'_l}\), using the independent copy \({\bar{X}}^{(l)}_{{\varvec{k}}'_l}\). On the other hand, there is the arithmetic one that the quadratic factors \((X^{(l)}_{{\varvec{i}}_l})^2\) arising on the left hand side for \({\varvec{i}}_l = {\varvec{i}}'_l\) are expressed by factors \(\left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \), which have mean 0. A crucial ingredient for the proof of Theorem 4 is the following theorem, which summarizes the aforementioned arithmetic aspect. Note that it does not take any randomness in the vectors into account.

Theorem 5

Let \({\varvec{n}} \in {\mathbb {N}}^d\), \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}}\), \(X^{(1)} \in {\mathbb {R}}^{n_1}, \ldots , X^{(d)} \in {\mathbb {R}}^{n_d}\). Then

$$\begin{aligned} \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}}} \prod _{l \in [d]} (X^{(l)}_{{\varvec{i}}_l})^2 = \sum _{I \subset [d]} \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}([d] \backslash I)} A^{\langle I \rangle }_{{\varvec{i}}} \prod _{l \in [d] \backslash I} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] , \end{aligned}$$

where for any \(I \subset [d]\) and \({\varvec{i}} \in {\varvec{J}}^{\varvec{n}}([d] \backslash I)\),

$$\begin{aligned} A_{{\varvec{i}}}^{\langle I \rangle } = \sum _{{\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I)} A_{{\varvec{i}} {\dot{\times }} {\varvec{j}}}. \end{aligned}$$
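
For instance, for \(d = 2\) and vectors \(X \in {\mathbb {R}}^{n_1}\), \(Y \in {\mathbb {R}}^{n_2}\), the identity reads

$$\begin{aligned} \sum _{i, j} A_{i j} X_i^2 Y_j^2&= \sum _{i, j} A_{i j} (X_i^2 - 1) (Y_j^2 - 1) + \sum _{i} \Big ( \sum _{j} A_{i j} \Big ) (X_i^2 - 1) \\&\quad + \sum _{j} \Big ( \sum _{i} A_{i j} \Big ) (Y_j^2 - 1) + \sum _{i, j} A_{i j}, \end{aligned}$$

as can be checked by expanding the products on the right-hand side.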

Theorem 3 also leads to the following new tail bound for \(\Vert B (X^{(1)} \otimes \dots \otimes X^{(d)})\Vert _2\). Note that it concerns the deviation of the non-squared norm. This improves upon the previous result by Vershynin [13] as described in (5), up to the constant C(d). In comparison, our result holds for all \(t \ge 0\) and provides a strictly stronger bound for matrices of stable rank smaller than \(n^{\frac{d - 1}{2}}\) (see Remark 4).

Theorem 6

Let \(B \in {\mathbb {R}}^{n_0 \times n^d}\) be a matrix, \(X^{(1)}, \dots , X^{(d)} \in {\mathbb {R}}^{n}\) independent random vectors with independent, mean 0, variance 1 entries with subgaussian norm bounded by \(L \ge 1\), and let \(X := X^{(1)} \otimes \dots \otimes X^{(d)} \in {\mathbb {R}}^{n^d}\). Then for a constant C(d) depending only on d and for any \(t > 0\),

$$\begin{aligned}&{\mathbb {P}}\left( \left|\Vert B X\Vert _2 - \Vert B\Vert _F \right|> t \right) \\&\quad \le {\left\{ \begin{array}{ll} e^2 \exp \left( - C(d) \frac{t^2}{n^{d - 1} \Vert B\Vert _{2 \rightarrow 2}^2} \right) &{} \text {if } t \le n^{\frac{d}{2}} \Vert B\Vert _{2 \rightarrow 2} \\ e^2 \exp \left( - C(d) \left( \frac{t}{\Vert B\Vert _{2 \rightarrow 2}} \right) ^\frac{2}{d} \right) &{} \text {if } t \ge n^\frac{d}{2} \Vert B\Vert _{2 \rightarrow 2} \\ e^2 \exp \left( - C(d) \frac{t^2}{ n^{\frac{d - 1}{2}} \Vert B\Vert _{F}^2 } \right) &{} \text {if } n^{\frac{d - 1}{4}} \Vert B\Vert _{2 \rightarrow 2} \le t \le n^{\frac{d - 1}{4}} \Vert B\Vert _{F}. \end{array}\right. } \end{aligned}$$

Note that the third interval intersects the first two intervals. In any interval of intersection, both bounds hold. For slightly more complicated but provably optimal moment bounds, we refer the reader to Corollary 22.
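
Purely for illustration, the following sketch (Python/NumPy; the function name and the choice \(C(d) = 1\) are ours, not values provided by the theorem) evaluates all applicable branches of the bound for a given t and returns the smallest one, as justified by the preceding remark.

import numpy as np

def theorem6_shape(t, B, n, d, C=1.0):
    """Smallest applicable branch of the tail bound in Theorem 6, with illustrative C(d) = C."""
    fro = np.linalg.norm(B, 'fro')
    spec = np.linalg.norm(B, 2)
    branches = []
    if t <= n ** (d / 2) * spec:
        branches.append(np.exp(2 - C * t**2 / (n ** (d - 1) * spec**2)))
    if t >= n ** (d / 2) * spec:
        branches.append(np.exp(2 - C * (t / spec) ** (2 / d)))
    if n ** ((d - 1) / 4) * spec <= t <= n ** ((d - 1) / 4) * fro:
        branches.append(np.exp(2 - C * t**2 / (n ** ((d - 1) / 2) * fro**2)))
    return min(branches)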

Remark 4

In addition to extending the previous result in (5) from [13] to all \(t \ge 0\), our result provides a strict improvement of that result for matrices with stable rank \((\Vert B\Vert _F / \Vert B\Vert _{2 \rightarrow 2})^2 \in (1, n^{\frac{d - 1}{2}} )\).

As an example, consider a square matrix \(B \in {\mathbb {R}}^{n^d \times n^d}\) with mildly exponentially decreasing singular values \(\sigma _j = e^{-\frac{1}{2} n^{-\frac{d}{4}} (j - 1) }\) for \(1 \le j \le r\). Then by direct calculations, one can check that \(\Vert B\Vert _{2 \rightarrow 2} = \sigma _1 = 1\) and

$$\begin{aligned} \Vert B\Vert _F^2 = \frac{1 - e^{-n^{-\frac{d}{4}} r}}{1 - e^{- n^{-\frac{d}{4}}}} \in \left[ \frac{1}{2} n^{\frac{d}{4}}, 2 n^{\frac{d}{4}} \right] \end{aligned}$$

So the stable rank lies in \([\frac{1}{2} n^\frac{d}{4}, 2 n^\frac{d}{4}]\). Indeed, for the interval \(n^{\frac{1}{4}d - \frac{1}{4}} \le t \le \frac{1}{2} n^{\frac{3}{8}d - \frac{1}{4}}\) (which is non-empty for large enough n), the third line in Theorem 6 provides a probability bound \(\le e^2 \exp \left( - C(d) \frac{t^2}{2 n^{\frac{3}{4}d - \frac{1}{2}}} \right) \) while the first line only provides a bound of \(e^2 \exp \left( - C(d) \frac{t^2}{n^{d - 1}} \right) \), i.e., there is an improvement for \(d \ge 3\).

3 Main proofs

3.1 Preliminaries

The classical symmetrization theorem for normed spaces, such as Lemma 6.4.2 in [23], can be extended to increasing convex functions of norms as the following result from [24] shows.

Lemma 7

(Special case of Lemma A1 in [24]) Let \(X_1, \ldots , X_n\) be independent, mean 0 real-valued random variables and \(p \ge 1\). Let \(\xi _1, \ldots , \xi _n\) be independent Rademacher variables that are independent of \(X_1, \ldots , X_n\). Then

$$\begin{aligned} \frac{1}{2^p} {\mathbb {E}}\left|\sum _{k = 1}^n \xi _k X_k \right|^p \le {\mathbb {E}}\left|\sum _{k = 1}^n X_k \right|^p \le 2^p {\mathbb {E}}\left|\sum _{k = 1}^n \xi _k X_k \right|^p \end{aligned}$$

Decoupling theorems for quadratic forms relate double sums \(\sum _{j, k = 1}^n A_{j, k} X_j X_k\) over random variables \((X_j)_{j \in [n]}\) to a “decoupled” expression \(\sum _{j, k = 1}^n A_{j, k} X_j {\bar{X}}_k\) where the \({\bar{X}}_k\) are independent copies of the \(X_k\). Different versions have been used in probability theory for a long time and we refer to Section 3.6 in [25] for an overview of their history. The following version for convex functions, Theorem 8.11 in the textbook [3], is an adaptation of Proposition 1.9 in [26].

Theorem 8

Let \(A \in {\mathbb {R}}^{n \times n}\) be a matrix, \(X \in {\mathbb {R}}^n\) a vector with independent mean 0 entries, and \({\bar{X}}\) an independent copy of X. Let \(F: {\mathbb {R}} \rightarrow {\mathbb {R}}\) be a convex function. Then

$$\begin{aligned} {\mathbb {E}} F\left( \sum _{\begin{array}{c} j, k = 1 \\ j \ne k \end{array}}^n A_{j k} X_j X_k \right) \le {\mathbb {E}} F\left( 4 \sum _{j, k = 1}^n A_{j k} X_j {\bar{X}}_k \right) \end{aligned}$$

We will also use the following elementary result.

Lemma 9

Let T be a finite set. Then

$$\begin{aligned} \sum _{S \subset T} (-1)^{|S |} = {\left\{ \begin{array}{ll} 1 &{} \text {if } T = \emptyset \\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Proof

By grouping all \(S \subset T\) of the same size and applying the binomial theorem,

$$\begin{aligned} \sum _{S \subset T} (-1)^{|S |}&= \sum _{k = 0}^{|T |} \sum _{\begin{array}{c} S \subset T \\ |S |= k \end{array}} (-1)^{|S |} = \sum _{k = 0}^{|T |} \left( {\begin{array}{c}|T |\\ k\end{array}}\right) (-1)^k \cdot 1^{|T |- k} = (- 1 + 1)^{|T |} \\&= {\left\{ \begin{array}{ll} 1 &{} \text {if } T = \emptyset \\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

\(\square \)

Although this is a very elementary statement and a direct consequence of the binomial theorem, we are not aware of any previous uses of precisely this identity. One somewhat similar tool is the Mazur–Orlicz formula ((11) in [27]), which has also been used in a problem related to decoupling inequalities in [28]. It is stated as

$$\begin{aligned} (-1)^k \sum _{\epsilon _1, \ldots , \epsilon _k = 0}^1 (-1)^{-(\epsilon _1 + \cdots + \epsilon _k)} \epsilon _1^{v_1} \cdots \epsilon _{k}^{v_k} = (1 - 0^{v_1}) \cdots (1 - 0^{v_k}). \end{aligned}$$

With \(v_1 = \cdots = v_k = 0\), this becomes

$$\begin{aligned} (-1)^k \sum _{\epsilon _1, \ldots , \epsilon _k = 0}^1 (-1)^{-(\epsilon _1 + \dots + \epsilon _k)} = 0^k. \end{aligned}$$

For \(k = |T |> 0\), the \(\{0, 1\}\)-tuples \((\epsilon _1, \ldots , \epsilon _k)\) can be identified with the subsets \(S \subset T\) such that \(|S |= \epsilon _1 + \cdots + \epsilon _k\) and then this identity implies Lemma 9 for \(T \ne \emptyset \).

For the norms in Definition 1, we need the following property about restricting arrays to some diagonal entries. This can be obtained directly from a repeated application of Lemma 5.2 in [6] (where \(K = \{l, l + d\}\) for each \(l \in I\)). Here again, we use the notation of \({\dot{\times }}\) and \(\dot{+}\) from (6) and (7).

Lemma 10

Let \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\), \(I \subset [d]\) and define \({\varvec{A}}^{[I]} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\) by

$$\begin{aligned} A^{[I]}_{{\varvec{i}} \dot{+} {\varvec{i}}'} := {\left\{ \begin{array}{ll} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} &{} \text {if } \forall l \in I: {\varvec{i}}_l = {\varvec{i}}'_l \\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

for all \({\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}\). Then for any partition \(I_1, \dots , I_\kappa \) of [2d], we have

$$\begin{aligned} \Vert {\varvec{A}}^{[I]}\Vert _{I_1, \ldots , I_\kappa } \le \Vert {\varvec{A}}\Vert _{I_1, \ldots , I_\kappa }. \end{aligned}$$

For comparisons between functions of subgaussian and of Gaussian variables, we will use the concept of strong domination of random variables. See, e.g., [29] for the following definition and further explanations.

Definition 2

(Definition 3.2.1 in [29]) Let \(X, Y \in {\mathbb {R}}\) be random variables and \(\kappa , \lambda > 0\). We say that X is \((\kappa , \lambda )\)-strongly dominated by Y (\(X \prec _{(\kappa , \lambda )} Y\)) if for every \(t > 0\),

$$\begin{aligned} {\mathbb {P}}(|X |> t) \le \kappa {\mathbb {P}}(\lambda |Y |> t). \end{aligned}$$

It can be shown that linear combinations of independent, strongly dominated random variables are again strongly dominated which in turn implies the following statement about expectations of convex functions of these linear combinations.

Theorem 11

(Corollary 3.2.1 in [29]) Let \(X_1, \ldots , X_n, Y_1, \ldots , Y_n \in {\mathbb {R}}\) be independent symmetric random variables and \(a_1, \ldots , a_n \in {\mathbb {R}}\) fixed coefficients such that \(X_i \prec _{(\kappa , \lambda )} Y_i\). Then for any nondecreasing \(\varphi : {\mathbb {R}}^{+} \rightarrow {\mathbb {R}}^{+}\),

$$\begin{aligned} {\mathbb {E}} \varphi \left( \left|\sum _{i = 1}^n a_i X_i \right|\right) \le 2 \lceil \kappa \rceil {\mathbb {E}} \varphi \left( \lceil \kappa \rceil \lambda \left|\sum _{i = 1}^n a_i Y_i \right|\right) . \end{aligned}$$

Statements similar to the following lemma have been used in multiple works to establish a relation between \(\left|\Vert A x\Vert _2 - a \right|\) and \(\left|\Vert A x\Vert _2^2 - a^2 \right|\), for example in the proof of Lemma 5.36 in [30]. For completeness, we state it as a separate result with its proof here.

Lemma 12

For real numbers \(a, b \ge 0\), \(b \ne 0\), it holds that

$$\begin{aligned} \frac{1}{3} \min \left\{ \frac{|a^2 - b^2 |}{b}, \sqrt{|a^2 - b^2 |} \right\} \le |a - b |\le \min \left\{ \frac{|a^2 - b^2 |}{b}, \sqrt{|a^2 - b^2 |} \right\} . \end{aligned}$$

Proof

We obtain

$$\begin{aligned} |a - b |= \frac{|a^2 - b^2 |}{|a + b |} \le \frac{|a^2 - b^2 |}{b}, \end{aligned}$$

and since \(a, b \ge 0\), we have \(|a - b |\le |a |+ |b |= |a + b |\), so that \(|a - b |^2 \le |a - b ||a + b |= |a^2 - b^2 |\), proving the second inequality.

For the first inequality, first consider the case \(a \le 2 b\). Then \(a + b \le 3 b\), such that

$$\begin{aligned} \frac{1}{3} \frac{|a^2 - b^2 |}{b} \le \frac{|a^2 - b^2 |}{a + b} = |a - b |. \end{aligned}$$

In the case that \(a \ge 2 b\), i.e., \(a - b \ge b \ge 0\), we obtain

$$\begin{aligned} \frac{1}{3} \sqrt{|a^2 - b^2 |}&\le \frac{1}{3} \sqrt{|a + b ||a - b |} \le \frac{1}{3} \sqrt{(|a - b |+ 2 b) |a - b |} \\&\le \frac{1}{3} \sqrt{(|a - b |+ 2 |a - b |) |a - b |} = \frac{1}{\sqrt{3}} |a - b |\le |a - b |. \end{aligned}$$

\(\square \)

Relations between moments and tail bounds have also been well-known in the field. For an overview see, e.g., Chapter 7.3 in [3]. In this spirit, we state and prove the following small tool for the case of mixed tails which we encounter in this work.

Lemma 13

(Moments and tail bounds) Let \(d \in {\mathbb {N}}\), let T be a finite set, and let X be an \({\mathbb {R}}\)-valued random variable such that for all \(p \ge p_0 \ge 0\),

$$\begin{aligned} \Vert X\Vert _{L_p} \le \sum _{k = 1}^{d} \min _{l \in T} p^{e_{k, l}} \gamma _{k, l} \end{aligned}$$

for values \(\gamma _{k, l} > 0\), \(e_{k, l} > 0\).

Then for all \(t > 0\),

$$\begin{aligned} {\mathbb {P}}(|X |> t) \le e^{p_0} \exp \left( - \min _{k \in [d]} \max _{l \in T} \left( \frac{t}{e d \gamma _{k, l}} \right) ^\frac{1}{e_{k, l}} \right) . \end{aligned}$$

Proof

Fix any \(u > 0\). For any \(k \in [d]\), define \(l'(k) := {{\,\mathrm{argmax}\,}}_{l \in T} \left( \frac{u}{\gamma _{k, l}} \right) ^{\frac{1}{e_{k, l}}}\), then choose \(k' := {{\,\mathrm{argmin}\,}}_{k \in [d]} \left( \frac{u}{\gamma _{k, l'(k)}} \right) ^{ \frac{1}{e_{k, l'(k)}} }\), and \(p := \left( \frac{u}{\gamma _{k', l'(k')}} \right) ^{\frac{1}{e_{k', l'(k')}}}\), such that \(p = \min _{k \in [d]} \max _{l \in T} \left( \frac{u}{\gamma _{k, l}} \right) ^{\frac{1}{e_{k, l}}}\).

If \(p < p_0\), then \({\mathbb {P}}(|X |> e d u) \le 1 = e^{p_0} \exp (-p_0) \le e^{p_0} \exp (-p)\).

If \(p \ge p_0\), then by the choice of p,

$$\begin{aligned} \Vert X\Vert _{L_p}&\le \sum _{k = 1}^{d} \min _{l \in T} p^{e_{k, l}} \gamma _{k, l} \le \sum _{k = 1}^{d} \min _{l \in T} \left[ \left( \frac{u}{\gamma _{k', l'(k')}}\right) ^ {\frac{1}{e_{k', l'(k')}}} \right] ^{e_{k, l}} \gamma _{k, l} \\&\le \sum _{k = 1}^{d} \left[ \left( \frac{u}{\gamma _{k', l'(k')}}\right) ^{\frac{1}{e_{k', l'(k')}}} \right] ^{e_{k, l'(k)}} \gamma _{k, l'(k)} \\&\le \sum _{k = 1}^{d} \left[ \left( \frac{u}{\gamma _{k, l'(k)}}\right) ^{\frac{1}{e_{k, l'(k)}}} \right] ^{e_{k, l'(k)}} \gamma _{k, l'(k)} \le \sum _{k = 1}^{d} u = d u. \end{aligned}$$

So by Markov’s inequality,

$$\begin{aligned} {\mathbb {P}}(|X |> e d u)&\le {\mathbb {P}}(|X |^p > (e d u)^p) \le \frac{{\mathbb {E}} |X |^p}{(e d u)^p} = \left( \frac{\Vert X\Vert _{L_p}}{e d u} \right) ^p \le e^{-p}. \end{aligned}$$

In all cases, we obtain

$$\begin{aligned} {\mathbb {P}}(|X |> e d u) \le e^{p_0} e^{-p} = e^{p_0} \exp \left( - \min _{k \in [d]} \max _{l \in T} \left( \frac{u}{\gamma _{k, l}} \right) ^{\frac{1}{e_{k, l}}} \right) . \end{aligned}$$

The result follows by taking \(u := \frac{t}{e d}\). \(\square \)
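
As a simple usage example, take \(d = 1\), \(T = \{1\}\), \(p_0 = 1\) and a moment bound \(\Vert X\Vert _{L_p} \le \sqrt{p}\, \gamma \) for all \(p \ge 1\) (i.e., \(e_{1, 1} = \frac{1}{2}\)). Then Lemma 13 yields the familiar subgaussian tail bound

$$\begin{aligned} {\mathbb {P}}(|X |> t) \le e \exp \left( - \left( \frac{t}{e \gamma } \right) ^2 \right) \qquad \text {for all } t > 0. \end{aligned}$$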

3.2 Proof of the upper bound

3.2.1 Required tools

Lemma 14

There is an absolute constant C such that the following holds. Let \(X \in {\mathbb {R}}^n\) be a random vector with mean 0 and \(\Vert X\Vert _{\psi _2} \le L\). Take a Gaussian vector \(g \sim N(0, Id_n)\) and \(a \in {\mathbb {R}}^n\). Then for all \(p \ge 1\),

$$\begin{aligned} {\mathbb {E}} \left|\sum _{k = 1}^n a_k X_k \right|^p \le (C L)^p {\mathbb {E}} \left|\sum _{k = 1}^n a_k g_k \right|^p. \end{aligned}$$

Proof

By the assumption on X, \(\sum _{k = 1}^n a_k X_k = \langle a, X \rangle \) is mean 0 with \(\Vert \langle a, X \rangle \Vert _{\psi _2} \le L \Vert a\Vert _2\), implying that for any \(p \ge 1\),

$$\begin{aligned} {\mathbb {E}} |\langle a, X \rangle |^p \le (C_1 L \Vert a\Vert _2)^p p^\frac{p}{2}. \end{aligned}$$

On the other hand, \(\langle a, g \rangle \sim N(0, \Vert a\Vert _2^2)\), so by the known absolute moments of the normal distribution and Stirling’s approximation,

$$\begin{aligned} {\mathbb {E}} |\langle a, g \rangle |^p&= \Vert a\Vert _2^{p} \cdot \frac{2^\frac{p}{2}}{\sqrt{\pi }} \Gamma \left( \frac{p + 1}{2} \right) \ge \Vert a\Vert _2^p \frac{2^\frac{p}{2}}{\sqrt{\pi }} \sqrt{2 \pi } \left( \frac{p + 1}{2} \right) ^\frac{p}{2} \exp \left( - \frac{p + 1}{2}\right) \\&\ge 2^\frac{p}{2} \Vert a\Vert _{2}^{p} \sqrt{\frac{2}{e}} \left( \frac{p}{2 e} \right) ^\frac{p}{2} \ge \sqrt{\frac{2}{e}} \left( \frac{1}{e} \right) ^\frac{p}{2} \Vert a\Vert _{2}^{p} p^\frac{p}{2} \ge \left( \frac{2}{e^2} \right) ^\frac{p}{2} \Vert a\Vert _{2}^{p} p^{\frac{p}{2}}, \end{aligned}$$

implying that \({\mathbb {E}}|\langle a, X \rangle |^p \le \left( \frac{C_1 e}{\sqrt{2}} L \right) ^p {\mathbb {E}} |\langle a, g \rangle |^p\). \(\square \)

In order to control arbitrary chaoses, we will derive a result analogous to Lemma 14 for squared subgaussian and Gaussian variables. To achieve this, we make use of strong domination. The following lemma shows that squared subgaussian variables are strongly dominated by squared Gaussian variables.

Lemma 15

There exist absolute constants \(\kappa , \lambda > 0\) such that the following holds. Let X be a random variable with \({\mathbb {E}} X^2 = 1\) and \(\Vert X\Vert _{\psi _2} \le L\), \(L \ge 1\) and \(g \sim N(0, 1)\). Let \(\xi ,\,\xi ' \in \{\pm 1\}\) be Rademacher variables that are independent of X and g. Then \(\xi (X^2 - 1) \prec _{(\kappa , \lambda L^2)} \xi ' (g^2 - 1)\) in the sense of Definition 2.

Proof

For any \(t > 0\),

$$\begin{aligned} {\mathbb {P}}\left( |\xi (X^2 - 1) |> t \right) = {\mathbb {P}} \left( X^2 - 1> t \right) + {\mathbb {P}} \left( - (X^2 - 1) > t \right) \end{aligned}$$

For a constant \(c \ge 1\), the first term can be bounded by

$$\begin{aligned} {\mathbb {P}} \left( X^2 - 1> t \right) = {\mathbb {P}} \left( |X |> \sqrt{1 + t} \right) \le \exp \left( 1 - \frac{1 + t}{c^2 L^2} \right) \le e \cdot e^{- \frac{t}{c^2 L^2}}. \end{aligned}$$

The second term is 0 if \(t \ge 1\) since \(-(X^2 - 1) \le 1\). For \(t \le 1\), \(e^{- \frac{t}{c^2 L^2}} \ge e^{- \frac{1}{c^2 L^2}} \ge e^{- 1}\). Then it holds that \({\mathbb {P}}( - (X^2 - 1) > t) \le 1 \le e \cdot e^{- \frac{t}{c^2 L^2}}\), and altogether we obtain

$$\begin{aligned} {\mathbb {P}}\left( |\xi (X^2 - 1) |> t \right) \le 2 e \cdot e^{- \frac{t}{c^2 L^2}}. \end{aligned}$$

On the other hand, for any \(\lambda > 0\),

$$\begin{aligned} {\mathbb {P}} \left( \lambda L^2 |\xi ' (g^2 - 1) |> t \right)&\ge {\mathbb {P}} \left( g^2 - 1> \frac{t}{\lambda L^2} \right) = {\mathbb {P}} \left( |g |> \sqrt{1 + \frac{t}{\lambda L^2}} \right) \\&= {\mathbb {P}} \left( |g |\ge \sqrt{1 + \frac{t}{\lambda L^2}} \right) . \end{aligned}$$

To bound this, we use the following properties of the normal distribution (see Proposition 7.5 in [3]):

$$\begin{aligned} {\mathbb {P}}(|g |\ge u) \ge \sqrt{\frac{2}{\pi }} \frac{1}{u} \left( 1 - \frac{1}{u^2} \right) e^{- \frac{u^2}{2} }, \qquad {\mathbb {P}}(|g |\ge u) \ge \left( 1 - \sqrt{\frac{2}{\pi }} u \right) e^{- \frac{u^2}{2} }. \end{aligned}$$
(12)

For \(0 < u \le \frac{1}{4}\), the second inequality in (12) yields

$$\begin{aligned} {\mathbb {P}} \left( |g |\ge \sqrt{1 + u} \right) \ge \frac{1}{10} e^{- \frac{1 + u}{2} } \ge \frac{1}{10} e^{-\frac{1}{2}} \cdot e^{- u} \ge \frac{1}{17} e^{-u}. \end{aligned}$$

For \(u \ge \frac{1}{4}\), the first inequality in (12) gives \({\mathbb {P}} \left( |g |\ge \sqrt{1 + u} \right) \ge \frac{1}{5} \sqrt{\frac{2}{\pi }} \frac{1}{\sqrt{1 + u}} e^{- \frac{1 + u}{2} }\). Using that \(\frac{1}{\sqrt{1 + u}} \ge e^{-\frac{1}{2} u}\) for all \(u > 0\), we obtain for \(u \ge \frac{1}{4}\),

$$\begin{aligned} {\mathbb {P}} \left( |g |\ge \sqrt{1 + u} \right)&\ge \frac{1}{5} \sqrt{\frac{2}{\pi }} e^{-\frac{1}{2} u} \exp \left( - \frac{1 + u}{2} \right) = \frac{1}{5} \sqrt{\frac{2}{\pi }} \exp \left( - \frac{1}{2} - u \right) \ge \frac{1}{11} e^{-u}. \end{aligned}$$

So for any \(u > 0\), \({\mathbb {P}}(|g |> \sqrt{1 + u}) \ge \frac{1}{17} e^{-u}\). By choosing \(\lambda = c^2\) and combining,

$$\begin{aligned} {\mathbb {P}}\left( |\xi (X^2 - 1) |> t \right) \le 2 e \cdot e^{-\frac{t}{\lambda L^2}} \le 93 \cdot \frac{1}{17} e^{-\frac{t}{\lambda L^2}} \le 93 {\mathbb {P}}\left( \lambda L^2 |\xi '(g^2 - 1) |> t\right) . \end{aligned}$$

\(\square \)

Theorem 16

There is an absolute constant \(C > 0\) such that the following holds. Let \(X \in {\mathbb {R}}^n\) have independent entries that have mean 0 and variance 1 and are subgaussian with \(\psi _2\) norm \(\le L\) for an \(L \ge 1\). Take a Gaussian vector \(g \sim N(0, Id_n)\) and \(a \in {\mathbb {R}}^n\). Then for all \(p \ge 1\),

$$\begin{aligned} {\mathbb {E}}\left|\sum _{k = 1}^n a_k (X_k^2 - 1) \right|^p \le (C L^2)^p {\mathbb {E}}\left|\sum _{k = 1}^n a_k (g_k^2 - 1) \right|^p. \end{aligned}$$

Proof

Consider independent Rademacher variables \(\xi _1, \ldots , \xi _n, {\bar{\xi }}_1, \ldots , {\bar{\xi }}_n \in \{\pm 1\}\) that are also independent of X and g. By the symmetrization Lemma 7, it holds that

$$\begin{aligned} {\mathbb {E}}\left|\sum _{k = 1}^n a_k (X_k^2 - 1) \right|^p&\le 2^p {\mathbb {E}}\left|\sum _{k = 1}^n a_k \xi _k (X_k^2 - 1) \right|^p \nonumber \\ {\mathbb {E}}\left|\sum _{k = 1}^n a_k {\bar{\xi }}_k (g_k^2 - 1) \right|^p&\le 2^p {\mathbb {E}}\left|\sum _{k = 1}^n a_k (g_k^2 - 1) \right|^p. \end{aligned}$$
(13)

Using that \(\xi _k (X_k^2 - 1) \prec _{(\kappa , \lambda L^2)} {\bar{\xi }}_k (g_k^2 - 1)\) by Lemma 15 and that \(|\cdot |^p\) is a convex nondecreasing function \({\mathbb {R}}^{+} \rightarrow {\mathbb {R}}^{+}\), Theorem 11 implies that there is a constant \({\tilde{C}} > 0\) such that

$$\begin{aligned} {\mathbb {E}}\left|\sum _{k = 1}^n a_k \xi _k (X_k^2 - 1) \right|^p \le ({\tilde{C}} L^2)^p {\mathbb {E}}\left|\sum _{k = 1}^n a_k {\bar{\xi }}_k (g_k^2 - 1) \right|^p. \end{aligned}$$

Combining this with (13) yields the claim with \(C = 4 {\tilde{C}}\). \(\square \)

Theorem 5 is an important tool for the proof of our decoupling result (Theorem 4). Its purpose is to rearrange a chaos in such a way that, up to additional lower-order terms, the quadratic factors that occur (here \((X^{(l)}_{{\varvec{i}}_l})^2\)) are replaced by corresponding mean 0 factors of the type \(\left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \), which also occur in Theorem 4.

Rearranging the terms with this theorem enables an iterative application of the standard decoupling Theorem 8 in the proof of Theorem 4. Furthermore, the factors \(\left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \) are 0 in the Rademacher case (Remark 3). In the general case, after the comparison with Gaussians, they will be turned into a product of two independent factors with the subsequent Lemma 17 in the proof of Theorem 3.

As the next step, we prove Theorem 5.

Proof of Theorem 5

Observing that for any \(I \subset [d]\), \({\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I)\),

$$\begin{aligned} \prod _{l \in [d] \backslash I} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] = \sum _{I' \subset [d] \backslash I} (-1)^{|[d] \backslash (I \cup I') |} \prod _{l \in I'} (X^{(l)}_{{\varvec{i}}_l})^2, \end{aligned}$$

we obtain

$$\begin{aligned}&\sum _{\begin{array}{c} I \subset [d] \\ {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}([d] \backslash I) \end{array}} A^{\langle I \rangle }_{{\varvec{i}}} \prod _{l \in [d] \backslash I} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \\&\quad = \sum _{\begin{array}{c} I \subset [d] \\ {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}([d] \backslash I) \\ {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I) \end{array}} A_{{\varvec{i}} {\dot{\times }} {\varvec{j}}} \sum _{I' \subset [d] \backslash I} (-1)^{|[d] \backslash (I \cup I') |} \prod _{l \in I'} (X^{(l)}_{{\varvec{i}}_l})^2 \\&\quad = \sum _{\begin{array}{c} I \subset [d] \\ I' \subset [d] \backslash I \end{array}} (-1)^{|[d] \backslash (I \cup I') |} \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}([d] \backslash I) \\ {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I) \end{array}} A_{{\varvec{i}} {\dot{\times }} {\varvec{j}}} \prod _{l \in I'} (X^{(l)}_{{\varvec{i}}_l})^2 \\&\quad = \sum _{\begin{array}{c} I' \subset [d] \\ I \subset [d] \backslash I' \end{array}} (-1)^{|[d] \backslash (I \cup I') |} \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}}} \prod _{l \in I'} (X^{(l)}_{{\varvec{i}}_l})^2 \\&\quad = \sum _{I' \subset [d]} \left[ \left( \sum _{I \subset [d] \backslash I'} (-1)^{|([d] \backslash I') \backslash I |} \right) \cdot \left( \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}}} \prod _{l \in I'} (X^{(l)}_{{\varvec{i}}_l})^2 \right) \right] . \end{aligned}$$

This implies the claim using Lemma 9. \(\square \)

A key to the proof of the upper moment bound in our main result (Theorem 3) is the decoupling technique of Theorem 4. With the above auxiliary results, we can give the proof of it here.

Proof of Theorem 4

We first split the chaos according to the set of coordinates on which the two indices coincide:

$$\begin{aligned} b&:= \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l} \\&= \sum _{I \subset [d]} \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \\ \forall l \in I^c: {\varvec{j}}_l \ne {\varvec{j}}'_l \end{array}} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}')} \prod _{l \in I} (X^{(l)}_{{\varvec{i}}_l})^2 \prod _{l \in I^c} X^{(l)}_{{\varvec{j}}_l} X^{(l)}_{{\varvec{j}}'_l} \end{aligned}$$

since each summand indexed by \({\varvec{i}}, {\varvec{i}}'\) appears exactly once on the right-hand side, namely in the term with \(I = \{l \in [d]: {\varvec{i}}_l = {\varvec{i}}'_l\}\) and for no other I.

Now applying Theorem 5 yields

$$\begin{aligned} b&= \sum _{I \subset [d]} \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I)} \left( \sum _{\begin{array}{c} {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \\ \forall l \in I^c: {\varvec{j}}_l \ne {\varvec{j}}'_l \end{array}} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}')} \prod _{l \in I^c} X^{(l)}_{{\varvec{j}}_l} X^{(l)}_{{\varvec{j}}'_l} \right) \prod _{l \in I} (X^{(l)}_{{\varvec{i}}_l})^2 \\&= \sum _{\begin{array}{c} I, J \subset [d]: \\ J \subset I \end{array}} \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \end{array}} \left( \sum _{\begin{array}{c} {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \\ \forall l \in I^c: {\varvec{j}}_l \ne {\varvec{j}}'_l \end{array}} A_{\begin{array}{c} ({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \\ \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}}) \end{array}} \prod _{l \in I^c} X^{(l)}_{{\varvec{j}}_l} X^{(l)}_{{\varvec{j}}'_l} \right) \prod _{l \in J} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \\&= \sum _{\begin{array}{c} I, J \subset [d]: \\ J \subset I \end{array}} \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \\ \forall l \in I^c: {\varvec{j}}_l \ne {\varvec{j}}'_l \end{array}} A_{\begin{array}{c} ({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \\ \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}}) \end{array}} \prod _{l \in I^c} X^{(l)}_{{\varvec{j}}_l} X^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \\&=: \sum _{\begin{array}{c} I, J \subset [d]: \\ J \subset I \end{array}} S_{I, J}. \end{aligned}$$

Because of

$$\begin{aligned} S_{[d], \emptyset } = \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{k}} \dot{+} {\varvec{k}}} = {\mathbb {E}} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l} \end{aligned}$$

and the triangle inequality, we obtain

$$\begin{aligned} \Vert b - {\mathbb {E}} b\Vert _{L_p} \le \sum _{\begin{array}{c} I, J \subset [d]: \\ J \subset I, I \backslash J \ne [d] \end{array}} \Vert S_{I, J}\Vert _{L_p}. \end{aligned}$$
(14)

For any fixed \(l_0 \in I^c\), we obtain that \(\Vert S_{I, J}\Vert _{L_p} =\)

$$\begin{aligned} \left\| \sum _{\begin{array}{c} \bar{{\varvec{j}}}, \bar{{\varvec{j}}}' \in {\varvec{J}}^{\varvec{n}}(\{l_0\}) \\ \bar{{\varvec{j}}}_{l_0} \ne \bar{{\varvec{j}}}'_{l_0} \end{array}} \left( \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c \backslash \{l_0\}) \\ \forall l \in I^c \backslash \{l_0\}: {\varvec{j}}_l \ne {\varvec{j}}'_l \end{array}} A_{\begin{array}{c} ({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} \bar{{\varvec{j}}} {\dot{\times }} {\varvec{k}}) \\ \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}' {\dot{\times }} \bar{{\varvec{j}}}' {\dot{\times }} {\varvec{k}}) \end{array}} \prod _{l \in I^c \backslash \{l_0\}} X^{(l)}_{{\varvec{j}}_l} X^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \right) X^{(l_0)}_{\bar{{\varvec{j}}}_{l_0}} X^{(l_0)}_{\bar{{\varvec{j}}}'_{l_0}} \right\| _{L_p}. \end{aligned}$$

We can apply the decoupling Theorem 8 to this for the convex function \(|\cdot |^p\) and the expectation conditioned on all variables except \(X^{(l_0)}\). This leads to \(\Vert S_{I, J}\Vert _{L_p} \le \)

$$\begin{aligned} 4 \left\| \sum _{\bar{{\varvec{j}}}, \bar{{\varvec{j}}}' \in {\varvec{J}}^{\varvec{n}}(\{l_0\})} \left( \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J)\\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c \backslash \{l_0\}) \\ \forall l \in I^c \backslash \{l_0\}: {\varvec{j}}_l \ne {\varvec{j}}'_l \end{array}} A_{\begin{array}{c} ({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} \bar{{\varvec{j}}} {\dot{\times }} {\varvec{k}}) \\ \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}' {\dot{\times }} \bar{{\varvec{j}}}' {\dot{\times }} {\varvec{k}}) \end{array}} \prod _{l \in I^c \backslash \{l_0\}} X^{(l)}_{{\varvec{j}}_l} X^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \right) X^{(l_0)}_{\bar{{\varvec{j}}}_{l_0}} {\bar{X}}^{(l_0)}_{\bar{{\varvec{j}}}'_{l_0}} \right\| _{L_p}. \end{aligned}$$

Repeating this procedure iteratively for all other \(l \in I^c\), we obtain

$$\begin{aligned} \Vert S_{I, J}\Vert _{L_p} \le 4^{d - |I |} \left\| \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J)\\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}})} \prod _{l \in I^c} X^{(l)}_{{\varvec{j}}_l} {\bar{X}}^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \right\| _{L_p}. \end{aligned}$$

Substituting this into (14) completes the proof. \(\square \)

The works [16, 21] have investigated polynomials with higher powers of Gaussian variables. Since in our scenario every vector occurs only twice, we can repeatedly apply their result for the case of two coinciding indices. Considering that \(H_2(x) = x^2 - 1\) is the Hermite polynomial of degree 2 with leading coefficient 1, equation (2.9) in [16] can be written in our setup as follows. Note that, as suggested there, the case \(p \ge 1\) can also be obtained using Jensen's inequality, which yields this inequality with the coefficient 2.

Lemma 17

Let \(a \in {\mathbb {R}}^n\), \(g, {\bar{g}} \sim N(0, Id_n)\), \(p \ge 1\). Then

$$\begin{aligned} \left\| \sum _{k = 1}^n a_k (g_k^2 - 1) \right\| _{L_p} \le 2 \left\| \sum _{k = 1}^n a_k g_k {\bar{g}}_k \right\| _{L_p}. \end{aligned}$$

Combining the previous lemmas, we can now prove the upper bound of the main result, Theorem 3.

3.2.2 Proof of Theorem 3, upper bound

Step 1: Decoupling

Let \(\alpha := \Vert X^T A X - {\mathbb {E}} X^T A X\Vert _{L_p}\). By Theorem 4, \(\alpha \le \)

$$\begin{aligned} \sum _{\begin{array}{c} J \subset I \subset [d] \\ I \backslash J \ne [d] \end{array}} 4^{d - |I |} \left\| \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}})} \prod _{l \in I^c} X^{(l)}_{{\varvec{j}}_l} {\bar{X}}^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} \left[ (X^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \right\| _{L_p}. \end{aligned}$$
(15)

Step 2: Replacing the subgaussian factors by Gaussians

In (15), we can repeatedly apply Lemma 14 to replace all the linear subgaussian factors by Gaussian ones. Afterwards, Theorem 16 allows the same for the quadratic terms. Together, this yields that \(\alpha \le \)

$$\begin{aligned} \sum _{\begin{array}{c} J \subset I \subset [d] \\ I \backslash J \ne [d] \end{array}} (C L)^{|I^c |+ |J |} \left\| \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} A_{\begin{array}{c} ({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \\ \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}}) \end{array}} \prod _{l \in I^c} g^{(l)}_{{\varvec{j}}_l} {\bar{g}}^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} \left[ (g^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \right\| _{L_p}. \end{aligned}$$
(16)

Step 3: Decoupling of squared Gaussians

In a fashion analogous to Step 2, we can successively replace all the factors \(\left[ (g^{(l)}_{{\varvec{i}}_l})^2 - 1 \right] \) in (16) by \(g^{(l)}_{{\varvec{i}}_l} {\bar{g}}^{(l)}_{{\varvec{i}}_l}\) using Lemma 17. This leads to

$$\begin{aligned} \alpha \le&\sum _{\begin{array}{c} J \subset I \subset [d] \\ I \backslash J \ne [d] \end{array}} (C L)^{|I^c |+ |J |} \left\| \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} A_{\begin{array}{c} ({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \\ \dot{+} ({\varvec{i}} {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}}) \end{array}} \prod _{l \in I^c} g^{(l)}_{{\varvec{j}}_l} {\bar{g}}^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} g^{(l)}_{{\varvec{i}}_l} {\bar{g}}^{(l)}_{{\varvec{i}}_l} \right\| _{L_p} \\ =&\sum _{\begin{array}{c} J \subset I \subset [d] \\ I \backslash J \ne [d] \end{array}} (C L)^{|I^c |+ |J |} \left\| \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(I^c \cup J)} A^{(I, J)}_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in I^c \cup J} g^{(l)}_{{\varvec{i}}_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p}, \end{aligned}$$

where for all \({\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J \cup I^c)\),

$$\begin{aligned} A^{(I, J)}_{{\varvec{i}} \dot{+} {\varvec{i}}'} = {\left\{ \begin{array}{ll} \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J)} A_{({\varvec{i}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{k}})} &{} \text {if } \forall l \in J: {\varvec{i}}_l = {\varvec{i}}_l' \\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(17)
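As an aside, the following numpy sketch (our own illustration, not part of the argument; dimensions are arbitrary) shows how \({\varvec{A}}^{(I, J)}\) from (17) can be formed for \(d = 2\): axis pairs in \(I \backslash J\) are traced out, axis pairs in \(J\) are restricted to their diagonal.

```python
import numpy as np

# Our own numpy illustration of definition (17) for d = 2.
# A has shape (n1, n2, n1, n2); the four axes correspond to positions (1, 2, 1+d, 2+d).
rng = np.random.default_rng(1)
n1, n2 = 3, 4
A = rng.standard_normal((n1, n2, n1, n2))

# I = {2}, J = {}: the axis pair (2, 2+d) is traced out (summed over equal indices),
# leaving an array indexed by I^c and I^c + d, i.e. an (n1, n1) matrix.
A_I2 = np.einsum('ikjk->ij', A)

# I = J = {2}: the axis pair (2, 2+d) is kept, but entries with i_2 != i_2' are set to zero.
diag_mask = np.eye(n2)[None, :, None, :]   # indicator of i_2 == i_2'
A_I2_J2 = A * diag_mask
```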

Step 4: Completing the proof

Theorem 2 now yields that

$$\begin{aligned} \left\| \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J \cup I^c)} A^{(I, J)}_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in I^c \cup J} g^{(l)}_{{\varvec{i}}_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p} \le {\tilde{m}}^{(I, J)}_p \end{aligned}$$

where \(S((J \cup I^c) \cup ((J \cup I^c) + d), \kappa )\) denotes the set of all partitions of \((J \cup I^c) \cup ((J \cup I^c) + d)\) into \(\kappa \) sets and

$$\begin{aligned} {\tilde{m}}^{(I, J)}_p := \sum _{\kappa = 1}^d p^{\kappa / 2} \sum _{(I_1, \ldots , I_\kappa ) \in S((J \cup I^c) \cup ((J \cup I^c) + d), \kappa )} \Vert {\varvec{A}}^{(I, J)}\Vert _{I_1, \ldots , I_\kappa }. \end{aligned}$$

By Lemma 10, \(\Vert {\varvec{A}}^{(I, J)}\Vert _{I_1, \ldots , I_\kappa } \le \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa }\) where \({\varvec{A}}^{(I)} = {\varvec{A}}^{(I, \emptyset )}\) is as given in the statement of Theorem 3. Combining this with the preceding bounds, the upper bound in Theorem 3 follows.

3.3 Proof of the lower bound

3.3.1 Required tools

In this section, we prove the lower bound in Theorem 3. In contrast to the upper bound, we prove it only for the case of Gaussian vectors. Indeed, for arbitrary subgaussian distributions, the lower bound fails to hold, as the following simple example for the case \(d = 1\) shows: Consider the identity matrix \(Id_n\) and a Rademacher vector \(\xi \in \{\pm 1\}^n\). Since \(\xi _k^2 = 1\) for every k, the object of interest in Theorem 3 is \(\xi ^T Id_n \xi - {\mathbb {E}}[\xi ^T Id_n \xi ] = n - n = 0\), even though the moment bounds \(m_p\) would be \(> 0\).

We follow the approach of reversing all steps in the proof of the upper bound, omitting the Gaussian comparison steps. For this reason, the two decoupling steps before and after the Gaussian comparison can also be performed together.

As mentioned before, Gaussian decoupling, with upper as well as lower bounds, has been studied in [16], building on central ideas of [21]. The work [16] provides a decoupling inequality for Gaussian chaos with an arbitrary number of coinciding indices. Similarly to Lemma 17, we can adapt the result of Equation (2.9) in [16] to our situation as follows.

Lemma 18

Let \(A \in {\mathbb {R}}^{n \times n}\) be a symmetric matrix, \(g, {\bar{g}} \sim N(0, Id_n)\) be independent, and \(p \ge 1\). Then

$$\begin{aligned} \left\| \sum _{j, k \in [n]} A_{j, k} g_j {\bar{g}}_k \right\| _{L_p} \le \left\| \sum _{j, k \in [n]} A_{j, k} (g_j g_k - \mathbbm {1}_{j = k}) \right\| _{L_p}. \end{aligned}$$
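As a quick plausibility check of Lemma 18, the following is our own Monte Carlo illustration with an arbitrary random symmetric matrix and empirical moments in place of exact \(L_p\) norms.

```python
import numpy as np

# Monte Carlo sanity check of Lemma 18 for a random symmetric A (illustration only).
rng = np.random.default_rng(6)
n, p, trials = 10, 4.0, 200_000
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

g = rng.standard_normal((trials, n))
gbar = rng.standard_normal((trials, n))

decoupled = np.einsum('ti,ij,tj->t', g, A, gbar)           # sum_{j,k} A_{jk} g_j gbar_k
coupled = np.einsum('ti,ij,tj->t', g, A, g) - np.trace(A)  # sum_{j,k} A_{jk} (g_j g_k - 1_{j=k})
lhs = (np.abs(decoupled) ** p).mean() ** (1 / p)
rhs = (np.abs(coupled) ** p).mean() ** (1 / p)
print(lhs, rhs)  # empirically, lhs <= rhs
```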

To generalize this to the case of multiple axes, we iteratively apply Lemma 18 and obtain the following corollary.

Corollary 19

Let \({\varvec{n}} \in {\mathbb {N}}^d\), \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\) such that \({\varvec{A}}\) satisfies the symmetry condition that for all \(l \in [d]\) and any \({\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}([d] \backslash \{l\})\), \({\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(\{l\})\),

$$\begin{aligned} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}}')} = A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}') \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \end{aligned}$$
(18)

Let \(g^{(1)}, {\bar{g}}^{(1)} \sim N(0, Id_{{\varvec{n}}_1}), \ldots , g^{(d)}, {\bar{g}}^{(d)} \sim N(0, Id_{{\varvec{n}}_d})\) be independent. Then for any set \(I \subset [d]\), \(p \ge 1\),

$$\begin{aligned}&\left\| \sum _{\begin{array}{c} {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(I) \\ {\varvec{j}}\in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \prod _{l \in I} g^{(l)}_{{\varvec{i}}_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p} \\&\quad \le \left\| \sum _{\begin{array}{c} {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(I) \\ {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \prod _{l \in I} \left[ g^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] \right\| _{L_p} \end{aligned}$$

Independently of the Gaussian decoupling approach, the following two lemmas provide a tool to reverse the application of the rearrangement result Theorem 5 in the proof of the upper bound.

Lemma 20

Let \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\) be an array of order 2d and let \(X^{(1)} \in {\mathbb {R}}^{n_1}, \ldots , X^{(d)} \in {\mathbb {R}}^{n_d}\) be vectors. Then

$$\begin{aligned}&\sum _{I \subset [d]} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(I)} \sum _{{\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I^c)} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \prod _{l \in I} \left[ X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] \\&\quad = \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l}. \end{aligned}$$

Proof

Note that

$$\begin{aligned} \prod _{l \in I} \left[ X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] = \sum _{J \subset I} \prod _{l \in I \backslash J} (-\mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l}) \prod _{l \in J} X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l}. \end{aligned}$$

Using this, we obtain

$$\begin{aligned} \alpha&:= \sum _{I \subset [d]} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(I)} \sum _{{\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I^c)} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \prod _{l \in I} \left[ X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] \\&= \sum _{I \subset [d]} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(I)} \sum _{{\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I^c)} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \sum _{J \subset I} \prod _{l \in I \backslash J } (-\mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l}) \prod _{l \in J} X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} \end{aligned}$$

Observing that

$$\begin{aligned} \prod _{l \in I \backslash J } (-\mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l}) = {\left\{ \begin{array}{ll} (-1)^{|I \backslash J |} &{} \text {if } \forall l \in I \backslash J: {\varvec{i}}_l = {\varvec{i}}'_l \\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

we can conclude

$$\begin{aligned} \alpha&= \sum _{I \subset [d]} \sum _{J \subset I} \sum _{\begin{array}{c} {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \end{array}} \sum _{{\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I^c)} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}} )} (-1)^{|I \backslash J |} \prod _{l \in J} X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} \\&= \sum _{J \subset [d]} \sum _{I \supset J} (-1)^{|I \backslash J |} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J)} \sum _{{\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(J^c)} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \prod _{l \in J} X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} \end{aligned}$$

Lemma 9 yields

$$\begin{aligned} \sum _{I \supset J} (-1)^{|I \backslash J |} = \sum _{I' \subset [d] \backslash J} (-1)^{|I' |} = {\left\{ \begin{array}{ll} 1 &{} \text {if } J = [d] \\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

such that

$$\begin{aligned} \alpha = \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l}. \end{aligned}$$

\(\square \)
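Since Lemma 20 is a pointwise algebraic identity, it can also be checked numerically; below is our own sketch for \(d = 2\) with arbitrary small dimensions and arbitrary deterministic vectors.

```python
import numpy as np

# Numerical check of the identity of Lemma 20 for d = 2 (illustration only).
rng = np.random.default_rng(2)
n1, n2 = 3, 4
A = rng.standard_normal((n1, n2, n1, n2))      # axes ordered as (1, 2, 1+d, 2+d)
X1, X2 = rng.standard_normal(n1), rng.standard_normal(n2)

M1 = np.outer(X1, X1) - np.eye(n1)             # X^{(1)}_i X^{(1)}_{i'} - 1_{i = i'}
M2 = np.outer(X2, X2) - np.eye(n2)

lhs = (np.einsum('ijij->', A)                   # I = {}
       + np.einsum('ajcj,ac->', A, M1)          # I = {1}
       + np.einsum('ibid,bd->', A, M2)          # I = {2}
       + np.einsum('abcd,ac,bd->', A, M1, M2))  # I = {1, 2}
rhs = np.einsum('abcd,a,b,c,d->', A, X1, X2, X1, X2)
assert np.isclose(lhs, rhs)
```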

Lemma 21

Let \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\) be an array of order 2d and let \(X^{(1)} \in {\mathbb {R}}^{n_1}, \ldots , X^{(d)} \in {\mathbb {R}}^{n_d}\) be independent random vectors with mean 0, variance 1 entries. Then for any subset \(\emptyset \ne I \subset [d]\), \(p \ge 1\),

$$\begin{aligned}&\left\| \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(I)} \sum _{{\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(I^c)} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \prod _{l \in I} \left[ X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] \right\| _{L_p} \nonumber \\&\quad \le C(|I |) \left\| \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l} - {\mathbb {E}} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p}, \end{aligned}$$
(19)

where \(C(|I |)\) is a constant only depending on \(|I |\).

Proof

By the assumptions on the vectors \(X^{(l)}\),

$$\begin{aligned} E := {\mathbb {E}} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l} = \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}}. \end{aligned}$$

Since this is exactly the term for \(I = \emptyset \) in Lemma 20, we obtain for the term on the right hand side of (19),

$$\begin{aligned} b :=&\sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} X^{(l)}_{{\varvec{i}}_l} X^{(l)}_{{\varvec{i}}'_l} - E \\ =&\sum _{\begin{array}{c} \emptyset \ne J \subset [d] \\ {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}(J^c) \end{array}} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}})} \prod _{l \in J} \left[ X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] =: \sum _{\begin{array}{c} J \subset [d] \\ J \ne \emptyset \end{array}} S_J. \end{aligned}$$

Using these terms, we need to show that \(\Vert S_I\Vert _{L_p} \le C(|I |) \Vert b\Vert _{L_p}\) for all \(\emptyset \ne I \subset [d]\).

Now we prove this by induction over \(|I |\). First assume \(I = \{l_0\}\). For any \(J \ne \emptyset , I\), there exists an \(l \in J \backslash I\) and then

$$\begin{aligned} {\mathbb {E}} \left[ \prod _{l \in J} \left[ X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] \bigg \vert X^{(l_0)} \right] = 0 \end{aligned}$$

since there is at least one factor whose conditional expectation is 0.

We conclude

$$\begin{aligned} {\mathbb {E}} \left|S_I \right|^p&= {\mathbb {E}} \left|S_I + {\mathbb {E}} \left[ \sum _{\begin{array}{c} J \subset [d]: J \ne \emptyset , I \end{array}} S_J \,\bigg \vert \, X^{(l_0)} \right] \right|^p \\&= {\mathbb {E}} \left|{\mathbb {E}} \left[ \sum _{\begin{array}{c} J \subset [d]: J \ne \emptyset \end{array}} S_J \,\bigg \vert \, X^{(l_0)} \right] \right|^p \le {\mathbb {E}} |b |^p, \end{aligned}$$

where we used Jensen’s inequality on the conditional expectation in the last step.

Now assume that we have already shown (19) for all \(\emptyset \ne I' \subset [d]\) with \(|I' |< |I |\).

For all \(J \subset [d]\) such that \(J \ne \emptyset , I\), one of the following holds.

  • \(J \backslash I = \emptyset \), i.e., \(J \subset I\): Because \(J \ne I\), \(|J |< |I |\), so by induction

    $$\begin{aligned} \Vert S_J\Vert _{L_p} \le C(|J |) \Vert b\Vert _{L_p}. \end{aligned}$$
    (20)
  • \(J \backslash I \ne \emptyset \). Since there is an \(l' \in J \backslash I\),

    $$\begin{aligned} {\mathbb {E}} \left[ \prod _{l \in J} \left[ X^{(l)}_{{\varvec{i}}_{l}} X^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] \,\bigg \vert \, (X^{(l)})_{l \in I} \right] = 0. \end{aligned}$$
    (21)

Together with (20), the triangle inequality yields that \(\Vert S_I\Vert _{L_p} \le \)

$$\begin{aligned} \left\| S_I + \sum _{\begin{array}{c} J \subset I \\ J \ne \emptyset , I \end{array}} S_J\right\| _{L_p} + \sum _{\begin{array}{c} J \subset I \\ J \ne \emptyset , I \end{array}} \Vert S_J\Vert _{L_p} \le \left\| S_I + \sum _{\begin{array}{c} J \subset I \\ J \ne \emptyset , I \end{array}} S_J\right\| _{L_p} + \left[ \sum _{J \subset I, J \ne \emptyset , I} C(|J |) \right] \Vert b\Vert _{L_p}. \end{aligned}$$

The first term on the right hand side can be controlled with (21) and Jensen’s inequality,

$$\begin{aligned} {\mathbb {E}} \left|S_I + \sum _{\begin{array}{c} J \subset I: J \ne \emptyset , I \end{array}} S_J \right|^p&= {\mathbb {E}} \left|S_I + \sum _{\begin{array}{c} J \subset I: J \ne \emptyset , I \end{array}} S_J + {\mathbb {E}}\left[ \sum _{\begin{array}{c} J \subset [d]: J \backslash I \ne \emptyset \end{array}} S_J \,\Bigg \vert \, (X^{(l)})_{l \in I} \right] \right|^p \\&= {\mathbb {E}} \left|{\mathbb {E}}\left[ \sum _{J \subset [d]: J \ne \emptyset } S_J \,\Bigg \vert \, (X^{(l)})_{l \in I} \right] \right|^p \le {\mathbb {E}} |b |^p. \end{aligned}$$

So altogether \(\Vert S_I\Vert _{L_p} \le C(|I |) \Vert b\Vert _{L_p}\) where \(C(|I |) := \sum _{J \subset I: J \ne \emptyset , I} C(|J |) + 1\) depends only on \(|I |\). \(\square \)

Now we have introduced all the necessary tools and can prove the lower bound of the main result, Theorem 3.

3.3.2 Proof of Theorem 3, lower bound

For any \(J \subset I \subset [d]\), define the array \({\varvec{A}}^{(I, J)}\) as in the proof of the upper bound (17) and

$$\begin{aligned} \alpha ^{(I, J)} := \left\| \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J \cup I^c)} A^{(I, J)}_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in I^c \cup J} g^{(l)}_{{\varvec{i}}_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p} \end{aligned}$$
(22)

Step 1: Adding off-diagonal terms

Define independent Rademacher vectors \((\xi ^{(l)})_{l \in J}\) which are also independent of \(g^{(1)}, \ldots , g^{(d)}\), \({\bar{g}}^{(1)}, \ldots , {\bar{g}}^{(d)}\).

Noting that \({\mathbb {E}}_\xi [\xi ^{(l)}_{{\varvec{i}}_l} \xi ^{(l)}_{{\varvec{i}}'_l}] = \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l}\), we obtain

$$\begin{aligned}&{\mathbb {E}}_{\xi } \left[ \sum _{\begin{array}{c} {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}})} \prod _{l \in I^c} g^{(l)}_{{\varvec{j}}_l} {\bar{g}}^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} (\xi ^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}_l}) (\xi ^{(l)}_{{\varvec{i}}'_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l}) \right] \\&\quad = \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J \cup I^c)} A^{(I, J)}_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in I^c \cup J} g^{(l)}_{{\varvec{i}}_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l} \end{aligned}$$

Substituting into (22) and applying Jensen’s inequality and Fubini’s theorem yields

$$\begin{aligned}&(\alpha ^{(I, J)})^p \\&\quad = {\mathbb {E}}_{g, {\bar{g}}} \left|{\mathbb {E}}_{\xi } \sum _{\begin{array}{c} {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J)} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}})} \prod _{l \in I^c} g^{(l)}_ {{\varvec{j}}_l} {\bar{g}}^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} (\xi ^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}_l}) (\xi ^{(l)}_{{\varvec{i}}'_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l}) \right|^p \\&\quad \le {\mathbb {E}}_{\xi } {\mathbb {E}}_{g, {\bar{g}}} \left|\sum _{\begin{array}{c} {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J) \\ {\varvec{j}}, {\varvec{j}}' \in {\varvec{J}}^{\varvec{n}}(I^c) \end{array}} \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J)} A_{({\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{j}}' {\dot{\times }} {\varvec{k}})} \prod _{l \in I^c} g^{(l)}_{{\varvec{j}}_l} {\bar{g}}^{(l)}_{{\varvec{j}}'_l} \prod _{l \in J} (\xi ^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}_l}) (\xi ^{(l)}_{{\varvec{i}}'_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l}) \right|^p \end{aligned}$$

By the symmetry of the normal distribution, conditioned on \((\xi ^{(l)})_{l \in J}\), \((\xi ^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}_l}, \xi ^{(l)}_{{\varvec{i}}'_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l})\) and \((g^{(l)}_{{\varvec{i}}_l}, {\bar{g}}^{(l)}_{{\varvec{i}}'_l})\) have the same distribution. So we can conclude

$$\begin{aligned} \alpha ^{(I, J)} \le \left\| \sum _{\begin{array}{c} {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J \cup I^c) \end{array}} \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J)} A_{({\varvec{i}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{k}})} \prod _{l \in J \cup I^c} g^{(l)}_{{\varvec{i}}_l} {\bar{g}}^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p}. \end{aligned}$$

Step 2: Inverse Gaussian decoupling

For every \(J \subset I \subset [d]\), we then obtain, by the symmetry of \({\varvec{A}}\) and Corollary 19,

$$\begin{aligned} \alpha ^{(I, J)} \le \left\| \sum _{\begin{array}{c} {\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}(J \cup I^c) \end{array}} \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I \backslash J)} A_{({\varvec{i}} {\dot{\times }} {\varvec{k}}) \dot{+} ({\varvec{i}}' {\dot{\times }} {\varvec{k}})} \prod _{l \in J \cup I^c} \left[ g^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}'_l} - \mathbbm {1}_{{\varvec{i}}_l = {\varvec{i}}'_l} \right] \right\| _{L_p}. \end{aligned}$$

Step 3: Removing the mean subtractions in every factor

Since \(I \backslash J \ne [d]\), we have \(J \cup I^c \ne \emptyset \), and Lemma 21 provides that \(\alpha ^{(I, J)} \le \)

$$\begin{aligned} C_1(|J \cup I^c |)\left\| \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} g^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}'_l} - {\mathbb {E}} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} g^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p}. \end{aligned}$$

Adding this up over all \(J \subset I \subset [d]\), \(I \backslash J \ne [d]\) yields

$$\begin{aligned}&\sum _{\begin{array}{c} J \subset I \subset [d] \\ I \backslash J \ne [d] \end{array}} \alpha ^{(I, J)} \nonumber \\&\quad \le C(d) \left\| \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} g^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}'_l} - {\mathbb {E}} \sum _{{\varvec{i}}, {\varvec{i}}' \in {\varvec{J}}^{\varvec{n}}} A_{{\varvec{i}} \dot{+} {\varvec{i}}'} \prod _{l \in [d]} g^{(l)}_{{\varvec{i}}_l} g^{(l)}_{{\varvec{i}}'_l} \right\| _{L_p} \end{aligned}$$
(23)

where \(C(d) := \sum _{\begin{array}{c} J \subset I \subset [d]: I \backslash J \ne [d] \end{array}} C_1(|J \cup I^c |)\) depends only on d.

Step 4: Completing the proof

We restrict the left hand side of (23) to the terms with \(J = \emptyset \). These terms \(\alpha ^{(I, \emptyset )}\) only contain the arrays \({\varvec{A}}^{(I, \emptyset )}\), which are equal to the \({\varvec{A}}^{(I)}\) from the theorem statement. Subsequently, we can bound the \(\alpha ^{(I, \emptyset )}\) from below using Theorem 2 (similarly to the upper bound) to obtain the lower bound in Theorem 3.

3.4 Concentration of \(\Vert B X \Vert _2\)

In this section, we apply our main results to the concentration of \(\Vert B X\Vert _2\) where \(X = X^{(1)} \otimes \dots \otimes X^{(d)}\) is a Kronecker product of independent vectors with subgaussian entries. The following statement is a direct consequence of Theorem 3 and Lemma 12.

Corollary 22

Let \(B \in {\mathbb {R}}^{n_0 \times N}\) be a matrix where \(N = n_1 \cdots n_d\) and \(X := X^{(1)} \otimes \cdots \otimes X^{(d)} \in {\mathbb {R}}^N\) a random vector as in Theorem 3.

Let \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\) be the rearrangement of the matrix \(A = B^* B\) as an array with 2d axes. For any \(I \subset [d]\), define the array \({\varvec{A}}^{(I)}\) as in (10).

For \(T \subset [2 d]\), \(1 \le \kappa \le 2d\), write \(S(T, \kappa )\) for the set of partitions of T into \(\kappa \) sets and \(I^c = [d] \backslash I\). Define for any \(p \ge 1\) and any \(\kappa \in [2d]\),

$$\begin{aligned} m_{p, \kappa }&:= \sum _{\begin{array}{c} I \subset [d] \\ I \ne [d] \end{array}} \sum _{(I_1, \ldots , I_\kappa ) \in S((I^c) \cup (I^c + d), \kappa )} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa } \\ m_p&:= L^{2 d} \sum _{\kappa = 1}^{2 d} \min \left\{ p^\frac{\kappa }{2} \frac{m_{p, \kappa }}{\Vert B\Vert _F}, p^{\frac{\kappa }{4}} \sqrt{m_{p, \kappa }} \right\} \end{aligned}$$

Then there is a constant \(C(d) > 0\), depending only on d, such that for all \(p \ge 1\),

$$\begin{aligned} \left\| \Vert B X\Vert _2 - \Vert B\Vert _F \right\| _{L_p} \le C(d) m_p. \end{aligned}$$

If in addition, \(X^{(1)} \sim N(0, Id_{n_1}), \ldots , X^{(d)} \sim N(0, Id_{n_d})\) are normally distributed (i.e., L is constant) and \({\varvec{A}}\) satisfies the symmetry condition (11), then also the lower bound

$$\begin{aligned} {\tilde{C}}(d) m_p \le \left\| \Vert B X\Vert _2 - \Vert B\Vert _F \right\| _{L_p} \end{aligned}$$

holds for all \(p \ge 1\). Above, \({\tilde{C}}(d) > 0\) is a constant that depends only on d.
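As an illustration of the quantity controlled by Corollary 22, the following Monte Carlo sketch (our own; matrix, dimensions and sample size are arbitrary placeholders) estimates the \(L_p\) norms of \(\Vert B X\Vert _2 - \Vert B\Vert _F\) for a Kronecker-structured Gaussian vector with \(d = 2\).

```python
import numpy as np

# Monte Carlo estimate of || ||B X||_2 - ||B||_F ||_{L_p} for X = X^(1) kron X^(2)
# with independent standard Gaussian factors (d = 2); illustration only.
rng = np.random.default_rng(3)
n1, n2, n0 = 8, 10, 30
B = rng.standard_normal((n0, n1 * n2)) / np.sqrt(n1 * n2)
fro = np.linalg.norm(B)

trials = 20_000
dev = np.empty(trials)
for t in range(trials):
    X = np.kron(rng.standard_normal(n1), rng.standard_normal(n2))
    dev[t] = np.linalg.norm(B @ X) - fro

for p in (1.0, 2.0, 4.0):
    print(p, (np.abs(dev) ** p).mean() ** (1 / p))  # empirical L_p deviation norms
```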

Lemma 23

Let \({\varvec{B}} \in {\mathbb {R}}^{n_1 \times \dots \times n_d}\). Assume that \(I_1, \ldots , I_\kappa \) is a partition of [d]. Let \({\bar{I}}_{\kappa } \cup {\bar{I}}_{\kappa + 1} = I_\kappa \) be a partition into two subsets. Then

$$\begin{aligned} \Vert {\varvec{B}}\Vert _{I_1, \ldots , I_{\kappa - 1}, {\bar{I}}_{\kappa }, {\bar{I}}_{\kappa +1}}&\le \Vert {\varvec{B}}\Vert _{I_1, \ldots , I_\kappa } \\&\le \sqrt{\min \left\{ \prod _{l \in {\bar{I}}_{\kappa }} n_l, \prod _{l \in {\bar{I}}_{\kappa + 1}} n_l \right\} } \Vert {\varvec{B}}\Vert _{I_1, \dots , I_{\kappa - 1}, {\bar{I}}_{\kappa }, {\bar{I}}_{\kappa +1}}. \end{aligned}$$

Proof

Take arrays \({\varvec{\alpha }}^{(1)} \in {\mathbb {R}}^{{\varvec{n}}}(I_1), \ldots , {\varvec{\alpha }}^{(\kappa - 1)} \in {\mathbb {R}}^{{\varvec{n}}}(I_{\kappa - 1}), \bar{{\varvec{\alpha }}}^{(\kappa )} \in {\mathbb {R}}^{{\varvec{n}}}({\bar{I}}_{\kappa }), \bar{{\varvec{\alpha }}}^{(\kappa + 1)} \in {\mathbb {R}}^{{\varvec{n}}}({\bar{I}}_{\kappa + 1})\), with Frobenius norm 1 each, such that \(\Vert {\varvec{B}}\Vert _{I_1, \ldots , I_{\kappa - 1}, {\bar{I}}_{\kappa }, {\bar{I}}_{\kappa + 1}} = \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} B_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa - 1)}_{{\varvec{i}}_{I_{\kappa - 1}}} {\bar{\alpha }}^{(\kappa )}_{{\varvec{i}}_{{\bar{I}}_{\kappa }}} {\bar{\alpha }}^{(\kappa + 1)}_{{\varvec{i}}_{{\bar{I}}_{\kappa + 1}}}\). Now define \({\varvec{\alpha }}^{(\kappa )} \in {\mathbb {R}}^{{\varvec{n}}}(I_\kappa )\) by \(\alpha ^{(\kappa )}_{{\varvec{i}}} = {\bar{\alpha }}^{(\kappa )}_{{\varvec{i}}_{{\bar{I}}_{\kappa }}} {\bar{\alpha }}^{(\kappa + 1)}_{{\varvec{i}}_{{\bar{I}}_{\kappa + 1}}}\) for every \({\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I_\kappa )\). Then \(\Vert {\varvec{\alpha }}^{(\kappa )}\Vert _{2} = 1\) and by the definition of \(\Vert \cdot \Vert _{I_1, \ldots , I_\kappa }\) as the supremum over \({\varvec{\alpha }}^{(1)}, \ldots , {\varvec{\alpha }}^{(\kappa )}\), we obtain

$$\begin{aligned} \Vert {\varvec{B}}\Vert _{I_1, \ldots , I_{\kappa - 1}, {\bar{I}}_{\kappa }, {\bar{I}}_{\kappa + 1}} = \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} B_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa )}_{{\varvec{i}}_{I_\kappa }} \le \Vert {\varvec{B}}\Vert _{I_1, \dots , I_{\kappa }}, \end{aligned}$$

which proves the first inequality.

To prove the second inequality, take arrays \({\varvec{\alpha }}^{(1)} \in {\mathbb {R}}^{{\varvec{n}}}(I_1), \ldots , {\varvec{\alpha }}^{(\kappa )} \in {\mathbb {R}}^{{\varvec{n}}}(I_\kappa )\) such that

$$\begin{aligned} \Vert {\varvec{B}}\Vert _{I_1, \ldots , I_\kappa } = \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} B_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa )}_{{\varvec{i}}_{I_\kappa }}. \end{aligned}$$

Now define \(\tilde{{\varvec{B}}} \in {\mathbb {R}}^{{\varvec{n}}}(I_\kappa )\) such that for all \({\varvec{i}} \in {\varvec{J}}^{\varvec{n}}({\bar{I}}_\kappa ), {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}({\bar{I}}_{\kappa + 1})\),

$$\begin{aligned} \tilde{{\varvec{B}}}_{{\varvec{i}} {\dot{\times }} {\varvec{j}}} = \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}([d] \backslash I_\kappa )} B_{{\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}} \alpha ^{(1)}_{{\varvec{k}}_{I_1}} \cdots \alpha ^{(\kappa - 1)}_{{\varvec{k}}_{I_{\kappa - 1}}}. \end{aligned}$$

For \(N_1 := \prod _{l \in {\bar{I}}_{\kappa }} n_l\) and \(N_2 := \prod _{l \in {\bar{I}}_{\kappa + 1}} n_l\), we can interpret \(\tilde{{\varvec{B}}}\) as a matrix \({\tilde{B}} \in {\mathbb {R}}^{N_1 \times N_2}\) with rows indexed by \({\varvec{i}} \in {\varvec{J}}^{\varvec{n}}({\bar{I}}_\kappa )\) and columns indexed by \({\varvec{j}} \in {\varvec{J}}^{\varvec{n}}({\bar{I}}_{\kappa + 1})\).

Then

$$\begin{aligned} \Vert {\tilde{B}}\Vert _F&= \sup _{{\varvec{\beta }} \in {\mathbb {R}}^{{\varvec{n}}}(I_\kappa ), \Vert {\varvec{\beta }}\Vert _2 = 1} \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I_\kappa )} {\tilde{B}}_{{\varvec{i}}} \beta _{{\varvec{i}}}, \\ \Vert {\tilde{B}}\Vert _{2 \rightarrow 2}&= \sup _{\begin{array}{c} {\varvec{\beta }}^{(1)} \in {\mathbb {R}}^{{\varvec{n}}}({\bar{I}}_\kappa ), {\varvec{\beta }}^{(2)} \in {\mathbb {R}}^{{\varvec{n}}}({\bar{I}}_{\kappa + 1}), \\ \Vert {\varvec{\beta }}^{(1)}\Vert _2 = \Vert {\varvec{\beta }}^{(2)}\Vert _2 = 1 \end{array}} \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}({\bar{I}}_\kappa ) \\ {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}({\bar{I}}_{\kappa + 1}) \end{array}} {\tilde{B}}_{{\varvec{i}} {\dot{\times }} {\varvec{j}}} \beta _{{\varvec{i}}}^{(1)} \beta _{{\varvec{j}}}^{(2)}, \end{aligned}$$

such that

$$\begin{aligned} \Vert {\tilde{B}}\Vert _F&= \sup _{{\varvec{\beta }} \in {\mathbb {R}}^{{\varvec{n}}}(I_\kappa ), \Vert {\varvec{\beta }}\Vert _2 = 1} \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}(I_\kappa )} \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}([d] \backslash I_\kappa )} B_{{\varvec{i}} {\dot{\times }} {\varvec{k}}} \alpha ^{(1)}_{{\varvec{k}}_{I_1}} \dots \alpha ^{(\kappa - 1)}_{{\varvec{k}}_{I_{\kappa - 1}}} \beta _{{\varvec{i}}} \\&= \sup _{{\varvec{\beta }} \in {\mathbb {R}}^{{\varvec{n}}}(I_\kappa ), \Vert {\varvec{\beta }}\Vert _2 = 1} \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} B_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa - 1)}_{{\varvec{i}}_{I_{\kappa - 1}}} \beta _{{\varvec{i}}_{I_\kappa }}, \end{aligned}$$

where by definition the maximum is attained at \({\varvec{\beta }} = {\varvec{\alpha }}^{(\kappa )}\), implying

$$\begin{aligned} \Vert {\tilde{B}}\Vert _F = \Vert {\varvec{B}}\Vert _{I_1, \ldots , I_\kappa }. \end{aligned}$$
(24)

For the spectral norm, we obtain from the definition of \(\Vert \cdot \Vert _{I_1, \ldots , I_{\kappa - 1}, {\bar{I}}_{\kappa }, {\bar{I}}_{\kappa + 1}}\),

$$\begin{aligned}&\Vert {\tilde{B}}\Vert _{2 \rightarrow 2} \nonumber \\&\quad = \sup _{\begin{array}{c} {\varvec{\beta }}^{(1)} \in {\mathbb {R}}^{{\varvec{n}}}({\bar{I}}_\kappa ), {\varvec{\beta }}^{(2)} \in {\mathbb {R}}^{{\varvec{n}}}({\bar{I}}_{\kappa + 1}), \\ \Vert {\varvec{\beta }}^{(1)}\Vert _2 = \Vert {\varvec{\beta }}^{(2)}\Vert _2 = 1 \end{array}} \sum _{\begin{array}{c} {\varvec{i}} \in {\varvec{J}}^{\varvec{n}}({\bar{I}}_\kappa ) \\ {\varvec{j}} \in {\varvec{J}}^{\varvec{n}}({\bar{I}}_{\kappa + 1}) \end{array}} \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}([d] \backslash I_\kappa )} B_{{\varvec{i}} {\dot{\times }} {\varvec{j}} {\dot{\times }} {\varvec{k}}} \alpha ^{(1)}_{{\varvec{k}}_{I_1}} \cdots \alpha ^{(\kappa - 1)}_{{\varvec{k}}_{I_{\kappa - 1}}} \beta ^{(1)}_{{\varvec{i}}} \beta ^{(2)}_{{\varvec{j}}} \nonumber \\&\quad = \sup _{\begin{array}{c} {\varvec{\beta }}^{(1)} \in {\mathbb {R}}^{{\varvec{n}}}({\bar{I}}_\kappa ), {\varvec{\beta }}^{(2)} \in {\mathbb {R}}^{{\varvec{n}}}({\bar{I}}_{\kappa + 1}), \\ \Vert {\varvec{\beta }}^{(1)}\Vert _2 = \Vert {\varvec{\beta }}^{(2)}\Vert _2 = 1 \end{array}} \sum _{{\varvec{i}} \in {\varvec{J}}^{\varvec{n}}} B_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa - 1)}_{{\varvec{i}}_{I_{\kappa - 1}}} \beta ^{(1)}_{{\varvec{i}}_{{\bar{I}}_{\kappa }}} \beta ^{(2)}_{{\varvec{i}}_{{\bar{I}}_{\kappa + 1}}} \nonumber \\&\quad \le \Vert {\varvec{B}}\Vert _{I_1, \dots , I_{\kappa - 1}, {\bar{I}}_{\kappa }, {\bar{I}}_{\kappa + 1}}. \end{aligned}$$
(25)

The second inequality now follows from (24), (25) and the general property of matrices that

$$\begin{aligned} \Vert {\tilde{B}}\Vert _F \le \sqrt{\mathrm {rank}({\tilde{B}})} \Vert {\tilde{B}}\Vert _{2 \rightarrow 2} \le \sqrt{\min \{N_1, N_2\}} \Vert {\tilde{B}}\Vert _{2 \rightarrow 2}. \end{aligned}$$

\(\square \)
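For the special case \(\kappa = 1\), \(I_1 = [d]\), Lemma 23 reduces to the familiar comparison between the Frobenius norm of \({\varvec{B}}\) and the spectral norm of its matricization; the following small check (our own illustration with arbitrary dimensions) verifies this case numerically.

```python
import numpy as np

# Check of Lemma 23 in the two-block case: ||B||_{[d]} is the Frobenius norm,
# ||B||_{Ibar_1, Ibar_2} the spectral norm of the matricization (illustration only).
rng = np.random.default_rng(4)
n = (3, 4, 5)                        # d = 3, Ibar_1 = {1}, Ibar_2 = {2, 3}
B = rng.standard_normal(n)

mat = B.reshape(n[0], n[1] * n[2])   # rows indexed by Ibar_1, columns by Ibar_2
fro = np.linalg.norm(B)              # ||B||_{[d]}
spec = np.linalg.norm(mat, 2)        # ||B||_{Ibar_1, Ibar_2}

N1, N2 = mat.shape
assert spec <= fro <= np.sqrt(min(N1, N2)) * spec + 1e-12
```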

Lemma 24

Let \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\), \(I \subset [d]\). Define \({\varvec{A}}^{(I)}\) as in (10).

Let \(I_1, \ldots , I_{\kappa }\) be a partition of \(([d] \backslash I) \cup (d + ([d] \backslash I))\). Let \(I_{\kappa +1}, \ldots , I_{\kappa + |I |}\) be the sets \(\{j, j+d\}\) for every \(j \in I\). Then \(I_1, \ldots , I_{\kappa + |I |}\) is a partition of [2d] and

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_{\kappa }} \le \sqrt{\prod _{l \in I} n_l } \Vert {\varvec{A}}\Vert _{I_1, \ldots , I_{\kappa + |I |}} \end{aligned}$$

Proof

Take \({\varvec{\alpha }}^{(1)} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}(I_1), \dots , {\varvec{\alpha }}^{(\kappa )} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}(I_\kappa )\), all having a Frobenius norm of 1, such that

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa } =&\sum _{{\varvec{i}} \in {\varvec{J}}^{{\varvec{n}}^{\times 2}}(I^c \cup (I^c + d))} A^{(I)}_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \dots \alpha ^{(\kappa )}_{{\varvec{i}}_{I_\kappa }} \nonumber \\ =&\sum _{{\varvec{i}} \in {\varvec{J}}^{{\varvec{n}}^{\times 2}}(I^c \cup (I^c + d))} \sum _{{\varvec{k}} \in {\varvec{J}}^{\varvec{n}}(I)} A_{{\varvec{i}} {\dot{\times }} ({\varvec{k}} \dot{+} {\varvec{k}})} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa )}_{{\varvec{i}}_{I_\kappa }}\nonumber \\ =&\sum _{{\varvec{i}} \in {\varvec{J}}^{{\varvec{n}}^{\times 2}}} A_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa )}_{{\varvec{i}}_{I_\kappa }} \mathbbm {1}_{\forall l \in I: {\varvec{i}}_l = {\varvec{i}}_{l + d}}. \end{aligned}$$
(26)

Now define \({\varvec{\alpha }}^{(\kappa + 1)} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}(\{j_1, j_1 + d\}), \ldots , {\varvec{\alpha }}^{(\kappa + |I |)} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}(\{j_{|I |}, j_{|I |} + d\})\) (where \(I = \{j_1, \ldots , j_{|I |}\}\)) such that for all \(r \in [|I |]\) and \({\varvec{i}} \in {\varvec{J}}^{{\varvec{n}}^{\times 2}}(\{j_r, j_r + d\})\),

$$\begin{aligned} \alpha ^{(\kappa + r)}_{{\varvec{i}}} = {\left\{ \begin{array}{ll} \frac{1}{\sqrt{n_{j_r}}} &{} \text {if } {\varvec{i}}_{j_r} = {\varvec{i}}_{j_r + d} \\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Then for \({\varvec{i}} \in {\varvec{J}}^{{\varvec{n}}^{\times 2}}(I \cup (I + d))\)

$$\begin{aligned} \alpha ^{(\kappa + 1)}_{{\varvec{i}}_{I_{\kappa + 1}}} \cdots \alpha ^{(\kappa + |I |)}_{{\varvec{i}}_{I_{\kappa + |I |}}} = \frac{1}{\sqrt{\prod _{l \in I} n_l}} \mathbbm {1}_{\forall l \in I: {\varvec{i}}_l = {\varvec{i}}_{l + d}} \end{aligned}$$

Substituting this into (26) yields

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa }&= \sqrt{\prod _{l \in I} n_l} \sum _{{\varvec{i}} \in {\varvec{J}}^{{\varvec{n}}^{\times 2}}} A_{{\varvec{i}}} \alpha ^{(1)}_{{\varvec{i}}_{I_1}} \cdots \alpha ^{(\kappa )}_{{\varvec{i}}_{I_\kappa }} \alpha ^{(\kappa + 1)}_{{\varvec{i}}_{I_{\kappa + 1}}} \cdots \alpha ^{(\kappa + |I |)}_{{\varvec{i}}_{I_{\kappa + |I |}}} \\&\le \sqrt{\prod _{l \in I} n_l} \Vert {\varvec{A}}\Vert _{I_1, \ldots , I_{\kappa + |I |}} \end{aligned}$$

\(\square \)

Using the aforementioned results, we can now prove Theorem 6 about \(\Vert B (X^{(1)} \otimes \cdots \otimes X^{(d)})\Vert _2\), for which we find suitable bounds for all the tensor norms of \(B^* B\) in terms of \(\Vert B\Vert _{2 \rightarrow 2}\) and \(\Vert B\Vert _F\).

Proof of Theorem 6

Let \(A := B^* B \in {\mathbb {R}}^{n^d \times n^d}\) and \({\varvec{A}} \in {\mathbb {R}}^{{\varvec{n}}^{\times 2}}\) be the corresponding array of order 2d obtained by rearranging A for \({\varvec{n}} = (n, \ldots , n)\). Note that here the dimensions along all axes are equal. For \(I \subset [d]\), define \({\varvec{A}}^{(I)}\) as in Corollary 22.

Step 1: Showing the norm inequalities

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa } \le n^{\frac{|I |}{2}} \Vert A\Vert _F \quad \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa } \le n^{d - \frac{\kappa }{2}} \Vert A\Vert _{2 \rightarrow 2}. \end{aligned}$$
(27)

In both cases, we start by extending \(I_1, \ldots , I_\kappa \) to \(I_1, \dots , I_{\kappa + |I |}\) as in Lemma 24, obtaining

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa } \le n^\frac{|I |}{2} \Vert {\varvec{A}}\Vert _{I_1, \ldots , I_{\kappa + |I |}} \end{aligned}$$
(28)

Then the first inequality of (27) follows by repeatedly joining all the sets \(I_1, \ldots , I_{\kappa + |I |}\) in the sense of Lemma 23 (first inequality) yielding \(\Vert {\varvec{A}}\Vert _{I_1, \ldots , I_{\kappa + |I |}} \le \Vert {\varvec{A}}\Vert _{[2 d]} = \Vert A\Vert _F\).

For the second inequality in (27), we distinguish two cases. First assume that \(\kappa \le d - |I |\). Then \(|I |\le d - \kappa \). Since A is a matrix in \({\mathbb {R}}^{n^d \times n^d}\), \(\Vert A\Vert _F \le n^\frac{d}{2} \Vert A\Vert _{2 \rightarrow 2}\) and with the first inequality in (27), we obtain

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa } \le n^{\frac{|I |}{2}} n^{\frac{d}{2}} \Vert A\Vert _{2 \rightarrow 2} \le n^{\frac{d - \kappa }{2}} n^{\frac{d}{2}} \Vert A\Vert _{2 \rightarrow 2} = n^{d - \frac{\kappa }{2}} \Vert A\Vert _{2 \rightarrow 2}. \end{aligned}$$

In the other case that \(\kappa > d - |I |\), let \(\kappa '\) denote the number of sets among \(I_1, \ldots , I_\kappa \) that only contain one element. Since each of the other sets must contain at least two elements, this leads to the inequality

$$\begin{aligned} \kappa ' + 2 (\kappa - \kappa ') \le |I_1 \cup \cdots \cup I_\kappa |= 2(d - |I |) \quad \Rightarrow \quad 2 \kappa - \kappa ' \le 2(d - |I |) \quad \Rightarrow \quad \kappa ' \ge 2(\kappa - d + |I |). \end{aligned}$$

This implies that among \(I_1, \ldots , I_\kappa \), there must be at least \(\kappa - d + |I |\) sets with exactly one element that are all contained in [d] or all contained in \([2 d] \backslash [d]\). Without loss of generality, we can assume that these are \(I_1, \ldots , I_{\kappa - d + |I |}\). Now take the unions \({\bar{I}}_1 := I_1 \cup \cdots \cup I_{\kappa - d + |I |}\) and \({\bar{I}}_2 := I_{\kappa - d + |I |+ 1} \cup \dots \cup I_{\kappa + |I |}\). With (28) and the first inequality of Lemma 23, we obtain

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa } \le n^\frac{|I |}{2} \Vert {\varvec{A}}\Vert _{{\bar{I}}_1, {\bar{I}}_2}. \end{aligned}$$

Now split up \({\bar{I}}_2\) into \({\bar{I}}_{2, 1} := {\bar{I}}_2 \cap [d]\) and \({\bar{I}}_{2, 2} := {\bar{I}}_2 \cap ([2 d] \backslash [d])\). If neither \({\bar{I}}_{2, 1}\) nor \({\bar{I}}_{2, 2}\) is empty, then with the second inequality of Lemma 23, we obtain

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \ldots , I_\kappa } \le&n^\frac{|I |}{2} n^{\frac{1}{2} \min \{|{\bar{I}}_{2, 1} |, |{\bar{I}}_{2, 2} |\}} \Vert {\varvec{A}}\Vert _{{\bar{I}}_1, {\bar{I}}_{2, 1}, {\bar{I}}_{2, 2}} \\ \le&n^{\frac{|I |}{2} + \frac{1}{2} \min \{|{\bar{I}}_{2, 1} |, |{\bar{I}}_{2, 2} |\} } \Vert {\varvec{A}}\Vert _{[d], ([2 d] \backslash [d])}, \end{aligned}$$

where in the last step we used the first inequality in Lemma 23 with the fact that \({\bar{I}}_1 \cup {\bar{I}}_{2, 1} \cup {\bar{I}}_{2, 2} = [2 d]\) and each of these three sets is contained in either [d] or \([2 d] \backslash [d]\). Note that the inequality between the first and the third term still holds in the case that \({\bar{I}}_{2, 1}\) or \({\bar{I}}_{2, 2}\) is empty and thus the second inequality of Lemma 23 cannot be applied in the first step.

Now assume \({\bar{I}}_1 \subset [d]\) (otherwise \({\bar{I}}_1 \subset [2 d] \backslash [d]\) and the proof works analogously). Then \({\bar{I}}_1 \cup {\bar{I}}_{2, 1} = [d]\) and \({\bar{I}}_{2, 2} = [2 d] \backslash [d]\). So \(\min \{|{\bar{I}}_{2, 1} |, |{\bar{I}}_{2, 2} |\} = |{\bar{I}}_{2, 1} |= d - |{\bar{I}}_1 |= d - (\kappa - d + |I |) = 2d - \kappa - |I |\). This implies

$$\begin{aligned} \Vert {\varvec{A}}^{(I)}\Vert _{I_1, \dots , I_\kappa } \le n^{\frac{|I |}{2} + \frac{1}{2}(2d - \kappa - |I |) } \Vert {\varvec{A}}\Vert _{[d], ([2 d] \backslash [d])} = n^{d - \frac{\kappa }{2}} \Vert A\Vert _{2 \rightarrow 2}. \end{aligned}$$

This completes the proof of (27).

Step 2: Moment and tail bounds

Now, use Corollary 22 and its notation \(m_{p, \kappa }\) and \(m_p\). The number of terms in the sum defining \(m_{p, \kappa }\) depends only on d. This fact together with (27) leads to

$$\begin{aligned} m_{p, \kappa } \le&C_1(d) \max _{I \subset [d], I \ne [d]} n^\frac{|I |}{2} \Vert A\Vert _F = C_1(d) n^\frac{d - 1}{2} \Vert A\Vert _F \le C_1(d) n^\frac{d - 1}{2} \Vert B\Vert _{2 \rightarrow 2} \Vert B\Vert _F, \\ m_{p, \kappa } \le&C_1(d) n^{d - \frac{\kappa }{2}} \Vert A\Vert _{2 \rightarrow 2} = C_1(d) n^{d - \frac{\kappa }{2}} \Vert B\Vert _{2 \rightarrow 2}^2, \end{aligned}$$

where \(C_1(d)\) is a constant depending only on d. Furthermore, we obtain

$$\begin{aligned}&m_p \le C_1(d) L^{2 d} \\&\quad \cdot \sum _{\kappa = 1}^{2 d} \min \bigg \{ p^\frac{\kappa }{2} n^\frac{d - 1}{2} \Vert B\Vert _{2 \rightarrow 2}, p^\frac{\kappa }{2} n^{d - \frac{\kappa }{2}} \frac{\Vert B\Vert _{2 \rightarrow 2}^2}{\Vert B\Vert _F}, \\&\quad p^\frac{\kappa }{4} n^{\frac{d - 1}{4}} \sqrt{\Vert B\Vert _{2 \rightarrow 2} \Vert B\Vert _F}, p^\frac{\kappa }{4} n^{\frac{d}{2} - \frac{\kappa }{4}} \Vert B\Vert _{2 \rightarrow 2} \bigg \}. \end{aligned}$$

Since this is an upper bound on the \(L_p\) norm of \(\Vert B X\Vert _2 - \Vert B\Vert _F\), Lemma 13 implies

$$\begin{aligned} {\mathbb {P}}\left( \left|\Vert B X\Vert _2 - \Vert B\Vert _F \right|> t \right) \le e^2 \exp \left( - C_2(d) \min _{\kappa \in [2 d]} \beta _\kappa \right) \end{aligned}$$

where

$$\begin{aligned} \beta _\kappa&:= \max \Biggl \{ \left( \frac{t}{n^\frac{d - 1}{2} \Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{2}{\kappa }, \left( \frac{t \Vert B\Vert _F}{n^{d - \frac{\kappa }{2}}\Vert B\Vert _{2 \rightarrow 2}^2} \right) ^{\frac{2}{\kappa }},\nonumber \\&\quad \left( \frac{t}{n^{\frac{d - 1}{4}} \sqrt{\Vert B\Vert _{2 \rightarrow 2} \Vert B\Vert _F}} \right) ^\frac{4}{\kappa }, \left( \frac{t}{n^{\frac{d}{2} - \frac{\kappa }{4}} \Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{4}{\kappa } \Biggr \}. \end{aligned}$$
(29)

Now, for each of several ranges of t, we select a suitable term among the four in (29).

Step 3: Bound for \(t \le n^\frac{d}{2} \Vert B\Vert _{2 \rightarrow 2}\)

For \(\kappa = 1\), we obtain using the first term in (29), \(\beta _1 \ge \left( {t} / {(n^{\frac{d - 1}{2}} \Vert B\Vert _{2 \rightarrow 2})} \right) ^2\).

For \(\kappa \ge 2\), we can use the fourth term in (29) to show the same bound because

$$\begin{aligned} \beta _\kappa \ge&\left( \frac{t}{n^{\frac{d}{2} - \frac{\kappa }{4}} \Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{4}{\kappa } = n \left( \frac{t}{n^{\frac{d}{2}} \Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{4}{\kappa } \\ \ge&n \left( \frac{t}{n^{\frac{d}{2}} \Vert B\Vert _{2 \rightarrow 2} } \right) ^2 = \frac{t^2}{n^{d - 1} \Vert B\Vert _{2 \rightarrow 2}^2}. \end{aligned}$$

This implies that

$$\begin{aligned} {\mathbb {P}}\left( \left|\Vert B X\Vert _2 - \Vert B\Vert _F \right|> t \right) \le e^2 \exp \left( - C_2(d) \frac{t^2}{n^{d - 1} \Vert B\Vert _{2 \rightarrow 2}^2 } \right) . \end{aligned}$$

Step 4: Bound for \(t \ge n^\frac{d}{2} \Vert B\Vert _{2 \rightarrow 2}\)

For all \(\kappa \in [2 d]\), using the fourth term in (29) yields

$$\begin{aligned} \beta _\kappa \ge&\left( \frac{t}{n^{\frac{d}{2} - \frac{\kappa }{4}} \Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{4}{\kappa } = n \left( \frac{t}{n^{\frac{d}{2}} \Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{4}{\kappa } \\ \ge&n \left( \frac{t}{n^{\frac{d}{2}} \Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{4}{2 d} = \left( \frac{t}{\Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{2}{d}, \end{aligned}$$

such that

$$\begin{aligned} {\mathbb {P}}\left( \left|\Vert B X\Vert _2 - \Vert B\Vert _F \right|> t \right) \le e^2 \exp \left( - C_2(d) \left( \frac{t}{\Vert B\Vert _{2 \rightarrow 2} } \right) ^\frac{2}{d} \right) . \end{aligned}$$

Step 5: Bound for \(n^{\frac{d - 1}{4}} \Vert B\Vert _{2 \rightarrow 2} \le t \le n^{\frac{d - 1}{4}} \Vert B\Vert _F\)

Using the third term in (29), we obtain that

$$\begin{aligned} \beta _\kappa \ge&\left( \frac{t^2}{n^{\frac{d - 1}{2}} \Vert B\Vert _{2 \rightarrow 2} \Vert B\Vert _F} \right) ^\frac{2}{\kappa } \ge \left( \frac{t n^{\frac{d - 1}{4}} \Vert B\Vert _{2 \rightarrow 2}}{n^{\frac{d - 1}{2}} \Vert B\Vert _{2 \rightarrow 2} \Vert B\Vert _F} \right) ^\frac{2}{\kappa } \\ =&\left( \frac{t}{n^{\frac{d - 1}{4}} \Vert B\Vert _F} \right) ^\frac{2}{\kappa } \ge \frac{t^2}{n^{\frac{d - 1}{2}} \Vert B\Vert _F^2}, \end{aligned}$$

implying

$$\begin{aligned} {\mathbb {P}}\left( \left|\Vert B X\Vert _2 - \Vert B\Vert _F \right|> t \right) \le e^2 \exp \left( - C_2(d) \frac{t^2}{n^{\frac{d - 1}{2}} \Vert B\Vert _F^2} \right) . \end{aligned}$$

\(\square \)

4 Discussion

In total, for a chaos of the type

$$\begin{aligned} \sum _{i_1, \ldots , i_{2 d}=1}^n A_{i_1, \ldots , i_d, i_{d + 1}, \ldots , i_{2 d}} X^{(1)}_{i_1} \cdots X^{(d)}_{i_d} X^{(1)}_{i_{d + 1}} \cdots X^{(d)}_{i_{2 d}}, \end{aligned}$$

we have shown moment bounds that are tight (up to dependence on d) for the Gaussian case. Along with this, we have also shown a specific decoupling inequality for the above expression and improved moment and tail bounds for \(\Vert B(X^{(1)} \otimes \dots \otimes X^{(d)})\Vert _2\).

The application [14] generalizes the result in [5] on constructing Johnson–Lindenstrauss embeddings from matrices satisfying the restricted isometry property to Johnson–Lindenstrauss embeddings with a fast transformation of Kronecker products. This leads to expressions of the type \(\Vert \Phi D_\xi x\Vert _2^2\), where \(\Phi \in {\mathbb {R}}^{m \times N}\) is a matrix, \(x \in {\mathbb {R}}^N\) a vector, and \(D_\xi \in {\mathbb {R}}^{N \times N}\) is a diagonal matrix with entries from \(\xi = \xi ^{(1)} \otimes \dots \otimes \xi ^{(d)}\), where \(\xi ^{(1)}, \ldots , \xi ^{(d)}\) are independent Rademacher vectors. Then \(\Vert \Phi D_\xi x\Vert _2^2\) can be rewritten as a chaos of the above type with the Rademacher vectors \(\xi ^{(l)}\) as \(X^{(l)}\). This chaos is controlled with the decoupling statement of Theorem 4, more specifically the Rademacher case (Remark 3), in which the terms significantly simplify. After some necessary intermediate steps, the upper moment bounds of Theorem 3 are applied to the resulting decoupled chaos.
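To make the structure of this application concrete, here is a minimal sketch (our own; \(\Phi \), x and all dimensions are arbitrary placeholders, not the fast construction of the cited works) of the quantity \(\Vert \Phi D_\xi x\Vert _2^2\) with a Kronecker-structured sign pattern.

```python
import numpy as np

# Sketch of || Phi D_xi x ||_2^2 with xi = xi^(1) kron ... kron xi^(d) (illustration only;
# Phi and x are arbitrary placeholders, not the construction from the cited works).
rng = np.random.default_rng(5)
d, n, m = 3, 4, 16
N = n ** d
Phi = rng.standard_normal((m, N)) / np.sqrt(m)
x = rng.standard_normal(N)

xi = np.ones(1)
for _ in range(d):
    xi = np.kron(xi, rng.choice([-1.0, 1.0], size=n))

val = np.linalg.norm(Phi @ (xi * x)) ** 2   # D_xi x is just the entrywise product xi * x
print(val, np.linalg.norm(x) ** 2)          # in the JL setting one studies how this
                                            # concentrates around ||x||_2^2
```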