Probability Theory and Related Fields, Volume 162, Issue 3–4, pp 531–586

Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order

Open Access
Article

Abstract

Building on the inequalities for homogeneous tetrahedral polynomials in independent Gaussian variables due to R. Latała, we provide a concentration inequality for not necessarily Lipschitz functions \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) with bounded derivatives of higher orders, which holds when the underlying measure satisfies a family of Sobolev type inequalities
$$\begin{aligned} \Vert g- \mathbb Eg\Vert _p \le C(p)\Vert \nabla g\Vert _p. \end{aligned}$$
Such Sobolev type inequalities hold, e.g., if the underlying measure satisfies the log-Sobolev inequality (in which case \(C(p) \le C\sqrt{p}\)) or the Poincaré inequality (then \(C(p) \le Cp\)). Our concentration estimates are expressed in terms of tensor-product norms of the derivatives of \(f\). When the underlying measure is Gaussian and \(f\) is a polynomial (not necessarily tetrahedral or homogeneous), our estimates can be reversed (up to a constant depending only on the degree of the polynomial). We also show that for polynomial functions, analogous estimates hold for arbitrary random vectors with independent sub-Gaussian coordinates. We apply our inequalities to general additive functionals of random vectors (in particular linear eigenvalue statistics of random matrices) and the problem of counting cycles of fixed length in Erdős–Rényi random graphs, obtaining new estimates, optimal in a certain range of parameters.

Keywords

Concentration of measure · Gaussian chaos · Sobolev inequalities

Mathematics Subject Classification

Primary 60E15, 46N30; Secondary 60B20, 05C80

1 Introduction

Concentration of measure inequalities are one of the basic tools in modern probability theory (see the monograph [46]). The prototypic result for all concentration theorems is arguably the Gaussian concentration inequality [14, 62], which asserts that if \(G\) is a standard Gaussian vector in \(\mathbb {R}^n\) and \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) is a 1-Lipschitz function, then for all \(t > 0\),
$$\begin{aligned} \mathbb P(|f(G) - \mathbb Ef(G)|\ge t) \le 2\exp (-t^2/2). \end{aligned}$$
Over the years the above inequality has found numerous applications in the analysis of Gaussian processes, as well as in asymptotic geometric analysis (e.g. in modern proofs of Dvoretzky type theorems). Its applicability in geometric situations comes from the fact that it is dimension free and all norms in \(\mathbb {R}^n\) are Lipschitz with respect to one another. However, there are some probabilistic or combinatorial situations, when one is concerned with functions that are not Lipschitz. The most basic case is the probabilistic analysis of polynomials in independent random variables, which arise naturally, e.g., in the study of multiple stochastic integrals, in discrete harmonic analysis as elements of the Fourier expansions on the discrete cube or in numerous problems of random graph theory, to mention just the famous subgraph counting problem [22, 23, 26, 27, 35, 36, 49].

The concentration of measure or more generally integrability properties for polynomials have attracted a lot of attention in the last forty years. In particular Bonami [13] and Nelson [55] provided hypercontractive estimates (Khintchine type inequalities) for polynomials on the discrete cube and in the Gauss space, which have been later extended to other random variables by Kwapień and Szulga [41] (see also [42]). Khintchine type inequalities have been also obtained in the absence of independence for polynomials under log-concave measures by Bourgain [19], Bobkov [10], Nazarov-Sodin-Volberg [54] and Carbery-Wright [21].

Another line of research is to provide two-sided estimates for moments of polynomials in terms of deterministic functions of the coefficients. Borell [15] and Arcones-Giné [5] provided such two-sided bounds for homogeneous polynomials in Gaussian variables. They were expressed in terms of expectations of suprema of certain empirical processes. Talagrand [64] and Bousquet-Boucheron-Lugosi-Massart [17, 18] obtained counterparts of these results for homogeneous tetrahedral polynomials (that is, polynomials in which every variable appears with power at most one) in Rademacher variables and Łochowski [48] and Adamczak [1] for random variables with log-concave tails. Inequalities of this type, while implying (up to constants) hypercontractive bounds, have a serious downside as the analysis of the empirical processes involved is in general difficult. It is therefore important to obtain two-sided bounds in terms of purely deterministic quantities. Such bounds for random quadratic forms in independent symmetric random variables with log-concave tails have been obtained by Latała [43] (the case of linear forms was solved earlier by Gluskin and Kwapień [29], whereas bounds for quadratic forms in Gaussian variables were obtained by Hanson-Wright [32], Borell [15] and Arcones-Giné [5]). Their counterparts for multilinear forms of arbitrary degree in nonnegative random variables with log-concave tails have been derived by Latała and Łochowski [45]. As for the symmetric case, the general problem is still open. An important breakthrough has been obtained by Latała [44], who proved two-sided estimates for Gaussian chaos of arbitrary order, that is for homogeneous tetrahedral polynomials of arbitrary degree in independent Gaussian variables (we recall his bounds below as they are the starting point for our investigations). For general symmetric random variables with log-concave tails similar bounds are known only for chaos of order at most three [2].

Polynomials in independent random variables have been also investigated in relation with combinatorial problems, e.g. with subgraph counting [22, 23, 26, 27, 35, 36, 49]. The best known result for general polynomials in this area has been obtained by Kim and Vu [37, 65], who presented a family of powerful inequalities for \([0,1]\)-valued random variables. Over the last decade they have been applied successfully to handle many problems in probabilistic combinatorics. Some recent inequalities for polynomials in subexponential random variables have been also obtained by Schudy and Sviridenko [59, 60]. They are a generalization of the special case of exponential random variables in [45] and are expressed in terms of quantities similar to those considered by Kim-Vu.

Since it is beyond the scope of this paper to give a precise account of all the concentration inequalities for polynomials, we refer the Reader to the aforementioned sources and recommend also the monographs [24, 42], where some parts of the theory are presented in a uniform way. As already mentioned we will present in detail only the results from [44], which are our main tool as well as motivation.

As for concentration results for general non-Lipschitz functions, the only reference we are aware of that addresses this question is [30], where the authors obtain interesting inequalities for stationary measures of certain Markov processes and functions satisfying a Lyapunov type condition. Their bounds are not comparable to the ones which we present in this paper. On the one hand they work in a more general Markov process setting; on the other hand, it seems that their results are restricted to the Gaussian-exponential concentration and do not apply to functionals with heavier tails (such as polynomials of degree higher than two). Since the language of [30] is very different from ours, we will not describe the inequalities obtained therein and refer the interested Reader to the original paper.

Let us now proceed to the presentation of our results. To do this we will first formulate a two-sided tail and moment inequality for homogeneous tetrahedral polynomials in i.i.d. standard Gaussian variables due to Latała [44]. To present it in a concise way we need to introduce some notation which we will use throughout the article. For a positive integer \(n\) we will denote \([n] = \{1,\ldots ,n\}\). The cardinality of a set \(I\) will be denoted by \(\# I\). For \(\mathbf{i}= (i_1,\ldots ,i_d) \in [n]^d\) and \(I\subseteq [d]\) we write \(\mathbf{i}_{I}=(i_{k})_{k\in I}\). We will also denote \(|\mathbf{i}| = \max _{j\le d} {i_j}\) and \(|\mathbf{i}_I| = \max _{j \in I} i_j\).

Consider thus a \(d\)-indexed matrix \(A = (a_{i_1,\ldots ,i_d})_{i_1,\ldots ,i_d = 1}^n\), such that \(a_{i_1,\ldots ,i_d} = 0\) whenever \(i_j = i_k\) for some \(j\ne k\), a sequence \(g_1,\ldots ,g_n\) of i.i.d. \(\mathcal {N}(0,1)\) random variables and define
$$\begin{aligned} Z = \sum _{\mathbf{i}\in [n]^d} a_{\mathbf{i}} g_{i_1}\ldots g_{i_d}. \end{aligned}$$
(1)
Without loss of generality we can assume that the matrix \(A\) is symmetric, i.e., for all permutations \(\sigma :[d]\rightarrow [d], a_{i_1,\ldots ,i_d} = a_{i_{\sigma (1)},\ldots ,i_{\sigma (d)}}\).
Let now \(P_d\) be the set of partitions of \([d]\) into nonempty, pairwise disjoint sets. For a partition \(\mathcal {J} =\{J_1,\ldots ,J_k\}\), and a \(d\)-indexed matrix \(A = (a_\mathbf{i})_{\mathbf{i}\in [n]^d}\) (not necessarily symmetric or with zeros on the diagonal), define
$$\begin{aligned} \Vert A\Vert _{\mathcal{J}}=\sup \left\{ \sum _{\mathbf{i}\in [n]^d} a_{\mathbf{i}}\prod _{l=1}^k x^{(l)}_{\mathbf {i}_{J_l}}:\left\| (x^{(l)}_{\mathbf {i}_{J_l}})\right\| _2\le 1, 1\le l\le k \right\} , \end{aligned}$$
(2)
where \(\left\| (x_{\mathbf{i}_{J_l}})\right\| _2 = \sqrt{\sum _{|\mathbf{i}_{J_l}|\le n} x_{\mathbf{i}_{J_l}}^2}\). Thus, e.g.,
$$\begin{aligned} \left\| (a_{ij})_{i,j\le n}\right\| _{\{1,2\}}&= \sup \left\{ \sum _{i,j\le n} a_{ij}x_{ij}:\sum _{i,j\le n} x_{ij}^2 \le 1\right\} \\&= \sqrt{\sum _{i,j\le n}a_{ij}^2} = \left\| (a_{ij})_{i,j\le n}\right\| _{\mathrm{HS}}\!,\\ \left\| (a_{ij})_{i,j\le n}\right\| _{\{1\}\{2\}}&= \sup \left\{ \sum _{i,j\le n} a_{ij}x_iy_j:\sum _{i\le n} x_{i}^2\le 1,\sum _{j\le n}y_j^2 \le 1\right\} \\&= \left\| (a_{ij})_{i,j\le n}\right\| _{\ell _2^n\rightarrow \ell _2^n}\!,\\ \left\| (a_{ijk})_{i,j,k\le n}\right\| _{\{1,2\} \{3\}}&= \sup \left\{ \sum _{i,j,k\le n} a_{ijk}x_{ij}y_k:\sum _{i,j\le n} x_{ij}^2\le 1,\sum _{k\le n}y_k^2 \le 1\right\} . \end{aligned}$$
From the functional analytic perspective the above norms are injective tensor product norms of \(A\) seen as a multilinear form on \( (\mathbb {R}^{n})^d\) with the standard Euclidean structure.
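The three example norms above can be evaluated numerically. The following is a minimal sketch, assuming NumPy and 0-based axis labels (so the partition \(\{1,2\}\{3\}\) is written `[(0, 1), (2,)]`); the helper name `partition_norm` is hypothetical and not taken from the article. It covers only partitions with one or two blocks, where the supremum in (2) is a Frobenius norm or the largest singular value of a flattened matrix. For three or more blocks no such closed form is used here and the sketch deliberately refuses to guess.

```python
import numpy as np

def partition_norm(A, partition):
    """Return ||A||_J for a partition J of the axes of A into 1 or 2 blocks.

    `partition` is a list of tuples of 0-based axes (an assumption of this
    sketch; the article indexes coordinates from 1).
    """
    A = np.asarray(A, dtype=float)
    blocks = [tuple(b) for b in partition]
    if any(len(b) == 0 for b in blocks):
        raise ValueError("blocks must be nonempty")
    if sorted(ax for b in blocks for ax in b) != list(range(A.ndim)):
        raise ValueError("blocks must partition the axes of A")
    if len(blocks) == 1:
        # One block: the supremum over a single unit vector is the
        # Euclidean (Hilbert-Schmidt) norm of all entries.
        return float(np.sqrt(np.sum(A ** 2)))
    if len(blocks) == 2:
        # Two blocks: flatten A into a matrix with rows indexed by the first
        # block and columns by the second; the supremum of a bilinear form
        # over two unit vectors is the largest singular value.
        rows, cols = blocks
        n_rows = int(np.prod([A.shape[ax] for ax in rows]))
        M = np.transpose(A, rows + cols).reshape(n_rows, -1)
        return float(np.linalg.norm(M, 2))
    raise NotImplementedError("only one- and two-block partitions are handled")

# Example: for the 3 x 3 identity matrix the two norms differ.
I3 = np.eye(3)
hs_norm = partition_norm(I3, [(0, 1)])      # Hilbert-Schmidt norm
op_norm = partition_norm(I3, [(0,), (1,)])  # operator norm
```

In this example the Hilbert-Schmidt norm is \(\sqrt{3}\) while the operator norm is \(1\), which illustrates the general inequality \(\Vert A\Vert _\mathcal {J} \le \Vert A\Vert _{\{1,\ldots ,d\}}\).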

We are now ready to present the inequalities by Latała. Below, as in the whole article by \(C_d\) we denote a constant, which depends only on \(d\). The values of \(C_d\) may differ between occurrences.

Theorem 1.1

(Latała [44]) For any \(d\)-indexed symmetric matrix \(A = (a_{\mathbf{i}})_{\mathbf{i}\in [n]^d}\) such that \(a_\mathbf{i}= 0\) if \(i_j = i_k\) for some \(j\ne k\), the random variable \(Z\), defined by (1) satisfies for all \(p \ge 2\),
$$\begin{aligned} C_d^{-1} \sum _{\mathcal {J}\in P_d} p^{\#\mathcal {J}/2} \Vert A\Vert _{\mathcal {J}} \le \Vert Z\Vert _p \le C_d \sum _{\mathcal {J}\in P_d} p^{\#\mathcal {J}/2} \Vert A\Vert _{\mathcal {J}}. \end{aligned}$$
As a consequence, for all \(t > 0\),
$$\begin{aligned} C_d^{-1}\exp \left( -C_d\min _{\mathcal {J}\in P_d} \left( \frac{t}{\Vert A\Vert _\mathcal {J}}\right) ^{2/\#\mathcal {J}}\right)&\le \mathbb P(|Z| \ge t)\\&\le C_d\exp \left( -\frac{1}{C_d}\min _{\mathcal {J}\in P_d} \left( \frac{t}{\Vert A\Vert _\mathcal {J}}\right) ^{2/\#\mathcal {J}}\right) . \end{aligned}$$

It is worth noting that for \(\#\mathcal {J} > 1\), the norms \(\Vert A\Vert _{\mathcal {J}}\) are not unconditional in the standard basis (decreasing coefficients of the matrix may not result in decreasing the norm). Moreover, for specific matrices they may not be easy to compute. On the other hand, for any \(d\)-indexed matrix \(A\) and any \(\mathcal {J} \in P_d\), we have \(\Vert A\Vert _\mathcal {J} \le \Vert A\Vert _{\{1,\ldots ,d\}} = \sqrt{\sum _{\mathbf{i}} a_\mathbf{i}^2}\). Using this fact in the upper estimates above allows one to recover (up to constants depending on \(d\)) hypercontractive estimates for homogeneous tetrahedral polynomials due to Nelson.
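As an illustration (a worked special case, not an additional statement from the article), take \(d = 2\) in Theorem 1.1. The set \(P_2\) consists of the two partitions \(\{\{1,2\}\}\) and \(\{\{1\},\{2\}\}\), and by the examples following (2) the corresponding norms are the Hilbert-Schmidt norm and the operator norm. With \(C_2\) denoting the constant of the theorem for \(d=2\), the moment and tail estimates then read:

```latex
$$\begin{aligned}
C_2^{-1}\Big(\sqrt{p}\,\Vert A\Vert_{\mathrm{HS}} + p\,\Vert A\Vert_{\ell_2^n\rightarrow \ell_2^n}\Big)
  &\le \Vert Z\Vert_p
  \le C_2\Big(\sqrt{p}\,\Vert A\Vert_{\mathrm{HS}} + p\,\Vert A\Vert_{\ell_2^n\rightarrow \ell_2^n}\Big),
  \qquad p\ge 2,\\
C_2^{-1}\exp\Big(-C_2\min\Big(\frac{t^2}{\Vert A\Vert_{\mathrm{HS}}^2},
  \frac{t}{\Vert A\Vert_{\ell_2^n\rightarrow \ell_2^n}}\Big)\Big)
  &\le \mathbb P(|Z|\ge t)
  \le C_2\exp\Big(-\frac{1}{C_2}\min\Big(\frac{t^2}{\Vert A\Vert_{\mathrm{HS}}^2},
  \frac{t}{\Vert A\Vert_{\ell_2^n\rightarrow \ell_2^n}}\Big)\Big),
  \qquad t>0.
\end{aligned}$$
```

The exponents come from \((t/\Vert A\Vert _\mathcal {J})^{2/\#\mathcal {J}}\): the one-block partition gives the sub-Gaussian term \(t^2/\Vert A\Vert _{\mathrm{HS}}^2\), and the two-block partition gives the subexponential term \(t/\Vert A\Vert _{\ell _2^n\rightarrow \ell _2^n}\).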

Our main result is an extension of the upper bound given in the above theorem to more general random functions and measures. Below we present the most basic setting we will work with and state the corresponding theorems. Some additional extensions are deferred to the main body of the article.

We will consider a random vector \(X\) in \(\mathbb {R}^n\), which satisfies the following family of Sobolev inequalities. For any \(p \ge 2\) and any smooth integrable function \(f:\mathbb {R}^n \rightarrow \mathbb {R}\),
$$\begin{aligned} \Big \Vert f(X)-\mathbb Ef(X)\Big \Vert _p \le L\sqrt{p}\Big \Vert |\nabla f(X)|\Big \Vert _p, \end{aligned}$$
(3)
for some constant \(L\) (independent of \(p\) and \(f\)), where \(|\cdot |\) is the standard Euclidean norm on \(\mathbb {R}^n\). It is known (see [3] and Theorem 3.4 below) that if \(X\) satisfies the logarithmic Sobolev inequality (15) with constant \(D_{LS}\), then it satisfies (3) with \(L = \sqrt{D_{LS}}\). We remark that there are many criteria for a random vector to satisfy the logarithmic Sobolev inequality (see e.g. [7, 8, 11, 39, 46]), so in particular our assumption (3) can be verified for many random vectors of interest.

Our first result is the following theorem, which provides moment estimates and concentration for \(D\)-times differentiable functions. The estimates are expressed by \(\Vert \cdot \Vert _{\mathcal {J}}\) norms of derivatives of the function (which we will identify with multi-indexed matrices). We will denote the \(d\)-th derivative of \(f\) by \(\mathbf {D}^d f\).

Theorem 1.2

Assume that a random vector \(X\) in \(\mathbb {R}^n\) satisfies the inequality (3) with constant \(L\). Let \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) be a function of the class \(\mathcal {C}^D\). For all \(p \ge 2\) if \(\mathbf {D}^Df(X) \in L^p\), then
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p&\le C_D\Big (L^D\sum _{\mathcal {J} \in P_D} p^{\frac{\#\mathcal {J}}{2}} \Big \Vert \Vert \mathbf {D}^Df(X)\Vert _\mathcal {J}\Big \Vert _p\\&\quad + \sum _{1\le d\le D-1}L^d\sum _{\mathcal {J}\in P_d} p^{\frac{\#\mathcal {J}}{2}} \Vert \mathbb E\mathbf {D}^df(X)\Vert _\mathcal {J}\Big ). \end{aligned}$$
In particular if \(\mathbf {D}^Df (x)\) is uniformly bounded on \(\mathbb {R}^n\), then setting
$$\begin{aligned} \eta _f(t)&= \min \left( \min _{\mathcal {J}\in P_D}\left( \frac{t}{L^{D}\sup _{x\in \mathbb {R}^n}\Vert \mathbf {D}^D f(x)\Vert _\mathcal {J}}\right) ^{\frac{2}{\#\mathcal {J}}},\right. \\&\quad \left. \min _{1\le d\le D-1}\min _{\mathcal {J}\in P_d} \left( \frac{t}{L^{d}\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}}\right) ^{\frac{2}{\#\mathcal {J}}}\right) \end{aligned}$$
we obtain for \(t > 0\),
$$\begin{aligned} \mathbb P(|f(X) - \mathbb Ef(X)| \ge t ) \le 2\exp \Big (-\frac{1}{C_D} \eta _f(t)\Big ). \end{aligned}$$

The above theorem is quite technical, so we will now provide a few comments, comparing it to known results.

1. It is easy to see that if \(D = 1\), Theorem 1.2 reduces (up to absolute constants) to the Gaussian-like concentration inequality, which can be obtained from (3) by Chebyshev’s inequality (applied to general \(p\) and optimized).

2. If \(f\) is a homogeneous tetrahedral polynomial of degree \(D\), then the tail and moment estimates of Theorem 1.2 coincide with those from Latała’s Theorem. Thus Theorem 1.2 provides an extension of the upper bound from Latała’s result to a larger class of measures and functions (however we would like to stress that our proof relies heavily on Latała’s work).

3. If \(f\) is a general polynomial of degree \(D\), then \(\mathbf {D}^D f(x)\) is constant on \(\mathbb {R}^n\) (and thus equal to \(\mathbb E\mathbf {D}^D f(X)\)). Therefore in this case the function \(\eta _f\) appearing in Theorem 1.2 can be written in a simplified form
$$\begin{aligned} \eta _f(t) = \min _{1\le d\le D}\min _{\mathcal {J}\in P_d} \Bigg (\frac{t}{L^{d}\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}}\Bigg )^{2/\#\mathcal {J}}. \end{aligned}$$
(4)
4. For polynomials in Gaussian variables, the estimates given in Theorem 1.2 can be reversed, like in Theorem 1.1. More precisely we have the following theorem, which provides an extension of Theorem 1.1 to general polynomials.

Theorem 1.3

If \(G\) is a standard Gaussian vector in \(\mathbb {R}^n\) and \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) is a polynomial of degree \(D\), then for all \(p \ge 2\),
$$\begin{aligned} C_D^{-1} \sum _{1\le d\le D}\sum _{\mathcal {J}\in P_d} p^{\frac{\#\mathcal {J}}{2}} \Vert \mathbb E\mathbf {D}^df(G)\Vert _\mathcal {J}&\le \Vert f(G) - \mathbb Ef(G)\Vert _p \\&\le C_D \sum _{1\le d\le D}\sum _{\mathcal {J}\in P_d} p^{\frac{\#\mathcal {J}}{2}} \Vert \mathbb E\mathbf {D}^df(G)\Vert _\mathcal {J}. \end{aligned}$$
Moreover for all \(t > 0\),
$$\begin{aligned} \frac{1}{C_D}\exp \Bigg (-C_D \eta _f(t)\Bigg ) \le \mathbb P(|f(G) - \mathbb Ef(G)| \ge t) \le C_D\exp \Bigg (-\frac{1}{C_D} \eta _f(t)\Bigg ), \end{aligned}$$
where
$$\begin{aligned} \eta _f(t) = \min _{1\le d\le D}\min _{\mathcal {J}\in P_d} \Bigg (\frac{t}{\Vert \mathbb E\mathbf {D}^d f(G)\Vert _\mathcal {J}}\Bigg )^{2/\#\mathcal {J}}. \end{aligned}$$
5. It is well known that concentration of measure for general Lipschitz functions fails e.g. on the discrete cube and one has to impose some additional convexity assumptions to get sub-Gaussian concentration [63]. It turns out that if we restrict to polynomials, estimates in the spirit of Theorems 1.1 and 1.2 still hold. To formulate our result in full generality recall the definition of the \(\psi _2\) Orlicz norm of a random variable \(Y\),
$$\begin{aligned} \Vert Y\Vert _{\psi _2} = \inf \Bigg \{ t > 0:\mathbb E\exp \Bigg (\frac{Y^2}{t^2}\Bigg ) \le 2\Bigg \}. \end{aligned}$$
By integration by parts and Chebyshev’s inequality \(\Vert Y\Vert _{\psi _2} < \infty \) is equivalent to a sub-Gaussian tail decay for \(Y\). We have the following result for polynomials in sub-Gaussian random vectors with independent components.

Theorem 1.4

Let \(X = (X_1,\ldots ,X_n)\) be a random vector with independent components, such that for all \(i \le n, \Vert X_i\Vert _{\psi _2} \le L\). Then for every polynomial \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) of degree \(D\) and every \(p \ge 2\),
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \le C_D \sum _{d=1}^D L^d \sum _{\mathcal {J}\in P_d} p^{\#\mathcal {J}/2}\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}. \end{aligned}$$
As a consequence, for any \(t > 0\),
$$\begin{aligned} \mathbb P\Big (|f(X) - \mathbb Ef(X)| \ge t\Big ) \le 2\exp \Big (-\frac{1}{C_D}\eta _f(t)\Big ), \end{aligned}$$
where
$$\begin{aligned} \eta _f(t) = \min _{1\le d\le D}\min _{\mathcal {J}\in P_d} \Bigg (\frac{t}{L^{d}\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}}\Bigg )^{2/\#\mathcal {J}}. \end{aligned}$$
6. To give the Reader a flavour of possible applications let us mention the Hanson-Wright inequality [32]. Namely, for a random vector \(X = (X_1,\ldots ,X_n)\) in \(\mathbb {R}^n\) with square integrable and mean-zero components and a real symmetric matrix \(A = (a_{ij})_{i,j \le n}\), consider the random variable
$$\begin{aligned} Z = \sum _{i,j = 1}^n a_{ij} X_i X_j. \end{aligned}$$
If the components of \(X\) are independent and \(\Vert X_i\Vert _{\psi _2} \le L\) for \(i=1,\ldots ,n\), then it follows immediately from Theorem 1.4 that for all \(t>0\),
$$\begin{aligned} \mathbb P\big (|Z-\mathbb EZ| \ge t) \le 2 \exp \bigg (-\frac{1}{C} \min \Big (\frac{t^2}{L^4 \Vert A\Vert _{\mathrm{HS}}^2}, \frac{t}{L^2 \Vert A\Vert _{\ell _2^n \rightarrow \ell _2^n}}\Big )\bigg ). \end{aligned}$$
(5)
Similarly, Theorem 1.2 implies (5) under the assumption that \(X\) satisfies the log-Sobolev inequality (15) with constant \(L^2\) (no independence of components of \(X\) is assumed). Moreover, if \(X\) is a standard Gaussian vector in \(\mathbb {R}^n\), then by Theorem 1.3 the tail estimate (5) can be reversed up to numerical constants. We postpone further applications of our theorems to subsequent sections of the article and here we announce only that apart from polynomials we apply Theorem 1.2 to additive functionals and \(U\)-statistics of random vectors, in particular to linear eigenvalue statistics of random matrices, obtaining bounds which complement known estimates by Guionnet and Zeitouni [31]. Theorem 1.4 is applied to the problem of subgraph counting in large random graphs. In the special case when one counts copies of a given cycle in a random graph \(G(n,p)\), our result allows one to obtain a tail inequality which is optimal whenever \(p \ge n^{-\frac{k-2}{2(k-1)}} \log ^{-\frac{1}{2}} n\), where \(k\) is the length of the cycle. To the best of our knowledge this is the sharpest currently known result for this range of \(p\).

7. Let us now briefly discuss optimality of our inequalities. The lower bound in Theorem 1.3 clearly shows that Theorem 1.2 is optimal in the class of measures and functions it covers up to constants depending only on \(D\). As for Theorem 1.4, it is similarly optimal in the class of random vectors with independent sub-Gaussian coordinates. In concrete combinatorial applications, for \(0\)-\(1\) random variables this theorem may however be suboptimal. This can be seen already for \(D = 1\), for a linear combination of independent Bernoulli variables \(X_1,\ldots ,X_n\) with \(\mathbb P(X_i = 1) = 1 - \mathbb P(X_i=0) = p\). When \(p\) becomes small, the tail bound for such variables given e.g. by the Chernoff inequality is more subtle than what can be obtained from general inequalities for sums of sub-Gaussian random variables and the fact that \(\Vert X_i\Vert _{\psi _2}\) is of order \((\log (2/p))^{-1/2}\). Roughly speaking, this is the reason why in our estimates for random graphs we have the restriction on how small \(p\) can be. At the same time our inequalities still give results comparable to what can be obtained from other general inequalities for polynomials. As already noted in the survey [36], bounds obtained from various general inequalities for the subgraph-counting problem may not be directly comparable, i.e. those performing well in one case may exhibit worse performance in some other cases. Similarly, our inequalities cannot be in general compared e.g. to the estimates by Kim and Vu [37, 38]. For this reason and since it would require introducing new notation, we will not discuss their estimates and just indicate, when presenting applications of Theorem 1.4, several situations when our inequalities perform in a better or worse way than those by Kim and Vu. Let us only mention that the Kim-Vu inequalities similarly as ours are expressed in terms of higher order derivatives of the polynomials. However, Kim and Vu (as well as Schudy and Sviridenko) look at maxima of absolute values of partial derivatives, which does not lead to tensor-product norms which we consider. While in the general sub-Gaussian case we consider, such tensor product norms cannot be avoided (in view of Theorem 1.3), it is not necessarily the case for \(0\)-\(1\) random variables.

8. A version of Theorem 1.2 for vectors of independent random variables satisfying the modified logarithmic Sobolev inequality (see e.g. [28]) instead of the classical log-Sobolev inequality is also discussed. In particular, in Theorem 3.4 we relate the modified log-Sobolev inequality to a certain Sobolev-type inequality with a non-Euclidean norm of the gradient and with the constant independent of the dimension.
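To make the step "it follows immediately from Theorem 1.4" in comment 6 explicit, here is a short worked computation, offered as a sketch under the assumptions stated there (independent mean-zero components with \(\Vert X_i\Vert _{\psi _2}\le L\), symmetric \(A\)). Write \(f(x) = \sum _{i,j} a_{ij}x_ix_j\), so that \(D = 2\).

```latex
$$\begin{aligned}
\nabla f(x) &= 2Ax, & \mathbb E\,\mathbf D^1 f(X) &= 2A\,\mathbb EX = 0,\\
\mathbf D^2 f(x) &= 2A, & \mathbb E\,\mathbf D^2 f(X) &= 2A,\\
\eta_f(t) &= \min\Bigg(\Big(\frac{t}{L^2\Vert 2A\Vert_{\{1,2\}}}\Big)^{2},\
  \frac{t}{L^2\Vert 2A\Vert_{\{1\}\{2\}}}\Bigg)
  &&= \min\Bigg(\frac{t^2}{4L^4\Vert A\Vert_{\mathrm{HS}}^2},\
  \frac{t}{2L^2\Vert A\Vert_{\ell_2^n\rightarrow \ell_2^n}}\Bigg).
\end{aligned}$$
```

The term with \(d = 1\) does not contribute to the minimum because \(\mathbb E\mathbf {D}^1 f(X)\) vanishes. Substituting this \(\eta _f\) into the tail bound of Theorem 1.4 and absorbing the numerical factors \(4\) and \(2\) into the constant gives (5).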

The organization of the paper is as follows. First, in Sect. 2, we introduce the notation used in the paper, next in Sect. 3 we give the proof of Theorem 1.2 together with some generalizations and examples of applications. In Sect. 4 we prove Theorem 1.3, whereas in Sect. 5 we present the proof of Theorem 1.4 and applications to the subgraph counting problems. In Sect. 6 we provide further refinements of estimates from Sect. 3 in the case of independent random variables satisfying modified log-Sobolev inequalities (they are deferred to the end of the article as they are more technical than those of Sect. 3). In the Appendix we collect some additional facts used in the proofs.

2 Notation

Sets and indices For a positive integer \(n\) we will denote \([n] = \{1,\ldots ,n\}\). The cardinality of a set \(I\) will be denoted by \(\# I\).

For \(\mathbf{i}= (i_1,\ldots ,i_d)\in [n]^d\) and \(I\subseteq [d]\) we write \(\mathbf{i}_{I}=(i_{k})_{k\in I}\). We will also denote \(|\mathbf{i}| = \max _{j\le d} {i_j}\) and \(|\mathbf{i}_I| = \max _{k\in I} i_k\).

For a finite set \(A\) and an integer \(d \ge 0\) we set
$$\begin{aligned} A^{\underline{d}} = \{\mathbf{i}= (i_1,\ldots ,i_d) \in A^d:\forall _{j,k \in \{1,\ldots ,d\}}\quad \ j\ne k \Rightarrow i_j\ne i_k\} \end{aligned}$$
(i.e. \(A^{\underline{d}}\) is the set of \(d\)-indices with pairwise distinct coordinates). Accordingly we will denote \(n^{\underline{d}} = n(n-1)\cdots (n-d+1)\).

For a finite set \(I\), by \(P_I\) we will denote the family of partitions of \(I\) into nonempty, pairwise disjoint sets. For simplicity we will write \(P_d\) instead of \(P_{[d]}\).
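The index sets and partition families defined above are small combinatorial objects that can be enumerated directly. The following minimal sketch uses only the Python standard library; the function names `set_partitions`, `falling_factorial` and `distinct_indices` are hypothetical and chosen for this illustration.

```python
from itertools import permutations

def set_partitions(elements):
    """Yield every partition of `elements` into nonempty, disjoint blocks.

    Each partition is a list of blocks, each block a list of elements.
    """
    elements = list(elements)
    if not elements:
        yield []          # the empty set has exactly one (empty) partition
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        # Either add `first` to one of the existing blocks ...
        for k in range(len(smaller)):
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]
        # ... or let it form a block of its own.
        yield [[first]] + smaller

def falling_factorial(n, d):
    """Return n(n-1)...(n-d+1), with the empty product equal to 1."""
    result = 1
    for k in range(d):
        result *= n - k
    return result

def distinct_indices(n, d):
    """Return all d-indices in [n]^d with pairwise distinct coordinates."""
    return list(permutations(range(1, n + 1), d))

# P_3 has five elements; their block counts are the exponents #J that
# appear in the moment bounds as p^{#J/2}.
block_counts = sorted(len(P) for P in set_partitions([1, 2, 3]))
```

The number of elements of \(P_d\) grows quickly with \(d\) (these are the Bell numbers), which is one reason the constants \(C_d\) in the theorems are allowed to depend on \(d\).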

For a finite set \(I\) by \(\ell _2(I)\) we will denote the finite dimensional Euclidean space \(\mathbb {R}^I\) endowed with the standard Euclidean norm \(|x|_2 = \sqrt{\sum _{i\in I} x_i^2}\). Whenever there is no risk of confusion we will denote the standard Euclidean norm simply by \(|\cdot |\).

Multi-indexed matrices For a function \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) by \(\mathbf {D}^d f(x)\) we will denote the (\(d\)-indexed) matrix of its derivatives of order \(d\), which we will identify with the corresponding symmetric \(d\)-linear form. If \(M = (M_\mathbf{i})_{\mathbf{i}\in [n]^d}, N = (N_\mathbf{i})_{\mathbf{i}\in [n]^d}\) are \(d\)-indexed matrices, we define \(\langle M,N\rangle =\sum _{\mathbf{i}\in [n]^d} M_\mathbf{i}N_\mathbf{i}\). Thus for all vectors \(y^{1},\ldots ,y^{d} \in \mathbb {R}^n\) we have \(\mathbf {D}^d f(x) (y^{1},\ldots ,y^{d}) = \langle \mathbf {D}^d f(x),y^{1}\otimes \cdots \otimes y^{d}\rangle \), where \(y^{1}\otimes \cdots \otimes y^{d} = (y^{1}_{i_1}y^{2}_{i_2}\ldots y^{d}_{i_d})_{\mathbf{i}\in [n]^d}\).

We will also define the Hadamard product of two such matrices \(M\circ N\) as a \(d\)-indexed matrix with entries \(m_{\mathbf{i}} = M_\mathbf{i}N_\mathbf{i}\) (pointwise multiplication of entries).
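The operations on multi-indexed matrices introduced above translate directly into array operations. The sketch below assumes NumPy; the helper names `inner`, `tensor_product` and `hadamard` are hypothetical. As a check it uses the quadratic form \(f(x) = \sum _{i,j} a_{ij}x_ix_j\) with symmetric \(A\), whose second derivative is the constant matrix \(2A\).

```python
from functools import reduce

import numpy as np

def inner(M, N):
    """<M, N>: sum of entrywise products of two d-indexed matrices."""
    return float(np.sum(np.asarray(M) * np.asarray(N)))

def tensor_product(*vectors):
    """y^1 (x) ... (x) y^d as a d-indexed array with entries
    y^1_{i_1} * ... * y^d_{i_d}."""
    return reduce(np.multiply.outer, vectors)

def hadamard(M, N):
    """Hadamard product: entrywise multiplication of two d-indexed matrices."""
    return np.asarray(M) * np.asarray(N)

# Quadratic form f(x) = sum_{i,j} a_ij x_i x_j with symmetric A: D^2 f = 2A.
A = np.array([[1.0, 2.0], [2.0, 3.0]])
y = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])

# D^2 f(x)(y, z) computed as <D^2 f(x), y (x) z> ...
lhs = inner(2 * A, tensor_product(y, z))
# ... and directly as the bilinear form 2 y^T A z.
rhs = float(2 * y @ A @ z)
```

Both `lhs` and `rhs` evaluate the same bilinear form, which is the identity \(\mathbf {D}^d f(x) (y^{1},\ldots ,y^{d}) = \langle \mathbf {D}^d f(x),y^{1}\otimes \cdots \otimes y^{d}\rangle \) for \(d = 2\).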

Let us also define the notion of “generalized diagonals” of a \(d\)-indexed matrix \(A = (a_\mathbf{i})_{\mathbf{i}\in [n]^d}\). For a fixed set \(K \subseteq [d]\), with \(\#K > 1\), the “generalized diagonal” corresponding to \(K\) is the set of indices \(\{\mathbf{i}\in [n]^d:i_k = i_l\;\text {for all}\; k,l \in K\}\).

Constants We will use the letter \(C\) to denote absolute constants and \(C_a\) for constants depending only on some parameter \(a\). In both cases the values of such constants may differ between occurrences.

3 A concentration inequality for non-Lipschitz functions

In this Section we prove Theorem 1.2. Let us first state our main tool, which is an inequality by Latała in a decoupled version.

Theorem 3.1

(Latała [44]) Let \(A = (a_\mathbf{i})_{\mathbf{i}\in [n]^d}\) be a \(d\)-indexed matrix with real entries and let \(G_1,G_2,\ldots ,G_d\) be i.i.d. standard Gaussian vectors in \(\mathbb {R}^n\). Let \(Z = \langle A, G_1\otimes \cdots \otimes G_d\rangle \). Then for every \(p\ge 2\),
$$\begin{aligned} C_d^{-1}\sum _{\mathcal {J}\in P_d}p^{\#\mathcal {J}/2}\Vert A\Vert _\mathcal {J} \le \Vert Z\Vert _p \le C_d\sum _{\mathcal {J}\in P_d}p^{\#\mathcal {J}/2}\Vert A\Vert _\mathcal {J}. \end{aligned}$$

Thanks to general decoupling inequalities for \(U\)-statistics [25], which we recall in the “Appendix” (Theorem 7.1), the above theorem is formally equivalent to Theorem 1.1. In fact in [44] Latała first proves the above version. In the proof of Theorem 3.3, which is a slight generalization of Theorem 1.2, we will need just Theorem 3.1 (in particular in this part of the article we do not need any decoupling inequalities).

From now on we will work in a more general setting than in Theorem 1.2 and assume that \(X\) is a random vector in \(\mathbb {R}^n\), such that for all \(p\ge 2\) there exists a constant \(L_X(p)\) such that for all bounded \(\mathcal {C}^1\) functions \(f :\mathbb {R}^n \rightarrow \mathbb {R}\),
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \le L_X(p) \Big \Vert |\nabla f(X)|\Big \Vert _p. \end{aligned}$$
(6)
Clearly in this situation the above inequality generalizes to all \(\mathcal {C}^1\) functions (if the right-hand side is finite then the left-hand side is well defined and the inequality holds).
Let now \(G\) be a standard \(n\)-dimensional Gaussian vector, independent of \(X\). Using the Fubini theorem together with the fact that for some absolute constant \(C\), all \(x \in \mathbb {R}^n\) and \(p \ge 2, C^{-1}\sqrt{p}|x| \le \Vert \langle x, G\rangle \Vert _p \le C\sqrt{p}|x|\), we can linearise the right-hand side above and write (6) equivalently (up to absolute constants) as
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \le \frac{C L_X(p)}{\sqrt{p}} \Big \Vert \langle \nabla f(X),G\rangle \Big \Vert _p. \end{aligned}$$
(7)
We remark that similar linearisation has been used by Maurey and Pisier to provide a simple proof of the Gaussian concentration inequality [57, 58] (see the remark following Theorem 3.3 below). Inequality (7) has an advantage over (6) as it allows for iteration leading to the following simple proposition.
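The two-sided comparison \(C^{-1}\sqrt{p}|x| \le \Vert \langle x, G\rangle \Vert _p \le C\sqrt{p}|x|\) used in the linearisation can be checked numerically. Since \(\langle x, G\rangle \) is a centered Gaussian variable with variance \(|x|^2\), its \(p\)-th moment norm equals \(|x|\,\Vert g\Vert _p\) for a standard Gaussian \(g\), and \(\mathbb E|g|^p = 2^{p/2}\Gamma ((p+1)/2)/\sqrt{\pi }\). The sketch below uses only the Python standard library; the function names are hypothetical.

```python
import math

def gaussian_moment_norm(p):
    """||g||_p for g ~ N(0, 1), from E|g|^p = 2^{p/2} Gamma((p+1)/2) / sqrt(pi).

    The computation is done on the log scale to avoid overflow for large p.
    """
    log_moment = (0.5 * p * math.log(2.0)
                  + math.lgamma(0.5 * (p + 1.0))
                  - 0.5 * math.log(math.pi))
    return math.exp(log_moment / p)

def linear_form_norm(x, p):
    """||<x, G>||_p for a standard Gaussian vector G: |x| times ||g||_p."""
    euclidean_norm = math.sqrt(sum(v * v for v in x))
    return euclidean_norm * gaussian_moment_norm(p)

# Ratio ||g||_p / sqrt(p) over a range of p; it stays between two
# absolute constants, which is exactly what the linearisation needs.
ratios = [gaussian_moment_norm(p) / math.sqrt(p) for p in range(2, 201)]
```

In this range the ratios lie between \(0.6\) and \(0.71\), so the passage from (6) to (7) costs only an absolute constant.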

Proposition 3.2

Let \(X\) be an \(n\)-dimensional random vector satisfying (6). Let \(f :\mathbb {R}^n \rightarrow \mathbb {R}\) be a \(\mathcal {C}^D\) function. Let moreover \(G_1,\ldots ,G_D\) be independent standard Gaussian vectors in \(\mathbb {R}^n\), independent of \(X\). Then for all \(p \ge 2\) such that \(\mathbf {D}^D f(X) \in L^p\),
$$\begin{aligned} \Big \Vert f(X) - \mathbb Ef(X)\Big \Vert _p&\le \frac{C^{D}L_X(p)^D}{p^{D/2}}\Big \Vert \Big \langle \mathbf {D}^D f(X), G_1\otimes \cdots \otimes G_D\Big \rangle \Big \Vert _p \nonumber \\&+ \sum _{1\le d\le D-1} \frac{C^{d}L_X(p)^d}{p^{d/2}}\Big \Vert \Big \langle \mathbb E_X \mathbf {D}^d f(X), G_1\otimes \cdots \otimes G_d\Big \rangle \Big \Vert _p.\nonumber \\ \end{aligned}$$
(8)

Proof

Induction on \(D\). For \(D = 1\) the assertion of the proposition coincides with (7), which (as already noted) is equivalent to (6). Let us assume that the proposition holds for \(D-1\). Applying thus (8) with \(D-1\) instead of \(D\), we obtain
$$\begin{aligned} \Big \Vert f(X) - \mathbb Ef(X)\Big \Vert _p&\le \frac{C^{D-1}L_X(p)^{D-1}}{p^{(D-1)/2}}\Big \Vert \Big \langle \mathbf {D}^{D-1} f(X),G_1\otimes \cdots \otimes G_{D-1}\Big \rangle \Big \Vert _p\nonumber \\&+ \sum _{d=1}^{D-2} \frac{C^{d}L_X(p)^{d}}{p^{d/2}}\Big \Vert \Big \langle \mathbb E_X \mathbf {D}^d f(X),G_1\otimes \cdots \otimes G_d\Big \rangle \Big \Vert _p.\nonumber \\ \end{aligned}$$
(9)
Applying now the triangle inequality in \(L^p\), we get
$$\begin{aligned}&\Big \Vert \Big \langle \mathbf {D}^{D-1} f(X),G_1\otimes \cdots \otimes G_{D-1}\Big \rangle \Big \Vert _p\nonumber \\&\le \Big \Vert \Big \langle \mathbf {D}^{D-1} f(X) - \mathbb E_X \mathbf {D}^{D-1}f(X),G_1\otimes \cdots \otimes G_{D-1}\Big \rangle \Big \Vert _p \nonumber \\&+\Big \Vert \Big \langle \mathbb E_X \mathbf {D}^{D-1}f(X),G_1\otimes \cdots \otimes G_{D-1}\Big \rangle \Big \Vert _p. \end{aligned}$$
(10)
Let us now apply (7) conditionally on \(G_1,\ldots ,G_{D-1}\) to the function \(f_1(x) = \Big \langle \mathbf {D}^{D-1} f(x),G_1\otimes \cdots \otimes G_{D-1}\Big \rangle \). Since \(\Big \langle \mathbf {D}^{D-1} f(X) - \mathbb E_X \mathbf {D}^{D-1}f(X),G_1\otimes \cdots \otimes G_{D-1}\Big \rangle = f_1(X) - \mathbb E_X f_1(X)\) and \(\langle \nabla f_1(X),G_D\rangle = \langle \mathbf {D}^D f(X),G_1\otimes \cdots \otimes G_D\rangle \), we obtain
$$\begin{aligned}&\mathbb E_X \Big |\Big \langle \mathbf {D}^{D-1} f(X)- \mathbb E_X\mathbf {D}^{D-1} f(X),G_1\otimes \cdots \otimes G_{D-1}\Big \rangle \Big |^p \\&\quad \le \frac{C^pL_X(p)^p}{p^{p/2}} \mathbb E_{X,G_D} \Big |\Big \langle \mathbf {D}^D f(X),G_1\otimes \cdots \otimes G_D\Big \rangle \Big |^p. \end{aligned}$$
To finish the proof it is now enough to integrate this inequality with respect to the remaining Gaussian vectors and combine the obtained estimate with (9) and (10).\(\square \)

Let us now specialize to the case when \(L_X(p) = Lp^\gamma \) for some \(L>0,\gamma \ge 1/2\). Combining the above proposition with Latała’s Theorem 3.1, we obtain immediately the following theorem, a special case of which is Theorem 1.2.

Theorem 3.3

Assume that \(X\) is a random vector in \(\mathbb {R}^n\), such that for some constants \(L>0,\gamma \ge 1/2\), all smooth bounded functions \(f\) and all \(p \ge 2\),
$$\begin{aligned} \Big \Vert f(X)-\mathbb Ef(X)\Big \Vert _p \le Lp^\gamma \Big \Vert |\nabla f(X)|\Big \Vert _p. \end{aligned}$$
(11)
Let \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) be a function of class \(\mathcal {C}^D\). For all \(p \ge 2\), if \(\mathbf {D}^D f(X) \in L^p\), then
$$\begin{aligned} \Big \Vert f(X)-\mathbb Ef(X)\Big \Vert _p&\le C_D\Bigg (\sum _{\mathcal {J}\in P_D} L^Dp^{(\gamma - 1/2)D + \#\mathcal {J}/2} \Big \Vert \Vert \mathbf {D}^D f(X)\Vert _\mathcal {J}\Big \Vert _p\\&+\sum _{1\le d\le D-1} \sum _{\mathcal {J}\in P_d} L^d p^{(\gamma -1/2)d + \# \mathcal {J}/2}\Vert \mathbb E\mathbf {D}^d f(X)\Vert _{\mathcal {J}}\Bigg ). \end{aligned}$$
Moreover, if \(\mathbf {D}^D f \) is bounded uniformly on \(\mathbb {R}^n\), then for all \(t > 0\),
$$\begin{aligned} \mathbb P\Big (|f(X)-\mathbb Ef(X)| \ge t \Big ) \le 2\exp \Big (-\frac{1}{C_D}\eta _f(t)\Big ), \end{aligned}$$
where
$$\begin{aligned} \eta _f(t)&= \min (A,B),\\ A&= \min _{\mathcal {J}\in P_D}\Bigg (\Big (\frac{t}{L^{D}\sup _{x\in \mathbb {R}^n}\Vert \mathbf {D}^D f(x)\Vert _\mathcal {J}}\Big )^{2/((2\gamma -1) D+ \#\mathcal {J})}\Bigg ),\\ B&= \min _{1\le d\le D-1} \min _{\mathcal {J}\in P_d} \Bigg (\Big (\frac{t}{L^{d}\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}}\Big )^{2/((2\gamma -1)d+\#\mathcal {J})}\Bigg ). \end{aligned}$$

Proof

The first part is a straightforward combination of Proposition 3.2 and Theorem 3.1. The second part follows from the first one by Chebyshev’s inequality \(\mathbb P\Big (|Y|\ge e\Vert Y\Vert _p\Big ) \le \exp (-p)\) applied with \(p = \eta _f(t)/C_D\) (note that if \(\eta _f(t)/C_D\le 2\) then one can make the tail bound asserted in the theorem trivial by adjusting the constants).\(\square \)
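The moment form of Chebyshev's inequality used in this proof can be sanity-checked in a case where both sides are explicit. The sketch below is only an illustration and not part of the argument: it takes \(Y\) to be a standard Gaussian variable (our choice), uses the closed form \(\mathbb E|Y|^p = 2^{p/2}\Gamma ((p+1)/2)/\sqrt{\pi }\), and compares the exact tail at level \(e\Vert Y\Vert _p\) with \(e^{-p}\). All function names are hypothetical.

```python
import math


def gaussian_abs_moment_norm(p: float) -> float:
    """L^p norm of Y ~ N(0, 1), from E|Y|^p = 2^(p/2) Gamma((p+1)/2) / sqrt(pi)."""
    moment = 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)
    return moment ** (1.0 / p)


def gaussian_two_sided_tail(s: float) -> float:
    """Exact value of P(|Y| >= s) for Y ~ N(0, 1)."""
    return math.erfc(s / math.sqrt(2.0))


def check_moment_tail_bound(p: float) -> bool:
    """Check P(|Y| >= e * ||Y||_p) <= exp(-p) in the Gaussian example."""
    threshold = math.e * gaussian_abs_moment_norm(p)
    return gaussian_two_sided_tail(threshold) <= math.exp(-p)
```

In the proof the same inequality is applied with \(p = \eta _f(t)/C_D\), so that the moment bound of the first part is converted into the stated tail estimate.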

Remark

In [57, 58] Pisier presents a stronger inequality than (11) with \(\gamma = 1/2\). More specifically, he proves that if \(X,G\) are independent standard centered Gaussian vectors in \(\mathbb {R}^n\), \(E\) is a Banach space and \(f :\mathbb {R}^n \rightarrow E\) is a \(\mathcal {C}^1\) function, then for every convex function \(\Phi :E \rightarrow \mathbb {R}\),
$$\begin{aligned} \mathbb E\Phi (f(X) - \mathbb Ef(X)) \le \mathbb E\Phi \Big (L \langle \nabla f(X),G\rangle \Big ), \end{aligned}$$
(12)
where \(L = \frac{\pi }{2}\). As noted in [47], Caffarelli’s contraction principle [20] implies that, e.g., a random vector \(X\) with density \(e^{-V}\), where \(V:\mathbb {R}^n \rightarrow \mathbb {R}\) satisfies \(D^2 V \ge \lambda \mathrm {Id}\) for some \(\lambda > 0\), satisfies the above inequality with \(L = \frac{\pi }{2\sqrt{\lambda }}\) (where \(G\) is still a standard Gaussian vector independent of \(X\)). Therefore in this situation an approach similar to the proof of Proposition 3.2 can be used for functions \(f\) with values in a general Banach space. Moreover, a counterpart of Latała’s results is known for chaos with values in a Hilbert space (to the best of our knowledge this observation has not been published, but in fact it can be quite easily obtained from the version for real-valued chaos). Thus in this case we can obtain a counterpart of Theorem 3.3 (with \(\gamma = 1/2\)) for Hilbert space-valued functions. In the case of a general Banach space, two-sided estimates with deterministic quantities for Gaussian chaos are not known. Still, one can use some known inequalities (like hypercontraction or the Borell-Arcones-Giné inequality) instead of Theorem 3.1 and thus obtain new concentration bounds. We remark that if one uses hypercontraction, one can obtain explicit dependence of the constants on the degree of the polynomial, since explicit constants are known for hypercontractive estimates of (Banach space-valued) Gaussian chaos and one can keep track of them during the proof. We skip the details.
In view of Theorem 3.3 a natural question arises: for what measures is the inequality (11) satisfied? Before we provide examples, for technical reasons let us recall the definition of the length of the gradient of a locally Lipschitz function. For a metric space \((\mathcal {X},d)\), a locally Lipschitz function \(f:\mathcal {X} \rightarrow \mathbb {R}\) and \(x \in \mathcal {X}\), we define
$$\begin{aligned} |\nabla f|(x) = \limsup _{d(x,y)\rightarrow 0} \frac{|f(y)-f(x)|}{d(x,y)}. \end{aligned}$$
(13)
If \(\mathcal {X} = \mathbb {R}^n\) with a Euclidean metric and \(f\) is differentiable at \(x\), then clearly \(|\nabla f|(x)\) coincides with the Euclidean length of the usual gradient \(\nabla f(x)\). For this reason, with a slight abuse of notation, we will write \(|\nabla f(x)|\) instead of \(|\nabla f|(x)\). We will consider only measures on \(\mathbb {R}^n\), however since we allow measures which are not necessarily absolutely continuous with respect to the Lebesgue measure, at some points in the proofs we will work with the above abstract definition.
Going back to the question of measures satisfying (11), it is well known (see e.g. [52]) that if \(X\) satisfies the Poincaré inequality
$$\begin{aligned} \mathrm{Var\,}(f(X)) \le D_{Poin}\mathbb E|\nabla f(X)|^2 \end{aligned}$$
(14)
for all locally Lipschitz bounded functions, then \(X\) satisfies (11) with \(\gamma = 1\) and \(L = C\sqrt{D_{Poin}}\) (recall that \(C\) always denotes a universal constant). Assume now that \(X\) satisfies the logarithmic Sobolev inequality
$$\begin{aligned} \mathrm {Ent}f^2(X) \le 2D_{LS} \mathbb E|\nabla f(X)|^2 \end{aligned}$$
(15)
for locally Lipschitz bounded functions, where for a nonnegative random variable \(Y\),
$$\begin{aligned} \mathrm {Ent}Y = \mathbb EY\log Y - \mathbb EY\log (\mathbb EY). \end{aligned}$$
Then, by the results from [3], it follows that \(X\) satisfies (11) with \(\gamma = 1/2\) and \(L = \sqrt{D_{LS}}\).

We will now generalize this observation to measures satisfying the modified logarithmic Sobolev inequality (introduced in [28]). We will present it in greater generality than needed for proving (11), since we will use it later (in Sect. 6) to prove refined concentration results for random vectors with independent Weibull coordinates.

Let \(\beta \in [2,\infty )\). We will say that a random vector \(Y \in \mathbb {R}^k\) satisfies a \(\beta \)-modified logarithmic Sobolev inequality if for every locally Lipschitz bounded positive function \(f :\mathbb {R}^k \rightarrow (0,\infty )\),
$$\begin{aligned} \mathrm {Ent}f^2(Y) \le D_{LS_\beta } \Big (\mathbb E|\nabla f(Y)|^2 + \mathbb E\frac{|\nabla f(Y)|^\beta }{f(Y)^{\beta -2}}\Big ). \end{aligned}$$
(16)
Let us also introduce two quantities, measuring the length of the gradient in product spaces. Consider a locally Lipschitz function \(f :\mathbb {R}^{mk} \rightarrow \mathbb {R}\), where we identify \(\mathbb {R}^{mk}\) with the \(m\)-fold Cartesian product of \(\mathbb {R}^k\). Let \(x = (x_1,\ldots ,x_m)\), where \(x_i \in \mathbb {R}^k\). For each \(i=1,\ldots ,m\), let \(|\nabla _i f(x)|\) be the length of the gradient of \(f\), treated as a function of \(x_i\) only, with the other coordinates fixed. Now for \(r \ge 1\), set
$$\begin{aligned} |\nabla f(x)|_r = \Big (\sum _{i=1}^m |\nabla _i f(x)|^r\Big )^{1/r}. \end{aligned}$$
Note that if \(f\) is differentiable at \(x\), then \(|\nabla f(x)|_2 = |\nabla f(x)|\) (the Euclidean length of the “true” gradient), whereas for \(k = 1\) (and \(f\) differentiable), \(|\nabla f(x)|_r\) is the \(\ell _r^m\) norm of \(\nabla f(x)\).
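For a differentiable \(f\) the quantity \(|\nabla f(x)|_r\) is straightforward to compute from the usual gradient. The following minimal sketch assumes the gradient is given as a flat list of length \(mk\), ordered block by block (a storage convention chosen here for illustration; the function name is hypothetical).

```python
import math
from typing import Sequence


def block_gradient_norm(grad: Sequence[float], m: int, k: int, r: float) -> float:
    """|grad f(x)|_r for a gradient in R^{mk} split into m consecutive blocks of length k."""
    if len(grad) != m * k:
        raise ValueError("gradient must have length m * k")
    # |nabla_i f(x)| is the Euclidean length of the i-th block (f differentiable).
    block_lengths = [
        math.sqrt(sum(g * g for g in grad[i * k:(i + 1) * k]))
        for i in range(m)
    ]
    return sum(length ** r for length in block_lengths) ** (1.0 / r)
```

With \(r = 2\) this returns the Euclidean length of the whole gradient, and with \(k = 1\) it returns the \(\ell _r^m\) norm, in agreement with the two observations above.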

Theorem 3.4

Let \(\beta \in [2,\infty )\) and \(Y\) be a random vector in \(\mathbb {R}^k\), satisfying (16). Consider a random vector \(X = (X_1,\ldots ,X_m)\) in \(\mathbb {R}^{mk}\), where \(X_1,\ldots ,X_m\) are independent copies of \(Y\). Then for any locally Lipschitz \(f :\mathbb {R}^{mk} \rightarrow \mathbb {R}\) such that \(f(X)\) is integrable, and \(p \ge 2\),
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \le C_\beta D_{LS_\beta }^{1/2} p^{1/2}\Big \Vert |\nabla f(X)|_2\Big \Vert _p + D_{LS_\beta }^{1/\beta }p^{1/\alpha }\Big \Vert |\nabla f(X)|_\beta \Big \Vert _p, \end{aligned}$$
(17)
where \(\alpha = \frac{\beta }{\beta -1}\) is the Hölder conjugate of \(\beta \).

In particular using the above theorem with \(m = 1\) and \(k = n\), we obtain the following

Corollary 3.5

If \(X\) is a random vector in \(\mathbb {R}^n\) which satisfies the \(\beta \)-modified log-Sobolev inequality (16), then it satisfies (11) with \(\gamma = \frac{\beta -1}{\beta } \ge \frac{1}{2}\) and \(L = C_\beta \max (D_{LS_\beta }^{1/2},D_{LS_\beta }^{1/\beta })\).

We remark that in the class of logarithmically concave random vectors, the \(\beta \)-modified log-Sobolev inequality is known to be equivalent to concentration for 1-Lipschitz functions of the form \(\mathbb P\Big (|f(X) - \mathbb Ef(X)| \ge t \Big ) \le 2\exp \Big (-c t^{\beta /(\beta -1)}\Big )\) [53].

Proof of Theorem 3.4

By the tensorization property of entropy (see e.g. [46], Proposition 5.6) we get for all positive locally Lipschitz bounded functions \(f :\mathbb {R}^{mk} \rightarrow (0,\infty )\),
$$\begin{aligned} \mathrm {Ent}f^2(X) \le D_{LS_\beta }\Bigg (\mathbb E\Big |\nabla f(X)\Big |_2^2 + \sum _{i=1}^m \mathbb E\frac{|\nabla _i f(X)|^\beta }{f(X)^{\beta -2}}\Bigg ). \end{aligned}$$
(18)
Following [3], consider now any locally Lipschitz bounded \(f > 0\) and denote \(F(t) = \mathbb Ef(X)^t\). For \(t > 2\),
$$\begin{aligned} F'(t) = \mathbb E\left( f(X)^t \log f(X) \right) \end{aligned}$$
and
$$\begin{aligned} \frac{d}{dt} \left( \mathbb Ef(X)^t \right) ^{2/t}&= \frac{d}{dt} F(t)^{2/t} = F(t)^{2/t} \cdot \frac{d}{dt} \left( \frac{2}{t} \log F(t) \right) \\&= F(t)^{2/t} \left( \frac{2}{t} \frac{F'(t)}{F(t)} - \frac{2}{t^2} \log F(t) \right) \\&= \frac{2}{t^2} F(t)^{\frac{2}{t} - 1} \left( t F'(t) - F(t) \log F(t) \right) \\&= \frac{2}{t^2} \left( \mathbb Ef(X)^t \right) ^{\frac{2}{t} - 1} \left( \mathbb E\left( f(X)^t \log f(X)^t \right) \right. \\&\left. - \left( \mathbb Ef(X)^t \right) \log \left( \mathbb Ef(X)^t \right) \right) . \end{aligned}$$
By (18) applied to the function \(g = f^{t/2} = \varphi \circ f \) where \(\varphi (u) = |u|^{t/2}\),
$$\begin{aligned} \frac{d}{dt} \left( \mathbb Ef(X)^t \right) ^{2/t}&\le \frac{2}{t^2} \left( \mathbb Ef(X)^t \right) ^{\frac{2}{t} - 1} \cdot D_{LS_\beta } \Big (\mathbb E\Big |\nabla (\varphi \circ f)(X)\Big |_2^2 \\&+ \mathbb E\Big |\nabla (\varphi \circ f)(X)\Big |_\beta ^\beta f(X)^{t(2-\beta )/2}\Big ). \end{aligned}$$
By the chain rule and the Hölder inequality for the pair of conjugate exponents \(t/2, t/(t-2)\),
$$\begin{aligned} \mathbb E\Big |\nabla (\varphi \circ f)(X)\Big |_2^2&= \mathbb E\Big ( \Big |\varphi '(f(X)) \Big | \cdot \Big | \nabla f(X)\Big |_2 \Big )^2 \\&\le \Big ( \mathbb E|\nabla f(X)|_2^t \Big )^{2/t} \left( \mathbb E\left( \varphi ^{\prime }(f(X))\right) ^{2t/(t-2)} \right) ^{(t-2)/t} \\&= \big \Vert |\nabla f(X)|_2\big \Vert _t^2 \cdot \left( \frac{t^2}{4} \right) \left( \mathbb Ef(X)^t \right) ^{1 - \frac{2}{t}}. \end{aligned}$$
Similarly, for \(t \ge \beta \),
$$\begin{aligned} \mathbb E\Big |\nabla (\varphi \circ f)(X)\Big |_\beta ^\beta f(X)^{t(2-\beta )/2}&= \frac{t^\beta }{2^\beta }\mathbb Ef(X)^{(t/2-1)\beta }\Big |\nabla f(X)\Big |_\beta ^\beta f(X)^{t(2-\beta )/2}\\&= \frac{t^\beta }{2^\beta } \mathbb Ef(X)^{t-\beta } \Big |\nabla f(X)\Big |_\beta ^\beta \\&\le \frac{t^\beta }{2^\beta } \Big (\mathbb Ef(X)^t\Big )^{1-\beta /t} \Big (\mathbb E\Big |\nabla f(X)\Big |_\beta ^t \Big )^{\beta /t}\\&= \frac{t^\beta }{2^\beta } \Big (\mathbb Ef(X)^t\Big )^{1-\beta /t} \Big \Vert |\nabla f(X)|_\beta \Big \Vert _t^\beta . \end{aligned}$$
Thus we get for \(\beta \le t \le p\),
$$\begin{aligned} \frac{d}{dt} \left( \mathbb Ef(X)^t \right) ^{2/t} \!\le \! \frac{D_{LS_\beta }}{2}\Big \Vert |\nabla f(X)|_2\Big \Vert _p^2 \!+\! \frac{D_{LS_\beta }}{2^{\beta -1}}t^{\beta -2}\Big (\mathbb Ef(X)^t\Big )^{(2-\beta )/t}\big \Vert |\nabla f(X)|_\beta \big \Vert _p^\beta . \end{aligned}$$
Denote \( a = \frac{D_{LS_\beta }}{2}\Big \Vert |\nabla f(X)|_2\Big \Vert _p^2, b = \frac{D_{LS_\beta }}{2^{\beta -1}}\Big \Vert |\nabla f(X)|_\beta \Big \Vert _p^\beta , g(t) = \left( \mathbb Ef(X)^t \right) ^{2/t}\). The above inequality can be written as
$$\begin{aligned} g^{\beta /2-1}\frac{d}{dt} g \le g^{\beta /2 - 1} a + t^{\beta - 2} b \end{aligned}$$
for \(t \in [\beta ,p]\) or, denoting \(G = g^{\beta /2}\),
$$\begin{aligned} \frac{d}{dt} G \le \frac{\beta }{2}(G^{(\beta -2)/\beta } a + t^{\beta -2}b). \end{aligned}$$
For \(\varepsilon > 0\) consider now the function \(H_\varepsilon (t) = (g(\beta ) + a (t-\beta ) + b^{2/\beta } t^{2 - 2/\beta }+\varepsilon )^{\beta /2}\). We have
$$\begin{aligned} H_\varepsilon (\beta ) > G(\beta ) \end{aligned}$$
and
$$\begin{aligned} \frac{d}{dt} H_\varepsilon (t) = \frac{\beta }{2} H_\varepsilon (t)^{(\beta -2)/\beta } \Big (a \!+\! (2\!-\!2/\beta )t^{1-2/\beta }b^{2/\beta }\Big ) \!\ge \! \frac{\beta }{2}\Big (H_\varepsilon (t)^{(\beta -2)/\beta }a \!+\! t^{\beta -2}b\Big ), \end{aligned}$$
where we used the assumption \(\beta \ge 2\). Using the last three inequalities together with the fact that for \(t \ge 0\) the function \(x \mapsto x^{(\beta -2)/\beta }a + t^{\beta -2}b\) is increasing on \([0,\infty )\) we obtain that \(G(t) \le H_\varepsilon (t)\) for all \(t \in [\beta ,p]\), which by taking \(\varepsilon \rightarrow 0^+\) implies that for \(p \ge \beta \),
$$\begin{aligned} g(p)&= G(p)^{2/\beta } \le H_0(p)^{2/\beta } \le g(\beta ) + \frac{D_{LS_\beta }}{2}(p-\beta )\big \Vert |\nabla f(X)|_2\big \Vert _p^2\\&+\, \frac{D_{LS_\beta }^{2/\beta }}{2} p^{2-2/\beta }\big \Vert |\nabla f(X)|_\beta \big \Vert _p^2, \end{aligned}$$
i.e.,
$$\begin{aligned} \Vert f(X)\Vert _p^2 \le \Vert f(X)\Vert _\beta ^2 \!+\! \frac{D_{LS_\beta }}{2}(p\!-\!\beta )\big \Vert |\nabla f(X)|_2\big \Vert _p^2 \!+\! \frac{D_{LS_\beta }^{2/\beta }}{2} p^{2-2/\beta }\big \Vert |\nabla f(X)|_\beta \big \Vert _p^2. \end{aligned}$$
(19)
The above inequality has been proved so far for strictly positive, locally Lipschitz functions (the boundedness assumption can be easily removed by truncation and passage to the limit). For the case of a general locally Lipschitz function \(f\), take any \(\varepsilon >0\) and consider \(\tilde{f} = |f| + \varepsilon \). Since \(\tilde{f}\) is strictly positive and locally Lipschitz, the above inequality holds also for \(\tilde{f}\). Taking \(\varepsilon \rightarrow 0^+\), we can now extend (19) to arbitrary locally Lipschitz \(f\).
Finally, assume \(f :\mathbb {R}^{mk} \rightarrow \mathbb {R}\) is locally Lipschitz and \(f(X)\) is integrable. Applying (19) to \(f - \mathbb Ef(X)\) instead of \(f\) and taking the square root, we obtain
$$\begin{aligned}&\Vert f(X) - \mathbb Ef(X)\Vert _p \le \Vert f(X) - \mathbb Ef(X)\Vert _\beta + \sqrt{D_{LS_\beta }(p-\beta )}\big \Vert |\nabla f(X)|_2\big \Vert _p\\&\quad + D_{LS_\beta }^{1/\beta } p^{1/\alpha }\big \Vert |\nabla f(X)|_\beta \big \Vert _p \end{aligned}$$
for \(p \ge \beta \). For \(p \in [2, \beta ]\), since (16) implies the Poincaré inequality with constant \(D_{LS_\beta }/2\) (see Proposition 2.3. in [28]), we get
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \le C D_{LS_\beta }^{1/2} p\big \Vert |\nabla f(X)|_2\big \Vert _p \end{aligned}$$
(see the remark following (14)). These two estimates yield (17) with \(C_\beta = C \sqrt{\beta }\).\(\square \)
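The comparison step in the above proof, namely that \(H_\varepsilon \) is a supersolution of the differential inequality satisfied by \(G\), can be checked numerically. The sketch below evaluates both sides of the inequality for \(\frac{d}{dt}H_\varepsilon \) on a grid of \([\beta ,p]\). The parameter values passed to it are arbitrary test inputs (with \(a \ge 0\), \(b > 0\)), and the function names are hypothetical; this is an illustration, not a replacement for the argument.

```python
def comparison_sides(t, beta, a, b, g_beta, eps):
    """Return (H_eps'(t), (beta/2) * (H_eps(t)^((beta-2)/beta) * a + t^(beta-2) * b))."""
    inner = g_beta + a * (t - beta) + b ** (2.0 / beta) * t ** (2.0 - 2.0 / beta) + eps
    # H_eps(t) = inner^(beta/2), hence H_eps(t)^((beta-2)/beta) = inner^((beta-2)/2).
    h_power = inner ** ((beta - 2.0) / 2.0)
    h_prime = (beta / 2.0) * h_power * (
        a + (2.0 - 2.0 / beta) * t ** (1.0 - 2.0 / beta) * b ** (2.0 / beta)
    )
    rhs = (beta / 2.0) * (h_power * a + t ** (beta - 2.0) * b)
    return h_prime, rhs


def supersolution_on_grid(beta, a, b, g_beta, eps, p, steps=200, rel_tol=1e-9):
    """Check H_eps' >= right-hand side on a uniform grid of [beta, p], up to rounding."""
    for k in range(steps + 1):
        t = beta + (p - beta) * k / steps
        h_prime, rhs = comparison_sides(t, beta, a, b, g_beta, eps)
        if h_prime < rhs - rel_tol * (1.0 + abs(rhs)):
            return False
    return True
```

For \(\beta = 2\) the two sides coincide, which is why a small relative tolerance is used in the comparison.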

3.1 Applications of Theorem 1.2

Let us now present certain applications of estimates established in the previous section. For simplicity we will restrict to the basic setting presented in Theorem 1.2.

3.1.1 Polynomials

A typical application of Theorem 1.2 would be to obtain tail inequalities for multivariate polynomials in the random vector \(X\). The constants involved in such estimates do not depend on the dimension, but only on the degree of the polynomial. As already mentioned in the introduction, our results in this setting can be considered as a transference of inequalities by Latała from the tetrahedral Gaussian case to the case of not necessarily product random vectors and general polynomials.

3.1.2 Additive functionals and related statistics

We will now consider three classes of additive statistics of a random vector, often arising in various problems.

Additive functionals Let \(X\) be a random vector in \(\mathbb {R}^n\) satisfying (3). For a function \(f:\mathbb {R}\rightarrow \mathbb {R}\) define the random variable
$$\begin{aligned} Z_f = f(X_1)+\cdots +f(X_n). \end{aligned}$$
(20)
It is classical and follows from (3) by a simple application of the Chebyshev inequality that if \(f\) is smooth with \(\Vert f'\Vert _\infty \le \alpha \), then for all \(t > 0\),
$$\begin{aligned} \mathbb P\big (|Z_f - \mathbb EZ_f| \ge t\big ) \le e^2\exp \Big (-\frac{t^2}{e^2 nL^2\alpha ^2}\Big ). \end{aligned}$$
(21)
Using Theorem 1.2 we can easily obtain inequalities which hold if \(f\) is a polynomial-like function, i.e., if \(\Vert f^{(D)}\Vert _\infty < \infty \) for some \(D\). Note that the derivatives of the function \(F(x_1,\ldots ,x_n) = f(x_1)+\cdots +f(x_n)\) have a very simple diagonal form. In consequence, calculating their \(\Vert \cdot \Vert _\mathcal {J}\) norms is simple. More precisely, we have
$$\begin{aligned} \mathbf {D}^d F(x) = \mathrm{diag}_d \Big (f^{(d)}(x_1),\ldots ,f^{(d)}(x_n)\Big ), \end{aligned}$$
where \(\mathrm{diag}_d(x_1,\ldots ,x_n)\) stands for the \(d\)-indexed matrix \((a_{\mathbf{i}})_{\mathbf{i}\in [n]^d}\) such that \(a_\mathbf{i}= x_i\) if \(i_1 = \cdots = i_d = i\) and \(0\) otherwise. It is easy to see that if \(\mathcal {J} = \{[d]\}\), then \(\Vert \mathrm{diag}_d(x_1,\ldots ,x_n)\Vert _\mathcal {J} = \sqrt{x_1^2+\cdots +x_n^2}\) and if \(\# \mathcal {J} \ge 2\), then \(\Vert \mathrm{diag}_d(x_1,\ldots ,x_n)\Vert _\mathcal {J} = \max _{i\le n}|x_i|\). Therefore we obtain the following corollary to Theorem 1.2. We will apply it in the next section to linear eigenvalue statistics of random matrices.
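The two values of \(\Vert \mathrm{diag}_d(x_1,\ldots ,x_n)\Vert _\mathcal {J}\) can be illustrated numerically. Since only the diagonal entries of the test tensors are paired with \(\mathrm{diag}_d\), the norm reduces to the supremum of \(\sum _i x_i \prod _{l} u^{(l)}_i\) over vectors \(u^{(1)},\ldots ,u^{(\#\mathcal {J})}\) in the Euclidean unit ball; this reduction is the assumption behind the sketch below. A random search only gives lower bounds on the supremum, so it is compared with the claimed values and with explicit maximizers. Function names are hypothetical.

```python
import math
import random


def diag_form(x, unit_vectors):
    """Evaluate sum_i x_i * prod_l u^(l)_i for the given list of vectors u^(l)."""
    total = 0.0
    for i, xi in enumerate(x):
        term = xi
        for u in unit_vectors:
            term *= u[i]
        total += term
    return total


def random_unit_vector(n, rng):
    """A random vector on the Euclidean unit sphere of R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def max_random_value(x, num_blocks, trials, rng):
    """Largest value of the form over random unit vectors (a lower bound on the norm)."""
    return max(
        diag_form(x, [random_unit_vector(len(x), rng) for _ in range(num_blocks)])
        for _ in range(trials)
    )
```

With one block the supremum is attained at \(u = x/|x|\), and with two or more blocks at signed coordinate vectors supported on an index maximizing \(|x_i|\).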

Corollary 3.6

Let \(X\) be a random vector in \(\mathbb {R}^n\) satisfying (3), \(f :\mathbb {R}\rightarrow \mathbb {R}\) a \(\mathcal {C}^D\) function, such that \(\Vert f^{(D)}\Vert _\infty < \infty \) and let \(Z_f\) be defined by (20). Then for all \(t > 0\),
$$\begin{aligned} \mathbb P(|Z_f - \mathbb EZ_f| \ge t)&\le 2\exp \Bigg (-\frac{1}{C_D}\min \Big (\frac{t^2}{L^{2D}n\Vert f^{(D)} \Vert _\infty ^2},\frac{t^{2/D}}{L^2\Vert f^{(D)}\Vert _\infty ^{2/D}}\Big )\Bigg )\\&+2\exp \Bigg (-\frac{1}{C_D}\min _{1\le d\le D-1}\Big (\frac{t^2}{L^{2d} \sum _{i=1}^n (\mathbb Ef^{(d)}(X_i))^2 }\Big )\Bigg )\\&+ 2\exp \Bigg (-\frac{1}{C_D}\min _{2\le d \le D-1}\Big (\frac{t^{2/d}}{L^2 \max _{i\le n} |\mathbb Ef^{(d)}(X_i)|^{2/d}}\Big )\Bigg ). \end{aligned}$$

Clearly the case \(D=1\) of the above corollary recovers (21) up to constants. Moreover using the (yet unproven) Theorem 1.3 one can see that for \(f(x) = x^D\) and \(X\) being a standard Gaussian vector in \(\mathbb {R}^n\), the estimate of the corollary is optimal up to absolute constants (in this case, since \(Z_f\) is a sum of independent random variables, one can also use estimates from [33]).

Additive functionals of partial sums Let us now consider a slightly more involved additive functional of the form
$$\begin{aligned} S_f = \sum _{i=1}^n f\Bigg (\sum _{j=1}^i X_j\Bigg ). \end{aligned}$$
(22)
Such random variables arise e.g., in the study of additive functionals of random walks (see e.g. [16, 61]). For simplicity we will only discuss what can be obtained directly for Lipschitz functions \(f\) and what Theorem 1.2 gives for \(f\) with bounded second derivative. Let thus \(F(x) = \sum _{i=1}^n f\Big (\sum _{j=1}^i x_j\Big )\). We have \(\frac{\partial }{\partial x_i} F(x) = \sum _{l\ge i} f^{\prime }\Big (\sum _{j\le l} x_j\Big )\). Therefore
$$\begin{aligned} \big \Vert |\nabla F|\big \Vert _\infty ^2 \le \Vert f^{\prime }\Vert _\infty ^2 \sum _{i=1}^n (n-i+1)^2 = \frac{1}{6} n(n+1)(2n+1) \Vert f'\Vert _\infty ^2, \end{aligned}$$
which, when combined with (3) and Chebyshev’s inequality yields
$$\begin{aligned} \mathbb P(|S_f - \mathbb ES_f| \ge t) \le 2\exp \Big (-\frac{t^2}{C L^2 n^3 \Big \Vert f^{\prime }\Big \Vert _\infty ^2}\Big ). \end{aligned}$$
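The gradient formula for \(F\) and the resulting \(n^3\) scaling are easy to verify on small examples. The sketch below computes \(\frac{\partial }{\partial x_i}F\) as a reversed cumulative sum of \(f'\) evaluated at the partial sums, and evaluates the closed form \(\frac{1}{6}n(n+1)(2n+1)\Vert f'\Vert _\infty ^2\). The function names and the test functions used with them are our own illustrative choices.

```python
def partial_sum_functional(x, f):
    """F(x) = sum_i f(x_1 + ... + x_i)."""
    total, s = 0.0, 0.0
    for xi in x:
        s += xi
        total += f(s)
    return total


def partial_sum_gradient(x, f_prime):
    """dF/dx_i = sum_{l >= i} f'(x_1 + ... + x_l), via a reversed cumulative sum."""
    partial_sums, s = [], 0.0
    for xi in x:
        s += xi
        partial_sums.append(s)
    grad = [0.0] * len(x)
    running = 0.0
    for l in range(len(x) - 1, -1, -1):
        running += f_prime(partial_sums[l])
        grad[l] = running
    return grad


def gradient_bound_squared(n, f_prime_sup):
    """||f'||_inf^2 * sum_{i=1}^n (n-i+1)^2 = ||f'||_inf^2 * n(n+1)(2n+1)/6."""
    return f_prime_sup ** 2 * n * (n + 1) * (2 * n + 1) / 6.0
```

For a linear \(f\) the gradient is \((n, n-1, \ldots , 1)\Vert f'\Vert _\infty \) up to sign, so the bound on \(\big \Vert |\nabla F|\big \Vert _\infty ^2\) is attained in that case.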
Now, let us assume that \(f \in \mathcal {C}^2\) and \(f''\) is bounded. We have
$$\begin{aligned} |\mathbb E\nabla F(X)|^2 = \sum _{i=1}^n\Bigg (\sum _{l=i}^n \mathbb Ef^\prime \Big (\sum _{j=1}^l X_j\Big )\Bigg )^2. \end{aligned}$$
Moreover
$$\begin{aligned} \frac{\partial ^2}{\partial x_i \partial x_j} F(x_1,\ldots ,x_n) = \sum _{l = i \vee j}^nf^{\prime \prime }\Big (\sum _{k=1}^l x_k\Big ) \end{aligned}$$
and thus
$$\begin{aligned} \Big \Vert \mathbf {D}^2 F(x)\Big \Vert _{\{1,2\}}^2&= \sum _{i,j=1}^n \Bigg (\sum _{l = i \vee j}^n f^{\prime \prime }\Big (\sum _{k=1}^l x_k\Big )\Bigg )^2 \\&\le 2\Big \Vert f^{\prime \prime }\Big \Vert _\infty ^2 \sum _{i=1}^n\sum _{j=i}^n (n-j+1)^2 \le Cn^4\Big \Vert f^{\prime \prime }\Big \Vert _\infty ^2. \end{aligned}$$
Since \(\mathbf {D}^2 F\) is a symmetric bilinear form, we have
$$\begin{aligned} \Big \Vert \mathbf {D}^2 F(x)\Big \Vert _{\{1\}\{2\}}&\le \sup _{|\alpha | \le 1} \sum _{i,j=1}^n \sum _{l = i \vee j}^n\Bigg |f''\Bigg (\sum _{k=1}^l x_k\Bigg )\Bigg |\alpha _i\alpha _j\\&\le \sup _{|\alpha | \le 1} \Vert f''\Vert _\infty \sum _{l=1}^n \Big (\sum _{i \le l} \alpha _i\Big )^2 \\&\le \sup _{|\alpha | \le 1} \Vert f''\Vert _\infty \sum _{l=1}^n l\sum _{i\le l} \alpha _i^2 \le Cn^2 \Vert f''\Vert _\infty . \end{aligned}$$
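The key step in the last estimate is the identity \(\sum _{i,j}\sum _{l \ge i\vee j} c_l \alpha _i\alpha _j = \sum _l c_l \big (\sum _{i\le l}\alpha _i\big )^2\), applied with \(c_l\) built from \(f''\) at the partial sums. As a hedged illustration, the sketch below assembles the Hessian of \(F\) entrywise and compares its quadratic form with the right-hand side of this identity; it also evaluates the Hilbert-Schmidt bound from the previous display. Function names are hypothetical.

```python
def partial_sums(x):
    """List of x_1 + ... + x_l for l = 1, ..., n."""
    out, s = [], 0.0
    for xi in x:
        s += xi
        out.append(s)
    return out


def partial_sum_hessian(x, f_second):
    """Hessian of F: entry (i, j) equals sum_{l >= max(i, j)} f''(x_1 + ... + x_l)."""
    n = len(x)
    sums = partial_sums(x)
    tail = [0.0] * (n + 1)
    for l in range(n - 1, -1, -1):
        tail[l] = tail[l + 1] + f_second(sums[l])
    return [[tail[max(i, j)] for j in range(n)] for i in range(n)]


def hilbert_schmidt_sq(h):
    """Squared Hilbert-Schmidt norm of a matrix given as a list of rows."""
    return sum(v * v for row in h for v in row)


def quadratic_form(h, alpha):
    """alpha^T H alpha."""
    n = len(alpha)
    return sum(h[i][j] * alpha[i] * alpha[j] for i in range(n) for j in range(n))


def prefix_sum_form(x, f_second, alpha):
    """sum_l f''(S_l) * (alpha_1 + ... + alpha_l)^2, with S_l the partial sums of x."""
    total, prefix = 0.0, 0.0
    for s, a in zip(partial_sums(x), alpha):
        prefix += a
        total += f_second(s) * prefix ** 2
    return total


def hs_bound(n, f_second_sup):
    """2 * ||f''||_inf^2 * sum_{i=1}^n sum_{j=i}^n (n-j+1)^2."""
    return 2.0 * f_second_sup ** 2 * sum(
        (n - j + 1) ** 2 for i in range(1, n + 1) for j in range(i, n + 1)
    )
```

For \(f(x) = x^2\) and \(n = 3\) the Hessian has entries \(2(n - i\vee j + 1)\), with squared Hilbert-Schmidt norm \(104\), below the value \(160\) given by the bound.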
Using the above estimates and Theorem 1.2 we obtain
$$\begin{aligned}&\mathbb P(|S_f - \mathbb ES_f| \ge t)\\&\le 2\exp \Bigg (-\frac{1}{C L^2}\min \Big (\frac{t^2}{\sum _{i=1}^n \big ( \sum _{l=i}^n \mathbb Ef'(\sum _{j=1}^l X_j) \big )^2 },\frac{t}{n^2\Vert f''\Vert _\infty }\Big )\Bigg ). \end{aligned}$$
To effectively bound the sub-Gaussian coefficient in the above inequality one should use some additional information about the structure of the vector \(X\). For a given function \(f\) it is of order at most \(n^5\), but if, e.g., the function \(f\) is even and \(X\) is symmetric, it clearly vanishes. In this case we get
$$\begin{aligned} \mathbb P(|S_f - \mathbb ES_f| \ge t) \le 2\exp \Big (-\frac{1}{CL^2}\frac{t}{n^2\Vert f''\Vert _\infty }\Big ). \end{aligned}$$
One can check that if for instance \(X\) is a standard Gaussian vector in \(\mathbb {R}^n\) and \(f(x) = x^2\) then this estimate is tight up to the value of the constant \(C\).
\(U\)-statistics Our last application in this section will concern \(U\)-statistics (for simplicity of order 2) of the random vector \(X\), i.e., random variables of the form
$$\begin{aligned} U = \sum _{i,j \le n, i\ne j} h_{ij}(X_i,X_j), \end{aligned}$$
where \(h_{ij}:\mathbb {R}^2 \rightarrow \mathbb {R}\) are smooth functions. Without loss of generality let us assume that \(h_{ij}(x,y) = h_{ji}(y,x)\).
A simple application of Chebyshev’s inequality and (3) gives that if the partial derivatives of \(h_{ij}\) are uniformly bounded on \(\mathbb {R}^2\), then for all \(t > 0\),
$$\begin{aligned} \mathbb P(|U - \mathbb EU| \ge t)&\le 2 \exp \Big (-\frac{1}{C L^2}\frac{t^2}{\sup _{x\in \mathbb {R}^n}\sum _{i=1}^n (\sum _{j\ne i} \frac{\partial }{\partial x}h_{ij}(x_i,x_j))^2}\Big ) \\&\le 2\exp \Big (-\frac{1}{C L^2}\frac{t^2}{n^3 \max _{i\ne j}\Vert \frac{\partial }{\partial x} h_{ij}\Vert _\infty ^2}\Big ). \end{aligned}$$
For \(h_{ij}\) of class \(\mathcal {C}^2\) with bounded derivatives of second order, a direct application of Theorem 1.2 gives
$$\begin{aligned} \mathbb P(|U - \mathbb EU| \ge t) \le 2\exp \Bigg (-\frac{1}{C}\min \Big (\frac{t^2}{L^4 \alpha ^2},\frac{t^2}{L^2 \beta ^2},\frac{t}{L^2 \gamma }\Big )\Bigg ), \end{aligned}$$
where
$$\begin{aligned} \alpha ^2&= \sup _{x\in \mathbb {R}^n} \Bigg \{\sum _{i,j\le n,i\ne j} \Bigg (\frac{\partial ^2}{\partial x\partial y} h_{ij}(x_i,x_j)\Bigg )^2 + \sum _{i=1}^n\Big (\sum _{j\ne i} \frac{\partial ^2}{\partial x^2} h_{ij}(x_i,x_j)\Big )^2\Bigg \}\\&\quad \le n^2\max _{i\ne j}\Big \Vert \frac{\partial ^2}{\partial x\partial y} h_{ij}\Big \Vert _\infty ^2 + n^3\max _{i\ne j}\Big \Vert \frac{\partial ^2}{\partial x^2} h_{ij}\Big \Vert _\infty ^2,\\ \beta ^2&= \sum _{i=1}^n \Big (\sum _{j\ne i}\mathbb E\frac{\partial }{\partial x} h_{ij}(X_i,X_j)\Big )^2\le n^3 \max _{i\ne j} |\mathbb E\frac{\partial }{\partial x} h_{ij}(X_i,X_j)|^2,\\ \gamma&= \sup _{x \in \mathbb {R}^n} \sup _{|\alpha |,|\beta | \le 1} \Bigg \{\sum _{i,j\le n,i\ne j} \frac{\partial ^2}{\partial x\partial y} h_{ij}(x_i,x_j)\alpha _i\beta _j + \sum _{i=1}^n\alpha _i\beta _i \sum _{j\ne i} \frac{\partial ^2}{\partial x^2} h_{ij}(x_i,x_j)\Bigg \}\\&\quad \le n \Big (\max _{i\ne j}\Big \Vert \frac{\partial ^2}{\partial x\partial y} h_{ij}\Big \Vert _\infty + \max _{i\ne j} \Big \Vert \frac{\partial ^2}{\partial x^2} h_{ij}\Big \Vert _\infty \Big ). \end{aligned}$$
In particular, if \(h_{ij} = h\), a function with bounded derivatives of second order, we get \(\alpha ^2 = \mathcal {O}(n^3), \beta ^2 = \mathcal {O}(n^3), \gamma = \mathcal {O}(n)\), which shows that the oscillations of \(U\) are of order at most \(\mathcal {O}(n^{3/2})\). In the case of \(U\)-statistics of independent random variables, generated by bounded \(h\), this is a well known fact, corresponding to the CLT and classical Hoeffding inequalities for \(U\)-statistics. We remark that in the non-degenerate case, i.e., when \(\mathrm{Var\,}(\mathbb E_X h(X,Y)) > 0\), \(n^{3/2}\) is indeed the right normalization in the CLT for \(U\)-statistics (see e.g. [24]).

3.1.3 Linear statistics of eigenvalues of random matrices

We will now use Corollary 3.6 to obtain tail inequalities for linear eigenvalue statistics of Wigner random matrices. We remark that one could also apply to the random matrix case the other inequalities considered in the previous section, obtaining in particular estimates on \(U\)-statistics of eigenvalues (which have been recently investigated by Lytova and Pastur [50]). We will focus on linear eigenvalue statistics (additive functionals in the language of the previous section) and obtain inequalities involving a Sobolev norm of the function \(f\) with respect to the semicircle law (the limiting spectral distribution for Wigner ensembles) as a sub-Gaussian term. We refer the reader to the monographs [4, 6, 51, 56] for basic facts concerning random matrices.

Consider thus a real symmetric \(n\times n\) random matrix \(A\) (\(n \ge 2\)) and let \(\lambda _1\le \cdots \le \lambda _n\) be its eigenvalues. We will be interested in concentration inequalities for functionals of the form
$$\begin{aligned} Z = \sum _{i=1}^n f(\lambda _i/\sqrt{n}). \end{aligned}$$
In [31] Guionnet and Zeitouni obtained concentration inequalities for \(Z\) with Lipschitz \(f\) assuming that the entries of \(A\) are independent and satisfy the log-Sobolev inequality with some constant \(L\). More specifically, they prove that for all \(t > 0\),
$$\begin{aligned} \mathbb P(|Z - \mathbb EZ| \ge t) \le 2\exp \Big (-\frac{t^2}{8L\Vert f'\Vert _\infty ^2}\Big ). \end{aligned}$$
(In fact they treat a more general case of banded matrices, but for simplicity we will focus on the basic case.)
As a corollary to Theorem 1.2 we present below an inequality which complements the above result. Our aim is to replace the strong parameter \(\Vert f'\Vert _\infty \) controlling the sub-Gaussian tail by a weaker Sobolev norm with respect to the semicircular law
$$\begin{aligned} d\rho (x) = \frac{1}{2\pi } \sqrt{4 - x^2} \mathbf {1}_{(-2,2)}(x)\,dx \end{aligned}$$
(recall that this is the limiting spectral distribution for Wigner matrices). Under additional smoothness assumptions on the function \(f\), this can be done in a window \(|t| \le c_f n\), where \(c_f\) depends on \(f\).
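The Sobolev-type quantity \(\int _{-2}^2 f'^2\,d\rho \) is simple to evaluate numerically for a given \(f\). The following minimal sketch uses a midpoint rule on \((-2,2)\); the quadrature scheme and the number of nodes are our own choices, and the function names are hypothetical.

```python
import math


def semicircle_density(x: float) -> float:
    """Density of the semicircular law: sqrt(4 - x^2) / (2 pi) on (-2, 2), zero elsewhere."""
    if -2.0 < x < 2.0:
        return math.sqrt(4.0 - x * x) / (2.0 * math.pi)
    return 0.0


def integrate_against_semicircle(func, num_points: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of func with respect to rho."""
    h = 4.0 / num_points
    total = 0.0
    for k in range(num_points):
        x = -2.0 + (k + 0.5) * h
        total += func(x) * semicircle_density(x)
    return total * h
```

For instance, for \(f(x) = x^2\) one has \(f'(x) = 2x\), and since the second moment of \(\rho \) equals \(1\), the sub-Gaussian coefficient \(\int _{-2}^2 f'^2\,d\rho \) equals \(4\).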

Proposition 3.7

Assume the entries of the matrix \(A\) are independent (modulo symmetry conditions) real valued, mean zero and variance one random variables, satisfying the logarithmic Sobolev inequality (15) with constant \(L^2\). If \(f\) is \(\mathcal {C}^2\) with bounded second derivative, then for all \(t > 0\),
$$\begin{aligned} \mathbb P(|Z - \mathbb EZ| \ge t) \le 2 \exp \left( - \frac{1}{C_L} \left( \frac{t^2}{\int _{-2}^2 f'^2 \, d\rho + n^{-2/3} \left\| f'' \right\| _\infty ^2} \wedge \frac{nt}{\left\| f'' \right\| _\infty } \right) \right) . \end{aligned}$$
(23)

Remark

The case \(f(x) = x^2\) shows that under the assumptions of Proposition 3.7 one cannot expect a tail behaviour better than exponential for large \(t\). Indeed, in this case \(Z = \frac{1}{n}(\lambda _1^2 + \cdots + \lambda _n^2) = \frac{1}{n}\sum _{i,j \le n} A_{ij}^2\), so even if \(A\) is a matrix with standard Gaussian entries, for all \(t > 0\) we have \(\mathbb P(|Z-\mathbb EZ| \ge t) > \frac{1}{C} \exp ( -C (t^2 \wedge nt))\).

Remark

A similar inequality to (23) holds in the case of Hermitian matrices with independent entries as well. In the proof given below one should invoke an appropriate result concerning the speed of convergence of the spectral distribution of Wigner matrices to the semicircular law.

Proof

Let us identify the random matrix \(A\) with a random vector \(\tilde{A} = (A_{ij})_{1\le i\le j\le n}\) having values in \(\mathbb {R}^{n(n+1)/2}\) endowed with the standard Euclidean norm \(|\tilde{A}| = \left( \sum _{1 \le i \le j \le n} A_{ij}^2\right) ^{1/2}\). Note that \(\Vert A\Vert _{\mathrm{HS}} \le \sqrt{2} |\tilde{A}|\). By independence of coordinates of \(\tilde{A}\) and the tensorization property of the logarithmic Sobolev inequality (see, e.g., [46, Corollary 5.7]), \(\tilde{A}\) also satisfies (15) with constant \(L^2\). Furthermore, by the Hoffman-Wielandt inequality (see, e.g., [4, Lemma 2.1.19]), which asserts that if \(B, C\) are two \(n \times n\) real symmetric (or Hermitian) matrices and \(\lambda _i(B), \lambda _i(C)\) are their respective eigenvalues arranged in nondecreasing order, then
$$\begin{aligned} \sum _{i=1}^n |\lambda _i(B) - \lambda _i(C)|^2 \le \Vert B - C\Vert _{\mathrm{HS}}^2, \end{aligned}$$
we get that the map \(\tilde{A} \mapsto (\lambda _1/\sqrt{n}, \ldots , \lambda _n/\sqrt{n}) \in \mathbb {R}^n\) is \(\sqrt{2/n}\)-Lipschitz. Therefore, the random vector \((\lambda _1/\sqrt{n}, \ldots , \lambda _n/\sqrt{n})\) satisfies (15) with constant \(2L^2/n\). In consequence, by the results from [3] (see also Theorem 3.4), \((\lambda _1/\sqrt{n}, \ldots , \lambda _n/\sqrt{n})\) also satisfies (3) with constant \(\sqrt{2}L/\sqrt{n}\). Applying Corollary 3.6 with \(D=2\) we obtain
$$\begin{aligned}&\mathbb P(|Z - \mathbb EZ| \ge t)\nonumber \\&\le \! 2\exp \left( \!-\!\frac{1}{C L^2} \left( \frac{t^2}{n^{-1}\sum _{i=1}^n (\mathbb Ef'(\lambda _i/\sqrt{n}))^2 \!+\! L^2 n^{-1} \left\| f'' \right\| ^2_\infty } \wedge \frac{n t}{\left\| f'' \right\| _\infty }\right) \right) \!.\nonumber \\ \end{aligned}$$
(24)
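The two linear-algebra facts used at the beginning of this proof, \(\Vert A\Vert _{\mathrm{HS}} \le \sqrt{2}|\tilde{A}|\) and the Hoffman-Wielandt inequality, can be illustrated in the smallest case \(n = 2\), where the eigenvalues of a symmetric matrix are available in closed form. The sketch below encodes a symmetric matrix \(\begin{pmatrix} a &amp; b\\ b &amp; c\end{pmatrix}\) by the triple \((a,b,c)\); this encoding and the function names are our own illustrative choices.

```python
import math
import random


def eigenvalues_sym2(a: float, b: float, c: float):
    """Eigenvalues, in nondecreasing order, of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return (mean - radius, mean + radius)


def hs_norm_sq(a: float, b: float, c: float) -> float:
    """Squared Hilbert-Schmidt norm of [[a, b], [b, c]]; the off-diagonal entry counts twice."""
    return a * a + 2.0 * b * b + c * c


def hoffman_wielandt_gap(m1, m2) -> float:
    """||B - C||_HS^2 - sum_i |lambda_i(B) - lambda_i(C)|^2, expected to be nonnegative."""
    ev1 = eigenvalues_sym2(*m1)
    ev2 = eigenvalues_sym2(*m2)
    eig_dist_sq = sum((u - v) ** 2 for u, v in zip(ev1, ev2))
    diff = tuple(u - v for u, v in zip(m1, m2))
    return hs_norm_sq(*diff) - eig_dist_sq


def random_symmetric(rng: random.Random):
    """A random triple (a, b, c) with entries uniform on [-5, 5]."""
    return tuple(rng.uniform(-5.0, 5.0) for _ in range(3))
```

Together these two facts give the \(\sqrt{2/n}\)-Lipschitz property of the map \(\tilde{A} \mapsto (\lambda _1/\sqrt{n}, \ldots , \lambda _n/\sqrt{n})\) used above.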
In what follows we shall estimate from above the term \(n^{-1}\sum _{i=1}^n (\mathbb Ef'(\lambda _i/\sqrt{n}))^2\) from (24). First, by Jensen’s inequality
$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n (\mathbb Ef'(\lambda _i/\sqrt{n}))^2 \le \mathbb E\left( \frac{1}{n}\sum _{i=1}^n f'(\lambda _i/\sqrt{n})^2 \right) = \int _\mathbb {R}(f')^2 d\mu , \end{aligned}$$
(25)
where \(\mu \) is the expected spectral measure of the matrix \(n^{-1/2}A\). According to Wigner’s theorem, for a fixed \(f\), \(\mu \) converges to the semicircular law as \(n \rightarrow \infty \) and thus \(\int _\mathbb {R}(f')^2 \, d\mu \rightarrow \int _{-2}^2 (f')^2 \, d\rho \). A non-asymptotic bound on the term \(\int _\mathbb {R}f'^2 \, d\mu \) can be obtained using the result of Bobkov, Götze and Tikhomirov [12] on the speed of convergence of the expected spectral distribution of real Wigner matrices to the semicircular law. Since each entry of \(A\) satisfies the logarithmic Sobolev inequality with constant \(L^2\), it also satisfies the Poincaré inequality with the same constant (see e.g. [46, Chapter 5]). Therefore Theorem 1.1 from [12] gives
$$\begin{aligned} \sup _{x \in \mathbb {R}} |F_\mu (x) - F_\rho (x)| \le C_L n^{-2/3}, \end{aligned}$$
(26)
where \(F_\mu \) and \(F_\rho \) are the distribution functions of \(\mu \) and \(\rho \), respectively.
The decay of \(1-F_\mu (x)\) and \(F_\mu (x)\) as \(x \rightarrow \infty \) and \(x \rightarrow -\infty \) (resp.) can be obtained using the sub-Gaussian concentration of \(\lambda _n/\sqrt{n}\) and \(\lambda _1/\sqrt{n}\), which is, e.g., a consequence of (3) for the vector of eigenvalues of \(n^{-1/2} A\). For example, for any \(t \ge 0\),
$$\begin{aligned} \mathbb P\left( \frac{\lambda _n}{\sqrt{n}} \ge \mathbb E\frac{\lambda _n}{\sqrt{n}} + t \right)&\le 2 \exp \left( - \frac{1}{C} \frac{n t^2}{L^2} \right) . \end{aligned}$$
(27)
Using the classical technique of \(\delta \)-nets for estimating the operator norm of a matrix (see e.g. [58]) and the fact that the entries of \(A\) are sub-Gaussian (as they satisfy the logarithmic Sobolev inequality) one gets \(\mathbb E\lambda _n \le \mathbb E\Vert A\Vert _{\text {op}} \le C L \sqrt{n}\), which together with (27) yields
$$\begin{aligned} 1 - F_\mu (CL + t) \le \mathbb P\left( \frac{\lambda _n}{\sqrt{n}} \ge CL + t \right) \le 2 \exp \left( - \frac{1}{C} \frac{n t^2}{L^2} \right) \end{aligned}$$
(28)
for all \(t \ge 0\). Clearly, the same inequality holds for \(F_\mu (-CL - t)\). Integrating by parts, we get
$$\begin{aligned} \int _\mathbb {R}f'^2 \, d\mu = \int _\mathbb {R}f'^2 \, d\rho + \int _\mathbb {R}\left( f'(x)^2 \right) ' (F_\rho (x) - F_\mu (x)) \, dx. \end{aligned}$$
(29)
Combining the uniform estimate (26) with (28) and using the elementary inequality \(2 x y \le x^2 + y^2\), we estimate the last integral in (29) as follows:
$$\begin{aligned}&\left| \int _\mathbb {R}\left( f'(x)^2 \right) ' (F_\mu (x) - F_\rho (x)) \, dx \right| \nonumber \\&\quad \le \int _\mathbb {R}\left| 2 f'(x) f''(x) \right| \left( \left\| F_\mu - F_\rho \right\| _\infty \wedge 2\exp \left( -\frac{n}{C} \frac{ \text {dist}(x, [-CL, CL])^2}{L^2} \right) \right) \, dx \nonumber \\&\quad \le \int _\mathbb {R}f'(x)^2 \, d\nu (x) + \nu (\mathbb {R}) \left\| f'' \right\| _\infty ^2, \end{aligned}$$
(30)
where
$$\begin{aligned} d\nu (x) = C_L n^{-2/3} \wedge 2\exp \left( -\frac{\text {dist}(x, [-CL, CL])^2}{2 \sigma ^2}\right) \, dx, \qquad \text {and} \qquad \sigma ^2 = \frac{C L^2}{2n}. \end{aligned}$$
We proceed to estimate the two last terms from (30). Take \(r > 0\) such that
$$\begin{aligned} 2 e^{-r^2/(2\sigma ^2)} = C_L n^{-2/3} \end{aligned}$$
(31)
or put \(r=0\) if no such \(r\) exists. Note that if we assume \(C_L \ge 1\), as we obviously can, then
$$\begin{aligned} r \le C L n^{-1/2} \sqrt{\log n}. \end{aligned}$$
(32)
We shall need the following estimates, which are easy consequences of the standard estimate for a Gaussian tail:
$$\begin{aligned} \int _r^\infty e^{-y^2/(2\sigma ^2)} \, dy \le C \sigma e^{-r^2/(2\sigma ^2)} \le C_L \sigma n^{-2/3} \le C_L n^{-7/6}, \end{aligned}$$
(33)
and
$$\begin{aligned} \int _r^\infty y^2 e^{-y^2/(2\sigma ^2)} \, dy&\le \left( \int _0^\infty y^4 e^{-y^2/(2\sigma ^2)} \, dy \right) ^{1/2} \left( \int _r^\infty e^{-y^2/(2\sigma ^2)} \, dy \right) ^{1/2} \nonumber \\&\le C_L \sigma ^{5/2} (\sigma n^{-2/3})^{1/2} \le C_L n^{-11/6}. \end{aligned}$$
(34)
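The first inequality in (33) can be made explicit: the tail integral equals \(\sigma \sqrt{\pi /2}\, \mathrm{erfc}(r/(\sigma \sqrt{2}))\), and the standard bound \(\mathrm{erfc}(x) \le e^{-x^2}\) for \(x \ge 0\) gives the constant \(C = \sqrt{\pi /2}\). The sketch below checks this numerically on a small grid; the explicit constant and the function names are our illustrative choices, not taken from the paper.

```python
import math

def gaussian_tail_integral(r, sigma):
    """Exact value of the integral of exp(-y^2 / (2 sigma^2)) over [r, infinity)."""
    return sigma * math.sqrt(math.pi / 2.0) * math.erfc(r / (sigma * math.sqrt(2.0)))

def tail_bound(r, sigma):
    """Upper bound sigma * sqrt(pi/2) * exp(-r^2 / (2 sigma^2)), for r >= 0."""
    return sigma * math.sqrt(math.pi / 2.0) * math.exp(-r * r / (2.0 * sigma * sigma))

# Ratio integral / bound on a small (r, sigma) grid; each ratio should be at most 1.
grid = [(r, s) for s in (0.1, 1.0, 3.0) for r in (0.0, 0.5, 1.0, 2.0)]
ratios = [gaussian_tail_integral(r, s) / tail_bound(r, s) for r, s in grid]
```

At \(r = 0\) the ratio equals one, so the constant \(\sqrt{\pi /2}\) cannot be improved in a bound of this form.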
Now, (31), (32) and (33) yield
$$\begin{aligned} \nu (\mathbb {R}) \le (CL + r) C_L n^{-2/3} + 4\int _r^\infty e^{-y^2/(2\sigma ^2)} \, dy \le C_L n^{-2/3}. \end{aligned}$$
(35)
We shall also need the estimate for \(\int _\mathbb {R}x^2 \, d\nu (x)\) which follows from (31), (32) and (34):
$$\begin{aligned} \int _\mathbb {R}x^2 \, d\nu (x) \le \frac{2}{3} (CL+r)^3 C_L n^{-2/3} + 4 \int _r^\infty (CL+y)^2 e^{-y^2/(2\sigma ^2)} \, dy \le C_L n^{-2/3}. \end{aligned}$$
(36)
In order to estimate \(\int _\mathbb {R}f'^2 \, d\nu \), take any \(x_0 \in [-2,2]\) such that \(|f'(x_0)|^2 \le \int _{-2}^2 f'^2 \, d\rho \), and use \(|f'(x)| \le |f'(x_0)| + |x-x_0| \left\| f'' \right\| _\infty \) to obtain
$$\begin{aligned} \int _\mathbb {R}f'(x)^2 \, d\nu (x)&\le 2 \Big ( \int _{-2}^2 f'^2 \, d\rho \Big ) \nu (\mathbb {R}) + 2\left\| f'' \right\| ^2_\infty \int _\mathbb {R}|x-x_0|^2 \, d\nu (x)\\&\le 2 \Big ( \int _{-2}^2 f'^2 \, d\rho \Big ) \nu (\mathbb {R}) \!+\! 4 \left\| f'' \right\| ^2_\infty x_0^2 \nu (\mathbb {R}) \!+\! 4\left\| f'' \right\| ^2_\infty \int _\mathbb {R}x^2 \, d\nu (x). \end{aligned}$$
Plugging (35) and (36) into the above yields
$$\begin{aligned} \int _\mathbb {R}f'(x)^2 \, d\nu (x) \le C_L n^{-2/3} \left( \int _{-2}^2 f'^2 \, d\rho + \left\| f'' \right\| _\infty ^2 \right) . \end{aligned}$$
(37)
In turn, plugging (35) and (37) into (30) and then combining with (29) we finally get
$$\begin{aligned} \int _\mathbb {R}f'^2 \, d\mu \le (1 + C_L n^{-2/3}) \int _{-2}^2 f'^2 \, d\rho + C_L n^{-2/3} \left\| f'' \right\| _\infty ^2, \end{aligned}$$
which combined with (24) and (25) completes the proof.\(\square \)

Remarks

1. The factor \(n^{-2/3}\) in (23) comes only from (26) and in some situations can be improved, provided one can obtain better speed of convergence to the semicircle law.

2. With some more work (using truncations or working directly on moments) one can extend the above proposition to the case \(|f''(x)| \le a(1+|x|^k)\) for some non-negative integer \(k\) and \(a \in \mathbb {R}\). In this case we obtain
$$\begin{aligned} \mathbb P\big (|Z\!-\!\mathbb EZ| \ge t\big ) \le 2 \exp \left( \!-\!\left( \frac{t^2}{C_L \int _{-2}^2 f'^2 \,d\rho \!+\! C_{L,k} n^{-2/3} a^2} \wedge \frac{n}{C_{L,k}} \left( \frac{t}{a}\right) ^{\frac{2}{k+2}} \right) \right) . \end{aligned}$$
We also remark that to obtain the inequality (24) one does not have to use independence of the entries of \(A\); it is enough to assume that the vector \(\tilde{A}\) satisfies the inequality (3).

4 Two-sided estimates of moments for Gaussian polynomials

We will now prove Theorem 1.3, showing that in the case of general polynomials in Gaussian variables, the estimates of Theorem 1.2 are optimal (up to constants depending only on the degree of the polynomial). In the special case of tetrahedral polynomials this follows from Latała’s Theorem 1.1 and the following result by Kwapień.

Theorem 4.1

(Kwapień, Lemma 2 in [40]) If \(X = (X_1,\ldots ,X_n)\) where \(X_i\) are independent symmetric random variables, \(Q\) is a multivariate tetrahedral polynomial of degree \(D\) with coefficients in a Banach space \(E\) and \(Q_d\) is its homogeneous part of degree \(d\), then for any symmetric convex function \(\Phi :E \rightarrow \mathbb {R}_+\) and any \(d \in \{0,1, \ldots , D\}\),
$$\begin{aligned} \mathbb E\Phi (Q_d(X)) \le \mathbb E\Phi (C_d Q(X)). \end{aligned}$$

Indeed, when combined with Theorem 1.1 and the triangle inequality, the above theorem gives the following

Corollary 4.2

Let
$$\begin{aligned} Z = \sum _{0\le d \le D} \sum _{\mathbf{i}\in [n]^d} a^{(d)}_\mathbf{i}g_{i_1}\ldots g_{i_d}, \end{aligned}$$
where \(A_d = (a^{(d)}_\mathbf{i})_{\mathbf{i}\in [n]^d}\) is a \(d\)-indexed symmetric matrix of real numbers such that \(a^{(d)}_\mathbf{i}= 0\) if \(i_k = i_l\) for some \(k\ne l\) (we adopt the convention that for \(d=0\) we have a single number \(a^{(0)}_\emptyset \)). Then for any \(p\ge 2\),
$$\begin{aligned} C_D^{-1}\sum _{0\le d\le D} \sum _{\mathcal {J} \in P_d} p^{\#\mathcal {J}/2} \Vert A_d\Vert _\mathcal {J} \le \Vert Z\Vert _p \le C_D \sum _{0\le d\le D} \sum _{\mathcal {J} \in P_d} p^{\#\mathcal {J}/2} \Vert A_d\Vert _\mathcal {J}. \end{aligned}$$

The strategy of proof of Theorem 1.3 is very simple and relies on infinite divisibility of Gaussian random vectors, which will help us approximate the law of a general polynomial in Gaussian variables by the law of a tetrahedral polynomial, for which we will use Corollary 4.2.

It will be convenient to have the polynomial \(f\) represented as a combination of multivariate Hermite polynomials:
$$\begin{aligned} f(x_1, \ldots , x_n) = \sum _{d=0}^D \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}h_{d_1}(x_1) \cdots h_{d_n}(x_n), \end{aligned}$$
(38)
where
$$\begin{aligned} \Delta _d^n = \{ \mathbf{d}= (d_1, \ldots , d_n) :\forall _{k \in [n]}\ d_k \ge 0 \text { and } d_1 + \cdots + d_n = d \} \end{aligned}$$
and \(h_m(x) = (-1)^m e^{x^2/2} \frac{d^m}{dx^m} e^{-x^2/2}\) is the \(m\)-th Hermite polynomial.
Let \((W_t)_{t \in [0,1]}\) be a standard Brownian motion. Consider standard Gaussian random variables \(g = W_1\) and, for any positive integer \(N\),
$$\begin{aligned} g_{j,N} = \sqrt{N} (W_{\frac{j}{N}} - W_{\frac{j-1}{N}}), \quad j = 1, \ldots , N. \end{aligned}$$
For any \(d \ge 0\), we have the following representation of \(h_d(g) = h_d(W_1)\) as a multiple stochastic integral (see [34, Example 7.12 and Theorem 3.21]),
$$\begin{aligned} h_d(g) = d! \int _0^1 \! \int _0^{t_d} \! \cdots \! \int _0^{t_2} \, dW_{t_1} \cdots dW_{t_{d-1}} dW_{t_d}. \end{aligned}$$
Approximating the multiple stochastic integral leads to
$$\begin{aligned} h_d(g)&= d! \lim _{N \rightarrow \infty } N^{-d/2} \sum _{1 \le j_1 < \cdots < j_d \le N} g_{j_1, N} \ldots g_{j_d, N} \nonumber \\&= \lim _{N \rightarrow \infty } N^{-d/2} \sum _{\mathbf{j}\in [N]^{\underline{d}}} g_{j_1, N} \ldots g_{j_d, N}, \end{aligned}$$
(39)
where the limit is in \(L^2(\Omega )\) (see [34, Theorem 7.3 and formula (7.9)]) and actually the convergence holds in any \(L^p\) (see [34, Theorem 3.50]). We remark that instead of multiple stochastic integrals with respect to the Wiener process we could use the CLT for canonical \(U\)-statistics (see [24, Chapter 4.2]); however, the stochastic integral framework seems more convenient, as it allows us to put all the auxiliary variables on the same probability space as the original Gaussian sequence.
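For \(d = 2\) the approximation (39) can be seen by hand: since \(g = N^{-1/2}\sum _j g_{j,N}\), the sum over pairs of distinct indices equals \(g^2 - N^{-1}\sum _j g_{j,N}^2\), and the second term tends to \(1\) by the law of large numbers, leaving \(h_2(g) = g^2 - 1\). The sketch below simulates the normalized increments as i.i.d. standard Gaussians and compares the brute-force sum with this closed form; the seed, the value of \(N\) and the function names are illustrative assumptions.

```python
import random

def tetrahedral_sum_d2(increments):
    """N^{-1} times the sum over ordered pairs j != k of g_j * g_k (brute force)."""
    n = len(increments)
    total = 0.0
    for j in range(n):
        for k in range(n):
            if j != k:
                total += increments[j] * increments[k]
    return total / n

def closed_form_d2(increments):
    """g^2 - (1/N) * sum of g_j^2, where g = N^{-1/2} * sum of g_j."""
    n = len(increments)
    g = sum(increments) / n ** 0.5
    return g * g - sum(x * x for x in increments) / n

rng = random.Random(1)
N = 400
incs = [rng.gauss(0.0, 1.0) for _ in range(N)]  # stand-ins for g_{1,N}, ..., g_{N,N}
g = sum(incs) / N ** 0.5                         # plays the role of g = W_1
approx = tetrahedral_sum_d2(incs)                # right-hand side of (39) at finite N
h2 = g * g - 1.0                                 # h_2(g)
```

The difference `approx - h2` equals \(1 - N^{-1}\sum _j g_{j,N}^2\), which has standard deviation \(\sqrt{2/N}\), so the two values agree only up to fluctuations of that order.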
Now, consider \(n\) independent copies \((W_t^{(i)})_{t \in [0,1]}\) of the Brownian motion (\(i=1, \ldots , n\)) together with the corresponding Gaussian random variables: \(g^{(i)} = W_1^{(i)}\) and, for \(N \ge 1\),
$$\begin{aligned} g_{j, N}^{(i)} = \sqrt{N} (W_{\frac{j}{N}}^{(i)} - W_{\frac{j-1}{N}}^{(i)}), \quad j = 1, \ldots , N. \end{aligned}$$
In the lemma below we state the representation of a multivariate Hermite polynomial in the variables \(g^{(1)}, \ldots , g^{(n)}\) as a limit of tetrahedral polynomials in the variables \(g_{j, N}^{(i)}\). To this end let us introduce some more notation. Let
$$\begin{aligned} G^{(n, N)}&= (g_{1,N}^{(1)}, \ldots , g_{N,N}^{(1)}, \ g_{1,N}^{(2)}, \ldots , g_{N,N}^{(2)}, \ \ldots ,\ g_{1,N}^{(n)}, \ldots , g_{N,N}^{(n)}) \\&= (g_{j,N}^{(i)})_{(i,j)\in [n]\times [N]} \end{aligned}$$
be a Gaussian vector with \(n \times N\) coordinates. We identify here the set \([nN]\) with \([n]\times [N]\) via the bijection \((i,j) \leftrightarrow (i-1)N+j\). We will also identify the sets \(([n]\times [N])^d\) and \([n]^d\times [N]^d\) in a natural way. For \(d \ge 0\) and \(\mathbf{d}\in \Delta _d^n\), let
$$\begin{aligned} I_{\mathbf{d}} = \big \{ \mathbf{i}\in [n]^d :\forall _{l \in [n]} \, \# \mathbf{i}^{-1}(\{l\}) = d_l \big \}, \end{aligned}$$
and define a \(d\)-indexed matrix \(B_{\mathbf{d}}^{(N)}\) of \(n^d\) blocks each of size \(N^d\) as follows: for \(\mathbf{i}\in [n]^d\) and \(\mathbf{j}\in [N]^d\),
$$\begin{aligned} \big (B_{\mathbf{d}}^{(N)}\big )_{(\mathbf{i}, \mathbf{j})} = {\left\{ \begin{array}{ll} \frac{d_1! \cdots d_n!}{d!} N^{-d/2} &{} \text {if}\,\, \mathbf{i}\in I_{\mathbf{d}}\,\, \text {and} \,\, (\mathbf{i}, \mathbf{j}) := \big ((i_1, j_1), \cdots , (i_d, j_d)\big ) \in ([n] \times [N])^{\underline{d}},\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Lemma 4.3

With the above notation, for any \(p > 0\),
$$\begin{aligned} \big \langle B_{\mathbf{d}}^{(N)}, (G^{(n,N)})^{\otimes d} \big \rangle \mathop {\longrightarrow }\limits _{N\rightarrow \infty } h_{d_1}(g^{(1)}) \ldots h_{d_n}(g^{(n)}) \quad \text {in}\,\, L^p(\Omega ). \end{aligned}$$

Proof

Using (39) for each \(h_{d_i}(g^{(i)})\),
$$\begin{aligned}&h_{d_1}(g^{(1)}) \ldots h_{d_n}(g^{(n)})\\&= \lim _{N \rightarrow \infty } N^{-d/2} \sum _{\begin{array}{c} (j_1^{(1)}, \ldots , j_{d_1}^{(1)}) \in [N]^{\underline{d_1}} \\ \vdots \\ (j_1^{(n)}, \ldots , j_{d_n}^{(n)}) \in [N]^{\underline{d_n}} \end{array}} \Big ( g_{j_1^{(1)},N}^{(1)} \cdots g_{j_{d_1}^{(1)},N}^{(1)} \Big ) \cdots \Big ( g_{j_1^{(n)},N}^{(n)} \cdots g_{j_{d_n}^{(n)},N}^{(n)} \Big ). \end{aligned}$$
For each \(N\), the right-hand side equals
$$\begin{aligned} \frac{1}{\# I_{\mathbf{d}}} N^{-d/2} \sum _{\mathbf{i}\in I_{\mathbf{d}}} \sum _{\begin{array}{c} \mathbf{j}\in [N]^d \text { s.t.} \\ (\mathbf{i}, \mathbf{j}) \in ([n]\times [N])^{\underline{d}} \end{array}} g_{j_1, N}^{(i_1)} \cdots g_{j_d, N}^{(i_d)} = \big \langle B_{\mathbf{d}}^{(N)}, (G^{(n,N)})^{\otimes d} \big \rangle , \end{aligned}$$
since \(\# I_{\mathbf{d}} = \frac{d!}{d_1! \cdots d_n!}\).\(\square \)
Note that \(B_\mathbf{d}^{(N)}\) is symmetric, i.e., for any \(\mathbf{i}\in [n]^d, \mathbf{j}\in [N]^d\) if \(\pi :[d] \rightarrow [d]\) is a permutation and \(\mathbf{i}' \in [n]^d, \mathbf{j}' \in [N]^d\) are such that \(\forall _{k \in [d]} \; i'_k = i_{\pi (k)}\) and \(j'_k = j_{\pi (k)}\), then
$$\begin{aligned} \big ( B_\mathbf{d}^{(N)} \big )_{(\mathbf{i}',\mathbf{j}')} = \big ( B_\mathbf{d}^{(N)} \big )_{(\mathbf{i},\mathbf{j})}. \end{aligned}$$
Moreover, \(B_\mathbf{d}^{(N)}\) has zeros on “generalized diagonals”, i.e., \(\big ( B_\mathbf{d}^{(N)} \big )_{(\mathbf{i},\mathbf{j})} = 0\) if \((i_k, j_k) = (i_l, j_l)\) for some \(k \ne l\).
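The normalization in \(B_{\mathbf{d}}^{(N)}\) rests on the count \(\# I_{\mathbf{d}} = \frac{d!}{d_1! \cdots d_n!}\) used in the proof of Lemma 4.3. A brute-force enumeration for small \(n\) and \(d\) illustrates it; the function names below are illustrative only.

```python
from itertools import product
from math import factorial

def index_set(d_vec):
    """All i in [n]^d (values 1..n) in which the value l occurs exactly d_vec[l-1] times."""
    n, d = len(d_vec), sum(d_vec)
    return [i for i in product(range(1, n + 1), repeat=d)
            if all(i.count(l) == d_vec[l - 1] for l in range(1, n + 1))]

def multinomial(d_vec):
    """d! / (d_1! * ... * d_n!) with d = d_1 + ... + d_n."""
    out = factorial(sum(d_vec))
    for m in d_vec:
        out //= factorial(m)
    return out

# Example: d_vec = (2, 1, 0) gives the three arrangements of the multiset {1, 1, 2}.
example = index_set((2, 1, 0))
```

For \(\mathbf{d} = (2,1,0)\) the enumeration returns \((1,1,2), (1,2,1), (2,1,1)\), in agreement with \(3!/(2!\,1!\,0!) = 3\).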

Proof of Theorem 1.3

Let us first note that it is enough to prove the moment estimates; the tail bound then follows from them by the Paley-Zygmund inequality (see e.g. the proof of Corollary 1 in [44]). Moreover, the upper bound on moments follows directly from Theorem 1.2. For the lower bound we use Lemma 4.3 to approximate the \(L^p\) norm of \(f(G)-\mathbb Ef(G)\) with that of a tetrahedral polynomial, for which we can use the lower bound from Corollary 4.2.

Assuming \(f\) is of the form (38), Lemma 4.3 together with the triangle inequality implies
$$\begin{aligned} \lim _{N \rightarrow \infty } \Big \Vert \sum _{d=1}^D \Big \langle \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}B_\mathbf{d}^{(N)}, \big (G^{(n,N)}\big )^{\otimes d} \Big \rangle \Big \Vert _p = \big \Vert f(G) - \mathbb Ef(G)\big \Vert _p \end{aligned}$$
for any \(p > 0\), where \(G = (g^{(1)}, \ldots , g^{(n)} )\). It therefore remains to relate \(\big \Vert \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}B_\mathbf{d}^{(N)}\big \Vert _{\mathcal {J}}\) to \(\left\| \mathbb E\mathbf {D}^d f(G) \right\| _{\mathcal {J}}\) for any \(d \ge 1\) and \(\mathcal {J} \in P_d\). In fact we shall prove that
$$\begin{aligned} \lim _{N \rightarrow \infty } \Big \Vert \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}B_\mathbf{d}^{(N)}\Big \Vert _{\mathcal {J}} = \frac{1}{d!} \left\| \mathbb E\mathbf {D}^d f(G) \right\| _{\mathcal {J}}, \end{aligned}$$
(40)
which will end the proof.
Fix \(d \ge 1\) and \(\mathcal {J} \in P_d\). For any \(\mathbf{d}\in \Delta _d^n\) define a symmetric \(d\)-indexed matrix \((b_\mathbf{d})_{\mathbf{i}\in [n]^d}\) as
$$\begin{aligned} (b_\mathbf{d})_\mathbf{i}= {\left\{ \begin{array}{ll} \frac{d_1! \cdots d_n!}{d!} &{} \text {if}\,\, \mathbf{i}\in I_\mathbf{d}, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
and a symmetric \(d\)-indexed matrix \((\tilde{B}_\mathbf{d}^{(N)})_{(\mathbf{i}, \mathbf{j}) \in ([n] \times [N])^d}\) as
$$\begin{aligned} (\tilde{B}_\mathbf{d}^{(N)})_{(\mathbf{i}, \mathbf{j})} = N^{-d/2} (b_\mathbf{d})_\mathbf{i}\quad \text {for all}\,\, \mathbf{i}\in [n]^d \,\,\text {and}\,\, \mathbf{j}\in [N]^d. \end{aligned}$$
It is a simple observation that
$$\begin{aligned} \Big \Vert \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}\tilde{B}_\mathbf{d}^{(N)} \Big \Vert _\mathcal {J} = \Big \Vert \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}(b_\mathbf{d})_{\mathbf{i}\in [n]^d} \Big \Vert _\mathcal {J}. \end{aligned}$$
(41)
On the other hand, for any \(\mathbf{d}\in \Delta _d^n\), the matrices \(\tilde{B}_\mathbf{d}^{(N)}\) and \(B_\mathbf{d}^{(N)}\) differ at no more than \(\# I_\mathbf{d}\cdot \#([N]^d{\setminus }[N]^{\underline{d}})\) entries. More precisely, if \(\mathcal {J}_0 = \{ [d] \}\) (the trivial partition of \([d]\) into one set), then
$$\begin{aligned} \big \Vert \tilde{B}_\mathbf{d}^{(N)} \!-\! B_\mathbf{d}^{(N)} \big \Vert _\mathcal {J}^2 \!\le \! \big \Vert \tilde{B}_\mathbf{d}^{(N)} \!-\! B_\mathbf{d}^{(N)} \big \Vert _{\mathcal {J}_0}^2 \!\le \! \frac{d_1! \cdots d_n!}{d!} N^{-d} (N^d \!-\! N^{\underline{d}}) \longrightarrow 0 \, \text {as}\, N \!\rightarrow \! \infty . \end{aligned}$$
Thus the triangle inequality for the \(\Vert \cdot \Vert _\mathcal {J}\) norm together with (41) yields
$$\begin{aligned} \lim _{N \rightarrow \infty } \Big \Vert \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}B_\mathbf{d}^{(N)}\Big \Vert _{\mathcal {J}} = \Big \Vert \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}(b_\mathbf{d})_{\mathbf{i}\in [n]^d} \Big \Vert _\mathcal {J}. \end{aligned}$$
(42)
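The convergence used to pass from \(\tilde{B}_\mathbf{d}^{(N)}\) to \(B_\mathbf{d}^{(N)}\) comes down to the fact that \(N^{-d}(N^d - N^{\underline{d}})\), the proportion of \(\mathbf{j}\in [N]^d\) with a repeated coordinate, tends to zero for fixed \(d\). A small sketch with exact integer arithmetic before the final division (the function names are ours):

```python
def falling_factorial(N, d):
    """N * (N - 1) * ... * (N - d + 1), the number of j in [N]^d with distinct coordinates."""
    out = 1
    for k in range(d):
        out *= N - k
    return out

def repeated_fraction(N, d):
    """N^{-d} * (N^d - falling_factorial(N, d))."""
    return (N ** d - falling_factorial(N, d)) / N ** d

# For d = 3 the fraction decays like 3 / N as N grows.
values = [repeated_fraction(N, 3) for N in (10, 100, 1000)]
```

In general the fraction is at most \(d(d-1)/(2N)\) by the union bound over pairs of positions, which is consistent with the values computed above.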
Finally, note that
$$\begin{aligned} \mathbb E\mathbf {D}^d f(G) = d! \sum _{\mathbf{d}\in \Delta _d^n} a_\mathbf{d}(b_\mathbf{d})_{\mathbf{i}\in [n]^d}. \end{aligned}$$
(43)
Indeed, using the identity on Hermite polynomials, \(\frac{d}{dx} h_k(x) = k h_{k-1}(x)\) (\(k \ge 1\)), we obtain \(\mathbb E\frac{d^l}{dx^l} h_k(g) = k! \mathbf {1}_{\{k=l\}}\) for \(k,l\ge 0\), and thus, for any \(d, l \le D\) and \(\mathbf{d}\in \Delta _l^n\),
$$\begin{aligned} \big (\mathbb E\mathbf {D}^d h_{d_1}(g^{(1)}) \cdots h_{d_n}(g^{(n)})\big )_\mathbf{i}= d! (b_\mathbf{d})_\mathbf{i}\mathbf {1}_{\{d=l\}} \quad \text {for each}\,\, \mathbf{i}\in [n]^d. \end{aligned}$$
Now, (43) follows by linearity. Combining it with (42) proves (40).\(\square \)
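The one-dimensional identity \(\mathbb E\frac{d^l}{dx^l} h_k(g) = k! \mathbf {1}_{\{k=l\}}\) used in the last step can be verified exactly for small \(k, l\). The sketch below builds \(h_k\) from the standard three-term recurrence \(h_{k+1}(x) = x h_k(x) - k h_{k-1}(x)\) (an assumption of the sketch, equivalent to the definition given after (38)) and uses the Gaussian moments \(\mathbb Eg^{2m} = (2m-1)!!\); all names are illustrative.

```python
from math import factorial

def hermite(m):
    """Integer coefficients (lowest degree first) of the Hermite polynomial h_m."""
    prev, cur = [1], [0, 1]  # h_0 = 1, h_1 = x
    if m == 0:
        return prev
    for k in range(1, m):
        # Recurrence h_{k+1}(x) = x * h_k(x) - k * h_{k-1}(x).
        nxt = [0] + cur
        for idx, c in enumerate(prev):
            nxt[idx] -= k * c
        prev, cur = cur, nxt
    return cur

def derivative(p):
    """Coefficients of the derivative of the polynomial p."""
    return [idx * c for idx, c in enumerate(p)][1:] or [0]

def gaussian_moment(m):
    """E g^m for standard Gaussian g: 0 for odd m, (m - 1)!! for even m."""
    if m % 2:
        return 0
    out = 1
    for k in range(1, m, 2):
        out *= k
    return out

def expected_derivative(k, l):
    """E (d^l / dx^l) h_k(g), computed exactly in integer arithmetic."""
    p = hermite(k)
    for _ in range(l):
        p = derivative(p)
    return sum(c * gaussian_moment(idx) for idx, c in enumerate(p))
```

For instance, `expected_derivative(3, 3)` returns `6` and `expected_derivative(3, 1)` returns `0`, matching \(k! \mathbf {1}_{\{k=l\}}\).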

Remark

Note that the above infinite-divisibility argument can be also used to prove the upper bound on moments in Theorem 1.3 (giving a proof independent of the one relying on Theorem 1.2).

5 Polynomials in independent sub-Gaussian random variables

In this section we prove Theorem 1.4. Before we proceed with the core of the proof we will need to introduce some auxiliary inequalities for the norms \(\Vert \cdot \Vert _\mathcal {J}\) as well as some additional notation.

5.1 Properties of \(\Vert \cdot \Vert _\mathcal {J}\) norms

The first inequality we will need is pretty standard and given in the following lemma (it is a direct consequence of the definition of the norms \(\Vert \cdot \Vert _{\mathcal {J}}\)). Below \(\circ \) denotes the Hadamard product of \(d\)-indexed matrices, as defined in Sect. 2.

Lemma 5.1

For any \(d\)-indexed matrix \(A = (a_\mathbf{i})_{\mathbf{i}\in [n]^d}\) and any vectors \(v_1,\ldots ,v_d \in \mathbb {R}^n\) we have for all \(\mathcal {J} \in P_d\),
$$\begin{aligned} \Vert A\circ \otimes _{i=1}^d v_i\Vert _\mathcal {J} \le \Vert A\Vert _\mathcal {J}\prod _{i=1}^d \Vert v_i\Vert _\infty . \end{aligned}$$

To formulate subsequent inequalities we need some auxiliary notation concerning \(d\)-indexed matrices. We will treat matrices as functions from \([n]^d\) into the real line, which in particular allows us to use the notation of indicator functions and for a set \(C \subseteq [n]^d\) write \(\mathbf {1}_{C}\) for the matrix \((a_\mathbf{i})\) such that \(a_\mathbf{i}= 1\) if \(\mathbf{i}\in C\) and \(a_\mathbf{i}= 0\) otherwise.

Note that for \(\#\mathcal {J} > 1\), \(\Vert \cdot \Vert _\mathcal {J}\) is not unconditional in the standard basis, i.e., in general it is not true that \(\Vert A\circ \mathbf {1}_{C}\Vert _\mathcal {J} \le \Vert A\Vert _\mathcal {J}\). One situation in which this inequality holds is when \(C\) is of the form \(C = \{\mathbf{i}:i_{k_1} = j_1,\ldots ,i_{k_l} = j_l\}\) for some \(1 \le k_1<\ldots <k_l \le d\) and \(j_1,\ldots ,j_l \in [n]\) (which follows from Lemma 5.1). This corresponds to setting to zero all coefficients which are outside a “generalized row” of a matrix and leaving the coefficients in this row intact.

Later we will need another inequality of this type, which will allow us to select a “generalized diagonal” of a matrix. The corresponding estimate is given in the following

Lemma 5.2

Let \(A = (a_\mathbf{i})_{\mathbf{i}\in [n]^d}\) be a \(d\)-indexed matrix, \(K \subseteq [d]\) and let \(C \subseteq [n]^d\) be of the form \(C = \{\mathbf{i}:i_k = i_l \;\text {for all} \; k,l \in K\}\). Then for every \(\mathcal {J} \in P_d\), \(\Vert A\circ \mathbf {1}_{C}\Vert _\mathcal {J} \le \Vert A\Vert _\mathcal {J}\).

Proof

Since \(\mathbf {1}_{C_1\cap C_2} = \mathbf {1}_{C_1} \circ \mathbf {1}_{C_2}\), it is enough to consider the case \(\#K = 2\), i.e. \(C = \{\mathbf{i}:i_k = i_l\}\) for some \(1\le k<l\le d\). Let \(\mathcal {J} = \{J_1,\ldots ,J_m\}\). We will consider two cases.

1. The numbers \(k\) and \(l\) are separated by the partition \(\mathcal {J}\). Without loss of generality we can assume that \(k\in J_1, l\in J_2\). Then
$$\begin{aligned}&\Vert A\circ \mathbf {1}_{C}\Vert _\mathcal {J} \nonumber \\&= \sup _{\Vert x^{(j)}_{\mathbf{i}_{J_j}}\Vert _2\le 1:j \ge 3}\Bigg (\sup _{\Vert x^{(1)}_{\mathbf{i}_{J_1}}\Vert _2,\Vert x^{(2)}_{\mathbf{i}_{J_2}}\Vert _2\le 1} \sum _{|\mathbf{i}_{J_1}|\le n}\sum _{|\mathbf{i}_{J_2}|\le n}\mathbf {1}_{\{i_k=i_l\}}\Big (\sum _{|\mathbf{i}_{(J_1\cup J_2)^c}|\le n} a_\mathbf{i}x^{(3)}_{\mathbf{i}_{J_3}}\cdots x^{(m)}_{\mathbf{i}_{J_m}}\Big )x^{(1)}_{\mathbf{i}_{J_1}}x^{(2)}_{\mathbf{i}_{J_2}}\Bigg ).\nonumber \\ \end{aligned}$$
(44)
For any \(x^{(3)}_{\mathbf{i}_{J_3}},\ldots ,x^{(m)}_{\mathbf{i}_{J_m}}\), consider the matrix
$$\begin{aligned} B = (B_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}})_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}} = \Bigg (\sum _{|\mathbf{i}_{(J_1\cup J_2)^c}|\le n} a_\mathbf{i}x^{(3)}_{\mathbf{i}_{J_3}}\ldots x^{(m)}_{\mathbf{i}_{J_m}}\Bigg )_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}} \end{aligned}$$
acting from \(\ell _2([n]^{J_1})\) to \(\ell _2([n]^{J_2})\).
For fixed \(x^{(3)}_{\mathbf{i}_{J_3}},\ldots ,x^{(m)}_{\mathbf{i}_{J_m}}\) the inner supremum on the right hand side of (44) is the operator norm of the block-diagonal matrix obtained from \(B\) by setting to zero entries in off-diagonal blocks. Therefore it is not greater than the operator norm of \(B\), which allows us to write
$$\begin{aligned} \Vert A\circ \mathbf {1}_{C}\Vert _\mathcal {J}&\le \sup _{\Vert x^{(j)}_{\mathbf{i}_{J_j}}\Vert _2\le 1:j \ge 3}\Bigg (\sup _{\Vert x^{(1)}_{\mathbf{i}_{J_1}}\Vert _2,\Vert x^{(2)}_{\mathbf{i}_{J_2}}\Vert _2\le 1} \sum _{|\mathbf{i}_{J_1}|\le n}\sum _{|\mathbf{i}_{J_2}|\le n}\Big (\sum _{|\mathbf{i}_{(J_1\cup J_2)^c}|\le n} a_\mathbf{i}x^{(3)}_{\mathbf{i}_{J_3}}\ldots x^{(m)}_{\mathbf{i}_{J_m}}\Big )x^{(1)}_{\mathbf{i}_{J_1}}x^{(2)}_{\mathbf{i}_{J_2}}\Bigg )\\&= \Vert A\Vert _\mathcal {J}. \end{aligned}$$
2. There exists \(j\) such that \(k,l \in J_j\). Without loss of generality we can assume that \(j = 1\). We have
$$\begin{aligned} \Vert A\circ \mathbf {1}_{C}\Vert _\mathcal {J}&= \sup _{\Vert x^{(j)}_{\mathbf{i}_{J_j}}\Vert _2\le 1:j \ge 2}\Bigg (\sup _{\Vert x^{(1)}_{\mathbf{i}_{J_1}}\Vert _2\le 1} \sum _{|\mathbf{i}_{J_1}|\le n}\mathbf {1}_{\{i_k=i_l\}}\Big (\sum _{|\mathbf{i}_{J_1^c}|\le n} a_\mathbf{i}x^{(2)}_{\mathbf{i}_{J_2}}\ldots x^{(m)}_{\mathbf{i}_{J_m}}\Big )x^{(1)}_{\mathbf{i}_{J_1}}\Bigg )\\&= \sup _{\Vert x^{(j)}_{\mathbf{i}_{J_j}}\Vert _2\le 1:j \ge 2}\Bigg (\sum _{|\mathbf{i}_{J_1}|\le n}\mathbf {1}_{\{i_k=i_l\}}\Big (\sum _{|\mathbf{i}_{J_1^c}|\le n} a_\mathbf{i}x^{(2)}_{\mathbf{i}_{J_2}}\cdots x^{(m)}_{\mathbf{i}_{J_m}}\Big )^2\Bigg )^{1/2}\\&\le \sup _{\Vert x^{(j)}_{\mathbf{i}_{J_j}}\Vert _2\le 1:j \ge 2}\Bigg (\sum _{|\mathbf{i}_{J_1}|\le n}\Big (\sum _{|\mathbf{i}_{J_1^c}|\le n} a_\mathbf{i}x^{(2)}_{\mathbf{i}_{J_2}}\cdots x^{(m)}_{\mathbf{i}_{J_m}}\Big )^2\Bigg )^{1/2} = \Vert A\Vert _\mathcal {J}. \end{aligned}$$
\(\square \)
For a partition \(\mathcal {K} = \{K_1,\ldots ,K_m\} \in P_d\) define
$$\begin{aligned} L(\mathcal {K}) = \{\mathbf{i}\in [n]^d:i_k= i_l \;\text {iff}\; \exists _{j\le m}\; k,l\in K_j\}. \end{aligned}$$
(45)
Thus \(L(\mathcal {K})\) is the set of all indices for which the partition into level sets is equal to \(\mathcal {K}\).
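To make definition (45) concrete, the following sketch computes the level-set partition of a multi-index and enumerates \(L(\mathcal {K})\) by brute force for small \(n\) and \(d\). The function names are illustrative, and partitions are represented as collections of sets of positions in \(\{1,\ldots ,d\}\).

```python
from itertools import product

def level_set_partition(i):
    """Partition of the positions {1, ..., d} into the level sets of the multi-index i."""
    blocks = {}
    for pos, val in enumerate(i, start=1):
        blocks.setdefault(val, []).append(pos)
    return frozenset(frozenset(b) for b in blocks.values())

def L(partition, n, d):
    """All i in [n]^d whose level-set partition equals the given partition of {1, ..., d}."""
    target = frozenset(frozenset(b) for b in partition)
    return [i for i in product(range(1, n + 1), repeat=d)
            if level_set_partition(i) == target]

# The five partitions of {1, 2, 3}; the sets L(K) split [3]^3 into disjoint pieces.
partitions_of_3 = [
    [{1, 2, 3}],
    [{1, 2}, {3}], [{1, 3}, {2}], [{2, 3}, {1}],
    [{1}, {2}, {3}],
]
sizes = [len(L(K, 3, 3)) for K in partitions_of_3]
```

With \(n = d = 3\) the sizes are \(3, 6, 6, 6, 6\), which add up to \(3^3 = 27\): every multi-index belongs to exactly one \(L(\mathcal {K})\).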

Corollary 5.3

For any \(\mathcal {J,K} \in P_d\) and any \(d\)-indexed matrix \(A\),
$$\begin{aligned} \Vert A\circ \mathbf {1}_{L(\mathcal {K})}\Vert _\mathcal {J} \le 2^{\#\mathcal {K}(\#\mathcal {K}-1)/2}\Vert A\Vert _\mathcal {J}. \end{aligned}$$

Proof

By Lemma 5.2 and the triangle inequality for any \(k< l\),
$$\begin{aligned} \Vert A\circ \mathbf {1}_{\{i_k\ne i_l\}}\Vert _\mathcal {J} = \Vert A - A\circ \mathbf {1}_{\{i_k=i_l\}}\Vert _\mathcal {J} \le 2\Vert A\Vert _\mathcal {J}. \end{aligned}$$
(46)
Now it is enough to note that \(L(\mathcal {K})\) can be expressed as an intersection of \(\#\mathcal {K}\) “generalized diagonals” and \(\#\mathcal {K}(\#\mathcal {K}-1)/2\) sets of the form \(\{\mathbf{i}:i_k\ne i_l\}\) where \(k < l\) and use again Lemma 5.2 together with (46).\(\square \)

5.2 Proof of Theorem 1.4

Let us first note that the tail bound of Theorem 1.4 follows from the moment estimate and Chebyshev’s inequality in the same way as in Theorems 1.2 or 3.3. We will therefore focus on the moment bound.

The method of proof will rely on the reduction to the Gaussian case via decoupling inequalities, symmetrization and the contraction principle. To carry out this strategy we will need the following representation of \(f\):
$$\begin{aligned} f(x) = \sum _{d= 0}^D \sum _{m=0}^d \mathop {\mathop {\sum }\limits _{{k_1, \ldots , k_m > 0}}}\limits _{k_1+\cdots +k_m=d} \sum _{\mathbf{i}\in [n]^{\underline{m}}} c_ {(i_1,k_1),\ldots ,(i_m,k_m)}^{(d)} x_{i_1}^{k_1}x_{i_2}^{k_2}\ldots x_{i_m}^{k_m}, \end{aligned}$$
(47)
where the coefficients \(c_{(i_1,k_1),\ldots ,(i_m,k_m)}^{(d)}\) satisfy
$$\begin{aligned} c_{(i_1,k_1),\ldots ,(i_m,k_m)}^{(d)} = c_{(i_{\pi _1},k_{\pi _1}),\ldots ,(i_{\pi _m},k_{\pi _m})}^{(d)} \end{aligned}$$
(48)
for all permutations \(\pi :[m] \rightarrow [m]\). At this point we would like to explain the convention regarding indices which we will use throughout this section. It is rather standard, but we prefer to draw the Reader’s attention to it, as we will use it extensively in what follows. Namely, we will treat the sequence \(\mathbf{k}= (k_1,\ldots ,k_m)\) as a function acting on \([m]\) and taking values in positive integers. In particular if \(m=0\), then \([m] = \emptyset \) and there exists exactly one function \(\mathbf{k}:[m]\rightarrow \mathbb {N}{\setminus }\{0\}\) (the empty function). Moreover by convention this function satisfies \(\sum _{i=1}^m k_i = 0\) (as the summation runs over an empty set). Therefore, for \(d=0\) and \(m=0\) the subsum over \(k_1,\ldots ,k_m\) and \(\mathbf{i}\) above is equal to the free coefficient of the polynomial (which can be denoted by \(c_\emptyset ^{(0)}\)), since the summation over \(k_1,\ldots ,k_m\) runs over a one-element set containing the empty index/function and for this index there is exactly one index \(\mathbf{i}:[m]\rightarrow \{1,\ldots ,n\}\), which belongs to \([n]^{\underline{m}}\) (again the empty-index). Here we also use the convention that a product over an empty set is equal to one. On the other hand, for \(d>0\), the contribution from \(m=0\) is equal to zero (as the empty index \(\mathbf{k}\) does not satisfy the constraint \(k_1+\cdots +k_m = d\) and so the summation over \(k_1,\ldots ,k_m\) runs over the empty set).
Using (47) together with independence of \(X_1,\ldots ,X_n\), one may write
$$\begin{aligned} f(X) - \mathbb Ef(X)&= \sum _{d=1}^D \sum _{m=1}^d \mathop {\mathop {\sum }\limits _{{k_1, \ldots , k_m > 0}}}\limits _ {{k_1+\cdots +k_m=d}} \sum _{\mathbf{i}\in [n]^{\underline{m}}} c_{(i_1,k_1),\cdots ,(i_m,k_m)}^{(d)} \\&\times \sum _{\emptyset \ne J \subseteq [m]} \prod _{j\in J} (X_{i_j}^{k_j} - \mathbb EX_{i_j}^{k_j})\prod _{j\notin J} \mathbb EX_{i_j}^{k_j}. \end{aligned}$$
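The inner sum over \(\emptyset \ne J \subseteq [m]\) comes from the elementary identity \(\prod _j x_j - \prod _j e_j = \sum _{\emptyset \ne J} \prod _{j\in J}(x_j - e_j)\prod _{j \notin J} e_j\), applied with \(x_j = X_{i_j}^{k_j}\) and \(e_j = \mathbb EX_{i_j}^{k_j}\); independence and the distinctness of \(i_1,\ldots ,i_m\) ensure that \(\prod _j e_j\) is the mean of the monomial. A minimal numerical sketch of the identity (the function name is ours):

```python
from itertools import combinations
from math import prod

def centered_expansion(x, e):
    """Sum over nonempty J of prod_{j in J} (x_j - e_j) * prod_{j not in J} e_j."""
    m = len(x)
    total = 0.0
    for size in range(1, m + 1):
        for J in combinations(range(m), size):
            inside = prod(x[j] - e[j] for j in J)
            outside = prod(e[j] for j in range(m) if j not in J)
            total += inside * outside
    return total

# Toy values standing in for X_{i_j}^{k_j} and their expectations.
x = [2.0, 3.0, 5.0]
e = [1.0, 1.0, 2.0]
lhs = prod(x) - prod(e)
rhs = centered_expansion(x, e)
```

Here both sides equal \(28\): the seven nonempty subsets contribute \(2 + 4 + 3 + 4 + 3 + 6 + 6\).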
Rearranging the terms and using (48) together with the triangle inequality, we obtain
$$\begin{aligned} |f(X) - \mathbb Ef(X) | \le \sum _{ d=1}^D \sum _{a=1}^d \mathop {\mathop {\sum }\limits _{{k_1, \ldots , k_a > 0}}}\limits _ {k_1+\cdots +k_a=d} \Big |\sum _{\mathbf{i}\in [n]^{\underline{a}}} d_{i_1,\ldots ,i_a}^{(k_1,\ldots ,k_a)} (X_{i_1}^{k_1} - \mathbb EX_{i_1}^{k_1})\cdots (X_{i_a}^{k_a} - \mathbb EX_{i_a}^{k_a})\Big |, \end{aligned}$$
where
$$\begin{aligned} d_{i_1,\ldots ,i_a}^{(k_1,\ldots ,k_a)} = \sum _{ m =a}^D \mathop {\mathop {\sum }\limits _{{k_{a+1},\ldots ,k_m > 0:}}}\limits _{k_1+\cdots +k_m \le D} \mathop {\mathop {\sum }\limits _{{i_{a+1},\ldots ,i_m:}}}\limits _{(i_1,\ldots ,i_m)\in [n]^{\underline{m}}} \left( {\begin{array}{c}m\\ a\end{array}}\right) c_{(i_1,k_1),\ldots ,(i_m,k_m)}^{(k_1+\cdots +k_m)}\mathbb EX_{i_{a+1}}^{k_{a+1}}\cdots \mathbb EX_{i_m}^{k_m}. \end{aligned}$$
Note that (48) implies that for every permutation \(\pi :[a]\rightarrow [a]\),
$$\begin{aligned} d_{i_1,\ldots ,i_a}^{(k_1,\ldots ,k_a)} = d_{i_{\pi _1},\ldots ,i_{\pi _a}}^{(k_{\pi _1},\ldots ,k_{\pi _a})}. \end{aligned}$$
(49)
Let now \(X^{(1)},\ldots ,X^{(D)}\) be independent copies of the random vector \(X\) and \((\varepsilon _i^{(j)})_{i\le n,j\le D}\) an array of i.i.d. Rademacher variables independent of \((X^{(j)})_j\). For each \(k_1,\ldots ,k_a\), by decoupling inequalities (Theorem 7.1 in the “Appendix”) applied to the functions
$$\begin{aligned} h_{i_1,\ldots ,i_a}^{(k_1,\ldots ,k_a)}(x_1,\ldots ,x_a) = d_{i_1,\ldots ,i_a}^{(k_1,\ldots ,k_a)}(x_1^{k_1} - \mathbb EX_{i_1}^{k_1})\cdots (x_a^{k_a} - \mathbb EX_{i_a}^{k_a}) \end{aligned}$$
and standard symmetrization inequalities (applied conditionally \(a\) times) we obtain,
$$\begin{aligned}&\Vert f(X) - \mathbb Ef(X) \Vert _p \nonumber \\&\quad \le C_D\sum _{d=1}^D \sum _{a=1}^d \mathop {\mathop {\sum }\limits _{{k_1, \ldots , k_a > 0}}}\limits _ {k_1+\cdots +k_a=d} \bigg \Vert \sum _{\mathbf{i}\in [n]^{\underline{a}}} d_{i_1,\ldots ,i_a}^{(k_1,\ldots ,k_a)} \Big ((X_{i_1}^{(1)})^{k_1} - \mathbb E(X_{i_1}^{(1)})^{k_1}\Big )\cdots \Big ((X_{i_a}^{(a)})^{k_a} - \mathbb E(X_{i_a}^{(a)})^{k_a}\Big )\bigg \Vert _p\nonumber \\&\quad \le C_D\sum _{d=1}^D \sum _{a=1}^d \mathop {\mathop {\sum }\limits _{{k_1, \ldots , k_a > 0}}}\limits _ {k_1+\cdots +k_a=d} \bigg \Vert \sum _{\mathbf{i}\in [n]^{\underline{a}}} d_{i_1,\ldots ,i_a}^{(k_1,\ldots ,k_a)} \Big (\varepsilon _{i_1}^{(1)}(X_{i_1}^{(1)})^{k_1} \cdots \varepsilon _{i_a}^{(a)}(X_{i_a}^{(a)})^{k_a} \Big )\bigg \Vert _p \end{aligned}$$
(50)
(note that in the first part of Theorem 7.1 one does not impose any symmetry assumptions on the functions \(h_\mathbf{i}\)).

We will now use the following standard comparison lemma (for the Reader’s convenience its proof is presented in the “Appendix”).

Lemma 5.4

For any positive integer \(k\), if \(Y_1,\ldots ,Y_n\) are independent symmetric variables with \(\Vert Y_i\Vert _{\psi _{2/k}} \le M\), then
$$\begin{aligned} \left\| \sum _{i=1}^n a_i Y_i\right\| _p \le C_k M\left\| \sum _{i=1}^n a_i g_{i1}\ldots g_{ik} \right\| _p\!, \end{aligned}$$
where \(g_{ij}\) are i.i.d. \(\mathcal {N}(0,1)\) variables.
Note that for any positive integer \(k\) we have \(\Vert X_i^k\Vert _{\psi _{2/k}} = \Vert X_i\Vert _{\psi _2}^k \le L^k\), so (50) together with the above lemma (used repeatedly and conditionally) yield
$$\begin{aligned}&\Vert f(X) - \mathbb Ef(X) \Vert _p \nonumber \\&\quad \le C_D\sum _{d=1}^D L^d\sum _{a=1}^d \mathop {\mathop {\sum }\limits _{{k_1, \ldots , k_a > 0}}}\limits _{k_1+\cdots +k_a=d} \left\| \sum _{\mathbf{i}\in [n]^{\underline{a}}} d_{i_1,\ldots ,i_a}^{(k_1,\ldots ,k_a)} (g^{(1)}_{i_1,1}\cdots g^{(1)}_{i_1,k_1})\cdots (g^{(a)}_{i_a,1}\cdots g^{(a)}_{i_a,k_a})\right\| _p,\qquad \end{aligned}$$
(51)
where \((g_{i,k}^{(j)})\) is an array of i.i.d. standard Gaussian variables. Consider now multi-indexed matrices \(B_1,\ldots ,B_D\) defined as follows. For \(1\le d\le D\), and a multi-index \(\mathbf{r}= (r_1,\ldots ,r_d)\in [n]^d\), let \(\mathcal {I} = \{I_1,\ldots ,I_a\}\) be the partition of \(\{1,\ldots ,d\}\) into the level sets of \(\mathbf{r}\) and \(i_1,\ldots ,i_a\) be the values corresponding to the level sets \(I_1,\ldots ,I_a\). Define moreover
$$\begin{aligned} b^{(d)}_{r_1,\ldots ,r_d} = d^{(\#I_1,\ldots ,\#I_a)}_{i_1,\ldots ,i_a} \end{aligned}$$
(note that thanks to (49) this definition does not depend on the order of \(I_1,\ldots ,I_a\)). Finally, define the \(d\)-indexed matrix \(B_d = (b^{(d)}_{\mathbf{r}})_{\mathbf{r}\in [n]^d}\).

Let us also define for \(k_1,\ldots ,k_a > 0, \sum _{i=1}^a k_i = d\) the partition \(\mathcal {K}(k_1,\ldots ,k_a) \in P_d\) by splitting the set \(\{1,\ldots ,d\}\) into consecutive intervals of length \(k_1,\ldots ,k_a\), i.e., \(\mathcal {K} = \{K_1,\ldots ,K_a\}\), where for \(l = 1,\ldots ,a, K_l = \{1+\sum _{i=1}^{l-1} k_i,2+\sum _{i=1}^{l-1} k_i, \ldots ,\sum _{i=1}^{l} k_i\}\).
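The two combinatorial objects used above, the level-set partition of a multi-index \(\mathbf{r}\) and the partition \(\mathcal {K}(k_1,\ldots ,k_a)\) into consecutive intervals, can be illustrated by a minimal Python sketch. The function names are hypothetical, and the ordering of the blocks by first appearance is an arbitrary convention adopted only for this illustration.

```python
def level_sets(r):
    """Partition of {1,...,d} into the level sets of r, with the common values.

    Blocks are listed in order of first appearance (a convention of this sketch).
    """
    blocks = {}
    for pos, value in enumerate(r, start=1):
        blocks.setdefault(value, []).append(pos)
    values = list(blocks)                      # i_1, ..., i_a
    partition = [blocks[v] for v in values]    # I_1, ..., I_a
    return partition, values


def consecutive_partition(ks):
    """The partition K(k_1,...,k_a) of {1,...,d} into consecutive intervals."""
    partition, start = [], 1
    for k in ks:
        partition.append(list(range(start, start + k)))
        start += k
    return partition


partition, values = level_sets((5, 2, 5, 7))
print(partition, values)                  # [[1, 3], [2], [4]] [5, 2, 7]
print(consecutive_partition([2, 1, 3]))   # [[1, 2], [3], [4, 5, 6]]
```

For \(\mathbf{r}= (5,2,5,7)\) the block sizes are \((2,1,1)\), so \(b^{(4)}_{\mathbf{r}}\) is the coefficient \(d^{(2,1,1)}_{5,2,7}\).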

Applying Theorem 3.1 to the right hand side of (51), we obtain
$$\begin{aligned}&\Vert f(X) - \mathbb Ef(X) \Vert _p \\&\le C_D\sum _{ d=1}^D L^d\sum _{a=1}^d \mathop {\mathop {\sum }\limits _{{k_1, \ldots , k_a > 0}}}\limits _{k_1+\cdots +k_a=d} \Big \Vert \Big \langle B_d \circ \mathbf {1}_{L(\mathcal {K}(k_1,\ldots ,k_a))},\bigotimes _{j=1}^a \bigotimes _{k=1}^{k_j} (g^{(j)}_{i,k})_{i\le n}\Big \rangle \Big \Vert _p\nonumber \\&\le C_D\sum _{d=1}^D L^d \sum _{a=1}^d \mathop {\mathop {\sum }\limits _{{k_1, \ldots , k_a > 0}}}\limits _{k_1+\cdots +k_a=d} \sum _{\mathcal {J} \in P_{d}} p^{\#\mathcal {J}/2}\Vert B_d \circ \mathbf {1}_{L(\mathcal {K}(k_1,\ldots ,k_a))}\Vert _\mathcal {J}. \end{aligned}$$
Note that for all \(k_1,\ldots ,k_a\) by Corollary 5.3 we have \(\Vert B_d\circ \mathbf {1}_{L(\mathcal {K}(k_1,\cdots ,k_a))}\Vert _\mathcal {J} \le C_d \Vert B_d\Vert _\mathcal {J}\). Thus we obtain
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X) \Vert _p\le C_D\sum _{d=1}^D L^d\sum _{\mathcal {J} \in P_{d}} p^{\#\mathcal {J}/2}\Vert B_d \Vert _\mathcal {J}. \end{aligned}$$
Our next goal is to replace \(B_d\) in the above inequality by \(\mathbb E\mathbf {D}^d f(X)\). To this end we will analyse the structure of the coefficients of \(B_d\) and compare them with the integrated partial derivatives of \(f\).
Let us first calculate \(\mathbb E\mathbf {D}^d f(X)\). Consider \(\mathbf{r}\in [n]^d\), such that \(i_1,\ldots ,i_a\) are its distinct values, taken \(l_1,\ldots ,l_a\) times respectively. We have
$$\begin{aligned}&\mathbb E\frac{\partial ^{d} f}{\partial x_{r_1}\cdots \partial x_{r_d}}(X) = \sum _{k_1\ge l_1,\ldots ,k_a\ge l_a}\sum _{a\le m \le D} \mathop {\mathop {\sum }\limits _{{k_{a+1},\ldots ,k_m > 0}}}\limits _{k_1+\cdots +k_m \le D} \mathop {\mathop {\sum }\limits _{{i_{a+1},\ldots ,i_m}}}\limits _ {(i_1,\ldots ,i_m) \in [n]^{\underline{m}}}\\&\quad \Bigg [\left( {\begin{array}{c}m\\ a\end{array}}\right) a!c^{(k_1+\cdots +k_m)}_{(i_1,k_1),\ldots ,(i_m,k_m)}\prod _{j=1}^a \mathbb EX_{i_j}^{k_j-l_j}\prod _{j=a+1}^m \mathbb EX_{i_j}^{k_j}\prod _{j=1}^a \frac{k_j!}{(k_j - l_j)!}\Bigg ], \end{aligned}$$
where we have used (48).

By comparing this with the definition of \(b^{(d)}_{r_1,\cdots ,r_d}\) and \(d^{(k_1,\ldots ,k_a)}_{i_1,\ldots ,i_a}\) one can see that the sub-sum of the right hand side above corresponding to the choice \(k_1 = l_1,\ldots ,k_a = l_a\) is equal to \(a!l_1!\cdots l_a! b^{(d)}_{r_1,\ldots ,r_d}\).

In particular for \(d=D\), since \(l_1+\cdots +l_a = D\), we have
$$\begin{aligned} \mathbb E\frac{\partial ^{D} f}{\partial x_{r_1}\cdots \partial x_{r_D}}(X) = a! l_1! \cdots l_a! b^{(D)}_{r_1,\ldots ,r_D} \end{aligned}$$
and so
$$\begin{aligned} \Vert B_D\Vert _\mathcal {J}\le \sum _{\mathcal {K}\in P_D} \Vert B_D\circ \mathbf {1}_{L(\mathcal {K})}\Vert _\mathcal {J} \le \sum _{\mathcal {K}\in P_D} \Vert \mathbf {D}^D f(X)\circ \mathbf {1}_{L(\mathcal {K})}\Vert _\mathcal {J} \le C_D \Vert \mathbf {D}^D f(X)\Vert _\mathcal {J}, \end{aligned}$$
where in the last inequality we used Corollary 5.3. Therefore if we prove that for all \(d < D\) and all partitions \(\mathcal {I} = \{I_1,\ldots ,I_a\},\mathcal {J} = \{J_1,\ldots ,J_b\} \in P_d\),
$$\begin{aligned} \Vert a!\#I_1!\cdots \#I_a! (B_d \circ \mathbf {1}_{L(\mathcal {I})}) \!-\! \mathbb E\mathbf {D}^d f(X)\circ \mathbf {1}_{L(\mathcal {I})}\Vert _\mathcal {J} \!\le \! C_D \sum _{d< k \le D} L^{k-d}\! \mathop {\mathop {\sum }\limits _{{\mathcal {K} \in P_k}}}\limits _ {\#\mathcal {K} = \#\mathcal {J}} \Vert B_k\Vert _\mathcal {K}\!, \end{aligned}$$
(52)
then by simple reverse induction (using again Corollary 5.3) we will obtain
$$\begin{aligned} \sum _{ d=1}^{D} L^d \sum _{\mathcal {J} \in P_{d}} p^{\#\mathcal {J}/2}\Vert B_d \Vert _\mathcal {J} \le C_D\sum _{1\le d\le D} L^d\sum _{\mathcal {J} \in P_{d}} p^{\#\mathcal {J}/2}\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}, \end{aligned}$$
which will end the proof of the theorem.
Fix any \(d < D\) and partitions \(\mathcal {I} = \{I_1,\ldots ,I_a\}, \mathcal {J} = \{J_1,\ldots ,J_b\} \in P_d\). Denote \(l_i = \# I_i\). For every sequence \(k_1,\ldots ,k_a\) such that \(k_i \ge l_i\) for \(i\le a\) and there exists \(i \le a\) such that \(k_i > l_i\), let us define a \(d\)-indexed matrix \(E^{(d,k_1,\ldots ,k_a)}_\mathcal {I} = (e^{(d,k_1,\ldots ,k_a)}_\mathbf{r})_{\mathbf{r}\in [n]^d}\), such that \(e^{(d,k_1,\ldots ,k_a)}_\mathbf{r}= 0\) if \(\mathbf{r}\notin L(\mathcal {I})\) and for \(\mathbf{r}\in L(\mathcal {I})\),
$$\begin{aligned} e^{(d,k_1,\ldots ,k_a)}_\mathbf{r}= \sum \limits _{m=a}^D\,\, \mathop {\mathop {\sum }\limits _{k_{a+1},\ldots ,k_m > 0}}\limits _{k_1+\cdots +k_m \le D}\,\, \mathop {\mathop {\sum }\limits _{{i_{a+1},\ldots ,i_m}}}\limits _{(i_1,\ldots ,i_m) \in [n]^{\underline{m}}} \left( {\begin{array}{c}m\\ a\end{array}}\right) c^{(k_1+\cdots +k_m)}_{(i_1,k_1),\ldots ,(i_m,k_m)}\prod _{j=1}^a \mathbb EX_{i_j}^{k_j-l_j}\prod _{j=a+1}^m \mathbb EX_{i_j}^{k_j}, \end{aligned}$$
where \(i_1,\ldots ,i_a\) are the values of \(\mathbf{r}\) corresponding to the level sets \(I_1,\ldots ,I_a\). We then have
$$\begin{aligned}&\mathop {\mathop {\sum }\limits _{{k_1\ge l_1,\ldots ,k_a \ge l_a}}}\limits _ {\exists _i k_i > l_i} a!\frac{k_1!}{(k_1-l_1)!}\cdots \frac{k_a!}{(k_a-l_a)!}E_\mathcal {I}^{(d,k_1,\ldots ,k_a)}\\&\quad = \mathbb E\mathbf {D}^d f(X)\circ \mathbf {1}_{L(\mathcal {I})} - a!l_1!\cdots l_a! B_d\circ \mathbf {1}_{L(\mathcal {I})}. \end{aligned}$$
Since we do not pay attention to constants depending on \(D\) only, by the above formula and the triangle inequality, to prove (52) it is enough to show that for all sequences \(k_1,\ldots ,k_a\) such that \(k_1+\cdots +k_a \le D, k_i \ge l_i\) for \(i\le a\) and there exists \(i \le a\) such that \(k_i > l_i\), one has
$$\begin{aligned} \Vert E^{(d,k_1,\ldots ,k_a)}_\mathcal {I}\Vert _{\mathcal {J}} \le C_D L^{\sum _{j\le a} (k_j-l_j)}\Vert B_{k_1+\cdots +k_a}\Vert _\mathcal {K} \end{aligned}$$
(53)
for some partition \(\mathcal {K}\in P_{k_1+\cdots +k_a}\) with \(\# \mathcal {K} = \#\mathcal {J}\) (note that \(\sum _{j\le a} l_j = d\)). Therefore in what follows we will fix \(k_1,\ldots ,k_a\) as above and to simplify the notation we will write \(E^{(d)}\) instead of \(E^{(d,k_1,\ldots ,k_a)}_\mathcal {I}\) and \(e^{(d)}_\mathbf{r}\) instead of \(e^{(d,k_1,\ldots ,k_a)}_\mathbf{r}\).
Fix therefore any partition \(\tilde{\mathcal {I}} = \{\tilde{I}_1,\ldots ,\tilde{I}_a\} \in P_{k_1+\cdots +k_a}\) such that \(\#\tilde{I}_i = k_i\) and \(I_i \subseteq \tilde{I}_i\) for all \(i \le a\) (the specific choice of \(\tilde{\mathcal {I}}\) is irrelevant). Finally define a \((k_1+\cdots +k_a)\)-indexed matrix \(\tilde{E}^{(k_1+\cdots +k_a)} = (\tilde{e}^{(k_1+\cdots +k_a)}_\mathbf{r})_{\mathbf{r}\in [n]^{k_1+\cdots +k_a}}\) by setting
$$\begin{aligned} \tilde{e}^{(k_1+\cdots +k_a)}_\mathbf{r}= e^{(d)}_{\mathbf{r}_{[d]}} \mathbf {1}_{\{\mathbf{r}\in L(\mathcal {\tilde{I}})\}}. \end{aligned}$$
(54)
In other words, the new matrix is created by embedding the \(d\)-indexed matrix into a “generalized diagonal” of a \((k_1+\cdots +k_a)\)-indexed matrix by adding \(\sum _{j\le a} (k_j-l_j)\) new indices and assigning to them the values of old indices (for each \(j \le a\) we add \(k_j-l_j\) times the common value attained by \(\mathbf{r}_{[d]}\) on \(I_j\)).
Recall now the definition of the coefficients \(b^{(d)}_\mathbf{r}\) and note that for any \(\mathbf{r}\in L(\mathcal {\tilde{I}})\subseteq [n]^{k_1+\ldots +k_a}\) we have \(\tilde{e}^{(k_1+\cdots +k_a)}_\mathbf{r}= b^{(k_1+\cdots +k_a)}_\mathbf{r}\prod _{j=1}^a\mathbb EX_{i_j}^{k_j-l_j}\), where for \(j\le a, i_j\) is the value of \(\mathbf{r}\) on its level set \(\tilde{I}_j\). This means that \(\tilde{E}^{(k_1+\cdots +k_a)} = (B_{k_1+\cdots +k_a}\circ \mathbf {1}_{L(\mathcal {\tilde{I}})})\circ (\otimes _{s=1}^{k_1+\cdots +k_a} v_s)\), where \(v_s = (\mathbb EX_i^{k_j-l_j})_{i\le n}\) if \(s = \min I_j\) for some \(j \le a\), and \(v_s = (1,\ldots ,1)\) otherwise. Since \(\Vert v_s\Vert _\infty \le (C_DL)^{k_j-l_j}\) if \(s = \min I_j\) for some \(j \le a\) and \(\Vert v_s\Vert _\infty = 1\) otherwise, by Lemma 5.1 this implies that for any \(\mathcal {K} \in P_{k_1+\cdots +k_a}\),
$$\begin{aligned} \Vert \tilde{E}^{(k_1+\cdots +k_a)}\Vert _\mathcal {K}&\le (C_D L)^{\sum _{j\le a}(k_j-l_j)}\Vert B_{k_1+\cdots +k_a}\circ \mathbf {1}_{L(\tilde{\mathcal {I}})}\Vert _\mathcal {K} \nonumber \\&\le C_DL^{\sum _{j\le a}(k_j-l_j)}\Vert B_{k_1+\cdots +k_a}\Vert _\mathcal {K}, \end{aligned}$$
(55)
where in the last inequality we used Corollary 5.3.
We will now use the above inequality to prove (53). Consider the unique partition \(\mathcal {K} = \{K_1,\ldots ,K_b\}\) satisfying the following two conditions:
  • for each \(j\le b, J_j \subseteq K_j\),

  • for each \(s \in \{d+1,\ldots ,k_1+\cdots +k_a\}\) if \(s \in \tilde{I}_j\) and \(\pi (s) := \min \tilde{I}_j\in J_k\), then \(s \in K_k\).

In other words, all indices \(s\), which in the construction of \(\tilde{\mathcal {I}}\) were added to \(I_j\) (i.e., elements of \(\tilde{I}_j{\setminus } I_j\)) are now added to the unique element of \(\mathcal {J}\) containing \(\pi (s) = \min \tilde{I}_j = \min I_j\).
Now, it is easy to see that \(\Vert E^{(d)}\Vert _\mathcal {J}\le \Vert \tilde{E}^{(k_1+\cdots +k_a)}\Vert _\mathcal {K}\). Indeed, consider an arbitrary \(x^{(j)} = (x_{\mathbf{r}_{J_j}}^{(j)})_{|\mathbf{r}_{J_j}|\le n}, j=1,\ldots ,b\), satisfying \(\Vert x^{(j)}\Vert _2\le 1\). Define \(y^{(j)} = (y_{\mathbf{r}_{K_j}}^{(j)})_{|\mathbf{r}_{K_j}|\le n}, j =1,\ldots ,b\) with the formula
$$\begin{aligned} y^{(j)}_{\mathbf{r}_{K_j}} = x^{(j)}_{\mathbf{r}_{K_j\cap [d]}}\prod _{s\in K_j \setminus [d]}\mathbf {1}_{\{r_s = r_{\pi (s)}\}}. \end{aligned}$$
We have \(\Vert y^{(j)}\Vert _2 = \Vert x^{(j)}\Vert _2\le 1\). Moreover, by the construction of the matrix \(\tilde{E}^{(k_1+\cdots +k_a)}\) (recall (54)), we have
$$\begin{aligned} \sum _{|\mathbf{r}_{[d]}|\le n} e_{\mathbf{r}_{[d]}}^{(d)} \prod _{j=1}^b x^{(j)}_{\mathbf{r}_{J_j}}&= \sum _{|\mathbf{r}_{[k_1+\ldots +k_a]}|\le n} \tilde{e}_{\mathbf{r}_{[k_1+\ldots +k_a]}}^{(k_1+\ldots +k_a)} \prod _{j=1}^b x^{(j)}_{\mathbf{r}_{J_j}} \\&= \sum _{|\mathbf{r}_{[k_1+\ldots +k_a]}|\le n} \tilde{e}_{\mathbf{r}_{[k_1+\ldots +k_a]}}^{(k_1+\ldots +k_a)} \prod _{j=1}^b y^{(j)}_{\mathbf{r}_{K_j}} \end{aligned}$$
(in the last equality we used the fact that if \(\mathbf{r}\in L(\tilde{\mathcal {I}})\), then for \(s > d, r_{\pi (s)} = r_s\) and so \(y^{(j)}_{\mathbf{r}_{K_j}} = x^{(j)}_{\mathbf{r}_{K_j\cap [d]}} = x^{(j)}_{\mathbf{r}_{J_j}}\)). By taking the supremum over \(x^{(j)}\) one thus obtains \(\Vert E^{(d)}\Vert _\mathcal {J}\le \Vert \tilde{E}^{(k_1+\cdots +k_a)}\Vert _\mathcal {K}\). Combining this inequality with (55) proves (53) and thus (52). This ends the proof of Theorem 1.4.

5.3 Application: subgraph counting in random graphs

We will now apply results from Sect. 5 to some special cases of the problem of subgraph counting in Erdős-Rényi random graphs \(G(n,p)\), which is often used as a test model for deviation inequalities for polynomials in independent random variables. More specifically we will investigate the problem of counting cycles of a fixed length.

It turns out that Theorem 1.4 gives optimal inequalities in some range of parameters (leading to improvements of known results), whereas in some other regimes the estimates it gives are suboptimal.

Let us first describe the setting (we will do it in a slightly more general form than needed for our example). We will consider undirected graphs \(G = (V,E)\), where \(V\) is a finite set of vertices and \(E\) is the set of edges (i.e., two-element subsets of \(V\)). By \(V_G = V(G)\) and \(E_G = E(G)\) we mean the sets of vertices and edges (respectively) of a graph \(G\). Also, \(v_G = v(G)\) and \(e_G = e(G)\) denote the number of vertices and edges in \(G\). We say that a graph \(H\) is a subgraph of a graph \(G\) (which we denote by \(H \subseteq G\)) if \(V_H \subseteq V_G\) and \(E_H \subseteq E_G\) (thus a subgraph is not necessarily induced). Graphs \(H\) and \(G\) are isomorphic if there is a bijection \(\pi :V_H \rightarrow V_G\) such that for all distinct \(v,w \in V_H, \{\pi (v),\pi (w)\} \in E_G\) iff \(\{v,w\} \in E_H\).

For \(p \in [0,1]\) consider now the Erdős-Rényi random graph \(G = G(n,p)\), i.e., a graph with \(n\) vertices (we will assume that \(V_G = [n]\)) whose edges are selected independently at random with probability \(p\). In what follows we will be concerned with the number of copies of a given graph \(H = ([k],E_H)\) in the graph \(G\), i.e., the number of subgraphs of \(G\) which are isomorphic to \(H\). We will denote this random variable by \(Y_H(n,p)\). To relate \(Y_H(n,p)\) to polynomials, let us consider the family \(C(n,2)\) of two-element subsets of \([n]\) and the family of independent random variables \(X = (X_{e})_{e \in C(n,2)}\), such that \(\mathbb P(X_{e} = 1) = 1 - \mathbb P(X_{e} = 0) = p\) (i.e., \(X_{e}\) indicates whether the edge \(e\) has been selected or not). Denote moreover by \(\mathrm{Aut}(H)\) the group of isomorphisms of \(H\) into itself and note that
$$\begin{aligned} Y_H(n,p) = \frac{1}{\#\mathrm{Aut}(H)} \sum _{\mathbf{i}\in [n]^{\underline{k}}} \mathop {\mathop {\prod }\limits _{v,w \in [k]}}\limits _{v < w, \{v,w\} \in E(H)} X_{\{i_v,i_w\}}. \end{aligned}$$
The right-hand side above is a homogeneous tetrahedral polynomial of degree \(e_H\). Moreover the variables \(X_{\{v,w\}}\) satisfy
$$\begin{aligned} \mathbb E\exp \Big (X_{\{v,w\}}^2\log (1/p)\Big ) = 1 - p + p\cdot \frac{1}{p} \le 2 \end{aligned}$$
and
$$\begin{aligned} \mathbb E\exp \Big (X_{\{v,w\}}^2\log 2\Big ) \le 2, \end{aligned}$$
which implies that \(\Vert X_{\{v,w\}}\Vert _{\psi _2} \le (\log (1/p))^{-1/2}\wedge (\log (2))^{-1/2} \!\le \! \sqrt{2} (\log (2/p))^{-1/2}\).
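The bound on the \(\psi _2\) norm can be checked numerically. The sketch below assumes the normalization \(\Vert X\Vert _{\psi _2} = \inf \{L > 0 :\mathbb E\exp (X^2/L^2) \le 2\}\), which is the one implicit in the two displays above. Under this assumption the norm of a \(0\)-\(1\) variable with mean \(p\) has the closed form \((\log (1+1/p))^{-1/2}\); the function names are hypothetical.

```python
from math import exp, log, sqrt


def psi2_norm_bernoulli(p):
    """Smallest L with E exp(X^2 / L^2) = 1 - p + p * exp(1 / L^2) <= 2."""
    return 1.0 / sqrt(log(1.0 + 1.0 / p))


def stated_bound(p):
    """The bound sqrt(2) * (log(2/p))^(-1/2) from the text."""
    return sqrt(2.0) / sqrt(log(2.0 / p))


for p in (1e-6, 1e-2, 0.5, 1.0):
    L = stated_bound(p)
    # Exponential moment evaluated at the stated bound; it should not exceed 2.
    moment = 1.0 - p + p * exp(1.0 / L**2)
    print(p, psi2_norm_bernoulli(p), L, moment)
```

At \(L = \sqrt{2}(\log (2/p))^{-1/2}\) the exponential moment equals \(1 - p + \sqrt{2p}\), which is at most \(3/2\) on \((0,1]\).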
We can thus apply Theorem 1.4 to \(Y_H(n,p)\) and obtain
$$\begin{aligned}&\mathbb P\big (|Y_{H}(n,p) - \mathbb EY_{H}(n,p) |\ge t\big )\nonumber \\&\le 2\exp \bigg (-\frac{1}{C_k}\min _{1 \le d\le k}\min _{\mathcal {J} \in P_d} \Big (\frac{t}{L_p^d\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}}\Big )^{2/\#\mathcal {J}}\bigg ), \end{aligned}$$
(56)
where \(L_p = \sqrt{2} \big (\log (2/p)\big )^{-1/2}\) and \(f :\mathbb {R}^{C(n,2)} \rightarrow \mathbb {R}\) is given by
$$\begin{aligned} f((x_{e})_{e \in C(n,2)}) = \frac{1}{\# \mathrm{Aut}(H)} \sum _{\mathbf{i}\in [n]^{\underline{k}}} \mathop {\mathop {\prod }\limits _{{v,w \in [k]}}}\limits _{v < w, \{v,w\} \in E(H)} x_{\{i_v,i_w\}}. \end{aligned}$$
Deviation inequalities for subgraph counts have been studied by many authors, to mention [22, 26, 27, 35, 36, 37, 38, 65]. As it turns out the lower tail \(\mathbb P(Y_H(n,p) \le \mathbb EY_H(n,p) - t)\) is easier than the upper tail \(\mathbb P(Y_H(n,p) \ge \mathbb EY_H(n,p) +t)\). The lower tail turns out to be also lighter than the upper one. Since our inequalities concern \(|Y_H(n,p) - \mathbb EY_H(n,p)|\), we cannot hope to recover optimal lower tail estimates, however we can still hope to get bounds which in some range of parameters \(n,p\) will agree with optimal upper tail estimates.
Of particular importance in the literature is the law of large numbers regime, i.e., the case when \(t = \varepsilon \mathbb EY_H(n,p)\). In [35] the authors prove that for every \(\varepsilon >0\) such that \(\mathbb P\big (Y_H(n,p) \ge (1+\varepsilon )\mathbb EY_H(n,p)\big ) > 0\),
$$\begin{aligned} \exp \left( -C(H,\varepsilon )M_H^*(n,p)\log \frac{1}{p}\right)&\le \mathbb P\big (Y_H(n,p) \ge (1+\varepsilon )\mathbb EY_H(n,p)\big )\nonumber \\&\le \exp \big (-c(H,\varepsilon )M_H^*(n,p)\big ) \end{aligned}$$
(57)
for certain constants \(c(H,\varepsilon ), C(H,\varepsilon )\) and a certain function \(M_H^*(n,p)\). Since the general definition of \(M_H^*\) is rather involved we will skip the details (in the examples considered in the sequel we will provide specific formulas). Note that if one disregards the constants depending on \(H\) and \(\varepsilon \) only, the lower and upper estimates above differ by the factor \(\log (1/p)\) in the exponent. To the best of our knowledge, providing lower and upper bounds for general \(H\) which would agree up to multiplicative constants in the exponent (depending on \(H\) and \(\varepsilon \) only) is an open problem (see the remark below).

We will now specialize to the case when \(H\) is a cycle. For simplicity we will first present the case of the triangle \(K_3\) (the clique with three vertices). For this graph the upper bound from [35] has been recently strengthened to match the lower one (up to a constant depending only on \(\varepsilon \)) by Chatterjee [22] and DeMarco and Kahn [27] (who also obtained a similar result for general cliques [26]). In the next section we show that if \(p\) is not too small, the inequality (56) also allows one to recover the optimal upper bound. In Section 5.3.2 we provide an upper bound for cycles of arbitrary (fixed) length \(k\), which is optimal for \(p \ge n^{-\frac{k-2}{2(k-1)}} \log ^{-\frac{1}{2}} n\).

Remark (Added in revision) Very recently, after the first version of this article was submitted, a major breakthrough was obtained by Chatterjee-Dembo and Lubetzky-Zhao [23, 49], who strengthened the upper bound to \(\exp (-C(H,\varepsilon )M_H^*(n,p)\log \frac{1}{p})\) for general graphs and \(p \ge n^{-c(H)}\). In the case of cycles, which we consider in the sequel, our bounds are valid in a larger range of \(p \rightarrow 0\) than those which can be obtained from the present versions of the aforementioned papers. We would also like to point out that the methods of [23, 49] rely on large deviation principles and not on inequalities for general polynomials in independent random variables. Obtaining general inequalities for polynomials which would yield optimal bounds for general graphs is an interesting and apparently still open research problem.

5.3.1 Counting triangles

Assume that \(H = K_3\) and let us analyse the behaviour of \(\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}\) for \(d = 1,2,3\). Of course in this case \(\#\mathrm{Aut}(H) = 6\).

We have for any \(e = \{v,w\}, v,w \in [n]\),
$$\begin{aligned} \frac{\partial }{\partial x_e} f(x) = \sum _{i \in [n]\setminus \{v,w\}} x_{\{i,v\}} x_{\{i,w\}} \end{aligned}$$
and so \(\Vert \mathbb E\mathbf {D}f(X)\Vert _{\{1\}}= (n-2)p^2 \sqrt{n(n-1)/2} \le n^2p^2\).
For \(e_1 = e_2\) or when \(e_1\) and \(e_2\) do not have a common vertex, we have \(\frac{\partial ^2}{\partial x_{e_1}\partial x_{e_2}} f = 0\), whereas for \(e_1,e_2\) sharing exactly one vertex, we have
$$\begin{aligned} \frac{\partial ^2}{\partial x_{e_1}\partial x_{e_2}} f (x) = x_{\{v,w\}}, \end{aligned}$$
where \(v,w\) are the vertices of \(e_1,e_2\) distinct from the common one. Therefore
$$\begin{aligned} \mathbb E\mathbf {D}^2 f(X) = p (\mathbf {1}_{\{e_1,e_2\,\, \text {have exactly one common vertex}\}})_{e_1,e_2 \in C(n,2)}. \end{aligned}$$
Using the fact that \(\mathbb E\mathbf {D}^2 f(X)\) is symmetric and for each \(e_1\) the sum of entries of \(\mathbb E\mathbf {D}^2 f(X)\) in the row corresponding to \(e_1\) equals \(2p(n-2)\), we obtain \(\Vert \mathbb E\mathbf {D}^2 f(X)\Vert _{\{1\}\{2\}} = 2p(n-2) \le 2pn\). One can also easily see that \(\Vert \mathbb E\mathbf {D}^2 f(X)\Vert _{\{1,2\}} = p\sqrt{n(n-1)(n-2)} \le pn^{3/2}\).
Finally
$$\begin{aligned} \frac{\partial ^3}{\partial x_{e_1}\partial x_{e_2} \partial x_{e_3}} f = \mathbf {1}_{\{e_1,e_2,e_3\,\,\text {form a triangle}\}} \end{aligned}$$
and thus \(\Vert \mathbb E\mathbf {D}^3f(X)\Vert _{\{1,2,3\}} = \sqrt{n(n-1)(n-2)} \le n^{3/2}\). Moreover, due to symmetry we have
$$\begin{aligned} \Vert \mathbb E\mathbf {D}^3 f(X)\Vert _{\{1,2\}\{3\}} = \Vert \mathbb E\mathbf {D}^3 f(X)\Vert _{\{1,3\}\{2\}} = \Vert \mathbb E\mathbf {D}^3 f(X)\Vert _{\{2,3\}\{1\}}. \end{aligned}$$
Consider arbitrary \((x_{e_1})_{e_1 \in C(n,2)}\) and \((y_{e_2,e_3})_{e_2,e_3 \in C(n,2)}\) of norm one. We have
$$\begin{aligned}&\sum _{e_1,e_2,e_3} \mathbf {1}_{\{e_1,e_2,e_3\,\text {form a triangle}\}}x_{e_1}y_{e_2,e_3} \le \sqrt{\sum _{e_1} \Big ( \sum _{e_2,e_3}\mathbf {1}_{\{e_1,e_2,e_3\,\,\text {form a triangle}\}}y_{e_2,e_3} \Big )^2}\\&\quad \le \sqrt{\sum _{e_1} \Big ( \sum _{e_2,e_3}\mathbf {1}_{\{e_1,e_2,e_3\,\, \text {form a triangle}\}} \Big ) \Big ( \sum _{e_2,e_3}\mathbf {1}_{\{e_1,e_2,e_3\,\,\text {form a triangle}\}}y_{e_2,e_3}^2 \Big ) }\\&\quad = \sqrt{2(n-2)} \sqrt{ \sum _{e_2,e_3} y_{e_2,e_3}^2 \sum _{e_1} \mathbf {1}_{\{e_1,e_2,e_3\,\,\text {form a triangle}\}} } \le \sqrt{2(n-2)}, \end{aligned}$$
where the first two inequalities follow by the Cauchy-Schwarz inequality and the last one from the fact that for each \(e_2,e_3\) there is at most one \(e_1\) such that \(e_1,e_2,e_3\) form a triangle. We have thus obtained \(\Vert \mathbb E\mathbf {D}^3 f(X)\Vert _{\{1,2\}\{3\}} = \Vert \mathbb E\mathbf {D}^3 f(X)\Vert _{\{1,3\}\{2\}} = \Vert \mathbb E\mathbf {D}^3 f(X)\Vert _{\{2,3\}\{1\}} \le \sqrt{2n}\).
It remains to estimate \(\Vert \mathbb E\mathbf {D}^3 f(X)\Vert _{\{1\}\{2\}\{3\}}\). For all \((x_e)_{e\in C(n,2)}, (y_e)_{e\in C(n,2)}, (z_e)_{e\in C(n,2)}\) of norm one we have by the Cauchy-Schwarz inequality
$$\begin{aligned}&\sum _{e_1,e_2,e_3}\mathbf {1}_{\{e_1,e_2,e_3\,\, \text {form a triangle}\}}x_{e_1}y_{e_2} z_{e_3} = \sum _{(i_1,i_2,i_3) \in [n]^{\underline{3}}}x_{\{i_1,i_2\}}y_{\{i_2,i_3\}}z_{\{i_1,i_3\}}\\&\quad \le \sum _{i_1 \in [n]} \left( \sum _{(i_2,i_3) \in ([n]\setminus {\{i_1\}})^{\underline{2}}} x_{\{i_1,i_2\}}^2 z_{\{i_1,i_3\}}^2\right) ^{1/2}\left( \sum _{(i_2,i_3) \in ([n]\setminus {\{i_1\}})^{\underline{2}}} y_{\{i_2,i_3\}}^2\right) ^{1/2}\\&\quad \le \sqrt{2}\sum _{i_1 \in [n]}\left( \sum _{i_2\in [n]\setminus {\{i_1\}}} x_{\{i_1,i_2\}}^2\right) ^{1/2}\left( \sum _{i_3\in [n]\setminus {\{i_1\}}} z_{\{i_1,i_3\}}^2\right) ^{1/2}\\&\quad \le \sqrt{2}\left( \sum _{(i_1,i_2)\in [n]^{\underline{2}}} x_{\{i_1,i_2\}}^2\right) ^{1/2}\left( \sum _{(i_1,i_3)\in [n]^{\underline{2}}} z_{\{i_1,i_3\}}^2\right) ^{1/2} \le 2^{3/2}, \end{aligned}$$
which gives \(\Vert \mathbb E\mathbf {D}^3 f(X)\Vert _{\{1\}\{2\}\{3\}} \le 2^{3/2}\).
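As a sanity check on the norms computed above, the following Python sketch evaluates them by brute force for a small \(n\). It is only an illustration under stated assumptions: the name `triangle_norms` is hypothetical, \(n \ge 3\) and \(p > 0\) are assumed, the operator norm of \(\mathbb E\mathbf {D}^2 f(X)\) is approximated by power iteration, and the norm \(\Vert \cdot \Vert _{\{1\}\{2,3\}}\) is computed using the observation that \(e_2,e_3\) determine \(e_1\), so the rows of the corresponding unfolding have disjoint supports.

```python
from itertools import combinations, permutations
from math import sqrt


def triangle_norms(n, p, iters=200):
    edges = [frozenset(e) for e in combinations(range(n), 2)]
    m = len(edges)

    # E D f(X): every coordinate equals (n - 2) p^2.
    d1 = sqrt(m) * (n - 2) * p**2

    # E D^2 f(X): entry p iff the two edges share exactly one vertex.
    a = [[p if len(e1 & e2) == 1 else 0.0 for e2 in edges] for e1 in edges]
    d2_hs = sqrt(sum(x * x for row in a for x in row))   # norm for {1,2}

    # Norm for {1}{2}: operator norm of a symmetric matrix, by power iteration.
    v = [float(j + 1) for j in range(m)]
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(m)) for row in a]
        norm_w = sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]
    d2_op = sqrt(sum(sum(row[j] * v[j] for j in range(m)) ** 2 for row in a))

    # D^3 f: indicator that three distinct edges span exactly three vertices.
    triangles = [t for t in permutations(edges, 3)
                 if len(t[0] | t[1] | t[2]) == 3]
    d3_hs = sqrt(len(triangles))                         # norm for {1,2,3}

    # Norm for {1}{2,3}: rows indexed by e_1 have disjoint supports, so the
    # operator norm is the largest Euclidean norm of a row.
    row_counts = {}
    for e1, _, _ in triangles:
        row_counts[e1] = row_counts.get(e1, 0) + 1
    d3_mixed = sqrt(max(row_counts.values()))

    return {"d1": d1, "d2_op": d2_op, "d2_hs": d2_hs,
            "d3_hs": d3_hs, "d3_mixed": d3_mixed}


print(triangle_norms(5, 0.3))
```

For \(n = 5\) the computed values can be compared with \((n-2)p^2\sqrt{n(n-1)/2}\), \(2p(n-2)\), \(p\sqrt{n(n-1)(n-2)}\), \(\sqrt{n(n-1)(n-2)}\) and \(\sqrt{2(n-2)}\) respectively; in particular the Cauchy-Schwarz bound \(\sqrt{2(n-2)}\) is attained.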

Using (56) together with the above estimates, we obtain

Proposition 5.5

For any \(t >0\),
$$\begin{aligned}&\mathbb P\big (|Y_{K_3}(n,p) - \mathbb EY_{K_3}(n,p) | \ge t\big )\\&\quad \le 2\exp \Big (-\frac{1}{C}\min \Big (\frac{t^2}{L_p^6 n^3 + L_p^4p^2n^3 + L_p^2p^4 n^4},\frac{t}{L_p^3n^{1/2} +L_p^2 p n },\frac{t^{2/3}}{L_p^2}\Big )\Big ), \end{aligned}$$
where \(L_p = \big (\log (2/p)\big )^{-1/2}\).
In particular for \(t = \varepsilon \mathbb EY_{K_3}(n,p) = \varepsilon \left( {\begin{array}{c}n\\ 3\end{array}}\right) p^3\),
$$\begin{aligned}&\mathbb P\big (|Y_{K_3}(n,p) - \mathbb EY_{K_3}(n,p) | \ge \varepsilon \mathbb EY_{K_3}(n,p) \big )\\&\quad \le 2\exp \Big (-\frac{1}{C} \min \Big (\varepsilon ^2 n^3p^6 \log ^3(2/p), (\varepsilon ^2 \wedge \varepsilon ^{2/3}) n^2p^2\log (2/p)\Big )\Big ). \end{aligned}$$
Thus for \(p \ge n^{-\frac{1}{4}}\log ^{-\frac{1}{2}} n\) we obtain
$$\begin{aligned} \mathbb P\big (|Y_{K_3}(n,p) \!-\! \mathbb EY_{K_3}(n,p) | \!\ge \! \varepsilon \mathbb EY_{K_3}(n,p) \big ) \le 2\exp \Big (\!-\frac{1}{C}(\varepsilon ^2 \wedge \varepsilon ^{2/3}) n^2p^2\log (2/p)\Big ). \end{aligned}$$
By Corollary 1.7 in [35], if \(p \ge 1/n\), then \(\frac{1}{C} n^2p^2 \le M^*_{K_3}(n,p) \le C n^2p^2\) (recall (57)) and so for \(p \ge n^{-1/4} \log ^{-1/2} n\) the estimate obtained from the above proposition is optimal (it matches the lower bound in (57) up to the constant in the exponent). As already mentioned, the optimal estimate has been recently obtained in the full range of \(p\) by Chatterjee, DeMarco and Kahn. Unfortunately it seems that using our general approach we are not able to recover the full strength of their result.

From Proposition 5.5 one can also see that Theorem 1.4, when specialized to polynomials in \(0\)-\(1\) random variables, is not directly comparable with the family of Kim-Vu inequalities. As shown in [36] (see Table 2 therein), various inequalities by Kim and Vu give for the triangle counting problem exponents \(-\min (n^{1/3}p^{1/6}, n^{1/2}p^{1/2}), -n^{3/2}p^{3/2}, -np\) (disregarding logarithmic factors). Thus for “large” \(p\) our inequality performs better than those by Kim-Vu, whereas for “small” \(p\) this is not the case (note that the Kim-Vu inequalities give meaningful bounds for \(p \ge C n^{-1}\) while ours only for \(p \ge C n^{-1/2}\)). As already mentioned in the introduction, the fact that our inequalities degenerate for small \(p\) is not surprising, as even for sums of independent \(0\)-\(1\) random variables, when \(p\) becomes small, general inequalities for the sums of independent random variables with sub-Gaussian tails do not recover the correct tail behaviour (the \(\Vert \cdot \Vert _{\psi _2}\) norm of the summands becomes much larger than the variance).
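To see how the three terms in the exponent of Proposition 5.5 interact, one can evaluate them numerically at \(t = \varepsilon \mathbb EY_{K_3}(n,p)\). The sketch below is only illustrative: the unspecified constant \(C\) is set to \(1\) (an assumption, so only orders of magnitude are meaningful) and the function names are hypothetical.

```python
from math import comb, log


def prop55_exponent(n, p, t):
    """The minimum inside the exponent of Proposition 5.5, with C set to 1."""
    L2 = 1.0 / log(2.0 / p)                      # L_p^2
    gauss = t**2 / (L2**3 * n**3 + L2**2 * p**2 * n**3 + L2 * p**4 * n**4)
    mixed = t / (L2**1.5 * n**0.5 + L2 * p * n)
    cubic = t ** (2.0 / 3.0) / L2
    return min(gauss, mixed, cubic)


def ratio_to_optimal(n, p, eps=1.0):
    """Exponent at t = eps * E Y_{K_3}, divided by n^2 p^2 log(2/p)."""
    t = eps * comb(n, 3) * p**3
    return prop55_exponent(n, p, t) / (n**2 * p**2 * log(2.0 / p))


n = 10**6
threshold = n**-0.25 * log(n) ** -0.5            # n^{-1/4} log^{-1/2} n
for p in (0.5, threshold, n**-0.4):
    print(p, ratio_to_optimal(n, p))
```

Above the threshold the ratio stays bounded away from zero, whereas for \(p = n^{-0.4}\) the term \(\varepsilon ^2n^3p^6\log ^3(2/p)\) takes over and the ratio becomes small.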

5.3.2 Counting cycles

We will now generalize Proposition 5.5 to cycles of arbitrary length. If \(H\) is a cycle of length \(k\), then by Corollary 1.7 in [35], \(\frac{1}{C} n^2p^2 \le M^*_{H}(n,p) \le C n^2p^2\) for \(p\ge 1/n\). Thus the bounds for the upper tail from (57) imply that for \(p \ge 1/n\),
$$\begin{aligned} \exp \big (-C(k,\varepsilon )n^2p^2\log (1/p)\big )&\le \mathbb P\big (Y_H(n,p) \ge (1+\varepsilon )\mathbb EY_H(n,p) \big ) \\&\le \exp \big (-c(k,\varepsilon )n^2p^2\big ) \end{aligned}$$
for every \(\varepsilon > 0\) for which the above probability is not zero.

We will show that, similarly to the case of triangles, Theorem 1.4 allows one to strengthen the upper bound if \(p\) is not too small with respect to \(n\). More precisely, we have the following

Proposition 5.6

Let \(H\) be a cycle of length \(k\). Then for every \(t > 0\),
$$\begin{aligned}&\mathbb P\big (|Y_H(n,p) - \mathbb EY_H(n,p)| \ge t\big )\\&\le 2\exp \Big (-\frac{1}{C_k} \Big (\frac{t^2}{L_p^{2k}n^k} \wedge \min _{\begin{array}{c} 1\le l\le d\le k :\\ d<k\;\mathrm{or}\; l>1 \end{array}}\Big (\frac{t^{2/l}}{L_p^{2d/l} p^{2(k-d)/l}n^{(2k-d-l)/l}}\Big )\Big )\Big ), \end{aligned}$$
where \(L_p \!=\! \big (\log (2/p)\big )^{-1/2}\). In particular for every \(\varepsilon > 0\) and \(p \ge n^{-\frac{k-2}{2(k-1)}}\log ^{-1/2} n\),
$$\begin{aligned} \mathbb P\big (Y_H(n,p) \ge (1+\varepsilon )\mathbb EY_H(n,p)\big ) \le 2\exp \Big (-\frac{1}{C_k} (\varepsilon ^2 \wedge \varepsilon ^{2/k}) n^2p^2\log (2/p)\Big ). \end{aligned}$$

In order to prove the above proposition we need to estimate the corresponding \(\Vert \cdot \Vert _\mathcal {J}\) norms. Since a major part of the argument does not rely on the fact that \(H\) is a cycle and bounds on \(\Vert \cdot \Vert _\mathcal {J}\) norms may be of independent interest, we will now consider arbitrary graphs. Let thus \(H\) be a fixed graph with no isolated vertices.

Similarly to [35], it will be more convenient to count “ordered” copies of a graph \(H\) in \(G(n,p)\). Namely, for \(H = ([k], E_H)\), each sequence of \(k\) distinct vertices in the clique \(K_n, \mathbf{i}\in [n]^{\underline{k}}\) determines an ordered copy \(G_{\mathbf{i}}\) of \(H\) in \(K_n\), where \(G_\mathbf{i}= \mathbf{i}(H)\), i.e., \(V(G_\mathbf{i}) = \mathbf{i}([k])\) and \(E(G_\mathbf{i}) = \{ \mathbf{i}(e) :e \in E(H) \} = \left\{ \{i_u, i_v\} :\{u, v\} \in E(H) \right\} \). Define
$$\begin{aligned} X_H(n,p) := \sum _{\mathbf{i}\in [n]^{\underline{k}}} \mathbf {1}_{\{G_{\mathbf{i}} \subseteq G(n,p)\}} = \sum _{\mathbf{i}\in [n]^{\underline{k}}} \; \prod _{\tilde{e} \in E(G_\mathbf{i})} X_{\tilde{e}}. \end{aligned}$$
Clearly \(X_H(n,p) = \# \mathrm{Aut}(H) Y_H(n,p)\) and \(X_H(n,p) = f(X)\), where
$$\begin{aligned} f(x) := \sum _{\mathbf{i}\in [n]^{\underline{k}}} \; \prod _{\tilde{e} \in E(G_{\mathbf{i}})} x_{\tilde{e}} = \sum _{\mathbf{i}\in [n]^{\underline{k}}} \; \prod _{e \in E(H)} x_{\mathbf{i}(e)}. \end{aligned}$$
(58)
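The relation between ordered and unordered copies can be illustrated by a brute-force evaluation of (58) on a fixed graph when \(H\) is a cycle of length \(k \ge 3\), in which case \(\#\mathrm{Aut}(H) = 2k\). The following Python sketch uses hypothetical function names and labels the vertices \(0,\ldots ,n-1\); it is meant for very small \(n\) only.

```python
from itertools import combinations, permutations


def ordered_cycle_copies(n, edge_set, k):
    """X_H for a k-cycle H: number of injective i whose copy G_i lies in the graph."""
    count = 0
    for i in permutations(range(n), k):
        if all(frozenset((i[v], i[(v + 1) % k])) in edge_set for v in range(k)):
            count += 1
    return count


def cycle_copies(n, edge_set, k):
    """Y_H = X_H / #Aut(H); a k-cycle with k >= 3 has 2k automorphisms."""
    return ordered_cycle_copies(n, edge_set, k) // (2 * k)


complete = {frozenset(e) for e in combinations(range(5), 2)}
print(cycle_copies(5, complete, 3), cycle_copies(5, complete, 4))   # 10 15
```

In the complete graph on five vertices every choice of three vertices gives one triangle and every choice of four vertices gives three cycles of length four, which is consistent with the printed values.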
A sequence of distinct edges \((\tilde{e}_1, \ldots , \tilde{e}_d) \in E(K_n)^{\underline{d}}\) determines a subgraph \(G_0 \subseteq K_n\) with \(V(G_0) = \bigcup _{i=1}^d \tilde{e}_i, E(G_0) = \{\tilde{e}_1, \ldots , \tilde{e}_d\}\). Note that
$$\begin{aligned} \partial _{G_0} f(x) := \frac{\partial ^d f(x)}{\partial x_{\tilde{e}_1} \cdots \partial x_{\tilde{e}_d}} = \sum _{\mathbf{i}\in [n]^{\underline{k}} :G_{\mathbf{i}} \supseteq G_0} \; \prod _{\tilde{e} \in E(G_{\mathbf{i}}) \setminus E(G_0)} x_{\tilde{e}} \end{aligned}$$
and thus
$$\begin{aligned} \mathbb E\partial _{G_0} f(X) = p^{e(H) - d} \#\{ \mathbf{i}\in [n]^{\underline{k}} :G_0 \subseteq G_\mathbf{i}\}. \end{aligned}$$
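The counting formula for \(\mathbb E\partial _{G_0} f(X)\) can be checked by direct enumeration when \(H\) is a cycle of length \(k\), so that \(e(H) = k\). The sketch below is a brute-force illustration with hypothetical function names; the edges of \(G_0\) are passed as pairs of vertices of \(K_n\) labelled \(0,\ldots ,n-1\).

```python
from itertools import permutations


def ordered_copies_containing(n, k, g0):
    """Number of injective i in [n]^k whose k-cycle copy G_i contains all edges of g0."""
    g0 = {frozenset(e) for e in g0}
    count = 0
    for i in permutations(range(n), k):
        copy_edges = {frozenset((i[v], i[(v + 1) % k])) for v in range(k)}
        if g0 <= copy_edges:
            count += 1
    return count


def expected_partial(n, k, p, g0):
    """E of the partial derivative: p^(e(H) - d) times the count, with e(H) = k."""
    d = len({frozenset(e) for e in g0})
    return p ** (k - d) * ordered_copies_containing(n, k, g0)


print(ordered_copies_containing(5, 3, [(0, 1)]))          # 18
print(ordered_copies_containing(5, 3, [(0, 1), (1, 2)]))  # 6
print(ordered_copies_containing(5, 4, [(0, 1), (2, 3)]))  # 16
```

For the triangle and a single edge the count is \(6(n-2)\), i.e., \(\#\mathrm{Aut}(K_3)\) times the number of triangles of \(K_n\) containing that edge; two disjoint edges are contained in no triangle, but in a cycle of length four they may appear as opposite edges.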
Consider \(\mathbf{e}= (e_1, \ldots , e_d) \in E(H)^{\underline{d}}\) and let \(H_0(\mathbf{e})\) be the subgraph of \(H\) with \(V(H_0(\mathbf{e})) = \bigcup _{i=1}^d e_i, E(H_0(\mathbf{e})) = \{e_1, \ldots , e_d\}\). Clearly, for any \(\mathbf{i}\in [n]^{\underline{k}}, \mathbf{i}(H_0(\mathbf{e})) \subseteq G_\mathbf{i}\). We write \((e_1, \ldots , e_d) \simeq (\tilde{e}_1, \ldots , \tilde{e}_d)\) if there exists \(\mathbf{i}\in [n]^{\underline{k}}\) such that \(\mathbf{i}(e_j) = \tilde{e}_j\) for \(j = 1, \ldots , d\).
Note that given \((\tilde{e}_1, \ldots , \tilde{e}_d) \in E(K_n)^{\underline{d}}\) and the corresponding graph \(G_0\),
$$\begin{aligned} \#\{ \mathbf{i}\in [n]^{\underline{k}} :G_0 \subseteq G_\mathbf{i}\}&= \sum _{\mathbf{e}\in E(H)^{\underline{d}}} \#\{ \mathbf{i}\in [n]^{\underline{k}} :\mathbf{i}(e_j) = \tilde{e}_j\,\, \text { for}\,\, j = 1, \ldots , d\} \\&= \sum _{\mathbf{e}\in E(H)^{\underline{d}} } 2^{s(H_0(\mathbf{e}))} (n - v(H_0(\mathbf{e})))^{\underline{k - v(H_0(\mathbf{e}))}} \mathbf {1}_{\{(\tilde{e}_1, \ldots , \tilde{e}_d) \simeq \mathbf{e}\}}, \end{aligned}$$
where for a graph \(G, v(G)\) is the number of vertices of \(G\) and \(s(G)\) is the number of edges in \(G\) with no other adjacent edge. Therefore,
$$\begin{aligned} \mathbb E\mathbf {D}^d f(X) \!=\! p^{e(H) - d} \!\!\sum _{\mathbf{e}\in E(H)^{\underline{d}} } 2^{s(H_0(\mathbf{e}))} (n \!-\! v(H_0(\mathbf{e})))^{\underline{k - v(H_0(\mathbf{e}))}} \left( \mathbf {1}_{\{(\tilde{e}_1, \ldots , \tilde{e}_d) \simeq \mathbf{e}\}} \right) _{(\tilde{e}_1 \ldots , \tilde{e}_d)}\!. \end{aligned}$$
Let \(\mathcal {J}\) be a partition of \([d]\). By the triangle inequality for the norms \(\left\| \cdot \right\| _{\mathcal {J}}\),
$$\begin{aligned} \left\| \mathbb E\mathbf {D}^d f(X) \right\| _{\mathcal {J}} \le p^{e(H) - d} \sum _{\mathbf{e}\in E(H)^{\underline{d}}} 2^{s(H_0(\mathbf{e}))} n^{k - v(H_0(\mathbf{e}))} \left\| \left( \mathbf {1}_{\{(\tilde{e}_1, \ldots , \tilde{e}_d) \simeq \mathbf{e}\}} \right) _{(\tilde{e}_1 \ldots , \tilde{e}_d)} \right\| _\mathcal {J}. \end{aligned}$$
(59)
The norms appearing on the right hand side of (59) are handled by the following

Lemma 5.7

Fix \(1 \le d \le e(H), \mathbf{e}= (e_1, \ldots , e_d) \in E(H)^{\underline{d}}\) and \(\mathcal {J} = \{J_1,\ldots ,J_l\} \in P_d\). Let \(H_0 = H_0(\mathbf{e})\) and for \(r = 1, \ldots , l\), let \(H_r\) be a subgraph of \(H_0\) spanned by the set of edges \(\{e_j :j \in J_r\}\). Then,
$$\begin{aligned}&\left\| \left( \mathbf {1}_{\{(\tilde{e}_1, \ldots , \tilde{e}_d) \simeq (e_1, \ldots , e_d)\}} \right) _{(\tilde{e}_1 \ldots , \tilde{e}_d)} \right\| _{\mathcal {J}} \le 2^{-s(H_0) + \frac{1}{2} \sum _{r=1}^l s(H_r)} \\&\quad \times \, n^{\frac{1}{2} \# \{ v \in V(H_0) :v \in V(H_r)\, \mathrm{for\,\, exactly\,\, one}\, r \in [l] \}}. \end{aligned}$$

Proof

We shall bound the sum
$$\begin{aligned} \sum _{\tilde{e}_1, \ldots , \tilde{e}_d \in E(K_n)} \mathbf {1}_{\{(\tilde{e}_1, \ldots , \tilde{e}_d) \simeq \mathbf{e}\}} \prod _{r=1}^l x_{(\tilde{e}_j)_{j \in J_r}}^{(r)} \end{aligned}$$
(60)
under the constraints \(\sum _{(\tilde{e}_j)_{j \in J_r} \in E(K_n)^{J_r}} \left( x_{(\tilde{e}_j)_{j \in J_r}}^{(r)}\right) ^2 \le 1\) for \(r = 1, \ldots , l\). Note that we can assume \(x^{(r)} \ge 0\) for all \(r \in [l]\). Rewrite the sum (60) as the sum over a sequence of vertices instead of edges:
$$\begin{aligned} 2^{-s(H_0)} \sum _{\mathbf{i}\in [n]^{\underline{V(H_0)}}} \; \prod _{r=1}^l x_{(\mathbf{i}(e_j))_{j \in J_r}}^{(r)}, \end{aligned}$$
where for two sets \(A,B, A^{\underline{B}}\) is the set of 1-1 functions from \(B\) to \(A\). Further note that it is enough to prove the desired bound for the sum
$$\begin{aligned} 2^{-s(H_0)} \sum _{\mathbf{i}\in [n]^{\underline{V(H_0)}}} \; \prod _{r=1}^l y_{\mathbf{i}_{V(H_r)}}^{(r)} \end{aligned}$$
(61)
under the constraints \(2^{-s(H_r)} \sum _{\mathbf{i}\in [n]^{\underline{V(H_r)}}} \left( y_{\mathbf{i}_{V(H_r)}}^{(r)} \right) ^2 \le 1\) for each \(r = 1, \ldots , l\). Indeed, given \(x\)’s, for each \(r = 1, \ldots , l\) and all \(\mathbf{i}\in [n]^{\underline{V(H_r)}}\) take \(y_{\mathbf{i}_{V(H_r)}}^{(r)} = x_{(\mathbf{i}(e_j))_{j \in J_r}}^{(r)}\) and notice that the sum (61) equals the sum (60) while the constraints for \(x\)’s imply the constraints for \(y\)’s. Finally, by homogeneity and the fact that the sum (61) does not depend on the full graph structure but only on the sets of vertices of the graphs \(H_r\), the lemma will follow from the statement: For a sequence of finite, non-empty sets \(V_1, \ldots , V_l\), let \(V = V_1 \cup \cdots \cup V_l\). Then
$$\begin{aligned} \sum _{\mathbf{i}\in [n]^{\underline{V}}} \; \prod _{r=1}^l y_{\mathbf{i}_{V_r}}^{(r)} \le n^{\frac{1}{2} \# \{ v \in V :v \in V_r\,\, \text {for exactly one}\,\, r \in [l] \}} \end{aligned}$$
(62)
for \(y^{(1)}, \ldots , y^{(l)} \ge 0\) satisfying
$$\begin{aligned} \sum _{\mathbf{i}\in [n]^{\underline{V_r}}} \left( y_{\mathbf{i}_{V_r}}^{(r)} \right) ^2 \le 1. \end{aligned}$$
(63)
We prove (62) by induction on \(\# V\). For \(V = \emptyset \) (and \(l=0\)), (62) holds trivially. For the induction step fix any \(v_0 \in V\) and put \(R = \{r \in [l] :v_0 \in V_r\}\). We write
$$\begin{aligned} \sum _{\mathbf{i}\in [n]^{\underline{V}}}\;\prod _{r=1}^l y_{\mathbf{i}_{V_r}}^{(r)} = \sum _{\mathbf{i}\in [n]^{\underline{V \setminus \{v_0\}}}} \left( \left( \prod _{r \in [l] \setminus R} y_{\mathbf{i}_{V_r}}^{(r)} \right) \sum _{i_{v_0} \in [n] \setminus \mathbf{i}(V \setminus \{v_0\})} \; \prod _{r \in R} y_{\mathbf{i}_{V_r}}^{(r)} \right) . \end{aligned}$$
We bound the inner sum using the Cauchy-Schwarz inequality. If \(\# R \ge 2\), we get
$$\begin{aligned} \sum _{i_{v_0} \in [n] \setminus \mathbf{i}(V \setminus \{v_0\})} \; \prod _{r \in R} y_{\mathbf{i}_{V_r}}^{(r)} \le \prod _{r \in R} \left( \sum _{i_{v_0} \in [n] \setminus \mathbf{i}(V \setminus \{v_0\})} \left( y_{\mathbf{i}_{V_r}}^{(r)} \right) ^2 \right) ^{1/2}, \end{aligned}$$
and if \(R = \{r_0\}\) then
$$\begin{aligned} \sum _{i_{v_0} \in [n] \setminus \mathbf{i}(V \setminus \{v_0\})} y_{\mathbf{i}_{V_{r_0}}}^{(r_0)} \le \sqrt{n} \left( \sum _{i_{v_0} \in [n] \setminus \mathbf{i}(V \setminus \{v_0\})} \left( y_{\mathbf{i}_{V_{r_0}}}^{(r_0)} \right) ^2 \right) ^{1/2}. \end{aligned}$$
Now, for each \(r \in R\) put \(W_r = V_r{\setminus } \{v_0\}\) and define
$$\begin{aligned} z_{\mathbf{i}_{W_r}}^{(r)} = \left( \sum _{i_{v_0} \in [n] \setminus \mathbf{i}(W_r )} \left( y_{\mathbf{i}_{V_r}}^{(r)} \right) ^2 \right) ^{1/2} \text { for all}\,\, \mathbf{i}_{W_r} \in [n]^{\underline{W_r}}. \end{aligned}$$
Note that if \(W_r = \emptyset \) then \(z^{(r)}\) is a scalar and by (63), \(0 \le z^{(r)} \le 1\). For \(r \in [l] {\setminus } R\), just put \(W_r = V_r\) and \(z^{(r)} \equiv y^{(r)}\). Let \(L = \{ r \in [l] :W_r \ne \emptyset \}\). Combining the estimates obtained above, we arrive at
$$\begin{aligned} \sum _{\mathbf{i}\in [n]^{\underline{V}}} \; \prod _{r=1}^l y_{\mathbf{i}_{V_r}}^{(r)} \le (\sqrt{n})^{\mathbf {1}_{\{ v_0 \in V_r \,\,\text {for exactly one}\,\, r \in [l]\}}} \sum _{\mathbf{i}\in [n]^{\underline{V \setminus \{v_0\}}}} \; \prod _{r \in L} z_{\mathbf{i}_{W_r}}^{(r)}. \end{aligned}$$
Now we use the induction hypothesis for the sequence of sets \((W_r)_{r \in L}\) and the vectors \(z^{(r)}, r \in L\) (note that \(\sum _{\mathbf{i}\in [n]^{\underline{W_r}}}(z_{\mathbf{i}_{W_r}}^{(r)})^2 \le 1\)).\(\square \)

Remark

The bound in Lemma 5.7 is essentially optimal, at least for large \(n\), say \(n \ge 2 k\). To see this let us analyse optimality of (62) under the constraints (63) (one can see that this is equivalent to the optimality in the original problem). Denote \(V_0 = \{ v \in V :v \in V_r \text { for exactly one } r \in [l]\}\). Fix any \(\mathbf{i}^{(0)} \in [n]^{\underline{k}}\). Then for \(r =1, \ldots , l\) take
$$\begin{aligned} y_{\mathbf{i}_{V_r}}^{(r)} = {\left\{ \begin{array}{ll} n^{-\frac{1}{2} \#(V_r \cap V_0)} &{} \text {if}\,\, \mathbf{i}_{V_r \setminus V_0} \equiv \mathbf{i}_{V_r \setminus V_0}^{(0)} \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
The vectors \(y^{(r)}\) satisfy the constraints (63) and
$$\begin{aligned} \sum _{\mathbf{i}\in [n]^{\underline{V}}} \; \prod _{r=1}^l y_{\mathbf{i}_{V_r}}^{(r)}&= \sum _{\mathbf{i}\in [n]^{\underline{V}} :\mathbf{i}_{V \setminus V_0} \equiv \mathbf{i}_{V{\setminus }V_0}^{(0)}} \prod _{r=1}^l n^{-\frac{1}{2} \#(V_r \cap V_0)} \\&= \left( n - \#(V \setminus V_0)\right) ^{\underline{\# V_0}} \, n^{-\frac{1}{2} \# V_0} \ge (n/2)^{\# V_0} n^{-\frac{1}{2} \# V_0} = 2^{-\# V_0} n^{\frac{1}{2} \# V_0}. \end{aligned}$$
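The bound (62) and the extremal example of the Remark can be illustrated numerically in the simplest non-trivial configuration. The following Python sketch assumes \(V_1 = \{a, b\}\) and \(V_2 = \{b, c\}\), so that \(V_0 = \{a, c\}\) and the right-hand side of (62) equals \(n\). The function names and the choice \(n = 6\) are ours and serve only as an illustration.

```python
import itertools
import math
import random


def chain_sum(n, y1, y2):
    """Left-hand side of (62) for V_1 = {a, b}, V_2 = {b, c}.

    The sum runs over injective assignments (i_a, i_b, i_c) in [n]^3.
    """
    return sum(y1[a][b] * y2[b][c]
               for a, b, c in itertools.permutations(range(n), 3))


def random_feasible(n, rng):
    """Random nonnegative array on ordered pairs of distinct indices,
    scaled so that the constraint (63) holds with equality."""
    y = [[rng.random() if i != j else 0.0 for j in range(n)] for i in range(n)]
    norm = math.sqrt(sum(v * v for row in y for v in row))
    return [[v / norm for v in row] for row in y]


def extremal_pair(n, b0=0):
    """Vectors from the Remark: weight n^{-1/2} on pairs whose shared index i_b equals b0."""
    w = n ** -0.5
    # y1 is indexed by (i_a, i_b), y2 by (i_b, i_c)
    y1 = [[w if (j == b0 and i != j) else 0.0 for j in range(n)] for i in range(n)]
    y2 = [[w if (i == b0 and i != j) else 0.0 for j in range(n)] for i in range(n)]
    return y1, y2


if __name__ == "__main__":
    n, rng = 6, random.Random(0)
    print(chain_sum(n, random_feasible(n, rng), random_feasible(n, rng)), "<=", n)
    print(chain_sum(n, *extremal_pair(n)), "vs", (n - 1) * (n - 2) / n)
```

For the extremal pair the sum equals \((n-1)(n-2)/n\), in agreement with the formula \(\left( n - \#(V \setminus V_0)\right) ^{\underline{\# V_0}} \, n^{-\frac{1}{2} \# V_0}\) of the Remark.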

Combining Lemma 5.7 with (59) we obtain

Lemma 5.8

Let \(H\) be any graph with \(k\) vertices, which are not isolated, and let \(f\) be defined by (58). Then for any \(1\le d\le e(H)\) and any \(\mathcal {J} = \{J_1,\ldots ,J_l\} \in P_d\),
$$\begin{aligned}&\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J} \\&\quad \le p^{e(H)-d}\sum _{\mathbf{e}\in E(H)^{\underline{d}}}2^{\frac{1}{2}\sum _{r=1}^l s(H_r(\mathbf{e}))} n^{k - v(H_0(\mathbf{e})) + \frac{1}{2}\#\{v\in V(H_0(\mathbf{e})):v \in V(H_r(\mathbf{e}))\,\,\mathrm{for}\,\mathrm{exactly}\,\mathrm{one}\,\, r\in [l] \}}, \end{aligned}$$
where for \(\mathbf{e}\in E(H)^{\underline{d}}\) and \(r \in [l], H_r(\mathbf{e})\) is the subgraph of \(H_0(\mathbf{e})\) spanned by \(\{e_j:j\in J_r\}\).

We are now ready for

Proof of Proposition 5.6

We will use Lemma 5.8 to estimate \(\Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}\) for any \(d \le k\) and \(\mathcal {J} \in P_d\) with \(\#\mathcal {J} = l\). Note that for any \(\mathbf{e}\in E(H)^{\underline{d}}\),
$$\begin{aligned}&v(H_0(\mathbf{e})) - \frac{1}{2} \# \{ v \in V(H_0(\mathbf{e})) :v \in V(H_r(\mathbf{e}))\,\,\text {for exactly one}\,\, r \in [l]\} \\&\quad = \frac{1}{2} \big ( v(H_0(\mathbf{e})) + \#\{ v \in V(H_0(\mathbf{e})) :v\,\, \text {belongs to more than one}\,\, V(H_r(\mathbf{e}))\} \big ) \\&\quad {\left\{ \begin{array}{ll} = k/2 &{} \text {if}\, d = k\, \text {and}\, l=1,\\ \ge \frac{1}{2} (d+l) &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
where to get the second inequality we used the fact that each vertex of \(H\) has degree two. Thus we obtain
$$\begin{aligned} \Vert \mathbb E\mathbf {D}^k f(X)\Vert _{\{[k]\}}&\le n^{k/2}, \\ \Vert \mathbb E\mathbf {D}^d f(X)\Vert _\mathcal {J}&\le C_k p^{k-d}n^{k - \frac{1}{2}d - \frac{1}{2}l} \quad \text {if}\, d < k\, \text {or}\, l > 1. \end{aligned}$$
Together with (56) this yields the first inequality of the proposition. Using the fact that \(\mathbb EY_H(n,p) \ge \frac{1}{C_k} n^k p^k\), the second inequality follows by simple calculations.\(\square \)
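The vertex count used in the above proof can be checked by brute force for short cycles. The following Python sketch assumes that \(H\) is the cycle on \(k\) vertices (so that each vertex has degree two), enumerates all \(\mathbf{e}\in E(H)^{\underline{d}}\) and all partitions of \([d]\), and verifies the case distinction displayed in the proof. The helper names are ours and are not part of the original argument.

```python
import itertools


def set_partitions(items):
    """Yield all partitions of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for idx in range(len(partition)):
            yield partition[:idx] + [[first] + partition[idx]] + partition[idx + 1:]
        # ... or open a new block
        yield [[first]] + partition


def cycle_edges(k):
    """Edges of the cycle on vertices 0, ..., k-1 (k >= 3)."""
    return [frozenset((v, (v + 1) % k)) for v in range(k)]


def exponent(edges, partition):
    """v(H_0) - (1/2) #{v : v lies in V(H_r) for exactly one r}."""
    vertex_sets = [set().union(*(edges[j] for j in block)) for block in partition]
    all_vertices = set().union(*vertex_sets)
    exactly_one = sum(1 for v in all_vertices
                      if sum(v in s for s in vertex_sets) == 1)
    return len(all_vertices) - exactly_one / 2


def check_cycle(k):
    """True if the case distinction holds for every edge tuple and every partition."""
    for d in range(1, k + 1):
        for edges in itertools.permutations(cycle_edges(k), d):
            for partition in set_partitions(list(range(d))):
                l = len(partition)
                value = exponent(edges, partition)
                if d == k and l == 1:
                    if value != k / 2:
                        return False
                elif value < (d + l) / 2:
                    return False
    return True
```

A call such as `check_cycle(5)` runs through a few thousand configurations, so the check is only practical for small \(k\).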

6 Refined inequalities for polynomials in independent random variables satisfying the modified log-Sobolev inequality

In this section we refine the inequalities which can be obtained from Theorem 3.3 for polynomials in independent random variables satisfying the \(\beta \)-modified log-Sobolev inequality (16) with \(\beta > 2\). To this end we will use Theorem 3.4 together with a result from [2], which is a counterpart of Theorem 3.1 for homogeneous tetrahedral polynomials in general independent symmetric random variables with log-concave tails, however only of degree at most 3. We recall that for a set \(I\), by \(P_I\) we denote the family of partitions of \(I\) into pairwise disjoint, nonempty sets.

Theorems 3.1, 3.2 and 3.4 from [2], specialized to Weibull variables, can be translated into

Theorem 6.1

Let \(\alpha \in [1,2]\) and let \(Y_1,\ldots ,Y_n\) be a sequence of i.i.d. symmetric random variables satisfying \(\mathbb P(|Y_i| \ge t) = \exp (-t^\alpha )\). Define \(Y = (Y_1,\ldots ,Y_n)\) and let \(Z_1,\ldots ,Z_d\) be independent copies of \(Y\). Consider a \(d\)-indexed matrix \(A\). Define also
$$\begin{aligned} m_d(p, A) = \sum _{I\subseteq [d]} \sum _{\mathcal {J} \in P_I}\sum _{\mathcal {K}\in P_{[d]\setminus I}} p^{\#\mathcal {J}/2 + \#\mathcal {K}/\alpha }\Vert A\Vert _{\mathcal {J}|\mathcal {K}}, \end{aligned}$$
(64)
where for \(\mathcal {J} = \{J_1,\ldots ,J_r\}\in P_I\) and \(\mathcal {K} = \{K_1,\ldots ,K_k\} \in P_{[d]\setminus I}\),
$$\begin{aligned} \Vert A\Vert _{\mathcal {J}|\mathcal {K}}&= \sum _{s_1\in K_1,\ldots ,s_k \in K_k}\sup \left\{ \sum _{\mathbf{i}\in [n]^d} a_{\mathbf{i}}\prod _{l=1}^r x^{(l)}_{\mathbf {i}_{J_l}}\prod _{l=1}^k y^{(l)}_{\mathbf {i}_{K_l}}:\Vert (x^{(l)}_{\mathbf {i}_{J_l}})\Vert _2\le 1, \right. \\&\left. \text {for}\, 1\le l \le r \text { and} \sum _{i_{s_l} \le n}\Vert (y^{(l)}_{\mathbf {i}_{K_l}})_{\mathbf{i}_{K_l \setminus \{s_l\}}}\Vert _2^\alpha \le 1,\,\text {for}\, 1\le l \le k\right\} . \end{aligned}$$
If \(d \le 3\), then for any \(p \ge 2\),
$$\begin{aligned} C_d^{-1}m_d(p, A)\le \Vert \langle A,Z_1\otimes \cdots \otimes Z_d\rangle \Vert _p \le C_d m_d(p, A). \end{aligned}$$
Moreover, if \(\alpha = 1\), then the above inequality holds for all \(d \ge 1\).
Before we proceed, let us provide a few specific examples of the norms \(\Vert A\Vert _{\mathcal {J}|\mathcal {K}}\), which for \(\alpha < 2\) are more complicated than in the Gaussian case. In what follows, \(\beta = \frac{\alpha }{\alpha -1}\) (with \(\beta = \infty \) for \(\alpha = 1\)). For \(d=1\),
$$\begin{aligned} \Vert (a_i)\Vert _{\{1\}| \emptyset }&= \sup \big \{ \sum a_i x_i :\sum x_i^2 \le 1 \big \} = |(a_i)|_2, \\ \Vert (a_i)\Vert _{\emptyset | \{1\}}&= \sup \big \{ \sum a_i y_i :\sum |y_i|^\alpha \le 1 \big \} = |(a_i)|_\beta . \end{aligned}$$
For \(d=2, \Vert (a_{ij})\Vert _{\{1,2\}| \emptyset } = \Vert (a_{ij})\Vert _{\mathrm{HS}}, \Vert (a_{ij})\Vert _{\{1\}\{2\}| \emptyset } = \Vert (a_{ij})\Vert _{\ell _2 \rightarrow \ell _2}\),
$$\begin{aligned} \Vert (a_{ij})\Vert _{\{1\}|\{2\}}&= \sup \big \{ \sum a_{ij} x_i y_j :\sum x_i^2 \le 1, \sum |y_j|^\alpha \le 1\big \} = \Vert (a_{ij})\Vert _{\ell _\alpha \rightarrow \ell _2}, \\ \Vert (a_{ij})\Vert _{\{2\}|\{1\}}&= \sup \big \{ \sum a_{ij} y_i x_j :\sum x_j^2 \le 1, \sum |y_i|^\alpha \le 1\big \} = \Vert (a_{ij})\Vert _{\ell _2 \rightarrow \ell _\beta }, \\ \Vert (a_{ij})\Vert _{\emptyset | \{1\}\{2\}}&= \sup \big \{\sum a_{ij} y_i z_j :\sum |y_i|^\alpha \le 1, \sum |z_j|^\alpha \le 1\big \} = \Vert (a_{ij})\Vert _{\ell _\alpha \rightarrow \ell _\beta }, \end{aligned}$$
and
$$\begin{aligned} \Vert (a_{ij})\Vert _{\emptyset | \{1,2\}}&= \sup \left\{ \sum a_{ij} y_{ij} :\sum _i \left( \sum _j y_{ij}^2\right) ^{\frac{\alpha }{2}} \le 1\right\} \\&\quad + \sup \left\{ \sum a_{ij} y_{ij} :\sum _j \left( \sum _i y_{ij}^2\right) ^{\frac{\alpha }{2}} \le 1\right\} \\&= \left( \sum _i \left( \sum _j a_{ij}^2 \right) ^{\beta /2} \right) ^{1/\beta } + \left( \sum _j \left( \sum _i a_{ij}^2 \right) ^{\beta /2} \right) ^{1/\beta }. \end{aligned}$$
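The duality behind the formulas for \(d = 1\) and for \(\Vert (a_{ij})\Vert _{\emptyset | \{1,2\}}\) can be made concrete with a short computation. The following Python sketch assumes \(1 < \alpha \le 2\) (so that \(\beta < \infty \)). It constructs a vector \(y\) with \(\sum |y_i|^\alpha = 1\) attaining \(\sum a_i y_i = |(a_i)|_\beta \) and evaluates the closed form of the mixed norm. The function names are ours.

```python
import math


def lp_norm(a, r):
    """The l_r norm of a finite sequence."""
    return sum(abs(x) ** r for x in a) ** (1.0 / r)


def dual_maximizer(a, alpha):
    """A vector y with sum |y_i|^alpha = 1 and sum a_i y_i = |a|_beta.

    Here beta = alpha / (alpha - 1); requires 1 < alpha <= 2 and a != 0.
    """
    beta = alpha / (alpha - 1.0)
    scale = lp_norm(a, beta) ** (beta - 1.0)
    return [math.copysign(abs(x) ** (beta - 1.0), x) / scale for x in a]


def mixed_norm(a, alpha):
    """Closed form of ||(a_ij)||_{emptyset | {1,2}}: the l_beta norm of the
    row-wise l_2 norms plus the l_beta norm of the column-wise l_2 norms."""
    beta = alpha / (alpha - 1.0)
    rows = [math.sqrt(sum(v * v for v in row)) for row in a]
    cols = [math.sqrt(sum(v * v for v in col)) for col in zip(*a)]
    return lp_norm(rows, beta) + lp_norm(cols, beta)
```

For \(\alpha = 2\) both terms of `mixed_norm` reduce to the Hilbert-Schmidt norm, so the result is \(2\Vert (a_{ij})\Vert _{\mathrm{HS}}\).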
For \(d=3\), we have, for example,
$$\begin{aligned} \Vert (a_{ijk})\Vert _{\{2\}|\{1\}\{3\}}&= \sup \left\{ \sum a_{ijk} y_i x_j z_k :\sum |x_j|^2 \!\le \! 1, \sum |y_i|^\alpha \!\le \! 1, \sum |z_k|^\alpha \le 1 \right\} , \\ \Vert (a_{ijk})\Vert _{\{2\}|\{1,3\}}&= \sup \left\{ \sum a_{ijk} x_j y_{ik} :\sum x_j^2 \le 1, \sum _i \left( \sum _k y_{ik}^2 \right) ^{\frac{\alpha }{2}} \le 1 \right\} \\&\quad + \sup \left\{ \sum a_{ijk} x_j y_{ik} :\sum x_j^2 \le 1, \sum _k \left( \sum _i y_{ik}^2 \right) ^{\frac{\alpha }{2}} \le 1 \right\} ,\\ \Vert (a_{ijk})\Vert _{\emptyset |\{1\} \{2,3\}}&= \sup \left\{ \sum a_{ijk} y_i z_{jk} :\sum |y_i|^\alpha \le 1, \sum _j \left( \sum _k z_{jk}^2 \right) ^{\frac{\alpha }{2}} \le 1 \right\} \\&\quad + \sup \left\{ \sum a_{ijk} y_i z_{jk} :\sum |y_i|^\alpha \le 1, \sum _k \left( \sum _j z_{jk}^2 \right) ^{\frac{\alpha }{2}} \le 1 \right\} . \end{aligned}$$
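As a sanity check of Theorem 6.1 in the simplest case \(d = 1, n = 1, A = (1)\), one can compare the exact moments of a single Weibull variable with \(m_1(p, A) = \sqrt{p} + p^{1/\alpha }\). The following Python sketch uses the identity \(\mathbb E|Y_1|^p = \Gamma (1 + p/\alpha )\), which follows from \(\mathbb P(|Y_1| \ge t) = \exp (-t^\alpha )\) by integration by parts. The function names and the range of \(p\) are our choices.

```python
import math


def weibull_moment_norm(p, alpha):
    """||Y_1||_p for P(|Y_1| >= t) = exp(-t^alpha), via E|Y_1|^p = Gamma(1 + p/alpha)."""
    return math.exp(math.lgamma(1.0 + p / alpha) / p)


def m1(p, alpha):
    """m_1(p, A) from (64) for n = 1 and A = (1): sqrt(p) + p^(1/alpha)."""
    return math.sqrt(p) + p ** (1.0 / alpha)


def ratio(p, alpha):
    """Quotient ||Y_1||_p / m_1(p, A); Theorem 6.1 predicts it stays in a fixed interval."""
    return weibull_moment_norm(p, alpha) / m1(p, alpha)


if __name__ == "__main__":
    for alpha in (1.0, 1.5, 2.0):
        print(alpha, [round(ratio(p, alpha), 3) for p in (2, 10, 50, 200)])
```

The quotient stays bounded away from \(0\) and \(\infty \) as \(p\) grows, which is what the two-sided estimate asserts in this case. The experiment says nothing about the value of the constant \(C_d\) for general \(A\).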
In particular, from Theorem 6.1 it follows that for \(\alpha \in [1,2]\), if \(Y = (Y_1, \ldots , Y_n)\) is as in Theorem 6.1 then for every \(x \in \mathbb {R}^n\),
$$\begin{aligned} \frac{1}{C}(\sqrt{p}|x|_2 + p^{1/\alpha }|x|_\beta ) \le \Vert \langle x,Y\rangle \Vert _p \le C(\sqrt{p}|x|_2 + p^{1/\alpha }|x|_\beta ), \end{aligned}$$
where \(|\cdot |_r\) stands for the \(\ell _r^n\) norm (see also [29]). Thus, for \(\beta \in (2, \infty )\), the inequality (17) of Theorem 3.4, for \(m=n, k = 1\) and a \(\mathcal {C}^1\) function \(f :\mathbb {R}^n \rightarrow \mathbb {R}\), can be written in the form
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \le C_\beta \Vert \langle \nabla f(X),Y\rangle \Vert _p. \end{aligned}$$
(65)
This allows for induction, just as in the proof of Proposition 3.2, except that instead of Gaussian vectors we will have independent copies of \(Y\). We can thus repeat the proof of Theorem 3.3, using the above observation and Theorem 6.1 instead of Theorem 3.1. This argument will then yield the following proposition, which is a counterpart of Theorem 3.3. At the moment we can prove it only for \(D \le 3\); clearly, generalizing Theorem 6.1 to chaoses of arbitrary degree would immediately imply it for general \(D\).

Proposition 6.2

Let \(X = (X_1,\ldots ,X_n)\) be a random vector in \(\mathbb {R}^n\), with independent components. Let \(\beta \in (2,\infty )\) and assume that for all \(i \le n, X_i\) satisfies the \(\beta \)-modified logarithmic Sobolev inequality with constant \(D_{LS_\beta }\). Let \(f :\mathbb {R}^n \rightarrow \mathbb {R}\) be a \(\mathcal {C}^D\) function. Define
$$\begin{aligned} m(p, f) = \big \Vert m_D(p, \mathbf {D}^D f(X)) \big \Vert _p + \sum _{1 \le d \le D-1} m_d(p, \mathbb E\mathbf {D}^d f(X)), \end{aligned}$$
where \(m_d(p, A)\) is defined by (64) with \(\alpha = \frac{\beta }{\beta -1}\).
If \(D\le 3\) then for \(p\ge 2\),
$$\begin{aligned} \Vert f(X)-\mathbb Ef(X)\Vert _p \le C_{\beta ,D_{LS_\beta }} m(p, f). \end{aligned}$$
As a consequence, for all \(p \ge 2\),
$$\begin{aligned} \mathbb P\big (|f(X) - \mathbb Ef(X)| \ge C_{\beta ,D_{LS_\beta }} m(p, f)\big ) \le e^{-p}. \end{aligned}$$

Remarks

1. For \(\beta = 2\), the estimates of the above proposition agree with those of Theorem 1.2. For \(\beta > 2\) the proposition improves on what can be obtained from Theorem 3.3 in two respects (of course only for \(D \le 3\)). First, the exponent of \(p\) is smaller, since \((\gamma -1/2)d +\#(\mathcal {J}\cup \mathcal {K})/2 = (1/\alpha -1/2)d+\#\mathcal {J}/2 + \#\mathcal {K}/2 \ge \#\mathcal {J}/2 + \#\mathcal {K}/\alpha \). Second, \(\Vert A\Vert _{\mathcal {J}\cup \mathcal {K}} \ge \Vert A\Vert _{\mathcal {J}|\mathcal {K}}\) (since for \(\alpha < 2, |x|_\alpha \ge |x|_2\), so the supremum on the left hand side is taken over a larger set).

2. From results in [2] it follows that if \(f\) is a tetrahedral polynomial of degree \(D\) and \(X_i\) are i.i.d. symmetric random variables satisfying \(\mathbb P(|X_i| \ge t) = \exp (-t^\alpha )\), then the inequalities of Proposition 6.2 can be reversed (up to constants), i.e.,
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \ge \frac{1}{C_D}m(p, f). \end{aligned}$$
This is true for any positive integer \(D\).
3. One can also consider another functional inequality, which may be regarded as a counterpart of (16) for \(\beta = \infty \). We say that a random vector \(X\) in \(\mathbb {R}^n\) satisfies the Bobkov-Ledoux inequality if for all locally Lipschitz positive functions \(f\) such that \(|\nabla f(x)|_\infty := \max _{1\le i \le n} |\frac{\partial }{\partial x_i} f(x)| \le d_{BL}f(x)\) for all \(x\),
$$\begin{aligned} \mathrm {Ent}f^2(X) \le D_{BL} \mathbb E|\nabla f(X)|^2. \end{aligned}$$
(66)
This inequality was introduced in [9] to provide a simple proof of Talagrand’s two-level concentration for the symmetric exponential measure in \(\mathbb {R}^n\). Here \(|\frac{\partial }{\partial x_i} f(x)|\) is defined as the “partial length of the gradient” (see (13)). Thus in the case of differentiable functions \(|\nabla f|_\infty \) coincides with the \(\ell _\infty ^n\) norm of the “true” gradient.
In view of Theorem 3.4 it is natural to conjecture that the Bobkov-Ledoux inequality implies
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \le C\Big (\sqrt{p}\big \Vert |\nabla f(X)|\big \Vert _p + p\big \Vert |\nabla f(X)|_\infty \big \Vert _p\Big ), \end{aligned}$$
(67)
which in turn implies (65) with \(Y = (Y_1, \ldots , Y_n)\) being a vector of independent symmetric exponential variables and some \(C_\infty < \infty \). This would yield an analogue of Proposition 6.2 for \(\beta = \infty \), this time with no restriction on \(D\).

Unfortunately, at present we do not know whether the implication (66) \(\implies \) (67) holds, or even whether (67) holds for the symmetric exponential measure in \(\mathbb {R}^n\). We are only able to prove the following weaker inequality, which, however, is not sufficient to obtain a counterpart of Proposition 6.2 for \(\beta = \infty \).

Proposition 6.3

If \(X\) is a random vector in \(\mathbb {R}^n\), which satisfies (66), then for any locally Lipschitz function \(f :\mathbb {R}^n \rightarrow \mathbb {R}\), and any \(p \ge 2\),
$$\begin{aligned} \Vert f(X) - \mathbb Ef(X)\Vert _p \le 3\Big (D_{BL}^{1/2} \sqrt{p}\big \Vert |\nabla f(X)|\big \Vert _p + d_{BL}^{-1}p \big \Vert |\nabla f(X)|_\infty \big \Vert _\infty \Big ). \end{aligned}$$

Proof

To simplify the notation we suppress the argument \(X\). In what follows \(\Vert \cdot \Vert _p\) denotes the \(L_p\) norm with respect to the distribution of \(X\).

Let us fix \(p \ge 2\) and consider \(f_1 = \max (f, \Vert f\Vert _p /2)\). We have
$$\begin{aligned} \begin{aligned} \Vert f_1\Vert _p&\ge \Vert f\Vert _p,\\ \Vert f_1\Vert _2&\le \frac{1}{2} \Vert f\Vert _p + \Vert f\Vert _2,\\ \Vert f_1\Vert _p&\le \frac{3}{2}\Vert f\Vert _p \le 3 \min f_1. \end{aligned} \end{aligned}$$
(68)
Moreover, \(f_1\) is locally Lipschitz and we have pointwise estimates \(|\nabla f_1|\le |\nabla f|, |\nabla f_1|_\infty \le |\nabla f|_\infty \). Assume now that we have proved that
$$\begin{aligned} \Vert f_1\Vert _p \le \Vert f_1\Vert _2 + \sqrt{\frac{D_{BL}}{2}}\sqrt{p}\big \Vert |\nabla f_1| \big \Vert _p + \frac{3p}{2d_{BL}}\big \Vert |\nabla f_1|_\infty \big \Vert _\infty . \end{aligned}$$
(69)
Then, using the first two inequalities of (68), we obtain
$$\begin{aligned} \Vert f\Vert _p&\le \Vert f_1\Vert _p \le \Vert f_1\Vert _2 + \sqrt{\frac{D_{BL}}{2}}\sqrt{p}\big \Vert |\nabla f_1| \big \Vert _p + \frac{3p}{2d_{BL}}\big \Vert |\nabla f_1|_\infty \big \Vert _\infty \\&\le \frac{1}{2}\Vert f\Vert _p + \Vert f\Vert _2 + \sqrt{\frac{D_{BL}}{2}}\sqrt{p}\big \Vert |\nabla f| \big \Vert _p + \frac{3p}{2d_{BL}}\big \Vert |\nabla f|_\infty \big \Vert _\infty , \end{aligned}$$
which gives
$$\begin{aligned} \Vert f\Vert _p \le 2\Bigg (\Vert f\Vert _2 + \sqrt{\frac{D_{BL}}{2}}\sqrt{p}\big \Vert |\nabla f| \big \Vert _p + \frac{3p}{2d_{BL}}\big \Vert |\nabla f|_\infty \big \Vert _\infty \Bigg ). \end{aligned}$$
(70)
Since (66) implies the Poincaré inequality with constant \(D_{BL}/2\) (see e.g. Proposition 2.3 in [28]), we can conclude the proof by applying (70) to \(|f - \mathbb Ef|\) (as in the proof of Theorem 3.4). Thus it is enough to prove (69).
From now on we are going to work with the function \(f_1\) only, so for brevity we will drop the subscript and write \(f\) instead of \(f_1\). Assume \(\Vert f\Vert _p \ge \frac{3p}{2d_{BL}} \Vert |\nabla f|_\infty \Vert _\infty \) (otherwise (69) is trivially satisfied). Then, using the third inequality of (68), for \(2\le t \le p\) and all \(x \in \mathbb {R}^n\),
$$\begin{aligned} |\nabla f^{t/2}(x)|_\infty \le \frac{t}{2} f^{t/2-1}(x) |\nabla f(x)|_\infty \le \frac{3}{2} f^{t/2}(x) \frac{p|\nabla f(x)|_\infty }{\Vert f\Vert _p} \le d_{BL} f^{t/2}(x). \end{aligned}$$
We can thus apply (66) with \(f^{t/2}\), which together with Hölder’s inequality gives
$$\begin{aligned} \mathrm {Ent}f^t \le D_{BL} \mathbb E|\nabla f^{t/2}|^2 \le D_{BL}\frac{t^2}{4}\mathbb E\big ( f^{t-2} |\nabla f|^2 \big ) \le D_{BL}\frac{t^2}{4} \big \Vert |\nabla f|\big \Vert _t^2 (\mathbb Ef^t)^{1 - \frac{2}{t}}. \end{aligned}$$
Now, as in the proof of Theorem 3.4, we have
$$\begin{aligned} \frac{d}{dt} (\mathbb Ef^t)^{2/t} = \frac{2}{t^2} (\mathbb Ef^t)^{\frac{2}{t} - 1} \mathrm {Ent}f^t \le \frac{D_{BL}}{2}\big \Vert |\nabla f|\big \Vert _p^2, \end{aligned}$$
which upon integrating gives
$$\begin{aligned} \Vert f\Vert _p^2 \le \Vert f\Vert _2^2 + \frac{D_{BL}}{2} p \big \Vert |\nabla f|\big \Vert _p^2, \end{aligned}$$
which clearly implies (69).\(\square \)
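The truncation step (68) at the beginning of the above proof is elementary and can be checked numerically. The following Python sketch assumes that \(f \ge 0\) (as is the case when the argument is applied to \(|f - \mathbb Ef|\)) and that \(X\) is uniformly distributed on a finite sample, so that \(L_p\) norms become empirical averages. The function names and the tolerance are ours.

```python
import random


def empirical_lp_norm(values, p):
    """L_p norm with respect to the uniform distribution on the sample `values`."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def truncate(values, p):
    """Pointwise f_1 = max(f, ||f||_p / 2)."""
    level = empirical_lp_norm(values, p) / 2.0
    return [max(v, level) for v in values]


def check_68(values, p, tol=1e-9):
    """Verify the three inequalities of (68) for a nonnegative sample."""
    f1 = truncate(values, p)
    norm_p = empirical_lp_norm(values, p)
    return (empirical_lp_norm(f1, p) >= norm_p - tol
            and empirical_lp_norm(f1, 2) <= 0.5 * norm_p + empirical_lp_norm(values, 2) + tol
            and empirical_lp_norm(f1, p) <= 1.5 * norm_p + tol
            and 1.5 * norm_p <= 3.0 * min(f1) + tol)


if __name__ == "__main__":
    random.seed(0)
    sample = [random.expovariate(1.0) for _ in range(200)]
    print(all(check_68(sample, p) for p in (2, 3, 7.5)))
```

All three inequalities follow from the pointwise bounds \(f \le f_1 \le f + \Vert f\Vert _p/2\) and \(f_1 \ge \Vert f\Vert _p/2\), which is what the check exercises.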

Footnotes

  1. A multivariate polynomial is called tetrahedral if all variables appear in it in power at most one.

Notes

Acknowledgments

We would like to thank Michel Ledoux and Sandrine Dallaporta for interesting discussions concerning tail estimates for linear eigenvalue statistics of random matrices.

References

  1. 1.
    Adamczak, R.: Logarithmic Sobolev inequalities and concentration of measure for convex functions and polynomial chaoses. Bull. Pol. Acad. Sci. Math. 53(2), 221–238 (2005)MATHMathSciNetCrossRefGoogle Scholar
  2. 2.
    Adamczak, R., Latała, R.: Tail and moment estimates for chaoses generated by symmetric random variables with logarithmically concave tails. Ann. Inst. H. Poincar Probab. Stat. 48(4), 1103–1136 (2012)MATHCrossRefGoogle Scholar
  3. 3.
    Aida, S., Stroock, D.: Moment estimates derived from Poincaré and logarithmic Sobolev inequalities. Math. Res. Lett. 1(1), 75–86 (1994)MATHMathSciNetCrossRefGoogle Scholar
  4. 4.
    Anderson, G.W., Guionnet, A., Zeitouni, O.: An Introduction to Random Matrices, Volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (2010)Google Scholar
  5. 5.
    Arcones, M.A., Giné, E.: On decoupling, series expansions, and tail behavior of chaos processes. J. Theor. Probab. 6(1), 101–122 (1993)MATHCrossRefGoogle Scholar
  6. 6.
    Bai, Z., Silverstein, J.W.: Spectral Analysis of Large Dimensional Random Matrices. Springer Series in Statistics, 2nd edn. Springer, New York (2010)CrossRefGoogle Scholar
  7. 7.
    Bakry, D., Émery, M.: Inégalités de Sobolev pour un semi-groupe symétrique. C. R. Acad. Sci. Paris Sér. I Math. 301(8), 411–413 (1985)MATHGoogle Scholar
  8. 8.
    Barthe, F., Milman, E.: Transference principles for log-Sobolev and spectral-gap with applications to conservative spin systems. Commun. Math. Phys. 323(2), 575–625 (2013)MATHMathSciNetCrossRefGoogle Scholar
  9. 9.
    Bobkov, S., Ledoux, M.: Poincaré’s inequalities and Talagrand’s concentration phenomenon for the exponential distribution. Probab. Theory Relat. Fields 107(3), 383–400 (1997)MATHMathSciNetCrossRefGoogle Scholar
  10. 10.
    Bobkov, S.G.: Remarks on the growth of \(L^p\)-norms of polynomials. In: Geometric aspects of functional analysis, volume 1745 of Lecture Notes in Math., pp 27–35. Springer, Berlin (2000)Google Scholar
  11. 11.
    Bobkov, S.G., Götze, F.: Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Funct. Anal. 163(1), 1–28 (1999)MATHMathSciNetCrossRefGoogle Scholar
  12. 12.
    Bobkov, S.G., Götze, F., Tikhomirov, A.N.: On concentration of empirical measures and convergence to the semi-circle law. J. Theor. Probab. 23(3), 792–823 (2010)MATHCrossRefGoogle Scholar
  13. 13.
    Bonami, A.: Étude des coefficients de Fourier des fonctions de \(L^{p}(G)\). Ann. Inst. Fourier (Grenoble), 20(fasc. 2):335–402 (1971), 1970Google Scholar
  14. 14.
    Borell, C.: The Brunn-Minkowski inequality in Gauss space. Invent. Math. 30(2), 207–216 (1975)MATHMathSciNetCrossRefGoogle Scholar
  15. 15.
    Borell, C.: On the Taylor series of a Wiener polynomial. Seminar Notes on multiple stochastic integration, polynomial chaos and their integration. Case Western Reserve University, Cleveland (1984)Google Scholar
  16. 16.
    Borodin, A.N., Ibragimov, I.A.: Limit theorems for functionals of random walks. Trudy Mat. Inst. Steklov. (1995) (Predel. Teoremy dlya Funktsional. ot Sluchain. Bluzh.), p. 286 (1994)Google Scholar
  17. 17.
    Boucheron, S., Bousquet, O., Lugosi, G., Massart, P.: Moment inequalities for functions of independent random variables. Ann. Probab. 33(2), 514–560 (2005)MATHMathSciNetCrossRefGoogle Scholar
  18. 18.
    Boucheron, S., Lugosi, G., Massart, P.: Concentration inequalities using the entropy method. Ann. Probab. 31(3), 1583–1614 (2003)MATHMathSciNetCrossRefGoogle Scholar
  19. 19.
    Bourgain, J.: On the distribution of polynomials on high-dimensional convex sets. In: Geometric aspects of functional analysis (1989–90) volume 1469 of Lecture Notes in Math. pp 127–137. Springer, Berlin (1991)Google Scholar
  20. 20.
    Caffarelli, L.A.: Monotonicity properties of optimal transportation and the FKG and related inequalities. Commun. Math. Phys. 214(3), 547–563 (2000)MATHMathSciNetCrossRefGoogle Scholar
  21. 21.
    Carbery, A., Wright, J.: Distributional and \(L^q\) norm inequalities for polynomials over convex bodies in \(\mathbb{R}^n\). Math. Res. Lett. 8(3), 233–248 (2001)MATHMathSciNetCrossRefGoogle Scholar
  22. 22.
    Chatterjee, S.: The missing log in large deviations for triangle counts. Random Struct. Algorithms 40(4), 437–451 (2012)MATHCrossRefGoogle Scholar
  23. 23.
    Chatterjee, S., Dembo, A.: Nonlinear large deviations. http://arxiv.org/abs/1401.3495 January (2014)
  24. 24.
    de la Peña, V.H., Giné, E.: Decoupling. From Dependence to Independence, Randomly Stopped Processes. \(U\)-Statistics and Processes. Martingales and Beyond, Probability and its Applications. Springer, New York (1999)Google Scholar
  25. 25.
    de la Peña, V.H., Montgomery-Smith, S.J.: Decoupling inequalities for the tail probabilities of multivariate \(U\)-statistics. Ann. Probab. 23(2), 806–816 (1995)MATHMathSciNetCrossRefGoogle Scholar
  26. 26.
    DeMarco, B., Kahn, J.: Tight upper tail bounds for cliques. Random Struct. Algorithms 41(4), 469–487 (2012)MATHMathSciNetCrossRefGoogle Scholar
  27. 27.
    DeMarco, B., Kahn, J.: Upper tails for triangles. Random Struct. Algorithms 40(4), 452–459 (2012)MATHMathSciNetCrossRefGoogle Scholar
  28. 28.
    Gentil, I., Guillin, A., Miclo, L.: Modified logarithmic Sobolev inequalities and transportation inequalities. Probab. Theory Relat. Fields 133(3), 409–436 (2005)MATHMathSciNetCrossRefGoogle Scholar
  29. 29.
    Gluskin, E.D., Kwapień, S.: Tail and moment estimates for sums of independent random variables with logarithmically concave tails. Studia Math. 114(3), 303–309 (1995)MATHMathSciNetGoogle Scholar
  30. 30.
    Guillin, A., Joulin, A.: Measure concentration through non-Lipschitz observables and functional inequalities. Electron. J. Probab. 18.65, 26 (2013)MathSciNetGoogle Scholar
  31. 31.
    Guionnet, A., Zeitouni, O.: Concentration of the spectral measure for large matrices. Electron. Commun. Probab. 5, 119–136 (2000)MATHMathSciNetCrossRefGoogle Scholar
  32. 32.
    Hanson, D.L., Wright, F.T.: A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Stat. 42, 1079–1083 (1971)MATHMathSciNetCrossRefGoogle Scholar
  33. 33.
    Hitczenko, P., Montgomery-Smith, S.J., Oleszkiewicz, K.: Moment inequalities for sums of certain independent symmetric random variables. Studia Math. 123(1), 15–42 (1997)MATHMathSciNetGoogle Scholar
  34. 34.
    Janson, S.: Gaussian Hilbert Spaces, Volume 129 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge (1997)CrossRefGoogle Scholar
  35. 35.
    Janson, S., Oleszkiewicz, K., Ruciński, A.: Upper tails for subgraph counts in random graphs. Isr. J. Math. 142, 61–92 (2004)MATHCrossRefGoogle Scholar
  36. 36.
    Janson, S., Ruciński, A.: The infamous upper tail. Probabilistic methods in combinatorial optimization. Random Struct. Algorithms 20(3), 317–342 (2002)MATHCrossRefGoogle Scholar
  37. 37.
    Kim, J.H., Vu, V.H.: Concentration of multivariate polynomials and its applications. Combinatorica 20(3), 417–434 (2000)MATHMathSciNetCrossRefGoogle Scholar
  38. 38.
    Kim, J.H., Vu, V.H.: Divide and conquer martingales and the number of triangles in a random graph. Random Struct. Algorithms 24(2), 166–174 (2004)MATHMathSciNetCrossRefGoogle Scholar
  39. 39.
    Klartag, B.: Concentration of measures supported on the cube. Isr. J. Math. 203(1), 59–80 (2014). doi:10.1007/s11856-013-0072-1
  40. 40.
    Kwapień, S.: Decoupling inequalities for polynomial chaos. Ann. Probab. 15(3), 1062–1071 (1987)MATHMathSciNetCrossRefGoogle Scholar
  41. 41.
    Kwapień, S., Szulga, J.: Hypercontraction methods in moment inequalities for series of independent random variables in normed spaces. Ann. Probab. 19(1), 369–379 (1991)MATHMathSciNetCrossRefGoogle Scholar
  42. 42.
    Kwapień, S., Woyczyński, W.A.: Random Series and Stochastic Integrals: Single and Multiple. Probability and its Applications. Birkhäuser Boston Inc., Boston (1992)CrossRefGoogle Scholar
  43. 43.
    Latała, R.: Tail and moment estimates for some types of chaos. Studia Math. 135(1), 39–53 (1999)MATHMathSciNetGoogle Scholar
  44. 44.
    Latała, R.: Estimates of moments and tails of Gaussian chaoses. Ann. Probab. 34(6), 2315–2331 (2006)MATHMathSciNetCrossRefGoogle Scholar
  45. 45.
    Latała, R., Łochowski, R.: Moment and tail estimates for multidimensional chaos generated by positive random variables with logarithmically concave tails. Stochastic inequalities and applications volume 56 of Progr. Probab, pp. 77–92. Birkhäuser, Basel (2003)Google Scholar
  46. 46.
    Ledoux, M.: The Concentration of Measure Phenomenon, Volume 89 of Mathematical Surveys and Monographs. American Mathematical Society, Providence (2001)Google Scholar
  47. 47.
    Ledoux, M., Oleszkiewicz, K.: On measure concentration of vector-valued maps. Bull. Pol. Acad. Sci. Math. 55(3), 261–278 (2007)MATHMathSciNetCrossRefGoogle Scholar
  48. 48.
    Łochowski, R.: Moment and tail estimates for multidimensional chaoses generated by symmetric random variables with logarithmically concave tails. In Approximation and probability, volume 72 of Banach Center Publ. pp. 161–176. Polish Acad. Sci., Warsaw (2006)Google Scholar
  49. 49.
    Lubetzky, E., Zhao, Y.: On the variational problem for upper tails in sparse random graphs. http://arxiv.org/abs/1402.6011 February (2014)
  50.
    Lytova, A., Pastur, L.: On asymptotic behavior of multilinear eigenvalue statistics of random matrices. J. Stat. Phys. 133(5), 871–882 (2008)
  51.
    Mehta, M.L.: Random Matrices, Volume 142 of Pure and Applied Mathematics (Amsterdam), 3rd edn. Elsevier/Academic Press, Amsterdam (2004)
  52.
    Milman, E.: On the role of convexity in isoperimetry, spectral gap and concentration. Invent. Math. 177(1), 1–43 (2009)
  53.
    Milman, E.: Properties of isoperimetric, functional and transport-entropy inequalities via concentration. Probab. Theory Relat. Fields 152(3–4), 475–507 (2012)
  54.
    Nazarov, F., Sodin, M., Vol\(^{\prime }\)berg, A.: The geometric Kannan-Lovász-Simonovits lemma, dimension-free estimates for the distribution of the values of polynomials, and the distribution of the zeros of random analytic functions. Algebra i Analiz 14(2), 214–234 (2002)
  55.
    Nelson, E.: The free Markoff field. J. Funct. Anal. 12, 211–227 (1973)
  56.
    Pastur, L., Shcherbina, M.: Eigenvalue Distribution of Large Random Matrices, Volume 171 of Mathematical Surveys and Monographs. American Mathematical Society, Providence (2011)
  57.
    Pisier, G.: Probabilistic methods in the geometry of Banach spaces. In: Probability and Analysis (Varenna, 1985), volume 1206 of Lecture Notes in Math., pp. 167–241. Springer, Berlin (1986)
  58.
    Pisier, G.: The Volume of Convex Bodies and Banach Space Geometry, Volume 94 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge (1989)
  59.
    Schudy, W., Sviridenko, M.: Bernstein-like Concentration and Moment Inequalities for Polynomials of Independent Random Variables: Multilinear Case. http://arxiv.org/abs/1109.5193 September (2011)
  60.
    Schudy, W., Sviridenko, M.: Concentration and Moment Inequalities for Polynomials of Independent Random Variables. http://arxiv.org/abs/1104.4997 April (2011)
  61.
    Skorohod, A.V., Slobodenjuk, N.P.: Predelnye teoremy dlya sluchainykh bluzhdanii. Izdat. “Naukova Dumka”, Kiev (1970)
  62.
    Sudakov, V.N., Cirel\(^{\prime }\)son, B.S.: Extremal properties of half-spaces for spherically invariant measures. Problems in the theory of probability distributions, II. Zap. Naučn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 41, 14–24, 165 (1974)
  63.
    Talagrand, M.: An isoperimetric theorem on the cube and the Kintchine-Kahane inequalities. Proc. Am. Math. Soc. 104(3), 905–909 (1988)
  64.
    Talagrand, M.: New concentration inequalities in product spaces. Invent. Math. 126(3), 505–563 (1996)
  65.
    Vu, V.H.: Concentration of non-Lipschitz functions and applications. Probabilistic methods in combinatorial optimization. Random Struct. Algorithms 20(3), 262–316 (2002)

Copyright information

© The Author(s) 2014

Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Authors and Affiliations

  1. Institute of Mathematics, University of Warsaw, Warszawa, Poland
  2. Institute of Mathematics, Polish Academy of Sciences, Warszawa, Poland