Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order

Building on the inequalities for homogeneous tetrahedral polynomials in independent Gaussian variables due to R. Latała, we provide a concentration inequality for not necessarily Lipschitz functions $f:\mathbb{R}^n \rightarrow \mathbb{R}$ with bounded derivatives of higher orders, which holds when the underlying measure satisfies a family of Sobolev type inequalities
$$\Vert g- \mathbb{E}g\Vert_p \le C(p)\Vert \nabla g\Vert_p.$$
Such Sobolev type inequalities hold, e.g., if the underlying measure satisfies the log-Sobolev inequality (in which case $C(p) \le C\sqrt{p}$) or the Poincaré inequality (then $C(p) \le Cp$).
Our concentration estimates are expressed in terms of tensor-product norms of the derivatives of $f$. When the underlying measure is Gaussian and $f$ is a polynomial (not necessarily tetrahedral or homogeneous), our estimates can be reversed (up to a constant depending only on the degree of the polynomial). We also show that for polynomial functions, analogous estimates hold for arbitrary random vectors with independent sub-Gaussian coordinates. We apply our inequalities to general additive functionals of random vectors (in particular linear eigenvalue statistics of random matrices) and the problem of counting cycles of fixed length in Erdős–Rényi random graphs, obtaining new estimates, optimal in a certain range of parameters.


Introduction
Concentration of measure inequalities are among the basic tools of modern probability theory (see the monograph [46]). The prototypical result for all concentration theorems is arguably the Gaussian concentration inequality [14,62], which asserts that if $G$ is a standard Gaussian vector in $\mathbb{R}^n$ and $f:\mathbb{R}^n\to\mathbb{R}$ is a 1-Lipschitz function, then for all $t>0$,
$$\mathbb{P}(|f(G)-\mathbb{E}f(G)|\ge t)\le 2\exp(-t^2/2).$$
Over the years the above inequality has found numerous applications in the analysis of Gaussian processes, as well as in asymptotic geometric analysis (e.g. in modern proofs of Dvoretzky type theorems). Its applicability in geometric situations comes from the fact that it is dimension free and all norms in $\mathbb{R}^n$ are Lipschitz with respect to one another. However, there are some probabilistic or combinatorial situations in which one is concerned with functions that are not Lipschitz. The most basic case is the probabilistic analysis of polynomials in independent random variables, which arise naturally, e.g., in the study of multiple stochastic integrals, in discrete harmonic analysis as elements of the Fourier expansions on the discrete cube or in numerous problems of random graph theory, to mention just the famous subgraph counting problem [22,23,26,27,35,36,49].
The concentration of measure or more generally integrability properties for polynomials have attracted a lot of attention in the last forty years. In particular Bonami [13] and Nelson [55] provided hypercontractive estimates (Khintchine type inequalities) for polynomials on the discrete cube and in the Gauss space, which have been later extended to other random variables by Kwapień and Szulga [41] (see also [42]). Khintchine type inequalities have been also obtained in the absence of independence for polynomials under log-concave measures by Bourgain [19], Bobkov [10], Nazarov-Sodin-Volberg [54] and Carbery-Wright [21].
Another line of research is to provide two-sided estimates for moments of polynomials in terms of deterministic functions of the coefficients. Borell [15] and Arcones-Giné [5] provided such two-sided bounds for homogeneous polynomials in Gaussian variables. They were expressed in terms of expectations of suprema of certain empirical processes. Talagrand [64] and Bousquet-Boucheron-Lugosi-Massart [17,18] obtained counterparts of these results for homogeneous tetrahedral polynomials (that is, polynomials affine in each variable) in Rademacher variables and Łochowski [48] and Adamczak [1] for random variables with log-concave tails. Inequalities of this type, while implying (up to constants) hypercontractive bounds, have a serious downside as the analysis of the empirical processes involved is in general difficult. It is therefore important to obtain two-sided bounds in terms of purely deterministic quantities. Such bounds for random quadratic forms in independent symmetric random variables with log-concave tails have been obtained by Latała [43] (the case of linear forms was solved earlier by Gluskin and Kwapień [29], whereas bounds for quadratic forms in Gaussian variables were obtained by Hanson-Wright [32], Borell [15] and Arcones-Giné [5]). Their counterparts for multilinear forms of arbitrary degree in nonnegative random variables with log-concave tails have been derived by Latała and Łochowski [45]. As for the symmetric case, the general problem is still open. An important breakthrough has been obtained by Latała [44], who proved two-sided estimates for Gaussian chaos of arbitrary order, that is for homogeneous tetrahedral polynomials of arbitrary degree in independent Gaussian variables (we recall his bounds below as they are the starting point for our investigations). For general symmetric random variables with log-concave tails similar bounds are known only for chaos of order at most three [2].
Polynomials in independent random variables have been also investigated in relation with combinatorial problems, e.g. with subgraph counting [22,23,26,27,35,36,49]. The best known result for general polynomials in this area has been obtained by Kim and Vu [37,65], who presented a family of powerful inequalities for [0, 1]-valued random variables. Over the last decade they have been applied successfully to handle many problems in probabilistic combinatorics. Some recent inequalities for polynomials in subexponential random variables have been also obtained by Schudy and Sviridenko [59,60]. They are a generalization of the special case of exponential random variables in [45] and are expressed in terms of quantities similar to those considered by Kim-Vu. Since it is beyond the scope of this paper to give a precise account of all the concentration inequalities for polynomials, we refer the Reader to the aforementioned sources and recommend also the monographs [24,42], where some parts of the theory are presented in a uniform way. As already mentioned we will present in detail only the results from [44], which are our main tool as well as motivation.
As for concentration results for general non-Lipschitz functions, the only reference we are aware of, which addresses this question is [30], where the Authors obtain interesting inequalities for stationary measures of certain Markov processes and functions satisfying a Lyapunov type condition. Their bounds are not comparable to the ones which we present in this paper. On the one hand they work in a more general Markov process setting, on the other hand, it seems that their results are restricted to the Gaussian-exponential concentration and do not apply to functionals with heavier tails (such as polynomials of degree higher than two). Since the language of [30] is very different from ours, we will not describe the inequalities obtained therein and refer the interested Reader to the original paper.
Let us now proceed to the presentation of our results. To do this we will first formulate a two-sided tail and moment inequality for homogeneous tetrahedral polynomials in i.i.d. standard Gaussian variables due to Latała [44]. To present it in a concise way we need to introduce some notation which we will use throughout the article. For a positive integer $n$ we will denote $[n] = \{1,\ldots,n\}$. The cardinality of a set $I$ will be denoted by $\#I$. For $\mathbf{i} = (i_1,\ldots,i_d)\in[n]^d$ and $I\subseteq[d]$ we write $\mathbf{i}_I = (i_k)_{k\in I}$. We will also denote $|\mathbf{i}| = \max_{j\le d} i_j$ and $|\mathbf{i}_I| = \max_{j\in I} i_j$.
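Let now $P_d$ be the set of partitions of $[d]$ into nonempty, pairwise disjoint sets. For a partition $\mathcal{J} = \{J_1,\ldots,J_k\}$ and a $d$-indexed matrix $A = (a_{\mathbf{i}})_{\mathbf{i}\in[n]^d}$ (not necessarily symmetric or with zeros on the diagonal), define
$$\|A\|_{\mathcal{J}} = \sup\Big\{\sum_{\mathbf{i}\in[n]^d} a_{\mathbf{i}}\prod_{l=1}^k x^{(l)}_{\mathbf{i}_{J_l}} \colon \big\|\big(x^{(l)}_{\mathbf{i}_{J_l}}\big)\big\|_2\le 1,\ 1\le l\le k\Big\},$$
where $\big\|(x_{\mathbf{i}_{J_l}})\big\|_2 = \sqrt{\sum_{|\mathbf{i}_{J_l}|\le n} x_{\mathbf{i}_{J_l}}^2}$. Thus, e.g.,
$$\|(a_{ij})_{i,j\le n}\|_{\{1,2\}} = \sqrt{\sum_{i,j\le n} a_{ij}^2}, \qquad \|(a_{ij})_{i,j\le n}\|_{\{1\}\{2\}} = \sup\Big\{\sum_{i,j\le n} a_{ij}x_i y_j \colon \sum_{i\le n} x_i^2\le 1,\ \sum_{j\le n} y_j^2\le 1\Big\},$$
$$\|(a_{ijk})_{i,j,k\le n}\|_{\{1,2\}\{3\}} = \sup\Big\{\sum_{i,j,k\le n} a_{ijk}x_{ij} y_k \colon \sum_{i,j\le n} x_{ij}^2\le 1,\ \sum_{k\le n} y_k^2\le 1\Big\}.$$
From the functional analytic perspective the above norms are injective tensor product norms of $A$ seen as a multilinear form on $(\mathbb{R}^n)^d$ with the standard Euclidean structure.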
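We are now ready to present the inequalities by Latała. Below, as in the whole article, by $C_d$ we denote a constant which depends only on $d$. The values of $C_d$ may differ between occurrences.

Theorem 1.1 (Latała [44]) For any $d$-indexed symmetric matrix $A = (a_{\mathbf{i}})_{\mathbf{i}\in[n]^d}$ such that $a_{\mathbf{i}} = 0$ if $i_j = i_k$ for some $j \neq k$, the random variable $Z$ defined by
$$Z = \sum_{\mathbf{i}\in[n]^d} a_{\mathbf{i}}\, g_{i_1}\cdots g_{i_d}, \qquad (1)$$
where $g_1,\ldots,g_n$ are i.i.d. standard Gaussian variables, satisfies for all $p \ge 2$,
$$C_d^{-1}\sum_{\mathcal{J}\in P_d} p^{\#\mathcal{J}/2}\|A\|_{\mathcal{J}} \le \|Z\|_p \le C_d\sum_{\mathcal{J}\in P_d} p^{\#\mathcal{J}/2}\|A\|_{\mathcal{J}}.$$
As a consequence, for all $t > 0$,
$$C_d^{-1}\exp\Big(-C_d\min_{\mathcal{J}\in P_d}\Big(\frac{t}{\|A\|_{\mathcal{J}}}\Big)^{2/\#\mathcal{J}}\Big) \le \mathbb{P}(|Z|\ge t) \le C_d\exp\Big(-\frac{1}{C_d}\min_{\mathcal{J}\in P_d}\Big(\frac{t}{\|A\|_{\mathcal{J}}}\Big)^{2/\#\mathcal{J}}\Big).$$

It is worthwhile noting that for $\#\mathcal{J} > 1$, the norms $\|A\|_{\mathcal{J}}$ are not unconditional in the standard basis (decreasing coefficients of the matrix may not result in decreasing the norm). Moreover, for specific matrices they may not be easy to compute. On the other hand, for any $d$-indexed matrix $A$ and any $\mathcal{J}\in P_d$, we have $\|A\|_{\mathcal{J}} \le \|A\|_{\{1,\ldots,d\}} = \sqrt{\sum_{\mathbf{i}} a_{\mathbf{i}}^2}$. Using this fact in the upper estimates above allows one to recover (up to constants depending on $d$) hypercontractive estimates for homogeneous tetrahedral polynomials due to Nelson.

Our main result is an extension of the upper bound given in the above theorem to more general random functions and measures. Below we present the most basic setting we will work with and state the corresponding theorems. Some additional extensions are deferred to the main body of the article.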
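To make the role of the partition norms concrete, the following worked specialization spells out what Theorem 1.1 gives for $d = 2$. It is a sketch added for illustration and uses only the definition of the norms given above; the symbols $\|A\|_{HS}$ and $\|A\|_{op}$ are shorthand introduced here for the Hilbert-Schmidt and operator norms, and $C$ stands for the constant $C_2$.

```latex
% d = 2: P_2 consists of the two partitions {{1,2}} and {{1},{2}}.
\begin{align*}
\|A\|_{\{1,2\}} &= \Big(\sum_{i,j\le n} a_{ij}^2\Big)^{1/2} = \|A\|_{HS}, \\
\|A\|_{\{1\}\{2\}} &= \sup_{|x|\le 1,\ |y|\le 1}\ \sum_{i,j\le n} a_{ij}\,x_i y_j = \|A\|_{op}.
\end{align*}
% The partition {{1,2}} has one block (weight p^{1/2}),
% the partition {{1},{2}} has two blocks (weight p^{2/2} = p).
\begin{align*}
C^{-1}\big(\sqrt{p}\,\|A\|_{HS} + p\,\|A\|_{op}\big)
  \;\le\; \Big\|\sum_{i\neq j} a_{ij}\,g_i g_j\Big\|_p
  \;\le\; C\big(\sqrt{p}\,\|A\|_{HS} + p\,\|A\|_{op}\big),
  \qquad p\ge 2.
\end{align*}
% Tail form: the exponent 2/#J equals 2 for {{1,2}} and 1 for {{1},{2}}.
\begin{align*}
\mathbb{P}\Big(\Big|\sum_{i\neq j} a_{ij}\,g_i g_j\Big|\ge t\Big)
  \;\le\; C\exp\Big(-\frac{1}{C}\min\Big(\frac{t^2}{\|A\|_{HS}^2},\ \frac{t}{\|A\|_{op}}\Big)\Big),
  \qquad t>0.
\end{align*}
```

The sub-Gaussian regime is thus governed by the Hilbert-Schmidt norm and the exponential regime by the operator norm, which is the familiar shape of bounds for Gaussian quadratic forms.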
We will consider a random vector $X$ in $\mathbb{R}^n$ which satisfies the following family of Sobolev inequalities. For any $p \ge 2$ and any smooth integrable function $f:\mathbb{R}^n\to\mathbb{R}$,
$$\|f(X)-\mathbb{E}f(X)\|_p \le L\sqrt{p}\,\big\||\nabla f(X)|\big\|_p \qquad (3)$$
for some constant $L$ (independent of $p$ and $f$), where $|\cdot|$ is the standard Euclidean norm on $\mathbb{R}^n$. It is known (see [3] and Theorem 3.4 below) that if $X$ satisfies the logarithmic Sobolev inequality (15) with constant $D_{LS}$, then it satisfies (3) with $L = \sqrt{D_{LS}}$. We remark that there are many criteria for a random vector to satisfy the logarithmic Sobolev inequality (see e.g. [7,8,11,39,46]), so in particular our assumption (3) can be verified for many random vectors of interest.
Our first result is the following theorem, which provides moment estimates and concentration for $D$-times differentiable functions. The estimates are expressed by $\|\cdot\|_{\mathcal{J}}$ norms of derivatives of the function (which we will identify with multi-indexed matrices). We will denote the $d$-th derivative of $f$ by $\mathbf{D}^d f$.
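Theorem 1.2 Assume that a random vector $X$ in $\mathbb{R}^n$ satisfies the inequality (3) with constant $L$. Let $f:\mathbb{R}^n\to\mathbb{R}$ be a function of class $C^D$. For all $p\ge 2$, if $\mathbf{D}^D f(X)\in L^p$, then
$$\|f(X)-\mathbb{E}f(X)\|_p \le C_D\Big(L^D\sum_{\mathcal{J}\in P_D} p^{\#\mathcal{J}/2}\big\|\|\mathbf{D}^D f(X)\|_{\mathcal{J}}\big\|_p + \sum_{1\le d\le D-1} L^d\sum_{\mathcal{J}\in P_d} p^{\#\mathcal{J}/2}\|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}\Big).$$
In particular if $\mathbf{D}^D f(x)$ is uniformly bounded on $\mathbb{R}^n$, then setting
$$\eta_f(t) = \min\Big(\min_{\mathcal{J}\in P_D}\Big(\frac{t}{L^D\sup_{x\in\mathbb{R}^n}\|\mathbf{D}^D f(x)\|_{\mathcal{J}}}\Big)^{2/\#\mathcal{J}},\ \min_{1\le d\le D-1}\min_{\mathcal{J}\in P_d}\Big(\frac{t}{L^d\|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}}\Big)^{2/\#\mathcal{J}}\Big)$$
we obtain for $t > 0$,
$$\mathbb{P}(|f(X)-\mathbb{E}f(X)|\ge t)\le 2\exp\Big(-\frac{1}{C_D}\eta_f(t)\Big).$$

The above theorem is quite technical, so we will now provide a few comments, comparing it to known results.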
1. It is easy to see that if D = 1, Theorem 1.2 reduces (up to absolute constants) to the Gaussian-like concentration inequality, which can be obtained from (3) by Chebyshev's inequality (applied to general p and optimized).
2. If f is a homogeneous tetrahedral polynomial of degree D, then the tail and moment estimates of Theorem 1.2 coincide with those from Latała's Theorem. Thus Theorem 1.2 provides an extension of the upper bound from Latała's result to a larger class of measures and functions (however we would like to stress that our proof relies heavily on Latała's work).
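3. If $f$ is a general polynomial of degree $D$, then $\mathbf{D}^D f(x)$ is constant on $\mathbb{R}^n$ (and thus equal to $\mathbb{E}\mathbf{D}^D f(X)$). Therefore in this case the function $\eta_f$ appearing in Theorem 1.2 can be written in a simplified form
$$\eta_f(t) = \min_{1\le d\le D}\ \min_{\mathcal{J}\in P_d}\Big(\frac{t}{L^d\|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}}\Big)^{2/\#\mathcal{J}}.$$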

4. For polynomials in Gaussian variables, the estimates given in Theorem 1.2 can be reversed, as in Theorem 1.1. More precisely we have the following theorem, which provides an extension of Theorem 1.1 to general polynomials.

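Theorem 1.3 If $G$ is a standard Gaussian vector in $\mathbb{R}^n$ and $f:\mathbb{R}^n\to\mathbb{R}$ is a polynomial of degree $D$, then for all $p \ge 2$,
$$C_D^{-1}\sum_{1\le d\le D}\sum_{\mathcal{J}\in P_d} p^{\#\mathcal{J}/2}\|\mathbb{E}\mathbf{D}^d f(G)\|_{\mathcal{J}} \le \|f(G)-\mathbb{E}f(G)\|_p \le C_D\sum_{1\le d\le D}\sum_{\mathcal{J}\in P_d} p^{\#\mathcal{J}/2}\|\mathbb{E}\mathbf{D}^d f(G)\|_{\mathcal{J}}.$$
Moreover for all $t > 0$,
$$\frac{1}{C_D}\exp\big(-C_D\,\eta_f(t)\big) \le \mathbb{P}(|f(G)-\mathbb{E}f(G)|\ge t) \le C_D\exp\Big(-\frac{1}{C_D}\eta_f(t)\Big),$$
where $\eta_f(t) = \min_{1\le d\le D}\min_{\mathcal{J}\in P_d}\big(t/\|\mathbb{E}\mathbf{D}^d f(G)\|_{\mathcal{J}}\big)^{2/\#\mathcal{J}}$.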

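5. It is well known that concentration of measure for general Lipschitz functions fails e.g. on the discrete cube and one has to impose some additional convexity assumptions to get sub-Gaussian concentration [63]. It turns out that if we restrict to polynomials, estimates in the spirit of Theorems 1.1 and 1.2 still hold. To formulate our result in full generality recall the definition of the $\psi_2$ Orlicz norm of a random variable $Y$,
$$\|Y\|_{\psi_2} = \inf\Big\{t>0 \colon \mathbb{E}\exp\Big(\frac{Y^2}{t^2}\Big)\le 2\Big\}.$$
By integration by parts and Chebyshev's inequality, $\|Y\|_{\psi_2} < \infty$ is equivalent to a sub-Gaussian tail decay for $Y$. We have the following result for polynomials in sub-Gaussian random vectors with independent components.

Theorem 1.4 Let $X = (X_1,\ldots,X_n)$ be a random vector with independent components, such that for all $i \le n$, $\|X_i\|_{\psi_2}\le L$. Then for every polynomial $f:\mathbb{R}^n\to\mathbb{R}$ of degree $D$ and every $p \ge 2$,
$$\|f(X)-\mathbb{E}f(X)\|_p \le C_D\sum_{d=1}^{D} L^d\sum_{\mathcal{J}\in P_d} p^{\#\mathcal{J}/2}\|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}.$$
As a consequence, for any $t > 0$,
$$\mathbb{P}(|f(X)-\mathbb{E}f(X)|\ge t)\le 2\exp\Big(-\frac{1}{C_D}\eta_f(t)\Big),$$
where
$$\eta_f(t) = \min_{1\le d\le D}\ \min_{\mathcal{J}\in P_d}\Big(\frac{t}{L^d\|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}}\Big)^{2/\#\mathcal{J}}.$$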

6. To give the Reader a flavour of possible applications let us mention the Hanson-Wright inequality [32]. Namely, for a random vector $X = (X_1,\ldots,X_n)$ in $\mathbb{R}^n$ with square integrable and mean-zero components and a real symmetric matrix $A = (a_{ij})_{i,j\le n}$, consider the random variable
$$Z = \sum_{i,j\le n} a_{ij}\big(X_i X_j - \mathbb{E}(X_i X_j)\big).$$
If the components of $X$ are independent and $\|X_i\|_{\psi_2}\le L$ for $i = 1,\ldots,n$, then it follows immediately from Theorem 1.4 that for all $t > 0$,
$$\mathbb{P}(|Z|\ge t)\le 2\exp\Big(-\frac{1}{C}\min\Big(\frac{t^2}{L^4\|A\|_{\{1,2\}}^2},\ \frac{t}{L^2\|A\|_{\{1\}\{2\}}}\Big)\Big). \qquad (5)$$
Similarly, Theorem 1.2 implies (5) under the assumption that $X$ satisfies the log-Sobolev inequality (15) with constant $L^2$ (no independence of components of $X$ is assumed). Moreover, if $X$ is a standard Gaussian vector in $\mathbb{R}^n$, then by Theorem 1.3 the tail estimate (5) can be reversed up to numerical constants.

We postpone further applications of our theorems to subsequent sections of the article and here we announce only that apart from polynomials we apply Theorem 1.2 to additive functionals and $U$-statistics of random vectors, in particular to linear eigenvalue statistics of random matrices, obtaining bounds which complement known estimates by Guionnet and Zeitouni [31]. Theorem 1.4 is applied to the problem of subgraph counting in large random graphs. In the special case when one counts copies of a given cycle in a random graph $G(n, p)$, our result yields a tail inequality which is optimal whenever $p \ge n^{-\frac{k-2}{2(k-1)}}\log^{-\frac{1}{2}} n$, where $k$ is the length of the cycle. To the best of our knowledge this is the sharpest currently known result for this range of $p$.
7. Let us now briefly discuss optimality of our inequalities. The lower bound in Theorem 1.3 clearly shows that Theorem 1.2 is optimal in the class of measures and functions it covers, up to constants depending only on $D$. As for Theorem 1.4, it is similarly optimal in the class of random vectors with independent sub-Gaussian coordinates. In concrete combinatorial applications, for 0-1 random variables, this theorem may however be suboptimal. This can be seen already for $D = 1$, for a linear combination of independent Bernoulli variables $X_1,\ldots,X_n$ with $\mathbb{P}(X_i = 1) = 1 - \mathbb{P}(X_i = 0) = p$. When $p$ becomes small, the tail bound for such variables given e.g. by the Chernoff inequality is more subtle than what can be obtained from general inequalities for sums of sub-Gaussian random variables and the fact that $\|X_i\|_{\psi_2}$ is of order $(\log(2/p))^{-1/2}$. Roughly speaking, this is the reason why in our estimates for random graphs we have the restriction on how small $p$ can be. At the same time our inequalities still give results comparable to what can be obtained from other general inequalities for polynomials. As already noted in the survey [36], bounds obtained from various general inequalities for the subgraph-counting problem may not be directly comparable, i.e. those performing well in one case may exhibit worse performance in some other cases. Similarly, our inequalities cannot be in general compared e.g. to the estimates by Kim and Vu [37,38]. For this reason and since it would require introducing new notation, we will not discuss their estimates and just indicate, when presenting applications of Theorem 1.4, several situations when our inequalities perform in a better or worse way than those by Kim and Vu. Let us only mention that the Kim-Vu inequalities, similarly to ours, are expressed in terms of higher order derivatives of the polynomials.
However, Kim and Vu (as well as Schudy and Sviridenko) look at maxima of absolute values of partial derivatives, which does not lead to the tensor-product norms which we consider. While in the general sub-Gaussian case we consider, such tensor product norms cannot be avoided (in view of Theorem 1.3), it is not necessarily the case for 0-1 random variables.

8. A version of Theorem 1.2 for vectors of independent random variables satisfying the modified logarithmic Sobolev inequality (see e.g. [28]) instead of the classical log-Sobolev inequality is also discussed. In particular, in Theorem 3.4 we relate the modified log-Sobolev inequality to a certain Sobolev-type inequality with a non-Euclidean norm of the gradient and with the constant independent of the dimension.
The organization of the paper is as follows. First, in Sect. 2, we introduce the notation used in the paper, next in Sect. 3 we give the proof of Theorem 1.2 together with some generalizations and examples of applications. In Sect. 4 we prove Theorem 1.3, whereas in Sect. 5 we present the proof of Theorem 1.4 and applications to the subgraph counting problems. In Sect. 6 we provide further refinements of estimates from Sect. 3 in the case of independent random variables satisfying modified log-Sobolev inequalities (they are deferred to the end of the article as they are more technical than those of Sect. 3). In the Appendix we collect some additional facts used in the proofs.

Notation
Sets and indices For a positive integer $n$ we will denote $[n] = \{1,\ldots,n\}$. The cardinality of a set $I$ will be denoted by $\#I$.
For $\mathbf{i} = (i_1,\ldots,i_d)\in[n]^d$ and $I\subseteq[d]$ we write $\mathbf{i}_I = (i_k)_{k\in I}$. We will also denote $|\mathbf{i}| = \max_{j\le d} i_j$ and $|\mathbf{i}_I| = \max_{k\in I} i_k$.
For a finite set $A$ and an integer $d \ge 0$ we set $A^{\underline{d}} = \{\mathbf{i} = (i_1,\ldots,i_d)\in A^d \colon i_j\neq i_k \text{ for } j\neq k\}$ (i.e. $A^{\underline{d}}$ is the set of $d$-indices with pairwise distinct coordinates). Accordingly we will denote $n^{\underline{d}} = n(n-1)\cdots(n-d+1)$. For a finite set $I$, by $P_I$ we will denote the family of partitions of $I$ into nonempty, pairwise disjoint sets. For simplicity we will write $P_d$ instead of $P_{[d]}$.
For a finite set $I$, by $\ell_2(I)$ we will denote the finite dimensional Euclidean space $\mathbb{R}^I$ endowed with the standard Euclidean norm $|x|_2 = \sqrt{\sum_{i\in I} x_i^2}$. Whenever there is no risk of confusion we will denote the standard Euclidean norm simply by $|\cdot|$.
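As a small sanity check of the index notation, the following example lists the relevant objects for $d = 3$ and $n = 3$. It is an illustration added here, writing $A^{\underline{d}}$ for the set of $d$-indices with pairwise distinct coordinates and $P_d$ for the set of partitions of $[d]$, as in the text above.

```latex
% The five partitions of [3] = {1,2,3}, grouped by the number of blocks #J.
\begin{align*}
P_3 = \big\{\ &\{\{1,2,3\}\}, && (\#\mathcal{J} = 1)\\
              &\{\{1,2\},\{3\}\},\ \{\{1,3\},\{2\}\},\ \{\{2,3\},\{1\}\}, && (\#\mathcal{J} = 2)\\
              &\{\{1\},\{2\},\{3\}\}\ \big\}. && (\#\mathcal{J} = 3)
\end{align*}
% Indices with pairwise distinct coordinates, for A = [3] and d = 2.
\begin{align*}
[3]^{\underline{2}} &= \{(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)\}, &
3^{\underline{2}} &= 3\cdot 2 = 6 = \#[3]^{\underline{2}}.
\end{align*}
% Restriction of a multi-index: for i = (i_1,i_2,i_3) = (2,3,1) and I = {1,3},
\begin{align*}
\mathbf{i}_I = (i_1,i_3) = (2,1), \qquad |\mathbf{i}| = 3, \qquad |\mathbf{i}_I| = 2.
\end{align*}
```

In a moment bound of the form $\sum_{\mathcal{J}\in P_3} p^{\#\mathcal{J}/2}\|A\|_{\mathcal{J}}$ these five partitions therefore contribute one term of order $p^{1/2}$, three of order $p$ and one of order $p^{3/2}$.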

Multi-indexed matrices
For a function $f:\mathbb{R}^n\to\mathbb{R}$, by $\mathbf{D}^d f(x)$ we will denote the ($d$-indexed) matrix of its derivatives of order $d$, which we will identify with the corresponding symmetric $d$-linear form.
We will also define the Hadamard product of two such matrices $M\circ N$ as a $d$-indexed matrix with entries $m_{\mathbf{i}} = M_{\mathbf{i}}N_{\mathbf{i}}$ (pointwise multiplication of entries).
Let us also define the notion of "generalized diagonals" of a $d$-indexed matrix $A = (a_{\mathbf{i}})_{\mathbf{i}\in[n]^d}$. For a fixed set $K\subseteq[d]$ with $\#K > 1$, the "generalized diagonal" corresponding to $K$ is the set of indices $\{\mathbf{i}\in[n]^d \colon i_k = i_l \text{ for all } k,l\in K\}$.
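The definition of generalized diagonals is easiest to read off a small case. The example below, added for illustration, takes $d = 3$ and $n = 2$; the symbol $\Delta_K$ for the generalized diagonal corresponding to $K$ is shorthand introduced here and is not notation from the text.

```latex
% Generalized diagonals of a 3-indexed matrix with n = 2.
% Delta_K := { i in [n]^3 : i_k = i_l for all k, l in K }   (shorthand used only here)
\begin{align*}
\Delta_{\{1,2\}}   &= \{(1,1,1),\ (1,1,2),\ (2,2,1),\ (2,2,2)\}, \\
\Delta_{\{2,3\}}   &= \{(1,1,1),\ (2,1,1),\ (1,2,2),\ (2,2,2)\}, \\
\Delta_{\{1,2,3\}} &= \{(1,1,1),\ (2,2,2)\}.
\end{align*}
% A matrix vanishing on every generalized diagonal with #K = 2 is supported on
% indices with pairwise distinct coordinates; for n = 2 and d = 3 no such index
% exists, since three coordinates cannot be pairwise distinct in {1,2}.
```

For $d = 2$ the only choice is $K = \{1,2\}$, and the generalized diagonal is the usual diagonal $\{(i,i) \colon i\le n\}$.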
Constants We will use the letter $C$ to denote absolute constants and $C_a$ for constants depending only on some parameter $a$. In both cases the values of such constants may differ between occurrences.

A concentration inequality for non-Lipschitz functions
In this Section we prove Theorem 1.2. Let us first state our main tool, which is an inequality by Latała in a decoupled version.

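Theorem 3.1 (Latała [44]) Let $A = (a_{\mathbf{i}})_{\mathbf{i}\in[n]^d}$ be a $d$-indexed matrix with real entries and let $G_1,\ldots,G_d$ be i.i.d. standard Gaussian vectors in $\mathbb{R}^n$, $G_j = (g^{(j)}_1,\ldots,g^{(j)}_n)$. Then for all $p\ge 2$,
$$C_d^{-1}\sum_{\mathcal{J}\in P_d} p^{\#\mathcal{J}/2}\|A\|_{\mathcal{J}} \le \Big\|\sum_{\mathbf{i}\in[n]^d} a_{\mathbf{i}}\prod_{j=1}^d g^{(j)}_{i_j}\Big\|_p \le C_d\sum_{\mathcal{J}\in P_d} p^{\#\mathcal{J}/2}\|A\|_{\mathcal{J}}.$$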
Thanks to general decoupling inequalities for U -statistics [25], which we recall in the "Appendix" (Theorem 7.1), the above theorem is formally equivalent to Theorem 1.1. In fact in [44] Latała first proves the above version. In the proof of Theorem 3.3, which is a slight generalization of Theorem 1.2, we will need just Theorem 3.1 (in particular in this part of the article we do not need any decoupling inequalities).
From now on we will work in a more general setting than in Theorem 1.2 and assume that X is a random vector in R n , such that for all p ≥ 2 there exists a constant L X ( p) such that for all bounded C 1 functions f : R n → R, Clearly in this situation the above inequality generalizes to all C 1 functions (if the right-hand side is finite then the left-hand side is well defined and the inequality holds). Let now G be a standard n-dimensional Gaussian vector, independent of X . Using the Fubini theorem together with the fact that for some absolute constant C, all x ∈ R n and p ≥ 2, C −1 √ p|x| ≤ x, G p ≤ C √ p|x|, we can linearise the right-hand side above and write (6) equivalently (up to absolute constants) as We remark that similar linearisation has been used by Maurey and Pisier to provide a simple proof of the Gaussian concentration inequality [57,58] (see the remark following Theorem 3.3 below). Inequality (7) has an advantage over (6) as it allows for iteration leading to the following simple proposition. Proposition 3.2 Consider p ≥ 2 and let X be an n-dimensional random vector satisfying (6). Let f : R n → R be a C D function. Let moreover G 1 , . . . , G D be independent standard Gaussian vectors in R n , independent of X . Then for all p ≥ 2, Proof Induction on D. For D = 1 the assertion of the proposition coincides with (7), which (as already noted) is equivalent to (6). Let us assume that the proposition holds for D − 1. Applying thus (8) with D − 1 instead of D, we obtain Applying now the triangle inequality in L p , we get Let us now apply (7) conditionally on G 1 , . . . , To finish the proof it is now enough to integrate this inequality with respect to the remaining Gaussian vectors and combine the obtained estimate with (9) and (10).
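Because the argument above is driven entirely by its displayed formulas, here is a hedged sketch of how the linearisation and one step of the iteration can be written out in the case $D = 2$. It is a reconstruction under stated assumptions rather than a quotation: the shorthand $a_p := C\,L_X(p)/\sqrt{p}$ and the symbol $\mathbb{E}_X$ (expectation with respect to $X$ only) are introduced here, and the displays labelled (6)-(8) in the original may differ in form or in constants.

```latex
% Assumption of type (6), for C^1 functions f with finite right-hand side:
\begin{align*}
\|f(X)-\mathbb{E}f(X)\|_p \le L_X(p)\,\big\||\nabla f(X)|\big\|_p .
\end{align*}
% Linearised form of type (7): G is standard Gaussian and independent of X, and
% C^{-1}\sqrt{p}\,|x| \le \|\langle x,G\rangle\|_p is applied conditionally on X.
\begin{align*}
\|f(X)-\mathbb{E}f(X)\|_p \le a_p\,\big\|\langle\nabla f(X),G\rangle\big\|_p,
\qquad a_p := C\,L_X(p)/\sqrt{p}.
\end{align*}
% One iteration (D = 2, f of class C^2). Triangle inequality in L_p:
\begin{align*}
\big\|\langle\nabla f(X),G_1\rangle\big\|_p
 \le \big\|\langle\mathbb{E}_X\nabla f(X),G_1\rangle\big\|_p
   + \big\|\langle\nabla f(X)-\mathbb{E}_X\nabla f(X),G_1\rangle\big\|_p .
\end{align*}
% Conditionally on G_1, apply the linearised inequality to
% x \mapsto \langle\nabla f(x),G_1\rangle, whose gradient is D^2 f(x) G_1:
\begin{align*}
\big\|\langle\nabla f(X)-\mathbb{E}_X\nabla f(X),G_1\rangle\big\|_p
 \le a_p\,\big\|\langle \mathbf{D}^2 f(X),G_1\otimes G_2\rangle\big\|_p .
\end{align*}
% Combining the three displays:
\begin{align*}
\|f(X)-\mathbb{E}f(X)\|_p
 \le a_p\,\big\|\langle\mathbb{E}\nabla f(X),G_1\rangle\big\|_p
   + a_p^2\,\big\|\langle \mathbf{D}^2 f(X),G_1\otimes G_2\rangle\big\|_p .
\end{align*}
```

Both terms on the right are $L_p$ norms of decoupled Gaussian chaoses (of order one and two, the second one conditionally on $X$), which is the form to which a moment estimate such as Theorem 3.1 applies.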
Let us now specialize to the case when $L_X(p) = Lp^{\gamma}$ for some $L > 0$, $\gamma \ge 1/2$. Combining the above proposition with Latała's Theorem 3.1, we obtain immediately the following theorem, a special case of which is Theorem 1.2.

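Theorem 3.3 Assume that $X$ is a random vector in $\mathbb{R}^n$, such that for some constants $L > 0$, $\gamma \ge 1/2$, all smooth bounded functions $f$ and all $p \ge 2$,
$$\|f(X)-\mathbb{E}f(X)\|_p \le Lp^{\gamma}\big\||\nabla f(X)|\big\|_p. \qquad (11)$$
For any smooth function $f:\mathbb{R}^n\to\mathbb{R}$ and all $p \ge 2$,
$$\|f(X)-\mathbb{E}f(X)\|_p \le C_D\Big(L^D\sum_{\mathcal{J}\in P_D} p^{(\gamma-1/2)D+\#\mathcal{J}/2}\big\|\|\mathbf{D}^D f(X)\|_{\mathcal{J}}\big\|_p + \sum_{1\le d\le D-1} L^d\sum_{\mathcal{J}\in P_d} p^{(\gamma-1/2)d+\#\mathcal{J}/2}\|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}\Big).$$
Moreover, if $\mathbf{D}^D f$ is bounded uniformly on $\mathbb{R}^n$, then for all $t > 0$,
$$\mathbb{P}(|f(X)-\mathbb{E}f(X)|\ge t)\le 2\exp\Big(-\frac{1}{C_D}\eta_f(t)\Big),$$
where
$$\eta_f(t) = \min\Big(\min_{\mathcal{J}\in P_D}\Big(\frac{t}{L^D\sup_{x\in\mathbb{R}^n}\|\mathbf{D}^D f(x)\|_{\mathcal{J}}}\Big)^{\frac{1}{(\gamma-1/2)D+\#\mathcal{J}/2}},\ \min_{1\le d\le D-1}\min_{\mathcal{J}\in P_d}\Big(\frac{t}{L^d\|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}}\Big)^{\frac{1}{(\gamma-1/2)d+\#\mathcal{J}/2}}\Big).$$

Proof The first part is a straightforward combination of Proposition 3.2 and Theorem 3.1. The second part follows from the first one by Chebyshev's inequality $\mathbb{P}(|Y|\ge e\|Y\|_p)\le\exp(-p)$ applied with $p = \eta_f(t)/C_D$ (note that if $\eta_f(t)/C_D\le 2$ then one can make the tail bound asserted in the theorem trivial by adjusting the constants).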
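The passage from moments to tails used in the proof can be made explicit in the simplest case. The following worked computation is a sketch for a single term with sub-Gaussian moment growth (as for $D = 1$, $\gamma = 1/2$); the quantity $K$ is a placeholder introduced here for the constant multiplying $\sqrt{p}$, and the resulting numerical constants are not claimed to match those of the theorem.

```latex
% Assume \|Y\|_p \le K\sqrt{p} for all p \ge 2  (Y = f(X) - E f(X), K > 0 a placeholder).
% Step 1: Chebyshev's (Markov's) inequality applied to |Y|^p.
\begin{align*}
\mathbb{P}\big(|Y|\ge e\|Y\|_p\big) \le \frac{\mathbb{E}|Y|^p}{e^p\,\|Y\|_p^p} = e^{-p}.
\end{align*}
% Step 2: given t > 0, choose p so that e K \sqrt{p} = t, i.e. p = t^2/(e^2K^2).
% If this p is at least 2, then e\|Y\|_p \le eK\sqrt{p} = t and therefore
\begin{align*}
\mathbb{P}(|Y|\ge t) \le \mathbb{P}\big(|Y|\ge e\|Y\|_p\big) \le \exp\Big(-\frac{t^2}{e^2K^2}\Big).
\end{align*}
% Step 3: if p = t^2/(e^2K^2) < 2, the bound is made trivial by enlarging the constant:
\begin{align*}
\mathbb{P}(|Y|\ge t) \le 1 \le e^{2}\exp\Big(-\frac{t^2}{e^2K^2}\Big).
\end{align*}
% Hence, for all t > 0,
\begin{align*}
\mathbb{P}(|Y|\ge t) \le e^{2}\exp\Big(-\frac{t^2}{e^2K^2}\Big).
\end{align*}
```

With several terms of the form $K_{\mathcal{J}}\,p^{\#\mathcal{J}/2}$ the same choice of $p$ is made for the smallest of the corresponding exponents, which is how a minimum such as $\eta_f(t)$ arises.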
Remark In [57,58] Pisier presents a stronger inequality than (11) with γ = 1/2. More specifically, he proves that if X, G are independent standard centered Gaussian vectors in R n , E is a Banach space and f : R n → E is a C 1 function, then for every convex function : E → R, where L = π 2 . As noted in [47], Caffarelli's contraction principle [20] implies that, e.g., a random vector X with density e −V , where V : where G is still a standard Gaussian vector independent of X ). Therefore in this situation a similar approach as in the proof of Proposition 3.2 can be used for functions f with values in a general Banach space. Moreover, a counterpart of Latała's results is known for chaos with values in a Hilbert space (to the best of our knowledge this observation has not been published, in fact it can be quite easily obtained from the version for real valued chaos). Thus in this case we can obtain a counterpart of Theorem 3.3 (with γ = 1/2) for Hilbert space valued-functions. In the case of a general Banach space two-sided estimates with deterministic quantities for Gaussian chaos are not known. Still, one can use some known inequalities (like hypercontraction or Borell-Arcones-Giné inequality) instead of Theorem 3.1 and thus obtain new concentration bounds. We remark that if one uses hypercontraction, one can obtain explicit dependence of the constants on the degree of the polynomial, since explicit constants are known for hypercontractive estimates of (Banach space-valued) Gaussian chaos and one can keep track of them during the proof. We skip the details.
In view of Theorem 3.3 a natural question arises: for what measures is the inequality (11) satisfied? Before we provide examples, for technical reasons let us recall the definition of the length of the gradient of a locally Lipschitz function. For a metric space $(\mathcal{X}, d)$, a locally Lipschitz function $f:\mathcal{X}\to\mathbb{R}$ and $x\in\mathcal{X}$, we define
$$|\nabla f|(x) = \limsup_{d(x,y)\to 0+}\frac{|f(y)-f(x)|}{d(x,y)}.$$
If $\mathcal{X} = \mathbb{R}^n$ with the Euclidean metric and $f$ is differentiable at $x$, then clearly $|\nabla f|(x)$ coincides with the Euclidean length of the usual gradient $\nabla f(x)$. For this reason, with a slight abuse of notation, we will write $|\nabla f(x)|$ instead of $|\nabla f|(x)$. We will consider only measures on $\mathbb{R}^n$; however, since we allow measures which are not necessarily absolutely continuous with respect to the Lebesgue measure, at some points in the proofs we will work with the above abstract definition.
Going back to the question of measures satisfying (11), it is well known (see e.g. [52]) that if $X$ satisfies the Poincaré inequality
$$\mathrm{Var}\, f(X) \le D_{Poin}\,\mathbb{E}|\nabla f(X)|^2 \qquad (14)$$
for all locally Lipschitz bounded functions, then $X$ satisfies (11) with $\gamma = 1$ and $L = C\sqrt{D_{Poin}}$ (recall that $C$ always denotes a universal constant). Assume now that $X$ satisfies the logarithmic Sobolev inequality
$$\mathrm{Ent}\, f^2(X) \le 2D_{LS}\,\mathbb{E}|\nabla f(X)|^2 \qquad (15)$$
for locally Lipschitz bounded functions, where for a nonnegative random variable $Y$,
$$\mathrm{Ent}\, Y = \mathbb{E}Y\log Y - \mathbb{E}Y\log\mathbb{E}Y.$$
Then, by the results from [3], it follows that $X$ satisfies (11) with $\gamma = 1/2$ and $L = \sqrt{D_{LS}}$. We will now generalize this observation to measures satisfying the modified logarithmic Sobolev inequality (introduced in [28]). We will present it in greater generality than needed for proving (11), since we will use it later (in Sect. 6) to prove refined concentration results for random vectors with independent Weibull coordinates.
Let β ∈ [2, ∞). We will say that a random vector Y ∈ R k satisfies a β-modified logarithmic Sobolev inequality if for every locally Lipschitz bounded positive function Let us also introduce two quantities, measuring the length of the gradient in product spaces. Consider a locally Lipschitz function f :
In particular using the above theorem with m = 1 and k = n, we obtain the following Corollary 3.5 If X is a random vector in R n which satisfies the β-modified log-Sobolev inequality (16), then it satisfies (11) We remark that in the class of logarithmically concave random vectors, the βmodified log-Sobolev inequality is known to be equivalent to concentration for 1- Proof of Theorem 3.4 By the tensorization property of entropy (see e.g. [46], Proposition 5.6) we get for all positive locally Lipschitz bounded functions f : Following [3], consider now any locally Lipschitz bounded f > 0 and denote By the chain rule and the Hölder inequality for the pair of conjugate exponents Similarly, for t ≥ β, above inequality can be written as where we used the assumption β ≥ 2. Using the last three inequalities together with the fact that for t ≥ 0 the function The above inequality has been proved so far for strictly positive, locally Lipschitz functions (the boundedness assumption can be easily removed by truncation and passage to the limit). For the case of a general locally Lipschitz function f , take any ε > 0 and considerf = | f | + ε. Sincef is strictly positive and locally Lipschitz, the above inequality holds also forf . Taking ε → 0 + , we can now extend (19) to arbitrary locally Lipschitz f .
Finally, assume f : R mk → R is locally Lipschitz and f (X ) is integrable. Applying (19) to f − E f (X ) instead of f and taking the square root, we obtain (16) implies the Poincaré inequality with constant D L S β /2 (see Proposition 2.3. in [28]), we get (see the remark following (14)). These two estimates yield (17)

Applications of Theorem 1.2
Let us now present certain applications of estimates established in the previous section.
For simplicity we will restrict to the basic setting presented in Theorem 1.2.

Polynomials
A typical application of Theorem 1.2 would be to obtain tail inequalities for multivariate polynomials in the random vector X . The constants involved in such estimates do not depend on the dimension, but only on the degree of the polynomial. As already mentioned in the introduction, our results in this setting can be considered as a transference of inequalities by Latała from the tetrahedral Gaussian case to the case of not necessarily product random vectors and general polynomials.

Additive functionals and related statistics
We will now consider three classes of additive statistics of a random vector, often arising in various problems.
Additive functionals Let X be a random vector in R n satisfying (3). For a function f : R → R define the random variable It is classical and follows from (3) by a simple application of the Chebyshev inequality that if f is smooth with f ∞ ≤ α, then for all t > 0, Using Theorem 1.2 we can easily obtain inequalities which In consequence, calculating their · J norms is simple. More precisely, we have Therefore we obtain the following corollary to Theorem 1.2. We will apply it in the next section to linear eigenvalue statistics of random matrices. (20). Then for all t > 0,

Corollary 3.6 Let X be a random vector in
Clearly the case D = 1 of the above corollary recovers (21) up to constants. Moreover using the (yet unproven) Theorem 1.3 one can see that for f (x) = x D and X being a standard Gaussian vector in R n , the estimate of the corollary is optimal up to absolute constants (in this case, since Z f is a sum of independent random variables, one can also use estimates from [33]).
Additive functionals of partial sums
Let us now consider a slightly more involved additive functional of the form Such random variables arise, e.g., in the study of additive functionals of random walks (see e.g. [16,61]) which, when combined with (3) and Chebyshev's inequality yields Now, let us assume that $f \in C^2$ and $f''$ is bounded. We have x k and thus Since $D^2 F$ is a symmetric bilinear form, we have Using the above estimates and Theorem 1.2 we obtain To effectively bound the sub-Gaussian coefficient in the above inequality one should use some additional information about the structure of the vector $X$. For a given function $f$ it is of order at most $n^5$, but if, e.g., the function $f$ is even and $X$ is symmetric, it clearly vanishes. In this case we get One can check that if for instance $X$ is a standard Gaussian vector in $\mathbb{R}^n$ and $f(x) = x^2$ then this estimate is tight up to the value of the constant $C$.

U-statistics
Our last application in this section will concern U-statistics (for simplicity of order 2) of the random vector $X$, i.e., random variables of the form where $h_{ij} : \mathbb{R}^2 \to \mathbb{R}$ are smooth functions. Without loss of generality let us assume that $h_{ij}(x, y) = h_{ji}(y, x)$. A simple application of Chebyshev's inequality and (3) gives that if the partial derivatives of $h_{ij}$ are uniformly bounded on $\mathbb{R}^2$ then for all $t > 0$,

For $h_{ij}$ of class $C^2$ with bounded derivatives of second order, a direct application of Theorem 1.2 gives In particular, if $h_{ij} = h$, a function with bounded derivatives of second order, we get which shows that the oscillations of $U$ are of order at most $O(n^{3/2})$. In the case of U-statistics of independent random variables, generated by bounded $h$, this is a well known fact, corresponding to the CLT and classical Hoeffding inequalities for U-statistics. We remark that in the non-degenerate case, i.e. when $\mathrm{Var}(\mathbb{E}_X h(X, Y)) > 0$, $n^{3/2}$ is indeed the right normalization in the CLT for U-statistics (see e.g. [24]).
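The $n^{3/2}$ scaling can be checked numerically on a simple non-degenerate kernel. The sketch below is only an illustration: it assumes that the U-statistic is the sum of $h(X_i, X_j)$ over ordered pairs $i \ne j$, takes the hypothetical kernel $h(x, y) = x + y$ and a standard Gaussian vector $X$, for which $U = 2(n-1)\sum_i X_i$ and the standard deviation equals $2(n-1)\sqrt{n}$. The function names are illustrative and not taken from the paper.

```python
import numpy as np


def u_statistic(x, h):
    """Sum of h(x[i], x[j]) over all ordered pairs i != j."""
    values = h(x[:, None], x[None, :])       # n x n matrix of h(x_i, x_j)
    return values.sum() - np.trace(values)   # remove the diagonal terms i == j


def h_linear(a, b):
    """Hypothetical smooth kernel with bounded derivatives, chosen for illustration."""
    return a + b


def empirical_std(n, trials, rng):
    """Monte Carlo estimate of the standard deviation of U for standard Gaussian X."""
    samples = [u_statistic(rng.standard_normal(n), h_linear) for _ in range(trials)]
    return float(np.std(samples))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (25, 50, 100):
        ratio = empirical_std(n, 2000, rng) / n**1.5
        # For this kernel the exact value of the ratio is 2 (n - 1) / n.
        print(n, round(ratio, 3), 2 * (n - 1) / n)
```

The ratio of the empirical standard deviation to $n^{3/2}$ stays close to a constant, in agreement with the normalization in the CLT mentioned above.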

Linear statistics of eigenvalues of random matrices
We will now use Corollary 3.6 to obtain tail inequalities for linear eigenvalue statistics of Wigner random matrices. We remark that one could also apply to the random matrix case the other inequalities considered in the previous section, obtaining in particular estimates on U-statistics of eigenvalues (which have been recently investigated by Lytova and Pastur [50]). We will focus on linear eigenvalue statistics (additive functionals in the language of the previous section) and obtain inequalities involving a Sobolev norm of the function $f$ with respect to the semicircle law (the limiting spectral distribution for Wigner ensembles) as a sub-Gaussian term. We refer the Reader to the monographs [4,6,51,56] for basic facts concerning random matrices.
Consider thus a real symmetric $n \times n$ random matrix $A$ ($n \ge 2$) and let $\lambda_1 \le \cdots \le \lambda_n$ be its eigenvalues. We will be interested in concentration inequalities for functionals of the form In [31] Guionnet and Zeitouni obtained concentration inequalities for $Z$ with Lipschitz $f$ assuming that the entries of $A$ are independent and satisfy the log-Sobolev inequality with some constant $L$. More specifically, they prove that for all $t > 0$, (In fact they treat a more general case of banded matrices, but for simplicity we will focus on the basic case.) As a corollary to Theorem 1.2 we present below an inequality which complements the above result. Our aim is to replace the strong parameter $\|f'\|_\infty$ controlling the sub-Gaussian tail by a weaker Sobolev norm with respect to the semicircular law

Remark The case $f(x) = x^2$ shows that under the assumptions of Proposition 3.7 one cannot expect a tail behaviour better than exponential for large $t$. Indeed, since $Z = \frac{1}{n}(\lambda_1^2 + \cdots + \lambda_n^2) = \frac{1}{n}\sum_{i,j \le n} A_{ij}^2$, even if $A$ is a matrix with standard Gaussian entries, then for all $t > 0$, $\mathbb{P}(|Z - \mathbb{E}Z| \ge t) > \frac{1}{C}\exp(-C(t^2 \wedge nt))$.
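The identity behind this remark is the trace formula $\sum_i \lambda_i^2 = \sum_{i,j} A_{ij}^2$. A minimal numerical sketch follows, assuming the linear statistic is $Z = \sum_i f(\lambda_i/\sqrt{n})$ with $f(x) = x^2$; the helper names are hypothetical.

```python
import numpy as np


def symmetric_gaussian(n, rng):
    """Real symmetric matrix with independent standard Gaussian entries on and above the diagonal."""
    upper = np.triu(rng.standard_normal((n, n)))
    return upper + np.triu(upper, 1).T


def linear_statistic_square(a):
    """Z = sum_i f(lambda_i / sqrt(n)) for f(x) = x^2, computed from the spectrum."""
    n = a.shape[0]
    eigenvalues = np.linalg.eigvalsh(a)
    return float(np.sum((eigenvalues / np.sqrt(n)) ** 2))


def frobenius_form(a):
    """The same quantity written as (1/n) * sum_{i,j} A_ij^2."""
    return float(np.sum(a ** 2) / a.shape[0])


if __name__ == "__main__":
    a = symmetric_gaussian(50, np.random.default_rng(0))
    print(linear_statistic_square(a), frobenius_form(a))
```

Since $Z$ is then a normalized sum of squares of Gaussian variables, its upper tail is only exponential for large $t$, which is the point of the remark.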
Remark A similar inequality to (23) holds in the case of Hermitian matrices with independent entries as well. In the proof given below one should invoke an appropriate result concerning the speed of convergence of the spectral distribution of Wigner matrices to the semicircular law.
we get that the mapÃ → (λ 1 / √ n, . . . , λ n / √ n) ∈ R n is √ 2/n-Lipschitz. Therefore, the random vector (λ 1 / √ n, . . . , λ n / √ n) satisfies (15) with constant 2L 2 /n. In consequence, by the results from [3] (see also Theorem 3.4) In what follows we shall estimate from above the term n −1 n i=1 (E f (λ i / √ n)) 2 from (24). First, by Jensen's inequality where μ is the expected spectral measure of the matrix n −1/2 A. According to Wigner's theorem, for a fixed f, μ converges to the semicircular law as n → ∞ and thus A non-asymptotic bound on the term R f 2 dμ can be obtained using the result of Bobkov, Götze and Tikhomirov [12] on the speed of convergence of the expected spectral distribution of real Wigner matrices to the semicircular law. Since each entry of A satisfies the logarithmic Sobolev inequality with constant L 2 , it also satisfies the Poincaré inequality with the same constant (see e.g. [46,Chapter 5]). Therefore Theorem 1.1 from [12] gives where F μ and F ρ are the distribution functions of μ and ρ, respectively. The decay of 1 − F μ (x) and F μ (x) as x → ∞ and x → −∞ (resp.) can be obtained using the sub-Gaussian concentration of λ n / √ n and λ 1 / √ n, which is, e.g., a consequence of (3) for the vector of eigenvalues of n −1/2 A. For example, for any t ≥ 0, Using the classical technique of δ-nets for estimating the operator norm of a matrix (see e.g. [58]) and the fact that the entries of A are sub-Gaussian (as they satisfy the logarithmic Sobolev inequality) one gets Eλ n ≤ E A op ≤ C L √ n, which together with (27) yields for all t ≥ 0. Clearly, the same inequality holds for F(−C L − t). Integrating by parts, we get Combining the uniform estimate (26) with (28) and using an elementary inequality 2x y ≤ x 2 + y 2 , we estimate the last integral in (29) as follows: where We proceed to estimate the two last terms from (30). Take r > 0 such that or put r = 0 if no such r exists. 
Note that if we assume C L ≥ 1, as we obviously can, then r ≤ C Ln −1/2 log n.
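The Lipschitz property of the eigenvalue map used at the beginning of the proof follows from the Hoffman-Wielandt inequality. It can be illustrated numerically. The sketch below assumes that $\tilde{A}$ denotes the vector of entries of $A$ on and above the diagonal, and checks on random pairs that the map $\tilde{A} \mapsto (\lambda_1/\sqrt{n}, \ldots, \lambda_n/\sqrt{n})$ has increments bounded by $\sqrt{2/n}$ times the Euclidean distance; all function names are illustrative.

```python
import numpy as np


def symmetric_from_upper(upper_entries, n):
    """Build the symmetric matrix whose entries on and above the diagonal are given."""
    a = np.zeros((n, n))
    a[np.triu_indices(n)] = upper_entries
    return a + np.triu(a, 1).T


def scaled_spectrum(upper_entries, n):
    """Ordered eigenvalues of the symmetric matrix, divided by sqrt(n)."""
    a = symmetric_from_upper(upper_entries, n)
    return np.linalg.eigvalsh(a) / np.sqrt(n)


def lipschitz_ratio(u, v, n):
    """Increment of the scaled spectrum divided by the Euclidean distance of the inputs."""
    num = np.linalg.norm(scaled_spectrum(u, n) - scaled_spectrum(v, n))
    return float(num / np.linalg.norm(u - v))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    dim = n * (n + 1) // 2
    worst = max(
        lipschitz_ratio(rng.standard_normal(dim), rng.standard_normal(dim), n)
        for _ in range(200)
    )
    print(worst, np.sqrt(2 / n))  # the observed ratio should not exceed sqrt(2/n)
```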
Remarks 1. The factor $n^{-2/3}$ in (23) comes only from (26) and in some situations can be improved, provided one can obtain a better speed of convergence to the semicircle law.

2. With some more work (using truncations or working directly on moments) one can extend the above proposition to the case $|f(x)| \le a(1 + |x|^k)$ for some nonnegative integer $k$ and $a \in \mathbb{R}$. In this case we obtain . We also remark that to obtain the inequality (24) one does not have to use independence of the entries of $A$; it is enough to assume that the vector $\tilde{A}$ satisfies the inequality (3).

Two-sided estimates of moments for Gaussian polynomials
We will now prove Theorem 1.3, showing that in the case of general polynomials in Gaussian variables, the estimates of Theorem 1.2 are optimal (up to constants depending only on the degree of the polynomial). In the special case of tetrahedral polynomials this follows from Latała's Theorem 1.1 and the following result by Kwapień.
Indeed, when combined with Theorem 1.1 and the triangle inequality, the above theorem gives the following The strategy of proof of Theorem 1.3 is very simple and relies on infinite divisibility of Gaussian random vectors, which will help us approximate the law of a general polynomial in Gaussian variables by the law of a tetrahedral polynomial, for which we will use Corollary 4.2.
It will be convenient to have the polynomial f represented as a combination of multivariate Hermite polynomials: Approximating the multiple stochastic integral leads to where the limit is in L 2 ( ) (see [34,Theorem 7.3. and formula (7.9)]) and actually the convergence holds in any L p (see [34,Theorem 3.50]). We remark that instead of multiple stochastic integrals with respect to the Wiener process we could use the CLT for canonical U -statistics (see [24,Chapter 4.2]), however the stochastic integral framework seems more convenient as it allows to put all the auxiliary variables on the same probability space as the original Gaussian sequence. Now, consider n independent copies (W In the lemma below we state the representation of a multivariate Hermite polynomial in the variables g (1) , . . . , g (n) as a limit of tetrahedral polynomials in the variables g (i) j,N . To this end let us introduce some more notation. Let G (n,N ) = (g (1) 1,N , . . . , g (1) N ,N , g (2) 1,N , . . . , g (2) N ,N , . . . , g be a Gaussian vector with n × N coordinates. We identify here the set [n N ] with [n] × [N ] via the bijection (i, j) ↔ (i − 1)N + j. We will also identify the sets ( o t h e r w i s e .

Lemma 4.3 With the above notation, for any p
Proof Using (39) for each h d i (g (i) ), For each N , the right-hand side equals (G (n,N ) [d] i k = i π(k) and j k = j π(k) , then

Moreover, $B^{(N)}_{d}$ has zeros on "generalized diagonals", i.e., B

Proof of Theorem 1.3 Let us first note that it is enough to prove the moment estimates; the tail bound follows from them by the Paley-Zygmund inequality (see e.g. the proof of Corollary 1 in [44]). Moreover, the upper bound on moments follows directly from Theorem 1.2. For the lower bound we use Lemma 4.3 to approximate the $L_p$ norm of $f(G) - \mathbb{E} f(G)$ with that of a tetrahedral polynomial, for which we can use the lower bound from Corollary 4.2.
Assuming f is of the form (38), Lemma 4.3 together with the triangle inequality implies (1) , . . . , g (n) ). It therefore remains to relate which will end the proof. Fix d ≥ 1 and J ∈ P d . For any d ∈ n d define a symmetric d-indexed matrix It is a simple observation that On the other hand, for any d ∈ n d , the matricesB Thus the triangle inequality for the · J norm together with (41) yields Finally, note that Indeed, using the identity on Hermite polynomials, d dx h k (x) = kh k−1 (x) (k ≥ 1), we obtain E d l dx l h k (g) = k!1 {k=l} for k, l ≥ 0, and thus, for any d, l ≤ D and d ∈ n l , (42) proves (40).

Now, (43) follows by linearity. Combining it with
Remark Note that the above infinite-divisibility argument can be also used to prove the upper bound on moments in Theorem 1.3 (giving a proof independent of the one relying on Theorem 1.2).
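The Hermite identity used in the proof of Theorem 1.3, namely $\frac{d}{dx} h_k(x) = k h_{k-1}(x)$ and hence $\mathbb{E} \frac{d^l}{dx^l} h_k(g) = k!\,\mathbf{1}_{\{k = l\}}$, can be verified numerically. The sketch below assumes that $h_k$ are the monic (probabilists') Hermite polynomials, which is the normalization consistent with this identity, and uses NumPy's `hermite_e` module together with Gauss-Hermite quadrature for the Gaussian expectation.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_coeffs(k):
    """Coefficient vector of h_k in the probabilists' Hermite basis."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return coeffs


def gaussian_mean(coeffs, nodes=40):
    """E p(g) for g ~ N(0, 1), where p is given in the Hermite basis, via Gauss quadrature."""
    x, w = He.hermegauss(nodes)  # nodes and weights for the weight exp(-x^2 / 2)
    return float(np.sum(w * He.hermeval(x, coeffs)) / math.sqrt(2.0 * math.pi))


def mean_of_derivative(k, l):
    """E (d^l / dx^l) h_k(g) for a standard Gaussian variable g."""
    coeffs = hermite_coeffs(k)
    if l > 0:
        coeffs = He.hermeder(coeffs, m=l)
    return gaussian_mean(coeffs)


if __name__ == "__main__":
    for k in range(5):
        print([round(mean_of_derivative(k, l), 6) for l in range(5)])
```

The printed table is diagonal with entries $k!$, which is exactly what makes the coefficients of the Hermite expansion of $f$ readable from the expected derivatives $\mathbb{E} D^d f(G)$.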

Polynomials in independent sub-Gaussian random variables
In this section we prove Theorem 1.4. Before we proceed with the core of the proof we will need to introduce some auxiliary inequalities for the norms $\|\cdot\|_{\mathcal{J}}$ as well as some additional notation.

Properties of $\|\cdot\|_{\mathcal{J}}$ norms
The first inequality we will need is pretty standard and given in the following lemma (it is a direct consequence of the definition of the norms $\|\cdot\|_{\mathcal{J}}$). Below • denotes the Hadamard product of d-indexed matrices, as defined in Sect. 2.

Lemma 5.1 For any d-indexed matrix A = (a i ) i∈[n] d and any vectors
To formulate subsequent inequalities we need some auxiliary notation concerning d-indexed matrices. We will treat matrices as functions from $[n]^d$ into the real line, which in particular allows us to use the notation of indicator functions and for a set $C \subseteq [n]^d$ write $\mathbf{1}_C$ for the matrix $(a_{\mathbf{i}})$ such that $a_{\mathbf{i}} = 1$ if $\mathbf{i} \in C$ and $a_{\mathbf{i}} = 0$ otherwise. Note that for $\#\mathcal{J} > 1$, $\|\cdot\|_{\mathcal{J}}$ is not unconditional in the standard basis, i.e., in general it is not true that $\|A \bullet \mathbf{1}_C\|_{\mathcal{J}} \le \|A\|_{\mathcal{J}}$. One situation in which this inequality holds is when $C$ is of the form $C = \{\mathbf{i} : i_{k_1} = j_1, \ldots, i_{k_l} = j_l\}$ for some $1 \le k_1 < \ldots < k_l \le d$ and $j_1, \ldots, j_l \in [n]$ (which follows from Lemma 5.1). This corresponds to setting to zero all coefficients which are outside a "generalized row" of a matrix and leaving the coefficients in this row intact.
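For $d = 2$ the lack of unconditionality can be seen on a $2 \times 2$ example. The sketch below assumes the definitions from Sect. 2, under which $\|A\|_{\{1\}\{2\}}$ is the operator norm and $\|A\|_{\{1,2\}}$ is the Hilbert-Schmidt norm of a matrix $A$. The matrix and the masks are hypothetical choices made only for illustration.

```python
import numpy as np


def op_norm(a):
    """Operator norm, i.e. the norm indexed by the partition {1}{2} (assumed definition)."""
    return float(np.linalg.norm(a, 2))


def hs_norm(a):
    """Hilbert-Schmidt norm, i.e. the norm indexed by the partition {1,2} (assumed definition)."""
    return float(np.linalg.norm(a, "fro"))


A = np.array([[1.0, 1.0], [1.0, -1.0]])

# Indicator matrices 1_C for three choices of the set C.
mask_general = np.array([[1.0, 1.0], [1.0, 0.0]])  # an arbitrary set C
mask_row = np.array([[1.0, 1.0], [0.0, 0.0]])      # a "generalized row" {i : i_1 = 1}
mask_diag = np.eye(2)                              # the diagonal {i : i_1 = i_2}

if __name__ == "__main__":
    print("||A||_op           =", op_norm(A))                 # sqrt(2)
    print("||A . 1_C||_op     =", op_norm(A * mask_general))  # (1 + sqrt(5)) / 2, larger
    print("row restriction    =", op_norm(A * mask_row))
    print("diagonal restrict. =", op_norm(A * mask_diag))
    print("HS norms           =", hs_norm(A), hs_norm(A * mask_general))
```

Restricting to an arbitrary set increases the operator norm here, while restrictions to a generalized row or to the diagonal do not; the Hilbert-Schmidt norm never increases under such restrictions.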
Later we will need another inequality of this type, which will allow us to select a "generalized diagonal" of a matrix. The corresponding estimate is given in the following

Lemma 5.2 Let A = (a i ) i∈[n] d be a d-indexed matrix, K ⊆ [d] and let C ⊆ [n] d be of the form C
. . , J m }. We will consider two cases.
1. The numbers k and l are separated by the partition J . Without loss of generality we can assume that k ∈ J 1 , l ∈ J 2 . Then For any x i Jm the inner supremum on the right hand side of (44) is the operator norm of the block-diagonal matrix obtained from B by setting to zero entries in off-diagonal blocks. Therefore it is not greater than the operator norm of B, which allows us to write

2.
There exists j such that k, l ∈ J j . Without loss of generality we can assume that j = 1. We have For a partition K = {K 1 , . . . , K m } ∈ P d define Thus L(K) is the set of all indices for which the partition into level sets is equal to K.

Corollary 5.3 For any J , K ∈ P d and any d-indexed matrix A,
Proof By Lemma 5.2 and the triangle inequality for any k < l, Now it is enough to note that L(K) can be expressed as an intersection of #K "generalized diagonals" and #K(#K − 1)/2 sets of the form {i : i k = i l } where k < l and use again Lemma 5.2 together with (46).

Proof of Theorem 1.4
Let us first note that the tail bound of Theorem 1.4 follows from the moment estimate and Chebyshev's inequality in the same way as in Theorems 1.2 or 3.3. We will therefore focus on the moment bound. The method of proof will rely on the reduction to the Gaussian case via decoupling inequalities, symmetrization and the contraction principle. To carry out this strategy we will need the following representation of f : where the coefficients c (d) 1 ,k π 1 ),...,(i πm ,k πm ) (48) for all permutations π : [m] → [m]. At this point we would like to explain the convention regarding indices which we will use throughout this section. It is rather standard, but we prefer to draw the Reader's attention to it, as we will use it extensively in what follows. Namely, we will treat the sequence k = (k 1 , . . . , k m )  Here we also use the convention that a product over an empty set is equal to one. On the other hand, for d > 0, the contribution from m = 0 is equal to zero (as the empty index k does not satisfy the constraint k 1 + · · · + k m = d and so the summation over k 1 , . . . , k m runs over the empty set).
Using (47) together with independence of X 1 , . . . , X n , one may write Rearranging the terms and using (48) together with the triangle inequality, we obtain Let now X (1) , . . . , X (D) be independent copies of the random vector X and (ε ( j) i ) i≤n, j≤D an array of i.i.d. Rademacher variables independent of (X ( j) ) j . For each k 1 , . . . , k a , by decoupling inequalities (Theorem 7.1 in the "Appendix") applied to the functions ...,k a )   i 1 ,...,i a (x 1 , . . . , x a ) = d   (k 1 ,...,k a ) and standard symmetrization inequalities (applied conditionally a times) we obtain, (note that in the first part of Theorem 7.1 one does not impose any symmetry assumptions on the functions h i ).
We will now use the following standard comparison lemma (for Reader's convenience its proof is presented in the "Appendix").

Lemma 5.4 For any positive integer
where $g_{ij}$ are i.i.d. $N(0, 1)$ variables.
Note that for any positive integer k we have X k i ψ 2/k = X i k ψ 2 ≤ L k , so (50) together with the above lemma (used repeatedly and conditionally) yield where (g Applying Theorem 3.1 to the right hand side of (51), we obtain (K(k 1 ,...,k a )) , Note that for all k 1 , . . . , k a by Corollary 5.
Our next goal is to replace B d in the above inequality by ED d f (X ). To this end we will analyse the structure of the coefficients of B d and compare them with the integrated partial derivatives of f .
Let us first calculate ED d f (X ). Consider r ∈ [n] d , such that i 1 , . . . , i a are its distinct values, taken l 1 , . . . , l a times respectively. We have where we have used (48). By comparing this with the definition of b one can see that the sub-sum of the right hand side above corresponding to the choice k 1 = l 1 , . . . , k a = l a is equal to a!l 1 ! · · · l a !b (d) In particular for d = D, since l 1 + · · · + l a = D, we have r 1 ,...,r D and so where in the last inequality we used Corollary 5.3. Therefore if we prove that for all d < D and all partitions I = {I 1 , . . . , I a }, then by simple reverse induction (using again Corollary 5.3) we will obtain D d=1 L d which will end the proof of the theorem. where i 1 , . . . , i a are the values of r corresponding to the level sets I 1 , . . . , I a . We then have Since we do not pay attention to constants depending on D only, by the above formula and the triangle inequality, to prove (52) it is enough to show that for all sequences k 1 , . . . , k a such that k 1 + · · · + k a ≤ D, k i ≥ l i for i ≤ a and there exists i ≤ a such that k i > l i , one has for some partition K ∈ P k 1 +···+k a with #K = #J (note that j≤a l j = d). Therefore in what follows we will fix k 1 , . . . , k a as above and to simplify the notation we will write E (d) instead of E (d,k 1 ,...,k a ) I and e (d) r instead of e (d,k 1 ,...,k a ) r . Fix therefore any partitionĨ = {Ĩ 1 , . . . ,Ĩ a } ∈ P k 1 +···+k a such that #Ĩ i = k i and I i ⊆Ĩ i for all i ≤ a (the specific choice ofĨ is irrelevant). Finally define a (k 1 + · · · + k a )-indexed matrixẼ (k 1 +···+k a ) = (ẽ (k 1 +···+k a ) r ) r∈[n] d by setting In other words, the new matrix is created by embedding the d-indexed matrix into a "generalized diagonal" of a (k 1 + · · · + k a )-indexed matrix by adding j≤a (k j − l j ) new indices and assigning to them the values of old indices (for each j ≤ a we add k j − l j times the common value attained by r [d] on I j ).
Recall now the definition of the coefficients b (d) r and note that for any r ∈ L(Ĩ) ⊆ [n] k 1 +...+k a we haveẽ is the value of r on its level setĨ j . This means thatẼ and v s ∞ = 1 otherwise, by Lemma 5.1 this implies that for any K ∈ P k 1 +···+k a , where in the last inequality we used Corollary 5.3. We will now use the above inequality to prove (53). Consider the unique partition K = {K 1 , . . . , K b } satisfying the following two conditions: • for each s ∈ {d + 1, . . . , k 1 + · · · + k a } if s ∈Ĩ j and π(s) := minĨ j ∈ J k , then s ∈ K k .
In other words, all indices s, which in the construction ofĨ were added to I j (i.e., elements ofĨ j \I j ) are now added to the unique element of J containing π(s) = minĨ j = min I j . Now, it is easy to see that E (d) J ≤ Ẽ (k 1 +···+k a ) K . Indeed, consider an arbitrary We have y ( j) 2 = x ( j) 2 ≤ 1. Moreover, by the construction of the matrix E (k 1 +···+k a ) (recall (54)), we have (in the last equality we used the fact that if r ∈ L(Ĩ), then for s > d, r π(s) = r s and so y ( j) r J j ). By taking the supremum over x ( j) one thus obtains E (d) J ≤ Ẽ (k 1 +···+k a ) K . Combining this inequality with (55) proves (53) and thus (52). This ends the proof of Theorem 1.4.

Application: subgraph counting in random graphs
We will now apply results from Sect. 5 to some special cases of the problem of subgraph counting in Erdős-Rényi random graphs G(n, p), which is often used as a test model for deviation inequalities for polynomials in independent random variables. More specifically we will investigate the problem of counting cycles of a fixed length.
It turns out that Theorem 1.4 gives optimal inequalities in some range of parameters (leading to improvements of known results), whereas in some other regimes the estimates it gives are suboptimal.
Let us first describe the setting (we will do it in a slightly more general form than needed for our example). We will consider undirected graphs $G = (V, E)$, where $V$ is a finite set of vertices and $E$ is the set of edges (i.e. two-element subsets of $V$). By $V_G = V(G)$ and $E_G = E(G)$ we mean the set of vertices and edges (respectively) of a graph $G$. Also, $v_G = v(G)$ and $e_G = e(G)$ denote the number of vertices and edges in $G$. We say that a graph $H$ is a subgraph of a graph $G$ (which we denote by $H \subseteq G$) if $V_H \subseteq V_G$ and $E_H \subseteq E_G$ (thus a subgraph is not necessarily induced). Graphs $H$ and $G$ are isomorphic if there is a bijection $\pi : V_H \to V_G$ such that for all distinct $v, w \in V_H$, $\{\pi(v), \pi(w)\} \in E_G$ if and only if $\{v, w\} \in E_H$. For $p \in [0, 1]$ consider now the Erdős-Rényi random graph $G = G(n, p)$, i.e., a graph with $n$ vertices (we will assume that $V_G = [n]$) whose edges are selected independently at random with probability $p$. In what follows we will be concerned with the number of copies of a given graph $H = ([k], E_H)$ in the graph $G$, i.e., the number of subgraphs of $G$ which are isomorphic to $H$. We will denote this random variable by $Y_H(n, p)$. To relate $Y_H(n, p)$ to polynomials, let us consider the family $C(n, 2)$ of two-element subsets of $[n]$ and the family of independent random variables $X = (X_e)_{e \in C(n,2)}$, such that $\mathbb{P}(X_e = 1) = 1 - \mathbb{P}(X_e = 0) = p$ (i.e., $X_e$ indicates whether the edge $e$ has been selected or not). Denote moreover by Aut (H ) The right-hand side above is a homogeneous tetrahedral polynomial of degree $e_H$. Moreover the variables $X_{\{v,w\}}$ satisfy which implies that $\|X_{\{v,w\}}\|_{\psi_2} \le (\log(1/p))^{-1/2} \wedge (\log 2)^{-1/2} \le \sqrt{2}(\log(2/p))^{-1/2}$. We can thus apply Theorem 1.4 to $Y_H(n, p)$ and obtain where $L_p = \sqrt{2}(\log(2/p))^{-1/2}$ and $f : \mathbb{R}^{C(n,2)} \to \mathbb{R}$ is given by Deviation inequalities for subgraph counts have been studied by many authors, to mention [22,26,27,35–38,65]. As it turns out the lower tail $\mathbb{P}(Y_H(n, p) \le \mathbb{E}Y_H(n, p) - t)$ is easier than the upper tail $\mathbb{P}(Y_H(n, p) \ge \mathbb{E}Y_H(n, p) + t)$.
The lower tail turns out to be also lighter than the upper one. Since our inequalities concern |Y H (n, p) − EY H (n, p)|, we cannot hope to recover optimal lower tail estimates, however we can still hope to get bounds which in some range of parameters n, p will agree with optimal upper tail estimates.
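The bound on the $\psi_2$ norm of the edge indicators used above can be checked directly. The sketch below assumes the standard definition $\|X\|_{\psi_2} = \inf\{t > 0 : \mathbb{E}\exp(X^2/t^2) \le 2\}$; under this assumption, for a variable with $\mathbb{P}(X = 1) = 1 - \mathbb{P}(X = 0) = p$ one gets $\|X\|_{\psi_2} = (\log(1 + 1/p))^{-1/2}$, since $\mathbb{E}\exp(X^2/t^2) = 1 - p + p e^{1/t^2}$. The function names are illustrative.

```python
import math


def psi2_bernoulli(p):
    """psi_2 norm of a 0-1 variable with success probability p (assumed definition with level 2)."""
    return 1.0 / math.sqrt(math.log(1.0 + 1.0 / p))


def intermediate_bound(p):
    """(log(1/p))^{-1/2} ∧ (log 2)^{-1/2}, with the first term read as +infinity for p = 1."""
    first = math.inf if p >= 1.0 else 1.0 / math.sqrt(math.log(1.0 / p))
    return min(first, 1.0 / math.sqrt(math.log(2.0)))


def final_bound(p):
    """sqrt(2) * (log(2/p))^{-1/2}."""
    return math.sqrt(2.0) / math.sqrt(math.log(2.0 / p))


if __name__ == "__main__":
    for p in (1e-6, 1e-3, 0.1, 0.5, 0.9, 1.0):
        print(p, psi2_bernoulli(p), intermediate_bound(p), final_bound(p))
```

The three printed columns are nondecreasing from left to right for every $p \in (0, 1]$, and the norm decays only like $(\log(1/p))^{-1/2}$ as $p \to 0$, much more slowly than the standard deviation $\sqrt{p(1-p)}$.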
Of particular importance in the literature is the law of large numbers regime, i.e., the case when $t = \varepsilon \mathbb{E}Y_H(n, p)$. In [35] the Authors prove that for every $\varepsilon > 0$ such that for certain constants $c(H, \varepsilon)$, $C(H, \varepsilon)$ and a certain function $M^*_H(n, p)$. Since the general definition of $M^*_H$ is rather involved we will skip the details (in the examples considered in the sequel we will provide specific formulas). Note that if one disregards the constants depending on $H$ and $\varepsilon$ only, the lower and upper estimates above differ by the factor $\log(1/p)$ in the exponent. To the best of our knowledge, providing a lower and upper bound for general $H$ which would agree up to multiplicative constants in the exponent (depending on $H$ and $\varepsilon$ only) is an open problem (see the remark below).
We will now specialize to the case when $H$ is a cycle. For simplicity we will first present the case of the triangle $K_3$ (the clique with three vertices). For this graph the upper bound from [35] has been recently strengthened to match the lower one (up to a constant depending only on $\varepsilon$) by Chatterjee [22] and DeMarco and Kahn [27] (who also obtained a similar result for general cliques [26]). In the next section we show that if $p$ is not too small, the inequality (56) also allows one to recover the optimal upper bound. In Section 5.3.2 we provide an upper bound for cycles of arbitrary (fixed) length $k$, which is optimal for $p \ge n^{-\frac{k-2}{2(k-1)}} \log^{-1/2} n$.

Remark (Added in revision)
Very recently, after the first version of this article was submitted, a major breakthrough was obtained by Chatterjee-Dembo and Lubetzky-Zhao [23,49], who strengthened the upper bound to $\exp(-C(H, \varepsilon) M^*_H(n, p) \log\frac{1}{p})$ for general graphs and $p \ge n^{-c(H)}$. In the case of cycles, which we consider in the sequel, our bounds are valid in a larger range of $p \to 0$ than those which can be obtained from the present versions of the aforementioned papers. We would also like to point out that the methods of [23,49] rely on large deviation principles and not on inequalities for general polynomials in independent random variables. Obtaining general inequalities for polynomials which would yield optimal bounds for general graphs is an interesting and apparently still open research problem.

Counting triangles
Assume that H = K 3 and let us analyse the behaviour of ED d f (X ) J for d = 1, 2, 3. Of course in this case #Aut(H ) = 6.
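For the triangle the polynomial representation of the subgraph count is easy to test on a sampled graph. The sketch below is an illustration with hypothetical helper names: it compares the sum of $x_{e_1} x_{e_2} x_{e_3}$ over unordered triangles, the sum over ordered triples of distinct vertices divided by $\#\mathrm{Aut}(K_3) = 6$, and the classical trace formula $\mathrm{tr}(A^3)/6$ for the adjacency matrix $A$.

```python
from itertools import combinations, permutations

import numpy as np


def sample_graph(n, p, rng):
    """Adjacency matrix of G(n, p): each edge is present independently with probability p."""
    upper = np.triu(rng.random((n, n)) < p, 1)
    return (upper | upper.T).astype(int)


def triangles_unordered(adj):
    """Sum over unordered vertex triples of the product of the three edge indicators."""
    n = adj.shape[0]
    return int(sum(adj[u, v] * adj[v, w] * adj[u, w]
                   for u, v, w in combinations(range(n), 3)))


def triangles_ordered_over_aut(adj):
    """Sum over ordered triples of distinct vertices, divided by #Aut(K_3) = 6."""
    n = adj.shape[0]
    total = sum(adj[u, v] * adj[v, w] * adj[u, w]
                for u, v, w in permutations(range(n), 3))
    return int(total) // 6


def triangles_trace(adj):
    """Number of triangles via trace(A^3) / 6."""
    return int(np.trace(adj @ adj @ adj)) // 6


if __name__ == "__main__":
    adj = sample_graph(12, 0.4, np.random.default_rng(0))
    print(triangles_unordered(adj), triangles_ordered_over_aut(adj), triangles_trace(adj))
```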
We have, for $e = \{a, b\}$, $\frac{\partial}{\partial x_e} f(x) = \sum_{v \in [n] \setminus \{a,b\}} x_{\{a,v\}} x_{\{b,v\}}$ and so $\|\mathbb{E}Df(X)\|_{\{1\}} = (n-2)p^2\sqrt{n(n-1)/2} \le n^2 p^2$. For $e_1 = e_2$ or when $e_1$ and $e_2$ do not have a common vertex, we have $\frac{\partial^2}{\partial x_{e_1} \partial x_{e_2}} f = 0$, whereas for $e_1, e_2$ sharing exactly one vertex, we have $\frac{\partial^2}{\partial x_{e_1} \partial x_{e_2}} f(x) = x_{\{v,w\}}$, where $v, w$ are the vertices of $e_1, e_2$ distinct from the common one. Therefore $\|\mathbb{E}D^2 f(X)\|_{\{1,2\}} = p\sqrt{n(n-1)(n-2)} \le p n^{3/2}$. Using the fact that $\mathbb{E}D^2 f(X)$ is symmetric and for each $e_1$ the sum of entries of $\mathbb{E}D^2 f(X)$ in the row corresponding to $e_1$ equals $2p(n-2)$, we obtain $\|\mathbb{E}D^2 f(X)\|_{\{1\}\{2\}} = 2p(n-2) \le 2pn$. One can also easily see that $D^3 f = (\mathbf{1}_{\{e_1, e_2, e_3 \text{ form a triangle}\}})_{e_1, e_2, e_3}$ and thus $\|\mathbb{E}D^3 f(X)\|_{\{1,2,3\}} = \sqrt{n(n-1)(n-2)} \le n^{3/2}$. Moreover, due to symmetry we have Consider arbitrary $(x_{e_1})_{e_1 \in C(n,2)}$ and $(y_{e_2,e_3})_{e_2,e_3 \in C(n,2)}$ of norm one. We have Using (56) together with the above estimates, we obtain

Proposition 5.5 For any $t > 0$, In particular for $t = \varepsilon \mathbb{E}Y_{K_3}(n, p) = \varepsilon \binom{n}{3} p^3$, Thus for $p \ge n^{-1/4} \log^{-1/2} n$ we obtain

By Corollary 1.7 in [35], if $p \ge 1/n$, then $\frac{1}{C} n^2 p^2 \le M^*_{K_3}(n, p) \le C n^2 p^2$ (recall (57)) and so for $p \ge n^{-1/4} \log^{-1/2} n$ the estimate obtained from the above proposition is optimal. As already mentioned, the optimal estimate has been recently obtained in the full range of $p$ by Chatterjee, DeMarco and Kahn. Unfortunately it seems that using our general approach we are not able to recover the full strength of their result. From Proposition 5.5 one can also see that Theorem 1.4, when specialized to polynomials in 0-1 random variables, is not directly comparable with the family of Kim-Vu inequalities. As shown in [36] (see table 2 therein), various inequalities by Kim and Vu give for the triangle counting problem exponents $-\min(n^{1/3} p^{1/6}, n^{1/2} p^{1/2})$, $-n^{3/2} p^{3/2}$, $-np$ (disregarding logarithmic factors). Thus for "large" $p$ our inequality performs better than those by Kim-Vu, whereas for "small" $p$ this is not the case (note that the Kim-Vu inequalities give meaningful bounds for $p \ge Cn^{-1}$ while ours only for $p \ge Cn^{-1/2}$).
As already mentioned in the introduction the fact that our inequalities degenerate for small p is not surprising as even for sums of independent 0-1 random variables, when p becomes small, general inequalities for the sums of independent random variables with sub-Gaussian tails do not recover the correct tail behaviour (the · ψ 2 norm of the summands becomes much larger than the variance).
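The norms of the expected derivatives computed in this subsection for the triangle can be confirmed by brute force for small $n$. The sketch below assumes $f(x) = \sum x_{e_1} x_{e_2} x_{e_3}$ with the sum over all triangles in the complete graph on $[n]$, builds the tensor $D^3 f$ explicitly, derives $\mathbb{E}D^2 f(X)$ and $\mathbb{E}Df(X)$ from it, and compares the Euclidean, operator and Hilbert-Schmidt norms with the closed formulas quoted above. Helper names are hypothetical.

```python
from itertools import combinations
import math

import numpy as np


def expected_derivatives(n, p):
    """Return (E Df, E D^2 f, D^3 f) for the triangle-counting polynomial f."""
    edges = list(combinations(range(n), 2))
    index = {e: i for i, e in enumerate(edges)}
    m = len(edges)
    d3 = np.zeros((m, m, m))
    for a, e1 in enumerate(edges):
        for b, e2 in enumerate(edges):
            if len(set(e1) & set(e2)) == 1:
                # The third edge of the triangle joins the two non-common vertices.
                third = tuple(sorted(set(e1) ^ set(e2)))
                d3[a, b, index[third]] = 1.0
    d2 = p * d3.sum(axis=2)                    # E of the remaining factor x_{e_3}
    d1 = p ** 2 * d3.sum(axis=(1, 2)) / 2.0    # each unordered pair {e_2, e_3} counted twice
    return d1, d2, d3


if __name__ == "__main__":
    n, p = 6, 0.3
    d1, d2, d3 = expected_derivatives(n, p)
    print(np.linalg.norm(d1), (n - 2) * p**2 * math.sqrt(n * (n - 1) / 2))
    print(np.linalg.norm(d2, 2), 2 * p * (n - 2))
    print(np.linalg.norm(d2), p * math.sqrt(n * (n - 1) * (n - 2)))
    print(np.linalg.norm(d3), math.sqrt(n * (n - 1) * (n - 2)))
```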

Counting cycles
We will now generalize Proposition 5.5 to cycles of arbitrary length. If $H$ is a cycle of length $k$, then by Corollary 1.7 in [35], $\frac{1}{C} n^2 p^2 \le M^*_H(n, p) \le C n^2 p^2$ for $p \ge 1/n$. Thus the bounds for the upper tail from (57) imply that for $p \ge 1/n$, $\exp(-C(k, \varepsilon) n^2 p^2 \log(1/p)) \le \mathbb{P}(Y_H(n, p) \ge (1 + \varepsilon)\mathbb{E}Y_H(n, p)) \le \exp(-c(k, \varepsilon) n^2 p^2)$ for every $\varepsilon > 0$ for which the above probability is not zero.
We will show that, similarly as for triangles, Theorem 1.4 allows one to strengthen the upper bound if $p$ is not too small with respect to $n$. More precisely, we have the following

Proposition 5.6 Let $H$ be a cycle of length $k$. Then for every $t > 0$, where $L_p = (\log(2/p))^{-1/2}$. In particular for every $\varepsilon > 0$ and $p \ge n^{-\frac{k-2}{2(k-1)}} \log^{-1/2} n$,

In order to prove the above proposition we need to estimate the corresponding $\|\cdot\|_{\mathcal{J}}$ norms. Since a major part of the argument does not rely on the fact that $H$ is a cycle and bounds on $\|\cdot\|_{\mathcal{J}}$ norms may be of independent interest, we will now consider arbitrary graphs. Let thus $H$ be a fixed graph with no isolated vertices.
Similarly to [35], it will be more convenient to count "ordered" copies of a graph H in G(n, p). Namely A sequence of distinct edges (ẽ 1 , . . . , xẽ and thus where for a graph G, v(G) is the number of vertices of G and s(G) is the number of edges in G with no other adjacent edge. Therefore, Let J be a partition of [d]. By the triangle inequality for the norms · J , (59) The norms appearing on the right hand side of (59) are handled by the following (i(e j )) j∈Jr and notice that the sum (61) equals the sum (60) while the constraints for x's imply the constraints for y's. Finally, by homogeneity and the fact that the sum (61) does not depend on the full graph structure but only on the sets of vertices of the graphs H r , the lemma will follow from the statement: For a sequence of finite, non-empty sets V 1 , . . . , V l , let V = V 1 ∪ · · · ∪ V l . Then We bound the inner sum using the Cauchy-Schwarz inequality. If # R ≥ 2, we get  Now we use the induction hypothesis for the sequence of sets (W r ) r ∈L and the vectors z (r ) , r ∈ L (note that i∈[n] Wr (z (r ) i Wr ) 2 ≤ 1).
Remark The bound in Lemma 5.7 is essentially optimal, at least for large n, say n ≥ 2k. To see this let us analyse optimality of (62) under the constraints (63)  We are now ready for Proof of Proposition 5. 6 We will use Lemma 5.8 to estimate ED d f (X ) J for any d ≤ k and J ∈ P d with #J = l. Note that for any e ∈ E(H ) d , v(H 0 (e)) − 1 2 #{v ∈ V (H 0 (e)) : v ∈ V (H r (e)) for exactly one r ∈ [l]} = 1 2 v(H 0 (e)) + #{v ∈ V (H 0 (e)) : v belongs to more than one V (H r (e))} = k/2 ifd = k and l = 1, where to get the second inequality we used the fact that each vertex of H has degree two. Thus we obtain Together with (56) this yields the first inequality of the proposition. Using the fact that EY H (n, p) ≥ 1 C k n k p k , the second inequality follows by simple calculations.
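The lower bound $\mathbb{E}Y_H(n, p) \ge \frac{1}{C_k} n^k p^k$ used at the end of the proof comes from an exact count. The sketch below relies on the standard fact that a cycle of length $k$ has $2k$ automorphisms, so that the complete graph on $[n]$ contains $n(n-1)\cdots(n-k+1)/(2k)$ copies of it, each present in $G(n, p)$ with probability $p^k$. The brute-force enumeration and the function names are illustrative only.

```python
from itertools import permutations
from math import perm


def cycle_copies(n, k):
    """Number of copies of the k-cycle (k >= 3) in the complete graph on n vertices, by enumeration."""
    copies = set()
    for vertices in permutations(range(n), k):
        edges = frozenset(
            frozenset((vertices[i], vertices[(i + 1) % k])) for i in range(k)
        )
        copies.add(edges)  # a copy is identified with its edge set
    return len(copies)


def expected_cycle_count(n, k, p):
    """E Y_H(n, p) for the k-cycle H: n (n-1) ... (n-k+1) / (2k) * p^k."""
    return perm(n, k) / (2 * k) * p ** k


if __name__ == "__main__":
    for n, k in ((5, 3), (6, 4), (6, 5)):
        print(n, k, cycle_copies(n, k), perm(n, k) // (2 * k))
```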

Refined inequalities for polynomials in independent random variables satisfying the modified log-Sobolev inequality
In this section we refine the inequalities which can be obtained from Theorem 3.3 for polynomials in independent random variables satisfying the β-modified log-Sobolev inequality (16) with β > 2. To this end we will use Theorem 3.4 together with a result from [2], which is a counterpart of Theorem 3.1 for homogeneous tetrahedral polynomials in general independent symmetric random variables with log-concave tails, however only of degree at most 3. We recall that for a set I , by P I we denote the family of partitions of I into pairwise disjoint, nonempty sets.
If d ≤ 3, then for any p ≥ 2, Moreover, if α = 1, then the above inequality holds for all d ≥ 1.
Before we proceed, let us provide a few specific examples of the norms $\|A\|_{\mathcal{J}|\mathcal{K}}$, which for $\alpha < 2$ are more complicated than in the Gaussian case. In what follows, $\beta = \frac{\alpha}{\alpha - 1}$ (with $\beta = \infty$ for $\alpha = 1$). For $d = 1$,