Concentration inequalities on the multislice and for sampling without replacement

We present concentration inequalities on the multislice which are based on (modified) log-Sobolev inequalities. This includes bounds for convex functions and multilinear polynomials. As an application we show concentration results for the triangle count in the $G(n,M)$ Erd\H{o}s--R\'{e}nyi model resembling known bounds in the $G(n,p)$ case. Moreover, we give a proof of Talagrand's convex distance inequality for the multislice. Interpreting the multislice in a sampling without replacement context, we furthermore present concentration results for $n$ out of $N$ sampling without replacement. Based on a bounded difference inequality involving the finite-sampling correction factor $1- n/N$, we present an easy proof of Serfling's inequality with a slightly worse factor in the exponent, as well as a sub-Gaussian right tail for the Kolmogorov distance between the empirical measure and the true distribution of the sample.


1. Introduction
In the past few years, in particular in the analysis of Boolean functions, a model which has found emerging interest is the multislice. It can be regarded as a natural generalization of several well-known models like slices of the hypercube. In detail, let L ≥ 2 be a natural number, κ = (κ_1, . . . , κ_L) ∈ N^L (where by convention, 0 ∉ N), N := κ_1 + · · · + κ_L, and let X = {x_1, . . . , x_L} ⊂ R be a set of L distinct real numbers. The multislice is defined as
$$\Omega_\kappa := \Big\{ \omega = (\omega_1, \ldots, \omega_N) \in X^N : \sum_{i=1}^N \mathbb{1}_{\{\omega_i = x_\ell\}} = \kappa_\ell \text{ for all } \ell = 1, \ldots, L \Big\}.$$
In other words, any ω ∈ Ω_κ is a sequence of elements from {x_1, . . . , x_L} in which each feature x_ℓ appears exactly κ_ℓ times. In the context of sampling without replacement, it describes the procedure of (fully) sampling from a population with a set of characteristics {x_1, . . . , x_L}, such that a proportion of κ_ℓ/N of the population has characteristic x_ℓ. We discuss and extend this relation in Section 1.2.
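For concreteness, here is a small worked example (our own illustration): take L = 2, X = {x_1, x_2} = {1, 0} and κ = (1, 2), so that N = 3 and
$$\Omega_\kappa = \{(1,0,0),\ (0,1,0),\ (0,0,1)\},$$
i.e. all arrangements with exactly one 1 and two 0s. In general, $|\Omega_\kappa| = \binom{N}{\kappa_1, \ldots, \kappa_L} = N!/(\kappa_1! \cdots \kappa_L!)$, and the uniform measure P_κ puts mass 1/|Ω_κ| on each arrangement.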
To gain intuition into the multislice, let us consider some special choices of L and κ. For L = 2 and κ = (k, N − k), the multislice reduces to the k-slice of the hypercube, while the case L = N and κ = (1, . . . , 1) can be identified with the symmetric group S_N. If L = 2, Ω_κ can also be interpreted as the set of all possible realizations of an Erdős–Rényi random graph with a fixed number of edges (see Proposition 1.3 below for more details). Moreover, the multislice gives rise to a Markov chain known as the multi-urn Bernoulli–Laplace diffusion model, but we will not pursue this aspect; for examples, see [Sal20].
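Since P_κ is just the uniform distribution over all arrangements of a fixed multiset, sampling from it amounts to a random shuffle. The following minimal Python sketch (our own illustration; the function name sample_multislice is ours) makes this concrete:

import random

def sample_multislice(kappa, X):
    # Draw one point uniformly from Omega_kappa: lay out kappa[l] copies
    # of X[l] for each l and apply a uniform random permutation.
    omega = [x for k, x in zip(kappa, X) for _ in range(k)]
    random.shuffle(omega)
    return tuple(omega)

# Example: the k-slice of the hypercube (L = 2), here N = 5 with two 1's.
print(sample_multislice((2, 3), (1, 0)))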
Multislices equipped with the uniform measure were also considered in earlier works. Logarithmic Sobolev inequalities were proven in [FOW19;Sal20], while in [Fil20], the Friedgut-Kalai-Naor (FKN) theorem was extended to the multislice. We shall make use of the functional inequalities proven by Salez [Sal20] to apply the entropy method and prove concentration inequalities in the above-mentioned settings.

1.1. Concentration inequalities for various types of functionals.
In the first section, we present concentration inequalities for some functions on the multislice which are comparable to known concentration results in the independent case. We begin with a number of elementary inequalities. Throughout, τ_ij ω denotes the "switch" of ω, i.e. the configuration which arises from ω by interchanging its i-th and j-th coordinates (cf. (1.1)).

Proposition 1.1.
(1) Let f : Ω_κ → R be a function such that |f(ω) − f(τ_ij ω)| ≤ c_ij for all ω ∈ Ω_κ, all 1 ≤ i < j ≤ N and suitable constants c_ij ≥ 0. Then, for any t ≥ 0, the sub-Gaussian estimate (1.2) holds (see the example below).
(2) Let f : [x_1, x_L]^N → R be convex and 1-Lipschitz. Then, for any t ≥ 0, the estimate (1.3) holds.

Proposition 1.1 follows by a classical approach of Ledoux [Led97] (the entropy method), i. e. by exploiting suitable log-Sobolev-type inequalities, some of which might be of independent interest (cf. Propositions 2.2 and 2.3). Note that the bounded-differences-type inequality (1.2) is invariant under the change f → −f, so that in particular, this result extends to the two-sided concentration inequality (1.4). By contrast, (1.3) clearly does not hold for −f in general, but by different techniques discussed in Section 1.3, this result can be extended to the lower tails as well.
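As a simple illustration of the condition in (1) (our own example), consider the partial sum f(ω) = n^{−1} Σ_{i=1}^n ω_i for some n ≤ N. A switch τ_ij changes f only if exactly one of i, j lies in {1, . . . , n}, in which case
$$|f(\omega) - f(\tau_{ij}\omega)| \le \frac{x_L - x_1}{n},$$
so one may take c_ij = (x_L − x_1)/n for such pairs and c_ij = 0 otherwise (assuming x_1 < · · · < x_L, consistent with the notation [x_1, x_L] in (2)).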
While results for Lipschitz-type functions as in Proposition 1.1 are fairly standard in concentration of measure theory, in the past decade there has been increasing interest in non-Lipschitz functions. A case in point is the class of so-called multilinear polynomials, i. e. polynomials which are affine with respect to every variable. Clearly, any multilinear polynomial f = f(ω) of degree d may be written as
$$f(\omega) = \sum_{k=0}^{d} \sum_{1 \le i_1 < \cdots < i_k \le N} a_{i_1 \ldots i_k}\, \omega_{i_1} \cdots \omega_{i_k}.$$
Typically, multilinear polynomials of degree d ≥ 2 no longer have sub-Gaussian tails; instead, the tails show different regimes or levels of decay, corresponding to a larger family of norms of the tensors of derivatives ∇^k f, k = 1, . . . , d.
The family of norms ‖·‖_I was first introduced in [Lat06], where it was used to prove two-sided estimates for L^p norms of Gaussian chaos, and the definitions given above agree with the ones from [Lat06] as well as [AW15] and [AKPS19]. We can regard the ‖A‖_I as a family of operator-type norms; in particular, it is easy to see that ‖A‖_{{1,...,d}} = ‖A‖_HS, the Hilbert–Schmidt norm. For the sake of illustration, consider the case d = 2 and a quadratic form f(ω) = Σ_{i<j} a_ij ω_i ω_j = ω^T A ω/2, where A is a symmetric matrix with vanishing diagonal and entries A_ij = a_ij = A_ji for any i < j. Let us additionally assume that E_κ ω_i = 0 for any i. In this case, we obviously have E_κ ∇f = 0 and E_κ ∇²f = A. Consequently, the conclusion of Theorem 1.2 yields a version of the famous Hanson–Wright inequality for the multislice (cf. [HW71]); see the sketch below. As an alternate strategy of proof, in Section 1.3 we derive Talagrand's convex distance inequality for the multislice, which in particular yields Hanson–Wright inequalities by [AW15] (where results of this type have already been established for sampling without replacement). Theorem 1.2 may be seen as a generalization of these bounds to any order d ∈ N.
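The display lost here presumably states a bound of the standard Hanson–Wright form; as a hedged reconstruction (the absolute constants in the original cannot be recovered from the extracted text), it should read
$$P_\kappa\big(|f - E_\kappa f| \ge t\big) \le 2 \exp\Big(-\frac{1}{C} \min\Big(\frac{t^2}{\|A\|_{\mathrm{HS}}^2},\ \frac{t}{\|A\|_{\mathrm{op}}}\Big)\Big)$$
for some constant C > 0.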
Possible applications include the Erdős-Rényi model, which features random graphs with a fixed number of vertices n. There are two variants of the Erdős-Rényi model which are often labeled G(n, p) and G(n, M). In the G(n, p) model, each possible edge between the n vertices is included with probability p independently of the other edges, while in the G(n, M) model, the graph is chosen uniformly at random from the collection of all graphs with n vertices and M edges. In the following we study G(n, M). One problem which has attracted considerable attention over the last two decades is the number of copies of certain subgraphs, e. g. triangles, in the Erdős-Rényi model. There is extensive literature on concentration inequalities for the triangle count, such as [JR02], [Cha12] and [DK12]. In particular, in [AW15, Proposition 5.5], bounds for the G(n, p) model are derived using higher order concentration results for multilinear polynomials in independent random variables. As Theorem 1.2 provides analogous higher order concentration results in a dependent situation, we are able to show corresponding bounds for the G(n, M) model by our methods.
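To make the statistic in question concrete, here is a short Python sketch (our own illustration, not code from the paper) which samples a G(n, M) graph — equivalently, a two-point multislice over the n-choose-2 edge slots — and counts its triangles:

import random
from itertools import combinations

def triangle_count_GnM(n, M):
    # Choose M edges uniformly without replacement among all n-choose-2 pairs.
    edges = set(random.sample(list(combinations(range(n), 2)), M))
    # Count vertex triples whose three connecting edges are all present.
    return sum((a, b) in edges and (a, c) in edges and (b, c) in edges
               for a, b, c in combinations(range(n), 3))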
Comparing Proposition 1.3 to [AW15, Proposition 5.5], we see that we arrive at essentially the same tail bounds despite the dependencies in the G(n, M) model, the only difference being an additional logarithmic factor L_p := (log(2/p))^{−1/2} in [AW15]. This logarithmic factor stems from the use of sub-Gaussian norms of independent Bernoulli random variables (which tend to 0 as p → 0), which is not mirrored in the log-Sobolev tools we use.
Typically, the main interest is to study fluctuations which scale with the expected value of f. In this case, setting t proportional to E_κ f shows that the optimal exponent n²p² known from the G(n, p) setting also appears for a suitable range of p; cf. the discussion in [AW15].
In a similar way, we may also count cycles as in [AW15, Proposition 5.6], but we do not pursue this in this note.
1.2. Sampling without replacement. In this section we interpret the multislice in the sampling without replacement context, where we sample N times from a population of N individuals ω_1, . . . , ω_N, so that the uniform distribution P_κ describes the sampling of all its elements. In applications one does not sample the entire population, but chooses some sample size n ≤ N and, for each ω ∈ Ω_κ, considers the first n coordinates only. Formally, if pr_n denotes the projection onto the first n coordinates, we may define Ω_{κ,n} := pr_n(Ω_κ). We again equip Ω_{κ,n} with the uniform distribution P_{κ,n}, which agrees with the push-forward of P_κ under pr_n. As above, we denote the expectation with respect to P_{κ,n} by E_{κ,n} f, where f is any real-valued function.
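In code, n out of N sampling without replacement is simply the projection of a full shuffle onto its first n coordinates; reusing sample_multislice from the sketch above (again our own illustration):

def sample_without_replacement(kappa, X, n):
    # Draw (omega_1, ..., omega_n) from P_{kappa,n}: shuffle the full
    # population and keep the first n coordinates (the projection pr_n).
    return sample_multislice(kappa, X)[:n]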
Our first result is a bounded differences inequality for sampling without replacement involving the finite-sampling correction factor 1 − n/N. In the sequel, (ω_{i^c}, ω'_i) denotes a vector which agrees with ω in all coordinates but the i-th one, where ω_i is replaced by some admissible ω'_i (in the sense that (ω_{i^c}, ω'_i) ∈ Ω_{κ,n}). Moreover, for any σ ∈ S_n we define σω ∈ Ω_{κ,n} by letting σ act on ω through permutation of its indices.
Note that equation (1.6) is invariant under the change f → −f , which yields a two-sided concentration inequality as in (1.4).
To express this in terms of deviation probabilities: for any δ ∈ (0, 1], the corresponding bound holds with probability at least 1 − δ.

Concentration inequalities of this type have also been proven in [EP09, Lemma 2] and [Cor+09, Theorem 5] by different methods, and our results agree with these bounds up to constants. Let us apply Proposition 1.4 to some known statistics in sampling without replacement. One of the most famous concentration results for sampling without replacement is Serfling's inequality [Ser74], which can be regarded as a strengthening of Hoeffding's inequality for n out of N sampling due to the inclusion of the finite-sampling correction factor 1 − n/N. For a discussion and some newer results we refer to [BM15], [Tol17] and [GW17]. We can deduce Serfling's inequality with a slightly worse constant from Proposition 1.4 (Serfling's original form is recalled below).

Corollary 1.5. In the situation above, writing μ := E_{κ,n} ω_1, we have a sub-Gaussian bound for P_{κ,n}((1/n) Σ_{i=1}^n (ω_i − μ) ≥ t) for any t ≥ 0. The same estimate holds for P_{κ,n}((1/n) Σ_{i=1}^n (ω_i − μ) ≤ −t). In the original version of Serfling's inequality, the right-hand side is replaced by exp(−2nt²/((1 − (n − 1)/N)|X|²)).
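For orientation, Serfling's inequality in its original form [Ser74] reads (the right-hand side is quoted in the text above; the left-hand side is the standard one, supplied here as our reconstruction)
$$P_{\kappa,n}\Big(\frac{1}{n}\sum_{i=1}^n (\omega_i - \mu) \ge t\Big) \le \exp\Big(-\frac{2nt^2}{(1 - (n-1)/N)\,|X|^2}\Big),$$
with |X| as in the paper's normalization.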
As a second example, consider the approximation of the uniform distribution on the N points from which the ω_i are sampled by the empirical measure of the sample, measured in terms of the Kolmogorov distance; we denote the resulting statistic by f (a reconstruction of the definition is sketched below). In [GW17], it was conjectured that √n f has sub-Gaussian tails with variance 1 − n/N.
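The definition lost here is presumably the standard one: writing F for the distribution function of the uniform distribution on the N population points,
$$f(\omega) := \sup_{x \in \mathbb{R}} \Big|\frac{1}{n}\sum_{i=1}^n \mathbb{1}_{\{\omega_i \le x\}} - F(x)\Big|,$$
i.e. the Kolmogorov distance between the empirical measure of the sample and the true distribution.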
The next result states that after centering around the expectation, this is indeed the case.
Corollary 1.6. With the above notation, for any t ≥ 0 the corresponding sub-Gaussian right tail bound holds.

1.3. Talagrand's convex distance inequality.
Let Ω be any measurable space, ω = (ω_1, . . . , ω_N) ∈ Ω^N and A ⊂ Ω^N a measurable set. In his landmark paper [Tal95], Talagrand defined the convex distance between ω and A as
$$d_T(\omega, A) := \sup_{\alpha \in \mathbb{R}^N_{\ge 0},\, |\alpha|_2 \le 1}\ \inf_{\omega' \in A} \sum_{k :\, \omega_k \ne \omega'_k} \alpha_k.$$
Talagrand proved concentration inequalities for the convex distance with respect to random permutations and product measures, which have attracted continuous interest since then. For product measures, an alternate proof based on the entropy method was given in [BLM09]. In [SS19], the entropy method was used to reprove the convex distance inequality for random permutations as well, and this proof was extended to slices of the hypercube. In the present article, we further generalize this proof to the multislice, encompassing both situations discussed in [SS19].
Note that in [Pau14], convex distance inequalities for certain types of dependent random variables are proven. This includes sampling without replacement. In this sense, the result of Proposition 1.7 is not new, but we present a different strategy of proof solely based on the entropy method.
A famous corollary of Talagrand's convex distance inequality is sub-Gaussian concentration for convex Lipschitz functions, as first proven in [Tal88]. Thus, Proposition 1.7 implies the following corollary, which can be regarded as an extension of Proposition 1.1 to upper and lower tails (ignoring the subtle issue of concentration around the mean or the median of a function).

Corollary 1.8. Let f : R^N → R be convex and L-Lipschitz. Then for any t ≥ 0, a two-sided sub-Gaussian deviation inequality around med(f) holds, where med(f) is a median of f.
As a simple application of Corollary 1.8, we show the following bound on the largest eigenvalue of symmetric matrices whose entries have distribution P_κ.

Corollary 1.9. Let X = (X_ij)_{i,j} be a symmetric n × n random matrix. Let N := n(n+1)/2 and assume that the common distribution of the entries (X_ij)_{i≤j} on R^N is given by P_κ for some κ, L ≥ 2 and X. Let λ_max := λ_max(X) := max{|λ(X)| : λ(X) eigenvalue of X}. Then a sub-Gaussian deviation inequality for λ_max holds for any t ≥ 0.

In particular, this result shows that λ_max has sub-Gaussian tails independently of the dimension n. A possible choice of X is the adjacency matrix of a G(n, M) Erdős–Rényi random graph. Corollary 1.9 is an adaptation of a classical example for independent random variables, see e. g. [BLM13, Example 6.8]; a small simulation sketch is given below.
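The following Python simulation (our own sketch; all names are ours) illustrates Corollary 1.9 empirically: the fluctuations of λ_max stay of constant order as n grows, here for entries forming a balanced ±1 multislice.

import numpy as np

def lambda_max_sample(n, rng):
    # Symmetric n x n matrix whose N = n(n+1)/2 upper-triangular entries
    # are a uniform arrangement of a fixed multiset (half +1, half -1).
    N = n * (n + 1) // 2
    entries = rng.permutation(np.repeat([1.0, -1.0], [N - N // 2, N // 2]))
    X = np.zeros((n, n))
    X[np.triu_indices(n)] = entries
    X = X + X.T - np.diag(np.diag(X))  # symmetrize; count the diagonal once
    return np.max(np.abs(np.linalg.eigvalsh(X)))

rng = np.random.default_rng(0)
for n in (20, 50, 100):
    samples = [lambda_max_sample(n, rng) for _ in range(200)]
    print(n, np.std(samples))  # dimension-free spread, as the corollary predicts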
Furthermore, we are able to prove a somewhat weaker version of the convex distance inequality for n out of N sampling. Here we consider symmetric sets, i. e. sets A ⊂ Ω_{κ,n} such that ω ∈ A implies σω ∈ A for any permutation σ ∈ S_n. Obviously, assuming A to be symmetric is increasingly restrictive as n tends to N. This is mirrored in the additional finite-sampling correction factor 1 − n/N in the following theorem (which sharpens the convex distance inequality in [Pau14]).

Theorem 1.10. For any symmetric set A ⊂ Ω_{κ,n} with P_{κ,n}(A) ≥ 1/2 and any t ≥ 0, the corresponding convex distance bound holds.
As above, Theorem 1.10 implies the following result.
Corollary 1.11. Let f be a convex and symmetric L-Lipschitz function. Then for any t ≥ 0, a two-sided sub-Gaussian deviation inequality holds.

Examples of functions to which Corollary 1.11 may be applied are standard estimators for the mean and the standard deviation, having Lipschitz constants L = n^{−1/2} and L = (2n)^{−1/2}, respectively (see the sketch below). In particular, for any δ ∈ (0, 1], a corresponding bound holds with probability at least 1 − δ for either of the two estimators. It is well known that concentration results centered around the expectation and around the median differ only by constants; indeed, in our case this comparison holds for any convex, symmetric L-Lipschitz function.
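The two estimators are presumably the sample mean and a sample standard deviation; as a sketch (the exact normalization of the second estimator, which produces the constant (2n)^{−1/2}, is not recoverable here),
$$\bar{\omega} := \frac{1}{n}\sum_{i=1}^n \omega_i, \qquad \hat{\sigma} := \Big(\frac{1}{n}\sum_{i=1}^n (\omega_i - \bar{\omega})^2\Big)^{1/2}.$$
For the mean, the Lipschitz constant n^{−1/2} follows from the Cauchy–Schwarz inequality: |ω̄ − ω̄'| ≤ n^{−1} Σ_i |ω_i − ω'_i| ≤ n^{−1/2} |ω − ω'|_2.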

2. Logarithmic Sobolev inequalities for the multislice
The main tool for establishing concentration inequalities in this note is the entropy method, which is based on the use of logarithmic Sobolev-type inequalities. Let us recall some basic facts and definitions especially adapted to discrete spaces. A key object is a suitable difference operator, i. e. a kind of "discrete derivative". Given a probability space (Y, F , µ), we call any operator Γ : L ∞ (µ) → L ∞ (µ) satisfying |Γ(af + b)| = a |Γf | for all a > 0, b ∈ R a difference operator. Moreover, by E µ we denote integration with respect to µ.
(1) We say that μ satisfies a logarithmic Sobolev inequality Γ−LSI(σ²) if an entropy bound for Ent_μ(f²) in terms of σ² E_μ Γ(f)² holds for all bounded measurable functions f (see the sketch after this definition).
(2) We say that μ satisfies a modified logarithmic Sobolev inequality Γ−mLSI(σ²) if the analogous bound for Ent_μ(e^f) holds for all bounded measurable functions f.
(3) We say that μ satisfies a Poincaré inequality Γ−PI(σ²) if Var_μ(f) ≤ σ² E_μ Γ(f)² holds for all bounded measurable functions f, where Var_μ(f) := E_μ f² − (E_μ f)² is the variance.
(4) If any of these functional inequalities does not hold for all bounded measurable functions but only for some subclass A ⊂ L∞(μ), we say that μ satisfies a Γ−LSI(σ²) (PI, mLSI) on A.
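In the normalization commonly used with the entropy method, these three inequalities presumably read (a hedged reconstruction; the constants may be placed differently in the original)
$$\mathrm{Ent}_\mu(f^2) \le 2\sigma^2\, E_\mu \Gamma(f)^2, \qquad \mathrm{Ent}_\mu(e^f) \le \frac{\sigma^2}{2}\, E_\mu \Gamma(f)^2 e^f, \qquad \mathrm{Var}_\mu(f) \le \sigma^2\, E_\mu \Gamma(f)^2,$$
where Ent_μ(g) := E_μ g log g − E_μ g log E_μ g for g > 0.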
If Γ satisfies the chain rule (as the ordinary gradient ∇ does), Γ−LSIs and Γ−mLSIs are equivalent concepts, but in the examples we consider in this note, this is usually not true. Moreover, it is well-known that a Γ−LSI(σ 2 ) implies a Γ−PI(σ 2 ), cf. e. g. [BT06, Proposition 3.6].
For the multislice, we mostly consider the following canonical difference operator. Recalling the "switch" operator from (1.1), for any function f : Ω_κ → R we first set the local differences Γ_ij(f) and then define the difference operator Γ by aggregating them over all pairs 1 ≤ i < j ≤ N (a hedged reconstruction is given after this paragraph). Note that Γ_ij(f)² might be interpreted as a sort of "local variance": indeed, it is easy to verify that it can be expressed through the conditional distribution of the pair η_ij = (η_i, η_j) given ω_{{i,j}^c} = (ω_k)_{k∉{i,j}}. Therefore, we have Γ(f)² = 2N^{−1}|df|² for the difference operator |df| introduced in [GSS19].
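A reconstruction consistent with the surrounding text (hedged; the original normalization may differ) is
$$\Gamma_{ij}(f)(\omega) := f(\omega) - f(\tau_{ij}\omega), \qquad \Gamma(f)^2 := \frac{1}{N}\sum_{1 \le i < j \le N} \Gamma_{ij}(f)^2,$$
which is compatible with the identity Γ(f)² = 2N^{−1}|df|² stated above.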
Sometimes (and typically for auxiliary purposes), we shall also need a second, closely related difference operator which we denote by Γ⁺. Here, we simply replace the difference Γ_ij(f) by its positive part, where x₊ := max(x, 0) denotes the positive part of a real number, and define Γ⁺ accordingly.
Recently, sharp (modified) logarithmic Sobolev inequalities for the multislice were established in [Sal20]. Rewriting these results in accordance with our notation and slightly extending them immediately leads to the following proposition, serving as the basis for our arguments.

Proposition 2.2. With the above definitions of Γ and Γ⁺, P_κ satisfies a Γ−LSI as well as a Γ−mLSI(4) and a Γ⁺−mLSI(8).

Proof of Proposition 2.2. The Γ−LSI directly follows from [Sal20, Theorem 5]. Moreover, by [Sal20, Lemma 1] (substituting f ≥ 0 by e^f), we have an entropy bound (2.2) for any f : Ω_κ → R. Using the fact that ω → τ_ij ω is an automorphism of Ω_κ and applying the inequality (a − b)(e^a − e^b) ≤ ½(e^a + e^b)(a − b)² leads to the Γ−mLSI(4). By similar arguments, we may also deduce the Γ⁺−mLSI(8); in particular, we note that the expected values on the right-hand side of (2.2) are symmetric in ω and τ_ij ω, and use an analogous elementary inequality for the positive parts.

From Proposition 2.2 we may derive a convex ∇−(m)LSI on the multislice, where ∇ denotes the usual Euclidean gradient.

Proposition 2.3. For any f ∈ A_c, a modified logarithmic Sobolev estimate with respect to the Euclidean gradient holds. In other words, P_κ satisfies a ∇−mLSI(8|X|²) on A_c.
Another class of functional inequalities we address in this note are Beckner inequalities. Restricting ourselves to the multislice (rather than providing a general definition), P_κ satisfies a Beckner inequality with parameter p ∈ (1, 2] (Bec-p) if there exists some constant β_p > 0 such that the corresponding Beckner bound holds for any nonnegative function f; here the right-hand side involves the Dirichlet form E(f, g) of the underlying Markov chain, defined for any functions f, g on Ω_κ. Recently, in [APS20] it was shown that in the context of general Markov semigroups, Beckner inequalities with constants bounded away from zero as p ↓ 1 and modified log-Sobolev inequalities are equivalent. In their article, the authors provide numerous examples and applications, also briefly discussing the multislice. Since we need results of this type for our purposes, we include a somewhat more detailed discussion in the present note; the next result (Proposition 2.4) provides Beckner inequalities on the multislice.

Proof. First note that the result holds true for κ = (1, . . . , 1), as proven in [BT06, Proposition 4.8], with the difference in the constant being due to different normalizations. To extend this result to general κ, we apply a "projection" or "coarsening" argument, cf. [Sal20, Section 3.4]. Indeed, consider the map Ψ : {1, . . . , N} → {1, . . . , L} given by Ψ(i) = ℓ iff i ∈ {κ_1 + · · · + κ_{ℓ−1} + 1, . . . , κ_1 + · · · + κ_ℓ} and extend it to the multislice by coordinate-wise application, i. e. Ψ(ω_1, . . . , ω_N) := (Ψ(ω_1), . . . , Ψ(ω_N)). Then, by [Sal20, Lemma 4], the relevant variance-type quantities and Dirichlet forms are compatible with Ψ for any functions f, g. From these identities, we immediately obtain the result.
Finally, we may also derive logarithmic Sobolev inequalities for symmetric functions in the context of sampling without replacement. Here we use other types of difference operators. Let f : Ω_{κ,n} → R be any (not necessarily symmetric) function. Then, we define h(f) and h⁺(f) coordinate-wise in terms of suprema and infima over admissible replacements of single coordinates (a hedged reconstruction is given below). Here, the supremum and the infimum have to be interpreted as extending over all admissible configurations, i. e. such that (ω_{i^c}, ω_i), (ω_{i^c}, ω'_i) ∈ Ω_{κ,n}.
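A choice consistent with the text (again a hedged reconstruction, not the original displays) is
$$h(f)(\omega)^2 := \sum_{i=1}^n \Big(\sup_{\omega_i'} f(\omega_{i^c}, \omega_i') - \inf_{\omega_i'} f(\omega_{i^c}, \omega_i')\Big)^2, \qquad h^+(f)(\omega)^2 := \sum_{i=1}^n \Big(f(\omega) - \inf_{\omega_i'} f(\omega_{i^c}, \omega_i')\Big)^2.$$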
Proposition 2.5. Let A_{n,s} := {f : Ω_{κ,n} → R | f symmetric}. With the above definitions of h and h⁺, P_{κ,n} satisfies functional inequalities of LSI and mLSI type on A_{n,s} with constants proportional to 1 − n/N.

Proof. We only prove the h⁺−mLSI; the proofs of the other two inequalities follow by a modification of the arguments below. First note that any function f on Ω_{κ,n} can be extended to a function F on Ω_κ which only depends on the first n coordinates by setting F(ω_1, . . . , ω_N) := f(ω_1, . . . , ω_n), which may be rewritten as F = f ∘ pr_n. We now apply Proposition 2.2 to F. Obviously, Ent_{P_κ}(e^F) = Ent_{P_{κ,n}}(e^f). It therefore remains to consider the right-hand side of the mLSI, which we bound by an expression involving h⁺(f)(ω_1, . . . , ω_n) only. Here, the first equality follows by the symmetry of f with respect to the symmetric group S_n and the fact that f does not depend on (ω_{n+1}, . . . , ω_N). The first inequality is due to the monotonicity of x → x₊, and the last equality follows as P_{κ,n} is the push-forward of P_κ under pr_n. Thus, for any f ∈ A_{n,s} the asserted bound in terms of h⁺(f)(ω_1, . . . , ω_n) holds, which finishes the proof.

3. Proofs of the concentration inequalities
Proof of Proposition 1.1. Recall that if a probability measure μ satisfies a Γ−mLSI(σ²) on A (where Γ denotes some difference operator), then a sub-Gaussian deviation inequality (3.1) holds for any f ∈ A such that Γ(f) ≤ L and any t ≥ 0 (a sketch of its standard form is given below). For a reference, see e. g. [BG99].
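The bound (3.1) is presumably the standard Herbst-type conclusion: under a Γ−mLSI(σ²) on A, for any f ∈ A with Γ(f) ≤ L and all t ≥ 0,
$$\mu(f - E_\mu f \ge t) \le \exp\Big(-\frac{t^2}{2\sigma^2 L^2}\Big),$$
up to the exact constant in the exponent, which depends on the chosen normalization of the mLSI.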

Proof. This follows immediately from Proposition 2.4 and [APS20, Proposition 3.3].
To apply the latter result, we have to check that the constants of the Beckner inequalities Bec-p satisfy a lower bound of the form required there, with parameters a > 0, s ≥ 0 and any p ∈ (1, 2]. Clearly, we may take a = 1/4 and s = 0, which finishes the proof. Note that alternatively, we could apply [GSS19, Proposition 2.4], using (2.1) and Proposition 2.2; as a result of using the Γ−LSI, however, we would arrive at a substantially weaker constant. Next, we have to relate differences of multilinear polynomials to (formal) derivatives, which is typically achieved by an inequality of the form Γ(f) ≤ c|∇f| for some absolute constant c > 0. However, it turns out that such an inequality cannot be true in our setting. For instance, taking N = 3, X = {0, 1} and f(ω) = ω_1 ω_2 − ω_1 ω_3, it is easy to check that for ω = (0, 1, 1), we have 0 = |∇f(ω)| < Γ(f)(ω); see the verification below. The same problem arises if we take Γ⁺ instead of Γ. It is possible to prove an inequality of this type with c := |X| for multilinear polynomials with non-negative coefficients and X ⊂ [0, ∞) (this can be seen by slightly modifying the proof of Proposition 3.2 below). However, the proof of Theorem 1.2 also includes an iteration and linearization procedure, and if we only allow for non-negative coefficients we get stuck at d = 2.
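For the reader's convenience, the counterexample can be verified directly (our own computation): ∇f(ω) = (ω_2 − ω_3, ω_1, −ω_1), so ∇f(0, 1, 1) = (0, 0, 0), while
$$f(0,1,1) = 0 \ne -1 = f(\tau_{12}(0,1,1)) = f(1,0,1),$$
so that any aggregation of the switch differences gives Γ(f)(ω) > 0 = |∇f(ω)| at ω = (0, 1, 1).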
The following proposition provides us with the estimate we need to get the recursion going, at the cost of also involving second order derivatives.

Proposition 3.2. Let f = f(ω) be a multilinear polynomial as in Theorem 1.2. Then the pointwise estimate (3.2) relating Γ(f) to the derivatives of f holds. In particular, for any p ≥ 2 we have the moment estimate (3.3), with θ as in Lemma 3.1.
Proof. In the proof, we additionally assume f to be d-homogeneous, i. e. that only the coefficients of order d are non-zero. This is done in order to ease notation, and it is no problem to extend our proof to the non-homogeneous case. For notational convenience, for any i_1 < . . . < i_d and any permutation σ ∈ S_d, we define a_{i_σ(1) . . . i_σ(d)} := a_{i_1 . . . i_d}, and we set a_{i_1 . . . i_d} = 0 if i_j = i_k for some j ≠ k. Finally, note that some of the notation below has to be interpreted accordingly for small values of d, e. g. summation over i_1 < . . . < i_{d−1} reduces to summation over i_1 for d = 2. Observe that the difference f(ω) − f(τ_kℓ ω) can be computed explicitly for any k, ℓ ∈ {1, . . . , N}, k ≠ ℓ; summing over all pairs then proves equation (3.2). Finally, combining (3.2) with Lemma 3.1, we immediately arrive at (3.3).
Proof of Theorem 1.2. To ease notation, we assume |X| = 1 in the sequel; the general case follows in the same way with only minor changes. Recall that for a standard Gaussian g in R^k, k ∈ N, and any x ∈ R^k we have √p M^{−1}|x| ≤ ‖⟨x, g⟩‖_{L^p} ≤ M√p |x| for all p ≥ 1 and some universal constant M > 1. Combining this with equation (3.3), we arrive at the estimate (3.4). Here, G is an N-dimensional standard Gaussian and H is an N²-dimensional standard Gaussian such that G and H are independent of each other and of the ω_i, and the L^p norms on the right-hand side are taken with respect to the product measure of P_κ and the Gaussians.
Note that ⟨∇f, G⟩ and ⟨∇²f, H⟩ are again multilinear polynomials in the ω_i. Moreover, ⟨∇⟨∇f, G_1⟩, G_2⟩ = ⟨∇²f, G_1 ⊗ G_2⟩ and ⟨∇²⟨∇f, G⟩, H⟩ = ⟨∇³f, G ⊗ H⟩. In the last expression, we regard ∇³f as a 2-tensor whose second component is N²-dimensional. Similar relations also hold for the other terms in (3.4).
The proof now follows by iterating (3.4). For simplicity of presentation, let us consider the case d = 2 first. Here, we apply the triangle inequality (in the form ‖⟨∇f, G⟩‖_{L^p} ≤ ‖⟨E_κ ∇f, G⟩‖_{L^p} + ‖⟨∇f − E_κ ∇f, G⟩‖_{L^p}, and similarly for ⟨∇²f, H⟩) to (3.4). We may then apply (3.4) to ⟨∇f − E_κ ∇f, G⟩ and ⟨∇²f − E_κ ∇²f, H⟩ again. This leads to the estimate (3.5). In the last step, we have used that since f is a multilinear polynomial of degree 2, its second order derivatives are constant and all derivatives of order larger than 2 vanish.
Next we use that by [Lat06], there are constants C_k depending on k only such that the moment bound (3.6) holds for any (possibly rectangular) k-tensor A and any p ≥ 2, where g_1, . . . , g_k are standard Gaussians. Applying (3.6) to (3.5), we obtain a bound with an absolute constant, from which the assertion follows by standard arguments, cf. e. g. [GSS20, Proposition 4]. Finally, we consider an arbitrary d ≥ 2 and explain how the proof given above generalizes. First, we apply the triangle inequality to (3.4) and iterate d − 1 times. This yields (3.7), with terms ψ_i, i = 1, . . . , d − 1, as in (3.8). As f is a multilinear polynomial of degree d, these expressions simplify since the derivatives of order d are constant and all derivatives of higher order vanish. Now, as above, we apply (3.6) to (3.7) (or rather to the L^p norms appearing in (3.8)) to arrive at a bound with a constant C > 0 depending on d only. In particular, we use that if we apply (3.6) to some ℓ ≥ 1 term in ψ_i in (3.8), the norms which arise reappear in the norms corresponding to ℓ = 0 in the ψ_{i+ℓ} terms. The proof is concluded by recalling [GSS20, Proposition 4] again.
Moreover, the second order derivatives ∂²f/(∂ω_{e_1} ∂ω_{e_2}) are zero unless e_1 and e_2 share exactly one vertex, in which case the derivative equals ω_{ij}, where i and j are the two vertices of e_1 and e_2 distinct from the common one. Finally, the third order derivatives ∂³f/(∂ω_{e_1} ∂ω_{e_2} ∂ω_{e_3}) are 1 if e_1, e_2, e_3 form a triangle and zero otherwise.
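For completeness, with f(ω) = Σ_{i<j<k} ω_{ij} ω_{jk} ω_{ik} denoting the triangle count (indexing the coordinates by the edges of the complete graph, as in the text), the first order derivatives are
$$\frac{\partial f}{\partial \omega_{ij}} = \sum_{k \notin \{i,j\}} \omega_{ik}\, \omega_{jk},$$
i.e. the number of common neighbours of i and j; the second and third order formulas quoted above follow by differentiating once and twice more.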
Using the structure of the derivatives for any k = 1, . . . , N and any pairwise distinct set of edges e_1, . . . , e_k, we obtain the expectations of the derivative tensors. Moreover, we have E∇²f = p(𝟙_{{|e_1 ∩ e_2| = 1}})_{e_1, e_2}, where |e_1 ∩ e_2| denotes the number of common vertices of e_1 and e_2. Therefore, we may use the calculations from the proof of [AW15, Proposition 5.5], which yield the required norm estimates. The proof now follows by plugging in.
The results of Section 1.2 follow from the logarithmic Sobolev inequalities established in Section 2 by standard means.
To prove Talagrand's convex distance inequality on the multislice, we follow the approach by Boucheron, Lugosi and Massart [BLM03], see also [SS19, Proposition 1.9]. A key step in the proof is the following lemma.

Lemma 3.3. Let f : Ω_κ → R be a non-negative function satisfying the two conditions made precise in the proof below (a self-bounding-type property for Γ⁺(f) and a control of the differences f(ω) − f(τ_ij ω)). Then exponential moment and deviation bounds hold; in particular, they apply to f(ω) = ¼ d_T(ω, A)², where A ⊂ Ω_κ is any set.

We defer the proof of Lemma 3.3 until the end of the section and first show how to apply it to prove Talagrand's convex distance inequality.
Proof of Proposition 1.7. The difference operator Γ⁺ clearly satisfies Γ⁺(g²) ≤ 2gΓ⁺(g) for all positive functions g, as well as a Γ⁺−mLSI(8). Moreover, as seen in the proof of Lemma 3.3, we have Γ⁺(d_T(·, A)) ≤ 1. Thus, by [SS19, (3.6)], an exponential moment bound holds for λ ∈ [0, 1/16). Furthermore, Lemma 3.3 provides a complementary exponential moment estimate, so that the choice λ = 1/144 yields the claim.

Proofs of Corollaries 1.8 and 1.11. These corollaries follow in exactly the same way as the proof of [Tal88, Theorem 3]; the only difference is to note the corresponding bound relating f and the convex distance for any convex Lipschitz function in our setting.

Proof of Corollary 1.9. Since λ_max = ‖X‖_op, it is clear from the triangle inequality that λ_max is a convex function of the X_ij, i ≤ j. Moreover, due to Lidskii's inequality, λ_max is 1-Lipschitz. It therefore remains to apply Corollary 1.8.
Proof of Lemma 3.3. Rewriting [Sal20, Lemma 1], an entropy-type bound (3.9) holds for any positive function g. Using this, we obtain a bound on the entropy of e^{−λf} for any λ ∈ [0, 1], where Ψ(x) := e^x − 1. By a Taylor expansion it can easily be seen that Ψ(x) ≤ 2x for all x ∈ [0, 1]. Moreover, recall that by condition (2), the covariance of f and e^{−λf} is non-positive (i. e. E f e^{−λf} ≤ E f E e^{−λf}), which yields the first assertion. The second part follows by nonnegativity and the choice t = E_κ f. It remains to check that f(ω) = ¼ d_T(ω, A)² satisfies the two conditions of this lemma. To this end, we first show that Γ⁺(d_T(·, A))² ≤ 1. Writing g(ω) := d_T(ω, A), it is well known (see [BLM03]) that by Sion's minimax theorem, g admits a dual representation as an infimum over probability measures ν ∈ M(A). To estimate Γ⁺(g)²(ω), one has to compare g(ω) and g(τ_ij ω). To this end, for fixed ω ∈ Ω_κ, let α, ν be parameters for which the value g(ω) is attained, and let ν̂ = ν̂_ij be a minimizer of inf_{ν ∈ M(A)} Σ_{k=1}^N α_k ν(ω' : ω'_k ≠ (τ_ij ω)_k). This leads to the desired comparison. Using this as well as Γ⁺(g²) ≤ 2gΓ⁺(g) for all positive functions g, we obtain the first condition. To show the second property, we proceed similarly to [BLM09, Proof of Lemma 1]. By (3.9) and the Cauchy–Schwarz inequality, assuming without loss of generality that f(ω) ≥ f(τ_ij ω), we may choose ν̂ = ν̂_ij ∈ M(A) such that the value of f(τ_ij ω) is attained; it follows that the second condition holds, which finishes the proof.
Proof of Theorem 1.10. Since A is a symmetric set, ω → d_T(ω, A) is a symmetric function; this follows from the dual representation of d_T in terms of Σ_k α_k ν(ω' : ω'_k ≠ ω_k).
As in the proof of Proposition 1.7, let ν, α be the parameters for which the value d_T(ω, A) is attained, and let ν̂, ω̂'_i be minimizers of inf_{ω'_i} inf_{ν ∈ M(A)} Σ_{k=1}^n α_k ν(η : η_k ≠ (ω_{i^c}, ω'_i)_k). We then obtain a comparison of the corresponding distances. Recall that by Proposition 2.5, P_{κ,n} satisfies an h⁺−LSI(8(1 − n/N)) on the set of all symmetric functions. As a consequence, using (3.1) again, we obtain a sub-Gaussian estimate for d_T(·, A). In the next step, we observe that by the Poincaré inequality we have Var(d_T(·, A)) ≤ 8(1 − n/N) E_{κ,n} h⁺(d_T(·, A))² ≤ 4(1 − n/N).