1 Introduction

In the past few years, a model which has attracted growing interest, in particular in the analysis of Boolean functions, is the multislice. It can be regarded as a natural generalization of several well-known models such as slices of the hypercube. In detail, let \(L \ge 2\) be a natural number, \(\kappa = (\kappa _1, \ldots , \kappa _L) \in \mathbb {N}^L\) (where by convention, \(0 \notin \mathbb {N}\)), \(N :=\kappa _1 + \cdots + \kappa _L\), and let \({\mathcal {X}} = \{x_1, \ldots , x_L\} \subset {{\,\mathrm{\mathbb {R}}\,}}\) be a set of L distinct real numbers. Typically, \({\mathcal {X}} = \{0, 1, \ldots , L-1\}\) or \({\mathcal {X}} = \{1,2, \ldots , L\}\), but we prefer not to specify \({\mathcal {X}}\), since the most natural choice usually depends on the situation under consideration. The multislice is defined as

$$\begin{aligned} \varOmega _\kappa :=\left\{ \omega = \left( \omega _1, \ldots , \omega _N\right) \in {\mathcal {X}}^N :\sum _{i=1}^N \mathbbm {1}_{\{\omega _i = x_\ell \}} = \kappa _\ell \, \text { for } \ell = 1, \ldots , L\right\} . \end{aligned}$$

In other words, any \(\omega \in \varOmega _\kappa \) is a sequence of elements from \(\{x_1, \ldots , x_L\}\) in which each feature \(x_\ell \) appears exactly \(\kappa _\ell \) times. In the context of sampling without replacement, it describes the procedure of (fully) sampling from a population with a set of characteristics \(\{x_1, \ldots , x_L\}\), such that a proportion of \(\kappa _\ell / N\) of the population has characteristic \(x_\ell \). We discuss and extend this relation in Sect. 1.2.
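To make the definition concrete, here is a minimal Python sketch enumerating a small multislice and drawing a uniform sample from it; the parameters \(L = 3\), \(\kappa = (2,1,1)\) and \({\mathcal {X}} = \{0,1,2\}\) are illustrative assumptions, not taken from the text.

```python
# A minimal sketch of the multislice; L = 3, kappa = (2, 1, 1), X = {0, 1, 2}
# are illustrative choices only.
from itertools import permutations
import random

kappa = (2, 1, 1)                    # multiplicities kappa_1, ..., kappa_L
X = (0, 1, 2)                        # feature values x_1 < ... < x_L
base = [x for x, k in zip(X, kappa) for _ in range(k)]  # [0, 0, 1, 2]

# Omega_kappa consists of all distinct rearrangements of `base`; its size is
# the multinomial coefficient N!/(kappa_1! ... kappa_L!) = 4!/(2!1!1!) = 12.
multislice = set(permutations(base))
print(len(multislice))               # -> 12

# A uniform draw from P_kappa: shuffle the base word uniformly at random.
omega = base[:]
random.shuffle(omega)
print(omega)
```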

To gain an intuition into the multislice, let us consider some special choices of L and \(\kappa \). For \(L=2\), \(\kappa = (k,N-k)\) and \({\mathcal {X}} = \{0,1\}\), the multislice reduces to the k-slice of the hypercube, while the case of \(L=N\), \(\kappa = (1, \ldots , 1)\) and \({\mathcal {X}} = \{1, \ldots , N\}\) can be interpreted as the symmetric group \(S_N\). If \(L=2\), \(\varOmega _\kappa \) can also be interpreted as the set of all possible realizations of an Erdős–Rényi random graph with a fixed number of edges (see Corollary 1 for more details). Moreover, the multislice gives rise to a Markov chain known as the multi-urn Bernoulli–Laplace diffusion model, but we will not pursue this aspect here; for examples, see [25].

We equip \(\varOmega _\kappa \) with the uniform distribution which we denote by \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) or sometimes also simply \({{\,\mathrm{\mathbb {P}}\,}}\). In other words,

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa (\{\omega \}) = |\varOmega _\kappa |^{-1} = \left( {\begin{array}{c}N\\ \kappa _1, \ldots , \kappa _L\end{array}}\right) ^{-1} = \frac{\kappa _1! \cdots \kappa _L!}{N!} \end{aligned}$$

for any \(\omega \in \varOmega _\kappa \). If f is any real-valued function on \(\varOmega _\kappa \), we write \(\mathbb {E}_\kappa f\) or \(\mathbb {E}f\) for its expectation with respect to \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \). Moreover, to fix some conventions, we always assume the \(x_\ell \) to be ordered such that \(x_1< x_2< \ldots < x_L\). In particular, \(|{\mathcal {X}} | :=x_L - x_1\) denotes the diameter of \({\mathcal {X}}\). Furthermore, we shall write \(\kappa _{\mathrm {min}} :=\min \{\kappa _1, \ldots , \kappa _L\}\). Finally, for any \(1 \le i \ne j \le N\), let \(\tau _{ij}\) be the “switch” operator which switches the ith and jth component of the vector \(\omega \). In other words, \(\tau _{ij}\) transforms \(\omega \) into the vector \(\tau _{ij}\omega \) given by

$$\begin{aligned} \tau _{ij}\omega = \left( \omega _1, \ldots , \omega _{i-1}, \omega _j, \omega _{i+1}, \ldots , \omega _{j-1}, \omega _i, \omega _{j+1}, \ldots , \omega _N\right) . \end{aligned}$$
(1)
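In code, \(\tau _{ij}\) is a plain coordinate swap; the following sketch (0-based indices, arbitrary configuration) also illustrates that it leaves \(\varOmega _\kappa \) invariant.

```python
# The "switch" operator tau_ij from (1), with 0-based indices (a sketch).
def tau(omega, i, j):
    """Return tau_ij(omega): omega with the i-th and j-th coordinates exchanged."""
    out = list(omega)
    out[i], out[j] = out[j], out[i]
    return tuple(out)

omega = (0, 0, 1, 2)
print(tau(omega, 1, 3))  # -> (0, 2, 1, 0); the multiset of entries is preserved,
                         # so tau_ij maps Omega_kappa onto itself.
```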

Multislices equipped with the uniform measure were also considered in earlier works. Logarithmic Sobolev inequalities were proven in [16, 25], while in [15], the Friedgut–Kalai–Naor (FKN) theorem was extended to the multislice. We shall make use of the functional inequalities proven by Salez [25] to apply the entropy method and prove concentration inequalities in the above-mentioned settings.

1.1 Concentration Inequalities for Various Types of Functionals

In this first section, we present concentration inequalities for several types of functions on the multislice; these inequalities are comparable to known concentration results in the independent case. We begin with a number of elementary inequalities.

Proposition 1

  1.

    Let \(f :\varOmega _\kappa \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) be a function such that \(|f(\omega ) - f(\tau _{ij}\omega ) | \le c_{ij}\) for all \(\omega \in \varOmega _\kappa \), all \(1 \le i < j \le N\) and suitable constants \(c_{ij} \ge 0\). For any \(t \ge 0\), we have

    $$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa \left( f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f \ge t\right) \le \exp \left( - \frac{Nt^2}{4\sum _{1 \le i < j \le N}c_{ij}^2}\right) . \end{aligned}$$
    (2)
  2.

    Let \(f :[x_1, x_L]^N \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) be convex and 1-Lipschitz. Then, for any \(t \ge 0\) we have

    $$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa \left( f-{{\,\mathrm{\mathbb {E}}\,}}_\kappa f \ge t\right) \le \exp \left( -\frac{t^2}{16|{\mathcal {X}} |^2}\right) . \end{aligned}$$
    (3)

Proposition 1 follows by a classic approach of Ledoux [23] (the entropy method), i.e., by exploiting suitable log-Sobolev-type inequalities, some of which might be of independent interest (cf. Propositions 4 and 5). Note that the bounded differences-type inequality (2) is invariant under the change \(f \mapsto -f\), so that in particular, this result extends to the concentration inequality

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa \left( |f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f | \ge t\right) \le 2\exp \left( - \frac{Nt^2}{4\sum _{1 \le i < j \le N}c_{ij}^2}\right) . \end{aligned}$$
(4)

By contrast, the assumption of convexity used in (3) is clearly not invariant under the change \(f \mapsto -f\), but by different techniques discussed in Sect. 1.3, (3) can be extended to the lower tails as well.

While results for Lipschitz-type functions as in Proposition 1 are fairly standard in concentration of measure theory, in the past decade there has been increasing interest in non-Lipschitz functions. A case in point is the class of so-called multilinear polynomials, i.e., polynomials which are affine with respect to every variable. Clearly, any multilinear polynomial \(f = f(\omega )\) of degree d may be written as

$$\begin{aligned} f(\omega ) = a_0 + \sum _{i_1=1}^N a_{i_1}\omega _{i_1} + \cdots + \sum _{i_1< \ldots < i_d} a_{i_1 \ldots i_d} \omega _{i_1} \cdots \omega _{i_d}. \end{aligned}$$
(5)

Typically, multilinear polynomials of degree \(d \ge 2\) no longer have sub-Gaussian tails; rather, the tails show different regimes or levels of decay, corresponding to a larger family of norms of the tensors of derivatives \(\nabla ^k f\), \(k = 1, \ldots , d\). For large t, terms of the form \(\exp (-(t/\beta _d)^{2/d})\) dominate, where \(\beta _d\) depends on the dth order derivatives. Tail inequalities of this type are also called multilevel tail inequalities, a term coined by Adamczak [2, 3].

In detail, we need a family of norms \(\Vert \cdot \Vert _{{\mathcal {I}}}\) on the space of d-tensors for each partition \({\mathcal {I}} = \{ I_1, \ldots , I_k \} \in P_d\), where \(P_d\) denotes the set of all partitions of \(\{1,\ldots ,d\}\). For any \(1 \le i_1, \ldots , i_d \le N\) and any subset \(I \subset \{1, \ldots , d\}\), we write \(i_I = (i_k)_{k \in I}\), and for each \(\ell = 1,\ldots , k\) we denote by \(x^{(\ell )}\) a vector in \({{\,\mathrm{\mathbb {R}}\,}}^{N^{I_\ell }}\). Then, for a d-tensor \(A = (a_{i_1, \ldots , i_d})\) and a partition \({\mathcal {I}} \in P_d\), we set

$$\begin{aligned} \Vert A \Vert _{{\mathcal {I}}} :=\sup \Big \lbrace \sum _{i_1, \ldots , i_d} a_{i_1 \ldots i_d} \prod _{\ell = 1}^k x^{(\ell )}_{i_{I_\ell }} : \sum _{i_{I_\ell }} \left( x^{(\ell )}_{i_{I_\ell }}\right) ^2 \le 1 \text { for all } \ell = 1, \ldots , k\Big \rbrace . \end{aligned}$$

The family \(\Vert \cdot \Vert _{{\mathcal {I}}}\) was first introduced in [22], where it was used to prove two-sided estimates for \(L^p\) norms of Gaussian chaos, and the definitions given above agree with the ones from [22] as well as [3, 5]. We can regard the \(\Vert A \Vert _{{\mathcal {I}}}\) as a family of operator-type norms. In particular, it is easy to see that \(\Vert A \Vert _{\{\{1, \ldots , d\}\}} = \Vert A \Vert _\mathrm {HS} :=(\sum _{i_1, \ldots , i_d} a_{i_1 \ldots i_d}^2)^{1/2}\) (Hilbert–Schmidt norm) and \(\Vert A \Vert _{\{\{1\}, \ldots , \{d\}\}} = \Vert A \Vert _\mathrm {op} :=\sup \{ \sum _{i_1, \ldots , i_d} a_{i_1 \ldots i_d} x^{(1)}_{i_1} \cdots x^{(d)}_{i_d} : |x^{(\ell )} | \le 1 \text { for all } \ell = 1, \ldots , d \}\) (operator norm).
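For \(d = 2\), these two extreme partitions recover the familiar matrix norms; a small numpy check (matrix and seed are arbitrary assumptions):

```python
# For a 2-tensor (matrix) A: ||A||_{{1,2}} is the Hilbert-Schmidt norm and
# ||A||_{{1},{2}} the operator (spectral) norm. Illustrative check only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

hs = np.sqrt((A ** 2).sum())     # Hilbert-Schmidt norm
op = np.linalg.norm(A, 2)        # operator norm = largest singular value
print(hs >= op, hs, op)          # the Hilbert-Schmidt norm always dominates
```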

Theorem 1

Let \(f = f(\omega )\) be a multilinear polynomial (5) of degree d. There exists a constant \(c = c(d)\) such that

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa \left( |f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f | \ge t\right) \le 2 \exp \left( -c \min _{1\le k \le d} \min _{{\mathcal {I}} \in P_k}\left( \frac{t}{|{\mathcal {X}} |^k\Vert {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^k f \Vert _{\mathcal {I}}}\right) ^{2/|{\mathcal {I}} |}\right) . \end{aligned}$$

Theorem 1 is an analogue of [5, Theorem 1.4] (independent sub-Gaussian random variables), [3, Theorem 2.2] (the Ising model), [18, Theorem 5] (in the presence of certain discrete log-Sobolev inequalities), and [4, Corollary 5.4] (modified log-Sobolev inequalities for Glauber dynamics) for the multislice.

For the sake of illustration, consider the case of \(d=2\) and a quadratic form \(f(\omega ) = \sum _{i<j} a_{ij}\omega _i\omega _j = \omega ^TA\omega /2\), where A is a symmetric matrix with vanishing diagonal and entries \(A_{ij} = a_{ij} = A_{ji}\) for any \(i < j\). Let us additionally assume that \({{\,\mathrm{\mathbb {E}}\,}}_\kappa \omega _i = 0\) for any i. In this case, we obviously have \({{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f = 0\) and \({{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f = A\). Consequently, the conclusion of Theorem 1 reads

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa \left( |f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f | \ge t\right) \le 2 \exp \left( -c \min \left( \frac{t^2}{|{\mathcal {X}} |^4\Vert A \Vert _\mathrm {HS}^2}, \frac{t}{|{\mathcal {X}} |^2\Vert A \Vert _\mathrm {op}}\right) \right) , \end{aligned}$$

showing a version of the famous Hanson–Wright inequality for the multislice (cf. [20]). As an alternative strategy for proving the Hanson–Wright inequality, in Sect. 1.3 we derive Talagrand’s convex distance inequality for the multislice, which in particular yields Hanson–Wright inequalities by [1] (where results of this type have already been established for sampling without replacement along these lines, cf. Remark 2.3 therein). Theorem 1 may be seen as a generalization of these bounds to any order \(d \in \mathbb {N}\).
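The Gaussian-type regime of this bound can be observed in simulation. The following hedged sketch samples a centered two-feature multislice (\({\mathcal {X}} = \{-1,1\}\), \(\kappa = (N/2, N/2)\), so that \({{\,\mathrm{\mathbb {E}}\,}}_\kappa \omega _i = 0\)); the size N = 100 and the Gaussian coefficient matrix are assumptions for illustration.

```python
# Empirical fluctuations of a quadratic form on a centered multislice (sketch).
import numpy as np

rng = np.random.default_rng(1)
N = 100
A = rng.standard_normal((N, N))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)          # symmetric with vanishing diagonal

base = np.array([-1.0] * (N // 2) + [1.0] * (N // 2))

def quad_form():
    w = rng.permutation(base)     # a uniform point of the multislice
    return w @ A @ w / 2.0

samples = np.array([quad_form() for _ in range(5000)])
# The empirical standard deviation is comparable to ||A||_HS / sqrt(2), the
# scale governing the Gaussian part of the Hanson-Wright-type bound.
print(samples.std(), np.sqrt((A ** 2).sum() / 2))
```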

Possible applications of Theorem 1 include the Erdős–Rényi model, which features random graphs with a fixed number of vertices n. There are two variants of the Erdős–Rényi model which are often labeled G(n, p) and G(n, M). In the G(n, p) model, each possible edge between the n vertices is included with probability p independently of the other edges, while in the G(n, M) model, the graph is chosen uniformly at random from the collection of all graphs with n vertices and M edges. In the following, we study G(n, M).

Write \(E = \{(i,j): 1 \le i < j \le n\}\) for the set of possible edges, so that \(\mathrm {card}(E) = n(n-1)/2 =:N\). Clearly, any edge \(e \in E\) is included with probability \(M/N =:p\). However, unlike in the G(n, p) model, the events of the edges being included are not independent. Any configuration \(\omega \) in G(n, M) can be written as a vector \(\omega = (\omega _e)_{e\in E} \in \{0,1\}^E\) such that \(\omega _e = 1\) for exactly M entries. In particular, G(n, M) can be regarded as a multislice with \(L=2\), \(\kappa = (N-M,M)\) and \({\mathcal {X}} = \{0,1\}\).

One problem which has attracted considerable attention over the last two decades is counting the number of copies of certain subgraphs, e.g., triangles, in the Erdős–Rényi model. There is extensive literature on concentration inequalities for the triangle count, such as [12, 14, 21]. In particular, in [5, Proposition 5.5], bounds for the G(n, p) model are derived using higher-order concentration results for multilinear polynomials in independent random variables. As Theorem 1 provides analogous higher-order concentration results in a dependent situation, we are able to show corresponding bounds for the G(n, M) model by our methods.

Corollary 1

Consider the G(n, M) Erdős–Rényi model and the number of triangles defined as \(f(\omega ) :=\sum _{i<j<k} \omega _{ij}\omega _{jk}\omega _{ik}\). Then, for any \(t \ge 0\),

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}\left( |f - {{\,\mathrm{\mathbb {E}}\,}}f | \ge t\right) \le 2 \exp \left( -c \min \left( \frac{t^2}{n^3 + p^2 n^3 + p^4 n^4}, \frac{t}{n^{1/2} + p n}, t^{2/3}\right) \right) . \end{aligned}$$

Comparing Corollary 1 to [5, Proposition 5.5], we see that we arrive at essentially the same tail bounds despite the dependencies in the G(n, M) model, with the only difference being an additional logarithmic factor \(L_p :=(\log (2/p))^{-1/2}\) in [5]. This logarithmic factor stems from the use of sub-Gaussian norms for independent Bernoulli random variables (which tend to 0 as \(p \rightarrow 0\)), which is not mirrored in the log-Sobolev tools we use.
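For concreteness, here is a minimal sketch of sampling G(n, M) as a uniform point of the \((N-M, M)\)-slice over the edge set and evaluating the triangle count f; the values n = 30 and M = 60 are arbitrary.

```python
# G(n, M) as a multislice over the edge set, and the triangle count f (sketch).
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
n, M = 30, 60
edges = list(combinations(range(n), 2))   # the N = n(n-1)/2 possible edges
N = len(edges)

omega = np.zeros(N, dtype=int)
omega[rng.choice(N, size=M, replace=False)] = 1   # uniform on the (N-M, M)-slice

adj = np.zeros((n, n), dtype=int)
for (i, j), present in zip(edges, omega):
    adj[i, j] = adj[j, i] = present

# Each triangle is counted six times on the diagonal of adj^3.
triangles = np.trace(np.linalg.matrix_power(adj, 3)) // 6
print(triangles)
```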

Typically, the main interest is to study fluctuations which scale with the expected value of f. In this case, setting \(t :=\varepsilon {{\,\mathrm{\mathbb {E}}\,}}f = \varepsilon \left( {\begin{array}{c}n\\ 3\end{array}}\right) M(M-1)(M-2)/(N(N-1)(N-2))\), Corollary 1 reads

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}\left( |f - {{\,\mathrm{\mathbb {E}}\,}}f | \ge \varepsilon {{\,\mathrm{\mathbb {E}}\,}}f\right) \le 2 \exp \left( -c \min \left( \varepsilon ^2 n^3 p^6, \left( \varepsilon ^2 \wedge \varepsilon ^{2/3}\right) n^2 p^2 \right) \right) . \end{aligned}$$

In particular, this shows that the optimal exponent \(n^2p^2\) known from the G(n, p) setting also shows up for a suitable range of p, cf. the discussion in [5].

In a similar way, we may also count cycles as in [5, Proposition 5.6], but we do not pursue this in this note.

1.2 Sampling Without Replacement

In this section, we interpret the multislice in the context of sampling without replacement, where we sample N times from a population of N individuals \(\omega _1, \ldots , \omega _N\), so that the uniform distribution \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) describes the sampling of all its elements. In applications, one typically does not sample the entire population but chooses some sample size \(n \le N\), i.e., for each \(\omega \in \varOmega _\kappa \) one considers the first n coordinates only. Formally, if \(pr_n\) denotes the projection onto the first n coordinates, we may define \(\varOmega _{\kappa ,n} :=pr_n(\varOmega _\kappa )\). We, again, equip \(\varOmega _{\kappa ,n}\) with the uniform distribution \({{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\), which agrees with the push-forward of \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) under \(pr_n\). As above, we denote the expectation of any real-valued function f with respect to \({{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\) by \({{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n}f\).
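In code, \(pr_n\) amounts to truncating a uniformly shuffled population (a sketch; kappa, X and n are arbitrary assumptions):

```python
# Sampling without replacement as pr_n of a uniform point of Omega_kappa.
import random

kappa, X, n = (3, 2, 2), (0, 1, 2), 4
population = [x for x, k in zip(X, kappa) for _ in range(k)]   # N = 7 individuals

random.shuffle(population)   # a uniform sample omega from P_kappa ...
sample = population[:n]      # ... projected by pr_n onto Omega_{kappa,n}
print(sample)
```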

Our first result is a bounded differences inequality for sampling without replacement involving the finite-sampling correction factor \(1- n/N\). In the sequel, \((\omega _{i^c}, \omega _i')\) denotes a vector which agrees with \(\omega \) in all coordinates but the ith one, in which \(\omega _i\) is replaced by some admissible \(\omega _i'\) (in the sense that \((\omega _{i^c}, \omega _i') \in \varOmega _{\kappa ,n}\)). Moreover, for any \(\sigma \in S_n\) we may define \(\sigma \omega \in \varOmega _{\kappa ,n}\) by noting that \(\sigma \) acts on \(\omega \) by permuting its indices.

Proposition 2

Let \(f: \varOmega _{\kappa ,n} \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) be an arbitrary function and let \((c_i)_{i =1,\ldots ,n}\) be constants such that \(|f(\omega ) - f(\omega _{i^c}, \omega _i') | \le c_i\) for all \(\omega \in \varOmega _{\kappa ,n}\) and all \(\omega _i' \in {\mathcal {X}}\). For any \(t \ge 0\), it holds

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\left( \frac{1}{n!} \sum _{\sigma \in S_n} f(\sigma \omega ) - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} f \ge t \right) \le \exp \left( - \frac{t^2}{4(1-\frac{n}{N}) \sum _{i = 1}^n c_i^2} \right) . \end{aligned}$$
(6)

In particular, if f is symmetric, i.e., \(f(\omega ) = f(\sigma \omega )\) for any \(\sigma \in S_n\) and any \(\omega \in \varOmega _{\kappa ,n}\), and satisfies \(|f(\omega ) - f(\omega _1', \omega _2, \ldots , \omega _n) | \le c\) for some \(c > 0\), this implies

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\left( f - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} f \ge t \right) \le \exp \left( - \frac{t^2}{4\big ( 1 - \frac{n}{N} \big ) c^2 n} \right) . \end{aligned}$$
(7)

Note that equation (6) is invariant under the change \(f \mapsto -f\), which yields a two-sided concentration inequality as in (4). To express it in terms of deviation probabilities, for any \(\delta \in (0,1]\) we have with probability at least \(1-\delta \)

$$\begin{aligned} \Big |\frac{1}{n!} \sum _{\sigma \in S_n} f(\sigma \omega ) - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} f \Big |\le \sqrt{4(1-n/N) \log \left( \frac{2}{\delta } \right) \sum _{i =1}^n c_i^2 }. \end{aligned}$$

Concentration inequalities of this type have also been proven in [31, Lemma 2] and [13, Theorem 5] by different methods, and our results agree with these bounds up to constants.

Let us apply Proposition 2 to some known statistics in sampling without replacement. One of the most famous concentration results for sampling without replacement is Serfling’s inequality [27], which can be regarded as a strengthening of Hoeffding’s inequality for n out of N sampling due to the inclusion of the finite-sampling correction factor \(1-n/N\). For a discussion and some newer results, we refer to [6, 19, 30]. We can deduce Serfling’s inequality with a slightly worse constant from Proposition 2.

Corollary 2

In the situation above, we have for any \(t \ge 0\)

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\left( \frac{1}{n} \sum _{i = 1}^n \omega _i - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} \omega _1 \ge t \right) \le \exp \left( - \frac{n t^2}{4\left( 1- \frac{n}{N} \right) |{\mathcal {X}} |^2} \right) . \end{aligned}$$

The same estimate holds for \({{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}( \frac{1}{n} \sum _{i = 1}^n \omega _i - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} \omega _1 \le -t)\).

In the original version of Serfling’s inequality, the right-hand side is replaced by \(\exp (-2nt^2/((1-(n-1)/N)|{\mathcal {X}} |^2))\).
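As a quick numerical sanity check of Corollary 2 (a sketch; population, sample size and threshold are arbitrary assumptions), one can compare the empirical deviation probability of the sample mean with the bound:

```python
# Compare the empirical deviation probability of the sample mean with the
# right-hand side of Corollary 2 (illustrative parameters).
import numpy as np

rng = np.random.default_rng(3)
population = np.repeat([0.0, 1.0], [60, 40])   # N = 100, X = {0, 1}, |X| = 1
N, n, t = population.size, 80, 0.05
mean = population.mean()

trials = 20000
hits = sum(rng.permutation(population)[:n].mean() - mean >= t
           for _ in range(trials)) / trials
bound = np.exp(-n * t ** 2 / (4 * (1 - n / N)))
print(hits, bound)   # the empirical frequency stays below the bound
```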

As a second example, consider how well the empirical measure approximates the uniform distribution on the population from which the \(\omega _i\) are sampled, measured in terms of the Kolmogorov distance. Formally, we put

$$\begin{aligned} g_{n,t}\left( \omega _1, \ldots , \omega _n\right) = \frac{1}{n} \sum _{i = 1}^n \mathbbm {1}_{(-\infty , t]}(\omega _i) \end{aligned}$$

and

$$\begin{aligned} f(\omega ) :=\sup _{t \in {{\,\mathrm{\mathbb {R}}\,}}} \left( g_{n,t}(\omega ) - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} g_{n,t}\right) . \end{aligned}$$

In [19], it was conjectured that \(\sqrt{n} f\) has sub-Gaussian tails with variance \(1 - n/N\). The next result states that after centering around the expectation, this is indeed the case.

Corollary 3

With the above notation, we have for any \(t \ge 0\)

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}( \sqrt{n}|f - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} f | \ge t) \le 2\exp \left( - \frac{t^2}{4\left( 1 - \frac{n}{N} \right) } \right) . \end{aligned}$$
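Since \({{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} g_{n,t}\) is the distribution function of the population, f can be evaluated exactly by checking the finitely many population values; a hedged sketch with an arbitrary demo population:

```python
# Evaluating the one-sided Kolmogorov statistic f from Corollary 3 (sketch).
import numpy as np

rng = np.random.default_rng(4)
population = rng.integers(0, 5, size=200).astype(float)   # arbitrary population
N, n = population.size, 50
sample = rng.permutation(population)[:n]                  # without replacement

ts = np.unique(population)                                # candidate jump points
emp = (sample[None, :] <= ts[:, None]).mean(axis=1)       # g_{n,t}(omega)
pop = (population[None, :] <= ts[:, None]).mean(axis=1)   # E_{kappa,n} g_{n,t}
print((emp - pop).max())                                  # f(omega)
```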

1.3 Talagrand’s Convex Distance Inequality

Let \(\varOmega \) be any measurable space, \(\omega = (\omega _1, \ldots , \omega _N) \in \varOmega ^N\) and \(A \subset \varOmega ^N\) a measurable set. In his landmark paper [29], Talagrand defined the convex distance between \(\omega \) and A as

$$\begin{aligned} d_T(\omega , A) :=\sup _{\alpha \in \mathbb {R}^N: |\alpha | = 1} d_\alpha (\omega , A), \end{aligned}$$

where

$$\begin{aligned} d_\alpha (\omega , A) :=\inf _{\omega ' \in A} d_\alpha (\omega , \omega ') :=\inf _{\omega ' \in A} \sum _{i=1}^N |\alpha _i | \mathbbm {1}_{\omega _i \ne \omega '_i}. \end{aligned}$$

Talagrand proved concentration inequalities for the convex distance of random permutations and product measures which have attracted continued interest since then. For product measures, an alternative proof based on the entropy method was given in [10]. In [26], the entropy method was used to reprove the convex distance inequality for random permutations as well, and this proof was extended to slices of the hypercube. In the present article, we further generalize this proof to the multislice, encompassing both situations discussed in [26].

Proposition 3

For any \(A \subseteq \varOmega _\kappa \), it holds

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa (A) {{\,\mathrm{\mathbb {E}}\,}}_\kappa \exp \left( \frac{d_T(\cdot , A)^2}{144} \right) \le 1. \end{aligned}$$

Note that in [24], convex distance inequalities for certain types of dependent random variables are proven. This includes sampling without replacement. In this sense, the result of Proposition 3 is not new, but we present a different strategy of proof solely based on the entropy method.

A famous corollary of Talagrand’s convex distance inequality is the sub-Gaussian concentration inequality for convex Lipschitz functions, first proven in [28]. Thus, Proposition 3 implies the following corollary, which can be regarded as an extension of Proposition 1 to upper and lower tails (ignoring the subtle issue of concentration around the mean or the median of a function).

Corollary 4

Let \(f: {{\,\mathrm{\mathbb {R}}\,}}^N \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) be convex and L-Lipschitz. Then, for any \(t \ge 0\) it holds

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa \left( |f - \mathrm {med}(f) | \ge t \right) \le 4 \exp \left( - \frac{t^2}{144L^2 |{\mathcal {X}} |^2} \right) , \end{aligned}$$

where \(\mathrm {med}(f)\) is a median for f.

As a simple application of Corollary 4, we show the following bound on the largest eigenvalue of symmetric matrices whose entries have distribution \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \):

Corollary 5

Let \(X = (X_{ij})_{i,j}\) be a symmetric \(n \times n\) random matrix. Let \(N :=n(n+1)/2\) and assume that the common distribution of the entries \((X_{ij})_{i \le j}\) on \({{\,\mathrm{\mathbb {R}}\,}}^N\) is given by \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) for some \(\kappa \), \(L \ge 2\) and \({\mathcal {X}}\). Let \(\lambda _\mathrm {max} :=\lambda _\mathrm {max}(X) :=\max \{|\lambda (X) | :\lambda (X) \ \text {eigenvalue of} \ X\}\). We have for any \(t \ge 0\)

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}\left( |\lambda _\mathrm {max}(X) - \mathrm {med}(\lambda _\mathrm {max}(X)) | \ge t\right) \le 4 \exp \left( - \frac{t^2}{144 |{\mathcal {X}} |^2} \right) . \end{aligned}$$

In particular, this result shows that \(\lambda _\mathrm {max}\) has sub-Gaussian tails independently of the dimension n. A possible choice of X is the adjacency matrix of a G(n, M) Erdős–Rényi random graph. Corollary 5 is an adaptation of a classical example for independent random variables, see, e.g., [11, Example 6.8].
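A hedged sketch of this setting with ±1 entries (n = 50 and the near-balanced \(\kappa \) are illustrative assumptions):

```python
# Largest eigenvalue of a symmetric matrix with multislice-distributed entries.
import numpy as np

rng = np.random.default_rng(6)
n = 50
N = n * (n + 1) // 2
entries = np.repeat([-1.0, 1.0], [N // 2, N - N // 2])
rng.shuffle(entries)                    # uniform point of a two-feature multislice

X = np.zeros((n, n))
X[np.triu_indices(n)] = entries         # fill the upper triangle incl. diagonal
X = X + X.T - np.diag(np.diag(X))       # symmetrize without doubling the diagonal

lam_max = np.abs(np.linalg.eigvalsh(X)).max()
print(lam_max)
```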

Furthermore, we are able to prove a somewhat weaker version of the convex distance inequality for n out of N sampling. Here, we consider symmetric sets, i.e., sets \(A \subset \varOmega _{\kappa ,n}\) such that \(\omega \in A\) implies \(\sigma \omega \in A\) for any permutation \(\sigma \in S_n\). Obviously, assuming A to be symmetric is increasingly restrictive as n tends to N. This is mirrored in the additional finite-sampling correction factor \(1-n/N\) in the following theorem (which sharpens the convex distance inequality in [24]).

Theorem 2

For any symmetric set \(A \subseteq \varOmega _{\kappa ,n}\) with \({{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}(A) \ge \frac{1}{2}\) and any \(t \ge 0\), we have

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\left( d_T(\cdot , A) \ge t \right) \le e \exp \left( - \frac{t^2}{16(1-\frac{n}{N})} \right) . \end{aligned}$$

As above, Theorem 2 implies the following result.

Corollary 6

Let f be a convex and symmetric L-Lipschitz function. Then, for any \(t \ge 0\) we have

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\left( |f - \mathrm {med} (f) | \ge t \right) \le 2e \exp \left( - \frac{t^2}{16(1-n/N) L^2 |{\mathcal {X}} |^2} \right) . \end{aligned}$$

Examples of functions to which Corollary 6 may be applied are the estimators for the mean and the standard deviation given by \(f(\omega ) = {\bar{\omega }} = n^{-1} \sum _{i =1}^n \omega _i\) (sample mean) and \(f(\omega ) = s(\omega ) = (\frac{1}{n-1} \sum _{i=1}^n (\omega _i - {\bar{\omega }})^2)^{1/2} = (\frac{1}{n(n-1)} \sum _{i < j} (\omega _i - \omega _j)^2)^{1/2}\) (sample standard deviation), having Lipschitz constants \(L = n^{-1/2}\) and \(L=(n-1)^{-1/2}\), respectively. In particular, for any \(\delta \in (0,1]\) we have with probability at least \(1-\delta \) for either of the two estimators

$$\begin{aligned} |f - \mathrm {med}(f) | \le \sqrt{16 (1-n/N) L^2 |{\mathcal {X}} |^2 \log (2e/\delta )}. \end{aligned}$$

It is well known that concentration results centered around the expectation and the median differ only by a constant. Indeed, in our case, for any convex, symmetric L-Lipschitz function

$$\begin{aligned} |{{\,\mathrm{\mathbb {E}}\,}}_{\kappa , n} f - \mathrm {med}(f) |&\le {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} |f - \mathrm {med}(f) | = \int _0^\infty {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\left( |f - \mathrm {med}(f) | \ge t \right) dt \\&\le 2e \int _0^\infty \exp \left( - \frac{t^2}{16(1-n/N) |{\mathcal {X}} |^2 L^2} \right) dt\\&= 2e \sqrt{4\pi (1-n/N) |{\mathcal {X}} |^2 L^2} \\&\approx 19.27 |{\mathcal {X}} | L \sqrt{1-n/N}. \end{aligned}$$

2 Logarithmic Sobolev Inequalities for the Multislice

The main tool for establishing concentration inequalities in this note is the entropy method, which is based on the use of logarithmic Sobolev-type inequalities. Let us recall some basic facts and definitions especially adapted to discrete spaces. A key object is a suitable difference operator, i.e., a kind of “discrete derivative.” Given a probability space \(({\mathcal {Y}}, {\mathcal {F}}, \mu )\), we call any operator \(\varGamma : L^\infty (\mu ) \rightarrow L^\infty (\mu )\) satisfying \(|\varGamma (af + b)| = a\, |\varGamma f|\) for all \(a > 0\), \(b \in \mathbb {R}\) a difference operator. Moreover, by \({{\,\mathrm{\mathbb {E}}\,}}_\mu \) we denote integration with respect to \(\mu \).

Definition 1

  1.

    We say that \(\mu \) satisfies a logarithmic Sobolev inequality \(\varGamma \mathrm {-LSI}(\sigma ^2)\) if for all bounded measurable functions f, we have

    $$\begin{aligned} \mathrm {Ent}_{\mu }(f^2) \le 2\sigma ^2 {{\,\mathrm{\mathbb {E}}\,}}_\mu \varGamma (f)^2, \end{aligned}$$

    where \(\mathrm {Ent}_\mu (f) :={{\,\mathrm{\mathbb {E}}\,}}_\mu f\log (f) - {{\,\mathrm{\mathbb {E}}\,}}_\mu f \log {{\,\mathrm{\mathbb {E}}\,}}_\mu f\) (for any positive function f) denotes the entropy functional.

  2.

    We say that \(\mu \) satisfies a modified logarithmic Sobolev inequality \(\varGamma \mathrm {-mLSI}(\sigma ^2)\) if for all bounded measurable functions f, we have

    $$\begin{aligned} \mathrm {Ent}_{\mu }(e^f) \le \frac{\sigma ^2}{2} {{\,\mathrm{\mathbb {E}}\,}}_\mu \varGamma (f)^2e^f. \end{aligned}$$
  3.

    We say that \(\mu \) satisfies a Poincaré inequality \(\varGamma \mathrm {-PI}(\sigma ^2)\) if for all bounded measurable functions f, we have

    $$\begin{aligned} \mathrm {Var}_{\mu }(f) \le \sigma ^2 {{\,\mathrm{\mathbb {E}}\,}}_\mu \varGamma (f)^2, \end{aligned}$$

where \(\mathrm {Var}_\mu (f) :={{\,\mathrm{\mathbb {E}}\,}}_\mu f^2 - ({{\,\mathrm{\mathbb {E}}\,}}_\mu f)^2\) is the variance.

  4.

If any of these functional inequalities does not hold for all bounded measurable functions but only for some subclass \({\mathcal {A}} \subset L^\infty (\mu )\), we say that \(\mu \) satisfies a \(\varGamma \mathrm {-LSI}(\sigma ^2)\) (PI, mLSI) on \({\mathcal {A}}\).

If \(\varGamma \) satisfies the chain rule (as the ordinary gradient \(\nabla \) does), \(\varGamma \mathrm {-LSIs}\) and \(\varGamma \mathrm {-mLSIs}\) are equivalent concepts, but in the examples we consider in this note, this is usually not true. Moreover, it is well known that a \(\varGamma \mathrm {-LSI}(\sigma ^2)\) and a \(\varGamma \mathrm {-mLSI}(\sigma ^2)\) both imply a \(\varGamma \mathrm {-PI}(\sigma ^2)\), cf. e.g., [8, Proposition 3.6].

For the multislice, we mostly consider the following canonical difference operator. Recalling the “switch” operator from (1), for any function \(f :\varOmega _\kappa \rightarrow \mathbb {R}\) we set

$$\begin{aligned} \varGamma _{ij}(f)(\omega ) :=\varGamma _{ij}f(\omega ) :=f(\omega ) - f(\tau _{ij}\omega ) =:f(\omega ) - \tau _{ij}f(\omega ) \end{aligned}$$

and define the difference operator \(\varGamma \) by

$$\begin{aligned} \varGamma (f) :=\Big (\frac{1}{2N} \sum _{1 \le i < j \le N} \varGamma _{ij} (f)^2\Big )^{1/2}. \end{aligned}$$

Note that \(\varGamma _{ij}(f)^2\) might be interpreted as a sort of “local variance.” Indeed, it is easy to verify that

$$\begin{aligned} \varGamma _{ij}(f)^2(\omega ) = 2 \int (f(\omega ) - f(\omega _{\{i,j\}^c}, \eta _{ij}))^2 d{{\,\mathrm{\mathbb {P}}\,}}_\kappa (\eta _{ij} \mid \omega _{\{i,j\}^c}), \end{aligned}$$
(8)

where \(\omega _{\{i,j\}^c} = (\omega _k)_{k \notin \{i,j\}}\) and \(\eta _{ij} = (\eta _i, \eta _j)\). Therefore, we have \(\varGamma (f)^2 = 2 N^{-1} |\mathfrak {d}f |^2\) for the difference operator \(|\mathfrak {d}f |\) introduced in [17].

Sometimes (and typically for auxiliary purposes), we shall also need a second, closely related difference operator which we denote by \(\varGamma ^+\). Here, we simply set

$$\begin{aligned} \varGamma _{ij}^+(f)(\omega ) :=(f(\omega ) - f(\tau _{ij}\omega ))_+, \end{aligned}$$

where \(x_+ :=\max (x,0)\) denotes the positive part of a real number, and define \(\varGamma ^+\) accordingly.

Recently, in [25] sharp (modified) logarithmic Sobolev inequalities for the multislice were established. Rewriting these results in accordance with our notation and slightly extending them immediately leads to the following proposition, serving as the basis for our arguments:

Proposition 4

With the above definitions of \(\varGamma \) and \(\varGamma ^+\), \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) satisfies the following functional inequalities:

  • \(\varGamma \mathrm {-LSI}(2\log (N/\kappa _{\mathrm {min}})/\log (2))\),

  • \(\varGamma \mathrm {-mLSI}(4)\),

  • \(\varGamma ^+\mathrm {-mLSI}(8)\).

Proof of Proposition 4

The \(\varGamma \mathrm {-LSI}\) directly follows from [25, Theorem 5]. Moreover, by [25, Lemma 1] (substituting \(e^f\) for \(f \ge 0\)), we have

$$\begin{aligned} \mathrm {Ent}_{{{\,\mathrm{\mathbb {P}}\,}}_\kappa }(e^f) \le \frac{1}{N} \sum _{i < j} {{\,\mathrm{\mathbb {E}}\,}}_\kappa (e^{f(\tau _{ij}\omega )} - e^{f(\omega )})(f(\tau _{ij}\omega ) - f(\omega )) \end{aligned}$$
(9)

for any \(f :\varOmega _\kappa \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\). Using the fact that \(\omega \mapsto \tau _{ij}\omega \) is an automorphism of \(\varOmega _\kappa \) and applying the inequality \((a-b)(e^a - e^b) \le \frac{1}{2} (e^a + e^b)(a-b)^2\) leads to the \(\varGamma \mathrm {-mLSI}(4)\). By similar arguments, we may also deduce the \(\varGamma ^+\mathrm {-mLSI}(8)\). In particular, we note that the expected values on the right-hand side of (9) are symmetric in \(\omega \) and \(\tau _{ij}\omega \) and use the inequality \((a-b)_+ (e^a - e^b) \le (a-b)_+^2 e^a\). \(\square \)
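The inequalities of Proposition 4 are easily checked numerically on small instances. The following sketch verifies the \(\varGamma \mathrm {-mLSI}(4)\) on the multislice with \(\kappa = (2,1,1)\) and \({\mathcal {X}} = \{0,1,2\}\) for an arbitrary test function (all choices are illustrative):

```python
# Numerical sanity check of the Gamma-mLSI(4): Ent(e^f) <= 2 E[Gamma(f)^2 e^f].
import math
from itertools import permutations, combinations

base = (0, 0, 1, 2)                       # kappa = (2, 1, 1), X = {0, 1, 2}
N = len(base)
Omega = sorted(set(permutations(base)))
P = 1.0 / len(Omega)                      # the uniform measure P_kappa

def tau(w, i, j):
    out = list(w)
    out[i], out[j] = out[j], out[i]
    return tuple(out)

def Gamma_sq(f, w):                       # Gamma(f)^2(w) as defined above
    return sum((f(w) - f(tau(w, i, j))) ** 2
               for i, j in combinations(range(N), 2)) / (2 * N)

f = lambda w: 0.3 * w[0] * w[1] - 0.2 * w[2]
Ee = sum(P * math.exp(f(w)) for w in Omega)
ent = sum(P * math.exp(f(w)) * f(w) for w in Omega) - Ee * math.log(Ee)
rhs = 2.0 * sum(P * Gamma_sq(f, w) * math.exp(f(w)) for w in Omega)
print(ent <= rhs, ent, rhs)
```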

From Proposition 4, we may derive a convex \(\nabla -\mathrm {(m)LSI}\) on the multislice, where \(\nabla \) denotes the usual Euclidean gradient.

Proposition 5

For any \(f \in {\mathcal {A}}_c :=\{ f :[x_1,x_L]^N \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\mid f \ \text {convex}\}\), we have

$$\begin{aligned} \mathrm {Ent}_{{{\,\mathrm{\mathbb {P}}\,}}_\kappa }(e^f) \le 4|{\mathcal {X}} |^2{{\,\mathrm{\mathbb {E}}\,}}_\kappa e^f |\nabla f |^2. \end{aligned}$$

In other words, \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) satisfies a \(\nabla -\mathrm {mLSI}(8|{\mathcal {X}} |^2)\) on \({\mathcal {A}}_c\).

Proof

Using convexity in the first step and the inequality \((a-b)^2 \le 2a^2 + 2b^2\) yields

$$\begin{aligned} \varGamma ^+(f)^2(\omega )&= \frac{1}{4N} \sum _{i \ne j} (f(\omega ) - f(\tau _{ij}\omega ))_+^2 \le \frac{1}{4N} \sum _{i \ne j} \langle \nabla f(\omega ), \omega - \tau _{ij}\omega \rangle ^2 \\&= \frac{1}{4N} \sum _{i \ne j} (\omega _i-\omega _j)^2\big ( \partial _i f(\omega ) - \partial _j f(\omega ) \big )^2 \\&\le \frac{|{\mathcal {X}} |^2}{2N} \sum _{i \ne j} \big ( \partial _i f(\omega )^2 + \partial _j f(\omega )^2 \big ) \\&= \frac{|{\mathcal {X}} |^2}{N} (N-1) \sum _{i = 1}^N \partial _i f(\omega )^2 \le |{\mathcal {X}} |^2 |\nabla f |^2(\omega ). \end{aligned}$$

As \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) satisfies a \(\varGamma ^+-\mathrm {mLSI}(8)\) by Proposition 4, the claim follows. \(\square \)

Another class of functional inequalities we address in this note is the class of Beckner inequalities. Restricting ourselves to the multislice (rather than providing a general definition), \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) satisfies a Beckner inequality with parameter \(p \in (1,2]\) (Bec-p) if there exists some constant \(\beta _p > 0\) such that

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}_\kappa f^p - ({{\,\mathrm{\mathbb {E}}\,}}_\kappa f)^p \le \frac{\beta _p p}{2} {\mathcal {E}}_\kappa (f, f^{p-1}) \end{aligned}$$
(10)

for any non-negative function f. Here,

$$\begin{aligned} {\mathcal {E}}_\kappa (f,g) :=\frac{1}{2N} \sum _{1 \le i < j \le N} {{\,\mathrm{\mathbb {E}}\,}}_\kappa (\varGamma _{ij} f)(\varGamma _{ij} g) \end{aligned}$$

for any functions f, g on \(\varOmega _\kappa \) (which is the Dirichlet form of the underlying Markov chain).

Recently, in [4] it was shown that in the context of general Markov semigroups, Beckner inequalities with constants bounded away from zero as \(p \downarrow 1\) and modified log-Sobolev inequalities are equivalent. In their article, the authors provide numerous examples and applications, also briefly discussing the multislice. Since we need results of this type for our purposes, we include a somewhat more detailed discussion in the present note.

Proposition 6

For any \(p \in (1,2]\), \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) satisfies a Beckner inequality Bec-p with constant \(\beta _p = \frac{4N}{p(N+2)}\).

Proof

First note that the result holds true for \(\kappa = (1, \ldots , 1)\) and \(L=N\) as proven in [8, Proposition 4.8], with the difference in the constant being due to different normalizations. To extend this result to general \(\kappa \), we apply a “projection” or “coarsening” argument, cf. [25, Section 3.4]. Indeed, consider the map \(\varPsi :\{1, \ldots , N\} \rightarrow \{1, \ldots , L\}\) given by \(\varPsi (i) = \ell \) iff \(i \in \{\kappa _1 + \cdots + \kappa _{\ell -1} + 1, \ldots , \kappa _1 + \cdots + \kappa _\ell \}\) and extend it to the multislice (with \({\mathcal {X}} = \{1, \ldots , L\}\)) by coordinate-wise application, i.e., \(\varPsi (\omega _1, \ldots , \omega _N) :=(\varPsi (\omega _1), \ldots , \varPsi (\omega _N))\). Moreover, to address a general choice of \({\mathcal {X}}\), let \(\varPhi :\{1, \ldots , L\} \rightarrow {\mathcal {X}}\) be the “canonical identification” \(\varPhi (i) :=x_i\). Then, by [25, Lemma 4] applied to \({\tilde{f}} :=f \circ \varPhi \) and \({\tilde{g}} :=g \circ \varPhi \),

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}_\kappa f = {{\,\mathrm{\mathbb {E}}\,}}_{(1, \ldots , 1)}({\tilde{f}} \circ \varPsi ),\qquad {\mathcal {E}}_\kappa (f,g) = {\mathcal {E}}_{(1, \ldots , 1)}({\tilde{f}} \circ \varPsi , {\tilde{g}} \circ \varPsi ) \end{aligned}$$

for any functions f, g. From these identities, we immediately obtain the result. \(\square \)

Finally, we may also derive logarithmic Sobolev inequalities for symmetric functions of sampling without replacement. Here, we use other types of difference operators. Let \(f :\varOmega _{\kappa ,n} \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) be any (not necessarily symmetric) function. Then, writing \((\omega _1, \ldots , \omega _i', \ldots , \omega _n) = (\omega _1, \ldots , \omega _{i-1}, \omega _i', \omega _{i+1}, \ldots , \omega _n)\), we set

$$\begin{aligned} \mathfrak {h}(f)^2(\omega _1, \ldots , \omega _n)&= \frac{1}{2} \sum _{i = 1}^n (\sup _{\omega _i} f(\omega _1, \ldots , \omega _n) - \inf _{\omega _i'} f(\omega _1, \ldots , \omega _i', \ldots , \omega _n))^2,\\ \mathfrak {h}^+(f)^2(\omega _1, \ldots , \omega _n)&= \frac{1}{2} \sum _{i = 1}^n (f(\omega _1, \ldots , \omega _n) - \inf _{\omega _i'} f(\omega _1, \ldots , \omega _i', \ldots , \omega _n))^2. \end{aligned}$$

Here, the supremum and the infimum have to be interpreted as extending over all admissible configurations, i.e., such that \((\omega _{i^c},\omega _i), (\omega _{i^c},\omega _i') \in \varOmega _{\kappa ,n}\).

Proposition 7

Let \({\mathcal {A}}_{n,s} :=\{f :\varOmega _{\kappa ,n} \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\mid f \ \text {symmetric}\}\). With the above definitions of \(\mathfrak {h}\) and \(\mathfrak {h}^+\), \({{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\) satisfies the following functional inequalities on \({\mathcal {A}}_{n,s}\):

  • \(\mathfrak {h}\mathrm {-LSI}(2\log (N/\kappa _{\mathrm {min}})(1-\frac{n}{N})/\log (2))\),

  • \(\mathfrak {h}\mathrm {-mLSI}(4(1-\frac{n}{N}))\),

  • \(\mathfrak {h}^+\mathrm {-mLSI}(8(1-\frac{n}{N}))\).

Proof

We only prove the \(\mathfrak {h}^+\mathrm {-mLSI}\). The proofs of the other two inequalities follow by a modification of the arguments below.

First note that any function f on \(\varOmega _{\kappa ,n}\) can be extended to a function F on \(\varOmega _\kappa \) which only depends on the first n coordinates by setting \(F(\omega _1, \ldots , \omega _N) :=f(\omega _1, \ldots , \omega _n)\), which may be rewritten as \(F = f \circ pr_n\). We now apply Proposition 4 to F. Obviously, \(\mathrm {Ent}_{{{\,\mathrm{\mathbb {P}}\,}}_\kappa }(e^F) = \mathrm {Ent}_{{{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}}(e^f)\). It therefore remains to consider the right-hand side of the \(\mathrm {mLSI}\). Here, we obtain

$$\begin{aligned}&\frac{1}{2N} \sum _{i <j} {{\,\mathrm{\mathbb {E}}\,}}_\kappa (F(\omega ) - F(\tau _{ij} \omega ))_+^2 e^{F(\omega )}\\&= \frac{1}{2N} \sum _{i = 1}^n \sum _{j = n+1}^N {{\,\mathrm{\mathbb {E}}\,}}_\kappa (F(\omega ) - F(\tau _{ij}\omega ))_+^2 e^{F(\omega )} \\&\le \frac{1}{2N} \sum _{i = 1}^n \sum _{j = n+1}^N {{\,\mathrm{\mathbb {E}}\,}}_\kappa (F(\omega ) - \inf _{\omega _i'} F(\omega _{i^c}, \omega _i'))_+^2 e^{F(\omega )} \\&= \frac{N-n}{2N} \sum _{i = 1}^n {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} (f(\omega _1, \ldots , \omega _n) - \inf _{\omega _i'} f(\omega _1, \ldots , \omega _i', \ldots , \omega _n))_+^2 e^{f(\omega _1, \ldots , \omega _n)}. \end{aligned}$$

Here, the first equality follows by symmetry of f with respect to the symmetric group \(S_n\), and the fact that f does not depend on \((\omega _{n+1}, \ldots , \omega _N)\). The first inequality is due to the monotonicity of \(x \mapsto x_+\), and the last equality follows as \({{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\) is the push-forward of \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) under \(pr_n\). Thus, for any \(f \in {\mathcal {A}}_{n,s}\) it holds

$$\begin{aligned} \mathrm {Ent}_{{{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}}(e^f)&\le \frac{4}{2N} \sum _{i < j} {{\,\mathrm{\mathbb {E}}\,}}_\kappa (F(\omega ) - F(\tau _{ij} \omega ))_+^2 e^{F(\omega )}\\&\le 4 \frac{N-n}{N} {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} \mathfrak {h}^+(f)^2(\omega _1, \ldots , \omega _n) e^{f(\omega _1, \ldots , \omega _n)}, \end{aligned}$$

which finishes the proof. \(\square \)

3 Proofs of the Concentration Inequalities

3.1 Proofs of Section 1.1

Proof of Proposition 1

Recall that if a probability measure \(\mu \) satisfies a \(\varGamma -\mathrm {mLSI}(\sigma ^2)\) on \({\mathcal {A}}\) (where \(\varGamma \) denotes some difference operator), we have for any \(f \in {\mathcal {A}}\) such that \(\varGamma (f) \le L\),

$$\begin{aligned} \mu (f-{{\,\mathrm{\mathbb {E}}\,}}f \ge t) \le \exp \left( -\frac{t^2}{2 \sigma ^2 L^2}\right) \end{aligned}$$
(11)

for any \(t \ge 0\). For a reference, see, e.g., [7] or [26, (1.2)]. Combining this fact with Proposition 4 and noting that by definition,

$$\begin{aligned} \varGamma (f) \le \left( \frac{1}{2N} \sum _{1 \le i < j \le N} c_{ij}^2\right) ^{1/2}, \end{aligned}$$

we arrive at (2). In the same way, we may derive (3) using Proposition 5. \(\square \)

The proof of Theorem 1 is more advanced. The basic idea is to follow the steps of the proof of [3, Theorem 2.2] and its refinements as presented in [4, Section 5.3]. First, we derive moment estimates for functions on the multislice.

Lemma 1

For any \(f :\varOmega _\kappa \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) and any \(p \ge 2\),

$$\begin{aligned} \Vert f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} \le \sqrt{4\theta p} \Vert \varGamma (f) \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )}, \end{aligned}$$

where \(\theta :=\sqrt{e}/(\sqrt{e}-1) < 2.5415\).

Proof

This follows immediately from Proposition 6 and [4, Proposition 3.3]. (Note that the notation used therein differs from ours; in particular, no square root is taken in the definition of \(\varGamma \).) To apply the latter result, we have to check that the constants of the Beckner inequalities Bec-p satisfy

$$\begin{aligned} \beta _p^{-1} = \frac{p(N+2)}{4N} \ge a (p-1)^s \end{aligned}$$

for some \(a > 0\), \(s \ge 0\) and any \(p \in (1,2]\). Clearly, we may take \(a=1/4\) and \(s=0\), which finishes the proof. \(\square \)

Note that alternatively, we could apply [17, Proposition 2.4], using (8) and Proposition 4, which yields

$$\begin{aligned} \Vert f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} \le \sqrt{\frac{8\log (N/\kappa _{\mathrm {min}})}{\log (2)}} \sqrt{p-1} \Vert \varGamma (f) \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )}. \end{aligned}$$

As a result of using the \(\varGamma -\mathrm {LSI}\), we arrive at a substantially weaker constant, however.

Next, we have to relate differences of multilinear polynomials to (formal) derivatives, which is typically achieved by an inequality of the form \(\varGamma (f) \le c |\nabla f |\) for some absolute constant \(c > 0\). However, it turns out that such an inequality cannot be true in our setting. For instance, taking \(N=3\), \({\mathcal {X}} = \{0,1\}\) and \(f(\omega ) = \omega _1\omega _2 - \omega _1\omega _3\), it is easy to check that for \(\omega = (0,1,1)\), we have \(0 = |\nabla f(\omega ) | < \varGamma (f)(\omega )\). The same problem arises if we take \(\varGamma ^+\) instead of \(\varGamma \). It is possible to prove an inequality of this type with \(c :=|{\mathcal {X}} |\) for multilinear polynomials with non-negative coefficients and \({\mathcal {X}} \subset [0, \infty )\). (This can be seen by slightly modifying the proof of Proposition 8.) However, the proof of Theorem 1 also includes an iteration and linearization procedure, and if we only allow for non-negative coefficients, we get stuck at \(d=2\).

The following proposition provides us with the estimate we need to get the recursion going, at the cost of also involving second-order derivatives.

Proposition 8

Let \(f = f(\omega )\) be a multilinear polynomial as in Theorem 1. Then, we have

$$\begin{aligned} \varGamma (f)^2 \le \frac{3|{\mathcal {X}} |^2}{2} |\nabla f |^2 + \frac{3|{\mathcal {X}} |^4}{4N} \Vert \nabla ^2 f \Vert _\mathrm {HS}^2. \end{aligned}$$
(12)

In particular, for any \(p \ge 2\) we have

$$\begin{aligned} \Vert f - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa } f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} \le \ \sqrt{6 \theta |{\mathcal {X}} |^2 p} \Vert |\nabla f | \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} + \sqrt{3 \theta |{\mathcal {X}} |^4 p/N} \Vert \Vert \nabla ^2 f \Vert _\mathrm {HS} \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} \end{aligned}$$
(13)

with \(\theta \) as in Lemma 1.

Proof

In the proof, we additionally assume f to be d-homogeneous, i.e.,

$$\begin{aligned} f(\omega ) = \sum _{i_1< \ldots < i_d} a_{i_1 \ldots i_d} \omega _{i_1} \cdots \omega _{i_d}. \end{aligned}$$

This is done in order to ease notation, and it is no problem to extend our proof to the non-homogeneous case. For notational convenience, for any \(i_1< \ldots < i_d\) and any permutation \(\sigma \in S_d\), we define \(a_{i_{\sigma (1)} \ldots i_{\sigma (d)}} :=a_{i_1 \ldots i_d}\), and we set \(a_{i_1 \ldots i_d} = 0\) if \(i_j = i_k\) for some \(j \ne k\). Finally, note that some of the notation below has to be interpreted accordingly for small values of d, e.g., summation over \(i_1< \ldots < i_{d-1}\) reduces to summation over \(i_1\) for \(d=2\). Observe that for any \(k, \ell \in \{1,\ldots ,N\}, k \ne \ell ,\) we have

$$\begin{aligned} \varGamma _{k\ell }(f)(\omega )^2&= \left( \sum _{\begin{array}{c} i_1< \ldots< i_{d-1} \\ \ell \notin \{i_1, \ldots , i_{d-1} \} \end{array}} a_{i_1 \ldots i_{d-1} k} \omega _{i_1} \cdots \omega _{i_{d-1}} (\omega _k - \omega _\ell )\right. \\&\quad \left. + \sum _{\begin{array}{c} i_1< \ldots< i_{d-1} \\ k \notin \{i_1, \ldots , i_{d-1} \} \end{array}} a_{i_1 \ldots i_{d-1} \ell } \omega _{i_1} \cdots \omega _{i_{d-1}} (\omega _\ell - \omega _k) \right) ^2\\&= \left( \sum _{i_1< \ldots< i_{d-1}} a_{i_1 \ldots i_{d-1} k} \omega _{i_1} \cdots \omega _{i_{d-1}} (\omega _k - \omega _\ell )\right. \\&\quad + \sum _{i_1< \ldots< i_{d-1}} a_{i_1 \ldots i_{d-1} \ell } \omega _{i_1} \cdots \omega _{i_{d-1}} (\omega _\ell - \omega _k)\\&\quad \left. + \sum _{i_1< \ldots< i_{d-2}} a_{i_1 \ldots i_{d-2} k\ell } \omega _{i_1} \cdots \omega _{i_{d-2}} (\omega _k - \omega _\ell )^2\right) ^2\\&\le 3|{\mathcal {X}} |^2 \left( \left( \sum _{i_1< \ldots< i_{d-1}} a_{i_1 \ldots i_{d-1} k} \omega _{i_1} \cdots \omega _{i_{d-1}} \right) ^2\right. \\&\quad \left. + \left( \sum _{i_1< \ldots< i_{d-1}} a_{i_1 \ldots i_{d-1} \ell } \omega _{i_1} \cdots \omega _{i_{d-1}} \right) ^2\right) \\&\quad + 3 |{\mathcal {X}} |^4 \left( \sum _{i_1< \ldots < i_{d-2}} a_{i_1 \ldots i_{d-2} k\ell } \omega _{i_1} \cdots \omega _{i_{d-2}}\right) ^2\\&= 3|{\mathcal {X}} |^2 (\partial _k f(\omega )^2 + \partial _\ell f(\omega )^2) + 3|{\mathcal {X}} |^4 \partial _{k\ell } f(\omega )^2. \end{aligned}$$

Consequently, it holds

$$\begin{aligned} \varGamma (f)^2&= \frac{1}{4N} \sum _{k \ne \ell } \varGamma _{k\ell }(f)^2 \le \frac{3|{\mathcal {X}} |^2}{4N} \sum _{k\ne \ell } ((\partial _k f)^2 + (\partial _\ell f)^2) + \frac{3|{\mathcal {X}} |^4}{4N} \sum _{k\ne \ell } (\partial _{k\ell } f)^2\\&\le \frac{3|{\mathcal {X}} |^2}{2} |\nabla f |^2 + \frac{3|{\mathcal {X}} |^4}{4N} \Vert \nabla ^2 f \Vert _\mathrm {HS}^2, \end{aligned}$$

proving equation (12). Finally, combining (12) with Lemma 1, we immediately arrive at (13). \(\square \)

With the help of Proposition 8, we may now prove Theorem 1. To this end, let us introduce some additional notation. If \(A = (a_{i_1 \ldots i_k})_{i_1, \ldots , i_k \le N}\), \(B = (b_{i_1 \ldots i_k})_{i_1, \ldots , i_k \le N}\) are two k-tensors, we define an inner product \(\langle \cdot , \cdot \rangle \) by

$$\begin{aligned} \langle A, B \rangle :=\sum _{i_1, \ldots , i_k \le N} a_{i_1 \ldots i_k} b_{i_1 \ldots i_k}. \end{aligned}$$

Moreover, if \(x^j = (x^j_1, \ldots , x^j_N)\), \(j = 1, \ldots , k\), are any vectors, we set \(x^1 \otimes \cdots \otimes x^k :=(x^1_{i_1} \cdots x^k_{i_k})_{i_1, \ldots , i_k \le N}\). We also extend this notation to the situation in which some of these vectors may be \(N^2\)-dimensional. Indeed, let \(x^1, \ldots , x^k\) be N-dimensional vectors as above, and let \(y^1, \ldots , y^\ell \) be \(N^2\)-dimensional, \(y^j = (y^j_{\nu _1, \nu _2})_{\nu _1, \nu _2 \le N}\). In this case, we set

$$\begin{aligned} x^1 \otimes \cdots \otimes x^k \otimes y^1 \otimes \cdots \otimes y^\ell :=(x_{i_1}^1 \cdots x_{i_k}^k y_{i_{k+1},i_{k+2}}^1 \cdots y_{i_{k+2\ell -1},i_{k+2\ell }}^\ell )_{i_1, \ldots , i_{k+2\ell }\le N}, \end{aligned}$$

which we regard as a rectangular \((k + \ell )\)-tensor whose first k components are N-dimensional and whose last \(\ell \) components are \(N^2\)-dimensional.

Proof of Theorem 1

To ease notation, we assume \(|{\mathcal {X}} |=1\) in the sequel. The general case follows in the same way with only minor changes. Recall the fact that for a standard Gaussian g in \({{\,\mathrm{\mathbb {R}}\,}}^k\) for some \(k \in {{\,\mathrm{\mathbb {N}}\,}}\) and \(x \in {{\,\mathrm{\mathbb {R}}\,}}^k\) we have \(\sqrt{p}M^{-1}|x | \le \Vert \langle x, g \rangle \Vert _{L^p} \le M \sqrt{p} |x |\) for all \(p \ge 1\) and some universal constant \(M > 1\). Combining this and equation (13), we arrive at

$$\begin{aligned} \Vert f - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa } f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} \le K\left( \Vert \langle \nabla f, G \rangle \Vert _{L^p} + (2N)^{-1/2} \Vert \langle \nabla ^2 f, H \rangle \Vert _{L^p}\right) , \end{aligned}$$
(14)

for \(K :=\sqrt{6\theta } M\). Here, G is an N-dimensional standard Gaussian and H is an \(N^2\)-dimensional standard Gaussian such that G and H are independent of each other and of the \(\omega _i\), and the \(L^p\) norms on the right-hand side are taken with respect to the product measure of \({{\,\mathrm{\mathbb {P}}\,}}_\kappa \) and the Gaussians.

Note that \(\langle \nabla f, G \rangle \) and \(\langle \nabla ^2 f, H \rangle \) are again multilinear polynomials in the \(\omega _i\). Moreover, \(\langle \nabla \langle \nabla f, G_1 \rangle , G_2 \rangle = \langle \nabla ^2 f, G_1 \otimes G_2 \rangle \) and \(\langle \nabla ^2 \langle \nabla f, G \rangle , H \rangle = \langle \nabla ^3 f, G \otimes H \rangle \). In the last expression, we regard \(\nabla ^3f\) as a 2-tensor whose second component is \(N^2\)-dimensional. Similar relations also hold for the other terms in (14).

The proof now follows by iterating (14). For simplicity of presentation, let us consider the case of \(d=2\) first. Here, we apply the triangle inequality (in the form \(\Vert \langle \nabla f, G \rangle \Vert _{L^p} \le \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f, G \rangle \Vert _{L^p} + \Vert \langle \nabla f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f, G \rangle \Vert _{L^p}\) and similarly for \(\langle \nabla ^2 f, H \rangle \)) to (14). We may then apply (14) to \(\langle \nabla f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f, G \rangle \) and \(\langle \nabla ^2 f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f, H \rangle \) again. This leads to

$$\begin{aligned} \begin{aligned}&\Vert f - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa } f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )}\\&\quad \le \ K \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f, G \rangle \Vert _{L^p} + K (2N)^{-1/2} \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f, H \rangle \Vert _{L^p}\\&\qquad + K^2 \Vert \langle \nabla ^2 f, G_1 \otimes G_2 \rangle \Vert _{L^p} + 2 K^2 (2N)^{-1/2} \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^3 f, G \otimes H \rangle \Vert _{L^p}\\&\qquad + K^2 (2N)^{-1}\Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^4 f, H_1 \otimes H_2 \rangle \Vert _{L^p}\\&\quad = \ K \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f, G \rangle \Vert _{L^p} + K (2N)^{-1/2} \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f, H \rangle \Vert _{L^p}\\&\qquad + K^2 \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f, G_1 \otimes G_2 \rangle \Vert _{L^p}. \end{aligned} \end{aligned}$$
(15)

In the last step, we have used that since f is a multilinear polynomial of degree 2, its second-order derivatives are constant and all derivatives of order larger than 2 vanish.

Next, we use that by [22], there are constants \(C_k\) depending on k only such that for any (possibly rectangular) k-tensor A and any \(p \ge 2\),

$$\begin{aligned} \Vert \langle A, g_1 \otimes \cdots \otimes g_k \rangle \Vert _{L^p} \le C_k \sum _{{\mathcal {I}} \in P_k} p^{|{\mathcal {I}} |/2} \Vert A \Vert _{\mathcal {I}}, \end{aligned}$$
(16)

where \(g_1, \ldots , g_k\) are standard Gaussians. Applying (16) to (15), we obtain for some absolute constant C

$$\begin{aligned}&\Vert f - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa } f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )}\\&\quad \le \ K \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f, G \rangle \Vert _{L^p} + K (2N)^{-1/2} \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f, H \rangle \Vert _{L^p} + K^2 \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f, G_1 \otimes G_2 \rangle \Vert _{L^p}\\&\quad \le \ C_1K p^{1/2} |{{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f | + C_1K (2N)^{-1/2} p^{1/2} \Vert {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f \Vert _\mathrm {HS} + C_2 K^2 p^{1/2} \Vert {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f \Vert _\mathrm {HS}\\&\qquad + C_2 K^2 p \Vert {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f \Vert _\mathrm {op}\\&\quad \le \ C (p^{1/2} |{{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla f | + p^{1/2} \Vert {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f \Vert _\mathrm {HS} + p \Vert {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^2 f \Vert _\mathrm {op}). \end{aligned}$$

From here, the assertion follows by standard arguments, cf., e.g., [18, Proposition 4].
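For the reader’s convenience, we briefly sketch this standard step. If \(\Vert f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} \le \sum _k a_k p^{\gamma _k}\) for all \(p \ge 2\), Markov’s inequality gives

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa \Big ( |f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f | \ge e \sum _k a_k p^{\gamma _k} \Big ) \le e^{-p} \Vert f - {{\,\mathrm{\mathbb {E}}\,}}_\kappa f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )}^p \Big ( \sum _k a_k p^{\gamma _k} \Big )^{-p} \le e^{-p}, \end{aligned}$$

and choosing p proportional to \(\min _k (t/a_k)^{1/\gamma _k}\) yields a multilevel tail bound of the asserted form (small values of t are absorbed by adjusting the constant c).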

Finally, we consider an arbitrary \(d \ge 2\) and explain how the proof given above generalizes. First, we apply the triangle inequality to (14) and iterate \(d-1\) times. This yields

$$\begin{aligned} \Vert f - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa } f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} \le \psi _d + \sum _{i=1}^{d-1} \psi _i, \end{aligned}$$
(17)

where we have

$$\begin{aligned} \begin{aligned} \psi _d&:=\sum _{\ell =0}^d \left( {\begin{array}{c}d\\ \ell \end{array}}\right) K^d (2N)^{-\ell /2} \Vert \langle \nabla ^{d+\ell }f, G_1 \otimes \cdots \otimes G_{d-\ell } \otimes H_1 \otimes \cdots \otimes H_\ell \rangle \Vert _{L^p},\\ \psi _i&:=\sum _{\ell =0}^i \left( {\begin{array}{c}i\\ \ell \end{array}}\right) K^i(2N)^{-\ell /2} \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^{i+\ell }f, G_1 \otimes \cdots \otimes G_{i-\ell } \otimes H_1 \otimes \cdots \otimes H_\ell \rangle \Vert _{L^p} \end{aligned} \end{aligned}$$
(18)

for any \(i=1, \ldots , d-1\). As f is a multilinear polynomial of degree d, these expressions simplify since the derivatives of order d are constant and all derivatives of higher order vanish. In particular,

$$\begin{aligned} \psi _d = K^d \Vert \langle {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^df, G_1 \otimes \cdots \otimes G_d \rangle \Vert _{L^p}. \end{aligned}$$

Now, as above we apply (16) to (17) (or rather the \(L^p\) norms appearing in (18)) to arrive at

$$\begin{aligned} \Vert f - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa } f \Vert _{L^p({{\,\mathrm{\mathbb {P}}\,}}_\kappa )} \le C \sum _{k=1}^d \sum _{{\mathcal {I}} \in P_k} p^{|{\mathcal {I}} |/2} \Vert {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^k f \Vert _{\mathcal {I}} \end{aligned}$$

for some absolute constant \(C> 0\) depending on d only. In particular, we use that if we apply (16) to some \(\ell \ge 1\) term in \(\psi _i\) in (18), the norms which arise reappear in the norms corresponding to \(\ell = 0\) in the \(\psi _{i+\ell }\) terms. The proof is concluded by recalling [18, Proposition 4] again. \(\square \)

Proof of Corollary 1

The proof works by calculating \(\Vert {{\,\mathrm{\mathbb {E}}\,}}_\kappa \nabla ^k f \Vert _{\mathcal {I}}\) for \(k = 1, 2, 3\) and applying Theorem 1. In the sequel, we use the convention \(\omega _{ji} :=\omega _{ij}\) whenever \(j > i\). It is easy to see that for any edge \(e = \{i,j\}\), we have

$$\begin{aligned} \frac{\partial }{\partial \omega _e} f(\omega ) = \sum _{k \in \{1, \ldots , n\}\setminus \{i,j\}} \omega _{ik}\omega _{jk}. \end{aligned}$$

Moreover, the second-order derivatives \(\partial ^2 f/(\partial \omega _{e_1}\partial \omega _{e_2})\) are zero unless \(e_1\) and \(e_2\) share exactly one vertex, in which case it is \(\omega _{ij}\) if i and j are the two vertices distinct from the common one. Finally, the third-order derivatives \(\partial ^3 f/(\partial \omega _{e_1}\partial \omega _{e_2} \partial \omega _{e_3})\) are 1 if \(e_1, e_2, e_3\) form a triangle and zero if not.
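
These derivative formulas are easy to double-check symbolically. The following sketch (a side check, not part of the proof, assuming sympy is available and with the illustrative choice \(n = 5\)) verifies them for the triangle count \(f(\omega ) = \sum _{i< j< k} \omega _{ij}\omega _{ik}\omega _{jk}\):

```python
import itertools
import sympy as sp

n = 5
# one symbol w_ij per potential edge {i, j}
w = {frozenset(e): sp.symbols(f"w_{min(e)}{max(e)}")
     for e in itertools.combinations(range(n), 2)}
# triangle count as a multilinear polynomial in the edge variables
f = sum(w[frozenset({i, j})] * w[frozenset({i, k})] * w[frozenset({j, k})]
        for i, j, k in itertools.combinations(range(n), 3))

e01 = w[frozenset({0, 1})]
# first order: sum over the remaining vertices k of w_0k * w_1k
assert sp.expand(sp.diff(f, e01) - sum(
    w[frozenset({0, k})] * w[frozenset({1, k})] for k in range(2, n))) == 0
# second order: two edges sharing one vertex leave the closing edge ...
assert sp.diff(f, e01, w[frozenset({0, 2})]) == w[frozenset({1, 2})]
# ... while edges sharing no vertex give zero
assert sp.diff(f, e01, w[frozenset({2, 3})]) == 0
# third order: equal to 1 exactly on triangles
assert sp.diff(f, e01, w[frozenset({0, 2})], w[frozenset({1, 2})]) == 1
```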

Using that

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}\omega _{e_1} \cdots \omega _{e_k} = \frac{M(M-1) \cdots (M-k+1)}{N(N-1) \cdots (N-k+1)}, \end{aligned}$$

for any \(k = 1, \ldots , N\) and any set of pairwise distinct edges \(e_1, \ldots , e_k\), we therefore obtain

$$\begin{aligned} \Vert {{\,\mathrm{\mathbb {E}}\,}}\nabla f \Vert _{\{1\}} = \sqrt{N} (n-2) \frac{M(M-1)}{N(N-1)} \le n^2 p^2. \end{aligned}$$

Moreover, we have \({{\,\mathrm{\mathbb {E}}\,}}\nabla ^2 f = p (\mathbbm {1}_{|e_1 \cap e_2 | = 1})_{e_1,e_2}\), where \(|e_1 \cap e_2 |\) denotes the number of common vertices of \(e_1\) and \(e_2\). Therefore, we may use the calculations from the proof of [5, Proposition 5.5], which yield

$$\begin{aligned} \Vert {{\,\mathrm{\mathbb {E}}\,}}\nabla ^2 f \Vert _{\{1,2\}} \le p n^{3/2},\qquad \Vert {{\,\mathrm{\mathbb {E}}\,}}\nabla ^2 f \Vert _{\{1\},\{ 2\}} \le 2 p n,\\ \Vert {{\,\mathrm{\mathbb {E}}\,}}\nabla ^3 f \Vert _{\{1,2,3\}} \le n^{3/2},\qquad \Vert {{\,\mathrm{\mathbb {E}}\,}}\nabla ^3 f \Vert _{\{1\},\{2\},\{3\}} \le 2^{3/2},\\ \Vert {{\,\mathrm{\mathbb {E}}\,}}\nabla ^3 f \Vert _{\{1,2\},\{3\}} = \Vert {{\,\mathrm{\mathbb {E}}\,}}\nabla ^3 f \Vert _{\{1,3\},\{2\}} = \Vert {{\,\mathrm{\mathbb {E}}\,}}\nabla ^3 f \Vert _{\{2,3\},\{1\}} \le \sqrt{2n}. \end{aligned}$$

The claim now follows by plugging these estimates into Theorem 1. \(\square \)
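
As a numerical aside, the falling-factorial moment identity used above says that the edge indicators form a uniformly random 0-1 vector with exactly M ones among the \(N = \left( {\begin{array}{c}n\\ 2\end{array}}\right) \) potential edges, and it is easy to check by simulation; the parameters below are arbitrary illustrative choices:

```python
import random
from math import prod

n, M, k, trials = 8, 12, 3, 200_000
N = n * (n - 1) // 2                 # number of potential edges in K_n
hits = 0
for _ in range(trials):
    present = set(random.sample(range(N), M))   # uniform M-edge graph
    if all(e in present for e in range(k)):     # k fixed, pairwise distinct edges
        hits += 1
exact = prod(M - i for i in range(k)) / prod(N - i for i in range(k))
print(hits / trials, "vs", exact)    # should agree up to Monte Carlo error
```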

3.2 Proofs of Section 1.2

The results of Sect. 1.2 follow from the logarithmic Sobolev inequalities established in Sect. 2 by standard means.

Proof of Proposition 2

Noting that

$$\begin{aligned} \mathfrak {h}(f) = \left( \frac{1}{2} \sum _{i = 1}^n \left( \sup _{\omega _i} f(\omega ) - \inf _{\omega _i'} f(\omega _1, \ldots , \omega _{i-1}, \omega _i', \omega _{i+1}, \ldots , \omega _n)\right) ^2 \right) ^{1/2} \le \left( \frac{c^2n}{2} \right) ^{1/2}, \end{aligned}$$

(7) follows from Proposition 7 using the arguments from the proof of Proposition 1. To prove (6), define the symmetric function \(g(\omega ) :=\frac{1}{n!} \sum _{\sigma \in S_n} f(\sigma \omega )\), and observe that by exchangeability of the \(\omega _i\), we have \({{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} f = {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} g\). Moreover,

$$\begin{aligned} |g(\omega ) - g(\omega _1', \omega _2, \ldots , \omega _n) |&\le \frac{1}{n!} \sum _{\sigma \in S_n} |f(\sigma \omega ) - f(\sigma (\omega _1', \omega _2, \ldots , \omega _n)) | \\&\le \frac{1}{n!} \sum _{\sigma \in S_n} \sum _{i =1}^n \mathbbm {1}_{\sigma (1) = i} c_i \le \frac{1}{n} \sum _{i = 1}^n c_i. \end{aligned}$$

Applying (7) to g and using Jensen’s inequality yields

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}(g - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} g \ge t)&\le \exp \left( - \frac{t^2}{4(1-n/N)n (n^{-1} \sum _i c_i)^2} \right) \\&\le \exp \left( - \frac{t^2}{4(1-n/N) \sum _i c_i^2} \right) \end{aligned}$$

as claimed. \(\square \)

Proof of Corollary 2

This follows immediately from Proposition 2, as \(f(\omega ) = \frac{1}{n} \sum _{i = 1}^n \omega _i\) is a symmetric function satisfying \(|f(\omega ) - f(\omega _1', \omega _2, \ldots , \omega _n) | \le |{\mathcal {X}} |/n\). \(\square \)
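
To illustrate Corollary 2 numerically: plugging \(c_i = |{\mathcal {X}} |/n\) into the tail bound from the proof of Proposition 2 gives \(\exp (-t^2 n/(4(1-n/N)|{\mathcal {X}} |^2))\). A minimal simulation sketch follows; the population, n and t below are arbitrary illustrative choices.

```python
import math, random

population = [0] * 60 + [1] * 30 + [2] * 10   # kappa = (60, 30, 10), X = {0, 1, 2}
N, n, diam = len(population), 25, 2.0          # diam = |X| = x_L - x_1
trials, t = 100_000, 0.3

mean0 = sum(population) / N                    # = E_{kappa,n} f by exchangeability
hits = sum(sum(random.sample(population, n)) / n - mean0 >= t
           for _ in range(trials))             # sampling without replacement
bound = math.exp(-t**2 * n / (4 * (1 - n / N) * diam**2))
print(hits / trials, "<=", bound)              # empirical tail vs. the bound
```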

Proof of Corollary 3

This is a consequence of Proposition 2, as for any \(\omega \in \varOmega _{\kappa ,n}\) and \(\omega _1'\) we have by the reverse triangle inequality

$$\begin{aligned} |f(\omega ) - f(\omega _1', \omega _2, \ldots , \omega _n) | \le n^{-1} \sup _{t \in {{\,\mathrm{\mathbb {R}}\,}}} |\mathbbm {1}_{(-\infty , t]}(\omega _1) - \mathbbm {1}_{(-\infty ,t]}(\omega _1') | \le n^{-1}. \end{aligned}$$

\(\square \)

3.3 Proofs of Section 1.3

To prove Talagrand’s convex distance inequality on the multislice, we follow the approach by Boucheron, Lugosi and Massart [9], see also [26, Proposition 1.9]. A key step in the proof is the following lemma.

Lemma 2

Let \(f: \varOmega _\kappa \rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) be a non-negative function such that

  1. \(\varGamma ^+(f)^2 \le f\),

  2. \(|f(\omega ) - f(\tau _{ij}\omega ) | \le 1\) for all \(\omega , i,j\).

Then, for all \(t \in [0, {{\,\mathrm{\mathbb {E}}\,}}_\kappa f]\) we have

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa \left( {{\,\mathrm{\mathbb {E}}\,}}_\kappa f - f \ge t\right) \le \exp \left( -\frac{t^2}{32{{\,\mathrm{\mathbb {E}}\,}}_\kappa f} \right) . \end{aligned}$$

In particular, we have

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa (f = 0)\exp \left( \frac{{{\,\mathrm{\mathbb {E}}\,}}_\kappa f}{32} \right) \le 1. \end{aligned}$$

These assumptions are satisfied in particular by \(f(\omega ) = \frac{1}{4} d_T(\omega , A)^2\), where \(A \subset \varOmega _\kappa \) is any set.

We defer the proof of Lemma 2 until the end of the section and first show how to apply it to prove Talagrand’s convex distance inequality.

Proof of Proposition 3

The difference operator \(\varGamma ^+\) clearly satisfies \(\varGamma ^+(g^2) \le 2g \varGamma ^+(g)\) for all positive functions g, and a \(\varGamma ^+-\mathrm {mLSI}(8)\) holds. Moreover, as we will see in the proof of Lemma 2, we have \(\varGamma ^+(d_T(\cdot , A)) \le 1\). Thus, by [26, (3.6)], for all \(\lambda \in [0,1/16)\) it holds that

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa (A) {{\,\mathrm{\mathbb {E}}\,}}_\kappa \exp \left( \lambda d_T(\cdot ,A)^2 \right) \le {{\,\mathrm{\mathbb {P}}\,}}_\kappa (A) \exp \left( \frac{\lambda }{1-16\lambda } {{\,\mathrm{\mathbb {E}}\,}}_\kappa d_T(\cdot ,A)^2 \right) . \end{aligned}$$

Furthermore, Lemma 2 shows that

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa (A) \exp \left( \frac{{{\,\mathrm{\mathbb {E}}\,}}_\kappa d_T(\cdot ,A)^2}{128} \right) \le 1. \end{aligned}$$

So, for \(\lambda = 1/144\) (for which \(\lambda /(1-16\lambda ) = 1/128\)) we have

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa (A) {{\,\mathrm{\mathbb {E}}\,}}_\kappa \exp \left( \frac{d_T(\cdot ,A)^2}{144} \right) \le {{\,\mathrm{\mathbb {P}}\,}}_\kappa (A) \exp \left( \frac{1}{128} {{\,\mathrm{\mathbb {E}}\,}}_\kappa d_T(\cdot ,A)^2 \right) \le 1. \end{aligned}$$

\(\square \)

Proofs of Corollaries 4 and 6

These corollaries follow in exactly the same way as the proof of [28, Theorem 3]. The only difference is to note that for \(A :=\{ f \le \mathrm {med}(f) \}\), any \(y \in A\) and any x such that \(f(x) \ge \mathrm {med}(f) + t\) we have

$$\begin{aligned} t \le \mathrm {med}(f) + t - f(y) \le f(x) - f(y) \le L |x-y | \le L |{\mathcal {X}} | \sup _{\alpha \in {{\,\mathrm{\mathbb {R}}\,}}^n : |\alpha | = 1} \sum _{i= 1}^n \alpha _i \mathbbm {1}_{x_i \ne y_i}, \end{aligned}$$

so that

$$\begin{aligned} f(x) \ge \mathrm {med}(f) + t \Rightarrow d_T(x, A) \ge t/(|{\mathcal {X}} | L). \end{aligned}$$

\(\square \)
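
For the reader’s convenience, the resulting tail bound can be traced as follows (a sketch with the constant 144 from Proposition 3; the exact constants in the corollaries follow the route of [28, Theorem 3]). By Markov’s inequality, Proposition 3 and \({{\,\mathrm{\mathbb {P}}\,}}_\kappa (A) \ge 1/2\) for \(A = \{ f \le \mathrm {med}(f) \}\),

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa (f \ge \mathrm {med}(f) + t) \le {{\,\mathrm{\mathbb {P}}\,}}_\kappa \left( d_T(\cdot , A) \ge \frac{t}{|{\mathcal {X}} | L} \right) \le \frac{{{\,\mathrm{\mathbb {E}}\,}}_\kappa \exp \left( d_T(\cdot ,A)^2/144\right) }{\exp \left( t^2/(144 |{\mathcal {X}} |^2 L^2)\right) } \le 2 \exp \left( - \frac{t^2}{144 |{\mathcal {X}} |^2 L^2} \right) . \end{aligned}$$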

Proof of Corollary 5

Since \(\lambda _\mathrm {max} = \Vert X \Vert _\mathrm {op}\), it is clear by the triangle inequality that \(\lambda _\mathrm {max}\) is a convex function of the \(X_{ij}\), \(i \le j\). Moreover, due to Lidskii’s inequality (cf. [11, Exercise 3.16]), \(\lambda _\mathrm {max}\) is 1-Lipschitz. It therefore remains to apply Corollary 4. \(\square \)
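
The resulting concentration of \(\lambda _\mathrm {max}\) is easy to observe in simulation. The following sketch (sizes and the entry multiset are arbitrary illustrative choices) draws the upper-triangular entries as a uniform random permutation of a fixed multiset and records the fluctuations of \(\lambda _\mathrm {max}\), which are of constant order:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 30
m = d * (d + 1) // 2                       # entries on and above the diagonal
entries = np.repeat([0.0, 1.0], [3 * m // 4, m - 3 * m // 4])  # fixed multiset

def lam_max(vals):
    iu = np.triu_indices(d)
    X = np.zeros((d, d))
    X[iu] = vals                           # fill the upper triangle
    X = X + X.T - np.diag(np.diag(X))      # symmetrize, diagonal counted once
    return np.linalg.eigvalsh(X)[-1]       # largest eigenvalue

samples = [lam_max(rng.permutation(entries)) for _ in range(2000)]
print(np.median(samples), np.std(samples))  # O(1) fluctuations around the median
```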

Proof of Lemma 2

Rewriting [25, Lemma 1], for any positive function g it holds that

$$\begin{aligned} \mathrm {Ent}_\kappa (g)&\le \frac{1}{2N} \sum _{i,j} {{\,\mathrm{\mathbb {E}}\,}}_\kappa (g(\tau _{ij}\omega ) - g(\omega ))(\log g(\tau _{ij}\omega ) - \log g(\omega )) \\&= \frac{1}{N} \sum _{i,j} {{\,\mathrm{\mathbb {E}}\,}}_\kappa (g(\tau _{ij}\omega ) - g(\omega )) (\log g(\tau _{ij}\omega ) - \log g(\omega ))_+. \end{aligned}$$

Using this, we obtain for any \(\lambda \in [0,1]\)

$$\begin{aligned} \mathrm {Ent}_\kappa (e^{-\lambda f})&\le \frac{\lambda }{N} {{\,\mathrm{\mathbb {E}}\,}}_\kappa \sum _{i,j} (f(\omega ) - f(\tau _{ij}\omega ))_+ \left( \exp (-\lambda f(\tau _{ij}\omega )) - \exp (-\lambda f(\omega )) \right) \\&= \frac{\lambda }{N} {{\,\mathrm{\mathbb {E}}\,}}_\kappa \sum _{i,j} (f(\omega ) - f(\tau _{ij} \omega ))_+ (\exp (\lambda (f(\omega ) - f(\tau _{ij}\omega )))-1) e^{-\lambda f(\omega )} \\&= \frac{\lambda }{N} {{\,\mathrm{\mathbb {E}}\,}}_\kappa \sum _{i,j} (f(\omega ) - f(\tau _{ij}\omega ))_+ \varPsi (\lambda (f(\omega ) - f(\tau _{ij}\omega ))) e^{-\lambda f(\omega )}, \end{aligned}$$

where \(\varPsi (x) :=e^x - 1\). Since \(\varPsi \) is convex with \(\varPsi (0) = 0\) and \(\varPsi (1) = e - 1 \le 2\), it lies below its chord on [0, 1], so that \(\varPsi (x) \le 2x\) for all \(x \in [0,1]\). Therefore, recalling that by assumption 2 we have \(f(\omega ) - f(\tau _{ij}\omega ) \le 1\) (and the positive part ensures that only terms with \(f(\omega ) - f(\tau _{ij}\omega ) \ge 0\) contribute), and using assumption 1 in the last step, we obtain

$$\begin{aligned} \mathrm {Ent}_\kappa \left( e^{-\lambda f}\right)&\le \frac{2\lambda ^2}{N} {{\,\mathrm{\mathbb {E}}\,}}_\kappa \sum _{i,j} \left( f(\omega ) - f(\tau _{ij}\omega )\right) _+^2 e^{-\lambda f(\omega )}\\&= 8\lambda ^2 {{\,\mathrm{\mathbb {E}}\,}}_\kappa \varGamma ^+(f)^2 e^{-\lambda f} \le 8\lambda ^2 {{\,\mathrm{\mathbb {E}}\,}}_\kappa f e^{-\lambda f}. \end{aligned}$$

Since \(x \mapsto e^{-\lambda x}\) is non-increasing, f and \(e^{-\lambda f}\) are negatively correlated, i.e., \({{\,\mathrm{\mathbb {E}}\,}}f e^{-\lambda f} \le {{\,\mathrm{\mathbb {E}}\,}}f \, {{\,\mathrm{\mathbb {E}}\,}}e^{-\lambda f}\), which yields

$$\begin{aligned} \mathrm {Ent}_\kappa (e^{-\lambda f}) \le 8\lambda ^2 {{\,\mathrm{\mathbb {E}}\,}}_\kappa f {{\,\mathrm{\mathbb {E}}\,}}_\kappa e^{-\lambda f}. \end{aligned}$$

In other words, if we set \(h(\lambda ) :={{\,\mathrm{\mathbb {E}}\,}}_\kappa e^{-\lambda f}\), we have

$$\begin{aligned} \left( \frac{\log h(\lambda )}{\lambda } \right) ' \le 8{{\,\mathrm{\mathbb {E}}\,}}_\kappa f, \end{aligned}$$

which by the fundamental theorem of calculus implies for all \(\lambda \in [0,1]\)

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}_\kappa \exp \left( \lambda ({{\,\mathrm{\mathbb {E}}\,}}_\kappa f - f) \right) \le \exp \left( 8\lambda ^2 {{\,\mathrm{\mathbb {E}}\,}}_\kappa f \right) . \end{aligned}$$
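
To spell out the last two steps: one checks that \(\mathrm {Ent}_\kappa (e^{-\lambda f}) = \lambda h'(\lambda ) - h(\lambda ) \log h(\lambda )\), so that the entropy bound is exactly the stated derivative inequality, and moreover

$$\begin{aligned} \lim _{\lambda \downarrow 0} \frac{\log h(\lambda )}{\lambda } = \frac{h'(0)}{h(0)} = -{{\,\mathrm{\mathbb {E}}\,}}_\kappa f, \qquad \text {so that} \qquad \log h(\lambda ) \le -\lambda {{\,\mathrm{\mathbb {E}}\,}}_\kappa f + 8\lambda ^2 {{\,\mathrm{\mathbb {E}}\,}}_\kappa f \end{aligned}$$

for all \(\lambda \in [0,1]\), which is equivalent to the display above.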

So, for any \(t \in [0, {{\,\mathrm{\mathbb {E}}\,}}_\kappa f]\), by Markov’s inequality and setting \(\lambda = \frac{t}{16 {{\,\mathrm{\mathbb {E}}\,}}_\kappa f} \in [0, 1/16]\)

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_\kappa ({{\,\mathrm{\mathbb {E}}\,}}_\kappa f - f \ge t) \le \exp \left( -\lambda t + 8\lambda ^2 {{\,\mathrm{\mathbb {E}}\,}}_\kappa f\right) = \exp \left( - \frac{t^2}{32{{\,\mathrm{\mathbb {E}}\,}}_\kappa f} \right) . \end{aligned}$$

The second part follows from the non-negativity of f by choosing \(t = {{\,\mathrm{\mathbb {E}}\,}}_\kappa f\), since then \(\{ {{\,\mathrm{\mathbb {E}}\,}}_\kappa f - f \ge t \} = \{ f = 0 \}\).

It remains to check that \(f(\omega ) = \frac{1}{4} d_T(\omega , A)^2\) satisfies the two conditions of this lemma. To this end, we first show that \(\varGamma ^+(d_T(\cdot ,A))^2 \le 1\). Writing \(g(\omega ) :=d_T(\omega , A)\), it is well known (see [9]) that by Sion’s minimax theorem, we have

$$\begin{aligned} g(\omega ) = \inf _{\nu \in {\mathcal {M}}(A)} \sup _{\alpha \in {{\,\mathrm{\mathbb {R}}\,}}^N : |\alpha | = 1} \sum _{k = 1}^N \alpha _k \nu \left( \omega ' : \omega '_k \ne \omega _k\right) , \end{aligned}$$
(19)

where \({\mathcal {M}}(A)\) is the set of all probability measures on A. To estimate \(\varGamma ^+(g)^2(\omega )\), one has to compare \(g(\omega )\) and \(g(\tau _{ij}\omega )\). To this end, for any \(\omega \in \varOmega _\kappa \) fixed, let \({\tilde{\alpha }}, {\tilde{\nu }}\) be parameters for which the value \(g(\omega )\) is attained, and let \({\hat{\nu }} = {\hat{\nu }}_{ij}\) be a minimizer of \(\inf _{\nu \in {\mathcal {M}}(A)} \sum _{k = 1}^N {\tilde{\alpha }}_k \nu (\omega ' : \omega '_k \ne (\tau _{ij}\omega )_k)\). This leads to

$$\begin{aligned} \varGamma ^+ (g)(\omega )^2&\le \frac{1}{4N} \sum _{i,j = 1}^N \left( \sum _{k = 1}^N {\tilde{\alpha }}_k \left( {\hat{\nu }}\left( \omega '_k \ne \omega _k\right) - {\hat{\nu }}\left( \omega '_k \ne \left( \tau _{ij}\omega \right) _k\right) \right) \right) _+^2 \\&\le \frac{1}{2N} \sum _{i,j = 1}^N \left( {\tilde{\alpha }}_i^2 + {\tilde{\alpha }}_j^2\right) \le 1. \end{aligned}$$

Using this as well as \(\varGamma ^+(g^2) \le 2g\varGamma ^+(g)\) for all positive functions g, we have

$$\begin{aligned} \varGamma ^+(f)^2 = \frac{1}{16} \varGamma ^+\left( d_T(\cdot , A)^2\right) ^2 \le \frac{1}{4} d_T(\cdot ,A)^2 \varGamma ^+\left( d_T(\cdot ,A)\right) ^2 \le f. \end{aligned}$$

To show the second property, we proceed similarly to [10, Proof of Lemma 1]. By (19) and the Cauchy–Schwarz inequality, we have

$$\begin{aligned} f(\omega ) = \frac{1}{4} \inf _{\nu \in {\mathcal {M}}(A)} \sum _{k = 1}^N \nu \left( \omega ' : \omega '_k \ne \omega _k\right) ^2. \end{aligned}$$
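
This representation also makes the convex distance easy to evaluate numerically for small sets A: it shows that \(d_T(\omega , A)\) is the Euclidean distance from the origin to the convex hull of the indicator vectors \((\mathbbm {1}_{\omega _k \ne y_k})_k\), \(y \in A\). A minimal sketch (assuming scipy is available; the set A below is a toy example):

```python
import numpy as np
from scipy.optimize import minimize

def convex_distance(w, A):
    # rows of U are the indicator vectors (1_{w_k != y_k})_k for y in A
    U = np.array([[wk != yk for wk, yk in zip(w, y)] for y in A], float)
    obj = lambda nu: float(np.sum((nu @ U) ** 2))    # |sum_y nu(y) u_y|^2
    cons = ({'type': 'eq', 'fun': lambda nu: nu.sum() - 1.0},)
    res = minimize(obj, np.full(len(A), 1.0 / len(A)),
                   bounds=[(0.0, 1.0)] * len(A), constraints=cons)
    return np.sqrt(res.fun)

A = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 0, 0, 1)]       # toy subset of a multislice
print(convex_distance((1, 1, 0, 0), A))              # approx. sqrt(3/2)
```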

Assuming without loss of generality that \(f(\omega ) \ge f(\tau _{ij}\omega )\), choose \({\hat{\nu }} = {\hat{\nu }}_{ij} \in {\mathcal {M}}(A)\) such that the value of \(f(\tau _{ij}\omega )\) is attained. It follows that

$$\begin{aligned} f(\omega ) - f\left( \tau _{ij}\omega \right) \le \frac{1}{4}\sum _{k=1}^N \left( {\hat{\nu }}\left( \omega '_k \ne \omega _k\right) ^2 - {\hat{\nu }} \left( \omega '_k \ne \left( \tau _{ij}\omega \right) _k\right) ^2 \right) \le \frac{1}{2}, \end{aligned}$$

which finishes the proof. \(\square \)

Proof of Theorem 2

Since A is a symmetric set, \(\omega \mapsto d_T(\omega ,A)\) is a symmetric function, as can be seen from the definition

$$\begin{aligned} d_T(\omega , A) = \sup _{\alpha \in {{\,\mathrm{\mathbb {R}}\,}}^n : |\alpha | = 1} \inf _{\omega ' \in A} \sum _{i = 1}^n |\alpha _i | \mathbbm {1}_{\omega _i \ne \omega _i'}. \end{aligned}$$

As in (19), we may use Sion’s minimax theorem to rewrite \(d_T\) as

$$\begin{aligned} d_T(\omega ,A) = \inf _{\nu \in {\mathcal {M}}(A)} \sup _{\alpha \in {{\,\mathrm{\mathbb {R}}\,}}^n : |\alpha | = 1} \sum _{k = 1}^n \alpha _k \nu ( \omega ' : \omega '_k \ne \omega _k ). \end{aligned}$$

As in the proof of Lemma 2, let \({\tilde{\nu }}, {\tilde{\alpha }}\) be the parameters for which the value \(d_T(\omega ,A)\) is attained, and let \({\hat{\nu }}\), \({\hat{\omega }}_i'\) be minimizers of \(\inf _{\omega _i'} \inf _{\nu \in {\mathcal {M}}(A)} \sum _{k = 1}^n {\tilde{\alpha }}_k \nu ( \eta : \eta _k \ne (\omega _{i^c}, \omega _i')_k)\). We then have

$$\begin{aligned} \mathfrak {h}^+(d_T(\omega , A))^2&= \frac{1}{2} \sum _{i = 1}^n \left( d_T(\omega ,A) - \inf _{\omega _i'} d_T\left( \left( \omega _{i^c}, \omega _i'\right) , A \right) \right) _+^2 \\&\le \frac{1}{2} \sum _{i = 1}^n \left( \sum _{k = 1}^n {\tilde{\alpha }}_k {\hat{\nu }}\left( \eta : \eta _k \ne \omega _k\right) - \sum _{k = 1}^n {\tilde{\alpha }}_k {\hat{\nu }}\left( \eta : \eta _k \ne \left( \omega _{i^c}, {\hat{\omega }}_i'\right) _k\right) \right) _+^2 \\&\le \frac{1}{2} \sum _{i = 1}^n {\tilde{\alpha }}_i^2 = \frac{1}{2}. \end{aligned}$$

Recall that by Proposition 7, \({{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\) satisfies an \(\mathfrak {h}^+-\mathrm {LSI}(8(1-\frac{n}{N}))\) on the set of all symmetric functions. As a consequence, using (11) again, we obtain the sub-Gaussian estimate

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}\left( d_T(\cdot , A) - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} d_T(\cdot , A) \ge t \right) \le \exp \left( - \frac{t^2}{8(1-n/N)} \right) . \end{aligned}$$

In the next step, we observe that by the Poincaré inequality we have

$$\begin{aligned} \mathrm {Var}(d_T(\cdot ,A)) \le 8 (1-n/N){{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} \mathfrak {h}^+(d_T(\cdot ,A))^2 \le 4(1-n/N). \end{aligned}$$

Hence, Chebyshev’s inequality leads to

$$\begin{aligned} \left( {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} d_T(\cdot , A)\right) ^2 {{\,\mathrm{\mathbb {P}}\,}}_{\kappa , n}\left( d_T(\cdot ,A) - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} d_T(\cdot , A) \le - {{\,\mathrm{\mathbb {E}}\,}}_{\kappa ,n} d_T(\cdot , A)\right) \le 4(1-n/N). \end{aligned}$$

Using that \({{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}(A) \ge 1/2\) and that \(d_T(\cdot , A) = 0\) on A, we therefore have \({{\,\mathrm{\mathbb {E}}\,}}_{\kappa , n} d_T(\cdot , A) \le \sqrt{8(1-n/N)}\). Finally, since \((t-a)^2 \ge t^2/2 - a^2\) for any \(a \in {{\,\mathrm{\mathbb {R}}\,}}\), we obtain for \(t \ge \sqrt{8(1-n/N)}\)

$$\begin{aligned} {{\,\mathrm{\mathbb {P}}\,}}_{\kappa ,n}(d_T(\cdot , A) \ge t) \le \exp \left( - \frac{(t-\sqrt{8(1-n/N)})^2}{8(1-n/N)} \right) \le \exp \left( - \frac{t^2}{16(1-n/N)} + 1 \right) . \end{aligned}$$

For \(t \le \sqrt{8(1-n/N)}\), the right-hand side is at least 1, so the inequality holds trivially, which finishes the proof. \(\square \)