Distributional stability of the Szarek and Ball inequalities

We prove an extension of Szarek's optimal Khinchin inequality (1976) for distributions close to the Rademacher one, when all the weights are uniformly bounded by a $1/\sqrt2$ fraction of their total $\ell_2$-mass. We also show a similar extension of the probabilistic formulation of Ball's cube slicing inequality (1986). These results establish the distributional stability of these optimal Khinchin-type inequalities. The underpinning to such estimates is the Fourier-analytic approach going back to Haagerup (1981).


Introduction
Let $\varepsilon_1, \varepsilon_2, \dots$ be independent identically distributed (i.i.d.) Rademacher random variables, that is, symmetric random signs satisfying $\mathbb{P}(\varepsilon_j = \pm 1) = \frac12$. Motivated by his study of bilinear forms on infinitely many variables, Littlewood conjectured in [26] (see also [15]) the following inequality: for every $n \ge 1$ and every unit vector $a$ in $\mathbb{R}^n$, we have
$$\mathbb{E}\Big|\sum_{j=1}^n a_j\varepsilon_j\Big| \ge \frac{1}{\sqrt2}, \tag{1}$$
which is clearly best possible. Not until 46 years after it had been posed was this proved by Szarek in [34]. His result was later generalised in a stunning way to the setting of vector-valued coefficients $a_j$ in an arbitrary normed space by Latała and Oleszkiewicz in [24] (see also [30, Section 4.2] for a modern presentation of their proof using discrete Fourier analysis). Szarek's original proof was based mainly on an intricate inductive scheme (see also [35]). Note that (1) holds trivially if $\|a\|_\infty = \max_j |a_j| \ge \frac{1}{\sqrt2}$, for if, say, $|a_1| \ge \frac{1}{\sqrt2}$, then thanks to independence and convexity,
$$\mathbb{E}\Big|\sum_{j=1}^n a_j\varepsilon_j\Big| \ge \mathbb{E}\Big|a_1\varepsilon_1 + \mathbb{E}\sum_{j=2}^n a_j\varepsilon_j\Big| = |a_1| \ge \frac{1}{\sqrt2}.$$
Haagerup in his pioneering work [14] on Khinchin inequalities offered a very different approach to the nontrivial regime $\|a\|_\infty \le \frac{1}{\sqrt2}$. Taking that route, the point of this paper is to illustrate the robustness of Haagerup's method and extend (1) to i.i.d. sequences of random variables whose distribution is close to the Rademacher one in the $W_2$-Wasserstein distance.

Using the same framework, we also treat Ball's cube slicing inequality from [2], which asserts that the maximal-volume hyperplane section of the cube $[-1, 1]^n$ in $\mathbb{R}^n$ is attained at $(1, 1, 0, \dots, 0)^\perp$. This can be equivalently stated in probabilistic terms as an inequality akin to (1) as follows (see, e.g., equation (2) in [6]). Let $\xi_1, \xi_2, \dots$ be i.i.d. random vectors uniform on the unit Euclidean sphere in $\mathbb{R}^3$. For every $n \ge 1$ and every unit vector $a$ in $\mathbb{R}^n$, we have
$$\mathbb{E}\Big|\sum_{j=1}^n a_j\xi_j\Big|^{-1} \le \sqrt2, \tag{2}$$
where here and throughout $|\cdot|$ denotes the standard Euclidean norm.

Szarek's inequality (1), Ball's inequality (2), as well as these extensions fall under the umbrella of so-called Khinchin-type inequalities. The archetype was Khinchin's result asserting that all $L_p$-norms of Rademacher sums $\sum a_j\varepsilon_j$ are comparable to their $L_2$-norm, established in his work [22] on the law of the iterated logarithm (and perhaps discovered independently by Littlewood in [26]). Due to the intricacies of the methods involved, sharp Khinchin inequalities are known only for a handful of distributions, most notably random signs ([14, 29]), but also uniforms ([4, 5, 6, 8, 18, 21, 25]), type L ([17, 32]), Gaussian mixtures ([1, 10]), marginals of $\ell_p$-balls ([3, 11]), or distributions with good spectral properties ([23, 33]). The present work makes a first step towards more general distributions satisfying only a closeness-type assumption instead of imposing structural properties. Viewing sharp Khinchin-type inequalities as maximization problems for functionals on the sphere, our results assert, perhaps surprisingly, that such inequalities are stable with respect to perturbations of the law of the underlying random vectors. These distributional stability results are novel in the context of optimal probabilistic inequalities.
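As a quick numerical illustration of Szarek's inequality (1), the following sketch computes the expectation of $|\sum_j a_j\varepsilon_j|$ exactly by enumerating all sign patterns. The helper names, the dimension 8, the seed and the number of trials are illustrative assumptions, not part of the original argument.

```python
import itertools
import math
import random


def rademacher_l1(a):
    """Exact value of E|sum_j a_j * eps_j|, enumerating all 2^n sign patterns."""
    n = len(a)
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        total += abs(sum(s * x for s, x in zip(signs, a)))
    return total / 2 ** n


def random_unit_vector(n, rng):
    """A unit vector obtained by normalising a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility

# The extremal direction (1, 1, 0, ..., 0) / sqrt(2).
extremal = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
extremal_value = rademacher_l1(extremal)

# Smallest value observed over a few hundred random unit vectors in R^8.
smallest = min(rademacher_l1(random_unit_vector(8, rng)) for _ in range(200))

print("extremal value:", extremal_value)
print("smallest sampled value:", smallest)
print("lower bound 1/sqrt(2):", 1 / math.sqrt(2))
```

The enumeration is exponential in $n$, so this is only meant for small dimensions.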

Main results
For $p > 0$ and a random vector $X$ in $\mathbb{R}^d$, we denote its $L_p$-norm with respect to the standard Euclidean norm $|\cdot|$ on $\mathbb{R}^d$ by $\|X\|_p = (\mathbb{E}|X|^p)^{1/p}$, whereas for a (deterministic) vector $a$ in $\mathbb{R}^n$, $\|a\|_\infty = \max_{j\le n} |a_j|$ is its $\ell_\infty$-norm. We say that the random vector $X$ in $\mathbb{R}^d$ is symmetric if $-X$ has the same distribution as $X$. We also recall that the vector $X$ is called rotationally invariant if for every orthogonal map $U$ on $\mathbb{R}^d$, $UX$ has the same distribution as $X$. Equivalently, $X$ has the same distribution as $|X|\xi$, where $\xi$ is uniformly distributed on the unit sphere $S^{d-1}$ and independent of $|X|$. Finally, the $W_p$-Wasserstein distance between the distributions of two random vectors $X$ and $Y$ in $\mathbb{R}^d$ is
$$W_p(X, Y) = \inf \|X' - Y'\|_p,$$
where the infimum is taken over all couplings of $X$ and $Y$, that is, all random vectors $(X', Y')$ in $\mathbb{R}^{2d}$ such that $X'$ has the same distribution as $X$ and $Y'$ has the same distribution as $Y$.
Our first result is an extension of Szarek's inequality (1) which reads as follows.
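Theorem 1. There is a positive universal constant $\delta_0$ such that if $X_1, X_2, \dots$ are i.i.d. symmetric random variables satisfying
$$\big\||X_1| - 1\big\|_2 \le \delta_0, \tag{3}$$
then for every $n \ge 3$ and every unit vector $a$ in $\mathbb{R}^n$ with $\|a\|_\infty \le \frac{1}{\sqrt2}$, we have
$$\mathbb{E}\Big|\sum_{j=1}^n a_jX_j\Big| \ge \mathbb{E}\Big|\frac{X_1 + X_2}{\sqrt2}\Big|. \tag{4}$$
Moreover, we can take $\delta_0 = 10^{-4}$.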
Note that the left-hand side of (3) is nothing but the $W_2$-Wasserstein distance between the distribution of $X_1$ and the Rademacher distribution, since $|x \pm 1| \ge \big||x| - 1\big|$ for $x \in \mathbb{R}$, and thus the optimal coupling of the two distributions is $\big(X_1, \operatorname{sign}(X_1)\big)$.
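The identification of the left-hand side of (3) with a $W_2$-distance can be checked on a toy example. The sketch below uses the standard fact that, on the real line, the monotone (sorted) coupling is optimal for the quadratic cost; the four-atom law and the function names are hypothetical choices made for illustration.

```python
import math


def w2_equal_weight_atoms(xs, ys):
    """W_2 between two uniform empirical measures on R with equally many atoms.

    On the real line the monotone (sorted) coupling is optimal for quadratic cost.
    """
    if len(xs) != len(ys):
        raise ValueError("both measures must have the same number of atoms")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


def distance_to_rademacher(xs):
    """The quantity || |X| - 1 ||_2 from (3), for X uniform on the atoms xs."""
    return math.sqrt(sum((abs(x) - 1.0) ** 2 for x in xs) / len(xs))


# Hypothetical symmetric law: mass 1/4 at each of -1.1, -0.9, 0.9, 1.1.
atoms = [-1.1, -0.9, 0.9, 1.1]
# Rademacher law written with the same number of equally weighted atoms.
signs = [-1.0, -1.0, 1.0, 1.0]

w2_value = w2_equal_weight_atoms(atoms, signs)
lhs_of_3 = distance_to_rademacher(atoms)
print(w2_value, lhs_of_3)
```

Here the sorted coupling pairs each atom with its own sign, which is exactly the coupling $(X_1, \operatorname{sign}(X_1))$.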
Our second main result provides an analogous extension for Ball's inequality (2).
Theorem 2. Let $X_1, X_2, \dots$ be i.i.d. symmetric random vectors in $\mathbb{R}^3$. Suppose their common characteristic function $\varphi(t) = \mathbb{E}e^{i\langle t, X_1\rangle}$ satisfies where $C_1 = \max\{C_0, 1\}$ and $\xi$ is a random vector uniform on the unit Euclidean sphere $S^2$ in $\mathbb{R}^3$. Then for every $n \ge 3$ and every unit vector $a$ in $\mathbb{R}^n$ with $\|a\|_\infty \le \frac{1}{\sqrt2}$, we have .
Plainly, if we know that $X_1$ and $\xi$ are sufficiently close in $W_3$, then the parameter $\mathbb{E}|X_1|^3$ in (6) is redundant. In contrast to Theorem 1, here the closeness assumption (6) is put in terms of two parameters of the distribution: its third moment and the polynomial decay of its characteristic function. It is not clear whether this is essential. At the technical level of our proofs, the third moment is needed to carry out a certain Gaussian approximation, whilst the decay assumption has to do with an a priori lack of integrability in the Fourier-analytic representation of the $L_{-1}$-norm (as opposed to the $L_1$-norm handled in Theorem 1).
On the other hand, neither of these is very restrictive. In particular, if $X_1$ has a density $f$ on $\mathbb{R}^3$ vanishing at $\infty$ whose gradient is integrable, then Another natural sufficient condition is the rotational invariance of $X_1$: if, say, $X_1$ has the same distribution as $R\xi$, for a nonnegative random variable $R$ and a random vector $\xi$, independent of $R$ and uniform on the unit sphere $S^2$, then Archimedes' Hat-Box theorem implies that $\langle t, R\xi\rangle$, conditioned on the value of $R$, is uniform on $[-R|t|, R|t|]$ and thus Moreover, in this case $W_2(X_1, \xi) = \|R - 1\|_2$ (since for all unit vectors $\theta, \theta'$ in $\mathbb{R}^d$ and $R \ge 0$, we have $|R\theta - \theta'| \ge |R - 1|$, as is easily seen by squaring).

Probabilistically, this is an important special case as it yields results for symmetric unimodal distributions on $\mathbb{R}$. Indeed, if $X$ is of the form $R\xi$ as above, for $q > -1$, we have the identity where the $R_j$ are i.i.d. copies of $R$ and the $U_j$ are i.i.d. uniform random variables on $[-1, 1]$, independent of the $R_j$ (see Proposition 4 in [19]). The $R_jU_j$ showing up in this formula can have any symmetric unimodal distribution, uniquely defined by the distribution of $R_j$. Thus, if $V_1, V_2, \dots$ are i.i.d. symmetric unimodal random variables, Theorem 2 then immediately yields a sharp upper bound on $\lim_{q\downarrow -1}(1 + q)\,\mathbb{E}\big|\sum_{j=1}^n a_jX_j\big|^q$ for all unit vectors $a$ with $\|a\|_\infty \le \frac{1}{\sqrt2}$ (cf. [6, 5, 11, 25]).

A result in the same vein as Theorem 2 is König and Koldobsky's extension [19] of Ball's cube slicing inequality to product measures with densities satisfying certain regularity and moment assumptions. Their result also applies specifically to vectors of weights satisfying the small coefficient condition $\|a\|_\infty \le \frac{1}{\sqrt2}$. Approached differently, full extensions of (1) and (2) (i.e. without the small coefficient restriction on $a$) have been obtained in our recent work [12] for a very special family of distributions corresponding geometrically to extremal sections and projections of $\ell_p$-balls.
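The Hat-Box property used above can be probed by simulation. If $\langle t, \xi\rangle$ is uniform on $[-|t|, |t|]$, then the characteristic function of $\xi$ at $t$ equals $\sin|t|/|t|$, and the first coordinate of $\xi$ has second moment $1/3$. The following Monte Carlo sketch compares both with empirical averages; the sample size, the seed and the test vector `t` are arbitrary illustrative choices.

```python
import math
import random


def uniform_on_sphere(rng):
    """A point uniform on the unit sphere S^2, via a normalised Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def empirical_cf(t, samples):
    """Empirical average of cos(<t, x>); the sine part vanishes by symmetry."""
    total = 0.0
    for x in samples:
        total += math.cos(sum(ti * xi for ti, xi in zip(t, x)))
    return total / len(samples)


rng = random.Random(1)  # fixed seed, chosen arbitrarily
samples = [uniform_on_sphere(rng) for _ in range(100_000)]

t = [1.0, -2.0, 2.0]  # a test vector with |t| = 3
r = math.sqrt(sum(ti * ti for ti in t))

estimate = empirical_cf(t, samples)
exact = math.sin(r) / r  # characteristic function of a uniform law on [-r, r]

# Second moment of the first coordinate; a uniform law on [-1, 1] gives 1/3.
first_coord_second_moment = sum(x[0] ** 2 for x in samples) / len(samples)

print(estimate, exact, first_coord_second_moment)
```

The agreement is only up to Monte Carlo error of order $N^{-1/2}$ for sample size $N$.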
Acknowledgements. We should very much like to thank an anonymous referee for their careful reading of the manuscript and helpful suggestions, particularly the one leading to Remark 5.

Proof of Theorem 1
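Our approach builds on Haagerup's slick Fourier-analytic proof from [14]. We let
$$\varphi(t) = \mathbb{E}e^{itX_1}, \qquad t \in \mathbb{R}, \tag{9}$$
be the characteristic function of $X_1$. Using the elementary Fourier-integral representation
$$|x| = \frac{2}{\pi}\int_0^\infty \frac{1 - \cos(tx)}{t^2}\,dt, \qquad x \in \mathbb{R},$$
as well as the symmetry and independence of the $X_j$, we have
$$\mathbb{E}\Big|\sum_{j=1}^n a_jX_j\Big| = \frac{2}{\pi}\int_0^\infty \Big[1 - \prod_{j=1}^n \varphi(a_jt)\Big]\frac{dt}{t^2} \tag{10}$$
(see also Lemma 1.2 in [14]). If $a$ is a unit vector in $\mathbb{R}^n$ with nonzero components, using the AM-GM inequality in the form $\prod_j |\varphi(a_jt)| \le \sum_j a_j^2|\varphi(a_jt)|^{a_j^{-2}}$, we obtain Haagerup's lower bound
$$\mathbb{E}\Big|\sum_{j=1}^n a_jX_j\Big| \ge \sum_{j=1}^n a_j^2\,\Psi\big(a_j^{-2}\big), \tag{11}$$
where
$$\Psi(s) = \frac{2}{\pi}\int_0^\infty \Big[1 - \big|\varphi\big(t/\sqrt{s}\big)\big|^s\Big]\frac{dt}{t^2}, \qquad s > 0, \tag{12}$$
(see Lemma 1.3 in [14]). The crucial lemma reads as follows.

Lemma 3. Under the assumptions of Theorem 1, we have $\Psi(s) \ge \Psi(2)$ for every $s \ge 2$.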
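If we take the lemma for granted, the proof of Theorem 1 is finished because the small coefficient assumption $\|a\|_\infty \le \frac{1}{\sqrt2}$ gives $a_j^{-2} \ge 2$ for each $j$, and as a result we get
$$\mathbb{E}\Big|\sum_{j=1}^n a_jX_j\Big| \ge \sum_{j=1}^n a_j^2\,\Psi\big(a_j^{-2}\big) \ge \sum_{j=1}^n a_j^2\,\Psi(2) = \Psi(2) = \mathbb{E}\Big|\frac{X_1 + X_2}{\sqrt2}\Big|,$$
where the last equality is justified by (10).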
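It remains to prove Lemma 3. To this end, we recall that if the $X_j$ were Rademacher random variables, then the special function $\Psi$ becomes
$$\Psi_0(s) = \frac{2}{\pi}\int_0^\infty \Big[1 - \big|\cos\big(t/\sqrt{s}\big)\big|^s\Big]\frac{dt}{t^2}. \tag{13}$$
Haagerup showed that for every $s > 0$,
$$\Psi_0(s) = \frac{2}{\sqrt{\pi s}}\,\frac{\Gamma\big(\frac{s+1}{2}\big)}{\Gamma\big(\frac{s}{2}\big)} = \sqrt{\frac{2}{\pi}}\,\prod_{k=0}^\infty \Big(1 - \frac{1}{(s + 2k + 1)^2}\Big)^{1/2}, \tag{14}$$
and concluded by the product representation that $\Psi_0$ is strictly increasing. In particular, Lemma 3 holds in the Rademacher case due to monotonicity. The rest of the proof builds exactly on this observation: we show that the closeness of distributions guarantees that $\Psi$ and $\Psi_0$ are close for, say, $s \ge 3$, and that their derivatives are close for $2 \le s \le 3$.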
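The monotonicity of $\Psi_0$ and the size of the gap $\Psi_0(3) - \Psi_0(2)$ are easy to inspect numerically. The sketch below assumes Haagerup's closed form $\Psi_0(s) = \frac{2}{\sqrt{\pi s}}\,\Gamma\big(\frac{s+1}{2}\big)/\Gamma\big(\frac{s}{2}\big)$ for the Rademacher special function; the function name `psi0` and the grid are illustrative choices.

```python
import math


def psi0(s):
    """Assumed closed form: 2 / sqrt(pi * s) * Gamma((s + 1) / 2) / Gamma(s / 2).

    The ratio of Gamma functions is computed through lgamma to avoid overflow.
    """
    log_ratio = math.lgamma((s + 1) / 2) - math.lgamma(s / 2)
    return 2.0 / math.sqrt(math.pi * s) * math.exp(log_ratio)


# Values on a grid of step 0.05 covering [2, 22].
grid = [2.0 + 0.05 * k for k in range(401)]
values = [psi0(s) for s in grid]
is_increasing = all(u < v for u, v in zip(values, values[1:]))

# The gap that governs how small the perturbation parameter has to be.
gap = psi0(3.0) - psi0(2.0)

print("psi0(2) =", psi0(2.0), " 1/sqrt(2) =", 1 / math.sqrt(2))
print("increasing on the grid:", is_increasing)
print("psi0(3) - psi0(2) =", gap)
print("psi0(1e4) =", psi0(1e4), " sqrt(2/pi) =", math.sqrt(2 / math.pi))
```

The value at $s = 2$ matches the constant in (1), and the limit as $s \to \infty$ is the Gaussian value $\sqrt{2/\pi}$.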
Crucially, not only do we know that Ψ 0 is strictly monotone, but also we can get a good bound on its derivative near the endpoint s = 2, which we record now for future use.
Lemma 4. We have
Proof. Differentiating Haagerup's product expression (14) term-by-term yields

The rest of this section is devoted to the proof of Lemma 3. We break it into several parts.

3.1. A uniform bound on the characteristic function.
Lemma 5. Let $X$ be a symmetric random variable satisfying (3). Then its characteristic function $\varphi(t) = \mathbb{E}e^{itX}$ satisfies
Proof. By symmetry, the triangle inequality and the bound $|\sin u| \le |u|$, we get using the Cauchy-Schwarz inequality in the last estimate. Moreover, Plugging in the assumption $\big\||X| - 1\big\|_2 \le \delta_0$ completes the proof.

3.2. Uniform bounds on the special function and its derivative.
Lemma 6. Assuming (3) and the symmetry of $X_1$, the functions $\Psi$ and $\Psi_0$ defined in (12) and (13) respectively satisfy
Proof. Fix $T > 0$. Breaking the integral defining $\Psi$ into We also have Optimizing over the parameter $T$ gives the desired bound.
Lemma 7. For $s \ge 2$ and $0 < u, v < 1$, we have
Proof. Let $f(x) = x^s\log x$. It suffices to prove that on $(0, 1)$ we have To prove this observe that for $t \in (0, 1)$ we have $\alpha t\log t + t \le t \le 1$ and

Lemma 8. Assuming (3) and the symmetry of $X_1$, the functions $\Psi$ and $\Psi_0$ defined in (12) and (13) satisfy
Proof. Changing the variables and differentiating gives To estimate the integral, we proceed along the same lines as in the proof of Lemma 6. We fix $T > 0$, write Altogether, with the aid of Lemma 6, Minimising the second term over $T > 0$ leads to the bound by

For $s \ge 2$, we have It is now clear that as long as $\delta_0$ is sufficiently small, namely $2\eta \le \Psi_0(3) - \Psi_0(2)$, we get $\Psi(s) \ge \Psi(2)$, as desired. It can be checked that $\Psi_0(3)$ . and a choice of $\delta_0 \le 10^{-4}$ suffices for the estimate $\Psi(s) \ge \Psi(2)$ to hold for $s \ge 3$. Now we assume that $2 < s < 3$. We have for some $2 < \theta < s$. Using Lemmas 8 and 4, we get which is positive for all $\delta_0 \le 3.7 \cdot 10^{-4}$. Thus, $\Psi(s) \ge \Psi(2)$ holds in both cases.
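The estimates of this section rest on elementary bounds for the function $u \mapsto u^s\log u$ on $(0, 1)$, for instance $|u^s\log u| \le \frac{1}{es}$. A brute-force grid check of this bound, with the maximiser $u = e^{-1/s}$ found by differentiation, can be sketched as follows; the grid size and the sampled values of $s$ are illustrative choices.

```python
import math


def sup_abs_us_log_u(s, grid_size=20_000):
    """Grid approximation of the supremum of |u^s * log(u)| over 0 < u < 1."""
    best = 0.0
    for k in range(1, grid_size):
        u = k / grid_size
        best = max(best, abs(u ** s * math.log(u)))
    return best


results = {}
for s in (2.0, 2.5, 3.0, 10.0):
    bound = 1.0 / (math.e * s)
    # The derivative u^(s-1) * (s * log(u) + 1) vanishes at u = exp(-1/s).
    u_star = math.exp(-1.0 / s)
    attained = abs(u_star ** s * math.log(u_star))
    results[s] = (sup_abs_us_log_u(s), attained, bound)
    print(s, results[s])
```

The grid supremum never exceeds the bound, and the bound is attained at the critical point.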

Proof of Theorem 2
The approach is the same as for Theorem 1; however, certain technical details are substantially more involved. We begin with a Fourier-analytic representation for negative moments due to Gorin and Favorov [13].
Specialised to $d = 3$, $q = -1$ ($\beta_{-1,3} = \frac{1}{2\pi^2}$) and $X = \sum_{j=1}^n a_jX_j$ with $X_1, \dots, X_n$ independent random vectors, we obtain
$$\mathbb{E}\Big|\sum_{j=1}^n a_jX_j\Big|^{-1} = \frac{1}{2\pi^2}\int_{\mathbb{R}^3} \Big(\prod_{j=1}^n \varphi_{X_j}(a_jt)\Big)\,\frac{dt}{|t|^2}, \tag{20}$$
where $\varphi_{X_j}$ is the characteristic function of $X_j$. Note that thanks to the decay assumption (5), the integral on the right-hand side converges as long as $n \ge 2$ (assuming the $a_j$ are nonzero). As in Ball's proof from [2], Hölder's inequality yields denoting the characteristic function of $X_1$. Exactly as in the proof of Theorem 1, the following pivotal lemma allows us to finish the proof.
Our proof of Lemma 10 relies on this, additional bounds on the derivative $\Phi_0'(s)$ near $s = 2$, as well as, crucially, bounds quantifying how close $\Phi$ is to $\Phi_0$. In the following subsections we gather such results and then conclude with the proof of Lemma 10.

4.1. A uniform bound on the characteristic function. Throughout these sections $\xi$ always denotes a random vector uniform on the unit sphere $S^2$ in $\mathbb{R}^3$.
Lemma 11. Let $X$ be a symmetric random vector in $\mathbb{R}^3$ with $\delta = W_2(X, \xi)$. Then its characteristic function $\varphi(t) = \mathbb{E}e^{i\langle t, X\rangle}$ satisfies
Proof. Let $\xi$ be uniform on $S^2$ such that for the joint distribution of $(X, \xi)$, we have $\|X - \xi\|_2 = W_2(X, \xi) = \delta$. By symmetry, the bound $|\sin u| \le |u|$ and the Cauchy-Schwarz inequality (used twice), we get To conclude we use the triangle inequality

4.2. Bounds on the special function. We begin with a bound on the difference $\Phi(s) - \Phi_0(s)$ obtained from the uniform bound on the characteristic functions (Lemma 11 above).
In contrast to Lemma 6, the bound is not uniform in $s$. For $s$ not too large (the bulk), we incur the factor $s^{3/4}$. To fight it off for large values of $s$, we shall employ a Gaussian approximation. For that part to work, it is crucial that $\Phi_0(2)$

4.2.1. The bulk.
Lemma 12. Let $X$ be a symmetric random vector in $\mathbb{R}^3$ with $\delta = W_2(X, \xi)$ and characteristic function $\varphi$ satisfying (5) for some $C_0 > 0$. Let $\Phi$ and $\Phi_0$ be defined through (22) and (24) respectively. For every $s \ge 2$, we have
Proof. Given the definitions, we have We fix $T > 0$ and split the integration into two regions. By virtue of the decay assumption (5), this is at most Adding up these two bounds and optimising over $T$ yields Plugging this back gives the assertion.

The Gaussian approximation.
We now present a bound on Φ(s) which does not grow as s → ∞ that will allow us to prove Lemma 10 for s sufficiently large.

2
$(eC_0)^2 < \frac{1}{e}$ (also using, say, $\delta + 2 < 3$). Then we get

Small $t$. For $0 < u < \pi$, we have Fix $0 < \theta < \pi$. Then, first using Lemma 11 and then (28), we obtain Integrating using polar coordinates and invoking the standard tail bound the last integral gets upper bounded by Summarising, we have shown that

Very small $t$. Taylor-expanding $\varphi$ at $0$ with the Lagrange remainder, for some point $\eta$ in the segment $[0, s^{-1/2}t]$. To bound the error term, we note that We also note that in the domain $\{|t| \le \theta\sqrt{s}\}$, the leading term 1 . Assuming this, we thus get Invoking (6), let $\xi$ be uniform on $S^2$ such that $\|X - \xi\|_2 \le \delta$ with respect to some coupling. Then, for a fixed vector $v$ in $\mathbb{R}^3$, we obtain the bound where we have set $\alpha = \big(\frac{1}{\sqrt3} - \delta\big)^2 - \frac13\theta\,\mathbb{E}|X|^3$ and assumed that $\alpha$ is positive in the last equality (guaranteed by choosing $\theta$ sufficiently small). Then we finally obtain Putting these three bounds together gives the assertion. Note that we have imposed the conditions $\delta < \frac{1}{\sqrt3}$ and $\delta < (15C_0)^{-2}$ when $C_0 > \frac{\pi}{e}$, as well as $\theta < \pi$, $\theta < 3\mathbb{E}|X|^3$ implies the other two conditions on $\theta$.

4.3. Bounds on the derivative of the special function.
Lemma 14. Let $X$ be a symmetric random vector in $\mathbb{R}^3$ with $\delta = W_2(X, \xi)$ and characteristic function $\varphi$ satisfying (5) for some $C_0 > 0$. Let $\Phi$ and $\Phi_0$ be defined through (22) and (24) respectively. For every $s \ge 2$, we have
Proof. First we take the derivative, For the resulting $\Phi - \Phi_0$ term, we use Lemma 12. To bound the difference of the integrals resulting from the second term, we fix $T > 0$ and split the integration into two regions.
Small $t$. Using Lemmas 7 and 11, we obtain
Large $t$. Note that for $s \ge 2$ and $0 < u < 1$ we have, Adding up these two bounds and optimising over $T$ yields

To prove this, we build on the argument of Nazarov and Podkorytov from [31]. For a somewhat similar bound, we refer to Proposition 7 in König and Koldobsky's work [20] on maximal-perimeter sections of the cube. For convenience and completeness, we include all arguments in detail. We consider functions
$$f_a(x) = e^{-\frac{\pi}{2}x^2a}, \qquad g(x) = \Big|\frac{\sin \pi x}{\pi x}\Big|, \qquad x > 0, \tag{32}$$
and their distribution functions

Lemma 20. For $a \in [1, \frac{\pi}{3}]$ the function $F_a - G$ has precisely one sign change point $y_0$ and at this point changes sign from "$-$" to "$+$".
Proof. Note that $F_a(y) = G(y) = 0$ for $y \ge 1$, so we only consider $y \in (0, 1)$. We have $F_a(y) = \sqrt{\frac{2}{\pi a}\ln\big(\frac{1}{y}\big)}$. The function $g(x)$ has zeros for $x \in \mathbb{Z}$. For $m \in \mathbb{N}$, let $y_m = \max_{[m, m+1]} g$. We clearly have $y_m < \frac{1}{\pi m}$ and y m > g(m πm ), which shows that the sequence $y_m$ is decreasing. We have the following claims.
Claim 1. The function $F_a - G$ is positive on $(y_1, 1)$.
Claim 2. The function $F_a - G$ changes sign at least once in $(0, 1)$.
Due to Claim 1 it is enough to show that $F_a - G$ is sometimes negative. We have
Claim 3. The function $F_a - G$ is increasing on $(0, y_1)$. Clearly $F_a' > F_1'$ and thus the claim follows from the fact that $F_1 - G$ is increasing on $(0, y_1)$, which was proved in [31] (Chapter I, Step 5).
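A small numerical sanity check of the ingredients of Lemma 20 can be sketched as follows. It assumes the readings $f_a(x) = e^{-\pi a x^2/2}$, $g(x) = |\sin(\pi x)/(\pi x)|$ and $F_a(y) = \sqrt{\frac{2}{\pi a}\ln\frac{1}{y}}$, and it tests the pointwise comparison $g \le f_a$ on $(0, 1)$. On $(y_1, 1)$ only $x \in (0, 1)$ contributes to $G$, where $g$ is decreasing, so this comparison is one way to see the positivity in Claim 1. This reformulation, the grid and the value $a = 1.2$ used as a counterexample are illustrative additions, not taken from the original argument.

```python
import math


def f(a, x):
    """Gaussian profile f_a(x) = exp(-pi * a * x^2 / 2)."""
    return math.exp(-math.pi * a * x * x / 2)


def g(x):
    """g(x) = |sin(pi x) / (pi x)| for x > 0."""
    return abs(math.sin(math.pi * x) / (math.pi * x))


def F(a, y):
    """Distribution function of f_a: length of {x > 0 : f_a(x) > y}, 0 < y < 1."""
    return math.sqrt(2 * math.log(1 / y) / (math.pi * a))


xs = [k / 1000 for k in range(10, 1000)]  # grid in (0, 1), away from 0

# F_a inverts f_a, as a distribution function of a decreasing function should.
inverse_errors = [
    abs(f(a, F(a, y)) - y) for a in (1.0, math.pi / 3) for y in (0.1, 0.5, 0.9)
]

# Pointwise comparison g <= f_a on the grid, at both endpoints of [1, pi/3].
comparison_holds = {a: all(g(x) <= f(a, x) for x in xs) for a in (1.0, math.pi / 3)}

# For a noticeably larger than pi/3 the comparison already fails near 0.
fails_for_larger_a = any(g(x) > f(1.2, x) for x in xs)

print(max(inverse_errors), comparison_holds, fails_for_larger_a)
```

The endpoint $a = \frac{\pi}{3}$ corresponds to the classical comparison $\frac{\sin u}{u} \le e^{-u^2/6}$ for $0 < u < \pi$.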
Proof of Lemma 19. The assumption $\Phi_0(s_0) = \sqrt{\frac{2}{a}}$ is equivalent to After changing variables and using Lemma 20, we get from the Nazarov-Podkorytov lemma (Chapter I, Step 4 in [31]) that for $s \ge s_0$

Proof of Lemma 16. Take $s_0 = 2.01$ and $a = 2\Phi_0(s_0)^{-2}$ in Lemma 19. Since $\Phi_0(2) = \sqrt2$, Ball's inequality gives that $a \ge 1$. We need to check that $a \le \frac{\pi}{3}$. From Lemma We also remark that We break the argument into several regimes for the parameter $s$.

Large $s$. With hindsight, we set (36) $s_0 = \max\{10^6(\mathbb{E}|X|^3)^2, 2\log C_1\}$ In particular, $s_0 \ge 10^5$. Using Lemma 13, that is we will show that $\Phi(s) \le \Phi(2)$ for all $s \ge s_0$. We take $\theta = \frac{1}{100\,\mathbb{E}|X|^3}$ which satisfies the conditions of the lemma and then, for the first term $A_1$, we use Thanks to (34), we also have so it suffices to show that each of the second and third terms $A_2$, $A_3$ as well as this additional error $A_4$ do not exceed $\frac{1}{150}$. Using $\delta < 10^{-38}C_1^{-9}$, we get For the exponent in the second term $A_2$, observe that and, consequently, Thus, using $s \ge s_0 \ge 10^6(\mathbb{E}|X|^3)^2$, we get .
Remark 2. Handling the complementary case $\|a\|_\infty > \frac{1}{\sqrt2}$, which is not covered by Theorems 1 and 2, is a different story. The trivial convexity argument presented in the introduction works in fact only for the Rademacher case, as it requires , and only for the $L_1$-norm (see Remark 21 in [6]). To circumvent this, several different approaches have been used: Haagerup's ad hoc approximation (see §3 in [14]), Nazarov and Podkorytov's induction with a strengthened hypothesis (see Ch. II, Step 5 in [31]), which has also been adapted to other distributions (see [6, 5, 8]), and very recently a different inductive scheme near the extremiser (without a strengthening) needed in a geometric context (see [12]). None of these techniques appears amenable to the broad setting of general distributions that is treated in this paper.

a j ε j ≤ δ 0 , by a simple application of the triangle inequality and $\|\cdot\|_1 \le \|\cdot\|_2$. Thus, applying this (twice) and the bound (37) of De, Diakonikolas and Servedio, we conclude that Theorem 1 also holds for unit vectors $a$ with $\delta(a) \ge (2\delta_0/\kappa)^2$. The same will apply to Theorem 2 with the aid of Theorem 1.2 from [7], a strengthening of Ball's inequality (2) (see also [27]). See [12] for numerical values of the constants $\kappa$.
Remark 4. We have used the $W_2$-distance in Theorems 1 and 2 for concreteness and convenience. Of course, for every $p \ge 1$, if we use the $W_p$-distance in (3) and assume that $X_1$ is in $L_{p/(p-1)}$, then the proofs of Lemmas 5 and 11 go through with the Cauchy-Schwarz inequality replaced by Hölder's inequality and the rest of the proof remains unchanged. It might be of interest to examine weaker distances in such statements.
Remark 5. Szarek's sharp $L_1$-$L_2$ inequality (1) was extended to sharp $L_p$-$L_2$ bounds for all $p > 0$ by Haagerup in [14], using Fourier-integral representations of $|x|^p$. It therefore seems plausible that our techniques allow one to extend Theorem 1 to sharp bounds on $L_p$-norms, but additional (nontrivial and technical) work is needed to treat the analogues of the special function $\Psi_0$ from (13) relevant to Haagerup's $L_p$ bounds. Similarly, the main result from [6], which extends (2) to sharp $L_p$-$L_2$ bounds for all $-1 < p < 0$, could be a starting point for extensions of Theorem 2 to $L_p$-norms with $-1 < p < 0$.
Statements. The authors state that there is no conflict of interest. This manuscript has no associated data.

and for the second integral use $|u^s\log u| = \frac{1}{s}|u^s\log(u^s)| \le \frac{1}{es}$, $0 < u < 1$, to get a bound on it by $\frac{2}{esT}$, whilst for the first integral, using first Lemma 7 and then Lemma 5, we obtain