1 Introduction

Given a distribution, the isoperimetric coefficient of a subset is the ratio of the measure of the subset's boundary to the minimum of the measures of the subset and its complement. Taking the infimum of such ratios over all subsets defines the isoperimetric coefficient of the distribution, also called its Cheeger isoperimetric coefficient.

Kannan, Lovász and Simonovits (KLS) [12] conjecture that for any log-concave distribution, the Cheeger isoperimetric coefficient matches, up to a universal constant factor, the one achieved by half-spaces. If the conjecture is true, the Cheeger isoperimetric coefficient can be determined by considering only half-spaces instead of all subsets. For this reason, the KLS conjecture is also called the KLS hyperplane conjecture. To make this precise, we first formally define log-concave distributions and then state the conjecture.

A probability density function \(p: \mathbb {R}^d\rightarrow \mathbb {R}\) is log-concave if its logarithm is concave, i.e., for any \(x, y \in \mathbb {R}^{d}\) and for any \(\lambda \in [0, 1]\),

$$\begin{aligned} p(\lambda x + (1 - \lambda ) y) \ge p(x)^\lambda p(y)^{1-\lambda }. \end{aligned}$$
(1)
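As a quick numerical sanity check of inequality (1), the following sketch verifies it for the standard logistic density, one of the log-concave examples discussed next, on random triples \((x, y, \lambda )\); the small tolerance only guards against floating-point rounding.

```python
import math
import random

def logistic_pdf(x: float) -> float:
    """Density of the standard logistic distribution (log-concave)."""
    e = math.exp(-abs(x))            # abs() guards against overflow; the density is symmetric
    return e / (1.0 + e) ** 2

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-10, 10)
    y = random.uniform(-10, 10)
    lam = random.random()
    lhs = logistic_pdf(lam * x + (1 - lam) * y)
    rhs = logistic_pdf(x) ** lam * logistic_pdf(y) ** (1 - lam)
    # log-concavity: p(lam*x + (1-lam)*y) >= p(x)^lam * p(y)^(1-lam)
    assert lhs >= rhs - 1e-12
```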

Common probability distributions such as the Gaussian, exponential and logistic distributions are log-concave. This definition also includes any uniform distribution over a convex set, defined as follows. A subset \(K \subset \mathbb {R}^d\) is convex if \(\forall x, y \in K\), \(z \in [x, y] \implies z \in K\). The isoperimetric coefficient \(\psi (p)\) of a density p in \(\mathbb {R}^d\) is defined as

$$\begin{aligned} \psi (p) :=\inf _{S \subset \mathbb {R}^d}\frac{p^+(\partial S)}{\min (p(S), p(S^c))} \end{aligned}$$
(2)

where \(p(S) = \int _{x \in S} p(x) dx\) and the boundary measure of the subset is

$$\begin{aligned} p^+(\partial S) :=\underset{\epsilon \rightarrow 0^+}{\lim \inf }\ \frac{p\left( \left\{ x: {\mathbf {d}}(x, S) \le \epsilon \right\} \right) - p(S)}{\epsilon }, \end{aligned}$$

where \({\mathbf {d}}(x, S)\) is the Euclidean distance between x and the subset S.
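To make the definition concrete, the following sketch evaluates the ratio in Equation (2) for the one-dimensional standard Gaussian over the family of half-line cuts \(S = (-\infty , s]\), for which \(p^+(\partial S) = \varphi (s)\); over this family the infimum is \(\sqrt{2/\pi }\), attained at \(s = 0\) (half-spaces are extremal here, consistent with the KLS picture).

```python
import math

def phi(s: float) -> float:
    """Standard Gaussian density."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def Phi(s: float) -> float:
    """Standard Gaussian CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

# Ratio p^+(dS) / min(p(S), p(S^c)) for half-lines S = (-inf, s].
ratios = []
for i in range(-400, 401):
    s = i / 100.0
    ratios.append(phi(s) / min(Phi(s), 1.0 - Phi(s)))

best = min(ratios)
# The infimum over half-lines is 2 * phi(0) = sqrt(2/pi), attained at s = 0.
assert abs(best - math.sqrt(2.0 / math.pi)) < 1e-9
```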

The KLS conjecture is stated by Kannan, Lovász and Simonovits [12] as follows.

Conjecture 1

There exists a universal constant c, such that for any log-concave density p in \(\mathbb {R}^d\), we have

$$\begin{aligned} \psi (p) \ge \frac{c}{\sqrt{\rho \left( p \right) }}, \end{aligned}$$

where \(\rho \left( p \right) \) is the spectral norm of the covariance matrix of p. In other words, \(\rho \left( p \right) = \left\| A\right\| _{2}\), where \(A = {{\,\mathrm{Cov}\,}}_{X \sim p} (X)\) is the covariance matrix.

An upper bound on \(\psi (p)\) of the same form is relatively easy and was shown to be achieved by half-spaces [12]. Proving the lower bound on \(\psi (p)\) up to some small factors in Conjecture 1 is the main goal of this paper. We say a log-concave density is isotropic if its mean \({\mathbb {E}}_{X\sim p} [X]\) equals 0 and its covariance \({{\,\mathrm{Cov}\,}}_{X\sim p}(X)\) equals \(\mathbb {I}_d\). In the case of isotropic log-concave densities, the KLS conjecture states that any isotropic log-concave density has its isoperimetric coefficient lower bounded by a universal constant.
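Any density with invertible covariance can be brought to isotropic position by an affine map, and affine maps preserve log-concavity. A minimal numerical illustration with a Gaussian sample (the matrix `L` below is an arbitrary choice used only to make the sample anisotropic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Draw from an anisotropic log-concave density (a Gaussian, for concreteness).
L = np.array([[2.0, 0.0], [1.0, 0.5]])
X = rng.standard_normal((200_000, 2)) @ L.T

# Bring the sample to isotropic position: subtract the mean, apply Cov^{-1/2}.
mu = X.mean(axis=0)
A = np.cov((X - mu).T)
eigval, eigvec = np.linalg.eigh(A)
A_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
Y = (X - mu) @ A_inv_sqrt.T

# The whitened sample has (empirical) mean 0 and identity covariance.
cov_Y = np.cov(Y.T)
assert np.allclose(cov_Y, np.eye(2), atol=1e-2)
assert np.allclose(Y.mean(axis=0), 0.0, atol=1e-2)
```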

There have been many attempts to lower bound the Cheeger isoperimetric coefficient in the KLS conjecture. We refer readers to the survey paper by Lee and Vempala [18] for a detailed exposition of these attempts. In particular, the original KLS paper [12] (Theorem 5.1) shows that for any log-concave density p with covariance matrix A,

$$\begin{aligned} \psi (p) \ge \frac{\log (2)}{\sqrt{{{\,\mathrm{Tr}\,}}\left( A \right) }}. \end{aligned}$$
(3)

The original KLS paper [12] only deals with uniform distributions over convex sets, but their proof techniques can be easily extended to show that the same result holds for all log-concave densities. Note that Equation (3) implies \(\psi (p) \ge \frac{\log (2)}{d^{1/2} \cdot \sqrt{\rho \left( p \right) }}\). The current best bound is due to Lee and Vempala [17], who show that there exists a universal constant c such that for any log-concave density p with covariance matrix A,

$$\begin{aligned} \psi (p) \ge \frac{c}{\left( {{\,\mathrm{Tr}\,}}\left( A^2 \right) \right) ^{1/4}}. \end{aligned}$$
(4)

It implies that \(\psi (p) \ge \frac{c}{d^{1/4} \cdot \sqrt{\rho \left( p \right) }}\). Note that Lee and Vempala [17] define \(\psi (p)\) as the reciprocal of ours; the convention is switched to ours in Theorem 32 of the survey paper [18] by the same authors. The above bound is therefore not a misstatement of the results in Lee and Vempala [17]; it is simply translated into our notation. In this paper, we improve the dimension dependency in the lower bound of the isoperimetric coefficient from \(d^{-1/4}\) to \(d^{-o_d(1)}\).

There are many implications of improving the lower bound in the KLS conjecture. Two closely related conjectures are Bourgain’s slicing conjecture [3, 4] and the thin-shell conjecture [2]. It is worth noting that Bourgain [4] stated the slicing conjecture before the introduction of the KLS conjecture. In terms of their connections to the KLS conjecture, Eldan and Klartag [9] proved that the thin-shell conjecture implies Bourgain’s slicing conjecture up to a universal constant factor. Later, Eldan [8] showed that the reciprocal of a lower bound of the isoperimetric coefficient is equivalent to an upper bound of the thin-shell constant in the thin-shell conjecture. Combining these two results, we have that a lower bound in the KLS conjecture implies upper bounds in the thin-shell conjecture and in Bourgain’s slicing conjecture.

The current best upper bound of the thin-shell constant has the dimension dependency \(d^{1/4}\) due to Lee and Vempala’s [17] improvement in the KLS conjecture. The current best bound of the slicing constant in Bourgain’s slicing conjecture also has the dimension dependency \(d^{1/4}\), proved by Klartag [13] without using the KLS conjecture. Klartag’s slicing constant bound is a slight improvement over Bourgain’s earlier slicing bound [4] which has the dimension dependency \(d^{1/4}\log (d)\). Given the current best bounds in these three conjectures and the relation among them, we conclude that improving the current best lower bound in the KLS conjecture improves the current best bounds for the other two conjectures, as noted in Lee and Vempala [18]. For a detailed exposition of the three conjectures and related results since the introduction of Bourgain’s slicing conjecture, we refer readers to Klartag and Milman [14].

Additionally, improving the lower bound in the KLS conjecture also improves concentration inequalities for Lipschitz functions of log-concave measures. It also leads to faster mixing time bounds of Markov chain Monte Carlo (MCMC) sampling algorithms on log-concave measures. Despite the great importance of these results, deriving these results from our new bound in the KLS conjecture is not the main focus of our paper. We refer readers to Milman [20] and Lee and Vempala [18] for more details about the abundant implications of the KLS conjecture.

Notation For two sequences \(a_n\) and \(b_n\) indexed by an integer n, we say that \(a_n = o_n(b_n)\) if \(\lim _{n \rightarrow \infty } \frac{a_n}{b_n} = 0\). The Euclidean norm of a vector \(x \in \mathbb {R}^d\) is denoted by \(\left\| x\right\| _{2}\). The spectral norm of a square matrix \(A \in \mathbb {R}^{d\times d}\) is denoted by \(\left\| A\right\| _{2}\). The Euclidean ball with center x and radius r is denoted by \(\mathbb {B}(x, r)\). For a real number \(x \in \mathbb {R}\), we denote its ceiling by \(\lceil x \rceil = \min \left\{ m \in \mathbb {Z} \mid m \ge x \right\} \). We say a density p is more log-concave than a Gaussian density \(\varphi \) if p can be written as a product form \(p = \nu \cdot \varphi \) where \(\varphi \) is the Gaussian density and \(\nu \) is a log-concave function (that is, \(\nu \) is proportional to a log-concave density). For a martingale \((M_t,\ t \in \mathbb {R}_+)\), we use \(\left[ M \right] _t\) to denote its quadratic variation, defined as

$$\begin{aligned} \left[ M \right] _t = \sup _{k \in \mathbb {N}} \sup _{0 = t_0 \le t_1 \le \cdots \le t_k = t} \sum _{i=1}^k \left( M_{t_i} - M_{t_{i-1}} \right) ^2. \end{aligned}$$
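As a sanity check on this definition, for a Brownian motion the quadratic variation satisfies \([W]_t = t\); the sketch below approximates it by summing squared increments over a fine uniform partition.

```python
import math
import random

random.seed(1)
t, n = 1.0, 200_000
dt = t / n

# Sum squared Brownian increments over the finest uniform partition of [0, t].
qv = 0.0
for _ in range(n):
    dW = random.gauss(0.0, math.sqrt(dt))
    qv += dW * dW

# For Brownian motion, [W]_t = t; the discrete sum concentrates around t.
assert abs(qv - t) < 0.05
```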

2 Main results

We prove the following lower bound on the isoperimetric coefficient of any log-concave density.

Theorem 1

There exists a universal constant c such that for any log-concave density p in \(\mathbb {R}^d\) and any integer \(\ell \ge 1\), we have

$$\begin{aligned} \psi (p) \ge \frac{1}{\left[ c \cdot \ell \left( \log (d)+1 \right) \right] ^{\ell /2} d^{16/\ell } \cdot \sqrt{\rho \left( p \right) }} \end{aligned}$$
(5)

where \(\rho \left( p \right) \) is the spectral norm of the covariance matrix of p.

As a corollary, taking \(\ell = \left\lceil \left( \frac{\log (d)}{\log \log (d)} \right) ^{1/2} \right\rceil \), there exists a constant \(c'\) such that

$$\begin{aligned} \psi (p) \ge \frac{1}{d^{c' \left( \frac{\log \log (d)}{\log {d}} \right) ^{1/2}} \cdot \sqrt{\rho \left( p \right) }}. \end{aligned}$$

Since \(\lim _{d\rightarrow \infty } \frac{\log \log (d)}{\log (d)} = 0\), for \(d\) large enough, the above lower bound is better than any lower bound of the form \(\frac{1}{d^{c''} \sqrt{\rho \left( p \right) }} \) (\(c''\) is a positive constant) in terms of dimension \(d\) dependency.
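To see the effect of the choice of \(\ell \) numerically, the sketch below computes the effective dimension exponent, i.e., the logarithm of the denominator in Equation (5) divided by \(\log (d)\), with the unspecified universal constant set to \(c = 1\) as an illustrative placeholder; the exponent decays to 0 as d grows.

```python
import math

def log_denominator(ell: int, L: float, c: float = 1.0) -> float:
    """log of [c*ell*(log d + 1)]^(ell/2) * d^(16/ell), with L = log(d)."""
    return 0.5 * ell * math.log(c * ell * (L + 1.0)) + (16.0 / ell) * L

def effective_exponent(L: float) -> float:
    """Effective power of d in the denominator for the prescribed ell."""
    ell = math.ceil(math.sqrt(L / math.log(L)))
    return log_denominator(ell, L) / L

# Working in log-space lets us probe astronomically large d.
e4 = effective_exponent(1e4)   # d = exp(10^4)
e6 = effective_exponent(1e6)   # d = exp(10^6)
e8 = effective_exponent(1e8)   # d = exp(10^8)
# The effective exponent decreases toward 0 as d grows.
assert e6 < e4 and e8 < e6
assert e8 < 0.25
```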

The proof of the main theorem uses the stochastic localization scheme introduced by Eldan [8]. Eldan uses this scheme to show that the thin-shell conjecture is equivalent to the KLS conjecture up to a logarithmic factor. The construction of the stochastic localization scheme uses elementary properties of semimartingales and stochastic integration. The main idea of Eldan’s proof to derive the KLS conjecture from the thin-shell conjecture is to smoothly multiply a Gaussian part into the log-concave density, so that the modified density is more log-concave than a Gaussian density. When the Gaussian part is large enough, one can then easily prove the isoperimetric inequality.

The same scheme was refined in Lee and Vempala [17] to obtain the current best lower bound in the KLS conjecture. Lee and Vempala directly attack the KLS conjecture while following the same stochastic localization scheme to smoothly multiply a Gaussian part to the log-concave density. Their use of a new potential function leads to the current best lower bound in the KLS conjecture. The proof in this paper builds on Lee and Vempala [17]’s refinements of Eldan’s method, while it improves the handling of several quantities involved in the stochastic localization scheme. Figure 1 provides a diagram showing the relationship between the main lemmas.

Fig. 1 Proof sketch.

To ensure the existence and uniqueness of the stochastic localization construction, we first prove a lemma that deals with log-concave densities with compact support. We then relate back to the main theorem by finding a compact set which contains most of the probability measure of a log-concave density.

Lemma 1

There exists a universal constant c such that for any log-concave density p in \(\mathbb {R}^d\) with compact support and any integer \(\ell \ge 1\), we have

$$\begin{aligned} \psi (p) \ge \frac{1}{\left[ c \cdot \ell \left( \log (d)+1 \right) \right] ^{\ell /2} d^{16/\ell } \cdot \sqrt{\rho \left( p \right) }}. \end{aligned}$$
(6)

The proof of Lemma 1 is provided in Section 2.5 after we introduce the intermediate lemmas. The use of the integer \(\ell \) in the lemma indicates that we control the Cheeger isoperimetric coefficient in an iterative fashion. In fact, we prove Lemma 1 by induction over \(\ell \), starting from the known bound in Equation (3). For this, we define the infimum of the product of the isoperimetric coefficient and the square root of the spectral norm over all log-concave densities in \(\mathbb {R}^d\) with compact support:

$$\begin{aligned} \psi _d= \inf _{ \begin{array}{c} \text{ log-concave } \text{ density }\ p\ \text{ in }\ \mathbb {R}^d\\ \text{ with } \text{ compact } \text{ support } \end{array}} \psi (p) \sqrt{\rho \left( p \right) }. \end{aligned}$$
(7)

Then we prove the following lemma on the lower bound of \(\psi _d\), which serves as the main induction argument.

Lemma 2

Suppose that \(\psi _k \ge \frac{1}{\alpha k^\beta }\) for all \(k \le d\), for some \(0 < \beta \le \frac{1}{2}\) and \(\alpha \ge 1\), and take \(q = \lceil \frac{1}{\beta } \rceil + 1\). Then there exists a universal constant c such that we have

$$\begin{aligned} \psi _d\ge \frac{1}{c \cdot q^{1/2} \alpha \log (d)^{1/2} d^{\beta - \beta / (8q) }}. \end{aligned}$$

The proof of Lemma 2 is provided towards the end of this section in Section 2.4. To have a good understanding of how we get there, we start by introducing the stochastic localization scheme introduced by Eldan [8].

2.1 Eldan’s stochastic localization scheme.

Given a log-concave density p in \(\mathbb {R}^d\) with covariance matrix \(A\), we define the following stochastic differential equation (SDE)

$$\begin{aligned} dc_t&= C_t^{1/2}dW_t + C_t \mu _t dt,\quad c_0 = 0,\nonumber \\ dB_t&= C_t dt,\quad B_0 = 0, \end{aligned}$$
(8)

where \(W_t\) is a Wiener process, and the matrix \(C_t\), the density \(p_t\), the mean \(\mu _t\) and the covariance \(A_t\) are defined as follows:

$$\begin{aligned} C_t&= A^{-1}, \end{aligned}$$
(9)
$$\begin{aligned} p_t(x)&= \frac{e^{c_t^\top x - \frac{1}{2}x^\top B_t x} p(x)}{\int _{\mathbb {R}^d} e^{c_t ^\top y - \frac{1}{2}y^\top B_t y} p(y) dy}, \text {for}\, x \in \mathbb {R}^d, \end{aligned}$$
(10)
$$\begin{aligned} \mu _t&= \int _{\mathbb {R}^d} x p_t(x)dx, \end{aligned}$$
(11)
$$\begin{aligned} A_t&= \int _{\mathbb {R}^d} \left( x - \mu _t \right) \left( x - \mu _t \right) ^\top p_t(x) dx. \end{aligned}$$
(12)
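To get a feel for the scheme, the following sketch runs an Euler–Maruyama discretization of Equations (8)–(12) on a finite-support surrogate for p (a weighted point cloud rather than a genuine log-concave density, an assumption made purely for simulation); the weights \(p_t(x_i)\) remain a probability vector, and \(p_t(E)\) behaves like a martingale on average over paths.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 400
X = rng.standard_normal((n, d))   # support points of a finite surrogate for p
w0 = np.full(n, 1.0 / n)          # initial weights p(x_i)

A = np.cov(X.T, aweights=w0)      # covariance of the initial distribution
C = np.linalg.inv(A)              # C_t = A^{-1}, constant in t
eigval, eigvec = np.linalg.eigh(C)
C_sqrt = eigvec @ np.diag(np.sqrt(eigval)) @ eigvec.T

E = np.arange(n) < n // 2         # a fixed test subset with p_0(E) = 1/2

def tilt(c, B):
    """Weights of p_t: tilt p_0 by exp(c^T x - x^T B x / 2), as in Eq. (10)."""
    logw = X @ c - 0.5 * np.einsum('ij,jk,ik->i', X, B, X) + np.log(w0)
    logw -= logw.max()            # stabilize before exponentiating
    w = np.exp(logw)
    return w / w.sum()

def run_path(T=0.5, steps=500):
    dt = T / steps
    c, B = np.zeros(d), np.zeros((d, d))
    for _ in range(steps):
        w = tilt(c, B)
        mu = w @ X                                  # mean of p_t, Eq. (11)
        dW = rng.normal(0.0, np.sqrt(dt), size=d)   # Brownian increment
        c += C_sqrt @ dW + C @ mu * dt              # Eq. (8)
        B += C * dt
    return tilt(c, B)

w_T = run_path()
assert np.all(w_T >= 0) and abs(w_T.sum() - 1.0) < 1e-12
# p_t(E) is a martingale, so its average over paths stays near p_0(E) = 1/2.
est = np.mean([run_path()[E].sum() for _ in range(50)])
assert abs(est - 0.5) < 0.25
```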

The next lemma shows the existence and the uniqueness of the SDE solution.

Lemma 3

Given a density p in \(\mathbb {R}^d\) with compact support and invertible covariance matrix \(A\), the SDE (8) is well defined and has a unique solution on the time interval [0, T] for any time \(T > 0\). Additionally, for any \(x \in \mathbb {R}^d\), \(p_t(x)\) is a martingale with

$$\begin{aligned} dp_t(x) = \left( x - \mu _t \right) ^\top A^{-1/2} dW_t p_t(x). \end{aligned}$$
(13)

The proof of Lemma 3 follows from the standard existence and uniqueness theorem of SDE (Theorem 5.2 in Øksendal [21]). The proof is provided in Appendix A.

Before we dive into the proof of Lemma 2, we discuss how the stochastic localization scheme allows us to control the boundary measure of a subset. First, according to the concavity of the isoperimetric profile (Theorem 2.8 in Sternberg and Zumbrun [25] or Theorem 1.8 in Milman [20]), it is sufficient to consider subsets of measure 1/2 in the definition of the isoperimetric coefficient in Equation (2). Second, the density \(p_t\) is log-concave and it is more log-concave than the Gaussian density proportional to \(e^{-\frac{1}{2}x^\top B_t x}\). It can be shown via the KLS localization lemma [12] that a density which is more log-concave than a Gaussian has an isoperimetric coefficient lower bound that depends on the covariance of the Gaussian (see e.g. Theorem 2.7 in Ledoux [16] or Theorem 4.4 in Cousins and Vempala [7]). Third, given an initial subset E of \(\mathbb {R}^d\) with measure \(p(E) = \frac{1}{2}\), using the martingale property of \(p_t(E)\), we observe that

$$\begin{aligned} p^+(\partial E)&= {\mathbb {E}}\left[ p_t^+(\partial E) \right] \\&{\mathop {\ge }\limits ^{\mathrm{(i)}}} {\mathbb {E}}\left[ \frac{1}{2}\left\| B_t^{-1}\right\| _{2}^{-1/2} \min \left( p_t(E), p_t(E^c) \right) \right] \\&{\mathop {\ge }\limits ^{\mathrm{(ii)}}} \frac{1}{4}\cdot \frac{1}{2}\left\| B_t^{-1}\right\| _{2}^{-1/2} {\mathbb {P}}\left( \frac{1}{4}\le p_t(E) \le \frac{3}{4}\right) \\&= \frac{1}{4}\left\| B_t^{-1}\right\| _{2}^{-1/2} {\mathbb {P}}\left( \frac{1}{4}\le p_t(E) \le \frac{3}{4}\right) \cdot \min \left\{ p(E), p(E^c) \right\} . \end{aligned}$$

Inequality (i) uses the isoperimetric inequality for a log-concave density which is more log-concave than a Gaussian density proportional to \(e^{-\frac{1}{2}x^\top B_t x}\) [7, 16]. Inequality (ii) uses that \(\min \left( p_t(E), p_t(E^c) \right) \ge \frac{1}{4}\) on the event \(\left\{ \frac{1}{4}\le p_t(E) \le \frac{3}{4}\right\} \) and is nonnegative otherwise.

Based on the above observation, the high level idea of the proof requires two main steps:

  • There exists some time \(t > 0\), such that the Gaussian component \(\frac{1}{2}x^\top B_t x\) of the density \(p_t\) is large enough, so that we can apply the known isoperimetric inequality for densities more log-concave than a Gaussian.

  • We need to control the quantity \(p_t(E)\) so that the obtained isoperimetric inequality at time t can be related back to that at time 0.

The first step is immediate since our construction explicitly enforces the density \(p_t\) to have a Gaussian component \(\frac{1}{2}x^\top B_t x\) in Equation (10). The remaining question is whether we can run the SDE long enough to make the Gaussian component large enough, while still keeping \(p_t(E)\) of the same order as \(p(E) = \frac{1}{2}\) with large probability.

2.2 Control the evolution of the measure of a subset.

Lemma 4

Under the same assumptions of Lemma 3, for any measurable subset E of \(\mathbb {R}^d\) with \(p(E) = \frac{1}{2}\) and any \(t > 0\), the solution \(p_t\) of the SDE (8) satisfies

$$\begin{aligned} {\mathbb {P}}\left( \frac{1}{4} \le p_t(E) \le \frac{3}{4} \right) \ge \frac{9}{10} - {\mathbb {P}}\left( \int _0^t \left\| A^{-1/2} A_s A^{-1/2}\right\| _{2} ds \ge \frac{1}{64} \right) . \end{aligned}$$

This lemma is proved in Lemma 29 of Lee and Vempala [17]. We provide a proof here for completeness.

Proof of Lemma 4. Let \(g_t = p_t(E)\). Using Equation (13), we obtain the following differential of \(g_t\):

$$\begin{aligned} d g_t&= \int _E (x - \mu _t)^\top A^{-1/2} dW_t p_t(x) dx. \end{aligned}$$

Its quadratic variation is

$$\begin{aligned} d\left[ g \right] _t&= \left\| \int _E A^{-1/2} (x - \mu _t) p_t(x) dx \right\| _{2}^2 dt \\&= \max _{\left\| \xi \right\| _{2} \le 1} \left( \int _E \xi ^\top A^{-1/2} (x - \mu _t) p_t(x) dx \right) ^2 dt \\&\le \max _{\left\| \xi \right\| _{2} \le 1} \left( \int _E \left( \xi ^\top A^{-1/2} (x - \mu _t) \right) ^2 p_t(x) dx \right) \left( \int _E p_t(x) dx \right) dt \\&\le \max _{\left\| \xi \right\| _{2} \le 1} \xi ^\top A^{-1/2} A_t A^{-1/2} \xi dt \\&= \left\| A^{-1/2} A_t A^{-1/2}\right\| _{2} dt, \end{aligned}$$

where the inequality follows from the Cauchy–Schwarz inequality. Applying the Dambis–Dubins–Schwarz theorem (see e.g. Revuz and Yor [23] Section V.1 Theorem 1.7), there exists a Wiener process \({\tilde{W}}_t\) such that \(g_t - g_0\) has the same distribution as \({\tilde{W}}_{[g]_t}\). Since \(g_0 = \frac{1}{2}\), we obtain

$$\begin{aligned} {\mathbb {P}}\left( \frac{1}{4} \le p_t(E) \le \frac{3}{4} \right)&= {\mathbb {P}}\left( -\frac{1}{4} \le {\tilde{W}}_{[g]_t} \le \frac{1}{4} \right) \\&\ge 1 - {\mathbb {P}}\left( \max _{0 \le s \le \frac{1}{64}} \left| {\tilde{W}}_s \right|> \frac{1}{4} \right) - {\mathbb {P}}([g]_t> \frac{1}{64}) \\&\ge 1 - 4 {\mathbb {P}}\left( {\tilde{W}}_{\frac{1}{64}}> \frac{1}{4} \right) - {\mathbb {P}}\left( [g]_t> \frac{1}{64} \right) \\&\ge \frac{9}{10} - {\mathbb {P}}\left( \int _0^t \left\| A^{-1/2} A_s A^{-1/2}\right\| _{2} ds > \frac{1}{64} \right) , \end{aligned}$$

where the last inequality follows from the fact that \({\mathbb {P}}\left( \xi> 2 \right) < 0.023\) when \(\xi \) follows the standard Gaussian distribution.\(\square \)
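The numeric fact used in the last step can be checked directly: \({\tilde{W}}_{1/64} > \frac{1}{4}\) is the event that a standard Gaussian exceeds \(\frac{1/4}{\sqrt{1/64}} = 2\).

```python
import math

# P(xi > 2) for a standard Gaussian xi, via the complementary error function.
tail = 0.5 * math.erfc(2.0 / math.sqrt(2.0))
assert tail < 0.023
# Hence 1 - 4 * P(W_{1/64} > 1/4) = 1 - 4 * P(xi > 2) >= 9/10.
assert 1.0 - 4.0 * tail >= 0.9
```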

2.3 Control the evolution of the spectral norm.

According to Lemma 4, to control the evolution of the measures of subsets, we need to control the spectral norm of \(A^{-1/2} A_t A^{-1/2}\). The following lemma serves the purpose.

Lemma 5

In addition to the same assumptions of Lemma 3, if \(\psi _k \ge \frac{1}{\alpha k^\beta }\) for all \(k \le d\) for some \(0 < \beta \le \frac{1}{2}\) and \(\alpha \ge 1\), then there exists a universal constant c such that for \(q = \lceil \frac{1}{\beta } \rceil + 1\), \(d\ge 3\) and \(T_2 = \frac{1}{ c \cdot q \alpha ^2\log (d) d^{2\beta - \beta /(4q)}}\), we have

$$\begin{aligned} {\mathbb {P}}\left( \int _{0}^{T_2} \left\| A^{-1/2} A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{64} \right) < \frac{4}{10}. \end{aligned}$$

Direct control of the largest eigenvalue of \(A^{-1/2} A_t A^{-1/2}\) is not trivial; instead, we use the potential function \(\Gamma _t\) to upper bound the largest eigenvalue. Define

$$\begin{aligned} Q_t&= A^{-1/2} A_t A^{-1/2} \nonumber \\ \Gamma _t&= {{\,\mathrm{Tr}\,}}\left( Q_t^q \right) . \end{aligned}$$
(14)

It is clear that \(\Gamma _t^{1/q} \ge \left\| A^{-1/2} A_t A^{-1/2}\right\| _{2}\), since all eigenvalues of \(Q_t\) are nonnegative. So in order to upper bound \(\left\| A^{-1/2} A_t A^{-1/2}\right\| _{2}\), it is sufficient to upper bound \(\Gamma _t^{1/q}\). The advantage of using \(\Gamma _t\) is that its evolution can be computed explicitly via Itô's formula. We have the following differentials for \(A_t\) and \(\Gamma _t\):

$$\begin{aligned} dA_t&= \int (x - \mu _t) (x - \mu _t)^\top \left( (x-\mu _t)^\top A^{-1/2}dW_t \right) p_t(x) dx - A_tA^{-1}A_t dt, \end{aligned}$$
(15)
$$\begin{aligned} d\Gamma _t&= q \int \left( x-\mu _t \right) ^\top A^{-1/2} \left( Q_t \right) ^{q-1} A^{-1/2} \left( x-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1/2} dW_t p_t(x) dx \nonumber \\&\quad - q {{\,\mathrm{Tr}\,}}\left( Q_t^{q+1} \right) dt + \frac{q}{2} \sum _{a = 0}^{q-2} \int \int \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{a} A^{-1/2} \left( y-\mu _t \right) \nonumber \\&\quad \cdot \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{q-2-a} A^{-1/2} \left( y-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1} \left( y-\mu _t \right) p_t(x) p_t(y) dx dy dt. \end{aligned}$$
(16)

Obtaining these differentials uses Itô’s formula and the proofs are provided in Appendix A.
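The inequality \(\Gamma _t^{1/q} = {{\,\mathrm{Tr}\,}}\left( Q_t^q \right) ^{1/q} \ge \left\| Q_t\right\| _{2}\), which drives the potential argument, can be spot-checked numerically on random positive semi-definite matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
for q in (2, 3, 5):
    for _ in range(100):
        M = rng.standard_normal((6, 6))
        Q = M @ M.T                        # a random PSD matrix
        gamma = np.trace(np.linalg.matrix_power(Q, q))
        top = np.linalg.eigvalsh(Q)[-1]    # spectral norm of a PSD matrix
        # Tr(Q^q)^(1/q) >= largest eigenvalue, since all eigenvalues are >= 0
        assert gamma ** (1.0 / q) >= top * (1 - 1e-10)
```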

The next lemma upper bounds the terms in the potential \(\Gamma _t\).

Lemma 6

Under the same assumptions of Lemma 5, the differential of the potential \(\Gamma _t\) defined in Equation (14) can be written as follows:

$$\begin{aligned} d\Gamma _t = v_t^\top dW_t + \delta _t dt, \end{aligned}$$

where \(v_t \in \mathbb {R}^d\) and \(\delta _t \in \mathbb {R}\) satisfy

$$\begin{aligned} \left\| v_t\right\| _{2}&\le 16 q \Gamma _t^{1 + 1/(2q)}, \text { and } \\ \delta _t&\le \min \left\{ 64 q^2 \alpha ^2 \log (d) d^{2\beta -1/q}\Gamma _t^{1 + 1/q}, \frac{2q^2}{t} \Gamma _t \right\} . \end{aligned}$$

The proof of Lemma 6 is provided in Section 3.1. Remark that bounds similar to the first bound of \(\delta _t\) in Lemma 6 have appeared in Lee and Vempala [17], whereas the second bound of \(\delta _t\) in Lemma 6 is novel. The second bound of \(\delta _t\) also leads to the following Lemma 8 which gives better control of the potential than the previous proof by Lee and Vempala [17] when t is large.

Using the bounds in Lemma 6, we state the two lemmas which control the potential \(\Gamma _t\) in two ways.

Lemma 7

Under the same assumptions of Lemma 6, using the following transformation

$$\begin{aligned} h: \mathbb {R}_+&\rightarrow \mathbb {R}\\ a&\mapsto -(a+1)^{-1/q} \end{aligned}$$

we have

$$\begin{aligned} {\mathbb {P}}\left( \max _{t \in [0, T_1]} h(\Gamma _t) \ge - \frac{1}{2}\left( d+1 \right) ^{-1/q} \right) \le \exp (-\frac{2}{3} q\log (d)) \le \frac{3}{10} \end{aligned}$$

where \(T_1 = \frac{1}{32768 q \alpha ^2 \log (d) d^{2\beta }}\).

Lemma 8

Under the same assumptions of Lemma 6, using the following transformation

$$\begin{aligned} f: \mathbb {R}_+&\rightarrow \mathbb {R}\\ a&\mapsto a^{1/q} \end{aligned}$$

we have

$$\begin{aligned} {\mathbb {E}}f(\Gamma _{t_2}) \le {\mathbb {E}}f(\Gamma _{t_1}) \left( \frac{t_2}{t_1} \right) ^{2q}, \forall t_2> t_1 > 0. \end{aligned}$$

The proofs of Lemma 7 and 8 are provided in Section 3.2.

Now we are ready to prove Lemma 5.

Proof of Lemma 5. We take

$$\begin{aligned} T_1 = \frac{1}{32768 q \alpha ^2 \log (d) d^{2\beta }}, \quad T_2 = \frac{d^{\beta /(4q)}}{40} T_1 = \frac{1}{ 1310720 q \alpha ^2\log (d) d^{2\beta - \beta /(4q)}}. \end{aligned}$$

We bound the spectral norm of \(A^{-1/2}A_t A^{-1/2}\) in two time intervals via Lemma 7 and Lemma 8. In the first time interval \([0, T_1]\), we have

$$\begin{aligned} {\mathbb {P}}\left( \int _0^{T_1} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{128} \right)&\le {\mathbb {P}}\left( \max _{t \in [0, T_1]} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} \ge \frac{1}{128T_1} \right) \nonumber \\&\quad {\mathop {\le }\limits ^{\mathrm{(i)}}} {\mathbb {P}}\left( \max _{t \in [0, T_1]} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} \ge 3 d^{1/q} \right) \nonumber \\&\quad {\mathop {\le }\limits ^{\mathrm{(ii)}}} {\mathbb {P}}\left( \max _{t \in [0, T_1]} \Gamma _t \ge 3^{q} d \right) \nonumber \\&\quad {\mathop {\le }\limits ^{\mathrm{(iii)}}} {\mathbb {P}}\left( \max _{t \in [0, T_1]} \Gamma _t + 1 \ge 2^{q} (d+1) \right) \nonumber \\&\quad = {\mathbb {P}}\left( \max _{t \in [0, T_1]} h(\Gamma _t) \ge -\frac{1}{2} \left( d+1 \right) ^{-1/q} \right) \nonumber \\&\quad {\mathop {\le }\limits ^{\mathrm{(iv)}}} \frac{3}{10}. \end{aligned}$$
(17)

Inequality (i) follows from the condition \(\beta q \ge 1\). (ii) follows from the fact that \({{\,\mathrm{Tr}\,}}\left( A^q \right) ^{1/q} \ge \left\| A\right\| _{2}\). (iii) is because \(3^q d\ge 2^q (d+ 1)\) when \(q \ge 2\) and \(d\ge 1\). The equality uses the definition of h in Lemma 7. (iv) follows from Lemma 7.

In the first time interval, we can also bound the expectation of \(\Gamma _{T_1}^{1/q}\). Since the density \(p_{T_1}\) is more log-concave than a Gaussian density with covariance matrix \(\frac{A}{T_1}\), the covariance matrix of \(p_{T_1}\) is upper bounded as follows (see Theorem 4.1 in Brascamp-Lieb [5] or Lemma 5 in Eldan and Lehec [10])

$$\begin{aligned} A_{T_1} \preceq \frac{A}{T_1}. \end{aligned}$$
(18)

Consequently, all the eigenvalues of \(Q_{T_1}\) are less than \(\frac{1}{T_1}\) and \(\Gamma _{T_1}\) is upper bounded by \(\frac{d}{T_1^{q}}\). Using the above bound, we can bound the expectation of \(\Gamma _{T_1}^{1/q}\) as follows

$$\begin{aligned} {\mathbb {E}}\left[ \Gamma _{T_1}^{1/q} \right]&= {\mathbb {E}}\left[ \mathbb {1}_{\Gamma _{T_1} \ge 3^q d} \Gamma _{T_1}^{1/q} + \mathbb {1}_{\Gamma _{T_1} < 3^q d} \Gamma _{T_1}^{1/q} \right] \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} \frac{d^{1/q}}{T_1} \exp \left( - \frac{2}{3} q \log (d) \right) + 3 d^{1/q} \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} 32768 d^{1/q} q \alpha ^2 + 4 d^{1/q} \nonumber \\&\le 40000 d^{1/q} q \alpha ^2. \end{aligned}$$
(19)

Inequality (i) follows from Lemma 7, the inequality \(3^qd\ge 2^q(d+1)\) (similar to what we did in the last four steps of Equation (17)) and Equation (18). (ii) follows from \(q \ge 2\), \(\beta \le {1/2}\) and \(d^{1/2} \ge \log (d)\) for \(d\ge 3\).

In the second time interval, for \(t \in [T_1, T_2]\), we have

$$\begin{aligned} {\mathbb {E}}\left[ \left\| A^{-1/2} A_{t} A^{-1/2}\right\| _{2} \right]&\le {\mathbb {E}}\left[ \Gamma _{t}^{1/q} \right] \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} {\mathbb {E}}\left[ \Gamma _{T_1}^{1/q} \right] \left( \frac{t}{T_1} \right) ^{2q} \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} {\mathbb {E}}\left[ \Gamma _{T_1}^{1/q} \right] \left( \frac{T_2}{T_1} \right) ^{2q} \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(iii)}}} 1000 d^{\beta /2 + 1/q} q \alpha ^2. \end{aligned}$$
(20)

Inequality (i) follows from Lemma 8. (ii) is because \(t \le T_2\). (iii) follows from \(T_2 = \frac{d^{\beta /(4q)}}{40} T_1\). Using the above bound, we control the spectral norm in the second time interval via Markov’s inequality

$$\begin{aligned} {\mathbb {P}}\left( \int _{T_1}^{T_2} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{128} \right)&{\mathop {\le }\limits ^{\mathrm{(i)}}} \frac{{\mathbb {E}}\left[ \int _{T_1}^{T_2} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \right] }{1/128} \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} T_2 \cdot 1000 d^{\beta /2 + 1/q} q \alpha ^2 \cdot 128 \nonumber \\&{\mathop {<}\limits ^{\mathrm{(iii)}}} \frac{1}{10}, \end{aligned}$$
(21)

where inequality (i) follows from Markov’s inequality and (ii) follows from Equation (20). (iii) follows from the definition of \(T_2\) and \(\frac{\beta }{2}+\frac{1}{q} \le 2\beta -\beta /(4q)\) when \(\beta q \ge 1\) and \(q \ge 2\).
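The exponent inequality invoked in step (iii), and the arithmetic defining \(T_2\), can be verified numerically over the admissible range of \(\beta \):

```python
import math

# Constant in the definition of T_2: 40 * 32768 = 1310720.
assert 40 * 32768 == 1310720

# Exponent inequality used in step (iii): beta/2 + 1/q <= 2*beta - beta/(4q)
# whenever q = ceil(1/beta) + 1, checked over a grid of beta in (0, 1/2].
for i in range(1, 501):
    beta = 0.5 * i / 500.0
    q = math.ceil(1.0 / beta) + 1
    assert q >= 2 and beta * q >= 1.0
    assert beta / 2.0 + 1.0 / q <= 2.0 * beta - beta / (4.0 * q) + 1e-12
```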

Combining the bounds in the first and second time intervals in Equation (17) and (21), we obtain

$$\begin{aligned} {\mathbb {P}}\left( \int _{0}^{T_2} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} \ge \frac{1}{64} \right)&\le {\mathbb {P}}\left( \int _{0}^{T_1} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} \ge \frac{1}{128} \right) \nonumber \\&\quad + {\mathbb {P}}\left( \int _{T_1}^{T_2} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} \ge \frac{1}{128} \right) \le \frac{4}{10}. \end{aligned}$$
(22)

\(\square \)

2.4 Proof of Lemma 2.

The proof of Lemma 2 follows the strategy described after Lemma 3. We make the arguments rigorous here. We consider a log-concave density p in \(\mathbb {R}^d\) with compact support. Without loss of generality, we can assume that the covariance matrix A of the density p is invertible. Otherwise, the density p is degenerate and we can instead prove the results in a lower dimension.

According to the concavity of the isoperimetric profile (Theorem 2.8 in Sternberg and Zumbrun [25] or Theorem 1.8 in Milman [20]), it is sufficient to consider subsets of measure 1/2 in the definition of the isoperimetric coefficient (2). Given an initial subset E of \(\mathbb {R}^d\) with \(p(E) = \frac{1}{2}\), using the martingale property of \(p_{T_2}(E)\), we have

$$\begin{aligned} p^+(\partial E)&= {\mathbb {E}}\left[ p_{T_2}^+(\partial E) \right] \\&{\mathop {\ge }\limits ^{\mathrm{(i)}}} {\mathbb {E}}\left[ \frac{1}{2}\left\| B_{T_2}^{-1}\right\| _{2}^{-1/2} \min \left( p_{T_2}(E), p_{T_2}(E^c) \right) \right] \\&{\mathop {\ge }\limits ^{\mathrm{(ii)}}} \frac{1}{4}\cdot \frac{1}{2}\left\| B_{T_2}^{-1}\right\| _{2}^{-1/2} {\mathbb {P}}( \frac{1}{4}\le p_{T_2}(E) \le \frac{3}{4})\\&= \frac{1}{4}\left\| B_{T_2}^{-1}\right\| _{2}^{-1/2} {\mathbb {P}}( \frac{1}{4}\le p_{T_2}(E) \le \frac{3}{4}) \cdot \min \left\{ p(E), p(E^c) \right\} \\&{\mathop {\ge }\limits ^{\mathrm{(iii)}}} \frac{1}{8}\left\| B_{T_2}^{-1}\right\| _{2}^{-1/2} \cdot \min \left\{ p(E), p(E^c) \right\} \\&{\mathop {=}\limits ^{\mathrm{(iv)}}} \frac{1}{8}T_2^{1/2}\left\| A\right\| _{2}^{-1/2} \cdot \min \left\{ p(E), p(E^c) \right\} . \end{aligned}$$

Inequality (i) uses the isoperimetric inequality for a log-concave density which is more log-concave than a Gaussian density proportional to \(e^{-\frac{1}{2}x^\top B_t x}\) (see e.g. Theorem 2.7 in Ledoux [16] or Theorem 4.4 in Cousins and Vempala [7]). Inequality (ii) uses that \(\min \left( p_{T_2}(E), p_{T_2}(E^c) \right) \ge \frac{1}{4}\) on the event \(\left\{ \frac{1}{4}\le p_{T_2}(E) \le \frac{3}{4}\right\} \) and is nonnegative otherwise. (iii) follows from Lemma 4 and Lemma 5 (for \(d\ge 3\)), which together give \({\mathbb {P}}( \frac{1}{4}\le p_{T_2}(E) \le \frac{3}{4}) \ge \frac{1}{2}\). (iv) follows from the construction that \(B_t = t A^{-1}\). We conclude the proof since \(T_2\) is taken as \(\frac{1}{ c \cdot q \alpha ^2\log (d) d^{2\beta - \beta /(4q)}}\) with c a constant. The above proof only works for \(d\ge 3\). It is easy to verify that Lemma 2 still holds for \(d= 1, 2\) from the original KLS bound in Equation (3).\(\square \)

2.5 Proof of Lemma 1.

The proof of Lemma 1 consists of applying Lemma 2 recursively. We define

$$\begin{aligned} \alpha _1 = 4, \beta _1 = \frac{1}{2}. \end{aligned}$$

For \(\ell \ge 1\), we define \(\alpha _\ell \) and \(\beta _\ell \) recursively as follows:

$$\begin{aligned} \alpha _{\ell +1}&= 2c \cdot \alpha _\ell \beta _\ell ^{-1/2}, \nonumber \\ \beta _{\ell +1}&= \beta _\ell - \beta _\ell ^2/16, \end{aligned}$$
(23)

where c is the constant in Lemma 2. It is not difficult to show by induction that \(\alpha _\ell \) and \(\beta _\ell \) satisfy

$$\begin{aligned} \frac{1}{\ell +1}&\le \beta _\ell \le \frac{16}{\ell } \nonumber \\ \alpha _\ell&\le \left( 4c^2 \ell \right) ^{\ell /2}. \end{aligned}$$
(24)
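As a sanity check (not part of the proof), the bounds (24) can be verified numerically by iterating the recursion (23). The sketch below uses placeholder values for the constant c from Lemma 2, which is not specified here; note that the \(\alpha _\ell \) bound in (24) at \(\ell = 1\) already requires \(c \ge 2\).

```python
import math

def recursion_bounds(c, n_steps=60):
    """Iterate recursion (23) from alpha_1 = 4, beta_1 = 1/2 and check (24) at each step."""
    alpha, beta = 4.0, 0.5
    for ell in range(1, n_steps + 1):
        assert 1.0 / (ell + 1) <= beta <= 16.0 / ell        # beta bounds in (24)
        assert alpha <= (4 * c ** 2 * ell) ** (ell / 2)     # alpha bound in (24)
        alpha = 2 * c * alpha * beta ** (-0.5)              # recursion (23)
        beta = beta - beta ** 2 / 16
    return alpha, beta

for c in (2.0, 5.0, 10.0):  # placeholder constants; the true c from Lemma 2 is not used here
    recursion_bounds(c)
```

The \(\beta _\ell \) recursion does not involve c, so its bounds hold regardless of the placeholder chosen.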

We start with a known bound from the original KLS paper [12]

$$\begin{aligned} \psi _d\ge \frac{1}{\alpha _1 d^{\beta _1}},\quad \forall d\ge 1. \end{aligned}$$

In the induction, suppose that we have

$$\begin{aligned} \psi _d\ge \frac{1}{\alpha _\ell \left( \log (d)+1 \right) ^{\ell /2} d^{\beta _\ell }},\quad \forall d\ge 1. \end{aligned}$$

From the above inequality, we obtain for any \(1 \le k \le d\),

$$\begin{aligned} \psi _k \ge \frac{1}{\alpha _\ell ' k^{\beta _\ell }}, \end{aligned}$$

with \(\alpha _\ell ' = \alpha _\ell \left( \log (d) +1 \right) ^{\ell /2}\). Using the above lower bounds for \(\psi _k\), we can apply Lemma 2. For step \(\ell +1\), we have

$$\begin{aligned} \psi _d&{\mathop {\ge }\limits ^{(i)}} \frac{1}{c \cdot q^{1/2} \alpha _\ell \left( \log (d)+1 \right) ^{\ell /2} \log (d)^{1/2} d^{\beta _\ell - \beta _\ell / (8q) }}\\&{\mathop {\ge }\limits ^{(ii)}} \frac{1}{2c \cdot \alpha _\ell \beta _\ell ^{-1/2} \left( \log (d)+1 \right) ^{(\ell +1)/2} d^{\beta _\ell - \beta _\ell ^2 / 16 }} \\&= \frac{1}{\alpha _{\ell +1} \left( \log (d)+1 \right) ^{(\ell +1)/2} d^{\beta _{\ell +1}}} \end{aligned}$$

where inequality (i) follows from Lemma 2, inequality (ii) follows from \(q \le \frac{2}{\beta _\ell }\) and the last equality follows from the definitions of \(\alpha _{\ell +1}\) and \(\beta _{\ell +1}\) in Equation (23). We conclude Lemma 1 using the bounds on \(\alpha _\ell \) and \(\beta _\ell \) in Equation (24).\(\square \)
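The passage from (i) to (ii) rests on two elementary comparisons, both consequences of \(q \le 2/\beta _\ell \): the exponent comparison \(\beta _\ell /(8q) \ge \beta _\ell ^2/16\) and the prefactor comparison \(q^{1/2} \le \sqrt{2}\,\beta _\ell ^{-1/2} \le 2\beta _\ell ^{-1/2}\). A quick numerical spot check of both, with randomly drawn (hypothetical) values of \(\beta \) and d:

```python
import math, random

random.seed(0)
for _ in range(1000):
    beta = random.uniform(0.01, 0.5)
    d = random.randint(3, 10**6)
    q = int(2 / beta)  # any integer q <= 2/beta; here 2/beta >= 4
    # exponent comparison behind (ii): d^{-beta/(8q)} <= d^{-beta^2/16}
    assert beta / (8 * q) >= beta**2 / 16
    # prefactor comparison behind (ii): q^{1/2} log(d)^{1/2} <= 2 beta^{-1/2} (log(d)+1)^{1/2}
    assert math.sqrt(q) * math.sqrt(math.log(d)) <= 2 / math.sqrt(beta) * math.sqrt(math.log(d) + 1)
```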

2.6 Proof of Theorem 1.

To derive Theorem 1 from Lemma 1, it is sufficient to show that for any log-concave density p in \(\mathbb {R}^d\), most of its probability measure lies on a compact support. Let \(\mu \) be the mean of the density p. Since \(r \mapsto p(\mathbb {B}\left( \mu , r \right) ^c)\) is a non-increasing function of r with limit 0 at \(\infty \), there exists a radius \(R > 0\) such that \(p(\mathbb {B}\left( \mu , R \right) ^c) \le 0.2\). Note that it is possible to get a better bound via e.g. the log-concave concentration bounds of Paouris [22], but the existence of such a radius R is sufficient for the proof here.

Denote \(B = \mathbb {B}\left( \mu , R \right) \). Then \(p(B^c)\le 0.2\). Let \(\varrho \) be the density obtained by truncating p on the ball B. Then \(\varrho \) is log-concave and it has compact support. For a subset \(E \subset \mathbb {R}^d\) such that \(p(E) = \frac{1}{2}\), we have

$$\begin{aligned} p(\partial E)&\ge \varrho (\partial E) p(B) \\&\ge \psi (\varrho ) \min \left( \varrho (E), \varrho (E^c) \right) p(B) \\&= \psi (\varrho ) \min \left( p(E \cap B), p(B \cap E^c) \right) \\&\ge \psi (\varrho ) \min \left( p(E) - p(B^c), p(E^c) - p(B^c) \right) \\&\ge \frac{1}{2} \psi (\varrho ) \min \left( p(E), p(E^c) \right) . \end{aligned}$$

The last inequality follows because \(p(E) - p(B^c) = p(E^c) - p(B^c) \ge 0.5 - 0.2 > \frac{1}{4} = \frac{1}{2} \min \left( p(E), p(E^c) \right) \). Since it is sufficient to consider subsets of measure 1/2 in the definition of the isoperimetric coefficient [20, 25], we conclude that the isoperimetric coefficient of p is lower bounded by half of that of \(\varrho \). Applying Lemma 1 for the isoperimetric coefficient of \(\varrho \), we obtain Theorem 1.\(\square \)
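The measure arithmetic in the last display can be spot-checked over the full range of admissible values of \(p(B^c)\) (a trivial numerical sketch, not part of the proof):

```python
# For p(E) = 1/2 and p(B^c) <= 0.2, check that
# min(p(E) - p(B^c), p(E^c) - p(B^c)) >= (1/2) * min(p(E), p(E^c)).
pE = 0.5
pEc = 1.0 - pE
for i in range(201):
    pBc = 0.2 * i / 200  # any ball-complement mass up to 0.2
    assert min(pE - pBc, pEc - pBc) >= 0.5 * min(pE, pEc)
```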

3 Proof of auxiliary lemmas

In this section, we prove the auxiliary Lemmas 6, 7 and 8.

3.1 Tensor bounds and proof of Lemma 6.

In this subsection, we prove Lemma 6. Since Lemma 6 involves the third-order moment tensor of a log-concave density, we define the following 3-tensor for any probability density p in \(\mathbb {R}^d\) with mean \(\mu \) to simplify notation.

$$\begin{aligned}&\mathcal {T}_p: \quad \mathbb {R}^{d\times d} \times \mathbb {R}^{d\times d} \times \mathbb {R}^{d\times d} \rightarrow \mathbb {R}\nonumber \\&\quad (A, B, C) \mapsto \int \int (x-\mu )^\top A (y-\mu )\nonumber \\&\quad \cdot (x-\mu )^\top B (y - \mu ) \cdot (x - \mu ) ^\top C (y - \mu ) p(x) p(y) dx dy. \end{aligned}$$
(25)

For three matrices \(A, B, C \in \mathbb {R}^{d\times d}\), we can write \(\mathcal {T}_p(A, B, C)\) equivalently as

$$\begin{aligned} \mathcal {T}_p(A, B, C) = {\mathbb {E}}_{X, Y \sim p} (X-\mu ) ^\top A (Y-\mu ) \cdot (X-\mu ) ^\top B (Y-\mu ) \cdot (X-\mu ) ^\top C (Y-\mu ). \end{aligned}$$

Before we prove Lemma 6, we prove the following properties related to the 3-Tensor.

Lemma 9

Suppose p is a log-concave density with mean \(\mu \) and covariance A. Then for any positive semi-definite matrices B and C, we have

$$\begin{aligned} \left\| \int B^{1/2} (x - \mu ) (x - \mu ) ^\top C (x - \mu ) p(x)dx\right\| _{2} \le 16 \left\| A^{1/2}B A^{1/2}\right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( A^{1/2} C A^{1/2} \right) . \end{aligned}$$

Lemma 10

Suppose that \(\psi _k \ge \frac{1}{\alpha k^\beta }\) for all \(k \le d\) for some \(0 \le \beta \le \frac{1}{2}\) and \(\alpha \ge 1\). Suppose p is a log-concave density in \(\mathbb {R}^d\) with covariance A and A is invertible. Then for \(q \ge \frac{1}{2\beta }\), we have

$$\begin{aligned} \mathcal {T}_p(A^{q-2}, \mathbb {I}_d, \mathbb {I}_d) \le 128 \alpha ^2 \log (d) d^{2\beta - 1/q} {{\,\mathrm{Tr}\,}}(A^q) ^{1 + 1/q}. \end{aligned}$$

Lemma 11

Given \(\tau > 0\), suppose p is a log-concave density which is more log-concave than \(\mathcal {N}(0, \frac{1}{\tau } \mathbb {I}_d)\). Let A be its covariance matrix and suppose A is invertible. Then for \(q \ge 3\), we have

$$\begin{aligned} \mathcal {T}_p(A^{q-2}, \mathbb {I}_d, \mathbb {I}_d) \le \frac{4}{\tau } {{\,\mathrm{Tr}\,}}\left( A^{q} \right) . \end{aligned}$$

Lemma 12

Suppose p is a log-concave density in \(\mathbb {R}^d\). For any \(\delta \in [0, 1]\) and any positive semi-definite matrices \(A, B, C\), we have

$$\begin{aligned} \mathcal {T}_{p}(B^{1/2}A^\delta B^{1/2}, B^{1/2}A^{1-\delta }B^{1/2}, C) \le \mathcal {T}_{p}(B^{1/2}AB^{1/2}, B, C). \end{aligned}$$
(26)

The proofs of the above lemmas are provided in Section 3.3.

Now we are ready to prove Lemma 6.

Proof of Lemma 6. We first prove the bound on \(\left\| v_t\right\| _{2}\), where

$$\begin{aligned} v_t = q \int A^{-1/2} \left( x-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1/2} \left( Q_t \right) ^{q-1} A^{-1/2} \left( x-\mu _t \right) p_t(x) dx. \end{aligned}$$

Applying Lemma 9 and knowing the covariance of \(p_t\) is \(A_t\), we obtain

$$\begin{aligned} \left\| v_t\right\| _{2}&\le 16 q \left\| A_t^{1/2} A^{-1} A_t^{1/2}\right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( A_t^{1/2} A^{-1/2} Q_t^{q-1} A^{-1/2} A_t^{1/2} \right) \\&{\mathop {=}\limits ^{\mathrm{(i)}}} 16 q \left\| A_t^{1/2} A^{-1} A_t^{1/2}\right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( Q_t^{q} \right) \\&{\mathop {=}\limits ^{\mathrm{(ii)}}} 16 q \left\| Q_t\right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( Q_t^{q} \right) \\&{\mathop {\le }\limits ^{\mathrm{(iii)}}} 16 q \left[ {{\,\mathrm{Tr}\,}}\left( Q_t^{q} \right) \right] ^{1+1/(2q)}. \end{aligned}$$

Equality (i) uses the definition of \(Q_t = A^{-1/2} A_t A^{-1/2}\). Equality (ii) uses the fact that \(\left\| MM^\top \right\| _{2} = \left\| M^\top M\right\| _{2}\) for any square matrix \(M \in \mathbb {R}^{d\times d}\). Inequality (iii) uses that \(\left\| M\right\| _{2} \le {{\,\mathrm{Tr}\,}}\left( M^q \right) ^{1/q}\) for any positive semi-definite matrix M.
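Inequality (iii) is the comparison between the spectral norm and the Schatten q-norm: in an eigenbasis of a positive semi-definite M, it reads \(\max _i \lambda _i \le (\sum _i \lambda _i^q)^{1/q}\) for nonnegative eigenvalues. A small numerical sketch over random spectra (not part of the proof):

```python
import random

random.seed(1)
for _ in range(1000):
    d = random.randint(1, 50)
    q = random.randint(1, 10)
    lams = [random.uniform(0.0, 5.0) for _ in range(d)]  # spectrum of a PSD matrix M
    # In an eigenbasis, ||M||_2 = max eigenvalue and Tr(M^q) = sum of q-th powers.
    assert max(lams) <= sum(l ** q for l in lams) ** (1.0 / q) + 1e-12
```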

Next, we bound \(\delta _t\) in two ways. We can ignore the negative term in \(\delta _t\) to obtain the following:

$$\begin{aligned} \delta _t&\le \frac{q}{2} \sum _{a = 0}^{q-2} \int \int \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{a} A^{-1/2} \left( y-\mu _t \right) \nonumber \\&\quad \cdot \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{q-2-a} A^{-1/2} \left( y-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1} \left( y-\mu _t \right) p_t(x) p_t(y) dx dy \nonumber \\&= \frac{q}{2} \sum _{a = 0}^{q-2} \mathcal {T}_{\varrho _t}(Q_t^{a}, Q_t^{q-2-a}, \mathbb {I}_d), \end{aligned}$$
(27)

where \(\varrho _t\) is the density of the linearly transformed random variable \(A^{-1/2}\left( X-\mu _t \right) \) for X drawn from \(p_t\), and \(\mu _t\) is the mean of \(p_t\). \(\varrho _t\) is still log-concave since any linear transformation of a log-concave density is log-concave (see e.g. Saumard and Wellner [24]). \(\varrho _t\) has covariance \(A^{-1/2} A_t A^{-1/2}\), which is exactly \(Q_t\). For \(a \in \left\{ 0, \cdots , q-2 \right\} \), we have

$$\begin{aligned} \mathcal {T}_{\varrho _t}(Q_t^{a}, Q_t^{q-2-a}, \mathbb {I}_d)&{\mathop {\le }\limits ^{\mathrm{(i)}}} \mathcal {T}_{\varrho _t}(Q_t^{q-2}, \mathbb {I}_d, \mathbb {I}_d) \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} 128 \alpha ^2 \log (d) d^{2\beta - 1/q} \left[ {{\,\mathrm{Tr}\,}}\left( Q_t^q \right) \right] ^{1+1/q}. \end{aligned}$$

Inequality (i) follows from Lemma 12. Inequality (ii) follows from Lemma 10. Since there are \(q-1\) terms in the sum, we conclude the first part of the bound for \(\delta _t\).

On the other hand, since \(p_t\) is more log-concave than the Gaussian density proportional to \(e^{-\frac{t}{2} (x-\mu _t)^\top A^{-1} (x-\mu _t)}\), \(\varrho _t\) is more log-concave than the Gaussian density proportional to \(e^{-\frac{t}{2} x^\top x}\). Applying Lemma 12 and Lemma 11 to each term in Equation (27), we obtain

$$\begin{aligned} \delta _t&\le \frac{q^2}{2} \mathcal {T}_{\varrho _t}(Q_t^{q-2}, \mathbb {I}_d, \mathbb {I}_d) \\&\le \frac{2q^2}{t} {{\,\mathrm{Tr}\,}}\left( Q_t^{q} \right) . \end{aligned}$$

This concludes the second part of the bound for \(\delta _t\).\(\square \)

3.2 Control of the potential in two time intervals.

In this subsection, we prove Lemma 7 and Lemma 8.

Proof of Lemma 7. The function h has the following derivatives

$$\begin{aligned} \frac{d h}{d a} = \frac{1}{q} \left( a + 1 \right) ^{-1/q - 1}, \quad \frac{d^2 h}{da^2} = -\frac{q+1}{q^2} \left( a + 1 \right) ^{-1/q - 2}. \end{aligned}$$
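These derivatives of \(h(a) = -(a+1)^{-1/q}\) can be confirmed by a finite-difference check (a numerical sketch, not part of the proof):

```python
def h(a, q):
    return -((a + 1.0) ** (-1.0 / q))

def dh(a, q):
    # first derivative: (1/q) (a+1)^{-1/q - 1}
    return (1.0 / q) * (a + 1.0) ** (-1.0 / q - 1.0)

def d2h(a, q):
    # second derivative: -(q+1)/q^2 (a+1)^{-1/q - 2}
    return -(q + 1.0) / q ** 2 * (a + 1.0) ** (-1.0 / q - 2.0)

eps = 1e-5
for q in (2, 3, 10):
    for a in (0.5, 1.0, 7.0):
        fd1 = (h(a + eps, q) - h(a - eps, q)) / (2.0 * eps)
        fd2 = (h(a + eps, q) - 2.0 * h(a, q) + h(a - eps, q)) / eps ** 2
        assert abs(fd1 - dh(a, q)) < 1e-6
        assert abs(fd2 - d2h(a, q)) < 1e-4
```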

Using Itô’s formula, we obtain

$$\begin{aligned} d h(\Gamma _t)&= \left. \frac{d h}{d a}\right| _{\Gamma _t} d\Gamma _t + \frac{1}{2} \left. \frac{d^2 h}{d a^2}\right| _{\Gamma _t} d\left[ \Gamma \right] _t \\ {}&= \frac{1}{q (\Gamma _t+1)^{1/q+1}} d\Gamma _t - \frac{1}{2} \frac{q+1}{q^2 (\Gamma _t+1)^{1/q+2}} \left\| v_t\right\| _{2}^2 dt \\ {}&\le \frac{1}{q (\Gamma _t+1)^{1/q+1}} d\Gamma _t \\ {}&{\mathop {\le }\limits ^{\mathrm{(i)}}}\ 64 q \alpha ^2 \log (d) d^{2\beta -1/q} dt + \frac{v_t^\top dW_t}{q \left( \Gamma _t + 1 \right) ^{1/q+1}}, \end{aligned}$$

where inequality (i) plugs in the bounds in Lemma 6.

Define a martingale \(Y_t\) such that

$$\begin{aligned} dY_t = \frac{v_t^\top dW_t}{q \left( \Gamma _t + 1 \right) ^{1/q+1}}, \end{aligned}$$

with \(Y_0 = 0\). According to the \(\left\| v_t\right\| _{2}\) upper bound in Lemma 6, we have

$$\begin{aligned} \left\| \frac{1}{q \left( \Gamma _t + 1 \right) ^{1 + 1/q}}v_t\right\| _{2}^2&\le 256. \end{aligned}$$

Hence the martingale \(Y_t\) is well-defined. According to the Dambis–Dubins–Schwarz theorem (see e.g. Revuz and Yor [23], Section V.1, Theorem 1.7), there exists a Wiener process \({\tilde{W}}_t\) such that \(Y_t\) has the same distribution as \({\tilde{W}}_{[Y]_t}\). Then we have for any \(\gamma > 0\),

$$\begin{aligned} {\mathbb {P}}\left( \max _{t \in [0, T]} Y_t \ge \gamma \right) \le {\mathbb {P}}\left( \max _{t \in [0, T]} {\tilde{W}}_{256t} \ge \gamma \right) \le \exp \left( -\frac{\gamma ^2}{512 T} \right) . \end{aligned}$$
(28)
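The last inequality in (28) combines the reflection principle, \({\mathbb {P}}(\max _{s \le 256T} {\tilde{W}}_s \ge \gamma ) = 2{\overline{\Phi }}(\gamma /\sqrt{256T})\), with the standard Gaussian tail bound \(2{\overline{\Phi }}(x) \le e^{-x^2/2}\). The tail bound can be checked numerically via the identity \(2{\overline{\Phi }}(x) = \mathrm{erfc}(x/\sqrt{2})\) (a sketch, not part of the proof):

```python
import math

# 2 * Phi_bar(x) = erfc(x / sqrt(2)); check 2 * Phi_bar(x) <= exp(-x^2 / 2) for x >= 0,
# which with x = gamma / sqrt(256 T) gives the bound exp(-gamma^2 / (512 T)) in (28).
for i in range(1001):
    x = i / 100.0
    assert math.erfc(x / math.sqrt(2.0)) <= math.exp(-x * x / 2.0) + 1e-15
```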

Set \(T = \frac{1}{32768 q \alpha ^2 \log (d) d^{2\beta }}\) and \(\Psi = \frac{1}{2} \left( d+1 \right) ^{-1/q}\). Observe that \(\Gamma _0 = d\) and as a result \(h(\Gamma _0) = -\left( d+1 \right) ^{-1/q}\). Then we have

$$\begin{aligned} {\mathbb {P}}\left( \max _{t \in [0, T]} h(\Gamma _t) \ge -\Psi \right)&\le {\mathbb {P}}\left( \max _{t \in [0, T]} Y_t \ge -\Psi + \left( d+1 \right) ^{-1/q} - \int _0^T 64q \alpha ^2 \log (d) d^{2\beta - 1/q} dt \right) \\ {}&{\mathop {\le }\limits ^{\mathrm{(i)}}}\ {\mathbb {P}}\left( \max _{t \in [0, T]} Y_t \ge \frac{\Psi }{4} \right) \\ {}&{\mathop {\le }\limits ^{\mathrm{(ii)}}} \exp \left( -\frac{\Psi ^2}{8192T} \right) \\ {}&{\mathop {\le }\limits ^{\mathrm{(iii)}}} \exp \left( -\frac{2}{3} q \alpha ^2 \log (d) d^{2\beta - 2/q} \right) \\ {}&{\mathop {<}\limits ^{\mathrm{(iv)}}} \frac{3}{10}. \end{aligned}$$

Inequality (i) follows from the choice of T. (ii) uses Equation (28). (iii) follows by plugging in \(\Psi = \frac{1}{2}\left( d+1 \right) ^{-1/q}\) and \(3^q d^2 \ge 2^q (d+ 1)^2\). (iv) follows from \(\beta q \ge 1\), \(d\ge 3\), \(q\ge 2\) and \(3^{-4/3} < 0.3\).\(\square \)

Proof of Lemma 8. The function f has the following derivatives

$$\begin{aligned} \frac{d f(a)}{d a} = \frac{1}{q} a^{1/q-1}, \quad \frac{d^2 f(a)}{d a^2} = -\frac{q-1}{q^2} a^{1/q-2}. \end{aligned}$$

Using Itô’s formula, we obtain

$$\begin{aligned} d f\left( \Gamma _t \right)&= \left. \frac{df}{da} \right| _{\Gamma _t} d\Gamma _t + \frac{1}{2} \left. \frac{d^2 f}{da^2}\right| _{\Gamma _t} d \left[ \Gamma \right] _t \\&= \frac{1}{q} \Gamma _t^{1/q-1} \left( v_t^\top dW_t + \delta _t dt \right) - \frac{q-1}{2q^2} \Gamma _t^{1/q-2} \left\| v_t\right\| _{2}^2 dt. \end{aligned}$$

Using the bounds in Lemma 6 and the martingale property of the term \(\frac{1}{q} \Gamma _t^{1/q-1} v_t^\top dW_t\), we obtain

$$\begin{aligned} d {\mathbb {E}}f(\Gamma _t) \le \frac{2q}{t} {\mathbb {E}}f(\Gamma _t) dt. \end{aligned}$$

Solving the above differential equation, we obtain

$$\begin{aligned} {\mathbb {E}}f(\Gamma _{t_2}) \le {\mathbb {E}}f(\Gamma _{t_1}) \left( \frac{t_2}{t_1} \right) ^{2q}, \forall t_2> t_1 > 0. \end{aligned}$$

\(\square \)
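The integration step is Grönwall's inequality: since \(\int _{t_1}^{t_2} \frac{2q}{s} ds = 2q \log (t_2/t_1)\), the saturated differential equation yields exactly the ratio \((t_2/t_1)^{2q}\). A numerical sketch integrating the saturated equation (not part of the proof):

```python
import math

def groenwall_ratio(q, t1, t2, n=20000):
    """Integrate (log y)' = 2q/t from t1 to t2 by the midpoint rule, with y(t1) = 1."""
    logy, t = 0.0, t1
    dt = (t2 - t1) / n
    for _ in range(n):
        logy += 2.0 * q / (t + dt / 2.0) * dt
        t += dt
    return math.exp(logy)

q, t1, t2 = 3, 0.5, 2.0
ratio = groenwall_ratio(q, t1, t2)
# The numerically integrated growth factor matches (t2/t1)^{2q}.
assert abs(ratio - (t2 / t1) ** (2 * q)) < 1e-2 * (t2 / t1) ** (2 * q)
```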

3.3 Proof of tensor bounds.

In this subsection, we prove Lemmas 9, 10, 11 and 12.

Proof of Lemma 9. Since C is positive semi-definite, we can write its eigenvalue decomposition as \(C = \sum _{i=1}^d\lambda _i v_i v_i^\top \) with \(\lambda _i \ge 0\). Then,

$$\begin{aligned}&\left\| \int B^{1/2} (x-\mu ) (x-\mu )^\top C (x-\mu ) p(x) dx\right\| _{2} \\&\quad = \left\| \sum _{i=1}^d\int B^{1/2} (x-\mu ) \lambda _i \left( (x-\mu )^\top v_i \right) ^2 p(x) dx\right\| _{2}\\&\quad {\mathop {\le }\limits ^{\mathrm{(i)}}} \sum _{i=1}^d\lambda _i \left\| \int B^{1/2} (x-\mu ) \left( (x-\mu )^\top v_i \right) ^2 p(x) dx\right\| _{2}\\&\quad = \sum _{i=1}^d\lambda _i \max _{\left\| \xi \right\| _{2}\le 1} \int \xi ^\top B^{1/2} (x-\mu ) \left( (x-\mu )^\top v_i \right) ^2 p(x) dx \\&\quad {\mathop {\le }\limits ^{\mathrm{(ii)}}} \sum _{i=1}^d\lambda _i \max _{\left\| \xi \right\| _{2}\le 1} \left( \int \left( \xi ^\top B^{1/2} (x-\mu ) \right) ^2 p(x) dx \right) ^{1/2} \left( \int \left( (x-\mu )^\top v_i \right) ^4 p(x) dx \right) ^{1/2} \\&\quad {\mathop {\le }\limits ^{\mathrm{(iii)}}} 16 \sum _{i=1}^d\lambda _i \max _{\left\| \xi \right\| _{2}\le 1} \left( \int \left( \xi ^\top B^{1/2} (x-\mu ) \right) ^2 p(x) dx \right) ^{1/2} \left( \int \left( (x-\mu )^\top v_i \right) ^2 p(x) dx \right) \\&\quad = 16\left\| B^{1/2} A B^{1/2} \right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( A^{1/2}CA^{1/2} \right) . \end{aligned}$$

Inequality (i) follows from the triangle inequality. (ii) follows from the Cauchy–Schwarz inequality. (iii) follows from the statement below, which upper bounds the fourth moment of a log-concave density via its second moment.\(\square \)

For any log-concave density \(\nu \) and any vector \(\theta \in \mathbb {R}^{d}\), we have

$$\begin{aligned} \left( \int \left( (x-\mu _\nu )^\top \theta \right) ^a \nu (x) dx \right) ^{1/a} \le 2 \frac{a}{b} \left( \int \left( (x-\mu _\nu )^\top \theta \right) ^b \nu (x) dx \right) ^{1/b} \end{aligned}$$
(29)

for \(a \ge b > 0\), where \(\mu _\nu \) is the mean of \(\nu \). Equation (29) is proved e.g. in Corollary 5.7 of Guédon et al. [11] and the exact constant is provided in Proposition 3.8 of Latała and Wojtaszczyk [15].
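For a concrete instance of Equation (29) (read with absolute moments), take \(\nu \) to be the standard one-dimensional Gaussian, which is log-concave with mean zero and has the closed-form absolute moments \({\mathbb {E}}|X|^a = 2^{a/2}\Gamma ((a+1)/2)/\sqrt{\pi }\). A numerical check over a few exponents (a sketch, not part of the proof):

```python
import math

def abs_moment(a):
    """E|X|^a for X ~ N(0, 1), via the closed form 2^{a/2} Gamma((a+1)/2) / sqrt(pi)."""
    return 2.0 ** (a / 2.0) * math.gamma((a + 1) / 2.0) / math.sqrt(math.pi)

# Check (E|X|^a)^{1/a} <= 2 (a/b) (E|X|^b)^{1/b} for a >= b > 0, i.e. Equation (29).
for a in (2, 4, 6, 8, 12):
    for b in (1, 2, 3, 4):
        if a >= b:
            assert abs_moment(a) ** (1.0 / a) <= 2.0 * (a / b) * abs_moment(b) ** (1.0 / b)
```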

In order to prove Lemma 10, we need to introduce one additional lemma as follows.

Lemma 13

Suppose that \(\psi _k \ge \frac{1}{\alpha k^\beta }\) for all \(k \le d\) for some \(0 < \beta \le \frac{1}{2}\) and \(\alpha \ge 1\). For an isotropic log-concave density p in \(\mathbb {R}^d\) and a unit vector \(v \in \mathbb {R}^d\), define \(\Delta = {\mathbb {E}}_{X \sim p} \left( X^\top v \right) \cdot XX^\top \), then we have

  1.

    For any orthogonal projection matrix \(P \in \mathbb {R}^{d\times d}\) with rank r, we have

    $$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right) \le 16 \psi ^{-2}_{\min (2r, d)}. \end{aligned}$$
  2.

    For any positive semi-definite matrix A, we have

    $$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta A \Delta \right) \le 128 \alpha ^2 \log (d) \left( {{\,\mathrm{Tr}\,}}\left( A^{1/(2\beta )} \right) \right) ^{2\beta }. \end{aligned}$$

This lemma was proved as Lemma 41 in an older version (arXiv version 2) of Lee and Vempala [17]. The main proof idea for the first part of Lemma 13 appeared in Eldan [8] (Lemma 6). We provide a proof here for completeness.

Proof of Lemma 13. For the first part, we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right) = {\mathbb {E}}_{X \sim p} X^\top \Delta P X \cdot X ^\top v. \end{aligned}$$

Since \({\mathbb {E}}_{X\sim p} X^\top v = 0\), we can subtract the mean of the first term \(X^\top \Delta P X\) without changing the value of \({{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right) \). Then

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right)&= {\mathbb {E}}_{X\sim p} \left[ \left( X^\top \Delta P X - {\mathbb {E}}_{Y \sim p} Y ^\top \Delta P Y \right) \cdot X ^\top v \right] \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} \left( {\mathbb {E}}_{X\sim p}(X^\top v)^2 \right) ^{1/2} \left( {{\,\mathrm{Var}\,}}_{X \sim p }\left( X ^\top \Delta P X \right) \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} 2 \psi _{\min (2r, d)}^{-1} \left( {\mathbb {E}}_{X \sim p} \left\| \Delta P X + P^\top \Delta ^\top X\right\| _{2}^2 \right) ^{1/2} \\&\le 4 \psi _{\min (2r, d)}^{-1} \left( {{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right) \right) ^{1/2}. \end{aligned}$$

Inequality (i) follows from the Cauchy–Schwarz inequality. Inequality (ii) follows from the fact that \({\mathbb {E}}_{X\sim p}(X^\top v)^2 = 1\) as p is isotropic and that the inverse Poincaré constant is upper bounded by twice the inverse of the squared isoperimetric coefficient (also known as Cheeger's inequality [6, 19]; or Theorem 1.1 in Milman [20]). The matrix \(\Delta P + P^\top \Delta \) has rank at most \(\min (2r, d)\). Rearranging the terms in the above equation, we conclude the first part of Lemma 13.

For the second part, we write the matrix A in its eigenvalue decomposition and group the terms by eigenvalues. We have

$$\begin{aligned} A = \sum _{i=1}^d\lambda _i v_i v_i^\top = \sum _{j=1}^J A_j + B, \end{aligned}$$

where \(A_j\) has eigenvalues in the interval \((\left\| A\right\| _{2} e^{j-1} /d, \left\| A\right\| _{2} e^{j} /d]\) and B has eigenvalues smaller than or equal to \(\left\| A\right\| _{2}/d\). Because the right endpoints of the intervals increase exponentially, we have \(J = \lceil \log (d) \rceil \). Let \(P_j\) be the orthogonal projection matrix formed by the eigenvectors in \(A_j\). Then we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta A_j \Delta \right) \le \left\| A_j\right\| _{2} {{\,\mathrm{Tr}\,}}\left( \Delta P_j \Delta \right) {\mathop {\le }\limits ^{\mathrm{(i)}}} 16 \left\| A_j\right\| _{2} \psi ^{-2}_{\min (2 \text {rank}(A_j), d)} {\mathop {\le }\limits ^{\mathrm{(ii)}}} 16 \alpha ^2 \left\| A_j\right\| _{2} \cdot \left( 2 \text {rank}(A_j) \right) ^{2\beta }, \end{aligned}$$
(30)

where inequality (i) follows from the first part of Lemma 13 and inequality (ii) follows from the assumed lower bound on \(\psi _k\). Similarly for the matrix B, we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta B \Delta \right) {\mathop {\le }\limits ^{\mathrm{(i)}}} 16 \alpha ^2 \left\| B\right\| _{2} \left( 2\text {rank}(B) \right) ^{2\beta } {\mathop {\le }\limits ^{\mathrm{(ii)}}} 32 \alpha ^2 \left\| A\right\| _{2}, \end{aligned}$$
(31)

where inequality (i) follows as in Equation (30) and inequality (ii) follows from the facts that \(\left\| B\right\| _{2} \le \left\| A\right\| _{2}/d\), \(\text {rank}(B) \le d\) and \(2\beta \le 1\). Putting the bounds (30) and (31) together, we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta A \Delta \right)&= \sum _{j=1}^J {{\,\mathrm{Tr}\,}}\left( \Delta A_j\Delta \right) + {{\,\mathrm{Tr}\,}}\left( \Delta B \Delta \right) \\&\le 16 \alpha ^2 \left( \sum _{j=1}^J \left\| A_j\right\| _{2} \cdot \left( 2\text {rank}(A_j) \right) ^{2\beta } + 2\left\| A\right\| _{2} \right) \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} 16 \alpha ^2 \left[ \left( \sum _{j=1}^J \left\| A_j\right\| _{2}^{1/(2\beta )} \cdot \left( 2\text {rank}(A_j) \right) \right) ^{2\beta } \cdot \left( J \right) ^{1-2\beta } + 2\left\| A\right\| _{2} \right] \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} 16 \alpha ^2\left[ \left( 2 e {{\,\mathrm{Tr}\,}}\left( A^{1/(2\beta )} \right) \right) ^{2\beta } \cdot \left( J \right) ^{1-2\beta } + 2 \left\| A\right\| _{2} \right] \\&\le 128 \alpha ^2 \log (d) \left( {{\,\mathrm{Tr}\,}}\left( A^{1/(2\beta )} \right) \right) ^{2\beta }. \end{aligned}$$

Inequality (i) follows from Hölder's inequality and inequality (ii) follows from the fact that \(\left\| A_j\right\| _{2}^{1/(2\beta )} \text {rank}(A_j) \le e {{\,\mathrm{Tr}\,}}\left( A_{j}^{1/(2\beta )} \right) \) due to the construction of \(A_j\). This concludes the second part of Lemma 13.\(\square \)
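The eigenvalue bucketing can be checked numerically: within a bucket the eigenvalues differ by at most a factor of e, so the largest one times the rank is at most e times the trace. A sketch for the case \(\beta = 1/2\) (so \(1/(2\beta ) = 1\)), with random spectra and hypothetical dimensions (not part of the proof):

```python
import math, random

random.seed(2)
for _ in range(100):
    d = random.randint(4, 200)
    lams = [random.uniform(0.0, 1.0) for _ in range(d)]  # spectrum of a PSD matrix A
    norm_A = max(lams)
    J = math.ceil(math.log(d))
    buckets = [[] for _ in range(J)]
    for l in lams:
        if l <= norm_A / d:
            continue  # eigenvalue of the remainder matrix B
        # bucket j holds eigenvalues in (norm_A e^{j-1}/d, norm_A e^j/d]
        j = min(max(math.ceil(math.log(l * d / norm_A)), 1), J)
        buckets[j - 1].append(l)
    # with beta = 1/2: ||A_j||_2 * rank(A_j) <= e * Tr(A_j) for each nonempty bucket
    for bucket in buckets:
        if bucket:
            assert max(bucket) * len(bucket) <= math.e * sum(bucket) + 1e-9
```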

Proof of Lemma 10. Let \(\mu \) be the mean of p. First, for X a random vector in \(\mathbb {R}^d\) drawn from p, we define the standardized random variable \(A^{-1/2} (X - \mu )\) and denote its density by \(\varrho \), which is an isotropic log-concave density. Then through a change of variables, we have

$$\begin{aligned}&\mathcal {T}_p \left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right) \\&\quad = \int \int (x-\mu )^\top A^{q-2} (y-\mu ) \cdot (x-\mu )^\top (y-\mu ) \cdot (x-\mu ) ^\top (y-\mu ) p(x) p(y) dx dy \\&\quad = \int \int \left( x^\top A^{q-1} y \right) (x^\top A y) (x ^\top A y) \varrho (x) \varrho (y) dx dy \\&\quad \le \int \int \left( x^\top A^{q} y \right) (x^\top A y) (x ^\top y) \varrho (x) \varrho (y) dx dy \\&\quad = \mathcal {T}_\varrho \left( A^{q}, A, \mathbb {I}_d \right) , \end{aligned}$$

where the last inequality follows from Lemma 12. \(A^q\) is positive semi-definite and we write its eigenvalue decomposition as \(A^q = \sum _{i=1}^d\lambda _i v_i v_i ^\top \) with \(\lambda _i \ge 0\). Since \(\varrho \) is isotropic, we can rewrite the 3-tensor in summation form and apply Lemma 13:

$$\begin{aligned}&\mathcal {T}_\varrho \left( A^{q}, A, \mathbb {I}_d \right) \\&\quad = \int \int \left( x ^\top A^q y \right) \left( x ^\top A y \right) \left( x^\top y \right) \varrho (x) \varrho (y) dx dy \\&\quad = \sum _{i=1}^d\lambda _i \int \int \left( x ^\top v_i \right) \left( y ^\top v_i \right) \left( x ^\top A y \right) \left( x^\top y \right) \varrho (x) \varrho (y) dx dy \\&\quad = \sum _{i=1}^d\lambda _i {{\,\mathrm{Tr}\,}}\left( \Delta _i A \Delta _i \right) \\&\quad {\mathop {\le }\limits ^{\mathrm{(i)}}} 128 \alpha ^2 \log (d) \left( {{\,\mathrm{Tr}\,}}(A^{1/2\beta }) \right) ^{2\beta } \left( \sum _{i=1}^d\lambda _i \right) \\&\quad = 128 \alpha ^2 \log (d) \left( {{\,\mathrm{Tr}\,}}(A^{1/2\beta }) \right) ^{2\beta } {{\,\mathrm{Tr}\,}}(A^q) \\&\quad {\mathop {\le }\limits ^{\mathrm{(ii)}}} 128 \alpha ^2 \log (d) {{\,\mathrm{Tr}\,}}(A^q) \left[ {{\,\mathrm{Tr}\,}}\left( A^q \right) ^{1/(2\beta q)} \left( d \right) ^{1 - 1/(2\beta q)} \right] ^{2\beta } \\&\quad = 128 \alpha ^2 \log (d) d^{2\beta - 1/q} {{\,\mathrm{Tr}\,}}(A^q) ^{1 + 1/q}, \end{aligned}$$

where we define \(\Delta _i = \int (x^\top v_i) x x^\top \varrho (x) dx\); inequality (i) follows from Lemma 13 and the fact that \(\varrho \) is isotropic; inequality (ii) follows from Hölder's inequality and the assumption that \(q \ge \frac{1}{2\beta }\).\(\square \)

Proof of Lemma 11. Without loss of generality, we can assume that the density p has mean 0. Its covariance matrix A is positive semi-definite and invertible. We write its eigenvalue decomposition as \(A = \sum _{i=1}^d\lambda _i v_i v_i^\top \) with \(\lambda _i > 0\) and unit-norm eigenvectors \(v_i\). Then \(A^{q}\) has an eigenvalue decomposition with the same eigenvectors, \(A^q = \sum _{i=1}^d\lambda _i^q v_i v_i^\top \). Define \(\Delta _i = {\mathbb {E}}_{X \sim p} (X^\top A^{-1/2}v_i) X X ^\top \), then

$$\begin{aligned} \mathcal {T}_p\left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right)&= {\mathbb {E}}_{X, Y \sim p} \left( X^\top A^{q-2} Y \right) (X^\top Y) (X ^\top Y) \nonumber \\&= \sum _{i=1}^d\lambda _i^{q-1} {{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right) . \end{aligned}$$
(32)

Next we bound the terms \({{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right) \). We have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right)&= {\mathbb {E}}_{X \sim p} \left( X ^\top A^{-1/2} v_i \right) X^\top \Delta _i X \\&{\mathop {=}\limits ^{\mathrm{(i)}}} {\mathbb {E}}_{X \sim p} \left( X ^\top A^{-1/2} v_i \right) \left( X^\top \Delta _i X - {\mathbb {E}}_{Y \sim p} \left[ Y^\top \Delta _i Y \right] \right) \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} \left( {\mathbb {E}}_{X \sim p} \left( X ^\top A^{-1/2} v_i \right) ^2 \right) ^{1/2} \left( {{\,\mathrm{Var}\,}}\left( X ^\top \Delta _i X \right) \right) ^{1/2} \\&{\mathop {=}\limits ^{\mathrm{(iii)}}} \left( {{\,\mathrm{Var}\,}}_{X \sim p}\left( X ^\top \Delta _i X \right) \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(iv)}}} \left( {\mathbb {E}}_{X \sim p} \frac{1}{\tau } \left\| \Delta _i X + \Delta _i X\right\| _{2}^2 \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(v)}}} \left( \frac{4}{\tau } {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) \right) ^{1/2}. \end{aligned}$$

Equality (i) holds because \({\mathbb {E}}_{X \sim p} X = 0\). Inequality (ii) follows from the Cauchy–Schwarz inequality. Equality (iii) follows from the definition of the covariance matrix, \({\mathbb {E}}_{X\sim p} XX^\top = A\). Inequality (iv) follows from the Brascamp–Lieb inequality (or Hessian Poincaré, see Theorem 4.1 in Brascamp and Lieb [5]) together with the assumption that p is more log-concave than \(\mathcal {N}(0, \frac{1}{\tau }\mathbb {I}_d)\). Inequality (v) follows since \({\mathbb {E}}_{X \sim p} \left\| 2\Delta _i X\right\| _{2}^2 = 4 {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) \).

Plugging the bounds of the terms \({{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right) \) into Equation (32), we obtain

$$\begin{aligned} \mathcal {T}_p\left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right)&= \sum _{i=1}^d\lambda _i^{q-1} {{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right) \\&\le \sum _{i=1}^d\lambda _i^{q-1} \left( \frac{4}{\tau } {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} \frac{2}{\tau ^{1/2}} \left( \sum _{i=1}^d\lambda _i^{q} \right) ^{1/2} \left( \sum _{i=1}^d\lambda _i^{q-2} {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) \right) ^{1/2} \\&= \frac{2}{\tau ^{1/2}} \left( {{\,\mathrm{Tr}\,}}\left( A^q \right) \right) ^{1/2} \left( {\mathbb {E}}_{X, Y \sim p} \left( X^\top A^{q-3} Y \right) (X^\top A Y) (X ^\top Y) \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} \frac{2}{\tau ^{1/2}} \left( {{\,\mathrm{Tr}\,}}\left( A^q \right) \right) ^{1/2} \left( {\mathbb {E}}_{X, Y \sim p} \left( X^\top A^{q-2} Y \right) (X^\top Y) (X ^\top Y) \right) ^{1/2} \\&= \frac{2}{\tau ^{1/2}} \left( {{\,\mathrm{Tr}\,}}\left( A^q \right) \right) ^{1/2} \left[ \mathcal {T}_p\left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right) \right] ^{1/2}. \end{aligned}$$

Inequality (i) follows from the Cauchy–Schwarz inequality. Inequality (ii) follows from Lemma 12, using \(q \ge 3\). Rearranging the terms in the above inequality, we obtain

$$\begin{aligned} \mathcal {T}_p\left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right) \le \frac{4}{\tau } {{\,\mathrm{Tr}\,}}\left( A^q \right) . \end{aligned}$$

\(\square \)

Proof of Lemma 12. This lemma was proved as Lemma 43 in an older version (arXiv version 2) of Lee and Vempala [17]; we provide a proof here for completeness.

Without loss of generality, we can assume that the density p has mean 0. For \(i \in \left\{ 1, \cdots , d \right\} \), we define \(\Delta _i = {\mathbb {E}}_{X\sim p} B^{1/2} X X ^\top B^{1/2} X^\top C^{1/2} e_i\), where \(e_i \in \mathbb {R}^d\) is the vector with ith coordinate 1 and 0 elsewhere, so that \(\sum _{i=1}^de_i e_i ^\top = \mathbb {I}_d\). We can rewrite the tensor on the left-hand side as a sum of traces:

$$\begin{aligned}&\mathcal {T}_{p}(B^{1/2}A^\delta B^{1/2}, B^{1/2}A^{1-\delta }B^{1/2}, C) \nonumber \\&\quad = {\mathbb {E}}_{X, Y \sim p} X^\top B^{1/2}A^\delta B^{1/2} Y \cdot X^\top B^{1/2}A^{1-\delta }B^{1/2} Y \cdot X ^\top C Y \nonumber \\&\quad = \sum _{i=1}^d{\mathbb {E}}_{X, Y \sim p} X^\top B^{1/2}A^\delta B^{1/2} Y \cdot X^\top B^{1/2}A^{1-\delta }B^{1/2} Y \cdot X^\top C^{1/2} e_i \cdot Y^\top C^{1/2} e_i \nonumber \\&\quad = \sum _{i=1}^d{{\,\mathrm{Tr}\,}}\left( A^{\delta } \Delta _i A^{1-\delta } \Delta _i \right) . \end{aligned}$$
(33)

For any symmetric matrix F, any positive semi-definite matrix G and any \(\delta \in [0, 1]\), we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( G^\delta F G^{1-\delta } F \right) \le {{\,\mathrm{Tr}\,}}\left( G F^2 \right) . \end{aligned}$$
(34)

Applying the trace inequality (34), which we prove below for completeness (see also Lemma 2.1 in Zhu et al. [1]), we obtain

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( A^{\delta } \Delta _i A^{1-\delta } \Delta _i \right) \le {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) . \end{aligned}$$

Writing the sum of traces in Equation (33) back to the 3-Tensor form, we conclude Lemma 12.

It remains to prove the trace inequality in Equation (34). Without loss of generality, we can assume G is diagonal. Hence, we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( G^\delta F G^{1-\delta } F \right)&= \sum _{i = 1}^d\sum _{j = 1}^dG_{ii}^\delta G_{jj}^{1-\delta } F_{ij}^2 \\&\le \sum _{i=1}^d\sum _{j=1}^d\left( \delta G_{ii} + (1-\delta ) G_{jj} \right) F_{ij}^2 \\&= \delta \sum _{i=1}^d\sum _{j=1}^dG_{ii} F_{ij}^2 + (1-\delta ) \sum _{i=1}^d\sum _{j=1}^dG_{jj} F_{ij}^2 \\&= {{\,\mathrm{Tr}\,}}\left( G F^2 \right) , \end{aligned}$$

where the inequality follows from the weighted inequality of arithmetic and geometric means, \(G_{ii}^\delta G_{jj}^{1-\delta } \le \delta G_{ii} + (1-\delta ) G_{jj}\) (equivalently, Jensen's inequality applied to the concave logarithm).\(\square \)
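Since the proof reduces (34) to diagonal G, the inequality can be spot-checked directly in entrywise form, \(\sum _{i,j} G_{ii}^\delta G_{jj}^{1-\delta } F_{ij}^2 \le \sum _{i,j} G_{ii} F_{ij}^2\), for random diagonal G and random symmetric F (a sketch, not part of the proof):

```python
import random

random.seed(3)
d = 6
for _ in range(200):
    delta = random.random()
    G = [random.uniform(0.0, 3.0) for _ in range(d)]  # diagonal entries of a PSD G
    M = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(d)]
    F = [[(M[i][j] + M[j][i]) / 2.0 for j in range(d)] for i in range(d)]  # symmetric F
    # Tr(G^delta F G^{1-delta} F) and Tr(G F^2), written entrywise for diagonal G.
    lhs = sum(G[i] ** delta * G[j] ** (1.0 - delta) * F[i][j] ** 2
              for i in range(d) for j in range(d))
    rhs = sum(G[i] * F[i][j] ** 2 for i in range(d) for j in range(d))
    assert lhs <= rhs + 1e-9
```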