
An Almost Constant Lower Bound of the Isoperimetric Coefficient in the KLS Conjecture

Abstract

We prove an almost constant lower bound of the isoperimetric coefficient in the KLS conjecture. The lower bound has the dimension dependency \(d^{-o_d(1)}\). When the dimension is large enough, our lower bound is tighter than the previous best bound which has the dimension dependency \(d^{-1/4}\). Improving the current best lower bound of the isoperimetric coefficient in the KLS conjecture has many implications, including improvements of the current best bounds in Bourgain’s slicing conjecture and in the thin-shell conjecture, better concentration inequalities for Lipschitz functions of log-concave measures and better mixing time bounds for MCMC sampling algorithms on log-concave measures.

Introduction

Given a distribution, the isoperimetric coefficient of a subset is the ratio of the measure of the subset boundary to the minimum of the measures of the subset and its complement. Taking the infimum of such ratios over all subsets defines the isoperimetric coefficient of the distribution, also called the Cheeger isoperimetric coefficient of the distribution.

Kannan, Lovász and Simonovits (KLS) [12] conjecture that for any log-concave distribution, the Cheeger isoperimetric coefficient equals, up to a universal constant factor, that achieved by half-spaces. If the conjecture is true, the Cheeger isoperimetric coefficient can be determined by going through all half-spaces instead of all subsets. For this reason, the KLS conjecture is also called the KLS hyperplane conjecture. To make this precise, we first formally define log-concave distributions and then state the conjecture.

A probability density function \(p: \mathbb {R}^d\rightarrow \mathbb {R}\) is log-concave if its logarithm is concave, i.e., for any \(x, y \in \mathbb {R}^{d}\) and for any \(\lambda \in [0, 1]\),

$$\begin{aligned} p(\lambda x + (1 - \lambda ) y) \ge p(x)^\lambda p(y)^{1-\lambda }. \end{aligned}$$
(1)

Common probability distributions such as Gaussian, exponential and logistic are log-concave. This definition also includes any uniform distribution over a convex set, defined as follows. A subset \(K \subset \mathbb {R}^d\) is convex if for all \(x, y \in K\), every point z on the segment [x, y] belongs to K. The isoperimetric coefficient \(\psi (p)\) of a density p in \(\mathbb {R}^d\) is defined as

$$\begin{aligned} \psi (p) :=\inf _{S \subset \mathbb {R}^d}\frac{p^+(\partial S)}{\min (p(S), p(S^c))} \end{aligned}$$
(2)

where \(p(S) = \int _{x \in S} p(x) dx\) and the boundary measure of the subset is

$$\begin{aligned} p^+(\partial S) :=\underset{\epsilon \rightarrow 0^+}{\lim \inf }\ \frac{p\left( \left\{ x: {\mathbf {d}}(x, S) \le \epsilon \right\} \right) - p(S)}{\epsilon }, \end{aligned}$$

where \({\mathbf {d}}(x, S)\) is the Euclidean distance between x and the subset S.
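As a sanity check on definition (2) (an illustration, not part of the paper's argument), the following sketch numerically evaluates the isoperimetric ratio of a half-space under the standard Gaussian, using the finite-difference form of the boundary measure; the half-space reduces to a one-dimensional computation through its normal direction.

```python
import math

def gaussian_cdf(x):
    # CDF of the one-dimensional standard Gaussian
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

# Half-space S = {x in R^d : x_1 <= 0} under the standard Gaussian measure.
# Its measure is 1/2 and the eps-enlargement has measure Phi(eps), so the
# finite difference below approximates the boundary measure p^+(dS).
eps = 1e-6
boundary = (gaussian_cdf(eps) - gaussian_cdf(0.0)) / eps
ratio = boundary / 0.5          # ratio in the definition of psi(p)

print(ratio, math.sqrt(2.0 / math.pi))  # both about 0.7979
```

The limiting ratio \(2\varphi (0) = \sqrt{2/\pi }\) is independent of the dimension d, which is consistent with the conjectured constant-order behavior for isotropic densities.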

The KLS conjecture is stated by Kannan, Lovász and Simonovits [12] as follows.

Conjecture 1

There exists a universal constant c, such that for any log-concave density p in \(\mathbb {R}^d\), we have

$$\begin{aligned} \psi (p) \ge \frac{c}{\sqrt{\rho \left( p \right) }}, \end{aligned}$$

where \(\rho \left( p \right) \) is the spectral norm of the covariance matrix of p. In other words, \(\rho \left( p \right) = \left\| A\right\| _{2}\), where \(A = {{\,\mathrm{Cov}\,}}_{X \sim p} (X)\) is the covariance matrix.

An upper bound of \(\psi (p)\) of the same form is relatively easy to establish, and it is achieved by half-spaces [12]. Proving the lower bound on \(\psi (p)\) up to some small factors in Conjecture 1 is the main goal of this paper. We say a log-concave density is isotropic if its mean \({\mathbb {E}}_{X\sim p} [X]\) equals 0 and its covariance \({{\,\mathrm{Cov}\,}}_{X\sim p}(X)\) equals \(\mathbb {I}_d\). In the case of isotropic log-concave densities, the KLS conjecture states that any isotropic log-concave density has its isoperimetric coefficient lower bounded by a universal constant.

There have been many attempts to lower bound the Cheeger isoperimetric coefficient in the KLS conjecture. We refer readers to the survey paper by Lee and Vempala [18] for a detailed exposition of these attempts. In particular, the original KLS paper [12] (Theorem 5.1) shows that for any log-concave density p with covariance matrix A,

$$\begin{aligned} \psi (p) \ge \frac{\log (2)}{\sqrt{{{\,\mathrm{Tr}\,}}\left( A \right) }}. \end{aligned}$$
(3)

The original KLS paper [12] only deals with uniform distributions over convex sets, but their proof techniques can be easily extended to show that the same results hold for all log-concave densities. Remark that Equation (3) implies \(\psi (p) \ge \frac{\log (2)}{d^{1/2} \cdot \sqrt{\rho \left( p \right) }}\). The current best bound is shown in Lee and Vempala [17], where they show that there exists a universal constant c such that for any log-concave density p with covariance matrix A,

$$\begin{aligned} \psi (p) \ge \frac{c}{\left( {{\,\mathrm{Tr}\,}}\left( A^2 \right) \right) ^{1/4}}. \end{aligned}$$
(4)

It implies that \(\psi (p) \ge \frac{c}{d^{1/4} \cdot \sqrt{\rho \left( p \right) }}\). Note that in Lee and Vempala [17], their notation for \(\psi (p)\) is the reciprocal of ours; the convention is switched in Theorem 32 of the survey paper [18] by the same authors. Hence the above bound is not a misstatement of the results in Lee and Vempala [17]; it is simply their result translated into our notation. In this paper, we improve the dimension dependency \(d^{-1/4}\) to \(d^{-o_d(1)}\) in the lower bound of the isoperimetric coefficient.
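The quoted dimension dependencies follow from the elementary spectral inequalities \({{\,\mathrm{Tr}\,}}(A) \le d\rho (p)\) and \({{\,\mathrm{Tr}\,}}(A^2) \le d\rho (p)^2\). A quick numerical spot check on random spectra (illustrative only; the eigenvalue range is arbitrary):

```python
import random

random.seed(0)

# For eigenvalues 0 < a_i <= rho of A: Tr(A) <= d*rho and Tr(A^2) <= d*rho^2,
# so (3) gives the d^{-1/2} and (4) gives the d^{-1/4} dependency in the text.
d = 50
ok = True
for _ in range(100):
    eig = [random.uniform(0.1, 5.0) for _ in range(d)]   # a random spectrum
    rho = max(eig)
    ok = ok and sum(eig) ** 0.5 <= (d * rho) ** 0.5
    ok = ok and sum(a * a for a in eig) ** 0.25 <= d ** 0.25 * rho ** 0.5
print(ok)  # True
```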

There are many implications of improving the lower bound in the KLS conjecture. The two closely related conjectures are Bourgain’s slicing conjecture [3, 4] and the thin-shell conjecture [2]. It is worth noting that Bourgain [4] stated the slicing conjecture before the introduction of the KLS conjecture. In terms of their connections to the KLS conjecture, Eldan and Klartag [9] proved that the thin-shell conjecture implies Bourgain’s slicing conjecture up to a universal constant factor. Later, Eldan [8] showed that the inverse of a lower bound of the isoperimetric coefficient is equivalent to an upper bound of the thin-shell constant in the thin-shell conjecture. Combining these two results, we have that a lower bound in the KLS conjecture implies upper bounds in the thin-shell conjecture and in Bourgain’s slicing conjecture.

The current best upper bound of the thin-shell constant has the dimension dependency \(d^{1/4}\) due to Lee and Vempala’s [17] improvement in the KLS conjecture. The current best bound of the slicing constant in Bourgain’s slicing conjecture also has the dimension dependency \(d^{1/4}\), proved by Klartag [13] without using the KLS conjecture. Klartag’s slicing constant bound is a slight improvement over Bourgain’s earlier slicing bound [4] which has the dimension dependency \(d^{1/4}\log (d)\). Given the current best bounds in these three conjectures and the relation among them, we conclude that improving the current best lower bound in the KLS conjecture improves the current best bounds for the other two conjectures, as noted in Lee and Vempala [18]. For a detailed exposition of the three conjectures and related results since the introduction of Bourgain’s slicing conjecture, we refer readers to Klartag and Milman [14].

Additionally, improving the lower bound in the KLS conjecture also improves concentration inequalities for Lipschitz functions of log-concave measures. It also leads to faster mixing time bounds of Markov chain Monte Carlo (MCMC) sampling algorithms on log-concave measures. Despite the great importance of these results, deriving these results from our new bound in the KLS conjecture is not the main focus of our paper. We refer readers to Milman [20] and Lee and Vempala [18] for more details about the abundant implications of the KLS conjecture.

Notation For two sequences \(a_n\) and \(b_n\) indexed by an integer n, we say that \(a_n = o_n(b_n)\) if \(\lim _{n \rightarrow \infty } \frac{a_n}{b_n} = 0\). The Euclidean norm of a vector \(x \in \mathbb {R}^d\) is denoted by \(\left\| x\right\| _{2}\). The spectral norm of a square matrix \(A \in \mathbb {R}^{d\times d}\) is denoted by \(\left\| A\right\| _{2}\). The Euclidean ball with center x and radius r is denoted by \(\mathbb {B}(x, r)\). For a real number \(x \in \mathbb {R}\), we denote its ceiling by \(\lceil x \rceil = \min \left\{ m \in \mathbb {Z} \mid m \ge x \right\} \). We say a density p is more log-concave than a Gaussian density \(\varphi \) if p can be written as a product form \(p = \nu \cdot \varphi \) where \(\varphi \) is the Gaussian density and \(\nu \) is a log-concave function (that is, \(\nu \) is proportional to a log-concave density). For a martingale \((M_t,\ t \in \mathbb {R}_+)\), we use \(\left[ M \right] _t\) to denote its quadratic variation, defined as

$$\begin{aligned} \left[ M \right] _t = \sup _{k \in \mathbb {N}} \sup _{0 \le t_0 \le t_1 \le \cdots \le t_k \le t} \sum _{i=1}^k \left( M_{t_i} - M_{t_{i-1}} \right) ^2. \end{aligned}$$
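For intuition about the quadratic variation, for Brownian motion one has \(\left[ W \right] _t = t\); the following simulation (illustrative, with an arbitrary seed) approximates it by the sum of squared increments over a fine partition.

```python
import random

random.seed(0)

# Increments of a Wiener path on [0, 1]: independent N(0, dt) variables.
n = 200_000
dt = 1.0 / n
increments = [random.gauss(0.0, dt ** 0.5) for _ in range(n)]

# Sum of squared increments over this fine partition; for Brownian motion
# the quadratic variation satisfies [W]_t = t.
qv = sum(x * x for x in increments)
print(qv)  # close to 1.0
```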

Main results

We prove the following lower bound on the isoperimetric coefficient of any log-concave density.

Theorem 1

There exists a universal constant c such that for any log-concave density p in \(\mathbb {R}^d\) and any integer \(\ell \ge 1\), we have

$$\begin{aligned} \psi (p) \ge \frac{1}{\left[ c \cdot \ell \left( \log (d)+1 \right) \right] ^{\ell /2} d^{16/\ell } \cdot \sqrt{\rho \left( p \right) }} \end{aligned}$$
(5)

where \(\rho \left( p \right) \) is the spectral norm of the covariance matrix of p.

As a corollary, taking \(\ell = \left\lceil \left( \frac{\log (d)}{\log \log (d)} \right) ^{1/2} \right\rceil \), there exists a universal constant \(c'\) such that

$$\begin{aligned} \psi (p) \ge \frac{1}{d^{c' \left( \frac{\log \log (d)}{\log {d}} \right) ^{1/2}} \cdot \sqrt{\rho \left( p \right) }}. \end{aligned}$$

Since \(\lim _{d\rightarrow \infty } \frac{\log \log (d)}{\log (d)} = 0\), for \(d\) large enough, the above lower bound is better than any lower bound of the form \(\frac{1}{d^{c''} \sqrt{\rho \left( p \right) }} \) (\(c''\) is a positive constant) in terms of dimension \(d\) dependency.
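The behavior of the exponent can be made concrete numerically. The constant \(c'\) is not specified by the theorem, so the sketch below takes the hypothetical value \(c' = 1\) purely for illustration.

```python
import math

def new_exponent(d, c_prime=1.0):
    # Exponent of d in the corollary: c' * (log log d / log d)^(1/2).
    # The universal constant c' is not specified; c' = 1 is illustrative.
    return c_prime * math.sqrt(math.log(math.log(d)) / math.log(d))

# The exponent tends to 0, so the bound eventually beats any fixed d^{-c''};
# with c' = 1 it drops below 1/4 only for astronomically large d.
for d in [10 ** 3, 10 ** 6, 10 ** 12, 10 ** 31]:
    print(d, new_exponent(d))
```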

The proof of the main theorem uses the stochastic localization scheme introduced by Eldan [8]. Eldan uses this scheme to show that the thin-shell conjecture is equivalent to the KLS conjecture up to a logarithmic factor. The construction of the stochastic localization scheme uses elementary properties of semimartingales and stochastic integration. The main idea of Eldan’s proof to derive the KLS conjecture from the thin-shell conjecture is to smoothly multiply a Gaussian part into the log-concave density, so that the modified density is more log-concave than a Gaussian density. When the Gaussian part is large enough, one can then easily prove the isoperimetric inequality.

The same scheme was refined in Lee and Vempala [17] to obtain the current best lower bound in the KLS conjecture. Lee and Vempala directly attack the KLS conjecture while following the same stochastic localization scheme to smoothly multiply a Gaussian part into the log-concave density. Their use of a new potential function leads to the current best lower bound in the KLS conjecture. The proof in this paper builds on the refinements of Eldan’s method by Lee and Vempala [17], and improves the handling of several quantities involved in the stochastic localization scheme. Figure 1 provides a diagram showing the relationship between the main lemmas.

Fig. 1

Proof sketch.

To ensure the existence and the uniqueness of the stochastic localization construction, we first prove a lemma that deals with log-concave densities with compact support. Then we relate back to the main theorem by finding a compact set which contains most of the probability mass of a log-concave density.

Lemma 1

There exists a universal constant c such that for any log-concave density p in \(\mathbb {R}^d\) with compact support and any integer \(\ell \ge 1\), we have

$$\begin{aligned} \psi (p) \ge \frac{1}{\left[ c \cdot \ell \left( \log (d)+1 \right) \right] ^{\ell /2} d^{16/\ell } \cdot \sqrt{\rho \left( p \right) }}. \end{aligned}$$
(6)

The proof of Lemma 1 is provided in Section 2.5 after we introduce the intermediate lemmas. The use of the integer \(\ell \) in the lemma indicates that we control the Cheeger isoperimetric coefficient in an iterative fashion. In fact, we prove Lemma 1 by induction over \(\ell \) starting from the known bound in Equation (3). For this, we define the infimum of the product of the isoperimetric coefficient and the square root of the spectral norm of the covariance matrix over all log-concave densities in \(\mathbb {R}^d\) with compact support:

$$\begin{aligned} \psi _d= \inf _{ \begin{array}{c} \text{ log-concave } \text{ density }\ p\ \text{ in }\ \mathbb {R}^d\\ \text{ with } \text{ compact } \text{ support } \end{array}} \psi (p) \sqrt{\rho \left( p \right) }. \end{aligned}$$
(7)

Then we prove the following lemma on the lower bound of \(\psi _d\), which serves as the main induction argument.

Lemma 2

Suppose that \(\psi _k \ge \frac{1}{\alpha k^\beta }\) for all \(k \le d\) for some \(0 < \beta \le \frac{1}{2}\) and \(\alpha \ge 1\), and let \(q = \lceil \frac{1}{\beta } \rceil + 1\). Then there exists a universal constant c such that

$$\begin{aligned} \psi _d\ge \frac{1}{c \cdot q^{1/2} \alpha \log (d)^{1/2} d^{\beta - \beta / (8q) }}. \end{aligned}$$

The proof of Lemma 2 is provided towards the end of this section in Section 2.4. To have a good understanding of how we get there, we start by introducing the stochastic localization scheme introduced by Eldan [8].

Eldan’s stochastic localization scheme.

Given a log-concave density p in \(\mathbb {R}^d\) with covariance matrix \(A\), we define the following stochastic differential equation (SDE)

$$\begin{aligned} dc_t&= C_t^{1/2}dW_t + C_t \mu _t dt,\quad c_0 = 0,\nonumber \\ dB_t&= C_t dt,\quad B_0 = 0, \end{aligned}$$
(8)

where \(W_t\) is the Wiener process, the matrix \(C_t\), the density \(p_t\), the mean \(\mu _t\) and the covariance \(A_t\) are defined as follows

$$\begin{aligned} C_t&= A^{-1}, \end{aligned}$$
(9)
$$\begin{aligned} p_t(x)&= \frac{e^{c_t^\top x - \frac{1}{2}x^\top B_t x} p(x)}{\int _{\mathbb {R}^d} e^{c_t ^\top y - \frac{1}{2}y^\top B_t y} p(y) dy}, \text {for}\, x \in \mathbb {R}^d, \end{aligned}$$
(10)
$$\begin{aligned} \mu _t&= \int _{\mathbb {R}^d} x p_t(x)dx, \end{aligned}$$
(11)
$$\begin{aligned} A_t&= \int _{\mathbb {R}^d} \left( x - \mu _t \right) \left( x - \mu _t \right) p_t(x) dx. \end{aligned}$$
(12)

The next lemma shows the existence and the uniqueness of the SDE solution.

Lemma 3

Given a density p in \(\mathbb {R}^d\) with compact support whose covariance matrix \(A\) is invertible, the SDE (8) is well defined and has a unique solution on the time interval [0, T] for any time \(T > 0\). Additionally, for any \(x \in \mathbb {R}^d\), \(p_t(x)\) is a martingale with

$$\begin{aligned} dp_t(x) = \left( x - \mu _t \right) ^\top A^{-1/2} dW_t p_t(x). \end{aligned}$$
(13)

The proof of Lemma 3 follows from the standard existence and uniqueness theorem of SDE (Theorem 5.2 in Øksendal [21]). The proof is provided in Appendix A.
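To build intuition for the scheme (8)–(12), the following sketch discretizes it in one dimension on a small grid (the grid density is an arbitrary illustrative choice, and the Euler–Maruyama step introduces a small discretization bias). It checks numerically that \(p_t(E)\) behaves like a martingale: its average over many runs stays close to \(p(E)\).

```python
import math
import random

random.seed(1)

# Toy one-dimensional discretization of the scheme (8)-(12) on a small grid;
# the grid density below is an arbitrary illustrative choice.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = [math.exp(-x * x / 2.0) for x in xs]
p = [wi / sum(w) for wi in w]

mean0 = sum(x * pi for x, pi in zip(xs, p))
A = sum((x - mean0) ** 2 * pi for x, pi in zip(xs, p))  # scalar covariance

def density_at(c, t):
    # Equation (10) with B_t = t / A, since C_t = A^{-1} is the scalar 1/A here.
    u = [math.exp(c * x - 0.5 * (t / A) * x * x) * pi for x, pi in zip(xs, p)]
    s = sum(u)
    return [ui / s for ui in u]

E = [x <= -1.0 for x in xs]                      # subset whose measure we track
pE0 = sum(pi for pi, e in zip(p, E) if e)

def run(t_end=0.5, dt=0.01):
    # Euler-Maruyama step for dc_t = A^{-1/2} dW_t + A^{-1} mu_t dt.
    c, t = 0.0, 0.0
    while t < t_end:
        mu = sum(x * pi for x, pi in zip(xs, density_at(c, t)))
        c += random.gauss(0.0, math.sqrt(dt / A)) + mu / A * dt
        t += dt
    return sum(pi for pi, e in zip(density_at(c, t), E) if e)

# p_t(E) is a martingale, so its average over runs stays near p(E).
avg = sum(run() for _ in range(500)) / 500.0
print(pE0, avg)
```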

Before we dive into the proof of Lemma 2, we discuss how the stochastic localization scheme allows us to control the boundary measure of a subset. First, according to the concavity of the isoperimetric profile (Theorem 2.8 in Sternberg and Zumbrun [25] or Theorem 1.8 in Milman [20]), it is sufficient to consider subsets of measure 1/2 in the definition of the isoperimetric coefficient in Equation (2). Second, the density \(p_t\) is log-concave and it is more log-concave than the Gaussian density proportional to \(e^{-\frac{1}{2}x^\top B_t x}\). It can be shown via the KLS localization lemma [12] that a density which is more log-concave than a Gaussian has an isoperimetric coefficient lower bound that depends on the covariance of the Gaussian (see e.g. Theorem 2.7 in Ledoux [16] or Theorem 4.4 in Cousins and Vempala [7]). Third, given an initial subset E of \(\mathbb {R}^d\) with measure \(p(E) = \frac{1}{2}\), using the martingale property of \(p_t(E)\), we observe that

$$\begin{aligned} p^+(\partial E)&= {\mathbb {E}}\left[ p_t^+(\partial E) \right] \\&{\mathop {\ge }\limits ^{\mathrm{(i)}}} {\mathbb {E}}\left[ \frac{1}{2}\left\| B_t^{-1}\right\| _{2}^{-1/2} \min \left( p_t(E), p_t(E^c) \right) \right] \\&{\mathop {\ge }\limits ^{\mathrm{(ii)}}} \frac{1}{4}\cdot \frac{1}{2}\left\| B_t^{-1}\right\| _{2}^{-1/2} {\mathbb {P}}\left( \frac{1}{4}\le p_t(E) \le \frac{3}{4}\right) \\&= \frac{1}{4}\left\| B_t^{-1}\right\| _{2}^{-1/2} {\mathbb {P}}\left( \frac{1}{4}\le p_t(E) \le \frac{3}{4}\right) \cdot \min \left\{ p(E), p(E^c) \right\} . \end{aligned}$$

Inequality (i) uses the isoperimetric inequality for a log-concave density which is more log-concave than a Gaussian density proportional to \(e^{-\frac{1}{2}x^\top B_t x}\) [7, 16]. Inequality (ii) uses the fact that \(p_t(E)\) is nonnegative.

Based on the above observation, the high level idea of the proof requires two main steps:

  • There exists some time \(t > 0\), such that the Gaussian component \(\frac{1}{2}x^\top B_t x\) of the density \(p_t\) is large enough, so that we can apply the known isoperimetric inequality for densities more log-concave than a Gaussian.

  • We need to control the quantity \(p_t(E)\) so that the obtained isoperimetric inequality at time t can be related back to that at time 0.

The first step is straightforward since our construction explicitly enforces the density \(p_t\) to have a Gaussian component \(\frac{1}{2}x^\top B_t x\) in Equation (10). Then the remaining question is whether we can run the SDE long enough to make the Gaussian component large enough while still keeping \(p_t(E)\) of the same order as \(p(E) = \frac{1}{2}\) with large probability.

Control the evolution of the measure of a subset.

Lemma 4

Under the same assumptions of Lemma 3, for any measurable subset E of \(\mathbb {R}^d\) with \(p(E) = \frac{1}{2}\) and \(t > 0\), the solution \(p_t\) of the SDE (8) satisfies

$$\begin{aligned} {\mathbb {P}}\left( \frac{1}{4} \le p_t(E) \le \frac{3}{4} \right) \ge \frac{9}{10} - {\mathbb {P}}\left( \int _0^t \left\| A^{-1/2} A_s A^{-1/2}\right\| _{2} ds \ge \frac{1}{64} \right) . \end{aligned}$$

This lemma is proved in Lemma 29 of Lee and Vempala [17]. We provide a proof here for completeness.

Proof of Lemma 4. Let \(g_t = p_t(E)\). Using Equation (13), we obtain the following differential of \(g_t\)

$$\begin{aligned} d g_t&= \int _E (x - \mu _t)^\top A^{-1/2} dW_t p_t(x) dx. \end{aligned}$$

Its quadratic variation is

$$\begin{aligned} d\left[ g \right] _t&= \left\| \int _E A^{-1/2} (x - \mu _t) p_t(x) dx \right\| _{2}^2 dt \\&= \max _{\left\| \xi \right\| _{2} \le 1} \left( \int _E \xi ^\top A^{-1/2} (x - \mu _t) p_t(x) dx \right) ^2 dt \\&\le \max _{\left\| \xi \right\| _{2} \le 1} \left( \int _E \left( \xi ^\top A^{-1/2} (x - \mu _t) \right) ^2 p_t(x) dx \right) \left( \int _E p_t(x) dx \right) dt \\&\le \max _{\left\| \xi \right\| _{2} \le 1} \xi ^\top A^{-1/2} A_t A^{-1/2} \xi dt \\&= \left\| A^{-1/2} A_t A^{-1/2}\right\| _{2} dt, \end{aligned}$$

where the inequality follows from the Cauchy–Schwarz inequality. Applying the Dambis–Dubins–Schwarz theorem (see e.g. Revuz and Yor [23] Section V.1 Theorem 1.7), there exists a Wiener process \({\tilde{W}}_t\) such that \(g_t - g_0\) has the same distribution as \({\tilde{W}}_{[g]_t}\). Since \(g_0 = \frac{1}{2}\), we obtain

$$\begin{aligned} {\mathbb {P}}\left( \frac{1}{4} \le p_t(E) \le \frac{3}{4} \right)&= {\mathbb {P}}\left( -\frac{1}{4} \le {\tilde{W}}_{[g]_t} \le \frac{1}{4} \right) \\&\ge 1 - {\mathbb {P}}\left( \max _{0 \le s \le \frac{1}{64}} \left| {\tilde{W}}_s \right|> \frac{1}{4} \right) - {\mathbb {P}}([g]_t> \frac{1}{64}) \\&\ge 1 - 4 {\mathbb {P}}\left( {\tilde{W}}_{\frac{1}{64}}> \frac{1}{4} \right) - {\mathbb {P}}\left( [g]_t> \frac{1}{64} \right) \\&\ge \frac{9}{10} - {\mathbb {P}}\left( \int _0^t \left\| A^{-1/2} A_s A^{-1/2}\right\| _{2} ds > \frac{1}{64} \right) , \end{aligned}$$

where the bound on the running maximum uses the reflection principle \({\mathbb {P}}\left( \max _{0 \le s \le t} |{\tilde{W}}_s|> a \right) \le 4 {\mathbb {P}}\left( {\tilde{W}}_t> a \right) \), and the last inequality follows from the fact that \({\mathbb {P}}\left( \xi > 2 \right) < 0.023\) where \(\xi \) follows the standard Gaussian distribution.\(\square \)
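The numerical constant in the last step can be verified directly: \({\tilde{W}}_{1/64}\) has standard deviation 1/8, so the event \(\{{\tilde{W}}_{1/64} > 1/4\}\) has the probability of a standard Gaussian exceeding 2. A quick check (illustration only):

```python
import math

# Standard Gaussian tail P(Z > 2) via the complementary error function.
tail = 0.5 * math.erfc(2.0 / math.sqrt(2.0))
print(tail)  # about 0.02275 < 0.023

# W_{1/64} has standard deviation 1/8, so {W_{1/64} > 1/4} = {Z > 2},
# and the final step of the proof gives 1 - 4 * tail >= 9/10.
assert 1.0 - 4.0 * tail >= 0.9
```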

Control the evolution of the spectral norm.

According to Lemma 4, to control the evolution of the measures of subsets, we need to control the spectral norm of \(A^{-1/2} A_t A^{-1/2}\). The following lemma serves this purpose.

Lemma 5

In addition to the same assumptions of Lemma 3, if \(\psi _k \ge \frac{1}{\alpha k^\beta }\) for all \(k \le d\) for some \(0 < \beta \le \frac{1}{2}\) and \(\alpha \ge 1\), then there exists a universal constant c such that for \(q = \lceil \frac{1}{\beta } \rceil + 1\), \(d\ge 3\) and \(T_2 = \frac{1}{ c \cdot q \alpha ^2\log (d) d^{2\beta - \beta /(4q)}}\), we have

$$\begin{aligned} {\mathbb {P}}\left( \int _{0}^{T_2} \left\| A^{-1/2} A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{64} \right) < \frac{4}{10}. \end{aligned}$$

Direct control of the largest eigenvalue of \(A^{-1/2} A_t A^{-1/2}\) is not trivial; instead, we use the potential function \(\Gamma _t\) to upper bound the largest eigenvalue. Define

$$\begin{aligned} Q_t&= A^{-1/2} A_t A^{-1/2} \nonumber \\ \Gamma _t&= {{\,\mathrm{Tr}\,}}\left( Q_t^q \right) . \end{aligned}$$
(14)

It is clear that \(\Gamma _t^{1/q} \ge \left\| A^{-1/2} A_t A^{-1/2}\right\| _{2}\). So in order to upper bound \(\left\| A^{-1/2} A_t A^{-1/2}\right\| _{2}\), it is sufficient to upper bound \(\Gamma _t^{1/q}\). The advantage of using \(\Gamma _t\) is that it is differentiable. We have the following differential for \(A_t\) and \(\Gamma _t\):

$$\begin{aligned} dA_t&= \int (x - \mu _t) (x - \mu _t)^\top \left( (x-\mu _t)^\top A^{-1/2}dW_t \right) p_t(x) dx - A_tA^{-1}A_t dt, \end{aligned}$$
(15)
$$\begin{aligned} d\Gamma _t&= q \int \left( x-\mu _t \right) ^\top A^{-1/2} \left( Q_t \right) ^{q-1} A^{-1/2} \left( x-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1/2} dW_t p_t(x) dx \nonumber \\&\quad - q {{\,\mathrm{Tr}\,}}\left( Q_t^{q+1} \right) dt + \frac{q}{2} \sum _{a = 0}^{q-2} \int \int \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{a} A^{-1/2} \left( y-\mu _t \right) \nonumber \\&\quad \cdot \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{q-2-a} A^{-1/2} \left( y-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1} \left( y-\mu _t \right) p_t(x) p_t(y) dx dy dt. \end{aligned}$$
(16)

Obtaining these differentials uses Itô’s formula and the proofs are provided in Appendix A.
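The inequality \(\Gamma _t^{1/q} \ge \left\| Q_t\right\| _{2}\) is just the fact that the q-th power trace dominates the q-th power of the largest eigenvalue; a minimal numerical check on a matrix with known spectrum (illustration only):

```python
# Verify Tr(Q^q)^(1/q) >= ||Q||_2 on a 2x2 symmetric matrix with known
# eigenvalues 3 and 1 (so spectral norm 3); illustration only.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Q = [[2.0, 1.0], [1.0, 2.0]]
q = 4

P = Q
for _ in range(q - 1):
    P = matmul(P, Q)            # P = Q^q
gamma = P[0][0] + P[1][1]       # Gamma = Tr(Q^q) = 3^4 + 1^4 = 82

print(gamma ** (1.0 / q))       # about 3.009 >= ||Q||_2 = 3
assert gamma ** (1.0 / q) >= 3.0
```

The bound tightens as q grows, which is why the proof pushes q up to roughly \(1/\beta \).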

The next lemma upper bounds the terms in the potential \(\Gamma _t\).

Lemma 6

Under the same assumptions of Lemma 5, the differential of the potential \(\Gamma _t\) defined in Equation (14) can be written as follows

$$\begin{aligned} d\Gamma _t = v_t^\top dW_t + \delta _t dt, \end{aligned}$$

where \(v_t \in \mathbb {R}^d\) and \(\delta _t \in \mathbb {R}\) satisfy

$$\begin{aligned} \left\| v_t\right\| _{2}&\le 16 q \Gamma _t^{1 + 1/(2q)}, \text { and } \\ \delta _t&\le \min \left\{ 64 q^2 \alpha ^2 \log (d) d^{2\beta -1/q}\Gamma _t^{1 + 1/q}, \frac{2q^2}{t} \Gamma _t \right\} . \end{aligned}$$

The proof of Lemma 6 is provided in Section 3.1. Remark that bounds similar to the first bound on \(\delta _t\) in Lemma 6 have appeared in Lee and Vempala [17], whereas the second bound on \(\delta _t\) in Lemma 6 is novel. The second bound on \(\delta _t\) also leads to the following Lemma 8, which gives better control of the potential than the previous proof by Lee and Vempala [17] when t is large.

Using the bounds in Lemma 6, we state the two lemmas which control the potential \(\Gamma _t\) in two ways.

Lemma 7

Under the same assumptions of Lemma 6, using the following transformation

$$\begin{aligned} h: \mathbb {R}_+&\rightarrow \mathbb {R}\\ a&\mapsto -(a+1)^{-1/q} \end{aligned}$$

we have

$$\begin{aligned} {\mathbb {P}}\left( \max _{t \in [0, T_1]} h(\Gamma _t) \ge - \frac{1}{2}\left( d+1 \right) ^{-1/q} \right) \le \exp (-\frac{2}{3} q\log (d)) \le \frac{3}{10} \end{aligned}$$

where \(T_1 = \frac{1}{32768 q \alpha ^2 \log (d) d^{2\beta }}\).

Lemma 8

Under the same assumptions of Lemma 6, using the following transformation

$$\begin{aligned} f: \mathbb {R}_+&\rightarrow \mathbb {R}\\ a&\mapsto a^{1/q} \end{aligned}$$

we have

$$\begin{aligned} {\mathbb {E}}f(\Gamma _{t_2}) \le {\mathbb {E}}f(\Gamma _{t_1}) \left( \frac{t_2}{t_1} \right) ^{2q}, \forall t_2> t_1 > 0. \end{aligned}$$

The proofs of Lemma 7 and 8 are provided in Section 3.2.

Now we are ready to prove Lemma 5.

Proof of Lemma 5. We take

$$\begin{aligned} T_1 = \frac{1}{32768 q \alpha ^2 \log (d) d^{2\beta }}, \quad T_2 = \frac{d^{\beta /(4q)}}{40} T_1 = \frac{1}{ 1310720 q \alpha ^2\log (d) d^{2\beta - \beta /(4q)}}. \end{aligned}$$

We bound the spectral norm of \(A^{-1/2}A_t A^{-1/2}\) in two time intervals via Lemma 7 and Lemma 8. In the first time interval \([0, T_1]\), we have

$$\begin{aligned} {\mathbb {P}}\left( \int _0^{T_1} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{128} \right)&\le {\mathbb {P}}\left( \max _{t \in [0, T_1]} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} \ge \frac{1}{128T_1} \right) \nonumber \\&\quad {\mathop {\le }\limits ^{\mathrm{(i)}}} {\mathbb {P}}\left( \max _{t \in [0, T_1]} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} \ge 3 d^{1/q} \right) \nonumber \\&\quad {\mathop {\le }\limits ^{\mathrm{(ii)}}} {\mathbb {P}}\left( \max _{t \in [0, T_1]} \Gamma _t \ge 3^{q} d \right) \nonumber \\&\quad {\mathop {\le }\limits ^{\mathrm{(iii)}}} {\mathbb {P}}\left( \max _{t \in [0, T_1]} \Gamma _t + 1 \ge 2^{q} (d+1) \right) \nonumber \\&\quad = {\mathbb {P}}\left( \max _{t \in [0, T_1]} h(\Gamma _t) \ge -\frac{1}{2} \left( d+1 \right) ^{-1/q} \right) \nonumber \\&\quad {\mathop {\le }\limits ^{\mathrm{(iv)}}} \frac{3}{10}. \end{aligned}$$
(17)

Inequality (i) follows from the condition \(\beta q \ge 1\). (ii) follows from the fact that \({{\,\mathrm{Tr}\,}}\left( A^q \right) ^{1/q} \ge \left\| A\right\| _{2}\) for a positive semi-definite matrix A. (iii) is because \(3^q d\ge 2^q (d+ 1)\) when \(q \ge 2\) and \(d\ge 1\). The subsequent equality uses the definition of h in Lemma 7. (iv) follows from Lemma 7.

In the first time interval, we can also bound the expectation of \(\Gamma _{T_1}^{1/q}\). Since the density \(p_{T_1}\) is more log-concave than a Gaussian density with covariance matrix \(\frac{A}{T_1}\), the covariance matrix of \(p_{T_1}\) is upper bounded as follows (see Theorem 4.1 in Brascamp-Lieb [5] or Lemma 5 in Eldan and Lehec [10])

$$\begin{aligned} A_{T_1} \preceq \frac{A}{T_1}. \end{aligned}$$
(18)

Consequently, all the eigenvalues of \(Q_{T_1}\) are less than \(\frac{1}{T_1}\) and \(\Gamma _{T_1}\) is upper bounded by \(\frac{d}{T_1^{q}}\). Using the above bound, we can bound the expectation of \(\Gamma _{T_1}^{1/q}\) as follows

$$\begin{aligned} {\mathbb {E}}\left[ \Gamma _{T_1}^{1/q} \right]&= {\mathbb {E}}\left[ \mathbb {1}_{\Gamma _{T_1} \ge 3^q d} \Gamma _{T_1}^{1/q} + \mathbb {1}_{\Gamma _{T_1} < 3^q d} \Gamma _{T_1}^{1/q} \right] \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} \frac{d^{1/q}}{T_1} \exp \left( - \frac{2}{3} q \log (d) \right) + 3 d^{1/q} \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} 32768 d^{1/q} q \alpha ^2 + 4 d^{1/q} \nonumber \\&\le 40000 d^{1/q} q \alpha ^2. \end{aligned}$$
(19)

Inequality (i) follows from Lemma 7, the inequality \(3^qd\ge 2^q(d+1)\) (similar to what we did in the last four steps of Equation (17)) and Equation (18). (ii) follows from \(q \ge 2\), \(\beta \le {1/2}\) and \(d^{1/2} \ge \log (d)\) for \(d\ge 3\).

In the second time interval, for \(t \in [T_1, T_2]\), we have

$$\begin{aligned} {\mathbb {E}}\left[ \left\| A^{-1/2} A_{t} A^{-1/2}\right\| _{2} \right]&\le {\mathbb {E}}\left[ \Gamma _{t}^{1/q} \right] \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} {\mathbb {E}}\left[ \Gamma _{T_1}^{1/q} \right] \left( \frac{t}{T_1} \right) ^{2q} \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} {\mathbb {E}}\left[ \Gamma _{T_1}^{1/q} \right] \left( \frac{T_2}{T_1} \right) ^{2q} \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(iii)}}} 1000 d^{\beta /2 + 1/q} q \alpha ^2. \end{aligned}$$
(20)

Inequality (i) follows from Lemma 8. (ii) is because \(t \le T_2\). (iii) follows from \(T_2 = \frac{d^{\beta /(4q)}}{40} T_1\). Using the above bound, we control the spectral norm in the second time interval via Markov’s inequality

$$\begin{aligned} {\mathbb {P}}\left( \int _{T_1}^{T_2} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{128} \right)&{\mathop {\le }\limits ^{\mathrm{(i)}}} \frac{{\mathbb {E}}\left[ \int _{T_1}^{T_2} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \right] }{1/128} \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} T_2 \cdot 1000 d^{\beta /2 + 1/q} q \alpha ^2 \cdot 128 \nonumber \\&{\mathop {<}\limits ^{\mathrm{(iii)}}} \frac{1}{10}, \end{aligned}$$
(21)

where inequality (i) follows from Markov’s inequality and (ii) follows from Equation (20). (iii) follows from the definition of \(T_2\) and \(\frac{\beta }{2}+\frac{1}{q} \le 2\beta -\beta /(4q)\) when \(\beta q \ge 1\) and \(q \ge 2\).
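The exponent inequality invoked in step (iii) can be checked by a small numeric sweep over the admissible range of \((\beta , q)\) (illustration only; the grid resolution is arbitrary):

```python
# Sweep the exponent inequality used in step (iii):
# beta/2 + 1/q <= 2*beta - beta/(4q) whenever beta*q >= 1 and q >= 2.
ok = True
for q in range(2, 40):
    for i in range(1, 201):
        beta = i / 400.0                      # beta ranges over (0, 1/2]
        if beta * q >= 1.0:
            ok = ok and (beta / 2.0 + 1.0 / q
                         <= 2.0 * beta - beta / (4.0 * q))
print(ok)  # True
```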

Combining the bounds in the first and second time intervals in Equation (17) and (21), we obtain

$$\begin{aligned} {\mathbb {P}}\left( \int _{0}^{T_2} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{64} \right)&\le {\mathbb {P}}\left( \int _{0}^{T_1} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{128} \right) \nonumber \\&\quad + {\mathbb {P}}\left( \int _{T_1}^{T_2} \left\| A^{-1/2}A_t A^{-1/2}\right\| _{2} dt \ge \frac{1}{128} \right) \le \frac{4}{10}. \end{aligned}$$
(22)

\(\square \)

Proof of Lemma 2.

The proof of Lemma 2 follows the strategy described after Lemma 3. We make the arguments rigorous here. We consider a log-concave density p in \(\mathbb {R}^d\) with compact support. Without loss of generality, we can assume that the covariance matrix A of the density p is invertible. Otherwise, the density p is degenerate and we can instead prove the results in a lower dimension.

According to the concavity of the isoperimetric profile (Theorem 2.8 in Sternberg and Zumbrun [25] or Theorem 1.8 in Milman [20]), it is sufficient to consider subsets of measure 1/2 in the definition of the isoperimetric coefficient (2). Given an initial subset E of \(\mathbb {R}^d\) with \(p(E) = \frac{1}{2}\), using the martingale property of \(p_{T_2}(E)\), we have

$$\begin{aligned} p^+(\partial E)&= {\mathbb {E}}\left[ p_{T_2}^+(\partial E) \right] \\&{\mathop {\ge }\limits ^{\mathrm{(i)}}} {\mathbb {E}}\left[ \frac{1}{2}\left\| B_{T_2}^{-1}\right\| _{2}^{-1/2} \min \left( p_{T_2}(E), p_{T_2}(E^c) \right) \right] \\&{\mathop {\ge }\limits ^{\mathrm{(ii)}}} \frac{1}{4}\cdot \frac{1}{2}\left\| B_{T_2}^{-1}\right\| _{2}^{-1/2} {\mathbb {P}}( \frac{1}{4}\le p_{T_2}(E) \le \frac{3}{4})\\&= \frac{1}{4}\left\| B_{T_2}^{-1}\right\| _{2}^{-1/2} {\mathbb {P}}( \frac{1}{4}\le p_{T_2}(E) \le \frac{3}{4}) \cdot \min \left\{ p(E), p(E^c) \right\} \\&{\mathop {\ge }\limits ^{\mathrm{(iii)}}} \frac{1}{8}\left\| B_{T_2}^{-1}\right\| _{2}^{-1/2} \cdot \min \left\{ p(E), p(E^c) \right\} \\&{\mathop {=}\limits ^{\mathrm{(iv)}}} \frac{1}{8}T_2^{1/2}\left\| A\right\| _{2}^{-1/2} \cdot \min \left\{ p(E), p(E^c) \right\} . \end{aligned}$$

Inequality (i) uses the isoperimetric inequality for a log-concave density which is more log-concave than a Gaussian density proportional to \(e^{-\frac{1}{2}x^\top B_t x}\) (see e.g. Theorem 2.7 in Ledoux [16] or Theorem 4.4 in Cousins and Vempala [7]). Inequality (ii) follows from the fact that \(p_t(E)\) is nonnegative. (iii) follows from Lemma 4 and Lemma 5 (for \(d\ge 3\)). (iv) follows from the construction that \(B_t = t A^{-1}\). We conclude the proof since \(T_2\) is taken to be \(\frac{1}{ c \cdot q \alpha ^2\log (d) d^{2\beta - \beta /(4q)}}\) where c is a universal constant. The above proof only works for \(d\ge 3\); it is easy to verify that Lemma 2 still holds for \(d= 1, 2\) from the original KLS bound in Equation (3).\(\square \)

Proof of Lemma 1.

The proof of Lemma 1 consists of applying Lemma 2 recursively. We define

$$\begin{aligned} \alpha _1 = 4, \beta _1 = \frac{1}{2}. \end{aligned}$$

For \(\ell \ge 1\), we define \(\alpha _\ell \) and \(\beta _\ell \) recursively as follows:

$$\begin{aligned} \alpha _{\ell +1}&= 2c \cdot \alpha _\ell \beta _\ell ^{-1/2}, \nonumber \\ \beta _{\ell +1}&= \beta _\ell - \beta _\ell ^2/16, \end{aligned}$$
(23)

where c is the constant in Lemma 2. It is not difficult to show by induction that \(\alpha _\ell \) and \(\beta _\ell \) satisfy

$$\begin{aligned} \frac{1}{\ell +1}&\le \beta _\ell \le \frac{16}{\ell } \nonumber \\ \alpha _\ell&\le \left( 4c^2 \ell \right) ^{\ell /2}. \end{aligned}$$
(24)
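The bounds in (24) follow from the recursion (23) by induction; a small numerical sketch that iterates (23) and checks (24) at each step (the value \(c = 2\) is illustrative — the base case only needs \(\alpha _1 = 4 \le 2c\)):

```python
c = 2.0                          # illustrative; the base case needs alpha_1 = 4 <= 2c
alpha, beta = 4.0, 0.5           # alpha_1 and beta_1
for ell in range(1, 31):
    assert 1.0 / (ell + 1) <= beta <= 16.0 / ell           # first bound in (24)
    assert alpha <= (4 * c**2 * ell) ** (ell / 2)          # second bound in (24)
    # recursion (23)
    alpha, beta = 2 * c * alpha * beta ** (-0.5), beta - beta**2 / 16
```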

We start with a known bound from the original KLS paper [12]

$$\begin{aligned} \psi _d\ge \frac{1}{\alpha _1 d^{\beta _1}},\quad \forall d\ge 1. \end{aligned}$$

In the induction, suppose that we have

$$\begin{aligned} \psi _d\ge \frac{1}{\alpha _\ell \left( \log (d)+1 \right) ^{\ell /2} d^{\beta _\ell }},\quad \forall d\ge 1. \end{aligned}$$

From the above inequality, we obtain for any \(1 \le k \le d\),

$$\begin{aligned} \psi _k \ge \frac{1}{\alpha _\ell ' k^{\beta _\ell }}, \end{aligned}$$

with \(\alpha _\ell ' = \alpha _\ell \left( \log (d) +1 \right) ^{\ell /2}\). Using the above lower bounds for \(\psi _k\), we can apply Lemma 2 to obtain the bound at step \(\ell +1\):

$$\begin{aligned} \psi _d&{\mathop {\ge }\limits ^{(i)}} \frac{1}{c \cdot q^{1/2} \alpha _\ell \left( \log (d)+1 \right) ^{\ell /2} \log (d)^{1/2} d^{\beta _\ell - \beta _\ell / (8q) }}\\&{\mathop {\ge }\limits ^{(ii)}} \frac{1}{2c \cdot \alpha _\ell \beta _\ell ^{-1/2} \left( \log (d)+1 \right) ^{(\ell +1)/2} d^{\beta _\ell - \beta _\ell ^2 / 16 }} \\&= \frac{1}{\alpha _{\ell +1} \left( \log (d)+1 \right) ^{(\ell +1)/2} d^{\beta _{\ell +1}}} \end{aligned}$$

where inequality (i) follows from Lemma 2, inequality (ii) follows from \(q \le \frac{2}{\beta _\ell }\) and the last equality follows from the definition of \(\alpha _\ell \) and \(\beta _\ell \). We conclude Lemma 1 using the \(\alpha _\ell \) and \(\beta _\ell \) bounds in Equation (24).\(\square \)

Proof of Theorem 1.

To derive Theorem 1 from Lemma 1, it is sufficient to show that for any log-concave density p in \(\mathbb {R}^d\), most of its probability measure is on a compact support. Let \(\mu \) be the mean of the density p. Since \(r \mapsto p(\mathbb {B}\left( \mu , r \right) ^c)\) is a non-increasing function of r with limit 0 at \(\infty \), there exists a radius \(R > 0\) such that \(p(\mathbb {B}\left( \mu , R \right) ^c) \le 0.2\). Note that it is possible to get a better bound via e.g. log-concave concentration bounds from Paouris [22], but knowing the existence of such a radius R is sufficient for the proof here.

Denote \(B = \mathbb {B}\left( \mu , R \right) \). Then \(p(B^c)\le 0.2\). Let \(\varrho \) be the density obtained by truncating p on the ball B. Then \(\varrho \) is log-concave and it has compact support. For a subset \(E \subset \mathbb {R}^d\) with \(p(E) = \frac{1}{2}\), we have

$$\begin{aligned} p(\partial E)&\ge \varrho (\partial E) p(B) \\&\ge \psi (\varrho ) \min \left( \varrho (E), \varrho (E^c) \right) p(B) \\&= \psi (\varrho ) \min \left( p(E \cap B), p(B \cap E^c) \right) \\&\ge \psi (\varrho ) \min \left( p(E) - p(B^c), p(E^c) - p(B^c) \right) \\&\ge \frac{1}{2} \psi (\varrho ) \min \left( p(E), p(E^c) \right) . \end{aligned}$$

The last inequality follows because \(p(E) - p(B^c)\) and \(p(E^c) - p(B^c)\) are both at least \(0.5 - 0.2 \ge \frac{1}{4} = \frac{1}{2} \min \left( p(E), p(E^c) \right) \). Since it is sufficient to consider subsets of measure 1/2 in the definition of the isoperimetric coefficient [20, 25], we conclude that the isoperimetric coefficient of p is lower bounded by half of that of \(\varrho \). Applying Lemma 1 for the isoperimetric coefficient of \(\varrho \), we obtain Theorem 1.\(\square \)

Proof of auxiliary lemmas

In this section, we prove auxiliary Lemmas 6, 7 and 8.

Tensor bounds and proof of Lemma 6.

In this subsection, we prove Lemma 6. Since Lemma 6 involves the third-order moment tensor of a log-concave density, we define the following 3-Tensor for any probability density p on \(\mathbb {R}^d\) with mean \(\mu \) to simplify notation.

$$\begin{aligned}&\mathcal {T}_p: \quad \mathbb {R}^{d\times d} \times \mathbb {R}^{d\times d} \times \mathbb {R}^{d\times d} \rightarrow \mathbb {R}\nonumber \\&\quad (A, B, C) \mapsto \int \int (x-\mu )^\top A (y-\mu )\nonumber \\&\quad \cdot (x-\mu )^\top B (y - \mu ) \cdot (x - \mu ) ^\top C (y - \mu ) p(x) p(y) dx dy. \end{aligned}$$
(25)

For three matrices A, B, C in \(\mathbb {R}^{d\times d}\), we can write \(\mathcal {T}_p(A, B, C)\) equivalently as

$$\begin{aligned} \mathcal {T}_p(A, B, C) = {\mathbb {E}}_{X, Y \sim p} (X-\mu ) ^\top A (Y-\mu ) \cdot (X-\mu ) ^\top B (Y-\mu ) \cdot (X-\mu ) ^\top C (Y-\mu ). \end{aligned}$$
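For concreteness, this expectation form can be estimated from samples; a minimal sketch (Gaussian draws stand in for samples from p, and the sample size and dimension are illustrative) that also confirms the symmetry of the 3-Tensor in its three matrix arguments:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 3
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, d))
X = X - X.mean(axis=0)       # centering plays the role of subtracting mu
Y = Y - Y.mean(axis=0)

def tensor3(A, B, C):
    # empirical T_p(A, B, C): average of the product of the three bilinear
    # forms over all pairs (x, y) of samples
    MA, MB, MC = X @ A @ Y.T, X @ B @ Y.T, X @ C @ Y.T
    return float((MA * MB * MC).mean())

A, B, C = (rng.standard_normal((d, d)) for _ in range(3))
t_abc = tensor3(A, B, C)
t_cab = tensor3(C, A, B)
assert abs(t_abc - t_cab) < 1e-9 * (1 + abs(t_abc))   # symmetric in A, B, C
```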

Before we prove Lemma 6, we prove the following properties related to the 3-Tensor.

Lemma 9

Suppose p is a log-concave density with mean \(\mu \) and covariance A. Then for any positive semi-definite matrices B and C, we have

$$\begin{aligned} \left\| \int B^{1/2} (x - \mu ) (x - \mu ) ^\top C (x - \mu ) p(x)dx\right\| _{2} \le 16 \left\| A^{1/2}B A^{1/2}\right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( A^{1/2} C A^{1/2} \right) . \end{aligned}$$

Lemma 10

Suppose that \(\psi _k \ge \frac{1}{\alpha k^\beta }\) for all \(k \le d\) for some \(0 \le \beta \le \frac{1}{2}\) and \(\alpha \ge 1\). Suppose p is a log-concave density in \(\mathbb {R}^d\) with covariance A and A is invertible. Then for \(q \ge \frac{1}{2\beta }\), we have

$$\begin{aligned} \mathcal {T}_p(A^{q-2}, \mathbb {I}_d, \mathbb {I}_d) \le 128 \alpha ^2 \log (d) d^{2\beta - 1/q} {{\,\mathrm{Tr}\,}}(A^q) ^{1 + 1/q}. \end{aligned}$$

Lemma 11

Given \(\tau > 0\), suppose p is a log-concave density which is more log-concave than \(\mathcal {N}(0, \frac{1}{\tau } \mathbb {I}_d)\). Let A be its covariance matrix and suppose A is invertible. Then for \(q \ge 3\), we have

$$\begin{aligned} \mathcal {T}_p(A^{q-2}, \mathbb {I}_d, \mathbb {I}_d) \le \frac{4}{\tau } {{\,\mathrm{Tr}\,}}\left( A^{q} \right) . \end{aligned}$$

Lemma 12

Suppose p is a log-concave density in \(\mathbb {R}^d\). Then for any \(\delta \in [0, 1]\) and any positive semi-definite matrices A, B, C, we have

$$\begin{aligned} \mathcal {T}_{p}(B^{1/2}A^\delta B^{1/2}, B^{1/2}A^{1-\delta }B^{1/2}, C) \le \mathcal {T}_{p}(B^{1/2}AB^{1/2}, B, C). \end{aligned}$$
(26)

The proofs of the above lemmas are provided in Section 3.3.

Now we are ready to prove Lemma 6.

Proof of Lemma 6. We first prove the bound on \(\left\| v_t\right\| _{2}\), where

$$\begin{aligned} v_t = q \int A^{-1/2} \left( x-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1/2} \left( Q_t \right) ^{q-1} A^{-1/2} \left( x-\mu _t \right) p_t(x) dx. \end{aligned}$$

Applying Lemma 9 and knowing the covariance of \(p_t\) is \(A_t\), we obtain

$$\begin{aligned} \left\| v_t\right\| _{2}&\le 16 q \left\| A_t^{1/2} A^{-1} A_t^{1/2}\right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( A_t^{1/2} A^{-1/2} Q_t^{q-1} A^{-1/2} A_t^{1/2} \right) \\&{\mathop {=}\limits ^{\mathrm{(i)}}} 16 q \left\| A_t^{1/2} A^{-1} A_t^{1/2}\right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( Q_t^{q} \right) \\&{\mathop {=}\limits ^{\mathrm{(ii)}}} 16 q \left\| Q_t\right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( Q_t^{q} \right) \\&{\mathop {\le }\limits ^{\mathrm{(iii)}}} 16 q \left[ {{\,\mathrm{Tr}\,}}\left( Q_t^{q} \right) \right] ^{1+1/(2q)}. \end{aligned}$$

Equality (i) uses the definition of \(Q_t = A^{-1/2} A_t A^{-1/2}\). Equality (ii) uses the fact that \(\left\| MM^\top \right\| _{2} = \left\| M^\top M\right\| _{2}\) for any square matrix \(M \in \mathbb {R}^{d\times d}\). Inequality (iii) uses that \(\left\| M\right\| _{2} \le {{\,\mathrm{Tr}\,}}\left( M^q \right) ^{1/q}\) for any positive semi-definite matrix M.
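The two matrix facts used in (ii) and (iii) are easy to confirm numerically; a quick sketch with a random matrix (the dimension and q are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, q = 6, 4
M = rng.standard_normal((d, d))

# (ii): ||M M^T||_2 = ||M^T M||_2 -- both equal the largest squared singular value of M
assert np.isclose(np.linalg.norm(M @ M.T, 2), np.linalg.norm(M.T @ M, 2))

# (iii): ||S||_2 <= Tr(S^q)^{1/q} for a positive semi-definite S
S = M @ M.T
lam_max = np.linalg.norm(S, 2)
schatten_q = np.trace(np.linalg.matrix_power(S, q)) ** (1 / q)
assert lam_max <= schatten_q + 1e-9
```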

Next, we bound \(\delta _t\) in two ways. We can ignore the negative term in \(\delta _t\) to obtain the following:

$$\begin{aligned} \delta _t&\le \frac{q}{2} \sum _{a = 0}^{q-2} \int \int \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{a} A^{-1/2} \left( y-\mu _t \right) \nonumber \\&\quad \cdot \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{q-2-a} A^{-1/2} \left( y-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1} \left( y-\mu _t \right) p_t(x) p_t(y) dx dy \nonumber \\&= \frac{q}{2} \sum _{a = 0}^{q-2} \mathcal {T}_{\varrho _t}(Q_t^{a}, Q_t^{q-2-a}, \mathbb {I}_d), \end{aligned}$$
(27)

where \(\varrho _t\) is the density of the linearly transformed random variable \(A^{-1/2}\left( X-\mu _t \right) \) for X drawn from \(p_t\) and \(\mu _t\) is the mean of \(p_t\). \(\varrho _t\) is still log-concave since any linear transformation of a log-concave density is log-concave (see e.g. Saumard and Wellner [24]). \(\varrho _t\) has covariance \(A^{-1/2} A_t A^{-1/2}\), which is also \(Q_t\). For \(a \in \left\{ 0, \cdots , q-2 \right\} \), we have

$$\begin{aligned} \mathcal {T}_{\varrho _t}(Q_t^{a}, Q_t^{q-2-a}, \mathbb {I}_d)&{\mathop {\le }\limits ^{\mathrm{(i)}}} \mathcal {T}_{\varrho _t}(Q_t^{q-2}, \mathbb {I}_d, \mathbb {I}_d) \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} 128 \alpha ^2 \log (d) d^{2\beta - 1/q} \left[ {{\,\mathrm{Tr}\,}}\left( Q_t^q \right) \right] ^{1+1/q}. \end{aligned}$$

Inequality (i) follows from Lemma 12. Inequality (ii) follows from Lemma 10. Since there are \(q-1\) terms in the sum, we conclude the first part of the bound for \(\delta _t\).

On the other hand, since \(p_t\) is more log-concave than the Gaussian density proportional to \(e^{-\frac{t}{2} (x-\mu _t)^\top A^{-1} (x-\mu _t)}\), \(\varrho _t\) is more log-concave than the Gaussian density proportional to \(e^{-\frac{t}{2} x^\top x}\). Applying Lemma 12 and Lemma 11 to each term in Equation (27), we obtain

$$\begin{aligned} \delta _t&\le \frac{q^2}{2} \mathcal {T}_{\varrho _t}(Q_t^{q-2}, \mathbb {I}_d, \mathbb {I}_d) \\&\le \frac{2q^2}{t} {{\,\mathrm{Tr}\,}}\left( Q_t^{q} \right) . \end{aligned}$$

This concludes the second part of the bound for \(\delta _t\).\(\square \)

Control of the potential in two time intervals.

In this subsection, we prove Lemma 7 and Lemma 8.

Proof of Lemma 7. The function h has the following derivatives

$$\begin{aligned} \frac{d h}{d a} = \frac{1}{q} \left( a + 1 \right) ^{-1/q - 1}, \quad \frac{d^2 h}{da^2} = -\frac{q+1}{q^2} \left( a + 1 \right) ^{-1/q - 2}. \end{aligned}$$
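As a sanity check, these derivative formulas can be compared with finite differences, taking \(h(a) = -(a+1)^{-1/q}\) (consistent with the value \(h(\Gamma _0) = -(d+1)^{-1/q}\) used below); the values of q and a are illustrative:

```python
import math

q, a, eps = 4, 2.0, 1e-5
h = lambda s: -(s + 1) ** (-1 / q)   # h(a) = -(a+1)^{-1/q}
fd1 = (h(a + eps) - h(a - eps)) / (2 * eps)              # central difference for dh/da
fd2 = (h(a + eps) - 2 * h(a) + h(a - eps)) / eps**2      # central difference for d^2h/da^2
d1 = (1 / q) * (a + 1) ** (-1 / q - 1)
d2 = -(q + 1) / q**2 * (a + 1) ** (-1 / q - 2)
assert abs(fd1 - d1) < 1e-8
assert abs(fd2 - d2) < 1e-4
```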

Using Itô’s formula, we obtain

$$\begin{aligned} d h(\Gamma _t)&= \left. \frac{d h}{d a}\right| _{\Gamma _t} d\Gamma _t + \frac{1}{2} \left. \frac{d^2 h}{d a^2}\right| _{\Gamma _t} d\left[ \Gamma \right] _t \\ {}&= \frac{1}{q (\Gamma _t+1)^{1/q+1}} d\Gamma _t - \frac{1}{2} \frac{q+1}{q^2 (\Gamma _t+1)^{1/q+2}} \left\| v_t\right\| _{2}^2 dt \\ {}&\le \frac{1}{q (\Gamma _t+1)^{1/q+1}} d\Gamma _t \\ {}&{\mathop {\le }\limits ^{\mathrm{(i)}}}\ 64 q \alpha ^2 \log (d) d^{2\beta -1/q} dt + \frac{v_t^\top dW_t}{q \left( \Gamma _t + 1 \right) ^{1/q+1}}, \end{aligned}$$

where inequality (i) plugs in the bounds in Lemma 6.

Define a martingale \(Y_t\) such that

$$\begin{aligned} dY_t = \frac{v_t^\top dW_t}{q \left( \Gamma _t + 1 \right) ^{1/q+1}}, \end{aligned}$$

with \(Y_0 = 0\). According to the \(\left\| v_t\right\| _{2}\) upper bound in Lemma 6, we have

$$\begin{aligned} \left\| \frac{1}{q \left( \Gamma _t + 1 \right) ^{1 + 1/q}}v_t\right\| _{2}^2&\le 256. \end{aligned}$$

Hence the martingale \(Y_t\) is well-defined. According to the Dambis–Dubins–Schwarz theorem (see e.g. Revuz and Yor [23], Section V.1, Theorem 1.7), there exists a Wiener process \({\tilde{W}}_t\) such that \(Y_t\) has the same distribution as \({\tilde{W}}_{[Y]_t}\). Then we have for any \(\gamma > 0\),

$$\begin{aligned} {\mathbb {P}}\left( \max _{t \in [0, T]} Y_t \ge \gamma \right) \le {\mathbb {P}}\left( \max _{t \in [0, T]} {\tilde{W}}_{256t} \ge \gamma \right) \le \exp \left( -\frac{\gamma ^2}{512 T} \right) . \end{aligned}$$
(28)
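The bound (28) combines the reflection principle with the standard Gaussian tail bound \(2{\mathbb {P}}(Z \ge x) \le e^{-x^2/2}\); a numerical check over a grid of \(\gamma \) (the horizon T is illustrative):

```python
import math

T = 1e-4   # illustrative horizon
for k in range(1, 201):
    gamma = 0.01 * k
    # reflection principle: P(max_{t<=T} Wtilde_{256 t} >= gamma)
    #   = 2 P(N(0, 256 T) >= gamma) = erfc(gamma / sqrt(512 T))
    exact = math.erfc(gamma / math.sqrt(512 * T))
    bound = math.exp(-gamma**2 / (512 * T))
    assert exact <= bound + 1e-15
```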

Set \(T = \frac{1}{32768 q \alpha ^2 \log (d) d^{2\beta }}\) and \(\Psi = \frac{1}{2} \left( d+1 \right) ^{-1/q}\). Observe that \(\Gamma _0 = d\) and as a result \(h(\Gamma _0) = -\left( d+1 \right) ^{-1/q}\). Then we have

$$\begin{aligned} {\mathbb {P}}\left( \max _{t \in [0, T]} h(\Gamma _t) \ge -\Psi \right)&\le {\mathbb {P}}\left( \max _{t \in [0, T]} Y_t \ge -\Psi + \left( d+1 \right) ^{-1/q} - \int _0^T 64q \alpha ^2 \log (d) d^{2\beta - 1/q} dt \right) \\ {}&{\mathop {\le }\limits ^{\mathrm{(i)}}}\ {\mathbb {P}}\left( \max _{t \in [0, T]} Y_t \ge \frac{\Psi }{4} \right) \\ {}&{\mathop {\le }\limits ^{\mathrm{(ii)}}} \exp \left( -\frac{\Psi ^2}{8192T} \right) \\ {}&{\mathop {\le }\limits ^{\mathrm{(iii)}}} \exp \left( -\frac{2}{3} q \alpha ^2 \log (d) d^{2\beta - 2/q} \right) \\ {}&{\mathop {<}\limits ^{\mathrm{(iv)}}} \frac{3}{10}. \end{aligned}$$

Inequality (i) follows from the choice of T. (ii) uses Equation (28). (iii) follows by plugging in \(\Psi = \frac{1}{2}\left( d+1 \right) ^{-1/q}\) and \(3^q d^2 \ge 2^q (d+ 1)^2\). (iv) follows from \(\beta q \ge 1\), \(d\ge 3\), \(q\ge 2\) and \(3^{-4/3} < 0.3\).\(\square \)

Proof of Lemma 8. The function f has the following derivatives

$$\begin{aligned} \frac{d f(a)}{d a} = \frac{1}{q} a^{1/q-1}, \quad \frac{d^2 f(a)}{d a^2} = -\frac{q-1}{q^2} a^{1/q-2}. \end{aligned}$$

Using Itô’s formula, we obtain

$$\begin{aligned} d f\left( \Gamma _t \right)&= \left. \frac{df}{da} \right| _{\Gamma _t} d\Gamma _t + \frac{1}{2} \left. \frac{d^2 f}{ d a^2 }\right| _{\Gamma _t} d \left[ \Gamma \right] _t \\&= \frac{1}{q} \Gamma _t^{1/q-1} \left( v_t^\top dW_t + \delta _t dt \right) - \frac{q-1}{2q^2} \Gamma _t^{1/q-2} \left\| v_t\right\| _{2}^2 dt. \end{aligned}$$

Using the bounds in Lemma 6 and the martingale property of the term \(\frac{1}{q} \Gamma _t^{1/q-1} v_t^\top dW_t\), we obtain

$$\begin{aligned} d {\mathbb {E}}f(\Gamma _t) \le \frac{2q}{t} {\mathbb {E}}f(\Gamma _t) dt. \end{aligned}$$

Solving the above differential equation, we obtain

$$\begin{aligned} {\mathbb {E}}f(\Gamma _{t_2}) \le {\mathbb {E}}f(\Gamma _{t_1}) \left( \frac{t_2}{t_1} \right) ^{2q}, \forall t_2> t_1 > 0. \end{aligned}$$

\(\square \)
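The last step integrates the differential inequality \(d {\mathbb {E}}f(\Gamma _t) \le \frac{2q}{t} {\mathbb {E}}f(\Gamma _t) dt\); a numerical sketch (with illustrative q, \(t_1\), \(t_2\)) comparing a Runge–Kutta integration of the saturated equation with the closed form \((t_2/t_1)^{2q}\):

```python
# integrate y'(t) = (2 q / t) y(t) from t1 to t2 with classical RK4 and compare
# with the closed-form solution y(t2)/y(t1) = (t2/t1)^{2q}
q, t1, t2, n = 3, 0.5, 2.0, 20000
h = (t2 - t1) / n
f = lambda t, y: (2 * q / t) * y
t, y = t1, 1.0
for _ in range(n):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    t += h
closed_form = (t2 / t1) ** (2 * q)   # = 4^6 = 4096
assert abs(y - closed_form) / closed_form < 1e-6
```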

Proof of tensor bounds.

In this subsection, we prove Lemmas 91011 and 12.

Proof of Lemma 9. Since C is positive semi-definite, we can write its eigenvalue decomposition as \(C = \sum _{i=1}^d\lambda _i v_i v_i^\top \), with \(\lambda _i \ge 0\). Then,

$$\begin{aligned}&\left\| \int B^{1/2} (x-\mu ) (x-\mu )^\top C (x-\mu ) p(x) dx\right\| _{2} \\&\quad = \left\| \sum _{i=1}^d\int B^{1/2} (x-\mu ) \lambda _i \left( (x-\mu )^\top v_i \right) ^2 p(x) dx\right\| _{2}\\&\quad {\mathop {\le }\limits ^{\mathrm{(i)}}} \sum _{i=1}^d\lambda _i \left\| \int B^{1/2} (x-\mu ) \left( (x-\mu )^\top v_i \right) ^2 p(x) dx\right\| _{2}\\&\quad = \sum _{i=1}^d\lambda _i \max _{\left\| \xi \right\| _{2}\le 1} \int \xi ^\top B^{1/2} (x-\mu ) \left( (x-\mu )^\top v_i \right) ^2 p(x) dx \\&\quad {\mathop {\le }\limits ^{\mathrm{(ii)}}} \sum _{i=1}^d\lambda _i \max _{\left\| \xi \right\| _{2}\le 1} \left( \int \left( \xi ^\top B^{1/2} (x-\mu ) \right) ^2 p(x) dx \right) ^{1/2} \left( \int \left( (x-\mu )^\top v_i \right) ^4 p(x) dx \right) ^{1/2} \\&\quad {\mathop {\le }\limits ^{\mathrm{(iii)}}} 16 \sum _{i=1}^d\lambda _i \max _{\left\| \xi \right\| _{2}\le 1} \left( \int \left( \xi ^\top B^{1/2} (x-\mu ) \right) ^2 p(x) dx \right) ^{1/2} \left( \int \left( (x-\mu )^\top v_i \right) ^2 p(x) dx \right) \\&\quad = 16\left\| B^{1/2} A B^{1/2} \right\| _{2}^{1/2} {{\,\mathrm{Tr}\,}}\left( A^{1/2}CA^{1/2} \right) . \end{aligned}$$

Inequality (i) follows from the triangle inequality. (ii) follows from the Cauchy–Schwarz inequality. (iii) follows from the statement below, which upper bounds the fourth moment of a log-concave density via its second moment.\(\square \)

For any log-concave density \(\nu \) and any vector \(\theta \in \mathbb {R}^{d}\), we have

$$\begin{aligned} \left( \int \left( (x-\mu _\nu )^\top \theta \right) ^a \nu (x) dx \right) ^{1/a} \le 2 \frac{a}{b} \left( \int \left( (x-\mu _\nu )^\top \theta \right) ^b \nu (x) dx \right) ^{1/b} \end{aligned}$$
(29)

for \(a \ge b > 0\), where \(\mu _\nu \) is the mean of \(\nu \). Equation (29) is proved e.g. in Corollary 5.7 of Guédon et al. [11] and the exact constant is provided in Proposition 3.8 of Latała and Wojtaszczyk [15].
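Inequality (29) can be spot-checked on the standard Gaussian, whose even moments are \({\mathbb {E}}Z^{2k} = (2k-1)!!\); a small sketch over even exponents (a sanity check, not part of the proof):

```python
def double_factorial(n):
    return 1 if n <= 0 else n * double_factorial(n - 2)

def gaussian_moment(a):
    # E Z^a for even a, Z standard normal: (a-1)!!
    return double_factorial(a - 1)

for b in (2, 4, 6):
    for a in range(b, b + 8, 2):             # even exponents a >= b
        lhs = gaussian_moment(a) ** (1 / a)
        rhs = 2 * (a / b) * gaussian_moment(b) ** (1 / b)
        assert lhs <= rhs                    # instance of inequality (29)
```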

In order to prove Lemma 10, we need to introduce one additional lemma as follows.

Lemma 13

Suppose that \(\psi _k \ge \frac{1}{\alpha k^\beta }\) for all \(k \le d\) for some \(0 < \beta \le \frac{1}{2}\) and \(\alpha \ge 1\). For an isotropic log-concave density p in \(\mathbb {R}^d\) and a unit vector \(v \in \mathbb {R}^d\), define \(\Delta = {\mathbb {E}}_{X \sim p} \left( X^\top v \right) \cdot XX^\top \), then we have

  1.

    For any orthogonal projection matrix \(P \in \mathbb {R}^{d\times d}\) with rank r, we have

    $$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right) \le 16 \psi ^{-2}_{\min (2r, d)}. \end{aligned}$$
  2.

    For any positive semi-definite matrix A, we have

    $$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta A \Delta \right) \le 128 \alpha ^2 \log (d) \left( {{\,\mathrm{Tr}\,}}\left( A^{1/(2\beta )} \right) \right) ^{2\beta }. \end{aligned}$$

This lemma was proved as Lemma 41 in an older version (arXiv version 2) of Lee and Vempala [17]. The main proof idea for the first part of Lemma 13 appeared in Eldan [8] (Lemma 6). We provide a proof here for completeness.

Proof of Lemma 13. For the first part, we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right) = {\mathbb {E}}_{X \sim p} X^\top \Delta P X \cdot X ^\top v. \end{aligned}$$

Since \({\mathbb {E}}_{X\sim p} X^\top v = 0\), we can subtract the mean of the first term \(X^\top \Delta P X\) without changing the value of \({{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right) \). Then

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right)&= {\mathbb {E}}_{X\sim p} \left[ \left( X^\top \Delta P X - {\mathbb {E}}_{Y \sim p} Y ^\top \Delta P Y \right) \cdot X ^\top v \right] \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} \left( {\mathbb {E}}_{X\sim p}(X^\top v)^2 \right) ^{1/2} \left( {{\,\mathrm{Var}\,}}_{X \sim p }\left( X ^\top \Delta P X \right) \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} 2 \psi _{\min (2r, d)}^{-1} \left( {\mathbb {E}}_{X \sim p} \left\| \Delta P X + P^\top \Delta ^\top X\right\| _{2}^2 \right) ^{1/2} \\&\le 4 \psi _{\min (2r, d)}^{-1} \left( {{\,\mathrm{Tr}\,}}\left( \Delta P \Delta \right) \right) ^{1/2}. \end{aligned}$$

Inequality (i) follows from the Cauchy–Schwarz inequality. Inequality (ii) follows from the fact that \({\mathbb {E}}_{X\sim p}(X^\top v)^2 = 1\) as p is isotropic, and from the fact that the square root of the Poincaré constant is upper bounded by twice the inverse of the isoperimetric coefficient (also known as Cheeger's inequality [6, 19] or Theorem 1.1 in Milman [20]). The matrix \(\Delta P + P^\top \Delta \) has rank at most \(\min (2r, d)\). Rearranging the terms in the above equation, we conclude the first part of Lemma 13.

For the second part, we write the matrix A in its eigenvalue decomposition and group the terms by eigenvalues. We have

$$\begin{aligned} A = \sum _{i=1}^d\lambda _i v_i v_i^\top = \sum _{j=1}^J A_j + B, \end{aligned}$$

where \(A_j\) has eigenvalues in the interval \((\left\| A\right\| _{2} e^{j-1} /d, \left\| A\right\| _{2} e^{j} /d]\) and B has eigenvalues smaller than or equal to \(\left\| A\right\| _{2}/d\). Because the right endpoints of these intervals increase exponentially, we have \(J = \lceil \log (d) \rceil \). Let \(P_j\) be the orthogonal projection matrix formed by the eigenvectors in \(A_j\). Then we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta A_j \Delta \right) \le \left\| A_j\right\| _{2} {{\,\mathrm{Tr}\,}}\left( \Delta P_j \Delta \right) {\mathop {\le }\limits ^{\mathrm{(i)}}} 16 \left\| A_j\right\| _{2} \psi ^{-2}_{\min (2 \text {rank}(A_j), d)} {\mathop {\le }\limits ^{\mathrm{(ii)}}} 16 \alpha ^2 \left\| A_j\right\| _{2} \cdot \left( 2 \text {rank}(A_j) \right) ^{2\beta }, \end{aligned}$$
(30)

where inequality (i) follows from the first part of Lemma 13 and inequality (ii) follows from the hypothesis of Lemma 13. Similarly for matrix B, we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta B \Delta \right) {\mathop {\le }\limits ^{\mathrm{(i)}}} 16 \alpha ^2 \left\| B\right\| _{2} \left( 2\text {rank}(B) \right) ^{2\beta } {\mathop {\le }\limits ^{\mathrm{(ii)}}} 32 \alpha ^2 \left\| A\right\| _{2}, \end{aligned}$$
(31)

where inequality (i) follows from the hypothesis of Lemma 13 and inequality (ii) follows from the fact that \(\left\| B\right\| _{2} \le \left\| A\right\| _{2}/d\) and \(2\beta \le 1\). Putting the bounds (30) and (31) together, we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta A \Delta \right)&= \sum _{j=1}^J {{\,\mathrm{Tr}\,}}\left( \Delta A_j\Delta \right) + {{\,\mathrm{Tr}\,}}\left( \Delta B \Delta \right) \\&\le 16 \alpha ^2 \left( \sum _{j=1}^J \left\| A_j\right\| _{2} \cdot \left( 2\text {rank}(A_j) \right) ^{2\beta } + 2\left\| A\right\| _{2} \right) \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} 16 \alpha ^2 \left[ \left( \sum _{j=1}^J \left\| A_j\right\| _{2}^{1/(2\beta )} \cdot \left( 2\text {rank}(A_j) \right) \right) ^{2\beta } \cdot \left( J \right) ^{1-2\beta } + 2\left\| A\right\| _{2} \right] \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} 16 \alpha ^2\left[ \left( 2 e {{\,\mathrm{Tr}\,}}\left( A^{1/(2\beta )} \right) \right) ^{2\beta } \cdot \left( J \right) ^{1-2\beta } + 2 \left\| A\right\| _{2} \right] \\&\le 128 \alpha ^2 \log (d) \left( {{\,\mathrm{Tr}\,}}\left( A^{1/(2\beta )} \right) \right) ^{2\beta }. \end{aligned}$$

Inequality (i) follows from Hölder's inequality and inequality (ii) follows from the fact that \(\left\| A_j\right\| _{2}^{1/(2\beta )} \text {rank}(A_j) \le e {{\,\mathrm{Tr}\,}}\left( A_{j}^{1/(2\beta )} \right) \) due to the construction of \(A_j\). This concludes the second part of Lemma 13.\(\square \)
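The bucketing above is easy to reproduce numerically; a sketch with an illustrative spectrum (taking \(\beta = \frac{1}{2}\) so that the exponent \(1/(2\beta )\) equals 1) checking that every eigenvalue lands in a bucket, that each bucket spans at most a factor e, and the resulting rank–trace inequality:

```python
import math

# Bucket an illustrative spectrum as in the proof: A_j collects the
# eigenvalues in (||A|| e^{j-1}/d, ||A|| e^j/d], B those at most ||A||/d.
eigs = [9.0, 7.5, 3.0, 1.1, 0.9, 0.3, 0.02, 0.001]
d = len(eigs)
norm_A = max(eigs)
J = math.ceil(math.log(d))
buckets = [[] for _ in range(J)]
B = []
for lam in eigs:
    if lam <= norm_A / d:
        B.append(lam)
    else:
        j = math.ceil(math.log(lam * d / norm_A))  # smallest j with lam <= ||A|| e^j / d
        buckets[j - 1].append(lam)
assert len(B) + sum(len(b) for b in buckets) == d  # every eigenvalue lands somewhere
for A_j in buckets:
    if A_j:
        assert max(A_j) / min(A_j) <= math.e + 1e-12   # each bucket spans at most a factor e
        # hence (with 2*beta = 1) ||A_j|| * rank(A_j) <= e * Tr(A_j)
        assert max(A_j) * len(A_j) <= math.e * sum(A_j) + 1e-12
```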

Proof of Lemma 10. Let \(\mu \) be the mean of p. First, for X a random vector in \(\mathbb {R}^d\) drawn from p, we define the standardized random variable \(A^{-1/2} (X - \mu )\) and its density \(\varrho \). \(\varrho \) is an isotropic log-concave density. Then through a change of variable, we have

$$\begin{aligned}&\mathcal {T}_p \left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right) \\&\quad = \int \int (x-\mu )^\top A^{q-2} (y-\mu ) \cdot (x-\mu )^\top (y-\mu ) \cdot (x-\mu ) ^\top (y-\mu ) p(x) p(y) dx dy \\&\quad = \int \int \left( x^\top A^{q-1} y \right) (x^\top A y) (x ^\top A y) \varrho (x) \varrho (y) dx dy \\&\quad \le \int \int \left( x^\top A^{q} y \right) (x^\top A y) (x ^\top y) \varrho (x) \varrho (y) dx dy \\&\quad = \mathcal {T}_\varrho \left( A^{q}, A, \mathbb {I}_d \right) , \end{aligned}$$

where the last inequality follows from Lemma 12. \(A^q\) is positive semi-definite and we write down its eigenvalue decomposition \(A^q = \sum _{i=1}^d\lambda _i v_i v_i ^\top \) with \(\lambda _i \ge 0\). Since \(\varrho \) is isotropic, we can rewrite the 3-Tensor into a summation form and apply Lemma 13.

$$\begin{aligned}&\mathcal {T}_\varrho \left( A^{q}, A, \mathbb {I}_d \right) \\&\quad = \int \int \left( x ^\top A^q y \right) \left( x ^\top A y \right) \left( x^\top y \right) \varrho (x) \varrho (y) dx dy \\&\quad = \sum _{i=1}^d\lambda _i \int \int \left( x ^\top v_i \right) \left( y ^\top v_i \right) \left( x ^\top A y \right) \left( x^\top y \right) \varrho (x) \varrho (y) dx dy \\&\quad = \sum _{i=1}^d\lambda _i {{\,\mathrm{Tr}\,}}\left( \Delta _i A \Delta _i \right) \\&\quad {\mathop {\le }\limits ^{\mathrm{(i)}}} 128 \alpha ^2 \log (d) \left( {{\,\mathrm{Tr}\,}}(A^{1/2\beta }) \right) ^{2\beta } \left( \sum _{i=1}^d\lambda _i \right) \\&\quad = 128 \alpha ^2 \log (d) \left( {{\,\mathrm{Tr}\,}}(A^{1/2\beta }) \right) ^{2\beta } {{\,\mathrm{Tr}\,}}(A^q) \\&\quad {\mathop {\le }\limits ^{\mathrm{(ii)}}} 128 \alpha ^2 \log (d) {{\,\mathrm{Tr}\,}}(A^q) \left[ {{\,\mathrm{Tr}\,}}\left( A^q \right) ^{1/(2\beta q)} \left( d \right) ^{1 - 1/(2\beta q)} \right] ^{2\beta } \\&\quad = 128 \alpha ^2 \log (d) d^{2\beta - 1/q} {{\,\mathrm{Tr}\,}}(A^q) ^{1 + 1/q}, \end{aligned}$$

where we define \(\Delta _i = \int (x^\top v_i) x x^\top \varrho (x) dx\); inequality (i) follows from Lemma 13 and the fact that \(\varrho \) is isotropic; inequality (ii) follows from the Cauchy–Schwarz inequality and the assumption that \(q \ge \frac{1}{2\beta }\).\(\square \)

Proof of Lemma 11. Without loss of generality, we can assume that the density p has mean 0. Its covariance matrix A is positive semi-definite and invertible. We can write its eigenvalue decomposition as \(A = \sum _{i=1}^d\lambda _i v_i v_i^\top \), where \(\lambda _i > 0\) and the \(v_i\) are unit eigenvectors. Then \(A^{q}\) has an eigenvalue decomposition with the same eigenvectors, \(A^q = \sum _{i=1}^d\lambda _i^q v_i v_i^\top \). Define \(\Delta _i = {\mathbb {E}}_{X \sim p} (X^\top A^{-1/2}v_i) X X ^\top \), then

$$\begin{aligned} \mathcal {T}_p\left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right)&= {\mathbb {E}}_{X, Y \sim p} \left( X^\top A^{q-2} Y \right) (X^\top Y) (X ^\top Y) \nonumber \\&= \sum _{i=1}^d\lambda _i^{q-1} {{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right) . \end{aligned}$$
(32)

Next we bound the terms \({{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right) \). We have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right)&= {\mathbb {E}}_{X \sim p} \left( X ^\top A^{-1/2} v_i \right) X^\top \Delta _i X \\&{\mathop {=}\limits ^{\mathrm{(i)}}} {\mathbb {E}}_{X \sim p} \left( X ^\top A^{-1/2} v_i \right) \left( X^\top \Delta _i X - {\mathbb {E}}_{Y \sim p} \left[ Y^\top \Delta _i Y \right] \right) \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} \left( {\mathbb {E}}_{X \sim p} \left( X ^\top A^{-1/2} v_i \right) ^2 \right) ^{1/2} \left( {{\,\mathrm{Var}\,}}\left( X ^\top \Delta _i X \right) \right) ^{1/2} \\&{\mathop {=}\limits ^{\mathrm{(iii)}}} \left( {{\,\mathrm{Var}\,}}_{X \sim p}\left( X ^\top \Delta _i X \right) \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(iv)}}} \left( {\mathbb {E}}_{X \sim p} \frac{1}{\tau } \left\| \Delta _i X + \Delta _i X\right\| _{2}^2 \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(v)}}} \left( \frac{4}{\tau } {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) \right) ^{1/2}. \end{aligned}$$

Equality (i) is because \({\mathbb {E}}_{X \sim p} X = 0\). Inequality (ii) follows from the Cauchy–Schwarz inequality. Equality (iii) follows from the definition of the covariance matrix \({\mathbb {E}}_{X\sim p} XX^\top = A\). Inequality (iv) follows from the Brascamp–Lieb inequality (or Hessian Poincaré, see Theorem 4.1 in Brascamp and Lieb [5]) together with the assumption that p is more log-concave than \(\mathcal {N}(0, \frac{1}{\tau }\mathbb {I}_d)\). Inequality (v) follows by evaluating the expectation with the covariance \({\mathbb {E}}_{X\sim p} XX^\top = A\).
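The Brascamp–Lieb step can be illustrated in the Gaussian boundary case \(p = \mathcal {N}(0, \frac{1}{\tau }\mathbb {I}_d)\), where the variance of a quadratic form \(X^\top M X\) has the closed form \(2{{\,\mathrm{Tr}\,}}\left( (M\Sigma )^2 \right) \); the matrix M and the value of \(\tau \) below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, tau = 5, 3.0
M = rng.standard_normal((d, d))
M = (M + M.T) / 2                      # a symmetric quadratic form, playing the role of Delta_i
Sigma = np.eye(d) / tau                # covariance of N(0, I/tau)

var_quadratic = 2 * np.trace(M @ Sigma @ M @ Sigma)   # exact Var(X^T M X) for this Gaussian
bl_bound = (1 / tau) * 4 * np.trace(M @ M @ Sigma)    # (1/tau) E ||2 M X||^2, the Brascamp-Lieb bound
assert var_quadratic <= bl_bound + 1e-9
```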

Plugging the bounds of the terms \({{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right) \) into Equation (32), we obtain

$$\begin{aligned} \mathcal {T}_p\left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right)&= \sum _{i=1}^d\lambda _i^{q-1} {{\,\mathrm{Tr}\,}}\left( \Delta _i \Delta _i \right) \\&\le \sum _{i=1}^d\lambda _i^{q-1} \left( \frac{4}{\tau } {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(i)}}} \frac{2}{\tau ^{1/2}} \left( \sum _{i=1}^d\lambda _i^{q} \right) ^{1/2} \left( \sum _{i=1}^d\lambda _i^{q-2} {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) \right) ^{1/2} \\&= \frac{2}{\tau ^{1/2}} \left( {{\,\mathrm{Tr}\,}}\left( A^q \right) \right) ^{1/2} \left( {\mathbb {E}}_{X, Y \sim p} \left( X^\top A^{q-3} Y \right) (X^\top A Y) (X ^\top Y) \right) ^{1/2} \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}} \frac{2}{\tau ^{1/2}} \left( {{\,\mathrm{Tr}\,}}\left( A^q \right) \right) ^{1/2} \left( {\mathbb {E}}_{X, Y \sim p} \left( X^\top A^{q-2} Y \right) (X^\top Y) (X ^\top Y) \right) ^{1/2} \\&= \frac{2}{\tau ^{1/2}} \left( {{\,\mathrm{Tr}\,}}\left( A^q \right) \right) ^{1/2} \left[ \mathcal {T}_p\left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right) \right] ^{1/2}. \end{aligned}$$

Inequality (i) follows from the Cauchy–Schwarz inequality. For \(q \ge 3\), inequality (ii) follows from Lemma 12. Rearranging the terms in the above equation, we obtain

$$\begin{aligned} \mathcal {T}_p\left( A^{q-2}, \mathbb {I}_d, \mathbb {I}_d \right) \le \frac{4}{\tau } {{\,\mathrm{Tr}\,}}\left( A^q \right) . \end{aligned}$$

\(\square \)

Proof of Lemma 12. This lemma was proved as Lemma 43 in an older version (arXiv version 2) of Lee and Vempala [17]; we provide a proof here for completeness.

Without loss of generality, we can assume that the density p has mean 0. For \(i \in \left\{ 1, \cdots , d \right\} \), we define \(\Delta _i = {\mathbb {E}}_{X\sim p} B^{1/2} X X ^\top B^{1/2} X^\top C^{1/2} e_i\) where \(e_i \in \mathbb {R}^d\) is the vector with ith coordinate 1 and 0 elsewhere. We have \(\sum _{i=1}^de_i e_i ^\top = \mathbb {I}_d\). We can rewrite the tensor on the left hand side as a sum of traces.

$$\begin{aligned}&\mathcal {T}_{p}(B^{1/2}A^\delta B^{1/2}, B^{1/2}A^{1-\delta }B^{1/2}, C) \nonumber \\&\quad = {\mathbb {E}}_{X, Y \sim p} X^\top B^{1/2}A^\delta B^{1/2} Y \cdot X^\top B^{1/2}A^{1-\delta }B^{1/2} Y \cdot X ^\top C Y \nonumber \\&\quad = \sum _{i=1}^d{\mathbb {E}}_{X, Y \sim p} X^\top B^{1/2}A^\delta B^{1/2} Y \cdot X^\top B^{1/2}A^{1-\delta }B^{1/2} Y \cdot X^\top C^{1/2} e_i \cdot Y^\top C^{1/2} e_i \nonumber \\&\quad = \sum _{i=1}^d{{\,\mathrm{Tr}\,}}\left( A^{\delta } \Delta _i A^{1-\delta } \Delta _i \right) . \end{aligned}$$
(33)

For any symmetric matrix F, a positive-semidefinite matrix G and \(\delta \in [0, 1]\), we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( G^\delta F G^{1-\delta } F \right) \le {{\,\mathrm{Tr}\,}}\left( G F^2 \right) . \end{aligned}$$
(34)

Applying the above trace inequality (34), which we prove later for completeness (see also Lemma 2.1 in Allen-Zhu et al. [1]), we obtain

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( A^{\delta } \Delta _i A^{1-\delta } \Delta _i \right) \le {{\,\mathrm{Tr}\,}}\left( A \Delta _i \Delta _i \right) . \end{aligned}$$

Writing the sum of traces in Equation (33) back to the 3-Tensor form, we conclude Lemma 12.

It remains to prove the trace inequality in Equation (34). Without loss of generality, we can assume G is diagonal. Hence, we have

$$\begin{aligned} {{\,\mathrm{Tr}\,}}\left( G^\delta F G^{1-\delta } F \right)&= \sum _{i = 1}^d\sum _{j = 1}^dG_{ii}^\delta G_{jj}^{1-\delta } F_{ij}^2 \\&\le \sum _{i=1}^d\sum _{j=1}^d\left( \delta G_{ii} + (1-\delta ) G_{jj} \right) F_{ij}^2 \\&= \delta \sum _{i=1}^d\sum _{j=1}^dG_{ii} F_{ij}^2 + (1-\delta ) \sum _{i=1}^d\sum _{j=1}^dG_{jj} F_{ij}^2 \\&= {{\,\mathrm{Tr}\,}}\left( G F^2 \right) , \end{aligned}$$

where the inequality follows from Jensen’s inequality and the fact that the logarithm function is concave (or the inequality of arithmetic and geometric means).\(\square \)
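The trace inequality (34) is also easy to verify numerically; a sketch with a random symmetric F and positive semi-definite G, computing fractional powers through the eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
F = rng.standard_normal((d, d))
F = (F + F.T) / 2                    # symmetric
G = rng.standard_normal((d, d))
G = G @ G.T                          # positive semi-definite

w, V = np.linalg.eigh(G)
def G_pow(s):
    # fractional matrix power of G through its eigendecomposition
    return V @ np.diag(np.maximum(w, 0.0) ** s) @ V.T

rhs = np.trace(G @ F @ F)            # Tr(G F^2)
for delta in (0.0, 0.25, 0.5, 0.75, 1.0):
    lhs = np.trace(G_pow(delta) @ F @ G_pow(1 - delta) @ F)
    assert lhs <= rhs + 1e-8         # instance of inequality (34)
```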

References

  1. Z. Allen-Zhu, Y.T. Lee, and L. Orecchia. Using optimization to obtain a width-independent, parallel, simpler, and faster positive SDP solver. In: Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM (2016), pp. 1824–1831.

  2. M. Anttila, K. Ball, and I. Perissinaki. The central limit problem for convex bodies. Transactions of the American Mathematical Society, 355(12) (2003), 4723–4735.

  3. K. Ball. Logarithmically concave functions and sections of convex sets in \(\mathbb {R}^n\). Studia Mathematica, 88(1) (1988), 69–84.

  4. J. Bourgain. On high dimensional maximal functions associated to convex bodies. American Journal of Mathematics, 108(6) (1986), 1467–1476.

  5. H.J. Brascamp and E.H. Lieb. On extensions of the Brunn–Minkowski and Prékopa–Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. In: Inequalities. Springer (2002), pp. 441–464.

  6. J. Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In: Proceedings of the Princeton Conference in Honor of Professor S. Bochner (1969), pp. 195–199.

  7. B. Cousins and S. Vempala. A cubic algorithm for computing Gaussian volume. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM (2014), pp. 1215–1228.

  8. R. Eldan. Thin shell implies spectral gap up to polylog via a stochastic localization scheme. Geometric and Functional Analysis, 23(2) (2013), 532–569.

  9. R. Eldan and B. Klartag. Approximately Gaussian marginals and the hyperplane conjecture. Concentration, Functional Inequalities and Isoperimetry, 545 (2011), 55–68.

  10. R. Eldan and J. Lehec. Bounding the norm of a log-concave vector via thin-shell estimates. In: Geometric Aspects of Functional Analysis. Springer (2014), pp. 107–122.

  11. O. Guédon, P. Nayar, and T. Tkocz. Concentration inequalities and geometry of convex bodies. Analytical and Probabilistic Methods in the Geometry of Convex Bodies, 2 (2014), 9–86.

  12. R. Kannan, L. Lovász, and M. Simonovits. Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry, 13(3–4) (1995), 541–559.

  13. B. Klartag. On convex perturbations with a bounded isotropic constant. Geometric & Functional Analysis GAFA, 16(6) (2006), 1274–1290.

  14. B. Klartag and V. Milman. The slicing problem by Bourgain. In: Analysis at Large: A Collection of Articles in Memory of Jean Bourgain. Springer (2021, to appear).

  15. R. Latała and J. Wojtaszczyk. On the infimum convolution inequality. Studia Mathematica, 189(2) (2008), 147–187.

  16. M. Ledoux. The Concentration of Measure Phenomenon, No. 89. American Mathematical Society (2001).

  17. Y.T. Lee and S.S. Vempala. Eldan’s stochastic localization and the KLS hyperplane conjecture: an improved lower bound for expansion. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). IEEE (2017), pp. 998–1007.

  18. Y.T. Lee and S.S. Vempala. The Kannan–Lovász–Simonovits conjecture. arXiv preprint arXiv:1807.03465 (2018).

  19. V.G. Maz’ya. Classes of domains and imbedding theorems for function spaces. In: Doklady Akademii Nauk, Vol. 133. Russian Academy of Sciences (1960), pp. 527–530.

  20. E. Milman. On the role of convexity in isoperimetry, spectral gap and concentration. Inventiones Mathematicae, 177(1) (2009), 1–43.

  21. B. Øksendal. Stochastic Differential Equations. Springer, Berlin (2003).

  22. G. Paouris. Concentration of mass on convex bodies. Geometric & Functional Analysis GAFA, 16(5) (2006), 1021–1049.

  23. D. Revuz and M. Yor. Continuous Martingales and Brownian Motion, Vol. 293. Springer, Berlin (2013).

  24. A. Saumard and J.A. Wellner. Log-concavity and strong log-concavity: a review. Statistics Surveys, 8 (2014), 45.

  25. P. Sternberg and K. Zumbrun. On the connectivity of boundaries of sets minimizing perimeter subject to a volume constraint. Communications in Analysis and Geometry, 7(1) (1999), 199–220.


Acknowledgements

Yuansi Chen has received funding from the European Research Council under the Grant Agreement No 786461 (CausalStats - ERC-2017-ADG). We acknowledge scientific interaction and exchange at “ETH Foundations of Data Science”. We thank Peter Bühlmann and Bin Yu for their continuous support and encouragement. We thank Afonso Bandeira, Raaz Dwivedi, Ronen Eldan, Yin Tat Lee and Martin Wainwright for helpful discussions. We thank Bo’az Klartag and Joseph Lehec for pointing out a mistake in the previous revision. We also thank anonymous reviewers for their careful reading of our manuscript and their suggestions on presentation and writing.

Funding

Open access funding provided by Swiss Federal Institute of Technology Zurich.

Author information

Corresponding author

Correspondence to Yuansi Chen.


Proof of Lemma 3 and derivatives

In this section, we first prove the existence and uniqueness of the SDE solution in Lemma 3, and then derive the derivatives of \(p_t\), \(A_t\) and \(\Gamma _t\) stated in Equations (13), (15) and (16) using Itô calculus. Similar results are proved in Eldan [8] and Lee and Vempala [17], where a similar stochastic localization scheme is used. We provide a proof here for completeness.

Proof of Lemma 3. We can rewrite the stochastic differential equation (8) as follows to make the dependency clear:

$$\begin{aligned} dc_t&= A^{-1/2} dW_t + A^{-1} \mu \left( c_t, B_t \right) dt\\ dB_t&= A^{-1} dt, \end{aligned}$$

where

$$\begin{aligned} \mu (c, B)&= \int x \varrho (c, B, x) dx, \\ \varrho (c, B, x)&= \frac{e^{c^\top x - \frac{1}{2}x^\top B x} p(x)}{\int _{\mathbb {R}^d} e^{c ^\top y - \frac{1}{2}y^\top B y} p(y) dy}. \end{aligned}$$

Since \(p\) has compact support, for any fixed \(x \in \mathbb {R}^d\), the function \(\varrho (\cdot , \cdot , x)\) is Lipschitz in \(c\) and \(B\). Similarly, \(\mu \) is Lipschitz in \(c\) and \(B\). Consequently, \(A^{-1/2}\), \(A^{-1}\mu (c_t, B_t)\) and \(A^{-1}\) are all bounded and Lipschitz in \(c_t\) and \(B_t\) on the compact support. Applying the existence and uniqueness theorem for SDE solutions (Theorem 5.2 in Øksendal [21]), we conclude that the SDE solution exists and is unique on the time interval [0, T] for any \(T > 0\).
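To build intuition for these dynamics, the SDE can be simulated with a simple Euler–Maruyama discretization. The sketch below is illustrative only: it replaces \(p\) by a toy atomic measure with uniform weights on random points (an assumption made here for simplicity, beyond the paper's compact-support hypothesis) and takes the control matrix \(A\) to be the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, T, steps = 2, 200, 1.0, 1000
h = T / steps
X = rng.uniform(-1.0, 1.0, size=(n, d))  # atoms of a toy compactly supported p
A = np.eye(d)                            # control matrix A (identity for simplicity)
A_inv = np.linalg.inv(A)
A_inv_sqrt = np.linalg.cholesky(A_inv)   # a square root of A^{-1}

def mu(c, B):
    # mean of the tilted measure varrho(c, B, x) ∝ exp(c·x - x·Bx/2) p(x)
    logw = X @ c - 0.5 * np.einsum('ni,ij,nj->n', X, B, X)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return w @ X

c, B = np.zeros(d), np.zeros((d, d))
for _ in range(steps):
    dW = np.sqrt(h) * rng.standard_normal(d)
    c = c + A_inv_sqrt @ dW + h * A_inv @ mu(c, B)  # dc_t = A^{-1/2} dW + A^{-1} mu dt
    B = B + h * A_inv                               # dB_t = A^{-1} dt

# B_t integrates A^{-1} deterministically; c_t stays finite on the compact support
assert np.allclose(B, T * A_inv, atol=1e-6)
assert np.all(np.isfinite(c))
```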

Next, we derive the derivative of \(p_t\). Define

$$\begin{aligned} G_t(x)&= e^{c_t^\top x - \frac{1}{2} x^\top B_t x} p(x), \\ V_t&= \int G_t(x) dx. \end{aligned}$$

Then \(p_t(x)\) can be written as \(\frac{G_t(x)}{V_t}\). Let \(S_t(x)\) denote the quadratic variation of the process \(c_t^\top x\). We have

$$\begin{aligned} d S_t(x) = x^\top A^{-1} x dt. \end{aligned}$$

Using Itô’s formula, we have

$$\begin{aligned} dG_t(x)&= \left( x^\top (dc_t) - \frac{1}{2} x^\top dB_t x + \frac{1}{2} dS_t \right) G_t(x) \\&= \left( x^\top A^{-1/2}dW_t + x^\top A^{-1} \mu _t dt \right) G_t(x), \\ dV_t&= \int dG_t(x) dx = V_t \left( \mu _t^\top A^{-1/2}dW_t + \mu _t^\top A^{-1} \mu _t dt \right) . \end{aligned}$$

Using Itô’s formula on the inverse of \(V_t\), we have

$$\begin{aligned} d V_t^{-1}&= -\frac{dV_t}{V_t^2} + \frac{d \left[ V \right] _t}{V_t^3} \\&= - V_t^{-1} \left[ \mu _t^\top A^{-1/2} dW_t + \mu _t^\top A^{-1} \mu _t dt \right] + V_t^{-1} \mu _t^\top A^{-1} \mu _t dt \\&= - V_t^{-1} \mu _t^\top A^{-1/2} dW_t. \end{aligned}$$

Using Itô’s formula on \(p_t\), with the above derivatives, we obtain

$$\begin{aligned} dp_t(x)&= d \left( V_t^{-1} G_t(x) \right) \\&= \left( G_t(x) dV_t^{-1} + V_t^{-1}dG_t(x) + d\left[ V^{-1}, G(x) \right] _t \right) \\&= \left( x-\mu _t \right) ^\top A^{-1/2} dW_t p_t(x). \end{aligned}$$

Then we derive the derivative of \(A_t\). By the definition of \(A_t\), we have

$$\begin{aligned} A_t = \int \left( x - \mu _t \right) \left( x - \mu _t \right) ^\top p_t(x) dx, \end{aligned}$$

where \(\mu _t = \int _{\mathbb {R}^d} x p_t(x) dx\). Using Itô’s formula on \(\mu _t\), we obtain

$$\begin{aligned} d\mu _t&= \int x d p_t(x) dx \\&= \int x (x- \mu _t)^\top A^{-1/2} dW_t p_t(x) dx \\&= \int (x - \mu _t) (x- \mu _t)^\top A^{-1/2} dW_t p_t(x) dx\\&= A_t A^{-1/2} dW_t. \end{aligned}$$
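The third equality above uses \(\int \mu _t \left( x - \mu _t \right) ^\top p_t(x) dx = 0\). This identity is easy to check numerically for a discrete measure; the snippet below is a small illustrative check, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 500
X = rng.standard_normal((n, d))          # support points of a discrete measure p_t
w = rng.uniform(size=n); w /= w.sum()    # probability weights
mu = w @ X                               # mean of p_t
M1 = (X * w[:, None]).T @ (X - mu)       # ∫ x (x - mu)^T p_t(x) dx
M2 = ((X - mu) * w[:, None]).T @ (X - mu)  # ∫ (x - mu)(x - mu)^T p_t(x) dx = A_t
# They agree because ∫ mu (x - mu)^T p_t(x) dx = mu (E[x] - mu)^T = 0
assert np.allclose(M1, M2)
```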

Using Itô’s formula on \(A_t\) and viewing it as a function of \(\mu _t\) and \(p_t\), we obtain

$$\begin{aligned} dA_t =&\int \left( x - \mu _t \right) \left( x - \mu _t \right) ^\top dp_t(x) dx - \int d\mu _t\left( x - \mu _t \right) ^\top p_t(x) dx\\&- \int \left( x - \mu _t \right) \left( d\mu _t \right) ^\top p_t(x) dx \\&-\frac{1}{2}\cdot 2 \int \left( x - \mu _t \right) d\left[ \mu _t^\top , p_t(x) \right] _t dx - \frac{1}{2}\cdot 2 \int d\left[ \mu _t, p_t(x) \right] _t \left( x - \mu _t \right) ^\top dx \\&+ \frac{1}{2} \cdot 2 d\left[ \mu _t, \mu _t^\top \right] _t \int p_t(x) dx. \end{aligned}$$

We observe that \(\int d\mu _t\left( x - \mu _t \right) ^\top p_t(x) dx = 0\) and \(\int \left( x - \mu _t \right) \left( d\mu _t \right) ^\top p_t(x) dx = 0\). Then,

$$\begin{aligned} d\left[ \mu _t^\top , p_t(x) \right] _t&= \left( x - \mu _t \right) ^\top A^{-1} A_t p_t(x) dt,\\ d\left[ \mu _t, p_t(x) \right] _t&= A_t A^{-1} \left( x - \mu _t \right) p_t(x) dt, \\ d\left[ \mu _t, \mu _t^\top \right] _t&= A_t A^{-1} A_t dt. \end{aligned}$$

Combining all the terms together, we have

$$\begin{aligned} dA_t = \int \left( x - \mu _t \right) \left( x - \mu _t \right) ^\top \left( \left( x-\mu _t \right) ^\top A^{-1/2} dW_t \right) p_t(x) dx - A_tA^{-1} A_t dt. \end{aligned}$$

Finally, we derive the derivative of \(\Gamma _t\). Define the function \(\Gamma : \mathbb {R}^{d\times d} \rightarrow \mathbb {R}\) by \(\Gamma (X) = {{\,\mathrm{Tr}\,}}\left( X^q \right) \). The first-order and second-order derivatives of \(\Gamma \) are given by

$$\begin{aligned} \left. \frac{\partial \Gamma }{\partial X}\right| _{H} = q {{\,\mathrm{Tr}\,}}\left( X^{q-1} H \right) , \qquad \left. \frac{\partial ^2 \Gamma }{\partial X \partial X}\right| _{H_1, H_2} = q \sum _{a=0}^{q-2} {{\,\mathrm{Tr}\,}}\left( X^a H_2 X^{q-2-a} H_1 \right) . \end{aligned}$$
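These two derivative formulas can be checked against finite differences of \(\Gamma (X) = {{\,\mathrm{Tr}\,}}(X^q)\) along a symmetric direction \(H\) (taking \(H_1 = H_2 = H\)); the sketch below is an illustrative numerical check, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
d, q = 5, 4
M = rng.standard_normal((d, d))
X = M @ M.T + d * np.eye(d)              # symmetric positive definite X
H = rng.standard_normal((d, d)); H = (H + H.T) / 2

def Gamma(Y):
    return np.trace(np.linalg.matrix_power(Y, q))

# First derivative along H: central difference vs q Tr(X^{q-1} H)
eps1 = 1e-5
fd1 = (Gamma(X + eps1 * H) - Gamma(X - eps1 * H)) / (2 * eps1)
an1 = q * np.trace(np.linalg.matrix_power(X, q - 1) @ H)
assert abs(fd1 - an1) / (abs(an1) + 1) < 1e-3

# Second derivative along (H, H): vs q * sum_{a=0}^{q-2} Tr(X^a H X^{q-2-a} H)
eps2 = 1e-4
fd2 = (Gamma(X + eps2 * H) - 2 * Gamma(X) + Gamma(X - eps2 * H)) / eps2**2
an2 = q * sum(np.trace(np.linalg.matrix_power(X, a) @ H
                       @ np.linalg.matrix_power(X, q - 2 - a) @ H)
              for a in range(q - 1))
assert abs(fd2 - an2) / (abs(an2) + 1) < 1e-2
```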

Using the above derivatives and Itô’s formula, we obtain

$$\begin{aligned} d\Gamma _t = d {{\,\mathrm{Tr}\,}}\left( Q_t^q \right) = q {{\,\mathrm{Tr}\,}}\left( Q_t^{q-1} d Q_t \right) + \frac{q}{2} \sum _{a = 0}^{q-2} \sum _{i,j,k,l=1}^{d} {{\,\mathrm{Tr}\,}}\left( Q_t^a E_{ij} Q_t^{q-2-a} E_{kl} \right) d\left[ Q_{ij}, Q_{kl} \right] _t, \end{aligned}$$
(35)

where \(E_{ij}\) is the matrix with value 1 at entry \((i, j)\) and 0 elsewhere, and \(Q_{ij, t}\) is the stochastic process defined by the \((i, j)\) entry of \(Q_t\). Using the derivative of \(A_t\) in Equation (15), we have

$$\begin{aligned} dQ_t&= \int A^{-1/2} \left( x - \mu _t \right) \left( x - \mu _t \right) ^\top A^{-1/2} \left( \left( x-\mu _t \right) ^\top A^{-1/2} dW_t \right) p_t(x) dx\\&\quad - Q_t^2 dt, \\ d\left[ Q_{ij}, Q_{kl} \right] _t&= \int \int z(x)_i z(x)_j z(y)_k z(y)_l (x-\mu _t)^\top A^{-1} (y-\mu _t) p_t(x) p_t(y)dx dy dt, \end{aligned}$$

where \(z(x)_i\) is the \(i\)th coordinate of \(A^{-1/2}(x-\mu _t)\). Plugging the expressions of \(dQ_t\) and \(d\left[ Q_{ij}, Q_{kl} \right] _t \) into Equation (35), we obtain

$$\begin{aligned} d\Gamma _t&= q \int \left( x-\mu _t \right) ^\top A^{-1/2} \left( Q_t \right) ^{q-1} A^{-1/2} \left( x-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1/2} dW_t p_t(x) dx \nonumber \\&\quad - q {{\,\mathrm{Tr}\,}}\left( Q_t^{q+1} \right) dt + \frac{q}{2} \sum _{a = 0}^{q-2} \int \int \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{a} A^{-1/2} \left( y-\mu _t \right) \nonumber \\&\quad \cdot \left( x-\mu _t \right) ^\top A^{-1/2} Q_t^{q-2-a} A^{-1/2} \left( y-\mu _t \right) \left( x-\mu _t \right) ^\top A^{-1} \left( y-\mu _t \right) p_t(x) p_t(y) dx dy dt. \end{aligned}$$

\(\square \)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Chen, Y. An Almost Constant Lower Bound of the Isoperimetric Coefficient in the KLS Conjecture. Geom. Funct. Anal. 31, 34–61 (2021). https://doi.org/10.1007/s00039-021-00558-4
