Abstract
We prove stability estimates for the Shannon–Stam inequality (also known as the entropy-power inequality) for log-concave random vectors in terms of entropy and transportation distance. In particular, we give the first stability estimate for general log-concave random vectors in the following form: for log-concave random vectors \(X,Y \in {\mathbb {R}}^d\), the deficit in the Shannon–Stam inequality is bounded from below by the expression
where \(\mathrm {D}\left( \cdot ~ ||G\right) \) denotes the relative entropy with respect to the standard Gaussian and the constant C depends only on the covariance structures and the spectral gaps of X and Y. In the case of uniformly log-concave vectors our analysis gives dimension-free bounds. Our proofs are based on a new approach which uses an entropy-minimizing process from stochastic control theory.
1 Introduction
Let \(\mu \) be a probability measure on \({\mathbb {R}}^d\) and \(X \sim \mu \). Denote by \(\mathrm {h}(\mu )\) the differential entropy of \(\mu \), which is defined to be
$$\begin{aligned} \mathrm {h}(\mu ) = \mathrm {h}(X) := -\int \limits _{{\mathbb {R}}^d} f(x)\log f(x)dx, \end{aligned}$$
where f denotes the density of \(\mu \) with respect to the Lebesgue measure.
One of the fundamental results of information theory is the celebrated Shannon–Stam inequality which asserts that for independent random vectors X, Y and \(\lambda \in (0,1)\)
$$\begin{aligned} \mathrm {h}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }Y\right) \ge \lambda \mathrm {h}(X) + (1-\lambda )\mathrm {h}(Y). \end{aligned}$$(1)
We remark that Stam [24] actually proved the equivalent statement
$$\begin{aligned} e^{\frac{2\mathrm {h}(X+Y)}{d}} \ge e^{\frac{2\mathrm {h}(X)}{d}} + e^{\frac{2\mathrm {h}(Y)}{d}}, \end{aligned}$$(2)
first observed by Shannon in [23], and known today as the entropy power inequality. To state yet another equivalent form of the inequality, for any positive-definite matrix, \(\varSigma \), we set \(\gamma _\varSigma \) as the centered Gaussian measure on \({\mathbb {R}}^d\) with density
$$\begin{aligned} \frac{d\gamma _\varSigma }{dx} = \frac{1}{(2\pi )^{d/2}\sqrt{\det (\varSigma )}}e^{-\frac{1}{2}\langle x, \varSigma ^{-1}x\rangle }. \end{aligned}$$
For the case where the covariance matrix is the identity, \(\mathrm {I}_d\), we will also write \(\gamma := \gamma _{\mathrm {I}_d}\). If \(Y \sim \nu \) we set the relative entropy of X with respect to Y as
$$\begin{aligned} \mathrm {D}\left( X||Y\right) = \mathrm {D}\left( \mu ||\nu \right) := \int \limits _{{\mathbb {R}}^d} \log \left( \frac{d\mu }{d\nu }\right) d\mu , \end{aligned}$$
whenever \(\mu \) is absolutely continuous with respect to \(\nu \).
For \(G \sim \gamma \), the differential entropy is related to the relative entropy by
$$\begin{aligned} \mathrm {D}\left( X||G\right) = \mathrm {h}(G) - \mathrm {h}(X) + \frac{1}{2}\left( {\mathbb {E}}\left[ \Vert X\Vert _2^2\right] - d\right) . \end{aligned}$$
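As a quick sanity check, for \(X \sim \gamma _{\sigma ^2\mathrm {I}_d}\) all quantities have closed forms, and the identity \(\mathrm {D}(X||G) = \mathrm {h}(G) - \mathrm {h}(X) + \frac{1}{2}({\mathbb {E}}[\Vert X\Vert _2^2] - d)\) can be verified numerically (a sketch in Python; the closed-form expressions used are the standard Gaussian ones):

```python
import math

d, s2 = 3, 2.0  # dimension and variance: X ~ N(0, s2 * I_d)

h_X = 0.5 * d * math.log(2 * math.pi * math.e * s2)  # differential entropy of X
h_G = 0.5 * d * math.log(2 * math.pi * math.e)       # entropy of the standard Gaussian
D_XG = 0.5 * d * (s2 - 1 - math.log(s2))             # D(X || G), closed form
second_moment = d * s2                               # E ||X||^2

# The identity relating differential and relative entropy:
lhs = D_XG
rhs = h_G - h_X + 0.5 * (second_moment - d)
assert abs(lhs - rhs) < 1e-12
```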
Thus, when X and Y are independent and centered, the statement
$$\begin{aligned} \mathrm {D}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }Y||G\right) \le \lambda \mathrm {D}\left( X||G\right) + (1-\lambda )\mathrm {D}\left( Y||G\right) \end{aligned}$$(3)
is equivalent to (1). Shannon noted that in the case that X and Y are Gaussians with proportional covariance matrices, both sides of (2) are equal. Later, in [24], it was shown that this is actually a necessary condition for the equality case. We define the deficit in (3) as
$$\begin{aligned} \delta _{EPI, \lambda }(X,Y) := \lambda \mathrm {D}\left( X||G\right) + (1-\lambda )\mathrm {D}\left( Y||G\right) - \mathrm {D}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }Y||G\right) \end{aligned}$$
and are led to the question: what can be said about X and Y when \(\delta _{EPI, \lambda }(X,Y)\) is small? One might expect that, in light of the equality cases, a small deficit in (3) should imply that X and Y are both close, in some sense, to a Gaussian. A recent line of works has focused on an attempt to make this intuition precise (see e.g., [6, 26]), which is also our main goal in the present work. In particular, we give the first stability estimate in terms of relative entropy. A good starting point is the work of Courtade et al. [6] which considers stability in terms of the Wasserstein distance (also known as quadratic transportation). The Wasserstein distance is defined by
$$\begin{aligned} \mathcal {W}_2^2(\mu ,\nu ) := \inf _{\pi }\int \limits _{{\mathbb {R}}^d\times {\mathbb {R}}^d}\Vert x-y\Vert _2^2d\pi (x,y), \end{aligned}$$
where the infimum is taken over all couplings \(\pi \) whose marginal laws are \(\mu \) and \(\nu \). A crucial observation made in their work is that without further assumptions on the measures \(\mu \) and \(\nu \), one should not expect meaningful stability results to hold. Indeed, for any \(\lambda \in (0,1)\) they show that there exists a family of measures \(\{\mu _\varepsilon \}_{\varepsilon > 0}\) such that \(\delta _{EPI, \lambda }(\mu _\varepsilon ,\mu _\varepsilon ) < \varepsilon \) and such that for any Gaussian measure \(\gamma _\varSigma \), \(\mathcal {W}_2(\mu _\varepsilon , \gamma _\varSigma ) \ge \frac{1}{3}\). Moreover, one may take \(\mu _\varepsilon \) to be a mixture of Gaussians. Thus, in order to derive quantitative bounds it is necessary to consider a more restricted class of measures. We focus on the class of log-concave measures which, as our method demonstrates, turns out to be natural in this context.
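Before restricting to log-concave measures, it is instructive to compute the deficit in the purely Gaussian case, where everything is explicit: for one-dimensional centered Gaussians the deficit has a closed form and vanishes precisely when the variances agree (a sketch; the function names are ours):

```python
import math

def D_gauss(s2):
    # D(N(0, s2) || N(0, 1)) in one dimension, closed form
    return 0.5 * (s2 - 1 - math.log(s2))

def deficit(lmbda, s2_X, s2_Y):
    # Deficit of the relative-entropy form of Shannon-Stam for independent
    # X ~ N(0, s2_X), Y ~ N(0, s2_Y); the combination sqrt(l)X + sqrt(1-l)Y
    # is N(0, l*s2_X + (1-l)*s2_Y).
    mix = lmbda * s2_X + (1 - lmbda) * s2_Y
    return lmbda * D_gauss(s2_X) + (1 - lmbda) * D_gauss(s2_Y) - D_gauss(mix)

# Equal variances (proportional covariances): the deficit vanishes.
assert abs(deficit(0.3, 2.0, 2.0)) < 1e-12
# Different variances: the deficit is strictly positive.
assert deficit(0.3, 2.0, 0.5) > 0
```

The positivity for unequal variances reflects the strict convexity of \(s^2 \mapsto \mathrm {D}(N(0,s^2)||G)\).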
1.1 Our contribution
A measure is called log-concave if it is supported on some subspace of \({\mathbb {R}}^d\) and, relative to the Lebesgue measure of that subspace, it has a density f for which
$$\begin{aligned} -\nabla ^2 \log f \succeq 0, \end{aligned}$$
where \(\nabla ^2\) denotes the Hessian matrix, and we consider the inequality in the sense of positive semi-definite matrices. Our first result will rely on a slightly stronger condition known as uniform log-concavity. If there exists \(\xi > 0\) such that
$$\begin{aligned} -\nabla ^2 \log f \succeq \xi \mathrm {I}_d, \end{aligned}$$
then we say that the measure is \(\xi \)-uniformly log-concave.
Theorem 1
Let X and Y be 1-uniformly log-concave centered vectors, and denote by \(\sigma ^2_X,\sigma ^2_Y\) the respective minimal eigenvalues of their covariance matrices. Then there exist Gaussian vectors \(G_X\) and \(G_Y\) such that for any \(\lambda \in (0,1)\),
To compare this with the main result of [6], we recall the transportation-entropy inequality due to Talagrand [25], which states that
$$\begin{aligned} \mathcal {W}_2^2(\mu , \gamma ) \le 2\mathrm {D}\left( \mu ||\gamma \right) . \end{aligned}$$
As a conclusion we get
where \(C_{\sigma _X,\sigma _Y}\) depends only on \(\sigma _X\) and \(\sigma _Y\). Up to this constant, this is precisely the main result of [6]. In fact, our method can reproduce their exact result, which we present as a warm up in the next section. We remark that as the underlying inequality is of information-theoretic nature, it is natural to expect that stability estimates are expressed in terms of relative entropy.
A random vector is isotropic if it is centered and its covariance matrix is the identity. By a re-scaling argument, the above theorem can be restated for uniformly log-concave isotropic random vectors.
Corollary 1
Let X and Y be \(\xi \)-uniformly log-concave isotropic random vectors. Then there exist Gaussian vectors \(G_X\) and \(G_Y\) such that for any \(\lambda \in (0,1)\)
In our estimate for general log-concave vectors, the dependence on the parameter \(\xi \) will be replaced by the spectral gap of the measures. We say that a random vector X satisfies a Poincaré inequality if there exists a constant \(C>0\) such that, for every smooth function \(f:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\),
$$\begin{aligned} \mathrm {Var}\left( f(X)\right) \le C\,{\mathbb {E}}\left[ \Vert \nabla f(X)\Vert _2^2\right] . \end{aligned}$$
We define \(C_p(X)\) to be the smallest number such that the above equation holds with \(C=C_p(X)\), and refer to this quantity as the Poincaré constant of X. The inverse quantity, \(C_p(X)^{-1}\) is referred to as the spectral gap of X.
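For example, a centered Gaussian with variance \(\sigma ^2\) has \(\mathrm {C_p} = \sigma ^2\), attained by linear functions. A small sketch checking the inequality for the test function \(f = \sin \), using the standard closed form \({\mathbb {E}}[\cos (tX)] = e^{-t^2\sigma ^2/2}\) for \(X \sim N(0,\sigma ^2)\):

```python
import math

# For X ~ N(0, s2) the Poincare constant is exactly s2 (linear f is extremal).
# Check Var(f(X)) <= s2 * E[f'(X)^2] for f = sin, via E[cos(tX)] = exp(-t^2 s2 / 2):
#   Var(sin X) = (1 - e^{-2 s2}) / 2   (E[sin X] = 0 by symmetry)
#   E[cos^2 X] = (1 + e^{-2 s2}) / 2
for s2 in [0.25, 1.0, 4.0]:
    var_sin = 0.5 * (1 - math.exp(-2 * s2))
    grad_sq = 0.5 * (1 + math.exp(-2 * s2))
    assert var_sin <= s2 * grad_sq + 1e-12
# Linear f(x) = x gives Var(f(X)) / E[f'(X)^2] = s2, so the constant is sharp.
```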
Theorem 2
Let X and Y be centered log-concave vectors with \(\sigma ^2_X\), \(\sigma _Y^2\) denoting the minimal eigenvalues of their covariance matrices. Assume that \(\mathrm {Cov}(X) + \mathrm {Cov}(Y) =2\mathrm {I}_d\) and set \(\max \left( \frac{\mathrm {C_p}(X)}{\sigma _X^2},\frac{\mathrm {C_p}(Y)}{\sigma ^2_Y}\right) = \mathrm {C_p}\). Then, if G denotes the standard Gaussian, for every \(\lambda \in (0,1)\)
where \(K >0\) is a numerical constant, which can be made explicit.
Remark 1
For \(\xi \)-uniformly log-concave vectors, we have the relation, \(\mathrm {C_p}(X) \le \frac{1}{\xi }\) (this is a consequence of the Brascamp-Lieb inequality [3], for instance). Thus, considering Corollary 1, one might have expected that the term \(\mathrm {C^3_p}\) could have been replaced by \(\mathrm {C^2_p}\) in Theorem 2. We do not know if either result is tight.
Remark 2
Bounding the Poincaré constant of an isotropic log-concave measure is the object of the long-standing Kannan-Lovász-Simonovits (KLS) conjecture (see [15, 17] for more information). The conjecture asserts that there exists a constant \(K >0\), independent of the dimension, such that for any isotropic log-concave vector X, \(\mathrm {C_p}(X) \le K\). The best known bound is due to Lee and Vempala, who showed in [18] that if X is a d-dimensional log-concave vector, then \(\mathrm {C_p}(X) = O \left( \sqrt{d}\right) .\)
Concerning the assumptions of Theorem 2: note that, as the EPI is invariant under linear transformations, there is no loss of generality in assuming \(\mathrm {Cov}(X) + \mathrm {Cov}(Y) = 2\mathrm {I}_d\). Remark that \(\mathrm {C_p}(X)\) is, approximately, proportional to the maximal eigenvalue of \(\mathrm {Cov}(X)\). Thus, for ill-conditioned covariance matrices, \(\frac{\mathrm {C_p}(X)}{\sigma _X^2}\) and \(\frac{\mathrm {C_p}(Y)}{\sigma ^2_Y}\) will not be on the same scale. It seems plausible to conjecture that the dependence on the minimal eigenvalue and the Poincaré constant could be replaced by a quantity which takes into consideration all eigenvalues.
Some other known stability results, both for log-concave vectors and for other classes of measures, may be found in [5, 6, 26]. The reader is referred to [6, Section 2.2] for a complete discussion. Let us mention one important special case, which is relevant to our results; the so-called entropy jump, first proved for the one dimensional case by Ball et al. [1] and then generalized by Ball and Nguyen to arbitrary dimensions in [2]. According to the latter result, if X is a log-concave and isotropic random vector, then
where \(\mathrm {C_p}(X)\) is the Poincaré constant of X and G is the standard Gaussian. This should be compared to both Corollary 1 and Theorem 2. That is, in the special case of two identical measures and \(\lambda = \frac{1}{2}\), their result gives a better dependence on the Poincaré constant than the one afforded by our results.
Ball and Nguyen [2] also give an interesting motivation for this type of inequality: they show that if for some constant \(\kappa > 0\),
then the density \(f_X\) of X satisfies, \(f_X(0) \le e^{\frac{2d}{\kappa }}\). The isotropic constant of X is defined by \(L_X := f_X(0)^{\frac{1}{d}}\), and is the main subject of the slicing conjecture, which hypothesizes that \(L_X\) is uniformly bounded by a constant, independent of the dimension, for every isotropic log-concave vector X. Ball and Nguyen observed that using the above fact in conjunction with an entropy jump estimate gives a bound on the isotropic constant in terms of the Poincaré constant, and in particular the slicing conjecture is implied by the KLS conjecture ([7] gives another proof of this reduction which applies to specific measures).
See [16, 20] for more results concerning the entropy jump, as well as connections and analogues to the discrete setting and additive combinatorics.
Our final results give improved bounds under the assumption that X and Y are already close to being Gaussian, in terms of relative entropy, or that one of them is Gaussian. We record these results in the following theorems.
Theorem 3
Suppose that X, Y are isotropic log-concave vectors such that \(\mathrm {C_p}(X),\mathrm {C_p}(Y) \le \mathrm {C_p}\) for some \(\mathrm {C_p} < \infty \). Suppose further that \(\mathrm {D}(X||G), \mathrm {D}(Y||G) \le \frac{1}{4}\). Then
The following gives an improved bound in the case that one of the random vectors is a Gaussian, and holds in full generality with respect to the other vector, without a log-concavity assumption.
Theorem 4
(Theorem 9 in [6]) Let X be a centered random vector with finite Poincaré constant, \(\mathrm {C_p}(X) < \infty \). Then
Remark 3
When \(\mathrm {C_p}(X) \ge 1\), the following inequality holds
Remark 4
Theorem 4 was already proved in [6] by using a slightly different approach. Denote by \(\mathrm {I}(X||G)\), the relative Fisher information of the random vector X. In [11] the authors prove the following improved log-Sobolev inequality.
The theorem follows by integrating the inequality along the Ornstein-Uhlenbeck semi-group.
1.2 Discussion and further directions for research
Perhaps the first question that arises in light of Theorem 2 is whether log-concavity is necessary. It should be noted that in dimension 1, in the special case that X has the same distribution as Y, log-concavity is not needed, see [1]. We do not have a counterexample to the conjecture that Theorem 2 holds true without any log-concavity assumption.
A very interesting related question is to try to characterize the approximate equality cases for the Shannon–Stam inequality. In a recent paper [9], it is shown that measures which almost saturate the log-Sobolev inequality are close, in Wasserstein distance, to mixtures of isotropic Gaussians. It is natural to ask whether this is also the case for the Shannon–Stam inequality. In other words, could it be that the Wasserstein distances of both X and Y from a mixture of Gaussians can be bounded by some function of \(\delta _{EPI, \lambda }(X,Y)\)? We provide a short discussion following Lemma 2 which gives a heuristic towards such a result.
Finally, a natural question which was mentioned above is to understand the optimal dependence on the Poincaré constant in Theorem 2. It is plausible that the dependence \(\mathrm {C_P}^3\) can be replaced by \(\mathrm {C_P}\). This is supported by the result of Ball and Nguyen [2]. In a related note, in the context of Theorem 3, it makes sense to ask if a related bound (even a dimension-dependent one) can be attained when the assumption \(\mathrm {D}(X||G), \mathrm {D}(Y||G) \le \frac{1}{4}\) is relaxed to \(\mathrm {D}(X||G), \mathrm {D}(Y||G) \le \frac{d}{4}\), for example.
2 Bounding the deficit via martingale embeddings
Our approach is based on ideas somewhat related to the ones which appear in [9, 10, 14]: the very high-level plan of the proof is to embed the variables X, Y as the terminal points of some martingales and express the entropies of X, Y and \(X+Y\) as functions of the associated quadratic co-variation processes. One of the main benefits of using such an embedding is that the co-variation process of \(X+Y\) can be easily expressed in terms of the ones of X, Y, as demonstrated below. In [10] these ideas were used to produce upper bounds for the entropic central limit theorem, so it stands to reason that related methods may be useful here. It turns out, however, that in order to produce meaningful bounds for the Shannon–Stam inequality, one needs a more intricate analysis, since this inequality corresponds to a second-derivative phenomenon: whereas for the CLT one only needs to produce upper bounds on the relative entropy, here we need to be able to compare, in a non-asymptotic way, two relative entropies.
In particular, our martingale embedding is constructed using the entropy minimizing technique developed by Föllmer [12, 13] and later Lehec [19]. This construction has several useful features, one of which is that it allows us to express the relative entropy of a measure in \({\mathbb {R}}^d\) in terms of a variational problem on the Wiener space. In addition, upon attaining a slightly different point of view on this process, that we introduce here, the behavior of this variational expression turns out to be tractable with respect to convolutions. The reader is referred to [22] for the necessary background in stochastic calculus.
In order to outline the argument, fix centered measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^d\) with finite second moment. Let \(X \sim \mu \), \(Y \sim \nu \) be random vectors and \(G \sim \gamma \) a standard Gaussian random vector.
An entropy-minimizing drift Let \(B_t\) be a standard Brownian motion on \({\mathbb {R}}^d\) and denote by \(\mathcal {F}_t\) its natural filtration. In the sequel, the following minimization problem plays a fundamental role:
$$\begin{aligned} \min _{u_t}{\mathbb {E}}\left[ \int \limits _0^1\Vert u_t\Vert _2^2dt\right] , \end{aligned}$$(4)
where the minimum is taken with respect to all processes \(u_t\) adapted to \(\mathcal {F}_t\), such that
$$\begin{aligned} B_1 + \int \limits _0^1u_tdt \sim \mu . \end{aligned}$$
Amazingly, under mild assumptions on \(\mu \), and in particular in the case that \(\mu \) is log-concave, there exists a unique minimizer to Eq. (4), from which we construct the process
$$\begin{aligned} X_t := B_t + \int \limits _0^tv_s^Xds, \end{aligned}$$
also known as the Föllmer process, with \(v_t^X\) being the associated Föllmer drift. We refer the reader to [19] for proofs of the existence and uniqueness of the process, as well as of a few other facts summarized below.
It turns out that the process \(v_t^X\) is a martingale [which goes together with the fact that it minimizes the quadratic form in (4)] which is given by the equation
$$\begin{aligned} v_t^X = \nabla \log P_{1-t}f_X(X_t), \end{aligned}$$(5)
where \(f_X\) is the density of X with respect to the standard Gaussian and \(P_{1-t}\) denotes the heat semi-group,
In fact, Girsanov’s formula gives a very useful relation between the energy of the drift and the entropy of X, namely,
$$\begin{aligned} \mathrm {D}\left( X||G\right) = \frac{1}{2}{\mathbb {E}}\left[ \int \limits _0^1\Vert v_t^X\Vert _2^2dt\right] . \end{aligned}$$(6)
This gives the following alternative interpretation for the process: suppose that the Wiener space is equipped with an underlying probability measure P, with respect to which the process \(B_t\) is a Brownian motion as above. Let Q be a measure on Wiener space such that
then the process \(X_t\) is a Brownian motion with respect to the measure Q. By the representation theorem for the Brownian bridge, this tells us that the process \(X_t\) conditioned on \(X_1\) is a Brownian bridge between 0 and \(X_1\). In particular, we have
$$\begin{aligned} X_t {\mathop {=}\limits ^{law}} tX_1 + \sqrt{t(1-t)}G, \end{aligned}$$(7)
where G is a standard Gaussian vector, independent of \(X_1\).
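To illustrate, for a one-dimensional Gaussian target \(X_1 \sim N(0,\sigma ^2)\) the Föllmer drift admits the closed form \(v_t(x) = (\sigma ^2-1)x/(t\sigma ^2+1-t)\) (a computation not carried out in the text; it follows by evaluating \(\nabla \log P_{1-t}f_X\) for a Gaussian density ratio). A simulation sketch checking that the resulting marginals match the bridge representation \(X_t \sim tX_1 + \sqrt{t(1-t)}G\):

```python
import numpy as np

rng = np.random.default_rng(0)
s2 = 2.0                 # target law: X_1 ~ N(0, s2)
n, paths = 400, 100_000  # Euler-Maruyama steps and Monte Carlo paths
dt = 1.0 / n

# Follmer drift for a centered Gaussian target (our closed-form derivation):
#   v_t(x) = (s2 - 1) * x / (t * s2 + 1 - t)
x = np.zeros(paths)
for i in range(n):
    t = i * dt
    drift = (s2 - 1) * x / (t * s2 + 1 - t)
    x += drift * dt + np.sqrt(dt) * rng.standard_normal(paths)
    if i + 1 == n // 2:
        # Bridge representation at t = 1/2: Var(X_t) = t^2 s2 + t(1 - t)
        assert abs(x.var() - (0.25 * s2 + 0.25)) < 0.05

assert abs(x.var() - s2) < 0.1  # terminal law is (approximately) N(0, s2)
```

The tolerances account for Euler discretization bias and Monte Carlo error.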
Lehec’s proof of the Shannon–Stam inequality For the sake of intuition, we now repeat Lehec’s argument to reproduce the Shannon–Stam inequality (3) using this process. Let \(X_t := B^X_t + \int \limits _0^tv^X_sds\) and \(Y_t := B^Y_t + \int \limits _0^tv^Y_sds\) be the Föllmer processes associated to X and Y, where \(B_t^X\) and \(B_t^Y\) are independent Brownian motions. For \(\lambda \in (0,1)\), define the new processes
$$\begin{aligned} {\tilde{B}}_t := \sqrt{\lambda }B^X_t + \sqrt{1-\lambda }B^Y_t \end{aligned}$$
and
$$\begin{aligned} w_t := \sqrt{\lambda }v^X_t + \sqrt{1-\lambda }v^Y_t. \end{aligned}$$
By the independence of \(B_t^X\) and \(B_t^Y\), \({\tilde{B}}_t\) is a Brownian motion and
$$\begin{aligned} \sqrt{\lambda }X_1 + \sqrt{1-\lambda }Y_1 = {\tilde{B}}_1 + \int \limits _0^1w_tdt. \end{aligned}$$
Note that, as \(v_t^X\) is a martingale, we have for every \(t\in [0,1]\),
$$\begin{aligned} {\mathbb {E}}\left[ v^X_t\right] = {\mathbb {E}}\left[ v^X_0\right] = 0. \end{aligned}$$
Using Eqs. (4) and (6) and recalling that the processes are independent, so that the cross term vanishes, we finally have
$$\begin{aligned} \mathrm {D}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }Y||G\right)&\le \frac{1}{2}{\mathbb {E}}\left[ \int \limits _0^1\Vert w_t\Vert _2^2dt\right] \\&= \frac{\lambda }{2}{\mathbb {E}}\left[ \int \limits _0^1\Vert v^X_t\Vert _2^2dt\right] + \frac{1-\lambda }{2}{\mathbb {E}}\left[ \int \limits _0^1\Vert v^Y_t\Vert _2^2dt\right] \\&= \lambda \mathrm {D}\left( X||G\right) + (1-\lambda )\mathrm {D}\left( Y||G\right) . \end{aligned}$$
This recovers the Shannon–Stam inequality in the form (3).
An alternative point of view: replacing the drift by a varying diffusion coefficient Lehec’s proof gives rise to the following idea: Suppose the processes \(v_t^X\) and \(v_t^Y\) could be coupled in a way such that the variance of the resulting process \(\sqrt{\lambda } v_t^X + \sqrt{1-\lambda }v_t^Y\) was smaller than that of \(w_t\) above. Such a coupling would improve on (3) and that is the starting point of this work.
As it turns out, however, it is easier to get tractable bounds by working with a slightly different interpretation of the above processes, in which the role of the drift is taken by an adapted diffusion coefficient of a related process.
The idea is as follows: Suppose that \(M_t := \int \limits _0^t F_sdB_s\) is a martingale, where \(F_t\) is some positive-definite matrix valued process adapted to \(\mathcal {F}_t\). Consider the drift defined by
$$\begin{aligned} u_t := \int \limits _0^t\frac{F_s - \mathrm {I}_d}{1-s}dB_s. \end{aligned}$$(8)
We then claim that \(B_1 + \int \limits _{0}^1u_tdt = M_1\). To show this, we use the stochastic Fubini Theorem [27] to write
$$\begin{aligned} \int \limits _0^1u_tdt = \int \limits _0^1\int \limits _0^t\frac{F_s - \mathrm {I}_d}{1-s}dB_sdt = \int \limits _0^1\frac{F_s - \mathrm {I}_d}{1-s}\left( \int \limits _s^1dt\right) dB_s = \int \limits _0^1\left( F_s - \mathrm {I}_d\right) dB_s = M_1 - B_1. \end{aligned}$$
Since we now expressed the random variable \(M_1\) as the terminal point of a standard Brownian motion with an adapted drift, the minimality property of the Föllmer drift together with Eq. (6) immediately produces a bound on its entropy. Namely, by using Itô’s isometry and Fubini’s theorem we have the bound
$$\begin{aligned} \mathrm {D}\left( M_1||G\right) \le \frac{1}{2}{\mathbb {E}}\left[ \int \limits _0^1\Vert u_t\Vert _2^2dt\right] = \frac{1}{2}\int \limits _0^1\frac{{\mathbb {E}}\left[ \Vert F_t - \mathrm {I}_d\Vert _{HS}^2\right] }{1-t}dt. \end{aligned}$$(9)
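The claim \(B_1 + \int _0^1u_tdt = M_1\) is easy to test numerically; with a deterministic scalar coefficient (our choice \(F_t \equiv c\)) and a left-point discretization, the telescoping identity even holds exactly on the grid, up to floating-point roundoff:

```python
import numpy as np

rng = np.random.default_rng(1)
n, paths = 1000, 1000
dt = 1.0 / n
c = 2.0  # deterministic scalar diffusion coefficient F_t = c (our choice)

dB = np.sqrt(dt) * rng.standard_normal((paths, n))
B1 = dB.sum(axis=1)
M1 = c * B1  # M_1 = int_0^1 F_s dB_s = c * B_1 for constant F

# u_t = int_0^t (F_s - 1)/(1 - s) dB_s on the grid s_j = j/n (left endpoints,
# so 1 - s_j >= 1/n and the integrand stays finite)
s = np.arange(n) * dt
u = np.cumsum((c - 1) / (1 - s) * dB, axis=1)
drift_integral = u.sum(axis=1) * dt  # int_0^1 u_t dt

# Each dB_j is weighted by (1 - s_j)/(1 - s_j) = 1 after summation, so the
# discrete identity B_1 + int u dt = M_1 is exact for this quadrature.
err = B1 + drift_integral - M1
assert np.abs(err).max() < 1e-6
```

For random \(F_t\) the same check works with an \(O(n^{-1/2})\) discretization error instead of an exact identity.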
This hints at the following possible scheme of proof: in order to give an upper bound for the expression \(\mathrm {D}(\sqrt{\lambda }X_1 + \sqrt{1-\lambda }Y_1||G)\), it suffices to find martingales \(M_t^X\) and \(M_t^Y\) such that \(M_1^{X}, M_1^Y\) have the laws of X and Y, respectively, and such that the \(\lambda \)-average of the covariance processes is close to the identity.
The Föllmer process gives rise to a natural martingale: Consider \({\mathbb {E}}\left[ X_1|\mathcal {F}_t\right] \), the associated Doob martingale. By the martingale representation theorem ([22, Theorem 4.3.3]) there exists a uniquely defined adapted matrix valued process \(\varGamma _t^X\), for which
$$\begin{aligned} {\mathbb {E}}\left[ X_1|\mathcal {F}_t\right] = \int \limits _0^t\varGamma ^X_sdB^X_s. \end{aligned}$$(10)
By following the construction in (8) and considering the process \({\tilde{v}}_t^X := \int \limits _0^t\frac{\varGamma ^X_s -\mathrm {I}_d}{1-s}dB^X_s\), it is immediate that \(B_1 + \int \limits _{0}^1{\tilde{v}}_t^Xdt = X_1\). Observe that \(v_t^X - {\tilde{v}}_t^X\) is a martingale and that for every \(t \in [0,1]\), \(\int \limits _t^1(v_s^X - {\tilde{v}}_s^X)ds|\mathcal {F}_t = 0,\) almost surely. It thus follows that \(v_t^X\) and \({{\tilde{v}}}_t^X\) are almost surely the same process. We conclude the following representation for the Föllmer drift,
$$\begin{aligned} v^X_t = \int \limits _0^t\frac{\varGamma ^X_s -\mathrm {I}_d}{1-s}dB^X_s. \end{aligned}$$(11)
The matrix \(\varGamma _t^X\) turns out to be positive definite almost surely (in fact, it has an explicit simple representation, see Proposition 1 below), which yields, by combining (6) with the same calculation as in (9),
$$\begin{aligned} \mathrm {D}\left( X||G\right) = \frac{1}{2}\int \limits _0^1\frac{{\mathbb {E}}\left[ \Vert \varGamma ^X_t - \mathrm {I}_d\Vert _{HS}^2\right] }{1-t}dt. \end{aligned}$$(12)
Given the processes \(\varGamma _t^X\) and \(\varGamma _t^Y\), we are now in position to express \(\sqrt{\lambda } X + \sqrt{1-\lambda } Y\) as the terminal point of a martingale, towards using (9), which would lead to a bound on \(\delta _{EPI,\lambda }\). We define
$$\begin{aligned} \varGamma _t := \left( \lambda \left( \varGamma ^X_t\right) ^2 + (1-\lambda )\left( \varGamma ^Y_t\right) ^2\right) ^{1/2} \end{aligned}$$
and a martingale \({\tilde{B}}_t\) which satisfies
$$\begin{aligned} d{\tilde{B}}_t = \varGamma _t^{-1}\left( \sqrt{\lambda }\varGamma ^X_tdB^X_t + \sqrt{1-\lambda }\varGamma ^Y_tdB^Y_t\right) . \end{aligned}$$
Since \(\varGamma _t^X\) and \(\varGamma _t^Y\) are invertible almost surely and independent, it holds that
$$\begin{aligned} [{\tilde{B}}]_t = t\,\mathrm {I}_d, \end{aligned}$$
where \([{\tilde{B}}]_t\) denotes the quadratic co-variation of \({\tilde{B}}_t\). Thus, by Lévy’s characterization, \({\tilde{B}}_t\) is a standard Brownian motion and we have the following equality in law
$$\begin{aligned} \sqrt{\lambda }X + \sqrt{1-\lambda }Y {\mathop {=}\limits ^{law}} \int \limits _0^1\varGamma _td{\tilde{B}}_t. \end{aligned}$$
We can now invoke (9) to get
$$\begin{aligned} \mathrm {D}\left( \sqrt{\lambda }X + \sqrt{1-\lambda }Y||G\right) \le \frac{1}{2}\int \limits _0^1\frac{{\mathbb {E}}\left[ \Vert \varGamma _t - \mathrm {I}_d\Vert _{HS}^2\right] }{1-t}dt. \end{aligned}$$(13)
Combining this with the identity (12) finally gives a bound on the deficit in the Shannon–Stam inequality, in the form
$$\begin{aligned} \delta _{EPI,\lambda }(X,Y) \ge \frac{1}{2}\int \limits _0^1\frac{\lambda {\mathbb {E}}\left[ \Vert \varGamma ^X_t - \mathrm {I}_d\Vert _{HS}^2\right] + (1-\lambda ){\mathbb {E}}\left[ \Vert \varGamma ^Y_t - \mathrm {I}_d\Vert _{HS}^2\right] - {\mathbb {E}}\left[ \Vert \varGamma _t - \mathrm {I}_d\Vert _{HS}^2\right] }{1-t}dt. \end{aligned}$$
The following technical lemma will allow us to give a lower bound for the right hand side in terms of the variances of the processes \(\varGamma _t^X, \varGamma _t^Y\). Its proof is postponed to the end of the section.
Lemma 1
Let A and B be positive definite matrices and denote
Then
Combining the lemma with the estimate obtained in (13) produces the following result, which will be our main tool in studying \(\delta _{EPI, \lambda }\).
Lemma 2
Let X and Y be centered random vectors on \({\mathbb {R}}^d\) with finite second moment, and let \(\varGamma _t^X, \varGamma _t^Y\) be defined as above. Then,
The expression on the right-hand side of (14) may seem unwieldy, however, in many cases it can be simplified. For example, if it can be shown that, almost surely, \(\varGamma _t^X, \varGamma _t^Y \preceq c_t\mathrm {I}_d\) for some deterministic \(c_t > 0\), then we obtain the more tractable inequality
As we will show, this is the case when the random vectors are log-concave.
Let us now take a small detour to discuss a heuristic idea for a possible extension of our results towards characterizing the approximate equality cases in the Shannon–Stam inequality (as alluded to in Sect. 1.2). Using Eq. (16) below, it can be shown that after a small period of time, the quantities \( \Vert {\mathbb {E}}\left[ \varGamma _t^X \right] \Vert _{op} \) and \(\Vert {\mathbb {E}}\left[ \varGamma _t^Y\right] \Vert _{op}\) become small. This suggests that it may be the case that for any X and Y (hence, even without a log-concavity assumption or a bound on the Poincaré constant), it holds that
where \(\lim _{t_0 \rightarrow 0} c(t_0) = 0\). Our techniques show that, in this case, the laws of random variables \(X_1 | \mathcal {F}_{t_0}\) and \(Y_1 | \mathcal {F}_{t_0}\) are close to Gaussian (with high probability with respect to \(\mathcal {F}_{t_0}\)), which would imply in this case that X and Y are close to mixtures of Gaussians. A related version of this idea appears in [9] and [21].
Proof of Lemma 1
We have
As
we have the equality
Finally, as the trace is invariant under any permutation of three symmetric matrices (this follows by combining invariance under transposition with cyclic invariance), we have that
and
Thus,
as required. \(\square \)
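The trace identity used above is easy to test numerically (a sketch with random symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(2)
# For symmetric A, B, C the quantity tr(ABC) is invariant under *any*
# permutation: cyclic invariance plus tr(M) = tr(M^T) handle the odd ones.
X = rng.standard_normal((3, 5, 5))
A, B, C = (0.5 * (M + M.T) for M in X)

vals = [np.trace(P @ Q @ R) for (P, Q, R) in
        [(A, B, C), (B, C, A), (C, A, B), (A, C, B), (C, B, A), (B, A, C)]]
assert np.allclose(vals, vals[0])
```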
2.1 The Föllmer process associated to log-concave random vectors
In this section, we collect several results pertaining to the Föllmer process. Throughout the section, we fix a random vector X in \({\mathbb {R}}^d\) and associate to it the Föllmer process \(X_t\), defined in the previous section, as well as the process \(\varGamma ^X_t\), defined in Eq. (10) above. The next result lists some of its basic properties, and we refer to [8, 10] for proofs.
Proposition 1
For \(t \in (0,1)\) define
where \(f_X\) is the density of X with respect to the standard Gaussian and \(Z_{t,X}\) is a normalizing constant defined so that \(\int \limits _{{\mathbb {R}}^d} f_X^t\, d\gamma = 1\). Then
-
1.
\(f_X^t\) is the density of the random measure \(\mu _t := X_1|\mathcal {F}_t\) with respect to the standard Gaussian and \(\varGamma ^X_t = \frac{\mathrm {Cov}\left( \mu _t\right) }{1-t}\).
-
2.
\(\varGamma ^X_t\) is almost surely a positive definite matrix, in particular, it is invertible.
-
3.
For all \(t \in (0,1)\), we have
$$\begin{aligned} \frac{d}{dt}{\mathbb {E}}\left[ \varGamma ^X_t\right] = \frac{{\mathbb {E}}\left[ \varGamma ^X_t\right] - {\mathbb {E}}\left[ \left( \varGamma ^X_t\right) ^2\right] }{1-t}. \end{aligned}$$(16) -
4.
The following identity holds
$$\begin{aligned} {\mathbb {E}}\left[ v_t^X\otimes v_t^X\right] = \frac{\mathrm {I}_d - {\mathbb {E}}\left[ \varGamma ^X_t\right] }{1-t} + \mathrm {Cov}(X) - \mathrm {I}_d, \end{aligned}$$(17)
for all \(t \in [0,1]\). In particular, if \(\mathrm {Cov}(X) \preceq \mathrm {I}_d\), then \({\mathbb {E}}\left[ \varGamma ^X_t\right] \preceq \mathrm {I}_d\).
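In the Gaussian case these identities can be verified in closed form: for a one-dimensional target \(X \sim N(0,\sigma ^2)\) the process is deterministic, \(\varGamma ^X_t = \sigma ^2/(t\sigma ^2+1-t)\) (a computation not carried out in the text), and \({\mathbb {E}}[(v^X_t)^2] = t(\sigma ^2-1)^2/(t\sigma ^2+1-t)\). A numeric check of (16) and (17) under those formulas:

```python
import numpy as np

s2 = 3.0  # X ~ N(0, s2); then Gamma_t = s2 / (t*s2 + 1 - t), deterministic
t = np.linspace(0.05, 0.9, 50)
gamma = s2 / (t * s2 + 1 - t)

# (16): d/dt Gamma_t = (Gamma_t - Gamma_t^2) / (1 - t)
dgamma = -s2 * (s2 - 1) / (t * s2 + 1 - t) ** 2  # exact derivative of gamma
assert np.allclose(dgamma, (gamma - gamma ** 2) / (1 - t))

# (17): E[v_t^2] = (1 - Gamma_t)/(1 - t) + s2 - 1, where in the Gaussian case
# E[v_t^2] = t (s2 - 1)^2 / (t*s2 + 1 - t)
ev2 = t * (s2 - 1) ** 2 / (t * s2 + 1 - t)
assert np.allclose(ev2, (1 - gamma) / (1 - t) + s2 - 1)
```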
In what follows, we restrict ourselves to the case that X is log-concave. Using this assumption we will establish several important properties for the matrix \(\varGamma _t\). For simplicity, we will write \(\varGamma _t := \varGamma _t^X\) and \(v_t := v_t^X\). The next result shows that the matrix \(\varGamma _t\) is bounded almost surely.
Lemma 3
Suppose that X is log-concave, then for every \(t \in (0,1)\)
Moreover, if for some \(\xi >0\), X is \(\xi \)-uniformly log-concave then
Proof
By Proposition 1, \(\mu _t\), the law of \(X_1|\mathcal {F}_t\) has a density \(\rho _t\), with respect to the Lebesgue measure, proportional to
Consequently, since \(-\nabla ^2f_X \succeq 0\),
It follows that, almost surely, \(\mu _t\) is \(\frac{t}{1-t}\)-uniformly log-concave. According to the Brascamp-Lieb inequality [3] \(\alpha \)-uniform log-concavity implies a spectral gap of \(\alpha \), and in particular \(\text {Cov}(\mu _t) \preceq \frac{1 - t}{t}\mathrm {I}_d\) and so, \(\varGamma _t = \frac{\mathrm {Cov}(\mu _t)}{1-t} \preceq \frac{1}{t}\mathrm {I}_d\). If, in addition, X is \(\xi \)-uniformly log-concave, so that \(-\nabla ^2f_X \succeq \xi \mathrm {I}_d\), then we may write
and the arguments given above show \(\text {Cov}(\mu _t) \preceq \frac{(1-t)}{(1-t)\xi + t}\mathrm {I}_d\). Thus,
\(\square \)
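As a sanity check for Lemma 3, in the one-dimensional Gaussian case \(X \sim N(0,\sigma ^2)\) one may compute \(\varGamma _t = \sigma ^2/(t\sigma ^2+1-t)\) directly (our computation, consistent with Proposition 1); the first bound then holds for every \(\sigma ^2\), and the second is saturated with equality, since \(N(0,\sigma ^2)\) is \(\frac{1}{\sigma ^2}\)-uniformly log-concave:

```python
import numpy as np

t = np.linspace(0.01, 0.99, 99)
for s2 in [0.5, 1.0, 4.0]:
    gamma = s2 / (t * s2 + 1 - t)  # Gamma_t for X ~ N(0, s2), deterministic

    # Lemma 3, first bound: Gamma_t <= 1/t
    assert np.all(gamma <= 1 / t + 1e-12)

    # N(0, s2) is xi-uniformly log-concave with xi = 1/s2; the refined bound
    # Gamma_t <= 1/((1-t)*xi + t) is attained with equality by Gaussians.
    xi = 1 / s2
    assert np.allclose(gamma, 1 / ((1 - t) * xi + t))
```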
Our next goal is to use the formulas given in the above lemma in order to bound from below the expectation of \(\varGamma _t\). We begin with a simple corollary.
Corollary 2
Suppose that X is 1-uniformly log-concave, then for every \(t \in [0,1]\)
Proof
By (16), we have
By Lemma 3, \(\varGamma _t\preceq \mathrm {I}_d\), which shows
Thus, for every t,
\(\square \)
To produce similar bounds for general log-concave random vectors, we require more intricate arguments. Recall that \(\mathrm {C_p}(X)\) denotes the Poincaré constant of X.
Lemma 4
If X is centered and has a finite Poincaré constant \(\mathrm {C_p}(X) < \infty \), then
Proof
Recall that, by Eq. (7), we know that \(X_t\) has the same law as \(tX_1 + \sqrt{t(1-t)}G\), where G is a standard Gaussian independent of \(X_1\). Since \(\mathrm {C_p}(tX) = t^2\mathrm {C_p}(X) \) and since the Poincaré constant is sub-additive with respect to convolution [4], we get
$$\begin{aligned} \mathrm {C_p}(X_t) \le t^2\mathrm {C_p}(X) + t(1-t). \end{aligned}$$
The drift, \(v_t\), is a function of \(X_t\) and \({\mathbb {E}}\left[ v_t\right] = 0\). Equation (5) implies that \(\nabla _x v_t(X_t)\) is a symmetric matrix, hence the Poincaré inequality yields
As \(v_t(X_t)\) is a martingale, by Itô’s lemma we have
An application of Itô’s isometry then shows
where we have again used the fact that \(\nabla _x v_t(X_t)\) is symmetric. \(\square \)
Using the last lemma, we can deduce lower bounds on the matrix \(\varGamma _t^X\) in terms of the Poincaré constant.
Corollary 3
Suppose that X is log-concave and that \(\sigma ^2\) is the minimal eigenvalue of \(\mathrm {Cov}(X)\). Then,
-
1.
For every \(t \in \left[ 0,\frac{1}{ 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1}\right] \), \({\mathbb {E}}\left[ \varGamma _t\right] \succeq \frac{\min (1,\sigma ^2)}{3}\mathrm {I}_d.\)
-
2.
For every \(t \in \left[ \frac{1}{ 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1}, 1\right] \), \({\mathbb {E}}\left[ \varGamma _t\right] \succeq \frac{\min (1,\sigma ^2)}{3}\frac{1}{t\left( 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1\right) }\mathrm {I}_d\).
Proof
Using Equation (11), Itô’s isometry and the fact that \(\varGamma _t\) is symmetric, we deduce that
Combining this with equation (17) and using Lemma 4, we get
In the case where X is log-concave, by Lemma 3, \(\varGamma _t \preceq \frac{1}{t}\mathrm {I}_d\) almost surely, therefore \({\mathbb {E}}\left[ \varGamma _t^2\right] \preceq \frac{1}{t}{\mathbb {E}}\left[ \varGamma _t\right] \). The above inequality then becomes
Rearranging the inequality shows
As long as \(t \le \frac{1}{ 2\left( \frac{\mathrm {C_p}(X)}{\sigma ^2}\right) +1}\), we have
which gives the first bound. By (10), we also have the bound
The differential equation
has a unique solution given by
Using Grönwall’s inequality, we conclude that for every \(t \in \left[ \frac{1}{ 2\frac{\mathrm {C_p}(X)}{\sigma ^2}+1},1\right] \),
\(\square \)
We conclude this section with a comparison lemma that will allow us to control the values of \({\mathbb {E}}\left[ \left\Vert v_t\right\Vert _2^2\right] \).
Lemma 5
Let \(t_0 \in [0,1]\) and suppose that X is centered with a finite Poincaré constant \(\mathrm {C_p}(X) < \infty \). Then
-
1.
For \(t_0 \le t \le 1,\)
$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_t\right\Vert _2^2\right] \ge {\mathbb {E}}\left[ \left\Vert v_{t_0}\right\Vert _2^2\right] \frac{t_0\left( \mathrm {C_p}(X)-1\right) t + t}{t_0\left( \mathrm {C_p}(X)-1\right) t + t_0}. \end{aligned}$$ -
2.
For \(0 \le t \le t_0,\)
$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert v_t\right\Vert _2^2\right] \le {\mathbb {E}}\left[ \left\Vert v_{t_0}\right\Vert _2^2\right] \frac{t_0\left( \mathrm {C_p}(X)-1\right) t + t}{t_0\left( \mathrm {C_p}(X)-1\right) t + t_0}. \end{aligned}$$
Proof
Consider the differential equation
It has a unique solution given by
The bounds follow by applying Grönwall’s inequality combined with the result of Lemma 4. \(\square \)
3 Stability for 1-uniformly log-concave random vectors
In this section, we assume that X and Y are both 1-uniformly log-concave. Let \(B_t^X, B_t^Y\) be independent standard Brownian motions and consider the associated processes \(\varGamma _t^X, \varGamma _t^Y\) defined as in Sect. 2.
The key fact that makes the uniformly log-concave case easier is Lemma 3, which implies that \(\varGamma _t^X,\varGamma _t^Y \preceq \mathrm {I}_d\) almost surely. In this case, Lemma 2 simplifies to
where we have used the fact that
Consider the two Gaussian random vectors defined as
and observe that
This induces a coupling between X and \(G_X\) from which we obtain, using Itô’s isometry,
and an analogous estimate also holds for Y. We may now use \({\mathbb {E}}\left[ \varGamma _t^X\right] \) and \({\mathbb {E}}\left[ \varGamma _t^Y\right] \) as the diffusion coefficients for the same Brownian motion to establish
Plugging these estimates into (19) reproves the following bound, which is identical to Theorem 1 in [6].
Theorem 5
Let X and Y be 1-uniformly log-concave centered vectors and let \(G_X, G_Y\) be defined as above. Then,
To obtain a bound for the relative entropy towards the proof of Theorem 1, we will require a slightly more general version of inequality (9). This is the content of the next lemma, whose proof is similar to the argument presented above. The main difference comes from applying Girsanov’s theorem to a re-scaled Brownian motion, from which we obtain an expression analogous to (6). The reader is referred to [10, Lemma 2], for a complete proof.
Lemma 6
Let \(F_t\) and \(E_t\) be two \(\mathcal {F}_t\)-adapted matrix-valued processes and let \(X_t\), \(M_t\) be two processes defined by
Suppose that for every \(t\in [0,1]\), \(E_t \succeq c\mathrm {I}_d\) for some deterministic \(c > 0\), then
Proof of Theorem 1
By Corollary 2
We invoke Lemma 6 with \(E_t = {\mathbb {E}}\left[ \varGamma _t^X\right] \) and \(F_t = \varGamma _t^X\) to obtain
Repeating the same argument for Y gives
By invoking Lemma 6 with \(F_t = {\mathbb {E}}\left[ \varGamma _t^X\right] \) and \(E_t = {\mathbb {E}}\left[ \varGamma _t^Y\right] \) and then one more time after switching between \(F_t\) and \(E_t\), and summing the results, we get
Plugging the above inequalities into (19) concludes the proof. \(\square \)
4 Stability for general log-concave random vectors
Fix X, Y, centered log-concave random vectors in \({\mathbb {R}}^d\), such that
with \(\sigma _X^2,\sigma _Y^2\) the corresponding minimal eigenvalues of \(\mathrm {Cov}(X)\) and \(\mathrm {Cov}(Y)\). Assume further that \(\frac{\mathrm {C_p}(Y)}{\sigma _Y^2},\frac{\mathrm {C_p}(X)}{\sigma _X^2} \le \mathrm {C_p}\), for some \(\mathrm {C_p} >1\). Again, let \(B_t^X\) and \(B_t^Y\) be independent Brownian motions and consider the associated processes \(\varGamma _t^X, \varGamma _t^Y\) defined as in Sect. 2.
The general log-concave case, in comparison with the case where X and Y are uniformly log-concave, gives rise to two essential difficulties. Recall that the results in the previous section used the fact that an upper bound for the matrices \(\varGamma _t^X,\varGamma _t^Y\), combined with equation (14) gives the simpler bound (19). Unfortunately, in the general log-concave case, there is no upper bound uniform in t, which creates the first problem. The second issue has to do with the lack of respective lower bounds for \({\mathbb {E}}[\varGamma _t^X]\) and \({\mathbb {E}}[\varGamma _t^Y]\): in view of Lemma 6, one needs such bounds in order to obtain estimates on the entropies.
The solution to the second issue lies in Corollary 3, which gives a lower bound for the processes in terms of the Poincaré constants. We denote \(\xi = \frac{1}{(2\mathrm {C_p}+1)}\frac{\min (\sigma _Y^2,\sigma _X^2)}{3}\), so that the corollary gives
Thus, we are left with the issue arising from the lack of a uniform upper bound for the matrices \(\varGamma _t^X,\varGamma _t^Y\). Note that Lemma 3 gives \(\varGamma _t^X\preceq \frac{1}{t}\mathrm {I}_d\), a bound which is not uniform in t. To illustrate how one may overcome this issue, suppose that there exists an \(\varepsilon >0\), such that
In such a case, Lemma 2 would imply
Towards finding an \(\varepsilon \) such that the above holds, note that since \(v_t^X\) is a martingale, and using (6) we have for every \(t_0 \in [0,1],\)
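The martingale property enters through the monotonicity of \(t \mapsto {\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] \), which is the standard second-moment property of square-integrable martingales; as a sketch:

```latex
\mathbb{E}\left[\left\|v_t^X\right\|_2^2\right]
= \mathbb{E}\left[\left\|v_s^X\right\|_2^2\right] + \mathbb{E}\left[\left\|v_t^X - v_s^X\right\|_2^2\right]
\ge \mathbb{E}\left[\left\|v_s^X\right\|_2^2\right], \qquad 0 \le s \le t \le 1,
```

since the cross term \(\mathbb{E}\left[\left\langle v_s^X, v_t^X - v_s^X\right\rangle\right]\) vanishes by the tower property of conditional expectation.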
Observe that
Using the relation in (11), Fubini’s theorem shows
Combining the last two displays gives
Using (17), we have the identities:
and
from which we deduce
Let \(\{w_i\}_{i=1}^d\) be an orthonormal basis of eigenvectors corresponding to the eigenvalues \(\{\lambda _i\}_{i=1}^d\) of \(\mathrm {I}_d-{\mathbb {E}}\left[ \varGamma _t^X\right] \). The following observation, which follows from the above identities, is crucial: if \(\lambda _i \le 0\) then necessarily
In this case, by assumption (20), \(\langle w_i, \mathrm {Cov}(Y)w_i\rangle \le 1\) and
Our aim is to bound (23) from below; thus, in the calculation of the trace on the right-hand side, we may disregard all \(w_i\) corresponding to negative \(\lambda _i\). Moreover, if \(\lambda _i \ge 0\), we need only consider the cases where
as well. We note that these assumptions on \(w_i\) also imply
Since \(w_i\) is an eigenvector of \(\mathrm {I}_d-{\mathbb {E}}\left[ \varGamma _t^X\right] \), it is also an eigenvector of \({\mathbb {E}}\left[ v^X_t\otimes v^X_t\right] + \mathrm {I}_d-\mathrm {Cov}(X)\) and we have the following equality:
The fact that \(\lambda _i \ge 0\), together with (24) and (25), ensures that all four terms are positive. Using the estimate (21), the previous equation is bounded from above by
where we have used (20). Summing over all the relevant \(w_i\) we get
Plugging this into (23) and using (22) we have thus shown
This suggests that it may be useful to bound \({\mathbb {E}}\left[ \left\Vert v^X_{t_0}\right\Vert _2^2\right] \) from above, for small values of \(t_0\), which is the objective of the next lemma.
Lemma 7
If X is centered and has a finite Poincaré constant \(\mathrm {C_p}(X) < \infty \), then for every \(s \le \frac{1}{3(2\mathrm {C_p}(X)+1)}\) the following holds
Proof
Suppose to the contrary that \({\mathbb {E}}\left[ \left\Vert v_{s^2}^X\right\Vert ^2_2\right] \ge \frac{s}{4}\cdot \mathrm {D}(X||G)\). Invoking Lemma 5 with \(t_0 = s^2\) gives
whenever \(t \ge s^2\). Thus,
Note now that for \(s\le \frac{1}{3(2\mathrm {C_p}(X)+1)}\)
and in particular we may substitute \(s = \frac{1}{3(2\mathrm {C_p}(X)+1)}\) in (27). In this case, a straightforward calculation yields
which contradicts identity (6) and concludes the proof. \(\square \)
We would like to use the lemma with the choice \(s = \xi ^2\). In order to verify the condition of the lemma, which amounts to \(\xi ^2 \le \frac{1}{3(2\mathrm {C_p}(X)+1)}\), we first remark that if \(\sigma _X^2 \le 1\), then it is clear that \(\xi \le \frac{1}{3(2\mathrm {C_p}(X)+1)}\). Otherwise, \(\sigma _X^2 \ge 1\) and
As the same reasoning applies to Y, we now choose \(t_0 = \xi ^2\), which allows us to invoke the previous lemma in (26) and to establish:
We are finally ready to prove the main theorem.
Proof of Theorem 2
Denote \(\xi = \frac{1}{(2\mathrm {C_p}+1)}\frac{\min (\sigma _Y^2,\sigma _X^2)}{3}\). Since X and Y are log-concave, by Lemma 3, \(\varGamma _t^X, \varGamma _t^Y \preceq \frac{1}{t}\mathrm {I}_d\) almost surely. Thus, Lemma 2 gives
By noting that \(\mathrm {C_p} \ge 1\), the bound (28) gives
for some numerical constant \(K>0\). \(\square \)
5 Further results
5.1 Stability for low entropy log-concave measures
In this section we focus on the case where X and Y are log-concave and isotropic. As in the previous section, we set \(\xi _X = \frac{1}{3(2\mathrm {C_p}(X) + 1)}\), so that by Corollary 3,
Towards the proof of Theorem 3, we first need an analogue of Lemma 7, for which we sketch the proof here.
Lemma 8
If X is centered and has a finite Poincaré constant \(\mathrm {C_p}(X) < \infty \), then
Proof
Assume by contradiction that \({\mathbb {E}}\left[ \left\Vert v_{\xi _X}\right\Vert _2^2\right] \ge \frac{1}{4}\mathrm {D}(X||G)\). In this case, Lemma 5 implies, for every \(t \ge \xi _X\),
A calculation then shows that
which contradicts (6). \(\square \)
Proof of Theorem 3
Since \(v_t^X\) is a martingale, \({\mathbb {E}}\left[ \left\Vert v_t^X\right\Vert _2^2\right] \) is an increasing function. By (6) we deduce the elementary inequality
which holds for every \(s \in [0,1]\). For isotropic X, Equation (17) shows that, for all \(t \in [0,1]\),
where the second inequality is by assumption. Note that Equation (17) also shows that \({\mathbb {E}}\left[ \varGamma _t^X\right] \preceq \mathrm {I}_d\) which yields, for every \(t \in [0,1]\)
Applying this to Y as well produces the bound
Set \(\xi = \min (\xi _X,\xi _Y)\). Repeating the same calculation as in (23) and using the above gives that
Lemma 8 implies
Finally, by Lemma 3, \(\varGamma _t^X,\varGamma _t^Y \preceq \frac{1}{t}\mathrm {I}_d\) almost surely for all \(t \in [0,1]\). We now invoke Lemma 2 to obtain
\(\square \)
5.2 Stability under convolution with a Gaussian
Proof of Theorem 4
Fix \(\lambda \in (0,1)\). By (7), we have that
As the relative entropy is affine invariant, this implies
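The affine invariance used here is a general change-of-variables fact: if \(T(x) = Ax + b\) with \(A\) invertible, the Jacobian factors cancel in the density ratio, so that

```latex
f_{TX}(y) = \frac{f_X\left(T^{-1}y\right)}{\left|\det A\right|},
\qquad
\frac{f_{TX}(y)}{f_{TY}(y)} = \frac{f_X\left(T^{-1}y\right)}{f_Y\left(T^{-1}y\right)},
\qquad\text{and hence}\qquad
\mathrm{D}\left(TX\,||\,TY\right) = \mathrm{D}\left(X\,||\,Y\right).
```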
Lemma 5 yields,
and
Denote
A calculation shows
as well as
Thus, the above bounds give
and
Now, since the expression \(\frac{\alpha }{\alpha + \beta }\) is monotone increasing with respect to \(\alpha \) and decreasing with respect to \(\beta \) whenever \(\alpha ,\beta > 0\), those two inequalities together with (29) imply that
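For completeness, the claimed monotonicity follows by differentiation:

```latex
\frac{\partial}{\partial \alpha}\,\frac{\alpha}{\alpha+\beta} = \frac{\beta}{(\alpha+\beta)^2} > 0,
\qquad
\frac{\partial}{\partial \beta}\,\frac{\alpha}{\alpha+\beta} = -\frac{\alpha}{(\alpha+\beta)^2} < 0,
\qquad \alpha,\beta > 0.
```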
Rewriting the above in terms of the deficit in the Shannon–Stam inequality, we have established
\(\square \)
References
Ball, K., Barthe, F., Naor, A.: Entropy jumps in the presence of a spectral gap. Duke Math. J. 119(1), 41–63 (2003)
Ball, K., Nguyen, V.: Entropy jumps for isotropic log-concave random vectors and spectral gap. Studia Mathematica 1(213), 81–96 (2012)
Brascamp, H.J., Lieb, E.H.: On extensions of the Brunn–Minkowski and Prékopa–Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Funct. Anal. 22(4), 366–389 (1976)
Courtade, T.A.: Bounds on the Poincaré constant for convolution measures. To appear in Ann. Inst. Henri Poincaré Probab. Stat. (2018)
Courtade, T.A.: A quantitative entropic CLT for radially symmetric random vectors. In: 2018 IEEE International Symposium on Information Theory (ISIT) IEEE, pp. 1610–1614 (2018)
Courtade, T.A., Fathi, M., Pananjady, A.: Quantitative stability of the entropy power inequality. IEEE Trans. Inform. Theory 64(8), 5691–5703 (2018)
Eldan, R., Klartag, B.: Approximately Gaussian marginals and the hyperplane conjecture. In: Houdré, C. (ed.) Concentration, Functional Inequalities and Isoperimetry. Contemporary Mathematics, vol. 545, pp. 55–68. American Mathematical Society, Providence (2011)
Eldan, R., Lee, J.R.: Regularization under diffusion and anticoncentration of the information content. Duke Math. J. 167(5), 969–993 (2018)
Eldan, R., Lehec, J., Shenfeld, Y.: Stability of the logarithmic Sobolev inequality via the Föllmer process (2019). arXiv preprint arXiv:1903.04522
Eldan, R., Mikulincer, D., Zhai, A.: The CLT in high dimensions: quantitative bounds via martingale embedding (2018). arXiv preprint arXiv:1806.09087
Fathi, M., Indrei, E., Ledoux, M.: Quantitative logarithmic Sobolev inequalities and stability estimates. Discrete Contin. Dyn. Syst. 36(12), 6835–6853 (2016)
Föllmer, H.: An entropy approach to the time reversal of diffusion processes. In: Grigelionis, B. (ed.) Stochastic Differential Systems Filtering and Control, pp. 156–163. Springer, Berlin (1985)
Föllmer, H.: Time reversal on Wiener space. In: Albeverio, S. (ed.) Stochastic Processes-Mathematics and Physics, pp. 119–129. Springer, Berlin (1986)
Jiang, H., Lee, Y.T., Vempala, S.S.: A generalized central limit conjecture for convex bodies (2019). arXiv preprint arXiv:1909.13127
Kannan, R., Lovász, L., Simonovits, M.: Isoperimetric problems for convex bodies and a localization lemma. Discrete Comput. Geom. 13(3–4), 541–559 (1995)
Kontoyiannis, I., Madiman, M.: Sumset and inverse sumset inequalities for differential entropy and mutual information. IEEE Trans. Inf. Theory 60(8), 4503–4514 (2014)
Ledoux, M.: Spectral gap, logarithmic Sobolev constant, and geometric bounds. Surveys Differ. Geom. 9(1), 219–240 (2004)
Lee, Y.T., Vempala, S.S.: Eldan’s stochastic localization and the KLS hyperplane conjecture: an improved lower bound for expansion. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pp. 998–1007. IEEE (2017)
Lehec, J.: Representation formula for the entropy and functional inequalities. Ann. Inst. Henri Poincaré Probab. Stat. 49(3), 885–899 (2013)
Madiman, M., Kontoyiannis, I.: Entropy bounds on abelian groups and the Ruzsa divergence. IEEE Trans. Inf. Theory 64(1), 77–92 (2018)
Mikulincer, D.: Stability of Talagrand’s Gaussian transport-entropy inequality via the Föllmer process (2019). arXiv preprint arXiv:1906.05904
Øksendal, B.: Stochastic differential equations. In: Stochastic Differential Equations, pp. 65–84. Springer, Berlin (2003)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
Stam, A.J.: Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 2(2), 101–112 (1959)
Talagrand, M.: Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. GAFA 6(3), 587–600 (1996)
Toscani, G.: A strengthened entropy power inequality for log-concave densities. IEEE Trans. Inf. Theory 61(12), 6550–6559 (2015)
Veraar, M.: The stochastic Fubini theorem revisited. Stochastics 84(4), 543–551 (2012)
Acknowledgements
We are grateful to Alex Zhai for several enlightening exchanges of ideas, and are thankful to Bo’az Klartag and Max Fathi for useful discussions. We would also like to thank Tom Courtade for his thoughtful comments concerning a preliminary draft and for suggesting that we generalize the proof of Theorem 2 for arbitrary covariance structures. Finally, we thank the two anonymous referees for many insightful comments and questions.
D. Mikulincer is supported by an Azrieli foundation fellowship.
R. Eldan is the incumbent of the Elaine Blond Career Development Chair, and is supported by a European Research Council Starting Grant (ERC StG) and by Israel Science Foundation Grant No. 715/16.
Eldan, R., Mikulincer, D. Stability of the Shannon–Stam inequality via the Föllmer process. Probab. Theory Relat. Fields 177, 891–922 (2020). https://doi.org/10.1007/s00440-020-00967-w