Stability of the Shannon-Stam inequality via the Föllmer process

We prove stability estimates for the Shannon-Stam inequality (also known as the entropy-power inequality) for log-concave random vectors in terms of entropy and transportation distance. In particular, we give the first stability estimate for general log-concave random vectors in the following form: for log-concave random vectors $X,Y \in \mathbb{R}^d$, the deficit in the Shannon-Stam inequality is bounded from below by the expression $$ C \left(\mathrm{D}\left(X||G\right) + \mathrm{D}\left(Y||G\right)\right), $$ where $\mathrm{D}\left( \cdot ~ ||G\right)$ denotes the relative entropy with respect to the standard Gaussian and the constant $C$ depends only on the covariance structures and the spectral gaps of $X$ and $Y$. In the case of uniformly log-concave vectors our analysis gives dimension-free bounds. Our proofs are based on a new approach which uses an entropy-minimizing process from stochastic control theory.


Introduction
Let $\mu$ be a probability measure on $\mathbb{R}^d$ and $X \sim \mu$. Denote by $h(\mu)$ the differential entropy of $\mu$, which is defined as
$$h(\mu) := h(X) = -\int_{\mathbb{R}^d} \ln\frac{d\mu}{dx}\, d\mu.$$
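As a quick numerical illustration (ours, not part of the argument): the differential entropy is an expectation, so it can be estimated by Monte Carlo as $-\frac{1}{n}\sum_i \ln p(x_i)$. For the standard Gaussian the closed form is $h = \frac{1}{2}\ln(2\pi e)$, which the sketch below recovers.

```python
import numpy as np

# Monte Carlo estimate of h(X) = -E[ln p(X)] for X ~ N(0, 1),
# compared with the closed form 0.5 * ln(2*pi*e).
rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)

log_density = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)  # ln p(x) for N(0,1)
h_mc = -log_density.mean()                           # Monte Carlo estimate
h_exact = 0.5 * np.log(2 * np.pi * np.e)

print(h_mc, h_exact)  # the two agree up to O(1/sqrt(n)) fluctuations
```
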
One of the fundamental results of information theory is the celebrated Shannon-Stam inequality, which asserts that for independent random vectors $X, Y$ and $\lambda \in (0,1)$,
$$h\left(\sqrt{\lambda}X + \sqrt{1-\lambda}Y\right) \geq \lambda h(X) + (1-\lambda)h(Y). \tag{1}$$
We remark that Stam [18] actually proved the equivalent statement first observed by Shannon in [17], known today as the entropy power inequality,
$$e^{\frac{2}{d}h(X+Y)} \geq e^{\frac{2}{d}h(X)} + e^{\frac{2}{d}h(Y)}. \tag{2}$$
To state yet another equivalent form of the inequality, for any positive-definite matrix $\Sigma$ we set $\gamma_\Sigma$ as the centered Gaussian measure on $\mathbb{R}^d$ with density
$$\frac{d\gamma_\Sigma}{dx} = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}} \exp\left(-\frac{1}{2}\langle x, \Sigma^{-1}x\rangle\right).$$
For the case where the covariance matrix is the identity, $I_d$, we will also write $\gamma := \gamma_{I_d}$. If $Y \sim \nu$, we set the relative entropy of $X$ with respect to $Y$ as
$$\mathrm{D}\left(X||Y\right) := \int_{\mathbb{R}^d} \ln\frac{d\mu}{d\nu}\, d\mu.$$
For $G \sim \gamma$, the differential entropy is related to the relative entropy by
$$\mathrm{D}\left(X||G\right) = h(G) - h(X) + \frac{1}{2}\left(\mathbb{E}\left[\|X\|_2^2\right] - d\right).$$
Thus, when $X$ and $Y$ are independent and centered, the statement
$$\mathrm{D}\left(\sqrt{\lambda}X + \sqrt{1-\lambda}Y\,||\,G\right) \leq \lambda\, \mathrm{D}\left(X||G\right) + (1-\lambda)\, \mathrm{D}\left(Y||G\right) \tag{3}$$
is equivalent to (1). Shannon noted that in the case that $X$ and $Y$ are Gaussians with proportional covariance matrices, both sides of (2) are equal. Later, in [18] it was shown that this is actually a necessary condition for the equality case. We define the deficit in (3),
$$\delta_{EPI,\lambda}(X,Y) := \lambda\, \mathrm{D}\left(X||G\right) + (1-\lambda)\, \mathrm{D}\left(Y||G\right) - \mathrm{D}\left(\sqrt{\lambda}X + \sqrt{1-\lambda}Y\,||\,G\right),$$
and ask: what can be said about $X$ and $Y$ when $\delta_{EPI,\lambda}(X,Y)$ is small? One might expect that, in light of the equality cases, a small deficit in (3) should imply that $X$ and $Y$ are both close, in some sense, to a Gaussian. A recent line of works has focused on an attempt to make this intuition precise (see e.g., [6,20]), which is also our main goal in the present work.
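The equality cases can be seen concretely in dimension one, where all the entropies involved are explicit for Gaussians. The following self-contained sketch (our own; the helper names are not from the paper) computes the deficit in closed form and shows it vanishes exactly when the variances agree.

```python
import numpy as np

def h_gauss(var):
    """Differential entropy of a one-dimensional N(0, var)."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

def epi_deficit(var_x, var_y, lam):
    """Deficit in h(sqrt(lam) X + sqrt(1-lam) Y) >= lam h(X) + (1-lam) h(Y)
    for independent one-dimensional centered Gaussians."""
    var_sum = lam * var_x + (1 - lam) * var_y  # variance of the combination
    return h_gauss(var_sum) - lam * h_gauss(var_x) - (1 - lam) * h_gauss(var_y)

# zero deficit iff the variances agree, strictly positive otherwise
print(epi_deficit(1.0, 1.0, 0.3))  # equality case
print(epi_deficit(1.0, 4.0, 0.3))  # strictly positive
```

The positivity here is just concavity of the logarithm, which is the one-dimensional shadow of the general inequality.
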
In particular, we give the first stability estimate in terms of relative entropy. A good starting point is the work of Courtade, Fathi and Pananjady ([6]), which considers stability in terms of the Wasserstein distance (also known as the quadratic transportation distance), defined by
$$W_2^2(\mu, \nu) := \inf_\pi \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|_2^2\, d\pi(x,y),$$
where the infimum is taken over all couplings $\pi$ whose marginal laws are $\mu$ and $\nu$. A crucial observation made in their work is that without further assumptions on the measures $\mu$ and $\nu$, one should not expect meaningful stability results to hold. Indeed, for any $\lambda \in (0,1)$ they show that there exists a family of measures $\{\mu_\varepsilon\}_{\varepsilon > 0}$ such that $\delta_{EPI,\lambda}(\mu_\varepsilon, \mu_\varepsilon) < \varepsilon$ and such that for any Gaussian measure $\gamma_\Sigma$, $W_2(\mu_\varepsilon, \gamma_\Sigma) \geq \frac{1}{3}$. Moreover, one may take $\mu_\varepsilon$ to be a mixture of Gaussians. Thus, in order to derive quantitative bounds it is necessary to consider a more restricted class of measures. We focus on the class of log-concave measures which, as our method demonstrates, turns out to be natural in this context.
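For one-dimensional Gaussians both $W_2$ and the relative entropy are explicit, which makes such statements easy to experiment with. The sketch below (our own illustration; the formulas are the standard closed forms) also verifies the transportation-entropy inequality $W_2^2(\mu,\gamma) \leq 2\mathrm{D}(\mu||\gamma)$, recalled in the next section, on randomly sampled Gaussians.

```python
import numpy as np

# Closed forms for one-dimensional Gaussians:
#   W_2^2(N(m, s^2), N(0, 1)) = m^2 + (s - 1)^2
#   D(N(m, s^2) || N(0, 1))   = 0.5 * (s^2 + m^2 - 1 - ln s^2)
def w2_sq(m, s):
    return m**2 + (s - 1.0)**2

def rel_entropy(m, s):
    return 0.5 * (s**2 + m**2 - 1.0 - np.log(s**2))

# Talagrand's inequality W_2^2 <= 2 D, checked on random Gaussians
rng = np.random.default_rng(1)
for _ in range(1000):
    m = rng.uniform(-3.0, 3.0)
    s = rng.uniform(0.1, 5.0)
    assert w2_sq(m, s) <= 2.0 * rel_entropy(m, s) + 1e-12
print("Talagrand holds on all sampled Gaussians")
```

In this Gaussian case the inequality reduces to $\ln s \leq s - 1$, with equality exactly at the standard Gaussian.
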

Our Contribution
A measure is called log-concave if it is supported on some subspace of $\mathbb{R}^d$ and, relative to the Lebesgue measure of that subspace, it has a density $f$ for which
$$\nabla^2\left(-\ln f\right) \succeq 0,$$
where $\nabla^2$ denotes the Hessian matrix, and we consider the inequality in the sense of positive semi-definite matrices. Our first result will rely on a slightly stronger condition, known as uniform log-concavity. If there exists $\xi > 0$ such that
$$\nabla^2\left(-\ln f\right) \succeq \xi I_d,$$
then we say that the measure is $\xi$-uniformly log-concave.

Theorem 1. Let $X$ and $Y$ be 1-uniformly log-concave centered vectors, and denote by $\sigma_X^2, \sigma_Y^2$ the respective minimal eigenvalues of their covariance matrices. Then there exist Gaussian vectors $G_X$ and $G_Y$ such that for any $\lambda \in (0,1)$,
To compare this with the main result of [6], we recall the transportation-entropy inequality due to Talagrand ([19]), which states that
$$W_2^2(\mu, \gamma) \leq 2\,\mathrm{D}\left(\mu||\gamma\right).$$
As a conclusion we get
where $C_{\sigma_X,\sigma_Y}$ depends only on $\sigma_X$ and $\sigma_Y$. Up to this constant, this is precisely the main result of [6]. In fact, our method can reproduce their exact result, which we present as a warm-up in the next section. We remark that as the underlying inequality is of an information-theoretic nature, it is natural to expect that stability estimates are expressed in terms of relative entropy. A random vector is isotropic if it is centered and its covariance matrix is the identity. By a rescaling argument, the above theorem can be restated for uniformly log-concave isotropic random vectors.
Corollary 2. Let $X$ and $Y$ be $\xi$-uniformly log-concave and isotropic random vectors. Then there exist Gaussian vectors $G_X$ and $G_Y$ such that for any $\lambda \in (0,1)$,
In our estimate for general log-concave vectors, the dependence on the parameter $\xi$ will be replaced by the spectral gap of the measures. We say that a random vector $X$ satisfies a Poincaré inequality if there exists a constant $C > 0$ such that
$$\mathrm{Var}\left(\psi(X)\right) \leq C\, \mathbb{E}\left[\|\nabla\psi(X)\|_2^2\right],$$
for all test functions $\psi$.
We define $C_p(X)$ to be the smallest number such that the above inequality holds with $C = C_p(X)$, and refer to this quantity as the Poincaré constant of $X$. The inverse quantity, $C_p(X)^{-1}$, is referred to as the spectral gap of $X$.
Theorem 3. Let $X$ and $Y$ be centered log-concave vectors with $\sigma_X^2, \sigma_Y^2$ denoting the minimal eigenvalues of their covariance matrices. Assume that $\mathrm{Cov}(X) + \mathrm{Cov}(Y) = 2I_d$ and set $C_p := \max\left(C_p(X), C_p(Y)\right)$. Then, if $G$ denotes the standard Gaussian, for every $\lambda \in (0,1)$,
where $K > 0$ is a numerical constant, which can be made explicit.
Remark 4. For $\xi$-uniformly log-concave vectors, we have the relation $C_p(X) \leq \frac{1}{\xi}$ (this is a consequence of the Brascamp-Lieb inequality [3], for instance). Thus, considering Corollary 2, one might have expected that the term $C_p^3$ could have been replaced by $C_p^2$ in Theorem 3. We do not know if either result is tight.
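As a small numerical illustration of this relation (our example, not the paper's): taking the test function $\psi(x) = x$ in the Poincaré inequality shows that $\xi$-uniform log-concavity forces $\mathrm{Var}(X) \leq \frac{1}{\xi}$. The sketch below checks this variance bound by quadrature for the quartic densities proportional to $e^{-\alpha x^2/2 - x^4}$, which are $\alpha$-uniformly log-concave since $-(\ln f)'' = \alpha + 12x^2 \geq \alpha$.

```python
import numpy as np

def variance(alpha):
    """Variance of the 1-D density proportional to exp(-alpha*x^2/2 - x^4),
    computed by a Riemann-sum quadrature on a wide grid."""
    x = np.linspace(-10.0, 10.0, 200_001)
    dx = x[1] - x[0]
    w = np.exp(-alpha * x**2 / 2 - x**4)   # unnormalized density
    w /= w.sum() * dx                      # normalize
    mean = (x * w).sum() * dx              # = 0 by symmetry
    return ((x - mean)**2 * w).sum() * dx

for alpha in (0.5, 1.0, 4.0):
    var = variance(alpha)
    print(alpha, var, var <= 1.0 / alpha)  # the bound Var <= 1/alpha holds
```
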
Remark 5. Bounding the Poincaré constant of an isotropic log-concave measure is the object of the long-standing Kannan-Lovász-Simonovits (KLS) conjecture (see [12,13] for more information). The conjecture asserts that there exists a constant $K > 0$, independent of the dimension, such that for any isotropic log-concave vector $X$, $C_p(X) \leq K$. The best known bound is due to Lee and Vempala, who showed in [14] that if $X$ is a $d$-dimensional isotropic log-concave vector, then $C_p(X) \leq K\sqrt{d}$ for a universal constant $K$. Concerning the assumptions of Theorem 3: note that as the EPI is invariant under linear transformations, there is no loss of generality in assuming $\mathrm{Cov}(X) + \mathrm{Cov}(Y) = 2I_d$. Remark that $C_p(X)$ is, approximately, proportional to the maximal eigenvalue of $\mathrm{Cov}(X)$. Thus, for ill-conditioned covariance matrices, the minimal eigenvalue and $C_p(X)$ will not be on the same scale. It seems plausible to conjecture that the dependence on the minimal eigenvalue and the Poincaré constant could be replaced by a quantity which takes into consideration all eigenvalues.
Some other known stability results, both for log-concave vectors and for other classes of measures, may be found in [5,6,20]. The reader is referred to [6, Section 2.2] for a complete discussion. Let us mention one important special case which is relevant to our results: the so-called entropy jump, first proved in the one-dimensional case by Ball, Barthe and Naor ([1]) and then generalized by Ball and Nguyen to arbitrary dimensions in [2]. According to the latter result, if $X$ is a log-concave and isotropic random vector, then
where $C_p(X)$ is the Poincaré constant of $X$ and $G$ is the standard Gaussian. This should be compared to both Corollary 2 and Theorem 3. That is, in the special case of two identical measures and $\lambda = \frac{1}{2}$, their result gives a better dependence on the Poincaré constant than the one afforded by our results.
Ball and Nguyen ([2]) also give an interesting motivation for this type of inequality: they show that if
for some constant $\kappa > 0$, then the density $f_X$ of $X$ satisfies $f_X(0) \leq e^{2d\kappa}$. The isotropic constant of $X$ is defined by $L_X := f_X(0)^{\frac{1}{d}}$, and is the main subject of the slicing conjecture, which hypothesizes that $L_X$ is uniformly bounded by a constant, independent of the dimension, for every isotropic log-concave vector $X$. Ball and Nguyen observed that using the above fact in conjunction with an entropy jump estimate gives a bound on the isotropic constant in terms of the Poincaré constant; in particular, the slicing conjecture is implied by the KLS conjecture.
Our final results give improved bounds under the assumption that $X$ and $Y$ are already close to being Gaussian, in terms of relative entropy, or when one of them is Gaussian. We record these results in the following theorems.
The following gives an improved bound in the case that one of the random vectors is a Gaussian, and holds in full generality with respect to the other vector, without a log-concavity assumption.
Theorem 7. Let $X$ be a centered random vector with a finite Poincaré constant, $C_p(X) < \infty$. Then
Remark 9. Theorem 7 was already proved in [6], using a slightly different approach. Denote by $I(X||G)$ the relative Fisher information of the random vector $X$. In [9], the authors prove the following improved log-Sobolev inequality.
The theorem follows by integrating the inequality along the Ornstein-Uhlenbeck semi-group.
Our approach is based on ideas somewhat related to the ones which appear in [8]: the very high-level plan of the proof is to embed the variables $X, Y$ as the terminal points of some martingales and express the entropies of $X$, $Y$ and $X+Y$ as functions of the associated quadratic co-variation processes. One of the main benefits of using such an embedding is that the co-variation process of $X+Y$ can be easily expressed in terms of the ones of $X, Y$, as demonstrated below. In [8] these ideas were used to produce upper bounds for the entropic central limit theorem, so it stands to reason that related methods may be useful here. It turns out, however, that in order to produce meaningful bounds for the Shannon-Stam inequality, one needs a more intricate analysis, since this inequality corresponds to a second-derivative phenomenon: whereas for the CLT one only needs to produce upper bounds on the relative entropy, here we need to be able to compare, in a non-asymptotic way, two relative entropies.
In particular, our martingale embedding is constructed using the entropy-minimizing technique developed by Föllmer ([10,11]) and later Lehec ([15]). This construction has several useful features, one of which is that it allows us to express the relative entropy of a measure on $\mathbb{R}^d$ in terms of a variational problem on Wiener space. In addition, upon attaining a slightly different point of view on this process, which we introduce here, the behavior of this variational expression turns out to be tractable with respect to convolutions.
In order to outline the argument, fix centered measures $\mu$ and $\nu$ on $\mathbb{R}^d$ with finite second moment. Let $X \sim \mu$, $Y \sim \nu$ be random vectors and $G \sim \gamma$ a standard Gaussian random vector.
An entropy-minimizing drift. Let $B_t$ be a standard Brownian motion on $\mathbb{R}^d$ and denote by $\mathcal{F}_t$ its natural filtration. In the sequel, the following variational representation plays a fundamental role:
$$\mathrm{D}\left(\mu||\gamma\right) = \min_{u_t} \mathbb{E}\left[\frac{1}{2}\int_0^1 \|u_t\|_2^2\, dt\right], \tag{4}$$
where the minimum is taken with respect to all processes $u_t$ adapted to $\mathcal{F}_t$, such that $B_1 + \int_0^1 u_t\, dt$ has the law of $\mu$. Amazingly, under mild assumptions on $\mu$, and in particular in the case that $\mu$ is log-concave, there exists a unique minimizer to Equation (4), from which we construct the process
$$X_t := B_t + \int_0^t v_s^X\, ds,$$
also known as the Föllmer process, with $v_t^X$ being the associated Föllmer drift. We refer the reader to [15] for proofs of the existence and uniqueness of the process, as well as of a few other facts summarized below.
It turns out that the process $v_t^X$ is a martingale (which goes together with the fact that it minimizes a quadratic form), given by the equation
$$v_t^X = \nabla \ln P_{1-t} f_X(X_t), \tag{5}$$
where $f_X$ is the density of $X$ with respect to the standard Gaussian and $P_{1-t}$ denotes the heat semi-group. In fact, Girsanov's formula gives a very useful relation between the energy of the drift and the entropy of $X$, namely,
$$\mathrm{D}\left(X||G\right) = \frac{1}{2}\mathbb{E}\left[\int_0^1 \|v_t^X\|_2^2\, dt\right]. \tag{6}$$
This gives the following alternative interpretation of the process: suppose that the Wiener space is equipped with an underlying probability measure $P$, with respect to which the process $B_t$ is a Brownian motion as above. Let $Q$ be the measure on Wiener space defined by $\frac{dQ}{dP} = f_X(X_1)^{-1}$; then the process $X_t$ is a Brownian motion with respect to the measure $Q$. By the representation theorem for the Brownian bridge, this tells us that the process $X_t$ conditioned on $X_1$ is a Brownian bridge between $0$ and $X_1$. In particular, we have
$$X_t \stackrel{law}{=} tX_1 + \sqrt{t(1-t)}\, G, \tag{7}$$
with $G$ a standard Gaussian, independent of $X_1$.

Lehec's proof of the Shannon-Stam inequality. For the sake of intuition, we now repeat Lehec's argument to reproduce the Shannon-Stam inequality (3) using this process. Let
$$X_t := B_t^X + \int_0^t v_s^X\, ds \quad \text{and} \quad Y_t := B_t^Y + \int_0^t v_s^Y\, ds$$
be the Föllmer processes associated to $X$ and $Y$, where $B_t^X$ and $B_t^Y$ are independent Brownian motions. For $\lambda \in (0,1)$, define the new processes
$$\tilde{B}_t := \sqrt{\lambda}B_t^X + \sqrt{1-\lambda}B_t^Y \quad \text{and} \quad w_t := \sqrt{\lambda}v_t^X + \sqrt{1-\lambda}v_t^Y.$$
By the independence of $B_t^X$ and $B_t^Y$, $\tilde{B}_t$ is a Brownian motion and
$$\sqrt{\lambda}X_1 + \sqrt{1-\lambda}Y_1 = \tilde{B}_1 + \int_0^1 w_t\, dt.$$
Note that, as $v_t^X$ is a martingale, we have for every $t \in [0,1]$,
$$\mathbb{E}\left[v_t^X\right] = \mathbb{E}\left[v_0^X\right] = \mathbb{E}[X] = 0,$$
and likewise for $v_t^Y$. Using equations (4) and (6) and recalling that the processes are independent, so that the cross terms $\mathbb{E}\left[\langle v_t^X, v_t^Y\rangle\right] = \langle\mathbb{E}\left[v_t^X\right], \mathbb{E}\left[v_t^Y\right]\rangle$ vanish, we finally have
$$\mathrm{D}\left(\sqrt{\lambda}X + \sqrt{1-\lambda}Y\,||\,G\right) \leq \frac{1}{2}\mathbb{E}\left[\int_0^1 \|w_t\|_2^2\, dt\right] = \lambda\, \mathrm{D}\left(X||G\right) + (1-\lambda)\, \mathrm{D}\left(Y||G\right).$$

This recovers the Shannon-Stam inequality in the form (3).
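Before moving on, the identities above can be checked numerically in a simple case (this is our own sanity check, not part of the argument). For the one-dimensional target $N(0, s^2)$ the density with respect to the Gaussian is itself Gaussian, and computing $\nabla \ln P_{1-t}f_X$ by a Gaussian integral gives the closed-form drift $v_t(x) = \frac{(s^2-1)x}{1 + (s^2-1)t}$, an assumption the sketch relies on. An Euler-Maruyama discretization then lets us verify both that $X_1$ has the target law and that half the expected drift energy matches $\mathrm{D}(N(0,s^2)||G) = \frac{1}{2}(s^2 - 1 - \ln s^2)$, as the Girsanov identity predicts.

```python
import numpy as np

# Euler-Maruyama simulation of the Foellmer process for the target N(0, s2),
# using the closed-form drift v_t(x) = (s2 - 1) x / (1 + (s2 - 1) t).
rng = np.random.default_rng(2)
s2 = 2.0                        # target variance
n_paths, n_steps = 50_000, 400
dt = 1.0 / n_steps

x = np.zeros(n_paths)           # X_0 = 0
energy = np.zeros(n_paths)      # accumulates int_0^1 |v_t|^2 dt per path
for i in range(n_steps):
    t = i * dt
    v = (s2 - 1.0) * x / (1.0 + (s2 - 1.0) * t)   # Foellmer drift
    energy += v**2 * dt
    x += v * dt + np.sqrt(dt) * rng.standard_normal(n_paths)

# X_1 should be approximately N(0, s2) ...
print(x.var())
# ... and 0.5 * E int |v_t|^2 dt should match the relative entropy
d_exact = 0.5 * (s2 - 1.0 - np.log(s2))
print(0.5 * energy.mean(), d_exact)
```
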
An alternative point of view: replacing the drift by a varying diffusion coefficient. Lehec's proof gives rise to the following idea: suppose the processes $v_t^X$ and $v_t^Y$ could be coupled in a way such that the variance of the resulting process $\sqrt{\lambda}v_t^X + \sqrt{1-\lambda}v_t^Y$ were smaller than that of $w_t$ above. Such a coupling would improve on (3), and that is the starting point of this work.
As it turns out, however, it is easier to get tractable bounds by working with a slightly different interpretation of the above processes, in which the role of the drift is taken by an adapted diffusion coefficient of a related process.
The idea is as follows: suppose that $M_t := \int_0^t F_s\, dB_s$ is a martingale, where $F_t$ is some positive-definite matrix-valued process adapted to $\mathcal{F}_t$. Consider the drift defined by
$$u_t := \int_0^t \frac{F_s - I_d}{1-s}\, dB_s. \tag{8}$$
We then claim that $B_1 + \int_0^1 u_t\, dt = M_1$. To show this, we use the stochastic Fubini theorem:
$$\int_0^1 u_t\, dt = \int_0^1 \int_0^t \frac{F_s - I_d}{1-s}\, dB_s\, dt = \int_0^1 \left(\int_s^1 dt\right) \frac{F_s - I_d}{1-s}\, dB_s = \int_0^1 \left(F_s - I_d\right) dB_s = M_1 - B_1.$$
Since we have now expressed the random variable $M_1$ as the terminal point of a standard Brownian motion with an adapted drift, the minimality property of the Föllmer drift together with equation (6) immediately produces a bound on its entropy. Namely, by using Itô's isometry and Fubini's theorem we have the bound
$$\mathrm{D}\left(M_1||G\right) \leq \frac{1}{2}\mathbb{E}\left[\int_0^1 \|u_t\|_2^2\, dt\right] = \frac{1}{2}\int_0^1 \frac{\mathbb{E}\left[\|F_t - I_d\|_{HS}^2\right]}{1-t}\, dt. \tag{9}$$
This hints at the following possible scheme of proof: in order to give an upper bound for the expression $\mathrm{D}\left(\sqrt{\lambda}X_1 + \sqrt{1-\lambda}Y_1||G\right)$, it suffices to find martingales $M_t^X$ and $M_t^Y$ such that $M_1^X, M_1^Y$ have the laws of $X$ and $Y$, respectively, and such that the $\lambda$-average of the covariance processes is close to the identity.
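The claim $B_1 + \int_0^1 u_t\, dt = M_1$ can be verified pathwise on a toy example (our choice of coefficient, not the paper's). Writing the drift as $u_t = \int_0^t \frac{F_s - 1}{1-s}\, dB_s$, as in the stochastic Fubini computation, and taking the deterministic scalar $F_s = 2 - s$ makes the kernel identically $1$, so $u_t = B_t$ and the identity reads $B_1 + \int_0^1 B_t\, dt = \int_0^1 (2-s)\, dB_s$.

```python
import numpy as np

# Pathwise check of B_1 + int_0^1 u_t dt = M_1 for F_s = 2 - s, where the
# kernel (F_s - 1)/(1 - s) = 1 and hence u_t = B_t.
rng = np.random.default_rng(3)
n_paths, n_steps = 1000, 2000
dt = 1.0 / n_steps
s = np.arange(n_steps) * dt                      # left endpoints s_i

dB = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
B = np.cumsum(dB, axis=1)                        # B at right endpoints

B1 = B[:, -1]
B_left = np.concatenate([np.zeros((n_paths, 1)), B[:, :-1]], axis=1)
int_u_dt = (B_left * dt).sum(axis=1)             # Riemann sum of int B_t dt
M1 = ((2.0 - s) * dB).sum(axis=1)                # Ito sum for int (2-s) dB

err = np.abs(B1 + int_u_dt - M1)
print(err.max())                                 # discretization error only
```

With left-endpoint sums the residual is exactly $\Delta t\,|B_1|$ per path, so it vanishes as the mesh is refined.
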
The Föllmer process gives rise to a natural martingale: consider $\mathbb{E}\left[X_1|\mathcal{F}_t\right]$, the associated Doob martingale. By the martingale representation theorem ([16, Theorem 4.3.3]) there exists a uniquely defined adapted matrix-valued process $\Gamma_t^X$, for which
$$\mathbb{E}\left[X_1|\mathcal{F}_t\right] = \mathbb{E}\left[X_1\right] + \int_0^t \Gamma_s^X\, dB_s. \tag{10}$$
By following the construction in (8), we may consider the process
$$\tilde{v}_t^X := \int_0^t \frac{\Gamma_s^X - I_d}{1-s}\, dB_s.$$
Observe that $v_t^X - \tilde{v}_t^X$ is a martingale and that for every $t$,
$$\mathbb{E}\left[\int_t^1 \left(v_s^X - \tilde{v}_s^X\right) ds \,\Big|\, \mathcal{F}_t\right] = 0, \text{ almost surely.}$$
It thus follows that $v_t^X$ and $\tilde{v}_t^X$ are almost surely the same process. We conclude the following representation for the Föllmer drift,
$$v_t^X = \int_0^t \frac{\Gamma_s^X - I_d}{1-s}\, dB_s. \tag{11}$$
The matrix $\Gamma_t^X$ turns out to be positive definite almost surely (in fact, it has an explicit simple representation, see Proposition 1 below), which yields, by combining (6) with the same calculation as in (9),
$$\mathrm{D}\left(X||G\right) = \frac{1}{2}\int_0^1 \frac{\mathbb{E}\left[\|\Gamma_t^X - I_d\|_{HS}^2\right]}{1-t}\, dt. \tag{12}$$
Given the processes $\Gamma_t^X$ and $\Gamma_t^Y$, we are now in position to express $\sqrt{\lambda}X + \sqrt{1-\lambda}Y$ as the terminal point of a martingale, towards using (9), which would lead to a bound on $\delta_{EPI,\lambda}$. We define
$$\tilde{\Gamma}_t := \left(\lambda \left(\Gamma_t^X\right)^2 + (1-\lambda)\left(\Gamma_t^Y\right)^2\right)^{1/2}$$
and a martingale $\tilde{B}_t$ which satisfies
$$d\tilde{B}_t = \tilde{\Gamma}_t^{-1}\left(\sqrt{\lambda}\,\Gamma_t^X\, dB_t^X + \sqrt{1-\lambda}\,\Gamma_t^Y\, dB_t^Y\right).$$
Since $\Gamma_t^X$ and $\Gamma_t^Y$ are invertible almost surely and independent, it holds that
$$d[\tilde{B}]_t = \tilde{\Gamma}_t^{-1}\left(\lambda\left(\Gamma_t^X\right)^2 + (1-\lambda)\left(\Gamma_t^Y\right)^2\right)\tilde{\Gamma}_t^{-1}\, dt = I_d\, dt,$$
where $[\tilde{B}]_t$ denotes the quadratic co-variation of $\tilde{B}_t$. Thus, by Levy's characterization, $\tilde{B}_t$ is a standard Brownian motion and we have the following equality in law:
$$\sqrt{\lambda}X + \sqrt{1-\lambda}Y \stackrel{law}{=} \int_0^1 \tilde{\Gamma}_t\, d\tilde{B}_t.$$
We can now invoke (9) to get
$$\mathrm{D}\left(\sqrt{\lambda}X + \sqrt{1-\lambda}Y\,||\,G\right) \leq \frac{1}{2}\int_0^1 \frac{\mathbb{E}\left[\|\tilde{\Gamma}_t - I_d\|_{HS}^2\right]}{1-t}\, dt.$$
Combining this with the identity (12) finally gives a bound on the deficit in the Shannon-Stam inequality, in the form
$$\delta_{EPI,\lambda}(X,Y) \geq \frac{1}{2}\int_0^1 \frac{\lambda\,\mathbb{E}\left[\|\Gamma_t^X - I_d\|_{HS}^2\right] + (1-\lambda)\,\mathbb{E}\left[\|\Gamma_t^Y - I_d\|_{HS}^2\right] - \mathbb{E}\left[\|\tilde{\Gamma}_t - I_d\|_{HS}^2\right]}{1-t}\, dt. \tag{13}$$
The following technical lemma will allow us to give a lower bound for the right-hand side in terms of the variances of the processes $\Gamma_t^X, \Gamma_t^Y$. Its proof is postponed to the end of the section. Combining the lemma with the estimate obtained in (13) produces the following result, which will be our main tool in studying $\delta_{EPI,\lambda}$.

Lemma 2. Let $X$ and $Y$ be centered random vectors on $\mathbb{R}^d$ with finite second moment, and let $\Gamma_t^X, \Gamma_t^Y$ be defined as above. Then,
The expression on the right-hand side of (14) may seem unwieldy; however, in many cases it can be simplified.
For example, if it can be shown that, almost surely, $\Gamma_t^X, \Gamma_t^Y \succeq c_t I_d$ for some deterministic $c_t > 0$, then we obtain a more tractable inequality. As we will show, this is the case when the random vectors are log-concave.
Proof of Lemma 1. The proof is a direct computation. The key observation is that the trace of a product of three symmetric matrices is invariant under any permutation of the factors, which yields the claim, as required.
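For completeness, the trace identity invoked in the proof can be spelled out: cyclic invariance combined with invariance under transposition yields invariance under all six orderings.

```latex
% For symmetric matrices A, B, C, cyclic invariance of the trace gives
\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB),
% while taking transposes (using A^{T} = A, B^{T} = B, C^{T} = C) gives
\operatorname{tr}(ABC) = \operatorname{tr}\left((ABC)^{T}\right)
  = \operatorname{tr}\left(C^{T}B^{T}A^{T}\right) = \operatorname{tr}(CBA).
% Together, tr(ABC) is invariant under every permutation of A, B, C.
```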

The Föllmer process associated to log-concave random vectors
In this section, we collect several results pertaining to the Föllmer process. Throughout the section, we fix a random vector $X$ in $\mathbb{R}^d$ and associate to it the Föllmer process $X_t$, defined in the previous section, as well as the process $\Gamma_t^X$, defined in equation (10) above. The next result lists some of its basic properties; we refer to [7,8] for proofs.
Proposition 1. Define the random function $f_t^X$, where $f_X$ is the density of $X$ with respect to the standard Gaussian and $Z_{t,X}$ is a normalizing constant, so that:
• $f_t^X$ is the density of the random measure $\mu_t := \mathrm{law}\left(X_1|\mathcal{F}_t\right)$ with respect to the standard Gaussian, and $\Gamma_t^X = \frac{\mathrm{Cov}(\mu_t)}{1-t}$.
• $\Gamma_t^X$ is almost surely a positive definite matrix; in particular, it is invertible.
• For all $t \in (0,1)$, we have
• The following identity holds
In what follows, we restrict ourselves to the case that $X$ is log-concave. Using this assumption, we will establish several important properties of the matrix $\Gamma_t$. For simplicity, we will write $\Gamma_t := \Gamma_t^X$ and $v_t := v_t^X$. The next result shows that the matrix $\Gamma_t$ is bounded almost surely.

Lemma 3.
Suppose that $X$ is log-concave; then for every $t \in (0,1)$,
$$\Gamma_t \preceq \frac{1}{t} I_d.$$
Moreover, if for some $\xi > 0$, $X$ is $\xi$-uniformly log-concave, then
$$\Gamma_t \preceq \frac{1}{\xi(1-t) + t} I_d.$$
Proof. By Proposition 1, $\mu_t$, the law of $X_1|\mathcal{F}_t$, has a density $\rho_t$ with respect to the Lebesgue measure, proportional to
$$x \mapsto f_X(x) \exp\left(-\frac{\|X_t - tx\|_2^2}{2t(1-t)}\right),$$
where, in this proof, $f_X$ denotes the density of $X$ with respect to the Lebesgue measure. Consequently, since $-\nabla^2 \ln f_X \succeq 0$,
$$-\nabla^2 \ln \rho_t \succeq \frac{t}{1-t} I_d.$$
It follows that, almost surely, $\mu_t$ is $\frac{t}{1-t}$-uniformly log-concave. According to the Brascamp-Lieb inequality ([3]), $\alpha$-uniform log-concavity implies a spectral gap of $\alpha$, and in particular
$$\mathrm{Cov}(\mu_t) \preceq \frac{1-t}{t} I_d, \quad \text{and so} \quad \Gamma_t = \frac{\mathrm{Cov}(\mu_t)}{1-t} \preceq \frac{1}{t} I_d.$$
If, in addition, $X$ is $\xi$-uniformly log-concave, so that $-\nabla^2 \ln f_X \succeq \xi I_d$, then we may write
$$-\nabla^2 \ln \rho_t \succeq \left(\xi + \frac{t}{1-t}\right) I_d,$$
and the arguments given above show $\mathrm{Cov}(\mu_t) \preceq \frac{1-t}{\xi(1-t)+t} I_d$, which gives the second claim.

Our next goal is to use the formulas given in the above lemma in order to bound the expectation of $\Gamma_t$ from below. We begin with a simple corollary.

Corollary 10.
Suppose that $X$ is 1-uniformly log-concave; then for every $t \in [0,1]$,
By Lemma 3, $\Gamma_t \preceq I_d$, which shows
Thus, for every $t$,
To produce similar bounds for general log-concave random vectors, we require more intricate arguments. Recall that $C_p(X)$ denotes the Poincaré constant of $X$.

Lemma 4. If $X$ is centered and has a finite Poincaré constant,
Proof. Recall that, by equation (7), $X_t$ has the same law as $tX_1 + \sqrt{t(1-t)}\,G$, where $G$ is a standard Gaussian, independent of $X_1$. Since $C_p(tX_1) = t^2 C_p(X)$ and since the Poincaré constant is sub-additive with respect to convolution ([4]), we get
$$C_p(X_t) \leq t^2 C_p(X) + t(1-t).$$
The drift $v_t$ is a function of $X_t$ and $\mathbb{E}\left[v_t\right] = 0$. Equation (5) implies that $\nabla_x v_t(X_t)$ is a symmetric matrix; hence the Poincaré inequality yields
As $v_t(X_t)$ is a martingale, by Itô's lemma we have
An application of Itô's isometry then shows
where we have again used the fact that $\nabla_x v_t(X_t)$ is symmetric.
Using the last lemma, we can deduce lower bounds on the matrix Γ X t in terms of the Poincaré constant.
Corollary 11. Suppose that $X$ is log-concave and that $\sigma^2$ is the minimal eigenvalue of $\mathrm{Cov}(X)$. Then:
• For every $t \in \left[0, \frac{1}{2C_p(X)}\right]$,
• For every $t \in \left[\frac{1}{2C_p(X)}, 1\right]$,
Proof. Using Equation (11), Itô's isometry and the fact that $\Gamma_t$ is symmetric, we deduce that
Combining this with equation (17) and using Lemma 4, we get
In the case where $X$ is log-concave, by Lemma 3, $\Gamma_t \preceq \frac{1}{t} I_d$. The above inequality then becomes
Rearranging the inequality shows
As long as $t \leq \frac{1}{2C_p(X)}$, this gives the first bound. By (10), we also have the bound
The differential equation has a unique solution, given by
Using Grönwall's inequality, we conclude that for every $t \in \left[\frac{1}{2C_p(X)}, 1\right]$,
We conclude this section with a comparison lemma that will allow us to control the values of $\mathbb{E}\left[\|v_t\|_2^2\right]$.

Lemma 5. Let $t_0 \in [0,1]$ and suppose that $X$ is centered with a finite Poincaré constant $C_p(X) < \infty$. Then
and an analogous estimate also holds for $Y$. We may now use $\mathbb{E}\left[\Gamma_t^X\right]$ and $\mathbb{E}\left[\Gamma_t^Y\right]$ as the diffusion coefficients for the same Brownian motion to establish
Plugging these estimates into (19) reproves the following bound, which is identical to Theorem 1 in [6].
Theorem 12. Let $X$ and $Y$ be 1-uniformly log-concave centered vectors and let $G_X, G_Y$ be defined as above. Then,
To obtain a bound for the relative entropy, towards the proof of Theorem 1, we will require a slightly more general version of inequality (9). This is the content of the next lemma, whose proof is similar to the argument presented above. The main difference comes from applying Girsanov's theorem to a re-scaled Brownian motion, from which we obtain an expression analogous to (6). The reader is referred to [8, Lemma 2] for a complete proof.

Lemma 6. Let $F_t$ and $E_t$ be two $\mathcal{F}_t$-adapted matrix-valued processes and let $X_t, M_t$ be two processes defined by
Suppose that for every $t \in [0,1]$, $E_t \succeq cI_d$ for some deterministic $c > 0$. Then
Proof of Theorem 1. By Corollary 10,
We invoke Lemma 6 with $E_t = \mathbb{E}\left[\Gamma_t^X\right]$ and $F_t = \Gamma_t^X$ to obtain
Repeating the same argument for $Y$ gives
By invoking Lemma 6 with $F_t = \mathbb{E}\left[\Gamma_t^X\right]$ and $E_t = \mathbb{E}\left[\Gamma_t^Y\right]$, and then one more time after switching between $F_t$ and $E_t$, and summing the results, we get
Plugging the above inequalities into (19) concludes the proof.

Stability for general log-concave random vectors
Fix $X, Y$, centered log-concave random vectors in $\mathbb{R}^d$, such that
Using the relation in (11), Fubini's theorem shows
which contradicts the identity (6) and concludes the proof by contradiction.
As the same reasoning also applies to $Y$, we now choose $t_0 = \frac{\xi}{2}$, which allows us to invoke the previous lemma in (24) and to establish:
We are finally ready to prove the main theorem.

Stability for low entropy log concave measures
In this section we focus on the case where $X$ and $Y$ are log-concave and isotropic. Similarly to the previous section, we set $\xi_X = \frac{1}{3(2C_p(X)+1)}$, so that by Corollary 11,
Towards the proof of Theorem 6, we first need an analogue of Lemma 7, for which we sketch the proof here.