1 Introduction

One of the basic tools in the study of functional inequalities in Euclidean spaces is the use of Lipschitz maps \(T:\mathbb {R}^d\rightarrow \mathbb {R}^d\) [20, 37]. A good starting point for this discussion is Caffarelli’s contraction theorem [12] (see also [19, 29, 41, 62] for other proofs): If \(\gamma _d\) is the standard Gaussian measure on \(\mathbb {R}^d\) and p is a probability measure on \(\mathbb {R}^d\) which is more log-concave than \(\gamma _d\), then the optimal transport map of Brenier \(T:\mathbb {R}^d\rightarrow \mathbb {R}^d\), which pushes forward \(\gamma _d\) to p, is 1-Lipschitz. In other words, that p is more log-concave than \(\gamma _d\) is manifested by the contractive properties of the transport map T. With the existence of T in hand, we can easily transfer to p functional inequalities which are known to be true for \(\gamma _d\). For example, the Poincaré inequality states that for \(\eta :\mathbb {R}^d\rightarrow \mathbb {R}\) we have

$$\begin{aligned} \text {Var}_{\gamma _d}[\eta ]\le \int |\nabla \eta |^2d\gamma _d. \end{aligned}$$

As T is 1-Lipschitz, its derivative is bounded, \(|DT|_{\text {op}}\le 1\), so

$$\begin{aligned} \text {Var}_{p}[\eta ]=\text {Var}_{\gamma _d}[\eta \circ T]\le \int |\nabla (\eta \circ T)|^2d\gamma _d\le \int |DT|_{\text {op}}^2\,(|\nabla \eta |\circ T)^2d\gamma _d\le \int |\nabla \eta |^2dp \end{aligned}$$
(1.1)

where we used that p is the pushforward of \(\gamma _d\) under T. We see that p satisfies the Poincaré inequality with the same constant as \(\gamma _d\).
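
To make (1.1) concrete, here is a minimal numerical sketch (our illustration, not part of the argument): we push \(\gamma _1\) forward through the 1-Lipschitz map \(T(x)=\frac{x+\sin x}{2}\), whose derivative \(\frac{1+\cos x}{2}\) lies in [0, 1], and check by Monte Carlo that the Poincaré ratio for a test function stays below 1.

```python
import numpy as np

# Sketch: empirical check of the transfer principle (1.1) in d = 1.
# T(x) = (x + sin x)/2 is 1-Lipschitz, so p = T_* gamma_1 should satisfy
# Var_p[eta] <= E_p[|eta'|^2] for smooth eta; we test eta(x) = x^2.
rng = np.random.default_rng(0)
g = rng.standard_normal(1_000_000)   # samples from gamma_1
x = 0.5 * (g + np.sin(g))            # samples from p = T_* gamma_1

var_eta = (x**2).var()               # Var_p[eta]
energy = ((2 * x) ** 2).mean()       # E_p[|eta'|^2] with eta'(x) = 2x
print(f"Var_p[eta] = {var_eta:.3f} <= E_p[|grad eta|^2] = {energy:.3f}")
```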

This paper starts with the observation that since the Gaussian measure is infinite-dimensional in nature, the search for contractive transport maps from the Gaussian measure to some target measure should not be confined to Euclidean spaces, even if the target measure is a measure on \(\mathbb {R}^d\). Specifically, we will take our source measure to be the Wiener measure (an infinite-dimensional Gaussian measure) which will allow us to take advantage of the Malliavin and stochastic calculus of the Wiener space. Given a target measure p on \(\mathbb {R}^d\), our construction relies on the Föllmer process: The solution \(X=(X_t)\) to the stochastic differential equation

$$\begin{aligned} dX_t=\nabla \log P_{1-t}\left( \frac{dp}{d\gamma _d}\right) (X_t)dt+dB_t, \quad t\in [0,1], \quad X_0=0 \end{aligned}$$
(1.2)

where \((B_t)\) is the standard Brownian motion in \(\mathbb {R}^d\) and \((P_t)\) is the heat semigroup. This process can be seen as Brownian motion conditioned on being distributed like p at time 1; i.e., \(X_1\sim p\). Hence, we view the solution to (1.2) at time 1 as a transport map \(X_1:\Omega \rightarrow \mathbb {R}^d\), which we call the Brownian transport map, pushing forward the Wiener measure \(\gamma \) on the Wiener space \(\Omega \) to the target measure p on \(\mathbb {R}^d\).

In the remainder of the introduction we present our results on the contractive properties of the Brownian transport map, as well as applications to functional inequalities and to central limit theorems. We also study the behavior of the Brownian transport map when considered as a map from the Wiener space to itself. This point of view further elucidates the connection between our results and optimal transport theory.

1.1 Almost-sure contraction

Before presenting our first result we discuss the types of measures for which it is reasonable to expect that the Brownian transport map will be an almost-sure contraction (the reader is referred to Sect. 2 for the exact definition). The rough intuition is that if the measure \(\gamma \) is squeezed into a more concentrated measure then the transport map should be a contraction. We focus on several mechanisms which in principle could facilitate such contractions. The first mechanism works by requiring that \(S:=\text {diam}({\textrm{supp}}(p))\) is finite so that the entire mass of p is confined to a bounded region. The second mechanism, inspired by Caffarelli’s result, works by imposing convexity assumptions on p: We say that p is \(\kappa \)-log-concave for some \(\kappa \in \mathbb {R}\) if

$$\begin{aligned} -\nabla ^2\log \left( \frac{dp}{dx}\right) (x) \succeq \kappa \text {Id}_d \quad \forall \,x\in {\textrm{supp}}(p). \end{aligned}$$

Note that we allow \(\kappa \) to take negative values and that the case \(\kappa =0\) corresponds to p being log-concave. When \(\kappa \ge 1\) we see that p is more log-concave than \(\gamma _d\) (\(\kappa =1\) when \(p=\gamma _d\) as \(-\nabla ^2\log \left( \frac{d\gamma _d}{dx}\right) =\text {Id}_d\)) so in that sense p is more concentrated than \(\gamma _d\) and we expect some type of contraction.

The following result shows that the Brownian transport map is an almost-sure contraction when the target measure satisfies either a convexity assumption or a finite-volume of support assumption. For example, as will be clear from the subsequent discussion, Theorem 1.1 always improves on the analogous result of Caffarelli which states that when p is \(\kappa \)-log-concave, for \(\kappa >0\), the optimal transport map is \(\frac{1}{\sqrt{\kappa }}\)-Lipschitz. In the remainder of the paper we refer to \(\ell \)-Lipschitz maps as contractions with constant \(\ell \).

Theorem 1.1

(Almost-sure contraction) Let p be a \(\kappa \)-log-concave measure for some \(\kappa \in \mathbb {R}\) and let \(S:=\text {diam}({\textrm{supp}}(p))\).

  (i)

    If \(\kappa S^2\ge 1\) then the Brownian transport map between \(\gamma \) and p is an almost-sure contraction with constant \(\frac{1}{\sqrt{\kappa }}\).

  (ii)

    If \(\kappa S^2<1\) then the Brownian transport map between \(\gamma \) and p is an almost-sure contraction with constant \(\left( \frac{e^{1-\kappa S^2}+1}{2}\right) ^{1/2}S\).

To unpack Theorem 1.1 let us consider some of its important special cases.

  • \(S<\infty \) and \(\kappa =0\). This setting corresponds to the case where p is log-concave with bounded convex support. It is an open question [41, Problem 4.3] whether the optimal transport map of Brenier between \(\gamma _d\) and p is a contraction with a dimension-free constant. On the other hand, Theorem 1.1 shows that the Brownian transport map between \(\gamma \) and p is in fact an almost-sure contraction with a dimension-free constant O(S), which is the optimal dependence on S.

  • \(\kappa >0\). If \(\kappa S^2\ge 1\) then we obtain the exact analogue of Caffarelli’s result for the optimal transport map [41, Theorem 2.2]. If \(\kappa S^2<1\) then part (ii) shows that the Brownian transport map is an almost-sure contraction with constant \(\left( \frac{e^{1-\kappa S^2}+1}{2}\right) ^{1/2}S\le \frac{1}{\sqrt{\kappa }}\); the last inequality holds as \(\kappa S^2<1\) and by the estimate \(1-\frac{1}{x}\le \log x\) (see the computation following this list). Thus, we get an improvement on the analogue of Caffarelli’s result.

  • \(S<\infty \) and \(\kappa <0\). In this setting, only part (ii) applies and we see that the Brownian transport map is an almost-sure contraction with constant \(\left( \frac{e^{1+|\kappa | S^2}+1}{2}\right) ^{1/2}S\). There are no analogous results for other transport maps.

  • \(S=\infty \) and \(\kappa \le 0\). The bounds provided by Theorem 1.1 are trivial in this case. This is unavoidable, as we explain in Sect. 1.3.
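
For completeness, here is the elementary computation behind the inequality \(\left( \frac{e^{1-\kappa S^2}+1}{2}\right) ^{1/2}S\le \frac{1}{\sqrt{\kappa }}\) referenced in the second bullet point. Writing \(u:=\kappa S^2\in (0,1)\), the inequality is equivalent to \(u(e^{1-u}+1)\le 2\), i.e., to \(1-u\le \log \left( \frac{2-u}{u}\right) \), and the estimate \(1-\frac{1}{x}\le \log x\) applied with \(x=\frac{2-u}{u}\) gives

$$\begin{aligned} \log \left( \frac{2-u}{u}\right) \ge 1-\frac{u}{2-u}=\frac{2(1-u)}{2-u}\ge 1-u, \end{aligned}$$

where the last inequality holds since \(\frac{2}{2-u}\ge 1\).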

Remark 1.2

The distinction between \(\kappa S^2\ge 1\) and \(\kappa S^2<1\) is not necessary and one could get more refined results; see Remark 3.5. The formulation of Theorem 1.1, however, is the cleanest, which is why we chose it.

Theorem 1.1 goes beyond the above examples by capturing the effect of the interplay of convexity (including \(\kappa <0\)) and support size on the contraction properties of the Brownian transport map.

The reason we can prove the results in Theorem 1.1, which are unknown for the Brenier map, is that the Malliavin calculus available on the Wiener space allows us to write a differential equation for the derivative of the Brownian transport map, which in turn shows that it is a contraction. This feature has no analogue for optimal transport maps (but see [37, equation (1.8)] for a different transport map). Moreover, as will be shown in Sect. 8, replacing the Brownian transport map by the optimal transport map on the Wiener space (see Sect. 1.5 for more details) is not possible, since the optimal transport map on Wiener space essentially reduces to the optimal transport map between \(\gamma _d\) and p, for which the desired contraction properties are not known.

In our second result we identify a third mechanism which promotes the existence of contractive transport maps. In essence, the idea is that well-behaved mixtures of Gaussians have tame concentration profiles. Indeed, in the case where the mixing measure has bounded support we establish that the Brownian transport map is a contraction.

Theorem 1.3

(Gaussian mixtures) Let \(p:=\gamma _d\star \nu \) be the convolution of the standard Gaussian measure \(\gamma _d\) with a probability measure \(\nu \) on \(\mathbb {R}^d\) supported on a ball of radius R. Then, the Brownian transport map between \(\gamma \) and p is an almost-sure contraction with constant \(\left( \frac{e^{2R^2}-1}{2}\right) ^{1/2}\frac{1}{R}\).

In the one-dimensional case, an analogue of Theorem 1.3 was established in [67] for the Brenier map; the proof relied on the explicit expression for transport maps between measures on the real line. While it is unknown whether the Brenier map enjoys similar properties in higher dimensions, our analysis of the Brownian transport map yields Theorem 1.3, an extensive generalization to arbitrary dimensions. As a corollary we are able to deduce several new functional inequalities, as well as improve upon existing ones, for Gaussian mixtures (see Sect. 1.2 below).
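
As a sanity check on the constant in Theorem 1.3 (an observation we add for orientation, not part of the statement), a Taylor expansion gives

$$\begin{aligned} \frac{e^{2R^2}-1}{2R^2}=1+R^2+O(R^4)\quad \text {as }R\rightarrow 0, \end{aligned}$$

so the contraction constant tends to 1 as \(R\rightarrow 0\), consistent with the fact that \(p\rightarrow \gamma _d\) in this limit; for large R the constant grows like \(e^{R^2}/(\sqrt{2}R)\).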

1.2 Functional inequalities

Once Theorems 1.1 and 1.3 are established, the generality of the transport method towards functional inequalities opens the door to the improvement of numerous functional inequalities. Section 5 is dedicated to proving such results, which include the isoperimetric inequality, \(\Psi \)-log-Sobolev inequalities (a generalization of the log-Sobolev inequality), and q-Poincaré inequalities (a generalization of the Poincaré inequality).

In Table 1 we summarize our results and note the ones which seem to be new. The definitions and exact statements are deferred to Sect. 5. As can be seen from the table, for log-concave measures some of the results are not new. However, let us note that the proofs of the mentioned results, obtained gradually over the last several decades, utilized a myriad of different techniques. These techniques include, among others, localization methods, Bakry-Émery calculus, and Brunn-Minkowski theory, and often require ad-hoc arguments for the specific functional inequality in question. In contrast, our transportation approach provides a unifying framework to study such functional inequalities. As a result we are also able to obtain new, previously unknown, results such as \(\Psi \)-log-Sobolev and q-Poincaré inequalities for log-concave measures with bounded support. While it is likely that one could use other techniques to prove comparable results, the benefit of our approach is that no further arguments are needed, other than Theorem 1.1 and arguments similar to the one outlined in (1.1).

Table 1 Summary of functional inequalities obtained from Theorems 1.1 and 1.3. For results which were previously known we supply references; otherwise the relevant theorem is noted

The bottom row of Table 1 deals with Gaussian mixtures. The question of the existence of functional inequalities for a mixture of distributions, given the corresponding inequalities for the individual components, has been investigated for some time. Only recently has it been settled for the Poincaré and log-Sobolev inequalities [17, Theorem 1]. The result of [17] is very general and applies to many families of mixture distributions but, on the other hand, the method of proof seems to be specialized to the Poincaré and log-Sobolev inequalities. Here, the generality of the transport method allows us to tackle inequalities which seem to lie outside the scope of previous methods. In addition, the generality of the method of [17] means that it misses the special nature of Gaussian mixtures. Indeed, our results improve on [17, Corollaries 1, 2].

1.3 Log-concave measures

As we saw in Sect. 1.2, measures which are \(\kappa \)-log-concave (with \(\kappa >0\)) satisfy a Poincaré inequality with constant \(\kappa ^{-1}\) (which in particular does not depend on the dimension d). When \(\kappa =0\), this constant blows up, which leaves open the question of the existence of a Poincaré inequality for log-concave measures. The Kannan–Lovász–Simonovits conjecture [36], in one of its formulations, states that there exists a constant \(C_{\text {kls}}\), which does not depend on the dimension d, such that for any isotropic (i.e., centered with covariance equal to the identity matrix) log-concave measure p on \(\mathbb {R}^d\) we have

$$\begin{aligned} \text {Var}_{p}[\eta ]\le C_{\text {kls}}\int |\nabla \eta |^2dp\quad \text {for }\eta :\mathbb {R}^d\rightarrow \mathbb {R}. \end{aligned}$$

In words, any isotropic log-concave measure on \(\mathbb {R}^d\) satisfies a Poincaré inequality with a constant \(C_{\text {kls}}\) which is dimension-free. In light of the above discussion, transport maps offer a natural route to proving the conjecture: Given an isotropic log-concave measure p on \(\mathbb {R}^d\) we would like to construct a transport map, from \(\gamma _d\) or \(\gamma \), to p which is an almost-sure contraction with constant \(\sqrt{C_{\text {kls}}}\). Unfortunately, in general, such a map cannot exist: Indeed, as seen in Table 1, the existence of such a map would imply that p satisfies a log-Sobolev inequality with a dimension-free constant. But this is known to be false because a log-Sobolev inequality implies sub-Gaussian concentration [46, Theorem 5.3], which does not hold for the two-sided exponential measure even though it is isotropic and log-concave. Nonetheless, the transport approach towards the conjecture can still be made to work by using a weaker notion of contraction together with an important result of E. Milman. Indeed, consider the Brownian transport map and suppose that instead of an almost-sure bound \(|\mathcal {D}X_1|^2\le C_{\text {kls}}\) we only have a bound in expectation,

$$\begin{aligned} \mathbb {E}_{\gamma }[|\mathcal {D}X_1|^2]\le C, \end{aligned}$$

for some dimension-free constant C. Repeating the argument above, and using Hölder’s inequality, we find that

$$\begin{aligned} \text {Var}_{p}[\eta ]\le C~ \text {Lip}^2(\eta ) \end{aligned}$$

where \(\text {Lip}(\eta ):=\sup _{x\in \mathbb {R}^d}|\nabla \eta (x)|\). In principle, this bound is weaker than a Poincaré inequality because of the use of the \(L^{\infty }\) norm on the gradient rather than the \(L^2\) norm. However, as shown by E. Milman [55], a Poincaré inequality is equivalent, up to a dimension-free constant, to first-moment concentration, which follows from the above \(L^{\infty }\) bound. In conclusion, the Kannan–Lovász–Simonovits conjecture is proven as soon as we can show that \(\mathbb {E}_{\gamma }[|\mathcal {D}X_1|^2]\le C\).
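
Spelled out, the chain of inequalities behind this step is the following sketch, which uses the Gaussian Poincaré inequality on the Wiener space, \(\text {Var}_{\gamma }[F]\le \mathbb {E}_{\gamma }[|DF|_H^2]\), together with the chain rule for the Malliavin derivative:

$$\begin{aligned} \text {Var}_{p}[\eta ]=\text {Var}_{\gamma }[\eta \circ X_1]\le \mathbb {E}_{\gamma }\left[ |\mathcal {D}X_1|^2\,(|\nabla \eta |\circ X_1)^2\right] \le \mathbb {E}_{\gamma }\left[ |\mathcal {D}X_1|^2\right] \text {Lip}^2(\eta )\le C~\text {Lip}^2(\eta ). \end{aligned}$$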

Significant progress towards the resolution of the Kannan–Lovász–Simonovits conjecture was made in a series of works [18, 25, 38, 39, 49]. Building on these results and techniques, we are able to make the, so far missing, connection between measure transportation and the Kannan–Lovász–Simonovits conjecture.

Theorem 1.4

(Contraction in expectation for log-concave measures) Let p be an isotropic log-concave measure on \(\mathbb {R}^d\) with compact support, and let \(X_1:\Omega \rightarrow \mathbb {R}^d\) be the Brownian transport map from the Wiener measure \(\gamma \) to p. There exists a universal constant \(\zeta \) such that, for any positive integer m,

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}^{2m}\right] \le \zeta ^m (2m+1)! (\log d)^{12m}, \end{aligned}$$

where \(|\cdot |_{\mathcal {L}(H,\mathbb {R}^d)}\) is the operator norm of operators from the Cameron-Martin space H to \(\mathbb {R}^d\).

Taking \(m=1\) in Theorem 1.4 we see that \(C_{\text {kls}}\) is almost dimension-free, as would be expected from the Kannan–Lovász–Simonovits conjecture. In fact, our transport perspective allows us to go beyond \(m=1\), which is needed for the applications outlined below.

Remark 1.5

As explained in this section, an expectation bound of the form \(\mathbb {E}_{\gamma }[|DT|^2]<C\), where T is a transport map from either \(\gamma \) or \(\gamma _d\) to p, with a dimension-free universal constant C, would lead to a proof of the Kannan–Lovász–Simonovits conjecture. In fact, one of the novelties of the proof of Theorem 1.4 is that it reveals that the reverse is also true, up to \(\log d\) factors, when T is the Brownian transport map. That is, assuming that the Kannan–Lovász–Simonovits conjecture is true we would get, up to \(\log d\) factors, an expectation bound \( \mathbb {E}_{\gamma }\left[ |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}^{2m}\right] \le C_m\) for a dimension-free universal constant \(C_m\) which depends only on m.

1.4 Stein kernels and central limit theorems

Our proof of Theorem 1.4 is based on known results concerning \(C_{\text {kls}}\). Thus, Theorem 1.4 does not supply any new information regarding the Kannan–Lovász–Simonovits conjecture itself. However, for isotropic log-concave measures, the transport approach is useful not only in the study of the conjecture but also in the theory of Stein kernels. As will become evident soon, these results go beyond the Poincaré inequality, and hence do not follow from the current results on the Kannan–Lovász–Simonovits conjecture.

Given a centered measure p on \(\mathbb {R}^d\), a matrix-valued map \(\mathfrak {s}_p\) is called a Stein kernel for p if

$$\begin{aligned} \mathbb {E}_p[\eta (x)x]=\mathbb {E}_p[\nabla \eta (x)\mathfrak {s}_p(x)] \end{aligned}$$

for a big-enough family of functions \(\eta :\mathbb {R}^d\rightarrow \mathbb {R}\). Gaussian integration by parts shows that \(p=\gamma _d\) if and only if the constant matrix \(\text {Id}_d\) is a Stein kernel for p. Hence, the distance of \(\mathfrak {s}_p\) from \(\text {Id}_d\) controls the extent to which p can be approximated by \(\gamma _d\). Specifically, the Stein discrepancy of \(\mathfrak {s}_{p}\) is defined as

$$\begin{aligned} S^2(\mathfrak {s}_p):=\mathbb {E}_p[|\mathfrak {s}_p-\text {Id}_d|_{\text {HS}}^2]. \end{aligned}$$

The quantity \(S(\mathfrak {s}_p)\) plays an important role in functional inequalities [48, 58, 61] and normal approximations [23, 28, 47, 57]. For applications, often it is enough to bound \(\mathbb {E}_p[|\mathfrak {s}_p|_{\text {HS}}^2]\). While in one dimension the Stein kernel of a given measure is unique and given by an explicit formula, in higher dimensions these kernels are non-unique and their construction is often non-trivial [23]. It was observed in [15] that transport maps with certain properties are good candidates for constructing Stein kernels. We will follow this strategy and construct Stein kernels with small Hilbert-Schmidt norm, based on the Brownian transport map, for a very large class of measures.
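
For orientation, the one-dimensional formula alluded to above is classical: if p is centered with density \(\rho \) on \(\mathbb {R}\), then its (unique) Stein kernel is

$$\begin{aligned} \mathfrak {s}_p(x)=\frac{1}{\rho (x)}\int _x^{\infty }y\rho (y)dy, \end{aligned}$$

and an integration by parts verifies the defining identity \(\mathbb {E}_p[\eta (x)x]=\mathbb {E}_p[\eta '(x)\mathfrak {s}_p(x)]\), the boundary terms vanishing because p is centered.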

Theorem 1.6

(Stein kernels) Let p be an isotropic log-concave measure on \(\mathbb {R}^d\) with compact support. Let \(\chi :\mathbb {R}^d\rightarrow \mathbb {R}^k\) be a continuously differentiable function with bounded partial derivatives such that \(\mathbb {E}_p[\chi ]=0\) and \(\mathbb {E}_p[|\nabla \chi |_{\text {op}}^8]<\infty \). Then, the pushforward measure \(q:=\chi _*p\) on \(\mathbb {R}^k\) admits a Stein kernel \(\tau _q\) satisfying

$$\begin{aligned} \mathbb {E}_q[|\tau _q|_{\text {HS}}^2]\le a d (\log d)^{24}\sqrt{\mathbb {E}_p[|\nabla \chi |_{\text {op}}^8]}, \end{aligned}$$

for some universal constant \(a>0\).

For example, taking \(k=d\) and \(\chi (x) := x\) shows that for p isotropic and log-concave

$$\begin{aligned} \mathbb {E}_p[|\tau _p|_{\text {HS}}^2]\le a d (\log d)^{24}. \end{aligned}$$
(1.3)

The analogous bound to (1.3), with a better polylog and a different Stein kernel \(\mathfrak {s}_p\), follows from [23], which showed that \(\mathbb {E}_p[|\mathfrak {s}_p|_{\text {HS}}^2]\le dC_p\), where \(C_p\) is the Poincaré constant of p. Indeed, since \(C_p\le c\log d\) by the result of [38], the bound (1.3) holds for \(\mathfrak {s}_p\). However, unlike previous constructions, our construction is well-behaved with respect to compositions. It allows us to consider general \(\chi \), which leads to the existence of Stein kernels \(\tau _q\) with bounded Hilbert-Schmidt norm, where now q does not necessarily satisfy a Poincaré inequality.

As a concrete application we will use Theorem 1.6 to deduce new central limit theorems with nearly optimal convergence rates. The best dimensional dependence in the convergence rate one could expect is of order \(\sqrt{\frac{d}{n}}\), as can be seen by considering product measures. Most known results establish general rates of convergence, in various distances, which are not better than \(\frac{d}{\sqrt{n}}\), and typically require a super-linear dependence on the dimension (see [9, 16] for some notable examples). To improve on such bounds, several recent works have shown that, by imposing strong structural assumptions on the common law of the summands, one can reduce the rate of convergence to \(\sqrt{\frac{d}{n}}\). However, these works dealt with highly regular measures, such as log-concave measures [27], measures with small support [66], or measures satisfying a Poincaré inequality [23, 28]. These assumptions can be restrictive for applications involving heavy-tailed measures, whose moment generating functions may not be well-defined and which therefore do not have sub-exponential tails.

We will bypass the above restrictive assumptions by utilizing the Stein kernel approach to normal approximations combined with Theorem 1.6. The starting point is the inequality,

$$\begin{aligned} W_2^2(p,\gamma _d)\le S^2(\mathfrak {s}_p), \end{aligned}$$
(1.4)

valid for any Stein kernel \(\mathfrak {s}_p\), where \(W_2\) is the Wasserstein 2-distance [47, Proposition 3.1]. In particular, inequality (1.4) can be used to prove central limit theorems: Suppose that p is isotropic and let \(\{Y_i\}\) be an i.i.d. sequence sampled from p. Then, as shown in [47, section 2.5], for every given Stein kernel \(\mathfrak {s}_p\), there exists a Stein kernel \(\mathfrak {s}_{p_n}\), where \(\frac{1}{\sqrt{n}}\sum _{i=1}^nY_i\sim p_n\), such that

$$\begin{aligned} S^2(\mathfrak {s}_{p_n})\le \frac{S^2(\mathfrak {s}_p)}{n}. \end{aligned}$$

Combining (1.4) with the triangle inequality thus yields

$$\begin{aligned} W_2^2\left( p_n,\gamma _d\right) \le \frac{2}{n}\left\{ \mathbb {E}_p[|\mathfrak {s}_p|_{\text {HS}}^2]+d\right\} . \end{aligned}$$
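
In more detail, this follows by combining (1.4) applied to \(p_n\), the \(n^{-1}\) decay of the Stein discrepancy, and the triangle inequality for the Hilbert-Schmidt norm (note that \(|\text {Id}_d|_{\text {HS}}^2=d\)):

$$\begin{aligned} W_2^2(p_n,\gamma _d)\le S^2(\mathfrak {s}_{p_n})\le \frac{S^2(\mathfrak {s}_p)}{n}=\frac{\mathbb {E}_p[|\mathfrak {s}_p-\text {Id}_d|_{\text {HS}}^2]}{n}\le \frac{2}{n}\left\{ \mathbb {E}_p[|\mathfrak {s}_p|_{\text {HS}}^2]+d\right\} . \end{aligned}$$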

The upshot of this discussion is that if we can construct a Stein kernel \(\mathfrak {s}_p\) with a small Hilbert-Schmidt norm then we obtain a central limit theorem with a good rate. In particular, whenever \(\mathbb {E}_p[|\mathfrak {s}_p|_{\text {HS}}^2]=O(d)\), we get a \(\sqrt{\frac{d}{n}}\) rate of convergence. Using our Stein kernel \(\tau _p\) and the bound in Theorem 1.6 we obtain:

Corollary 1.7

(Central limit theorem) Let p, q, and \(\chi \) be as in Theorem 1.6, and further suppose that q is isotropic. Then, if \(\{Y_i\}\) are i.i.d. sampled from q we have, with \(\frac{1}{\sqrt{n}}\sum _{i=1}^nY_i\sim q_n\),

$$\begin{aligned} W_2^2(q_n,\gamma _k)\le a\,\frac{\sqrt{\mathbb {E}_p[|\nabla \chi |_{\text {op}}^8]}\,d(\log d)^{24}+k}{n}, \end{aligned}$$

for some universal constant \(a>0\).

Corollary 1.7 allows us to significantly relax the regularity assumptions of the above-mentioned works, while still maintaining a nearly optimal rate of convergence. For example, if \(\chi \) has quadratic growth then q can have tails heavier than exponential, so it cannot satisfy a Poincaré inequality. In contrast, in such a setting, Corollary 1.7 provides results which are comparable, up to the Sobolev norm of \(\chi \) and \(\log d\) terms, to the ones obtained for log-concave measures [23, 28]. Another appealing feature is the ability to treat singular measures when \(k> d\). A particular case of interest is \(\chi (x) := x^{\otimes m}\) for some positive integer m. Even though q may be heavy-tailed in this case, since \(\mathbb {E}_p[|\nabla \chi |_{\text {op}}^8]\) is finite when p is log-concave we get a central limit theorem for sums of i.i.d. tensor powers. Proving such a result was a question posed in [53, section 3] where it was also suggested that finding an appropriate transport map could prove useful. Using our construction, Corollary 1.7 resolves this question.

1.5 Contractions on Wiener space

In the previous sections we viewed the solution X to (1.2) as a map \(X_1:\Omega \rightarrow \mathbb {R}^d\). We can, however, go beyond the Euclidean setting by not restricting ourselves to the value of X at time \(t=1\). We then get a map \(X:\Omega \rightarrow \Omega \) which transports the Wiener measure \(\gamma \) to the measure \(\mu \) on \(\Omega \) given by \(d\mu (\omega )= \frac{dp}{d\gamma _d}(\omega _1)d\gamma (\omega )\) for \(\omega \in \Omega \) where \(\omega _t\) is the value of \(\omega \) at time \(t\in [0,1]\). In words, \(\mu \) is obtained from \(\gamma \) by reweighting the probability of \(\omega \) according to \(\gamma \) by the value of \(\frac{dp}{d\gamma _d}:\mathbb {R}^d\rightarrow \mathbb {R}\) at \(\omega _1\in \mathbb {R}^d\). This leads to the following question: Is X a contraction, in a suitable sense, from the Wiener measure \(\gamma \) to \(\mu \) under appropriate conditions on \(\mu \)? In fact, this question can be placed in the general context of transport maps on Wiener space as we now explain.

We start by recalling that the Wiener space \(\Omega \) contains the important Cameron-Martin space \(H^1\), whose significance lies in the fact that the law of \(\omega +h\) is absolutely continuous with respect to \(\gamma \) whenever \(h\in H^1\). The Cameron-Martin space is the image of the space \(H:=L^2([0,1],\mathbb {R}^d)\) under the anti-derivative map \(\dot{h}\in H\mapsto h:=\int _0^{\cdot }\dot{h}_sds\in H^1\), and it induces a cost on \(\Omega \) by setting, for \(x,y\in \Omega \), the cost \(|x-y|_{H^1}\), with the convention that \(|x-y|_{H^1}=+\infty \) if \(x-y\notin H^1\). Based on this cost two notions of optimal transport maps on \(\Omega \) can be defined:

Given probability measures \(\nu ,\mu \) on \(\Omega \) let \(\Pi (\nu ,\mu )\) be the set of probability measures on \(\Omega \times \Omega \) whose two marginals are equal to \(\nu \) and \(\mu \), respectively. The (squared) Wasserstein 2-distance between \(\nu \) and \(\mu \) is defined as

$$\begin{aligned} W_2^2(\nu ,\mu )=\inf _{\pi \in \Pi (\nu ,\mu )}\int _{\Omega \times \Omega }|x-y|_{H^1}^2d\pi (x,y). \end{aligned}$$

Assuming that \(W_2^2(\nu ,\mu )<\infty \), the optimal transport map \(O:\Omega \rightarrow \Omega \), when it exists, is a map which transports \(\nu \) into \(\mu \) satisfying

$$\begin{aligned} \int _{\Omega }|\omega -O(\omega )|_{H^1}^2d\nu (\omega )=W_2^2(\nu ,\mu ). \end{aligned}$$

This definition is the generalization of the definition appearing in the classical optimal transport theory on Euclidean spaces [64]. In Euclidean spaces, the existence of optimal transport maps and their regularity was proven by Brenier [64, Theorem 2.12] while the analogous result in Wiener space is due to Feyel and Üstünel [31]. Since \(W_2^2(\nu ,\mu )<\infty \), we may write \(O(\omega )=\omega +\xi (\omega )\) where \(\xi :\Omega \rightarrow H^1\) so that

$$\begin{aligned} W_2^2(\nu ,\mu )=\inf _{\xi } \mathbb {E}_{\nu }\left[ |\xi (\omega )|_{H^1}^2\right] \end{aligned}$$

where the infimum is taken over all maps \(\xi :\Omega \rightarrow H^1\) such that \(\text {Law}(\omega +\xi (\omega ))=\mu \) with \(\omega \sim \nu \). Importantly, we do not require that \(\xi (\omega )\) is an adapted process. We now turn to the second notion of optimal transport: Given probability measures \(\nu ,\mu \) on \(\Omega \), we define the causal optimal transport map \(A:\Omega \rightarrow \Omega \) to be the map which transports \(\nu \) to \(\mu \) satisfying

$$\begin{aligned} \int _{\Omega }|\omega -A(\omega )|_{H^1}^2d\nu (\omega )=\inf _{\xi \text { adapted}} \mathbb {E}_{\nu }\left[ |\xi (\omega )|_{H^1}^2\right] \end{aligned}$$

where the infimum is taken over all maps \(\xi :\Omega \rightarrow H^1\) such that \(\text {Law}(\omega +\xi (\omega ))=\mu \) with \(\omega \sim \nu \), and with the additional requirement that \(\xi \) is an adapted process. This notion of optimality, sometimes referred to as adapted optimal transport, has recently gained a lot of traction (e.g., [7] and references therein).

The connection between these transport maps and the Brownian transport map follows from the work of Lassalle [44]. It turns out that when \(d\mu (\omega )= \frac{dp}{d\gamma _d}(\omega _1)d\gamma (\omega )\) for \(\omega \in \Omega \), the causal optimal transport map \(A:\Omega \rightarrow \Omega \), which transports \(\gamma \) to \(\mu \), is precisely the Föllmer process \(X:\Omega \rightarrow \Omega \). This is essentially a consequence of Girsanov’s theorem as well as the entropy-minimization property of the Föllmer process [33].

Once a notion of optimal (causal or non-causal) transport map in Wiener space is established, the question of contraction arises: Given a measure \(\mu \) on \(\Omega \) which is more log-concave than the Wiener measure \(\gamma \), is either O or A a contraction? We are not aware of any such results in the current literature. To make this question precise we need a notion of convexity on \(\Omega \) as well as a notion of contraction. We postpone the precise definitions to Sect. 8 and for now denote such a notion of contraction as Cameron-Martin contraction. Let us state some of our results in this direction.

Theorem 1.8

(Cameron-Martin contraction)

  • Let p be any 1-log-concave measure on \(\mathbb {R}^d\) and let \(\mu \) be a measure on the Wiener space given by \(d\mu (\omega )=\frac{dp}{d\gamma _d}(\omega _1)d\gamma (\omega )\) for \(\omega \in \Omega \). Then, the optimal transport map O from the Wiener measure \(\gamma \) to \(\mu \) is a Cameron-Martin contraction with constant 1.

  • There exists a 1-log-concave measure p on \(\mathbb {R}^d\) such that the causal optimal transport map A from \(\gamma \) to \(\mu \), where \(d\mu (\omega )=\frac{dp}{d\gamma _d}(\omega _1)d\gamma (\omega )\) for \(\omega \in \Omega \), is not a Cameron-Martin contraction with any constant.

The first part of the theorem is a straightforward consequence of Caffarelli’s contraction theorem. The second part requires more work: we construct a non-trivial example (see Remark 7.1) in which the causal optimal transport map fails to be a Cameron-Martin contraction.

1.6 Organization of the paper

Section 2 contains the preliminaries necessary for this work including the definition of the Brownian transport map based on the Föllmer process. Section 3 contains the construction of the Föllmer process and the analysis of its properties which then leads to the almost-sure contraction properties of the Brownian transport map. The main results in this section are contained in Theorem 3.1. Section 4 focuses on the setting where the target measure is log-concave with compact convex support and shows that, in this setting, we can bound the moments of the derivative of the Brownian transport map; the main result is Theorem 4.2. In addition, Sect. 4 contains a short explanation of the connection between stochastic localization and the Föllmer process. In Sect. 5 we use the almost-sure contraction established in Theorem 3.1 to prove new functional inequalities. In addition, Sect. 5 contains our results on Stein kernels and their applications to central limit theorems. In Sect. 6 we set up the preliminaries necessary for the study of contraction properties of transport maps on the Wiener space itself. In Sect. 7 we show that causal optimal transport maps are not Cameron-Martin contractions even when the target measure is \(\kappa \)-log-concave, for any \(\kappa \). Finally, Sect. 8 is devoted to optimal transport on the Wiener space.

2 Preliminaries

For the rest of the paper we fix a dimension d and let \(f:\mathbb {R}^d\rightarrow \mathbb {R}_{\ge 0}\) be a function such that \(\int _{\mathbb {R}^d}fd\gamma _d=1\) where \(\gamma _d\) is the standard Gaussian measure on \(\mathbb {R}^d\). We let p be the probability measure given by \(p(x)dx:=f(x)d\gamma _d(x)\) and further assume that the relative entropy \(\textsf{H}(p|\gamma _d):=\int _{\mathbb {R}^d} \log \left( \frac{dp}{d\gamma _d}\right) dp<+\infty \). We set \(S:={\textrm{diam}}({\textrm{supp}}(p))\). We write \(\langle \cdot ,\cdot \rangle \) for the Euclidean inner product and \(|\cdot |\) for the corresponding norm. Our notion of convexity is the following:

Definition 2.1

A probability measure p is \(\kappa \)-log-concave for some \(\kappa \in \mathbb {R}\) if the support of p is convex and \(-\nabla ^2\log \left( \frac{dp}{dx}\right) (x) \succeq \kappa \text {Id}_d\) for all \(x\in {\textrm{supp}}(p)\).

Next we recall some basics on the classical Wiener space and the Malliavin calculus [59].

2.1 Wiener space

Let \((\Omega ,\mathcal {F},\gamma )\) be the classical Wiener space: \(\Omega =C_0([0,1];\mathbb {R}^d)\) is the set of continuous functions from [0, 1] to \(\mathbb {R}^d\) which start at 0, \(\gamma \) is the Wiener measure, and \(\mathcal {F}\) is the completion (with respect to \(\gamma \)) of the Borel sigma-algebra generated by the uniform norm \(|\omega |_{\infty }:=\sup _{t\in [0,1]}|\omega _t|\) for \(\omega \in \Omega \). In words, a path \(\omega \in \Omega \) sampled according to \(\gamma \) has the law of a Brownian motion in \(\mathbb {R}^d\) running from time 0 to time 1. We set \(W_t:=W(\omega )_t:=\omega _t\) for \(t\in [0,1]\) and let \((\mathcal {F}_t)_{t\in [0,1]}\) be the filtration on \(\Omega \) generated by \((W_s)_{s\in [0,t]}\) together with the null sets of \(\mathcal {F}\). We say that a process \((u_t)_{t\in [0,1]}\) is adapted if \(u_t:\Omega \rightarrow \mathbb {R}^d\) is \(\mathcal {F}_t\)-measurable for all \(t\in [0,1]\). For the rest of the paper we define a probability measure \(\mu \) on \(\Omega \) by

$$\begin{aligned} d\mu (\omega )=f(\omega _1)d\gamma (\omega ). \end{aligned}$$

An important Hilbert subspace of \(\Omega \) is the Cameron-Martin space \(H^1\) which is defined as follows: Let \(H=L^2([0,1],\mathbb {R}^d)\) and given \(g\in H\) set \(i(g)\in \Omega \) by \(i(g)_t:=\int _0^tg_sds\). Then \(H^1:=\{i(g):g\in H\}\) and we often write \(h_t=\int _0^t \dot{h}_sds\) for \(\dot{h}\in H\) and \(h\in H^1\). The space \(H^1\) has an inner product induced from the inner product of the Hilbert space H, namely, \(\langle h,g\rangle _{H^1}:=\int _0^1\langle \dot{h}_s,\dot{g}_s\rangle ds\). The significance of the Cameron-Martin space is that the law of the process \(W+h=(\omega _t+h_t(\omega ))_{t\in [0,1]}\) is absolutely continuous with respect to \(\gamma \) whenever \(h(\omega )\in H^1\) \(\gamma \)-a.e. and \((h_t(\omega ))_{t\in [0,1]}\) is adapted and regular enough; this is a consequence of Girsanov’s theorem. Given \(\dot{h}\in H\) we set \(W(\dot{h}):=\int _0^1\dot{h}_tdW_t\) where the integral is the stochastic Itô integral; in this notation, \(W_t=W(1_{[0,t]})\).

Next we define the notion of contraction which is compatible with the Cameron-Martin space.

Definition 2.2

A measurable map \(T:\Omega \rightarrow \mathbb {R}^d\) is an almost-sure contraction with constant C if

$$\begin{aligned} |T(\omega +h)-T(\omega )|\le C|h|_{H^1} \quad \forall \,h\in H^1\quad \gamma \text {-a.e.} \end{aligned}$$

In Euclidean space, a function is Lipschitz if and only if its derivative (which exists almost everywhere) is bounded. In order to find the analogue of this result for our notion of contraction we need an appropriate definition of derivatives on the Wiener space.

2.2 Malliavin calculus

The calculus on the Wiener space was developed by P. Malliavin in the 1970s and it will play an important role in our proof techniques. The basic object of analysis in this theory is the variation of a function \(F:\Omega \rightarrow \mathbb {R}\) as the input \(\omega \in \Omega \) is perturbed. In order for the calculus to be compatible with the Wiener measure only perturbations in the direction of the Cameron-Martin space \(H^1\) are considered. We now sketch the construction of the Malliavin derivative and refer to [59] for a complete treatment. The construction of derivatives of F starts with the definition of the class \(\mathcal {S}\) of smooth random variables: \(F\in \mathcal {S}\) if there exists \(m\in \mathbb {Z}_+\) and a smooth function \(\eta :\mathbb {R}^m\rightarrow \mathbb {R}\) whose partial derivatives have polynomial growth such that

$$\begin{aligned} F=\eta (W(\dot{h}_1),\ldots , W(\dot{h}_m)) \end{aligned}$$

for some \(\dot{h}_1,\ldots ,\dot{h}_m\in H\). The Malliavin derivative of a smooth random variable F is a map \(DF:\Omega \rightarrow H\) defined by

$$\begin{aligned} DF=\sum _{i=1}^m\partial _i\eta (W(\dot{h}_1),\ldots , W(\dot{h}_m))\dot{h}_i. \end{aligned}$$

To get some intuition for this definition observe that

$$\begin{aligned} \langle DF(\omega ),\dot{h}\rangle _H=\frac{d}{d\epsilon }F(\omega +\epsilon h)\big |_{\epsilon =0} \end{aligned}$$

for \(\gamma \)-a.e. \(\omega \in \Omega \) and every \(\dot{h}\in H\) with \(H^1\ni h=\int _0^{\cdot }\dot{h}_sds\); that is, \(\langle DF(\omega ),\dot{h}\rangle _H\) is the Gâteaux derivative of F in the direction h. The Malliavin derivative is then extended to a larger class of functions on the Wiener space: Given \(p\ge 1\) we let \(\mathbb {D}^{1,p}\) be the closure of the class \(\mathcal {S}\) with respect to the norm

$$\begin{aligned} \Vert F\Vert _{1,p}:=\left( \mathbb {E}_{\gamma }\left[ |F|^p\right] +\mathbb {E}_{\gamma }\left[ |DF|_{H}^p\right] \right) ^{\frac{1}{p}}. \end{aligned}$$

In other words, \(\mathbb {D}^{1,p}\) is the domain in \(L^p(\Omega ,\gamma )\) of the Malliavin derivative operator D. The value of \(DF\in H\) at time \(t\in [0,1]\) is denoted as \(D_tF\).
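
As a simple example (standard, and recorded here only for orientation), take the coordinate functional \(F=W_t^j=W(e_j1_{[0,t]})\), where \(e_j\) is the jth standard basis vector of \(\mathbb {R}^d\). Then

$$\begin{aligned} D_sF=e_j1_{[0,t]}(s)\quad \text {and}\quad \langle DF,\dot{h}\rangle _H=\int _0^t\dot{h}_s^jds=h_t^j, \end{aligned}$$

which agrees with the Gâteaux derivative formula since \(F(\omega +\epsilon h)=\omega _t^j+\epsilon h_t^j\).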

The notion of the Malliavin derivative allows us to define the appropriate notion of derivatives of transport maps \(F:\Omega \rightarrow \mathbb {R}^k\). Let \(F=(F^1,\ldots ,F^k):\Omega \rightarrow \mathbb {R}^k\) with \(F^i:\Omega \rightarrow \mathbb {R}\) a Malliavin differentiable random variable for \(i\in [k]\), and let \(D_tF\) be the \(k\times d\) matrix given by \([D_tF]_{ij}=D_t^jF^i\) where we use the notation \(D_tF^i=(D_t^1F^i,\ldots , D_t^dF^i)\in \mathbb {R}^d\) for \(i\in [k]\); in other words, \(D_t^jF^i\) is the jth coordinate of \(D_tF^i\). For \(\gamma \)-a.e. \(\omega \in \Omega \) we define the linear Malliavin derivative operator \(\mathcal {D}_{\omega } F: H\rightarrow \mathbb {R}^k\) by

$$\begin{aligned} \mathcal {D}_{\omega } F[\dot{h}]:=\int _0^1D_tF(\omega )\dot{h}_tdt = \langle DF(\omega ),\dot{h}\rangle _H,\quad \dot{h}\in H. \end{aligned}$$

When no confusion arises we omit the subscript dependence on \(\omega \) and write \(\mathcal {D}F\). The next result shows that almost-sure contraction is equivalent to the boundedness of the corresponding Malliavin derivative operator. In the following we denote by \(\mathcal {L}(H,\mathbb {R}^d)\) the space of linear operators from H to \(\mathbb {R}^d\) equipped with the operator norm \(|\cdot |_{\mathcal {L}(H,\mathbb {R}^d)}\); in particular, when \(k=d\), \(\mathcal {D}F\in \mathcal {L}(H,\mathbb {R}^d)\) for \(\gamma \)-a.e. \(\omega \in \Omega \).

Lemma 2.3

  • Suppose \(T:\Omega \rightarrow \mathbb {R}^d\) is an almost-sure contraction with constant C. Then \(\mathcal {D}T\) exists \(\gamma \)-a.e. and \(|\mathcal {D}T|_{\mathcal {L}(H,\mathbb {R}^d)}\le C\) \(\gamma \)-a.e.

  • Let \(T:\Omega \rightarrow \mathbb {R}^d\) be such that there exists \(q>1\) so that \(\mathbb {E}_{\gamma }[|T|^q]<\infty \) and \(\mathcal {D}T\) exists \(\gamma \)-a.e. If \(|\mathcal {D}T|_{\mathcal {L}(H,\mathbb {R}^d)}\le C\) \(\gamma \)-a.e. then there exists an almost-sure contraction \({\tilde{T}}:\Omega \rightarrow \mathbb {R}^d\) with constant C such that \(\gamma \)-a.e. \({\tilde{T}}=T\).

Proof

The first part will follow from [10, Theorem 5.11.2(ii)] while the second part will follow from [10, Theorem 5.11.7] once we check that these results can be applied. We take the domain to be \(\Omega \) (a locally convex space) with the measure \(\gamma \) (a centered Radon Gaussian measure). The space \(H^1\) is the Cameron-Martin space while the image of T is a subset of \(\mathbb {R}^d\) (a separable Banach space with the Radon-Nikodym property). It remains to check that the Gâteaux derivative of T along \(H^1\) is equal to \(\mathcal {D}T\). For smooth cylindrical maps T [10, p. 207] this is clear and the general result follows from [10, Theorem 5.7.2]. \(\square \)

2.3 The Föllmer process and the Brownian transport map

The history of the Föllmer process goes back to the work of E. Schrödinger in 1932 [52], but it was H. Föllmer who formulated the problem in the language of stochastic differential equations [33]; see also the work of Dai Pra [24] from the stochastic control perspective. Let \(p=f d\gamma _d\) be our probability measure on \(\mathbb {R}^d\) and let \((B_t)_{t\in [0,1]}\) be the standard Brownian motion in \(\mathbb {R}^d\). The Föllmer drift v(t, x) is the drift for which the solution \((X_t)_{t\in [0,1]}\) of the stochastic differential equation

$$\begin{aligned} dX_t=v(t,X_t)dt+dB_t,\quad X_0=0, \end{aligned}$$
(2.1)

satisfies \(X_1\sim p\) and, in addition, \(\int _0^1\mathbb {E}_{\gamma }\left[ |v(t,X_t)|^2\right] dt\) is minimal among all such drifts. It turns out that the Föllmer drift v has an explicit form: Let \((P_t)_{t\ge 0}\) be the heat semigroup on \(\mathbb {R}^d\) acting on functions \(\eta :\mathbb {R}^d\rightarrow \mathbb {R}\) by

$$\begin{aligned} P_t\eta (x)=\int \eta (x+\sqrt{t}z)d\gamma _d(z), \end{aligned}$$

then, the Föllmer drift \(v:[0,1]\times \mathbb {R}^d\rightarrow \mathbb {R}^d\) is given by

$$\begin{aligned} v(t,x):=\nabla \log P_{1-t}f(x). \end{aligned}$$

That \(X_1\sim p\) with the above v can be seen, for example, from the Fokker-Planck equation of (2.1). Further, as a consequence of Girsanov’s theorem, the optimal drift satisfies

$$\begin{aligned} \textsf{H}(p|\gamma _d)=\frac{1}{2}\int _0^1\mathbb {E}_{\gamma }\left[ |v(t,X_t)|^2\right] dt. \end{aligned}$$
(2.2)

We refer to [33, 51] for more details. Specifically, the validity of (2.2) is guaranteed in our setting by [24, Theorem 3.1] (using the uniqueness of the solution to (2.1)).

The Brownian transport map is defined as the map \(X_1:\Omega \rightarrow \mathbb {R}^d\). This definition makes sense only if (2.1) has a strong solution which in particular is defined at \(t=1\); we will address this issue in the next section.
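
To illustrate the construction (a numerical sketch under stated assumptions, not used anywhere in the proofs), consider the one-dimensional mixture \(p=\gamma _1\star \nu \) with \(\nu =\frac{1}{2}(\delta _R+\delta _{-R})\), an instance of Theorem 1.3. Here \(f(y)=e^{-R^2/2}\cosh (Ry)\), so \(P_{1-t}f(x)=e^{-tR^2/2}\cosh (Rx)\) and the Föllmer drift is \(v(t,x)=R\tanh (Rx)\), independent of t. An Euler–Maruyama discretization of (2.1) then produces approximate samples of the Brownian transport map \(X_1\):

```python
import numpy as np

# Sketch: simulate the Foellmer process (2.1) for p = gamma_1 * nu with
# nu = (delta_R + delta_{-R})/2, whose drift is v(t, x) = R tanh(R x).
rng = np.random.default_rng(0)
R, n_steps, n_paths = 1.5, 1_000, 200_000
dt = 1.0 / n_steps

X = np.zeros(n_paths)  # X_0 = 0
for _ in range(n_steps):
    X += R * np.tanh(R * X) * dt + np.sqrt(dt) * rng.standard_normal(n_paths)

# X_1 should be distributed like (N(R,1) + N(-R,1))/2, whose second moment is 1 + R^2.
print("sample E[X_1^2]:", (X**2).mean(), " target:", 1 + R**2)
```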

3 Almost-sure contraction properties of Brownian transport maps

In this section we show that the Brownian transport map is an almost-sure contraction in various settings. The following is the main result of this section and it covers the almost-sure contraction statements of Theorem 1.1 and Theorem 1.3.

Theorem 3.1

  (1)

    Suppose that either p is \(\kappa \)-log-concave for some \(\kappa > 0\), or that p is \(\kappa \)-log-concave for some \(\kappa \in \mathbb {R}\) and that \(S<+\infty \). Then (2.1) has a unique strong solution for all \(t\in [0,1]\). Furthermore,

    (a)

      If \(\kappa S^2\ge 1\) then \(X_1\) is an almost-sure contraction with constant \(\frac{1}{\sqrt{\kappa }}\); equivalently,

      $$\begin{aligned} |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \frac{1}{\kappa } \quad \gamma \text {-a.e.} \end{aligned}$$
    (b)

      If \(\kappa S^2<1\) then \(X_1\) is an almost-sure contraction with constant \(\left( \frac{e^{1-\kappa S^2}+1}{2}\right) ^{1/2}S\); equivalently,

      $$\begin{aligned} |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \left( \frac{e^{1-\kappa S^2}+1}{2}\right) S^2 \quad \gamma \text {-a.e.} \end{aligned}$$
  (2)

    Fix a probability measure \(\nu \) on \(\mathbb {R}^d\) supported on a ball of radius R and let \(p:=\gamma _d\star \nu \). Then (2.1) has a unique strong solution for all \(t\in [0,1]\). Furthermore, \(X_1\) is an almost-sure contraction with constant \(\left( \frac{e^{2R^2}-1}{2}\right) ^{1/2}\frac{1}{R}\); equivalently,

    $$\begin{aligned} |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \frac{e^{2R^2}-1}{2R^2} \quad \gamma \text {-a.e.} \end{aligned}$$

Remark 3.2

The dichotomy of \(\kappa S^2\ge 1\) versus \(\kappa S^2<1\) is just a convenient way of organizing the various cases we consider, i.e., \(\kappa \) positive, zero, or negative, and S finite or infinite. This dichotomy is ambiguous when \(\kappa =0\) and \(S=\infty \) since we need to make a convention regarding \(0\cdot \infty \). Either way, the bound provided by Theorem 3.1(1) is trivial in this case, since it is equal to \(\infty \), so when proving Theorem 3.1(1) we will ignore issues arising from this case. We will come back to the case \(\kappa =0\) when proving Theorem 4.2.

The proof of Theorem 3.1, ignoring for now the issue of existence of solutions to (2.1), relies on the fact that the Malliavin derivative of the Föllmer process satisfies the following linear equation:

$$\begin{aligned} D_rX_t=\text {Id}_d+\int _r^t\nabla v(s,X_s)D_rX_sds\quad \forall \,r\le t\quad \text {and}\quad D_rX_t=0\quad \forall \, r> t. \end{aligned}$$

Using this equation we show that

$$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \int _0^t e^{2\int _s^t\lambda _{\max }(\nabla v(r,X_r))dr} ds\quad \forall \, t\in [0,1]\quad \gamma \text {-a.e.} \end{aligned}$$

Hence, the proof of Theorem 3.1 now boils down to estimating \(\lambda _{\max }(\nabla v(r,X_r))\). In Sect. 3.1 we express \(\nabla v(r,X_r)\) as a covariance matrix which allows us to bound \(\lambda _{\max }(\nabla v(r,X_r))\). In Sect. 3.2 we use those estimates to establish the existence and uniqueness of a strong solution to (2.1). Consequently, we derive a differential equation for \(\mathcal {D}X_t\), which, together with the estimates on \(\lambda _{\max }(\nabla v(r,X_r))\), allows us to bound \(|\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\). We complete the proof of Theorem 3.1 in Sect. 3.3.
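
In slightly more detail (a sketch of the step just described; the rigorous version is carried out in Sect. 3.2): for fixed \(s\le t\) the linear equation gives \(\partial _tD_sX_t=\nabla v(t,X_t)D_sX_t\) with \(D_sX_s=\text {Id}_d\), so Grönwall’s inequality yields \(|D_sX_t|_{\text {op}}\le e^{\int _s^t\lambda _{\max }(\nabla v(r,X_r))dr}\), and then, for \(\dot{h}\in H\), the Cauchy–Schwarz inequality gives

$$\begin{aligned} |\mathcal {D}X_t[\dot{h}]|=\left| \int _0^tD_sX_t\dot{h}_sds\right| \le \left( \int _0^t|D_sX_t|_{\text {op}}^2ds\right) ^{1/2}|\dot{h}|_H\le \left( \int _0^te^{2\int _s^t\lambda _{\max }(\nabla v(r,X_r))dr}ds\right) ^{1/2}|\dot{h}|_H. \end{aligned}$$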

Remark 3.3

As explained above, the key point behind the proof of Theorem 3.1 is to upper bound \( \lambda _{\max }(\nabla v(r,X_r))=\lambda _{\max }(\nabla ^2\log P_{1-r}f(X_r))\). However, once a Hessian estimate on \(\nabla ^2\log P_{1-r}f(X_r)\) is obtained, it can be used to prove functional inequalities without the usage of the Brownian transport map:

(a) The first way to do so is to work with the semigroup of \((X_t)\), and mimic the classical Bakry-Émery calculation (see [8]). The downside of this approach is that it is well suited to functional inequalities such as the log-Sobolev inequality, but not to isoperimetric-type inequalities. In contrast, transport approaches, such as the Brownian transport map, can provide all of these functional inequalities in one streamlined framework.

(b) The second way to apply the Hessian estimate is to use it within the context of the heat flow transport map of Kim and Milman [37]. This approach avoids the issues mentioned in part (a). On the other hand, the usage of this transport map is only suitable if we want to prove pointwise estimates on the Lipschitz constant of the transport map. In contrast, the Brownian transport map allows us to prove estimates on the Lipschitz constant of the transport map in expectation, which is what is needed to make the connection with the Kannan–Lovász–Simonovits conjecture; cf. Theorem 1.4. (We remark however that the heat flow map has its own advantages, as explained in [54, p.3].)

3.1 Covariance estimates

We begin by representing \(\nabla v\) as a covariance matrix. Define the measure \(p^{x,t}\) on \(\mathbb {R}^d\), for fixed \(t\in [0,1]\) and \(x\in \mathbb {R}^d\), by

$$\begin{aligned} dp^{x,t}(y):= \frac{f(y)\varphi ^{x,t}(y)}{P_tf(x)}dy \end{aligned}$$
(3.1)

where \(\varphi ^{x,t}\) is the density of the d-dimensional Gaussian distribution with mean x and covariance \(t\text {Id}_d\).

Claim

$$\begin{aligned} \nabla v(t,x)=\frac{1}{(1-t)^2}\text {Cov}(p^{x,1-t})-\frac{1}{1-t}\text {Id}_d\quad \forall \, t\in [0,1) \end{aligned}$$
(3.2)

and

$$\begin{aligned} -\frac{1}{1-t}\text {Id}_d \preceq \nabla v(t,x)\quad \forall \, t\in [0,1). \end{aligned}$$
(3.3)

Proof

The estimate (3.3) follows immediately from (3.2) since covariance matrices are positive semi-definite. To prove (3.2) note that since

$$\begin{aligned} P_{1-t}f(x)=\int f(y)\varphi ^{x,1-t}(y)dy \end{aligned}$$

we have

$$\begin{aligned}&\nabla P_{1-t}f(x)=\frac{1}{1-t}\int (y-x)f(y)\varphi ^{x,1-t}(y)dy,\\&\nabla ^2P_{1-t}f(x)=\frac{1}{(1-t)^2}\int (y-x)^{\otimes 2}f(y)\varphi ^{x,1-t}(y)dy-\frac{1}{1-t}\left( \int f(y)\varphi ^{x,1-t}(y)dy\right) \text {Id}_d, \end{aligned}$$

and hence,

$$\begin{aligned} \nabla v(t,x)&=\nabla ^2\log P_{1-t}f(x)=\frac{\nabla ^2P_{1-t}f(x)}{P_{1-t}f(x)}-\left( \frac{\nabla P_{1-t}f(x)}{P_{1-t}f(x)}\right) ^{\otimes 2}\\&=\frac{1}{(1-t)^2}\int (y-x)^{\otimes 2}dp^{x,1-t}(y)\\&\quad -\frac{1}{(1-t)^2}\left( \int (y-x)dp^{x,1-t}(y)\right) ^{\otimes 2}-\frac{1}{1-t} \text {Id}_d\\&=\frac{1}{(1-t)^2}\int y^{\otimes 2}dp^{x,1-t}(y)-\frac{1}{(1-t)^2}\left( \int ydp^{x,1-t}(y)\right) ^{\otimes 2}-\frac{1}{1-t} \text {Id}_d\\&=\frac{1}{(1-t)^2}\text {Cov}(p^{x,1-t})-\frac{1}{1-t}\text {Id}_d. \end{aligned}$$

\(\square \)

We start by using the representation (3.2) to upper bound \(\nabla v\).

Lemma 3.4

Define the measure \(dp=fd\gamma _d\) and let \(S:={\textrm{diam}}({\textrm{supp}}(p))\). Then,

  (1)

    For every \(t\in [0,1]\),

    $$\begin{aligned} \nabla v(t,x) \preceq \left( \frac{S^2}{(1-t)^2}-\frac{1}{1-t}\right) \text {Id}_d. \end{aligned}$$
  (2)

    Let \(\kappa \in \mathbb {R}\) and suppose that p is \(\kappa \)-log-concave. Then, for any \(t\in \left[ \frac{\kappa }{\kappa -1}1_{\kappa <0},1\right] \),

    $$\begin{aligned} \nabla v(t,x)\preceq \frac{1-\kappa }{\kappa (1-t)+t}\text {Id}_d. \end{aligned}$$
  (3)

    Fix a probability measure \(\nu \) on \(\mathbb {R}^d\) supported on a ball of radius R and let \(p:=\gamma _d\star \nu \). Then,

    $$\begin{aligned} \nabla v(t,x)\preceq R^2\text {Id}_d. \end{aligned}$$

Proof

  (1)

    By (3.2), it suffices to show that \(\text {Cov}(p^{x,1-t})\preceq S^2\text {Id}_d\), which is clear from the definition of \(p^{x,1-t}\) since it is supported on \({\textrm{supp}}(p)\), a set of diameter S.

  (2)

    If p is \(\kappa \)-log-concave then, for any \(t\in [0,1)\), \(p^{x,1-t}\) is \(\left( \kappa +\frac{t}{1-t}\right) \)-log-concave because

    $$\begin{aligned} -\nabla ^2\log \left( \frac{dp^{x,1-t}}{dy}\right) (y)&=-\nabla ^2\log \left( f(y)\varphi ^{0,1}(y)\right) -\nabla ^2\log \left( \frac{\varphi ^{x,1-t}(y)}{\varphi ^{0,1}(y)}\right) \\&\succeq \kappa \text {Id}_d+\frac{t}{1-t}\text {Id}_d \end{aligned}$$

    where we used that \(dp(y)=f(y)\varphi ^{0,1}(y)dy\). If \(t\in \left[ \frac{\kappa }{\kappa -1}1_{\kappa <0},1\right] \), then \(\left( \kappa +\frac{t}{1-t}\right) \ge 0\) so by the Brascamp-Lieb inequality [3, Theorem 4.9.1], applied to functions of the form \(\mathbb {R}^d\ni x\mapsto \langle x,v\rangle \) for \(v\in S^{d-1}\), we get

    $$\begin{aligned} \text {Cov}(p^{x,1-t})\preceq \left( \kappa +\frac{t}{1-t}\right) ^{-1}\text {Id}_d \end{aligned}$$

    and the result follows by (3.2).

  (3)

    We have

    $$\begin{aligned} \frac{dp^{x,1-t}(y)}{dy}=\frac{(\gamma _d\star \nu )(y)}{\varphi ^{0,1}(y)}\frac{\varphi ^{x,1-t}(y)}{P_{1-t}\left( \frac{\gamma _d\star \nu }{\varphi ^{0,1}}\right) (x)}=A_{x,t}\int \varphi ^{z,1}(y) \varphi ^{\frac{x}{t},\frac{1-t}{t}}(y)d \nu (z) \end{aligned}$$

    for some constant \(A_{x,t}\) depending only on x and t. Hence,

    $$\begin{aligned} \frac{dp^{x,1-t}(y)}{dy}=\int \varphi ^{(1-t)z+x,1-t}(y)\, d\tilde{\nu }(z) \end{aligned}$$

    where \(\tilde{\nu }\) is a probability measure obtained from \(\nu \) by multiplying by a positive function and renormalizing. In particular, \(\tilde{\nu }\) is supported on the same ball as \(\nu \). Let G be a standard Gaussian vector in \(\mathbb {R}^d\) and \(Z\sim \tilde{\nu }\) be independent. Then

    $$\begin{aligned} \sqrt{1-t}G+x+(1-t)Z\sim p^{x,1-t} \end{aligned}$$

    so

    $$\begin{aligned} \text {Cov}(p^{x,1-t})=(1-t)\text {Id}_d+(1-t)^2\text {Cov}(Z)\preceq (1-t)[1+(1-t)R^2]\text {Id}_d. \end{aligned}$$

    By (3.2),

    $$\begin{aligned} \nabla v(t,x)\preceq \frac{1+(1-t)R^2}{1-t}\text {Id}_d-\frac{1}{1-t}\text {Id}_d=R^2\text {Id}_d. \end{aligned}$$

\(\square \)
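
As a concrete illustration of part (3) (the example simulated at the end of Sect. 2.3, which we add for orientation), take \(d=1\) and \(\nu =\frac{1}{2}(\delta _R+\delta _{-R})\). Then \(f(y)=e^{-R^2/2}\cosh (Ry)\) and \(P_{1-t}f(x)=e^{-tR^2/2}\cosh (Rx)\), so

$$\begin{aligned} v(t,x)=R\tanh (Rx)\quad \text {and}\quad \nabla v(t,x)=\frac{R^2}{\cosh ^2(Rx)}\le R^2, \end{aligned}$$

with equality at \(x=0\), showing that the bound in part (3) is attained.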

Remark 3.5

In principle, we could use more refined Brascamp-Lieb inequalities [43, Theorem 3.3], or use the results of [22] (which imply a stronger Poincaré inequality; we omit the details of this implication), to improve Lemma 3.4(2) and the subsequent results. However, the improvement ends up being insignificant at the cost of much more tedious computations, so we omit the details.

The majority of this section focuses on part (1) of Theorem 3.1 since once that part is settled, part (2) will follow easily. The next two corollaries combine the bounds of Lemma 3.4(1,2) to obtain a bound on \(\lambda _{\max }(\nabla v(t,x))\), as well as its exponential, which is needed to bound \(\mathcal {D}X_t\). The first corollary handles the case \(\kappa \ge 0\) with no assumptions on S while the second corollary handles the case \(\kappa <0\) under the assumption \(S<\infty \).

Corollary 3.6

Define the measure \(dp=fd\gamma _d\) with \(S:={\textrm{diam}}({\textrm{supp}}(p))\) and suppose that p is \(\kappa \)-log-concave with \(\kappa \in [0,+\infty )\).

  • If \(\kappa S^2\ge 1\) then

    $$\begin{aligned} \lambda _{\max }(\nabla v(t,x))\le \theta _t:=\frac{1-\kappa }{(1-\kappa )t+\kappa },\quad t\in [0,1] \end{aligned}$$

    and

    $$\begin{aligned} \int _0^te^{2\int _s^t\theta _rdr}ds=\frac{t((1-\kappa )t+\kappa )}{\kappa },\quad t\in [0,1]. \end{aligned}$$
  • If \(\kappa S^2<1\) then

    $$\begin{aligned} \lambda _{\max }(\nabla v(t,x))\le \theta _t:= {\left\{ \begin{array}{ll} \frac{t+S^2-1}{(1-t)^2}\quad \text {for } t\in \left[ 0, \frac{1-\kappa S^2}{(1-\kappa )S^2+1}\right] ,\\ \frac{1-\kappa }{(1-\kappa )t+\kappa } \quad \text {for } t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] , \end{array}\right. } \end{aligned}$$

    and, for \(t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] \),

    $$\begin{aligned} \int _0^te^{2\int _s^t\theta _rdr}ds&=\frac{1}{2S^2}((1-\kappa )S^2t+\kappa S^2)^2\left\{ e^{2(1-\kappa S^2)}-1\right\} \\&\quad +((1-\kappa )t+\kappa )(\kappa S^2+t+(1-\kappa )tS^2-1). \end{aligned}$$

Proof

By Lemma 3.4, the two upper bounds we can get on \(\lambda _{\max }(\nabla v(t,x))\) are \(\frac{t+S^2-1}{(1-t)^2}\) and \(\frac{1-\kappa }{\kappa (1-t)+t}\). Simple algebra shows that

$$\begin{aligned} \frac{t+S^2-1}{(1-t)^2}\le \frac{1-\kappa }{\kappa (1-t)+t} \quad \text {if and only if}\quad (S^2-\kappa S^2+1) t\le 1-\kappa S^2. \end{aligned}$$

We consider two cases.

  • \(\kappa S^2\ge 1\): In this case the condition \((S^2-\kappa S^2+1) t\le 1-\kappa S^2\) cannot hold for any \(t\in (0,1]\): if \(S^2-\kappa S^2+1\ge 0\) the left-hand side is nonnegative while the right-hand side is nonpositive, and if \(S^2-\kappa S^2+1<0\) the condition is easiest to satisfy at \(t=1\), where it reads \(S^2\le 0\). It is therefore always advantageous to use the bound

    $$\begin{aligned} \lambda _{\max }(\nabla v(t,x))\le \theta _t:=\frac{1-\kappa }{\kappa (1-t)+t}=\frac{1-\kappa }{(1-\kappa )t+\kappa }. \end{aligned}$$

    Next we will compute \(\int _0^te^{2\int _s^t\theta _rdr}ds\), and we first check that the integral \(\int _s^t\theta _rdr\) is well-defined. The only issue is if \((1-\kappa )t+\kappa =0\), which happens at \(t_0:=\frac{\kappa }{\kappa -1}\). If \(\kappa \in (0,1)\) then \(t_0<0\), and if \(\kappa >1\) then \(t_0>1\) (for \(\kappa =1\) the denominator is constant), so in all these cases \(\theta _t\) is integrable on [0, 1]. The remaining case is \(\kappa =0\), for which \(t_0=0\); however, in that case we cannot have \(\kappa S^2\ge 1\). We compute:

    $$\begin{aligned} \int _s^t\theta _rdr=(1-\kappa )\left\{ \frac{1}{1-\kappa }\log ((1-\kappa )r+\kappa )\right\} \bigg |_s^t=\log \left( \frac{(1-\kappa )t+\kappa }{(1-\kappa )s+\kappa }\right) \end{aligned}$$

    so

    $$\begin{aligned} \int _0^te^{2\int _s^t\theta _rdr}ds&=((1-\kappa )t+\kappa )^2\int _0^t\frac{1}{((1-\kappa )s+\kappa )^2}ds\\&=-\frac{((1-\kappa )t+\kappa )^2}{1-\kappa }\left\{ \frac{1}{(1-\kappa )t+\kappa }-\frac{1}{\kappa }\right\} =\frac{t((1-\kappa )t+\kappa )}{\kappa }. \end{aligned}$$
  • \(\kappa S^2<1\): The condition \((S^2-\kappa S^2+1) t\le 1-\kappa S^2\) is equivalent to

    $$\begin{aligned} t\le \frac{1-\kappa S^2}{(1-\kappa ) S^2+1} \end{aligned}$$

    since the denominator is positive as \(\kappa S^2<1\). Hence we define

    $$\begin{aligned} \lambda _{\max }(\nabla v(t,x))\le \theta _t:= {\left\{ \begin{array}{ll} \frac{t+S^2-1}{(1-t)^2}\quad \text {for } t\in \left[ 0, \frac{1-\kappa S^2}{(1-\kappa )S^2+1}\right] ,\\ \frac{1-\kappa }{(1-\kappa )t+\kappa } \quad \text {for } t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] . \end{array}\right. } \end{aligned}$$

    From now until the end of the proof we assume that \(t\ge \frac{1-\kappa S^2}{(1-\kappa )S^2+1}\). In order to compute \(\int _s^t\theta _rdr\) we start by noting that \(\frac{r}{(1-r)^2}=\frac{d}{dr}\left[ \frac{1}{1-r}+\log (1-r)\right] \). We also note that, following the discussion in the \(\kappa S^2\ge 1\) case, the denominator \((1-\kappa )t+\kappa \) does not vanish over the range of integration. For \(s\in \left[ 0, \frac{1-\kappa S^2}{(1-\kappa )S^2+1}\right] \) we have

    $$\begin{aligned} \int _s^t\theta _rdr&=\int _s^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}\left( \frac{r+S^2-1}{(1-r)^2}\right) dr+\int _{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}^t\frac{1-\kappa }{(1-\kappa )r+\kappa } dr\\&=\left\{ \frac{1}{1-r}+\log (1-r)\right\} \bigg |_s^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}+(S^2-1)\left\{ \frac{1}{1-r}\right\} \bigg |_s^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}\\&\quad +(1-\kappa )\left\{ \frac{1}{1-\kappa }\log ((1-\kappa )r+\kappa )\right\} \bigg |_{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}^t\\&=\left\{ \log (1-r)\right\} \bigg |_s^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}+S^2\left\{ \frac{1}{1-r}\right\} \bigg |_s^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}\\&\quad +\left\{ \log ((1-\kappa )r+\kappa )\right\} \bigg |_{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}^t\\&=\log \left( \frac{S^2}{(1-\kappa )S^2+1}\right) -\log (1-s)+(1-\kappa )S^2+1-\frac{S^2}{1-s}\\&\quad +\log ((1-\kappa )t+\kappa )-\log \left( \frac{1}{(1-\kappa )S^2+1}\right) \\&=\{(1-\kappa )S^2+1+\log ((1-\kappa )S^2t+\kappa S^2)\}-\left\{ \log (1-s)+\frac{S^2}{1-s}\right\} . \end{aligned}$$

    Hence,

    $$\begin{aligned} e^{2\int _s^t\theta _rdr}=e^{2(1-\kappa )S^2+2}((1-\kappa )S^2t+\kappa S^2)^2\frac{e^{-\frac{2S^2}{1-s}}}{(1-s)^2}. \end{aligned}$$

    For \(s\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] \) we have,

    $$\begin{aligned} \int _s^t\theta _rdr=\int _s^t\frac{1-\kappa }{(1-\kappa )r+\kappa } dr=\log \left( \frac{(1-\kappa )t+\kappa }{(1-\kappa )s+\kappa }\right) \end{aligned}$$

    and so

    $$\begin{aligned} e^{2\int _s^t\theta _rdr}=\left( \frac{(1-\kappa )t+\kappa }{(1-\kappa )s+\kappa }\right) ^2. \end{aligned}$$

    It follows that

    $$\begin{aligned}&\int _0^te^{2\int _s^t\theta _rdr}ds\\&\quad = e^{2(1-\kappa )S^2+2}((1-\kappa )S^2t+\kappa S^2)^2\int _0^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}} \frac{e^{-\frac{2S^2}{1-s}}}{(1-s)^2} ds\\&\qquad +((1-\kappa )t+\kappa )^2\int _{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}^t \frac{1}{((1-\kappa )s+\kappa )^2}ds, \end{aligned}$$

    and we note that both integrals are finite: the first integrand is bounded, and in the second \((1-\kappa )s+\kappa \) does not vanish on the range of integration. The first integral reads

    $$\begin{aligned} \int _0^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}} \frac{e^{-\frac{2S^2}{1-s}}}{(1-s)^2} ds=\left\{ -\frac{1}{2S^2}e^{-\frac{2S^2}{1-s}}\right\} \bigg |_0^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}=-\frac{1}{2S^2}\left\{ e^{-2(1-\kappa )S^2-2}-e^{-2S^2}\right\} \end{aligned}$$

    so

    $$\begin{aligned}{} & {} e^{2(1-\kappa )S^2+2}((1-\kappa )S^2t+\kappa S^2)^2\int _0^{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}} \frac{e^{-\frac{2S^2}{1-s}}}{(1-s)^2} ds\\{} & {} \quad =\frac{1}{2S^2}((1-\kappa )S^2t+\kappa S^2)^2\left\{ e^{2(1-\kappa S^2)}-1\right\} . \end{aligned}$$

    The second integral reads

    $$\begin{aligned}&\int _{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}^t \frac{1}{((1-\kappa )s+\kappa )^2}ds=-\frac{1}{1-\kappa }\left\{ \frac{1}{(1-\kappa )s+\kappa }\right\} \bigg |_{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}^t\\&\quad =-\frac{1}{1-\kappa }\left\{ \frac{1}{(1-\kappa )t+\kappa }-((1-\kappa )S^2+1)\right\} \end{aligned}$$

    so

    $$\begin{aligned}&((1-\kappa )t+\kappa )^2\int _{\frac{1-\kappa S^2}{(1-\kappa )S^2+1}}^t \frac{1}{((1-\kappa )s+\kappa )^2}ds\\&\quad =-\frac{(1-\kappa )t+\kappa }{1-\kappa }\left\{ 1-((1-\kappa )t+\kappa )((1-\kappa )S^2+1)\right\} \\&\quad =((1-\kappa )t+\kappa )(\kappa S^2+t+(1-\kappa )tS^2-1). \end{aligned}$$

Adding everything up gives the result. \(\square \)
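The closed forms just obtained can be sanity-checked by direct quadrature. A minimal sketch in Python (the parameter values are arbitrary, chosen only to probe both regimes):

```python
import numpy as np

def I_numeric(theta, t, n=400_001):
    """Quadrature for int_0^t exp(2 int_s^t theta(r) dr) ds."""
    r = np.linspace(0.0, t, n)
    th = theta(r)
    A = np.concatenate(([0.0], np.cumsum(0.5 * (th[1:] + th[:-1]) * np.diff(r))))
    f = np.exp(2.0 * (A[-1] - A))              # exp(2 int_s^t theta(r) dr)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r)))

# regime kappa * S^2 >= 1 (e.g. S = infinity): theta_t = (1-k)/((1-k)t+k)
k, t = 2.0, 0.7
print(I_numeric(lambda r: (1 - k) / ((1 - k) * r + k), t),
      t * ((1 - k) * t + k) / k)               # ~0.455 twice

# regime kappa * S^2 < 1, evaluated at some t >= t_*
k, S, t = 0.5, 1.0, 0.8
ts = (1 - k * S**2) / ((1 - k) * S**2 + 1)
num = I_numeric(lambda r: np.where(r <= ts, (r + S**2 - 1) / (1 - r)**2,
                                   (1 - k) / ((1 - k) * r + k)), t)
closed = (((1 - k) * S**2 * t + k * S**2)**2 * (np.exp(2 * (1 - k * S**2)) - 1)
          / (2 * S**2)
          + ((1 - k) * t + k) * (k * S**2 + t + (1 - k) * t * S**2 - 1))
print(num, closed)                             # ~1.326 twice
```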

Corollary 3.7

Define the measure \(dp=fd\gamma _d\) and suppose that \(S:={\textrm{diam}}({\textrm{supp}}(p))<\infty \) and that p is \(\kappa \)-log-concave with \(\kappa \in (-\infty , 0)\). We have

$$\begin{aligned} \lambda _{\max }(\nabla v(t,x))\le \theta _t:= {\left\{ \begin{array}{ll} \frac{t+S^2-1}{(1-t)^2}\quad \text {for } t\in \left[ 0, \frac{1-\kappa S^2}{(1-\kappa )S^2+1}\right] ,\\ \frac{1-\kappa }{(1-\kappa )t+\kappa } \quad \text {for } t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] , \end{array}\right. } \end{aligned}$$

and, for \(t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] \),

$$\begin{aligned} \int _0^te^{2\int _s^t\theta _rdr}ds&=\frac{1}{2S^2}((1-\kappa )S^2t+\kappa S^2)^2\left\{ e^{2(1-\kappa S^2)}-1\right\} \\&\quad +((1-\kappa )t+\kappa )(\kappa S^2+t+(1-\kappa )tS^2-1). \end{aligned}$$

Proof

By Lemma 3.4, the two upper bounds we can get on \(\lambda _{\max }(\nabla v(t,x))\) are \(\frac{t+S^2-1}{(1-t)^2}\) for any \(t\in [0,1]\) and \(\frac{1-\kappa }{\kappa (1-t)+t}\) for \(t\in \left[ \frac{\kappa }{\kappa -1},1\right] \). Hence, for \(t\in [0,\frac{\kappa }{\kappa -1})\), we must use the bound \(\frac{t+S^2-1}{(1-t)^2}\). Next we note that \(0< \frac{\kappa }{\kappa -1}<\frac{1-\kappa S^2}{(1-\kappa )S^2+1}\le 1\) and that (using \(\kappa (1-t)+t\ge 0\) for \(t\ge \frac{\kappa }{\kappa -1}\)),

$$\begin{aligned} \frac{t+S^2-1}{(1-t)^2}\le \frac{1-\kappa }{\kappa (1-t)+t} \quad \text {for } t\in \left[ \frac{\kappa }{\kappa -1},\frac{1-\kappa S^2}{(1-\kappa )S^2+1}\right] . \end{aligned}$$

We define

$$\begin{aligned} \lambda _{\max }(\nabla v(t,x))\le \theta _t:= {\left\{ \begin{array}{ll} \frac{t+S^2-1}{(1-t)^2}\quad \text {for } t\in \left[ 0, \frac{1-\kappa S^2}{(1-\kappa )S^2+1}\right] ,\\ \frac{1-\kappa }{(1-\kappa )t+\kappa } \quad \text {for } t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] . \end{array}\right. } \end{aligned}$$

As in the proof of Corollary 3.6, we have

$$\begin{aligned} e^{2\int _s^t\theta _rdr}= {\left\{ \begin{array}{ll} e^{2(1-\kappa )S^2+2}((1-\kappa )S^2t+\kappa S^2)^2\frac{e^{-\frac{2S^2}{1-s}}}{(1-s)^2}, &{} s\in \left[ 0, \frac{1-\kappa S^2}{(1-\kappa )S^2+1}\right] \\ \left( \frac{(1-\kappa )t+\kappa }{(1-\kappa )s+\kappa }\right) ^2,&{} s\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] . \end{array}\right. } \end{aligned}$$

Since \(\frac{\kappa }{\kappa -1}<\frac{1-\kappa S^2}{(1-\kappa )S^2+1}\), the above term can be integrated as in the proof of Corollary 3.6. \(\square \)
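The ordering \(0<\frac{\kappa }{\kappa -1}<\frac{1-\kappa S^2}{(1-\kappa )S^2+1}\le 1\) used above can likewise be probed numerically over random parameters; a quick sketch (the sampling distributions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(10_000):
    k = -rng.exponential(5.0)            # kappa < 0
    S = rng.exponential(3.0) + 1e-3      # 0 < S < infinity
    t0 = k / (k - 1)                     # zero of (1-k)t + k
    ts = (1 - k * S**2) / ((1 - k) * S**2 + 1)
    assert 0 < t0 < ts <= 1
print("ordering holds on all samples")
```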

3.2 The Malliavin derivative of the Föllmer process

The bounds provided by (3.3) and Lemma 3.4 are strong enough to establish the existence of a unique strong solution to (2.1) only for \(t<1\), because at \(t=1\) these bounds can blow up. For our purposes, however, it is crucial to have the solution well-defined at \(t=1\) since we need \(X_1\sim p\). We will proceed by first analyzing the behavior of the solution before time 1, which will then allow us to extend the solution and its Malliavin derivative to \(t=1\); see Proposition 3.10.

Lemma 3.8

Let \(dp=fd\gamma _d\) with \(S:={\textrm{diam}}({\textrm{supp}}(p))\) and suppose that either \(S<\infty \) or p is \(\kappa \)-log-concave with \(\kappa \ge 0\). Equation (2.1) has a unique strong solution \((X_t)\) for \(t\in [0,1)\) satisfying

$$\begin{aligned} D_rX_t=\text {Id}_d+\int _r^t\nabla v(s,X_s)D_rX_sds\quad \forall \,r\le t\quad \text {and}\quad D_rX_t=0\quad \forall \, r> t, \quad \gamma \text {-a.e.} \end{aligned}$$

In addition,

$$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \int _0^t e^{2\int _s^t\lambda _{\max }(\nabla v(r,X_r))dr} ds,\quad \forall \, t\in [0,1)\quad \gamma \text {-a.e.} \end{aligned}$$

Proof

Fix \(T\in (0,1)\) and note that by (3.3) and Lemma 3.4, \(v:[0,T]\times \mathbb {R}^d\rightarrow \mathbb {R}^d\) is uniformly Lipschitz in x (with the Lipschitz constant depending on T). Writing \(v(t,x)=v(t,0)+\int _0^1\nabla v(t,rx)x dr\) we see, again by (3.3) and Lemma 3.4, that v is of linear growth. It is standard that under these conditions Equation (2.1) has a unique strong solution [59, Lemma 2.2.1]. The Malliavin differentiability of \((X_t)_{t\in [0,1)}\) and the formula for its derivative follow from [59, Theorem 2.2.1 and p. 121], as we now elaborate. According to [59, Theorem 2.2.1], if \((X_t)\) is a solution to a stochastic differential equation

$$\begin{aligned} X_t=B_t+\int _0^tb(s,X_s)ds, \end{aligned}$$

with b globally Lipschitz and of linear growth, then

$$\begin{aligned} D_rX_t=\text {Id}_d+\int _r^t\bar{b}(s,X_s)D_rX_sds\quad \forall \,r\le t\quad \text {and}\quad D_rX_t=0\quad \forall \, r> t, \quad \gamma \text {-a.e.}, \end{aligned}$$

where \({\bar{b}}=\nabla b\) (see [59, p. 121]). As mentioned above, \(b:=v\) is indeed globally Lipschitz and of linear growth, which implies the result.

Turning to the bound on \(\mathcal {D}X_t\), fix \(\dot{h}\in H\) and define \(\alpha _{\dot{h}}:[0,1)\rightarrow \mathbb {R}^d\) by

$$\begin{aligned} \alpha _{\dot{h}}(t):=\mathcal {D}X_t[\dot{h}]=\int _0^tD_rX_t[\dot{h}_r]dr. \end{aligned}$$

The equation for \(DX_t\) and Fubini’s theorem (applicable since \(\nabla v\) is bounded on \([0,t]\times \mathbb {R}^d\) for \(t<1\), and since Grönwall’s inequality yields a bound on any norm of \(D_rX_t\)) imply that

$$\begin{aligned} \alpha _{\dot{h}}(t)&= \int _0^t\text {Id}_d[\dot{h}_r]dr+\int _0^t\int _r^t\nabla v(s,X_s)D_rX_s[\dot{h}_r]dsdr\\&=h_t+\int _0^t\int _0^t\nabla v(s,X_s)D_rX_s[\dot{h}_r]dsdr\quad \text {(because }D_rX_s=0\text { for }s<r)\\&=h_t+\int _0^t\nabla v(s,X_s)\left( \int _0^tD_rX_s[\dot{h}_r]dr\right) ds\\&=h_t+\int _0^t\nabla v(s,X_s)\left( \int _0^sD_rX_s[\dot{h}_r]dr\right) ds\quad \text {(because }D_rX_s=0\text { for }s<r) \\&=h_t+\int _0^t\nabla v(s,X_s) \alpha _{\dot{h}}(s)ds. \end{aligned}$$

Hence

$$\begin{aligned} \partial _t\alpha _{\dot{h}}(t)=\dot{h}_t+\nabla v(t,X_t) \alpha _{\dot{h}}(t) \quad \forall \, t\in [0,1). \end{aligned}$$

Set \(\lambda _t:=\lambda _{\max }(\nabla v(t,X_t))\). It follows from the Cauchy-Schwarz inequality that

$$\begin{aligned} \partial _t|\alpha _{\dot{h}}(t)|^2&=2\langle \partial _t\alpha _{\dot{h}}(t),\alpha _{\dot{h}}(t)\rangle =2\langle \dot{h}_t,\alpha _{\dot{h}}(t)\rangle +2\langle \alpha _{\dot{h}}(t), \nabla v(t,X_t)\alpha _{\dot{h}}(t)\rangle \\&\le 2|\dot{h}_t|\sqrt{|\alpha _{\dot{h}}(t)|^2}+2\lambda _t|\alpha _{\dot{h}}(t)|^2 \end{aligned}$$

so defining \(y:[0,1)\rightarrow \mathbb {R}\) by \(y(t):=|\alpha _{\dot{h}}(t)|^2\) we find

$$\begin{aligned} \partial _ty(t)\le 2|\dot{h}_t|\sqrt{y(t)}+2\lambda _ty(t). \end{aligned}$$

In order to analyze y(t) we note that the solution of the Bernoulli ordinary differential equation

$$\begin{aligned} \partial _tz(t)=2|\dot{h}_t|\sqrt{z(t)}+2\lambda _tz(t), \quad z(0)=0 \end{aligned}$$

can be verified to be

$$\begin{aligned} z(t)=\left( e^{\int _0^t\lambda _s ds}\int _0^t e^{-\int _0^s\lambda _rdr}|\dot{h}_s|ds\right) ^2. \end{aligned}$$
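Indeed, for \(z(t)>0\) the substitution \(w(t):=\sqrt{z(t)}\) linearizes the equation:

$$\begin{aligned} \partial _tw(t)=\frac{\partial _tz(t)}{2\sqrt{z(t)}}=|\dot{h}_t|+\lambda _tw(t),\quad w(0)=0\quad \Longrightarrow \quad w(t)=e^{\int _0^t\lambda _sds}\int _0^te^{-\int _0^s\lambda _rdr}|\dot{h}_s|ds, \end{aligned}$$

by variation of constants, and squaring \(w(t)\) recovers the displayed formula.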

By the Cauchy-Schwarz inequality,

$$\begin{aligned} z(t)\le e^{2\int _0^t\lambda _s ds}\int _0^t e^{-2\int _0^s\lambda _rdr}ds\int _0^t|\dot{h}_s|^2ds= \int _0^t e^{2\int _s^t\lambda _rdr}ds\int _0^t|\dot{h}_s|^2ds \end{aligned}$$

so since \(y(t)\le z(t)\) for all \(t\in [0,1)\) we conclude that

$$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2=\sup _{\dot{h}\in H:|\dot{h}|_H=1}|\alpha _{\dot{h}}(t)|^2\le \int _0^t e^{2\int _s^t\lambda _rdr}ds. \end{aligned}$$

\(\square \)

Combining Lemma 3.8 and Corollaries 3.6 and 3.7 we obtain:

Corollary 3.9

Let \(dp=fd\gamma _d\) with \(S:={\textrm{diam}}({\textrm{supp}}(p))\) and suppose that p is \(\kappa \)-log-concave for some \(\kappa \in \mathbb {R}\). Then, \(\gamma \)-a.e.,

  1. (a)

    Suppose \(\kappa \ge 0\).

    • If \(\kappa S^2\ge 1\):

      $$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \frac{t((1-\kappa )t+\kappa )}{\kappa },\quad t\in [0,1). \end{aligned}$$
    • If \(\kappa S^2<1\): For \(t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right) \),

      $$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le & {} \frac{1}{2S^2}((1-\kappa )S^2t+\kappa S^2)^2\left\{ e^{2(1-\kappa S^2)}-1\right\} \\{} & {} +((1-\kappa )t+\kappa )(\kappa S^2+t+(1-\kappa )tS^2-1). \end{aligned}$$
  2. (b)

    Suppose \(\kappa \le 0\) and that \(S<+\infty \). For \(t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right) \),

    $$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le & {} \frac{1}{2S^2}((1-\kappa )S^2t+\kappa S^2)^2\left\{ e^{2(1-\kappa S^2)}-1\right\} \\{} & {} +((1-\kappa )t+\kappa )(\kappa S^2+t+(1-\kappa )tS^2-1). \end{aligned}$$

We will now extend the solution X and its Malliavin derivatives to \(t=1\).

Proposition 3.10

Let \(dp=fd\gamma _d\) with \(S:={\textrm{diam}}({\textrm{supp}}(p))\) and suppose that either p is \(\kappa \)-log-concave for some \(\kappa > 0\), or that p is \(\kappa \)-log-concave for some \(\kappa \in \mathbb {R}\) and that \(S<+\infty \). Equation (2.1) has a unique strong solution \((X_t)\) for \(t\in [0,1]\) satisfying, \(\gamma \)-a.e.,

$$\begin{aligned} \forall \, r\le t, ~~D_rX_t=\text {Id}_d+\int _r^t\nabla v(s,X_s)D_rX_sds\quad \text {and}\quad D_rX_t=0~\forall \, r> t \end{aligned}$$

and

$$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \int _0^t e^{2\int _s^t\lambda _{\max }(\nabla v(r,X_r))dr} ds\quad \forall \, t\in [0,1]. \end{aligned}$$

In addition:

  1. (a)

    Suppose \(\kappa \ge 0\).

    • If \(\kappa S^2\ge 1\):

      $$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \frac{t((1-\kappa )t+\kappa )}{\kappa },\quad t\in [0,1]. \end{aligned}$$
    • If \(\kappa S^2<1\): For \(t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] \),

      $$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le & {} \frac{1}{2S^2}((1-\kappa )S^2t+\kappa S^2)^2\left\{ e^{2(1-\kappa S^2)}-1\right\} \\{} & {} +((1-\kappa )t+\kappa )(\kappa S^2+t+(1-\kappa )tS^2-1). \end{aligned}$$
  2. (b)

    Suppose \(\kappa \le 0\) and \(S<+\infty \). For \(t\in \left[ \frac{1-\kappa S^2}{(1-\kappa )S^2+1},1\right] \),

    $$\begin{aligned} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le & {} \frac{1}{2S^2}((1-\kappa )S^2t+\kappa S^2)^2\left\{ e^{2(1-\kappa S^2)}-1\right\} \\{} & {} +((1-\kappa )t+\kappa )(\kappa S^2+t+(1-\kappa )tS^2-1). \end{aligned}$$

Proof

We start by establishing the solution to (2.1) all the way to \(t=1\). Let \((X_t)_{t\in [0,1)}\) be the process given by Lemma 3.8. For \(k\in \mathbb {Z}_+\) define \(t_k:=1-\frac{1}{k}\) and compute, for \(l\ge k\),

$$\begin{aligned} \mathbb {E}[|X_{t_l}-X_{t_k}|^2]&\le 2\mathbb {E}[|B_{t_l}-B_{t_k}|^2]\\&\quad +2\int _{t_k}^{t_l}\mathbb {E}[|v(s,X_s)|^2]ds\le 2d(t_l-t_k)+2 (t_l-t_k)\textsf{H}(p|\gamma _d)\\&\le \frac{2d}{k}+\frac{2}{k}\textsf{H}(p|\gamma _d) \end{aligned}$$

where we used (2.2) and \(t_l-t_k\le 1-\left( 1-\frac{1}{k}\right) =\frac{1}{k}\). Given \(\epsilon >0\) let N be such that \(\frac{2d}{N}+\frac{2}{N}\textsf{H}(p|\gamma _d)<\epsilon \) (which is possible as \(\textsf{H}(p|\gamma _d)<\infty \)) to conclude that \(\mathbb {E}_{\gamma }[|X_{t_l}-X_{t_k}|^2]\le \epsilon \) for any \(k,l\ge N\). Hence, \(\{X_{t_k}\}\) is a Cauchy sequence in \(L^2(\Omega ,\mathbb {R}^d)\) which is complete; we denote the limit by \(X_1\). Repeating the above argument on the right-hand side of (2.1) shows that \((X_t)_{t\in [0,1]}\) solves (2.1) for all \(t\in [0,1]\).

To extend the derivative to \(t=1\) we start by showing that \(DX_1\) exists. Fix \(w\in \mathbb {R}^d\) and take \(\dot{h}\equiv w\) so that

$$\begin{aligned} |\mathcal {D}X_t[\dot{h}]|^2=\left| \int _0^tD_rX_twdr\right| ^2. \end{aligned}$$

Taking \(w=e_j\), the jth element of the standard basis of \(\mathbb {R}^d\), and using that \(D_r^jX_t^i=0\) if \(r>t\), we have

$$\begin{aligned} |\mathcal {D}X_t[\dot{h}]|^2= & {} \sum _{i=1}^d\left( \int _0^tD_r^jX_t^idr\right) ^2\\= & {} \sum _{i=1}^d\left( \int _0^1D_r^jX_t^idr\right) ^2\le \sum _{i=1}^d|D^jX_t^i|_{H}^2. \end{aligned}$$

By Corollary 3.9, it follows that \(\sup _k |D^jX_{t_k}^i|_{H}^2<\infty \) for any \(i,j\in [d]\), \(\gamma \)-a.e. Hence, by [59, Lemma 1.2.3], for any \(i,j\in [d]\), \(D^jX_1^i\) exists and \(D^jX_{t_k}^i\) converges to \(D^jX_1^i\) in the weak topology of \(L^2(\Omega ,H)\). Hence, for a fixed \(\dot{h}\in H\), we have that \(\mathbb {E}_{\gamma }[|\mathcal {D}X_{t_k}[\dot{h}]-\mathcal {D}X_1[\dot{h}]|^2]\rightarrow 0\) as \(k\rightarrow \infty \). In particular, for a fixed \(\dot{h}\in H\), \(\mathcal {D}X_{t_k}[\dot{h}]\) converges to \(\mathcal {D}X_{1}[\dot{h}]\) in probability.

On the other hand, fix \(\dot{h}\in H\) and recall the definition of \(\alpha _{\dot{h}}:[0,1)\rightarrow \mathbb {R}^d\) from the proof of Lemma 3.8. The definition of \(\alpha _{\dot{h}}\) as an integral, and the fact that the integrand is bounded (since \(\sup _k |D^jX_{t_k}^i|_{H}^2<\infty \) and \(\dot{h}_r\) is in \(L^2([0,1],\mathbb {R}^d)\)), show that, \(\gamma \)-a.e., \(\{\alpha _{\dot{h}}(t_k)\}\) is a Cauchy sequence so it converges to some limit denoted as \(\alpha _{\dot{h}}(1)\). Hence, \(\gamma \)-a.e, \(\mathcal {D}X_{t_k}[\dot{h}]\) converges to \(\alpha _{\dot{h}}(1)\) for any \(\dot{h}\in H\). In particular, \(\mathcal {D}X_{t_k}[\dot{h}]\) converges to \(\alpha _{\dot{h}}(1)\) in probability.

It follows that, \(\gamma \)-a.e., \(\alpha _{\dot{h}}(1)=\mathcal {D}X_{1}[\dot{h}]\). Since \(\alpha _{\dot{h}}(t)=\mathcal {D}X_{t}[\dot{h}]\) for all \(t\in [0,1)\) we conclude that, \(\gamma \)-a.e., for any \(\dot{h}\in H\), \(\mathcal {D}X_{t}[\dot{h}]\) converges to \(\mathcal {D}X_{1}[\dot{h}]\). By the Banach-Steinhaus theorem, \(\gamma \)-a.e., the limiting operator \(\mathcal {D}X_{1}:H\rightarrow \mathbb {R}^d\) is linear and continuous with \(|\mathcal {D}X_{1}|_{\mathcal {L}(H,\mathbb {R}^d)}\le \liminf _{t\uparrow 1} |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}\). The proof is complete. \(\square \)

3.3 Proof of Theorem 3.1

We start by noting that Lemma 2.3 applies in our setting because the moment assumption holds by either convexity or the boundedness of the support (including in the Gaussian mixture case).

Part (1): Combining the results in Proposition 3.10 and plugging in \(t=1\) we get:

  1. (a)

    If \(\kappa S^2\ge 1\):

    $$\begin{aligned} |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \frac{1}{\kappa }\quad \gamma \text {-a.e.} \end{aligned}$$
  2. (b)

    If \(\kappa S^2<1\):

    $$\begin{aligned} |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \left( \frac{e^{1-\kappa S^2}+1}{2}\right) S^2 \quad \gamma \text {-a.e.} \end{aligned}$$
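In case (a), for instance, the constant follows by direct substitution of \(t=1\):

$$\begin{aligned} \frac{t((1-\kappa )t+\kappa )}{\kappa }\bigg |_{t=1}=\frac{(1-\kappa )+\kappa }{\kappa }=\frac{1}{\kappa }. \end{aligned}$$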

This completes the proof.

Part (2): By Lemma 3.4(3),

$$\begin{aligned} \nabla v(t,x)\preceq R^2\text {Id}_d \end{aligned}$$

so the previous arguments of this section apply to show that (2.1) has a unique strong solution in the setting where p is a mixture of Gaussians. In addition, the bound \(\nabla v(t,x)\preceq R^2\text {Id}_d\) implies that

$$\begin{aligned} \lambda _{\max }(\nabla v(t,x))\le \theta _t:=R^2\quad \forall t\in [0,1]. \end{aligned}$$

Hence, repeating the computations earlier in this section yields

$$\begin{aligned} |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}^2\le \int _0^1e^{2\int _s^1\theta _r dr}ds=\frac{e^{2R^2}-1}{2R^2}. \end{aligned}$$
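Explicitly, \(\int _s^1\theta _rdr=R^2(1-s)\), so the last integral evaluates as

$$\begin{aligned} \int _0^1e^{2R^2(1-s)}ds=\left[ -\frac{1}{2R^2}e^{2R^2(1-s)}\right] _{s=0}^{s=1}=\frac{e^{2R^2}-1}{2R^2}. \end{aligned}$$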

4 Contraction properties of Brownian transport maps for log-concave measures

In this section we suppose that p is an isotropic log-concave measure with compact support. Our main result, Theorem 4.2, bounds the norms of the derivative of the Brownian transport map (Theorem 1.4). The proof of Theorem 4.2 relies on the result of [38] and the technique of [18], which is based on the stochastic localization of Eldan; see also [25, 39, 49].

4.1 Preliminaries

We start by explaining the connection between stochastic localization and the Föllmer process. Recall that the Föllmer process is the solution \((X_t)_{t\in [0,1]}\) to the stochastic differential equation (2.1):

$$\begin{aligned} dX_t=\nabla \log P_{1-t}f(X_t)dt+dB_t,\quad X_0=0 \end{aligned}$$

and has the property that \(X_1\sim p\) where \(dp=fd\gamma _d\). We also recall the definition (3.1):

$$\begin{aligned} dp^{x,t}(y)= \frac{f(y)\varphi ^{x,t}(y)}{P_tf(x)}dy. \end{aligned}$$

Let us denote by \(p_t\) the (random) law of \(X_1|X_t\), that is, \(\int _{\mathbb {R}^d} \eta dp_t=\mathbb {E}_{\gamma }[\eta (X_1)|X_t]\) a.s. for all \(\eta :\mathbb {R}^d\rightarrow \mathbb {R}\) continuous and bounded. The next lemma establishes the connection between the Föllmer process and stochastic localization. The proof is well-known and we provide it for completeness.

Lemma 4.1

For \(t\in [0,1)\) the random law \(p_t\) has a density with respect to the Lebesgue measure, denoted by \(p_t(y)\), which satisfies \(p_t(y)dy=dp^{X_t,1-t}(y)\). Further, given \(y\in \mathbb {R}^d\) the random process \((p_t(y))_{t\in [0,1)}\) satisfies the stochastic differential equation

$$\begin{aligned} dp_t(y)=p_t(y)\left\langle \frac{y-\int zdp_t(z)}{1-t},dB_t\right\rangle . \end{aligned}$$
(4.1)

In stochastic localization (in its simplified setting), equation (4.1), up to the time-change \(t\mapsto \frac{1}{1-t}-1\), serves as the definition of the process. We refer to [26, 50], and [40, section 4] for more information.

Proof of Lemma 4.1

Let \((X_t)_{t\in [0,1]}\) be the Föllmer process and let \(\mu \) be its associated measure on the Wiener space \(\Omega \): \(\frac{d\mu }{d\gamma }(\omega )=f(\omega _1)\) for \(\omega \in \Omega \). Then, for any \(\eta :\mathbb {R}^d\rightarrow \mathbb {R}\) continuous and bounded, we have

$$\begin{aligned} \mathbb {E}_{\gamma }[\eta (X_1)|X_t]&=\mathbb {E}_{\mu }[\eta (\omega _1)|X_t]=\frac{\mathbb {E}_{\gamma }\left[ \frac{d\mu }{d\gamma }(\omega _1)\eta (\omega _1)\bigg |X_t\right] }{\mathbb {E}_{\gamma }\left[ \frac{d\mu }{d\gamma }(\omega _1)\bigg |X_t\right] }\\&=\frac{\mathbb {E}_{\gamma }[f(\omega _1)\eta (\omega _1)|X_t]}{\mathbb {E}_{\gamma }[f(\omega _1)|X_t]}=\frac{P_{1-t}(f\eta )(X_t)}{P_{1-t}f(X_t)}\\&=\frac{1}{P_{1-t}f(X_t)}\int _{\mathbb {R}^d}\eta (y)f(y)\frac{\exp \left( -\frac{|y-X_t|^2}{2(1-t)}\right) }{(2\pi (1-t))^{d/2}} dy\\&=\int _{\mathbb {R}^d}\eta (y)p^{X_t,1-t}(y)dy. \end{aligned}$$

It follows that \(p_t=p^{X_t,1-t}\) with density

$$\begin{aligned} p_t(y)=\frac{f(y)}{P_{1-t}f(X_t)}\frac{\exp \left( -\frac{|y-X_t|^2}{2(1-t)}\right) }{(2\pi (1-t))^{d/2}} \end{aligned}$$

which is well-defined for all \(t\in [0,1)\). Fix \(y\in \mathbb {R}^d\) and let

$$\begin{aligned} \alpha (t,x):=\frac{1}{P_{1-t}f(x)}\quad \text {and} \quad \beta (t,x):=\frac{\exp \left( -\frac{|y-x|^2}{2(1-t)}\right) }{(2\pi (1-t))^{d/2}} \end{aligned}$$

so that \(p_t(y)=\alpha (t,X_t)\beta (t,X_t)\). By the heat equation,

$$\begin{aligned} \partial _t\alpha (t,x)&=-\frac{\partial _t P_{1-t}f(x)}{P_{1-t}f(x)^2}=\frac{1}{2}\frac{\Delta P_{1-t}f(x)}{P_{1-t}f(x)^2},\\ \partial _t\beta (t,x)&=\frac{\exp \left( -\frac{|y-x|^2}{2(1-t)}\right) }{(2\pi (1-t))^{d/2}}\left\{ \frac{d}{2}\frac{1}{1-t}-\frac{|y-x|^2}{2(1-t)^2}\right\} ,\\ \nabla \alpha (t,x)&=-\frac{\nabla P_{1-t}f(x)}{P_{1-t}f(x)^2},\quad \nabla \beta (t,x)=\frac{\exp \left( -\frac{|y-x|^2}{2(1-t)}\right) }{(2\pi (1-t))^{d/2}}\frac{y-x}{1-t},\\ \Delta \alpha (t,x)&=-\frac{\Delta P_{1-t}f(x)}{P_{1-t}f(x)^2}+2\frac{|\nabla P_{1-t}f(x)|^2}{P_{1-t}f(x)^3},\\ \Delta \beta (t,x)&=\frac{\exp \left( -\frac{|y-x|^2}{2(1-t)}\right) }{(2\pi (1-t))^{d/2}}\left\{ \frac{|y-x|^2}{(1-t)^2}-\frac{d}{1-t}\right\} , \end{aligned}$$

and hence,

$$\begin{aligned}&\partial _t[\alpha (t,x)\beta (t,x)]=p_t(y)\left\{ \frac{1}{2}\frac{\Delta P_{1-t}f(x)}{P_{1-t}f(x)}+\frac{d}{2}\frac{1}{1-t}-\frac{|y-x|^2}{2(1-t)^2}\right\} ,\\&\nabla [\alpha (t,x)\beta (t,x)]=p_t(y)\left\{ -\frac{\nabla P_{1-t}f(x)}{P_{1-t}f(x)}+\frac{y-x}{1-t}\right\} ,\\&\frac{1}{2}\Delta [\alpha (t,x)\beta (t,x)]= p_t(y)\left\{ -\frac{1}{2}\frac{\Delta P_{1-t}f(x)}{P_{1-t}f(x)}+\frac{|\nabla P_{1-t}f(x)|^2}{P_{1-t}f(x)^2}\right. \\&\quad \left. -\left\langle \frac{\nabla P_{1-t}f(x)}{P_{1-t}f(x)},\frac{y-x}{1-t}\right\rangle +\frac{|y-x|^2}{2(1-t)^2}-\frac{d}{2(1-t)}\right\} . \end{aligned}$$

It follows from Itô’s formula that

$$\begin{aligned} d[\alpha (t,X_t)\beta (t,X_t)]&= p_t(y)\left\{ \frac{|\nabla P_{1-t}f(X_t)|^2}{P_{1-t}f(X_t)^2}-\left\langle \frac{\nabla P_{1-t}f(X_t)}{P_{1-t}f(X_t)},\frac{y-X_t}{1-t}\right\rangle \right\} dt\\&\quad +p_t(y)\left\langle -\frac{\nabla P_{1-t}f(X_t)}{P_{1-t}f(X_t)}+\frac{y-X_t}{1-t},dX_t\right\rangle \\&=p_t(y)\left\langle -\frac{\nabla P_{1-t}f(X_t)}{P_{1-t}f(X_t)}+\frac{y-X_t}{1-t},dB_t\right\rangle . \end{aligned}$$

By integration by parts,

$$\begin{aligned} \frac{\nabla P_{1-t}f(x)}{P_{1-t}f(x)}=\int \frac{z-x}{1-t}dp^{x,1-t}(z)\Longrightarrow -\frac{\nabla P_{1-t}f(X_t)}{P_{1-t}f(X_t)}=\frac{X_t-\int zdp_t(z)}{1-t}, \end{aligned}$$

so

$$\begin{aligned} dp_t(y)=p_t(y)\left\langle \frac{y-\int zdp_t(z)}{1-t},dB_t\right\rangle . \end{aligned}$$

\(\square \)
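As a quick illustration of the dynamics (2.1) (not used in the sequel), the Föllmer process can be simulated directly whenever \(P_{1-t}f\) is computable. A minimal sketch in Python for the hypothetical one-dimensional target \(dp\propto e^{-y^2/2-y^4/4}dy\), i.e., \(f(y)\propto e^{-y^4/4}\), with the heat semigroup evaluated by Gauss–Hermite quadrature; all discretization choices below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# standard-normal quadrature nodes/weights from Gauss-Hermite
u, w = np.polynomial.hermite.hermgauss(40)
z, w = np.sqrt(2.0) * u, w / np.sqrt(np.pi)   # E[g(Z)] ~ (w * g(z)).sum()

f  = lambda y: np.exp(-y**4 / 4)              # dp/dgamma_1, up to normalization
fp = lambda y: -y**3 * f(y)                   # f'

def drift(t, x):
    # v(t,x) = P_{1-t}f'(x) / P_{1-t}f(x) = grad log P_{1-t}f(x)
    y = x[:, None] + np.sqrt(1.0 - t) * z[None, :]
    return (fp(y) * w).sum(axis=1) / (f(y) * w).sum(axis=1)

# Euler-Maruyama discretization of dX_t = v(t, X_t) dt + dB_t, X_0 = 0
n_paths, n_steps = 5_000, 500
dt = 1.0 / n_steps
X = np.zeros(n_paths)
for j in range(n_steps):
    X += drift(j * dt, X) * dt + np.sqrt(dt) * rng.standard_normal(n_paths)

# second moment of X_1 vs. that of p (computed on a grid)
g = np.linspace(-4.0, 4.0, 8001)
q = np.exp(-g**2 / 2) * f(g); q /= q.sum()
print((X**2).mean(), (g**2 * q).sum())        # agree up to Euler/Monte Carlo error
```

The empirical law of \(X_1\) then matches p up to discretization and sampling error, consistent with \(X_1\sim p\).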

4.2 Moments of the derivative of the Brownian transport map

Our next goal is to bound the moments of \(\mathcal {D}X_t\). To this end, we will use the current best bounds in the Kannan–Lovász–Simonovits conjecture. Let \(k\ge 0\) be such that

$$\begin{aligned} C_{\text {kls}}\le a d^k \end{aligned}$$

where \(a>0\) is some dimension-free constant. If the Kannan–Lovász–Simonovits conjecture is true, we can take \(k=\frac{1}{\log d}\) to get \(C_{\text {kls}}\le a e\), which is a dimension-free constant. The result of [38] is that we can take \(k=\frac{\log \log d}{\log d}\), which then yields \(C_{\text {kls}}\le a \log d\).
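Both claims follow by writing powers of d as exponentials:

$$\begin{aligned} d^k=e^{k\log d},\quad \text {so}\quad d^{\frac{1}{\log d}}=e\quad \text {and}\quad d^{\frac{\log \log d}{\log d}}=e^{\log \log d}=\log d. \end{aligned}$$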

Theorem 4.2

(Isotropic log-concave measures) Let p be an isotropic log-concave measure with compact support. Then (2.1) has a unique strong solution on [0, 1]. Further, there exists a universal constant \(\zeta \) such that, for any positive integer m,

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^{2m}\right] \le \zeta ^m (2m+1)! (\log d)^{12m}\quad \forall t\in [0,1]. \end{aligned}$$

Remark 4.3

The assumption in Theorem 4.2 that p has compact support is not important for the application to the Kannan–Lovász–Simonovits conjecture; see [18, section 2.6]. In particular, the bounds in the theorem are independent of the size of the support of p.

Proof

By Proposition 3.10, there exists a unique strong solution \((X_t)\) to (2.1) for all \(t\in [0,1]\) with \(X_1\sim p\) and, for any \(m>0\),

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^{2m}\right] \le \mathbb {E}_{\gamma }\left[ \left( \int _0^t e^{2\int _s^t\lambda _{\max }(\nabla v(r,X_r))dr} ds\right) ^m\right] \quad \forall t\in [0,1]. \end{aligned}$$
(4.2)

Hence, our goal is to upper bound the right-hand side of the inequality above. Given \(\alpha >2\) define the stopping time

$$\begin{aligned} \tau :=r_0 \wedge \inf \{r\in [0,1]:\lambda _{\max }(\nabla v(r,X_r))\ge \alpha \} \end{aligned}$$

for some \(r_0\in \left[ 0,\frac{t}{2}\right] \) to be chosen later. By Lemma 3.4(2) (with \(\kappa =0\)), we have \(\lambda _{\max }(\nabla v(r,X_r))\le \frac{1}{r}\) for all \(r\in [0,1]\) while, on the other hand, \(\lambda _{\max }(\nabla v(r,X_r))\le \alpha \) for \(r\in [0,\tau ]\) by the definition of \(\tau \). Hence,

$$\begin{aligned} \int _s^t\lambda _{\max }(\nabla v(r,X_r))dr= & {} \int _s^{\tau }\lambda _{\max }(\nabla v(r,X_r))dr+\int _{\tau }^t\lambda _{\max }(\nabla v(r,X_r))dr\le \alpha r_0\\{} & {} +\int _{\tau }^t\frac{1}{r}dr=\alpha r_0+\log t-\log \tau \end{aligned}$$

so it follows that

$$\begin{aligned} e^{2\int _s^t\lambda _{\max }(\nabla v(r,X_r))dr} \le e^{2\alpha r_0}\frac{t^2}{\tau ^2}. \end{aligned}$$

We conclude that

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ \left( \int _0^t e^{2\int _s^t\lambda _{\max }(\nabla v(r,X_r))dr} ds\right) ^m\right] \le e^{2m\alpha r_0}t^{2m}\mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{2m}}\right] , \end{aligned}$$

and hence, by (4.2),

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^{2m}\right] \le e^{2m\alpha r_0}\mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{2m}}\right] t^{2m}. \end{aligned}$$
(4.3)

In light of (4.3), we need to choose \(\alpha , r_0\) appropriately and show that \(\mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{2m}}\right] \) can be suitably bounded. The control of the moments of \(\frac{1}{\tau }\) will rely on showing that this random variable has a sub-exponential tail.

Lemma 4.4

Suppose there exist nonnegative constants (possibly dimension dependent) \(b_{\alpha },c_{\alpha }\) such that

$$\begin{aligned} \mathbb {P}_{\gamma }\left[ \frac{1}{\tau }\ge r\right] \le c_{\alpha }e^{-b_{\alpha }r}\quad \forall r\in \left[ \frac{1}{r_0},\infty \right) . \end{aligned}$$

Then,

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{m}}\right] \le \frac{1}{r_0^m}\left[ 1+\left( \frac{1}{b^m_{\alpha }}+1\right) m!c_{\alpha }me^{-\frac{b_{\alpha }}{r_0}}\right] . \end{aligned}$$

Proof

We will apply the identity \(\mathbb {E}[Y^m]=m\int _0^{\infty } y^{m-1}\mathbb {P}[ Y\ge y]dy\), for a nonnegative random variable Y, with \(Y=\frac{1}{\tau }\). By the definition of \(\tau \), \(\mathbb {P}_{\gamma }\left[ \frac{1}{\tau }\ge s\right] =1\) for \(s\in \left[ 0,\frac{1}{r_0}\right] \) so, for any positive integer m,

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{m}}\right]&=m\int _0^{\frac{1}{r_0}}r^{m-1}dr+m\int _{\frac{1}{r_0}}^{\infty } r^{m-1}\mathbb {P}_{\gamma }\left[ \frac{1}{\tau }\ge r\right] dr\\&\le \frac{1}{r_0^m}+c_{\alpha }m\int _{\frac{1}{r_0}}^{\infty } r^{m-1} e^{-b_{\alpha }r}dr\\&=\frac{1}{r_0^m}+\frac{c_{\alpha }m}{b_{\alpha }^m}\int _{\frac{b_{\alpha }}{r_0}}^{\infty } r^{m-1} e^{-r}dr=\frac{1}{r_0^m}+\frac{c_{\alpha }m(m-1)!}{b_{\alpha }^m}e^{-\frac{b_{\alpha }}{r_0}}\sum _{j=0}^{m-1}\frac{(b_{\alpha })^j}{r_0^jj!} \end{aligned}$$

where we used the incomplete Gamma function identity \(\int _x^{\infty }r^{m-1}e^{-r}dr=(m-1)!e^{-x}\sum _{j=0}^{m-1}\frac{x^j}{j!}\) when m is a positive integer. Using

$$\begin{aligned} \frac{1}{j!}\le 1,\quad \quad b_{\alpha }^j\le b_{\alpha }^m+1, \quad \text {and}\quad \frac{1}{r_0^j}\le \frac{1}{r_0^m} ~\text { (as } r_0\in [0,1]) \end{aligned}$$

we have \(\sum _{j=0}^{m-1}\frac{(b_{\alpha })^j}{r_0^jj!}\le m\frac{b_{\alpha }^m+1}{r_0^m}\) and hence

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{m}}\right] \le \frac{1}{r_0^m}\left[ 1+\left( \frac{1}{b_{\alpha }^m}+1\right) m!c_{\alpha }me^{-\frac{b_{\alpha }}{r_0}}\right] . \end{aligned}$$

\(\square \)
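The incomplete Gamma identity invoked in the proof can be sanity-checked numerically; a minimal sketch (the values of m and x are arbitrary):

```python
import math
import numpy as np

m, x = 4, 2.5
r = np.linspace(x, x + 60.0, 600_001)        # truncate the tail integral
y = r ** (m - 1) * np.exp(-r)
lhs = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(r)))
rhs = math.factorial(m - 1) * math.exp(-x) * sum(x**j / math.factorial(j)
                                                 for j in range(m))
print(lhs, rhs)                              # both ~4.545
```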

In light of Lemma 4.4, our goal is to prove that \(\frac{1}{\tau }\) has a sub-exponential tail, which requires a better understanding of the stopping time \(\tau \). To simplify notation let \(K_t:=\text {Cov}(p^{X_t,1-t})\) and recall the representation (3.2),

$$\begin{aligned} \nabla v(t,X_t)=\frac{1}{(1-t)^2}K_t-\frac{1}{1-t}\text {Id}_d. \end{aligned}$$

Hence,

$$\begin{aligned} \lambda _{\max }(\nabla v(t,X_t))=\frac{\lambda _{\max }(K_t)}{(1-t)^2}-\frac{1}{1-t} \end{aligned}$$

and

$$\begin{aligned} \tau =r_0\wedge \inf \left\{ r\in [0,1]:\frac{\lambda _{\max }(K_r)}{(1-r)^2}-\frac{1}{1-r}\ge \alpha \right\} . \end{aligned}$$

The quantity \(\lambda _{\max }(K_r)\) is difficult to control so we use the moment method and instead control \(\Gamma _r:=\text {Tr}[K_r^q]\), while noting that, since the eigenvalues of \(K_r\) are nonnegative, \(\lambda _{\max }(K_r)^q\le \text {Tr}[K_r^q]\), so \(\lambda _{\max }(K_r)\le \Gamma _r^{\frac{1}{q}}\) for any \(q\ge 1\). The process \((\Gamma _t)_{t\in [0,1]}\) satisfies a stochastic differential equation

$$\begin{aligned} d\Gamma _t=u_tdB_t+\delta _t dt \end{aligned}$$

for some vector-valued process \((u_t)_{t\in [0,1]}\) and a real-valued process \((\delta _t)_{t\in [0,1]}\). These processes can be derived using Itô’s formula and the stochastic differential equation satisfied by \((K_t)_{t\in [0,1]}\) (which itself can be derived using Itô’s formula). Next, we use the argument in [18] to control the processes \((u_t)\) and \((\delta _t)\).

Lemma 4.5

Suppose \(C_{\text {kls}}\le a d^k\) for \(k\ge 0\) and let \(q:=\lceil \frac{1}{k} \rceil +1\). Then, there exists a universal constant \(c>0\) such that, for any \(r\in \left[ 0,\frac{1}{2}\right] \), we have, a.s.,

$$\begin{aligned}&|u_r|\le cq\Gamma _r^{1+\frac{1}{2q}},\\&\delta _r\le ca^2q^2(\log d)d^{2k-\frac{1}{q}}\Gamma _r^{1+\frac{1}{q}}. \end{aligned}$$

Proof

The statement of the lemma is essentially [18, Lemma 6], up to time-change. To make the connection with [18] we recall that, by Lemma 4.1,

$$\begin{aligned} dp_r(x)=p_r(x)\left\langle \frac{x-a_r}{1-r},dB_r\right\rangle , \end{aligned}$$

where \(a_r:=\int zdp_r(z)\), and that \(\Gamma _r=\text {Tr}\left[ (\text {Cov}(p_r))^q\right] \). On the other hand, the arguments of [18] use the measure-valued process [18, Equation (13)]:

$$\begin{aligned} d{\tilde{p}}_r(x)={\tilde{p}}_r(x)\langle x-{\tilde{a}}_r,dB_r\rangle ,\quad {\tilde{p}}_0=p \end{aligned}$$
(4.4)

with \({\tilde{a}}_r:=\int _{\mathbb {R}^d}zd{\tilde{p}}_r(z)\). The connection between \((p_r)\) and \(({\tilde{p}}_r)\) is via a time change: set \(s(r):=\frac{1}{1-r}-1\) and note that

$$\begin{aligned} d{\tilde{p}}_{s(r)}(x)=\sqrt{s'(r)}{\tilde{p}}_{s(r)}(x)\langle x-{\tilde{a}}_{s(r)},dB_r\rangle =\frac{{\tilde{p}}_{s(r)}}{1-r}\langle x-{\tilde{a}}_{s(r)},dB_r\rangle . \end{aligned}$$

Since the Eq. (4.4) has a unique strong solution [18, Lemma 3], it follows that \(p_r={\tilde{p}}_{s(r)}\) a.s. if the same driving Brownian motion is used. In particular, with \({\tilde{\Gamma }}_r:=\text {Tr}\left[ (\text {Cov}({\tilde{p}}_r))^q\right] \), we have \(\Gamma _r={\tilde{\Gamma }}_{s(r)}\). By [18, Lemma 6],

$$\begin{aligned} d{\tilde{\Gamma }}_r={\tilde{u}}_rdB_r+{\tilde{\delta }}_r dr \end{aligned}$$

for some vector-valued process \(({\tilde{u}}_r)_{r\in [0,1]}\) and a real-valued process \(({\tilde{\delta }}_r)_{r\in [0,1]}\) which satisfy

$$\begin{aligned}&|{\tilde{u}}_r|\le 16q{\tilde{\Gamma }}_r^{1+\frac{1}{2q}},\\&{\tilde{\delta }}_r\le 64a^2q^2(\log d)d^{2k-\frac{1}{q}}{\tilde{\Gamma }}_r^{1+\frac{1}{q}}. \end{aligned}$$

Hence, as

$$\begin{aligned} d\Gamma _r=d{\tilde{\Gamma }}_{s(r)}=\sqrt{s'(r)}{\tilde{u}}_{s(r)}dB_r+s'(r){\tilde{\delta }}_{s(r)}dr, \end{aligned}$$

we get \(u_r=\sqrt{s'(r)}{\tilde{u}}_{s(r)}\) and \(\delta _r=s'(r){\tilde{\delta }}_{s(r)}\) a.s. The proof is complete by noting that \(s'(r)=(1-r)^{-2}\) and \(\sqrt{s'(r)}\) are bounded by 4 and 2, respectively, on \(\left[ 0,\frac{1}{2}\right] \). \(\square \)

Extending the analysis of [18], we can use Lemma 4.5 to show that \(\frac{1}{\tau }\) has a sub-exponential tail.

Lemma 4.6

Suppose \(C_{\text {kls}}\le a d^k\) for \(k\ge 0\) and let \(q:=\lceil \frac{1}{k} \rceil +1\). There exists a universal constant c such that, with \(\alpha =2d^{\frac{1}{q}}\), we have

$$\begin{aligned} \mathbb {P}_{\gamma }\left[ \frac{1}{\tau }\ge r\right] \le c_{\alpha }e^{-b_{\alpha }r}\quad \forall r\in \left[ \frac{1}{r_0},\infty \right) , \end{aligned}$$

with

$$\begin{aligned} c_{\alpha }=\exp \left( \frac{2a^2q(\log d)d^{2k-\frac{1}{q}}}{c}\right) \quad \text {and}\quad b_{\alpha }=\frac{1}{2c^2d^{\frac{1}{q}}}. \end{aligned}$$

Proof

For \(s\le r_0\) we have,

$$\begin{aligned} \mathbb {P}_{\gamma }[\tau \le s]=\mathbb {P}_{\gamma }\left[ \sup _{r\in [0,s]}\lambda _{\max }(\nabla v(r,X_r))\ge \alpha \right] \le \mathbb {P}_{\gamma }\left[ \sup _{r\in [0,s]}\lambda _{\max }(K_r)\ge 2\alpha \right] \end{aligned}$$

where the last inequality uses that \(r_0\le \frac{1}{2}\) and that \(\alpha >2\). Recalling that \(\lambda _{\max }(K_r)\le \Gamma _r^{\frac{1}{q}}\) we get,

$$\begin{aligned} \mathbb {P}_{\gamma }[\tau \le s]\le \mathbb {P}_{\gamma }\left[ \sup _{r\in [0,s]}\Gamma _r\ge (2\alpha )^q\right] =\mathbb {P}_{\gamma }\left[ \sup _{r\in [0,s]} \Theta _r\ge (2\alpha )^q\right] \end{aligned}$$

where \((\Theta _r)\) is the stopped process given by

$$\begin{aligned} \Theta _r:=1_{r<\theta }\Gamma _r+1_{r\ge \theta }(2\alpha )^q\quad \text {with}\quad \theta :=\inf \{r: \Gamma _r\ge (2\alpha )^q\}. \end{aligned}$$

Let \(\eta (x)=-x^{-\frac{1}{2q}}\) and note that \(\eta \) is monotonically increasing on \((0,\infty )\) so

$$\begin{aligned} \mathbb {P}_{\gamma }[\tau \le s]\le \mathbb {P}_{\gamma }\left[ \sup _{r\in [0,s]}\eta ( \Theta _r)\ge \eta ((2\alpha )^q)\right] = \mathbb {P}_{\gamma }\left[ \sup _{r\in [0,s]}\eta (\Theta _r)\ge - \frac{1}{\sqrt{2\alpha }}\right] . \end{aligned}$$

Moreover, since \(\eta (\Theta _r)=-\frac{1}{\sqrt{2\alpha }}\) for \(r\ge \theta \), we have

$$\begin{aligned} \mathbb {P}_{\gamma }[\tau \le s]\le \mathbb {P}_{\gamma }\left[ \sup _{r\in [0,\min (\theta ,s)]}\eta (\Theta _r)\ge - \frac{1}{\sqrt{2\alpha }}\right] . \end{aligned}$$
(4.5)

Applying Itô’s formula to \(\eta (\Theta _r)\), and using Lemma 4.5 as well as \(\Gamma _r=\Theta _r\) for \(r\le \theta \), we get, for \(r\le \theta \),

$$\begin{aligned} d\eta (\Theta _r)&=\frac{1}{2q\Gamma _r^{1+\frac{1}{2q}}}d\Gamma _r-\frac{1}{2}\frac{1}{2q}\left( \frac{1}{2q}+1\right) \frac{1}{\Gamma _r^{\frac{1}{2q}+2}}d[\Gamma ]_r\le \frac{1}{2q\Gamma _r^{1+\frac{1}{2q}}}d\Gamma _r\\&=\frac{1}{2q\Gamma _r^{1+\frac{1}{2q}}}u_rdB_r+\frac{1}{2q\Gamma _r^{1+\frac{1}{2q}}}\delta _r dr\\&\le \frac{1}{2q\Gamma _r^{1+\frac{1}{2q}}}u_rdB_r+ \frac{1}{2}ca^2q(\log d)d^{2k-\frac{1}{q}}\Gamma _r^{\frac{1}{2q}}dr\\&= \frac{1}{2q\Theta _r^{1+\frac{1}{2q}}}u_rdB_r+ \frac{1}{2}ca^2q(\log d)d^{2k-\frac{1}{q}}\Theta _r^{\frac{1}{2q}}dr. \end{aligned}$$

Define the martingale \(M_s:=\int _0^s\frac{1}{2q\Theta _r^{1+\frac{1}{2q}}}u_rdB_r\) and note that, since p is isotropic, we have \(\eta (\Theta _0)=\eta (\Gamma _0)=\eta (d)=-d^{-\frac{1}{2q}}\). Hence,

$$\begin{aligned} \eta (\Theta _s)\le & {} -d^{-\frac{1}{2q}}+M_s+\int _0^s\frac{1}{2}ca^2q(\log d)d^{2k-\frac{1}{q}}\Theta _r^{\frac{1}{2q}}dr\le -d^{-\frac{1}{2q}}+M_s\\{} & {} +\frac{1}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{2}\sqrt{\alpha }, \end{aligned}$$

where the last inequality holds by the definition of \((\Theta _r)\). Plugging this estimate into (4.5) yields

$$\begin{aligned} \mathbb {P}_{\gamma }[\tau \le s]\le \mathbb {P}_{\gamma }\left[ \sup _{r\in [0,\min (\theta ,s)]}M_r\ge - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] . \end{aligned}$$

By the Dubins-Schwarz theorem we have \(M_s=Z_{[M]_s}\) with \((Z_s)\) a standard Brownian motion in \(\mathbb {R}\), and by Lemma 4.5,

$$\begin{aligned} [M]_s=\int _0^s\frac{1}{4q^2\Theta _r^{2+\frac{1}{q}}}|u_r|^2dr\le \frac{c^2}{4}s. \end{aligned}$$

Hence,

$$\begin{aligned}&\mathbb {P}_{\gamma }\left[ \sup _{r\in [0,\min (\theta ,s)]}M_r\ge - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] \\&\quad =\mathbb {P}_{\gamma }\left[ \sup _{r\in [0,\min (\theta ,s)]}Z_{[M]_r}\ge - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] \\&\quad \le \mathbb {P}_{\gamma }\left[ \sup _{r\in [0,\frac{c^2}{4}\min (\theta ,s)]}Z_r\ge - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] \\&\quad \le \mathbb {P}_{\gamma }\left[ \sup _{r\in [0,\frac{c^2}{4}s]}Z_r\ge - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] . \end{aligned}$$

Applying Doob’s maximal inequality for Brownian motion, which gives \(\mathbb {P}[\sup _{r\in [0,T]}Z_r\ge \lambda ]\le e^{-\frac{\lambda ^2}{2T}}\) for \(\lambda \ge 0\), we get

$$\begin{aligned}&\mathbb {P}_{\gamma }\left[ \sup _{r\in [0,\frac{c^2}{4}s]}Z_r\ge - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] \\&\quad \le \exp \left( -2\frac{\left[ - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] ^2}{c^2s}\right) . \end{aligned}$$

Now let \(\alpha :=2d^{\frac{1}{q}}\) so that

$$\begin{aligned}&\left[ - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] ^2=\left[ \frac{1}{2d^{\frac{1}{2q}}}-sca^2q(\log d)d^{2k-\frac{1}{2q}}\right] ^2\\&\quad =\frac{1}{4d^{\frac{1}{q}}}-sca^2q(\log d)d^{2k-\frac{1}{q}}+s^2c^2a^4q^2(\log d)^2d^{4k-\frac{1}{q}}. \end{aligned}$$

Omitting the (positive) last term above we get

$$\begin{aligned}&\mathbb {P}_{\gamma }\left[ \sup _{r\in [0,\frac{c^2}{4}s]}Z_r\ge - \frac{1}{\sqrt{2\alpha }}+d^{-\frac{1}{2q}}-\frac{\sqrt{2}}{2}sca^2q(\log d)d^{2k-\frac{1}{q}}\sqrt{\alpha }\right] \\&\quad \le \exp \left( -\frac{1}{2c^2sd^{\frac{1}{q}}}\right) \exp \left( \frac{2a^2q(\log d)d^{2k-\frac{1}{q}}}{c}\right) . \end{aligned}$$

\(\square \)

We now complete the proof of the theorem. By Lemmas 4.4 and 4.6,

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{m}}\right] \le \frac{1}{r_0^m}\left[ 1+\left( \frac{1}{b^m_{\alpha }}+1\right) m!m\exp \left( \frac{2a^2q(\log d)d^{2k-\frac{1}{q}}}{c}\right) \exp \left( -\frac{1}{2c^2d^{\frac{1}{q}}r_0}\right) \right] . \end{aligned}$$

We will choose \(r_0\in \left[ 0,\frac{t}{2}\right] \) such that the two exponentials cancel each other. Setting

$$\begin{aligned} r_0=\frac{t}{4qca^2(\log d)d^{2k}} \end{aligned}$$

we get

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{m}}\right] \le \left( \frac{4qca^2(\log d)d^{2k}}{t}\right) ^m\left[ 1+\left( \frac{1}{b^m_{\alpha }}+1\right) m!m\right] . \end{aligned}$$

By [38, Theorem 1.2], we may take \(k=\frac{\log \log d}{\log d}\), and hence, \(q=\lceil \frac{1}{k}\rceil +1=c'\frac{\log d}{\log \log d}\) for some constant \(c'\). By increasing \(c'\), we may assume that \(2k=\frac{2}{q-1}\). Hence, using \(\frac{1}{b^m_{\alpha }}+1\le \frac{2}{b^m_{\alpha }}=2^{m+1}c^{2m}d^{\frac{m}{q}}\) and \(\left[ 1+\left( \frac{1}{b^m_{\alpha }}+1\right) m!m\right] \le 2\left( \frac{1}{b^m_{\alpha }}+1\right) m!m\le 2^{m+2}c^{2m}m!md^{\frac{m}{q}}\), we get

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{m}}\right]&\le \left( \frac{4qca^2(\log d)d^{2k}}{t}\right) ^m2^{m+2}c^{2m}m!md^{\frac{m}{q}}\\&\le \frac{1}{t^m}(32)^mc^{3m}a^{2m}m!m[q(\log d)d^{\frac{2}{q-1}+\frac{1}{q}}]^m\\&\le \frac{1}{t^m}(32)^mc^{3m}a^{2m}m!m[(\log d)qd^{\frac{4}{q-1}}]^m. \end{aligned}$$

We have

$$\begin{aligned}&q^m=(c')^m\left( \frac{\log d}{\log \log d}\right) ^m\le (c')^m (\log d)^m,\\&d^{\frac{4m}{q-1}}=d^{4mk}=(\log d)^{4m}, \end{aligned}$$

so

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ \frac{1}{\tau ^{m}}\right] \le m! m[16c^3a^2]^m\frac{(\log d)^{6m}}{t^m}, \end{aligned}$$

where we adjusted the constants so as to absorb \(c'\), keeping the notation \(a>0\). By (4.3),

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ |\mathcal {D}X_t|_{\mathcal {L}(H,\mathbb {R}^d)}^{2m}\right]&\le e^{2m\alpha r_0}(2m)! 2m[32c^3a^2]^{2m}(\log d)^{12m}\\&=\exp \left( 2m\frac{2d^{\frac{1}{q}}t}{4qca^2(\log d)d^{\frac{2}{q-1}}} \right) (2m)! 2m[32c^3a^2]^{2m}(\log d)^{12m}\\&\le \exp \left( m\frac{1}{cqa^2(\log d)} \right) (2m)! 2m[32c^3a^2]^{2m}(\log d)^{12m}\\&\le e^{c''m}(2m)! 2m[32c^3a^2]^{2m}(\log d)^{12m}, \end{aligned}$$

where we used that \(d^{\frac{1}{q}-\frac{2}{q-1}}<1\), and that \(\exp \left( m\frac{1}{cqa^2(\log d)} \right) \le e^{c''m}\) for some constant \(c''\) independent of d. Taking

$$\begin{aligned} \zeta :=[32c^3a^2e^{c''}]^2, \end{aligned}$$

and using \(2m(2m)!\le (2m+1)!\), completes the proof. \(\square \)

5 Functional inequalities

The contraction properties provided by Theorem 3.1 and Theorem 4.2 allow us to prove functional inequalities for measures on Euclidean spaces. The main goal of this section is to demonstrate the power of the contraction machinery developed in this paper, rather than to be exhaustive, so we focus on a selection of functional inequalities. As a consequence of the almost-sure contraction of Theorem 3.1, we will prove \(\Psi \)-Sobolev inequalities (Theorem 5.3), q-Poincaré inequalities (Theorem 5.4), and isoperimetric inequalities (Theorem 5.5). As a consequence of the contraction in expectation of Theorem 4.2, we will construct Stein kernels and prove central limit theorems (Theorem 1.6 and Corollary 1.7).

We start with almost-sure contractions; the next lemma describes the behavior of derivatives under such contractions.

Lemma 5.1

Let \(\Upsilon :\Omega \rightarrow \mathbb {R}^d\) be an almost-sure contraction with constant C and let \(\eta :\mathbb {R}^d\rightarrow \mathbb {R}\) be a continuously differentiable Lipschitz function. Then,

$$\begin{aligned} D(\eta \circ \Upsilon )=(\mathcal {D}\Upsilon )^*\nabla \eta (\Upsilon )\quad \gamma \text {-a.e.} \end{aligned}$$

where \((\mathcal {D}\Upsilon )^*:\mathbb {R}^d\rightarrow H\) is the adjoint of \(\mathcal {D}\Upsilon \). Further,

$$\begin{aligned} |D(\eta \circ \Upsilon )|_H\le C |\nabla \eta (\Upsilon )|\quad \gamma \text {-a.e.} \end{aligned}$$

Proof

To compute \(D(\eta \circ \Upsilon )\) we note that, by duality, it can be viewed as the operator \(\mathcal {D}(\eta \circ \Upsilon ):H\rightarrow \mathbb {R}\) acting on \(h\in H\) by \(\mathcal {D}(\eta \circ \Upsilon )[h]=\langle D(\eta \circ \Upsilon ),h\rangle _H\). By the chain rule [59, Proposition 1.2.3],

$$\begin{aligned} \langle D(\eta \circ \Upsilon ),h\rangle _H= & {} \int _0^1(\nabla \eta (\Upsilon ))^*D_t\Upsilon \dot{h}_tdt=\langle \nabla \eta (\Upsilon ),\mathcal {D}\Upsilon [h]\rangle \\ {}= & {} \langle (\mathcal {D}\Upsilon )^*\nabla \eta (\Upsilon ),h\rangle _H \end{aligned}$$

so \( D(\eta \circ \Upsilon )= (\mathcal {D}\Upsilon )^*\nabla \eta (\Upsilon )\). Next, using

$$\begin{aligned} D(\eta \circ \Upsilon )=(\mathcal {D}\Upsilon )^*\nabla \eta (\Upsilon )\quad \gamma \text {-a.e.}, \end{aligned}$$

the bound \(|(\mathcal {D}\Upsilon )^*|_{\mathcal {L}(\mathbb {R}^d,H)}=|\mathcal {D}\Upsilon |_{\mathcal {L}(H,\mathbb {R}^d)}\le C\) (Lemma 2.3) implies

$$\begin{aligned} |D(\eta \circ \Upsilon )|_H\le C |\nabla \eta (\Upsilon )|\quad \gamma \text {-a.e.} \end{aligned}$$

\(\square \)

With Lemma 5.1 in hand we can now start the proofs of the functional inequalities which follow from Theorem 3.1. We begin with the \(\Psi \)-Sobolev inequalities [14].

Definition 5.2

Let \(\mathcal {I}\) be a closed interval (possibly unbounded) and let \(\Psi :\mathcal {I}\rightarrow \mathbb {R}\) be a twice-differentiable function. We say that \(\Psi \) is a divergence if each of the functions \(\Psi , \Psi '', -\frac{1}{\Psi ''}\) is convex. Given a probability measure \(\nu \) on \(\mathbb {R}^d\) and a function \(\eta :\mathbb {R}^d\rightarrow \mathcal {I}\), such that \(\int \eta \, d\nu \in \mathcal {I}\), we define

$$\begin{aligned} \text {Ent}_{\nu }^{\Psi }(\eta ):=\int _{\mathbb {R}^d}\Psi (\eta )d\nu -\Psi \left( \int _{\mathbb {R}^d}\eta \,d\nu \right) . \end{aligned}$$

Some classical examples of divergences are \(\Psi :\mathbb {R}\rightarrow \mathbb {R}\) with \(\Psi (x)=x^2\) (Poincaré inequality) and \(\Psi :\mathbb {R}_{\ge 0}\rightarrow \mathbb {R}\) with \(\Psi (x)=x\log x\) (log-Sobolev inequality).
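For instance, with \(\Psi (x)=x^2\) we have \(\Psi ''\equiv 2\) and

$$\begin{aligned} \text {Ent}_{\nu }^{\Psi }(\eta )=\int _{\mathbb {R}^d}\eta ^2d\nu -\left( \int _{\mathbb {R}^d}\eta \,d\nu \right) ^2=\text {Var}_{\nu }[\eta ], \end{aligned}$$

so the first bound of Theorem 5.3(1) below reads \(\text {Var}_p[\eta ]\le \frac{1}{\kappa }\int _{\mathbb {R}^d}|\nabla \eta |^2dp\), a Poincaré inequality with constant \(\frac{1}{\kappa }\).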

Theorem 5.3

(\(\Psi \)-Sobolev inequalities) Let \(\Psi :\mathcal {I}\rightarrow \mathbb {R}\) be a divergence.

  1. (1)

    Let p be a \(\kappa \)-log-concave measure with \(S:=\text {diam}({\textrm{supp}}(p))\) and let \(\eta :\mathbb {R}^d\rightarrow \mathcal {I}\) be any continuously differentiable Lipschitz function such that \(\int \eta ^2\, dp\in \mathcal {I}\).

    • If \(\kappa S^2\ge 1\) then

      $$\begin{aligned} \text {Ent}_p^{\Psi }(\eta )\le \frac{1}{2\kappa }\int _{\mathbb {R}^d}\Psi ''(\eta )|\nabla \eta |^2 dp. \end{aligned}$$
    • If \(\kappa S^2< 1\) then

      $$\begin{aligned} \text {Ent}_{p}^{\Psi }(\eta )\le \frac{e^{1-\kappa S^2}+1}{4}S^2\int _{\mathbb {R}^d}\Psi ''(\eta )|\nabla \eta |^2 dp. \end{aligned}$$
  2. (2)

    Fix a probability measure \(\nu \) on \(\mathbb {R}^d\) supported on a ball of radius R and let \(p:=\gamma _d^{a,\Sigma }\star \nu \) where \(\gamma _d^{a,\Sigma }\) is the Gaussian measure on \(\mathbb {R}^d\) with mean a and covariance \(\Sigma \). Set \(\lambda _{\min }:=\lambda _{\min }(\Sigma )\) and \(\lambda _{\max }:=\lambda _{\max }(\Sigma )\). Then, for any continuously differentiable Lipschitz function \(\eta :\mathbb {R}^d\rightarrow \mathcal {I}\) such that \(\int \eta ^2\, dp\in \mathcal {I}\), we have

    $$\begin{aligned} \text {Ent}_{p}^{\Psi }(\eta )\le \frac{\lambda _{\min }\lambda _{\max }}{2R^2}\left( e^{2 \frac{R^2}{\lambda _{\min }}}-1\right) \int _{\mathbb {R}^d}\Psi ''(\eta )|\nabla \eta |^2 d p. \end{aligned}$$

Proof

  1. (1)

    We will use the fact [14, Theorem 4.2] that \(\Psi \)-Sobolev inequalities hold for the Wiener measure \(\gamma \):

    $$\begin{aligned} \text {Ent}_{\gamma }^{\Psi }(F)\le \frac{1}{2}\mathbb {E}_{\gamma }\left[ \Psi ''(F)|DF|_{H}^2\right] \end{aligned}$$

    for any \(F:\Omega \rightarrow \mathcal {I}\) which is \(L^2\)-integrable with respect to \(\gamma \). Let \((X_t)_{t\in [0,1]}\) be the Föllmer process associated to p, so that \(X_1\sim p\), and suppose that \(X_1:\Omega \rightarrow \mathbb {R}^d\) is an almost-sure contraction with constant C. Given \(\eta \) let \(F(\omega ):=(\eta \circ X_1)(\omega )\). Then, by Lemma 5.1 and [59, Proposition 1.2.4],

    $$\begin{aligned} \text {Ent}_{p}^{\Psi }(\eta )&=\text {Ent}_{\gamma }^{\Psi }(F)\le \frac{1}{2}\mathbb {E}_{\gamma }\left[ \Psi ''(F)|DF|_H^2\right] \le \frac{C^2}{2}\mathbb {E}_{\gamma }\left[ \Psi ''(\eta \circ X_1)|\nabla \eta (X_1)|^2\right] \\&=\frac{C^2}{2}\int _{\mathbb {R}^d}\Psi ''(\eta )|\nabla \eta |^2 dp. \end{aligned}$$

    The proof is complete by Theorem 3.1.

  2. (2)

    Let \(Y\sim \nu \), let \({\tilde{\nu }}\) be the law of \(\Sigma ^{-1/2}Y\), and define \({\tilde{p}}:=\gamma _d\star {\tilde{\nu }}\). Set \(\lambda _{\min }:=\lambda _{\min }(\Sigma )\) and \(\lambda _{\max }:=\lambda _{\max }(\Sigma )\). The argument of part (1) gives,

    $$\begin{aligned} \text {Ent}_{{\tilde{p}}}^{\Psi }(\eta )\le \frac{e^{2\lambda _{\min }^{-1} R^2}-1}{2\lambda _{\min }^{-1} R^2}\int _{\mathbb {R}^d}\Psi ''(\eta )|\nabla \eta |^2 d{\tilde{p}}. \end{aligned}$$

    Let \(p=\gamma _d^{a,\Sigma }\star \nu \) and let \({\tilde{X}}\sim {\tilde{p}}\) so that \(\Sigma ^{1/2}{\tilde{X}}+a\sim p\). Given \(\eta \) let \({\tilde{\eta }}(x):=\eta (\Sigma ^{1/2}x+a)\) so that

    $$\begin{aligned} \text {Ent}_{p}^{\Psi }(\eta )=\text {Ent}_{{\tilde{p}}}^{\Psi }({\tilde{\eta }})\le \frac{e^{2\lambda _{\min }^{-1} R^2}-1}{2\lambda _{\min }^{-1} R^2}\int _{\mathbb {R}^d}\Psi ''({\tilde{\eta }})|\nabla {\tilde{\eta }}|^2 d{\tilde{p}}. \end{aligned}$$

    Since \(\nabla {\tilde{\eta }}(x)=\Sigma ^{1/2}\nabla \eta (\Sigma ^{1/2}x+a)\) we have \(|\nabla {\tilde{\eta }}(x)|^2\le \lambda _{\max } |\nabla \eta (\Sigma ^{1/2}x+a)|^2\) so

    $$\begin{aligned} \text {Ent}_{p}^{\Psi }(\eta )\le \frac{\lambda _{\min }\lambda _{\max }}{2R^2}\left( e^{2 \frac{R^2}{\lambda _{\min }}}-1\right) \int _{\mathbb {R}^d}\Psi ''(\eta )|\nabla \eta |^2 d p. \end{aligned}$$

\(\square \)

Theorem 5.4

(q-Poincaré inequalities) Let \(q\in [1,\infty )\) and set \(c_q:=1_{q\in [1,2)}\frac{\pi }{2}+1_{q\in [2,\infty )}\sqrt{q-1}\). Let \(\eta :\mathbb {R}^d\rightarrow \mathbb {R}\) be any continuously differentiable Lipschitz function such that \(\int \eta \,dp=0\) and \(\eta \in L^q(\gamma _d)\).

  1. (1)

    Let p be a \(\kappa \)-log-concave measure with \(S:=\text {diam}({\textrm{supp}}(p))\).

    • If \(\kappa S^2\ge 1\) then

      $$\begin{aligned} \mathbb {E}_p[\eta ^q]\le \frac{1}{\kappa ^{q/2}}c_q^q\mathbb {E}_p[|\nabla \eta |^q]. \end{aligned}$$
    • If \(\kappa S^2< 1\) then

      $$\begin{aligned} \mathbb {E}_p[\eta ^q]\le \left( \frac{e^{1-\kappa S^2}+1}{2}\right) ^{q/2}S^qc_q^q\mathbb {E}_p[|\nabla \eta |^q]. \end{aligned}$$
  2. (2)

    Fix a probability measure \(\nu \) on \(\mathbb {R}^d\) supported on a ball of radius R and let \(p:=\gamma _d^{a,\Sigma }\star \nu \) where \(\gamma _d^{a,\Sigma }\) is the Gaussian measure on \(\mathbb {R}^d\) with mean a and covariance \(\Sigma \). Then, with \(\lambda _{\min }:=\lambda _{\min }(\Sigma )\) and \(\lambda _{\max }:=\lambda _{\max }(\Sigma )\), we have

    $$\begin{aligned} \mathbb {E}_{p}[\eta ^q]\le c_q^q \frac{1}{2^{q/2}}\frac{(\lambda _{\min }\lambda _{\max })^{q/2}}{R^q}(e^{2\frac{R^2}{\lambda _{\min }}}-1)^{q/2}\mathbb {E}_{p}[|\nabla \eta |^q]. \end{aligned}$$

Proof

  1. (1)

    We will use the fact [1, Theorem 2.6] (see [58, Proposition 3.1(3)] for an earlier result) that the q-Poincaré inequality holds for the Wiener measure \(\gamma \): For \(q\in [1,\infty )\) and \(F\in \mathbb {D}^{1,q}\) with \(\mathbb {E}_{\gamma }[F]=0\), we have

    $$\begin{aligned} \mathbb {E}_{\gamma }[F^q]\le c_q^q\mathbb {E}_{\gamma }[|DF|_{H}^q]. \end{aligned}$$

    Let \((X_t)_{t\in [0,1]}\) be the Föllmer process associated to p, so that \(X_1\sim p\), and suppose that \(X_1:\Omega \rightarrow \mathbb {R}^d\) is an almost-sure contraction with constant C. Given \(\eta \) let \(F(\omega ):=(\eta \circ X_1)(\omega )\). Then, by Lemma 5.1 and [59, Proposition 1.2.4],

    $$\begin{aligned} \mathbb {E}_p[\eta ^q]=\mathbb {E}_{\gamma }[F^q]\le c_q^q\mathbb {E}_{\gamma }[|DF|_{H}^q]\le C^qc_q^q\mathbb {E}_{\gamma }[|\nabla \eta (X_1)|^q]=C^qc_q^q\mathbb {E}_p[|\nabla \eta |^q]. \end{aligned}$$

    The proof is complete by Theorem 3.1.

  2. (2)

    Let \(Y\sim \nu \), let \({\tilde{\nu }}\) be the law of \(\Sigma ^{-1/2}Y\), and define \({\tilde{p}}:=\gamma _d\star {\tilde{\nu }}\). Set \(\lambda _{\min }:=\lambda _{\min }(\Sigma )\) and \(\lambda _{\max }:=\lambda _{\max }(\Sigma )\). The argument of part (1) gives,

    $$\begin{aligned} \mathbb {E}_{{\tilde{p}}}[\eta ^q]\le \left( \frac{e^{2\lambda _{\min }^{-1}R^2}-1}{2\lambda _{\min }^{-1}R^2}\right) ^{q/2}c_q^q\mathbb {E}_{{\tilde{p}}}[|\nabla \eta |^q]. \end{aligned}$$

    Let \(p=\gamma _d^{a,\Sigma }\star \nu \) and let \({\tilde{X}}\sim {\tilde{p}}\) so that \(\Sigma ^{1/2}{\tilde{X}}+a\sim p\). Given \(\eta \) let \({\tilde{\eta }}(x):=\eta (\Sigma ^{1/2}x+a)\) so that

    $$\begin{aligned} \mathbb {E}_{p}[\eta ^q]=\mathbb {E}_{{\tilde{p}}}[{\tilde{\eta }}^q]\le \left( \frac{e^{2\lambda _{\min }^{-1}R^2}-1}{2\lambda _{\min }^{-1}R^2}\right) ^{q/2}c_q^q\mathbb {E}_{{\tilde{p}}}[|\nabla {\tilde{\eta }}|^q]. \end{aligned}$$

    Since \(\nabla {\tilde{\eta }}(x)=\Sigma ^{1/2}\nabla \eta (\Sigma ^{1/2}x+a)\) we have \(|\nabla {\tilde{\eta }}(x)|^q\le \lambda _{\max }^{q/2} |\nabla \eta (\Sigma ^{1/2}x+a)|^q\) so

    $$\begin{aligned} \mathbb {E}_{p}[\eta ^q]\le c_q^q \frac{1}{2^{q/2}}\frac{(\lambda _{\min }\lambda _{\max })^{q/2}}{R^q}(e^{2\frac{R^2}{\lambda _{\min }}}-1)^{q/2}\mathbb {E}_{p}[|\nabla \eta |^q]. \end{aligned}$$

\(\square \)

Theorem 5.5

(Isoperimetric inequalities) Let \(\Phi \) be the cumulative distribution function of \(\gamma _1\) and let \(B_d\subset \mathbb {R}^d\) be the unit ball.

  1. (1)

    Let p be a \(\kappa \)-log-concave measure with \(S:=\text {diam}({\textrm{supp}}(p))\) and let

    $$\begin{aligned} C:= {\left\{ \begin{array}{ll} \frac{1}{\sqrt{\kappa }}&{}\text {if }\kappa S^2\ge 1\\ \left( \frac{e^{1-\kappa S^2}+1}{2}\right) ^{1/2}S&{}\text {if }\kappa S^2< 1. \end{array}\right. } \end{aligned}$$

    Then, for any Borel set \(A\subset \mathbb {R}^d\) and \(r\ge 0\),

    $$\begin{aligned} p[A+r B_d]\ge \Phi \left( \Phi ^{-1}(p[A])+\frac{r}{C}\right) . \end{aligned}$$
  2. (2)

    Fix a probability measure \(\nu \) on \(\mathbb {R}^d\) supported on a ball of radius R and let \(p:=\gamma _d^{a,\Sigma }\star \nu \) where \(\gamma _d^{a,\Sigma }\) is the Gaussian measure on \(\mathbb {R}^d\) with mean a and covariance \(\Sigma \). Set \(\lambda _{\min }:=\lambda _{\min }(\Sigma ),~ \lambda _{\max }:=\lambda _{\max }(\Sigma )\), and

    $$\begin{aligned} C:= (\lambda _{\min }\lambda _{\max })^{1/2}\frac{(e^{\frac{2R^2}{\lambda _{\min }}}-1)^{1/2}}{\sqrt{2}R}. \end{aligned}$$

    Then,

    $$\begin{aligned} p[A+rB_d]\ge \Phi \left( \Phi ^{-1}(p[A])+\frac{r}{C}\right) . \end{aligned}$$

Proof

  1. (1)

    Let \(B_{H^1}\) be the unit ball in \(H^1\). We will use the fact [45, Theorem 4.3] that the Wiener measure \(\gamma \) satisfies the isoperimetric inequality:

    $$\begin{aligned} \gamma [K+r B_{H^1}]\ge \Phi (\Phi ^{-1}(\gamma (K))+r) \end{aligned}$$

    for any Borel measurable set \(K\subset \Omega \) and \(r\ge 0\); see the discussion following [45, Theorem 4.3] for measurability issues.

    Let \((X_t)_{t\in [0,1]}\) be the Föllmer process associated to p so that \(X_1\sim p\). Suppose that \(X_1:\Omega \rightarrow \mathbb {R}^d\) is an almost-sure contraction with constant C, so in particular,

    $$\begin{aligned} |X_1(\omega +h)-X_1(\omega )|\le C|h|_{H^1}\quad \forall h\in H^1, \quad \gamma \text {-a.e.} \end{aligned}$$

    Let \(M\subset \mathbb {R}^d\) be a Borel measurable set. We will show that

    $$\begin{aligned} X_1^{-1}(M)+\frac{r}{C}B_{H^1}\subset X_1^{-1}(M+rB_d). \end{aligned}$$
    (5.1)

    Then, by the isoperimetric inequality for \(\gamma \) and (5.1),

    $$\begin{aligned} p[M+rB_d]=\gamma [X_1^{-1}(M+rB_d)]&\ge \gamma \left[ X_1^{-1}(M)+\frac{r}{C}B_{H^1}\right] \\&\ge \Phi \left( \Phi ^{-1}\left( \gamma [X_1^{-1}(M)]\right) +\frac{r}{C}\right) \\&=\Phi \left( \Phi ^{-1}(p[M])+\frac{r}{C}\right) . \end{aligned}$$

    The proof is then complete by Theorem 3.1.

    In order to prove (5.1) it suffices to show that

    $$\begin{aligned} X_1\left( X_1^{-1}(M)+\frac{r}{C}B_{H^1}\right) \subset M+rB_d, \end{aligned}$$

    or, in other words, that \(\omega \in X_1^{-1}(M)+\frac{r}{C}B_{H^1}\Rightarrow X_1(\omega )\in M+rB_d\). Fix \(\omega \in X_1^{-1}(M)+\frac{r}{C}B_{H^1}\) so that \(\omega =\theta +\frac{r}{C}h\) for some \(\theta \in X_1^{-1}(M)\) and \(h\in H^1\) with \(|h|_{H^1}\le 1\). Then \(X_1(\omega -\frac{r}{C}h)\in M\) and, as \(X_1\) is an almost-sure contraction with constant C, \(\left| X_1\left( \omega -\frac{r}{C}h\right) -X_1(\omega )\right| \le r\) so \(X_1(\omega )\in M+rB_d\) as desired.

  2. (2)

    Let \(Y\sim \nu \), let \({\tilde{\nu }}\) be the law of \(\Sigma ^{-1/2}Y\), and define \({\tilde{p}}:=\gamma _d\star {\tilde{\nu }}\). Set \(\lambda _{\min }:=\lambda _{\min }(\Sigma )\) and \(\lambda _{\max }:=\lambda _{\max }(\Sigma )\). The argument of part (1) gives, for any Borel set \(M\subset \mathbb {R}^d\) and \(r\ge 0\),

    $$\begin{aligned} {\tilde{p}} [M+rB_d]\ge \Phi \left( \Phi ^{-1}({\tilde{p}}[M])+\frac{r}{C}\right) \end{aligned}$$

    with \(C:=\left( \frac{e^{2\lambda _{\min }^{-1}R^2}-1}{2\lambda _{\min }^{-1}R^2}\right) ^{1/2}\). Let \(p=\gamma _d^{a,\Sigma }\star \nu \) and let \({\tilde{X}}\sim {\tilde{p}}\) so that \(\Sigma ^{1/2}{\tilde{X}}+a\sim p\). Then, for any Borel set \(M\subset \mathbb {R}^d\) and \(r\ge 0\),

    $$\begin{aligned} p[M+rB_d]={\tilde{p}}[\Sigma ^{-1/2}(M-a)+\Sigma ^{-1/2}rB_d]\ge {\tilde{p}}[\Sigma ^{-1/2}(M-a)+\lambda _{\max }^{-1/2}rB_d]. \end{aligned}$$

    Hence,

    $$\begin{aligned} p[M+rB_d]\ge \Phi \left( \Phi ^{-1}\left( {\tilde{p}}\left[ \Sigma ^{-1/2}(M-a)\right] \right) +\frac{r\lambda _{\max }^{-1/2}}{C}\right) . \end{aligned}$$

    The proof is complete by noting that \({\tilde{p}}\left[ \Sigma ^{-1/2}(M-a)\right] =p[M]\).

\(\square \)

5.1 Stein kernels

We now turn to the applications of the contraction in expectation, as in Theorem 4.2. Specifically, we shall prove Theorem 1.6, from which Corollary 1.7 follows, as explained in the introduction. We first establish the connection between the Brownian transport map and Stein kernels. Given Malliavin differentiable functions \(F,G:\Omega \rightarrow \mathbb {R}^k\) we denote

$$\begin{aligned} \left( DF,DG\right) _H:=\int _0^1D_tF(D_tG)^*dt. \end{aligned}$$

Note that, as outlined in Sect. 2, for every fixed \(t \in [0,1]\), \(D_tF\) is a \(k \times d\) matrix, and so \(\left( DF,DG\right) _H\) takes values in the space of \(k\times k\) matrices. The construction of the Stein kernel relies on the Ornstein-Uhlenbeck operator \(\mathcal {L}\), as defined in [57, section 2.8.2]. To define the operator, let \(\delta \) stand for the adjoint of the Malliavin derivative D, also called the Skorokhod integral. For our purposes we shall only use \(\delta \) on matrix-valued paths DF and DG, where F and G are as above. In this case, \(\delta \) acts on the rows of DG, and \(\delta DG\) takes values in \(\mathbb {R}^k\). Formally, the adjoint property of \(\delta \) is given by

$$\begin{aligned} \mathbb {E}\left[ \text {Tr}\left( DF,DG\right) _H\right] = \mathbb {E}\left[ \langle F, \delta D G \rangle \right] , \end{aligned}$$

where the inner product on the right hand side is the Euclidean one in \(\mathbb {R}^k\).

We can now define the Ornstein-Uhlenbeck operator as \(\mathcal {L}:= - \delta D\). By construction, if \(G:\Omega \rightarrow \mathbb {R}^k\), then \(\mathcal {L} G:\Omega \rightarrow \mathbb {R}^k\) as well. A useful property of \(\mathcal {L}\) is that it is invertible on the subspace of functions G satisfying \(\mathbb {E}_\gamma \left[ G\right] =0\), and we denote the pseudo-inverse by \(\mathcal {L}^{-1}\); see [57, section 2.8.2] for more details, and in particular [57, Proposition 2.8.11]. Now, given a Malliavin differentiable function \(F:\Omega \rightarrow \mathbb {R}^k\) such that \(\mathbb {E}_{\gamma }[F]=0\), we define the \(k\times k\) matrix-valued map

$$\begin{aligned} \tau (x):=\mathbb {E}_{\gamma }\left[ (-D\mathcal {L}^{-1}F,DF)_H|F=x\right] . \end{aligned}$$

Above, the expression \(\mathbb {E}_{\gamma }\left[ \cdot |F=x\right] \) is the expectation of the regular conditional probability on the fibers \(F^{-1}(x)\), which is well-defined for almost every \(x\in \mathbb {R}^k\) with respect to the law of F [35].
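
To illustrate the construction in the simplest setting, take \(k=d=1\) and \(F(\omega ):=\omega _1\), so that \(F_*\gamma =\gamma _1\). Then \(D_tF=1\) for all \(t\in [0,1]\) and \(\delta DF=\int _0^1dW_t=F\), so \(\mathcal {L}F=-F\) and \(\mathcal {L}^{-1}F=-F\). Hence,

$$\begin{aligned} \tau (x)=\mathbb {E}_{\gamma }\left[ (DF,DF)_H|F=x\right] =\int _0^1dt=1, \end{aligned}$$

recovering the fact that the standard Gaussian has Stein kernel identically equal to 1.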

Lemma 5.6

Let \(F: \Omega \rightarrow \mathbb {R}^k\) be Malliavin differentiable and satisfy \(\mathbb {E}_\gamma \left[ F\right] = 0\). Then, the map \(\tau \) is a Stein kernel for \(F_*\gamma \).

Proof

The proof follows the argument in [53, Lemma 1]. Let \(\eta :\mathbb {R}^k\rightarrow \mathbb {R}^k\) be a continuously differentiable and Lipschitz function and let \(Y\sim F_*\gamma \). We need to show that

$$\begin{aligned} \mathbb {E}[\langle \nabla \eta (Y),\tau (Y)\rangle _{\text {HS}}]=\mathbb {E}[\langle \eta (Y),Y\rangle ] \end{aligned}$$

where \(\langle \cdot ,\cdot \rangle _{\text {HS}}\) is the Hilbert-Schmidt inner product. We recall that \(\mathcal {L}=-\delta D\) where \(\delta \) is the adjoint to the Malliavin derivative D. Compute,

$$\begin{aligned}&\mathbb {E}[\langle \nabla \eta (Y),\tau (Y)\rangle _{\text {HS}}]=\mathbb {E}[\langle \nabla \eta (Y),\mathbb {E}_{\gamma }\left[ (-D\mathcal {L}^{-1}F,DF)_H|F=Y\right] \rangle _{\text {HS}}]\\&\quad =\mathbb {E}_{\gamma }[\langle \nabla \eta (F),(-D\mathcal {L}^{-1}F,DF)_H\rangle _{\text {HS}}]\\&\quad =\mathbb {E}_{\gamma }[\text {Tr}[(-D\mathcal {L}^{-1}F,D(\eta \circ F))_H]]\quad (\text {chain rule})\\&\quad =\mathbb {E}_{\gamma }[\langle \eta \circ F,-\delta D\mathcal {L}^{-1}F\rangle ]\quad (\delta \text { is adjoint to }D)\\&\quad =\mathbb {E}_{\gamma }[\langle \eta \circ F,\mathcal {L}\mathcal {L}^{-1}F\rangle ]\quad (\mathcal {L}=-\delta D)\\&\quad =\mathbb {E}_{\gamma }[\langle \eta \circ F, F\rangle ]\quad (\text {since }\mathbb {E}_{\gamma }[F]=0)\\&\quad =\mathbb {E}[\langle \eta (Y),Y\rangle ]. \end{aligned}$$

\(\square \)

Theorem 1.6

(Stein kernels) Let p be an isotropic log-concave measure on \(\mathbb {R}^d\) with compact support. Let \(\chi :\mathbb {R}^d\rightarrow \mathbb {R}^k\) be a continuously differentiable function with bounded partial derivatives such that \(\mathbb {E}_p[\chi ]=0\) and \(\mathbb {E}_p[|\nabla \chi |_{\text {op}}^8]<\infty \). Then, the pushforward measure \(q:=\chi _*p\) on \(\mathbb {R}^k\) admits a Stein kernel \(\tau _q\) satisfying

$$\begin{aligned} \mathbb {E}_q[|\tau _q|_{\text {HS}}^2]\le a d (\log d)^{24}\sqrt{\mathbb {E}_p[|\nabla \chi |_{\text {op}}^8]}, \end{aligned}$$

for some universal constant \(a>0\).

Remark 5.7

As will become evident from the proof of Theorem 1.6, the result holds provided that \(\chi \circ X_1\) is a Malliavin differentiable random vector where \((X_t)\) is the Föllmer process associated to p. By [59, Proposition 1.2.3], this condition holds if \(\chi \) is a continuously differentiable function with bounded partial derivatives.

Proof

Let \(F=\chi \circ X_1\) and let \(\tau \) be the Stein kernel constructed above so that \(\tau _q:=\tau \) is a Stein kernel of q. Let \((\mathcal {P}_t)\) be the Ornstein-Uhlenbeck semigroup on the Wiener spaceFootnote 8 and recall that it is a contraction [57, Proposition 2.8.6]. By [57, Proposition 2.9.3],

$$\begin{aligned} \mathbb {E}_q[|\tau _q|_{\text {HS}}^2]&\le \mathbb {E}_{\gamma }[|(DF,D\mathcal {L}^{-1}F)_H|_{\text {HS}}^2]=\mathbb {E}_{\gamma }\left[ \left| \int _0^{\infty }e^{-s}(DF,\mathcal {P}_s(DF))_Hds\right| _{\text {HS}}^2\right] \\&\le \sup _{s\in [0,\infty )}\mathbb {E}_{\gamma }\left[ \left| (DF,\mathcal {P}_s(DF))_H\right| _{\text {HS}}^2\right] . \end{aligned}$$

where the first inequality follows from (conditional) Jensen's inequality, and the second from Jensen's inequality applied to the probability measure \(e^{-s}ds\) on \([0,\infty )\). Using \((\mathcal {P}_s(DF))_r=\mathcal {P}_s(D_rF)\), we get

$$\begin{aligned}&\sup _{s\in [0,\infty )}\mathbb {E}_{\gamma }\left[ \left| (DF,\mathcal {P}_s(DF))_H\right| _{\text {HS}}^2\right] =\sup _{s\in [0,\infty )}\mathbb {E}_{\gamma }\left[ \left| \int _0^1(D_rF)^*(\mathcal {P}_s(D_rF))dr\right| _{\text {HS}}^2\right] \\&\quad \le \sup _{s\in [0,\infty )}\mathbb {E}_{\gamma }\left[ \int _0^1|(D_rF)^*(\mathcal {P}_s(D_rF))|_{\text {HS}}^2dr\right] \\&\quad \le \min \{k,d\}\sup _{s\in [0,\infty )}\mathbb {E}_{\gamma }\left[ \int _0^1|D_rF|_{\text {op}}^2|\mathcal {P}_s(D_rF)|_{\text {op}}^2dr\right] \\&\quad \le d\,\mathbb {E}_{\gamma }\left[ \int _0^1|D_rF|_{\text {op}}^2|D_rF|_{\text {op}}^2dr\right] \le d\sup _{r\in [0,1]}\mathbb {E}_{\gamma }[|D_rF|_{\text {op}}^4]\\&\quad =d\sup _{r\in [0,1]}\mathbb {E}_{\gamma }[|\nabla \chi (X_1)(D_rX_1)|_{\text {op}}^4]\\&\quad \le d\sqrt{\mathbb {E}_{\gamma }[|\nabla \chi (X_1)|_{\text {op}}^8]}\,\sup _{r\in [0,1]}\sqrt{\mathbb {E}_{\gamma }[|D_rX_1|_{\text {op}}^8]}. \end{aligned}$$

It remains to show that \(\sup _{r\in [0,1]}\sqrt{\mathbb {E}_{\gamma }[|D_rX_1|_{\text {op}}^8]}\le a (\log d)^{24}\) for some universal constant \(a>0\). The latter will follow from Theorem 4.2 as soon as we show that, \(\gamma \)-a.e.,

$$\begin{aligned} |D_rX_1|_{\text {op}}\le |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}\quad \forall r\in [0,1]. \end{aligned}$$

Indeed, fix \(\omega \in \Omega \) and \(r\in [0,1]\). Choose a unit vector \(w\in \mathbb {R}^d\) such that \(|D_rX_1|_{\text {op}}=|D_rX_1w|\). Let \((\dot{h}^{\epsilon ,r})_{\epsilon >0}\subset L^2([0,1])\) be an approximation to the identity: \(\int _0^1|\dot{h}_s^{\epsilon ,r}|^2ds=1\) and for every continuous \(\eta \in L^2([0,1])\), \(\lim _{\epsilon \rightarrow 0}\int _0^1\eta (s)\dot{h}_s^{\epsilon ,r}=\eta (r)\). Define \(\dot{h}_s^{\epsilon ,r,w}:=\dot{h}_s^{\epsilon ,r}w\) for \(s\in [0,1]\) and note that \(|\dot{h}^{\epsilon ,r,w}|_H=1\) so \(|\mathcal {D}X_1[\dot{h}^{\epsilon ,r,w}]|\le |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}\). By the definition of \(\mathcal {D}X_1\), and since \(s\mapsto D_sX_1\) is continuous (since it satisfies a differential equation) \(\gamma \)-a.e., we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\mathcal {D}X_1[\dot{h}^{\epsilon ,r,w}]=D_rX_1w. \end{aligned}$$

It follows that \(|D_rX_1w|=\lim _{\epsilon \rightarrow 0}|\mathcal {D}X_1[\dot{h}^{\epsilon ,r,w}]|\le |\mathcal {D}X_1|_{\mathcal {L}(H,\mathbb {R}^d)}\). This completes the proof. \(\square \)

6 Cameron–Martin contractions

The notion of contraction we considered up until now was the appropriate one when the target measures were measures on \(\mathbb {R}^d\). If, however, we are interested in transportation between measures on the Wiener space itself, then we need a stronger notion of contraction.

Definition 6.1

A measurable map \(T:\Omega \rightarrow \Omega \) is a Cameron-Martin transport map if \(T(\omega )=\omega +\xi (\omega )\) for some measurable map \(\xi :\Omega \rightarrow H^1\); we write \(\xi (\omega )=\int _0^{\cdot }\dot{\xi }(\omega )\) for some measurable map \(\dot{\xi }:\Omega \rightarrow H\). We set \((T_t)_{t\in [0,1]}:=( W_t\circ T)_{t\in [0,1]}\). A Cameron-Martin transport map \(T:\Omega \rightarrow \Omega \) is a Cameron-Martin contraction with constant C if, \(\gamma \)-a.e.,

$$\begin{aligned} |T(\omega +h)-T(\omega )|_{H^1}\le C|h|_{H^1}\quad \forall ~h\in H^1. \end{aligned}$$

Claim

Let \(T:\Omega \rightarrow \Omega \) be a Cameron-Martin contraction with constant C. Then, for any \(t\in [0,1]\), \(T_t:\Omega \rightarrow \mathbb {R}^d\) is an almost-sure contraction with constant C.

Proof

Let \(T:\Omega \rightarrow \Omega \) be a Cameron-Martin contraction with constant C. Fix \(h\in H^1,\omega \in \Omega ,\) and define \(q\in H^1\) by \(q_t=T_t(\omega +h)-T_t(\omega )\) for \(t\in [0,1]\); note that q is indeed an element of \(H^1\) since T is a Cameron-Martin transport map. Since

$$\begin{aligned} \sup _{t\in [0,1]}|q_t|=\sup _{t\in [0,1]}\left| \int _0^t\dot{q}_sds\right| , \end{aligned}$$

it follows from Jensen’s inequality that

$$\begin{aligned} \sup _{t\in [0,1]}|T_t(\omega +h)-T_t(\omega )|^2&=\sup _{t\in [0,1]}\left| \int _0^t\dot{q}_sds\right| ^2\le \sup _{t\in [0,1]}t\int _0^t|\dot{q}_s|^2ds\\&\le \int _0^1|\dot{q}_s|^2ds=|T(\omega +h)-T(\omega )|_{H^1}^2\le C^2|h|_{H^1}^2, \end{aligned}$$

where the last inequality holds since T is a Cameron-Martin contraction with constant C.

\(\square \)

We see that a Cameron-Martin contraction is a stronger notion than an almost-sure contraction. Given a measure \(\mu \) on \(\Omega \), a Cameron-Martin contraction between \(\gamma \) and \(\mu \) would transfer functional inequalities from \(\gamma \) to \(\mu \) where the functions are allowed to depend on the entire path \(\{\omega _t\}_{t\in [0,1]}\). For the rest of the paper, we focus on the question of whether either the causal optimal transport map or the optimal transport map is a Cameron-Martin contraction when the target measure enjoys convexity properties. The notion of convexity we use is compatible with the Cameron-Martin space [30]:

Definition 6.2

A measurable map \(V:\Omega \rightarrow \mathbb {R}\cup \{\infty \}\) is Cameron-Martin convex if, for any \( h,g\in H^1\) and \(\alpha \in [0,1]\), it holds that

$$\begin{aligned} V(\omega +\alpha h+(1-\alpha )g)\le \alpha V(\omega + h)+(1-\alpha ) V(\omega + g) \quad \gamma \text {-a.e.} \end{aligned}$$

Remark 6.3

An important example of a Cameron-Martin convex function V on \(\Omega \) is \(V(\omega )=\eta (\omega _1)\) with \(\eta :\mathbb {R}^d\rightarrow \mathbb {R}\) a convex function.
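
Indeed, for any \(h,g\in H^1\) and \(\alpha \in [0,1]\),

$$\begin{aligned} V(\omega +\alpha h+(1-\alpha )g)=\eta (\omega _1+\alpha h_1+(1-\alpha )g_1)\le \alpha V(\omega +h)+(1-\alpha )V(\omega +g) \end{aligned}$$

by the convexity of \(\eta \).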

The precise question we consider is the following: Suppose \(\mu \) is a probability measure on \(\Omega \) of the form \(d\mu (\omega )=e^{-V(\omega )}d\gamma (\omega )\), where \(V:\Omega \rightarrow \mathbb {R}\) is a Cameron-Martin convex function. Let \(A,O:\Omega \rightarrow \Omega \) be the causal optimal transport map and the optimal transport map from \(\gamma \) to \(\mu \), respectively. Is it true that either A or O is a Cameron-Martin contraction with some constant C?

In order to answer this question our first task is to construct a suitable notion of derivative for Cameron-Martin transport maps \(T:\Omega \rightarrow \Omega \) so that, in analogy with Lemma 2.3, we can establish a correspondence between being a Cameron-Martin contraction and having a bounded derivative. The Malliavin derivative was defined for real-valued functions \(F:\Omega \rightarrow \mathbb {R}\) but it can be defined for H-valued functions \(\dot{\xi }:\Omega \rightarrow H\) as well [59, p. 31]. We start with the class \(\mathcal {S}_{H}\) of H-valued smooth random variables: \(\dot{\xi }\in \mathcal {S}_{H}\) if \(\dot{\xi }=\sum _{i=1}^mF_i\dot{h}_i\) where \(F_i\in \mathcal {S}\) (the class of smooth random variables, cf. Sect. 2), \(\dot{h}_i\in H\) for \(i\in [m]\) for some \(m\in \mathbb {Z}_+\). For \(\dot{\xi }\in \mathcal {S}_{H}\) we define

$$\begin{aligned} H\otimes H\ni D\dot{\xi }:=\sum _{i=1}^m DF_i\otimes \dot{h}_i \end{aligned}$$

and let \(\mathbb {D}^{1,p}(H)\) be the completion of \(\mathcal {S}_{H}\) under the norm

$$\begin{aligned} \Vert \dot{\xi }\Vert _{1,p,H}=\left( \mathbb {E}_{\gamma }\left[ |\dot{\xi }|_{H}^p\right] +\mathbb {E}_{\gamma }\left[ |D\dot{\xi }|_{H\otimes H}^p\right] \right) ^{\frac{1}{p}}. \end{aligned}$$

Since \(D\dot{\xi }\in H\otimes H\) we may also view it as a linear operator \(D\dot{\xi }:H\rightarrow H\) and we denote its operator norm by \(|D\dot{\xi }|_{\mathcal {L}(H)}\).

Definition 6.4

Let \(T:\Omega \rightarrow \Omega \) be a measurable map of the form \(T(\omega )=\omega +\xi (\omega )\) where \(\dot{\xi } \in \mathbb {D}^{1,p}(H)\) for some \(p\ge 1\). For any \(\omega \in \Omega \) we define \(DT(\omega ):H\rightarrow H\) by

$$\begin{aligned} DT(\omega )[\dot{h}]=\dot{h}+D\dot{\xi }(\omega )[\dot{h}], \quad \forall ~\dot{h}\in H. \end{aligned}$$

The operator norm of \(DT(\omega ): H\rightarrow H\), with \(\omega \in \Omega \) fixed, is denoted by \(|DT(\omega )|_{\mathcal {L}(H)}\).

For the purpose of this work, we focus on measures \(\mu \) on \(\Omega \) of the form \(d\mu (\omega ):=f(\omega _1)d\gamma (\omega )\). In addition, comparing the following lemma to Lemma 2.3, we see that it provides only one direction of the correspondence between Cameron-Martin contractions and bounded derivatives. A more general theory could be developed, at least in principle, but our goal in this work is to highlight key differences between causal optimal transport and optimal transport on the Wiener space, to which end the following suffices.

Lemma 6.5

Let \(\mu \) be a measure on \(\Omega \) of the form \(d\mu (\omega ):=f(\omega _1)d\gamma (\omega )\) and let \(T:\Omega \rightarrow \Omega \) be a transport map from \(\gamma \) to \(\mu \) of the form \(T(\omega )=\omega +\xi (\omega )\) where \(\xi :=\int _0^{\cdot }\dot{\xi }\) for some \(\dot{\xi }\in \mathbb {D}^{1,p}(H)\). If T is a Cameron-Martin contraction with constant C then \(|DT|_{\mathcal {L}(H)}\le C\) \(\gamma \)-a.e.

Proof

We first note that the Malliavin differentiability of \(\xi \), as well as [10, Theorem 5.7.2], imply that, \(\gamma \)-a.e., for any \(h,g\in H^1\),

$$\begin{aligned} \lim _{\epsilon \downarrow 0}\frac{1}{\epsilon }\langle T(\omega +\epsilon h)-T(\omega ),g\rangle _{H^1}&=\lim _{\epsilon \downarrow 0}\frac{1}{\epsilon }\langle \epsilon h+\xi (\omega +\epsilon h)-\xi (\omega ),g\rangle _{H^1}\nonumber \\&=\langle [\text {Id}_{H}+D\dot{\xi }(\omega )][\dot{h}],\dot{g}\rangle _{H}\nonumber \\&=\langle DT(\omega )[\dot{h}],\dot{g}\rangle _{H}. \end{aligned}$$
(6.1)

Suppose now that T is a Cameron-Martin contraction with constant C so that, for a fixed \(h\in H^1\), and any \(\epsilon >0\),

$$\begin{aligned} \sup _{g\in H^1, |g|_{H^1}=1}\frac{1}{\epsilon }\langle T(\omega +\epsilon h)-T(\omega ),g\rangle _{H^1}=\frac{1}{\epsilon }|T(\omega +\epsilon h)-T(\omega )|_{H^1}\le C|h|_{H^1}. \end{aligned}$$

Taking \(\epsilon \downarrow 0\) and using (6.1) shows that

$$\begin{aligned} \sup _{\dot{g}\in H, |\dot{g}|_H=1}\langle DT(\omega )[\dot{h}],\dot{g}\rangle _{H}=|DT(\omega )[\dot{h}]|_{H}\le C|h|_{H^1} \end{aligned}$$

so it follows that

$$\begin{aligned} \sup _{\dot{h}\in H, |\dot{h}|_{H}=1}|DT(\omega )[\dot{h}]|_{H}=|DT(\omega )|_{\mathcal {L}(H)}\le C. \end{aligned}$$

\(\square \)

7 Causal optimal transport

In this section we answer in the negative, for causal optimal transport maps, the question raised in Sect. 6, thus proving the second part of Theorem 1.8. In particular, we will construct a strictly log-concave function \(f:\mathbb {R}\rightarrow \mathbb {R}\) such that the causal optimal transport map from \(\gamma \) to \(d\mu (\omega ):=f(\omega _1)d\gamma (\omega )\) is not a Cameron-Martin contraction with any constant C. This indeed provides a negative answer in light of Remark 6.3. Our concrete example is the case where \(fd\gamma _1\) is the measure of a one-dimensional Gaussian random variable conditioned on being positive. More precisely, fix a constant \(\sigma > 0\) and let \(f:\mathbb {R}\rightarrow \mathbb {R}\) be given by

$$\begin{aligned} f(x)=\frac{1_{[0,+\infty )}(x)e^{-\frac{x^2}{2\sigma }}}{\int _{\mathbb {R}}1_{[0,+\infty )}(y)e^{-\frac{y^2}{2\sigma }}d\gamma _1(y)}. \end{aligned}$$

The measure \(f(x)d\gamma _1(x)\) is the measure on \(\mathbb {R}\) of a centered Gaussian, whose variance is smaller than one, conditioned on being positive. We define a measure \(\mu \) on \(\Omega \) by setting

$$\begin{aligned} d\mu (\omega ):=f(\omega _1)d\gamma (\omega ) \end{aligned}$$

and note that f is strictly log-concave for every \(\sigma >0\), becoming merely log-concave in the limit \(\sigma \rightarrow \infty \). To simplify computations we will take \(\sigma \ge 1\). Finally, note that the assumptions of Proposition 3.10 hold in this case (\(\kappa \ge 0\)).
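
Explicitly, writing out the densities,

$$\begin{aligned} f(x)d\gamma _1(x)\propto 1_{[0,+\infty )}(x)e^{-\frac{x^2}{2}\left( 1+\frac{1}{\sigma }\right) }dx, \end{aligned}$$

the law of a centered Gaussian of variance \(\frac{\sigma }{\sigma +1}<1\) conditioned to be positive.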

Remark 7.1

Given \(d\mu (\omega )=f(\omega _1)d\gamma (\omega )\) let p be the measure on \(\mathbb {R}\) given by \(p:=fd\gamma _1\) and let \(\gamma _1^{a,\sigma }\) be the Gaussian measure on \(\mathbb {R}\) with mean a and variance \(\sigma \). The natural examples for testing whether the causal optimal transport map A, from \(\gamma \) to \(\mu \), is a Cameron-Martin contraction are \(p=\gamma _1^{a,1}\), for some \(a\in \mathbb {R}\), and \(p=\gamma _1^{0,\sigma }\), for some \(\sigma > 0\). We can expect these examples to show that A is not a Cameron-Martin contraction since they saturate the bounds in Lemma 3.4: When \(p=\gamma _1^{a,1}\) we have \(\nabla v(t,x)=0\) (saturation of Lemma 3.4(2) under the assumption \(\kappa \ge 1\)) and when \(p=\gamma _1^{0,\sigma }\) we have that, in the limit \(\sigma \downarrow 0\), \(\nabla v(t,x)=-\frac{1}{1-t}\) (saturation of (3.3)). Since \(|\nabla v|\) is largest in these cases, we can expect that A will not be a Cameron-Martin contraction since its derivative will blow up. However, explicit calculation shows that A is in fact a Cameron-Martin contraction for \(p=\gamma _1^{a,1}\) and \(p=\gamma _1^{0,\sigma }\). Hence, we require the construction of a more sophisticated example, which we obtain by considering Gaussians conditioned on being positive.

In order to prove that A is not a Cameron-Martin contraction we will use Lemma 6.5. We will show that, in the example above, with positive probability, the derivative can be arbitrarily large, so that A cannot be a Cameron-Martin contraction with any constant C.

As mentioned already, the map A is nothing but the Föllmer process X [44]. This allows for the following convenient representation of the derivative of A.

Lemma 7.2

Let A be the causal optimal transport map from \(\gamma \) to \(\mu \) and let X be the solution of (2.1). Fix \(0<\epsilon <1\). For any \(\dot{h}\in H\),

$$\begin{aligned} (DA[\dot{h}])_t=\partial _t\langle DX_t,\dot{h}\rangle _H\quad \forall ~t\in [0,1-\epsilon ]. \end{aligned}$$

In addition,

$$\begin{aligned} |DA[\dot{h}]|_{H}^2\ge \int _0^{1-\epsilon }\left( \dot{h}_t+\nabla v(t,X_t)\int _0^te^{\int _s^t\nabla v(r,X_r)dr}\dot{h}_sds\right) ^2dt. \end{aligned}$$

Proof

We have \(A=\text {Id}_{\Omega }+\xi \) where \(\dot{\xi }_t(\omega )=v(t,X_t(\omega ))\) with the drift v as in (2.1). To show that

$$\begin{aligned} (DA[\dot{h}])_t=\partial _t\langle DX_t,\dot{h}\rangle _H\quad \forall ~t\in [0,1-\epsilon ] \end{aligned}$$

we start by noting that Proposition 3.10 gives

$$\begin{aligned} DX_t=1_{[0,t]}+\int _0^t\nabla v(s,X_s)DX_sds, \end{aligned}$$

so

$$\begin{aligned} \partial _t\langle DX_t,\dot{h}\rangle _H=\dot{h}_t+\nabla v(t,X_t)\langle DX_t,\dot{h}\rangle _H \quad \forall t\in [0,1]. \end{aligned}$$
(7.1)

Hence, our goal is to show that

$$\begin{aligned} (DA[\dot{h}])_t=\dot{h}_t+\nabla v(t,X_t)\langle DX_t,\dot{h}\rangle _H\quad \forall ~t\in [0,1-\epsilon ]. \end{aligned}$$

To establish the above identity it suffices to show that

$$\begin{aligned} \langle Dv[\dot{h}],\dot{g}\rangle _{H}=\int _0^1\nabla v(t,X_t)\langle DX_t,\dot{h}\rangle _H\dot{g}_tdt \end{aligned}$$

for every \(h,g\in H^1\) with \(\dot{g}_t=0\) for all \(t\in [1-\epsilon ,1]\). This indeed holds since, for such g and h,

$$\begin{aligned} \langle Dv[\dot{h}],\dot{g}\rangle _{H}&=\lim _{\delta \downarrow 0}\frac{1}{\delta }\langle \xi (\omega +\delta h)-\xi (\omega ),g\rangle _{H^1}\\&=\int _0^1 \lim _{\delta \downarrow 0}\frac{v(t,X_t(\omega +\delta h))-v(t,X_t(\omega ))}{\delta }\dot{g}_tdt\\&=\int _0^1\nabla v(t,X_t)\langle DX_t,\dot{h}\rangle _H\dot{g}_tdt \end{aligned}$$

where in the second equality the integral and the limit were exchanged by the dominated convergence theorem, using that v is Lipschitz on \([0,1-\epsilon ]\) (Eq. (3.3) and Lemma 3.4), while the third equality holds by the chain rule, which applies since v is Lipschitz [59, Proposition 1.2.4].

The proof of the second part of the lemma follows by noting that the solution to the ordinary differential Eq. (7.1), with initial condition 0 at \(t=0\), is

$$\begin{aligned} \langle DX_t,\dot{h}\rangle _H=\int _0^te^{\int _s^t\nabla v(r,X_r)dr}\dot{h}_sds. \end{aligned}$$
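
Indeed, differentiating the right-hand side gives back (7.1):

$$\begin{aligned} \partial _t\int _0^te^{\int _s^t\nabla v(r,X_r)dr}\dot{h}_sds=\dot{h}_t+\nabla v(t,X_t)\int _0^te^{\int _s^t\nabla v(r,X_r)dr}\dot{h}_sds, \end{aligned}$$

and both sides of the claimed identity vanish at \(t=0\).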

\(\square \)

The next theorem is the main result of this section, showing that, with positive probability, \(|DA|_{\mathcal {L}(H)}\) can be arbitrarily large, thus proving the second part of Theorem 1.8.

Theorem 7.3

Let \(\ell \in H^1\) be given by \(\ell (t)=t\) for \(t\in [0,1]\). There exists a constant \(c>0\) such that, for any \(0<\epsilon <c\), there exists a measurable set \(\mathcal {E}_{\epsilon }\subset \Omega \) satisfying

$$\begin{aligned} \gamma [\mathcal {E}_{\epsilon }]>0 \end{aligned}$$

and

$$\begin{aligned} \gamma [|DA[\dot{\ell }]|_{H}> c\sqrt{\log (1/\epsilon )}~|~\mathcal {E}_{\epsilon }]=1. \end{aligned}$$

The upshot of Theorem 7.3 is that there exists a unit-norm \(\dot{h}\in H\), specifically \(h=\ell \) (so that \(\dot{h}\equiv 1\)), such that, for any \(b >0\), the event \(\{|DA[\dot{h}]|_{H}>b\}\) has positive probability (possibly depending on b). Since

$$\begin{aligned} |DA|_{\mathcal {L}(H)}=\sup _{\dot{h}\in H:|\dot{h}|=1}|DA[\dot{h}]|_{H} \end{aligned}$$

we conclude that A cannot be a Cameron-Martin contraction, with any constant C.

Next we describe the idea behind the proof of Theorem 7.3. Fix \(0<\epsilon <1\). By Lemma 7.2, and as \(\nabla v(t,X_t)=\partial _{xx}^2\log P_{1-t}f(X_t)\), we have

$$\begin{aligned} |DA[\dot{h}]|_{H}^2\ge \int _0^{1-\epsilon }\left( \dot{h}_t+\partial _{xx}^2\log P_{1-t}f(X_t)\int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f(X_r)dr}\dot{h}_sds\right) ^2dt. \end{aligned}$$
(7.2)

The idea of the proof is to construct a function \(\eta _{\epsilon }:[0,1-\epsilon ]\rightarrow \mathbb {R}\) and a constant \(b>0\) such that

$$\begin{aligned} \partial _{xx}^2\log P_{1-t}f(\eta _{\epsilon }(t))\approx -\frac{b}{1-t} \quad \forall ~t\in [\epsilon ,1-\epsilon ]. \end{aligned}$$

If we choose \(h=\ell \), and substitute \(\eta _{\epsilon }(t)\) for \(X_t\) in (7.2), then a computation shows that \(|DA[\dot{h}]|_{H}\) is large. The final step is to show that, with positive probability,

$$\begin{aligned} \partial _{xx}^2\log P_{1-t}f(X_t)\approx \partial _{xx}^2\log P_{1-t}f(\eta _{\epsilon }(t)) \quad \forall ~t\in [\epsilon ,1-\epsilon ]. \end{aligned}$$

This implies that, with positive probability, we can make \(|DA[\dot{h}]|_{H}\) arbitrarily large. We now proceed to make this idea precise. We start with the construction of the function \(\eta _{\epsilon }\).

Lemma 7.4

For every \(0<\epsilon <1\) there exists an absolutely continuous function \(\eta _{\epsilon }:[0,1-\epsilon ]\rightarrow \mathbb {R}\), with \(\eta _{\epsilon }(0)=0\), such that

$$\begin{aligned} -\frac{1}{2}\frac{1}{1-t} -\frac{1}{1-t+\sigma }\le \partial _{xx}^2\log P_{1-t}f(\eta _{\epsilon }(t))\le -\frac{1}{8}\frac{1}{1-t}\quad \forall t\in [\epsilon ,1-\epsilon ]. \end{aligned}$$

Furthermore, for any \(\epsilon >0\) and \(\eta _{\epsilon }\) as above, there exists \(\delta (\epsilon )>0\) such that, if \({\tilde{\eta }}:[0,1]\rightarrow \mathbb {R}\) satisfies

$$\begin{aligned} \sup _{t\in [\epsilon ,1-\epsilon ]}|{\tilde{\eta }}(t)-\eta _{\epsilon }(t)|<\delta (\epsilon ), \end{aligned}$$

then

$$\begin{aligned} -\frac{1}{2}\frac{1}{1-t} -\frac{1}{1-t+\sigma }\le \partial _{xx}^2\log P_{1-t}f({\tilde{\eta }}(t))\le -\frac{1}{8}\frac{1}{1-t} \quad \forall t\in [\epsilon ,1-\epsilon ] \end{aligned}$$

as well.

Proof

Fix \(0<\epsilon <1\) and let \(Z:=\int _{\mathbb {R}}1_{[0,+\infty )}(y)e^{-\frac{y^2}{2\sigma }}d\gamma _1(y)\) so that \(f(x)=Z^{-1}1_{[0,+\infty )}(x)e^{-\frac{x^2}{2\sigma }}\). Let \(\varphi \) be the density of the standard Gaussian measure on \(\mathbb {R}\) and let \(\Phi (x):=\int _{-\infty }^x\varphi (y)dy\) be its cumulative distribution function. Making the change of variables \(y\mapsto \frac{y-x}{\sqrt{t}}\) we get

$$\begin{aligned} P_tf(x)=\frac{1}{Z}\int _{\mathbb {R}}1_{[0,+\infty )}(x+\sqrt{t}y)e^{-\frac{(x+\sqrt{t}y)^2}{2\sigma }}\frac{e^{-\frac{y^2}{2}}}{\sqrt{2\pi }}dy=\frac{1}{Z\sqrt{t}\sqrt{2\pi }}\int _0^{\infty }e^{-\frac{y^2}{2\sigma }}e^{-\frac{(y-x)^2}{2t}}dy. \end{aligned}$$

Since

$$\begin{aligned}&\frac{y^2}{\sigma }+\frac{(y-x)^2}{t}=\left( \frac{1}{\sigma }+\frac{1}{t}\right) y^2-2\frac{xy}{t}+\frac{x^2}{t}=\frac{t+\sigma }{t\sigma }\left[ y-\frac{\sigma }{t+\sigma }x\right] ^2+\left[ \frac{1}{t}-\frac{\sigma }{t(t+\sigma )}\right] x^2\\&\quad =\frac{t+\sigma }{\sigma t}\left[ y-\frac{\sigma }{t+\sigma }x\right] ^2+\frac{x^2}{t+\sigma }, \end{aligned}$$

we have

$$\begin{aligned} \frac{1}{\sqrt{t}Z\sqrt{2\pi }}\int _0^{\infty }e^{-\frac{y^2}{2\sigma }}e^{-\frac{(y-x)^2}{2t}}dy=\frac{\sqrt{\frac{\sigma }{t+\sigma }}}{Z}e^{-\frac{1}{2}\frac{x^2}{t+\sigma }}\int _0^{\infty }\frac{\exp \left[ -\frac{1}{2\frac{\sigma t}{t+\sigma }}\left[ y-\frac{\sigma }{t+\sigma }x\right] ^2\right] }{\sqrt{2\pi \frac{\sigma t}{t+\sigma }}}dy. \end{aligned}$$

The cumulative distribution function of a Gaussian with mean \(\frac{\sigma }{t+\sigma }x\) and variance \(\frac{\sigma t}{t+\sigma }\) is \(y\mapsto \Phi \left( \frac{y-\frac{\sigma }{t+\sigma }x}{\sqrt{\frac{\sigma t}{t+\sigma }}}\right) \) so,

$$\begin{aligned} \int _0^{\infty }\frac{\exp \left[ -\frac{1}{2\frac{\sigma t}{t+\sigma }}\left[ y-\frac{\sigma }{t+\sigma }x\right] ^2\right] }{\sqrt{2\pi \frac{\sigma t}{t+\sigma }}}dy=1- \Phi \left( \frac{-\frac{\sigma }{t+\sigma }x}{\sqrt{\frac{\sigma t}{t+\sigma }}}\right) =1- \Phi \left( -\sqrt{\frac{\sigma }{t(t+\sigma )}}x\right) . \end{aligned}$$

Since \(\Phi (-y)=1-\Phi (y)\) we conclude that

$$\begin{aligned} P_tf(x)=\frac{\sqrt{\frac{\sigma }{t+\sigma }}}{Z}e^{-\frac{1}{2}\frac{x^2}{t+\sigma }}\Phi \left( \sqrt{\frac{\sigma }{t(t+\sigma )}}x\right) \quad \forall t\in [0,1]. \end{aligned}$$

Let \(\sigma _t:=\sqrt{\frac{\sigma }{(1-t)(1-t+\sigma )}}\) and let \(m(x):=\frac{\varphi (-x)}{\Phi (-x)}\) be the inverse Mills ratio. Then, for \(t\in [0,1]\),

$$\begin{aligned} \partial _x\log P_{1-t}f(x)=\sigma _t\frac{\varphi (\sigma _t x)}{\Phi (\sigma _t x)}-\frac{x}{1-t+\sigma }=\sigma _tm(-\sigma _t x)-\frac{x}{1-t+\sigma }, \end{aligned}$$

and using the relation

$$\begin{aligned} m'(x)=m^2(x)-xm(x), \end{aligned}$$

which is readily verified from \(\varphi '(x)=-x\varphi (x)\) and \(\frac{d}{dx}\Phi (-x)=-\varphi (x)\),

we get

$$\begin{aligned} \partial _{xx}^2\log P_{1-t}f(x)&=-\sigma _t^2 m'(-\sigma _t x)-\frac{1}{1-t+\sigma }\\&=-\sigma _t^2[m^2(-\sigma _t x)+\sigma _t xm(-\sigma _t x)]-\frac{1}{1-t+\sigma }. \end{aligned}$$

Set \(\eta _{\epsilon }(t):=-\sigma _t^{-1}c\) for \(t\in [\epsilon ,1-\epsilon ]\) with c a constant to be determined shortly, and continue \(\eta _{\epsilon }\) to \([0,\epsilon ]\) in such a way that \(\eta _{\epsilon }\) is absolutely continuous with derivative in \(L^2([0,1-\epsilon ])\), and \(\eta _{\epsilon }(0)=0\). Then,

$$\begin{aligned} \partial _{xx}^2\log P_{1-t}f(\eta _{\epsilon }(t))=-\sigma _t^2[m^2(c)-cm(c)]-\frac{1}{1-t+\sigma }\quad \forall ~ t\in [\epsilon ,1-\epsilon ]. \end{aligned}$$

Since \(m^2(0)=\frac{2}{\pi }>\frac{1}{2}\), and as \(\lim _{x\rightarrow -\infty }[m^2(x)-xm(x)]=0\), the continuity of m implies that there exists \(c<0\) such that \(m^2(c)-cm(c)=\frac{1}{3}\). With this choice of c we have \(\frac{1}{4}<m^2(c)-cm(c)<\frac{1}{2}\) so

$$\begin{aligned} -\frac{\sigma _t^2}{2}-\frac{1}{1-t+\sigma }\le \partial _{xx}^2\log P_{1-t}f(\eta _{\epsilon }(t))\le -\frac{\sigma _t^2}{4}-\frac{1}{1-t+\sigma }\quad \forall ~ t\in [\epsilon ,1-\epsilon ]. \end{aligned}$$

Whenever \(\sigma \ge 1\),

$$\begin{aligned} \frac{1}{2}\frac{1}{1-t}\le \sigma _t^2=\frac{\sigma }{(1-t)(1-t+\sigma )}\le \frac{1}{1-t} \end{aligned}$$

so

$$\begin{aligned} -\frac{1}{2}\frac{1}{1-t}-\frac{1}{1-t+\sigma }\le \partial _{xx}^2\log P_{1-t}f(\eta _{\epsilon }(t))\le -\frac{1}{8}\frac{1}{1-t}-\frac{1}{1-t+\sigma }. \end{aligned}$$

Since \( -\frac{1}{1-t+\sigma }\le 0\) we get

$$\begin{aligned} -\frac{1}{2}\frac{1}{1-t} -\frac{1}{1-t+\sigma }\le \partial _{xx}^2\log P_{1-t}f(\eta _{\epsilon }(t))\le -\frac{1}{8}\frac{1}{1-t}. \end{aligned}$$

This completes the proof of the first part of the lemma.

For the second part of the lemma, given \(\epsilon \) and \(\eta _{\epsilon }\) as above, use the continuity of m, and that \(m^2(c)-cm(c)=\frac{1}{3}\), to choose \(\delta '>0\) such that \(|c'-c|<\delta '\Rightarrow \frac{1}{4}<m^2(c')-c'm(c')< \frac{1}{2}\). Now let \(\delta (\epsilon ):=\frac{\delta '}{\sigma _{1-\epsilon }}\) and let \({\tilde{\eta }}:[0,1]\rightarrow \mathbb {R}\) be any function such that \(\sup _{t\in [\epsilon ,1-\epsilon ]}|\eta _{\epsilon }(t)-{\tilde{\eta }}(t)|<\delta (\epsilon )\). Then,

$$\begin{aligned} |\sigma _t{\tilde{\eta }}(t)-(-c)|=|\sigma _t{\tilde{\eta }}(t)-\sigma _t\eta _{\epsilon }(t)|\le |\sigma _{1-\epsilon }{\tilde{\eta }}(t)-\sigma _{1-\epsilon }\eta _{\epsilon }(t)|<\delta ' \quad \forall ~t\in [\epsilon ,1-\epsilon ] \end{aligned}$$

so \(\frac{1}{4}<m^2(-\sigma _t{\tilde{\eta }}(t))-(-\sigma _t{\tilde{\eta }}(t))m(-\sigma _t{\tilde{\eta }}(t))< \frac{1}{2}\). It follows, as above, that

$$\begin{aligned} -\frac{\sigma _t^2}{2}-\frac{1}{1-t+\sigma }\le \partial _{xx}^2\log P_{1-t}f({\tilde{\eta }}(t))\le -\frac{\sigma _t^2}{4}-\frac{1}{1-t+\sigma }\quad \forall ~ t\in [\epsilon ,1-\epsilon ], \end{aligned}$$

and we continue as above to complete the proof. \(\square \)

Next we show that if we take \(h=\ell \), and substitute for X in (7.2) a function \({\tilde{\eta }}\) which is close to the function \(\eta _{\epsilon }\) constructed in Lemma 7.4, then \(|DA[\dot{h}]|_{H}\) is large.

Lemma 7.5

There exists a constant \(c>0\) with the following properties. Fix \(0<\epsilon <c\) and let \(\eta _{\epsilon }\) and \(\delta :=\delta (\epsilon )\) be as in Lemma 7.4. Let \({\tilde{\eta }}:[0,1]\rightarrow \mathbb {R}\) be any function such that \(\sup _{t\in [\epsilon ,1-\epsilon ]}|{\tilde{\eta }}(t)-\eta _{\epsilon }(t)|<\delta \). Then,

$$\begin{aligned} \int _0^{1-\epsilon }\left( 1+\partial _{xx}^2\log P_{1-t}f({\tilde{\eta }}(t))\int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\right) ^2dt\ge c\log (1/\epsilon ). \end{aligned}$$

Proof

By Eq. (3.3) and Lemma 3.4(2) (with \(\kappa =1\)),

$$\begin{aligned} -\frac{1}{1-t}\le \partial _{xx}^2\log P_{1-t}f(x)\le 0 \end{aligned}$$

so, for \(s\le \epsilon \),

$$\begin{aligned} \int _s^{\epsilon }\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr\ge \int _0^{\epsilon }\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr\ge \int _0^{\epsilon }-\frac{1}{1-r}dr=\log (1-\epsilon ). \end{aligned}$$

It follows that, for \(\epsilon <t\),

$$\begin{aligned}&\int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\\&\quad \ge \int _{\epsilon }^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds+(1-\epsilon )\int _0^{\epsilon }e^{\int _{\epsilon }^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\\&\quad \ge \frac{1}{2}\int _{\epsilon }^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds+\frac{1}{2}\int _0^{\epsilon }e^{\int _{\epsilon }^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\\&\quad =\frac{1}{2} \int _0^te^{\int _{\epsilon \vee s}^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds. \end{aligned}$$

Using the lower bound of Lemma 7.4 we get,

$$\begin{aligned} \int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\ge \frac{1}{2}\int _0^te^{\int _{\epsilon \vee s}^t\left[ -\frac{1}{2}\frac{1}{1-r} -\frac{1}{1-r+\sigma }\right] dr}ds \quad \forall t\in [\epsilon ,1-\epsilon ], \end{aligned}$$

and using the upper bound of Lemma 7.4 we conclude that, for \(t\in [\epsilon ,1-\epsilon ]\),

$$\begin{aligned}&\partial _{xx}^2\log P_{1-t}f({\tilde{\eta }}(t))\int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\le -\frac{1}{16}\frac{1}{1-t} \int _0^te^{\int _{\epsilon \vee s}^t\left[ -\frac{1}{2}\frac{1}{1-r} -\frac{1}{1-r+\sigma }\right] dr}ds\\&\quad \le -\frac{1}{16}\frac{1}{1-t} \int _0^te^{\int _{s}^t\left[ -\frac{1}{2}\frac{1}{1-r} -\frac{1}{1-r+\sigma }\right] dr}ds=-\frac{1}{16}\frac{1}{1-t} \int _0^t \frac{\sqrt{1-t}}{\sqrt{1-s}}\,\frac{1-t+\sigma }{1-s+\sigma }\,ds\\&\quad \le -\frac{1}{32}\frac{1}{1-t} \int _0^t \frac{\sqrt{1-t}}{\sqrt{1-s}}ds=\frac{1}{16}\frac{\sqrt{1-t}-1}{\sqrt{1-t}}, \end{aligned}$$

since \(\frac{1-t+\sigma }{1-s+\sigma }\ge \frac{\sigma }{1+\sigma }\ge \frac{1}{2}\) whenever \(\sigma \ge 1\). In particular, letting \(t_0:=\frac{288}{289}\), so that \(\frac{1}{16}\frac{1-\sqrt{1-t}}{\sqrt{1-t}}\ge 1\) for \(t\ge t_0\), we get

$$\begin{aligned} 1+\partial _{xx}^2\log P_{1-t}f({\tilde{\eta }}(t))\int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\le 0\quad \forall t\in [t_0,1-\epsilon ]. \end{aligned}$$

It follows that

$$\begin{aligned}&\int _0^{1-\epsilon }\left( 1+\partial _{xx}^2\log P_{1-t}f({\tilde{\eta }}(t))\int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\right) ^2dt\\&\quad \ge \int _{t_0}^{1-\epsilon }\left( 1+\partial _{xx}^2\log P_{1-t}f({\tilde{\eta }}(t))\int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f({\tilde{\eta }}(r))dr}ds\right) ^2dt\\&\quad \ge \int _{t_0}^{1-\epsilon }\left( 1-\frac{1}{16}\frac{1-\sqrt{1-t}}{\sqrt{1-t}}\right) ^2dt\ge \frac{1}{2}\frac{1}{256}\int _{t_0}^{1-\epsilon }\frac{1}{1-t}dt-\frac{1}{4}\int _{t_0}^{1-\epsilon }\frac{1}{\sqrt{1-t}}dt\\&\quad \ge \frac{1}{512}\log (1/\epsilon )+\frac{1}{512}\log (1/289)-\frac{1}{34}, \end{aligned}$$

which completes the proof. \(\square \)

It remains to show that, with positive probability, X is close to \(\eta _{\epsilon }\).

Lemma 7.6

Fix \(0<\epsilon <c\), with c as in Lemma 7.5, and let \(\delta :=\delta (\epsilon )\) be as in Lemma 7.4. Then, the set

$$\begin{aligned} \mathcal {E}_{\epsilon ,\delta }:=\left\{ {\tilde{\eta }}\in \Omega :{\tilde{\eta }}(0)=0~\text { and }\sup _{t\in [\epsilon ,1-\epsilon ]}|{\tilde{\eta }}(t)-\eta _{\epsilon }(t)|<\delta \right\} \subset \Omega \end{aligned}$$

is measurable and has positive probability.

Proof

The measurability of \(\mathcal {E}_{\epsilon ,\delta }\) follows as the sigma-algebra \(\mathcal {F}\) is generated by the Borel sets of \(\Omega \) with respect to the uniform norm. To show that \(\mathcal {E}_{\epsilon ,\delta }\) has positive probability it will be useful to note that the Föllmer process X is a mixture of Brownian bridges in the following sense. Let \(({\tilde{\Omega }}, \tilde{\mathcal {F}},{\tilde{\mathbb {P}}})\) be any probability space which supports a Brownian motion \({\tilde{B}}=({\tilde{B}}_t)_{t\in [0,1]}\) and a random vector \( Y\sim p\), independent of \({\tilde{B}}\). Define the process Z by \(Z_t:={\tilde{B}}_t-t({\tilde{B}}_1- Y)\) for \(t\in [0,1]\) so, conditioned on Y, Z is a Brownian bridge starting at 0 and terminating at Y. Given a set \(\mathcal {B}\in \mathcal {F}\) we have [33],

$$\begin{aligned} \gamma [X\in \mathcal {B}]={\tilde{\mathbb {P}}}[\{{\tilde{\omega }}\in {\tilde{\Omega }}: Z({\tilde{\omega }})\in \mathcal {B}\}]. \end{aligned}$$

Since

$$\begin{aligned} {\tilde{\mathbb {P}}}[\{{\tilde{\omega }}\in {\tilde{\Omega }}: Z({\tilde{\omega }})\in \mathcal {B}\}]=\mathbb {E}[{\tilde{\mathbb {P}}}[\{{\tilde{\omega }}\in {\tilde{\Omega }}: Z({\tilde{\omega }})\in \mathcal {B}\}|Y]], \end{aligned}$$

it will suffice to show that, for any \(b\in \mathbb {R}\) and \(Z^b_t:={\tilde{B}}_t-t({\tilde{B}}_1-b)\), we have

$$\begin{aligned} {\tilde{\mathbb {P}}}[\{{\tilde{\omega }}\in {\tilde{\Omega }}: Z^b({\tilde{\omega }})\in \mathcal {E}_{\epsilon ,\delta }\}]>0. \end{aligned}$$

This is equivalent to the following statement: Fix \(b\in \mathbb {R}\) and let \(\eta _{\epsilon }\) be as in Lemma 7.4. Then, for any \(\epsilon \in (0,1)\) and \(\delta >0\),

$$\begin{aligned} {\tilde{\mathbb {P}}}\left[ \sup _{t\in [\epsilon ,1-\epsilon ]}|Z_t^b-\eta _{\epsilon }(t)|< \delta \right] >0. \end{aligned}$$
(7.3)

To prove (7.3) define the function \(h:[0,1]\rightarrow \mathbb {R}\) by

$$\begin{aligned} h_t= {\left\{ \begin{array}{ll} \eta _{\epsilon }(t),&{}t\in [0,1-\epsilon ]\\ \frac{b-\eta _{\epsilon }(1-\epsilon )}{\epsilon }t+\frac{\eta _{\epsilon }(1-\epsilon )-(1-\epsilon )b}{\epsilon },&{}t\in (1-\epsilon , 1], \end{array}\right. } \end{aligned}$$

and note that the construction of \(\eta _{\epsilon }\) ensures that \(h\in H^1\). Then, for any \(\delta >0\),

$$\begin{aligned} {\tilde{\mathbb {P}}}\left[ \sup _{t\in [0,1]}|{\tilde{B}}_t-h_t|<\frac{\delta }{2}\right] >0, \end{aligned}$$

since, by the Cameron-Martin theorem, \(\gamma \) gives positive mass to every uniform-norm ball around any \(h\in H^1\). It follows that

$$\begin{aligned}&{\tilde{\mathbb {P}}}\left[ \sup _{t\in [0,1-\epsilon ]}|Z^b_t-\eta _{\epsilon }(t)|<\delta \right] \ge {\tilde{\mathbb {P}}}\left[ \sup _{t\in [0,1-\epsilon ]}|{\tilde{B}}_t-\eta _{\epsilon }(t)|<\frac{\delta }{2}, ~|{\tilde{B}}_1-b|<\frac{\delta }{2}\right] \\&\quad ={\tilde{\mathbb {P}}}\left[ \sup _{t\in [0,1-\epsilon ]}|{\tilde{B}}_t-h_t|<\frac{\delta }{2},~ |{\tilde{B}}_1-h_1|<\frac{\delta }{2}\right] \ge {\tilde{\mathbb {P}}}\left[ \sup _{t\in [0,1]}|{\tilde{B}}_t-h_t|<\frac{\delta }{2}\right] >0. \end{aligned}$$

\(\square \)

We can now complete the proof of Theorem 7.3 by combining the above lemmas.

Proof

(of Theorem 7.3) Fix \(0<\epsilon <c\) and let \(\eta _{\epsilon }\) and \(\delta :=\delta (\epsilon )\) be as in Lemma 7.4. Let \(\mathcal {E}_{\epsilon }:=\mathcal {E}_{\epsilon ,\delta }\) be as in Lemma 7.6 so \(\gamma [\mathcal {E}_{\epsilon }]>0\). Conditioned on \(\mathcal {E}_{\epsilon }\), Lemma 7.5 implies that

$$\begin{aligned} |DA[\dot{\ell }]|_H^2&\ge \int _0^{1-\epsilon }\left( 1+\partial _{xx}^2\log P_{1-t}f(X_t)\int _0^te^{\int _s^t\partial _{xx}^2\log P_{1-r}f(X_r)dr}ds\right) ^2dt\\&\ge c\log (1/\epsilon ), \end{aligned}$$

where the first inequality holds by (7.2). Taking square roots, and decreasing the value of the universal constant c if necessary, it follows that

$$\begin{aligned} \gamma [|DA[\dot{\ell }]|_{H}> c\sqrt{\log (1/\epsilon )}~|~\mathcal {E}_{\epsilon }]=1. \end{aligned}$$

\(\square \)

8 Optimal transport

The fundamental results of optimal transport on the Wiener space are due to Feyel and Üstünel [31]. One of their results, pertaining to our setting, is the following analogue of a theorem of Brenier in Euclidean spaces [63, Theorem 2.12].

Theorem 8.1

([31, Theorem 4.1]) Let \(\mu \) be a measure on \(\Omega \) defined by \(\frac{d\mu }{d\gamma }(\omega )=F(\omega )\), where \(F:\Omega \rightarrow \mathbb {R}\) is positive \(\gamma \)-a.e. and such that \(W_2(\gamma ,\mu )<\infty \). Then, there exists a unique (up to a constant) convex map \(\phi :\Omega \rightarrow \mathbb {R}\), such that \(\mu \) is the pushforward of \(\gamma \) under \(\nabla \phi \in H^1\), where \((\nabla \phi (\omega ))_t=\int _0^tD_s\phi (\omega )ds\) with \(D\phi \) being the Malliavin derivative of \(\phi \), and \(\mathbb {E}_{\gamma }\left[ | \omega -\nabla \phi (\omega )|_{H^1}^2\right] =W_2^2(\gamma ,\mu )\).

The main result in this section, Theorem 8.2, establishes a Cameron-Martin contraction for \(\nabla \phi \) in the case where \(F(\omega )=f(\omega _1)\) with f \((\kappa -1)\)-log-concave for \(\kappa \ge 0\); informally, \(\mu \) is \(\kappa \)-log-concave for \(\kappa \ge 0\). This proves the first part of Theorem 1.8. Our motivation is primarily to show that there are settings where the optimal transport map is a Cameron-Martin contraction, in contrast to the causal optimal transport map (second part of Theorem 1.8 via Theorem 7.3).

Theorem 8.2

Let \(\mu \) be a probability measure on \(\Omega \) given by \(d\mu (\omega )=f(\omega _1)d\gamma (\omega )\) where \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) is \((\kappa -1)\)-log-concave for some \(\kappa \ge 0\). Suppose, in addition, that the optimal transport map in \(\mathbb {R}^d\) from \(\gamma _d\) to \(p:=f\gamma _d\) is twice-differentiableFootnote 9. Then, the optimal transport map in \(\Omega \) from \(\gamma \) to \(\mu \) is a Cameron-Martin contraction with constant \(\max \left\{ \frac{1}{\sqrt{\kappa }},1\right\} \).

Recall that in the Euclidean setting, if \(p=fd\gamma _d\) with \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) \((\kappa -1)\)-log-concave for some \(\kappa \ge 0\), then the optimal transport map is a contraction with constant \(\frac{1}{\sqrt{\kappa }}\) [41, Theorem 2.2]. However, the fact that the analogous result of Theorem 8.2 gives the constant \(\max \left\{ \frac{1}{\sqrt{\kappa }},1\right\} \) is no coincidence. If the constant of the Cameron-Martin contraction of the optimal transport map on the Wiener space were in fact \(\frac{1}{\sqrt{\kappa }}\), then, arguing as in Sect. 5, we would conclude that the measure \(\mu \) on \(\Omega \) given by \(d\mu (\omega )=f(\omega _1)d\gamma (\omega )\) satisfies a Poincaré inequality with constant \(\frac{1}{\kappa }\). But as argued in [42, Remark 2.7], we expect that if \(\mu \) is equivalent to \(\gamma \) then its Poincaré constant must be greater than or equal to 1. Hence, we may expect Theorem 8.2 to be the optimal result.Footnote 10 (We note that the question of the contraction properties of the optimal transport map on Wiener space was addressed in [32, §6].)

Proof

We start by explicitly computing the optimal transport map O. Define the measure p on \(\mathbb {R}^d\) by \(\frac{dp}{d\gamma _d}(x):=f(x)\) and let \(\phi _d:\mathbb {R}^d\rightarrow \mathbb {R}\) be the convex function such that \(\nabla \phi _d\) is the optimal transport map from \(\gamma _d\) to p. Define \(\dot{\xi }\in H\) by \(\dot{\xi }_t:=\nabla \phi _d(\omega _1)-\omega _1\), for all \(t\in [0,1]\), and let \(O':\Omega \rightarrow \Omega \) be given by \(O'(\omega )_t:=\omega _t+\xi _t\) (where \(\xi _t:=\int _0^t\dot{\xi }_sds\)) for \(t\in [0,1]\). We claim that \(O=O'\). Indeed, by the uniqueness part of Theorem 8.1, and as \(O'\) is convex (according to Definition 6.2 and Remark 6.3), it suffices to show that the pushforward of \(\gamma \) by \(O'\) is \(\mu \), and that \(W_2^2(\gamma ,\mu )=\mathbb {E}_{\gamma }\left[ |\omega -O'(\omega ) |_{H^1}^2\right] \). The fact that \(\mu \) is the pushforward of \(\gamma \) by \(O'\) follows by construction: Let \(Z^b\) be a Brownian bridge on [0, 1] starting at 0 and terminating at \(b\in \mathbb {R}^d\). Then, for any continuous and bounded \(\eta :\Omega \rightarrow \mathbb {R}\), we have

$$\begin{aligned} \mathbb {E}_{\gamma }[\eta (O'(\omega ))]&=\mathbb {E}_{\gamma }[\mathbb {E}_{\gamma }[\eta (O'(\omega ))|\omega _1]]\\&=\int _{\mathbb {R}^d} \mathbb {E}[\eta (Z^{\nabla \phi _d(x)})]d\gamma _d(x)=\mathbb {E}_{Y\sim p}[\mathbb {E}[\eta (Z^Y)]]\\&=\mathbb {E}_{\mu }[\eta (\omega )]. \end{aligned}$$

To see that \(O'\) is in fact the actual optimal transport map we compute

$$\begin{aligned} \mathbb {E}_{\gamma }\left[ |\omega -O'(\omega ) |_{H^1}^2\right]&=\mathbb {E}_{\gamma }\left[ |\xi |_{H^1}^2\right] =\mathbb {E}_{\gamma }\left[ \int _0^1 |\omega _1-\nabla \phi _d(\omega _1)|^2dt\right] \\&=W_2^2(\gamma _d,p)\le W_2^2(\gamma ,\mu ), \end{aligned}$$

which shows that \(O'=O\). Next we show that O is a Cameron-Martin contraction. Fix \(h\in H^1\) and compute

$$\begin{aligned} |O(\omega +h)-O(\omega )|_{H^1}^2&=\int _0^1 |\dot{h}_t+\dot{\xi }_t(\omega +h)-\dot{\xi }_t(\omega )|^2dt\\&=\int _0^1 |\nabla \phi _d(\omega _1+h_1)-\nabla \phi _d(\omega _1)+\dot{h}_t-h_1|^2 dt. \end{aligned}$$

Since

$$\begin{aligned} \nabla \phi _d(\omega _1+h_1)-\nabla \phi _d(\omega _1)=\left[ \int _0^1\nabla ^2\phi _d(\omega _1+rh_1)dr\right] h_1, \end{aligned}$$

we may write

$$\begin{aligned} |O(\omega +h)-O(\omega )|_{H^1}^2&=\int _0^1 \left| \left[ \int _0^1\nabla ^2\phi _d(\omega _1+rh_1)dr-\text {Id}_d\right] h_1+\dot{h}_t\right| ^2 dt\\&{=}{:}\int _0^1 \left| Mh_1+\dot{h}_t\right| ^2 dt\\&=\int _0^1\left\{ |Mh_1|^2+2\langle Mh_1,\dot{h}_t\rangle +|\dot{h}_t|^2\right\} dt\\&=|Mh_1|^2+2\langle Mh_1,h_1\rangle +|h|_{H^1}^2. \end{aligned}$$

Since p is \(\kappa \)-log-concave, we have \(0\preceq \nabla ^2\phi _d\preceq \frac{1}{\sqrt{\kappa }}\text {Id}_d\) [41, Theorem 2.2], and hence, \(-\text {Id}_d \preceq M\preceq \left( \frac{1}{\sqrt{\kappa }}-1\right) \text {Id}_d\). There are now two cases: \(\kappa \ge 1\) and \(\kappa <1\). Suppose \(\kappa \ge 1\). We claim that \(|Mh_1|^2+2\langle Mh_1,h_1\rangle \le 0\). Indeed, the latter is equivalent to \(|[M+\text {Id}_d]h_1|^2\le |h_1|^2\), which is true since \(0\preceq M+\text {Id}_d \preceq \frac{1}{\sqrt{\kappa }}\text {Id}_d\preceq \text {Id}_d\). This shows that

$$\begin{aligned} \kappa \ge 1\quad \Longrightarrow \quad |O(\omega +h)-O(\omega )|_{H^1}\le |h|_{H^1}\quad \forall h\in H^1. \end{aligned}$$

Suppose now that \(\kappa <1\). We claim that \(|Mh_1|^2+2\langle Mh_1,h_1\rangle \le \left( \frac{1}{\kappa }-1\right) |h|_{H^1}^2\). Indeed, since \(|[M+\text {Id}_d]h_1|^2\le \frac{1}{\kappa }|h_1|^2\), we get \(|Mh_1|^2+2\langle Mh_1,h_1\rangle \le \left( \frac{1}{\kappa }-1\right) |h_1|^2\le \left( \frac{1}{\kappa }-1\right) |h|_{H^1}^2\) where we used \(|h_1|^2\le |h|_{H^1}^2\) by the Cauchy-Schwarz inequality. This shows that

$$\begin{aligned} \kappa < 1\quad \Longrightarrow \quad |O(\omega +h)-O(\omega )|_{H^1}\le \frac{1}{\sqrt{\kappa }} |h|_{H^1}\quad \forall h\in H^1. \end{aligned}$$

\(\square \)

Remark 8.3

The example of a one-dimensional Gaussian conditioned on being positive, constructed in Sect. 7, does not exactly satisfy the assumptions of Theorem 8.2 since the second derivative of the transport map between \(\gamma _1\) and the conditioned Gaussian \(p=f\gamma _1\) does not exist at every point in \(\mathbb {R}\). Nonetheless, the statement of Theorem 8.2 still holds true in this case. In the example of Sect. 7, the optimal transport map is explicit, \(\nabla \phi _1=F_p^{-1}\circ F_{\gamma _1}\), where \(F_p\) and \(F_{\gamma _1}\) are the cumulative distribution functions of p and \(\gamma _1\), respectively. Computing the derivatives of this map we see that \(\phi _1\) is twice-differentiable everywhere. Hence, the proof of Theorem 8.2 still goes through since \(\nabla ^2\phi _1\) must exist everywhere on the line \(\omega _1+rh_1\).

Remark 8.4

The proof of Theorem 8.2 shows that the optimal transport map O between \(\gamma \) and \(\mu (d\omega )=f(\omega _1)\gamma (d\omega )\) is essentially the optimal transport map in \(\mathbb {R}^d\) between \(\gamma _d\) and \(f\gamma _d\). This explains why we cannot use the optimal transport map on Wiener space instead of the Brownian transport map, since the desired contraction properties for the optimal transport maps in \(\mathbb {R}^d\) are still unknown.

Remark 8.5

Inspection of the proof of Theorem 8.2 reveals that if \(\theta :\mathbb {R}^d\rightarrow \mathbb {R}^d\) is a contraction which pushes forward \(\gamma _d\) to a target measure p, then one can construct a Cameron-Martin contraction \(\Theta :\Omega \rightarrow \Omega \) of \(\gamma \) onto the measure \(\mu \) on \(\Omega \) with time-1 marginal p, \(d\mu (\omega ):=\frac{dp}{d\gamma _d}(\omega _1)d\gamma (\omega )\). Indeed, define \(\Theta _t(\omega ):=\omega _t+t\theta (\omega _1)-t\omega _1\), for \(t\in [0,1]\), and repeat the computation in the proof of Theorem 8.2, as sketched below. Note that, in general, \(\Theta \) will not be the optimal transport map on the Wiener space, unless \(\theta \) is the optimal transport map on the Euclidean space.
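
For completeness, here is the computation in this case (assuming, as in the remark, that \(\theta \) is 1-Lipschitz). Writing \(\Delta :=\theta (\omega _1+h_1)-\theta (\omega _1)\), so that \(|\Delta |\le |h_1|\),

$$\begin{aligned} |\Theta (\omega +h)-\Theta (\omega )|_{H^1}^2=\int _0^1|\dot{h}_t+\Delta -h_1|^2dt=|h|_{H^1}^2+|\Delta |^2-|h_1|^2\le |h|_{H^1}^2, \end{aligned}$$

where the middle equality uses \(\int _0^1\dot{h}_tdt=h_1\). Hence \(\Theta \) is indeed a Cameron-Martin contraction with constant 1.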