The Brownian transport map

Contraction properties of transport maps between probability measures play an important role in the theory of functional inequalities. The actual construction of such maps, however, is a non-trivial task and, so far, relies mostly on the theory of optimal transport. In this work, we take advantage of the infinite-dimensional nature of the Gaussian measure and construct a new transport map, based on the Föllmer process, which pushes forward the Wiener measure onto probability measures on Euclidean spaces. Utilizing the tools of the Malliavin and stochastic calculus in Wiener space, we show that this Brownian transport map is a contraction in various settings where the analogous questions for optimal transport maps are open. The contraction properties of the Brownian transport map enable us to prove functional inequalities in Euclidean spaces, which are either completely new or improve on current results. Further and related applications of our contraction results are the existence of Stein kernels with desirable properties (which lead to new central limit theorems), as well as new insights into the Kannan–Lovász–Simonovits conjecture. We go beyond the Euclidean setting and address the problem of contractions on the Wiener space itself. We show that optimal transport maps and causal optimal transport maps (which are related to Brownian transport maps) between the Wiener measure and other target measures on Wiener space exhibit very different behaviors.


Introduction
One of the basic tools in the study of functional inequalities in Euclidean spaces is the use of Lipschitz maps T : R^d → R^d [20,37]. A good starting point for this discussion is Caffarelli's contraction theorem [12] (see also [19,29,41,62] for other proofs): if γ_d is the standard Gaussian measure on R^d and p is a probability measure on R^d which is more log-concave than γ_d, then the optimal transport map of Brenier T : R^d → R^d, which pushes forward γ_d to p, is 1-Lipschitz. In other words, the fact that p is more log-concave than γ_d is manifested by the contractive properties of the transport map T. With the existence of T in hand, we can easily transfer to p functional inequalities which are known to be true for γ_d. For example, the Gaussian Poincaré inequality states that for η : R^d → R we have
$$\mathrm{Var}_{\gamma_d}[\eta] \le \mathbb{E}_{\gamma_d}\big[|\nabla\eta|^2\big]. \qquad (1.1)$$
As T is 1-Lipschitz, its derivative is bounded, $|DT|_{\mathrm{op}} \le 1$, so
$$\mathrm{Var}_p[\eta] = \mathrm{Var}_{\gamma_d}[\eta \circ T] \le \mathbb{E}_{\gamma_d}\big[|(DT)^*\nabla\eta(T)|^2\big] \le \mathbb{E}_{\gamma_d}\big[|\nabla\eta(T)|^2\big] = \mathbb{E}_p\big[|\nabla\eta|^2\big],$$
where we used that p is the pushforward of γ_d under T. We see that p satisfies the Poincaré inequality with the same constant as γ_d.

This paper starts with the observation that since the Gaussian measure is infinite-dimensional in nature, the search for contractive transport maps from the Gaussian measure to some target measure should not be confined to Euclidean spaces, even if the target measure is a measure on R^d. Specifically, we take our source measure to be the Wiener measure (an infinite-dimensional Gaussian measure), which allows us to take advantage of the Malliavin and stochastic calculus of the Wiener space. Given a target measure p on R^d with density $f := \frac{dp}{d\gamma_d}$, our construction relies on the Föllmer process: the solution $X = (X_t)_{t\in[0,1]}$ of the stochastic differential equation
$$dX_t = \nabla \log P_{1-t} f(X_t)\, dt + dB_t, \qquad X_0 = 0, \qquad (1.2)$$
where $(B_t)$ is the standard Brownian motion in R^d and $(P_t)$ is the heat semigroup. This process can be seen as Brownian motion conditioned to be distributed like p at time 1; i.e., $X_1 \sim p$. Hence, we view the solution of (1.2) at time 1 as a transport map, which we call the Brownian transport map, $X_1 : \Omega \to \mathbb{R}^d$, pushing forward the Wiener measure γ on the Wiener space Ω to the target measure p on R^d.
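To make the dynamics (1.2) concrete, here is a minimal numerical sketch (ours, not from the paper) for the Gaussian-mixture target $p = \frac{1}{2}N(-1,1) + \frac{1}{2}N(1,1)$. In this case $f(x) = \frac{dp}{d\gamma_1}(x) = e^{-1/2}\cosh(x)$, the heat semigroup gives $P_\tau f(x) = e^{(\tau-1)/2}\cosh(x)$, so the Föllmer drift reduces to the time-independent $v(t,x) = \tanh(x)$, and an Euler–Maruyama discretization pushes the Wiener measure forward to (approximately) p:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: p = 0.5*N(-1,1) + 0.5*N(1,1), so f(x) = dp/dgamma_1(x) = exp(-1/2)*cosh(x).
# Since P_tau cosh = exp(tau/2) * cosh, the Foellmer drift is
# v(t,x) = d/dx log P_{1-t} f(x) = tanh(x), independent of t.
def drift(x):
    return np.tanh(x)

n_paths, n_steps = 4000, 400
dt = 1.0 / n_steps
x = np.zeros(n_paths)          # X_0 = 0 for every path
for _ in range(n_steps):       # Euler-Maruyama discretization of dX = v dt + dB
    x = x + drift(x) * dt + np.sqrt(dt) * rng.standard_normal(n_paths)

# X_1 should approximately follow p: mean 0, variance 1 (component) + 1 (centers) = 2.
print(x.mean(), x.var())
```

The drift being explicit here is special to mixtures of standard Gaussians; for a general target the drift $\nabla \log P_{1-t}f$ must itself be approximated.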
In the remainder of the introduction we present our results on the contractive properties of the Brownian transport map, as well as applications to functional inequalities and to central limit theorems. We also study the behavior of the Brownian transport map when considered as a map from the Wiener space to itself. This point of view further elucidates the connection between our results and optimal transport theory.
1.1. Almost-sure contraction. Before presenting our first result we discuss the types of measures for which it is reasonable to expect the Brownian transport map to be an almost-sure contraction (the reader is referred to section 2 for the exact definition). The rough intuition is that if the measure γ is squeezed into a more concentrated measure, then the transport map should be a contraction. We focus on several mechanisms which could, in principle, facilitate such contractions. The first mechanism works by requiring that S := diam(supp(p)) is finite, so that the entire mass of p is confined to a region of finite volume. The second mechanism, inspired by Caffarelli's result, works by imposing convexity assumptions on p: we say that p is κ-log-concave for some κ ∈ R if
$$-\nabla^2 \log \frac{dp}{dx} \succeq \kappa\, \mathrm{Id}_d \quad \text{on the support of } p.$$
Note that we allow κ to take negative values, and that the case κ = 0 corresponds to p being log-concave. When κ ≥ 1, p is more log-concave than γ_d (with κ = 1 when p = γ_d, as $-\nabla^2 \log \frac{d\gamma_d}{dx} = \mathrm{Id}_d$), so in that sense p is more concentrated than γ_d and we expect some type of contraction.
The following result shows that the Brownian transport map is an almost-sure contraction when the target measure satisfies either a convexity assumption or a finite-volume-of-support assumption. For example, as will be clear from the subsequent discussion, Theorem 1.1 always improves on the analogous result of Caffarelli, which states that when p is κ-log-concave, for κ > 0, the optimal transport map is $\frac{1}{\sqrt{\kappa}}$-Lipschitz. In the remainder of the paper we refer to ℓ-Lipschitz maps as contractions with constant ℓ.

Theorem 1.1. (Almost-sure contraction) Let p be a κ-log-concave measure for some κ ∈ R and let S := diam(supp(p)).
(i) If $\kappa S^2 \ge 1$, then the Brownian transport map between γ and p is an almost-sure contraction with constant $\frac{1}{\sqrt{\kappa}}$.
(ii) If $\kappa S^2 \le 1$, then the Brownian transport map between γ and p is an almost-sure contraction with constant $\left(\frac{e^{1-\kappa S^2}+1}{2}\right)^{1/2} S$.
(The Wiener measure γ is such that Ω ∋ ω ∼ γ is a standard Brownian motion in R^d, where Ω is the classical Wiener space of continuous paths in R^d parameterized by time t ∈ [0,1].)
To unpack Theorem 1.1 let us consider some of its important special cases.
• S < ∞ and κ = 0. This setting corresponds to the case where p is log-concave with bounded convex support. It is an open question [41, Problem 4.3] whether the optimal transport map of Brenier between γ_d and p is a contraction with a dimension-free constant. On the other hand, Theorem 1.1 shows that the Brownian transport map between γ and p is in fact an almost-sure contraction with a dimension-free constant of the optimal order O(S).
• κ > 0. If $\kappa S^2 \ge 1$ then we obtain the exact analogue of Caffarelli's result for the optimal transport map [41, Theorem 2.2]. If $\kappa S^2 < 1$ then part (ii) shows that the Brownian transport map is an almost-sure contraction with constant $\left(\frac{e^{1-\kappa S^2}+1}{2}\right)^{1/2} S \le \frac{1}{\sqrt{\kappa}}$; the last inequality holds since $\kappa S^2 < 1$ and by the estimate $1 - \frac{1}{x} \le \log x$. Thus, we get an improvement on the analogue of Caffarelli's result.
• S < ∞ and κ < 0. In this setting only part (ii) applies, and we see that the Brownian transport map is an almost-sure contraction with constant $\left(\frac{e^{1+|\kappa| S^2}+1}{2}\right)^{1/2} S$. There are no analogous results for other transport maps.
• S = ∞ and κ ≤ 0. The bounds provided by Theorem 1.1 are trivial in this case. This is unavoidable, as we explain in section 1.3.
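The comparison in the second bullet can be sanity-checked numerically. The sketch below is our own; it assumes our parse of the part-(ii) constant as $\left(\frac{e^{1-\kappa S^2}+1}{2}\right)^{1/2} S$, and verifies that for $0 < \kappa S^2 \le 1$ this constant never exceeds Caffarelli's $1/\sqrt{\kappa}$, with equality exactly at $\kappa S^2 = 1$:

```python
import numpy as np

def part_ii_constant(kappa, S):
    # ((e^{1 - kappa*S^2} + 1) / 2)^{1/2} * S  -- our reading of Theorem 1.1(ii)
    return np.sqrt((np.exp(1.0 - kappa * S**2) + 1.0) / 2.0) * S

# Claim: for u = kappa*S^2 in (0, 1], part-(ii) constant <= 1/sqrt(kappa),
# equivalently sqrt(u * (e^{1-u} + 1) / 2) <= 1 on (0, 1].
u = np.linspace(1e-6, 1.0, 100_000)
ratio = np.sqrt(u * (np.exp(1.0 - u) + 1.0) / 2.0)
print(ratio.max())  # the maximum is attained at u = 1, where the two constants agree

print(part_ii_constant(0.5, 1.0), 1.0 / np.sqrt(0.5))
```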
Remark 1.2. The distinction between $\kappa S^2 \ge 1$ and $\kappa S^2 < 1$ is not necessary, and one could obtain more refined results; see Remark 3.5. The formulation of Theorem 1.1, however, is the cleanest, which is why we chose it.
Theorem 1.1 goes beyond the above examples by capturing the interplay between convexity (including κ < 0) and support size in the contraction properties of the Brownian transport map. The reason we can prove the results of Theorem 1.1, which are unknown for the Brenier map, is that the Malliavin calculus available on the Wiener space allows us to write a differential equation for the derivative of the Brownian transport map, which in turn shows that it is a contraction. This feature has no analogue for optimal transport maps (but see [37, equation (1.8)] for a different transport map). Moreover, as will be shown in section 8, replacing the Brownian transport map by the optimal transport map on the Wiener space (see section 1.5 for more details) is not possible, since the optimal transport map on Wiener space essentially reduces to the optimal transport map between γ_d and p, for which the desired contraction properties are not known.
In our second result we identify a third mechanism that promotes the existence of contractive transport maps. In essence, the idea is that well-behaved mixtures of Gaussians have tame concentration profiles. Indeed, in the case where the mixing measure has bounded support, we establish that the Brownian transport map is a contraction.

Theorem 1.3. (Gaussian mixtures) Let p := γ_d ⋆ ν be the convolution of the standard Gaussian measure γ_d with a probability measure ν on R^d supported on a ball of radius R. Then the Brownian transport map between γ and p is an almost-sure contraction with constant $\left(\frac{e^{2R^2}-1}{2}\right)^{1/2}\frac{1}{R}$.
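The constant of Theorem 1.3 behaves as one would hope in the degenerate limit. A small numerical sketch (ours; it assumes our parse of the constant as $\left(\frac{e^{2R^2}-1}{2}\right)^{1/2}\frac{1}{R}$) checks that it tends to 1 as R → 0 (when the mixture collapses to γ_d and the map should approach the identity) and increases with R:

```python
import numpy as np

def mixture_constant(R):
    # ((e^{2 R^2} - 1) / 2)^{1/2} / R  -- contraction constant of Theorem 1.3
    return np.sqrt((np.exp(2.0 * R**2) - 1.0) / 2.0) / R

R = np.array([1e-4, 0.1, 0.5, 1.0, 2.0])
print(mixture_constant(R))   # approaches 1 as R -> 0, grows with R
```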
In the one-dimensional case, an analogue of Theorem 1.3 was established in [67] for the Brenier map. The proof relied on the explicit expression for transport maps between measures on the real line. While it is unknown whether the Brenier map enjoys similar properties in higher dimensions, our analysis of the Brownian transport map yields Theorem 1.3, a generalization to arbitrary dimensions. As a corollary we are able to deduce several new functional inequalities, as well as improve upon existing ones, for Gaussian mixtures (see section 1.2 below).
1.2. Functional inequalities. Once Theorems 1.1 and 1.3 are established, the generality of the transport method opens the door to the improvement of numerous functional inequalities. Section 5 is dedicated to proving such results, which include the isoperimetric inequality, Ψ-log-Sobolev inequalities (a generalization of the log-Sobolev inequality), and q-Poincaré inequalities (a generalization of the Poincaré inequality). In Table 1 we summarize our results and note the ones which appear to be new. The definitions and exact statements are deferred to Section 5.
As can be seen from the table, for log-concave measures some of the results are not new.
Theorem 5.3
(a) Comparable results with more restrictive assumptions were proven in [13]. (b) Comparable results with a worse exponent were proven in [65,6]. (c) Comparable results with a worse exponent were proven in [17].
Table 1. Summary of functional inequalities obtained from Theorem 1.1 and Theorem 1.3. For results which were previously known we supply references; otherwise the relevant theorem is noted.
However, let us note that the proofs of the aforementioned results, obtained gradually over the last several decades, used a myriad of different techniques. These include, among others, localization methods, Bakry–Émery calculus, and Brunn–Minkowski theory, and often require ad hoc arguments for the specific functional inequality in question. In contrast, our transportation approach provides a unifying framework for such functional inequalities. As a result we are also able to obtain new, previously unknown, results such as Ψ-log-Sobolev and q-Poincaré inequalities for log-concave measures with bounded support. While it is likely that other techniques could yield comparable results, the benefit of our approach is that no further arguments are needed beyond Theorem 1.1 and arguments similar to the one outlined in (1.1).
The bottom row of Table 1 deals with Gaussian mixtures. The question of the existence of functional inequalities for a mixture of distributions, given the corresponding inequalities for the individual components, has been investigated for some time. Only recently was it settled for the Poincaré and log-Sobolev inequalities [17, Theorem 1]. The result of [17] is very general and applies to many families of mixture distributions but, on the other hand, its method of proof seems to be specialized to the Poincaré and log-Sobolev inequalities. Here the generality of the transport method allows us to tackle inequalities which seem to lie outside the scope of previous methods. In addition, the generality of the method of [17] misses the special nature of mixtures of Gaussians. Indeed, our results improve on [17, Corollaries 1, 2].

1.3. Log-concave measures. As we saw in section 1.2, measures which are κ-log-concave (with κ > 0) satisfy a Poincaré inequality with constant κ⁻¹ (which in particular does not depend on the dimension d). When κ = 0 this constant blows up, which leaves open the question of a Poincaré inequality for log-concave measures. The Kannan–Lovász–Simonovits conjecture [36], in one of its formulations, states that there exists a constant $C_{\mathrm{kls}}$, which does not depend on the dimension d, such that for any isotropic (i.e., centered with covariance equal to the identity matrix) log-concave measure p on R^d we have
$$\mathrm{Var}_p[\eta] \le C_{\mathrm{kls}}\, \mathbb{E}_p\big[|\nabla\eta|^2\big] \quad \text{for all } \eta : \mathbb{R}^d \to \mathbb{R}.$$
In words, any isotropic log-concave measure on R^d satisfies a Poincaré inequality with a dimension-free constant $C_{\mathrm{kls}}$. In light of the above discussion, transport maps offer a natural route to proving the conjecture: given an isotropic log-concave measure p on R^d, we would like to construct a transport map, from γ_d or γ, to p which is an almost-sure contraction with constant $C_{\mathrm{kls}}$. Unfortunately, in general, such a map cannot exist: indeed, as seen in Table 1, such a map would imply that p satisfies a
log-Sobolev inequality with a dimension-free constant. But this is known to be false, because a log-Sobolev inequality is equivalent to sub-Gaussian concentration [46, Theorem 5.3], which fails for the two-sided exponential measure even though it is isotropic and log-concave. Nonetheless, the transport approach towards the conjecture can still be made to work, by using a weaker notion of contraction together with an important result of E. Milman. Indeed, consider the Brownian transport map and suppose that, instead of an almost-sure bound $|\mathcal{D}X_1|_{\mathrm{op}} \le C_{\mathrm{kls}}$, we only have a bound in expectation,
$$\mathbb{E}_\gamma\big[|\mathcal{D}X_1|_{\mathrm{op}}^2\big] \le C,$$
for some dimension-free constant C. Repeating the argument above, and using Hölder's inequality, we find that $\mathrm{Var}_p[\eta] \le C\,\mathrm{Lip}^2(\eta)$, where $\mathrm{Lip}(\eta) := \sup_{x\in\mathbb{R}^d} |\nabla\eta(x)|$. In principle this bound is weaker than a Poincaré inequality because of the use of the $L^\infty$ norm of the gradient rather than the $L^2$ norm. However, as shown by E. Milman [55], a Poincaré inequality is equivalent, up to a dimension-free constant, to first-moment concentration, which follows from the above $L^\infty$ bound. In conclusion, the Kannan–Lovász–Simonovits conjecture is proven as soon as one can show that $\mathbb{E}_\gamma\big[|\mathcal{D}X_1|_{\mathrm{op}}^2\big]$ is bounded by a dimension-free constant. Significant progress towards the resolution of the Kannan–Lovász–Simonovits conjecture was made in a series of works [25,49,18,39,38]. Building on these results and techniques, we are able to make the, so far missing, connection between measure transportation and the Kannan–Lovász–Simonovits conjecture.
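The reduction described here can be written as a one-line chain of inequalities (our sketch; it uses the chain rule for the Malliavin derivative and the Poincaré inequality on Wiener space):

```latex
\mathrm{Var}_p[\eta]
  = \mathrm{Var}_\gamma[\eta \circ X_1]
  \le \mathbb{E}_\gamma\!\left[\big|D(\eta \circ X_1)\big|_H^2\right]
  = \mathbb{E}_\gamma\!\left[\big|(\mathcal{D}X_1)^{*}\,\nabla\eta(X_1)\big|_H^2\right]
  \le \mathbb{E}_\gamma\!\left[\big|\mathcal{D}X_1\big|_{\mathrm{op}}^2\right]\mathrm{Lip}^2(\eta).
```

Here the first inequality is the Gaussian Poincaré inequality on Wiener space, and the last step bounds $|\nabla\eta(X_1)|$ by $\mathrm{Lip}(\eta)$; E. Milman's theorem then upgrades the resulting $L^\infty$-gradient bound to a genuine Poincaré inequality at the cost of a universal factor.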
Theorem 1.4. (Contraction in expectation for log-concave measures) Let p be an isotropic log-concave measure on R^d with compact support, and let $X_1 : \Omega \to \mathbb{R}^d$ be the Brownian transport map from the Wiener measure γ to p. There exists a universal constant ζ such that, for any positive integer m, the moments $\mathbb{E}_\gamma\big[|\mathcal{D}X_1|_{\mathrm{op}}^{2m}\big]$ are bounded in terms of ζ and m up to factors polylogarithmic in the dimension d; here $|\cdot|_{\mathrm{op}}$ is the operator norm of operators from the Cameron–Martin space H to R^d.
Taking m = 1 in Theorem 1.4 we see that $C_{\mathrm{kls}}$ is almost dimension-free, as would be expected from the Kannan–Lovász–Simonovits conjecture. In fact, our transport perspective allows us to go beyond m = 1, which is needed for the applications outlined below.
Remark 1.5. As explained in this section, an expectation bound of the form $\mathbb{E}\big[|\mathcal{D}T|_{\mathrm{op}}^2\big] \le C$, where T is a transport map from either γ or γ_d to p, with a dimension-free universal constant C, would lead to a proof of the Kannan–Lovász–Simonovits conjecture. In fact, one of the novelties of the proof of Theorem 1.4 is that it reveals that the reverse is also true, up to log d factors, when T is the Brownian transport map. That is, assuming that the Kannan–Lovász–Simonovits conjecture is true, we would get, up to log d factors, an expectation bound $C_m$ for a dimension-free universal constant $C_m$ which depends only on m.
1.4. Stein kernels and central limit theorems. Our proof of Theorem 1.4 is based on known results concerning $C_{\mathrm{kls}}$. Thus, Theorem 1.4 does not supply any new information regarding the Kannan–Lovász–Simonovits conjecture itself. However, for isotropic log-concave measures, the transport approach is useful not only in the study of the conjecture but also in the theory of Stein kernels. As will become evident soon, these results go beyond the Poincaré inequality, and hence do not follow from the current results on the Kannan–Lovász–Simonovits conjecture.
Given a centered measure p on R^d, a matrix-valued map $s_p$ is called a Stein kernel for p if
$$\int_{\mathbb{R}^d} \langle x, \nabla\eta(x)\rangle\, dp(x) = \int_{\mathbb{R}^d} \langle s_p(x), \nabla^2\eta(x)\rangle_{\mathrm{HS}}\, dp(x)$$
for a big-enough family of functions η : R^d → R. The discrepancy $S(s_p) := \mathbb{E}_p\big[|s_p - \mathrm{Id}_d|_{\mathrm{HS}}^2\big]^{1/2}$ plays an important role in functional inequalities [48,61,58] and normal approximations [57,47,28,23]. For applications, it is often enough to bound $\mathbb{E}_p\big[|s_p|^2_{\mathrm{HS}}\big]$. While in one dimension the Stein kernel of a given measure is unique and given by an explicit formula, in higher dimensions these kernels are non-unique and their construction is often non-trivial [23]. It was observed in [15] that transport maps with certain properties are good candidates for constructing Stein kernels. We follow this strategy and construct Stein kernels with small Hilbert–Schmidt norm, based on the Brownian transport map, for a very large class of measures.
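In dimension one the explicit formula reads $s_p(x) = \frac{1}{p(x)}\int_x^\infty y\, p(y)\, dy$ for a centered density p, and the defining identity becomes $\mathbb{E}_p[x\,\eta'(x)] = \mathbb{E}_p[s_p(x)\,\eta''(x)]$. As an illustration of our own (the Laplace measure and the test function are arbitrary choices), for the standard two-sided exponential density $p(x) = e^{-|x|}/2$ the formula gives $s_p(x) = 1 + |x|$, which we can verify by quadrature:

```python
import numpy as np

def trap(f, x):
    # trapezoid rule, written out to avoid NumPy-version differences
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

# Centered Laplace density p(x) = exp(-|x|)/2; its (unique) 1-D Stein kernel is
# s_p(x) = (1/p(x)) * int_x^infty y p(y) dy = 1 + |x|.
x = np.linspace(-40.0, 40.0, 400_001)
p = 0.5 * np.exp(-np.abs(x))
s = 1.0 + np.abs(x)

# Check E_p[x eta'(x)] = E_p[s_p(x) eta''(x)] for eta(x) = exp(x/2).
deta = 0.5 * np.exp(x / 2)     # eta'
d2eta = 0.25 * np.exp(x / 2)   # eta''

lhs = trap(x * deta * p, x)
rhs = trap(s * d2eta * p, x)
print(lhs, rhs)   # both sides equal 8/9 analytically
```

Note also that $\mathbb{E}_p[s_p] = 1 + \mathbb{E}_p[|x|] = 2$, matching the variance of p, as any Stein kernel of a centered measure must.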
Theorem 1.6. (Stein kernels) Let p be an isotropic log-concave measure on R^d with compact support. Let χ : R^d → R^k be a continuously differentiable function with bounded partial derivatives such that $\mathbb{E}_p[\chi] = 0$ and $\mathbb{E}_p\big[|\nabla\chi|_{\mathrm{op}}^8\big] < \infty$. Then the pushforward measure q := χ_*p on R^k admits a Stein kernel τ_q whose squared Hilbert–Schmidt norm is controlled, in expectation, by a universal constant a > 0 times a quantity depending on $\mathbb{E}_p\big[|\nabla\chi|_{\mathrm{op}}^8\big]$ and on polylogarithmic factors in the dimension.
For example, taking k = d and χ(x) := x shows that, for p isotropic and log-concave, p admits a Stein kernel τ_p with
$$\mathbb{E}_p\big[|\tau_p|_{\mathrm{HS}}^2\big] \le a\, d\, \mathrm{polylog}(d). \qquad (1.3)$$
The analogous bound to (1.3), with a better polylog factor and a different Stein kernel $s_p$, follows from [23], which showed $\mathbb{E}_p\big[|s_p|^2_{\mathrm{HS}}\big] \le d\, C_p$, where $C_p$ is the Poincaré constant of p. Indeed, since $C_p \le c \log d$ by the result of [38], the bound (1.3) holds for $s_p$. However, unlike previous constructions, our construction is well-behaved with respect to compositions. It allows us to consider general χ, which leads to the existence of Stein kernels τ_q with bounded Hilbert–Schmidt norm, where now q does not necessarily satisfy a Poincaré inequality.
As a concrete application, we use Theorem 1.6 to deduce new central limit theorems with nearly optimal convergence rates. The best dimensional dependence in the convergence rate one could expect is of order $\sqrt{d/n}$, as can be seen by considering product measures. Most known results establish general rates of convergence, in various distances, which are not better than $d/\sqrt{n}$, and typically require a super-linear dependence on the dimension (see [9,16] for some notable examples).
To improve on such bounds, several recent works have shown that, by imposing strong structural assumptions on the common law of the summands, one can reduce the rate of convergence to $\sqrt{d/n}$. However, these works dealt with highly regular measures, such as log-concave measures [27], measures with small support [66], or measures satisfying a Poincaré inequality [28,23]. These assumptions can be restrictive for applications involving heavy-tailed measures, whose moment generating function may not be well-defined and which hence do not have sub-exponential tails.
We bypass the above restrictive assumptions by combining the Stein kernel approach to normal approximations with Theorem 1.6. The starting point is the inequality, valid for any Stein kernel $s_p$,
$$W_2(p, \gamma_d) \le \mathbb{E}_p\big[|s_p - \mathrm{Id}_d|_{\mathrm{HS}}^2\big]^{1/2}, \qquad (1.4)$$
where $W_2$ is the Wasserstein 2-distance [47, Proposition 3.1]. In particular, inequality (1.4) can be used to prove central limit theorems: suppose that p is isotropic and let $\{Y_i\}$ be an i.i.d. sequence sampled from p. Then, as shown in [47, section 2.5], for every given Stein kernel $s_p$ there exists a Stein kernel $s_{p_n}$ for the law $p_n$ of $\frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i$ with $\mathbb{E}_{p_n}\big[|s_{p_n} - \mathrm{Id}_d|_{\mathrm{HS}}^2\big]^{1/2} \le \frac{1}{\sqrt{n}}\,\mathbb{E}_p\big[|s_p - \mathrm{Id}_d|_{\mathrm{HS}}^2\big]^{1/2}$. Combining (1.4) with the triangle inequality thus yields
$$W_2(p_n, \gamma_d) \le \frac{1}{\sqrt{n}}\Big(\mathbb{E}_p\big[|s_p|_{\mathrm{HS}}^2\big]^{1/2} + \sqrt{d}\Big).$$
The upshot of this discussion is that if we can construct a Stein kernel $s_p$ with a small Hilbert–Schmidt norm, then we obtain a central limit theorem with a good rate. In particular, whenever $\mathbb{E}_p\big[|s_p|^2_{\mathrm{HS}}\big] = O(d\,\mathrm{polylog}(d))$, we get a $\sqrt{d/n}$ rate of convergence up to polylogarithmic factors. Using our Stein kernel τ_p and the bound in Theorem 1.6 we obtain:

Corollary 1.7. (Central limit theorem) Let p, q, and χ be as in Theorem 1.6, and further suppose that q is isotropic. Then, if $\{Y_i\}$ are i.i.d. sampled from q and $q_n$ is the law of $\frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i$, the distance $W_2(q_n, \gamma_k)$ is bounded by $\frac{a}{\sqrt{n}}$ times a quantity depending only on k, $\mathbb{E}_p\big[|\nabla\chi|_{\mathrm{op}}^8\big]$, and polylogarithmic factors in the dimension, for some universal constant a > 0.
Corollary 1.7 significantly relaxes the regularity assumptions of the above-mentioned works while maintaining a nearly optimal rate of convergence. For example, if χ has quadratic growth then q can have tails heavier than exponential, so it cannot satisfy a Poincaré inequality. In contrast, in such a setting, Corollary 1.7 provides results which are comparable, up to the Sobolev norm of χ and log d factors, to the ones obtained for log-concave measures [23,28]. Another appealing feature is the ability to treat singular measures when k > d. A particular case of interest is $\chi(x) := x^{\otimes m}$ for a positive integer m. Even though q may be heavy-tailed in this case, since $\mathbb{E}_p\big[|\nabla\chi|_{\mathrm{op}}^8\big]$ is finite when p is log-concave, we get a central limit theorem for sums of i.i.d. tensor powers. Proving such a result was a question posed in [53, section 3], where it was also suggested that finding an appropriate transport map could prove useful. Using our construction, Corollary 1.7 resolves this question.
1.5. Contractions on Wiener space. In the previous sections we viewed the solution X of (1.2) as a map $X_1 : \Omega \to \mathbb{R}^d$. We can, however, go beyond the Euclidean setting by not restricting ourselves to the value of X at time t = 1. We then get a map X : Ω → Ω which transports the Wiener measure γ to the measure μ on Ω given by $d\mu(\omega) = \frac{dp}{d\gamma_d}(\omega_1)\, d\gamma(\omega)$ for ω ∈ Ω, where $\omega_t$ is the value of ω at time t ∈ [0,1]. In words, μ is obtained from γ by reweighting the probability of ω according to the value of $\frac{dp}{d\gamma_d} : \mathbb{R}^d \to \mathbb{R}_{\ge 0}$ at $\omega_1 \in \mathbb{R}^d$. This leads to the following question: is X a contraction, in a suitable sense, from the Wiener measure γ to μ under appropriate conditions on μ? In fact, this question can be placed in the general context of transport maps on Wiener space, as we now explain.
We start by recalling that the Wiener space Ω contains the important Cameron–Martin space H¹, whose significance lies in the fact that the law of ω + h is absolutely continuous with respect to γ whenever h ∈ H¹. The Cameron–Martin space is the image of $H := L^2([0,1]; \mathbb{R}^d)$ under the injection $\dot h \mapsto \big(t \mapsto \int_0^t \dot h_s\, ds\big)$, and the natural transport cost between paths ω, ω′ is the squared Cameron–Martin norm $|\omega - \omega'|^2_{H^1}$ (interpreted as +∞ when ω − ω′ ∉ H¹). Based on this cost, two notions of optimal transport maps on Ω can be defined. Given probability measures ν, μ on Ω, let Π(ν, μ) be the set of probability measures on Ω × Ω whose two marginals are equal to ν and μ, respectively. The (squared) Wasserstein 2-distance between ν and μ is defined as
$$W_2^2(\nu, \mu) := \inf_{\pi \in \Pi(\nu,\mu)} \int_{\Omega\times\Omega} |\omega - \omega'|^2_{H^1}\, d\pi(\omega, \omega').$$
Assuming that $W_2^2(\nu,\mu) < \infty$, the optimal transport map O : Ω → Ω, when it exists, is a map which transports ν to μ and satisfies $\mathbb{E}_\nu\big[|O(\omega) - \omega|^2_{H^1}\big] = W_2^2(\nu, \mu)$. This definition generalizes the one appearing in classical optimal transport theory on Euclidean spaces [64]. In Euclidean spaces, the existence of optimal transport maps and their regularity was proven by Brenier [64, Theorem 2.12], while the analogous result in Wiener space is due to Feyel and Üstünel [31]. Since $W_2^2(\nu,\mu) < \infty$, we may write $O(\omega) = \omega + \xi^O(\omega)$, where
$$\mathbb{E}_\nu\big[|\xi^O|^2_{H^1}\big] = \inf\Big\{\mathbb{E}_\nu\big[|\xi|^2_{H^1}\big] : \xi : \Omega \to H^1,\ \mathrm{Law}(\omega + \xi(\omega)) = \mu \text{ with } \omega \sim \nu\Big\}.$$
Importantly, we do not require that ξ(ω) be an adapted process. (Colloquially, a process is adapted if it cannot anticipate the future; see section 2 for the precise definition.) We now turn to the second notion of optimal transport: given probability measures ν, μ on Ω, we define the causal optimal transport map A : Ω → Ω to be the map which transports ν to μ, written $A(\omega) = \omega + \xi^A(\omega)$, satisfying
$$\mathbb{E}_\nu\big[|\xi^A|^2_{H^1}\big] = \inf\Big\{\mathbb{E}_\nu\big[|\xi|^2_{H^1}\big] : \xi : \Omega \to H^1,\ \mathrm{Law}(\omega + \xi(\omega)) = \mu \text{ with } \omega \sim \nu,\ \xi \text{ adapted}\Big\}.$$
(This definition is the analogue of the Monge problem rather than of the Wasserstein distance; there is a more general definition [44] of causal optimal transport, corresponding to the adapted Wasserstein distance, but it is not important for our work.) This notion of optimality, sometimes referred to as adapted optimal transport, has recently gained a lot of traction (e.g., [7] and the references therein). The connection between these transport maps and the Brownian transport map follows from the work of Lassalle [44]. It turns out that when $d\mu(\omega) = \frac{dp}{d\gamma_d}(\omega_1)\, d\gamma(\omega)$ for ω ∈ Ω, the causal optimal transport map A : Ω → Ω which transports γ to μ is precisely the Föllmer process X : Ω → Ω. This is essentially a consequence of Girsanov's theorem together with the entropy-minimization property of the Föllmer process [33].
Once a notion of optimal (causal or non-causal) transport map on Wiener space is established, the question of contraction arises: given a measure μ on Ω which is more log-concave than the Wiener measure γ, is either O or A a contraction? We are not aware of any such results in the current literature. To make this question precise we need a notion of convexity on Ω as well as a notion of contraction. We postpone the precise definitions to section 8 and for now refer to such a notion of contraction as a Cameron–Martin contraction. Let us state some of our results in this direction.
Theorem 1.8. (Cameron–Martin contraction)
• Let p be any 1-log-concave measure on R^d and let μ be the measure on Wiener space given by $d\mu(\omega) = \frac{dp}{d\gamma_d}(\omega_1)\, d\gamma(\omega)$ for ω ∈ Ω. Then the optimal transport map O from the Wiener measure γ to μ is a Cameron–Martin contraction with constant 1.
• There exists a 1-log-concave measure p on R^d such that the causal optimal transport map A from γ to μ, where $d\mu(\omega) = \frac{dp}{d\gamma_d}(\omega_1)\, d\gamma(\omega)$ for ω ∈ Ω, is not a Cameron–Martin contraction with any constant.
The first part of the theorem is a straightforward consequence of Caffarelli's contraction theorem. The second part requires some work: we construct a non-trivial example (see Remark 7.1) in which the causal optimal transport map fails to be a Cameron–Martin contraction.
Organization of the paper. Section 2 contains the preliminaries necessary for this work, including the definition of the Brownian transport map based on the Föllmer process. Section 3 contains the construction of the Föllmer process and the analysis of its properties, which then leads to the almost-sure contraction properties of the Brownian transport map; the main results of this section are contained in Theorem 3.1. Section 4 focuses on the setting where the target measure is log-concave with compact convex support and shows that, in this setting, we can bound the moments of the derivative of the Brownian transport map; the main result is Theorem 4.2. In addition, section 4 contains a short explanation of the connection between stochastic localization and the Föllmer process. In section 5 we use the almost-sure contraction established in Theorem 3.1 to prove new functional inequalities. In addition, section 5 contains our results on Stein kernels and their applications to central limit theorems. In section 6 we set up the preliminaries necessary for the study of contraction properties of transport maps on the Wiener space itself. In section 7 we show that causal optimal transport maps are not Cameron–Martin contractions even when the target measure is κ-log-concave, for any κ. Finally, section 8 is devoted to optimal transport on the Wiener space.
Wiener space and for other discussions about this work.We also thank Ronen Eldan, Alexandros Eskenazis, Bo'az Klartag, and Emanuel Milman for useful conversations about the topics of this paper.Finally, we thank the anonymous referees for their valuable suggestions and corrections.
The authors gratefully acknowledge the program Geometric Functional Analysis and Applications that took place in 2017 at the Mathematical Sciences Research Institute (MSRI), where their collaboration on this project was initiated. Dan Mikulincer is partially supported by European Research Council grant no. 803084. Yair Shenfeld was partially funded by NSF grant DMS-1811735 and the Simons Collaboration on Algorithms & Geometry; this material is based upon work supported by the National Science Foundation under Award Number 2002022.

Preliminaries
For the rest of the paper we fix a dimension d and let $f : \mathbb{R}^d \to \mathbb{R}_{\ge 0}$ be a function such that $\int_{\mathbb{R}^d} f\, d\gamma_d = 1$, where γ_d is the standard Gaussian measure on R^d. We denote by p the probability measure $dp(x) := f(x)\, d\gamma_d(x)$ and further assume that the relative entropy $H(p\,|\,\gamma_d) := \int_{\mathbb{R}^d} \log \frac{dp}{d\gamma_d}\, dp$ is finite. We set S := diam(supp(p)). We write ⟨·,·⟩ for the Euclidean inner product and |·| for the corresponding norm. Our notion of convexity is the following: for κ ∈ R, we say that p is κ-log-concave if $-\nabla^2 \log \frac{dp}{dx} \succeq \kappa\, \mathrm{Id}_d$ on the support of p. Next we recall some basics of the classical Wiener space and the Malliavin calculus [59].
Wiener space. Let (Ω, F, γ) be the classical Wiener space: $\Omega := \{\omega \in C([0,1]; \mathbb{R}^d) : \omega_0 = 0\}$, γ is the Wiener measure, and F is the completion (with respect to γ) of the Borel sigma-algebra generated by the uniform norm $|\omega|_\infty := \sup_{t\in[0,1]} |\omega_t|$ for ω ∈ Ω. In words, a path ω ∈ Ω sampled according to γ has the law of a Brownian motion in R^d running from time 0 to time 1. We set $W_t := W(\omega)_t := \omega_t$ for t ∈ [0,1] and let $(\mathcal{F}_t)_{t\in[0,1]}$ be the filtration on Ω generated by $(W_t)_{t\in[0,1]}$ together with the null sets of F. We say that a process $(U_t)_{t\in[0,1]}$ is adapted if $U_t$ is $\mathcal{F}_t$-measurable for every t ∈ [0,1]. For the rest of the paper we define a probability measure μ on Ω by $d\mu(\omega) = f(\omega_1)\, d\gamma(\omega)$.
An important Hilbert subspace of Ω is the Cameron–Martin space H¹, which is defined as follows: let $H := L^2([0,1]; \mathbb{R}^d)$ and, for $\dot g \in H$, let $i(\dot g)_t := \int_0^t \dot g_s\, ds$. Then $H^1 := \{i(\dot g) : \dot g \in H\}$, and we often write $h_t = \int_0^t \dot h_s\, ds$ for ḣ ∈ H and h ∈ H¹. The space H¹ has the inner product induced from the inner product of the Hilbert space H, namely $\langle h, g\rangle_{H^1} := \int_0^1 \langle \dot h_s, \dot g_s\rangle\, ds$. The significance of the Cameron–Martin space is that the law of the process $W + h = (\omega_t + h_t(\omega))_{t\in[0,1]}$ is absolutely continuous with respect to γ whenever h(ω) ∈ H¹ γ-a.e. and $(h_t(\omega))_{t\in[0,1]}$ is adapted and regular enough; this is a consequence of Girsanov's theorem. Given ḣ ∈ H we set $W(\dot h) := \int_0^1 \dot h_t\, dW_t$, where the integral is the stochastic Itô integral; in this notation, for deterministic ḣ ∈ H, $W(\dot h)$ is a centered Gaussian random variable with variance $|\dot h|_H^2$. Next we define the notion of contraction which is compatible with the Cameron–Martin space.
In Euclidean space, a function is Lipschitz if and only if its derivative (which exists almost everywhere) is bounded. In order to find the analogue of this fact for our notion of contraction, we need an appropriate notion of derivative on the Wiener space.
Malliavin calculus. The calculus on the Wiener space was developed by P. Malliavin in the 1970s, and it will play an important role in our proof techniques. The basic object of analysis in this theory is the variation of a function F : Ω → R as the input ω ∈ Ω is perturbed. In order for the calculus to be compatible with the Wiener measure, only perturbations in the directions of the Cameron–Martin space H¹ are considered. We now sketch the construction of the Malliavin derivative and refer to [59] for a complete treatment. The construction of derivatives of F starts with the definition of the class S of smooth random variables: F ∈ S if there exist m ∈ Z₊ and a smooth function η : R^m → R, whose partial derivatives have polynomial growth, such that $F = \eta\big(W(\dot h_1), \ldots, W(\dot h_m)\big)$ for some $\dot h_1, \ldots, \dot h_m \in H$. The Malliavin derivative of a smooth random variable F is the map $DF : \Omega \to H$ given by
$$DF := \sum_{i=1}^m \partial_i \eta\big(W(\dot h_1), \ldots, W(\dot h_m)\big)\, \dot h_i.$$
To get some intuition for this definition, observe that for γ-a.e. ω ∈ Ω and every ḣ ∈ H with $h = \int_0^\cdot \dot h_s\, ds \in H^1$, the pairing
$$\langle DF(\omega), \dot h\rangle_H = \lim_{\epsilon \to 0} \frac{F(\omega + \epsilon h) - F(\omega)}{\epsilon}$$
is the Gâteaux derivative of F in the direction h. The Malliavin derivative is then extended to a larger class of functions on the Wiener space: given p ≥ 1, we let $\mathbb{D}^{1,p}$ be the closure of the class S with respect to the norm
$$\|F\|_{1,p} := \Big(\mathbb{E}\big[|F|^p\big] + \mathbb{E}\big[|DF|_H^p\big]\Big)^{1/p}.$$
In other words, $\mathbb{D}^{1,p}$ is the domain in $L^p(\Omega, \gamma)$ of the Malliavin derivative operator D.
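For smooth cylindrical F the Malliavin derivative can be checked against a finite-difference Gâteaux derivative. A minimal discretized sketch of our own (the particular F, direction ḣ, and time grid are arbitrary choices), with $F(\omega) = \sin(W(\dot g))$ so that $DF = \cos(W(\dot g))\, \dot g$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
dt = 1.0 / n
t = np.linspace(dt, 1.0, n)

dW = np.sqrt(dt) * rng.standard_normal(n)   # increments of a fixed Brownian path omega
gdot = np.ones(n)                           # g-dot in H = L^2([0,1]; R), defining F
hdot = np.sin(np.pi * t)                    # perturbation direction h-dot in H

def F(increments):
    # F(omega) = sin(W(gdot)) with W(gdot) = int_0^1 gdot dW (discretized)
    return np.sin(np.sum(gdot * increments))

# <DF, hdot>_H = cos(W(gdot)) * int_0^1 gdot . hdot dt (discretized):
pairing = np.cos(np.sum(gdot * dW)) * np.sum(gdot * hdot) * dt

# Gateaux derivative along h(t) = int_0^t hdot ds: the path shift omega -> omega + eps*h
# shifts the increments by eps * hdot * dt.
eps = 1e-6
fd = (F(dW + eps * hdot * dt) - F(dW)) / eps
print(pairing, fd)   # the two values should agree to finite-difference accuracy
```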
The value of DF ∈ H at time t ∈ [0, 1] is denoted by D_t F. The notion of the Malliavin derivative allows us to define the appropriate notion of derivatives of transport maps: for F = (F^1, ..., F^k) : Ω → R^k, let D_t F be the k × d matrix given by [D_t F]_{ij} = D_t^j F^i, where we use the notation D_t F^i = (D_t^1 F^i, ..., D_t^d F^i); in other words, D_t^j F^i is the jth coordinate of D_t F^i. For γ-a.e. ω ∈ Ω we define the linear Malliavin derivative operator DF_ω : H → R^k, DF_ω[ḣ] := ⟨DF(ω), ḣ⟩_H = ∫_0^1 D_t F(ω) ḣ_t dt. When no confusion arises we omit the subscript dependence on ω and write DF. The next result shows that almost-sure contraction is equivalent to the boundedness of the corresponding Malliavin derivative operator. In the following we denote by L(H, R^d) the space of linear operators from H to R^d equipped with the operator norm |·|_{L(H,R^d)}. Lemma 2.3. Let T : Ω → R^d be such that there exists q > 1 so that E_γ[|T|^q] < ∞ and DT exists γ-a.e. If T is an almost-sure contraction with constant C, then |DT|_{L(H,R^d)} ≤ C γ-a.e. Conversely, if |DT|_{L(H,R^d)} ≤ C γ-a.e.,
then there exists an almost-sure contraction T̃ : Ω → R^d with constant C such that T̃ = T γ-a.e.
Proof. The first part will follow from [10, Theorem 5.11.2(ii)] while the second part will follow from [10, Theorem 5.11.7], once we check that these results can be applied. We take the domain to be Ω (a locally convex space) with the measure γ (a centered Radon Gaussian measure). The space H^1 is the Cameron–Martin space, while the image of T is a subset of R^d (a separable Banach space with the Radon–Nikodym property). It remains to check that the Gâteaux derivative of T along H^1 is equal to DT. For smooth cylindrical maps T [10, p. 207] this is clear, and the general result follows from [10, Theorem 5.7.2].
The Föllmer process and the Brownian transport map. The history of the Föllmer process goes back to the work of E. Schrödinger in 1932 [52], but it was H. Föllmer who formulated the problem in the language of stochastic differential equations [33]; see also the work of Dai Pra from the stochastic control approach [24]. Let p = f dγ_d be our probability measure on R^d and let (B_t)_{t∈[0,1]} be the standard Brownian motion in R^d. The Föllmer drift v(t, x) is such that the solution (X_t)_{t∈[0,1]} of the stochastic differential equation

dX_t = v(t, X_t) dt + dB_t, X_0 = 0, (2.1)

satisfies X_1 ∼ p and, in addition, is minimal among all such drifts. It turns out that the Föllmer drift v has an explicit form: let (P_t)_{t≥0} be the heat semigroup on R^d acting on functions η : R^d → R by P_t η(x) := ∫_{R^d} η(x + √t y) dγ_d(y); then v(t, x) = ∇ log P_{1−t} f(x). That X_1 ∼ p with the above v can be seen, for example, from the Fokker–Planck equation of (2.1). Further, as a consequence of Girsanov's theorem, the optimal drift satisfies E_γ[∫_0^1 |v(t, X_t)|^2 dt] = 2 H(p | γ_d), (2.2) where H(·|·) denotes the relative entropy. We refer to [33, 51] for more details. Specifically, the validity of (2.2) is guaranteed in our setting by [24, Theorem 3.1] (using the uniqueness of the solution to (2.1)).
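For a one-dimensional Gaussian target the Föllmer drift can be written in closed form, which makes the construction concrete. The following sketch uses our own computation (an assumed example, not taken from the paper): for p = N(a, σ^2), one finds v(t, x) = (a − (1−σ^2)x)/(1 − (1−σ^2)t), and the moment ODEs of (2.1) then confirm X_1 ∼ p.

```python
# Sketch (assumed Gaussian example): for the target p = N(a, sigma^2) on R,
# the Foellmer drift v(t,x) = grad log P_{1-t} f(x), with f = dp/dgamma_1,
# evaluates to v(t, x) = (a - (1 - sigma^2) x) / (1 - (1 - sigma^2) t).
# We check X_1 ~ p by Euler-integrating the moment ODEs of dX = v dt + dB:
#   m'(t) = (a - beta m)/(1 - beta t),  V'(t) = -2 beta V/(1 - beta t) + 1,
# with beta = 1 - sigma^2, m(0) = V(0) = 0.

a, sigma = 0.7, 0.5
beta = 1.0 - sigma ** 2

n = 200_000
dt = 1.0 / n
m, V = 0.0, 0.0                      # X_0 = 0
for i in range(n):
    t = i * dt
    m += (a - beta * m) / (1.0 - beta * t) * dt
    V += (-2.0 * beta * V / (1.0 - beta * t) + 1.0) * dt

assert abs(m - a) < 1e-3             # mean of X_1 is a
assert abs(V - sigma ** 2) < 1e-3    # variance of X_1 is sigma^2
```

The exact solutions are m(t) = at and V(t) = t(1 − βt), so the process interpolates between the Dirac mass at 0 and the target Gaussian, as the conditioning description of the Föllmer process suggests.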
The Brownian transport map is defined as the map X 1 : Ω → R d .This definition makes sense only if (2.1) has a strong solution which in particular is defined at t = 1; we will address this issue in the next section.

Almost-sure contraction properties of Brownian transport maps
In this section we show that the Brownian transport map is an almost-sure contraction in various settings.The following is the main result of this section and it covers the almost-sure contraction statements of Theorem 1.1 and Theorem 1.3.Theorem 3.1.
(1) Suppose that either p is κ-log-concave for some κ > 0, or that p is κ-log-concave for some κ ∈ R and S := diam(supp(p)) < +∞. Then (2.1) has a unique strong solution for all t ∈ [0, 1]. Furthermore, (a) if κS^2 ≥ 1, then X_1 is an almost-sure contraction with constant 1/√κ; equivalently, |DX_1|_{L(H,R^d)} ≤ 1/√κ γ-a.e. (2) Fix a probability measure ν on R^d supported on a ball of radius R and let p := γ_d ⋆ ν. Then (2.1) has a unique strong solution for all t ∈ [0, 1]. Furthermore, X_1 is an almost-sure contraction with constant √((e^{2R^2} − 1)/(2R^2)). Remark 3.2. The dichotomy of κS^2 ≥ 1 versus κS^2 < 1 is just a convenient way of organizing the various cases we consider, i.e., κ nonpositive or nonnegative and S finite or infinite. This dichotomy is ambiguous when κ = 0 and S = ∞, since we need to make a convention regarding 0 · ∞. Either way, the bound provided by Theorem 3.1(1) is trivial in this case, since it is equal to ∞, so when proving Theorem 3.1(1) we will ignore issues arising from this case. We will come back to the case κ = 0 when proving Theorem 4.2.
The proof of Theorem 3.1, ignoring for now the issue of existence of solutions to (2.1), relies on the fact that the Malliavin derivative of the Föllmer process satisfies the following linear equation: D_r X_t = Id_d + ∫_r^t ∇v(s, X_s) D_r X_s ds for r ≤ t. Using this equation we show that |DX_t|^2_{L(H,R^d)} ≤ ∫_0^t exp(2 ∫_s^t λ_max(∇v(r, X_r)) dr) ds. Hence, the proof of Theorem 3.1 now boils down to estimating λ_max(∇v(r, X_r)). In section 3.1 we express ∇v(r, X_r) in terms of a covariance matrix, which allows us to bound λ_max(∇v(r, X_r)). In section 3.2 we use those estimates to establish the existence and uniqueness of a strong solution to (2.1). Consequently, we derive a differential equation for DX_t which, together with the estimates on λ_max(∇v(r, X_r)), allows us to bound |DX_1|_{L(H,R^d)}. We complete the proof of Theorem 3.1 in section 3.3. Remark 3.3. As explained above, the key point behind the proof of Theorem 3.1 is to upper bound λ_max(∇v(r, X_r)) = λ_max(∇^2 log P_{1−r} f(X_r)). However, once a Hessian estimate on ∇^2 log P_{1−r} f is obtained, it can be used to prove functional inequalities without the usage of the Brownian transport map: (a) The first way to do so is to work with the semigroup of (X_t) and mimic the classical Bakry–Émery calculation (see [8]). The downside of this approach is that it is well suited to functional inequalities such as the log-Sobolev inequality, but not to isoperimetric-type inequalities. In contrast, transport approaches, such as the Brownian transport map, can provide all of these functional inequalities in one streamlined framework.
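The quantity ∫_0^1 exp(2∫_s^1 λ_max(∇v(r, X_r)) dr) ds, which controls |DX_1|^2, can be evaluated in explicit cases. The sketch below uses two λ profiles that are our own assumed examples: the Gaussian target N(0, σ^2) (where the bound evaluates to exactly σ^2 = 1/κ) and a constant profile λ ≡ R^2 (the Gaussian-mixture case, giving (e^{2R^2} − 1)/(2R^2)).

```python
import numpy as np

# Sketch (assumed examples): evaluate by quadrature the bound
#   |DX_1|^2 <= int_0^1 exp(2 int_s^1 lambda_r dr) ds
# for two explicit choices of lambda_r = lambda_max(grad v(r, X_r)).

def bound(lam, n=20_000):
    s = (np.arange(n) + 0.5) / n                 # midpoint rule on [0, 1]
    ds = 1.0 / n
    lam_vals = lam(s)
    # tail integrals int_s^1 lambda_r dr via a reversed cumulative sum
    tail = (np.cumsum(lam_vals[::-1]) * ds)[::-1]
    return np.sum(np.exp(2.0 * tail)) * ds

# Case 1: Gaussian target N(0, sigma^2) with sigma < 1, so kappa = 1/sigma^2;
# here lambda_r = -beta/(1 - beta r), beta = 1 - sigma^2, and the bound
# evaluates to exactly sigma^2 = 1/kappa.
sigma = 0.6
beta = 1.0 - sigma ** 2
val1 = bound(lambda r: -beta / (1.0 - beta * r))
assert abs(val1 - sigma ** 2) < 1e-3

# Case 2: constant lambda_r = R^2 (the Gaussian-mixture bound); the integral
# evaluates to (e^{2R^2} - 1) / (2R^2).
R = 0.8
val2 = bound(lambda r: R ** 2 + 0.0 * r)
assert abs(val2 - (np.exp(2 * R ** 2) - 1) / (2 * R ** 2)) < 1e-3
```

Case 1 shows that for Gaussian targets the general bound is saturated, consistent with the constant 1/√κ in Theorem 3.1(1a).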
(b) The second way to apply the Hessian estimate is to use it within the context of the heat flow transport map of Kim and Milman [37]. This approach avoids the issues mentioned in part (a). On the other hand, the usage of this transport map is only suitable if we want to prove pointwise estimates on the Lipschitz constant of the transport map. In contrast, the Brownian transport map allows us to prove estimates on the Lipschitz constant of the transport map in expectation, which is what is needed to make the connection with the Kannan–Lovász–Simonovits conjecture; cf. Theorem 1.4. (We remark, however, that the heat flow map has its own advantages, as explained in [54, p. 3].)

3.1. Covariance estimates. We begin by representing ∇v as a covariance matrix. Define the measure p^{x,t} on R^d, for fixed t ∈ [0, 1] and x ∈ R^d, by dp^{x,t}(y) := f(y) ϕ_{x,t}(y) dy / P_t f(x), where ϕ_{x,t} is the density of the d-dimensional Gaussian distribution with mean x and covariance t Id_d. Claim. For every t ∈ [0, 1) and x ∈ R^d,

∇v(t, x) = (1/(1−t)^2) Cov(p^{x,1−t}) − (1/(1−t)) Id_d, (3.2)

and consequently

∇v(t, x) ⪰ −(1/(1−t)) Id_d. (3.3)
Proof. The estimate (3.3) follows immediately from (3.2), since covariance matrices are positive semi-definite. To prove (3.2), note that P_{1−t}f(x) = ∫_{R^d} f(y) ϕ_{x,1−t}(y) dy, so differentiating under the integral sign gives ∇ log P_{1−t}f(x) = ∫_{R^d} ((y − x)/(1−t)) dp^{x,1−t}(y), and hence differentiating once more yields (3.2). We start by using the representation (3.2) to upper bound ∇v. Lemma 3.4. (1) Suppose that S := diam(supp(p)) < ∞. Then, for all t ∈ [0, 1) and x ∈ R^d, ∇v(t, x) ⪯ (S^2/(1−t)^2 − 1/(1−t)) Id_d.
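As a sanity check on the representation (3.2), both sides are explicit for a one-dimensional Gaussian target. The closed forms below are our own computation for the assumed example p = N(a, σ^2), not formulas from the paper.

```python
# Sketch (assumed Gaussian example): verify the representation (3.2),
#   grad v(t, x) = Cov(p^{x,1-t}) / (1-t)^2 - 1/(1-t),
# for the target p = N(a, sigma^2), where both sides are explicit:
#   Cov(p^{x,1-t}) = sigma^2 (1-t) / (sigma^2 t + 1 - t),
#   grad v(t, x)   = (sigma^2 - 1) / (sigma^2 t + 1 - t)   (x-independent).

sigma = 1.7   # the identity holds for any sigma > 0

for t in [0.0, 0.25, 0.5, 0.9, 0.99]:
    cov = sigma ** 2 * (1 - t) / (sigma ** 2 * t + 1 - t)
    lhs = (sigma ** 2 - 1) / (sigma ** 2 * t + 1 - t)
    rhs = cov / (1 - t) ** 2 - 1 / (1 - t)
    assert abs(lhs - rhs) < 1e-9
```

Note that as t → 1 the covariance of p^{x,1−t} shrinks like σ^2(1−t)/σ^2 = 1−t, which is exactly the rate needed for ∇v to stay bounded in (3.2).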
(2) Let κ ∈ R and suppose that p is κ-log-concave. Then, for any t ∈ [0, 1) and x ∈ R^d such that (1 − κ)t + κ > 0, ∇v(t, x) ⪯ ((1 − κ)/((1 − κ)t + κ)) Id_d. (3) Fix a probability measure ν on R^d supported on a ball of radius R and let p := γ_d ⋆ ν. Then, for all t ∈ [0, 1) and x ∈ R^d, ∇v(t, x) ⪯ R^2 Id_d. Proof.
(1) By (3.2), it suffices to show that Cov(p^{x,1−t}) ⪯ S^2 Id_d, which is clear from the definition of p^{x,1−t}, since its support has diameter at most S. (2) Since p = f dγ_d is κ-log-concave, we have −∇^2 log f ⪰ (κ − 1) Id_d, so −∇^2 log p^{x,1−t} ⪰ (κ − 1 + 1/(1−t)) Id_d ≻ 0. By the Brascamp–Lieb inequality [3, Theorem 4.9.1], applied to functions of the form y ↦ ⟨y, w⟩ for w ∈ R^d, Cov(p^{x,1−t}) ⪯ (κ − 1 + 1/(1−t))^{−1} Id_d, and the result follows by (3.2).
(3) We have, for some constant A_{x,t} depending only on x and t, dp^{x,t}(y) = A_{x,t} ∫ exp(−|y − x − tz|^2/(2t)) dν̃(z) dy, where ν̃ is a probability measure which is a multiple of ν by a positive function. In particular, ν̃ is supported on the same ball as ν. Let G be a standard Gaussian vector in R^d and let Z̃ ∼ ν̃ be independent of G. Then p^{x,t} is the law of x + √t G + t Z̃, so Cov(p^{x,t}) ⪯ (t + t^2 R^2) Id_d, and (3.2) gives ∇v(t, x) ⪯ R^2 Id_d. Remark 3.5. In principle, we could use more refined Brascamp–Lieb inequalities [43, Theorem 3.3], or use the results of [22] (which imply a stronger Poincaré inequality; we omit the details of this implication), to improve Lemma 3.4(2) and the subsequent results. However, the improvement would end up being insignificant, at the cost of much more tedious computations, so we omit the details.
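The Gaussian-mixture bound of Lemma 3.4(3) can be seen concretely in one dimension. The sketch below takes the assumed example ν = (δ_{−R} + δ_R)/2, for which our own computation gives the closed form ∇v(t, x) = R^2/cosh^2(Rx) ≤ R^2, and recomputes ∇v by quadrature through the covariance representation (3.2).

```python
import numpy as np

# Sketch (assumed one-dimensional example): for nu = (delta_{-R} + delta_R)/2
# and p = gamma_1 * nu, we have f(y) = dp/dgamma_1(y) proportional to
# cosh(R y), and (3.2) yields grad v(t, x) = R^2 / cosh(R x)^2 <= R^2,
# consistent with Lemma 3.4(3).

R = 1.3

def grad_v_quadrature(t, x, n=200_001, L=12.0):
    y = np.linspace(x - L, x + L, n)
    dy = y[1] - y[0]
    # unnormalized density of p^{x,1-t}: f(y) * gaussian(mean x, var 1-t)
    w = np.cosh(R * y) * np.exp(-((y - x) ** 2) / (2 * (1 - t)))
    w /= w.sum() * dy
    mean = (y * w).sum() * dy
    cov = ((y - mean) ** 2 * w).sum() * dy
    return cov / (1 - t) ** 2 - 1 / (1 - t)

for t, x in [(0.2, 0.0), (0.5, 0.7), (0.8, -1.1)]:
    g = grad_v_quadrature(t, x)
    assert abs(g - R ** 2 / np.cosh(R * x) ** 2) < 1e-4
    assert g <= R ** 2 + 1e-6
```

The bound R^2 is attained at x = 0, where the two mixture components pull in opposite directions, so the constant in Lemma 3.4(3) cannot be improved in general.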
The majority of this section focuses on part (1) of Theorem 3.1 since, once that part is settled, part (2) will follow easily. The next two corollaries combine the bounds of Lemma 3.4(1)–(2) to obtain a bound on λ_max(∇v(t, x)), as well as on its exponential, which is needed to bound DX_t. The first corollary handles the case κ ≥ 0 with no assumptions on S, while the second corollary handles the case κ < 0 under the assumption S < ∞. Corollary 3.6. Define the measure dp = f dγ_d with S := diam(supp(p)) and suppose that p is κ-log-concave with κ ∈ [0, +∞).
• κS^2 ≥ 1: By considering κS^2 = 1 we see that the bound (S^2 − κS^2 + 1)t ≤ 1 − κS^2 cannot hold, so it is always advantageous to use the bound of Lemma 3.4(2); accordingly, set θ_t := (1 − κ)/((1 − κ)t + κ). Next we will compute ∫_0^t e^{2∫_s^t θ_r dr} ds, and we first check that the integral ∫_s^t θ_r dr is well-defined. The only issue is if (1 − κ)t + κ = 0, which happens when t = t_0 := κ/(κ − 1). If κ ∈ (0, 1) then t_0 < 0, so θ_t is integrable on [0, 1], and if κ ≥ 1 then t_0 > 1, so again θ_t is integrable on [0, 1]. The only issue is when κ = 0, in which case t_0 = 0. However, in that case we cannot have κS^2 ≥ 1, as κ = 0. Compute: ∫_s^t θ_r dr = log(((1 − κ)t + κ)/((1 − κ)s + κ)), so ∫_0^t e^{2∫_s^t θ_r dr} ds = t((1 − κ)t + κ)/κ, which equals 1/κ at t = 1. • κS^2 < 1: The bound of Lemma 3.4(1) is smaller than that of Lemma 3.4(2) precisely when (S^2 − κS^2 + 1)t ≤ 1 − κS^2, i.e., for t ≤ (1 − κS^2)/((1 − κ)S^2 + 1), since the denominator is nonnegative as κS^2 < 1. Hence we define θ_t to be the minimum of the two bounds. From now until the end of the proof we assume that t ≥ (1 − κS^2)/((1 − κ)S^2 + 1). In order to compute ∫_s^t θ_r dr we start by noting which of the two bounds θ_r equals on each range of r. We also note that, following the discussion in the κS^2 ≥ 1 case, the denominator (1 − κ)t + κ does not vanish in the range where it is integrated. For s ∈ [0, (1 − κS^2)/((1 − κ)S^2 + 1)], we split ∫_s^t θ_r dr at the crossing point, and hence ∫_0^t e^{2∫_s^t θ_r dr} ds splits into two integrals accordingly; we note that both integrals are finite because (1 − κ)s + κ does not vanish in the range of integration. Evaluating the first and the second integral and adding everything up gives the result.
Corollary 3.7. Define the measure dp = f dγ_d and suppose that S := diam(supp(p)) < ∞ and that p is κ-log-concave with κ ∈ (−∞, 0). We have a bound on λ_max(∇v(t, x)) and, for t in the relevant range, on ∫_0^t e^{2∫_s^t θ_r dr} ds. Proof. By Lemma 3.4, the two upper bounds we can get on λ_max(∇v(t, x)) are those of parts (1) and (2). We define θ_t to be the smaller of the two. As in the proof of Corollary 3.6, we compare the two bounds to determine which one θ_t equals on each range of t, and the resulting term can then be integrated as in the proof of Corollary 3.6.

3.2.
The Malliavin derivative of the Föllmer process. The bounds provided by (3.3) and Lemma 3.4 are strong enough to establish the existence of a unique strong solution to (2.1) only up to time t < 1, because at t = 1 these bounds can blow up. For our purposes, however, it is crucial to have the solution well-defined at t = 1, since we need X_1 ∼ p. We will proceed by first analyzing the behavior of the solution before time 1, which will then allow us to extend the solution and its Malliavin derivative to t = 1; see Proposition 3.10.
Turning to the bound on DX_t, fix ḣ ∈ H and define α^ḣ : [0, 1) → R^d by α^ḣ(t) := ⟨DX_t, ḣ⟩_H. The equation for DX_t and Fubini's theorem (which can be applied since ∇v is bounded and by using Grönwall's inequality on any norm of D_r X_t) imply that α^ḣ(t) = ∫_0^t ḣ_s ds + ∫_0^t ∇v(r, X_r) α^ḣ(r) dr. Set λ_t := λ_max(∇v(t, X_t)). It follows from the Cauchy–Schwarz inequality that y(t) := |α^ḣ(t)|^2 satisfies y'(t) ≤ 2λ_t y(t) + 2√(y(t)) |ḣ_t|. In order to analyze y(t) we note that the solution of the Bernoulli ordinary differential equation z'(t) = 2λ_t z(t) + 2√(z(t)) |ḣ_t|, z(0) = 0, is z(t) = (∫_0^t e^{∫_s^t λ_r dr} |ḣ_s| ds)^2, so, since y(t) ≤ z(t) for all t ∈ [0, 1), we conclude, taking the supremum over |ḣ|_H ≤ 1, that |DX_t|^2_{L(H,R^d)} ≤ ∫_0^t e^{2∫_s^t λ_r dr} ds.
To extend the derivative to t = 1 we start by showing that DX_1 exists. Fix w ∈ R^d and take ḣ ≡ w, so that α^ḣ(t) = (∫_0^t D_s X_t ds) w. Taking w = e_j, the jth element of the standard basis of R^d, and using that D_r^j X_t^i = 0 if r > t, it follows from Corollary 3.9 that sup_k |D^j X^i_{t_k}|^2_H < ∞ for any i, j ∈ [d], γ-a.e. Hence, by [59, Lemma 1.2.3], for any i, j ∈ [d], D^j X^i_1 exists and D^j X^i_{t_k} converges to D^j X^i_1 in the weak topology of L^2(Ω, H). Hence, for a fixed ḣ ∈ H, we have that ⟨DX_{t_k}, ḣ⟩_H converges weakly to ⟨DX_1, ḣ⟩_H. On the other hand, fix ḣ ∈ H and recall the definition of α^ḣ : [0, 1) → R^d from the proof of Lemma 3.8. The definition of α^ḣ as an integral, and the fact that the integrand is bounded, show that, γ-a.e., {α^ḣ(t_k)} is a Cauchy sequence, so it converges to some limit, denoted α^ḣ(1).
The proof is complete.

3.3.
Proof of Theorem 3.1. We start by noting that Lemma 2.3 applies in our setting, because the moment assumption holds by either convexity or the boundedness of the support (including in the Gaussian mixture case).
Part (1): Combining the results of Proposition 3.10 and plugging in t = 1 we get: (a) if κS^2 ≥ 1, then |DX_1|^2_{L(H,R^d)} ≤ 1/κ; (b) if κS^2 < 1, the corresponding bound of Corollary 3.6 holds. This completes the proof. Part (2): By Lemma 3.4(3), ∇v(t, x) ⪯ R^2 Id_d, so the previous arguments of this section apply to show that (2.1) has a unique strong solution in the setting where p is a mixture of Gaussians. In addition, the bound ∇v(t, x) ⪯ R^2 Id_d, combined with the computations earlier in this section, yields |DX_1|^2_{L(H,R^d)} ≤ ∫_0^1 e^{2R^2(1−s)} ds = (e^{2R^2} − 1)/(2R^2).

Contraction properties of Brownian transport maps for log-concave measures
In this section we suppose that p is an isotropic log-concave measure with compact support.Our main result, Theorem 4.2, bounds the norms of the derivative of the Brownian transport map (Theorem 1.4).The proof of Theorem 4.2 relies on the result of [38] and the technique of [18], which is based on the stochastic localization of Eldan; see also [25,49,39].
Preliminaries. We start by explaining the connection between stochastic localization and the Föllmer process. Recall that the Föllmer process is the solution (X_t)_{t∈[0,1]} to the stochastic differential equation (2.1), dX_t = v(t, X_t) dt + dB_t with v(t, x) = ∇ log P_{1−t}f(x), and has the property that X_1 ∼ p where p = f dγ_d. We also recall the definition (3.1) of the measures p^{x,t}. Let us denote by p_t the (random) law of X_1 conditioned on the filtration F_t generated by the process up to time t. In stochastic localization (in its simplified setting), equation (4.1), up to the time-change t ↦ 1/(1−t) − 1, serves as the definition of the process. We refer to [26], [50], and [40, section 4] for more information.
Proof of Lemma 4.1. Let (X_t)_{t∈[0,1]} be the Föllmer process and let µ be its associated measure on the Wiener space Ω: dµ/dγ(ω) = f(ω_1) for ω ∈ Ω. Then, for any continuous and bounded η : R^d → R, we can compute E_µ[η(ω_1) | F_t]. It follows that p_t = p^{X_t,1−t}, with a density which is well-defined for all t ∈ [0, 1). Fix y ∈ R^d and define α and β so that p_t(y) = α(t, X_t)β(t, X_t). By the heat equation we can compute the time derivatives of α and β, and hence it follows from Itô's formula that dp_t(y) can be written explicitly. By integration by parts, the claim follows.

Moments of the derivative of the Brownian transport map. Our next goal is to bound the moments of DX_t. To this end, we will use the current best bounds in the Kannan–Lovász–Simonovits conjecture. Let k ≥ 0 be such that C_kls ≤ a d^k, where C_kls is the Kannan–Lovász–Simonovits constant in dimension d and a > 0 is some dimension-free constant. If the Kannan–Lovász–Simonovits conjecture is true, we can take k = 1/log d to get C_kls ≤ ae, which is a dimension-free constant. The result of [38] is that we can take k = log log d/log d, which then yields C_kls ≤ a log d. Theorem 4.2 (Isotropic log-concave measures). Let p be an isotropic log-concave measure with compact support. Then (2.1) has a unique strong solution on [0, 1]. Further, there exists a universal constant ζ such that, for any positive integer m, the moments E_γ[|DX_1|^{2m}_{L(H,R^d)}] satisfy the bound of Theorem 1.4. The assumption in Theorem 4.2 that p has compact support is not important for the application to the Kannan–Lovász–Simonovits conjecture; see [18, section 2.6]. In particular, the bounds in the theorem are independent of the size of the support of p.
Proof. By Proposition 3.10, there exists a unique strong solution (X_t) to (2.1) for all t ∈ [0, 1] with X_1 ∼ p and, for any m > 0, E_γ[|DX_1|^{2m}_{L(H,R^d)}] ≤ E_γ[(∫_0^1 e^{2∫_s^1 λ_max(∇v(r,X_r)) dr} ds)^m]. Hence, our goal is to upper bound the right-hand side of the inequality above. Given α > 2, define the stopping time τ := r_0 ∧ inf{r ∈ [0, 1] : λ_max(∇v(r, X_r)) ≥ α} for some r_0 ∈ (0, t/2) to be chosen later. By Lemma 3.4(2) (with κ = 0), we have λ_max(∇v(r, X_r)) ≤ 1/r. In light of (4.3), we need to choose α, r_0 appropriately and show that E_γ[1/τ^{2m}] can be sufficiently bounded. The control of the moments of 1/τ will rely on showing that this random variable has a sub-exponential tail. Lemma 4.4. Suppose there exist nonnegative constants (possibly dimension-dependent) b_α, c_α such that 1/τ has a sub-exponential tail with parameters b_α, c_α. Then E_γ[1/τ^{2m}] is bounded accordingly. Proof. We will apply the identity E[Z^{2m}] = 2m ∫_0^∞ s^{2m−1} P(Z ≥ s) ds. By the definition of τ, P_γ(1/τ ≥ s) = 1 for s ∈ (0, 1/r_0], so, for any positive integer m, the integral splits at 1/r_0, and the tail part is controlled using the incomplete Gamma function identity ∫_x^∞ s^{m−1} e^{−s} ds = (m − 1)! e^{−x} Σ_{i=0}^{m−1} x^i/i!, valid when m is a positive integer. In light of Lemma 4.4, our goal is to prove that 1/τ has a sub-exponential tail, which requires a better understanding of the stopping time τ. To simplify notation, let K_t := Cov(p^{X_t,1−t}) and recall the representation (3.2): ∇v(r, X_r) = K_r/(1−r)^2 − Id_d/(1−r). Hence, λ_max(∇v(r, X_r)) = λ_max(K_r)/(1−r)^2 − 1/(1−r). The quantity λ_max(K_r) is difficult to control, so we use the moment method and instead control Γ_r := Tr[K_r^q], while noting that λ_max(K_r) ≤ Γ_r^{1/q} for any q ≥ 1. The process (Γ_t)_{t∈[0,1]} satisfies a stochastic differential equation dΓ_t = u_t dB_t + δ_t dt for some vector-valued process (u_t)_{t∈[0,1]} and a real-valued process (δ_t)_{t∈[0,1]}. These processes can be derived using Itô's formula and the stochastic differential equation satisfied by (K_t)_{t∈[0,1]} (which itself can be derived using Itô's formula). Next, we use the argument in [18] to control the processes (u_t) and (δ_t).
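The moment-method step above rests on the elementary matrix inequality λ_max(K) ≤ Tr[K^q]^{1/q} for positive semi-definite K and q ≥ 1. A quick numerical sketch (random test matrices of our own choosing):

```python
import numpy as np

# Sketch: check lambda_max(K) <= Tr[K^q]^(1/q) for PSD matrices K and q >= 1,
# the inequality behind replacing lambda_max(K_r) by Gamma_r = Tr[K_r^q].

rng = np.random.default_rng(0)
for _ in range(50):
    A = rng.standard_normal((8, 8))
    K = A @ A.T                              # positive semi-definite
    lam_max = np.linalg.eigvalsh(K)[-1]      # largest eigenvalue
    for q in [1, 2, 3, 5]:
        gamma = np.trace(np.linalg.matrix_power(K, q))
        assert lam_max <= gamma ** (1.0 / q) + 1e-9
```

The inequality follows from λ_max^q ≤ Σ_i λ_i^q = Tr[K^q]; the point of taking q large is that Γ_r^{1/q} exceeds λ_max(K_r) only by a dimensional factor d^{1/q}, which is mild when q grows like log d.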
Define the martingale M_s := ∫_0^s u_r dB_r, where the last inequality holds by the definition of (Θ_r). Plugging this estimate into (4.5) yields a bound on the drift part. By the Dubins–Schwarz theorem we have M_s = Z_{[M]_s}, with (Z_s) a standard Brownian motion in R, and the quadratic variation [M]_s is controlled by Lemma 4.5. Hence, applying Doob's maximal inequality for Brownian motion, and omitting the (positive) last term, we obtain the required sub-exponential tail estimate. We now complete the proof of the theorem. By Lemma 4.4 and Lemma 4.6, we obtain a bound in which two exponential factors appear. We will choose r_0 ∈ (0, t/2) such that the two exponentials cancel each other; setting r_0 accordingly, we get the desired estimate. By [38, Theorem 1.2], we may take k = log log d/log d, and hence q = ⌈1/k⌉ + 1 = c′ log d/log log d for some c′. By increasing c′, we may assume that 2k = 2/(q − 1). Hence, for some constant a > 0, we obtain the claimed bound. By (4.3), using that d^{1/q − 2/(q−1)} < 1 and that 2m(2m)! ≤ (2m + 1)!, the proof is complete.

Functional inequalities
The contraction properties provided by Theorem 3.1 and Theorem 4.2 allow us to prove functional inequalities for measures in Euclidean spaces.The main goal of this section is to demonstrate the power of the contraction machinery developed in this paper, rather than be exhaustive, so we focus only on a number of functional inequalities.As a consequence of the almost-sure contraction of Theorem 3.1, we will prove Ψ-Sobolev inequalities (Theorem 5.3), q-Poincaré inequalities (Theorem 5.4), and isoperimetric inequalities (Theorem 5.5).As a consequence of the contraction in expectation of Theorem 4.2, we will construct Stein kernels and prove central limit theorems (Theorem 1.6 and Corollary 1.7).
We start with almost-sure contractions; the next lemma describes the behavior of derivatives under such contractions.
Lemma 5.1. Let Υ : Ω → R^d be Malliavin differentiable and let η : R^d → R be continuously differentiable. Then D(η ∘ Υ) = (DΥ)*∇η(Υ), where (DΥ)* : R^d → H is the adjoint of DΥ. Further, |D(η ∘ Υ)|_H ≤ |DΥ|_{L(H,R^d)} |∇η(Υ)|. Proof. To compute D(η ∘ Υ) we note that, by duality, it can be viewed as the operator D(η ∘ Υ) : H → R, ḣ ↦ ⟨∇η(Υ), DΥ[ḣ]⟩. With Lemma 5.1 in hand we can now start the proofs of the functional inequalities which follow from Theorem 3.1. We begin with the Ψ-Sobolev inequalities [14]. Definition 5.2. Let I be a closed interval (possibly unbounded) and let Ψ : I → R be a twice-differentiable function. We say that Ψ is a divergence if each of the functions Ψ, Ψ'', −1/Ψ'' is convex. Given a probability measure ν on R^d and a function η : R^d → I such that ∫η dν ∈ I, we define Ent_Ψ^ν(η) := ∫ Ψ(η) dν − Ψ(∫ η dν). Some classical examples of divergences are Ψ : R → R with Ψ(x) = x^2 (Poincaré inequality) and Ψ : R_{≥0} → R with Ψ(x) = x log x (log-Sobolev inequality).
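The Ψ-entropy of Definition 5.2 can be illustrated directly; the discrete measure below is an arbitrary example of our own. For Ψ(x) = x^2 the quantity reduces to the variance, which is why the Poincaré inequality is a special case of the Ψ-Sobolev inequality.

```python
import numpy as np

# Sketch (assumed discrete example): the Psi-entropy of eta under nu is
#   Ent_Psi(eta) := E_nu[Psi(eta)] - Psi(E_nu[eta]),
# nonnegative by Jensen's inequality since Psi is convex.

rng = np.random.default_rng(1)
weights = rng.random(100)
weights /= weights.sum()               # a discrete probability measure nu
eta = rng.random(100) + 0.5            # a positive test function

def ent(psi, eta, w):
    return np.dot(w, psi(eta)) - psi(np.dot(w, eta))

# Psi(x) = x^2 recovers the variance
ent_sq = ent(lambda x: x ** 2, eta, weights)
var = np.dot(weights, eta ** 2) - np.dot(weights, eta) ** 2
assert abs(ent_sq - var) < 1e-12

# Psi(x) = x log x gives the entropy functional of the log-Sobolev inequality
assert ent(lambda x: x * np.log(x), eta, weights) >= 0.0
```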
(1) Let p be a κ-log-concave measure with S := diam(supp(p)) and let η : R^d → I be any continuously differentiable Lipschitz function such that ∫ η^2 dp ∈ I.
(2) Fix a probability measure ν on R^d supported on a ball of radius R and let p := γ_d^{a,Σ} ⋆ ν, where γ_d^{a,Σ} is the Gaussian measure on R^d with mean a and covariance Σ. Set λ_min := λ_min(Σ) and λ_max := λ_max(Σ). Then, for any continuously differentiable Lipschitz function η : R^d → I such that ∫ η^2 dp ∈ I, we have the analogous bound. Proof.
(2) Fix a probability measure ν on R^d supported on a ball of radius R and let p := γ_d^{a,Σ} ⋆ ν, where γ_d^{a,Σ} is the Gaussian measure on R^d with mean a and covariance Σ. Then, with λ_min := λ_min(Σ) and λ_max := λ_max(Σ), we have the analogous bound. Proof.
(1) We will use the fact [1, Theorem 2.6] (see [58, Proposition 3.1(3)] for an earlier result) that the q-Poincaré inequality holds for the Wiener measure γ. Let (X_t)_{t∈[0,1]} be the Föllmer process associated to p, so that X_1 ∼ p, and suppose that X_1 is an almost-sure contraction with constant C. Then, by Lemma 5.1 and [59, Proposition 1.2.4], the q-Poincaré inequality transfers from γ to p with constant C. The proof is complete by Theorem 3.1.
(1) Let p be a κ-log-concave measure with S := diam(supp(p)), and let C be the almost-sure contraction constant provided by Theorem 3.1. Then, for any Borel set A ⊂ R^d and r ≥ 0, the corresponding isoperimetric inequality holds. (2) Fix a probability measure ν on R^d supported on a ball of radius R and let p := γ_d^{a,Σ} ⋆ ν, where γ_d^{a,Σ} is the Gaussian measure on R^d with mean a and covariance Σ. Set λ_min := λ_min(Σ) and λ_max := λ_max(Σ). Then the analogous inequality holds with a constant depending on λ_min, λ_max, and R. Proof.
(1) Let B_{H^1} be the unit ball in H^1. We will use the fact [45, Theorem 4.3] that the Wiener measure γ satisfies the isoperimetric inequality γ(K + rB_{H^1}) ≥ Φ(Φ^{-1}(γ(K)) + r), where Φ is the standard Gaussian cumulative distribution function, for any Borel measurable set K ⊂ Ω and r ≥ 0; see the discussion following [45, Theorem 4.3] for measurability issues. Let (X_t)_{t∈[0,1]} be the Föllmer process associated to p, so that X_1 ∼ p. Suppose that X_1 : Ω → R^d is an almost-sure contraction with constant C; in particular, γ-a.e., |X_1(ω + h) − X_1(ω)| ≤ C|h|_{H^1} for all h ∈ H^1. Let M ⊂ R^d be a Borel measurable set. We will show that X_1^{-1}(M) + rB_{H^1} ⊆ X_1^{-1}(M_{Cr}), (5.1) where M_{Cr} denotes the Cr-enlargement of M. Then, by the isoperimetric inequality for γ and (5.1), p(M_{Cr}) = γ(X_1^{-1}(M_{Cr})) ≥ γ(X_1^{-1}(M) + rB_{H^1}) ≥ Φ(Φ^{-1}(p(M)) + r). The proof is then complete by Theorem 3.1.
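The Gaussian isoperimetric inequality used above is saturated by half-spaces and is strict for other sets. A one-dimensional sketch of our own (symmetric intervals, with Φ computed from the error function):

```python
import math

# Sketch (assumed one-dimensional example): the Gaussian isoperimetric
# inequality states gamma(A_r) >= Phi(Phi^{-1}(gamma(A)) + r), where A_r is
# the r-enlargement of A. We check it for intervals A = [-b, b], whose
# enlargement is [-b - r, b + r].

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    for _ in range(200):              # bisection; Phi is strictly increasing
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for b in [0.3, 1.0, 2.5]:
    for r in [0.1, 0.5, 1.5]:
        measure_A = Phi(b) - Phi(-b)
        measure_Ar = Phi(b + r) - Phi(-b - r)
        assert measure_Ar >= Phi(Phi_inv(measure_A) + r) - 1e-9
```

For a half-line A = (−∞, a] the two sides coincide exactly, which is the equality case transferred to p through the contraction in the proof above.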
In order to prove (5.1) it suffices to show that X_1(ω + rh) lies in the Cr-enlargement of M for every ω ∈ X_1^{-1}(M) and h ∈ B_{H^1}, which is immediate from the definition of an almost-sure contraction. (2) Let Y ∼ ν, let ν̃ be the law of Σ^{-1/2}Y, and define p̃ := γ_d ⋆ ν̃. Set λ_min := λ_min(Σ) and λ_max := λ_max(Σ). The argument of part (1) gives, for any Borel set M ⊂ R^d and r ≥ 0, the isoperimetric inequality for p̃. The proof is complete by relating p and p̃ via the linear map Σ^{1/2}.

Stein kernels. We now turn to the applications of the contraction in expectation, as in Theorem 4.2. Specifically, we shall prove Theorem 1.6, from which Corollary 1.7 follows, as explained in the introduction. We first establish the connection between the Brownian transport map and Stein kernels. Given Malliavin differentiable functions F, G : Ω → R^k we denote (DF, DG)_H := ∫_0^1 D_t F (D_t G)^T dt. Note that, as outlined in section 2, for every fixed t ∈ [0, 1], D_t F is a k × d matrix, and so (DF, DG)_H takes values in the space of k × k matrices. The construction of the Stein kernel relies on the Ornstein–Uhlenbeck operator L, as defined in [57, section 2.8.2]. To define the operator, let δ stand for the adjoint of the Malliavin derivative D, also called the Skorokhod integral. For our purposes we shall only use δ on matrix-valued paths DF and DG, where F and G are as above. In this case, δ acts on the rows of DG, and δDG takes values in R^k. Formally, the adjoint property of δ is given by E_γ[Tr (DF, DG)_H] = E_γ[⟨F, δDG⟩], where the inner product on the right-hand side is the Euclidean one in R^k.
We can now define the Ornstein–Uhlenbeck operator as L := −δD. By construction, if G : Ω → R^k, then LG : Ω → R^k as well. A useful property of L is that it is invertible on the subspace of functions G satisfying E_γ[G] = 0, and we denote the pseudo-inverse by L^{-1}; see [57, section 2.8.2] for more details, and in particular [57, Proposition 2.8.11]. Now, given a Malliavin differentiable function F : Ω → R^k such that E_γ[F] = 0, we define the k × k matrix-valued map τ_F(x) := E_γ[(DF, −DL^{-1}F)_H | F = x]. Above, the expression E_γ[· | F = x] is the expectation of the regular conditional probability on the fibers of the map F^{-1}(x), which is well-defined, for almost every x ∈ R^k with respect to the law of F, [35].
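The defining property of a Stein kernel (stated and proved next) is the integration-by-parts identity E_q[⟨y, η(y)⟩] = E_q[⟨τ_q(y), ∇η(y)⟩_HS]. In one dimension, with q the standard Gaussian, the kernel is τ ≡ 1 and the identity is classical Gaussian integration by parts. A quadrature sketch of our own, with the arbitrary test function η = tanh:

```python
import numpy as np

# Sketch (assumed one-dimensional example): for q = gamma_1 the Stein kernel
# is tau = 1, and the defining identity reads E[Y eta(Y)] = E[eta'(Y)].
# We verify it by quadrature against the Gaussian density with eta = tanh.

y = np.linspace(-10.0, 10.0, 400_001)
dy = y[1] - y[0]
gauss = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)

lhs = np.sum(y * np.tanh(y) * gauss) * dy            # E[Y eta(Y)]
rhs = np.sum((1 / np.cosh(y) ** 2) * gauss) * dy     # E[eta'(Y)] with tau = 1
assert abs(lhs - rhs) < 1e-6
```

For a general target q, the deviation of τ_q from the identity matrix measures the distance of q from the Gaussian, which is what drives the central limit theorems mentioned in the abstract.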
Proof. The proof follows the argument in [53, Lemma 1]. Let η : R^k → R^k be a continuously differentiable and Lipschitz function and let Y ∼ F_*γ. We need to show that E[⟨Y, η(Y)⟩] = E[⟨τ_F(Y), ∇η(Y)⟩_HS], where ⟨·, ·⟩_HS is the Hilbert–Schmidt inner product. We recall that L = −δD, where δ is the adjoint of the Malliavin derivative D, and compute using the adjoint property. Theorem 1.6 (Stein kernels). Let p be an isotropic log-concave measure on R^d with compact support. Let χ : R^d → R^k be a continuously differentiable function with bounded partial derivatives such that E_p[χ] = 0 and E_p[|∇χ|_op^8] < ∞. Then the pushforward measure q := χ_*p on R^k admits a Stein kernel τ_q satisfying the stated moment bound for some universal constant a > 0.
Remark 5.7.As will become evident from the proof of Theorem 1.6, the result holds provided that χ • X 1 is a Malliavin differentiable random vector where (X t ) is the Föllmer process associated to p.By [59, Proposition 1.2.3], this condition holds if χ is a continuously differentiable function with bounded partial derivatives.
Proof. Let F = χ ∘ X_1 and let τ_F be the Stein kernel constructed above, so that τ_q := τ_F is a Stein kernel of q. Let (P_t) be the Ornstein–Uhlenbeck semigroup on the Wiener space and recall that it is a contraction [57, Proposition 2.8.6]. By [57, Proposition 2.9.3], the desired bound reduces to moment estimates on DF, for some universal constant a > 0. The latter will follow from Theorem 4.2 as soon as we show that, γ-a.e., DF = ∇χ(X_1) DX_1. Let (ρ_ε) be an approximation to the identity:

Cameron-Martin contractions
The notion of contraction we considered up until now was the appropriate one when the target measures were measures on R^d. If, however, we are interested in transportation between measures on the Wiener space itself, then we need a stronger notion of contraction. Definition 6.1. A measurable map T : Ω → Ω is a Cameron–Martin transport map if T(ω) = ω + ξ(ω) for some measurable map ξ : Ω → H^1; we write ξ(ω) = ∫_0^• ξ̇_s(ω) ds for some measurable map ξ̇ : Ω → H. We say that T is a Cameron–Martin contraction with constant C if, for γ-a.e. ω ∈ Ω, |T(ω + h) − T(ω)|_{H^1} ≤ C|h|_{H^1} for all h ∈ H^1. Claim. Let T : Ω → Ω be a Cameron–Martin contraction with constant C. Then, for any t ∈ [0, 1], T_t : Ω → R^d is an almost-sure contraction with constant C.
Proof. Let T : Ω → Ω be a Cameron–Martin contraction with constant C. Fix h ∈ H^1, ω ∈ Ω, and define q ∈ H^1 by q_t = T_t(ω + h) − T_t(ω) for t ∈ [0, 1]; note that q is indeed an element of H^1 since T is a Cameron–Martin transport map. Since |q_t| ≤ |q|_{H^1} ≤ C|h|_{H^1}, the claim follows. We see that a Cameron–Martin contraction is a stronger notion than an almost-sure contraction. Given a measure µ on Ω, a Cameron–Martin contraction between γ and µ would transfer functional inequalities from γ to µ where the functions are allowed to depend on the entire path {ω_t}_{t∈[0,1]}. For the rest of the paper, we focus on the question of whether either the causal optimal transport map or the optimal transport map is a Cameron–Martin contraction when the target measure enjoys convexity properties. The notion of convexity we use is compatible with the Cameron–Martin space [30]: a measurable function V : Ω → R is Cameron–Martin convex if the map H^1 ∋ h ↦ V(ω + h) is convex. Remark 6.3. An important example of a Cameron–Martin convex function V on Ω is V(ω) = η(ω_1) with η : R^d → R a convex function.
The precise question we consider is the following: Suppose µ is a probability measure on Ω of the form dµ(ω) = e −V (ω) dγ(ω), where V : Ω → R is a Cameron-Martin convex function.Let A, O : Ω → Ω be the causal optimal transport map and optimal transport map from γ to µ, respectively.Is it true that either A or O is a Cameron-Martin contraction with any constant C?
In order to answer this question, our first task is to construct a suitable notion of derivative for Cameron–Martin transport maps T : Ω → Ω so that, in analogy with Lemma 2.3, we can establish a correspondence between being a Cameron–Martin contraction and having a bounded derivative. The Malliavin derivative was defined for real-valued functions F : Ω → R, but it can be defined for H-valued functions ξ : Ω → H as well [59, p. 31]. We start with the class S_H of H-valued smooth random variables: ξ ∈ S_H if ξ = Σ_{i=1}^m F_i ḣ_i, where F_i ∈ S (the class of smooth random variables, cf. section 2) and ḣ_i ∈ H for i ∈ [m], for some m ∈ Z_+. For ξ ∈ S_H we define Dξ := Σ_{i=1}^m DF_i ⊗ ḣ_i. For the purpose of this work, we focus on measures µ on Ω of the form dµ(ω) := f(ω_1) dγ(ω). In addition, comparing the following lemma to Lemma 2.3, we see that it provides only one direction of the correspondence between Cameron–Martin contractions and bounded derivatives. A more general theory could be developed, at least in principle, but our goal in this work is to highlight key differences between causal optimal transport and optimal transport on the Wiener space, to which end the following suffices. Lemma 6.5. Let µ be a measure on Ω of the form dµ(ω) := f(ω_1) dγ(ω) and let T : Ω → Ω be a transport map from γ to µ of the form T(ω) = ω + ξ(ω), where ξ := ∫_0^• ξ̇_s ds for a Malliavin differentiable ξ̇. Proof. We first note that the Malliavin differentiability of ξ, as well as [10, Theorem 5.7.2], imply that, γ-a.e., for any h, g ∈ H, the Gâteaux derivative of ξ along H^1 is given by Dξ. Suppose now that T is a Cameron–Martin contraction with constant C, so that, for a fixed h ∈ H^1 and any ǫ > 0, the corresponding difference quotients are bounded. Taking ǫ ↓ 0 and using (6.1) shows the required bound on the derivative.

Causal optimal transport
In this section we answer in the negative, for causal optimal transport maps, the question raised in section 6, thus proving the second part of Theorem 1.8. In particular, we will construct a strictly log-concave function f : R → R such that the causal optimal transport map from γ to dµ(ω) := f(ω_1) dγ(ω) is not a Cameron–Martin contraction with any constant C. This indeed provides a negative answer in light of Remark 6.3. Our concrete example is the case where f dγ_1 is the measure of a one-dimensional Gaussian random variable conditioned on being positive. More precisely, fix a constant σ > 0 and let f : R → R be given, up to a normalization constant making f dγ_1 a probability measure, by f(x) proportional to e^{−x^2/(2σ^2)} 1_{x≥0}.
The measure f(x) dγ_1(x) is the measure on R of a centered Gaussian, whose variance is smaller than one, conditioned on being positive. We define a measure µ on Ω by setting dµ(ω) := f(ω_1) dγ(ω), and note that f is strictly log-concave for all σ > 0 and remains log-concave in the limit σ → ∞. To simplify computations we will take σ ≥ 1. Finally, note that the assumptions of Proposition 3.10 hold in this case (κ ≥ 0).
Remark 7.1. Given dµ(ω) = f(ω_1) dγ(ω), let p be the measure on R given by p := f dγ_1, and let γ_1^{a,σ} be the Gaussian measure on R with mean a and variance σ. The natural examples for testing whether the causal optimal transport map A, between γ and µ, is a Cameron–Martin contraction are p = γ_1^{a,1}, for some a ∈ R, and p = γ_1^{0,σ}, for some σ > 0. We can expect these examples to show that A is not a Cameron–Martin contraction since they saturate the bounds in Lemma 3.4: when p = γ_1^{a,1} we have ∇v(t, x) = 0 (saturation of Lemma 3.4(2) under the assumption κ ≥ 1), and when p = γ_1^{0,σ} we have that, in the limit σ ↓ 0, ∇v(t, x) = −1/(1 − t) (saturation of (3.3)). Since in these cases |∇v| is the largest, we can expect that A will not be a Cameron–Martin contraction since its derivative will blow up. However, explicit calculation shows that A is in fact a Cameron–Martin contraction for p = γ_1^{a,1} and p = γ_1^{0,σ}. Hence, we require the construction of a more sophisticated example, which we obtain by considering Gaussians conditioned on being positive.
In order to prove that A is not a Cameron–Martin contraction we will use Lemma 6.5. We will show that, in the example above, with positive probability, the derivative can be arbitrarily large, so that A cannot be a Cameron–Martin contraction with any constant C.
As mentioned already, the map A is nothing but the Föllmer process X [44].This allows for the following convenient representation of the derivative of A.
Lemma 7.2. Let A be the causal optimal transport map from γ to µ and let X be the solution of (2.1). Fix 0 < ǫ < 1. For any ḣ ∈ H and t ∈ [0, 1 − ǫ],

(d/dt)⟨DX_t, ḣ⟩_H = ∇v(t, X_t) ⟨DX_t, ḣ⟩_H + ḣ_t. (7.1)

In addition, ⟨DX_t, ḣ⟩_H = ∫_0^t e^{∫_s^t ∇v(r,X_r) dr} ḣ_s ds. Proof. We have A = Id_Ω + ξ, where ξ̇_t(ω) = v(t, X_t(ω)) with the drift v as in (2.1). To show (7.1) we start by noting that Proposition 3.10 gives the Malliavin differentiability of X_t. Hence, our goal is to show the identity (7.1), and to establish it, it suffices to test against smooth g and h. This indeed holds since, for such g and h, the relevant computation goes through, where in the second equality the integral and the limit were exchanged by the use of the dominated convergence theorem and as v is Lipschitz on [0, 1 − ǫ] (Equation (3.3) and Lemma 3.4), while the third equality holds by the chain rule, which can be applied as v is Lipschitz [59, Proposition 1.2.4].
The proof of the second part of the lemma follows by noting that the solution to the ordinary differential equation (7.1), with initial condition 0 at t = 0, is ⟨DX_t, ḣ⟩_H = ∫_0^t e^{∫_s^t ∇v(r,X_r) dr} ḣ_s ds.
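The solution formula used here is the standard variation-of-constants formula for linear ODEs, and it can be checked numerically. In the sketch below, the coefficient a(t) is an arbitrary stand-in of our own for ∇v(r, X_r) along a path, and ḣ is an arbitrary direction.

```python
import numpy as np

# Sketch (assumed coefficients): the solution of the linear ODE
#   alpha'(t) = a(t) alpha(t) + hdot(t),  alpha(0) = 0,
# is alpha(t) = int_0^t exp(int_s^t a(r) dr) hdot(s) ds. We compare a direct
# Euler integration of the ODE with a quadrature of the formula at t = 1.

a = lambda t: -1.0 / (1.0 - 0.5 * t)       # stand-in for grad v(t, X_t)
hdot = lambda t: np.cos(3 * t)             # a direction in H

n = 100_000
dt = 1.0 / n
alpha = 0.0
for i in range(n):                         # Euler scheme for the ODE
    t = i * dt
    alpha += (a(t) * alpha + hdot(t)) * dt

s = (np.arange(n) + 0.5) * dt              # midpoint quadrature of the formula
A = np.cumsum(a(s)) * dt                   # A(t) ~ int_0^t a(r) dr
formula = np.sum(np.exp(A[-1] - A) * hdot(s)) * dt

assert abs(alpha - formula) < 1e-3
```

In the proof of Theorem 7.3 the same formula is used in reverse: one exhibits paths along which the exponential weight e^{∫_s^t ∇v dr} makes the integral, and hence |DA[ḣ]|_H, large.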
The next theorem is the main result of this section, showing that, with positive probability, |DA|_{L(H)} can be arbitrarily large, thus proving the second part of Theorem 1.8. Theorem 7.3. Let ℓ ∈ H^1 be given by ℓ(t) = t for t ∈ [0, 1]. There exists a constant c > 0 such that, for any 0 < ǫ < c, there exists a measurable set of positive γ-measure on which |DA[ℓ̇]|_H is large. The upshot of Theorem 7.3 is that there exists a unit-norm ḣ ∈ H, specifically h = ℓ, such that, for any b > 0, the event {|DA[ḣ]|_H > b} has positive probability (possibly depending on b). Since, by Lemma 6.5, a Cameron–Martin contraction with constant C would force |DA[ḣ]|_H ≤ C γ-a.e., we conclude that A cannot be a Cameron–Martin contraction with any constant C.
Next we describe the idea behind the proof of Theorem 7.3. Fix 0 < ǫ < 1 and recall, from Lemma 7.2, the expression for ⟨DX_t, ḣ⟩_H in terms of ∇v(t, X_t). The idea of the proof is to construct a function η_ǫ : [0, 1 − ǫ] → R and a constant b > 0 with the following property: if we choose h = ℓ and substitute η_ǫ(t) for X_t in (7.2), then a computation shows that |DA[ḣ]|_H is large. The final step is to show that, with positive probability, X is close to η_ǫ. This implies that, with positive probability, |DA[ḣ]|_H can be made arbitrarily large. We now proceed to make this idea precise.

We start with the construction of the function η_ǫ.

Lemma 7.4. For every 0 < ǫ < 1 there exists an absolutely continuous function η_ǫ : [0, 1 − ǫ] → R with the required properties. Furthermore, for any ǫ > 0 and η_ǫ as above, there exists δ(ǫ) > 0 such that, if η : [0, 1 − ǫ] → R is sufficiently close to η_ǫ, then the same conclusions hold as well.
For the second part of the lemma, given ǫ and η_ǫ as above, we use the continuity of m and the value of m_2(c) + c m(c), and continue as above to complete the proof.
Next we show that if we take h = ℓ and substitute for X in (7.2) a function η which is close to the function η_ǫ constructed in Lemma 7.4, then |DA[ḣ]|_H is large.
since the second term is nonpositive; the claim follows by taking t_0 as above.

It remains to show that, with positive probability, X is close to η_ǫ.

Proof. The measurability of E_{ǫ,δ} follows since the sigma-algebra F is generated by the Borel sets of Ω with respect to the uniform norm. To show that E_{ǫ,δ} has positive probability, it will be useful to note that the Föllmer process X is a mixture of Brownian bridges in the following sense. Let (Ω̃, F̃, P̃) be any probability space which supports a Brownian motion B̃ = (B̃_t)_{t∈[0,1]} and a random vector Y ∼ p, independent of B̃. Define the process Z by Z_t := B̃_t − t(B̃_1 − Y) for t ∈ [0, 1], so that, conditioned on Y, Z is a Brownian bridge starting at 0 and terminating at Y. Given a set B ∈ F we have [33] the corresponding mixture identity for the laws of X and Z.

We can now complete the proof of Theorem 7.3 by combining the above lemmas.

To see that O′ is in fact the actual optimal transport map we compute its derivative, using that p is κ-log-concave. Suppose now that κ < 1. We claim that |M h_1|² + 2⟨M h_1, h_1⟩ can be controlled as follows.

Remark 8.3. The example of a one-dimensional Gaussian conditioned on being positive, constructed in Section 7, does not exactly satisfy the assumptions of Theorem 8.2, since the second derivative of the transport map between γ_1 and the conditioned Gaussian p = f γ_1 does not exist at every point of R. Nonetheless, the statement of Theorem 8.2 still holds in this case. In the example of Section 7, the optimal transport map is explicit, ∇φ_1 = F_p^{-1} ∘ F_{γ_1}, where F_p and F_{γ_1} are the cumulative distribution functions of p and γ_1, respectively. Computing the derivatives of this map we see that φ_1 is twice-differentiable everywhere. Hence, the proof of Theorem 8.2 still goes through, since ∇²φ_1 must exist everywhere on the line ω_1 + rh_1.
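The explicit one-dimensional map of Remark 8.3 can be sketched numerically. In this sketch we take p to be the standard Gaussian conditioned on being positive, so that F_p(y) = 2Φ(y) − 1 for y ≥ 0 and hence T = F_p^{-1} ∘ F_{γ_1} = Φ^{-1}((1 + Φ(·))/2); the code uses only the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

N = NormalDist()  # standard Gaussian: cdf = Φ, inv_cdf = Φ^{-1}

def T(x):
    # T = F_p^{-1} ∘ F_{γ_1} for p = γ_1 conditioned on being positive:
    # F_p(y) = 2 Φ(y) - 1 for y >= 0, so F_p^{-1}(u) = Φ^{-1}((1 + u)/2).
    return N.inv_cdf((1.0 + N.cdf(x)) / 2.0)

# T pushes γ_1 forward to p: the γ_1-median 0 must map to the p-median,
# which solves 2 Φ(y) - 1 = 1/2, i.e. y = Φ^{-1}(3/4).
p_median = N.inv_cdf(0.75)
print(T(0.0), p_median)

# T is increasing and maps R into (0, ∞), as a transport map onto p must.
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
vals = [T(x) for x in xs]
print(all(v > 0 for v in vals), vals == sorted(vals))
```

Differentiating T twice (via T′(x) = φ_{γ_1}(x)/ρ_p(T(x))) is how one checks the regularity discussed in Remark 8.3.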
Remark 8.4. The proof of Theorem 8.2 shows that the optimal transport map O between γ and µ(dω) = f(ω_1)γ(dω) is essentially the optimal transport map in R^d between γ_d and f γ_d. This explains why we cannot use the optimal transport map on Wiener space instead of the Brownian transport map: the desired contraction properties of optimal transport maps in R^d are still unknown.

Lemma 5.1. Let Υ : Ω → R^d be an almost-sure contraction with constant C and let η : R^d → R be a continuously differentiable Lipschitz function. Then,
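The conclusion of the lemma is elided in this excerpt. The following display sketches, in our own notation rather than as a quotation, the chain-rule bound such a lemma provides and the resulting transfer of the Poincaré inequality (as in the introduction):

```latex
% Sketch (not the paper's verbatim statement): for an almost-sure
% contraction \Upsilon with constant C and Lipschitz \eta \in C^1,
\[
  |D(\eta \circ \Upsilon)|_{H}
    \;=\; |(D\Upsilon)^{*}\,\nabla\eta(\Upsilon)|_{H}
    \;\le\; C\,|\nabla\eta(\Upsilon)| \qquad \gamma\text{-a.e.},
\]
% which, combined with the Gaussian Poincaré inequality on Wiener
% space, transfers the inequality to p = \Upsilon_{\#}\gamma:
\[
  \operatorname{Var}_{p}(\eta)
    \;=\; \operatorname{Var}_{\gamma}(\eta \circ \Upsilon)
    \;\le\; \mathbb{E}_{\gamma}\,|D(\eta \circ \Upsilon)|_{H}^{2}
    \;\le\; C^{2} \int_{\mathbb{R}^{d}} |\nabla\eta|^{2}\, dp.
\]
```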