A Bismut-Elworthy inequality for a Wasserstein diffusion on the circle

We introduce in this paper a strategy to prove gradient estimates for some infinite-dimensional diffusions on $L_2$-Wasserstein spaces. For a specific example of a diffusion on the $L_2$-Wasserstein space of the torus, we get a Bismut-Elworthy-Li formula up to a remainder term and deduce a gradient estimate with a rate of blow-up of order $\mathcal{O}(t^{-(2+\epsilon)})$.


Introduction
In [vRS09], von Renesse and Sturm introduced a diffusion process on the $L_2$-Wasserstein space $\mathcal{P}_2(\mathbb{R})$ satisfying the following properties: the large deviations in small time are given by the Wasserstein distance, and the martingale term which arises when expanding any smooth function $\phi$ of the measure argument along the process has exactly the squared norm of the Wasserstein gradient of $\phi$ as local quadratic variation. There are several examples of diffusions on Wasserstein spaces, constructed either via Dirichlet forms [vRS09,AvR10,KvR], as limits of particle systems [Kon17b,Kon17a] or as systems of infinitely many particles [Mar18,Mar21]. Some diffusive properties of those processes have already been proved, e.g. a large deviation principle [KvR18] or a restoration of uniqueness for McKean-Vlasov equations [Mar21].
We prove in this paper another well-known diffusive property: we control the gradient of the semi-group associated to a diffusion process on the $L_2$-Wasserstein space. For finite-dimensional diffusions, this gradient estimate can be obtained from a Bismut-Elworthy-Li integration by parts formula. Bismut, Elworthy and Li showed that the gradient of the semi-group $P_t\phi$ associated to the stochastic differential equation $dX_t = \sigma(X_t)dW_t + b(X_t)dt$ on $\mathbb{R}^n$ can be expressed as follows where $V_s$ is a certain stochastic process starting at $v_0$ (see [Bis81,Elw92,EL94]). In particular, that equality shows that $\nabla(P_t\phi)_{x_0}(v_0)$ is of order $t^{-1/2}$ for small times. Important domains of application of Bismut-Elworthy-Li formulae are, among others, geometry [Tha97,TW98,ATW06], non-linear PDEs [Del03,Zha05] and finance [FLL+99,MT06]. Recent interest has emerged for similar results in infinite dimension. First, Bismut-Elworthy-Li formulae were proved for Kolmogorov equations on Hilbert spaces and for reaction-diffusion systems in bounded domains of $\mathbb{R}^n$, see [DPEZ95,Cer01,DP04]. Recently, Crisan and McMurray [CM18] and Baños [Bn18] proved Bismut-Elworthy-Li formulae for McKean-Vlasov equations $dX_t = b(t, X_t, \mu_t)dt + \sigma(t, X_t, \mu_t)dW_t$, with $\mu_t = \mathcal{L}(X_t)$. For other recent smoothing results on McKean-Vlasov equations and mean-field games, see also [BLPR17, CdR20, CdRF, CCD19].

A gradient estimate for a Wasserstein diffusion on the torus
In this paper, we construct a system of infinitely many particles moving on the one-dimensional torus $\mathbb{T} = \mathbb{S}^1$, identified with the interval $[0, 2\pi]$. Considering at each time the empirical measure associated to that system, we get a diffusion process on $\mathcal{P}(\mathbb{T})$, the space of probability measures on $\mathbb{T}$. Then we average out that process over the realizations of an additive noise $\beta$. This averaging increases the regularity of the process and leads to a gradient estimate for the associated semi-group.
To state the main result of the paper more precisely, let us introduce the following equation (1). Here, $\beta$, $(W^{\Re,k})_{k\in\mathbb{Z}}$ and $(W^{\Im,k})_{k\in\mathbb{Z}}$ are independent standard real-valued Brownian motions, the notation $\Re$ denotes the real part of a complex number and $W^k_\cdot := W^{\Re,k}_\cdot + iW^{\Im,k}_\cdot$. The sequence $(f_k)_{k\in\mathbb{Z}}$ is fixed and real-valued, typically of the form $f_k = \frac{C}{(1+k^2)^{\alpha/2}}$. Lastly, the initial condition $g : [0, 1] \to \mathbb{R}$ is a $C^1$-function with positive derivative satisfying $g(1) = g(0) + 2\pi$, so that $g$ is seen as the quantile function of a probability measure $\nu^g_0$ on $\mathbb{T}$. For each $u \in [0, 1]$, $(x^g_t(u))_{t\in[0,T]}$ represents the trajectory of the stochastic particle starting at $g(u)$, which is driven both by a common noise $W = (W^k)_k$ and by an idiosyncratic noise $\beta$, borrowing the terminology usually used for McKean-Vlasov equations. This terminology is justified since equation (1) can be seen as the counterpart on the torus $\mathbb{T}$ of the equation on the real line studied in [Mar21] and defined by where $(w(k, t))_{k\in\mathbb{R},t\in[0,T]}$ is a complex-valued Brownian sheet on $\mathbb{R} \times [0, T]$. Moreover, we will show that the cloud of all particles is spread over the whole torus. More precisely, for each $t \in [0, T]$, the probability measure $\nu^g_t = \mathrm{Leb}_{[0,1]} \circ (x^g_t)^{-1}$ has a density $q^g_t$ w.r.t. Lebesgue measure on $\mathbb{T}$ such that $q^g_t(x) > 0$ for all $x \in \mathbb{T}$. Instead of studying the process $(\nu^g_t)_{t\in[0,T]}$, we consider a more regular process defined by averaging over the realizations of $\beta$: $\mu^g_t$ is the probability measure on $\mathbb{T}$ with density $p^g_t(x) := \mathbb{E}^\beta[q^g_t(x)]$, $x \in \mathbb{T}$, w.r.t. Lebesgue measure. In other words, since we assume that $\beta$ and $(W^k)_{k\in\mathbb{Z}}$ are independent, $\mu^g_t$ is the conditional law of $x^g_t$ given $(W^k)_{k\in\mathbb{Z}}$. The semi-group associated to $(\mu^g_t)_{t\in[0,T]}$ is defined, for any bounded and continuous function $\phi : \mathcal{P}_2(\mathbb{T}) \to \mathbb{R}$, by $P_t\phi(\mu^g_0) := \mathbb{E}[\phi(\mu^g_t)]$.
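Alternatively, denoting the lift of $\phi$ by $\widehat{\phi}(X) := \phi((\mathrm{Leb}_{[0,1]} \otimes \mathbb{P}^\beta) \circ X^{-1})$ for any random variable $X \in L_2([0, 1] \times \Omega^\beta)$, we can also define the semi-group by $\widehat{P_t\phi}(g) = \mathbb{E}[\widehat{\phi}(x^g_t)]$.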
The main theorem of this paper states the following upper bound for the Fréchet derivative of $g \mapsto P_t\phi(g)$, depending only on the $L_\infty$-norm of $\phi$. Assume that $g \in C^{3+\theta}$ with $\theta > 0$ and let $h$ be a 1-periodic $C^1$-function. If $\phi$ is sufficiently regular and if $f_k = \frac{C}{(1+k^2)^{\alpha/2}}$, $k \in \mathbb{Z}$, with $\alpha \in \left(\frac{7}{2}, \frac{9}{2}\right)$, then there is $C_g$ independent of $h$ such that for any $t \in (0, T]$. The precise assumptions on $\phi$ and the precise statement of this theorem are given below by Definition 11 and Theorem 15, respectively. Moreover, $C_g$ depends polynomially on $\|g'''\|_{L_\infty}$, $\|g''\|_{L_\infty}$, $\|g'\|_{L_\infty}$ and $\left\|\frac{1}{g'}\right\|_{L_\infty}$.
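The particle system described above can be explored numerically. The following is only a minimal sketch, under the assumption that equation (1) has the form $dx^g_t(u) = \sum_{k} f_k \Re(e^{-ikx^g_t(u)} dW^k_t) + d\beta_t$, a form chosen here because it is consistent with the description of the noises and coefficients given in this section. The Euler-Maruyama scheme, the truncation level `K`, the step count and all function and parameter names are illustration choices, not part of the paper.

```python
import math
import random

def simulate_particles(g0, alpha=4.0, C=1.0, K=10, T=0.1, n_steps=100, seed=0):
    """Euler-Maruyama sketch of dx = sum_k f_k Re(e^{-ikx} dW^k) + d(beta).

    ASSUMPTION: the form of the equation above is assumed, not quoted.
    g0 lists the initial positions g(u_j); modes are truncated to |k| <= K.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    ks = range(-K, K + 1)
    f = [C / (1.0 + k * k) ** (alpha / 2.0) for k in ks]
    x = list(g0)
    for _ in range(n_steps):
        # One common-noise increment per mode, shared by all particles.
        dw_re = [rng.gauss(0.0, sd) for _ in ks]
        dw_im = [rng.gauss(0.0, sd) for _ in ks]
        # beta is a single Brownian motion, the same for every u.
        dbeta = rng.gauss(0.0, sd)
        new_x = []
        for xj in x:
            # Re(e^{-ikx}(a + ib)) = cos(kx) a + sin(kx) b
            common = sum(
                fk * (math.cos(k * xj) * a + math.sin(k * xj) * b)
                for k, fk, a, b in zip(ks, f, dw_re, dw_im)
            )
            new_x.append(xj + common + dbeta)
        x = new_x
    return x

# Hypothetical initial quantile function sampled at u_j = j / 10.
us = [j / 10 for j in range(11)]
g0 = [2.0 * math.pi * u + 0.2 * math.sin(2.0 * math.pi * u) for u in us]
x_T = simulate_particles(g0)
```

Because the coefficients are $2\pi$-periodic in space, the particles started at $g(0)$ and $g(1) = g(0) + 2\pi$ receive identical increments in this scheme, so their gap stays equal to $2\pi$. Averaging over $\beta$ would require repeating the simulation with the same common-noise increments and fresh $\beta$ increments, which this sketch does not do.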

Comments on the main result
The order $\alpha$ of the polynomial decrease of $(f_k)_{k\in\mathbb{Z}}$ plays a key role in this paper. In equation (1), the diffusion coefficient in front of the noise $W$ is written as a Fourier series, $(f_k)_{k\in\mathbb{Z}}$ being the sequence of Fourier coefficients. Therefore, it should not be surprising that the larger $\alpha$ is, the more regular the solution to (1) is². Nevertheless, when we apply Girsanov's Theorem with respect to $W$, which is part of a standard method introduced by Thalmaier and Wang [Tha97,TW98], we need $\alpha$ to be sufficiently small, in order to be able to invert the Fourier series³. So there is a balance regarding the choice of $\alpha$, which explains why we assume in our main result that $\alpha$ is bounded from above and from below.
Moreover, the question of the order $\alpha$ is closely related to the rate $t^{-(2+\theta)}$ appearing in (3). Usually, we expect a rate of $t^{-1/2}$ for diffusions. As we have already mentioned, this rate follows directly from a Bismut-Elworthy-Li integration by parts formula. However, adapting the usual strategy based on Kunita's expansion as in [Tha97,TW98], we do not get an exact integration by parts formula here. Indeed, the failure of the latter strategy in our case comes from the fact that it is impossible to choose $\alpha$ which is simultaneously large enough to ensure a sufficient regularity of the solution and small enough to be able to invert the Fourier series. We refer to Remark 22 below for a justification of that claim. Therefore, the main new strategy introduced in this paper is to regularize the derivative of the solution⁴. By doing this, we get an approximate integration by parts formula, in the sense that there is an additional remainder term appearing in the formula: where $\lambda^{k,\varepsilon}_\cdot$ is a stochastic process⁵. Controlling the remainder term leads us, by a final bootstrap argument, to the desired upper bound on $|DP_t\phi(g) \cdot h|$, at the price of worsening the rate of blow-up.
We are not claiming that the rate of $t^{-(2+\theta)}$ is sharp, but we expect that a rate of $t^{-1/2}$ is unachievable for this process. Let us mention that the author improves this rate of blow-up to $t^{-(1+\theta)}$, at the price of assuming $C^{4+\theta}$-regularity of $g$ and $h$. Since the proof is long and technical, it is not included in this paper, but we refer to [Mar19,Chapter IV] for all the details and for an application to a gradient estimate for an inhomogeneous SPDE with Hölder continuous source term.
Furthermore, the idiosyncratic noise $\beta$ is important as well. Of course, the addition of $\beta$ does not dramatically change the dynamics of the process, since it acts as a rotation on the circle of the whole system. Nevertheless, as has already been pointed out, the diffusion process $(\mu^g_t)_{t\in[0,T]}$ is defined by an average over the realizations of $\beta$. The importance of that averaging is consistent with SPDE theory. Indeed, $(\mu^g_t)_{t\in[0,T]}$ solves the following equation: with initial condition $\mu^g_t|_{t=0} = \mu^g_0$. The noise $\beta$ manifests in the additional term $\frac{1}{2}$ in front of $\Delta(\mu^g_t)$. On the level of the densities, $(p^g_t)_{t\in[0,T]}$ solves the following equation. Denis and Stoica showed in [DS04,DMS05] that the above equation is well-posed, and they also gave energy estimates, if $\lambda$ is strictly larger than the critical threshold $\lambda_{\mathrm{crit}}$. If we considered equation (1) without $\beta$, we would obtain exactly the above equation with $\lambda = \lambda_{\mathrm{crit}}$. Therefore, it seems that adding a level of randomness is crucial to get our estimate. Precisely, in our above-described strategy, the Brownian motion $\beta$ plays a key role in controlling the remainder term.
² see Proposition 4 below.
³ see Lemma 21 below.
⁴ More exactly, we regularize the function $A^g$ defined by (14) into a convolution $A^{g,\varepsilon}$ defined by (15).
⁵ defined in Lemma 21.
In addition, let us note that we study a process on the one-dimensional circle $\mathbb{T}$, and not on the real line as e.g. in [Kon17b,Mar21]. We made this choice mainly for technical reasons, in order to deal with processes of compactly supported measures having a positive density on the whole space. The main result is restricted to functions $g$ which have a strictly positive derivative, meaning that the associated measure has a density with respect to Lebesgue measure on the torus. The constant $C_g$ tends to infinity as $\min_{u\in[0,1]} g'(u)$ tends to zero. The assumptions on the regularity of $g$ and $h$ seem reasonable since our model is close to the process $(\mu^{\mathrm{MMAF}}_t)_t$, called modified massive Arratia flow and introduced in [Kon17b], which has highly singular coefficients. Indeed, Konarovskyi and von Renesse showed in [KvR18] that $(\mu^{\mathrm{MMAF}}_t)_t$, which is almost surely of finite support for all $t > 0$, solves the following SPDE:

Organization of the paper
The goal of Section 2 is to define equation (1) properly and to state the main result of the paper. The proof of the theorem is then divided into four main steps. We start in the short Section 3 by splitting the gradient of the semi-group into two parts, one regularized term and one remainder term, which are studied separately in Sections 4 and 5, respectively. Finally, in Section 6, we complete the proof by a bootstrap argument.

Statement of the main result
The main result of the paper is stated in Paragraph 2.4. Before, we define precisely the diffusion on the torus (Paragraph 2.1), its associated semi-group (Paragraph 2.2) and the assumptions on the test functions (Paragraph 2.3).

A diffusion on the torus
In this paper, we study the following stochastic differential equation on a fixed time interval with initial condition $x^g_0 = g$. In this paragraph, we first state the assumptions made on $W^k$, $\beta$, $f_k$ and $g$, where we emphasise the interpretation of $(x^g_t)_{t\in[0,T]}$ as a diffusion on the torus. Then we state existence, uniqueness and some important properties of solutions to equation (4).
Let $\beta$, $(W^{\Re,k})_{k\in\mathbb{Z}}$, $(W^{\Im,k})_{k\in\mathbb{Z}}$ be a collection of independent standard real-valued Brownian motions. Thus $W^k_\cdot = W^{\Re,k}_\cdot + iW^{\Im,k}_\cdot$ denotes a $\mathbb{C}$-valued Brownian motion. The notation $\Re$ denotes the real part of a complex number, so that equation (4) can alternatively be written as follows
Definition 1. We say that f :
Note that if $f$ is of order $\alpha > \frac{1}{2}$, then for each $u \in \mathbb{R}$, the particle $(x^g_t(u))_{t\in[0,T]}$ has a finite quadratic variation equal to $\langle x^g(u), x^g(u)\rangle_t = \left(\sum_{k\in\mathbb{Z}} f_k^2 + 1\right) t$.
Let $\mathbb{T}$ be the one-dimensional torus, which we identify with the interval $[0, 2\pi]$. $\mathcal{P}(\mathbb{T})$ denotes the space of probability measures on the torus. We consider the $L_2$-Wasserstein metric $W^{\mathbb{T}}_2$ on $\mathcal{P}(\mathbb{T})$, defined by $W^{\mathbb{T}}_2(\mu, \nu) := \inf_{\pi\in\Pi(\mu,\nu)} \left(\int_{\mathbb{T}^2} d^{\mathbb{T}}(x, y)^2 \, d\pi(x, y)\right)^{1/2}$, where $\Pi(\mu, \nu)$ is the set of probability measures on $\mathbb{T}^2$ with first marginal $\mu$ and second marginal $\nu$ and where $d^{\mathbb{T}}$ is the distance on the torus defined by $d^{\mathbb{T}}(x, y) := \inf_{k\in\mathbb{Z}} |x - y - 2k\pi|$, where $x, y \in \mathbb{R}$.
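As a small numerical companion to the two formulas of this paragraph, the sketch below evaluates the torus distance $d^{\mathbb{T}}(x, y) = \inf_{k} |x - y - 2k\pi|$ and the quadratic variation rate $\sum_k f_k^2 + 1$. It assumes the typical form $f_k = C/(1+k^2)^{\alpha/2}$ mentioned in the introduction; the truncation level `K` and the function names are illustration choices.

```python
import math

TWO_PI = 2.0 * math.pi

def torus_distance(x, y):
    """Distance on the torus: inf over k of |x - y - 2*k*pi|."""
    d = (x - y) % TWO_PI          # representative of x - y in [0, 2*pi)
    return min(d, TWO_PI - d)     # go around the circle the shorter way

def quadratic_variation_rate(alpha, C=1.0, K=100_000):
    """Truncated value of sum_k f_k^2 + 1 with f_k = C / (1 + k^2)^(alpha/2).

    The series converges exactly when alpha > 1/2; the '+ 1' is the
    contribution of the idiosyncratic noise beta.
    """
    return 1.0 + sum(C * C / (1.0 + k * k) ** alpha for k in range(-K, K + 1))

# Two points close to each other across the identification 0 ~ 2*pi.
d_example = torus_distance(0.1, TWO_PI - 0.1)
rate_example = quadratic_variation_rate(4.0, K=1000)
```

The value `d_example` is approximately 0.2 although the two real numbers differ by almost $2\pi$, which is the reason for taking the infimum over $k$.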
Definition 2. Let $G^1$ be the set of $C^1$-functions $g : \mathbb{R} \to \mathbb{R}$ such that for every $u \in \mathbb{R}$, $g'(u) > 0$ and $g(u + 1) = g(u) + 2\pi$. Let $\sim$ be the following equivalence relation on $G^1$: $g_1 \sim g_2$ if and only if there exists $c \in \mathbb{R}$ such that $g_2(\cdot) = g_1(\cdot + c)$. We denote by $\mathbf{G}^1$ the set of equivalence classes $G^1 / \sim$.
An interpretation of Definition 2 is that the initial condition $g$ is seen as the quantile function (or inverse c.d.f.) associated with the measure $\nu^g_0 \in \mathcal{P}(\mathbb{T})$ with density $p(x) = \frac{1}{g'(g^{-1}(x))}$, $x \in [0, 2\pi]$, with respect to Lebesgue measure on $\mathbb{T}$. There is a one-to-one correspondence between the set $\mathbf{G}^1$ of equivalence classes and the set of positive densities on the torus, see Paragraph A.1 in the appendix for more details.
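The correspondence between a quantile function and its density can be checked numerically. The sketch below samples $p(g(u)) = 1/g'(u)$ on a grid and verifies that the total mass is close to 1; the particular function `g_example`, the grid size and the helper names are hypothetical and serve only as an illustration.

```python
import math

def density_on_grid(g, g_prime, n=2000):
    """Sample x_j = g(u_j) and p(x_j) = 1 / g'(u_j) for u_j = j / n."""
    us = [j / n for j in range(n + 1)]
    xs = [g(u) for u in us]
    ps = [1.0 / g_prime(u) for u in us]
    return xs, ps

def trapezoid(xs, ys):
    """Trapezoidal rule on a possibly non-uniform increasing grid."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )

# Hypothetical quantile function: g(u + 1) = g(u) + 2*pi and g' > 0.
def g_example(u):
    return 2.0 * math.pi * u + 0.3 * math.sin(2.0 * math.pi * u)

def g_example_prime(u):
    return 2.0 * math.pi * (1.0 + 0.3 * math.cos(2.0 * math.pi * u))

xs, ps = density_on_grid(g_example, g_example_prime)
total_mass = trapezoid(xs, ps)  # integral of p over one period
```

The positivity of $g'$ is what makes `ps` finite and positive everywhere; as $\min g'$ approaches zero the density blows up at the corresponding point.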
Existence, uniqueness, continuity and differentiability of solutions to (4) depend on the order $\alpha$ of $f$ and on the regularity of $g$, as shown by the following two propositions. The proofs, which are classical, are deferred to the appendix.
Proof. See Paragraph A.2 in appendix.
For every $j \in \mathbb{N}$ and $\theta \in [0, 1)$, let $C^{j+\theta}$ denote the set of $C^j$-functions whose $j$-th derivative is $\theta$-Hölder continuous. By extension, $G^{j+\theta} \subseteq G^1$ and $\mathbf{G}^{j+\theta} \subseteq \mathbf{G}^1$ are the subsets of the function space $G^1$ and of the quotient $\mathbf{G}^1 = G^1/\sim$ consisting of all $C^{j+\theta}$-functions and $C^{j+\theta}$-equivalence classes, respectively.
Proposition 4. Let $j \geq 1$, $\theta \in (0, 1)$, $g \in G^{j+\theta}$ and $f$ be of order $\alpha > j + \frac{1}{2} + \theta$. Then almost surely, for every $t \in [0, T]$, the map $u \mapsto x^g_t(u)$ is a $C^{j+\theta'}$-function for every $\theta' < \theta$. Moreover, its first derivative satisfies almost surely
Proof. See Paragraph A.2 in the appendix.
The next proposition states that the flow of the SDE preserves the equivalence classes of quantile functions that we introduced in Definition 2.
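Proposition 5. Let $\theta \in (0, 1)$, $g \in G^{1+\theta}$ and $f$ be of order $\alpha > \frac{3}{2} + \theta$. Then almost surely, for every $t \in [0, T]$, the map $u \mapsto x^g_t(u)$ belongs to $G^1$. Moreover, if $g_1 \sim g_2$, then almost surely, $x^{g_1}_t \sim x^{g_2}_t$ for every $t \in [0, T]$.
Proof. By Propositions 3 and 4, it is clear that $u \mapsto x^g_t(u)$ belongs to $C^1$ and that $\partial_u x^g_t(u) > 0$ for every $u \in \mathbb{R}$. Furthermore, let $(y^g_t)_{t\in[0,T]}$ be the process defined by $y^g_t(u) := x^g_t(u + 1) - 2\pi$. Since the coefficients of equation (4) are $2\pi$-periodic in space and $g(u + 1) - 2\pi = g(u)$, the processes $(x^g_t(u))_{t\in[0,T],u\in\mathbb{R}}$ and $(y^g_t(u))_{t\in[0,T],u\in\mathbb{R}}$ satisfy the same equation and belong to $C(\mathbb{R} \times [0, T])$. By Proposition 3, there is a unique solution in this space. Thus for every $t \in [0, T]$ and every $u \in \mathbb{R}$, $x^g_t(u + 1) = x^g_t(u) + 2\pi$. The proof of the second statement is similar; if there is $c \in \mathbb{R}$ such that $g_2(u) = g_1(u + c)$ for every $u \in \mathbb{R}$, then the processes $(x^{g_2}_t(u))_{t\in[0,T],u\in\mathbb{R}}$ and $(x^{g_1}_t(u + c))_{t\in[0,T],u\in\mathbb{R}}$ satisfy the same equation and are equal.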
By Proposition 5, we are able to give a meaning to equation (4) with initial value $g$ in $G^{1+\theta}$. Indeed, for each $t \in [0, T]$, the solution $x^g_t$ takes its values in $G^{1+\theta'}$ for every $\theta' < \theta$. More generally, by Proposition 4, if the initial condition $g$ belongs to $G^{j+\theta}$ for $j \geq 1$ and $\theta \in (0, 1)$, then for each $t \in [0, T]$, $x^g_t$ belongs to $G^{j+\theta'}$ for every $\theta' < \theta$. Furthermore, the $L_p$-norms (in the space variable) of the derivatives $\partial^{(j)}_u x^g_t$, $j \geq 1$, and of $\frac{1}{\partial_u x^g_t}$ can be easily controlled with respect to the initial conditions. All the inequalities that will be needed later in this paper are listed and proved in Lemma 39 in the appendix.
To conclude this paragraph, let us mention that the solution to equation (4) can be equivalently seen as a solution to the following parametric SDE Under the same assumptions on $f$, well-posedness and regularity of solutions to equation (6) can be shown, see Proposition 40 in the appendix. Moreover, $Z_t$ is closely related to $x^g_t$ by the following identities.

Semi-group averaged out by idiosyncratic noise
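According to Proposition 5, $u \in [0, 1] \mapsto x^g_t(u)$ is for each fixed $t$ a quantile function of the measure $\nu^g_t \in \mathcal{P}(\mathbb{T})$ defined by $\nu^g_t := \mathrm{Leb}_{[0,1]} \circ (x^g_t)^{-1}$. However, the stochastic process $(\nu^g_t)_{t\in[0,T]}$ is not regular enough to obtain a gradient estimate for the associated semi-group. Therefore, we average out the realization of the noise $\beta$ by defining $\mu^g_t := (\mathrm{Leb}_{[0,1]} \otimes \mathbb{P}^\beta) \circ (x^g_t)^{-1}$.
To be more precise, we define three sources of randomness, for the noises $W^k$, $\beta$ and the initial condition $g$, respectively. Let $(\Omega^W, \mathcal{G}^W, (\mathcal{G}^W_t)_{t\in[0,T]}, \mathbb{P}^W)$ and $(\Omega^\beta, \mathcal{G}^\beta, (\mathcal{G}^\beta_t)_{t\in[0,T]}, \mathbb{P}^\beta)$ be filtered probability spaces satisfying usual conditions, on which we define a $(\mathcal{G}^W_t)_{t\in[0,T]}$-adapted collection $((W^k_t)_{t\in[0,T]})_{k\in\mathbb{Z}}$ of independent $\mathbb{C}$-valued Brownian motions and a $(\mathcal{G}^\beta_t)_{t\in[0,T]}$-adapted standard Brownian motion $(\beta_t)_{t\in[0,T]}$, respectively. Let $(\Omega^0, \mathcal{G}^0, \mathbb{P}^0)$ be another probability space rich enough to support $G^1$-valued random variables with any possible distribution. We denote by $\mathbb{E}^\beta$, $\mathbb{E}^W$ and $\mathbb{E}^0$ the expectations associated to $\mathbb{P}^\beta$, $\mathbb{P}^W$ and $\mathbb{P}^0$, respectively. Let $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t\in[0,T]}, \mathbb{P})$ be the filtered probability space defined by $\Omega := \Omega^W \times \Omega^\beta \times \Omega^0$, with the corresponding product $\sigma$-field and filtration, and $\mathbb{P} := \mathbb{P}^W \otimes \mathbb{P}^\beta \otimes \mathbb{P}^0$. Without loss of generality, we assume the filtration $(\mathcal{G}_t)_{t\in[0,T]}$ to be complete and, up to adding negligible subsets to $\mathcal{G}^0$, we assume that $\mathcal{G}_0 = \mathcal{G}^0$.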

Assumptions on the test functions
The semi-group $(P_t)_{t\in[0,T]}$ acts on bounded and continuous functions $\phi : \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$. We impose on $\phi$ the assumptions defined hereafter.
Definition 10. We define an equivalence class on P 2 (R) by for any µ ∼ ν. In particular, φ induces a map from P(T) to R.
In particular, for a T-stable function φ and X ∈ L 2 (Ω), φ(X) = φ({X}), where {x} is the unique number in [0, 2π) such that x − {x} ∈ 2πZ. Let us mention two important classes of examples of T-stable functions: The 2π-periodicity condition ensures that φ( If the reader is not familiar with the L-derivative ∂ µ φ, we refer to Paragraph A.3 in appendix for a short introduction. Definition 11. A function φ : P 2 (R) → R is said to satisfy the φ-assumptions if the following three conditions hold: (φ1) φ is T-stable, bounded and continuous on P 2 (R).
The following statement shows that the class of functions satisfying the φ-assumptions is stable under the action of (P t ) t∈[0,T ] .
Proposition 13. Assume that $f$ is of order $\alpha > \frac{5}{2}$. Let $\phi : \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ be a function satisfying the $\phi$-assumptions. Then for every $t \in [0, T]$, $P_t\phi : \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ also satisfies the $\phi$-assumptions. Moreover, for any fixed $t \in [0, T]$, the Fréchet derivative of $g \mapsto P_t\phi(\mu^g_0)$ is given by
Note that $D\widehat{\phi}(x^g_t)$, the Fréchet derivative of the lift $\widehat{\phi}$ of $\phi$, is an element of the dual of $L_2([0, 1] \times \Omega^\beta)$, identified here with an element of $L_2([0, 1] \times \Omega^\beta)$. The proof of Proposition 13 is given in Paragraph A.4 in the appendix.

Statement of the main theorem
The main result of this paper is a gradient estimate for the semi-group $(P_t)_{t\in[0,T]}$ associated to $(\mu^g_t)_{t\in[0,T]}$, which is given at points $g \in G^{3+\theta}$ and for directions of perturbation $h$ defined as follows.
Definition 14. We denote by ∆ 1 the set of 1-periodic C 1 -functions h : R → R. We define the following norm on ∆ 1 : A simple computation shows that for |ρ| ≪ 1, g + ρh still belongs to G 1 . Let us state the main theorem.
Theorem 15. Let $\phi : \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ satisfy the $\phi$-assumptions. Let $\theta \in (0, 1)$ and $f$ be of order $\alpha = \frac{7}{2} + \theta$. Let $g \in G^{3+\theta}$ and $h \in \Delta^1$ be two deterministic functions. Then there is $C_g$ independent of $h$ such that for every $t \in (0, T]$ where $C_g$ is bounded when $\|g'''\|_{L_\infty} + \|g''\|_{L_\infty} + \|g'\|_{L_\infty} + \left\|\frac{1}{g'}\right\|_{L_\infty}$ is bounded.
In the following section, we split the l.h.s. of (11) into two terms, $I_1$ and $I_2$, which will be studied separately in Sections 4 and 5, respectively.

Preparation for the proof
To start this paragraph, we rewrite the gradient of the semi-group in terms of the L-derivative of $P_t\phi$ and of the linear functional derivative of $P_t\phi$. We refer to Paragraph A.3 in the appendix for a short reminder of the definitions of the different types of derivatives and of the relationships between them.
For convenience, the following lemma is written for a perturbation $g'h$ instead of $h$ (the corresponding result for $h$ can be naturally obtained by applying the following formula to $\frac{h}{g'}$ instead of $h$). For later purposes, the lemma is stated for random functions $g$ and $h$, with a $\mathcal{G}_0$-measurable randomness. Recall that within this framework, $g$ and $h$ are independent of $((W^k)_{k\in\mathbb{Z}}, \beta)$.
Lemma 16. Let $\phi$, $\theta$ and $f$ be as in Theorem 15. Let $g$ and $h$ be $\mathcal{G}_0$-measurable random variables with values respectively in $G^{3+\theta}$ and $\Delta^1$. Then for every $t \in [0, T]$,
Proof. Fix $\omega^0$ in an almost-sure event of $\Omega^0$ such that $g = g(\omega^0)$ belongs to $G^{3+\theta}$ and $h = h(\omega^0)$ belongs to $\Delta^1$. Since $g'$ is 1-periodic and positive, $\rho_0 := \frac{1}{\|h'\|_{L_\infty}} \inf_{u\in\mathbb{R}} g'(u)$ is positive and $g + \rho h$ belongs to $G^1$ for every $\rho \in (-\rho_0, \rho_0)$. The first equality in (12) follows from the definition of the L-derivative: The second equality in (12) was already stated in Proposition 13. For the third equality in (12), we use the relationship between the L-derivative and the functional linear derivative (see Proposition 41) This follows from Proposition 45 in the appendix, because on the one hand $P_t\phi$ satisfies the $\phi$-assumptions by Proposition 13 and on the other hand the probability measure $\mu^g_0$ has density g ′ on the torus, which is strictly positive everywhere. It follows that δPtφ and where
Lemma 17. For any $t \in [0, T]$ and $\varepsilon > 0$, $A^g_t$ is a $2\pi$-periodic $C^1$-function and $A^{g,\varepsilon}_t$ is a $2\pi$-periodic $C^\infty$-function.
Proof. The periodicity property follows from the fact that $h$ and $\partial_u x^g_t$ are 1-periodic and from
We conclude this paragraph by splitting the derivative of $P_t\phi$ into two terms. $I_1$ involves the regularized function $A^{g,\varepsilon}_t$, whereas $I_2$ is a remainder term which we have to show is small with respect to $|\varepsilon|$.
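The regularization of a $2\pi$-periodic function by convolution can be illustrated on Fourier coefficients. The sketch below assumes, purely for illustration, that the smoothing kernel is a periodized Gaussian of variance $\varepsilon$, so that the convolution multiplies the $k$-th Fourier coefficient by $e^{-\varepsilon k^2/2}$; the kernel actually used in definition (15) is not reproduced here, and the function name and the naive discrete Fourier transform are illustration choices.

```python
import cmath
import math

def mollify_periodic(values, eps):
    """Convolve samples of a 2*pi-periodic function with a Gaussian kernel.

    ASSUMPTION: Gaussian kernel of variance eps (hypothetical choice).
    values[p] is the function at x_p = 2*pi*p/n. Uses a naive O(n^2) DFT.
    """
    n = len(values)
    out = [0.0] * n
    for m in range(n):
        k = m if m <= n // 2 else m - n  # signed integer frequency
        # Discrete Fourier coefficient of the samples at frequency k.
        c_k = sum(
            v * cmath.exp(-2j * math.pi * m * p / n)
            for p, v in enumerate(values)
        ) / n
        damp = math.exp(-0.5 * eps * k * k)  # Fourier multiplier of the kernel
        for p in range(n):
            out[p] += (damp * c_k * cmath.exp(2j * math.pi * m * p / n)).real
    return out

n_pts = 32
grid = [2.0 * math.pi * p / n_pts for p in range(n_pts)]
samples = [math.cos(3.0 * x) for x in grid]
smoothed = mollify_periodic(samples, eps=0.2)
```

For the single mode $\cos(3x)$ the output is the same mode damped by $e^{-0.9}$: high frequencies are suppressed very quickly, which is what makes the regularized function as smooth as needed, while the damping disappears as $\varepsilon \to 0$.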

Proposition 18. Under the same assumptions as in Lemma 16 and for every
where A g,ε s (x g s (u))dsdu ; ds.
Therefore, equation (12) rewrites The r.h.s. of the latter equality is clearly equal to I 1 + I 2 .
The proof of Theorem 15 is now divided into three main steps. In the following two sections, we will study separately I 1 and I 2 . This will lead to the estimate stated in Corollary 32. In Section 6, we conclude the proof by iterating the result of Corollary 32 over successive time intervals.

Analysis of I 1
In order to control $I_1$, we adapt in this section a method of proof introduced in [Tha97]. To follow that strategy, we take advantage of the fact that $A^{g,\varepsilon}_t$, in contrast to $A^g_t$, is as regular as needed. The drawback is that the control on $I_1$ blows up when $\varepsilon$ goes to 0, which is the reason why the explosion rate is $t^{-2-\theta}$ in Theorem 15 and not $t^{-1/2}$ as in [Tha97].
Let us define A g,ε s (x g s (u))ds.
Using that notation, I 1 rewrites as follows The goal of this section is to prove the following inequality: Proposition 19. Let φ, θ and f be as in Theorem 15. Let g and h be G 0 -measurable random variables with values respectively in G 3+θ and ∆ 1 . Then there is C > 0 independent of g, h and θ such that for every t ∈ [0, T ], for every ε ∈ (0, 1), The proof of the proposition is based on writing the SDE satisfied by (Z t is the solution to (6). We recall this expansion, known as Kunita's theorem, in the following lemma: adapted process such that t → ζ t is absolutely continuous, ζ 0 = 0 almost surely and E T 0 |ζ t |dt is finite. Then almost surely, for every x ∈ R, t ∈ [0, T ] and ρ ∈ R,

Fourier inversion on the torus
A key ingredient in the study of $I_1$ is the following Fourier inversion of $A^{g,\varepsilon}_t$, which we will later use in order to apply Girsanov's theorem.
and such that there is a constant Therefore, by Dirichlet's Theorem, that map is equal to the sum of its Fourier series where c k (A) := 1 2π 2π 0 A(y)e iky dy for every 2π-periodic function A and for every k ∈ Z. Let us define λ k t := Compute the Fourier coefficient c k (A g,ε s ): Thus there is C independent of k and s such that for every where we used inequality (71) given in appendix. Moreover, the derivative of A g s is equal to: .
We deduce that (21) Therefore, for every s T , by (71) and (72) and because g belongs to G 3+θ and α = 7 2 + θ. We deduce that which is inequality (18). This completes the proof of the lemma.
Remark 22. After the proof of Lemma 21, we can now explain precisely why a regularization of $A^g$ was needed. Imagine for a while that instead of looking for the Fourier inverse of $A^{g,\varepsilon}$ in (17), we were looking for the Fourier inverse of $A^g$. In order to prove an inequality like (18), we would have to show that The latter sum converges if $\mathbb{E}\left[\int_0^t \|A^g_s\|^2_{C^p} ds\right]$ is bounded for a certain p > 1 + 2α. In turn, if the latter expectation is bounded then for almost every $s$, $y \mapsto A^g_s(y)$ is of class $C^p$. But we know, by definition (14) of $A^g_s$ and by Proposition 4, that $A^g_s \in C^p$ if $h \in C^p$, $g \in C^{p+\theta}$ and $\alpha > p + \frac{1}{2}$. The regularity of $h$ is not a big problem, since we could simply assume higher regularity in the assumptions of Theorem 15. However, it is impossible to choose $\alpha$ so that both inequalities p > 1 + 2α and $\alpha > p + \frac{1}{2}$ hold simultaneously. Regularizing $A^g$ allows us to work with $A^{g,\varepsilon}_s \in C^p$ without having to assume that $\alpha > p + \frac{1}{2}$.
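The tension described in Remark 22 can be made concrete with a toy computation. The sketch below assumes, as a simplification that is not taken from the paper, that inverting the Fourier series amounts to dividing the $k$-th Fourier coefficient $c_k$ by $f_k = (1+k^2)^{-\alpha/2}$, that $c_k = |k|^{-2}$, and that regularization multiplies $c_k$ by a Gaussian factor $e^{-\varepsilon k^2/2}$. All three choices and the function name are hypothetical.

```python
import math

def partial_sum(K, alpha, eps, decay=2.0):
    """Sum over 1 <= |k| <= K of |c_k * exp(-eps*k^2/2) / f_k|^2.

    ASSUMPTIONS (illustrative only): c_k = |k|^(-decay),
    f_k = (1 + k^2)^(-alpha/2), Gaussian damping with parameter eps.
    """
    total = 0.0
    for k in range(1, K + 1):
        c_k = k ** (-decay)
        f_k = (1.0 + k * k) ** (-alpha / 2.0)
        lam = c_k * math.exp(-0.5 * eps * k * k) / f_k
        total += 2.0 * lam * lam  # the terms for k and -k coincide
    return total

# Without regularization the partial sums keep growing with K ...
growth = partial_sum(200, alpha=4.0, eps=0.0) / partial_sum(100, alpha=4.0, eps=0.0)
# ... whereas with regularization they have already stabilized.
gap = partial_sum(200, alpha=4.0, eps=0.1) - partial_sum(100, alpha=4.0, eps=0.1)
```

In this toy setting the unregularized terms grow like $k^4$, so the series diverges, while any $\varepsilon > 0$ makes it converge for every $\alpha$; the price is that the limiting value grows as $\varepsilon$ decreases.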

A Bismut-Elworthy-like formula
Let us state and prove an integration by parts formula, close to the Bismut-Elworthy formula.
In view of proving Proposition 23, let us introduce the following stopping times. Let M 0 be an integer large enough so that for every u ∈ R, 1 Since g ∈ G 3+θ and f is of order α > 7 2 , inequalities (71) and (72) imply that and there is a constant C M,ε > 0 such that P W ⊗ P β -almost surely, k∈Z Similarly as in the proof of Lemma 21, there is a constant C > 0 such that for every k ∈ Z\{0} and for every ε > 0, By Definition (14), for every s ∈ [0, T ], Since the constant does not depend on t, we deduce the statement of the lemma.
Define the (G t )-adapted process (a M t ) t∈[0,T ] by a M t = a t∧τ M , in other words: We easily check that for every u ∈ R, a M 0 (u) = 0 and thatȧ M t (u) = g ′ (u) (28) We deduce that Therefore, inequality (28) holds with C a M : It follows from (30) and (31) Using the constant C a M appearing in (28), we define ρ 0 : The following lemma makes use of Kunita's expansion.
Then there exists C depending on M , f , g ′ , h, T and ε such that for every ρ ∈ (−ρ 0 , ρ 0 ) and for every t ∈ [0, T ], Proof. Fix u ∈ R and write the equation satisfied by (Z where we used the identities (7) and (8).
Finally, the following lemma states a Bismut-Elworthy formula. Note that the only difference with the formula of Proposition 23 is the localization by $\tau_M$.
Lemma 27. Let θ ∈ (0, 1), g ∈ G 3+θ and f be of order α = 7 2 + θ. For every M M 0 and for every t ∈ [0, T ], Proof. Take the real part of equality (26) with y = Y ρ,M s (u). Recall that A g,ε s and f k are realvalued. We obtain for every M M 0 , for every u ∈ R, for every ρ ∈ (−ρ 0 (M ), ρ 0 (M )) and for every s ∈ [0, T ], Thus, we rewrite equality (32) in the following way: Recall that by Lemma 24, there is a constant C M,ε > 0 such that P W ⊗ P β -almost surely, It follows from Novikov's condition that the process (E ρ t ) t∈[0,T ] is a P W ⊗ P β -martingale. Let P ρ be the probability measure on Ω W × Ω β such that P ρ is absolutely continuous with respect to P W ⊗ P β with density k∈Z is a collection of independent P ρ -Brownian motions, independent of (β, G 0 ). By uniqueness in law of equation (4), the law of (Y ρ,M t ) t∈[0,T ] under P ρ is equal to the law of ( The r.h.s. does not depend on ρ, so we have Let us now prove that d dρ |ρ=0 By assumption (φ2), φ is a Lipschitzcontinuous function. By Lemma 26, we have for every ρ ∈ (−ρ 0 , ρ 0 ) since the exponential term is a P W ⊗ P β -martingale. Therefore, It follows from (39) and (40) We used Proposition 6 for the last equality. Therefore, equality (38) holds true.
Proof (Proposition 23). We want to prove which is equivalent to (23). In order to obtain that equality, it is sufficient to pass to the limit when M → +∞ in (38). Recall that by (25), sup Proof of (41). For every M M 0 , by Hölder's inequality (68), Remark that for every s ∈ [0, T ], A g,ε s L∞ where the constant C does not depend on M . The last inequality is obtained by inequalities (71) and (72), because g ∈ G 2+θ and α > 5 2 + θ. We deduce (41). Proof of (42). For every M M 0 and t ∈ [0, T ]

Conclusion of the analysis
Putting together Lemma 21 and Proposition 23, we conclude the proof of Proposition 19.
Remark 28. Note that we have proved a Bismut-Elworthy-Li integration by parts formula up to a remainder term. Indeed, by Propositions 18 and 23, we proved that where it should be recalled that $((\lambda^k_t)_{t\in[0,T]})_{k\in\mathbb{Z}}$, defined by (17), depends on $\varepsilon$. In the next section, we prove that $I_2$ is of order $\mathcal{O}(\varepsilon)$.

Analysis of I 2
In this section, we look for an upper bound of |I 2 |. Define H g,ε s (u) : . Then I 2 rewrites as follows Moreover, let (K g,ε t ) t∈[0,T ] be the process defined by: We also introduce the notation δψ δm , denoting the zero-average linear functional derivative. For every µ ∈ P 2 (R) and v ∈ R, The main result of this section is the following proposition.
such that X is continuous in u for P-almost every fixed ((w k ) k∈Z , b) ∈ Θ and such that P W ⊗P β -almost surely, for every u ∈ R and for every t ∈ [0, T ], -measurable, such that P W ⊗ P β -almost surely, for every u ∈ R and for every t ∈ [0, T ],
Proof. Consider the canonical space (Θ, B(Θ), ( B t (Θ)) t∈[0,T ] , P). By Proposition 3, there is a strong and pathwise unique solution to (4) with initial condition x g 0 = g. Therefore, for every fixed u ∈ R, there is a unique solution (x g t (u)) t∈[0,T ] to
Proof of (a). By Yamada-Watanabe Theorem, the law of (x g , (W k ) k∈Z , β) under P W ⊗ P β is equal to the law of (x g , (w k ) k∈Z , b) under P. This result is proved in [KS91,Prop. 5.3.20] for a finite-dimensional noise, but the proof is the same for the infinite-dimensional noise ((w k ) k∈Z , b) ∈ Θ. By a corollary to this theorem (see [KS91,Coro. 5.3.23]), it follows that for every u ∈ Q, there is a B(Θ)/B(C([0, T ], R))-measurable function (C([0, T ], R))-measurable, such that P-almost surely, for every t ∈ [0, T ], Moreover, again by Proposition 3, there is a P-almost sure event A ∈ B(Θ) such that for every ((w k ) k∈Z , b) ∈ A, the function (t, u) → x g t (u) is continuous on [0, T ]× R. Up to modifying the almost-sure event A, we may assume that for every ((w k ) k∈Z , b) ∈ A and for every u ∈ Q, equality (50) holds. Therefore, we can define a continuous function in the variable u ∈ R by extending u ∈ Q → X u . More precisely, define for every u ∈ R, ((w k ) k∈Z , b) ∈ Θ, In the latter definition, the limit exists and for every ( is continuous. It remains to show that X is progressively measurable. Fix t ∈ [0, T ]. By construction of X u , we know that for every u ∈ Q, Since P-almost surely, for every u ∈ R and for every t ∈ [0, T ], x g t (u) = X t (u, (w k ) k∈Z , b), we deduce that P W ⊗P β -almost surely, for every u ∈ R and for every t ∈ [0, T ], equality (47) holds. This completes the proof of (a).
Proof of (b). This step is equivalent to find P : C([0, T ], C) Z → C([0, T ], P 2 (R)) such that for every bounded measurable function Υ : R → R, the function is where µ Wiener denotes the Wiener measure on C([0, T ], R). Thus P W -almost surely, for every t ∈ [0, T ], for every Υ : R → R bounded and measurable, where the last equality follows from Definition 7. Thus we proved equality (48). Moreover, for every t ∈ [0, T ], by composition of two measurable functions, This completes the proof of (b). Proof of (c). Define, on the canonical space (Θ, B(Θ)), F g t = (x g t ) −1 and In order to prove that H g,ε can be written as a progressively measurable function of u and ((w k ) k∈Z , b), we will prove successively that this property holds for ∂ u x g , F g , A g and A g,ε and we will deduce the result for H g,ε by composition of progressively measurable functions. Let us start with ∂ u x g . By Proposition 4, since g ∈ G 1+θ and α > 3 2 + θ, P-almost surely, for every t ∈ [0, T ], the map u → x g t (u) is of class C 1 . Thus there exists a P-almost-sure event A ∈ B(θ) such that for every ((w k ) k∈Z , b) ∈ A, x g t (u) = X t (u, (w k ) k∈Z , b) holds for every (t, u) ∈ [0, T ] × R and such that u → x g t (u) belongs to C 1 . Define for every ((w k ) k∈Z , b) ∈ A, for every (t, u) ∈ [0, T ] × R, Thus for every ((w k ) k∈Z , b) ∈ A and for every (t, u) ∈ [0, T ]×R, ∂ u x g t (u) = ∂ u X t (u, (w k ) k∈Z , b). Moreover, by progressive-measurability of X , it follows from the definition of ∂ u X is also progressively measurable; more precisely, for every t ∈ [0, T ], Now, consider F g . Define for every x ∈ [0, 2π] Thus we have for every x ∈ [x t (0), x t (0) + 2π] Therefore, since for every x ∈ R, F g t (x + 2π) = F g t (x) + 1, we have Hence it is sufficient to prove that we can write F g t as a progressively measurable function of x and ((w k ) k∈Z , b). Recall that P-almost surely, u → x g (u) = X (u, (w k ) k∈Z , b) is continuous. 
Thus there is $I$ such that $\mathbb{P}$-almost surely, for every $v \in [0,1]$, for every $x \in [0,2\pi]$, 1 {x g It follows from Fubini's Theorem and from (51) that for every $t \in [0,T]$,
Let us conclude with $A^g$, $A^{g,\varepsilon}$ and $H^{g,\varepsilon}$. First, remark that $A^g$ is obtained by products and compositions of $\partial_u x^g$, $F^g$ and $h$, where $h$ is a $C^1$-function. Thus $x \mapsto A^g(x)$ is a progressively measurable function of $x$ and $((w^k)_{k\in\mathbb{Z}}, b)$. It also follows that $(x,y) \mapsto A^g(x-y)\varphi_\varepsilon(y)$ is a progressively measurable function of $x$, $y$ and $((w^k)_{k\in\mathbb{Z}}, b)$. By Fubini's Theorem, we deduce that $x \mapsto A^{g,\varepsilon}(x)$ is a progressively measurable function of $x$ and $((w^k)_{k\in\mathbb{Z}}, b)$. Again by products and compositions, it follows that there is a progressively measurable function $\mathcal{H}$ such that $\mathbb{P}$-almost surely, for every $u \in \mathbb{R}$ and for every $t \in [0,T]$, It follows that $\mathbb{P}^W \otimes \mathbb{P}^\beta$-almost surely, equality (49) holds. This completes the proof of (c).

Idiosyncratic noise
Coming back to equality (43), and applying the relation $\partial_\mu \phi(\mu^g_t) = \partial_v \frac{\delta\phi}{\delta m}(\mu^g_t)$ (see Proposition 41 in the appendix), we have By definition (45), the version of $\frac{\delta\phi}{\delta m}$ introduced there is equal to $\frac{\delta\phi}{\delta m}$ up to a constant, so their derivatives are equal and In the following lemma, we prove that $I_2$ can be expressed in terms of $\frac{\delta\phi}{\delta m}$ instead of its derivative. This key step is, as shown below, a consequence of Girsanov's Theorem applied with respect to the idiosyncratic noise $\beta$.
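For every $\nu \in [-1,1]$, define the following stopping time Define the process $(y^\nu_r)_{r\in[0,T]}$ as the solution to Define $\mathbb{P}^\nu$ as the probability measure that is absolutely continuous with respect to $\mathbb{P}^W \otimes \mathbb{P}^\beta$ with density $\frac{d\mathbb{P}^\nu}{d(\mathbb{P}^W \otimes \mathbb{P}^\beta)} = \mathcal{E}^\nu_T$. Thus by Girsanov's Theorem, the law under $\mathbb{P}^\nu$ of $((W^k)_{k\in\mathbb{Z}}, \beta^\nu)$ is equal to the law under $\mathbb{P}^W \otimes \mathbb{P}^\beta$ of $((W^k)_{k\in\mathbb{Z}}, \beta)$. It follows that $(\Omega, \mathcal{G}, (\mathcal{G}_t)_{t\in[0,T]}, \mathbb{P}^\nu, y^\nu, (W^k)_{k\in\mathbb{Z}}, \beta^\nu)$ is a weak solution to equation (4).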

Conclusion of the analysis
Recall that $g$ belongs to $\mathbf{G}^{3+\theta}$ and $f$ is of order $\alpha > \frac{7}{2} + \theta$. Thus by (71) and by (72) . By Hölder's inequality, we have . By the Burkholder-Davis-Gundy inequality, it follows that where the last inequality holds by (69). By the same computation, where the last inequality holds by (70). We deduce that $E_3 \leq C\sqrt{t}\,(1 + \|g''\|_{L_\infty} + \|g'\|^2_{L_\infty})$. By inequality (61) and the estimates on $E_i$, for $i = 1, 2, 3$, we finally get: where $C_2(g) = 1 + \|g'''\|^3_{L_8} + \|g''\|^{12}_{L_\infty} + \|g'\|^{12}$ As a conclusion of sections 4 and 5, we have proved the following inequality.

Proof of the main theorem
Essentially, Corollary 32 states that we can control the gradient of P t φ by the gradient of φ. By iterating the inequality over successive time steps, we will conclude the proof of Theorem 15.
Definition 33. Let $\mathcal{K}_t$ be the set of $\mathcal{G}_t$-measurable random variables taking their values $\mathbb{P}$-almost surely in the set of continuous 1-periodic functions $k : \mathbb{R} \to \mathbb{R}$ satisfying $\|k\|_{L_\infty} = 1$.
Proposition 34. Let $\phi$, $\theta$, $f$ and $g$ be as in Theorem 15. Let $t, s \in [0,T]$ be such that $t + s \leq T$. For each $\mathcal{G}_s$-measurable function $h$ with values in $\Delta^1$ satisfying $\mathbb{P}$-almost surely $\|h\|_{C^1} \leq 4$, there exists $C_g > 0$ independent of $s$, $t$ and $h$ such that
Proof. By equality (12), where the second equality follows from the fact that $h$ is 1-periodic and the last equality follows from (45). Apply now inequality (62) with $\varepsilon_0 = \frac{1}{2^{\frac{7}{2}+\theta}} \frac{\sqrt{t}}{C \|h\|_{C^1} C_2(g)}$. For every $\mathcal{G}_0$-measurable $g$ and $h$, , since for any random variable $X$ on $\Omega$ and any Thus it follows from the latter inequality that: Now, consider a deterministic function $g$ and a $\mathcal{G}_s$-measurable $h$, where $s \leq T - t$. Then, repeating the whole argument with the $\mathcal{G}_s$-measurable variables $x^g_s$ and $h$ instead of $g$ and a $\mathcal{G}_0$-measurable $h$, respectively, we get: where $x^{s,x^g_s}_{t+s}(u)$ denotes the value at time $t+s$ and at point $u$ of the unique solution to (4) which is equal to $x^g_s$ at time $s$, and where $\varepsilon_s$ is $\mathcal{G}_s$-measurable. By strong uniqueness of (4), we have the following flow property: $x^{s,x^g_s}_{t+s} = x^g_{t+s}$ and $\mu^{s,x^g_s}_{t+s} = \mu^g_{t+s}$. Therefore, Remark that $u \mapsto K^{s,x^g_s,\varepsilon_s}_{t+s}(u) / \|K^{s,x^g_s,\varepsilon_s}_{t+s}\|_{L_\infty}$ belongs to $\mathcal{K}_{t+s}$. Thus, taking the expectation of the latter inequality, there is $C > 0$ so that for every $\mathcal{G}_s$-measurable function $h$ satisfying $\|h\|_{C^1} \leq 4$ In order to prove inequality (63), it remains to show that there is $C_g$ such that $\mathbb{E}\left[C_3(x^g_s)^2\right] \leq C_g$. Since $C_3(g) = C_1(g) C_2(g)^{3+2\theta}$, we have: We refer to (70), (71) and (72) to argue that the right-hand side is bounded by a constant that is uniform in $s \in [0,T]$ and depends polynomially on $\|g'''\|_{L_\infty}$, $\|g''\|_{L_\infty}$, $\|g'\|_{L_\infty}$ and $\left\|\frac{1}{g'}\right\|_{L_\infty}$. The constant is finite since $g$ belongs to $\mathbf{G}^{3+\theta}$.
Corollary 35. Let $\phi$, $\theta$, $f$ and $g$ satisfy the same assumptions as in Proposition 34. Let $t, s \in [0,T]$ be such that $2t + s \leq T$. For any $\mathcal{G}_s$-measurable random variable $h : \mathbb{R} \to \mathbb{R}$ with values in $\Delta^1$ satisfying $\mathbb{P}$-almost surely $\|h\|_{C^1} \leq 4$, there exists $C_g > 0$ independent of $s$, $t$ and $h$ such that
Proof. We get the above inequality by applying (63) to $P_t\phi$ instead of $\phi$. We note that $P_t(P_t\phi) = P_{2t}\phi$ and that $\|P_t\phi\|_{L_\infty} \leq \|\phi\|_{L_\infty}$.
Let us denote by $h$ the map defined for every $u \in \mathbb{R}$ by $h(u) :=$ We check that $h$ is a $\mathcal{G}_{t_0-2t}$-measurable 1-periodic $C^1$-function. Moreover, $\|h\|_{L_\infty} \leq 2$ and $\|\partial_u h\|_{L_\infty} \leq 2$; thus $\|h\|_{C^1} \leq 4$. Therefore, the assumptions of Corollary 35 are satisfied, with $s = t_0 - 2t$. We apply (65) with $s = t_0 - 2t$: and by taking the supremum over all $k$ in $\mathcal{K}_{t_0-2t}$, we get $S_{2t} \leq \frac{C_g \|\phi\|_{L_\infty}}{t^{2+\theta}} + \frac{1}{2^{3+\theta}} S_t$. We now complete the proof of Theorem 15.
Proof (Theorem 15). It follows from Proposition 36 that for every $t \in (0, \frac{t_0}{2}]$, Therefore, denoting by $S := \sup_{t \in (0,t_0]} t^{2+\theta} S_t$, we have $S \leq 2^{2+\theta} C_g \|\phi\|_{L_\infty} + \frac{1}{2} S$. Since $S$ is finite, we obtain $S \leq 2^{3+\theta} C_g \|\phi\|_{L_\infty}$. Thus for every $t_0 \in (0,T]$, $t_0^{2+\theta} S_{t_0} \leq 2^{3+\theta} C_g \|\phi\|_{L_\infty}$. Therefore, for any deterministic 1-periodic function $k : \mathbb{R} \to \mathbb{R}$ and for every $t \in (0,T]$, we have Let $h \in \Delta^1$. Thus $k = \partial_u \frac{h}{g'}$ is a 1-periodic function and we deduce that for a new constant $C_g$. Applying equality (64) with $\frac{h}{g'}$ instead of $g'$, we obtain which concludes the proof of the theorem.
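The absorption step $S \leq 2^{2+\theta} C_g \|\phi\|_{L_\infty} + \frac{1}{2} S$ used in this proof can be illustrated numerically. The following minimal Python sketch iterates the corresponding affine map; the function name `iterate_doubling_bound` and the numerical values are illustrative assumptions, not taken from the paper, and `A` stands for the product $C_g \|\phi\|_{L_\infty}$. Every finite starting value is driven to the fixed point $2^{3+\theta} A$, which is the constant obtained for $S$.

```python
def iterate_doubling_bound(A, theta, s0, n_steps=200):
    """Iterate the affine map s -> 2**(2 + theta) * A + s / 2, starting from s0.

    A plays the role of C_g * ||phi||_{L_infty} (a hypothetical value here).
    The map is increasing, so a finite S with S <= map(S) is bounded by every
    iterate started from S, hence by the limit of the iterates.
    """
    s = s0
    for _ in range(n_steps):
        s = 2.0 ** (2.0 + theta) * A + 0.5 * s
    return s


if __name__ == "__main__":
    A, theta = 1.5, 0.25  # illustrative values only
    fixed_point = 2.0 ** (3.0 + theta) * A
    for s0 in (0.0, 10.0, 1.0e6):
        print(s0, iterate_doubling_bound(A, theta, s0), fixed_point)
```

The finiteness of $S$ is what allows the term $\frac{1}{2} S$ to be absorbed; the sketch only shows where the factor $2^{3+\theta}$ comes from.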

A.1 Density functions and quantile functions on the torus
We define the set of positive densities on the torus.
Definition 37. Let $\mathcal{P}^+$ be the set of continuous functions $p : \mathbb{T} \to \mathbb{R}$ such that for every $x \in \mathbb{T}$, $p(x) > 0$ and $\int_{\mathbb{T}} p = 1$. $\mathcal{P}^+$ can also be seen as the set of $2\pi$-periodic and continuous functions $p : \mathbb{R} \to (0,+\infty)$ such that $\int_0^{2\pi} p(x)\,dx = 1$.
Let $p \in \mathcal{P}^+$ and let $x_0 \in \mathbb{T}$ be an arbitrary point on the torus. Define a cumulative distribution function by $F_0(x) := \int_{x_0}^x p(y)\,dy$ for every $x \in \mathbb{R}$. Since $p$ is $2\pi$-periodic and $\int_0^{2\pi} p = 1$, $F_0$ satisfies $F_0(x + 2\pi) = F_0(x) + 1$ for each $x \in \mathbb{R}$. It follows from the continuity and from the positivity of $p$ that $F_0$ is a $C^1$-function and for every $x \in \mathbb{R}$, $F_0'(x) = p(x) > 0$, so that $F_0$ is strictly increasing. Therefore, the inverse function $g_0 := F_0^{-1} : \mathbb{R} \to \mathbb{R}$ is well defined. The following properties of $g_0$ are straightforward:
- for every $x \in \mathbb{R}$, $g_0 \circ F_0(x) = x$ and for every $u \in \mathbb{R}$, $F_0 \circ g_0(u) = u$;
- $g_0$ is a strictly increasing $C^1$-function and for each $u \in \mathbb{R}$, $g_0'(u) = \frac{1}{p(g_0(u))}$;
- $g_0(0) = x_0$ and for every $u \in \mathbb{R}$, $g_0(u+1) = g_0(u) + 2\pi$ (we say that $g_0$ is pseudo-periodic);
- $g_0' : \mathbb{R} \to \mathbb{R}$ is positive everywhere and is a 1-periodic function.
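The construction of the quantile function from a positive density can be checked on a concrete example. The sketch below is only an illustration under stated assumptions: the density $p(x) = (1 + \frac{1}{2}\cos x)/(2\pi)$, the names `p`, `F0` and `g0`, and the choice of inverting by bisection are all hypothetical and do not come from the paper.

```python
import math

TWO_PI = 2.0 * math.pi


def p(x):
    """Hypothetical density: continuous, positive, 2*pi-periodic, integral 1 over a period."""
    return (1.0 + 0.5 * math.cos(x)) / TWO_PI


def F0(x, x0=0.0):
    """Closed-form integral of p from x0 to x."""
    return ((x - x0) + 0.5 * (math.sin(x) - math.sin(x0))) / TWO_PI


def g0(u, x0=0.0):
    """Inverse of F0 by bisection; F0 is strictly increasing because p > 0."""
    n = math.floor(u)
    lo = x0 + TWO_PI * (n - 1)  # F0(lo) = n - 1 < u
    hi = x0 + TWO_PI * (n + 2)  # F0(hi) = n + 2 > u
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if F0(mid, x0) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    u, h = 0.3, 1.0e-5
    print("g0(0)                     :", g0(0.0))
    print("F0(g0(u)) - u             :", F0(g0(u)) - u)
    print("g0(u + 1) - g0(u) - 2*pi  :", g0(u + 1.0) - g0(u) - TWO_PI)
    finite_diff = (g0(u + h) - g0(u - h)) / (2.0 * h)
    print("g0'(u) - 1 / p(g0(u))     :", finite_diff - 1.0 / p(g0(u)))
```

Each printed quantity corresponds to one of the listed properties and should be close to zero, up to bisection and finite-difference error.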

Proposition 38. There is a one-to-one correspondence between the set $\mathcal{P}^+$ and the set $\mathbf{G}^1$ of Definition 2.
Proof. Let $\iota : \mathcal{P}^+ \to \mathbf{G}^1$ be the map such that for every $p \in \mathcal{P}^+$, $\iota(p)$ is the equivalence class given by the above construction. We show that $\iota$ is a bijection. First, $\iota$ is injective. Indeed, let $p_1, p_2 \in \mathcal{P}^+$ be such that $\iota(p_1) = \iota(p_2)$. Let $x_0 \in \mathbb{T}$ and define, for $i = 1, 2$, $F_i(x) = \int_{x_0}^x p_i(y)\,dy$ and $g_i = F_i^{-1}$. Then by construction, the class of $g_1$ is $\iota(p_1) = \iota(p_2)$, which is the class of $g_2$. Therefore, there is $c \in \mathbb{R}$ such that $g_2(\cdot) = g_1(\cdot + c)$. Thus for every $x \in \mathbb{R}$, Thus $F_1$ and $F_2$ share the same derivative: $p_1 = p_2$.
Second, $\iota$ is surjective. Consider a class in $\mathbf{G}^1$ and let $g$ be a representative of that class. It is a $C^1$-function such that $g'(u) > 0$ for every $u \in \mathbb{R}$ and, since $g(u+1) = g(u) + 2\pi$ for every $u \in \mathbb{R}$, $g'$ is 1-periodic. Define $F := g^{-1} : \mathbb{R} \to \mathbb{R}$. In particular, $F$ is a $C^1$-function such that $F' > 0$ and for every $x \in \mathbb{R}$, $F(x + 2\pi) = F(x) + 1$. Thus $p := F'$ is a continuous function with values in $(0,+\infty)$ and for every $x \in \mathbb{R}$, $p(x) = \frac{1}{g'(F(x))}$. Thus for every $x \in \mathbb{R}$, $p(x + 2\pi) = p(x)$ and $\int_0^{2\pi} p = 1$. Therefore $p$ belongs to $\mathcal{P}^+$. We check that the class of $g$ is $\iota(p)$. Let $x_0$ be an arbitrary point in $\mathbb{T}$, let $F_0$ be defined by $F_0(x) = \int_{x_0}^x p(y)\,dy$ and let $g_0 := F_0^{-1}$; there is $c \in \mathbb{R}$ such that $F_0(\cdot) = F(\cdot) + c$. Therefore, $g_0(\cdot) = g(\cdot - c)$, whence $g_0 \sim g$. This completes the proof.
[Mar21, Prop. 5]. Moreover, the fact that the map $u \mapsto x^g_t(u)$ is strictly increasing is obtained by studying the process $(x^g_t(u_2) - x^g_t(u_1))_{t\in[0,T]}$ for every $u_1 < u_2$, as in [Mar21, Prop. 6]. The fact that it holds $\mathbb{P}^W \otimes \mathbb{P}^\beta$-almost surely, for every $0 \leq u_1 < u_2 \leq 1$, follows from the continuity of $x^g$, see [Mar21, Cor. 7].

A.2 Properties of the diffusion on the torus
Proof (Proposition 4). Assume that $g$ belongs to $\mathbf{G}^{1+\theta}$ and that $f$ is of order $\alpha > \frac{3}{2} + \theta$. This second assumption ensures that $\sum_{k\in\mathbb{Z}} k^{2+2\theta} |f_k|^2$ converges. By formally differentiating equation (4) with respect to the variable $u$, consider a solution $(z_t(u))_{t\in[0,T], u\in\mathbb{R}}$ to: Using the fact that $g'$ is $\theta$-Hölder continuous and that $\sum_{k\in\mathbb{Z}} k^{2+2\theta} |f_k|^2 < +\infty$, we prove by standard arguments that for each $u \in \mathbb{R}$, the solution $(z_t(u))_{t\in[0,T]}$ exists, is unique and that the map We easily get a constant $C$ depending on $\|g\|_{C^{1+\theta}}$ and on $\sum_{k\in\mathbb{Z}} k^{2+2\theta} |f_k|^2$ such that for each $t \in [0,T]$, $E_\varepsilon(t) \leq C|\varepsilon|^{2\theta} + C \int_0^t E_\varepsilon(s)\,ds$. By Gronwall's Lemma, it follows that $E_\varepsilon(T) \leq C|\varepsilon|^{2\theta}$, thus $E_\varepsilon(T) \to 0$ as $\varepsilon \to 0$. Therefore, using the continuity of $z$, we get almost surely for every $u \in \mathbb{R}$, for every $\varepsilon \neq 0$ and for every $t \in [0,T]$: Thus almost surely, $\partial_u x^g_t(u) = z_t(u)$ for every $u \in \mathbb{R}$ and $t \in [0,T]$, and furthermore it is given by the exponential form (5). The statements for higher derivatives are obtained similarly. For a detailed version of this proof with every computation of the inequalities mentioned above, see [Mar19, Lemmas II.12 and II.13].
By the previous proof, we know that $\partial_u x^g$ satisfies equation (67). It follows that we can control the $L_p$-norms of $\partial_u x^g$ and of higher derivatives with respect to the initial condition $g$.

A.3 Differential calculus on the Wasserstein space
Let us recall a few results about differentiation of real-valued functions $\phi$ defined on $\mathcal{P}_2(\mathbb{R})$. We refer to [Lio], [CD18, Chap. 5] or [Car13] for a complete introduction to this differential calculus.
Lions-derivative or L-derivative. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space rich enough so that for any probability measure $\mu$ on any Polish space, we can construct on $(\Omega, \mathcal{F}, \mathbb{P})$ a random variable with distribution $\mu$; a sufficient condition is that $(\Omega, \mathcal{F}, \mathbb{P})$ is Polish and atomless. Let $L_2(\Omega)$ be the set of square integrable random variables on $(\Omega, \mathcal{F}, \mathbb{P})$, modulo the equivalence relation of almost sure equality. For any $\phi$ :
Linear functional derivative. Basically, it is nothing but the notion of differentiability we would use for $\phi : \mathcal{M}(\mathbb{R}) \to \mathbb{R}$ if it were defined on the whole $\mathcal{M}(\mathbb{R})$, where $\mathcal{M}(\mathbb{R})$ is the linear space of signed measures on $\mathbb{R}$. Note that a subset $K$ of $\mathcal{P}_2(\mathbb{R})$ is said to be bounded if there is $M$ such that for every $\mu \in K$, $\int_{\mathbb{R}} |x|^2\,d\mu(x) \leq M$. A function $\phi : \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ is said to have a linear functional derivative if there exists a function $\frac{\delta\phi}{\delta m}$, jointly continuous in $(m,v)$, such that for any bounded subset $K$ of $\mathcal{P}_2(\mathbb{R})$, the function $v \mapsto \frac{\delta\phi}{\delta m}(m)(v)$ is at most of quadratic growth in $v$ uniformly in $m$ for $m \in K$, and such that for all $m, m' \in \mathcal{P}_2(\mathbb{R})$, Note that $\frac{\delta\phi}{\delta m}$ is uniquely defined up to an additive constant only.
Link between both derivatives.
Proposition 41. Let $\phi : \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ be L-differentiable on $\mathcal{P}_2(\mathbb{R})$, such that the Fréchet derivative of its lifted function $D\phi : L_2(\Omega) \to L_2(\Omega)$ is uniformly Lipschitz-continuous. Assume also that for each $\mu \in \mathcal{P}_2(\mathbb{R})$, there is a version Then $\phi$ has a linear functional derivative and for every $\mu \in \mathcal{P}_2(\mathbb{R})$,

A.4 Functions of probability measures on the torus
We use in this paper some well-known properties of the L-derivative, which are recalled in this paragraph. Moreover, since we work in the particular case of probability measures on the torus, the L-derivative is naturally periodic, see Proposition 43 and the results that follow it. Finally, we prove Proposition 13.
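Then, consider a general $\mu \in \mathcal{P}_2(\mathbb{R})$. Let $Z$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ independent of $X$ with normal distribution $\mathcal{N}(0,1)$ and let $(a_n)_{n\in\mathbb{N}}$ be a sequence such that for all $n \in \mathbb{N}$, $a_n \in (0,1)$ and $a_n \to 0$ as $n \to +\infty$. For every $n \in \mathbb{N}$, the support of the distribution $\mathcal{L}(X + a_n Z)$ is equal to $\mathbb{R}$. Thus for every $n \in \mathbb{N}$, $v \mapsto \partial_\mu \phi(\mathcal{L}(X + a_n Z))(v)$ is $2\pi$-periodic.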
Since $(a_n)_{n\in\mathbb{N}}$ is bounded by 1 and $X \in L_2(\Omega)$, there exists $C > 0$ such that for every $n \in \mathbb{N}$ and for every $v \in [0,2\pi]$ Recall that $v \mapsto \partial_\mu \phi(\mathcal{L}(X + a_n Z))(v)$ is $2\pi$-periodic for every $n \in \mathbb{N}$. Thus the sequence $(\partial_\mu \phi(\mathcal{L}(X + a_n Z)))_{n\in\mathbb{N}}$ is uniformly bounded on $\mathbb{R}$. By the Arzelà-Ascoli Theorem, up to extracting a subsequence, $(\partial_\mu \phi(\mathcal{L}(X + a_n Z)))_{n\in\mathbb{N}}$ converges uniformly to a limit $u : \mathbb{R} \to \mathbb{R}$. In particular, $u$ is a $2\pi$-periodic function. Moreover, we prove that the following quantity tends to zero. Let $Y \in L_2(\Omega)$.
Corollary 44. Let $\phi : \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ be a function satisfying the $\phi$-assumptions. Let $\mu \in \mathcal{P}_2(\mathbb{R})$ and assume that $\mu$ has a density belonging to $\mathcal{P}^+$ in the sense of Definition 37. Then there is a unique $2\pi$-periodic and continuous version of $\partial_\mu \phi(\mu)(\cdot)$. Furthermore, for every $v \in [0,2\pi]$,
Proof. Let $X \in L_2(\Omega)$ with distribution $\mu$. Then the law of $\{X\}$ is $\mu$, seen as an element of $\mathcal{P}_2(\mathbb{R})$ with support included in $[0,2\pi]$. Since the density of $\mu$ belongs to $\mathcal{P}^+$, the support of $\mu$ is equal to $[0,2\pi]$.
In the light of Proposition 41, we prove that the linear functional derivative is also $2\pi$-periodic:
Proposition 45. Let $\phi : \mathcal{P}_2(\mathbb{R}) \to \mathbb{R}$ be a function satisfying the $\phi$-assumptions. Let $\mu \in \mathcal{P}_2(\mathbb{R})$ be such that $\mu$ has a density belonging to $\mathcal{P}^+$ in the sense of Definition 37. Then $\int_{\mathbb{T}} \partial_\mu \phi(\mu)(v)\,dv = 0$. In other words, $v \mapsto \frac{\delta\phi}{\delta m}(\mu)(v)$ is $2\pi$-periodic.
Proof. By Corollary 44, it is sufficient to prove that $\int_{\mathbb{T}} \partial_\mu \phi(\mu)(v)\,dv = 0$. Let $Y_0$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ with distribution equal to $\mu$. Let $p : \mathbb{R} \to \mathbb{R}$ denote its density, extended by $2\pi$-periodicity. By assumption, $p(v) > 0$ for every $v \in [0,2\pi]$, hence it holds for every $v \in \mathbb{R}$.
Define the following ordinary differential equation: $\frac{dY_t}{dt} = \frac{1}{p(Y_t)}$, with initial condition $Y_0$. Denoting by $F : x \mapsto \int_0^x p(v)\,dv$ and by $g = F^{-1}$ respectively the c.d.f. and the quantile function associated to $p$, we have $\frac{d}{dt} F(Y_t) = p(Y_t) \frac{dY_t}{dt} = 1$. Thus for every $t \geq 0$, $F(Y_t) = F(Y_0) + t$ and $Y_t = g(F(Y_0) + t) = g_t(F(Y_0))$, where $g_t(\cdot) = g(\cdot + t)$.
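The identity $F(Y_t) = F(Y_0) + t$ can be checked numerically. The sketch below assumes that the differential equation reads $\frac{dY_t}{dt} = \frac{1}{p(Y_t)}$, which is the form forced by $\frac{d}{dt} F(Y_t) = 1$ and $F' = p$. The density $p(x) = (1 + \frac{1}{2}\cos x)/(2\pi)$, the names `p`, `F` and `flow`, and the Runge-Kutta integrator are illustrative assumptions that do not come from the paper.

```python
import math

TWO_PI = 2.0 * math.pi


def p(x):
    """Hypothetical positive, 2*pi-periodic density with integral 1 over a period."""
    return (1.0 + 0.5 * math.cos(x)) / TWO_PI


def F(x):
    """Closed-form c.d.f. of p, normalised by F(0) = 0."""
    return (x + 0.5 * math.sin(x)) / TWO_PI


def flow(y0, t, n_steps=2000):
    """Integrate dY/dt = 1 / p(Y) from time 0 to time t with classical RK4 steps."""
    dt = t / n_steps
    y = y0
    for _ in range(n_steps):
        k1 = 1.0 / p(y)
        k2 = 1.0 / p(y + 0.5 * dt * k1)
        k3 = 1.0 / p(y + 0.5 * dt * k2)
        k4 = 1.0 / p(y + dt * k3)
        y += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


if __name__ == "__main__":
    y0 = 0.7
    print("F(Y_t) - F(Y_0) - t, t = 0.4 :", F(flow(y0, 0.4)) - F(y0) - 0.4)
    print("Y_1 - Y_0 - 2*pi             :", flow(y0, 1.0) - y0 - TWO_PI)
```

Both printed quantities should be close to zero. The second one reflects the fact that after one unit of time the flow has moved every point by exactly one period, which is why the law on the torus is unchanged.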
Fix $t \geq 0$. Since $F(Y_0)$ has a uniform distribution on $[0,1]$, $Y_t = g_t(F(Y_0))$ implies that $g_t$ is the quantile function of the random variable $Y_t$. According to Definition 2, $g_t \sim g$, thus we deduce that the law of $\{Y_t\}$ is $\mu$.
Since $\phi$ is $\mathbb{T}$-stable, it implies that $\mathbb{P}^W$-almost surely, $\phi(Z^X_t) = \phi(Z^Y_t)$. By Definition 8, $P_t\phi(\mu)$ Thus $P_t\phi$ is $\mathbb{T}$-stable. By Definition 8, it is clear that $P_t\phi$ is bounded on $\mathcal{P}_2(\mathbb{R})$, because $\phi$ is bounded. Furthermore, $P_t\phi$ is continuous on $\mathcal{P}_2(\mathbb{R})$, and even Lipschitz-continuous. Indeed, let $\mu, \nu \in \mathcal{P}_2(\mathbb{R})$ and let $X, Y \in L_2[0,1]$ be the quantile functions respectively associated with $\mu$ and $\nu$: $\mu = \mathcal{L}_{[0,1]}(X)$ and $\nu = \mathcal{L}_{[0,1]}(Y)$; in other words, $X$ (resp. $Y$) is the increasing rearrangement of $\mu$ (resp. $\nu$). A classical result in optimal transportation (see e.g. [Vil03, Theorem 2.18]) states that $(X,Y)$ realises the optimal coupling in the definition of the $L_2$-Wasserstein distance: $W_2(\mu,\nu)^2 = \int_0^1 |X(u) - Y(u)|^2\,du$.
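The quantile formula for the $L_2$-Wasserstein distance can be illustrated on empirical measures. The sketch below is a hedged example rather than part of the argument: the function names and the sample values are hypothetical. For two measures with the same number $n$ of equally weighted atoms, the quantile functions are step functions taking the sorted atoms as values, so the integral over $[0,1]$ reduces to a mean over sorted pairs. The result is compared with a brute-force minimum over all permutation couplings, which attains the optimal transport cost in this equal-weight setting.

```python
import itertools


def w2_squared_quantile(xs, ys):
    """Squared W_2 distance between two uniform empirical measures via sorted atoms."""
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same number of atoms")
    n = len(xs)
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n


def w2_squared_bruteforce(xs, ys):
    """Minimum quadratic transport cost over all permutation couplings (small n only)."""
    n = len(xs)
    return min(
        sum((x - ys[j]) ** 2 for x, j in zip(xs, perm)) / n
        for perm in itertools.permutations(range(n))
    )


if __name__ == "__main__":
    xs = [0.3, -1.2, 2.5, 0.9, 1.7]  # illustrative atoms only
    ys = [1.1, 0.0, -0.4, 3.2, 0.6]
    print(w2_squared_quantile(xs, ys), w2_squared_bruteforce(xs, ys))
```

The two printed values should agree up to rounding, which is the discrete counterpart of the statement that the pair of quantile functions realises the optimal coupling.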
Assumption ($\phi$3): Let us prove that for every $X_1, X_2, Y \in L_2[0,1]$, By formula (10), $|DP_t\phi(X_1) \cdot Y - DP_t\phi(X_2) \cdot Y| \leq |D_3| + |D_4|$, where Up to replacing $X$ and $X + \lambda Y$ by $X_1$ and $X_2$, $D_3$ and $D_4$ are analogous to $D_1$ and $D_2$. Thus we get by the same computations as for $D_1$ and $D_2$: This completes the proofs of (86) and of the proposition.