Distribution dependent SDEs driven by additive fractional Brownian motion

We study distribution dependent stochastic differential equations with irregular, possibly distributional drift, driven by an additive fractional Brownian motion of Hurst parameter $H \in (0,1)$. We establish strong well-posedness under a variety of assumptions on the drift; these include the choice
$$B(\cdot,\mu) = (f*\mu)(\cdot) + g(\cdot), \quad f,\, g \in B^\alpha_{\infty,\infty}, \quad \alpha > 1 - \frac{1}{2H},$$
thus extending the results by Catellier and Gubinelli (Stochast Process Appl 126(8):2323–2366, 2016) to the distribution dependent case. The proofs rely on some novel stability estimates for singular SDEs driven by fractional Brownian motion and the use of Wasserstein distances.


Introduction
In this work we consider a distribution dependent SDE (henceforth DDSDE) of the form
$$X_t = \xi + \int_0^t B_s(X_s, \mathcal{L}(X_s))\, ds + W_t, \qquad (1.1)$$
where B : R_+ × R^d × P(R^d) → R^d, ξ is an R^d-valued random variable and W is an R^d-valued stochastic process independent of ξ. The drift B and the law of (ξ, W) are prescribed, while the process X is the unknown and L(X_t) denotes the law of its marginal at time t. Usually in the literature W is sampled as a standard Brownian motion; in this case the DDSDE is also called a McKean-Vlasov SDE, after the pioneering work [32] where it was first introduced.
The importance of McKean-Vlasov equations is due to their connection to systems of N particles subject to a mean field interaction of the form
$$X^{i,N}_t = \xi^i + \int_0^t B_s\big(X^{i,N}_s, L^N_{X^{(N)}_s}\big)\, ds + W^i_t, \qquad i = 1, \dots, N, \qquad (1.2)$$
where (ξ^i, W^i) are typically taken to be i.i.d. copies of (ξ, W) and L^N_{X^{(N)}_t} stands for the empirical measure of the system at time t. One expects the DDSDE (1.1) to be the mean field limit of (1.2) in the sense that, as N goes to infinity, L^N_{X^{(N)}_t} converges weakly to L(X_t) with probability 1.
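To fix ideas, the mean field interaction above can be simulated directly; the following is a minimal sketch, assuming d = 1, a pointwise-defined Lipschitz kernel and, for simplicity, Brownian noise W (H = 1/2). All names (`particle_system`, `kernel`) are ours, not from the paper.

```python
import numpy as np

def particle_system(kernel, N, n_steps, T=1.0, seed=0):
    """Euler scheme for dX^i = (kernel * mu^N)(X^i) dt + dW^i, where
    mu^N is the empirical measure of the N particles."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = rng.standard_normal(N)  # i.i.d. initial conditions xi^i ~ N(0, 1)
    for _ in range(n_steps):
        # (kernel * mu^N)(X^i) = (1/N) sum_j kernel(X^i - X^j)
        drift = kernel(X[:, None] - X[None, :]).mean(axis=1)
        X = X + drift * dt + np.sqrt(dt) * rng.standard_normal(N)
    return X
```

With the linear attraction kernel(z) = −z, each particle is pulled toward the empirical mean, a classical toy mean field model.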
Another feature of DDSDEs in the Brownian noise case is their connection to nonlinear Fokker-Planck PDEs (also called McKean-Vlasov equations) of the form
$$\partial_t \rho_t = \tfrac{1}{2}\Delta \rho_t - \nabla\cdot\big( B_t(\cdot, \rho_t)\, \rho_t \big), \qquad (1.3)$$
which describe the evolution of the marginal ρ_t = L(X_t); in particular, both (1.1) and (1.3) provide a macroscopic, compact description of the system (1.2), allowing one to reduce its complexity. For this reason, DDSDEs have found applications in numerous fields, see the review [26] and the references therein; let us also mention their connection to mean-field games [30].
Classical results concerning the well-posedness of the DDSDE (1.1) and the mean-field limit property go back to Sznitman [42] and Gärtner [20]; in recent years the field has witnessed substantial contributions from both the analytic and probabilistic communities. On the one hand, new methods based on entropy inequalities [14,27,6] and modulated energy methods [39,40] have allowed for the rigorous derivation of mean field limits for fairly singular B; on the other, DDSDEs with irregular drifts are related to the flourishing field of regularization by noise phenomena. The latter topic was initiated by Zvonkin [47] and Veretennikov [45] in the case of standard SDEs, see [13] for a general overview; recently many authors have applied similar techniques in the DDSDE case, see for instance [5,33,38,11,24].
Contrary to the previously mentioned works, here we will study DDSDEs in which W is sampled as a fractional Brownian motion (fBm for short) of Hurst parameter H ∈ (0, 1). Our main reasons for doing so are the following:
1. It was shown in [10], revisiting the ideas of Tanaka [43], that for Lipschitz B the mean-field limit of (1.2) to (1.1) holds for any choice of the process W, regardless of it being Markov or a semimartingale. In particular the DDSDE has a physical meaning and still provides a compact description of a much more complex system of interacting particles.
2. Several regularization by noise results for standard SDEs are available for W sampled as an fBm (or similar fractional processes), see [35,9,31,1,3] for a short selection.
In light of Point 2. above, it is natural to expect similar results to hold for DDSDEs with singular (possibly even distributional in space) drifts and W sampled as an fBm; by Point 1., they are relevant in the study of particle systems with singular interactions (for instance with a discontinuity at the origin, as typical of Coulomb and Riesz-type potentials).
Let us mention that there is a certain degree of arbitrariness in choosing W to be sampled as an fBm, as one could consider other non-Markovian, non-martingale processes. We believe our choice to be simple enough while at the same time representing what one might expect for a larger class of processes (e.g. Gaussian processes satisfying a local non-determinism condition). In this sense, this work also serves as a comparison to the results from [15], where we explored in detail the DDSDE (1.1) in the opposite regime where no assumption whatsoever is imposed on W, thus no regularization can be observed.
Despite the above motivations, singular DDSDEs driven by fBm (or similar fractional processes) have so far not received the same attention as their Brownian counterparts; to the best of our knowledge, the only other work treating this kind of equation is [4].
One possible reason for this is the substantial new difficulties presented by such equations: fBm with parameter H ≠ 1/2 is neither a Markov process nor a semimartingale, so techniques based on Itô calculus are not applicable. This includes in particular the connection to parabolic semigroups, the martingale problem formulation and the use of the Zvonkin transform (or Itô-Tanaka trick), all techniques used extensively in the aforementioned works in the Brownian case. It also prevents the use of standard arguments, which typically rely on establishing uniqueness of the law ρ_t = L(X_t) through PDE analysis of (1.3), and then fixing the law in the DDSDE and treating it as a standard SDE.
Treating DDSDEs driven by fBm thus requires a novel set of tools and ideas; our strategy in this paper builds on the work of Catellier and Gubinelli [9], which represented a major breakthrough in the study of standard SDEs driven by fBm of the form
$$X_t = \xi + \int_0^t b_s(X_s)\, ds + W_t. \qquad (1.4)$$
Therein the authors develop a pathwise approach to the equation, based on nonlinear Young integrals and Girsanov transform, that allows one to give meaning to (1.4) and establish its path-by-path uniqueness, for drifts b of poor regularity, possibly even distributional. Their results and techniques have been revisited in subsequent works [17,22,19,21]; in general it suffices to require, roughly,
$$b \in L^q_T B^\alpha_{\infty,\infty} \quad \text{with} \quad \alpha > 1 - \frac{1}{2H} \qquad (1.5)$$
(together with an additional time regularity requirement in the case H > 1/2), see for instance Theorem 15 and Corollary 2 from [17]. Here B^α_{∞,∞} denote Besov-Hölder spaces; see Section 1.1 below for the relevant definitions and notations in use throughout the article.
For the sake of exposition, let us ignore for the moment the additional time regularity required in (1.5) in the case H > 1/2, since it is mostly of a technical nature; then condition (1.5) roughly amounts to the drift b enjoying a spatial regularity B^α_{∞,∞} with α > 1 − 1/(2H). Observe that for all H ∈ (0, 1) this includes values α < 1/2, while for H < 1/2 we are even allowed to take α < 0, namely distributional b. To the best of our knowledge, no work after [9] has improved on the allowed range of α.
With the above theory at hand, we can interpret the DDSDE (1.1) by rewriting it as the SDE (1.4) with drift b^µ_t := B_t(·, µ_t), where µ_t = L(X_t); namely, X solves the SDE with drift b^µ, in the Catellier-Gubinelli sense, where b^µ depends in a nontrivial way on the law of X itself. This interpretation comes with a natural fixed point formulation: given a process X, we can associate to it a "flow of measures" µ_t = L(X_t) and a drift b^µ_t := B_t(·, µ_t), then solve the associated SDE, which gives a new process Y = I(X); thus X is a solution to (1.1) if and only if it is a fixed point for I.
Alternatively, one could start with the flow of measures µ_• = {µ_t}_{t∈[0,T]} and set up the fixed point procedure for this object, by defining J(µ_•)_t = L(X_t) for X the solution to the SDE with drift b^µ. These two interpretations are in fact equivalent: once µ_• is completely determined, the DDSDE reduces to a standard SDE with fixed drift b^µ, to which the previous results can be applied; see Lemma 4.4 for more details. Throughout the article we will exploit both interpretations whenever useful.
Given the above interpretation, we need two main ingredients to develop a solution theory:
1. Firstly, B must have the properties that b^µ satisfies (1.5) for any µ_• of interest and that the solution-to-drift map X → µ_• → b^µ is Lipschitz in a suitable topology.
2. Secondly, we must develop stability estimates for the drift-to-solution map b → Y, in an appropriate topology that complements the stability of µ → b^µ.
Once these points are established, the contractivity of the overall map X → b^µ → I(X) follows.
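Numerically, the fixed point map I above can be mimicked by freezing the flow of measures, solving the frozen-drift equation by an Euler scheme, and iterating. The following is a sketch under strong simplifying assumptions (d = 1, Brownian noise in place of fBm, a smooth interaction kernel f, each marginal represented by a particle cloud); all names are ours.

```python
import numpy as np

def picard_iteration(f, x0, n_steps, T=1.0, n_iter=4, seed=0):
    """Iterate mu -> b^mu -> X -> L(X): each marginal mu_t is stored as a
    particle cloud, the frozen drift is b^mu_t(x) = (f * mu_t)(x), and the
    same noise increments are reused at every pass."""
    rng = np.random.default_rng(seed)
    N, dt = len(x0), T / n_steps
    dW = np.sqrt(dt) * rng.standard_normal((n_steps, N))
    clouds = np.tile(x0, (n_steps + 1, 1))  # initial guess: constant flow
    for _ in range(n_iter):
        X, path = x0.copy(), [x0.copy()]
        for k in range(n_steps):
            # frozen drift (f * mu_t)(X^i), mu_t = empirical cloud at step k
            b = f(X[:, None] - clouds[k][None, :]).mean(axis=1)
            X = X + b * dt + dW[k]
            path.append(X.copy())
        clouds = np.array(path)  # updated flow of measures L(X_t)
    return clouds
```

Each pass of the outer loop is one application of the map I; a fixed point of the iteration plays the role of a solution to the DDSDE.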
There are however major problems with the program outlined above; to describe them without too many technicalities, let us consider here the most relevant case B(µ) = f ∗ µ + g for time homogeneous f, g ∈ B^α_{∞,∞}, α > 1 − 1/(2H). In this case, the map µ → b^µ is naturally Lipschitz in the total variation topology, in the sense that
$$\| b^{\mu^1}_t - b^{\mu^2}_t \|_{B^\alpha_{\infty,\infty}} \lesssim \| \mu^1_t - \mu^2_t \|_{TV};$$
however, due to the lack of an underlying parabolic PDE (1.3) (and the associated maximum principle) in the fBm setting, it is not obvious how to control the drift-to-solution map b → Y in this topology, i.e. how to bound ‖L(Y^1_t) − L(Y^2_t)‖_{TV} as a function of ‖b^1 − b^2‖_{B^α_{∞,∞}}. One of the main intuitions of the current work, which allows us to overcome this difficulty, is the understanding that although the regularity B^α_{∞,∞} is needed in order to solve the SDE (1.4), one may establish stability estimates in the weaker norm B^{α−1}_{∞,∞}. Roughly speaking, given two solutions X^1, X^2 to (1.4) associated to different initial data and drifts (ξ^i, b^i), for any p ∈ [1, ∞) we have
$$\mathbb{E}\Big[ \sup_{t\in[0,T]} |X^1_t - X^2_t|^p \Big]^{1/p} \lesssim \mathbb{E}\big[ |\xi^1 - \xi^2|^p \big]^{1/p} + \| b^1 - b^2 \|_{B^{\alpha-1}_{\infty,\infty}}, \qquad (1.6)$$
see Theorem 3.13 and Corollary 3.17 for the rigorous statements. This property is a natural analogue of standard ODE theory, where solvability requires b Lipschitz, but stability estimates are in the supremum norm.
In our setting, it implies that B only needs to enjoy a multiscale regularity of the form
$$\| B_t(\cdot, \mu^1) - B_t(\cdot, \mu^2) \|_{B^{\alpha-1}_{\infty,\infty}} \lesssim d(\mu^1, \mu^2)$$
for another notion of distance d(µ^1, µ^2), possibly different from the total variation one. The right choice for d turns out to be the family of p-Wasserstein distances d_p(µ^1, µ^2), which complements the bound (1.6) thanks to the basic property
$$d_p\big( L(X^1_t), L(X^2_t) \big) \le \mathbb{E}\big[ |X^1_t - X^2_t|^p \big]^{1/p}.$$
For the sake of this preliminary discussion we have ignored the time regularity requirement in (1.5), but it does indeed play a relevant role, making the proofs a bit more technical and requiring us to treat the cases H > 1/2 and H ≤ 1/2 slightly differently; see Section 4 for more details.
Let us stress that, since we are not allowed to use the same tools as in the Brownian setting, our results are not optimal for the choice H = 1/2, sharper ones being available for instance in [38,24]. Nevertheless, they still provide some new insights, with the stability estimate (1.6) being new in this setting as well. This also partially answers the ongoing debate from [38,24,25] on whether the drift B should be taken Lipschitz continuous in the measure argument µ w.r.t. the total variation distance, the Wasserstein one or a weighted mix of the two: the use of the Wasserstein distance allows the drift to be Lipschitz continuous in the different regularity scale B^{α−1}_{∞,∞}, which is strictly negative in the regime α ∈ (0, 1), admissible in (1.5).
A major open problem coming from this work is the mean-field convergence (and associated propagation of chaos property) of the particle system (1.2) to (1.1), for the class of singular drifts for which we establish well-posedness of the DDSDE in Theorem 2.4. Our techniques are currently not enough to give a full answer; recently, several authors have investigated the Brownian setting using alternative tools based on Girsanov theorem and Large Deviations, see [29,28,44,23]. Contrary to Itô calculus, these tools are available for fBm as well, thus we hope they may be of help in future investigations.
Another interesting question posed by the current work is whether our results can be further improved, in the sense of allowing values of α < 1 − 1/(2H), at least in some special cases. Theorems 2.6 and 2.7 suggest an affirmative answer for convolutional drifts B(µ) = b ∗ µ, see also the discussion at the beginning of Section 5; this is in analogy with the Brownian case, where standard SDE theory requires roughly b ∈ L^∞_x, but the nonlinear PDE (1.3) can be solved for considerably rougher b.
We conclude this introduction with the structure of the paper. In Section 1.1 we introduce all relevant notations adopted in the paper and recall some well-known facts. Section 2 contains all our main results and Section 2.1 relevant examples of drifts B satisfying them. We present in detail the Catellier-Gubinelli theory of SDEs driven by fBm in Section 3, where we prove our main stability results (Theorem 3.13 and Corollary 3.17 from Section 3.3) as well as some new auxiliary results on the regularity of the law of solutions (Section 3.4). Sections 4 and 5 contain the proofs of our main results, respectively Theorems 2.4, 2.5, 2.6 and 2.7. Finally, we have included in Appendix A a collection of useful analytic lemmas used throughout the paper.
1.1. Notations, conventions and well-known facts. Throughout the article we will always work on a finite time interval [0, T], although arbitrarily large; we will never deal with estimates on the infinite interval [0, +∞). We write a ≲ b whenever there exists a constant C > 0 such that a ≤ C b. To stress the dependence C = C(λ) on a particular parameter λ, we will write a ≲_λ b. For p ∈ [1, ∞] and where it will not cause confusion, we write p′ to denote the dual exponent to p, that is, 1/p + 1/p′ = 1.
Throughout the article, whenever not mentioned explicitly, we will consider an underlying probability space (Ω, F, P); any σ-algebra appearing is assumed to be P-complete. If Ω has a topological structure, then B(Ω) denotes its Borel σ-algebra (again up to P-completion).
We denote by E^P, or simply E, expectation w.r.t. P. Given a Banach space E and p ∈ [1, ∞], we will frequently consider E-valued random variables X in the space L^p_Ω E = L^p(Ω; E), with norm ‖X‖_{L^p_Ω E} = E[‖X‖_E^p]^{1/p} (essential supremum for p = ∞). We denote by L_P(X), or simply L(X), the law of X on E, namely the pushforward measure P ∘ X^{−1} = X♯P; more generally, we adopt the notation F♯µ for the pushforward of a measure µ under a measurable map F. Given a measure µ ∈ P(C_T), we mention in particular the pushforward µ_t := e_t♯µ, where e_t(h) = h_t denotes the evaluation map at time t. If (E, ‖·‖_E) is a Banach space, then C_T E and C^γ_T E are Banach spaces with norms
$$\| f \|_{C_T E} = \sup_{t\in[0,T]} \| f_t \|_E, \qquad \| f \|_{C^\gamma_T E} = \| f \|_{C_T E} + \sup_{s\ne t} \frac{\| f_t - f_s \|_E}{|t-s|^\gamma}.$$
In the case E = R^n for some n ∈ N, whenever it doesn't create confusion we will simply write C_T and C^γ_T. Given a Banach space E and q ∈ [1, ∞], we denote by L^q_T E = L^q(0, T; E) the Bochner-Lebesgue space of strongly measurable f : [0, T] → E with finite norm ‖f‖_{L^q_T E} = ( ∫_0^T ‖f_t‖_E^q dt )^{1/q}, with the usual modification for q = ∞; as before we write L^q_T for L^q_T R^n.
1.1.2. Function spaces. We let C^∞_c = C^∞_c(R^d; R^m) and C^n_b = C^n_b(R^d; R^m) denote respectively compactly supported smooth functions and n-times differentiable functions with continuous, bounded derivatives up to order n; S = S(R^d; R^m) denote Schwartz functions, S′ their dual. Given f, we denote by Df its Jacobian, i.e. the collection of first order derivatives (∂_j f^i)_{i,j}, possibly interpreted in the distributional sense. For α ∈ (0, 1), C^α_x = C^α(R^d; R^m) stands for the Banach space of bounded, Hölder continuous functions; for α > 1, α ∉ N, C^α_x denotes the space of functions which are ⌊α⌋-times differentiable and whose derivatives of order ⌊α⌋ belong to C^{α−⌊α⌋}_x, where ⌊α⌋ denotes the integer part of α. We denote by B^α_{p,q} = B^α_{p,q}(R^d; R^m) the Besov spaces, defined through the Littlewood-Paley blocks ∆_n associated to a partition of the unity. We refer to the monograph [2] for details on Besov spaces; throughout the paper we will frequently employ their properties, like Besov embeddings, Bernstein estimates for ∆_n f or the regularity of f ∗ g for f, g in different Besov spaces. For α ∈ R_+ \ N, the spaces C^α_x and B^α_{∞,∞} coincide; however for clarity we will continue to write C^α for α ≥ 0 and B^α_{∞,∞} otherwise. The notations from this section and the previous one can be combined to define C^γ_T C^α_x, L^q_T B^α_{p,p}, etc.; similarly, we define C^γ_T C^α_{loc} to be the vector space of all f which belong to the corresponding space locally in the spatial variable. Given a function f of time and space, Df always denotes its Jacobian in the space variable only.
1.1.3. Probability measures and Wasserstein distance. Given a separable Banach space E, we denote by P(E) the set of probability measures over E; we write µ_n ⇀ µ for weak convergence of measures, in the sense of testing against continuous bounded functions.
Given µ, ν ∈ P(E), Π(µ, ν) stands for the set of all possible couplings of (µ, ν), i.e. the subset of P(E × E) with first and second marginals given respectively by µ and ν. For any p ∈ [1, ∞), we define
$$d_p(\mu, \nu) := \inf_{m \in \Pi(\mu,\nu)} \Big( \int_{E\times E} \| x - y \|_E^p \, m(dx, dy) \Big)^{1/p},$$
which is a well defined quantity (possibly taking value +∞). By [46, Theorem 4.1], an optimal coupling m ∈ Π(µ, ν) realizing the above infimum always exists. Similarly we define P_p(E) to be the set of p-integrable probability measures; that is, µ ∈ P_p(E) if µ ∈ P(E) and ∫_E ‖x‖_E^p µ(dx) < ∞. It is well known that d_p(µ, ν) < ∞ for µ, ν ∈ P_p(E) and that (P_p(E), d_p) is a complete metric space, usually referred to as the p-Wasserstein space on E; let us stress however that our definition of d_p(µ, ν) holds for all µ, ν ∈ P(E). We recall that, given a sequence {µ_n}_n ⊂ P_p(E), d_p(µ_n, µ) → 0 is equivalent to µ_n ⇀ µ weakly together with convergence of the p-th moments, see [46, Theorem 6.9]. Given µ ∈ P(R^d), with a slight abuse of notation we will write µ ∈ L^q(R^d) (or simply L^q_x) for q ∈ [1, ∞] to indicate that µ admits a density µ(dx) = ρ(x) dx with respect to the d-dimensional Lebesgue measure, such that ρ ∈ L^q_x.
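On E = R the infimum defining d_p can be computed exactly for empirical measures with equally many atoms, since in one dimension the monotone (sorted) coupling is optimal; a small illustrative sketch (the function name is ours):

```python
import numpy as np

def wasserstein_1d(xs, ys, p=1):
    """d_p between the empirical measures of xs and ys (same size):
    in 1-d the optimal coupling pairs order statistics."""
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))
```

Note that p → d_p is nondecreasing (by Jensen's inequality), consistently with the inclusion P_p ⊂ P_1 for p ≥ 1.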
1.1.4. Fractional Brownian motion. A real valued continuous process {W_t, t ∈ [0, T]} is a fractional Brownian motion (fBm) with Hurst parameter H ∈ (0, 1) if it is a centered Gaussian process with covariance function
$$\mathbb{E}[W_t W_s] = \frac{1}{2}\big( t^{2H} + s^{2H} - |t - s|^{2H} \big);$$
an R^d-valued process W is a d-dimensional fBm if its components are independent 1-dimensional fBms. All the results we are going to recall here are classical and can be found in [34,37].
For H = 1/2, fBm corresponds to classical Brownian motion (Bm), but for H ≠ 1/2 it is neither a semimartingale nor a Markov process; its trajectories are P-a.s. in C^{H−ε}_T for any ε > 0. Given an fBm W of parameter H on a probability space (Ω, F, P), it is always possible to construct a standard Bm B on it such that the following canonical representation holds:
$$W_t = \int_0^t K_H(t, s)\, dB_s,$$
where K_H is a Volterra-type kernel and B and W generate the same filtration. Given a filtration {F_t}_{t∈[0,T]}, we say that W is an F_t-fBm if the associated B is an F_t-Bm in the classical sense.
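Since the covariance function determines the law, a path of fBm on a grid can be sampled exactly by Cholesky factorisation of the covariance matrix; a minimal sketch (function names ours; O(n^3) cost, so not the fastest method, but transparent):

```python
import numpy as np

def fbm_cov(t, s, H):
    """Covariance E[W_t W_s] = (t^{2H} + s^{2H} - |t - s|^{2H}) / 2."""
    return 0.5 * (t ** (2 * H) + s ** (2 * H) - abs(t - s) ** (2 * H))

def fbm_path(H, n, T=1.0, seed=0):
    """Sample a 1-d fBm at times k*T/n, k = 0..n (exact in law)."""
    t = np.linspace(T / n, T, n)
    cov = np.array([[fbm_cov(a, b, H) for b in t] for a in t])
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # tiny jitter
    rng = np.random.default_rng(seed)
    W = L @ rng.standard_normal(n)
    return np.concatenate([[0.0], t]), np.concatenate([[0.0], W])
```

For H = 1/2 the covariance reduces to E[W_t W_s] = min(t, s), recovering Brownian motion.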
Closely related to the canonical representation are a version of the Girsanov theorem for fBm (see e.g. [35, Theorem 2]) and the strong local non-determinism (LND) of fBm: for any H ∈ (0, 1) there exists c_H > 0 such that
$$\mathrm{Var}\big( W_t \mid W_r,\, r \le s \big) \ge c_H\, |t - s|^{2H} \qquad \text{for all } 0 \le s < t \le T.$$
The LND property plays a key role in establishing the regularising features of W, cf. [22,18].

Main results
Let us recall that the focus here is an abstract DDSDE of the form
$$X_t = \xi + \int_0^t B_s(X_s, \mathcal{L}(X_s))\, ds + W_t, \qquad (2.1)$$
where L(ξ) = µ_0, ξ is independent of W and W is sampled as an fBm of parameter H ∈ (0, 1). We want to identify general conditions on measurable drifts B under which we can develop a solution theory for (2.1). As explained in the introduction, our strategy consists in setting up a fixed point for µ → b^µ_t := B_t(µ_t) → X → µ̃_t := L(X_t). To this end, the assumptions on B should enforce two facts: for any flow of measures µ ∈ C_T P_p, the associated drift b^µ_t := B_t(µ_t) is regular enough to solve (1.4), namely b^µ must satisfy condition (1.5); and the map µ → b^µ should be stable in suitable topologies. Last but not least, the eligible B should include cases of particular interest (most notably B(µ) = b ∗ µ), see Section 2.1 below.
Corresponding to the above requirements, for H > 1/2 we define the following space:
Definition 2.1. For α, β ∈ (0, 1) and p ∈ [1, ∞), let H^{β,α}_p denote the class of continuous functions B : [0, T] × R^d × P_p(R^d) → R^d satisfying the following condition: there exists C > 0 such that:
i. for all (t, x, µ) ... ;
iii. for all t ∈ [0, T] and µ, ν ... .
Whenever it does not create confusion, we will simply denote by ‖B‖ the optimal constant C.
Corresponding to the above requirements, for H ≤ 1/2 we define the following space:
Definition 2.2. For suitable α, p and q, let G^{q,α}_p denote the class of measurable maps B : [0, T] × P_p(R^d) → B^α_{∞,∞} satisfying the following condition: there exists h ∈ L^q_T such that:
i. for all (t, µ) ... ;
ii. for all t ∈ [0, T] and µ, ν ... d_p(µ, ν).
Whenever it does not create confusion, we will simply denote by ‖B‖ the optimal constant ‖h‖_{L^q_T}.
Remark 2.3. It is readily checked that for α̃ ≥ α, p̃ ≤ p and q̃ ≥ q we have G^{q̃,α̃}_{p̃} ⊂ G^{q,α}_p. Similarly, for α̃ ≥ α, β̃ ≥ β and p̃ ≤ p it holds H^{β̃,α̃}_{p̃} ⊂ H^{β,α}_p.
Roughly speaking, we say that X is a solution to the DDSDE (2.1) if, setting b^µ_t := B_t(L(X_t)), X is a solution to the standard SDE (1.4) associated to b^µ, interpreted in the Catellier-Gubinelli sense whenever b^µ is singular; the pathwise theory for singular SDEs will be recalled in detail in Section 3. All the concepts of strong existence, pathwise uniqueness and uniqueness in law for DDSDEs then follow from the standard ones, see Definition 4.2 from Section 4.2.
Our first main result is the well-posedness of DDSDE (2.1) under suitable conditions on B; it can be seen as an extension of [17, Theorem 15] to the distribution dependent case.
Theorem 2.4. Let H > 1/2 and let B ∈ H^{β,α}_p for parameters satisfying (2.2). Then for any µ_0 ∈ P_p(R^d), strong existence, pathwise uniqueness and uniqueness in law hold for the DDSDE (2.1).
Similarly, let H ≤ 1/2 and let B ∈ G^{q,α}_p for parameters satisfying (2.3). Then for any µ_0 ∈ P_p(R^d), strong existence, pathwise uniqueness and uniqueness in law hold for the DDSDE (2.1).
Given a DDSDE (2.1), we will consider either (ξ, B) or (µ_0, B) to be the data of the problem, where we recall that L(ξ) = µ_0. As already mentioned in the introduction, the solution X is entirely determined by the associated flow of measures µ ∈ C_T P_p given by µ_t = L(X_t): once this is known, the drift b^µ_t = B_t(µ_t) is determined as well and so we can reconstruct the strong solution X (or construct another copy of it on any probability space of interest). For this reason, it is quite useful to regard µ ∈ C_T P_p as itself a solution to the DDSDE; the exact equivalence between µ and X will be discussed rigorously in Lemma 4.4 from Section 4.2.
The next theorem provides stability estimates for the data-to-solution map (µ_0, B) → µ (respectively (ξ, B) → X), showing that it is locally Lipschitz.
Theorem 2.5. Let µ_0, ν_0 ∈ P_p for some p ∈ [1, ∞). Then the following holds:
i. For H > 1/2, let B^1, B^2 be drifts in H^{β,α}_p with parameters satisfying (2.2) and let M > 0 be a constant such that ‖B^i‖ ≤ M. Then there exists a constant C = C(α, H, T, M, p) such that, for any µ^i_0 ∈ P_p(R^d), the associated solutions µ^i ∈ C_T P_p satisfy the stated estimate. If X^1, X^2 are two associated solutions, in the sense of stochastic processes, defined on the same probability space, then there exists γ > 1/2 such that the corresponding pathwise estimate holds.
ii. For H ≤ 1/2, let B^1, B^2 be drifts in G^{q,α}_p with parameters satisfying (2.3) and let M > 0 be a constant such that ‖B^i‖ ≤ M. Then there exists a constant C = C(α, H, T, M, p, q) such that, for any µ^i_0 ∈ P_p(R^d), the associated solutions µ^i ∈ C_T P_p satisfy the stated estimate. If X^1, X^2 are two associated solutions, in the sense of stochastic processes, defined on the same probability space, then there exists γ > 1/2 such that the corresponding pathwise estimate holds.
As the settings of Theorems 2.4 and 2.5 are very general, they do not allow one to exploit any specific structure of the DDSDE in consideration to obtain sharper results. A prototypical example of such structure, which arises in many practical applications, is given by convolutional drifts B_t(x, µ) := (b_t ∗ µ)(x). The associated DDSDE takes the form
$$X_t = \xi + \int_0^t \big( b_s * \mathcal{L}(X_s) \big)(X_s)\, ds + W_t. \qquad (2.8)$$
As before, we allow the drift b to be distributional, at least of the form b ∈ L^1_T B^α_{p,p} for some α ∈ R, p ∈ [1, ∞]; at this stage pointwise evaluation of b_s ∗ L(X_s) is not meaningful, instead we again interpret the equation in the Catellier-Gubinelli sense.
The heuristic idea behind the next results is that we can use the convolutional structure in a recursive way: assuming we are given a solution X with sufficiently regular law L(X_•), this in turn leads to an improved regularity for the effective drift b_• ∗ L(X_•), compared to the original b. The argument can be made rigorous by establishing a priori estimates and working with smooth approximations; as a result, we are able to establish well-posedness for (2.8) in situations where the general Theorem 2.4 does not apply.
In both results we are going to present, we will need some additional regularity for the initial data µ_0, in the form of an integrability assumption. This is because, as explained in the introduction, the lack of an underlying parabolic PDE prevents us from proving a smoothing effect at strictly positive times analogous to that of parabolic equations; rather, in order to develop a priori estimates, we will show that such integrability is propagated by the dynamics.
The next result shows existence and uniqueness of solutions to (2.8) in a suitable class, under an additional condition on div b, which is by now quite standard since the pioneering work [12].
Theorem 2.6. Under the stated assumptions on b and div b, for any µ_0 ∈ L^r_x there exists a strong solution to (2.8), which satisfies the bound (2.9); moreover uniqueness holds, both pathwise and in law, in the class of solutions satisfying (2.9).
Our second result in the convolutional case is established under L^q_T L^p_x-type assumptions on b; here, instead of relying on a bound for div b, we exploit Girsanov-based arguments to establish integrability of L(X_t). This technique however only works in the regime (2.10).
Theorem 2.7. Assume (2.10) holds. Then for any b ∈ L^q_T L^p_x and µ_0 ∈ L^r_x, there exists a strong solution to (2.1), which satisfies the bound (2.11); moreover uniqueness holds, both pathwise and in law, in the class of solutions satisfying (2.11).
Remark 2.8. Condition (2.10) can be generalized in a way that allows values r ≤ d/(d − 1) and that applies for d = 1, see Theorem 5.9 in Section 5.2 for more details. We warn the reader not to interpret Theorems 2.6 and 2.7 as full pathwise uniqueness (resp. uniqueness in law) statements: in general they do not exclude the existence of irregular solutions X which do not satisfy condition (2.9) (resp. (2.11)). However, as the proofs show, any solution constructed as the limit of smooth drifts b^n → b does satisfy (2.9) (resp. (2.11)), thus it is the only physical solution to the DDSDE (2.1).

2.1. Examples. To illustrate the variety of situations to which Theorems 2.4 and 2.5 apply, we provide here several examples of functions contained in G^{q,α}_p and H^{β,α}_p.
Example 2.9. Let α ∈ R and let b be given such that b_t(·, y) ∈ B^α_{∞,∞} with suitable uniform bounds; for any µ ∈ P(R^d) define B_t(µ) := ∫_{R^d} b_t(·, y) µ(dy), where the integral is meaningful in the Bochner sense; then B ∈ G^{q,α}_p for any p ∈ [1, ∞). Indeed, by the hypothesis on b, the required bounds are readily checked: given µ, ν ∈ P(R^d), let m ∈ P(R^{2d}) be an optimal coupling for d_1(µ, ν); the Lipschitz estimate then follows by integrating against m.
Example 2.10. Suppose now that, for some C > 0, uniformly over s, t, x, x′, y, y′, b satisfies Hölder-type bounds in all of its arguments; we can identify b with the map b : [0, T] × R^d → C^α_x given by (t, y) → b_t(·, y). Assume additionally that the time regularity bound holds for the same constant C. Then B ∈ H^{β,α}_p for any p ∈ [1, ∞). The verification of Conditions i. and iii. of Definition 2.1 is identical to that of Example 2.9, so we only need to focus on Condition ii. for p = 1.
Given µ, ν ∈ P(R^d), let m be an optimal coupling for d_1(µ, ν); the desired estimate then follows, where in the last step we used Jensen's inequality and the optimality of m.
Indeed, the verification of Condition i. from Definition 2.2 is the same as in Example 2.9, with the corresponding choice of h. The verification of Conditions i. and ii. from Definition 2.1 follows from Example 2.10, as we can simply set b̃_t(x, y) := b_t(x − y) and apply the calculations therein to b̃. Condition iii. instead follows as above from an application of Lemma A.7.
Finally, let us point out that all the computations carry over to this case. Similarly, given b as in Example 2.10, with B defined as above, it is easy to verify that B ∈ H^{β,α}_p for any p ∈ [1, ∞). As a prototypical example, one may consider b ∈ B^α_{∞,∞} and define B_t(·, µ) := b(· − ∫_{R^d} φ(y) µ(dy)) for a Lipschitz statistic φ, in which case, similarly to before, it holds B ∈ G^{q,α}_p for any q ∈ [1, ∞] (resp. B ∈ H^{β,α}_p for any β ∈ (0, 1)) and p ∈ [1, ∞).
We highlight that this class of examples is quite important since B is only defined on P_1(R^d) and not on the whole P(R^d), thus making the use of other notions of distance between measures (e.g. the total variation norm) more difficult to handle. It can be further generalized to the case φ : R^d → R^m for another m ∈ N (namely, B is determined by m statistics associated to µ), or to dependence on the p-th moment of µ, for µ ∈ P_p(R^d); for p > 1 we can also allow φ to grow more than linearly at infinity.
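For the convolutional drifts B_t(x, µ) = (b_t ∗ µ)(x) appearing in the examples above, evaluation against an empirical measure reduces to an average over the sample; a small sketch, assuming b is a genuine function rather than a distribution (names ours):

```python
import numpy as np

def conv_drift(b, samples):
    """Return the map x -> (b * mu^N)(x) = (1/N) sum_i b(x - y_i),
    where mu^N is the empirical measure of `samples`."""
    samples = np.asarray(samples, dtype=float)
    return lambda x: b(np.asarray(x, dtype=float)[..., None] - samples).mean(axis=-1)
```

For instance, with b the identity one gets (b ∗ µ^N)(x) = x − mean(µ^N).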

SDEs driven by fBm
In this section we revisit the theory of singular SDEs driven by fBm, in order to derive useful estimates to apply later to the DDSDE setting.Sections 3.1 and 3.2 serve as a recap of key facts, respectively the pathwise meaning of singular SDEs and the regularising properties of fractional Brownian motion.Sections 3.3 and 3.4 instead provide novel results, Theorem 3.13 being the most important for our purposes.
Although the material of Sections 3.1-3.2 is strongly based on the previous works [9,17,16,22], we felt obliged to provide the proofs of several key results, for technical but rather important reasons. On the one hand, the aforementioned works are focused entirely on a pathwise setting, never establishing clear probabilistic concepts of solutions (cf. Definitions 3.4-3.5 below); on the other hand, singular drifts b ∈ L^q_T B^α_{∞,∞} were previously treated in [9] only in the autonomous case, and in [17] only when they are compactly supported in space. As neither option fits our setting nicely (consider drifts of the form b_t = b ∗ µ_t), we extend the results therein to suit our analysis of DDSDEs.
3.1. Pathwise SDEs as nonlinear Young equations. Consider a standard SDE of the form
$$X_t = \xi + \int_0^t b_s(X_s)\, ds + W_t, \qquad (3.1)$$
where b ∈ L^1_T B^α_{∞,∞} with α ∈ R and W is an R^d-valued fractional Brownian motion. When α > 0, the SDE has a classical meaning; it can be solved pathwise by standard ODE theory if b is regular enough, e.g. α > 1. We will say that b is a distributional drift (sometimes distributional field) if instead α < 0, in which case pointwise evaluation is not allowed, and we cannot give meaning to the integral appearing in (3.1) in the classical Lebesgue sense.
To deal with distributional drifts, we will employ the nonlinear Young integral framework, first developed in [9]; to present it, we first need the concept of averaged field.
Let us give a heuristic motivation before going into technical details. In the regular regime α > 0, if X is a solution to (3.1), by the change of variables θ_t := X_t − W_t we find that θ solves
$$\theta_t = \xi + \int_0^t b_s(\theta_s + W_s)\, ds. \qquad (3.2)$$
Closely related to the above integral is the averaging of the field b along the curve W, namely the space-time function
$$T^W_t b(x) := \int_0^t b_s(x + W_s)\, ds, \qquad (3.3)$$
which we call an averaged field; we will write T^W b for the map (t, x) → T^W_t b(x). As long as b is at least measurable and bounded, both integrals appearing in (3.2) and (3.3) are well defined. However, for distributional b, while equation (3.2) breaks down, the averaged field T^W b is still meaningful in the distributional sense, see [17, Section 3.1]; moreover, depending on the properties of W, T^W b might even be continuous or (higher order) differentiable in the spatial variable.
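When b is a bounded function, the averaged field is easy to approximate by a Riemann sum, which already makes the smoothing visible for discontinuous b; a sketch with our own names, using a deterministic oscillating path as a cheap stand-in for W:

```python
import numpy as np

def averaged_field(b, W, dt):
    """Riemann-sum approximation of T^W_t b(x) = int_0^t b(x + W_s) ds
    at the terminal time t = len(W) * dt, as a function of x."""
    W = np.asarray(W, dtype=float)
    return lambda x: dt * b(np.asarray(x, dtype=float)[..., None] + W).sum(axis=-1)

# average the discontinuous drift b = sign along an oscillating path
W = np.sin(np.linspace(0.0, 20.0, 2000))  # stand-in for a rough path
TWb = averaged_field(np.sign, W, dt=1.0 / 2000)
```

By construction |T^W_t b(x)| ≤ t ‖b‖_∞; in contrast with b itself, x → T^W_t b(x) varies much more mildly.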
The fundamental intuition of [9] is that the regularity of T W b can be used to give meaning to (3.2), thus also to (3.1), by reformulating the SDE as a nonlinear Young equation.
As the next statement shows, given a space-time function A which is sufficiently regular, say A ∈ C^γ_T C^1_{loc} with γ > 1/2, one can integrate a path θ ∈ C^γ_T against A.
Proposition 3.1. Let A and θ be as above. Then for any interval [s, t] ⊂ [0, T] and any sequence of partitions D_n of [s, t] with mesh converging to zero, the following limit exists and is independent of the chosen sequence:
$$\int_s^t A(dr, \theta_r) := \lim_{n\to\infty} \sum_{[u,v] \in D_n} \big( A(v, \theta_u) - A(u, \theta_u) \big).$$
We will refer to it as a nonlinear Young integral. Furthermore:
i. the integral is additive: ∫_s^t A(dr, θ_r) = ∫_s^u A(dr, θ_r) + ∫_u^t A(dr, θ_r) for all u ∈ [s, t].
The statement is a particular subcase of [16, Theorem 2.7].
We provided the statement only for A ∈ C γ T C 1 loc as this setting is sufficient for our purposes, but let us mention that the theory is more general and allows one to consider fields A of lower spatial regularity. With the above result at hand, we can now define nonlinear Young equations.
Given θ 0 ∈ R d and A ∈ C γ T C 1 loc, we say that θ ∈ C γ T is a solution to the nonlinear Young equation associated to (θ 0 , A) if

θ_t = θ_0 + ∫_0^t A(ds, θ_s) for all t ∈ [0, T].    (3.4)

For later use, we provide the following technical lemma; loosely speaking, it shows that solutions to nonlinear Young equations have a closure property.
Let A n , A ∈ C γ T C 1 loc with A n → A therein; suppose that for each n there exists a solution θ n associated to (θ 0 , A n ) and that θ n → θ in C γ T . Then θ solves the nonlinear Young equation associated to (θ 0 , A).
Proof. This is a direct consequence of Point iii. of Proposition 3.1. By assumption θ n t = θ 0 + ∫_0^t A n (ds, θ n s ), and we can pass to the limit on both sides thanks to the continuity of (θ, A) → ∫_0^• A(ds, θ s ). We are now ready to explain what it means for X to be a solution to (3.1) when b is distributional but T W b is regular enough: roughly speaking, we impose the condition X = θ + W, where θ solves the nonlinear YDE associated to A = T W b, which is the natural extension of (3.2). Although so far we have always dealt with a stochastic process W, this is a pathwise notion of solution, in the sense that for any fixed realization of W(ω) for which T W(ω) b is regular enough, we have an analytically well-defined equation of the form (3.4). This is encoded in the next definition, inspired by [19, Section 4.3], which contains a more in-depth discussion of various related concepts.
Definition 3.4. Let (Ω, F, P) be a probability space, (ξ, W) an R d × C T -valued random variable defined on it and let b be a distributional field. We say that another C T -valued random variable X on (Ω, F, P) is a pathwise solution to the SDE (3.1) associated to (b, ξ, W) if there exists Ω′ ⊂ Ω with P(Ω′) = 1 and a deterministic γ > 1/2 such that for all ω ∈ Ω′ the following hold: i.
The following definition relates standard probabilistic notions of weak and strong solutions and of uniqueness to the notion of pathwise existence given in Definition 3.4. Definition 3.5. Let b be a distributional field, ν ∈ P(R d × C T ). A tuple (Ω, F, P; X, ξ, W) given by a probability space (Ω, F, P) and a C T × R d × C T -valued random variable is a weak solution to the SDE (3.1) associated to (b, ν) if L P (ξ, W) = ν and X is a pathwise solution associated to (b, ξ, W) in the sense of Definition 3.4. We say that X is a strong solution if it is adapted to the filtration F t = σ{ξ, W s | s ≤ t}. Weak uniqueness holds for the SDE associated to (b, ν) if any two weak solutions (Ω i , F i , P i ; X i , ξ i , W i ), i = 1, 2, associated to the same data (b, ν), satisfy L P 1 (X 1 ) = L P 2 (X 2 ). Similarly, pathwise uniqueness holds if any two solutions (X i , ξ, W) defined on the same probability space, w.r.t. the same (b, ξ, W), satisfy X 1 = X 2 P-a.s.
In line with the above definition, we will use the standard terminology that weak (resp. strong) existence holds for the SDE associated to (b, ν) to mean that we can construct a weak (resp. strong) solution (Ω, F, P; X, ξ, W). In particular, if strong existence holds, then (Ω, F, P) can be chosen to be the canonical space, namely Ω = R d × C T with P = ν. It then follows from Point ii. of Proposition 3.1 that in this setting the concept of pathwise solution from Definition 3.4 is equivalent to the standard one. Moreover, for b ∈ C T C 1 loc standard ODE theory guarantees pathwise uniqueness, uniqueness in law and strong existence of solutions for the SDE associated to (b, ν) for any choice of ν ∈ P(R d × C T ).
The next lemma provides a simple condition to establish uniqueness of solutions to (3.1). Lemma 3.7. Let (Ω, F, P) be a probability space and (X, ξ, W) a triple defined on it such that X solves the SDE associated to (b, ξ, W) in the sense of Definition 3.4.
If T X b ∈ C γ T C 1 loc for P-a.e. ω, then any other solution X̃ defined on the same probability space and associated to (b, ξ, W) must coincide with it, in the sense that X̃ = X P-a.s.
Proof. The statement is a useful rewriting of [17, Remark 15].
Let us stress that, even when the assumptions of Lemma 3.7 are met, pathwise uniqueness doesn't immediately follow, unless one can additionally show that X is a strong solution.
3.2. Regularity of averaged fields and Girsanov transform for fBm. In Section 3.1 we have treated the SDE (3.1) in full generality, but in the remainder of Section 3 we will deal with a slightly more specific setting. We will always take W to be an R d -valued fBm of parameter H ∈ (0, 1) and ξ to be random initial data independent of it; in particular W 0 ≡ 0 and ν = L(ξ, W) = L(ξ) ⊗ L(W) = µ 0 ⊗ µ H for some µ 0 ∈ P(R d ), where µ H ∈ P(C T ) denotes the law of fBm of parameter H ∈ (0, 1). Therefore for fixed H we can regard the data of the problem to be the pair (µ 0 , b); if the initial datum ξ = x 0 ∈ R d is deterministic, with a slight abuse we will write (x 0 , b) in place of (δ x 0 , b).
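For readers who wish to experiment with this setting, an fBm of parameter H can be sampled exactly (in law) on a finite grid from its covariance R(s, t) = (s^{2H} + t^{2H} − |t − s|^{2H})/2 via a Cholesky factorization. The sketch below is a standard textbook construction, not a method used in the paper, and is written in pure Python with a hand-rolled Cholesky routine for self-containedness.

```python
import math, random

def fbm_cov(times, H):
    """Covariance R(s,t) = (s^{2H} + t^{2H} - |t-s|^{2H}) / 2 of fBm."""
    return [[0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))
             for t in times] for s in times]

def cholesky(R):
    """Plain Cholesky factorization R = L L^T for a positive-definite matrix."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L

def sample_fbm(times, H, rng=random):
    """Exact (in law) fBm sample on a grid of strictly positive times."""
    L = cholesky(fbm_cov(times, H))
    z = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]

times = [i / 16 for i in range(1, 17)]   # exclude t = 0, where W_0 = 0
path = sample_fbm(times, H=0.3)          # a rough (H < 1/2) sample path
```

The cost is cubic in the grid size; for long grids one would switch to circulant-embedding methods, but the Cholesky sketch makes the covariance structure explicit.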
We begin by showing the P-a.s. regularity of averaged fields T W b for W sampled as an fBm. We continue to make use of the intuitive integral notation, despite the fact that in general these objects will not be defined as Lebesgue integrals; rather, they are random variables defined on (Ω, F, P), constructed as the unique limits of the corresponding approximations. Then for any γ̃ < γ there exists an increasing function K (depending on d, T and the above parameters) such that the exponential estimate (3.6) holds. As the proof follows quite closely the ones given in [17, Section 3.3], we only provide a sketch. Let b be smooth and compactly supported, otherwise one can argue by density; up to reasoning componentwise, scaling and shifting, we can assume ‖b‖ L q T B α ∞,∞ = 1, so this norm will never appear in the computations in the sequel. Let us show how to obtain exponential estimates for I 2 , the ones for I 1 being similar. Going through computations analogous to [17, Theorem 4], invoking heat kernel type estimates and applying the Burkholder–Davis–Gundy inequality with optimal asymptotic behaviour for large p, we deduce that there exists a constant C > 0 such that the resulting series is, by Stirling's approximation, convergent for any η < (Ce)⁻¹. Together with similar estimates for I 1 , we conclude that there exist η > 0 sufficiently small and C > 0 such that an exponential bound holds. This estimate, together with [17, Lemma 18], implies that for any γ̃ < γ there exist η > 0 and κ > 0 such that (3.8) holds. It remains to show that we can improve the above inequality by allowing any value η > 0, so that we reach (3.6). To do so, we will resort to an interpolation trick, similar in style to techniques already applied in [17, Theorem 15] and [9, Corollary 4.6]. First, observe that if α, q, H satisfy (3.5) and we fix γ̃ < γ, then we can find ε sufficiently small so that γ ε = 1 − 1/q − (α − ε)H > 1/2 and γ̃ < γ ε ; then by estimate (3.8) (for α − ε in place of α) and linearity, there exist η > 0 and κ > 0 such that (3.9) holds. As before we can assume ‖b‖ L q T B α ∞,∞ = 1 and we fix ε > 0
as above. Then for any N ∈ N we can decompose b as b = b 1,N + b 2,N , where ∆ j denote Littlewood–Paley blocks; w.l.o.g. we may assume that b 2,N ≠ 0, otherwise the stated estimate is trivial. Clearly under (3.5) it holds that γ ≤ 1 − 1/q, therefore setting β = 1 − 1/q − γ we obtain a deterministic estimate for the contribution of b 1,N ; combining it with (3.9) applied to b = b 2,N , we get an estimate which now holds for all η ≥ 0.
so we can find ρ > 0 small enough such that (α, ρ) satisfy (3.10) as well; the estimate for general τ then follows by applying the one for τ = T to the rescaled fields b̃ i . Finally, in order to prove iii. it is enough to show that γ > H + 1/2 can be achieved, as in that case we can find γ̃ ∈ (H + 1/2, γ) such that (3.6) holds. But the above condition on γ is exactly (3.12).
In order to apply Lemma 3.7, we need some information on the pathwise properties of weak solutions X. From this perspective, techniques based on the Girsanov theorem are very natural, as they suggest that T X b may have the same regularity as T W b. As already mentioned, the Girsanov transform holds for fBm, see [35]; sufficient conditions in order to apply it in our context (in particular to check that the Novikov condition is satisfied) can be found in [17, Section 4.2.2], to which we also refer for more details on the explicit formula for dP/dQ. Proposition 3.10. Let (Ω, F, {F t } t≥0 , P) be a filtered probability space, W an F t -fBm of parameter H ∈ (0, 1) and h an F t -adapted process with trajectories in C γ T , γ > H + 1/2, such that h 0 = 0 and suitable exponential moments of h are finite. Then there exists another probability measure Q, given by the Girsanov theorem, such that h + W is distributed as an F t -fBm under Q. Moreover P and Q are equivalent and the moments of dP/dQ and dQ/dP satisfy an estimate depending only on the function K.
Proof. This follows almost exactly as the proof of [17, Theorem 14].
Remark 3.11. For H ≤ 1/2 and b ∈ L q T B α ∞,∞ with (α, q, H) satisfying (3.12), it follows from Corollary 3.9 and Proposition 3.10 that we can construct a weak solution (Ω, F, P; X, W) to the SDE associated to (x 0 , b), with the property that there exists a measure Q equivalent to P such that L Q (X) = L P (x 0 + W); moreover all the moments of dP/dQ and dQ/dP can be controlled in a way that depends on ‖b‖ L q T B α ∞,∞ but not on the specific (x 0 , b). In particular, the estimates can be performed uniformly over x 0 ∈ R d . For H > 1/2 and b ∈ B α ∞,∞ for some α > 1 − 1/(2H), using the regularity of fBm trajectories it's easy to check that the map t → b(t, x 0 + W t ) belongs P-a.s. to C αH−ε T for any ε > 0. Furthermore, reasoning as in the proof of [17, Theorem 15], it can be shown that there exist γ > H + 1/2 and an increasing function K such that the assumptions of Proposition 3.10 are met. Therefore also in this case we can apply Proposition 3.10 to construct weak solutions to the SDE. Moreover the function K only depends on ‖b‖ E , therefore as before all estimates are uniform over x 0 ∈ R d and b ∈ E with ‖b‖ E ≤ M , for any fixed parameter M.
In both cases, if in addition b is smooth, then the weak solution constructed in this way necessarily coincides with the unique strong one; thus the above reasoning also provides uniform estimates for the solutions associated to smooth drifts.

3.3. Stability estimates for SDEs. In light of the above results, in the remainder of Section 3 we will always impose the following assumption on the drift b. Assumption 3.12. Given H ∈ (0, 1), b satisfies one of the following: either a uniform continuity estimate holds (equivalently, there exists a constant C > 0 s.t. the relevant bound holds for all (s, t, x, y)), or b ∈ L q T B α ∞,∞ for some (α, q) satisfying (3.12). In both cases we will use the notation ‖b‖ E for the associated norm. We are now ready to present the main result of this section. Theorem 3.13. Let W be an fBm of parameter H ∈ (0, 1) and let b satisfy Assumption 3.12. Then for any x 0 ∈ R d strong existence, pathwise uniqueness and uniqueness in law hold for the SDE in the sense of Definition 3.5. Given x i 0 ∈ R d and b i satisfying Assumption 3.12, i = 1, 2, denote by X i the solutions associated to (x i 0 , b i ) and let M > 0 be a constant such that ‖b i ‖ E ≤ M for i = 1, 2. Let (α, q̃) be another pair satisfying (3.12), with the same α as in Assumption 3.12 and q̃ ≤ q. Then there exists γ > 1/2 with the following property: for any p ∈ [1, ∞) there exists a constant C > 0 (depending on γ, p, M, T, d, q̃ and the parameters appearing in Assumption 3.12) such that the stability estimate (3.15) holds. Proof. We will only treat the case H ≤ 1/2, the other one being almost identical. Let us first assume the b i to be smooth functions and show that (3.15) holds; in this case by Remark 3.6 strong existence and uniqueness hold automatically. Moreover by Remark 3.11, there exist probability measures Q i equivalent to P such that L Q i (X i ) = L P (x i 0 + W), with moment estimates depending on M but not on (x i 0 , b i ); the solutions decompose as X i = x i 0 + h i + W, where again K depends on M but not on the specific (x i 0 , b i ). By Taylor expansion and elementary addition and subtraction, the difference Y = X 1 − X 2 satisfies an equation which, upon defining suitable processes A and ψ, it is useful to reinterpret as a linear Young differential equation of the form dY t = dA t Y t + dψ t . Indeed for any γ > 1/2, we can apply [16, estimate (3.16), Theorem 3.9] to obtain the existence of a constant C = C(γ) such that for
any τ ≤ T a Young bound holds, and so our task reduces to finding estimates for the quantities involving A and ψ. We start by estimating ψ, which is the simplest term. Recalling that L Q 2 (X 2 ) = L P (x 2 0 + W), by Point ii. of Corollary 3.9 and the Cauchy–Schwarz inequality we can find γ > 1/2 such that the required moment bound holds for any p ≥ 1. In order to get estimates for A, observe first of all that by convexity of z → exp(ηz 2 ), the relevant bound is uniform in λ and η; therefore by Proposition 3.10, for any λ there exists a probability Q λ equivalent to P such that L Q λ (h λ + W) = L P (W); moreover estimates of the form (3.13) only depend on K and thus on M, but not on (x i 0 , b i ). Therefore by Jensen's inequality and Proposition 3.8, we can find γ > 1/2 such that the required bound holds for any η ≥ 0. Putting everything together proves (3.15) for smooth b i . Assume now we are given x 0 ∈ R d and b satisfying Assumption 3.12; we can find q̃ ≤ q, q̃ < ∞ such that (α, q̃) satisfy (3.12) and a sequence {b n } n of smooth drifts converging to b for any ε > 0 (for instance set b n = b ∗ ψ n with {ψ n } n≥1 a standard family of mollifiers). Let X n be the unique solutions to (3.14) associated to (x 0 , b n ); then by (3.15) the random variables θ n = X n − W form a Cauchy sequence in L p Ω C γ T . Therefore they converge to a unique limit θ, which is adapted to the filtration F t = σ{W s : s ≤ t} since the θ n are so. Similarly the X n converge to X = θ + W, which is adapted.
The estimates from Corollary 3.9, the linearity of b → T W b and the property b n → b in L q̃ T B α−ε p,p together imply that, P-a.s., T W b n → T W b in C γ T C 1 loc.
Since we also have θ n → θ in C γ T P-a.s., we can invoke the closure property of nonlinear Young equations (Lemma 3.3) to deduce that X = θ + W is a pathwise solution to (3.14) in the sense of Definition 3.4.
Furthermore, by Fatou's lemma it follows that Girsanov can be applied to X = θ + W = x 0 + h + W to deduce that X is distributed as x 0 + W under another probability measure equivalent to P. In particular, P-a.s. T X b ∈ C γ T C 1 loc.
To summarise, X is a strong solution (so that a copy of it can be constructed on any probability space supporting the measure µ H ) such that T X b ∈ C γ T C 1 loc , which implies by Lemma 3.7 that pathwise uniqueness must hold.This also implies that the law of any solution coincides with the one constructed by Girsanov theorem, from which uniqueness in law follows.
The extension of inequality (3.15) to any pair of solutions X i associated to distributional drifts b i is now a direct consequence of the approximation argument. Remark 3.14. At the price of making the statement of Theorem 3.13 slightly more technical, we have allowed the presence of the additional parameter q̃ ≤ q to handle the case q = ∞. Indeed, finding approximation sequences in L ∞ T B α−ε p,p can be a hard task since this is not a separable space; the use of L q̃ T B α−ε p,p with q̃ < ∞ will also be useful later in the proofs in Section 4.3. Remark 3.15. Theorem 3.13 gives us the information that, for drifts b satisfying Assumption 3.12, the nonlinear Young interpretation of the SDE is the only physical one. Namely, any other solution concept sharing the fundamental property of being the limit of solutions associated to smooth drifts b n → b will coincide with ours. The statement of Theorem 3.13 can be further strengthened to establish path-by-path uniqueness, see [9]; however, we will not need this for our purposes.
Remark 3.16. Although we have proved the stability estimate (3.15) in order to apply it to DDSDEs, it is of interest on its own. Indeed, it can be applied to construct the stochastic flow associated to the SDE (3.14), or to develop numerical schemes for distributional drifts b by first approximating them by smoother b n . We leave both applications for future research.
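The approximation strategy mentioned in the remark can be sketched numerically: mollify a rough drift by convolution with a smooth kernel, then run a standard Euler–Maruyama scheme for the mollified SDE. Everything below is illustrative and hypothetical: the kernel is a Gaussian of width eps (any smooth compactly supported ψ_n would do), the drift b(y) = −sign(y) is a simple discontinuous example, and a Brownian driver (H = 1/2) replaces the fBm for brevity.

```python
import math, random

random.seed(0)

def mollify(b, eps, n_quad=200, width=5.0):
    """Gaussian mollification b_eps = b * psi_eps by midpoint quadrature
    (psi_eps a centred Gaussian of std eps -- an illustrative choice)."""
    h = 2 * width * eps / n_quad
    nodes = [-width * eps + (k + 0.5) * h for k in range(n_quad)]
    weights = [math.exp(-y * y / (2 * eps * eps)) * h for y in nodes]
    Z = sum(weights)
    return lambda x: sum(w * b(x - y) for w, y in zip(weights, nodes)) / Z

def euler_maruyama(b, x0, T=1.0, n=500, rng=random):
    """Euler scheme for dX = b(X) dt + dW with a Brownian driver."""
    dt = T / n
    x = x0
    for _ in range(n):
        x += b(x) * dt + rng.gauss(0.0, math.sqrt(dt))
    return x

b = lambda y: -1.0 if y > 0 else 1.0    # discontinuous drift toward the origin
b_eps = mollify(b, eps=0.05)
x_T = euler_maruyama(b_eps, x0=2.0)
```

Away from the discontinuity the mollified drift agrees with b, while at the jump it is smoothed out; shrinking eps together with the time step is the regime in which stability estimates like (3.15) control the approximation error.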
The next lemmas extend the previous results to the case of random initial data.
Corollary 3.17. Given H ∈ (0, 1) and b satisfying Assumption 3.12, strong existence, uniqueness in law and pathwise uniqueness also hold for random initial data X 0 = ξ independent of W. Assume the b i are drifts satisfying the assumptions of Theorem 3.13; then the analogous estimates (3.17)–(3.18) hold, where the constant C and the parameters γ, α, q̃ are the same as in (3.15). Moreover, denote by µ i t = L(X i t ) the laws of the unique solutions X i associated to (ξ i , b i ). Proof. Strong existence and pathwise uniqueness for random initial data follow from those for deterministic ones by classical arguments. Given a probability space (Ω, F, P) with (W, ξ 1 , ξ 2 ) defined on it and drifts (b 1 , b 2 ), we can condition on the variables (ξ 1 , ξ 2 ), which are independent of W, and apply estimate (3.15); inequality (3.17) then follows by taking the L p Ω -norm on both sides and using the tower property of conditional expectation. Now assume we are given a pair (µ 1 0 , µ 2 0 ) ∈ P(R d ) × P(R d ) and let m ∈ Π(µ 1 0 , µ 2 0 ) be an optimal coupling for them. On the canonical space Ω = R 2d × C T , endowed with P = m ⊗ µ H , we can construct random variables (ξ 1 , ξ 2 , W) and solutions X i associated to (ξ i , b i ) in such a way that E[|ξ 1 − ξ 2 | p ] 1/p = d p (µ 1 0 , µ 2 0 ), and so estimate (3.18) follows from (3.17) applied in this setting.
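The optimal couplings m ∈ Π(µ¹₀, µ²₀) and the distances d_p appearing above are Wasserstein objects. In dimension one they are easy to compute explicitly: the optimal coupling of two empirical measures with equally many atoms is the monotone rearrangement, i.e. sorted samples are paired in order. The sketch below is a generic illustration of this fact, not a construction from the paper.

```python
def wasserstein_p(xs, ys, p=2):
    """W_p distance between two empirical measures with equally many atoms
    on the real line; the optimal coupling pairs sorted samples."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    n = len(xs)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / n) ** (1.0 / p)

mu1 = [0.0, 1.0, 2.0]
mu2 = [0.5, 1.5, 2.5]        # mu1 translated by 0.5
d1 = wasserstein_p(mu1, mu2, p=1)
d2 = wasserstein_p(mu1, mu2, p=2)
# for a pure translation, W_p equals the shift for every p
```

In higher dimensions one would solve an assignment problem instead; the one-dimensional case suffices to see why E[|ξ¹ − ξ²|^p]^{1/p} can be made equal to d_p(µ¹₀, µ²₀) by a suitable coupling.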
Corollary 3.18. Let H ∈ (0, 1), b satisfy Assumption 3.12, ξ be random initial data independent of W and X be the solution associated to (ξ, b). Then there exists another probability measure Q, equivalent to P, such that L Q (X) = L P (ξ + W). Proof. It suffices to work on the canonical space (Ω, F, P) with Ω = R d × C T ∋ (x, ω), P = µ 0 ⊗ µ H , where µ 0 := L(ξ). For any x ∈ R d , denote by ω → X x (ω) the unique strong solution associated to (x, b), so that (x, ω) → X x (ω) gives the solution to the SDE with initial distribution µ 0 . Recall that for any x ∈ R d there exists a probability measure on C T , given by the Girsanov transform associated to X x ; thus we can define a probability measure Q accordingly. Exploiting the bounds from Proposition 3.10 (which for given b are uniform in x ∈ R d ), we obtain the conclusion.
Remark 3.19. It follows from the above that, for any p ∈ [1, ∞) and any ε > 0, the solution X inherits the corresponding moment estimates of ξ + W. In particular, if µ 0 ∈ P p , then the map t → L(X t ) belongs to C T P p . As in the case of Remark 3.11, for fixed ξ the estimates can be performed uniformly over ‖b‖ E ≤ M.

3.4. Regularity of the solution laws. Although our main interest is the study of DDSDEs, our analysis also yields results on the regularity of the law L(X t ) of the solution to a standard SDE with singular drift. The method is quite simple but appears to be new and relies neither on PDE techniques nor on Malliavin calculus; rather, we exploit the Girsanov transform and the averaging estimates for fBm, in combination with duality arguments. Proposition 3.20. Let b satisfy Assumption 3.12 and X be the solution associated to (ξ, b) for random initial data ξ independent of W. Then L(X • ) ∈ L q T B α̃ 1,1 for any (α̃, q) satisfying (3.19). Proof. Observe that if (α̃, q, H) satisfy (3.19), then we can find ε > 0 small enough so that (−α̃ − 2ε, q′, H) satisfy the assumptions of Proposition 3.8, where q′ denotes the conjugate of q. By Corollary 3.18, there exists an equivalent measure Q such that L Q (X) = L P (ξ + W); therefore, for any test function, the duality pairing can be estimated by Hölder and Girsanov,
where in the last passage we used the fact that the estimate holds for p < ∞ big enough; by duality, the conclusion then follows from an application of Lemma A.1 from Appendix A.
Proposition 3.21. Let X, b, ξ be as in Proposition 3.20. Then L(X • ) ∈ L q T L p x for all (q, p) ∈ (1, ∞) 2 satisfying 1/q + Hd/p > Hd. (3.20) Proof. Observe that (q, p) ∈ (1, ∞) 2 satisfy (3.20) if and only if the conjugates (q′, p′) satisfy the dual condition. By [31, Lemma 6.4] (more precisely equation (6.11) right after the proof therein) and estimates based on the Girsanov theorem analogous to the proof of Proposition 3.20, we deduce the required dual bound, and so we can conclude by Lemma A.2 in the Appendix that L(X • ) ∈ L q T L p x .

Proofs of the main results
We split the proof of Theorem 2.4 into sections, which deal respectively with the cases H > 1/2 and H ≤ 1/2; the proof of Theorem 2.5 is presented in Section 4.3 instead.
We recall to the reader that in this section we will be dealing with DDSDEs of the form (4.1), with the drift B belonging to either G q,α p or H β,α p (cf. Definitions 2.1–2.2), depending on the value of H ∈ (0, 1). The variable ξ is independent of W and has prescribed law µ 0 ∈ P(R d ); thus, depending on the context, we will treat both (ξ, B) and (µ 0 , B) as the data of the problem.

4.1. The case H > 1/2. In this regime we will always consider drifts B ∈ H β,α p with α, β > 0 and p ∈ [1, ∞). In particular, here B : [0, T] × R d × P p (R d ) → R d is bounded and uniformly continuous in all of its arguments; in this sense, although the concept of solution introduced in Section 3 does include the standard one by Remark 3.6, we do not employ it here.
Rather, we will simply say that a tuple (X, ξ, W), defined on a probability space (Ω, F, P) and such that L P (ξ, W) = µ 0 ⊗ µ H , is a solution to (4.1) if L(X t ) ∈ P p (R d ) for all t ∈ [0, T] and the integral equation (4.1) holds P-a.s. The concepts of strong existence, pathwise uniqueness and uniqueness in law immediately carry over from the usual ones for SDEs.
Proposition 4.1. Let B ∈ H β,α p with parameters as above. Then for any µ 0 ∈ P p (R d ) strong existence, pathwise uniqueness and uniqueness in law hold for the DDSDE (4.1) with data (µ 0 , B).
Proof. We divide the proof into several steps.
Step 1: weak existence. By hypothesis B : [0, T] × R d × P p (R d ) → R d is a uniformly continuous, bounded map; existence of weak solutions on [0, T] then follows from [15, Proposition 3.10].
Step 2: any weak solution is a strong one. Let X be a weak solution of the DDSDE w.r.t. (ξ, W) on a probability space (Ω, F, P). Then, setting µ t = L(X t ) and b µ (t, x) = B t (x, µ t ), X solves the SDE associated to b µ , which satisfies |b µ (t, x)| ≤ ‖B‖ uniformly over (t, x). As a consequence, repeatedly applying Minkowski's inequality and then (4.2), one checks that b µ satisfies Assumption 3.12, implying that strong existence and uniqueness in law hold for the associated SDE; therefore X is adapted to (ξ, W).
Step 3: reduction to the canonical space. As we are dealing with a strong solution X, we can regard it as a random variable on the canonical space (Ω, F, P) with Ω = R d × C T , P = µ 0 ⊗ µ H and F the P-completion of B(R d × C T ). Applying the same reasoning to any pair of weak (thus strong) solutions X 1 , X 2 , possibly defined on different probability spaces, we can construct a coupling (X̃ 1 , X̃ 2 ) of solutions defined on the canonical space and w.r.t. the same random variables (ξ, W). If we show that X̃ 1 ≡ X̃ 2 , then the equality L(X 1 ) = L(X 2 ) follows.
Step 4: pathwise uniqueness on the canonical space. Let us drop the tildes and adopt the notations µ i t = L(X i t ), b i = b µ i . It follows from the computations of Step 2 that we can find M ∼ 1 + ‖B‖ 2 such that the b i satisfy Assumption 3.12 with ‖b i ‖ E ≤ M. We can therefore apply estimate (3.17) for the choice q̃ = q = ∞; combining everything and using again X 1 0 = X 2 0 , a closed estimate holds for any τ ∈ [0, T]. Choosing τ small enough so that C ‖B‖ τ γ < 1, we conclude that µ 1 t = µ 2 t for all t ∈ [0, τ] and so that E[‖X 1 − X 2 ‖ γ;[0,τ] ] = 0, i.e. P-a.s. X 1 ≡ X 2 on [0, τ]. In light of this, choosing now τ′ = 2τ and going through similar computations, the solutions also coincide on [0, 2τ]. Iterating the reasoning on [0, nτ] until we cover [0, T] gives the conclusion.

4.2. The case H ≤ 1/2. In this case we can allow the drift to be singular, i.e. to take values in B α ∞,∞ with α < 0. We start by defining what we mean by a solution to the DDSDE in this case.
Definition 4.2. Let (Ω, F, P) be a probability space, (X, ξ, W) a C T × R d × C T -valued random variable defined on it with L P (ξ, W) = L(ξ) ⊗ µ H ; let B : [0, T] × P p → S′ be a measurable map for some p ∈ [1, ∞). We say that X is a solution to the DDSDE (4.1) associated to (ξ, B) if, setting µ t := L(X t ) and b µ t := B t (µ t ), X is a pathwise solution to the SDE associated to (b µ , ξ, W) in the sense of Definition 3.4. All the concepts of strong solution, pathwise uniqueness and uniqueness in law are similarly readapted from those of Definition 3.5.
As before, we will consider both (ξ, B) and (µ 0 , B) to be the data of the problem, depending on whether we are focusing on solutions on a prescribed probability space or on their laws.
Assume now B ∈ G q,α p with parameters satisfying (4.3), and let µ ∈ C T P p ; then the drift b µ t := B t (µ t ) belongs to L q T B α ∞,∞ , with norm controlled by the function h ∈ L q T associated to B from Definition 2.2. Thus b µ satisfies Assumption 3.12 and the associated SDE has a unique solution X by Corollary 3.17; if in addition µ 0 ∈ P p , then by Remark 3.19 the map t → L(X t ) belongs to C T P p .
Thus for fixed µ 0 , setting I µ 0 (µ) • = L(X • ), we can define a map I µ 0 from C T P p to itself; this map comes with an alternative notion of solution to the DDSDE. Lemma 4.4. Let B ∈ G q,α p with parameters satisfying (4.3) and µ 0 ∈ P p . The following hold: i. if X is a weak solution to (4.1), then µ t = L(X t ) is a fixed point for I µ 0 ; ii. if µ is a fixed point for I µ 0 , then there exists a strong solution X to (4.1); iii. if there exists at most one fixed point for I µ 0 , then pathwise uniqueness and uniqueness in law hold for (4.1).
Proof. Point i. immediately follows from the definitions. To see Point ii., assume µ = I µ 0 (µ); then by the results of Section 3 we can construct a strong solution X to the SDE associated to (µ 0 , b µ ). But then by definition of I µ 0 it holds L(X t ) = µ t and so X solves the DDSDE. It remains to show Point iii.; assume X i are two solutions and set µ i t = L(X i t ). Then by Point i., the µ i are both fixed points for I µ 0 , so µ 1 = µ 2 and b µ 1 = b µ 2 . But then the X i both solve the SDE associated to b µ 1 , for which uniqueness holds both pathwise and in law, so the conclusion follows.
It follows from the above that, in order to show strong existence, pathwise uniqueness and uniqueness in law for the DDSDE (4.1) in the sense of Definition 4.2, it's enough to show that there exists exactly one solution µ ∈ C T P p in the sense of Definition 4.3. Proposition 4.5. Let B ∈ G q,α p with parameters satisfying (4.3); then for any µ 0 ∈ P p (R d ) strong existence, pathwise uniqueness and uniqueness in law hold for the DDSDE (4.1) associated to (µ 0 , B).
Proof. Define the map I µ 0 : C T P p → C T P p associated to (µ 0 , B) as before; in order to show that there exists exactly one fixed point of I µ 0 , it's enough to establish its contractivity. Given µ 1 , µ 2 ∈ C T P p , set b i t := B t (µ i t ); denote by X i two solutions, defined on the same probability space and with respect to the same data (ξ, W), to the SDEs associated to (ξ, b i ), where L(ξ) = µ 0 . By definition of G q,α p , there exists h ∈ L q T controlling, for any τ ∈ (0, T], the distance between the drifts in terms of the distance between the measures. Applying Corollary 3.17 and using the fact that X 1 0 = X 2 0 = ξ, we can find γ > 1/2 and C > 0 such that a contraction estimate holds for any τ ∈ (0, T]. Choosing τ > 0 sufficiently small such that C ‖h‖ L q T τ γ < 1, we find that I µ 0 is a contraction from C([0, τ]; P p ) to itself, so therein there exists a unique fixed point μ̄ = I µ 0 (μ̄); it remains to show that we can extend this fixed point uniquely to the whole interval [0, T].
To do this, the classical argument for SDEs would require restarting the equation at t = τ; however we cannot do so here, as fractional Brownian motion is not a Markov process. Instead, we can exploit the fact that τ only depends on C ‖h‖ L q T , and not on the history of the paths X i nor of the µ i , to give the following alternative reasoning.
Given τ and μ̄ ∈ C([0, τ]; P p ) as above, consider E := {µ • ∈ C([0, 2τ]; P p ) : µ| [0,τ] = μ̄}, which is a closed subset of C([0, 2τ]; P p ) and thus a complete metric space with the same norm. Since μ̄ is a fixed point on [0, τ], I µ 0 leaves E invariant; for any µ i ∈ E, arguing as above, the same contraction estimate holds. It follows that I µ 0 is a contraction on E and admits a unique fixed point on it, which is necessarily the only possible extension of μ̄ to [0, 2τ]. Repeating the argument on [0, nτ] as many times as necessary to cover [0, T] concludes the proof.
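The fixed-point map µ ↦ I_{µ0}(µ) also suggests a numerical procedure: freeze the flow of measures, solve the resulting SDE, and read off the new flow of laws. The sketch below runs this Picard iteration by Monte Carlo for a hypothetical smooth toy model, the linear McKean–Vlasov equation dX_t = (m_t − X_t) dt + dW_t with m_t = E[X_t] and a Brownian driver (so the measure flow is summarized by its mean curve); none of the singular-drift analysis of the theorem is involved.

```python
import math, random

random.seed(42)

def picard_step(m, x0, T=1.0, n_particles=4000):
    """One application of the map I_{mu0}: freeze the mean curve m, solve
    dX = (m_t - X_t) dt + dW_t by Euler for many independent samples, and
    return the empirical mean curve (a stand-in for the flow of laws)."""
    n = len(m) - 1
    dt = T / n
    sdt = math.sqrt(dt)
    xs = [x0] * n_particles
    means = [x0]
    for k in range(n):
        xs = [x + (m[k] - x) * dt + random.gauss(0.0, sdt) for x in xs]
        means.append(sum(xs) / n_particles)
    return means

n = 50
m = [2.0] * (n + 1)          # initial guess: the constant curve at x0
for _ in range(3):            # a few Picard iterations
    m = picard_step(m, x0=2.0)
deviation = max(abs(v - 2.0) for v in m)
# at the fixed point dE[X_t]/dt = m_t - E[X_t] = 0, so the mean curve
# should stay near x0 = 2 up to Monte Carlo noise
```

The contraction argument in the proof is exactly what guarantees that such iterations converge on short time intervals, which are then concatenated as above.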

4.3. Stability estimates for DDSDEs. The purpose of this section is to provide the proof of Theorem 2.5, which loosely speaking establishes Lipschitz dependence of the solutions µ i ∈ C T P p on the data (µ i 0 , B i ), i = 1, 2. We assume that we are given drifts B i belonging to H β,α p for parameters satisfying (2.2) when H > 1/2, respectively B i ∈ G q,α p for parameters satisfying (2.3) when H ≤ 1/2; in both cases we denote the optimal constants by ‖B i ‖. Given µ i 0 ∈ P p , we denote by µ i ∈ C T P p the unique solutions associated to (µ i 0 , B i ), whose existence is granted by Theorem 2.4. Finally, for α, q given as above, let us recall the notation introduced in Theorem 2.5. Proof of Theorem 2.5. Let µ i be the solutions as above and set b i t := B i (t, µ i t ). Recall from the proofs of Propositions 4.1 and 4.5 that if ‖B i ‖ ≤ M, then ‖b i ‖ E ≤ C(M), E being suitable spaces for which Assumption 3.12 is met; so we are in a position to apply the estimates from Section 3.3. First observe that, by addition and subtraction of B 2 (t, µ 1 t ), the difference of the drifts splits into two contributions. The argument slightly differs in the H > 1/2 and H ≤ 1/2 cases, so we will handle them separately. We begin with H > 1/2. Let us choose q < ∞ big enough so that (α, q, H) satisfies (3.12); then we can apply estimate (3.18), while on the other hand we can invoke estimate (4.4) and the assumption on the B i . Putting everything together and setting f(t) := sup s≤t d p (µ 1 s , µ 2 s ) q , applying Grönwall's lemma to f and taking the power 1/q on both sides readily gives exactly the desired estimate (2.4). Suppose now the X i are solutions defined on the same probability space; then combining estimate (3.17) with the ones above, the conclusion readily follows. We now move on to the case H ≤ 1/2; for q = ∞ the proof is the same as above, so we can assume w.l.o.g. q < ∞ here. For B i ∈ G q,α p , it follows again by (4.4) that the drifts can be compared, where we recall that h i ∈ L q T are the functions associated to B i given in Definition 2.2. Following the same strategy as before, by (4.6) and (4.5), by this
inequality and the assumption on the h i , another application of Grönwall's lemma gives estimate (2.6). The statement for E[‖X 1 − X 2 ‖ p γ ] now follows exactly as in the case H > 1/2. We conclude this section with the application of Theorem 2.5 to a particularly relevant case.

Refined results in the convolutional case
In this section we focus on the case of DDSDEs with convolutional structure, namely

X_t = ξ + ∫_0^t (b_s ∗ µ_s)(X_s) ds + W_t, µ_t = L(X_t).    (5.1)

They correspond to the choice B t (µ) = b t ∗ µ and can therefore be solved under suitable assumptions on b (e.g. b ∈ E as in Example 4.6). Due to their specific structure, however, as soon as the associated solution X has a regular law µ, its regularity immediately transfers to the drift b µ t = b t ∗ µ t , as the next simple lemma shows. Lemma 5.1. Let H ∈ (0, 1), b ∈ B α ∞,∞ for α > 1 − 1/(2H), µ 0 ∈ P 1 and let X denote the unique solution to the DDSDE (5.1) with L(ξ, W) = µ 0 ⊗ µ H . Then X also solves an SDE with drift b µ belonging to L 1 T C 1 x . Proof. Let X be the aforementioned solution; then by the proof of Theorem 2.4 we know that it solves an SDE with drift b µ satisfying Assumption 3.12. Applying Proposition 3.20 for the choice q = 1 in (3.19), we deduce that µ ∈ L 1 T B α̃ 1,1 for any α̃ < 1/(2H). Therefore, by the hypothesis on b and Young's inequality, b µ = b ∗ µ ∈ L 1 T B α+α̃ ∞,∞ for all α̃ < 1/(2H); since α > 1 − 1/(2H), choosing α̃ appropriately gives α + α̃ > 1 and thus the conclusion. Remark 5.2. Up to technicalities, the proof readapts to the case of time-dependent drifts B t (µ) = b t ∗ µ with b satisfying Assumption 3.12, with the same conclusion that b µ ∈ L 1 T C 1 x .
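For convolutional drifts B_t(µ) = b_t ∗ µ, the natural particle approximation (in the spirit of the mean-field system (1.2) from the introduction) replaces b ∗ µ_t by the empirical convolution (1/N) Σ_j b(X^i − X^j). The sketch below is an illustration under simplifying assumptions: a smooth odd interaction kernel b(z) = −z (linear attraction, chosen hypothetically) and a Brownian driver in place of the fBm.

```python
import math, random

random.seed(7)

def interaction_drift(xs, b):
    """Evaluate (b * mu^N)(x_i) = (1/N) sum_j b(x_i - x_j) at every particle."""
    N = len(xs)
    return [sum(b(xi - xj) for xj in xs) / N for xi in xs]

def particle_system(x0s, b, T=1.0, n=100, rng=random):
    """Euler scheme for the mean-field particle system
    dX^i = (b * mu^N_t)(X^i) dt + dW^i."""
    dt = T / n
    xs = list(x0s)
    for _ in range(n):
        drift = interaction_drift(xs, b)
        xs = [x + d * dt + rng.gauss(0.0, math.sqrt(dt))
              for x, d in zip(xs, drift)]
    return xs

b = lambda z: -z                         # smooth odd kernel: linear attraction
x0s = [random.gauss(0.0, 1.0) for _ in range(200)]
forces = interaction_drift(x0s, b)       # for an odd kernel the forces sum to 0
xs_T = particle_system(x0s, b)
```

For an odd kernel the pairwise interactions cancel exactly, so the empirical centre of mass moves only by the averaged noise; for singular b as in the paper one would first mollify the kernel, which is precisely where stability estimates such as (3.15) enter.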
Lemma 5.1 shows that in this setting the effective drift b µ is much more regular than the original b, to the point that the SDE associated to b µ can be solved classically. However, in order to give meaning to the DDSDE, it suffices to know that b µ satisfies the weaker Assumption 3.12; for this reason we expect the criteria coming from Theorem 2.4 to be suboptimal for convolutional DDSDEs (5.1), as they don't take into account the different regularity of b and b µ .
A partial improvement of those results is given by Theorems 2.6 and 2.7, whose proofs are presented in Sections 5.1 and 5.2 respectively. Before moving further, let us define rigorously what we mean by solutions here, although the concept is very similar to that of Definition 4.2.
Definition 5.3. Fix H ∈ (0, 1); let (Ω, F, P) be a probability space, (X, ξ, W) a C T × R d × C T -valued random variable defined on it with L P (ξ, W) = µ 0 ⊗ µ H and b a distributional drift. We say that X is a solution to the DDSDE (5.1) associated to (µ 0 , b) if, setting µ t := L(X t ) and b µ t := b t ∗ µ t , X solves the SDE associated to (b µ , ξ, W), where we additionally require that either: i. b µ satisfies Assumption 3.12 and the SDE is interpreted in the sense of Definition 3.4, or ii. b µ ∈ L 1 T C 0 x and the SDE is interpreted in the standard integral sense. All the concepts of weak solution, strong solution, pathwise uniqueness and uniqueness in law are readapted similarly.
A major role in the proofs of Theorems 2.6 and 2.7 is played by the following conditional uniqueness result.

Proposition 5.4. Let H ∈ (0, 1), p ∈ [1, ∞), p′ its conjugate exponent; let b ∈ L^q_T B^α_{p,p} be a distributional drift satisfying condition (5.3), which in particular requires α > 1 − 1/(2H) + 1/(Hq). Assume furthermore that for a given µ_0 ∈ L^{p′}_x there exists a weak solution X to the DDSDE (5.1) associated to (µ_0, b) satisfying (5.2). Then X is a strong solution; moreover it is the unique one (both pathwise and in law) in the class of solutions satisfying condition (5.2).

Proof. We handle the cases H ≤ 1/2 and H > 1/2 slightly differently.
The case H ≤ 1/2. First observe that, if X satisfies (5.2), then by Young's inequality b^µ = b * L(X_·) inherits the regularity of b; in particular b^µ satisfies Assumption 3.12, Definition 5.3 is meaningful and X is necessarily a strong solution. Now let X^i, i = 1, 2, be two solutions to (5.1) satisfying (5.2); as they are both strong solutions, by the usual arguments we can assume them to be defined on the same probability space, w.r.t. the same (ξ, W), and we only need to check that X^1 = X^2 P-a.s. Moreover, thanks to the strict inequality α > 1 − 1/(2H) + 1/(Hq), here we can assume w.l.o.g. q < ∞.
For i = 1, 2, set µ^i_t = L(X^i_t); as b * µ^i both satisfy Assumption 3.12, we may apply Corollary 3.17 to deduce suitable moment bounds for the X^i. In particular the quantity d_{r′}(µ^1_t, µ^2_t) is finite for any r′ ∈ [1, ∞) and any t ∈ [0, T]. We now wish to apply Corollary A.9 from Appendix A to obtain better control on the difference of the drifts b * µ^1 − b * µ^2. To do so observe that, under our assumptions on the parameters (α, q, p), we can find new parameters (s, r) ∈ (1, ∞)^2, with s large and r close to 1, for which the embedding conditions of Corollary A.9 hold. For this choice, set α̃ := α − d/s; by construction the parameters (p, p′, s, r) satisfy the assumptions of Corollary A.9, whose application, together with standard Besov embeddings, yields an estimate of b * (µ^1 − µ^2) in terms of d_{r′}(µ^1, µ^2). Now since under (5.3) the triple (α̃, q, H) also satisfies (3.12), we can again apply estimate (3.17) from Corollary 3.17 to control d_{r′}(µ^1_t, µ^2_t) by the time integral of itself; applying Grönwall's lemma we conclude that d_{r′}(µ^1_t, µ^2_t) = 0, and so µ^1_t = µ^2_t for all t ∈ [0, T]. Thus the X^i are solutions to the same SDE and therefore X^1_• = X^2_• P-a.s.

The case H > 1/2. We argue essentially in the same way, only this time checking that X is a strong solution starting from the available information on b and L(X_•) is less straightforward.
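Schematically, the closing Grönwall step combines the stability estimate of Corollary 3.17 with the convolution bound from Corollary A.9 (a sketch; constants and exact norms are those of the quoted results):

```latex
d_{r'}(\mu^1_t, \mu^2_t)
  \lesssim \int_0^t \big\| b * (\mu^1_s - \mu^2_s) \big\| \,\mathrm{d}s
  \lesssim \int_0^t d_{r'}(\mu^1_s, \mu^2_s)\,\mathrm{d}s
  \;\Longrightarrow\; d_{r'}(\mu^1_t, \mu^2_t) = 0
  \quad \forall\, t \in [0, T].
```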
First observe that Young's inequality still provides b^µ ∈ C^0_T C^α_x, so that by Definition 5.3 the DDSDE is meaningful in the classical integral sense. In order to check Assumption 3.12 for b^µ (which implies X being strong), it remains to show that b * µ ∈ C^{α̃H}_T C^0_x for a suitable α̃. Applying Corollary A.9 and the previous estimate for d_{r′}(µ_t, µ_s), we obtain a time-Hölder estimate for b * µ; on the other hand, since b ∈ C^0_T B^α_{p,p} and µ ∈ L^∞_T L^{p′}_x, by Young's inequality we also have a uniform spatial bound. We can now interpolate between the two estimates, choosing the interpolation parameter θ ∈ (0, 1) so that the resulting regularity exponent stays above 1 − 1/(2H). The second part of the argument, concerning the comparison of two solutions X^i satisfying (5.2), now proceeds identically as in the case H ≤ 1/2.

5.1. Distributional kernels with bounded divergence. Proposition 5.4 reduces the problem of uniqueness of solutions (in a suitable class) to that of establishing their regularity, in the sense of equation (5.2).
One classical way to show that the condition µ_0 ∈ L^{p′}_x is propagated at positive times, which has been exploited systematically after [12], is to impose boundedness of div b; in the setting of DDSDEs with general additive noise and regular drift b, an analogous statement can be found in [15, Proposition 4.3].
Proposition 5.5. Let H, (α, p, q) be as in Proposition 5.4 and assume in addition that div b is bounded. Then for any µ_0 ∈ L^{p′}_x there exists a strong solution to the DDSDE (5.1) associated to (µ_0, b), which moreover satisfies L(X_•) ∈ L^∞_T L^{p′}_x.

Proof. We start by dealing with the case H ≤ 1/2; at the end of the proof we explain how the reasoning needs to be modified for H > 1/2.

The case H ≤ 1/2. In this case we can assume w.l.o.g. q < ∞; recall that if f^n is a bounded sequence in L^q_T B^α_{∞,∞}, for (α, q) satisfying Assumption 3.12, such that f^n → f in L^q_T B^{α−1}_{∞,∞}, then by Corollary 3.17 the associated solutions converge to the unique strong solution X of the SDE associated to (ξ, W, f) (we can assume {X^n}_{n≥1} and X to be defined on the same probability space for the same (ξ, W)). Given b as in the hypothesis, consider a sequence of smooth, bounded functions b^n converging to b in the relevant topology; let X^n be the solutions to the associated approximating DDSDEs, whose existence is granted by classical results (see e.g. [10, Theorem 7]), and set µ^n_t = L(X^n_t). By [15, Proposition 4.3], there exists a constant C such that ‖µ^n_t‖_{L^{p′}_x} ≤ C ‖µ_0‖_{L^{p′}_x} uniformly in n and t. As a consequence, each X^n solves an SDE with drift b^n * µ^n; in turn this implies by Remark 3.19 that for any fixed ε > 0 we have a uniform estimate on the solutions. Since moreover X^n_0 = ξ for all n ∈ N, we can conclude by Ascoli–Arzelà that the sequence {X^n}_{n≥1} is tight in C_T. We can then extract a (not relabelled) subsequence such that L(X^n) converges weakly to some µ ∈ P(C_T); consequently µ^n_t ⇀ µ_t in P(R^d) for any t ∈ [0, T], where µ_t = e_t♯µ and e_t : C_T → R^d is the evaluation map. It follows from the uniform estimates that ‖µ_t‖_{L^{p′}_x} ≤ C ‖µ_0‖_{L^{p′}_x} as well.
We claim that the drifts b^n * µ^n converge to f := b * µ in a sufficiently strong topology. Once this is shown, by the initial observation the solutions X^n must converge to the unique solution X associated to (ξ, W, f); then it must hold L(X_t) = µ_t and f_t = b_t * L(X_t), and so we can conclude that X is a solution to (5.1) with the desired regularity.
It remains to show the claim; to this end, we decompose b^n * µ^n − b * µ into two terms, the second of which we denote by h^n. The convergence b^n → b and dominated convergence imply that the first term vanishes for all α̃ < α; for h^n we have a direct estimate in terms of the weak convergence µ^n ⇀ µ. Hence we have shown the claim, and thus the conclusion in this case.
The case H > 1/2. As in the proof of Proposition 5.4, in this regime L(X_•) ∈ L^∞_T L^{p′}_x is not enough to deduce straight away that b * L(X_•) satisfies Assumption 3.12; however, up to technical details, the proof is almost the same as above.
Specifically, we can consider a sequence {b^n}_n of smooth functions, uniformly bounded in the relevant norm and such that b^n → b in L^{q̃}_T B^α_{p,p} for any q̃ < ∞. Then exploiting the a priori bound from [15, Proposition 4.3] and the argument from Proposition 5.4, one can derive uniform estimates for the solutions X^n associated to b^n and finally pass to the limit with the help of Corollary 3.17.
Alternatively, let us mention that the existence of a weak solution X satisfying L(X_•) ∈ L^∞_T L^{p′}_x in this setting can be obtained by an application of [15, Proposition 4.4].
Proof of Theorem 2.6. It is now an immediate consequence of Propositions 5.4 and 5.5.

5.2. Integrable kernels.
We now restrict ourselves to the case H ≤ 1/2 and drifts b ∈ L q T L p x ; in this setting we can present a second route to establishing existence of a solution with sufficiently regular law, to which we can apply Proposition 5.4.
Before proceeding further, let us explain why it is reasonable to expect so. By the Besov embedding L^p_x ↪ B^{−d/p}_{∞,∞}, drifts b ∈ L^q_T L^p_x satisfy Assumption 3.12 if and only if (q, p) satisfies condition (5.5); however, differently from the class L^q_T B^α_{p,p}, for b ∈ L^q_T L^p_x it is known after the works [36, 31] that Girsanov transform (and thus weak existence and uniqueness in law for associated SDEs) is available as soon as (q, p) satisfies the weaker condition (5.6). As already seen in Section 3.4, Girsanov transform allows to deduce information on the regularity of L(X_t), which in turn provides higher regularity of the effective drift b^µ for the convolutional DDSDE. In particular, we may hope that starting from b ∈ L^q_T L^p_x for (q, p) satisfying (5.6), we end up with b^µ ∈ L^{q̃}_T L^{p̃}_x with (q̃, p̃) satisfying (5.5). At a technical level, we will proceed similarly as in Section 5.1, first establishing uniform a priori estimates for regular b and then running an approximation procedure. We start by recalling and improving the available results on Girsanov transform; as we are only interested in smooth approximations, for simplicity we restrict to regular drifts.

Lemma 5.6. Let (Ω, F, P) be a probability space, (ξ, W) be an R^d × C_T-valued random variable on it with L_P(ξ, W) = µ_0 ⊗ µ^H and f be a regular drift with ‖f‖_{L^q_T L^p_x} < ∞, for (q, p) satisfying (5.6); let X be the unique strong solution to the SDE associated to (ξ, W, f). Then there exists a measure Q equivalent to P such that L_Q(X) = L_P(ξ + W), and there exists an increasing function F, depending on H, T, p, q, controlling the moments of dP/dQ in terms of ‖f‖_{L^q_T L^p_x}, where the estimate depends neither on µ_0 nor on the specific function f.
Proof. For deterministic initial data ξ = x_0 ∈ R^d (equivalently µ_0 = δ_{x_0}), the statement is a direct consequence of [31, Lemma 6.7], where it is already stressed that the estimates only depend on ‖f‖_{L^q_T L^p_x} but not on x_0 nor the specific f. The proof for random initial data ξ independent of W is now identical to that of Corollary 3.18; the estimate not depending on ξ follows from the analogous property in the deterministic case.

The next lemma shows that the initial regularity of µ_0 is propagated at positive times, establishing useful a priori estimates; the proof is similar to that of Proposition 3.21.
Lemma 5.7. Let ξ, W, X, f, (p, q) be as in Lemma 5.6 and assume µ_0 ∈ L^r_x for some r ∈ (1, ∞); then L(X_•) ∈ L^∞_T L^{r̃}_x for any r̃ < r.

Proof. Fix r̃ < r and denote by r̃′ the conjugate exponent of r̃; take ε > 0 such that r′(1 + ε) = r̃′. Let Q be the measure given by Lemma 5.6 such that L_Q(X) = L_P(ξ + W); since dP/dQ admits moments of any order, for any g ∈ C^∞_c(R^d) we can estimate E^P[g(X_t)] by Hölder's inequality under Q, where in the last passage we use the fact that ξ and W_t are independent. Recalling that L(W_t) is a probability measure, by Hölder's and then Young's inequality we arrive at a bound in terms of ‖µ_0‖_{L^r_x} and ‖g‖_{L^{r̃′}_x}. As the estimate is uniform over all g ∈ C^∞_c(R^d) and t ∈ [0, T], by duality we deduce that µ_t ∈ L^{r̃}_x uniformly in t; as the reasoning holds for all r̃ < r, the conclusion follows.
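The chain of estimates in the proof of Lemma 5.7 can be sketched as follows (schematic; ρ := dP/dQ, and the passages use the independence of ξ and W_t under Q together with Hölder's and Young's inequalities):

```latex
\mathbb{E}^{\mathbb{P}}[g(X_t)]
  = \mathbb{E}^{\mathbb{Q}}\big[\rho\, g(\xi + W_t)\big]
  \lesssim \mathbb{E}^{\mathbb{Q}}\big[|g(\xi + W_t)|^{1+\varepsilon}\big]^{\frac{1}{1+\varepsilon}}
  = \Big( \int_{\mathbb{R}^d} \big( |g|^{1+\varepsilon} * \mathcal{L}(W_t) \big)\,\mathrm{d}\mu_0 \Big)^{\frac{1}{1+\varepsilon}}
  \le \|\mu_0\|_{L^r_x}^{\frac{1}{1+\varepsilon}} \, \|g\|_{L^{\tilde r'}_x},
```

so that by duality $\|\mu_t\|_{L^{\tilde r}_x} \lesssim \|\mu_0\|_{L^r_x}^{1/(1+\varepsilon)}$, using the relation $r'(1+\varepsilon) = \tilde r'$.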
We are now ready to prove the existence of solutions to the DDSDE (5.1) for b ∈ L^q_T L^p_x and sufficiently integrable µ_0.

Proposition 5.8. Let H ≤ 1/2 and let (p, q, r) satisfy condition (5.7). Then for any b ∈ L^q_T L^p_x and any µ_0 ∈ L^r_x there exists a strong solution X to the associated DDSDE (5.1), which moreover satisfies L(X_•) ∈ L^∞_T L^{r̃}_x for any r̃ ∈ [1, r).

Proof. We pursue the same general strategy as in the proof of Proposition 5.5.
As condition (5.7) only contains strict inequalities, w.l.o.g. we can assume q < ∞; consider a sequence {b^n}_n of Lipschitz, compactly supported functions such that b^n → b in L^q_T L^p_x. It follows from [10, Theorem 7] that for every n there exists a unique solution X^n to the approximating DDSDE; in particular, each X^n is also a solution to an SDE with drift f^n_t := b^n_t * L(X^n_t), and by Young's inequality ‖f^n‖_{L^q_T L^p_x} ≤ ‖b^n‖_{L^q_T L^p_x}. Therefore we may apply Lemmas 5.6 and 5.7 to obtain a uniform bound on L(X^n_•) in L^∞_T L^{r̃}_x for all r̃ ∈ [1, r). Applying Hölder's inequality to the integral in time and Young's inequality to the convolution in space, we find that the drifts b^n * µ^n are bounded in L^q_T L^{p̃}_x for suitable p̃. Using the fact that the exponents can be chosen arbitrarily close to the borderline ones and that the first inequality in (5.7) is strict, we see that the family {b^n * µ^n} is bounded in L^q_T L^{p̃}_x for parameters (q, p̃) satisfying (5.5), i.e. exactly the regularity regime in which we know how to solve the SDE in a strong sense. On the other hand, the uniform bound on ‖b^n‖_{L^q_T L^p_x} and the use of Girsanov transform allow to derive a uniform bound on E[‖X^n‖_{H−ε}] for any ε > 0; together with X^n_0 = ξ for all n, this implies tightness of {X^n}_n, so that we can extract a (not relabelled) subsequence such that L(X^n) ⇀ µ in P(C_T), µ^n_t ⇀ µ_t = e_t♯µ for all t ∈ [0, T]. From here, the argument is almost identical to that of Proposition 5.5: once we show that b^n * µ^n → b * µ in a sufficiently strong topology, then by Corollary 3.17 the solutions X^n will converge to the unique strong solution X associated to (b * µ, ξ), which must therefore be a solution to the DDSDE associated to µ_0 = L(ξ) and b. By the uniform bounds on {µ^n}_n and the weak convergence µ^n_t ⇀ µ_t = L(X_t), we also deduce that L(X_•) ∈ L^∞_T L^{r̃}_x for all r̃ < r. Since the drifts belong to L^q_T L^{p̃}_x for some (q, p̃) satisfying (5.5), we can apply Corollary A.6 from Appendix A to deduce that (up to a further relabelled subsequence) b^n * µ^n → b * µ in this topology; together with Corollary 3.17, this implies the conclusion.
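The exponent bookkeeping behind the uniform bound on the effective drifts is the standard Young convolution computation (schematic; r̃ < r as in Lemma 5.7):

```latex
\|b^n_t * \mu^n_t\|_{L^{\bar p}_x}
  \le \|b^n_t\|_{L^p_x} \, \|\mu^n_t\|_{L^{\tilde r}_x},
\qquad
1 + \frac{1}{\bar p} = \frac{1}{p} + \frac{1}{\tilde r},
```

followed by Hölder's inequality in time; since r̃ > 1, the effective drift gains spatial integrability (p̄ > p), which is what allows the Girsanov regime (5.6) to be upgraded to the strong well-posedness regime (5.5).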
We are now ready to prove the main result of this subsection.
Proof of Theorem 2.7. The proof is based on a (non-trivial) combination of Propositions 5.4 and 5.8. Under our assumptions, the existence of a strong solution such that L(X_•) ∈ L^∞_T L^{r̃}_x for any r̃ < r is granted; in particular, if r > p′, then we can choose r̃ = p′ and the assumptions of Proposition 5.4 are then satisfied thanks to the embedding L^q_T L^p_x ↪ L^q_T B^{−ε}_{p,p} for any ε > 0, giving the uniqueness part of the statement. Up to technicalities, the borderline case r = p′ can be treated similarly, exploiting the embedding L^q_T L^p_x ↪ L^q_T B^{−ε}_{p̃,p̃} for some p̃ = p̃(ε) > p chosen so that 1/r̃ + 1/p̃ = 1.
Thus it remains to study the regime r < p′, equivalently r′ > p; in this case we can choose r̃ < r such that r̃′ > p as well. By Besov embedding, b then belongs to a space of the form L^q_T B^{−ε}_{r̃′,r̃′}; to verify that b satisfies the assumptions of Proposition 5.4, it then suffices to check the resulting condition on the exponents; since r̃ can be taken arbitrarily close to r, this follows from the first strict inequality in (5.7).
Appendix A. Some useful lemmas

We collect in this appendix several technical lemmas of analytic nature that have been used throughout the paper. We start with useful facts on the identification of elements of dual spaces.
By hypothesis f defines an element of the dual of L^{q′}_T L^{p′}_D; thus by duality ‖f‖_{L^q_T L^p_D} ≤ C for every p ∈ (1, ∞). Since D has finite measure, ‖g‖_{L^∞_D} = lim_{p→∞} ‖g‖_{L^p_D} for all measurable g; taking p → ∞ in the above, by Fatou's lemma we obtain ‖f‖_{L^q_T L^∞_D} ≤ C; as the reasoning holds for any R > 0, the conclusion follows taking R → ∞.
The next statements concern the compactness properties of convolutions, specifically how weak convergence of measures is enhanced to strong convergence of the associated functionals b * µ. Any given b ∈ L^p_x is equicontinuous in the L^p sense, i.e. τ_{h_n} b → τ_h b in L^p_x for h_n → h in R^d. Since µ^n ⇀ µ, by Skorokhod's representation theorem we can construct a probability space (Ω, F, P) and a family of r.v.s {X^n}_n, X on it such that L_P(X^n) = µ^n, L_P(X) = µ and X^n → X P-a.s.; it then holds ‖b * µ^n − b * µ‖_{L^p_x} ≤ E[‖τ_{X^n} b − τ_X b‖_{L^p_x}], where in the second passage we used Jensen's inequality. By the aforementioned equicontinuity it holds ‖τ_{X^n} b − τ_X b‖_{L^p_x} → 0 P-a.s., and we have the uniform bound ‖τ_{X^n} b − τ_X b‖_{L^p_x} ≤ 2‖b‖_{L^p_x}; thus the first claim follows from dominated convergence.
Regarding the second claim, Young's inequality gives a uniform bound for {b * µ^n} in L^{p̄}_x with 1 + 1/p̄ = 1/p + 1/r; combined with convergence in L^p_x and interpolation estimates, we deduce convergence in L^{p̃}_x for any p̃ ∈ [p, p̄).
Remark A.4. In the borderline case of b ∈ L^p_x and {µ^n}_n bounded in L^{p′}_x, {b * µ^n}_n is a bounded sequence in C_0(R^d), the Banach space of continuous functions vanishing at infinity, endowed with the supremum norm. In this case it can be shown that {b * µ^n}_n is also equicontinuous, so by Ascoli–Arzelà it converges to b * µ uniformly on compact sets.

Corollary A.6. Let p, q ∈ [1, ∞), {b^n}_n ⊂ L^q_T L^p_x be a sequence such that b^n → b in L^q_T L^p_x; moreover let {µ^n, µ} ⊂ C_T P(R^d) be such that µ^n_t ⇀ µ_t weakly for every t ∈ [0, T] and such that sup_{t∈[0,T]} ‖µ^n_t‖_{L^r_x} < ∞ uniformly in n, for some r ∈ [1, ∞].

The last statements we are going to provide concern the continuity of the map µ ↦ b * µ in suitable topologies. Their proofs require the use of maximal functions and their basic properties, which we recall first; we refer the interested reader to [41]. The inequality (A.2) below and similar ones (see [8, 7] for recent asymmetric extensions) allow to control the map µ ↦ b * µ in Wasserstein spaces.
Lemma A.8. Let (p, q, r, s) ∈ (1, ∞)^4 be such that r ≤ p ∧ s and (A.3) holds. Then there exists a constant C, depending on d and the above parameters, such that the convolution estimate of the statement holds for any b ∈ W^{1,p}_x and suitable µ, ν.

Proof. If d_{r′}(µ, ν) = +∞ the inequality is trivially true, so we can assume d_{r′}(µ, ν) < ∞; moreover, since µ, ν ∈ L^{q̃}_x for any q̃ ∈ [1, q], w.l.o.g. we can assume equality holds in (A.3). Let m ∈ Π(µ, ν) be an optimal coupling of (µ, ν) for d_{r′}(µ, ν) and let N ⊂ R^d be the negligible set from the Hajlasz inequality (A.2); estimating b * (µ − ν) through the coupling and (A.2) gives the conclusion.
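The core of the proof of Lemma A.8 is the following pointwise estimate, obtained by combining the optimal coupling m with the Hajlasz inequality (A.2) (a sketch):

```latex
|b * (\mu - \nu)(x)|
  = \Big| \int_{\mathbb{R}^{2d}} \big( b(x - y) - b(x - z) \big)\, m(\mathrm{d}y, \mathrm{d}z) \Big|
  \le c_d \int_{\mathbb{R}^{2d}} |y - z| \, \big( M Db(x - y) + M Db(x - z) \big)\, m(\mathrm{d}y, \mathrm{d}z);
```

Hölder's inequality with respect to m and the $L^p$-boundedness of the maximal function then produce the factor $d_{r'}(\mu, \nu)$.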

Definition 4.3.
Assume B ∈ G^{q,α}_p with parameters satisfying (4.3), µ_0 ∈ P_p; we say that a flow of measures µ ∈ C_T P_p is a solution to the DDSDE associated to (µ_0, B) if it satisfies I^{µ_0}(µ) = µ. The next lemma clarifies the relation between Definitions 4.2 and 4.3.
By addition and subtraction, b^µ_t − b^µ_s = (b_t − b_s) * µ_t + b_s * (µ_t − µ_s); by the hypotheses on b and µ, both terms can then be estimated separately.

∫ |f(t, x)| 𝟙_{|f(t,·)|≤M}(x) 𝟙_{|x|≤M} dx; taking the limit M → ∞ gives the conclusion.

Lemma A.2. Suppose f ∈ L^1_loc([0, T] × R^d), q ∈ (1, ∞) and there exists a constant C such that ∫_{[0,T]×R^d} f(t, x) φ(t, x) dt dx ≤ C ‖φ‖_{L^{q′}_T L^1_x} for all compactly supported φ ∈ L^∞([0, T] × R^d). Then f ∈ L^q_T L^∞_x and ‖f‖_{L^q_T L^∞_x} ≤ C.

Proof. Fix R > 0 and set D = {x ∈ R^d : |x| ≤ R}; we can identify L^p_D := L^p(D) as the subset of L^p(R^d) made of functions supported on D, and similarly L^q_T L^p_D.

Lemma A.3.
Let b ∈ L^p_x with p ∈ [1, ∞) and {µ^n}_n ⊂ P(R^d) such that µ^n ⇀ µ weakly; then lim_{n→∞} ‖b * µ^n − b * µ‖_{L^p_x} = 0. If moreover {µ^n}_n is bounded in L^r_x with r > 1, then b * µ^n → b * µ in L^{p̃}_x for any p̃ ∈ [p, ∞) such that (A.1) holds.

Proof. Given h ∈ R^d, define the translation operator τ_h : f ↦ f(· + h) acting on L^p_x; recall that translations are continuous on L^p_x.
t∈[0,T ] µ L r x < ∞ for some r ∈ [1, ∞].Then b n * µ n → b * µ in L q T L p x for all p ∈ [p, ∞) satisfying (A.1).Proof.It suffices to show that b n * µ n → b * µ in L q T L p x ; once this is done, convergence in L q T L p xfollows as usual by interpolation and boundedness in L q T L p x , which comes from the assumptions and Young's inequality.It holdsb n * µ n L q T L p x ≤ (b n − b) * µ n L q T L p x + b * µ n − b * µ L q T L p xwhere we can estimate the first term by(b n − b) * µ n L q T L p x ≤ b n − b L q T L p x → 0.For the second term, by Lemma A.3 and the assumptions it holds bt * (µ n t − µ t ) L p x → 0 for Lebesgue a.e.t ∈ [0, T ], as well as b t * (µ n t − µ t ) L p x ≤ 2 b t L p x ; thus by dominated convergence we infer b * µ n − b * µ L q T L p x → 0 as well.Lemma A.7.For any p ∈ [1, ∞) and α ∈ R there exists a constant C = C(α) such that b * (µ − ν) B α−1 ∞,∞ ≤ C b B α ∞,∞ d p (µ, ν) for all b ∈ B α∞,∞ and µ, ν ∈ P(R d ).Proof.It's enough give the proof for p = 1, as the general case follows from d 1 (µ, ν) ≤ d p (µ, ν); we can assume d 1 (µ, ν) < ∞, otherwise the inequality is trivial.By Bernstein estimates, reasoning on Littlewood-Paley blocks, we haveb * (µ − ν) B α−1 ∞,∞ = sup n 2 n(α−1) (∆ n b) * (µ − ν) L ∞ x ≤ sup n 2 n(α−1) ∆ n b W 1,∞ x d 1 (µ, ν) sup n 2 nα ∆ n b L ∞ x d 1 (µ, ν) = b B α ∞,∞ d 1 (µ, ν) which gives the claim.
for their proofs. Given b ∈ L^p(R^d), p ∈ [1, ∞], its maximal function Mb is defined by

Mb(x) := sup_{r>0} (λ_d r^d)^{−1} ∫_{B(x,r)} |b(y)| dy,

where λ_d stands for the Lebesgue measure of B(0, 1) in R^d. It is well known that if p ∈ (1, ∞], then Mb ∈ L^p(R^d) and ‖Mb‖_p ≤ c_{d,p} ‖b‖_p for some constant c_{d,p} > 0; similar definitions and properties hold in the case of vector-valued drifts b ∈ L^p(R^d; R^m) (in which case c = c_{d,p,m}). If b ∈ W^{1,p}(R^d; R^d), then there exists a Lebesgue-negligible set N ⊂ R^d and a constant c_d > 0 such that the Hajlasz inequality holds:

|b(x) − b(y)| ≤ c_d |x − y| (MDb(x) + MDb(y))  for all x, y ∈ R^d \ N.  (A.2)