Quantitative heat kernel estimates for diffusions with distributional drift

We consider the stochastic differential equation on $\mathbb{R}^d$ given by $$ \, \mathrm{d}X_t = b(t,X_t) \, \mathrm{d}t + \, \mathrm{d} B_t, $$ where $B$ is a Brownian motion and $b$ is a distribution of regularity $>-\frac12$. We show that the martingale solution of the SDE has a transition kernel $\Gamma_t$ and prove upper and lower heat-kernel bounds for $\Gamma_t$ with explicit dependence on $t$ and the norm of $b$.


Introduction and main results
In this paper we consider the stochastic differential equation on $\mathbb{R}^d$ given by $$\mathrm{d}X_t = b(t,X_t)\,\mathrm{d}t + \mathrm{d}B_t, \qquad (1)$$ where $B$ is a Brownian motion and $b$ is a distribution of regularity $> -\frac12$. Such singular diffusions (diffusions with distributional drift) appear as models for stochastic processes in random media (then $b$ would also be random, but independent of $B$), for example in [4,6,5]. They also appear as "stochastic characteristics" in Feynman-Kac type representations of singular SPDEs, for example in [13,5,17]. In non-singular SPDEs, the stochastic characteristics would be formulated in terms of the Brownian motion, and they may be useful tools to infer information about the long-time behavior of the SPDE. For example, the asymptotic behavior of the total mass of the parabolic Anderson model is typically derived via the Feynman-Kac formula [16], and for that purpose it is important to understand the Brownian motion and its transition probabilities very well. When studying singular variants of the parabolic Anderson model, where the Brownian motion in the Feynman-Kac representation is replaced by a singular diffusion, we thus need to understand the transition probabilities of this singular diffusion. Moreover, since we are interested in the long-time behavior, we need quantitative control of the transition probabilities on arbitrarily long time intervals. This motivates the present work.
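Although the drift is only a distribution, one can build numerical intuition by mollifying $b$ and simulating the regularized SDE with an Euler-Maruyama scheme. The following sketch is purely illustrative: `b_eps` is a hypothetical smooth stand-in for a mollification $b_\varepsilon$ of $b$, not a drift from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sde(b_eps, x0, T=1.0, n_steps=1000, d=2):
    """Euler-Maruyama scheme for dX_t = b_eps(t, X_t) dt + dB_t.

    b_eps is a smooth (mollified) stand-in for the distributional drift b.
    """
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for k in range(n_steps):
        t = k * dt
        x += b_eps(t, x) * dt + np.sqrt(dt) * rng.standard_normal(d)
        path.append(x.copy())
    return np.array(path)

# A toy smooth drift standing in for a mollified singular b.
def b_eps(t, x):
    return -x / (1.0 + np.sum(x**2))

path = simulate_sde(b_eps, x0=[0.0, 0.0])
print(path.shape)  # (1001, 2)
```

Heat-kernel estimates that are uniform in the mollification are exactly what allows information about such approximations to survive the passage to the limit.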
We show that the solution to (1) possesses a transition kernel $\Gamma_t : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ for all $t > 0$. This means that under the measure $\mathbb{P}^x$ such that $X_0 = x$ we have, for all $\varphi \in C_c(\mathbb{R}^d)$, $$\mathbb{E}^x[\varphi(X_t)] = \int_{\mathbb{R}^d} \varphi(y)\,\Gamma_t(x,y)\,\mathrm{d}y.$$ The following theorem is the main result of our paper, in which we show that this transition kernel satisfies heat-kernel estimates.
For any Banach space $X$ and $t > 0$ we write $\|\cdot\|_{C_tX}$ for the norm on $C([0,t],X)$, which is defined for $f \in C([0,t],X)$ by $\|f\|_{C_tX} = \sup_{s \in [0,t]} \|f(s)\|_X$. $\Delta_{-1}b$ denotes the first Littlewood-Paley block and $\Delta_{\geq 0}b$ the sum of the Littlewood-Paley blocks with nonnegative index (see Section 1.2). $B^s_{p,q}$ denotes a Besov space, see [2].
As a corollary, we obtain the following estimate on the probability that the diffusion $X$ leaves a ball.

Corollary 1.2. Let $\alpha \in (0, \frac12)$. There exists a $C > 0$ such that for all $b \in C([0,\infty), B^{-\alpha}_{\infty,1}(\mathbb{R}^d, \mathbb{R}^d))$, $x \in \mathbb{R}^d$, $K > 0$ and $T \geq 1$, and for $X$ solving (1) with $\mathbb{P}^x(X_0 = x) = 1$:

Since in that case $\Delta_{\geq 0} b = 0$, this corresponds exactly to our bounds (2) and (3) (for $\mu = 0$).

Remark 1.4. As we consider a time-inhomogeneous drift, we could also have formulated the heat-kernel estimates for $\Gamma_{s,t}$ (with $0 \leq s < t$), the transition kernel from time $s$ to time $t$: if $\mathbb{P}^{s,x}$ is the probability measure under which $X_s = x$ and (1) holds (for $t > s$), then $\mathbb{E}^{s,x}[\varphi(X_t)] = \int_{\mathbb{R}^d} \varphi(y)\,\Gamma_{s,t}(x,y)\,\mathrm{d}y$. However, to simplify notation we only consider the case $s = 0$, and we write $\Gamma_t$ for $\Gamma_{0,t}$. The heat-kernel estimates for $\Gamma_{s,t}$ follow by applying Theorem 1.1 with $b'_t = b_{t+s}$, $t \geq 0$.

Literature
Diffusions with a distributional drift were first considered by Bass and Chen [3] and by Flandoli, Russo and Wolf [8], both in the one-dimensional time-homogeneous setting. More recently, Delarue and Diel [6] used Hairer's rough path approach to singular SPDEs [14,15] to extend the results of [8] to the time-inhomogeneous case, and they applied this to construct a random directed polymer measure. Flandoli, Issoglio and Russo [7] were the first to consider multidimensional singular diffusions, but they require more regularity than the previous works on the one-dimensional case (they work in the "Young regime", i.e., the distributional drift has regularity better than $-1/2$). Zhang and Zhao [22] study ergodicity and derive heat-kernel estimates for singular diffusions in the Young regime. Cannizzaro and Chouk [5] use paracontrolled distributions to extend the approach of [6] to higher dimensions and the results of [7] to more singular drifts. They apply this to construct a random polymer measure that is closely related to the parabolic Anderson model.
In this paper we follow the approach of Cannizzaro and Chouk, although we restrict our attention to the more regular Young regime. This restriction is crucial for our arguments.
As already mentioned, Zhang and Zhao [22] also prove heat-kernel estimates for SDEs with distributional drift in the Young regime. More precisely, they prove that there exist $c, C \geq 1$ such that for all $t \in (0,T]$ and $x, y \in \mathbb{R}^d$ $$\frac1C\, t^{-\frac d2}\, \mathrm{e}^{-\frac{c|x-y|^2}{t}} \;\leq\; \Gamma_t(x,y) \;\leq\; C\, t^{-\frac d2}\, \mathrm{e}^{-\frac{|x-y|^2}{ct}}.$$ Moreover, they give an upper bound on the gradient of the transition kernel, $\nabla\Gamma_t$. Here, the constant $C$ implicitly depends on $T$ and $\|b\|_{C^{-\alpha}}$. If $b$ is the gradient of a function that does not depend on time, then there are classical heat-kernel estimates for $\Gamma$, see for example Stroock [20, Theorem 4.3.9]. In that theorem we have $b = \nabla U$ for a smooth and bounded function $U$, but the estimate only depends on $\max U - \min U$, so by an approximation argument it extends to continuous and bounded $U$. This result is uniform in time, but also here the dependence of the constants on $\max U - \min U$ is implicit.
In another work by the authors together with W. König [17], our heat-kernel estimates are applied to derive the asymptotic behavior of the total mass of the parabolic Anderson model. In that application it is crucial to understand how the constant grows with $t$ and with the norm of $b$. Therefore, we need our "quantitative version" of the heat-kernel estimates.

Notation and conventions
We write $\mathbb{N} = \{1, 2, \dots\}$, $\mathbb{N}_0 = \{0\} \cup \mathbb{N}$ and $\mathbb{N}_{-1} = \{-1\} \cup \mathbb{N}_0$. For the whole paper, $d \in \mathbb{N}$ denotes the dimension of the space. For families $(a_i)_{i\in I}$, $(b_i)_{i\in I}$ in $\mathbb{R}$ over an index set $I$, we write $a_i \lesssim b_i$ to denote the existence of a $C > 0$ such that $a_i \leq C b_i$ for all $i \in I$. We write $C_b$ for the space of continuous bounded functions and $C^\infty_b$ for the space of $C^\infty$ functions all of whose derivatives are bounded. We abbreviate function spaces and Besov spaces by omitting "$(\mathbb{R}^d)$" from the notation; for example, we abbreviate $B^\beta_{p,q}(\mathbb{R}^d)$ to $B^\beta_{p,q}$. Moreover, we write $C^\beta$ for $B^\beta_{\infty,\infty}$ and $C^\beta_p$ for $B^\beta_{p,\infty}$. We write $u \prec v$ for the paraproduct of $u$ and $v$ (pairing the low frequencies of $u$ with the high frequencies of $v$) and $u \circ v$ for the resonance product; we adopt the notation from [19] and refer to [2] for background material.
In the rest of the paper $(\rho_i)_{i \in \mathbb{N}_{-1}}$ is a dyadic partition of unity, meaning that $\rho_{-1}$ is supported in a ball around $0$, $\rho_0$ is supported in an annulus, $\rho_i = \rho_0(2^{-i}\cdot)$ for $i \in \mathbb{N}_0$, and $\sum_{i \in \mathbb{N}_{-1}} \rho_i = 1$. For $i \in \mathbb{N}_{-1}$ we write $\Delta_i$ for the corresponding Littlewood-Paley block ($\mathcal{F}$ denotes the Fourier transform): $$\Delta_i f = \mathcal{F}^{-1}(\rho_i\, \mathcal{F}f).$$ Moreover, we define $\Delta_{\geq 0} f$ to be the sum of the Littlewood-Paley blocks with nonnegative index: $\Delta_{\geq 0} f = \sum_{i \in \mathbb{N}_0} \Delta_i f$.

Diffusions with distributional drift and their heat-kernel estimates

Throughout this section we fix $T > 0$, $\alpha \in (0, \frac12)$ and $b \in C([0,T], B^{-\alpha}_{\infty,1}(\mathbb{R}^d, \mathbb{R}^d))$, and we consider the stochastic differential equation $$\mathrm{d}X_t = b(t, X_t)\,\mathrm{d}t + \mathrm{d}B_t, \qquad t \in [0,T]. \qquad (5)$$ For $t > 0$ let $L_t$ be the operator $$L_t u = \tfrac12 \Delta u + b(t, \cdot) \cdot \nabla u. \qquad (6)$$ We consider the following Cauchy problem for $u : [0,T] \times \mathbb{R}^d \to \mathbb{R}$ with terminal condition $\varphi$: $$\partial_t u + L_t u = 0 \ \text{ on } [0,T) \times \mathbb{R}^d, \qquad u(T, \cdot) = \varphi. \qquad (7)$$ The solution theory for the Cauchy problem will be given in Proposition 2.4; we write $u^\varphi$ for the solution to (7). But let us first discuss how to interpret (5) in terms of a martingale problem.

Definition 2.1. We say that a stochastic process $X = (X_t)_{t \in [0,T]}$ on a probability space $(\Omega, \mathbb{P})$ is a solution to the SDE (5) on $[0,T]$ with initial condition $X_0 = x$ if it satisfies the martingale problem for $((L_t)_{t \in (0,T]}, \delta_x)$, i.e., if $\mathbb{P}(X_0 = x) = 1$ and if, for every admissible terminal condition $\varphi$ and for $u = u^\varphi$ the solution to the Cauchy problem (7), the process $u(t, X_t)$, $t \in [0,T]$, is a martingale.
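As a concrete illustration of the dyadic decomposition $f = \sum_i \Delta_i f$ behind the blocks introduced above, the following sketch builds a partition of unity from a telescoping smooth cutoff and checks on a test function that the blocks sum back to $f$. This is a numerical toy (grid size, cutoff shape and the function $f$ are arbitrary choices), not the partition $(\rho_i)$ fixed in the text.

```python
import numpy as np

n = 256
x = np.linspace(0, 2*np.pi, n, endpoint=False)
f = np.cos(3*x) + 0.5*np.sin(17*x)              # test function on the torus
fhat = np.fft.fft(f)
xi = np.fft.fftfreq(n, d=2*np.pi/n) * 2*np.pi   # integer frequencies

def chi(r):
    """Smooth cutoff: 1 for |r| <= 1, 0 for |r| >= 2, smooth in between."""
    r = np.abs(r)
    out = np.ones_like(r)
    mid = (r > 1) & (r < 2)
    s = r[mid] - 1
    out[mid] = np.exp(-1/(1-s)) / (np.exp(-1/(1-s)) + np.exp(-1/s))
    out[r >= 2] = 0.0
    return out

J = int(np.ceil(np.log2(np.max(np.abs(xi)) + 1))) + 1
blocks = []
prev = chi(xi)                                  # rho_{-1}: supported in a ball
blocks.append(np.fft.ifft(prev * fhat).real)
for j in range(J):
    cur = chi(xi / 2**(j+1))
    rho_j = cur - prev                          # rho_j: supported in an annulus
    blocks.append(np.fft.ifft(rho_j * fhat).real)
    prev = cur

# The blocks telescope back to f (the top cutoff is identically 1 here).
recon = np.sum(blocks, axis=0)
print(np.max(np.abs(recon - f)))
```

Here the first block plays the role of $\rho_{-1}$ (supported in a ball) and each difference of cutoffs the role of a $\rho_j$ supported in an annulus, matching the support properties required above.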
Theorem 2.2. The martingale problem has a unique solution: there is a unique probability measure $\mathbb{P}^x$ on $\Omega = C([0,T], \mathbb{R}^d)$ such that the coordinate process $X_t(\omega) = \omega(t)$ satisfies the martingale problem for $((L_t)_{t \in (0,T]}, \delta_x)$. Moreover, $X$ is a strong Markov process under $\mathbb{P}^x$, and the measure $\mathbb{P}^x$ depends (weakly) continuously on the drift $b$.
Remark 2.3. The continuity of the solution $\mathbb{P}^x$ in terms of the drift is not mentioned in [5, Theorem 1.2], but it can be extracted from their proof.
Observe that Theorem 2.2 also implies that there exists a unique probability measure $\mathbb{P}^{s,x}$ on $C([s,T], \mathbb{R}^d)$ such that the coordinate process satisfies the martingale problem for $((L_t)_{t \in (s,T]}, \delta_x)$. This can be obtained by applying Theorem 2.2 to a time-shift of the drift, as mentioned in Remark 1.4.
Next, our aim is to show that $X$ admits a transition density $\Gamma_{s,t}$ for $0 \leq s < t \leq T$ (Proposition 2.9), which means that for $\varphi \in C_c(\mathbb{R}^d)$ and $x \in \mathbb{R}^d$, and with $\mathbb{P}^{s,x}$ as in Remark 2.3, $$\mathbb{E}^{s,x}[\varphi(X_t)] = \int_{\mathbb{R}^d} \varphi(y)\,\Gamma_{s,t}(x,y)\,\mathrm{d}y.$$ We do this by showing that $\Gamma_{t,T}(x,y) = u^{\delta_y}(t,x)$ for the solution $u^{\delta_y}$ to (7) with terminal condition $u(T,\cdot) = \delta_y$.
In order to construct the solution $u^{\delta_y}$ we have to slightly extend the results of [5]. Indeed, in [5, Theorems 3.1 and 3.2] the well-posedness of the Cauchy problem is shown for $\varphi \in C^\beta$ with $\beta \in (1+\alpha, 2-\alpha)$, and $\delta_y$ is not in this space. The solution theory in [5] is formulated in terms of mild solutions: a mild solution of (7) is a fixed point $u$ of the map $\Phi$ defined in (9), i.e., $\Phi u = u$, where $P_t\varphi := p(t,\cdot) * \varphi$ for $t > 0$ and $P_0\varphi = \varphi$ (that $\Phi$ is well defined follows by Lemma 2.6). In order to allow $\delta_y$ as a terminal condition, we will consider a different space that "allows a blowup as $t \uparrow T$". For notational elegance, we instead consider a space with "a blowup at $0$" and note that $u$ is a fixed point of $\Phi$ if and only if $v$ given by $v(t,\cdot) = u(T-t,\cdot)$ is a fixed point of the map $\Theta$ in (10), so that we call $v$ a mild solution of the time-reversed problem (11). We will show that $\Theta$ has a fixed point in the following space (for suitable $\delta$, $\beta$): for $\delta \geq 0$, $\beta \in \mathbb{R}$ and $t > 0$ we define $\mathcal{M}^\delta_t C^\beta_p$ as the space of $v \in C((0,t], C^\beta_p)$ for which $s \mapsto s^\delta \|v_s\|_{C^\beta_p}$ is bounded. The following proposition is a slight extension of [5, Theorems 3.1 and 3.2]; as the relevant spaces are related by continuous embeddings, this does not make much of a difference. But our heat-kernel estimates depend on the $B^{-\alpha}_{\infty,1}$-norm, and for their derivation it is more convenient to work with $B^{-\alpha}_{\infty,1}$. Before we prove Proposition 2.4 we present two auxiliary facts, Lemmas 2.5 and 2.6. We write $B$ for the beta function (see e.g. [1, Section 1.1]), which is the function $$B(\gamma, \beta) = \int_0^1 s^{\gamma-1}(1-s)^{\beta-1}\,\mathrm{d}s, \qquad \gamma, \beta > 0. \qquad (12)$$

Lemma 2.5. There exists a $C > 0$ such that the bounds (13) and (14) hold for all $t \in (0,1]$.

Proof. In [12, Lemma A.7] it is proven (for $p = \infty$, but the proof can be carried over mutatis mutandis to general $p \in [1,\infty]$) that for all $\kappa \geq 0$ and $\gamma \in \mathbb{R}$ there exists a $C > 0$ such that (15) holds for all $t \in (0,1]$, which implies the first bound in (13). The second bound in (13) also follows from (15). The bound (14) is proven in [12, Lemma A.9] as well; we give the proof here to be self-contained. By applying (15) we obtain the stated estimate for $t \in (0,1]$. This proves (14).
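Schematically, under the time reversal $v(t,\cdot) = u(T-t,\cdot)$, the two fixed-point maps and the weighted norm take the following form (a sketch reconstructed from the surrounding definitions; the precise formulation is as in [5]):

```latex
\Phi u\,(t) = P_{T-t}\varphi + \int_t^T P_{r-t}\bigl(b_r \cdot \nabla u_r\bigr)\,\mathrm{d}r,
\qquad
\Theta v\,(t) = P_t\varphi + \int_0^t P_{t-r}\bigl(b_{T-r} \cdot \nabla v_r\bigr)\,\mathrm{d}r,
\qquad
\|v\|_{\mathcal{M}^{\delta}_{t} C^{\beta}_{p}} \;=\; \sup_{s \in (0,t]} s^{\delta}\,\|v_s\|_{C^{\beta}_{p}}.
```

With this form, a fixed point of $\Theta$ in $\mathcal{M}^{\delta}_{t} C^{\beta}_{p}$ may blow up at rate $s^{-\delta}$ as $s \downarrow 0$, which is what accommodates the terminal condition $\delta_y$.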
Proof of Proposition 2.4. If $\gamma \geq \beta$, then the statement follows directly from [5, Theorem 3.2]. Therefore we assume that $\gamma < \beta$, and it is sufficient to show that the statement holds with "$t_0$" in place of "$T$", where $t_0$ will be chosen small, as we can then extend the solution. As mentioned before, it is sufficient to consider the fixed-point problem for $\Theta$ as in (10) instead of $\Phi$. Let us write $\Theta^\varphi_t$ for $\Theta$ as in (10) but with "$T$" replaced by "$t$". We will show that there exists a $t_0$ such that (a) $\Theta^\varphi_{t_0}$ has a unique fixed point in $\mathcal{M}^{\frac{\beta-\gamma}{2}}_{t_0} C^\beta_p$ (observe that by assumption $\kappa > 0$ and $\delta \in (0,1)$). That $\Theta^\varphi_t v$ is an element of $C((0,t], C^\beta_p)$ follows by Lemma 2.5. Therefore, with (17), $\Theta^\varphi_t$ maps $\mathcal{M}^{\frac{\beta-\gamma}{2}}_t C^\beta_p$ to itself. By (18) it then follows that for sufficiently small $t_0$ the map $\Theta^\varphi_{t_0}$ is a contraction on the Banach space $\mathcal{M}^{\frac{\beta-\gamma}{2}}_{t_0} C^\beta_p$, and it has a unique fixed point in that space.
(b) When $t_0$ is as above, the claim follows from estimates similar to the ones above (use that $\gamma < \beta$ and (14) with $\beta = -\alpha$ and $\delta = 0$).

(c) This follows from the second bound in (13) and an estimate obtained by Lemma 2.6, similarly to (17).

(d) Let us write $v^{\varphi,b}$ for the solution of (11) (with $L_t$ as in (6)). To see the continuity of the solution with respect to $b$ and $\varphi$: by Lemma 2.5 and Lemma 2.6 we obtain the corresponding estimates, and hence there exists a $\delta \in (0, t_0)$ small enough such that for $t \in (0,\delta]$ we obtain the desired continuity. By an iteration argument we obtain the continuity for all $t \in (0, t_0]$; for example, for $t \in (\delta, 2\delta]$ we argue in the same way starting from time $\delta$.

(e) It remains to show that we can increase the integrability from $p$ to $\infty$, i.e., that $v_t \in C^\beta$ for all $t > 0$ and that, also as an element of $C^\beta$, the solution $v_t$ for fixed $t > 0$ depends continuously on $b$ and $\varphi$. First we show that if $t > 0$, then $v_s \in C^\beta$ for all $s > t$. To simplify notation we only consider the most extreme case $p = 1$; the argument for general $p$ is essentially the same. Let $n \in \mathbb{N}_0$ be chosen accordingly, write $p_0 = 1$, and define $p_i$ for $i \in \{1, \dots, n\}$. Then the stated inclusions hold for all $i \in \{1, \dots, n-1\}$ and $v_t \in C^\beta$, so indeed $v_s \in C^\beta$ for all $s > t$. As $t$ was arbitrary, we have shown that $v_t \in C^\beta$ for all $t > 0$. As all the inclusions "$\subset$" above are continuous embeddings, the continuity of the solution with respect to $\varphi$ and $b$ follows from the continuity shown in (d).

A direct computation then gives the following.
Proof. The continuity follows from Proposition 2.4.
Because the relevant estimates hold with a constant $C > 0$ uniformly in $s \in [0,\infty)$, a "$3\varepsilon$ argument" yields the claimed convergence. Let $\Gamma_{t,T} : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ be defined by $\Gamma_{t,T}(x,y) = u^{\delta_y}(t,x)$, and let $\mathbb{P}^{t,x}$ be the unique probability measure on $C([t,T], \mathbb{R}^d)$ such that the coordinate process $X$ is a solution to the SDE (5). Using that the measures associated with the approximating drifts weakly converge to $\mathbb{P}^{t,x}$ (Theorem 2.2), together with the uniform convergence in Corollary 2.8, we obtain for $\varphi \in C_c(\mathbb{R}^d)$:

Heat-kernel upper bounds
Here we prove the upper bound (2) of the heat-kernel estimates in Theorem 1.1. We follow the "parametrix" approach from Friedman's book [9]: we write $\Gamma_t$ as a series (see Lemma 3.3) and bound each term of the series, thereby obtaining a bound for the whole series and thus for $\Gamma_t$. Usually the point of the parametrix method is to deal with non-constant diffusion coefficients, but the approach is still useful for us even though our diffusion coefficient is constant.
Because of Corollary 2.8 we can restrict our attention to smooth $b$ by a limiting argument. For the rest of this section we fix $\alpha \in (0, \frac12)$ and $c > 1$ as in Theorem 1.1.
By duality and Bernstein's inequality, see [2, Proposition 2.76 and Lemma 2.1], we have the estimate (21), which bounds $|\langle f, g \rangle|$ in terms of $\|f\|_{B^{-\alpha}_{\infty,1}}$ and the $L^1$-norms of $g$ and $\nabla g$. We will apply this bound to functions $g$ that are Gaussian, and therefore we need estimates for derivatives of Gaussian functions. So we recall the following bound. Let $$p(t,x) = (2\pi t)^{-\frac d2}\,\mathrm{e}^{-\frac{|x|^2}{2t}}, \qquad (t,x) \in (0,\infty) \times \mathbb{R}^d,$$ be the standard Gaussian kernel. For the space derivatives $\partial^\mu p$ we have the following estimate: there exist $C > 0$ and $c > 1$ such that $|\partial^\mu p(t,x)| \leq C\, t^{-|\mu|/2}\, p(ct,x)$.

The proof of the upper bound (2) essentially follows by iterating the previous two observations. To carry out the argument we need the following result, which allows us to write $\Gamma$ as an infinite series.

Lemma 3.3. Let $t > 0$ and $y \in \mathbb{R}^d$. For $s \in [0,t)$ and $x \in \mathbb{R}^d$ we define $\Psi^{y,1}_{s,t}(x)$ by (22) and, recursively, $\Psi^{y,k+1}_{s,t}(x)$ by (23). Then for all $k \in \mathbb{N}$ the map $\Psi^{y,k}$ is well defined, and moreover (with $\Gamma_{s,t}$ as in Proposition 2.9) the series representation (24) holds.

Proof. By (21) we know that $\Psi^{y,1}$ equals the inner product of $-b(t-s,x)$ with a convolution in space and time. Therefore, by applying the $L^1$-inequality for convolutions (Young's inequality) to the space as well as the time convolution, we obtain a bound from which we conclude that $\Psi^{y,k}$ is well defined for all $k \in \mathbb{N}$. It remains to show (24). As $\Gamma_{s,t}(x,y) = u^{\delta_y}(s,x)$, with $u^{\delta_y}$ the fixed point of the map $\Phi$ as in (9) with $\varphi = \delta_y$, we have a fixed-point identity for $\Gamma$ with $u = u^{\delta_y}$. From a Picard iteration it follows that $\Gamma$ is the limit of the sequence of Picard iterates $\Gamma^n$. Therefore $\Gamma^1_{s,t}(x,y) = p(t-s, x-y)$, and we obtain (24) recursively (see also [9, Chapter 1.4]). This proves (24).
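The Gaussian derivative bound just recalled (a first derivative is controlled by $t^{-1/2}$ times a Gaussian with slightly larger variance) can be checked numerically. The following sketch does this in $d = 1$ with the illustrative choice $c = 2$; the resulting constant $2\mathrm{e}^{-1/2} \approx 1.213$ is specific to these choices.

```python
import numpy as np

def p(t, x):
    """Standard Gaussian heat kernel on R (d = 1)."""
    return np.exp(-x**2 / (2*t)) / np.sqrt(2*np.pi*t)

def dp(t, x):
    """First space derivative of p."""
    return -(x/t) * p(t, x)

# Check |p'(t,x)| <= C t^{-1/2} p(c t, x) with c = 2: the ratio equals
# sqrt(2) * (|x|/sqrt(t)) * exp(-x^2/(4t)), maximised at |x| = sqrt(2 t).
c, ratios = 2.0, []
for t in [0.01, 0.1, 1.0, 10.0]:
    x = np.linspace(-20*np.sqrt(t), 20*np.sqrt(t), 2001)
    ratios.append(np.max(np.abs(dp(t, x)) / (t**-0.5 * p(c*t, x))))
print(max(ratios))  # ~ 2 e^{-1/2} ≈ 1.213, uniformly over t
```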
3.4. Now let us come back to Remark 1.4. Observe that on the right-hand side of (24) the dependence on $t$ sits in the functions $\Psi^{y,k}$, and the rest is a function of $t - s$. This allows us to take the first time variable, $s$, equal to zero, and prove the heat-kernel bounds as in Theorem 1.1. From now on we write "$\Gamma_t$" for "$\Gamma_{0,t}$".
Note that the first term on the right-hand side of (24) is already bounded by the right-hand side of (2). Therefore, we will recursively estimate the remaining terms. This will be done with the help of some auxiliary lemmas, which follow below.
As we write $P_t g = p(t,\cdot) * g$ (see (9)), these bounds transfer to $P_t$. For any given norm $\|\cdot\|$ we write $\|\nabla f\|$ for the corresponding norm of the gradient of $f$.

Lemma 3.6. There exists a $C > 0$ (independent of $b$) such that the bound (25) holds for all $\mu \in \mathbb{N}^d_0$ with $|\mu| \leq 2$, $y \in \mathbb{R}^d$ and $t, s, r \in (0,\infty)$ with $t > s > r$.

Proof. We abbreviate $g_{t,s,r}$ by $g$ and write $g$ in terms of an auxiliary function $h$. We estimate both $\|h\|_{L^1}$ and $\|\nabla h\|_{L^1}$ using (21). Similarly, in combination with Leibniz's rule, we obtain the analogous bounds for the derivatives. Using the above and that $(a+b)^\alpha \leq a^\alpha + b^\alpha$ for $a, b \geq 0$, we obtain (25).
3.7. Now we apply the above lemma to our setting. But first, let us introduce some notation: for $k \in \mathbb{N}$, $t \geq 0$, $i \in \{0,1\}$ and $\beta \in \{0,\alpha\}$ we define the quantities $I^\beta_{i,k}$. We are interested in bounds for the $I^0_{i,k}$ only, but in order to describe a recursive relation for them, as we will see in the next lemma, we also need the $I^\alpha_{i,k}$.
Lemma 3.8. Let $C > 0$ be as in Lemma 3.6. Then the recursive bound (26) holds for all $k \in \mathbb{N}$, $t \geq 0$, $i \in \{0,1\}$ and $\beta \in \{0,\alpha\}$.

Proof. We claim that (27) holds for all $k \in \mathbb{N}$, $y \in \mathbb{R}^d$ and $i \in \{0,1,2\}$. From this, (26) follows by the definition of $I^\beta_{i,k}$. Now let us prove (27). Let $g_{t,s,r}$ be as in Lemma 3.6 with $f = \Psi^{y,k}_{r,t}$. By the definition (23) of $\Psi^{y,k+1}_{s,t}$ we can write it as an iterated integral, so that (one can verify the interchange of integrals by Fubini's theorem, using Lemma 3.3) the claim reduces to Lemma 3.6. With this, (27) follows from (25).
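The iterated time integrals produced by this recursion evaluate to beta functions, and the estimates below rest on the classical identity $B(\gamma,\beta) = \Gamma(\gamma)\Gamma(\beta)/\Gamma(\gamma+\beta)$. A quick numerical check of this identity (midpoint quadrature, with illustrative non-singular parameter values):

```python
import math

def beta_quad(a, b, n=20000):
    """B(a,b) = int_0^1 s^(a-1) (1-s)^(b-1) ds via the midpoint rule."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h)**(a - 1) * (1 - (i + 0.5) * h)**(b - 1)
                   for i in range(n))

# Verify the Gamma-function identity used to bound B(gamma, beta).
for a, b in [(1.0, 1.0), (1.5, 2.0), (2.0, 3.0)]:
    exact = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    assert abs(beta_quad(a, b) - exact) < 1e-6
print("ok")
```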
In the proof of Lemma 3.10 we will use the following bound, Lemma 3.9, for the beta function (see (12)).
Proof. By [1, Theorem 1.1.4 and Theorem 1.4.1] we have, for $\gamma, \beta > 0$, the identity $B(\gamma,\beta) = \frac{\Gamma(\gamma)\Gamma(\beta)}{\Gamma(\gamma+\beta)}$ together with the standard asymptotics of the Gamma function. From this we deduce the claimed bounds.

Let us now use the recursive relation for the $I^\beta_{i,k}$ and the bounds on the beta function to obtain estimates for the $I^\beta_{i,k}$.

Lemma 3.10. Let $C > 0$ be as in Lemma 3.6 and let $M = 8M_{\frac12-\alpha}$, with $M_\delta$ as in Lemma 3.9. There exists a $K > 0$ (independent of $b$) such that the stated bounds hold for all $k \in \mathbb{N}$, $t > 0$, $\beta \in \{0,\alpha\}$ and $i \in \{0,1\}$.

Proof. We give a proof by induction. Instead of "$\|\Delta_{-1}b\|_{C_tL^\infty}$" and "$\|\Delta_{\geq 0}b\|_{C_tB^{-\alpha}_{\infty,1}}$" we will write "$X$" and "$Y$", respectively.
This shows that the power of $t$ is the right one. We bound the beta-function terms to finish the proof; by Lemma 3.9 the claimed estimate follows.

Remark 3.11. The restriction $\alpha \in (0, \frac12)$ in Lemma 3.10 is necessary, since $M = 8M_{\frac12-\alpha}$ diverges as $\alpha \uparrow \frac12$ (see the definition of $M_\delta$ in Lemma 3.9). This is not unexpected: for $\alpha \geq \frac12$ we are no longer in the Young regime and we would need techniques like paracontrolled distributions or regularity structures to solve the equation for $\Gamma$.

Lemma 3.10 together with the following basic inequality constitutes the proof of Theorem 1.1.

Lemma 3.12. Let $\beta \in (0,1)$. Then there exists an $L > 0$ such that the corresponding bound holds for all $z \geq 0$.

Lemma 3.13. There exists a $C > 0$ (independent of $b$) such that the bounds (29) and (30) hold for all $\mu \in \mathbb{N}^d_0$ with $|\mu| \leq 1$ and for all $t > 0$, $x, y \in \mathbb{R}^d$.

Proof. To show both (29) and (30), it is sufficient to bound the series on the right-hand side of (29), term by term in modulus, by the right-hand side of (30). Let $K, C, M$ be as in Lemma 3.10. Again, we write "$X$" and "$Y$" instead of "$\|\Delta_{-1}b\|_{C_tL^\infty}$" and "$\|\Delta_{\geq 0}b\|_{C_tB^{-\alpha}_{\infty,1}}$".
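An inequality of Mittag-Leffler type fits the role of Lemma 3.12 here: sums of the form $\sum_k z^k/\Gamma(\beta k + 1)$ grow at most like $\mathrm{e}^{L z^{1/\beta}}$, which is what makes the series over $k$ summable with Gaussian-type control. A numerical sanity check for $\beta = \frac12$, where $E_{1/2}(z) = \mathrm{e}^{z^2}(1 + \operatorname{erf} z) \leq 2\,\mathrm{e}^{z^2}$ (the constant $L = 2$ is an illustrative choice, not the $L$ of the lemma):

```python
import math

def mittag_leffler(beta, z, terms=200):
    """Partial sum of E_beta(z) = sum_k z^k / Gamma(beta*k + 1)."""
    return sum(z**k / math.gamma(beta*k + 1) for k in range(terms))

beta, L = 0.5, 2.0
for z in [0.0, 0.5, 1.0, 2.0, 5.0]:
    # E_{1/2}(z) <= 2 exp(z^2) = L exp(L^{...}) with z^{1/beta} = z^2.
    assert mittag_leffler(beta, z) <= L * math.exp(L * z**(1/beta))
print("ok")
```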

Now we can prove the heat-kernel lower bounds:
Proof of the heat-kernel lower bound (3) of Theorem 1.1. We want to apply Lemma 4.1, and therefore we will find an $a$ such that its condition is satisfied. Once more we write "$X$" and "$Y$" instead of "$\|\Delta_{-1}b\|_{C_tL^\infty}$" and "$\|\Delta_{\geq 0}b\|_{C_tB^{-\alpha}_{\infty,1}}$". Let $\alpha \in (0, \frac12)$, $c > 1$ and $C > 0$ be as in Lemma 3.13. Then (30) gives a bound for $a > 0$, $t \in (0,a]$ and $x, y \in \mathbb{R}^d$ with $|x-y| \leq \sqrt t$, and therefore a lower bound for $\Gamma_t(x,y)$. Hence there exists a $K \in (0,1)$ (which only depends on $c$, $C$ and $\alpha$) such that the choice $a = K[X^2 + Y^{\frac{2}{1-\alpha}}]^{-1}$ works. So by Lemma 4.1 there exist a $\kappa \in (0,1)$ and an $M > 1$ such that the stated bound holds for all $t \in [0,\infty)$ and $x, y \in \mathbb{R}^d$. This proves that (3) holds for a large enough $C$.

Proof of Corollary 1.2
As before, we consider $b \in C([0,T], B^{-\alpha}_{\infty,1})$ for some $\alpha \in (0, \frac12)$ and we let $X = (X_t)_{t \in [0,T]}$ be the solution to the martingale problem for $((L_t)_{t \in (0,T]}, \delta_x)$. We prove Corollary 1.2, which means that we estimate the probability that $X$ escapes a box of size $K$ before time $T$. The estimate is a consequence of our heat-kernel estimates (Theorem 1.1), Markov's inequality and the Garsia-Rodemich-Rumsey inequality. By the latter (see [21, Theorem 2.1.3]) we have the bound (31) for $\kappa > 0$. In the proof of Corollary 5.2 we will bound the right-hand side of (31) in terms of a function $\zeta$. In the next lemma we start by gathering some auxiliary facts about $\zeta$, which involves the factor $\log(\frac1r) \vee 1$.
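For orientation: in the drift-free case $b = 0$ the escape probability can be bounded directly by the reflection principle, $\mathbb{P}(\sup_{t \leq T}|B_t| \geq K) \leq 2\,\mathbb{P}(|B_T| \geq K)$. A Monte Carlo sketch for $d = 1$, $T = 1$ (sample sizes and the slack constant are illustrative choices):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def escape_prob_mc(K, T=1.0, n_paths=20000, n_steps=500):
    """Monte Carlo estimate of P(sup_{t<=T} |B_t| >= K) for 1-d BM."""
    dt = T / n_steps
    inc = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    paths = np.cumsum(inc, axis=1)
    return np.mean(np.max(np.abs(paths), axis=1) >= K)

for K in [1.0, 1.5, 2.0]:
    p_mc = escape_prob_mc(K)
    # Reflection-principle bound: P(sup |B| >= K) <= 2 P(|B_1| >= K).
    p_refl = 2 * (1 - erf(K / sqrt(2)))
    assert p_mc <= p_refl + 0.02   # small Monte Carlo / discretisation slack
print("ok")
```

The heat-kernel estimates of Theorem 1.1 are what replaces the reflection principle once a singular drift is present.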
Proof. That $\psi$ is strictly increasing on $(\frac1e, \infty)$ is clear, while on $[0, \frac1e)$ it follows by computing its derivative. Since $\psi$ and $\zeta$ are continuous and bounded away from $0$ and $\infty$ on compact subintervals of $(0,\infty)$, the existence of such $m$ and $M$ follows once we establish the corresponding limits at $0$ and $\infty$.

Corollary 5.2. Let $\psi$ be as in Lemma 5.1 and let $C > 0$ be as in Theorem 1.1. Then there exists an $M > 0$ such that (33) holds for all $T \geq 1$.

Proof. The proof is inspired by [11, Corollary A.5]. Unfortunately, we cannot directly apply that result, because the constant derived there depends on the time interval $[0,T]$ (even though this is not explicitly stated).