1 Introduction

In this paper we study the stochastic differential equation (SDE)

$$\begin{aligned} \mathrm d X_t = b(t, X_t) \mathrm dt + \mathrm d W_t, \quad X_0 \sim \mu , \end{aligned}$$
(1.1)

where \(X_t\in \mathbb {R}^d\), the process \((W_t)\) is a d-dimensional Brownian motion, \(\mu \) is any probability measure on \(\mathbb {R}^d\) and the drift \(b(t, \cdot )\) is an element of a negative Besov space \(\mathcal C^{(-\beta )+}\); see below for the precise definition. SDE (1.1) is clearly only formal at this stage, because the drift b cannot even be evaluated at the point \(X_t\), so one first needs to define a notion of solution for this kind of SDE. We tackle this problem from two different viewpoints. In the first part we set up a martingale problem and show its well-posedness. In the second part we identify the dynamics of the solution of the martingale problem.

The first steps in the study of the SDE in dimension 1 (and with a diffusion coefficient \(\sigma \)) were taken in [13, 14, 22]. In dimension \(d>1\) we mention the work [12], where the authors introduced the notion of virtual solution, whose construction depended a priori on a real parameter \(\lambda \). The setting there was also slightly different, because the function spaces were negative fractional Sobolev spaces \(H^{-\beta }_q\) rather than Besov spaces. Other authors have since studied SDEs with distributional coefficients; we mention in particular [1, 5, 7, 29]. The main idea in all these works, which is also the one we develop in the first part of the present paper, is to frame the SDE as a martingale problem; hence, the main goal is to find a domain \({\mathcal {D}}_{{\mathcal {L}}}\) that characterises the martingale solution in terms of the quantity

$$\begin{aligned} f(t, X_t) -f(0, X_0) - \int _0^t ({\mathcal {L}} f) (s, X_s) \textrm{d}s, \end{aligned}$$
(1.2)

for all \(f\in {\mathcal {D}}_{{\mathcal {L}}}\), where \({\mathcal {L}}\) is the parabolic generator of X formally given by \({\mathcal {L}} f =\partial _t f + \tfrac{1}{2} \Delta f +\nabla f \, b\). This is made rigorous using results on the PDE

$$\begin{aligned} \left\{ \begin{array}{l} {\mathcal {L}} f = g\\ f(T) = f_T, \end{array} \right. \end{aligned}$$

developed in [18].

Our framework in terms of function spaces is slightly different from that of all the works cited above. In the first part of the article, the only difference is that we allow the initial condition \(X_0\) to be any random variable, and not only a Dirac delta at a point x. Well-posedness of the PDE \({\mathcal {L}} f =g\) allows us to give a proper meaning to the martingale problem. Various regularity results on the PDE, together with a transformation of the solution X into the solution Y of a ‘standard’ (Stroock–Varadhan) martingale problem (see Sect. 3), allow us to show existence and uniqueness of the solution X to the martingale problem, see Theorem 4.5. We also prove other interesting results, such as Theorem 4.2, where we show that the law density of the solution X satisfies the Fokker–Planck equation, which is a PDE with negative Besov coefficients. Furthermore, we show in Theorem 4.3 some tightness results for smoothed solutions \(X^n\) when the negative Besov coefficients are smoothed.

The main novelty of this paper is the second part, where we study the SDE \( X_t = X_0 + \int _0^t b(s, X_s) \textrm{d}s + W_t \) from a different point of view; in particular, we look into the dynamics of the process itself. One natural question to ask, which is well understood in the classical Stroock–Varadhan case where b is a locally bounded function, is the equivalence between the solution to the martingale problem and the solution in law (i.e. weak solution) of the SDE. In the case of SDEs with distributional coefficients, the first challenging problem is to define a suitable notion of solution of the SDE and then to study well-posedness of that equation. To this aim, we start in Sect. 5 by showing that the solution to the martingale problem is a weak Dirichlet process, for which we identify the martingale component in its canonical decomposition, see Proposition 5.11 and Remark 5.12. We then introduce in Sect. 6 our notion of solution for the SDE, involving a ‘local time’ operator which plays the role of the integral \(\int _0^t b(s, X_s) \textrm{d}s\) and involving weak Dirichlet processes. Under further mild assumptions on b, for example if it has compact support, in Theorem 6.5 we show that a solution to the martingale problem is also a solution to the SDE. In a slightly more restricted framework, in Proposition 6.12 we obtain the converse result, hence providing the equivalence of SDEs and martingale problems for distributional drifts, see Corollary 6.13. These results extend [22, Propositions 6.7 and 6.10], stated in dimension 1 and in the case of time-homogeneous coefficients.

A typical example of drift b for which all our results are valid arises when b is a quenched realisation of an independent noise \(\dot{B}_x(\omega )\), a generalised random field whose trajectories are the divergence of a \((1-\beta )\)-Hölder continuous function \(x\mapsto B_x(\omega )\) for some \(\beta \in (0,\frac{1}{2})\), cut off by a smooth function with compact support. These models arise when describing the motion of particles propagating in an irregular medium, see [27] and references therein. The class of these noises is large, and in dimension \(d=1\) it includes for instance fractional, bifractional and multifractional Brownian motions with Hurst index greater than \(\frac{1}{2}\), cut off so that they have compact support.

A result connected to ours is provided by [6], where the authors study the case when the driving noise is a Lévy \(\alpha \)-stable process and the distributional drift lives in a general Besov space \({\mathbb {B}}^{-\beta }_{p,q}\). In particular, they formulate the martingale problem and a quite different notion of SDE (for which, in \(d=1\) they even study pathwise uniqueness, extending in this way [22, Corollary 5.19], stated for Brownian motion) and prove that a solution to the martingale problem is also a solution to their SDE. However, they do not prove the converse result; hence, they do not have any equivalence.

The paper is organised as follows. In Sect. 2 we introduce the framework in which we work, in particular the various function spaces appearing in the paper and many useful results from the companion paper [18]. In Sect. 3 we introduce the martingale problem and transform it into a classical equivalent Stroock–Varadhan martingale problem. In Sect. 4 we show existence and uniqueness of a solution to the martingale problem and various other properties. In Sect. 5 we show that the solution to the martingale problem is a weak Dirichlet process and identify its decomposition. In Sect. 6 we introduce the notion of solution to the SDE and show its equivalence to the martingale problem. Finally, in Appendix A we state a useful result on solutions of (classical) PDEs that we use in the paper.

2 Setting and Preliminary Results

2.1 Function Spaces

Let us denote by \( C^{1,2}_{buc}:= C^{1,2}_{buc}([0,T]\times \mathbb {R}^d)\) the space of all \(C^{1,2}\) real functions such that the function and its gradient in x are bounded, and the Hessian matrix and the time-derivative are bounded and uniformly continuous. Let us denote by \( C^{1,2}_{c}:= C^{1,2}_{c}([0,T]\times \mathbb {R}^d)\) the space of \( C^{1,2} ([0,T]\times \mathbb {R}^d)\) functions with compact support. Let us denote by \(C_b^{1,2}:= C^{1,2}_{b}([0,T]\times \mathbb {R}^d) \) the space of \(C^{1,2}\)-functions that are bounded with bounded derivatives. We also use the notation \(C^{0,1}:=C^{0,1}([0,T]\times \mathbb {R}^d)\) to indicate the space of real functions whose gradient in x is uniformly continuous in (t, x). Let \(C_c^\infty := C_c^\infty (\mathbb {R}^d)\) denote the space of all smooth real functions with compact support. We denote by \(C_c=C_c(\mathbb {R}^d)\) the space of \(\mathbb {R}\)-valued continuous functions with compact support. Let \({\mathcal {S}}={\mathcal {S}}(\mathbb R^d )\) be the space of real-valued Schwartz functions on \(\mathbb R^d\) and \({\mathcal {S}}'={\mathcal {S}}'({\mathbb {R}}^d )\) the space of Schwartz distributions. The corresponding dual pairing will be denoted by \(\langle \cdot , \cdot \rangle \).

For \(\gamma \in {\mathbb {R}}\) we denote by \({\mathcal {C}}^\gamma = \mathcal C^\gamma ({\mathbb {R}}^d)\) the Besov space (or Hölder–Zygmund space), endowed with its norm \(\Vert \cdot \Vert _{\gamma }\). For more details see [2, Section 2.7, p. 99] and also [18], where we recall all useful facts and definitions about these spaces. If \(\gamma \in \mathbb {R}^+ {\setminus } {\mathbb {N}}\) then the space coincides with the classical Hölder space. If \(\gamma <0\) then the space includes some Schwartz distributions. We have \({\mathcal {C}}^\gamma \subset {\mathcal {C}}^\alpha \) for any \(\gamma >\alpha \). Moreover, it holds that \(L^\infty \subset {\mathcal {C}}^0\) (see [17] for a proof in the case of anisotropic Besov spaces). We denote by \(C_T {\mathcal {C}}^\gamma \) the space of continuous functions on [0, T] taking values in \(\mathcal C^\gamma \), that is \(C_T {\mathcal {C}}^\gamma := C([0,T]; \mathcal C^\gamma )\). For any given \(\gamma \in \mathbb {R}\) we denote by \(\mathcal C^{\gamma +}\) and \({\mathcal {C}}^{\gamma -}\) the spaces given by

$$\begin{aligned} {\mathcal {C}}^{\gamma +}:= \cup _{\alpha >\gamma } {\mathcal {C}}^{\alpha }, \qquad {\mathcal {C}}^{\gamma -}:= \cap _{\alpha <\gamma } \mathcal C^{\alpha }. \end{aligned}$$

Note that \({\mathcal {C}}^{\gamma +}\) is an inductive space. We will also use the space \(C_T {\mathcal {C}}^{\gamma +}:=C([0,T]; {\mathcal {C}}^{\gamma +})\); membership \(f\in C_T {\mathcal {C}}^{\gamma +} \) is equivalent to the existence of \(\alpha >\gamma \) such that \(f\in C_T {\mathcal {C}}^{\alpha }\), see for example [19, Appendix B]. Similarly, we use the space \(C_T {\mathcal {C}}^{\gamma -}:=C([0,T]; {\mathcal {C}}^{\gamma -})\), meaning that if \(f\in C_T {\mathcal {C}}^{\gamma -} \) then for any \(\alpha <\gamma \) we have \(f\in C_T {\mathcal {C}}^{\alpha }\). We denote by \({\mathcal {C}}_c^{\gamma }={\mathcal {C}}_c^{\gamma }(\mathbb {R}^d)\) the space of elements in \({\mathcal {C}}^{\gamma }\) with compact support. Similarly when \(\gamma \) is replaced by \(\gamma +\) or \(\gamma -\). When defining the domain of the martingale problem, we will work with spaces of functions which are the limit of functions with compact support, so that they are Banach spaces. More precisely, let us denote by \(\bar{{\mathcal {C}}}_c^\gamma = \bar{{\mathcal {C}}}_c^\gamma (\mathbb {R}^d)\) the space

$$\begin{aligned} \bar{{\mathcal {C}}}_c^\gamma := \{f \in {{\mathcal {C}}}^\gamma \text { such that } \exists (f_n) \subset {{\mathcal {C}}}_c^\gamma \text { and } f_n \rightarrow f \text { in } {{\mathcal {C}}}^\gamma \}. \end{aligned}$$

As above we denote the inductive space and intersection space as

$$\begin{aligned} \bar{{\mathcal {C}}}_c^{\gamma +}:= \cup _{\alpha >\gamma } \bar{\mathcal C}_c^{\alpha }, \qquad \bar{{\mathcal {C}}}_c^{\gamma -}:= \cap _{\alpha <\gamma } \bar{{\mathcal {C}}}_c^{\alpha }. \end{aligned}$$

The main reason for introducing this class of subspaces is that the spaces \(\bar{{\mathcal {C}}}_c^{\gamma +}\) are separable, as proved in [18, Lemma 5.7], unlike the classical Besov spaces \({\mathcal {C}}^{\gamma }\) and \( {{\mathcal {C}}}^{\gamma +}\), which are not separable. Similarly as above, we use the space \(C_T \bar{\mathcal C}_c^{\gamma +}:=C([0,T]; \bar{{\mathcal {C}}}_c^{\gamma +})\); in particular we observe that if \(f\in C_T \bar{\mathcal C}_c^{\gamma +} \) then for any \(\alpha <\gamma \) we have \(f\in C_T \bar{{\mathcal {C}}}_c^{\alpha }\) by [19, Remark B.1, part (ii)]. Moreover, in [18, Corollary 5.8] we show that \(C_T \bar{\mathcal C}_c^{\gamma +}\) is separable. Note that if f is continuous and such that \(\nabla f \in C_T {\mathcal {C}}^{0+}\) then \(f\in C^{0,1}\).

Note that for all function spaces introduced above we use the same notation to indicate \(\mathbb {R}\)-valued functions but also \(\mathbb {R}^d\)- or \(\mathbb {R}^{d\times d}\)-valued functions. It will be clear from the context which space is needed. When \(f:\mathbb {R}^d \rightarrow \mathbb {R}^m\) is differentiable, we denote by \(\nabla f\) the matrix given by \((\nabla f)_{i,j} = \partial _i f_j\). In particular, when \(f: \mathbb {R}^d \rightarrow \mathbb {R}\) then \(\nabla f\) is a column vector and we denote the Hessian matrix of f by Hess(f).

For \(\gamma \in (0,1)\) we define the space \(D{\mathcal {C}}^\gamma \) as

$$\begin{aligned} D{\mathcal {C}}^\gamma := \{ h: \mathbb {R}^d \rightarrow \mathbb {R}\text { differentiable s.t. } \nabla h \in {\mathcal {C}}^\gamma \}, \end{aligned}$$

and we set \(C_T D{\mathcal {C}}^\gamma := C([0,T]; D{\mathcal {C}}^\gamma )\). Note that the inclusion \({\mathcal {C}}^{1+\alpha } \subset D{\mathcal {C}}^\alpha \) holds. Analogously to the \(\mathcal C^{\gamma +}\)-spaces, for \(\gamma >0\) we also introduce the spaces

$$\begin{aligned} D {\mathcal {C}}^{\gamma +}:= \cup _{\alpha >\gamma } D{\mathcal {C}}^{\alpha }, \qquad D{\mathcal {C}}^{\gamma -}:= \cap _{\alpha <\gamma } D{\mathcal {C}}^{\alpha }. \end{aligned}$$

We will also use the spaces \(C_T D{\mathcal {C}}^{\gamma +}:=C([0,T]; D {\mathcal {C}}^{\gamma +})\). For more details on these spaces, see [18, Sect. 3].

2.2 Some Tools and Properties

The following is an important estimate, based on Bony's estimates, which allows one to define the pointwise product between certain distributions and functions. For details see [4] or [16, Sect. 2.1]. Let \(f \in {\mathcal {C}}^\alpha \) and \(g\in {\mathcal {C}}^{-\beta }\) with \(\alpha -\beta >0\) and \(\alpha ,\beta >0\). Then the ‘pointwise product’ \( f \, g\) is well-defined as an element of \(\mathcal C^{-\beta }\) and there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert f \, g\Vert _{-\beta } \le c \Vert f \Vert _\alpha \Vert g\Vert _{-\beta }. \end{aligned}$$
(2.1)
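The role of the condition \(\alpha -\beta >0\) can be understood via Bony's paraproduct decomposition; the following is a sketch of the classical argument, not the exact formulation used in [4, 16]. One formally splits

$$\begin{aligned} f \, g = T_f g + T_g f + R(f,g), \end{aligned}$$

where \(T_f g\) denotes the paraproduct of g by f and \(R(f,g)\) is the resonant term. The paraproducts are always well-defined, with \(\Vert T_f g\Vert _{-\beta } \lesssim \Vert f\Vert _\infty \Vert g\Vert _{-\beta }\) and \(\Vert T_g f\Vert _{\alpha -\beta } \lesssim \Vert g\Vert _{-\beta } \Vert f\Vert _{\alpha }\), whereas the resonant term satisfies \(\Vert R(f,g)\Vert _{\alpha -\beta } \lesssim \Vert f\Vert _\alpha \Vert g\Vert _{-\beta }\) only under the condition \(\alpha -\beta >0\). Since \(-\beta \) is the lowest of the three regularities, the sum lands in \({\mathcal {C}}^{-\beta }\), which is consistent with (2.1).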

Remark 2.1

Using (2.1) it is not difficult to see that if \(f \in C_T {\mathcal {C}}^{\alpha }\) and \(g \in C_T {\mathcal {C}}^{-\beta }\) for \(\alpha>\beta >0\) then the product is also continuous with values in \( {\mathcal {C}}^{-\beta }\), and

$$\begin{aligned} \Vert f \, g\Vert _{C_T {\mathcal {C}}^{-\beta }} \le c \Vert f \Vert _{C_T \mathcal C^{\alpha }} \Vert g\Vert _{ C_T {\mathcal {C}}^{-\beta }}. \end{aligned}$$
(2.2)

Below we recall some results on a class of PDEs with distributional drift in negative Besov spaces that will be used to set up the martingale problem for the singular SDE (1.1). All results are taken from [18]. In [18], as well as in the present work, the main assumption concerning the distribution-valued function b is the following.

Assumption A1

Let \(0<\beta <1/2\) and \(b\in C_T {\mathcal {C}}^{(-\beta )+}(\mathbb {R}^d)\). In particular \(b\in C_T {\mathcal {C}}^{-\beta }(\mathbb {R}^d)\). Notice that b is a column vector.

We start by the formal definition of the operator \({\mathcal {L}}\).

Definition 2.2

(Definition 4.3, [18]) Let b satisfy Assumption A1. The operator \(\mathcal L\) is defined as

$$\begin{aligned} \begin{array}{lcll} {\mathcal {L}}: & {\mathcal {D}}_{{\mathcal {L}}}^0 & \rightarrow & \{\mathcal S'\text {-valued continuous functions}\}\\ & f & \mapsto & {\mathcal {L}} f:= \dot{f} + \frac{1}{2}\Delta f + \nabla f \, b, \end{array} \end{aligned}$$

where

$$\begin{aligned} {\mathcal {D}}_{{\mathcal {L}}}^0: = C_T D{\mathcal {C}}^{\beta } \cap C^1([0,T]; {\mathcal {S}}'). \end{aligned}$$

Here \(f: [0,T]\times \mathbb {R}^d \rightarrow \mathbb {R}\) and the function \(\dot{f}:[0,T]\rightarrow {\mathcal {S}}'\) is the time-derivative of f. Note also that \(\nabla f \, b:= \nabla f \cdot b\) is well-defined using (2.1) and Assumption A1, and is moreover continuous in time. The Laplacian \(\Delta \) is understood in the sense of distributions.

Next we recall some results on certain PDEs, all driven by the operator \({\mathcal {L}}\). These results are all proved in the companion paper [18]. There are three equations of interest, all related but slightly different. The first PDE is

$$\begin{aligned} \left\{ \begin{array}{l} {\mathcal {L}} v = g \\ v(T) = v_T. \end{array}\right. \end{aligned}$$
(2.3)

We know from [18, Remark 4.8] that if \(v_T \in {\mathcal {C}}^{(1+ \beta )+}\) and \(g \in C_T {\mathcal {C}}^{(-\beta )+}\) then there exists a unique (weak or mild) solution \(v\in C_T {\mathcal {C}}^{(1+ \beta )+}\). In [18, Lemma 4.17 and Remark 4.18] we prove a continuity result: if the terminal condition \(v_T\) in (2.3) is replaced by a sequence \((v_T^n)\) that converges to \(v_T\) in \(\mathcal C^{(1+\beta )+}\), and the terms b and g are replaced by two sequences \((b^n)\) and \((g^n)\), respectively, both converging in \( C_T \mathcal C^{-\beta }\), then the corresponding unique solutions \((v^n)\) converge to v in \(C_T{\mathcal {C}}^{(1+\beta )+}\).

We can solve PDE (2.3) also under weaker conditions on \(v_T\); in particular, we allow functions with linear growth. The space that characterises this behaviour is denoted by \(D\mathcal C^{\beta }\), which is the space of differentiable functions whose gradient belongs to \({\mathcal {C}}^{\beta }\). Notice that in [18] we introduce two concepts of solution, weak and mild, which are defined for functions in \( C_T D{\mathcal {C}}^{\beta }\). We prove in [18, Proposition 4.5] that the notions of weak and mild solution of the PDE are equivalent. In [18, Remark 4.8] we show that if \(v_T \in D{\mathcal {C}}^{\beta +}\) then there exists a unique solution \(v\in C_T D {\mathcal {C}}^{\beta +}\). Continuity results for PDE (2.3) in the spaces \(D\mathcal C^{\beta +}\) also hold, as we prove in [18, Remark 4.18 (i)]: if \(g^n\rightarrow g\) in \(C_T \mathcal C^{-\beta } \), \(b^n \rightarrow b\) in \(C_T {\mathcal {C}}^{-\beta } \) and \(v^n_T \rightarrow v_T\) in \( D {\mathcal {C}}^{\beta +} \), then \(v^n \rightarrow v\) in \(C_T D {\mathcal {C}}^{\beta +} \). As a special case we show in [18, Corollary 4.10] that the function \(\text {id}_i(x)= x_i\) solves PDE (2.3) with \(v(T)= x_i\) and \( g= b_i\), that is \({\mathcal {L}} \text {id}_i = b_i\).
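The identity \({\mathcal {L}} \text {id}_i = b_i\) can be checked directly from Definition 2.2; the computation is formal, since \(\text {id}_i\) is unbounded, which is precisely why the spaces \(D{\mathcal {C}}^{\beta }\) allowing linear growth are needed:

$$\begin{aligned} {\mathcal {L}} \text {id}_i = \partial _t \text {id}_i + \tfrac{1}{2} \Delta \text {id}_i + \nabla \text {id}_i \, b = 0 + 0 + e_i \cdot b = b_i, \end{aligned}$$

since \(\text {id}_i\) is time-independent and linear in x, and \(\nabla \text {id}_i = e_i\), the i-th canonical basis vector, so that the product \(e_i \cdot b = b_i\) is well-defined by (2.1).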

Let \(\lambda >0\). The second PDE to consider is

$$\begin{aligned} \left\{ \begin{array}{l} {\mathcal {L}} \phi _i = \lambda (\phi _i - \text {id}_i)\\ \phi _i(T) = \text {id}_i, \end{array} \right. \end{aligned}$$
(2.4)

which has a unique (weak or mild) solution \(\phi _i\) for \(i=1, \ldots , d\) in the space \(C_T D{\mathcal {C}}^{(1-\beta )-}\) (uniqueness holds in \(C_T D{\mathcal {C}}^{\beta }\)) by [18, Theorem 4.7 (i)]. In [18, Proposition 4.15] we show that \(\phi _i \in \mathcal D^0_{{\mathcal {L}}}\) and \({\dot{\phi }}_i \in C_T {\mathcal {C}}^{(-\beta )-} \) for all \(i=1, \ldots , d\). We denote by \(\phi \) the column vector with components \(\phi _i\), \(i=1, \ldots , d\). We show in [18, Proposition 4.16] that there exists \(\lambda >0 \) large enough such that \(\phi (t, \cdot )\) is invertible for all \(t\in [0,T]\), and we denote its inverse by

$$\begin{aligned} \psi (t, \cdot ):= \phi ^{-1}(t, \cdot ). \end{aligned}$$
(2.5)

In the same proposition we also show that \(\phi , \psi \in C^{0,1}\), and moreover that \(\nabla \phi \in C_T {\mathcal {C}}^{(1-\beta )-}\), \(\nabla \psi (t, \cdot ) \in {\mathcal {C}}^{(1-\beta )-}\) for all \(t\in [0,T]\), and \(\sup _{t\in [0,T]}\Vert \nabla \psi (t, \cdot ) \Vert _{\alpha } < \infty \) for all \(\alpha <1-\beta \). From now on, let \((b^n)\) be the sequence defined in [19, Proposition 2.4], so that \(b^n\rightarrow b\) in \(C_T{\mathcal {C}}^{-\beta }\), \(b^n \in C_T {\mathcal {C}}^\gamma \) for all \(\gamma >0\), and \(b^n\) is bounded and Lipschitz. Here and in the rest of the paper \(\lambda >0\) is fixed and independent of n, chosen such that

$$\begin{aligned} \lambda = [C(\beta , \varepsilon ) \max \{ \sup _n\Vert b^n\Vert _{C_T \mathcal C^{-\beta +\varepsilon }},\Vert b\Vert _{C_T {\mathcal {C}}^{-\beta +\varepsilon }} \}]^{\frac{1}{1-\theta }}, \end{aligned}$$
(2.6)

according to [18, Lemma 4.19], where \(\varepsilon >0\) is such that \(\theta := \frac{1+2\beta -\varepsilon }{2}<1\) and \( C(\beta , \varepsilon )\) is a constant depending only on \(\beta \) and \( \varepsilon \). Notice that with this choice of \(\lambda \) the corresponding inverse \(\psi ^n\) of \(\phi ^n \), see (2.5), is well-defined according to [18, Proposition 4.16 (ii)]. In [18, Lemma 4.19] we show that \(\phi ^n \rightarrow \phi \) and \(\psi ^n \rightarrow \psi \) uniformly on \([0,T]\times \mathbb {R}^d\) and that \(\Vert \nabla \phi ^n \Vert _\infty + |\phi ^n(0,0)|\) is uniformly bounded in n.

Finally, in [18, Theorem 4.14] we show that the function \(\phi \) is equivalently defined as \(\phi = \text {id} + u\), where \(u= (u_1, \ldots , u_d)\) and \(u_i\) is the unique solution of the third PDE, that is

$$\begin{aligned} \left\{ \begin{array}{l} {\mathcal {L}} u_i = \lambda u_i - b_i\\ u_i(T) = 0 \end{array} \right. \end{aligned}$$
(2.7)

in the space \(C_T {\mathcal {C}}^{(2-\beta )-}\). For the latter PDE, continuity results are also proven in [18, Lemma 4.17], namely \(u_i^n \rightarrow u_i\) in \(C_T \mathcal C^{(2-\beta )-}\). Moreover, we have uniform convergence \(u^n\rightarrow u\) and \(\nabla u^n \rightarrow \nabla u\) by [18, Lemma 4.19]. With \(\lambda \) chosen as in (2.6) we have \(\Vert \nabla u^n\Vert _{\infty } \le \frac{1}{2}\) by [18, Proposition 4.13 and bound (4.34)].
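The equivalence between (2.4) and (2.7) can be seen through the following formal computation, using the linearity of \({\mathcal {L}}\) and the identity \({\mathcal {L}} \text {id}_i = b_i\) recalled above: substituting \(\phi _i = \text {id}_i + u_i\) into (2.4) gives

$$\begin{aligned} {\mathcal {L}} \phi _i = {\mathcal {L}} \text {id}_i + {\mathcal {L}} u_i = b_i + {\mathcal {L}} u_i = \lambda (\phi _i - \text {id}_i) = \lambda u_i, \end{aligned}$$

hence \({\mathcal {L}} u_i = \lambda u_i - b_i\), which is (2.7); the terminal conditions also match, since \(\phi _i(T) = \text {id}_i\) forces \(u_i(T)=0\).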

2.3 Probabilistic Notation

In the sequel we will consider generic measurable spaces \((\Omega , {\mathcal {F}})\). On them we will consider various probability measures denoted by \({\mathbb {P}}\). We will make use of the notation \((X, {\mathbb {P}})\) or \((Y, {\mathbb {P}})\), where X or Y will denote continuous stochastic processes indexed by \(t\in [0,T]\) defined on the probability space \( (\Omega , {\mathcal {F}}, {\mathbb {P}})\), without recalling it explicitly. The filtrations considered, if not explicitly mentioned, will be the canonical filtrations generated by X or Y (which will be the same in our applications).

Once the probability space \( (\Omega , {\mathcal {F}}, {\mathbb {P}})\) is fixed, we will denote by \({\mathscr {C}}\) the linear space of continuous processes on [0, T] with values in \(\mathbb {R}^d\) endowed with the metric of uniform convergence in probability (u.c.p.).

The canonical space of continuous functions from [0, T] with values in \({\mathbb {R}}^d\) will be denoted by \({\mathcal {C}}_T \), and it will be endowed with the sigma algebra of Borel sets \({\mathcal {B}}(\mathcal C_T)\). For \(s\in [0,T]\) we will use the notation \({\mathcal {C}}_s \) for the space of continuous functions defined on [0, s]. Thus, for a given couple \((X, {\mathbb {P}})\), the law of X under \({\mathbb {P}}\) will be a Borel probability measure on the measurable space \(({\mathcal {C}}_T, {\mathcal {B}}({\mathcal {C}}_T))\).

3 A Zvonkin-type Transformation

In the study of SDEs with low-regularity coefficients, like (1.1), one successful idea is to apply a bijective transformation that removes the singular drift and produces a transformed SDE whose drift has no singular component and which can thus be solved with standard techniques. The idea goes back to Zvonkin [30], and in the present case a transformation that does the job is the unique solution \(\phi \) of the PDE (2.4). The analysis carried out here also sheds some light on which transformations, aside from \(\phi \), of the martingale problem fulfilled by X lead to different, but equivalent, transformed martingale problems fulfilled by a new process Y.

Let us start by introducing a class of functions, denoted by \({\mathcal {D}}_{{\mathcal {L}}} \), which is the domain of the martingale problem:

$$\begin{aligned} \begin{array}{ll} {\mathcal {D}}_{{\mathcal {L}}}: = & \{ f \in C_T{\mathcal {C}}^{(1+\beta )+}: \exists g \in C_T\bar{{\mathcal {C}}}_c^{0+} \text { such that } \\ & f \text { is a weak solution of } {\mathcal {L}} f =g \text { and } f(T) \in \bar{{\mathcal {C}}}_c^{(1+\beta )+ }\}, \end{array} \end{aligned}$$
(3.1)

where \({\mathcal {L}}\) has been defined in Definition 2.2.

Definition 3.1

We say that a couple \((X, {\mathbb {P}})\), where X is a continuous process indexed by \(t\in [0,T]\) and \({\mathbb {P}}\) is a probability measure on some measurable space, is a solution to the martingale problem with distributional drift b and initial condition \(\mu \) (for brevity, solution of MP with distributional drift b and i.c. \(\mu \)) if and only if for every \(f \in {\mathcal {D}}_{{\mathcal {L}}}\)

$$\begin{aligned} f(t, X_t) - f(0, X_0) - \int _0^t ({\mathcal {L}} f) (s, X_s) \textrm{d}s \end{aligned}$$
(3.2)

is a local martingale under \({\mathbb {P}}\), and \(X_0 \sim \mu \) under \({\mathbb {P}}\), where the domain \({\mathcal {D}}_{{\mathcal {L}}} \) is given by (3.1) and \({\mathcal {L}}\) has been defined in Definition 2.2.

We say that the martingale problem with distributional drift b admits uniqueness if, for any two solutions \((X^1, \mathbb P^1)\) and \((X^2, {\mathbb {P}}^2)\) with \(X^i_0\sim \mu \), \(i=1,2\), the law of \(X^1\) under \({\mathbb {P}}^1\) coincides with the law of \(X^2\) under \({\mathbb {P}}^2\).

Remark 3.2

Since \( \bar{{\mathcal {C}}}_c^{(1+\beta )+ } \subset \bar{\mathcal C}_c^{0+ } \subset {{\mathcal {C}}}^{(-\beta )+ }\), then there exists a unique weak solution \(f\in C_T{\mathcal {C}}^{(1+\beta )+}\) for the PDE appearing in \({\mathcal {D}}_{{\mathcal {L}}}\), see Sect. 2.2. Moreover, by [18, Remark 4.4] we have \( {\mathcal {D}}_{{\mathcal {L}}} \subset {\mathcal {D}}_{{\mathcal {L}}}^0\).

Proposition 3.3

The domain \({\mathcal {D}}_{{\mathcal {L}}}\) defined in (3.1) equipped with its graph topology is separable.

Proof

By [18, Lemma 5.7 (i)] with \(\gamma =0\) we know that \( \bar{{\mathcal {C}}}_c^{0+}\) is separable; hence, there exists a countable dense subset \(D_0\) of \( \bar{{\mathcal {C}}}_c^{0+}\). By [18, Corollary 5.8] we know that \( C_T\bar{{\mathcal {C}}}_c^{\beta +}\) is separable; thus, there exists a countable dense subset \(D_\beta \) of \( C_T\bar{{\mathcal {C}}}_c^{\beta +}\). Let us denote by D the set of all \( f_n\in C_T {\mathcal {C}}^{(1+\beta )+} \) such that \({\mathcal {L}} f_n =g_n\) and \(f_n(T) = f_n^T\), where \(g_n \in D_0 \) and \(f_n^T \in D_\beta \). Clearly D is countable, because \(D_0\) and \(D_{\beta }\) are, and \(D\subset {\mathcal {D}}_{{\mathcal {L}}}\). Moreover, by the continuity results on the PDE (2.3), see Sect. 2.2, we have that if \(f_n^T\rightarrow f(T)\) in \({\mathcal {C}}^{(1+\beta )+}\) and \(g_n \rightarrow g\) in \(C_T{\mathcal {C}}^{0+}\), then \(f_n \rightarrow f\) in \(C_T{\mathcal {C}}^{(1+\beta )+}\), which proves that the set D is dense in \({\mathcal {D}}_{{\mathcal {L}}}\). \(\square \)

Next, we introduce the transformed SDE studied here, which is

$$\begin{aligned} Y_t=&Y_0 + \lambda \int _0^t Y_s \textrm{d}s - \lambda \int _0^t \psi (s, Y_s) \textrm{d}s + \int _0^t \nabla \phi (s, \psi (s, Y_s) )\textrm{d}W_s, \end{aligned}$$
(3.3)

where \(\phi \) is the unique solution of (2.4) and \(\psi \) is its (space-)inverse given by (2.5) with \(\lambda >0\) chosen large enough (see Sect. 2.2). Notice that this SDE is formally obtained by applying the transformation \(\phi \) to X as in Definition 3.1, that is, setting \(Y_t =\phi (t,X_t) \) and using that \(\phi \) is invertible with inverse \(\psi \).
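For the reader's convenience, we sketch the formal computation behind (3.3). Setting \(Y_t = \phi (t, X_t)\) and applying Itô's formula componentwise (formally, since \(\phi \) is not a classical \(C^{1,2}\) function), we get

$$\begin{aligned} \mathrm d Y_t = ({\mathcal {L}} \phi )(t, X_t) \, \mathrm d t + \nabla \phi (t, X_t) \, \mathrm d W_t = \lambda \big ( \phi (t, X_t) - X_t \big ) \, \mathrm d t + \nabla \phi (t, X_t) \, \mathrm d W_t, \end{aligned}$$

having used PDE (2.4), that is \({\mathcal {L}} \phi = \lambda (\phi - \text {id})\). Substituting \(\phi (t, X_t) = Y_t\) and \(X_t = \psi (t, Y_t)\) yields exactly (3.3) in integral form.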

Let Y denote a solution of (3.3). By Itô's formula, for all \({\tilde{f}}\in C_{buc}^{1,2}([0,T]\times \mathbb {R}^d)\), we know that

$$\begin{aligned} {\tilde{f}}( t, Y_t) - {\tilde{f}}(0, Y_0) - \int _0^t ( \tilde{\mathcal L}{\tilde{f}}) (s, Y_s) \textrm{d}s \end{aligned}$$

is a martingale under \({\mathbb {P}}\). Here the operator \(\tilde{{\mathcal {L}} }\) is the generator of Y, which is defined by

$$\begin{aligned} \tilde{{\mathcal {L}}} {\tilde{f}} := \partial _t {\tilde{f}} + \lambda \nabla {\tilde{f}} (\text {id} - \psi ) + \frac{1}{2} \text {Tr}[(\nabla \phi \circ \psi )^\top \text {Hess} {\tilde{f}} (\nabla \phi \circ \psi )] . \end{aligned}$$
(3.4)

In particular, \((Y, {\mathbb {P}})\) verifies the classical Stroock–Varadhan martingale problem with respect to \(\tilde{{\mathcal {L}}}\). We recall that this notion is equivalent to the one of weak solution for SDEs, see [20, Proposition 4.11 in Chapter 5]. To avoid confusion with the notion of weak solution for PDEs, in this paper we use the terminology solution in law instead of weak solution when referring to SDEs.
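Formula (3.4) can be read off the following formal chain-rule computation. For \(f(t,x) = {\tilde{f}}(t, \phi (t,x))\) and \(y = \phi (t,x)\), i.e. \(x = \psi (t,y)\), collecting the terms of \({\mathcal {L}} f\) gives

$$\begin{aligned} ({\mathcal {L}} f)(t,x)&= \partial _t {\tilde{f}}(t,y) + \nabla {\tilde{f}}(t,y) \cdot ({\mathcal {L}} \phi )(t,x) + \tfrac{1}{2} \text {Tr}\big [\nabla \phi (t,x)^\top \, \text {Hess}{\tilde{f}}(t,y) \, \nabla \phi (t,x)\big ]\\&= \partial _t {\tilde{f}}(t,y) + \lambda \, \nabla {\tilde{f}}(t,y) \cdot \big (y - \psi (t,y)\big ) + \tfrac{1}{2} \text {Tr}\big [(\nabla \phi \circ \psi )^\top \, \text {Hess}{\tilde{f}} \, (\nabla \phi \circ \psi )\big ](t,y), \end{aligned}$$

where we used \({\mathcal {L}} \phi = \lambda (\phi - \text {id})\) and \(\phi (t,x) - x = y - \psi (t,y)\). The right-hand side is \((\tilde{{\mathcal {L}}} {\tilde{f}})(t,y)\), as in (3.4).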

Remark 3.4

Note that the coefficients in \(\tilde{{\mathcal {L}}}\) belong to \( C^{0,\nu }\) for any \(\nu <1-\beta \); see Appendix A for the definition of \( C^{0,\nu }\). Indeed, \({\tilde{f}}\in C^{1,2}_{buc}\), \(\psi \) has linear growth since \(|\nabla \psi |\) is uniformly bounded, and the diffusion coefficient \(\nabla \phi \circ \psi \) belongs to \( C^{0,\nu }\) for any \(\nu <1-\beta \) because

$$\begin{aligned} \Vert \nabla \phi (t, \psi (t, \cdot ))\Vert _\nu \le&\sup _{x} | \nabla \phi (t, \psi (t, x))|\\&+\sup _{ x_1, x_2, \, x_1\ne x_2 } \frac{| \nabla \phi (t, \psi (t, x_1)) -\nabla \phi (t, \psi (t, x_2))| }{|x_1-x_2|^\nu }\\ \le&\sup _{t\in [0,T]} \Vert \nabla \phi (t, \cdot )\Vert _\infty \\&+ \sup _{t\in [0,T]} \sup _{ x_1, x_2, \, x_1\ne x_2 } \frac{| \nabla \phi (t, \psi (t, x_1)) -\nabla \phi (t, \psi (t, x_2))| }{|\psi (t, x_1) - \psi (t, x_2) |^\nu } \, \frac{|\psi (t, x_1) - \psi (t, x_2) |^\nu }{|x_1-x_2|^\nu } \\ \le&\sup _{t\in [0,T]} \Vert \nabla \phi (t, \cdot )\Vert _\infty + \Vert \nabla \phi \Vert _{C_T {\mathcal {C}}^\nu } \Vert \nabla \psi \Vert _\infty ^\nu . \end{aligned}$$

Here we have also used Remark A.1.

It will be useful later on to consider a domain for the operator \(\tilde{{\mathcal {L}}}\) obtained as the image of \({\mathcal D}_{{\mathcal {L}}}\) through \(\phi \). Let us define

$$\begin{aligned} \tilde{{\mathcal {D}}}_{\tilde{{\mathcal {L}}}}:= \{{\tilde{f}} = f\circ \psi \text { for some } f \in {\mathcal {D}}_{{\mathcal {L}}} \text { and } \psi \text { defined in (2.5)} \}. \end{aligned}$$
(3.5)

The choice of the SDE (3.3) and of the domain \(\tilde{{\mathcal {D}}}_{\tilde{{\mathcal {L}}}}\) is natural since we use the transformed process \(Y_t = \phi (t, X_t)\).

Lemma 3.5

Let \(g,h: \mathbb {R}^d \rightarrow \mathbb {R}^d\), where \(h\in C^1\) with \(\nabla h \in {\mathcal {C}}^{\beta +}\) and \(g\in {\mathcal {C}}^{(1+\beta )+}\). Then \( g \circ h \in {\mathcal {C}}^{(1+\beta )+}\). If moreover \(g_n\rightarrow g\) in \( \mathcal C^{(1+\beta )+}\), then \(g_n \circ h \rightarrow g\circ h \) in \(\mathcal C^{(1+\beta )+}\).

Proof

Proving that \( g \circ h \in {\mathcal {C}}^{(1+\beta )+}\) is equivalent to proving that \( {\bar{f}}:= g(h(\cdot ))\) is bounded and that there exists \(\alpha >\beta \) such that \(\nabla {\bar{f}} \in \mathcal C^\alpha \), i.e. \(\nabla {\bar{f}} \) is bounded and \(\alpha \)-Hölder continuous. The first claim is obvious by boundedness of g. The gradient \(\nabla {\bar{f}} ( \cdot ) = \nabla h(\cdot ) \nabla g (h(\cdot ))\) is bounded because it is the product of two bounded matrices, since \(\nabla h\) and \(\nabla g\) are bounded by the assumptions on g and h.

To show that \(\nabla {\bar{f}}\) is \(\alpha \)-Hölder continuous, it is enough to show that it is the product of two bounded functions in \(\mathcal C^\alpha \) (note that boundedness of the factors is crucially used). We have \(\nabla h\in {\mathcal {C}}^\alpha \) for some \(\alpha >\beta \) by assumption. On the other hand, it is immediate that the term \(\nabla g (h( \cdot )) \) is in \({\mathcal {C}}^\alpha \): it is bounded, and \(\alpha \)-Hölder continuity follows from \(\nabla g\in {\mathcal {C}}^\alpha \) and the fact that h is Lipschitz, since \(\nabla h\) is bounded by assumption.
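The product stability invoked here is the standard algebra property of bounded Hölder functions: if A and B are bounded and \(\alpha \)-Hölder with Hölder seminorms \([A]_\alpha \) and \([B]_\alpha \), then for \(x \ne y\)

$$\begin{aligned} |A(x)B(x) - A(y)B(y)| \le |A(x)| \, |B(x)-B(y)| + |B(y)| \, |A(x)-A(y)| \le \big ( \Vert A\Vert _\infty [B]_\alpha + \Vert B\Vert _\infty [A]_\alpha \big ) |x-y|^\alpha , \end{aligned}$$

so that \(\Vert AB\Vert _\alpha \le c \Vert A\Vert _\alpha \Vert B\Vert _\alpha \); this is exactly where boundedness of both factors is used.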

To show convergence, let us denote \({\bar{f}}_n:= g_n \circ h\). Since \({\bar{f}}_n(0)\rightarrow {\bar{f}}(0)\), it is enough to show the convergence of \(\nabla {\bar{f}}_n \) in \({\mathcal {C}}^{\alpha }\). We use the same properties as above to get \( \Vert \nabla {\bar{f}}_n - \nabla \bar{f}\Vert _{\alpha } \le \Vert \nabla h\Vert _\infty (1\vee \Vert \nabla h\Vert _\infty ^\alpha ) \Vert \nabla g_n -\nabla g\Vert _\alpha + \Vert \nabla g_n -\nabla g\Vert _\infty \Vert \nabla h \Vert _\alpha , \) and the proof is complete. \(\square \)
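The product estimate used repeatedly above, \(\Vert uv\Vert _\alpha \le \Vert u\Vert _\infty \Vert v\Vert _\alpha + \Vert u\Vert _\alpha \Vert v\Vert _\infty \), can be sanity-checked numerically; the functions below are illustrative stand-ins (not objects from the paper), and the discrete Hölder norm is only a grid approximation.

```python
import numpy as np

def holder_norm(u, x, alpha):
    """Discrete alpha-Hoelder norm (sup norm + seminorm) on the grid x."""
    du = np.abs(u[:, None] - u[None, :])
    dx = np.abs(x[:, None] - x[None, :])
    sem = np.max(du[dx > 0] / dx[dx > 0] ** alpha)
    return np.max(np.abs(u)) + sem

alpha = 0.5
x = np.linspace(-2.0, 2.0, 200)
u = np.cos(x)        # plays the role of one bounded Hoelder factor
v = np.sin(2 * x)    # plays the role of the other factor
lhs = holder_norm(u * v, x, alpha)
rhs = (np.max(np.abs(u)) * holder_norm(v, x, alpha)
       + holder_norm(u, x, alpha) * np.max(np.abs(v)))
assert lhs <= rhs + 1e-12   # the product bound holds pairwise, hence on the grid
```

The inequality holds exactly for every pair of grid points, so the discrete maximum inherits it; this is why boundedness of the factors is essential.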

Lemma 3.6

If \({\tilde{f}}\in C^{1,2}_{buc} \) and \(\phi \) is the unique solution to PDE (2.4), then \( {\tilde{f}} \circ \phi \in C_T \mathcal C^{(1+\beta )+}\).

Proof

Let us set \( f:= {\tilde{f}} \circ \phi \). We first prove that \(f(t) \in {\mathcal {C}}^{(1+\beta )+}\) for all \(t\in [0,T]\). This is a consequence of Lemma 3.5 with \(g= {\tilde{f}} (t,\cdot )\) and \(h=\psi (t,\cdot )\). The hypothesis on g is satisfied since \({\tilde{f}}\in C^{1,2}_{buc} \), hence \(g={\tilde{f}}(t)\in {\mathcal {C}}^{(1+\gamma )+}\) for any \(\gamma \in (0,1)\). The hypothesis on h is satisfied since \(\nabla h\in {\mathcal {C}}^{(1-\beta )-}\) implies \(\nabla h\in \mathcal C^{\beta +}\).

For the (uniform) time-continuity with values in \( \mathcal C^{(1+\beta )+}\), since \(\beta \) is not an integer, we have to control

$$\begin{aligned} \Vert f(t)-f(s)\Vert _\infty + \Vert \nabla f(t)-\nabla f(s)\Vert _\alpha \end{aligned}$$
(3.6)

for some \(\alpha >\beta \) and for small \(|t-s|\), where we recall \(f(t) = {\tilde{f}}(t, \phi (t, \cdot ))\), having used the equivalent norm [18, (2.3)]. The first term in (3.6) follows from the fact that \({\tilde{f}}\in C^{1,2}_{buc}\) and

$$\begin{aligned} \phi (t, x) -\phi (s,x) = u(t, x) -u(s,x), \, \text { where } u \in C_T {\mathcal {C}}^{1+\alpha }, \end{aligned}$$
(3.7)

see Sect. 2.2.

For the second term in (3.6), setting \(H:= \nabla \tilde{f} \circ \phi \), we can write \(\nabla f = H \nabla \phi \). We note that \(\nabla \phi \in C_T{\mathcal {C}}^\alpha \), see Sect. 2.2, and since \(H \in C_T{\mathcal {C}}^\alpha \) (proved below) then the product is also in \(C_T{\mathcal {C}}^\alpha \) and the proof is concluded.

It remains to show that \(H \in C_T{\mathcal {C}}^\alpha \). For the sup part of the norm (see [18, (2.2)]), we notice that

$$\begin{aligned} \Vert H(t) - H(s)\Vert _\infty&\le \Vert \nabla {\tilde{f}}(t, \phi (t, \cdot )) - \nabla {\tilde{f}}(t, \phi (s, \cdot )) \Vert _\infty \nonumber \\&+ \Vert \nabla \tilde{f}(t, \phi (s, \cdot )) - \nabla {\tilde{f}}(s, \phi (s, \cdot )) \Vert _\infty \nonumber \\&\le \Vert \text {Hess}({\tilde{f}}) \Vert _{\infty }\Vert \phi (t, \cdot ) -\phi (s, \cdot )\Vert _\infty \nonumber \\&+ \Vert \nabla {\tilde{f}}(t, \phi (s, \cdot )) - \nabla {\tilde{f}}(s, \phi (s, \cdot )) \Vert _\infty , \end{aligned}$$
(3.8)

and the first term is bounded as above using (3.7), while the second term is controlled because \({\tilde{f}} \in C^{1,2}_{buc}\).

We observe that \(H\in C^{0,1}\) and \(\nabla H= (\text {Hess}(\tilde{f})\circ \phi ) \nabla \phi \). We will use below that \(\nabla H\) is uniformly continuous, which we see by showing that each factor of the product is bounded and uniformly continuous (buc). \(\nabla \phi \) is buc because \(\nabla \phi \in C_T {\mathcal {C}}^{\alpha }\). The term \(\text {Hess}({\tilde{f}})\circ \phi \) is handled similarly to (3.8), using that \(\text {Hess}({\tilde{f}})\) is buc together with (3.7).

Concerning the \(\alpha \)-seminorm (see [18, (2.2)]), for \(x_1, x_2\) such that \(|x_1 -x_2|<1\) we have

$$\begin{aligned}&\frac{|H(t, x_1) - H(t, x_2) - \left( H(s, x_1) - H(s, x_2) \right) |}{|x_1-x_2|^\alpha } \\&\quad \le \int _0^1 \left| \nabla H(t, x_1 + a (x_2-x_1)) - \nabla H(s, x_1 + a (x_2-x_1)) \right| \textrm{d}a |x_2 - x_1|^{1-\alpha } \\&\quad \le \omega _{\nabla H}( |t-s|) , \end{aligned}$$

where \(\omega _{\nabla H} (\cdot )\) denotes the modulus of continuity of \( \nabla H\). This concludes the control of the second term in (3.6). \(\square \)
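The final interpolation step bounds the time increment of the \(\alpha \)-seminorm by the modulus of continuity of \(\nabla H\). A discrete check with the illustrative choice \(H(t,x)=\sin (x+t)\) (so \(\nabla H=\cos (x+t)\) and \(\omega _{\nabla H}(r)=r\)), which is not an object from the paper:

```python
import numpy as np

alpha, s, t = 0.5, 0.3, 0.45
H = lambda tt, x: np.sin(x + tt)   # toy H; grad H = cos(x+t), omega(r) = r

x = np.linspace(0.0, 3.0, 150)
X1, X2 = np.meshgrid(x, x)
mask = (np.abs(X1 - X2) > 0) & (np.abs(X1 - X2) < 1)
num = np.abs(H(t, X1) - H(t, X2) - (H(s, X1) - H(s, X2)))
ratio = num[mask] / np.abs(X1 - X2)[mask] ** alpha
# the proof's bound: ratio <= omega(|t-s|) * |x1-x2|^{1-alpha} <= |t-s| for |x1-x2| < 1
assert np.max(ratio) <= abs(t - s) + 1e-12
```

Writing the double increment as an integral of \(\nabla H(t)-\nabla H(s)\) along the segment, exactly as in the display above, gives the asserted bound pair by pair.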

Lemma 3.7

Let \( {\tilde{f}}\in C^{1,2}_{buc}\) and \(\phi \) be the unique solution of PDE (2.4). Setting \( f:= {\tilde{f}} \circ \phi \) we have \(f\in {\mathcal {D}}_{{\mathcal {L}}}^0\) and

$$\begin{aligned} (\tilde{{\mathcal {L}}} {\tilde{f}} ) \circ \phi = {{\mathcal {L}}} f \end{aligned}$$

in \(C_T {\mathcal {C}}^{0+}\), that is, f is a solution of \({\mathcal {L}} f =g\) with \(g:=(\tilde{{\mathcal {L}}} {\tilde{f}} ) \circ \phi \in C_T {\mathcal {C}}^{0+}\). Equivalently, we have \(\tilde{{\mathcal {L}}} \tilde{f} = ( {{\mathcal {L}}} f) \circ \psi \), where \(\psi \) is the space-inverse of \(\phi \) defined in (2.5).

If moreover \({\tilde{f}}\) has compact support, then f(T) and g also have compact support, in which case \(f\in {\mathcal {D}}_{{\mathcal {L}}} \).

Proof

We start by proving that \(f\in {{\mathcal {D}}}_{{\mathcal {L}}}^0\) so that we can then calculate \({\mathcal {L}} f\). Notice that \(f\in C_T \mathcal C^{(1+\beta )+}\) by Lemma 3.6. To show that \(f\in C^1([0,T], {\mathcal {S}}')\), we compute the time-derivative \(\dot{f} \). Recall that \({\tilde{f}} \in C^{1,2}_{buc}\) by assumption, and that \(\phi :[0,T]\times \mathbb {R}^d \rightarrow \mathbb {R}^d\) and \(f, {\tilde{f}}: [0,T]\times \mathbb {R}^d \rightarrow \mathbb {R}\). We have

$$\begin{aligned} t \mapsto \dot{f} (t, \cdot ) = \dot{ {\tilde{f}} }(t, \phi (t, \cdot )) + \sum _{k=1}^d \partial _k {\tilde{f}}(t, \phi (t, \cdot )) {\dot{\phi }}_k (t, \cdot ), \end{aligned}$$
(3.9)

where the dot \(\dot{\ }\) denotes the time-derivative and \(\partial _k:= \frac{\partial }{ \partial x_k}\). We show that the right-hand side of equation (3.9) is in \(C_T{\mathcal {S}}'\). For the first term in (3.9) the claim is clear because \( \dot{\tilde{f}} \circ \phi \) is uniformly continuous in \((t,x)\). The second term in (3.9) has products of the form \((\partial _k {\tilde{f}} \circ \phi ) {\dot{\phi }}_k\) where \({\dot{\phi }}_k \in C_T {\mathcal {C}}^{(-\beta )-}\), see Sect. 2.2, and \(\partial _k {\tilde{f}} \circ \phi \in C_T {\mathcal {C}}^{\beta +}\). Hence, the product is well-defined and continuous by (2.2). This shows that \(f\in C^1([0,T]; {\mathcal {S}}')\) and hence \(f\in {{\mathcal {D}}}_{{\mathcal {L}}}^0\).

We now apply \({\mathcal {L}}\) to f so we need to calculate the spatial derivatives of f. The first space derivative of f with respect to \(x_i\) is

$$\begin{aligned} \partial _i f (t, \cdot ) = \sum _{k=1}^d \partial _k {\tilde{f}} (t, \phi (t, \cdot )) \partial _i \phi _k (t, \cdot ) , t\in [0,T] \end{aligned}$$

and the second derivative is

$$\begin{aligned} \partial _{ii} f (t, \cdot )&= \sum _{k=1}^d\left[ \sum _{l=1}^d ( \partial _{lk} {\tilde{f}} (t, \phi (t, \cdot )) \partial _i \phi _l (t, \cdot ) ) \partial _i \phi _k (t, \cdot ) + \partial _k {\tilde{f}} (t, \phi (t, \cdot )) \partial _{ii} \phi _k(t, \cdot ) \right] \\&= \left( (\nabla \phi )^T (\text {Hess}({\tilde{f}})\circ \phi ) \nabla \phi \right) _{ii}(t, \cdot )+ \sum _{k=1}^d \partial _k {\tilde{f}} (t, \phi (t, \cdot )) \partial _{ii} \phi _k(t, \cdot ) , t\in [0,T]. \end{aligned}$$

Note that \(\partial _i f (t, \cdot )\) for all \( t\in [0,T]\) is a well-defined object in \({\mathcal {C}}^{(-\beta )-}\) because it is actually a function in \( {\mathcal {C}}^{\beta +}\) by Lemma 3.6. The second derivative \(\partial _{ii} f (t, \cdot ) \) is made of two terms: the first one is a bounded function, and the second one is well-defined in \({\mathcal {C}}^{(-\beta )-}\) again by means of the pointwise product (2.1), where for all \( t\in [0,T]\) the distributional term \(\partial _{ii}\phi _k (t, \cdot )\) is in \({\mathcal {C}}^{ (-\beta )-}\) since \(\partial _{i}\phi _k (t, \cdot ) \in {\mathcal {C}}^{ (1-\beta )-} \), see Sect. 2.2. Using these we calculate \({\mathcal {L}} f\):

$$\begin{aligned} ({\mathcal {L}} f) (t, \cdot ) =&\dot{{\tilde{f}}} (t, \phi (t, \cdot )) + \frac{1}{2} \sum _{i=1}^d \left( (\nabla \phi )^T (\text {Hess}({\tilde{f}})\circ \phi ) \nabla \phi \right) _{ii}(t,\cdot )\nonumber \\&+ \sum _{k=1}^d \partial _k {\tilde{f}} (t, \phi (t, \cdot )) \left[ \dot{\phi }_k (t, \cdot ) + \frac{1}{2} \sum _{i=1}^d\partial _{ii} \phi _k(t, \cdot ) + \sum _{i=1}^d \partial _i \phi _k (t, \cdot ) b_i(t, \cdot ) \right] , \quad t\in [0,T], \end{aligned}$$
(3.10)

where the last term \(\partial _k {\tilde{f}} (t, \phi (t, \cdot )) \partial _i \phi _k (t, \cdot ) b_i(t, \cdot )\) is well-defined in \({\mathcal {C}}^{(-\beta )+}\) by (2.1) used twice. Thus, equality (3.10) holds in the space \({\mathcal {C}}^{(-\beta )-}\). Now we observe that \({\mathcal {L}} \phi _k = \lambda (\phi _k -\text {id}_k)\) because \(\phi _k\) is a solution of PDE (2.4), see Sect. 2.2. Thus, the equality above becomes

$$\begin{aligned} ({\mathcal {L}} f) (t, \cdot ) =&\dot{{\tilde{f}}} (t, \phi (t, \cdot )) + \frac{1}{2} \sum _{i=1}^d \left( (\nabla \phi )^T (\text {Hess}({\tilde{f}})\circ \phi ) \nabla \phi \right) _{ii}(t, \cdot )\nonumber \\&+ \sum _{k=1}^d \partial _k {\tilde{f}} (t, \phi (t, \cdot )) \lambda (\phi _k (t, \cdot )-\text {id}_k)\nonumber \\ =&\dot{{\tilde{f}}} (t, \phi (t, \cdot )) + \frac{1}{2} \text {Tr} \left( (\nabla \phi )^T (\text {Hess}({\tilde{f}})\circ \phi ) \nabla \phi \right) (t, \cdot )\nonumber \\&+ \lambda \nabla {\tilde{f}} (t, \phi (t, \cdot )) (\phi (t, \cdot )-\text {id}) , t \in [0,T]. \end{aligned}$$
(3.11)

On the other hand, applying the definition (3.4) of \(\tilde{{\mathcal {L}}}\) to \({\tilde{f}} \in C^{1,2}_{buc}\), composing with \(\phi \) and using \(\psi (t, \phi (t, \cdot )) = \text {id}\), one easily gets

$$\begin{aligned} (\tilde{{\mathcal {L}}} {\tilde{f}} )(t, \phi (t, \cdot )) =&\dot{ {\tilde{f}}} (t, \phi (t, \cdot )) + \frac{1}{2} \text {Tr} \left( (\nabla \phi )^T (\text {Hess}({\tilde{f}})\circ \phi ) \nabla \phi \right) (t, \cdot )\nonumber \\&+ \lambda \nabla {\tilde{f}} (t, \phi (t, \cdot )) (\phi (t, \cdot )-\text {id}), t\in [0,T]. \end{aligned}$$
(3.12)

Now using (3.11) and (3.12) we get \(({\mathcal {L}} f)(t, \cdot )=(\tilde{{\mathcal {L}}} {\tilde{f}} )(t, \phi (t, \cdot )) \) in \(C([0,T];{\mathcal {S}}')\). We observe that the right-hand side of (3.12) belongs to \(C_T {\mathcal {C}}^{0+}\). Setting \(g:=(\tilde{{\mathcal {L}}} {\tilde{f}} )\circ \phi \) we can conclude that \({\mathcal {L}} f =g \in C_T {\mathcal {C}}^{0+}\). Given that both sides are functions, we can compose them with \(\psi \) to get \(\tilde{{\mathcal {L}}} \tilde{f}= ({{\mathcal {L}}} f ) \circ \psi \).

Finally, we show that if \({\tilde{f}}\) has compact support, then \(g=(\tilde{{\mathcal {L}}} {\tilde{f}} ) \circ \phi \) also has compact support. First notice that \( \tilde{{\mathcal {L}}} {\tilde{f}} \) has compact support; thus, there exists \(M>0\) such that \(g(t,x)=0\) for all (t, x) with \(|(t, \phi (t, x))|>M\). To show that g has compact support it is enough to find \(N>0\) such that if \(|(t,x)|> N\), then \(|(t,\phi (t,x))|>M\). This is equivalent to showing that

$$\begin{aligned} A:= \{(t,x): |(t,\phi (t,x))|\le M\} \subset \{(t,x): |(t,x)|\le N \}=:B, \end{aligned}$$

for some N. To show the above inclusion, let \((t,x)\in A\). We write \((t, x) = (t, \psi (t, \phi (t,x)))\) and using that \(\nabla \psi \) is uniformly bounded, see Sect. 2.2, we get

$$\begin{aligned} |(t,x)|&= |(t,\psi (t, \phi (t,x)) - \psi (t, 0) +\psi (t, 0))|\\&\le C |(t,\phi (t,x))| +|(t, \psi (t, 0))|\\&\le C M +\sup _{t\in [0,T]}| (t,\psi (t, 0))| =: N, \end{aligned}$$

which shows that \((t,x)\in B\). We conclude by noting that \(f(T, \cdot )\) also has compact support, following the above computations but fixing the time \(t=T\) and replacing \(\tilde{{\mathcal {L}}} \tilde{f} \) with \({\tilde{f}}\). \(\square \)
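The chain-rule computation behind (3.10) is purely algebraic and can be verified numerically in dimension one, before the PDE for \(\phi \) is used. All functions below are illustrative smooth choices (in particular this \(\phi \) is not a solution of (2.4)):

```python
import numpy as np

# illustrative smooth choices; phi here is NOT the solution of the paper's PDE
t, x = 0.7, np.linspace(-3, 3, 400)
phi    = x + 0.3 * np.sin(x + t);  phi_t  = 0.3 * np.cos(x + t)
phi_x  = 1 + 0.3 * np.cos(x + t);  phi_xx = -0.3 * np.sin(x + t)
b = np.cos(x)

# f(t,x) = ftilde(t, phi(t,x)) with ftilde(t,y) = sin(y) * exp(-t)
e = np.exp(-t)
f_t  = -np.sin(phi) * e + np.cos(phi) * e * phi_t
f_x  = np.cos(phi) * e * phi_x
f_xx = -np.sin(phi) * e * phi_x**2 + np.cos(phi) * e * phi_xx
Lf = f_t + 0.5 * f_xx + b * f_x

# right-hand side of (3.10): ftilde_t o phi + 1/2 phi_x^2 (ftilde_yy o phi)
#                            + (ftilde_y o phi) * (phi_t + 1/2 phi_xx + b phi_x)
rhs = (-np.sin(phi) * e + 0.5 * phi_x**2 * (-np.sin(phi) * e)
       + np.cos(phi) * e * (phi_t + 0.5 * phi_xx + b * phi_x))
assert np.max(np.abs(Lf - rhs)) < 1e-12
```

Both sides are the same expression assembled in two ways, so they agree to machine precision; the substitution \({\mathcal {L}}\phi _k = \lambda (\phi _k - \text {id}_k)\) is the only step that uses the PDE.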

Lemma 3.8

We have \(C^{1,2}_{c} \subset {\tilde{{\mathcal {D}}}}_{\tilde{{\mathcal {L}}}}\).

Proof

By definition of \({\tilde{{\mathcal {D}}}}_{\tilde{{\mathcal {L}}}}\) we have to show that if \({\tilde{f}}\in C^{1,2}_{c}\) then \( f:= {\tilde{f}} \circ \phi \in {\mathcal {D}}_{{\mathcal {L}}}\), where \( \mathcal D_{{\mathcal {L}}}\) is given in (3.1). First, we note that by Lemma 3.6 we have \(f \in C_T{\mathcal {C}}^{(1+\beta )+}\). Next we show that \({\mathcal {L}} f =g \) for some \(g\in C_T\bar{\mathcal C}_c^{0+}\). We define \(g:= (\tilde{{\mathcal {L}}} {\tilde{f}}) \circ \phi \). By Lemma 3.7 we have \({\mathcal {L}} f =g \) and, since \({\tilde{f}}\) has compact support, \(f\in {\mathcal {D}}_{{\mathcal {L}}}\) by Lemma 3.7 again. \(\square \)

We can finally state the main result of this section, namely the equivalence between the original martingale problem and the Zvonkin-transformed martingale problem.

Theorem 3.9

Let Assumption A1 hold.

  1. (i)

    If \((X, {\mathbb {P}})\) is a solution to MP with distributional drift b and i.c. \(\mu \) then \((Y, {\mathbb {P}}) \) is a solution in law to (3.3), where \(Y_t:= \phi (t, X_t)\) and \(Y_0\sim \nu \), where \(\nu \) is the pushforward measure of \(\mu \) given by \(\nu := \mu (\psi (0, \cdot ))\).

  2. (ii)

    If \((Y, {\mathbb {P}}) \) is a solution in law to (3.3) with \(Y_0\sim \nu \) then \((X, {\mathbb {P}})\) is a solution to MP with distributional drift b and i.c. \(\mu \), where \(X_t:= \psi (t, Y_t)\) and \(\mu \) is the pushforward measure of \(\nu \) given by \(\mu := \nu (\phi (0, \cdot ))\).

Proof

Item (i). Let \((X, {\mathbb {P}})\) be a solution of MP. For any \({\tilde{f}} \in C_c^\infty \) we define \(f:= {\tilde{f}} \circ \phi \), where \(\phi \) is the unique solution of PDE (2.4). By Lemma 3.7, \(f \in {\mathcal {D}}_{{\mathcal {L}}}\). Setting \(Y_t:= \phi (t, X_t)\), by Lemma 3.7 we have

$$\begin{aligned} {\tilde{f}} (Y_t) -{\tilde{f}} (Y_0) - \int _0^t (\tilde{{\mathcal {L}}} \tilde{f}) (s, Y_s) \mathrm ds = f(t, X_t) - f(0,X_0) - \int _0^t ( \mathcal L f)(s, X_s) \mathrm ds, \end{aligned}$$

which is a local martingale under \({\mathbb {P}}\) for all \({\tilde{f}}\in C^\infty _c\) by Definition 3.1 since \(f\in \mathcal D_{{\mathcal {L}}}\). It follows that the couple \((Y, {\mathbb {P}})\) satisfies the Stroock–Varadhan martingale problem; therefore, \((Y, {\mathbb {P}})\) is a solution in law of SDE (3.3).

Item (ii). Let \((Y, {\mathbb {P}})\) be a solution in law of SDE (3.3). We define \(X_t:= \psi (t, Y_t)\), where \(\psi \) is the (space-)inverse of \(\phi \) defined in (2.5). To show that \((X, {\mathbb {P}})\) is a solution to MP with distributional drift b, we need to show that for all \(f\in {\mathcal {D}}_{{\mathcal {L}}}\) the quantity

$$\begin{aligned} f(t, X_t) - f(0, X_0) -\int _0^t ({\mathcal {L}} f)(s, X_s) \mathrm d s \end{aligned}$$

is a local martingale under \({\mathbb {P}}\). Since \( f\in \mathcal D_{{\mathcal {L}}} \) then there exists \(g \in C_T {\mathcal {C}}^{0+}\) (so there exists \(\nu \in (0,1)\) with \(g\in C_T {\mathcal {C}}^\nu \)) such that \({\mathcal {L}} f = g\). We define \({\tilde{g}}:= g\circ \psi \), \({\tilde{f}}_T:= f (T, \psi (T, \cdot ))\) and \({\tilde{f}}^n_T: = {\tilde{f}}_T *\rho _n\), where \(\rho _n=p_{\frac{1}{n}}\) with \(p_t\) the heat kernel. We see that \( {\tilde{g}}\in {\mathcal {C}}^{0,\nu }\), see Appendix A for the explicit definition of the space. Indeed \({\tilde{g}}\) is in \(C([0,T]\times \mathbb {R}^d)\) because g and \(\psi \) are, and it is easy to obtain the bound

$$\begin{aligned} \sup _{t\in [0,T]} \sup _{x\ne y} \frac{|{\tilde{g}}(t, x)- {\tilde{g}}(t, y)| }{|x-y|^\nu } \le \sup _{t\in [0,T]} \Vert g(t)\Vert _{{\mathcal {C}}^\nu } \Vert \nabla \psi \Vert _\infty ^\nu , \end{aligned}$$

using the fact that \(g\in C_T{\mathcal {C}}^\nu \) and \(\psi \in C^{0,1}\) with gradient \(\nabla \psi \) uniformly bounded, see Sect. 2.2. Moreover, \({\tilde{f}}^n_T\in C^{2+\nu }\) (for the definition of these spaces and their inclusions in other spaces, see Appendix A) and by Remark 3.4 the coefficients of \(\tilde{{\mathcal {L}}}\) are in \(C^{0,\nu }\). So by [21, Theorem 5.1.9] (recalled as Theorem A.3 in the Appendix for ease of reading) we know that for each n there exists a function \({\tilde{f}}^n \in C^{1, 2+\nu }([0,T]\times \mathbb {R}^d)\) which is the classical solution of

$$\begin{aligned} \left\{ \begin{array}{l} \tilde{{\mathcal {L}}} {\tilde{f}}^n = {\tilde{g}}\\ {\tilde{f}}^n(T) = {\tilde{f}}^n_T. \end{array} \right. \end{aligned}$$
(3.13)

Therefore, \({\tilde{f}}^n \in C^{1,2}\) and thus by Itô’s formula

$$\begin{aligned} {\tilde{f}}^n(t, Y_t) - {\tilde{f}}^n(0, Y_0) -\int _0^t {\tilde{g}}(s, Y_s) \mathrm d s \end{aligned}$$

is a local martingale under \({\mathbb {P}}\). Here we used that \( (\tilde{ {\mathcal {L}}} {\tilde{f}}^n)(s, Y_s) = {\tilde{g}}(s, Y_s)\) by construction. Setting \(f^n:= {\tilde{f}}^n \circ \phi \), we also have that

$$\begin{aligned} f^n(t, X_t) -f^n(0, X_0) -\int _0^t g(s, X_s) \mathrm d s \end{aligned}$$
(3.14)

is a local martingale under \({\mathbb {P}}\). Using the definition of \({\tilde{g}}\), the fact that \({\tilde{f}}^n\) is a classical solution of PDE (3.13) and \({\tilde{f}}^n \in C^{1,2}_{buc}\) (see Remark A.2), by Lemma 3.7 we know that

$$\begin{aligned} g= {\tilde{g}} \circ \phi = {\tilde{{\mathcal {L}}}} {\tilde{f}}^n \circ \phi = {\mathcal {L}} f^n, \end{aligned}$$

in \(C_T{\mathcal {C}}^{\nu }\) and thus in particular \(f^n\) is a weak solution of

$$\begin{aligned} \left\{ \begin{array}{l} {{\mathcal {L}}} f^n = g\\ f^n(T) = f^n_T, \end{array} \right. \end{aligned}$$
(3.15)

where \(f^n_T:= {\tilde{f}}^n(T) \circ \phi (T, \cdot )\).

Now we claim that \(f^n\) is the unique mild solution to (3.15) in \(C_T{\mathcal {C}}^{(1+\beta )+}\) and that \(f^n\rightarrow f\) uniformly on compacts (these claims will be proven later). By this convergence, taking the limit in (3.14) after replacing \(g\) with \({\mathcal {L}} f\), we get that

$$\begin{aligned} f(t, X_t) - f(0, X_0) -\int _0^t ({\mathcal {L}} f)(s, X_s) \mathrm d s \end{aligned}$$

is a local martingale under \({\mathbb {P}}\), thanks to the fact that the space of local martingales is closed under u.c.p. convergence.

It is left to prove that \(f^n\) is the unique mild solution to (3.15) in \(C_T{\mathcal {C}}^{(1+\beta )+}\) and that \(f^n\rightarrow f\) uniformly on compacts, which we do in three steps.

Step 1: we prove that \(f^n\) is the unique mild solution to (3.15) in \( C_T{\mathcal {C}}^{(1+\beta )+} \). To do so, we first show that \(f^n\in C_T{\mathcal {C}}^{(1+\beta )+} \): indeed \(f^n: = {\tilde{f}}^n \circ \phi \) with \({\tilde{f}}^n\in C_{buc}^{1,2}\) and \(\phi \) solution of PDE (2.4), so by Lemma 3.6 we have \(f^n\in C_T{\mathcal {C}}^{(1+\beta )+} \). In Sect. 2.2 it is recalled that weak and mild solutions are equivalent; therefore, \(f^n\) is the unique (mild) solution in \(C_T{\mathcal {C}}^{(1+\beta )+}\).

Step 2: we prove that \(f_T^n \rightarrow f_T: = f(T)\) in \(\mathcal C^{(1+\beta )+}\). Recall that \(f^n_T = {\tilde{f}}^n_T \circ \phi (T, \cdot )\), so by Lemma 3.6 again we have \(f^n_T\in \mathcal C^{(1+\beta )+} \). Moreover, \(f_T = f(T)\in {\mathcal {C}}^{(1+\beta )+} \) because \(f\in {\mathcal {D}}_{{\mathcal {L}}}\). Next, \(\tilde{f}_T \in {\mathcal {C}}^{(1+\beta )+} \) by Lemma 3.5 applied to the definition \({\tilde{f}}_T:= f_T \circ \psi (T, \cdot ) \), since \(f_T\in {\mathcal {C}}^{(1+\beta )+}\) by definition of \({\mathcal {D}}_{{\mathcal {L}}} \) and \(\psi ( T, \cdot ) \in C^1\) with \(\nabla \psi (T, \cdot ) \in {\mathcal {C}}^{(1-\beta )-}\), see Sect. 2.2. Since \({\tilde{f}}_T^n = {\tilde{f}}_T *\rho _n\) and convolution with the mollifier \(\rho _n\) preserves the regularity of \({\tilde{f}}_T\) by [18, Lemma 2.4], we have \({\tilde{f}}^n_T \rightarrow {\tilde{f}}_T\) in \({\mathcal {C}}^{(1+\beta )+}\), see Sect. 2.2. Finally, again by Lemma 3.5 we have \({\tilde{f}}^n_T \circ \phi (T, \cdot ) \rightarrow {\tilde{f}}_T \circ \phi (T, \cdot )\) in \({\mathcal {C}}^{(1+\beta )+}\), as wanted.

Step 3: we prove that \(f^n\rightarrow f\) uniformly, in particular uniformly on compacts. From Step 1 we have that \(f^n\) is the unique solution of (3.15) in \(C_T {\mathcal {C}}^{(1+\beta )+}\). Moreover, we recall that f is the unique mild solution in the same space of \({\mathcal {L}} f = g\) with terminal condition \( f_T = f(T)\). We can now apply continuity results on the PDE (3.15), see Sect. 2.2, to conclude that \(f^n\rightarrow f\) in \(C_T {\mathcal {C}}^{(1+\beta )+}\). This clearly implies that \(f^n\rightarrow f\) uniformly, as wanted. \(\square \)

Remark 3.10

It is possible to define an equivalent MP via a transformation different from the one used in Theorem 3.9. Indeed, it is enough to consider a generic transformation \(\phi \in C_T D \mathcal C^{\beta +}\) which is space-invertible with inverse \(\psi \), and under which one has the equivalence between \((X, {\mathbb {P}})\) solving the MP with respect to \({\mathcal {L}}\) and \((\phi (X), {\mathbb {P}})\) solving the MP with respect to \(\tilde{{\mathcal {L}}}\), where \(\tilde{ \mathcal L} {\tilde{f}}:= ({\mathcal {L}} f) \circ \psi \). The difficulty in going further is to interpret \(\tilde{{\mathcal {L}}} {\tilde{f}} = {\tilde{g}}\) as a PDE, which would need to be considered in the mild sense and would presumably require some regularity of \(\phi \). Well-posedness of such an equation would rest on Schauder-type estimates for the time-dependent semigroup generated by the diffusive component of the operator \( \tilde{{\mathcal {L}}}\), which are far from straightforward.

From now on, let \((b^n)\) be the sequence defined in [19, Proposition 2.4], so that \(b^n\rightarrow b\) in \(C_T{\mathcal {C}}^{-\beta }\), \(b^n \in C_T {\mathcal {C}}^\gamma \) for all \(\gamma \in \mathbb {R}\) and \(b^n\) is bounded and Lipschitz. Recall that \(\lambda >0\) has been fixed independently of n, chosen such that (2.6) holds. To conclude the section, we prove a continuity result for the transformed problem for Y that will be useful when we prove analogous continuity results for the original problem for X. Let us denote by \(Y^n\) the strong solution of

$$\begin{aligned} Y^n_t= \phi (0, X_0) + \lambda \int _0^t Y^n_s \textrm{d}s - \lambda \int _0^t \psi ^n(s, Y^n_s) \textrm{d}s + \int _0^t \nabla \phi ^n(s, \psi ^n(s, Y^n_s) )\textrm{d}W_s,\nonumber \\ \end{aligned}$$
(3.16)

which is the counterpart of (3.3) when one replaces b with \(b^n\).
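A minimal Euler–Maruyama sketch of (3.16) in \(d=1\), under the illustrative assumption \(u^n(t,x)=0.3\sin (x)\) (so \(\phi ^n=\text {id}+u^n\), and \(\psi ^n\) is computed by fixed-point iteration, which converges since the stand-in satisfies \(\Vert \nabla u^n\Vert _\infty \le \tfrac{1}{2}\)); none of these functions come from the paper's PDE.

```python
import numpy as np

lam, T, nsteps, npaths = 1.0, 1.0, 200, 50
dt = T / nsteps
rng = np.random.default_rng(0)

u  = lambda t, x: 0.3 * np.sin(x)   # stand-in for u^n; ||grad u||_inf = 0.3 < 1/2
du = lambda t, x: 0.3 * np.cos(x)
phi = lambda t, x: x + u(t, x)

def psi(t, y, iters=50):
    """Space-inverse of phi via the fixed point x = y - u(t, x) (contraction)."""
    x = y.copy()
    for _ in range(iters):
        x = y - u(t, x)
    return x

Y = phi(0.0, np.zeros(npaths))              # Y_0 = phi(0, X_0) with X_0 = 0
for k in range(nsteps):
    t = k * dt
    X = psi(t, Y)
    drift = lam * (Y - X)                   # = lam * u(t, psi(t, Y))
    sigma = 1.0 + du(t, X)                  # = grad phi(t, psi(t, Y)) in d = 1
    Y = Y + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(npaths)

assert Y.shape == (npaths,) and np.all(np.isfinite(Y))
```

Note that both coefficients are bounded uniformly in the mollification parameter, which is exactly the point of Remark 3.11 below.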

Remark 3.11

We notice that the drift and the diffusion coefficient of (3.16) are uniformly bounded in n. Indeed the drift is given by \(\lambda (y -\psi ^n(s, y)) = \lambda u^n (s, \psi ^n(s,y))\) and the diffusion coefficient is \( \nabla \phi ^n(s, \psi ^n(s, y)) = \nabla u^n(s, \psi ^n(s, y)) + I_d\). Thanks to [18, Lemma 4.9], for every fixed \(\alpha \in (\beta , 1-\beta )\) we have

$$\begin{aligned} \Vert u^n \Vert _{ C_T {\mathcal {C}}^{\alpha +1}} \le R_\lambda ( \Vert b^n\Vert _{ C_T {\mathcal {C}}^{-\beta }} ) \Vert b^n\Vert _{C_T {\mathcal {C}}^{-\beta }} \le R_\lambda ( \sup _n\Vert b^n\Vert _{ C_T {\mathcal {C}}^{-\beta }} ) \sup _n \Vert b^n\Vert _{C_T {\mathcal {C}}^{-\beta }}, \end{aligned}$$

where \(R_\lambda \) is an increasing function. Thus, \(u^n\) and \(\nabla u^n\) are uniformly bounded in n.

Lemma 3.12

Let \(Y^n\) be the solution of SDE (3.16). Then the sequence of laws of \((Y^n)\) is tight.

Proof

According to [20, Theorem 4.10 in Chapter 2] we need to prove that

$$\begin{aligned} \lim _{\eta \rightarrow \infty } \sup _{n\ge 1} {\mathbb {P}}(| Y^n_0 |>\eta ) =0 \end{aligned}$$
(3.17)

and that for every \(\varepsilon >0\)

$$\begin{aligned} \lim _{\delta \rightarrow 0} \sup _{n\ge 1} {\mathbb {P}} \Big ( \sup _{\begin{array}{c} s,t \in [0,T] \\ |s-t|\le \delta \end{array}} |Y^n_t-Y^n_s|>\varepsilon \Big ) =0. \end{aligned}$$
(3.18)

We know that \( Y^n_0 = \phi ^n(0, X_0)\) and \(X_0\sim \mu \). By continuity results on the PDE (2.4), see Sect. 2.2, we have that \(\phi ^n\rightarrow \phi \) uniformly and that

$$\begin{aligned}a:= \sup _{n\ge 1} |\phi ^n(0,0)|<\infty \quad \text {and} \quad b:= \sup _{n\ge 1} \Vert \nabla \phi ^n\Vert _\infty <\infty .\end{aligned}$$

So the first condition (3.17) for tightness gives

$$\begin{aligned} {\mathbb {P}}(| Y^n(0) |>\eta )&= {\mathbb {P}}(| \phi ^n(0, X_0) |>\eta ) \\&\le {\mathbb {P}}(| \phi ^n(0,0) | + \Vert \nabla \phi ^n\Vert _\infty |X_0|>\eta ) \\&\le {\mathbb {P}}(a + b |X_0| >\eta ). \end{aligned}$$

Noticing that \(a+b|X_0| \) is a finite random variable (independent of n) then we have (3.17).

Concerning the second bound (3.18) for tightness, we first observe that the classical Kolmogorov criterion

$$\begin{aligned} \mathbb {E}[|Y_t^n - Y_s^n|^4] \le C |t-s|^2 \end{aligned}$$
(3.19)

holds for some positive constant C independent of n. The proof of this bound works exactly as in [12, Step 3 of Proposition 29]: indeed, the process \(Y^n\) therein has the same form as \(Y^n\) given by (3.16). By Remark 3.11 the drift and diffusion coefficients are uniformly bounded in n, so [12, Step 3 of Proposition 29] allows us to show (3.19).

Now we apply the Garsia–Rodemich–Rumsey lemma (see e.g. [3, Sect. 3]): for every \(0<m<1\) there exist a constant \(C'\) and a random variable \(\Gamma _n\) such that

$$\begin{aligned} | Y_t^n - Y_s^n|^4 \le C' |t - s|^m \Gamma _n \end{aligned}$$

with

$$\begin{aligned} {\mathbb {E}}(\Gamma _n) \le c\ C \frac{1}{1-m} T^{2-m}, \end{aligned}$$
(3.20)

where c is a universal constant. Consequently, for every \(\varepsilon >0\) and for every \(n\ge 1\)

$$\begin{aligned} {\mathbb {P}} \Big ( \sup _{\begin{array}{c} s,t \in [0,T] \\ |s-t|\le \delta \end{array}} |Y^n_t-Y^n_s|>\varepsilon \Big ) =&\;{\mathbb {P}} \Big ( \varepsilon < \sup _{\begin{array}{c} s,t \in [0,T] \\ |s-t|\le \delta \end{array}} |Y^n_t-Y^n_s| \le C'^{\frac{1}{4}} \delta ^{\frac{m}{4}} \Gamma _n^{\frac{1}{4}} \Big )\\ \le&\;{\mathbb {P}} \Big ( \varepsilon \le C'^{\frac{1}{4}} \delta ^{\frac{m}{4}} \Gamma _n^{\frac{1}{4}} \Big )\\ \le&\;{\mathbb {P}} \Big ( \Gamma _n \ge \frac{\varepsilon ^4}{ C'\delta ^{m} } \Big )\\ \le&\;\frac{ C'\delta ^{m} }{\varepsilon ^4} \mathbb {E}(\Gamma _n), \end{aligned}$$

by Chebyshev's inequality. So, using (3.20) we have that \(\sup _{n\ge 1} {\mathbb {P}} \Big ( \sup _{\begin{array}{c} s,t \in [0,T] \\ |s-t|\le \delta \end{array}} |Y^n_t-Y^n_s|>\varepsilon \Big ) \rightarrow 0\) as \(\delta \rightarrow 0\) and (3.18) is established. \(\square \)
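The final chain of estimates can be traced with concrete numbers: combining (3.20) with the Chebyshev step gives the tail bound \(C'\delta ^m \mathbb {E}(\Gamma _n)/\varepsilon ^4\), which vanishes as \(\delta \rightarrow 0\) uniformly in n. All constants below are placeholders, not values from the paper.

```python
# placeholder constants for (3.20) and the Chebyshev step
c, C, Cp, m, T, eps = 1.0, 1.0, 1.0, 0.5, 1.0, 0.1

def tail_bound(delta):
    E_gamma = c * C * T ** (2 - m) / (1 - m)   # bound (3.20), uniform in n
    return Cp * delta ** m * E_gamma / eps ** 4

bounds = [tail_bound(d) for d in (1e-1, 1e-2, 1e-3, 1e-4)]
assert all(b2 < b1 for b1, b2 in zip(bounds, bounds[1:]))  # decreases as delta -> 0
assert tail_bound(1e-14) < 1e-2
```

The uniformity in n is the whole point: the right-hand side depends on n only through the bound (3.20), which is n-independent.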

Remark 3.13

When \(Y_0=y \) is a deterministic initial condition, we know that (3.3) admits existence and uniqueness in law by [28, Theorem 10.2.2]: the drift and diffusion coefficient are bounded by Remark 3.11, the diffusion coefficient is continuous since \(\nabla \phi \) and \(\psi \) are continuous, and it is uniformly non-degenerate since \(\Vert \nabla u\Vert _\infty \le \frac{1}{2}\), see Sect. 2.2.

4 The Martingale Problem for X

In this section we solve the martingale problem for the process X, which formally satisfies an SDE of the form

$$\begin{aligned} X_t = X_0 + \int _0^t b(s, X_s) \textrm{d}s + W_t, \end{aligned}$$

where W is a d-dimensional Brownian motion, the drift b is an element of \(C_T{\mathcal {C}}^{(-\beta )+}\) that satisfies Assumption A1 and the initial condition \(X_0\) is a given random variable. To do so, we first solve the problem for a deterministic initial condition and then use this to extend the result to an arbitrary initial condition. We also derive some properties of the solution, such as its link to the Fokker–Planck equation and continuity properties.

We start with the case when the drift b is a function, by comparing the notion of solution to the singular MP with the notion of solution in law of SDEs, and with the Stroock–Varadhan Martingale Problem, see [28, Sect. 6.0]. We recall that \((X, {\mathbb {P}})\) is a solution to the Stroock–Varadhan Martingale Problem with respect to \({\mathcal {L}}\) if for every \(f\in C_c^\infty \)

$$\begin{aligned} f( X_t) - f(X_0) - \int _0^t (\frac{1}{2} \Delta f (X_s) + \nabla f(X_s) b(s, X_s)) \textrm{d}s \end{aligned}$$
(4.1)

is a local martingale.

Lemma 4.1

Let \(b\in C_T {\mathcal {C}}^{0+}\). Let \((\Omega , {\mathcal {F}}, \mathbb P)\) be some probability space. Let \(X_0 \sim \mu \). Then the following are equivalent.

  1. (i)

    The couple \((X, {\mathbb {P}})\) is solution to the MP with distributional drift b.

  2. (ii)

    The couple \((X, {\mathbb {P}})\) is solution to the Stroock–Varadhan Martingale Problem with respect to \({\mathcal {L}}\).

  3. (iii)

    There exists a Brownian motion W such that the process X under \({\mathbb {P}}\) is a solution of \(\textrm{d}X_t = b(t, X_t) \textrm{d}t + \textrm{d}W_t\).

Proof

(ii) \(\iff \) (iii). This follows from the classical Stroock–Varadhan theory, see [28, Chapter 8]. We sketch the proof for completeness. If the Stroock–Varadhan Martingale Problem is fulfilled, i.e. if (ii) holds, then in fact (4.1) also holds for \(f \in C^2\). Choosing \(f(x) = x^i\) and \(f(x) = x^i x^j, 1 \le i,j \le d,\) one can show that \(M_t = X_t - X_0 - \int _0^t b(s,X_s) \textrm{d}s\) is a local martingale with covariation matrix \(([M^i, M^j]_t)_{i,j} = t\, I_d\). The process M is then a standard d-dimensional Brownian motion by Lévy's characterisation theorem. Vice versa, if X fulfils the SDE in (iii), then (ii) follows by Itô's formula.

(i) \(\Longrightarrow \) (ii). For this it is enough to show that for every \(f\in C_c^\infty \) (4.1) holds. This is true since \(C_c^\infty \subset {\mathcal {D}}_{{\mathcal {L}}}\) in this case.

(iii) \(\Longrightarrow \) (i). We will make use of the spaces \(C^{0,\nu }([0,T] \times {\mathbb {R}}^d)\) and \( C^{1, 2+\nu }\) for \(\nu \in (0,1)\), which have been defined in Appendix A. Since \( b \in C_T {{\mathcal {C}}}^{0+}\), by [18, Remark 4.12] we know that the unique solution \(u\in C_T {\mathcal {C}}^{(1+\beta )+}\) of PDE (2.7) is also the classical solution as given in Theorem A.3, hence \(u\in C^{1,2} \). We set \(\phi = \text {id} + u\), which thus belongs to \( C^{1, 2}\). Applying Itô's formula to \( Y_t= \phi (t, X_t)\), where X is a solution to \(\textrm{d}X_t = b(t, X_t) \textrm{d}t + \textrm{d}W_t\), we get that Y solves (3.3) with initial condition \(Y_0 \sim \nu := \mu (\psi (0, \cdot ))\), where \(\psi \) is the inverse of \(\phi \). Thus, Theorem 3.9 implies that \((X, {\mathbb {P}}) \) is a solution to the MP with (distributional) drift b and i.c. \(\mu \), as wanted. \(\square \)
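The identification of \(M_t = X_t - X_0 - \int _0^t b(s, X_s)\,\textrm{d}s\) as a Brownian motion in (ii) \(\iff \) (iii) rests on its quadratic variation being t. This can be illustrated on a simulated path, with an illustrative bounded drift and an Euler scheme (the seed and drift are arbitrary choices, not from the paper):

```python
import numpy as np

T, n = 1.0, 20_000
dt = T / n
rng = np.random.default_rng(1)
b = lambda t, x: np.cos(x)                  # an illustrative bounded drift

dW = np.sqrt(dt) * rng.standard_normal(n)
X = np.empty(n + 1); X[0] = 0.0
for k in range(n):                          # Euler scheme for dX = b dt + dW
    X[k + 1] = X[k] + b(k * dt, X[k]) * dt + dW[k]

# increments of M = X - X_0 - int b ds coincide with dW by construction, and the
# realised quadratic variation concentrates at T (Levy characterisation)
dM = np.diff(X) - b(0.0, X[:-1]) * dt
qv = np.sum(dM ** 2)
assert abs(qv - T) < 0.1
```

Here removing the drift from the increments recovers the Brownian increments exactly, so the realised quadratic variation is that of W, which concentrates at T as the mesh shrinks.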

Next we show the link between the law of the solution to the MP and the Fokker–Planck equation, in particular we show that the law of the solution to the martingale problem with distributional drift satisfies a Fokker–Planck equation.

Theorem 4.2

Let Assumption A1 hold. Let \((X,{\mathbb {P}})\) be a solution to the martingale problem with distributional drift b and initial condition \(\mu \) with density \(v_0\). Let \(v(t, \cdot )\) be the law density of \(X_t\) and let us assume that \(v\in C_T{\mathcal {C}}^{\beta +}\). Then v is a weak solution of the Fokker–Planck equation, that is for every \(\varphi \in {\mathcal {S}}\) we have

$$\begin{aligned} \langle \varphi , v(t)\rangle = \langle \varphi , v_0\rangle + \int _0^t \langle \frac{1}{2} \Delta \varphi , v(s) \rangle \textrm{d}s + \int _0^t \langle \nabla \varphi , v (s) b(s) \rangle \textrm{d}s, \end{aligned}$$
(4.2)

for all \(t\in [0,T]\).

Notice that the product v(s)b(s) appearing in the last integral is well-defined using pointwise products (2.1). We remark that the solution v is the unique solution of (4.2) by [19, Theorem 3.7 and Proposition 3.2].
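Identity (4.2) can be illustrated for a classical drift: with the illustrative choices \(b(x)=-x\) and the stationary density \(v(x)=\pi ^{-1/2}e^{-x^2}\) of \(\textrm{d}X=-X\,\textrm{d}t+\textrm{d}W\), the integrand \(\langle \tfrac{1}{2}\Delta \varphi , v\rangle + \langle \nabla \varphi , v b\rangle \) vanishes, so \(\langle \varphi , v(t)\rangle \) is constant in t. None of these objects come from the paper.

```python
import numpy as np

x = np.linspace(-10, 10, 20001)
v = np.exp(-x ** 2) / np.sqrt(np.pi)        # stationary density of dX = -X dt + dW
b = -x                                       # illustrative classical drift b(x) = -x

phi    = np.exp(-x ** 2 / 2)                 # a smooth test function
phi_x  = -x * np.exp(-x ** 2 / 2)
phi_xx = (x ** 2 - 1) * np.exp(-x ** 2 / 2)

# weak Fokker-Planck integrand of (4.2): <1/2 Lap phi, v> + <grad phi, v b>
g = 0.5 * phi_xx * v + phi_x * v * b
rhs = np.sum((g[1:] + g[:-1]) * 0.5) * (x[1] - x[0])   # trapezoidal quadrature
assert abs(rhs) < 1e-8                       # stationarity: <phi, v(t)> is constant
```

Integrating by parts, the vanishing of this quantity for all test functions is exactly the stationary Fokker–Planck equation \(\tfrac{1}{2}v'' + (xv)' = 0\), which the Gaussian above satisfies.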

Proof

It is enough to show the claim for all \(\varphi \in C_c^\infty \), since \(C_c^\infty \) is dense in \({\mathcal {S}} \). As \(\varphi \in C_c^\infty \subset {\mathcal {D}}^0_{{\mathcal {L}}}\), we can apply the operator \({\mathcal {L}}\) defined in Definition 2.2 to \(\varphi \), and we define \({\mathcal {L}} \varphi =:g \). Clearly \(\varphi \) is a weak solution of the PDE \({\mathcal {L}} \varphi =g\) with terminal condition \(\varphi \). Moreover, the function \(\varphi \) is time-independent by construction. Using the definition of \({\mathcal {L}}\) we get for all \(s\in [0,T]\) that

$$\begin{aligned} ({\mathcal {L}} \varphi )(s) = \tfrac{1}{2} \Delta \varphi + \nabla \varphi \, b(s) \end{aligned}$$
(4.3)

in \({\mathcal {C}}^{-\beta }\) (having used the regularity of \(\varphi \) and the pointwise product (2.1)). In fact, since \(t \mapsto b(t, \cdot )\in {\mathcal {C}}^{-\beta }\) is continuous in time, by (2.2) we have \({\mathcal {L}}\varphi \in C_T{\mathcal {C}}^{-\beta }\).

We now construct a sequence \((g^n)\) in \(C_T{{\mathcal {C}}}^{0+}\) that converges to g in \(C_T{\mathcal {C}}^{-\beta }\) and whose terms are compactly supported. Let \((b^n)\) be the sequence defined before (2.6); in particular, it converges to b in \(C_T\mathcal C^{-\beta }\). Let us define \( g^n:= \tfrac{1}{2} \Delta \varphi + \nabla \varphi \, b^n\). Then clearly \(g^n\in C_T{\mathcal {C}}^{0+}\) (in fact it is more regular) and

$$\begin{aligned} \Vert g-g^n\Vert _{C_T {\mathcal {C}}^{-\beta }} = \Vert \nabla \varphi \, (b-b^n)\Vert _{C_T {\mathcal {C}}^{-\beta }} \le \Vert \nabla \varphi \Vert _{C_T {\mathcal {C}}^{\beta +}} \Vert b-b^n\Vert _{C_T {\mathcal {C}}^{-\beta }}, \end{aligned}$$

and the right-hand side goes to 0 as \(n\rightarrow \infty \). Moreover, denoting by K the compact support of \(\varphi \), we see that \(g^n\) is also supported in K.

Let us denote by \(u^n\) the mild solution of \({\mathcal {L}} u^n = g^n, \, u^n(T) = \varphi \), which exists and is unique in \(C_T {\mathcal {C}}^{(1+\beta )+}\), see Sect. 2.2. This function belongs to \({\mathcal {D}}_{{\mathcal {L}}}\) by definition of the domain \({\mathcal {D}}_{{\mathcal {L}}}\), see (3.1). Since \(u^n \in {\mathcal {D}}_{{\mathcal {L}}}\) and \((X, {\mathbb {P}})\) is a solution to the martingale problem with distributional drift b and initial condition \(\mu \) with density \(v_0\), we know that

$$\begin{aligned} u^n(t, X_t) - u^n(0, X_0) - \int _0^t {\mathcal {L}} u^n (s, X_s) \textrm{d}s \end{aligned}$$

is a local martingale under \({\mathbb {P}}\), and in fact a true martingale since \(u^n\) and \({\mathcal {L}} u^n\) are bounded. Recalling that \(v(t, \cdot )\) denotes the law density of \(X_t\), taking the expectation under \({\mathbb {P}}\) we have

$$\begin{aligned} \int _{\mathbb {R}^d} u^n(t, x) v(t, x) \textrm{d}x&- \int _{\mathbb {R}^d} u^n(0, x) v_0(x) \textrm{d}x - \int _0^t \int _{\mathbb {R}^d}( {\mathcal {L}} u^n) (s, x) v(s, x) \textrm{d}x \textrm{d}s =0. \end{aligned}$$
(4.4)

We now consider a smooth function \(\chi _{K}\in C_c^\infty \) such that \(\chi _{K} = 1\) on K. Since \(g^n\) is supported in K and \({\mathcal {L}} u^n = g^n\), we can rewrite the double integral in (4.4) as

$$\begin{aligned} \int _0^t \int _{\mathbb {R}^d}( {\mathcal {L}} u^n) (s, x) v(s, x) \textrm{d}x \textrm{d}s =&\int _0^t \int _{\mathbb {R}^d}( {\mathcal {L}} u^n) (s, x) v(s, x) \chi _{K}(x)\textrm{d}x \textrm{d}s\\ =&\int _0^t \langle ( {\mathcal {L}} u^n) (s) v(s), \chi _{K}\rangle \textrm{d}s, \end{aligned}$$

where the dual pairing is between \({\mathcal {S}}\) and \({\mathcal {S}}'\). By the continuity properties of the PDE \({\mathcal {L}} u^n = g^n\) with terminal condition \(u^n(T)=\varphi \) (see Sect. 2.2), and since \(g^n \rightarrow g \) in \(C_T {\mathcal {C}}^{-\beta }\) while \(\varphi \) is the unique solution of \({\mathcal {L}} \varphi = g\) with terminal condition \(\varphi \), we know that \(u^n \rightarrow \varphi \) in \(C_T \mathcal C^{(1+\beta )+}\). Thus, taking the limit as \(n\rightarrow \infty \) of the above dual pairing, we get

$$\begin{aligned} \lim _{n\rightarrow \infty } \int _0^t \langle ( {\mathcal {L}} u^n) (s) v(s), \chi _{K}\rangle \textrm{d}s&= \int _0^t \langle ( {\mathcal {L}}\varphi ) (s) v(s), \chi _{K}\rangle \textrm{d}s\nonumber \\&= \int _0^t \langle \frac{1}{2}\Delta \varphi \, v(s), \chi _{K}\rangle +\langle \nabla \varphi \, b(s) v(s), \chi _{K}\rangle \textrm{d}s \nonumber \\&= \int _0^t \langle \frac{1}{2}\Delta \varphi , v(s) \rangle \textrm{d}s +\int _0^t \langle \nabla \varphi \, b(s) v(s), \chi _{K}\rangle \textrm{d}s . \end{aligned}$$
(4.5)

Now we prove that the latter dual pairing in (4.5) can be rewritten as

$$\begin{aligned} \langle \nabla \varphi \, b(s) v(s), \chi _{K}\rangle = \langle \nabla \varphi , b(s) v(s) \rangle , \end{aligned}$$
(4.6)

for all \(s\in [0,T]\). Indeed, the LHS of (4.6) is well-defined because \(\chi _{K}\in C_c^\infty \) and, for every \(s\in [0,T]\), the distribution \(\nabla \varphi \, b(s) v(s)\) is actually an element of \({\mathcal {C}}^{-\beta }\) by the pointwise product (2.1) and the regularity \(v(s)\in {\mathcal {C}}^{\beta +}\), \(b(s) \in {\mathcal {C}}^{-\beta }\). The RHS of (4.6) is also well-defined, but now the test function is \(\nabla \varphi \in C_c^\infty \) and the distribution is b(s)v(s). To show that (4.6) holds, we observe that by the continuity of the product (2.2) we have \( b^n(s) v(s) \rightarrow b(s) v(s)\) in \({\mathcal {C}}^{-\beta }\) (in fact uniformly in \(s\in [0,T]\)), and thus we can write

$$\begin{aligned} \langle \nabla \varphi \, b(s) v(s), \chi _{K}\rangle&= \lim _{n \rightarrow \infty } \langle \nabla \varphi \, b^n(s) v(s), \chi _{K}\rangle \\&= \lim _{n \rightarrow \infty } \int _{\mathbb {R}^d} \nabla \varphi (x) b^n(s,x) v(s,x) \chi _{K}(x) \textrm{d}x\\&= \lim _{n \rightarrow \infty } \int _{\mathbb {R}^d} \nabla \varphi (x) b^n(s,x) v(s,x)\textrm{d}x\\&= \lim _{n \rightarrow \infty } \langle \nabla \varphi , b^n(s) v(s) \rangle \\&=\langle \nabla \varphi , b(s) v(s) \rangle , \end{aligned}$$

for all \(s\in [0,T]\), which proves (4.6).

To conclude it is enough to take the limit as \(n\rightarrow \infty \) in (4.4) and use (4.5) and (4.6) to get (4.2). \(\square \)

The following is a continuity result for the martingale problem. Recall that \((b^n)\) is the sequence defined before (2.6) in Sect. 2.2, so that \(b^n\rightarrow b\) in \(C_T{\mathcal {C}}^{-\beta }\), \(b^n \in C_T {\mathcal {C}}^\gamma \) for all \(\gamma \in \mathbb {R}\), and each \(b^n\) is bounded and Lipschitz. We denote by \(X^n\) the (strong) solution to the SDE

$$\begin{aligned} X_t^n = X_0 + \int _0^t b^n(s, X_s^n) \textrm{d}s + W_t, \end{aligned}$$
(4.7)

where \(X_0\sim \mu \).
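Although a genuinely distributional drift cannot be evaluated pointwise, the stability of the laws of the approximating solutions of (4.7) under refinement of the smoothing can be illustrated numerically. In the sketch below all choices are illustrative: the discontinuous drift \(b(x)=-{\text {sign}}(x)\) stands in for an irregular drift, the smooth approximations \(-\tanh (nx)\) play the role of the \(b^n\) of Sect. 2.2, and an Euler scheme replaces the exact strong solution.

```python
import numpy as np

# Numerical illustration of the stability behind the continuity result below:
# the laws of the solutions X^n of (4.7) with smoothed drifts settle down as
# the smoothing is refined. Illustrative stand-ins only: b(x) = -sign(x) and
# b^n(x) = -tanh(n x) (not the actual mollifiers of Sect. 2.2).

rng = np.random.default_rng(1)

def euler_abs_mean(n, n_paths=50_000, n_steps=400, T=1.0):
    """Estimate E[|X^n_T|] for the Euler scheme with drift -tanh(n x), X_0 = 0."""
    dt = T / n_steps
    x = np.zeros(n_paths)
    for _ in range(n_steps):
        x += -np.tanh(n * x) * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return np.abs(x).mean()

m_coarse, m_fine = euler_abs_mean(5), euler_abs_mean(50)
print(m_coarse, m_fine)   # close to each other: the laws stabilise as n grows
```

The two estimates differ only by the residual smoothing error and Monte Carlo noise, consistent with convergence in law of \(X^n\).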

Theorem 4.3

Let Assumption A1 hold. Let \((b^n)\) be a sequence in \(C_T{\mathcal {C}}^{(-\beta )+}\) converging to b in \(C_T{\mathcal {C}}^{-\beta }\). Let \((X, {\mathbb {P}})\) (respectively \((X^n, {\mathbb {P}}^n)\)) be a solution to the MP with distributional drift b (respectively \(b^n\)) and initial condition \(\mu \). Then the sequence \((X^n, {\mathbb {P}}^n)\) converges in law to \((X, {\mathbb {P}})\). In particular, if \(b^n \in C_T{\mathcal {C}}^{0+}\) and \(X^n\) is a strong solution of

$$\begin{aligned} X^n_t = X_0 + \int _0^t b^n(s, X^n_s) \textrm{d}s + W_t, \end{aligned}$$

then \(X^n\) converges to \((X, {\mathbb {P}})\) in law.

Proof

The proof is identical to that of [12, Proposition 29]. In particular, Step 4 therein deals with the convergence in law of \(Y^n\), the solution of SDE (3.16), and Step 5 with the convergence in law of \(X^n\). Notice that the drift b therein lives in a different space than ours (Bessel potential spaces instead of Hölder–Besov spaces) and the initial condition in [12] is deterministic, but the setting is otherwise the same. The only tools used in Steps 4 and 5 are the tightness of the sequence of laws of \(Y^n\), which we proved in Lemma 3.12, and the uniform convergences \(u^n\rightarrow u\), \(\nabla u^n \rightarrow \nabla u\) and \(\psi ^n \rightarrow \psi \), see Sect. 2.2. Finally, setting \(X_t:= \psi (t, Y_t)\) for \(t\in [0,T]\), the couple \((X, {\mathbb {P}})\) is the unique solution to the martingale problem with distributional drift b and initial condition \(\mu \) by Theorem 3.9, because \((Y, {\mathbb {P}})\) is the unique solution to (3.3) with initial condition \(Y_0 \sim \nu \), where \(\nu \) is the pushforward measure of \(\mu \) through \(\phi \).

It remains to prove the last claim of the theorem, which follows because \(X^n\) is also a solution to the MP with distributional drift \(b^n\) by Lemma 4.1, so the first part of the theorem can be applied. \(\square \)

The first existence and uniqueness result is for the solution to the MP with distributional drift b and deterministic initial condition \(X_0 =x\). We will extend the result to any random variable in Theorem 4.5.

Proposition 4.4

The martingale problem with distributional drift b and i.c. \(\delta _x\), for \(x\in \mathbb {R}^d\), admits existence and uniqueness according to Definition 3.1.

Proof

Let \((X, {\mathbb {P}})\) be a solution to the MP. Setting \(Y_t = \phi (t, X_t)\) and \(Y_0 = y:=\phi (0,x)\), by Item (i) of Theorem 3.9 we have that \((Y, {\mathbb {P}})\) is a solution in law to (3.3). By Remark 3.13 the solution \( (Y, \mathbb P)\) is unique; hence, the law of X under \({\mathbb {P}}\) is uniquely determined.

Existence follows from the fact that equation (3.3) with \(Y_0=y\) has a solution in law, say \((Y, {\mathbb {P}})\), again by Remark 3.13. Then setting \(X_t:= \psi (t, Y_t)\) by Item (ii) of Theorem 3.9, we know that \((X, {\mathbb {P}})\) is a solution in law to MP with distributional drift b and i.c. \(\delta _x\). \(\square \)

Next we extend the existence and uniqueness result of Proposition 4.4 to the general case when the initial condition \(X_0\) is a random variable rather than a deterministic point.

Theorem 4.5

Let Assumption A1 hold and let \(\mu \) be a probability measure on \(\mathbb {R}^d\). Then there exists a unique solution \((X, {\mathbb {P}})\) to the martingale problem with distributional drift b and initial condition \(\mu \).

Proof

Existence. The idea is to use a superposition argument to glue together the solutions of the MP started from each deterministic initial condition x. This is implemented using the process \(Y_t= \phi (t, X_t)\).

Recall that \(\mu \) on \(({\mathbb {R}}^d,{\mathcal {B}} (\mathbb {R}^d) )\) is the law of the initial condition \(X_0\). We define a new measure \(\nu \) on the same space by \(\nu (B)= \mu (\psi (0, B))\) for any \(B \in {\mathcal {B}} (\mathbb {R}^d) \), where \(\psi = \phi ^{-1}\) has been defined in (2.5). Notice that \(\nu \) is the pushforward of \(\mu \) through \(\phi (0,\cdot )\); thus \(\nu \) plays the role of the initial condition for the process \(\phi (t, X_t)\). Let Y be the canonical process and \({\mathbb {P}}^y\) the law on \( {\mathcal {C}}_T \) under which \((Y,{\mathbb {P}}^y)\) is the unique weak solution to (3.3) with \(Y_0 =y\). By [28, Theorem 7.1.6] the map \( (y, C)\mapsto {\mathbb {P}}^y(C)\) is a random kernel for \(y\in \mathbb {R}^d\) and \(C\in {\mathcal {B}}({\mathcal {C}}_T )\); hence, the probability \({\mathbb {P}}\) given by

$$\begin{aligned} {\mathbb {P}}(C):= \int {\mathbb {P}}^y (C)\nu (\textrm{d}y) \end{aligned}$$
(4.8)

is well-defined. Setting \(X_t:= \psi (t, Y_t)\), our candidate solution to the MP with distributional drift b and initial condition \(\mu \) is \((X, {\mathbb {P}}) \). First, we observe that for any \(C \in \mathcal B({\mathcal {C}}_T)\) of the form \(C= \{\omega : \omega _0 \in B\} \) with some \(B\in {\mathcal {B}}(\mathbb {R}^d)\), we have

$$\begin{aligned} {\mathbb {P}}^y(C) = {\mathbb {P}}^y (\omega \in C) = {\mathbb {P}}^y (Y_0 \in B) = \mathbbm {1}_{B}(y), \end{aligned}$$
(4.9)

having used that \({\mathbb {P}}^y\)-a.s. the canonical process Y is such that \(Y_0=y\). This will allow us to show that the initial condition \(X_0\) has law \(\mu \). Indeed, for any \(A\in \mathcal B(\mathbb {R}^d)\) we set \(B = \phi (0, A)\) and we calculate

$$\begin{aligned} {\mathbb {P}}(X_0 \in A)&= {\mathbb {P}}(\psi (0, Y_0) \in A) = \mathbb P(Y_0 \in \phi (0,A)) = {\mathbb {P}}(Y_0 \in B). \end{aligned}$$
(4.10)

Now using the definition (4.8) of \({\mathbb {P}}\) and setting \(C= \{Y_0 \in B\} \) we have \({\mathbb {P}}(Y_0 \in B)= {\mathbb {P}}(C) = \int {\mathbb {P}}^y (C) \nu (\textrm{d}y) \) and by (4.9) we have

$$\begin{aligned} {\mathbb {P}}(Y_0 \in B) = \int _{B} \nu (\textrm{d}y)= \nu (B)= \nu ( \phi (0, A)). \end{aligned}$$
(4.11)

Finally, using the definition of \(\nu \) and the fact that \( \psi \) is the inverse of \(\phi \) we have \({\mathbb {P}}(X_0 \in A)= \mathbb P(C)=\mu (A)\) as wanted.

Next we show that for every \(f\in {\mathcal {D}}_{{\mathcal {L}}}\) the process

$$\begin{aligned} M^{f}_u (X): = f(u, X_u) - f(0, X_0) - \int _0^u ({\mathcal {L}} f)(r, X_r) \textrm{d}r, \end{aligned}$$
(4.12)

is a martingale under \({\mathbb {P}}\), that is, for every \(f\in \mathcal D_{{\mathcal {L}}}\) and every bounded continuous functional \(F_s\) on \({\mathcal {C}}_s\) (see Sect. 2.3) we have

$$\begin{aligned} {\mathbb {E}} [M^{f}_t(X) F_s(X)] = {\mathbb {E}} [M^{f}_s (X) F_s(X)], \end{aligned}$$

for all \(0\le s\le t\le T\). Indeed, we notice that under \(\mathbb P^y\) we have \(Y_0\sim \delta _y\); hence, \(X_0\sim \delta _{\psi (0, y)}=:\delta _x\). Moreover, \((Y,{\mathbb {P}}^y)\) is a solution of (3.3) with i.c. \(Y_0=y\); hence, by Theorem 3.9 part (ii) we have that \((X_\cdot := \psi (\cdot , Y_\cdot ),\mathbb P^y)\) is a solution to the MP with distributional drift b and i.c. \(X_0 \sim \delta _x\); thus, by the definition of \({\mathbb {P}}\) given in (4.8) we get

$$\begin{aligned} {\mathbb {E}} [(M^{f}_t (X)-M^{f}_s(X)) F_s(X)]&= \int {\mathbb {E}}^{y} [(M^{f}_t(X) -M^{f}_s(X)) F_s(X)] \nu (\textrm{d}y) = 0, \end{aligned}$$

where we denoted by \({\mathbb {E}}^y\) the expectation under \(\mathbb P^y\).
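The construction (4.8) can be illustrated numerically: expectations under \({\mathbb {P}}\) are \(\nu \)-mixtures of expectations under the kernels \({\mathbb {P}}^y\). In the sketch below all ingredients are hypothetical stand-ins: a smooth drift \(-y\) replaces the coefficients of (3.3), \(\nu \) is uniform on three points, and Euler paths realise each \({\mathbb {P}}^y\); the point is only that two-stage sampling ("first \(y\sim \nu \), then a path from y") matches the mixture of fixed-start estimators.

```python
import numpy as np

# Sketch of the superposition (4.8): expectations under P are nu-mixtures of
# expectations under the kernels P^y. Each P^y is realised by an Euler scheme
# from the deterministic start y (illustrative smooth drift b(y) = -y), and
# nu is taken uniform on three points. Both estimators target E[G(Y)] with
# G(path) = max_t path_t.

rng = np.random.default_rng(2)
ys = np.array([-1.0, 0.0, 1.0])          # support of nu (uniform weights)

def running_max_mean(y0, n_paths=20_000, n_steps=200, T=1.0):
    """E^y[max_t Y_t] under the Euler scheme started from y0 (scalar or array)."""
    dt = T / n_steps
    y = np.full(n_paths, float(y0)) if np.isscalar(y0) else y0.copy()
    m = y.copy()
    for _ in range(n_steps):
        y += -y * dt + np.sqrt(dt) * rng.standard_normal(len(y))
        np.maximum(m, y, out=m)
    return m.mean()

est_mixture = np.mean([running_max_mean(y) for y in ys])   # int E^y[G] nu(dy)
start = rng.choice(ys, size=60_000)                        # first sample Y_0 ~ nu
est_glued = running_max_mean(start)                        # then run the paths
print(est_mixture, est_glued)            # the two estimators agree
```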

Uniqueness. Here the idea is to use disintegration in order to reduce the MP to MPs with deterministic initial condition. We proceed by stating and proving two preliminary facts.

Fact 1:

Let \(E^1\) be a dense countable set in \(C_c(\mathbb {R}^d)\), \(E^2\) be a dense countable set in \(\mathcal D_{{\mathcal {L}}}\) and \(E^{ {\mathcal {C}}_s}\) be a countable set of bounded continuous functionals such that for every bounded continuous functional \(F_s\) on \( {\mathcal {C}}_s\) there exists a sequence \((F_s^n)\subset E^{ {\mathcal {C}}_s}\) with \(F^n_s \rightarrow F_s\) in a pointwise uniformly bounded way, see (4.17). A couple \((X, {\mathbb {P}})\) is a solution to the MP with distributional drift b and initial condition \(X_0\) if and only if

$$\begin{aligned} {\mathbb {E}} [M^{f}_t(X) F_s(X) g(X_0)] = {\mathbb {E}} [M^{f}_s(X) F_s(X)g(X_0)], \end{aligned}$$
(4.13)

for every \(f\in E^2, F_s\in E^{{\mathcal {C}}_s}, g\in E^1\) and \(s<t\) with \(s,t \in {\mathbb {Q}} \cap [0,T]\), where \( M^{f}_u (X)\) is given by (4.12).

This fact can be seen as follows. First, we notice that, since the processes \(M^f\) are bounded, a local martingale \(M^f\) is automatically a true martingale; hence, the MP with distributional drift is equivalent to (4.13) holding for all \(f\in {\mathcal {D}}_{\mathcal L}\), \(F_s\in {\mathcal {C}}_s\), \(g\in C_c\) and \(s<t\) with \(s,t \in [0,T]\).

Next, one can show that this is equivalent to requiring it only for \(s<t\) with \(s,t \in {\mathbb {Q}} \cap [0,T]\). Indeed, for any bounded and continuous functional \(F_s\) on \( {\mathcal {C}}_s\) and any sequence of rational times \(s_n \downarrow s\) with \(s<s_n<t\), we can associate a sequence of bounded and continuous functionals \(F_{s_n} \) on \({\mathcal {C}}_{s_n}\) by setting \(F_{s_n}(\eta ):= F_s (\eta \vert _{[0,s]})\) for \(\eta \in {\mathcal {C}}_{s_n}\).

This allows us to replace the condition \(s\in {\mathbb {Q}}\cap [0,T]\) with \(s\in [0,T]\). To replace \(t \in {\mathbb {Q}}\cap [0,T]\) with \(t\in [0,T]\), we choose a rational sequence \(t_n \in (s, T] \) with \(t_n \rightarrow t\) and use the continuity of the local martingale \(t \mapsto M^f_t\) together with the Lebesgue dominated convergence theorem.

Finally, we use the Lebesgue dominated convergence theorem again to deduce (4.13) for all \(f\in {\mathcal {D}}_{\mathcal L}\), \(F_s\in {\mathcal {C}}_s\) and \(g\in C_c\) from its validity on the dense sets \(E^2\), \(E^{{\mathcal {C}}_s}\) and \(E^1\).

We remark that \(E^1\) exists because \(C_c\) is separable by [18, Lemma 5.7 (ii)], \(E^2\) exists because \({\mathcal {D}}_{{\mathcal {L}}}\) is separable by Proposition 3.3, and \(E^{{\mathcal {C}}_s}\) exists by Lemma 4.6, whose statement and proof are postponed to the end of this section.

Fact 2:

Let \((X, {\mathbb {P}})\) be a solution to the MP with distributional drift b and i.c. \(\mu \). There exists a random kernel \({\mathbb {P}}^{x}\) such that \({\mathbb {P}} = \int {\mathbb {P}}^{x} \textrm{d}\mu (x)\), where for \(\mu \)-almost all \(x\in \mathbb {R}^d\) the measure \({\mathbb {P}}^{x}\) is concentrated on \(\{\omega \in \Omega : X_0(\omega )=x\}\), and for any bounded and continuous functional \(G: C([0,T];\mathbb {R}^d) \rightarrow \mathbb {R}\) we have

$$\begin{aligned} {\mathbb {E}} (G(X)) = \int _{\mathbb {R}^d} {\mathbb {E}}^{x}( G(X)) \textrm{d}\mu (x), \end{aligned}$$
(4.14)

where \({\mathbb {E}}\) and \({\mathbb {E}}^{x}\) stand for the expectation under \({\mathbb {P}}\) and \({\mathbb {P}}^{x}\) respectively.

This follows from the disintegration theorem in [8, Chapter III, nos. 70–72].

We now proceed with the proof of uniqueness. Let \((X^1, \mathbb P_1)\) and \((X^2, {\mathbb {P}}_2)\) be two solutions to the MP with distributional drift b and initial condition \(X_0 \sim \mu \). Without loss of generality we can suppose that \(X^1 = X^2 = X\) is the canonical process on \(\Omega = {\mathcal {C}}_T\). Since \((X^i, {\mathbb {P}}_i), i=1,2\) is a solution of the MP, then by Fact 1 we have

$$\begin{aligned} {\mathbb {E}}_i [(M^{f}_t(X) -M^{f }_s(X))F_s(X)g(X_0)] = 0, \end{aligned}$$

for all \(0\le s\le t\le T\), \(s,t \in {\mathbb {Q}}\), \(g \in E^1, f \in E^2\) and \( F_s \in E^{{\mathcal {C}}_s}\) and \(i=1,2\). We now apply Fact 2 to both \({\mathbb {P}}_1\) and \({\mathbb {P}}_2\), and in particular (4.14) with \(G (\eta ) = (M^f_t(\eta ) -M^f_s(\eta )) F_s(\eta ) g(\eta _0) \), to rewrite the above equality as

$$\begin{aligned} \int _{\mathbb {R}^d} {\mathbb {E}}_i^{x} [(M^{f}_t(X) -M_s^f(X)) F_s( X) g(X_0)] \textrm{d}\mu (x) =0, \end{aligned}$$
(4.15)

for all \(0\le s\le t\le T\), \(s,t \in {\mathbb {Q}}\), \(g \in E^1, f \in E^2\) and \( F_s \in E^{{\mathcal {C}}_s}\) and \(i=1,2\). Now we recall that for \(\mu \)-almost all x, we have \( X_0(\omega )=x\), \(\mathbb P^x_i\)-a.s.; thus, equation (4.15) becomes

$$\begin{aligned} \int _{\mathbb {R}^d} g(x) {\mathbb {E}}_i^{x} [(M^{f}_t(X) -M_s^f(X)) F_s( X)] \textrm{d}\mu (x) =0, \end{aligned}$$

for every \(0\le s\le t\le T\), \(s,t \in {\mathbb {Q}}\), \(g \in E^1, f \in E^2\) and \( F_s \in E^{{\mathcal {C}}_s}\) and \(i=1,2\). Since g is arbitrarily chosen in a dense subset of \(C_c(\mathbb {R}^d)\), we have

$$\begin{aligned} {\mathbb {E}}_i^{x} [(M^{f}_t(X) -M_s^f(X)) F_s( X)] =0 \quad \mu \text {-a.e.}, \end{aligned}$$
(4.16)

for every \(0\le s\le t\le T\), \(s,t \in {\mathbb {Q}}\), \(f \in E^2\) and \( F_s \in E^{{\mathcal {C}}_s}\) and \(i=1,2\). Note that (4.16) holds with a single \(\mu \)-null exceptional set because the sets \( {\mathbb {Q}} \cap [0,T]\), \(E^2 \) and \( E^{{\mathcal {C}}_s}\) are countable. By Fact 1 this means that the couple \((X,{\mathbb {P}}^{x}_i)\) is a solution to the MP with distributional drift b and initial condition \(\delta _x\) for \(i=1,2\), for \(\mu \)-almost all x. By Proposition 4.4 we have uniqueness of the MP with deterministic initial condition \(\delta _x\); hence, for \(\mu \)-almost all x we have \( \mathbb P_1^x = {\mathbb {P}}_2^x\). Thus, recalling the disintegration \(\mathbb P_i = \int {\mathbb {P}}_i^{x} \textrm{d}\mu (x)\) for \(i=1,2\) from Fact 2, we conclude \({\mathbb {P}}_1={\mathbb {P}}_2 \), as wanted. \(\square \)

We conclude the section with the proof of a technical result used in Fact 1 in the proof of Theorem 4.5.

Lemma 4.6

There exists a countable family D of bounded and continuous functionals from \( C([0,T];\mathbb {R}^d) \) to \(\mathbb {R}\) such that any bounded and continuous functional \(F: C([0,T];\mathbb {R}^d) \rightarrow \mathbb {R}\) can be approximated by a sequence \((F_n)\subset D\) in a pointwise uniformly bounded way, that is

$$\begin{aligned}&F_n \rightarrow F \text { pointwise}\nonumber \\&\sup _n \sup _{\eta \in C([0,T];\mathbb {R}^d)} | F_n(\eta )| < \infty . \end{aligned}$$
(4.17)

Proof

We set \(T=1\) without loss of generality. Let \(\eta \in C([0,1];\mathbb {R}^d)\). By [18, Lemma 5.5] the path \(\eta \) is approximated uniformly on [0, 1] by its Bernstein polynomials \(B_n(\eta , \cdot )\); hence, by continuity of F, \(F(\eta )\) is approximated by \(F_n(\eta ):= F(B_n (\eta , \cdot ))\), where the \(\mathbb {R}^d\)-valued Bernstein polynomials are defined for any function \(\eta \in C([0,1];\mathbb {R}^d)\) by

$$\begin{aligned} B_n(\eta ,t): = \sum _{j=0}^n \eta \Big (\frac{j}{n} \Big ) \binom{n}{j} t^j (1-t)^{n-j}. \end{aligned}$$

Notice that the convergence is uniform in t. Now for fixed n and \(y_0, y_1, \ldots , y_n \in \mathbb {R}^d\) we consider the function f on \(\mathbb {R}^{(n+1)d}\) defined by

$$\begin{aligned} f(y_0, y_1, \ldots , y_n):= F\left( \sum _{j=0}^n y_j \binom{n}{j} (\cdot )^j (1-\cdot )^{n-j}\right) , \end{aligned}$$

so that \(F_n(\eta ) = f (\eta (\frac{0}{n}), \eta (\frac{1}{n}), \ldots , \eta (\frac{n}{n}))\). Notice that \(\sup _{\eta \in C([0,1];\mathbb {R}^d)} | F_n(\eta )| \le \Vert F\Vert _\infty \). We have thus reduced the problem to approximating any continuous bounded function \(f: \mathbb {R}^{(n+1)d} \rightarrow \mathbb {R}\). We further reduce, by restriction, to continuous functions on \([-M, M]^{(n+1)d}\) for some \(M>0\). Indeed, a function \(f:[-M, M]^{(n+1)d} \rightarrow \mathbb {R}\) can be naturally extended to a bounded continuous function \({\hat{f}}\) on \({\mathbb {R}}^{(n+1)d}\) by setting, for \(x\in \mathbb {R}^{(n+1)d}\),

$$\begin{aligned} {\hat{f}}(x) = f(x_1\vee ( -M) \wedge M, \ldots , x_{(n+1)d}\vee ( -M) \wedge M). \end{aligned}$$

The space \( C([-M, M]^{(n+1)d})\) is separable by the Stone–Weierstrass theorem. We denote by \(D_{n, \text {fin}}\) the countable set of bounded continuous functions from \(\mathbb {R}^{(n+1)d}\) to \(\mathbb {R}\) obtained by extending, as above, the elements of countable dense subsets of \( C([-M, M]^{(n+1)d})\) for \(M\in \mathbb {N}\).

The proof is concluded by setting \(D:= \cup _{n\in \mathbb {N}} D_n\), where

$$\begin{aligned} \begin{aligned} D_n:= \bigg \{ F: C([0,1];\mathbb {R}^d) \rightarrow \mathbb {R}: F(\eta )&= f (\eta (\tfrac{0}{n}), \eta (\tfrac{1}{n}), \ldots , \eta (\tfrac{n}{n}) ), \\&\eta \in C([0,1];\mathbb {R}^d) \text { for some } f \in D_{n, \text {fin}} \bigg \}, \end{aligned} \end{aligned}$$

which is a countable set of bounded functionals. Then for any bounded and continuous functional \(F:C([0,1];\mathbb {R}^d) \rightarrow \mathbb {R}\) we construct the sequence \((F_n)\) converging to F pointwise by choosing the appropriate element \(F_n\in D_n\). Since the convergence in \( C([-M, M]^{(n+1)d})\) is uniform, we also have \(\sup _n \sup _{\eta \in C([0,1];\mathbb {R}^d)} | F_n(\eta )| < \infty .\) \(\square \)
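As a quick numerical illustration of the Bernstein approximation used in the proof above (a sketch, not part of the argument), the following code evaluates \(B_n(\eta ,\cdot )\) for the arbitrary test path \(\eta (t)=\sin (2\pi t)\) and checks that the sup-norm error decreases as n grows.

```python
import math
import numpy as np

# Uniform Bernstein approximation of a continuous path on [0,1]:
# B_n(eta, t) = sum_j eta(j/n) C(n,j) t^j (1-t)^(n-j).
# The test path eta(t) = sin(2 pi t) and the grid are illustrative choices.

def bernstein(eta, n, t):
    """Evaluate B_n(eta, t) on an array of times t in [0, 1]."""
    js = np.arange(n + 1)
    vals = eta(js / n)                                   # eta(j/n)
    coeffs = np.array([math.comb(n, j) for j in js], dtype=float)
    # basis[j, k] = C(n,j) t_k^j (1 - t_k)^(n-j)
    basis = coeffs[:, None] * t[None, :]**js[:, None] * (1 - t[None, :])**(n - js)[:, None]
    return vals @ basis

eta = lambda t: np.sin(2 * np.pi * t)
ts = np.linspace(0.0, 1.0, 501)
err = lambda n: np.max(np.abs(bernstein(eta, n, ts) - eta(ts)))
print(err(20), err(80))   # sup-norm error shrinks as n grows
```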

Remark 4.7

One could also define the domain \({\mathcal {D}}_{{\mathcal {L}}}\) of the martingale problem as a subset of the smaller space \(C_T\mathcal C^{(2-\beta )-}\) instead of the larger space \(C_T\mathcal C^{(1+\beta )+}\). On the other hand, one could enlarge the domain by choosing functions with linear growth, namely in \( C_TD\mathcal C^{\beta +}\). In both cases the analysis of the resulting MP is similar and should lead to a problem equivalent to the one studied in the present paper. We leave these details to the interested reader.

5 The Solution of the MP as Weak Dirichlet Process

In this section we focus on the weak Dirichlet decomposition of the solution of the MP, which will be useful in Sect. 6 to characterise it as a solution of a generalised SDE. We notice that a solution to the martingale problem with distributional drift b is not a semimartingale in general: already in the fully studied one-dimensional case, see [14, Corollary 5.11], the solution is a semimartingale if and only if b is a Radon measure. We can, however, investigate other properties of this process, which turns out to be a weak Dirichlet process, and we identify the martingale component of its weak Dirichlet decomposition.

We start with the definition of a weak Dirichlet process, which can be found in [15], see also [10, 11].

Definition 5.1

Let X be a continuous stochastic process on some probability space \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) and let \({\mathcal {F}}^X\) denote its canonical filtration.

  • A process \({\mathscr {A}}\) is said to be an \({\mathcal {F}}^X\)-martingale orthogonal process if \([N,{\mathscr {A}}]=0\) for every \({\mathcal {F}}^X\)-continuous local martingale N.

  • The process X is said to be \({\mathcal {F}}^X\)-weak Dirichlet if it is the sum of an \({\mathcal {F}}^X\)-local martingale M and an \({\mathcal {F}}^X\)-martingale orthogonal process \({\mathscr {A}}\).

    When \({\mathscr {A}}_0=0\) a.s., we call \( X = M+ {\mathscr {A}}\) the standard decomposition.

Remark 5.2

  • The two equalities in the statement of Definition 5.1, that is \([N,{\mathscr {A}}]=0\) and \( X = M+ {\mathscr {A}}\), are meant up to indistinguishability with respect to \({\mathbb {P}}\).

  • The standard decomposition of an \({\mathcal {F}}^X\)-weak Dirichlet process is unique.

In the remainder of the section, we let \((X, {\mathbb {P}})\) be the solution to the martingale problem with distributional drift b and initial condition \(\mu \), with \({\mathbb {P}}\) being a probability measure on some measurable space \((\Omega , {\mathcal {F}})\) that will be fixed throughout. We will make use of the space of processes \({\mathscr {C}}\), introduced in Sect. 2.3. Let Assumption A1 hold.

Proposition 5.3

Let \(f\in C^{0,1}([0,T]\times \mathbb {R}^d)\). Then \(f(t, X_t)\) is an \({\mathcal {F}}^X\)-weak Dirichlet process. In particular, X is an \({\mathcal {F}}^X\)-weak Dirichlet process.

Proof

We recall that by Theorem 3.9 we have \(X_t=\psi (t, Y_t)\), where \(\psi \in C^{0,1}\) and \((Y_t)\) is an \({\mathcal {F}}^X\)-semimartingale. Then \(f(t, X_t) = (f\circ \psi )(t, Y_t)\) is a \(C^{0,1}\) function of a semimartingale; hence, it is a weak Dirichlet process by [15, Corollary 3.11]. \(\square \)

From now on we denote by \(f(t, X_t) = M^f_t + {\mathscr {A}}^f_t\) the standard decomposition of the weak Dirichlet process \(f(t, X_t) \) for \(f\in C^{0,1}\).

In what follows, we compute the covariation process between two martingale parts \(M^f\) and \(M^h\), for two functions \(f,h\in C^{0,1}\). To do so we first need some preparatory lemmata dealing with functions in some subspace of \({\mathcal {D}}_{{\mathcal {L}}} \). We denote by \({\mathcal {D}}_{{\mathcal {L}}}^s\) the space given by

$$\begin{aligned} {\mathcal {D}}_{{\mathcal {L}}}^s:=\{ f \text { such that } \exists {\tilde{f}} \in C^{1,2}_c \text { and } f = {\tilde{f}} \circ \phi \}, \end{aligned}$$
(5.1)

which is obviously an algebra. Moreover, it is a linear subspace of \({\mathcal {D}}_{{\mathcal {L}}}\) by Lemma 3.8.

Proposition 5.4

For \(f,h \in {\mathcal {D}}_{{\mathcal {L}}}^s \) we have

$$\begin{aligned} {\mathcal {L}} (fh) = ({\mathcal {L}} f)h +( {\mathcal {L}} h) f + \nabla f \nabla h. \end{aligned}$$
(5.2)

Proof

Let \(f,h \in {\mathcal {D}}_{{\mathcal {L}}}^s \) and let us compute the time derivative of the product fh. We have

$$\begin{aligned} \partial _t(fh)&= h\partial _t f+ f \partial _t h, \end{aligned}$$
(5.3)

which makes sense as we now explain. Indeed, \(h\partial _t f \) is well-defined because \(h\in C_T {\mathcal {C}}^{(1+\beta )+}\) and \( \partial _t f = {\mathcal {L}} f -\frac{1}{2} \Delta f - \nabla f \, b \) is an element of \(C_T{\mathcal {C}}^{(\beta -1)+}\). The latter holds because \( {\mathcal {L}} f \in C_T {\mathcal {C}}^{0+}\), \(\frac{1}{2} \Delta f \in C_T{\mathcal {C}}^{(\beta -1)+}\) and \(\nabla f \, b \in C_T\mathcal C^{-\beta }\), with \(\beta -1\le -\beta \). The term \( f \partial _t h\) is handled similarly.

We also calculate the Laplacian of fh

$$\begin{aligned} \tfrac{1}{2} \Delta (fh) = \frac{1}{2} (h \Delta f + 2 \nabla f \nabla h + f\Delta h), \end{aligned}$$
(5.4)

where we recall that \(\nabla f \nabla h:= \nabla f \cdot \nabla h\), and we calculate the transport term

$$\begin{aligned} b \nabla (fh) = b \nabla f \, h + b \nabla h \, f, \end{aligned}$$
(5.5)

whose terms are well-defined by similar arguments. Collecting (5.3), (5.4) and (5.5), equality (5.2) follows. \(\square \)
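For smooth data, the product rule (5.2) is a purely algebraic identity for the classical operator \({\mathcal {L}} f =\partial _t f + \tfrac{1}{2} \Delta f +\nabla f \, b\), and can be verified symbolically. The sketch below works in one space dimension with arbitrary smooth placeholders for f, h and the drift b; it does not touch the paraproduct arguments needed for \({\mathcal {D}}_{{\mathcal {L}}}^s\), and it assumes the sympy library is available.

```python
import sympy as sp

# Symbolic check of the product rule (5.2) for the classical parabolic
# operator L g = d_t g + (1/2) g_xx + b g_x in d = 1, with arbitrary smooth
# placeholders f, h, b (illustrative choices).

t, x = sp.symbols('t x')
f = sp.exp(-t) * sp.sin(x)
h = sp.cos(t) * sp.exp(x)
b = t * sp.cos(x)            # smooth stand-in for the (distributional) drift

L = lambda g: sp.diff(g, t) + sp.Rational(1, 2) * sp.diff(g, x, 2) + b * sp.diff(g, x)

# L(fh) - (Lf)h - (Lh)f - f_x h_x should vanish identically
residual = sp.simplify(L(f * h) - L(f) * h - L(h) * f - sp.diff(f, x) * sp.diff(h, x))
print(residual)   # prints 0, confirming (5.2) for classical L
```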

Lemma 5.5

Let \(f,h \in {\mathcal {D}}_{{\mathcal {L}}}^s\). Then

$$\begin{aligned}{}[M^f, M^h]_t = \int _0^t (\nabla f)(s, X_s) (\nabla h)(s, X_s) \textrm{d}s. \end{aligned}$$
(5.6)

Proof

By Proposition 5.4, \(fh\in {\mathcal {D}}_{{\mathcal {L}}}^s \subset {\mathcal {D}}_{{\mathcal {L}}}\); so using the martingale problem, Proposition 5.3 (and the considerations below it) together with the uniqueness of the standard weak Dirichlet decomposition, we have

$$\begin{aligned} (fh)(t,X_t) = M^{fh}_t+ \int _0^t {\mathcal {L}} (fh) (s, X_s) \textrm{d}s, \end{aligned}$$
(5.7)

having incorporated the initial condition \((fh)(0,X_0)\) into the martingale part \(M^{fh}\), so that \({\mathscr {A}}^{fh}_t = \int _0^t {\mathcal {L}} (fh) (s, X_s) \textrm{d}s\) and hence \({\mathscr {A}}^{fh}_0=0\), as required. Similarly,

$$\begin{aligned}&f(t,X_t) = M_t^{f}+ \int _0^t {\mathcal {L}} f (s, X_s) \textrm{d}s \end{aligned}$$
(5.8)
$$\begin{aligned}&h(t,X_t) = M_t^{h}+ \int _0^t {\mathcal {L}} h (s, X_s) \textrm{d}s. \end{aligned}$$
(5.9)

Integrating by parts \((fh)(t,X_t)\) and using (5.8) and (5.9), we have

$$\begin{aligned} (fh)(t,X_t) =&\int _0^t f(s, X_s) \textrm{d}h(s, X_s) + \int _0^t h(s, X_s) \textrm{d}f(s, X_s) + [f(\cdot , X), h(\cdot , X)]_t\nonumber \\ =&{\mathscr {M}}_t +\int _0^t f(s, X_s) ({\mathcal {L}} h)(s, X_s) \textrm{d}s + \int _0^t h(s, X_s) ({\mathcal {L}} f)(s, X_s) \textrm{d}s + [M^f, M^h]_t, \end{aligned}$$
(5.10)

where \(({\mathscr {M}}_t)\) is some local martingale. Equations (5.7) and (5.10) give two decompositions of the semimartingale \((fh)(t, X_t)\). By uniqueness of the decomposition and taking into account Proposition 5.4, the conclusion (5.6) follows. \(\square \)

Remark 5.6

We notice that both sides of (5.6) are well-defined also for \(f,h\in C^{0,1}\).

Lemma 5.7

\({\mathcal {D}}_{{\mathcal {L}}}^s\) is dense in \(C^{0,1}([0,T] \times {\mathbb {R}}^d)\).

Proof

Let \(\chi : {\mathbb {R}} \rightarrow \mathbb {R}_+\) be a smooth function such that

$$\begin{aligned} \chi (x) =\left\{ \begin{array}{ll} 0 \quad &{}x\ge 0\\ 1 \quad &{}x\le -1\\ \in (0,1) \quad &{} x\in (-1,0). \end{array} \right. \end{aligned}$$

We set \(\chi _n: \mathbb {R}^d \rightarrow \mathbb {R}\) as \( \chi _n(x): = \chi (|x| - (n+1)). \) In particular

$$\begin{aligned} \chi _n (x) =\left\{ \begin{array}{ll} 0 \quad &{}|x|\ge n+1\\ 1 \quad &{}|x|\le n\\ \in (0,1) \quad &{} \text {otherwise}. \end{array} \right. \end{aligned}$$

Let \(f\in C^{0,1}\). Let us define \({\tilde{f}}:= f\circ \psi \in C^{0,1}\) and \({\tilde{f}}_n:= {\tilde{f}} \chi _n\). Since \( {\tilde{f}}_n \rightarrow {\tilde{f}}\) in \(C^{0,1}\), also \( f_n:= {\tilde{f}}_n \circ \phi \rightarrow f\) in \(C^{0,1}\); hence, we may reduce to the case where \({\tilde{f}} =f\circ \psi \) has compact support.

We set

$$\begin{aligned} {\tilde{f}}_m(t,x):= m \int _t^{t+\tfrac{1}{m}} ({\tilde{f}}\star \rho _m)(s,x) \textrm{d}s, \end{aligned}$$

where \((\rho _m)\) is a sequence of mollifiers with compact support and \(\star \) denotes convolution in space. Then \({\tilde{f}}_m \in C_c^{1, \infty } ([0,T] \times \mathbb {R}^d)\) and \({\tilde{f}}_m \rightarrow \tilde{f}\) in \(C^{0,1}\); hence, \(f_m:= {\tilde{f}}_m \circ \phi \rightarrow f\) in \(C^{0,1}\). \(\square \)

Theorem 5.8

Let \(f,h \in C^{0,1}\). Then

$$\begin{aligned} {[}M^f, M^h]_t = \int _0^t (\nabla f)(s, X_s) (\nabla h)(s, X_s) \textrm{d}s. \end{aligned}$$
(5.11)

Proof

First, we notice that (5.11) holds for every \(f,h\in {\mathcal {D}}_{{\mathcal {L}}}^s\) by Lemma 5.5, and each side of (5.11) is well-defined for \(f,h \in C^{0,1}\) by Remark 5.6. Moreover, by Lemma 5.7, \({\mathcal {D}}_{{\mathcal {L}}}^s \subset C^{0,1}\) is a dense subspace.

Next we show that, for fixed \(h\in {\mathcal {D}}_{{\mathcal {L}}}^s\), the map \(f\mapsto [M^f, M^h]\) is linear and continuous from \(C^{0,1}\) to \({\mathscr {C}}\). For this we make use of the Banach–Steinhaus theorem for F-spaces, see e.g. [9, Theorem 2.1]. Indeed, the space \(C^{0,1}\) is clearly an F-space, and so is the linear space of continuous processes \({\mathscr {C}}\) equipped with the u.c.p. topology. Let \([M^f, M^h]^\varepsilon \) denote the \(\varepsilon \)-regularisation of the bracket \([M^f, M^h]\), see [26, Definition 4.2] or [24, Sect. 1] for a precise definition. The operator \(T^\varepsilon : f \mapsto [M^f, M^h]^\varepsilon \) is linear and continuous from \(C^{0,1}\) to \({\mathscr {C}}\), and \([M^f, M^h]\) is well-defined as a u.c.p.-limit of \([M^f, M^h]^\varepsilon \) as \(\varepsilon \rightarrow 0\), see [24, Proposition 1.1]. Thus, by Banach–Steinhaus the map \(f\mapsto [M^f, M^h]\) is continuous from \(C^{0,1}\) to \({\mathscr {C}}\). Since both sides of (5.11) are continuous and linear in f, (5.11) extends to all \(f\in C^{0,1}\) and \(h\in {\mathcal {D}}_{{\mathcal {L}}}^s\).

Finally, let \(f\in C^{0,1}\) be fixed. By the same reasoning as above we extend (5.11) to \(h\in C^{0,1}\). \(\square \)
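To fix ideas on the \(\varepsilon \)-regularisation of the bracket, consider the simplest instance \(f=h=\text {id}_1\) in the toy case where X is itself a standard Brownian motion (an illustrative choice, not the singular-drift setting of this paper): then \([M^f, M^h]_t = t\), and the \(\varepsilon \)-regularised bracket can be computed on a simulated path. A minimal numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate a Brownian path on [0, 1] with a fine mesh
n, T = 100_000, 1.0
dt = T / n
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

def eps_bracket(M, N, dt, eps):
    """Regularised covariation in the Russo-Vallois spirit:
    (1/eps) * int (M_{s+eps} - M_s)(N_{s+eps} - N_s) ds."""
    k = int(round(eps / dt))
    incM = M[k:] - M[:-k]
    incN = N[k:] - N[:-k]
    return (incM * incN).sum() * dt / eps

est = eps_bracket(W, W, dt, eps=0.01)
print(est)  # close to [W, W]_1 = 1, up to eps -> 0 fluctuations
```

Shrinking \(\varepsilon \) (with a correspondingly finer mesh) drives the estimate to the quadratic variation, which is the u.c.p. limit used in the proof.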

Corollary 5.9

The map \(f\mapsto {\mathscr {A}}^f\) is continuous (and linear) from \(C^{0,1}\) to \({\mathscr {C}}\).

Proof

Let \(f_n \rightarrow 0\) in \(C^{0,1}\); then \(f_n(\cdot , X)\rightarrow 0\) u.c.p. By Theorem 5.8, \([M^{f_n}]\rightarrow 0\), and taking into account [20, Chapter 1, Problem 5.25] we have that \(M^{f_n}\rightarrow 0\) u.c.p. Using the decomposition \(f_n(\cdot , X) = M^{f_n}+ {\mathscr {A}}^{f_n}\), we deduce that \({\mathscr {A}}^{f_n} \rightarrow 0\) u.c.p., which concludes the proof. \(\square \)

Remark 5.10

Let \(\text {id}_i (x)= x_i\); then \(\text {id}_i \in C^{0,1}\). Setting \(M^{\text {id}} = (M^{\text {id}_1}, \ldots , M^{\text {id}_d})^\top \), by Theorem 5.8 we have

$$\begin{aligned} {[} M^{\text {id}_i}, M^{\text {id}_j}]_t = \delta _{ij}t. \end{aligned}$$

Hence, by the Lévy characterisation theorem, \(M^{\text {id}}- X_0\) is a standard d-dimensional Brownian motion. We denote this Brownian motion by \(W^X\).
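For the reader's convenience, we sketch the classical computation behind the Lévy characterisation. For fixed \(u\in \mathbb {R}^d\), set \(Z_t:= \exp \big (iu\cdot (M^{\text {id}}_t - X_0) + \tfrac{|u|^2}{2} t\big )\). Itô's formula together with \([M^{\text {id}_i}, M^{\text {id}_j}]_t = \delta _{ij}t\) gives

$$\begin{aligned} \textrm{d}Z_t = Z_t \Big ( iu\cdot \textrm{d}M^{\text {id}}_t + \tfrac{|u|^2}{2}\,\textrm{d}t - \tfrac{1}{2} \sum _{i,j} u_i u_j \, \textrm{d}[M^{\text {id}_i}, M^{\text {id}_j}]_t \Big ) = i Z_t \, u\cdot \textrm{d}M^{\text {id}}_t, \end{aligned}$$

so Z is a local martingale; since \(|Z_t| = e^{|u|^2 t/2}\) is bounded on compact time intervals, Z is a true martingale, whence \({\mathbb {E}}\big [\exp \big (iu\cdot (M^{\text {id}}_t - M^{\text {id}}_s)\big ) \,\big |\, {\mathcal {F}}^X_s\big ] = \exp \big (-\tfrac{|u|^2}{2}(t-s)\big )\), i.e. the increments are independent and Gaussian with covariance \((t-s)\text {I}_d\).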

Proposition 5.11

For \(f\in C^{0,1}([0,T] \times \mathbb {R}^d)\), we have

$$\begin{aligned} M^f_t = f(0, X_0) + \int _0^t \nabla f(s, X_s) \cdot \textrm{d}M^{\text {id}}_s. \end{aligned}$$

Proof

Recall that we write

$$\begin{aligned} f(t, X_t) = M^f_t + {\mathscr {A}}^f_t, \end{aligned}$$
(5.12)

where the right-hand side is the standard (unique) decomposition of the left-hand side, as an \({\mathcal {F}}^X\)-weak Dirichlet process. In particular \( {\mathscr {A}}^f\) is an \({\mathcal {F}}^X\)-orthogonal process with \( {\mathscr {A}}^f_0=0\) and \(M^f\) is the martingale component. We define \(\tilde{{\mathscr {A}}}^f\) so that

$$\begin{aligned} f(t, X_t) = f(0, X_0) + \int _0^t \nabla f(s, X_s) \cdot \textrm{d}M^{\text {id}}_s + \tilde{{\mathscr {A}}}^f_t. \end{aligned}$$

We will prove later that

$$\begin{aligned} {[}\tilde{{\mathscr {A}}}^f, N] =0 \quad \text { for all continuous local } {\mathcal {F}}^X\text {-martingales } N. \end{aligned}$$
(5.13)

From (5.13) we have that \(\tilde{{\mathscr {A}}}^f\) is an \({\mathcal {F}}^X\)-martingale orthogonal process with \(\tilde{\mathscr {A}}^f_0 =f(0, X_0)-f(0, X_0) =0\); thus, by uniqueness of the decomposition of weak Dirichlet processes it must be \(\tilde{{\mathscr {A}}}^f ={{\mathscr {A}}}^f\) and therefore

$$\begin{aligned} M^f_t = f(0, X_0) + \int _0^t \nabla f(s, X_s) \cdot \textrm{d}M_s^{\text {id}}, \end{aligned}$$

as wanted. It remains to prove (5.13). By definition of \(\tilde{{\mathscr {A}}}^f\) and (5.12) we have

$$\begin{aligned}{}[\tilde{{\mathscr {A}}}^f, N]_t&= [f(\cdot , X_\cdot ) , N]_t - \Big [\int _0^\cdot \nabla f(s, X_s) \cdot \textrm{d}M^{\text {id}}_s, N\Big ]_t\nonumber \\&= [M^f, N]_t- \int _0^t \nabla f(s, X_s) \cdot \textrm{d}[ M^{\text {id}}, N]_s, \end{aligned}$$
(5.14)

having used the weak Dirichlet decomposition \(f(\cdot , X)= M^f +{{\mathscr {A}}}^f \), where \({{\mathscr {A}}}^f\) is an \(\mathcal F^X\)-martingale orthogonal process. Regarding N, we now observe that by the Kunita–Watanabe decomposition there exist an \({\mathcal {F}}^X \)-progressively measurable process \(\xi \) and a local martingale O, orthogonal to \(M^{\text {id}}\), such that

$$\begin{aligned} N_t = N_0 + \int _0^t \xi _s \cdot \textrm{d}M^{\text {id}}_s +O_t. \end{aligned}$$

Thus, the covariation with \(M^{\text {id}}\) gives

$$\begin{aligned} {[} M^{\text {id}}, N]_t&= [ M^{\text {id}}, \int _0^\cdot \xi _s \cdot \textrm{d}M^{\text {id}}_s ]_t =\int _0^t \xi _s \textrm{d}s , \end{aligned}$$

since \([ M^{\text {id}_i}, M^{\text {id}_j}]_t = \delta _{ij} t \) by Remark 5.10. We calculate \([ M^f, N]_t\) using Theorem 5.8 to get

$$\begin{aligned}{}[ M^f, N]_t = [ M^{f}, \int _0^\cdot \xi _s \cdot \textrm{d}M^{\text {id}}_s ]_t =\int _0^t \xi _s \cdot \textrm{d}[ M^{f}, M^{\text {id}}]_s = \int _0^t \xi _s \cdot \nabla f (s, X_s) \textrm{d}s. \end{aligned}$$

Plugging these two covariations into (5.14), we get

$$\begin{aligned}{}[\tilde{{\mathscr {A}}}^f, N]_t = \int _0^t \xi _s \cdot \nabla f (s, X_s) \textrm{d}s - \int _0^t \nabla f(s, X_s) \cdot \xi _s \textrm{d}s =0, \end{aligned}$$

which is (5.13) as wanted. \(\square \)

We conclude this section with some final remarks.

Remark 5.12

  1. (i)

We recall that \( {\mathcal {D}}_{{\mathcal {L}}}^s \subset {\mathcal {D}}_{{\mathcal {L}}} \subset C^{0,1}\). Thus, for \(f\in {\mathcal {D}}_{{\mathcal {L}}}^s\), by uniqueness of the weak Dirichlet decomposition and by the martingale problem we have \({\mathscr {A}}^f_t = \int _0^t ({\mathcal {L}} f)(s, X_s) \textrm{d}s\). Therefore, \(f\mapsto {\mathscr {A}}^f\) is the continuous linear extension to \(C^{0,1}\) of the map \(f\mapsto \int _0^\cdot ({\mathcal {L}} f)(s, X_s) \textrm{d}s\), taking values in \({\mathscr {C}}\).

  2. (ii)

We recall that the function \(\text {id}_i\) solves PDE (2.3), so we have \({\mathcal {L}} \, \text {id}_i = b^i\), see Sect. 2.2. Hence, taking \(f=\text {id}_i\) for some \(i\in \{1, \ldots , d \}\) one gets \(X^i = M^{\text {id}_i} + {\mathscr {A}}^{\text {id}_i} \), where formally

    $$\begin{aligned} {\mathscr {A}}^{\text {id}_i} = ``\int _0^\cdot b^i(s, X_s) \textrm{d}s", \end{aligned}$$

    by the first point in this Remark. Putting all components together one would get indeed

    $$\begin{aligned} {\mathscr {A}}^{\text {id}}:=({\mathscr {A}}^{\text {id}_i} )_i = ``\int _0^\cdot b(s, X_s) \textrm{d}s". \end{aligned}$$

    Plugging this into the decomposition \(\text {id}(X_t) = M_t^{\text {id}} + {\mathscr {A}}_t^{\text {id}} \) and using Remark 5.10 gives the (formal) writing

    $$\begin{aligned} X_t =X_0 + W^X_t+ ``\int _0^t b(s, X_s) \textrm{d}s"\end{aligned}$$

as expected. Notice, however, that in general \(\text {id}_i \notin {\mathcal {D}}_{{\mathcal {L}}}\): the drift satisfies \(b\in C_T {\mathcal {C}}^{-\beta }\), but in general \( b \notin C_T \bar{{\mathcal {C}}}_c^{0+}\). This is why the writing above is only formal. We will introduce an extended domain in the next section to make this argument rigorous.

6 Generalised SDEs and their Relationship with MP

In this final section we investigate the dynamics of the process X which formally solves the SDE \(\textrm{d}X_t = b(t, X_t) \textrm{d}t + \textrm{d}W_t\) and compare it to the solution to the martingale problem. First, we define a notion of solution for the formal SDE, a definition that amongst other things involves weak Dirichlet processes. We show that any solution to the MP is also a solution of the formal SDE and that a chain rule holds (Theorem 6.5). Finally, we close the circle by showing that, under the stronger assumption that X is a Dirichlet process, X being a solution to the formal SDE is equivalent to being a solution to the MP (Corollary 6.13). We recall that X is an \({\mathcal F}^X\)-Dirichlet process if it is the sum of an \({\mathcal F}^X\)-local martingale and an adapted zero quadratic variation process. Throughout this section there is an underlying measurable space \((\Omega , {\mathcal {F}})\).

We make a further technical assumption on the support of the singular drift b. This assumption is a standing assumption until the end of the paper.

Assumption A2

Let \(b \in C_T \bar{{\mathcal {C}}}_c^{(-\beta )+} \).

As mentioned above, the idea of the current section is inspired by Remark 5.12 and consists in further investigating to which extent our solution to the martingale problem is the solution of an SDE of the form

$$\begin{aligned} X_t = X_0 + W^X_t+ ``\int _0^t b(s, X_s) \textrm{d}s", \end{aligned}$$

where \(X_0 \sim \mu \). We note that if \(b=l\) were a function, the interpretation of \(``\int _0^t l(s, X_s) \textrm{d}s"\) would indeed be the integral \(\int _0^t l(s, X_s) \textrm{d}s\). In particular, \(\int _0^t l(s, X_s) \textrm{d}s\) is well-defined for any \(l \in C_T \bar{ \mathcal C}_c^{0+}\). We will study various properties of \(l \mapsto \int _0^t l(s, X_s) \textrm{d}s\) for a reasonable class of distributions l (which includes for example \(b\in C_T \bar{ {\mathcal {C}}}_c^{(-\beta )+}\) from Assumption A2), proceeding similarly to [22].

Definition 6.1

Let \({\mathbb {P}}\) be a probability measure on \((\Omega , {\mathcal {F}})\). We say that a process X fulfils the local time property with respect to a topological vector space \(B \supset C_T \bar{ {\mathcal {C}}}_c^{0+}\) if \( C_T \bar{ {\mathcal {C}}}_c^{0+}\) is dense in B and the map from \( C_T \bar{ {\mathcal {C}}}_c^{0+}\) with values in \( {\mathscr {C}}\) defined by

$$\begin{aligned} l \mapsto \int _0^t l(s, X_s) \textrm{d}s \end{aligned}$$

admits a continuous extension to B (or equivalently it is continuous with respect to the topology of B) which we denote by \(A^{X, B}\).
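When l is a genuine function one has \(|\int _0^t l(s, X_s) \textrm{d}s| \le t \sup |l|\), so the map is plainly continuous in the sup norm; the content of the definition is the continuous extension beyond functions. A toy numerical sketch of the functional and of its linearity, computed along a simulated Brownian path (an illustrative choice of X, not the singular setting of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# a continuous path X on [0, T]: here a Brownian sample, for illustration
n, T = 10_000, 1.0
dt = T / n
t = np.linspace(0.0, T, n + 1)
X = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

def A(l):
    """The functional l -> int_0^T l(s, X_s) ds (left-point Riemann sum)."""
    return (l(t[:-1], X[:-1])).sum() * dt

l1 = lambda s, x: np.cos(x)
l2 = lambda s, x: s * x

# linearity: A(l1 + l2) = A(l1) + A(l2), up to floating-point error
lin_gap = abs(A(lambda s, x: l1(s, x) + l2(s, x)) - (A(l1) + A(l2)))
# sup-norm bound: |A(l1)| <= T * sup|l1| = T
print(lin_gap, abs(A(l1)))
```

The local time property asks that this elementary linear map remain continuous for the (weaker) topology of B, which is what allows distributional arguments l.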

Notice that this notion was first defined, in a different context, in [22, Definition 6.1]; see also [22, Remark 6.2] for the links to local time. Using the local time property we now introduce a notion of solution to the SDE which is different from the martingale problem. We will then study its properties and its links to the solution to the martingale problem.

Definition 6.2

Let \({\mathbb {P}}\) be a probability measure on \((\Omega , {\mathcal {F}})\). Given \(b\in B\subset {\mathcal {S}}'(\mathbb {R}^d)\), we say that X is a B-solution to

$$\begin{aligned} X_t =X_0 + W_t+ \int _0^t b(s, X_s) \textrm{d}s, \end{aligned}$$

if there exists a Brownian motion \(W= W^X\) such that

  1. (a)

    X fulfils the local time property with respect to B;

  2. (b)

    \(b\in B\);

  3. (c)

    \( X_t = X_0 + W^X_t+ A_t^{ X,B}(b)\);

  4. (d)

    X is an \({\mathcal {F}}^X\)-weak Dirichlet process.

Remark 6.3

Some examples of B are \(B= C_T \bar{ {\mathcal {C}}}_c^{0+}\) and \(B= C_T \bar{ {\mathcal {C}}}_c^{(-\beta )+}\). Indeed, \( \bar{ \mathcal C}_c^{0+}\) is dense in \(\bar{ {\mathcal {C}}}_c^{(-\beta )+}\) since by [18, Lemma 5.4 (i)] \({\mathcal {S}} \subset \bar{ {\mathcal {C}}}_c^{0+} \) and \({\mathcal {S}}\) is dense in \(\bar{ \mathcal C}_c^{(-\beta )+}\). Finally, by [19, Remark B.1] we conclude that \( C_T \bar{ {\mathcal {C}}}_c^{0+}\) is dense in \(C_T \bar{ {\mathcal {C}}}_c^{(-\beta )+}\).

Below we will investigate B-solutions for \(B= C_T \bar{\mathcal C}^{(-\beta )+}_c\). We denote by

$$\begin{aligned} {\mathcal {D}}_{{\mathcal {L}}}^B:= \left\{ f \in {\mathcal {D}}^0_{{\mathcal {L}}} \text { such that } g:= {\mathcal {L}} f \in B \right\} . \end{aligned}$$
(6.1)

Remark 6.4

Let \(B = C_T \bar{ {\mathcal {C}}}_c^{(-\beta )+}\). Notice that \(f=\text {id} \in {\mathcal {D}}_{{\mathcal {L}}}^B\) and \({\mathcal {L}} \, \text {id}=b\), in the sense that \({\mathcal {L}} \, \text {id}_i=b^i\) for all \(i=1, \ldots , d\) as recalled in Remark 5.12 item (ii).

Theorem 6.5

Let \(B = C_T \bar{ {\mathcal {C}}}_c^{(-\beta )+}\). Let \((X, {\mathbb {P}}) \) be the solution to the martingale problem with distributional drift b and i.c. \(\mu \). Then there exists a Brownian motion \(W^X\) with respect to \( {\mathbb {P}} \) such that X is a B-solution of

$$\begin{aligned} X_t = X_0 + W^X_t+ \int _0^t b(s, X_s) \textrm{d}s, \end{aligned}$$

where \(X_0 \sim \mu \). Moreover, for every \(f\in {\mathcal {D}}_{\mathcal L}^B \) we have the chain rule

$$\begin{aligned} f(t, X_t) = f(0,X_0) + \int _0^t (\nabla f ) (s, X_s) \cdot \textrm{d}W^X_s + A^{X,B}_t ({\mathcal {L}} f), \end{aligned}$$
(6.2)

and the equality

$$\begin{aligned} A^{X,B}_t ({\mathcal {L}} f) = {\mathscr {A}}^f_t. \end{aligned}$$
(6.3)

Remark 6.6

Notice that point (c) in Definition 6.2 provides the standard decomposition of the weak Dirichlet process X, where the local martingale component is given by \(X_0+W^X\) and the martingale orthogonal process is given by \(A^{X,B}_t (b) = \mathscr {A}^{\text {id}}_t\) in view of (6.3) and Remark 6.4.

Proof of Theorem 6.5

For ease of notation we write \(A^X\) in place of \(A^{X,B}\).

Let \((X, {\mathbb {P}})\) be the solution to the martingale problem with distributional drift b and i.c. \(\mu \). We have to show that the four conditions of Definition 6.2 are satisfied. Clearly, \(b\in B\) which is point (b) of Definition 6.2. By Proposition 5.3, for every \(f\in C^{0,1}\) we have that \(f(t, X_t)\) is an \({\mathcal {F}}^X\)-weak Dirichlet process; hence, X is also a weak Dirichlet process (point (d) of Definition 6.2) with decomposition

$$\begin{aligned} f(t, X_t) = M^f_t + {\mathscr {A}}^f_t. \end{aligned}$$

Next we check the local time property, which is point (a) of Definition 6.2. We use that X solves the martingale problem for every \(f\in {\mathcal {D}}_{{\mathcal {L}}} \subset C^{0,1}\) (thus \(f(t, X_t)- \int _0^t ({\mathcal {L}} f) (s, X_s) \textrm{d}s \) is a local martingale) and uniqueness of the weak Dirichlet decomposition to get

$$\begin{aligned} {\mathscr {A}}^f = \int _0^\cdot ({\mathcal {L}} f) (s, X_s) \textrm{d}s = A^X( {\mathcal {L}} f), \end{aligned}$$
(6.4)

where the second equality holds because \( {\mathcal {L}} f\in C_T \bar{{\mathcal {C}}}_c^{0+}\). We want to show that \( A^X\) extends to all \(g\in B = C_T \bar{ {\mathcal {C}}}_c^{(-\beta )+}\). Let us denote by T the map

$$\begin{aligned} \begin{array}{llll} T:&{} C_T \bar{{\mathcal {C}}}_c^{(-\beta )+} &{}\rightarrow &{} C_T D {\mathcal {C}}^{\beta +} \\ &{} g&{} \mapsto &{} T(g):= v, \end{array} \end{aligned}$$

where v is the unique solution in \(C_T {\mathcal {C}}^{(1+\beta )+}\) of PDE

$$\begin{aligned} \left\{ \begin{array}{l} {\mathcal {L}} v = g\\ v(T) = 0, \end{array} \right. \end{aligned}$$

which is PDE (2.3) with \(v_T = 0\), see Sect. 2.2. It is clear that for \(f\in \mathcal D_{{\mathcal {L}}}\) and \(g= {\mathcal {L}} f \in C_T \bar{\mathcal C}_c^{0+}\) we have \(T(g)=f\), so that (6.3) reads

$$\begin{aligned} {\mathscr {A}}^{T(g)} = A^X(g). \end{aligned}$$

Now we recall that \(g\mapsto T(g)\in C_T \mathcal C^{(1+\beta )+} \subset C^{0,1}\) is continuous, see Sect. 2.2, in particular when \(g_n\rightarrow g\) in \(C_T \bar{{\mathcal {C}}}_c^{-\beta } \) then \(f_n =T(g_n)\rightarrow T(g)= f\) in \(C_T {\mathcal {C}}^{(1+\beta )+} \subset C^{0,1} \). Moreover, by Corollary 5.9 also the map \(f \mapsto {\mathscr {A}}^f\) is continuous from \(C^{0,1}\) to \({\mathscr {C}}\). Now we use the density of \( C_T \bar{{\mathcal {C}}}_c^{0+}\) in \(C_T \bar{{\mathcal {C}}}_c^{(-\beta )+}\) to conclude that the local time property holds and also (6.3) holds. Point (c) in Definition 6.2 follows from the chain rule (6.2) (shown below) for \(f= \text {id}\) using Remark 6.4.

It is left to prove that the chain rule (6.2) holds. We define \(W^X:= M^{\text {id}} - X_0\), which is a Brownian motion by Remark 5.10. First, we prove that (6.2) holds for \(f \in {\mathcal {D}}_{{\mathcal {L}}}\). Indeed, by Proposition 5.11 we know that \(M^f_t = f(0, X_0) + \int _0^t (\nabla f )(s, X_s) \cdot \textrm{d}W_s^X\) so using that X is a solution to the martingale problem we easily get that (6.2) holds for \(f\in \mathcal D_{{\mathcal {L}}} \). In order to extend it to \(f\in \mathcal D_{{\mathcal {L}}}^B\), we use the operator T and rewrite the chain rule (6.2) as

$$\begin{aligned} (Tg)(t, X_t) - (Tg)(0, X_0) - \int _0^t \nabla ( Tg) (s, X_s) \cdot \textrm{d}W^X_s = A_t^X(g), \end{aligned}$$
(6.5)

for all \(g\in B = C_T \bar{{\mathcal {C}}}_c^{(-\beta )+}\). Notice that (6.5) holds for \(g\in C_T \bar{{\mathcal {C}}}_c^{0+}\) since (6.2) holds for \(f\in {\mathcal {D}}_{{\mathcal {L}}}\) with \({\mathcal {L}} f =g\). The left-hand side of (6.5) is continuous from B to \({\mathscr {C}}\) because it is the composition of continuous operators. The right-hand side of (6.5) extends from \(g \in C_T \bar{{\mathcal {C}}}_c^{0+}\) to \(g\in B\) by the local time property (a). Since \( C_T \bar{{\mathcal {C}}}_c^{0+}\) is dense in B, then (6.5) extends to B, which is (6.2) as wanted. \(\square \)

Remark 6.7

Notice that if, in the previous proof, we defined the solution operator T using a different terminal condition \(v_T \in \mathcal C^{(1+\beta )+} \), \(v_T\ne 0\), this would lead to the same operator \(A^{X,B}\). This can be seen by noticing that \(A^{X,B}\) is the unique continuous extension of the integral operator \(l \mapsto \int _0^\cdot l(s, X_s) \textrm{d}s\).

We now introduce a refined notion of B-solution, which will be used later.

Definition 6.8

Let \({\mathbb {P}}\) be a probability measure on \((\Omega , {\mathcal {F}})\). We say that X is a reinforced B-solution of

$$\begin{aligned} X_t = X_0 + W_t^X+ \int _0^t b(s, X_s) \textrm{d}s \end{aligned}$$

if

  1. (i)

    it is a B-solution of the SDE in the sense of Definition 6.2;

  2. (ii)

    for any \(f\in C_b^{1,2,B}\), where

    $$\begin{aligned} C_b^{1,2,B}:=\{ f\in C_b^{1,2} \text { such that } \dot{f} +\frac{1}{2} \Delta f \in C_T \bar{{\mathcal {C}}}_c^{0+} \text { and } \nabla f b \in B \}, \end{aligned}$$

    then

    $$\begin{aligned} \int _0^t (\nabla f )(s, X_s) \cdot \textrm{d}^- A_s^{X,B} ( b) = A^{X,B}_t (\nabla f \, b), \end{aligned}$$
    (6.6)

    where the forward integral \(d^-A\) is the one given in [23] in the one-dimensional case, which can be straightforwardly extended to the vector case. In particular, for a locally bounded integrand process Y and a continuous integrator process X we denote

    $$\begin{aligned} \int _0^t Y_s \cdot \textrm{d}^- X_s = \sum _{i =1}^d \int _0^t Y^i_s \textrm{d}^- X^i_s. \end{aligned}$$

Remark 6.9

  1. (i)

When \(b \in C_T\bar{{\mathcal {C}}}_c^{0+} \) and \( f\in C^{1,2}_b\), then \(\nabla f \, b \in C_T \bar{{\mathcal {C}}}_c^{0+}\), because we can choose the approximating sequence \(b_n \rightarrow b\) with compact support to construct the approximating sequence \(\nabla f \, b_n \rightarrow \nabla f \, b\). In this case equality (6.6) holds because both sides are equal to \(\int _0^t (\nabla f \, b)(s, X_s) \textrm{d}s\). Thus, it is natural to require condition (6.6).

  2. (ii)

    In the case \(B=C_T \bar{{\mathcal {C}}}_c^{(-\beta )+}\), we notice that the condition \( \nabla f b \in B\) is always satisfied. Indeed, \(\nabla f \in C_T {\mathcal {C}}^{\beta +}\) and \( b\in B \); thus, by (2.2) \(\nabla f b\in C_T {\mathcal {C}}^{(-\beta )+}\). Finally, \(\nabla f b\in C_T \bar{ {\mathcal {C}}}_c^{(-\beta )+}\) because we can construct the compactly supported sequence by considering \(\nabla f b_n\), where \( b_n \) is the compactly supported sequence that converges to b in \(C_T \bar{ {\mathcal {C}}}_c^{(-\beta )+}\), using again (2.2). Thus, \( C^{1,2,B}_b\) reduces to

    $$\begin{aligned} \{ f\in C_b^{1,2} \text { such that } \dot{f} +\frac{1}{2} \Delta f \in C_T \bar{{\mathcal {C}}}_c^{0+} \} \end{aligned}$$

    and does not depend on B.

Next we want to consider the case when X is an \(\mathcal F^X\)-Dirichlet process. In this case we show that the notion of solution of the martingale problem with distributional drift is equivalent to the one of the reinforced B-solution. Let us start with a remark.

Remark 6.10

If X is a B-solution which is an \({\mathcal {F}}^X\)-Dirichlet process, then \( [X, X]_t = t \text {I}_d\). Indeed, by Remark 6.6 we have that \(X_t = X_0 + W_t^X + A^{X, B}_t(b)\) is the standard decomposition of the weak Dirichlet process X; by the uniqueness of the weak Dirichlet decomposition and the fact that X is an \({\mathcal {F}}^X\)-Dirichlet process, \(A^{X, B}_t(b)\) is a zero quadratic variation process, and so \([X,X]_t = t \text {I}_d\).
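The zero quadratic variation property can be seen concretely on discretised toy paths: for a \(C^1\) path the discrete quadratic variation vanishes as the mesh shrinks, while for a Brownian path it stabilises at t. A minimal sketch (paths chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def discrete_qv(path):
    """Sum of squared increments along the given discretisation."""
    inc = np.diff(path)
    return (inc * inc).sum()

T = 1.0
for n in (1_000, 100_000):
    t = np.linspace(0.0, T, n + 1)
    A = np.sin(3 * t)  # a C^1 path: zero quadratic variation in the limit
    W = np.concatenate([[0.0],
                        np.cumsum(rng.normal(0.0, np.sqrt(T / n), n))])
    print(n, discrete_qv(A), discrete_qv(W))
# discrete_qv(A) is O(1/n) -> 0, while discrete_qv(W) stays near T = 1
```

This is the heuristic behind \([X,X]_t = t \text {I}_d\): the martingale part contributes \(t \text {I}_d\) and the zero quadratic variation part contributes nothing.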

Proposition 6.11

Let \(B= C_T \bar{{\mathcal {C}}}_c^{(-\beta )+}\). If \((X, {\mathbb {P}})\) satisfies the martingale problem with distributional drift b and X is an \({\mathcal {F}}^X\)-Dirichlet process, then X is a reinforced B-solution according to Definition 6.8.

Proof

First, we notice that point (i) of Definition 6.8 is satisfied by Theorem 6.5. Next we check point (ii); we write \(A^X\) instead of \(A^{X,B}\) for ease of notation. Let \(f\in C_b^{1,2,B}\). Since \(f\in C_b^{1,2}\subset C^{0,1}\), using the weak Dirichlet decomposition we have

$$\begin{aligned} f(t, X_t)&= M_t^f + {\mathscr {A}}^f_t\nonumber \\&= f(0, X_0) + \int _0^t (\nabla f)(r, X_r) \cdot \textrm{d}W^X_r + {\mathscr {A}}^f_t, \end{aligned}$$
(6.7)

having used Proposition 5.11 to express the martingale component.

On the other hand, it easily follows that \(f\in \mathcal D_{{\mathcal {L}}}^B\) defined in (6.1), because \({\mathcal {L}} f = \nabla f b + g\), where \(g:= \dot{f} + \frac{1}{2} \Delta f \in C_T \bar{{\mathcal {C}}}_c^{0+} \subset B \) by assumption, and \(\nabla f b \in B\) as seen in Remark 6.9, item (ii). Since X is a B-solution and an \({\mathcal {F}}^X\)-Dirichlet process, by Remark 6.10 we have \([X,X]_t = t \text {I}_d\). So, by applying a slight adaptation of Itô’s formula [24, Theorem 2.2] to \(f(t, X_t) \) for \(f\in C_b^{1,2}\) we have

$$\begin{aligned} f(t, X_t) =&f(0, X_0) + \int _0^t (\nabla f)(r, X_r) \cdot \textrm{d}W^X_r + \int _0^t (\nabla f)(r, X_r) \cdot \textrm{d}^- A^X_r (b)\nonumber \\&+ \int _0^t (\partial _ t f+ \frac{1}{2} \Delta f)(r, X_r) \textrm{d}r\nonumber \\ =&f(0, X_0) + \int _0^t (\nabla f)(r, X_r) \cdot \textrm{d}W^X_r + \int _0^t (\nabla f)(r, X_r) \cdot \textrm{d}^- A^X_r(b)\nonumber \\&+A^X_t (\partial _ t f+ \frac{1}{2} \Delta f). \end{aligned}$$
(6.8)

We recall that \(\partial _ t f+ \frac{1}{2} \Delta f \in C_T \bar{{\mathcal {C}}}_c^{0+}\) since \(f \in C^{1,2,B}_b\), and so \(A^X_t (\partial _ t f+ \frac{1}{2} \Delta f )\) is trivially well-defined. On the other hand \(\partial _ t f+ \frac{1}{2} \Delta f = {\mathcal {L}} f - \nabla f \, b\), where \( {\mathcal {L}} f,\nabla f \, b \in B\) as noticed in Remark 6.9 item (ii); thus, we can write

$$\begin{aligned} A^X_t (\partial _ t f+ \frac{1}{2} \Delta f) = A^X_t ({\mathcal {L}} f - \nabla f \, b ) = A^X_t ({\mathcal {L}} f )- A^X_t( \nabla f \, b ). \end{aligned}$$

Plugging this into (6.8) and comparing with (6.7), we get

$$\begin{aligned} {\mathscr {A}}^f_t = \int _0^t (\nabla f)(r, X_r) \cdot \textrm{d}^- A^X_r(b) +A^X_t ({\mathcal {L}} f ) - A^X_t( \nabla f \, b ); \end{aligned}$$

hence, applying (6.3) we conclude. \(\square \)

The next result is the converse statement of Proposition 6.11.

Proposition 6.12

Let \(B= C_T \bar{{\mathcal {C}}}_c^{(-\beta )+}\) and \(b \in B\). Let \({\mathbb {P}}\) be a probability measure on \((\Omega , {\mathcal {F}})\). Let X be a reinforced B-solution according to Definition 6.8, which is also an \({\mathcal {F}}^X\)-Dirichlet process. Then \((X, {\mathbb {P}})\) solves the martingale problem with distributional drift b.

Proof

We need to show that for every \(f\in {\mathcal {D}}_{{\mathcal {L}}}\)

$$\begin{aligned} f(t, X_t) -f(0, X_0) - \int _0^t ({\mathcal {L}} f )(r, X_r) \textrm{d}r \end{aligned}$$

is an \({\mathcal {F}}^X\)-local martingale under \({\mathbb {P}}\). Since \(f\in {\mathcal {D}}_{{\mathcal {L}}}\) we know that there exists \(l\in C_T \bar{{\mathcal {C}}}_c^{0+}\) such that \({\mathcal {L}} f = l\). By the density of \({\mathcal {S}}\) in \( \bar{{\mathcal {C}}}_c^{(-\beta )+}\), see [18, Lemma 5.4], and using [19, Remark B.1], we see that \(C_T{\mathcal {S}}\) is dense in \( C_T\bar{{\mathcal {C}}}_c^{(-\beta )+}\). Thus, we can find a sequence \((b_n) \) such that \( b_n \in C_T \bar{{\mathcal {C}}}_c^{0+}\) and \(b_n \rightarrow b\) in \(C_T {{\mathcal {C}}}^{(-\beta )+}\). Let \({\mathcal {L}}_n u:= \partial _t u + \frac{1}{2} \Delta u + \nabla u \, b_n\) and let us consider the PDE \({\mathcal {L}}_n f_n = l\) with terminal condition \(f_n(T) = f(T)\). By [18, Remark 4.12] we know that the unique solution \(f_n\in C_T {\mathcal {C}}^{(1+\beta )+}\) is also the classical solution as given in [21, Theorem 5.1.9]; hence, \(f_n\in C^{1,2}_b \). We recall that X is a B-solution in the sense of Definition 6.2 and an \({\mathcal {F}}^X\)-Dirichlet process, with decomposition \(X = X_0 + W^X + A^{X,B}(b)\) by Remark 6.10. By Itô’s formula [25, Theorem 6.1], taking into account the linearity of \(A^{X,B}\) and the fact that \(b_n \in C_T \bar{{\mathcal {C}}}_c^{0+}\), we have

$$\begin{aligned} f_n(t, X_t) =&f_n (0, X_0) + \int _0^t (\nabla f_n )(s, X_s) \cdot \textrm{d}W_s + \int _0^t(\nabla f_n )(s, X_s) \cdot \textrm{d}^- A^{X,B}_s(b-b_n) \nonumber \\&+ \int _0^t (\nabla f_n )(s, X_s) b_n (s, X_s) \textrm{d}s+ \frac{1}{2} \int _0^t (\Delta f_n)(s, X_s ) \textrm{d}s + \int _0^t (\partial _s f_n)(s, X_s) \textrm{d}s\nonumber \\ =&f_n (0, X_0) + \int _0^t (\nabla f_n )(s, X_s) \cdot \textrm{d}W_s + \int _0^t(\nabla f_n )(s, X_s) \cdot \textrm{d}^- A^{X,B}_s(b-b_n) \nonumber \\&+ \int _0^t l(s, X_s ) \textrm{d}s , \end{aligned}$$
(6.9)

having used \({\mathcal {L}}_n f_n =l\) in the last equality. Using again the linearity of \(A^{X,B}\), we have

$$\begin{aligned} \nonumber \int _0^t&(\nabla f_n )(s, X_s) \cdot \textrm{d}^- A^{X,B}_s(b-b_n) \\&=\int _0^t(\nabla f_n )(s, X_s) \cdot \textrm{d}^- A^{X,B}_s(b) - \int _0^t(\nabla f_n )(s, X_s) \cdot \textrm{d}^- A^{X,B}_s(b_n). \end{aligned}$$
(6.10)

The second integral on the RHS is equal to \(A^{X,B}_t (\nabla f_n \,b_n)\) by Remark 6.9 item (i) since \(f_n \in C^{1,2}_b\) and \(b_n \in C_T \bar{{\mathcal {C}}}_c^{0+}\). Since X is a reinforced B-solution, by (6.6) the first integral on the RHS of (6.10) gives \(A^{X,B}_t (\nabla f_n \,b) \) so by additivity we rewrite (6.10) as

$$\begin{aligned} \int _0^t (\nabla f_n )(s, X_s) \cdot \textrm{d}^- A^{X,B}_s(b-b_n) = A^{X,B}_t (\nabla f_n \,b-\nabla f_n \,b_n). \end{aligned}$$
(6.11)

Plugging (6.11) into (6.9), we have

$$\begin{aligned} f_n(t, X_t)&-f_n (0, X_0) - A^{X,B}_t (\nabla f_n \,b-\nabla f_n \,b_n) - \int _0^t l(s, X_s ) \textrm{d}s\nonumber \\&= \int _0^t (\nabla f_n )(s, X_s) \cdot \textrm{d}W_s. \end{aligned}$$
(6.12)

Since \(b_n \rightarrow b\) in \(C_T {\mathcal {C}}^{-\beta }\), we then have \(f_n \rightarrow f\) in \(C_T {\mathcal {C}}^{(1+\beta )+}\) and \( \nabla f_n \rightarrow \nabla f\) in \(C_T {\mathcal {C}}^{\beta +}\) by continuity results for PDE (2.3), see Sect. 2.2. Thus, the right-hand side of (6.12) converges u.c.p. to \(\int _0^t (\nabla f )(s, X_s) \cdot \textrm{d}W_s\), which is a local martingale under \({\mathbb {P}}\). Moreover, the left-hand side of (6.12) converges u.c.p. to \(f(t, X_t) -f(0, X_0) - \int _0^t l(s, X_s ) \textrm{d}s \) and since \(l = {\mathcal {L}} f\) we conclude. \(\square \)
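The approximation \(b_n \rightarrow b\) used in the proof mirrors what one would do numerically: solve the SDE with a mollified drift and shrink the mollification scale. The following toy sketch takes b(x) = -sign(x) mollified at two scales as an illustrative stand-in (a bounded discontinuous drift, not the Besov-distributional setting of the paper), with both Euler–Maruyama schemes driven by the same Brownian increments:

```python
import numpy as np

rng = np.random.default_rng(3)

def b_mollified(x, eps):
    """Smooth approximation of b(x) = -sign(x) at scale eps."""
    return -np.tanh(x / eps)

def euler(b, dW, dt, x0=0.0):
    """Euler-Maruyama for dX = b(X) dt + dW with prescribed noise."""
    x = x0
    path = [x]
    for dw in dW:
        x = x + b(x) * dt + dw
        path.append(x)
    return np.array(path)

n, T = 10_000, 1.0
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)  # one noise realisation, reused

X_coarse = euler(lambda x: b_mollified(x, 0.5), dW, dt)
X_fine = euler(lambda x: b_mollified(x, 0.05), dW, dt)
gap = np.max(np.abs(X_coarse - X_fine))
print(gap)  # uniform distance between the two regularised solutions
```

Reusing the same increments dW isolates the effect of the drift regularisation, which is the numerical analogue of passing to the limit in (6.12) with the driving noise fixed.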

As a consequence, we get a characterisation property for solutions of the SDE in terms of solutions to martingale problem.

Corollary 6.13

Let \(B= C_T \bar{{\mathcal {C}}}_c^{(-\beta )+}\) and \(b \in B\). Let \({\mathbb {P}}\) be a probability measure on \((\Omega , {\mathcal {F}})\). Suppose that X is an \({\mathcal {F}}^X\)-Dirichlet process. Then X is a reinforced B-solution of the SDE

$$\begin{aligned} X_t = X_0 + W_t + \int _0^t b(s, X_s) \textrm{d}s \end{aligned}$$

if and only if \((X, {\mathbb {P}})\) solves the martingale problem with distributional drift b and initial condition \(X_0 \sim \mu \).

Proof

Combine Proposition 6.11 and Proposition 6.12. \(\square \)