1 Introduction

We consider the transport equation, here posed (w.l.o.g.) as a terminal value problem, that is,

$$\begin{aligned} {\left\{ \begin{array}{ll} - \partial _t u (t, x)_{} = \displaystyle \sum _{i=1}^d f_i (x) \cdot D_x u (t, x){\dot{W}}_t^i \equiv \Gamma u_t (x) {\dot{W}}_t &{}\quad \text { in }\quad (0,T)\times {{\mathbb {R}}}^{n},\\ u=g &{}\quad \text { on }\quad \{T\}\times {{\mathbb {R}}}^{n}. \end{array}\right. } \end{aligned}$$

for fixed \(T>0\), with vector fields \(f = (f_1,\ldots ,f_d)\) driven by a \(\mathcal {C}^1\) signal \(W=(W^1,\ldots ,W^d)\). The canonical pairing of \(Du =D_x u = (\partial _{x^1}u,\ldots ,\partial _{x^n}u)\) with a vector field is indicated by a dot, and we have already used the operator/vector notation

$$\begin{aligned} \Gamma _i = f_i (x) \cdot D_x, \ \ \Gamma = (\Gamma _1,\ldots , \Gamma _d). \end{aligned}$$

By the method of characteristics, the unique (classical) \(\mathcal {C}^{1, 1}\) transport solution \(u:[0, T] \times {{\mathbb {R}}}^n \rightarrow {{\mathbb {R}}}\) is given explicitly by

$$\begin{aligned} u (s, x) = u (s, x ; W) : = g (X^{s, x}_T) \;, \end{aligned}$$

provided \(g \in \mathcal {C}^1\) and the vector fields \(f_1, \ldots , f_d\) are nice enough (\(\mathcal {C}^1_b\) will do) to ensure a \(\mathcal {C}^1\) solution flow for the ODE

$$\begin{aligned}{\left\{ \begin{array}{ll} {\dot{X}}^{s,x}_t = \displaystyle \sum _{i = 1}^d f_i (X^{s,x}_t) {\dot{W}}^i_t \equiv f (X_t) {\dot{W}}_t,\\ X^{s,x}_s = x\,. \end{array}\right. } \end{aligned}$$
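The representation \(u(s,x)=g(X^{s,x}_T)\) can be checked numerically in a simple case. The following is a minimal sketch, not taken from the paper: it assumes \(d=n=1\), the illustrative choices \(f(x)=x\) (of linear growth rather than bounded, which still gives a global flow), \(W_t=\sin t\) and \(g=\cos \), and it uses a classical Runge–Kutta scheme. The function names are hypothetical.

```python
import math

def solve_flow(f, W_dot, s, x, T, steps=2000):
    """Integrate dX/dt = f(X) * W_dot(t) from time s to time T with classical RK4."""
    h = (T - s) / steps
    t, X = s, x
    for _ in range(steps):
        k1 = f(X) * W_dot(t)
        k2 = f(X + 0.5 * h * k1) * W_dot(t + 0.5 * h)
        k3 = f(X + 0.5 * h * k2) * W_dot(t + 0.5 * h)
        k4 = f(X + h * k3) * W_dot(t + h)
        X += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return X

def transport_solution(g, f, W_dot, s, x, T):
    """u(s, x) = g(X_T^{s,x}): terminal datum evaluated at the end of the characteristic."""
    return g(solve_flow(f, W_dot, s, x, T))

# Assumed example data: f(x) = x, W_t = sin(t), g = cos, so that
# X_T^{s,x} = x * exp(sin(T) - sin(s)) in closed form.
s, x, T = 0.3, 0.7, 1.0
numeric = transport_solution(math.cos, lambda y: y, math.cos, s, x, T)
closed_form = math.cos(x * math.exp(math.sin(T) - math.sin(s)))
print(abs(numeric - closed_form))
```

For this linear vector field the flow is explicit, so the printed difference only measures the time-stepping error.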

In turn, solving this ODE with random initial data induces a natural evolution of measures, given by the continuity (or forward) equation

$$\begin{aligned}{\left\{ \begin{array}{ll} \partial _t \rho = \displaystyle \sum _{i = 1}^d {{\,\mathrm{div}\,}}_x (f_i (x) \rho _t)\,\mathrm d W_t^i &{}\quad \text { in }\quad (0,T)\times {{\mathbb {R}}}^{n},\\ \rho (0)=\mu &{}\quad \text { on }\quad \{0\}\times {{\mathbb {R}}}^{n} \,. \end{array}\right. }\end{aligned}$$

Well-posedness of the “trinity” transport/flow/continuity will depend on the regularity of the data. For \(W \in \mathcal {C}^1\) we have an effective vector field

$$\begin{aligned} b (t,x) = \sum _{i = 1}^d f_i (x) {\dot{W}}^i_t \end{aligned}$$

which is continuous in \(t \in [0,T]\) and inherits the regularity of f. In particular, \(f \in \mathcal {C}^1\) will be sufficient for a \(\mathcal {C}^{1,1}\)-flow. In landmark papers, DiPerna–Lions [9] and then Ambrosio [1] showed that the transport problem (weak solutions) is well-posed under bounds on \({{\,\mathrm{div}\,}}b\) (rather than \(D_x b\)), which in turn leads to a generalized flow. Another fundamental direction may be called regularisation by noise, based on the observation that generically \({\dot{X}} = f_0(X) + \text {(noise)}\) is much better behaved than the noise-free problem, see e.g. [2, 4, 6, 10, 12, 21].

Our work is not concerned with DiPerna–Lions type analysis, nor with regularisation by noise. In fact, our driving vector fields will be very smooth, to compensate for the irregularity of the noise, which we here assume to be very rough. (This trade-off is typical in rough paths and regularity structures.)

Specifically, we continue a programme started independently by Bailleul–Gubinelli [3] (see also [8]) and Diehl et al. [7] and take W as a rough path, henceforth called \({\mathbf{W}}\). As in these works, we are interested in an intrinsic notion of solution (Rough path stability of transport problems was already noted in [5]). The contribution of this article is a treatment of rough noise of arbitrarily low regularity. Based on a suitable definition of solution, carefully introduced below, we can show

Theorem 1.1

Assume \({\mathbf{W}}\) is a weakly geometric rough path of Hölder regularity with exponent \(\gamma \in (0,1]\). Assume f has \(2\lfloor \gamma ^{-1}\rfloor +1\) bounded derivatives. Then there is a unique spatially regular (resp. measure-valued) solution to the rough transport (resp. continuity) equation with regular terminal data (resp. measure-valued initial data).

This should be compared with [3, 7], which both treat the “level-2 case”, with Hölder noise of exponent \(\gamma > 1/3\). Treating the general case, i.e. with arbitrarily small Hölder exponent, requires us in particular to fully quantify the interaction of iterated integrals, themselves constrained by shuffle-relations, and the controlled structure of the PDE problem at hand. In fact, the shuffle relations will be seen crucial to preserve the hyperbolic nature of the rough transport equation. This is different for (ordinary) rough differential equations where the shuffle relations can be discarded at the price of working with branched (think: Itô-type) rough paths. For what it’s worth, our arguments restricted to the (well-known) level-2-case still contain some worthwhile simplifications with regard to the existing literature, e.g. by avoiding the analysis of an adjoint equation [7] and showing uniqueness for weak solutions of the continuity equations via a small class of test functions. On our way we also (have to) prove some facts on (controlled) geometric rough paths of independent interest, not (or only in the branched setting [16, 18]) available in the literature.

Relation to existing works: Unlike the case of rough transport equation, when it comes to stochastic constructions it is impossible to mention all related works stretching over more than four decades, from e.g. Funaki [13], Ogawa [23] to recent works such as [24] with fractional noise and Russo–Valois integration.

The many benefits of a robust theory of stochastic partial differential equations, by combining a deterministic RPDE theory with Brownian and more general noise, are now well documented and need not be repeated in detail. Let us still recall one example of interest: multidimensional fractional Brownian motion with Hurst parameter \(H>1/4\) admits a canonical geometric rough path lift (see e.g. [11]) of regularity \(\alpha \) for any \(1/4< \alpha <H\), which constitutes an admissible rough noise for our rough transport and continuity equations. Various authors (see for example Unterberger [26], Nualart and Tindel [22], etc.) have constructed “renormalised” canonical fractional Brownian rough paths for any \(H>0\), fully covered by Theorem 1.1.

1.1 Notations

We fix once and for all a time \(T>0\). In what follows we abbreviate estimates of the form \(| (a) - (b) | \lesssim |t-s|^\gamma \) by writing \((a) \underset{\gamma }{=} (b)\). Given \(\gamma \in (0,1)\) we denote by \(\mathcal {C}^\gamma \) the classical Hölder space, i.e. consisting of functions \(f:[0,T]\rightarrow {\mathbb {R}}\) such that

$$\begin{aligned} \sup _{t\ne s}\frac{|f_t-f_s|}{|t-s|^\gamma }<\infty . \end{aligned}$$

Throughout the paper we say geometric rough path, when we really mean weakly geometric rough path (since we only work with this type of rough path, the difference [14] will not matter to us).

2 Rough paths

We start by reviewing the definition of geometric rough paths of roughness \(\gamma \in (0,1)\) and of controlled rough paths. We will do so in a Hopf-algebraic language following [18], but first we introduce some basic concepts.

A word of length \(p\ge 1\) over the alphabet \(\{1,\ldots ,d\}\) is a tuple \(w=(i_1,\ldots ,i_p)\in \{1,\ldots ,d\}^p\), and we set \(|w|:=p\). We denote by \(\varepsilon \) the empty word, which is by convention the unique word with zero length. Given two non-empty words \(v=(i_1,\ldots ,i_p)\) and \(w=(i_{p+1},\ldots ,i_{p+q})\), we denote by \(vw:=(i_1,\ldots ,i_p,i_{p+1},\ldots ,i_{p+q})\) their concatenation. By definition \(\varepsilon w=w\varepsilon =w\). We observe that in any case \(|vw|=|v|+|w|\). The concatenation product is associative but not commutative.

The symmetric group \({\mathbb {S}}_p\) acts on words of length p by permutation of its entries, that is, \(\sigma .w:=(i_{\sigma (1)},\ldots ,i_{\sigma (p)})\). Given two integers \(p,q\ge 1\), a (p, q)-shuffle is a permutation \(\sigma \in {\mathbb {S}}_{p+q}\) such that

$$\begin{aligned} \sigma (1)<\sigma (2)<\cdots<\sigma (p)\quad \text {and}\quad \sigma (p+1)<\sigma (p+2)<\cdots <\sigma (p+q). \end{aligned}$$

We denote by \({{\,\mathrm{Sh}\,}}(p,q)\) the set of all (p, q)-shuffles.
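As an illustration, the set \({{\,\mathrm{Sh}\,}}(p,q)\) can be enumerated by choosing which p of the \(p+q\) values are taken by \(\sigma (1),\ldots ,\sigma (p)\). The sketch below is only meant to make the definition concrete; the function name `shuffles` is hypothetical.

```python
from itertools import combinations
from math import comb

def shuffles(p, q):
    """Yield every (p, q)-shuffle as the tuple (sigma(1), ..., sigma(p+q)), 1-based."""
    for first in combinations(range(1, p + q + 1), p):
        # sigma(1) < ... < sigma(p) are the chosen values; the remaining values,
        # in increasing order, are sigma(p+1) < ... < sigma(p+q).
        rest = tuple(k for k in range(1, p + q + 1) if k not in first)
        yield first + rest

# There are binomial(p + q, p) shuffles.
print(len(list(shuffles(2, 2))), comb(4, 2))
```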

2.1 The shuffle algebra

The shuffle product was introduced by Ree [25] to study the combinatorial properties of iterated integrals, following K.-T. Chen’s work. Let \(d\ge 1\) be fixed, and consider the tensor algebra H over \({{\mathbb {R}}}^d\), which is defined to be the direct sum

$$\begin{aligned} H:=\bigoplus _{p=0}^{\infty } ({{\mathbb {R}}}^d)^{\otimes p}. \end{aligned}$$

A linear basis for H is given by pure tensors \(e_{i_1}\otimes \cdots \otimes e_{i_p}\), \(p\ge 1\) where \(\{e_1,\ldots ,e_d\}\) is a basis of \({{\mathbb {R}}}^d\), and the additional element \(\mathbf {1}\) which generates \((\mathbb {R}^d)^{\otimes 0}:={{\mathbb {R}}}\mathbf {1}\). In order to ease the notation we denote, for a word \(w=(i_1,\ldots ,i_p)\), \(e_w:=e_{i_1}\otimes e_{i_2}\otimes \cdots \otimes e_{i_p}\). By definition, the set \(\{e_w:|w|=p\}\) is a linear basis for \(({{\mathbb {R}}}^d)^{\otimes p}\) for any \(p\ge 0\).

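The space H is endowed with a product \(\sqcup \!\sqcup \), called the shuffle product, defined on pure tensors as

$$\begin{aligned} e_{i_1\cdots i_p}\sqcup \!\sqcup e_{i_{p+1}\cdots i_{p+q}}:=\sum _{\sigma \in {{\,\mathrm{Sh}\,}}(p,q)}e_{i_{\sigma ^{-1}(1)}\cdots i_{\sigma ^{-1}(p+q)}}, \end{aligned}$$

and extended bilinearly to H, with \(\mathbf {1}\) as unit.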

There is also another operation, called the deconcatenation coproduct \(\Delta :H\rightarrow H\otimes H\), defined by

$$\begin{aligned} \Delta e_w:=\sum _{uv=w}e_u\otimes e_v. \end{aligned}$$

The shuffle product and the deconcatenation coproduct satisfy a compatibility relation (which will not play any role in the sequel), turning the triple \((H,\sqcup \!\sqcup ,\Delta )\) into a graded connected bialgebra. This implies the existence of a linear map \(S:H\rightarrow H\), called the antipode, turning it into a Hopf algebra. In our particular setting, S can be explicitly computed on basis elements by \(S(e_{i_1\cdots i_p})=(-1)^pe_{i_p\cdots i_1}\).
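The operations above are easy to experiment with on words. The following minimal sketch represents an element of H as a dictionary mapping words (tuples of letters) to coefficients; this data layout and the function names are assumptions made for illustration. It also checks, on one word, the defining property of the antipode: the sum of \(S(e_u)\) shuffled with \(e_v\) over all splittings \(uv=w\) vanishes for non-empty w.

```python
from collections import Counter

def shuffle_words(u, v):
    """Shuffle product of basis elements e_u and e_v as a Counter {word: multiplicity}."""
    if not u:
        return Counter({tuple(v): 1})
    if not v:
        return Counter({tuple(u): 1})
    out = Counter()
    # The first letter of the result comes either from u or from v.
    for w, c in shuffle_words(u[1:], v).items():
        out[(u[0],) + w] += c
    for w, c in shuffle_words(u, v[1:]).items():
        out[(v[0],) + w] += c
    return out

def shuffle(x, y):
    """Bilinear extension to elements of H given as {word: coefficient}."""
    out = Counter()
    for u, a in x.items():
        for v, b in y.items():
            for w, c in shuffle_words(u, v).items():
                out[w] += a * b * c
    return out

def deconcatenate(w):
    """All splittings w = uv, i.e. the terms e_u (x) e_v in the coproduct of e_w."""
    return [(w[:k], w[k:]) for k in range(len(w) + 1)]

def antipode(w):
    """S(e_w) = (-1)^{|w|} e_{reversed w}, returned as {word: coefficient}."""
    return {tuple(reversed(w)): (-1) ** len(w)}

# Antipode identity on the word (1, 2, 3): all coefficients cancel.
total = Counter()
for u, v in deconcatenate((1, 2, 3)):
    for word, coeff in shuffle(antipode(u), {v: 1}).items():
        total[word] += coeff
print(all(c == 0 for c in total.values()))
```

Note that multiplicities matter: shuffling the one-letter word (1) with itself gives the word (1, 1) with coefficient 2.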

The coproduct endows the dual space \(H^*\) with an algebra structure via the convolution product given, for \(g,h\in H^*\), by

$$\begin{aligned}\langle g\star h, x\rangle :=\langle g\otimes h,\Delta x\rangle . \end{aligned}$$

On pure tensors this yields

$$\begin{aligned} \langle g\star h,e_w\rangle =\sum _{uv=w}\langle g,e_u\rangle \langle h,e_v\rangle . \end{aligned}$$

A character is a linear map \(g\in H^*\) such that \(\langle g,x\sqcup \!\sqcup y\rangle =\langle g,x\rangle \langle g,y\rangle \) for all \(x,y\in H\). It is a standard result (see e.g. [20]) that the collection of all characters on H forms a group G under the convolution product whose identity is the function \(\mathbf {1}^*\in H^*\), defined by \(\mathbf {1}^*(e_w)=0\) for every non-empty word w and \(\mathbf {1}^*(\mathbf {1})=1\). The inverse of an element \(g\in G\) can be computed by using the antipode: \(g^{-1}=g\circ S\).

Given \(N\ge 0\), we consider the step-N truncated tensor algebra

$$\begin{aligned} H_N=\bigoplus _{p=0}^N({{\mathbb {R}}}^d)^{\otimes p}. \end{aligned}$$

Definition 2.1

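A step-N truncated character is a linear map \(g\in H_N^*\) such that

$$\begin{aligned} \langle g,x\sqcup \!\sqcup y\rangle =\langle g,x\rangle \langle g,y\rangle \end{aligned}$$

for all \(x\in ({{\mathbb {R}}}^d)^{\otimes p}\) and \(y\in ({{\mathbb {R}}}^d)^{\otimes q}\) with \(p+q\le N\).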

It is not hard to show that the set \(G^{(N)}\) of all step-N truncated characters is also a group under the convolution product, whose identity is again \(\mathbf {1}^*\). Denoting by \(e_1^*,\ldots ,e^*_d\) the basis of \({{\mathbb {R}}}^d\) dual to \(\{e_1,\ldots ,e_d\}\), we introduce the dual basis \((e_w^*)\) of \(H_N^*\) in the canonical way, that is, for a word w we denote by \(e_w^*\) the unique linear map on \(H_N\) such that

$$\begin{aligned} \langle e_w^*,e_v\rangle =\delta _w(v). \end{aligned}$$

The convolution product of two of these basis elements can be explicitly computed. Indeed, by definition

$$\begin{aligned} \langle e_u^*\star e_v^*,e_w\rangle =\sum _{u'v'=w}\langle e_u^*,e_{u'}\rangle \langle e_v^*,e_{v'}\rangle \end{aligned}$$

which is nonzero if and only if \(w=uv\), in which case \(\langle e_u^*\star e_v^*,e_w\rangle =1\). Therefore \(e_u^*\star e_v^*=e_{uv}^*\). For this reason this product is also known as the concatenation product.
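The computation above translates directly into code. In this sketch an element of \(H_N^*\) is stored as a dictionary from words to its values on the basis \(e_w\) (an assumed representation, with missing words meaning value 0), so that the convolution product becomes concatenation of words truncated at length N.

```python
def convolve(g, h, N):
    """Convolution product on H_N^*: <g * h, e_w> = sum over uv = w of <g, e_u> <h, e_v>."""
    out = {}
    for u, a in g.items():
        for v, b in h.items():
            if len(u) + len(v) <= N:          # truncate at step N
                out[u + v] = out.get(u + v, 0) + a * b
    return out

def dual_basis(w):
    """The functional e_w^*, equal to 1 on e_w and 0 on all other basis elements."""
    return {tuple(w): 1}

# e_u^* * e_v^* = e_{uv}^*, and the functional 1^* (supported on the empty word) is the identity.
print(convolve(dual_basis((1,)), dual_basis((2, 3)), 3))
print(convolve(dual_basis(()), dual_basis((2, 1)), 3))
```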

2.2 Geometric rough paths

We now recall the notion of geometric rough paths. The group \(G^{(N)}\) can be endowed with a sub-additive homogeneous norm \(\Vert \cdot \Vert _N:G^{(N)}\rightarrow {{\mathbb {R}}}_+\), see [19] for further details. This allows us to define a left invariant metric on \(G^{(N)}\) by setting

$$\begin{aligned} d_N(g,h):=\Vert h^{-1}g\Vert _N. \end{aligned}$$

Definition 2.2

Let \({N_{\gamma }}:=\lfloor \gamma ^{-1}\rfloor \) denote the integer part of \(\gamma ^{-1}\). A geometric rough path of regularity \(\gamma \) is a \(\gamma \)-Hölder path \({\mathbf{W}}:[0,T]\rightarrow (G^{(N_\gamma )},d_N)\). The set of all geometric rough paths of regularity \(\gamma \) will be denoted by \(\mathscr {C}^\gamma \).

By definition, the increments \({\mathbf{W}}_{st}:={\mathbf{W}}_s^{-1}\star {\mathbf{W}}_t\) satisfy the so-called Chen’s relations

$$\begin{aligned} {\mathbf{W}}_{st}={\mathbf{W}}_{su}\star {\mathbf{W}}_{ut} \end{aligned}$$

for all \(0\le s,u,t\le T\). Moreover, by construction of the homogeneous norm \(\Vert \cdot \Vert _{N}\), for any word w such that \(|w|\le N_{\gamma }\) one has

$$\begin{aligned} \sup _{t\ne s}\frac{|\langle {\mathbf{W}}_{st},e_w \rangle |}{|t-s|^{|w|\gamma }}<\infty . \end{aligned}$$
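For a smooth path the rough path lift is given by iterated integrals, and both Chen’s relation and the character (shuffle) property can be observed numerically. The sketch below is an illustration under simplifying assumptions: it uses the well-known fact that a straight line with increment \(a\in {{\mathbb {R}}}^d\) has \(\langle {\mathbf{W}}_{st},e_w\rangle =a_{i_1}\cdots a_{i_p}/p!\) for \(w=(i_1,\ldots ,i_p)\), and it concatenates two such segments with the convolution product. All function names are hypothetical.

```python
from itertools import product
from math import factorial, prod

def words(d, N):
    """All words of length <= N over {1, ..., d}, including the empty word ()."""
    return [w for p in range(N + 1) for w in product(range(1, d + 1), repeat=p)]

def segment_signature(a, N):
    """Step-N lift of a straight line with increment a: <W, e_w> = prod_j a[w_j] / |w|!."""
    return {w: prod(a[i - 1] for i in w) / factorial(len(w)) for w in words(len(a), N)}

def convolve(g, h, N):
    """Truncated convolution product, as in Chen's relation W_st = W_su * W_ut."""
    out = {}
    for u, x in g.items():
        for v, y in h.items():
            if len(u) + len(v) <= N:
                out[u + v] = out.get(u + v, 0.0) + x * y
    return out

# Move one unit in direction 1, then one unit in direction 2 (assumed example path).
W = convolve(segment_signature((1.0, 0.0), 2), segment_signature((0.0, 1.0), 2), 2)
# Character property on the letters 1 and 2: <W,e_1><W,e_2> = <W,e_12> + <W,e_21>.
print(W[(1,)] * W[(2,)], W[(1, 2)] + W[(2, 1)])
```

For a segment traversed over the time interval [s, t] at constant speed, the increment a is proportional to \(t-s\), so the level-p coefficients scale like \(|t-s|^{p}\), in line with the bound above for \(\gamma =1\).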

2.3 Controlled rough paths and rough integrals

One of the main goals of rough paths theory is to give meaning to solutions of controlled equations of the form

$$\begin{aligned} \mathrm dX_t=\sum _{i=1}^df_i(X_t)\,\mathrm d{\mathbf{W}}^i_t, \end{aligned}$$

for some collection of sufficiently regular vector fields \(f_1,\ldots , f_d\) on \({{\mathbb {R}}}^n\) and where the driving signals \(W^1,\ldots , W^d\) are very irregular. The general philosophy is that if the smoothness of the vector fields compensates for the lack of regularity of the driving signals, then we can still have existence of solutions, provided that we reinterpret the equation in the appropriate sense. The central ingredient for proving results of this kind is the notion of controlled rough path, which we now recall.

Definition 2.3

[11, 16] Let \({\mathbf{W}}\in \mathscr {C}^{\gamma }\) and \(1\le N\le N_{\gamma }+1\). A rough path controlled by \({\mathbf{W}}\) is a path \({\mathbf{X}}:[0,T]\rightarrow H_{N-1}\) such that for any word w with \(\vert w\vert \le N-1\) the path \(t\mapsto \langle e_w^*,{\mathbf{X}}_t\rangle \) belongs to \(\mathcal {C}^\gamma \) and

$$\begin{aligned} \langle e_w^*,{\mathbf{X}}_t\rangle \underset{(N-|w|)\gamma }{=}\langle {\mathbf{W}}_{st}\star e_w^*,{\mathbf{X}}_s\rangle \,, \end{aligned}$$

for all \(s<t\). We denote by \(\mathscr {D}^{N\gamma }_{{\mathbf{W}}}\) the (vector) space of paths \({\mathbf{X}}\) satisfying (2.6).

We say that a path \(X:[0,T]\rightarrow {{\mathbb {R}}}\) is controlled by \({\mathbf{W}}\) if there exists a controlled path \({\mathbf{X}}\in \mathscr {D}^{N\gamma }_{{\mathbf{W}}}\) such that \(\langle \mathbf {1}^*,{\mathbf{X}}_t\rangle =X_t\); we call \({\mathbf{X}}\) a controlled rough path above (the controlled path) X.

Remark 2.4

The definition in [11] seems more restrictive in that one always takes \(N=N_\gamma \), which is the minimal value of N required for rough integration. The case \(N=N_\gamma +1\) is convenient to keep track of the additional information obtained by rough integration, see Remark 2.7.

Remark 2.5

Alternatively, by writing \({\mathbf{X}}\) and \({\mathbf{W}}\) as the sums

$$\begin{aligned}{\mathbf{X}}_s=\sum _{|u|\le N-1}\langle e_u^*,{\mathbf{X}}_s\rangle e_u,\quad {\mathbf{W}}_{st}=\sum _{|v|\le N}\langle {\mathbf{W}}_{st}, e_v\rangle e_v^*,\end{aligned}$$

the condition in Eq. (2.6) can be explicitly written

$$\begin{aligned} \langle e_w^*,{\mathbf{X}}_t\rangle \underset{(N-|w|)\gamma }{=}\sum _{0\le |v|\le N-|w|}\langle e_{vw}^*,{\mathbf{X}}_s\rangle \langle {\mathbf{W}}_{st},e_v\rangle , \end{aligned}$$

for any word w.

By construction of the vector space \(\mathscr {D}^{N\gamma }_{{\mathbf{W}}}\), the quantity

$$\begin{aligned}\Vert \mathbf {X} \Vert _{{\mathbf{W}};N\gamma }:= \sum _{0\le \vert w\vert<N} \sup _{s<t}\frac{|\langle e_w^*,{\mathbf{X}}_t\rangle -\langle {\mathbf{W}}_{st}\star e_w^*,{\mathbf{X}}_s\rangle |}{|t-s|^{(N- \vert w \vert )\gamma }}\,,\end{aligned}$$

is finite for any \({\mathbf{X}}\in \mathscr {D}^{N\gamma }_{{\mathbf{W}}}\). We can easily show that \( \Vert \cdot \Vert _{{\mathbf{W}};N\gamma } \) is a seminorm, and \(\mathscr {D}^{N\gamma }_{{\mathbf{W}}}\) becomes a Banach space under the norm

$$\begin{aligned} \Vert {\mathbf{X}}\Vert _{\mathscr {D}^{N\gamma }_{{\mathbf{W}}}}:=\max _{|w|\le N-1}|\langle e_w^*,{\mathbf{X}}_0\rangle |+\Vert {\mathbf{X}}\Vert _{{\mathbf{W}};N\gamma }. \end{aligned}$$

We extend the notion of controlled rough path above a vector-valued path \(X:[0,T]\rightarrow \mathbb {R}^n\). In this case, the path \({\mathbf{X}}\) takes values in \((H_{N-1})^n\), that is, each component path \(\langle e_w^*,{\mathbf{X}}\rangle \) is a vector of \({{\mathbb {R}}}^n\), which we denote by

$$\begin{aligned} \langle e_w^*,{\mathbf{X}}\rangle =(\langle e_w^*,{\mathbf{X}}\rangle _1,\ldots ,\langle e_w^*,{\mathbf{X}}\rangle _n). \end{aligned}$$

Then we require the bound in Eq.  (2.6) to hold componentwise, or equivalently, we can replace the absolute value of the left-hand side by any norm on \({{\mathbb {R}}}^n\). We denote this space by \((\mathscr {D}^{N\gamma }_{{\mathbf{W}}})^{n}\).

Using the higher-order information contained in the controlled rough path \(\mathbf {X}\in \mathscr {D}^{N\gamma }_{{\mathbf{W}}}\), we recall the rigorous notion of rough integral of \({\mathbf{X}}\) against \({\mathbf{W}}\). For its proof see [11].

Theorem 2.6

Let \({\mathbf{W}}\in \mathscr {C}^{\gamma }\) and \({\mathbf{X}}\in \mathscr {D}^{N_{\gamma }\gamma }_{{\mathbf{W}}}\). For every \(i\in \{1,\ldots ,d\}\) there exists a unique real valued path in \(\mathcal {C}^{\gamma }\)

$$\begin{aligned} t\mapsto \int _0^t X_u\,\mathrm d{\mathbf{W}}^i_u:=\lim _{|\pi |\rightarrow 0}\sum _{[a,b]\in \pi }\sum _{0\le |w|\le N_{\gamma }-1}\langle e^*_w,{\mathbf{X}}_a\rangle \langle {\mathbf{W}}_{ab}, e_{wi}\rangle , \end{aligned}$$

where \(\pi \) is a sequence of partitions of [0, t] whose mesh \(|\pi |\) converges to 0. We call it the rough integral of X with respect to \(W^i\). Moreover one has the estimate

$$\begin{aligned} \int _0^tX_u\,\mathrm d{\mathbf{W}}^i_u- \int _0^sX_u\,\mathrm d{\mathbf{W}}^i_u=:\int _s^tX_u\,\mathrm d{\mathbf{W}}^i_u\underset{(N_{\gamma }+1) \gamma }{=}\sum _{0\le |w|\le N_{\gamma }-1}\langle e^*_w,{\mathbf{X}}_s\rangle \langle {\mathbf{W}}_{st}, e_{wi}\rangle , \end{aligned}$$

for any \(s<t\). Introducing the function \(\int _0^{\cdot }{\mathbf{X}}_u\,\mathrm d{\mathbf{W}}^i_u:[0,T]\rightarrow H_{N_{\gamma }}\) given by

$$\begin{aligned} \left\langle \mathbf {1}^*,\int _0^t{\mathbf{X}}_u\,\mathrm d{\mathbf{W}}^i_u\right\rangle :=\int _0^tX_u\,\mathrm d{\mathbf{W}}^i_u\,,\quad \left\langle e_{wi}^*,\int _0^t{\mathbf{X}}_u\,\mathrm d{\mathbf{W}}^i_u\right\rangle :=\langle e_w^*,{\mathbf{X}}_t\rangle \, \end{aligned}$$

and zero elsewhere, one has \(\int _0^{\cdot }{\mathbf{X}}_u\,\mathrm d{\mathbf{W}}^i_u\in \mathscr {D}^{(N_{\gamma }+1)\gamma }_{{\mathbf{W}}}\).
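The effect of the higher-order terms in the compensated Riemann sum can be seen already for a smooth driver. The following sketch is a toy illustration under stated assumptions, not the general construction: it takes \(W_t=(t,t^2)\) on [0, 1], the controlled path \(X=W^1\) with \(\langle e_1^*,{\mathbf{X}}\rangle =1\), keeps two levels as if \(N_\gamma =2\), and approximates \(\langle {\mathbf{W}}_{ab},e_{12}\rangle \) on each subinterval by the value for the chord, i.e. half the product of the two increments. The exact value of \(\int _0^1 W^1\,\mathrm dW^2\) is 2/3.

```python
def riemann_sums(n):
    """Approximate int_0^1 W^1 dW^2 for W_t = (t, t^2) on a uniform grid with n cells.

    Returns (plain, compensated): the left-point sum of X_a <W_ab, e_2>, and the
    same sum with the second-level term <e_1^*, X_a> <W_ab, e_12> added, where
    <e_1^*, X> = 1 and <W_ab, e_12> is replaced by its chord value inc1 * inc2 / 2.
    """
    plain = compensated = 0.0
    for k in range(n):
        a, b = k / n, (k + 1) / n
        inc1, inc2 = b - a, b * b - a * a        # increments of W^1 and W^2 on [a, b]
        plain += a * inc2
        compensated += a * inc2 + 0.5 * inc1 * inc2
    return plain, compensated

for n in (10, 100, 1000):
    plain, compensated = riemann_sums(n)
    print(n, abs(plain - 2 / 3), abs(compensated - 2 / 3))
```

In this example the plain sum converges at rate 1/n while the compensated sum converges at rate 1/n²; for a genuinely rough driver the plain sum need not converge at all, which is why the higher-order terms are part of the definition.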

Remark 2.7

In contrast to the general definition of the \(\mathscr {D}^{N\gamma }_{{\mathbf{W}}}\) spaces, in order to define the rough integral it is necessary to start from a controlled rough path \({\mathbf{X}}\in \mathscr {D}^{N_{\gamma }\gamma }_{{\mathbf{W}}}\). Integration of controlled rough paths also comes with quantitative bounds. From the definition one can prove that there exists a constant \(C(T, \gamma , {\mathbf{W}})>0\) depending on T, \(\gamma \), \({\mathbf{W}}\) such that

$$\begin{aligned}\left\| \int _0^{\cdot }{\mathbf{X}}_u\,\mathrm d{\mathbf{W}}^i_u \right\| _{\mathscr {D}^{(N_\gamma +1)\gamma }_{{\mathbf{W}}}}\le C(T, \gamma , {\mathbf{W}})\Vert {\mathbf{X}}\Vert _{\mathscr {D}^{N_\gamma \gamma }_{{\mathbf{W}}}}.\end{aligned}$$

Therefore the map \({\mathbf{X}}\mapsto \int {\mathbf{X}}\,\mathrm d{\mathbf{W}}^i\) is linear and continuous.

The second operation we introduce is the composition of a controlled rough path and a smooth function. Given a smooth function \(\phi :{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}\), its k-th derivative at \(x\in {{\mathbb {R}}}^n\) is the multilinear map \(D^k\phi (x):({{\mathbb {R}}}^n)^{\otimes k}\rightarrow {{\mathbb {R}}}\) such that for \(v^1,\ldots ,v^k\in {{\mathbb {R}}}^n\),

$$\begin{aligned} D^k\phi (x)(v^1,\ldots ,v^k)=\sum _{\alpha _1,\ldots ,\alpha _k=1}^n\frac{\partial ^k\phi }{\partial x_{\alpha _1}\cdots \partial x_{\alpha _k}}(x)v_{\alpha _1}^1\cdots v_{\alpha _k}^k. \end{aligned}$$

To ease notation we define

$$\begin{aligned} \partial ^\alpha \phi (x):=\frac{\partial ^k\phi }{\partial x_{\alpha _1}\cdots \partial x_{\alpha _k}}(x) = \frac{\partial ^k\phi }{\partial x^{i_1}_{1}\cdots \partial x^{i_n}_{n}}(x) \end{aligned}$$

for a word \(\alpha =(\alpha _1,\ldots ,\alpha _k)\in \{1,\ldots ,n\}^k\); of course, such \(\alpha \) induces a multi-index \(i=(i_1,\ldots ,i_n)\in {\mathbb {N}}^n\), where \(i_j\) counts the number of entries of \(\alpha \) that equal j.

We note that \(D^k\phi (x)\) is symmetric, meaning that for any permutation \(\sigma \in {\mathbb {S}}_k\) we have that \(D^k\phi (x)(v^1,\ldots ,v^k)=D^k\phi (x)(v^{\sigma (1)},\ldots ,v^{\sigma (k)}).\)

Remark 2.8

Observe that we also use the notion of word in this case, albeit with a different alphabet. In order to avoid confusion we reserve latin letters such as u, v, w, etc. for words on the alphabet \(\{1,\ldots ,d\}\), introduced in the beginning of Sect. 2, and greek letters such as \(\alpha ,\beta \), etc. for words on the alphabet \(\{1,\ldots ,n\}\) as above.

With these notations, Taylor’s theorem states that if \(\phi :{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}^m\) is of class \(\mathcal {C}^{r+1}({{\mathbb {R}}}^n, {{\mathbb {R}}}^m)\) then for any \(j=1,\ldots ,m\) one has the identity

$$\begin{aligned} \phi ^j(y)=\sum _{k=0}^{r}\frac{1}{k!}D^k\phi ^j(x)\left( (y-x)^{\otimes k} \right) +{\mathrm O}(|y-x|^{r+1}). \end{aligned}$$

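In what follows, for any finite number of words \(u_1,\ldots ,u_k\) we introduce the set

$$\begin{aligned} {{\,\mathrm{Sh}\,}}(u_1,\ldots ,u_k):=\{w:\langle e_w^*,e_{u_1}\sqcup \!\sqcup \cdots \sqcup \!\sqcup e_{u_k}\rangle \ne 0\}. \end{aligned}$$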

Since the shuffle product is commutative, for any permutation \(\sigma \in {\mathbb {S}}_k\) we have that

$$\begin{aligned} {{\,\mathrm{Sh}\,}}(u_1,\ldots ,u_k)={{\,\mathrm{Sh}\,}}(u_{\sigma (1)},\ldots ,u_{\sigma (k)}). \end{aligned}$$

Thanks to this notation, we can prove Faà di Bruno’s formula (see also [17]). We denote by \({\mathcal {P}}(m)\) the collection of all partitions of \(\{1,\ldots ,m\}\). Given \(\pi =\{B_1,\ldots ,B_p\}\in {\mathcal {P}}(m)\), we let \(\#\pi :=p\) denote the number of its blocks, and for each block \(B\in \pi \) we denote by |B| its cardinality.

Lemma 2.9

For any pair of sufficiently smooth functions \(g:{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}^n\) and \(f:{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}\) and every \(m\ge 1\), letting \(h:=f\circ g\) one has the identity

$$\begin{aligned} D^mh(x)(v_1,\ldots ,v_m)=\sum _{\pi \in {\mathcal {P}}(m)}D^{\#\pi }f(g(x))(D^{|B_1|}g(x)(v_{B_1}),\ldots ,D^{|B_p|}g(x)(v_{B_p})) \end{aligned}$$

where \(v_B:=(v_{i_1},\ldots ,v_{i_q})\) for \(B=\{i_1,\ldots ,i_q\}\).

In particular, for any word \(\alpha =(\alpha _1,\ldots ,\alpha _m)\) we have

$$\begin{aligned} \partial ^{\alpha }h(x)=\sum _{k=1}^{m}\frac{1}{k!}\sum _{\begin{array}{c} \beta _1,\ldots ,\beta _k \\ \alpha \in {\text {Sh}}(\beta _1,\ldots ,\beta _k) \end{array}}D^kf(g(x))(\partial ^{\beta _1}g(x),\ldots ,\partial ^{\beta _k}g(x)). \end{aligned}$$


Proof

We proceed by induction on m. For \(m=1\) the formula reads

$$\begin{aligned} Dh(x)v=Df(g(x))Dg(x)v \end{aligned}$$

which is the usual chain rule. Suppose the formula holds for some \(m\ge 1\). Then, differentiating each of the terms in the direction \(v_{m+1}\) by means of the product and chain rules, we get

$$\begin{aligned}&D^{m+1}h(x)(v_1,\ldots ,v_{m+1})\\&=\sum _{\pi \in {\mathcal {P}}(m)}\sum _{l=1}^{k}D^{\#\pi }f(g(x)) \left( D^{|B_1|}g(x)v_{B_1},\ldots ,D^{|B_l|+1}g(x)(v_{B_l},v_{m+1}),\ldots ,D^{|B_k|}g(x)v_{B_k} \right) \\&\quad +\sum _{\pi \in {\mathcal {P}}(m)}D^{\#\pi +1}f(g(x))(D^{|B_1|}g(x)v_{B_1},\ldots ,D^{|B_k|}g(x)v_{B_k},Dg(x)v_{m+1})\\&= \sum _{\pi '\in {\mathcal {P}}(m+1)}D^{\#\pi '}f(g(x))\left( D^{|B'_1|}g(x)(v_{B'_1}),\ldots ,D^{|B'_{k'}|}g(x)(v_{B'_{k'}}) \right) \end{aligned}$$

where \(\pi =\{B_1,\ldots ,B_k\}\) with \(k=\#\pi \), \(\pi '=\{B'_1,\ldots ,B'_{k'}\}\) with \(k'=\#\pi '\), and the last identity follows from the fact that every partition \(\pi '\in {\mathcal {P}}(m+1)\) can be obtained in a unique way either by appending \(m+1\) to one of the blocks of some partition \(\pi \in {\mathcal {P}}(m)\) or by adding the singleton block \(\{m+1\}\) to it.

Given a word \(\alpha =(\alpha _1,\ldots ,\alpha _m)\), we evaluate the previous formula in the canonical basis vectors \(v_1=e_{\alpha _1},\ldots ,v_m=e_{\alpha _m}\) to obtain

$$\begin{aligned} \partial ^\alpha h(x)&= D^mh(x)(v_1,\ldots ,v_m)\\&= \sum _{\pi \in {\mathcal {P}}(m)}D^{\#\pi }f(g(x))\left( \partial ^{\alpha _{B_1}}g(x),\ldots ,\partial ^{\alpha _{B_k}}g(x) \right) \end{aligned}$$

where \(\alpha _{B}=(\alpha _{i_1},\ldots ,\alpha _{i_q})\) if \(B=\{i_1<\cdots <i_q\}\). It is now clear that for any choice of \(\pi \in {\mathcal {P}}(m)\) the words \(\alpha _{B_1},\ldots ,\alpha _{B_k}\) satisfy \(\alpha \in {{\,\mathrm{Sh}\,}}(\alpha _{B_1},\ldots ,\alpha _{B_k})\). Conversely, if \(\alpha \in {{\,\mathrm{Sh}\,}}(\beta _1,\ldots ,\beta _k)\), there is a partition \(\pi =\{B_1,\ldots ,B_k\}\) such that \(\beta _j=\alpha _{B_j}\) for every j. Moreover, for any choice of such a partition, each of the k! permutations of its blocks results in the same evaluation by symmetry of the differential. Thus

$$\begin{aligned} \partial ^\alpha h(x)=\sum _{k=1}^m\frac{1}{k!}\sum _{\begin{array}{c} \beta _1,\ldots ,\beta _k\\ \alpha \in {{\,\mathrm{Sh}\,}}(\beta _1,\ldots ,\beta _k) \end{array}}D^kf(g(x))(\partial ^{\beta _1}g(x),\ldots ,\partial ^{\beta _k}g(x)). \end{aligned}$$

\(\square \)
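The partition form of Faà di Bruno’s formula can be tested numerically in the scalar case \(n=1\), where it reads \(h^{(m)}(x)=\sum _{\pi \in {\mathcal {P}}(m)}f^{(\#\pi )}(g(x))\prod _{B\in \pi }g^{(|B|)}(x)\). The sketch below enumerates set partitions recursively, mirroring the induction in the proof; the test functions \(f=\exp \) and \(g(x)=x^2\) and all function names are assumptions chosen for illustration.

```python
import math

def set_partitions(elements):
    """Yield all partitions of the list `elements` as lists of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Either add `first` to one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new singleton block.
        yield [[first]] + partition

def faa_di_bruno(m, df, dg):
    """m-th derivative of f o g for n = 1, from df(k) = f^(k)(g(x)) and dg(k) = g^(k)(x)."""
    total = 0.0
    for partition in set_partitions(list(range(1, m + 1))):
        term = df(len(partition))
        for block in partition:
            term *= dg(len(block))
        total += term
    return total

# Assumed example: h(x) = exp(x^2), whose third derivative is (12x + 8x^3) exp(x^2).
x = 0.5
df = lambda k: math.exp(x * x)                      # every derivative of exp is exp
dg = lambda k: {1: 2 * x, 2: 2.0}.get(k, 0.0)       # derivatives of x^2
print(faa_di_bruno(3, df, dg), (12 * x + 8 * x ** 3) * math.exp(x * x))
```

The number of terms is the Bell number of m (1, 2, 5, 15, ...), so this direct enumeration is only practical for small m.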

Remark 2.10

This result should be well-known to experts, yet the closest reference we found in the literature [17] only covers the scalar case (and does not immediately yield the multivariate case).

Using a similar technique we show a version of this identity for controlled rough paths.

Theorem 2.11

Let \({\mathbf{W}}\in \mathscr {C}^\gamma \), \(1\le N\le N_\gamma +1\), \({\mathbf{X}}\in (\mathscr {D}_{{\mathbf{W}}}^{N\gamma })^n\) and \(\phi \in \mathcal {C}^{N}({{\mathbb {R}}}^n, {{\mathbb {R}}}^m)\), and set \(X_t:=\langle \mathbf {1}^*,{\mathbf{X}}_t\rangle \). We introduce the path \({\varvec{\Phi }}({\mathbf{X}}):[0,T]\rightarrow (H_{N-1})^m\) defined, for any \(j=1,\ldots ,m\), by \(\langle \mathbf {1}^*,{\varvec{\Phi }}({\mathbf{X}})_t\rangle _j= \phi ^j(X_t)\) and, for any non-empty word w, by the identity

$$\begin{aligned} \langle e_w^*,{\varvec{\Phi }}({\mathbf{X}})_t\rangle _j:=\sum _{k=1}^{|w|}\frac{1}{k!} {\sum _{\begin{array}{c} w_1,\ldots ,w_k\\ w\in {{\,\mathrm{Sh}\,}}(w_1,\ldots ,w_k) \end{array}}}D^k\phi ^j(X_t)(\langle e_{w_1}^*,{\mathbf{X}}_t\rangle ,\ldots ,\langle e_{w_k}^*,{\mathbf{X}}_t\rangle ). \end{aligned}$$

Then \({\varvec{\Phi }}({\mathbf{X}})\) is also a controlled rough path belonging to \((\mathscr {D}_{{\mathbf{W}}}^{N\gamma })^m\).

Remark 2.12

A similar statement in the setting of branched rough paths [16, Lemma 8.4] is known and somewhat easier due to the absence of shuffle relations.

Before going into the proof, we introduce some more notation. If \({\mathbf{X}}\) is a controlled path, \(L\in {\mathcal {L}}( ({{\mathbb {R}}}^n)^{\otimes k},{{\mathbb {R}}}^m)\), \(t\ge 0\) and \(w_1,\ldots ,w_k\) are words, we let

$$\begin{aligned} L(t;w_1,\ldots ,w_k):=L(\langle e_{w_1}^*,{\mathbf{X}}_t\rangle ,\ldots ,\langle e_{w_k}^*,{\mathbf{X}}_t\rangle ). \end{aligned}$$


Proof

It is sufficient to prove the result when \(m=1\). We first prove the result for the case of \(\langle \mathbf {1}^*,{\varvec{\Phi }}({\mathbf{X}})_t\rangle =\phi (X_t)\). By Taylor expanding \(\phi \) up to order N around \(X_s\) we get that

$$\begin{aligned} \phi (X_t)\underset{N\gamma }{=}\sum _{k=0}^{N-1}\frac{1}{k!}D^k\phi (X_s)\left( (X_t-X_s)^{\otimes k} \right) . \end{aligned}$$

Since \({\mathbf{X}}\in \left( \mathscr {D}_{{\mathbf{W}}}^{N\gamma } \right) ^n\), according to Remark 2.5, we have

$$\begin{aligned} \langle \mathbf {1}^*,{\mathbf{X}}_t-{\mathbf{X}}_s\rangle \underset{N\gamma }{=}\langle {\mathbf{W}}_{st}-\mathbf {1}^*,{\mathbf{X}}_s\rangle =\sum _{0<|w|< N}\langle e_w^*,{\mathbf{X}}_s\rangle \langle {\mathbf{W}}_{st},e_w\rangle . \end{aligned}$$

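Plugging this estimate into the above equation and using the character property of \({\mathbf{W}}_{st}\) in (2.2) we obtain

$$\begin{aligned} \phi (X_t)\underset{N\gamma }{=}\phi (X_s)+\sum _{k=1}^{N-1}\frac{1}{k!}\sum _{\begin{array}{c} u_1,\ldots ,u_k\ne \varepsilon \\ |u_1|+\cdots +|u_k|<N \end{array}}D^k\phi (X_s)(s;u_1,\ldots ,u_k)\langle {\mathbf{W}}_{st},e_{u_1}\sqcup \!\sqcup \cdots \sqcup \!\sqcup e_{u_k}\rangle , \end{aligned}$$

where products of total length at least N have been absorbed into the remainder. Grouping the terms according to the words appearing in the shuffle products and recalling the definition of \({\varvec{\Phi }}({\mathbf{X}})\), the right-hand side equals \(\langle {\mathbf{W}}_{st}\star \mathbf {1}^*,{\varvec{\Phi }}({\mathbf{X}})_s\rangle \), so the desired estimate follows.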

Now we show the bound (2.6) for all non-empty words w. Fixing an integer \(1\le k\le \vert w \vert \) and words \(w_1,\ldots ,w_k\) such that \(w\in {{\,\mathrm{Sh}\,}}(w_1,\ldots ,w_k)\), we consider the term

$$\begin{aligned} D^k\phi (X_t)(t;w_1,\ldots ,w_k). \end{aligned}$$

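Again, since \({\mathbf{X}}\) is controlled by \({\mathbf{W}}\), plugging the estimate in Remark 2.5 into (2.16) and using the multilinearity of the derivative we obtain

$$\begin{aligned} D^k\phi (X_t)(t;w_1,\ldots ,w_k)\underset{(N-|w|)\gamma }{=}\sum _{|v_1|+\cdots +|v_k|<N-|w|}D^k\phi (X_t)(s;v_1w_1,\ldots ,v_kw_k)\langle {\mathbf{W}}_{st},e_{v_1}\sqcup \!\sqcup \cdots \sqcup \!\sqcup e_{v_k}\rangle , \end{aligned}$$

where the words \(v_1,\ldots ,v_k\) are allowed to be empty.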

Performing a Taylor expansion of \(D^k\phi \) up to order \(N-|w|\) between \(X_t\) and \(X_s\), we obtain

$$\begin{aligned}&D^k\phi (X_t)(s;v_1w_1,\ldots ,v_kw_k)\\&\underset{(N -\vert w\vert )\gamma }{=} \sum _{m=0}^{N-\vert w\vert -1}\frac{1}{m!}D^{k+m}\phi (X_s)\left( (X_t-X_s)^{\otimes m}, \langle e_{v_1w_1}^*,{\mathbf{X}}_s\rangle ,\ldots ,\langle e_{v_kw_k}^*,{\mathbf{X}}_s\rangle \right) . \end{aligned}$$

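Combining the estimates (2.17) and (2.18) with (2.15) into the definition of \(\langle e_w^*,{\varvec{\Phi }}({\mathbf{X}})_t\rangle \), we obtain the identity

$$\begin{aligned} \langle e_w^*,{\varvec{\Phi }}({\mathbf{X}})_t\rangle \underset{(N-|w|)\gamma }{=}&\sum _{k=1}^{|w|}\frac{1}{k!}\sum _{\begin{array}{c} w_1,\ldots ,w_k\\ w\in {{\,\mathrm{Sh}\,}}(w_1,\ldots ,w_k) \end{array}}\sum _{m=0}^{N-|w|-1}\frac{1}{m!}\sum _{\begin{array}{c} z_1,\ldots ,z_m\\ v_1,\ldots ,v_k \end{array}}D^{k+m}\phi (X_s)(s;z_1,\ldots ,z_m,v_1w_1,\ldots ,v_kw_k)\\&\quad \times \langle {\mathbf{W}}_{st},e_{z_1}\sqcup \!\sqcup \cdots \sqcup \!\sqcup e_{z_m}\sqcup \!\sqcup e_{v_1}\sqcup \!\sqcup \cdots \sqcup \!\sqcup e_{v_k}\rangle , \end{aligned}$$

where the innermost sum runs over non-empty words \(z_1,\ldots ,z_m\) and possibly empty words \(v_1,\ldots ,v_k\) with \(|z_1|+\cdots +|z_m|+|v_1|+\cdots +|v_k|<N-|w|\).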

Since the derivative \(D^{k+m}\phi (X_s)\) is symmetric we can replace it with

$$\begin{aligned}\frac{k!m!}{(k+m)!}\sum _{ I_k\sqcup J_m= \{1,\ldots ,m+k\}} D^{k+m}\phi (X_s)(v_{i_1}w_{i_1},\ldots ,z_{j_1},\ldots ,v_{i_k}w_{i_k},\ldots ).\end{aligned}$$

Replacing this expression in the right-hand side of (2.19), it is now an easy but tedious exercise to verify the resulting expression is equal to the sum

$$\begin{aligned} \sum _{0\le \vert u \vert < N-\vert w\vert }\sum _{l=1}^{|w|+ \vert u\vert }\sum _{\begin{array}{c} u_1,\ldots ,u_l \\ uw\in {\text {Sh}}(u_1,\ldots ,u_l) \end{array}}\frac{1}{l!}D^l\phi (X_s)(s;u_1,\dots ,u_l)\langle {\mathbf{W}}_{st},e_{u}\rangle .\end{aligned}$$

This proves the result. \(\square \)

Remark 2.13

A similar proof gives quantitative bounds on the map \({\mathbf{X}}\mapsto {\varvec{\Phi }}({\mathbf{X}})\). Indeed, for any \(\phi \in \mathcal {C}^{N}_b({{\mathbb {R}}}^n, {{\mathbb {R}}}^m)\) it is possible to prove that this map is locally Lipschitz on \(\mathscr {D}^{N\gamma }_{{\mathbf{W}}}\).

3 Rough differential equations

Now we come to the definition of a solution of the RDE

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm dX_t=\displaystyle \sum _{i=1}^df_i(X_t)\,\mathrm d{\mathbf{W}}^i_t,\\ X_0=x. \end{array}\right. } \end{aligned}$$

We assume that the vector fields \(f_1,\ldots ,f_d\) are of class at least \(\mathcal {C}^{N_\gamma }\), so that by Theorem 2.11 the composition \(f_i(X_t)\) can be lifted to a controlled path, which defines a map \({\mathbf{F}}_i:\left( \mathscr {D}^{N_{\gamma }\gamma }_{{\mathbf{W}}} \right) ^n\rightarrow \left( \mathscr {D}^{N_{\gamma }\gamma }_{{\mathbf{W}}} \right) ^n\).

Definition 3.1

A path \(X:[0,T]\rightarrow {{\mathbb {R}}}^n\) is a solution of (3.1) if there exists a controlled path \({\mathbf{X}}\in \left( \mathscr {D}^{N_{\gamma }\gamma }_{{\mathbf{W}}} \right) ^n\) satisfying \(\langle \mathbf {1}^*, {\mathbf{X}}_t\rangle = X_t\) such that

$$\begin{aligned} {\mathbf{X}}_t-{\mathbf{X}}_s=\sum _{i=1}^d\int _s^t{\mathbf{F}}_i({\mathbf{X}})_u\,\mathrm d{\mathbf{W}}^i_u, \end{aligned}$$

for all \(s,t\in [0,T]\).

Remark 3.2

We stress that (3.2) is an equation in \(\mathscr {D}^{N_\gamma \gamma }_{{\mathbf{W}}}\), which in fact implies that \(\langle e_w^*,{\mathbf{X}}_t\rangle =F_w(X_t)\) for all words w with \(|w|\le N_\gamma -1\).

Remark 3.3

If \({\mathbf{X}}\in \mathscr {D}^{N_\gamma \gamma }_{{\mathbf{W}}}\) satisfies Eq.  (3.2), it can also be regarded as an element of \(\mathscr {D}^{(N_\gamma +1)\gamma }_{{\mathbf{W}}}\), by Eq.  (2.8). Therefore we freely treat solutions to RDEs as elements of either of these spaces.

By solving a fixed point equation on \(\left( \mathscr {D}_{{\mathbf{W}}}^{N_\gamma \gamma } \right) ^n\) (see e.g. [11]) of the form

$$\begin{aligned}{\mathbf{X}}_t= {\mathbf{X}}_0+ \sum _{i=1}^d\int _0^t{\mathbf{F}}_i({\mathbf{X}})_u\,\mathrm d{\mathbf{W}}^i_u \end{aligned}$$

with (see Proposition 3.4 for the definition of the functions \(F_w:{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}^n\))

$$\begin{aligned} {\mathbf{X}}_0=\sum _{|w|\le N_\gamma -1}F_w(x)e_w\in \left( H_{N_\gamma -1} \right) ^n, \end{aligned}$$

one can prove that there exists a unique global solution of (3.2) if the vector fields are of class \(\mathcal {C}^{N_\gamma +1}_b\). We recall this interesting expansion of the solution.

Proposition 3.4

(Davie’s expansion) A path \(X:[0,T]\rightarrow {{\mathbb {R}}}^n\) is the unique rough path solution to Eq.  (2.5) in the sense of Definition 3.1 if and only if

$$\begin{aligned} X_t\underset{(N_{\gamma }+1)\gamma }{=}\sum _{0\le |w|\le N_{\gamma }}F_w(X_s)\langle {\mathbf{W}}_{st}, e_w\rangle \end{aligned}$$

and the coefficients of its lift \({\mathbf{X}}\in (\mathscr {D}_{{\mathbf{W}}}^{(N_\gamma +1)\gamma })^n\) are given by \(\langle e_w^*,{\mathbf{X}}_t\rangle =F_w(X_t)\), where the functions \(F_w:{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}^n\) are recursively defined by \(F_{\varepsilon }:={\mathrm {id}}\) and

$$\begin{aligned} F_{iw}(x):=DF_w(x)f_i(x). \end{aligned}$$

Remark 3.5

By Eq.  (2.7) this result actually implies the chain of estimates, for all words w with \(|w|\le N_{\gamma }\),

$$\begin{aligned} F_w(X_t)\underset{(N_{\gamma }+1-|w|)\gamma }{=}\sum _{0\le |u|\le N_{\gamma }-|w|}F_{uw}(X_s)\langle {\mathbf{W}}_{st},e_u\rangle . \end{aligned}$$

Proof of Proposition 3.4

Suppose that \({\mathbf{X}}\) is a rough solution to Eq.  (2.5) in the sense of Definition 3.1. We define the functions \(F_w:{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}^n\) recursively by \(F_i(x):=f_i(x)\) and

$$\begin{aligned} F_{wi}(x):=\sum _{k=1}^{|w|}\frac{1}{k!} {\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ w\in {{\,\mathrm{Sh}\,}}(u_1,\ldots ,u_k) \end{array}}}D^kf_i(x)(F_{u_1}(x),\ldots ,F_{u_k}(x)). \end{aligned}$$

Now it is an easy but tedious verification to show that these functions satisfy \(F_{iw}(x)=DF_w(x)f_i(x)\); this identity essentially amounts to a reiterated use of the Leibniz rule. The form of the coefficients of \({\mathbf{X}}\) is shown by induction, it being clear for a single letter \(i=1,\ldots ,d\). If w is any word with \(0\le |w|\le N-1\) and \(i\in \{1,\ldots ,d\}\) by definition

$$\begin{aligned} \langle e_{wi}^*,{\mathbf{X}}_t-{\mathbf{X}}_0\rangle&= \left\langle e_{wi}^*,\sum _{j=1}^d\int _0^t{\mathbf{F}}_j({\mathbf{X}})_u\,\mathrm d{\mathbf{W}}^j_u \right\rangle \\&= \langle e_w^*,{\mathbf{F}}_i({\mathbf{X}})_t\rangle \end{aligned}$$

where, in the second identity we have used Eq.  (2.10). By Theorem 2.11, the last coefficient equals

$$\begin{aligned} \sum _{k=1}^{|w|}\frac{1}{k!}\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ w\in {{\,\mathrm{Sh}\,}}(u_1,\ldots ,u_k) \end{array}}D^kf_i(X_t)(t;u_1,\ldots ,u_k)=F_{wi}(X_t) \end{aligned}$$

by the induction hypothesis. Then we obtain Eq.  (3.3) from Definition 2.3 and Remark 2.5.

Conversely, suppose that X admits the local expansion in Eq.  (3.3) and that the path \({\mathbf{X}}\) satisfies \(\langle e_w^*,{\mathbf{X}}_t\rangle =F_w(X_t)\) for all words w with \(|w|\le N\). First we show that X is controlled by \({\mathbf{W}}\) with coefficients given by \({\mathbf{X}}\). For this we have to Taylor expand the difference \(F_w(X_t)-F_w(X_s)\) and collect terms as in the proof of Theorem 2.11. Then, by Eq.  (2.10) it is not difficult to see that in fact

$$\begin{aligned} \langle e_{wi}^*,{\mathbf{X}}_t\rangle =F_{wi}(X_t)=\left\langle e_{wi}^*,\sum _{j=1}^d\int _0^t{\mathbf{F}}_j({\mathbf{X}})_u\,\mathrm d{\mathbf{W}}^j_u \right\rangle \end{aligned}$$

so that Definition 3.1 is satisfied. \(\square \)
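As an illustration of the recursion in Proposition 3.4 (a worked example added here, not part of the original argument), assume linear vector fields \(f_i(x)=A_ix\) for some hypothetical matrices \(A_1,\ldots ,A_d\in {{\mathbb {R}}}^{n\times n}\). These are unbounded, so the global existence statement above does not literally cover them, but the functions \(F_w\) are still well defined. Since \(DF_w\) is then constant in x, the rule \(F_{iw}=DF_wf_i\) gives

$$\begin{aligned} F_i(x)=A_ix,\qquad F_{ij}(x)=DF_j(x)f_i(x)=A_jA_ix,\qquad F_{i_1\ldots i_m}(x)=A_{i_m}\cdots A_{i_1}x, \end{aligned}$$

and the local expansion of Proposition 3.4 becomes

$$\begin{aligned} X_t\underset{(N_{\gamma }+1)\gamma }{=}\sum _{m=0}^{N_{\gamma }}\sum _{i_1,\ldots ,i_m=1}^dA_{i_m}\cdots A_{i_1}X_s\,\langle {\mathbf{W}}_{st},e_{i_1\ldots i_m}\rangle . \end{aligned}$$

If, in addition, \({\mathbf{W}}\) is the canonical lift of a smooth path and one assumes the convention

$$\begin{aligned} \langle {\mathbf{W}}_{st},e_{i_1\ldots i_m}\rangle =\int _{s<r_1<\cdots<r_m<t}\mathrm dW^{i_1}_{r_1}\cdots \mathrm dW^{i_m}_{r_m}, \end{aligned}$$

then this is the Picard iteration for the linear equation \(\mathrm dX_t=\sum _iA_iX_t\,\mathrm dW^i_t\), truncated at level \(N_\gamma \): the matrix attached to the latest integration variable stands leftmost.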

3.1 Differentiability of the flow

It is a standard result in classical ODE theory that, given a regular enough vector field V, the equation \(\dot{X}=V(X)\) induces a smooth flow on \({{\mathbb {R}}}^d\). Indeed, if we let \(X^x_t\) denote the unique solution of this equation such that \(X_0^x=x\), then the map \((t,x)\mapsto X_t^x\) is a flow, in the sense that \(X^{X^x_s}_t=X^x_{t+s}\), and the mapping \(x\mapsto X_t^x\) is a diffeomorphism for each fixed t. More precisely, if V is of class \(\mathcal {C}^k\), then the map \(x\mapsto X_t^x\) is also of class \(\mathcal {C}^k\).

Now we show that a similar statement holds for RDEs.

Theorem 3.6

Let \(f_1,\ldots ,f_d\) be a family of vector fields of class \(\mathcal {C}^{N_\gamma +1+k}_b\) on \({{\mathbb {R}}}^n\) for some integer \(k\ge 0\), and let \({\mathbf{W}}\in \mathscr {C}^\gamma \). Then

  1.

    the RDE

    $$\begin{aligned} \mathrm dX_t=\sum _{i=1}^df_i(X_t)\,\mathrm d{\mathbf{W}}^i_t,\quad X_s=x \end{aligned}$$

    has a unique solution \({\mathbf{X}}^{s,x}\in \mathscr {D}^{(N_\gamma +1)\gamma }_{{\mathbf{W}}}\),

  2.

    the induced flow \(x\mapsto X^{s,x}_t\) is a class \(\mathcal {C}^{k+1}\) diffeomorphism for each fixed \(s<t\), and

  3.

    the partial derivatives satisfy the system of RDEs

    $$\begin{aligned} \mathrm d\partial ^\alpha X_t^{s,x}=\sum _{i=1}^d\sum _{k=1}^{|\alpha |}\frac{1}{k!}\sum _{\alpha \in {\text {Sh}}(\beta _1,\ldots ,\beta _k)}D^kf_i(X^{s,x}_t)(\partial ^{\beta _1}X^{s,x}_t,\ldots ,\partial ^{\beta _k}X^{s,x}_t)\,\mathrm d{\mathbf{W}}^i_t \end{aligned}$$

    with initial conditions \(X^{s,x}_s=x\), \(\partial ^i X^{s,x}_s=e_i\) and \(\partial ^\alpha X^{s,x}_s=0\) for all words \(\alpha \) with \(|\alpha |\ge 2\).


Proof of Theorem 3.6

Points (1) and (2) are standard results in rough path theory, found e.g. in Chapter 11 of [15]. For the algebraic identity in (3), it suffices to show the result in the case where W is smooth. Indeed, by standard arguments \({\mathbf{W}}\in \mathscr {C}^\gamma \) can be approximated uniformly by smooth paths with uniform \(\gamma \)-Hölder rough path bounds, and hence in \(\mathscr {C}^{\gamma -\eta }\) for any \(\eta >0\), while the particular structure (cf. Chapter 11 in [15]) of the system of (rough) differential equations guarantees uniqueness and global existence, so that the limiting argument is justified.

It remains to show point (3) for \({\mathbf{W}}\) smooth. We note that the integral representation of the solution

$$\begin{aligned} X^{s,x}_t=x+\sum _{i=1}^d\int _s^tf_i(X^{s,x}_u)\dot{W}^i_u\,\mathrm du \end{aligned}$$

holds. By Lemma 2.9, for any \(\alpha =(\alpha _1,\ldots ,\alpha _m)\) and \(s<u<t\), we have

$$\begin{aligned} \partial ^\alpha X_{ut}^{s,x}=\int _u^t\sum _{i=1}^d\sum _{k=1}^{m}\frac{1}{k!}\sum _{\begin{array}{c} \beta _1,\ldots ,\beta _k\\ \alpha \in {\text {Sh}}(\beta _1,\ldots ,\beta _k) \end{array}}D^kf_i(X^{s,x}_r)(\partial ^{\beta _1}X^{s,x}_r,\ldots ,\partial ^{\beta _k}X^{s,x}_r)\dot{W}^i_r\,\mathrm dr \end{aligned}$$

which is the smooth version of Eq.  (3.6). \(\square \)

We now aim to obtain a Davie-type expansion of the partial derivatives \(\partial ^\alpha X^{s,x}\) by making use of point (3) above. We observe that the above system of equations has the form

$$\begin{aligned} \mathrm dX^{s,x}_t&= \sum _{i=1}^df_i(X^{s,x}_t)\,\mathrm d{\mathbf{W}}^i_t\\ \mathrm dDX^{s,x}_t&= \sum _{i=1}^dDf_i(X^{s,x}_t)DX^{s,x}_t\,\mathrm d{\mathbf{W}}^i_t\\ \mathrm dD^2X^{s,x}_t&= \sum _{i=1}^dDf_i(X^{s,x}_t)D^2X^{s,x}_t\,\mathrm d{\mathbf{W}}^i_t+(\ldots )\\&\vdots \end{aligned}$$

with initial conditions \(X^{s,x}_s=x\), \(DX^{s,x}_s=I\), \(D^2X^{s,x}_s=D^3X^{s,x}_s=\cdots =0\), where the inhomogeneity \((\ldots )\) is not important to spell out.

The expansion is clear only for the first equation; it is just Eq. (3.3). We would like to use Proposition 3.4 to obtain an expansion for the second equation, but the vector field driving that equation depends on time, so the result does not directly apply. The third and subsequent equations have the additional problem of being non-homogeneous.

To solve this problem we extend our state space \({{\mathbb {R}}}^n\) to (the still finite-dimensional space)

$$\begin{aligned} {\mathfrak {S}}_k:={{\mathbb {R}}}^n\oplus {\mathcal {L}}({{\mathbb {R}}}^n,{{\mathbb {R}}}^n)\oplus \cdots \oplus {\mathcal {L}}\left( ({{\mathbb {R}}}^n)^{\otimes (k-1)},{{\mathbb {R}}}^n \right) \end{aligned}$$

and define the vector fields (we give a more precise definition below in Eq.  (3.8)) \({\mathfrak {f}}_i:{\mathfrak {S}}_k\rightarrow {\mathfrak {S}}_k\) by

$$\begin{aligned} {\mathfrak {f}}_i({\mathfrak {x}}):=(f_i(x),Df_i(x)(y_1),D^2f_i(x)(y_1,y_1)+Df_i(x)(y_2),\ldots ) \end{aligned}$$

where \({\mathfrak {x}}=(x,y_1,y_2,\ldots ,y_{k-1})\in {\mathfrak {S}}_k\). Point (3) of Theorem 3.6 shows that if

$$\begin{aligned}{\mathfrak {X}}^{s,x}_t:=\left( X^{s,x}_t,DX^{s,x}_t,\ldots ,D^{k-1}X^{s,x}_t\right) \end{aligned}$$

then

$$\begin{aligned} \mathrm d{\mathfrak {X}}^{s,x}_t=\sum _{i=1}^d{\mathfrak {f}}_i({\mathfrak {X}}^{s,x}_t)\,\mathrm d{\mathbf{W}}^i_t,\quad {\mathfrak {X}}^{s,x}_s:={\mathfrak {x}}=(x,I,0,\ldots ,0). \end{aligned}$$

This transformation turns the system of non-autonomous non-homogeneous RDEs into a single autonomous homogeneous RDE in \({\mathfrak {S}}_k\).
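As a concrete instance of this construction (a sketch added for orientation; the choice \(k=2\) and the symbols a, b are introduced here only for illustration), take \({\mathfrak {S}}_2={{\mathbb {R}}}^n\oplus {\mathcal {L}}({{\mathbb {R}}}^n,{{\mathbb {R}}}^n)\) and write \({\mathfrak {x}}=(x,y_1)\). Then, for \((a,b)\in {\mathfrak {S}}_2\),

$$\begin{aligned} {\mathfrak {f}}_i(x,y_1)&= \left( f_i(x),Df_i(x)y_1\right) ,\\ D{\mathfrak {f}}_j(x,y_1)(a,b)&= \left( Df_j(x)a,\;D^2f_j(x)(a,y_1)+Df_j(x)b\right) . \end{aligned}$$

Evaluating the derivative in the direction \((a,b)={\mathfrak {f}}_i(x,y_1)\) at the initial condition \(y_1=I\) gives

$$\begin{aligned} D{\mathfrak {f}}_j(x,I)\,{\mathfrak {f}}_i(x,I)=\left( Df_j(x)f_i(x),\;D^2f_j(x)(f_i(x),{\mathrm {id}})+Df_j(x)Df_i(x)\right) . \end{aligned}$$

The first component is \(F_{ij}(x)=DF_j(x)f_i(x)\) and the second is its derivative \(DF_{ij}(x)\), so in this case the coefficient functions of the extended system are the derivatives of the functions \(F_w\).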

Corollary 3.7

The derivatives of the solution flow \(X^{s,x}\) have the following Davie expansion: for any \(p=1,\ldots ,k-1\),

$$\begin{aligned} D^pX^{s,x}_{t}\underset{(N_\gamma +1)\gamma }{=}\sum _{0\le |w|\le N_\gamma }D^pF_w(x)\langle {\mathbf{W}}_{st},e_w\rangle . \end{aligned}$$

In particular, for a word \(\alpha \in \{1,\ldots ,n\}^p\) we have that

$$\begin{aligned} \partial ^\alpha X^{s,x}_{t}\underset{(N_\gamma +1)\gamma }{=}\sum _{0\le |w|\le N_\gamma }\partial ^\alpha F_w(x)\langle {\mathbf{W}}_{st},e_w\rangle . \end{aligned}$$


Proof of Corollary 3.7

The hypotheses on the vector fields \(f_1,\ldots ,f_d\) imply that \({\mathfrak {f}}_1,\ldots ,{\mathfrak {f}}_d\) are of class \(\mathcal {C}^{N_\gamma +1}_b\) on \({\mathfrak {S}}_k\), so the RDE for \({\mathfrak {X}}^{s,x}\) above has a unique solution. Applying Proposition 3.4 in this extended space we obtain, for \(s<t\), the expansion

$$\begin{aligned} {\mathfrak {X}}_t^{s,x}\underset{(N_{\gamma }+1)\gamma }{=}\sum _{0\le |w|\le N_{\gamma }}{\mathfrak {F}}_w({\mathfrak {x}})\langle {\mathbf{W}}_{st},e_w\rangle . \end{aligned}$$

In order to deduce the result, we need to show that \({\mathfrak {F}}_w({\mathfrak {x}})_p=D^pF_w(x)\) for all words w and \(p=0,1,\ldots ,k-1\). We do this by induction on the length of w. If \(w=i\) is a single letter, the p-th component, \(p=0,1,\ldots ,k-1\), of the vector field \({\mathfrak {f}}_i\) is given by \({\mathfrak {f}}_i({\mathfrak {x}})_0=f_i(x)\) and

$$\begin{aligned} {\mathfrak {f}}_i({\mathfrak {x}})_p=\sum _{j=1}^p\sum _{(r)_j}\frac{p!}{r_1!\cdots r_j!(1!)^{r_1}\cdots (j!)^{r_j}}D^{p-j+1}f_i(x)\left( y_1^{r_1},\ldots ,y_{j}^{r_j}\right) \end{aligned}$$

where the inner sum is over the set of indices \((r_1,\ldots ,r_j)\) such that \(r_1+\cdots +r_j=p-j+1\) and \(r_1+2r_2+\cdots +jr_j=p\). For our particular initial condition, \(y_j=0\) for \(j=2,3,\ldots ,k-1\) the formula simplifies to

$$\begin{aligned} {\mathfrak {f}}_i({\mathfrak {x}})_p=D^pf_i(x)\in {\mathcal {L}}\left( ({{\mathbb {R}}}^n)^{\otimes p},{{\mathbb {R}}}^n \right) \end{aligned}$$

since the only term left in (3.8) is the one with \(j=1\), \(r_1=p\).

We continue by induction on the length of the word. We compute the p-th derivative of \(x\mapsto F_{iw}(x)=DF_w(x)f_i(x)\) by recognizing that \(F_{iw}=\varphi _1\circ \varphi _2\) with \(\varphi _1(x,h)=DF_w(x)h\) and \(\varphi _2(x)=(x,f_i(x))\). A quick check gives that the higher order derivatives of \(\varphi _1\) and \(\varphi _2\) are given by

$$\begin{aligned} D^m\varphi _1(x,h)( (u_1,v_1),\ldots ,(u_m,v_m) )&= D^{m+1}F_w(x)(u_1,\ldots ,u_m,h)\\&\quad +\sum _{j=1}^mD^mF_w(x)(u_1,\ldots ,{\hat{u}}_j,\ldots ,u_m)\\ D^m\varphi _2(x)(h_1,\ldots ,h_m)&= (h_1\delta _{m=1},D^mf_i(x)(h_1,\ldots ,h_m)) \end{aligned}$$

where \({\hat{u}}_j=v_j\). Thus, using Lemma 2.9 we get that

$$\begin{aligned} D^pF_{iw}(x)(h_1,\ldots ,h_p)=\sum _{\pi \in {\mathcal {P}}(p)}D^{\#\pi }\varphi _1(\varphi _2(x))(D^{|B_1|}\varphi _2(x)h_{B_1},\ldots ,D^{|B_q|}\varphi _2(x)h_{B_q}). \end{aligned}$$

Now we have three cases, depending on the number \(q=\#\pi \) of blocks of the partition \(\pi =\{B_1,\ldots ,B_q\}\) in the above summation:

  1.

    \(q=p\): there is a single partition with p blocks, and each block is a singleton. In this case the term equals

    $$\begin{aligned} D^{p+1}F_w(x)(h_1,\ldots ,h_p,f_i(x))+\sum _{j=1}^pD^pF_w(x)(h_1,\ldots ,Df_i(x)h_j,\ldots ,h_p). \end{aligned}$$
  2.

    \(q=1\): there is a single partition with one block, namely \(\pi =\{1,\ldots ,p\}\). In this case the term equals

    $$\begin{aligned} DF_w(x)[D^pf_i(x)(h_1,\ldots ,h_p)]. \end{aligned}$$
  3.

    \(1<q<p\): there is at least one block of size greater than one, which means that the first term in the expression for \(D^m\varphi _1\) vanishes since at least one of \(u_1,\ldots ,u_m\) vanishes. For the rest of the terms, the exact result depends on whether there is a block with exactly one element or not: if all blocks have more than one element then the whole expression vanishes; otherwise, we obtain one term for each of the blocks having size exactly one, and it is of the form

    $$\begin{aligned} D^{\#\pi }F_w(x)(h_{B},D^{\#p-|B|}f_i(x)h_{\pi {\setminus } B}). \end{aligned}$$

In all three cases, using the induction hypothesis it is possible to show that each of the terms appearing is of the form \(\partial _r{\mathfrak {F}}_w({\mathfrak {x}})_p{\mathfrak {f}}_i({\mathfrak {x}})_r\), which means that \(D^pF_{iw}(x)=[D{\mathfrak {F}}_w({\mathfrak {x}}){\mathfrak {f}}_i({\mathfrak {x}})]_p\) as desired. For example, the term

$$\begin{aligned} D^{p+1}F_w(x)(h_1,\ldots ,h_p,f_i(x)) \end{aligned}$$

corresponds to

$$\begin{aligned}{}[\partial _0{\mathfrak {F}}_w({\mathfrak {x}})_p{\mathfrak {f}}_i({\mathfrak {x}})_0](h_1,\ldots ,h_p) \end{aligned}$$

and so on.

\(\square \)

In particular for the first derivative, the first few terms of the expansion read

$$\begin{aligned} DX^{s,x}_t&= I+\sum _{i=1}^dDf_i(x)\langle {\mathbf{W}}_{st},e_i\rangle \\&\quad +\sum _{i,j=1}^d\Bigl (Df_j(x)Df_i(x)+D^2f_j(x)(f_i(x),{\mathrm {id}})\Bigr )\langle {\mathbf{W}}_{st},e_{ij}\rangle +\cdots \end{aligned}$$
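A quick consistency check of these first terms, under assumptions not made in the text: take \(n=d=1\), write \(f=f_1\) and \(W_{st}=\langle {\mathbf{W}}_{st},e_1\rangle \), and suppose \({\mathbf{W}}\) is weakly geometric, so that \(\langle {\mathbf{W}}_{st},e_{11}\rangle =\tfrac{1}{2}(W_{st})^2\). Assume moreover that the solution can be written through the classical flow, \(X^{s,x}_t=\varphi (W_{st},x)\), where \(\varphi \) solves \(\partial _\tau \varphi =f(\varphi )\), \(\varphi (0,x)=x\). Differentiating in x gives \(\partial _\tau \partial _x\varphi =f'(\varphi )\,\partial _x\varphi \), hence

$$\begin{aligned} \partial _x\varphi (0,x)=1,\qquad \partial _\tau \partial _x\varphi (0,x)=f'(x),\qquad \partial ^2_\tau \partial _x\varphi (0,x)=f''(x)f(x)+f'(x)^2, \end{aligned}$$

and a Taylor expansion in \(\tau =W_{st}\) yields

$$\begin{aligned} \partial _xX^{s,x}_t=1+f'(x)\,W_{st}+\left( f'(x)^2+f''(x)f(x)\right) \frac{(W_{st})^2}{2}+\cdots , \end{aligned}$$

which agrees with the expansion above for \(i=j=1\).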

3.2 Itô’s formula for RDEs

The last ingredient in the study of the rough transport equation is a change of variable formula for a solution of Eq.  (2.5) for some sufficiently smooth vector fields \(f=(f_1,\ldots ,f_d)\). By analogy with the terminology of stochastic calculus we call it an “Itô formula”. For any \(i=1,\ldots ,d\) we denote by \(\Gamma _i\) the differential operator \(f_i(x)\cdot D_x\) and for any non-empty word \(w=i_1\ldots i_m\) we use the shorthand notation

$$\begin{aligned}\Gamma _{w}:=\Gamma _{i_1}\circ \cdots \circ \Gamma _{i_m}\,.\end{aligned}$$

Moreover we adopt the convention \(\Gamma _{\varepsilon }={{\text {id}}}\).

Lemma 3.8

Let \(f_1,\ldots ,f_d\in \mathcal {C}^{N_\gamma +1}({{\mathbb {R}}}^n; {{\mathbb {R}}}^n)\) be vector fields on \({{\mathbb {R}}}^n\). If \(\phi :{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}\) is a smooth function and w is a nonempty word, then

$$\begin{aligned} \Gamma _{w}\phi (x)=\sum _{k=1}^{\vert w\vert }\frac{1}{k!}\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ w\in {\text {Sh}}(u_1,\ldots ,u_k) \end{array}}D^k\phi (x)(F_{u_1}(x),\ldots ,F_{u_k}(x)). \end{aligned}$$


Proof of Lemma 3.8

Before commencing we introduce some notation. If \(\phi :{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}\) and \(g_1,\ldots ,g_k:{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}^n\) are smooth functions, we define

$$\begin{aligned} D^k\phi (x):(g_1,\ldots ,g_k):=D^k\phi (x)(g_1(x),\ldots ,g_k(x)) \end{aligned}$$

where the right-hand side was defined in Eq.  (2.11). The Leibniz rule then gives that for any \(h\in {{\mathbb {R}}}^n\) we have

$$\begin{aligned} h\cdot D_x\left( D^k\phi (x):(g_1,\ldots ,g_k) \right)= & {} D^{k+1}\phi (x):(h,g_1,\ldots ,g_k)\\&+\sum _{i=1}^kD^k\phi (x):(g_1,\ldots ,(D_xg_i)h,\ldots ,g_k). \end{aligned}$$

We now prove the result by induction on the word’s length \(\vert w \vert \). If \(w=i\) is a single letter then \(\Gamma _i\phi (x)=f_i(x)\cdot \nabla \phi (x)=D\phi (x)f_i(x)\) which is exactly Eq.  (3.9). Supposing the identity true for any word \(w'\) such that \(\vert w'\vert \le \vert w\vert \), we prove it for jw where \(j\in \{1,\ldots ,d\}\). By induction one has

$$\begin{aligned} \Gamma _{w}\phi (x)&=\sum _{k=1}^{\vert w\vert }\frac{1}{k!}\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ w\in {\text {Sh}}(u_1,\ldots ,u_k) \end{array}}D^k\phi (x)(F_{u_1}(x),\ldots ,F_{u_k}(x)). \end{aligned}$$

By the above form of Leibniz rule, with \(g_i=F_{u_i}\) and \(h=f_j(x)\), and noticing that by definition

$$\begin{aligned} D_xF_{u_i}(x)f_j(x)=F_{ju_i}(x) \end{aligned}$$

we obtain that

$$\begin{aligned}\begin{aligned} \Gamma _{j}\left( D^k\phi (x):(F_{u_1},\ldots ,F_{u_k})\right)&=D^{k+1}\phi (x):(f_j,F_{u_1},\ldots ,F_{u_k})\\ {}&\quad +\sum _{i=1}^kD^k\phi (x):(F_{u_1},\ldots ,F_{ju_i},\ldots ,F_{u_k})\,. \end{aligned}\end{aligned}$$

Summing this expression over words \(u_1,\ldots ,u_k\), we can rewrite it as

$$\begin{aligned}\begin{aligned}&\sum _{r=1}^k \sum _{\begin{array}{c} u_1,\ldots ,u_k\\ jw\in {\text {Sh}}(u_1,\ldots , ju_r, \ldots ,u_k) \end{array}}D^{k} \phi (x):(F_{u_1},\ldots ,F_{ju_r},\ldots ,F_{u_k})\\ {}&\quad +\frac{1}{k+1}\sum _{r=1}^{k+1}\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ jw\in {\text {Sh}}(u_1,\ldots , j, \ldots ,u_k) \end{array}}D^{k+1} \phi (x):(F_{u_1},\cdots ,\overbrace{f_{j}}^{\text {{ r}th place}},\ldots , F_{u_k}), \end{aligned}\end{aligned}$$

where the factor \(1/(k+1)\) is introduced because of the symmetry of \(D^{k+1}\phi (x)\). Finally, summing over k, we arrive at

$$\begin{aligned}\begin{aligned} \Gamma _{jw}\phi (x)=&\sum _{k=1}^{\vert w\vert }\frac{1}{k!}\sum _{r=1}^k \sum _{\begin{array}{c} u_1,\ldots ,u_k\\ jw\in {\text {Sh}}(u_1,\ldots , ju_r, \ldots ,u_k) \end{array}}D^{k} \phi (x):(F_{u_1},\cdots ,F_{ju_r},\ldots , F_{u_k})\\ {}&+\sum _{k=1}^{\vert w\vert } \frac{1}{(k+1)!}\sum _{r=1}^{k+1}\sum _{\begin{array}{c} u_1,\cdots ,u_k\\ jw\in {\text {Sh}}(u_1,\ldots ,j,\ldots ,u_k) \end{array}}D^{k+1}\phi (x):(F_{u_1},\cdots ,\overbrace{f_{j}}^{\text {{ r}th place}},\ldots ,F_{u_k}). \end{aligned}\end{aligned}$$

Since the letter j may appear either as a single-letter word or as the first letter of one of the words, we finally identify the whole expression above with

$$\begin{aligned}\begin{aligned}&\sum _{k=1}^{\vert w\vert +1}\frac{1}{k!}\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ jw\in {\text {Sh}}(u_1,\ldots ,u_k) \end{array}}D^k\phi (x):(F_{u_1},\ldots ,F_{u_k}). \end{aligned}\end{aligned}$$

\(\square \)
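For orientation, here is the case of a two-letter word \(w=ij\) written out. This example is added for illustration and, to avoid any question about how repeated letters are counted in the shuffle sum, we assume \(i\ne j\). Directly from the Leibniz rule,

$$\begin{aligned} \Gamma _{ij}\phi (x)=\Gamma _i\left( D\phi (x):(f_j)\right) =D^2\phi (x)(f_i(x),f_j(x))+D\phi (x)F_{ij}(x), \end{aligned}$$

where \(F_{ij}(x)=Df_j(x)f_i(x)\). On the other hand, in Eq.  (3.9) the term \(k=1\) has the single choice \(u_1=ij\), contributing \(D\phi (x)F_{ij}(x)\), while for \(k=2\) the pairs \((u_1,u_2)=(i,j)\) and \((j,i)\) both satisfy \(ij\in {\text {Sh}}(u_1,u_2)\), giving

$$\begin{aligned} \frac{1}{2!}\left( D^2\phi (x)(F_i(x),F_j(x))+D^2\phi (x)(F_j(x),F_i(x))\right) =D^2\phi (x)(f_i(x),f_j(x)) \end{aligned}$$

by the symmetry of \(D^2\phi (x)\). The two computations agree.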

Now we show a formula for the composition of the solution to the RDE (2.5) and a sufficiently smooth function.

Theorem 3.9

(Itô formula for RDEs) Let \(f_i\in \mathcal {C}^{N_\gamma +1}\) and let \({\mathbf{X}}\in \mathscr {D}^{(N_\gamma +1)\gamma }_{{\mathbf{W}}}\) be the unique solution of Eq.  (2.5) and \(X_t=\langle \mathbf {1}^*,{\mathbf{X}}_t\rangle \). Then for any real valued function \(\phi \in \mathcal {C}^{N_{\gamma }+1}_b({{\mathbb {R}}}^n)\) one has the identity

$$\begin{aligned} \phi (X_t)=\phi (X_s)+ \sum _{i=1}^d\int _s^t(\Gamma _i\phi )(X_r)\,\mathrm d{\mathbf{W}}^i_r\,. \end{aligned}$$

More generally, one has the following estimates at the level of controlled rough paths

$$\begin{aligned} \langle e^*_{w}, {\varvec{\Phi }}({\mathbf{X}})_t\rangle \underset{(N_{\gamma }+1-\vert w \vert ) \gamma }{=}\langle e^*_{w},{\varvec{\Phi }}({\mathbf{X}})_s\rangle + \left\langle e^*_w,\sum _{i=1}^d\int _s^t(\Gamma _i{\varvec{\Phi }})({\mathbf{X}})_r\,\mathrm d{\mathbf{W}}^i_r\right\rangle , \end{aligned}$$

where \(\Gamma _i{\varvec{\Phi }}({\mathbf{X}})\) is the controlled lift of the composition of \({\mathbf{X}}\) with the function \(\Gamma _i\phi \in \mathcal {C}^{N_\gamma }\), and w is any non-empty word such that \(\vert w\vert \le N_{\gamma }\).

Proof of Theorem 3.9

The theorem is obtained by comparing the coefficients of the controlled rough paths \( {\varvec{\Phi }}({\mathbf{X}})_t\) and \(\int _0^t(\Gamma _i{\varvec{\Phi }})({\mathbf{X}})_r\,\mathrm d{\mathbf{W}}^i_r\) for every \(i=1,\ldots ,d\). Using Lemma 3.8 and Proposition 3.4, for every non-empty word w one has

$$\begin{aligned} \langle e^*_{w}, {\varvec{\Phi }}(\mathbf {X})_t\rangle= & {} \sum _{k=1}^{\vert w\vert }\frac{1}{k!}\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ w\in {\text {Sh}}(u_1,\ldots ,u_k) \end{array}}D^k\phi (X_t)(t;u_1,\ldots ,u_k)\nonumber \\= & {} \sum _{k=1}^{\vert w\vert }\frac{1}{k!}\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ w\in {\text {Sh}}(u_1,\ldots ,u_k) \end{array}}D^k\phi (X_t):(F_{u_1},\ldots ,F_{u_k})\nonumber \\= & {} \Gamma _w\phi (X_t). \end{aligned}$$

Using the same identities we also deduce for any word w,

$$\begin{aligned} \left\langle e^*_{wj},\sum _{i=1}^d\int _0^t(\Gamma _i{\varvec{\Phi }})({\mathbf{X}})_r\,{\mathrm {d}}{\mathbf{W}}^i_r\right\rangle = \langle e^*_{w},(\Gamma _j{\varvec{\Phi }})({\mathbf{X}})_t\rangle = \Gamma _{w}(\Gamma _j\phi )(X_t)= \Gamma _{wj}\phi ( X_t).\nonumber \\ \end{aligned}$$

Since \(\sum _{i=1}^d\int _0^t(\Gamma _i{\varvec{\Phi }})({\mathbf{X}})_r\,\mathrm d{\mathbf{W}}^i_r \) and \({\varvec{\Phi }}(\mathbf {X})_t\) both belong to \(\mathscr {D}_{{\mathbf{W}}}^{(N_{\gamma }+1)\gamma }\), for any word w one has both

$$\begin{aligned} \langle e_w^*,{\varvec{\Phi }}(\mathbf {X})_t\rangle - \langle e_w^*,{\varvec{\Phi }}(\mathbf {X})_s\rangle \underset{(N_{\gamma }+1-|w|)\gamma }{=}\sum _{0< \vert v\vert \le N_{\gamma }-|w|}\langle e_{wv}^*,{\varvec{\Phi }}(\mathbf {X})_s\rangle \langle {\mathbf{W}}_{st}, e_v\rangle , \end{aligned}$$


$$\begin{aligned} \left\langle e_w^*,\sum _{i=1}^d\int _s^t(\Gamma _i{\varvec{\Phi }})({\mathbf{X}})_r\,\mathrm d{\mathbf{W}}^i_r\right\rangle \underset{(N_{\gamma }+1-|w|)\gamma }{=}\sum _{0< \vert v\vert \le N_{\gamma }-|w|}\left\langle e_{wv}^*,\sum _{i=1}^d\int _0^s(\Gamma _i{\varvec{\Phi }})({\mathbf{X}})_r\,\mathrm d{\mathbf{W}}^i_r\right\rangle \langle {\mathbf{W}}_{st}, e_v\rangle . \end{aligned}$$

The identities (3.12) and (3.13) imply that the right-hand sides of the above estimates are the same quantities. Thus we obtain Eq.  (3.11) by simply subtracting one side from the other. In case \(w=\mathbf {1}\) one has

$$\begin{aligned}\phi (X_t)-\phi (X_s)- \sum _{i=1}^d\int _s^t(\Gamma _i\phi )(X_r)\,\mathrm d{\mathbf{W}}^i_r\underset{(N_{\gamma }+1)\gamma }{=}0\,.\end{aligned}$$

Since \((N_{\gamma }+1)\gamma >1\) and the left-hand side is the increment of a path, one has the identity (3.10). \(\square \)

Using the identities (3.12) we can rewrite the Itô formula using only the operators \(\Gamma _w\).

Corollary 3.10

(Itô-Davie formula for RDEs) Let \(X:[0,T]\rightarrow {{\mathbb {R}}}^n\) be the unique solution of Eq.  (2.5). Then for any real valued function \(\phi \in \mathcal {C}^{N_{\gamma }+1}_b({{\mathbb {R}}}^n)\) and any word w one has the estimate

$$\begin{aligned} \Gamma _{w}\phi (X_t) \underset{(N_{\gamma }+1-\vert w\vert ) \gamma }{=}\sum _{0\le \vert v\vert \le N_{\gamma }- \vert w\vert }\Gamma _{vw}\phi (X_s) \langle \mathbf {W}_{st},e_v\rangle . \end{aligned}$$
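To make the estimate concrete, consider as an illustrative assumption the case \(N_\gamma =2\) (which corresponds to \(\gamma \in (1/3,1/2]\) if \(N_\gamma =\lfloor 1/\gamma \rfloor \), our reading of the notation). For the empty word \(w=\varepsilon \) the estimate reads

$$\begin{aligned} \phi (X_t)\underset{3\gamma }{=}\phi (X_s)+\sum _{i=1}^d\Gamma _i\phi (X_s)\langle {\mathbf{W}}_{st},e_i\rangle +\sum _{i,j=1}^d\Gamma _{ij}\phi (X_s)\langle {\mathbf{W}}_{st},e_{ij}\rangle , \end{aligned}$$

with \(\Gamma _{ij}\phi =D^2\phi (f_i,f_j)+D\phi \,(Df_j\,f_i)\), while for a single letter \(w=l\) it reads

$$\begin{aligned} \Gamma _l\phi (X_t)\underset{2\gamma }{=}\Gamma _l\phi (X_s)+\sum _{i=1}^d\Gamma _{il}\phi (X_s)\langle {\mathbf{W}}_{st},e_i\rangle . \end{aligned}$$

Each additional letter in w costs one order \(\gamma \) in the remainder and removes one level from the sum.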

4 Rough transport and continuity

4.1 Rough transport equation

We now consider the rough transport equation

$$\begin{aligned} {\left\{ \begin{array}{ll} -\mathrm du_s=\sum _{i=1}^d\Gamma _i u_s\,\mathrm d{\mathbf{W}}^i_s,\\ u(T,\cdot )= g(\cdot ) \end{array}\right. } \end{aligned}$$

where we recall the differential operator \(\Gamma _i:=f_i\cdot D_x\) for some vector fields \(f_1,\ldots ,f_d\) on \({{\mathbb {R}}}^n\).

We now prepare the definition of a regular solution to the rough transport equation. Since we are in the fortunate position to have an explicit solution candidate we derive a graded set of rough path estimates that provide a natural generalisation of the classical transport differential equation.

Definition 4.1

Let \(\gamma \in (0,1)\), let \(\mathbf {W}\in \mathscr {C}^{\gamma }\) be a weakly geometric rough path of roughness \(\gamma \), and let \(g\in \mathcal {C}^{N_\gamma +1}\). A \(\mathcal {C}^{\gamma ,N_\gamma +1}\)-function \(u:[0,T]\times {{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}\) such that \(u(T, \cdot )= g(\cdot )\) is said to be a regular solution to the rough transport Eq.  (4.1) if one has the estimates

$$\begin{aligned} \Gamma _{w} u_s (x)\underset{(N_{\gamma }+1-\vert w \vert ) \gamma }{=}\sum _{0\le \vert v\vert \le N_{\gamma }-\vert w\vert }\Gamma _{ wv}u_t (x) \langle \mathbf {W}_{st},e_v \rangle \,, \end{aligned}$$

for every word w with \(\vert w\vert \le N_{\gamma }\) and all \(s<t\) in \([0,T]\), uniformly in x on compact sets.

Remark 4.2

Since each application of the vector fields \(\Gamma _{i_1\ldots i_n}\) amounts to taking n derivatives, these estimates have the interpretation that time regularity of \(\Gamma _{i_1\ldots i_n}u\) can be traded against space regularity in a controlled sense.

Theorem 4.3

Let \(f \in \mathcal {C}^{2N_\gamma +1}_b\), \(g\in \mathcal {C}^{N_\gamma +1}\) and consider the rough solution \(X^{s,x}\) to Eq.  (2.5). Then \(u(s, x):=g(X^{s,x}_T)\) is a solution to the rough transport equation in the sense of Definition 4.1.


Proof of Theorem 4.3

We first note that by Theorem 3.6 the map \((s,x)\mapsto X^{s,x}_T\) belongs to \(\mathcal {C}^{\gamma ,N_\gamma +1}\). Since \(g\in \mathcal {C}^{N_\gamma +1}\), it follows that \(u(s,x)= g(X^{s,x}_T)\in \mathcal {C}^{\gamma ,N_\gamma +1}\). Let us show that u is a solution by proving the estimates given in Definition 4.1 for fixed times \(s< t < T\) and x in a compact set. By uniqueness of the RDE flow one has \(X^{s, x}_T = X^{t, y}_T \), where \(y = X^{s,x}_t\). Thus we deduce from the definition of u the identity

$$\begin{aligned} u_s(x) = u_t(X^{s,x}_t). \end{aligned}$$

Let \({\mathbf{X}}\) denote the controlled rough path such that \(X^{s,x}_t=\langle \mathbf {1}^*,{\mathbf{X}}_t\rangle \). Since \(g\in \mathcal {C}_b^{N_{\gamma }+1}\), we can apply the rough Itô formula in Eq.  (3.14) to the function \(x\mapsto u_t(x)\), obtaining

$$\begin{aligned} u_t(X^{s,x}_t)\underset{(N_{\gamma }+1)\gamma }{=}\sum _{|w|\le N_{\gamma }}\Gamma _w u_t(x)\langle {\mathbf{W}}_{st},e_w\rangle , \end{aligned}$$

which is (4.2) in the case \(w=\varepsilon \). To show the estimates on \(\Gamma _{i_1\ldots i_l}u_s\), we apply Lemma 3.8 to the function \(x\mapsto u_s(x)\):

$$\begin{aligned} \Gamma _{w}u_s(x) = \sum _{k=1}^{\vert w \vert }\frac{1}{k!}\sum _{\begin{array}{c} u_1,\ldots ,u_k\\ w\in {{\,\mathrm{Sh}\,}}(u_1,\ldots ,u_k) \end{array}}D^ku_s(x)(F_{u_1}(x),\ldots ,F_{u_k}(x)). \end{aligned}$$

Using again the identity (4.3), for any word \(\alpha \) we apply Eq.  (2.13) obtaining

$$\begin{aligned}\partial ^\alpha (u_s(x))= \sum _{l=1}^k \frac{1}{l!}\sum _{\begin{array}{c} \beta _1,\ldots ,\beta _l\\ \alpha \in {\text {Sh}}(\beta _1,\dotsc ,\beta _l) \end{array}}D^lu_t(X^{s,x}_t)(\partial ^{\beta _1}X^{s,x}_t,\ldots , \partial ^{\beta _l}X^{s,x}_t).\end{aligned}$$

Since the vector fields satisfy \(f\in C_b^{2N_{\gamma }+1}\) and every \(\beta _i\) such that \( \alpha \in {\text {Sh}}(\beta _1,\dotsc ,\beta _l)\) satisfies \(\vert \beta _i \vert \le \vert \alpha \vert \), we can apply Corollary 3.7 to get

$$\begin{aligned} \partial ^{\beta _i}X_t^{s,x}\underset{(N_{\gamma }+1-\vert w \vert )\gamma }{=}\sum _{0\le |v|\le N_{\gamma }-\vert w \vert }\partial ^{\beta _i}F_v(x)\langle {\mathbf{W}}_{st}, e_v\rangle \,.\end{aligned}$$

Plugging these estimates into \(D^lu_t(X^{s,x}_t)\), one has


Plugging this expression into (4.4) we obtain


Rearranging the sums and applying the definition of the functions \(F_w\) we obtain the identity

$$\begin{aligned} \begin{aligned}&\sum _{k=l}^{\vert w \vert }\frac{1}{k!}\sum _{\begin{array}{c} u_1,\dotsc ,u_k\\ w\in {\text {Sh}}(u_1,\dotsc ,u_k) \end{array}} \sum _{\begin{array}{c} \alpha \in {\text {Sh}}(\beta _1,\dotsc ,\beta _l)\\ \vert \alpha \vert =k \end{array}}D^lu_t(X^{s,x}_t)(\partial ^{\beta _1}F_{v_1}(x), \ldots ,\partial ^{\beta _l}F_{v_l}(x))F^{\alpha _1}_{u_1}(x)\cdots F^{\alpha _k}_{u_k}(x)\\ {}&\qquad =\sum _{\begin{array}{c} u'_1,\ldots ,u'_l\\ w\in {\text {Sh}}(u'_1,\dotsc ,u'_l) \end{array}}D^lu_t(X^{s,x}_t)(F_{u'_1v_1}(x), \cdots ,F_{u'_lv_l}(x)). \end{aligned} \end{aligned}$$

Therefore the right-hand side of (4.6) becomes


We now perform a Taylor expansion of \(D^lu_t(X^{s,x}_t)\) up to order \( N_{\gamma }- \vert w \vert \) between \(X_t^{s,x}\) and x, yielding for any words \(u'_1,\ldots , u'_l\)

$$\begin{aligned} \begin{aligned}&D^lu_t(X_t^{s,x})\bigg (F_{u'_1v_1}(x), \cdots ,F_{u'_lv_l}(x)\bigg )\\ {}&\qquad \underset{(N_{\gamma } +1-\vert w \vert )\gamma }{=} \sum _{m=0}^{N_{\gamma }-\vert w\vert }\frac{1}{m!}D^{l+m}u_t(x)\left( (X_t^{s,x}-x)^{\otimes m} ,F_{u'_1v_1}(x), \cdots ,F_{u'_lv_l}(x)\right) . \end{aligned} \end{aligned}$$

Plugging now the Davie expansion (3.3) truncated at order \(N_{\gamma }-\vert w\vert \) into (4.7) we have the following estimate


Using the symmetry of \(D^{l+m}u_t(x)\), we deduce

$$\begin{aligned}\frac{l!m!}{(l+m)!}\sum _{ I_l\sqcup J_m= \{1,\ldots ,m+l\}} D^{m+l}u_t (x):\left( F_{u'_{i_1}v_{i_1}}, \cdots , F_{z_{j_1}}, \ldots ,F_{u'_{i_l}v_{i_{l}}}, \ldots \right) .\end{aligned}$$

Replacing this expression in the right-hand side of (4.9), we can easily verify that the resulting expression is equal to the sum

$$\begin{aligned} \sum _{0\le \vert v \vert \le N_{\gamma }-\vert w\vert }\sum _{n=1}^{|w|+ \vert v\vert }\sum _{\begin{array}{c} u_1,\dotsc ,u_n \\ wv\in {\text {Sh}}(u_1,\dotsc ,u_n) \end{array}}\frac{1}{n!}D^n u_t(x):(F_{u_1}, \cdots ,F_{u_n})\langle {\mathbf{W}}_{st},e_{v}\rangle .\end{aligned}$$

Thereby proving the result. \(\square \)
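As a sanity check of Definition 4.1 in the simplest situation, consider the following example, added for illustration under assumptions not made in the text: \(n=d=1\), a constant vector field \(f\equiv c\), terminal data \(g\in \mathcal {C}^{N_\gamma +1}_b\), and a weakly geometric \({\mathbf{W}}\), so that \(\langle {\mathbf{W}}_{st},e_{1^m}\rangle =(W_{st})^m/m!\), where \(1^m\) denotes the word consisting of m copies of the letter 1 and \(W_{st}=\langle {\mathbf{W}}_{st},e_1\rangle \). Then \(X^{s,x}_t=x+cW_{st}\), \(\Gamma _1=c\,\partial _x\) and \(u(s,x)=g(x+cW_{sT})\). Since \(W_{sT}=W_{st}+W_{tT}\), Taylor's formula gives

$$\begin{aligned} u_s(x)=u_t(x+cW_{st})&= \sum _{m=0}^{N_{\gamma }}c^m\partial _x^mu_t(x)\frac{(W_{st})^m}{m!}+O\left( |W_{st}|^{N_{\gamma }+1}\right) \\&\underset{(N_{\gamma }+1)\gamma }{=}\sum _{m=0}^{N_{\gamma }}\Gamma _{1^m}u_t(x)\langle {\mathbf{W}}_{st},e_{1^m}\rangle , \end{aligned}$$

which is (4.2) for \(w=\varepsilon \). Applying the same expansion to \(\Gamma _{1^j}u_s=c^j\partial _x^ju_s\), with one derivative fewer available for each letter, gives the estimate for \(|w|=j\) with a remainder of order \((N_\gamma +1-j)\gamma \).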

We can now show that solutions in the sense of Definition 4.1 are unique.

Theorem 4.4

Let \(f_i \in \mathcal {C}^{2N_{\gamma }+1}_b\) with associated differential operators \(\Gamma _i\), and \({\mathbf{W}}\in \mathscr {C}^\gamma \). Given regular terminal data \(g \in \mathcal {C}^{N_{\gamma }+1}\), there exists a unique regular solution to the rough transport equation (4.1).


Proof of Theorem 4.4

Existence is clear, since Theorem 4.3 says exactly that \((t, x) \mapsto g (X^{t, x}_T)\) gives a regular solution. Let now u be any regular solution to the rough transport equation. We show that for \(X=X^{\bar{s}, \bar{y}}\), with arbitrary \(\bar{s}, \bar{y} \), one has the estimate

$$\begin{aligned} u (t, X_t) - u (s, X_s) \underset{(N_{\gamma }+1) \gamma }{=} 0. \end{aligned}$$

Since \((N_{\gamma }+1) \gamma > 1\) this entails that \(t \mapsto u (t, X_t)\) is constant, and so we recover the uniqueness from the identities

$$\begin{aligned}u (s, x) =u (s, X^{s, x}_s) =u (T, X^{s, x}_T) = g (X^{s, x}_T)\,.\end{aligned}$$
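For a smooth driver this is the classical computation along characteristics, recorded here only as a heuristic guide (it assumes \(W\in \mathcal {C}^1\) and \(u\in \mathcal {C}^{1,1}\), as in the introduction, and is not part of the rough argument). Using \(-\partial _tu=\sum _i\Gamma _iu\,{\dot{W}}^i\) and \({\dot{X}}_t=\sum _if_i(X_t){\dot{W}}^i_t\),

$$\begin{aligned} \frac{\mathrm d}{\mathrm dt}u(t,X_t)&= \partial _tu(t,X_t)+\sum _{i=1}^df_i(X_t)\cdot D_xu(t,X_t)\,{\dot{W}}^i_t\\&= \sum _{i=1}^d\left( -\Gamma _iu_t(X_t)+\Gamma _iu_t(X_t)\right) {\dot{W}}^i_t=0. \end{aligned}$$

The estimate (4.10) is the rough counterpart of this cancellation.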

To prove (4.10) we show that for every \(k = 0,\ldots , N_{\gamma }\) and any choice of indices \(i_1,\ldots , i_k \) (if \(k=0\) there are no indices) one has the estimates

$$\begin{aligned} \Gamma _{i_1\ldots i_k} u_t (X_t) \underset{(N_{\gamma }+1-k) \gamma }{=} \Gamma _{i_1\ldots i_k}u_s (X_s).\end{aligned}$$

Let us prove this estimate by reverse induction on the length of the word \(i_1\ldots i_k\). The case of a word \(i_1\ldots i_{N_{\gamma }}\) of length \(N_{\gamma }\) follows from the algebraic manipulation

$$\begin{aligned}\begin{aligned} \Gamma _{i_1\ldots i_{N_{\gamma }}} u_t (X_t) - \Gamma _{i_1\ldots i_{N_{\gamma }}}u_s (X_s)&= \bigg (\Gamma _{i_1\ldots i_{N_{\gamma }}} u_t (X_t) - \Gamma _{i_1\ldots i_{N_{\gamma }}} u_s (X_t)\bigg ) \\&\quad +\bigg ( \Gamma _{i_1\ldots i_{N_{\gamma }}} u_s (X_t) - \Gamma _{i_1\ldots i_{N_{\gamma }}} u_s (X_s) \bigg )\,. \end{aligned}\end{aligned}$$

Using the defining property of a solution in the estimates (4.2), the first difference on the right-hand side is of order \(\gamma \). Moreover, by hypothesis on u, one has \(\Gamma _{i_1\ldots i_{N_{\gamma }}}u_s (\cdot )\in C^1\) uniformly in \(s \in [0, T]\), and X is \(\gamma \)-Hölder continuous; therefore the second difference is also of order \(\gamma \), as required. Supposing the estimate holds for all index words of length k, we prove it for every index word \(i_1\ldots i_{k-1} \) of length \(k-1\). Repeating the same procedure as before, we obtain

$$\begin{aligned}\begin{aligned} \Gamma _{i_1\ldots i_{k-1}} u_t (X_t)-\Gamma _{i_1\ldots i_{k-1}}u_s (X_s)&= \underbrace{\bigg (\Gamma _{i_1\ldots i_{k-1}} u_t (X_t) - \Gamma _{i_1\ldots i_{k-1}} u_s (X_t)\bigg )}_I \\ {}&\quad +\underbrace{\bigg ( \Gamma _{i_1\ldots i_{k-1}} u_s (X_t) - \Gamma _{i_1\ldots i_{k-1}} u_s (X_s) \bigg )}_{II}. \end{aligned}\end{aligned}$$

Using the definition of a solution, the first difference on the right-hand side satisfies

$$\begin{aligned}I\underset{({N_{\gamma }}+1-k) \gamma }{=}- \sum _{\ell =1}^{{N_{\gamma }}+1-k} \sum _{\vert w\vert =\ell }\Gamma _{i_1\ldots i_{k-1} w}u_t (X_t) \langle \mathbf {W}_{st},w\rangle .\end{aligned}$$

On the other hand, using Lemma 3.8 twice, we write \(\Gamma _{i_1\ldots i_{k-1}} u_s(X_t)= \langle e^*_{i_1\cdots i_{k-1}}, \mathbf {U}_s(\mathbf {X})_t\rangle \), so that the second difference can be expanded with the usual remainder as

$$\begin{aligned} \begin{aligned} II \underset{({N_{\gamma }}+1-k) \gamma }{=}&\sum _{\ell =1}^{{N_{\gamma }}+1-k} \sum _{\vert w\vert =\ell }\langle e^*_{ i_1\ldots i_{k-1}w}, \mathbf {U}_s(\mathbf {X})_s\rangle \langle \mathbf {W}_{st},w \rangle \\\underset{({N_{\gamma }}+1-k) \gamma }{=}&\sum _{\ell =1}^{{N_{\gamma }}+1-k} \sum _{\vert w\vert =\ell }\Gamma _{ i_1\ldots i_{k-1}w}u_s (X_s) \langle \mathbf {W}_{st},w \rangle .\end{aligned}\end{aligned}$$

Combining the two estimates we obtain

$$\begin{aligned} I+II\underset{({N_{\gamma }}+1-k) \gamma }{=} -\sum _{\ell =1}^{{N_{\gamma }}+1-k}\sum _{\vert w\vert =\ell }\bigg (\Gamma _{ i_1\ldots i_{k-1}w}u_t (X_t) -\Gamma _{ i_1\ldots i_{k-1}w}u_s (X_s) \bigg ) \langle \mathbf {W}_{st},w \rangle \,. \end{aligned}$$

Since the terms in the sum involve the increment \( \Gamma _{\sigma } u_t (X_t)- \Gamma _{\sigma } u_s (X_s)\), where \(\sigma \) has length at least k, we apply the inductive hypothesis, obtaining that each term satisfies

$$\begin{aligned}\Gamma _{i_1\ldots i_{k-1}w} u_t (X_t)- \Gamma _{ i_1\ldots i_{k-1}w} u_s (X_s)\underset{({N_{\gamma }}+1-k-\vert w\vert ) \gamma }{=} 0 \end{aligned}$$

and the multiplication with \(\langle \mathbf {W}_{st},w \rangle \) gives the desired estimate. \(\square \)
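
The exponent count behind this last step can be spelled out as follows. This is a sketch that assumes the usual homogeneity bound \(|\langle \mathbf {W}_{st},w\rangle |\lesssim |t-s|^{|w|\gamma }\) for a \(\gamma \)-Hölder rough path, which is not restated in this section, and reads \(\underset{\theta }{=}\) as an error bounded by a constant times \(|t-s|^{\theta }\):

$$\begin{aligned} \Big \vert \big (\Gamma _{i_1\ldots i_{k-1}w} u_t (X_t)- \Gamma _{ i_1\ldots i_{k-1}w} u_s (X_s)\big )\langle \mathbf {W}_{st},w \rangle \Big \vert \lesssim |t-s|^{({N_{\gamma }}+1-k-\vert w\vert )\gamma }\, |t-s|^{\vert w\vert \gamma } = |t-s|^{({N_{\gamma }}+1-k)\gamma }\,. \end{aligned}$$

Since the sum over words is finite, the same bound carries over to \(I+II\).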

4.2 Continuity equation and analytically weak formulation

Given a finite measure \(\rho \in \mathcal {M} ({{\mathbb {R}}}^n)\) and a continuous bounded function \(\phi \in C_b ({{\mathbb {R}}}^n)\), we write \(\rho (\phi ) = \int \phi (x) \rho (dx)\) for the natural pairing. We are interested in measure-valued (forward) solutions to the continuity equation

$$\begin{aligned}{\left\{ \begin{array}{ll} \mathrm{d}_t \rho _t = \displaystyle \sum _{i = 1}^d {{\,\mathrm{div}\,}}_x (f_i (x) \rho _t)\,\mathrm d{\mathbf{W}}_t^i &{}\quad \text { in }\quad (0,T)\times {{\mathbb {R}}}^{n},\\ \rho _0=\mu &{} \quad \text { on }\quad \{0\}\times {{\mathbb {R}}}^{n} \end{array}\right. } \end{aligned}$$

when \({\mathbf{W}}\) is again a weakly geometric rough path. As before we use the notation \(\Gamma _i = f_i (x) \cdot D_x\), whose formal adjoint is \(\Gamma _i^\star = -{{\,\mathrm{div}\,}}_x (f_i{\cdot })\).
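
The adjoint relation can be checked by a formal integration by parts. As a sketch, take test functions \(\phi \in \mathcal {C}^1_b\) and \(\psi \in \mathcal {C}^1_c\); the compact support of \(\psi \) is an assumption made here so that boundary terms vanish. Then

$$\begin{aligned} \int \Gamma _i\phi (x)\, \psi (x)\,\mathrm dx = \int \big (f_i (x) \cdot D_x\phi (x)\big )\, \psi (x)\,\mathrm dx = -\int \phi (x)\, {{\,\mathrm{div}\,}}_x \big (f_i (x) \psi (x)\big )\,\mathrm dx = \int \phi (x)\, \Gamma _i^\star \psi (x)\,\mathrm dx\,. \end{aligned}$$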

Definition 4.5

Let \(\gamma \in (0,1)\), \(\mathbf {W}\in \mathscr {C}_g^{\gamma }\) and \(\mu \in \mathcal {M} ({{\mathbb {R}}}^n)\). Any function \(\rho :[0,T]\rightarrow \mathcal {M} ({{\mathbb {R}}}^n)\) such that \(\rho _0= \mu \) is called a weak or measure-valued solution to the rough continuity equation

$$\begin{aligned} \mathrm d\rho _t = \sum _{i = 1}^d {{\,\mathrm{div}\,}}_x (f_i (x) \rho _t)\,\mathrm d{\mathbf{W}}^i_t \end{aligned}$$

if for every \(\phi \) bounded in \(\mathcal {C}^{{N_{\gamma }}+1}_b\) and any word w with \(|w|\le N_\gamma \) one has the estimates

$$\begin{aligned} \rho _t(\Gamma _w\phi )\underset{(N_\gamma +1-|w|)\gamma }{=}\sum _{0\le |v|<N_\gamma +1-|w|}\rho _s(\Gamma _{wv}\phi )\langle {\mathbf{W}}_{st},e_v\rangle \end{aligned}$$

for every \(s<t\in [0,T]\) and uniformly in \(\phi \).

Theorem 4.6

Let \(f \in \mathcal {C}^{2{N_{\gamma }}+1}_b\) and \({\mathbf{W}}\in \mathscr {C}_g^\gamma \). Given initial data \(\mu \in \mathcal {M} ({{\mathbb {R}}}^n)\), there exists a unique solution to the measure-valued rough continuity equation, explicitly given for \(\phi \in \mathcal {C}_b^{{N_{\gamma }}+1}\) by

$$\begin{aligned} \rho _t (\phi ) = \int \phi (X^{0, x}_t) \mu (d x)\;, \end{aligned}$$

where \( X^{0, x}\) is the unique solution of the RDE \(\mathrm dX_t = \sum _{i=1}^d f_i(X_t)\,\mathrm d{\mathbf{W}}^i_t\) such that \(X^{0,x}_0=x\).


Proof

(Existence) Using the composition of the controlled rough path \(\mathbf {X}^{0,x}\) with \(\phi \in \mathcal {C}^{{N_{\gamma }}+1}_b\) and the shorthand notation \(X^{0,x}_t=X_t\) we can write

$$\begin{aligned}\phi (X_t) \underset{({N}_{\gamma }+1) \gamma }{=} \phi (X_s) + \sum _{k=1}^{{N_{\gamma }}} \sum _{\vert w\vert =k } \Gamma _w\phi (X_s)\langle \mathbf {W}_{st},w \rangle \,,\\ \Gamma _{i_1\ldots i_n}\phi (X_t) \underset{(N_{\gamma }+1-n) \gamma }{=}\Gamma _{i_1\ldots i_n}\phi (X_s) +\sum _{k=1}^{{N_{\gamma }}-n} \sum _{\vert w\vert =k }\Gamma _{i_1\ldots i_n w}\phi (X_s) \langle \mathbf {W}_{st},w \rangle \,.\end{aligned}$$

This shows existence when \(\mu = \delta _x\), thanks to Proposition 4.3. Since the vector fields are bounded, all these estimates are uniform in \(X_0 = x\). Thus we can integrate both sides with respect to the measure \(\mu \), obtaining existence in the general case.
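
As a sanity check of the formula for \(\rho _t\), consider the simplest case \(n=d=1\) with the constant vector field \(f_1\equiv 1\). This example is not taken from the text. It assumes that \(\Gamma _w\) denotes the composition \(\Gamma _{i_1}\cdots \Gamma _{i_k}\), and it uses that, for a weakly geometric rough path over a scalar signal, the shuffle relation forces \(\langle \mathbf {W}_{st},e_{1\cdots 1}\rangle =(W_{st})^k/k!\) for the word of length k, where \(W_{st}\) denotes the first-level increment. Then \(X^{0,x}_t=x+W_{0t}\), \(\Gamma _w\phi =\phi ^{(|w|)}\), and the expansion of \(\phi (X_t)\) reduces to Taylor's formula

$$\begin{aligned} \phi (x+W_{0t}) \underset{(N_{\gamma }+1) \gamma }{=} \sum _{k=0}^{N_{\gamma }} \phi ^{(k)}(x+W_{0s})\, \frac{(W_{st})^k}{k!}\,, \end{aligned}$$

so that \(\rho _t\) is the translate of \(\mu \) by \(W_{0t}\), that is, \(\rho _t(\phi )=\int \phi (x+W_{0t})\,\mu (dx)\).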

(Uniqueness) To prove the uniqueness, we will show that for any \(0<t\le T\), any function \(g \in \mathcal {C}^{{N_{\gamma }}+1}_b\) and any solution \(u:[0,t]\times {{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}\) of the RPDE

$$\begin{aligned} \mathrm d u_r = \displaystyle \sum _{i=1}^d \Gamma _i u (r, x)\,\mathrm d{\mathbf{W}}^i_r,\quad u_t=g, \end{aligned}$$

the function \(r\in [0,t] \mapsto \alpha (r):=\rho _r (u_r)\) is constant. This property implies that for any function \(g \in \mathcal {C}^{{N_{\gamma }}+1}_b\) and \(t>0\) one has the identity

$$\begin{aligned}\rho _t (g)=\rho _t (u_t) = \rho _0 (u_0)= \mu (u_0)\end{aligned}$$

which uniquely determines the measure \(\rho _t\) for any \(0<t\le T\). Since the parameter T was also arbitrary it is not restrictive to prove the result when \(t=T\). Then \(\alpha \) is constant if and only if one has the estimate

$$\begin{aligned} \alpha (r)\underset{(N_{\gamma }+1) \gamma }{=} \alpha (s)\,. \end{aligned}$$

Writing \(u_{s,r} = u_r - u_s\) and similarly for \(\rho \) one has

$$\begin{aligned} \rho _r (u_r) - \rho _s (u_s) = \rho _{s, r} (u_r)+\rho _s (u_{s, r}) \,.\end{aligned}$$

By the definition of a measure-valued solution (Definition 4.5), applied with \(\phi = u_r\in \mathcal {C}^{{N_{\gamma }}+1}_b\) and the empty word, the first summand expands as

$$\begin{aligned} \rho _{s, r}( u_r) \underset{({N_{\gamma }}+1) \gamma }{=} \sum _{k=1}^{N_{\gamma }} \sum _{\vert w\vert =k }\rho _s (\Gamma _w u_r) \langle \mathbf {W}_{sr},w \rangle \,. \end{aligned}$$

On the other hand, we expand the second summand on the right-hand side using the very definition of a regular solution to the backward RPDE, obtaining

$$\begin{aligned}u_{s,r} (x) \underset{(N_{\gamma }+1) \gamma }{=} -\sum _{k=1}^{{N_{\gamma }}} \sum _{\vert w\vert =k }\Gamma _{ w}u_r (x) \langle \mathbf {W}_{sr},w \rangle \,,\end{aligned}$$

where the remainder is uniform in x. Integrating this estimate against \(\rho _s\), we obtain

$$\begin{aligned} \rho _s (u_{s, r})\underset{(N_{\gamma }+1) \gamma }{=} -\sum _{k=1}^{{N_{\gamma }}} \sum _{\vert w\vert =k }\rho _s(\Gamma _{ w}u_r ) \langle \mathbf {W}_{sr},w \rangle \,. \end{aligned}$$

Combining the two estimates (4.15) and (4.14) we obtain (4.13) and the theorem is proven. \(\square \)
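
Spelled out, the combination is a term-by-term cancellation. Using only the splitting of \(\rho _r (u_r) - \rho _s (u_s)\) and the two expansions from the proof, one may write

$$\begin{aligned} \alpha (r)-\alpha (s) = \rho _{s, r} (u_r)+\rho _s (u_{s, r}) \underset{(N_{\gamma }+1) \gamma }{=} \sum _{k=1}^{N_{\gamma }} \sum _{\vert w\vert =k }\rho _s (\Gamma _w u_r) \langle \mathbf {W}_{sr},w \rangle -\sum _{k=1}^{{N_{\gamma }}} \sum _{\vert w\vert =k }\rho _s(\Gamma _{ w}u_r ) \langle \mathbf {W}_{sr},w \rangle = 0\,. \end{aligned}$$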