Random walks and Lévy processes as rough paths

We consider random walks and Lévy processes in a homogeneous group G. For all $p > 0$, we completely characterise (almost) all G-valued Lévy processes whose sample paths have finite p-variation, and give sufficient conditions under which a sequence of G-valued random walks converges in law to a Lévy process in p-variation topology. In the case that G is the free nilpotent Lie group over $\mathbb{R}^d$, so that processes of finite p-variation are identified with rough paths, we demonstrate applications of our results to weak convergence of stochastic flows and provide a Lévy–Khintchine formula for the characteristic function of the signature of a Lévy process. At the heart of our analysis is a criterion for tightness of p-variation for a collection of càdlàg strong Markov processes.


Introduction
This paper focuses on several questions regarding Lévy processes and random walks in homogeneous groups, with a particular focus on applications to rough paths theory. Let G be a homogeneous group (in the sense of [15]) equipped with a sub-additive homogeneous norm and the corresponding left-invariant metric. We can summarise the three main results of the paper as follows.
• (Theorem 5.1) Given a Lévy process X in G, we determine (almost) all values of p > 0 for which the sample paths of X have almost surely finite p-variation.
• (Theorem 5.5) We give sufficient conditions for a sequence of (interpolated and reparametrised) random walks in G to converge weakly to an (interpolated and reparametrised) Lévy process in G in p-variation topology.
• (Theorem 5.17) In the case that $G = G^N(\mathbb{R}^d)$, the step-N free nilpotent Lie group over $\mathbb{R}^d$, we determine a Lévy-Khintchine formula for the characteristic function (in the sense of [11]) of the signature of the random rough path constructed from a Lévy process in G.
We apply the second of these results in the context of rough paths to show weak convergence of stochastic flows in several examples. Notably, we provide a significant generalisation of a result of Kunita [29] and of a related result of Breuillard, Friz and Huesmann [7].
We take a moment to discuss how our work relates to the appearance of càdlàg rough paths in the current literature. Friz and Shekhar [17] recently introduced a broad extension of rough paths theory to the càdlàg setting. Their work in particular generalises the notion of rough integration and RDEs and significantly extends earlier work of Williams [36], who gave pathwise solutions to differential equations driven by Lévy processes in $\mathbb{R}^d$.
As a family of càdlàg rough paths of particular interest, Lévy processes in $G^N(\mathbb{R}^d)$ of finite p-variation for some $1 \le p < N + 1$ were studied in [17]. Such Lévy rough paths bear a resemblance to Markovian rough paths constructed from subelliptic Dirichlet forms on $L^2(G^N(\mathbb{R}^d))$, first studied in [19] and recently in [10][11][12], in the sense that both processes may be viewed as stochastic rough paths whose evolution depends entirely on their first N iterated integrals.
The method we employ here to give meaning to càdlàg rough paths is to connect left- and right-limits with continuous paths and treat the resulting object as a classical rough path. We therefore do not address directly the concept of a càdlàg RDE in this paper, but emphasise that our methods relate closely to Marcus SDEs and that Theorems 5.1 and 5.17 can be seen as generalisations of two related results in [17] (discussed further in Sect. 5). We mention however that the method of proof used for our main results, which is based on approximating a Lévy process by a sequence of random walks, is different to the methods used in [17].
We also point out that our methods treat general interpolations, which depend arbitrarily on the endpoints of jumps, on the same footing as the simpler linear interpolation used in Marcus SDEs. Examples of interest of such non-linear interpolations date back to the works of McShane [33] and Sussmann [35] on approximations of Brownian motion (discussed further in Examples 5.12 and 5.14), and appear more recently in the work of Flint, Hambly and Lyons [14].
A crucial result for our analysis, which we believe to be of independent interest, is a criterion for tightness of p-variation of strong Markov processes taking values in a Polish space (Theorem 4.8). This result is a generalisation of the main result of Manstavičius [32], which provides a criterion for a strong Markov process to have sample paths of a.s. finite p-variation. Our proof of Theorem 4.8 is a simplification of the stopping times technique adopted in [32].
Finally, we mention that while most applications presented in this paper concern geometric rough paths, and thus only require consideration of the free nilpotent Lie group, we have attempted to make statements in their natural level of generality. In particular, we believe that our results may prove to be of interest for studying random walks and Lévy processes in the Butcher group, which correspond to branched rough paths in the sense of [21,22] (see also Remark 3.2 below).

Outline of the paper
In Sect. 2 we discuss iid arrays and Lévy processes taking values in a general Lie group. Our only contribution in this section is the construction of a sequence of random walks $(X^n)_{n \ge 1}$ associated with a Lévy process X such that $X^n \xrightarrow{D} X$ in the Skorokhod topology, and for which tightness of p-variation is simple to verify. In Sect. 3 we recall several preliminary facts about homogeneous groups and spaces of paths of finite p-variation.
Section 4 is devoted to the proof of Theorem 4.1, which shows tightness of p-variation for a collection of random walks in a homogeneous group. This is a central result used in the proofs of the three main aforementioned theorems, which we state and prove in Sect. 5. In Sect. 5.3.1 we also provide several applications of Theorem 5.5 to weak convergence of stochastic flows.
In "Appendix A" we introduce the concept of path functions, which serve to connect the left- and right-limits of càdlàg paths, and collect several technical results used throughout Sect. 5. In "Appendix B" we describe conditions under which sample paths of a Lévy process possess infinite p-variation (used to complete the proof of Theorem 5.1).

Notation
Throughout this section, we fix a Lie group G with Lie algebra $\mathfrak{g}$, and identify $\mathfrak{g}$ with the space of left-invariant vector fields on G. Let $u_1, \dots, u_m$ be a basis for $\mathfrak{g}$. We equip $\mathfrak{g}$ with the inner product for which $u_1, \dots, u_m$ is an orthonormal basis. For an element $y \in \mathfrak{g}$ we write $y = \sum_{i=1}^m y^i u_i$. When x is an element of a normed space, we denote its norm by |x|.
We further fix an open neighbourhood $U \subset G$ of the identity $1_G \in G$, such that U has compact closure and $\exp : \mathfrak{g} \to G$ is a diffeomorphism from a neighbourhood of zero in $\mathfrak{g}$ onto U. Let $\xi_i \in C_c^\infty(G, \mathbb{R})$ be smooth functions of compact support such that $\log(x) = \sum_{i=1}^m \xi_i(x) u_i$ for all $x \in U$ (that is, the $\xi_i$ provide exponential coordinates of the first kind on U). We denote by $\xi : G \to \mathfrak{g}$ the map $\xi(x) = \sum_{i=1}^m \xi_i(x) u_i$.
For a metric space E, denote by $D([0, T], E)$ the space of càdlàg functions $x : [0, T] \to E$ equipped with the Skorokhod topology (see, e.g., [3, Section 12]). We shall use the symbol o to denote spaces of paths whose starting point is the identity element $1_G$. For example, $D_o([0, T], G)$ denotes the set of all $x \in D([0, T], G)$ such that $x_0 = 1_G$.

Preliminaries on iid arrays and Lévy processes
An array in G is a sequence of finite collections of G-valued random variables $(X_{n1}, \dots, X_{nn})_{n \ge 1}$. We call the array iid if, for every $n \ge 1$, $X_{n1}, \dots, X_{nn}$ are iid. We will always suppose that an iid array $X_{nj}$ is infinitesimal, i.e., $\lim_{n \to \infty} P[X_{n1} \notin V] = 0$ for every neighbourhood V of $1_G$. Furthermore, for all n ≥ 1 we let For a collection of elements and for an array $X_{nj}$, we refer to the associated random walk $X^n$ to mean the sequence of associated walks built from the collections $(X_{n1}, \dots, X_{nn})$.
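As a concrete illustration in the simplest case G = R (with addition as the group operation), the following sketch builds the walk associated with one row of an array. It assumes the usual convention that the associated walk is the piecewise constant càdlàg path taking the value $x_1 \cdots x_{\lfloor tn \rfloor}$ at time $t \in [0, 1]$; the function names are hypothetical and not taken from the paper.

```python
import math


def associated_walk(increments):
    """Partial sums (x_0, ..., x_n) of one row of an array, with x_0 = 0.

    Assumption: G = R with addition, so the group product is a sum.
    """
    path = [0.0]
    for step in increments:
        path.append(path[-1] + step)
    return path


def walk_at(path, t):
    """Value at time t in [0, 1] of the piecewise constant cadlag walk.

    Assumption: the walk sits at the k-th partial sum on [k/n, (k+1)/n).
    """
    n = len(path) - 1
    return path[min(n, math.floor(t * n))]


row = [1.0, -0.5, 2.0]
path = associated_walk(row)
print(path, walk_at(path, 0.34), walk_at(path, 1.0))
```

For a non-commutative group, the sum in `associated_walk` would be replaced by the group product taken in the order of the increments.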
Recall that a (left) Lévy process in G is a $D_o([0, T], G)$-valued random variable X with independent and stationary (right) increments. We refer to Liao [30] for further details.
We call a Lévy triplet (or simply triplet) a collection (A, B, Π) of a covariance matrix A, an element B of $\mathfrak{g}$, and a Lévy measure Π on G (see [30, p. 12]). A classical theorem of Hunt [26] asserts that for every Lévy process X in G, there exists a unique triplet (A, B, Π) such that the generator of X is given, for all $f \in C^2_0(G)$ and $x \in G$, by Hunt's formula in terms of (A, B, Π). Conversely, every Lévy triplet gives rise to a unique Lévy process.
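For orientation, the generator in Hunt's theorem is usually written in the following standard form (see, e.g., Liao [30]). This is a sketch rather than a quotation: the symbol $\Pi$ for the Lévy measure, the indexing of $A$ and $B$, and the use of the coordinate functions $\xi_i$ in the compensator are assumptions based on the standard statement of the theorem.

```latex
\lim_{t \to 0} t^{-1}\, \mathbb{E}\big[ f(x X_t) - f(x) \big]
  = \sum_{i=1}^m B_i (u_i f)(x)
  + \frac{1}{2} \sum_{i,j=1}^m A_{i,j} (u_i u_j f)(x)
  + \int_G \Big[ f(xy) - f(x) - \sum_{i=1}^m \xi_i(y) (u_i f)(x) \Big] \, \Pi(dy)
```

Here the three terms correspond to the drift, the diffusive part, and the jump part of the process, respectively.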
We will heavily use a characterisation due to Feinsilver [13] of when a G-valued random walk converges in law to a Markov process as a $D_o([0, 1], G)$-valued random variable. The following is a special case of the main results of [13].

Theorem 2.1 (Feinsilver [13]). Let $X_{nj}$ be an iid array of G-valued random variables and $X^n$ the associated random walk. Denote by $F_n$ the probability measure on G associated with $X_{n1}$. Let X be a Lévy process in G with triplet (A, B, Π).
The following notion of a scaling function will be used throughout the paper.

Definition 2.2 (Scaling function). A continuous bounded function
Let X nj be an iid array in G. We say that θ scales the array X nj if The importance behind the above definition is that given a scaling function θ which scales X nj , the rate with which θ decays at 1 G will determine the values of p > 0 for which the p-variation of the associated random walk is tight (Theorem 4.1).

Example 2.3
In the case $G = \mathbb{R}^d$, the prototypical example of a scaling function is $1 \wedge |\cdot|^2$. For a general Lie group G, the example extends as follows: let c > 0 be sufficiently small such that $W := \{\exp(y) \mid y \in \mathfrak{g}, |y| \le c\}$ is contained in U. Then is a scaling function.
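The following sketch evaluates the prototypical scaling function $1 \wedge |\cdot|^2$ on $\mathbb{R}^d$ along a simple array. It assumes that "θ scales the array" means that $n \, \mathbb{E}[\theta(X_{n1})]$ stays bounded in n; this reading, the example law, and all function names are assumptions made for illustration.

```python
def theta(x):
    """Prototypical scaling function 1 ∧ |x|^2 on R^d (x is a tuple)."""
    return min(1.0, sum(c * c for c in x))


def scaled_moment(n, law):
    """n * E[theta(X_n1)] for a finitely supported law [(prob, point), ...]."""
    return n * sum(prob * theta(point) for prob, point in law)


def simple_walk_law(n):
    """Hypothetical example: X_n1 = +n^{-1/2} or -n^{-1/2} with equal odds."""
    h = n ** -0.5
    return [(0.5, (h,)), (0.5, (-h,))]


for n in (1, 4, 100, 10**6):
    # Stays close to 1 for every n, so the quantity is bounded in n.
    print(n, scaled_moment(n, simple_walk_law(n)))
```

Under the stated assumption, the diffusively rescaled simple walk is therefore scaled by $1 \wedge |\cdot|^2$.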

Remark 2.4
Suppose that θ is defined by (2.1) and that X nj is an iid array in G such that the associated random walk converges in law to a Lévy process. Then a simple consequence of Theorem 2.1 is that θ scales the array X nj .

Approximating walk
In this subsection, given a Lévy process X in G, we construct an iid array $X_{nj}$ for which the associated random walk $X^n$ converges in law to X. The array $X_{nj}$ has the advantage that it takes values in either the support of the Lévy measure of X, or in a set which shrinks to the identity as n → ∞. This makes the walk $X^n$ significantly easier to analyse than the increments of X itself and will be used in the proofs of Theorems 5.1 and 5.17. Throughout this subsection, let X be a Lévy process in G with triplet (A, B, Π).
Define also the sets of indexes and let K = k ∈ K B k = 0 . For n sufficiently large so that $\Pi(U^c) < n/2$, let Define $U_n = \{x \in U \mid |\xi(x)| \le h_n\}$ and note that $w_n := \Pi(U_n^c) \le n/2$. Remark that $\lim_{n \to \infty} h_n = 0$, which implies that $U_n$ shrinks to $1_G$ as n → ∞.
Define on G the probability measure $\mu_n(dx) := w_n^{-1} 1\{x \in U_n^c\}\, \Pi(dx)$. Observe that by Hölder's inequality, for all q ≥ 1 For every n ≥ 1, let $Y_n = Y_n^1 u_1 + \dots + Y_n^m u_m$ be a $\mathfrak{g}$-valued random variable such that for all and with covariances for all i, j ∈ {1, . . . , m} In particular, note that $Y_n^i = b_n^i$ a.s. for all $i \notin J$. Remark that setting q = 2 in (2.2) implies It follows that we can choose $Y_n$ such that $\exp(Y_n)$ has support in a neighbourhood $V_n$ of $1_G$, such that $V_n$ shrinks to $1_G$ as n → ∞. Denote by $\nu_n$ the probability measure of the G-valued random variable $\exp(Y_n)$. Finally, let $X_{n1}$ be the G-valued random variable associated to the probability measure $(w_n/n)\mu_n + (1 - w_n/n)\nu_n$, and let $X_{n2}, \dots, X_{nn}$ be independent copies of $X_{n1}$.
Consider the random walk $X^n$ associated with $X_{nj}$. Then a straightforward application of Theorem 2.1 implies that $X^n \xrightarrow{D} X$ as $D_o([0, 1], G)$-valued random variables. We also record the following two simple lemmas whose proofs we omit.

Lemma 2.5 Let $0 < q_1, \dots, q_m \le 2$ be real numbers such that q i / ∈ i for all i ∈ {1, . . . , m}, $q_i = 2$ for all $i \in J$, and $q_i \ge 1$ for all $i \in K$. Let θ be a scaling function such that $\theta(x) = \sum_{i=1}^m |\xi_i(x)|^{q_i}$ for x in a neighbourhood of $1_G$. Then θ scales the array $X_{n1}, \dots, X_{nn}$.

Lemma 2.6 Let θ be a scaling function on G which scales $X_{nj}$. Let V be a neighbourhood of $1_G$, and let $f : \mathrm{supp}(\Pi) \cup V \to \mathbb{R}$ be a bounded measurable function such that f is continuous on supp(Π). Furthermore, suppose that Then for all n sufficiently large, $X_{n1} \in \mathrm{supp}(\Pi) \cup V$ a.s., and

Homogeneous groups
In this section we collect several preliminary facts about homogeneous groups. For details, we refer to [15] and [25]. Throughout this section, we fix a homogeneous group G. That is, G is a nilpotent, connected, and simply connected Lie group endowed with a one-parameter family of dilations (group automorphisms) $(\delta_\lambda)_{\lambda > 0}$, which, upon identifying G with its Lie algebra $\mathfrak{g}$ by the exp map, is given by $\delta_\lambda(u_i) = \lambda^{d_i} u_i$ for a basis $u_1, \dots, u_m$ of $\mathfrak{g}$ and real numbers $d_m \ge \dots \ge d_1 \ge 1$. We equip G with a sub-additive homogeneous norm ||·|| which induces a left-invariant metric $d(x, y) = \|x^{-1} y\|$ (see [25]).
For the remainder of the section, we identify G with g by the diffeomorphism exp : g → G, and write x = where the (finite) sum runs over all non-zero multi-indexes α, β such that deg(α) + deg(β) = d i .
Example 3.1 Recall that a Lie group G is called graded if its Lie algebra is endowed with a decomposition

$$\mathfrak{g} = \mathfrak{g}_1 \oplus \dots \oplus \mathfrak{g}_N \tag{3.2}$$

such that $[\mathfrak{g}_i, \mathfrak{g}_j] \subseteq \mathfrak{g}_{i+j}$, where $\mathfrak{g}_k = 0$ for k > N (and where we allow the possibility that $\mathfrak{g}_k = 0$ for some k ≤ N). Every graded Lie group can be equipped with a natural family of dilations $(\delta_\lambda)_{\lambda > 0}$, and thus a homogeneous structure, for which $d_1, \dots, d_m$ are rational numbers with $d_1 = 1$, given by $\delta_\lambda(u) = \lambda^{k/\alpha} u$ for all $u \in \mathfrak{g}_k$, where $\alpha = \min\{k \ge 1 \mid \mathfrak{g}_k \ne 0\}$ (and conversely, if $d_1, \dots, d_m$ are rational for a homogeneous group G, then G can be given a graded structure [15, p. 5]).

Recall also that a graded Lie group G is called a step-N Carnot group (or stratified group in the terminology of [15]) if the decomposition (3.2) further satisfies $[\mathfrak{g}_i, \mathfrak{g}_j] = \mathfrak{g}_{i+j}$, where $\mathfrak{g}_k = 0$ for k > N. Every Carnot group is a homogeneous group with a natural family of dilations given by $\delta_\lambda(u) = \lambda^k u$ for all $u \in \mathfrak{g}_k$ (so that $d_i \in \{1, \dots, N\}$), and for which the metric d can be taken as the Carnot-Carathéodory distance [2, p. 38].
The Carnot group which will be particularly relevant in Sect. 5.3 for applications in rough paths theory is the step-N free nilpotent Lie group $G^N(\mathbb{R}^d)$ over $\mathbb{R}^d$, which we recall is, by definition, the space where geometric p-rough paths (for $\lfloor p \rfloor = N$) take values. For further details concerning the theory of geometric rough paths, we refer to [18].
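To make the dilation structure concrete, the following sketch works in the smallest non-trivial case $G^2(\mathbb{R}^2)$, the three-dimensional Heisenberg group, written in exponential coordinates with basis $u_1, u_2, u_3 = [u_1, u_2]$. The choice of coordinates, the function names, and the particular homogeneous norm are assumptions made for illustration; in particular, the norm below is homogeneous but is not claimed to be sub-additive.

```python
import math


def multiply(a, b):
    """Group product in exponential coordinates via the CBH formula.

    In step 2 the formula truncates to a + b + [a, b] / 2,
    with [a, b] = (a1*b2 - a2*b1) * u3.
    """
    return (
        a[0] + b[0],
        a[1] + b[1],
        a[2] + b[2] + 0.5 * (a[0] * b[1] - a[1] * b[0]),
    )


def dilate(lam, a):
    """Dilation delta_lambda with weights d_1 = d_2 = 1 and d_3 = 2."""
    return (lam * a[0], lam * a[1], lam**2 * a[2])


def hom_norm(a):
    """A homogeneous norm: hom_norm(dilate(lam, a)) == lam * hom_norm(a)."""
    return max(abs(a[0]), abs(a[1]), math.sqrt(abs(a[2])))


a, b = (1.0, 2.0, 3.0), (-1.0, 0.5, 4.0)
# Dilations are group automorphisms: both sides print the same triple.
print(dilate(2.0, multiply(a, b)), multiply(dilate(2.0, a), dilate(2.0, b)))
```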
Remark 3.2 Another homogeneous group which plays an important role in the theory of rough paths is the step-N Butcher group over $\mathbb{R}^d$ (see [21,22]). Recall that the Butcher group is not a Carnot group (see [22, Remark 2.15]).
The Butcher group is, by definition, the space where branched rough paths take values (which form a genuine extension of the notion of geometric rough paths). We mention that branched rough paths were recently studied in [8] to give a rough path perspective on renormalisation of stochastic PDEs in the theory of regularity structures [9,23]. Lévy processes in the Butcher group in particular form a family of stationary stochastic processes closed under appropriate renormalisation maps (see [8, Section 4]).

Paths of finite p-variation
We will drop the reference to the interval [s, t] when it is clear from the context. For convenience, we record the following standard interpolation estimates.

(2) There exists C > 0 such that for all x, y :

Proof (1) is obvious. To show (2), it follows from an application of the CBH formula (3.1) and the equivalence of ||·|| and |||·|||, that for all g, h ∈ G Note that, except in trivial cases,

To show the claimed properties of $C^{0,p\text{-var}}([0, T], G)$, note that for x ∈ G, the path $\gamma : t \mapsto \exp(t \log x)$ has finite p-variation if and only if is separable (and thus Polish) is also easy to show (e.g., by considering $\gamma \in C_g([0, T], G)$ with rational coordinates and using an argument similar to the proof of Lemma A.5).
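For a path observed at finitely many times, the p-variation can be computed exactly by dynamic programming. The sketch below assumes the standard definition, $\|x\|_{p\text{-var}} = (\sup_D \sum_i d(x_{t_i}, x_{t_{i+1}})^p)^{1/p}$ with the supremum over partitions D, restricted here to partitions made of sample times; the function name and the default metric on R are hypothetical.

```python
def p_variation(path, p, dist=lambda a, b: abs(a - b)):
    """p-variation of a finite (non-empty) sequence of points.

    best[j] is the largest sum of p-th powers of increments over all
    subsequences of path[0..j] that start at index 0 and end at index j.
    Forcing both endpoints into the partition loses nothing, because
    adding an endpoint only adds a non-negative term.
    Runs in O(n^2) time for n sample points.
    """
    n = len(path)
    best = [0.0] * n
    for j in range(1, n):
        best[j] = max(best[i] + dist(path[i], path[j]) ** p for i in range(j))
    return best[-1] ** (1.0 / p)


# A monotone path: for p = 2 the single big increment is optimal.
print(p_variation([0.0, 1.0, 2.0], 2))
# An oscillating path: every increment is kept.
print(p_variation([0.0, 1.0, 0.0, 1.0], 1))
```

For G-valued paths one would pass `dist=lambda a, b: norm(inverse(a) * b)` for the chosen homogeneous norm, in line with the left-invariant metric of this section.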
The following result will be important in our classification of G-valued Lévy processes of finite p-variation.

Proposition 3.5 Let p > 0 and (X n ) n≥1 be a sequence of D([0, T ], G)-valued random variables such that (||X n || p-var;[0,T ] ) n≥1 is a tight collection of real random
is Polish, we may apply the Skorokhod representation theorem [27,Theorem 3.30], from which the conclusion easily follows. (2) It follows from Lemma 3.3 that every set of the form

p-variation tightness of random walks
We continue to use the notation of the previous section. Consider an iid array $X_{nk}$ in the homogeneous group G, and let $X^n$ be the associated random walk. The main result of this section is Theorem 4.1, which provides sufficient conditions under which $(\|X^n\|_{p\text{-var};[0,1]})_{n \ge 1}$ is tight. In its simplest form, Theorem 4.1 implies that whenever $X^n$ converges in law to a Lévy process in G, and the array $X_{nk}$ is scaled by a scaling function θ, then $(\|X^n\|_{p\text{-var};[0,1]})_{n \ge 1}$ is tight for all p > κ > 0, where κ depends only on the scaling function θ.
Let $\xi_1, \dots, \xi_m \in C_c^\infty(G)$ and $\xi : G \to \mathfrak{g}$ be smooth functions and U a neighbourhood of $1_G$ for which the conditions at the start of Sect. 2 are satisfied with respect to the basis $u_1, \dots, u_m$.

Theorem 4.1 Let $X_{n1}, \dots, X_{nn}$ be an iid array of G-valued random variables and $X^n$ the associated random walk. For every i ∈ {1, . . . , m}, let $0 < q_i \le 2$ be a real number, and define Consider the following conditions:

The remainder of the section is devoted to the proof of Theorem 4.1, which can be split into three parts. The first part is collected in Sect. 4.1 and comprises a general p-variation tightness criterion for strong Markov processes. The second part, which is the most technical part of the proof, is collected in Sect. 4.2 and establishes the bounds required to apply the results of Sect. 4.1 for the case $p > d_m$. The third part is collected in Sect. 4.3 and treats the case $p \le d_m$. Roughly speaking, in the third part we decompose $X^n$ into the lift of a walk in a lower level group, for which the previous two parts apply, and a perturbation on the higher levels, for which the p-variation can be controlled directly.

p-Variation tightness of strong Markov processes
In this section we give a criterion for p-variation tightness of strong Markov processes in a Polish space (Theorem 4.8), which is inspired by the work of Manstavičius [32].
Let (E, d) be a metric space and $x : [0, T] \to E$ a function. Define and, for δ > 0, Note that the quantity $\nu_\delta(x)$ measures the maximum number of oscillations of x of magnitude greater than δ over non-overlapping intervals. Observe the following basic inequality which serves to control $\|x\|_{p\text{-var};[0,T]}$:

Proof Note that for any function x : Consider a real random variable Z distributed by the negative binomial distribution with parameters ( T / h , q), i.e., Z counts the total number of iid Bernoulli trials with success probability q until exactly T / h failures occur. It follows from the uniform bound (4.2) that (where one considers $A_i$ as a failure with probability at least 1 − q), so that

We now show how one can verify the condition of Lemma 4.3 for a strong Markov process. We first restrict attention to the set of times on which a process is allowed to move.

Definition 4.4 For a metric space (E, d) and a D([0, T ], E)-valued random variable
Let Z X ⊆ [0, T ] denote the union of all stationary intervals, and let R X = [0, T ]\Z X be its complement.
We emphasise that the role of R X is only technical in that it allows us to easily formulate bounds uniform in s ∈ R X (such as those in Theorem 4.8 and Corollary 4.9) which hold for random walks and for which the same bounds would not hold when taken uniformly over all s ∈ [0, T ] (though for completely harmless reasons). The following lemma is a variant of Gīhman-Skorokhod [20, Lemma 2, p. 420] (in which the notion of R X does not appear).

Lemma 4.6 (Maximum inequality). Let X be a càdlàg (not necessarily strong) Markov process taking values in a Polish space (E, d).
Let h, δ > 0 and suppose there exists c ∈ [0, 1) such that Then for all s ∈ R X and x ∈ E, it holds that Proof Let s ∈ R X and observe that a.s.

Consider a nested sequence of partitions
where the right side is non-decreasing in n since D n are nested.
It thus suffices to show that for any partition D To this end, for i ∈ {0, . . . , n}, consider the events Define the σ -algebras F s,t := σ (X u ) s≤u≤t . Observe that (4.3) implies that a.s.
Moreover, consider the disjoint events Finally, (4.4) now follows from the fact that Conditioning on the stopping times {τ 4δ i (X), . . . , τ 4δ 0 (X)} and using the assumption that X is a strong Markov process, the desired result now follows from Lemma 4.6.
We now obtain the following p-variation tightness criterion for strong Markov processes. Recall the quantity M(X) = sup s,t∈[0,T ] d(X t , X s ).
Then for any p > κ, (||X|| p-var;[0,T ] ) X∈M is a tight collection of real random variables.
It thus remains to show (4.5). By (b) and Corollary 4.7, it holds that for all δ ∈ (0, b] from which (4.5) readily follows.

is a tight collection of D([0, T ], E)-valued random variables, and for any p > γ /β, (||X n || p-var;[0,T ] ) n≥1 is a tight collection of real random variables.
Proof First, note that (ii) applied to small h allows us to verify the Aldous condition for the sequence $(X^n)_{n \ge 1}$ (see, e.g., [28, p. 188], though note one should restrict attention to sequences of stopping times $\tau_n$ taking values in $R_{X^n}$ a.s., which is a trivial modification to the usual Aldous condition). Together with (i), it follows that where $h = a\delta^{\gamma/\beta}$. It follows that the conditions of Theorem 4.8 are satisfied with

Proof of Theorem 4.1 in the case p > d m
We continue using the notation of Sect. 3. In particular, we identify G with $\mathfrak{g}$ via the exp map. Observe that an inductive application of the CBH formula (3.1), along with the multinomial identity

$$(z_1 + \dots + z_n)^j = \sum_{k_1 + \dots + k_n = j} \binom{j}{k_1, \dots, k_n} z_1^{k_1} \cdots z_n^{k_n},$$

yields the following lemma.
Recall that $X^n$ denotes the random walk associated to the iid array $X_{n1}, \dots, X_{nn}$. Then there exists K > 0 such that for all n ≥ 1, k ∈ {1, . . . , n} and δ ∈ (0, 1]

$$P\big[\|X^n_{k/n}\| > \delta\big] \le K \frac{k/n}{\delta^{\gamma}}, \tag{4.6}$$

and, for all i ∈ {1, . . . , m} such that $q_i \le 1$,

Proof We first claim that it suffices to consider the case $\|X_{n1}\| \le \varepsilon$ a.s. for all n ≥ 1, where ε > 0 may be taken arbitrarily small. Indeed, let ε > 0 and note that there exists c > 0 such that θ(x) > c for all x ∈ G with ||x|| > ε. Since θ scales $X_{nk}$, it follows that there exists $C_1 > 0$ such that for all n ≥ 1 It follows that for all n ≥ 1 and k ∈ {1, . . . , n}

$$P\big[\|X^n_{k/n}\| > \delta\big] \le P\Big[\|X^n_{k/n}\| > \delta, \ \max_{1 \le a \le k} \|X_{na}\| < \varepsilon\Big] + C_1 k/n,$$

and similarly for the corresponding probability for $Y^{n,i}_{k/n}$. Replacing X nk by we note that (B) and (C) imply that the same conditions hold for the iid array X nk . It thus suffices to prove the statement of the lemma for the iid array X nk instead as claimed.

By Markov's and Jensen's inequalities (observing that
To bound the last expression, for a multi-index $\alpha = (\alpha_1, \dots, \alpha_m)$, denote $|\alpha| = \alpha_1 + \dots + \alpha_m$. Note that due to the assumption $\|X_{n1}\| < \varepsilon$ a.s., (B) is equivalent to Furthermore, by (C) and the Cauchy-Schwarz inequality, Consider now the expression Since $X_{n1}, \dots, X_{nn}$ are independent, (4.12) splits into a sum of terms of the form E X Call the simple degree of such a term the number of $\beta_i > 0$. The minimum simple degree of any term is evidently r and the maximum is 2r, and one readily sees that there exists $C_3 > 0$ such that for all n ≥ 1 and k ∈ {1, . . . , n}, the number of terms of simple degree s ∈ {r, . . . , 2r} is bounded above by $C_3 k^s$. Furthermore, since $X_{n1}, \dots, X_{nn}$ are identically distributed, it follows from (4.10) and (4.11) that there exists $C_4 > 0$ such that the absolute value of every term of simple degree s is bounded above by $C_4 n^{-s}$. Since 2 ≤ r ≤ s and k ≤ n, it follows that Therefore, from (4.9) and the fact that $d_i \le \gamma_i \le \gamma$, we obtain (4.8). This completes the case r ≥ 2.

It remains to consider the case r = 1. Define now $\gamma_i := d_i (q_i \vee 1)$. It holds that where the first inequality is due to (4.10), and the second inequality is due to the (discrete) Burkholder-Davis-Gundy inequality and the fact that $q_i \le 2$. It now follows from (C) and (4.10) that Since $\gamma_j \le \gamma$, this completes the case r = 1 and the proof of the lemma.
As mentioned in Remark 4.10, Corollary 4.9 and the bound (4.6) are now sufficient to prove Theorem 4.1 for the case that p > d m . For i ∈ {1, . . . , m}, let g >i be the subspace of g spanned by {u j | j > i}. Note that g >i is an ideal of g, and so we can define the Lie algebra g i = g/g >i and the projection map π i : g → g i . The dilations δ λ on g give rise to a natural family of dilations on g i , and thus to a homogeneous group G i associated with g i . Equivalently, G i = g/g >i , where we have identified g with G and g >i with a normal subgroup of G. We implicitly equip G i with an arbitrary sub-additive homogeneous norm ||·||. For notational convenience, we also let G 0 = {1} be the trivial group and π 0 : G → G 0 the trivial map. Proof (i) Observe that π i X n is the random walk associated with the G i -valued iid array π i X nk , from which the conclusion follows by Corollary 4.

Finite p-variation of Lévy processes
Consider a homogeneous group G and recall the notation of Sect. 3. Recall also the definitions of i , J , and K from Sect. 2.3. The following is the main result of this subsection.

Theorem 5.1 Let p > 0 and X be a Lévy process in G with triplet (A, B, Π).

(1) Then ||X|| p-var; [

For the proof of Theorem 5.1, we require the following lemma.

Convergence in p-variation
In this subsection we consider continuous random paths $(X^{n,\phi})_{n \ge 1}$, $X^\phi$, constructed from a random walk $X^n$ and a Lévy process X by connecting their left- and right-limits with a path function φ, and give conditions under which $X^{n,\phi} \xrightarrow{D} X^\phi$ as $C^{p\text{-var}}([0, 1], G)$-valued random variables. All relevant material on path functions is collected in "Appendix A".

Theorem 5.5 Let $X_{nj}$ be an iid array in G and $X^n$ the associated random walk. Let X be a Lévy process in G with triplet (A, B, Π). Suppose that $X^n \xrightarrow{D} X$ as $D_o([0, 1], G)$-valued random variables and that θ scales Let W ⊆ G be a closed subset such that supp(Π) ⊆ W and $X_{n1} \in W$ a.s. for all n ≥ 1. Let p > max{1, q 1 d 1 , . . . , q m ([0, 1], G)-valued random variables.

Remark 5.6 In the statement of Theorem 5.5, note that, a.s., $X_{t-}^{-1} X_t \in \mathrm{supp}(\Pi)$ for every jump time t of X (e.g., [30, Proposition 1.4]). Hence, for any (measurable) path function φ defined on supp(Π), $X^\phi$ is indeed a well-defined $C_o([0, 1], G)$-valued random variable.

Applications to rough paths theory
We apply the results so far developed in the paper to the theory of rough paths and stochastic flows. Following Example 3.1, denote by $G^N(\mathbb{R}^d)$ the step-N free nilpotent Lie group over $\mathbb{R}^d$ and let $\mathfrak{g}^N(\mathbb{R}^d)$ be its Lie algebra. For the remainder of the paper, unless otherwise stated, we shall always let $G = G^N(\mathbb{R}^d)$ and $\mathfrak{g} = \mathfrak{g}^N(\mathbb{R}^d)$. Being a Carnot group, G comes equipped with a natural homogeneous structure and we note that $u_1, \dots, u_d$ can be identified with a basis of $\mathbb{R}^d$.
, equipped with the metric $d_{p\text{-var};[0,T]}$, denote the space of weakly geometric p-rough paths. Given an element x ∈ W G p (R d ), and a collection $(f_i)_{i=1}^d$ of vector fields in $\mathrm{Lip}^{\gamma+k-1}(\mathbb{R}^e)$ for γ > p ≥ 1 and an integer k ≥ 1, there is a unique solution to the rough differential equation (RDE) We refer to [18] for further details on (geometric) rough paths theory.

Stochastic flows
Let $U^x_{T \leftarrow 0} : y_0 \mapsto y_T$ denote the flow associated to (5.1), which we recall is an element of $\mathrm{Diff}^k(\mathbb{R}^e)$, the group of $C^k$-diffeomorphisms of $\mathbb{R}^e$. Recall that the map is a continuous function on W G p (R d ) when $\mathrm{Diff}^k(\mathbb{R}^e)$ is equipped with the $C^k$-topology ([18, Theorem 11.12]). The following result is now an immediate corollary of Theorem 5.5.

a collection of vector fields in
We demonstrate how one can apply Corollary 5.7 to show weak convergence of stochastic flows in the following three examples, the first of which extends a result of Kunita [29].
Example 5.8 (Linear interpolation, Kunita [29]). Let $Y_{n1}, \dots, Y_{nn}$ be an iid array in $\mathbb{R}^d$ such that the associated random walk $Y^n$ converges in law as a $D_o([0, 1], \mathbb{R}^d)$-valued random variable to a Lévy process Y in $\mathbb{R}^d$.
We claim that ODE flows driven by the piecewise linear interpolation of the random walk $Y^n$ along $\mathrm{Lip}^{\gamma+k-1}$ vector fields, for any γ > 2, k ≥ 1, converge in law as $\mathrm{Diff}^k(\mathbb{R}^e)$-valued random variables.
Indeed, setting $G := G^2(\mathbb{R}^d)$, consider the G-valued iid array $X_{nj} := e^{Y_{nj}}$. It follows that $X_{nj}$ is scaled by any scaling function θ on G for which $\theta \ge \sum_{i=1}^d |\xi_i|^2$. Moreover, using the fact that $\xi_i \circ \exp \in C_c^\infty(\mathbb{R}^d)$, one can readily see by Theorem 2.1 that $X^n \xrightarrow{D} X$ as $D_o([0, 1], G)$-valued random variables, where X is a G-valued Lévy process. Finally, consider the 1-approximating, endpoint continuous path function Then $X^{n,\phi}$ is (a reparametrisation of) the lift of the piecewise linear interpolation of $Y^n$. Furthermore, the conditions of Theorem 5.5 are satisfied for all p > 2, so that 0, 1], G)-valued random variables, from which the desired claim follows (Corollary 5.7).
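The lift of a piecewise linear path to $G^2(\mathbb{R}^d)$ can be computed increment by increment. The sketch below represents a group element by its first two tensor levels (x, X), uses $\exp(y) = (y, y \otimes y / 2)$ for a linear chord, and multiplies according to Chen's relation; this representation and all function names are assumptions made for illustration rather than the paper's notation.

```python
def exp_level2(y):
    """Level-2 lift of a straight line with increment y: (y, y⊗y/2)."""
    d = len(y)
    return (list(y), [[0.5 * y[i] * y[j] for j in range(d)] for i in range(d)])


def multiply(g, h):
    """Chen's relation at level 2: (x, X)(y, Y) = (x + y, X + Y + x⊗y)."""
    x, X = g
    y, Y = h
    d = len(x)
    first = [x[i] + y[i] for i in range(d)]
    second = [[X[i][j] + Y[i][j] + x[i] * y[j] for j in range(d)] for i in range(d)]
    return (first, second)


def lift_walk(increments):
    """Endpoint of the level-2 lift of the piecewise linear walk."""
    d = len(increments[0])
    g = ([0.0] * d, [[0.0] * d for _ in range(d)])  # identity element
    for y in increments:
        g = multiply(g, exp_level2(y))
    return g


def levy_area(g, i, j):
    """Antisymmetric part of the second level, i.e. the signed area."""
    _, X = g
    return 0.5 * (X[i][j] - X[j][i])


g = lift_walk([(1.0, 0.0), (0.0, 1.0)])
print(g, levy_area(g, 0, 1))
```

For the two-step walk above (one unit right, then one unit up), the area enclosed between the path and its chord is 1/2, which is the value reported by `levy_area`.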

Remark 5.9
In the previous example, it is easy to see that RDEs driven by $X^\phi$ coincide (up to reparametrisation) with general (Marcus) RDEs driven by X (in the sense of [17, Section 6]) and thus with Marcus SDEs driven by Y.

. . . , f d , along which $Y^n$ drives an ODE, generate a finite dimensional Lie algebra, which essentially allows one to reduce the problem to a random walk on a Lie group (see [29, p. 340]). Our approach, based on convergence under rough path topologies, bypasses this restriction and provides a natural interpretation of the limiting stochastic flow as the solution of an RDE.
Remark 5.11 Breuillard, Friz and Huesmann [7] showed a result analogous to the above example in a special case where the limiting Lévy process Y is Brownian motion. The main analytic tool used in [7] is the Kolmogorov-Lamperti criterion to show tightness of $(\|Y^n\|_{1/p\text{-Höl};[0,1]})_{n \ge 1}$. This is of course stronger than tightness of $(\|Y^n\|_{p\text{-var};[0,1]})_{n \ge 1}$, and cannot hold whenever the limiting Lévy process has jumps, which demonstrates an example where the tightness criterion Theorem 4.8 can be used as an effective alternative to the classical Kolmogorov-Lamperti criterion.
In the following example we demonstrate how Example 5.8 generalises to nonlinear interpolations with essentially no extra effort.
Example 5.12 (Non-linear interpolation). As in Example 5.8, let $Y_{nj}$ be an iid array in $\mathbb{R}^d$ such that $Y^n \xrightarrow{D} Y$ for a Lévy process Y in $\mathbb{R}^d$. Instead of piecewise linear interpolations, consider now any q-approximating endpoint continuous path function ψ : 0, 1], G) denotes the level-2 lifting map. Consider the iid array $X_{nj} := f(Y_{nj})$. It follows readily from the assumption that ψ is q-approximating that $X_{nj}$ is again scaled by any scaling function θ on G for which We now make the assumption on ψ and $Y_{n1}$ that for all i, j ∈ {1, . . . , m} the following limits exist: This occurs, for example, whenever every $\xi_i \circ f$ is twice differentiable at zero, but in general will depend on the array $Y_{nj}$ and the path function ψ.
Under this assumption, it follows from Theorem 2.1 that the random walk $X^n$ associated with the array $X_{nj}$ converges in law to the Lévy process X whose triplet has first two components C and D, and whose Lévy measure is the pushforward by f of the Lévy measure of Y.
Define now the q-approximating, endpoint continuous path function φ : Observe that the conditions of Theorem 5.5 are again satisfied for all p > 2, so that , G)-valued random variables. Note that X n,φ is, up to reparametrisation, the lift of Y n,ψ (which is itself, up to reparametrisation, the random walk Y n interpolated by the path function ψ). It follows that ODE flows driven by Y n,ψ along Lip γ +k−1 vector fields, for any γ > 2, k ≥ 1, converge in law as Diff k -valued r.v.'s to the corresponding RDE flow driven by X φ (Corollary 5.7).

Remark 5.13 McShane [33] considered non-linear interpolations of the increments of Brownian motion and showed strong convergence of the corresponding ODEs to the associated Stratonovich SDE with an adjusted drift. We note that the family of path functions ψ to which the above example applies includes the non-linear interpolations considered by McShane ([33, p. 285]) (provided that Y nk are also sufficiently well behaved, e.g., the increments of Brownian motion, to ensure that the limits C i, j and D i exist). The above example can thus be seen as a weak convergence analogue for general Lévy processes of the results in [33]. In a similar way, the following example is analogous to the results of Sussmann [35] on non-linear approximations of Brownian motion.
Example 5.14 (Perturbed walk). As in Examples 5.8 and 5.12, let Y nj be an iid array In this example we wish to consider the random path Z n ∈ C 1-var ([0, 1], R d ) defined by linearly joining the points of Y n , and, between each linear chord, running along the path n −1/N γ .

Define the closed subset
Note that every x ∈ W decomposes uniquely as x = exp(y) exp(λv) for some y ∈ R d and λ ≥ 0. Define then the 1-approximating, endpoint continuous path function φ : Consider the G-valued iid array X nj := exp(Y nj ) exp(n −1 v) and the associated random walk X n . Observe that X n,φ is (a reparametrisation of) the level-N lift of the path Z n described above.
We now claim that X n D → X for a Lévy process X in G. A straightforward way to show this is to take local coordinates σ 1 , . . . , so that ξ(X nk ) = f n (Y nk ). Note that, since v is in the centre of g, there exists a neighbourhood of zero V ⊂ R d and n 0 > 0 such that for all n ≥ n 0 where h n ≡ 0 on V . It readily follows that Observe now that for all y ∈ R d , lim n→∞ h n (y) = ξ(e y ) − σ (y), so that by dominated convergence, Since n f n (0) = v for all n sufficiently large, we obtain that the following limit exists: Furthermore, letting denote the pushforward of by exp, one can show in exactly the same way that exists for all i, j ∈ {1, . . . , m}, and that for every f ∈ C b (G) which is identically zero on a neighbourhood of 1 G . It follows by Theorem 2.1 that X n D → X as claimed, where X is the Lévy process with triplet (C, D, ).
Finally, one readily sees that X nj is scaled by any scaling function θ on G for which It now follows by Theorem 5.5 that for all p > N , X n,φ D → X φ as C 0, p-var o ([0, 1], G)-valued r.v.'s. In particular, ODE flows driven by the random paths Z n along Lip γ +k−1 vector fields, for any γ > N , k ≥ 1, converge in law as Diff k -valued r.v.'s to the corresponding RDE flow driven by X φ (Corollary 5.7).

Remark 5.15
Note that the previous Example 5.8 is a special case of Example 5.14 by taking v = 0 and γ the constant path γ ≡ 0. Building on Remark 5.9, one can verify that RDEs driven by X φ coincide (up to reparametrisation) with the associated Marcus SDEs driven by Y with an adjusted drift given by appropriate N -th level Lie brackets of the driving vector fields (cf. [16] and [18,Section 13.3.4]).

The Lévy-Khintchine formula for Lévy rough paths
In this subsection we determine a formula for the characteristic function (in the sense of [11]) of the signature of a Lévy rough path.
Recall that for every x ∈ W G p (R d ), there exists an element called the signature of x, where S(x) k 0,T encodes all the k-fold iterated integrals of x. A fundamental result in rough paths theory is that S(x) 0,T belongs to a certain group G(R d ) contained in the set of group-like elements of T ((R d )) (for the tensor Hopf algebra structure). Furthermore, for every linear map f ∈ L(R d , L(R e , R e )), the series One of the main results of [11] is that for any W G p (R d )-valued random variable X, the following characteristic function where H varies over all finite dimensional complex Hilbert spaces, uniquely determines S(X) 0,T as a G(R d )-valued random variable (more generally, this result holds for every G(R d )-valued random variable).
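To make the iterated integrals encoded by the signature concrete, the following is a minimal sketch, assuming a piecewise linear path in R d and truncation at level 2. The function names (`segment_signature`, `chen_product`, `signature_level2`) are hypothetical and chosen for illustration only; the concatenation rule used is Chen's identity, and the general rough path case is not covered.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Sig2 = Tuple[Vector, Matrix]  # (level 1, level 2); level 0 is implicitly 1


def segment_signature(v: Sequence[float]) -> Sig2:
    """Levels 1 and 2 of the signature of a straight segment with increment v."""
    d = len(v)
    # For a straight line the level-2 term is (v tensor v) / 2.
    level2 = [[v[i] * v[j] / 2.0 for j in range(d)] for i in range(d)]
    return list(v), level2


def chen_product(a: Sig2, b: Sig2) -> Sig2:
    """Truncated tensor product of two signatures (Chen's identity)."""
    a1, a2 = a
    b1, b2 = b
    d = len(a1)
    level1 = [a1[i] + b1[i] for i in range(d)]
    level2 = [
        [a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(d)]
        for i in range(d)
    ]
    return level1, level2


def signature_level2(points: Sequence[Sequence[float]]) -> Sig2:
    """Level-2 signature of the piecewise linear path through `points`."""
    d = len(points[0])
    sig: Sig2 = ([0.0] * d, [[0.0] * d for _ in range(d)])  # identity element
    for p, q in zip(points, points[1:]):
        increment = [q[i] - p[i] for i in range(d)]
        sig = chen_product(sig, segment_signature(increment))
    return sig


# Path (0,0) -> (1,0) -> (1,1): first move along x, then along y.
level1, level2 = signature_level2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
print(level1)  # [1.0, 1.0]
print(level2)  # [[0.5, 1.0], [0.0, 0.5]]
```

In this example the symmetric part of the level-2 term equals half the tensor square of the level-1 term, which is the level-2 instance of the group-like property mentioned above, while the antisymmetric part is the Lévy area of the path.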
Remark 5.16 Boedihardjo et al. [5] have recently established a conjecture of Hambly-Lyons [24] on the kernel of the map S : W G p (R d ) → T ((R d )). A consequence of the main result of [5] is that for all with γ > p (not necessarily linear). In combination with the results from [11], it follows that for any W G p (R d )-valued random variable X, knowledge of the map (5.2) uniquely determines the law of every RDE driven by X.
We now state the aforementioned formula for the characteristic function of the signature of a Lévy rough path. For a subset W ⊆ G, path function φ : W → C p-var o ([0, 1], G), and a linear map f ∈ L(R d , L(R e , R e )), we adopt the shorthand notation By interpolation (Lemma 3.3), one can readily verify that f φ is continuous whenever φ is p-approximating and endpoint continuous. Finally, we canonically treat g = g N (R d ) as a subspace of the tensor algebra T (R d ), so that for any Lie algebra h, every f ∈ L(R d , h) extends uniquely to a linear map f : g → h.

Theorem 5.17 (Lévy-Khintchine formula). Let X be a Lévy process in G with triplet (A, B, ). Suppose that for some 1 ≤ 0, 1], G) be a p-approximating, endpoint continuous path function defined on supp( ). Then for every finite-dimensional complex Hilbert space H and f ∈ L(R d , u(H )), it holds that the function L(H, H ) is -integrable, and that where

Remark 5.18
Note that every pair (X, φ) as in Theorem 5.17 naturally gives rise to a convolution semigroup (μ t ) t>0 of probability measures on G(R d ) (which we recall is a Polish but, if d > 1, non-locally compact group, [11]) given by s, t], G) denotes the connecting map applied to the restriction X [s,t] . Moreover, treating φ as a map supp( ) → G(R d ), x → S(φ(x)) 0,1 , and every f ∈ L(R d , u(H )) as a unitary representation of G(R d ), Theorem 5.17 bears a close resemblance to other forms of the Lévy-Khintchine formula stated in terms of unitary representations of Lie groups (see, e.g., [1,Section 5.5]).

Remark 5.19
Theorem 5.17 can be seen as an extension of a related result on the expected signature of a Lévy p-rough path for 1 ≤ p < 3 ([17, Theorem 53]) in which φ is taken as the log-linear path function φ(e x ) = e t x , ∀x ∈ g, and additional moment assumptions on the Lévy measure are required to ensure existence of the expected signature.
We first record the following estimate, which is readily derived from standard Euler approximations to RDEs ([18,Corollary 10.15]). Then for all f ∈ L(R d , L(R e , R e )), it holds that

Proof of Theorem 5.17 Without loss of generality, we can assume T = 1. Let V be a bounded neighbourhood of 1 G and W := supp( ) ∪ V . Note that φ shrinks on the diagonal (see Remark A.2), so we can find a path function ψ : which is also p-approximating and shrinks on the diagonal and such that ψ ≡ φ on supp( ) (e.g., let ψ(x) be a geodesic from 1 G to x for all x ∈ V \supp( )). Let X n1 , . . . , X nn be the iid array constructed in Sect. 2.3 associated to X, and let X n be the associated random walk. Due to the shrinking support of the random variables Y nj from Sect. 2.3, observe that for every ε > 0, P[X n ∉ supp( ) ε ] = 0 for all n sufficiently large, (5.4) where we recall the notation supp( ) ε from Section A.1. In particular, for all n sufficiently large, X n ∈ W 0 a.s., so that X n,ψ is well-defined. Observe that, due to (5.4) and Proposition A.4, X n,ψ D → X ψ as C o ([0, 1], G)-valued random variables. Let p < p′ < N + 1. Since ||X|| p-var;[0,1] < ∞ a.s. by assumption, we deduce from Theorem 5.1, Lemma 5.4, and Proposition 3.5, that where the equality in law follows from the fact that ψ ≡ φ on supp( ) and X ∈ supp( ) 0 a.s. (see Remark 5.6). For all i ∈ {1, . . . , m}, define q i := 2 ∧ ( p/d i ), and let θ be a scaling function on G such that θ ≡ m i=1 |ξ i | q i in a neighbourhood of 1 G . It follows from Lemma 2.5 and part (2) of Theorem 5.1 that θ scales the array X nj .
Since ψ is p-approximating, it holds that lim x→1 G ||ψ(x)|| Since the array X nj is iid, note that for all n ≥ 1 Since X n,ψ D → X φ as W G p (R d )-valued r.v.'s, and x → f (S(x) 0,1 ) is a continuous bounded function on W G p (R d ), we obtain (5.3).

A.1. The connecting map on the Skorokhod space
For a path x ∈ D([0, T ], E) and a time t ∈ [0, T ], define For a subset J ⊆ E × E and ε ≥ 0 define the subset of càdlàg paths In particular, x ∈ J 0 if and only if all the jumps of x are in J . In the case that E is a Lie group and B ⊆ E, we set For a path function φ : The construction is similar to the method considered in [17] and [36] of adding fictitious times over which to traverse the jumps.
Consider x ∈ J 0 . Let t 1 , t 2 , . . . be the jump times of x ordered so that Let 0 ≤ m ≤ ∞ be the number of jumps of x. We call the sequence (t j ) m j=1 the canonically ordered jump times of x. We henceforth fix a strictly decreasing sequence (r i ) ∞ i=1 of positive real numbers such that ∞ i=1 r i < ∞. Define the sequence (n k ) m k=0 by n 0 = 0, and for 1 ≤ k < m+1 let n k be the smallest integer such that n k > n k−1 and r n k < x t k . Let r := m k=1 r n k . Define the strictly increasing (càdlàg) function Note that τ (t−) < τ(t) if and only if t = t k for some 1 ≤ k < m + 1. Moreover, note that the interval [τ (t k −), τ (t k )) is of length r n k . We now define x ∈ C([0, T + r ], E) by Denote by τ r the linear bijection from [0, T ] to [0, T + r ]. We finally define and the associated time change We call the map x → x φ from J 0 to C([0, T ], E) the connecting map.
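The bookkeeping behind the fictitious time intervals can be sketched as follows. This is an illustrative sketch under stated assumptions, not the construction itself: the displayed formulas are not reproduced here, so the code assumes that each jump is compared with r through its size, and that τ(t) equals t plus the total length of the intervals attached to jumps occurring at or before t (which matches the stated properties that τ jumps exactly at the t k and that [τ(t k −), τ(t k )) has length r n k ). The names `choose_indices` and `make_tau`, and the choice r i = 2^(−i), are hypothetical.

```python
from typing import Callable, List, Sequence


def choose_indices(jump_sizes: Sequence[float], r: Callable[[int], float]) -> List[int]:
    """Pick n_k as the smallest integer with n_k > n_{k-1} and r(n_k) < k-th jump size.

    Assumes every jump size is strictly positive and r decreases to zero,
    so the inner loop terminates.
    """
    indices: List[int] = []
    n = 0
    for size in jump_sizes:
        n += 1
        while r(n) >= size:
            n += 1
        indices.append(n)
    return indices


def make_tau(jump_times: Sequence[float], lengths: Sequence[float]) -> Callable[[float], float]:
    """Return tau(t) = t + (total length of intervals for jumps at times <= t)."""
    def tau(t: float) -> float:
        return t + sum(l for s, l in zip(jump_times, lengths) if s <= t)
    return tau


# Hypothetical summable, strictly decreasing sequence r_i = 2^(-i).
def r(i: int) -> float:
    return 2.0 ** (-i)


# Three jumps, listed here in order of decreasing size (an assumed ordering).
jump_times = [0.5, 0.25, 0.75]
jump_sizes = [1.0, 0.3, 0.1]

indices = choose_indices(jump_sizes, r)
lengths = [r(n) for n in indices]
tau = make_tau(jump_times, lengths)

print(indices)   # [1, 2, 4]
print(lengths)   # [0.5, 0.25, 0.0625]
print(tau(1.0))  # 1.8125
```

Because each interval length is strictly smaller than the size of the jump it is attached to, small jumps receive short fictitious intervals, and summability of r keeps the total added time finite even for infinitely many jumps.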

A.2. Measurability and continuity
The main result of this subsection is a continuity property of the connecting map, which we summarise in Proposition A.4. Let x ∈ K 0 and a sequence x(n) ∈ J 0 such that x(n) → x in the Skorokhod topology as n → ∞, and such that for every ε > 0, there exists n 0 > 0 such that x(n) ∈ K ε for all n ≥ n 0 . Then Proof Let ε > 0. By uniform continuity of x φ , there exists η > 0 such that Let δ > 0 (which we shall send to zero), and suppose that there exists λ ∈ * such that d ∞;[0,T ] (x • λ, y) < δ and d ∞;[0,T ] (λ, id) < δ.
Observe that there exists an integer k ≥ 1 such that λ(t i ) = t i for all i ∈ {1, . . . , k}, and, denoting by v 1 < · · · < v k (resp. v 1 < · · · < v k ) the same set of points as t 1 , . . . , t k (resp. t 1 , . . . , t k ) ordered monotonically, it holds that Moreover, by choosing δ sufficiently small, we can assume that n i = n i for all i ∈ {1, . . . , k} and that k is sufficiently large so that, by making ∞ j=k+1 r j sufficiently small, it holds that |τ y (t ) − τ x (λ(t ))| < η for all t ∈ [v i , v i+1 ) (this is where we have used the condition r n j < x t j ).
In particular, it holds that that for all t , which in turn follows easily from the construction of x φ . (ii) Note that the condition lim n→∞ P [X n / ∈ K ε ] = 0 for all ε > 0 implies that is

A.3. p-variation
The main result of this subsection is Proposition A.7, which shows that a p-approximating path function does not significantly increase the p-variation of a càdlàg path.
We first require the following lemma, whose proof was inspired by [17,Lemma 22].  (a 1 , b 1 ), . . . , J m = (a m , b m ) those intervals I n which contain some partition point t j ∈ D, ordered so that b j < a j+1 for all j ∈ {1, . . . m − 1}. Call a block a consecutive run of partition points t j , t j+1 , . . . , t n either all in J r for some r ∈ {1, . . . , m}, in which case we call it red, or all outside I , in which case we call it blue. We call a consecutive pair of partition points t i , t i+1 ∈ D which lie in different blocks either red-red, red-blue, or blue-red depending on their respective blocks (note there are no blue-blue pairs). For convenience of notation, set J 0 = (a 0 , b 0 ) := (−∞, 0) and J m+1 = (a m+1 , b m+1 ) := (T, ∞).
For r ∈ {1, . . . , m} and a red block t j , t j+1 , . . . , t n in J r we have For r ∈ {0, . . . , m} and a blue block t j , t j+1 , . . . , t n between J r , J r +1 we have n i= j+1 d(x t i−1 , x t i ) p = n i= j+1 d(y t i−1 , y t i ) p ≤ ω y (b r , a r +1 ).
For r ∈ {1, . . . , m} and a red-blue pair t i , t i+1 with t i ∈ J r and t i+1 between J r , J r +1 , we have For r ∈ {0, . . . , m − 1} and a blue-red pair t i , t i+1 with t i between J r , J r +1 and t i+1 ∈ J r +1 , we have Finally for r ∈ {1, . . . , m − 1} a red-red pair t i , t i+1 with t i ∈ J r and t i+1 ∈ J r +1 , we have Since ω x and ω y are super-additive, the conclusion now follows from splitting the sum k i=1 d(x t i−1 , x t i ) p into blocks and consecutive pairs in different blocks.
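The sums of p-th powers of increments over a partition that appear in the argument above can be maximised exactly when the path is a finite sequence of points. The following is a minimal sketch of that computation by dynamic programming; the function name `p_variation` and the default distance on the real line are assumptions made for illustration, and the quadratic running time is not optimised.

```python
from typing import Callable, Sequence, TypeVar

Point = TypeVar("Point")


def p_variation(
    points: Sequence[Point],
    p: float,
    dist: Callable[[Point, Point], float] = lambda a, b: abs(a - b),
) -> float:
    """p-variation of a finite sequence of points in a metric space.

    best[j] holds the largest value of sum d(x_s, x_t)^p over all
    increasing index sequences that start at 0 and end at j.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if len(points) < 2:
        return 0.0
    best = [0.0] * len(points)
    for j in range(1, len(points)):
        best[j] = max(best[i] + dist(points[i], points[j]) ** p for i in range(j))
    # Appending the final point never decreases the sum, so best[-1] is the supremum.
    return best[-1] ** (1.0 / p)


# For p = 1 every point is used (total variation); for p = 2 a monotone
# path is best measured by the single increment from start to end.
print(p_variation([0.0, 1.0, 0.0, 1.0], 1.0))  # 3.0
print(p_variation([0.0, 1.0, 2.0], 2.0))       # 2.0
```

The second call shows why the supremum over partitions matters: using all three points gives 1 + 1 = 2 for the sum of squares, whereas the coarser partition consisting of the two endpoints gives 4.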

Appendix B: Infinite p-variation of Lévy processes in Lie groups
The purpose of this section is to establish conditions under which sample paths of a Lévy process have infinite p-variation. The methods are all well-known for the case G = R d , so we mostly provide indications of how they extend to a general Lie group. Throughout this section, we use the notation from Section 2. Let X be a Lévy process in a Lie group G with triplet (A, B, ) and let τ = inf {t ≥ 0 | X t ∉ U } be the first exit time of X from U . Let y 0 ∈ g\ log(U ) be a distinguished point and consider the g-valued process For ε > 0, define the g-valued process Y ε t := ε −1/2 (Y εt ) for t ∈ [0, 1]. Let B be a g-

Proof Note that for every ε > 0, Y ε can be considered as a g-valued Markov process (for which every point outside ε −1/2 log(U ) is absorbing). Writing L ε and L B for the generators of Y ε and B respectively, it suffices to show that L ε f → L B f in C 0 (g) for all f ∈ C ∞ c (g) (see, e.g., [27,Chapter 17]). This in turn follows from writing the generator of X in the log chart and performing a straightforward limiting argument.

Proof Let c > 0. Lemma B.1 implies that there exist δ, ε 0 > 0 such that for all 0 < ε ≤ ε 0 P[ε −1/2 |ξ i (X ε )| > c] > δ.
The following is a form of the classical Blumenthal-Getoor index [4] adapted to the setting of Lie groups. Recall the definitions of i and K from Section 2.3. if either (i) q ∈ i , or (ii) i ∈ K and q < 1.
Proof Define f ∈ C c (G) by f (x) = 1−exp(−|ξ i (x)| q ). Since X has independent and stationary increments, we can readily show (cf. [4, p. 499]) that (B.3) holds whenever It thus suffices to show that (B.4) holds in both cases of (i) and (ii): (i) Let (ψ n ) n≥1 be a non-decreasing sequence of non-negative functions in C ∞ (R), each vanishing on some neighbourhood of zero, and such that lim n→∞ ψ n (x) = |x| q for all x ∈ R. Then for f n (x) := 1 − exp(−ψ n (ξ i (x))), we have where the final convergence follows from q ∈ i . Since 0 ≤ f n ≤ f , we obtain (B.4).
(ii) Since q < 1, for every integer n ≥ 1 we can find ψ n ∈ C ∞ (R) such that |ψ n (x)| ≤ |x| q for all x ∈ R and such that ψ n (x) = nx/ B i for all x in a neighbourhood V n of zero. Note that we may suppose A i,i = 0 and q ∉ i (as otherwise the desired result follows by Corollary B.3 or by case (i)). Then for f n (x) := 1 − exp(−ψ n (ξ i (x))), a straightforward calculation shows that where the final convergence follows from q ∉ i and | f n (x)| ≤ C|ψ n (ξ i (x))|. Since again f n ≤ f , we obtain (B.4).