1 Introduction

This paper addresses several questions regarding Lévy processes and random walks in homogeneous groups, with a particular focus on applications to rough paths theory. Let G be a homogeneous group (in the sense of [15]) equipped with a sub-additive homogeneous norm and the corresponding left-invariant metric. We can summarise the three main results of the paper as follows.

  • (Theorem 5.1) Given a Lévy process \(\mathbf {X}\) in G, we determine (almost) all values of \(p > 0\) for which the sample paths of \(\mathbf {X}\) have almost surely finite p-variation.

  • (Theorem 5.5) We give sufficient conditions for a sequence of (interpolated and reparametrised) random walks in G to converge weakly to an (interpolated and reparametrised) Lévy process in G in the p-variation topology.

  • (Theorem 5.17) In the case that \(G = G^N(\mathbb {R}^d)\), the step-N free nilpotent Lie group over \(\mathbb {R}^d\), we determine a Lévy–Khintchine formula for the characteristic function (in the sense of [11]) of the signature of the random rough path constructed from a Lévy process in G.

We apply the second of these results in the context of rough paths to show weak convergence of stochastic flows in several examples. Notably, we provide a significant generalisation of a result of Kunita [29] and of a related result of Breuillard, Friz and Huesmann [7].

We take a moment to discuss how our work relates to the appearance of càdlàg rough paths in the current literature. Friz and Shekhar [17] recently introduced a broad extension of rough paths theory to the càdlàg setting. Their work in particular generalises the notion of rough integration and RDEs and significantly extends earlier work of Williams [36] who gave pathwise solutions to differential equations driven by Lévy processes in \(\mathbb {R}^d\).

As a family of càdlàg rough paths of particular interest, Lévy processes in \(G^N(\mathbb {R}^d)\) of finite p-variation for some \(1 \le p < N + 1\) were studied in [17]. Such Lévy p-rough paths bear a resemblance to Markovian rough paths constructed from subelliptic Dirichlet forms on \(L^2(G^N(\mathbb {R}^d))\), first studied in [19] and recently in [10,11,12], in the sense that both processes may be viewed as stochastic rough paths whose evolution depends entirely on their first N iterated integrals.

The method we employ here to give meaning to càdlàg rough paths is to connect left- and right-limits with continuous paths and treat the resulting object as a classical rough path. We therefore do not address directly the concept of a càdlàg RDE in this paper, but emphasise that our methods relate closely to Marcus SDEs and that Theorems 5.1 and 5.17 can be seen as generalisations of two related results in [17] (discussed further in Sect. 5). We mention however that the method of proof used for our main results, which is based on approximating a Lévy process by a sequence of random walks, is different to the methods used in [17].

We also point out that our methods treat general interpolations, which depend arbitrarily on the endpoints of jumps, on the same footing as the simpler linear interpolation used in Marcus SDEs. Examples of interest of such non-linear interpolations date back to the works of McShane [33] and Sussman [35] on approximations of Brownian motion (discussed further in Examples 5.12 and 5.14), and appear more recently in the work of Flint, Hambly and Lyons [14].

A crucial result for our analysis, which we believe to be of independent interest, is a criterion for tightness of p-variation of strong Markov processes taking values in a Polish space (Theorem 4.8). This result is a generalisation of the main result of Manstavičius [32], which provides a criterion for a strong Markov process to have sample paths of a.s. finite p-variation. Our proof of Theorem 4.8 is a simplification of the stopping-time technique adopted in [32].

Finally, we mention that while most applications presented in this paper concern geometric rough paths, and thus only require consideration of the free nilpotent Lie group, we have attempted to make statements in their natural level of generality. In particular, we believe that our results may prove to be of interest for studying random walks and Lévy processes in the Butcher group, which correspond to branched rough paths in the sense of [21, 22] (see also Remark 3.2 below).

1.1 Outline of the paper

In Sect. 2 we discuss iid arrays and Lévy processes taking values in a general Lie group. Our only contribution in this section is the construction of a sequence of random walks \((\mathbf {X}^n)_{n \ge 1}\) associated with a Lévy process \(\mathbf {X}\) such that \(\mathbf {X}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) in the Skorokhod topology, and for which tightness of p-variation is simple to verify. In Sect. 3 we recall several preliminary facts about homogeneous groups and spaces of paths of finite p-variation.

Section 4 is devoted to the proof of Theorem 4.1, which shows tightness of p-variation for a collection of random walks in a homogeneous group. This is a central result used in the proofs of the three main aforementioned theorems, which we state and prove in Sect. 5. In Sect. 5.3.1 we also provide several applications of Theorem 5.5 to weak convergence of stochastic flows.

In “Appendix A” we introduce the concept of path functions, which serve to connect the left- and right-limits of càdlàg paths, and collect several technical results used throughout Sect. 5. In “Appendix B” we describe conditions under which sample paths of a Lévy process possess infinite p-variation (used to complete the proof of Theorem 5.1).

2 Iid arrays and Lévy processes in Lie groups

2.1 Notation

Throughout this section, we fix a Lie group G with Lie algebra \(\mathfrak g\), and identify \(\mathfrak g\) with the space of left-invariant vector fields on G. Let \(u_1, \ldots , u_m\) be a basis for \(\mathfrak g\). We equip \(\mathfrak g\) with the inner product for which \(u_1,\ldots , u_m\) is an orthonormal basis. For an element \(y \in \mathfrak g\) we write \(y = \sum _{i=1}^m y^iu_i\). When x is an element of a normed space, we denote its norm by |x|.

We further fix an open neighbourhood \(U \subset G\) of the identity \(1_G \in G\), such that U has compact closure and \(\exp : \mathfrak g\mapsto G\) is a diffeomorphism from a neighbourhood of zero in \(\mathfrak g\) onto U. Let \(\xi _i \in C^\infty _c(G, \mathbb {R})\) be smooth functions of compact support such that \(\log (x) = \sum _{i=1}^m \xi _i(x) u_i\) for all \(x \in U\) (that is, \(\xi _i\) provide exponential coordinates of the first kind on U). We denote \(\xi : G \mapsto \mathfrak g, \xi (x) = \sum _{i=1}^m \xi _i(x) u_i\).

For a metric space E, denote by D([0, T], E) the space of càdlàg functions \(\mathbf {x}: [0,T] \mapsto E\) equipped with the Skorokhod topology (see, e.g., [3, Section 12]). We shall use the symbol o to denote spaces of paths whose starting point is the identity element \(1_G\). For example \(D_o([0,T],G)\) denotes the set of all \(\mathbf {x}\in D([0,T],G)\) such that \(\mathbf {x}_0 = 1_G\).

2.2 Preliminaries on iid arrays and Lévy processes

An array in G is a sequence of finite collections of G-valued random variables \(\left( X_{n1},\ldots , X_{nn}\right) _{n \ge 1}\). We call the array iid if, for every \(n \ge 1\), the random variables \(X_{n1},\ldots , X_{nn}\) are iid. We will always suppose that an iid array \(X_{nj}\) is infinitesimal, i.e., \(\lim _{n \rightarrow \infty } \mathbb {P} \left[ X_{n1} \notin V \right] = 0\) for every neighbourhood V of \(1_G\). Furthermore, for all \(n \ge 1\) we let

$$\begin{aligned} B_n \,{:=}\, \mathbb E \left[ \xi (X_{n1}) \right] \in \mathfrak g, \end{aligned}$$

and for all \(i,j\in \{1,\ldots , m\}\)

$$\begin{aligned} A_n^{i,j} \,{:=}\, \mathbb E \left[ \xi _i(X_{n1})\xi _j(X_{n1}) \right] . \end{aligned}$$

For a collection of elements \(x_{1},\ldots , x_{n}\) in G, we define the associated walk \(\mathbf {x}\in D_o([0,1],G)\) by

$$\begin{aligned} \mathbf {x}_{t} = {\left\{ \begin{array}{ll} 1_G &{}\text{ if } t \in [0,n^{-1}) \\ x_{1}\ldots x_{\lfloor tn \rfloor } &{}\text{ if } t \in [n^{-1},1], \end{array}\right. } \end{aligned}$$

and for an array \(X_{nj}\), the associated random walk \(\mathbf {X}^n\) refers to the sequence of walks associated with the collections \((X_{n1},\ldots ,X_{nn})\).
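For instance, with \(n = 3\) the walk associated with \(x_1, x_2, x_3 \in G\) is the piecewise constant path

$$\begin{aligned} \mathbf {x}_{t} = {\left\{ \begin{array}{ll} 1_G &{}\text{ if } t \in [0,1/3) \\ x_{1} &{}\text{ if } t \in [1/3,2/3) \\ x_{1}x_{2} &{}\text{ if } t \in [2/3,1) \\ x_{1}x_{2}x_{3} &{}\text{ if } t = 1, \end{array}\right. } \end{aligned}$$

so that \(\mathbf {x}\) is càdlàg with jump \(\mathbf {x}_{t-}^{-1}\mathbf {x}_t = x_k\) at \(t = k/3\).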

Recall that a (left) Lévy process in G is a \(D_o([0,T], G)\)-valued random variable \(\mathbf {X}\) with independent and stationary (right) increments. We refer to Liao [30] for further details.

We call a Lévy triplet (or simply triplet) a collection \((A, B, \Pi )\) of an \(m\times m\) covariance matrix \((A^{i,j})_{i,j = 1}^m\), an element \(B = \sum _{i=1}^m B^i u_i \in \mathfrak g\), and a Lévy measure \(\Pi \) on G (see [30, p. 12]).

A classical theorem of Hunt [26] asserts that for every Lévy process \(\mathbf {X}\) in G, there exists a unique triplet \((A,B,\Pi )\) such that the generator of \(\mathbf {X}\) is given for all \(f \in C^2_0(G)\) and \(x \in G\) by

$$\begin{aligned} \lim _{t \rightarrow 0} t^{-1}\mathbb E \left[ f(x\mathbf {X}_t) - f(x) \right] &= \sum _{i=1}^m B^i (u_i f)(x) + \frac{1}{2}\sum _{i,j = 1}^m A^{i,j} (u_i u_j f)(x)\\&\quad + \int _G \left[ f(xy) - f(x) - \sum _{i = 1}^m \xi _i(y)(u_i f)(x)\right] \Pi (dy). \end{aligned}$$

Conversely, every Lévy triplet gives rise to a unique Lévy process.
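For orientation, we note (as a standard consistency check, not needed in the sequel) that in the abelian case \(G = \mathbb {R}^d\), where \(u_i f = \partial _i f\) and \(\xi (y) = y\) near the origin, Hunt's formula reduces to the classical Lévy–Khintchine generator

$$\begin{aligned} \lim _{t \rightarrow 0} t^{-1}\mathbb E \left[ f(x + \mathbf {X}_t) - f(x) \right] = B \cdot \nabla f(x) + \frac{1}{2}{\text {tr}}\left( A \nabla ^2 f(x)\right) + \int _{\mathbb {R}^d} \left[ f(x+y) - f(x) - \xi (y)\cdot \nabla f(x)\right] \Pi (dy), \end{aligned}$$

with \(\xi \) playing the role of the usual truncation function.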

We will heavily use a characterisation due to Feinsilver [13] of when a G-valued random walk converges in law to a Markov process as a \(D_o([0,1], G)\)-valued random variable. The following is a special case of the main results of [13].

Theorem 2.1

(Feinsilver [13]). Let \(X_{nj}\) be an iid array of G-valued random variables and \(\mathbf {X}^n\) the associated random walk. Denote by \(F_n\) the probability measure on G associated with \(X_{n1}\). Let \(\mathbf {X}\) be a Lévy process in G with triplet \((A,B,\Pi )\).

Then \(\mathbf {X}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as \(D_o([0,1],G)\)-valued random variables if and only if

  (1) \(\lim _{n \rightarrow \infty } n F_n(f) = \Pi (f)\) for every \(f \in C_b(G)\) which is identically zero on a neighbourhood of \(1_G\),

  (2) \(\lim _{n \rightarrow \infty } n B_n = B\), and

  (3) for all \(i,j\in \{1,\ldots , m\}\),

    $$\begin{aligned} \lim _{n \rightarrow \infty } nA^{i,j}_n = A^{i,j} + \int _G \xi _i(x)\xi _j(x) \Pi (dx). \end{aligned}$$

The following notion of a scaling function will be used throughout the paper.

Definition 2.2

(Scaling function). A continuous bounded function \(\theta : G \rightarrow \mathbb {R}\) is called a scaling function if

  (i) \(\theta (1_G) = 0\),

  (ii) \(\theta (x) > 0\) for all \(x \ne 1_G\),

  (iii) there exists \(C > 0\) such that \(|\xi |^2 \le C\theta \), and

  (iv) there exists \(c > 0\) such that \(\theta (x) > c\) for all \(x \in G{\setminus } U\).

Let \(X_{nj}\) be an iid array in G. We say that \(\theta \) scales the array \(X_{nj}\) if

$$\begin{aligned} \sup _{n\ge 1} n\mathbb E \left[ \theta (X_{n1}) \right] < \infty . \end{aligned}$$

The importance of the above definition is that, given a scaling function \(\theta \) which scales \(X_{nj}\), the rate at which \(\theta \) decays at \(1_G\) determines the values of \(p > 0\) for which the p-variation of the associated random walk is tight (Theorem 4.1).

Example 2.3

In the case \(G = \mathbb {R}^d\), the prototypical example of a scaling function is \(1 \wedge |\cdot |^2\). For a general Lie group G, the example extends as follows: let \(c > 0\) be sufficiently small such that \(W \,{:=}\, \{\exp (y) \mid y \in \mathfrak g, |y| \le c\}\) is contained in U. Then

$$\begin{aligned} \theta (x) \,{:=}\, \mathbf 1 \{x \in U\}(c^2 \wedge |\xi (x)|^2) + c^2\mathbf 1 \{x \notin U\} \end{aligned}$$
(2.1)

is a scaling function.
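To illustrate the scaling condition, consider \(G = \mathbb {R}^d\) with \(\theta = 1 \wedge |\cdot |^2\) and the Donsker-type array \(X_{nj} = n^{-1/2}Y_j\) for iid random variables \(Y_j\) with \(\mathbb E \left[ |Y_1|^2 \right] < \infty \). Then \(\theta \) scales the array since

$$\begin{aligned} n\mathbb E \left[ 1 \wedge |X_{n1}|^2 \right] \le n \cdot n^{-1}\mathbb E \left[ |Y_1|^2 \right] = \mathbb E \left[ |Y_1|^2 \right] < \infty . \end{aligned}$$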

Remark 2.4

Suppose that \(\theta \) is defined by (2.1) and that \(X_{nj}\) is an iid array in G such that the associated random walk converges in law to a Lévy process. Then a simple consequence of Theorem 2.1 is that \(\theta \) scales the array \(X_{nj}\).

2.3 Approximating walk

In this subsection, given a Lévy process \(\mathbf {X}\) in G, we construct an iid array \(X_{nj}\) for which the associated random walk \(\mathbf {X}^n\) converges in law to \(\mathbf {X}\). The array \(X_{nj}\) has the advantage that it takes values in either the support of the Lévy measure of \(\mathbf {X}\), or in a set which shrinks to the identity as \(n \rightarrow \infty \). This makes the walk \(\mathbf {X}^n\) significantly easier to analyse than the increments of \(\mathbf {X}\) itself and will be used in the proofs of Theorems 5.1 and 5.17.

Throughout this subsection, let \(\mathbf {X}\) be a Lévy process in G with triplet \((A, B, \Pi )\). For \(i\in \{1, \ldots , m\}\) define

Define also the sets of indexes

For \(k \in \widetilde{K}\) define

$$\begin{aligned} \widetilde{B}^k = B^k - \int _{G} \xi _k(x) \Pi (dx), \end{aligned}$$

and let .

For n sufficiently large so that \(\Pi (U^c) < n/2\), let

Define and note that \(w_n \,{:=}\, \Pi \{U_n^c\} \le n/2\). Remark that \(\lim _{n \rightarrow \infty } h_n = 0\), which implies that \(U_n\) shrinks to \(1_G\) as \(n \rightarrow \infty \).

Define on G the probability measure \(\mu _n(dx) \,{:=}\, w_n^{-1}\mathbf 1 \{x \in U_n^c\} \Pi (dx)\). Observe that by Hölder’s inequality, for all \(q \ge 1\)

$$\begin{aligned} \int _{U_n^c} |\xi _i(x)| \Pi (dx) \le (n/2)^{1-1/q}\left( \int _{G}|\xi _i(x)|^q\Pi (dx)\right) ^{1/q}. \end{aligned}$$
(2.2)
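Indeed, spelling out the previous display: applying Hölder's inequality with exponents \(q/(q-1)\) and q to the measure \(\Pi \) restricted to \(U_n^c\), and using \(w_n = \Pi \{U_n^c\} \le n/2\), gives

$$\begin{aligned} \int _{U_n^c} |\xi _i(x)| \Pi (dx) \le \Pi \{U_n^c\}^{1-1/q}\left( \int _{U_n^c}|\xi _i(x)|^q\Pi (dx)\right) ^{1/q} \le (n/2)^{1-1/q}\left( \int _{G}|\xi _i(x)|^q\Pi (dx)\right) ^{1/q}. \end{aligned}$$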

For every \(n \ge 1\), let \(Y_n = Y_n^1u_1 + \cdots + Y_n^m u_m\) be a \(\mathfrak g\)-valued random variable such that for all \(k \in \widetilde{K}\)

$$\begin{aligned} b_n^k \,{:=}\, \mathbb E \left[ Y_n^k \right] = (1-w_n/n)^{-1}n^{-1} \widetilde{B}^k, \end{aligned}$$

and for all \(k \notin \widetilde{K}\)

$$\begin{aligned} b_n^k \,{:=}\, \mathbb E \left[ Y_n^k \right] = (1-w_n/n)^{-1}n^{-1} \left( B^k - \int _{U_n^c}\xi _k(x)\Pi (dx)\right) , \end{aligned}$$

and with covariances for all \(i,j\in \{1,\ldots , m\}\)

$$\begin{aligned} \mathbb E \left[ (Y_n^i - b_n^i)(Y_n^j-b_n^j) \right] = (1-w_n/n)^{-1}n^{-1}A^{i,j}. \end{aligned}$$

In particular, note that \(Y_n^i = b_n^i\) a.s. for all \(i \notin J\). Remark that setting \(q = 2\) in (2.2) implies

$$\begin{aligned} \lim _{n\rightarrow \infty } n^{-1} \int _{U_n^c} |\xi _i(x)| \Pi (dx) = 0, \end{aligned}$$

from which it follows that \(\sup _{n \ge 1}n|b^i_n| < \infty \). Moreover, it holds that

$$\begin{aligned} \lim _{n \rightarrow \infty }\mathbb E \left[ (Y_n^i - b_n^i)(Y_n^j-b_n^j) \right] = 0. \end{aligned}$$

It follows that we can choose \(Y_n\) such that \(\exp (Y_n)\) has support in a neighbourhood \(V_n\) of \(1_G\) which shrinks to \(1_G\) as \(n \rightarrow \infty \). Denote by \(\nu _n\) the probability measure of the G-valued random variable \(\exp (Y_n)\).

Finally, let \(X_{n1}\) be the G-valued random variable associated to the probability measure \((w_n/n) \mu _n + (1-w_n/n) \nu _n\), and let \(X_{n2}, \ldots , X_{nn}\) be independent copies of \(X_{n1}\).

Consider the random walk \(\mathbf {X}^n\) associated with \(X_{nj}\). Then a straightforward application of Theorem 2.1 implies that \(\mathbf {X}^n\,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as \(D_o([0,1],G)\)-valued random variables. We also record the following two simple lemmas whose proofs we omit.
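To indicate why Theorem 2.1 applies, we sketch the verification of condition (1) (the other conditions follow similarly from the prescribed moments of \(Y_n\)). Let \(f \in C_b(G)\) vanish on a neighbourhood W of \(1_G\). For all n large enough that \(V_n \subset W\) we have \(\nu _n(f) = 0\), whence

$$\begin{aligned} nF_n(f) = w_n\mu _n(f) + n(1-w_n/n)\nu _n(f) = \int _{U_n^c} f(x)\Pi (dx) \rightarrow \Pi (f) \end{aligned}$$

by dominated convergence, since \(|f| \le \left| \left| f \right| \right| _\infty \mathbf 1 \{W^c\}\), \(\Pi (W^c) < \infty \), and \(U_n\) shrinks to \(1_G\).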

Lemma 2.5

Let \(0 < q_1, \ldots , q_m \le 2\) be real numbers such that \(q_i \notin \Gamma _i\) for all \(i \in \{1,\ldots , m\}\), \(q_i = 2\) for all \(i \in J\), and \(q_i \ge 1\) for all \(i \in K\). Let \(\theta \) be a scaling function such that \(\theta (x) = \sum _{i = 1}^m |\xi _i(x)|^{q_i}\) for x in a neighbourhood of \(1_G\). Then \(\theta \) scales the array \(X_{n1},\ldots , X_{nn}\).

Lemma 2.6

Let \(\theta \) be a scaling function on G which scales \(X_{nj}\). Let V be a neighbourhood of \(1_G\), and let \(f : \mathrm {supp}(\Pi ) \cup V \mapsto \mathbb {R}\) be a bounded measurable function such that f is continuous on \(\mathrm {supp}(\Pi )\). Furthermore, suppose that

$$\begin{aligned} \lim _{x \rightarrow 1_G} \frac{1}{\theta (x)} \left| f(x) - f(1_G) - \sum _{i=1}^m b_i\xi _i(x) - \frac{1}{2}\sum _{i,j=1}^m a_{i,j}\xi _i(x)\xi _j(x) \right| = 0. \end{aligned}$$

Then for all n sufficiently large, \(X_{n1} \in \mathrm {supp}(\Pi ) \cup V\) a.s., and

$$\begin{aligned} \lim _{n \rightarrow \infty } n\mathbb E \left[ f(X_{n1}) - f(1_G) \right] = Q, \end{aligned}$$

where

$$\begin{aligned} Q \,{:=}\, \sum _{i=1}^m B^ib_i + \frac{1}{2}\sum _{i,j=1}^m A^{i,j}a_{i,j} + \int _{G} \left[ f(x) - f(1_G) - \sum _{i=1}^m b_i\xi _i(x) \right] \Pi (dx). \end{aligned}$$

3 Homogeneous groups

In this section we collect several preliminary facts about homogeneous groups. For details, we refer to [15] and [25].

Throughout this section, we fix a homogeneous group G. That is, G is a nilpotent, connected, and simply connected Lie group endowed with a one-parameter family of dilations (group automorphisms) \((\delta _\lambda )_{\lambda > 0}\), which, upon identifying G with its Lie algebra \(\mathfrak g\) by the \(\exp \) map, is given by

$$\begin{aligned} \delta _\lambda (u_i) = \lambda ^{d_i} u_i \end{aligned}$$

for a basis \(u_1,\ldots , u_m\) of \(\mathfrak g\) and real numbers \(d_m \ge \cdots \ge d_1 \ge 1\). We equip G with a sub-additive homogeneous norm \(\left| \left| \cdot \right| \right| \) which induces a left-invariant metric \(d(x,y) = \left| \left| x^{-1}y \right| \right| \) (see [25]).

For the remainder of the section, we identify G with \(\mathfrak g\) by the diffeomorphism \(\exp : \mathfrak g\mapsto G\), and write \(x = \sum x^i u_i\) for \(x \in G\). Note that \(\left| \left| \left| x \right| \right| \right| = \sum _{i=1}^m |x^i|^{1/d_i}\) is also a homogeneous norm on G and thus equivalent to \(\left| \left| \cdot \right| \right| \).

For a multi-index \(\alpha = (\alpha ^1,\ldots , \alpha ^m)\), \(\alpha ^i \ge 0\), we define \(\deg (\alpha ) = \sum _{i=1}^m \alpha ^i d_i\), and for \(x \in G\), write \(x^\alpha = (x^1)^{\alpha ^1}\ldots (x^m)^{\alpha ^m}\). By the Campbell–Baker–Hausdorff (CBH) formula, for all \(i \in \{1,\ldots , m\}\) there exist constants \(C^i_{\alpha ,\beta }\) such that

$$\begin{aligned} (xy)^i = x^i + y^i + \sum _{\alpha ,\beta } C^i_{\alpha ,\beta } x^\alpha y^\beta , \end{aligned}$$
(3.1)

where the (finite) sum runs over all non-zero multi-indexes \(\alpha ,\beta \) such that \(\deg (\alpha ) + \deg (\beta ) = d_i\).
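As a concrete illustration, consider the 3-dimensional Heisenberg group: \(m = 3\), \(d_1 = d_2 = 1\), \(d_3 = 2\), with the only non-trivial bracket \([u_1, u_2] = u_3\). Here the CBH formula \(\log (\exp u \exp v) = u + v + \frac{1}{2}[u,v]\) terminates after the first bracket, and (3.1) reads

$$\begin{aligned} (xy)^1 = x^1 + y^1, \quad (xy)^2 = x^2 + y^2, \quad (xy)^3 = x^3 + y^3 + \frac{1}{2}\left( x^1y^2 - x^2y^1\right) , \end{aligned}$$

so that for \(i = 3\) the only non-zero constants are \(C^3_{\alpha ,\beta } = \pm \frac{1}{2}\) with \(\deg (\alpha ) = \deg (\beta ) = 1\).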

Example 3.1

Recall that a Lie group G is called graded if its Lie algebra is endowed with a decomposition

$$\begin{aligned} \mathfrak g= \mathfrak g^1 \oplus \ldots \oplus \mathfrak g^N \end{aligned}$$
(3.2)

such that \([\mathfrak g^i,\mathfrak g^j] \subseteq \mathfrak g^{i+j}\), where \(\mathfrak g^k = 0\) for \(k > N\) (and where we allow the possibility that \(\mathfrak g^k = 0\) for some \(k \le N\)). Every graded Lie group can be equipped with a natural family of dilations \((\delta _{\lambda })_{\lambda > 0}\), and thus a homogeneous structure, for which \(d_1,\ldots , d_m\) are rational numbers with \(d_1 = 1\), given by \(\delta _\lambda (u) = \lambda ^{k/\alpha }u\) for all \(u \in \mathfrak g^k\), where \(\alpha = \min \{k \ge 1 \mid \mathfrak g^k \ne 0\}\) (and conversely, if \(d_1,\ldots , d_m\) are rational for a homogeneous group G, then G can be given a graded structure [15, p. 5]).

Recall also that a graded Lie group G is called a step-N Carnot group (or stratified group in the terminology of [15]) if the decomposition (3.2) further satisfies \([\mathfrak g^i,\mathfrak g^j] = \mathfrak g^{i+j}\), where \(\mathfrak g^k = 0\) for \(k > N\). Every Carnot group is a homogeneous group with a natural family of dilations given by \(\delta _\lambda (u) = \lambda ^k u\) for all \(u \in \mathfrak g^k\) (so that \(d_i \in \{1,\ldots , N\}\)), and for which the metric d can be taken as the Carnot–Carathéodory distance [2, p. 38].

The Carnot group which will be particularly relevant in Sect. 5.3 for applications in rough paths theory is the step-N free nilpotent Lie group \(G^N(\mathbb {R}^d)\) over \(\mathbb {R}^d\), which we recall is, by definition, the space where geometric p-rough paths (for \(\lfloor p \rfloor = N\)) take values. For further details concerning the theory of geometric rough paths, we refer to [18].

Remark 3.2

Another homogeneous group which plays an important role in the theory of rough paths is the step-N Butcher group \(\mathcal {G}^N(\mathbb {R}^d)\) over \(\mathbb {R}^d\) (see [21, 22]). Recall that \(G^N(\mathbb {R}^d)\) is canonically embedded in \(\mathcal {G}^N(\mathbb {R}^d)\), and that \(\mathcal {G}^N(\mathbb {R}^d)\) admits a natural grading under which \(\mathcal {G}^N(\mathbb {R}^d)\) is not a Carnot group (see [22, Remark 2.15]).

The group \(\mathcal {G}^N(\mathbb {R}^d)\) is, by definition, the space where branched rough paths take values (and these form a genuine extension of the notion of geometric rough paths). We mention that branched rough paths were recently studied in [8] to give a rough path perspective on renormalisation of stochastic PDEs in the theory of regularity structures [9, 23]. Lévy processes in \(\mathcal {G}^N(\mathbb {R}^d)\) in particular form a family of stationary stochastic processes closed under appropriate renormalisation maps (see [8, Section 4]).

3.1 Paths of finite p-variation

For \(p > 0\) and functions \(\mathbf {x},\mathbf {y}: [s,t] \mapsto G\), define the p-variation distance

$$\begin{aligned} d_{p -var ;[s,t]}(\mathbf {x},\mathbf {y}) = d(\mathbf {x}_s,\mathbf {y}_s) + \sup _{\mathcal {D}\subset [s,t]} \left( \sum _{t_j \in \mathcal {D}} d(\mathbf {x}_{t_j,t_{j+1}}, \mathbf {y}_{t_j,t_{j+1}})^p \right) ^{1/p}, \end{aligned}$$

where the supremum runs over all partitions \(\mathcal {D}\) of [s, t] and where we have used the shorthand notation \(\mathbf {x}_{u,v} = \mathbf {x}_u^{-1} \mathbf {x}_v\). Define also the p-variation of \(\mathbf {x}\) by \(\left| \left| \mathbf {x} \right| \right| _{p -var ;[s,t]} = d_{p -var ;[s,t]}(\mathbf {x},1_G)\), and

$$\begin{aligned} d_{0;[s,t]}(\mathbf {x},\mathbf {y}) = d(\mathbf {x}_s,\mathbf {y}_s) + \sup _{u,v \in [s,t]} d(\mathbf {x}_{u,v},\mathbf {y}_{u,v}) \end{aligned}$$

and

$$\begin{aligned} d_{\infty ;[s,t]}(\mathbf {x},\mathbf {y}) = \sup _{u \in [s,t]} d(\mathbf {x}_u,\mathbf {y}_u), \; \; \left| \left| \mathbf {x} \right| \right| _{\infty ;[s,t]} = d_{\infty ;[s,t]}(\mathbf {x},1_G). \end{aligned}$$
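As a simple example of these definitions, let \(u \in (0,T]\) and consider the single-jump path \(\mathbf {x}_t = 1_G\) for \(t < u\) and \(\mathbf {x}_t = x\) for \(t \ge u\). Any partition picks up the increment x in at most one sub-interval, so

$$\begin{aligned} \left| \left| \mathbf {x} \right| \right| _{p -var ;[0,T]} = \left| \left| x \right| \right| \quad \text { for every } p > 0. \end{aligned}$$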

We will drop the reference to the interval [s, t] when it is clear from the context. For convenience, we record the following standard interpolation estimates.

Lemma 3.3

  (1) For all \(p'> p > 0\) and \(\mathbf {x},\mathbf {y}: [s,t] \mapsto G\)

    $$\begin{aligned} d_{p' -var }(\mathbf {x},\mathbf {y}) \le 2^{\max \{0,(1-p)/p'\}}(\left| \left| \mathbf {x} \right| \right| _{p -var } + \left| \left| \mathbf {y} \right| \right| _{p -var })^{p/p'} d_{0}(\mathbf {x},\mathbf {y})^{1-p/p'}. \end{aligned}$$

  (2) There exists \(C > 0\) such that for all \(\mathbf {x},\mathbf {y}: [s,t] \mapsto G\) with \(\mathbf {x}_s = \mathbf {y}_s\)

    $$\begin{aligned} d_{\infty }(\mathbf {x},\mathbf {y}) \le d_{0}(\mathbf {x},\mathbf {y}) \le C \max \{d_{\infty }(\mathbf {x},\mathbf {y}), d_{\infty }(\mathbf {x},\mathbf {y})^{1/d_m}(\left| \left| \mathbf {x} \right| \right| _{\infty } + \left| \left| \mathbf {y} \right| \right| _{\infty })^{1-1/d_m}\}. \end{aligned}$$

Proof

(1) is obvious. To show (2), it follows from an application of the CBH formula (3.1) and the equivalence of \(\left| \left| \cdot \right| \right| \) and \(\left| \left| \left| \cdot \right| \right| \right| \) that for all \(g,h \in G\)

$$\begin{aligned} \left| \left| g^{-1}hg \right| \right| \le C_1 \max \{\left| \left| h \right| \right| , \left| \left| h \right| \right| ^{1/d_m}\left| \left| g \right| \right| ^{1-1/d_m}\}. \end{aligned}$$

The conclusion now follows by the identical argument used to prove [18, Proposition 8.15]. \(\square \)

For \(p \ge 1\), let \(C^{p -var }([0,T],G)\) denote the space of continuous paths of finite p-variation equipped with the metric \(d_{p -var ;[0,T]}\). Note that \(C^{p -var }([0,T],G)\) is a complete metric space due to the lower semi-continuity of \(\mathbf {x}\mapsto \left| \left| \mathbf {x} \right| \right| _{p -var ;[0,T]}\) (under pointwise convergence).

Note that, except in trivial cases, \(C^{p -var }([0,T],G)\) is non-separable. However, it is not difficult to show that \(C^{p' -var }([0,T],G)\) contains a separable subset \(C^{0,p' -var }([0,T],G)\) which contains \(C^{p -var }([0,T],G)\) for all \(1 \le p < p'\). Indeed, let \(C^g([0,T],G)\) denote the space of curves which are concatenations of one-parameter subgroups of G, i.e., all curves \(\gamma : [0,T] \mapsto G\) of the form

$$\begin{aligned} \gamma (t) = \gamma (t_{k-1}) \exp \left( \frac{t-t_{k-1}}{t_{k} - t_{k-1}} \log x_k \right) , \; \; t \in [t_{k-1},t_{k}], \; \; k \in \{1, \ldots , n\}, \end{aligned}$$
(3.3)

where \(\mathcal {D}= (t_0 = 0< t_1< \cdots < t_n = T)\) is a partition of [0, T] and \(x_1,\ldots , x_n \in G\) (and where for clarity we have broken the convention of identifying G with \(\mathfrak g\)). Then for \(p \ge 1\), define \(C^{0,p -var }([0,T],G)\) as the closure of \(C^{g}([0,T],G) \cap C^{p -var }([0,T],G)\) in \(C^{p -var }([0,T],G)\).
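For orientation: when \(G = \mathbb {R}^d\), the one-parameter subgroups are straight lines, so (3.3) becomes the piecewise linear interpolation

$$\begin{aligned} \gamma (t) = \gamma (t_{k-1}) + \frac{t-t_{k-1}}{t_{k} - t_{k-1}} x_k, \; \; t \in [t_{k-1},t_{k}], \end{aligned}$$

and \(C^{0,p -var }([0,T],\mathbb {R}^d)\) is the closure of the piecewise linear paths in p-variation distance.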

Remark 3.4

In the case that G is a Carnot group with decomposition (3.2), \(C^{0,p -var }([0,T],G)\) is precisely the closure of the horizontal lifts of smooth paths \(\gamma \in C^\infty ([0,T], \mathfrak g^1)\).

To show the claimed properties of \(C^{0,p -var }([0,T],G)\), note that for \(x \in G\), the path \(\gamma : t \mapsto \exp (t\log x)\) has finite p-variation if and only if \(x^i = 0\) for all \(i \in \{1,\ldots , m\}\) such that \(d_i > p\), in which case there exists \(C_1 = C_1(p,G)>0\) such that \(\left| \left| \gamma \right| \right| _{p -var ;[0,1]} \le C_1 \left| \left| x \right| \right| \). For \(\mathbf {x}: [0,T] \mapsto G\), and a partition \(\mathcal {D}\subset [0,T]\), let \(\mathbf {x}^\mathcal {D}\in C^g([0,T], G)\) be the interpolation of \(\mathbf {x}\) along \(\mathcal {D}\) defined as \(\gamma \) in (3.3) with \(x_k = \mathbf {x}^{-1}_{t_{k-1}}\mathbf {x}_{t_k}\) and \(\mathbf {x}^\mathcal {D}_0 = \mathbf {x}_0\). One can then readily show (e.g., by Lemma A.5) that \(\sup _{\mathcal {D}\subset [0,T]}\left| \left| \mathbf {x}^\mathcal {D} \right| \right| _{p -var } \le C_2\left| \left| \mathbf {x} \right| \right| _{p -var }\). Hence for all \(\mathbf {x}\in C^{p -var }([0,T],G)\) and \(p' > p \ge 1\), by Lemma 3.3, \(d_{p' -var ;[0,T]}(\mathbf {x}^\mathcal {D},\mathbf {x}) \rightarrow 0\) as \(|\mathcal {D}| \rightarrow 0\), which shows that \(C^{p -var }([0,T],G) \subseteq C^{0,p' -var }([0,T],G)\) as claimed. The fact that \(C^{0,p -var }([0,T],G)\) is separable (and thus Polish) is also easy to show (e.g., by considering \(\gamma \in C^g([0,T],G)\) with rational coordinates and using a similar argument as the proof of Lemma A.5).

The following result will be important in our classification of G-valued Lévy processes of finite p-variation.

Proposition 3.5

Let \(p > 0\) and \((\mathbf {X}_n)_{n \ge 1}\) be a sequence of D([0, T], G)-valued random variables such that \((\left| \left| \mathbf {X}_n \right| \right| _{p -var ;[0,T]})_{n \ge 1}\) is a tight collection of real random variables. Suppose that \(\mathbf {X}_n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as D([0, T], G)-valued random variables.

  (1) It holds that \(\left| \left| \mathbf {X} \right| \right| _{p -var ;[0,T]} < \infty \) a.s.

  (2) Suppose further that \(p \ge 1\) and \(\mathbf {X}_n, \mathbf {X}\) are C([0, T], G)-valued random variables. Then for all \(p' > p\), \(\mathbf {X}_n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as \(C^{0,p' -var }([0,T], G)\)-valued random variables.

Proof

  (1) Note that \(\mathbf {x}\mapsto \left| \left| \mathbf {x} \right| \right| _{p -var }\) is a lower semi-continuous function on D. Since D([0, T], G) is Polish, we may apply the Skorokhod representation theorem [27, Theorem 3.30], from which the conclusion easily follows.

  (2) It follows from Lemma 3.3 that every set of the form \(A \cap \{\mathbf {x}\in C([0,T],G) \mid \left| \left| \mathbf {x} \right| \right| _{p -var ;[0,T]} < R\}\), where \(R > 0\) and A is a compact subset of C([0, T], G) (for the uniform topology), is a compact subset of \(C^{0,p' -var }([0,T],G)\). Hence \((\mathbf {X}_n)_{n \ge 1}\) is a tight collection of \(C^{0,p' -var }([0,T],G)\)-valued r.v.’s, and so converges in law along a subsequence to some \(C^{0,p' -var }([0,T],G)\)-valued r.v. \(\widetilde{\mathbf {X}}\). Since \(\mathbf {X}_n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as C([0, T], G)-valued r.v.’s, it necessarily follows that \(\widetilde{\mathbf {X}} \,{\buildrel \mathcal {D}\over =}\,\mathbf {X}\), which concludes the proof.

\(\square \)

Remark 3.6

A version of Helly’s selection principle (see [34, Theorem 2.4]) states that any uniformly bounded sequence of functions \(\mathbf {x}^n : [0,T] \mapsto G\) for which \(\sup _{n \ge 1} \left| \left| \mathbf {x}^n \right| \right| _{p -var ;[0,T]} < \infty \) for some \(p \ge 1\), has a subsequence such that \(\mathbf {x}^{n_k} \rightarrow \mathbf {x}\) pointwise.

4 p-variation tightness of random walks

We continue to use the notation of the previous section. Consider an iid array \(X_{nk}\) in the homogeneous group G, and let \(\mathbf {X}^n\) be the associated random walk. The main result of this section is Theorem 4.1, which provides sufficient conditions under which \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is tight. In its simplest form, Theorem 4.1 implies that whenever \(\mathbf {X}^n\) converges in law to a Lévy process in G, and the array \(X_{nk}\) is scaled by a scaling function \(\theta \), then \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is tight for all \(p > \kappa \), where \(\kappa > 0\) depends only on the scaling function \(\theta \).

Let \(\xi _1,\ldots ,\xi _m \in C^\infty _c(G)\) and \(\xi : G \mapsto \mathfrak g\) be smooth functions and U a neighbourhood of \(1_G\) for which the conditions at the start of Sect. 2 are satisfied with respect to the basis \(u_1,\ldots , u_m\).

Theorem 4.1

Let \(X_{n1}, \ldots , X_{nn}\) be an iid array of G-valued random variables and \(\mathbf {X}^n\) the associated random walk. For every \(i\in \{1,\ldots , m\}\), let \(0 < q_i \le 2\) be a real number, and define

$$\begin{aligned} \kappa = \max \{q_1 d_1,\ldots ,q_m d_m\}. \end{aligned}$$

Consider the following conditions:

  (A) for every fixed \(h \in [0,1]\), \((\mathbf {X}^n_h)_{n \ge 1}\) is a tight collection of G-valued random variables;

  (B) for all \(i \in \{1,\ldots , m\}\), \(\sup _{n \ge 1} n\left| \mathbb E \left[ \xi _i(X_{n1}) \right] \right| < \infty \);

  (C) the array \(X_{nk}\) is scaled by a scaling function \(\theta \), where \(\theta \equiv \sum _{i=1}^m |\xi _i|^{q_i}\) on a neighbourhood of \(1_G\).

Then, provided (A), (B) and (C) hold, \((\mathbf {X}^n)_{n \ge 1}\) is a tight collection of \(D_o([0,1],G)\)-valued random variables and, for every \(p > \kappa \), \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is a tight collection of real random variables.

Remark 4.2

Suppose that for a Lévy process \(\mathbf {X}\) in G, \(\mathbf {X}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as \(D_o([0,T], G)\)-valued random variables. Then conditions (A) and (B) are automatically satisfied by Theorem 2.1 (and (C) is satisfied upon choosing \(q_i = 2\) for all \(i\in \{1,\ldots , m\}\) by Remark 2.4).

The remainder of the section is devoted to the proof of Theorem 4.1, which can be split into three parts. The first part is collected in Sect. 4.1 and comprises a general p-variation tightness criterion for strong Markov processes. The second part, which is the most technical part of the proof, is collected in Sect. 4.2 and establishes the bounds required to apply the results of Sect. 4.1 for the case \(p > d_m\). The third part is collected in Sect. 4.3 and treats the case \(p \le d_m\). Roughly speaking, in the third part we decompose \(\mathbf {X}^n\) into the lift of a walk in a lower level group, for which the previous two parts apply, and a perturbation on the higher levels, for which the p-variation can be controlled directly.

4.1 p-Variation tightness of strong Markov processes

In this section we give a criterion for p-variation tightness of strong Markov processes in a Polish space (Theorem 4.8), which is inspired by the work of Manstavičius [32].

Let (E, d) be a metric space and \(\mathbf {x}: [0,T] \mapsto E\) a function. Define

$$\begin{aligned} M(\mathbf {x}) \,{:=}\, \sup _{s,t \in [0,T]} d(\mathbf {x}_t, \mathbf {x}_s), \end{aligned}$$

and, for \(\delta > 0\),

Note that the quantity \(\nu _\delta (\mathbf {x})\) measures the maximum number of oscillations of \(\mathbf {x}\) of magnitude greater than \(\delta \) over non-overlapping intervals. Observe the following basic inequality which serves to control \(\left| \left| \mathbf {x} \right| \right| _{p -var ;[0,T]}\):

$$\begin{aligned} \left| \left| \mathbf {x} \right| \right| ^p_{p -var ;[0,T]} \le \sum _{r = 1}^\infty 2^{-rp+p}\nu _{2^{-r}}(\mathbf {x}) + M(\mathbf {x})^p\nu _{1}(\mathbf {x}). \end{aligned}$$
(4.1)
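To see (4.1), fix a partition and group its increments by size: each increment with \(d(\mathbf {x}_{t_j}, \mathbf {x}_{t_{j+1}}) \in (2^{-r}, 2^{-r+1}]\), \(r \ge 1\), contributes at most \(2^{-rp+p}\) to the sum, and over non-overlapping intervals there are at most \(\nu _{2^{-r}}(\mathbf {x})\) of them; each increment of magnitude greater than 1 contributes at most \(M(\mathbf {x})^p\), and there are at most \(\nu _{1}(\mathbf {x})\) of them. Summing over \(r \ge 1\) and taking the supremum over partitions (for paths started at \(1_G\)) yields (4.1).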

For \(\delta > 0\), define the increasing sequence of times \((\tau ^\delta _j(\mathbf {x}))_{j=0}^\infty \) by \(\tau ^\delta _0(\mathbf {x}) = 0\) and for \(j \ge 1\)

Lemma 4.3

Let \(\mathbf {X}\) be a D([0, T], E)-valued random variable for a Polish space (E, d). Let \(\delta ,h > 0\) be such that there exists \(q \in (0,1)\) for which a.s. for all \(i \ge 0\)

$$\begin{aligned} \mathbb {P} \left[ \tau ^\delta _{i+1}(\mathbf {X}) - \tau ^\delta _i(\mathbf {X}) \le h | \tau ^\delta _{i}(\mathbf {X}),\ldots , \tau ^\delta _{0}(\mathbf {X}) \right] \le q \end{aligned}$$
(4.2)

(where we use the convention \(\infty - \infty = \infty \)). Then

$$\begin{aligned} \mathbb E \left[ \nu _\delta (\mathbf {X}) \right] \le \lceil T/h \rceil \frac{1}{1-q}. \end{aligned}$$

Proof

Note that for any function \(\mathbf {x}: [0,T] \mapsto E\), it holds that \(\nu _{\delta }(\mathbf {x})\) is the largest integer j for which \(\tau ^\delta _j(\mathbf {x}) \le T\), and thus

$$\begin{aligned} \mathbb {P} \left[ \nu _{\delta }(\mathbf {X}) \ge j \right] = \mathbb {P} \left[ \tau ^{\delta }_j(\mathbf {X}) \le T \right] . \end{aligned}$$

For \(i \ge 0\), consider the event \(A_i = \{\tau _{i+1}^{\delta }(\mathbf {X}) - \tau _{i}^\delta (\mathbf {X}) > h\}\), and note that

$$\begin{aligned} \mathbb {P} \left[ \tau _j^\delta (\mathbf {X}) \le T \right] \le \mathbb {P} \left[ \text {at most } \lceil T/h \rceil \text { of } (A_i)_{i=0}^{j-1} \text { occur} \right] . \end{aligned}$$

Consider a real random variable Z distributed by the negative binomial distribution with parameters \((\lceil T/h \rceil ,q)\), i.e., Z counts the total number of iid Bernoulli trials with success probability q until exactly \(\lceil T/h \rceil \) failures occur. It follows from the uniform bound (4.2) that

$$\begin{aligned} \mathbb {P} \left[ \text {at most } \lceil T/h \rceil \text { of } (A_i)_{i=0}^{j-1} \text { occur} \right] \le \mathbb {P} \left[ Z \ge j \right] \end{aligned}$$

(where one considers \(A_i\) as a failure with probability at least \(1-q\)), so that

$$\begin{aligned} \mathbb E \left[ \nu _\delta (\mathbf {X}) \right] \le \mathbb E \left[ Z \right] = \lceil T/h \rceil \frac{1}{1-q}. \end{aligned}$$

\(\square \)
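For completeness, we recall the expectation computation used in the last step: writing \(Z = G_1 + \cdots + G_{\lceil T/h \rceil }\), where the \(G_i\) are iid geometric random variables on \(\{1, 2, \ldots \}\) counting the trials up to and including each successive failure (a failure occurring with probability \(1-q\)),

$$\begin{aligned} \mathbb E \left[ G_1 \right] = \sum _{k=1}^\infty k(1-q)q^{k-1} = \frac{1}{1-q}, \quad \text {so} \quad \mathbb E \left[ Z \right] = \lceil T/h \rceil \frac{1}{1-q}. \end{aligned}$$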

We now show how one can verify the condition of Lemma 4.3 for a strong Markov process. We first restrict attention to the set of times on which a process is allowed to move.

Definition 4.4

For a metric space (E, d) and a D([0, T], E)-valued random variable \(\mathbf {X}\), call a (deterministic) open interval \((s,t) \subset [0,T]\) stationary if

$$\begin{aligned} \mathbb {P} \left[ \forall u \in (s,t), \mathbf {X}_u = \mathbf {X}_s \right] = 1. \end{aligned}$$

Let \(Z_\mathbf {X}\subseteq [0,T]\) denote the union of all stationary intervals, and let \(R_{\mathbf {X}} = [0,T]{\setminus } Z_\mathbf {X}\) be its complement.

Example 4.5

For the random walk \(\mathbf {X}^n \in D([0,1], G)\) associated with an iid array \(X_{nj}\) in a Lie group G, we have \(R_{\mathbf {X}^n} = \{0,1/n,\ldots , (n-1)/n, 1\}\).

We emphasise that the role of \(R_\mathbf {X}\) is only technical in that it allows us to easily formulate bounds uniform in \(s \in R_\mathbf {X}\) (such as those in Theorem 4.8 and Corollary 4.9) which hold for random walks and for which the same bounds would not hold when taken uniformly over all \(s \in [0,T]\) (though for completely harmless reasons). The following lemma is a variant of Gīhman–Skorokhod [20, Lemma 2, p. 420] (in which the notion of \(R_\mathbf {X}\) does not appear).

Lemma 4.6

(Maximum inequality). Let \(\mathbf {X}\) be a càdlàg (not necessarily strong) Markov process taking values in a Polish space (E, d).

Let \(h,\delta > 0\) and suppose there exists \(c \in [0,1)\) such that

$$\begin{aligned} \sup _{s \in R_\mathbf {X}} \sup _{x \in E} \sup _{t \in [s,s+h]}\mathbb P^{s,x} \left[ d(\mathbf {X}_s,\mathbf {X}_t) > \delta \right] \le c. \end{aligned}$$
(4.3)

Then for all \(s \in R_\mathbf {X}\) and \(x \in E\), it holds that

$$\begin{aligned} \mathbb P^{s,x} \left[ \sup _{t \in [s,s+h]} d(\mathbf {X}_s,\mathbf {X}_t)> 2\delta \right] \le \frac{\mathbb P^{s,x} \left[ d(\mathbf {X}_s,\mathbf {X}_{s+h}) > \delta \right] }{1-c}. \end{aligned}$$

Proof

Let \(s \in R_\mathbf {X}\) and observe that a.s.

$$\begin{aligned} \sup _{t \in [s,s+h]} d(\mathbf {X}_s,\mathbf {X}_t) = \sup _{t \in [s,s+h] \cap R_\mathbf {X}} d(\mathbf {X}_s,\mathbf {X}_t). \end{aligned}$$

Consider a nested sequence of partitions \(\mathcal {D}_n \subset [s,s+h] \cap R_\mathbf {X}\) such that

$$\begin{aligned} \lim _{n \rightarrow \infty }\sup _{t \in [s,s+h]\cap R_\mathbf {X}} d_+(t,\mathcal {D}_n) = 0, \end{aligned}$$

where . Since \(\mathbf {X}\) is càdlàg, it holds that

$$\begin{aligned} \sup _{t \in [s,s+h] \cap R_\mathbf {X}} d(\mathbf {X}_s,\mathbf {X}_t) = \lim _{n \rightarrow \infty } \sup _{t_i \in \mathcal {D}_n} d(\mathbf {X}_s,\mathbf {X}_{t_i}), \end{aligned}$$

where the right side is non-decreasing in n since \(\mathcal {D}_n\) are nested.

It thus suffices to show that for any partition \(\mathcal {D}= (t_0=s,\ldots , t_n) \subset [s,s+h]\cap R_\mathbf {X}\), we have

$$\begin{aligned} \mathbb P^{s,x} \left[ \sup _{t_i \in \mathcal {D}} d(\mathbf {X}_s,\mathbf {X}_{t_i})> 2\delta \right] \le \frac{\mathbb P^{s,x} \left[ d(\mathbf {X}_s,\mathbf {X}_{s+h}) > \delta \right] }{1-c}. \end{aligned}$$
(4.4)

To this end, for \(i \in \{0,\ldots , n\}\), consider the events

$$\begin{aligned} C_i \,{:=}\, \{d(\mathbf {X}_{t_i},\mathbf {X}_{s+h}) > \delta \} \end{aligned}$$

and

$$\begin{aligned} B_i \,{:=}\, \{d(\mathbf {X}_s,\mathbf {X}_{t_i}) > 2\delta \}. \end{aligned}$$

Define the \(\sigma \)-algebras \(\mathcal {F}_{s,t} \,{:=}\, \sigma (\mathbf {X}_u)_{s\le u \le t}\). Observe that (4.3) implies that a.s.

Moreover, consider the disjoint events \(F_i \,{:=}\, B^c_1 \cap \ldots \cap B^c_{i-1} \cap B_{i}\). Then for all \(i \in \{1,\ldots , n\}\)

$$\begin{aligned} F_i \cap C^c_{i} \subseteq C_0. \end{aligned}$$

Since the events \(F_i\cap C^c_i\) are pairwise disjoint and each \(F_i\) is \(\mathcal {F}_{s,t_i}\)-measurable, we have

Finally, (4.4) now follows from the fact that

$$\begin{aligned} \sum _{i=1}^n \mathbb P^{s,x} \left[ F_i \right] = \mathbb P^{s,x} \left[ \sup _{t_i \in \mathcal {D}} d(\mathbf {X}_s,\mathbf {X}_{t_i}) > 2\delta \right] . \end{aligned}$$

\(\square \)

Corollary 4.7

Let \(\mathbf {X}\) be a càdlàg strong Markov process taking values in a Polish space (E, d). Let \(h,\delta > 0\) and \(c \in [0,1)\) satisfy (4.3). Then for all \(i \ge 0\), a.s.

$$\begin{aligned} \mathbb {P} \left[ \tau ^{4\delta }_{i+1}(\mathbf {X}) - \tau ^{4\delta }_{i}(\mathbf {X}) \le h | \tau ^{4\delta }_i(\mathbf {X}),\ldots ,\tau ^{4\delta }_0(\mathbf {X}) \right] \le \frac{c}{1-c}. \end{aligned}$$

Proof

Observe that \(\tau ^{4\delta }_i(\mathbf {X})\) takes values a.s. in \(R_\mathbf {X}\), and that the event \(\{\tau _{i+1}^{4\delta }(\mathbf {X}) - \tau _i^{4\delta }(\mathbf {X}) \le h\}\) is contained inside \(\{\sup _{t \in [\tau _i,\tau _i + h]} d(\mathbf {X}_{\tau _i},\mathbf {X}_{t}) > 2\delta \}\). Conditioning on the stopping times \(\{\tau ^{4\delta }_i(\mathbf {X}),\ldots ,\tau ^{4\delta }_0(\mathbf {X})\}\) and using the assumption that \(\mathbf {X}\) is a strong Markov process, the desired result now follows from Lemma 4.6. \(\square \)

We now obtain the following p-variation tightness criterion for strong Markov processes. Recall the quantity \(M(\mathbf {X}) = \sup _{s,t \in [0,T]} d(\mathbf {X}_t, \mathbf {X}_s)\).

Theorem 4.8

Let \(\mathcal {M}\) be a collection of càdlàg strong Markov processes on [0, T] taking values in a Polish space (E, d). Suppose that

  (a) \((M(\mathbf {X}))_{\mathbf {X}\in \mathcal {M}}\) is tight, and

  (b) there exist constants \(a, \kappa , b > 0\) and \(c \in [0,1/2)\) such that for all \(\delta \in (0, b]\)

    $$\begin{aligned} \sup _{\mathbf {X}\in \mathcal {M}} \sup _{s \in R_{\mathbf {X}}} \sup _{x \in E} \sup _{t \in [s,s+h(\delta )]} \mathbb P^{s,x} \left[ d(\mathbf {X}_s,\mathbf {X}_{t}) > \delta \right] \le c, \end{aligned}$$

    where \(h(\delta ) \,{:=}\, a\delta ^\kappa \).

Then for any \(p > \kappa \), \((\left| \left| \mathbf {X} \right| \right| _{p -var ;[0,T]})_{\mathbf {X}\in \mathcal {M}}\) is a tight collection of real random variables.

Proof

Let \(p > \kappa \). We claim that it suffices to show

$$\begin{aligned} \sup _{\mathbf {X}\in \mathcal {M}} \sum _{r = 0}^\infty 2^{-rp}\mathbb E \left[ \nu _{2^{-r}}(\mathbf {X}) \right] < \infty . \end{aligned}$$
(4.5)

Indeed, observe that (4.5) implies that \((\nu _{1}(\mathbf {X}))_{\mathbf {X}\in \mathcal {M}}\) is tight. It then follows, by (a) and the estimate (4.1), that (4.5) implies \((\left| \left| \mathbf {X} \right| \right| _{p -var ;[0,T]})_{\mathbf {X}\in \mathcal {M}}\) is tight as claimed.

It thus remains to show (4.5). By (b) and Corollary 4.7, it holds that for all \(\delta \in (0,b]\)

Hence, by Lemma 4.3, for all \(\delta \in (0,b]\)

$$\begin{aligned} \sup _{\mathbf {X}\in \mathcal {M}}\mathbb E \left[ \nu _{4\delta }(\mathbf {X}) \right] \le \lceil T/h(\delta ) \rceil \frac{1}{1-\frac{c}{1-c}} \le (1 + T \delta ^{-\kappa }/a)\frac{1-c}{1-2c}, \end{aligned}$$

from which (4.5) readily follows. \(\square \)
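To spell out the last step: let \(r_0 \ge 0\) be smallest such that \(2^{-r_0-2} \le b\). For \(r \ge r_0\), choosing \(\delta = 2^{-r-2}\) above gives \(\mathbb E \left[ \nu _{2^{-r}}(\mathbf {X}) \right] \le (1 + T2^{(r+2)\kappa }/a)\frac{1-c}{1-2c}\) uniformly in \(\mathbf {X}\in \mathcal {M}\), so that

$$\begin{aligned} \sum _{r = r_0}^\infty 2^{-rp}\mathbb E \left[ \nu _{2^{-r}}(\mathbf {X}) \right] \le \frac{1-c}{1-2c}\sum _{r = r_0}^\infty \left( 2^{-rp} + \frac{4^\kappa T}{a} 2^{-r(p-\kappa )}\right) < \infty \end{aligned}$$

since \(p > \kappa \); the finitely many terms \(r < r_0\) are bounded using the monotonicity \(\nu _{2^{-r}}(\mathbf {x}) \le \nu _{2^{-r_0}}(\mathbf {x})\).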

Corollary 4.9

Let \((\mathbf {X}^n)_{n \ge 1}\) be a sequence of càdlàg strong Markov processes on [0, T] taking values in a Polish space (E, d). Suppose that

  (i) for every fixed rational \(h \in [0,T]\), \((\mathbf {X}_h^n)_{n \ge 1}\) is a tight collection of E-valued random variables, and

  (ii) there exist constants \(K, \beta , \gamma , b > 0\) such that for all \(\delta \in (0, b]\) and \(h > 0\)

    $$\begin{aligned} \sup _{n \ge 1} \sup _{s \in R_{\mathbf {X}^n}} \sup _{x \in E} \sup _{t \in [s,s+h]} \mathbb P^{s,x} \left[ d(\mathbf {X}^n_s,\mathbf {X}^n_t) > \delta \right] \le K\frac{h^\beta }{\delta ^\gamma }. \end{aligned}$$
Then \((\mathbf {X}^n)_{n \ge 1}\) is a tight collection of D([0, T], E)-valued random variables, and for any \(p > \gamma /\beta \), \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,T]})_{n \ge 1}\) is a tight collection of real random variables.

Proof

First, note that (ii) applied to small h allows us to verify the Aldous condition for the sequence \((\mathbf {X}^n)_{n \ge 1}\) (see, e.g., [28, p. 188], though note one should restrict attention to sequences of stopping times \(\tau _n\) taking values in \(R_{\mathbf {X}^n}\) a.s., which is a trivial modification of the usual Aldous condition). Together with (i), it follows that \((\mathbf {X}^n)_{n \ge 1}\) is a tight collection of D([0, T], E)-valued random variables ([28, Theorems 4.8.1, 4.8.2]).

Observe that M is a continuous function on D([0, T], E), from which it follows that \((M(\mathbf {X}^n))_{n \ge 1}\) is tight. Moreover, observe that (ii) implies that there exists \(a > 0\) (explicitly, \(a = (3K)^{-1/\beta }\) works) such that for all \(\delta \in (0,b]\)

$$\begin{aligned} \sup _{n \ge 1} \sup _{s \in R_{\mathbf {X}^n}} \sup _{x \in E} \sup _{t \in [s,s+h]} \mathbb P^{s,x} \left[ d(\mathbf {X}^n_s,\mathbf {X}^n_t) > \delta \right] \le \frac{1}{3}, \end{aligned}$$

where \(h = a\delta ^{\gamma /\beta }\). It follows that the conditions of Theorem 4.8 are satisfied with \(\kappa = \gamma /\beta \), so that indeed \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,T]})_{n \ge 1}\) is tight for all \(p > \gamma /\beta \). \(\square \)

4.2 Proof of Theorem 4.1 in the case \(p > d_m\)

We continue using the notation of Sect. 3. In particular, we identify G with \(\mathfrak g\) via the \(\exp \) map.

Remark 4.10

We note here that Corollary 4.9 and the bound (4.6) in the upcoming Lemma 4.12 are sufficient to establish that conditions (A), (B) and (C) imply that \((\mathbf {X}^n)_{n \ge 1}\) is a tight collection of \(D_o([0,1],G)\)-valued random variables and that \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is tight for all \(p > \kappa \vee d_m\), which proves the statement of Theorem 4.1 subject to the restriction \(p > d_m\).

Observe that an inductive application of the CBH formula (3.1), along with the multinomial identity \((z_1+\cdots + z_n)^{j} = \sum _{k_1+\cdots + k_n = j} \left( {\begin{array}{c}j\\ k_1,\ldots ,k_n\end{array}}\right) z_1^{k_1}\ldots z_n^{k_n}\), yields the following lemma.

Lemma 4.11

For all \(x_1,\ldots , x_k \in G\) and every index \(i \in \{1,\ldots , m\}\), it holds that

$$\begin{aligned} (x_1\ldots x_k)^i = \sum _{1\le a_1 \le k} x^i_{a_1} + \sum _{r=2}^{\lfloor d_i \rfloor }\sum _{\alpha _1,\ldots , \alpha _r}\sum _{1 \le a_1< \cdots < a_r \le k} c^i_{\alpha _1,\ldots ,\alpha _r} x^{\alpha _1}_{a_1}\ldots x^{\alpha _r}_{a_r}, \end{aligned}$$

where \(\sum _{\alpha _1,\ldots , \alpha _r}\) indicates the (finite) sum over all non-zero multi-indexes \(\alpha _1,\ldots , \alpha _r\) such that \(\deg (\alpha _1) + \cdots + \deg (\alpha _r) = d_i\) and \(c^i_{\alpha _1,\ldots ,\alpha _r}\) are constants.
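In the Heisenberg group of the illustration following (3.1), Lemma 4.11 recovers the familiar discrete Lévy area: with \(d_1 = d_2 = 1\) and \(d_3 = 2\),

$$\begin{aligned} (x_1\ldots x_k)^3 = \sum _{1 \le a \le k} x^3_{a} + \frac{1}{2}\sum _{1 \le a < b \le k} \left( x^1_{a} x^2_{b} - x^2_{a} x^1_{b}\right) , \end{aligned}$$

corresponding to \(r = 2\) with \(\deg (\alpha _1) = \deg (\alpha _2) = 1\).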

Recall that \(\mathbf {X}^n\) denotes the random walk associated to the iid array \(X_{n1},\ldots , X_{nn}\).

Lemma 4.12

Use the notation from Theorem 4.1 and suppose that (B) and (C) hold. Let \(\gamma \,{:=}\, d_m \vee \kappa \), and for \(i \in \{1,\ldots , m\}\) denote by \(\mathbf {Y}^{n,i} \in D_o([0,1],\mathbb {R})\) the random walk associated with the \(\mathbb {R}\)-valued iid array \(X^i_{nk}\).

Then there exists \(K > 0\) such that for all \(n \ge 1\), \(k \in \{1,\ldots , n\}\) and \(\delta \in (0,1]\)

$$\begin{aligned} \mathbb {P} \left[ \left| \left| \mathbf {X}^n_{k/n} \right| \right| > \delta \right] \le K\frac{k/n}{\delta ^\gamma }, \end{aligned}$$
(4.6)

and, for all \(i \in \{1,\ldots , m\}\) such that \(q_i \le 1\),

$$\begin{aligned} \mathbb {P} \left[ \left| \mathbf {Y}^{n,i}_{k/n} \right| > \delta \right] \le K\frac{k/n}{\delta ^{q_i}}. \end{aligned}$$
(4.7)

Proof

We first claim that it suffices to consider the case \(\left| \left| X_{n1} \right| \right| \le \varepsilon \) a.s. for all \(n \ge 1\), where \(\varepsilon > 0\) may be taken arbitrarily small. Indeed, let \(\varepsilon > 0\) and note that there exists \(c > 0\) such that \(\theta (x) > c\) for all \(x \in G\) with \(\left| \left| x \right| \right| > \varepsilon \). Since \(\theta \) scales \(X_{nk}\), it follows that there exists \(C_1 > 0\) such that for all \(n \ge 1\)

$$\begin{aligned} \mathbb {P} \left[ \left| \left| X_{n1} \right| \right| \ge \varepsilon \right] \le c^{-1}\mathbb E \left[ \theta (X_{n1}) \right] \le C_1/n, \end{aligned}$$

and hence

$$\begin{aligned} \mathbb {P} \left[ \max _{1 \le a \le k}\left| \left| X_{na} \right| \right| \ge \varepsilon \right] = 1 - \left( 1-\mathbb {P} \left[ \left| \left| X_{n1} \right| \right| \ge \varepsilon \right] \right) ^k \le C_1k/n. \end{aligned}$$

It follows that for all \(n \ge 1\) and \(k \in \{1,\ldots , n\}\)

$$\begin{aligned} \mathbb {P} \left[ \left| \left| \mathbf {X}^n_{k/n} \right| \right|> \delta \right] \le \mathbb {P} \left[ \left| \left| \mathbf {X}^n_{k/n} \right| \right| > \delta , \max _{1 \le a \le k}\left| \left| X_{na} \right| \right| < \varepsilon \right] + C_1k/n, \end{aligned}$$

and similarly for \(\mathbb {P} \left[ \left| \mathbf {Y}^{n,i}_{k/n} \right| > \delta \right] \). Replacing \(X_{nk}\) by

$$\begin{aligned} X_{nk}' = {\left\{ \begin{array}{ll} X_{nk} &{}\text{ if } \left| \left| X_{nk} \right| \right| < \varepsilon \\ 1_G &{}\text{ otherwise }, \end{array}\right. } \end{aligned}$$

we note that (B) and (C) imply that the same conditions hold for the iid array \(X_{nk}'\). It thus suffices to prove the statement of the lemma for the iid array \(X_{nk}'\) instead as claimed.

We henceforth assume that \(\left| \left| X_{n1} \right| \right| < \varepsilon \) a.s., where \(\varepsilon > 0\) is sufficiently small so that \(x \in U\) whenever \(\left| \left| x \right| \right| < \varepsilon \). We first show (4.7). Let \(i \in \{1,\ldots , m\}\) such that \(q_i \le 1\). Then there exists \(C_2 > 0\) such that

$$\begin{aligned} \mathbb E \left[ \left| \mathbf {Y}^{n,i}_{k/n} \right| ^{q_i} \right] = \mathbb E \left[ \left| \sum _{a=1}^k X^i_{na} \right| ^{q_i} \right] \le \mathbb E \left[ \sum _{a=1}^k |X^i_{na}|^{q_i} \right] \le C_2k/n, \end{aligned}$$

where the second inequality is due to (C). It follows by Markov’s inequality that there exists \(K > 0\) such that (4.7) holds for all \(n \ge 1\), \(k \in \{1,\ldots , n\}\), and \(\delta \in (0,1]\).

We now show (4.6). By Lemma 4.11, it suffices to show that for all \(i \in \{1,\ldots , m\}\), \(r \in \{1,\ldots , \lfloor d_i \rfloor \}\), and multi-indexes \(\alpha _1,\ldots , \alpha _r\) such that \(\deg (\alpha _1) + \cdots +\deg (\alpha _r) = d_i\) (with \(\alpha _1^i = 1\) in the case that \(r=1\)), there exists \(K > 0\) such that for all \(n \ge 1\), \(k \in \{1,\ldots ,n\}\) and \(\delta \in (0,1]\)

$$\begin{aligned} \mathbb {P} \left[ \left| \sum _{1 \le a_1< \cdots < a_r \le k}X^{\alpha _1}_{na_1}\ldots X^{\alpha _r}_{na_r} \right| ^{1/d_i} > \delta \right] \le K\frac{k/n}{\delta ^\gamma }. \end{aligned}$$
(4.8)

To this end, let us fix \(i \in \{1,\ldots , m\}\), \(r \in \{1,\ldots , \lfloor d_i \rfloor \}\), and multi-indexes \(\alpha _1,\ldots , \alpha _r\) such that \(\deg (\alpha _1)+\cdots +\deg (\alpha _r) = d_i\). Consider first the case \(r \ge 2\). Define

By Markov’s and Jensen’s inequalities (observing that \(\gamma _i \le 2d_i\))

$$\begin{aligned}&\mathbb {P} \left[ \left| \sum _{1 \le a_1< \cdots< a_r \le k}X^{\alpha _1}_{na_1}\ldots X^{\alpha _r}_{na_r} \right| ^{1/d_i} > \delta \right] \nonumber \\&\quad \le \delta ^{-\gamma _i} \mathbb E \left[ \left( \sum _{1 \le a_1< \cdots < a_r \le k} X^{\alpha _1}_{na_1}\ldots X^{\alpha _r}_{na_r}\right) ^2 \right] ^{\gamma _i/2d_i}. \end{aligned}$$
(4.9)

To bound the last expression, for a multi-index \(\alpha = (\alpha ^1,\ldots ,\alpha ^m)\), denote \(|\alpha | = \alpha ^1 + \cdots + \alpha ^m\). Note that due to the assumption \(\left| \left| X_{n1} \right| \right| < \varepsilon \) a.s., (B) is equivalent to

$$\begin{aligned} \sup _{|\alpha | = 1} \sup _{n \ge 1} n\left| \mathbb E \left[ X_{n1}^\alpha \right] \right| < \infty . \end{aligned}$$
(4.10)

Furthermore, by (C) and the Cauchy–Schwarz inequality,

$$\begin{aligned} \sup _{|\alpha | \ge 2}\sup _{n \ge 1}n\mathbb E \left[ \left| X^{\alpha }_{n1} \right| \right] < \infty . \end{aligned}$$
(4.11)

Consider now the expression

$$\begin{aligned} \mathbb E \left[ \left( \sum _{1 \le a_1< \cdots < a_r \le k} X^{\alpha _1}_{na_1}\ldots X^{\alpha _r}_{na_r}\right) ^2 \right] . \end{aligned}$$
(4.12)

Since \(X_{n1},\ldots , X_{nn}\) are independent, (4.12) splits into a sum of terms of the form \(\mathbb E \left[ X_{n1}^{\beta _1} \right] \ldots \mathbb E \left[ X_{nk}^{\beta _k} \right] \) with \(\beta _i \ge 0\). Call the simple degree of such a term the number of \(\beta _i > 0\). The minimum simple degree of any term is evidently r and the maximum is 2r, and one readily sees that there exists \(C_3 > 0\) such that for all \(n \ge 1\) and \(k \in \{ 1,\ldots , n\}\), the number of terms of simple degree \(s \in \{r,\ldots , 2r\}\) is bounded above by \(C_3 k^s\). Furthermore, since \(X_{n1},\ldots , X_{nn}\) are identically distributed, it follows from (4.10) and (4.11) that there exists \(C_4 > 0\) such that the absolute value of every term of simple degree s is bounded above by \(C_4 n^{-s}\). Since \(2 \le r \le s\) and \(k \le n\), it follows that

$$\begin{aligned} \mathbb E \left[ \left( \sum _{1 \le a_1< \cdots < a_r \le k} X^{\alpha _1}_{na_1}\ldots X^{\alpha _r}_{na_r}\right) ^2 \right] \le C_5 (k/n)^{2}. \end{aligned}$$

Therefore, from (4.9) and the fact that \(d_i \le \gamma _i \le \gamma \), we obtain (4.8). This completes the case \(r \ge 2\).

It remains to consider the case \(r = 1\). Define now \(\gamma _i \,{:=}\, d_i(q_i\vee 1)\). It holds that

$$\begin{aligned} \mathbb {P} \left[ \left| \sum _{1 \le a \le k}X^{i}_{na} \right| ^{1/d_i} > \delta \right] \le \delta ^{-\gamma _i} \mathbb E \left[ \left| \sum _{1 \le a \le k}X^i_{na} \right| ^{q_i\vee 1} \right] . \end{aligned}$$

Denote \(\mu _{n} = \mathbb E \left[ X^i_{n1} \right] \). Then there exist \(C_6, C_7 > 0\) such that

$$\begin{aligned} \mathbb E \left[ \left| \sum _{1 \le a \le k}X^i_{na} \right| ^{q_i\vee 1} \right]&= \mathbb E \left[ \left| \sum _{1 \le a \le k}(X^i_{na} - \mu _n) + k\mu _n \right| ^{q_i\vee 1} \right] \\&\le C_6\left( \mathbb E \left[ \left| \sum _{1 \le a \le k}(X^i_{na} - \mu _n) \right| ^{q_i\vee 1} \right] + \left( k/n \right) ^{q_i\vee 1} \right) \\&\le C_7\left( \mathbb E \left[ \sum _{1 \le a \le k}\left| X^i_{na} - \mu _n \right| ^{q_i\vee 1} \right] + (k/n)^{q_i\vee 1} \right) , \end{aligned}$$

where the first inequality is due to (4.10), and the second inequality is due to the (discrete) Burkholder–Davis–Gundy inequality and the fact that \(q_i \le 2\). It now follows from (C) and (4.10) that

$$\begin{aligned} \mathbb E \left[ \left| \sum _{1 \le a \le k} X^i_{na} \right| ^{q_i\vee 1} \right]&\le C_8\left( k\mathbb E \left[ |X^i_{n1}|^{q_i\vee 1} \right] + k|\mu _n|^{q_i\vee 1} + (k/n)^{q_i\vee 1} \right) \\&\le C_{9}\left( k/n + k n^{-(q_i\vee 1)} + (k/n)^{q_i\vee 1} \right) \\&\le C_{10}\left( k/n\right) . \end{aligned}$$

Since \(\gamma _i \le \gamma \), this completes the case \(r=1\) and the proof of the lemma. \(\square \)

As mentioned in Remark 4.10, Corollary 4.9 and the bound (4.6) are now sufficient to prove Theorem 4.1 for the case that \(p > d_m\).

4.3 Proof of Theorem 4.1 in the case \(p \le d_m\)

Lemma 4.13

Use the notation from Lemma 4.12 and suppose that (A), (B) and (C) hold. For all \(i \in \{1,\ldots , m\}\), it holds that \((\mathbf {Y}^{n,i}_h)_{n \ge 1, h \in [0,1]}\) is a tight collection of real random variables.

Proof

By Remark 4.10, \((\mathbf {X}^n)_{n \ge 1}\) is a tight collection of \(D_o([0,1],G)\)-valued random variables, from which it follows that \((\max _{1 \le k \le n}|X^i_{nk}|)_{n \ge 1}\) is tight for all \(i \in \{1,\ldots , m\}\). We may thus suppose that \(\left| \left| X_{n1} \right| \right| \le R\) a.s. for some large \(R>0\) and all \(n \ge 1\).

Consider the decomposition \(X^i_{nk} = A_{nk} + B_{nk}\) where

$$\begin{aligned} A_{nk} = X^i_{nk}\mathbf 1 \{\left| \left| X_{nk} \right| \right| < \varepsilon \} \end{aligned}$$

and

$$\begin{aligned} B_{nk} = X^i_{nk}\mathbf 1 \{\varepsilon \le \left| \left| X_{nk} \right| \right| \le R\}. \end{aligned}$$

We take here \(\varepsilon > 0\) sufficiently small so that \(\left| \left| x \right| \right| < \varepsilon \) implies \(x \in U\). It suffices to prove that \((\sum _{a=1}^k B_{na})_{n \ge 1, k \in \{1, \ldots , n\}}\) and \((\sum _{a=1}^k A_{na})_{n \ge 1, k \in \{1, \ldots , n\}}\) are tight collections of real random variables.

Let \(C_1 = C_1(\varepsilon ) > 0\) be such that \(C_1\theta (x) > |x^i|\mathbf 1 \{\varepsilon \le \left| \left| x \right| \right| \le R\}\) for all \(x\in G\). Since \(\theta \) scales \(X_{nk}\), it holds that

$$\begin{aligned} \sup _{n \ge 1, k \in \{1,\ldots , n\}}\mathbb E \left[ \left| \sum _{a=1}^k B_{na} \right| \right] \le \sup _{n \ge 1} C_1n\mathbb E \left[ \theta (X_{n1}) \right] < \infty , \end{aligned}$$

and thus \((\sum _{a=1}^k B_{na})_{n \ge 1, k \in \{1, \ldots , n\}}\) is tight.

Now observe that (B) and (C) imply that \(\sup _{n \ge 1} n\left| \mathbb E \left[ A_{n1} \right] \right| < \infty \). Moreover (C) implies that there exists \(C_2 > 0\) such that \(|\xi _i(x)|^2 \le C_2\theta (x)\) for all \(x \in G\) and \(i \in \{1,\ldots , m\}\). Since \(A_{na} = \mathbf 1 \{\left| \left| X_{na} \right| \right| < \varepsilon \}\xi _i(X_{na})\), and \(A_{n1},\ldots , A_{nn}\) are iid, it follows that

$$\begin{aligned} \sup _{n \ge 1, k \in \{1,\ldots , n\}} \mathbb E \left[ \left( \sum _{a=1}^k A_{na}\right) ^2 \right] \le \sup _{n \ge 1} \sum _{a,b=1}^n \left| \mathbb E \left[ A_{na}A_{nb} \right] \right| < \infty , \end{aligned}$$

and thus \((\sum _{a=1}^k A_{na})_{n \ge 1, k \in \{1, \ldots , n\}}\) is also tight. \(\square \)

For \(i \in \{1,\ldots , m\}\), let \(\mathfrak g^{> i}\) be the subspace of \(\mathfrak g\) spanned by \(\{u_j \mid j > i\}\). Note that \(\mathfrak g^{> i}\) is an ideal of \(\mathfrak g\), and so we can define the Lie algebra \(\mathfrak g^i = \mathfrak g/\mathfrak g^{> i}\) and the projection map \(\pi ^i : \mathfrak g\mapsto \mathfrak g^i\). The dilations \(\delta _\lambda \) on \(\mathfrak g\) give rise to a natural family of dilations on \(\mathfrak g^{i}\), and thus to a homogeneous group \(G^i\) associated with \(\mathfrak g^i\). Equivalently, \(G^i = \mathfrak g/\mathfrak g^{>i}\), where we have identified \(\mathfrak g\) with G and \(\mathfrak g^{> i}\) with a normal subgroup of G. We implicitly equip \(G^i\) with an arbitrary sub-additive homogeneous norm \(\left| \left| \cdot \right| \right| \). For notational convenience, we also let \(G^0 = \{1\}\) be the trivial group and \(\pi ^0 : G \mapsto G^0\) the trivial map.
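To make the quotient construction concrete, the following minimal sketch (our illustration, not part of the paper's formal development) realises the case \(G = G^2(\mathbb {R}^2)\), where \(m = 3\), \(d_1 = d_2 = 1\), \(d_3 = 2\) and \([u_1,u_2] = u_3\): the group law is the step-2 CBH product in exponential coordinates, and \(\pi ^2\) simply discards the degree-2 coordinate.

```python
import numpy as np

# Step-2 free nilpotent group over R^2 in exponential coordinates:
# g = span{u1, u2, u3}, deg(u1) = deg(u2) = 1, deg(u3) = 2, [u1, u2] = u3.

def mul(x, y):
    """Group product, i.e. the CBH formula truncated at step 2."""
    return np.array([x[0] + y[0],
                     x[1] + y[1],
                     x[2] + y[2] + 0.5 * (x[0] * y[1] - x[1] * y[0])])

def dilate(lam, x):
    """Dilation delta_lambda, acting with weights (1, 1, 2)."""
    return np.array([lam * x[0], lam * x[1], lam ** 2 * x[2]])

def proj2(x):
    """pi^2 : G -> G^2 = G / exp(g^{>2}); drops the degree-2 coordinate."""
    return x[:2]

x, y = np.array([1.0, 2.0, 0.3]), np.array([-0.5, 1.0, 2.0])
# pi^2 is a group homomorphism onto (R^2, +) ...
assert np.allclose(proj2(mul(x, y)), proj2(x) + proj2(y))
# ... and it intertwines the dilations on G and on G^2.
assert np.allclose(proj2(dilate(0.7, x)), 0.7 * proj2(x))
```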

Corollary 4.14

Use the notation from Lemma 4.12 and suppose that (A), (B) and (C) hold.

(i) Let \(i \in \{1,\ldots , m\}\) and \(p > d_i \vee \kappa \). Then \((\left| \left| \pi ^{i}\mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is tight.

(ii) For every \(i \in \{1, \ldots , m\}\) such that \(q_i \le 1\), \((\left| \left| \mathbf {Y}^{n,i} \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is tight for all \(p > q_i\).

Proof

(i) Observe that \(\pi ^i\mathbf {X}^n\) is the random walk associated with the \(G^{i}\)-valued iid array \(\pi ^{i} X_{nk}\), from which the conclusion follows by Corollary 4.9 and the bound (4.6) of Lemma 4.12 (cf. Remark 4.10).

(ii) From Corollary 4.9 and the bound (4.7) of Lemma 4.12, it suffices to check that condition (i) of Corollary 4.9 holds for the processes \((\mathbf {Y}^{n,i})_{n \ge 1}\). However, this follows from Lemma 4.13. \(\square \)

Recall that we identify G with \(\mathfrak g\) via the \(\exp \) map. For functions \(\mathbf {z}: [0,T] \mapsto G\) and \(\mathbf {y}: [0,T] \mapsto \mathbb {R}\), define the function

$$\begin{aligned} \mathbf {x}= \mathbf {z}+ \mathbf {y}: [0,T] \mapsto G, \; \; \mathbf {x}_t = \mathbf {z}_t + \mathbf {y}_t u_m, \end{aligned}$$

where addition is taken in \(\mathfrak g\). The following lemma is a simple consequence of the fact that \((\mathbf {x}_s^{-1}\mathbf {x}_t)^i = (\mathbf {z}_s^{-1}\mathbf {z}_t)^i\) for all \(i \in \{1,\ldots , m-1\}\), and

$$\begin{aligned} (\mathbf {x}_s^{-1}\mathbf {x}_t)^m = (\mathbf {z}_s^{-1}\mathbf {z}_t)^m + \mathbf {y}_t - \mathbf {y}_s. \end{aligned}$$
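This identity is easy to check numerically. The snippet below (our illustration only, reusing the same CBH product as in the sketch above) verifies it in \(G^2(\mathbb {R}^2)\), where \(u_m = u_3\) spans the centre.

```python
import numpy as np

def mul(x, y):
    # CBH product in G^2(R^2), coordinates (x1, x2, x3), [u1, u2] = u3
    return np.array([x[0] + y[0], x[1] + y[1],
                     x[2] + y[2] + 0.5 * (x[0] * y[1] - x[1] * y[0])])

rng = np.random.default_rng(0)
z_s, z_t = rng.normal(size=3), rng.normal(size=3)   # z at two times s < t
y_s, y_t = rng.normal(), rng.normal()               # real-valued path y

# x_r := z_r + y_r * u3 (addition in the Lie algebra; u3 is central)
x_s = z_s + np.array([0.0, 0.0, y_s])
x_t = z_t + np.array([0.0, 0.0, y_t])

lhs, rhs = mul(-x_s, x_t), mul(-z_s, z_t)           # note inv(x) = -x at step 2
assert np.allclose(lhs[:2], rhs[:2])                # lower coordinates agree
assert np.allclose(lhs[2], rhs[2] + (y_t - y_s))    # top coordinate shifts by y
```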

Lemma 4.15

Let \(\mathbf {z}: [0,T] \mapsto G\) and \(\mathbf {y}: [0,T] \mapsto \mathbb {R}\) be functions, and let \(\mathbf {x}= \mathbf {z}+ \mathbf {y}\). Then for any \(p > 0\) there exists \(C = C(p,G) > 0\) such that

$$\begin{aligned} \left| \left| \mathbf {x} \right| \right| _{p -var ;[0,T]} \le C\left( \left| \left| \mathbf {z} \right| \right| _{p -var ;[0,T]} + \left| \left| \mathbf {y} \right| \right| ^{1/d_m}_{p/d_m -var ;[0,T]} \right) . \end{aligned}$$

Lemma 4.16

Let \(p > 0\) and \(i \in \{1,\ldots , m\}\) be the largest index such that \(d_i \le p\) (with \(i=0\) if no such index exists). Consider elements \(x_1,\ldots , x_n \in G\) and let \(\mathbf {x}\in D_o([0,1], G)\) be the associated walk. For \(j \in \{i + 1, \ldots , m\}\), let \(\mathbf {y}^j \in D_o([0,1], \mathbb {R})\) be the walk associated with the real numbers \(x^j_1,\ldots , x^j_n\).

Then there exists \(C = C(p,G) > 0\), such that

$$\begin{aligned} \left| \left| \mathbf {x} \right| \right| _{p -var ;[0,1]} \le C\left( \left| \left| \pi ^{i}\mathbf {x} \right| \right| _{p -var ;[0,1]} + \sum _{j = i + 1}^m \left| \left| \mathbf {y}^j \right| \right| ^{1/d_j}_{p/d_j -var ;[0,1]}\right) . \end{aligned}$$

Proof

By induction on m and Lemma 4.15, it suffices to show that if \(p < d_m\) and \(x^m_k = 0\) for all \(k \in \{1,\ldots , n\}\), then \(\left| \left| \mathbf {x} \right| \right| _{p -var ;[0,1]} \le C_1\left| \left| \pi ^{m-1} \mathbf {x} \right| \right| _{p -var ;[0,1]}\). This in turn follows from the CBH formula (3.1) and an application of Young’s partition coarsening argument (see, e.g., [31, p. 50]). \(\square \)
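For a finite point sequence such as a walk, the p-variation seminorms appearing in Lemmas 4.15 and 4.16 can be computed exactly by dynamic programming over partitions, since the supremum is attained on a subset of the jump times. A minimal sketch (ours, with a user-supplied metric):

```python
import numpy as np

def p_variation(points, p, dist):
    """Exact p-variation of a finite point sequence: maximise
    sum_k dist(x_{t_k}, x_{t_{k+1}})**p over all partitions via the recursion
    V[j] = max_{i < j} V[i] + dist(points[i], points[j])**p."""
    V = np.zeros(len(points))
    for j in range(1, len(points)):
        V[j] = max(V[i] + dist(points[i], points[j]) ** p for i in range(j))
    return V[-1] ** (1.0 / p)

# Zig-zag walk in R with increments +1, -1, +1, -1, +1:
pts = np.cumsum([0, 1, -1, 1, -1, 1])
d = lambda a, b: abs(b - a)
print(p_variation(pts, p=1, dist=d))   # 5.0 (total variation)
print(p_variation(pts, p=2, dist=d))   # 5**0.5, attained on the full partition
```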

We now have all the ingredients for the proof of Theorem 4.1.

Proof of Theorem 4.1

The fact that \((\mathbf {X}^n)_{n \ge 1}\) is a tight collection of \(D_o([0,1],G)\)-valued random variables follows directly from Corollary 4.9 and the bound (4.6) of Lemma 4.12 (cf. Remark 4.10).

Let \(p > \kappa \). Decreasing p if necessary, we may suppose \(p \ne d_i\) for all \(i \in \{1,\ldots , m\}\). Let \(i \in \{1,\ldots , m\}\) be the largest index such that \(d_i < p\) (with \(i=0\) if no such index exists). Define \(\mathbf {Y}^{n,j}\) as in Lemma 4.12, and note that \(q_j< p/d_j < 1\) for all \(j \in \{i + 1,\ldots , m\}\). It follows by Corollary 4.14 that \((\left| \left| \pi ^i \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) and \((\left| \left| \mathbf {Y}^{n,j} \right| \right| _{p/d_j -var ;[0,1]})_{n \ge 1}\) are tight for all \(j \in \{i+1,\ldots , m\}\). We conclude by Lemma 4.16 that \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is also tight. \(\square \)

5 Lévy processes in homogeneous groups

5.1 Finite p-variation of Lévy processes

Consider a homogeneous group G and recall the notation of Sect. 3. Recall also the definitions of \(\Gamma _i\), J, and K from Sect. 2.3. The following is the main result of this subsection.

Theorem 5.1

Let \(p > 0\) and \(\mathbf {X}\) be a Lévy process in G with triplet \((A, B, \Pi )\).

(1) Then \(\left| \left| \mathbf {X} \right| \right| _{p -var ;[0,1]} < \infty \) a.s. provided that all of the following hold:

(i) \(p > 2d_j\) for all \(j \in J\);

(ii) \(p > d_k\) for all \(k \in K\);

(iii) \(p/d_i > \sup \{\Gamma _i\}\) for all \(i \in \{1, \ldots , m\}\).

(2) Then \(\left| \left| \mathbf {X} \right| \right| _{p -var ;[0,1]} = \infty \) a.s. provided that at least one of the following holds:

(iv) \(p \le 2d_j\) for some \(j \in J\);

(v) \(p < d_k\) for some \(k \in K\);

(vi) \(p/d_i \in \Gamma _i\) for some \(i \in \{1,\ldots , m\}\).

Remark 5.2

Note that Theorem 5.1 does not completely determine all values of p for which \(\left| \left| \mathbf {X} \right| \right| _{p -var ;[0,1]} < \infty \) a.s. (e.g., when \(p/d_i = \sup \{\Gamma _i\} \notin \Gamma _i\) for some \(i \in \{1, \ldots , m\}\)). Comparing Theorem 5.1 with known results for \(\mathbb {R}\)-valued Lévy processes [6], we suspect that (ii) and (iii) can be replaced by \(p \ge d_k, \forall k \in K\), and \(p/d_i \notin \Gamma _i, \forall i \in \{1,\ldots , m\}\), respectively, which would complete the characterisation.

Remark 5.3

In [17], the authors determined sufficient conditions under which a Lévy process in the step-2 free nilpotent Lie group \(G^2(\mathbb {R}^d)\) possesses finite p-variation for \(p \in (2,3)\), along with a partial converse that their conditions cannot in general be weakened ([17, Theorem 50]). In this context, Theorem 5.1 generalises this result to all \(N \ge 1\) and \(p > 0\) and provides a sharp converse. In particular, the Carnot–Carathéodory Blumenthal–Getoor index \(\beta \) introduced in [17] for a Lévy measure on \(G^N(\mathbb {R}^d)\) relates to our definition of \(\Gamma _i\) by \(\beta = \max \{d_1\sup \{\Gamma _1\},\ldots , d_m\sup \{\Gamma _m\}\}\), in which case (iii) reads \(p > \beta \).
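Since the conditions of Theorem 5.1 involve some bookkeeping, the following helper (our illustration only; the index sets J, K and the values \(\sup \{\Gamma _i\}\) are hypothetical inputs supplied by the caller) checks conditions (i)-(iii) for a given p and computes the index \(\beta \) of Remark 5.3.

```python
def sufficient_for_finite_p_variation(p, degrees, J, K, sup_Gamma):
    """Conditions (i)-(iii) of Theorem 5.1, sufficient for finite p-variation.
    degrees  : (d_1, ..., d_m);   J, K : index sets (0-based here);
    sup_Gamma: sup_Gamma[i] = sup(Gamma_i).  Pure bookkeeping, no analysis."""
    cond_i = all(p > 2 * degrees[j] for j in J)
    cond_ii = all(p > degrees[k] for k in K)
    cond_iii = all(p / d > s for d, s in zip(degrees, sup_Gamma))
    return cond_i and cond_ii and cond_iii

def bg_index(degrees, sup_Gamma):
    """Index beta of Remark 5.3: beta = max_i d_i * sup(Gamma_i),
    so that condition (iii) reads p > beta."""
    return max(d * s for d, s in zip(degrees, sup_Gamma))
```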

For the proof of Theorem 5.1, we require the following lemma.

Lemma 5.4

Let \(\mathbf {X}\) be a Lévy process in G with triplet \((A,B,\Pi )\). Assume \(p > 0\) satisfies (i), (ii), and (iii) of Theorem 5.1.

Let \(X_{nj}\) be the associated iid array constructed in Sect. 2.3 and \(\mathbf {X}^n\) the associated random walk. Then \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is tight.

Proof

Let \(0 < p' < p\) be such that \(p'\) also satisfies (i), (ii), and (iii) of Theorem 5.1. For all \(i \in \{1,\ldots , m\}\), define \(q_i \,{:=}\, 2\wedge (p'/d_i)\), and let \(\theta \) be a scaling function on G such that \(\theta \equiv \sum _{i = 1}^m |\xi _i|^{q_i}\) in a neighbourhood of \(1_G\).

Observe that \(q_i \notin \Gamma _i\) for all \(i \in \{1,\ldots , m\}\), \(q_j = 2\) for all \(j \in J\), and \(q_k > 1\) for all \(k \in K\). Thus, by Lemma 2.5, \(\theta \) scales the array \(X_{nj}\). Moreover, since \(\mathbf {X}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as \(D_o([0,1],G)\)-valued random variables, it follows that the array \(X_{nj}\) satisfies the conditions of Theorem 4.1 with the above \(\theta \) and \(q_1,\ldots , q_m\) (see Remark 4.2). Since \(p > \max \{q_1 d_1,\ldots , q_m d_m\}\), it follows that \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is tight. \(\square \)

Proof of Theorem 5.1

(1) follows from Lemma 5.4 and part (1) of Proposition 3.5, while (2) follows directly from Corollary B.3 and Proposition B.4. \(\square \)

5.2 Convergence in p-variation

In this subsection we consider continuous random paths \((\mathbf {X}^{n,\phi })_{n \ge 1}, \mathbf {X}^\phi \), constructed from a random walk \(\mathbf {X}^n\) and a Lévy process \(\mathbf {X}\) by connecting their left- and right-limits with a path function \(\phi \), and give conditions under which \(\mathbf {X}^{n,\phi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^\phi \) as \(C^{p -var }([0,1],G)\)-valued random variables. All relevant material on path functions is collected in “Appendix A”.

Theorem 5.5

Let \(X_{nj}\) be an iid array in G and \(\mathbf {X}^n\) the associated random walk. Let \(\mathbf {X}\) be a Lévy process in G with triplet \((A,B,\Pi )\). Suppose that \(\mathbf {X}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as \(D_o([0,1],G)\)-valued random variables and that \(\theta \) scales \(X_{nj}\), where \(\theta \equiv \sum _{i=1}^m |\xi _i|^{q_i}\) in a neighbourhood of \(1_G\) for some \(0 < q_i \le 2\).

Let \(W \subseteq G\) be a closed subset such that \( supp (\Pi ) \subseteq W\) and \(X_{n1} \in W\) a.s. for all \(n \ge 1\). Let \(p > \max \{1, q_1 d_1,\ldots ,q_m d_m\}\) and let \(\phi : W \mapsto C_o^{p -var }([0,1],G)\) be a p-approximating, endpoint continuous path function.

Then \(\left| \left| \mathbf {X}^\phi \right| \right| _{p -var ;[0,1]} < \infty \) a.s., and for every \(p' > p\), \(\mathbf {X}^{n,\phi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^\phi \) as \(C_o^{0,p' -var }([0,1], G)\)-valued random variables.

Remark 5.6

In the statement of Theorem 5.5, note that, a.s., \(\mathbf {X}_{t-}^{-1}\mathbf {X}_t \in supp (\Pi )\) for every jump time t of \(\mathbf {X}\) (e.g., [30, Proposition 1.4]). Hence, for any (measurable) path function \(\phi \) defined on \( supp (\Pi )\), \(\mathbf {X}^\phi \) is indeed a well-defined \(C_o([0,1],G)\)-valued random variable.

Proof

By Theorem 4.1, it holds that \((\left| \left| \mathbf {X}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is tight, and thus, by Proposition A.7, \((\left| \left| \mathbf {X}^{n,\phi } \right| \right| _{p -var ;[0,1]})_{n \ge 1}\) is also tight. Since \(\phi \) is endpoint continuous on W, it follows by Proposition A.4 that \(\mathbf {X}^{n,\phi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^{\phi }\) as \(C_o([0,1],G)\)-valued random variables. The conclusion now follows from Proposition 3.5. \(\square \)

5.3 Applications to rough paths theory

We apply the results so far developed in the paper to the theory of rough paths and stochastic flows. Following Example 3.1, denote by \(G^N(\mathbb {R}^d)\) the step-N free nilpotent Lie group over \(\mathbb {R}^d\) and let \(\mathfrak g^N(\mathbb {R}^d)\) be its Lie algebra. For the remainder of the paper, unless otherwise stated, we shall always let \(G = G^N(\mathbb {R}^d)\) and \(\mathfrak g= \mathfrak g^N(\mathbb {R}^d)\). Being a Carnot group, G comes equipped with a natural homogeneous structure and we note that \(u_1,\ldots , u_d\) can be identified with a basis of \(\mathbb {R}^d\).

For \(1 \le p < N+1\), we let \(WG\Omega _p(\mathbb {R}^d) \,{:=}\, C^{p -var }([0,T], G)\), equipped with the metric \(d_{p -var ;[0,T]}\), denote the space of weakly geometric p-rough paths. Given an element \(\mathbf {x}\in WG\Omega _p(\mathbb {R}^d)\), and a collection \((f_i)_{i=1}^d\) of vector fields in \( Lip ^{\gamma +k-1}(\mathbb {R}^e)\) for \(\gamma > p \ge 1\) and an integer \(k \ge 1\), there is a unique solution to the rough differential equation (RDE)

$$\begin{aligned} d\mathbf {y}_t = f(\mathbf {y}_t)d\mathbf {x}_t, \; \; \mathbf {y}_0 \in \mathbb {R}^e. \end{aligned}$$
(5.1)

We refer to [18] for further details on (geometric) rough paths theory.
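When the driver has bounded variation (for instance, any of the interpolated walks appearing below), (5.1) reduces to a controlled ODE, and even a first-order Euler scheme illustrates the flow map \(\mathbf {y}_0 \mapsto \mathbf {y}_T\). The following is a minimal sketch under that assumption, with hypothetical linear vector fields; it is an illustration only, not the rough-path machinery of [18]:

```python
import numpy as np

def euler_flow(f, x, y0):
    """Euler scheme for dy = f(y) dx along a piecewise-linear driver x.
    f(y) returns an (e x d) matrix whose columns are f_1(y), ..., f_d(y);
    x is an (n+1) x d array of driver values; returns y at the final time."""
    y = np.asarray(y0, dtype=float)
    for k in range(len(x) - 1):
        y = y + f(y) @ (x[k + 1] - x[k])
    return y

# Example: d = 2 linear vector fields f_i(y) = A_i y on R^2.
A = [np.array([[0.0, 1.0], [-1.0, 0.0]]), np.array([[0.5, 0.0], [0.0, -0.5]])]
f = lambda y: np.column_stack([Ai @ y for Ai in A])
steps = np.random.default_rng(1).normal(scale=0.05, size=(200, 2))
x = np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])
print(euler_flow(f, x, y0=[1.0, 0.0]))
```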

5.3.1 Stochastic flows

Let \(U^\mathbf {x}_{T\leftarrow 0} : \mathbf {y}_0 \mapsto \mathbf {y}_T\) denote the flow associated to (5.1), which we recall is an element of \( Diff ^k(\mathbb {R}^e)\), the group of \(C^k\)-diffeomorphisms of \(\mathbb {R}^e\). Recall that the map \(U^\cdot _{T\leftarrow 0} : WG\Omega _p(\mathbb {R}^d) \mapsto Diff ^k(\mathbb {R}^e)\) is a continuous function on \(WG\Omega _p(\mathbb {R}^d)\) when \( Diff ^k(\mathbb {R}^e)\) is equipped with the \(C^k\)-topology ([18, Theorem 11.12]). The following result is now an immediate corollary of Theorem 5.5.

Corollary 5.7

Suppose the assumptions of Theorem 5.5 are verified for some \(1 \le p < N+1\). Let \(\gamma > p\), \(k \ge 1\) an integer, and \((f_i)_{i=1}^d\) a collection of vector fields in \( Lip ^{\gamma +k-1}(\mathbb {R}^e)\). Let \(U^\cdot _{1\leftarrow 0} : WG\Omega _p(\mathbb {R}^d) \mapsto Diff ^k(\mathbb {R}^e)\) be the associated flow map.

Then \(U^{\mathbf {X}^{n,\phi }}_{1\leftarrow 0} \,{\buildrel \mathcal {D}\over \rightarrow }\,U^{\mathbf {X}^\phi }_{1\leftarrow 0}\) as \( Diff ^k(\mathbb {R}^e)\)-valued random variables.

We demonstrate how one can apply Corollary 5.7 to show weak convergence of stochastic flows in the following three examples, the first of which extends a result of Kunita [29].

Example 5.8

(Linear interpolation, Kunita [29]). Let \(Y_{n1},\ldots , Y_{nn}\) be an iid array in \(\mathbb {R}^d\) such that the associated random walk \(\mathbf {Y}^n\) converges in law as a \(D_o([0,1],\mathbb {R}^d)\)-valued random variable to a Lévy process \(\mathbf {Y}\) in \(\mathbb {R}^d\).

We claim that ODE flows driven by the piecewise linear interpolation of the random walk \(\mathbf {Y}^n\) along \( Lip ^{\gamma +k-1}\) vector fields, for any \(\gamma > 2\), \(k \ge 1\), converge in law as \( Diff ^k(\mathbb {R}^e)\)-valued random variables.

Indeed, setting \(G \,{:=}\, G^2(\mathbb {R}^d)\), consider the G-valued iid array \(X_{nj} \,{:=}\, e^{Y_{nj}}\). It follows that \(X_{nj}\) is scaled by any scaling function \(\theta \) on G for which \(\theta \ge \sum _{i=1}^d |\xi _i|^2\). Moreover, using the fact that \(\xi _i\circ \exp \in C^\infty _c(\mathbb {R}^d)\), one can readily see by Theorem 2.1 that \(\mathbf {X}^n\,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as \(D_o([0,1],G)\)-valued random variables, where \(\mathbf {X}\) is a G-valued Lévy process. Finally, consider the 1-approximating, endpoint continuous path function

$$\begin{aligned} \phi : \exp (\mathbb {R}^d) \mapsto C_o([0,1], G), \; \; \phi (e^x)_t = e^{tx}. \end{aligned}$$

Then \(\mathbf {X}^{n,\phi }\) is (a reparametrisation of) the lift of the piecewise linear interpolation of \(\mathbf {Y}^n\). Furthermore, the conditions of Theorem 5.5 are satisfied for all \(p > 2\), so that \(\mathbf {X}^{n,\phi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^{\phi }\) as \(C_o^{0,p -var }([0,1],G)\)-valued random variables, from which the desired claim follows (Corollary 5.7).
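Concretely, the level-2 lift of a piecewise-linear path can be computed by Chen's relation: a linear chord with increment \(\Delta \) has truncated signature \((1, \Delta , \Delta \otimes \Delta /2)\), and concatenation is the truncated tensor product. A minimal sketch (our illustration):

```python
import numpy as np

def chord_sig(delta):
    """Level-2 truncated signature (1, delta, delta (x) delta / 2) of a chord."""
    delta = np.asarray(delta, dtype=float)
    return (delta, 0.5 * np.outer(delta, delta))

def chen(a, b):
    """Chen's relation: truncated tensor product of level-2 signatures."""
    return (a[0] + b[0], a[1] + b[1] + np.outer(a[0], b[0]))

def lift_walk(increments):
    """Level-2 lift of the piecewise-linear interpolation of a walk."""
    d = len(increments[0])
    sig = (np.zeros(d), np.zeros((d, d)))   # signature of the trivial path
    for delta in increments:
        sig = chen(sig, chord_sig(delta))
    return sig

lvl1, lvl2 = lift_walk([np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                        np.array([-1.0, 0.0])])
print(lvl1)                                 # endpoint increment (0, 1)
print(0.5 * (lvl2[0, 1] - lvl2[1, 0]))      # Levy area of the polygonal path: 1.0
```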

Remark 5.9

In the previous example, it is easy to see that RDEs driven by \(\mathbf {X}^\phi \) coincide (up to reparametrisation) with general (Marcus) RDEs driven by \(\mathbf {X}\) (in the sense of [17, Section 6]) and thus with Marcus SDEs driven by \(\mathbf {Y}\).

Remark 5.10

The previous example extends the main result of Kunita [29, Theorem 4, Corollary p. 329]. The main restriction of Kunita’s result is the assumption that the vector fields \(f_1,\ldots , f_d\), along which \(\mathbf {Y}^n\) drives an ODE, generate a finite dimensional Lie algebra, which essentially allows one to reduce the problem to a random walk on a Lie group (see [29, p. 340]). Our approach, based on convergence under rough path topologies, bypasses this restriction and provides a natural interpretation of the limiting stochastic flow as the solution of an RDE.

Remark 5.11

Breuillard, Friz and Huesmann [7] showed a result analogous to the above example in the special case where the limiting Lévy process \(\mathbf {Y}\) is a Brownian motion. The main analytic tool used in [7] is the Kolmogorov–Lamperti criterion, which yields tightness of the Hölder norms \((\left| \left| \mathbf {Y}^n \right| \right| _{1/p\text {-H\"ol} ;[0,1]})_{n \ge 1}\). This is of course stronger than tightness of \((\left| \left| \mathbf {Y}^n \right| \right| _{p -var ;[0,1]})_{n \ge 1}\), and cannot hold whenever the limiting Lévy process has jumps; this demonstrates a situation where the tightness criterion of Theorem 4.8 serves as an effective alternative to the classical Kolmogorov–Lamperti criterion.

In the following example we demonstrate how Example 5.8 generalises to non-linear interpolations with essentially no extra effort.

Example 5.12

(Non-linear interpolation). As in Example 5.8, let \(Y_{nj}\) be an iid array in \(\mathbb {R}^d\) such that \(\mathbf {Y}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {Y}\) for a Lévy process \(\mathbf {Y}\) in \(\mathbb {R}^d\).

Instead of piecewise interpolations, consider now any q-approximating endpoint continuous path function \(\psi : \mathbb {R}^d \mapsto C_o^{q -var }([0,1], \mathbb {R}^d)\) for some \(1 \le q < 2\). Set again \(G = G^2(\mathbb {R}^d)\) and define the injective map \(f : \mathbb {R}^d \mapsto G\) by

$$\begin{aligned} f(x) = S_2(\psi (x))_1, \end{aligned}$$

where \(S_2 : C_o^{q -var }([0,1],\mathbb {R}^d) \mapsto C_o^{q -var }([0,1],G)\) denotes the level-2 lifting map.

Consider the iid array \(X_{nj} \,{:=}\, f(Y_{nj})\). It follows readily from the assumption that \(\psi \) is q-approximating that \(X_{nj}\) is again scaled by any scaling function \(\theta \) on G for which \(\theta \ge \sum _{i=1}^d |\xi _i|^2\). We now make the assumption on \(\psi \) and \(Y_{n1}\) that for all \(i,j \in \{1,\ldots , m\}\) the following limits exist:

$$\begin{aligned} D^i&\,{:=}\, \lim _{n \rightarrow \infty } n\mathbb E \left[ \xi _i(f(Y_{n1})) \right] , \\ C^{i,j}&\,{:=}\, \lim _{n \rightarrow \infty } n\mathbb E \left[ \xi _i(f(Y_{n1}))\xi _j(f(Y_{n1})) \right] - \int _{\mathbb {R}^d} \xi _i(f(x))\xi _j(f(x)) \Pi (dx). \end{aligned}$$

This occurs, for example, whenever every \(\xi _i \circ f\) is twice differentiable at zero, but in general the existence of these limits depends on the array \(Y_{nj}\) and the path function \(\psi \).

Under this assumption, it follows from Theorem 2.1 that the random walk \(\mathbf {X}^n\) associated with the array \(X_{nj}\) converges in law to the Lévy process \(\mathbf {X}\) with triplet \((C,D,\Xi )\), where \(\Xi \) is the pushforward of \(\Pi \) by f.

Define now the q-approximating, endpoint continuous path function \(\phi : f(\mathbb {R}^d) \mapsto C_o^{q -var }([0,1],G)\) by

$$\begin{aligned} \phi (f(x)) = S_2(\psi (x)). \end{aligned}$$

Observe that the conditions of Theorem 5.5 are again satisfied for all \(p > 2\), so that \(\mathbf {X}^{n,\phi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^{\phi }\) as \(C_o^{0,p -var }([0,1],G)\)-valued random variables.

Note that \(\mathbf {X}^{n,\phi }\) is, up to reparametrisation, the lift of \(\mathbf {Y}^{n,\psi }\) (which is itself, up to reparametrisation, the random walk \(\mathbf {Y}^n\) interpolated by the path function \(\psi \)). It follows that ODE flows driven by \(\mathbf {Y}^{n,\psi }\) along \( Lip ^{\gamma +k-1}\) vector fields, for any \(\gamma > 2\), \(k \ge 1\), converge in law as \( Diff ^k\)-valued r.v.’s to the corresponding RDE flow driven by \(\mathbf {X}^\phi \) (Corollary 5.7).
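As a toy instance of the map \(f(x) = S_2(\psi (x))_1\), consider the hypothetical two-chord path function \(\psi (x)\) routed through the perturbed midpoint \(x/2 + \varepsilon (-x^2, x^1)\) (our illustration, not one of the interpolations of [33]): \(\psi (x)\) has the same endpoints as the linear chord, but its level-2 lift carries a non-zero area, which is precisely what feeds into the corrections \(C^{i,j}\) and \(D^i\).

```python
import numpy as np

def chord_sig(delta):
    delta = np.asarray(delta, dtype=float)
    return (delta, 0.5 * np.outer(delta, delta))

def chen(a, b):
    return (a[0] + b[0], a[1] + b[1] + np.outer(a[0], b[0]))

def f_of_x(x, eps=0.1):
    """S_2(psi(x))_1 for the two-chord interpolation psi(x) through the
    perturbed midpoint x/2 + eps * (-x2, x1): endpoint-exact, area-creating."""
    x = np.asarray(x, dtype=float)
    bump = eps * np.array([-x[1], x[0]])
    return chen(chord_sig(0.5 * x + bump), chord_sig(0.5 * x - bump))

lvl1, lvl2 = f_of_x([1.0, 0.0])
print(lvl1)                               # (1, 0): same endpoint as the chord
print(0.5 * (lvl2[0, 1] - lvl2[1, 0]))    # -0.05: the area correction
```

Since the bump is linear in x, the area is quadratic in x, so each \(\xi _i \circ f\) is twice differentiable at zero, in line with the sufficient condition above.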

Remark 5.13

McShane [33] considered non-linear interpolations of the increments of Brownian motion and showed strong convergence of the corresponding ODEs to the associated Stratonovich SDE with an adjusted drift. We note that the family of path functions \(\psi \) to which the above example applies includes the non-linear interpolations considered by McShane ([33, p. 285]) (provided that \(Y_{nk}\) are also sufficiently well behaved, e.g., the increments of Brownian motion, to ensure that the limits \(C^{i,j}\) and \(D^i\) exist). The above example can thus be seen as a weak convergence analogue for general Lévy processes of the results in [33]. In a similar way, the following example is analogous to the results of Sussman [35] on non-linear approximations of Brownian motion.

Example 5.14

(Perturbed walk). As in Examples 5.8 and 5.12, let \(Y_{nj}\) be an iid array in \(\mathbb {R}^d\) such that \(\mathbf {Y}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {Y}\) for a Lévy process \(\mathbf {Y}\) in \(\mathbb {R}^d\).

Let \(N \ge 2\) and as before denote \(G = G^N(\mathbb {R}^d)\) and \(\mathfrak g= \mathfrak g^N(\mathbb {R}^d)\). Fix a path \(\gamma \in C_o^{1 -var }([0,1], \mathbb {R}^d)\) such that \(v \,{:=}\, \log (S_N(\gamma )_{0,1})\) is in the center of \(\mathfrak g\) (that is, \(v^i = 0\) for all \(i \in \{1,\ldots , m\}\) such that \(d_i < N\)).

In this example we wish to consider the random path \(\mathbf {Z}^n \in C^{1 -var }([0,1], \mathbb {R}^d)\) defined by linearly joining the points of \(\mathbf {Y}^n\), and, between each linear chord, running along the path \(n^{-1/N}\gamma \).

Define the closed subset

$$\begin{aligned} W \,{:=}\, \exp (\mathbb {R}^d)\exp ([0,\infty )v) = \left\{ \exp (y)\exp (\lambda v) \mid y \in \mathbb {R}^d, \; \lambda \ge 0 \right\} \subseteq G. \end{aligned}$$

Note that every \(x \in W\) decomposes uniquely as \(x = \exp (y)\exp (\lambda v)\) for some \(y \in \mathbb {R}^d\) and \(\lambda \ge 0\). Define then the 1-approximating, endpoint continuous path function \(\phi : W \mapsto C_o^{1 -var }([0,1],G)\) by

$$\begin{aligned} \phi (\exp (y)\exp (\lambda v))_t = {\left\{ \begin{array}{ll} \exp (2t y) &{}\text{ if } t \in [0,1/2] \\ \exp (y)S_N(\lambda ^{1/N}\gamma )_{2t-1} &{}\text{ if } t \in (1/2,1]. \end{array}\right. } \end{aligned}$$

Consider the G-valued iid array \(X_{nj} \,{:=}\, \exp (Y_{nj})\exp (n^{-1} v)\) and the associated random walk \(\mathbf {X}^n\). Observe that \(\mathbf {X}^{n,\phi }\) is (a reparametrisation of) the level-N lift of the path \(\mathbf {Z}^n\) described above.

We now claim that \(\mathbf {X}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) for a Lévy process \(\mathbf {X}\) in G. A straightforward way to show this is to take local coordinates \(\sigma _1,\ldots , \sigma _d \in C^\infty _c(\mathbb {R}^d)\) so that \(\sigma \,{:=}\, \sum _{i=1}^d \sigma _iu_i\) is the identity in a neighbourhood of zero, and write the triplet of \(\mathbf {Y}\) as \((A,B,\Pi )\) with respect to \(\sigma _1,\ldots , \sigma _d\). Define the functions

$$\begin{aligned} f_n : \mathbb {R}^d \mapsto \mathfrak g, \; \; f_n(y) = \xi (e^y e^{v/n}), \end{aligned}$$

so that \(\xi (X_{nk}) = f_n(Y_{nk})\). Note that, since v is in the centre of \(\mathfrak g\), there exists a neighbourhood of zero \(V \subset \mathbb {R}^d\) and \(n_0 > 0\) such that for all \(n \ge n_0\)

$$\begin{aligned} f_n(y) = \sum _{i=1}^d \sigma _i(y) u_i + \xi (e^{v/n}) + h_n(y), \end{aligned}$$

where \(h_n \equiv 0\) on V. It readily follows that

$$\begin{aligned} \lim _{k\rightarrow \infty } \sup _{n \ge 1} k\left| \mathbb E \left[ f_n(Y_{k1}) - \xi (e^{v/n}) \right] - Q_n \right| = 0, \end{aligned}$$

where

$$\begin{aligned} Q_n \,{:=}\, \sum _{i=1}^d B^iu_i + \int _{\mathbb {R}^d} h_n(y)\Pi (dy) \in \mathfrak g. \end{aligned}$$

Observe now that for all \(y \in \mathbb {R}^d\), \(\lim _{n\rightarrow \infty }h_n(y) = \xi (e^y) - \sigma (y)\), so that by dominated convergence,

$$\begin{aligned} \lim _{n \rightarrow \infty } Q_n = \sum _{i=1}^d B^iu_i + \int _{\mathbb {R}^d} \left( \xi (e^y) - \sigma (y)\right) \Pi (dy) =: Q \in \mathfrak g, \end{aligned}$$

from which it follows that

$$\begin{aligned} \lim _{n \rightarrow \infty } n\mathbb E \left[ f_n(Y_{n1}) - f_n(0) \right] = Q. \end{aligned}$$

Since \(nf_n(0) = v\) for all n sufficiently large, we obtain that the following limit exists:

$$\begin{aligned} D \,{:=}\, \lim _{n \rightarrow \infty }n\mathbb E \left[ \xi (X_{n1}) \right] \in \mathfrak g. \end{aligned}$$

Furthermore, letting \(\Xi \) denote the pushforward of \(\Pi \) by \(\exp \), one can show in exactly the same way that

$$\begin{aligned} C^{i,j} \,{:=}\, \lim _{n \rightarrow \infty } n\mathbb E \left[ \xi _i(X_{n1})\xi _j(X_{n1}) \right] - \int _{G}\xi _i(x)\xi _j(x)\Xi (dx) \end{aligned}$$

exists for all \(i,j \in \{1,\ldots , m\}\), and that

$$\begin{aligned} \lim _{n \rightarrow \infty } n\mathbb E \left[ f(X_{n1}) \right] = \int _{G} f(x)\Xi (dx) \end{aligned}$$

for every \(f \in C_b(G)\) which is identically zero on a neighbourhood of \(1_G\). It follows by Theorem 2.1 that \(\mathbf {X}^n \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}\) as claimed, where \(\mathbf {X}\) is the Lévy process with triplet \((C,D,\Xi )\).

Finally, one readily sees that \(X_{nj}\) is scaled by any scaling function \(\theta \) on G for which

$$\begin{aligned} \theta \ge \sum _{i=1}^d|\xi _i|^2 + \sum _{\begin{array}{c} 1 \le i \le m \\ d_i = N \end{array}} |\xi _i|. \end{aligned}$$

It now follows by Theorem 5.5 that for all \(p > N\), \(\mathbf {X}^{n,\phi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^{\phi }\) as \(C_o^{0,p -var }([0,1],G)\)-valued r.v.’s. In particular, ODE flows driven by the random paths \(\mathbf {Z}^n\) along \( Lip ^{\gamma +k-1}\) vector fields, for any \(\gamma > N\), \(k \ge 1\), converge in law as \( Diff ^k\)-valued r.v.’s to the corresponding RDE flow driven by \(\mathbf {X}^\phi \) (Corollary 5.7).

Remark 5.15

Note that the previous Example 5.8 is a special case of Example 5.14 by taking \(v = 0\) and \(\gamma \) the constant path \(\gamma \equiv 0\). Building on Remark 5.9, one can verify that RDEs driven by \(\mathbf {X}^\phi \) coincide (up to reparametrisation) with the associated Marcus SDEs driven by \(\mathbf {Y}\) with an adjusted drift given by appropriate N-th level Lie brackets of the driving vector fields (cf. [16] and [18, Section 13.3.4]).

5.3.2 The Lévy–Khintchine formula for Lévy rough paths

In this subsection we determine a formula for the characteristic function (in the sense of [11]) of the signature of a Lévy rough path.

Recall that for every \(\mathbf {x}\in WG\Omega _p(\mathbb {R}^d)\), there exists an element

$$\begin{aligned} S(\mathbf {x})_{0,T} = (1,S(\mathbf {x})^1_{0,T},S(\mathbf {x})^2_{0,T}, \ldots ) \in T((\mathbb {R}^d)) = \prod _{k=0}^\infty (\mathbb {R}^d)^{\otimes k}, \end{aligned}$$

called the signature of \(\mathbf {x}\), where \(S(\mathbf {x})_{0,T}^k\) encodes all the k-fold iterated integrals of \(\mathbf {x}\). A fundamental result in rough paths theory is that \(S(\mathbf {x})_{0,T}\) belongs to a certain group \(G(\mathbb {R}^d)\) contained in the set of group-like elements of \(T((\mathbb {R}^d))\) (for the tensor Hopf algebra structure). Furthermore, for every linear map \(f \in \mathbf L(\mathbb {R}^d,\mathbf L(\mathbb {R}^e, \mathbb {R}^e))\), the series \(\sum _{k=0}^\infty f^{\otimes k}(S(\mathbf {x})_{0,T}^k)\) converges absolutely to an operator \(f(S(\mathbf {x})_{0,T}) \in \mathbf L(\mathbb {R}^e,\mathbb {R}^e)\) (which is precisely the flow \(U^\mathbf {x}_{T\leftarrow 0}\) associated with the RDE (5.1) upon treating f as a collection of linear vector fields on \(\mathbb {R}^e\)).

For a finite-dimensional complex Hilbert space H, let \(\mathfrak u(H) \subset \mathbf L(H,H)\) denote the Lie algebra of anti-Hermitian operators on H and \(\mathcal {U}(H) \subset \mathbf L(H,H)\) the group of unitary operators on H. Note that every \(f \in \mathbf L(\mathbb {R}^d, \mathfrak u(H))\) naturally induces a map \(f : G(\mathbb {R}^d) \mapsto \mathbf L(H,H)\) (which is continuous for the topology on \(G(\mathbb {R}^d)\) introduced in [11]) given by \(f(x) = \sum _{k=0}^\infty f^{\otimes k}(x^k)\), where \(x^k \in (\mathbb {R}^d)^{\otimes k}\) denotes the level-k projection of x. Note that f satisfies \(f(x) \in \mathcal {U}(H)\) and \(f(xy) = f(x)f(y)\) for all \(x,y \in G(\mathbb {R}^d)\), i.e., \(f : G(\mathbb {R}^d) \mapsto \mathcal {U}(H)\) is a unitary representation of \(G(\mathbb {R}^d)\).
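For a piecewise-linear path, \(f(S(\mathbf {x})_{0,1})\) can be evaluated directly: each chord has group-like signature \(\exp (\Delta )\), so multiplicativity gives a product of matrix exponentials \(\exp \left( \sum _i \Delta ^i f(u_i)\right) \). A minimal sketch (our illustration, with \(H = \mathbb {C}^2\) and Pauli-type generators, using SciPy's matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

# f in L(R^2, u(H)) for H = C^2: anti-Hermitian images of u_1, u_2.
F = [1j * np.array([[1.0, 0.0], [0.0, -1.0]]),   # i * sigma_z
     1j * np.array([[0.0, 1.0], [1.0, 0.0]])]    # i * sigma_x

def f_of_signature(increments):
    """f(S(x)_{0,1}) for the piecewise-linear path x with given increments:
    f is multiplicative and f(exp(delta)) = expm(sum_i delta^i f(u_i))."""
    U = np.eye(2, dtype=complex)
    for delta in increments:
        U = U @ expm(sum(di * Fi for di, Fi in zip(delta, F)))
    return U

U = f_of_signature([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
assert np.allclose(U @ U.conj().T, np.eye(2))    # unitary, as it must be
```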

One of the main results of [11] is that for any \(WG\Omega _p(\mathbb {R}^d)\)-valued random variable \(\mathbf {X}\), the following characteristic function

$$\begin{aligned} \mathbf L(\mathbb {R}^d,\mathfrak u(H))&\mapsto \mathbf L(H,H),\nonumber \\ f&\mapsto \mathbb E \left[ f(S(\mathbf {X})_{0,T}) \right] , \end{aligned}$$
(5.2)

where H varies over all finite dimensional complex Hilbert spaces, uniquely determines \(S(\mathbf {X})_{0,T}\) as a \(G(\mathbb {R}^d)\)-valued random variable (more generally, this result holds for every \(G(\mathbb {R}^d)\)-valued random variable).

Remark 5.16

Boedihardjo et al. [5] have recently established a conjecture of Hambly–Lyons [24] on the kernel of the map \(S : WG\Omega _p(\mathbb {R}^d) \mapsto T((\mathbb {R}^d))\). A consequence of the main result of [5] is that for all \(\mathbf {x},\mathbf {y}\in WG\Omega _p(\mathbb {R}^d)\), \(S(\mathbf {x})_{0,T} = S(\mathbf {y})_{0,T} \Leftrightarrow U^\mathbf {x}_{T\leftarrow 0} = U^\mathbf {y}_{T\leftarrow 0}\) for all collections \((f_i)_{i=1}^d\) of vector fields in \( Lip ^{\gamma }(\mathbb {R}^e)\) with \(\gamma > p\) (not necessarily linear). In combination with the results from [11], it follows that for any \(WG\Omega _p(\mathbb {R}^d)\)-valued random variable \(\mathbf {X}\), knowledge of the map (5.2) uniquely determines the law of every RDE driven by \(\mathbf {X}\).

We now state the aforementioned formula for the characteristic function of the signature of a Lévy rough path. For a subset \(W\subseteq G\), path function \(\phi : W \mapsto C^{p -var }_o([0,1], G)\), and a linear map \(f \in \mathbf L(\mathbb {R}^d,\mathbf L(\mathbb {R}^e,\mathbb {R}^e))\), we adopt the shorthand notation

$$\begin{aligned} f_{\phi } : W \mapsto \mathbf L(\mathbb {R}^e,\mathbb {R}^e), \; \; f_\phi (x) \,{:=}\, f(S(\phi (x))_{0,1}). \end{aligned}$$

By interpolation (Lemma 3.3), one can readily verify that \(f_\phi \) is continuous whenever \(\phi \) is p-approximating and endpoint continuous. Finally, we canonically treat \(\mathfrak g= \mathfrak g^N(\mathbb {R}^d)\) as a subspace of the tensor algebra \(T(\mathbb {R}^d)\), so that for any Lie algebra \(\mathfrak h\), every \(f \in \mathbf L(\mathbb {R}^d, \mathfrak h)\) extends uniquely to a linear map \(f : \mathfrak g\mapsto \mathfrak h\).

Theorem 5.17

(Lévy–Khintchine formula). Let \(\mathbf {X}\) be a Lévy process in G with triplet \((A, B, \Pi )\). Suppose that for some \(1 \le p < N+1\), \(\left| \left| \mathbf {X} \right| \right| _{p -var ;[0,T]} < \infty \) a.s. Let \(\phi : supp (\Pi ) \mapsto C_o^{p -var }([0,1],G)\) be a p-approximating, endpoint continuous path function.

Then for every finite-dimensional complex Hilbert space H and \(f \in \mathbf L(\mathbb {R}^d,\mathfrak u(H))\), it holds that the function

$$\begin{aligned} f_\phi - id _H - \sum _{i=1}^m f(u_i)\xi _i : supp (\Pi ) \mapsto \mathbf L(H,H) \end{aligned}$$

is \(\Pi \)-integrable, and that

$$\begin{aligned} \mathbb E \left[ f(S(\mathbf {X}^\phi )_{0,T}) \right] = \exp \left( T\Psi _\mathbf {X}(f) \right) , \end{aligned}$$
(5.3)

where

Remark 5.18

Note that every pair \((\mathbf {X},\phi )\) as in Theorem 5.17 naturally gives rise to a convolution semigroup \((\mu _t)_{t > 0}\) of probability measures on \(G(\mathbb {R}^d)\) (which we recall is a Polish but, if \(d > 1\), non-locally compact group, [11]) given by \(\mu _t = Law \left[ S(\mathbf {X}^\phi _{[0,t]})_{0,t} \right] \), where \(\mathbf {X}_{[s,t]}^\phi \in C([s,t],G)\) denotes the connecting map applied to the restriction \({\left. \mathbf {X} \phantom {\big |} \right| _{[s,t]} }\). Moreover, treating \(\phi \) as a map \( supp (\Pi ) \mapsto G(\mathbb {R}^d)\), \(x \mapsto S(\phi (x))_{0,1}\), and every \(f \in \mathbf L(\mathbb {R}^d,\mathfrak u(H))\) as unitary representation of \(G(\mathbb {R}^d)\), Theorem 5.17 bears a close resemblance to other forms of the Lévy–Khintchine formula stated in terms of unitary representations of Lie groups (see, e.g., [1, Section 5.5]).

Remark 5.19

Theorem 5.17 can be seen as an extension of a related result on the expected signature of a Lévy p-rough path for \(1 \le p < 3\) ([17, Theorem 53]) in which \(\phi \) is taken as the log-linear path function \(\phi (e^x) = e^{tx}\), \(\forall x \in \mathfrak g\), and additional moment assumptions on the Lévy measure are required to ensure existence of the expected signature.

We first record the following estimate which is readily derived from standard Euler approximations to RDEs ([18, Corollary 10.15]).

Lemma 5.20

Let \(1 \le p< \gamma < N+1\), \(\theta \) a scaling function on G, \(W \subseteq G\) a subset, and \(\phi \) a path function defined on W such that \(\lim _{x \rightarrow 1_G}\left| \left| \phi (x) \right| \right| ^\gamma _{p -var ;[0,1]}/\theta (x) = 0\).

Then for all \(f \in \mathbf L(\mathbb {R}^d, \mathbf L(\mathbb {R}^e,\mathbb {R}^e))\), it holds that

$$\begin{aligned} \lim _{x \rightarrow 1_G} \frac{1}{\theta (x)}\left| f_\phi (x) - id _{\mathbb {R}^e} - \sum _{i=1}^m \xi _i(x) f(u_i) - \frac{1}{2}\sum _{\begin{array}{c} 1 \le i,j \le m \\ d_i + d_j \le N \end{array}} \xi _i(x) \xi _j(x) f(u_i)f(u_j) \right| = 0. \end{aligned}$$

Proof of Theorem 5.17

Without loss of generality, we may assume \(T=1\). Let V be a bounded neighbourhood of \(1_G\) and \(W \,{:=}\, supp (\Pi ) \cup V\). Note that \(\phi \) shrinks on the diagonal (see Remark A.2), so we can find a path function \(\psi : W \mapsto C([0,1],G)\) which is also p-approximating and shrinks on the diagonal and such that \(\psi \equiv \phi \) on \( supp (\Pi )\) (e.g., let \(\psi (x)\) be a geodesic from \(1_G\) to x for all \(x \in V{\setminus } supp (\Pi )\)).

Let \(X_{n1}, \ldots , X_{nn}\) be the iid array constructed in Sect. 2.3 associated to \(\mathbf {X}\), and let \(\mathbf {X}^n\) be the associated random walk. Due to the shrinking support of the random variables \(Y_{nj}\) from Sect. 2.3, observe that

$$\begin{aligned} \text {for every } \varepsilon > 0, \; \mathbb {P} \left[ \mathbf {X}^n \notin supp (\Pi )^\varepsilon \right] = 0 \text { for all } n \text { sufficiently large}, \end{aligned}$$
(5.4)

where we recall the notation \( supp (\Pi )^\varepsilon \) from Section A.1. In particular, for all n sufficiently large, \(\mathbf {X}^n \in W^0\) a.s., so that \(\mathbf {X}^{n,\psi }\) is well-defined. Observe that, due to (5.4) and Proposition A.4, \(\mathbf {X}^{n, \psi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^{\psi }\) as \(C_o([0,1], G)\)-valued random variables.

Let \(p< p' < N+1\). Since \(\left| \left| \mathbf {X} \right| \right| _{p -var ;[0,1]} < \infty \) a.s. by assumption, we deduce from Theorem 5.1, Lemma 5.4, and Proposition 3.5, that

$$\begin{aligned} \mathbf {X}^{n,\psi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^{\psi } \,{\buildrel \mathcal {D}\over =}\,\mathbf {X}^\phi \; \; \text {as } C_o^{0,p' -var }([0,1],G)\text {-valued random variables}, \end{aligned}$$

where the equality in law follows from the fact that \(\psi \equiv \phi \) on \( supp (\Pi )\) and \(\mathbf {X}\in supp (\Pi )^0\) a.s. (see Remark 5.6).

For all \(i \in \{1,\ldots , m\}\), define \(q_i \,{:=}\, 2\wedge (p/d_i)\), and let \(\theta \) be a scaling function on G such that \(\theta \equiv \sum _{i=1}^m |\xi _i|^{q_i}\) in a neighbourhood of \(1_G\). It follows from Lemma 2.5 and part (2) of Theorem 5.1 that \(\theta \) scales the array \(X_{nj}\).

Since \(\psi \) is p-approximating, it holds that \(\lim _{x \rightarrow 1_G}\left| \left| \psi (x) \right| \right| _{p -var ;[0,1]}^\gamma /\theta (x) = 0\) for all \(\gamma > p\). For \(f \in \mathbf L(\mathbb {R}^d, \mathfrak u(H))\), observe that \(f_{\psi }\) is a map from W to the unitary operators on H (thus bounded) and is continuous on \( supp (\Pi )\). Since \(f_{\psi } \equiv f_{\phi }\) on \( supp (\Pi )\), it now follows from Lemmas 5.20 and 2.6 that

$$\begin{aligned} \lim _{n \rightarrow \infty } n\mathbb E \left[ f_{\psi }(X_{n1}) - id _H \right] = \Psi _\mathbf {X}(f), \end{aligned}$$

and thus

$$\begin{aligned} \lim _{n \rightarrow \infty } \mathbb E \left[ f_{\psi }(X_{n1}) \right] ^n = \exp (\Psi _\mathbf {X}(f)). \end{aligned}$$

Since the array \(X_{nj}\) is iid, note that for all \(n \ge 1\)

$$\begin{aligned} \mathbb E \left[ f(S(\mathbf {X}^{n,\psi })_{0,1}) \right] = \mathbb E \left[ f_{\psi }(X_{n1}) \right] ^n. \end{aligned}$$

Since \(\mathbf {X}^{n,\psi } \,{\buildrel \mathcal {D}\over \rightarrow }\,\mathbf {X}^\phi \) as \(WG\Omega _{p'}(\mathbb {R}^d)\)-valued r.v.’s, and \(\mathbf {x}\mapsto f(S(\mathbf {x})_{0,1})\) is a continuous bounded function on \(WG\Omega _{p'}(\mathbb {R}^d)\), we obtain (5.3). \(\square \)