1 Introduction

The small-time asymptotics of heat kernels have been studied extensively over the years, from analytic, geometric and probabilistic points of view. Bismut [9] used Malliavin calculus to analyse the heat kernel in the elliptic case and developed a deterministic Malliavin calculus to study Hörmander-type hypoelliptic heat kernels. Following this approach, Ben Arous [4] found the corresponding small-time asymptotics outside the sub-Riemannian cut locus, while Ben Arous [5] and Léandre [11] studied the behaviour on the diagonal. In joint work [6, 7], they also discussed the exponential decay of hypoelliptic heat kernels on the diagonal.

In recent years, there has been further progress in the study of heat kernels on sub-Riemannian manifolds. Barilari et al. [3] found estimates of the heat kernel on the cut locus by using an analytic approach, and Inahama and Taniguchi [10] combined Malliavin calculus and rough path theory to determine small-time full asymptotic expansions on the off-diagonal cut locus. Moreover, Bailleul et al. [2] studied the asymptotics of sub-Riemannian diffusion bridges outside the cut locus. We extend their analysis to the diagonal and describe the asymptotics of sub-Riemannian diffusion loops. In a suitable chart, and after a suitable rescaling, we show that the small-time diffusion loop measures have a non-degenerate limit, which we identify explicitly in terms of a certain local limiting operator. Our analysis also allows us to determine the loop asymptotics under the scaling used to obtain a small-time Gaussian limit of the sub-Riemannian diffusion bridge measures in [2]. In general, these asymptotics are now degenerate and need no longer be Gaussian.

Let M be a connected smooth manifold of dimension d and let a be a smooth non-negative quadratic form on the cotangent bundle \(T^*M\). Let \(\mathcal {L}\) be a second order differential operator on M with smooth coefficients, such that \(\mathcal {L}1=0\) and such that \(\mathcal {L}\) has principal symbol a / 2. One refers to a as the diffusivity of the operator \(\mathcal {L}\). We say that a has a sub-Riemannian structure if there exist \(m\in \mathbb {N}\) and smooth vector fields \(X_1,\dots ,X_m\) on M satisfying the strong Hörmander condition, i.e. the vector fields together with their commutator brackets of all orders span \(T_yM\) for all \(y\in M\), such that

$$\begin{aligned} a(\xi ,\xi )=\sum _{i=1}^m\langle \xi , X_i(y)\rangle ^2 \quad \text{ for }\quad \xi \in T_y^* M. \end{aligned}$$

Thus, we can write

$$\begin{aligned} \mathcal {L}=\frac{1}{2}\sum _{i=1}^m X_i^2+X_0 \end{aligned}$$

for a vector field \(X_0\) on M, which we also assume to be smooth. Note that the vector fields \(X_0,X_1,\dots ,X_m\) are allowed to vanish and hence, the sub-Riemannian structure \((X_1,\dots ,X_m)\) need not be of constant rank. To begin with, we impose the global condition

$$\begin{aligned} M=\mathbb {R}^d\quad \text{ and }\quad X_0,X_1,\dots ,X_m\in C_b^\infty (\mathbb {R}^d,\mathbb {R}^d), \end{aligned}$$
(1.1)

subject to the additional constraint \(X_0(y)\in \hbox {span}\{X_1(y),\dots ,X_m(y)\}\) for all \(y\in \mathbb {R}^d\). Subsequently, we follow Bailleul et al. [2] and insist that there exist a smooth one-form \(\beta \) on M with \(\Vert a(\beta ,\beta )\Vert _\infty <\infty \), and a locally invariant positive smooth measure \(\tilde{\nu }\) on M such that, for all \(f\in C^\infty (M)\),

$$\begin{aligned} \mathcal {L}f =\frac{1}{2}\hbox {div}(a\nabla f) + a(\beta ,\nabla f). \end{aligned}$$

Here the divergence is understood with respect to \(\tilde{\nu }\). Note that if the operator \(\mathcal {L}\) is of this form then \(X_0=\sum _{i=1}^m\alpha _i X_i\) with \(\alpha _i=\frac{1}{2}\hbox {div } X_i+\beta (X_i)\) and in particular, \(X_0(y)\in \hbox {span}\{X_1(y),\dots ,X_m(y)\}\) for all \(y\in M\).
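
Indeed, writing \(a\nabla f=\sum _{i=1}^m(X_if)X_i\) and expanding the divergence with respect to \(\tilde{\nu }\) gives

$$\begin{aligned} \mathcal {L}f=\frac{1}{2}\hbox {div}(a\nabla f)+a(\beta ,\nabla f) =\frac{1}{2}\sum _{i=1}^m X_i^2f+ \sum _{i=1}^m\left( \frac{1}{2}\hbox {div } X_i+\beta (X_i)\right) X_if, \end{aligned}$$

which identifies the drift vector field \(X_0\) in the representation \(\mathcal {L}=\frac{1}{2}\sum _{i=1}^m X_i^2+X_0\).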

We are interested in the associated diffusion bridge measures. Fix \(x\in M\) and let \(\varepsilon >0\). If we do not assume the global condition then the diffusion process \((x_t^\varepsilon )_{t<\zeta }\), defined up to the explosion time \(\zeta \), starting from x and having generator \(\varepsilon \mathcal {L}\) may explode with positive probability before time 1. However, on the event \(\{\zeta >1\}\), the process \((x_t^\varepsilon )_{t\in [0,1]}\) has a unique sub-probability law \(\mu _\varepsilon ^x\) on the set of continuous paths \(\Omega =C([0,1],M)\). Choose a positive smooth measure \(\nu \) on M, which may differ from the locally invariant positive measure \(\tilde{\nu }\) on M, and let p denote the Dirichlet heat kernel for \(\mathcal {L}\) with respect to \(\nu \). We can disintegrate \(\mu _\varepsilon ^x\) to give a unique family of probability measures \((\mu _\varepsilon ^{x,y}{:}\,y\in M)\) on \(\Omega \) such that

$$\begin{aligned} \mu _\varepsilon ^x({\mathrm d}\omega )=\int _M\mu _\varepsilon ^{x,y}({\mathrm d}\omega ) p(\varepsilon ,x,y)\nu ({\mathrm d}y), \end{aligned}$$

with \(\mu _\varepsilon ^{x,y}\) supported on \(\Omega ^{x,y}=\{\omega \in \Omega {:}\,\omega _0=x,\omega _1=y\}\) for all \(y\in M\) and where the map \(y\mapsto \mu _\varepsilon ^{x,y}\) is weakly continuous. Bailleul et al. [2] studied the small-time fluctuations of the diffusion bridge measures \(\mu _\varepsilon ^{x,y}\) in the limit \(\varepsilon \rightarrow 0\) under the assumption that \((x,y)\) lies outside the sub-Riemannian cut locus. Due to the latter condition, their results do not cover the diagonal case unless \(\mathcal {L}\) is elliptic at x. We show how to extend their analysis in order to understand the small-time fluctuations of the diffusion loop measures \(\mu _\varepsilon ^{x,x}\).

As a by-product, we recover the small-time heat kernel asymptotics on the diagonal shown by Ben Arous [5] and Léandre [11]. Even though our approach for obtaining the small-time asymptotics on the diagonal is similar to [5], it does not rely on the Rothschild and Stein lifting theorem, cf. [15]. Instead, we use the notion of an adapted chart at x, introduced by Bianchini and Stefani [8], which provides suitable coordinates around x. We discuss adapted charts in detail in Sect. 2. The chart in which Ben Arous [5] performed his analysis is in fact one specific example of an adapted chart, whereas we allow for any adapted chart. In the case where the diffusivity a has a sub-Riemannian structure which is one-step bracket-generating at x, any chart around x is adapted. In general, however, these charts are more complex; for instance, even if \(M=\mathbb {R}^d\) there is no reason to assume that the identity map is adapted. Paoli [14] successfully used adapted charts to describe the small-time asymptotics of Hörmander-type hypoelliptic operators with non-vanishing drift at a stationary point of the drift field.

To a sub-Riemannian structure \((X_1,\dots ,X_m)\) on M, we associate a linear scaling map \({\delta }_\varepsilon {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) in a suitable set of coordinates, which depends on the number of brackets needed to achieve each direction, and the so-called nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\), which are homogeneous vector fields on \(\mathbb {R}^d\). For the details see Sect. 2. The map \({\delta }_\varepsilon \) allows us to rescale the fluctuations of the diffusion loop to high enough orders so as to obtain a non-degenerate limiting measure, and the nilpotent approximations are used to describe this limiting measure. Let \((U,\theta )\) be an adapted chart around \(x\in M\). Smoothly extending this chart to all of M yields a smooth map \(\theta {:}\,M\rightarrow \mathbb {R}^d\) whose derivative \({\mathrm d}\theta _x{:}\,T_xM\rightarrow \mathbb {R}^d\) at x is invertible. Write \(T\Omega ^{0,0}\) for the set of continuous paths \(v=(v_t)_{t\in [0,1]}\) in \(T_xM\) with \(v_0=v_1=0\). Define a rescaling map \(\sigma _\varepsilon {:}\,\Omega ^{x,x}\rightarrow T\Omega ^{0,0}\) by

$$\begin{aligned} \sigma _\varepsilon (\omega )_t=({\mathrm d}\theta _x)^{-1} \left( {\delta }_\varepsilon ^{-1}\left( \theta (\omega _t)-\theta (x)\right) \right) \end{aligned}$$

and let \({\tilde{\mu }}_\varepsilon ^{x,x}\) be the pushforward measure of \(\mu _\varepsilon ^{x,x}\) by \(\sigma _\varepsilon \), i.e. \({\tilde{\mu }}_\varepsilon ^{x,x}\) is the unique probability measure on \(T\Omega ^{0,0}\) given by

$$\begin{aligned} {\tilde{\mu }}_\varepsilon ^{x,x}=\mu _\varepsilon ^{x,x}\circ \sigma _\varepsilon ^{-1}. \end{aligned}$$

Our main result concerns the weak convergence of these rescaled diffusion loop measures \({\tilde{\mu }}_\varepsilon ^{x,x}\). To describe the limit, assuming that \(\theta (x)=0\), we consider the diffusion process \(({\tilde{x}}_t)_{t\ge 0}\) in \(\mathbb {R}^d\) starting from 0 and having generator

$$\begin{aligned} \tilde{\mathcal {L}}=\frac{1}{2}\sum _{i=1}^m{\tilde{X}}_i^2. \end{aligned}$$

A nice cascade structure of the nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) ensures that this process exists for all time. Let \({\tilde{\mu }}^{0,\mathbb {R}^d}\) denote the law of the diffusion process \(({\tilde{x}}_t)_{t\in [0,1]}\) on the set of continuous paths \(\Omega (\mathbb {R}^d)=C([0,1],\mathbb {R}^d)\). By disintegrating \({\tilde{\mu }}^{0,\mathbb {R}^d}\), we obtain the loop measure \({\tilde{\mu }}^{0,0,\mathbb {R}^d}\) supported on the set \(\Omega (\mathbb {R}^d)^{0,0}= \{\omega \in \Omega (\mathbb {R}^d){:}\,\omega _0=\omega _1=0\}\). Define a map \(\rho {:}\,\Omega (\mathbb {R}^d)^{0,0}\rightarrow T\Omega ^{0,0}\) by

$$\begin{aligned} \rho (\omega )_t=({\mathrm d}\theta _x)^{-1}\omega _t \end{aligned}$$

and set \({\tilde{\mu }}^{x,x}={\tilde{\mu }}^{0,0,\mathbb {R}^d}\circ \rho ^{-1}\). This is the desired limiting probability measure on \(T\Omega ^{0,0}\).

Theorem 1.1

(Convergence of the rescaled diffusion loop measures) Let M be a connected smooth manifold and fix \(x\in M\). Let \(\mathcal {L}\) be a second order partial differential operator on M such that, for all \(f\in C^\infty (M)\),

$$\begin{aligned} \mathcal {L}f=\frac{1}{2}\hbox {div}(a\nabla f)+a(\beta ,\nabla f), \end{aligned}$$

with respect to a locally invariant positive smooth measure, and where the smooth non-negative quadratic form a on \(T^*M\) has a sub-Riemannian structure and the smooth one-form \(\beta \) on M satisfies \(\Vert a(\beta ,\beta )\Vert _\infty <\infty \). Then the rescaled diffusion loop measures \({\tilde{\mu }}_\varepsilon ^{x,x}\) converge weakly to the probability measure \({\tilde{\mu }}^{x,x}\) on \(T\Omega ^{0,0}\) as \(\varepsilon \rightarrow 0\).

We prove this result by localising Theorem 1.2. As a consequence of the localisation argument, Theorem 1.1 remains true under the weaker assumption that the smooth vector fields giving the sub-Riemannian structure are only locally defined. The theorem below imposes an additional constraint on the map \(\theta \) which ensures that we can rely on the tools of Malliavin calculus to prove it. As we see later, the existence of such a diffeomorphism \(\theta \) is always guaranteed.

Theorem 1.2

Fix \(x\in \mathbb {R}^d\). Let \(X_0,X_1,\dots ,X_m\) be smooth bounded vector fields on \(\mathbb {R}^d\), with bounded derivatives of all orders, which satisfy the strong Hörmander condition everywhere and suppose that \(X_0(y)\in \hbox {span} \{X_1(y),\dots ,X_m(y)\}\) for all \(y\in \mathbb {R}^d\). Set

$$\begin{aligned} \mathcal {L}=\frac{1}{2}\sum _{i=1}^m X_i^2+X_0. \end{aligned}$$

Assume that the smooth map \(\theta {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) is a global diffeomorphism with bounded derivatives of all positive orders which is an adapted chart at x. Then the rescaled diffusion loop measures \({\tilde{\mu }}_\varepsilon ^{x,x}\) converge weakly to the probability measure \({\tilde{\mu }}^{x,x}\) on \(T\Omega ^{0,0}\) as \(\varepsilon \rightarrow 0\).

Note that the limiting measures with respect to two different choices of admissible diffeomorphisms \(\theta _1\) and \(\theta _2\) are related by the Jacobian matrix of the transition map \(\theta _2\circ \theta _1^{-1}\).

The proof of Theorem 1.2 follows [2]. The additional technical result needed in our analysis is the uniform non-degeneracy of the \({\delta }_\varepsilon \)-rescaled Malliavin covariance matrices. Throughout the paper, we consider Malliavin covariance matrices in the sense of Bismut and refer to what is also called the reduced Malliavin covariance matrix simply as the Malliavin covariance matrix. Under the global assumption, there exists a unique diffusion process \((x_t^\varepsilon )_{t\in [0,1]}\) starting at x and having generator \(\varepsilon \mathcal {L}\). Choose \(\theta {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) as in Theorem 1.2 and define \(({\tilde{x}}_t^\varepsilon )_{t\in [0,1]}\) to be the rescaled diffusion process given by

$$\begin{aligned} {\tilde{x}}_t^\varepsilon = {\delta }_\varepsilon ^{-1}\left( \theta (x_t^\varepsilon )-\theta (x)\right) . \end{aligned}$$

Denote the Malliavin covariance matrix of \({\tilde{x}}_1^\varepsilon \) by \({\tilde{c}}_1^\varepsilon \). We know that, for each \(\varepsilon >0\), the matrix \({\tilde{c}}_1^\varepsilon \) is non-degenerate because the vector fields \(X_1,\dots ,X_m\) satisfy the strong Hörmander condition everywhere. We prove that these Malliavin covariance matrices are in fact uniformly non-degenerate.

Theorem 1.3

(Uniform non-degeneracy of the \({\delta }_\varepsilon \)-rescaled Malliavin covariance matrices) Let \(X_0,X_1,\dots ,X_m\) be smooth bounded vector fields on \(\mathbb {R}^d\), with bounded derivatives of all orders, which satisfy the strong Hörmander condition everywhere and such that \(X_0(y)\in \hbox {span}\{X_1(y),\dots ,X_m(y)\}\) for all \(y\in \mathbb {R}^d\). Fix \(x\in \mathbb {R}^d\) and consider the diffusion operator

$$\begin{aligned} \mathcal {L}=\frac{1}{2}\sum _{i=1}^m X_i^2+X_0. \end{aligned}$$

Then the rescaled Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \) are uniformly non-degenerate, i.e. for all \(p<\infty \), we have

$$\begin{aligned} \sup _{\varepsilon \in (0,1]}\mathbb {E}\left[ \left| \det \left( {\tilde{c}}_1^\varepsilon \right) ^{-1} \right| ^p\right] <\infty . \end{aligned}$$

We see that the uniform non-degeneracy of the rescaled Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \) is a consequence of the non-degeneracy of the limiting diffusion process \(({\tilde{x}}_t)_{t\in [0,1]}\) with generator \(\tilde{\mathcal {L}}\). The latter is implied by the nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) satisfying the strong Hörmander condition everywhere on \(\mathbb {R}^d\), as proven in Sect. 2.

Organisation of the paper In Sect. 2, we define the scaling operator \({\delta }_\varepsilon \) with which we rescale the fluctuations of the diffusion loop to obtain a non-degenerate limit. This section also sets up notation for the subsequent sections and establishes preliminary results from which we deduce properties of the limiting measure. In Sect. 3, we characterise the leading-order terms of the rescaled Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \) as \(\varepsilon \rightarrow 0\) and use this to prove Theorem 1.3. Equipped with the uniform non-degeneracy result, in Sect. 4, we adapt the analysis from [2] to prove Theorem 1.2. The approach presented is based on ideas from Azencott, Bismut and Ben Arous and relies on tools from Malliavin calculus. Finally, in Sect. 5, we employ a localisation argument to prove Theorem 1.1 and give an example to illustrate the result. Moreover, we discuss the occurrence of non-Gaussian behaviour in the \(\sqrt{\varepsilon }\)-rescaled fluctuations of diffusion loops.

2 Graded structure and nilpotent approximation

We introduce the notion of an adapted chart and of an associated dilation \(\delta _\varepsilon {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) which allows us to rescale the fluctuations of a diffusion loop in a way which gives rise to a non-degenerate limit as \(\varepsilon \rightarrow 0\). To be able to characterise this limiting measure later, we define the nilpotent approximation of a vector field on M and show that the nilpotent approximations of a sub-Riemannian structure form a sub-Riemannian structure themselves. This section is based on Bianchini and Stefani [8] and Paoli [14], but we make some adjustments because the drift term \(X_0\) plays a different role in our setting. At the end, we present an example to illustrate the various constructions.

2.1 Graded structure induced by a sub-Riemannian structure

Let \((X_1,\dots ,X_m)\) be a sub-Riemannian structure on M and fix \(x\in M\). For \(k\ge 1\), set

$$\begin{aligned} \mathcal {A}_k= \left\{ [X_{i_1},[X_{i_2},\dots , [X_{i_{k-1}},X_{i_k}]\dots ]] {:}\,1\le i_1,\dots ,i_k\le m\right\} \end{aligned}$$

and, for \(n\ge 0\), define a subspace of the space of smooth vector fields on M by

$$\begin{aligned} C_n=\hbox {span }\bigcup _{k=1}^n\mathcal {A}_k, \end{aligned}$$

where the linear combinations are taken over \(\mathbb {R}\). Note that \(C_0=\{0\}\). Let \(C=\hbox {Lie}\{X_1,\dots ,X_m\}\) be the Lie algebra over \(\mathbb {R}\) generated by the vector fields \(X_1,\dots ,X_m\). We observe that \(C_n\subset C_{n+1}\) as well as \([C_{n_1},C_{n_2}]\subset C_{n_1+n_2}\) for \(n_1,n_2\ge 0\) and that \(\bigcup _{n\ge 0} C_n=C\). Hence, \(\mathcal {C}=\{C_n\}_{n\ge 0}\) is an increasing filtration of the subalgebra C of the Lie algebra of smooth vector fields on M. Consider the subspace \(C_n(x)\) of the tangent space \(T_xM\) given by

$$\begin{aligned} C_n(x)=\left\{ X(x){:}\,X\in C_n\right\} . \end{aligned}$$

Define \(d_n=\hbox {dim } C_n(x)\). Since \(X_1,\dots ,X_m\) are assumed to satisfy the strong Hörmander condition, we have \(\bigcup _{n\ge 0} C_n(x) = T_x M\), and it follows that

$$\begin{aligned} N=\min \{n\ge 1{:}\,d_n=d\} \end{aligned}$$

is well-defined. We call N the step of the filtration \(\mathcal {C}\) at x.

Definition 2.1

A chart \((U,\theta )\) around \(x\in M\) is called an adapted chart to the filtration \(\mathcal {C}\) at x if \(\theta (x)=0\) and, for all \(n\in \{1,\dots ,N\}\),

  (i)

    \(\displaystyle C_n(x)= \hbox {span}\left\{ \frac{\partial }{\partial \theta ^1}(x), \dots ,\frac{\partial }{\partial \theta ^{d_n}}(x)\right\} ,\) and

  (ii)

    \(\left( \hbox {D }\theta ^k\right) (x)=0\) for every differential operator \(\hbox {D}\) of the form

    $$\begin{aligned} \hbox {D}=Y_1\dots Y_n\quad \text{ with }\quad Y_1,\dots ,Y_n\in \{X_1,\dots ,X_m\} \end{aligned}$$

    and all \(k>d_n.\)

Note that condition (ii) is equivalent to requiring that \((\hbox {D }\theta ^k)(x)=0\) for every differential operator \(\hbox {D }\in \hbox {span} \{Y_1\cdots Y_j{:}\,Y_l\in C_{i_l} \text{ and } i_1+\dots +i_j\le n\}\) and all \(k>d_n\). The existence of an adapted chart to the filtration \(\mathcal {C}\) at x is ensured by [8, Corollary 3.1], which explicitly constructs such a chart by considering the integral curves of the vector fields \(X_1,\dots ,X_m\). However, we should keep in mind that even though being adapted at x is a local property, the germs of adapted charts at x need not coincide.

Unlike Bianchini and Stefani [8], we choose to construct our graded structure on \(\mathbb {R}^d\) instead of on the domain U of an adapted chart, as this works better with our analysis. Define weights \(w_1,\dots ,w_d\) by setting \(w_k=\min \{l\ge 1{:}\, d_l\ge k\}\). This definition immediately implies \(1\le w_1 \le \dots \le w_d=N\). Let \(\delta _\varepsilon {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) be the anisotropic dilation given by

$$\begin{aligned} \delta _\varepsilon (y)=\delta _\varepsilon \left( y^1,\dots ,y^k,\dots ,y^d\right) = \left( \varepsilon ^{w_1/2}y^1,\dots ,\varepsilon ^{w_k/2}y^k,\dots ,\varepsilon ^{w_d/2}y^d\right) , \end{aligned}$$

where \((y^1,\dots ,y^d)\) are Cartesian coordinates on \(\mathbb {R}^d\). For a non-negative integer w, a polynomial g on \(\mathbb {R}^d\) is called homogeneous of weight w if it satisfies \(g\circ \delta _\varepsilon =\varepsilon ^{w/2}g\). For instance, the monomial \((y^1)^{\alpha _1}\cdots (y^d)^{\alpha _d}\) is homogeneous of weight \(\sum _{k=1}^d\alpha _k w_k\). We denote the set of polynomials which are homogeneous of weight w by \(\mathcal {P}(w)\). Note that the zero polynomial is contained in \(\mathcal {P}(w)\) for all non-negative integers w. Following [8], the graded order \(\mathcal {O}(g)\) of a polynomial g is defined by the property

$$\begin{aligned} \mathcal {O}(g)\ge i \quad \text{ if } \text{ and } \text{ only } \text{ if }\quad g\in \bigoplus _{w\ge i}\mathcal {P}(w). \end{aligned}$$

Thus, the graded order of a non-zero polynomial g is the maximal non-negative integer i such that \(g\in \oplus _{w\ge i}\mathcal {P}(w)\) whereas the graded order of the zero polynomial is set to be \(\infty \). Similarly, for a smooth function \(f\in C^\infty (V)\), where \(V\subset \mathbb {R}^d\) is an open neighbourhood of 0, we define its graded order \(\mathcal {O}(f)\) by requiring that \(\mathcal {O}(f)\ge i\) if and only if each Taylor approximation of f at 0 has graded order at least i. We see that the graded order of a smooth function is either a non-negative integer or \(\infty \). Furthermore, for an integer a, a polynomial vector field Y on \(\mathbb {R}^d\) is called homogeneous of weight a if, for all \(g\in \mathcal {P}(w)\), we have \(Y g\in \mathcal {P}(w-a)\). Here we set \(\mathcal {P}(b)=\{0\}\) for negative integers b. The weight of a general polynomial vector field is defined to be the smallest weight of its homogeneous components. Moreover, the graded order \(\mathcal {O}(\hbox {D})\) of a differential operator \(\hbox {D}\) on V is given by saying that

$$\begin{aligned} \mathcal {O}(\hbox {D})\le i\quad \text{ if } \text{ and } \text{ only } \text{ if }\quad \mathcal {O}(\hbox {D }g)\ge \mathcal {O}(g)-i \text{ for } \text{ all } \text{ polynomials } g. \end{aligned}$$

For example, the polynomial vector field \(y^1\frac{\partial }{\partial y^1}+(y^1)^2\frac{\partial }{\partial y^1}\) on \(\mathbb {R}^d\) has weight \(-w_1\) but considered as a differential operator it has graded order 0. It also follows that the graded order of a differential operator takes values in \(\mathbb {Z}\cup \{\pm \infty \}\) and that the zero differential operator has graded order \(-\infty \). In the remainder, we need the notions of the weight of a polynomial vector field and the graded order of a vector field understood as a differential operator. For smooth vector fields \(X_1\) and \(X_2\) on V, it holds true that

$$\begin{aligned} \mathcal {O}([X_1,X_2])\le \mathcal {O}(X_1)+\mathcal {O}(X_2). \end{aligned}$$
(2.1)

We further observe that for any smooth vector field X on V and every integer n, there exists a unique polynomial vector field \(X^{(n)}\) of weight at least n such that \(\mathcal {O}(X-X^{(n)})\le n-1\), namely the sum of the homogeneous vector fields of weight greater than or equal to n in the formal Taylor series of X at 0.

Definition 2.2

Let X be a smooth vector field on V. We call \(X^{(n)}\) the graded approximation of weight n of X.

Note that \(X^{(n)}\) is a polynomial vector field and hence, it can be considered as a vector field defined on all of \(\mathbb {R}^d\).
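
As a small computational illustration of these definitions, the following sketch (our own; the helper names are hypothetical and not notation from [8]) computes the weights \(w_k\) from the dimensions \(d_n\) and checks the homogeneity relation \(g\circ \delta _\varepsilon =\varepsilon ^{w/2}g\) for a monomial, using the data of Example 2.7 below.

```python
import sympy as sp

def coordinate_weights(dims):
    # dims = [d_1, ..., d_N]; the weight of coordinate k is min{ l >= 1 : d_l >= k }
    return [min(l for l, dl in enumerate(dims, start=1) if dl >= k)
            for k in range(1, dims[-1] + 1)]

w = coordinate_weights([1, 1, 2])      # Example 2.7 below has d_1 = d_2 = 1, d_3 = 2
print(w)                               # [1, 3]

# the monomial g = y1 * y2 should be homogeneous of weight w_1 + w_2 = 4
y1, y2, eps = sp.symbols('y1 y2 epsilon', positive=True)
g = y1 * y2
g_dilated = g.subs({y1: eps ** sp.Rational(w[0], 2) * y1,
                    y2: eps ** sp.Rational(w[1], 2) * y2}, simultaneous=True)
print(sp.simplify(g_dilated / g))      # epsilon**2 = epsilon**(4/2)
```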

2.2 Nilpotent approximation

Let \((U,\theta )\) be an adapted chart to the filtration induced by a sub-Riemannian structure \((X_1,\dots ,X_m)\) on M at x and set \(V=\theta (U)\). For \(i\in \{1,\dots ,m\}\), the pushforward \(\theta _*X_i\) is a smooth vector field on V, and we write \({\tilde{X}}_i\) for the graded approximation \((\theta _*X_i)^{(1)}\) of weight 1 of \(\theta _*X_i\).

Definition 2.3

The polynomial vector fields \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) on \(\mathbb {R}^d\) are called the nilpotent approximations of the vector fields \(X_1,\dots ,X_m\) on M.

By [8, Theorem 3.1], we know that \(\mathcal {O}(\theta _*X_i)\le 1\). Thus, the formal Taylor series of \(\theta _*X_i\) at 0 cannot contain any homogeneous components of weight greater than or equal to two. This implies that \({\tilde{X}}_i\) is a homogeneous vector field of weight 1 and therefore,

$$\begin{aligned} \left( \delta _\varepsilon ^{-1}\right) _* {\tilde{X}}_i =\varepsilon ^{-1/2}{\tilde{X}}_i \quad \text{ for } \text{ all } i\in \{1,\dots ,m\}. \end{aligned}$$

Moreover, from \(\mathcal {O}(\theta _*X_i-{\tilde{X}}_i)\le 0\), we deduce that

$$\begin{aligned} \sqrt{\varepsilon }\left( \delta _\varepsilon ^{-1}\right) _* (\theta _* X_i )\rightarrow {\tilde{X}}_i \quad \text{ as }\quad \varepsilon \rightarrow 0 \quad \text{ for } \text{ all } i\in \{1,\dots ,m\}. \end{aligned}$$

This convergence holds on all of \(\mathbb {R}^d\) because for \(y\in \mathbb {R}^d\) fixed, we have \(\delta _\varepsilon (y)\in V\) for \(\varepsilon >0\) sufficiently small.

Remark 2.4

The vector fields \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) on \(\mathbb {R}^d\) have a nice cascade structure. Since \({\tilde{X}}_i\), for \(i\in \{1,\dots ,m\}\), consists precisely of the terms of weight 1, the component \({\tilde{X}}_i^k\), for \(k\in \{1,\dots ,d\}\), does not depend on the coordinates with weight greater than or equal to \(w_k\) and depends only linearly on the coordinates with weight \(w_k-1\). \(\square \)

We show that the nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) inherit the strong Hörmander property from the sub-Riemannian structure \((X_1,\dots ,X_m)\). This result plays a crucial role in the subsequent sections as it allows us to describe the limiting measure of the rescaled fluctuations by a stochastic process whose associated Malliavin covariance matrix is non-degenerate.

Lemma 2.5

Let

$$\begin{aligned} \tilde{\mathcal {A}}_k(0)= \left\{ [{\tilde{X}}_{i_1},[{\tilde{X}}_{i_2},\dots , [{\tilde{X}}_{i_{k-1}},{\tilde{X}}_{i_k}]\dots ]](0) {:}\, 1\le i_1,\dots ,i_k\le m\right\} . \end{aligned}$$

Then

$$\begin{aligned} \mathrm {span}\bigcup _{k=1}^n\tilde{\mathcal {A}}_k(0)= \mathrm {span}\left\{ \frac{\partial }{\partial y^1}(0), \dots ,\frac{\partial }{\partial y^{d_n}}(0)\right\} . \end{aligned}$$
(2.2)

Proof

We prove this lemma by induction. For the base case, we note that \(\mathcal {O}(\theta _*X_i-{\tilde{X}}_i)\le 0\) implies \({\tilde{X}}_i(0)=(\theta _*X_i)(0)\). Hence, by property (i) of an adapted chart \(\theta \), we obtain

$$\begin{aligned} \hbox {span }\tilde{\mathcal {A}}_1(0)&=\hbox {span}\left\{ {\tilde{X}}_1(0),\dots ,{\tilde{X}}_m(0)\right\} = (\theta _*C_1)(0)\\&=\hbox {span}\left\{ \frac{\partial }{\partial y^1}(0), \dots ,\frac{\partial }{\partial y^{d_1}}(0)\right\} , \end{aligned}$$

which proves (2.2) for \(n=1\). Let us now assume the result for \(n-1\). Due to \(\mathcal {O}(\theta _*X_i-{\tilde{X}}_i)\le 0\) and using (2.1) as well as the bilinearity of the Lie bracket, it follows that

$$\begin{aligned} \mathcal {O}\left( \theta _*[X_{i_1},[X_{i_2},\dots , [X_{i_{n-1}},X_{i_n}]\dots ]]- [{\tilde{X}}_{i_1},[{\tilde{X}}_{i_2},\dots , [{\tilde{X}}_{i_{n-1}},{\tilde{X}}_{i_n}]\dots ]]\right) \le n-1. \end{aligned}$$

Applying the induction hypothesis, we deduce that

$$\begin{aligned}&\left( \theta _*[X_{i_1},[X_{i_2},\dots , [X_{i_{n-1}},X_{i_n}]\dots ]]- [{\tilde{X}}_{i_1},[{\tilde{X}}_{i_2},\dots , [{\tilde{X}}_{i_{n-1}},{\tilde{X}}_{i_n}]\dots ]]\right) (0)\\&\quad \quad \in \hbox {span}\left\{ \frac{\partial }{\partial y^1}(0), \dots ,\frac{\partial }{\partial y^{d_{n-1}}}(0)\right\} = \hbox {span }\bigcup _{k=1}^{n-1}\tilde{\mathcal {A}}_k(0). \end{aligned}$$

This gives

$$\begin{aligned} \hbox {span}\left\{ \frac{\partial }{\partial y^1}(0), \dots ,\frac{\partial }{\partial y^{d_{n}}}(0)\right\} = (\theta _*C_n)(0)\subset \hbox {span }\bigcup _{k=1}^n\tilde{\mathcal {A}}_k(0) \end{aligned}$$

and since \(\mathcal {O}\left( [{\tilde{X}}_{i_1},[{\tilde{X}}_{i_2},\dots , [{\tilde{X}}_{i_{n-1}},{\tilde{X}}_{i_n}]\dots ]]\right) \le n\), the other inclusion holds as well. Thus, we have established equality, which concludes the induction step. \(\square \)

The lemma allows us to prove the following proposition.

Proposition 2.6

The nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) satisfy the strong Hörmander condition everywhere on \(\mathbb {R}^d\).

Proof

By definition, we have \(d_N=d\), and Lemma 2.5 implies that

$$\begin{aligned} \hbox {span }\bigcup _{k=1}^N\tilde{\mathcal {A}}_k(0)= \hbox {span}\left\{ \frac{\partial }{\partial y^1}(0), \dots ,\frac{\partial }{\partial y^{d}}(0)\right\} =\mathbb {R}^d, \end{aligned}$$

i.e. \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) satisfy the strong Hörmander condition at 0. In particular, there are vector fields

$$\begin{aligned} Y_1,\dots ,Y_d\in \bigcup _{k=1}^N \left\{ [{\tilde{X}}_{i_1},[{\tilde{X}}_{i_2},\dots , [{\tilde{X}}_{i_{k-1}},{\tilde{X}}_{i_k}]\dots ]] {:}\,1\le i_1,\dots ,i_k\le m\right\} \end{aligned}$$

such that \(Y_1(0),\dots ,Y_d(0)\) are linearly independent, i.e. \(\hbox {det}(Y_1(0),\dots ,Y_d(0))\not = 0\). By continuity of the map \(y\mapsto \hbox {det}(Y_1(y),\dots ,Y_d(y))\), it follows that there exists a neighbourhood \(V_0\) of 0 on which the vector fields \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) satisfy the strong Hörmander condition. Since the Lie bracket commutes with pushforward, the homogeneity property \(\left( \delta _\varepsilon ^{-1}\right) _* {\tilde{X}}_i =\varepsilon ^{-1/2}{\tilde{X}}_i\) of the nilpotent approximations shows that the strong Hörmander condition is in fact satisfied on all of \(\mathbb {R}^d\). \(\square \)

We conclude with an example.

Example 2.7

Let \(M=\mathbb {R}^2\) and fix \(x=0\). Let \(X_1\) and \(X_2\) be the vector fields on \(\mathbb {R}^2\) defined by

$$\begin{aligned} X_1=\frac{\partial }{\partial x^1}+x^1\frac{\partial }{\partial x^2} \quad \text{ and }\quad X_2=x^1\frac{\partial }{\partial x^1}, \end{aligned}$$

with respect to Cartesian coordinates \((x^1,x^2)\) on \(\mathbb {R}^2\). We compute

$$\begin{aligned}{}[X_1,X_2]=\frac{\partial }{\partial x^1}-x^1\frac{\partial }{\partial x^2} \quad \text{ and }\quad [X_1,[X_1,X_2]]=-2\frac{\partial }{\partial x^2}. \end{aligned}$$

It follows that

$$\begin{aligned} C_1(0)=C_2(0)=\hbox {span}\left\{ \frac{\partial }{\partial x^1}(0)\right\} \;,C_3(0)=\mathbb {R}^2 \quad \text{ and }\quad d_1=d_2=1\;,d_3=2. \end{aligned}$$

We note that the Cartesian coordinates are not adapted to the filtration induced by \((X_1,X_2)\) at 0 because, for instance, \(\left( (X_1)^2\,x^2\right) (0)=1\). Following the constructive proof of [8, Corollary 3.1], we find a global adapted chart \(\theta {:}\,\mathbb {R}^2\rightarrow \mathbb {R}^2\) at 0 given by

$$\begin{aligned} \theta ^1=x^1\quad \text{ and }\quad \theta ^2=-\frac{1}{2}(x^1)^2+x^2. \end{aligned}$$

The corresponding weights are \(w_1=1\), \(w_2=3\) and the associated anisotropic dilation is

$$\begin{aligned} \delta _\varepsilon (y^1,y^2)=\left( \varepsilon ^{1/2}y^1,\varepsilon ^{3/2}y^2\right) , \end{aligned}$$

where \((y^1,y^2)\) are Cartesian coordinates on our new copy of \(\mathbb {R}^2\). For the pushforward vector fields of \(X_1\) and \(X_2\) by \(\theta \), we obtain

$$\begin{aligned} \theta _*X_1=\frac{\partial }{\partial y^1} \quad \text{ and }\quad \theta _*X_2= y^1\left( \frac{\partial }{\partial y^1}-y^1\frac{\partial }{\partial y^2}\right) . \end{aligned}$$

From this we can read off that

$$\begin{aligned} {\tilde{X}}_1=\frac{\partial }{\partial y^1} \quad \text{ and }\quad {\tilde{X}}_2=-\left( y^1\right) ^2\frac{\partial }{\partial y^2} \end{aligned}$$

because \(y^1\frac{\partial }{\partial y^1}\) is a vector field of weight 0. We observe that \({\tilde{X}}_1\) and \({\tilde{X}}_2\) are indeed homogeneous vector fields of weight 1 on \(\mathbb {R}^2\) which satisfy the strong Hörmander condition everywhere.
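
The bracket and pushforward computations in this example are easily verified with a computer algebra system. The following sympy sketch (ours, purely illustrative) reproduces them and also confirms that \([{\tilde{X}}_1,[{\tilde{X}}_1,{\tilde{X}}_2]]=-2\,\partial /\partial y^2\), so that \({\tilde{X}}_1\) together with this iterated bracket spans \(\mathbb {R}^2\) at every point.

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

def bracket(X, Y, coords):
    # Lie bracket of vector fields given as component lists:
    # [X, Y]^k = sum_i ( X^i * dY^k/dx^i - Y^i * dX^k/dx^i )
    return [sp.expand(sum(X[i] * sp.diff(Y[k], coords[i])
                          - Y[i] * sp.diff(X[k], coords[i])
                          for i in range(len(coords))))
            for k in range(len(coords))]

X1, X2 = [1, x1], [x1, 0]                    # the vector fields of Example 2.7
print(bracket(X1, X2, [x1, x2]))             # [1, -x1]
print(bracket(X1, bracket(X1, X2, [x1, x2]), [x1, x2]))   # [0, -2]

# pushforward by the adapted chart theta = (x1, -(x1)**2/2 + x2); since
# theta^1 = x1, substituting y1 for x1 gives the components in y-coordinates
theta = sp.Matrix([x1, -x1**2 / 2 + x2])
J = theta.jacobian([x1, x2])
for X in (X1, X2):
    print(sp.simplify(J * sp.Matrix(X)))     # (1, 0) and (x1, -x1**2)

# nilpotent approximations and their iterated bracket
Xt1, Xt2 = [1, 0], [0, -y1**2]
print(bracket(Xt1, bracket(Xt1, Xt2, [y1, y2]), [y1, y2]))  # [0, -2]
```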

3 Uniform non-degeneracy of the rescaled Malliavin covariance matrices

We prove the uniform non-degeneracy of suitably rescaled Malliavin covariance matrices under the global condition

$$\begin{aligned} M=\mathbb {R}^d\quad \text{ and }\quad X_0,X_1,\dots ,X_m\in C_b^\infty (\mathbb {R}^d,\mathbb {R}^d), \end{aligned}$$

and the additional assumption that \(X_0(y)\in \hbox {span}\{X_1(y),\dots ,X_m(y)\}\) for all \(y\in \mathbb {R}^d\). We further assume that \(\theta {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) is a global diffeomorphism with bounded derivatives of all positive orders which is an adapted chart to the filtration induced by the sub-Riemannian structure \((X_1,\dots ,X_m)\) at a fixed \(x\in \mathbb {R}^d\). Such a diffeomorphism always exists: [8, Corollary 3.1] guarantees the existence of an adapted chart \(\tilde{\theta }{:}\, U\rightarrow \mathbb {R}^d\) and, due to [13, Lemma 5.2], we can construct a global diffeomorphism \(\theta {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) with bounded derivatives of all positive orders which agrees with \(\tilde{\theta }\) on a small enough neighbourhood of x in U. We note that \(\theta _*X_0,\theta _*X_1,\dots ,\theta _*X_m\) are again smooth bounded vector fields on \(\mathbb {R}^d\) with bounded derivatives of all orders. In particular, to simplify the notation in the subsequent analysis, we may assume \(x=0\) and that \(\theta \) is the identity map. By Sect. 2, this means that, for Cartesian coordinates \((y^1,\dots ,y^d)\) on \(\mathbb {R}^d\) and for all \(n\in \{1,\dots ,N\}\), we have

  (i)

    \(\displaystyle C_n(0)= \hbox {span}\left\{ \frac{\partial }{\partial y^1}(0), \dots ,\frac{\partial }{\partial y^{d_n}}(0)\right\} ,\) and

  (ii)

    \(\left( \hbox {D } y^k\right) (0)=0\) for every differential operator

    $$\begin{aligned} \hbox {D }\in \left\{ Y_1\cdots Y_j{:}\, Y_l\in C_{i_l} \text{ and } i_1+\dots +i_j\le n\right\} \end{aligned}$$

    and all \(k>d_n.\)

Write \(\langle \cdot ,\cdot \rangle \) for the standard inner product on \(\mathbb {R}^d\) and, for \(n\in \{0,1,\dots ,N\}\), denote the orthogonal complement of \(C_n(0)\) with respect to this inner product by \(C_n(0)^\perp \). As defined in the previous section, we further let \(\delta _\varepsilon {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) be the anisotropic dilation induced by the filtration at 0 and we consider the nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) of the vector fields \(X_1,\dots ,X_m\).

Let \((B_t)_{t\in [0,1]}\) be a Brownian motion in \(\mathbb {R}^m\), which is assumed to be realised as the coordinate process on the path space \(\{w\in C([0,1],\mathbb {R}^m){:}\, w_0=0\}\) under Wiener measure \(\mathbb {P}\). Define \(\underline{X}_0\) to be the vector field on \(\mathbb {R}^d\) given by

$$\begin{aligned} {\underline{X}}_0=X_0+\frac{1}{2}\sum _{i=1}^m\nabla _{X_i}X_i, \end{aligned}$$

where \(\nabla \) is the Levi-Civita connection with respect to the Euclidean metric. Under our global assumption, the Itô stochastic differential equation in \(\mathbb {R}^d\)

$$\begin{aligned} {\mathrm d}x_t^\varepsilon =\sum _{i=1}^m \sqrt{\varepsilon } X_i(x_t^\varepsilon )\,{\mathrm d}B_t^i+ \varepsilon {\underline{X}}_0(x_t^\varepsilon )\,{\mathrm d}t,\quad x_0^\varepsilon =0 \end{aligned}$$

has a unique strong solution \((x_t^\varepsilon )_{t\in [0,1]}\). Its law on \(\Omega =C([0,1],\mathbb {R}^d)\) is \(\mu _\varepsilon ^0\). We consider the rescaled diffusion process \(({\tilde{x}}_t^\varepsilon )_{t\in [0,1]}\) which is defined by \({\tilde{x}}_t^\varepsilon ={\delta }_\varepsilon ^{-1}(x_t^\varepsilon )\). It is the unique strong solution of the Itô stochastic differential equation

$$\begin{aligned} {\mathrm d}{\tilde{x}}_t^\varepsilon =\sum _{i=1}^m\sqrt{\varepsilon } \left( \left( \delta _\varepsilon ^{-1}\right) _*X_i\right) ({\tilde{x}}_t^\varepsilon )\,{\mathrm d}B_t^i +\varepsilon \left( \left( \delta _\varepsilon ^{-1}\right) _*{\underline{X}}_0\right) ({\tilde{x}}_t^\varepsilon )\,{\mathrm d}t,\quad {\tilde{x}}_0^\varepsilon =0. \end{aligned}$$

Let us further look at

$$\begin{aligned} {\mathrm d}{\tilde{x}}_t=\sum _{i=1}^m {\tilde{X}}_i({\tilde{x}}_t)\,{\mathrm d}B_t^i+ \underline{{\tilde{X}}}_0({\tilde{x}}_t)\,{\mathrm d}t,\quad {\tilde{x}}_0=0, \end{aligned}$$

where \(\underline{{\tilde{X}}}_0\) is the vector field on \(\mathbb {R}^d\) defined by

$$\begin{aligned} \underline{{\tilde{X}}}_0= \frac{1}{2}\sum _{i=1}^m\nabla _{{\tilde{X}}_i}{\tilde{X}}_i. \end{aligned}$$

Due to the nice cascade structure discussed in Remark 2.4 and by [12, Proposition 1.3], there exists a unique strong solution \(({\tilde{x}}_t)_{t\in [0,1]}\) to this Itô stochastic differential equation in \(\mathbb {R}^d\). We recall that \(\sqrt{\varepsilon }\left( \delta _\varepsilon ^{-1}\right) _* X_i\rightarrow {\tilde{X}}_i\) as \(\varepsilon \rightarrow 0\) for all \(i\in \{1,\dots ,m\}\) and because \(X_0(y)\in \hbox {span} \{X_1(y),\dots ,X_m(y)\}\) for all \(y\in \mathbb {R}^d\), we further have \(\varepsilon \left( \delta _\varepsilon ^{-1}\right) _* X_0\rightarrow 0\) as \(\varepsilon \rightarrow 0\). It follows that, for all \(t\in [0,1]\),

$$\begin{aligned} {\tilde{x}}_t^\varepsilon \rightarrow {\tilde{x}}_t \text{ as } \varepsilon \rightarrow 0 \text{ almost } \text{ surely } \text{ and } \text{ in } L^p \text{ for } \text{ all } p<\infty . \end{aligned}$$
(3.1)

For the Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \) of \({\tilde{x}}_1^\varepsilon \) and \({\tilde{c}}_1\) of \({\tilde{x}}_1\), we also obtain that

$$\begin{aligned} {\tilde{c}}_1^\varepsilon \rightarrow {\tilde{c}}_1 \text{ as } \varepsilon \rightarrow 0 \text{ almost } \text{ surely } \text{ and } \text{ in } L^p \text{ for } \text{ all } p<\infty . \end{aligned}$$
(3.2)
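
The convergence in (3.1) can be observed numerically. The sketch below is our own illustration and not part of the argument: it simulates \(x^\varepsilon \) by Euler–Maruyama for the fields of Example 2.7 in their adapted coordinates, where \(\theta _*X_1=(1,0)\) and \(\theta _*X_2=(y^1,-(y^1)^2)\) (these fields are not bounded, so the global condition fails for them; they serve purely as an illustration), applies \(\delta _\varepsilon ^{-1}\) and checks that the rescaled endpoint fluctuations remain of order one as \(\varepsilon \rightarrow 0\).

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, 3.0])                   # coordinate weights of Example 2.7

def rescaled_endpoint(eps, n=1000):
    # Euler-Maruyama for dx = sqrt(eps)(X1 dB^1 + X2 dB^2) + eps*Xbar0 dt with
    # X1 = (1, 0), X2 = (y1, -y1**2), Ito correction Xbar0 = (1/2) nabla_{X2} X2,
    # followed by the rescaling delta_eps^{-1} x = (eps^{-1/2} x^1, eps^{-3/2} x^2)
    dt = 1.0 / n
    y = np.zeros(2)
    for _ in range(n):
        dB = rng.normal(scale=np.sqrt(dt), size=2)
        X2 = np.array([y[0], -y[0] ** 2])
        Xbar0 = 0.5 * np.array([y[0], -2.0 * y[0] ** 2])
        y = y + np.sqrt(eps) * (np.array([1.0, 0.0]) * dB[0] + X2 * dB[1]) \
              + eps * Xbar0 * dt
    return y * eps ** (-w / 2)

for eps in (1.0, 0.1, 0.01):
    samples = np.array([rescaled_endpoint(eps) for _ in range(200)])
    print(eps, samples.std(axis=0))        # both components stay of order one
```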

Proposition 2.6 shows that the nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) satisfy the strong Hörmander condition everywhere, which implies the following non-degeneracy result.

Corollary 3.1

The Malliavin covariance matrix \({\tilde{c}}_1\) is non-degenerate, i.e. for all \(p<\infty \), we have

$$\begin{aligned} \mathbb {E}\left[ \left| \det \left( {\tilde{c}}_1\right) ^{-1} \right| ^p \right] <\infty . \end{aligned}$$

Hence, the rescaled diffusion processes \(({\tilde{x}}_t^\varepsilon )_{t\in [0,1]}\) have a non-degenerate limiting diffusion process as \(\varepsilon \rightarrow 0\). This observation is important in establishing the uniform non-degeneracy of the rescaled Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \). In the following, we first gain control over the leading-order terms of \({\tilde{c}}_1^\varepsilon \) as \(\varepsilon \rightarrow 0\), which then allows us to show that the minimal eigenvalue of \({\tilde{c}}_1^\varepsilon \) can be uniformly bounded below on a set of high probability. Using this property, we prove Theorem 1.3 at the end of the section.

3.1 Properties of the rescaled Malliavin covariance matrix

Let \((\tilde{v}_t^\varepsilon )_{t\in [0,1]}\) be the unique stochastic process in \(\mathbb {R}^d\otimes (\mathbb {R}^d)^*\) such that \(({\tilde{x}}_t^\varepsilon ,\tilde{v}_t^\varepsilon )_{t\in [0,1]}\) is the strong solution of the following system of Itô stochastic differential equations starting from \(({\tilde{x}}_0^\varepsilon ,\tilde{v}_0^\varepsilon )=(0,I)\).

$$\begin{aligned} {\mathrm d}{\tilde{x}}_t^\varepsilon&=\sum _{i=1}^m\sqrt{\varepsilon } \left( \left( \delta _\varepsilon ^{-1}\right) _*X_i\right) ({\tilde{x}}_t^\varepsilon )\,{\mathrm d}B_t^i +\varepsilon \left( \left( \delta _\varepsilon ^{-1}\right) _*{\underline{X}}_0\right) ({\tilde{x}}_t^\varepsilon )\,{\mathrm d}t\\ {\mathrm d}\tilde{v}_t^\varepsilon&=-\sum _{i=1}^m\sqrt{\varepsilon }\tilde{v}_t^\varepsilon \nabla \left( \left( \delta _\varepsilon ^{-1}\right) _*X_i\right) ({\tilde{x}}_t^\varepsilon )\,{\mathrm d}B_t^i \\&\quad \,\, -\varepsilon \tilde{v}_t^\varepsilon \left( \nabla \left( \left( \delta _\varepsilon ^{-1}\right) _*{\underline{X}}_0\right) - \sum _{i=1}^m\left( \nabla \left( \left( \delta _\varepsilon ^{-1}\right) _*X_i\right) \right) ^2\right) ({\tilde{x}}_t^\varepsilon )\,{\mathrm d}t \end{aligned}$$

The Malliavin covariance matrix \({\tilde{c}}_t^\varepsilon \) of the rescaled random variable \({\tilde{x}}_t^\varepsilon \) can then be expressed as

$$\begin{aligned} {\tilde{c}}_t^\varepsilon =\sum _{i=1}^m\int _0^t \left( \tilde{v}_s^\varepsilon \left( \sqrt{\varepsilon }\left( \delta _\varepsilon ^{-1}\right) _* X_i\right) ({\tilde{x}}_s^\varepsilon )\right) \otimes \left( \tilde{v}_s^\varepsilon \left( \sqrt{\varepsilon }\left( \delta _\varepsilon ^{-1}\right) _* X_i\right) ({\tilde{x}}_s^\varepsilon )\right) \,{\mathrm d}s. \end{aligned}$$

It turns out that we obtain a more tractable expression for \({\tilde{c}}_t^\varepsilon \) if we write it in terms of \((x_t^\varepsilon ,v_t^\varepsilon )_{t\in [0,1]}\), which is the unique strong solution of the following system of Itô stochastic differential equations.

$$\begin{aligned} {\mathrm d}x_t^\varepsilon&=\sum _{i=1}^m\sqrt{\varepsilon } X_i(x_t^\varepsilon )\,{\mathrm d}B_t^i +\varepsilon {\underline{X}}_0(x_t^\varepsilon )\,{\mathrm d}t,\quad x_0^\varepsilon =0\\ {\mathrm d}v_t^\varepsilon&=-\sum _{i=1}^m\sqrt{\varepsilon }v_t^\varepsilon \nabla X_i(x_t^\varepsilon )\,{\mathrm d}B_t^i -\varepsilon v_t^\varepsilon \left( \nabla {\underline{X}}_0- \sum _{i=1}^m(\nabla X_i)^2\right) (x_t^\varepsilon )\,{\mathrm d}t, \quad v_0^\varepsilon =I \end{aligned}$$

One can check that the stochastic processes \((v_t^\varepsilon )_{t\in [0,1]}\) and \((\tilde{v}_t^{\varepsilon })_{t\in [0,1]}\) are related by \(\tilde{v}_t^{\varepsilon }={{\delta }}_{\varepsilon }^{-1} v_t^{\varepsilon }{{\delta }}_{\varepsilon }\), where the map \({{\delta }}_{\varepsilon }\) is understood as an element in \(\mathbb {R}^d\otimes (\mathbb {R}^d)^*\). This implies that

$$\begin{aligned} {\tilde{c}}_t^\varepsilon =\sum _{i=1}^m\int _0^t\left( \sqrt{\varepsilon }{\delta }_\varepsilon ^{-1}\left( v_s^\varepsilon X_i(x_s^\varepsilon )\right) \right) \otimes \left( \sqrt{\varepsilon }{\delta }_\varepsilon ^{-1}\left( v_s^\varepsilon X_i(x_s^\varepsilon )\right) \right) \,{\mathrm d}s. \end{aligned}$$
(3.3)
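
Continuing the numerical illustration from above (again ours, with the unbounded fields of Example 2.7 used purely for illustration), one can discretise the pair \((x^\varepsilon ,v^\varepsilon )\) together with the time integral (3.3) and watch the smallest eigenvalue of \({\tilde{c}}_1^\varepsilon \) stay bounded away from zero as \(\varepsilon \rightarrow 0\), in line with Theorem 1.3.

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.array([1.0, 3.0])                       # coordinate weights of Example 2.7

def X2(y):  return np.array([y[0], -y[0] ** 2])
def DX2(y): return np.array([[1.0, 0.0], [-2.0 * y[0], 0.0]])  # Jacobian of X2

def lambda_min(eps, n=1000):
    # Euler scheme for (x^eps, v^eps) and the integral (3.3); here nabla X1 = 0
    # for X1 = (1, 0), so only X2 enters the equation for v^eps
    dt = 1.0 / n
    X1 = np.array([1.0, 0.0])
    x, v = np.zeros(2), np.eye(2)
    c = np.zeros((2, 2))
    for _ in range(n):
        A = DX2(x)
        u = np.sqrt(eps) * eps ** (-w / 2) * np.stack([v @ X1, v @ X2(x)])
        c += u.T @ u * dt                      # sum_i outer(u_i, u_i) dt, cf. (3.3)
        dB = rng.normal(scale=np.sqrt(dt), size=2)
        Xbar0  = 0.5 * (A @ X2(x))             # (1/2) nabla_{X2} X2
        DXbar0 = 0.5 * np.array([[1.0, 0.0], [-4.0 * x[0], 0.0]])
        x_new = x + np.sqrt(eps) * (X1 * dB[0] + X2(x) * dB[1]) + eps * Xbar0 * dt
        v = v - np.sqrt(eps) * (v @ A) * dB[1] - eps * (v @ (DXbar0 - A @ A)) * dt
        x = x_new
    return np.linalg.eigvalsh(c)[0]

for eps in (1.0, 0.1, 0.01):
    print(eps, np.mean([lambda_min(eps) for _ in range(100)]))
```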

We are interested in gaining control over the leading-order terms of \({\tilde{c}}_1^\varepsilon \) as \(\varepsilon \rightarrow 0\). In the corresponding analysis, we frequently use the lemma stated below.

Lemma 3.2

Let Y be a smooth vector field on \(\mathbb {R}^d\). Then

$$\begin{aligned} {\mathrm d}(v_t^\varepsilon Y(x_t^\varepsilon ))&=\sum _{i=1}^m\sqrt{\varepsilon } v_t^\varepsilon [X_i,Y](x_t^\varepsilon )\,{\mathrm d}B_t^i \\&\quad +\varepsilon v_t^\varepsilon \left( [X_0,Y]+\frac{1}{2}\sum _{i=1}^m \left[ X_i,\left[ X_i,Y\right] \right] \right) (x_t^\varepsilon )\,{\mathrm d}t. \end{aligned}$$

Proof

To prove this identity, we switch to the Stratonovich setting. The system of Stratonovich stochastic differential equations satisfied by the processes \((x_t^\varepsilon )_{t\in [0,1]}\) and \((v_t^\varepsilon )_{t\in [0,1]}\) is

$$\begin{aligned} \partial x_t^\varepsilon&= \sum _{i=1}^m\sqrt{\varepsilon } X_i(x_t^\varepsilon )\,\partial B_t^i + \varepsilon X_0(x_t^\varepsilon )\,{\mathrm d}t,\quad x_0^\varepsilon =0\\ \partial v_t^\varepsilon&= -\sum _{i=1}^m\sqrt{\varepsilon }v_t^\varepsilon \nabla X_i(x_t^\varepsilon )\,\partial B_t^i - \varepsilon v_t^\varepsilon \nabla X_0(x_t^\varepsilon )\,{\mathrm d}t,\quad v_0^\varepsilon =I. \end{aligned}$$

By the product rule, we have

$$\begin{aligned} \partial (v_t^\varepsilon Y(x_t^\varepsilon )) =(\partial v_t^\varepsilon )Y(x_t^\varepsilon )+ v_t^\varepsilon \nabla Y(x_t^\varepsilon )\,\partial x_t^\varepsilon . \end{aligned}$$

Using

$$\begin{aligned} (\partial v_t^\varepsilon )Y(x_t^\varepsilon )= -\sum _{i=1}^m\sqrt{\varepsilon }v_t^\varepsilon \nabla X_i(x_t^\varepsilon )Y(x_t^\varepsilon )\,\partial B_t^i -\varepsilon v_t^\varepsilon \nabla X_0(x_t^\varepsilon )Y(x_t^\varepsilon )\,{\mathrm d}t \end{aligned}$$

as well as

$$\begin{aligned} v_t^\varepsilon \nabla Y(x_t^\varepsilon )\,\partial x_t^\varepsilon = \sum _{i=1}^m\sqrt{\varepsilon } v_t^\varepsilon \nabla Y(x_t^\varepsilon )X_i(x_t^\varepsilon )\,\partial B_t^i + \varepsilon v_t^\varepsilon \nabla Y(x_t^\varepsilon )X_0(x_t^\varepsilon )\,{\mathrm d}t \end{aligned}$$

yields the identity

$$\begin{aligned} \partial (v_t^\varepsilon Y(x_t^\varepsilon ))= \sum _{i=1}^m\sqrt{\varepsilon } v_t^\varepsilon [X_i,Y](x_t^\varepsilon )\,\partial B_t^i+ \varepsilon v_t^\varepsilon [X_0,Y](x_t^\varepsilon )\,{\mathrm d}t. \end{aligned}$$

It remains to change back to the Itô setting. We compute that, for \(i\in \{1,\dots ,m\}\),

$$\begin{aligned}&{\mathrm d}\left[ \sqrt{\varepsilon } v^\varepsilon [X_i,Y](x^\varepsilon ),B^i\right] _t\\&\quad =\sum _{j=1}^m\varepsilon v_t^\varepsilon \nabla [X_i,Y](x_t^\varepsilon ) X_j(x_t^\varepsilon )\,{\mathrm d}[B^j,B^i]_t\\&\qquad -\sum _{j=1}^m\varepsilon v_t^\varepsilon \nabla X_j(x_t^\varepsilon ) [X_i,Y](x_t^\varepsilon )\,{\mathrm d}[B^j,B^i]_t\\&\quad =\varepsilon v_t^\varepsilon \nabla [X_i,Y](x_t^\varepsilon ) X_i(x_t^\varepsilon )\,{\mathrm d}t -\varepsilon v_t^\varepsilon \nabla X_i(x_t^\varepsilon ) [X_i,Y](x_t^\varepsilon )\,{\mathrm d}t\\&\quad = \varepsilon v_t^\varepsilon [X_i,[X_i,Y]](x_t^\varepsilon )\,{\mathrm d}t \end{aligned}$$

and the claimed result follows. \(\square \)

The next lemma, which is enough for our purposes, does not provide an explicit expression for the leading-order terms of \({\tilde{c}}_1^\varepsilon \). However, its proof shows how one could recursively obtain these expressions if one wished to do so. To simplify notation, we introduce \((B_t^0)_{t\in [0,1]}\) with \(B_t^0=t\).

Lemma 3.3

For every \(n\in \{1,\dots ,N\}\), there are finite collections of vector fields

$$\begin{aligned} \mathcal {B}_n= \left\{ Y_{j_1,\dots ,j_k}^{(n,i)}{:}\,1\le k\le n, 0\le j_1,\dots ,j_k\le m, \quad 1\le i\le m\right\}&\subset C_{n+1} \quad \text{ and }\\ \tilde{\mathcal {B}}_n= \left\{ \tilde{Y}_{j_1,\dots ,j_k}^{(n,i)}{:}\,1\le k\le n, 0\le j_1,\dots ,j_k\le m,\quad 1\le i\le m\right\}&\subset C_{n+2} \end{aligned}$$

such that, for all \(u\in C_n(0)^\perp \) and all \(i\in \{1,\dots ,m\}\), we have that, for all \(\varepsilon >0\),

$$\begin{aligned}&\left\langle u,\varepsilon ^{-n/2} v_t^\varepsilon X_i(x_t^\varepsilon )\right\rangle \\&\quad =\left\langle u, \sum _{k=1}^n\sum _{j_1,\dots ,j_k=0}^m \int _0^t\int _0^{t_2}\dots \int _0^{t_k}v_s^\varepsilon \left( Y_{j_1,\dots ,j_k}^{(n,i)}+ \sqrt{\varepsilon }\,\tilde{Y}_{j_1,\dots ,j_k}^{(n,i)}\right) \left( x_s^\varepsilon \right) \,{\mathrm d}B_s^{j_k}\,{\mathrm d}B_{t_k}^{j_{k-1}}\dots \,{\mathrm d}B_{t_2}^{j_1}\right\rangle . \end{aligned}$$

Proof

We prove this result by induction on n. For all \(u\in C_1(0)^\perp \), we have \(\langle u, X_i(0)\rangle =0\) because \(C_1(0)=\hbox {span}\{X_1(0),\dots ,X_m(0)\}\). From Lemma 3.2, it then follows that

$$\begin{aligned}&\left\langle u,\varepsilon ^{-1/2}v_t^\varepsilon X_i(x_t^\varepsilon )\right\rangle \\&\quad =\left\langle u,\sum _{j=1}^m\int _0^t v_s^\varepsilon [X_j,X_i](x_s^\varepsilon )\,{\mathrm d}B_s^j\right. \\&\qquad \left. +\int _0^t\sqrt{\varepsilon } v_s^\varepsilon \left( [X_0,X_i]+\frac{1}{2}\sum _{j=1}^m[X_j,[X_j,X_i]]\right) (x_s^\varepsilon )\,{\mathrm d}s \right\rangle . \end{aligned}$$

This gives us the claimed result for \(n=1\) with

$$\begin{aligned} Y_j^{(1,i)}&= {\left\{ \begin{array}{ll} 0 &{} \quad \text{ if } j=0 \\ {[}X_j,X_i]&{} \quad \text{ if } 1\le j\le m \end{array}\right. } \quad \text{ and }\\ \tilde{Y}_j^{(1,i)}&= {\left\{ \begin{array}{ll} {[}X_0,X_i]+\frac{1}{2}\sum _{l=1}^m[X_l,[X_l,X_i]] &{} \quad \text{ if } j=0 \\ 0 &{} \quad \text{ otherwise } \end{array}\right. }. \end{aligned}$$

Let us now assume the result to be true for \(n-1\). Due to \(C_{n}(0)^\perp \subset C_{n-1}(0)^\perp \), the corresponding identity also holds for all \(u\in C_{n}(0)^\perp \). Using Lemma 3.2, we obtain that

$$\begin{aligned} v_s^\varepsilon Y_{j_1,\dots ,j_k}^{(n-1,i)}(x_s^\varepsilon )&= Y_{j_1,\dots ,j_k}^{(n-1,i)}(0)+ \sum _{j=1}^m\int _0^s\sqrt{\varepsilon } v_r^\varepsilon \left[ X_j,Y_{j_1,\dots ,j_k}^{(n-1,i)}\right] (x_r^\varepsilon )\,{\mathrm d}B_r^j\\&\quad +\int _0^s\varepsilon v_r^\varepsilon \bigg ( \left[ X_0,Y_{j_1,\dots ,j_k}^{(n-1,i)}\right] + \frac{1}{2}\sum _{j=1}^m\left[ X_j, \left[ X_j,Y_{j_1,\dots ,j_k}^{(n-1,i)}\right] \right] \bigg ) (x_r^\varepsilon )\,{\mathrm d}r. \end{aligned}$$

Note that \(Y_{j_1,\dots ,j_k}^{(n-1,i)}\in C_{n}\) implies \(\langle u, Y_{j_1,\dots ,j_k}^{(n-1,i)}(0)\rangle =0\) for all \(u\in C_n(0)^\perp \). We further observe that

$$\begin{aligned} \left[ X_j,Y_{j_1,\dots ,j_k}^{(n-1,i)}\right] , \tilde{Y}_{j_1,\dots ,j_k}^{(n-1,i)}&\in C_{n+1}\quad \text{ as } \text{ well } \text{ as }\\ \left[ X_0,Y_{j_1,\dots ,j_k}^{(n-1,i)}\right] + \frac{1}{2}\sum _{j=1}^m\left[ X_j, \left[ X_j,Y_{j_1,\dots ,j_k}^{(n-1,i)}\right] \right]&\in C_{n+2} \end{aligned}$$

and collecting terms shows that the claimed result is also true for n. \(\square \)

These expressions allow us to characterise the rescaled Malliavin covariance matrix \({\tilde{c}}_1^\varepsilon \) because, for all \(n\in \{0,1,\dots ,N-1\}\) and all \(u\in C_{n+1}(0)\cap C_{n}(0)^\perp \), we have

$$\begin{aligned} \left\langle u,{\tilde{c}}_1^\varepsilon u\right\rangle = \sum _{i=1}^m\int _0^1\left\langle u,\varepsilon ^{-n/2} v_t^\varepsilon X_i(x_t^\varepsilon )\right\rangle ^2\,{\mathrm d}t. \end{aligned}$$
(3.4)

By the convergence result (3.2), it follows that, for \(u\in C_{1}(0)\),

$$\begin{aligned} \left\langle u,{\tilde{c}}_1 u\right\rangle = \lim _{\varepsilon \rightarrow 0}\left\langle u,{\tilde{c}}_1^\varepsilon u\right\rangle = \sum _{i=1}^m\int _0^1\left\langle u, X_i(0)\right\rangle ^2\,{\mathrm d}t \end{aligned}$$

and from Lemma 3.3, we deduce that, for all \(n\in \{1,\dots ,N-1\}\) and all \(u\in C_{n+1}(0)\cap C_{n}(0)^\perp \),

$$\begin{aligned}&\left\langle u,{\tilde{c}}_1 u\right\rangle \nonumber \\&\quad = \sum _{i=1}^m\int _0^1\left\langle u, \sum _{k=1}^n\sum _{j_1,\dots ,j_k=0}^m \int _0^t\int _0^{t_2}\dots \int _0^{t_k} Y_{j_1,\dots ,j_k}^{(n,i)}(0) \,{\mathrm d}B_s^{j_k}\,{\mathrm d}B_{t_k}^{j_{k-1}}\dots \,{\mathrm d}B_{t_2}^{j_1}\right\rangle ^2\,{\mathrm d}t,\nonumber \\ \end{aligned}$$
(3.5)

which describes the limiting Malliavin covariance matrix \({\tilde{c}}_1\) uniquely.

3.2 Uniform non-degeneracy of the rescaled Malliavin covariance matrices

By definition, the Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \) and \({\tilde{c}}_1\) are symmetric tensors. Therefore, their matrix representations are symmetric in any basis and we can think of them as symmetric matrices. Let \(\lambda _{\mathrm{min}}^\varepsilon \) and \(\lambda _{\mathrm{min}}\) denote the minimal eigenvalues of \({\tilde{c}}_1^\varepsilon \) and \({\tilde{c}}_1\), respectively. As we frequently use the integrals from Lemma 3.3, it is convenient to consider the stochastic processes \((I_t^{(n,i),+})_{t\in [0,1]}\), \((I_t^{(n,i),-})_{t\in [0,1]}\) and \((\tilde{I}_t^{(n,i)})_{t\in [0,1]}\) given by

$$\begin{aligned} I_t^{(n,i),+}&= \sum _{k=1}^n\sum _{j_1,\dots ,j_k=0}^m \int _0^t\int _0^{t_2}\dots \int _0^{t_k}\left( v_s^\varepsilon Y_{j_1,\dots ,j_k}^{(n,i)}(x_s^\varepsilon )+Y_{j_1,\dots ,j_k}^{(n,i)}(0)\right) \,{\mathrm d}B_s^{j_k}\,{\mathrm d}B_{t_k}^{j_{k-1}}\dots \,{\mathrm d}B_{t_2}^{j_1},\\ I_t^{(n,i),-}&= \sum _{k=1}^n\sum _{j_1,\dots ,j_k=0}^m \int _0^t\int _0^{t_2}\dots \int _0^{t_k}\left( v_s^\varepsilon Y_{j_1,\dots ,j_k}^{(n,i)}(x_s^\varepsilon )-Y_{j_1,\dots ,j_k}^{(n,i)}(0)\right) \,{\mathrm d}B_s^{j_k}\,{\mathrm d}B_{t_k}^{j_{k-1}}\dots \,{\mathrm d}B_{t_2}^{j_1}, \text{ and }\\ \tilde{I}_t^{(n,i)}&= \sum _{k=1}^n\sum _{j_1,\dots ,j_k=0}^m \int _0^t\int _0^{t_2}\dots \int _0^{t_k}v_s^\varepsilon \tilde{Y}_{j_1,\dots ,j_k}^{(n,i)}(x_s^\varepsilon ) \,{\mathrm d}B_s^{j_k}\,{\mathrm d}B_{t_k}^{j_{k-1}}\dots \,{\mathrm d}B_{t_2}^{j_1}. \end{aligned}$$

For \(\alpha ,\beta ,\gamma ,\delta >0\), define subspaces of the path space \(\{w\in C([0,1],\mathbb {R}^m){:}\, w_0=0\}\) by

$$\begin{aligned} \Omega ^1(\alpha )&=\{\lambda _{\mathrm{min}}\ge 2\alpha \},\\ \Omega _\varepsilon ^2(\beta ,\gamma )&= \left\{ \sup _{0\le t\le 1}\left| I_t^{(n,i),+}\right| \le \beta ^{-1}\;,\right. \\&\;\;\,\quad \left. \sup _{0\le t\le 1}\left| \tilde{I}_t^{(n,i)}\right| \le \gamma ^{-1} {:}\,1\le i\le m, \quad 1\le n\le N\right\} , \text{ and }\\ \Omega _\varepsilon ^3(\delta )&= \left\{ \sup _{0\le t\le 1}|x_t^\varepsilon |\le \delta ,\sup _{0\le t\le 1}|v_t^\varepsilon -I|\le \delta \right\} \\&\qquad \cap \left\{ \sup _{0\le t\le 1}\left| I_t^{(n,i),-}\right| \le \delta {:}\,1\le i\le m, \quad 1\le n\le N\right\} . \end{aligned}$$

Note that the events \(\Omega _\varepsilon ^2(\beta ,\gamma )\) and \(\Omega _\varepsilon ^3(\delta )\) depend on \(\varepsilon \) as the processes \((I_t^{(n,i),+})_{t\in [0,1]}\), \((I_t^{(n,i),-})_{t\in [0,1]}\) and \((\tilde{I}_t^{(n,i)})_{t\in [0,1]}\) depend on \(\varepsilon \). We show that, for suitable choices of \(\alpha ,\beta ,\gamma \) and \(\delta \), the rescaled Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \) behave nicely on the set

$$\begin{aligned} \Omega (\alpha ,\beta ,\gamma ,\delta ,\varepsilon )= \Omega ^1(\alpha )\cap \Omega _\varepsilon ^2(\beta ,\gamma )\cap \Omega _\varepsilon ^3(\delta ) \end{aligned}$$

and that its complement is a set of small probability in the limit \(\varepsilon \rightarrow 0\). As we are only interested in small values of \(\alpha ,\beta ,\gamma ,\delta \) and \(\varepsilon \), we may make the non-restrictive assumption that \(\alpha ,\beta ,\gamma ,\delta ,\varepsilon <1\).

Lemma 3.4

There exist positive constants \(\chi \) and \(\kappa \), which do not depend on \(\varepsilon \), such that if

$$\begin{aligned} \chi \varepsilon ^{1/6}\le \alpha \;, \quad \beta =\gamma =\alpha \quad \text{ and }\quad \delta =\kappa \alpha ^2 \end{aligned}$$

then, on \(\Omega (\alpha ,\beta ,\gamma ,\delta ,\varepsilon )\), it holds true that

$$\begin{aligned} \lambda _{\mathrm{min}}^\varepsilon \ge \frac{1}{2}\lambda _{\mathrm{min}}. \end{aligned}$$

Proof

Throughout, we shall assume that we are on the event \(\Omega (\alpha ,\beta ,\gamma ,\delta ,\varepsilon )\). Let

$$\begin{aligned} R^\varepsilon (u)=\frac{\left\langle u,{\tilde{c}}_1^\varepsilon u\right\rangle }{\langle u,u\rangle } \quad \text{ and }\quad R(u)=\frac{\left\langle u,{\tilde{c}}_1 u\right\rangle }{\langle u,u\rangle } \end{aligned}$$

be the Rayleigh–Ritz quotients of the rescaled Malliavin covariance matrix \({\tilde{c}}_1^\varepsilon \) and of the limiting Malliavin covariance matrix \({\tilde{c}}_1\), respectively. As a consequence of the Min-Max Theorem, we have

$$\begin{aligned} \lambda _{\mathrm{min}}^\varepsilon =\min \{R^\varepsilon (u){:}\,u\not =0\} \quad \text{ as } \text{ well } \text{ as }\quad \lambda _{\mathrm{min}}=\min \{R(u){:}\,u\not =0\}. \end{aligned}$$

Since \(\lambda _\mathrm{min}\ge 2\alpha \), it suffices to establish that \(|R^\varepsilon (u)-R(u)|\le \alpha \) for all \(u\not = 0\). Set

$$\begin{aligned} K = \max _{1\le i\le m}\sup _{y\in \mathbb {R}^d} |X_i(y)|,\quad L = \max _{1\le i\le m}\sup _{y\in \mathbb {R}^d} |\nabla X_i(y)| \end{aligned}$$

and note that the global condition ensures \(K,L<\infty \). Using the Cauchy–Schwarz inequality, we deduce that, for \(u\in C_1(0)\!\setminus \!\{0\}\),

$$\begin{aligned} |R^\varepsilon (u)-R(u)|&\le \frac{\displaystyle \sum _{i=1}^m\int _0^1\left| \left\langle u,v_t^\varepsilon X_i(x_t^\varepsilon )\right\rangle ^2- \left\langle u,X_i(0)\right\rangle ^2 \right| \,{\mathrm d}t}{\langle u,u \rangle }\\&\le \sum _{i=1}^m\int _0^1| v_t^\varepsilon X_i(x_t^\varepsilon )+X_i(0)| | v_t^\varepsilon X_i(x_t^\varepsilon ) - X_i(0)|\,{\mathrm d}t\\&\le m((1+\delta )K+K)(\delta K+\delta L). \end{aligned}$$

Applying Lemma 3.3 as well as the expressions (3.4) and (3.5), we obtain in a similar way that, for all \(n\in \{1,\dots ,N-1\}\) and all non-zero \(u\in C_{n+1}(0)\cap C_n(0)^\perp \),

$$\begin{aligned} |R^\varepsilon (u)-R(u)|&\le \sum _{i=1}^m\int _0^1 \left| I_t^{(n,i),+}+\sqrt{\varepsilon }\tilde{I}_t^{(n,i)}\right| \left| I_t^{(n,i),-}+\sqrt{\varepsilon }\tilde{I}_t^{(n,i)}\right| \,{\mathrm d}t\\&\le m\left( \beta ^{-1}+\sqrt{\varepsilon }\gamma ^{-1}\right) \left( \delta +\sqrt{\varepsilon }\gamma ^{-1}\right) . \end{aligned}$$

It remains to consider the cross-terms. For \(n_1,n_2\in \{1,\dots ,N-1\}\) and \(u^1\in C_{n_1+1}(0)\cap C_{n_1}(0)^\perp \) as well as \(u^2\in C_{n_2+1}(0)\cap C_{n_2}(0)^\perp \), we polarise (3.4) to conclude that

$$\begin{aligned}&\frac{\left\langle u^1,{\tilde{c}}_1^\varepsilon u^2\right\rangle - \left\langle u^1,{\tilde{c}}_1 u^2\right\rangle }{|u^1||u^2|}\\&\quad \le \sum _{i=1}^m\int _0^1 \left| \frac{I_t^{(n_1,i),+}+I_t^{(n_1,i),-}}{2}+\sqrt{\varepsilon } \tilde{I}_t^{(n_1,i)}\right| \left| I_t^{(n_2,i),-} +\sqrt{\varepsilon } \tilde{I}_t^{(n_2,i)}\right| \,{\mathrm d}t\\&\qquad +\sum _{i=1}^m\int _0^1 \left| I_t^{(n_1,i),-}+\sqrt{\varepsilon } \tilde{I}_t^{(n_1,i)}\right| \left| \frac{I_t^{(n_2,i),+}-I_t^{(n_2,i),-}}{2}\right| \,{\mathrm d}t\\&\quad \le m\left( \beta ^{-1}+\delta +\sqrt{\varepsilon }\gamma ^{-1}\right) \left( \delta +\sqrt{\varepsilon }\gamma ^{-1}\right) . \end{aligned}$$

Similarly, if \(n_1=0\) and \(n_2\in \{1,\dots ,N-1\}\), we see that

$$\begin{aligned} \frac{\left\langle u^1,{\tilde{c}}_1^\varepsilon u^2\right\rangle - \left\langle u^1,{\tilde{c}}_1 u^2\right\rangle }{|u^1||u^2|}\le m\left( (1+\delta )K\left( \delta +\sqrt{\varepsilon }\gamma ^{-1}\right) + (\delta K+\delta L)\left( \frac{\beta ^{-1}+\delta }{2}\right) \right) . \end{aligned}$$

Writing a general non-zero \(u\in \mathbb {R}^d\) in its orthogonal sum decomposition and combining all the above estimates gives

$$\begin{aligned} |R^\varepsilon (u)-R(u)|\le \kappa _1\delta + \kappa _2\beta ^{-1}\delta + \kappa _3\sqrt{\varepsilon }\beta ^{-1}\gamma ^{-1}+ \kappa _4\varepsilon \gamma ^{-2} \end{aligned}$$

for some constants \(\kappa _1,\kappa _2,\kappa _3\) and \(\kappa _4\), which depend on K, L and m but which are independent of \(\alpha ,\beta ,\gamma ,\delta \) and \(\varepsilon \). If we now choose \(\kappa \) and \(\chi \) in such a way that both \(\kappa \le 1/(4\max \{\kappa _1,\kappa _2\})\) and \(\chi ^3\ge 4\max \{\kappa _3,\kappa _4^{1/2}\}\), and provided that \(\chi \varepsilon ^{1/6}\le \alpha \), \(\beta =\gamma =\alpha \) as well as \(\delta =\kappa \alpha ^2\), then

$$\begin{aligned} \kappa _1\delta + \kappa _2\beta ^{-1}\delta + \kappa _3\sqrt{\varepsilon }\beta ^{-1}\gamma ^{-1}+ \kappa _4\varepsilon \gamma ^{-2}\le \kappa _1\kappa \alpha ^2 + \kappa _2\kappa \alpha + \kappa _3\chi ^{-3}\alpha + \kappa _4\chi ^{-6}\alpha ^4 \le \alpha . \end{aligned}$$

Since \(\kappa \) and \(\chi \) can always be chosen to be positive, the desired result follows. \(\square \)
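
The choice of \(\kappa \) and \(\chi \) above is pure bookkeeping, and it is easy to check numerically. The following minimal sketch, with illustrative stand-in values for \(\kappa _1,\dots ,\kappa _4\) (the true constants depend on K, L and m and are not computed here), verifies the final estimate of the proof on a grid of admissible \(\alpha \) and \(\varepsilon \).

```python
import numpy as np

# Stand-in values; in the proof these constants depend on K, L and m.
k1, k2, k3, k4 = 2.0, 3.0, 1.5, 4.0

kappa = 1.0 / (4.0 * max(k1, k2))                # kappa <= 1/(4 max{k1, k2})
chi = (4.0 * max(k3, np.sqrt(k4))) ** (1.0 / 3)  # chi^3 >= 4 max{k3, sqrt(k4)}

for eps in [1e-3, 1e-6, 1e-9]:
    # admissible range: chi * eps^{1/6} <= alpha < 1
    for alpha in np.linspace(chi * eps ** (1.0 / 6), 0.999, 50):
        beta = gamma = alpha
        delta = kappa * alpha ** 2
        lhs = (k1 * delta + k2 * delta / beta
               + k3 * np.sqrt(eps) / (beta * gamma) + k4 * eps / gamma ** 2)
        assert lhs <= alpha + 1e-12, (eps, alpha, lhs)
print("bound <= alpha holds on the tested grid")
```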

As a consequence of this lemma, we are able to control \(\det \left( {\tilde{c}}_1^\varepsilon \right) ^{-1}\) on the good set \(\Omega (\alpha ,\beta ,\gamma ,\delta ,\varepsilon )\). This allows us to prove Theorem 1.3.

Proof of Theorem 1.3

We recall that by Proposition 2.6, the nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) satisfy the strong Hörmander condition everywhere on \(\mathbb {R}^d\). The proof of [12, Theorem 4.2] then shows that

$$\begin{aligned} \lambda _{\mathrm{min}}^{-1}\in L^p(\mathbb {P}),\quad \text{ for } \text{ all }\quad p<\infty . \end{aligned}$$
(3.6)

By the Markov inequality, this integrability result implies that, for all \(p<\infty \), there exist constants \(D(p)<\infty \) such that

$$\begin{aligned} \mathbb {P}\left( \Omega ^1(\alpha )^c\right) \le D(p)\alpha ^p. \end{aligned}$$
(3.7)

Using the Burkholder–Davis–Gundy inequality and Jensen’s inequality, we further show that, for all \(p<\infty \), there are constants \(E_1(p),E_2(p)<\infty \) such that

$$\begin{aligned} \mathbb {E}\left[ \sup _{0\le t\le 1}|x_t^\varepsilon |^p\right] \le E_1(p)\varepsilon ^{p/2}\quad \text{ and }\quad \mathbb {E}\left[ \sup _{0\le t\le 1}|v_t^\varepsilon -I|^p\right] \le E_2(p)\varepsilon ^{p/2}. \end{aligned}$$

Similarly, by repeatedly applying the Burkholder–Davis–Gundy inequality and Jensen’s inequality, we also see that, for all \(p<\infty \) and for all \(n\in \{1,\dots ,N\}\) and \(i\in \{1,\dots ,m\}\), there exist constants \(E^{(n,i)}(p)<\infty \) and \(D^{(n,i)}(p),\tilde{D}^{(n,i)}(p)<\infty \) such that

$$\begin{aligned} \mathbb {E}\left[ \sup _{0\le t\le 1}\left| I_t^{(n,i),-}\right| ^p\right] \le E^{(n,i)}(p)\varepsilon ^{p/2} \end{aligned}$$

as well as

$$\begin{aligned} \mathbb {E}\left[ \sup _{0\le t\le 1}\left| I_t^{(n,i),+}\right| ^p\right] \le D^{(n,i)}(p) \quad \text{ and }\quad \mathbb {E}\left[ \sup _{0\le t\le 1}\left| \tilde{I}_t^{(n,i)}\right| ^p\right] \le \tilde{D}^{(n,i)}(p). \end{aligned}$$
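
As a quick sanity check on the \(\sqrt{\varepsilon }\)-scaling in the bounds above, consider the simplest instance: a toy model with \(m=d=1\), \(X_1\equiv 1\) and \(X_0\equiv 0\) (an assumption made purely for illustration), so that \(x_t^\varepsilon =\sqrt{\varepsilon }B_t\) exactly. The Monte Carlo sketch below confirms that \(\mathbb {E}[\sup _{0\le t\le 1}|x_t^\varepsilon |^p]/\varepsilon ^{p/2}\) is then independent of \(\varepsilon \), up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(1)

def sup_moment(eps, p=4, n_paths=50_000, n_steps=500):
    """Estimate E[ sup_{0<=t<=1} |x_t^eps|^p ] for x_t^eps = sqrt(eps) * B_t."""
    dB = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
    B = np.cumsum(dB, axis=1)
    return np.mean(np.abs(np.sqrt(eps) * B).max(axis=1) ** p)

for eps in [1.0, 0.1, 0.01]:
    print(eps, sup_moment(eps) / eps ** 2)  # p/2 = 2; the ratio is eps-independent
```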

As the sets \(\Omega _\varepsilon ^2(\beta ,\gamma )\) and \(\Omega _\varepsilon ^3(\delta )\) are defined by only finitely many constraints, the bounds established above and the Markov inequality imply that, for all \(p<\infty \), there are constants \(D(p)<\infty \) and \(E(p)<\infty \) such that

$$\begin{aligned} \mathbb {P}\left( \Omega _\varepsilon ^2(\beta ,\gamma )^c\right)&\le D(p)\left( \beta ^p+\gamma ^p\right) \quad \text{ and } \end{aligned}$$
(3.8)
$$\begin{aligned} \mathbb {P}\left( \Omega _\varepsilon ^3(\delta )^c\right)&\le E(p)\delta ^{-p}\varepsilon ^{p/2}. \end{aligned}$$
(3.9)

Moreover, from the Kusuoka–Stroock estimate, cf. [1], as stated by Watanabe [16, Theorem 3.2], we know that there exist a positive integer S and, for all \(p<\infty \), constants \(C(p)<\infty \) such that, for all \(\varepsilon \in (0,1]\),

$$\begin{aligned} \Vert \det ({\tilde{c}}_1^\varepsilon )^{-1}\Vert _p= \left( \mathbb {E}\left[ \left| \det \left( {\tilde{c}}_1^\varepsilon \right) ^{-1} \right| ^p\right] \right) ^{1/p}\le C(p)\varepsilon ^{-S/2}. \end{aligned}$$

Let us now choose \(\alpha =\chi ^{3/4}\varepsilon ^{1/8}\), \(\beta =\gamma =\alpha \) and \(\delta =\kappa \alpha ^2\). We note that \(\chi \varepsilon ^{1/6}=\alpha ^{4/3}\le \alpha \) and hence, from Lemma 3.4 it follows that

$$\begin{aligned} \lambda _{\mathrm{min}}^\varepsilon \ge \frac{1}{2}\lambda _{\mathrm{min}}\end{aligned}$$

on \(\Omega (\alpha ,\beta ,\gamma ,\delta ,\varepsilon )\). Thus, we have

$$\begin{aligned} \det ({\tilde{c}}_1^\varepsilon )^{-1} \mathbbm {1}_{\Omega (\alpha ,\beta ,\gamma ,\delta ,\varepsilon )} \le (\lambda _{\mathrm{min}}^\varepsilon )^{-d} \mathbbm {1}_{\Omega (\alpha ,\beta ,\gamma ,\delta ,\varepsilon )} \le 2^d\lambda _{\mathrm{min}}^{-d} \mathbbm {1}_{\Omega (\alpha ,\beta ,\gamma ,\delta ,\varepsilon )} \end{aligned}$$

and therefore,

$$\begin{aligned} \det ({\tilde{c}}_1^\varepsilon )^{-1}\le 2^d\lambda _{\mathrm{min}}^{-d}+\det ({\tilde{c}}_1^\varepsilon )^{-1} \left( \mathbbm {1}_{\Omega ^1(\alpha )^c}+\mathbbm {1}_{\Omega _\varepsilon ^2(\beta ,\gamma )^c}+ \mathbbm {1}_{\Omega _\varepsilon ^3(\delta )^c} \right) . \end{aligned}$$

Using the Hölder inequality, the Kusuoka–Stroock estimate as well as the estimates (3.7), (3.8) and (3.9), we further deduce that, for all \(q,r<\infty \),

$$\begin{aligned} \Vert \det ({\tilde{c}}_1^\varepsilon )^{-1}\Vert _p&\le 2^d\Vert \lambda _\mathrm{min}^{-1}\Vert _p^d+ C(2p)\varepsilon ^{-S/2}\left( \mathbb {P}\left( \Omega ^1(\alpha )^c\right) ^{1/2p}+ \mathbb {P}\left( \Omega _\varepsilon ^2(\beta ,\gamma )^c\right) ^{1/2p}\right. \\&\quad \left. + \mathbb {P}\left( \Omega _\varepsilon ^3(\delta )^c\right) ^{1/2p} \right) \\&\le 2^d\Vert \lambda _\mathrm{min}^{-1}\Vert _p^d+C(2p)\varepsilon ^{-S/2} \left( (D(q)\alpha ^q)^{1/2p}+ \left( E(r)\delta ^{-r}\varepsilon ^{r/2}\right) ^{1/2p}\right) . \end{aligned}$$

Hence, we would like to choose q and r in such a way that we can control both \(\varepsilon ^{-S/2}\alpha ^{q/2p}\) and \(\varepsilon ^{-S/2}\delta ^{-r/2p}\varepsilon ^{r/4p}\). Since \(\delta = \kappa \alpha ^2\) and \(\alpha =\chi ^{3/4}\varepsilon ^{1/8}\), we have

$$\begin{aligned} \varepsilon ^{-S/2}\alpha ^{q/2p}= & {} \chi ^{3q/8p}\varepsilon ^{-S/2+q/16p}\quad \text{ as } \text{ well } \text{ as }\\ \varepsilon ^{-S/2}\delta ^{-r/2p}\varepsilon ^{r/4p}= & {} \left( \kappa \chi ^{3/2}\right) ^{-r/2p}\varepsilon ^{-S/2+r/8p}. \end{aligned}$$
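
Setting the exponents of \(\varepsilon \) on the right-hand sides to zero determines the right choice:

$$\begin{aligned} -\frac{S}{2}+\frac{q}{16p}=0 \;\Leftrightarrow \; q=8pS \quad \text{ and }\quad -\frac{S}{2}+\frac{r}{8p}=0 \;\Leftrightarrow \; r=4pS. \end{aligned}$$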

Thus, picking \(q=8pS\) and \(r=4pS\) ensures both terms remain bounded as \(\varepsilon \rightarrow 0\) and we obtain

$$\begin{aligned} \Vert \det ({\tilde{c}}_1^\varepsilon )^{-1}\Vert _p \le 2^d\Vert \lambda _\mathrm{min}^{-1}\Vert _p^d+C(2p) \left( D(8pS,\chi )^{1/2p}+E(4pS,\kappa ,\chi )^{1/2p}\right) . \end{aligned}$$

This together with the integrability (3.6) of \(\lambda _\mathrm{min}^{-1}\) implies the uniform non-degeneracy of the rescaled Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \). \(\square \)

4 Convergence of the diffusion bridge measures

We prove Theorem 1.2 in this section with the extension to Theorem 1.1 left to Sect. 5. For our analysis, we adapt the Fourier transform argument presented in [2] to allow for the higher-order scaling \(\delta _\varepsilon \). As in Sect. 3, we may assume that the sub-Riemannian structure \((X_1,\dots ,X_m)\) has already been pushed forward by the global diffeomorphism \(\theta {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) which is an adapted chart at \(x=0\) and which has bounded derivatives of all positive orders.

Define \(T\Omega ^0\) to be the set of continuous paths \(v=(v_t)_{t\in [0,1]}\) in \(T_0\mathbb {R}^d\cong \mathbb {R}^d\) with \(v_0=0\) and set

$$\begin{aligned} T\Omega ^{0,y}=\{v\in T\Omega ^0{:}\, v_1=y\}. \end{aligned}$$

Let \({\tilde{\mu }}_\varepsilon ^0\) denote the law of the rescaled process \(({\tilde{x}}_t^\varepsilon )_{t\in [0,1]}\) on \(T\Omega ^0\) and write \(q(\varepsilon ,0,\cdot )\) for the density of the law of \(v_1\) under the measure \({\tilde{\mu }}_\varepsilon ^0\). To obtain the rescaled diffusion bridge measures, we disintegrate \({\tilde{\mu }}_\varepsilon ^0\) uniquely, with respect to the Lebesgue measure on \(\mathbb {R}^d\), as

$$\begin{aligned} {\tilde{\mu }}_\varepsilon ^0({\mathrm d}v)=\int _{\mathbb {R}^d} {\tilde{\mu }}_\varepsilon ^{0,y}({\mathrm d}v)q(\varepsilon ,0,y)\,{\mathrm d}y, \end{aligned}$$
(4.1)

where \({\tilde{\mu }}_\varepsilon ^{0,y}\) is a probability measure on \(T\Omega ^0\) which is supported on \(T\Omega ^{0,y}\), and the map \(y\mapsto {\tilde{\mu }}_\varepsilon ^{0,y}\) is weakly continuous. We can think of \({\tilde{\mu }}_\varepsilon ^{0,y}\) as the law of the process \(({\tilde{x}}_t^\varepsilon )_{t\in [0,1]}\) conditioned by \({\tilde{x}}_1^\varepsilon =y\). In particular, this construction is consistent with our previous definition of \({\tilde{\mu }}_\varepsilon ^{0,0}\). Similarly, write \({\tilde{\mu }}^0\) for the law of the limiting rescaled diffusion process \(({\tilde{x}}_t)_{t\in [0,1]}\) on \(T\Omega ^0\), denote the density of the law of \(v_1\) under \({\tilde{\mu }}^0\) by \(\bar{q}(\cdot )\) and let \(({\tilde{\mu }}^{0,y}{:}\,y\in \mathbb {R}^d)\) be the unique family of probability measures we obtain by disintegrating the measure \({\tilde{\mu }}^0\) as

$$\begin{aligned} {\tilde{\mu }}^0({\mathrm d}v)=\int _{\mathbb {R}^d} {\tilde{\mu }}^{0,y}({\mathrm d}v)\bar{q}(y)\,{\mathrm d}y. \end{aligned}$$
(4.2)
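
Before proceeding, let us note that the disintegration (4.1) can be probed numerically by smoothing in the endpoint. The sketch below is a minimal illustration under simplifying assumptions: the rescaled process is replaced by a standard one-dimensional Brownian motion, \(G(v)=v_{1/2}^2\), and the conditioning \(v_1=y\) is approximated by a Gaussian kernel of small bandwidth h; for this toy case the bridge expectation is known in closed form, \(\mathbb {E}[v_{1/2}^2\,|\,v_1=y]=1/4+y^2/4\).

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps, y, h = 200_000, 200, 0.7, 0.02  # h: endpoint kernel bandwidth

dB = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
paths = np.cumsum(dB, axis=1)                 # toy stand-in for the rescaled process
v_half, v_one = paths[:, n_steps // 2 - 1], paths[:, -1]

# Kernel-weighted estimate of the disintegrated (bridge) expectation at v_1 = y
w = np.exp(-((v_one - y) ** 2) / (2.0 * h ** 2))
estimate = np.sum(w * v_half ** 2) / np.sum(w)
print(estimate, 0.25 + y ** 2 / 4.0)          # both approximately 0.3725
```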

To keep track of the paths of the diffusion bridges, we fix \(t_1,\dots ,t_k\in (0,1)\) with \(t_1<\dots <t_k\) as well as a smooth function g on \((\mathbb {R}^d)^k\) of polynomial growth and consider the smooth cylindrical function G on \(T\Omega ^0\) defined by \(G(v)=g(v_{t_1},\dots ,v_{t_k})\). For \(y\in \mathbb {R}^d\) and \(\varepsilon >0\), set

$$\begin{aligned} G_\varepsilon (y)&=q(\varepsilon ,0,y)\int _{T\Omega ^{0,y}}G(v) {\tilde{\mu }}_\varepsilon ^{0,y}({\mathrm d}v)\quad \text{ and }\\ G_0(y)&=\bar{q}(y)\int _{T\Omega ^{0,y}}G(v) {\tilde{\mu }}^{0,y}({\mathrm d}v). \end{aligned}$$

Both \(G_\varepsilon \) and \(G_0\) are continuous and integrable on \(\mathbb {R}^d\) and in particular, we can consider their Fourier transforms \({\hat{G}}_\varepsilon (\xi )\) and \({\hat{G}}_0(\xi )\) given by

$$\begin{aligned} {\hat{G}}_\varepsilon (\xi )=\int _{\mathbb {R}^d} G_\varepsilon (y)\mathrm{e}^{\mathrm{i}\langle \xi ,y\rangle }\,{\mathrm d}y \quad \text{ and }\quad {\hat{G}}_0(\xi )=\int _{\mathbb {R}^d} G_0(y)\mathrm{e}^{\mathrm{i}\langle \xi ,y\rangle }\,{\mathrm d}y. \end{aligned}$$

Using the disintegration of measure property (4.1), we deduce that

$$\begin{aligned} {\hat{G}}_\varepsilon (\xi )&=\int _{\mathbb {R}^d}\int _{T\Omega ^{0,y}}q(\varepsilon ,0,y) G(v){\tilde{\mu }}_\varepsilon ^{0,y}({\mathrm d}v) \mathrm{e}^{\mathrm{i}\langle \xi ,y\rangle }\,{\mathrm d}y\\&=\int _{T\Omega ^{0}} G(v)\mathrm{e}^{\mathrm{i}\langle \xi ,v_1\rangle } {\tilde{\mu }}_\varepsilon ^0({\mathrm d}v)\\&=\mathbb {E}\left[ G({\tilde{x}}^\varepsilon )\exp \left\{ \mathrm{i}\langle \xi ,{\tilde{x}}_1^\varepsilon \rangle \right\} \right] . \end{aligned}$$

Similarly, by using (4.2), we show that

$$\begin{aligned} {\hat{G}}_0(\xi )=\mathbb {E}\left[ G({\tilde{x}})\exp \left\{ \mathrm{i}\langle \xi ,{\tilde{x}}_1\rangle \right\} \right] . \end{aligned}$$

We recall that \({\tilde{x}}_t^\varepsilon \rightarrow {\tilde{x}}_t\) as \(\varepsilon \rightarrow 0\) almost surely and in \(L^p\) for all \(p<\infty \). Hence, \({\hat{G}}_\varepsilon (\xi )\rightarrow {\hat{G}}_0(\xi )\) as \(\varepsilon \rightarrow 0\) for all \(\xi \in \mathbb {R}^d\). To be able to use this convergence result to make deductions about the behaviour of the functions \(G_\varepsilon \) and \(G_0\), we need \({\hat{G}}_\varepsilon \) to be integrable uniformly in \(\varepsilon \in (0,1]\). This is provided by the following lemma, which is proven at the end of the section.

Lemma 4.1

For all smooth cylindrical functions G on \(T\Omega ^0\) there are constants \(C(G)<\infty \) such that, for all \(\varepsilon \in (0,1]\) and all \(\xi \in \mathbb {R}^d\), we have

$$\begin{aligned} |{\hat{G}}_\varepsilon (\xi )|\le \frac{C(G)}{1+|\xi |^{d+1}}. \end{aligned}$$
(4.3)

Moreover, in the case where \(G(v)=|v_{t_1}-v_{t_2}|^4\), there exists a constant \(C<\infty \) such that, uniformly in \(t_1,t_2\in (0,1)\), we can choose \(C(G)=C|t_1-t_2|^2\), i.e. for all \(\varepsilon \in (0,1]\) and all \(\xi \in \mathbb {R}^d\),

$$\begin{aligned} |{\hat{G}}_\varepsilon (\xi )|\le \frac{C|t_1-t_2|^2}{1+|\xi |^{d+1}}. \end{aligned}$$
(4.4)

With this setup, we can prove Theorem 1.2.

Proof of Theorem 1.2

Applying the Fourier inversion formula and using (4.3) from Lemma 4.1 as well as the dominated convergence theorem, we deduce that

$$\begin{aligned} G_\varepsilon (0)=\frac{1}{(2\pi )^d}\int _{\mathbb {R}^d}{\hat{G}}_\varepsilon (\xi )\,{\mathrm d}\xi \rightarrow \frac{1}{(2\pi )^d}\int _{\mathbb {R}^d}{\hat{G}}_0(\xi )\,{\mathrm d}\xi =G_0(0) \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$
(4.5)

Let \(Q=\sum _{n=1}^N nd_n\) be the homogeneous dimension of the sub-Riemannian structure \((X_1,\dots ,X_m)\). Due to the change of variables formula, and since the anisotropic dilation \(\delta _\varepsilon \) has Jacobian determinant \(\varepsilon ^{Q/2}\), we have

$$\begin{aligned} q(\varepsilon ,0,y)=\varepsilon ^{Q/2}p(\varepsilon ,0,\delta _\varepsilon (y)), \end{aligned}$$

where p and q are the Dirichlet heat kernels, with respect to the Lebesgue measure on \(\mathbb {R}^d\), associated to the processes \((x_t^1)_{t\in [0,1]}\) and \(({\tilde{x}}_t^1)_{t\in [0,1]}\), respectively. From (4.5), it follows that

$$\begin{aligned} \varepsilon ^{Q/2}p(\varepsilon ,0,0)\int _{T\Omega ^{0,0}}G(v) {\tilde{\mu }}_\varepsilon ^{0,0}({\mathrm d}v)\rightarrow \bar{q}(0)\int _{T\Omega ^{0,0}}G(v) {\tilde{\mu }}^{0,0}({\mathrm d}v) \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$
(4.6)

Choosing \(g\equiv 1\) shows that

$$\begin{aligned} \varepsilon ^{Q/2}p(\varepsilon ,0,0)\rightarrow \bar{q}(0) \quad \text{ as }\quad \varepsilon \rightarrow 0, \end{aligned}$$
(4.7)

which agrees with the small-time heat kernel asymptotics established in [5] and [11]. We recall that \(\bar{q}{:}\,\mathbb {R}^d\rightarrow [0,\infty )\) is the density of the random variable \({\tilde{x}}_1\), where \(({\tilde{x}}_t)_{t\in [0,1]}\) is the limiting rescaled process with generator

$$\begin{aligned} \tilde{\mathcal {L}}=\frac{1}{2}\sum _{i=1}^m {\tilde{X}}_i^2. \end{aligned}$$

By Proposition 2.6, the nilpotent approximations \({\tilde{X}}_1,\dots ,{\tilde{X}}_m\) satisfy the strong Hörmander condition everywhere on \(\mathbb {R}^d\) and since \(\tilde{\mathcal {L}}\) has vanishing drift, the discussions in [7] imply that \(\bar{q}(0)>0\). Hence, we can divide (4.6) by (4.7) to obtain

$$\begin{aligned} \int _{T\Omega ^{0,0}}G(v) {\tilde{\mu }}_\varepsilon ^{0,0}({\mathrm d}v)\rightarrow \int _{T\Omega ^{0,0}}G(v) {\tilde{\mu }}^{0,0}({\mathrm d}v) \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$

Thus, the finite-dimensional distributions of \({\tilde{\mu }}_\varepsilon ^{0,0}\) converge weakly to those of \({\tilde{\mu }}^{0,0}\) and it remains to establish tightness to deduce the desired convergence result. Taking \(G(v)=|v_{t_1}-v_{t_2}|^4\), using the Fourier inversion formula and the estimate (4.4) from Lemma 4.1, we conclude that

$$\begin{aligned} \varepsilon ^{Q/2} p(\varepsilon ,0,0)\int _{T\Omega ^{0,0}} |v_{t_1}-v_{t_2}|^4\; {\tilde{\mu }}_\varepsilon ^{0,0}({\mathrm d}v)=G_\varepsilon (0)\le C|t_1-t_2|^2. \end{aligned}$$

From (4.7) and due to \(\bar{q}(0)>0\), it further follows that there exists a constant \(D<\infty \) such that, for all \(t_1,t_2\in (0,1)\),

$$\begin{aligned} \sup _{\varepsilon \in (0,1]}\int _{T\Omega ^{0,0}} |v_{t_1}-v_{t_2}|^4\; {\tilde{\mu }}_\varepsilon ^{0,0}({\mathrm d}v)\le D|t_1-t_2|^2. \end{aligned}$$

A standard Kolmogorov-type tightness criterion finally implies that the family of laws \(({\tilde{\mu }}_\varepsilon ^{0,0}{:}\,\varepsilon \in (0,1])\) is tight on \(T\Omega ^{0,0}\) and hence, \({\tilde{\mu }}_\varepsilon ^{0,0}\rightarrow {\tilde{\mu }}^{0,0}\) weakly on \(T\Omega ^{0,0}\) as \(\varepsilon \rightarrow 0\). \(\square \)
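
The fourth-moment bound used for tightness can be illustrated in the simplest case: for a standard one-dimensional Brownian bridge v one has \(\mathbb {E}|v_{t_1}-v_{t_2}|^4=3\,|t_1-t_2|^2(1-|t_1-t_2|)^2\le 3\,|t_1-t_2|^2\), the prototype of the estimate with constant \(D=3\). A short Monte Carlo check, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 100_000, 200
dB = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)
t = np.arange(1, n_steps + 1) / n_steps
bridge = B - t * B[:, -1:]                    # Brownian bridge v_t = B_t - t*B_1

for t1, t2 in [(0.2, 0.3), (0.1, 0.9), (0.45, 0.55)]:
    i1, i2 = int(t1 * n_steps) - 1, int(t2 * n_steps) - 1
    m4 = np.mean((bridge[:, i2] - bridge[:, i1]) ** 4)
    print(t1, t2, m4, 3.0 * (t2 - t1) ** 2)   # m4 <= 3|t1 - t2|^2 up to MC error
```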

It remains to establish Lemma 4.1. The proof closely follows [2, Proof of Lemma 4.1], where the main adjustments needed arise due to the higher-order scaling map \(\delta _\varepsilon \). In addition to the uniform non-degeneracy of the rescaled Malliavin covariance matrices \({\tilde{c}}_1^\varepsilon \), which is provided by Theorem 1.3, we need the rescaled processes \(({\tilde{x}}_t^\varepsilon )_{t\in [0,1]}\) and \((\tilde{v}_t^\varepsilon )_{t\in [0,1]}\) defined in Sect. 3.1 to have moments of all orders bounded uniformly in \(\varepsilon \in (0,1]\). The latter is ensured by the following lemma.

Lemma 4.2

There are moment estimates of all orders for the stochastic processes \(({\tilde{x}}_t^\varepsilon )_{t\in [0,1]}\) and \((\tilde{v}_t^\varepsilon )_{t\in [0,1]}\) which are uniform in \(\varepsilon \in (0,1]\), i.e. for all \(p<\infty \), we have

$$\begin{aligned} \sup _{\varepsilon \in (0,1]} \mathbb {E}\left[ \sup _{0\le t\le 1}|{\tilde{x}}_t^\varepsilon |^p\right]<\infty \quad \text{ and }\quad \sup _{\varepsilon \in (0,1]} \mathbb {E}\left[ \sup _{0\le t\le 1}|\tilde{v}_t^\varepsilon |^p\right] <\infty . \end{aligned}$$

Proof

We exploit the graded structure induced by the sub-Riemannian structure \((X_1,\dots ,X_m)\) and we make use of the properties of an adapted chart. For \(\tau \in [0,1]\), consider the Itô stochastic differential equation in \(\mathbb {R}^d\)

$$\begin{aligned} {\mathrm d}x_t^\varepsilon (\tau )= \sum _{i=1}^m\tau \sqrt{\varepsilon } X_i(x_t^\varepsilon (\tau ))\,{\mathrm d}B_t^i+ \tau ^2\varepsilon {\underline{X}}_0(x_t^\varepsilon (\tau ))\,{\mathrm d}t ,\quad x_0^\varepsilon (\tau )=0 \end{aligned}$$

and let \(\{(x_t^\varepsilon (\tau ))_{t\in [0,1]}{:}\,\tau \in [0,1]\}\) be the unique family of strong solutions which is almost surely jointly continuous in \(\tau \) and t. Observe that \(x_t^\varepsilon (0)=0\) and \(x_t^\varepsilon (1)=x_t^\varepsilon \) for all \(t\in [0,1]\), almost surely. Moreover, for \(n\ge 1\), the rescaled nth derivative in \(\tau \)

$$\begin{aligned} x_t^{\varepsilon ,(n)}(\tau )= \varepsilon ^{-n/2}\left( \frac{\partial }{\partial \tau }\right) ^n x_t^\varepsilon (\tau ) \end{aligned}$$

exists for all \(\tau \) and t, almost surely. For instance, \((x_t^{\varepsilon ,(1)}(\tau ))_{t\in [0,1]}\) is the unique strong solution of the Itô stochastic differential equation

$$\begin{aligned} {\mathrm d}x_t^{\varepsilon ,(1)}(\tau )&= \sum _{i=1}^m X_i(x_t^\varepsilon (\tau ))\,{\mathrm d}B_t^i+ 2\tau \sqrt{\varepsilon }{\underline{X}}_0(x_t^\varepsilon (\tau ))\,{\mathrm d}t\\&\quad + \sum _{i=1}^m\tau \sqrt{\varepsilon } \nabla X_i(x_t^\varepsilon (\tau ))x_t^{\varepsilon ,(1)}(\tau )\,{\mathrm d}B_t^i\\&\quad +\tau ^2\varepsilon \nabla {\underline{X}}_0(x_t^\varepsilon (\tau ))x_t^{\varepsilon ,(1)}(\tau )\,{\mathrm d}t,\quad x_0^{\varepsilon ,(1)}(\tau )=0. \end{aligned}$$

In particular, we compute that \(x_t^{\varepsilon ,(1)}(0)=\sum _{i=1}^m X_i(0)B_t^i\). As \(\langle u,X_i(0)\rangle =0\) for all \(i\in \{1,\dots ,m\}\) and all \(u\in C_1(0)^\perp \), we deduce

$$\begin{aligned} \left\langle u,x_t^{\varepsilon ,(1)}(0)\right\rangle =0 \quad \text{ for } \text{ all }\quad u\in C_1(0)^\perp . \end{aligned}$$
(4.8)

By looking at the corresponding stochastic differential equation for \((x_t^{\varepsilon ,(2)}(\tau ))_{t\in [0,1]}\), we further obtain that

$$\begin{aligned} x_t^{\varepsilon ,(2)}(0)= \sum _{i=1}^m \int _0^t 2\nabla X_i(0) x_s^{\varepsilon ,(1)}(0)\,{\mathrm d}B_s^i+ 2{{\underline{X}}}_0(0)t. \end{aligned}$$

Due to (4.8), the only non-zero terms in \(\nabla X_i(0) x_s^{\varepsilon ,(1)}(0)\) are scalar multiples of the first \(d_1\) columns of \(\nabla X_i(0)\), i.e. where the derivative is taken along a direction lying in \(C_1(0)\). Thus, by property (ii) of an adapted chart and since \(X_0(0)\in \hbox {span}\{X_1(0),\dots ,X_m(0)\}\), it follows that

$$\begin{aligned} \left\langle u,x_t^{\varepsilon ,(2)}(0)\right\rangle =0 \quad \text{ for } \text{ all }\quad u\in C_2(0)^\perp . \end{aligned}$$

In general, continuing in the same way and by appealing to the Faà di Bruno formula, we prove iteratively that, for all \(n\in \{1,\dots ,N-1\}\),

$$\begin{aligned} \left\langle u,x_t^{\varepsilon ,(n)}(0)\right\rangle =0 \quad \text{ for } \text{ all }\quad u\in C_n(0)^\perp . \end{aligned}$$
(4.9)

Besides, the stochastic process \((x_t^\varepsilon (\tau ),x_t^{\varepsilon ,(1)}(\tau ), \dots ,x_t^{\varepsilon ,(N)}(\tau ))_{t\in [0,1]}\) is the solution of a stochastic differential equation with graded Lipschitz coefficients in the sense of Norris [12]. As the coefficient bounds of the graded structure are uniform in \(\tau \in [0,1]\) and \(\varepsilon \in (0,1]\), we obtain, uniformly in \(\tau \) and \(\varepsilon \), moment bounds of all orders for \((x_t^\varepsilon (\tau ),x_t^{\varepsilon ,(1)}(\tau ), \dots ,x_t^{\varepsilon ,(N)}(\tau ))_{t\in [0,1]}\). Finally, due to (4.9) we have, for all \(n\in \{1,\dots ,N\}\) and all \(u\in C_n(0)\cap C_{n-1}(0)^\perp \),

$$\begin{aligned} \left\langle u,{\tilde{x}}_t^\varepsilon \right\rangle = \left\langle u,\varepsilon ^{-n/2} x_t^\varepsilon \right\rangle = \left\langle u, \int _0^1\int _0^{\tau _1}\dots \int _0^{\tau _{n-1}}x_t^{\varepsilon ,(n)}(\tau _n) \,{\mathrm d}\tau _n\,{\mathrm d}\tau _{n-1}\dots \,{\mathrm d}\tau _1 \right\rangle . \end{aligned}$$

This together with the uniform moment bounds implies the claimed result that, for all \(p<\infty \),

$$\begin{aligned} \sup _{\varepsilon \in (0,1]} \mathbb {E}\left[ \sup _{0\le t\le 1}|{\tilde{x}}_t^\varepsilon |^p\right] <\infty . \end{aligned}$$

We proceed similarly to establish the second estimate. Let \(\{(v_t^\varepsilon (\tau ))_{t\in [0,1]}{:}\,\tau \in [0,1]\}\) be the unique family of strong solutions to the Itô stochastic differential equation in \(\mathbb {R}^d\)

$$\begin{aligned} {\mathrm d}v_t^\varepsilon (\tau )= & {} -\sum _{i=1}^m\tau \sqrt{\varepsilon }v_t^\varepsilon (\tau ) \nabla X_i(x_t^\varepsilon (\tau ))\,{\mathrm d}B_t^i \\&-\tau ^2\varepsilon v_t^\varepsilon (\tau )\left( \nabla {{\underline{X}}}_0- \sum _{i=1}^m(\nabla X_i)^2\right) (x_t^\varepsilon (\tau ))\,{\mathrm d}t, \quad v_0^\varepsilon (\tau )=I \end{aligned}$$

which is almost surely jointly continuous in \(\tau \) and t. We note that \(v_t^\varepsilon (0)=I\) and \(v_t^\varepsilon (1)=v_t^\varepsilon \) for all \(t\in [0,1]\), almost surely. For \(n\ge 1\), set

$$\begin{aligned} v_t^{\varepsilon ,(n)}(\tau )= \varepsilon ^{-n/2}\left( \frac{\partial }{\partial \tau }\right) ^n v_t^\varepsilon (\tau ), \end{aligned}$$

which exists for all \(\tau \) and t, almost surely. For \(n_1,n_2\in \{1,\dots ,N\}\) and \(u^1\in C_{n_1}(0)\cap C_{n_1-1}(0)^\perp \) as well as \(u^2\in C_{n_2}(0)\cap C_{n_2-1}(0)^\perp \), we have

$$\begin{aligned} \left\langle u^1,\tilde{v}_t^\varepsilon u^2\right\rangle = \varepsilon ^{-(n_1-n_2)/2}\left\langle u^1,v_t^\varepsilon u^2\right\rangle . \end{aligned}$$

Therefore, if \(n_1\le n_2\), we obtain the bound \(|\langle u^1,\tilde{v}_t^\varepsilon u^2\rangle |\le |\langle u^1,v_t^\varepsilon u^2\rangle |\). On the other hand, if \(n_1>n_2\) then \(\langle u^1,u^2\rangle =0\) and in a similar way to proving (4.9), we show that

$$\begin{aligned} \left\langle u^1,v_t^{\varepsilon ,(k)}(0)u^2\right\rangle =0 \quad \text{ for } \text{ all }\quad k\in \{1,\dots , n_1-n_2-1\} \end{aligned}$$

by repeatedly using property (ii) of an adapted chart. This allows us to write

$$\begin{aligned} \left\langle u^1,\tilde{v}_t^\varepsilon u^2\right\rangle = \left\langle u^1, \left( \int _0^1\int _0^{\tau _1}\dots \int _0^{\tau _{n_1-n_2-1}} v_t^{\varepsilon ,(n_1-n_2)}(\tau _{n_1-n_2}) \,{\mathrm d}\tau _{n_1-n_2}\,{\mathrm d}\tau _{n_1-n_2-1}\dots \,{\mathrm d}\tau _1\right) u^2 \right\rangle \end{aligned}$$

for \(n_1>n_2\). As the stochastic process \((x_t^\varepsilon (\tau ), v_t^\varepsilon (\tau ), x_t^{\varepsilon ,(1)}(\tau ),v_t^{\varepsilon ,(1)}(\tau ),\dots \), \(x_t^{\varepsilon ,(N)}(\tau ),v_t^{\varepsilon ,(N)}(\tau ))_{t\in [0,1]}\) is the solution of a stochastic differential equation with graded Lipschitz coefficients in the sense of Norris [12], with the coefficient bounds of the graded structure being uniform in \(\tau \in [0,1]\) and \(\varepsilon \in (0,1]\), the second result claimed follows. \(\square \)
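
The \(\tau \)-differentiation technique of the proof is easy to test numerically. The following finite-difference sketch uses hypothetical bounded toy fields on \(\mathbb {R}^2\), with the drift \({\underline{X}}_0\) omitted (both assumptions made purely for illustration), and checks the identity \(x_t^{\varepsilon ,(1)}(0)=\sum _{i=1}^m X_i(0)B_t^i\) computed in the proof above: since \(x_t^\varepsilon (0)\equiv 0\), the quotient \(x_t^\varepsilon (h)/(h\sqrt{\varepsilon })\) should approach \(\sum _i X_i(0)B_t^i\) as \(h\rightarrow 0\).

```python
import numpy as np

rng = np.random.default_rng(4)
eps, h, n_steps = 0.5, 1e-4, 2000
dt = 1.0 / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_steps, 2))
B = np.cumsum(dB, axis=0)

# Hypothetical bounded vector fields with bounded derivatives (illustration only)
X1 = lambda x: np.array([1.0, np.sin(x[0])])   # X1(0) = (1, 0)
X2 = lambda x: np.array([np.cos(x[1]), 1.0])   # X2(0) = (1, 1)

def solve(tau):
    """Euler scheme for dx = tau*sqrt(eps)*(X1(x) dB^1 + X2(x) dB^2), x_0 = 0."""
    x, out = np.zeros(2), np.zeros((n_steps, 2))
    for k in range(n_steps):
        x = x + tau * np.sqrt(eps) * (X1(x) * dB[k, 0] + X2(x) * dB[k, 1])
        out[k] = x
    return out

fd = solve(h) / (h * np.sqrt(eps))                      # finite-difference x^{eps,(1)}(0)
exact = np.stack([B[:, 0] + B[:, 1], B[:, 1]], axis=1)  # X1(0)B^1 + X2(0)B^2
print(np.max(np.abs(fd - exact)))                       # small, of order h
```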

We finally present the proof of Lemma 4.1. For some of the technical arguments which carry over unchanged, we simply refer the reader to [2].

Proof of Lemma 4.1

Let \((x_t^\varepsilon )_{t\in [0,1]}\) be the process in \(\mathbb {R}^d\) and \((u_t^\varepsilon )_{t\in [0,1]}\) as well as \((v_t^\varepsilon )_{t\in [0,1]}\) be the processes in \(\mathbb {R}^d\otimes (\mathbb {R}^d)^*\) which are defined as the unique strong solutions of the following system of Itô stochastic differential equations.

$$\begin{aligned} {\mathrm d}x_t^\varepsilon&=\sum _{i=1}^m\sqrt{\varepsilon } X_i(x_t^\varepsilon )\,{\mathrm d}B_t^i +\varepsilon {{\underline{X}}}_0(x_t^\varepsilon )\,{\mathrm d}t,\quad x_0^\varepsilon =0\\ {\mathrm d}u_t^\varepsilon&=\sum _{i=1}^m\sqrt{\varepsilon }\nabla X_i(x_t^\varepsilon )u_t^\varepsilon \,{\mathrm d}B_t^i +\varepsilon \nabla {{\underline{X}}}_0(x_t^\varepsilon )u_t^\varepsilon \,{\mathrm d}t, \quad u_0^\varepsilon =I\nonumber \\ {\mathrm d}v_t^\varepsilon&=-\sum _{i=1}^m\sqrt{\varepsilon }v_t^\varepsilon \nabla X_i(x_t^\varepsilon )\,{\mathrm d}B_t^i -\varepsilon v_t^\varepsilon \left( \nabla {{\underline{X}}}_0- \sum _{i=1}^m(\nabla X_i)^2\right) (x_t^\varepsilon )\,{\mathrm d}t, \quad v_0^\varepsilon =I\nonumber \end{aligned}$$
(4.10)

Fix \(k\in \{1,\dots ,d\}\). For \(\eta \in \mathbb {R}^d\), consider the perturbed process \((B_t^{\eta })_{t\in [0,1]}\) in \(\mathbb {R}^m\) given by

$$\begin{aligned} {\mathrm d}B_t^{\eta ,i}={\mathrm d}B_t^i+\eta \left( \sqrt{\varepsilon }{\delta }_\varepsilon ^{-1} \left( v_t^\varepsilon X_i(x_t^\varepsilon )\right) \right) ^k\,{\mathrm d}t, \quad B_0^\eta =0, \end{aligned}$$

where \((\sqrt{\varepsilon }{\delta }_\varepsilon ^{-1} (v_t^\varepsilon X_i(x_t^\varepsilon )))^k\) denotes the kth component of the vector \(\sqrt{\varepsilon }{\delta }_\varepsilon ^{-1} (v_t^\varepsilon X_i(x_t^\varepsilon ))\) in \(\mathbb {R}^d\). Write \((x_t^{\varepsilon ,\eta })_{t\in [0,1]}\) for the strong solution of the stochastic differential equation (4.10) with the driving Brownian motion \((B_t)_{t\in [0,1]}\) replaced by \((B_t^{\eta })_{t\in [0,1]}\). We choose a version of the family of processes \((x_t^{\varepsilon ,\eta })_{t\in [0,1]}\) which is almost surely smooth in \(\eta \) and set

$$\begin{aligned} \left( (x^\varepsilon )'_t\right) ^k= \left. \frac{\partial }{\partial \eta }\right| _{\eta =0}x_t^{\varepsilon ,\eta }. \end{aligned}$$

The derived process \(((x^\varepsilon )'_t)_{t\in [0,1]}=(((x^\varepsilon )'_t)^1, \dots ,((x^\varepsilon )'_t)^d)_{t\in [0,1]}\) in \(\mathbb {R}^d\otimes \mathbb {R}^d\) associated with the process \((x_t^\varepsilon )_{t\in [0,1]}\) then satisfies the Itô stochastic differential equation

$$\begin{aligned} {\mathrm d}(x^\varepsilon )_t'= & {} \sum _{i=1}^m\sqrt{\varepsilon }\nabla X_i(x_t^\varepsilon )(x^\varepsilon )_t'\,{\mathrm d}B_t^i+\varepsilon \nabla {{\underline{X}}}_0(x_t^\varepsilon )(x^\varepsilon )_t'\,{\mathrm d}t\\&+\sum _{i=1}^m\sqrt{\varepsilon }X_i(x_t^\varepsilon )\otimes \left( \sqrt{\varepsilon }{\delta }_{\varepsilon }^{-1}\left( v_t^\varepsilon X_i(x_t^\varepsilon )\right) \right) {\mathrm d}t \end{aligned}$$

subject to \((x^\varepsilon )'_0=0\). Using the expression (3.3) for the rescaled Malliavin covariance matrix \({\tilde{c}}_t^\varepsilon \), we show that \((x^\varepsilon )'_t=u_t^\varepsilon {\delta }_\varepsilon {\tilde{c}}_t^\varepsilon \). It follows that for the derived process \((({\tilde{x}}^\varepsilon )'_t)_{t\in [0,1]}\) associated with the rescaled process \(({\tilde{x}}^\varepsilon _t)_{t\in [0,1]}\) and the stochastic process \((\tilde{u}_t^\varepsilon )_{t\in [0,1]}\) given by \(\tilde{u}_t^\varepsilon ={\delta }_\varepsilon ^{-1} u_t^\varepsilon {\delta }_\varepsilon \), we have

$$\begin{aligned} ({\tilde{x}}^\varepsilon )'_t=\tilde{u}_t^\varepsilon {\tilde{c}}_t^\varepsilon . \end{aligned}$$

Note that both \(\tilde{u}_1^\varepsilon \) and \({\tilde{c}}_1^\varepsilon \) are invertible for all \(\varepsilon >0\) with \((\tilde{u}_1^\varepsilon )^{-1}=\tilde{v}_1^\varepsilon \). Let \((r_t^\varepsilon )_{t\in [0,1]}\) be the process defined by

$$\begin{aligned} {\mathrm d}r_t^\varepsilon =\sum _{i=1}^m\sqrt{\varepsilon }{\delta }_\varepsilon ^{-1} \left( v_t^\varepsilon X_i(x_t^\varepsilon )\right) {\mathrm d}B_t^i,\quad r_0^\varepsilon =0 \end{aligned}$$

and set \(y_t^{\varepsilon ,(0)}=(x_{t\wedge t_1}^\varepsilon ,\dots ,x_{t\wedge t_k}^\varepsilon ,x_t^\varepsilon ,v_t^\varepsilon ,r_t^\varepsilon ,(x^\varepsilon )'_t)\). The underlying graded Lipschitz structure, in the sense of Norris [12], allows us, for \(n\ge 0\), to recursively define

$$\begin{aligned} z_t^{\varepsilon ,(n)}=\left( y_t^{\varepsilon ,(0)},\dots ,y_t^{\varepsilon ,(n)}\right) \end{aligned}$$

by first solving for the derived process \(((z^{\varepsilon ,(n)})'_t)_{t\in [0,1]}\), then writing

$$\begin{aligned} \left( z^{\varepsilon ,(n)}\right) '_t=\left( \left( y^{\varepsilon ,(0)}\right) '_t, \dots ,\left( y^{\varepsilon ,(n)}\right) '_t\right) \end{aligned}$$

and finally setting \(y_t^{\varepsilon ,(n+1)}=(y^{\varepsilon ,(n)})'_t\).

Consider the random variable \(y^\varepsilon =(({\tilde{x}}^\varepsilon )'_1)^{-1}\) in \((\mathbb {R}^d)^*\otimes (\mathbb {R}^d)^*\) and let \(\phi =\phi (y^\varepsilon ,z_1^{\varepsilon ,(n)})\) be a polynomial in \(y^\varepsilon \), where the coefficients are continuously differentiable in \(z_1^{\varepsilon ,(n)}\) and of polynomial growth, along with their derivatives. Going through the deductions made from Bismut’s integration by parts formula in [2, Proof of Lemma 4.1] with \(R\equiv 0\) and \(F\equiv 0\) shows that for any continuously differentiable, bounded function \(f{:}\,\mathbb {R}^d\rightarrow \mathbb {R}\) with bounded first derivatives and any \(k\in \{1,\dots ,d\}\), we have

$$\begin{aligned} \mathbb {E}\left[ \nabla _k f({\tilde{x}}_1^\varepsilon ) \phi \left( y^\varepsilon ,z_1^{\varepsilon ,(n)}\right) \right] = \mathbb {E}\left[ f({\tilde{x}}_1^\varepsilon )\nabla _k^*\phi \left( y^\varepsilon ,z_1^{\varepsilon ,(n+1)}\right) \right] , \end{aligned}$$

where

$$\begin{aligned} \nabla _k^*\phi \left( y^\varepsilon ,z_1^{\varepsilon ,(n+1)}\right)&=\tau _k\left( y^\varepsilon \otimes r_1^\varepsilon +y^\varepsilon ({\tilde{x}}^\varepsilon )''_1 y^\varepsilon \right) \phi \left( y^\varepsilon ,z_1^{\varepsilon ,(n)}\right) \\&\quad +\tau _k\left( y^\varepsilon \otimes \left( \nabla _y \phi \left( y^\varepsilon ,z_1^{\varepsilon ,(n)}\right) y^\varepsilon ({\tilde{x}}^\varepsilon )''_1 y^\varepsilon \right) \right) \\&\quad -\tau _k\left( y^\varepsilon \otimes \left( \nabla _z \phi \left( y^\varepsilon ,z_1^{\varepsilon ,(n)}\right) \left( z^{\varepsilon ,(n)}\right) '_1\right) \right) , \end{aligned}$$

and \(\tau _k{:}\,(\mathbb {R}^d)^*\otimes (\mathbb {R}^d)^*\otimes \mathbb {R}^d\rightarrow \mathbb {R}\) is the linear map given by \(\tau _k(e_l^*\otimes e_{k'}^*\otimes e_{l'}) =\delta _{k k'}\delta _{l l'}\,.\) Starting from

$$\begin{aligned} \phi \left( y^\varepsilon ,z_1^{\varepsilon ,(0)}\right) =G({\tilde{x}}^\varepsilon ) =g\left( {\tilde{x}}_{t_1}^\varepsilon ,\dots ,{\tilde{x}}_{t_k}^\varepsilon \right) \end{aligned}$$

we see inductively that, for any multi-index \(\alpha =(k_1,\dots ,k_n)\),

$$\begin{aligned} \mathbb {E}\left[ \nabla ^\alpha f({\tilde{x}}_1^\varepsilon )G({\tilde{x}}^\varepsilon )\right] = \mathbb {E}\left[ f({\tilde{x}}_1^\varepsilon )(\nabla ^*)^\alpha G \left( y^\varepsilon ,z_1^{\varepsilon ,(n)}\right) \right] . \end{aligned}$$

Fixing \(\xi \in \mathbb {R}^d\) and choosing \(f(\cdot )=\mathrm{e}^{\mathrm{i}\langle \xi ,\cdot \rangle }\) in this integration by parts formula yields

$$\begin{aligned} |\xi ^\alpha ||{\hat{G}}_\varepsilon (\xi )|\le \mathbb {E}\left[ \left| (\nabla ^*)^\alpha G\left( y^\varepsilon ,z_1^{\varepsilon ,(n)} \right) \right| \right] . \end{aligned}$$
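
The mechanism behind this estimate can be seen in a one-dimensional Gaussian caricature: for \(x\sim N(0,1)\) (a stand-in for \({\tilde{x}}_1^\varepsilon \), purely for illustration), the Gaussian integration by parts \(\mathbb {E}[f'(x)\phi (x)]=\mathbb {E}[f(x)(x\phi (x)-\phi '(x))]\) applied to \(f=\mathrm{e}^{\mathrm{i}\xi \cdot }\) trades each power of \(\xi \) for a \(\xi \)-independent expectation, exactly as \((\nabla ^*)^\alpha \) does above.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=1_000_000)            # toy Gaussian stand-in for tilde{x}_1^eps
phi = lambda s: 1.0 / (1.0 + s ** 2)      # a smooth weight
dphi = lambda s: -2.0 * s / (1.0 + s ** 2) ** 2

for xi in [1.0, 5.0, 25.0]:
    lhs = np.abs(xi * np.mean(np.exp(1j * xi * x) * phi(x)))
    rhs = np.mean(np.abs(x * phi(x) - dphi(x)))   # xi-independent bound
    print(xi, lhs, rhs)                           # lhs <= rhs for every xi
```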

In order to deduce the bound (4.3), it remains to establish that \(C_\varepsilon (\alpha ,G)=\mathbb {E}[|(\nabla ^*)^\alpha G(y^\varepsilon ,z_1^{\varepsilon ,(n)})|]\) can be controlled uniformly in \(\varepsilon \). Due to \(y^\varepsilon =({\tilde{c}}_1^\varepsilon )^{-1} \tilde{v}_1^\varepsilon \), Theorem 1.3 and the second estimate from Lemma 4.2 immediately imply that, for all \(p<\infty \),

$$\begin{aligned} \sup _{\varepsilon \in (0,1]}\mathbb {E}\left[ \left| y^\varepsilon \right| ^p\right] <\infty . \end{aligned}$$
(4.11)

Moreover, from the first moment estimate in Lemma 4.2, it follows that all processes derived from the rescaled process \(({\tilde{x}}_t^\varepsilon )_{t\in [0,1]}\) have moments of all orders bounded uniformly in \(\varepsilon \in (0,1]\). Similarly, for \(n=d+1\) and all \(p<\infty \), we obtain

$$\begin{aligned} \sup _{\varepsilon \in (0,1]}\mathbb {E}\left[ \left| z_1^{\varepsilon ,(n)}\right| ^p\right] <\infty , \end{aligned}$$
(4.12)

where we observe that, for all \(n\in \{0,1,\dots ,N-1\}\) and all \(u\in C_{n+1}(0)\cap C_n(0)^\perp \),

$$\begin{aligned} \left\langle u,r_t^\varepsilon \right\rangle = \sum _{i=1}^m\int _0^t\left\langle u, \varepsilon ^{-n/2}v_s^\varepsilon X_i(x_s^\varepsilon )\right\rangle \,{\mathrm d}B_s^i, \end{aligned}$$

and use Lemma 3.3 to show that there is no singularity in the process \((r_t^\varepsilon )_{t\in [0,1]}\) as \(\varepsilon \rightarrow 0\). Since \((\nabla ^*)^\alpha G\) is of polynomial growth in the argument \((y^\varepsilon ,z_1^{\varepsilon ,(n)})\), the moment estimates (4.11) and (4.12) show that \(C_\varepsilon (\alpha ,G)\) is bounded uniformly in \(\varepsilon \in (0,1]\). This establishes (4.3).

Finally, the same proof as presented in [2, Proof of Lemma 4.1] shows that we have (4.4) in the special case where \(G(v)=|v_{t_1}-v_{t_2}|^4\) for some \(t_1,t_2\in (0,1)\). Let the process \(({\tilde{x}}_t^{\varepsilon ,(0)})_{t\in [0,1]}\) be given by \({\tilde{x}}_t^{\varepsilon ,(0)}={\tilde{x}}_t^\varepsilon \) and, recursively for \(n\ge 0\), define \(({\tilde{x}}_t^{\varepsilon ,(n+1)})_{t\in [0,1]}\) by \({\tilde{x}}_t^{\varepsilon ,(n+1)}=({\tilde{x}}_t^\varepsilon ,({\tilde{x}}^{\varepsilon ,(n)})'_t)\). Then, for all \(p\in [1,\infty )\), there exists a constant \(D(p)<\infty \) such that, uniformly in \(t_1,t_2\in (0,1)\) and in \(\varepsilon \in (0,1]\),

$$\begin{aligned} \mathbb {E}\left[ \left| {\tilde{x}}_{t_1}^{\varepsilon ,(n)}- {\tilde{x}}_{t_2}^{\varepsilon ,(n)}\right| ^{4p}\right] \le D(p)|t_1-t_2|^{2p}. \end{aligned}$$

Furthermore, from the expression for the adjoint operator \(\nabla _k^*\) we deduce that, for all \(n\ge 1\) and any multi-index \(\alpha =(k_1,\dots ,k_n)\), there exists a random variable \(M_\alpha \), with moments of all orders which are bounded uniformly in \(\varepsilon \in (0,1]\), such that

$$\begin{aligned} \left( \nabla ^*\right) ^\alpha G\left( y^\varepsilon ,z_1^{\varepsilon ,(n)}\right) = M_\alpha \left| {\tilde{x}}_{t_1}^{\varepsilon ,(n)}-{\tilde{x}}_{t_2}^{\varepsilon ,(n)}\right| ^{4}. \end{aligned}$$

By using Hölder’s inequality, we conclude that there exists a constant \(C(\alpha )<\infty \) such that, uniformly in \(t_1,t_2\in (0,1)\) and \(\varepsilon \in (0,1]\), we obtain

$$\begin{aligned} C_\varepsilon (\alpha ,G)\le C(\alpha )|t_1-t_2|^2, \end{aligned}$$

which implies (4.4). \(\square \)

5 Localisation argument

In proving Theorem 1.1 by localising Theorem 1.2, we use the same localisation argument as presented in [2, Section 5]. This is possible due to [2, Theorem 6.1], which provides a control over the amount of heat diffusing between two fixed points without leaving a fixed closed subset, also covering the diagonal case. After the proof, we give an example to illustrate Theorem 1.1 and we remark on deductions made for the \(\sqrt{\varepsilon }\)-rescaled fluctuations of diffusion loops.

Let \(\mathcal {L}\) be a differential operator on M satisfying the conditions of Theorem 1.1 and let \((X_1,\dots ,X_m)\) be a sub-Riemannian structure for the diffusivity of \(\mathcal {L}\). Define \(X_0\) to be the smooth vector field on M given by requiring

$$\begin{aligned} \mathcal {L}=\frac{1}{2}\sum _{i=1}^m X_i^2+X_0 \end{aligned}$$

and recall that \(X_0(y)\in \hbox {span}\{X_1(y),\dots ,X_m(y)\}\) for all \(y\in M\). Let \((U_0,\theta )\) be an adapted chart to the filtration induced by \((X_1,\dots ,X_m)\) at \(x\in M\) and extend it to a smooth map \(\theta {:}\,M\rightarrow \mathbb {R}^d\). By passing to a smaller set if necessary, we may assume that the closure of \(U_0\) is compact. Let U be a domain in M containing x and compactly contained in \(U_0\). We start by constructing a differential operator \(\bar{\mathcal {L}}\) on \(\mathbb {R}^d\) which satisfies the assumptions of Theorem 1.2 with the identity map being an adapted chart at 0 and such that \(\mathcal {L}(f)=\bar{\mathcal {L}}(f\circ \theta ^{-1})\circ \theta \) for all \(f\in C^\infty (U)\).

Set \(V=\theta (U)\) and \(V_0=\theta (U_0)\). Let \(\chi \) be a smooth function on \(\mathbb {R}^d\) which satisfies \(\mathbbm {1}_V\le \chi \le \mathbbm {1}\) and where \(\{\chi >0\}\) is compactly contained in \(V_0\). The existence of such a function is always guaranteed. Besides, we pick another smooth function \(\rho \) on \(\mathbb {R}^d\) with \(\mathbbm {1}_{V}\le \mathbbm {1}-\rho \le \mathbbm {1}_{V_0}\) and such that \(\chi +\rho \) is everywhere positive. Define vector fields \({\bar{X}}_0,{\bar{X}}_1,\dots ,{\bar{X}}_m,{\bar{X}}_{m+1},\dots ,{\bar{X}}_{m+d}\) on \(\mathbb {R}^d\) by

$$\begin{aligned} {\bar{X}}_i(z)&= {\left\{ \begin{array}{ll} \chi (z)\,{\mathrm d}\theta _{\theta ^{-1}(z)} \left( X_i\left( \theta ^{-1}(z)\right) \right) &{} \quad \text{ if } z\in V_0\\ 0 &{} \quad \text{ if } z\in \mathbb {R}^d\setminus V_0 \end{array}\right. }&\quad \text{ for } i\in \{0,1,\dots ,m\},\\ {\bar{X}}_{m+k}(z)&=\rho (z)e_k&\quad \text{ for } k\in \{1,\dots ,d\}, \end{aligned}$$

where \(e_1,\dots ,e_d\) is the standard basis in \(\mathbb {R}^d\). We note that \(X_0(y)\in \hbox {span}\{X_1(y),\dots ,X_m(y)\}\) for all \(y\in M\) implies that \({\bar{X}}_0(z)\in \hbox {span}\{{\bar{X}}_1(z),\dots ,{\bar{X}}_m(z)\}\) holds for all \(z\in \mathbb {R}^d\). Moreover, the vector fields \({\bar{X}}_1,\dots ,{\bar{X}}_m\) satisfy the strong Hörmander condition on \(\{\chi >0\}\), while \({\bar{X}}_{m+1},\dots ,{\bar{X}}_{m+d}\) themselves span \(\mathbb {R}^d\) on \(\{\rho >0\}\). As \(U_0\) is assumed to have compact closure, the vector fields constructed are all bounded with bounded derivatives of all orders. Hence, the differential operator \(\bar{\mathcal {L}}\) on \(\mathbb {R}^d\) given by

$$\begin{aligned} \bar{\mathcal {L}}=\frac{1}{2}\sum _{i=1}^{m+d} {\bar{X}}_i^2 +{\bar{X}}_0 \end{aligned}$$

satisfies the assumptions of Theorem 1.2. We further observe that, on V,

$$\begin{aligned} {\bar{X}}_i = \theta _*(X_i)\quad \text{ for } \text{ all } i\in \{0,1,\dots ,m\}, \end{aligned}$$

which yields the desired property that \(\bar{\mathcal {L}}=\theta _*\mathcal {L}\) on V. Additionally, we see that the nilpotent approximations of \(({\bar{X}}_1,\dots ,{\bar{X}}_m,{\bar{X}}_{m+1},\dots ,{\bar{X}}_{m+d})\) are \(({\tilde{X}}_1,\dots ,{\tilde{X}}_m,0,\dots ,0)\), which shows that the limiting rescaled processes on \(\mathbb {R}^d\) associated to the processes with generator \(\varepsilon \bar{\mathcal {L}}\) and \(\varepsilon \mathcal {L}\), respectively, have the same generator \(\tilde{\mathcal {L}}\). Since \((U_0,\theta )\), and in particular the restriction \((U,\theta )\), is an adapted chart at x, it also follows that the identity map on \(\mathbb {R}^d\) is an adapted chart to the filtration induced by the sub-Riemannian structure \(({\bar{X}}_1,\dots ,{\bar{X}}_m,{\bar{X}}_{m+1},\dots ,{\bar{X}}_{m+d})\) on \(\mathbb {R}^d\) at 0. Thus, Theorem 1.2 holds with the identity map as the global diffeomorphism and we associate the same anisotropic dilation \(\delta _\varepsilon {:}\,\mathbb {R}^d\rightarrow \mathbb {R}^d\) with the adapted charts \((U,\theta )\) at x and \((V,\mathrm{id})\) at 0. We use this to finally prove our main result.
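
Cut-off functions with the properties required of \(\chi \) and \(\rho \) can be written down explicitly from the standard bump \(x\mapsto \mathrm{e}^{-1/x}\mathbbm {1}_{\{x>0\}}\). A one-dimensional sketch, assuming for illustration \(d=1\), \(V=(-1,1)\) and \(V_0=(-2,2)\):

```python
import numpy as np

def psi(x):
    """Smooth on R: psi(x) = exp(-1/x) for x > 0 and psi(x) = 0 for x <= 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    out[x > 0] = np.exp(-1.0 / x[x > 0])
    return out

def step(x):
    """Smooth increasing function, equal to 0 for x <= 0 and to 1 for x >= 1."""
    return psi(x) / (psi(x) + psi(1.0 - x))

# chi = 1 on V = (-1,1), with {chi > 0} = (-1.5, 1.5) compactly contained in
# V_0 = (-2,2); rho = 1 - chi then satisfies 1_V <= 1 - rho <= 1_{V_0} and
# chi + rho = 1 > 0 everywhere.
chi = lambda x: step(3.0 - 2.0 * np.abs(x))
rho = lambda x: 1.0 - chi(x)

xs = np.linspace(-3.0, 3.0, 13)
print(np.round(chi(xs), 3))
```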

Proof of Theorem 1.1

Let \({\bar{p}}\) be the Dirichlet heat kernel for \(\bar{\mathcal {L}}\) with respect to the Lebesgue measure \(\lambda \) on \(\mathbb {R}^d\). Choose a positive smooth measure \(\nu \) on M which satisfies \(\nu =(\theta ^{-1})_*\lambda \) on U and let p denote the Dirichlet heat kernel for \(\mathcal {L}\) with respect to \(\nu \). Write \(\mu _\varepsilon ^{0,0,\mathbb {R}^d}\) for the diffusion loop measure on \(\Omega ^{0,0}(\mathbb {R}^d)\) associated with the operator \(\varepsilon \bar{\mathcal {L}}\) and write \({\tilde{\mu }}_\varepsilon ^{0,0,\mathbb {R}^d}\) for the rescaled loop measure on \(T\Omega ^{0,0}(\mathbb {R}^d)\), which is the image measure of \(\mu _\varepsilon ^{0,0,\mathbb {R}^d}\) under the scaling map \({\bar{\sigma }}_\varepsilon {:}\,\Omega ^{0,0}(\mathbb {R}^d)\rightarrow T\Omega ^{0,0}(\mathbb {R}^d)\) given by

$$\begin{aligned} {\bar{\sigma }}_\varepsilon (\omega )_t=\delta _\varepsilon ^{-1}\left( \omega _t\right) . \end{aligned}$$

Moreover, let \({\tilde{\mu }}^{0,0,\mathbb {R}^d}\) be the loop measure on \(T\Omega ^{0,0}(\mathbb {R}^d)\) associated with the stochastic process \(({\tilde{x}}_t)_{t\in [0,1]}\) on \(\mathbb {R}^d\) starting from 0 and having generator \(\tilde{\mathcal {L}}\) and let \(\bar{q}\) denote the probability density function of \({\tilde{x}}_1\). From Theorem 1.2, we know that \({\tilde{\mu }}_\varepsilon ^{0,0,\mathbb {R}^d}\) converges weakly to \({\tilde{\mu }}^{0,0,\mathbb {R}^d}\) on \(T\Omega ^{0,0}(\mathbb {R}^d)\) as \(\varepsilon \rightarrow 0\), and its proof also shows that

$$\begin{aligned} {\bar{p}}(\varepsilon ,0,0)=\varepsilon ^{-Q/2}\bar{q}(0)(1+o(1)) \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$
(5.1)

Let \(p_U\) denote the Dirichlet heat kernel in U of the restriction of \(\mathcal {L}\) to U and write \(\mu _\varepsilon ^{x,x,U}\) for the diffusion bridge measure on \(\Omega ^{x,x}(U)\) associated with the restriction of the operator \(\varepsilon \mathcal {L}\) to U. For any measurable set \(A\subset \Omega ^{x,x}(M)\), we have

$$\begin{aligned} p(\varepsilon ,x,x)\mu _\varepsilon ^{x,x}(A) = p_U(\varepsilon ,x,x)\mu _\varepsilon ^{x,x,U}(A\cap \Omega ^{x,x}(U))+ p(\varepsilon ,x,x)\mu _\varepsilon ^{x,x}(A\setminus \Omega ^{x,x}(U)). \end{aligned}$$
(5.2)

Additionally, by counting paths and since \(\nu =(\theta ^{-1})_*\lambda \) on U, we obtain

$$\begin{aligned} {\bar{p}}(\varepsilon ,0,0) \mu _\varepsilon ^{0,0,\mathbb {R}^d}\left( \theta (A\cap \Omega ^{x,x}(U))\right) = p_U(\varepsilon ,x,x) \mu _\varepsilon ^{x,x,U}\left( A\cap \Omega ^{x,x}(U)\right) , \end{aligned}$$
(5.3)

where \(\theta (A\cap \Omega ^{x,x}(U))\) denotes the subset \(\{(\theta (\omega _t))_{t\in [0,1]}{:}\,\omega \in A\cap \Omega ^{x,x}(U)\}\) of \(\Omega ^{0,0}(\mathbb {R}^d)\). Let B be a bounded measurable subset of the set \(T\Omega ^{x,x}(M)\) of continuous paths \(v=(v_t)_{t\in [0,1]}\) in \(T_xM\) with \(v_0=0\) and \(v_1=0\). For \(\varepsilon >0\) sufficiently small, we have \(\sigma _\varepsilon ^{-1}(B)\subset \Omega ^{x,x}(U)\) and so (5.2) and (5.3) imply that

$$\begin{aligned} p(\varepsilon ,x,x)\mu _\varepsilon ^{x,x}\left( \sigma _\varepsilon ^{-1}(B)\right) = {\bar{p}}(\varepsilon ,0,0) \mu _\varepsilon ^{0,0,\mathbb {R}^d}\left( \theta \left( \sigma _\varepsilon ^{-1}(B)\right) \right) . \end{aligned}$$

Since \(\mu _\varepsilon ^{x,x}(\sigma _\varepsilon ^{-1}(B))={\tilde{\mu }}_\varepsilon ^{x,x}(B)\) and

$$\begin{aligned} \mu _\varepsilon ^{0,0,\mathbb {R}^d}\left( \theta \left( \sigma _\varepsilon ^{-1}(B)\right) \right) = \mu _\varepsilon ^{0,0,\mathbb {R}^d}\left( {\bar{\sigma }}_\varepsilon ^{-1}({\mathrm d}\theta _x(B))\right) = {\tilde{\mu }}_\varepsilon ^{0,0,\mathbb {R}^d}({\mathrm d}\theta _x(B)), \end{aligned}$$

we deduce that

$$\begin{aligned} p(\varepsilon ,x,x){\tilde{\mu }}_\varepsilon ^{x,x}(B)= {\bar{p}}(\varepsilon ,0,0){\tilde{\mu }}_\varepsilon ^{0,0,\mathbb {R}^d}({\mathrm d}\theta _x(B)). \end{aligned}$$
(5.4)

Moreover, it holds true that \(\mu _\varepsilon ^{0,0,\mathbb {R}^d}\left( \theta (\Omega ^{x,x}(U))\right) \rightarrow 1\) as \(\varepsilon \rightarrow 0\). Therefore, taking \(A=\Omega ^{x,x}(M)\) in (5.3) and using (5.1) gives

$$\begin{aligned} p_U(\varepsilon ,x,x)=\varepsilon ^{-Q/2}\bar{q}(0)(1+o(1)) \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$

By [2, Theorem 6.1], we know that

$$\begin{aligned} \limsup _{\varepsilon \rightarrow 0}\varepsilon \log p(\varepsilon ,x,M\setminus U,x) \le -\frac{d(x,M\setminus U,x)^2}{2}, \end{aligned}$$

where \(p(\varepsilon ,x,M\!\setminus \! U,x)=p(\varepsilon ,x,x)-p_U(\varepsilon ,x,x)\) and \(d(x,M\setminus U,x)\) is the sub-Riemannian distance from x to x through \(M\setminus U\). Since \(d(x,M\setminus U,x)\) is strictly positive, it follows that

$$\begin{aligned} p(\varepsilon ,x,x)=p_U(\varepsilon ,x,x)+p(\varepsilon ,x,M\setminus U,x)= \varepsilon ^{-Q/2}\bar{q}(0)(1+o(1)) \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$

Hence, due to (5.4), we have \({\tilde{\mu }}_\varepsilon ^{x,x}(B)={\tilde{\mu }}_\varepsilon ^{0,0,\mathbb {R}^d}({\mathrm d}\theta _x(B))(1+o(1))\) for any bounded measurable set \(B\subset T\Omega ^{x,x}(M)\). From the weak convergence of \({\tilde{\mu }}_\varepsilon ^{0,0,\mathbb {R}^d}\) to \({\tilde{\mu }}^{0,0,\mathbb {R}^d}\) on \(T\Omega ^{0,0}(\mathbb {R}^d)\) as \(\varepsilon \rightarrow 0\) and since \({\tilde{\mu }}^{0,0,\mathbb {R}^d}({\mathrm d}\theta _x(B))={\tilde{\mu }}^{x,x}(B)\), we conclude that the diffusion loop measures \({\tilde{\mu }}_\varepsilon ^{x,x}\) converge weakly to the loop measure \({\tilde{\mu }}^{x,x}\) on \(T\Omega ^{x,x}(M)\) as \(\varepsilon \rightarrow 0\). \(\square \)

We close with an example and a remark.

Example 5.1

Consider the same setup as in Example 2.7, i.e. \(M=\mathbb {R}^2\) with \(x=0\) fixed and the vector fields \(X_1, X_2\) on \(\mathbb {R}^2\) defined by

$$\begin{aligned} X_1=\frac{\partial }{\partial x^1}+x^1\frac{\partial }{\partial x^2} \quad \text{ and }\quad X_2=x^1\frac{\partial }{\partial x^1} \end{aligned}$$

in Cartesian coordinates \((x^1,x^2)\). We recall that these coordinates are not adapted to the filtration induced by \((X_1,X_2)\) at 0 and we start off by illustrating why this chart is not suitable for our analysis. The unique strong solution \((x_t^\varepsilon )_{t\in [0,1]}=(x_t^{\varepsilon ,1},x_t^{\varepsilon ,2})_{t\in [0,1]}\) of the Stratonovich stochastic differential equation in \(\mathbb {R}^2\)

$$\begin{aligned} \partial x_t^{\varepsilon ,1}&= \sqrt{\varepsilon }\,\partial B_t^1+ \sqrt{\varepsilon }x_t^{\varepsilon ,1}\,\partial B_t^2\\ \partial x_t^{\varepsilon ,2}&= \sqrt{\varepsilon }x_t^{\varepsilon ,1}\,\partial B_t^1 \end{aligned}$$

subject to \(x_0^\varepsilon =0\) is given by

$$\begin{aligned} x_t^\varepsilon = \left( \sqrt{\varepsilon }\int _0^t\mathrm{e}^{\sqrt{\varepsilon }\left( B_t^2-B_s^2\right) }\,\partial B_s^1, \varepsilon \int _0^t\left( \int _0^s\mathrm{e}^{\sqrt{\varepsilon }\left( B_s^2-B_r^2\right) }\,\partial B_r^1\right) \,\partial B_s^1\right) . \end{aligned}$$

Even though the step of the filtration induced by \((X_1,X_2)\) at 0 is \(N=3\), rescaling the stochastic process \((x_t^\varepsilon )_{t\in [0,1]}\) by \(\varepsilon ^{-3/2}\) in any direction leads to a blow-up in the limit \(\varepsilon \rightarrow 0\). Instead, the highest-order rescaled process we can consider is \((\varepsilon ^{-1/2}x_t^{\varepsilon ,1},\varepsilon ^{-1}x_t^{\varepsilon ,2})_{t\in [0,1]}\) whose limiting process, as \(\varepsilon \rightarrow 0\), is characterised by

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\left( \varepsilon ^{-1/2}x_t^{\varepsilon ,1},\varepsilon ^{-1}x_t^{\varepsilon ,2}\right) = \left( B_t^1,\frac{1}{2}\left( B_t^1\right) ^2\right) . \end{aligned}$$

Thus, these rescaled processes localise around a parabola in \(\mathbb {R}^2\). As the Malliavin covariance matrix of \((B_1^1,\frac{1}{2}(B_1^1)^2)\) is degenerate, the Fourier transform argument from Sect. 4 cannot be used. Rather, we first need to apply an additional rescaling along the parabola to recover a non-degenerate limiting process. This is why we choose to work in an adapted chart: it allows us to express the overall rescaling needed as an anisotropic dilation.
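
This localisation is easy to observe numerically. The following Euler–Maruyama sketch simulates the system above in its Itô form, where the Stratonovich-to-Itô conversion adds the drift \(\frac{\varepsilon }{2}(x_t^{\varepsilon ,1},1)\), and compares the rescaled endpoint with the claimed limit; the agreement is up to errors of order \(\sqrt{\varepsilon }\) and the discretisation step.

```python
import numpy as np

rng = np.random.default_rng(5)
eps, n_steps = 1e-4, 20_000
dt = 1.0 / n_steps
se = np.sqrt(eps)
dB = rng.normal(0.0, np.sqrt(dt), size=(n_steps, 2))
B1 = np.cumsum(dB[:, 0])

# Ito form of the Stratonovich system (correction drift (eps/2) * (x1, 1)):
#   dx1 = sqrt(eps) dB^1 + sqrt(eps) x1 dB^2 + (eps/2) x1 dt
#   dx2 = sqrt(eps) x1 dB^1 + (eps/2) dt
x1 = x2 = 0.0
for k in range(n_steps):
    x1, x2 = (x1 + se * dB[k, 0] + se * x1 * dB[k, 1] + 0.5 * eps * x1 * dt,
              x2 + se * x1 * dB[k, 0] + 0.5 * eps * dt)

print(x1 / se, B1[-1])                # eps^{-1/2} x_1^{eps,1}  vs  B_1^1
print(x2 / eps, 0.5 * B1[-1] ** 2)    # eps^{-1}   x_1^{eps,2}  vs  (B_1^1)^2 / 2
```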

Let \(\theta {:}\,\mathbb {R}^2\rightarrow \mathbb {R}^2\) be the same global adapted chart as used in Example 2.7 and let \({\delta }_\varepsilon {:}\,\mathbb {R}^2\rightarrow \mathbb {R}^2\) be the associated anisotropic dilation. We showed that the nilpotent approximations \({\tilde{X}}_1, {\tilde{X}}_2\) of the vector fields \(X_1, X_2\) are

$$\begin{aligned} {\tilde{X}}_1=\frac{\partial }{\partial y^1} \quad \text{ and }\quad {\tilde{X}}_2=-\left( y^1\right) ^2\frac{\partial }{\partial y^2}, \end{aligned}$$

with respect to Cartesian coordinates \((y^1,y^2)\) on the second copy of \(\mathbb {R}^2\). From the convergence result (3.1), it follows that, for all \(t\in [0,1]\),

$$\begin{aligned} {\delta }_\varepsilon ^{-1}\left( \theta (x_t^\varepsilon )\right) \rightarrow \left( B_t^1,-\int _0^t\left( B_s^1\right) ^2\,\partial B_s^2\right) \quad \text{ as }\quad \varepsilon \rightarrow 0. \end{aligned}$$

Since \({\mathrm d}\theta _0{:}\,\mathbb {R}^2\rightarrow \mathbb {R}^2\) is the identity, Theorem 1.1 says that the suitably rescaled fluctuations of the diffusion loop at 0 associated to the stochastic process with generator \(\mathcal {L}=\frac{1}{2}(X_1^2+X_2^2)\) converge weakly to the loop obtained by conditioning \((B_t^1,-\int _0^t(B_s^1)^2\,\partial B_s^2)_{t\in [0,1]}\) to return to 0 at time 1.

Remark 5.2

We show that Theorems 1.1 and 1.2 allow us to make deductions about the \(\sqrt{\varepsilon }\)-rescaled fluctuations of diffusion loops. For the rescaling map \(\tau _\varepsilon {:}\,\Omega ^{x,x}\rightarrow T\Omega ^{0,0}\) given by

$$\begin{aligned} \tau _\varepsilon (\omega )_t=({\mathrm d}\theta _x)^{-1} \left( \varepsilon ^{-1/2}\theta (\omega _t)\right) , \end{aligned}$$

we are interested in the behaviour of the measures \(\mu _\varepsilon ^{x,x}\circ \tau _\varepsilon ^{-1}\) in the limit \(\varepsilon \rightarrow 0\). Let \(e_1,\dots ,e_d\) be the standard basis in \(\mathbb {R}^d\) and define \(\psi {:}\,T\Omega ^{0,0}\rightarrow T\Omega ^{0,0}\) by

$$\begin{aligned} \psi (v)_t=\sum _{i=1}^{d_1} \left\langle {\mathrm d}\theta _x(v_t),e_i\right\rangle \left( {\mathrm d}\theta _x\right) ^{-1}e_i. \end{aligned}$$

The map \(\psi \) takes a path in \(T\Omega ^{0,0}\) and projects it onto the component living in the subspace \(C_1(x)\) of \(T_xM\). Since the maps \(\tau _\varepsilon \) and \(\sigma _\varepsilon \) are related by

$$\begin{aligned} \tau _\varepsilon (\omega )_t=\left( {\mathrm d}\theta _x\right) ^{-1} \left( \varepsilon ^{-1/2}\delta _\varepsilon \left( {\mathrm d}\theta _x\left( \sigma _\varepsilon (\omega )_t \right) \right) \right) \end{aligned}$$

and because \(\varepsilon ^{-1/2}\delta _\varepsilon (y)\) tends to \((y^1,\dots ,y^{d_1},0,\dots ,0)\) as \(\varepsilon \rightarrow 0\), it follows that the \(\sqrt{\varepsilon }\)-rescaled diffusion loop measures \(\mu _\varepsilon ^{x,x}\circ \tau _\varepsilon ^{-1}\) converge weakly to \({\tilde{\mu }}^{x,x}\circ \psi ^{-1}\) on \(T\Omega ^{0,0}\) as \(\varepsilon \rightarrow 0\). Provided \(\mathcal {L}\) is non-elliptic at x, the latter is a degenerate measure, supported on the set of paths \((v_t)_{t\in [0,1]}\) in \(T\Omega ^{0,0}\) with \(v_t\in C_1(x)\) for all \(t\in [0,1]\). Hence, the rescaled diffusion process \((\varepsilon ^{-1/2}\theta (x_t^\varepsilon ))_{t\in [0,1]}\) conditioned by \(\theta (x_1^\varepsilon )=0\) localises around the subspace \((\theta _*C_1)(0)\).

Finally, we demonstrate that the degenerate limiting measure \({\tilde{\mu }}^{x,x}\circ \psi ^{-1}\) need not be Gaussian by considering the limiting diffusion loop from Example 5.1. There, the map \(\psi \) is simply the projection onto the first component, i.e.

$$\begin{aligned} \psi (v)_t= \begin{pmatrix} 1 &{}\quad 0 \\ 0 &{}\quad 0 \end{pmatrix} v_t. \end{aligned}$$

Thus, to show that the measure \({\tilde{\mu }}^{x,x}\circ \psi ^{-1}\) is not Gaussian, we have to analyse the process \((B_t^1,-\int _0^t(B_s^1)^2\,\partial B_s^2)_{t\in [0,1]}\) conditioned to return to 0 at time 1 and show that its first component is not Gaussian. Using the tower property, we first condition on \(B_1^1=0\) to see that this component is equal in law to the process \((B_t^1-tB_1^1)_{t\in [0,1]}\) conditioned by \(\int _0^1(B_s^1-sB_1^1)^2\,\partial B_s^2=0\), where the latter is in fact equivalent to conditioning on \(\int _0^1(B_s^1-sB_1^1)^2\,{\mathrm d}B_s^2=0\). Let \(\mu _{B}\) denote the Brownian bridge measure on \(\Omega (\mathbb {R})^{0,0}=\{\omega \in C([0,1],\mathbb {R}){:}\,\omega _0=\omega _1=0\}\) and let \(\nu \) be the law of \(-\int _0^1(B_s^1-sB_1^1)^2\,{\mathrm d}B_s^2\) on \(\mathbb {R}\). Furthermore, denote the joint law of

$$\begin{aligned} \left( B_t^1-tB_1^1\right) _{t\in [0,1]} \quad \text{ and }\quad -\int _0^1\left( B_s^1-sB_1^1\right) ^2\,{\mathrm d}B_s^2 \end{aligned}$$

on \(\Omega (\mathbb {R})^{0,0}\times \mathbb {R}\) by \(\mu \). Since \(-\int _0^1\omega _s^2\,{\mathrm d}B_s^2\), for \(\omega \in \Omega (\mathbb {R})^{0,0}\) fixed, is a normal random variable with mean zero and variance \(\int _0^1\omega _s^4\,{\mathrm d}s\), we obtain that

$$\begin{aligned} \mu ({\mathrm d}\omega ,{\mathrm d}y)= \frac{1}{\sqrt{2\pi }\sigma (\omega )} \mathrm{e}^{-\frac{y^2}{2\sigma ^2(\omega )}}\mu _B({\mathrm d}\omega )\,{\mathrm d}y \quad \text{ with }\quad \sigma (\omega )=\left( \int _0^1\omega _s^4\,{\mathrm d}s\right) ^{1/2}. \end{aligned}$$
(5.5)

On the other hand, we can disintegrate \(\mu \) as

$$\begin{aligned} \mu ({\mathrm d}\omega ,{\mathrm d}y)=\mu _B^y({\mathrm d}\omega )\nu ({\mathrm d}y), \end{aligned}$$

where \(\mu _B^y\) is the law of \((B_t^1-tB_1^1)_{t\in [0,1]}\) conditioned by \(-\int _0^1(B_s^1-sB_1^1)^2\,{\mathrm d}B_s^2=y\), i.e. we are interested in the measure \(\mu _B^0\). From (5.5), it follows that

$$\begin{aligned} \mu _B^0({\mathrm d}\omega )\propto \sigma ^{-1}(\omega )\mu _B({\mathrm d}\omega )= \left( \int _0^1\omega _s^4\,{\mathrm d}s\right) ^{-1/2}\mu _B({\mathrm d}\omega ). \end{aligned}$$

This shows that \(\mu _B^0\) is not Gaussian, which implies that the \(\sqrt{\varepsilon }\)-rescaled fluctuations indeed admit a non-Gaussian limiting diffusion loop. \(\square \)
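
Numerically, the non-Gaussianity of \(\mu _B^0\) can be detected by importance sampling: draw Brownian bridge paths under \(\mu _B\), attach the weights \(\sigma ^{-1}(\omega )\) from the density above, and estimate the excess kurtosis of the marginal \(\omega _{1/2}\) under the reweighted measure, which would vanish if that marginal were Gaussian. A minimal sketch of this check:

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, n_steps = 200_000, 200
dB = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)
t = np.arange(1, n_steps + 1) / n_steps
bridge = B - t * B[:, -1:]                      # samples from mu_B

sigma = np.sqrt(np.mean(bridge ** 4, axis=1))   # sigma(w) = (int_0^1 w_s^4 ds)^{1/2}
w = 1.0 / sigma                                 # density of mu_B^0 w.r.t. mu_B
w /= w.sum()

x = bridge[:, n_steps // 2 - 1]                 # marginal of mu_B^0 at t = 1/2
m2, m4 = np.sum(w * x ** 2), np.sum(w * x ** 4)
print(m4 / m2 ** 2 - 3.0)                       # excess kurtosis; 0 for a Gaussian
```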