1 Introduction

A subRiemannian manifold is a triplet \((M,\Delta , g_0)\), where M is a connected, smooth manifold of dimension \(n\in \mathbb {N}\), \(\Delta \) denotes a subbundle of TM that bracket-generates TM, and \(g_0\) is a positive definite, smooth bilinear form on \(\Delta \); see for instance [66]. As in the Riemannian setting, one endows \((M,\Delta , g_0)\) with a metric space structure by defining the Carnot–Caratheodory (CC) control distance: for any pair \(x,y\in M\) set

$$\begin{aligned} d_0(x,y)= & {} \inf \{\delta >0 \text { such that there exists a curve } \gamma \in C^\infty ([0,1]; M) \\&\quad \text { with endpoints }x,y \text { such that }\dot{\gamma }\in \Delta (\gamma ) \text { and }|\dot{\gamma }|_{g_0}\le \delta \}. \end{aligned}$$

Curves whose velocity vector lies in \(\Delta \) are called horizontal; their length is defined in the obvious way. SubRiemannian metrics can also be defined by prescribing a smooth distribution of vector fields \(X=(X_1,\ldots ,X_m)\) in an open set \(\Omega \subset \mathbb {R}^n\), orthonormal with respect to \(g_0\), and satisfying the Hörmander finite rank condition

$$\begin{aligned} rank \,\, Lie (X_1,\ldots ,X_m)(x)=n, \quad \forall x\in \Omega . \end{aligned}$$
(1.1)

When attempting to extend known Riemannian results to the subRiemannian setting, one is naturally led to approximate the subRiemannian metric (and the associated distance function \(d_0(\cdot , \cdot ) \)) with a one-parameter family of degenerating Riemannian metrics (associated to distance functions \(d_\epsilon (\cdot , \cdot )\)), which converge in the Gromov–Hausdorff sense as \(\epsilon \rightarrow 0\) to the original one. This approximation is described in detail from the point of view of the distance functions in Sect. 2.2 and from the point of view of the Riemannian setting in Definition 3.7. The approximating distance functions \(d_\epsilon \) can be defined in terms of an extended generating frame of smooth vector fields \(X_1^\epsilon ,\ldots ,X_p^\epsilon \), with \(p\ge n\) and \(X_i^\epsilon =X_i\) for \(i=1,\ldots ,m\), that converges/collapses uniformly on compact sets to the original family \(X_1,\ldots ,X_m\) as \(\epsilon \rightarrow 0\). This frame includes all the higher order commutators needed to bracket-generate the tangent space. When coupled with uniform estimates, this method provides a strategy to extend known Riemannian results to the subRiemannian setting. Such approximations have been widely used since the mid-1980s in a variety of contexts. As examples we recall the work of Debiard [33], Koranyi [55, 56], Ge [45], Rumin [77], as well as the references in [67] and [68]. More recently this technique has been used in the study of minimal surfaces and mean curvature flow in the Heisenberg group, starting from the existence theorems of Pauls [71] and Cheng et al. [24], up to the regularity results by Manfredini and the authors [15, 16]. Our work is largely inspired by the results of Manfredini and one of us [27], where the Nagel et al. estimates for the fundamental solution of subLaplacians have been extended to the Riemannian approximants uniformly as \(\epsilon \rightarrow 0\).
In the following we list in more detail the nature of the stability estimates we investigate. Given a Riemannian manifold \((M^n,g)\), with a Riemannian smooth volume form expressed in local coordinates \((x_1,\ldots ,x_n)\) as \(d\ vol= \sqrt{g} dx_1 \ldots dx_n\), one can consider the corresponding heat operator acting on functions \(u:M\rightarrow \mathbb {R}\),

$$\begin{aligned} L_gu = \partial _t u - \frac{1}{\sqrt{g}} \sum _{i,j=1}^n \partial _i \left( \sqrt{g} g^{ij} \partial _j u\right) . \end{aligned}$$

The study of such operators is closely related to certain geometric and analytic estimates, namely: for \(K\subset \subset M\) and \(r_0>0\) there exist positive constants \(C_D,C_P,\ldots \) below, depending on \(K,r_0,g\), such that for all \(x\in K\) and \(0<r<r_0\) one has

  • (Doubling property)

    $$\begin{aligned} vol (B(x,r)) \ge C_D vol (B(x,2r)); \end{aligned}$$
    (1.2)
  • (Poincaré inequality) \(\int _{B(x,r)} |u-u_{B(x,r)}| dvol \le C_P r \int _{B(x,2r)} |\nabla _g u| d vol\);

  • (Gaussian estimates) If \(h_g\) denotes the heat kernel of \(L_g,\,x,y\in M\) and \(t>0\) one has

    $$\begin{aligned}&C_g^{-1} (vol(B(x,\sqrt{t})))^{-n/2} \exp \left( A_g \frac{ d(x,y)^2 }{t}\right) \nonumber \\&\quad \le h_g(x,y,t) \le C_g (vol(B(x,\sqrt{t})))^{-n/2} \exp \left( B_g \frac{ d(x,y)^2 }{t}\right) \end{aligned}$$
    (1.3)

    and if appropriate curvature conditions hold

    $$\begin{aligned} |\partial _t^s \partial _{i_1} \cdots \partial _{i_k} h_g(x,y,t,s)|\le C_{s,k,g} t^{-s-\frac{k}{2}} (vol(B(x,t-s)))^{-n/2} \exp \left( B_g \frac{ d(x,y)^2 }{t{-}s}\right) ; \end{aligned}$$
    (1.4)
  • (Parabolic Harnack inequality) If \(L_gu=0\) in \(Q=M\times (0,T)\) and \(u\ge 0\) then

    $$\begin{aligned} \sup _{B(x,r)\times (t-r^2, t-r^2/2)} u \le C_g \inf _{B(x,r)\times (t+r^2/2, t+r^2)} u. \end{aligned}$$
    (1.5)

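The connection between such estimates was made evident in the work of Saloff-Coste [78] and Grigoryan [46], who independently established that the doubling property (1.2) together with the Poincaré inequality is equivalent to the Gaussian estimates (1.3), and also to the parabolic Harnack inequality (1.5).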

See also related works by Biroli and Mosco [6], and Sturm [80].

This paper aims at describing the behavior of such estimates along a sequence of metrics \(g_\epsilon \) that collapse to a subRiemannian structure as \(\epsilon \rightarrow 0\). We will prove that the estimates are stable as \(\epsilon \rightarrow 0\) and explore some of the consequences of this stability. Although, thanks to the work of Jerison [52], Nagel et al. [70] and Jerison and Sanchez-Calle [53], the Poincaré inequality, the doubling property and the Gaussian bounds are well known for subRiemannian structures, it is not immediate that they continue to hold uniformly in the approximation as \(\epsilon \rightarrow 0\). For one thing, the Riemannian curvature tensor is unbounded as \(\epsilon \rightarrow 0\), thus preventing the use of Li–Yau's estimates. Moreover, as \(\epsilon \rightarrow 0\) the Hausdorff dimension of the metric spaces \((M,d_\epsilon )\), where \(d_\epsilon \) denotes the distance function associated to \(g_\epsilon \), typically does not remain constant and in fact increases at \(\epsilon =0\) to the homogeneous dimension associated to the subRiemannian structure. The term multiscale from the title reflects the fact that the blow-up of the metric as \(\epsilon \rightarrow 0\) is Riemannian at scales less than \(\epsilon \) and subRiemannian at larger scales.

To illustrate our work we introduce a prototype for the class of spaces we investigate. Consider the manifold \(M=\mathbb {R}^2\times S^1\), with coordinates \((x_1,x_2,\theta )\). The horizontal distribution is given by

$$\begin{aligned} \Delta =span\{ X_1,X_2\}, \text { with }X_1=\cos \theta \partial _{x_1}+\sin \theta \partial _{x_2}, \text { and } X_2=\partial _\theta . \end{aligned}$$

The subRiemannian metric \(g_0\) is defined so that \(X_1\) and \(X_2\) form an orthonormal basis. This is the group of Euclidean isometries defined below in Example 2.1. For each \(\epsilon >0\) we also consider the Riemannian metric \(g_\epsilon \) on M uniquely defined by the requirement that \(X_1,X_2, \epsilon X_3\) is an orthonormal basis, with \(X_3=-\sin \theta \partial _{x_1}+\cos \theta \partial _{x_2}\). Denote by \(d_\epsilon \) the corresponding Riemannian distance, by \(X_i^*\) the adjoint of \(X_i\) with respect to Lebesgue measure and by \(\Gamma _\epsilon \) the fundamental solution of the Laplace–Beltrami operator \(L_\epsilon =\sum _{i=1}^3 X_i^* X_i\). Since \(L_\epsilon \) is uniformly elliptic, there exist \(C_\epsilon ,R_\epsilon >0\) such that for \(d_\epsilon (x,y)<R_\epsilon \) the fundamental solution satisfies

$$\begin{aligned} C_\epsilon ^{-1} d_\epsilon (x,y)^{-1} \le \Gamma _\epsilon (x,y) \le C_\epsilon d_\epsilon (x,y)^{-1}. \end{aligned}$$

As \(\epsilon \rightarrow 0\) this estimate will degenerate in the following way: \(R_\epsilon \rightarrow 0,\,C_\epsilon \rightarrow \infty \) and for \(\epsilon =0\) one will eventually have

$$\begin{aligned} \Gamma _0(x,y)\approx d_0(x,y)^{-2}. \end{aligned}$$

As a result of the work in [70] one has that for each \(\epsilon >0\) there exists \(C_\epsilon >0\) such that

$$\begin{aligned} C_\epsilon ^{-1} \frac{d_\epsilon ^2(x,y)}{|B_\epsilon (x,d_\epsilon (x,y))|} \le \Gamma _\epsilon (x,y) \le C_\epsilon \frac{d_\epsilon ^2(x,y)}{|B_\epsilon (x,d_\epsilon (x,y))|}. \end{aligned}$$

The main result of [27] was to provide stable bounds for the fundamental solution by proving that one can choose \(C_\epsilon \) independent of \(\epsilon \) as \(\epsilon \rightarrow 0\). In this paper we extend such stable bounds to the degenerate parabolic setting and to the more general subRiemannian setting.
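As a heuristic illustration of how a single estimate of this form can interpolate between the two regimes, suppose that the Lebesgue measure of the balls in this three-dimensional, step-two prototype behaves like \(|B_\epsilon (x,r)|\approx r^3(\epsilon +r)\) for small r. This is an assumption made only for the sake of the example, not a statement taken from [27] or [70]. Then

$$\begin{aligned} \frac{d_\epsilon ^2(x,y)}{|B_\epsilon (x,d_\epsilon (x,y))|}\approx \frac{1}{d_\epsilon (x,y)\,(\epsilon +d_\epsilon (x,y))} \approx \left\{ \begin{array}{ll} \epsilon ^{-1}\, d_\epsilon (x,y)^{-1} &\quad \text { if } d_\epsilon (x,y)\le \epsilon , \\ d_\epsilon (x,y)^{-2} &\quad \text { if } d_\epsilon (x,y)\ge \epsilon . \end{array} \right. \end{aligned}$$

Under this assumption the Riemannian behavior \(d_\epsilon ^{-1}\) is visible only at scales below \(\epsilon \), with a constant of order \(\epsilon ^{-1}\), while the subRiemannian behavior \(d_\epsilon ^{-2}\) takes over at larger scales. This is consistent with the degeneration \(R_\epsilon \rightarrow 0\), \(C_\epsilon \rightarrow \infty \) of the purely Riemannian estimate.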

Since our results will be local in nature, unless explicitly stated we will always assume that \(M=\mathbb {R}^n\) and use as volume the Lebesgue measure. The first result we present is due to Rea and the authors [18] and concerns stability of the doubling property.

Theorem 1.1

For every \(\epsilon _0>0\), and \(K\subset \subset \mathbb {R}^n\) there exist constants \(R,C>0\) depending on \(K, \epsilon _0\) and on the subRiemannian structure, such that for every \(\epsilon \in [0,\epsilon _0],\,x\in K\) and \(0<r<R\),

$$\begin{aligned} |B_\epsilon (x,2r)| \le C |B_\epsilon (x,r)|. \end{aligned}$$

Here we have denoted by \(B_\epsilon \) the balls related to the \(d_\epsilon \) distance function.
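To see in a toy computation what uniformity in \(\epsilon \) means here, the following Python sketch evaluates the doubling ratio for a hypothetical model volume \(V(\epsilon ,r)=r^3(\epsilon +r)\). The model, which is meant to mimic a three-dimensional step-two structure, and the names `model_volume`, `doubling_ratio` and `worst_ratio` are assumptions introduced only for illustration; they are not taken from [18].

```python
def model_volume(eps: float, r: float) -> float:
    """Hypothetical model for |B_eps(x, r)|: Riemannian (eps * r^3) at
    scales r << eps, subRiemannian (r^4) at scales r >> eps."""
    return r**3 * (eps + r)


def doubling_ratio(eps: float, r: float) -> float:
    """Ratio |B_eps(x, 2r)| / |B_eps(x, r)| for the model volume."""
    return model_volume(eps, 2.0 * r) / model_volume(eps, r)


def worst_ratio(eps_values, r_values) -> float:
    """Largest doubling ratio over a grid of parameters eps and radii r."""
    return max(doubling_ratio(e, r) for e in eps_values for r in r_values)


if __name__ == "__main__":
    eps_grid = [0.0, 1e-6, 1e-3, 0.1, 1.0]
    r_grid = [10.0 ** (-k) for k in range(1, 8)]
    # The printed value is a doubling constant valid for every eps in the grid.
    print(worst_ratio(eps_grid, r_grid))
```

For this model the ratio equals \(8(\epsilon +2r)/(\epsilon +r)\), which stays between 8 and 16 for every \(\epsilon \ge 0\) and \(r>0\), so a single constant works for the whole family, including the limit \(\epsilon =0\).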

We present here a rather detailed proof of this result, amending some minor gaps in the exposition in [18]. If the subRiemannian structure is equiregular, as an original contribution of this paper, in Theorem 3.10 we also present a quantitative version of this result, by introducing explicit quasi-norms equivalent to \(d_\epsilon \). These families of quasi-norms play a role analogous to the one played by the Koranyi gauge quasi-norm (2.5) in the Heisenberg group. We also sketch the proof of the stability of Jerison's Poincaré inequality from [18].

Theorem 1.2

Let \(K\subset \subset \mathbb {R}^n\) and \(\epsilon _0>0\). The vector fields \((X^\epsilon _i)_{i=1,\ldots ,p}\) satisfy the Poincaré inequality

$$\begin{aligned} \int _{B_\epsilon (x,r)} |u-u_{B_\epsilon (x,r)}| dx \le C_P\int _{B_\epsilon (x,2r)} |\nabla ^\epsilon u| dx \end{aligned}$$

with a constant \(C_P\) depending on \(K, \epsilon _0\) and the subRiemannian structure, but independent of \(\epsilon \). Here we have denoted by \(\nabla ^\epsilon u\) the gradient of u along the frame \(X_1^\epsilon ,\ldots ,X_p^\epsilon \).

Our next result concerns the stability, as \(\epsilon \rightarrow 0\), of the Gaussian estimates for the heat kernels associated to the family of second order, sub-elliptic differential equations in non-divergence form

$$\begin{aligned} L_{\epsilon , A} u\equiv \partial _t u- \sum _{i,j=1}^p a^\epsilon _{ij} X_i^\epsilon X_j^\epsilon u=0 , \end{aligned}$$

in a cylinder \( Q=\Omega \times (0,T)\). Here \(\{a_{ij}^\epsilon \}_{i,j=1,\ldots ,p}\) is a constant real matrix such that

$$\begin{aligned} \frac{1}{2}\Lambda ^{-1} \sum _{i=1}^p \xi _i^2 \le \sum _{i,j=1}^p a_{ij}^\epsilon \xi _i \xi _j \le 2\Lambda \sum _{i=1}^p \xi _i^2, \end{aligned}$$
(1.6)

for all \(\xi \in \mathbb {R}^p\), uniformly in \(\epsilon >0\) and

$$\begin{aligned} \Lambda ^{-1} \sum _{i=1}^m \xi _i^2 \le \sum _{i,j=1}^m a_{ij}^\epsilon \xi _i \xi _j \le \Lambda \sum _{i=1}^m \xi _i^2, \end{aligned}$$
(1.7)

for all \(\xi \in \mathbb {R}^m\) and \(\epsilon >0\).
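A simple instance, given here only as an illustration, is the identity matrix \(a^\epsilon _{ij}=\delta _{ij}\), which satisfies (1.6) and (1.7) with \(\Lambda =1\). In the prototype \(\mathbb {R}^2\times S^1\) of the introduction, with frame \(X_1,X_2,\epsilon X_3\) (so that \(p=3\) and \(m=2\)), the corresponding operator reads

$$\begin{aligned} L_{\epsilon , I}\, u=\partial _t u- X_1^2u-X_2^2u-\epsilon ^2X_3^2u, \end{aligned}$$

which is parabolic for every \(\epsilon >0\) and reduces to the degenerate parabolic operator \(\partial _t -X_1^2-X_2^2\) at \(\epsilon =0\).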

Theorem 1.3

Let \(K\subset \subset \mathbb {R}^n, \Lambda >0\) and \(\epsilon _0>0\). The fundamental solution \(\Gamma _{\epsilon , A}\) of the operator \(L_{\epsilon , A}\) is a kernel with exponential decay of order 2, uniform with respect to \(\epsilon \in [0,\epsilon _0]\) and to any coefficient matrix A satisfying the bounds above for the fixed \(\Lambda >0\). In particular, the following estimates hold:

  • For every \(K\subset \subset \Omega \) there exists a constant \(C_\Lambda >0\) depending on \(\Lambda \) but independent of \(\epsilon \in [0,\epsilon _0]\), and of the matrix A such that for each \(\epsilon \in [0,\epsilon _0],\,x,y\in K\) and \(t>0\) one has

    $$\begin{aligned} C_\Lambda ^{-1} \frac{e^{-C_\Lambda \frac{d_\epsilon (x,y)^2}{t}}}{|B_\epsilon (x, \sqrt{t})|}\le P_{\epsilon , A^\epsilon }(x,y,t)\le C_\Lambda \frac{ e^{-\frac{d_\epsilon (x,y)^2}{C_\Lambda t}}}{|B_\epsilon (x, \sqrt{t})|}. \end{aligned}$$
    (1.8)
  • For \(s\in \mathbb {N}\) and k-tuple \((i_1,\ldots ,i_k)\in \{1,\ldots ,m\}^k\) there exists a constant \(C_{s,k}>0\) depending only on \(k,s,X_1,\ldots ,X_m,\Lambda \) such that

    $$\begin{aligned} \left| (\partial _t^s X_{i_1}\cdots X_{i_k} P_{\epsilon , A^\epsilon })(x,y,t)\right| \le C_{s,k} \frac{t^\frac{-2s-k}{2} e^{-\frac{d_\epsilon (x,y)^2}{C_\Lambda t}}}{|B_\epsilon (x, \sqrt{t})|} \end{aligned}$$
    (1.9)

    for all \(x,y\in K\) and \(t>0\).

  • For any \(A_1,A_2\in M_\Lambda ,\,s\in \mathbb {N}\) and k-tuple \((i_1,\ldots ,i_k)\in \{1,\ldots ,m\}^k\) there exists \(C_{s,k}>0\) depending only on \(k,s,X_1,\ldots ,X_m, \Lambda \) such that

    $$\begin{aligned}&|(\partial _t^s X_{i_1}\cdots X_{i_k} P_{\epsilon , A_1})(x,y,t) - (\partial _t^s X_{i_1}\cdots X_{i_k} P_{\epsilon , A_2})(x,y, t) |\nonumber \\&\quad \le ||A_1 - A_2||C_{s,k} \frac{t^\frac{h-2s-k}{2} e^{-\frac{d_\epsilon (x,y)^2}{C_\Lambda t}}}{|B_\epsilon (x, \sqrt{t})|}, \end{aligned}$$
    (1.10)

    where \(||A||^2:=\sum _{i,j=1}^n a_{ij}^2\).

Moreover, if \(\Gamma _{A^0}\) denotes the fundamental solution of the operator \(L_{A^0}=\partial _t-\sum _{i,j=1}^m a_{ij}^0X_iX_j\), then one has

$$\begin{aligned} {X}^\epsilon _{i_1}\cdots { X}^\epsilon _{i_k} \partial _t^s \Gamma _{\epsilon , A^\epsilon }\rightarrow {X}_{i_1}\cdots {X}_{i_k}\partial _t^s \Gamma _{A^0} \end{aligned}$$
(1.11)

as \(\epsilon \rightarrow 0\) uniformly on compact sets and in a dominated way on subcompacts of \(\Omega \).

This theorem extends to the general Hörmander vector fields setting analogous results proved by Manfredini and the authors in [17], in the setting of Carnot groups.

In a similar fashion, one of our main results in this paper is the extension to the Hörmander vector fields setting of the Carnot group Schauder estimates established in previous work with Manfredini in [17]. To prove such an extension we combine the Gaussian bounds above with a refined version of the Rothschild and Stein [76] freezing and lifting scheme, adapted to the multi-scale setting, to establish Schauder type estimates which are uniform in \(\epsilon \in [0,\epsilon _0]\), for the family of second order, sub-elliptic differential equations in non-divergence form

$$\begin{aligned} L_{\epsilon , A^\epsilon } u\equiv \partial _t u- \sum _{i,j=1}^n a^\epsilon _{ij}(x,t) X_i^\epsilon X_j^\epsilon u=0, \end{aligned}$$
(1.12)

in a cylinder \( Q=\Omega \times (0,T)\). Our standing assumption is that the coefficients of the operator satisfy (1.6), and (1.7) for some fixed \(\Lambda >0\).

Theorem 1.4

Let \(\alpha \in (0,1), f\in C^{\infty }(Q)\) and w be a smooth solution of \(L_{\epsilon , A^\epsilon }w=f\) on Q. Let K be a compact set such that \(K\subset \subset {Q}\), set \(2\delta =d_0(K, \partial _p Q)\) and denote by \(K_\delta \) the \(\delta \)-tubular neighborhood of K. Assume that there exists a constant \(C>0\) such that

$$\begin{aligned} || a_{ij}^\epsilon ||_{C^{k,\alpha }_{\epsilon ,X}(K_\delta )} \le C, \end{aligned}$$

for some value \(k\in \mathbb {N}\) and for every \(\epsilon \in [0,\epsilon _0]\). There exists a constant \(C_1>0\) depending on \(\alpha ,\,C,\,\epsilon _0,\,\delta \), and the constants in Proposition 5.2, but independent of \(\epsilon \), such that

$$\begin{aligned} ||w||_{C^{k+2, \alpha }_{\epsilon ,X}(K)} \le C_1 \left( ||f||_{C^{k,\alpha }_{\epsilon ,X}(K_\delta )}+ ||w||_{C^{k+1, \alpha }_{\epsilon ,X}(K_\delta )}\right) . \end{aligned}$$

Here we have set

$$\begin{aligned} ||u||_{C_{\epsilon ,X}^{\alpha }({Q})}=\sup _{(x,t)\ne (x_{0},t_0)} \frac{|u(x,t) - u(x_{0},t_0)|}{\tilde{d}_{\epsilon }^\alpha ((x,t), (x_{0},t_0))}+ \sup _{Q} |u|. \end{aligned}$$

Moreover, if \(k\ge 1\) we let \(u \in C_{\epsilon ,X}^{k,\alpha }({Q})\) if for all \(i=1,\ldots ,m\) one has \(X_i u \in C_{\epsilon ,X}^{k-1,\alpha }({Q})\).

Analogous estimates in the \(L^p\) spaces, for operators independent of \(\epsilon \), are well known (see for instance [76] for the constant coefficient case and [9] for the Carnot group setting). Our result yields a stable version, as \(\epsilon \rightarrow 0\), of such estimates, which is valid for any family of Hörmander vector fields.

Theorem 1.5

Let \(\alpha \in (0,1), f\in C^{\infty }(Q)\) and w be a smooth solution of \(L_{\epsilon , A}w=f\) on Q. Let K be a compact set such that \(K\subset \subset {Q}\), set \(2\delta =d_0(K, \partial _p Q)\) and denote by \(K_\delta \) the \(\delta \)-tubular neighborhood of K. Assume that there exists a constant \(C>0\) such that

$$\begin{aligned} || a_{ij}^\epsilon ||_{C^{k,\alpha }_{\epsilon ,X}(K_\delta )} \le C, \end{aligned}$$

for some value \(k\in \mathbb {N}\) and for every \(\epsilon \in [0,\epsilon _0]\). For any \(p>1\), there exists a constant \(C_1>0\) depending on \(p, \alpha ,\,C,\,\epsilon _0,\,\delta \), and the constants in Proposition 5.2, but independent of \(\epsilon \), such that

$$\begin{aligned} ||w||_{W^{k+2, p}_{\epsilon ,X}(K)} \le C_1 \left( ||f||_{W^{k,p}_{\epsilon ,X}(K_\delta )}+ ||w||_{W^{k+1, p}_{\epsilon ,X}(K_\delta )}\right) . \end{aligned}$$

Here we have set

$$\begin{aligned} ||w||_{W^{k,p}_{\epsilon ,X}} :=\sum _{j=1}^k \sum _{| I |=j} ||X^\epsilon _{i_1} \ldots X^\epsilon _{i_j} w||_{L^p}, \quad I=(i_1,\ldots ,i_j). \end{aligned}$$

We conclude the paper with two related groups of applications of our stability results. In the first we recall the notion of p-admissible structure (Definition 7.1), originally introduced by Hajlasz and Koskela in [48]. This class of spaces supports a rich analytic structure and allows for the development of a first-order (in the sense of derivatives up to order one) potential theory. We review some recent results by the authors and collaborators [1, 18] concerning Harnack inequalities for weak solutions of classes of quasilinear degenerate parabolic PDE in such spaces. The main point of the section is that, in view of Theorems 1.1 and 1.2, the Riemannian approximations of a subRiemannian structure satisfy the hypotheses of a p-admissible structure uniformly in \(\epsilon \ge 0\). Consequently, the Harnack inequalities hold uniformly across all scales. This provides a powerful technique in the study of degenerate elliptic and parabolic problems through the process of regularization and approximation. To exemplify this observation in a simple case, we consider approximating Riemannian metrics \(g_\epsilon \) with generating frame \(X_1^\epsilon ,\ldots ,X_n^\epsilon \) defined in an open set \(\Omega \subset \mathbb {R}^n\), and a family of divergence form parabolic linear equations analogous to (1.12), i.e.

$$\begin{aligned} L_{\epsilon , A} u\equiv \partial _t u- \sum _{i,j=1}^n X_i^{\epsilon , *}( a^\epsilon _{ij}(u,x,t) X_j^\epsilon u)=0, \end{aligned}$$
(1.13)

in a cylinder \( Q=\Omega \times (0,T)\). We assume that the coefficients of the operator depend smoothly on u and satisfy (1.6) and (1.7) for some fixed \(\Lambda >0\). Thanks to the stability estimates one can prove that there exist positive constants \(C,R>0\) depending only on the fixed subRiemannian structure, but independent of \(\epsilon \), such that for any \(\epsilon \ge 0\) and any non-negative weak solution \(u\ge 0\) of (1.13), one has

$$\begin{aligned} \sup _{B_\epsilon (x,r)} u \le C \inf _{B_\epsilon (x,r)} u \end{aligned}$$

for any metric ball \(B_\epsilon (x,4r)\subset \Omega \) and \(0<r<R\). Clearly this yields Hölder regularity for u that is stable as \(\epsilon \rightarrow 0\). Applying the Schauder estimates from Theorem 1.4 one obtains higher order regularity, uniformly in \(\epsilon \rightarrow 0 \) and so in particular we obtain smoothness of solutions in the case \(\epsilon =0\). For further details and for a more general version of this result, applied to weak solutions of quasilinear equations, we refer the reader to Theorem 7.5.

In the last section we discuss one of the motivating applications of our work. We outline how the structure stability results, the stability of the Schauder estimates and of the Harnack inequalities can be used to prove regularity and long time existence theorems for solutions of the subRiemannian mean curvature flow and the total curvature flow of graphs over bounded sets in step 2 Carnot groups and even in some non-nilpotent Lie groups. This is part of the work developed by the authors jointly with Maria Manfredini in [14, 17]. The notion of horizontal, or p-mean, curvature has arisen in the last 10 years thanks to the work of many researchers. The two main motivations are Pansu's conjecture, concerning the isoperimetric profile of the Heisenberg group [19, 26, 43, 49, 51, 69, 73–75], and the existence, regularity and uniqueness of minimal surfaces [20–25, 31, 32, 50, 71] and [72]. The mean curvature flow and the total curvature flow arise in connection with gradient descent for the perimeter functional and as such can be used for both applications. Very little is known about either flow in the subRiemannian setting and, as far as we know, the results in [14, 17] are the first to establish existence of long time smooth flows. For other contributions to this topic, from different points of view, we recall the recent work in [35, 36].

2 Definitions and preliminary results

Let \(X=(X_1,\ldots ,X_m)\) denote a collection of smooth vector fields defined in an open subset \(\Omega \subset \mathbb {R}^n\) satisfying Hörmander’s finite rank condition (1.1), that is there exists an integer s such that the set of all vector fields, along with their commutators up to order s spans \(\mathbb {R}^n\) for every point in \(\Omega \),

$$\begin{aligned} rank \,\, Lie (X_1,\ldots ,X_m)(x)=n, \quad \text { for all } x\in \Omega . \end{aligned}$$
(2.1)

Example 2.1

The standard example for such families is the Heisenberg group \({\mathbb {H}}^1\). This is a Lie group whose underlying manifold is \(\mathbb {R}^3\) and is endowed with a group law \((x_1,x_2,x_3)(y_1,y_2,y_3)=(x_1+y_1, x_2+y_2, x_3+y_3-(x_2y_1-x_1y_2))\). With respect to such law one has that the vector fields \(X_1=\partial _{x_1}-x_2 \partial _{x_3}\) and \(X_2=\partial _{x_2}+x_1\partial _{x_3}\) are left-invariant. Together with their commutator \([X_1,X_2]=2\partial _{x_3}\) they yield a basis of \(\mathbb {R}^3\). A second example is given by the classical group of rigid motions of the plane, also known as the roto-translation group \({\mathcal {RT}}\). This is a Lie group with underlying manifold \(\mathbb {R}^2\times S^1\) and a group law \((x_1,x_2,\theta _1)(y_1,y_2,\theta _2)=(x_1+y_1\cos \theta _1 -y_2\sin \theta _1, x_2+y_1\sin \theta _1 +y_2\cos \theta _1, \theta _1+\theta _2)\).
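The commutator in the Heisenberg example can be checked directly. Writing the fields in components, \(X_1=(1,0,-x_2)\) and \(X_2=(0,1,x_1)\), only the \(\partial _{x_3}\) coefficients are non-constant, so

$$\begin{aligned} {[}X_1,X_2]=\big (X_1(x_1)-X_2(-x_2)\big )\,\partial _{x_3}=(1+1)\,\partial _{x_3}=2\partial _{x_3}. \end{aligned}$$

Since \(X_1,X_2\) and \([X_1,X_2]\) are linearly independent at every point, the rank condition (2.1) holds using commutators of order at most 2.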

Following Nagel, Stein and Wainger, [70, page 104] we define

$$\begin{aligned} X^{(1)}=\left\{ X_1,\ldots ,X_m\right\} , \ X^{(2)}=\left\{ [X_1,X_2],\ldots ,[X_{m-1},X_m]\right\} , \ etc. \ldots \end{aligned}$$
(2.2)

letting \(X^{(k)}\) denote the set of all commutators of order \(k=1,\ldots ,r\). Indicate by \(Y_1,\ldots ,Y_p\) an enumeration of the components of \(X^{(1)}, X^{(2)},\ldots ,X^{(r)}\) such that \(Y_i=X_i\) for every \(i\le m\). If \(Y_k\in X^{(i)}\) we say that \(Y_k\) has a formal degree \(d(Y_k)=d(k) =i\). The collection of vector fields \(\{Y_1,\ldots ,Y_p\}\) spans \(\mathbb {R}^n\) at every point.

Example 2.2

If we consider the Heisenberg group vector fields \(X_1=\partial _{x_1}-x_2 \partial _{x_3}\) and \(X_2=\partial _{x_2}+x_1\partial _{x_3}\) with \((x_1,x_2,x_3)\in \mathbb {R}^3\), then \(X^{(1)}:=\{X_1,X_2\}\) and \(X^{(2)}=\{2\partial _{x_3}\}\). If we instead consider the vectors arising from the group of roto-translations one has \(X_1=\cos \theta \partial _{x_1}+\sin \theta \partial _{x_2}\) and \(X_2=\partial _{\theta }\) with \((x_1,x_2, \theta )\in \mathbb {R}^2\times S^1\) and \(X^{(1)}=\{X_1, X_2\} \) and \(X^{(2)}=\{ \sin \theta \partial _{x_1} - \cos \theta \partial _{x_2}\}\).

Example 2.3

Note that the sets \(X^{(i)}\) may have non-trivial intersection. For instance, consider the vector fields

$$\begin{aligned} X_1=\cos \theta \partial _{x_1}+\sin \theta \partial _{x_2}; \ X_2=\partial _{\theta }; \ X_3=\partial _{x_3}; \text { and }X_4=x_3^2 \partial _{x_4} \end{aligned}$$

in \((x_1,x_2,x_3,x_4,\theta )\in \mathbb {R}^4\times S^1\). In this case \(r=3\) and

$$\begin{aligned} X^{(1)}=\{X_1, X_2, X_3, X_4\}; \ X^{(2)}=\{ \sin \theta \partial _{x_1} - \cos \theta \partial _{x_2}, \ 2x_3\partial _{x_4} \}; \text { and }X^{(3)}= \{\pm X_1, 2\partial _{x_4}\} \end{aligned}$$

with \(Y_1=X_1, \ldots , Y_4= X_4, Y_5=\sin \theta \partial _{x_1}-\cos \theta \partial _{x_2}, Y_6= 2x_3 \partial _{x_4}, Y_7=X_1, Y_8=-X_1,\) and \(Y_{9}=2\partial _{x_4}\).
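The third-order commutators in this example can be verified by a short computation, sketched here with the auxiliary notation \(Z:=[X_1,X_2]=\sin \theta \partial _{x_1}-\cos \theta \partial _{x_2}\), which is introduced only for this check:

$$\begin{aligned} {[}X_2,Z]&=\partial _\theta (\sin \theta )\,\partial _{x_1}-\partial _\theta (\cos \theta )\,\partial _{x_2}=\cos \theta \partial _{x_1}+\sin \theta \partial _{x_2}=X_1, \qquad [Z,X_2]=-X_1,\\ {[}X_3,X_4]&=\partial _{x_3}(x_3^2)\,\partial _{x_4}=2x_3\partial _{x_4}, \qquad [X_3,[X_3,X_4]]=2\partial _{x_4}. \end{aligned}$$

In particular the field \(X_1\) appears both with degree 1 and, up to sign, with degree 3, which is why the sets \(X^{(1)}\) and \(X^{(3)}\) intersect.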

2.1 Carnot–Caratheodory distance

For each \(x, y\in \Omega \) and \(\delta >0\) denote by \(\Gamma (\delta )\) the space of all absolutely continuous curves \(\gamma :[0,1]\rightarrow \mathbb {R}^n\), joining x to y (i.e., \(\gamma (0)=x\) and \(\gamma (1)=y\)) which are tangent a.e. to the horizontal distribution \(span \{X_1,\ldots ,X_m\}\), and such that if we write

$$\begin{aligned} \gamma '(t)=\sum _{i=1}^m \alpha _i(t) X_i|_{\gamma (t)}, \end{aligned}$$

then \(\sum _{i=1}^m |\alpha _i(t)|\le \delta \) a.e. \(t\in [0,1]\). The Carnot–Caratheodory distance between x and y is defined to be

$$\begin{aligned} d_0(x,y):= \inf \{\delta >0 :\ \Gamma (\delta )\ne \emptyset \}. \end{aligned}$$
(2.3)

In [70], the authors introduce several other distances that eventually are proved to be equivalent to \(d_0(x,y)\). The equivalence itself yields new insight into the Carnot–Caratheodory distance. Because of this, we will remind the reader of one of these distances. For each \(x,y\in \Omega \) and \(\delta >0\) denote by \(\hat{\Gamma }(\delta )\) the space of all absolutely continuous curves \(\gamma :[0,1]\rightarrow \mathbb {R}^n\), joining x to y and such that if one writes

$$\begin{aligned} \gamma '(t)=\sum _{i=1}^p\beta _i(t) Y_i|_{\gamma (t)}, \end{aligned}$$

then \(|\beta _i (t)| \le \delta ^{d(i)}.\) One then sets

$$\begin{aligned} \hat{d} (x,y):= \inf \{\delta >0 :\ \hat{\Gamma }(\delta )\ne \emptyset \}. \end{aligned}$$

The following is fairly straightforward to prove (see [70, Proposition 1.1]).

Proposition 2.4

The function \(\hat{d}\) is a distance function in \(\Omega \) and for any \(K\subset \subset \Omega \) there exists \(C=C(X_1,\ldots ,X_m, K)>0\) such that

$$\begin{aligned} C^{-1} |x-y|\le \hat{d}(x,y)\le C |x-y|^{1/\max _i d(i)}. \end{aligned}$$
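As an illustration in the Heisenberg group of Example 2.1, where \(Y_3=[X_1,X_2]=2\partial _{x_3}\) and \(d(3)=2\), consider the points \(x=0\) and \(y=(0,0,h)\). The straight segment \(\gamma (t)=(0,0,th)\) satisfies

$$\begin{aligned} \gamma '(t)=h\,\partial _{x_3}=\frac{h}{2}\,Y_3|_{\gamma (t)}, \quad \text { i.e. } \beta _1=\beta _2=0,\ \beta _3=\frac{h}{2}, \end{aligned}$$

so that \(\gamma \in \hat{\Gamma }(\delta )\) as soon as \(|h|/2\le \delta ^{2}\). Consequently \(\hat{d}(0,y)\le \sqrt{|h|/2}\): a Euclidean displacement of size \(|h|\) in the direction of the commutator costs at most a distance of order \(|h|^{1/2}\).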

It is far less trivial to prove the following (see [70, Theorem 4])

Theorem 2.5

The distance functions \(d_0\) and \(\hat{d}\) are equivalent.

2.2 The approximating distances

There are several possible definitions for Riemannian distance functions which approximate a Carnot–Caratheodory metric in the Gromov–Hausdorff sense.

Definition 2.6

Let \(\{Y_1,\ldots ,Y_p\}\) be a generating family of vector fields constructed as in (2.2) from a family of Hörmander vector fields \(X_1,\ldots ,X_m\). For every \(\epsilon >0\) denote by \(d_\epsilon (\cdot , \cdot )\) the Carnot–Caratheodory metric associated to the family of vector fields \((X_1^\epsilon , \ldots ,X_{p}^\epsilon ),\) defined as

$$\begin{aligned} X_i^\epsilon =\left\{ \begin{array}{lll} Y_i &{}\quad \text { if } i\le m, \\ \epsilon ^{d(i) -1 } Y_i &{} \quad \text { if } m+1 \le i \le p, \\ Y_{i-p+m} &{}\quad \text { if } p+1 \le i \le 2p -m \end{array}. \right. \end{aligned}$$
(2.4)

We will also define an extension of the degree function, setting \(d_\epsilon (i)=1\) for all \(i\le p\), and \(d_\epsilon (i) = d(i-p+m)\) if \(i\ge p+1\). In order to simplify notations we will denote \(X= X^0,\,d_0=d\) and use the same notation for both families of vector fields (dependent or independent of \(\epsilon \)).
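For instance, in the Heisenberg group of Example 2.2 one has \(m=2\), \(p=3\), \(Y_3=2\partial _{x_3}\) and \(d(3)=2\), so that (2.4) produces the \(2p-m=4\) vector fields

$$\begin{aligned} X_1^\epsilon =X_1,\quad X_2^\epsilon =X_2,\quad X_3^\epsilon =\epsilon Y_3=2\epsilon \partial _{x_3},\quad X_4^\epsilon =Y_3=2\partial _{x_3}, \end{aligned}$$

with degrees \(d_\epsilon (1)=d_\epsilon (2)=d_\epsilon (3)=1\) and \(d_\epsilon (4)=2\). For \(\epsilon >0\) the first three fields already span \(\mathbb {R}^3\) at every point, while as \(\epsilon \rightarrow 0\) the field \(X_3^\epsilon \) vanishes and the vertical direction is reached only through the degree 2 field \(X_4^\epsilon \).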

Note that for every \(\epsilon \in (0,\bar{\epsilon })\) the set \(\{X_i^\epsilon \}\) extends the original family of vector fields \((X_i)\) to a new family of vector fields satisfying assumption (I) on page 107 of [70], i.e. there exist smooth functions \(c_{jk}^l\), depending on \(\epsilon \), such that

$$\begin{aligned}{}[X^\epsilon _j,X^\epsilon _k] =\sum _{d_\epsilon (l)\le d_\epsilon (j) + d_\epsilon (k)} c_{jk}^l {X_l^\epsilon } \end{aligned}$$

and

$$\begin{aligned} \{X^\epsilon _j\}_{j=1}^{2p-m} \text { span } \mathbb {R}^n \text { at every point }. \end{aligned}$$

Remark 2.7

Note that the coefficients \(c_{jk}^l\) will be unbounded as \(\epsilon \rightarrow 0\). In principle this could be a problem, as the doubling constant in the proof in [70] depends indirectly on the \(C^r\) norm of these functions. In this survey we will describe a result, originally proved in [18], showing that this is not the case.

Remark 2.8

It follows immediately from the definition that for fixed \(x,y\in \Omega \) the function \(d_\epsilon (x,y)\) is decreasing in \(\epsilon \) and for every \(\epsilon \in (0,\bar{\epsilon })\),

$$\begin{aligned} d_0(x,y)\ge d_\epsilon (x,y) \end{aligned}$$
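The monotonicity can be verified from (2.4) as follows; this is a short sketch that uses only the definition of the Carnot–Caratheodory distance in Sect. 2.1 applied to the frame \(X_1^\epsilon ,\ldots ,X_p^\epsilon \). If \(0<\epsilon _1<\epsilon _2\) and \(\gamma \) is admissible for \(d_{\epsilon _1}\) with coefficients \(\alpha _i\), then

$$\begin{aligned} \gamma '=\sum _{i=1}^p \alpha _i X_i^{\epsilon _1}=\sum _{i=1}^p \alpha _i \left( \frac{\epsilon _1}{\epsilon _2}\right) ^{d(i)-1} X_i^{\epsilon _2}, \qquad \left| \alpha _i \left( \frac{\epsilon _1}{\epsilon _2}\right) ^{d(i)-1}\right| \le |\alpha _i|, \end{aligned}$$

so \(\gamma \) is admissible for \(d_{\epsilon _2}\) with the same \(\delta \), and \(d_{\epsilon _2}(x,y)\le d_{\epsilon _1}(x,y)\). Similarly, a horizontal curve admissible for \(d_0\) is admissible for \(d_\epsilon \) by taking \(\alpha _i=0\) for \(i>m\), which gives the inequality above.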

Remark 2.9

Let us consider a special case where \(\dim \text { span }(X_1,\ldots ,X_m)\) is constant and the vector fields \(X_1,\ldots ,X_p\) are chosen to be linearly independent in \(\Omega \). In this case we can consider two positive definite symmetric quadratic forms \(g_0\) and \( \lambda \) defined respectively on the distribution \(H(x)=\text { span }(X_1,\ldots ,X_m)(x)\), for \(x\in \Omega \), and on \(H^\perp (x)\). The product metric \(g_0\oplus \lambda \) is then a Riemannian metric on all of \(T \Omega \). The form \(g_0\) is called a subRiemannian metric on \(\Omega \), corresponding to H. Next, for every \(\epsilon \in (0,\bar{\epsilon }]\) consider the rescaled metric \(g_\epsilon :=g_0\oplus \epsilon ^{-1}\lambda \) and the corresponding Riemannian distance function \(d_\epsilon \) in \(\Omega \). The latter is bi-Lipschitz equivalent to the distance \(d_\epsilon \) defined above. In [45, Theorem 1.1] Ge proved that, as metric spaces, the sequence \((\Omega , d_\epsilon )\) converges to \((\Omega , d_0)\) as \(\epsilon \rightarrow 0\) in the sense of Gromov–Hausdorff. In this limit the Hausdorff dimension of the space degenerates from coinciding with the topological dimension, for \(\epsilon >0\), to a value \(Q>n\) which may change from open set to open set. We will go into more detail about this point in the next section. In this sense the limiting approximation scheme we are using can be described as the collapse of a family of Riemannian metrics to a subRiemannian metric. See also [68, Theorem 1.2.1] for yet another related Riemannian approximation scheme.

Remark 2.10

From a different perspective, note that the subLaplacian associated to the family \(X_1^\epsilon ,\ldots ,X_p^\epsilon \), i.e. \({\mathcal {L}}u=\sum _{i=1}^p X_i^{\epsilon ,2}u\), is an elliptic operator for all \(\epsilon >0\), degenerating to a subelliptic operator for \(\epsilon =0\).

2.3 A special case: the Heisenberg group \({\mathbb {H}}^1\)

In this section we describe the behavior of the distance \(d_\epsilon \) (and of the corresponding metric balls \(B_\epsilon (x,r)\)) as \(\epsilon \rightarrow 0\), by looking at the special case of the Heisenberg group. In this setting we will also provide an elementary argument showing that the doubling property holds uniformly as \(\epsilon \rightarrow 0\).

Consider the vector fields from Example 2.1 \(X_1=\partial _{x_1}-x_2 \partial _{x_3},\) \(X_2=\partial _{x_2}+x_1\partial _{x_3}\) and \(X_3=\partial _{x_3}\) with \((x_1,x_2,x_3)\in \mathbb {R}^3\). The Carnot–Carathéodory distance \(d_0\) associated to the subRiemannian metric defined by the orthonormal frame \(X_1,X_2\) is equivalent to a more explicitly defined pseudo-distance function, that we call gauge distance, defined as

$$\begin{aligned} |x|^{4}=\left( x_1^2+x_2^2\right) ^2+x_3^2, \text { and } \rho (x,y)=|y^{-1}x|, \end{aligned}$$
(2.5)

where \(y^{-1}=(-y_1,-y_2,-y_3)\) and \(y^{-1}x=(x_1-y_1,x_2-y_2, x_3-y_3- (y_1x_2-x_1y_2))\) is the Heisenberg group multiplication.
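The group law and the gauge (2.5) are easy to experiment with numerically. The following Python sketch is only an illustration: the function names `mul`, `inv`, `gauge` and `rho` are ours and do not appear in the original text. It encodes the multiplication for which \(X_1,X_2,X_3\) are left invariant, and lets one check that \(\rho \) is symmetric and invariant under left translations.

```python
def mul(y, x):
    """Heisenberg product y*x for which X1, X2, X3 are left invariant."""
    return (y[0] + x[0],
            y[1] + x[1],
            y[2] + x[2] + (y[0] * x[1] - y[1] * x[0]))


def inv(y):
    """Group inverse: y^{-1} = (-y1, -y2, -y3)."""
    return (-y[0], -y[1], -y[2])


def gauge(x):
    """Homogeneous norm |x| of (2.5): |x|^4 = (x1^2 + x2^2)^2 + x3^2."""
    return ((x[0] ** 2 + x[1] ** 2) ** 2 + x[2] ** 2) ** 0.25


def rho(x, y):
    """Gauge pseudo-distance rho(x, y) = |y^{-1} x|."""
    return gauge(mul(inv(y), x))
```

For any three points `x`, `y`, `z`, the values `rho(mul(z, x), mul(z, y))` and `rho(x, y)` agree up to rounding, which is the left invariance used repeatedly in this section.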

Lemma 2.11

For each \(x\in \mathbb {R}^3\),

$$\begin{aligned} A^{-1}|x|\le d_0(x,0)\le A|x|, \end{aligned}$$
(2.6)

for some constant \(A>0\).

Proof

Observe that the 1-parameter family of diffeomorphisms

$$\begin{aligned} (x_1,x_2,x_3)\rightarrow \delta _\lambda (x_1,x_2,x_3):=(\lambda x_1, \lambda x_2, \lambda ^2 x_3) \end{aligned}$$

satisfies \(|\delta _\lambda (x)|=\lambda |x|\), and \(d\delta _\lambda X_i=\lambda X_i\circ \delta _\lambda \) for \(i=1,2\). Consequently \(d_0(\delta _\lambda (x),\delta _\lambda (y))=\lambda d_0 (x,y)\), and \(\delta _\lambda (B(0,1))=B(0,\lambda )\). Since the unit ball B(0, 1) is a bounded open neighborhood of the origin, it will contain a set of the form \(|x|\le A^{-1}\) and will be contained in a set of the form \(|x|\le A\). By applying \(\delta _\lambda \) we then have that for any \(R>0\),

$$\begin{aligned} \left\{ x\in \mathbb {R}^3| |x|\le A^{-1} R\right\} \subset B(0,R)\subset \{ x\in \mathbb {R}^3| |x|\le A R\} \end{aligned}$$

concluding the proof of (2.6). \(\square \)
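The three scaling identities used in the proof can be checked directly. The sketch below is a hedged numerical illustration, with helper names (`dilate`, `push_forward`, `X1`, `X2`) chosen by us: it verifies that \(|\delta _\lambda (x)|=\lambda |x|\), that \(d\delta _\lambda X_i=\lambda X_i\circ \delta _\lambda \) for \(i=1,2\), and that \(\delta _\lambda \) respects the group multiplication.

```python
def dilate(lam, x):
    """Anisotropic dilation delta_lambda(x) = (lam*x1, lam*x2, lam^2*x3)."""
    return (lam * x[0], lam * x[1], lam ** 2 * x[2])


def gauge(x):
    """Homogeneous norm of (2.5)."""
    return ((x[0] ** 2 + x[1] ** 2) ** 2 + x[2] ** 2) ** 0.25


def mul(y, x):
    """Heisenberg product y*x."""
    return (y[0] + x[0], y[1] + x[1],
            y[2] + x[2] + (y[0] * x[1] - y[1] * x[0]))


def X1(x):
    """Components of X1 = d/dx1 - x2 d/dx3 at the point x."""
    return (1.0, 0.0, -x[1])


def X2(x):
    """Components of X2 = d/dx2 + x1 d/dx3 at the point x."""
    return (0.0, 1.0, x[0])


def push_forward(lam, v):
    """Differential of delta_lambda applied to a tangent vector v.

    The differential is the constant matrix diag(lam, lam, lam^2).
    """
    return (lam * v[0], lam * v[1], lam ** 2 * v[2])
```

Since horizontal velocities are multiplied by \(\lambda \) under \(\delta _\lambda \), horizontal lengths scale by \(\lambda \), which is the source of the identity \(d_0(\delta _\lambda (x),\delta _\lambda (y))=\lambda d_0(x,y)\).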

Remark 2.12

Since the Heisenberg group is a Lie group, it is natural to use a left-invariant volume form to measure the size of sets, namely the Haar measure. It is not difficult to see [29] that the Haar measure coincides with the Lebesgue measure in \(\mathbb {R}^3\). It follows immediately from the previous lemma that the corresponding volume of a ball B(x, r) is

$$\begin{aligned} |B(x,r)|=Cr^4. \end{aligned}$$
(2.7)

As a consequence one can show that the Hausdorff dimension of the metric space \(({\mathbb {H}}^1, d_0)\) is 4. The Hausdorff dimension of any horizontal curve (i.e. tangent to the distribution generated by \(X_1\) and \(X_2\)) is 1, while the Hausdorff dimension of the vertical \(x_3\)-axis is 2.
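The scaling in (2.7) can be observed on the gauge balls \(\{|x|\le r\}\), which by Lemma 2.11 are comparable to the metric balls. The sketch below is a numerical illustration under that substitution (it does not compute Carnot–Carathéodory balls), and the helper name `gauge_ball_volume` is hypothetical. Writing \(s=x_1^2+x_2^2\), the slice of the gauge ball over s has height \(2\sqrt{r^4-s^2}\), and the planar area element is \(\pi \,ds\).

```python
import math


def gauge_ball_volume(r, n=100000):
    """Lebesgue volume of {(x1^2 + x2^2)^2 + x3^2 <= r^4}, midpoint rule.

    With s = x1^2 + x2^2 in [0, r^2], the vertical slice has height
    2*sqrt(r^4 - s^2) and the area element in the (x1, x2) plane is pi ds.
    """
    h = r * r / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += 2.0 * math.sqrt(r ** 4 - s * s) * math.pi * h
    return total
```

Doubling the radius multiplies the computed volume by 16, in agreement with the exponent 4 in (2.7); the integral can also be evaluated in closed form, giving \(\pi ^2 r^4/2\) for the gauge ball.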

Next we turn our attention to the balls in the metrics \(g_\epsilon \) and the associated distance functions \(d_\epsilon \). To better describe the approximate shape of such balls we define the pseudo-distance function \(d_{G,\epsilon } (x,y)=N_\epsilon (y^{-1}x)\) corresponding to the regularized gauge function

$$\begin{aligned} N_\epsilon ^2(x)=x_1^2+x_2^2+ \min \bigg \{ |x_3| , \epsilon ^{-2}x_3^2 \bigg \}. \end{aligned}$$
(2.8)
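To see the two regimes encoded in (2.8), note that \(\epsilon ^{-2}x_3^2\le |x_3|\) exactly when \(|x_3|\le \epsilon ^2\). The following Python sketch (the function names are ours, and the case `eps == 0` is our convention for the limiting gauge \(x_1^2+x_2^2+|x_3|\)) evaluates \(N_\epsilon \) and compares the limiting gauge with the norm \(|\cdot |\) of (2.5).

```python
import math


def N_eps(x, eps):
    """Regularized gauge of (2.8); eps == 0 is read as the limiting case."""
    x1, x2, x3 = x
    if eps == 0:
        vertical = abs(x3)
    else:
        vertical = min(abs(x3), x3 * x3 / eps ** 2)
    return math.sqrt(x1 * x1 + x2 * x2 + vertical)


def gauge(x):
    """Homogeneous norm |x| of (2.5)."""
    return ((x[0] ** 2 + x[1] ** 2) ** 2 + x[2] ** 2) ** 0.25
```

On the vertical axis one finds \(N_\epsilon (0,0,t)=\epsilon ^{-1}|t|\) for \(|t|\le \epsilon ^2\) and \(N_\epsilon (0,0,t)=\sqrt{|t|}\) otherwise. Moreover the elementary inequality \(a^2+b^2\le (a+b)^2\le 2(a^2+b^2)\) gives \(|x|\le N_0(x)\le 2^{1/4}|x|\), so the limit \(\epsilon =0\) recovers a gauge equivalent to (2.5).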

Our next goal is to show that the Riemannian distance function \(d_\epsilon \) is well approximated by the gauge pseudo-distance \(d_{G,\epsilon }\).

Lemma 2.13

There exists \(A>0\) independent of \(\epsilon \) such that for all \(x,y\in \mathbb {R}^3\)

$$\begin{aligned} A^{-1} d_{G,\epsilon }(x,y) \le d_\epsilon (x,y) \le A d_{G,\epsilon }(x,y) . \end{aligned}$$
(2.9)

The estimate (2.9) yields immediately

Corollary 2.14

The doubling property holds uniformly in \(\epsilon >0\).
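As a sanity check on the uniformity (an illustration, not a proof), one can compute the Lebesgue volume of the gauge balls \(\{N_\epsilon <r\}\) in closed form. Slicing over \(t=r^2-x_1^2-x_2^2\), the vertical section has length \(2\max (t,\epsilon \sqrt{t})\), which gives \(\frac{4\pi }{3}\epsilon r^3\) for \(r\le \epsilon \) and \(\pi (r^4+\epsilon ^4/3)\) for \(r>\epsilon \). The function names below are hypothetical, and the computation concerns the gauge balls rather than the \(d_\epsilon \)-balls themselves.

```python
import math


def gauge_ball_volume(r, eps):
    """Lebesgue volume of {x : N_eps(x) < r}, with N_eps as in (2.8).

    Obtained by integrating 2*pi*max(t, eps*sqrt(t)) over t in (0, r^2).
    """
    if r <= eps:
        return 4.0 * math.pi / 3.0 * eps * r ** 3
    return math.pi * (r ** 4 + eps ** 4 / 3.0)


def doubling_ratio(r, eps):
    """Ratio of the volumes of the gauge balls of radii 2r and r."""
    return gauge_ball_volume(2.0 * r, eps) / gauge_ball_volume(r, eps)
```

The ratio equals 8 at scales well below \(\epsilon \) (Euclidean regime), approaches 16 at scales well above \(\epsilon \) (subRiemannian regime), and stays between these two values in the transition, whatever the value of \(\epsilon \).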

Remark 2.15

Before proving (2.9) it is useful to examine a specific example: compare two trajectories from the origin \(0=(0,0,0)\) to the point \(x=(0,0,x_3)\). The first is the segment \(\gamma _1\) defined by \(s\rightarrow (0,0,x_3 s)\), for \(s\in [0,1]\). The length of this segment in the Riemannian metric \(g^\epsilon \) given by the orthonormal frame \(X_1,X_2,\epsilon X_3\) is

$$\begin{aligned} \ell _\epsilon (\gamma _1)=\epsilon ^{-1}|x_3|. \end{aligned}$$

We also consider a second trajectory \(\gamma _2\) given by the subRiemannian geodesic between the two points. In view of (2.6) the length of this curve in the subRiemannian metric \(g^0\) defined by the orthonormal frame \(X_1,X_2\) is proportional to \(\sqrt{|x_3|}\) and coincides with the length in the Riemannian metric \(g^\epsilon \), i.e.

$$\begin{aligned} \ell _\epsilon (\gamma _2)=\ell _0(\gamma _2)\approx \sqrt{|x_3|}. \end{aligned}$$

Since \(d_\epsilon \) is computed by selecting the shortest path between two points in the \(g^\epsilon \) metric, if \(\sqrt{|x_3|}>\epsilon \) one will have \(d_\epsilon (x,0) \le \sqrt{|x_3|} \approx N_\epsilon (x)\), whereas at small scales (i.e. for \(d_0(x,0)<\epsilon \)) one will have \(d_\epsilon (x,0)\le \epsilon ^{-1}|x_3|\). By left translation invariance of \(d_{\epsilon }\) we have that for any two points \(x=(x_1,x_2,s)\) and \(x'=(x_1,x_2,t)\),

$$\begin{aligned} d_\epsilon (x,x')\le C \min \left( \epsilon ^{-1}|t-s|, \sqrt{|t-s|}\right) . \end{aligned}$$
(2.10)

From this simple example one can expect that at large scales (i.e. for points with \(d_0(x,0)>\epsilon \)) the Riemannian and the subRiemannian distances are approximately the same, \(d_\epsilon (x,0)\approx d_0(x,0)\).
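The comparison between the two trajectories can be made concrete with an explicit horizontal competitor. In the sketch below we lift a circle through the origin in the \((x_1,x_2)\)-plane; along a horizontal curve \(\dot{x}_3=x_1\dot{x}_2-x_2\dot{x}_1\), so a circle of radius R gains height \(2\pi R^2\) at the cost of length \(2\pi R\). The choice of a circle and the names `lift_circle`, `loop_length`, `vertical_length` are our assumptions for illustration; the lifted circle provides an upper bound for the subRiemannian distance.

```python
import math


def lift_circle(x3_target, n=20000):
    """Horizontal lift of a circle from 0 to (0, 0, x3_target).

    Returns the numerically integrated final height and horizontal length.
    """
    R = math.sqrt(abs(x3_target) / (2.0 * math.pi))
    sign = 1.0 if x3_target >= 0 else -1.0
    h = 2.0 * math.pi / n
    x3, length = 0.0, 0.0
    for k in range(n):
        th = (k + 0.5) * h                     # midpoint rule in the angle
        x1 = R * (1.0 - math.cos(th))
        x2 = -sign * R * math.sin(th)
        dx1 = R * math.sin(th) * h
        dx2 = -sign * R * math.cos(th) * h
        x3 += x1 * dx2 - x2 * dx1              # horizontality constraint
        length += math.hypot(dx1, dx2)         # length in g_0 and in g_eps
    return x3, length


def loop_length(x3):
    """Length 2*pi*R of the lifted circle, i.e. sqrt(2*pi*|x3|)."""
    return math.sqrt(2.0 * math.pi * abs(x3))


def vertical_length(x3, eps):
    """Length of the vertical segment when X1, X2, eps*X3 is orthonormal."""
    return abs(x3) / eps
```

With these two competitors the vertical segment is shorter when \(|x_3|<2\pi \epsilon ^2\) and the horizontal loop is shorter otherwise, which matches the two regimes appearing in (2.10).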

Proof

From the invariance by left translations of both \(d_{G,\epsilon }\) and \(d_\epsilon \) it is sufficient to prove that \(d_\epsilon (x,0)\) and \(N_\epsilon (x)\) are equivalent. We begin by establishing the first inequality in (2.9), i.e. we want to show that there exists a positive constant A such that

$$\begin{aligned} A^{-1}N_\epsilon (x)\le d_\epsilon (0,x). \end{aligned}$$

Consider a point \(x=(x_1,x_2,x_3)\in \mathbb {R}^3\) and three curves

  • A length minimizing curve \(\gamma :[0,1]\rightarrow \mathbb {R}^3\) for the metric \(g_\epsilon \), such that

    $$\begin{aligned} d_\epsilon (0,x)=\ell _\epsilon (\gamma ):=\int _0^1 \sqrt{ a_1^2 (t) + a_2^2 (t)+\epsilon ^{-2}a_3^2(t)} dt, \end{aligned}$$

    where \(\gamma '(t)=\sum _{i=1}^{3} a_i(t)X_i|_{\gamma (t)}.\)

  • A horizontal curve \(\gamma _1:[0,1]\rightarrow \mathbb {R}^3\) with one end-point at the origin (\(t=0\)) and such that \(\gamma _1'(t)=a_1(t) X_1|_{\gamma _1(t)}+a_2(t) X_2|_{\gamma _1(t)}\). Denote by \(P=\gamma _1(1)\) and observe that \(P=(x_1,x_2,p_3)\) for some value of \(p_3\) such that \(\int _0^1 a_3(t) dt=x_3-p_3\).

  • A vertical segment \(\gamma _2:[0,1]\rightarrow \mathbb {R}^3\) with endpoints P and x, such that \(\gamma _2'(t)=a_3(t)X_3|_{\gamma _2(t)}\). Note that

    $$\begin{aligned}&\epsilon ^{-1}|x_3-p_3|\le \left| \epsilon ^{-1} \int _0^1 a_3(t) dt\right| \le \int _0^1 |a_3(t)|\epsilon ^{-1}dt\\&\quad = \ell _\epsilon (\gamma _2)\le \ell _\epsilon (\gamma )\le d_\epsilon (x,P). \end{aligned}$$

Observe that in view of the equivalence (2.6),

$$\begin{aligned} C^{-1}\sqrt{x_1^2+x_2^2} \le d_0(P,0)\le \ell _0(\gamma _1)=\ell _\epsilon (\gamma _1)\le \ell _\epsilon (\gamma ), \end{aligned}$$

for some constant \(C>0\). On the other hand one also has

$$\begin{aligned} \epsilon ^{-1} |x_3-p_3| \le d_\epsilon (x,P)\le d_\epsilon (x,0)+d_\epsilon (0,P)\le d_\epsilon (x,0)+ \ell _\epsilon (\gamma _1)\le 2d_\epsilon (0,x). \end{aligned}$$

Hence if \(|p_3|\le \frac{1}{2} |x_3|\) then \(|x_3-p_3|\ge \frac{1}{2} |x_3|\) and consequently

$$\begin{aligned} d_\epsilon (x,0) = \ell _\epsilon (\gamma ) \ge \epsilon ^{-1} |x_3-p_3| \ge \min (\epsilon ^{-1}|x_3|, \sqrt{|x_3|}). \end{aligned}$$

The latter yields immediately that \(d_\epsilon (x,0)\ge C^{-1} N_\epsilon (x),\) for some value of \(C>0\) independent of \(\epsilon >0\). Next we consider the case \(|p_3|>\frac{1}{2}|x_3|\). This yields

$$\begin{aligned}&\min (\epsilon ^{-1}|x_3|, \sqrt{|x_3|}) \le 2 \min (\epsilon ^{-1}|p_3|, \sqrt{|p_3|})\le 2\sqrt{|p_3|}\le 2|P|\le 2Cd_0(P,0)\\&\quad \le 2C \ell _0(\gamma _1)=2C \ell _\epsilon (\gamma _1) \le 2C \ell _\epsilon (\gamma )=2C d_\epsilon (x,0), \end{aligned}$$

where |P| is defined as in (2.6). In summary, so far we have proved the first half of (2.9).

To prove the second half of the inequality we consider a horizontal segment \(\Gamma _1\) joining the origin to \(Q=(x_1,x_2, 0)\). Note that \(d_0(0,Q)=d_\epsilon (0,Q)=\ell _0(\Gamma _1)=\ell _\epsilon (\Gamma _1)\). In view of (2.10) one has

$$\begin{aligned} d_\epsilon (0,x)\le d_\epsilon (0,Q)+d_\epsilon (Q,x) \le d_0(0,Q)+C\min (\epsilon ^{-1} |x_3| , \sqrt{|x_3|}) \le CN_\epsilon (x). \end{aligned}$$

The latter completes the proof of (2.9). \(\square \)

Remark 2.16

Similar arguments continue to hold more generally, in the setting of Carnot groups.

As a consequence of Lemma 2.13, one has that for \(\epsilon >0\) the metric space \((\mathbb {R}^3, d_\epsilon )\) is locally bi-Lipschitz to the Euclidean space, and hence its Hausdorff dimension will be 3. As \(\epsilon \rightarrow 0\) the non-horizontal directions are penalized causing a sharp phase transition between the regime at \(\epsilon >0\) and \(\epsilon =0\).

The intuition developed through this example hints at the multiple scale aspect of the \(d_\epsilon \) metrics: at scales smaller than \(\epsilon >0\) the local geometry of the metric space \((\mathbb {R}^3,d_\epsilon )\) is roughly Euclidean; at scales larger than \(\epsilon >0\) it is subRiemannian. This intuition will inform the proofs of the stability of the doubling property in the next section.

3 Stability of the homogeneous structure

The volume of Carnot–Caratheodory balls, and its doubling property, were studied in Nagel, Stein and Wainger’s seminal work [70]. In this section we recall the main results of that paper and show how to modify their proofs so that the stability of the doubling constant as \(\epsilon \rightarrow 0\) becomes evident.

3.1 The Nagel–Stein–Wainger estimates

Consider the Carnot–Caratheodory metric \(d_\epsilon (\cdot , \cdot )\) associated to the family of vector fields \((X_1^\epsilon , \ldots ,X_{p}^\epsilon ),\) defined in (2.4). Denote by \(B_\epsilon (x,r)=\{y|d_\epsilon (x,y)<r\}\) the corresponding metric balls.

For every n-tuple \(I=(i_1,\ldots ,i_n)\in \{1,\ldots ,2p-m\}^n\), and for \(\bar{\epsilon } \ge \epsilon \ge 0\) define the coefficient

$$\begin{aligned} \lambda ^\epsilon _I(x)=\det (X^\epsilon _{i_1}(x),\ldots ,X^\epsilon _{i_n}(x)). \end{aligned}$$

For a fixed \(0\le \epsilon \le \bar{\epsilon }\) and for a fixed constant \(0<C_{2,\epsilon }<1\), choose \(I_\epsilon =(i_{\epsilon 1},\ldots ,i_{\epsilon n})\) such that

$$\begin{aligned} |\lambda ^\epsilon _{I_\epsilon }(x)|r^{d_\epsilon (I_\epsilon )} \ge C_{2,\epsilon } \max _J |\lambda ^\epsilon _J(x)|r^{d_\epsilon (J)}, \end{aligned}$$
(3.1)

where the maximum ranges over all n-tuples. Denote by \(J_\epsilon \) the family of remaining indices, so that \(\{X^\epsilon _{i_{\epsilon , j}}: i_{\epsilon , j} \in I_\epsilon \} \cup \{X^\epsilon _{i_{\epsilon , k}}: i_{\epsilon ,k} \in J_\epsilon \}\) is the complete list \(X_1^\epsilon ,\ldots ,X_{2p-m}^\epsilon \). When \(\epsilon =0\) we will refer to \(I_0\) as a choice corresponding to the n-tuple \(X^0_{i_{01}},\ldots ,X^0_{i_{0n}}\) realizing (3.1). One of the main contributions of Nagel, Stein and Wainger’s seminal work [70] consists in the proof that, for fixed v and x, and letting

$$\begin{aligned} Q_\epsilon ( r)=\left\{ u\in \mathbb {R}^n: |u_j| \le r^{d_\epsilon ( i_{\epsilon j})}\right\} \end{aligned}$$

denote a weighted cube in \(\mathbb {R}^n\), the quantity \(|\lambda ^\epsilon _{I_\epsilon }(x)|\) provides an estimate of the Jacobian of the exponential mapping \(u\rightarrow \Phi _{\epsilon , v, x}(u)\) defined for \(u\in Q_\epsilon (r)\) as

$$\begin{aligned} \Phi _{\epsilon , v, x}(u) = \exp \left( \sum _{i_{\epsilon , j}\in I_\epsilon } u_j X^\epsilon _{i_{\epsilon , j}} + \sum _{i_{\epsilon , k}\in J_\epsilon } v_k X^\epsilon _{i_{\epsilon , k}} \right) (x). \end{aligned}$$
(3.2)

More precisely, for fixed \(\epsilon \ge 0\) one has

Theorem 3.1

[70, Theorem 7] For every \(\epsilon \ge 0\), and \(K\subset \subset \mathbb {R}^n\) there exist \(R_\epsilon >0\) and constants \(0<C_{1, \epsilon }, C_{2, \epsilon } <1\) such that for every \(x\in K\) and \(0<r<R_\epsilon \), if \(I_\epsilon \) is such that (3.1) holds, then

  (i) if \(|v_k| \le C_{2 \epsilon }r^{d(i_{\epsilon k})}\), then \(\Phi _{\epsilon , v, x } \) is one to one on the box \(Q_\epsilon (C_{1, \epsilon } r)\);

  (ii) if \(|v_k| \le C_{2 \epsilon }r^{d(i_{\epsilon k})}\), then the Jacobian matrix of \(\Phi _{\epsilon , v, x}\) satisfies on the cube \(Q_\epsilon (C_{1, \epsilon }r)\)

    $$\begin{aligned} \frac{1}{4} \left| \lambda ^\epsilon _{I_\epsilon } (x)\right| \le \left| J\Phi _{\epsilon , v, x}\right| \le 4 \left| \lambda ^\epsilon _{I_\epsilon } (x)\right| \end{aligned}$$

  (iii)

    $$\begin{aligned} \Phi _{\epsilon , v, x}(Q_\epsilon (C_{1, \epsilon } r)) \subset B_{\epsilon }(x, r) \subset \Phi _{\epsilon , v, x}(Q_\epsilon (C_{1, \epsilon } r/C_{2, \epsilon })) \end{aligned}$$

As a corollary one has that the volume of a Carnot–Caratheodory ball centered at x can be estimated by the measure of the corresponding cube and the Jacobian determinant of \(\Phi _{\epsilon ,v,x}\).

Corollary 3.2

([70, Theorem 1]) For every \(\epsilon \ge 0\), and \(K\subset \subset \mathbb {R}^n\) and for \(R_\epsilon >0\) as in Theorem 3.1, there exist constants \(C_{3 \epsilon }, C_{4\epsilon }>0\) depending on \(K, R_\epsilon , C_{1,\epsilon }\) and \(C_{2\epsilon }\) such that for all \(x\in K\) and \(0<r<R_\epsilon \) one has

$$\begin{aligned} C_{3 \epsilon }\sum _I \left| \lambda ^\epsilon _I(x)\right| r^{d(I)} \le \left| B_\epsilon (x,r)\right| \le C_{4 \epsilon } \sum _I \left| \lambda ^\epsilon _I(x)\right| r^{d(I)}, \end{aligned}$$
(3.3)

Estimate (3.3) in turn implies the doubling condition (1.2) with constants depending ultimately on \(R_\epsilon , C_{1\epsilon }\) and \(C_{2\epsilon }\).

3.2 Uniform estimates as \(\epsilon \rightarrow 0\)

Having already proved the stability of the doubling property in the special case of the Heisenberg group, in this section we turn to the general case of Hörmander’s vector fields and describe in some detail results from [18] establishing that the constants \(C_{1\epsilon }\) and \(C_{2\epsilon }\) do not vanish as \(\epsilon \rightarrow 0\). Without loss of generality one may assume that both constants are non-decreasing in \(\epsilon \). In fact, if that is not the case one may consider a new pair of constants \(\tilde{C}_{i,\epsilon }=\inf _{s\in [\epsilon ,\bar{\epsilon }]} C_{i,s}\), for \(i=1,2\).

Proposition 3.3

For every \(\epsilon \in [0,\bar{\epsilon }]\), the constants \(R_\epsilon , C_{1, \epsilon }\) and \(C_{2, \epsilon }\) in Theorem 3.1 may be chosen to be independent of \(\epsilon \), depending only on the \(C^{r+1}\) norm of the vector fields, on the number \(\bar{\epsilon }\), and on the compact K .

Proof

The proof is split into two cases: first we study the range \(\epsilon<r<R_0\), which roughly corresponds to the balls of radius r having a sub-Riemannian shape. In this range we show that one can select the constants \(C_{i,\epsilon }\) to be approximately \(C_{i,0}\). The second case consists in the analysis of the range \(r<\epsilon <\bar{\epsilon }\). In this regime the balls are roughly of Euclidean shape and we show that the constants \(C_{i,\epsilon }\) can be approximately chosen to be \(C_{i,\bar{\epsilon }}\).

Let us fix \(\epsilon \in (0,\bar{\epsilon }],\,R=R_0\) and \(r<R_0\). We start by describing the family \(I_\epsilon \) defined in (3.1), which maximizes \(\lambda ^\epsilon _I(x)\). We first note that for every \(\epsilon >0\) and for every i with \(m+1 \le i \le p\) we have

$$\begin{aligned} X^\epsilon _{i}r^{d_\epsilon (i)} = \epsilon ^{d(i)-1} r Y_i, \quad X^\epsilon _{i+p-m}r^{d_\epsilon (i+p-m)} = r^{d(i)} Y_i. \end{aligned}$$
(3.4)

In the range \(0<r<\epsilon <\bar{\epsilon }\) one can assume without loss of generality that the n-tuple satisfying the maximality condition (3.1) will include only vectors of the form \(\{\epsilon ^{d(i_{\epsilon 1})-1}Y_{i_{\epsilon 1}}, \ldots , \epsilon ^{d(i_{\epsilon n})-1}Y_{i_{\epsilon n}}\} \) for some n-index \(I_\epsilon =(i_{\epsilon 1},\ldots ,i_{\epsilon n})\), with \(1\le i_{\epsilon k} \le p\). In fact, if this were not the case and the n-tuple were to include a vector of the form \(X^\epsilon _j=Y_{j-p+m}\) for some \(p<j\), then we could substitute such vector with \(X^\epsilon _{j-p+m}=Y_{j-p+m}\epsilon ^{d(j-p+m)-1}\) and from (3.4) infer that the value of the corresponding term \(|\lambda ^\epsilon _{I_\epsilon }(x)| r^{d_\epsilon (I_\epsilon )}\) would increase.

Similarly, in the range \(0<\epsilon<r<\bar{\epsilon }\) one can assume that the n-tuple satisfying the maximality condition (3.1) will include only vectors of the form \(\{Y_{i_{\epsilon 1}}, \ldots , Y_{i_{\epsilon n}} \}\) for some n-index \(I_\epsilon =(i_{\epsilon 1},\ldots ,i_{\epsilon n})\), with \(1\le i_{\epsilon k} \le p\). Note that the corresponding expression

$$\begin{aligned} \left| \lambda ^\epsilon _{I_\epsilon }(x)\right| r^{d_\epsilon (I_\epsilon )-1}=\left| \det (Y_{i_{\epsilon 1}}, \ldots , Y_{i_{\epsilon n}})(x)\right| r^{\sum _{I_\epsilon } d(i_{\epsilon k})} \end{aligned}$$

would then be one of the terms in the right-hand side of (3.1) for \(\epsilon =0\), and thus is bounded above by \(C_{2,0}^{-1}|\lambda _{I_0}^0(x)|r^{d(I_0)-1}\).
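The selection of the maximizing n-tuple in the two ranges can be made explicit in the Heisenberg group. The sketch below is an illustration under our own assumptions: we take \(m=2\), \(p=3\), \(Y_3=X_3=\partial _{x_3}\), and read off from (3.4) the extended list \(X_1,X_2,\epsilon X_3,X_3\) with degrees 1, 1, 1, 2. The function names are hypothetical and the index triples are 0-based.

```python
from itertools import combinations


def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def extended_frame(x, eps):
    """Pairs (vector at x, degree) for the assumed Heisenberg extended list."""
    X1 = (1.0, 0.0, -x[1])
    X2 = (0.0, 1.0, x[0])
    X3 = (0.0, 0.0, 1.0)
    eps_X3 = tuple(eps * c for c in X3)
    return [(X1, 1), (X2, 1), (eps_X3, 1), (X3, 2)]


def weighted_terms(x, r, eps):
    """Map each triple I to |lambda_I(x)| * r**d(I), as in (3.1)."""
    frame = extended_frame(x, eps)
    terms = {}
    for I in combinations(range(len(frame)), 3):
        vectors = [frame[i][0] for i in I]
        degree = sum(frame[i][1] for i in I)
        terms[I] = abs(det3(*vectors)) * r ** degree
    return terms


def maximizing_tuple(x, r, eps):
    """Triple realizing the maximum of the weighted terms."""
    terms = weighted_terms(x, r, eps)
    return max(terms, key=terms.get)
```

In this example the only nonzero terms are \(\epsilon r^3\), from \((X_1,X_2,\epsilon X_3)\), and \(r^4\), from \((X_1,X_2,X_3)\). The first dominates for \(r<\epsilon \) and the second for \(r>\epsilon \), independently of the base point x, which is the dichotomy exploited in the two cases of the proof.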

Case 1: In view of the argument above, for every \(\epsilon<r<R_0\) the indices \(I_\epsilon \) defined by the maximality condition (3.1) can be chosen to coincide with the indices of the family \(I_0\), and thus do not depend on \(\epsilon \). On the other hand the vectors excluded from \(I_\epsilon \) are not only those in \(J_0\) but also the ones that have been added with a weight factor of a power of \(\epsilon \),

$$\begin{aligned} \left\{ X_k^\epsilon : k\in J_\epsilon \right\}= & {} \left\{ X^0_{i_{0,k}}: i_{0,k}\in J_0\right\} \cup \left\{ \epsilon ^{d(i_{0,k})-1} X^0_{i_{0,k}}: i_{0,k}\in I_0, i_{0,k}> m \right\} \\&\cup \left\{ \epsilon ^{d(i_{0,k})-1} X^0_{i_{0,k}}: i_{0,k}\in J_0 {, i_{0,k}>m} \right\} . \end{aligned}$$

In correspondence with this decomposition of the set of indices we define a splitting in the v-variables in (5.14) as

$$\begin{aligned} v=(\hat{v}, {\tilde{v}, \bar{v}}). \end{aligned}$$

Consequently for every \(\epsilon <r\) the function \(\Phi _{\epsilon , v, x}(u)\) can be written as

$$\begin{aligned} \Phi _{\epsilon , v, x}(u)= & {} exp\left( \sum _{i_{\epsilon j}\in I_\epsilon } u_j X^\epsilon _{i_{\epsilon j}} + \sum _{i_{\epsilon k}\in J_\epsilon } v_k X^\epsilon _{i_{\epsilon k}} \right) (x)\nonumber \\= & {} exp\left( \sum _{i_{0 j}\in I_0} u_j X^0_{i_{0 j}} + \sum _{i_{\epsilon k}\in J_\epsilon } v_k X^\epsilon _{i_{\epsilon k}} \right) (x)\nonumber \\= & {} exp\left( \sum _{i_{0 j}\in I_0} u_j X^0_{i_{0 j}} + \sum _{i_{0 k}\in J_0} \hat{v}_k Y_{i_{0 k}} + \sum _{i_{0 k}\in I_0, i>m } {\tilde{v}}_k \epsilon ^{d(i_{0 k})-1} Y_{i_{0 k}}\right. \nonumber \\&+\left. \sum _{i_{0 k}\in J_0 {, i_{0,k}>m} }{\bar{v}}_k \epsilon ^{d(i_{0 k})-1} Y_{i_{0 k}}\right) (x)\nonumber \\= & {} \Phi _{0, \hat{v}_k + \bar{v}_k \epsilon ^{d(i_{0 k})-1}, x}\left( u_1, \ldots u_m, u_{m+1} + {\tilde{v}}_{m+1} \epsilon ^{d(i_{0 m+1})-1}, \ldots , u_{n}\right. \nonumber \\&+\left. {\tilde{v}}_{n} \epsilon ^{d(i_{0 n})-1}\right) . \end{aligned}$$
(3.5)

Let us define mappings

$$\begin{aligned} F_{1,\epsilon ,v}(u)=\left( u_1,\ldots ,u_m, u_{m+1}+\tilde{v}_{m+1}\epsilon ^{d(i_{0 m+1})-1},\ldots ,u_{n} + {\tilde{v}}_{n} \epsilon ^{d(i_{0 n})-1}\right) , \end{aligned}$$

and

$$\begin{aligned} F_{2,\epsilon }(v)=\left( \hat{v}_1 + \bar{v}_1 \epsilon ^{d( i_{01})-1 }, \ldots , \hat{v}_{2p-m} + \bar{v}_{2p-m} \epsilon ^{d( i_{0, 2p-m})-1}\right) . \end{aligned}$$

In view of (3.5) we can write

$$\begin{aligned} \Phi _{\epsilon , v, x}(u) =\Phi _{0, F_{2,\epsilon }(v), x}(F_{1,\epsilon ,v}(u)). \end{aligned}$$
(3.6)

Note that for any \(\epsilon \ge 0\) and for a fixed v, the mapping \(u\rightarrow F_{1,\epsilon ,v}(u)\) is invertible and volume preserving in all \(\mathbb {R}^n\). Moreover \(J\Phi _{\epsilon , v, x}(u) =J\Phi _{0, F_{2,\epsilon }(v), x}(F_{1,\epsilon ,v}(u)).\) In view of (3.6) and of Theorem 3.1, as a function of u, the mapping \(\Phi _{\epsilon ,v,x}(u)\) is defined, invertible, and satisfies the Jacobian estimates in Theorem 3.1 (ii)

$$\begin{aligned} \frac{1}{4} \left| \lambda ^0_{I_0} (x)\right| \le \left| J\Phi _{0, F_{2,\epsilon }(v),x}(F_{1,\epsilon ,v}(u))\right| =\left| J\Phi _{\epsilon , v,x}(u)\right| \le 4 \left| \lambda ^0_{I_0} (x)\right| \end{aligned}$$

for all u such that \(F_{1,\epsilon ,v}(u)\in Q_0(C_{1,0}r)\) and for v such that

$$\begin{aligned} \left| F_{2,\epsilon }^k(v)\right|= & {} \left| \hat{v}_k + \bar{v}_k \epsilon ^{d(i_{0 k})-1}\right| \le C_{2, 0}r^{d(i_{0 k})}, \\ \left| u_1\right|\le & {} C_{1, 0}r^{d(i_{0 1})}\cdots \left| u_m \right| {\le } C_{1, 0} r^{d(i_{0 m})}, \left| u_{m+1} {+} {\tilde{v}}_{m+1} \epsilon ^{d(i_{0 m+1})-1}\right| {\le } C_{1, 0} r^{d(i_{0 m{+}1})}, \end{aligned}$$

when \(k=1,\ldots ,2p-m\). \(\square \)

The completion of the proof of Case 1 rests on the following two claims:

Claim 1

Let \(\epsilon<r<R_0\). There exists \(C_6>0\), independent of \(\epsilon \), such that for all v satisfying \(|v_k| \le C_6r^{d(i_{\epsilon k})}\) one has \(|F_{2,\epsilon }^k(v)|=|\hat{v}_k + \bar{v}_k \epsilon ^{d(i_{0 k})-1}|\le C_{2, 0}r^{d(i_{0 k})} .\)

Proof of the claim

If we choose \(C_6< \min \{C_{1, 0}, C_{2, 0}\}\) and

$$\begin{aligned} |\hat{v}_k|, |\tilde{v}_k|, |\bar{v}_k|\le \min \left\{ C_{1, 0}, C_{2, 0}\right\} \frac{r^ {d(i_{\epsilon k})}}{4}, \quad |u_j|\le C_{1, 0}\frac{r^{d(i_{\epsilon j})}}{4}, \end{aligned}$$

it follows that

$$\begin{aligned} |\hat{v}_k|\le C_{2, 0}\frac{r^ {d(i_{0 k})}}{4}, \quad |\tilde{v}_k|, |\bar{v}_k| \le C_{1, 0}\frac{r}{4},\quad |u_j| \le C_{1, 0}\frac{r^{d(i_{\epsilon j})}}{4}. \end{aligned}$$

So that

$$\begin{aligned} |\hat{v}_k|\le C_{2, 0}\frac{r^ {d(i_{0 k})}}{4}, \quad \epsilon ^{d(i_{0 k})-1} |\tilde{v}_k|, \quad \epsilon ^{d(i_{0 k})-1}|\bar{v}_k| \le C_{1, 0}\frac{r^{d(i_{0 k})} }{4},\; |u_j| \le C_{1, 0}\frac{r^{d(i_{0 j})}}{4}, \end{aligned}$$

completing the proof of the claim. \(\square \)

Claim 2

Let \(\epsilon<r<R_0\) and v fixed such that \(|v_k| \le C_6 r^{d(i_{\epsilon k})}\) for \(k=1,\ldots ,2p-m\). One has that

$$\begin{aligned} Q_\epsilon \left( C_5^{-1} r\right) \subset F_{1,\epsilon ,v}^{-1}\left( Q_0(C_{1,0} r)\right) \subset Q_\epsilon (C_5 r) \end{aligned}$$

for some constant \(C_5>0\) independent of \(\epsilon \ge 0\).

Proof of the claim

Choose \(C_5\) sufficiently large so that \(2\max \{ C_5^{-1}, C_6 \}\le C_{1,0}\) and observe that if \(u\in Q_\epsilon (C_5^{-1} r)\) then for \(k=1,\ldots ,m\) we have \(|u_k|\le C_{1,0}r^{d(i_{\epsilon ,k})}=C_{1,0}r^{d(i_{0,k})}\) while for \(k=m+1,\ldots ,n\) we have \(|F^k_{1,\epsilon ,v}(u)|=|u_k+\tilde{v}_k \epsilon ^{d(i_{0k})-1}|\le \max \{C_5^{-1}, C_6 \} r^{d(i_{0 k})} (1+\bar{\epsilon }^{d(i_{0 k})-1})\le C_{1,0} r^{d(i_{0 k})}\). This proves the first inclusion in the claim. To establish the second inclusion we choose \(C_5\) large enough so that \(2(C_{1,0}+C_{2,\bar{\epsilon }})\le C_5\) and observe that if \(F_{1,\epsilon ,v}(u)\in Q_0(C_{1,0} r)\) then for \(k=m+1,\ldots ,n\) one has \(|u_k|\le |u_k+\tilde{v}_k \epsilon ^{d(i_{0k})-1}|+|\tilde{v}_k| \epsilon ^{d(i_{0k})-1}\le 2(C_{1,0}+C_{2,\bar{\epsilon }}) r^{d(i_{0 k})}\le C_5 r^{d(i_{0 k})}\). The corresponding estimate for the range \(k=1,\ldots ,m\) is immediate.

In view of Claims 1 and 2, and of Theorem 3.1, it follows that for \(\epsilon <r\) and these choices of constants (independent of \(\epsilon \)) the function \(\Phi _{\epsilon , v,x}(u) \) is invertible on \(Q_0(C_{1,0}r)\) and (i), (ii) and (iii) are satisfied.

Case 2: As remarked above, in the range \(0<r<\epsilon <\bar{\epsilon }\) one can assume that the n-tuple satisfying the maximality condition (3.1) will include only vectors of the form \(\{\epsilon ^{d(i_{\epsilon 1})-1}Y_{i_{\epsilon 1}}, \ldots , \epsilon ^{d(i_{\epsilon n})-1}Y_{i_{\epsilon n}} \}\) for some n-index \(I_\epsilon =(i_{\epsilon 1},\ldots ,i_{\epsilon n})\), with \(1\le i_{\epsilon k} \le p\). Note that in view of (3.4) and the maximality condition (3.1) the corresponding term

$$\begin{aligned} \left| \lambda ^\epsilon _{I_\epsilon }(x)\right| r^{d_\epsilon (I_\epsilon )} \end{aligned}$$

can be rewritten and estimated as follows

$$\begin{aligned} \left| \lambda ^\epsilon _{I_\epsilon }(x)\right| r^{d_\epsilon (I_\epsilon )}= \epsilon ^{d(I_\epsilon )-n} r^n \left| \det ( Y_{i_{\epsilon 1}}, \ldots , Y_{i_{\epsilon n}} ) (x)\right| . \end{aligned}$$

It is then clear that the maximizing n-tuple \(I_\epsilon \) in (3.1) will be identified by the lowest degree \(d(I_\epsilon )\) among all n-tuples corresponding to non-vanishing determinants \(\det ( Y_{i_{\epsilon 1}}, \ldots , Y_{i_{\epsilon n}} ) \) in a neighborhood of the point x. Since this choice does not depend on \(\epsilon >r\), then one has that \(I_\epsilon =I_{\bar{\epsilon }}\). In other words, if we denote

$$\begin{aligned} (X^{\bar{\epsilon }})_{ {i_{\bar{\epsilon },k}} \in I_{\bar{\epsilon }}} = \left\{ \bar{\epsilon }^{d(i_{\bar{\epsilon },1})-1} Y_{i_{\bar{\epsilon },1}}, \ldots , \bar{\epsilon }^{d(i_{\bar{\epsilon },n})-1} Y_{i_{\bar{\epsilon },n}} \right\} \end{aligned}$$

then the maximality condition (3.1) in the range \(0<r<\epsilon <\bar{\epsilon }\) can be satisfied independently from \(\epsilon \) by selecting the family of vector fields:

$$\begin{aligned} (X^\epsilon )_{{i_{\epsilon ,k}} \in I_\epsilon }=\left\{ \epsilon ^{d(i_{\bar{\epsilon },1})-1} Y_{i_{\bar{\epsilon },1}}, \ldots , \epsilon ^{d(i_{\bar{\epsilon },n})-1} Y_{i_{\bar{\epsilon },n}} \right\} \end{aligned}$$

The complementary family \(J_\epsilon \) becomes

$$\begin{aligned} \left\{ Y^\epsilon _{i_{\epsilon k}}: i_{\epsilon k} \in J_\epsilon \right\}= & {} \left\{ \epsilon ^{d(i_{\bar{\epsilon },k})-1} Y_{i_{\bar{\epsilon },k}}: i_{0,k}\in J_{\bar{\epsilon }}, \text { with } i_{\bar{\epsilon },k} \le p \right\} \nonumber \\&\cup \left\{ Y_{i_{\bar{\epsilon },k}-p+m}: i_{\bar{\epsilon },k}\in J_{\bar{\epsilon }}, \text { with }i_{\bar{\epsilon },k}>p\right\} \end{aligned}$$
(3.7)

If we denote by \(A_\epsilon \) and \(B_\epsilon \) these two sets, and split the v-variable from (5.14) as \(v=(\hat{v}, \tilde{v})\), then it is clear that

$$\begin{aligned} Y\in A_\epsilon \text { iff } \frac{{\bar{\epsilon }}^{d(i_{\bar{\epsilon },k})-1} }{\epsilon ^{d(i_{\bar{\epsilon },k})-1} }Y\in A_{\bar{\epsilon }}, \end{aligned}$$

and in this case the values of \(d_\epsilon \) and \(d_{\bar{\epsilon }}\) are the same on the corresponding indices. Analogously \( Y\in B_\epsilon \text { iff } Y\in B_{\bar{\epsilon }}\) and the degrees are the same.

For every \(\epsilon >r\) the map \(\Phi _{\epsilon , v, x}(u)\) then can be written as

$$\begin{aligned} \Phi _{\epsilon , v, x}(u)= & {} \exp \left( \sum _{i_{\epsilon j}\in I_\epsilon } u_j X^\epsilon _{i_{\epsilon j}} + \sum _{i_{\epsilon k}\in J_\epsilon } v_k X^\epsilon _{i_{\epsilon k}} \right) (x)\\= & {} \exp \left( \sum _{i_{\bar{\epsilon } j}\in I_{\bar{\epsilon }} } u_j X^\epsilon _{i_{\bar{\epsilon } j}} + \sum _{i_{\bar{\epsilon } k}\in J_{\bar{\epsilon }}} v_k X^\epsilon _{i_{\bar{\epsilon } k}} \right) (x)\\= & {} \exp \left( \sum _{i_{\bar{\epsilon } j}\in I_{\bar{\epsilon }}} u_j \frac{{ \epsilon }^{d(i_{\bar{\epsilon },k})-1} }{\bar{\epsilon }^{d(i_{\bar{\epsilon },k})-1} }X^{\bar{\epsilon }}_{i_{\bar{\epsilon } j}} + \sum _{i_{\bar{\epsilon } k}\in J_{\bar{\epsilon }}\text { and }i_{\bar{\epsilon } j} \le p}{\hat{v}}_k \frac{{ \epsilon }^{d(i_{0,k})-1} }{\bar{\epsilon }^{d(i_{0,k})-1} }X^{\bar{\epsilon }}_{i_{0 k}}\right. \\&\left. +\, \sum _{i_{\bar{\epsilon } k}\in J_{\bar{\epsilon }}\text { and }i_{\bar{\epsilon } j} > p} \tilde{v}_k X^{\bar{\epsilon }}_{i_{\bar{\epsilon } k}} \right) (x) \end{aligned}$$

This function is defined and invertible for

$$\begin{aligned} \ | \tilde{v}_k| ,\ |{\hat{v}}_k | \frac{{ \epsilon }^{d(i_{\bar{\epsilon },k})-1} }{\bar{\epsilon }^{d(i_{\bar{\epsilon },k})-1} }\le C_{2, \bar{\epsilon }}r^{d_{\bar{\epsilon }}(i_{\bar{\epsilon } k})}, |u_j |\frac{{ \epsilon }^{d(i_{0,j})-1} }{\bar{\epsilon }^{d(i_{\bar{\epsilon },j})-1} }\le C_{1, \bar{\epsilon }} r^{d_{\bar{\epsilon }}(i_{\bar{\epsilon } j})}. \end{aligned}$$

Recall that with the present choice of \(r<\epsilon <\bar{\epsilon }\), we have \(C_{1, \bar{\epsilon }} r^{d_{\bar{\epsilon }}(i_{\bar{\epsilon } j})}=C_{1, \bar{\epsilon }} r^{d_{ \epsilon }(i_{\bar{\epsilon } j})}=C_{1, \bar{\epsilon }} r^{d_{ \epsilon }(i_{\epsilon j})}\). If we set

$$\begin{aligned} |\hat{v}_k|,| \tilde{v}_k|\le & {} C_{2, \bar{\epsilon }} r^{d_{\bar{\epsilon }(i_{\bar{\epsilon } k})}},\\ |u_j|\le & {} C_{1, \bar{\epsilon }} r^{d_{\bar{\epsilon }} (i_{\bar{\epsilon } j})}, \end{aligned}$$

and argue similarly to Case 1, then the function \(\Phi _{\epsilon , v, x}\) will satisfy conditions (i), (ii) and (iii) on \(Q(C_{1,\bar{\epsilon }}r)\) and hence on \(Q(C_{1,\epsilon }r)\), with constants independent of \(\epsilon \). \(\square \)

3.3 Equiregular subRiemannian structures and equivalent pseudo-distances

The intrinsic definition, based on a minimizing choice, of the Carnot–Caratheodory metric is not convenient when one needs to produce quantitative estimates, as we will do in the following sections. It is then advantageous to use equivalent pseudo-distances which are explicitly defined in terms of certain systems of coordinates. In the previous section we have already encountered two special cases, i.e. the norm \(|\cdot |\) defined in (2.5) and its Riemannian approximation (2.8). In this section we extend this construction to all equiregular subRiemannian structures. For \(\Omega \subset \mathbb {R}^n\) consider the subRiemannian manifold \((\Omega ,\Delta , g)\) and iteratively set \(\Delta ^1:=\Delta \), and \(\Delta ^{i+1}=\Delta ^i+[\Delta ^i, \Delta ]\) for \(i\in \mathbb {N}\). The bracket generating condition is expressed by saying that there exists an integer \(s\in \mathbb {N}\) such that \(\Delta _{p}^s= \mathbb {R}^n\) for all \(p\in \Omega \).

Definition 3.4

A subRiemannian manifold \((\Omega , \Delta , g)\) is equiregular if, for all \(i\in \mathbb {N}\), the dimension of \(\Delta ^i_p\) is constant in \(p\in \Omega \). With the convention \(\Delta ^0_p=\{0\}\), the homogeneous dimension

$$\begin{aligned} Q=\sum _{i=1}^{s} i \left[ \dim (\Delta _p^{i})-\dim (\Delta _p^{i-1})\right] , \end{aligned}$$
(3.8)

coincides with the Hausdorff dimension with respect to the Carnot–Caratheodory distance.
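As a quick illustration, the following sketch computes the homogeneous dimension as the degree-weighted sum of the layer dimensions, \(Q=\sum _i i\,(\dim \Delta ^i_p-\dim \Delta ^{i-1}_p)\) with \(\Delta ^0_p=\{0\}\), from the list of dimensions \((\dim \Delta ^1_p,\ldots ,\dim \Delta ^s_p)\). The function name and the input format are our own choices.

```python
def homogeneous_dimension(growth):
    """Degree-weighted dimension from growth = (dim Delta^1, ..., dim Delta^s).

    Each new direction appearing at step i is counted with weight i.
    """
    Q, previous = 0, 0
    for i, dim in enumerate(growth, start=1):
        Q += i * (dim - previous)
        previous = dim
    return Q
```

For the Heisenberg group the dimensions are (2, 3) and the function returns 4, in agreement with Remark 2.12; in the Riemannian case, where \(\Delta ^1_p\) is the whole tangent space, it returns the topological dimension.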

This class is generic as any subRiemannian manifold has a dense open subset on which the restriction of the subRiemannian metric is equiregular.

Example 3.5

Systems of free vector fields, as defined in Definition 5.4, yield a distribution \(\Delta \) that supports an equiregular subRiemannian structure for any choice of the horizontal metric g.

Example 3.6

An analytic Lie group G is called a homogeneous stratified Lie group if its Lie algebra admits a stratification \(\mathcal {G}= V^1\oplus \cdots \oplus V^r\) with \([V^i,V^j]=V^{i+j}\) and \([V^i, V^r]=0\). Given a positive definite bilinear form \(g_0\) on \(V^1\) we call the pair \((G,g_0)\) a Carnot group, and the corresponding left invariant metric \(g_0\) is an equiregular sub-Riemannian metric.

Next we assume we have an equiregular subRiemannian manifold \((\Omega , \Delta , g)\) and consider an orthonormal horizontal basis \(X_1,\ldots ,X_m\) of \(\Delta \). Following the process in (2.2) one can construct a frame \(Y_1,\ldots ,Y_n\) for \(\mathbb {R}^n\) where \(Y_1,\ldots ,Y_m\) is the original horizontal frame and \(Y_{m+1},\ldots ,Y_{n}\) are commutators such that \((Y_1,\ldots ,Y_{m_k})|_p\) spans \(\Delta ^k_p\), for \(k=1,\ldots ,s\). The degree d(i) of \(Y_i\) is the order of commutators needed to generate \(Y_i\) out of the horizontal span, i.e. \(d(i)=k\) if \(Y_i\in \Delta ^k_p\) but \(Y_i \notin \Delta _p^{k-1}\). In particular one has \(d(i)=1\) for \(i=1,\ldots ,m\). The equiregularity hypothesis allows one to choose \(Y_1,\ldots ,Y_n\) linearly independent. Next we extend g to a Riemannian metric \(g_1\) on all of \(T\Omega \) by imposing that \(Y_1,\ldots ,Y_n\) is an orthonormal basis.

Definition 3.7

For any \(\epsilon \in (0,\bar{\epsilon }]\) we define the Riemannian metric \(g_\epsilon \) by requiring that \(\{ \epsilon ^{d(i)-1} Y_i,\,i=1,\ldots ,n\}\) be an orthonormal frame. Denote by \(d_\epsilon (x,y)\) the corresponding Riemannian distance function.

Remark 3.8

Repeating the proof of [70, Theorem 4] one immediately sees that \(d_\epsilon \) as defined here is comparable to the distance \(d_\epsilon \) defined in Sect. 2.2, with equivalence constants independent of \(\epsilon >0\).

We define canonical coordinates around a point \(x_0\in \Omega \) as follows. Since \(Y_1,\ldots ,Y_n\) is a generating frame for \(T\Omega \), for any point x in a neighborhood \(\omega \) of \(x_0\) there exists a unique n-tuple \((x_1,\ldots ,x_n)\) such that

$$\begin{aligned} \exp \left( \sum _{i=1}^n x_i Y_i\right) (x_0)=x. \end{aligned}$$
(3.9)

We will set \(x=(x_1,\ldots ,x_n)\) and use this n-tuple as local coordinates in \(\omega \).

Definition 3.9

For every \(x=(x_1,\ldots ,x_n)\in \omega \) we define a pseudo-distance \(d_{G,\epsilon }(x,x_0):=N_\epsilon (x_1,\ldots ,x_n)\) with

$$\begin{aligned} N_\epsilon (x_1,\ldots ,x_n):= \sqrt{\sum _{i=1}^m x_i^2 }+ \sum _{i=m+1}^n \min \left( \epsilon ^{-(d(i)-1)} |x_i|, |x_i|^{1/d(i)} \right) . \end{aligned}$$
(3.10)

For \(\epsilon =0\) we set

$$\begin{aligned} N_0(x_1,\ldots ,x_n):=\sqrt{\sum _{i=1}^m x_i^2 }+ \sum _{i=m+1}^n |x_i|^{1/d(i)}. \end{aligned}$$
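The gauge in (3.10) can be evaluated directly once the canonical coordinates and the degrees d(i) are known. The sketch below is a minimal illustration under that assumption; the function name `gauge` and its argument layout are hypothetical, and the coordinates are supplied by hand rather than computed from an exponential map.

```python
import math

def gauge(x, degrees, m, eps):
    """Evaluate N_eps from (3.10); passing eps = 0 gives N_0.

    x       : canonical coordinates (x_1, ..., x_n)
    degrees : d(i) for i = 1, ..., n (the first m entries equal 1)
    m       : number of horizontal directions
    """
    total = math.sqrt(sum(xi * xi for xi in x[:m]))
    for xi, d in zip(x[m:], degrees[m:]):
        sub_riemannian = abs(xi) ** (1.0 / d)
        if eps > 0:
            riemannian = abs(xi) / eps ** (d - 1)
            total += min(riemannian, sub_riemannian)
        else:
            total += sub_riemannian
    return total
```

For a coordinate of degree d(i), the minimum in (3.10) is attained by \(|x_i|^{1/d(i)}\) exactly when \(|x_i|^{1/d(i)}\ge \epsilon \). At scales larger than \(\epsilon \) the gauge therefore agrees with \(N_0\), and at smaller scales it is Riemannian in nature.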

Theorem 3.10

For every compact set \(K\subset \omega \) with \(x_0\in K\) there exists \(C=C(K,\Delta , g, \omega )>0\), independent of \(\epsilon \in (0,\bar{\epsilon }]\), such that

$$\begin{aligned} C^{-1} d_{G,\epsilon }(x,x_0) \le d_{\epsilon }(x,x_0) \le C d_{G,\epsilon }(x,x_0) \end{aligned}$$

for all \(x\in K\).

Remark 3.11

Note that for \(\epsilon =0\) the equivalence is a direct consequence of the Ball-Box theorem proved by Nagel et al. [70] or Mitchell [65, Lemma 3.4]. This observation replaces the estimates (2.6) from the Heisenberg group setting.

Theorem 3.10 follows as a corollary of the following proposition.

Proposition 3.12

Under the hypotheses of Theorem 3.10 there exist \(R=R(K,\Delta , g, \omega )>0\) and \(C=C(K,\Delta , g, \omega )>0\), independent of \(\epsilon \in (0,\bar{\epsilon }]\), such that for all \(x\in K\) and \(r\in (0,R)\),

$$\begin{aligned} B_{G,\epsilon }(x_0, C^{-1} r) \subset B_{\epsilon }(x_0,r) \subset B_{G,\epsilon }(x_0, C r), \end{aligned}$$

where

$$\begin{aligned} B_{G,\epsilon }(x_0, r) :=\left\{ x\in \mathbb {R}^n \text { such that } \max _{i=1,\ldots ,n} \left[ \min \left( \epsilon ^{-(d(i)-1)} |x_i|, |x_i|^{1/d(i)} \right) \right] < r \right\} . \end{aligned}$$

Proof

The proof follows closely the arguments in the previous section and is based on the results in [70]. In view of the equiregularity hypothesis note that \(Y_1,\ldots ,Y_n\) are linearly independent and the construction in (2.4) yields the distribution \(X_1^\epsilon ,\ldots ,X_{2n-m}^\epsilon \) over \(\Omega \). Recall from (5.14), Proposition 3.3 and Theorem 3.1 that if \(I_\epsilon , J_\epsilon \) are chosen as in (3.1) and for any \(v=(v_1,\ldots ,v_{n-m})\) such that \(|v_k| \le C_{2 \epsilon }r^{d(i_{\epsilon k})}\), one has

$$\begin{aligned} B_\epsilon (x_0,r) \approx \Phi _{\epsilon ,v,x_0} (Q_\epsilon (r)), \end{aligned}$$
(3.11)

with constants independent of \(\epsilon \ge 0\), where \(Q_\epsilon (r)=\{u\in \mathbb {R}^n: |u_j| \le r^{d_\epsilon ( i_{\epsilon j})}\}\), and

$$\begin{aligned} \Phi _{\epsilon , v, x}(u) = \exp \left( \sum _{i_{\epsilon , j}\in I_\epsilon } u_j X^\epsilon _{i_{\epsilon , j}} + \sum _{i_{\epsilon , k}\in J_\epsilon } v_k X^\epsilon _{i_{\epsilon , k}} \right) (x). \end{aligned}$$

The n-tuple \(I_\epsilon \) contains n indexes related either to the horizontal vector fields \(X_1^\epsilon , \ldots , X_m^\epsilon \) or to the commutators \(X_{m+1}^\epsilon ,\ldots ,X_n^\epsilon \). The latter may consist of weighted versions \(X_{m+1}^\epsilon ,\ldots , X_n^\epsilon \) or unweighted versions \(X_{n+1}^\epsilon , \ldots , X_{2n-m}^\epsilon \). In either case the same vector will appear both in the weighted and in the unweighted version (either among the \(I_\epsilon \) indexes or in the complement \(J_\epsilon \)). Comparing the representation \(\Phi _{\epsilon ,v,x_0}\) with the x-coordinates representation (3.9) one has

$$\begin{aligned} \exp \left( \sum _{i=1}^n x_i Y_i\right) (x_0)= \exp \left( \sum _{i_{\epsilon , j}\in I_\epsilon } u_j X^\epsilon _{i_{\epsilon , j}} + \sum _{i_{\epsilon , k}\in J_\epsilon } v_k X^\epsilon _{i_{\epsilon , k}} \right) (x_0), \end{aligned}$$

and we let for each \(k=1,\ldots ,n\)

$$\begin{aligned} x_k= {\left\{ \begin{array}{ll} \epsilon ^{d(k)-1} u_{i_k} + v_{j_k} &{} \text { if } i_k\le n \\ u_{i_k}+ \epsilon ^{d(k)-1} v_{j_k} &{} \text { if } i_k>n \end{array}\right. }. \end{aligned}$$

From the latter we obtain that for all \(k=1,\ldots ,n\)

$$\begin{aligned} |x_k| \le C\left( \epsilon ^{d(k)-1} r + r^{d(k)}\right) . \end{aligned}$$

If \(x\in B_\epsilon (x_0,r)\) then \(|u_{i_k}|, |v_{j_k}|\le C r^{d(k)}\). Consequently,

$$\begin{aligned}&\min ( \epsilon ^{-(d(k)-1)} |x_k|, |x_k|^{1/d(k)} ) \\&\quad \le C \min \left( \epsilon ^{-(d(k)-1)} |\epsilon ^{d(k)-1} r + r^{d(k)}|, \bigg [ \epsilon ^{d(k)-1} r + r^{d(k)}\bigg ]^{1/d(k)} \right) \\&\quad \le C \min \left( r \bigg [ 1+ \bigg (\frac{r}{\epsilon }\bigg )^{d(k)-1}\bigg ], \ \ r \bigg [ \bigg (\frac{\epsilon }{r}\bigg )^{d(k)-1} + 1\bigg ]^{1/d(k)} \right) \le 2C r. \end{aligned}$$

This shows that for \(r>0\) sufficiently small, and for some choice of \(C>0\) independent of \(\epsilon \ge 0\), we have \(B_{\epsilon }(x_0,r) \subset B_{G,\epsilon }(x_0, C r)\).
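The last step of the estimate rests on the elementary bound \(\min \big (r[1+(r/\epsilon )^{d-1}],\, r[(\epsilon /r)^{d-1}+1]^{1/d}\big )\le 2r\) for every integer \(d\ge 1\): if \(r\le \epsilon \) the first argument is at most 2r, and if \(r>\epsilon \) the second is at most \(2^{1/d}r\). The following minimal sketch checks this numerically on a small grid; the function names and the sample values of r, \(\epsilon \) and d are illustrative assumptions.

```python
def two_bounds(r, eps, d):
    """Return the two arguments of the minimum, for a coordinate of degree d."""
    first = r * (1.0 + (r / eps) ** (d - 1))
    second = r * ((eps / r) ** (d - 1) + 1.0) ** (1.0 / d)
    return first, second

def min_is_at_most_2r(r, eps, d):
    """Check min(first, second) <= 2r, with a small floating point allowance."""
    first, second = two_bounds(r, eps, d)
    return min(first, second) <= 2.0 * r * (1.0 + 1e-12)

# Sample grid covering both regimes r <= eps and r > eps.
checks = [min_is_at_most_2r(r, eps, d)
          for d in (2, 3, 4)
          for r in (1e-3, 1e-2, 1e-1, 0.5)
          for eps in (1e-4, 1e-3, 1e-2, 1e-1, 1.0)]
```

Neither argument is bounded by a multiple of r on its own: the first blows up as \(\epsilon /r\rightarrow 0\) and the second as \(\epsilon /r\rightarrow \infty \). Taking the minimum is what makes the constant independent of \(\epsilon \).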

To prove the reverse inclusion we consider a point \(x=\exp ( \sum _{i=1}^n x_i Y_i)(x_0)\in B_{G,\epsilon }(x_0, C r)\). Select \(I_\epsilon \) as in (3.1) and set \(v=0\) to represent x in the basis \(X_{i_1},\ldots ,X_{i_n}\) as

$$\begin{aligned} x=\exp \left( \sum _{i_{\epsilon , j}\in I_\epsilon } u_j X^\epsilon _{i_{\epsilon , j}} \right) (x_0). \end{aligned}$$

In view of Theorem 3.1, and (3.11), to prove the proposition it suffices to show that there exists a constant \(C>0\) independent of \(\epsilon >0\) such that for each \(j=1,\ldots ,n\) one has \(|u_j|\le Cr^{d_\epsilon (i_{\epsilon j})}\).

We distinguish two cases: In the range \(\epsilon \ge 2r\) one can argue as in (3.4) to deduce that for each \(j=1,\ldots ,n\) we may assume without loss of generality that the contribution due to \(u_jX^\epsilon _{i_{\epsilon , j}}\) follows from the choice of a weighted vector, and hence is of the form \(u_j \epsilon ^{d(k)-1} Y_k\) for some \(k>m\). Consequently one has \(d_\epsilon (i_{\epsilon , j})=1\) and \(x_k=u_j \epsilon ^{d(k)-1}\).

On the other hand, since \(\epsilon \ge 2r \) then one must also have that

$$\begin{aligned} \min \left( \epsilon ^{-(d(k)-1)} |x_k|, |x_k|^{1/d(k)}\right) = \epsilon ^{-(d(k)-1)} |x_k| <r. \end{aligned}$$

Consequently one has

$$\begin{aligned} |u_j|= |x_k| \epsilon ^{1-d(k)} \le r= r^{d_\epsilon (i_j)}. \end{aligned}$$

In the range \(\epsilon <2r\) we observe that one must have \(|x_k|\le Cr^{d(k)}\). Arguing as in (3.4) we see that without loss of generality, for each \(j=1,\ldots ,n\), the contribution due to \(u_jX^\epsilon _{i_{\epsilon , j}}\) follows from the choice of an unweighted vector, and hence is of the form \(u_j Y_k\) for some \(k>m\). Consequently one has \(d_\epsilon (i_j)=d(k)>1\) and \(x_k=u_j \), concluding the proof. \(\square \)

4 Stability of the Poincaré inequality

In this section we will focus on the Poincaré inequality and prove that it holds with a choice of a constant which is stable as \(\epsilon \rightarrow 0\). Our argument rests on results of Lanconelli and Morbidelli [59] whose proof, in some respects, simplifies the method used by Jerison in [52]. Using some Jacobian estimates from [44] or [40] we will establish that the assumptions required in the key result [59, Theorem 2.1] are satisfied independently of \(\epsilon \ge 0\). We start by recalling

Theorem 4.1

[59, Theorem 2.1] Assume that the doubling condition (D) is satisfied and there exist a sphere \(B_\epsilon (x_0, r)\), a cube \(Q_\epsilon \subset \mathbb {R}^n\) and a map \(E: B_\epsilon (x_0, r)\times Q_\epsilon \rightarrow \mathbb {R}^n\) satisfying the following conditions:

  1. (i)

    \(B_\epsilon (x_0, 2 r)\subset E(x, Q_\epsilon )\)    for every \(x\in B_\epsilon (x_0, r)\)

  2. (ii)

    the function \(u \mapsto E(x, u)\) is one to one on the box \( Q_\epsilon \) as a function of the variable u and there exists a constant \(\alpha _1>0\) such that

    $$\begin{aligned} \frac{1}{\alpha _1} |JE(x,0)| \le |JE(x,u)| \le \alpha _1 |JE(x,0)| \quad \text { for every } u \in Q_\epsilon \end{aligned}$$

    Also assume that there exists a positive constant \(\alpha _2\), and a function \(\gamma : B_\epsilon (x_0, r) \times Q_\epsilon \times [0,\alpha _2 r]\rightarrow \mathbb {R}^n\) satisfying the following conditions

  3. (iii)

    For every \((x,u) \in B_\epsilon (x_0, r) \times Q_\epsilon \) the function \(t \mapsto \gamma (x,u,t)\) is a subunit path connecting x and \(E(x,u)\).

  4. (iv)

    For every \((u,t) \in Q_\epsilon \times [0,\alpha _2 r]\) the function \(x \mapsto \gamma (x,u,t)\) is a one-to-one map and there exists a constant \(\alpha _3>0\) such that

    $$\begin{aligned} \inf _{ B_\epsilon (x_0, r)\times Q_\epsilon } \Big |det \frac{\partial \gamma }{\partial x}\Big |\ge \alpha _3 \end{aligned}$$

Then there exists a constant \(C_P\) depending only on the constants \(\alpha _1, \alpha _2, \alpha _3\) and the doubling constant \(C_D\) such that (P) is satisfied.

We are now ready to prove Theorem 1.2

Proof

All one needs to establish is that the assumptions of Theorem 4.1 are satisfied uniformly in \(\epsilon \) on a metric ball. Apply Proposition 3.3 and Theorem 3.1 with \(K=B_\epsilon (x_0, r)\) and choose the constants \(C_i\) produced by these results. Set \(Q_\epsilon =Q_\epsilon (\frac{3 C_{1}}{C_2} r)\) and let

$$\begin{aligned} E(x,u)= \Phi _{\epsilon , 0,x}(u), \text { defined on } K \times Q_\epsilon \rightarrow \mathbb {R}^n. \end{aligned}$$

To establish assumption (i) of Theorem 4.1 it suffices to note that by virtue of condition (iii) in Theorem 3.1 one has that for \(x\in B_\epsilon (x_0, r)\),

$$\begin{aligned} B_\epsilon (x_0, 2 r)\subset B_\epsilon (x, 3 r)\subset E(x, Q_\epsilon ). \end{aligned}$$

Assumption (ii) in Theorem 4.1 is a direct consequence of condition (ii) in Theorem 3.1, with \(\alpha _1= 16\). Chow’s connectivity theorem implies that \(E(x,u)\) satisfies assumption (iii), with a function \(\gamma \), piecewise expressed as exponential mappings of vector fields of \(\epsilon \)-degree one. Let us denote by \((X^\epsilon _i)_{i\in I_\epsilon }\) the required vector fields. With this choice of path, it is known (see for example [44, Lemma 2.2] or [40, pp 99–101]) that \(x\mapsto \gamma (x,u,t)\) is a \(C^1\) map, with Jacobian determinant

$$\begin{aligned} \bigg |det \frac{\partial \gamma }{\partial x}(x,u,t)\bigg |= 1 + \psi (x,u,t), \end{aligned}$$

for a suitable function \(\psi (x,u,t)\) satisfying

$$\begin{aligned} |\psi (x,u,t)|\le cr, \text { on }K\times Q_\epsilon \times [0,cr]. \end{aligned}$$

Since the constant c depends solely on the Lipschitz constant of the vector fields \((X^\epsilon _i)_{i\in I_\epsilon }\) then it can be chosen independently of \(\epsilon \). As a consequence condition (iv) is satisfied and the proof is concluded. \(\square \)

5 Stability of heat Kernel estimates

5.1 Hörmander type parabolic operators in non divergence form

The results in this section concern uniform Gaussian estimates for the heat kernel of certain degenerate parabolic differential equations, and their parabolic regularizations. We will consider a collection of smooth vector fields \(X=(X_1, \ldots , X_m)\) satisfying Hörmander’s finite rank condition (1.1) in an open set \(\Omega \subset \mathbb {R}^n\). We will use throughout the section the definition of degree d(i) relative to the stratification (2.2).

A second order, non-divergence form, ultra-parabolic operator with constant coefficients \(a_{ij}\) can be expressed as:

$$\begin{aligned} L_A = \partial _t - \sum _{i,j=1}^ma_{ij} X_i X_j, \end{aligned}$$
(5.1)

where \(A=( a_{ij})_{ij=1, \ldots , m}\) is a symmetric, real-valued, positive definite \(m\times m\) matrix satisfying

$$\begin{aligned} \Lambda ^{-1} \sum _{d(i)=1} \xi _i^2 \le \sum _{i,j=1}^m a_{ij} \xi _i \xi _j \le \Lambda \sum _{d(i)=1} \xi _i^2 \end{aligned}$$
(5.2)

for a suitable constant \(\Lambda \). We will also call

$$\begin{aligned} M_{m, \Lambda }\text { the set of symmetric } m\times m \text { real-valued matrices satisfying } (5.2) \end{aligned}$$
(5.3)

If A is the identity matrix then the existence of a heat kernel for the operator \(L_A\) is a by now classical result due to Folland [37] and Rothschild and Stein [76]. Gaussian estimates have been provided by Jerison and Sanchez-Calle [53], and by Kusuoka and Stroock [58]. There is a broad, more recent literature dealing with Gaussian estimates for non divergence form operators with Hölder continuous coefficients \(a_{ij}\). Such estimates have been systematically studied in [79, 11] where a self-contained proof is provided.

A natural technique for studying the properties of the operator \(L_A\) is to consider a parabolic regularization induced by the vector fields \(X_i^\epsilon \) defined in (2.4). More precisely, we will define the operator

$$\begin{aligned} L_{\epsilon , A} = \partial _t - \sum _{i,j=1}^p a^\epsilon _{ij} X_i^\epsilon X_j^\epsilon \end{aligned}$$
(5.4)

where \(a^\epsilon _{i,j} \) is any \(p\times p\) positive definite matrix belonging to \( M_{p, 2 \Lambda }\) and such that

$$\begin{aligned} a^\epsilon _{i,j} = a_{i,j} \quad \text { for }i,j=1, \ldots , m. \end{aligned}$$

We will denote

$$\begin{aligned} M^\epsilon _{p, 2\Lambda } \end{aligned}$$
(5.5)

the set of such matrices. Formally, the operator \(L_{A}\) can be recovered as the limit as \(\epsilon \rightarrow 0\) of the operators \(L_{\epsilon , A}\). Here we are interested in understanding which properties of solutions of \(L_{\epsilon , A}\) are preserved in the limit.

For \(\epsilon >0\) consider a Riemannian metric \(g_\epsilon \) defined as in Remark 2.9, such that the vector fields \(X_i^\epsilon \) are orthonormal. The induced distance function \(d_\epsilon \) is biLipschitz equivalent to the Euclidean norm \(|\cdot |_E\). Consequently, the operator \(L_{\epsilon , A}\) has a fundamental solution \(\Gamma _{\epsilon , A}\), which can be estimated as

$$\begin{aligned} \Gamma _{\epsilon , A}(x)\le C_\epsilon \frac{e^{-\frac{|x|_{E}^2}{C_\epsilon t}}}{t^{n/2}} \end{aligned}$$
(5.6)

for some positive constant \(C_\epsilon \) depending on \(A,\epsilon \) and \(X_1,\ldots ,X_m\).

Unfortunately the constant \(C_\epsilon \) blows up as \(\epsilon \) approaches 0, so the Riemannian estimate (5.6) alone does not provide Gaussian bounds of the fundamental solution \(\Gamma _A\) of the limit operator (5.1) as \(\epsilon \) goes to 0. In [57] the elliptic regularization technique has been used to obtain \(L^p\) and \(C^\alpha \) regularity of the solutions, which however are far from being optimal. In [27], new estimates uniform in \(\epsilon \) have been provided in the time independent setting, which are optimal with respect to the decay of the limit operator. In [17] the result has been extended to parabolic operators, in the special case of Carnot groups.

In order to further extend these estimates, we need to formulate the following definition:

Definition 5.1

(Definition of the \({\mathcal {E}}(2, d_\epsilon , M^\epsilon _{2\Lambda })\) spaces) We define \({\mathcal {E}}(2+h, d_\epsilon , M^\epsilon _{p, 2\Lambda })\) to be the set of all kernels \((P_{\epsilon , A})_{\epsilon >0, A\in M^\epsilon _{p, 2\Lambda }}\), defined on \(\mathbb {R}^{2n}\times ]0, \infty [ \), that have an exponential decay of order \(2 + h\), uniformly with respect to a family of distances \((d_\epsilon )_\epsilon \) and of matrices \(A\in M^\epsilon _{p, 2\Lambda }\) (see (5.5)), on compact subsets of an open set \(\Omega \). More precisely, we will say that \(P_{\epsilon , A}\in {\mathcal {E}}(2+h, d_\epsilon , M^\epsilon _{p, 2\Lambda })\) if the following three conditions hold:

  • For every \(K\subset \subset \Omega \) there exists a constant \(C_\Lambda >0\) depending on \(\Lambda \) but independent of \(\epsilon >0\), and of the matrix \(A\in M^\epsilon _{p, 2\Lambda }\) such that for each \(\epsilon >0,\,x,y\in K\) and \(t>0\) one has

    $$\begin{aligned} C_\Lambda ^{-1} \frac{t^\frac{h}{2}e^{-C_\Lambda \frac{d_\epsilon (x,y)^2}{t}}}{|B_\epsilon (x, \sqrt{t})|}\le P_{\epsilon , A}(x,y,t)\le C_\Lambda \frac{t^\frac{h}{2} e^{-\frac{d_\epsilon (x,y)^2}{C_\Lambda t}}}{|B_\epsilon (x, \sqrt{t})|}. \end{aligned}$$
    (5.7)
  • For \(s\in \mathbb {N}\) and k-tuple \((i_1,\ldots ,i_k)\in \{1,\ldots ,m\}^k\) there exists a constant \(C_{s,k}>0\) depending only on \(k,s,X_1,\ldots ,X_m,\Lambda \) such that

    $$\begin{aligned} \left| (\partial _t^s X_{i_1}\cdots X_{i_k} P_{\epsilon , A})(x,y,t)\right| \le C_{s,k} \frac{t^\frac{h-2s-k}{2} e^{-\frac{d_\epsilon (x,y)^2}{C_\Lambda t}}}{|B_\epsilon (x, \sqrt{t})|} \end{aligned}$$
    (5.8)

    for all \(x,y\in K\) and \(t>0\).

  • For any \(A_1,A_2\in M_\Lambda ,\,s\in \mathbb {N}\) and k-tuple \((i_1,\ldots ,i_k)\in \{1,\ldots ,m\}^k\) there exists \(C_{s,k}>0\) depending only on \(k,s,X_1,\ldots ,X_m, \Lambda \) such that

    $$\begin{aligned}&|(\partial _t^s X_{i_1}\cdots X_{i_k} P_{\epsilon , A_1})(x,y,t) - (\partial _t^s X_{i_1}\cdots X_{i_k} P_{\epsilon , A_2})(x,y, t) |\nonumber \\&\quad \le ||A_1 - A_2||C_{s,k} \frac{t^\frac{h-2s-k}{2} e^{-\frac{d_\epsilon (x,y)^2}{C_\Lambda t}}}{|B_\epsilon (x, \sqrt{t})|}, \end{aligned}$$
    (5.9)

    where \(||A||^2:=\sum _{i,j=1}^n a_{ij}^2\).

With these notations we will now extend all these previous results to vector fields which only satisfy the Hörmander condition, establishing estimates which are uniform in the variable \(\epsilon \) as \(\epsilon \rightarrow 0\), and in the choice of the matrix \(A\in M^\epsilon _{2\Lambda }\) for the fundamental solutions \(\Gamma _{\epsilon , A}\) of the operators \(L_{\epsilon , A}\). To be more specific, we will prove:

Proposition 5.2

The fundamental solution \(\Gamma _{\epsilon , A}\) of the operator \(L_{\epsilon , A}\) is a kernel with exponential decay of order 2, uniform with respect to \(\epsilon >0\) and to \(A\in M^\epsilon _{m,\Lambda }\), according to Definition 5.1. Hence it belongs to the set \({\mathcal {E}}(2, d_\epsilon , M^\epsilon _{2\Lambda }) \). Moreover, if \(\Gamma _A\) is the fundamental solution of the operator \(L_{A}\) defined in (5.1) one has

$$\begin{aligned} {X}^\epsilon _{i_1}\cdots { X}^\epsilon _{i_k} \partial _t^s \Gamma _{\epsilon , A}\rightarrow {X}_{i_1}\cdots {X}_{i_k}\partial _t^s \Gamma _{A} \end{aligned}$$
(5.10)

as \(\epsilon \rightarrow 0\) uniformly on compact sets and in a dominated way on subcompacts of \(\Omega \).

Our main contribution is that all the constants are independent of \(\epsilon \). The proof of this assertion is based on a lifting procedure, which allows one to express the fundamental solution of the operator \(L_{\epsilon , A}\) in terms of the fundamental solution of a new operator \({\bar{L}}_{A}\) independent of \(\epsilon \). The lifting procedure consists of a first step in which we apply the delicate Rothschild and Stein lifting technique [76]. After that, when the vector fields are free up to a specific step, we apply a second lifting which was introduced in [27], where the time independent case was studied, and in [17], where the Carnot group setting is considered.

The simplest example of such an equation is the heat equation associated to the Kohn Laplacian in the Heisenberg group, \(\partial _t - X_1^2 -X_2^2\), where the vector fields \(X_1\) and \(X_2\) have been expressed in coordinates in Example 2.1. In order to present our approach we will give an outline of the proof in this special setting.

Example 5.3

Denote by \((x_1, x_2, x_3)\) points of \(\mathbb {R}^3\), let \(X_1, X_2, X_3\) be the vector fields defined in Example 2.1, and let I denote the identity matrix. Consider the parabolic operator

$$\begin{aligned} L_{\epsilon , I} = - \partial _t + X_1^2 + X_2 ^2 + \epsilon ^2 X_3^2, \end{aligned}$$

and note that it becomes degenerate parabolic as \(\epsilon \rightarrow 0\). Let \(d_\epsilon \) denote the Carnot–Caratheodory distance associated to the distribution \(X_1,X_2, \epsilon X_3\).

In order to handle such degeneracy we introduce new variables \((z_1, z_2, z_3)\) and a new set of vector fields replicating the same structure of the initial ones, i.e.,

$$\begin{aligned} \hat{Z}_1= \partial _{z_1} + z_2 \partial _{z_3},\quad \hat{Z}_2= \partial _{z_2} - z_1 \partial _{z_3},\quad \hat{Z}_3 = \partial _{z_3} \end{aligned}$$

with \((x_1,x_2,x_3, z_1,z_2,z_3)\in {\mathbb {H}}^1\times {\mathbb {H}}^1\). The next step consists in lifting \(L_{\epsilon ,I}\) to an operator

$$\begin{aligned} \bar{L}_\epsilon = - \partial _t + X_1^2 + X_2 ^2 + Z_1^2 + Z_2^2 + \left( Z_3 + \epsilon X_3\right) ^2 , \end{aligned}$$

defined on \({\mathbb {H}}^1\times {\mathbb {H}}^1\), and denote by \(\bar{\Gamma }_\epsilon \) its fundamental solution. Let \(\bar{d}_\epsilon \) denote the Carnot–Caratheodory distance generated by \(X_1,X_2,Z_1,Z_2, (Z_3+\epsilon X_3)\) and arguing as in (5.22) note that \(\bar{d}_\epsilon ((x,z), (y,z))\ge d_\epsilon (x,y)-C_0\), for some constant \(C_0\) independent of \(\epsilon \). Consider the change of variables on the Lie algebra of \({\mathbb {H}}^1\times {\mathbb {H}}^1\),

$$\begin{aligned} X_i\rightarrow X_i, Z_i\rightarrow Z_i, \quad \text { for }i=1,2, Z_3 + \epsilon X_3\rightarrow Z_3. \end{aligned}$$

Note that the Jacobian of such change of variables does not depend on \(\epsilon \) and that it reduces the operator \(\bar{L}_\epsilon \) to

$$\begin{aligned} \bar{L} = - \partial _t + X_1^2 + X_2 ^2 + Z_1^2 + Z_2^2 + Z_3^2 \end{aligned}$$

whose fundamental solution we denote by \(\bar{\Gamma }\). Note that this operator is parabolic with respect to the vector fields \(Z_i\) and degenerate parabolic with respect to the vector fields \(X_i\). It is clear that the operator \( \bar{L} \) is independent of \(\epsilon \), and consequently its fundamental solution \( \bar{\Gamma }\) satisfies standard Gaussian estimates with constants independent of \(\epsilon \):

$$\begin{aligned} \bar{\Gamma }(x,t)\le C_\Lambda \frac{e^{-\frac{\bar{d} (x,0)^2}{C_\Lambda t}}}{|\bar{B} (0, \sqrt{t})|}, \end{aligned}$$

where \(\bar{d}\) denotes the Carnot–Caratheodory distance in \({\mathbb {H}}^1\times {\mathbb {H}}^1\) generated by the distribution of vector fields \(X_1,X_2,Z_1,Z_2,Z_3\). Changing back to the original variables we see that \(\bar{\Gamma }_\epsilon \) also satisfies analogous estimates with the same constants, with the distance \(\bar{d}\) replaced by the distance \(\bar{d}_\epsilon \) naturally associated to the operator \(\bar{L}_\epsilon \). Finally, integrating with respect to the added variables \((z_1, z_2, z_3)\), we obtain a uniform bound for the fundamental solution of the operator \(L_{\epsilon ,I}\) in terms of the distance \(d_\epsilon \).

5.2 The Rothschild–Stein freezing and lifting theorems

Let us first recall a local lifting procedure introduced by Rothschild and Stein in [76] which, starting from a family \((X_i)_{i=1, \ldots , m}\) of Hörmander type vector fields of step s in a neighborhood of \(\mathbb {R}^n\), leads to the construction of a new family of vector fields which are free, and of Hörmander type with the same step s, in a neighborhood of a larger space. The projection of the new free vector fields on \(\mathbb {R}^n\) yields the original vector fields, and that is why they are called liftings.

Let us start with some definitions:

Definition 5.4

Denote by \(n_{m,s}\) the dimension (as a vector space) of the free nilpotent Lie algebra with m generators and step s. Let \(X_1,\ldots ,X_m\) be a set of smooth vector fields defined in an open neighborhood of a point \(x_0 \in \mathbb {R}^n\), and let

$$\begin{aligned} V^{(r)} = span \left\{ X^{(1)}, \ldots , X^{(r)}\right\} , \end{aligned}$$

where the sets \(X^{(j)}\) are as defined in (2.2). We shall say that \(X_1,\ldots ,X_m\) are free up to step s if for any \(1\le r\le s\) we have \(n_{m,r}= dim(V^{(r)})\).
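The dimension of the free nilpotent Lie algebra with m generators and step s can be computed explicitly. The sketch below assumes Witt's classical formula, which is not stated in the text: the layer of degree k of the free Lie algebra on m generators has dimension \(\frac{1}{k}\sum _{d|k}\mu (d)\,m^{k/d}\), where \(\mu \) is the Möbius function, and the step-s dimension is the sum of the layers of degree \(1,\ldots ,s\). The function names are hypothetical.

```python
def mobius(k):
    """Moebius function mu(k), computed by trial division."""
    result = 1
    p = 2
    while p * p <= k:
        if k % p == 0:
            k //= p
            if k % p == 0:
                return 0  # k has a squared prime factor
            result = -result
        p += 1
    if k > 1:
        result = -result  # one remaining prime factor
    return result

def free_layer_dim(m, k):
    """Dimension of the degree-k layer of the free Lie algebra on m generators."""
    total = sum(mobius(d) * m ** (k // d)
                for d in range(1, k + 1) if k % d == 0)
    return total // k

def free_nilpotent_dim(m, s):
    """Dimension of the free nilpotent Lie algebra with m generators and step s."""
    return sum(free_layer_dim(m, k) for k in range(1, s + 1))
```

For instance, two generators and step 2 give dimension 3, which is the dimension of the Heisenberg algebra, while two generators and step 3 give dimension 5.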

If a point \(x_0 \in \mathbb {R}^n\) is fixed, the lifting procedure of Rothschild-Stein locally introduces new variables \(\tilde{z} \) and new vector fields \((\tilde{Z}_i)\) expressed in terms of the new variables such that in a neighborhood U of \(x_0\) the vector fields \(\tilde{X}_i = X_i + \tilde{Z}_i\), \(i=1,\ldots , m\), are free up to step s. More precisely, one has [76, Theorem 4]

Theorem 5.5

Let \(X_1,\ldots ,X_m\) be a system of smooth vector fields, satisfying (1.1) in an open set \(U\subset \mathbb {R}^n\). For any \(x \in U\) there exists a connected open neighborhood of the origin \(V\subset \mathbb {R}^{\nu - n}\), and smooth functions \(\lambda _{ij}(x,\tilde{z})\), with \(x\in \mathbb {R}^n\) and \(\tilde{z}=(z_{n+1},\ldots ,z_{\nu })\in V\), defined in a neighborhood \(\tilde{U}\) of \(\tilde{x}=(x,0) \in U\times V \subset \mathbb {R}^\nu \), such that the vector fields \(\tilde{X}_1,\ldots ,\tilde{X}_m\) given by

$$\begin{aligned} \tilde{X}_i=X_i + \tilde{Z}_i, \quad \tilde{Z}_i = \sum _{j=n+1}^{\nu } \lambda _{ij}(x,\tilde{z})\partial _{z_j} \end{aligned}$$

are free up to step r at every point in \(\tilde{U}\).

Remark 5.6

In the literature the lifting procedure described above is often coupled with another key result introduced in [76], a nilpotent approximation which is akin to the classical freezing technique for elliptic operators. Let us explicitly note that in Sect. 5.3 we need only apply the lifting theorem mentioned above, and not the freezing procedure. In particular, the so-called Grushin vector fields

$$\begin{aligned} X_3=\partial _{x_1} \text { and }X_4=x_1 \partial _{x_2} \end{aligned}$$

would need to be lifted through this procedure to the Heisenberg group structure

$$\begin{aligned} X_3=\partial _{x_1} \text { and }X_4=\partial _{x_3}+x_1 \partial _{x_2}. \end{aligned}$$

On the other hand the vector fields

$$\begin{aligned} X_1=\cos \theta \partial _{x_1}+\sin \theta \partial _{x_2} \quad \text {and} \quad X_2=\partial _{\theta } \end{aligned}$$

will be unchanged by the lifting process, since they are already free up to step 2.
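The claim about the last pair of vector fields can be checked by hand: \([X_1,X_2]=\sin \theta \,\partial _{x_1}-\cos \theta \,\partial _{x_2}\), so \(X_1, X_2, [X_1,X_2]\) are linearly independent at every point and span a space of dimension 3, the dimension of the free nilpotent Lie algebra with two generators and step 2. The minimal sketch below verifies this numerically at a sample point. The bracket is approximated by central differences, and the helper names and the sample point are illustrative assumptions.

```python
import math

def x1(p):
    """X_1 = cos(theta) d/dx1 + sin(theta) d/dx2 at p = (x1, x2, theta)."""
    _, _, theta = p
    return (math.cos(theta), math.sin(theta), 0.0)

def x2(p):
    """X_2 = d/dtheta."""
    return (0.0, 0.0, 1.0)

def directional_derivative(field, p, v, h=1e-6):
    """Central difference approximation of D(field)(p) applied to v."""
    plus = field(tuple(pi + h * vi for pi, vi in zip(p, v)))
    minus = field(tuple(pi - h * vi for pi, vi in zip(p, v)))
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

def lie_bracket(f, g, p):
    """[f, g](p) = Dg(p) f(p) - Df(p) g(p)."""
    dg_f = directional_derivative(g, p, f(p))
    df_g = directional_derivative(f, p, g(p))
    return tuple(a - b for a, b in zip(dg_f, df_g))

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

p = (0.3, -1.2, 0.7)                      # arbitrary sample point
bracket = lie_bracket(x1, x2, p)          # approximates (sin 0.7, -cos 0.7, 0)
determinant = det3(x1(p), x2(p), bracket) # equals cos^2 + sin^2 = 1 exactly
```

The determinant is identically 1, independent of the point, so the frame \(X_1, X_2, [X_1,X_2]\) never degenerates.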

Later on, in Sect. 5.4 we will apply Rothschild and Stein’s freezing theorem to a family of vector fields \(X_1,\ldots ,X_m\) free up to step r. This will allow us to approximate a given family of vector fields with homogeneous ones. Note that in this case the function \(\Phi \) in (5.14) is independent of v and its expression reduces to:

$$\begin{aligned} \Phi _{x}(u) = \exp \left( \sum _{i} u_i X_{i} \right) (x). \end{aligned}$$
(5.11)

The pertinent theorem from [76] is the following,

Theorem 5.7

Let \(X_1,\ldots ,X_m\) be a family of vector fields that are free up to step r at every point. Then for every x there exists a neighborhood V of x and a neighborhood U of the identity in \(G_{m,r}\), such that:

  1. (a)

    the map \(\Phi _{x}: U\rightarrow V \) is a diffeomorphism onto its image. We will call \(\Theta _x\) its inverse map

  2. (b)

    we have

    $$\begin{aligned} d \Theta _x (X_i)= Y_i+R_i, \quad \ i=1,\ldots ,m \end{aligned}$$
    (5.12)

    where \(R_i\) is a vector field of local degree less than or equal to zero, depending smoothly on x.

Hence the operator \(R_i\) will be represented in the form:

$$\begin{aligned} R_i= \sum _{jh} \sigma _{i}(u) X_i, \end{aligned}$$

where \(\sigma \) is a homogeneous polynomial of degree \(d(X_i)-1.\)

5.3 A lifting procedure uniform in \(\epsilon \)

So far we have started with a set of Hörmander vector fields \(X_1,\ldots ,X_m\) in \(\Omega \subset \mathbb {R}^n\) and we have lifted them through Theorem 5.5 to a set \(\tilde{X}_1,\ldots ,\tilde{X}_m\) of Hörmander vector fields that are free up to a step s in a neighborhood \(\tilde{\Omega }\subset \mathbb {R}^\nu \). Next, we perform a second lifting inspired by the work in [17]. We will consider the augmented space \( \mathbb {R}^{\nu }\times \mathbb {R}^\nu \) defined in terms of \(\nu \) new coordinates \(\hat{z}=(\hat{z}_1,\ldots ,\hat{z}_\nu )\). Set \(z=(\tilde{z}, \hat{z})\) and denote points of \(\mathbb {R}^{\nu }\times \mathbb {R}^{\nu }\) by \(\bar{x} =(x, \tilde{z}, \hat{z}) =(x,z)\). Denote by \(\hat{Z}_1,\ldots , \hat{Z}_m\) a copy of the family \(\tilde{X}_1,\ldots ,\tilde{X}_m\), i.e. a family of vector fields free up to step s in the variables \(\hat{z}\), and let

$$\begin{aligned} \hat{Z}_{m+1}, \ldots \hat{Z}_\nu \end{aligned}$$

denote the complete set of their commutators, as we did in (2.2). Note that the subRiemannian structure generated by \(\hat{Z}_1,\ldots ,\hat{Z}_m\) coincides with the structure generated by the family \(\tilde{X}_i\), but is defined in terms of the new variables \(\hat{z}\).

For every \(\epsilon \in [0,1)\) consider a sub-Riemannian structure determined by the choice of horizontal vector fields given by

$$\begin{aligned} \left( \bar{X}_1^\epsilon , \ldots \bar{X}^\epsilon _{m + \nu }\right) = \left( \tilde{X}_1,\ldots , \tilde{X}_m, \hat{Z}_1,\ldots , \hat{Z}_m, \tilde{X}_{m+1}^\epsilon +\hat{Z}_{m+1},\ldots , \tilde{X}_{\nu }^\epsilon +\hat{Z}_{\nu }\right) . \end{aligned}$$
(5.13)

Since the space is free up to step r the function \(\Phi \) in (5.14) is independent of v and its expression reduces to:

$$\begin{aligned} \Phi _{\epsilon , \bar{x}}(u) = \exp \left( \sum _{i} u^\epsilon _i \bar{X}^\epsilon _{i} \right) (\bar{x}). \end{aligned}$$
(5.14)

In the sequel, when we will need to explicitly indicate the vector fields defining \(\Phi \) we will also use the notation:

$$\begin{aligned} \Phi _{\epsilon , \bar{x}, \bar{X}^\epsilon }(u) = \Phi _{\epsilon , \bar{x}}(u), \text { and } \Phi _{\epsilon , \bar{x}, \bar{X}^\epsilon i}(u) \text { its components,} \end{aligned}$$
(5.15)

and analogous notations will be used for the inverse map \(\Theta _{\epsilon , \bar{x}, \bar{X}^\epsilon }\)

For every \(\epsilon > 0\) and \(\bar{x}, \bar{x}_0\), in view of Theorem 3.10 the associated ball-box distances reduce to:

$$\begin{aligned} \bar{d}_\epsilon (\bar{x}, \bar{x}_0)= \sum _{i=1}^{2m}|u^\epsilon _i| + \sum _{i=2m+1}^{\nu +m} \min \left( |u^\epsilon _i|, |u^\epsilon _i|^{1/d(i)} \right) + \sum _{i=\nu +m+1}^{2\nu }| u^\epsilon _i|^{1/d(i )} \end{aligned}$$

For \(\epsilon =0\) and \(\bar{x}, \bar{x}_0 \) we have

$$\begin{aligned} \bar{d}_0(\bar{x}, \bar{x}_0)= \sum _{i=1}^{n} | u^0_i|^{1/d(i )} \end{aligned}$$

5.4 Proof of the stability result

The sub-Laplacian/heat operator associated to this structure is

$$\begin{aligned} \bar{L}_{\epsilon ,A}=\partial _t -\sum _{i,j=1}^{m+ \nu } \bar{a}_{ij} \bar{X}^\epsilon _{i}\bar{X}^\epsilon _{j}, \end{aligned}$$

where

$$\begin{aligned} \bar{A}= A \oplus \lambda I \end{aligned}$$

and I is the identity matrix of dimension \(\nu \times \nu \). We denote by \(\bar{\Gamma }_{\epsilon , A}\) the heat kernels of the corresponding heat operators, and prove a lemma analogous to Proposition 5.2 for the lifted operator:

Lemma 5.8

The fundamental solution \(\bar{\Gamma }_{\epsilon , A}\) of the operator \(\bar{L}_{\epsilon , A}\) is a kernel with local uniform exponential decay of order 2 with respect to \(\epsilon >0\) and \(A\in M^\epsilon _{m+\nu ,\Lambda }\), according to Definition 5.1. Hence it belongs to the set \({\mathcal {E}}(2, \bar{d}_\epsilon , M^\epsilon _{m+\nu ,\Lambda }) \). Moreover, as \(\epsilon \rightarrow 0\) one has

$$\begin{aligned} {X}^\epsilon _{i_1}\cdots { X}^\epsilon _{i_k} \partial _t^s \bar{\Gamma }_{\epsilon , A}\rightarrow {X}_{i_1}\cdots {X}_{i_k}\partial _t^s \bar{\Gamma }_{A} \end{aligned}$$
(5.16)

uniformly on compact sets, and in a dominated way on all of \(\bar{G}\).

Proof

The result for the limit operator \(\bar{L}_{0,A} \) is well known and contained for example in [11]. Hence we only have to estimate the fundamental solution of the operators \(\bar{L}_{\epsilon ,A}\) in terms of the one of \(\bar{L}_{0,A} \). In order to do so, we first define a change of variable on the Lie algebra:

$$\begin{aligned} T_\epsilon (\bar{X}_i^\epsilon )=\bar{X}^0_i\text { for } i=1,\ldots ,\nu +m \end{aligned}$$
(5.17)

Then from a fixed point \(\bar{z}\) we apply the exponential map to induce on the Lie group a volume preserving change of variables. Using the notation introduced in (5.15), we will denote

$$\begin{aligned} \bar{F}_{\epsilon , \bar{z}} :\bar{G}\rightarrow \bar{G}, \quad \bar{F}_\epsilon (\bar{x})= \exp ( \Phi ^{-1}_{\epsilon , \bar{z}, T_\epsilon (\bar{X}^0) i} (\bar{x}) \bar{X}^0_i)(\bar{z}) \end{aligned}$$

Since the distances are defined in terms of the exponential maps, this change of variables induces a relation between the distances \(\bar{d}_0\) and \(\bar{d}_\epsilon \):

$$\begin{aligned} \bar{d}_\epsilon (\bar{x}, \bar{x}_0)=\bar{d}_0( \bar{F}_\epsilon (\bar{x}), \bar{F}_\epsilon (\bar{x}_0)). \end{aligned}$$
(5.18)

Analogously we also have

$$\begin{aligned} \bar{\Gamma }_{\epsilon , A} (\bar{x}, \bar{y}, t)=\bar{\Gamma }_{0,A}(\bar{F}_\epsilon (\bar{x}), \bar{F}_\epsilon (\bar{y}), t), \end{aligned}$$
(5.19)

Hence assertions (5.7) follow from the estimates of \(\bar{\Gamma }_{0,A}\) contained for instance in [53]. Indeed the second inequality can be established as follows:

$$\begin{aligned} \bar{\Gamma }_{\epsilon , A} (\bar{x}, \bar{y}, t)=\bar{\Gamma }_{0,A}(\bar{F}_\epsilon (\bar{x}), \bar{F}_\epsilon (\bar{y}), t) \le C_\Lambda \frac{e^{-\frac{\bar{d}_0(\bar{F}_\epsilon (\bar{x}), \bar{F}_\epsilon (\bar{y}))^2}{C_\Lambda t}}}{|\bar{B}_0 (\bar{F}_\epsilon (\bar{x}), \sqrt{t})|}= C_\Lambda \frac{e^{-\frac{\bar{d}_\epsilon (\bar{x}, \bar{y})^2}{C_\Lambda t}}}{|\bar{B}_\epsilon (\bar{x}, \sqrt{t})|}. \end{aligned}$$
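The last equality in the chain above uses two facts that can be spelled out as follows (a sketch, assuming that \(\bar{F}_\epsilon \) is a volume preserving bijection of \(\bar{G}\), as stated before (5.18)). By (5.18), the map \(\bar{F}_\epsilon \) sends the ball \(\bar{B}_\epsilon (\bar{x}, r)\) onto the ball \(\bar{B}_0(\bar{F}_\epsilon (\bar{x}), r)\), so that

$$\begin{aligned} \bar{F}_\epsilon \big (\bar{B}_\epsilon (\bar{x}, r)\big )&= \{ \bar{F}_\epsilon (\bar{y}) : \bar{d}_0(\bar{F}_\epsilon (\bar{x}), \bar{F}_\epsilon (\bar{y}))< r\} = \bar{B}_0(\bar{F}_\epsilon (\bar{x}), r), \\ |\bar{B}_\epsilon (\bar{x}, r)|&= \big |\bar{F}_\epsilon \big (\bar{B}_\epsilon (\bar{x}, r)\big )\big | = |\bar{B}_0(\bar{F}_\epsilon (\bar{x}), r)|. \end{aligned}$$

Taking \(r=\sqrt{t}\) gives the equality of the denominators, while (5.18) applied to the pair \(\bar{x}, \bar{y}\) gives the equality of the exponents.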

The proof of the first inequality in (5.7) and of (5.8) is analogous, while (5.9) follows from the estimates of the fundamental solution contained in [11].

The pointwise convergence (5.16) is also an immediate consequence of (5.18) and (5.19). In order to prove the dominated convergence result we need to relate the distances \(\bar{d}_0\) and \(\bar{d}_\epsilon \). On the other hand, the change of variable (5.17) allows us to express the exponential coordinates \(u_i^\epsilon \) in terms of \(u_i^0\) as follows:

$$\begin{aligned} \bar{d}_\epsilon ( \bar{x}, \bar{x}_0)= \sum _{i=1}^{2m}|u_i^0| + \sum _{i=2m+1}^\nu \left( |u_i^0- \epsilon w_{i+\nu }^0|^{1/d(i)} + \min (|u_i^0|, |u_i^0|^{1/d(i)} ) \right) \end{aligned}$$

so that for all \(\bar{x}, \bar{x}_0\in \bar{G}\)

$$\begin{aligned} \bar{d}_0(\bar{x}, \bar{x}_0) - C_0\le \bar{d}_\epsilon ( \bar{x}, \bar{x}_0)\le \bar{d}_0(\bar{x}, \bar{x}_0)+ C_0 \end{aligned}$$
(5.20)

where \(C_0\) is independent of \(\epsilon \). The latter and (5.8) imply that there is a constant \(\tilde{C}_{s,k}\) independent of \(\epsilon \) such that

$$\begin{aligned} \left| (\partial _t^s { X}^\epsilon _{i_1}\cdots { X}^\epsilon _{i_k} \bar{\Gamma }_{\epsilon , A})(\bar{x},\bar{y},t)\right| \le \tilde{C}_{s,k} t^{-s-k/2} \frac{e^{-\frac{\bar{d}_0(\bar{x},\bar{y})^2}{C_\Lambda t}}}{|\bar{B}_0(\bar{x}, \sqrt{t})|} \end{aligned}$$

and this implies dominated convergence with respect to the \(\epsilon \) variable. \(\square \)
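One elementary way to see how an additive comparison such as (5.20) affects Gaussian factors is the following sketch. It uses only the inequality \((a-b)^2\ge \frac{1}{2}a^2-b^2\), valid for all real a, b, applied with \(a=\bar{d}_0\) and \(b=C_0\) (when \(\bar{d}_0< C_0\) the right hand side below is negative and there is nothing to prove). The resulting constant depends on t, so this particular computation is uniform only for t bounded away from zero; the argument intended in the text may treat small times differently.

$$\begin{aligned} \bar{d}_\epsilon ^2 \ge \frac{1}{2}\bar{d}_0^2 - C_0^2 \quad \Longrightarrow \quad e^{-\frac{\bar{d}_\epsilon ^2}{C_\Lambda t}} \le e^{\frac{C_0^2}{C_\Lambda t}}\, e^{-\frac{\bar{d}_0^2}{2 C_\Lambda t}}. \end{aligned}$$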

In order to conclude the proof of Proposition 5.2, we need to study the relation between the fundamental solution \(\Gamma _A( x,y,t)\) and its lifting \(\bar{\Gamma } _{0,A}((x, 0), (y, z), t) \), as well as the relation between \(\Gamma _{\epsilon A} (x,y, t)\) and \(\bar{\Gamma }_{\epsilon , A} ((x, 0), (y, z), t)\).

Remark 5.9

We first note that every \(f\in C^\infty _0(\mathbb {R}^n \times \mathbb {R}^+)\) can be identified with a bounded \(C^\infty \) function defined on \(\mathbb {R}^{n+ \nu } \times \mathbb {R}^+\) and constant in the z-variables. Hence

$$\begin{aligned} L_{A}f = {\bar{L}}_{A}f, \quad L_{\epsilon , A}f = {\bar{L}}_{\epsilon , A}f, \end{aligned}$$

Consequently:

$$\begin{aligned} f(x,t) = \int \int \left( \int \bar{\Gamma } _{\epsilon ,A}((x, 0, s), (y, z, t)) dz \right) L_{\epsilon , A}f(y,s) dy ds \end{aligned}$$

From the definition of fundamental solution we can deduce that

$$\begin{aligned} \Gamma _A( x,y,t)= & {} \int _G \bar{\Gamma } _{0,A}((x, 0), (y, z), t) dz, \quad \text { and } \nonumber \\ \Gamma _{\epsilon A} (x,y, t)= & {} \int _G \bar{\Gamma }_{\epsilon , A} ((x, 0), (y, z), t) dz, \end{aligned}$$
(5.21)

for any \(x\in G\) and \(t>0\).

We conclude this section with the proof of the main result Proposition 5.2.

Proof

In view of the previous remark and of the (global) dominated convergence of the derivatives of \(\bar{\Gamma }_{\epsilon , A}\) to the corresponding derivatives of \(\bar{\Gamma }_{0,A}\) as \(\epsilon \rightarrow 0\), we deduce that

$$\begin{aligned} \int _G \bar{\Gamma }_{\epsilon , A} ((x, 0), (y, z), t) dz \rightarrow \int _G \bar{\Gamma }_{0,A} (( x, 0),(y, z), t) dz \end{aligned}$$

as \(\epsilon \rightarrow 0\). The Gaussian estimates of \( \Gamma _{\epsilon , A}\) follow from the corresponding estimates on \(\bar{\Gamma }_{\epsilon , A}\) and the fact that in view of (5.20),

$$\begin{aligned} \bar{d}_\epsilon ((x, z), (x_0, z_0))\ge & {} \bar{d}_0((x, z), (x_0, z_0)) - C_0\ge d_0( x, x_0) + d_0(z, z_0) - C_0\nonumber \\\ge & {} d_\epsilon (x, x_0) + d_\epsilon (z, z_0) -3 C_0 \end{aligned}$$
(5.22)

Indeed the latter shows that there exists a constant \(C>0\) depending only on \(G, \sigma _0\) such that for every \(x\in G\),

$$\begin{aligned} \int _G e^{- \frac{d^2_\epsilon ((x, z), (x_0, z_0))}{t}}dz \le C e^{-\frac{d^2_\epsilon (x, x_0)}{t}} \int _G e^{- \frac{d^2_\epsilon ( z , z_0)}{t}}dz \le C e^{-\frac{d^2_\epsilon (x, x_0)}{t}}. \end{aligned}$$

The conclusion follows at once. \(\square \)

5.5 Differential of the integral operator associated to \(\Gamma _\epsilon \)

In this subsection we will show how to differentiate a functional F expressed as follows:

$$\begin{aligned} F(f)(x, t) = \int \Gamma _{\epsilon , A}(x,y,t) f(y, s) dy ds. \end{aligned}$$

In order to do so, we will need to differentiate both with respect to x and to y. We will denote by \(X_i^{\epsilon , x}\Gamma _{\epsilon , A}(x,y,t)\) the derivative with respect to the variable x and by \(X_i^{\epsilon , y}\Gamma _{\epsilon , A}(x,y,t)\) the derivative with respect to the variable y.

Analogously, we will denote the derivative of the lifted fundamental solution with respect to its first variable by

$$\begin{aligned} \bar{X}_i^{\epsilon , \bar{x}}\bar{\Gamma }_{\epsilon , A}((x, w), (y,z) ,t). \end{aligned}$$

For \(\epsilon =0\), we will have by definition

$$\begin{aligned} \bar{X}_i^{0, \bar{x}}\bar{\Gamma }_{0, A}((x, w), (y,z) ,t)= ( X_i^{0, x} + \tilde{Z}_i^w)\bar{\Gamma }_{0, A}((x, w), (y,z) ,t). \end{aligned}$$

The derivative with respect to the second variable will be denoted \(\bar{X}_i^{0, \bar{y}}.\) If \(\Gamma \) is the Euclidean heat kernel, there is a simple relation between the derivatives with respect to the two variables. Indeed, in this case \(\Gamma _{\epsilon , A}(x,y,t)= \Gamma _{\epsilon , A}(x-y,0,t)\), so that

$$\begin{aligned} X_i^{\epsilon , x}\Gamma _{\epsilon , A}(x,y,t) = - X_i^{\epsilon , y}\Gamma _{\epsilon , A}(x,y,t). \end{aligned}$$
(5.23)

Consequently for every function \(f\in C^\infty _0\)

$$\begin{aligned} \partial _{x_i}F(f)(x,t) = \int \Gamma _{\epsilon , A}(x,y,t) \partial _{y_i}f(y) dy. \end{aligned}$$
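The identity above can be checked in one line. The following is a sketch for the Euclidean case only, assuming that f is smooth with compact support (so that no boundary terms appear) and writing \(\Gamma \) for \(\Gamma _{\epsilon , A}\): differentiate under the integral sign, use (5.23) with \(X_i=\partial _i\), and integrate by parts in y.

$$\begin{aligned} \partial _{x_i}F(f)(x,t) = \int \partial _{x_i}\Gamma (x,y,t) f(y) dy = -\int \partial _{y_i}\Gamma (x,y,t) f(y) dy = \int \Gamma (x,y,t)\, \partial _{y_i}f(y) dy. \end{aligned}$$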

This is no longer the case in general Lie groups, or for Hörmander vector fields. However, we will see that there is a relation between the two derivatives, which allows us to prove the following:

Proposition 5.10

Let \(\Omega \times ]0, T[\) be an open set and assume that \(f\in C^{\infty }_0(\Omega \times ]0, T[)\). For every \(x\in K\subset \subset \Omega \) and every \(i=1, \ldots , m\) the derivative \(X_i^\epsilon F(f)(x,t) \) exists. Precisely, there exist kernels \( P_{\epsilon , i,h}(x, y, t), R_{\epsilon , i}(x, y, t) \in {\mathcal {E}}(2, d_\epsilon , M^\epsilon _{m, \Lambda }) \) such that

$$\begin{aligned} X_i^{\epsilon } F(f)=-\int \sum _{h=1}^m X^{\epsilon , y, *}_{h} P_{\epsilon , i,h }(x, y, t) f(y) dy - \int R_{\epsilon , i}(x,y, t) f(y) dy. \end{aligned}$$

(Let us note explicitly that the term \(R_{\epsilon , i}(x,y, t) \) plays the role of an error term.)

Proof

Applying the lifting procedure described in Sects. 5.2 and 5.3, and representing the fundamental solution as in (5.19) and (5.21), we obtain the following expression for \(F_\epsilon \):

$$\begin{aligned} F_\epsilon (f)= & {} \int \int _G \bar{\Gamma }_{\epsilon , A} ((x, 0), (y, z), t) dz f(y) dy\\= & {} \int \int \bar{\Gamma }_{0,A}(\bar{F}_\epsilon (x, 0), \bar{F}_\epsilon (y, z), t) dz f(y) dy. \end{aligned}$$

By differentiating with respect to \(X_i^\epsilon \) we get:

$$\begin{aligned} X^\epsilon _iF_\epsilon (f)(x) = \int \int ({\bar{X}^{0,x}}_i - \tilde{Z}_i^w) \bar{\Gamma }_{0,A}(\bar{F}_\epsilon (x, 0), \bar{F}_\epsilon (y, z), t) dz f(y) dy. \end{aligned}$$
(5.24)

Note that the family of vector fields \(\bar{X}^{0}_i\) is independent of \(\epsilon \) and free of step r. Hence, in view of [76, page 295, line 3 from below], for every \(i,j=1, \ldots , m\) there exist families of indices \(I_{i,j}\) and polynomials \( \bar{p}_{ih}\), homogeneous of degree \(\ge h\), such that:

$$\begin{aligned} \bar{X}^{0, \bar{x}}_{i}\bar{\Gamma }_{0,A}(\bar{x}, \bar{y}, t)= & {} \sum _{j=1}^m \left( \bar{X}^{0, \bar{y}}_{j}\right) ^* \sum _{h\in I_{i,j}} \bar{X}^{0, \bar{y}}_{h} \left( {\bar{p}_{i h}}(\Theta _{\bar{x}}(\bar{y}))\bar{\Gamma }_{0,A}(\bar{x}, \bar{y}, t)\right) \\&-\left( \sum _{j=1}^m\left( \bar{X}^{0, \bar{y}}_{j}\right) ^*\sum _{h\in I_{i,j}} \bar{X}^{0, \bar{y}}_{h}\right) \left( {\bar{p}_{ih}}(\Theta _{\bar{x}}(\bar{y}))\right) \bar{\Gamma }_{0,A}(\bar{x}, \bar{y}, t). \end{aligned}$$

In particular using this expression in the variable z alone, and integrating by parts we deduce

$$\begin{aligned} \int \int \tilde{Z}_i^w \bar{\Gamma }_{0,A}(\bar{F}_\epsilon (x, 0), \bar{F}_\epsilon (y, z), t) dz = 0 \end{aligned}$$

We now call

$$\begin{aligned} \bar{R}_{0, i }(\bar{x}, \bar{y}, t) = \left( \sum _{j=1}^m\left( \bar{X}^{0, y}_{j}\right) ^*\sum _{h\in I_{i,j}} \bar{X}^{0, \bar{y}}_{h}\right) \left( {\bar{p}_{ih}}(\Theta _{\bar{x}}(\bar{y}))\right) \bar{\Gamma }_{0,A}(\bar{x}, \bar{y}, t) \end{aligned}$$

This kernel, being obtained by multiplication of \(\Gamma _{0,A}(\bar{x}, \bar{y}, t)\) by a polynomial, has locally the same decay as \(\Gamma _{0,A}(\bar{x}, \bar{y}, t)\). In particular, conditions (5.7), (5.8) and (5.9) are satisfied uniformly with respect to \(\epsilon \), since there is no dependence on \(\epsilon \). As a consequence, if we set

$$\begin{aligned} R_{\epsilon , i }(x, y, t) =\int \bar{R}_{0, i }\left( \bar{F}_\epsilon (x, 0), \bar{F}_\epsilon (y, z), t\right) dz \end{aligned}$$

then \(R_{\epsilon , i}(x, y, t) \in {\mathcal {E}}(2, d_\epsilon , M^\epsilon _{m, \Lambda }) \). Similarly we call

$$\begin{aligned} \bar{P}_{\epsilon , i,h }(\bar{x}, \bar{y}, t) = \sum _{h\in I_{i,j}} \bar{X}^{0, y}_{h} \Big ({\bar{p}_{i h}}(\Theta _{\bar{x}}(\bar{y}))\bar{\Gamma }_{0,A}(\bar{x}, \bar{y}, t)\Big ) \end{aligned}$$

Now we use the fact that \(\bar{\Gamma }_{0,A}\in {\mathcal {E}}(2, \bar{d}, M^\epsilon _{m+\nu , \Lambda }) \), together with the fact that \(\bar{p}_{ih}\) is a polynomial of degree equal to the order of \(\bar{X}_{h}^{0, y}\), to conclude that

$$\begin{aligned} \bar{P}_{\epsilon , i,h }(\bar{x}, \bar{y}, t) \in {\mathcal {E}}(2, \bar{d}, M^\epsilon _{m+\nu , \Lambda }) \end{aligned}$$

Setting

$$\begin{aligned} P_{\epsilon , i,h }(x, y, t) = \int \bar{P}_{0, i,h }(\bar{F}_\epsilon (x, 0), \bar{F}_\epsilon (y, z), t) dz \end{aligned}$$

it follows that \(P_{\epsilon , i,h }(x, y, t) \in {\mathcal {E}}(2, d_\epsilon , M^\epsilon _{m, \Lambda }) \).

Substituting this expression into equation (5.24) we get

$$\begin{aligned} X^\epsilon _iF_\epsilon (f)(x) = -\int \sum _{j=1}^m \left( \bar{X}^{0, y}_{j}\right) ^* P_{\epsilon , i,h }(x, y, t) f(y) dy-\int R_{\epsilon , i}(x, y, t) f(y) dy. \end{aligned}$$
(5.25)

\(\square \)

6 Stability of interior Schauder estimates

In this section we will prove uniform estimates in spaces of Hölder continuous functions and in Sobolev spaces for solutions of second order sub-elliptic differential equations in non divergence form

$$\begin{aligned} L_{\epsilon , A} u\equiv \partial _t u- \sum _{i,j=1}^n a^\epsilon _{ij}(x,t) X_i^\epsilon X_j^\epsilon u=0, \end{aligned}$$

in a cylinder \( Q=\Omega \times (0,T)\) that are stable as \(\epsilon \rightarrow 0\). The proof of both estimates is largely based on the heat kernel estimates established above. Interior Schauder estimates for this type of operator are well known. We recall the results of Xu [83], Bramanti and Brandolini [10] for heat-type operators, and the results of Lunardi [62], Polidoro and Di Francesco [34], and Gutierrez and Lanconelli [47], which apply to a large class of squares of vector fields plus a drift term. We also recall [64], where uniform Schauder estimates for a particular elliptic approximation of subLaplacians are proved.

Here the novelty is the uniformity with respect to \(\epsilon \). This is accomplished by using the uniform Gaussian bounds established in the previous section. This result extends to Hörmander type operators the analogous assertion proved by Manfredini and the authors in [14] in the setting of Carnot groups.

6.1 Uniform Schauder estimates

Let us start with the definition of the classes of Hölder continuous functions in this setting.

Definition 6.1

Let \(0<\alpha < 1,\,{Q}\subset \mathbb {R}^{n+1}\) and u be defined on Q. We say that \(u \in C_{\epsilon ,X}^{\alpha }({Q})\) if there exists a positive constant M such that for every \((x,t), (x_{0},t_0)\in {Q}\)

$$\begin{aligned} |u(x,t) - u(x_{0},t_0)| \le M \tilde{d}_{\epsilon }^\alpha ((x,t), (x_{0},t_0)). \end{aligned}$$
(6.1)

We put

Iterating this definition, if \(k\ge 1\) we say that \(u \in C_{\epsilon ,X}^{k,\alpha }({Q})\) if for all \(i=1,\ldots ,m\) one has \(X_i u \in C_{\epsilon ,X}^{k-1,\alpha }({Q})\), where we have set \(C^{0,\alpha }_{\epsilon ,X}({Q})=C^{\alpha }_{\epsilon , X}({Q}).\)

The main result of this section, which generalizes to the Hörmander vector fields setting our previous result with Manfredini in [14], is

Proposition 6.2

Let w be a smooth solution of \(L_{\epsilon , A}w=f\) on Q. Let K be a compact set such that \(K\subset \subset {Q}\), set \(2\delta =d_0(K, \partial _p Q)\) and denote by \(K_\delta \) the \(\delta \)-tubular neighborhood of K. Assume that there exists a constant \(C>0\) such that

$$\begin{aligned} || a_{ij}^\epsilon ||_{C^{k,\alpha }_{\epsilon ,X}(K_\delta )} \le C, \end{aligned}$$

for any \(\epsilon \in (0,1)\). There exists a constant \(C_1>0\) depending on \(\alpha ,\,C,\,\delta \), and the constants in Proposition 5.2, but independent of \(\epsilon \), such that

$$\begin{aligned} ||w||_{C^{k+2, \alpha }_{\epsilon ,X}(K)} \le C_1 \left( ||f||_{C^{k,\alpha }_{\epsilon ,X}(K_\delta )}+ ||w||_{C^{k+1, \alpha }_{\epsilon ,X}(K_\delta )}\right) . \end{aligned}$$

We will first consider a constant coefficient operator, for which we will obtain a representation formula, and then we will show how to obtain the claimed result from it.

Precisely we will consider the constant coefficient frozen operator:

$$\begin{aligned} L_{\epsilon ,(x_0,t_0)}\equiv \partial _t - \sum _{i,j=1}^n a^\epsilon _{ij}(x_0,t_0) X_i^\epsilon X_j^\epsilon , \end{aligned}$$

where \((x_0,t_0)\in Q\). We explicitly note that for \(\epsilon >0\) fixed the operator \(L_{\epsilon , (x_0,t_0)}\) is uniformly parabolic, so that its heat kernel can be studied through standard singular integrals theory in the corresponding Riemannian balls.

As a direct consequence of the definition of fundamental solution one has the following representation formula

Lemma 6.3

Let w be a smooth solution to \(L_\epsilon w=f \) in \(Q\subset \mathbb {R}^{n+1}\). For every \(\phi \in C^\infty _0(Q)\),

$$\begin{aligned}&(w\phi )(x,t)= \int _Q\Gamma ^\epsilon _{(x_0,t_0)}((x,t),(y,\tau )) \left( L_{\epsilon , (x_0,t_0)}-L_\epsilon \right) (w \,\phi )(y,\tau ) dyd\tau \nonumber \\&\quad +\int _{Q}\Gamma ^\epsilon _{(x_0,t_0)}((x,t),(y,\tau )) \left( f\phi + wL_\epsilon \phi + 2 \sum _{i,j=1}^n a_{ij}^\epsilon (y,\tau ) X_i^\epsilon wX_j^\epsilon \phi \right) (y,\tau ) dyd\tau , \end{aligned}$$
(6.2)

where we have denoted by \(\Gamma ^\epsilon _{(x_0,t_0)}\) the heat kernel of \(L_{\epsilon ,(x_0,t_0)}\).

Iterating the previous lemma we get the following

Lemma 6.4

Let \(k\in N\) and consider a k-tuple \((i_1,\ldots ,i_k)\in \{1,\ldots ,m\}^k\). There exists a differential operator B of order \(k+1\), depending on horizontal derivatives of \(a_{ij}^\epsilon \) of order at most k, such that

$$\begin{aligned} X_{i_k}^\epsilon \cdots X_{i_1}^\epsilon \left( L_{\epsilon ,(x_0,t_0)}-L_\epsilon \right) = \sum _{i,j=1}^n \left( a_{ij}^\epsilon - a_{ij}^\epsilon (x_0, t_0) \right) X_{i_k}^\epsilon \cdots X_{i_1}^\epsilon X^\epsilon _iX^\epsilon _j + B. \end{aligned}$$

Proof

The proof can be made by induction. Indeed it is true with \(B=0\) by definition if \(k=0\):

$$\begin{aligned} L_{\epsilon ,(x_0,t_0)}-L_\epsilon = \sum _{i,j=1}^n \left( a_{ij}^\epsilon - a_{ij}^\epsilon (x_0, t_0) \right) X^\epsilon _iX^\epsilon _j . \end{aligned}$$

If it is true for a fixed value of k, then we have

$$\begin{aligned}&X_{i_{k+1}}^\epsilon X_{i_k}^\epsilon \cdots X_{i_1}^\epsilon \left( L_{\epsilon ,(x_0,t_0)}-L_\epsilon \right) \\&\quad = \sum _{i,j=1}^n \left( a_{ij}^\epsilon - a_{ij}^\epsilon (x_0, t_0) \right) X_{i_{k+1}}^\epsilon X_{i_k}^\epsilon \cdots X_{i_1}^\epsilon X^\epsilon _i X^\epsilon _j + \tilde{B} \end{aligned}$$

where

$$\begin{aligned} \tilde{B}=\sum _{i,j=1}^n X_{i_{k+1}}^\epsilon \left( a_{ij}^\epsilon - a_{ij}^\epsilon (x_0, t_0) \right) X_{i_k}^\epsilon \cdots X_{i_1}^\epsilon X^\epsilon _iX^\epsilon _j + X_{i_{k+1}}^\epsilon B. \end{aligned}$$

By the properties of B it follows that \(\tilde{B}\) is a differential operator of order \(k+2\), depending on horizontal derivatives of \(a_{ij}^\epsilon \) of order at most \(k+1\). This concludes the proof. \(\square \)
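For instance, the first step of the induction (\(k=1\)) can be written out explicitly. The following is a sketch using only the Leibniz rule for the vector field \(X_{i_1}^\epsilon \); note that \(a_{ij}^\epsilon (x_0,t_0)\) is a constant, so it is annihilated by the derivative.

$$\begin{aligned} X_{i_1}^\epsilon \left( L_{\epsilon ,(x_0,t_0)}-L_\epsilon \right) = \sum _{i,j=1}^n \left( a_{ij}^\epsilon - a_{ij}^\epsilon (x_0, t_0) \right) X_{i_1}^\epsilon X^\epsilon _iX^\epsilon _j + \sum _{i,j=1}^n \left( X_{i_1}^\epsilon a_{ij}^\epsilon \right) X^\epsilon _iX^\epsilon _j . \end{aligned}$$

Here \(B=\sum _{i,j} ( X_{i_1}^\epsilon a_{ij}^\epsilon ) X^\epsilon _iX^\epsilon _j\) is a differential operator of order \(2=k+1\) whose coefficients involve horizontal derivatives of \(a_{ij}^\epsilon \) of order \(1=k\), in agreement with the statement of the lemma.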

We can go back to our operator L and establish the following regularity results, differentiating twice the representation formula:

Proposition 6.5

Let \(0<\alpha <1\) and w be a smooth solution of \(L_{\epsilon }w=f\in C^{\alpha }_{\epsilon ,X}({Q})\) in the cylinder Q. Let K be a compact set such that \(K\subset \subset {Q}\), set \(2\delta =d_0(K, \partial _p Q)\) and denote by \(K_\delta \) the \(\delta \)-tubular neighborhood of K. Assume that there exists a constant \(C>0\) such that for every \(\epsilon \in (0,1)\)

$$\begin{aligned} || a^\epsilon _{ij}||_{C^{ \alpha }_{\epsilon ,X}(K_\delta )}\le C. \end{aligned}$$

There exists a constant \(C_1>0\) depending on \(\delta ,\,\alpha ,\,C\) and the constants in Proposition 5.2 such that

$$\begin{aligned} ||w||_{C^{2, \alpha }_{\epsilon ,X}(K)} \le C_1\left( ||f||_{C^{\alpha }_{\epsilon ,X}(K_\delta )}+ ||w||_{C^{1, \alpha }_{\epsilon ,X}(K_\delta )}\right) . \end{aligned}$$

Proof

The proof follows the outline of the standard case, as in [41], and rests crucially on the Gaussian estimates proved in Proposition 5.2. Choose a parabolic sphere \(B_{\epsilon ,\delta } \subset \subset K\) where \(\delta >0\) will be fixed later and a cut-off function \(\phi \in C^\infty _0(\mathbb {R}^{n+1})\) identically 1 on \(B_{\epsilon , \delta /2}\) and compactly supported in \(B_{\epsilon ,\delta }\). This implies that for some constant \(C>0\) depending only on G and \(\sigma _0\),

$$\begin{aligned} \left| \nabla _\epsilon \phi \right| \le C\delta ^{-1}, \quad |L^\epsilon \phi | \le C\delta ^{-2}, \end{aligned}$$

in Q. Now we represent the function \(w\phi \) through the representation formula (6.2) of Lemma 6.3 and take two derivatives in the direction of the vector fields. We remark once more that the operator is uniformly elliptic due to the \(\epsilon \)-regularization, hence the differentiation under the integral can be considered standard. As a consequence, for every multi-index \(I=(i_1,i_2)\in \{1,\ldots ,m\}^2\) and for every \((x_0,t_0)\in B_{\epsilon ,\delta }\) one has:

$$\begin{aligned}&X_{i_1}^\epsilon X_{i_2}^\epsilon (w\phi )(x_0,t_0) \end{aligned}$$
(6.3)
$$\begin{aligned}&\quad = \int _Q X_{i_1}^\epsilon X_{i_2}^\epsilon \Gamma ^\epsilon _{(x_0,t_0)}(\cdot ,(y,\tau ))|_{(x_0,t_0)} \left( L_{\epsilon ,(x_0,t_0)}-L_\epsilon \right) (w \,\phi )(y,\tau ) dyd\tau \nonumber \\&\qquad + \int _Q X_{i_1}^\epsilon X_{i_2}^\epsilon \Gamma ^\epsilon _{(x_0,t_0)}(\cdot ,(y,\tau ))|_{(x_0,t_0)} \left( f\phi + wL_\epsilon \phi \right. \nonumber \\&\qquad \left. + 2\sum _{i,j=1}^n a_{ij}^\epsilon X_i^\epsilon wX_j^\epsilon \phi \right) (y,\tau ) dyd\tau . \end{aligned}$$
(6.4)

In order to study the Hölder continuity of the second derivatives, we note that the uniform Hölder continuity of \(a_{ij}^\epsilon \) and Proposition 5.2 ensure that the kernels satisfy the classical singular integral properties (see [37]):

$$\begin{aligned}&|X_{i_1}^\epsilon X_{i_2}^\epsilon \Gamma ^\epsilon _{(x,t)}((x,t),(y,\tau ))- X_{i_1}^\epsilon X_{i_2}^\epsilon \Gamma ^\epsilon _{(x_0,t_0)}((x_0,t_0),(y,\tau ))|\\&\quad \le C\,\tilde{d}^\alpha _\epsilon ((x,t),(x_0,t_0)) \frac{(\tau -t_0)^{-1}e^{-\frac{d_\epsilon (x_0,y)^2}{C_\Lambda (\tau -t_0)}}}{|B_\epsilon (0, \sqrt{\tau -t_0})|}, \end{aligned}$$

with \(C>0\) independent of \(\epsilon \). From here, proceeding as in [41, Theorem 2, Chapter 4], the first term in the right hand side of formula (6.3) can be estimated as follows:

$$\begin{aligned}&\Big |\Big | \int X_{i_1}^\epsilon X_{i_2}^\epsilon \Gamma ^\epsilon _{(x_0,t_0)} (\cdot ,(y,\tau )) (L_\epsilon -L_{\epsilon ,(x_0, t_0)}) (w \,\phi )(y,\tau )dyd\tau \Big |\Big |_{C^\alpha _{\epsilon ,X}(B_{\epsilon ,\delta })}\nonumber \\&\quad \le C_1 \Big |\Big |(L_\epsilon -L_{\epsilon ,(x_0, t_0)}) (w \,\phi )\Big |\Big |_{C^\alpha _{\epsilon ,X}(B_{\epsilon , \delta })} \nonumber \\&\quad \le C_1\sum _{i,j}\left| \left| \big (a_{ij}^\epsilon (x_0,t_0)-a_{ij}^\epsilon (\cdot )\big )X_i^\epsilon X_j^\epsilon (w \,\phi ) \right| \right| _{C^\alpha _{\epsilon ,X}(B_{\epsilon , \delta })}\nonumber \\&\quad \le \tilde{C}_1 \delta ^\alpha ||a_{ij}^\epsilon ||_{C^{\alpha }_{\epsilon ,X}(B_{\epsilon ,\delta })}||w\phi ||_{C^{2, \alpha }_{\epsilon ,X}(B_{\epsilon ,\delta })}, \end{aligned}$$
(6.5)

where \(C_1\) and \(\tilde{C}_1\) are stable as \(\epsilon \rightarrow 0\). Similarly, if \(\phi \) is fixed, the Hölder norm of the second term in the representation formula (6.3) is bounded by

$$\begin{aligned}&\Big |\Big |\int X_{i_1}^\epsilon X_{i_2}^\epsilon \Gamma ^\epsilon _{(x_{0},t_0)}((x_0,t_0),(y,\tau )) \big (f\phi (y,\tau ) + wL\phi (y,\tau ) \nonumber \\&\quad + 2 a_{ij}^\epsilon X_i^\epsilon wX_j^\epsilon \phi \big )dyd\tau \Big |\Big |_{C^\alpha _{\epsilon ,X}(B_{\epsilon , \delta })}\nonumber \\&\qquad \le C_2 \left( ||f||_{C^\alpha _{\epsilon ,X}(K_\delta )} +\frac{C}{\delta ^2}|| w||_{C^{1, \alpha }_{\epsilon ,X}(K_\delta )}\right) . \end{aligned}$$
(6.6)

From (6.3), (6.5) and (6.6) we deduce that

$$\begin{aligned} ||w\phi ||_{C^{2, \alpha }_{\epsilon ,X}(B_\delta )}\le \tilde{C}_2\, \delta ^\alpha ||w\phi ||_{C^{2, \alpha }_{\epsilon ,X}(B_\delta )} + C_2\left( ||f||_{C^\alpha _{\epsilon ,X}(K_\delta )} +\frac{C}{\delta ^2}|| w||_{C^{1,\alpha }_{\epsilon ,X}(K_\delta )}\right) . \end{aligned}$$

Choosing \(\delta \) sufficiently small we prove the assertion on the fixed sphere \(B_{\epsilon ,\delta }\). The conclusion follows from a standard covering argument. \(\square \)
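The absorption step used in the last lines of the proof can be made explicit. The following is a sketch: it assumes that \(||w\phi ||_{C^{2, \alpha }_{\epsilon ,X}(B_\delta )}\) is finite (which holds since w is smooth and \(\phi \) is compactly supported), and it does not track how the factor \(C/\delta ^2\) enters the final constant. If \(\delta \) is chosen so that \(\tilde{C}_2\, \delta ^\alpha \le 1/2\), then the first term on the right hand side of the previous inequality can be moved to the left, giving

$$\begin{aligned} ||w\phi ||_{C^{2, \alpha }_{\epsilon ,X}(B_\delta )}\le 2 C_2\left( ||f||_{C^\alpha _{\epsilon ,X}(K_\delta )} +\frac{C}{\delta ^2}|| w||_{C^{1,\alpha }_{\epsilon ,X}(K_\delta )}\right) . \end{aligned}$$

Since \(\tilde{C}_2\) is stable as \(\epsilon \rightarrow 0\), the choice of \(\delta \) can be made independently of \(\epsilon \).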

We can now conclude the proof of Proposition 6.2:

Proof

The proof is similar to the previous one for \(k=1\). We start by differentiating the representation formula (6.2) along an arbitrary direction \(X_{i_1}^\epsilon \):

$$\begin{aligned}&X_{i_1}^\epsilon (w\phi )(x,t) \end{aligned}$$
(6.7)
$$\begin{aligned}&\quad = \int _Q X_{i_1}^\epsilon \Gamma ^\epsilon _{(x_0,t_0)}(\cdot ,(y,\tau )) \left( L_{\epsilon ,(x_0,t_0)}-L_\epsilon \right) (w \,\phi )(y,\tau ) dyd\tau \nonumber \\&\qquad + \int _Q X_{i_1}^\epsilon \Gamma ^\epsilon _{(x_0,t_0)}(\cdot ,(y,\tau )) \left( f\phi + wL_\epsilon \phi + 2\sum _{i,j=1}^n a_{ij}^\epsilon X_i^\epsilon wX_j^\epsilon \phi \right) (y,\tau ) dyd\tau . \end{aligned}$$
(6.8)

Now we apply Proposition 5.10 and deduce that there exist kernels

$$\begin{aligned} P_{\epsilon , i_1,h, (x_0,t_0)}((x,t),(y,\tau )), R_{\epsilon , i_1, (x_0,t_0)}((x,t),(y,\tau )), \end{aligned}$$

with the same decay as the fundamental solution such that

$$\begin{aligned}&X_{i_1}^\epsilon (w\phi )(x,t)\nonumber \\&\quad =-\int \sum _{h=1}^m P_{\epsilon , i_1,h, (x_0,t_0)}((x,t),(y,\tau )) X^{\epsilon , y}_{h}\left( L_{\epsilon ,(x_0,t_0)}-L_\epsilon \right) (w \,\phi )(y,\tau ) dyd\tau \nonumber \\&\qquad -\int R_{\epsilon , i_1,(x_0,t_0)}((x,t),(y,\tau )) \left( L_{\epsilon ,(x_0,t_0)}-L_\epsilon \right) (w \,\phi )(y,\tau ) dyd\tau \nonumber \\&\qquad -\sum _{h=1}^m\int P_{\epsilon , i_1,h, (x_0,t_0)}((x,t),(y,\tau )) X^{\epsilon , y}_{h}\left( f\phi + wL_\epsilon \phi \right. \nonumber \\&\qquad \left. + 2\sum _{i,j=1}^n a_{ij}^\epsilon X_i^\epsilon wX_j^\epsilon \phi \right) (y,\tau ) dyd\tau -\int R_{\epsilon , i_1, (x_0,t_0)}((x,t),(y,\tau )) \nonumber \\&\qquad \times \left( f\phi + wL_\epsilon \phi + 2\sum _{i,j=1}^n a_{ij}^\epsilon X_i^\epsilon wX_j^\epsilon \phi \right) (y,\tau ) dyd\tau . \end{aligned}$$
(6.9)

Using Lemma 6.4, this yields the existence of new kernels \(P^{i_1,\cdots , i_k}_{\epsilon , h_1, \cdots , h_k, (x_0,t_0)}((x,t),(y,\tau )) \) with the behavior of a fundamental solution (and the same dependence on \(\epsilon \)) such that

$$\begin{aligned}&X_{i_1}^\epsilon \cdots X_{i_{k}}^\epsilon (w\phi )(x,t)= \int P^{i_1,\ldots , i_k}_{\epsilon , h_1, \ldots , h_k, (x_0,t_0)}((x,t),(y,\tau )) \\&\quad \Big ( a_{ij}^\epsilon - a_{ij}^\epsilon (x_0, t_0) \Big )X_{i_1}^\epsilon \ldots X_{i_{k}}^\epsilon X^\epsilon _iX^\epsilon _j (w \,\phi )(y,\tau ) dyd\tau \\&\qquad + \int P^{i_1,\ldots , i_k}_{\epsilon , h_1, \ldots , h_k, (x_0,t_0)}((x,t),(y,\tau ))B (w \,\phi )(y,\tau ) dyd\tau \\&\qquad +\int P^{i_1,\ldots , i_k}_{\epsilon , h_1, \ldots , h_k, (x_0,t_0)}((x,t),(y,\tau )) X_{i_1}^\epsilon \cdots X_{i_{k}}^\epsilon \Big (f\phi (y,\tau ) + wL_\epsilon \phi (y,\tau ) \\&\qquad + 2 \sum _{i,j=1}^n a_{ij}^\epsilon X_i^\epsilon wX_j^\epsilon \phi \Big )dyd\tau \\&\qquad + {\text{ lower } \text{ order } \text{ terms }} \end{aligned}$$

where \(\phi \) is as in the proof of Proposition 6.5 and B is a differential operator of order \(k+1\). The conclusion follows by further differentiating the previous representation formula along two horizontal directions \(X_{j_1}^\epsilon X_{j_2}^\epsilon \) and arguing as in the proof of Proposition 6.5. \(\square \)

7 Application I: Harnack inequalities for degenerate parabolic quasilinear equations hold uniformly in \(\epsilon \)

The results we have presented so far show that for any \(\epsilon _0>0\), the 1-parameter family of metric spaces \((\mathcal {M},d_\epsilon )\) associated to the Riemannian approximations of a subRiemannian metric space \((\mathcal {M},d_0)\) satisfies, uniformly in \(\epsilon \in [0,\epsilon _0]\), the hypotheses in the definition of p-admissible structure in the sense of [48, Theorem 13.1]. This class of metric measure spaces has a very rich analytic structure (Sobolev-Poincaré inequalities, John–Nirenberg lemma, ...) that allows for the development of a basic regularity theory for weak solutions of classes of nonlinear degenerate parabolic and elliptic PDE. In the following we will remind the reader of the definition and basic properties of p-admissible structures and sketch some of the main regularity results from the recent papers [1] and [18]. We will conclude the section with a sample application of these techniques to the global (in time) existence of solutions for a class of evolution equations that include the subRiemannian total variation flow [14].

7.1 Admissible ambient space geometry

Consider a smooth real manifold M and let \(\mu \) be a locally finite Borel measure on M which is absolutely continuous with respect to the Lebesgue measure when represented in local charts. Let \(d(\cdot , \cdot ): M \times M \rightarrow \mathbb {R}^{+}\) denote the control distance generated by a system of bounded, \(\mu \)-measurable, Lipschitz vector fields \(\Xi = (X_1,\ldots , X_m)\) on M. As in [3] and [44], one needs to assume as basic hypothesis

$$\begin{aligned} \text { the inclusion } i: (M, chart) \rightarrow (M,d) \text { is continuous,} \end{aligned}$$
(7.1)

where we have denoted by (M, chart) the topology on M induced by the Euclidean topology in \(\mathbb {R}^n\) via coordinate charts. For \(x \in M\) and \(r > 0\), set \(B(x,r) = \{y \in M: d(x,y) < r \}\) and let \(|B(x,r)|\) denote the \(\mu \) measure of \(B(x,r)\). In general, given a function u and a ball \(B=B(x,r)\), \(u_B\) denotes the \(\mu \)-average of u on the ball \(B=B(x,r)\). In view of (7.1) the closed metric ball \(\bar{B}\) is a compact set.

Definition 7.1

Assume hypothesis (7.1) holds. Given \(1 \le p < \infty \), the triplet \((M,\mu ,d)\) is said to define a p-admissible structure (in the sense of [48, Theorem 13.1]) if for every compact subset K of M there exist constants \(C_D = C_D(\Xi , K), C_P = C_P(\Xi , K) > 0\), and \(R = R(\Xi , K) > 0\), such that the following hold.

  (1) Doubling property:

    $$\begin{aligned} |B(x,2r)|\le C_D |B(x,r)| \text{ whenever } x \in K \text{ and } 0< r < R. \end{aligned}$$
    (D)

  (2) Weak (1, p)-Poincaré inequality:

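    $$\begin{aligned} \frac{1}{|B(x,r)|}\int _{B(x,r)} |u-u_B| d\mu \le C_P\, r \left( \frac{1}{|B(x,2r)|}\int _{B(x,2r)} |\Xi u|^p d\mu \right) ^{1/p}, \end{aligned}$$
    (P)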

    whenever \(x \in K,\,0< r < R,\,u\in W_{\Xi }^{1,p}(B(x,2r)).\)

Theorems 1.1 and 1.2 yield the following

Theorem 7.2

Let \(X_1,\ldots ,X_m\) be a family of Hörmander vector fields in \(\Omega \subset \mathbb {R}^n\) and denote by \(\mu \) Lebesgue measure. For each \(\epsilon _0>0\) and \(\epsilon \in [0,\epsilon _0]\) denote by \(d_\epsilon \) the distance functions defined in Definition 2.6. For all \(\epsilon \in [0,\epsilon _0]\) and \(p\ge 1\), the space \((\Omega , \mu , d_\epsilon )\) is p-admissible, with constants \(C_D\) and \(C_P\) independent of \(\epsilon \).

Other examples of p-admissible spaces are:

  • The classical setting: \(\mathcal {M}={\mathbb {R}}^n,\,d\mu \) equals the n-dimensional Lebesgue measure, and \(\Xi = (X_1,\ldots , X_m)=(\partial _{x_1},\ldots ,\partial _{x_n})\).

  • Our setting is also sufficiently broad to include some non-smooth vector fields such as the Baouendi-Grushin frames, e.g., consider, for \(\gamma \ge 1\) and \((x,y)\in \mathbb {R}^2\), the vector fields \(X_1=\partial _x\) and \(X_2=|x|^{\gamma }\partial _y\). Unless \(\gamma \) is a positive even integer these vector fields fail to satisfy Hörmander’s finite rank hypothesis. However, the doubling inequality as well as the Poincaré inequality hold and have been used in the work of Franchi and Lanconelli [38] to establish Harnack inequalities for linear equations.

  • Consider a smooth manifold \(\mathcal {M}\) endowed with a complete Riemannian metric g. Let \(\mu \) denote the Riemann volume measure, and let \(\Xi \) denote a g-orthonormal frame. If the Ricci curvature is bounded from below (\(Ricci\ge -Kg\)) then one has a 2-admissible structure. In fact, in this setting the Poincaré inequality follows from Buser’s inequality while the doubling condition is a consequence of the Bishop–Gromov comparison principle. If \(K=0\), i.e. the Ricci tensor is non-negative, then these assumptions hold globally and so does the Harnack inequality.
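As a concrete check of the statement on the Baouendi-Grushin frames in the list above, take \(\gamma =2\) (an illustrative choice), so that \(X_1=\partial _x\) and \(X_2=x^2\partial _y\). A direct computation of the commutators gives

$$\begin{aligned} {[}X_1,X_2] = 2x\,\partial _y, \qquad [X_1,[X_1,X_2]] = 2\,\partial _y . \end{aligned}$$

Hence \(X_1\) and \([X_1,[X_1,X_2]]\) span \(\mathbb {R}^2\) at every point, including the line \(x=0\) where \(X_2\) and \([X_1,X_2]\) vanish, and the finite rank condition holds with commutators of length three. When \(\gamma \) is not an even integer the coefficient \(|x|^{\gamma }\) is not smooth at \(x=0\), so the iterated commutators are not all defined there in the classical sense.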

Spaces with a p-admissible structure support a homogeneous structure in the sense of Coifman and Weiss [28].

Lemma 7.3

Let \((\mathcal {M},\mu ,d)\) be a p-admissible structure for some \(p\ge 1,\,\Omega \) a bounded open set in \(\mathcal {M}\) and set \(K=\bar{\Omega }\). If \(x \in K\) and \(0< s< r < R\), then the following holds.

  1. (1)

    There exists a constant \(N = N(C_D) > 0\), called homogeneous dimension of K with respect to \((\Xi , d, \mu )\), such that \(|B(x,r)| \le C_D \tau ^{-N} |B(x,\tau r)|\), for all \(0 < \tau \le 1\).

  2. (2)

    There exists a continuous function \(\phi \in C_{0}(B(x,r))\cap W_{\Xi }^{1,\infty }(B(x,r))\) and a constant \(C = C(\Xi ,K) > 0\), such that \(\phi = 1\) in \(B(x,s)\) and \(|\Xi \phi |\le C/(r-s),\,0 \le \phi \le 1\).

  3. (3)

    Metric balls have the so called \(\hat{\delta }\)-annular decay property, i.e., there exists \(\hat{\delta }= \hat{\delta }(C_D) \in (0,1]\), such that

    $$\begin{aligned} |B(x,r)\setminus B(x,(1-\epsilon )r)| \le C \epsilon ^{\hat{\delta }} |B(x,r)|, \end{aligned}$$

    whenever \(0< \epsilon < 1\).

Proof

Statement (1) follows from (D) by a standard iteration argument. Statement (2) is proved in [44, Theorem 1.5]. Statement (3) follows from [12, Corollary 2.2], since we have a Carnot–Carathéodory space. Furthermore, \(\hat{\delta }\) depends only on \(C_D\). \(\square \)
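The iteration behind statement (1) can be sketched as follows, assuming that (D) is stated in the form \(|B(x,2\rho )|\le C_D|B(x,\rho )|\) for \(x\in K\) and \(0<\rho <R\), and setting \(N=\log _2 C_D\) (one admissible choice; this normalization is an assumption of the sketch). Given \(0<\tau \le 1\), choose the integer \(k\ge 1\) with \(2^{-k}<\tau \le 2^{1-k}\). Then \(r<2^k\tau r\), and k applications of (D) at the radii \(2^j\tau r\le r\), \(j=0,\ldots ,k-1\), give

$$\begin{aligned} |B(x,r)|\le |B(x,2^k\tau r)|\le C_D^k\, |B(x,\tau r)| = C_D\, 2^{(k-1)N} |B(x,\tau r)| \le C_D\, \tau ^{-N} |B(x,\tau r)|, \end{aligned}$$

where the last step uses \(2^{k-1}\le \tau ^{-1}\).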

Given \(\Omega \subset M\), open, and \(1 \le p \le \infty \), we let \(W_{\Xi }^{1,p}(\Omega ) = \{ u \in L^p(\Omega ,\mu ): X_i u \in L^p (\Omega ,\mu ), i=1,\ldots ,m\}\) denote the horizontal Sobolev space, and we let \(W_{\Xi ,0}^{1,p}\subset W_{\Xi }^{1,p}\) be the closure (see Footnote 4) of the space of \(W_{\Xi }^{1,p}\) functions with compact (distributional) support in the norm \(\Vert u \Vert _{1,p}^p = \Vert u \Vert _p^p + \Vert \Xi u \Vert _p^p\) with respect to \(\mu \). In the following we will omit \(\mu \) in the notation for Lebesgue and Sobolev spaces. Given \(t_1<t_2\), and \(1 \le p \le \infty \), we let \(\Omega _{t_1,t_2} \equiv \Omega \times (t_1,t_2)\) and we let \(L^p(t_1,t_2;W_{\Xi }^{1,p}(\Omega ))\) denote the parabolic Sobolev space of real-valued functions u defined on \(\Omega _{t_1,t_2}\) such that for almost every \(t\in (t_1,t_2)\) the function \(x \rightarrow u(x,t)\) belongs to \(W_{\Xi }^{1,p}(\Omega )\) and

$$\begin{aligned} \Vert u \Vert _{L^{p}(t_1,t_2;W_{\Xi }^{1,p}(\Omega ))} = \left( \int _{t_1}^{t_2} \int _{\Omega } (|u(x,t)|^p + |\Xi u(x,t)|^p) d\mu dt \right) ^{1/p} < \infty . \end{aligned}$$

The space \(L^p(t_1,t_2;W_{\Xi ,0}^{1,p}(\Omega ))\) is defined analogously. We let \(W^{1,p}(t_1,t_2;L^p(\Omega ))\) consist of real-valued functions \(\eta \in L^p(t_1,t_2;L^p(\Omega ))\) such that the weak derivative \(\partial _t\eta (x,t)\) exists and belongs to \(L^p(t_1,t_2;L^p(\Omega ))\). Consider the set of functions \(\phi \in W^{1,p}(t_1,t_2;L^p(\Omega ))\) such that the functions

$$\begin{aligned} t\rightarrow \int _{\Omega } |\phi (x,t)|^p d \mu (x) \text{ and } t\rightarrow \int _{\Omega } |\partial _t \phi (x,t)|^p d \mu (x), \end{aligned}$$

have compact support in \((t_1,t_2)\). We let \(W_0^{1,p}(t_1,t_2; L^p(\Omega ))\) denote the closure of this space under the norm in \(W^{1,p}(t_1,t_2; L^p(\Omega ))\).

From [48, Corollary 9.5] one can see that the metric balls \(B(x_0,r)\) are John domains. Consequently, (D), (P), and [48, Theorem 9.7] yield the following Sobolev–Poincaré inequality.

Lemma 7.4

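Let \(B(x_0,r) \subset \Omega ,\,0< r< R,\,1 \le p<\infty \). There exists a constant \(C = C(C_D,C_P,p) \ge 1\) such that for every \(u \in W_{\Xi }^{1,p} (B(x_0,r))\),

$$\begin{aligned} \left( \frac{1}{|B(x_0,r)|}\int _{B(x_0,r)} |u-u_B|^{\kappa p} d\mu \right) ^{1/\kappa } \le C\, r^p\, \frac{1}{|B(x_0,r)|}\int _{B(x_0,r)} |\Xi u|^p d\mu , \end{aligned}$$

where \(u_B\) denotes the \(\mu \) average of u over \(B(x_0,r)\), and where \(1 \le \kappa \le {N}/{(N-p)}\), if \(1 \le p < N\), and \(1 \le \kappa < \infty \), if \(p \ge N\). Moreover,

$$\begin{aligned} \left( \frac{1}{|B(x_0,r)|}\int _{B(x_0,r)} |u|^{\kappa p} d\mu \right) ^{1/\kappa } \le C\, r^p\, \frac{1}{|B(x_0,r)|}\int _{B(x_0,r)} |\Xi u|^p d\mu , \end{aligned}$$

whenever \(u \in W_{\Xi ,0}^{1,p} (B(x_0,r))\).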

7.2 Quasilinear degenerate parabolic PDE

In this section we list some recent results concerning regularity of weak solutions of certain nonlinear, degenerate parabolic PDE in spaces \((M,\mu ,d)\) that are p-admissible for some \(p\in [2,\infty )\). If \(p=2\) we can allow lower order terms, but at present this is not yet established for \(p>2\). Given a domain (i.e., an open, connected set) \(\Omega \subset M\), and \(T>0\) we set \(\Omega _T=\Omega \times (0,T)\). For a function \(u:\Omega _T\rightarrow \mathbb {R}\), and \(1\le p,q\) we define the norms

$$\begin{aligned} ||u||_{p,q}=\Big (\int _0^T \Big (\int _\Omega |u|^p dx\Big )^{\frac{q}{p}} dt\Big )^{\frac{1}{q}}, \end{aligned}$$
(7.2)

and the corresponding Lebesgue spaces \(L^{p,q}(\Omega _T)=L^q([0,T], L^p(\Omega ))\). We will say that \(\mathcal {A}, \mathcal {B}\) are admissible symbols (in \(\Omega _T\)) if the following holds:

  1. (i)

    \((x,t) \rightarrow \mathcal {A}(x,t,u,\xi ), \mathcal {B}(x,t,u,\xi )\) are measurable for every \((u,\xi ) \in \mathbb {R}\times \mathbb {R}^m\),

  2. (ii)

    \((u,\xi ) \rightarrow \mathcal {A}(x,t,u,\xi ), \mathcal {B}(x,t,u,\xi )\) are continuous for almost every \((x,t) \in \Omega _T\),

  3. (iii)
    • For \(p=2\): There exist constants \(a,\bar{a}>0\) and functions \(b,c,e,f,h\in L^{\tilde{p},q}(Q)\) with \(\tilde{p}>2\), and q given by \(\frac{N}{2\tilde{p}}+\frac{1}{q}<\frac{1}{2}\) and functions \(d,g\in L^{\alpha ,\beta }(Q)\) with \(1<\alpha \) and \(\beta \) given by \(\frac{N}{2\alpha }+\frac{1}{\beta }<1\) such that for a.e. \((x,t) \in \Omega _T\) and \(\xi \in \mathbb {R}^m\) one has

      $$\begin{aligned} \sum _{i=1}^{{m}} \mathcal {A}_i (x,t,u,\xi ) \xi _i&\ge a|\xi |^2-b^2 u^2-f^2, \nonumber \\ |\mathcal {A}(x,t,u,\xi )|&\le \bar{a} |\xi |+e|u|+h,\nonumber \\ |\mathcal {B}(x,t,u,\xi )|&\le c|\xi |+d|u|+g. \end{aligned}$$
      (7.3)

      In view of the conditions on \(\tilde{p},q,\alpha ,\beta \) there exists \(\theta >0\) such that

      $$\begin{aligned} \tilde{p}\ge \frac{2}{1-\theta }&\text { and } \frac{{N}}{2 \tilde{p}}+\frac{1}{q}\le \frac{1-\theta }{2} \nonumber \\ \alpha \ge \frac{1}{1-\theta }&\text { and } \frac{{N}}{2\alpha }+\frac{1}{\beta } \le 1-\theta . \end{aligned}$$
      (7.4)

      We say that a constant depends on the structure conditions (7.3), if it depends only on (see Footnote 5)

      $$\begin{aligned} a,\bar{a}, ||b||, ||c||,||d|| ,||e||, ||f||,||g||,||h||, N, \theta , \end{aligned}$$

      and is uniformly bounded if these quantities are so.

    • For \(p>2\) we will only consider \(\mathcal {B}=0\) and ask that the following bounds

      $$\begin{aligned} \mathcal {A}(x,t,u,\xi ) \cdot \xi \ge \mathcal {A}_0 |\xi |^p, \ |\mathcal {A}(x,t,u,\xi )| \le \mathcal {A}_1 |\xi |^{p-1}, \end{aligned}$$
      (7.5)

      hold for every \((u,\xi ) \in \mathbb {R}\times \mathbb {R}^m\) and almost every \((x,t) \in \Omega _T\).

\(\mathcal {A}_0\) and \(\mathcal {A}_1\) are called the structural constants of \(\mathcal {A}\). If \(\mathcal {A}\) and \(\tilde{\mathcal {A}}\) are both admissible symbols, with the same structural constants \(\mathcal {A}_0\) and \(\mathcal {A}_1\), then we say that the symbols are structurally similar.
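A model example, included here only as an illustration (it is the standard subelliptic p-Laplacian symbol and is not singled out in the text above), is \(\mathcal {A}(x,t,u,\xi )=|\xi |^{p-2}\xi \) with \(\mathcal {B}=0\). It satisfies (7.5) with equality and \(\mathcal {A}_0=\mathcal {A}_1=1\):

$$\begin{aligned} \mathcal {A}(x,t,u,\xi )\cdot \xi =|\xi |^{p-2}\,\xi \cdot \xi =|\xi |^p, \qquad |\mathcal {A}(x,t,u,\xi )|=|\xi |^{p-2}|\xi |=|\xi |^{p-1}. \end{aligned}$$

For this choice Eq. (7.6) below reads \(\partial _t u=-\sum _{i=1}^m X_i^*(|\Xi u|^{p-2}X_i u)\).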

Let E be a domain in \(\mathcal {M}\times \mathbb {R}\). We say that the function \(u:E\rightarrow \mathbb {R}\) is a weak solution to

$$\begin{aligned} \partial _t u(x,t)= L_{A,p} u \equiv -\sum _{i=1}^{m}X_i^*\mathcal {A}_i(x,t,u,\Xi u)+ \mathcal {B}(x,t,u,\Xi u), \end{aligned}$$
(7.6)

in E, where \(X^*_i\) is the formal adjoint w.r.t. \(d \mu \), if whenever \(\Omega _{t_1,t_2} \Subset E\) for some domain \(\Omega \subset \mathcal {M},\,u \in L^p(t_1,t_2;W_{\Xi }^{1,p}(\Omega ))\) and

$$\begin{aligned}&\int _{t_1}^{t_2} \int _{\Omega } u \frac{\partial \eta }{\partial t} d\mu dt - \int _{t_1}^{t_2} \int _{\Omega } \mathcal {A}(x,t,u,\Xi u) \cdot \Xi \eta \ d\mu dt \nonumber \\&\quad + \int _{t_1}^{t_2} \int _{\Omega } \mathcal {B}(x,t,u,\Xi u) \eta \ d\mu dt = 0, \end{aligned}$$
(7.7)

for every test function

$$\begin{aligned} \eta \in W_0^{1,2}(t_1,t_2; L^2(\Omega )) \cap L^p (t_1,t_2; W_{\Xi ,0}^{1,p}(\Omega )). \end{aligned}$$
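The identity (7.7) is what one obtains formally from (7.6). As a sketch, assume that u is smooth and that \(\eta \) is smooth with compact support in \(\Omega _{t_1,t_2}\) (both are simplifying assumptions made only for this computation). Multiplying (7.6) by \(\eta \) and integrating, integration by parts in t and the definition of the formal adjoint \(X_i^*\) give

$$\begin{aligned} \int _{t_1}^{t_2} \int _{\Omega } \partial _t u\, \eta \ d\mu dt&= -\int _{t_1}^{t_2} \int _{\Omega } u\, \frac{\partial \eta }{\partial t} \ d\mu dt, \\ \int _{t_1}^{t_2} \int _{\Omega } \big (X_i^*\mathcal {A}_i\big )\, \eta \ d\mu dt&= \int _{t_1}^{t_2} \int _{\Omega } \mathcal {A}_i\, X_i\eta \ d\mu dt, \end{aligned}$$

so that

$$\begin{aligned} -\int _{t_1}^{t_2} \int _{\Omega } u\, \frac{\partial \eta }{\partial t} \ d\mu dt = -\int _{t_1}^{t_2} \int _{\Omega } \mathcal {A}\cdot \Xi \eta \ d\mu dt + \int _{t_1}^{t_2} \int _{\Omega } \mathcal {B}\, \eta \ d\mu dt, \end{aligned}$$

which is (7.7) after moving all terms to one side.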

A function u is a weak super-solution (sub-solution) to (7.6) in E if whenever \(\Omega _{t_1,t_2} \Subset E\) for some domain \(\Omega \subset \mathcal {M}\), we have \(u \in L^p(t_1,t_2;W_{\Xi }^{1,p}(\Omega ))\), and the left hand side of (7.7) is non-negative (non-positive) for all non-negative test functions \(\eta \in W_0^{1,2}(t_1,t_2; L^2(\Omega )) \cap L^p (t_1,t_2; W_{\Xi ,0}^{1,p}(\Omega ))\).

The main results in [1] and [18] can be summarized in the following theorem.

Theorem 7.5

Let \((\mathcal {M},\mu ,d)\) be a p-admissible structure for some fixed \(p\in [2,\infty )\). For a bounded open subset \(\Omega \subset \mathcal {M}\), let u be a non-negative, weak solution to (7.6) in an open set containing the cylinder \(\Omega \times [0,T_0]\) and assume that the structure conditions (7.5) are satisfied.

  • For \(p=2\) and for any subcylinder \(Q_{3\rho }=B(\bar{x},3\rho )\times (\bar{t}-9\rho ^2,\bar{t})\subset Q\) there exists a constant \(C>0\) depending on \(C_D,\,C_L,\,C_P\), the structure conditions (7.3) and on \(\rho \) such that

    $$\begin{aligned} \sup _{Q^-}u\le C\inf _{Q^+} (u+\rho ^\theta k), \end{aligned}$$
    (7.8)

    where

    $$\begin{aligned} Q^+=B(\bar{x},\rho )\times (\bar{t}-\rho ^2,\bar{t})\text { and }Q^-=B(\bar{x},\rho )\times (\bar{t}- 8\rho ^2, \bar{t}-7 \rho ^2) \end{aligned}$$
    (7.9)

    \(\theta >0\) is defined as in (7.4), and we have let \(k=||f||+||g||+||h||\).

  • For \(p>2\): Assuming \(\mathcal {B}=0\), there exist constants \(C_1,C_2,C_3 \ge 1\), depending only on \(\Xi ,C_D,C_P,\mathcal {A}_0,\mathcal {A}_1,p\), such that for almost all \((x_0,t_0)\in \Omega \times [0,T_0]\), the following holds: If \(u(x_0,t_0)>0\), and if \(0<r\le R(\Xi , \bar{\Omega })\) (from Definition 7.1) is sufficiently small so that

    $$\begin{aligned} B(x_0,8r)\subset \Omega \quad \text { and }\quad (t_0 - C_1 u(x_0,t_0)^{2-p}{r}^p,\ t_0 + C_1 u(x_0,t_0)^{2-p}{r}^p) \subset (0,T_0), \end{aligned}$$

    then

    $$\begin{aligned} u(x_0, t_0) \le C_2\inf _{Q}u, \end{aligned}$$

    where

    $$\begin{aligned} Q=B(x_0, {r}) \times \bigg (t_0 +\frac{1}{2}{C_3} { u(x_0,t_0)^{2-p}{r}^p}, t_0 + C_3 u(x_0,t_0)^{2-p}{r}^p\bigg ). \end{aligned}$$

    Furthermore, the constants \(C_1,C_2,C_3\) can be chosen independently of p as \(p\rightarrow 2\).
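The time scale \(u(x_0,t_0)^{2-p}r^p\) reflects the scaling of the equation. As a sketch (the notation \(\lambda \), v and \(\tilde{\mathcal {A}}\) is introduced here for illustration only), let \(\lambda >0\), let u solve (7.6) with \(\mathcal {B}=0\), and set

$$\begin{aligned} v(x,t)=\lambda ^{-1}u(x,\lambda ^{2-p}t), \qquad \tilde{\mathcal {A}}(x,t,v,\xi )=\lambda ^{1-p}\mathcal {A}(x,\lambda ^{2-p}t,\lambda v,\lambda \xi ). \end{aligned}$$

Then v solves \(\partial _t v=-\sum _{i=1}^m X_i^*\tilde{\mathcal {A}}_i(x,t,v,\Xi v)\), and \(\tilde{\mathcal {A}}\) satisfies (7.5) with the same constants:

$$\begin{aligned} \tilde{\mathcal {A}}\cdot \xi =\lambda ^{-p}\mathcal {A}(x,\lambda ^{2-p}t,\lambda v,\lambda \xi )\cdot (\lambda \xi )\ge \mathcal {A}_0|\xi |^p, \qquad |\tilde{\mathcal {A}}|\le \lambda ^{1-p}\mathcal {A}_1|\lambda \xi |^{p-1}=\mathcal {A}_1|\xi |^{p-1}, \end{aligned}$$

so \(\mathcal {A}\) and \(\tilde{\mathcal {A}}\) are structurally similar. Choosing \(\lambda =u(x_0,t_0)\) normalizes v to the value 1 at the point corresponding to \((x_0,t_0)\), and a time interval of length \(r^p\) for v corresponds to one of length \(u(x_0,t_0)^{2-p}r^p\) for u. For \(p=2\) the factor equals 1 and one recovers the usual parabolic scaling \(r^2\).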

We conclude this section with a corollary of the proof in [18, Lemma 3.6], a weak Harnack inequality that plays an important role in the proof of the regularity of the mean curvature flow for graphs over certain Lie groups established in [14]. Consider a weak supersolution \(w \in L^p(t_1,t_2;W_{\Xi }^{1,p}(\Omega ))\) of the linear equation

$$\begin{aligned} -\partial _t w- \sum _{i,j=1}^m X_i^* (a_{ij}(x,t) X_j w)= g(x,t), \end{aligned}$$
(7.10)

with \(t_1,t_2,\Omega \) as defined above. Assume the coercivity hypothesis

$$\begin{aligned} \Lambda ^{-1} \sum _{d(i)=1} \xi _i^2 \le \sum _{i,j=1}^m a_{ij}(x,t) \xi _i \xi _j \le \Lambda \sum _{d(i)=1} \xi _i^2 \end{aligned}$$
(7.11)

for a.e. \((x,t)\) and all \(\xi \in \mathbb {R}^m\), for a suitable constant \(\Lambda \).

Proposition 7.6

Let \((\mathcal {M},\mu ,d)\) be a 2-admissible structure. For a bounded open subset \(\Omega \subset \mathcal {M}\), let w be a non-negative, weak supersolution to (7.10) in an open set containing the cylinder \(\Omega \times [0,T_0]\) and assume that conditions (7.11) are satisfied. For any subcylinder \(Q_{3\rho }=B(\bar{x},3\rho )\times (\bar{t}-9\rho ^2,\bar{t})\subset Q\) there exists a constant \(C>0\) depending on \(C_D,\,C_L,\,C_P\), the structure conditions (7.3) and on \(\rho \) such that

$$\begin{aligned} \frac{1}{|Q^-|} \int _{Q^-} w \ dx dt \le C\left( \inf _{Q^+} w +\sup _{Q^+} |g| \rho ^2\right) , \end{aligned}$$
(7.12)

with \(Q^+, Q^-\) as defined in (7.9).

8 Application II: regularity for quasilinear subelliptic PDE through Riemannian approximation

Let G be a Carnot group of step two. We denote by \(\mathfrak {g}\) its Lie algebra and by \(\mathfrak {g}=V^1\oplus V^2\) its stratification. If \(g_0\) is a subRiemannian metric on \(V^1\) we let \(d_0\) denote its corresponding CC metric, and \(X_1,\ldots ,X_m\) a left-invariant orthonormal basis of \(V^1\). Consider a smooth surface \(M\subset G\). A point \(p\in M\) is characteristic if the horizontal distribution given by left translation of \(V^1\) is entirely contained in the tangent plane \(T_pM\). Equivalently, if M is represented as the 0-level set of a function f, the characteristic points are those where the horizontal gradient of f vanishes. At non-characteristic points several equivalent definitions for the notion of horizontal mean curvature \(h_0\) have been proposed. To quote a few: \(h_0\) can be defined in terms of the Legendrian foliation [22], as the first variation of the area functional [19, 22, 30, 49, 75], as the horizontal divergence of the horizontal unit normal, or as the limit of the mean curvatures \(h_\epsilon \) of suitable Riemannian approximating metrics \(\sigma _\epsilon \) [19]. If the surface is not regular, the notion of curvature can be expressed in the viscosity sense (we refer to [2, 4, 5, 13, 61, 63, 81, 82] for viscosity solutions of PDE in the sub-Riemannian setting). The formulation we use here is the following: at every non-characteristic point p in the level set of f we set

$$\begin{aligned} h_0(p)=\sum _{i=1}^m X_i\left( X_i f/ |\nabla _0 f|_0\right) . \end{aligned}$$
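As an illustration of this formula, consider the first Heisenberg group, the model Carnot group of step two, with coordinates \((x_1,x_2,x_3)\) and horizontal frame \(X_1=\partial _{x_1}-\frac{x_2}{2}\partial _{x_3}\), \(X_2=\partial _{x_2}+\frac{x_1}{2}\partial _{x_3}\) (this choice of coordinates and normalization is an assumption of the example, not fixed by the text). For the plane \(\{f=0\}\) with \(f(x)=x_3\) one finds, writing \(\rho =\sqrt{x_1^2+x_2^2}\),

$$\begin{aligned} X_1f=-\frac{x_2}{2},\quad X_2f=\frac{x_1}{2},\quad |\nabla _0 f|_0=\frac{\rho }{2},\qquad h_0=X_1\left( -\frac{x_2}{\rho }\right) +X_2\left( \frac{x_1}{\rho }\right) =\frac{x_1x_2}{\rho ^3}-\frac{x_1x_2}{\rho ^3}=0. \end{aligned}$$

The horizontal gradient vanishes exactly when \(x_1=x_2=0\), so the origin is the only characteristic point of this plane, and the horizontal mean curvature vanishes at all of its other points.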

The mean curvature flow is the motion of a surface in which each point moves in the direction of the normal with speed equal to the mean curvature. In the total variation flow the speed is the mean curvature, scaled by the modulus of the gradient. Both flows play key roles in digital image processing and provide prototypes for modeling the motion of interfaces in a variety of physical settings.

As an illustration of the usefulness of the uniform estimates established above, in this section we want to briefly sketch the strategy used in [14] and [17], where the Riemannian approximation scheme is used to establish regularity for the graph solutions of the Total Variation flow

$$\begin{aligned} \frac{\partial u}{\partial t} =\sum _{i=1}^m X_i \left( \frac{X_iu}{\sqrt{1+|\nabla _0 u|^2}}\right) , \end{aligned}$$
(8.1)

and for the graphical solutions of the mean curvature flow

$$\begin{aligned} \frac{\partial u}{\partial t} =\sqrt{1+|\nabla _0 u|^2}\sum _{i=1}^m X_i \left( \frac{X_iu}{\sqrt{1+|\nabla _0 u|^2}}\right) . \end{aligned}$$
(8.2)

In both cases \(\Omega \subset G\) is a bounded open set, where G is a Lie group, free up to step two, but not necessarily nilpotent. These equations describe the motions of the (non-characteristic) hypersurfaces given by the graphs of the solutions in \(G\times \mathbb {R}\).

We will consider solutions arising as limits of solutions of the analogous Riemannian flows, i.e.

$$\begin{aligned} \frac{\partial u_{\epsilon }}{\partial t} = h_{\epsilon }=\sum _{i=1}^n X_i^{\epsilon }\left( \frac{X_i^{\epsilon }u_{\epsilon }}{W_{\epsilon } }\right) \quad \text { for }x\in \Omega , \; t>0, \end{aligned}$$
(8.3)

and

$$\begin{aligned} \frac{\partial u_{\epsilon }}{\partial t} =W_\epsilon h_{\epsilon }=W_\epsilon \sum _{i=1}^n X_i^{\epsilon }\left( \frac{X_i^{\epsilon }u_{\epsilon }}{W_{\epsilon } }\right) =\sum _{i,j=1}^na_{ij}^\epsilon (\nabla _\epsilon u_\epsilon ) X_i^\epsilon X_j^\epsilon u_\epsilon \quad \text { for }x\in \Omega , \; t>0, \end{aligned}$$
(8.4)

where \(h_\epsilon \) is the mean curvature of the graph of \(u_\epsilon (\cdot , t)\) and

$$\begin{aligned} W_\epsilon ^2=1+|\nabla _\epsilon u_\epsilon |^2= 1+\sum _{i=1}^n \left( X_i^\epsilon u_\epsilon \right) ^2 \text { and } a_{ij}^\epsilon (\xi )= \delta _{ij}-\frac{\xi _i \xi _j}{1+|\xi |^2} , \end{aligned}$$
(8.5)

for all \(\xi \in \mathbb {R}^n\).
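For the reader's convenience, the last identity in (8.4) can be checked directly. In the following sketch we write \(X_i,\,u,\,W\) for \(X_i^\epsilon ,\,u_\epsilon ,\,W_\epsilon \) and sum over repeated indices from 1 to n. Differentiating the first relation in (8.5) gives \(W X_iW= X_ku\, X_iX_ku\), hence

$$\begin{aligned} W X_i\left( \frac{X_iu}{W}\right) = X_iX_iu-\frac{X_iu\, X_ku}{W^2}\, X_iX_ku =\left( \delta _{ik}-\frac{X_iu\, X_ku}{1+|\nabla _\epsilon u|^2}\right) X_iX_ku, \end{aligned}$$

which is \(a_{ik}^\epsilon (\nabla _\epsilon u) X_iX_ku\). Moreover, for all \(\xi ,\eta \in \mathbb {R}^n\) the Cauchy–Schwarz inequality yields

$$\begin{aligned} \frac{|\eta |^2}{1+|\xi |^2}\le a_{ij}^\epsilon (\xi )\eta _i\eta _j=|\eta |^2-\frac{(\xi \cdot \eta )^2}{1+|\xi |^2}\le |\eta |^2, \end{aligned}$$

so the matrix \(a_{ij}^\epsilon (\nabla _\epsilon u_\epsilon )\) is uniformly elliptic on any set where \(|\nabla _\epsilon u_\epsilon |\) is bounded.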

The main results in [14] and [17] concern long time existence of solutions of the initial value problems

$$\begin{aligned} \Bigg \{ \begin{array}{ll} \partial _t u_\epsilon = h_\epsilon W_\epsilon &{}\text { in }Q=\Omega \times (0,T) \\ u_\epsilon =\varphi &{}\text { on } \partial _p Q, \end{array} \quad \text { and }\quad \Bigg \{ \begin{array}{ll} \partial _t u_\epsilon = h_\epsilon &{}\text { in }Q=\Omega \times (0,T) \\ u_\epsilon =\varphi &{}\text { on } \partial _p Q, \end{array} \end{aligned}$$
(8.6)

with \(\partial _p Q=(\Omega \times \{t=0\})\cup (\partial \Omega \times (0,T))\) denoting the parabolic boundary of Q.

Theorem 8.1

Let G be a Lie group of step two, \(\Omega \subset G\) a bounded, open, convex set (in a sense to be defined later) and \(\varphi \in C^2(\bar{\Omega })\). There exist unique solutions \(u_\epsilon \in C^{\infty }(\Omega \times (0,\infty ))\cap L^\infty ((0,\infty ),C^1(\bar{\Omega }))\) of the two initial value problems in (8.6), and for each \(k\in \mathbb {N}\) and \(K\subset \subset Q\), there exists \(C_k=C_k(G,\varphi ,k,K,\Omega )>0\) not depending on \(\epsilon \) such that

$$\begin{aligned} ||u_\epsilon ||_{C^k(K)} \le C_k. \end{aligned}$$
(8.7)

Corollary 8.2

Under the assumptions of Theorem 8.1, as \(\epsilon \rightarrow 0\) the solutions \(u_\epsilon \) of either flow converge uniformly (with all their derivatives) on compact subsets of Q to the unique, smooth solution of the corresponding sub-Riemannian flow in \(\Omega \times (0,\infty )\) with initial data \(\varphi \).

The proof of this result rests crucially on the estimates established in this paper. In the following we list the main steps. First of all we note that in view of the short time existence result in the Riemannian setting we can assume that locally \(u_\epsilon \) are smooth both in time and space.

8.1 Interior gradient bounds

Denote by \(X_i^r\) the right-invariant frame corresponding to the \(X_i\)'s and observe that the left- and right-invariant frames commute. For both flows, consider solutions \(u_\epsilon \in C^3(Q)\) and denote \(v_0=\partial _t u_\epsilon ,\,v_i = X_i^{r}u_\epsilon \) for \(i=1, \ldots , n\). Then for every \(h=0,\ldots ,n\) one has that \(v_h\) is a solution of

$$\begin{aligned} \partial _t v_h= X_i^\epsilon ( a_{ij}^\epsilon X_j^\epsilon v_h )= a_{ij}^\epsilon (\nabla _\epsilon u_\epsilon ) X_i^\epsilon X_j^\epsilon v_h + a^{i,j,k}(\nabla _\epsilon u_\epsilon )X_i^\epsilon X_j^\epsilon u_\epsilon X_k^\epsilon v_h, \end{aligned}$$
(8.8)

where

$$\begin{aligned} a^{i,j,h}(p)=\frac{\partial a^\epsilon _{ij}}{\partial p_h} , \quad a_{ij}^\epsilon (p) = \frac{1}{\sqrt{1+|p|^2}}\left( \delta _{ij}- \frac{p_i p_j}{1+|p|^2}\right) , \end{aligned}$$

for the total variation flow, while

$$\begin{aligned} a^{i,j,h}(p)=\frac{\partial a^\epsilon _{ij}}{\partial p_h} - \frac{\partial a^\epsilon _{ih}}{\partial p_j}, \quad a_{ij}^\epsilon (p) = \delta _{ij}- \frac{p_i p_j}{1+|p|^2}, \end{aligned}$$

for the mean curvature flow. The weak parabolic maximum principle yields that there exists \(C=C(G,||\varphi ||_{C^2( \Omega )})>0\) such that for every compact subset \(K\subset \subset \Omega \) one has

$$\begin{aligned} \sup _{K \times [0,T)} |\nabla _1 u_\epsilon | \le \sup _{\partial _p Q}(|\nabla _1 u_\epsilon | + |\partial _ t u_\epsilon |), \end{aligned}$$

where \(\nabla _1\) is the full \(g_1-\)Riemannian gradient. This yields the desired uniform interior gradient bounds. This argument works in any Lie group, with no restrictions on the step.

8.2 Global gradient bounds

The proof of the boundary gradient estimates is more delicate and depends crucially on the geometry of the space. In particular the argument we outline here only holds in step two groups G and for domains \(\Omega \subset G\) that are locally Euclidean convex when expressed in the Rothschild-Stein preferred coordinates. As usual we note that the function \(v_\epsilon =u_\epsilon -\varphi \) solves the homogeneous ’boundary’ value problem

$$\begin{aligned} \Bigg \{ \begin{array}{ll} P( v_\epsilon )=0 &{}\text { in }Q=\Omega \times (0,T) \\ v_\epsilon =0 &{}\text { on } \partial _p Q, \end{array} \end{aligned}$$
(8.9)

with \(b^\epsilon (x)= a_{ij}^\epsilon (\nabla _\epsilon v_\epsilon (x)+\nabla _\epsilon \varphi (x) ) X_i^\epsilon X_j^\epsilon \varphi (x)\) and

$$\begin{aligned} P(v)= a_{ij}^\epsilon (\nabla _\epsilon v_\epsilon +\nabla _\epsilon \varphi ) X_i^\epsilon X_j^\epsilon v_\epsilon +b^\epsilon -\partial _t v. \end{aligned}$$
(8.10)
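For the mean curvature flow (8.4), the fact that \(v_\epsilon \) satisfies \(P(v_\epsilon )=0\) can be seen as follows (a sketch, using only that \(\varphi \) does not depend on t). Substituting \(u_\epsilon =v_\epsilon +\varphi \) into (8.4) gives

$$\begin{aligned} \partial _t v_\epsilon =\partial _t u_\epsilon = a_{ij}^\epsilon (\nabla _\epsilon v_\epsilon +\nabla _\epsilon \varphi )\big ( X_i^\epsilon X_j^\epsilon v_\epsilon + X_i^\epsilon X_j^\epsilon \varphi \big ) = a_{ij}^\epsilon (\nabla _\epsilon v_\epsilon +\nabla _\epsilon \varphi ) X_i^\epsilon X_j^\epsilon v_\epsilon +b^\epsilon , \end{aligned}$$

while \(u_\epsilon =\varphi \) on \(\partial _p Q\) gives \(v_\epsilon =0\) there.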

Then we construct for each point \(p_0=(x_0,t_0)\in \partial \Omega \times (0,T)\) a barrier function:

Lemma 8.3

Let G be a Carnot group of step two and \(\Omega \subset G\) convex in the Euclidean sense. For each point \(p_0=(x_0,t_0)\in \partial \Omega \times (0,T)\) one can construct a positive function \(w\in C^2(Q)\) such that

$$\begin{aligned} P(w)\le & {} 0 \text { in }V\cap Q \text { with }V\text { a parabolic neighborhood of } p_0,\nonumber \\ w(p_0)= & {} 0 \text { and }w\ge v_\epsilon \text { in }\partial _pV\cap Q. \end{aligned}$$
(8.11)

Proof

Since \(\Omega \) is convex in the Euclidean sense, every \(x_0\in \partial \Omega \) admits a supporting hyperplane \(\Pi \) defined by an equation of the form \(\Pi (x)=\sum _{i=1}^n a_i x_i=0\) with \(\Pi >0\) in \(\Omega ,\,\Pi (x_0)=0\), and normalized as \(\sum _{d(i)=1,2}a_i^2=1\). Following the standard argument (see for instance [60, Chapter 10]) one proves that there exists a function \(\Phi \) such that the barrier at \((x_0,t_0)\in \partial \Omega \times (0,T)\) can be expressed in the form \(w= \Phi (\Pi )\). \(\square \)

Comparison with the barrier constructed above yields the following.

Proposition 8.4

Let G be a Carnot group of step two, \(\Omega \subset G\) a bounded, open, convex (in the Euclidean sense) set and \(\varphi \in C^2(\bar{\Omega })\). For \(\epsilon >0\) denote by \(u_\epsilon \in C^2(\Omega \times (0,T))\cap C^1(\bar{\Omega }\times (0,T))\) the non-negative unique solution of the initial value problem (8.6). There exists \(C=C(G, ||\varphi ||_{C^2(\bar{\Omega })})>0\) such that

$$\begin{aligned} \sup _{\partial \Omega \times (0,T)}|\nabla _\epsilon u_\epsilon | \le \sup _{\partial \Omega \times (0,T)}|\nabla _1 u_\epsilon | \le C. \end{aligned}$$
(8.12)

Proof

$$\begin{aligned} 0\le \frac{v_\epsilon (x,t)}{dist_{\sigma _1}(x,x_0)} \le \frac{w(x,t)}{{dist_{\sigma _1}(x,x_0)}} \le C(k,\nu ), \end{aligned}$$
(8.13)

in \(V\cap Q\), with \(dist_{\sigma _1}(x,x_0)\) being the distance between x and \(x_0\) in the Riemannian metric \(\sigma _1\), concluding the proof of the boundary gradient estimates. \(\square \)

8.3 Harnack inequalities and \(C^{1,\alpha }\) estimates

Let us first recall that \((G,d_\epsilon )\) is a 2-admissible geometry in the sense of Definition 7.1, with doubling and Poincaré constants uniform in \(\epsilon \ge 0\), as we proved in Theorems 1.1 and 1.2.

The total variation equation is expressed in divergence form, hence Eq. (8.8), satisfied by the right derivatives \(v_h = X^r_h u_\epsilon \), is in the same form. The mean curvature flow is not in divergence form. However, arguing as in [60] it is possible to show that there exists a regular, positive and strictly monotone function \(\Phi \) such that \(\Phi (v_h)\) satisfies a divergence form equation. As a consequence we can apply the Harnack inequalities in Theorem 7.5 and Proposition 7.6 to the bounded solutions \(v_h\) (or \(\Phi (v_h)\)), thus yielding the \(C^{1,\alpha }\) uniform interior estimates.

Corollary 8.5

Let \(K\subset \subset Q\) be a compact set. There exist constants \(\alpha \in (0,1)\) and \(C=C(K,\alpha )>0\) such that for all \(i=1,\ldots ,n\) the function \(v= X_i^\epsilon u_\epsilon \) satisfies

$$\begin{aligned} ||v||_{ C_{\epsilon ,X}^{\alpha }(K)} + ||\nabla _\epsilon v||_{L^2(K)}\le C, \end{aligned}$$

uniformly in \(\epsilon \in (0,1)\).

8.4 Schauder estimates and higher order estimates

The uniform Gaussian estimates and Schauder estimates in Theorem 1.4 applied to (8.8) yield the higher order estimates and conclude the proof. Once the interior \(C^{1, \alpha }\) estimates of the solution, uniform in \(\epsilon \), have been obtained, we write the mean curvature flow equation in non-divergence form:

$$\begin{aligned} \partial _t u_\epsilon - \sum _{i,j=1}^n a^\epsilon _{ij}(x,t) X_i^\epsilon X_j^\epsilon u_\epsilon =0. \end{aligned}$$

Applying Schauder estimates in Proposition we immediately deduce the proof of Theorem 8.1.

Proof of Theorem 8.1

Since the solution is of class \(C^{1, \alpha }\), with norm bounded uniformly in \(\epsilon \), the function \(u_\epsilon \) is a solution of the non-divergence form equation

$$\begin{aligned} \partial _t u_\epsilon - \sum _{i,j=1}^n a^\epsilon _{ij}(x,t) X_i^\epsilon X_j^\epsilon u_\epsilon =0, \end{aligned}$$

with \(a_{ij}^\epsilon \) of class \(C^\alpha \). More precisely, for every compact set \(K\subset \subset {Q}\), with \(2\delta =d_0(K, \partial _p Q)\), there exists a positive constant \(C_0\) such that

$$\begin{aligned} || a_{ij}^\epsilon ||_{C^{ \alpha }_{\epsilon ,X}(K_\delta )} \le C_0, \end{aligned}$$

for every \(\epsilon \in (0,1)\). Consequently, by Proposition 6.2 there exists a constant \(C_2\) such that

$$\begin{aligned} ||u_\epsilon ||_{C^2(K)} \le C_2. \end{aligned}$$

We now prove by induction that for every \(k\in \mathbb {N}\) and for every compact set \(K\subset \subset {Q}\) there exists a positive constant C such that

$$\begin{aligned} ||u_\epsilon ||_{C^{k,\alpha }_{\epsilon ,X}(K)}\le C, \end{aligned}$$
(8.14)

for every \(\epsilon >0\). The assertion is true if \(k=2\), by Proposition 6.5. If (8.14) is true for \(k+1\), then the coefficients \(a_{ij}^\epsilon \) belong to \( C^{k, \alpha }_{\epsilon ,X}\) uniformly in \(\epsilon \in (0,1)\), and (8.14) holds at the level \(k+2\) by virtue of Proposition 6.2. \(\square \)