1 Introduction and statement of main results

In this paper we establish certain estimates related to the solvability, by way of layer potentials, of the Dirichlet, Neumann and Regularity problems with data in \(L^2\) (in the following referred to as (D2), (N2) and (R2)) for second order parabolic equations of the form

$$\begin{aligned} \mathcal {H}u:=(\partial _t+\mathcal {L})u = 0, \end{aligned}$$
(1.1)

where

$$\begin{aligned} \mathcal {L}:=-\text{ div } A(X,t)\nabla =-\sum _{i,j=1}^{n+1}\partial _{x_i}(A_{i,j}(X,t)\partial _{x_j}) \end{aligned}$$

is defined in \(\mathbb R^{n+2}=\{(X,t)=(x_1,\ldots ,x_{n+1},t)\in \mathbb R^{n+1}\times \mathbb R\}\), \(n\ge 1\). The matrix \(A=A(X,t)=\{A_{i,j}(X,t)\}_{i,j=1}^{n+1}\) is assumed to be an \((n+1)\times (n+1)\)-dimensional matrix with complex coefficients satisfying the uniform ellipticity condition

$$\begin{aligned} \,\mathrm{(i)}&\Lambda ^{-1}|\xi |^2\le \text{ Re } \left( \sum _{i,j=1}^{n+1} A_{i,j}(X,t)\xi _i\bar{\xi }_j\right) ,\nonumber \\ \,\mathrm{(ii)}&|A\xi \cdot \zeta |\le \Lambda |\xi ||\zeta |, \end{aligned}$$
(1.2)

for some \(\Lambda \), \(1\le \Lambda <\infty \), and for all \(\xi ,\zeta \in \mathbb C^{n+1}\), \((X,t)\in \mathbb R^{n+2}\). Here \(u\cdot v=u_1v_1+\cdots +u_{n+1}v_{n+1}\), \(\bar{u}\) denotes the complex conjugate of u and \(u\cdot \bar{v}\) is the (standard) inner product on \(\mathbb C^{n+1}\). In addition, we consistently assume that

$$\begin{aligned} A(x_1,\ldots ,x_{n+1},t)=A(x_1,\ldots ,x_{n}),\quad {\text {i.e.,}}\; A\; {\mathrm{is } \ \mathrm{independent } \ \mathrm{of }} \ x_{n+1}\quad {\text {and}}\; t. \end{aligned}$$
(1.3)

The solvability of (D2), (N2) and (R2) for the operator \(\mathcal {H}\) in \(\mathbb R^{n+2}_+=\{(x,x_{n+1},t)\in \mathbb R^{n}\times \mathbb R\times \mathbb R:\ x_{n+1}>0\}\), with data prescribed on \(\mathbb R^{n+1}=\partial \mathbb R^{n+2}_+=\{(x,x_{n+1},t)\in \mathbb R^{n}\times \mathbb R\times \mathbb R:\ x_{n+1}=0\}\) and by way of layer potentials, can roughly be decomposed into two steps: boundedness of layer potentials and invertibility of layer potentials. In this paper we first prove, in the case of equations of the form (1.1), satisfying (1.2) and (1.3) and the De Giorgi–Moser–Nash estimates stated in (2.6) and (2.7) below, that a set of key boundedness estimates for associated single layer potentials can be reduced to two crucial estimates (Theorem 1.1), one being a square function estimate involving the single layer potential. By establishing a local parabolic Tb-theorem for square functions, and by establishing a version of the main result in [15] for equations of the form (1.1), assuming in addition that A is real and symmetric, we are then able to verify the two crucial estimates in the case of real, symmetric operators (1.1) satisfying (1.2) and (1.3) (Theorem 1.2). As part of this argument we establish, and this is of independent interest, a scale-invariant reverse Hölder inequality for the parabolic Poisson kernel (Theorem 1.3). The invertibility of layer potentials, and hence the solvability of the Dirichlet, Neumann and Regularity problems with \(L^2\)-data, is addressed in [33].

Jointly, this paper and [33] yield solvability for (D2), (N2) and (R2), by way of layer potentials, when the coefficient matrix is either

$$\begin{aligned} \,\mathrm{(i)}&\text{ a } \text{ small } \text{ complex } \text{ perturbation } \text{ of } \text{ a } \text{ constant } \text{(complex) } \text{ matrix, } \text{ or }\nonumber \\ \,\mathrm{(ii)}&\text{ a } \text{ real } \text{ and } \text{ symmetric } \text{ matrix, } \text{ or }\nonumber \\ \,\mathrm{(iii)}&\text{ a } \text{ small } \text{ complex } \text{ perturbation } \text{ of } \text{ a } \text{ real } \text{ and } \text{ symmetric } \text{ matrix }. \end{aligned}$$

In all cases the unique solutions can be represented in terms of layer potentials. We claim that the results established in this paper and in [33], and the tools developed, pave the way for important developments in the area of parabolic PDEs. In particular, it is interesting to generalize the present paper and [33] to the context of \(L^p\) and relevant endpoint spaces, and to challenge the assumption in (1.3).

The main results of this paper and [33] can jointly be seen as a parabolic analogue of the elliptic results established in [3]. We recall that in [3] the authors establish results concerning the solvability of the Dirichlet, Neumann and Regularity problems with data in \(L^2\), i.e., (D2), (N2) and (R2), by way of layer potentials and for elliptic operators of the form \(-\text{ div }\, A(X)\nabla ,\) in \(\mathbb R_+^{n+1}:=\{X=(x,x_{n+1})\in \mathbb R^{n}\times \mathbb R:\ x_{n+1}>0\}\), \(n\ge 2\), assuming that A is an \((n+1)\times (n+1)\)-dimensional matrix which is bounded, measurable, uniformly elliptic and complex, and assuming, in addition, that the entries of A are independent of the spatial coordinate \(x_{n+1}\). Moreover, if A is real and symmetric, (D2), (N2) and (R2) were solved in [27–29], but the major achievement in [3] is that the authors prove that the solutions can be represented by way of layer potentials. In [24] a version of [3], but in the context of \(L^p\) and relevant endpoint spaces, was developed and in [26] the structural assumption that A is independent of the spatial coordinate \(x_{n+1}\) is challenged. The core of the impressive arguments and estimates in [3] is based on the fine and elaborate techniques developed in the context of the proof of the Kato conjecture, see [4, 5, 20].

1.1 Notation

Based on (1.3) we let \(\lambda =x_{n+1}\), and when using the symbol \(\lambda \) we will write the point \((X,t)=(x_1,\ldots ,x_{n},x_{n+1},t)\) as \( (x,t, \lambda )=(x_1,\ldots ,x_{n},t,\lambda )\). Using this notation,

$$\begin{aligned}\mathbb R^{n+2}_+=\{(x,t,\lambda )\in \mathbb R^{n}\times \mathbb R\times \mathbb R:\ \lambda >0\},\end{aligned}$$

and

$$\begin{aligned}\mathbb R^{n+1}=\partial \mathbb R^{n+2}_+=\{(x,t,\lambda )\in \mathbb R^{n}\times \mathbb R\times \mathbb R:\ \lambda =0\}.\end{aligned}$$

We write \(\nabla :=(\nabla _{||},\partial _\lambda )\) where \(\nabla _{||}:=(\partial _{x_1},\ldots ,\partial _{x_n})\). We let \(L^2(\mathbb R^{n+1},\mathbb C)\) denote the Hilbert space of functions \(f:\mathbb R^{n+1}\rightarrow \mathbb C\) which are square integrable and we let \(||f||_2\) denote the norm of f. We also introduce

$$\begin{aligned} |||\cdot |||:=\left( \int _0^\infty \int _{\mathbb R^{n+1}}|\cdot |^2\, \frac{dxdtd\lambda }{\lambda }\right) ^{1/2}. \end{aligned}$$
(1.4)

Given \((x,t)\in \mathbb R^{n}\times \mathbb R\) we let \(\Vert (x,t)\Vert \) be the unique positive solution \(\rho \) to the equation

$$\begin{aligned} \frac{t^2}{\rho ^4}+\sum \limits ^{n}_{i=1}\frac{x^2_i}{\rho ^2}=1. \end{aligned}$$

Then \(\Vert (\gamma x,\gamma ^2t)\Vert =\gamma \Vert (x,t)\Vert \), \(\gamma >0\), and we call \(\Vert (x,t)\Vert \) the parabolic norm of \((x,t)\). We define the parabolic first order differential operator \(\mathbb D\) through the relation

$$\begin{aligned} \widehat{(\mathbb D f)}(\xi ,\tau ):=\Vert (\xi ,\tau )\Vert \hat{f}(\xi ,\tau ), \end{aligned}$$

where \(\widehat{(\mathbb D f)}\) and \(\hat{f}\) denote the Fourier transform of \(\mathbb D f\) and f, respectively. We define the fractional (in time) differentiation operator \(D_{1/2}^t\) through the relation

$$\begin{aligned} \widehat{(D_{1/2}^t f)}(\xi ,\tau ):=|\tau |^{1/2}\hat{f}(\xi ,\tau ). \end{aligned}$$

We let \(H_t\) denote a Hilbert transform in the t-variable defined through the multiplier \(i\text{ sgn }({\tau })\). We make the construction so that

$$\begin{aligned} \partial _t=D_{1/2}^tH_tD_{1/2}^t. \end{aligned}$$
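As a consistency check at the level of Fourier multipliers, and assuming that the Fourier transform in t is normalized so that \(\partial _t\) corresponds to multiplication by \(i\tau \) (a convention that is not fixed explicitly above), the three multipliers compose as

$$\begin{aligned} |\tau |^{1/2}\cdot i\,\text{ sgn }(\tau )\cdot |\tau |^{1/2}=i\,\text{ sgn }(\tau )|\tau |=i\tau , \end{aligned}$$

which is the symbol of \(\partial _t\) under that normalization.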

By applying Plancherel’s theorem we have

$$\begin{aligned} \Vert \mathbb D f\Vert _{2}\approx \Vert \nabla _{||} f\Vert _2+\Vert H_tD^t_{1/2}f\Vert _2\approx \Vert \nabla _{||} f\Vert _2+\Vert D^t_{1/2}f\Vert _2, \end{aligned}$$

with constants depending only on n.
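To indicate where this equivalence comes from, we include a short sketch. The defining equation for \(\rho =\Vert (\xi ,\tau )\Vert \) is a quadratic equation in \(\rho ^2\): multiplying by \(\rho ^4\) gives \(\rho ^4-|\xi |^2\rho ^2-\tau ^2=0\), where \(|\xi |^2=\xi _1^2+\cdots +\xi _n^2\), and taking the positive root we obtain

$$\begin{aligned} \Vert (\xi ,\tau )\Vert ^2=\frac{|\xi |^2+\sqrt{|\xi |^4+4\tau ^2}}{2}. \end{aligned}$$

Since \(\max \{|\xi |^2,2|\tau |\}\le \sqrt{|\xi |^4+4\tau ^2}\le |\xi |^2+2|\tau |\), it follows that

$$\begin{aligned} \max \{|\xi |^2,|\tau |\}\le \Vert (\xi ,\tau )\Vert ^2\le |\xi |^2+|\tau |, \end{aligned}$$

so that \(\Vert (\xi ,\tau )\Vert \) is comparable to \(|\xi |+|\tau |^{1/2}\). As the multiplier \(i\text{ sgn }({\tau })\) has modulus one for \(\tau \ne 0\), Plancherel's theorem then gives the two equivalences in the display above.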

1.2 Non-tangential maximal functions

Given \((x_0,t_0)\in \mathbb R^{n+1}\), and \(\beta >0\), we define the cone

$$\begin{aligned} \Gamma ^\beta (x_0,t_0):=\{(x,t,\lambda )\in \mathbb R^{n+2}_+:\ ||(x-x_0,t-t_0)||<\beta \lambda \}. \end{aligned}$$

Consider a function U defined on \(\mathbb R^{n+2}_+\). The non-tangential maximal operator \(N_*^\beta \) is defined by

$$\begin{aligned} N_{*}^\beta (U)(x_0,t_0):=\sup _{(x,t,\lambda )\in \Gamma ^\beta (x_0,t_0)}\ |U(x,t,\lambda )|. \end{aligned}$$

Given \((x,t)\in \mathbb R^{n+1}\), \(\lambda >0\), we let

$$\begin{aligned} Q_\lambda (x,t):=\{(y,s):\ |x_i-y_i|<\lambda ,\ |t-s|<\lambda ^2\} \end{aligned}$$

denote the parabolic cube on \(\mathbb R^{n+1}\), with center \((x,t)\) and side length \(\lambda \). We let

$$\begin{aligned} W_\lambda (x,t):=\{(y,s,\sigma ):\ (y,s)\in Q_\lambda (x,t),\lambda /2<\sigma <3\lambda /2\} \end{aligned}$$

be an associated Whitney type set. Using this notation we also introduce

$$\begin{aligned} \tilde{N}_*^\beta (U)(x_0,t_0):=\sup _{(x,t,\lambda )\in \Gamma ^\beta (x_0,t_0)}\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x,t)}|U(y,s,\sigma )|^2\, dydsd\sigma \right) ^{1/2}. \end{aligned}$$

We let

$$\begin{aligned} \Gamma (x_0,t_0):=\Gamma ^1(x_0,t_0),\ N_{*}(U):=N_{*}^1(U),\ \tilde{N}_{*}(U):=\tilde{N}_{*}^1(U). \end{aligned}$$

Furthermore, in many estimates it is necessary to increase the \(\beta \) in \(\Gamma ^\beta \) as the estimate progresses. We will use the convention, when the exact \(\beta \) is not important, that \(N_{**}(U)\), \(\tilde{N}_{**}(U)\), equal \(N_{*}^\beta (U)\), \(\tilde{N}_{*}^\beta (U)\), for some \(\beta >1\). In fact, the \(L^p\)-norms of \(N_{*}\) and \(N_{*}^\beta \) are equivalent, for any \(\beta >0\) (see for example [16, Lemma 1, p. 166]).
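For the averages appearing in \(\tilde{N}_*^\beta \) it can be convenient to record the size of the sets involved. The following is a direct computation from the definitions of \(Q_\lambda (x,t)\) and \(W_\lambda (x,t)\) above and is included only as an illustration:

$$\begin{aligned} |Q_\lambda (x,t)|=(2\lambda )^n\cdot 2\lambda ^2=2^{n+1}\lambda ^{n+2},\qquad |W_\lambda (x,t)|=|Q_\lambda (x,t)|\cdot \lambda =2^{n+1}\lambda ^{n+3}. \end{aligned}$$

In particular \(|Q_{\gamma \lambda }(x,t)|=\gamma ^{n+2}|Q_\lambda (x,t)|\) for \(\gamma >0\), which reflects that \(\mathbb R^{n+1}\) has homogeneous dimension \(n+2\) with respect to the parabolic dilations \((x,t)\mapsto (\gamma x,\gamma ^2t)\).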

1.3 Single layer potentials

Consider \(\mathcal {H}=\partial _t+\mathcal {L}=\partial _t-\mathop {{\text {div}}}\nolimits A\nabla \) and \(\mathcal {H}^*:=-\partial _t+\mathcal {L}^*\), where \(\mathcal {L}^*\) is the hermitian adjoint of \(\mathcal {L}\), i.e., \(\mathcal {L}^*=-\mathop {{\text {div}}}\nolimits A^*\nabla \). Assume that \(\mathcal {H}\), \(\mathcal {H}^*\), satisfy (1.2) and (1.3). Then \(\mathcal {L}=-\mathop {{\text {div}}}\nolimits A\nabla \) defines, recall that A is independent of t, a maximal accretive operator on \(L^2(\mathbb R^{n+1},\mathbb C)\) and \(-\mathcal {L}\) generates a contraction semigroup on \(L^2(\mathbb R^{n+1},\mathbb C)\), \(e^{-t\mathcal {L}}\), for \(t>0\), see p. 28 in [6]. Let \(K_t(X,Y)\) denote the distributional or Schwartz kernel of \(e^{-t\mathcal {L}}\). In the statement of our main results, and hence throughout the paper, we will assume, in addition to (1.2) and (1.3), that \(\mathcal {H}\), \(\mathcal {H}^*\), both satisfy De Giorgi–Moser–Nash estimates stated in (2.6) and (2.7) below. This assumption implies, in particular, that \(K_t(X,Y)\) is, for each \(t>0\), Hölder continuous in X and Y and that \(K_t(X,Y)\) satisfies the Gaussian (pointwise) estimates stated in Definition 2 on p. 29 in [6]. Under these assumptions we introduce

$$\begin{aligned} \Gamma (x,t,\lambda ,y,s,\sigma ):=\Gamma ^{\mathcal {H}}(X,t,Y,s):=K_{t-s}(X,Y)=K_{t-s}(x,\lambda ,y,\sigma ) \end{aligned}$$

whenever \(t-s>0\) and we put \(\Gamma (x,t,\lambda ,y,s,\sigma )=0\) whenever \(t-s<0\). Based on (1.3) we in the following let

$$\begin{aligned} \Gamma _\lambda (x,t,y,s):= & {} \Gamma (x,t,\lambda ,y,s,0),\nonumber \\ \Gamma _\lambda ^*(y,s,x,t):= & {} \Gamma ^*(y,s,0,x,t,\lambda ), \end{aligned}$$

and we introduce associated single layer potentials

$$\begin{aligned} \mathcal {S}_\lambda ^{\mathcal {H}} f(x,t):= & {} \int _{\mathbb R^{n+1}}\Gamma _\lambda (x,t,y,s)f(y,s)\, dyds,\nonumber \\ \mathcal {S}_\lambda ^{\mathcal {H}^*} f(x,t):= & {} \int _{\mathbb R^{n+1}}\Gamma _\lambda ^*(y,s,x,t)f(y,s)\, dyds. \end{aligned}$$
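As an illustration, consider the model case in which A is the identity matrix, so that \(\mathcal {H}=\partial _t-\Delta _X\) is the heat operator in \(\mathbb R^{n+2}\); this special case is chosen here only as an example. Then \(K_t(X,Y)\) is the Gaussian heat kernel in \(\mathbb R^{n+1}\), and with \(X=(x,\lambda )\), \(Y=(y,0)\) we have

$$\begin{aligned} \Gamma _\lambda (x,t,y,s)=\frac{1}{(4\pi (t-s))^{(n+1)/2}}\exp \left( -\frac{|x-y|^2+\lambda ^2}{4(t-s)}\right) \quad \text{ whenever } \quad t-s>0, \end{aligned}$$

and \(\Gamma _\lambda (x,t,y,s)=0\) whenever \(t-s<0\). In this case \(\mathcal {S}_\lambda ^{\mathcal {H}}f\) is the classical single layer heat potential of f evaluated at height \(\lambda \) above the boundary.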

1.4 Statement of main results

The following are our main results.

Theorem 1.1

Consider \(\mathcal {H}=\partial _t-\mathop {{\text {div}}}\nolimits A\nabla \). Assume that \(\mathcal {H}\), \(\mathcal {H}^*\), satisfy (1.2) and (1.3) as well as the De Giorgi–Moser–Nash estimates stated in (2.6) and (2.7) below. Assume that there exists a constant C such that

$$\begin{aligned} \,\mathrm{(i)}&\sup _{\lambda> 0}||\partial _\lambda \mathcal {S}_{\lambda }^{\mathcal {H}}f||_2+\sup _{\lambda > 0}||\partial _\lambda \mathcal {S}_{\lambda }^{\mathcal {H}^*}f||_2\le C ||f||_2,\nonumber \\ \,\mathrm{(ii)}&|||\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda }^{\mathcal {H}}f|||+|||\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda }^{\mathcal {H}^*}f|||\le C ||f||_2, \end{aligned}$$
(1.5)

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). Then there exists a constant c, depending at most on n, \(\Lambda \), the De Giorgi–Moser–Nash constants and C, such that

$$\begin{aligned} \,\mathrm{(i)}&||N_*(\partial _\lambda \mathcal {S}_\lambda ^{\mathcal {H}} f)||_2+||N_*(\partial _\lambda \mathcal {S}_\lambda ^{\mathcal {H}^*} f)||_2\le c||f||_2,\nonumber \\ \,\mathrm{(ii)}&\sup _{\lambda>0}||\mathbb D\mathcal {S}_{\lambda }^{\mathcal {H}}f||_{2}+\sup _{\lambda >0}||\mathbb D\mathcal {S}_{\lambda }^{\mathcal {H}^*} f||_{2}\le c||f||_2,\nonumber \\ \,\mathrm{(iii)}&||\tilde{N}_*(\nabla _{||}\mathcal {S}_\lambda ^{\mathcal {H}} f)||_2+||\tilde{N}_*(\nabla _{||}\mathcal {S}_\lambda ^{\mathcal {H}^*} f)||_2\le c||f||_2,\nonumber \\ \,\mathrm{(iv)}&||\tilde{N}_*(H_tD_{1/2}^t\mathcal {S}_\lambda ^{\mathcal {H}} f)||_2+||\tilde{N}_*(H_tD_{1/2}^t\mathcal {S}_\lambda ^{\mathcal {H}^*} f)||_2\le c||f||_2, \end{aligned}$$
(1.6)

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\).

Theorem 1.2

Consider \(\mathcal {H}=\partial _t-\mathop {{\text {div}}}\nolimits A\nabla \). Assume that \(\mathcal {H}\) satisfies (1.2) and (1.3). Assume in addition that A is real and symmetric. Then there exists a constant C, depending at most on n, \(\Lambda \), such that (1.5) holds with this C. In particular, the estimates in (1.6) all hold, with constants depending only on n, \(\Lambda \), C, in the case when A is real, symmetric and satisfies (1.2) and (1.3).

Theorem 1.3

Assume that \(\mathcal {H}=\partial _t-\mathop {{\text {div}}}\nolimits A\nabla \) satisfies (1.2) and (1.3). Suppose in addition that A is real and symmetric. Then the parabolic measure associated to \(\mathcal {H}\), in \(\mathbb R^{n+2}_+\), is absolutely continuous with respect to the measure dxdt on \(\mathbb R^{n+1}=\partial \mathbb R^{n+2}_+\). Moreover, let \(Q\subset \mathbb R^{n+1}\) be a parabolic cube and let \(K(A_Q,y,s)\) be the Poisson kernel associated to \(\mathcal {H}\) at \(A_Q:=(x_Q,l(Q),t_Q)\), where \((x_Q,t_Q)\) is the center of the cube Q and l(Q) defines its size. Then there exists \(c\ge 1\), depending only on n and \(\Lambda \), such that

$$\begin{aligned} \int _{Q}|K(A_Q,y,s)|^{2}\, dyds\le c|Q|^{-1}. \end{aligned}$$
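One way to read this estimate is as a quantitative form of absolute continuity. As a sketch, write \(\omega ^{A_Q}\) for the parabolic measure at \(A_Q\) (a notation introduced here only for illustration), so that \(\omega ^{A_Q}(E)=\int _EK(A_Q,y,s)\, dyds\) for Borel sets \(E\subset Q\). Then the Cauchy–Schwarz inequality and the estimate above give

$$\begin{aligned} \omega ^{A_Q}(E)\le |E|^{1/2}\left( \int _{Q}|K(A_Q,y,s)|^{2}\, dyds\right) ^{1/2}\le c^{1/2}\left( \frac{|E|}{|Q|}\right) ^{1/2}, \end{aligned}$$

with a constant that does not depend on the size of Q, which is the sense in which the estimate is scale-invariant.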

Remark 1.4

Note that (1.5) (i) is a uniform (in \(\lambda \)) \(L^2\)-estimate involving the first order partial derivative, in the \(\lambda \)-coordinate, of single layer potentials, while (1.5) (ii) is a square function estimate involving the second order partial derivatives, in the \(\lambda \)-coordinate, of single layer potentials. A relevant question is naturally in what generality the estimates in (1.5) can be expected to hold. In [33] it is proved, under additional assumptions, that these estimates are stable under small complex perturbations of the coefficient matrix. However, in the elliptic case and after [3] appeared, it was proved in [34], see [17] for an alternative proof, that if \(-\text{ div }\, A(X)\nabla \) satisfies the basic assumptions imposed in [3], then the elliptic version of (1.5) (ii) always holds. In fact, the approach in [34], which is based on functional calculus, even dispenses with the De Giorgi–Moser–Nash estimates underlying [3]. Furthermore, in the elliptic case (1.5) (ii) can be seen to imply (1.5) (i) by the results of [2]. Hence, in the elliptic case, and under the assumptions of [3], the elliptic version of (1.5) always holds. Based on this it is fair to pose the question whether or not a similar line of development can be anticipated in the parabolic case. Based on [32], this paper and [33], we anticipate that a parabolic version of [17] can be developed. To develop a parabolic version of [2] is a very interesting and potentially challenging project.

Theorem 1.3 is used in the proof of Theorem 1.2 and to our knowledge Theorems 1.1, 1.2 and 1.3 are all new. To put these results in the context of the current literature devoted to parabolic layer potentials and parabolic singular integrals, in \(C^1\)-regular or Lipschitz regular cylinders, it is fair to first mention [12–14] where a theory of singular integral operators with mixed homogeneity was developed and Theorem 1.1 (i)–(iv) were proved in the context of the heat operator and in the context of time-independent \(C^1\)-cylinders. These results were then extended in [7, 8], still in the context of the heat operator, to the setting of time-independent Lipschitz domains. The more challenging setting of time-dependent Lipschitz type domains was considered in [18, 21, 30], see also [22]. In particular, in these papers the correct notion of time-dependent Lipschitz type domains, from the perspective of parabolic singular integral operators and parabolic layer potentials, was found. One major contribution of these papers, see [18, 21, 22] in particular, is the proof of Theorem 1.1 in the context of the heat operator in time-dependent Lipschitz type domains. Beyond these results the literature only contains modest contributions to the study of parabolic layer potentials associated to second order parabolic operators (in divergence form) with variable, bounded, measurable, uniformly elliptic (and complex) coefficients. Based on this we believe that our results will pave the way for important developments in the area of parabolic PDEs.

While Theorems 1.1 and 1.2 coincide, in the stationary case, with the set up and the corresponding results established in [3] for elliptic equations, we claim that our results, Theorem 1.1 in particular, are not, for at least two reasons, straightforward generalizations of the corresponding results in [3]. First, our results rely on [32] where certain square function estimates are established for second order parabolic operators of the form \(\mathcal {H}\), and where, in particular, a parabolic version of the technology in [4] is developed. Second, in general the presence of the (first order) time-derivative forces one to consider fractional time-derivatives leading, as in [18, 21, 30], see also [22], to rather elaborate additional estimates. Theorem 1.3 gives a parabolic version of an elliptic result due to Jerison and Kenig [27] and a version of the main result in [15] for equations of the form (1.1), assuming in addition that A is real and symmetric.

1.5 Proofs and organization of the paper

In general we will only supply the proof of our statements for \(\mathcal {S}_\lambda :=\mathcal {S}_\lambda ^{\mathcal {H}}\). The corresponding results for \(\mathcal {S}_\lambda ^*:=\mathcal {S}_\lambda ^{\mathcal {H}^*}\) then follow readily by analogy. In Sect. 2, which is of preliminary nature, we introduce notation and weak solutions, state the De Giorgi–Moser–Nash estimates referred to in Theorem 1.1, prove energy estimates, and state/prove a few facts from Littlewood–Paley theory. In Sect. 3 we prove a set of important preliminary estimates related to the boundedness of single layer potentials: off-diagonal estimates and uniform (in \(\lambda \)) \(L^2\)-estimates. Section 4 is devoted to the proof of two important lemmas: Lemmas 4.1 and 4.2. To briefly describe these results we introduce \(\Phi (f)\) where

$$\begin{aligned} \Phi (f):=\sup _{\lambda > 0}||\partial _\lambda \mathcal {S}_\lambda f||_2+|||\lambda \partial _\lambda ^2\mathcal {S}_{\lambda }f|||. \end{aligned}$$
(1.7)

Lemma 4.1 concerns estimates of non-tangential maximal functions and in this lemma we establish bounds of \(||N_*(\partial _\lambda \mathcal {S}_\lambda f)||_2\), \(||\tilde{N}_*(\nabla _{||}\mathcal {S}_\lambda f)||_2\) and \(||\tilde{N}_*(H_tD_{1/2}^t\mathcal {S}_\lambda f)||_2\) in terms of a constant times

$$\begin{aligned} \Phi (f)+||f||_2+\sup _{\lambda >0}||\mathbb D \mathcal {S}_\lambda f||_2. \end{aligned}$$

In Lemma 4.2 we establish square function estimates of the form,

$$\begin{aligned} \,\mathrm{(i)}&|||\lambda ^{m+2l+4}\nabla \partial _\lambda \partial _t^{l+1}\partial _\lambda ^{m+1}\mathcal {S}_{\lambda }f|||\le c(\Phi (f)+||f||_2),\nonumber \\ \,\mathrm{(ii)}&|||\lambda ^{m+2l+4}\partial _t\partial _t^{l+1}\partial _\lambda ^{m+1}\mathcal {S}_{\lambda }f|||\le c(\Phi (f)+||f||_2), \end{aligned}$$

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\), and for \(m\ge -1\), \(l\ge -1\). Using Lemma 4.1, the proof of Theorem 1.1 boils down to proving the estimate

$$\begin{aligned} \sup _{\lambda >0}||\mathbb D \mathcal {S}_\lambda f||_2\le c(\Phi (f)+||f||_2). \end{aligned}$$
(1.8)
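To make the reduction explicit, the following sketch tracks how the hypothesis (1.5) propagates to (1.6) for \(\mathcal {S}_\lambda \); here \(c'\) denotes the constant in Lemma 4.1 (a name introduced only for this illustration) and c is the constant in (1.8). By (1.5), each of the two terms in (1.7) is bounded by \(C||f||_2\), so that \(\Phi (f)\le 2C||f||_2\). Hence (1.8) gives

$$\begin{aligned} \sup _{\lambda >0}||\mathbb D \mathcal {S}_\lambda f||_2\le c(2C+1)||f||_2, \end{aligned}$$

which is (1.6) (ii), and Lemma 4.1 then yields

$$\begin{aligned} c'\left( \Phi (f)+||f||_2+\sup _{\lambda >0}||\mathbb D \mathcal {S}_\lambda f||_2\right) \le c'(1+c)(2C+1)||f||_2 \end{aligned}$$

as a bound for each of the non-tangential maximal functions in (1.6) (i), (iii) and (iv).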

The estimate in (1.8), which is rather demanding, uses Lemma 4.2 and makes extensive use of recent results concerning resolvents, square functions and Carleson measures, established in [32]. In Sect. 5 we collect the material from [32] needed in the proof of (1.8). In [32] a parabolic version of the main and hard estimate in [4] is established. In subsection 5.3, we also seize the opportunity to clarify some statements made in [32] concerning the Kato square root problem for parabolic operators. The conclusion is that in [32] the Kato square root problem for parabolic operators, with merely bounded and measurable coefficients, is solved for the first time in the literature. In Sect. 6 we prove (1.8) as a consequence of Lemmas 6.1, 6.2, and 6.3 stated below. For clarity, the final proof of Theorem 1.1, based on the estimates established in the previous sections, is summarized in Sect. 7. In Sect. 8 we prove Theorem 1.2 by first establishing a local parabolic Tb-theorem for square functions, see Theorem 8.1, and then by establishing Theorem 1.3. We believe that our proof of Theorem 1.3 adds to the clarity of the corresponding argument in [15].

2 Preliminaries

Let \(x=(x_1,\ldots ,x_{n})\), \(X=(x,x_{n+1})\), \((x,t)=(x_1,\ldots ,x_{n},t)\), \((X,t)=(x_1,\ldots ,x_{n}, x_{n+1},t)\). Given \((X,t)=(x,x_{n+1}, t)\), \(r>0\), we let \(Q_r(x,t)\) and \(\tilde{Q}_r(X,t)\) denote, respectively, the parabolic cubes in \(\mathbb R^{n+1}\) and \(\mathbb R^{n+2}\), centered at \((x,t)\) and \((X,t)\), and of size r. By Q, \(\tilde{Q}\) we denote any such parabolic cubes and we let l(Q), \(l(\tilde{Q})\), \((x_Q,t_Q)\), \((X_{\tilde{Q}},t_{\tilde{Q}})\) denote their sizes and centers, respectively. Given \(\gamma >0\), we let \(\gamma Q\), \(\gamma \tilde{Q}\) be the cubes which have the same centers as Q and \(\tilde{Q}\), respectively, but with sizes defined by \(\gamma l(Q)\) and \(\gamma l(\tilde{Q})\). Given a set \(E\subset \mathbb R^{n+1}\) we let |E| denote its Lebesgue measure and by \(1_E\) we denote the indicator function for E. Finally, by \(||\cdot ||_{L^2(E)}\) we mean \(||\cdot 1_E||_2\). Furthermore, as mentioned and based on (1.3), we will frequently also use a different convention concerning the labeling of the coordinates: we let \(\lambda =x_{n+1}\) and when using the symbol \(\lambda \), the point \((X,t)=(x,x_{n+1},t)\) will be written as \( (x,t, \lambda )=(x_1,\ldots ,x_{n},t,\lambda )\). We write \(\nabla =(\nabla _{||},\partial _\lambda )\) where \(\nabla _{||}=(\partial _{x_1},\ldots ,\partial _{x_n})\). The notation \(L^2(\mathbb R^{n+1},\mathbb C)\), \(||\cdot ||_2\), \(\Vert (\cdot ,\cdot )\Vert \), \(\mathbb D\), \(D_{1/2}^t\), \(H_t\), was introduced in Sect. 1.1 above. In the following we will, in addition to \(\mathbb D\) and \(D_{1/2}^t\), at instances also use the parabolic half-order time derivative

$$\begin{aligned} \widehat{\mathbb D_{n+1}f}(\xi ,\tau ):=\frac{\tau }{\Vert (\xi ,\tau )\Vert }\hat{f}(\xi ,\tau ). \end{aligned}$$

We let \(\mathbb H:=\mathbb H(\mathbb R^{n+1},\mathbb C)\) be the closure of \(C_0^\infty (\mathbb R^{n+1},\mathbb C)\) with respect to

$$\begin{aligned} \Vert f\Vert _{\mathbb H}:=\Vert \mathbb Df\Vert _2. \end{aligned}$$
(2.1)

By applying Plancherel’s theorem we have

$$\begin{aligned} \,\mathrm{(i)}&\Vert f\Vert _{\mathbb H}\approx \Vert \nabla _{||} f\Vert _2+\Vert H_tD_{1/2}^tf\Vert _2\approx \Vert \nabla _{||} f\Vert _2+\Vert D^t_{1/2}f\Vert _2,\nonumber \\ \,\mathrm{(ii)}&\Vert \mathbb D_{n+1}f\Vert _2\le c\Vert D^t_{1/2}f\Vert _2, \end{aligned}$$
(2.2)

with constants depending only on n. Furthermore, we let \(\tilde{\mathbb H}:=\tilde{\mathbb H}(\mathbb R^{n+2},\mathbb C)\) be the closure of \(C_0^\infty (\mathbb R^{n+2},\mathbb C)\) with respect to

$$\begin{aligned} \Vert F\Vert _{\tilde{\mathbb H}}:=\left( \int _{-\infty }^\infty \int _{\mathbb R^{n+1}}\left( |\partial _\lambda F|^2+|\mathbb DF|^2\right) \, dxdtd\lambda \right) ^{1/2}. \end{aligned}$$

Similarly, we let \(\tilde{\mathbb H}_+:=\tilde{\mathbb H}_+(\mathbb R^{n+2}_+,\mathbb C)\) be the closure of \(C_0^\infty (\mathbb R^{n+2}_+,\mathbb C)\) with respect to the expression in the last display but with integration over the interval \((-\infty ,\infty )\) replaced by integration over the interval \((0,\infty )\).
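The bound (2.2) (ii) can be checked directly at the level of symbols; the following short sketch is included for the reader's convenience. With \(\rho =\Vert (\xi ,\tau )\Vert \), the defining equation of the parabolic norm gives \(\tau ^2/\rho ^4\le 1\), i.e., \(\rho \ge |\tau |^{1/2}\). Hence, for \((\xi ,\tau )\ne (0,0)\),

$$\begin{aligned} \left| \frac{\tau }{\Vert (\xi ,\tau )\Vert }\right| \le \frac{|\tau |}{|\tau |^{1/2}}=|\tau |^{1/2}, \end{aligned}$$

so the multiplier defining \(\mathbb D_{n+1}\) is pointwise dominated by the multiplier defining \(D_{1/2}^t\), and Plancherel's theorem shows that one may take \(c=1\) in (2.2) (ii).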

2.1 Weak solutions

Let \(\Omega \subset \{X=(x,x_{n+1})\in \mathbb R^n\times \mathbb R_+\}\) be a domain and let, given \(-\infty<t_1< t_2<\infty \), \(\Omega _{t_1,t_2}=\Omega \times (t_1,t_2)\). We let \(W^{1,2}(\Omega ,\mathbb C)\) be the Sobolev space of complex valued functions v, defined on \(\Omega \), such that v and \(\nabla v\) are in \(L^{2}(\Omega ,\mathbb C)\). \(L^2(t_1,t_2,W^{1,2}(\Omega ,\mathbb C))\) is the space of functions \(u:\Omega _{t_1,t_2}\rightarrow \mathbb C\) such that

$$\begin{aligned} ||u||_{L^2(t_1,t_2,W^{1,2}(\Omega ,\mathbb C))}:=\left( \int _{t_1}^{t_2}||u(\cdot ,t)||_{W^{1,2}(\Omega ,\mathbb C)}^2\, dt\right) ^{1/2}<\infty . \end{aligned}$$

We say that \(u\in L^2(t_1,t_2,W^{1,2}(\Omega ,\mathbb C))\) is a weak solution to the equation

$$\begin{aligned} \mathcal {H}u=(\partial _t+\mathcal {L})u=0, \end{aligned}$$
(2.3)

in \(\Omega _{t_1,t_2}\), if

$$\begin{aligned} \int _{\mathbb R^{n+2}_+} \left( A\nabla u\cdot \nabla \bar{\phi }-u \partial _t\bar{\phi }\right) \, dXdt=0, \end{aligned}$$
(2.4)

whenever \(\phi \in C_0^{\infty } (\Omega _{t_1,t_2},\mathbb C)\). Similarly, we say that u is a weak solution to (2.3) in \(\mathbb R^{n+2}_+\) if \(u\phi \in L^2(-\infty ,\infty ,W^{1,2}(\mathbb R^n\times \mathbb R_+,\mathbb C))\) whenever \(\phi \in C_0^{\infty } (\mathbb R^{n+2}_+,\mathbb C)\) and if (2.4) holds whenever \(\phi \in C_0^{\infty } (\mathbb R^{n+2}_+,\mathbb C)\). Assuming that \(\mathcal {H}\) satisfies (1.2) and (1.3) as well as the De Giorgi–Moser–Nash estimates stated in (2.6) and (2.7) below, it follows that any weak solution is smooth as a function of t and in this case

$$\begin{aligned} \int _{\mathbb R^{n+2}_+} \left( A\nabla u\cdot \nabla \bar{\phi }+\partial _tu \bar{\phi }\right) \, dXdt=0, \end{aligned}$$

holds whenever \(\phi \in C_0^{\infty } (\Omega _{t_1,t_2},\mathbb C)\). Furthermore, if u is globally defined in \(\mathbb R^{n+2}_+\), and if \(D_{1/2}^tu\overline{H_tD_{1/2}^t\phi }\) is integrable in \(\mathbb R^{n+2}_+\), whenever \(\phi \in C_0^\infty (\mathbb R^{n+2}_+,\mathbb C)\), then

$$\begin{aligned} B_+(u,\phi )=0\quad \text{ whenever } \quad \phi \in C_0^\infty (\mathbb R^{n+2}_+,\mathbb C), \end{aligned}$$
(2.5)

where the sesquilinear form \(B_+(\cdot ,\cdot )\) is defined on \( \tilde{\mathbb H}_+\times \tilde{\mathbb H}_+\) as

$$\begin{aligned} B_+(u,\phi ):= \int _0^\infty \int _{\mathbb R^{n+1}} \left( A\nabla u\cdot \nabla \bar{\phi }-D_{1/2}^tu\overline{H_tD_{1/2}^t\phi }\right) \, dxdtd\lambda . \end{aligned}$$

In particular, whenever u is a weak solution to (2.3) in \(\mathbb R^{n+2}_+\) such that \(u\in \tilde{\mathbb H}_+\), then (2.5) holds. From now on, whenever we write that \(\mathcal {H}u=0\) in a bounded domain \(\Omega _{t_1,t_2}\), then we mean that (2.4) holds whenever \(\phi \in C_0^{\infty } (\Omega _{t_1,t_2},\mathbb C)\), and when we write that \(\mathcal {H}u=0\) in \(\mathbb R^{n+2}_+\), then we mean that (2.4) holds whenever \(\phi \in C_0^{\infty } (\mathbb R^{n+2}_+,\mathbb C)\).
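The passage from (2.4) to the form \(B_+\) rests on an identity in the t-variable which can be sketched as follows; this is a formal computation assuming enough integrability of u in t for Plancherel's theorem to apply, and assuming the Fourier normalization in which \(\partial _t\) has symbol \(i\tau \), consistent with the factorization of \(\partial _t\) in Sect. 1.1. Since \(|\tau |^{1/2}\,\overline{i\,\text{ sgn }(\tau )|\tau |^{1/2}}=\overline{i\tau }\), both integrals below have the same expression on the Fourier side, and therefore

$$\begin{aligned} \int _{\mathbb R}u\,\partial _t\bar{\phi }\, dt=\int _{\mathbb R}u\,\overline{\partial _t\phi }\, dt=\int _{\mathbb R}D_{1/2}^tu\,\overline{H_tD_{1/2}^t\phi }\, dt. \end{aligned}$$

Inserting this into (2.4) replaces the term \(u\partial _t\bar{\phi }\) by \(D_{1/2}^tu\overline{H_tD_{1/2}^t\phi }\), which is exactly the second term in the definition of \(B_+(u,\phi )\).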

2.2 De Giorgi–Moser–Nash estimates

We say that solutions to \(\mathcal {H}u=0\) satisfy De Giorgi–Moser–Nash estimates if there exist, for each \(1\le p<\infty \) fixed, constants c and \(\alpha \in (0,1)\) such that the following is true. Let \(\tilde{Q}\subset \mathbb R^{n+2}\) be a parabolic cube and assume that \(\mathcal {H}u=0\) in \(2\tilde{Q}\). Then

$$\begin{aligned} \sup _{ \tilde{Q}}|u|\le c\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2\tilde{Q}}|u|^p\right) ^{1/p}, \end{aligned}$$
(2.6)

and

$$\begin{aligned}&|u(X,t)-u(\tilde{X},\tilde{t})|\le c\left( \frac{||(X-\tilde{X},t-\tilde{t})||}{r}\right) ^{\alpha } \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2\tilde{Q}}|u|^p\right) ^{1/p}, \end{aligned}$$
(2.7)

whenever \((X,t)\), \((\tilde{X},\tilde{t})\in \tilde{Q}\), \(r:=l(\tilde{Q})\). The constants c and \(\alpha \) will be referred to as the De Giorgi–Moser–Nash constants. It is well known that if (2.6) and (2.7) hold for one p, \(1\le p<\infty \), then these estimates hold for all p in this range.

2.3 Energy estimates

Lemma 2.1

Assume that \(\mathcal {H}\) satisfies (1.2) and (1.3). Let \(\tilde{Q}\subset \mathbb R^{n+2}\) be a parabolic cube and let \(\beta >1\) be a fixed constant. Assume that \(\mathcal {H}u=0\) in \(\beta \tilde{Q}\). Let \(\phi \in C_0^\infty (\beta \tilde{Q})\) be a cut-off function for \(\tilde{Q}\) such that \(0\le \phi \le 1\), \(\phi =1\) on \(\tilde{Q}\). Then there exists a constant \(c=c(n,\Lambda ,\beta )\), \(1\le c<\infty \), such that

$$\begin{aligned} \int |\nabla u(X,t)|^2(\phi (X,t))^2\, dXdt\le c\int |u(X,t)|^2(|\nabla \phi (X,t)|^2+\phi (X,t)|\partial _t\phi (X,t)|)\, dXdt. \end{aligned}$$

Proof

The lemma is a standard energy estimate. Indeed,

$$\begin{aligned} \int \left( A\nabla u\cdot \nabla (\bar{u}\phi ^2)-u \partial _t(\bar{u}\phi ^2)\right) \, dXdt=0, \end{aligned}$$

by the definition of weak solutions. Hence,

$$\begin{aligned} \int |\nabla u|^2\phi ^2\, dXdt\le c\int |u|^2(|\nabla \phi |^2+\phi |\partial _t\phi |)\, dXdt. \end{aligned}$$

\(\square \)
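For the reader's convenience we sketch the step from the identity to the estimate in the proof above. The computation is formal: a rigorous version would first regularize u in the time variable, for instance by Steklov averages, and this choice of regularization is an assumption made here for illustration. Taking real parts in the identity, using \(\text{ Re }(u\partial _t\bar{u})=\frac{1}{2}\partial _t|u|^2\) and integrating by parts in t, we obtain

$$\begin{aligned} \text{ Re }\int A\nabla u\cdot \nabla \bar{u}\,\phi ^2\, dXdt=-2\,\text{ Re }\int \bar{u}\phi \, A\nabla u\cdot \nabla \phi \, dXdt+\int |u|^2\phi \,\partial _t\phi \, dXdt. \end{aligned}$$

By (1.2) (i) the left hand side is bounded from below by \(\Lambda ^{-1}\int |\nabla u|^2\phi ^2\, dXdt\), while (1.2) (ii) and Young's inequality give

$$\begin{aligned} 2|\bar{u}\phi \, A\nabla u\cdot \nabla \phi |\le 2\Lambda |u||\nabla u|\phi |\nabla \phi |\le \frac{1}{2\Lambda }|\nabla u|^2\phi ^2+2\Lambda ^3|u|^2|\nabla \phi |^2. \end{aligned}$$

Absorbing the first term on the right into the left hand side yields

$$\begin{aligned} \int |\nabla u|^2\phi ^2\, dXdt\le 4\Lambda ^4\int |u|^2|\nabla \phi |^2\, dXdt+2\Lambda \int |u|^2\phi |\partial _t\phi |\, dXdt, \end{aligned}$$

which is the stated estimate with \(c=4\Lambda ^4\), since \(\Lambda \ge 1\).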

Lemma 2.2

Assume that \(\mathcal {H}\) satisfies (1.2) and (1.3). Let \(Q\subset \mathbb R^{n+1}\) be a parabolic cube, \(\lambda _0\in \mathbb R\), and let \(\beta _1>1\), \(\beta _2\in (0,1]\) be fixed constants. Let \(I=(\lambda _0-\beta _2l(Q),\lambda _0+\beta _2l(Q))\), \(\gamma I= (\lambda _0-\gamma \beta _2l(Q),\lambda _0+\gamma \beta _2l(Q))\) for \(\gamma \in (0,1)\). Assume that \(\mathcal {H}u=0\) in \(\beta _1^2Q\times I\). Then there exists a constant \(c=c(n,\Lambda ,\beta _1,\beta _2)\), \(1\le c<\infty \), such that

$$\begin{aligned} \,\mathrm{(i)}&\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{Q}|\nabla u(x,t,\lambda _0)|^2\, dxdt\le c\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\beta _1Q\times \frac{1}{4}I}|\nabla u(X,t)|^2\, dXdt,\nonumber \\ \,\mathrm{(ii)}&\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{Q}|\nabla u(x,t,\lambda _0)|^2\, dxdt\le \frac{c}{l(Q)^2}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\beta _1^2Q\times \frac{1}{2}I}|u(X,t)|^2\, dXdt. \end{aligned}$$

Proof

It suffices to prove the lemma with \(\beta _1=2\), \(\beta _2=1\). Furthermore, we only prove (i) as (ii) follows from (i) and Lemma 2.1. For \(\lambda _0\in \mathbb R\) fixed, and with \(\gamma I\) as above, we let

$$\begin{aligned} J_1:= & {} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{Q}\left| \nabla u(x,t,\lambda _0)-\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\frac{1}{16}I}\nabla u(x,t,\lambda )\, d\lambda \right| ^2dxdt\right) ^{1/2},\nonumber \\ J_2:= & {} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{Q}\left| \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\frac{1}{16}I}\nabla u(x,t,\lambda )\, d\lambda \right| ^2dxdt\right) ^{1/2}. \end{aligned}$$

Then

$$\begin{aligned} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{Q}|\nabla u(x,t,\lambda _0)|^2\, dxdt\right) ^{1/2}\le J_1+J_2. \end{aligned}$$

Using the Hölder inequality, we see that

$$\begin{aligned} J_2\le c\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2Q\times \frac{1}{8}I}|\nabla u(X,t)|^2\, dXdt\right) ^{1/2}. \end{aligned}$$
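In slightly more detail, and as a sketch under the assumption that \(2Q\) denotes the parabolic dilation of Q (so that \(|2Q|=2^{n+2}|Q|\)), the bound on \(J_2\) follows by applying the Hölder (or Jensen) inequality in the \(\lambda \)-variable and then enlarging the domain of integration:

$$\begin{aligned} J_2^2\le \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{Q\times \frac{1}{16}I}|\nabla u(X,t)|^2\, dXdt\le \frac{|2Q\times \frac{1}{8}I|}{|Q\times \frac{1}{16}I|}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2Q\times \frac{1}{8}I}|\nabla u(X,t)|^2\, dXdt=2^{n+3}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2Q\times \frac{1}{8}I}|\nabla u(X,t)|^2\, dXdt. \end{aligned}$$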

Using the fundamental theorem of calculus and the Hölder inequality,

$$\begin{aligned} J_1\le cl(Q)\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{Q\times \frac{1}{16}I}|\nabla \partial _\lambda u(X,t)|^2\, dXdt\right) ^{1/2}. \end{aligned}$$
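The step behind this bound can be sketched as follows, with \(\beta _2=1\) so that \(|\frac{1}{16}I|=l(Q)/8\); the explicit constant is illustrative only. For \(\lambda \in \frac{1}{16}I\) the segment between \(\lambda \) and \(\lambda _0\) stays inside \(\frac{1}{16}I\), and the fundamental theorem of calculus followed by the Hölder inequality gives, for fixed \((x,t)\),

$$\begin{aligned} \left| \nabla u(x,t,\lambda _0)-\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\frac{1}{16}I}\nabla u(x,t,\lambda )\, d\lambda \right| \le \int _{\frac{1}{16}I}|\nabla \partial _\sigma u(x,t,\sigma )|\, d\sigma \le \frac{l(Q)}{8}\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\frac{1}{16}I}|\nabla \partial _\sigma u(x,t,\sigma )|^2\, d\sigma \right) ^{1/2}. \end{aligned}$$

Squaring and averaging over Q gives the displayed estimate for \(J_1\).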

Using that \(\partial _\lambda u\) is a solution to the same equation as u it follows from Lemma 2.1 that

$$\begin{aligned} J_1\le c\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\frac{3}{2}Q\times \frac{1}{8}I}|\partial _\lambda u(X,t)|^2\, dXdt\right) ^{1/2}. \end{aligned}$$

Hence the estimate in (i) follows. \(\square \)

Lemma 2.3

Assume that \(\mathcal {H}\) satisfies (1.2) and (1.3). Let \(\tilde{Q}\subset \mathbb R^{n+2}\) be a parabolic cube and let \(\beta >1\) be a fixed constant. Assume that \(\mathcal {H}u=0\) in \(\beta \tilde{Q}\). Then there exists a constant \(c=c(n,\Lambda , \beta )\), \(1\le c<\infty \), such that

$$\begin{aligned} \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\tilde{Q}}|\partial _t u(X,t)|^2\, dXdt\le \frac{c}{l(\tilde{Q})^4}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\beta \tilde{Q}}|u(X,t)|^2\, dXdt. \end{aligned}$$

Proof

Let \(\phi \in C_0^\infty (\beta \tilde{Q})\) be a cut-off function for \(\tilde{Q}\) such that \(0\le \phi \le 1\), \(\phi =1\) on \(\tilde{Q}\), \(|\nabla \phi |\le c/l(\tilde{Q})\), \(|\partial _t\phi |\le c/l(\tilde{Q})^2\). Let

$$\begin{aligned} J_1:=\int |\partial _t u|^2\phi ^4\, dXdt, \end{aligned}$$

and

$$\begin{aligned} J_2:=\int |\nabla u|^2\phi ^2\, dXdt,\quad J_3:=\int |\nabla \partial _t u|^2\phi ^6\, dXdt. \end{aligned}$$

As \(\partial _t u\) is a solution to the same equation as u,

$$\begin{aligned} \int \left( A\nabla \partial _tu\cdot \nabla (\bar{u}\phi ^4)-\partial _t u \partial _t(\bar{u}\phi ^4)\right) \, dXdt=0. \end{aligned}$$

Hence,

$$\begin{aligned} J_1=\int \left( (A\nabla \partial _t u\cdot \nabla \bar{u}) \phi ^4+4(A\nabla \partial _tu\cdot \nabla \phi )\bar{u}\phi ^3-4(\partial _t u \partial _t\phi )\bar{u}\phi ^3\right) \, dXdt, \end{aligned}$$

and

$$\begin{aligned} J_1\le l(\tilde{Q})^2\epsilon J_3+\frac{c(\epsilon )}{l(\tilde{Q})^2}J_2+ \frac{c(\epsilon )}{l(\tilde{Q})^4}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\beta \tilde{Q}}|u(X,t)|^2\, dXdt \end{aligned}$$

where \(\epsilon \) is a degree of freedom. Again using that \(\partial _t u\) is a solution to the same equation as u, and essentially Lemma 2.1, we see that

$$\begin{aligned} J_3\le c\int |\partial _t u|^2\phi ^4(|\nabla \phi |^2+|\partial _t\phi |)\, dXdt\le \frac{c}{l(\tilde{Q})^2}J_1. \end{aligned}$$

Combining the above estimates, and again using Lemma 2.1, the lemma follows. \(\square \)
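The way the estimates are combined can be sketched as follows, assuming \(J_1<\infty \) and writing \(c_0\) for the constant in the bound \(J_3\le c_0l(\tilde{Q})^{-2}J_1\); the symbol \(c_0\) is introduced here only for illustration. Inserting the bound on \(J_3\) into the bound on \(J_1\) gives

$$\begin{aligned} J_1\le c_0\epsilon J_1+\frac{c(\epsilon )}{l(\tilde{Q})^2}J_2+ \frac{c(\epsilon )}{l(\tilde{Q})^4}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\beta \tilde{Q}}|u(X,t)|^2\, dXdt, \end{aligned}$$

so that, with the choice \(\epsilon =1/(2c_0)\), the first term on the right can be absorbed into the left-hand side:

$$\begin{aligned} J_1\le \frac{2c(\epsilon )}{l(\tilde{Q})^2}J_2+ \frac{2c(\epsilon )}{l(\tilde{Q})^4}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\beta \tilde{Q}}|u(X,t)|^2\, dXdt. \end{aligned}$$

The remaining term \(J_2\) is of the form controlled by Lemma 2.1 with the cut-off \(\phi \).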

2.4 Littlewood–Paley theory

We define a parabolic approximation of the identity, which will be fixed throughout the paper, as follows. Let \(\mathcal {P}\in C_0^\infty (Q_1(0))\), \(\mathcal {P}\ge 0\) be real-valued, \(\int \mathcal {P}\, dxdt=1\), where \(Q_1(0)\) is the unit parabolic cube in \(\mathbb R^{n+1}\) centered at 0. At instances we will also assume that \(\int x_i\mathcal {P}(x,t)\, dxdt=0\) for all \(i\in \{1,\ldots ,n\}\). We set \(\mathcal {P}_\lambda (x,t)=\lambda ^{-n-2}\mathcal {P}(\lambda ^{-1}x,\lambda ^{-2}t)\) whenever \(\lambda >0\). We let \(\mathcal {P}_\lambda \) denote the convolution operator

$$\begin{aligned} \mathcal {P}_\lambda f(x,t)=\int _{\mathbb R^{n+1}}\mathcal {P}_\lambda (x-y,t-s)f(y,s)\, dyds. \end{aligned}$$

Similarly, by \(\mathcal {Q}_\lambda \) we denote a generic approximation to the zero operator, not necessarily the same at each instance, but chosen from a finite set of such operators depending only on our original choice of \(\mathcal {P}_\lambda \). In particular, \(\mathcal {Q}_\lambda (x,t)=\lambda ^{-n-2}\mathcal {Q}(\lambda ^{-1}x,\lambda ^{-2}t)\) where \(\mathcal {Q}\in C^\infty (\mathbb {R}^{n+1})\), \(\int \mathcal {Q}\, dxdt=0\). In addition we will, following [21], assume that \(\mathcal {Q}_\lambda \) satisfies the conditions

$$\begin{aligned} |\mathcal {Q}_\lambda (x,t)|\le & {} \frac{c\lambda }{(\lambda +||(x,t)||)^{n+3}},\nonumber \\ |\mathcal {Q}_\lambda (x,t)-\mathcal {Q}_\lambda (y,s)|\le & {} \frac{c||(x-y,t-s)||^\alpha }{(\lambda +||(x,t)||)^{n+2+\alpha }}, \end{aligned}$$

where the latter estimate holds for some \(\alpha \in (0,1)\) whenever \(2||(x-y,t-s)||\le ||(x,t)||\). Under these assumptions it is well known that

$$\begin{aligned} \int _0^\infty \int _{\mathbb R^{n+1}}|\mathcal {Q}_\lambda f|^2\, \frac{dxdtd\lambda }{\lambda }\le c\int _{\mathbb R^{n+1}}|f|^2\, {dxdt}, \end{aligned}$$
(2.8)

for all \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). In the following we collect a number of elementary observations used in the forthcoming sections.

Lemma 2.4

Let \(\mathcal {P}_\lambda \) be as above. Then

$$\begin{aligned} |||\lambda \nabla \mathcal {P}_\lambda f|||+|||\lambda ^2\partial _t\mathcal {P}_\lambda f|||+|||\lambda \mathbb D \mathcal {P}_\lambda f|||\le c||f||_2, \end{aligned}$$

for all \(f\in L^2(\mathbb R^{n+1},\mathbb C)\).

Proof

This lemma essentially follows immediately from (2.8). For slightly more details we refer to the proof of Lemma 2.30 in [32]. \(\square \)
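To indicate why (2.8) applies, here is a sketch for the first two terms, assuming that \(|||\cdot |||\) denotes the norm \((\int _0^\infty \int _{\mathbb R^{n+1}}|\cdot |^2\, dxdtd\lambda /\lambda )^{1/2}\) from (1.4). By the chain rule and the definition of \(\mathcal {P}_\lambda \),

$$\begin{aligned} \lambda \partial _{x_i}\mathcal {P}_\lambda (x,t)= & {} \lambda ^{-n-2}(\partial _{x_i}\mathcal {P})(\lambda ^{-1}x,\lambda ^{-2}t),\nonumber \\ \lambda ^2\partial _t\mathcal {P}_\lambda (x,t)= & {} \lambda ^{-n-2}(\partial _t\mathcal {P})(\lambda ^{-1}x,\lambda ^{-2}t). \end{aligned}$$

Both right-hand sides are of the form \(\mathcal {Q}_\lambda \) with \(\mathcal {Q}=\partial _{x_i}\mathcal {P}\) or \(\mathcal {Q}=\partial _t\mathcal {P}\), and \(\int \mathcal {Q}\, dxdt=0\) since \(\mathcal {P}\) has compact support. The term involving \(\mathbb D\) is not covered by this sketch.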

Consider a cube \(Q\subset \mathbb R^{n+1}\). In the following we let \(\mathcal {A}_\lambda ^Q\) denote the dyadic averaging operator induced by Q, i.e., if \(\hat{Q}_\lambda (x,t)\) is the minimal dyadic cube (with respect to the grid induced by Q) containing \((x,t)\), with side length at least \(\lambda \), then

$$\begin{aligned} \mathcal {A}_\lambda ^Q f(x,t):=\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\hat{Q}_\lambda (x,t)}f\, dyds, \end{aligned}$$
(2.9)

is the average of f over \(\hat{Q}_\lambda (x,t)\).

Lemma 2.5

Let \(\mathcal {A}_\lambda ^Q\) and \(\mathcal {P}_\lambda \) be as above. Then

$$\begin{aligned} \int _0^\infty \int _{\mathbb R^{n+1}}|(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )f|^2\, \frac{dxdtd\lambda }{\lambda }\le c\int _{\mathbb R^{n+1}}|f|^2\, {dxdt}, \end{aligned}$$

for all \(f\in L^2(\mathbb R^{n+1},\mathbb C)\).

Proof

The lemma follows by orthogonality estimates and we here include a sketch of the proof for completeness. Let \(F\in C_0^\infty (\mathbb R^{n+2}_+,\mathbb C)\) be such that \(|||F|||=1\). It suffices to prove that

$$\begin{aligned} \int _0^\infty \int _{\mathbb R^{n+1}}F(x,t,\lambda )\overline{(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )f(x,t)}\, \frac{dxdtd\lambda }{\lambda }\le c||f||_2, \end{aligned}$$

for all \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). To prove this we first note that \(|(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )f(x_0,t_0)|\le cM(f)(x_0,t_0)\) whenever \((x_0,t_0)\in \mathbb R^{n+1}\) and where M is the parabolic Hardy–Littlewood maximal function. Hence,

$$\begin{aligned} \sup _{\lambda >0}||(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )||_{2\rightarrow 2}\le c. \end{aligned}$$

Let \(\mathcal {Q}_\lambda \) be an approximation of the zero operator defined based on a function \(\mathcal {Q}\) so normalized that \(\mathcal {Q}_\lambda \) is a resolution of the identity, i.e.,

$$\begin{aligned} \int _0^\infty \mathcal {Q}_\lambda ^2g\, \frac{d\lambda }{\lambda }=g, \end{aligned}$$

whenever \(g\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\). Then

$$\begin{aligned} ||(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )\mathcal {Q}_\sigma ||_{2\rightarrow 2}\le c\min \{(\lambda /\sigma )^\delta ,(\sigma /\lambda )^\delta \}, \end{aligned}$$
(2.10)

for some \(\delta >0\). Indeed, let \(\mathcal {R}_\lambda (x,t,y,s)\) be the kernel associated to \(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda \), i.e.,

$$\begin{aligned} \mathcal {R}_\lambda (x,t,y,s)=\frac{1}{|\hat{Q}_\lambda (x,t)|}1_{\hat{Q}_\lambda (x,t)}(y,s)-\mathcal {P}_\lambda (x-y,t-s). \end{aligned}$$

Then \(\mathcal {R}_\lambda 1=0\) and it is easily seen that

$$\begin{aligned} \,\mathrm{(i)}&|\mathcal {R}_\lambda (x,t,y,s)|\le \lambda ^\delta (\lambda +||(x,t)||)^{-n-2-\delta },\nonumber \\ \,\mathrm{(ii)}&\int _{\mathbb R^{n+1}}\sup _{\{(z,w):\ ||(z-y,w-s)||\le \sigma \}} |\mathcal {R}_\lambda (x,t,z,w)-\mathcal {R}_\lambda (x,t,y,s)|\, dyds\le c(\sigma /\lambda )^\delta , \end{aligned}$$

whenever \((x,t)\in \mathbb R^{n+1}\), \(0<\sigma \le \lambda <\infty \) and with \(\delta =1\). Note that there is an unfortunate statement in the corresponding proof in [32]: there (ii) was stated in a pointwise sense which can, obviously, not hold as the indicator function \(1_{\hat{Q}_\lambda (x,t)}\) is not Hölder continuous. Using (i), (ii), one can, arguing as in the proof of display (3.7) and Remark 3.11 in [25], conclude the validity of (2.10). Let \(h_\delta (\lambda ,\sigma ):=\min \{(\lambda /\sigma )^\delta ,(\sigma /\lambda )^\delta \}.\) We write

$$\begin{aligned}&\left| \int _0^\infty \int _{\mathbb R^{n+1}}F(x,t,\lambda )\overline{(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )f(x,t)}\, \frac{dxdtd\lambda }{\lambda }\right| \nonumber \\&\quad = \ \left| \int _0^\infty \int _0^\infty \int _{\mathbb R^{n+1}}F(x,t,\lambda )\overline{(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )\mathcal {Q}_\sigma ^2f}(x,t)\, dxdt\frac{d\lambda }{\lambda }\frac{d\sigma }{\sigma }\right| . \end{aligned}$$

Hence, using Cauchy–Schwarz we see that

$$\begin{aligned} \left| \int _0^\infty \int _{\mathbb R^{n+1}}F(x,t,\lambda )\overline{(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )f(x,t)}\, \frac{dxdtd\lambda }{\lambda }\right| \le I_1^{1/2} I_2^{1/2}, \end{aligned}$$

where

$$\begin{aligned} I_1:= & {} \int _0^\infty \int _0^\infty \int _{\mathbb R^{n+1}}|F(x,t,\lambda )|^2h_\delta (\lambda ,\sigma )\, dxdt\frac{d\lambda }{\lambda }\frac{d\sigma }{\sigma },\nonumber \\ I_2:= & {} \int _0^\infty \int _0^\infty \int _{\mathbb R^{n+1}}|{(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )\mathcal {Q}_\sigma ^2f}(x,t)|^2(h_\delta (\lambda ,\sigma ))^{-1}\, dxdt\frac{d\lambda }{\lambda }\frac{d\sigma }{\sigma }. \end{aligned}$$

Integrating with respect to \(\sigma \) in \(I_1\) we see that \(I_1\le c\). Furthermore, using (2.10) we see that

$$\begin{aligned} I_2\le & {} \int _0^\infty \int _0^\infty \int _{\mathbb R^{n+1}}|\mathcal {Q}_\sigma f(x,t)|^2h_\delta (\lambda ,\sigma )\, dxdt\frac{d\lambda }{\lambda }\frac{d\sigma }{\sigma }\nonumber \\\le & {} c\int _0^\infty \int _{\mathbb R^{n+1}}|\mathcal {Q}_\sigma f(x,t)|^2\, dxdt \frac{d\sigma }{\sigma }\le c||f||_2^2. \end{aligned}$$
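Both the \(\sigma \)-integration used for \(I_1\) and the \(\lambda \)-integration used in the last step for \(I_2\) rest on the following elementary computation, recorded here as a worked step. For fixed \(\lambda >0\),

$$\begin{aligned} \int _0^\infty h_\delta (\lambda ,\sigma )\, \frac{d\sigma }{\sigma }=\int _0^\lambda \left( \frac{\sigma }{\lambda }\right) ^\delta \frac{d\sigma }{\sigma }+\int _\lambda ^\infty \left( \frac{\lambda }{\sigma }\right) ^\delta \frac{d\sigma }{\sigma }=\frac{1}{\delta }+\frac{1}{\delta }=\frac{2}{\delta }, \end{aligned}$$

and, by symmetry, the same value is obtained when integrating in \(\lambda \) for fixed \(\sigma \). In particular, assuming \(|||F|||^2\) denotes the integral of \(|F|^2\) with respect to \(dxdtd\lambda /\lambda \), one gets \(I_1=(2/\delta )|||F|||^2=2/\delta \).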

This completes the proof of the lemma. See also the proof of Lemma 4.3 in [25]. \(\square \)

3 Off-diagonal and uniform \(L^2\)-estimates for single layer potentials

We here establish a number of elementary and preliminary estimates for single layer potentials. We will consistently only formulate and prove results for \(\mathcal {S}_\lambda :=\mathcal {S}_\lambda ^{\mathcal {H}}\) and for \(\lambda >0\), where \(\mathcal {H}=\partial _t-\mathop {{\text {div}}}\nolimits A\nabla \) is assumed to satisfy (1.2) and (1.3) as well as (2.6) and (2.7). The corresponding results for \(\mathcal {S}_\lambda ^*:=\mathcal {S}_\lambda ^{\mathcal {H}^*}\) follow by analogy. Here we will also use the notation \(\mathop {{\text {div}}}\nolimits _{||}=\nabla _{||}\cdot \), \(D_i=\partial _{x_i}\) for \(i\in \{1,\ldots ,n+1\}\). We let

$$\begin{aligned} (\mathcal {S}_\lambda D_j)f(x,t):= & {} \int _{\mathbb R^{n+1}}\partial _{y_j}\Gamma _\lambda (x,t,y,s)f(y,s)\, dyds,\ 1\le j\le n,\nonumber \\ (\mathcal {S}_\lambda D_{n+1})f(x,t):= & {} \int _{\mathbb R^{n+1}}\partial _{\sigma }\Gamma (x,t,\lambda ,y,s,\sigma )|_{\sigma =0}f(y,s)\, dyds. \end{aligned}$$

We set

$$\begin{aligned} (\mathcal {S}_\lambda \nabla ):= & {} ((\mathcal {S}_\lambda D_1),\ldots ,(\mathcal {S}_\lambda D_n),(\mathcal {S}_\lambda D_{n+1})),\nonumber \\ (\mathcal {S}_\lambda \nabla \cdot )\mathbf{f}:= & {} \sum _{j=1}^{n+1}(\mathcal {S}_\lambda D_j)f_j, \end{aligned}$$

whenever \(\mathbf{f}=(f_1,\ldots ,f_{n+1})\) and we note that

$$\begin{aligned} (\mathcal {S}_\lambda \nabla _{||})\cdot \mathbf{f}_{||}=-\mathcal {S}_\lambda (\mathop {{\text {div}}}\nolimits _{||}{} \mathbf{f}_{||}),\quad (\mathcal {S}_\lambda D_{n+1})f_{n+1}=-\partial _\lambda \mathcal {S}_\lambda f_{n+1}, \end{aligned}$$

whenever \(\mathbf{f}=(\mathbf{f}_{||},f_{n+1}) \in C_0^\infty (\mathbb R^{n+1},\mathbb C^{n+1})\) and by the translation invariance in the \(\lambda \)-variable. Given a function \(f\in L^2(\mathbb R^{n+1},\mathbb C)\), and \(h=(h_1,\ldots ,h_{n+1})\in \mathbb R^{n+1}\), we let \((\mathbb D^hf)(x,t)=f(x_1+h_1,\ldots ,x_n+h_n,t+h_{n+1})-f(x,t)\). Given \(m\ge -1\), \(l\ge -1\) we let

$$\begin{aligned} K_{m,\lambda }(x,t,y,s):= & {} \partial _\lambda ^{m+1}\Gamma _\lambda (x,t,y,s),\nonumber \\ K_{m,l,\lambda }(x,t,y,s):= & {} \partial _t^{l+1}\partial _\lambda ^{m+1}\Gamma _\lambda (x,t,y,s), \end{aligned}$$
(3.1)

and we introduce

$$\begin{aligned} d_\lambda (x,t,y,s):=|x-y|+|t-s|^{1/2} +\lambda . \end{aligned}$$
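The two identities noted above for \((\mathcal {S}_\lambda \nabla _{||})\) and \((\mathcal {S}_\lambda D_{n+1})\) can be checked by a short formal computation. The sketch below assumes that \(\mathcal {S}_\lambda f(x,t)=\int _{\mathbb R^{n+1}}\Gamma _\lambda (x,t,y,s)f(y,s)\, dyds\) with \(\Gamma _\lambda (x,t,y,s)=\Gamma (x,t,\lambda ,y,s,0)\), and that, by (1.3), \(\Gamma (x,t,\lambda ,y,s,\sigma )\) depends on \(\lambda \) and \(\sigma \) only through \(\lambda -\sigma \). For \(1\le j\le n\) and \(f_j\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\), integration by parts in \(y_j\) gives

$$\begin{aligned} (\mathcal {S}_\lambda D_j)f_j(x,t)=\int _{\mathbb R^{n+1}}\partial _{y_j}\Gamma _\lambda (x,t,y,s)f_j(y,s)\, dyds=-\int _{\mathbb R^{n+1}}\Gamma _\lambda (x,t,y,s)\partial _{y_j}f_j(y,s)\, dyds=-\mathcal {S}_\lambda (D_jf_j)(x,t), \end{aligned}$$

and summing over j yields the first identity. Similarly,

$$\begin{aligned} \partial _{\sigma }\Gamma (x,t,\lambda ,y,s,\sigma )|_{\sigma =0}=-\partial _\lambda \Gamma _\lambda (x,t,y,s), \end{aligned}$$

which gives \((\mathcal {S}_\lambda D_{n+1})f_{n+1}=-\partial _\lambda \mathcal {S}_\lambda f_{n+1}\).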

Lemma 3.1

Consider \(m\ge -1\), \(l\ge -1\). Then there exist constants \(c_{m,l}\) and \(\alpha \in (0,1)\), depending at most on n, \(\Lambda \), the De Giorgi–Moser–Nash constants, m, l, such that

$$\begin{aligned} \,\mathrm{(i)}&|K_{m,l,\lambda }(x,t,y,s)|\le c_{m,l}({d_\lambda (x,t,y,s)})^{-n-m-2l-4},\nonumber \\ \,\mathrm{(ii)}&|(\mathbb D^hK_{m,l,\lambda }(\cdot ,\cdot ,y,s))(x,t)|\le c_{m,l}||h||^\alpha ({d_\lambda (x,t,y,s)})^{-n-m-2l-4-\alpha },\nonumber \\ \,\mathrm{(iii)}&|(\mathbb D^hK_{m,l,\lambda }(x,t,\cdot ,\cdot ))(y,s)|\le c_{m,l}||h||^\alpha ({d_\lambda (x,t,y,s)})^{-n-m-2l-4-\alpha }, \end{aligned}$$

whenever \(2||h||\le ||(x-y,t-s)||\) or \(2||h||\le \lambda \).

Proof

Assume first that \(l=-1\). Then \(K_{m,l,\lambda }=K_{m,\lambda }\). In the case \(m=-1\) the estimates in (i)–(iii) follow from (2.6) and (2.7), see also [1] and Section 1.4 in [6]. In the cases \(m\ge 0\), the corresponding estimates follow by induction using (2.6), (2.7), Lemmas 2.1 and 2.2. This establishes the estimates in (i)–(iii) for \(K_{m,-1,\lambda }\) whenever \(m\ge -1\). We next consider the case of \(K_{m,l,\lambda }\), \(l\ge 0\). Fix \((y,s)\in \mathbb R^{n+1}\) and let \(u=u(x,t,\lambda )=K_{m,l,\lambda }(x,t,y,s)\) for some \(l\ge 0\). Given \((x,t,\lambda )\in \mathbb R_+^{n+2}\), let \(\tilde{Q}\subset \mathbb R^{n+2}\) be the largest parabolic cube centered at \((x,t,\lambda )\) such that \(16\tilde{Q}\subset \mathbb R^{n+2}_+\) and such that \(\mathcal {H}u=0\) in \(16\tilde{Q}\). Then \(l(\tilde{Q})\approx \min \{\lambda ,||(x-y,t-s)||\}\), and

$$\begin{aligned} |\partial _t u(x,t,\lambda )|\le c\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2\tilde{Q}}|\partial _t u|^2\, dXdt\right) ^{1/2}, \end{aligned}$$

by (2.6) as \(\partial _t u\) is a solution to the same equation as u. Using Lemma 2.3 we can therefore conclude that

$$\begin{aligned} |\partial _t u(x,t,\lambda )|^2\le \frac{c}{l(\tilde{Q})^4}\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{8\tilde{Q}}|u|^2\, dXdt\right) . \end{aligned}$$

Using this and induction, the estimate in (i) follows for \(K_{m,l,\lambda }(x,t,y,s)\) whenever \(l\ge -1\). Using (2.7), the estimates in (ii) and (iii) are proved similarly. \(\square \)

Lemma 3.2

Consider \(m\ge -1\), \(l\ge -1\) and \(\rho >1\). Then there exist a constant \(c_{m,l}\), depending at most on n, \(\Lambda \), the De Giorgi–Moser–Nash constants, m, l, and a constant \(c_{m,l,\rho }\), depending in addition on \(\rho \), such that

$$\begin{aligned} \,\mathrm{(i)}&\int _{2^{k+1}Q{\setminus } 2^kQ}|(2^kl(Q))^{m+2l+3}\nabla _yK_{m,l,\lambda }(x,t,y,s)|^2dy ds\le c_{m,l}(2^kl(Q))^{-n-2},\nonumber \\ \,\mathrm{(ii)}&\int _{2Q}|(l(Q))^{m+2l+3}\nabla _yK_{m,l,\lambda }(x,t,y,s)|^2dy ds\le c_{m,l,\rho }(l(Q))^{-n-2},\nonumber \\&\quad {\text{ if }}\, l(Q)/\rho \le \lambda \le \rho l(Q), \end{aligned}$$

whenever \(Q\subset \mathbb R^{n+1}\) is a parabolic cube, \(k\ge 1\) is an integer and \((x,t)\in Q\).

Proof

Fix \((x,t)\in Q\) and let

$$\begin{aligned} v(y,s,\lambda ):=\overline{K_{m,l,\lambda }(x,t,y,s)}. \end{aligned}$$

Then v is a solution to the adjoint equation. The lemma now follows from Lemma 2.2 (ii), applied to the adjoint equation, and Lemma 3.1 (i). Indeed, it is easy to see that Lemma 2.2 is also valid when Q is replaced by the annular region \(2^{k+1}Q{\setminus } 2^kQ\). \(\square \)
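The bookkeeping of exponents behind (i) can be sketched as follows; this is a heuristic count, not a replacement for the argument. Write \(R:=2^kl(Q)\), a symbol introduced here only for illustration. For \((x,t)\in Q\) and \((y,s)\) in a slight enlargement of \(2^{k+1}Q{\setminus } 2^kQ\) one has \(d_\lambda (x,t,y,s)\ge c^{-1}R\), so Lemma 3.1 (i) gives \(|K_{m,l,\lambda }|\le cR^{-n-m-2l-4}\) there. The annular version of Lemma 2.2 (ii) costs a factor \(R^{-2}\), and the annulus has measure comparable to \(R^{n+2}\). Hence

$$\begin{aligned} \int _{2^{k+1}Q{\setminus } 2^kQ}|R^{m+2l+3}\nabla _yK_{m,l,\lambda }(x,t,y,s)|^2dy ds\le cR^{2m+4l+6}\cdot R^{-2}\cdot R^{n+2}\cdot R^{-2n-2m-4l-8}=cR^{-n-2}. \end{aligned}$$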

Lemma 3.3

Consider \(m\ge -1\), \(l\ge -1\) and \(\rho >1\). Then there exist a constant \(c_{m,l}\), depending at most on n, \(\Lambda \), the De Giorgi–Moser–Nash constants, m, l, and a constant \(c_{m,l,\rho }\), depending in addition on \(\rho \), such that

$$\begin{aligned} \mathrm{(i)}&||\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )(\mathbf{f}1_{2^{k+1}Q{\setminus } 2^kQ})||_{L^2(Q)}^2\nonumber \\&\quad \le \ c_{m,l}2^{-(n+2)k}(2^kl(Q))^{-2m-4l-6}||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)},\nonumber \\ \mathrm{(ii)}&||\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )(\mathbf{f}1_{2Q})||_{L^2(Q)}^2 \le c_{m,l,\rho }(l(Q))^{-2m-4l-6}||\mathbf{f}||^2_{L^2(2Q)},\nonumber \\&\quad {\text{ if }}\, l(Q)/\rho \le \lambda \le \rho l(Q),\nonumber \\ \mathrm{(iii)}&||\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda )({f}1_{2^{k+1}Q{\setminus } 2^kQ})||_{L^2(Q)}^2\nonumber \\&\quad \le \ c_{m,l}2^{-(n+2)k}(2^kl(Q))^{-2m-4l-4}||{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)},\nonumber \\ \mathrm{(iv)}&||\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda )({f}1_{2Q})||_{L^2(Q)}^2 \le c_{m,l,\rho }(l(Q))^{-2m-4l-4}||{f}||^2_{L^2(2Q)},\nonumber \\&\quad {\text{ if }}\, l(Q)/\rho \le \lambda \le \rho l(Q), \end{aligned}$$

whenever \(Q\subset \mathbb R^{n+1}\) is a parabolic cube, \(k\ge 1\) is an integer, \(\mathbf{f} \in L^2(\mathbb R^{n+1},\mathbb C^{n})\), and \({f} \in L^2(\mathbb R^{n+1},\mathbb C)\).

Proof

Let \((x,t)\in Q\). To prove (i) we note that

$$\begin{aligned}&|\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )(\mathbf{f}1_{2^{k+1}Q{\setminus } 2^kQ})(x,t)|^2\nonumber \\&\quad = \ \left| \int _{2^{k+1}Q{\setminus } 2^kQ}\nabla _yK_{m,l,\lambda }(x,t,y,s)\cdot \mathbf{f}(y,s)\, dyds\right| ^2\nonumber \\&\quad \le \ ||\nabla _yK_{m,l,\lambda }(x,t,y,s)||_{L^2(2^{k+1}Q {\setminus } 2^kQ)}^2||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}\nonumber \\&\quad \le \ c_{m,l}(2^kl(Q))^{-n-2m-4l-8}||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}, \end{aligned}$$

by Lemma 3.2 (i). Hence, integrating with respect to \((x,t)\) we see that

$$\begin{aligned}&||\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )(\mathbf{f}1_{2^{k+1}Q{\setminus } 2^kQ})||_{L^2(Q)}^2\nonumber \\&\quad \le \ c_{m,l}(l(Q))^{n+2}(2^kl(Q))^{-n-2m-4l-8}||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}\nonumber \\&\quad \le \ c_{m,l}2^{-(n+2)k}(2^kl(Q))^{-2m-4l-6}||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}. \end{aligned}$$

This completes the proof of (i). The proof of (ii) is similar. To prove (iii) we again consider \((x,t)\in Q\). Then

$$\begin{aligned}&|\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda )({f}1_{2^{k+1}Q{\setminus } 2^kQ})(x,t)|^2\nonumber \\&\quad = \ \left| \int _{2^{k+1}Q{\setminus } 2^kQ}K_{m,l,\lambda }(x,t,y,s){f}(y,s)\, dyds\right| ^2\nonumber \\&\quad \le \ ||K_{m,l,\lambda }(x,t,y,s)||_{L^2(2^{k+1}Q {\setminus } 2^kQ)}^2||{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}\nonumber \\&\quad \le \ c_{m,l}(2^kl(Q))^{-n-2m-4l-6}||{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}. \end{aligned}$$

We can now proceed as above to complete the proof of (iii). The proof of (iv) is similar. \(\square \)

Lemma 3.4

Assume \(m\ge -1\), \(l\ge -1\), \(m+2l\ge -2\). Then there exists a constant \(c_{m,l}\), depending at most on n, \(\Lambda \), the De Giorgi–Moser–Nash constants, m, l, such that the following holds. Let \(\mathbf{f} \in L^2(\mathbb R^{n+1},\mathbb C^{n})\) and \({f} \in L^2(\mathbb R^{n+1},\mathbb C)\). Then

$$\begin{aligned} \,\mathrm{(i)}&\sup _{\lambda>0}||\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )\mathbf{f}||_2\le c_{m,l}||\mathbf{f}||_2,\nonumber \\ \,\mathrm{(ii)}&\sup _{\lambda >0} ||\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\nabla _{||} \mathcal {S}_\lambda f)||_2\le c_{m,l}||{f}||_2. \end{aligned}$$

Furthermore, if \(m+2l\ge -1\) then

$$\begin{aligned} \,\mathrm{(iii)}&\sup _{\lambda >0} ||\lambda ^{m+2l+2}\partial _t^{l+1}\partial _\lambda ^{m+1} (\mathcal {S}_\lambda f)||_2\le c_{m,l}||{f}||_2. \end{aligned}$$

Proof

We first note that to prove (ii) it suffices to only prove (i), as, by duality, (ii) follows from (i) applied to \(\mathcal {S}_\lambda ^*\). To prove (i), fix \(\lambda >0\) and consider \(m\ge -1\), \(l\ge -1\). Then

$$\begin{aligned}&||\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )\mathbf{f}||_{2}^2 \le \sum _Q\int _Q|\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )\mathbf{f}(x,t)|^2\, dxdt, \end{aligned}$$

where the sum runs over the dyadic grid of parabolic cubes with \(l(Q)\approx \lambda \). With Q fixed we see that

$$\begin{aligned}&\int _Q|\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )\mathbf{f}(x,t)|^2\, dxdt\nonumber \\&\quad \le \ \int _Q|\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )(\mathbf{f}1_{2Q})(x,t)|^2\, dxdt\nonumber \\&\quad \quad + \ \sum _{k\ge 1}\int _Q|\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )(\mathbf{f}1_{2^{k+1}Q{\setminus } 2^kQ})(x,t)|^2\, dxdt\nonumber \\&\quad \le \ c\lambda ^{2m+4l+6}(l(Q))^{-2m-4l-6}||\mathbf{f}||^2_{L^2(2Q)}\nonumber \\&\quad \quad + \ \sum _{k\ge 1}c2^{-(n+2)k}\lambda ^{2m+4l+6}(2^kl(Q))^{-2m-4l-6} ||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}\nonumber \\&\quad \le \ c\left( ||\mathbf{f}||^2_{L^2(2Q)}+\sum _{k\ge 1}2^{-(n+2)k}2^{-(2m+4l+6)k} ||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}\right) , \end{aligned}$$

by Lemma 3.3 (i) and (ii), as \(l(Q)\approx \lambda \). Hence,

$$\begin{aligned}&||\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )\mathbf{f}||_{L^2(\mathbb R^{n+1})}^2\nonumber \\&\quad \le \ c||\mathbf{f}||_2^2+c\sum _Q\sum _{k\ge 1}2^{-(n+2)k} 2^{-(2m+4l+6)k} ||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}. \end{aligned}$$
(3.2)

To complete the proof of (i) we now note that, given a point \((x,t)\), there exist at most \(c_n2^{(n+2)k}\) cubes Q such that \((x,t)\in 2^{k+1}Q{\setminus } 2^kQ\). Hence, using this, and the estimate in (3.2), we see that

$$\begin{aligned} ||\lambda ^{m+2l+3}\partial _t^{l+1}\partial _\lambda ^{m+1}(\mathcal {S}_\lambda \nabla _{||}\cdot )\mathbf{f}||_{L^2(\mathbb R^{n+1})}^2\le & {} c||\mathbf{f}||_2^2+c\sum _{k\ge 1} 2^{-(2m+4l+6)k} ||\mathbf{f}||^2_2\nonumber \\\le & {} c||\mathbf{f}||_2^2, \end{aligned}$$

as long as \(m+2l>-3\). This completes the proof of (i). Using Lemma 3.3 (iii) and (iv), the proof of (iii) is similar. We omit further details. \(\square \)
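The two ingredients of the last step can be made explicit as follows, as a worked check. First, the counting statement gives, for each fixed \(k\ge 1\),

$$\begin{aligned} \sum _Q||\mathbf{f}||^2_{L^2(2^{k+1}Q{\setminus } 2^kQ)}=\int _{\mathbb R^{n+1}}|\mathbf{f}(x,t)|^2\, \#\{Q:\ (x,t)\in 2^{k+1}Q{\setminus } 2^kQ\}\, dxdt\le c_n2^{(n+2)k}||\mathbf{f}||_2^2, \end{aligned}$$

which cancels the factor \(2^{-(n+2)k}\) in (3.2). Second, the remaining series is geometric and converges precisely when \(2m+4l+6>0\), i.e., when \(m+2l>-3\). Under the standing assumption \(m+2l\ge -2\) one has \(2m+4l+6\ge 2\), and therefore

$$\begin{aligned} \sum _{k\ge 1}2^{-(2m+4l+6)k}\le \sum _{k\ge 1}4^{-k}=\frac{1}{3}. \end{aligned}$$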

Lemma 3.5

Let \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\) and \(\lambda _0>0\). Then \(\mathcal {S}_{\lambda _0} f\in {\mathbb H}(\mathbb R^{n+1},\mathbb C)\cap L^2(\mathbb R^{n+1},\mathbb C)\).

Proof

Given \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\) we let \(Q\subset \mathbb R^{n+1}\) be a parabolic cube, centered at (0, 0), such that the support of f is contained in Q. Let \(\lambda _0>0\) be fixed. We have to prove that \(||\nabla _{||}\mathcal {S}_{\lambda _0} f||_2<\infty \), \(||H_tD_{1/2}^t\mathcal {S}_{\lambda _0} f||_2<\infty \), and that \(||\mathcal {S}_{\lambda _0} f||_2<\infty \). To estimate \(||\nabla _{||}\mathcal {S}_{\lambda _0} f||_2\) we see, by duality, that it suffices to bound

$$\begin{aligned} \int _{Q}|(\mathcal {S}_{\lambda _0}^*\nabla _{||}\cdot )\mathbf{f}(x,t)|^2\, dxdt\le & {} \int _{Q}|(\mathcal {S}_{\lambda _0}^*\nabla _{||}\cdot )(\mathbf{f}1_{2Q})(x,t)|^2\, dxdt\nonumber \\&+\,\sum _{k\ge 1}\int _{Q}|(\mathcal {S}_{\lambda _0}^*\nabla _{||}\cdot )(\mathbf{f}1_{2^{k+1}Q{\setminus } 2^kQ})(x,t)|^2\, dxdt, \end{aligned}$$

where \(\mathbf{f}\in C_0^\infty (\mathbb R^{n+1},\mathbb C^n)\), \(||\mathbf{f}||_2=1\). However, now using the adjoint version of Lemma 3.3 (i), (ii) with \(l=-1=m\), we immediately see that

$$\begin{aligned} \int _{Q}|(\mathcal {S}_{\lambda _0}^*\nabla _{||}\cdot )\mathbf{f}(x,t)|^2\, dxdt\le c(n,\Lambda ,\lambda _0)<\infty , \end{aligned}$$

whenever \(\mathbf{f}\in C_0^\infty (\mathbb R^{n+1},\mathbb C^n)\), \(||\mathbf{f}||_2=1\). To estimate \(||H_tD_{1/2}^t\mathcal {S}_{\lambda _0} f||_2\) we first note that

$$\begin{aligned} ||H_tD_{1/2}^t\mathcal {S}_{\lambda _0} f||_2^2\le ||\partial _t\mathcal {S}_{\lambda _0} f||_2||\mathcal {S}_{\lambda _0} f||_2. \end{aligned}$$
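This interpolation-type inequality can be sketched via Plancherel's theorem. The sketch assumes that \(H_tD_{1/2}^t\) acts as a Fourier multiplier in t whose symbol has modulus \(|\tau |^{1/2}\), and that \(\hat{g}(\xi ,\tau )\) denotes the Fourier transform of \(g=\mathcal {S}_{\lambda _0}f\) in \((x,t)\), normalized so that Plancherel's identity holds with constant 1. Then, by the Cauchy–Schwarz inequality,

$$\begin{aligned} ||H_tD_{1/2}^tg||_2^2=\int |\tau ||\hat{g}|^2\, d\xi d\tau \le \left( \int |\tau |^2|\hat{g}|^2\, d\xi d\tau \right) ^{1/2}\left( \int |\hat{g}|^2\, d\xi d\tau \right) ^{1/2}=||\partial _tg||_2||g||_2. \end{aligned}$$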

Using Lemma 3.4 (iii) we see that \(||\partial _t\mathcal {S}_{\lambda _0} f||_2\le c(n,\Lambda ,\lambda _0)||f||_2<\infty \). To estimate \(||\mathcal {S}_{\lambda _0} f||_2\) we write

$$\begin{aligned} \int _{\mathbb R^{n+1}}|\mathcal {S}_{\lambda _0}f(x,t)|^2\, dxdt\le & {} \int _{2Q}|\mathcal {S}_{\lambda _0} f(x,t)|^2\, dxdt\nonumber \\&+\,\sum _{k\ge 1}\int _{2^{k+1}Q{\setminus } 2^kQ}|\mathcal {S}_{\lambda _0} f(x,t)|^2\, dxdt. \end{aligned}$$

Using this and Lemma 3.1 (i) we deduce that

$$\begin{aligned} \int _{\mathbb R^{n+1}}|\mathcal {S}_{\lambda _0}f(x,t)|^2\, dxdt\le c(n,\Lambda ,\lambda _0)<\infty . \end{aligned}$$
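The contribution of the annuli can be sketched as follows, assuming that the kernel of \(\mathcal {S}_{\lambda _0}\) is \(\Gamma _{\lambda _0}=K_{-1,-1,\lambda _0}\), so that Lemma 3.1 (i) with \(m=l=-1\) gives \(|\Gamma _{\lambda _0}(x,t,y,s)|\le c(d_{\lambda _0}(x,t,y,s))^{-n-1}\). Since the support of f is contained in Q, for \((x,t)\in 2^{k+1}Q{\setminus } 2^kQ\), \(k\ge 1\), one has \(|\mathcal {S}_{\lambda _0}f(x,t)|\le c(2^kl(Q))^{-n-1}||f||_1\), and hence

$$\begin{aligned} \sum _{k\ge 1}\int _{2^{k+1}Q{\setminus } 2^kQ}|\mathcal {S}_{\lambda _0} f(x,t)|^2\, dxdt\le c\sum _{k\ge 1}(2^kl(Q))^{n+2}(2^kl(Q))^{-2n-2}||f||_1^2=c\, l(Q)^{-n}||f||_1^2\sum _{k\ge 1}2^{-nk}<\infty , \end{aligned}$$

as \(n\ge 1\). On \(2Q\) one can instead use \(|\Gamma _{\lambda _0}|\le c\lambda _0^{-n-1}\), which gives a contribution bounded by \(c\lambda _0^{-2n-2}|2Q|\, ||f||_1^2\). In this sketch the resulting bound also depends on f through \(l(Q)\) and \(||f||_1\).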

This completes the proof of the lemma. \(\square \)

4 Estimates of non-tangential maximal functions and square functions

Consider \(\mathcal {S}_\lambda =\mathcal {S}_\lambda ^{\mathcal {H}}\), for \(\lambda >0\), where \(\mathcal {H}=\partial _t-\mathop {{\text {div}}}\nolimits A\nabla \) is assumed to satisfy (1.2) and (1.3) as well as (2.6) and (2.7). Recall the notation \(|||\cdot |||\), \(\Phi (f)\), introduced in (1.4), (1.7). This section is devoted to the proof of the following two lemmas.

Lemma 4.1

Then there exists a constant c, depending at most on n, \(\Lambda \), and the De Giorgi–Moser–Nash constants, such that

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\).

Lemma 4.2

Assume \(m\ge -1\), \(l\ge -1\). Let \(\Phi (f)\) be defined as in (1.7). Assume that \(\Phi (f)<\infty \) whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). Then there exists a constant c, depending at most on n, \(\Lambda \), the De Giorgi–Moser–Nash constants, and m, l, such that

$$\begin{aligned} \,\mathrm{(i)}&|||\lambda ^{m+2l+4}\nabla \partial _\lambda \partial _t^{l+1}\partial _\lambda ^{m+1}\mathcal {S}_{\lambda }f||| \le c(\Phi (f)+||f||_2),\nonumber \\ \,\mathrm{(ii)}&|||\lambda ^{m+2l+4}\partial _t\partial _t^{l+1}\partial _\lambda ^{m+1}\mathcal {S}_{\lambda }f||| \le c(\Phi (f)+||f||_2), \end{aligned}$$

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\).

4.1 Proof of Lemma 4.1

Throughout the proof we can, without loss of generality, assume that \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\). We let \(Q\subset \mathbb R^{n+1}\) be the (smallest) cube centered at (0, 0) such that the support of f is contained in \(\frac{1}{2}Q\). Let \(\delta >0\) be small and let \(1_{\lambda >2\delta }\) denote the indicator function for the set \(\{\lambda :\ {\lambda >2\delta }\}\subset \mathbb R\).

Proof of Lemma 4.1 (i)

We let \((x_0,t_0)\in \mathbb R^{n+1}\). Recall that the kernel of \(\partial _\lambda {\mathcal {S}}_\lambda \) is \(K_{0,\lambda }(x,t,y,s)\) introduced in (3.1). \(K_{0,\lambda }(x,t,y,s)\) is a (parabolic) Calderon–Zygmund kernel satisfying the Calderon–Zygmund type estimates stated in Lemma 3.1. Given \((x_0,t_0)\in \mathbb R^{n+1}\) we consider \((x,t,\lambda )\in \Gamma (x_0,t_0)\). Then

$$\begin{aligned}&|\partial _\lambda {\mathcal {S}}_{\lambda } (f)(x,t)-\partial _\lambda {\mathcal {S}}_{\lambda } (f)(x_0,t_0)|\nonumber \\&\quad \le \ \int _{\mathbb R^{n+1}}|K_{0,{\lambda }}(x,t,y,s)-K_{0,{\lambda }}(x_0,t_0,y,s)||f(y,s)|\, dyds\nonumber \\&\quad \le \ cM(f)(x_0,t_0), \end{aligned}$$

by Lemma 3.1 and where M is the parabolic Hardy–Littlewood maximal function. Hence

$$\begin{aligned} N_*(1_{\lambda>2\delta }\partial _\lambda {\mathcal {S}}_{\lambda }f)(x_0,t_0)\le \sup _{\lambda >2\delta }|\partial _\lambda {\mathcal {S}}_{\lambda } (f)(x_0,t_0)|+cM(f)(x_0,t_0), \end{aligned}$$

and we intend to estimate \(|\partial _\lambda {\mathcal {S}}_{\lambda } (f)(x_0,t_0)|\) for \(\lambda >2\delta \). To do this we fix \(\lambda >2\delta \) and we decompose \(\partial _\lambda {\mathcal {S}}_{\lambda } (f)(x_0,t_0)\) as

$$\begin{aligned}&\int _{||(x_0-y,t_0-s)||>5\lambda }(K_{0,{\lambda }}(x_0,t_0,y,s)-K_{0,\delta }(x_0,t_0,y,s))f(y,s)\, dyds\nonumber \\&\quad \quad + \ \int _{||(x_0-y,t_0-s)||\le 5\lambda }K_{0,{\lambda }}(x_0,t_0,y,s)f(y,s)\, dyds\nonumber \\&\quad \quad - \ \int _{\lambda<||(x_0-y,t_0-s)||<5\lambda }K_{0,\delta }(x_0,t_0,y,s)f(y,s)\, dyds\nonumber \\&\quad \quad + \ \int _{||(x_0-y,t_0-s)||>\lambda }K_{0,\delta }(x_0,t_0,y,s)f(y,s)\, dyds\nonumber \\&\quad =: \ I_1^\delta (x_0,t_0,{\lambda })+I_2^\delta (x_0,t_0,\lambda )+I_3^\delta (x_0,t_0,\lambda )+I_4^\delta (x_0,t_0,\lambda ). \end{aligned}$$

Using Lemma 3.1 we see that

$$\begin{aligned} |I_1^\delta (x_0,t_0,\lambda )+I_2^\delta (x_0,t_0,\lambda )+I_3^\delta (x_0,t_0,\lambda )|\le cM(f)(x_0,t_0). \end{aligned}$$
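The three bounds can be sketched as follows, assuming that M is defined through averages over parabolic balls (or cubes) centered at \((x_0,t_0)\). For \(I_2^\delta \) and \(I_3^\delta \), Lemma 3.1 (i) with \(m=0\), \(l=-1\) gives \(|K_{0,\lambda }|\le c\lambda ^{-n-2}\), and \(|K_{0,\delta }|\le c\lambda ^{-n-2}\) on the set where \(||(x_0-y,t_0-s)||>\lambda \), so that

$$\begin{aligned} |I_2^\delta (x_0,t_0,\lambda )|+|I_3^\delta (x_0,t_0,\lambda )|\le c\lambda ^{-n-2}\int _{||(x_0-y,t_0-s)||\le 5\lambda }|f(y,s)|\, dyds\le cM(f)(x_0,t_0). \end{aligned}$$

For \(I_1^\delta \), note that \(K_{0,\lambda }-K_{0,\delta }=\int _\delta ^\lambda K_{1,\sigma }\, d\sigma \), and Lemma 3.1 (i) with \(m=1\), \(l=-1\) gives \(|K_{1,\sigma }|\le c||(x_0-y,t_0-s)||^{-n-3}\). Decomposing the region \(\{||(x_0-y,t_0-s)||>5\lambda \}\) into dyadic annuli then yields

$$\begin{aligned} |I_1^\delta (x_0,t_0,\lambda )|\le c\sum _{j\ge 0}\lambda (2^j5\lambda )^{-n-3}\int _{||(x_0-y,t_0-s)||\le 2^{j+1}5\lambda }|f(y,s)|\, dyds\le c\sum _{j\ge 0}2^{-j}M(f)(x_0,t_0)\le cM(f)(x_0,t_0). \end{aligned}$$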

Furthermore,

$$\begin{aligned} |I_4^\delta (x_0,t_0,\lambda )|\le \mathcal {T}_*^\delta f(x_0,t_0), \end{aligned}$$

where

$$\begin{aligned} \mathcal {T}_*^\delta f(x_0,t_0)=\sup _{\epsilon >2\delta } |\mathcal {T}_\epsilon ^\delta f(x_0,t_0)| \end{aligned}$$

and

$$\begin{aligned} \mathcal {T}_\epsilon ^\delta f(x_0,t_0)=\int _{||(x_0-y,t_0-s)||>\epsilon }K_{0,\delta }(x_0,t_0,y,s)f(y,s)\, dyds. \end{aligned}$$

We have to prove that \( \mathcal {T}_*^\delta :L^2(\mathbb R^{n+1},\mathbb C)\rightarrow L^2(\mathbb R^{n+1},\mathbb C)\) and we have to estimate \(||\mathcal {T}_*^\delta ||_{2\rightarrow 2}\). To do this we carry out an argument similar to the proof of Cotlar’s inequality for Calderon–Zygmund operators. With \(\epsilon >0\) fixed, we let \(Q_\epsilon \) be the largest parabolic cube, centered at \((x_0,t_0)\), which satisfies \(2Q_\epsilon \cap \{(y,s)\in \mathbb R^{n+1}:\ ||(x_0-y,t_0-s)||>\epsilon \}=\emptyset \). Then \(l(Q_\epsilon )\approx \epsilon \). Write \(f=f{1}_{2Q_\epsilon }+f{1}_{\mathbb R^{n+1}{\setminus } 2Q_\epsilon }\). Then

$$\begin{aligned} |\mathcal {T}_\epsilon ^\delta f(x_0,t_0)|= & {} |\partial _\lambda \mathcal {S}_\delta (f{1}_{\mathbb R^{n+1}{\setminus } 2Q_\epsilon })(x_0,t_0)|\nonumber \\\le & {} cM(f)(x_0,t_0)+ |\partial _\lambda \mathcal {S}_\delta f(x,t)|+ |\partial _\lambda \mathcal {S}_\delta (f{1}_{2Q_\epsilon })(x,t)|, \end{aligned}$$

whenever \((x,t)\in Q_\epsilon \) and where we have used Lemma 3.1 once again. Let \(r\in (0,1)\). Taking an \(L^r\) average in the last display with respect to \((x,t)\), we see that

$$\begin{aligned} |\mathcal {T}_\epsilon ^\delta f(x_0,t_0)|\le & {} cM(f)(x_0,t_0)+ (M(|\partial _\lambda \mathcal {S}_\delta f|^r)(x_0,t_0))^{1/r}\nonumber \\&+\,\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{Q_\epsilon }|\partial _\lambda \mathcal {S}_\delta (f{1}_{2Q_\epsilon })|^r\, dxdt\right) ^{1/r}. \end{aligned}$$

Hence,

$$\begin{aligned} |\mathcal {T}_\epsilon ^\delta f(x_0,t_0)|\le cM(f)(x_0,t_0)+ (M(|\partial _\lambda \mathcal {S}_\delta f|^r)(x_0,t_0))^{1/r}+M(|\partial _\lambda \mathcal {S}_\delta f|)(x_0,t_0). \end{aligned}$$

Furthermore, using an inequality attributed to Kolmogorov, see Lemma 10 on p. 35 in [11] for example, and that the support of f is contained in Q, we see that

$$\begin{aligned} (M(|\partial _\lambda \mathcal {S}_\delta f|^r)(x_0,t_0))^{1/r}\le c||\partial _\lambda \mathcal {S}_\delta ||_{L^1(Q)\rightarrow L^{1,\infty }(5Q)}M(f)(x_0,t_0), \end{aligned}$$

where \(L^{1,\infty }(5Q)\) is weak-\(L^1\). Using that \(\partial _\lambda \mathcal {S}_\delta \) is a Calderón–Zygmund operator one can deduce, by retracing and localizing the proof of the weak-type estimates in Calderón–Zygmund theory based on \(L^2\) estimates, that

$$\begin{aligned} ||\partial _\lambda \mathcal {S}_\delta ||_{L^1(Q)\rightarrow L^{1,\infty }(5Q)}\le c\big (1+||\partial _\lambda \mathcal {S}_\delta ||_{L^2(Q)\rightarrow L^{2}(\mathbb R^{n+1})}\big ), \end{aligned}$$

where c depends on the kernel \(K_{0,\delta }\) through the constants appearing in Lemma 3.1. For a detailed account of the dependence of the constant c, see [31]. Hence

$$\begin{aligned} \mathcal {T}_*^\delta f(x_0,t_0)\le c\big (1+||\partial _\lambda \mathcal {S}_\delta ||_{L^2(Q)\rightarrow L^{2}(\mathbb R^{n+1})}\big )M(f)(x_0,t_0)+M(|\partial _\lambda \mathcal {S}_\delta f|)(x_0,t_0) \end{aligned}$$

and retracing the estimates we can conclude that we have proved that

$$\begin{aligned} N_*(1_{\lambda >2\delta }\partial _\lambda {\mathcal {S}}_{\lambda }f)(x_0,t_0)\le c\big (1+||\partial _\lambda \mathcal {S}_\delta ||_{2\rightarrow 2}\big )M(f)(x_0,t_0)+M(|\partial _\lambda \mathcal {S}_\delta f|)(x_0,t_0) \end{aligned}$$

whenever \((x_0,t_0)\in \mathbb R^{n+1}\) and \(\delta >0\). Hence,

$$\begin{aligned} ||N_*(1_{\lambda>2\delta }\partial _\lambda {\mathcal {S}}_{\lambda }f)||_2\le c\left( 1+\sup _{\lambda >0}||\partial _\lambda \mathcal {S}_\lambda ||_{2\rightarrow 2}\right) ||f||_2, \end{aligned}$$

whenever \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\) and for a constant c depending at most on n, \(\Lambda \), and the De Giorgi–Moser–Nash constants; in particular, c is independent of \(\delta \). Letting \(\delta \rightarrow 0\) completes the proof of Lemma 4.1 (i). \(\square \)
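
For orientation, the Kolmogorov-type inequality invoked in the proof above can be recalled in the following generic form. This is only a sketch: the symbols g, U and \(\kappa \) are notation introduced here for illustration and are not used elsewhere in the paper. Let \(U\subset \mathbb R^{n+1}\) have finite measure, let \(r\in (0,1)\), and suppose that \(\kappa :=\sup _{s>0}s\,|\{(x,t)\in U:\ |g(x,t)|>s\}|<\infty \). Splitting the layer-cake integral at \(s=\kappa /|U|\) gives

$$\begin{aligned} \int _U|g|^r\, dxdt&=r\int _0^\infty s^{r-1}|\{(x,t)\in U:\ |g(x,t)|>s\}|\, ds\nonumber \\&\le r|U|\int _0^{\kappa /|U|}s^{r-1}\, ds+r\kappa \int _{\kappa /|U|}^\infty s^{r-2}\, ds\nonumber \\&=|U|^{1-r}\kappa ^r+\frac{r}{1-r}|U|^{1-r}\kappa ^r=\frac{1}{1-r}|U|^{1-r}\kappa ^r. \end{aligned}$$

Consequently \(\bigl (|U|^{-1}\int _U|g|^r\, dxdt\bigr )^{1/r}\le (1-r)^{-1/r}\kappa /|U|\). When \(g=\partial _\lambda \mathcal {S}_\delta f\) and \(\kappa \) is controlled by a weak-type (1,1) norm times \(||f||_1\), this is presumably the mechanism by which the \(L^r\) averages in the proof are dominated by \(M(f)\).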

Proof of Lemma 4.1 (ii)

We let \((x_0,t_0)\in \mathbb R^{n+1}\). To estimate \(\tilde{N}_*(1_{\lambda >2\delta }\nabla _{||}{\mathcal {S}}_\lambda f)(x_0,t_0)\) it suffices to bound

$$\begin{aligned} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x_0,t_0)}|\nabla _{||}{\mathcal {S}}_\sigma f(y,s)|^2\, dydsd\sigma \right) ^{1/2},\ \end{aligned}$$

where

$$\begin{aligned} W_\lambda (x,t):=\{(y,s,\sigma ):\ (y,s)\in Q_\lambda (x,t),\lambda /2<\sigma <3\lambda /2\} \end{aligned}$$

and for \(\lambda >4\delta /3\), which we assume from now on. In the following we let, for \(m\in \{0,1,\ldots ,4\}\),

$$\begin{aligned} 2^m W_{\lambda }(x,t):=\{(y,s,\sigma ):\ (y,s)\in Q_{2^m\lambda }(x,t),\lambda /2-m\lambda 2^{-10}<\sigma <3\lambda /2+m\lambda 2^{-10}\}. \end{aligned}$$

Then \(2^0W_\lambda (x,t)=W_\lambda (x,t)\). Using this notation and energy estimates, Lemma 2.1, we see that

$$\begin{aligned} \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x_0,t_0)}|\nabla _{||}{\mathcal {S}}_\sigma f(y,s)|^2\, dydsd\sigma \le \frac{c}{\lambda ^2}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2W_{\lambda }(x_0,t_0)}|{\mathcal {S}}_\sigma f(y,s)-A|^2\, dydsd\sigma , \end{aligned}$$

where A is a constant which in the following is a degree of freedom. Furthermore, using (2.6) with \(p=1\) we see that

$$\begin{aligned}&\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_{\lambda }(x_0,t_0)}|\nabla _{||}{\mathcal {S}}_\sigma f(y,s)|^2\, dydsd\sigma \right) ^{1/2} \le \frac{c}{\lambda }\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2^2W_{\lambda }(x_0,t_0)}|{\mathcal {S}}_\sigma f(y,s)-A|\, dydsd\sigma . \end{aligned}$$

We write

$$\begin{aligned}&\frac{1}{\lambda }\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2^2W_{\lambda }(x_0,t_0)}|{\mathcal {S}}_\sigma f(y,s)-A|\, dydsd\sigma \nonumber \\&\quad \le \ \frac{1}{\lambda }\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2^2W_{\lambda }(x_0,t_0)}|{\mathcal {S}}_\sigma f(y,s)- {\mathcal {S}}_\sigma f(y,t_0)|\, dydsd\sigma \nonumber \\&\quad \quad + \ \frac{1}{\lambda }\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2^2W_{\lambda }(x_0,t_0)}|{\mathcal {S}}_\sigma f(y,t_0)-A|\, dydsd\sigma \nonumber \\&\quad =: \ I_1+I_2. \end{aligned}$$

By the fundamental theorem of calculus we have

$$\begin{aligned} I_1\le \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2^3W_{\lambda }(x_0,t_0)}|\lambda \partial _t {\mathcal {S}}_\sigma f(y,s)|\, dydsd\sigma . \end{aligned}$$

Let \(Q\subset \mathbb R^{n+1}\) be a parabolic cube centered at \((x_0,t_0)\) and with side length \(8\lambda \). Then \(I_1\) is bounded by

$$\begin{aligned}&c\int _{\lambda /8}^{2\lambda }\int _{Q}|\lambda ^{-n-2}\partial _t {\mathcal {S}}_\sigma (f\mathbf{1}_{2Q})(y,s)|\, dydsd\sigma \nonumber \\&\quad \quad + \ c\int _{\lambda /8}^{2\lambda }\int _{Q}|\lambda ^{-n-2}\left( \partial _t {\mathcal {S}}_\sigma (f\mathbf{1}_{\mathbb R^{n+1}{\setminus } 2Q})(y,s)-\partial _t {\mathcal {S}}_\sigma (f\mathbf{1}_{\mathbb R^{n+1}{\setminus } 2Q})(x_0,t_0)\right) |\, dydsd\sigma \nonumber \\&\quad \quad + \ c\int _{\lambda /8}^{2\lambda }|\partial _t {\mathcal {S}}_\sigma (f\mathbf{1}_{\mathbb R^{n+1}{\setminus } 2Q})(x_0,t_0)|\, d\sigma \nonumber \\&\quad =: \ I_{11}+I_{12}+I_{13}. \end{aligned}$$

Using Lemma 3.1 we see that

$$\begin{aligned} I_{11}+I_{12}\le c M(f)(x_0,t_0), \end{aligned}$$

where M is the parabolic Hardy–Littlewood maximal function. Furthermore,

$$\begin{aligned} I_{13}\le & {} c\sum _{k=1}^\infty \int _{\lambda /8}^{2\lambda }|\partial _t {\mathcal {S}}_\sigma (f\mathbf{1}_{2^{k+1}Q{\setminus } 2^kQ})(x_0,t_0)|\, d\sigma \nonumber \\\le & {} c\lambda \sum _{k=1}^\infty (2^k\lambda )^{-n-3}\int _{2^{k+1}Q}|f(y,s)|\, dyds\le c M(f)(x_0,t_0). \end{aligned}$$

Hence, we can conclude that

$$\begin{aligned} I_1\le c M(f)(x_0,t_0). \end{aligned}$$
(4.1)
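
The last inequality in the estimate of \(I_{13}\) is a geometric-series computation. The following sketch assumes, consistently with the parabolic scaling used throughout, that a parabolic cube of side length l in \(\mathbb R^{n+1}\) has measure comparable to \(l^{n+2}\). Since Q is centered at \((x_0,t_0)\) and has side length \(8\lambda \), we have \(|2^{k+1}Q|\le c(2^k\lambda )^{n+2}\), and hence

$$\begin{aligned} \lambda \sum _{k=1}^\infty (2^k\lambda )^{-n-3}\int _{2^{k+1}Q}|f(y,s)|\, dyds&\le c\lambda \sum _{k=1}^\infty (2^k\lambda )^{-n-3}(2^k\lambda )^{n+2}M(f)(x_0,t_0)\nonumber \\&=c\sum _{k=1}^\infty 2^{-k}M(f)(x_0,t_0)=cM(f)(x_0,t_0). \end{aligned}$$

The extra factor \(\lambda \) comes from the length of the \(\sigma \)-interval \((\lambda /8,2\lambda )\), and it is exactly what makes the powers of \(\lambda \) cancel.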

Focusing on \(I_2\) we see that

$$\begin{aligned} I_2\le & {} \frac{1}{\lambda }\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2^2W_{\lambda }(x_0,t_0)}|{\mathcal {S}}_\sigma f(y,t_0)-{\mathcal {S}}_{\delta /4} f(y,t_0)|\, dydsd\sigma \nonumber \\&+\,\frac{1}{\lambda }\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2^2W_{\lambda }(x_0,t_0)}|{\mathcal {S}}_{\delta /4}f(y,t_0)-A|\, dydsd\sigma \nonumber \\=: & {} I_{21}+I_{22}. \end{aligned}$$

By the fundamental theorem of calculus

$$\begin{aligned} I_{21}\le & {} \frac{1}{\lambda }\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{2^3W_{\lambda }(x_0,t_0)}\lambda |N_{**}^x(\partial _\lambda {\mathcal {S}}_\lambda f(\cdot ,t_0))(y)|\, dydsd\sigma \nonumber \\\le & {} M^x(N_{**}^x(\partial _\lambda {\mathcal {S}}_\lambda f(\cdot ,t_0))(\cdot ))(x_0), \end{aligned}$$

where \(M^x\) is the Hardy–Littlewood maximal function in x only and \(N_{**}^x\) is an elliptic non-tangential maximal function on a fixed time slice. Finally, let A be the average of \({\mathcal {S}}_{\delta /4}f(y,t_0)\), with respect to y, on a spatial surface cube around \(x_0\) with side length \(\lambda \). Then, using the \(L^1\)-Poincaré inequality we deduce that

$$\begin{aligned} I_{22}\le c M^x(\nabla _{||}{\mathcal {S}}_{\delta /4}f(\cdot ,t_0))(x_0). \end{aligned}$$

Retracing the argument we can conclude that

$$\begin{aligned} \tilde{N}_*(1_{\lambda >2\delta }\nabla _{||}{\mathcal {S}}_\lambda f)(x_0,t_0)\le & {} c \bigl (M(f)(x_0,t_0)+M^x(N_{**}^x(\partial _\lambda {\mathcal {S}}_\lambda f(\cdot ,t_0))(\cdot ))(x_0)\nonumber \\&+\, M^x(\nabla _{||}{\mathcal {S}}_{\delta /4}f(\cdot ,t_0))(x_0)\bigr ). \end{aligned}$$

Hence

$$\begin{aligned} ||\tilde{N}_*(1_{\lambda >2\delta }\nabla _{||}{\mathcal {S}}_\lambda f)||_2^2\le & {} c \bigl (||f||_2^2+||\nabla _{||}{\mathcal {S}}_{\delta /4}f||_2^2\bigr )\nonumber \\&+\,\int _{-\infty }^\infty \int _{\mathbb R^{n}}|N_{**}^x(\partial _\lambda {\mathcal {S}}_\lambda f(\cdot ,t))(x)|^2\, dxdt. \end{aligned}$$

However,

$$\begin{aligned} N_{**}^x(\partial _\lambda {\mathcal {S}}_\lambda f(\cdot ,t_0))(x_0)\le N_{**}(\partial _\lambda {\mathcal {S}}_\lambda f)(x_0,t_0) \end{aligned}$$

and we can conclude that

$$\begin{aligned} ||\tilde{N}_*(1_{\lambda>2\delta }\nabla _{||}{\mathcal {S}}_\lambda f)||_2\le c \left( ||f||_2+\sup _{\lambda >0}||\nabla _{||}{\mathcal {S}}_\lambda f||_2+||N_{**}(\partial _\lambda {\mathcal {S}}_\lambda f)||_2\right) . \end{aligned}$$

This completes the proof of Lemma 4.1 (ii). \(\square \)

Proof of Lemma 4.1 (iii)

We again fix \((x_0,t_0)\in \mathbb R^{n+1}\) and we note that to estimate

$$\begin{aligned} \tilde{N}_*(1_{\lambda >2\delta }H_tD_{1/2}^t{\mathcal {S}}_\lambda f)(x_0,t_0) \end{aligned}$$

it suffices to bound

$$\begin{aligned} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x_0,t_0)}|H_tD_{1/2}^t{\mathcal {S}}_\sigma f(y,s)|^2\, dydsd\sigma \right) ^{1/2},\quad \lambda >4\delta /3. \end{aligned}$$

Consider \((y,s,\sigma )\in W_\lambda (x_0,t_0)\), \(\lambda >4\delta /3\), and let \(K\gg 1\) be a degree of freedom to be chosen. Then

$$\begin{aligned} H_tD_{1/2}^t({\mathcal {S}}_\sigma f)(y,s)= & {} \lim _{\epsilon \rightarrow 0}\int _{\epsilon \le |s-t|<1/\epsilon }\frac{\text{ sgn }(s-t)}{|s-t|^{3/2}}({\mathcal {S}}_\sigma f)(y,t)\, dt\nonumber \\= & {} \lim _{\epsilon \rightarrow 0}\int _{\epsilon \le |s-t|<(K\sigma )^2}\frac{\text{ sgn }(s-t)}{|s-t|^{3/2}}({\mathcal {S}}_\sigma f)(y,t)\, dt\nonumber \\&+\,\lim _{\epsilon \rightarrow 0}\int _{(K\sigma )^2\le |s-t|<1/\epsilon }\frac{\text{ sgn }(s-t)}{|s-t|^{3/2}}({\mathcal {S}}_\sigma f)(y,t)\, dt\nonumber \\=: & {} g_1(y,s,\sigma )+g_2(y,s,\sigma ). \end{aligned}$$

Let

$$\begin{aligned} g_3(x_0,t_0,\sigma ):=\sup _{\{y:\ |y-x_0|\le 4\sigma \}}\sup _{\{\tau :\ |\tau -t_0|\le (4K\sigma )^2\}}|\partial _\tau ({\mathcal {S}}_\sigma f)(y,\tau )|. \end{aligned}$$

Then, using the oddness about s of the kernel in the definition of \(g_1\),

$$\begin{aligned} |g_1(y,s,\sigma )|\le cK\lambda g_3(x_0,t_0,\sigma ), \end{aligned}$$

whenever \((y,s,\sigma )\in W_\lambda (x_0,t_0)\). Hence,

$$\begin{aligned} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x_0,t_0)}|g_1(y,s,\sigma )|^2\, dydsd\sigma \right) \le c\lambda ^2\int _{\lambda /8}^{2\lambda }| g_3(x_0,t_0,\sigma )|^2\, d\sigma . \end{aligned}$$

To estimate the right hand side in the last display, let \((y,\tau )\) be such that \( |y-x_0|\le 4\sigma \), \( |\tau -t_0|\le (4K\sigma )^2\). Let \(Q\subset \mathbb R^{n+1}\) be a parabolic cube centered at \((x_0,t_0)\) and with side length \(16K\sigma \). Then, for K large enough we see that

$$\begin{aligned} |\lambda \partial _\tau ({\mathcal {S}}_\sigma f)(y,\tau )|\le & {} \lambda |\partial _\tau {\mathcal {S}}_\sigma (f1_{2Q})(y,\tau )|\nonumber \\&+\,\lambda |\partial _\tau {\mathcal {S}}_\sigma ( f1_{\mathbb R^{n+1}{\setminus } 2Q})(y,\tau )-\partial _\tau {\mathcal {S}}_\sigma (f1_{\mathbb R^{n+1}{\setminus } 2Q})(x_0,t_0)|\nonumber \\&+\,\lambda |\partial _\tau {\mathcal {S}}_\sigma (f1_{\mathbb R^{n+1}{\setminus } 2Q})(x_0,t_0)|. \end{aligned}$$

Basically repeating the proof of (4.1) we see that

$$\begin{aligned} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x_0,t_0)}|g_1(y,s,\sigma )|^2\, dydsd\sigma \right) ^{1/2}\le c M(f)(x_0,t_0). \end{aligned}$$
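
The pointwise bound on \(g_1\) in terms of \(g_3\) can be sketched as follows, writing \(u(t):=({\mathcal {S}}_\sigma f)(y,t)\) as a shorthand used only here. Because the kernel \(\text{ sgn }(s-t)|s-t|^{-3/2}\) is odd about s, its integral over the symmetric region \(\epsilon \le |s-t|<(K\sigma )^2\) vanishes, so u(t) may be replaced by \(u(t)-u(s)\):

$$\begin{aligned} \left| \int _{\epsilon \le |s-t|<(K\sigma )^2}\frac{\text{ sgn }(s-t)}{|s-t|^{3/2}}u(t)\, dt\right|&=\left| \int _{\epsilon \le |s-t|<(K\sigma )^2}\frac{\text{ sgn }(s-t)}{|s-t|^{3/2}}(u(t)-u(s))\, dt\right| \nonumber \\&\le \sup _{\tau }|\partial _\tau u(\tau )|\int _{|s-t|<(K\sigma )^2}\frac{dt}{|s-t|^{1/2}}=4K\sigma \sup _{\tau }|\partial _\tau u(\tau )|, \end{aligned}$$

where the supremum is taken over \(|\tau -s|\le (K\sigma )^2\). For \((y,s,\sigma )\in W_\lambda (x_0,t_0)\) we have \(\lambda /2<\sigma <3\lambda /2\), and, assuming that the time extent of \(Q_\lambda (x_0,t_0)\) is comparable to \(\lambda ^2\), one checks that for K large the relevant points \((y,\tau )\) lie in the set over which the supremum defining \(g_3\) is taken. This yields \(|g_1(y,s,\sigma )|\le cK\lambda g_3(x_0,t_0,\sigma )\).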

To estimate \(g_2(y,s,\sigma )\), whenever \((y,s,\sigma )\in W_\lambda (x_0,t_0)\), we introduce the function

$$\begin{aligned} g_4(\bar{y},\bar{s},\sigma ):=\lim _{\epsilon \rightarrow 0}\int _{(K\sigma )^2\le |t-\bar{s}|<1/\epsilon }\frac{\text{ sgn } (\bar{s}-t)}{|\bar{s}-t|^{3/2}}({\mathcal {S}}_{\delta /4} f)(\bar{y},t)\, dt. \end{aligned}$$

Now

$$\begin{aligned} |g_2(y,s,\sigma )-g_4(x_0,t_0,\sigma )|\le & {} |g_2(y,s,\sigma )-g_2(x_0,s,\sigma )|\nonumber \\&+\,|g_2(x_0,s,\sigma )-g_2(x_0,t_0,\sigma )|\nonumber \\&+\,|g_2(x_0,t_0,\sigma )-g_4(x_0,t_0,\sigma )|. \end{aligned}$$

In particular,

$$\begin{aligned} |g_2(y,s,\sigma )-g_4(x_0,t_0,\sigma )|\le & {} \int _{(K\sigma )^2\le |s-t|}\frac{|{\mathcal {S}}_\sigma f(y,t)-{\mathcal {S}}_\sigma f(x_0,t)|}{|t-s|^{3/2}}\, dt\nonumber \\&+\,\int _{(K\sigma )^2\le |\xi |}\frac{|{\mathcal {S}}_\sigma f(x_0,\xi +s)-{\mathcal {S}}_\sigma f(x_0,\xi +t_0)|}{|\xi |^{3/2}}\, d\xi \nonumber \\&+\,\int _{(K\sigma )^2\le |t-t_0|}\frac{|{\mathcal {S}}_\sigma f(x_0,t)-{\mathcal {S}}_{\delta /4} f(x_0,t)|}{|t_0-t|^{3/2}}\, dt\nonumber \\=: & {} h_1(y,s,\sigma )+h_2(y,s,\sigma )+h_3(x_0,t_0,\sigma ). \end{aligned}$$

We note that

$$\begin{aligned} h_2(y,s,\sigma )\le & {} c\sigma ^2\int _{(K\sigma )^2\le |\xi |}\frac{N_*(\partial _t{\mathcal {S}}_\sigma f)(x_0,\xi +t_0)}{|\xi |^{3/2}}\, d\xi \nonumber \\\le & {} c\sigma \int _{(K\sigma )^2\le |\xi |}\frac{M(f)(x_0,\xi +t_0)}{|\xi |^{3/2}}\, d\xi \le c M^t(M(f)(x_0,\cdot ))(t_0), \end{aligned}$$

where \(M^t\) is the Hardy–Littlewood maximal operator in the t-variable, as we see by arguing as above. Similarly,

$$\begin{aligned} h_3(x_0,t_0,\sigma )\le c M^t(N_*(\partial _\lambda {\mathcal {S}}_\sigma f)(x_0,\cdot ))(t_0). \end{aligned}$$

We therefore focus on \(h_1(y,s,\sigma )\). Let

$$\begin{aligned} \tilde{h}_1(y,\sigma ):=\int _{\lambda ^2\le |t-t_0|}\frac{|{\mathcal {S}}_\sigma f(y,t)-{\mathcal {S}}_\sigma f(x_0,t)|}{|t-t_0|^{3/2}}\, dt. \end{aligned}$$

If K is large enough, then \(h_1(y,s,\sigma )\le c\tilde{h}_1(y,\sigma )\), whenever \((y,s,\sigma )\in W_\lambda (x_0,t_0)\). Hence we only have to estimate

$$\begin{aligned} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\hat{Q}_\lambda (x_0)\times I_{\lambda /2}(\lambda )}\tilde{h}_1^2\, dyd\sigma \right) ^{1/2}=\sup \left| \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\hat{Q}_\lambda (x_0)\times I_{\lambda /2}(\lambda )} \tilde{h}_1g\, dyd\sigma \right| , \end{aligned}$$

where \(\hat{Q}_\lambda (x_0)\subset \mathbb R^n\) now is a (non-parabolic) cube with side length \(\lambda \) and center \(x_0\), \(I_{\lambda /2}(\lambda )\) is the interval \((\lambda /2,3\lambda /2)\), and where the sup is taken with respect to all \(g\in C_0^\infty (\mathbb R^{n+1},\mathbb R)\) such that

$$\begin{aligned} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\hat{Q}_\lambda (x_0)\times I_{\lambda /2}(\lambda )}g^2\, dyd\sigma \right) ^{1/2}=1. \end{aligned}$$
(4.2)

Given g as in (4.2) we let

$$\begin{aligned} E:=\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\hat{Q}_\lambda (x_0)\times I_{\lambda /2}(\lambda )} \tilde{h}_1g\, dyd\sigma . \end{aligned}$$

Then

$$\begin{aligned} E= & {} \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\hat{Q}_\lambda (x_0)\times I_{\lambda /2}(\lambda )}\left( \int _{\lambda ^2\le |t-t_0|}\frac{|{\mathcal {S}}_\sigma f(y,t)- {\mathcal {S}}_\sigma f(x_0,t)|}{|t-t_0|^{3/2}}\, dt\right) g(y,\sigma )\, dyd\sigma \nonumber \\\le & {} c\sum _{j=0}^\infty (\lambda ^22^j)^{-3/2}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{\hat{Q}_\lambda (x_0)\times I_{\lambda /2}(\lambda )}\left( \int _{I_j}{|{\mathcal {S}}_\sigma f(y,t)- {\mathcal {S}}_\sigma f(x_0,t)|}\, dt\right) g(y,\sigma )\, dyd\sigma , \end{aligned}$$

where \(I_j=\{t:\ \lambda ^22^j\le |t-t_0|<\lambda ^22^{j+1}\}\). Let \(\eta \in (-\lambda ^2/100,\lambda ^2/100)\) be a degree of freedom. Given any integer \(i\in \{2^{j-1},\ldots ,2^{j+3}\}\) we let \(t_{j,i}^\pm =t_0\pm i\lambda ^2\), \(N_j=(2^{j+3}-2^{j-1}+1)\). Given \(\eta \) we let \(I_{j,i}(t_{j,i}^\pm +\eta ,\lambda ^2)\) be the interval centered at \(t_{j,i}^\pm +\eta \) and of length \(2\lambda ^2\). Then \(\{I_{j,i}(t_{j,i}^\pm +\eta ,\lambda ^2)\}_i\) is, for each \(\eta \in (-\lambda ^2/100,\lambda ^2/100)\), a covering of \(I_j\) and \(\{I_{j,i}(t_{j,i}^\pm +\eta ,\lambda ^2/10^4)\}_i\) is a disjoint collection. Using this we see that \(|E|\) can be bounded from above by

$$\begin{aligned}&c\lambda ^2\sum _{j=0}^\infty (\lambda ^22^j)^{-3/2}\sum _{i=1}^{N_j}\mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x_0,t_{j,i}^\pm +\eta )}|{\mathcal {S}}_\sigma f(y,t)- {\mathcal {S}}_\sigma f(x_0,t)||g(y,\sigma )|\, dydtd\sigma \nonumber \\&\quad \le \ c\lambda ^3\sum _{j=0}^\infty (\lambda ^22^j)^{-3/2}\sum _{i=1}^{N_j}\tilde{N}_{**}(\nabla _{||}{\mathcal {S}}_\lambda f)(x_0,t_{j,i}^\pm +\eta ). \end{aligned}$$

This estimate holds uniformly with respect to \(\eta \in (-\lambda ^2/100,\lambda ^2/100)\). In particular, taking the average with respect to \(\eta \) we see that

$$\begin{aligned} |E|\le & {} c\lambda \sum _{j=0}^\infty (\lambda ^22^j)^{-3/2}\int _{\{t:\ \lambda ^22^{j-2}\le |t-t_0|<\lambda ^22^{j+4}\}}\tilde{N}_{**}(\nabla _{||}{\mathcal {S}}_\lambda f)(x_0,t)\, dt\nonumber \\\le & {} cM^t(\tilde{N}_{**}(\nabla _{||}{\mathcal {S}}_\lambda f)(x_0,\cdot ))(t_0). \end{aligned}$$
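
The passage to the maximal function in the last display is again a geometric-series computation. As a sketch, write \(G(t):=\tilde{N}_{**}(\nabla _{||}{\mathcal {S}}_\lambda f)(x_0,t)\), a shorthand introduced only here, and use that \(\int _{|t-t_0|<R}G(t)\, dt\le 2RM^t(G)(t_0)\) for every \(R>0\). With \(R=\lambda ^22^{j+4}\) this gives

$$\begin{aligned} \lambda \sum _{j=0}^\infty (\lambda ^22^j)^{-3/2}\int _{\{t:\ |t-t_0|<\lambda ^22^{j+4}\}}G(t)\, dt&\le \lambda \sum _{j=0}^\infty (\lambda ^22^j)^{-3/2}\, 2\lambda ^22^{j+4}M^t(G)(t_0)\nonumber \\&=32\sum _{j=0}^\infty 2^{-j/2}M^t(G)(t_0)\le cM^t(G)(t_0). \end{aligned}$$

The powers of \(\lambda \) cancel exactly, and the decay \(2^{-j/2}\) is what remains of the kernel decay \(|t-t_0|^{-3/2}\) after integrating over an interval of length comparable to \(\lambda ^22^j\).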

Putting the estimates together we can conclude, for \(\lambda >4\delta /3\), that

$$\begin{aligned} \left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x_0,t_0)}|H_tD_{1/2}^t{\mathcal {S}}_\sigma f(y,s)|^2\, dydsd\sigma \right) ^{1/2} \end{aligned}$$

is bounded by

$$\begin{aligned}&cM^t(\tilde{N}_{**}(\nabla _{||}{\mathcal {S}}_\lambda f)(x_0,\cdot ))(t_0)+c M^t(M(f)(x_0,\cdot ))(t_0)+cM^t(N_*(\partial _\lambda {\mathcal {S}}_\sigma f)(x_0,\cdot ))(t_0)\nonumber \\&\quad +\left( \mathop {\int \!\!\!~\!~\!\!\!\!\!\!-}\nolimits _{W_\lambda (x_0,t_0)}|g_4(x_0,t_0,\sigma )|^2\, dydsd\sigma \right) ^{1/2}, \end{aligned}$$

where \(M^t\) is the Hardy–Littlewood maximal operator in the t-variable and M is the parabolic Hardy–Littlewood maximal function. Hence, letting

$$\begin{aligned} \psi (x_0,t_0):=\sup _{\sigma >0}|g_4(x_0,t_0,\sigma )| \end{aligned}$$

we see that

$$\begin{aligned} ||\tilde{N}_*(1_{\lambda >2\delta }H_tD_{1/2}^t{\mathcal {S}}_\lambda f)||_2\le & {} c||f||_2+ c\bigl (||\tilde{N}_{**}(\nabla _{||} \mathcal {S}_\lambda f)||_2+||N_{**}(\partial _\lambda \mathcal {S}_\lambda f)||_2\bigr )+c||\psi ||_2 \end{aligned}$$

where the constant c is independent of \(\delta \). Hence, to complete the proof of (iii) it remains to estimate \(||\psi ||_2\). To do this we first recall that \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\). Hence, using Lemma 3.5 we know that \(\mathcal {S}_{\delta /4} f\in {\mathbb H}(\mathbb R^{n+1},\mathbb C)\cap L^2(\mathbb R^{n+1},\mathbb C)\). Using this it follows that

$$\begin{aligned} {\mathcal {S}}_{\delta /4}f(x,t)=cI_{1/2}^t(D_{1/2}^t{\mathcal {S}}_{\delta /4}f)(x,t)=c I_{1/2}^th(x,t), \end{aligned}$$

where \( I_{1/2}^t\) is the (fractional) Riesz operator in t defined on the Fourier transform side through the multiplier \(|\tau |^{-1/2}\) and \(h(x,t):=(D_{1/2}^t{\mathcal {S}}_{\delta /4}f)(x,t)\). Using this we see that

$$\begin{aligned}\psi (x_0,t_0)=c\sup _{\epsilon >0}|\tilde{V}_\epsilon h(x_0,t_0)|=:c\tilde{V}_*h(x_0,t_0),\end{aligned}$$

where \(V_\epsilon \) is defined on functions \(k\in L^2(\mathbb R,\mathbb R)\) by

$$\begin{aligned} V_\epsilon k(t)=\int _{\{|s-t|>\epsilon \}}\frac{\text{ sgn }(t-s) I_{1/2}^tk(s)}{|s-t|^{3/2}}\, ds, \end{aligned}$$

and \(\tilde{V}_\epsilon h(x,t)=V_\epsilon h(x,\cdot )\) evaluated at t. However, using this notation we can apply Lemma 2.27 in [21] and conclude that

$$\begin{aligned} ||\psi ||_2\le c||h||_2=c||D_{1/2}^t{\mathcal {S}}_{\delta /4}f||_2\le c\sup _{\lambda >0}||H_tD_{1/2}^t\mathcal {S}_\lambda f||_2. \end{aligned}$$

This completes the proof of Lemma 4.1 (iii). \(\square \)

4.2 Proof of Lemma 4.2

We first note, using Lemmas 2.1, 2.3 and induction, that it suffices to prove

$$\begin{aligned} \,\mathrm{(i^{\prime })}&|||\lambda \nabla \partial _\lambda \mathcal {S}_{\lambda }f|||\le c\Phi (f)+c||f||_2,\nonumber \\ \,\mathrm{(ii^{\prime })}&|||\lambda \partial _t\mathcal {S}_{\lambda }f|||\le c\Phi (f)+c||f||_2, \end{aligned}$$

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). To prove (i\(^{\prime }\)) it suffices to estimate \(|||\lambda \nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f|||\). Given \(\epsilon >0\) we let

$$\begin{aligned} A_1:= & {} -\frac{1}{2}\int _{\epsilon }^{1/\epsilon }\int _{\mathbb R^{n+1}}\nabla _{||}\partial _\lambda ^2 \mathcal {S}_{\lambda }f\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f}\, \lambda ^2{dxdtd\lambda },\nonumber \\ A_2:= & {} -\frac{1}{2}\int _{\epsilon }^{1/\epsilon }\int _{\mathbb R^{n+1}}\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f\cdot \overline{\nabla _{||}\partial _\lambda ^2 \mathcal {S}_{\lambda }f}\, \lambda ^2{dxdtd\lambda },\nonumber \\ A_3:= & {} \frac{1}{2}\int _{\mathbb R^{n+1}}\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f}\, \lambda ^2{dxdt}\biggl |_{\lambda =1/\epsilon },\nonumber \\ A_4:= & {} -\frac{1}{2}\int _{\mathbb R^{n+1}}\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f}\, \lambda ^2{dxdt}\biggl |_{\lambda =\epsilon }. \end{aligned}$$

Using partial integration with respect to \(\lambda \),

$$\begin{aligned} \int _{\epsilon }^{1/\epsilon }\int _{\mathbb R^{n+1}}\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f}\, \lambda {dxdtd\lambda }=A_1+A_2+A_3+A_4. \end{aligned}$$
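
The one-variable computation behind this identity can be sketched as follows, assuming enough regularity in \(\lambda \) to differentiate under the integral sign. Let \(\Phi _0(\lambda ):=\int _{\mathbb R^{n+1}}|\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f|^2\, dxdt\), a shorthand introduced only for this sketch. Then

$$\begin{aligned} \Phi _0'(\lambda )&=2\,\text{ Re }\int _{\mathbb R^{n+1}}\nabla _{||}\partial _\lambda ^2 \mathcal {S}_{\lambda }f\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f}\, dxdt,\nonumber \\ \int _{\epsilon }^{1/\epsilon }\Phi _0(\lambda )\lambda \, d\lambda&=\left[ \frac{\lambda ^2}{2}\Phi _0(\lambda )\right] _{\lambda =\epsilon }^{\lambda =1/\epsilon }-\frac{1}{2}\int _{\epsilon }^{1/\epsilon }\Phi _0'(\lambda )\lambda ^2\, d\lambda . \end{aligned}$$

The interior term is the sum of the two integrals over \((\epsilon ,1/\epsilon )\), and the bracket produces the two boundary terms at \(\lambda =1/\epsilon \) and \(\lambda =\epsilon \). Only the absolute values of the four terms enter the estimate that follows, so the numerical factors and signs of the boundary terms play no role there.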

Furthermore, using Lemma 3.4 (ii),

$$\begin{aligned} |A_1|+|A_2|+|A_3|+|A_4|\le c|||\lambda ^2 \nabla _{||}\partial _\lambda ^2 \mathcal {S}_{\lambda }f|||^2+c||f||_2^2, \end{aligned}$$

with c independent of \(\epsilon \). Hence

$$\begin{aligned} |||\lambda \nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f|||^2= & {} \lim _{\epsilon \rightarrow 0}\int _{\epsilon }^{1/\epsilon }\int _{\mathbb R^{n+1}}\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {S}_{\lambda }f}\, \lambda {dxdtd\lambda }\nonumber \\\le & {} c|||\lambda ^2 \nabla _{||}\partial _\lambda ^2 \mathcal {S}_{\lambda }f|||^2+c||f||_2^2. \end{aligned}$$
(4.3)

(i\(^{\prime }\)) now follows from an application of Lemma 2.1. To prove (ii\(^{\prime }\)) we first introduce, for \(\epsilon >0\),

$$\begin{aligned} B_1:= & {} -\frac{1}{2}\int _{\epsilon }^{1/\epsilon }\int _{\mathbb R^{n+1}}\partial _t\partial _\lambda \mathcal {S}_{\lambda }f \overline{\partial _t \mathcal {S}_{\lambda }f}\, \lambda ^2{dxdtd\lambda },\nonumber \\ B_2:= & {} -\frac{1}{2}\int _{\epsilon }^{1/\epsilon }\int _{\mathbb R^{n+1}}\partial _t\mathcal {S}_{\lambda }f \overline{\partial _t \partial _\lambda \mathcal {S}_{\lambda }f}\, \lambda ^2{dxdtd\lambda },\nonumber \\ B_3:= & {} \frac{1}{2}\int _{\mathbb R^{n+1}}\partial _t \mathcal {S}_{\lambda }f \overline{\partial _t \mathcal {S}_{\lambda }f}\, \lambda ^2{dxdt}\biggl |_{\lambda =1/\epsilon },\nonumber \\ B_4:= & {} -\frac{1}{2}\int _{\mathbb R^{n+1}}\partial _t \mathcal {S}_{\lambda }f \overline{\partial _t \mathcal {S}_{\lambda }f}\, \lambda ^2{dxdt}\biggl |_{\lambda =\epsilon }. \end{aligned}$$

Then, using Lemma 3.4 (iii),

$$\begin{aligned} |B_1|+|B_2|+|B_3|+|B_4|\le c|||\lambda ^2\partial _t\partial _\lambda \mathcal {S}_{\lambda }f|||^2+c||f||_2^2, \end{aligned}$$

with c independent of \(\epsilon \). Hence, again by integration by parts with respect to \(\lambda \),

$$\begin{aligned} |||\lambda \partial _t\mathcal {S}_{\lambda }f|||^2= & {} \lim _{\epsilon \rightarrow 0}\int _{\epsilon }^{1/\epsilon }\int _{\mathbb R^{n+1}}\partial _t \mathcal {S}_{\lambda }f \overline{\partial _t \mathcal {S}_{\lambda }f}\, \lambda {dxdtd\lambda }\nonumber \\\le & {} c|||\lambda ^2\partial _t\partial _\lambda \mathcal {S}_{\lambda }f|||^2+c||f||_2^2. \end{aligned}$$
(4.4)

Furthermore, repeating the above argument it also follows that

$$\begin{aligned} |||\lambda ^2\partial _t\partial _\lambda \mathcal {S}_{\lambda }f|||^2\le c|||\lambda ^3\partial _t\partial _\lambda ^2\mathcal {S}_{\lambda }f|||^2+c||f||_2^2. \end{aligned}$$

Finally, using Lemma 2.3 we can combine the above estimates and conclude that

$$\begin{aligned} |||\lambda \partial _t\mathcal {S}_{\lambda }f|||\le c\Phi (f)+c||f||_2. \end{aligned}$$

This completes the proof of (ii\(^{\prime }\)) and hence the proof of Lemma 4.2.

5 Resolvents, square functions and Carleson measures

In the following we collect some of the main results from [32] to be used in the proof of our main results. Throughout the section we assume that \(\mathcal {H}\), \(\mathcal {H}^*\) satisfy (1.2) and (1.3). We let

$$\begin{aligned} \mathcal {L}_{||}:=-\mathop {{\text {div}}}\nolimits _{||} A_{||}\nabla _{||}, \end{aligned}$$

where \(\mathop {{\text {div}}}\nolimits _{||}\) is the divergence operator in the variables \((x_1,\ldots ,x_n)\) and \(A_{||}\) is the \(n\times n\)-dimensional submatrix of A defined by \(\{A_{i,j}\}_{i,j=1}^n\). We also let

$$\begin{aligned} \mathcal {H}_{||}:=\partial _t+\mathcal {L}_{||},\quad \mathcal {H}_{||}^*:=-\partial _t+\mathcal {L}_{||}^*. \end{aligned}$$

Using this notation the equation \(\mathcal {H} u=0\) can be written, formally, as

$$\begin{aligned} \mathcal {H}_{||}u-\sum _{j=1}^{n+1}A_{n+1,j}D_{n+1}D_ju-\sum _{i=1}^{n}D_i(A_{i,n+1}D_{n+1}u)=0. \end{aligned}$$
(5.1)
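
The formal computation leading to (5.1) can be sketched as follows, with \(D_i=\partial _{x_i}\). Splitting the double sum in \(\mathcal {L}\) according to whether the outer index is at most n or equal to \(n+1\), and using that A is independent of \(x_{n+1}\) by (1.3), we get

$$\begin{aligned} -\sum _{i,j=1}^{n+1}D_i(A_{i,j}D_ju)&=-\sum _{i,j=1}^{n}D_i(A_{i,j}D_ju)-\sum _{i=1}^{n}D_i(A_{i,n+1}D_{n+1}u)-\sum _{j=1}^{n+1}D_{n+1}(A_{n+1,j}D_ju)\nonumber \\&=\mathcal {L}_{||}u-\sum _{i=1}^{n}D_i(A_{i,n+1}D_{n+1}u)-\sum _{j=1}^{n+1}A_{n+1,j}D_{n+1}D_ju. \end{aligned}$$

Adding \(\partial _tu\) to both sides and recalling that \(\mathcal {H}_{||}=\partial _t+\mathcal {L}_{||}\) gives the left hand side of (5.1). The computation is formal because A is only assumed to be bounded and measurable in \((x_1,\ldots ,x_n)\).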

In the proof of Lemma 6.1 below we will use that (5.1) holds in an appropriate weak sense on cross sections \(\lambda =\) constant. Indeed, let \(\lambda \in (a,b)\) and let \(\epsilon <\min (\lambda -a,b-\lambda )\). Set \(\varphi _\epsilon (\sigma )=\epsilon ^{-1}\varphi (\sigma /\epsilon )\) where \(\varphi \in C_0^\infty (-1/2,1/2)\), \(0\le \varphi \), \(\int \varphi \, d\sigma =1\). We let \(\phi _{\lambda ,\epsilon }(x,t, \sigma )=\psi (x,t)\varphi _\epsilon (\sigma )\) where \(\psi \in C_0^\infty (\mathbb R^{n+1},\mathbb C)\). Then, by the notion of weak solutions we have

$$\begin{aligned}&\int _{\mathbb R^{n+2}} \left( A_{||}(x)\nabla _{||} u(x,t, \sigma )\cdot \nabla _{||}\overline{\phi _{\lambda ,\epsilon }(x,t, \sigma )}-u(x,t, \sigma )\partial _t\overline{\phi _{\lambda ,\epsilon }(x,t, \sigma )}\right) \, dxdtd\sigma \nonumber \\&\quad =\sum _{j=1}^{n+1}\int _{\mathbb R^{n+2}} A_{n+1,j}(x)\partial _{x_j}\partial _\lambda u(x,t, \sigma )\overline{\phi _{\lambda ,\epsilon }(x,t, \sigma )}\, dxdtd\sigma \nonumber \\&\quad \quad -\,\sum _{i=1}^{n}\int _{\mathbb R^{n+2}} A_{i,n+1}(x)\partial _\lambda u(x,t, \sigma )\partial _{x_i}\overline{\phi _{\lambda ,\epsilon }(x,t, \sigma )}\, dxdtd\sigma . \end{aligned}$$
(5.2)

Hence, if

$$\begin{aligned} \nabla u,\ \nabla \partial _\lambda u\in L^2(\mathbb R^{n+1},\mathbb C^{n+1}), \end{aligned}$$
(5.3)

uniformly in \(\lambda \in (a,b)\), with norms depending continuously on \(\lambda \in (a,b)\), then we can conclude, by letting \(\epsilon \rightarrow 0\) in (5.2), that

$$\begin{aligned}&\int _{\mathbb R^{n+1}} \biggl ( A_{||}(x)\nabla _{||} u(x,t, \lambda )\cdot \nabla _{||}\overline{\psi (x,t)}-u(x,t, \lambda )\partial _t\overline{\psi (x,t)}\biggr )\, dxdt\nonumber \\&\quad =\sum _{j=1}^{n+1}\int _{\mathbb R^{n+1}} A_{n+1,j}(x)\partial _{x_j}\partial _\lambda u(x,t, \lambda )\overline{\psi (x,t)}\, dxdt\nonumber \\&\quad \quad -\,\sum _{i=1}^{n}\int _{\mathbb R^{n+1}} A_{i,n+1}(x)\partial _\lambda u(x,t, \lambda )\partial _{x_i}\overline{\psi (x,t)}\, dxdt. \end{aligned}$$
(5.4)

In this sense, and under these assumptions, (5.1) holds on cross sections \(\lambda =\) constant.

5.1 Resolvents and a parabolic Hodge decomposition associated to \(\mathcal {H}_{||}\)

Recall the function space \(\mathbb H={\mathbb H}(\mathbb R^{n+1},\mathbb C)\) introduced in (2.1). In the following we will assume, to ensure a Hilbertian structure, that this space is equipped with the equivalent seminorm stated on the right-hand side of (2.2) (i). We let \(\mathbb H^*={\mathbb H}^*(\mathbb R^{n+1},\mathbb C)\) be the space dual to \(\mathbb H\), with norm \(||\cdot ||_{\mathbb H^*}\), and we let \(\langle \cdot ,\cdot \rangle _{\mathbb H^*}:\mathbb H^*\times \mathbb H\rightarrow \mathbb C \) denote the duality pairing. We let \(\bar{\mathbb H}=\bar{\mathbb H}(\mathbb R^{n+1},\mathbb C)\) be the closure of \(C_0^\infty (\mathbb R^{n+1},\mathbb C)\) with respect to the norm

$$\begin{aligned} \Vert f\Vert _{\bar{\mathbb H}}:=\Vert f\Vert _{\mathbb H}+\Vert f\Vert _2. \end{aligned}$$

We let \(\bar{\mathbb H}^*=\bar{\mathbb H}^*(\mathbb R^{n+1},\mathbb C)\) be the space dual to \(\bar{\mathbb H}\), with norm \(||\cdot ||_{\bar{\mathbb H}^*}\), and we let \(\langle \cdot ,\cdot \rangle _{\bar{\mathbb H}^*}:\bar{\mathbb H}^*\times \bar{\mathbb H}\rightarrow \mathbb C \) denote the duality pairing. Let \(B:\mathbb H\times \mathbb H\rightarrow \mathbb C\) be defined as

$$\begin{aligned} B(u,\phi ):= \int _{\mathbb R^{n+1}} (A_{||}\nabla _{||} u\cdot \nabla _{||}\bar{\phi }-D_{1/2}^tu\overline{H_tD_{1/2}^t\phi })\, dxdt, \end{aligned}$$
(5.5)

and let, for \(\delta \in (0,1)\), \( B_\delta :\mathbb H\times \mathbb H\rightarrow \mathbb C\) be defined as

$$\begin{aligned} B_\delta (u,\phi ):= & {} \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} u\cdot \overline{\nabla _{||}(I+\delta H_t)\phi }\, dxdt\nonumber \\&-\,\int _{\mathbb R^{n+1}}D_{1/2}^tu\overline{H_tD_{1/2}^t(I+\delta H_t)\phi }\, dxdt. \end{aligned}$$
(5.6)

Definition 5.1

Let \(F\in {\mathbb H}^*(\mathbb R^{n+1},\mathbb C)\). We say that a function \(u\in {\mathbb H}(\mathbb R^{n+1},\mathbb C)\) is a (weak) solution to the equation \(\mathcal {H}_{||}u=F\), in \(\mathbb R^{n+1}\), if

$$\begin{aligned} B(u,\phi )=\langle F,\phi \rangle _{\mathbb H^*}, \end{aligned}$$

whenever \(\phi \in {\mathbb H}(\mathbb R^{n+1},\mathbb C)\).

Definition 5.2

Let \(\lambda >0\) be given. Let \(F\in \bar{\mathbb H}^*(\mathbb R^{n+1},\mathbb C)\). We say that a function \(u\in \bar{\mathbb {H}}(\mathbb R^{n+1},\mathbb C)\) is a (weak) solution to the equation \(u+\lambda ^2\mathcal {H}_{||}u=F\), in \(\mathbb R^{n+1}\), if

$$\begin{aligned} \int _{\mathbb R^{n+1}}u\bar{\phi }\, dxdt+\lambda ^2 B(u,\phi )=\langle F,\phi \rangle _{\bar{\mathbb H}^*}, \end{aligned}$$

whenever \(\phi \in \bar{\mathbb H}(\mathbb R^{n+1},\mathbb C)\).

Lemma 5.3

Consider the operator \(\mathcal {H}_{||}=\partial _t-\mathop {{\text {div}}}\nolimits _{||} A_{||}\nabla _{||}\) and assume that A satisfies (1.2), (1.3). Let \(F\in {\mathbb H}^*(\mathbb R^{n+1},\mathbb C)\). Then there exists a weak solution to the equation \(\mathcal {H}_{||}u=F\), in \(\mathbb R^{n+1}\), in the sense of Definition 5.1. Furthermore,

$$\begin{aligned} ||u||_{\mathbb H}\le c||F||_{\mathbb H^*}, \end{aligned}$$

for some constant c depending only on n and \(\Lambda \). The solution is unique up to a constant.

Proof

This is essentially Lemma 2.6 in [32]. Let \(\phi _\delta :=(I+\delta H_t)\phi \), \(\phi \in {\mathbb H}(\mathbb R^{n+1},\mathbb C)\), \(\delta \in (0,1)\). Then

$$\begin{aligned} |\langle F,\phi _\delta \rangle _{\mathbb H^*}|\le c||F||_{\mathbb H^*}||\phi ||_{\mathbb H}. \end{aligned}$$

Consider the sesquilinear form \(B_\delta (\cdot ,\cdot )\) introduced in (5.6). If \(\delta =\delta (n,\Lambda )\) is small enough, then \(B_\delta (\cdot ,\cdot )\) is bounded and coercive on \(\mathbb H\times \mathbb H\). Hence, using the Lax–Milgram theorem we see that there exists a unique \(u\in {\mathbb H}\) such that

$$\begin{aligned} B(u,\phi _\delta )= B_\delta (u,\phi )=\langle F,\phi _\delta \rangle _{\mathbb H^*}, \end{aligned}$$

for all \(\phi \in \mathbb H\). Using that \((I+\delta H_t)\) is invertible on \(\mathbb H\), if \(0<\delta \ll 1\) is small enough, we can conclude that

$$\begin{aligned} B(u,\psi )= \langle F,\psi \rangle _{\mathbb H^*}, \end{aligned}$$

whenever \(\psi \in {\mathbb H}\). The bound \(||u||_{\mathbb H}\le c||F||_{\mathbb H^*}\) follows readily. This completes the existence and quantitative part of the lemma. The statement concerning uniqueness follows immediately. \(\square \)
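
The coercivity of \(B_\delta \) used in the proof can be sketched as follows. The sketch assumes the normalization in which \(H_t\) and \(D_{1/2}^t\) are Fourier multipliers in t with symbols \(i\,\text{ sgn }(\tau )\) and \(|\tau |^{1/2}\), so that \(H_t\) commutes with \(\nabla _{||}\) and \(D_{1/2}^t\), \(||H_tg||_2=||g||_2\), \(H_t^2=-I\) and \(\text{ Re }\int g\overline{H_tg}\, dxdt=0\); if the paper uses a different normalization the constants change accordingly. Expanding (5.6) with \(\phi =u\) and using (1.2) for \(A_{||}\) (take \(\xi _{n+1}=\zeta _{n+1}=0\)), we get

$$\begin{aligned} \text{ Re }\, B_\delta (u,u)&=\text{ Re }\int _{\mathbb R^{n+1}}A_{||}\nabla _{||}u\cdot \overline{\nabla _{||}u}\, dxdt+\delta \,\text{ Re }\int _{\mathbb R^{n+1}}A_{||}\nabla _{||}u\cdot \overline{\nabla _{||}H_tu}\, dxdt\nonumber \\&\quad -\,\text{ Re }\int _{\mathbb R^{n+1}}D_{1/2}^tu\overline{H_tD_{1/2}^tu}\, dxdt-\delta \,\text{ Re }\int _{\mathbb R^{n+1}}D_{1/2}^tu\overline{H_t^2D_{1/2}^tu}\, dxdt\nonumber \\&\ge (\Lambda ^{-1}-\delta \Lambda )||\nabla _{||}u||_2^2+\delta ||D_{1/2}^tu||_2^2. \end{aligned}$$

With the illustrative choice \(\delta =1/(2\Lambda ^2)\) the right hand side is at least \((2\Lambda )^{-1}||\nabla _{||}u||_2^2+(2\Lambda ^2)^{-1}||D_{1/2}^tu||_2^2\), which is the kind of lower bound the Lax–Milgram theorem requires. The role of the perturbation \(\delta H_t\) is visible in the last term: without it the half-order time derivative contributes nothing to the real part.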

Lemma 5.4

Let \(\lambda >0\) be given. Consider the operator \(\mathcal {H}_{||}=\partial _t-\mathop {{\text {div}}}\nolimits _{||} A_{||}\nabla _{||}\) and assume that A satisfies (1.2), (1.3). Let \(F\in \bar{\mathbb H}^*(\mathbb R^{n+1},\mathbb C)\). Then there exists a weak solution to the equation \(u+\lambda ^2\mathcal {H}_{||}u=F\), in \(\mathbb R^{n+1}\), in the sense of Definition 5.2. Furthermore,

$$\begin{aligned} ||u||_{2}+||\lambda \nabla _{||} u||_{2}+||\lambda D_{1/2}^tu||_{2}\le c||F||_{\bar{\mathbb H}^*}, \end{aligned}$$

for some constant c depending only on n and \(\Lambda \). The solution is unique.

Proof

See the proof of Lemma 2.7 in [32]. \(\square \)

Remark 5.5

Definitions 5.1 and 5.2, as well as Lemmas 5.3 and 5.4, all have analogous formulations for the operator \(\mathcal {H}_{||}^*\).

Remark 5.6

Let \(\lambda >0\) be given. Consider the operator \(\mathcal {H}_{||}=\partial _t-\mathop {{\text {div}}}\nolimits _{||} A_{||}\nabla _{||}\). Let \(F\in \bar{\mathbb H}^*(\mathbb R^{n+1},\mathbb C)\). By Lemma 5.4 the equation \(u+\lambda ^2\mathcal {H}_{||}u=F\) has a unique weak solution \(u\in \bar{\mathbb H}\). From now on we will denote this solution by \( \mathcal {E}_\lambda F\). In the case of the operator \(\mathcal {H}_{||}^*\) we denote the corresponding solution by \( \mathcal {E}_\lambda ^*F\). In this sense \(\mathcal {E}_\lambda =(I+\lambda ^2\mathcal {H}_{||})^{-1}\) and \(\mathcal {E}_\lambda ^*=(I+\lambda ^2\mathcal {H}_{||}^*)^{-1}\).

Consider \(\lambda >0\) fixed, let \(|h|\ll \lambda \) and consider \(F\in \bar{\mathbb H}^*(\mathbb R^{n+1},\mathbb C)\). By definition,

$$\begin{aligned}&\int _{\mathbb R^{n+1}}\mathcal {E}_{\lambda +h}F\bar{\phi }\, dxdt+(\lambda +h)^2 B(\mathcal {E}_{\lambda +h}F,\phi )=\langle F,\phi \rangle _{\bar{\mathbb H}^*},\nonumber \\&\int _{\mathbb R^{n+1}}\mathcal {E}_{\lambda }F\bar{\phi }\, dxdt+\lambda ^2 B(\mathcal {E}_{\lambda }F,\phi )=\langle F,\phi \rangle _{\bar{\mathbb H}^*}, \end{aligned}$$
(5.7)

for all \(\phi \in \bar{\mathbb H}(\mathbb R^{n+1},\mathbb C)\). We let \(\mathcal {D}_{\lambda }^hF:=\mathcal {E}_{\lambda +h}F-\mathcal {E}_{\lambda }F\). (5.7) implies

$$\begin{aligned} \int _{\mathbb R^{n+1}}\mathcal {D}_{\lambda }^hF\bar{\phi }_\delta \, dxdt+\lambda ^2 B(\mathcal {D}_{\lambda }^hF,\phi _\delta )=-h(2\lambda +h)B(\mathcal {E}_{\lambda +h}F,\phi _\delta ) \end{aligned}$$
(5.8)

for all \(\phi \in \bar{\mathbb H}(\mathbb R^{n+1},\mathbb C)\), \(\phi _\delta :=(I+\delta H_t)\phi \). Again, arguing as in the proof of Lemma 5.4 we see, if \(\delta =\delta (n,\Lambda )\), \(0<\delta \ll 1\) is small enough and as \(\mathcal {D}_{\lambda }^hF\in \bar{\mathbb H}(\mathbb R^{n+1},\mathbb C)\), that

$$\begin{aligned} ||\mathcal {D}_{\lambda }^hF||_{2}+||\lambda \nabla _{||} \mathcal {D}_{\lambda }^hF||_{2}+||\lambda D_{1/2}^t\mathcal {D}_{\lambda }^hF||_{2}\le c|h|||\mathcal {E}_{\lambda +h}F||_2\le c|h|||F||_{\bar{\mathbb H}^*}, \end{aligned}$$
(5.9)

where c is independent of h. Hence

$$\begin{aligned} \lim _{h\rightarrow 0}\mathcal {D}_{\lambda }^hF=\lim _{h\rightarrow 0}\bigl (\mathcal {E}_{\lambda +h}F-\mathcal {E}_{\lambda }F\bigr )=0 \end{aligned}$$
(5.10)

in the sense that

$$\begin{aligned} ||\mathcal {D}_{\lambda }^hF||_{2}+||\lambda \nabla _{||} \mathcal {D}_{\lambda }^hF||_{2}+||\lambda D_{1/2}^t\mathcal {D}_{\lambda }^hF||_{2}\rightarrow 0\quad \text{ as } \; h\rightarrow 0. \end{aligned}$$
(5.11)

Similarly,

$$\begin{aligned} \int _{\mathbb R^{n+1}}h^{-1}\mathcal {D}_{\lambda }^hF\bar{\phi }_\delta \, dxdt+\lambda ^2 B(h^{-1}\mathcal {D}_{\lambda }^hF,\phi _\delta )=-(2\lambda +h)B(\mathcal {E}_{\lambda +h}F,\phi _\delta ) \end{aligned}$$
(5.12)

and hence

$$\begin{aligned} ||h^{-1}\mathcal {D}_{\lambda }^hF||_{2}+||\lambda \nabla _{||} (h^{-1}\mathcal {D}_{\lambda }^hF)||_{2}+||\lambda D_{1/2}^t(h^{-1}\mathcal {D}_{\lambda }^hF)||_{2}\le c||F||_{\bar{\mathbb H}^*}, \end{aligned}$$
(5.13)

where c is independent of h. Using (5.13), (5.12) and (5.11) we see, as \(\lambda \) is fixed, that

$$\begin{aligned} \lim _{h\rightarrow 0}h^{-1}\mathcal {D}_{\lambda }^hF=:\mathcal {G}_\lambda F\; \text{ weakly } \text{ in } \;\bar{\mathbb H}(\mathbb R^{n+1},\mathbb C), \end{aligned}$$
(5.14)

that (5.13) holds with \(h^{-1}\mathcal {D}_{\lambda }^hF\) replaced by \(\mathcal {G}_\lambda F\) and that

$$\begin{aligned} \int _{\mathbb R^{n+1}}\mathcal {G}_\lambda F\bar{\phi }\, dxdt+\lambda ^2 B(\mathcal {G}_\lambda F,\phi )=-2\lambda B(\mathcal {E}_\lambda F,\phi )=-2\lambda \langle \mathcal {H}_{||}\mathcal {E}_\lambda F,\phi \rangle _{\bar{\mathbb H}^*} \end{aligned}$$
(5.15)

whenever \(\phi \in \bar{\mathbb H}(\mathbb R^{n+1},\mathbb C)\). We define

$$\begin{aligned} \partial _\lambda \mathcal {E}_\lambda F:=\mathcal {G}_\lambda F \end{aligned}$$
(5.16)

and hence

$$\begin{aligned} \partial _\lambda \mathcal {E}_\lambda F=-2\lambda \mathcal {E}_\lambda \mathcal {H}_{||}\mathcal {E}_\lambda F \end{aligned}$$
(5.17)

in the sense of (5.15). Furthermore, if \(F=f\in \mathbb H(\mathbb R^{n+1},\mathbb C)\) then

$$\begin{aligned} \langle \mathcal {H}_{||}\mathcal {E}_\lambda f,\phi \rangle _{\bar{\mathbb H}^*}-\langle \mathcal {E}_\lambda \mathcal {H}_{||} f,\phi \rangle _{\bar{\mathbb H}^*}= & {} \langle \mathcal {H}_{||}\mathcal {E}_\lambda f,\phi \rangle _{\bar{\mathbb H}^*}-\langle \mathcal {H}_{||} f,\mathcal {E}_\lambda ^*\phi \rangle _{\bar{\mathbb H}^*}\nonumber \\= & {} B(\mathcal {E}_\lambda f,\phi )-B(f,\mathcal {E}_\lambda ^*\phi )=0, \end{aligned}$$
(5.18)

and hence \(\mathcal {H}_{||}\) and \(\mathcal {E}_\lambda \) commute in this sense. Furthermore, as A is independent of t we can, by arguing similarly, conclude that if \(f\in \mathbb H(\mathbb R^{n+1},\mathbb C)\), then

$$\begin{aligned} \langle \partial _t\mathcal {E}_\lambda f,\phi \rangle _{\bar{\mathbb H}^*}-\langle \mathcal {E}_\lambda \partial _t f,\phi \rangle _{\bar{\mathbb H}^*}=0=\langle \mathcal {L}_{||}\mathcal {E}_\lambda f,\phi \rangle _{\bar{\mathbb H}^*}-\langle \mathcal {E}_\lambda \mathcal {L}_{||} f,\phi \rangle _{\bar{\mathbb H}^*} \end{aligned}$$
(5.19)

and hence \(\partial _t\) and \(\mathcal {E}_\lambda \), and \(\mathcal {L}_{||}\) and \(\mathcal {E}_\lambda \), commute in this sense. In particular, if \(F=f\in \mathbb H(\mathbb R^{n+1},\mathbb C)\) then

$$\begin{aligned} \partial _\lambda \mathcal {E}_\lambda f=-2\lambda \mathcal {E}_\lambda ^2 \mathcal {H}_{||} f \end{aligned}$$
(5.20)

in the sense of (5.15).
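
As a purely formal consistency check on (5.17) and (5.20), and not as a substitute for the weak formulation (5.15), one may differentiate the resolvent identity directly. The sketch below assumes, only for the purpose of this heuristic, that \(\mathcal {E}_\lambda \) can be treated as a differentiable operator-valued function of \(\lambda \):

$$\begin{aligned} (I+\lambda ^2\mathcal {H}_{||})\mathcal {E}_\lambda =I\quad \Longrightarrow \quad 2\lambda \mathcal {H}_{||}\mathcal {E}_\lambda +(I+\lambda ^2\mathcal {H}_{||})\partial _\lambda \mathcal {E}_\lambda =0\quad \Longrightarrow \quad \partial _\lambda \mathcal {E}_\lambda =-2\lambda \mathcal {E}_\lambda \mathcal {H}_{||}\mathcal {E}_\lambda . \end{aligned}$$

If, in addition, \(\mathcal {H}_{||}\) and \(\mathcal {E}_\lambda \) commute as in (5.18), the right-hand side becomes \(-2\lambda \mathcal {E}_\lambda ^2\mathcal {H}_{||}\), which agrees with (5.20).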

5.2 Estimates of resolvents

We here collect a set of estimates for \(\mathcal {E}_\lambda f\) and \(\mathcal {E}_\lambda ^*f\) to be used in the next section.

Lemma 5.7

Let \(\lambda >0\) be given. Consider the operator \(\mathcal {H}_{||}=\partial _t-\mathop {{\text {div}}}\nolimits _{||} A_{||}\nabla _{||}\) and assume that A satisfies (1.2), (1.3). Let \(\Theta _\lambda \) denote any of the operators

$$\begin{aligned} {\mathcal {E}_\lambda , \lambda \nabla _{||}\mathcal {E}_\lambda , \lambda D_{1/2}^t\mathcal {E}_\lambda }, \end{aligned}$$

or

$$\begin{aligned} {\lambda \mathcal {E}_\lambda D_{1/2}^t, \lambda ^2 \nabla _{||}\mathcal {E}_\lambda D_{1/2}^t, \lambda ^2 D_{1/2}^t\mathcal {E}_\lambda D_{1/2}^t}, \end{aligned}$$

and let \(\tilde{\Theta }_\lambda \) denote any of the operators

$$\begin{aligned} {\lambda \mathcal {E}_\lambda \mathop {{\text {div}}}\nolimits _{||}, \lambda ^2 \nabla _{||}\mathcal {E}_\lambda \mathop {{\text {div}}}\nolimits _{||}, \lambda ^2 D_{1/2}^t\mathcal {E}_\lambda \mathop {{\text {div}}}\nolimits _{||}}. \end{aligned}$$

Then there exists a constant c, depending only on n and \(\Lambda \), such that

$$\begin{aligned} \,\mathrm{(i)}&\int _{\mathbb R^{n+1}}\ |\Theta _\lambda f(x,t)|^2\, dxdt\le c\int _{\mathbb R^{n+1}}\ |f(x,t)|^2\, dxdt,\nonumber \\ \,\mathrm{(ii)}&\int _{\mathbb R^{n+1}}\ |\tilde{\Theta }_\lambda \mathbf{f}(x,t)|^2\, dxdt\le c\int _{\mathbb R^{n+1}}\ |\mathbf{f}(x,t)|^2\, dxdt, \end{aligned}$$

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\), \(\mathbf{f}\in L^2(\mathbb R^{n+1},\mathbb C^{n})\).

Proof

This is Lemma 2.11 in [32]. \(\square \)

Lemma 5.8

Let \(\lambda >0\) be given. Consider the operator \(\mathcal {H}_{||}=\partial _t-\mathop {{\text {div}}}\nolimits _{||} A_{||}\nabla _{||}\) and assume that A satisfies (1.2), (1.3). Let \(A_{n+1}^{||}:=(A_{1,n+1},\ldots ,A_{n,n+1})\),

$$\begin{aligned} \mathcal {U}_\lambda :=\lambda \mathcal {E}_\lambda \mathop {{\text {div}}}\nolimits _{||}, \end{aligned}$$

and let

$$\begin{aligned} \mathcal {R}_\lambda :=\mathcal {U}_\lambda A_{n+1}^{||}-(\mathcal {U}_\lambda A_{n+1}^{||})\mathcal {P}_\lambda , \end{aligned}$$

where \(\mathcal {P}_\lambda \) is a parabolic approximation of the identity. Then there exists a constant c, depending only on n, \(\Lambda \), such that

$$\begin{aligned} ||\mathcal {R}_\lambda f||_2\le c(||\lambda \nabla f||_2+||\lambda ^2 \partial _tf||_2), \end{aligned}$$

whenever \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\).

Proof

The lemma is a consequence of Lemma 2.27 in [32]. \(\square \)

Lemma 5.9

Let \(\lambda >0\) be given. Consider the operator \(\mathcal {H}_{||}=\partial _t-\mathop {{\text {div}}}\nolimits _{||} A_{||}\nabla _{||}\) and assume that A satisfies (1.2), (1.3). Let \(A_{n+1}^{||}:=(A_{1,n+1},\ldots ,A_{n,n+1})\),

$$\begin{aligned} \mathcal {U}_\lambda :=\lambda \mathcal {E}_\lambda \mathop {{\text {div}}}\nolimits _{||}, \end{aligned}$$

and consider \(\mathcal {U}_\lambda A_{n+1}^{||}\). Then there exists a constant c, depending only on n, \(\Lambda \), such that

$$\begin{aligned} \int _0^{l(Q)}\int _Q|\mathcal {U}_\lambda A_{n+1}^{||}|^2\frac{dxdtd\lambda }{\lambda }\le c|Q|, \end{aligned}$$

for all cubes \(Q\subset \mathbb R^{n+1}\).

Proof

This is Lemma 3.1 in [32]. \(\square \)

Remark 5.10

For the details of the proofs of Lemmas 5.8 and 5.9 we refer to [32]. We here simply note that for \(\lambda \) fixed, \((\mathcal {U}_\lambda A_{n+1}^{||})\) (and \(\mathcal {R}_\lambda 1\)) exists as an element in \(L^2_{\text{ loc }}(\mathbb R^{n+1},\mathbb C)\). Indeed, let \(Q_R\) be the parabolic cube on \(\mathbb R^{n+1}\) with center at (0, 0) and with size determined by R. Writing

$$\begin{aligned} \mathcal {U}_\lambda A_{n+1}^{||}=\mathcal {U}_\lambda A_{n+1}^{||}1_{2Q_R}+ \mathcal {U}_\lambda A_{n+1}^{||}1_{\mathbb R^{n+1}{\setminus } 2Q_R}, \end{aligned}$$

and using Lemma 5.7 we see that

$$\begin{aligned} ||\mathcal {U}_\lambda (A_{n+1}^{||}1_{2Q_R})1_{Q_R}||_2\le c||A||_\infty R^{(n+2)/2}. \end{aligned}$$

Furthermore, by the off-diagonal estimates for \(\mathcal {U}_\lambda \) proved in Lemma 2.17 in [32] it follows that also

$$\begin{aligned} ||\mathcal {U}_\lambda (A_{n+1}^{||}1_{\mathbb R^{n+1}{\setminus } 2Q_R})1_{Q_R}||_2\le c||A||_\infty R^{(n+2)/2}. \end{aligned}$$

Theorem 5.11

Consider the operators \(\mathcal {H}_{||}=\partial _t+\mathcal {L}_{||}=\partial _t-\mathop {{\text {div}}}\nolimits _{||} A_{||}\nabla _{||}\), \(\mathcal {H}_{||}^*=-\partial _t+\mathcal {L}_{||}^*=-\partial _t-\mathop {{\text {div}}}\nolimits _{||} A_{||}^*\nabla _{||}\), and assume that A satisfies (1.2), (1.3). Then there exists a constant c, \(1\le c<\infty \), depending only on n, \(\Lambda \), such that

$$\begin{aligned} |||\lambda \mathcal {E}_\lambda \mathcal {H}_{||}f|||+|||\lambda \mathcal {E}_\lambda ^*\mathcal {H}_{||}^*f|||\le c||\mathbb D f||_2, \end{aligned}$$
(5.21)

and

$$\begin{aligned} \,\mathrm{(i)}&|||\partial _\lambda \mathcal {E}_\lambda f|||+|||\partial _\lambda \mathcal {E}_\lambda ^*f|||\le c||\mathbb Df||_2,\nonumber \\ \,\mathrm{(ii)}&|||\lambda \partial _t\mathcal {E}_\lambda f|||+|||\lambda \partial _t\mathcal {E}_\lambda ^*f|||\le c||\mathbb Df||_2,\nonumber \\ \,\mathrm{(iii)}&|||\lambda \mathcal {E}_\lambda \mathcal {L}_{||} f|||+|||\lambda \mathcal {E}_\lambda ^*\mathcal {L}_{||}^*f|||\le c||\mathbb Df||_2,\nonumber \\ \,\mathrm{(iv)}&|||\lambda \mathcal {L}_{||}\mathcal {E}_\lambda f|||+|||\lambda \mathcal {L}_{||}^*\mathcal {E}_\lambda ^*f|||\le c||\mathbb Df||_2, \end{aligned}$$
(5.22)

whenever \(f\in \mathbb H(\mathbb R^{n+1},\mathbb C)\).

Proof

(5.21) is Theorem 1.17 in [32], (5.22) (i)–(iv) is Corollary 1.18 in [32]. However, as the proof of Corollary 1.18 in [32] is presented in a slightly formal manner we here include the proof of the inequalities in (5.22) clarifying details. We only supply the proof in the case of \( \mathcal {H}_{||}\). To prove (i) we note that \(\partial _\lambda \mathcal {E}_\lambda f\) is defined as in (5.16) and that we have, using (5.20), \(\partial _\lambda \mathcal {E}_\lambda f=-2\lambda \mathcal {E}_\lambda ^2 \mathcal {H}_{||} f\) in the sense of (5.15). Hence (i) follows from (5.21) and the \(L^2\)-boundedness of \(\mathcal {E}_\lambda \), Lemma 5.7. To prove (ii) we note that \(\partial _t\) and \(\mathcal {E}_\lambda \) commute in the sense discussed above, see (5.19), and that

$$\begin{aligned} \lambda \mathcal {E}_\lambda \partial _t f=\lambda \mathcal {E}_\lambda \mathcal {H}_{||} f-\lambda \mathcal {E}_\lambda \mathcal {L}_{||}f. \end{aligned}$$

Hence, using (5.21) we see that

$$\begin{aligned} |||\lambda \partial _t\mathcal {E}_\lambda f|||\le c||\mathbb Df||_2+|||\lambda \mathcal {E}_\lambda \mathcal {L}_{||}f|||. \end{aligned}$$

Therefore, to prove (ii) it suffices to prove (iii). To prove (iii), we let \( f\in \mathbb H(\mathbb R^{n+1},\mathbb C)\) and put \(g=A_{||}\nabla _{||} f\). Using Lemma 5.3 we then see that there exists a weak solution u to the equation

$$\begin{aligned} \mathop {{\text {div}}}\nolimits _{||}(g)= \mathcal {H}_{||}u\; \text{ such } \text{ that }\; ||u||_{\mathbb H}\le c||g||_2. \end{aligned}$$
(5.23)

In particular,

$$\begin{aligned} \lambda \mathcal {E}_\lambda \mathcal {L}_{||}f= \lambda \mathcal {E}_\lambda \mathcal {H}_{||}u. \end{aligned}$$
(5.24)

Hence, again using (5.21), now applied to u, we see that

$$\begin{aligned} |||\lambda \mathcal {E}_\lambda \mathcal {L}_{||}f|||\le c||\mathbb D u||_2. \end{aligned}$$
(5.25)

(iii) now follows by combining (5.23) and (5.25). To prove (iv) we simply note that \(\mathcal {L}_{||}\) and \(\mathcal {E}_\lambda \) commute in the sense of (5.19), and hence (iv) follows from the argument in (iii). This completes the proof of (5.22) (i)–(iv). \(\square \)
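
The passage from (5.20) to (5.22) (i) can be spelled out as follows. This is a sketch under the assumption, not restated in this section, that the triple-bar norm is \(|||g|||^2=\int _0^\infty \int _{\mathbb R^{n+1}}|g|^2\, dxdt\, d\lambda /\lambda \). For each fixed \(\lambda >0\), (5.20) and the \(L^2\)-boundedness of \(\mathcal {E}_\lambda \) from Lemma 5.7, with a constant independent of \(\lambda \), give

$$\begin{aligned} ||\partial _\lambda \mathcal {E}_\lambda f||_2=2||\mathcal {E}_\lambda (\lambda \mathcal {E}_\lambda \mathcal {H}_{||}f)||_2\le c||\lambda \mathcal {E}_\lambda \mathcal {H}_{||}f||_2. \end{aligned}$$

Squaring and integrating with respect to \(d\lambda /\lambda \) over \((0,\infty )\) then yields

$$\begin{aligned} |||\partial _\lambda \mathcal {E}_\lambda f|||\le c|||\lambda \mathcal {E}_\lambda \mathcal {H}_{||}f|||\le c||\mathbb Df||_2, \end{aligned}$$

where the last inequality is (5.21).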

5.3 Remark on the Kato problem for parabolic equations

In Section 5 in [32] implications of two of the results proved in [32], Theorem 1.17 and Theorem 1.19 in [32], for Kato square root problems related to the operator \(\partial _t+\mathcal {L}_{||}\) (in [32] this operator is denoted \(\partial _t+\mathcal {L}\)), as well as generalizations of these results to operators \(\partial _t-\mathop {{\text {div}}}\nolimits A(x,t)\nabla \), i.e., to operators with time-dependent coefficients, are discussed. The discussion in the section is essentially flawless but the author neglects to properly state that the Kato square root problem for the operator \(\partial _t+\mathcal {L}_{||}\) is in fact solved in [32]. Indeed, the core of the approach in [32] is the observation that \(\partial _t+\mathcal {L}_{||}\) can be realized as an operator \(\bar{\mathbb H}\rightarrow \bar{\mathbb H}^*\) via the sesquilinear form \(B(u,\psi )\) introduced in (5.5):

$$\begin{aligned} \langle (\partial _t+\mathcal {L}_{||}) u, \psi \rangle := B(u,\psi ),\ u,\psi \in \bar{\mathbb H}. \end{aligned}$$

By the arguments in [32] it follows, see also Lemma 5.4 above, that if \(\theta \in \mathbb C\) with \( \text{ Re } \theta > 0\), then

$$\begin{aligned} \theta +\partial _t+\mathcal {L}_{||}: \mathcal {D}(\partial _t+\mathcal {L}_{||}) \rightarrow L^2(\mathbb R^{n+1},\mathbb C) \end{aligned}$$

is bijective and the resolvent satisfies the estimate

$$\begin{aligned} \Vert (\theta + (\partial _t+\mathcal {L}_{||}))^{-1} f\Vert _2 \le \frac{1}{\text{ Re } \theta }\Vert f\Vert _2. \end{aligned}$$

In particular, \(\partial _t+\mathcal {L}_{||}\), with maximal domain \(\mathcal {D}(\partial _t+\mathcal {L}_{||}) = \{u \in \bar{\mathbb H} : (\partial _t+\mathcal {L}_{||}) u \in L^2(\mathbb R^{n+1},\mathbb C) \}\) in \(L^2(\mathbb R^{n+1},\mathbb C)\), is maximal accretive and, see also the discussion in Section 5 in [32], \(\partial _t+\mathcal {L}_{||}\) is sectorial and there is a square root \(\sqrt{\partial _t+\mathcal {L}_{||}}\) abstractly defined by functional calculus. Furthermore, \(\partial _t+\mathcal {L}_{||}\) has a bounded \(H^\infty \) calculus. This is another way of formulating the discussion in Section 5 in [32] up to display (5.4) in [32]. Furthermore, the inequality

$$\begin{aligned} ||\sqrt{\partial _t+\mathcal {L}_{||}}f||_2^2\le c\int _0^\infty \int _{\mathbb R^{n+1}}|(I+\lambda ^2(\partial _t+\mathcal {L}_{||}))^{-1}\lambda (\partial _t+\mathcal {L}_{||})f|^2\, \frac{dxdtd\lambda }{\lambda }, \end{aligned}$$
(5.26)

does hold for all \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb {C})\). In particular, the inequality in display (5.5) in [32] is valid and this was the only point left open in [32]. Based on this we can conclude, using the main result proved in [32], that there exists a constant c, \(1\le c<\infty \), depending only on n, \(\Lambda \), such that

$$\begin{aligned} c^{-1}||\mathbb D f||_2\le ||\sqrt{\partial _t+\mathcal {L}_{||}}f||_2\le c||\mathbb D f||_2, \end{aligned}$$
(5.27)

whenever \(f\in \bar{\mathbb H}\).

6 Estimates in parabolic Sobolev spaces

Throughout this section we assume that \(\mathcal {H}\), \(\mathcal {H}^*\) satisfy (1.2) and (1.3) as well as (2.6) and (2.7). Using the estimates established and stated in Sects. 4 and 5 we in this section prove the following three lemmas.

Lemma 6.1

Let \(\Phi (f)\) be defined as in (1.7). Assume that \(\Phi (f)<\infty \) whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). Then there exists a constant c, depending at most on n, \(\Lambda \), and the De Giorgi–Moser–Nash constants, such that

$$\begin{aligned} ||\nabla _{||} \mathcal {S}_{\lambda _0}f||_{2} \le c(\Phi (f)+||f||_2+||N_{**}(\partial _\lambda \mathcal {S}_{\lambda }f)||_2), \end{aligned}$$

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\), \(\lambda _0>0\).

Lemma 6.2

Let \(\Phi (f)\) be defined as in (1.7). Assume that \(\Phi (f)<\infty \) whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). Then there exists a constant c, depending at most on n, \(\Lambda \), and the De Giorgi–Moser–Nash constants, such that

$$\begin{aligned} ||\mathbb D_{n+1}\mathcal {S}_{\lambda _0}f||_{2} \le c(\Phi (f)+||f||_2), \end{aligned}$$

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\), \(\lambda _0> 0\).

Lemma 6.3

There exists a constant c, depending at most on n, such that

$$\begin{aligned} ||H_tD_{1/2}^t\mathcal {S}_{\lambda _0}f||_{2} \le c(||\mathbb D_{n+1}\mathcal {S}_{\lambda _0}f||_{2}+||\nabla _{||} \mathcal {S}_{\lambda _0}f||_{2}), \end{aligned}$$

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\), \(\lambda _0> 0\).

The proofs of Lemmas 6.1–6.3 are given below.

6.1 Proof of Lemma 6.1

Throughout the proof we can, without loss of generality, assume that \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\). Let \(\lambda _0>0\) be fixed. To prove the lemma it suffices to estimate

$$\begin{aligned} I:=\int _{\mathbb R^{n+1}}\bar{\mathbf{g}}\cdot \nabla _{||} {\mathcal {S}_{\lambda _0}f}\, dxdt, \end{aligned}$$

where \(\mathbf{g}\in C_0^\infty (\mathbb R^{n+1},\mathbb C^n)\) and \(||\mathbf{g}||_2=1\). Given \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\), we note, see Lemma 3.5, that \(\mathcal {S}_{\lambda _0}f\in {\mathbb H}(\mathbb R^{n+1},\mathbb C)\cap L^2(\mathbb R^{n+1},\mathbb C)\). Hence, using Lemma 5.3,

$$\begin{aligned} I=\int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {S}_{\lambda _0}f\cdot \overline{\nabla _{||}v}\, dxdt+\int _{\mathbb R^{n+1}}H_tD_{1/2}^t(\mathcal {S}_{\lambda _0}f)\overline{D_{1/2}^t(v)}\, dxdt, \end{aligned}$$

for a function \(v\in \mathbb H=\mathbb H(\mathbb R^{n+1},\mathbb C)\) which satisfies

$$\begin{aligned} ||v||_{\mathbb H}\le c||\mathbf{g}||_2, \end{aligned}$$

for some constant c depending only on n and \(\Lambda \). Let

$$\begin{aligned} I_1:= & {} \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {S}_{\lambda _0}f\cdot \overline{\nabla _{||}v}\, dxdt,\nonumber \\ I_2:= & {} \int _{\mathbb R^{n+1}}H_tD_{1/2}^t(\mathcal {S}_{\lambda _0}f)\overline{D_{1/2}^t(v)}\, dxdt. \end{aligned}$$

As \(C_0^\infty (\mathbb R^{n+1},\mathbb C)\) is dense in \(\mathbb {H}(\mathbb {R}^{n+1},\mathbb {C})\) we can in the following also assume, without loss of generality, that \(v \in C_0^\infty (\mathbb {R}^{n+1},\mathbb {C})\). This reduction allows us to handle several boundary terms which appear when we integrate by parts.

We first estimate \(I_1\). Recall the resolvents, \(\mathcal {E}_\lambda =(I+\lambda ^2\mathcal {H}_{||})^{-1}\) and \(\mathcal {E}_\lambda ^*=(I+\lambda ^2\mathcal {H}_{||}^*)^{-1}\), introduced in Sect. 5. To start the estimate of \(I_1\) we first note, applying Lemma 5.7, that

$$\begin{aligned} \left| \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\, dxdt\right| \le \frac{c}{\lambda ^2}||\mathcal {S}_{\lambda +\lambda _0}f||_2||v||_2. \end{aligned}$$
(6.1)
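
For completeness, (6.1) can be obtained as follows. The sketch assumes that the bound (1.2) (ii) is applied to the block \(A_{||}\) (by taking vectors whose last component vanishes) and that Lemma 5.7 is used for both \(\lambda \nabla _{||}\mathcal {E}_\lambda \) and its analogue for \(\mathcal {E}_\lambda ^*\), see Remark 5.5. By the Cauchy–Schwarz inequality,

$$\begin{aligned} \left| \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\, dxdt\right|\le & {} \Lambda ||\nabla _{||} \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f||_2||\nabla _{||}\mathcal {E}_\lambda ^*v||_2\nonumber \\= & {} \frac{\Lambda }{\lambda ^2}||\lambda \nabla _{||} \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f||_2||\lambda \nabla _{||}\mathcal {E}_\lambda ^*v||_2\nonumber \\\le & {} \frac{c}{\lambda ^2}||\mathcal {S}_{\lambda +\lambda _0}f||_2||v||_2. \end{aligned}$$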

Hence, using that

$$\begin{aligned} \mathcal {S}_{\lambda +\lambda _0}f-\mathcal {S}_{\lambda _0}f =\int _{\lambda _0}^{\lambda + \lambda _0}\partial _\sigma \mathcal {S}_{\sigma }f\, d\sigma , \end{aligned}$$
(6.2)

the fact that \(\Phi (f)<\infty \), Lemma 3.5 and that \(f, v\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\), we can use (6.1) to conclude that

$$\begin{aligned} \left| \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\, dxdt\right| \longrightarrow 0 \quad \text{ as } \; \lambda \rightarrow \infty . \end{aligned}$$
(6.3)

Hence,

$$\begin{aligned} I_1=-\int _{0}^\infty \partial _\lambda \left( \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\, dxdt\right) d\lambda . \end{aligned}$$
(6.4)
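
The identity (6.4) is an instance of the fundamental theorem of calculus. To make this explicit, introduce the shorthand (used only in this paragraph)

$$\begin{aligned} m(\lambda ):=\int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\, dxdt. \end{aligned}$$

Interpreting \(\mathcal {E}_0\) and \(\mathcal {E}_0^*\) as the identity, which is consistent with \(\mathcal {E}_\lambda =(I+\lambda ^2\mathcal {H}_{||})^{-1}\) although \(\mathcal {E}_\lambda \) was introduced only for \(\lambda >0\), and assuming that m is continuous at \(\lambda =0\), one has \(m(0)=I_1\), while \(m(\lambda )\rightarrow 0\) as \(\lambda \rightarrow \infty \) by (6.3). Hence, formally,

$$\begin{aligned} I_1=m(0)-\lim _{\lambda \rightarrow \infty }m(\lambda )=-\int _0^\infty \partial _\lambda m(\lambda )\, d\lambda . \end{aligned}$$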

Consider \(\lambda >0\), \(\lambda _0>0\) fixed, let \(|h|\ll \min \{\lambda _0,\lambda \}\). Then

$$\begin{aligned}&\int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_{\lambda +h} \mathcal {S}_{\lambda +\lambda _0+h}f\cdot \overline{\nabla _{||}\mathcal {E}_{\lambda +h}^*v}\, dxdt\nonumber \\&\quad -\int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_{\lambda } \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_{\lambda }^*v}\, dxdt=T_1^h+T_2^h+T_3^h, \end{aligned}$$
(6.5)

where

$$\begin{aligned} T_1^h:= & {} \int _{\mathbb R^{n+1}}A_{||}\nabla _{||}\bigl ( \mathcal {E}_{\lambda +h}-\mathcal {E}_{\lambda }\bigr ) \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_{\lambda +h}^*v}\, dxdt,\nonumber \\ T_2^h:= & {} \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_{\lambda }\mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\bigl (\mathcal {E}_{\lambda +h}^*v-\mathcal {E}_{\lambda }^*v\bigr )}\, dxdt,\nonumber \\ T_3^h:= & {} \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_{\lambda +h} \bigl (\mathcal {S}_{\lambda +\lambda _0+h}f-\mathcal {S}_{\lambda +\lambda _0}f\bigr )\cdot \overline{\nabla _{||}\mathcal {E}_{\lambda +h}^*v}\, dxdt. \end{aligned}$$
(6.6)
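
That the three terms in (6.6) telescope can be checked directly. Write, as a shorthand introduced only for this verification, \(a(u,w):=\int _{\mathbb R^{n+1}}A_{||}\nabla _{||}u\cdot \overline{\nabla _{||}w}\, dxdt\), which is linear in u and conjugate-linear in w. Then

$$\begin{aligned} T_1^h+T_3^h= & {} a(\mathcal {E}_{\lambda +h}\mathcal {S}_{\lambda +\lambda _0+h}f,\mathcal {E}_{\lambda +h}^*v)-a(\mathcal {E}_{\lambda }\mathcal {S}_{\lambda +\lambda _0}f,\mathcal {E}_{\lambda +h}^*v),\nonumber \\ T_2^h= & {} a(\mathcal {E}_{\lambda }\mathcal {S}_{\lambda +\lambda _0}f,\mathcal {E}_{\lambda +h}^*v)-a(\mathcal {E}_{\lambda }\mathcal {S}_{\lambda +\lambda _0}f,\mathcal {E}_{\lambda }^*v), \end{aligned}$$

so that

$$\begin{aligned} T_1^h+T_2^h+T_3^h=a(\mathcal {E}_{\lambda +h}\mathcal {S}_{\lambda +\lambda _0+h}f,\mathcal {E}_{\lambda +h}^*v)-a(\mathcal {E}_{\lambda }\mathcal {S}_{\lambda +\lambda _0}f,\mathcal {E}_{\lambda }^*v), \end{aligned}$$

which is the increment, from \(\lambda \) to \(\lambda +h\), of the inner integral appearing in (6.4).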

Using (5.7)–(5.16) we see that

$$\begin{aligned} \lim _{h\rightarrow 0}h^{-1}T_1^h= & {} \int _{\mathbb R^{n+1}}A_{||}\nabla _{||}\partial _\lambda \mathcal {E}_{\lambda } \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_{\lambda }^*v}\, dxdt,\nonumber \\ \lim _{h\rightarrow 0}h^{-1}T_2^h= & {} \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_{\lambda }\mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {E}_{\lambda }^*v}\, dxdt,\nonumber \\ \lim _{h\rightarrow 0}h^{-1}T_3^h= & {} \int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_{\lambda } \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_{\lambda }^*v}\, dxdt. \end{aligned}$$
(6.7)

Using these deductions we can conclude that

$$\begin{aligned} I_1= & {} -\int _{0}^\infty \int _{\mathbb R^{n+1}}\bigl ((A_{||}\nabla _{||} \partial _\lambda \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\bigr )\, dxdtd\lambda \nonumber \\&-\int _{0}^\infty \int _{\mathbb R^{n+1}} \bigl ((A_{||}\nabla _{||} \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {E}_\lambda ^*v}\bigr )\, dxdtd\lambda \nonumber \\&-\int _{0}^\infty \int _{\mathbb R^{n+1}}\bigl ((A_{||}\nabla _{||} \mathcal {E}_\lambda \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\bigr )\, dxdtd\lambda \nonumber \\=: & {} I_{11}+I_{12}+I_{13}, \end{aligned}$$

and we emphasize that by our assumptions, and (5.7)–(5.16), \(I_{11}\), \(I_{12}\) and \(I_{13}\) are well defined. To proceed we first note that

$$\begin{aligned} I_{11}= & {} -\int _{0}^\infty \langle \mathcal {L}_{||}^*\mathcal {E}_\lambda ^*v, \partial _\lambda \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f\rangle _{\bar{\mathbb H}^*}\, d\lambda =-\int _{0}^\infty \langle \mathcal {E}_\lambda ^*\mathcal {L}_{||}^*v, \partial _\lambda \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f\rangle _{\bar{\mathbb H}^*}\, d\lambda ,\nonumber \\ I_{12}= & {} -\int _{0}^\infty \langle \mathcal {L}_{||}\mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f, \partial _\lambda \mathcal {E}_\lambda ^*v \rangle _{\bar{\mathbb H}^*}\, d\lambda =-\int _{0}^\infty \langle \mathcal {E}_\lambda \mathcal {L}_{||} \mathcal {S}_{\lambda +\lambda _0}f, \partial _\lambda \mathcal {E}_\lambda ^*v \rangle _{\bar{\mathbb H}^*}\, d\lambda , \end{aligned}$$

by (5.19). Let

$$\begin{aligned} J:=\int _{0}^\infty \int _{\mathbb R^{n+1}} |\mathcal {E}_\lambda \mathcal {L}_{||} \mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda dxdtd\lambda . \end{aligned}$$

Then, using (5.17), the \(L^2\)-boundedness of \(\mathcal {E}_\lambda \) and \(\mathcal {E}_\lambda ^*\), Lemma 5.7, and the square function estimates, Theorem 5.11, we see that

$$\begin{aligned} |I_{11}|+|I_{12}|\le & {} c (|||\lambda \partial _t\mathcal {S}_{\lambda +\lambda _0}f|||+J^{1/2})||v||_{\mathbb H}\nonumber \\\le & {} c(\Phi (f)+||f||_2+J^{1/2})||v||_{\mathbb H}, \end{aligned}$$

where we on the last line have used Lemma 4.2. Next, referring to (5.4) we have

$$\begin{aligned} \mathcal {L}_{||}\mathcal {S}_{\lambda +\lambda _0}f= & {} \sum _{j=1}^{n+1}A_{n+1,j}D_{n+1}D_j\mathcal {S}_{\lambda +\lambda _0}f\nonumber \\&+\sum _{i=1}^{n}D_i(A_{i,n+1}D_{n+1}\mathcal {S}_{\lambda +\lambda _0}f)+\partial _t\mathcal {S}_{\lambda +\lambda _0}f \end{aligned}$$

in a weak sense for almost every \(\lambda \). Using this, and the \(L^2\)-boundedness of \(\mathcal {E}_\lambda \), Lemma 5.7, we see that

$$\begin{aligned} J \le c(|||\lambda \nabla \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f|||^2+|||\lambda \partial _t \mathcal {S}_{\lambda +\lambda _0}f|||^2+\tilde{J}), \end{aligned}$$

where

$$\begin{aligned} \tilde{J}:=\int _{0}^\infty \int _{\mathbb R^{n+1}} \left| \mathcal {E}_\lambda \sum _{i=1}^{n}D_i(A_{i,n+1}\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)\right| ^2\, \lambda dxdtd\lambda . \end{aligned}$$

In particular, again using Lemma 4.2 we see that

$$\begin{aligned} J\le c(\Phi (f)^2+||f||_2^2+\tilde{J}). \end{aligned}$$

To estimate \(\tilde{J}\), let \(A_{n+1}^{||}:=(A_{1,n+1},\ldots ,A_{n,n+1})\). Then

$$\begin{aligned} \tilde{J}= & {} \int _{0}^\infty \int _{\mathbb R^{n+1}} |\mathcal {E}_\lambda \mathop {{\text {div}}}\nolimits _{||}(A_{n+1}^{||}\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)|^2\, \lambda {dxdtd\lambda }\nonumber \\= & {} \int _{0}^\infty \int _{\mathbb R^{n+1}} |\mathcal {U}_\lambda (A_{n+1}^{||}\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)|^2\, \frac{dxdtd\lambda }{\lambda }, \end{aligned}$$

where \(\mathcal {U}_\lambda :=\lambda \mathcal {E}_\lambda \mathop {{\text {div}}}\nolimits _{||}\). We write

$$\begin{aligned} \mathcal {U}_\lambda A_{n+1}^{||}=\mathcal {U}_\lambda A_{n+1}^{||}-(\mathcal {U}_\lambda A_{n+1}^{||})\mathcal {P}_\lambda +(\mathcal {U}_\lambda A_{n+1}^{||})\mathcal {P}_\lambda =:\mathcal {R}_\lambda +(\mathcal {U}_\lambda A_{n+1}^{||})\mathcal {P}_\lambda . \end{aligned}$$

Then

$$\begin{aligned} \tilde{J}\le \tilde{J}_{1}+\tilde{J}_{2}, \end{aligned}$$

where

$$\begin{aligned} \tilde{J}_{1}:= & {} \int _{0}^\infty \int _{\mathbb R^{n+1}} |\mathcal {R}_\lambda \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f|^2\, \frac{dxdtd\lambda }{\lambda },\nonumber \\ \tilde{J}_{2}:= & {} \int _{0}^\infty \int _{\mathbb R^{n+1}} |(\mathcal {U}_\lambda A_{n+1}^{||})\mathcal {P}_\lambda (\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)|^2\, \frac{dxdtd\lambda }{\lambda }. \end{aligned}$$

Using Lemmas 5.8, and 4.2, we see that

$$\begin{aligned} \tilde{J}_{1}\le & {} c\int _{0}^\infty \int _{\mathbb R^{n+1}} |\nabla \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda {dxdtd\lambda }\nonumber \\&+\, c\int _{0}^\infty \int _{\mathbb R^{n+1}} |\partial _t\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda ^3{dxdtd\lambda }\nonumber \\\le & {} c(\Phi (f)^2+||f||_2^2). \end{aligned}$$
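
The weights \(\lambda \) and \(\lambda ^3\) in this bound come from applying Lemma 5.8 at each fixed \(\lambda \). As a sketch, write \(g_\lambda :=\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f\) (a shorthand used only here) and assume that \(g_\lambda \) is regular enough for Lemma 5.8, which is stated for \(C_0^\infty \) functions, to apply. Then, with \(\lambda \) frozen,

$$\begin{aligned} \frac{1}{\lambda }||\mathcal {R}_\lambda g_\lambda ||_2^2\le \frac{2c^2}{\lambda }\bigl (||\lambda \nabla g_\lambda ||_2^2+||\lambda ^2\partial _tg_\lambda ||_2^2\bigr )=2c^2\bigl (\lambda ||\nabla g_\lambda ||_2^2+\lambda ^3||\partial _tg_\lambda ||_2^2\bigr ), \end{aligned}$$

and integrating in \(\lambda \) over \((0,\infty )\) gives the first inequality in the display above.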

Furthermore, by the Carleson measure estimate in Lemma 5.9 we have

$$\begin{aligned} \tilde{J}_{2}\le c||N_*(\mathcal {P}_\lambda (\partial _\lambda \mathcal {S}_{\lambda }f))||_2^2. \end{aligned}$$

Finally, we note that

$$\begin{aligned} ||N_*(\mathcal {P}_\lambda (\partial _\lambda \mathcal {S}_{\lambda }f))||_2\le c ||M(N_{**}(\partial _\lambda \mathcal {S}_{\lambda }f))||_2\le c||N_{**}(\partial _\lambda \mathcal {S}_{\lambda }f)||_2 \end{aligned}$$

where M is the parabolic Hardy–Littlewood maximal function. Putting all these estimates together we can conclude that

$$\begin{aligned} |I_{11}|+|I_{12}|\le c\bigl (\Phi (f)+||f||_2+||N_{**}(\partial _\lambda \mathcal {S}_{\lambda }f)||_2\bigr )||v||_{\mathbb H}, \end{aligned}$$

which completes the estimate of \(|I_{11}|+|I_{12}|\). We next estimate \(I_{13}\). Integrating by parts with respect to \(\lambda \) we deduce, by repeating the argument above, that

$$\begin{aligned} I_{13}= & {} -\int _{0}^\infty \int _{\mathbb R^{n+1}}\bigl (A_{||}\nabla _{||} \mathcal {E}_\lambda \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\bigr )\, dxdtd\lambda \nonumber \\= & {} \int _{0}^\infty \int _{\mathbb R^{n+1}}\partial _\lambda \bigl (A_{||}\nabla _{||} \mathcal {E}_\lambda \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\bigr )\, \lambda dxdtd\lambda \nonumber \\= & {} \int _{0}^\infty \int _{\mathbb R^{n+1}}\bigl ((A_{||}\nabla _{||} \partial _\lambda \mathcal {E}_\lambda \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\bigr )\, \lambda dxdtd\lambda \nonumber \\&+\int _{0}^\infty \int _{\mathbb R^{n+1}} \bigl ((A_{||}\nabla _{||} \mathcal {E}_\lambda \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{\nabla _{||}\partial _\lambda \mathcal {E}_\lambda ^*v}\bigr )\, \lambda dxdtd\lambda \nonumber \\&+\int _{0}^\infty \int _{\mathbb R^{n+1}}\bigl ((A_{||}\nabla _{||} \mathcal {E}_\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\bigr )\, \lambda dxdtd\lambda \nonumber \\=: & {} I_{131}+I_{132}+I_{133}. \end{aligned}$$
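
The second equality in this display is an integration by parts in \(\lambda \). To see the mechanism, let \(k(\lambda )\) denote (as a shorthand used only here) the inner integral \(\int _{\mathbb R^{n+1}}A_{||}\nabla _{||} \mathcal {E}_\lambda \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\, dxdt\). Assuming that \(\lambda k(\lambda )\rightarrow 0\) both as \(\lambda \rightarrow 0\) and as \(\lambda \rightarrow \infty \), which is the part of the argument obtained by repeating the reasoning around (6.1)–(6.3), one has

$$\begin{aligned} -\int _0^\infty k(\lambda )\, d\lambda =-\bigl [\lambda k(\lambda )\bigr ]_{\lambda =0}^{\lambda =\infty }+\int _0^\infty \partial _\lambda k(\lambda )\, \lambda \, d\lambda =\int _0^\infty \partial _\lambda k(\lambda )\, \lambda \, d\lambda . \end{aligned}$$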

By repeating the estimates above used to control \(|I_{11}|+|I_{12}|\), we see that

$$\begin{aligned} (|I_{131}|+|I_{132}|)^2\le & {} c\int _{0}^\infty \int _{\mathbb R^{n+1}} |\nabla \partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda ^3 dxdtd\lambda \nonumber \\&+\, c\int _{0}^\infty \int _{\mathbb R^{n+1}} |\partial _t \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda ^3 dxdtd\lambda \nonumber \\&+\, c\int _{0}^\infty \int _{\mathbb R^{n+1}} |\partial _t \partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda ^5 dxdtd\lambda +c||N_*(\mathcal {P}_\lambda (\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda }f))||_2^2. \end{aligned}$$

Furthermore,

$$\begin{aligned} I_{133}= & {} \int _{0}^\infty \int _{\mathbb R^{n+1}}\bigl (A_{||}\nabla _{||} \mathcal {E}_\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{\nabla _{||}\mathcal {E}_\lambda ^*v}\bigr )\, \lambda dxdtd\lambda \nonumber \\= & {} -\int _{0}^\infty \int _{\mathbb R^{n+1}}\mathcal {E}_\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f\overline{\mathcal {E}_\lambda ^*\mathcal {L}_{||}^*v}\, \lambda dxdtd\lambda , \end{aligned}$$

by previous arguments. Using the \(L^2\)-boundedness of \(\mathcal {E}_\lambda \), Lemma 5.7 and the square function estimate for \(\mathcal {E}_\lambda ^*\mathcal {L}_{||}^*\), Theorem 5.11, we can conclude that

$$\begin{aligned} |I_{133}|\le & {} c\left( \int _{0}^\infty \int _{\mathbb R^{n+1}} |\partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda dxdtd\lambda \right) ^{1/2}||v||_{\mathbb H}. \end{aligned}$$

Hence, again using Lemma 4.2 we see that

$$\begin{aligned} |I_{13}|\le c\left( \Phi (f)+||f||_2+||N_*(\mathcal {P}_\lambda (\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda }f))||_2\right) ||v||_{\mathbb H}. \end{aligned}$$

Again,

$$\begin{aligned} ||N_*(\mathcal {P}_\lambda (\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda }f))||_2\le c||M(N_{**}(\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda }f))||_2\le c||N_{**}(\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda }f)||_2, \end{aligned}$$

and using (2.6) and Lemma 2.1 we see that

$$\begin{aligned} ||N_{**}(\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda }f)||_2\le c||N_{**}(\partial _\lambda \mathcal {S}_{\lambda }f)||_2, \end{aligned}$$

after a slight redefinition of the non-tangential maximal function on the right hand side. This completes the estimate of \(I_1\).

We next estimate \(I_2\). To start the estimate of \(I_2\) we first deduce, by arguing along the lines of (6.3)–(6.7), that

$$\begin{aligned} I_2= & {} -\int _{0}^\infty \int _{\mathbb R^{n+1}}\partial _\lambda \bigl (H_tD_{1/2}^t\mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f\cdot \overline{D_{1/2}^t\mathcal {E}_\lambda ^*v}\bigr )\, dxdtd\lambda \nonumber \\= & {} -\int _{0}^\infty \int _{\mathbb R^{n+1}}(H_tD_{1/2}^t\partial _\lambda \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{D_{1/2}^t\mathcal {E}_\lambda ^*v}\, dxdtd\lambda \nonumber \\&-\int _{0}^\infty \int _{\mathbb R^{n+1}} (H_tD_{1/2}^t \mathcal {E}_\lambda \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{D_{1/2}^t\partial _\lambda \mathcal {E}_\lambda ^*v}\, dxdtd\lambda \nonumber \\&-\int _{0}^\infty \int _{\mathbb R^{n+1}}(H_tD_{1/2}^t\mathcal {E}_\lambda \partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{D_{1/2}^t\mathcal {E}_\lambda ^*v}\, dxdtd\lambda \nonumber \\=: & {} I_{21}+I_{22}+I_{23}. \end{aligned}$$

Using the \(L^2\)-boundedness of \(\mathcal {E}_\lambda \) and \(\mathcal {E}_\lambda ^*\), Lemma 5.7, and the square function estimates, Theorem 5.11, that \(\mathcal {H}_{||}\) commutes with \(\mathcal {E}_\lambda \), \(D_{1/2}^t\), and \(H_tD_{1/2}^t\), and that \(\mathcal {H}_{||}^*\) commutes with \(\mathcal {E}_\lambda ^*\), \(D_{1/2}^t\), and \(H_tD_{1/2}^t\), in both cases in the sense described above, we can as in the estimate of \(|I_{11}|+|I_{12}|\) deduce that

$$\begin{aligned} |I_{22}|\le c|||\lambda \partial _t\mathcal {S}_{\lambda +\lambda _0}f||| \, ||v||_{\mathbb H}\le c(\Phi (f)+||f||_2)||v||_{\mathbb H}. \end{aligned}$$
(6.8)

At the final step of this deduction we have also used Lemma 4.2. Integrating by parts with respect to \(\lambda \) in \(I_{23}\), and repeating the arguments used in the estimates of \(|I_{21}|\) and \(|I_{22}|\), it is easily seen, using Lemma 4.2, that

$$\begin{aligned} |I_{23}|\le c(\Phi (f)+||f||_2)||v||_{\mathbb H}+|\tilde{I}_{23}|, \end{aligned}$$

where

$$\begin{aligned} \tilde{I}_{23}=\int _{0}^\infty \int _{\mathbb R^{n+1}}\bigl ((H_tD_{1/2}^t\mathcal {E}_\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f)\cdot \overline{D_{1/2}^t\mathcal {E}_\lambda ^*v}\bigr )\, \lambda dxdtd\lambda . \end{aligned}$$

However, again using Lemma 5.7 and Theorem 5.11, we see that

$$\begin{aligned} |\tilde{I}_{23}|\le |||\lambda \partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f||| \, |||\lambda \partial _t\mathcal {E}_\lambda ^*v |||\le c\Phi (f)||v||_{\mathbb H}. \end{aligned}$$

This completes the proof of the lemma.

6.2 Proof of Lemma 6.2

To prove Lemma 6.2 it suffices to estimate

$$\begin{aligned} \int _{\mathbb R^{n+1}} (\mathbb D_{n+1}\mathcal {S}_{\lambda _0}f)\bar{g}\, dxdt \end{aligned}$$

when \(f, g\in C_0^\infty (\mathbb R^{n+1},\mathbb C)\), \(||g||_2=1\). Let in the following \(\mathcal {P}_\lambda \) be a parabolic approximation of the identity. Then, using (2.2) (ii) we see that

$$\begin{aligned} \left| \int _{\mathbb R^{n+1}}(\mathbb D_{n+1}\mathcal {S}_{\lambda +\lambda _0}f) \mathcal {P}_\lambda \bar{g}\, dxdt\right|\le & {} c|| D_{1/2}^t\mathcal {S}_{\lambda +\lambda _0}f||_2|| \mathcal {P}_\lambda \bar{g}||_2 \nonumber \\\le & {} \frac{c}{\lambda ^{n/2+1}} \Vert \partial _t \mathcal {S}_{\lambda + \lambda _0}f \Vert _2 \Vert \mathcal {S}_{\lambda + \lambda _0}f \Vert _2. \end{aligned}$$

Again using (6.2), Hölder’s inequality, the fact that \(\Phi (f)<\infty \), and Lemmas 3.4 and 3.5, we deduce that

$$\begin{aligned} \left| \int _{\mathbb R^{n+1}}(\mathbb D_{n+1}\mathcal {S}_{\lambda +\lambda _0}f) \mathcal {P}_\lambda \bar{g}\, dxdt\right| \longrightarrow 0 \quad \text{ as }\; \lambda \rightarrow \infty . \end{aligned}$$

Hence,

$$\begin{aligned} -\int _{\mathbb R^{n+1}} (\mathbb D_{n+1}\mathcal {S}_{\lambda _0}f)\bar{g}\, dxdt= & {} \int _0^\infty \int _{\mathbb R^{n+1}} \partial _\lambda ((\mathbb D_{n+1}\mathcal {S}_{\lambda +\lambda _0}f) \mathcal {P}_\lambda \bar{g})\, dxdtd\lambda \nonumber \\= & {} \int _0^\infty \int _{\mathbb R^{n+1}}(\mathbb D_{n+1}\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f) \mathcal {P}_\lambda \bar{g}\, dxdtd\lambda \nonumber \\&+\int _0^\infty \int _{\mathbb R^{n+1}} (\mathbb D_{n+1}\mathcal {S}_{\lambda +\lambda _0}f)\partial _\lambda (\mathcal {P}_\lambda \bar{g})\, dxdtd\lambda \nonumber \\=: & {} I+{ II}. \end{aligned}$$

Note that \(\mathbb D_{n+1}=i\mathbb D^{-1}\partial _t\) and that \(\partial _\lambda \mathcal {P}_\lambda =\mathbb D \mathcal {Q}_\lambda \) where \(\mathcal {Q}_\lambda \) is an approximation of the zero operator. To prove this one can use that the kernel of \(\partial _\lambda \mathcal {P}_\lambda \) has not only zero mean but also first order vanishing moments if \(\mathcal {P}\) is an even function (see also [21, p. 366]). Using this we see that

$$\begin{aligned} |{ II}|^2\le & {} \left| \int _0^\infty \int _{\mathbb R^{n+1}} (\partial _t \mathcal {S}_{\lambda +\lambda _0}f) \mathcal {Q}_\lambda \bar{g}\, dxdtd\lambda \right| ^2\nonumber \\\le & {} c\int _{0}^\infty \int _{\mathbb R^{n+1}}|\partial _t\mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda { dxdtd\lambda }\le c(\Phi (f)+||f||_2)^2, \end{aligned}$$

by (2.8) and Lemma 4.2. To handle I we again integrate by parts with respect to \(\lambda \),

$$\begin{aligned} -I= & {} \int _0^\infty \int _{\mathbb R^{n+1}}(\mathbb D_{n+1}\partial _\lambda ^2 \mathcal {S}_{\lambda +\lambda _0}f) \mathcal {P}_\lambda \bar{g}\, \lambda dxdtd\lambda \nonumber \\&+\int _0^\infty \int _{\mathbb R^{n+1}}(\mathbb D_{n+1}\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f)\partial _\lambda (\mathcal {P}_\lambda \bar{g})\, \lambda dxdtd\lambda \nonumber \\=: & {} I_1+I_2. \end{aligned}$$

Arguing as above we immediately see that

$$\begin{aligned} |I_2|^2\le c\int _{0}^\infty \int _{\mathbb R^{n+1}}|\partial _t\partial _\lambda \mathcal {S}_{\lambda +\lambda _0}f|^2\, \lambda ^3{ dxdtd\lambda }\le c(\Phi (f)+||f||_2)^2. \end{aligned}$$

Focusing on \(I_1\), Lemma 2.4 implies

$$\begin{aligned}|I_1| \le ||| \lambda \partial _\lambda ^2 \mathcal {S}_{\lambda + \lambda _0}f ||| \ ||| \lambda \mathbb {D}_{n+1} \mathcal {P}_\lambda g ||| \le c ||| \lambda \partial _\lambda ^2 \mathcal {S}_{\lambda + \lambda _0}f ||| \ ||| \lambda \mathbb {D} \mathcal {P}_\lambda g ||| \le c \Phi (f), \end{aligned}$$

and the proof of the lemma is complete.

6.3 Proof of Lemma 6.3

Let \(K\gg 2\) be a degree of freedom and let \(\phi \in C_0^\infty (\mathbb R)\) be an even function with \(\phi =1\) on \((-3/2,-2/K)\cup (2/K,3/2)\) and with support in \((-2,-1/K)\cup (1/K,2)\). Recall that the multiplier defining \(D_{1/2}^t\) is \(|\tau |^{1/2}\). We write

$$\begin{aligned} |\tau |^{1/2}= & {} |\tau |^{1/2}\phi (\tau /||(\xi ,\tau )||^2)+ |\tau |^{1/2}(1-\phi )(\tau /||(\xi ,\tau )||^2)\nonumber \\= & {} \text{ sgn }(\tau )\frac{||(\xi ,\tau )||}{|\tau |^{1/2}}\phi (\tau /||(\xi ,\tau )||^2)\frac{\tau }{||(\xi ,\tau )||}\nonumber \\&-\sum _{j=1}^n|\tau |^{1/2}\frac{i\xi _j}{|\xi |^2}(1-\phi )(\tau /||(\xi ,\tau )||^2)i\xi _j. \end{aligned}$$
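
As a routine check of the last identity (a verification added for the reader, valid for \(\tau \ne 0\) and \(\xi \ne 0\)), the two rewritten terms reduce to the original ones because \(\text{ sgn }(\tau )\tau =|\tau |\) and, \(\xi \) being real, \(\sum _{j=1}^n(i\xi _j)^2=-|\xi |^2\):

$$\begin{aligned} \text{ sgn }(\tau )\frac{||(\xi ,\tau )||}{|\tau |^{1/2}}\cdot \frac{\tau }{||(\xi ,\tau )||}= & {} \frac{|\tau |}{|\tau |^{1/2}}=|\tau |^{1/2},\nonumber \\ -\sum _{j=1}^n|\tau |^{1/2}\frac{i\xi _j}{|\xi |^2}\, i\xi _j= & {} |\tau |^{1/2}\frac{\sum _{j=1}^n\xi _j^2}{|\xi |^2}=|\tau |^{1/2}. \end{aligned}$$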

Hence, introducing the multipliers

$$\begin{aligned} m_1(\xi ,\tau )= & {} \text{ sgn }(\tau )\frac{||(\xi ,\tau )||}{|\tau |^{1/2}}\phi (\tau /||(\xi ,\tau )||^2),\nonumber \\ m_{2,j}(\xi ,\tau )= & {} -|\tau |^{1/2}\frac{i\xi _j}{|\xi |^2}(1-\phi )(\tau /||(\xi ,\tau )||^2), \end{aligned}$$

for \(j\in \{1,\ldots ,n\}\) we can conclude the existence of kernels \(L_1\), \(L_{2,j}\), corresponding to \(m_1\), \(m_{2,j}\), such that

$$\begin{aligned} D_{1/2}^t=L_1*\mathbb D_{n+1}+c\sum _{j=1}^n L_{2,j}*\partial _{x_j}, \end{aligned}$$

where \(*\) denotes convolution. Choosing \(K=K(n)\) large enough we see that the multipliers \(m_1\) and \(m_{2,j}\) are bounded, and hence \(L_1\) and \(L_{2,j}\) are bounded operators on \(L^2(\mathbb R^{n+1},\mathbb C)\). This completes the proof of Lemma 6.3.
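
The boundedness of the multipliers can be sketched as follows. The sketch assumes that \(\rho :=||(\xi ,\tau )||\) is the parabolic norm defined as the unique positive solution of \(|\xi |^2/\rho ^2+\tau ^2/\rho ^4=1\); this convention is not restated in this section, so it should be read as an assumption, and the constants below are not optimized. On the support of \(\phi (\tau /\rho ^2)\) we have \(|\tau |>\rho ^2/K\), and hence

$$\begin{aligned} |m_1(\xi ,\tau )|\le \frac{\rho }{|\tau |^{1/2}}\, \sup |\phi |\le K^{1/2}\sup |\phi |. \end{aligned}$$

On the support of \((1-\phi )(\tau /\rho ^2)\) either \(|\tau |<2\rho ^2/K\) or \(|\tau |>3\rho ^2/2\). Under the assumed definition \(|\tau |\le \rho ^2\), so only the first case occurs, and there \(|\xi |^2=\rho ^2(1-\tau ^2/\rho ^4)\ge \rho ^2(1-4/K^2)\). Consequently, for \(K>2\),

$$\begin{aligned} |m_{2,j}(\xi ,\tau )|\le \frac{|\tau |^{1/2}}{|\xi |}\, \sup |1-\phi |\le \left( \frac{2}{K}\right) ^{1/2}\left( 1-\frac{4}{K^2}\right) ^{-1/2}\sup |1-\phi |. \end{aligned}$$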

7 Proof of Theorem 1.1

Assume that \(\mathcal {H}\) and \(\mathcal {H}^*\) satisfy (1.2) and (1.3) as well as the De Giorgi–Moser–Nash estimates stated in (2.6) and (2.7). Assume also that there exists a constant C such that (1.5) holds whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). To prove Theorem 1.1 we need to prove that there exists a constant c, depending at most on n, \(\Lambda \), the De Giorgi–Moser–Nash constants and C, such that the inequalities in (1.6) (i)–(iv) hold. Again, we only have to prove (1.6) (i)–(iv) for \(\mathcal {S}_\lambda ^{\mathcal {H}}\) as the corresponding results for \(\mathcal {S}_\lambda ^{\mathcal {H}^*}\) follow by analogy. To start the proof, we first note that (1.6) (i) is an immediate consequence of Lemma 4.1 (i) and the assumption in (1.5) (i). Using Lemmas 6.1, 6.2, and 6.3, we see that (1.6) (i) and the assumptions in (1.5) imply that

$$\begin{aligned} \sup _{\lambda >0}||\mathbb D\mathcal {S}_{\lambda }^{\mathcal {H}}f||_{2}\le c||f||_2. \end{aligned}$$

This proves (1.6) (ii). The estimates (1.6) (iii) and (iv) now follow immediately from these estimates and Lemma 4.1.

8 Proof of Theorems 1.2 and 1.3

Assume that \(\mathcal {H}=\partial _t-\text{ div } A\nabla \) satisfies (1.2) and (1.3). Assume in addition that A is real and symmetric. Then (2.6) and (2.7) hold. To prove Theorem 1.2 we have to prove that there exists a constant C, depending at most on n and \(\Lambda \), such that (1.5) holds with this C. We first focus on the estimate in (1.5) (ii). Consider

$$\begin{aligned} \psi _{\lambda }(x,t,y,s):=\lambda K_{1,\lambda }(x,t,y,s)=\lambda \partial _\lambda ^{2}\Gamma _\lambda (x,t,y,s). \end{aligned}$$
(8.1)

Then, using Lemma 3.1 we see that \(\psi _{\lambda }(x,t,y,s)\) satisfies the Calderón–Zygmund bounds

$$\begin{aligned} |\psi _{\lambda }(x,t,y,s)|\le c{|\lambda |}(d_\lambda (x,t,y,s))^{-n-3}, \end{aligned}$$
(8.2)

and

$$\begin{aligned} |\mathbb D^h(\psi _{\lambda }(\cdot ,\cdot ,y,s))(x,t)|\le & {} c{|\lambda |||h||^\alpha }(d_\lambda (x,t,y,s))^{-n-3-\alpha }\nonumber \\\le & {} c{||h||^\alpha }(d_\lambda (x,t,y,s))^{-n-2-\alpha }, \end{aligned}$$
(8.3)

for some \(\alpha >0\), whenever \(2||h||\le (|x-y|+|t-s|^{1/2})\) or \(2||h||\le |\lambda |\). Our proof of Theorem 1.2 is based on Theorem 1.3 and on the following theorem, both of which are proved below.

Theorem 8.1

Assume that \(\psi _{\lambda }\) satisfies (8.2) and (8.3). Let

$$\begin{aligned} \theta _\lambda f(x,t):=\int _{\mathbb R^{n+1}}\psi _{\lambda }(x,t,y,s)f(y,s)\, dyds, \end{aligned}$$

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\). Suppose that there exists a system \(\{b_Q\}\) of functions, \(b_Q:\mathbb R^{n+1}\rightarrow \mathbb C\), indexed by parabolic cubes \(Q\subseteq \mathbb R^{n+1}\), and a constant c, independent of Q, such that for each cube Q the following hold:

$$\begin{aligned} \,\mathrm{(i)}&\int _{\mathbb R^{n+1}}|b_Q(x,t)|^2\, dxdt\le c|Q|,\nonumber \\ \,\mathrm{(ii)}&\int _0^{l(Q)}\int _{Q}|\theta _\lambda b_Q(x,t)|^2\, \frac{dxdtd\lambda }{\lambda }\le c|Q|,\nonumber \\ \,\mathrm{(iii)}&c^{-1}|Q|\le \text{ Re } \int _{Q} b_Q(x,t)\, dxdt. \end{aligned}$$
(8.4)

Then there exists a constant c such that

$$\begin{aligned} |||\theta _\lambda f|||=\left( \int _0^\infty \int _{\mathbb R^{n+1}}|\theta _\lambda f(x,t)|^2\, \frac{dxdtd\lambda }{\lambda }\right) ^{1/2}\le c||f||_2, \end{aligned}$$
(8.5)

whenever \(f\in L^2(\mathbb R^{n+1},\mathbb C)\).

The proofs of Theorems 1.3 and 8.1 are given below. We here use Theorems 1.3 and 8.1 to complete the proof of Theorem 1.2.

Proof of (1.5) (ii) By Theorem 8.1, applied to \(\theta _\lambda \) defined using the kernel in (8.1), it suffices to produce a system \(\{b_Q\}\) of functions satisfying (8.4) (i)–(iii). To do this we let

$$\begin{aligned} b_Q(y,s):=|Q|1_Q\tilde{K}_-(A_Q^-,y,s), \end{aligned}$$

whenever \((y,s)\in \mathbb R^{n+1}\), where \(1_Q\) is the indicator function of the cube Q and where \(\tilde{K}_-(A_Q^-,y,s)\) is the Poisson kernel associated to \(\mathcal {H}^*=-\partial _t+\mathcal {L}\), at \(A_Q^-:=(x_Q,-l(Q),t_Q)\), defined with respect to \(\mathbb R_-^{n+2}\). Theorem 1.3 applies to \(\tilde{K}_-(A_Q^-,\cdot ,\cdot )\) modulo trivial modifications. To verify that \(b_Q\) satisfies (8.4) (i)–(iii), we first note that (i) is an immediate consequence of Theorem 1.3. Furthermore,

$$\begin{aligned} \int _{\mathbb R^{n+1}}b_Q(y,s)\, dyds=|Q|\tilde{\omega }_-^{A_Q^-}(Q)\ge c^{-1}|Q|, \end{aligned}$$

by elementary estimates and where \(\tilde{\omega }_-^{A_Q^-}\) is the associated parabolic measure at \(A_Q^-\) and defined with respect to \(\mathbb R_-^{n+2}\). Hence (iii) follows and it only remains to establish (ii). Let \((x,t)\in Q\), \(\lambda \in (0,l(Q))\) and note that

$$\begin{aligned} \theta _\lambda b_Q(x,t)= & {} \int _{\mathbb R^{n+1}}\lambda \partial _\lambda ^{2}\Gamma _\lambda (x,t,y,s)b_Q(y,s)\, dyds\nonumber \\= & {} \lambda |Q|\int _{Q}\partial _\lambda ^{2}\Gamma _\lambda (x,t,y,s)\tilde{K}_-(A_Q^-,y,s)\, dyds\nonumber \\= & {} \lambda |Q|\left( \partial _\lambda ^{2}\Gamma (x,t,\lambda , x_Q,t_Q,-l(Q))\right) , \end{aligned}$$

by the definition of \({A_Q^-}\), \(\tilde{K}_-(A_Q^-,y,s)\), and as \(\partial _\lambda ^{2}\Gamma (x,t,\lambda , x_Q,t_Q,-l(Q))\) solves \(\mathcal {H}^*u=0\) in \(\mathbb R^{n+2}_-\). Using this, and (8.2), we see that (ii) follows by elementary manipulations. Hence, using Theorem 8.1 we can conclude the validity of (1.5) (ii). \(\square \)
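
The elementary manipulations behind (ii) can be sketched as follows. The sketch assumes that \(|Q|=l(Q)^{n+2}\) for a parabolic cube, that \(d_\lambda (x,t,y,s)\ge |\lambda |\), and that, by the independence of A of \(x_{n+1}\), the bound (8.2) applies to \(\partial _\lambda ^{2}\Gamma (x,t,\lambda , x_Q,t_Q,-l(Q))\) with \(\lambda \) replaced by \(\lambda +l(Q)\); these conventions are not restated in this section and should be read as assumptions. Then, for \((x,t)\in Q\) and \(\lambda \in (0,l(Q))\),

$$\begin{aligned} |\theta _\lambda b_Q(x,t)|\le c\lambda |Q|(\lambda +l(Q))^{-n-3}\le c\lambda \, l(Q)^{n+2}\, l(Q)^{-n-3}=c\frac{\lambda }{l(Q)}, \end{aligned}$$

and therefore

$$\begin{aligned} \int _0^{l(Q)}\int _{Q}|\theta _\lambda b_Q(x,t)|^2\, \frac{dxdtd\lambda }{\lambda }\le c\frac{|Q|}{l(Q)^2}\int _0^{l(Q)}\lambda \, d\lambda =\frac{c}{2}|Q|. \end{aligned}$$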

Proof of (1.5) (i) We first note that, throughout the proof, we may assume without loss of generality that \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb R)\). Second, we use Theorem 1.3 and the fact that if \(\mathcal {H}=\partial _t-\text{ div } A\nabla \) satisfies (1.2) and (1.3), and if A is real and symmetric, then the estimates of the non-tangential maximal function by the square function, established in [9] for the heat equation, remain valid for solutions to \(\mathcal {H}u=0\). In particular, let \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb R)\) and consider \(\lambda >0\) fixed. We let R and r be such that \(\lambda \ll r\ll R\) and such that the support of f is contained in \(Q_{R/4}(0,0)\). Then, using Theorem 1.3 and [9] we see that

$$\begin{aligned} ||(\partial _\lambda \mathcal {S}_\lambda f)1_{Q_r(0,0)}||_2^2\le c|||\lambda \nabla \partial _\lambda \mathcal {S}_\lambda f|||^2+cR^{n+2}|\partial _\lambda \mathcal {S}_{R/2} f(0,0)|^2, \end{aligned}$$

for a constant c depending only on n, \(\Lambda \). However,

$$\begin{aligned} R^{n+2}|\partial _\lambda \mathcal {S}_{R/2} f(0,0)|^2\le R^{-n-2}||f||_1^{2}. \end{aligned}$$
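
A sketch of why a bound of this type holds, up to a multiplicative constant c, is the following. It assumes that \(\mathcal {S}_\lambda f(x,t)=\int _{\mathbb R^{n+1}}\Gamma _\lambda (x,t,y,s)f(y,s)\, dyds\) and that the kernel satisfies \(|\partial _\lambda \Gamma _\lambda (x,t,y,s)|\le c(d_\lambda (x,t,y,s))^{-n-2}\) with \(d_\lambda \ge |\lambda |\), which is the natural analogue of (8.2) with one derivative less; both are assumptions in this sketch, since they are not restated in this section. Then

$$\begin{aligned} |\partial _\lambda \mathcal {S}_{R/2} f(0,0)|\le & {} \int _{\mathbb R^{n+1}}|\partial _\lambda \Gamma _{R/2}(0,0,y,s)||f(y,s)|\, dyds\le cR^{-n-2}||f||_1,\nonumber \\ R^{n+2}|\partial _\lambda \mathcal {S}_{R/2} f(0,0)|^2\le & {} cR^{n+2}R^{-2n-4}||f||_1^2=cR^{-n-2}||f||_1^2. \end{aligned}$$

Since \(f\in C_0^\infty (\mathbb R^{n+1},\mathbb R)\) we have \(||f||_1<\infty \), so the right hand side tends to zero as \(R\rightarrow \infty \).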

Hence, first letting \(R\rightarrow \infty \) and then letting \(r\rightarrow \infty \) we can conclude that

$$\begin{aligned} ||\partial _\lambda \mathcal {S}_\lambda f||_2\le c|||\lambda \nabla \partial _\lambda \mathcal {S}_\lambda f|||. \end{aligned}$$
(8.6)

Using (4.3) we see that

$$\begin{aligned} |||\lambda \nabla \partial _\lambda \mathcal {S}_\lambda f|||\le c|||\lambda \partial _\lambda ^2\mathcal {S}_\lambda f|||+c||f||_2. \end{aligned}$$
(8.7)

(8.6), (8.7) and (1.5) (ii) now prove (1.5) (i). \(\square \)

This completes the proof of Theorem 1.2 modulo Theorems 8.1 and 1.3.

8.1 Proof of Theorem 8.1

Though there are several references for this type of argument, see [10, 19, 25] and the references therein, we will, for completeness, include a sketch of the argument in our context. To start with, as \(\psi _{\lambda }\) satisfies (8.2) and (8.3) it is well-known, see [10], that to prove (8.5) it suffices to prove the Carleson measure estimate

$$\begin{aligned} \sup _{Q\subset \mathbb R^{n+1}}\frac{1}{|Q|}\int _0^{l(Q)}\int _{Q}|\theta _\lambda 1|^2\, \frac{dxdtd\lambda }{\lambda }\le c. \end{aligned}$$
(8.8)

Using assumption (iii) in the statement of Theorem 8.1, and a by now well-known stopping time argument, see [19], one can conclude that

$$\begin{aligned} \sup _{Q\subset \mathbb R^{n+1}}\frac{1}{|Q|}\int _0^{l(Q)}\int _Q|\theta _\lambda 1|^2\frac{dxdtd\lambda }{\lambda }\le c\sup _{Q\subset \mathbb R^{n+1}}\frac{1}{|Q|}\int _0^{l(Q)}\int _Q|(\theta _\lambda 1 )\mathcal {A}_\lambda ^Q b_{Q}|^2\frac{dxdtd\lambda }{\lambda }, \end{aligned}$$

where \(\mathcal {A}_\lambda ^Q\) denotes the dyadic averaging operator induced by Q and introduced in (2.9). Hence, to prove (8.8) it suffices to prove that

$$\begin{aligned} \int _0^{l(Q)}\int _Q|(\theta _\lambda 1 )\mathcal {A}_\lambda ^Q b_{Q}|^2\frac{dxdtd\lambda }{\lambda }\le c|Q|, \end{aligned}$$
(8.9)

for all \(Q\subset \mathbb R^{n+1}\). We write

$$\begin{aligned} (\theta _\lambda 1)\mathcal {A}_\lambda ^Qb_{Q}=\mathcal {R}_\lambda ^{(1)}b_{Q}+\mathcal {R}_\lambda ^{(2)}b_{Q}+\theta _\lambda b_{Q}, \end{aligned}$$

where

$$\begin{aligned} \mathcal {R}_\lambda ^{(1)}b_{Q}:= & {} (\theta _\lambda 1)(\mathcal {A}_\lambda ^Q-\mathcal {A}_\lambda ^Q \mathcal {P}_\lambda )b_{Q},\nonumber \\ \mathcal {R}_\lambda ^{(2)}b_{Q}:= & {} ((\theta _\lambda 1)\mathcal {A}_\lambda ^Q \mathcal {P}_\lambda -\theta _\lambda )b_{Q}, \end{aligned}$$

and where \(\mathcal {P}_\lambda \) is a parabolic approximation of the identity. Using assumption (ii) in the statement of Theorem 8.1 we see that the contribution from the term \(\theta _\lambda b_{Q}\) to the Carleson measure in (8.9) is controlled. Hence we focus on the contributions from \(\mathcal {R}_\lambda ^{(1)}b_{Q}\) and \(\mathcal {R}_\lambda ^{(2)}b_{Q}\). Note that

$$\begin{aligned} \mathcal {R}_\lambda ^{(1)}= & {} (\theta _\lambda 1)(\mathcal {A}_\lambda ^Q-\mathcal {A}_\lambda ^Q \mathcal {P}_\lambda )=(\theta _\lambda 1)\mathcal {A}_\lambda ^Q(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda ). \end{aligned}$$
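
The second equality uses that the averaging operator is idempotent. As a sketch, assume that (2.9) defines \(\mathcal {A}_\lambda ^Q f\) as the average of f over the dyadic subcube of the relevant scale containing the point, so that \(\mathcal {A}_\lambda ^Q f\) is constant on each such subcube (this reading of (2.9) is an assumption here). Then

$$\begin{aligned} \mathcal {A}_\lambda ^Q\mathcal {A}_\lambda ^Q=\mathcal {A}_\lambda ^Q,\quad \text{ and } \text{ hence }\quad \mathcal {A}_\lambda ^Q(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )=\mathcal {A}_\lambda ^Q-\mathcal {A}_\lambda ^Q\mathcal {P}_\lambda . \end{aligned}$$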

Using (8.2), (8.3), and a version of Schur’s lemma, we see that

$$\begin{aligned} ||(\theta _\lambda 1)\mathcal {A}_\lambda ^Q||_{2\rightarrow 2}\le c. \end{aligned}$$

Thus, by Lemma 2.5,

$$\begin{aligned} \int _{0}^{l(Q)}\int _{Q}| \mathcal {R}_\lambda ^{(1)}b_{Q}(x,t)|^2\, \frac{dxdtd\lambda }{\lambda }\le & {} c \int _0^\infty \int _{\mathbb R^{n+1}}|(\mathcal {A}_\lambda ^Q-\mathcal {P}_\lambda )b_{Q}(x,t)|^2\, \frac{dxdtd\lambda }{\lambda }\nonumber \\\le & {} c \int _{\mathbb R^{n+1}}|b_{Q}(x,t)|^2\, {dxdt}\le c|Q|. \end{aligned}$$

It remains to estimate

$$\begin{aligned} \int _{0}^{l(Q)}\int _{Q}| \mathcal {R}_\lambda ^{(2)}b_{Q}(x,t)|^2\, \frac{dxdtd\lambda }{\lambda }. \end{aligned}$$

However, using (8.2), (8.3), and that \(\mathcal {R}_\lambda ^{(2)}1=0\), it follows by a well-known orthogonality argument, and assumption (i) in the statement of Theorem 8.1, that

$$\begin{aligned} \int _{0}^{l(Q)}\int _{Q}| \mathcal {R}_\lambda ^{(2)}b_{Q}(x,t)|^2\, \frac{dxdtd\lambda }{\lambda }\le c\int _{\mathbb R^{n+1}}|b_Q(x,t)|^2\, dxdt\le c|Q|. \end{aligned}$$
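
The cancellation \(\mathcal {R}_\lambda ^{(2)}1=0\) used here can be checked directly, assuming the normalizations \(\mathcal {P}_\lambda 1=1\) and \(\mathcal {A}_\lambda ^Q 1=1\) (standard for an approximation of the identity and for an averaging operator, but not restated in this section):

$$\begin{aligned} \mathcal {R}_\lambda ^{(2)}1=(\theta _\lambda 1)\mathcal {A}_\lambda ^Q \mathcal {P}_\lambda 1-\theta _\lambda 1=(\theta _\lambda 1)\mathcal {A}_\lambda ^Q 1-\theta _\lambda 1=\theta _\lambda 1-\theta _\lambda 1=0. \end{aligned}$$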

This completes the proof of Theorem 8.1.

8.2 Proof of Theorem 1.3

Under the assumptions of Theorem 1.3 there exists a Green’s function \(G=G(X,t,Y,s)\) to \(\mathcal {H}=\partial _t+\mathcal {L}=\partial _t-\mathop {{\text {div}}}\nolimits A\nabla \) in \(\mathbb R_+^{n+2}\), and corresponding measures \(\omega ^{(X,t)}(\cdot )\), \(\tilde{\omega }^{(X,t)}(\cdot )\), \((X,t)\in \mathbb R_+^{n+2}\) such that

$$\begin{aligned} \phi (X,t)= & {} \int \bigl (A\nabla _Y G(X,t,Y,s)\cdot \nabla \phi (Y,s)+G(X,t,Y,s)\partial _s\phi (Y,s)\bigr )\, dYds\nonumber \\&+ \int \phi (y,0,s)\, d\omega ^{(X,t)}(y,s),\nonumber \\ \phi (X,t)= & {} \int \bigl (A\nabla _Y G(Y,s,X,t)\cdot \nabla \phi (Y,s)-G(Y,s,X,t)\partial _s\phi (Y,s)\bigr )\, dYds\nonumber \\&+ \int \phi (y,0,s)\, d\tilde{\omega }^{(X,t)}(y,s), \end{aligned}$$
(8.10)

whenever \(\phi \in C_0^\infty (\mathbb R^{n+2})\) and where \((X,t)=(x,x_{n+1},t)\), \((Y,s)=(y,y_{n+1},s)\). In particular,

$$\begin{aligned} (\partial _t+\mathcal {L}_{X,t})G(X,t,Y,s)=\delta _{(0,0)}(X-Y,t-s), \end{aligned}$$

and

$$\begin{aligned} (-\partial _s+\mathcal {L}_{Y,s})G(X,t,Y,s)=\delta _{(0,0)}(X-Y,t-s). \end{aligned}$$
(8.11)

Furthermore, in this setting G has a number of well-known properties, see for example display (3.7) on p. 11 in [23], and given \(f\in C(\mathbb R^{n+1})\cap L^\infty (\mathbb R^{n+1})\),

$$\begin{aligned} u(X,t)=\int _{\mathbb R^{n+1}}f(y,s)\, d\omega ^{(X,t)}(y,s), \end{aligned}$$

gives the solution to the continuous Dirichlet problem \(\mathcal {H}u=(\partial _t+\mathcal {L})u=(\partial _t-\mathop {{\text {div}}}\nolimits A\nabla )u=0\) in \(\mathbb R^{n+2}_+\), \(u\in C(\mathbb R^{n+1}\times [0,\infty ))\), and \(u(x,0,t)=f(x,t)\) whenever \((x,t)\in \mathbb R^{n+1}\). \(\{\omega ^{(X,t)}:\ (X,t)\in \mathbb R^{n+2}_+\}\) and \(\{\tilde{\omega }^{(X,t)}:\ (X,t)\in \mathbb R^{n+2}_+\}\) are families of regular Borel measures on \(\mathbb R^{n+1}\) which we call \(\mathcal {H}\)-caloric, or \(\mathcal {H}\)-parabolic measures, and \(\mathcal {H}^*\)-caloric, or \(\mathcal {H}^*\)-parabolic measures, respectively.

Given \(\mathcal {H}=\partial _t-\mathop {{\text {div}}}\nolimits A\nabla \), satisfying (1.2) and (1.3) with constant \(\Lambda \), A real and symmetric, let \(A_\epsilon \), \(0<\epsilon \ll 1\), be a smooth \((n+1)\times (n+1)\)-matrix valued function, \(A_\epsilon \) real and symmetric, such that \(\mathcal {H}_\epsilon =\partial _t-\mathop {{\text {div}}}\nolimits A_\epsilon \nabla \) satisfies (1.2) and (1.3), with constants depending at most on n and \(\Lambda \), and such that \(|A_\epsilon -A|\le \epsilon \) on \(\mathbb R^{n+2}\). As above, let \(G_\epsilon (X,t,Y,s)\), \(\omega _\epsilon ^{(X,t)}\), \(\tilde{\omega }_\epsilon ^{(X,t)}\) be the Green’s function and boundary measures associated to \(\mathcal {H}_\epsilon =\partial _t-\mathop {{\text {div}}}\nolimits A_\epsilon \nabla \), \(\mathcal {H}_\epsilon ^*=-\partial _t-\mathop {{\text {div}}}\nolimits A_\epsilon \nabla \). Extending \(G_\epsilon \) and G to all of \(\mathbb R^{n+2}\) by putting \(G_\epsilon \equiv 0\equiv G\) on \(\mathbb R^{n+2}_-\) one can prove, by for instance following the argument in Lemma 3.37 in [23], that

$$\begin{aligned}&\int \bigl (A_\epsilon \nabla _Y G_\epsilon (X,t,Y,s)\cdot \nabla \phi (Y,s)+G_\epsilon (X,t,Y,s)\partial _s\phi (Y,s)\bigr )\, dYds\nonumber \\&\quad \rightarrow \int \bigl (A\nabla _Y G(X,t,Y,s)\cdot \nabla \phi (Y,s)+G(X,t,Y,s)\partial _s\phi (Y,s)\bigr )\, dYds \end{aligned}$$
(8.12)

and

$$\begin{aligned}&\int \bigl (A_\epsilon \nabla _Y G_\epsilon (Y,s,X,t)\cdot \nabla \phi (Y,s)-G_\epsilon (Y,s,X,t)\partial _s\phi (Y,s)\bigr )\, dYds\nonumber \\&\quad \rightarrow \int \bigl (A\nabla _Y G(Y,s,X,t)\cdot \nabla \phi (Y,s)-G(Y,s,X,t)\partial _s\phi (Y,s)\bigr )\, dYds, \end{aligned}$$
(8.13)

as \(\epsilon \rightarrow 0\), whenever \((X,t)\in \mathbb R^{n+2}_+\) and \(\phi \in C_0^\infty (K)\) where K is a compact subset of \(\mathbb R^{n+2}{\setminus }\{(X,t)\}\). Hence, using (8.10), (8.12), (8.13) we can conclude that

$$\begin{aligned} \omega _\epsilon ^{(X,t)}\rightarrow \omega ^{(X,t)}, \quad \tilde{\omega }_\epsilon ^{(X,t)}\rightarrow \tilde{\omega }^{(X,t)} \end{aligned}$$
(8.14)

weakly as Radon measures on \(\mathbb R^{n+1}\) as \(\epsilon \rightarrow 0\).

Based on the above outline, it suffices to prove Theorem 1.3 assuming that A is smooth. Indeed, consider \(A_\epsilon \) for \(\epsilon >0\) small, and assume that the parabolic measure associated to \(\mathcal {H}_\epsilon \) in \(\mathbb R^{n+2}_+\) is absolutely continuous with respect to the measure dxdt on \(\mathbb R^{n+1}=\partial \mathbb R^{n+2}_+\). Let \(Q\subset \mathbb R^{n+1}\) be a parabolic cube and let \(K_\epsilon (A_Q,y,s)\) be the Poisson kernel associated to \(\mathcal {H}_\epsilon \) at \(A_Q:=(x_Q,l(Q),t_Q)\), where \((x_Q,t_Q)\) is the center of the cube Q and l(Q) defines its size. Furthermore, assume that there exists \(c\ge 1\), depending only on n and \(\Lambda \), such that

$$\begin{aligned} \int _{Q}|K_\epsilon (A_Q,y,s)|^{2}\, dyds\le c|Q|^{-1}. \end{aligned}$$

Then \(K_\epsilon (A_Q,y,s)\rightarrow K(A_Q,y,s)\) weakly on Q as \(\epsilon \rightarrow 0\) and

$$\begin{aligned} \int _{Q}|K(A_Q,y,s)|^{2}\, dyds\le c|Q|^{-1}. \end{aligned}$$

Furthermore,

$$\begin{aligned} \int \phi (y,s)\, d\omega ^{A_Q}(y,s)= & {} \lim _{\epsilon \rightarrow 0}\int \phi (y,s)\, d\omega _\epsilon ^{A_Q}(y,s)\nonumber \\= & {} \lim _{\epsilon \rightarrow 0}\int \phi (y,s) K_\epsilon (A_Q,y,s)\, dyds=\int \phi (y,s) K(A_Q,y,s)\, dyds, \end{aligned}$$
(8.15)

whenever \(\phi \in C_0^\infty (Q\times (-l(Q)/2,l(Q)/2))\), and Theorem 1.3 follows.

In the following we prove Theorem 1.3 assuming that A is smooth. If A is smooth it follows that the solution to the Dirichlet problem \(\mathcal {H}u=0\) in \(\mathbb R^{n+2}_+\), \(u=f\) on \(\mathbb R^{n+1}\), equals

$$\begin{aligned} u(X,t)=\int _{\mathbb R^{n+1}} K(X,t,y,s)f(y,s)\, dyds, \end{aligned}$$

where

$$\begin{aligned} K(X,t,y,s):= & {} \langle \nabla _YG(X,t,Y,s),A(Y)e_{n+1}\rangle |_{y_{n+1}=0}\\= & {} a_{n+1,n+1}(y)\partial _{y_{n+1}}G(X,t,Y,s)|_{y_{n+1}=0}. \end{aligned}$$
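
The second expression for K can be understood as follows, under the assumption (standard for Green’s functions, but not restated here) that \(G(X,t,\cdot ,\cdot )\) vanishes on the boundary \(\{y_{n+1}=0\}\), so that its tangential derivatives \(\partial _{y_j}G\), \(1\le j\le n\), vanish there as well. Writing \(A=\{a_{i,j}\}\), the vector \(A(Y)e_{n+1}\) has components \(a_{j,n+1}(y)\), and hence

$$\begin{aligned} \langle \nabla _YG,Ae_{n+1}\rangle |_{y_{n+1}=0}=\sum _{j=1}^{n+1}a_{j,n+1}(y)\partial _{y_j}G|_{y_{n+1}=0}=a_{n+1,n+1}(y)\partial _{y_{n+1}}G|_{y_{n+1}=0}. \end{aligned}$$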

Using (1.2) we see that \(a_{n+1,n+1}\) is uniformly bounded from below. Let \(Q\subset \mathbb R^{n+1}\) be a parabolic cube and let \(A_Q:=(X_Q,t_Q):=(x_Q,l(Q),t_Q)\), where \((x_Q,t_Q)\) is the center of the cube and l(Q) defines its size. We write \(Q=\hat{Q}\times (t_Q-l(Q)^2/2,t_Q+l(Q)^2/2)\) where \(\hat{Q}\subset \mathbb R^{n}\) is an (elliptic) cube in the space variables only. Then

$$\begin{aligned} \int _{Q}(K(A_Q,y,s))^2\, dyds= & {} \int _{t_Q-l(Q)^2/2}^{t_Q+l(Q)^2/2}\int _{\hat{Q}}(K(X_Q,t_Q,y,s))^2\, dyds\nonumber \\= & {} \int _{-l(Q)^2/2}^{l(Q)^2/2}\int _{\hat{Q}}(K(X_Q,0,y,s))^2\, dyds\nonumber \\= & {} \int _{-l(Q)^2/2}^{l(Q)^2/2}\int _{\hat{Q}}(K(X_Q,0,y,-s))^2\, dyds, \end{aligned}$$
(8.16)

by the translation invariance in the time-variable due to (1.3) and, in the last step, the change of variables \(s\mapsto -s\). Using the Harnack inequality we see that

$$\begin{aligned} (K(X_Q,0,y,-s))^2\le c K(X_Q,0,y,-s)K(X_Q,16l(Q)^2,y,s), \end{aligned}$$
(8.17)

whenever \((y,s)\in \hat{Q}\times [-l(Q)^2/2,l(Q)^2/2]\). Let

$$\begin{aligned} \phi \in C_0^\infty \bigl (\mathbb R^{n+2}{\setminus }\bigl (\{(X_Q,0)\}\cup \{(X_Q,16l(Q)^2)\}\bigr )\bigr ) \end{aligned}$$

be such that

$$\begin{aligned} \phi (y,y_{n+1},s)=1, \end{aligned}$$
(8.18)

whenever \((y,y_{n+1},s)\in \hat{Q}\times [-l(Q)/16,l(Q)/16]\times [-l(Q)^2/2,l(Q)^2/2]\), and

$$\begin{aligned} \phi (y,y_{n+1},s)=0, \end{aligned}$$
(8.19)

whenever \((y,y_{n+1},s)\in \mathbb R^{n+2}{\setminus }\bigl (2\hat{Q}\times [-l(Q)/8,l(Q)/8]\times [-l(Q)^2,l(Q)^2]\bigr )\). Furthermore, we choose \(\phi \) so that

$$\begin{aligned} \ |\nabla _Y\phi (Y,s)|\le cl(Q)^{-1},\ \quad |\partial _s\phi (Y,s)|\le cl(Q)^{-2}, \end{aligned}$$
(8.20)

whenever \((Y,s)\in \mathbb R^{n+2}\). Let \(\Psi (Y,s):= \phi (Y,s)\partial _{y_{n+1}}v(Y,s)\), where

$$\begin{aligned} v(Y,s):=G(X_Q,0,Y,-s), \end{aligned}$$

and let

$$\begin{aligned} \tilde{v}(Y,s):=G(X_Q,16l(Q)^2,Y,s). \end{aligned}$$

Using (8.11) we see that

$$\begin{aligned} 0= & {} \int _{\mathbb R^{n+2}_+} \bigl ((-\partial _s+\mathcal {L}_{Y,s})G(X_Q,16l(Q)^2,Y,s)\bigr )\Psi (Y,s)\, dYds\nonumber \\= & {} \int _{\mathbb R^{n+2}_+} \bigl ((-\partial _s+\mathcal {L}_{Y,s})\tilde{v}(Y,s)\bigr )\Psi (Y,s)\, dYds. \end{aligned}$$

Using this identity, and integrating by parts, we see that

$$\begin{aligned} I:= & {} \int _{\mathbb R^{n+1}} \Psi (Y,s)|_{y_{n+1}=0}K(X_Q,16l(Q)^2,y,s)\, dyds\nonumber \\= & {} \int _{\mathbb R^{n+2}_+} \bigl ((\partial _s+\mathcal {L}_{Y,s})\Psi (Y,s)\bigr )\tilde{v}(Y,s)\, dYds. \end{aligned}$$
(8.21)

We will now use the identity in (8.21) to prove Theorem 1.3. Indeed,

$$\begin{aligned} (\partial _s+\mathcal {L}_{Y,s})\Psi= & {} \partial _s\Psi -\mathop {{\text {div}}}\nolimits (A\nabla _Y\Psi )\nonumber \\= & {} \partial _{y_{n+1}}v\partial _s\phi -\mathop {{\text {div}}}\nolimits ((\partial _{y_{n+1}}v)A\nabla _Y\phi )-A\nabla _Y \partial _{y_{n+1}}v\cdot \nabla _Y\phi \nonumber \\&+\,\phi (\partial _s\partial _{y_{n+1}}v-\mathop {{\text {div}}}\nolimits (A\nabla _Y\partial _{y_{n+1}}v)). \end{aligned}$$
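
This expansion is the product rule applied to \(\Psi =\phi w\), where \(w:=\partial _{y_{n+1}}v\) is an abbreviation introduced only for this check:

$$\begin{aligned} \partial _s(\phi w)= & {} w\partial _s\phi +\phi \partial _sw,\nonumber \\ \mathop {{\text {div}}}\nolimits (A\nabla _Y(\phi w))= & {} \mathop {{\text {div}}}\nolimits (wA\nabla _Y\phi )+\mathop {{\text {div}}}\nolimits (\phi A\nabla _Yw)\nonumber \\= & {} \mathop {{\text {div}}}\nolimits (wA\nabla _Y\phi )+A\nabla _Yw\cdot \nabla _Y\phi +\phi \mathop {{\text {div}}}\nolimits (A\nabla _Yw). \end{aligned}$$

Subtracting the second identity from the first gives the four terms displayed above.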

The key observation is, as A is independent of \(y_{n+1}\), that

$$\begin{aligned} \partial _s\partial _{y_{n+1}}v-\mathop {{\text {div}}}\nolimits (A\nabla _Y\partial _{y_{n+1}}v)= & {} \partial _{y_{n+1}}\bigl (\partial _sv-\mathop {{\text {div}}}\nolimits (A\nabla _Yv)\bigr )\nonumber \\= & {} \partial _{y_{n+1}}\bigl (((-\partial _s+\mathcal {L}_Y)G)(X_Q,0,Y,-s)\bigr )=0, \end{aligned}$$

on the support of \(\phi \). This is due to the presence of the minus sign in front of s in \(G(X_Q,0,Y,-s)\). Hence, using (8.21) and elementary manipulations, we see that

$$\begin{aligned} I=I_1+I_2-I_3, \end{aligned}$$

where

$$\begin{aligned} I_1:= & {} \int _{\mathbb R^{n+2}_+} \partial _{y_{n+1}}G(X_Q,0,Y,-s)(\partial _s\phi (Y,s))G(X_Q,16l(Q)^2,Y,s)\, dYds,\nonumber \\ I_2:= & {} \int _{\mathbb R^{n+2}_+} \partial _{y_{n+1}}G(X_Q,0,Y,-s)(A\nabla _Y\phi )\cdot \nabla _YG(X_Q,16l(Q)^2,Y,s)\, dYds,\nonumber \\ I_3:= & {} \int _{\mathbb R^{n+2}_+} (A\nabla _Y \partial _{y_{n+1}}G(X_Q,0,Y,-s)\cdot \nabla _Y\phi )G(X_Q,16l(Q)^2,Y,s)\, dYds. \end{aligned}$$
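
The elementary manipulations can be sketched as follows, assuming (as is standard for Green’s functions, though not restated here) that \(\tilde{v}(Y,s)=G(X_Q,16l(Q)^2,Y,s)\) vanishes on \(\{y_{n+1}=0\}\). Since the last term in the expansion of \((\partial _s+\mathcal {L}_{Y,s})\Psi \) vanishes on the support of \(\phi \), (8.21) gives

$$\begin{aligned} I= & {} \int _{\mathbb R^{n+2}_+}(\partial _{y_{n+1}}v)(\partial _s\phi )\tilde{v}\, dYds-\int _{\mathbb R^{n+2}_+}\mathop {{\text {div}}}\nolimits ((\partial _{y_{n+1}}v)A\nabla _Y\phi )\tilde{v}\, dYds\nonumber \\&-\int _{\mathbb R^{n+2}_+}(A\nabla _Y\partial _{y_{n+1}}v\cdot \nabla _Y\phi )\tilde{v}\, dYds. \end{aligned}$$

The first and third integrals are \(I_1\) and \(-I_3\). In the second, integration by parts produces no boundary term, because \(\phi \) has compact support and \(\tilde{v}=0\) on \(\{y_{n+1}=0\}\), so that

$$\begin{aligned} -\int _{\mathbb R^{n+2}_+}\mathop {{\text {div}}}\nolimits ((\partial _{y_{n+1}}v)A\nabla _Y\phi )\tilde{v}\, dYds=\int _{\mathbb R^{n+2}_+}(\partial _{y_{n+1}}v)(A\nabla _Y\phi )\cdot \nabla _Y\tilde{v}\, dYds=I_2. \end{aligned}$$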

Recall that \(\phi \) satisfies (8.18)–(8.20) and let \(E=\mathbb R^{n+2}_+\cap \overline{\{(Y,s): \phi (Y,s)\ne 0\}}\). Using this,

$$\begin{aligned} |I_1|\le & {} cl(Q)^{-2}\int _{E} |\partial _{y_{n+1}}G(X_Q,0,Y,-s)||G(X_Q,16l(Q)^2,Y,s)|\, dYds,\nonumber \\ |I_2|\le & {} cl(Q)^{-1}\int _{E} |\partial _{y_{n+1}}G(X_Q,0,Y,-s)||\nabla _YG(X_Q,16l(Q)^2,Y,s)|\, dYds,\nonumber \\ |I_3|\le & {} cl(Q)^{-1}\int _{E} |\nabla _Y \partial _{y_{n+1}}G(X_Q,0,Y,-s)||G(X_Q,16l(Q)^2,Y,s)|\, dYds. \end{aligned}$$

Hence, using energy estimates and Gaussian bounds for the fundamental solution we deduce

$$\begin{aligned} |I|\le |I_1|+|I_2|+|I_3|\le c|Q|^{-1}. \end{aligned}$$

Using this and (8.21) we see that

$$\begin{aligned} \int _{-l(Q)^2/2}^{l(Q)^2/2}\int _{\hat{Q}} K(X_Q,0,y,-s)K(X_Q,16l(Q)^2,y,s)\, dyds\le c|Q|^{-1}. \end{aligned}$$
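
The passage from the bound on I to this estimate can be sketched as follows, under the additional assumptions (compatible with (8.18)–(8.20), but not stated explicitly above) that \(0\le \phi \le 1\) and that the Poisson kernels are non-negative. By the definition of K we have \(\Psi |_{y_{n+1}=0}=\phi (y,0,s)K(X_Q,0,y,-s)/a_{n+1,n+1}(y)\), and \(0<a_{n+1,n+1}\le \Lambda \) by (1.2). Since \(\phi (y,0,s)=1\) on \(\hat{Q}\times [-l(Q)^2/2,l(Q)^2/2]\) by (8.18),

$$\begin{aligned}&\Lambda ^{-1}\int _{-l(Q)^2/2}^{l(Q)^2/2}\int _{\hat{Q}} K(X_Q,0,y,-s)K(X_Q,16l(Q)^2,y,s)\, dyds\nonumber \\&\quad \le \int _{\mathbb R^{n+1}}\phi (y,0,s)\frac{K(X_Q,0,y,-s)K(X_Q,16l(Q)^2,y,s)}{a_{n+1,n+1}(y)}\, dyds=I\le c|Q|^{-1}. \end{aligned}$$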

Hence, using (8.16) and (8.17) we can conclude that

$$\begin{aligned} \int _{Q}(K(A_Q,y,s))^2\, dyds\le c|Q|^{-1}, \end{aligned}$$
(8.22)

whenever \(Q\subset \mathbb R^{n+1}\) is a parabolic cube, for a constant \(c\ge 1\) depending only on n and \(\Lambda \). Putting these estimates together, Theorem 1.3 follows.