1 Introduction

The Bernstein method [7] is nowadays a classical technique to obtain gradient estimates for second order elliptic equations [26]. The core idea behind this approach is extremely simple and relies on the so-called Bochner identity for the Laplacian on the Euclidean space \(\mathbb {R}^N\):

$$\begin{aligned} \Delta |\nabla u|^2=2|D^2u|^2+2\nabla u\cdot \nabla \Delta u. \end{aligned}$$

This in particular shows that if u is a solution to an elliptic/parabolic equation on some domain of the Euclidean space, then \(w=|\nabla u|^2\) is a subsolution to an elliptic/parabolic equation. As a consequence, the gradient bound follows from the maximum principle, whenever one knows a priori that \(\nabla u\) is bounded on the boundary of the domain, which is typically a consequence of the existence of barrier-like functions, cf [26]. This nonvariational technique to obtain global (and even local) bounds based on the maximum principle has been fruitfully extended to many nonlinear PDEs, such as quasilinear elliptic equations [26], fully nonlinear second order equations [12], semilinear equations (with superlinear gradient growth), even driven by the p-Laplacian, see e.g. [33, 40] and the more recent work [37], integro-differential problems, both linear and (fully) nonlinear, see [11]. This approach is also the cornerstone to deduce quite different qualitative and quantitative properties for elliptic and parabolic equations, such as differential Harnack estimates and Liouville theorems, see [24, 34, 35].

One of the main drawbacks of the standard Bernstein method lies in the regularity requirements necessary to carry out the computations. Since third derivatives appear in the Bochner identity, u needs to be smooth enough (e.g. of class \(C^3\)), and hence only a priori estimates can be derived. Then one has to find a suitable regularization/approximation of the equation having smooth enough solutions to really obtain the regularity estimate after passing to the limit, see e.g. [18, Remark 3], or instead work at the level of difference quotients starting with suitable weak solutions, cf [16].

This difficulty was partially circumvented by the so-called weak Bernstein method, introduced by Barles [2, 3] in the realm of fully nonlinear equations, which consists in shifting the attention, after a change of variable, to the maximum of the function

$$\begin{aligned} (x,y)\longmapsto u(x)-u(y)-L|x-y|\ ,(x,y)\in {\overline{\Omega }}\times {\overline{\Omega }}, \end{aligned}$$

\(\Omega \) being the state space. The idea is to prove that if the maximum is achieved at \(x=y\) for L large enough, then \(|\nabla u|\le L\). This basically corresponds to looking at the equation satisfied by \(|\nabla u|^2\), and the structure conditions are similar to those required to run the classical Bernstein argument for nonlinear elliptic equations. There are, among others, three peculiar features of this method: it does not require regular solutions (in particular it applies to viscosity solutions), it does not need strong ellipticity, being applicable to problems with fractional or degenerate diffusion [5, 13], and it allows one to treat gradient terms with arbitrary growth. Nonetheless, it still requires the right-hand side f of the equation to belong to \(W^{1,\infty }\), as in the standard Bernstein argument.

Another approach in the framework of viscosity solutions based again on a slightly different doubling variables method has been introduced by Ishii and Lions [31] to obtain \(C^{\alpha }\), \(0<\alpha \le 1\), estimates: this method takes advantage of the ellipticity of the diffusion to control the coercivity of the gradient term, being particularly designed for problems with first-order terms below the natural growth and Hölder or bounded coefficients, cf Assumption (3.16) in [31], see e.g. [4, 32] and also [22, 36, 38] for further developments. A general reference discussing both these procedures to get gradient bounds in the context of viscosity solutions is the paper by Barles and Souganidis [6].

Viscosity solutions’ techniques have been refined for the application to problems with superquadratic gradient growth in [13]. Nevertheless, both these methods have a drawback in terms of the regularity of the data. Being based on the notion of viscosity solution, they rely on the maximum principle and they require at least continuous or Lipschitz data. Albeit the notion of viscosity solution admits variants that allow to encompass discontinuous Hamiltonians with merely summable data (at least when dealing with local terms), no technique seems available to derive gradient bounds for problems with fractional diffusion, strongly coercive gradient terms (especially in the supernatural regime) and data in \(L^q\).

To this aim, different methods have been explored when the equation has “ingredients” belonging to Lebesgue spaces. Such techniques, usually named integral Bernstein methods, are again based on delicate integral refinements of the Bochner identity and started with the work by Lions [34]. Equivalently, they can be formulated at the level of the variational formulation of the equation, choosing a p-Laplacian of suitable order as a test function. These integral approaches have been extended more recently in [18] (see also the references therein), in the study of nonlinear Calderón–Zygmund estimates for elliptic problems with superlinear gradient terms. They have been used even in connection with p-Laplacian problems without the presence of first-order terms [14, 37]. Most of these integral techniques for nonlinear equations with power-growth terms have in common the use of the so-called Bakry–Émery curvature dimensional inequality [1], that in \(\mathbb {R}^N\) reads as

$$\begin{aligned} |D^2u|^2\ge \frac{1}{N}(\Delta u)^2, \end{aligned}$$
(1)

and it is a consequence of the Cauchy–Schwarz inequality, cf [24, Definition 20.7]. This inequality is, among others, a powerful tool to obtain logarithmic gradient estimates for positive harmonic functions, and hence the Harnack inequality. In the nonlinear setting, it is crucial to handle nonlinearities with superlinear gradient growth, at least for elliptic problems. Indeed, if u solves \(-\Delta u+|\nabla u|^\gamma =f(x)\), \(\gamma >1\), then (1) combined with the algebraic inequality \((a-b)^2\ge \frac{a^2}{2}-2b^2\), \(a,b\in \mathbb {R}\), implies

$$\begin{aligned} |D^2u|^2\ge \frac{|\nabla u|^{2\gamma }}{2N}-\frac{2}{N}f^2, \end{aligned}$$

which allows one to gain an additional degree of coercivity with respect to \(|\nabla u|^2\) through the term \(|\nabla u|^{2\gamma }\) and conclude the gradient bound. This crucial and deep step has (once more) a drawback: it applies neither to equations involving fractional operators (even in the stationary case) nor to evolution equations with time-dependent source terms belonging to Lebesgue spaces. Indeed, as for the latter, if u solves \(\partial _t u-\Delta u+|\nabla u|^\gamma =f(x,t)\), then (1) produces a term involving the time derivative of u that can be absorbed only when one knows a priori that \(\partial _tu \ge -C\). For instance, this is the case when \(f=f(x)\) or even when \(\partial _tf(x,t)\) is essentially bounded or at least belongs to some Lebesgue space.
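For the reader's convenience, the lower bound on \(|D^2u|^2\) in terms of \(|\nabla u|^{2\gamma }\) can be checked as follows. This is only a sketch, assuming that u is smooth enough for the stationary equation \(-\Delta u+|\nabla u|^\gamma =f\) to hold pointwise. The first line recovers (1) by applying the Cauchy–Schwarz inequality to the diagonal entries of the Hessian; the second line uses the equation to replace \(\Delta u\).

$$\begin{aligned} (\Delta u)^2&=\Big (\sum _{i=1}^N\partial _{x_ix_i}u\Big )^2\le N\sum _{i=1}^N(\partial _{x_ix_i}u)^2\le N|D^2u|^2,\\ |D^2u|^2&\ge \frac{1}{N}(\Delta u)^2=\frac{1}{N}\big (|\nabla u|^{\gamma }-f\big )^2\ge \frac{1}{N}\Big (\frac{|\nabla u|^{2\gamma }}{2}-2f^2\Big ). \end{aligned}$$

The last step is the algebraic inequality with \(a=|\nabla u|^\gamma \) and \(b=f\), which follows from Young's inequality \(2ab\le \frac{a^2}{2}+2b^2\). In the parabolic case the same substitution reads \(\Delta u=\partial _tu+|\nabla u|^\gamma -f\), which is how the time derivative of u enters the estimate.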

The case of the presence of a fractional diffusion is even worse. In this setting the fractional Bochner identity, cf [20, equation (2.10)], replaces \(|D^2u|^2\) with the nonlocal term

$$\begin{aligned} \int \frac{|\nabla u(x,t)-\nabla u(x+y,t)|^2}{|y|^{N+2s}}\,dy, \end{aligned}$$

which however does not allow one to deduce a fractional version of (1). Actually, the question of whether an inequality like (1) (possibly with a different constant) holds for the fractional Laplacian was raised in [24, equation (20.14)], and recently answered negatively in [41]. This shows that some new technique with respect to the “classical” integral Bernstein approach is needed to obtain gradient bounds for fractional equations with merely integrable data.

The aim of this note is thus to propose a new Bernstein-type argument to prove Lipschitz estimates that avoids the use of (1), namely the gain of (additional) coercivity by plugging the equation, to deduce gradient bounds in the aforementioned “negative” situations, i.e. when the equation is parabolic and/or it presents a nonlocal diffusive term with also unbounded terms in Lebesgue spaces. This would imply existence and uniqueness of solutions as a byproduct through the contraction mapping principle combined with a continuation argument, see [15].

We will focus on the Cauchy problem (for simplicity posed on the N-dimensional flat torus)

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _tu+{\mathcal {L}}u+H(x,\nabla u)=f(x,t)&{}\text { in }Q_T:={\mathbb {T}}^N\times (0,T),\\ u(x,0)=u_0(x)&{}\text { in }{\mathbb {T}}^N , \end{array}\right. } \end{aligned}$$
(2)

where \({\mathcal {L}}\) is a diffusion operator defined by

$$\begin{aligned} {\mathcal {L}}:=-\varepsilon \Delta +\mu (-\Delta )^s \end{aligned}$$

with \(\varepsilon ,\mu >0\). Such mixed diffusive operators have received increasing attention in recent years, see e.g. the series of works by Biagi–Dipierro–Valdinoci–Vecchi [8, 9] and the recent one by De Filippis–Mingione [21]. The achievement of \(L^\infty \) gradient bounds will rely heavily on the analysis of the regularity of solutions to the dual equation of the linearization of (2), and it is largely inspired by a method introduced by Evans [23] to study gradient shock structures of first-order Hamilton–Jacobi equations with non-convex Hamiltonians. Here, we consider

$$\begin{aligned} {\left\{ \begin{array}{ll} -\partial _t\rho +{\mathcal {L}}^*\rho -\textrm{div}(H_p(x,\nabla u)\rho )=0&{}\text { in }Q_\tau :={\mathbb {T}}^N\times (0,\tau ),\\ \rho (x,\tau )=\rho _\tau (x)&{}\text { in }{\mathbb {T}}^N , \end{array}\right. } \end{aligned}$$
(3)

where \(H_p\) stands for the derivative of H with respect to the second entry and

$$\begin{aligned} {\mathcal {L}}^*:=-\varepsilon \Delta +\mu (-\Delta )^s. \end{aligned}$$

We emphasize that such an approach has been recently implemented for viscous Hamilton–Jacobi equations with coercive gradient terms in [16], from which we borrow most of the ideas used in this note. In particular, we exploit the regularizing effect of both the local and the first-order term, and hence the method can be considered of nonperturbative type, whilst the nonlocal one acts only as a perturbation. The main idea to derive the gradient bound will be based on a variational nonlocal version of a (nonlinear) Bochner identity. Indeed, if u solves (2), straightforward computations lead to the following identity satisfied by \(w=\frac{1}{2}|\nabla u|^2\):

$$\begin{aligned} &\partial _t w(x,t)+{\mathcal {L}}w+\varepsilon |D^2u(x,t)|^2+\frac{\mu }{2}\int _{{\mathbb {T}}^N}|\nabla u(x,t)-\nabla u(x+y,t)|^2K(y)\,dy\\ &\quad +H_p(x,\nabla u(x,t))\cdot \nabla w(x,t)+H_x(x,\nabla u(x,t))\cdot \nabla u(x,t)=\nabla f(x,t)\cdot \nabla u(x,t), \end{aligned}$$

where K is the kernel of the fractional Laplacian on the torus, cf [39]. Then, the central point is to test the previous identity for w against the function \(\rho \) solving the dual problem (3), see Lemma 6.1, and handle all the integral terms through a delicate interplay between integrability estimates for transport equations, Sobolev and Young’s inequalities. Similar duality methods were previously used in various contexts for local problems, see e.g. [29, 42], and even to deduce semiconcavity and Lipschitz bounds for equations with mixed diffusion and regular data in [15] in the context of parabolic fractional Mean Field Games. This procedure applies, with suitable modifications, even to stationary problems of the form

$$\begin{aligned} {\mathcal {L}}u+u+H(x,\nabla u)=f(x)\text { in }{\mathbb {T}}^N \end{aligned}$$

through the study of the regularity properties of its dual counterpart

$$\begin{aligned} {\mathcal {L}}^*\rho +\rho -\textrm{div}(H_p(x,\nabla u)\rho )=0\text { in }{\mathbb {T}}^N, \end{aligned}$$

see e.g. [27]. Moreover, the technique of the present paper applies with few modifications to problems driven by the more general local-nonlocal operator

$$\begin{aligned} {\mathcal {L}}_{A,b,c,s}u=-\textrm{Tr}(A(x,t)D^2u)+b(x,t)\cdot Du+c(x,t)u+(-\Delta )^su, \end{aligned}$$

under suitable regularity assumptions on the coefficients, and even to equations with more general integro-differential operators for which a duality theory holds, cf [25]. We emphasize once more that, though one expects the same results as in the case of a purely local diffusion, the Laplacian being the dominating part of the diffusive term, obtaining a gradient estimate through the standard methods, such as the Bernstein one, is by no means immediate even for the stationary problem involving the mixed operator.

We mention that refinements of the Bernstein technique in the nonlocal setting have been proposed quite recently in the manuscript [11], which develops the Bernstein method for integro-differential equations (even fully nonlinear) without lower-order terms, and also in [19, 20] to study gradient bounds for solutions to some different nonlocal models. A recent study that provides oscillation estimates through viscosity solutions’ methods can be found in [36]. Nonetheless, we believe that such formulations do not allow to treat problems with power-growth nonlinearities and \(L^q\) data as those appearing in the PDEs of the present paper, being of nonvariational nature. We also mention the possible application of our techniques to study regularity properties for some nonlocal problems, where a gradient nonlinearity with polynomial growth appears, such as those arising in combustion theory [30] driven by the operator \(-\Delta -(-\Delta )^s\), that will be the matter of future research.

Plan of the paper. Section 2 is devoted to state the assumptions and the main results of the paper. Section 3 provides some preliminary algebraic identities to implement the Bernstein argument. Section 4 concerns regularity properties of transport diffusion equations with general velocity field b driven by the mixed operator \(-\Delta +(-\Delta )^s\). Section 5 contains some estimates for solutions to (4) via the results in Sect. 4. Section 6 is devoted to the proof of the main result.

2 Main result

Throughout this manuscript, the state space will be \({\mathbb {T}}^N\), the N-dimensional flat torus. We denote by \(L^p({\mathbb {T}}^N)\) the space of all measurable and periodic functions on \(\mathbb {R}^N\) belonging to \(L^p_{\textrm{loc}}(\mathbb {R}^N)\), endowed with the norm \(\Vert \cdot \Vert _p=\Vert \cdot \Vert _{L^p((0,1)^N)}\). For positive \(\mu \in \mathbb {R}\), \(W^{\mu ,p}({\mathbb {T}}^N)\) is the standard fractional Sobolev space of functions on the flat torus, while \(H_p^\mu ({\mathbb {T}}^N)\) denotes the Bessel potential space, i.e. the space of distributions u such that \((I-\Delta )^\frac{\mu }{2}u\in L^p({\mathbb {T}}^N)\). For any time interval \((0,t) \subseteq \mathbb {R}\), let \(Q_{t}:={\mathbb {T}}^N\times (0, t)\). For any \(p\ge 1\), we denote by \(W^{2,1}_p(Q_t)\) the space of functions u such that \(\partial _t^{r}D^{\beta }_xu\in L^p(Q_t)\) for all multi-indices \(\beta \) and all integers \(r\ge 0\) such that \(|\beta |+2r\le 2\), endowed with the norm

$$\begin{aligned} \Vert u\Vert _{W^{2,1}_p(Q_t)}=\left( \iint _{Q_t}\sum _{|\beta |+2r\le 2}|\partial _t^{r}D^{\beta }_x u|^pdxdt\right) ^{\frac{1}{p}}. \end{aligned}$$

Similarly, the space \(W^{1,0}_p(Q_t)\) is equipped with the norm

$$\begin{aligned} \left\Vert u\right\Vert _{W^{1,0}_p(Q_t)}:=\Vert u\Vert _{L^p(Q_t)}+\sum _{|\beta |=1}\Vert D_x^{\beta }u\Vert _{L^p(Q_t)}\ . \end{aligned}$$

We define the space \({\mathcal {H}}_p^{1}(Q_t)\) as the space of functions \(u\in W^{1,0}_p(Q_t)\) with \(\partial _tu\in (W^{1,0}_{p'}(Q_t))'\) and norm

$$\begin{aligned} \Vert u\Vert _{{\mathcal {H}}_p^{1}(Q_t)}:=\left\Vert u\right\Vert _{W^{1,0}_p(Q_t)}+\Vert \partial _tu\Vert _{(W^{1,0}_{p'}(Q_t))'}\ . \end{aligned}$$

We consider the following Cauchy problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _tu+{\mathcal {L}}u+H(x,\nabla u)=f(x,t)&{}\text { in }Q_T,\\ u(x,0)=u_0(x)&{}\text { in }{\mathbb {T}}^N. \end{array}\right. } \end{aligned}$$
(4)

Here, without loss of generality, we suppose \(\varepsilon =\mu =1\) and consider the diffusion operator

$$\begin{aligned} {\mathcal {L}}:=-\Delta +(-\Delta )^s ,\qquad s\in (0,1). \end{aligned}$$

We will also denote by \(H_x\) and \(H_p\) the derivatives of H with respect to the first and second entry respectively. Moreover, we assume that \(H\in C^1({\mathbb {T}}^N\times \mathbb {R}^N)\) is convex and satisfies

$$\begin{aligned} \begin{aligned}&\text {there exist constants}\, \gamma> 1\, \text {and}\, C_H>0 \text { such that }\\&\qquad C_H^{-1}|p|^{\gamma }-C_H\le H(x,p)\le C_H(|p|^{\gamma }+1) ,\\&\qquad H_p(x,p)\cdot p-H(x,p)\ge C^{-1}_H|p|^{\gamma }-C_H , \\&\qquad |H_x(x,p)|\le C_H(|p|^{\gamma }+1) , \\&\qquad C^{-1}_H|p|^{\gamma -1}- {C}_H \le |H_p(x,p)|\le C_H|p|^{\gamma -1}+{C}_H , \end{aligned} \end{aligned}$$
(H)

and f will be some source term controlled in some space-time Lebesgue class \(L^q(Q_T)\), for some suitable \(q>1\). Our main result shows the preservation of the Lipschitz regularity in the equation via a quantitative bound.

Theorem 2.1

Suppose that

  (i) \(s\in (0,1)\);

  (ii) \(H\in C^1({\mathbb {T}}^N\times \mathbb {R}^N)\) and satisfies (H);

  (iii) \(f\in C^1(Q_T)\);

  (iv) \(u_0\in W^{1,\infty }({\mathbb {T}}^N)\).

Let

$$\begin{aligned} q> {\left\{ \begin{array}{ll} N+2&{}\text { if }1<\gamma \le 3\\ \frac{(N+2)(\gamma -1)}{2}&{}\text { if }\gamma >3. \end{array}\right. } \end{aligned}$$
(5)

Then, there exists a constant C depending on \(q,N,T,s,\gamma ,C_H,\Vert u_0\Vert _{W^{1,\infty }({\mathbb {T}}^N)},\Vert f\Vert _{L^q(Q_T)}\) such that every smooth solution to (4) satisfies

$$\begin{aligned} \Vert u(\cdot ,t)\Vert _{W^{1,\infty }({\mathbb {T}}^N)}\le C,\ t\in [0,T]. \end{aligned}$$

Let us stress that, though we require \(f\in C^1(Q_T)\), the gradient bound depends on f only through its norm in \(L^q(Q_T)\), so it can be regarded as an a priori estimate. In particular, one can avoid imposing \(f\in C^1(Q_T)\) by implementing a scheme for strong solutions belonging to \(W^{2,1}_q(Q_T)=\{u:\partial _t u,u,\nabla u,D^2u\in L^q(Q_T)\}\): this can be done using a test function argument as detailed in Remark 6.2, cf [18], or using a procedure through difference quotients [16], which in a sense avoids the differentiation of the equation. It is worth remarking that when the source term is essentially bounded, i.e. \(f\in L^\infty (Q_T)\), the summability condition (5) holds, and the results appear to be new even in this framework. Indeed, as already discussed, Lipschitz bounds from the theory of viscosity solutions require the right-hand side of the equation to be at least continuous and time-independent or even more regular (e.g. Lipschitz), so that an estimate on \(\partial _t u\) readily follows by the maximum principle. We finally emphasize that a regularity estimate starting from a continuous initial datum and suitable weak solutions can be obtained working at the level of difference quotients, as already done first in [16] for Lipschitz regularity and then in [17, 28] for Hölder regularization properties.

We conclude by saying that this nonlinear duality method readily leads to a new proof of the gradient bound for strong solutions to an equation without the local diffusion term (i.e. with \(\varepsilon =0\)) when one imposes \(f\in L^\infty (0,T;W^{1,\infty }({\mathbb {T}}^N))\), which is a classical assumption in the theory of viscosity solutions. This provides an alternative method of proof of the gradient bounds in [5, Section 3] (although in a stronger framework than the viscosity one and for viscous problems). Still, even when \(f\in L^q(0,T;W^{1,q}({\mathbb {T}}^N))\) the results would be new, as the assumptions on the right-hand side can be considered as intermediate between those typically assumed to implement the weak Bernstein argument and the Ishii-Lions method, see Remark 6.3.

3 Preliminary results

For \(s\in (0,1)\) and \(u\in C^\infty ({\mathbb {T}}^N)\) we recall that the following pointwise identity holds (cf [39])

$$\begin{aligned} (-\Delta )^s u=\text {P.V.}\int _{{\mathbb {T}}^N}(u(x)-u(x+y))K(y)\,dy, \end{aligned}$$

where

$$\begin{aligned} K(y)=c_{N,s}\sum _{k\in {\mathbb {Z}}^N}\frac{1}{|y-2\pi k|^{N+2s}},\quad y\in {\mathbb {T}}^N,\ y\ne 0, \end{aligned}$$

and \(c_{N,s}\) is a normalizing positive constant depending on N and s, see [39] for the definition. The main result needed to implement our Bernstein-type estimate is the following:

Proposition 3.1

If u is a smooth solution to (4), then the evolution of \(w=\frac{1}{2}|\nabla u(x,t)|^2\) is described by the equation

$$\begin{aligned} &\partial _t w(x,t)-\Delta w+(-\Delta )^sw+|D^2u(x,t)|^2+\frac{1}{2}\int _{\Omega }|\nabla u(x,t)-\nabla u(x+y,t)|^2K(y)\,dy\\ &\qquad +H_p(x,\nabla u(x,t))\cdot \nabla w(x,t)+H_x(x,\nabla u(x,t))\cdot \nabla u(x,t)\\ &\quad =\nabla f(x,t)\cdot \nabla u(x,t)\text { in }\Omega \times (0,T) \end{aligned}$$
(6)

equipped with the initial condition \(w(x,0)=\frac{1}{2}|\nabla u(x,0)|^2\), where \(\Omega ={\mathbb {T}}^N\) or \(\mathbb {R}^N\) and \(|D^2u(x,t)|^2=\sum _{i,j=1}^N(\partial _{x_ix_j}u)^2\).

Before proving this, we need the following Bochner (pointwise) identity for the mixed local-nonlocal operator \(-\Delta +(-\Delta )^s\). This extends an identity already pointed out in [20, equation (2.10)].

Lemma 3.2

Let \(w(x,t)=\frac{1}{2}|\nabla u(x,t)|^2\). Then w satisfies the identity

$$\begin{aligned} \Delta w(x,t)-(-\Delta )^s w(x,t)&=\nabla u(x,t)\cdot \nabla (\Delta u(x,t))+\nabla u(x,t)\cdot \nabla (-(-\Delta )^s u(x,t))\\ &\quad +|D^2u(x,t)|^2+\frac{1}{2}\int _{\Omega }|\nabla u(x,t)-\nabla u(x+y,t)|^2K(y)\,dy, \end{aligned}$$
(7)

where \(\Omega ={\mathbb {T}}^N\) or \(\mathbb {R}^N\).

Proof

Standard algebraic computations give

$$\begin{aligned} \partial _{x_j}w=\sum _{i=1}^N\partial _{x_i}u\partial _{x_ix_j}u\ ;\partial _{x_jx_j}w=\sum _{i=1}^N[(\partial _{x_ix_j}u)^2+\partial _{x_i}u\partial _{x_ix_jx_j}u], \end{aligned}$$

so, summing over j we have

$$\begin{aligned} \Delta w(x,t)=|D^2u(x,t)|^2+\nabla u(x,t)\cdot \nabla \Delta u(x,t). \end{aligned}$$

Moreover, from [19, Proposition 2.1] or [24, Lemma 20.2], for a sufficiently smooth function v and with the notation \(\Delta ^s:=-(-\Delta )^s\), we have

$$\begin{aligned} \frac{1}{2}\Delta ^s(v^2(x,t))=v(x,t)\Delta ^s v(x,t)+\frac{1}{2}\int _{\Omega }(v(x,t)-v(x+y,t))^2K(y)\,dy. \end{aligned}$$

Applying the above identity to each partial derivative \(v=\partial _{x_i} u\), \(i=1,\dots ,N\), and summing the resulting expressions over i, we get the conclusion.
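The nonlocal product identity quoted in the proof can also be checked directly from the pointwise representation of \((-\Delta )^s\). The following is a formal sketch: principal values are omitted, v is assumed smooth, and \(\Delta ^s\) is read as \(-(-\Delta )^s\), which is the convention under which the quoted formula is consistent with (7).

$$\begin{aligned} (-\Delta )^s(v^2)(x)-2v(x)(-\Delta )^sv(x)&=\int _{\Omega }\big [v^2(x)-v^2(x+y)-2v(x)\big (v(x)-v(x+y)\big )\big ]K(y)\,dy\\ &=-\int _{\Omega }\big (v(x)-v(x+y)\big )^2K(y)\,dy. \end{aligned}$$

Dividing by \(-2\) gives \(\frac{1}{2}\Delta ^s(v^2)=v\Delta ^sv+\frac{1}{2}\int _{\Omega }(v(x)-v(x+y))^2K(y)\,dy\). The nonnegative integral plays the same role for the fractional part as \(|D^2u|^2\) does for the Laplacian.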

Proof of Proposition 3.1

It is enough to differentiate (4) with respect to \(x_j\), multiply the resulting equation by \(\partial _{x_j}u\), sum over \(j=1,\dots ,N\) and finally use (7).
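Spelled out, the computation may be sketched as follows (arguments \((x,t)\) are suppressed, and \(H_{x_j}\) denotes the j-th component of \(H_x\), a notation introduced here only for this sketch). The first line is (4) differentiated in \(x_j\); the second is obtained after multiplying by \(\partial _{x_j}u\) and summing over j, using \(\sum _j\partial _{x_j}u\,\nabla \partial _{x_j}u=\nabla w\).

$$\begin{aligned} &\partial _t\partial _{x_j}u-\Delta \partial _{x_j}u+(-\Delta )^s\partial _{x_j}u+H_p(x,\nabla u)\cdot \nabla \partial _{x_j}u+H_{x_j}(x,\nabla u)=\partial _{x_j}f,\\ &\partial _tw-\nabla u\cdot \nabla \Delta u+\nabla u\cdot \nabla (-\Delta )^su+H_p(x,\nabla u)\cdot \nabla w+H_x(x,\nabla u)\cdot \nabla u=\nabla f\cdot \nabla u. \end{aligned}$$

By (7), the two middle terms equal \(-\Delta w+(-\Delta )^sw+|D^2u|^2+\frac{1}{2}\int _{\Omega }|\nabla u(x,t)-\nabla u(x+y,t)|^2K(y)\,dy\), and substituting this expression yields (6).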

We end this section with a maximal \(L^q\)-regularity property for the heat equation with mixed diffusion.

Lemma 3.3

Let \(q>1\), \(V\in L^q(Q_T)\) and \(w_0\in W^{2-\frac{2}{q},q}({\mathbb {T}}^N)\). Then, there exists a unique strong solution \(w\in W^{2,1}_q(Q_T)\) of the evolution problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _tw-\Delta w+(-\Delta )^sw=V(x,t)&{}\text { in }Q_T:={\mathbb {T}}^N\times (0,T),\\ w(x,0)=w_0(x)&{}\text { in }{\mathbb {T}}^N. \end{array}\right. } \end{aligned}$$

Moreover, the following estimate holds

$$\begin{aligned} \Vert w\Vert _{W^{2,1}_q(Q_T)}\le C(\Vert V\Vert _{L^q(Q_T)}+\Vert w_0\Vert _{W^{2-\frac{2}{q},q}({\mathbb {T}}^N)}), \end{aligned}$$

where C depends on s, T, q, N and remains bounded for bounded values of T.

Proof

We use a contraction mapping argument on the space \({\mathcal {C}}:=\{z\in W^{2,1}_q(Q_T):z(\cdot ,0)=w_0\}\), following [25, Theorem 3.7]. For fixed \(z\in {\mathcal {C}}\), we consider the map \(\Psi \) that sends z to the solution w of the problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _tw-\Delta w=V(x,t)-(-\Delta )^sz&{}\text { in }Q_T:={\mathbb {T}}^N\times (0,T),\\ w(x,0)=w_0(x)&{}\text { in }{\mathbb {T}}^N. \end{array}\right. } \end{aligned}$$

Applying interpolation inequalities, see e.g. [15, Lemma 2.4], we have for \(\delta >0\)

$$\begin{aligned} \Vert (-\Delta )^sz\Vert _{L^q(Q_T)}\le \delta \Vert z\Vert _{W^{2,1}_q(Q_T)}+C(\delta )\Vert z\Vert _{L^q(Q_T)}, \end{aligned}$$

where \(C(\delta )\) grows as \(\delta \rightarrow 0\). We now write

$$\begin{aligned} z(\cdot ,t)=w_0(\cdot )+\int _0^t\partial _t z(\cdot ,\omega )\,d\omega \end{aligned}$$

and get by Hölder's inequality

$$\begin{aligned} \Vert z\Vert _{L^q(Q_T)}\le T^\frac{1}{q}\Vert w_0\Vert _{L^q({\mathbb {T}}^N)}+T\Vert \partial _tz\Vert _{L^q(Q_T)}. \end{aligned}$$

Applying maximal \(L^q\)-regularity for the heat equation with frozen right-hand side we get

$$\begin{aligned} \Vert w\Vert _{W^{2,1}_q(Q_T)}\le C(\Vert V\Vert _{L^q(Q_T)}+\Vert (-\Delta )^sz\Vert _{L^q(Q_T)}+\Vert w_0\Vert _{W^{2-\frac{2}{q},q}({\mathbb {T}}^N)}), \end{aligned}$$

and then one finds that \(\Psi \) is a contraction on \({\mathcal {C}}\), as done in [25], by taking first \(\delta \) small and then \(T\le T_0\) small enough. Iterating the same procedure finitely many times, one proves the result for any fixed T.
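The Hölder step in the proof above can be spelled out as follows. This is a sketch for a function z with \(z(\cdot ,0)=w_0\) on a time horizon T, with t denoting the time variable; the first line applies Minkowski's and Hölder's inequalities in time, the second takes the \(L^q\) norm over \((0,T)\).

$$\begin{aligned} \Vert z(\cdot ,t)\Vert _{L^q({\mathbb {T}}^N)}&\le \Vert w_0\Vert _{L^q({\mathbb {T}}^N)}+\int _0^t\Vert \partial _tz(\cdot ,\omega )\Vert _{L^q({\mathbb {T}}^N)}\,d\omega \le \Vert w_0\Vert _{L^q({\mathbb {T}}^N)}+t^{1-\frac{1}{q}}\Vert \partial _tz\Vert _{L^q(Q_T)},\\ \Vert z\Vert _{L^q(Q_T)}&\le T^{\frac{1}{q}}\Vert w_0\Vert _{L^q({\mathbb {T}}^N)}+\Big (\int _0^Tt^{q-1}\,dt\Big )^{\frac{1}{q}}\Vert \partial _tz\Vert _{L^q(Q_T)}\le T^{\frac{1}{q}}\Vert w_0\Vert _{L^q({\mathbb {T}}^N)}+T\Vert \partial _tz\Vert _{L^q(Q_T)}. \end{aligned}$$

For the difference of two elements of \({\mathcal {C}}\) the initial datum vanishes, so only the term with the factor T survives; this is what makes the lower-order part of the interpolation inequality small for short times.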

4 A priori estimates for general transport equations with mixed local and nonlocal diffusion

In this section we focus on the following backward problem driven by a general vector field \(b=b(x,t)\)

$$\begin{aligned} {\left\{ \begin{array}{ll} -\partial _t\rho -\Delta \rho +(-\Delta )^s \rho +\textrm{div}(b(x,t)\rho )=0&{}\text { in }Q_\tau ,\\ \rho (x,\tau )=\rho _\tau (x)&{}\text { in }{\mathbb {T}}^N. \end{array}\right. } \end{aligned}$$
(8)

We will also assume that \(\rho _\tau \in C^\infty ({\mathbb {T}}^N)\), \(\rho _\tau \ge 0\) and \(\Vert \rho _\tau \Vert _{L^1({\mathbb {T}}^N)}=1\). We mainly describe the a priori estimates needed to run our Bernstein argument, without discussing the existence and uniqueness of (weak) solutions to this problem, which is however well known thanks to the presence of the heat operator, even when b satisfies the Aronson–Serrin interpolated condition \(b\in L^r_t(L^z_x)\), \(\frac{N}{2z}+\frac{1}{r}\le \frac{1}{2}\), see e.g. [25, Remark 3.7]. Thus, from now on, we will consider classical solutions, even though the argument can be made rigorous for weak energy solutions [16].

The next is a maximal regularity estimate for solutions to (8) in Lebesgue spaces obtained in terms of terminal data belonging to \(L^1({\mathbb {T}}^N)\).

Proposition 4.1

Let \(\rho \) be the nonnegative solution to (8) and let

$$\begin{aligned} 1<q'<\frac{N+2}{N+1}. \end{aligned}$$

Then there exists \(C>0\) depending on \(q',N,s,T\) such that

$$\begin{aligned} \Vert \rho \Vert _{{\mathcal {H}}_{q'}^1(Q_\tau )}\le C(\Vert b\rho \Vert _{L^{q'}(Q_\tau )}+\Vert \rho \Vert _{L^{q'}(Q_\tau )}+\Vert \rho _\tau \Vert _{L^1({\mathbb {T}}^N)})\ . \end{aligned}$$

Proof

The proof can be done following either the duality arguments in [16, Proposition 2.4] or regarding the problem (8) as

$$\begin{aligned} {\left\{ \begin{array}{ll} -\partial _t\rho -\Delta \rho +(-\Delta )^s \rho =-\textrm{div}(b(x,t)\rho )&{}\text { in }Q_\tau :={\mathbb {T}}^N\times (0,\tau ),\\ \rho (x,\tau )=\rho _\tau (x)&{}\text { in }{\mathbb {T}}^N , \end{array}\right. } \end{aligned}$$

and then using, after the time reversal \(t\mapsto \tau -t\), maximal regularity properties for the linear evolution equation \(\partial _t\rho -\Delta \rho +(-\Delta )^s \rho =V(x,t)\in L^q_{x,t}\) as in [28], together with the embeddings of (trace) fractional Sobolev spaces. Maximal regularity for the heat equation with mixed diffusion in Lebesgue spaces holds by Lemma 3.3.

As a consequence, we have the following

Corollary 4.2

Let \(\rho \) be the nonnegative solution to (8) and let

$$\begin{aligned} 1<q'<\frac{N+2}{N+1}. \end{aligned}$$

Then there exists \(C>0\) depending on \(q',N,s,T\) such that

$$\begin{aligned} \Vert \rho \Vert _{{\mathcal {H}}_{q'}^1(Q_\tau )}\le C\left( \iint _{Q_\tau }|b|^{m'}\rho \,dxdt+1\right) . \end{aligned}$$

where

$$\begin{aligned} m'=1+\frac{N+2}{q}. \end{aligned}$$

Proof

The proof follows by applying Young's inequality to the estimate of Proposition 4.1, and it is essentially the same as that of [16, Proposition 2.5], so we omit it.\(\square \)
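Although the proof is omitted, the role of the exponent \(m'\) can be illustrated by the following sketch, which is our reconstruction and not necessarily the argument of [16]. Here m denotes the conjugate exponent of \(m'\), \(p'\) is the parabolic Sobolev exponent associated with \(q'\), and \(\delta >0\) is a small parameter; all three are notation introduced only for this illustration. Writing \(|b|\rho =|b|\rho ^{\frac{1}{m'}}\rho ^{\frac{1}{m}}\) and applying Hölder's and then Young's inequality gives

$$\begin{aligned} \Vert b\rho \Vert _{L^{q'}(Q_\tau )}&\le \Big (\iint _{Q_\tau }|b|^{m'}\rho \,dxdt\Big )^{\frac{1}{m'}}\Vert \rho \Vert _{L^{p'}(Q_\tau )}^{\frac{1}{m}},\qquad \frac{1}{p'}=\frac{1}{q'}-\frac{1}{N+2},\\ &\le \delta \Vert \rho \Vert _{L^{p'}(Q_\tau )}+C_\delta \iint _{Q_\tau }|b|^{m'}\rho \,dxdt. \end{aligned}$$

The Hölder step requires \(\frac{1}{q'}=\frac{1}{m'}+\frac{1}{mp'}\), and a direct computation shows that this holds exactly when \(m'=1+\frac{N+2}{q}\). The term \(\delta \Vert \rho \Vert _{L^{p'}(Q_\tau )}\) is then controlled by \(\Vert \rho \Vert _{{\mathcal {H}}^1_{q'}(Q_\tau )}\) through the parabolic Sobolev embedding and absorbed on the left-hand side of the estimate in Proposition 4.1. The remaining term \(\Vert \rho \Vert _{L^{q'}(Q_\tau )}\) can be treated similarly by interpolating between \(L^1\) and \(L^{p'}\), assuming the conservation of mass \(\Vert \rho (t)\Vert _{L^1({\mathbb {T}}^N)}=1\).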

5 A priori estimates for the Hamilton–Jacobi equation by duality and some consequences

The main goal of this section is to analyze the following transport equation

$$\begin{aligned} {\left\{ \begin{array}{ll} -\partial _t\rho -\Delta \rho +(-\Delta )^s \rho -\textrm{div}(H_p(x,\nabla u(x,t))\rho )=0&{}\text { in }Q_\tau :={\mathbb {T}}^N\times (0,\tau ),\\ \rho (x,\tau )=\rho _\tau (x)&{}\text { in }{\mathbb {T}}^N , \end{array}\right. } \end{aligned}$$
(9)

for \(\tau \in (0,T)\), \(\rho _\tau \in C^\infty ({\mathbb {T}}^N)\), \(\rho _\tau \ge 0\) and \(\Vert \rho _\tau \Vert _{L^1({\mathbb {T}}^N)}=1\). Note that by the standing assumptions, \(\rho \) is a.e. nonnegative on the cylinder. Notice also that it is the adjoint equation of the linearization of (4). From now on, we denote by \(L:{\mathbb {T}}^N\times \mathbb {R}^N\rightarrow \mathbb {R}\) the Legendre transform of H with respect to the second entry, i.e.

$$\begin{aligned} L(x,\nu )=\sup _{p\in \mathbb {R}^N}\{p\cdot \nu -H(x,p)\}. \end{aligned}$$

By the convexity of \(H(x,\cdot )\) it follows that

$$\begin{aligned} H(x,p)=\sup _{\nu \in \mathbb {R}^N}\{p\cdot \nu -L(x,\nu )\} \end{aligned}$$

and

$$\begin{aligned} H(x,p)=p\cdot \nu -L(x,\nu )\iff \nu =H_p(x,p). \end{aligned}$$

We further recall the following properties of the Lagrangian function L valid for all \(\nu \in \mathbb {R}^N\):

$$\begin{aligned} C_L^{-1}|\nu |^{\gamma '}-C_L\le |L(x,\nu )|\le C_L|\nu |^{\gamma '}. \end{aligned}$$
(L1)
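As an illustration (a model case chosen here, not taken from the text), consider the Hamiltonian \(H(x,p)=\frac{1}{\gamma }|p|^\gamma \) with \(\gamma >1\) and \(\gamma '=\frac{\gamma }{\gamma -1}\). Its Legendre transform and the quantity appearing in (H) can be computed explicitly:

$$\begin{aligned} L(x,\nu )&=\frac{1}{\gamma '}|\nu |^{\gamma '},\qquad H_p(x,p)=|p|^{\gamma -2}p,\\ L(x,H_p(x,p))&=H_p(x,p)\cdot p-H(x,p)=\Big (1-\frac{1}{\gamma }\Big )|p|^{\gamma }=\frac{1}{\gamma '}|p|^{\gamma }. \end{aligned}$$

The second line uses \((\gamma -1)\gamma '=\gamma \). In this model case the conditions in (H) hold with, for instance, \(C_H=\max \{\gamma ,\gamma '\}\), and (L1) holds with \(C_L=\gamma '\).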

Theorem 5.1

Let \(u,\rho \) be classical solutions to (4) and (9) respectively, and assume

$$\begin{aligned} q>\max \left\{ \frac{(N+2)(\gamma -1)}{2},N+2\right\} . \end{aligned}$$
(10)

Then, there exists \(C>0\) and \(\theta \in (0,1)\) such that

$$\begin{aligned} \Vert \rho \Vert _{{\mathcal {H}}_{q'}^1(Q_\tau )}\le C\left( \Vert \nabla u\Vert _{L^\infty (Q_\tau )}^{1-\theta }+1\right) , \end{aligned}$$

where C depends on \(C_H,s,q,N,T,\Vert u_0\Vert _{C({\mathbb {T}}^N)},\Vert f\Vert _{L^q(Q_T)}\).
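It may help to note that condition (10) is the same as condition (5), and that it matches the range of \(q'\) in Proposition 4.1. Both facts follow from elementary computations:

$$\begin{aligned} \frac{(N+2)(\gamma -1)}{2}\le N+2&\iff \gamma \le 3,\\ q'=\frac{q}{q-1}<\frac{N+2}{N+1}&\iff q(N+1)<(N+2)(q-1)\iff q>N+2. \end{aligned}$$

Hence the maximum in (10) equals \(N+2\) for \(1<\gamma \le 3\) and \(\frac{(N+2)(\gamma -1)}{2}\) for \(\gamma >3\), and in either case \(q>N+2\).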

Parabolic Sobolev embeddings of \({\mathcal {H}}_{q'}^1(Q_T)\) into \(L^{p'}(Q_T)\) for \(p'>1\) satisfying \(\frac{1}{p'}=\frac{1}{q'}-\frac{1}{N+2}\) (cf. [16, Appendix A]) imply the following result:

Corollary 5.2

Under the assumptions of Theorem 5.1, there exists a constant \(C>0\) independent of u such that

$$\begin{aligned} \Vert \rho \Vert _{L^{p'}(Q_\tau )}\le C\left( \Vert \nabla u\Vert _{L^\infty (Q_\tau )}^{1-\theta }+1\right) ,\quad p>\frac{(N+2)(\gamma -1)}{\gamma +1}. \end{aligned}$$
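The threshold on p can be related to the one on q as follows; this is our reading of how the exponent arises, assuming that \(p'\) is the parabolic Sobolev exponent of \(q'\), i.e. \(\frac{1}{p'}=\frac{1}{q'}-\frac{1}{N+2}\).

$$\begin{aligned} \frac{1}{p}&=1-\frac{1}{p'}=\frac{1}{q}+\frac{1}{N+2},\qquad \text {i.e. }p=\frac{q(N+2)}{q+N+2},\\ q&=\frac{(N+2)(\gamma -1)}{2}\ \Longrightarrow \ p=\frac{(N+2)(\gamma -1)}{\gamma +1}. \end{aligned}$$

Since p is increasing in q, the condition \(q>\frac{(N+2)(\gamma -1)}{2}\) contained in (10) gives \(p>\frac{(N+2)(\gamma -1)}{\gamma +1}\).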

The proof of Theorem 5.1 follows the main line of [16, Section 3.1]. We first recall the following crucial representation formula, that easily follows by multiplying (4) by \(-\rho \) and (9) by u:

Proposition 5.3

Let u be a solution to (4) and \(\rho \) be a solution to (9). Then the following identity holds

$$\begin{aligned} \int _{{\mathbb {T}}^N}u(x,\tau )\rho _\tau (x)\,dx&=\int _{{\mathbb {T}}^N}u_0(x)\rho (x,0)\,dx\\ &\quad +\iint _{Q_\tau }L(x,H_p(x,\nabla u(x,t)))\rho \,dxdt+\iint _{Q_\tau }f\rho \,dxdt. \end{aligned}$$
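A formal derivation of this identity for smooth solutions can be sketched as follows, assuming that \(\rho \) solves the backward adjoint equation in the form \(-\partial _t\rho +{\mathcal {L}}\rho -\textrm{div}(H_p(x,\nabla u)\rho )=0\), as in (3), with \({\mathcal {L}}=-\Delta +(-\Delta )^s\). One uses that \({\mathcal {L}}\) is self-adjoint on the torus and integrates the divergence term by parts:

$$\begin{aligned} \frac{d}{dt}\int _{{\mathbb {T}}^N}u\rho \,dx&=\int _{{\mathbb {T}}^N}\big (f-{\mathcal {L}}u-H(x,\nabla u)\big )\rho \,dx+\int _{{\mathbb {T}}^N}u\big ({\mathcal {L}}\rho -\textrm{div}(H_p(x,\nabla u)\rho )\big )\,dx\\ &=\int _{{\mathbb {T}}^N}\big (H_p(x,\nabla u)\cdot \nabla u-H(x,\nabla u)\big )\rho \,dx+\int _{{\mathbb {T}}^N}f\rho \,dx. \end{aligned}$$

The first integrand equals \(L(x,H_p(x,\nabla u))\rho \) by the characterization of equality in the Legendre transform, and integrating over \((0,\tau )\) gives the stated formula.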

With the aid of Proposition 5.3 we first prove the following sup-norm estimate for solutions to (4). This slightly extends [16, Proposition 3.7] and [28, Theorem 2.3] to problems with mixed diffusion.

Proposition 5.4

Let \(f\in L^q(Q_T)\), \(q>\frac{N+2}{2}\). Any solution to (4) satisfies

$$\begin{aligned} \Vert u(\cdot ,t)\Vert _{C({\mathbb {T}}^N)}\le C,\quad t\in [0,T], \end{aligned}$$

where C depends on \(T,N,q,s,\Vert f\Vert _{L^q(Q_T)}\).

Proof

We first prove that

$$\begin{aligned} u(x,\tau )\le \Vert u_0\Vert _{C({\mathbb {T}}^N)}+C\Vert f\Vert _{L^q(Q_\tau )},\quad \tau \in (0,T),\ x\in {\mathbb {T}}^N. \end{aligned}$$
(11)

Consider the strong nonnegative solution of the following problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t\mu -\Delta \mu +(-\Delta )^s \mu =0&{}\text { in }Q_\tau ,\\ \mu (x,\tau )=\mu _\tau (x)&{}\text { in }{\mathbb {T}}^N. \end{array}\right. } \end{aligned}$$

where \(\mu _\tau \in C^\infty \), \(\mu _\tau \ge 0\) and \(\Vert \mu _\tau \Vert _1=1\). By Corollary 4.2 with \(b\equiv 0\) and Sobolev embeddings, we have \(\Vert \mu \Vert _{L^{q'}}\le C\), \(q'<\frac{N+2}{N}\). Using \(\mu \) as a test function in (4), one obtains

$$\begin{aligned} \int _{{\mathbb {T}}^N}u(x,\tau )\mu _\tau (x)\,dx= & {} \int _{{\mathbb {T}}^N}u_0(x)\mu (x,0)\,dx\\ {}{} & {} -\iint _{Q_\tau }H(x,\nabla u(x,t))\mu \,dxdt+\iint _{Q_\tau }f\mu \,dxdt. \end{aligned}$$

Then, the upper bound on u follows by duality, using Hölder’s inequality on the first and third integrals, the fact that \(\Vert \mu (t)\Vert _1=1\) for all \(t\in (0,\tau )\), and the sign conditions \(H,\mu \ge 0\). The bound from below can be obtained in the same manner by testing (4) against the solution of (9), as in [16, Proposition 3.7]; it is based on the representation formula in Proposition 5.3.
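The duality step for the upper bound (11) can be spelled out as follows. This is a sketch using only the facts stated in the proof, namely \(H\ge 0\), \(\mu \ge 0\), \(\Vert \mu (t)\Vert _1=1\) and \(\Vert \mu \Vert _{L^{q'}(Q_\tau )}\le C\), under the additional assumption that C does not depend on the particular choice of \(\mu _\tau \). Dropping the nonpositive Hamiltonian term in the tested identity,

$$\begin{aligned} \int _{{\mathbb {T}}^N}u(x,\tau )\mu _\tau (x)\,dx&\le \Vert u_0\Vert _{C({\mathbb {T}}^N)}\Vert \mu (\cdot ,0)\Vert _{L^1({\mathbb {T}}^N)}+\Vert f\Vert _{L^q(Q_\tau )}\Vert \mu \Vert _{L^{q'}(Q_\tau )}\\ &\le \Vert u_0\Vert _{C({\mathbb {T}}^N)}+C\Vert f\Vert _{L^q(Q_\tau )}. \end{aligned}$$

Since the right-hand side is independent of \(\mu _\tau \), letting \(\mu _\tau \) range over smooth nonnegative functions of unit mass concentrating at a point yields the pointwise bound. Note also that the restriction \(q'<\frac{N+2}{N}\) is equivalent to \(q>\frac{N+2}{2}\), since \(\frac{1}{q'}>\frac{N}{N+2}\) means \(\frac{1}{q}<\frac{2}{N+2}\).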

This bound directly leads to the following integrability estimate on the velocity field b of (9) with respect to \(\rho \), computed along the solution of equation (4). Its proof follows that of Proposition 3.2 in [16].

Proposition 5.5

Let u be a solution to (4) and \(\rho \) be a solution to (9). Then, there exists a constant \(C>0\) depending on \(q,N,T,s,C_H,\Vert f\Vert _{L^r(Q_T)}\), with \(r>\max \left\{ \frac{N+2}{2},\frac{(N+2)(\gamma -1)}{\gamma }\right\} \), such that

$$\begin{aligned} \iint _{Q_\tau }|\nabla u(x,t)|^{k}\rho \,dxdt\le C,\quad 1\le k\le \gamma . \end{aligned}$$

Remark 5.6

Similar estimates as those in Proposition 5.5 can be found in [16, 17, 27, 28].

Proof

We rewrite the identity in Proposition 5.3 as

$$\begin{aligned} \iint _{Q_\tau }L(x,H_p(x,\nabla u(x,t)))\rho \,dxdt&= \int _{{\mathbb {T}}^N}u(x,\tau )\rho _\tau (x)\,dx-\int _{{\mathbb {T}}^N}u_0(x)\rho (x,0)\,dx\\ &\quad -\iint _{Q_\tau }f\rho \,dxdt. \end{aligned}$$

We pick \(r>1\) such that

$$\begin{aligned} \frac{(N+2)(\gamma -1)}{\gamma }<r<N+2<q. \end{aligned}$$

We use (L1), Hölder’s inequality and the upper bound in (11) to find

$$\begin{aligned} C_L^{-1}\iint _{Q_\tau }|H_p(x,\nabla u(x,t))|^{\gamma '}\rho \,dxdt&\le 2\Vert u\Vert _{C(Q_\tau )}+\Vert f\Vert _{L^r(Q_\tau )}\Vert \rho \Vert _{L^{r'}(Q_\tau )}\\ &\le 2(\Vert u_0\Vert _\infty +\Vert f\Vert _{L^q(Q_\tau )})+\Vert f\Vert _{L^r(Q_\tau )}\Vert \rho \Vert _{L^{r'}(Q_\tau )}. \end{aligned}$$

Let \({\bar{q}}\) be such that

$$\begin{aligned} r'=\frac{(N+2){\bar{q}}'}{N+2-{\bar{q}}'}. \end{aligned}$$

By the embedding \({\mathcal {H}}^1_{{\bar{q}}'}(Q_\tau )\hookrightarrow L^{r'}(Q_\tau )\) and choosing \(r>\frac{N+2}{2}\) we have \(\bar{q}'<\frac{N+2}{N+1}\). We are thus in position to apply Corollary 4.2 and obtain

$$\begin{aligned} \Vert \rho \Vert _{L^{r'}(Q_\tau )}\le C(\Vert \rho \Vert _{{\mathcal {H}}^1_{\bar{q}'}(Q_\tau )}+1)\le {\tilde{C}}\left( \iint _{Q_\tau }|H_p(x,\nabla u)|^{m'}\rho \,dxdt+1\right) , \end{aligned}$$

where

$$\begin{aligned} m'=1+\frac{N+2}{{\bar{q}}}. \end{aligned}$$

We thus end up with

$$\begin{aligned} C_L^{-1}\iint _{Q_\tau }|H_p(x,\nabla u(x,t))|^{\gamma '}\rho \,dxdt&\le 2(\Vert u_0\Vert _\infty +\Vert f\Vert _{L^q(Q_\tau )})\\ &\quad +{\tilde{C}}\Vert f\Vert _{L^r(Q_\tau )}\left( \iint _{Q_\tau }|H_p(x,\nabla u)|^{m'}\rho \,dxdt+1\right) . \end{aligned}$$

The last integral can be absorbed on the left-hand side by the weighted Young’s inequality since

$$\begin{aligned} m'=1+\frac{N+2}{{\bar{q}}}=\frac{N+2}{r}<\gamma ', \end{aligned}$$

so that we conclude the estimate.
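The two exponent conditions used in this proof can be verified by hand. The computation below is a sketch based only on the relation \(\frac{1}{r'}=\frac{1}{{\bar{q}}'}-\frac{1}{N+2}\) that defines \({\bar{q}}\). First, \(r>\frac{N+2}{2}\) gives

$$\begin{aligned} \frac{1}{{\bar{q}}'}=\frac{1}{r'}+\frac{1}{N+2}=1-\frac{1}{r}+\frac{1}{N+2}>1-\frac{2}{N+2}+\frac{1}{N+2}=\frac{N+1}{N+2}, \end{aligned}$$

that is, \({\bar{q}}'<\frac{N+2}{N+1}\). Second, \(\frac{1}{{\bar{q}}}=1-\frac{1}{{\bar{q}}'}=\frac{1}{r}-\frac{1}{N+2}\), so that

$$\begin{aligned} m'=1+\frac{N+2}{{\bar{q}}}=1+\frac{N+2}{r}-1=\frac{N+2}{r},\qquad \frac{N+2}{r}<\frac{\gamma }{\gamma -1}=\gamma '\iff r>\frac{(N+2)(\gamma -1)}{\gamma }, \end{aligned}$$

which is exactly the lower bound imposed on r at the beginning of the proof.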

We are now ready for the proof of the main result of this section.

Proof of Theorem 5.1

Since q in (10) always satisfies \(q'<\frac{N+2}{N+1}\), we can apply Corollary 4.2 combined with the hypotheses (H) to conclude

$$\begin{aligned} \Vert \rho \Vert _{{\mathcal {H}}_{q'}^1(Q_\tau )}&\le C_1\left( \iint _{Q_\tau }|H_p(x,\nabla u(x,t))|^{m'}\rho \,dxdt+1\right) \\ &\le C_2\left( \iint _{Q_\tau }|\nabla u|^{(\gamma -1)m'}\rho \,dxdt+1\right) \\ &\le C_3\left( \Vert \nabla u\Vert _{L^\infty (Q_\tau )}^{1-\theta }\iint _{Q_\tau }|\nabla u|^{(\gamma -1)m'-1+\theta }\rho \,dxdt+1\right) , \end{aligned}$$

where

$$\begin{aligned} m'=1+\frac{N+2}{q}. \end{aligned}$$

We now choose \(\theta >0\) small enough so that

$$\begin{aligned} k=(\gamma -1)m'-1+\theta \le \gamma . \end{aligned}$$

Note that this can be done by the initial choice of q. Thus, we use Proposition 5.5 and conclude the estimate.
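The admissible range of \(\theta \) can be made explicit. The following is a sketch using only \(m'=1+\frac{N+2}{q}\):

$$\begin{aligned} k-\gamma =(\gamma -1)\left( 1+\frac{N+2}{q}\right) -1+\theta -\gamma =\frac{(N+2)(\gamma -1)}{q}-2+\theta . \end{aligned}$$

Hence \(k\le \gamma \) if and only if \(\theta \le 2-\frac{(N+2)(\gamma -1)}{q}\), and the right-hand side is positive precisely when \(q>\frac{(N+2)(\gamma -1)}{2}\), which is part of (10). Any \(\theta \in (0,1)\) not exceeding this threshold is therefore admissible.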

6 Proof of the main result and further comments

We start with the following refined variational Bochner identity that will allow us to exploit the information \(f\in L^q_{x,t}\):

Lemma 6.1

Let u be a classical solution to (4) and \(\rho \) be a solution to (9). Then, the following identity holds

$$\begin{aligned}{} & {} \int _{{\mathbb {T}}^N}w(x,\tau )\rho _\tau (x)\,dx+\iint _{Q_\tau }|D^2u(x,t)|^2\rho \,dxdt+\iint _{Q_\tau }{\mathcal {I}}[\nabla u](x,t)\rho \,dxdt\nonumber \\{} & {} \quad =-\iint _{Q_\tau }H_x(x,\nabla u(x,t))\cdot \nabla u(x,t)\rho \,dxdt\nonumber \\{} & {} \qquad -\iint _{Q_\tau }f(x,t)\textrm{div}(\nabla u(x,t)\rho )\,dxdt+\int _{{\mathbb {T}}^N}w(x,0)\rho (x,0)\,dx, \end{aligned}$$
(12)

where \({\mathcal {I}}[\nabla u](x,t)=\frac{1}{2}\int _{{\mathbb {T}}^N}|\nabla u(x,t)-\nabla u(x+y,t)|^2K(y)\,dy\).

Proof

It is sufficient to multiply (6) by the adjoint variable \(\rho \) solving (9), integrate over the cylinder \(Q_\tau \), and finally integrate by parts in the term involving the time derivative and in the one involving the source term \(f(x,t)\) of the equation.

We are now ready for the proof of the main result.

Proof of Theorem 2.1

We start with (12), observing that the third term on the left-hand side is nonnegative since \(\rho \) and the nonlocal term \({\mathcal {I}}[\nabla u]\) are nonnegative, cf [24, Remark 20.3]. Thus, we end up with the inequality

$$\begin{aligned}{} & {} \int _{{\mathbb {T}}^N}w(x,\tau )\rho _\tau (x)\,dx+\iint _{Q_\tau }|D^2u(x,t)|^2\rho \,dxdt \\{} & {} \quad \le -\iint _{Q_\tau }H_x(x,\nabla u(x,t))\cdot \nabla u(x,t)\rho \,dxdt\\{} & {} \qquad -\iint _{Q_\tau }f(x,t)\textrm{div}(\nabla u(x,t)\rho )\,dxdt+\int _{{\mathbb {T}}^N}w(x,0)\rho (x,0)\,dx. \end{aligned}$$

First, we observe that the product rule applied to the second term on the right-hand side leads to

$$\begin{aligned} -\iint _{Q_\tau }f\textrm{div}(\nabla u\rho )\,dxdt=-\iint _{Q_\tau }f\Delta u\rho \,dxdt-\iint _{Q_\tau }f\nabla u\cdot \nabla \rho \,dxdt. \end{aligned}$$

We are now left to estimate all the terms appearing on the right-hand side. We start with the one involving \(H_x\). Using (H), Proposition 5.5 with \(k=\gamma \) and Young’s inequality we conclude

$$\begin{aligned}{} & {} \iint _{Q_\tau }|H_x(x,\nabla u(x,t))||\nabla u(x,t)|\rho \,dxdt\le C_H\Vert \nabla u\Vert _{L^\infty (Q_\tau )}\iint _{Q_\tau }|\nabla u|^\gamma \rho \,dxdt+C_H\tau \\{} & {} \quad \le C_1+\frac{1}{8}\Vert \nabla u\Vert _{L^\infty (Q_\tau )}^2. \end{aligned}$$

We now estimate the terms involving the source of the equation. We use the Cauchy–Schwarz inequality (1) and Young’s inequality, together with Hölder’s inequality with an exponent \(\tilde{p}>1\) to be determined later, to find that

$$\begin{aligned}{} & {} \iint _{Q_\tau }|f||\Delta u|\rho \,dxdt\le \sqrt{N}\iint _{Q_\tau }|f||D^2 u|\rho \,dxdt\le \frac{1}{2}\iint _{Q_\tau }|D^2 u|^2\rho \,dxdt\\{} & {} \qquad +\frac{N}{2}\iint _{Q_\tau }f^2\rho \,dxdt\\{} & {} \quad \le \frac{1}{2}\iint _{Q_\tau }|D^2 u|^2\rho \,dxdt+\frac{N}{2}\Vert f\Vert ^2_{L^{2\tilde{p}}(Q_\tau )}\Vert \rho \Vert _{L^{\tilde{p}'}(Q_\tau )}. \end{aligned}$$

The last term can be bounded in terms of \(\Vert f\Vert _{L^q(Q_\tau )}\) through Corollary 5.2 by choosing \({\tilde{p}}\) satisfying

$$\begin{aligned} \frac{2(N+2)(\gamma -1)}{\gamma +1}<2\tilde{p}\le q. \end{aligned}$$

This can always be done in view of (5). Therefore, by Corollary 5.2 and Young’s inequality, we have

$$\begin{aligned}{} & {} \iint _{Q_\tau }|f||\Delta u|\rho \,dxdt\le \frac{N}{2}\Vert f\Vert _{L^{q}(Q_\tau )}^2(\Vert \nabla u\Vert _{L^\infty (Q_\tau )}^{1-\theta }+1)+\frac{1}{2}\iint _{Q_\tau }|D^2 u|^2\rho \,dxdt\\{} & {} \quad \le C_2+\frac{1}{8}\Vert \nabla u\Vert _{L^\infty (Q_\tau )}^2+\frac{1}{2}\iint _{Q_\tau }|D^2 u|^2\rho \,dxdt. \end{aligned}$$
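That an exponent \({\tilde{p}}\) in the required range exists can be checked case by case. This is a sketch under the assumption, not restated in this section, that condition (5) on q coincides with (10). It suffices to compare the lower bounds:

$$\begin{aligned} \gamma \ge 3:&\quad \frac{(N+2)(\gamma -1)}{2}-\frac{2(N+2)(\gamma -1)}{\gamma +1}=\frac{(N+2)(\gamma -1)(\gamma -3)}{2(\gamma +1)}\ge 0,\\ 1<\gamma \le 3:&\quad (N+2)-\frac{2(N+2)(\gamma -1)}{\gamma +1}=\frac{(N+2)(3-\gamma )}{\gamma +1}\ge 0. \end{aligned}$$

In both cases \(q>\frac{2(N+2)(\gamma -1)}{\gamma +1}\), so one admissible choice is \(2{\tilde{p}}=q\).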

Using Hölder’s inequality, Theorem 5.1, and the weighted Young inequality we deduce

$$\begin{aligned}{} & {} \iint _{Q_\tau }|f||\nabla u||\nabla \rho |\,dxdt\le \Vert \nabla u\Vert _{L^\infty (Q_\tau )}\Vert f\Vert _{L^q(Q_T)}\Vert \nabla \rho \Vert _{L^{q'}(Q_T)}\\{} & {} \quad \le \Vert \nabla u\Vert _{L^\infty (Q_\tau )}\Vert f\Vert _{L^q(Q_T)}\Vert \rho \Vert _{{\mathcal {H}}^1_{q'}(Q_T)}\\{} & {} \quad \le C_3\Vert \nabla u\Vert _{L^\infty (Q_\tau )}\Vert f\Vert _{L^q(Q_T)}\left( \Vert \nabla u\Vert _{L^\infty (Q_\tau )}^{1-\theta }+1\right) \le \frac{1}{8}\Vert \nabla u\Vert _{L^\infty (Q_\tau )}^2+C_4. \end{aligned}$$
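The last inequality in the chain above is an instance of the weighted Young inequality. As a sketch, write \(X=\Vert \nabla u\Vert _{L^\infty (Q_\tau )}\) and \(A=C_3\Vert f\Vert _{L^q(Q_T)}\) (shorthand introduced here only for illustration). Applying Young’s inequality with conjugate exponents \(\frac{2}{2-\theta }\) and \(\frac{2}{\theta }\) to the first product, and with exponents 2 and 2 to the second, one finds a constant \(c_\theta >0\) depending only on \(\theta \) such that

$$\begin{aligned} AX\left( X^{1-\theta }+1\right) =AX^{2-\theta }+AX\le \left( \frac{1}{16}X^2+c_\theta A^{\frac{2}{\theta }}\right) +\left( \frac{1}{16}X^2+4A^2\right) =\frac{1}{8}X^2+C_4, \end{aligned}$$

with \(C_4=c_\theta A^{2/\theta }+4A^2\). The strict positivity of \(\theta \) is what makes the power \(2-\theta \) subquadratic, so that the term can be absorbed.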

Finally, using the conservation of the \(L^1\) norm for the transport-diffusion equation we have \(\Vert \rho (\cdot ,0)\Vert _{L^1({\mathbb {T}}^N)}=1\), and we conclude

$$\begin{aligned} \int _{{\mathbb {T}}^N}w(x,0)\rho (x,0)\,dx\le \frac{1}{2}\Vert \nabla u_0\Vert _{L^\infty ({\mathbb {T}}^N)}^2. \end{aligned}$$

Plugging all the estimates in the first inequality we have

$$\begin{aligned}{} & {} \int _{{\mathbb {T}}^N}\frac{1}{2}|\nabla u(x,\tau )|^2\rho _\tau (x)\,dx+\iint _{Q_\tau }|D^2u(x,t)|^2\rho \,dxdt\le C_5+\frac{1}{2}\Vert \nabla u_0\Vert _{L^\infty ({\mathbb {T}}^N)}^2\\{} & {} \quad +\frac{3}{8}\Vert \nabla u\Vert _{L^\infty (Q_\tau )}^2+\frac{1}{2}\iint _{Q_\tau }|D^2 u|^2\rho \,dxdt, \end{aligned}$$

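where \(C_5=C_1+C_2+C_4\) depends on \(N,C_H,q,T,s,\Vert f\Vert _q\). The Hessian term on the right-hand side is absorbed by the one on the left-hand side, and the remaining nonnegative Hessian term can be dropped. The resulting inequality holds for all smooth \(\rho _\tau \ge 0\) such that \(\Vert \rho _\tau \Vert _1=1\), which gives by duality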

$$\begin{aligned} \frac{1}{2}\Vert \nabla u(\cdot ,\tau )\Vert _{L^\infty ({\mathbb {T}}^N)}^2\le C_5+\frac{1}{2}\Vert \nabla u_0\Vert _{L^\infty ({\mathbb {T}}^N)}^2 +\frac{3}{8}\Vert \nabla u\Vert _{L^\infty (Q_\tau )}^2. \end{aligned}$$

The estimate then follows by passing to the supremum over \(\tau \in (0,T)\) on the left-hand side.
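The final absorption can be written out explicitly. As a sketch, set \(Y=\Vert \nabla u\Vert _{L^\infty (Q_T)}\) (a shorthand introduced here), which is finite for classical solutions. Since \(Q_\tau \subset Q_T\), taking the supremum over \(\tau \) in the previous inequality gives

$$\begin{aligned} \frac{1}{2}Y^2\le C_5+\frac{1}{2}\Vert \nabla u_0\Vert _{L^\infty ({\mathbb {T}}^N)}^2+\frac{3}{8}Y^2\quad \Longrightarrow \quad Y^2\le 8C_5+4\Vert \nabla u_0\Vert _{L^\infty ({\mathbb {T}}^N)}^2. \end{aligned}$$

The coefficient \(\frac{3}{8}<\frac{1}{2}\) collects the three contributions of size \(\frac{1}{8}\) produced by Young’s inequality in the estimates of the \(H_x\) term and of the two source terms.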

Some final remarks are in order:

Remark 6.2

The argument remains valid for functions \(u\in W^{2,1}_q\) instead of smooth solutions, without the need to assume \(f\in C^1\). This can be done using the test function \(\varphi =-\textrm{div}(\nabla u\rho )\) in the variational formulation of (4), as in [18].

Remark 6.3

When f is more regular, the proof works even when \(\varepsilon =0\), i.e. for the equation driven by the sole fractional Laplacian, in the subcritical case \(s\in \left( \frac{1}{2},1\right) \) and for strong solutions belonging to the space \({\mathcal {H}}_q^{2s}(Q_T)=\{\partial _t u,(I-\Delta )^su\in L^q(Q_T)\}\), as introduced in [15]. The procedure simplifies since one can avoid an integration by parts in (12) and use the identity

$$\begin{aligned}{} & {} \int _{{\mathbb {T}}^N}w(x,\tau )\rho _\tau (x)\,dx\le \int _{{\mathbb {T}}^N}w(x,\tau )\rho _\tau (x)\,dx+\iint _{Q_\tau }{\mathcal {I}}[\nabla u](x,t)\rho \,dxdt\\{} & {} \quad =-\iint _{Q_\tau }H_x(x,\nabla u(x,t))\cdot \nabla u(x,t)\rho \,dxdt\\{} & {} \qquad -\iint _{Q_\tau }\nabla f(x,t)\cdot \nabla u(x,t)\rho \,dxdt+\int _{{\mathbb {T}}^N}w(x,0)\rho (x,0)\,dx. \end{aligned}$$

Then, if \(f\in L^\infty (0,T;W^{1,\infty }({\mathbb {T}}^N))\), we have

$$\begin{aligned} \iint _{Q_\tau }\nabla f(x,t)\cdot \nabla u(x,t)\rho \,dxdt\le \Vert f\Vert _{L^\infty _t(W^{1,\infty }_x)}\Vert \nabla u\Vert _{L^\infty (Q_\tau )}\tau , \end{aligned}$$

and then it is sufficient to use Young’s inequality. If, instead, \(f\in L^q(0,T;W^{1,q}({\mathbb {T}}^N))\) for q such that

$$\begin{aligned} q>\max \left\{ N+2s,\frac{(N+2s)(\gamma -1)}{2s-1}\right\} , \end{aligned}$$

it is enough to argue as follows:

$$\begin{aligned} \iint _{Q_\tau }\nabla f(x,t)\cdot \nabla u(x,t)\rho \,dxdt&\le \Vert \nabla u\Vert _{L^\infty (Q_\tau )}\iint _{Q_\tau }|\nabla f|\rho \,dxdt\\ &\le \Vert \nabla u\Vert _{L^\infty (Q_\tau )}\Vert f\Vert _{L^p_t(W^{1,p}_x)}\Vert \rho \Vert _{L^{p'}(Q_\tau )} \end{aligned}$$

and use the estimate in Corollary 5.9 of [28] combined with Young’s inequality to conclude the gradient bound. We emphasize that an assumption like \(f\in L^q(0,T;W^{1,q}({\mathbb {T}}^N))\) lies in between those required by the weak Bernstein method and those needed to run the Ishii-Lions argument. It remains an open problem whether the \(L^\infty \) gradient bound holds for \(f\in L^q_{x,t}\) and \(\gamma >2s\): by scaling, we expect this to be true provided that \(q>\frac{N+2s}{2s-1}\), \(s\in (1/2,1)\).

Remark 6.4

Lipschitz estimates in the subquadratic case can be obtained in a slightly different manner, combining Lemma 3.3 with Gagliardo-Nirenberg interpolation inequalities. Indeed, regarding (4) as

$$\begin{aligned} \partial _tu-\Delta u+(-\Delta )^s u=-H(x,\nabla u)+f(x,t) \end{aligned}$$

one has

$$\begin{aligned} \Vert u\Vert _{W^{2,1}_q}\lesssim \Vert \nabla u\Vert _{L^{q\gamma }}^\gamma +\Vert f\Vert _q+\Vert u_0\Vert _{W^{2-2/q,q}}. \end{aligned}$$

Then, Gagliardo-Nirenberg interpolation inequalities lead to

$$\begin{aligned} \Vert \nabla u\Vert _{L^{q\gamma }}^\gamma \le C\Vert \nabla u\Vert _{L^{2q}}^\gamma \lesssim \Vert u\Vert _{W^{2,1}_q}^{\frac{\gamma }{2}} \Vert u\Vert _{\infty }^{\frac{\gamma }{2}}. \end{aligned}$$

Since \(\gamma \in (1,2)\), one can use Young’s inequality and the sup-norm estimate in Proposition 5.4 to conclude a bound on \(\Vert u\Vert _{W^{2,1}_q}\) for \(q>\frac{N+2}{2}\). By parabolic Sobolev embeddings it then follows that \(\Vert \nabla u\Vert _{L^\infty }\) is bounded whenever \(q>N+2\). The drawbacks of this approach are the requirement on \(\gamma \) (the Hamiltonian must have subquadratic growth) and the one on the initial data, which need to be more regular than in Theorem 2.1. The same idea works even for problems driven by the sole fractional Laplacian using interpolation estimates and the \(L^\infty \) bounds in Theorem 2.3 of [28], provided that \(\gamma <2s\), \(s\in (1/2,1)\). Still, the approach of this manuscript can be refined to obtain maximal regularity estimates via new Hölder and \(L^p\) bounds for mixed diffusion problems, see [17].
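The Young step in this remark can be sketched as follows, writing \(X=\Vert u\Vert _{W^{2,1}_q}\), \(M=\Vert u\Vert _\infty \) and letting c, c' denote generic constants (shorthand introduced here only for illustration). Combining the maximal regularity estimate with the interpolation inequality gives \(X\le cX^{\frac{\gamma }{2}}M^{\frac{\gamma }{2}}+c\left( \Vert f\Vert _q+\Vert u_0\Vert _{W^{2-2/q,q}}\right) \). Since \(\gamma <2\), Young’s inequality with conjugate exponents \(\frac{2}{\gamma }\) and \(\frac{2}{2-\gamma }\) yields

$$\begin{aligned} cX^{\frac{\gamma }{2}}M^{\frac{\gamma }{2}}\le \frac{1}{2}X+c'M^{\frac{\gamma }{2-\gamma }}\quad \Longrightarrow \quad X\le 2c'M^{\frac{\gamma }{2-\gamma }}+2c\left( \Vert f\Vert _q+\Vert u_0\Vert _{W^{2-2/q,q}}\right) , \end{aligned}$$

provided X is finite a priori. The exponent \(\frac{\gamma }{2-\gamma }\) blows up as \(\gamma \rightarrow 2\), which reflects the subquadratic restriction of this argument.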

Remark 6.5

Lipschitz bounds can be produced under the weaker restriction \(q>N+2\) when \(\gamma >3\) for viscous problems driven by the Laplacian, see [10], but the validity of such bounds under this weaker restriction remains an open problem even for equations driven by nonlocal operators, as discussed in Remark 6.3. We emphasize that the assumption \(q>N+2\) is sharp, at least for viscous parabolic problems, see [16, Remark 3.13].

Remark 6.6

The a priori estimates proved in Theorem 2.1 can be used to prove existence and uniqueness of strong solutions, as shown in [15, Proposition 3.11] and [16, Section 4].

Remark 6.7

The adjoint-Bernstein method implemented in this paper can be successfully applied even to (nonlinear) first-order terms with linear gradient growth. Indeed, in this case the drift of the transport-diffusion equation would be bounded and the estimates in Sect. 4 can be obtained in the same manner. However, the case of sublinear, i.e. \(\gamma \le 1\), gradient terms can be deduced through the classical \(W^{2,1}_q\) regularity estimates for viscous problems, see e.g. [25].

On behalf of all authors, the corresponding author states that there is no conflict of interest.