1 Introduction

The celebrated Li–Yau inequality states that on a complete Riemannian manifold M of dimension n with non-negative Ricci curvature, we have

$$\begin{aligned} - \Delta (\log u) \le \frac{n}{2t},\quad t>0, \end{aligned}$$
(1)

for any positive solution u to the heat equation \(\partial _t u = \Delta u\) on \((0,\infty )\times M\), where \(\Delta \) denotes the Laplace–Beltrami operator. This inequality originates from the seminal work [21]. It is optimal in the sense that equality is achieved for the heat kernel in the Euclidean case. As an important application of (1), (sharp) parabolic Harnack inequalities can be deduced. Using the \(\Gamma \)-calculus of Bakry and Émery, the Li–Yau inequality (1) has been generalized to Markov diffusion operators that satisfy the curvature-dimension condition CD(0, n), see [3] and also the extensive monograph [4].
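Equality in the Euclidean case can be checked symbolically. The following sketch (an illustrative aside assuming sympy, here in dimension \(n=3\)) verifies that the Gaussian heat kernel satisfies \(-\Delta (\log u) = n/(2t)\) exactly:

```python
import sympy as sp

t = sp.symbols('t', positive=True)
x = sp.symbols('x1 x2 x3', real=True)
n = 3  # dimension of this example

# Gaussian heat kernel on R^3: G(t,x) = (4*pi*t)^(-n/2) * exp(-|x|^2/(4t))
r2 = sum(xi**2 for xi in x)
G = (4 * sp.pi * t) ** sp.Rational(-n, 2) * sp.exp(-r2 / (4 * t))

# -Delta(log G) equals n/(2t) identically, i.e. (1) holds with equality
lhs = -sum(sp.diff(sp.log(G), xi, 2) for xi in x)
print(sp.simplify(lhs - n / (2 * t)))  # 0
```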

Concerning the non-local situation, the approach via curvature-dimension inequalities has stimulated a lot of research in the discrete setting. Based on discrete replacements of certain chain rule identities, in [8, 16, 22] suitable substitutes of curvature-dimension conditions have been introduced in order to prove Li–Yau type inequalities for generalized graph Laplacians. Corresponding Li–Yau inequalities for non-local operators describing diffusion processes with arbitrary long jumps do not seem to be known.

In the continuous setting, the fractional Laplacian on \({\mathbb {R}}^d\) is one of the most prominent representatives of a non-local operator. It is a natural question to ask in which way properties of the Laplace operator in the Euclidean case carry over to the fractional Laplacian. In particular, it is of great interest whether a Li–Yau type inequality is also valid for positive solutions of the fractional heat equation and, if so, whether the time-dependence is the same as in (1). The recent survey article [17] of Garofalo dedicates a whole chapter to these open questions.

The main aim of this article is to establish a general reduction principle for deriving Li–Yau inequalities that applies to both the discrete and the continuous setting and answers both of the above questions for the fractional heat equation positively. For that purpose, we will not follow the approach of curvature-dimension inequalities. Remarkably, in the quite recent article [24] it has been shown that the fractional Laplacian does not satisfy \(CD(\kappa ,n)\) for any pair \((\kappa ,n)\in {\mathbb {R}}\times (0,\infty )\), which is in sharp contrast to the Euclidean Laplacian. Instead, we follow the different approach of reducing the problem to the heat kernel. In the case of the classical heat equation on \({\mathbb {R}}^d\), the heat kernel satisfies (1) with equality, and one can deduce the Li–Yau inequality (1) by using that positive solutions to the heat equation are given as the convolution of the heat kernel with the respective initial datum, see e.g. [17, Section 21]. We show in a very general framework that this reduction principle also works for positive solutions to the corresponding non-local diffusion equation. Indeed, by Theorem 2.4 we conclude that the heat kernel determines the corresponding Li–Yau inequality for solutions that admit an integral representation involving that kernel. As an application, we use this principle for the fractional heat equation and derive a Li–Yau inequality (see Theorem 3.2) which is optimal with respect to the time-behaviour. To highlight that our method applies to a quite general setting, we also illustrate the case of a discrete state space in Sect. 4. Here, we consider the example where the underlying graph is the complete graph \(K_n\), \(n\ge 2\), and deduce an optimal Li–Yau type inequality, which improves known results from [16].

As we have mentioned before, parabolic Harnack inequalities are among the most important applications of the original Li–Yau estimate. It is therefore natural to ask whether non-local Li–Yau inequalities lead to similar applications. While in the discrete setting this question has been answered positively in [8, 16, 22], we are not aware of any proof of a Harnack inequality via a Li–Yau type inequality for an evolution equation with a purely non-local diffusion operator in the space-continuous setting. In Theorem 5.2 we derive a scale-invariant Harnack inequality for the fractional heat equation, where the crucial point is not the result as such (in fact, better results already exist in the literature, see Remark 5.3 below), but the fact that such a derivation is possible at all. In contrast to the classical heat equation, we do not have a gradient term \(|\nabla ( \log u)|^2\) in the corresponding differential Harnack inequality, but have to work with the non-local operator \(\Psi _\Upsilon (\log u)\) (see (3)) involving the function \(\Upsilon (z)=e^z-1-z\), which makes the proof considerably more difficult. Here we point out that one cannot argue as in the discrete case, where this operator has already appeared, see the work [16] on Li–Yau inequalities in the discrete setting. We remark that the operator \(\Psi _\Upsilon \) also plays a fundamental role in the context of curvature-dimension conditions for non-local operators, including the discrete case of Markov chains, see [25].

The article is organized as follows. In the next section we establish a reduction principle to derive Li–Yau inequalities for non-local diffusion problems in a very general framework. Thereafter, we apply this principle in Sect. 3 to the fractional heat equation and in Sect. 4 to the diffusion equation with the Markov generator that corresponds to the complete graph. Section 5 is devoted to the derivation of Harnack inequalities by means of the Li–Yau estimates obtained before.

We have been informed that Tuhin Ghosh and Moritz Kassmann from Bielefeld (Germany) have recently proved a different version of a Li–Yau type inequality for the fractional heat equation by an alternative approach, see [18]. Their result also allows one to derive a Harnack inequality. A preprint will be available soon.

2 A reduction principle for deriving Li–Yau inequalities

Let \((M,d)\) be a metric space and \({\mathcal {B}}(M)\) denote the Borel \(\sigma \)-algebra on M. We consider a non-local operator of the form

$$\begin{aligned} L f(x) = \int _{M\setminus \{x\}} \big ( f(y)-f(x)\big ) k(x,\mathrm {d}y), \end{aligned}$$
(2)

where the kernel is such that \(k(x,\cdot )\) defines a \(\sigma \)-finite measure on \({\mathcal {B}}(M\setminus \{x\})\) for any \(x\in M\) and \(f:M\rightarrow {\mathbb {R}}\) is such that the integral exists. We also include the case where the integral on the right-hand side of (2) is singular. In this situation one replaces \(\int _{M \setminus \{x\}}\) by \(\lim \nolimits _{\varepsilon \rightarrow 0^+} \int _{M \setminus B_\varepsilon (x)}\) on the right-hand side of (2). This is motivated by the important example of the fractional Laplace operator, where we choose \(M={\mathbb {R}}^n\) and d as the corresponding Euclidean distance. However, the quite general formulation also covers a variety of other important situations, for instance the case where L is the generator of a continuous-time Markov chain on a discrete state space endowed with the natural graph structure, with M being the set of vertices and the edge weights given by the corresponding transition rates between two respective states.

In [16], the following formula serves as a replacement for a classical chain rule. It has been established in the context of locally finite graphs and also for a class of non-local operators on \({\mathbb {R}}^n\) that are of the form (2). We state the result again here since our setting is more general. The short proof is analogous to the one in [16].

Lemma 2.1

Let \(D\subset {\mathbb {R}}\) be an open set, \(h \in C^1(D;{\mathbb {R}})\), \(f:M \rightarrow D\) and \(x\in M\) such that Lf(x) and L(h(f))(x) exist. Then we have

$$\begin{aligned} L \big ( h(f)\big )(x) = h'(f(x)) L f(x) + \int _{M \setminus \{x\}} \Lambda _h\big (f(y),f(x)\big )k(x,\mathrm {d}y), \end{aligned}$$

where \(\Lambda _h(w,z):= h(w)-h(z)-h'(z)(w-z)\) for any \(w,z \in D\).
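On a finite state space the integral in (2) reduces to a sum, and the formula of Lemma 2.1 can be verified directly. The following sketch (an illustrative aside assuming numpy, with the hypothetical choice \(h(r)=r^2\) and a randomly generated kernel) does so at a single point:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                                  # finite state space {0, ..., 4}
k = rng.uniform(0.1, 1.0, (N, N))      # kernel k(x, y); the diagonal is never used
f = rng.uniform(-1.0, 1.0, N)

def L(g, x):
    # L g(x) = sum over y != x of (g(y) - g(x)) k(x, y), cf. (2)
    return sum((g[y] - g[x]) * k[x, y] for y in range(N) if y != x)

h = lambda r: r ** 2                               # h in C^1 with h'(r) = 2r
Lam = lambda w, z: h(w) - h(z) - 2 * z * (w - z)   # Lambda_h(w, z)

x = 2
lhs = L(h(f), x)
rhs = 2 * f[x] * L(f, x) + sum(Lam(f[y], f[x]) * k[x, y] for y in range(N) if y != x)
print(abs(lhs - rhs) < 1e-12)  # True: the chain-rule formula holds exactly
```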

In the context of this article, we will apply Lemma 2.1 with the specific choice of \(h(r)=\log (r)\), \(r \in (0,\infty )\). In this case we have

$$\begin{aligned} \Lambda _{\log }(w,z)= \log w - \log z -\frac{w-z}{z} = -\Upsilon (\log w - \log z), \, w,z \in (0,\infty ), \end{aligned}$$

where \(\Upsilon (r):= e^r - r -1\), \(r\in {\mathbb {R}}\). Introducing the operator

$$\begin{aligned} \Psi _\Upsilon (f)(x) = \int _{M \setminus \{x\}} \Upsilon \big ( f(y)-f(x)\big ) k(x,\mathrm {d}y), \end{aligned}$$
(3)

we deduce from Lemma 2.1 that

$$\begin{aligned} L (\log f) =\frac{L f }{f} - \Psi _\Upsilon (\log f) \end{aligned}$$
(4)

holds for any positive function f whenever Lf and \(L (\log f)\) exist. Note that a possible singularity at \(y=x\) on the right-hand side of (3) does not play a role, due to the positivity of \(\Upsilon \), in the sense that the integral is either finite or \(\infty \). In the particular case of the fractional Laplacian, a simple Taylor argument shows that the quadratic behaviour of \(\Upsilon \) near the origin compensates the singularity of the corresponding integral kernel, provided the function is sufficiently smooth.
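To make this Taylor argument explicit (a sketch under the assumption that f is Lipschitz near x): since \(\Upsilon (0)=\Upsilon '(0)=0\) and \(\Upsilon ''(0)=1\), we have

$$\begin{aligned} \Upsilon (r) = \frac{r^2}{2} + O(|r|^3) \quad \text {as } r \rightarrow 0, \end{aligned}$$

so that \(\Upsilon \big (f(y)-f(x)\big ) \lesssim |y-x|^2\) for y close to x. The integrand in (3) is then dominated near \(y=x\) by a constant multiple of \(|y-x|^{2-d-\beta }\), which is integrable since \(\beta < 2\).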

Now, we establish the key estimate of this article.

Lemma 2.2

Let L be an operator of the form (2) and let \(H:M\times M \rightarrow (0,\infty )\) be such that \(H(x,\cdot )\) is \({\mathcal {B}}(M)\)-measurable and the restriction \(\left. H\right| _{M\setminus \{x\} \times M}\) is \({\mathcal {B}}(M\setminus \{x\}) \otimes {\mathcal {B}}(M)\)-measurable for any \(x \in M\). Further, let \(f:M\rightarrow (0,\infty )\) be \({\mathcal {B}}(M)\)-measurable. We assume that the integral \(P f(x)= \int _{M} H(x,y)f(y)\,\mathrm {d}\nu (y)\) and also \(\Psi _\Upsilon (\log P f)\) exist for any \(x \in M\) and that for \(\nu \)-a.e. \(y \in M\) the expression \(\Psi _\Upsilon (\log H(\cdot ,y))(x)\) exists for every \(x \in M\). Here \(\nu :{\mathcal {B}}(M)\rightarrow [0,\infty ]\) is a \(\sigma \)-finite measure. Then we have

$$\begin{aligned} \int _{M} \Psi _\Upsilon (\log H(\cdot ,y))(x)H(x,y)f(y) \mathrm {d}\nu (y) \ge \Psi _\Upsilon (\log P f)(x) P f(x). \end{aligned}$$
(5)

Proof

Recalling (3), we write

$$\begin{aligned} \Psi _\Upsilon (\log P f)(x) P f (x)&= \int _{M\setminus \{x\}} \Upsilon \Big (\log \frac{P f(h)}{P f(x)}\Big )P f(x)k(x,\mathrm {d}h) \\&= \int _{M\setminus \{x\}}\Upsilon \Big (\log \frac{P f(h)}{P f(x)}\Big ) \int _{M} H(x,y)f(y)\mathrm {d}\nu (y)\,k(x,\mathrm {d}h). \end{aligned}$$

Next, by Tonelli’s theorem we observe that

$$\begin{aligned}&\int _{M} \Psi _\Upsilon \big (\log H(\cdot ,y)) (x) H(x,y) f(y)\mathrm {d}\nu (y) \\&\quad = \int _{M} H(x,y) f(y) \int _{M\setminus \{x\}} \Upsilon \Big (\log \frac{H(h,y)}{H(x,y)}\Big )k(x,\mathrm {d}h) \, \mathrm {d}\nu (y) \\&\quad = \int _{M\setminus \{x\}}\int _{M} H(x,y) f(y)\Upsilon \Big ( \log \frac{H(h,y)}{H(x,y)}\Big )\mathrm {d}\nu (y)\,k(x,\mathrm {d}h). \end{aligned}$$

Consequently, we have

$$\begin{aligned} \begin{aligned}&\int _{M} \Psi _\Upsilon \big (\log H(\cdot ,y)) (x) H(x,y) f(y)\mathrm {d}\nu (y) - \Psi _\Upsilon (\log P f)(x) P f (x) \\&\quad = \int _{M\setminus \{x\}} \int _{M} \Big (\Upsilon \Big ( \log \frac{H(h,y)}{H(x,y)}\Big ) - \Upsilon \Big (\log \frac{P f(h)}{P f(x)}\Big )\Big ) H(x,y)f(y)\mathrm {d}\nu (y)\, k(x,\mathrm {d}h). \end{aligned} \end{aligned}$$
(6)

Now, the mapping \(r\mapsto \Upsilon (\log (r))\), \(r \in (0,\infty )\), is convex, since \(\frac{\mathrm {d}^2}{\mathrm {d}r^2}\Upsilon (\log (r))=\frac{1}{r^2}>0\). As \(\frac{\mathrm {d}}{\mathrm {d}r}\Upsilon (\log (r))=\frac{r-1}{r}\), \(r\in (0,\infty )\), we infer from convexity that

$$\begin{aligned} \Upsilon (\log (r))-\Upsilon (\log (s)) \ge \frac{s-1}{s}(r-s) \end{aligned}$$

holds for any \(r,s \in (0,\infty )\). With this at hand and using the positivity of f and H, we can now estimate the right-hand side of (6) as follows.

$$\begin{aligned} \begin{aligned}&\int _{M \setminus \{x\}} \int _{M}\Big (\Upsilon \Big ( \log \frac{H(h,y)}{H(x,y)}\Big ) - \Upsilon \Big (\log \frac{P f(h)}{P f(x)}\Big )\Big ) H(x,y)f(y)\mathrm {d}\nu (y)\, k(x,\mathrm {d}h) \\&\quad \ge \int _{M \setminus \{x\}} \int _{M} \frac{P f(h) - P f(x)}{P f(h)}\Big (\frac{H(h,y)}{H(x,y)} - \frac{P f(h)}{P f(x)}\Big ) H(x,y)f(y)\mathrm {d}\nu (y) \, k(x,\mathrm {d}h). \end{aligned} \end{aligned}$$

Having a possible singularity in mind, we choose an arbitrary \(\varepsilon >0\) (small enough that \(B_\varepsilon (x) \subsetneq M\)) and then use linearity to observe

$$\begin{aligned}&\int _{M \setminus B_\varepsilon (x)} \int _{M} \frac{P f(h) - P f(x)}{P f(h)}\Big (\frac{H(h,y)}{H(x,y)} - \frac{P f(h)}{P f(x)}\Big ) H(x,y)f(y)\mathrm {d}\nu (y) \, k(x,\mathrm {d}h) \\&\quad = \int _{M \setminus B_\varepsilon (x)}\int _{M} \frac{P f(h) - P f(x)}{P f(h)} H(h,y) f(y)\mathrm {d}\nu (y) \, k(x,\mathrm {d}h) \\&\qquad - \int _{M \setminus B_\varepsilon (x)}\int _{M} \frac{P f(h)-P f(x)}{P f(x)} H(x,y) f(y) \mathrm {d}\nu (y)\, k(x,\mathrm {d}h) \\&\quad =: I_{\varepsilon ,1} - I_{\varepsilon ,2}. \end{aligned}$$

Further, we have

$$\begin{aligned} I_{\varepsilon ,1}&= \int _{M \setminus B_\varepsilon (x)} \frac{P f(h)- P f(x)}{P f(h)} \int _{M} H(h,y) f(y) \mathrm {d}\nu (y) \, k(x,\mathrm {d}h)\\&= \int _{M \setminus B_{\varepsilon }(x)}\big (P f(h) - P f(x)\big ) k(x,\mathrm {d}h) \end{aligned}$$

and similarly

$$\begin{aligned} I_{\varepsilon ,2}&= \frac{1}{P f(x)}\int _{M \setminus B_\varepsilon (x)} \big (P f(h) - P f(x)\big )\int _{M} H(x,y) f(y)\mathrm {d}\nu (y) \, k(x,\mathrm {d}h) \\&= \int _{M \setminus B_{\varepsilon }(x)} \big (P f(h) - P f(x)\big )k(x,\mathrm {d}h), \end{aligned}$$

which shows that \(I_{\varepsilon ,1}-I_{\varepsilon ,2}=0\). Sending \(\varepsilon \rightarrow 0\) yields the claim. \(\square \)
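The convexity step in the above proof can also be probed numerically. The following sketch (an illustrative aside assuming numpy) samples random points and checks the tangent-line inequality used there; note that \(\Upsilon (\log r) = r - \log r - 1\):

```python
import numpy as np

def upsilon_log(r):
    # Upsilon(log r) = e^{log r} - log r - 1 = r - log r - 1
    return r - np.log(r) - 1.0

rng = np.random.default_rng(1)
r = rng.uniform(0.01, 10.0, 10_000)
s = rng.uniform(0.01, 10.0, 10_000)

# convexity of r -> Upsilon(log r) yields the tangent-line bound
# Upsilon(log r) - Upsilon(log s) >= (s - 1)/s * (r - s)
gap = upsilon_log(r) - upsilon_log(s) - (s - 1.0) / s * (r - s)
print(gap.min() >= -1e-12)  # True at every sampled pair
```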

Remark 2.3

Lemma 2.2 is a non-local version of the inequality

$$\begin{aligned} \int _{{\mathbb {R}}^n}\big |\nabla _x \log H(x,y)\big |^2 H(x,y) f(y)\,\mathrm {d}y\ge \big |\nabla \log Pf(x)\big |^2 Pf(x), \end{aligned}$$
(7)

where \(Pf(x)=\int _{{\mathbb {R}}^n}H(x,y)f(y)\mathrm {d}y\), for sufficiently regular, positive functions H and f. This shows, as already indicated by the chain rule (4), that the expression \(\Psi _\Upsilon (\log f)\) serves as a natural analogue of \(|\nabla \log f|^2\) in the non-local setting.

In contrast to inequality (5), the argument for (7) is quite simple. In fact, using Hölder’s inequality we have

$$\begin{aligned} (\partial _{x_i} Pf(x))^2&= \Bigg (\int _{{\mathbb {R}}^n} \partial _{x_i} H(x,y)f(y)\,\mathrm {d}y\Bigg )^2\\&\le \int _{{\mathbb {R}}^n} \frac{(\partial _{x_i} H(x,y))^2}{H(x,y)}f(y) \,\mathrm {d}y \,\, \int _{{\mathbb {R}}^n} H(x,y) f(y) \,\mathrm {d}y, \end{aligned}$$

which directly leads to (7) by summing up and employing the classical chain rule for the gradient.
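Inequality (7) can be observed numerically as well. The following sketch (an illustrative aside assuming numpy, with \(n=1\), the Gaussian kernel \(H(x,y)=(4\pi t)^{-1/2}e^{-(x-y)^2/(4t)}\) and an arbitrarily chosen positive f) compares both sides at a single point x:

```python
import numpy as np

t, x = 0.5, 0.3                          # illustrative choices
y = np.linspace(-20.0, 20.0, 40001)      # integration grid
dy = y[1] - y[0]

H = np.exp(-(x - y) ** 2 / (4 * t)) / np.sqrt(4 * np.pi * t)  # H(x, y)
f = 1.0 + np.sin(y) ** 2                                      # some positive f

Pf = np.sum(H * f) * dy                          # Pf(x)
dPf = np.sum(-(x - y) / (2 * t) * H * f) * dy    # d/dx Pf(x)

lhs = np.sum(((x - y) / (2 * t)) ** 2 * H * f) * dy  # int |d_x log H|^2 H f dy
rhs = dPf ** 2 / Pf                                  # |d_x log Pf(x)|^2 Pf(x)
print(lhs >= rhs)  # True, as guaranteed by the Cauchy-Schwarz argument
```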

We now come to our main result. For \(T>0\) arbitrary, we consider \(u:[0,T) \times M \rightarrow (0,\infty )\) with \(u(\cdot ,x) \in C^1\big ( (0,T)\big )\) for any \(x \in M\) and such that

$$\begin{aligned} \partial _t u(t,x) = L (u(t,\cdot ))(x) \end{aligned}$$
(8)

holds for any \((t,x)\in (0,T)\times M\). We set \(u_0(\cdot )=u(0,\cdot )\) and make the following assumptions:

  1. (A1)

    There exists a mapping \(p: (0,T) \times M \times M \rightarrow (0,\infty )\) such that \(p(t,x,\cdot )\) is \({\mathcal {B}}(M)\)-measurable and the restriction \(\left. p(t,\cdot ,\cdot )\right| _{M\setminus \{x\} \times M}\) is \({\mathcal {B}}(M \setminus \{x\}) \otimes {\mathcal {B}}(M)\)-measurable for any \((t,x) \in (0,T)\times M\). Moreover, there exists a \(\sigma \)-finite reference measure \(\mu :{\mathcal {B}}(M)\rightarrow [0,\infty ]\) such that p is differentiable with respect to time and satisfies for any \((t,x)\in (0,T)\times M\) the equation \(\partial _t p(t,x,y) = L (p(t,\cdot ,y))(x)\) for \(\mu \)-a.e. \(y \in M\), and, moreover, that the representation formula

    $$\begin{aligned} u(t,x)= \int _M p(t,x,y) u_0(y) \mathrm {d}\mu (y) \end{aligned}$$
    (9)

    holds true at any \((t,x) \in (0,T)\times M\).

  2. (A2)

    We have that

    $$\begin{aligned} \partial _t u(t,x) = \int _M \partial _t p(t,x,y) u_0(y) \mathrm {d}\mu (y) \end{aligned}$$

    holds at any \((t,x)\in (0,T)\times M\).

Note that assumption (A2) is quite mild in the sense that it can be justified by means of the dominated convergence theorem for a large class of examples.

Combining (4) and (8), we observe that

$$\begin{aligned} \partial _t \log u = \frac{\partial _t u}{u}=L (\log u) + \Psi _\Upsilon (\log u). \end{aligned}$$
(10)

Note that, due to assumption (A1), we can also replace u in (10) by \(p(\cdot ,\cdot ,y)\) for \(\mu \)-a.e. \(y\in M\).

The above assumptions have a clear motivation from the viewpoint of stochastic processes. Let \(\big (X_t\big )_{t \ge 0}\) be a Markov process on a probability space \((\Omega ,{\mathcal {F}},{\mathbb {P}})\) with state space given by a locally compact and separable metric space M. One defines the semigroup

$$\begin{aligned} P_t f (x) = {\mathbb {E}}\big (f(X_t) \,|\, X_0 = x\big ) \end{aligned}$$
(11)

for suitable functions f (e.g. bounded and measurable), which is then a solution to the corresponding Cauchy problem. Very often \(P_t f\) is given by an integral representation as in (9), where the measure \(\mu \) commonly plays the role of an invariant measure for the process and the kernel \(p(t,x,y)\) describes the transition density of the Markov process. Transition densities play a fundamental role in several aspects of probability theory and analysis. For instance, they form the basis of the parametrix technique to construct Lévy-type processes, see e.g. [19]. The case of \(\alpha \)-stable isotropic Lévy processes will be discussed in Sect. 3 below. Of somewhat different flavour is the situation when M is a discrete set, in which case we refer to \((X_t)_{t\ge 0}\) as a continuous-time Markov chain. Then the kernel \(p(t,x,y)\) coincides with the respective transition probability, i.e. the probability that the process is in state y at time t given that it started in state x. We illustrate this situation in Sect. 4 by considering the special case where the graph underlying the Markov generator is a complete graph.
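For the unweighted complete graph \(K_n\) (i.e. \(Lf(x)=\sum _{y\ne x}(f(y)-f(x))\) with unit rates), the transition kernel is explicit: the generator has eigenvalues 0 and \(-n\), and a standard computation (stated here as an assumption, not taken from the text) gives \(p(t,x,y)=\frac{1}{n}+(\delta _{xy}-\frac{1}{n})e^{-nt}\). The following sketch (assuming numpy and scipy) compares this with the matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

n, t = 4, 0.7                          # illustrative choices
J = np.ones((n, n))
Q = J - n * np.eye(n)                  # generator of K_n with unit rates

p_num = expm(t * Q)                    # transition kernel p(t, x, y)
p_exact = 1.0 / n + (np.eye(n) - 1.0 / n) * np.exp(-n * t)

print(np.allclose(p_num, p_exact))          # True
print(np.allclose(p_num.sum(axis=1), 1.0))  # each row is a probability vector
```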

The fundamental estimate of Lemma 2.2 allows us to reduce the problem of deriving Li–Yau inequalities for (8) to the kernel function of assumption (A1), as the following result shows.

Theorem 2.4

Let \(u:[0,T)\times M \rightarrow (0,\infty )\) solve (8) and satisfy assumptions (A1) and (A2) such that at any \((t,x)\in (0,T)\times M\) the expressions \(L (\log p(t,\cdot ,y))(x)\), for \(\mu \)-a.e. \(y \in M\), and \(L(\log u(t,\cdot ))(x)\) exist. If the estimate

$$\begin{aligned} -L \big (\log p(t,\cdot ,y)\big )(x)\le \varphi (t,x) \end{aligned}$$
(12)

holds at any \((t,x) \in (0,T)\times M\) and \(\mu \)-a.e. \(y\in M\), where \(\varphi : (0,\infty )\times M \rightarrow {\mathbb {R}}\), then the Li–Yau type inequality

$$\begin{aligned} - L (\log u(t,\cdot ))(x)\le \varphi (t,x) \end{aligned}$$
(13)

holds true for every \((t,x) \in (0,T)\times M\).

Remark 2.5

Using the evolution equation (10) for \(\log u\), the Li–Yau inequality (13) from Theorem 2.4 implies the differential Harnack inequality

$$\begin{aligned} \partial _t \log u(t,x)\ge \Psi _\Upsilon (\log u)(t,x)-\varphi (t,x) \end{aligned}$$
(14)

for all \((t,x) \in (0,T)\times M\). In the discrete setting, it has been shown that inequalities of the form (14) imply Harnack inequalities for positive solutions of the diffusion equation associated with the operator L, see [16]. To illustrate this, we will look at the example of an unweighted complete graph in Sect. 5. In the same section, we will further consider the fractional heat equation and show that from the corresponding differential Harnack inequality a Harnack estimate can be derived as well.

Proof

The two key ingredients in the proof of Theorem 2.4 are Lemma 2.2 and the evolution equation (10) for \(\log u\).

Using (10), we can rewrite (13) equivalently as

$$\begin{aligned} u(t,x) \, \Psi _\Upsilon (\log u(t,\cdot ) )(x) \le \partial _t u(t,x) + \varphi (t,x)\, u(t,x). \end{aligned}$$
(15)

By the same argument, we know by (A1) that the assumption (12) can be reformulated into

$$\begin{aligned} p(t,x,y) \, \Psi _\Upsilon (\log p(t,\cdot ,y) )(x) \le \partial _t p(t,x,y) + \varphi (t,x)\, p(t,x,y) \end{aligned}$$
(16)

for \(\mu \)-a.e. \(y \in M\). We observe from the assumptions (A1) and (A2) that

$$\begin{aligned}&\partial _t u(t,x) + \varphi (t,x)\,u(t,x) \\&\quad = \int _M \partial _t p(t,x,y)u_0(y) \mathrm {d}\mu (y) + \varphi (t,x) \int _M p(t,x,y) u_0(y) \mathrm {d}\mu (y) \\&\quad = \int _M \Big ( \partial _t p(t,x,y) + \varphi (t,x)p(t,x,y) \Big ) u_0(y) \mathrm {d}\mu (y) \\&\quad \ge \int _M \Psi _\Upsilon \big ( \log p(t,\cdot ,y)\big )(x)\, p(t,x,y) u_0(y) \mathrm {d}\mu (y)\\&\quad \ge \Psi _\Upsilon \big ( \log u(t,\cdot )\big )(x) \, u(t,x), \end{aligned}$$

where we used (16) in the second to last line and applied Lemma 2.2 in the last line (with \(H(x,y)=p(t,x,y)\)) at \((t,x) \in (0,T)\times M\). \(\square \)

Remark 2.6

Note that we could relax the assumptions of Theorem 2.4 in order to obtain a local version of the above result. More precisely, given some fixed \((t,x)\in (0,T)\times M\), we only need to assume that \(L\big ( \log p(t,\cdot ,y)\big )(x)\), for \(\mu \)-a.e. \(y\in M\), and \(L \big ( \log u(t,\cdot )\big )(x)\) exist. If then \(-L\big ( \log p(t,\cdot ,y)\big )(x)\le C\) holds for \(\mu \)-a.e. \(y\in M\) and a constant \(C\in {\mathbb {R}}\), we infer that \(-L\big ( \log u(t,\cdot )\big )(x)\le C\) by the same arguments as in the proof of Theorem 2.4.

3 Application to the fractional heat equation

In this section we want to apply Theorem 2.4 to derive Li–Yau type inequalities for positive solutions to the fractional heat equation

$$\begin{aligned} \partial _t u + \big (- \Delta \big )^\frac{\beta }{2}u = 0 \end{aligned}$$
(17)

on the full space \({\mathbb {R}}^d\), where \(\beta \in (0,2)\).

The fractional Laplacian on \({\mathbb {R}}^d\) can be defined as

$$\begin{aligned} - \big (-\Delta \big )^\frac{\beta }{2}f(x) = c_{\beta ,d}\,\, \mathrm {p.v.}\int _{{\mathbb {R}}^d} \frac{f(y)-f(x)}{|x-y|^{d+\beta }}\, \mathrm {d}y, \end{aligned}$$

where \(\beta \in (0,2)\), the normalizing constant \(c_{\beta ,d}\) is given by \(c_{\beta , d} = \frac{2^\beta \Gamma (\frac{d+\beta }{2})}{\pi ^\frac{d}{2}|\Gamma (-\frac{\beta }{2})|}\) and f is a suitable function (e.g. in \(C^2({\mathbb {R}}^d) \cap L^\infty ({\mathbb {R}}^d)\)). There is also an equivalent definition in the pointwise sense that reads as

$$\begin{aligned} - (-\Delta )^\frac{\beta }{2}f(x) = \frac{c_{\beta ,d}}{2} \int _{{\mathbb {R}}^d} \frac{f(x+y)+f(x-y) - 2f(x)}{|y|^{d+\beta }}\mathrm {d}y \end{aligned}$$

and has the advantage of avoiding the principal value integral.
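As a sanity check on the normalizing constant, recall that \((-\Delta )^{\beta /2}\) has Fourier symbol \(|\xi |^{\beta }\), so in dimension one \((-\Delta )^{\beta /2}\cos = \cos \). Since \(\cos (x+y)+\cos (x-y)-2\cos (x) = -2\cos (x)(1-\cos y)\), the pointwise formula reduces this to the identity \(2c_{\beta ,1}\int _0^\infty (1-\cos y)\,y^{-1-\beta }\,\mathrm {d}y = 1\). The following sketch (an illustrative aside assuming numpy and scipy; the truncation at \(200\pi \) and its tail bound are ad hoc choices) checks this for \(\beta =1\):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

beta, d = 1.0, 1
c = 2 ** beta * gamma((d + beta) / 2) / (np.pi ** (d / 2) * abs(gamma(-beta / 2)))

# integrate (1 - cos y) / y^(1 + beta) over (0, 200*pi), one period at a time
integrand = lambda y: (1.0 - np.cos(y)) / y ** (1.0 + beta)
I_core = sum(quad(integrand, m * np.pi, (m + 1) * np.pi)[0] for m in range(200))

# the tail beyond 200*pi is at most 2 / (beta * (200*pi)^beta)
tail_bound = 2.0 / (beta * (200.0 * np.pi) ** beta)

print(abs(2.0 * c * I_core - 1.0) <= 2.0 * c * tail_bound + 1e-6)  # True
```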

Using classical Fourier techniques, one can express solutions to (17) as

$$\begin{aligned} u(t,x) = \big ( G^{(\beta )}(t,\cdot ) *u_0 \big )(x) = \int _{{\mathbb {R}}^d}G^{(\beta )}(t,x-y) u_0(y) \mathrm {d}y, \end{aligned}$$
(18)

where we refer to \(G^{(\beta )}\) as the fractional heat kernel of order \(\beta \in (0,2)\). We will discuss below that this representation formula in fact holds true for the class of solutions we are interested in.

But before that, we collect some properties of the fractional heat kernels, which we will use subsequently. We refer to [17] and the references therein for a more detailed account.

In the special case of \(\beta =1\), there is an explicit formula for the fractional heat kernel available, which reads as

$$\begin{aligned} G^{(1)}(t,x)= \frac{\Gamma \big (\frac{d+1}{2}\big )}{\pi ^\frac{d+1}{2}} \frac{t}{\big (t^2 + |x|^2\big )^{\frac{d+1}{2}}}. \end{aligned}$$
(19)
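In particular, for \(d=1\) formula (19) reads \(G^{(1)}(t,x)=\frac{1}{\pi }\frac{t}{t^2+x^2}\), the Cauchy (Poisson) kernel. A short numerical sketch (an illustrative aside assuming numpy and scipy) confirms that it is a probability density for each \(t>0\):

```python
import numpy as np
from scipy.integrate import quad

def G1(t, x):
    # fractional heat kernel for beta = 1 and d = 1, cf. (19)
    return t / (np.pi * (t ** 2 + x ** 2))

for t in (0.1, 1.0, 5.0):
    mass, _ = quad(lambda x: G1(t, x), -np.inf, np.inf)
    print(round(mass, 6))  # 1.0 for every t
```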

Concerning the general situation, it goes back to [9] that the fractional heat kernels behave as follows

$$\begin{aligned} G^{(\beta )}(t,x) \asymp \frac{t}{\big (t^{\frac{2}{\beta }} + |x|^2\big )^{\frac{d+\beta }{2}}}, \quad (t,x) \in (0,\infty )\times {\mathbb {R}}^d, \end{aligned}$$
(20)

where the symbol \(\asymp \) means that the ratio is bounded from above and below by positive constants. In particular, this yields that for any non-negative \(u_0\) we have

$$\begin{aligned} \int _{{\mathbb {R}}^d}G^{(\beta )}(t,x-y) u_0(y) \mathrm {d}y \asymp t \int _{{\mathbb {R}}^d} \frac{u_0(y)}{\big (t^{\frac{2}{\beta }} + |x-y|^2\big )^{\frac{d+\beta }{2}}}\mathrm {d}y, \end{aligned}$$
(21)

which shows that the integral on the right-hand side of (18) exists if and only if the integral on the right-hand side of (21) is finite.

Further, \(G^{(\beta )}(t,\cdot ) \in C^\infty ({\mathbb {R}}^d)\), and the fractional heat kernels can be expressed in the following self-similar form

$$\begin{aligned} G^{(\beta )}(t,x) = t^{-\frac{d}{\beta }} \Phi _{\beta }\left( \frac{x}{t^{\frac{1}{\beta }}}\right) , \end{aligned}$$
(22)

where \((t,x) \in (0,\infty )\times {\mathbb {R}}^d\) and \(\Phi _{\beta }(y)=G^{(\beta )}(1,y)\) for any \(y \in {\mathbb {R}}^d\).
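For \(\beta =1\) and \(d=1\), the self-similar form (22) can be read off directly from the explicit formula (19). The following sketch (an illustrative aside assuming numpy) checks \(G^{(1)}(t,x)=t^{-1}\Phi _1(x/t)\) on a grid:

```python
import numpy as np

def G1(t, x):
    # explicit kernel for beta = 1 and d = 1, cf. (19)
    return t / (np.pi * (t ** 2 + x ** 2))

Phi1 = lambda y: G1(1.0, y)            # Phi_beta(y) = G^{(beta)}(1, y)

x = np.linspace(-10.0, 10.0, 1001)
for t in (0.2, 1.0, 3.0):
    # d = 1, beta = 1: t^{-d/beta} Phi_1(x / t^{1/beta}) = Phi_1(x/t) / t
    print(np.allclose(G1(t, x), Phi1(x / t) / t))  # True
```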

There also exist specific estimates for the derivatives of the fractional heat kernels, for which we refer to [20], where these have been studied extensively in great generality. We will only make use of

$$\begin{aligned} |\nabla \Phi _{\beta }(x)| \lesssim \frac{1}{|x|^{d+\beta +1}} \end{aligned}$$
(23)

and

$$\begin{aligned} \Vert \nabla ^2 \Phi _{\beta }(x) \Vert \lesssim \frac{1}{|x|^{d+\beta +2}}, \end{aligned}$$
(24)

which hold whenever \(|x|\ge 1\), cf. [20]. Here we have used the symbol \(\lesssim \) to indicate that the corresponding constant is independent of x.
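For \(\beta =1\) and \(d=1\), estimate (23) can be verified by hand from the explicit kernel: \(\Phi _1(y)=\frac{1}{\pi (1+y^2)}\) gives \(|\Phi _1'(y)|=\frac{2|y|}{\pi (1+y^2)^2}\le \frac{2}{\pi }|y|^{-(d+\beta +1)}\) for \(|y|\ge 1\), since \(y^4\le (1+y^2)^2\). A numerical sketch (an illustrative aside assuming numpy):

```python
import numpy as np

y = np.linspace(1.0, 100.0, 10_000)             # the region |y| >= 1
dPhi = 2.0 * y / (np.pi * (1.0 + y ** 2) ** 2)  # |Phi_1'(y)| for the Cauchy kernel

# estimate (23) with d = beta = 1: |grad Phi_1(y)| <~ |y|^{-3}
bound = 2.0 / (np.pi * y ** 3)
print(np.all(dPhi <= bound))  # True
```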

In [6] a Widder-type theorem for the fractional heat equation has been established. More precisely, any non-negative strong solution to (17) is given by (18). Here, by a strong solution we mean that for any fixed \(T>0\) we have \(\partial _t u \in C\big ( (0,T)\times {\mathbb {R}}^d\big )\), \(u \in C \big ( [0,T)\times {\mathbb {R}}^d\big )\) and that (17) holds in the pointwise sense for every \((t,x)\in (0,T)\times {\mathbb {R}}^d\). Consequently, assumption (A1) is satisfied for any positive strong solution u, where the kernel is given by \(p(t,x,y)=G^{(\beta )}(t,x-y)\) and the measure \(\mu \) is the Lebesgue measure on \({\mathbb {R}}^d\). We point out that it is not at all clear whether a Widder-type theorem holds for the heat equation associated with the operator L in the general framework considered in Sect. 2.

Concerning assumption (A2), we refer to [26], where the bound

$$\begin{aligned} | \big ( - \Delta \big )^\frac{\beta }{2} (G^{(\beta )}(t,\cdot ))(x)| \le \frac{C}{\big (t^\frac{2}{\beta } + |x|^2\big )^\frac{d+\beta }{2}} \end{aligned}$$

has been established for some constant \(C>0\). This shows that assumption (A2) follows from the dominated convergence theorem, due to the existence of the integral on the right-hand side of (18) (cf. (21)).

Lemma 3.1

We have that

$$\begin{aligned} \big (-\Delta \big )^{\frac{\beta }{2}}(\log G^{(\beta )}(t,\cdot ))(x) \le \frac{C_{LY}(\beta ,d)}{t}, \end{aligned}$$
(25)

holds at any \(x \in {\mathbb {R}}^d\), \(t>0\) and \(\beta \in (0,2)\), where the finite constant \(C_{LY}(\beta ,d)>0\) is given by

$$\begin{aligned} C_{LY}(\beta ,d)=\frac{c_{\beta ,d}}{2} \sup \limits _{y \in {\mathbb {R}}^d}\int _{{\mathbb {R}}^d}\frac{\log \Big (\frac{ \Phi _{\beta }(y)^2}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\Big )}{|\sigma |^{d+\beta }}\mathrm {d}\sigma . \end{aligned}$$
(26)

Proof

We consider

$$\begin{aligned}&\big ( - \Delta \big )^{\frac{\beta }{2}}(\log G^{(\beta )}(t,\cdot ))(x) \\&\quad = \frac{c_{\beta ,d}}{2} \int _{{\mathbb {R}}^d} \frac{2 \log G^{(\beta )}(t,x) - \log G^{(\beta )}(t,x+h)-\log G^{(\beta )}(t,x-h)}{|h|^{d+\beta }}\mathrm {d}h \\&\quad = \frac{c_{\beta ,d}}{2} \int _{{\mathbb {R}}^d} \frac{\log \Big (\frac{G^{(\beta )}(t,x)^2}{G^{(\beta )}(t,x+h)\,G^{(\beta )}(t,x-h)}\Big )}{|h|^{d+\beta }}\mathrm {d}h. \end{aligned}$$

Using the self-similar form (22) and setting \(y=\frac{x}{t^{\frac{1}{\beta }}}\), we get that

$$\begin{aligned} \int _{{\mathbb {R}}^d} \frac{\log \Big (\frac{G^{(\beta )}(t,x)^2}{G^{(\beta )}(t,x+h)G^{(\beta )}(t,x-h)}\Big )}{|h|^{d+\beta }}\mathrm {d}h&= \frac{1}{t} \int _{{\mathbb {R}}^d}\frac{\log \Big (\frac{ \Phi _{\beta }(y)^2}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\Big )}{|\sigma |^{d+\beta }}\mathrm {d}\sigma , \end{aligned}$$

where we have substituted \(\sigma =\frac{h}{t^{\frac{1}{\beta }}}\). Consequently, it suffices to show that the mapping \(J:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} J(y)=\int _{{\mathbb {R}}^d}\frac{\log \Big (\frac{ \Phi _{\beta }(y)^2}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\Big )}{|\sigma |^{d+\beta }}\mathrm {d}\sigma \end{aligned}$$

is bounded. By continuity, it suffices to show that J is bounded on \({\mathbb {R}}^d \setminus B_{R}(0)\) for some \(R>2\). In the sequel, we assume that \(|y|\ge R\). We write

$$\begin{aligned} J(y)&= \int _{{\mathbb {R}}^d \setminus B_1(0)}\frac{\log \Big (\frac{ \Phi _{\beta }(y)^2}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\Big )}{|\sigma |^{d+\beta }}\mathrm {d}\sigma + \int _{B_1(0)} \frac{\log \Big (\frac{ \Phi _{\beta }(y)^2}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\Big )}{|\sigma |^{d+\beta }}\mathrm {d}\sigma \\&=: J_1(y)+J_2(y). \end{aligned}$$

First, we consider \(J_1(y)\) and use the behaviour described by (20) (with \(t=1\)) to find a constant \(M>1\) such that

$$\begin{aligned} J_1(y) \le \frac{d+\beta }{2}\,\int _{{\mathbb {R}}^d \setminus B_1(0)} \frac{\log \Big (M\frac{(1+|y+\sigma |^2)(1+|y-\sigma |^2)}{(1+|y|^2)^2}\Big )}{|\sigma |^{d+\beta }}\mathrm {d}\sigma . \end{aligned}$$

We have that

$$\begin{aligned} (1+|y+\sigma |^2)(1+|y-\sigma |^2)&\le \big (1+2|y|^2 + 2|\sigma |^2)^2. \end{aligned}$$

In the case of \(|y|> |\sigma |\), this yields that

$$\begin{aligned} \frac{(1+|y+\sigma |^2)(1+|y-\sigma |^2)}{(1+|y|^2)^2} \le \Big (\frac{1+4|y|^2}{1+|y|^2}\Big )^2 \le M', \end{aligned}$$

where \(M'>1\) is some constant that is independent of y. In the other case of \(|y|\le |\sigma |\), we have

$$\begin{aligned} \frac{(1+|y+\sigma |^2)(1+|y-\sigma |^2)}{(1+|y|^2)^2}\le \big (1+ 4|\sigma |^2\big )^2. \end{aligned}$$

Putting these estimates together and using the monotonicity of the logarithm, we end up with

$$\begin{aligned} J_1(y)&\le \frac{d+\beta }{2}\,\int _{B_{|y|}(0)\setminus B_1(0)} \frac{\log (M M')}{|\sigma |^{d+\beta }}\mathrm {d}\sigma \\&\quad + \frac{d+\beta }{2}\,\int _{{\mathbb {R}}^d \setminus B_{|y|}(0)} \frac{\log (M (1+4|\sigma |^2)^2)}{|\sigma |^{d+\beta }}\mathrm {d}\sigma \\&\le \frac{d+\beta }{2}\,\int _{{\mathbb {R}}^d \setminus B_1(0)} \frac{\log \big (M^2 M' (1+4|\sigma |^2)^2\big )}{|\sigma |^{d+\beta }}\mathrm {d}\sigma \le C_1 <\infty , \end{aligned}$$

where \(C_1>0\) is some constant which is independent of y.

Now, we turn to \(J_2(y)\). Using the inequality \(\log r \le r-1\), \(r\in (0,\infty )\), we infer

$$\begin{aligned} J_2(y) \le \int _{B_1(0)} \frac{ \Phi _{\beta }(y)^2 - \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\frac{\mathrm {d}\sigma }{|\sigma |^{d+\beta }}. \end{aligned}$$
(27)

Given some \(\eta \in B_1(0)\), we have by the Cauchy–Schwarz inequality that

$$\begin{aligned}&1-\frac{2}{R}\le 1- \frac{2|\eta |}{|y|} \le 1- \frac{2|\eta |}{|y|}+\frac{|\eta |^2}{|y|^2}\le \frac{|y+\eta |^2}{|y|^2} \\&\quad \le 1+ \frac{2|\eta |}{|y|}+\frac{|\eta |^2}{|y|^2}\le 1+\frac{2}{R}+\frac{1}{R^2}. \end{aligned}$$

Consequently, since \(R>2\), we observe that

$$\begin{aligned} |y+\eta |^2 \asymp |y|^2, \end{aligned}$$
(28)

where the respective constants are in particular independent of \(\eta \).

Further, by using Taylor’s expansion we get that

$$\begin{aligned} \Phi _{\beta }(y+\sigma )&= \Phi _{\beta }(y) + \sigma \cdot \nabla \Phi _{\beta }(y)^T + \frac{1}{2} \sigma \cdot \nabla ^2 \Phi _{\beta }(y+\zeta _+ \sigma ) \sigma , \\ \Phi _{\beta }(y-\sigma )&= \Phi _{\beta }(y) - \sigma \cdot \nabla \Phi _{\beta }(y)^T + \frac{1}{2} \sigma \cdot \nabla ^2 \Phi _{\beta }(y-\zeta _- \sigma ) \sigma , \end{aligned}$$

where \(\zeta _-,\zeta _+ \in [0,1]\). Then we get

$$\begin{aligned}&\Phi _{\beta }(y)^2 - \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )\\&\quad = -\frac{ \Phi _{\beta }(y)}{2} (\sigma \cdot \nabla ^2 \Phi _{\beta }(y+\zeta _+ \sigma )\sigma +\sigma \cdot \nabla ^2 \Phi _{\beta }(y-\zeta _- \sigma )\sigma )\\&\qquad -\frac{\sigma \cdot \nabla \Phi _{\beta }(y)^T}{2} (\sigma \cdot \nabla ^2 \Phi _{\beta }(y-\zeta _- \sigma )\sigma - \sigma \cdot \nabla ^2 \Phi _{\beta }(y+\zeta _+ \sigma )\sigma )\\&\qquad - \frac{1}{4}\big (\sigma \cdot \nabla ^2 \Phi _{\beta }(y+\zeta _+ \sigma )\sigma \big )\big (\sigma \cdot \nabla ^2 \Phi _{\beta }(y-\zeta _-\sigma )\sigma \big )\\&\qquad + \big ( \sigma \cdot \nabla \Phi _{\beta }(y)^T \big )^2. \end{aligned}$$
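Writing \(g=\sigma \cdot \nabla \Phi _{\beta }(y)^T\), \(a=\sigma \cdot \nabla ^2 \Phi _{\beta }(y+\zeta _+\sigma )\sigma \) and \(b=\sigma \cdot \nabla ^2 \Phi _{\beta }(y-\zeta _-\sigma )\sigma \), the identity above is purely algebraic. The following Python sketch (an illustration only, not part of the proof, assuming NumPy is available) confirms it with scalar stand-ins:

```python
import numpy as np

# Scalar stand-ins for the quantities in the Taylor identity:
#   Phi = Phi_beta(y),  g = sigma . grad Phi_beta(y),
#   a = sigma . (D^2 Phi_beta(y + zeta_+ sigma)) sigma,
#   b = sigma . (D^2 Phi_beta(y - zeta_- sigma)) sigma,
# so that Phi_beta(y + sigma) = Phi + g + a/2, Phi_beta(y - sigma) = Phi - g + b/2.
rng = np.random.default_rng(0)
max_err = 0.0
for _ in range(1000):
    Phi, g, a, b = rng.normal(size=4)
    lhs = Phi**2 - (Phi + g + a / 2) * (Phi - g + b / 2)
    rhs = -Phi / 2 * (a + b) - g / 2 * (b - a) - a * b / 4 + g**2
    max_err = max(max_err, abs(lhs - rhs))
```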

Next, we aim to suitably estimate the integrand on the right-hand side of (27). For that purpose, we will subsequently make use of the estimates (20) (with \(t=1\)), (23), (24), (28) and the Cauchy–Schwarz inequality. Moreover, we will use the symbol ‘\(\lesssim \)’ whenever the corresponding constant is independent of |y|.

We have

$$\begin{aligned}&\frac{ | \Phi _{\beta }(y)\, \sigma \cdot \nabla ^2 \Phi _{\beta }(y+\zeta _+ \sigma )\sigma |}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\\&\quad \lesssim \Big ( \frac{(1+|y+\sigma |^2)(1+|y-\sigma |^2)}{1+|y|^2}\Big )^{\frac{d+\beta }{2}} |\sigma |^2 \Vert \nabla ^2 \Phi _{\beta }(y+\zeta _+\sigma ) \Vert \\&\quad \lesssim \Big ( \frac{|y+\sigma |^2|y-\sigma |^2}{|y|^2}\Big )^{\frac{d+\beta }{2}}\frac{|\sigma |^2}{|y+\zeta _+\sigma |^{d+\beta +2}} \lesssim \frac{|\sigma |^2}{|y|^2}. \end{aligned}$$

Clearly, the corresponding term with \(-\zeta _-\) instead of \(\zeta _+\) can be treated analogously. Next, we consider

$$\begin{aligned}&\frac{\big |\big ( \sigma \cdot \nabla \Phi _{\beta }(y)^T\big )\big ( \sigma \cdot \nabla ^2 \Phi _{\beta }(y+\zeta _+\sigma )\sigma \big )\big |}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\\&\quad \lesssim |\sigma |^3 |\nabla \Phi _{\beta }(y)| \Vert \nabla ^2 \Phi _{\beta }(y+\zeta _+\sigma )\Vert \big ( (1+|y+\sigma |^2)(1+|y-\sigma |^2)\big )^{\frac{d+\beta }{2}}\\&\quad \lesssim \frac{|\sigma |^3 |y+\sigma |^{d+\beta }|y-\sigma |^{d+\beta }}{|y|^{d+\beta +1}|y+\zeta _+\sigma |^{d+\beta +2}} \lesssim \frac{|\sigma |^3}{|y|^3} \end{aligned}$$

and again, the corresponding term with \(-\zeta _-\) instead of \(\zeta _+\) can be treated analogously. Furthermore, we have

$$\begin{aligned}&\frac{\big |\big (\sigma \cdot \nabla ^2 \Phi _{\beta }(y+\zeta _+ \sigma )\sigma \big )\big (\sigma \cdot \nabla ^2 \Phi _{\beta }(y-\zeta _-\sigma )\sigma \big )\big |}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}\\&\quad \lesssim |\sigma |^4 \Vert \nabla ^2 \Phi _{\beta }(y+\zeta _+\sigma )\Vert \Vert \nabla ^2 \Phi _{\beta }(y-\zeta _- \sigma )\Vert \big ( (1+|y+\sigma |^2)(1+|y-\sigma |^2)\big )^{\frac{d+\beta }{2}}\\&\quad \lesssim \frac{|\sigma |^4 |y+\sigma |^{d+\beta } |y-\sigma |^{d+\beta }}{|y+\zeta _+\sigma |^{d+\beta +2} |y-\zeta _-\sigma |^{d+\beta +2}} \lesssim \frac{|\sigma |^4}{|y|^4} \end{aligned}$$

and last but not least

$$\begin{aligned} \frac{\big ( \sigma \cdot \nabla \Phi _{\beta }(y)^T \big )^2}{ \Phi _{\beta }(y+\sigma ) \Phi _{\beta }(y-\sigma )}&\lesssim |\sigma |^2 |\nabla \Phi _{\beta }(y)|^2 \big ( (1+|y+\sigma |^2)(1+|y-\sigma |^2)\big )^{\frac{d+\beta }{2}}\\&\lesssim \frac{|\sigma |^2 |y+\sigma |^{d+\beta }|y-\sigma |^{d+\beta }}{|y|^{2d+2\beta +2}} \lesssim \frac{|\sigma |^2}{|y|^2}. \end{aligned}$$

Putting everything together, we infer from (27) that

$$\begin{aligned} J_2(y) \lesssim \frac{1}{|y|^2} \int _{B_1(0)}|\sigma |^{2-d-\beta }\mathrm {d}\sigma \le C_2 <\infty , \end{aligned}$$

for some constant \(C_2>0\) being independent of y. \(\square \)

Observe that the proof of Lemma 3.1 also shows that \(C_{LY}(\beta ,d)\) is the smallest constant among all \(C>0\) satisfying

$$\begin{aligned} \big (-\Delta \big )^{\frac{\beta }{2}}(\log G^{(\beta )}(t,\cdot ))(x) \le \frac{C}{t},\quad t>0,\,x\in {\mathbb {R}}^d. \end{aligned}$$

In this sense the constant \(C_{LY}(\beta ,d)\) is optimal.

Combining Theorem 2.4 with Lemma 3.1, we have shown the following Li–Yau type inequality for the fractional heat equation.

Theorem 3.2

Let \(u:[0,T)\times {\mathbb {R}}^d \rightarrow (0,\infty )\) be a strong solution to the fractional heat equation (17). Then the Li–Yau type inequality

$$\begin{aligned} \big (-\Delta \big )^\frac{\beta }{2} (\log u(t,\cdot ))(x) \le \frac{C_{LY}(\beta ,d)}{t} \end{aligned}$$
(29)

holds at any \((t,x)\in (0,T)\times {\mathbb {R}}^d\), where the constant \(C_{LY}(\beta ,d)>0\) is given by (26).

In the special case \(\beta =1\), an analytic expression in closed form is available for the fractional heat kernel, which allows one to compute the constant of the Li–Yau inequality from Theorem 3.2 explicitly.

Proposition 3.3

For \(\beta =1\) the Li–Yau constant (26) is given by

$$\begin{aligned} C_{LY}(1,d)= \frac{\pi \, d(d+1)\, c_{1,d}\, \omega _d}{2} = \frac{d(d+1)}{2B\big (\frac{d+1}{2},\frac{1}{2}\big )}, \end{aligned}$$
(30)

where \(\omega _d\) denotes the volume of the d-dimensional unit ball in \({\mathbb {R}}^d\) and B the Beta function.

Proof

First, we show that

$$\begin{aligned} \sup \limits _{x \in {\mathbb {R}}^d}\int _{{\mathbb {R}}^d}\frac{\log \Big (\frac{ \Phi _{1}(x)^2}{\Phi _{1}(x+\sigma )\Phi _{1}(x-\sigma )}\Big )}{|\sigma |^{d+1}}\mathrm {d}\sigma = \int _{{\mathbb {R}}^d}\frac{\log \Big (\frac{\Phi _{1}(0)^2}{\Phi _{1}(\sigma ) \Phi _{1}(-\sigma )}\Big )}{|\sigma |^{d+1}}\mathrm {d}\sigma . \end{aligned}$$
(31)

Let \(x\in {\mathbb {R}}^d\) be fixed and \(\sigma \in {\mathbb {R}}^d\) arbitrary. Using the explicit representation (19), we get that

$$\begin{aligned} \log \Bigg (\frac{\Phi _{1}(x)^2}{\Phi _{1}(x+\sigma ) \Phi _{1}(x-\sigma )}\Bigg ) = \frac{d+1}{2}\log \Bigg (\frac{(1+|x+\sigma |^2)(1+|x-\sigma |^2)}{(1+|x|^2)^2}\Bigg ). \end{aligned}$$

We have

$$\begin{aligned} \frac{(1+|x+\sigma |^2)(1+|x-\sigma |^2)}{(1+|x|^2)^2}&= 1 + \frac{2|\sigma |^2 + |\sigma |^4 + 2 |x|^2|\sigma |^2 - 4 (x\cdot \sigma )^2}{1+2|x|^2+|x|^4}\\&\le 1 + \frac{2|\sigma |^2 + |\sigma |^4 + 2 |x|^2|\sigma |^2}{1+2|x|^2+|x|^4}. \end{aligned}$$
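This expansion can be spot-checked numerically; the following sketch (an illustration only, assuming NumPy) tests it for random \(x,\sigma \) in several dimensions:

```python
import numpy as np

# Numerical check of the expansion
#   (1+|x+s|^2)(1+|x-s|^2)/(1+|x|^2)^2
#     = 1 + (2|s|^2 + |s|^4 + 2|x|^2|s|^2 - 4(x.s)^2) / (1 + 2|x|^2 + |x|^4).
rng = np.random.default_rng(1)
max_err = 0.0
for _ in range(500):
    d = int(rng.integers(1, 6))
    x, s = rng.normal(size=(2, d))
    x2, s2 = x @ x, s @ s
    lhs = (1 + (x + s) @ (x + s)) * (1 + (x - s) @ (x - s)) / (1 + x2) ** 2
    rhs = 1 + (2 * s2 + s2**2 + 2 * x2 * s2 - 4 * (x @ s) ** 2) / (1 + 2 * x2 + x2**2)
    max_err = max(max_err, abs(lhs - rhs))
```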

We set \(a:=|\sigma |^2\) and consider \(h_a:[0,\infty )\rightarrow [0,\infty )\) given by \(h_a(z)= \frac{2az+2a+a^2}{1+2z+z^2}\). Note that

$$\begin{aligned} \big (1+h_a(0)\big )^{\frac{d+1}{2}} = \big (1+2|\sigma |^2+|\sigma |^4\big )^\frac{d+1}{2} = \frac{\Phi _1(0)^2}{\Phi _1(\sigma )\Phi _1(-\sigma )}. \end{aligned}$$

Hence, in order to establish (31) it suffices to show that \(h_a\) is decreasing on \([0,\infty )\) for any \(a\in [0,\infty )\). We have that

$$\begin{aligned} h_a'(z) = \frac{(1+2z+z^2)2a - (2az+2a+a^2)(2z+2)}{(1+2z+z^2)^2}, \end{aligned}$$

which is non-positive if and only if

$$\begin{aligned} \frac{1+2z+z^2}{2z+2}-z \le 1+\frac{a}{2}. \end{aligned}$$

This inequality holds true since

$$\begin{aligned} \frac{1+2z+z^2}{2z+2}-z = \frac{1-z^2}{2z+2} = \frac{1-z}{2}\le \frac{1}{2}. \end{aligned}$$
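A numerical confirmation of the monotonicity of \(h_a\) (an illustration only, assuming NumPy):

```python
import numpy as np

# Check that h_a(z) = (2az + 2a + a^2) / (1 + 2z + z^2) is decreasing on [0, 50]
# for a range of values of a = |sigma|^2.
z = np.linspace(0.0, 50.0, 2001)
decreasing = True
for a in [1e-3, 0.1, 1.0, 10.0, 1e3]:
    h = (2 * a * z + 2 * a + a**2) / (1 + 2 * z + z**2)
    decreasing = decreasing and bool(np.all(np.diff(h) <= 1e-12))
```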

Consequently, (31) is established. Since \(\Phi _1\) is radially symmetric, the calculation of the corresponding integral reduces to the one-dimensional case. More precisely, we have

$$\begin{aligned} \int _{{\mathbb {R}}^d}\frac{\log \Big (\frac{\Phi _{1}(0)^2}{\Phi _{1}(\sigma ) \Phi _{1}(-\sigma )}\Big )}{|\sigma |^{d+1}}\mathrm {d}\sigma = 2 \,\frac{d+1}{2}\, d\, \omega _d \int _0^\infty \frac{\log \big ( 1+r^2\big )}{r^2}\mathrm {d}r = \pi \, d(d+1) \, \omega _d, \end{aligned}$$

where the latter follows from an elementary calculation. The right-hand side in (30) follows from \(\omega _d=\frac{\pi ^\frac{d}{2}}{\Gamma (\frac{d}{2}+1)}\) and \(\Gamma (\frac{1}{2})=\sqrt{\pi }\), which yields

$$\begin{aligned} \pi c_{1,d} \omega _d = \sqrt{\pi } \frac{\Gamma (\frac{d+1}{2})}{\Gamma (\frac{d}{2}+1)} = \frac{\Gamma (\frac{1}{2})\Gamma (\frac{d+1}{2})}{\Gamma (\frac{d}{2}+1)} = \frac{1}{B\big (\frac{d+1}{2},\frac{1}{2}\big )}. \end{aligned}$$

\(\square \)

Remark 3.4

It is still an open question whether the Li–Yau constant \(C_{LY}(\beta ,d)\) tends to the Li–Yau constant \(\frac{d}{2}\) from the classical heat equation as \(\beta \rightarrow 2\).

As highlighted throughout Sect. 2, our approach depends more on the specific structure of the operator than on the state space. Therefore it is natural to ask whether fractional powers of the Laplace–Beltrami operator on a Riemannian manifold fit into our setting. The question of suitable representations of fractional powers of the Laplace–Beltrami operator has been addressed in the context of compact manifolds in [1] and also for hyperbolic spaces in [5]. These formulas are based on well-established ways to represent fractional powers of suitable operators like the Laplace–Beltrami operator, e.g. via the heat semigroup and Gaussian estimates [1] or the harmonic extension method [5]. In the discrete setting, it has been shown in [14] that fractional powers of the discrete Laplacian with state space \({\mathbb {Z}}\) can be written in the form described in Sect. 2. However, it seems to be difficult to establish suitable bounds on the heat kernel in a more general Riemannian setting that allow one to derive a Li–Yau inequality (with an explicit function \(\varphi (t,x)\)) via the reduction principle, Theorem 2.4.

Another interesting research direction is gradient estimates for more complicated equations, which may comprise non-local terms like a fractional Laplacian (not necessarily as the leading order operator). We refer to the recent work [11], where it could also be interesting to include a fractional Laplacian in the setting described in lines 2–9 on page 437.

4 Illustration of the discrete case

Let us now discuss an application of the general principle developed in Sect. 2 that is very different from the one of the last section. In the case that M is a finite set, the operator L can be viewed as the generator of a continuous-time Markov chain. More precisely, L induces the generator matrix (which is commonly called Q-matrix) given by \(Q=\big (q(x,y)\big )_{x,y \in M}\) where \(q(x,\cdot )\) denotes the density of the measure \(k(x,\cdot )\) with respect to the counting measure on \(M\setminus \{x\}\). Further, one sets \(q(x,x) = -\sum _{y \ne x} q(x,y)\). In this setting, the kernel p(txy) in assumption (A1) is given by the transition probabilities and assumption (A2) is obviously satisfied. The transition probabilities are given by the entries of the matrix exponential \(e^{tQ}\). This, and much more information on the theory of continuous-time Markov chains, can be found, for instance, in [23].

We illustrate the finite state space case by the example of the (unweighted) complete graph \(K_n\). In this case, the corresponding Q-matrix is given by the entries (\(x,y\in \{1,2,\ldots ,n\}\))

$$\begin{aligned} q(x,y)=\left\{ \begin{array}{ll} 1 &{}, x\ne y\\ -(n-1) &{}, x=y. \end{array}\right. \end{aligned}$$
(32)

One readily checks that 0 is an eigenvalue of Q of multiplicity 1 with eigenvector \((1,...,1)^T\) and \(-n\) is an eigenvalue of Q of multiplicity \(n-1\) with eigenvectors \((-1,e_j^{(n-1)})^T\), \(j \in \{1,...,n-1\}\), where \(e_j^{(n-1)}\) denotes the j-th unit vector in \({\mathbb {R}}^{n-1}\). From that, one determines the transition probabilities to be given by

$$\begin{aligned} p(t,x,y)=\left\{ \begin{array}{ll} \frac{1}{n}\big (1+(n-1)e^{-nt}\big ) &{}, x=y\\ \frac{1}{n}\big (1-e^{-nt}\big ) &{}, x\ne y. \end{array}\right. \end{aligned}$$

Now, we have

$$\begin{aligned} - L \big (\log p(t,\cdot ,y)\big )(x)=\left\{ \begin{array}{ll} (n-1)\log \Big (\frac{1+(n-1)e^{-nt}}{1-e^{-nt}}\Big ) &{}, x=y\\ -\log \Big (\frac{1+(n-1)e^{-nt}}{1-e^{-nt}}\Big ) &{}, x\ne y, \end{array}\right. \end{aligned}$$

which yields that \(-L \big (\log p(t,\cdot ,y)\big )(x)\) is maximal if \(x=y\) for any fixed \(t>0\).
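The preceding facts about \(K_n\) can be verified numerically. The following sketch (an illustration only, assuming NumPy and SciPy; the closed-form kernel in the comments is the one obtained from the spectral decomposition of Q) compares \(e^{tQ}\) with the closed-form transition probabilities and checks where \(-L(\log p(t,\cdot ,y))(x)\) is maximal:

```python
import numpy as np
from scipy.linalg import expm

n, t = 4, 0.7
Q = np.ones((n, n)) - n * np.eye(n)       # q(x,y) = 1 (x != y), q(x,x) = -(n-1)
P = expm(t * Q)                            # transition probabilities p(t,x,y)

# Closed form: p(t,x,x) = (1+(n-1)e^{-nt})/n, p(t,x,y) = (1-e^{-nt})/n for x != y.
P_exact = (1 - np.exp(-n * t)) / n + np.exp(-n * t) * np.eye(n)
err_P = float(np.max(np.abs(P - P_exact)))

# L acts as the matrix Q, so -L(log p(t,.,y))(x) = -(Q @ log P)[x, y].
LY = -(Q @ np.log(P))
phi = (n - 1) * np.log((1 + (n - 1) * np.exp(-n * t)) / (1 - np.exp(-n * t)))
err_phi = abs(float(np.max(LY)) - phi)
diag_max = bool(np.all(LY.argmax(axis=0) == np.arange(n)))  # maximum at x = y
```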

Applying Theorem 2.4, we obtain the following optimal result.

Theorem 4.1

Let \(u:[0,\infty )\times M \rightarrow (0,\infty )\) be a solution to \(\partial _t u = L u\) on \((0,\infty )\times M\), where L denotes the generator of the continuous-time Markov chain such that the complete graph \(K_n\), \(n\ge 2\), is the corresponding underlying graph. Then the Li–Yau inequality

$$\begin{aligned} - L \big ( \log u(t,\cdot ))(x)\le (n-1) \log \big (\frac{1+(n-1)e^{-nt}}{1-e^{-nt}}\big ) \end{aligned}$$

holds at any \((t,x) \in (0,\infty )\times M\).

Let us compare this result with what has been obtained in [16] in the case of the complete graph \(K_n\). The method of [16] relies on the curvature-dimension condition CD(F; 0) (where F is a so-called CD-function). It is shown there that CD(F; 0) implies a Li–Yau inequality of the form

$$\begin{aligned} - L \big ( \log u(t,\cdot )\big )(x) \le \varphi (t), \end{aligned}$$

where the so-called relaxation function \(\varphi :(0,\infty )\rightarrow (0,\infty )\) is defined as the unique positive solution of

$$\begin{aligned} \varphi '(t) + F(\varphi (t)) = 0,\quad t>0, \end{aligned}$$
(33)

such that \(\lim \limits _{t\rightarrow 0^+}\varphi (t)=\infty \). It is important to notice that for the uniqueness part, there is no need to assume that \(\frac{F(r)}{r}\) is increasing on \((0,\infty )\). This can be seen from the proof of [16, Lemma 3.5].

It has been calculated in [16] that the function \(F(r)=(n-1)\big (e^\frac{r}{n-1}-(n-1)e^\frac{-r}{n-1}+n-2\big ) \), \(r\ge 0\), satisfies the functional inequality that defines the curvature-dimension condition but is not a CD-function in general since \(\frac{F(r)}{r}\) is not increasing near 0. Since this property is of eminent importance for the method of [16], one needs to find a suitable lower bound for F which serves as a CD-function. This is where optimality in general gets lost. Remarkably, Theorem 4.1 shows that in this situation this is only an obstacle due to the method of [16]; in fact, the function F is tailor-made for the Li–Yau inequality of Theorem 4.1. Indeed, we can show that

$$\begin{aligned} \varphi (t)=(n-1) \log \left( \frac{1+(n-1)e^{-nt}}{1-e^{-nt}}\right) \end{aligned}$$

is the relaxation function to F, in the sense that \(\varphi \) is the unique positive solution of (33) with \(\lim \limits _{t\rightarrow 0^+}\varphi (t)=\infty \).

More precisely, we have

$$\begin{aligned} \varphi '(t)&= \frac{(n-1)n e^{-nt}}{(1+(n-1)e^{-nt})(1-e^{-nt})} \big ( (1-n) (1- e^{-nt})-(1+(n-1)e^{-nt})\big ) \\&= - \frac{n^2(n-1)e^{-nt}}{(1+(n-1)e^{-nt})(1-e^{-nt})} \end{aligned}$$

and besides that

$$\begin{aligned}&F(\varphi (t)) = (n-1) \Bigg ( \frac{1+(n-1)e^{-nt}}{1-e^{-nt}} - (n-1) \frac{1-e^{-nt}}{1+(n-1)e^{-nt}} + n-2 \Bigg ) \\&\quad = \frac{(n-1) \Big ( (1+(n-1)e^{-nt})^2 - (n-1)(1-e^{-nt})^2 + (n-2)(1-e^{-nt})(1+(n-1)e^{-nt})\Big )}{(1+(n-1)e^{-nt})(1-e^{-nt})} \\&\quad = \frac{(n-1)\Big ( 2-n + 4(n-1) e^{-nt} + (n-1)(n-2)e^{-2nt} + (n-2)\big ( 1+(n-2)e^{-nt} - (n-1)e^{-2nt}\big )\Big )}{(1+(n-1)e^{-nt})(1-e^{-nt})}\\&\quad = \frac{n^2 (n-1)e^{-nt}}{(1+(n-1)e^{-nt})(1-e^{-nt})}. \end{aligned}$$
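The cancellation above can also be confirmed numerically; the following sketch (an illustration only, assuming NumPy) checks \(\varphi '(t)+F(\varphi (t))=0\) by a centred finite difference:

```python
import numpy as np

n = 5

def phi(t):
    # relaxation function for the complete graph K_n
    return (n - 1) * np.log((1 + (n - 1) * np.exp(-n * t)) / (1 - np.exp(-n * t)))

def F(r):
    return (n - 1) * (np.exp(r / (n - 1)) - (n - 1) * np.exp(-r / (n - 1)) + n - 2)

ts = np.linspace(0.1, 3.0, 60)
h = 1e-6
dphi = (phi(ts + h) - phi(ts - h)) / (2 * h)   # centred difference for phi'
resid = float(np.max(np.abs(dphi + F(phi(ts)))))
```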

5 Harnack inequalities

One of the main reasons for which Li and Yau [21] proved their inequality was to deduce from it their fundamental Harnack inequality. In the parabolic case, a Harnack inequality is a pointwise estimate of the form

$$\begin{aligned} u(t_1,x_1)\le u(t_2,x_2) C(t_1,t_2,x_1,x_2),\quad 0<t_1<t_2<\infty ,\, x_1,x_2\in M, \end{aligned}$$
(34)

where u is a positive solution to the corresponding diffusion/heat equation on the state space M, and the constant in (34) may also depend on parameters of the space M like dimension and curvature bounds. For example, positive solutions of the heat equation on \({\mathbb {R}}^d\) satisfy the following sharp inequality

$$\begin{aligned} u(t_1,x_1)\le u(t_2,x_2) \Big (\frac{t_2}{t_1}\Big )^{d/2} e^{\frac{|x_1-x_2|^2}{4(t_2-t_1)}},\quad 0<t_1<t_2, \,x_1,\,x_2\in {\mathbb {R}}^d, \end{aligned}$$
(35)

see e.g. [2]. A corresponding inequality is also valid on complete d-dimensional Riemannian manifolds M with nonnegative Ricci curvature, see [21].
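As a quick numerical illustration of (35) (a sketch only, assuming NumPy; the equality case along \(x_2=\frac{t_2}{t_1}x_1\) reflects the sharpness of the inequality for the heat kernel):

```python
import numpy as np

def u(t, x):
    # Gaussian heat kernel (4*pi*t)^(-d/2) * exp(-|x|^2/(4t)), d = len(x)
    return (4 * np.pi * t) ** (-len(x) / 2) * np.exp(-(x @ x) / (4 * t))

def bound(t1, t2, x1, x2):
    d = len(x1)
    return (t2 / t1) ** (d / 2) * np.exp((x1 - x2) @ (x1 - x2) / (4 * (t2 - t1)))

rng = np.random.default_rng(2)
ok = True
for _ in range(300):
    t1 = rng.uniform(0.1, 1.0)
    t2 = t1 + rng.uniform(0.1, 2.0)
    x1, x2 = rng.normal(size=(2, 3))
    ok = ok and u(t1, x1) <= u(t2, x2) * bound(t1, t2, x1, x2) * (1 + 1e-10)

# equality (up to rounding) along x2 = (t2/t1) * x1
t1, t2 = 0.5, 2.0
x1 = np.array([1.0, -0.3, 0.7])
x2 = (t2 / t1) * x1
rel_gap = abs(u(t1, x1) - u(t2, x2) * bound(t1, t2, x1, x2)) / u(t1, x1)
```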

The main idea to derive such estimates for the classical heat equation on a Riemannian manifold from the Li–Yau inequality is to integrate along geodesics in space-time and use the classical chain rule. In the context of jump processes it is natural to replace this integration along continuous paths with appropriate jumps, especially if respective paths in space-time are not available at all like in the discrete setting. The latter situation was studied in the works [8, 16, 22], where Harnack inequalities have been derived from Li–Yau estimates.

In order to highlight the additional difficulty that arises in the space-continuous setting, we briefly repeat the core argument from the discrete case of how Li–Yau estimates can be used in order to derive Harnack estimates of the form (34). Recall Remark 2.5, which states that the non-local Li–Yau inequality \(-L \log u \le \varphi (t)\) can be reformulated as

$$\begin{aligned} \partial _t \log u \ge \Psi _\Upsilon (\log u) - \varphi (t). \end{aligned}$$

With this at hand, one proceeds as follows. Given \(0<t_1<t_2\) one has for arbitrary \(s\in [t_1,t_2]\)

$$\begin{aligned}&\log \frac{u(t_1,x_1)}{u(t_2,x_2)} \\&\quad = \log \frac{u(t_1,x_1)}{u(s,x_1)} + \log \frac{u(s,x_1)}{u(s,x_2)} + \log \frac{u(s,x_2)}{u(t_2,x_2)}\\&\quad = - \int _{t_1}^s \partial _t \log u(t,x_1) \mathrm {d}t + \log \frac{u(s,x_1)}{u(s,x_2)} - \int _{s}^{t_2} \partial _t \log u(t,x_2) \mathrm {d}t\\&\quad \le \int _{t_1}^s \big ( \varphi (t)-\Psi _{\Upsilon }(\log u)(t,x_1)\big )\mathrm {d}t + \log \frac{u(s,x_1)}{u(s,x_2)} \\&\qquad + \int _s^{t_2} \big (\varphi (t)-\Psi _\Upsilon (\log u)(t,x_2)\big )\mathrm {d}t\\&\quad = \int _{t_1}^{t_2} \varphi (t)\mathrm {d}t + \log \frac{u(s,x_1)}{u(s,x_2)} - \int _{t_1}^s \Psi _\Upsilon (\log u) (t,x_1)\mathrm {d}t \\&\qquad - \int _{s}^{t_2} \Psi _\Upsilon (\log u) (t,x_2) \mathrm {d}t. \end{aligned}$$

Note that it suffices to find a control

$$\begin{aligned} \log \frac{u(s,x_1)}{u(s,x_2)} - \int _{t_1}^s \Psi _\Upsilon (\log u) (t,x_1)\mathrm {d}t - \int _{s}^{t_2} \Psi _\Upsilon (\log u) (t,x_2) \mathrm {d}t \le {\tilde{C}}(t_1,t_2,x_1,x_2) \end{aligned}$$

in order to derive a bound of the form (34). In the discrete setting this can be achieved with a quite rough estimate. Indeed, let M be discrete and assume that \(k(x_2,x_1)>0\), then we observe

$$\begin{aligned}&\log \frac{u(s,x_1)}{u(s,x_2)} - \int _{t_1}^s \Psi _\Upsilon (\log u) (t,x_1)\mathrm {d}t - \int _{s}^{t_2} \Psi _\Upsilon (\log u) (t,x_2) \mathrm {d}t \nonumber \\&\quad \le \log \frac{u(s,x_1)}{u(s,x_2)} - k(x_2,x_1) \int _s^{t_2} \Upsilon (\log u(t,x_1)-\log u(t,x_2))\mathrm {d}t, \end{aligned}$$
(36)

from which a suitable estimate can then be derived, see [16, Section 6]. In the argument, one chooses, among other things, \(s\in [t_1,t_2]\) in such a way that the right-hand side in (36) is minimized over \([t_1,t_2]\), and one uses the inequality \(\Upsilon (z)\ge \frac{1}{2}z^2\), \(z\ge 0\).
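Recalling that \(\Upsilon (z)=e^z-z-1\) (the function from Sect. 2 underlying \(\Psi _\Upsilon \); we state this here as an assumption on the notation), the inequality \(\Upsilon (z)\ge \frac{1}{2}z^2\), \(z\ge 0\), can be checked numerically as follows (an illustration only, assuming NumPy):

```python
import numpy as np

def Upsilon(z):
    # Upsilon(z) = e^z - z - 1, computed stably via expm1
    return np.expm1(z) - z

# Upsilon(z) - z^2/2 is non-negative on [0, 20] (it vanishes only at z = 0)
z = np.linspace(0.0, 20.0, 4001)
min_gap = float(np.min(Upsilon(z) - z**2 / 2))
```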

Combining the proof from [16, Section 6] with our findings from Sect. 4 we arrive at the following Harnack inequality for positive solutions of the heat equation on complete graphs.

Corollary 5.1

Let \(u:[0,\infty )\times M \rightarrow (0,\infty )\) be a solution to \(\partial _t u= Lu\) on \((0,\infty )\times M\), where L denotes the generator of the continuous-time Markov chain such that the complete graph \(K_n\), \(n\ge 2\), is the corresponding underlying graph. Then the Harnack inequality

$$\begin{aligned} u(t_1,x_1)\le u(t_2,x_2) \exp \Big [ (n-1)\int _{t_1}^{t_2} \log \Big ( \frac{1+(n-1)e^{-nt}}{1-e^{-nt}}\Big ) \mathrm {d}t + \frac{2}{t_2-t_1}\Big ] \end{aligned}$$
(37)

holds true for any \(0<t_1<t_2<\infty \) and \(x_1,x_2 \in M\).
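A numerical illustration of Corollary 5.1 (a sketch only, assuming NumPy and SciPy; the closed-form heat kernel on \(K_n\) used below is the one from Sect. 4):

```python
import numpy as np
from scipy.integrate import quad

n = 3

def p(t, x, y):
    # heat kernel on the complete graph K_n:
    #   p(t,x,x) = (1+(n-1)e^{-nt})/n,  p(t,x,y) = (1-e^{-nt})/n for x != y
    e = np.exp(-n * t)
    return (1 + (n - 1) * e) / n if x == y else (1 - e) / n

def harnack_bound(t1, t2):
    # right-hand side factor of the Harnack inequality (37)
    integral, _ = quad(
        lambda t: np.log((1 + (n - 1) * np.exp(-n * t)) / (1 - np.exp(-n * t))),
        t1, t2,
    )
    return np.exp((n - 1) * integral + 2 / (t2 - t1))

ok = True
for t1, t2 in [(0.2, 0.5), (0.5, 1.0), (1.0, 3.0)]:
    for y in range(n):                       # u = p(., ., y) is a positive solution
        for x1 in range(n):
            for x2 in range(n):
                ok = ok and p(t1, x1, y) <= harnack_bound(t1, t2) * p(t2, x2, y)
```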

As described above, in the discrete setting, it is possible to drop the \(\Psi _\Upsilon (\log u)(\cdot ,x_1)\) term and to use only the corresponding term with \(x=x_2\). Estimating in this simple way does not work if the state space is continuous. It turns out that here the proof of a Harnack inequality in the form (34) is much more involved.

We now come to a Harnack inequality for positive solutions to the fractional heat equation on \({\mathbb {R}}^d\). Even though the result obtained here is not optimal, the proof shows that from the Li–Yau inequality (29) one can indeed derive Harnack estimates. To the best of the authors’ knowledge, this seems to be the first proof of a Harnack inequality via a Li–Yau type inequality for an evolution equation with a purely non-local diffusion operator in the space-continuous setting.

Theorem 5.2

Let \(\beta \in (0,2)\) and \(u:[0,\infty )\times {\mathbb {R}}^d\rightarrow (0,\infty )\) be a strong solution to the fractional heat equation (17). Then there exists a positive constant \(C=C(\beta ,d)\) such that for all \(0<t_1<t_2<\infty \) and \(x_1,x_2 \in {\mathbb {R}}^d\) the following Harnack type inequality holds true:

$$\begin{aligned} u(t_1,x_1)\le u(t_2,x_2) \Big (\frac{t_2}{t_1}\Big )^{C_{LY}}\exp \left( C\left[ 1+\frac{|x_1-x_2|^{\beta +d}}{(t_2-t_1)^{1+\frac{d}{\beta }}}\right] \right) , \end{aligned}$$
(38)

where \(C_{LY}\) is the Li–Yau constant given by (26).

Remark 5.3

(i) The parabolic Harnack inequality for the space fractional heat equation has already been studied intensively in the literature. Concerning local solutions, we refer to [7, 13] for various results in this direction. Here the authors make use of probabilistic methods. A purely analytic proof, even in a rough setting, has been found in [12]. For global solutions, Harnack estimates have also been derived in [10, Theorem 8.2] and [15, Theorem 2.7] by means of heat kernel estimates. As shown in [10], an important difference to the classical heat equation is that no time gap is required between \(t_1\) and \(t_2\).

(ii) The Harnack inequality (38) is not optimal in various respects. We need a time delay, and instead of the exponential function one would expect polynomial terms, cf. [10, Theorem 8.2]. Another aspect is the lack of robustness: the constant C blows up as \(\beta \rightarrow 2\). Nevertheless, (38) is scale invariant and implies a natural Harnack estimate on different space-time cylinders as typically obtained for related equations with rough coefficients, cf. [12].

Proof

As in the previous calculations we have for any \(s\in [t_1,t_2]\) that

$$\begin{aligned} \log \frac{u(t_1,x_1)}{u(t_2,x_2)}&= - \int _{t_1}^s \partial _t \log u(t,x_1) \mathrm {d}t + \log \frac{u(s,x_1)}{u(s,x_2)} - \int _{s}^{t_2} \partial _t \log u(t,x_2) \mathrm {d}t. \end{aligned}$$
(39)

From Theorem 3.2 and Remark 2.5 we know that the differential Harnack inequality

$$\begin{aligned} \partial _t \log u(t,x)\ge \Psi _\Upsilon (\log u)(t,x)-\frac{C_{LY}(\beta ,d)}{t},\quad t>0,\,x\in {\mathbb {R}}^d \end{aligned}$$

holds true. Combining this with (39) we obtain

$$\begin{aligned} \log \frac{u(t_1,x_1)}{u(t_2,x_2)}&\le \int _{t_1}^{t_2}\frac{C_{LY}(\beta ,d)}{t} \mathrm {d}t + \log \frac{u(s,x_1)}{u(s,x_2)}\nonumber \\&\quad - \int _{t_1}^s \Psi _\Upsilon (\log u) (t,x_1)\mathrm {d}t - \int _{s}^{t_2} \Psi _\Upsilon (\log u) (t,x_2) \mathrm {d}t, \;s\in [t_1,t_2]. \end{aligned}$$
(40)

So setting \(v=\log u\), just like before, we have to find a suitable upper bound for the function

$$\begin{aligned}&f(s):=v(s,x_1)-v(s,x_2) - \int _{t_1}^s \Psi _\Upsilon (v) (t,x_1)\mathrm {d}t \\&\quad - \int _{s}^{t_2} \Psi _\Upsilon (v) (t,x_2) \mathrm {d}t,\quad s \in [t_1,t_2]. \end{aligned}$$

We first assume that \(|x_1-x_2|\le 1\) and show the general statement by a classical scaling argument later.

Set \(t_*=\frac{t_1+t_2}{2}\). We introduce the weight function

$$\begin{aligned} \eta (t)=\left\{ \begin{array}{ll} (t-t_1)^\alpha &{}, t \in [t_1,t_*)\\ (t_2-t)^\alpha &{}, t \in [ t_*,t_2], \end{array}\right. \end{aligned}$$
(41)

with fixed parameter \(\alpha >\frac{1}{2}\max \{0,\frac{d}{\beta }-1\}\), e.g. \(\alpha =\frac{d}{\beta }\). Employing Fubini’s theorem, we observe that

$$\begin{aligned} \min \limits _{t \in [t_1,t_2]} f(t)&\le \frac{1}{\int _{t_1}^{t_2} \eta (t) \mathrm {d}t} \int _{t_1}^{t_2} \eta (t)f(t)\mathrm {d}t \nonumber \\&= \frac{1}{\int _{t_1}^{t_2} \eta (t) \mathrm {d}t} \int _{t_1}^{t_2}\Big ( \eta (t) \big (v(t,x_1)-v(t,x_2)\big ) - \Psi _\Upsilon (v)(t,x_1) \int _t^{t_2}\eta (\tau )\mathrm {d}\tau \nonumber \\&\quad - \Psi _\Upsilon (v)(t,x_2) \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau \Big )\mathrm {d}t\nonumber \\&= \frac{1}{\int _{t_1}^{t_2} \eta (t) \mathrm {d}t}\Big [ \int _{t_1}^{t_2}\Big ( \eta (t) A_1(t) - \Psi _\Upsilon (v)(t,x_1) \int _t^{t_2}\eta (\tau )\mathrm {d}\tau \Big )\mathrm {d}t \nonumber \\&\quad + \int _{t_1}^{t_2} \Big (\eta (t)A_2(t)- \Psi _\Upsilon (v)(t,x_2) \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau \Big )\mathrm {d}t\Big ], \end{aligned}$$
(42)

where

$$\begin{aligned} A_1(t) = \frac{1}{|Q_t|}\int _{Q_t} \big (v(t,x_1)-v(t,y)\big )\mathrm {d}y,\quad A_2(t) = \frac{1}{|Q_t|}\int _{Q_t}\big ( v(t,y)-v(t,x_2)\big )\mathrm {d}y \end{aligned}$$

for \(t \in (t_1,t_2)\), and \(Q_t\) is the open ball \(Q_t=B_{r(t)}(x_1)\) with radius

$$\begin{aligned} r(t)= \Big (\frac{\omega _d c_{\beta ,d}}{1+\alpha }(t_2-t)\Big )^\frac{1}{\beta } \end{aligned}$$

and volume

$$\begin{aligned} |Q_t|=\omega _d r(t)^d= \omega _d^{\frac{d}{\beta }+1} \Big (\frac{c_{\beta ,d}}{1+\alpha }(t_2-t)\Big )^\frac{d}{\beta }. \end{aligned}$$
(43)

Using the inequality \(z \le \Upsilon (-z)+1\), valid for any \(z \in {\mathbb {R}}\), we may estimate

$$\begin{aligned} A_1(t)&\le 1+ \frac{1}{|Q_t|}\int _{Q_t} \Upsilon \big (v(t,y)-v(t,x_1)\big )\mathrm {d}y \\&\le 1+\frac{c_{\beta ,d}(t_2-t)}{1+\alpha }\int _{Q_t}\frac{\Upsilon (v(t,y)-v(t,x_1))}{|y-x_1|^{d+\beta }}\mathrm {d}y. \end{aligned}$$

Consequently, we observe that

$$\begin{aligned}&\frac{1}{\int _{t_1}^{t_2} \eta (t) \mathrm {d}t} \int _{t_1}^{t_2}\Big ( \eta (t) A_1(t) - \Psi _\Upsilon (v)(t,x_1) \int _t^{t_2}\eta (\tau )\mathrm {d}\tau \Big )\mathrm {d}t \\&\quad \le 1 +\frac{c_{\beta ,d}}{\int _{t_1}^{t_2} \eta (t) \mathrm {d}t} \int _{t_1}^{t_2}\Big ( \frac{\eta (t)(t_2-t)}{1+\alpha }-\int _t^{t_2}\eta (\tau )\mathrm {d}\tau \Big ) \int _{Q_t} \frac{\Upsilon (v(t,y)-v(t,x_1))}{|y-x_1|^{d+\beta }}\mathrm {d}y \,\mathrm {d}t. \end{aligned}$$

If \(t \in [t_*, t_2 ]\) then

$$\begin{aligned} \frac{\eta (t)(t_2-t)}{1+\alpha }-\int _t^{t_2}\eta (\tau )\mathrm {d}\tau = 0. \end{aligned}$$
(44)

If instead \(t \in [ t_1, t_*)\) one readily checks that the left-hand side of (44) is increasing in t, so that we conclude

$$\begin{aligned} \frac{1}{\int _{t_1}^{t_2} \eta (t) \mathrm {d}t} \int _{t_1}^{t_2}\Big ( \eta (t) A_1(t) - \Psi _\Upsilon (v)(t,x_1) \int _t^{t_2}\eta (\tau )\mathrm {d}\tau \Big )\mathrm {d}t\le 1. \end{aligned}$$
(45)
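The identity (44) and the monotonicity used for (45) can be spot-checked numerically (an illustration only, assuming NumPy and SciPy):

```python
import numpy as np
from scipy.integrate import quad

t1, t2, alpha = 0.0, 1.0, 1.7
t_star = (t1 + t2) / 2

def eta(t):
    # the weight function (41)
    return (t - t1) ** alpha if t < t_star else (t2 - t) ** alpha

# (44): eta(t)(t2-t)/(1+alpha) - int_t^{t2} eta vanishes identically on [t_*, t_2]
max_resid = 0.0
for t in np.linspace(t_star, t2 - 1e-9, 25):
    tail, _ = quad(eta, t, t2)
    max_resid = max(max_resid, abs(eta(t) * (t2 - t) / (1 + alpha) - tail))

# on [t_1, t_*) the left-hand side of (44) is increasing (and non-positive)
grid = np.linspace(t1 + 1e-6, t_star - 1e-6, 20)
vals = [eta(t) * (t2 - t) / (1 + alpha) - quad(eta, t, t2, points=[t_star])[0]
        for t in grid]
increasing = bool(np.all(np.diff(vals) >= -1e-9))
```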

Next, denoting by \(z_+\) the positive part of \(z\in {\mathbb {R}}\) and using that \(\Upsilon (z)\ge \frac{1}{2}z^2\) for all \(z\ge 0\), we estimate the integral term involving \(A_2(t)\) as follows.

$$\begin{aligned}&\int _{t_1}^{t_2}\Big ( \eta (t)A_2(t)- \Psi _\Upsilon (v)(t,x_2) \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau \Big ) \mathrm {d}t\\&\quad \le \int _{t_1}^{t_2}\int _{Q_t} \frac{\eta (t)(v(t,y)-v(t,x_2))}{|Q_t|}\\&\qquad -\frac{c_{\beta ,d}\Upsilon (v(t,y)-v(t,x_2))\int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau }{|y-x_2|^{d+\beta }}\mathrm {d}y\mathrm {d}t\\&\quad \le \int _{t_1}^{t_2}\int _{Q_t} \frac{\eta (t)(v(t,y)-v(t,x_2))_+}{|Q_t|}\\&\qquad -\frac{c_{\beta ,d}\Upsilon \big ((v(t,y)-v(t,x_2))_+\big )\int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau }{|y-x_2|^{d+\beta }} \mathrm {d}y\mathrm {d}t\\&\quad \le \int _{t_1}^{t_2}\int _{Q_t}\frac{\eta (t)(v(t,y)-v(t,x_2))_+}{|Q_t|}\\&\qquad -\frac{c_{\beta ,d}\big ((v(t,y)-v(t,x_2))_+\big )^2\int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau }{2|y-x_2|^{d+\beta }} \mathrm {d}y\mathrm {d}t\\&\quad \le \int _{t_1}^{t_2} \int _{Q_t} \frac{|y-x_2|^{d+\beta }\eta (t)^2}{2c_{\beta ,d}|Q_t|^2 \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau }\mathrm {d}y\mathrm {d}t, \end{aligned}$$

where in the last step we used that \(\max _{z \in {\mathbb {R}}}(b_1z-b_2z^2)= \frac{b_1^2}{4b_2}\) for constants \(b_1 \in {\mathbb {R}}\) and \(b_2>0\).

Using the inequality \((a+b)^p\le 2^{p-1}(a^p+b^p)\), \(a,b\ge 0\), \(p\ge 1\), and our assumption that \(|x_1-x_2|\le 1\) we have

$$\begin{aligned} \int _{Q_t} |y-x_2|^{d+\beta }\mathrm {d}y&\le 2^{d+\beta -1} \Bigg (\int _{Q_t} |y-x_1|^{d+\beta }\mathrm {d}y + |Q_t|\Bigg )\\&\le 2^{d+\beta -1} \big (r(t)^{d+\beta }|Q_t|+|Q_t|\big ). \end{aligned}$$

We therefore obtain

$$\begin{aligned}&\int _{t_1}^{t_2}\Bigg ( \eta (t)A_2(t)- \Psi _\Upsilon (v)(t,x_2) \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau \Bigg ) \mathrm {d}t \\&\quad \le \frac{2^{d+\beta -2}}{c_{\beta ,d}}\int _{t_1}^{t_2} \frac{\eta (t)^2\big (1+r(t)^{d+\beta }\big )}{|Q_t| \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau }\,\mathrm {d}t. \end{aligned}$$

For \(t \in [t_1, t_*]\) we observe that

$$\begin{aligned}&\frac{\eta (t)^2\big (1+r(t)^{d+\beta }\big )}{|Q_t| \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau } = \frac{(t-t_1)^{2\alpha } \big (1+r(t)^{d+\beta }\big )(1+\alpha )}{\omega _d r(t)^d (t-t_1)^{1+\alpha }} \\&\quad \le \frac{1+\alpha }{\omega _d} (t-t_1)^{\alpha -1}\Bigg (\frac{1}{r(t_*)^d}+r(t_1)^\beta \Bigg ) \end{aligned}$$

whereas for \(t \in (t_*, t_2]\) we have

$$\begin{aligned}&\frac{\eta (t)^2\big (1+r(t)^{d+\beta }\big )}{|Q_t| \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau } \\&\quad \le \frac{(t_2-t)^{2\alpha } \big (1+r(t)^{d+\beta }\big )}{\omega _d r(t)^d \int _{t_1}^{t_*}\eta (\tau )\mathrm {d}\tau } \le \frac{1+\alpha }{\omega _d}\frac{ (t_2-t)^{2\alpha } }{(t_*-t_1)^{1+\alpha }}\Bigg (\frac{1}{r(t)^d}+r(t_*)^\beta \Bigg )\\&\quad = \frac{1+\alpha }{\omega _d}\frac{ (t_2-t)^{2\alpha } }{(t_*-t_1)^{1+\alpha }} \Bigg [\Bigg (\frac{1+\alpha }{\omega _d c_{\beta ,d}}\Bigg )^{\frac{d}{\beta }} (t_2-t)^{-\frac{d}{\beta }}+r(t_*)^\beta \Bigg ]. \end{aligned}$$

Note that \(2\alpha -\frac{d}{\beta }>-1\), by our choice of \(\alpha \). Consequently,

$$\begin{aligned}&\int _{t_1}^{t_2} \frac{\eta (t)^2\big (1+ r(t)^{d+\beta }\big )}{|Q_t| \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau }\,\mathrm {d}t \\&\quad = \int _{t_1}^{t_*}\ldots \,\mathrm {d}t+ \int _{t_*}^{t_2} \ldots \,\mathrm {d}t\\&\quad \le \frac{(1+\alpha )(t_*-t_1)^\alpha }{\alpha \omega _d} \Bigg (\frac{1}{r(t_*)^d}+r(t_1)^\beta \Bigg )\\&\qquad +\frac{(1+\alpha )}{\omega _d} \Bigg (\frac{(1+\alpha )^{\frac{d}{\beta }}(t_2-t_*)^{\alpha -\frac{d}{\beta }}}{(2\alpha -\frac{d}{\beta }+1)(\omega _d c_{\beta ,d})^{\frac{d}{\beta }}}+\frac{(t_2-t_*)^\alpha }{2\alpha +1}r(t_*)^\beta \Bigg )\\&\quad \le \frac{(1+\alpha )^{1+\frac{d}{\beta }}(t_*-t_1)^{\alpha -\frac{d}{\beta }}}{\alpha \omega _d^{1+\frac{d}{\beta }} c_{\beta ,d}^{\frac{d}{\beta }}}+\frac{2c_{\beta ,d}}{\alpha }(t_*-t_1)^{1+\alpha }\\&\qquad + \frac{(1+\alpha )^{1+\frac{d}{\beta }}(t_2-t_*)^{\alpha -\frac{d}{\beta }}}{(2\alpha -\frac{d}{\beta }+1)\omega _d^{1+\frac{d}{\beta }} c_{\beta ,d}^{\frac{d}{\beta }}} +\frac{c_{\beta ,d}}{2\alpha +1}(t_2-t_*)^{1+\alpha }. \end{aligned}$$

Since \(t_2-t_*=t_*-t_1=\frac{1}{2}(t_2-t_1)\) and

$$\begin{aligned} \int _{t_1}^{t_2}\eta (t)\,\mathrm {d}t=2\int _{t_1}^{t_*}\eta (t)\,\mathrm {d}t=\frac{2}{1+\alpha }(t_*-t_1)^{1+\alpha }, \end{aligned}$$

it follows from the previous estimates that

$$\begin{aligned}&\frac{1}{\int _{t_1}^{t_2} \eta (t) \mathrm {d}t} \int _{t_1}^{t_2}\Bigg ( \eta (t)A_2(t)- \Psi _\Upsilon (v)(t,x_2) \int _{t_1}^{t}\eta (\tau )\mathrm {d}\tau \Bigg ) \mathrm {d}t\nonumber \\&\quad \le M(\alpha ,d,\beta )\Big (1+(t_2-t_1)^{-1-\frac{d}{\beta }}\Big ), \end{aligned}$$
(46)

where the constant \(M(\alpha ,d,\beta )\) can be specified explicitly.

We now choose s in (40) such that \(f(s)=\min _{t\in [t_1,t_2]} f(t)\). Using (42), (45) and (46) we obtain

$$\begin{aligned} \log \frac{u(t_1,x_1)}{u(t_2,x_2)} \le C_{LY}\log \big ( \frac{t_2}{t_1}\big )+1+M(\alpha ,d,\beta )\Big (1+(t_2-t_1)^{-1-\frac{d}{\beta }}\Big ), \end{aligned}$$

and thus

$$\begin{aligned} u(t_1,x_1)\le u(t_2,x_2) \Big (\frac{t_2}{t_1}\Big )^{C_{LY}}\exp \Big (1+M(\alpha ,d,\beta )\Big [1+(t_2-t_1)^{-1-\frac{d}{\beta }}\Big ]\Big ), \end{aligned}$$

which implies the assertion in the case \(|x_1-x_2|\le 1\). The general case follows from a classical scaling argument using that for any \(\lambda >0\) the function \(u(\lambda ^\beta t, \lambda x)\) is a solution to the fractional heat equation on \((0,\infty )\times {\mathbb {R}}^d\) if and only if u(tx) enjoys this property. \(\square \)