Holder continuity of weak solutions to evolution equations with distributed order fractional time derivative

We study the regularity of weak solutions to evolution equations with distributed order fractional time derivative. We prove a weak Harnack inequality for nonnegative weak supersolutions and H\"older continuity of weak solutions to this problem. Our results substantially generalise analogous known results for the problem with single order fractional time derivative.


Introduction and main result
In this paper we study the local Hölder continuity of weak solutions to the following problem where T > 0, Ω is a bounded domain in R N , N ≥ 1, u 0 is a given initial data, f is a bounded function, A = (a ij ) is an R N ×N -valued given function and k is the kernel related to the distributed order fractional derivative. Here Du stands for the gradient w.r.t. to the spatial variables and f 1 * f 2 is the convolution on the positive half-line w.r.t. time, that is (f 1 * f 2 )(t) = t 0 f 1 (t − τ )f 2 (τ ) dτ , t ≥ 0. In the paper, it is merely assumed that the coefficients A(t, x) are measurable, bounded and uniformly elliptic.
In the classical case, which formally corresponds to k being the Dirac delta distribution so that the integro-differential operator w.r.t. time in (1) becomes the classical time derivative, there is a well known De Giorgi-Nash-Moser regularity theory, see for example [15], [16], [22]. Here, we study the problem with memory, in particular, ∂ t [k * (u − u 0 )] covers such cases as the Caputo fractional derivative, multi-term fractional derivatives or a 'purely distributed' fractional derivative.
Before describing the structure condition on k we briefly give (only formally) some motivation for equation (1) from physics. Assume that there exists a function l ∈ L 1,loc (R + ) such that k * l ≡ 1. Let us further suppose that the (scalar) quantity u is conserved in any (smooth) subdomain V of the domain Ω, i.e.
q is the corresponding flux. It is well known that in case q(t, x) = −A(t, x)Du(t, x), i.e. Fick's first law, (1) becomes a classical diffusion equation. However, in many applications, especially in non-homogeneous materials this constitutive law is not appropriate and other forms of the flux are proposed (see for example [17], [25]). Let us consider q(t, x) = −∂ t [l * A(·, x)Du(·, x)](t), where l ≡ 1 (if l ≡ 1, then it is Fick'a law), then the flux clearly depends on all the values of Du(τ, x) for τ ∈ (0, t). In this case (2) takes the form Assuming that the functions under consideration are smooth enough we arrive at We next apply the divergence theorem and convolve both sides with k to obtain i.e.
Since k * l = 1 and V is arbitrary, we arrive, at least formally, at (1). Let us now describe the class of kernels k to be studied in this paper. Let {α n } M n=1 satisfy 0 < α 1 < α 2 < · · · < α M < 1, q n , n = 1, . . . , M , be nonnegative numbers, and w ∈ L 1 ((0, 1)) be nonnegative. We define the measure µ on the Borel sets in R by dµ = M n=1 q n dδ(· − α n ) + wdν 1 , where δ(· − α n ) is the Dirac measure at α n and ν N denotes the N -dimensional Lebesgue measure. Here we allow the first or the second component in the above representation to vanish, but we always assume that µ ≡ 0. Then we define where Γ is the Gamma function. Having introduced the kernel k, the associated distributed order fractional derivative (of Caputo type) of a sufficiently smooth function v is given by We note that the notation D (µ) coincides with the notation from [12] and [13]. We also point out that concerning integration w.r.t. µ we use the following convention. If 0 ≤ a < b ≤ 1, then b a h(α)dµ(α) = (a,b] h(α)dµ(α), but a a h(α)dµ(α) = h(α n )q n if a = α n for some n ∈ {1, . . . , M } and 0 elsewhere. There is a vast literature on diffusion equations with single order fractional time derivative, i.e. µ = δ(· − α) with some α ∈ (0, 1) and k(t) = t −α Γ(1−α) , which form an important class of subdiffusion equations. In fact, they can be used to model diffusive particles with a mean squared displacement behaving as a multiple of t α ( [17]). Diffusion equations with distributed order fractional derivatives appear in the context of ultra-slow diffusion, see [14,21]. Here, the mean squared displacement might only have a logarithmic growth in time.
The regularity theory for weak solutions to the problem with single order fractional time derivative has been established in a series of papers by Zacher: [28] (boundedness of weak solutions), [27] (weak Harnack inequality for nonnegative weak supersolutions), and [26] (Hölder regularity of weak solutions). Later, for the problem with single order fractional time derivative and fractional diffusion in space, Hölder continuity of weak solutions was proved in [1], and a weak Harnack inequality was derived in [11]. In contrast to the classical parabolic De Giorgi-Nash-Moser theory, the full Harnack inequality (in its usual form) fails to hold for nonnegative solutions to time-fractional diffusion equations (local or nonlocal in space) if the space dimension is at least two, see [8].
It is worth emphasising that among these results, only the boundedness of weak solutions to (1) have been proved with a more general kernel k ( [28]). To describe the result, following [29], a kernel k ∈ L 1, loc (R + ) is called to be of type PC if it is nonnegative and nonincreasing, and there exists a kernel l ∈ L 1, loc (R + ) such that k * l = 1 in (0, ∞); (k, l) is then a so-called PC pair. In this case, l is completely positive (cf. Thm. 2.2 in [5]) and thus nonnegative. The assumption on k in the boundedness results from [28] is now that k is of type PC and that the corresponding kernel l belongs to L p ((0, T )) for some p > 1. This assumption covers a wide class of kernels, including those considered in this paper.
So, regularity of weak solutions beyond boundedness has been established only in the case of a single order fractional time derivative, i.e. µ = δ(· − α). In this paper, we substantially generalise the abovementioned results by developing a De Giorgi-Nash-Moser theory for evolution equations of the form (1) with a general distributed order fractional time derivative, i.e. the kernel k is as described above in (4), (5). We establish a weak Harnack inequality for nonnegative weak supersolutions and prove interior Hölder continuity of weak solutions to (1). Our unified approach contains in particular such special cases as: single order fractional time derivatives (µ = δ(· − α)), multi-term fractional time derivatives (µ = M n=1 q n δ(· − α n )) and 'purely distributed' fractional time derivatives (dµ(α) = w(α)dν 1 (α)). One of the advantages of our approach is that we only assume nonnegativity and integrability of the weight function w. In the literature, the distributed order derivative is usually considered with a more regular weight w, and the authors frequently impose additional assumptions concerning the behaviour of w near the endpoints of the interval [0, 1]. In our treatment, we develop further some ideas introduced by the first two authors in [12] and [13], which allows us to work with very general measures µ.
Scaling properties of the equations play an important role in De Giorgi-Nash-Moser regularity theory as the scaling indicates how the local sets (time-space cylinders) are to be selected. Without suitable geometry, the iteration techniques of De Giorgi and Moser do not work. We point out that, although the problem for a single fractional time derivative of order α ∈ (0, 1) admits a natural scaling with similarity variable s = |x| 2 /t α , this is no longer the case for the distributed order derivative. In fact, the lack of natural scaling is one of the biggest obstacles in establishing the regularity theory for (1). We overcome this difficulty by introducing appropriate time-space cylinders whose shape depends on the kernel k.
In the proof of the weak Harnack inequality we mostly follow the approach of [27], i.e. we establish mean-value inequalities by means of Moser iteration schemes, prove weak L 1 estimates for the logarithm of the supersolution, and apply a lemma of Bombieri and Giusti. However, we point out that, although the general idea of the proof and some basic estimates have been taken from [27], our case is much more involved. In particular, two crucial ingredients in the proof of the weak Harnack inequality are Lemma 2.5 and Lemma 2.6. It should be emphasised that these results are much easier to obtain if we assume that the support of the measure µ is cut-off from one. The solution of the problem in the whole generality requires not only more careful and complicated calculations in the proofs of Lemma 2.5 and Lemma 2.6, but also a completely new argument in the essential part of the logarithmic estimates. As a by-product, the latter leads to a significant improvement of Zacher's results on the single order case from [27] by establishing the robustness of the estimates as α → 1.
Having obtained the weak Harnack inequality, we use it to prove the Hölder continuity of weak solutions to (1). In addition to the substantial generalisation of the result in [26] on the single order case, another novelty is that, in contrast to [26], our argument is based on the weak Harnack inequality, which allows for a much less involved proof. Even if the method of Harnack inequalities to prove regularity is well known, our argument seems to be new in the context of temporally non-local equations.
We say that a function u is a weak solution (subsolution, supersolution) of (1) in Ω T , if u belongs to the space and for any nonnegative test function Weak solutions of (1) in the class Z have been constructed in [29] under the assumptions (H1)-(H3). Note that the function u 0 plays the role of the initial data for u, at least in a weak sense. In case of sufficiently regular functions u and k * (u − u 0 ) the condition (k * u)| t=0 = 0 implies u| t=0 = u 0 , see [29] and Section 3.5 in [13].
Next we describe the geometry of the time-space cylinders appearing in our estimates. As already mentioned, the choice of the right geometry is crucial for the De Giorgi-Nash-Moser techniques to work. In our case, the dependence of the shape of the cylinders on the kernel k is more complicated than in the single order case, where only the order α ∈ (0, 1) determines the geometry. By B(x, r) we denote the open ball with radius r > 0 centered at x ∈ R N and recall that by ν N we mean the Lebesgue measure in R N . We then set We will show in Lemma 2.4 that there is unique increasing function Φ ∈ C([0, ∞)) ∩ C 1 ((0, ∞)) such that Φ(0) = 0 and k 1 (Φ(r)) = r −2 for all r > 0. With this function, for δ ∈ (0, 1), t 0 ≥ 0, τ > 0, and a ball B(x 0 , r), we then consider the cylinders Q − (t 0 , x 0 , r, δ) = (t 0 , t 0 + δτ Φ(2r)) × B(x 0 , δr), We note that in the single fractional order case, i.e. µ = δ(· − β) with some β ∈ (0, 1) we have Φ(r) = r 2 β , which leads to the cylinders used in [27].
Let us now present the main results of the article. For this purpose we fix a number γ − ∈ (0, 1) as large as possible for which Note that here the supremum need not be assumed. The larger γ − , the larger is the critical exponent in the weak Harnack inequality. We have the following theorem.
We apply this estimate to deduce Hölder regularity of weak solutions to (13), which is our final result. Theorem 1.3. Let T > 0, N ≥ 1 and Ω ⊂ R N be a bounded domain. Suppose the assumptions (H1)-(H3) are satisfied and let u 0 ∈ L ∞ (Ω) and f ∈ L ∞ (Ω T ). If u is a bounded weak solution to then for any V ⊂ Ω T separated from the parabolic boundary of Ω T by a positive distance, there exist constants C > 0 and ε ∈ (0, 1) depending only on µ, V , Λ, ν and N such that Recall that [28,Theorem 3.1] gives the boundedness of weak solutions to (13), provided it is bounded on the parabolic boundary of Ω T . In this case the assumption concerning the boundedness of u in Theorem 1.3, may be skipped.
As already mentioned, the exponent in the weak Harnack inequality depends on γ − . Let us provide some examples. In the case of a multi-term fractional time derivative (µ(α) = M n=1 q n δ(· − α n ) with q M > 0) we take γ − = α M , which leads to the condition p < 2+N αM 2+N αM −2αM in the weak Harnack inequality. We obtain the same result in the case where a single order fractional derivative dominates the distributed order part associated with the weight function w in the sense that On the other hand if for every γ ∈ (0, 1) we have that 1 γ w(α)dα > 0, then we can take γ − arbitrarily close to one and the weak Harnack inequality holds for any positive p < 1 + 2 N , that is, the critical exponent coincides with the one from the classical parabolic case.
The paper is organised as follows. In Section 2, we recall various preliminary results from distributed order calculus and establish estimates for the kernel l and the resolvent kernel associated with l which are crucial to make our approach work, but which are also of independent interest in the context of Volterra equations with completely positive kernels. Then we recall Moser iteration lemmas and the lemma of Bombieri and Giusti as well as other auxiliary results. Section 3 is devoted to the proof of the weak Harnack estimate (Theorem 1.1), while the final Section 4 contains the proof of Theorem 1.3.

Introduction from distributed order calculus
We begin by showing that any kernel k ∈ L 1, loc (R + ) from the class considered in this paper (see (5)) is of type PC . In [13] it was already proven that there exists l ∈ L 1,loc ([0, ∞)) such that (k, l) is a PC pair if dµ ≡ wdν 1 , see also [14]. However for a µ given by (4) the proof may be repeated without any changes. Thus, we arrive at the following result.
Theorem 2.1. Let µ be given by formula (4), where q n > 0 for n = 1, . . . , M , w ∈ L 1 ((0, 1)) is nonnegative a.e. on (0, 1). Then, there exists a nonnegative l ∈ L 1,loc ([0, ∞)) such that k * l = 1 and the operator of fractional integration I (µ) , defined by the formula Furthermore, l is given by the formula where Remark 2.1. Since the pair (k, l) is a PC pair it follows from [29, Theorem 3.1] that the initial-boundary value problem (1) with homogeneous Dirichlet boundary condition and initial condition u| t=0 = u 0 has exactly one weak solution belonging to Z.
We remark that in the whole paper, in the estimates, the constants, usually denoted by c, may change from line to line.
In the paper we will frequently use the fact that for x ∈ (0, 1] the expression . We will establish this auxiliary result in the next remark. Remark 2.2. There exists a positive constant c = c(µ) such that for every x ∈ (0, 1] there holds Indeed, since x ∈ (0, 1] we may write Using (19) for x ≤ 1 and (9) we get Let us now establish the crucial estimates for the kernel l.
Lemma 2.1. The kernel l defined in Theorem 2.1 satisfies l ∈ C ∞ ((0, ∞)). Furthermore, for every t > 0 there holds Moreover, for every T > 0, there exists c = c(µ, T ) such that for every t ∈ (0, T ] Proof. At first, we note that from (16) we have Since k * l = 1 and l is nonincreasing, we infer that .
To show (21) we recall that from (16) we have e −pt H(p)dp.
We apply (18) with x = p −1 and we obtain On the other hand we have Therefore, using (22) and (23) for p ≥ 1 we get We note that Therefore, for every t ∈ (0, T ] we have 1 2 , and applying estimate (24) in (16) we obtain . Remark 2.3. There exists a positive constant c = c(µ) such that Indeed, from Lemma 2.1 and (9) we obtain that for t ∈ (0, 1) Next, we present the formula for the solution to the resolvent equation associated with the kernel l, which will play an important role in the logarithmic estimates.
with θ ≥ 0, is given by Proof. We will prove this result applying the Laplace transform similarly as in [12,Theorem 2]. We recall Lemma 6 from [13] (see also Lemma 2.1 in [2]), which serves as a tool for the inversion of the Laplace transform.
We recall thatl (p) = 1 where p α = exp(α log p) and log p denotes the principal branch of the logarithm. Consequently, .
We will show using Lemma 2.3 that r θ is given by (27). Let us define for θ > 0 F (p) = 1 .
After a brief calculation we arrive at (27).

Properties of the cylinders
In this subsection we introduce the function Φ which defines the shape of our time-space cylinders. Then we establish important estimates concerning the kernel l and the resolvent kernel associated with l.
Remark 2.4. We note that since Proposition 2.1. For every r > 0 and for every λ ∈ (0, 1] there holds Proof. From (29) we have Since k 1 is decreasing we obtain the claim.
The subsequent lemma plays an essential role in the derivation of the mean value inequalities.
In view of (20), it is enough to show that for r > 0 small enough with some constant c = c(µ, p). Let us denote x := Φ(2r). Then multiplying by x 1−p the preceding inequality can be reformulated as Let us denote For τ ∈ (0, 1] we have where c(µ) > 0, thanks to (9). Therefore, we have is well defined and g(0) = 0. Using the same reasoning for the function and then we also see that h(0) = 0.
We will now show that there exists c = c(µ, p) > 0 such that for some positive r * = r * (µ, p), which will finish the proof, since g ′ , h ′ ∈ L 1 ((0, Φ(2r * ))). Let us firstly discuss the case when 1 < p < 1 1−γ− . We have and consequently , and we have to show that there exists c = c(µ, p) > 0 such that for x as before. To this end we write If 1 < p < 1 1−γ− , then γ − > 1 − 1 p , and thus On the other hand, for x ∈ (0, 1] we have Since x ≤ 1 we may write Inserting this result into the line above we obtain Combining (34), (35) and (36) yields Note that γ − + 1 p − 1 is positive, hence, there exists r * ∈ (0, 1) depending only on µ and p, such that Φ(2r * ) ≤ 1 and for every x ∈ (0, Φ(2r * )) there holds Applying (18) we obtain (33). The case p = 1 follows from the previous case, by Hölder's inequality. But it is also instructive to give a direct argument, to also provide some further estimates required later in the paper.
Analogously to the previous case, we have to show that there exists c > 0 depending only on µ such that (37) .
The following estimate from below of the resolvent kernel associated with l will play a crucial role in the logarithmic estimates. Lemma 2.6. LetC, C 1 > 0. There exist a positive r * = r * (µ) and positive c 1 , c 2 , c 3 depending only on C 1 ,C and µ such that for every r ∈ (0, r * ], θ = C1 r 2 , t ∈ (0,CΦ(r)), the following estimates hold: Thus, in particular, there exist a positive r * = r * (µ) and a positive c = c(C 1 ,C, µ) such that for every r ∈ (0, r * ], θ = C1 r 2 , and t ∈ (0,CΦ(r)), we have Proof. We will distinguish two cases. Let us firstly assume that the support of µ is cut-off from one, i.e.
In this case the proof is easier. We note that for any c 2 > 0 and for any p ≥ p * := c 2 (Φ(r)) −1 there holds Indeed, applying (29), for p ≥ p * we have Let us take here c 2 = 1. Further, for p > p * we may write On the other hand, there holds Therefore, using (43), (44) and (41) for p ≥ p * we obtain Applying (38) with t = 1 p we arrive at Letting t ∈ (0,CΦ(r)) we thus have and we obtain (40) in the case when (41) holds.
To show (39) we first note that from (21) we get Next, from (20) we get where in the second inequality we use the fact that t → is decreasing and in the last one we apply (40). Finally, we note that (27) gives r θ > 0, hence from (26) we get r θ < l and then where we applied (20), (37) and (40).
In order to show the result when (41) does not hold we firstly show that in the general case for t ∈ (0,CΦ(r)) there exists c = c(C 1 ,C, µ) > 0 such that We note that since r θ and l are nonincreasing Applying (20), (37) and (29) we have Since t ≤CΦ(r) we have (Φ(r)) −α ≤C α t −α and where in the last estimate we used the fact that r θ is nonincreasing. Recalling (26) we arrive at Using again monotonicity of r θ we obtain (46). Now, we will show that under the assumption that the support of µ is not cut-off from one there exists c = c(C 1 ,C, µ) > 0 such that for any t ∈ (0,CΦ(r)), 0 < r ≤ r * = r * (µ) Recalling (27), for any A > 0 we have We will show that for appropriately chosen A = A(C 1 ,C, µ), for every 0 < r ≤ r * = r * (µ) and every t ∈ (0,CΦ(r)) there holds At first we note that Let us take for simplicity A >C. We note that since (41) does not hold, and fix it from now. Then, On the other hand, since p ≥ 1 and the support of µ is not cut-off from one, (− cos(πα))dµ(α) 1 3 4 p α (− cos(πα))dµ(α).
Observe that for p > At −1 and t <CΦ(r) there holds Since Φ(r) is increasing and Φ(0) = 0, then there exists a positive r * depending only on µ such that for every r ∈ (0, r * ] and t <CΦ(r), and p > At −1 we have Thus, for such p there holds Henceforth, we discuss only 0 < r < r * . Using the estimate above together with (51) and (18) we arrive at Applying this estimate in (50) we obtain Since the support of µ is not cut-off from one, we may take γ − > 1 2 and we have further where in the last estimate we used (18). Let us now estimate We note that for t ∈ (0,CΦ(r)) there holds H θ (p)dp.

Moser iterations and an abstract lemma of Bombieri and Giusti
In this subsection, let U σ , 0 < σ ≤ 1, denote a family of measurable subsets of a fixed finite measure space endowed with a measure ̟, such that Our proof of the weak Harnack inequality relies on methods used in the proof of the weak Harnack inequality for single order fractional derivative [27]. Here we recall several general lemmas on Moser iterations (see [ [27], we need to control the power of the constant C in the inequalities resulting from the iteration process. For this reason, and for the convenience of the reader, we repeat the proof of these lemmas taking additional care of the constant C. Then there exists a constant M = M (a, κ,p) > 0 such that Let 0 < p ≤p and ς ∈ (0, 1). Set p i = pκ i , i = 0, 1, . . . and define the sequence {σ i }, i = 0, 1, . . ., by σ 0 = 1 and We now send n to ∞ and use the fact that lim n→∞ Φ(p n , ς) = ess sup Uς |f | to obtain that ess sup Hence the proof is complete.
The second Moser iteration result is the following. Then

The Yosida approximation of the non-local operator
In this section we first introduce the Yosida approximation of the non-local operator of the form d dt k * . The special case k(t) = t −α Γ(1−α) with α ∈ (0, 1) has been discussed in [27]. Here, we repeat the reasoning for general k for the reader's convenience, see also [28].
Let 1 ≤ p < ∞, T > 0, and X be a real Banach space. Then the non-local operator B defined by where the zero means vanishing at t = 0, is known to be m-accretive in L p ((0, T ); X), cf. [4], [6], and [9]. Its Yosida approximations B n , given by B n = nB(n + B) −1 , n ∈ N, have the property that for any u ∈ D(B), one has B n u → Bu in L p ((0, T ); X) as n → ∞. Furthermore, we have the representation where k n = ns n , and s n is the unique solution of the scalar-valued Volterra equation see e.g. [24]. Denote by h n ∈ L 1, loc (R + ) the resolvent kernel associated with nl, that is Convolving (56) with k and using l * k = 1, we find that Consequently, The kernels k n are nonnegative and nonincreasing for all n ∈ N, and they belong to H 1 1 ((0, T )), cf. [19] and [24]. Note that for any In fact, setting u = l * f , we have u ∈ D(B), and as n → ∞. This implies in particular that k n → k in L 1 ((0, T )) as n → ∞.
Next, we recall a fundamental identity for integro-differential operators of the form d dt (k * u), cf. also [28], [27]. Suppose that k ∈ H 1 1 ((0, T )), U is an open subset of R, and H ∈ C 1 (U ). Then for any sufficiently smooth function u on (0, T ) taking values in U , we have for a.a. t ∈ (0, T ) wherek denotes the derivative of k. In particular this identity applies to the Yosida approximations of the non-local operator. An integrated version of (58) can be found in [10, Lemma 18.4.1].
We will frequently use that if k is nonincreasing and H is convex then the last term in (58) is nonnegative. However, we would like to point out that, exactly as in [27], for the delicate logarithmic estimates in Section 3.3 one really needs the full identity (58), even though we work all the time with convex or concave functions. In particular, similarly as in [27], the crucial fractional differential inequality (119) cannot be obtained by using merely convexity inequalities.
In view of the regularity of l established in Remark 2.3, the subsequent two lemmas are also obtained by simple algebra. They are straightforward generalization of [27, Lemma 2.4 and Lemma 2.5].
If in addition v is nonnegative and ϕ is nondecreasing there holds Then

An embedding result and a weighted Poincaré inequality
We finish this chapter by recalling a fundamental result on parabolic embeddings and a weighted Poincaré inequality. We will apply these tools in a similar manner as in [27].
The following embedding result is a particular case of [23, Proposition 2.1].
Proposition 2.2. Let T > 0 and Ω be a bounded domain in R N and assume that ∂Ω satisfies the property of positive density. For 1 < p ≤ ∞ we define the space endowed with the norm .
3 Proof of the weak Harnack inequality

The regularized weak formulation and time shifts
We recall a lemma which provides an equivalent weak formulation of (1). The idea is to replace the singular kernel k by its more regular approximation k n (n ∈ N). Here, k n , h n , n ∈ N, are defined as in Section 2.4. This lemma plays an important role in deriving a priori estimates for weak (sub-/super-) solutions of (1).
Lemma 3.1. Let T > 0, and Ω ⊂ R N be a bounded domain. Suppose the assumptions (H1)-(H3) are satisfied and f is bounded on Ω T . Then u ∈ Z is a weak solution (subsolution, supersolution) of (1) in Ω T if and only if for any nonnegative function ψ ∈°H 1 2 (Ω) one has The proof of Lemma 3.1 is exactly the same as the proof of Lemma 3.1 in [27]. Analogously as in [27], if u ∈ Z is a weak supersolution of (1) in Ω T and u 0 ≥ 0 in Ω, then for any nonnegative function ψ ∈°H 1 2 (Ω).
Let now t 1 ∈ (0, T ) be fixed. For t ∈ (t 1 , T ) we introduce the shifted time s = t − t 1 and put g(s) = g(s + t 1 ), s ∈ (0, T − t 1 ), for functions g defined on (t 1 , T ). Using the decomposition we then see that Assuming in addition that u ≥ 0 on (0, t 1 ) × Ω it follows from (62), (63), and the positivity of ψ and of −k n that for any nonnegative function ψ ∈°H 1 2 (Ω). This relation will be the starting point for all of the estimates below.

Mean value inequalities
For simplicity of the notation, for r > 0 we set B r (x) := B(x, r). Recall that ν N denotes the Lebesgue measure in R N .
Proof: In the proof we follow the idea of the proof of [27, Theorem 3.1]. The main novelty here is to introduce the cylinders with the shape dependent on the kernel k. Since the problem which we consider lacks a natural scaling, one has to treat the terms which depend on the radius r very carefully and finally apply the crucial Lemma 2.5. Since here, we only consider balls centered at fixed x 0 , we abbreviate the notation B r := B r (x 0 ).
Without loss of generality, we may may further assume that u is bounded away from zero. Otherwise, we replace u with u + ε and u 0 with u 0 + ε and eventually pass with ε to zero. To abbreviate the notation, we again denote B r := B r (x 0 ).
Proof. Similarly as in the previous proofs we abbreviate B r = B r (x 0 ). Let us take γ − ∈ (0, 1) satisfying (9), fix 1 < p < 1 1−γ− and let r ∈ (0, r * ], where r * denotes the minimum of r * (µ) from Lemma 2.6 and r * (µ, p) from Lemma 2.5. Then, in particular Φ(r * ) ≤ 1. Since u 0 ≥ 0 in B r and u is a positive weak supersolution we may assume that u 0 = 0. Without loss of generality we also may take t 0 = 0. Indeed, if t 0 > 0 we shift the time as t → t − t 0 and we obtain an inequality of the same type on the time-interval J := [0, τ Φ(r)]. Due to k * u ∈ C([0, t 0 + τ Φ(r)]; L 2 (B)), we have k * ũ ∈ C(J; L 2 (B)) for the shifted functionũ(s, x) = u(s + t 0 , x). Thus u satisfies for any nonnegative test function v ∈°H 1 2 (B r ). We begin similarly as in the proof of [27,Theorem 3.3]. We choose ψ ∈ C 1 0 (B r ) such that supp ψ ⊂ B r , ψ = 1 in B δr , 0 ≤ ψ ≤ 1, |Dψ| ≤ 2/[(1 − δ)r] and the domains {x ∈ B r : ψ(x) 2 ≥ b} are convex for all b ≤ 1. Then for t ∈ J we take the test function v = ψ 2 u −1 . We have Inserting this into (112) we obtain for a.a. t ∈ J − Br where Using (H1) and Young's inequality we find that Using this, assumption (H2) and the estimate |Dψ| ≤ 2/[(1 − δ)r], we infer from (113) that for a.a. t ∈ J − Br We set w = log u (not to confuse with the weight function appearing in the definition of the kernel k), then Dw = u −1 Du. Applying Proposition 2.3 with weight ψ 2 gives where From (114) and (115) it follows that which in turn implies for a.a. t ∈ J, with some constant C 1 = C 1 (δ, N, ν, Λ) and S n (t) = R n (t)/ Br ψ 2 dx. Suppressing the spatial variable x, the identity (58) with H(y) = − log y reads Rewriting this identity in terms of w = log u we obtain where Ψ(y) = e y − 1 − y. Since Ψ is convex, it follows from Jensen's inequality that Using this inequality in (117) we have where the last equality holds again due to (117) with u replaced by e W . From (116) and (118) we deduce that ν Next, we define where A is a positive constant depending only on C 1 , µ, τ, η which will be chosen later. We note that this definition makes sense, since k * e W ∈ C(J). The latter is a consequence of k * u ∈ C(J; L 2 (B r )) and where we apply again Jensen's inequality. Similarly as in [27], to prove (110) and (111), we use the inequalities We will estimate each of the four terms I j separately. As to I 1 and I 3 we follow the lines of proof of the analogous estimates from [27,Theorem 3.3]. However, it is important to notice that in the case when (41) does not hold, the arguments from [27] to estimate I 2 and I 4 break down. Here, we present a new approach to estimate I 2 and also modify the estimate of I 4 .
Estimate of I 2 . Let us denote w := e W . Multiplying (119) by e W we see that Denoting ρ = τ Φ(r) and integrating the above inequality from t to ηρ we get (k n * w)(ηρ) − (k n * w)(t) + C 1 r 2 ηρ t w(s)ds + ηρ t S n (s)w(s)ds ≥ 0.
We may choose a subsequence n m such that (k nm * w)(ηρ) → (k * w)(ηρ) for almost all r and (k nm * w)(t) → (k * w)(t) for almost all t ∈ J − . We proceed for such r and t. Then we obtain Using positivity of v and monotonicity of l we arrive at Let us denote v(t) = v(ηρ − t) for t ∈ J − . Then, the last estimate may be rewritten as Next, we multiply the obtained inequality by e −βt , where β > 0 is to be chosen later.

Hence, we have
Estimate of I 4 . We define the function H m on R by H m (y) = y, y ≤ m, and H m (y) = m + (y − m)/(y − m + 1), y ≥ m, m > 0. Note that H m ∈ C 1 (R) is increasing, concave, and bounded above by m + 1. Moreover, by concavity Then, from (58) we obtain and thus multiplying (119) by e W H ′ m e W and employing (126), we infer that For t ∈ J + we shift the time by setting s = t−ητ Φ(r) = t−ηρ and putf (s) = f (s+ηρ), s ∈ (0, (1−η)ρ), for functions f defined on J + . Using the time-shifting identity (63), (127) implies that for a.a. s ∈ (0, (1−η)ρ) where Υ n,m denotes the history term Now, we shall deduce the estimate from below for eW . For this purpose we set θ = C1 r 2 and convolve (128) with r θ , where r θ satisfies (26). We have a.e. in (0, (1 − η)ρ) Passing to the limit with n (on an appropriate subsequence), it follows that where Applying Fubini's theorem we obtain In order to estimate the inner integral we apply (40) from Lemma 2.6 withC = (1 − η)τ . Then we have where in the last inequality we used the estimate 1 − x α ≥ α(1 − x) for x ∈ (0, 1). We note that since ρ = τ Φ(r), we have Recalling that Φ(r) ≤ 1 we may apply (38) to get Thus, we have .
Remark 3.1. We notice that as a byproduct of the proof of Theorem 3.3, we obtain robust logarithmic estimates for µ = δ(· − α) as α → 1. We point out that in [27], the constants in the estimates of I 2 and I 4 blow up as α → 1. Here, we provide new estimates which in the case of a single order fractional derivative are uniform with respect to the order of derivative α ∈ [α 0 , 1), with an arbitrarily fixed α 0 ∈ (0, 1).

Application to strong maximum principle
Similarly as in the single-order fractional derivative case, the strong maximum principle for weak subsolutions of (1) may be easily derived as a consequence of the weak Harnack inequality.
Proof. Let M = ess sup ΩT u. Then v := M − u is a nonnegative weak supersolution of (1) with u 0 replaced by v 0 := M − u 0 ≥ 0. For any 0 ≤ t 1 < t 1 + ηΦ(2r) < t 0 the weak Harnack inequality with p = 1 applied to v yields an estimate of the form Hence u = M a.e. in (0, t 0 ) × B(x 0 , r) and the assertion now follows by a chaining argument.

Weak Harnack inequality with inhomogeneity
We finish this chapter with a simple proof of Theorem 1.2. We will later apply this result to deduce Hölder regularity of weak solutions to (13) (Theorem 1.3). We recall the notation Φ(r) := Φ(2r).

Proof of the Hölder regularity
In this chapter we prove Theorem 1.3. We will deduce the Hölder regularity of weak solutions to (13) via the weak Harnack estimate. This is a novel approach compared to [26], where Hölder regularity for solutions to the problem with single order fractional time derivative was derived by means of growth lemmas without the use of Harnack inequalities. Here we provide a much simpler and shorter argument than the one in [26]. Let u be a bounded weak solution to where we assume that r ∈ (0, r * ] and r * = r * (µ) comes from Theorem 1.2 with p = 1. Then Φ(r) ≤ 1. We assume further that u 0 ∈ L ∞ (B(x 0 , 2r)), f ∈ L ∞ ((0, 2ηΦ(r)) × B(x 0 , 2r)). Again, in order to abbreviate the notation, we will often write B r (x) instead of B(x, r). Moreover, in this section, by l we denote an integer number, not to confuse with the kernel l associated with the kernel k. Let us define F (t, x) = f (t, x) + u 0 (x)k(t). We normalize the equation by putting v := u 2D , G = F 2D , D := u L∞((0,2ηΦ(r))×B2r (x0)) + r 2 F L∞(( η 2 Φ(r),2ηΦ(r))×B2r(x0)) .
Indeed, ifl corresponds to the largest cylinder contained in Q dom , it means that Q(2 −(l−1) ) is not contained in Q dom . Hence, either In the first case we get while in the second case Applying Proposition (2.1) we obtain and thusl and we arrive at (156). Let l 0 ≥l be an integer that will be fixed later. We set a l := ess inf where κ ∈ (0, 1) will also be chosen later. Then, by definition, for all j ≤ l 0 , j ∈ Z we have because ess osc Q dom v ≤ 1 and b j − a j ≥ 1 for j ≤ l 0 . We would like to prove the property (158) for j > l 0 with appropriate sequences a j , b j . More precisely, we will show that for j > l 0 there exist a nondecreasing sequence (a j ) and nonincreasing sequence (b j ) such that (158) holds. We will prove it by induction. Firstly, we note that for j ≤ l 0 the condition (158) trivially holds. Now, let l ≥ l 0 and assume that (158) holds for all j ≤ l. Under this assumption, we will construct a l+1 ≥ a l and b l+1 ≤ b l such that (158) holds also for j = l + 1. Let us denotet = t 1 − θΦ(2 −l r). Then (t, t 1 ) × B(x 1 , 2 −l r) = Q(2 −l ). At first we will show that the induction hypothesis implies the following estimate for the memory term.