1 Introduction

The temperature u of a material undergoing a multi-phase change, for instance ice-water-vapor, can be described by the following nonlinear parabolic partial differential equation

$$\begin{aligned} \partial _t\beta (u)-{\text {div}}\big (|Du|^{p-2}Du\big )\ni 0\quad \text { weakly in }E_T. \end{aligned}$$
(1.1)

Here E is an open set of \(\mathbb {R}^N\) with \(N\ge 1\) and \(E_T:=E\times (0,T]\) for some \(T>0\).

The enthalpy \(\beta (\cdot )\) is a maximal monotone graph in \(\mathbb {R}\times \mathbb {R}\) defined by (cf. Fig. 1)

$$\begin{aligned} \beta (u)=u+\sum _{i=0}^{\ell } \nu _i \mathcal {H}_{e_i}(u)\quad \text { for some } \ell \in \mathbb {N}\cup \{\infty \}, \, e_i\in \mathbb {R}\text { and } \nu _i>0, \end{aligned}$$
(1.2)

where we have assumed that \(0=e_o<e_1<\dots <e_{\ell }\),

$$\begin{aligned} d:=\min \big \{e_{i+1}-e_{i}: i=0,1,\cdots ,\ell -1\big \}>0, \end{aligned}$$

and denoted

The equation (1.1) will be understood in a proper weak sense to be made precise later.

The main result is that locally bounded, local weak solutions to (1.1) with \(p\ge 2\) are locally continuous and a modulus of continuity is explicitly quantified.

Fig. 1
figure 1

Graph of \(\beta \)

1.1 Statement of the results

From here on, we will deal with the following more general parabolic partial differential equation modeled on (1.1):

$$\begin{aligned} \partial _t\beta (u)-{\text {div}}\textbf {A}(x,t,u, Du) \ni 0\quad \text { weakly in }\> E_T, \end{aligned}$$
(1.3)

where \(\beta (\cdot )\) is defined in (1.2). The function \(\textbf {A}(x,t,u,\xi ):E_T\times \mathbb {R}^{N+1}\rightarrow \mathbb {R}^N\) is assumed to be measurable with respect to \((x, t) \in E_T\) for all \((u,\xi )\in \mathbb {R}\times \mathbb {R}^N\), and continuous with respect to \((u,\xi )\) for a.e. \((x,t)\in E_T\). Moreover, we assume the structure conditions

$$\begin{aligned} \left\{ \begin{array}{l} \textbf {A}(x,t,u,\xi )\cdot \xi \ge C_o|\xi |^p \\ |\textbf {A}(x,t,u,\xi )|\le C_1|\xi |^{p-1} \end{array} \right. \quad \text { a.e.}\> (x,t)\in E_T,\, \forall \,u\in \mathbb {R},\,\forall \,\xi \in \mathbb {R}^N, \end{aligned}$$
(1.4)

where \(C_o\) and \(C_1\) are given positive constants, and we take \(p\ge 2\).

In the sequel, the set of parameters \(\{d, \nu _i, p,N,C_o,C_1, \Vert u\Vert _{\infty , E_T}\}\) will be referred to as the data. A generic positive constant \(\gamma \) depending on the data will be used in the estimates.

Let \(\Gamma :=\partial E_T-\overline{E}\times \{T\}\) be the parabolic boundary of \(E_T\), and for a compact set \(\mathcal {K}\subset E_T\) introduce the parabolic p-distance from \(\mathcal {K}\) to \(\Gamma \) by

$$\begin{aligned} \begin{aligned} {\text {dist}}_p(\mathcal {K};\,\Gamma )&:=\inf _{\begin{array}{c} (x,t)\in \mathcal {K}\\ (y,s)\in \Gamma \end{array}} \left\{ |x-y|+|t-s|^{\frac{1}{p}}\right\} . \end{aligned} \end{aligned}$$

The formal definition of local weak solution to (1.3) will be given in Sect. 1.3. Now we proceed to present the main theorem, where by \(\ln ^{(k)}\) we mean the logarithmic function composed k times.

Theorem 1.1

Let u be a bounded weak solution to (1.3) in \(E_T\), under the structure condition (1.4) for \(p\ge 2\). Then for every pair of points \((x_1,t_1), (x_2,t_2)\in \mathcal {K}\), there holds that

$$\begin{aligned} \big |u(x_1,t_1)-u(x_2,t_2)\big | \le \varvec{\omega }\!\left( |x_1-x_2|+|t_1-t_2|^{\frac{1}{p}}\right) , \end{aligned}$$

where

$$\begin{aligned} \varvec{\omega }(r)=C \Big (\ln ^{(6)} \frac{{\text {dist}}_p(\mathcal {K};\,\Gamma )}{c r}\Big )^{-\sigma }\quad \text { for all }r\in \big (0, {\text {dist}}_p(\mathcal {K};\,\Gamma )\big ) \end{aligned}$$

for some absolute constant \(c\in (0,1)\), and for some \(C>1\) and \(\sigma \in (0,1)\) depending on the data.

Remark 1.1

All constants in Theorem 1.1 are stable as \(p\downarrow 2\).

Remark 1.2

Even though all the proofs are given for the specific \(\beta \) in (1.2), nevertheless a more general graph can be considered, namely

$$\begin{aligned} \beta (u)=\beta _{AC}(u)+\sum _{i=0}^{\ell } \nu _i \mathcal {H}_{e_i}(u)\quad \text { for some } \ell \in \mathbb {N}\cup \{\infty \},\, e_i\in \mathbb {R}\text { and } \nu _i>0, \end{aligned}$$
(1.5)

where \(\beta _{AC}=\beta _{AC}(s)\) denotes an absolutely continuous and hence a.e. differentiable function in \(\mathbb {R}\), such that

$$\begin{aligned} 0<\alpha _o\le \beta _{AC}'(s)\le \alpha _1, \end{aligned}$$

for two positive constants \(\alpha _o\) and \(\alpha _1\). This reflects the fact that the thermal properties of the material under consideration might change according to the temperature. The graph (1.5) can be reduced to (1.2) by a straightforward adaption of the change of variables introduced in [4, Sect. 1]. Furthermore, Theorem 1.1 continues to hold for (1.3) with lower order terms, which take into account the convection resulting from the heat transfer. Again, the modifications of the proofs can be modeled on the arguments in [4,5,6], but we refrain from pursuing generality in this direction, focusing instead on the actual novelties.

Theorem 1.1 bears global information of \(\beta \) through the range of u. However, once a modulus of continuity is obtained, we can confine the range of u by restricting space-time distance, such that u only experiences one jump of \(\beta \) at most.

Corollary 1.1

(Localization) Under the hypotheses of Theorem 1.1, the modulus improves automatically to the one for the two-phase problem.

1.2 Novelty and significance

Graphs \(\beta \) such as the one in (1.2), but exhibiting just a single jump, say at the origin, arise from a weak formulation of the classical Stefan problem, which models a liquid/solid phase transition, such as water/ice. It is quite natural to ask whether the transition of phase occurs with a continuous temperature across the water/ice interface. This question was initially raised in a 1960 paper of Oleĭnik (see [18]) and was later reported in [14, Chapter V, Sect. 9]. Since then an important research field was born, and soon new problems started to be posed, besides the one originally formulated by Oleĭnik in her 1960 paper. The interested reader can refer to [21], to have at least an overview of the huge development that the research about the Stefan problem has witnessed. In these notes the issue is the regularity of local solutions, the ultimate goal being to prove the continuity of solutions to (1.1) for a general maximal monotone graph \(\beta \). Such a result has not been achieved yet, even though it is clear that the coercivity of \(\beta \) is essential for a solution to be continuous, as pointed out by examples in [8].

Continuity results for (1.1) with \(\beta \) as in (1.2) but with a single jump, and \(p=2\), have been given in [3, 4, 19, 22]. Moreover, Ziemer proved the continuity up to the boundary for general Dirichlet boundary data. Whereas Caffarelli and Evans heavily relied on the properties of the Laplacian, and their result cannot be extended to the full quasilinear case of (1.3), DiBenedetto’s approach is flexible enough to deal with the general framework, and it also allows lower order terms, which are thoroughly justified from a physical point of view, since they describe convection phenomena.

A quantitative estimate on the modulus of continuity, still in the case of a single jump and \(p=2\), was given in [7, Remark 3.1], but without proof. Few years later, DiBenedetto quantified Ziemer’s results, and in [5] proved that solutions have a boundary modulus of continuity of the kind

$$\begin{aligned} \varvec{\omega }(r)=C\Big [\ln \ln \Bigg (\frac{R}{cr}\Bigg )\Big ]^{-\sigma },\quad C>1,\, c,\,\sigma \in (0,1),\,r\in (0,R). \end{aligned}$$
(1.6)

A major step forward towards a full proof of the local continuity of solutions to (1.3) with \(p=2\) and \(\beta \) a general maximal monotone graph in \(\mathbb {R}\times \mathbb {R}\), is represented by [11]; the authors proved that locally bounded, weak solutions are locally continuous, and the modulus of continuity can be quantitatively estimated only in terms of the data, even though an explicit expression of such a modulus is not provided in the paper. The proof is given in full generality for \(N=2\), whereas for \(N\ge 3\) it relies on a proper comparison function, and therefore, it is limited to \({\textbf {A}}=Du\). The paper is quite technical, but a thorough and clear presentation of the methods employed is given in [10, Sect. 5]; the list of references therein gives a comprehensive state of the art at the moment of its publication.

To our knowledge, the first paper to deal with \(p>2\) is [20]: besides its intrinsic mathematical interest, the nonlinear diffusion operator with growth of order larger than 2 naturally takes into account non-Newtonian filtration phenomena.

For a few years there were basically no further improvements, as far as the continuity issue is concerned. Things changed with [1]: the authors consider (1.3) with \(p\ge 2\) and (1.2) with a single jump, and they derive an explicit modulus of continuity better than (1.6), namely

$$\begin{aligned} \varvec{\omega }(r)=C\Big [\ln \Bigg (\frac{R}{cr}\Bigg )\Big ]^{-\sigma },\quad C>1,\, c,\,\sigma \in (0,1),\,r\in (0,R), \end{aligned}$$
(1.7)

with \(\sigma \) precisely quantified just in terms of N and p, which they conjecture to be optimal. In [2] the result is extended up to the boundary: under the same conditions as before about the equation, and assuming a positive geometric density condition at the boundary \(\partial E\), solutions to the Dirichlet problem have a modulus of continuity as in (1.6), yet weaker than (1.7).

Further progress has been recently made in [16, 17]. Indeed, interior moduli sharper than (1.7) are provided in [17] for \(p=2\) and \(N=1,2\). On the other hand, under the same general conditions as in [2], the boundary modulus of continuity has been improved to (1.7) in [16]: for Dirichlet boundary conditions, any \(p\ge 2\) can do; whereas for Neumann boundary conditions only \(p=2\) could be dealt with, while the case \(p>2\) remains an open problem.

With respect to the existing literature described so far, the present work represents a step forward, at least under two different points of view.

First of all, we consider an arbitrary number of jumps of \(\beta \), and not just a single discontinuity; this case has already been dealt with in [12], but only for \(p=2\), whereas here we work with \(p\ge 2\). Moreover, even though some of the techniques employed in [12] and here are comparable, the general approach we follow is definitely different.

The other novelty is given by the explicit modulus of continuity in Theorem 1.1: to our knowledge, it is the first time that a modulus is explicitly stated for a \(\beta \) that is more general than the one considered in [1, 2, 16, 17]. Due to the wide generality assumed on \(\beta \), i.e. arbitrary number of jumps and arbitrary height for each single jump, the parameter \(\sigma \) depends on the data, that is, also on \(\Vert u\Vert _{\infty ,E_T}\). Providing an optimal modulus of continuity that carries global information of \(\beta \) is a difficult task, and we are well aware that the one shown in Theorem 1.1 seems far from being the best possible. Nevertheless, as we have pointed out in Corollary 1.1, the importance of a quantitative continuity statement lies in the fact that once we have it, the same result implies that the modulus can be automatically improved to the one for the two-phase problem (single-jump); indeed, by restricting the space-time distance, u can be confined, so that it experiences one jump of \(\beta \) at most, and we end up having the modulus given in (1.7). We refrained from going into details about the proof of Corollary 1.1, since we would basically have to reproduce what was done in [1].

Moreover, in a forthcoming paper we plan to address a multi-phase transition problem with a maximal monotone graph \(\beta \) as in (1.5), without assuming that \(\beta _{AC}'\) is bounded above: besides its intrinsic mathematical interest, this is what occurs, for example, in the so-called Buckley-Leverett model for the motion of two immiscible fluids in a porous medium (see [13, 15]). In such a case, \(\beta \) presents two singularities, say at \(u=0\) and \(u=1\), where \(\beta \) can become vertical with an exponential speed, or even faster, and might also exhibit a jump.

1.3 Definition of solution

A function

$$\begin{aligned} u\in L_{{\text {loc}}}^{\infty }\big (0,T;L^2_{{\text {loc}}}(E)\big )\cap L^p_{{\text {loc}}}\big (0,T; W^{1,p}_{{\text {loc}}}(E)\big ) \end{aligned}$$

is a local, weak sub(super)-solution to (1.3) with the structure conditions (1.4), if for every compact set \(K\subset E\) and every sub-interval \([t_1,t_2]\subset (0,T]\), there is a selection \(v\subset \beta (u)\), i.e.

$$\begin{aligned} \left\{ \big (z,v(z)\big ): z\in E_T\right\} \subset \left\{ \big (z,\beta [u(z)]\big ): z\in E_T\right\} , \end{aligned}$$

such that

$$\begin{aligned} \int \limits _K v\zeta \,\textrm {d}x\Bigg |_{t_1}^{t_2} + \iint \limits _{K\times (t_1,t_2)} \big [-v\partial _t\zeta +\textbf {A}(x,t,u,Du)\cdot D\zeta \big ]\textrm {d}x\textrm {d}t\le (\ge )0 \end{aligned}$$

for all non-negative test functions

$$\begin{aligned} \zeta \in W^{1,2}_{{\text {loc}}}\big (0,T;L^2(K)\big )\cap L^p_{{\text {loc}}}\big (0,T;W_o^{1,p}(K) \big ). \end{aligned}$$

Observe that \(v\in L^{\infty }_{{\text {loc}}} \big (0,T;L^2_{{\text {loc}}}(E)\big )\) and hence all the integrals are well-defined. A function that is both a local, weak sub-solution and a local, weak super-solution is termed a local, weak solution.

We will consider the regularized version of the Stefan problem (1.3). For a parameter \(\varepsilon \in (0,\tfrac{1}{2} d)\), we introduce the function

$$\begin{aligned} \mathcal {H}_{e_i}^{\varepsilon }(u):=\left\{ \begin{array}{cc} 1,\quad &{} u>e_i+\varepsilon ,\\ \frac{1}{\varepsilon }(u-e_i),\quad &{} e_i\le u\le e_i+\varepsilon ,\\ 0,\quad &{} u<e_i, \end{array} \right. \end{aligned}$$

and define the mollification of \(\beta \) by

$$\begin{aligned} \beta _\varepsilon (u)\equiv u+H_\varepsilon (u):=u+\sum _{i=0}^{\ell } \nu _i \mathcal {H}_{e_i}^{\varepsilon }(u); \end{aligned}$$

we now deal with

$$\begin{aligned} \begin{aligned}&\partial _t [u+H_\varepsilon (u)] -{\text {div}}\textbf {A}(x,t,u, Du) \le (\ge ) 0\quad \text { weakly in }\> E_T. \end{aligned} \end{aligned}$$
(1.8)

Sub(super)-solutions to (1.8) are defined like for the parabolic p-Laplacian as in [6, Chapter II]. Hence, the solutions to (1.8) are generally not smooth.

In this note we assume that local solutions to (1.3) can be approximated by a sequence of solutions to (1.8) locally uniformly. This approximating approach parallels the one in [11], yet with a more particular \(\beta _\varepsilon \) and the p-Laplacian here. The goal is to establish an estimate on the modulus of continuity for the approximating solutions uniform in \(\varepsilon \), which grants the same modulus to the limiting function.

There is yet another notion of solution, which requires a solution to possess time derivative in the Sobolev sense, cf. [4, 5, 16, 17]. Theorem 1.1 continues to hold for that kind of notion and the proof calls for minor modifications from the one given here. The advantage is that an approximating scheme is not needed. However, the preset requirement on the time derivative is usually too strong to guarantee the continuity of a constructed solution in the existence theory.

1.4 Structure of the proof

Since the paper is technically involved, we think it better to first discuss the main ideas in an informal way.

Roughly speaking, we follow an approach that is by now standard when dealing with the continuity of solutions to degenerate parabolic equations: starting from a properly built reference cylinder, we have two alternatives: either we can find a sub-cylinder, such that the set where u is close to its supremum is small, or such a sub-cylinder cannot be determined.

In the first case, we can show a reduction of oscillation near the supremum, and this is accomplished in Sect. 3.1; the second alternative is more difficult and will be taken on in Sects. 3.23.5, where we prove a reduction of oscillation near the infimum, assuming that such an infimum is actually close to one of the discontinuity points; finally, the case of the infimum being properly far from all the discontinuity points is dealt with in Sect. 3.6. Indeed, this last possibility is the easiest one, since the equation behaves as though it were the parabolic p-Laplacian with \(p\ge 2\).

All the alternatives are quantified, and the structural dependences of the various constants are carefully traced, and this eventually leads to an estimate of the modulus of continuity in Sect. 3.7. As pointed out in Corollary 1.1, once established, the modulus improves automatically to the one for the two-phase problem. Indeed, in general, \(\omega \) could be large, and d could be small: our argument shrinks \(\omega \) step by step to an oscillation less than d across all potential jumps, and this quantification is precisely what eventually gives the modulus of (1.7).

2 Preliminary tools

2.1 Energy estimates

Here and in the following, we denote by \(K_\varrho (x_o)\) the cube of side length \(2\varrho \) and center \(x_o\), with faces parallel to the coordinate planes of \(\mathbb {R}^N\), and for \(k\in \mathbb {R}\) we let the truncations \((u-k)_+\) and \((u-k)_-\) be defined by

$$\begin{aligned} (u-k)_+\equiv \max \{u-k, 0\},\quad (u-k)_-\equiv \max \{k-u, 0\}. \end{aligned}$$

We can repeat almost verbatim the calculations in [11, Sect. 2] modulo proper mollification in the time variable, and prove the following estimates.

Proposition 2.1

Let u be a local weak sub(super)-solution to (1.8) with (1.4) in \(E_T\). There exists a constant \(\gamma (C_o,C_1,p)>0\), such that for all cylinders \(Q_{R,S}=K_R(x_o)\times (t_o-S,t_o)\subset E_T\), every \(k\in \mathbb {R}\), and every non-negative, piecewise smooth cutoff function \(\zeta \) vanishing on \(\partial K_{R}(x_o)\times (t_o-S,t_o)\), there holds

(2.1)

The general formulation of (2.1) can be simplified, if we take into account the specific structure of \(H_\varepsilon \). In particular, since \(H'_\varepsilon \ge 0\), the second term on the left-hand side can be dropped. On the other hand, since \(H_\varepsilon \) is a linear combination of Heaviside functions (an increasing step function) modulo \(\varepsilon \), we have

$$\begin{aligned} \int \limits _0^{(u-k)_\pm }H'_\varepsilon (k\pm s) s\,\textrm {d}s\le (u-k)_\pm \int \limits _0^{(u-k)_\pm }H'_\varepsilon (k\pm s) \,\textrm {d}s\le \Bigg (\sum \limits _{i=0}^{\ell }\nu _i\Bigg )(u-k)_\pm , \end{aligned}$$

provided \(\sum _{i=0}^{\ell }\nu _i\) is finite. Instead, if it is infinite, we let

$$\begin{aligned} M:=\Vert u\Vert _{\infty ,E_T}, \end{aligned}$$

and estimate

$$\begin{aligned} \int \limits _0^{(u-k)_\pm }H'_\varepsilon (k\pm s) s\,\textrm {d}s\le \sup _{-M\le s\le M}|H_\varepsilon (s)|(u-k)_\pm . \end{aligned}$$

Hence, in this case the subsequent estimates will depend also on M, but are independent of \(\varepsilon \).

With all these remarks, (2.1) becomes,

where the constant \(\gamma \) depends only on the data but M, if \(\sum _{i=0}^{\ell }\nu _i\) is finite. If it is infinite, the constant \(\gamma \) also depends on M.

If we choose the cutoff function \(\zeta \) such that \(\zeta (\cdot ,t_o-S)=0\), then we obtain

(2.2)

which corresponds to estimate (2.5) of [11].

On the other hand, if we choose the cutoff function such that \(\zeta =\zeta (x)\), i.e. independent of t, we get

(2.3)

which corresponds to estimate (2.6) of [11].

2.2 Logarithmic estimates

The following logarithmic energy estimate will be useful; the case \(p=2\) has been derived in [12, (3.13)] (see also [11, (2.7)]). To this end, letting k, u and \(Q_{R,S}\) be as in Proposition 2.1, we set

$$\begin{aligned} \mathcal {L}:=\sup _{Q_{R,S}}(u-k)_{\pm }, \end{aligned}$$

take \(c\in (0,\mathcal {L})\), and introduce the following function in \(Q_{R,S}\):

$$\begin{aligned} \Psi (x,t)\equiv \Psi \big (\mathcal {L},(u-k)_{\pm }, c\big ):=\ln _+\Big (\frac{\mathcal {L}}{\mathcal {L}-(u-k)_{\pm }+c}\Big ). \end{aligned}$$

This function enjoys the following estimate.

Proposition 2.2

Let the hypotheses in Proposition 2.1 hold. There exists \(\gamma >1\) depending only on the data and on M, such that for any \(\sigma \in (0,1)\),

$$\begin{aligned}&\sup _{t_o-S\le t\le t_o}\int \limits _{K_{\sigma R}(x_o)}\Psi ^2(x,t)\,\textrm {d}x\le \int \limits _{K_{R}(x_o)}\Psi ^2(x,t_o-S)\,\textrm {d}x\\&\quad +\frac{\gamma }{c}\int \limits _{K_{R}(x_o)}\Psi (x,t_o-S)\,\textrm {d}x+\frac{\gamma }{(1-\sigma )^p R^p}\iint \limits _{Q_{R,S}}\Psi |\Psi _u|^{2-p}\,\textrm {d}x\textrm {d}t. \end{aligned}$$

Proof

To simplify the symbolism, we denote \(\Psi (s)\equiv \Psi \big (\mathcal {L},s, c\big )\) and its derivative \(\Psi '\). In the weak formulation we use the test function \(\pm \Psi \Psi '\zeta ^p\), with \(\zeta =\zeta (x)\) and

$$\begin{aligned} \left\{ \begin{aligned}&\zeta \equiv 1\ \text { on }\ K_{\sigma R}(x_o),\\&\zeta =0\ \text { for }\ x\in \partial K_R(x_o),\\&|D\zeta |\le \frac{1}{(1-\sigma )R}. \end{aligned} \right. \end{aligned}$$

We work in the cylinder \(K_R(x_o)\times (t_o-S,t]\) with \(t\in (t_o-S,t_o]\). Observe that

$$\begin{aligned} \pm H_\varepsilon '(u) \partial _t u \Psi \Psi '=\partial _t\int \limits _0^{(u-k)_\pm } H_\varepsilon '(k\pm s)\Psi (s) \Psi '(s)\, \textrm {d}s. \end{aligned}$$

By the arbitrariness of \(t\in (t_o-S,t_o]\), we easily obtain

$$\begin{aligned}&\sup _{t_o-S\le t\le t_o}\frac{1}{2}\int \limits _{K_R(x_o)}\Psi ^2(x,t)\zeta ^p(x)\,\textrm {d}x\\&\qquad +\sup _{t_o-S\le t\le t_o}\int \limits _{K_R(x_o)\times \{t\}}\left( \int \limits _0^{(u-k)_\pm } H_\varepsilon '(k\pm s)\Psi (s) \Psi '(s)\,\textrm {d}s\right) \zeta ^p(x)\,\textrm {d}x\\&\qquad +\iint \limits _{Q_{R,S}}(1+\Psi )\Psi '^2|D(u-k)_\pm |^p\zeta ^p\,\textrm {d}x\textrm {d}\tau \\&\quad \le \frac{1}{2}\int \limits _{K_R}\Psi ^2(x,t_o-S)\zeta ^p(x)\,\textrm {d}x\\&\quad +\int \limits _{K_R(x_o)\times \{t_o-S\}}\left( \int \limits _0^{(u-k)_\pm } H_\varepsilon '(k\pm s)\Psi (s) \Psi '(s)\,\textrm {d}s\right) \zeta ^p(x)\,\textrm {d}x\\&\quad +p\iint \limits _{Q_{R,S}}\Psi \Psi '|D(u-k)_\pm |^{p-1}\zeta ^{p-1}|D\zeta |\,\textrm {d}x\textrm {d}\tau . \end{aligned}$$

Since \(H'_\varepsilon \ge 0\), the second term on the left-hand side can be discarded. As for the right-hand side, since \(\Psi \) is an increasing function of its argument \((u-k)_\pm \), we have

$$\begin{aligned} \int \limits _0^{(u-k)_\pm } H_\varepsilon '(k\pm s)\Psi (s) \Psi '(s) \,\textrm {d}s&\le \Big (\sup _{Q_{R,S}}\Psi '\Big ) \int \limits _0^{(u-k)_\pm }H'_\varepsilon (k\pm s)\Psi (s)\,\textrm {d}s\\&\le \Big (\sup _{Q_{R,S}}\Psi '\Big )\Bigg (\sum \limits _{i=0}^{\ell }\nu _i\Bigg )\Psi \big ((u-k)_\pm \big ), \end{aligned}$$

provided \(\sum _{i=0}^{\ell }\nu _i\) is finite. If instead it is infinite, we estimate

$$\begin{aligned} \int \limits _0^{(u-k)_\pm }H'_\varepsilon (k\pm s) \Psi (s) \Psi '(s)\,\textrm {d}s\le \Big (\sup _{Q_{R,S}}\Psi '\Big )\Big (\sup _{-M\le s\le M}|H_\varepsilon (s)|\Big )\Psi \big ((u-k)_\pm \big ). \end{aligned}$$

Hence, in this case the subsequent estimates will depend also on M.

By its definition, \(\Psi '\le 1/c\) in \(Q_{R,S}\) and therefore,

$$\begin{aligned} \int \limits _{K_R(x_o)\times \{t_o-S\}}\left( \int \limits _0^{(u-k)_\pm } H_\varepsilon '(k\pm s)\Psi \Psi ' \,\textrm {d}s\right) \zeta ^p(x)\,\textrm {d}x\le \frac{\gamma }{c}\int \limits _{K_R(x_o)}\Psi (x,t_o-S)\,\textrm {d}x, \end{aligned}$$

where \(\gamma \) depends only on the data if \(\sum _{i=0}^{\ell }\nu _i\) is finite, otherwise it depends also on M.

Moreover, since \(\Psi \ge 0\), an application of Young’s inequality yields that

$$\begin{aligned}&p\iint \limits _{Q_{R,S}} \Psi \Psi ' \zeta ^{p-1} |D(u-k)_{\pm }|^{p-1} |D\zeta |\,\textrm {d}x\textrm {d}\tau \\&\quad \le \iint \limits _{Q_{R,S}}(1+\Psi )\Psi '^2\zeta ^p |D(u-k)_{\pm }|^p\,\textrm {d}x\textrm {d}\tau +\frac{\gamma }{(1-\sigma )^p R^p}\iint \limits _{Q_{R,S}}\Psi |\Psi '|^{2-p} \,\textrm {d}x\textrm {d}\tau . \end{aligned}$$

Collecting all the terms, we conclude the proof. \(\square \)

2.3 De Giorgi type lemmas

For a cylinder \(\mathcal {Q}=K \times (T_1,T_2)\subset E_T\), we introduce the numbers \(\mu ^{\pm }\) and \(\omega \) satisfying

We now present the first De Giorgi type lemma that can be shown by using the energy estimates in (2.2); for the detailed proof we refer to [16, Lemma 2.1]. Here we denote the backward cylinder \((x_o,t_o)+Q_{\varrho }(\theta ):=K_{\varrho }(x_o)\times (t_o-\theta \varrho ^p,t_o)\). If no confusion arises, we omit the vertex \((x_o,t_o)\) for simplicity.

Lemma 2.1

Let u be a local weak sub(super)-solution to (1.8) with (1.4) in \(E_T\). For \(\xi \in (0,1)\), set \(\theta =(\xi \omega )^{2-p}\). There exists a constant \(c_o\in (0,1)\) depending only on the data, such that if

$$\begin{aligned} \big |\big [\pm (\mu ^\pm -u)\le \xi \omega \big ]\cap Q_{\varrho }(\theta ) \big |\le c_o(\xi \omega )^{\frac{N+p}{p}}|Q_{\varrho }(\theta )|, \end{aligned}$$

then

$$\begin{aligned} \pm (\mu ^\pm -u)\ge \tfrac{1}{2}\xi \omega \quad \text { a.e. in }Q_{\frac{1}{2}\varrho }(\theta ), \end{aligned}$$

provided the cylinder \(Q_{\varrho }(\theta )\) is included in \(\mathcal {Q}\).

The next lemma is a variant of the previous one, involving quantitative initial data. For this purpose, we will use the forward cylinder at \((x_o,t_1)\):

$$\begin{aligned} (x_o, t_1)+Q^+_{\varrho }(\theta ):=K_{\varrho }(x_o)\times (t_1,t_1+\theta \varrho ^p). \end{aligned}$$
(2.4)

We have the following.

Lemma 2.2

Let u be a local weak sub(super)-solution to (1.8) with (1.4) in \(E_T\). Assume that for some \(\xi \in (0,1)\) there holds

$$\begin{aligned} \pm \big (\mu ^\pm -u(\cdot , t_1)\big )\ge \xi \omega \quad \text { a.e. in } K_{\varrho }(x_o). \end{aligned}$$

There exists a constant \(\gamma _o\in (0,1)\) depending only on the data, such that for any \(\theta >0\), if

$$\begin{aligned} \big |\big [\pm (\mu ^\pm -u)\le \xi \omega \big ]\cap \big [(x_o, t_1)+Q^+_{\varrho }(\theta )\big ]\big |\le \frac{ \gamma _o(\xi \omega )^{2-p}}{\theta }|Q^+_{\varrho }(\theta )|, \end{aligned}$$

then

$$\begin{aligned} \pm (\mu ^\pm -u )\ge \tfrac{1}{2}\xi \omega \quad \text { a.e. in }K_{\frac{1}{2}\varrho }(x_o)\times (t_1,t_1+\theta \varrho ^p), \end{aligned}$$

provided the cylinder \((x_o, t_1)+Q^+_{\varrho }(\theta )\) is included in \(\mathcal {Q}\).

Proof

Let us deal with the case of super-solutions only, as the other case is similar. We use the energy estimate in (2.3) in the cylinder \(Q_{R,S}\equiv (x_o, t_1)+Q^+_{\varrho }(\theta )\), with the levels

$$\begin{aligned} k_n=\mu ^-+\frac{\xi \omega }{2}+\frac{\xi \omega }{2^{n+1}},\quad n=0,1,\cdots . \end{aligned}$$

Due to this choice of \(k_n\) and the assumed pointwise information at \(t_1\), the two space integrals at \(t_o-S\equiv t_1\) vanish and the energy estimates reduce to the ones for the parabolic p-Laplacian. As a result, the rest of the proof can be reproduced as in [9, Chapt. 3, Sect. 4]. \(\square \)

The next lemma quantifies measure conditions to ensure the degeneracy of the p-Laplacian prevails over the singularity of \(\beta (\cdot )\). Its proof can be attributed to the theory of parabolic p-Laplacian. Again we omit the vertex of \((x_o,t_o)\) from \(Q_{\varrho }(\theta )\) for simplicity.

Lemma 2.3

Let u be a local weak super-solution to (1.8) with (1.4) in \(E_T\). Assume that for some \(\alpha ,\,\eta \in (0,1)\) and \(A>1\), there holds

$$\begin{aligned} \big |\big [u(\cdot , t)-\mu ^-\ge \eta \omega \big ]\cap K_\varrho \big |>\alpha |K_\varrho |\quad \text { for all }t\in \big (t_o-A\omega ^{2-p}\varrho ^p,t_o\big ). \end{aligned}$$

There exists \(\xi \in (0,\eta )\), such that if \(A\ge \xi ^{2-p}\) and

$$\begin{aligned} \iint \limits _{Q_{\varrho }(\theta )}\int \limits _u^{k} H_\varepsilon '(s)\chi _{[s<k]} \,\textrm {d}s\textrm {d}x\textrm {d}t\le \xi \omega \big |\big [u\le \mu ^-+\tfrac{1}{2}\xi \omega \big ]\cap Q_{\frac{1}{2}\varrho }(\theta ) \big |, \end{aligned}$$

where \(k=\mu ^-+\xi \omega \) and \(\theta =(\xi \omega )^{2-p}\), then

$$\begin{aligned} u\ge \mu ^-+\tfrac{1}{4}\xi \omega \quad \text { a.e. in } Q_{\frac{1}{2}\varrho }(\theta ), \end{aligned}$$

provided the cylinder \(Q_{\varrho }(A\omega ^{2-p})\) is included in \(\mathcal {Q}\). Moreover, it can be traced that

$$\xi =\gamma (\text {data})\,\eta \exp \Bigg \{-\alpha ^{-\frac{p}{p-1}}\Bigg \}.$$

Proof

Let us turn our attention to the energy estimate (2.1) written with \(Q_{R,S}=Q_r(\theta )\) for \(\frac{1}{2}\varrho \le r\le \varrho \), and with \(k=\mu ^-+\xi \omega \). The last integral on the right-hand side is estimated by using the given measure theoretical information:

$$\begin{aligned}&\iint \limits _{Q_{r}(\theta )}\int \limits _u^{k} H_\varepsilon '(s)(k-s)_+ \,\textrm {d}s |\partial _t\zeta ^p|\,\textrm {d}x\textrm {d}t\\&\quad \le (k-\mu ^-)\iint \limits _{Q_{r}(\theta )}\int \limits _u^{k} H_\varepsilon '(s)\chi _{[s<k]} \,\textrm {d}s |\partial _t\zeta ^p|\,\textrm {d}x\textrm {d}t\\&\quad \le \xi \omega \Vert \partial _t\zeta ^p\Vert _{\infty } \iint \limits _{Q_{\varrho }(\theta )}\int \limits _u^{k} H_\varepsilon '(s)\chi _{[s<k]} \,\textrm {d}s \textrm {d}x\textrm {d}t\\&\quad \le (\xi \omega )^2 \Vert \partial _t\zeta ^p\Vert _{\infty } \big |\big [u\le \mu ^-+\tfrac{1}{2}\xi \omega \big ]\cap Q_{\frac{1}{2} \varrho }(\theta )\big |\\&\quad \le 4 \Vert \partial _t\zeta ^p\Vert _{\infty } \iint \limits _{Q_{r}(\theta )}\big [u-(\mu ^-+\xi \omega )\big ]^2_-\,\textrm {d}x\textrm {d}t. \end{aligned}$$

As such it can be combined with an analogous term involving \(\partial _t\zeta ^p\) on the right-hand side of (2.1). Consequently, we end up with an energy estimate of \((u-k)_-\), departing from which the theory of parabolic p-Laplacian in [6] applies. Therefore, we may determine a constant \(\xi \) by the data and \(\alpha \), such that

$$\begin{aligned} u\ge \mu ^-+\tfrac{1}{4}\xi \omega \quad \text { a.e. in }Q_{\frac{1}{2}\varrho }(\theta ). \end{aligned}$$

The dependence of \(\xi \) can be traced as in [16, Appendix A]. \(\square \)

Remark 2.1

An analogous statement for sub-solutions holds near \(\mu ^+\). Since we do not use it in the argument, it is omitted.

2.4 Consequence of the logarithmic estimate

The setting is the same as in Sect. 2.3, i.e., we introduce the cylinder \(\mathcal {Q}\subset E_T\) and define the quantities \(\mu ^\pm \) and \(\omega \) connecting the supremum/infimum and the oscillation of u over \( \mathcal {Q}\). We will use also cylinders of the forward type (2.4), with vertex at \((x_o,t_1)\).

The following lemma indicates how the measure of sets where u is close to the supremum/infimum shrinks at each level of an arbitrarily long time interval, once initial pointwise information is given.

Lemma 2.4

Let u be a local weak sub(super)-solution to (1.3) with (1.4) in \(E_T\). For \(\xi \in (0,1)\), set \(\theta =(\xi \omega )^{2-p}\). Suppose that

$$\begin{aligned} \pm \big (\mu ^{\pm }-u(\cdot , t_1)\big )\ge \xi \omega \quad \text { a.e. in } K_{\varrho }(x_o). \end{aligned}$$
(2.5)

Then for any \(\alpha \in (0,1)\) and \(A\ge 1\), there exists \(\bar{\xi }\in (0,\frac{1}{4}\xi )\), such that

$$\begin{aligned} \big |\big [\pm \big (\mu ^{\pm }-u(\cdot , t)\big )\le \bar{\xi }\omega \big ]\cap K_{\frac{1}{2}\varrho }(x_o)\big |\le \alpha |K_{\frac{1}{2}\varrho }|\quad \text { for all }t\in (t_1,t_1+A\theta \varrho ^p), \end{aligned}$$

provided the cylinder \(K_{\varrho }(x_o)\times (t_1,t_1+A\theta \varrho ^p)\) is included in \(\mathcal {Q}\). Moreover, the dependence of \(\bar{\xi }\) is given by

$$\begin{aligned} \bar{\xi }=\tfrac{1}{2}\xi \exp \Bigg \{-\gamma (\text {data})\frac{A}{\alpha }\Bigg \}. \end{aligned}$$

Proof

We will prove the estimate with \(\mu ^+\), since the one with \(\mu ^-\) is completely analogous. Moreover, for simplicity we omit the dependence on \(x_o\). Proposition 2.2 will be used in the cylinder \(K_{\varrho }(x_o)\times (t_1,t_1+A\theta \varrho ^p) \). To this end, let us put

$$\begin{aligned} k=\mu ^+-\xi \omega ,\qquad \sigma =\tfrac{1}{2},\qquad c=\bar{\xi }\omega , \end{aligned}$$

with \(\bar{\xi }\in (0,\frac{1}{4}\xi )\) to be chosen. Due to (2.5) the integrals on the right-hand side at time \(t=t_1\) vanish. Therefore, we are left with

$$\begin{aligned} \sup _{t_1\le t\le {t_1+A\theta \varrho ^p}}\int \limits _{K_{\frac{1}{2}\varrho }}\Psi ^2(x,t)\,\textrm {d}x\le \frac{4\gamma }{\varrho ^p} \int \limits _{t_1}^{t_1+A\theta \varrho ^p}\int \limits _{K_\varrho }\Psi |\Psi _u|^{2-p}\,\textrm {d}x\textrm {d}\tau . \end{aligned}$$

Let us relabel \(4\gamma \) as \(\gamma \). It is easy to see that

$$\begin{aligned} \Psi \le \ln \frac{\mathcal L}{\bar{\xi }\omega } \le \ln \frac{\xi }{\bar{\xi }}, \end{aligned}$$

since

$$\begin{aligned} {\mathcal L}=\sup _{\mathcal {Q}}(u-k)_+\le \mu ^+-\mu ^++\xi \omega =\xi \omega . \end{aligned}$$

On the other hand,

$$\begin{aligned} |\Psi _u|^{2-p}=\left| {\mathcal L}-(u-k)_++\bar{\xi }\omega \right| ^{p-2}\le (2\xi \omega )^{p-2}. \end{aligned}$$

Hence, we may estimate

$$\begin{aligned} \frac{\gamma }{\varrho ^p}\int \limits _{t_1}^{t_1+A\theta \varrho ^p}\int \limits _{K_\varrho }\Psi |\Psi _u|^{2-p}\,\textrm {d}x\textrm {d}\tau&\le \gamma A\theta (2\xi \omega )^{p-2}\,|K_\varrho | \Bigg (\ln \frac{\xi }{\bar{\xi }}\Bigg ) \le \gamma A\,|K_\varrho | \Bigg (\ln \frac{\xi }{2\bar{\xi }} \Bigg ), \end{aligned}$$

bearing in mind that \(\bar{\xi }\in (0,\frac{1}{4}\xi )\) and \(\theta =(\xi \omega )^{2-p}\). If we consider

$$\begin{aligned} A_{\bar{\xi },{\frac{1}{2}\varrho }}(t):=\big [u(\cdot ,t)>\mu ^+-\bar{\xi }\omega \big ] \cap K_{\frac{1}{2}\varrho } \end{aligned}$$

as integration set for the integral on the left-hand side, instead of the larger \(K_{{\frac{1}{2}\varrho }}\) and note that \(\Psi \) is decreasing in \(\mathcal {L}\), we may estimate over \(A_{\bar{\xi },{\frac{1}{2}\varrho }}(t)\):

$$\begin{aligned} \Psi \ge \ln \frac{\xi \omega }{\xi \omega -(\xi \omega -\bar{\xi }\omega )+\bar{\xi }\omega }=\ln \frac{\xi }{2\bar{\xi }}. \end{aligned}$$

Then we obtain

$$\begin{aligned} \Bigg (\ln \frac{\xi }{2\bar{\xi }}\Bigg )^2|A_{\bar{\xi },{\frac{1}{2}\varrho }}(t)| \le \widetilde{\gamma } A \Bigg (\ln \frac{\xi }{2\bar{\xi }}\Bigg ) |K_{\frac{1}{2}\varrho }|\quad \text { for all } t\in (t_1,{t_1+A\theta \varrho ^p}) \end{aligned}$$

with \(\widetilde{\gamma }=2\gamma \), that is

$$\begin{aligned} |A_{\bar{\xi },{\frac{1}{2}\varrho }}(t)|\le \frac{ \widetilde{\gamma } A}{\ln \left( \xi /{2\bar{\xi }}\right) } |K_{\frac{1}{2}\varrho }|\quad \text { for all } t\in (t_1,{t_1+A\theta \varrho ^p}). \end{aligned}$$

If we choose \(\bar{\xi }\) such that

$$\begin{aligned} \alpha \equiv \frac{\widetilde{\gamma } A}{\ln \left( \xi /2\bar{\xi }\right) }, \end{aligned}$$

we conclude the proof. \(\square \)

3 Proof of Theorem 1.1

Assume \((x_o,t_o)=(0,0)\), introduce \(Q_o=K_\varrho \times (-\varrho ^{p-1},0)\) and set

Letting \(\theta =(\frac{1}{4} \omega )^{2-p}\), for some \(A(\omega )>1\) to be determined in terms of the data and \(\omega \), we may assume that

(3.1)

the case when the set inclusion does not hold will be incorporated later.

3.1 Reduction of oscillation near the supremum

In this section, we work with u as a sub-solution near its supremum. Recalling that \(\theta =(\frac{1}{4} \omega )^{2-p}\), suppose for some \(\bar{t}\in \big (-(A-1)\theta \varrho ^p,0\big )\), there holds

$$\begin{aligned} \Bigg |\Bigg [\mu ^+ - u\le \frac{1}{4} \omega \Bigg ] \cap \Big [(0,\bar{t})+Q_{\varrho }(\theta )\Big ]\Bigg |\le c_o\Bigg (\frac{1}{4} \omega \Bigg )^{\frac{N+p}{p}}|Q_{\varrho }(\theta )|=:\alpha |Q_{\varrho }(\theta )|, \end{aligned}$$
(3.2)

where \(c_o\) is the constant determined in Lemma 2.1 in terms of the data. According to Lemma 2.1 (with \(\xi =\frac{1}{4}\)), we have

$$\begin{aligned} \mu ^+ - u\ge \tfrac{1}{8} \omega \quad \text{ a.e. } \text{ in } \quad (0,\bar{t})+Q_{\frac{1}{2} \varrho }(\theta ). \end{aligned}$$
(3.3)

This pointwise estimate (3.3) at \(t_1:=\bar{t}-\theta (\tfrac{1}{2}\varrho )^p\) serves as the initial datum for an application of Lemma 2.2 and Lemma 2.4. First of all, according to Lemma 2.2, there exists \(\gamma _o\in (0,1)\), such that if for some \(\eta \in (0,\frac{1}{8})\),

$$\begin{aligned} \big |\big [\mu ^+ - u\le \eta \omega \big ]\cap \widetilde{Q} \big | \le \frac{\gamma _o\Big (\frac{1}{8}\omega \Big )^{2-p}}{A\theta }|\widetilde{Q} | \quad \text { where }\widetilde{Q} :=K_{\frac{1}{2}\varrho }\times (t_1,0), \end{aligned}$$
(3.4)

then

$$\begin{aligned} \mu ^+ - u\ge \frac{1}{2}\eta \omega \quad \text { a.e. in }K_{\frac{1}{4}\varrho }\times (t_1,0). \end{aligned}$$
(3.5)

On the other hand, owing to (3.3), Lemma 2.4 implies that (3.4) is verified with the choice

$$\begin{aligned} \eta =\frac{1}{16}\exp \Bigg \{-\frac{\gamma A^2}{2^{p-2}\gamma _o}\Bigg \}, \end{aligned}$$

and hence so is (3.5) due to Lemma 2.2. Note that all constants are stable as \(p\downarrow 2\). Consequently, the estimate (3.5) yields a reduction of oscillation

(3.6)

Keep in mind that \(A(\omega )\) is yet to be chosen.

3.2 Reduction of oscillation near the infimum: part I

Starting from this section, let us suppose (3.2) does not hold, that is, for any \(\bar{t}\in \big (-(A-1)\theta \varrho ^p,0\big ]\),

$$\begin{aligned} \Bigg |\Big [\mu ^+ - u\le \frac{1}{4} \omega \Big ] \cap \big [(0,\bar{t})+Q_{\varrho }(\theta )\big ]\Bigg |>\alpha |Q_{\varrho }(\theta )|\quad \text { for }\alpha =c_o\Bigg (\frac{1}{4} \omega \Bigg )^{\frac{N+p}{p}}. \end{aligned}$$

Since \(\mu ^+-\frac{1}{4}\omega \ge \mu ^-+\frac{1}{4}\omega \) may be assumed without loss of generality, we rephrase it as

$$\begin{aligned} \Bigg |\Big [u-\mu ^-\ge \frac{1}{4}\omega \Big ]\cap \big [(0,\bar{t})+Q_{\varrho }(\theta )\big ] \Bigg |> \alpha |Q_{\varrho }(\theta )|. \end{aligned}$$
(3.7)

Fixing such \(\bar{t}\) for the moment, we will analyze the local clustering phenomenon of u encoded in the measure information (3.7). The idea of the following argument is taken from [11, Sects. 5–8]. We will work with u as a super-solution near its infimum throughout Sects. 3.23.6.

Lemma 3.1

For every \(\lambda \in (0,1)\) and \(\eta \in (0,1)\), there exist a point \((x_*,t_*)\in (0,\bar{t})+Q_{\varrho }(\theta )\), a number \(\kappa \in (0,1)\) and a cylinder \((x_*,t_*)+Q_{\kappa \varrho }(\theta )\subset (0,\bar{t})+Q_{\varrho }(\theta )\), such that

$$\begin{aligned} \Bigg |\Big [u\le \mu ^-+\frac{1}{4}\lambda \omega \Big ]\cap \big [(x_*,t_*)+Q_{\kappa \varrho }(\theta )\big ]\Bigg | \le \eta |Q_{\kappa \varrho }(\theta )|. \end{aligned}$$

The dependence of \(\kappa \) is traced by \(\kappa =\gamma (\text {data}) (1-\lambda )^{3+\frac{2}{p}}\eta ^{1+\frac{1}{p}}\alpha ^{2+\frac{1}{p}}\omega ^{1+\frac{1}{p}}\).

Proof

For simplicity of notation, we take \(\bar{t}=0\). Let \(\zeta \) be a standard cutoff function in \(Q_{2\varrho }(\theta )\) that vanishes on its parabolic boundary and equals the identity in \(Q_{\varrho }(\theta )\), satisfying \(|D\zeta |\le \gamma /\varrho \) and \(|\partial _t \zeta |\le \gamma /(\theta \varrho ^p) \). According to the energy estimate (2.2) written in \(Q_{2\varrho }(\theta )\) for \((u-k)_-\) with \(k=\mu ^-+\frac{1}{4}\omega \) and with the cutoff function \(\zeta \), a simple calculation yields

$$\begin{aligned} \iint \limits _{Q_{\varrho }(\theta )}|D(u-k)_-|^p\,\textrm {d}x\textrm {d}t\le \frac{\gamma }{\varrho ^p}\omega ^p\Big (1+\frac{1}{\omega }\Big ) |Q_{\varrho }(\theta )|. \end{aligned}$$

In terms of

$$\begin{aligned} v:=\frac{\frac{1}{4}\omega -(u-k)_-}{\frac{1}{4}\omega }, \end{aligned}$$

this energy estimate may be written as

$$\begin{aligned} \iint \limits _{Q_{\varrho }(\theta )}|Dv|^p\,\textrm {d}x\textrm {d}t\le \frac{\gamma }{\varrho ^p}\Big (1+\frac{1}{\omega }\Big ) |Q_{\varrho }(\theta )|, \end{aligned}$$
(3.8)

and meanwhile the measure information (3.7) reads

$$\begin{aligned} \big |\big [v\ge 1\big ]\cap Q_{\varrho }(\theta )\big |> \alpha |Q_{\varrho }(\theta )|. \end{aligned}$$
(3.9)

To proceed, let us define the set

$$\begin{aligned} A(t):=\big [v(\cdot , t)\ge 1\big ]\cap K_\varrho , \end{aligned}$$

and the set

$$\begin{aligned} \mathcal {P}:=\Bigg \{t\in (-\theta \varrho ^p,0): |A(t)|\ge \frac{1}{2}\alpha |K_\varrho |\Bigg \}. \end{aligned}$$

Now we may estimate by using (3.9):

$$\begin{aligned} \alpha |Q_{\varrho }(\theta )|\le \int \limits _{-\theta \varrho ^p}^{0}|A(t)|\,\textrm {d}t&=\int \limits _{\mathcal {P}}|A(t)|\,\textrm {d}t+ \int \limits _{\mathcal {P}^c}|A(t)|\,\textrm {d}t\le \Bigg (|\mathcal {P}|+\frac{1}{2}\alpha \theta \varrho ^p\Bigg )|K_\varrho |, \end{aligned}$$

which implies \(|\mathcal {P}|\ge \tfrac{1}{2}\alpha \theta \varrho ^p\). This joint with (3.8) yields that

$$\begin{aligned} \frac{1}{2}\alpha \theta \varrho ^p\inf _{t\in \mathcal {P}}\int \limits _{K_\varrho \times \{t\}}|Dv|^p\,\textrm {d}x\le \iint \limits _{Q_\varrho (\theta )}|Dv|^p\,\textrm {d}x\textrm {d}t\le \frac{\gamma }{\varrho ^p} \Bigg (1+\frac{1}{\omega }\Bigg ) |Q_{\varrho }(\theta )|, \end{aligned}$$

that is,

$$\begin{aligned} \inf _{t\in \mathcal {P}}\int \limits _{K_\varrho \times \{t\}}|Dv|^p\,\textrm {d}x\le \frac{2\gamma }{\alpha \varrho ^p}\Big (1+\frac{1}{\omega }\Big )|K_\varrho |. \end{aligned}$$

Therefore, there exists \(\widetilde{t}\in \mathcal {P}\), such that

$$\begin{aligned} \int \limits _{K_\varrho \times \{\widetilde{t}\}}|Dv|^p\,\textrm {d}x\le \frac{2\gamma }{\alpha \varrho ^p}\Big (1+\frac{1}{\omega }\Big )|K_\varrho |. \end{aligned}$$
(3.10)

Meanwhile, by the definition of \(\mathcal {P}\), there holds that

$$\begin{aligned} \Big |\big [v(\cdot , \widetilde{t})\ge 1\big ]\cap K_\varrho \Big |\ge \frac{1}{2}\alpha |K_\varrho |. \end{aligned}$$
(3.11)

Based on (3.10) and (3.11), we are ready to apply [9, Chap. 2, Lemma 3.1]. Indeed, let \(\widetilde{\lambda }:=\tfrac{1}{2}(1+\lambda )\) and \(\widetilde{\eta }\in (0,1)\) to be determined: there exist \(\widetilde{x}\in K_\varrho \) and

$$\begin{aligned} \varepsilon =\gamma (\text {data})(1-\widetilde{\lambda })\widetilde{\eta }\alpha ^{2+\frac{1}{p}}\omega ^{\frac{1}{p}}, \end{aligned}$$
(3.12)

such that

$$\begin{aligned} \big |\big [v(\cdot , \widetilde{t})\ge \widetilde{\lambda }\big ]\cap K_{\varepsilon \varrho }(\widetilde{x})\big |\ge (1-\widetilde{\eta }) |K_{\varepsilon \varrho }|. \end{aligned}$$

Reverting to u, we actually obtain

$$\begin{aligned} \Bigg |\Big [u(\cdot , \widetilde{t})\le \mu ^-+\frac{1}{4}\widetilde{\lambda }\omega \Big ]\cap K_{\varepsilon \varrho }(\widetilde{x})\Bigg | \le \widetilde{\eta } |K_{\varepsilon \varrho }|. \end{aligned}$$
(3.13)

In order to propagate this measure information, we consider the forward cylinders

$$\begin{aligned} \left\{ \begin{aligned}&K_{\frac{1}{2}\varepsilon \varrho }(\widetilde{x})\times \big (\widetilde{t},\widetilde{t}+\delta \theta (\varepsilon \varrho )^p\big ),\\&K_{ \varepsilon \varrho }(\widetilde{x})\times \big (\widetilde{t},\widetilde{t}+\delta \theta (\varepsilon \varrho )^p\big ), \end{aligned} \right. \end{aligned}$$

where \(\delta >0\) is to be determined. Let \(\zeta (x)\) be a cutoff function in \(K_{ \varepsilon \varrho }(\widetilde{x})\) that vanishes on \(\partial K_{ \varepsilon \varrho }(\widetilde{x})\) and equals the identity in \(K_{ \frac{1}{2}\varepsilon \varrho }(\widetilde{x})\), such that \(|D\zeta |\le \gamma / (\varepsilon \varrho )\). Employing (3.13), the energy estimate (2.3) for \((u-k)_-\) with \(k=\mu ^-+\tfrac{1}{4}\widetilde{\lambda }\omega \) in this setting gives that

$$\begin{aligned} \int \limits _{K_{ \frac{1}{2}\varepsilon \varrho }(\widetilde{x})\times \{t\}}(u-k)_-^2\,\textrm {d}x\le \gamma \omega ^2\Big (\delta +\frac{2\widetilde{\eta }}{\omega }\Big ) |K_{ \frac{1}{2}\varepsilon \varrho }|, \end{aligned}$$

for all \(t\in \big (\widetilde{t},\widetilde{t}+\delta \theta ( \varepsilon \varrho )^p\big )\).

We estimate the integral on the left-hand side from below by

$$\begin{aligned} \int \limits _{K_{ \frac{1}{2}\varepsilon \varrho }(\widetilde{x})\times \{t\}}(u-k)_-^2\,\textrm {d}x&\ge \int \limits _{K_{ \frac{1}{2}\varepsilon \varrho }(\widetilde{x})\times \{t\}}(u-k)_-^2\chi _{[u<\mu ^-+\frac{1}{4}\lambda \omega ]}\,\textrm {d}x\\&\ge \frac{1}{64}(1-\lambda )^2\omega ^2 \Bigg |\Bigg [u(\cdot , t)<\mu ^-+\frac{1}{4}\lambda \omega \Bigg ]\cap K_{ \frac{1}{2}\varepsilon \varrho }(\widetilde{x})\Bigg |. \end{aligned}$$

Substituting this estimate back to the energy estimate yields that

$$\begin{aligned} \Bigg |\Big [u(\cdot , t)<\mu ^-+\frac{1}{4}\lambda \omega \Big ]\cap K_{ \frac{1}{2}\varepsilon \varrho }(\widetilde{x})\Bigg |\le \frac{\gamma }{(1-\lambda )^2}\Big (\delta +\frac{2\widetilde{\eta }}{\omega }\Big ) |K_{ \frac{1}{2}\varepsilon \varrho }|. \end{aligned}$$

Now we may choose \(\delta \) and \(\widetilde{\eta }\) to satisfy

$$\begin{aligned} \frac{\gamma }{(1-\lambda )^2}\Big (\delta +\frac{2\widetilde{\eta }}{\omega }\Big )\le \eta . \end{aligned}$$
(3.14)

Up to now, we have shown that

$$\begin{aligned} \Bigg |\Bigg [u(\cdot , t)\ge \mu ^-+\frac{1}{4}\lambda \omega \Bigg ]\cap K_{ \frac{1}{2}\varepsilon \varrho }(\widetilde{x})\Bigg |\ge (1-\eta ) |K_{ \frac{1}{2}\varepsilon \varrho }|, \end{aligned}$$

for all \(t\in \big (\widetilde{t},\widetilde{t}+\delta \theta (\varepsilon \varrho )^p\big )\). For simplicity let us denote \(r:=\frac{1}{2}\varepsilon \varrho \). The above slicewise measure information actually yields

$$\begin{aligned} \Bigg |\Bigg [u \ge \mu ^-+\frac{1}{4}\lambda \omega \Bigg ]\cap Q \Bigg |\ge (1-\eta ) |Q|,\text {where}\,\,Q:=K_{r}(\widetilde{x})\times \big (\widetilde{t},\widetilde{t}+\delta \theta (\varepsilon \varrho )^p\big ). \end{aligned}$$
(3.15)

Arranging \(L:=\tfrac{1}{2}\delta ^{-\frac{1}{p}}\) to be an integer, we partition Q along the space coordinate planes into \(L^N\) disjoint but adjacent sub-cylinders, each of which is congruent to

$$\begin{aligned} K_{\frac{r}{L}}\times \big (-\delta \theta (\varepsilon \varrho )^p,0\big )\equiv K_{\kappa \varrho }\times \big (-\theta (\kappa \varrho )^p,0\big )=Q_{\kappa \varrho }(\theta ) \end{aligned}$$

where \(\kappa :=\varepsilon \delta ^{\frac{1}{p}}\) can be traced by combining (3.12) and (3.14), i.e.

$$\begin{aligned} \kappa =\gamma (\text {data}) (1-\lambda )^{3+\frac{2}{p}} \eta ^{1+\frac{1}{p}} \alpha ^{2+\frac{1}{p}}\omega ^{1+\frac{1}{p}}. \end{aligned}$$

Due to (3.15), it is easy to see that at least one of them, say \((x_*,t_*)+Q_{\kappa \varrho }(\theta )\), will satisfy the desired property

$$\begin{aligned} \Bigg |\Bigg [u \ge \mu ^-+\frac{1}{4}\lambda \omega \Bigg ]\cap \big [(x_*,t_*)+Q_{\kappa \varrho }(\theta ) \big ]\Bigg |\ge (1-\eta )| Q_{\kappa \varrho }(\theta )|. \end{aligned}$$

The proof is concluded with such a choice of \((x_*,t_*)\) and \(\kappa \). \(\square \)

The location of the clustering within \((x_*,t_*)+Q_{\kappa \varrho }(\theta )\subset (0,\bar{t})+Q_{\varrho }(\theta )\) being only qualitative notwithstanding, the quantified measure concentration allows us to extract pointwise estimate with the aid of Lemma 2.1 and then use the logarithmic energy estimate to propagate the measure information up to the level \(\bar{t}\), cf. Fig. 2.

Fig. 2
figure 2

Local clustering

As a matter of fact, if in Lemma 3.1 we choose \(\lambda =\tfrac{1}{2}\) and \(\eta =c_o(\tfrac{1}{4}\omega )^{\frac{N+p}{p}}\) where \(c_o\) is determined in Lemma 2.1, then Lemma 3.1 and Lemma 2.1 would yield that

$$\begin{aligned} u\ge \mu ^-+\frac{1}{16}\omega \quad \text { a.e. in }(x_*,t_*)+Q_{\frac{1}{2}\kappa \varrho }(\theta ), \end{aligned}$$

for some \((x_*,t_*)\in (0,\bar{t})+Q_{\varrho }(\theta )\) and the constant

$$\begin{aligned} \kappa =\gamma (\text {data})\omega ^{\bar{q}}\quad \text { where }\bar{q}:=\Big (\frac{N}{p}+1\Big )\Big (3+\frac{2}{p}\Big )+\frac{1}{p}+1. \end{aligned}$$

In particular, we have

$$\begin{aligned} u\Big (\cdot , t_*-\theta \Big (\frac{1}{2}\kappa \varrho \Big )^p\Big )\ge \mu ^-+\frac{1}{16}\omega \quad \text { a.e. in }K_{\frac{1}{2}\kappa \varrho }(x_*), \end{aligned}$$

which serves as the initial datum to apply Lemma 2.4. Indeed, setting \(\alpha =\tfrac{1}{2}\) and \(\xi =\tfrac{1}{16}\) in Lemma 2.4 and choosing \(\widetilde{A}\) so large that

$$\begin{aligned} \Big (\frac{1}{4}\omega \Big )^{2-p}\varrho ^p\le \widetilde{A}\Big (\frac{1}{16}\omega \Big )^{2-p}\Big (\frac{1}{2}\kappa \varrho \Big )^p,\quad \text { i.e. }\quad \widetilde{A}\ge \frac{2^{4-p}}{\kappa ^{p}}, \end{aligned}$$

it yields a number \(\bar{\xi }\in (0,\tfrac{1}{4}\xi )\), such that

$$\begin{aligned} \big |\big [u(\cdot , \bar{t})>\mu ^-+\bar{\xi }\omega \big ]\cap K_{\frac{1}{4}\kappa \varrho }(x_*) \big |>\frac{1}{2} \big |K_{\frac{1}{4}\kappa \varrho }\big |. \end{aligned}$$
(3.16)

The dependence of \(\bar{\xi }\) is traced by

$$\begin{aligned} \bar{\xi }=\frac{1}{2}\xi \exp \Big \{- \frac{\gamma \widetilde{A}}{\alpha }\Big \}=\frac{1}{32}\exp \Big \{-\frac{\gamma 2^{4-p}}{\kappa ^p} \Big \}=\frac{1}{32}\exp \Big \{ -\frac{\gamma }{\omega ^{p\bar{q}}} \Big \}. \end{aligned}$$
(3.17)

The measure information (3.16) permits us to claim that

$$\begin{aligned} \big |\big [u(\cdot , \bar{t})>\mu ^-+\bar{\xi }\omega \big ]\cap K_{ \varrho } \big |&\ge \big |\big [u(\cdot , \bar{t})>\mu ^-+\bar{\xi }\omega \big ]\cap K_{\frac{1}{4}\kappa \varrho }(x_*) \big |\\&>\frac{1}{2} |K_{\frac{1}{4}\kappa \varrho }|=\frac{1}{2}\Big (\frac{1}{4}\kappa \Big )^N |K_\varrho |=:\bar{\alpha } |K_\varrho |. \end{aligned}$$

Thanks to the arbitrariness of \(\bar{t}\), we have actually arrived at

$$\begin{aligned} \big |\big [u(\cdot , t)\ge \mu ^- + \bar{\xi }\omega \big ]\cap K_\varrho \big |>\bar{\alpha } |K_\varrho |\quad \text { for all }t\in \big (-(A-1) \theta \varrho ^p,0\big ]. \end{aligned}$$
(3.18)

The dependence of \(\bar{\alpha }\) is traced by

$$\begin{aligned} \bar{\alpha }=\gamma (\text {data})\,\omega ^{\bar{q}N}. \end{aligned}$$
(3.19)

This measure information (3.18) lays the foundation for the analysis to be set out in the following sections. Since A is a large number, we will stipulate that (3.18) holds with \(A-1\) replaced by A for simplicity.

3.3 Reduction of oscillation near the infimum: part II

Let us first introduce the following intrinsic cylinders

$$\begin{aligned} \left\{ \begin{array}{ll} Q_{\varrho }(\widehat{\theta })=K_\varrho \times ( - \widehat{\theta }\varrho ^p,0),\quad \widehat{\theta }=(\xi \omega )^{2-p},\\ Q_{\varrho }(\widetilde{\theta })=K_\varrho \times ( -\widetilde{\theta }\varrho ^p,0),\quad \widetilde{\theta }=(\delta \xi \omega )^{1-p}, \end{array}\right. \end{aligned}$$

for some \(\xi (\omega )\) and \(\delta (\omega )\) in (0, 1) to be determined later. We can always assume \(\xi \) and \(\delta \) to be sufficiently small, so that \( \widehat{\theta }\le \widetilde{\theta }\). On the other hand, we may assume that

$$\begin{aligned} 8^p\widetilde{\theta }\le A \theta \end{aligned}$$
(3.20)

for some \(A(\omega )\) yet to be determined.

Throughout Sects. 3.33.5, we always assume that

$$\begin{aligned} |\mu ^- - e_i|\le \frac{1}{4}\delta \xi \omega \quad \text { for some } i\in \{0,1,\cdots ,\ell \}, \end{aligned}$$
(3.21)

for the same \(\xi (\omega )\) and \(\delta (\omega )\) in (0, 1) introduced above, to be determined. When restriction (3.21) does not hold, the case will be examined in Sect. 3.6.

First of all, we turn our attention to Lemma 2.1 and Lemma 2.3. In view of the measure information (3.18), Lemma 2.3 is at our disposal, with \(\alpha \), \(\eta \) and A replaced by \(\bar{\alpha }\), \(\bar{\xi }\) and \(A/4^{2-p}\) respectively. Suppose \(\xi \) is determined in Lemma 2.3 in terms of the data and \(\bar{\alpha }\) fixed in (3.19), and recall that \( \widehat{\theta }=(\xi \omega )^{2-p}\). If there holds

$$\begin{aligned} \Bigg |\Bigg [u\le \mu ^-+\frac{1}{2}\xi \omega \Bigg ]\cap Q_{\frac{1}{2}\varrho }( \widehat{\theta })\Bigg |\le c_o(\xi \omega )^{\frac{N+p}{p}}|Q_{\frac{1}{2}\varrho }( \widehat{\theta })|, \end{aligned}$$

then Lemma 2.1 yields that

$$\begin{aligned} u\ge \mu ^-+\frac{1}{4}\xi \omega \quad \text { a.e. in }Q_{\frac{1}{4}\varrho }( \widehat{\theta }). \end{aligned}$$
(3.22)

Analogously, if for \(k=\mu ^-+ \xi \omega \), there holds

$$\begin{aligned} \iint \limits _{Q_{\varrho }( \widehat{\theta })}\int \limits _u^{k} H_\varepsilon '(s) \chi _{[s<k]}\,\textrm {d}s\textrm {d}x\textrm {d}t\le \xi \omega \Bigg |\Bigg [u\le \mu ^-+\frac{1}{2}\xi \omega \Bigg ]\cap Q_{\frac{1}{2}\varrho }( \widehat{\theta })\Bigg | \end{aligned}$$

then Lemma 2.3 yields that, stipulating \(A\ge 4^{2-p}\xi ^{2-p}\),

$$\begin{aligned} u\ge \mu ^-+\frac{1}{4}\xi \omega \quad \text { a.e. in }Q_{\frac{1}{2}\varrho }( \widehat{\theta }). \end{aligned}$$
(3.23)

Consequently, either (3.22) or (3.23) yields a reduction of oscillation

(3.24)

For later use, we record the dependence of \(\xi \) here, that is,

$$\begin{aligned} \xi = \exp \Big \{- \gamma (\text {data})\omega ^{-\frac{p\bar{q}N}{p-1}}-\gamma (\text {data})\omega ^{-p\bar{q}}\Big \}. \end{aligned}$$
(3.25)

3.4 Reduction of oscillation near the infimum: part III

In this section, we continue to examine the situation when the measure condition in Lemma 2.1 is violated:

$$\begin{aligned} \Big |\Big [u\le \mu ^-+\frac{1}{2}\xi \omega \Big ]\cap Q_{\frac{1}{2}\varrho }( \widehat{\theta })\Big |> c_o(\xi \omega )^{\frac{N+p}{p}}\Big |Q_{\frac{1}{2}\varrho }( \widehat{\theta })\Big |, \end{aligned}$$
(3.26)

and when the condition in Lemma 2.3 is also violated: for \(k=\mu ^-+ \xi \omega \), there holds

$$\begin{aligned} \iint \limits _{Q_{\varrho }( \widehat{\theta })}\int \limits _u^{k} H_\varepsilon '(s)\chi _{[s<k]} \,\textrm {d}s\textrm {d}x\textrm {d}t>\xi \omega \Big |\Big [u\le \mu ^-+\frac{1}{2}\xi \omega \Big ]\cap Q_{\frac{1}{2}\varrho }(\widehat{\theta })\Big |. \end{aligned}$$
(3.27)

Combining (3.26) and (3.27), we obtain that, for all \(r\in [2\varrho , 8\varrho ]\),

$$\begin{aligned} \begin{aligned} \iint \limits _{Q_{r}(\widehat{\theta })}\int \limits _u^{k} H_\varepsilon '(s)\chi _{[s<k]} \,\textrm {d}s\textrm {d}x\textrm {d}t&\ge \iint \limits _{Q_{\varrho }(\widehat{\theta })}\int \limits _u^{k} H_\varepsilon '(s)\chi _{[s<k]} \,\textrm {d}s\textrm {d}x\textrm {d}t\\&\ge \xi \omega \Big |\Big [u\le \mu ^-+\frac{1}{2}\xi \omega \Big ]\cap Q_{\frac{1}{2}\varrho }(\widehat{\theta })\Big |\\&\ge c_o(\xi \omega )^b|Q_{\frac{1}{2}\varrho }(\widehat{\theta })|\ge \widetilde{\gamma }(\xi \omega )^b|Q_r(\widehat{\theta })|, \end{aligned} \end{aligned}$$
(3.28)

where \(\widetilde{\gamma }=c_o16^{-N-p}\) and \(b=1+\tfrac{N+p}{p}\).

Next, introduce a free parameter \(\bar{\delta }\in (\delta ,2\delta )\) and set \(\bar{\theta }:=(\bar{\delta }\xi \omega )^{1-p}\). Recall also that \( \theta =(\tfrac{1}{4}\omega )^{2-p}\), \(\widetilde{\theta }=(\delta \xi \omega )^{1-p}\), \(\widehat{\theta }=(\xi \omega )^{2-p}\), and that we have assumed \(\widetilde{\theta }(8\varrho )^p\le A \theta \varrho ^p\le \varrho ^{p-1}\) in (3.20). Therefore,

$$\begin{aligned} Q_r( \widehat{\theta })\subset Q_r(\bar{\theta })\subset Q_r(\widetilde{\theta })\subset Q_{r}(A \theta )\subset \widetilde{Q}_o \quad \text { for any } r\in [2\varrho , 8\varrho ]. \end{aligned}$$

The estimate (3.28) implies that there exists \(t_*\in [-\widehat{\theta } r^p,0]\), such that

$$\begin{aligned} \int \limits _{K_{r}\times \{t_*\}}\int \limits _u^{k} H_\varepsilon '(s)\chi _{[s<k]} \,\textrm {d}s\textrm {d}x\ge \widetilde{\gamma }(\xi \omega )^b|K_r|. \end{aligned}$$
(3.29)

Observe also that for any \(t\in [-\bar{\theta } r^p, 0]\) and any \(\bar{\delta }\in (\delta ,2\delta )\), there holds

$$\begin{aligned} \begin{aligned} |K_r|&\ge \big |\big [u\le \mu ^-+\bar{\delta }\xi \omega \big ]\cap K_r\big |\\&\ge (\bar{\delta }\xi \omega )^{-2}\int \limits _{K_r\times \{t\}}\big [u-(\mu ^-+\bar{\delta }\xi \omega )\big ]_-^2\,\textrm {d}x. \end{aligned} \end{aligned}$$
(3.30)

Denoting \(\bar{k}=\mu ^-+ \bar{\delta }\xi \omega \) and enforcing that for some \(i\in \{0,1,\cdots ,\ell \}\),

$$\begin{aligned} |\mu ^- - e_i|\le \frac{1}{4}\delta \xi \omega \quad \text { and }\quad \varepsilon \le \frac{1}{4}\delta \xi \omega , \end{aligned}$$

we use (3.29) and (3.30) to estimate

In the first inequality above, we have assumed \(\tfrac{9}{4}\xi \omega \le d\) by possibly further restricting the choice of \(\xi \) in (3.25), and hence \((\mu ^-,\bar{k})\subset (e_i-\tfrac{1}{4}\delta \xi \omega ,e_i+\tfrac{1}{4}\delta \xi \omega +2\delta \xi \omega )\subset (e_i-d,e_i+d)\). As such the constant \(\gamma \) in the definition of \(\xi \) in (3.25) depends on d and M. The above analysis together with (2.1) yields the following energy estimate.

Lemma 3.2

Let u be a weak super-solution to (1.8) with (1.4) in \(E_T\), under the measure information (3.18). Let (3.26) and (3.27) hold true. Denoting \(b:=1+\tfrac{N+p}{p}\) and setting \(k=\mu ^-+\bar{\delta }\xi \omega \) with \(\bar{\delta }\in (\delta ,2\delta )\), there exists a positive constant \(\gamma \) depending only on the data, such that for all \(\sigma \in (0,1)\) and all \(r\in [2\varrho , 8\varrho ]\) we have

provided that for some \(i\in \{0,1,\cdots ,\ell \}\),

$$\begin{aligned} |\mu ^- - e_i|\le \frac{1}{4}\delta \xi \omega \quad \text { and }\quad \varepsilon \le \frac{1}{4}\delta \xi \omega . \end{aligned}$$

Based on the energy estimate in Lemma 3.2, a De Giorgi type lemma can be derived. Notice that the time scaling used here is different from the one in Lemmas 2.12.3.

Lemma 3.3

Suppose the hypotheses in Lemma 3.2 hold. Let \(\delta \in (0,1)\). There exists a constant \(c_1\in (0,1)\) depending only on the data, such that if

$$\begin{aligned} \big |\big [u<\mu ^-+2\delta \xi \omega \big ]\cap Q_{4\varrho }(\widetilde{\theta })\big |\le c_1 (\xi \omega )^{b}|Q_{4\varrho }(\widetilde{\theta })|,\quad \text { where }\widetilde{\theta }=(\delta \xi \omega )^{1-p}, \end{aligned}$$

then enforcing \(|\mu ^- - e_i |\le \tfrac{1}{4}\delta \xi \omega \) for some \(i\in \{0,1,\ldots ,\ell \}\) and \(\varepsilon \le \tfrac{1}{4}\delta \xi \omega \), we have

$$\begin{aligned} u\ge \mu ^-+\delta \xi \omega \quad \text { a.e. in }Q_{2\varrho }(\widetilde{\theta }), \end{aligned}$$

provided \(4^p\widetilde{\theta }\le A \theta \).

Proof

For \(n=0,1,\ldots ,\) we set

$$\begin{aligned} \left\{ \begin{array}{c} \displaystyle k_n=\mu ^-+ \delta \xi \omega +\frac{\delta \xi \omega }{2^{n}},\quad \tilde{k}_n=\frac{k_n+k_{n+1}}{2},\\ \displaystyle \varrho _n=2\varrho +\frac{\varrho }{2^{n-1}}, \quad \tilde{\varrho }_n=\frac{\varrho _n+\varrho _{n+1}}{2},\\ \displaystyle K_n=K_{\varrho _n},\quad \widetilde{K}_n=K_{\tilde{\varrho }_n},\\ \displaystyle Q_n=Q_{\varrho _n}(\widetilde{\theta }),\quad \widetilde{Q}_n=Q_{\tilde{\varrho }_n}(\widetilde{\theta }). \end{array} \right. \end{aligned}$$

We will use the energy estimate in Lemma 3.2 with the pair of cylinders \(\widetilde{Q}_n\subset Q_n\). Note that the constant \(\bar{\delta }\) in Lemma 3.2 is replaced by \((1+2^{-n})\delta \), as indicated in the definition of \(k_n\). Enforcing \(|\mu ^- - e_i|\le \tfrac{1}{4}\delta \xi \omega \) and \(\varepsilon \le \tfrac{1}{4}\delta \xi \omega \), the energy estimate in Lemma 3.2 yields that

where \(A_n:=\big [u<k_n\big ]\cap Q_n\).

Let \(0\le \phi \le 1\) be a cutoff function that vanishes on the parabolic boundary of \(\widetilde{Q}_n\) and equals the identity in \(Q_{n+1}\). An application of the Hölder inequality, the Sobolev imbedding [6, Chap. I, Proposition 3.1] and the above energy estimate give that

In terms of \( Y_n=|A_n|/|Q_n|\), the recurrence is rephrased as

$$\begin{aligned} Y_{n+1} \le \frac{\gamma C^n}{(\xi \omega )^{\frac{b}{N+2}}} Y_n^{1+\frac{1}{N+2}}, \end{aligned}$$

for a constant \(\gamma \) depending only on the data and with \(C=C(N,p)\). Hence, by [6, Chap. I, Lemma 4.1], there exists a positive constant \(c_1\) depending only on the data, such that \(Y_n\rightarrow 0\) if we require that \(Y_o\le c_1(\xi \omega )^b\). This concludes the proof. \(\square \)

The next lemma concerns the smallness of the measure density of the set \([u\approx \mu ^-]\). Its proof relies on (2.2) and the measure information (3.18) will be employed.

Lemma 3.4

Let u be a weak super-solution to (1.8) with (1.4) in \(E_T\), under the measure information (3.18). There exists a positive constant \(\gamma \) depending only on the data, such that for any \(j_*\in \mathbb {N}\) we have

$$\begin{aligned} \Big |\Big [ u\le \mu ^-+\frac{\xi \omega }{2^{j_*}}\Big ]\cap Q_{4\varrho }(\widetilde{\theta })\Big | \le \frac{\gamma }{ \bar{\alpha }} j_*^{-\frac{p-1}{p}} |Q_{4\varrho }(\widetilde{\theta })|, \quad \text { where }\widetilde{\theta }=\Big (\frac{\xi \omega }{2^{j_*}}\Big )^{1-p}, \end{aligned}$$

with \(\bar{\alpha }\) as in (3.19), provided \(4^p\widetilde{\theta }\le A \theta \).

Proof

We employ the energy estimate (2.2) in \(Q_{8\varrho }(\widetilde{\theta })\) with a standard cutoff function \(\zeta \) that vanishes on the parabolic boundary of \(Q_{8\varrho }(\widetilde{\theta })\) and equals the identity in \(Q_{4\varrho }(\widetilde{\theta })\), satisfying \(|D\zeta |\le \gamma /\varrho \) and \(|\partial _t\zeta |\le \gamma /(\widetilde{\theta }\varrho ^p)\). The levels are chosen to be

$$\begin{aligned} k_j:=\mu ^-+\frac{\xi \omega }{2^{j}},\quad j=0,1,\ldots , j_*. \end{aligned}$$

Therefore, assuming \(j_*\) has been chosen, and taking into account the definition of \( \widetilde{\theta }(j_*)\), the energy estimate (2.2) yields that

$$\begin{aligned} \begin{aligned} \iint \limits _{Q_{4\varrho }(\widetilde{\theta })}|D(u-k_j)_-|^p\,\textrm {d}x\textrm {d}t&\le \frac{\gamma }{\varrho ^p}\bigg (\frac{\xi \omega }{2^j}\bigg )^p\bigg [1+\frac{1}{\widetilde{\theta }}\bigg (\frac{\xi \omega }{2^j}\bigg )^{1-p}\bigg ] |A_{j,8\varrho }|\\&\le \frac{\gamma }{\varrho ^p}\bigg (\frac{\xi \omega }{2^j}\bigg )^p|A_{j,8\varrho }|, \end{aligned} \end{aligned}$$

where \( A_{j,8\varrho }:= \big [u<k_{j}\big ]\cap Q_{8\varrho }(\widetilde{\theta })\).

Observing \(\xi <\bar{\xi }\) from (3.17) and (3.25), we may derive the measure theoretical information from (3.18):

$$\begin{aligned} \big |\big [u(\cdot , t)>\mu ^-+\xi \omega \big ]\cap K_{4\varrho }\big |\ge \bar{\alpha } 8^{-N} |K_{4\varrho }| \qquad \text{ for } \text{ all }\quad t\in \big (-\widetilde{\theta }(4\varrho )^p,0\big ]. \end{aligned}$$

With this information at hand, we apply [6, Chap. I, Lemma 2.2] slicewise to \(u(\cdot ,t)\) for \(t\in \big (-\widetilde{\theta }(4\varrho )^p,0\big ]\), over the cube \(K_{4\varrho }\), for levels \(k_{j+1}<k_{j}\), followed by an application of Hölder’s inequality. Indeed, we estimate

$$\begin{aligned}&(k_j-k_{j+1})\big |\big [u(\cdot , t)<k_{j+1} \big ] \cap K_{4\varrho }\big | \\&\qquad \le \frac{\gamma \varrho ^{N+1}}{\big |\big [u(\cdot , t)>k_{j}\big ]\cap K_{4\varrho }\big |} \int \limits _{ [k_{j+1}<u(\cdot ,t)<k_{j}] \cap K_{4\varrho }}|Du(\cdot ,t)|\,\textrm {d}x\\&\qquad \le \frac{\gamma \varrho }{\bar{\alpha }} \Bigg [\int \limits _{ [k_{j+1}<u(\cdot ,t)<k_{j}] \cap K_{4\varrho }}|Du(\cdot ,t)|^p\,\textrm {d}x\Bigg ]^{\frac{1}{p}} \big |\big [ k_{j+1}<u(\cdot ,t)<k_{j}\big ]\cap K_{4\varrho }\big |^{1-\frac{1}{p}} \\&\qquad = \frac{ \gamma \varrho }{\bar{\alpha }} \Bigg [\int \limits _{K_{4\varrho }}|D(u-k_j)_-(\cdot ,t)|^p\,\textrm {d}x\Bigg ]^{\frac{1}{p}} \big [ |A_{j,4\varrho }(t)|-|A_{j+1,4\varrho }(t)|\big ]^{1-\frac{1}{p}}, \end{aligned}$$

where we have set \( A_{j,4\varrho }(t):= \big [u(\cdot ,t)<k_{j}\big ]\cap K_{4\varrho }\). We perform an integration in \(\textrm {d}t\) over the interval \(\big (-\widetilde{\theta }(4\varrho )^p,0\big ]\) on both sides and apply Hölder’s inequality. Setting \(A_{j,4\varrho }:=[u<k_j]\cap Q_{4\varrho }(\widetilde{\theta })\), we arrive at

$$\begin{aligned} \frac{\xi \omega }{2^{j+1}}\big |A_{j+1,4\varrho }\big |&\le \frac{\gamma \varrho }{\bar{\alpha }}\Bigg [\iint \limits _{Q_{4\varrho }(\widetilde{\theta })}|D(u-k_j)_-|^p\,\textrm {d}x\textrm {d}t\Bigg ]^\frac{1}{p} \big [|A_{j,4\varrho }|-|A_{j+1,4\varrho }|\big ]^{1-\frac{1}{p}}\\&\le \frac{\gamma }{\bar{\alpha }} \frac{\xi \omega }{2^j}|A_{o,8\varrho }|^{\frac{1}{p}}\big [|A_{j,4\varrho }|-|A_{j+1,4\varrho }|\big ]^{1-\frac{1}{p}}. \end{aligned}$$

Now take the power \(\frac{p}{p-1}\) on both sides of the above inequality to obtain

$$\begin{aligned} \big |A_{j+1,4\varrho }\big |^{\frac{p}{p-1}} \le \left( \frac{\gamma }{\bar{\alpha }}\right) ^{\frac{p}{p-1}}|A_{o,8\varrho }|^{\frac{1}{p-1}}\big [|A_{j,4\varrho }|-|A_{j+1,4\varrho }|\big ]. \end{aligned}$$

Add these inequalities from 0 to \(j_*-1\) to obtain

$$\begin{aligned} j_* |A_{j_*,4\varrho }|^{\frac{p}{p-1}}\le \left( \frac{\gamma }{\bar{\alpha }}\right) ^{\frac{p}{p-1}}|A_{o,8\varrho }|^{\frac{1}{p-1}}|A_{o,4\varrho }|, \end{aligned}$$

from which we easily obtain

$$\begin{aligned} |A_{j_*,4\varrho }|\le \frac{\gamma }{\bar{\alpha }}{j_*^{-\frac{p-1}{p}}}|Q_{4\varrho }(\widetilde{\theta })|. \end{aligned}$$

The proof is completed. \(\square \)

3.5 Reduction of oscillation near the infimum: part IV

Under the conditions (3.26) and (3.27), we may reduce the oscillation in the following way. First of all, let \(\xi \) be determined in (3.25). Then we choose, according to Lemma 3.4, the integer \(j_*\) so large to satisfy that

$$\begin{aligned} \frac{\gamma }{ \bar{\alpha } j_*^{\frac{p-1}{p}}}\le c_1(\xi \omega )^b, \end{aligned}$$

where \(c_1\) is the constant appearing in Lemma 3.3. According to (3.19) and (3.25), the dependence of \(j_*\) can be traced by

$$\begin{aligned} j_*=\omega ^{-q_o}\exp \big \{\gamma (\text {data})\omega ^{-q_1}\big \}, \end{aligned}$$
(3.31)

for some positive \(\{q_o,q_1\}\) depending on the data.

Next, we can fix \(2\delta =2^{-j_*}\) in Lemma 3.3. Consequently, by the choice of \(j_*\) in (3.31), Lemma 3.3 can be applied, assuming that \(|\mu ^- - e_i|\le \tfrac{1}{4}\delta \xi \omega \) for some \(i\in \{0,1,\cdots ,\ell \}\) and \(\varepsilon \le \frac{1}{4}\delta \xi \omega \), and we arrive at

$$\begin{aligned} u\ge \mu ^-+ \delta \xi \omega \quad \text { a.e. in }Q_{2\varrho }(\widetilde{\theta }), \end{aligned}$$

where we may trace, recalling (3.25),

$$\begin{aligned} \delta =\exp \Big \{-\omega ^{-q_o}\exp \big \{\gamma \omega ^{-q_1}\big \}\Big \}\quad \text { and }\quad \xi = \exp \Big \{- \gamma \omega ^{-q_2}\Big \} \end{aligned}$$
(3.32)

for some generic \(\gamma \) and some positive \(\{q_o, q_1, q_2\}\) determined by the data. This would give us a reduction of oscillation

(3.33)

with the above-defined \(\delta \) and \(\xi \). The choice of A can be finally made from \(8^p\widetilde{\theta }\le A\theta \) as required in (3.20), i.e. \(A\ge 2^{p+4}\omega ^{-1}(\delta \xi )^{1-p}\). Thus we may choose

$$\begin{aligned} A(\omega )=\exp \Big \{\exp \big \{\gamma \omega ^{-q}\big \}\Big \} \end{aligned}$$
(3.34)

for some properly defined positive \(\gamma \) and q depending only on the data.

To summarize the achievements in Sects. 3.13.5, taking the reverse of (3.1), (3.6), (3.24) and (3.33) all into account, if \(|\mu ^- - e_i|\le \tfrac{1}{4}\delta \xi \omega \) for some \(i\in \{0,1,\cdots ,\ell \}\) and \(\varepsilon \le \frac{1}{4}\delta \xi \omega \) hold true, then for \(\theta =(\tfrac{1}{4}\omega )^{2-p}\) we have that

(3.35)

where

$$\begin{aligned} \eta =\exp \left\{ -\exp \Big \{\exp \big \{\gamma \omega ^{-q}\big \}\Big \}\right\} , \end{aligned}$$

for some properly defined positive \(\gamma \) and q depending only on the data.

3.6 Reduction of oscillation near the infimum: part V

Let \(\xi (\omega )\) and \(\delta (\omega )\) be determined in (3.32). The analysis throughout Sects. 3.33.5 has been founded on the condition (3.21). We now examine the case when (3.21) does not hold, namely,

$$\begin{aligned} |\mu ^- - e_i|>\frac{1}{4}\delta \xi \omega \quad \text { for all }i\in \{0,1,\ldots ,\ell \}. \end{aligned}$$
(3.36)

Notice that the analysis in Sect. 3.2 does not rely on the condition (3.21), and thus the measure information (3.18) derived there is still at our disposal. In view of the dependences of \(\delta \) and \(\xi \) in (3.32) and that of \(\bar{\xi }\) in (3.17), we may assume that \(\delta \xi <\bar{\xi }\) and that (3.18) holds true with \(\bar{\xi }\) replaced by \(\delta \xi \).

Next, for \(\widetilde{\xi }\in (0,\tfrac{1}{8})\) we introduce the levels \(k=\mu ^-+ \widetilde{\xi }\delta \xi \omega \). According to (3.36) and assuming that \(\varepsilon \le \frac{1}{4}\delta \xi \omega \), the energy estimate (2.1)\(_-\) written in \(Q_{\varrho }(\vartheta )\subset Q_\varrho (A\theta )\) for some \(0<\vartheta <A\theta \) yields that

Given this energy estimate and the measure information (3.18), the theory of parabolic p-Laplacian in [6] applies; see also [16, Appendix A] for tracing the constants.

Lemma 3.5

Let u be a weak super-solution to (1.8) with (1.4) in \(E_T\). Suppose (3.18) and (3.36) hold true, and \(\varepsilon \le \frac{1}{4}\delta \xi \omega \). There exists a positive constant \(\widetilde{\xi }\) depending on the data and \(\bar{\alpha }\) of (3.19), such that for \( \vartheta =(\widetilde{\xi }\delta \xi \omega )^{2-p}\) we have

provided \(\vartheta \le A\theta \). Moreover, the dependence of \(\widetilde{\xi }\) can be traced by

$$\begin{aligned} \widetilde{\xi }=\gamma (\text {data}) \exp \big \{-\bar{\alpha }^{-\frac{p}{p-1}}\big \}. \end{aligned}$$

Remark 3.1

Note that the choice of A in (3.34) verifies \(\vartheta \le A\theta \).

3.7 Derivation of the modulus of continuity

This is the final part of the proof of Theorem 1.1. Let us summarize what has been achieved by the previous sections. To do so, we will first assume that \(\omega \le 1\). According to (3.35) and Lemma 3.5, we have

(3.37)

where \(\theta =(\tfrac{1}{4}\omega )^{2-p}\) and

$$\begin{aligned} \eta =\exp \left\{ -\exp \Big \{\exp \big \{\gamma \omega ^{-q}\big \}\Big \}\right\} , \end{aligned}$$

for some properly defined positive \(\gamma \) and q depending only on the data.

In order to iterate the arguments, we set

$$\begin{aligned} \omega _1:=\max \Big \{\big (1-\eta (\omega )\big )\omega , \gamma \Big (\ln ^{(2)}\frac{1}{\varrho }\Big )^{-\frac{1}{q}}\Big \} \end{aligned}$$

and seek \(\varrho _1\) to verify the set inclusions, recalling A from (3.34):

$$\begin{aligned} Q_{\varrho _1}(A_1\theta _1)\subset Q_{\frac{1}{4}\varrho }(\theta ),\quad Q_{\varrho _1}(A_1\theta _1)\subset Q_o, \end{aligned}$$

where \(\theta _1:=(\tfrac{1}{4}\omega _1)^{2-p}\), \(A_1:=A(\omega _1)\). Note that we may assume \(\eta (\omega )\le \frac{1}{2}\), which yields \(\omega _1\ge \frac{1}{2}\omega \). Then we estimate

$$\begin{aligned} A_1\theta _1\varrho _1^p\le A_1\Big (\frac{1}{8}\omega \Big )^{2-p}\varrho _1^p, \end{aligned}$$

and hence choose

$$\begin{aligned} A_1\Big (\frac{1}{8}\omega \Big )^{2-p}\varrho _1^p=\theta \Big (\frac{1}{4}\varrho \Big )^p,\quad \text { i.e. }\quad \varrho _1^p=2^{2-3p}A_1^{-1}\varrho ^p. \end{aligned}$$

It is not hard to verify that the other set inclusion also holds with such a choice of \(\varrho _1\). Consequently, we arrive at

which takes the place of (3.1)\(_2\) in the next stage. Repeating the arguments of Sects. 3.13.6, we obtain that

Now we may construct for \(n\in \mathbb {N}\),

$$\begin{aligned} \left\{ \begin{array}{cc} \varrho _o=\varrho ,\quad \varrho _{n+1}^p=2^{2-3p} A_n^{-1}\varrho _n^p,\quad A_n=A(\omega _n),\,\quad \theta _n=(\tfrac{1}{4}\omega _n)^{2-p},\\ \displaystyle \omega _o=\omega ,\quad \omega _{n+1}=\max \Big \{\omega _n\big (1-\eta (\omega _n)\big ),\,\gamma \Big (\ln ^{(2)}\frac{1}{\varrho _n}\Big )^{-\frac{1}{q}}\Big \},\\ Q_n=Q_{\frac{1}{4}\varrho _n}(\theta _n),\quad Q'_n=Q_{\varrho _n}(A_n \theta _n). \end{array}\right. \end{aligned}$$

By induction, if up to some \(j\in \mathbb {N}\), we have

$$\begin{aligned} \omega _{n}> \gamma \Big (\ln ^{(2)}\frac{1}{\varepsilon }\Big )^{-\frac{1}{q}} \quad \text { for all }n\in \{0,1,\cdots , j-1\}, \end{aligned}$$

then for all \(n\in \{0,1,\cdots , j\}\), there holds

On the other hand, we denote by j the first index to satisfy

$$\begin{aligned} \omega _{j}\le \gamma \Big (\ln ^{(2)}\frac{1}{\varepsilon }\Big )^{-\frac{1}{q}}. \end{aligned}$$
(3.38)

Observe that if there exist \(n_o\in \mathbb {N}\) and a sequence \(\{a_n\}\) satisfying

$$\begin{aligned} a_{n+1}\ge \max \Big \{a_n\big (1-\eta (a_n)\big ),\,\gamma \Big (\ln ^{(2)}\frac{1}{\varrho _n}\Big )^{-\frac{1}{q}}\Big \} \end{aligned}$$

for all \(n\ge n_o\), and meanwhile \(a_{n_o}\ge \omega _{n_o}\), then \(a_n\ge \omega _n\) for all \(n\ge n_o\). We may choose

$$\begin{aligned} a_n=\Big (\ln ^{(3)} (n+a)\Big )^{-\sigma } \end{aligned}$$

for some proper \(\sigma \in (0,\tfrac{1}{q})\) and an absolute constant a, such that \(a_o\ge 1\). Since we have assumed \(\omega \le 1\), we have \(a_o\ge \omega _o\) and hence, \(a_n\ge \omega _n\) for all \(n\ge 0\).

Let us take \(4r\in (0,\varrho )\). If for some \(n\in \{0,1,\cdots , j\}\), we have

$$\begin{aligned} \varrho _{n+1}\le 4r<\varrho _{n}, \end{aligned}$$

then the right-hand side inequality yields

(3.39)

Next we examine the left-hand side inequality. For this purpose, we first note that it may be assumed that \(\eta (\omega _n)\le \frac{1}{2}\). Hence, we estimate \(\omega _n\ge (\frac{1}{2})^n\omega \),

$$\begin{aligned} A(\omega _n)\le \exp \Big \{\exp \Big \{\frac{2^{qn}}{\omega ^q}\Big \}\Big \}, \end{aligned}$$

and

$$\begin{aligned} (4r)^p\ge \varrho _{n+1}^p&\ge 2^{2-3p}\exp \Bigg \{-\exp \Bigg \{\frac{2^{qn}}{\omega ^q}\Bigg \}\Bigg \}\varrho _n^p\\&\ge 2^{ n(2-3p)}\exp \Bigg \{-\sum _{i=0}^{n}\exp \Bigg \{\frac{2^{qi}}{\omega ^q}\Bigg \}\Bigg \}\varrho ^p\\&\ge 2^{ n(2-3p)}\exp \Bigg \{-\exp \Bigg \{\frac{2^{(q+1)n}}{\omega ^{q+1}}\Bigg \}\Bigg \}\varrho ^p. \end{aligned}$$

By taking logarithm on both sides, we estimate

$$\begin{aligned} n\ge \frac{1}{(q+1)\ln 2}\ln ^{(3)}\frac{\varrho }{cr} + |\ln \omega |, \end{aligned}$$

for some absolute constant \(c>0\). Substituting it back to (3.39), we obtain

for some \(C>0\) depending on the data and \(\omega \).

Finally, if \(4r<\varrho _{j+1}\) where j is the first index for (3.38) to hold, then we may use (3.38) and

to incorporate the \(\varepsilon \) term into the oscillation estimate:

Now according to our assumption in Sect. 1.3 we may let \(\varepsilon \rightarrow 0\) and obtain the desired modulus of continuity.

The assumption that \(\omega \le 1\) at the beginning of this section is not restrictive. For otherwise, the same arguments in the previous sections generate quantities

$$\begin{aligned} \big \{\eta , \delta , \xi ,\bar{\xi },\widetilde{\xi }, \alpha ,\bar{\alpha },\kappa , A\big \} \end{aligned}$$

depending only on the data, but independent of \(\omega \). Consequently, instead of (3.37), we end up with

Given this, we may set up an iteration scheme as before and iterate \(n_*=n_*(\text {data})\) times, such that

for some \(\varrho _*\) and \(\omega _*\) depending on \(n_*\).

Without loss of generality, due to (3.34), we may take \(A=A(\omega _*)\). As such the above intrinsic relation plays the role of (3.1) and the previous arguments can be reproduced.