1 Introduction

For \(f\in L^1_{\mathrm {loc}}({\mathbb {R}}^d)\) and a ball or cube B, we denote

$$\begin{aligned} f_B = \frac{1}{{\mathcal {L}}(B)}\int _B|f| . \end{aligned}$$

The centered Hardy–Littlewood maximal function is defined by

$$\begin{aligned} \mathrm M^{{\mathrm {c}}}f(x) = \sup _{r>0}f_{B(x,r)}, \end{aligned}$$

and the uncentered Hardy–Littlewood maximal function is defined by

$$\begin{aligned} \widetilde{{\mathrm {M}}}f(x) = \sup _{B\ni x}f_B \end{aligned}$$

where the supremum is taken over all balls that contain x. The regularity of a maximal operator was first studied by Kinnunen in 1997. He proved in [18] that for each \(p>1\) and \(f\in W^{1,p}({\mathbb {R}}^d)\) the bound

$$\begin{aligned} \Vert \nabla {\mathrm {M}}f\Vert _p \le C_{d,p} \Vert \nabla f\Vert _p \end{aligned}$$
(1.1)

holds for \({\mathrm {M}}=\mathrm M^{{\mathrm {c}}}\). Formula (1.1) also holds for \({\mathrm {M}}=\widetilde{{\mathrm {M}}}\). This implies that both Hardy–Littlewood maximal operators are bounded on Sobolev spaces with \(p>1\). His proof does not apply for \(p=1\). Note that unless \(f=0\) the bound \(\Vert {\mathrm {M}}f\Vert _1\le C_{d,1}\Vert f\Vert _1\) also fails, since \({\mathrm {M}}f\) is not in \(L^1({\mathbb {R}}^d)\). In [16] Hajłasz and Onninen asked whether formula (1.1) also holds for \(p=1\) for the centered Hardy–Littlewood maximal operator. This question has become a well-known problem for various maximal operators and there has been a lot of research on this topic. So far it has mostly remained unanswered, but there has been some progress. For \(d=1\) it was proved for the uncentered maximal function by Tanaka in [28] and later for the centered Hardy–Littlewood maximal function by Kurka in [22]. The proof for the centered maximal function turned out to be much more complicated. Aldaz and Pérez Lázaro obtained in [3] the sharp improvement \(\Vert \nabla \widetilde{{\mathrm {M}}}f\Vert _{L^1({\mathbb {R}})}\le \Vert \nabla f\Vert _{L^1({\mathbb {R}})}\) of Tanaka’s result. For the uncentered Hardy–Littlewood maximal function the question of Hajłasz and Onninen also has a positive answer for all dimensions d in several special cases. For radial functions Luiro proved it in [24], for block decreasing functions Aldaz and Pérez Lázaro proved it in [2] and for characteristic functions the author proved it in [30]. As a first step towards weak differentiability, Hajłasz and Malý proved in [15] that for \(f\in L^1({\mathbb {R}}^d)\) the centered Hardy–Littlewood maximal function is approximately differentiable. In [1] Aldaz et al. proved bounds on the modulus of continuity for all dimensions.
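
The failure of the \(L^1\) bound noted above can be seen from the following sketch. It assumes \(f\ne 0\) and fixes a radius \(R>0\) with \(m=\int _{B(0,R)}|f|>0\); the symbols R and m are introduced only for this illustration, and \(\sigma _d\) denotes the volume of the d-dimensional unit ball. For \(|x|\ge R\) the ball \(B(x,2|x|)\) contains \(B(0,R)\), so

$$\begin{aligned} \widetilde{\mathrm {M}}f(x) \ge \mathrm M^{\mathrm {c}}f(x) \ge f_{B(x,2|x|)} \ge \frac{m}{\sigma _d2^d|x|^d} , \end{aligned}$$

and \(|x|^{-d}\) is not integrable over \(\{|x|\ge R\}\).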

A related question is whether the maximal operator is a continuous operator. Luiro proved in [23] that for \(p>1\) the uncentered maximal operator is continuous on \(W^{1,p}({\mathbb {R}}^d)\). There is ongoing research for the endpoint case \(p=1\). For example Carneiro et al. proved in [11] that \(f\mapsto \nabla \widetilde{{\mathrm {M}}}f\) is continuous \(W^{1,1}({\mathbb {R}})\rightarrow L^1({\mathbb {R}})\) and in [14] González-Riquelme and Kosz recently improved this to continuity on \(\mathrm {BV}\). Carneiro et al. proved in [8] that for radial functions f, the operator \(f\mapsto \nabla \widetilde{{\mathrm {M}}}f\) is continuous as a map \(W^{1,1}({\mathbb {R}}^d)\rightarrow L^1({\mathbb {R}}^d)\).

The regularity of maximal operators has also been studied for other maximal operators and on other spaces. We focus on the endpoint \(p=1\). In [12] Carneiro and Svaiter and in [7] Carneiro and González-Riquelme investigated maximal convolution operators \({\mathrm {M}}\) associated to certain partial differential equations. Analogous to the Hardy–Littlewood maximal operator they proved \(\Vert \nabla {\mathrm {M}}f\Vert _{L^1({\mathbb {R}}^d)}\le C_d\Vert \nabla f\Vert _{L^1({\mathbb {R}}^d)}\) for \(d=1\), and for \(d>1\) if f is radial. In [9] Carneiro and Hughes proved \(\Vert \nabla {\mathrm {M}}f\Vert _{l^1({\mathbb {Z}}^d)}\le C_d\Vert f\Vert _{l^1({\mathbb {Z}}^d)}\) for centered and uncentered discrete maximal operators. This bound does not hold on \({\mathbb {R}}^d\), but because in the discrete setting we have \(\Vert \nabla f\Vert _{l^1({\mathbb {Z}}^d)}\le C_d\Vert f\Vert _{l^1({\mathbb {Z}}^d)}\), it is weaker than the still open \(\Vert \nabla {\mathrm {M}}f\Vert _{l^1({\mathbb {Z}}^d)}\le C_d\Vert \nabla f\Vert _{l^1({\mathbb {Z}}^d)}\). In [21] Kinnunen and Tuominen proved the boundedness of a discrete maximal operator in the metric Hajłasz Sobolev space \(M^{1,1}\). In [27] Pérez et al. proved the boundedness of certain convolution maximal operators on Hardy-Sobolev spaces \(\dot{H}^{1,p}\) for a sharp range of exponents, including \(p=1\). In [29] the author proved \( {{\,\mathrm{var}\,}}\mathrm M^{{\mathrm {d}}}f \le C_d {{\,\mathrm{var}\,}}f \) for the dyadic maximal operator for all dimensions d.

For \(0\le \alpha \le d\) the centered fractional Hardy–Littlewood maximal function is defined by

$$\begin{aligned} \mathrm M^{{\mathrm {c}}}_\alpha f(x) = \sup _{r>0}r^\alpha f_{B(x,r)} . \end{aligned}$$

For a ball B we denote the radius of B by r(B). The uncentered fractional Hardy–Littlewood maximal function is defined by

$$\begin{aligned} \widetilde{{\mathrm {M}}}_\alpha f(x) = \sup _{B\ni x}r(B)^\alpha f_B \end{aligned}$$

where the supremum is taken over all balls that contain x. Note that \({\mathrm {M}}_\alpha \) does not make much sense for \(\alpha >d\). For \(\alpha =0\) it is the Hardy–Littlewood maximal function. The following is the fractional version of formula (1.1).

Theorem 1.1

Let \(1\le p<\infty \) and \(0<\alpha <d/p\) and \({\mathrm {M}}_\alpha \in \{\mathrm M^{{\mathrm {c}}}_\alpha ,\widetilde{{\mathrm {M}}}_\alpha \}\). Then for all \(f\in W^{1,p}({\mathbb {R}}^d)\) we have that \({\mathrm {M}}_\alpha f\) is weakly differentiable with

$$\begin{aligned} \Vert \nabla {\mathrm {M}}_\alpha f\Vert _{(p^{-1}-\alpha /d)^{-1}} \le C_{d,\alpha ,p} \Vert \nabla f\Vert _p \end{aligned}$$
(1.2)

where the constant \(C_{d,\alpha ,p}\) depends only on d, \(\alpha \) and p. In the endpoint \(p=1\) we can replace \(f\in W^{1,1}({\mathbb {R}}^d)\) by \(f\in \mathrm {BV}({\mathbb {R}}^d)\). The endpoint result for \(p=d/\alpha \) holds true as well.
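
As a plausibility check on the exponent in formula (1.2), one can compare how both sides behave under dilations. The following is only a sketch: the dilate \(f^{(\lambda )}(x)=f(\lambda x)\) with \(\lambda >0\) and the abbreviation q for the exponent on the left-hand side are introduced for this illustration and are not notation of the paper. Since the average of \(f^{(\lambda )}\) over \(B(x,r)\) equals the average of f over \(B(\lambda x,\lambda r)\), we get

$$\begin{aligned} {\mathrm {M}}_\alpha f^{(\lambda )}(x)&= \lambda ^{-\alpha }\,{\mathrm {M}}_\alpha f(\lambda x) ,\\ \Vert \nabla {\mathrm {M}}_\alpha f^{(\lambda )}\Vert _q&= \lambda ^{1-\alpha -d/q}\Vert \nabla {\mathrm {M}}_\alpha f\Vert _q ,\\ \Vert \nabla f^{(\lambda )}\Vert _p&= \lambda ^{1-d/p}\Vert \nabla f\Vert _p . \end{aligned}$$

A bound with a constant independent of \(\lambda \) therefore requires \(1-\alpha -d/q=1-d/p\), that is \(q=(p^{-1}-\alpha /d)^{-1}\), which is the exponent appearing in (1.2).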

We prove Theorem 1.1 in Sect. 2.1. The study of the regularity of the fractional maximal operator was initiated by Kinnunen and Saksman. They proved in [20, Theorem 2.1] that formula (1.2) holds for \(0\le \alpha <d/p\) and \(1<p<\infty \). They showed \( |\nabla \mathrm M^{{\mathrm {c}}}_\alpha f(x)| \le {\mathrm {M}}_\alpha |\nabla f|(x) \) for almost every \(x\in {\mathbb {R}}^d\), and then concluded formula (1.2) from the \(L^{(p^{-1}-\alpha /d)^{-1}}\)-boundedness of \({\mathrm {M}}_\alpha \), which fails for \(p=1\). Another result by Kinnunen and Saksman in [20] is that for all \(\alpha \ge 1\) we have \( |\nabla \mathrm M^{{\mathrm {c}}}_\alpha f(x)| \le (d-\alpha ) {\mathrm {M}}_{\alpha -1}f(x) \) for almost every \(x\in {\mathbb {R}}^d\). In [10] Carneiro and Madrid used this, the \(L^{d/(d-\alpha )}\)-boundedness of \({\mathrm {M}}_{\alpha -1}\), and Sobolev embedding to conclude formula (1.2). All of this also works for the uncentered fractional maximal function \(\widetilde{{\mathrm {M}}}_\alpha \). The strategy fails for \(\alpha <1\).

Our main result is the extension of formula (1.2) to the endpoint \(p=1\) for \(0<\alpha <1\) which has been an open problem. Our proof of Theorem 1.1 also works for \(1\le \alpha \le d\), and further extends to \(1\le p<\infty \), \(0<\alpha \le d/p\). We present the proof for this range of parameters here, since it also smoothens out the blowup of the constants for \(p\rightarrow 1\) which occurs in the previous proof for \(p>1\). Note that interpolation is not immediately available for results on the gradient level. Our approach fails for \(\alpha =0\). The corner point \(\alpha =0,\ p=1\) is the earlier mentioned question by Hajłasz and Onninen and remains open. Similarly to Carneiro and Madrid, we begin the proof with a pointwise estimate \( |\nabla {\mathrm {M}}_\alpha f(x)| \le (d-\alpha ) {\mathrm {M}}_{\alpha ,-1}f(x) \) which holds for all \(0<\alpha <d\) for bounded functions. We estimate \({\mathrm {M}}_{\alpha ,-1}f\) in Theorem 1.2 and from that conclude Theorem 1.1.

For the centered fractional maximal function define

$$\begin{aligned} \mathcal B^{{\mathrm {c}}}_\alpha (x) = \{B(x,r)\} \end{aligned}$$

where r is the largest radius such that \( \mathrm M^{{\mathrm {c}}}_\alpha f(x)=r^\alpha f_{B(x,r)} \) and for the uncentered fractional maximal function define

$$\begin{aligned} \widetilde{\mathcal {B}}_\alpha (x) = \bigl \{B: x\in \overline{B} ,\ r(B)^\alpha f_B = \widetilde{{\mathrm {M}}}_\alpha f(x) ,\ \forall A\supsetneq B \ r(A)^\alpha f_A < \widetilde{{\mathrm {M}}}_\alpha f(x) \bigr \} . \end{aligned}$$

Then for almost every \(x\in {\mathbb {R}}^d\) the sets \(\mathcal B^{{\mathrm {c}}}_\alpha (x)\) and \(\widetilde{\mathcal {B}}_\alpha (x)\) are nonempty, i.e. the supremum in the definition of the maximal function is attained in a largest ball B with \(x\in \overline{B}\), see Lemma 2.2. For \(\mathcal {B}_\alpha \in \{\mathcal B^{{\mathrm {c}}}_\alpha ,\widetilde{\mathcal {B}}_\alpha \}\) denote \(\mathcal {B}_\alpha =\bigcup _{x\in {\mathbb {R}}^d}\mathcal {B}_\alpha (x)\). For \(\beta \in {\mathbb {R}}\) with \(-1\le \alpha +\beta <d\) this allows us to define the following maximal functions

$$\begin{aligned} \mathrm M^{{\mathrm {c}}}_{\alpha ,\beta } f(x)&= \sup _{B\in \mathcal B^{{\mathrm {c}}}_\alpha :x\in \overline{B}}r(B)^{\alpha +\beta } f_B,\\ \widetilde{{\mathrm {M}}}_{\alpha ,\beta } f(x)&= \sup _{B\in \widetilde{\mathcal {B}}_\alpha :x\in \overline{B}}r(B)^{\alpha +\beta } f_B \end{aligned}$$

for almost every \(x\in {\mathbb {R}}^d\). Note that also for the centered version the supremum is taken over all balls \(B\in \mathcal B^{{\mathrm {c}}}_\alpha \) whose closure contains x, not only over those centered in x.

Theorem 1.2

Let \(1\le p<\infty \) and \(0<\alpha <d\) and \(\beta \in {\mathbb {R}}\) with \(0\le \alpha +\beta +1<d/p\) and \({\mathrm {M}}_{\alpha ,\beta }\in \{\mathrm M^{{\mathrm {c}}}_{\alpha ,\beta },\widetilde{{\mathrm {M}}}_{\alpha ,\beta }\}\). Then for all \(f\in W^{1,p}({\mathbb {R}}^d)\) we have

$$\begin{aligned} \Vert {\mathrm {M}}_{\alpha ,\beta } f\Vert _{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \le C_{d,\alpha ,\beta ,p} \Vert \nabla f\Vert _p \end{aligned}$$

where the constant \(C_{d,\alpha ,\beta ,p}\) depends only on d, \(\alpha \), \(\beta \) and p. In the endpoint \(p=1\) we can replace \(f\in W^{1,1}({\mathbb {R}}^d)\) by \(f\in \mathrm {BV}({\mathbb {R}}^d)\). The endpoint result for \(p=d/(1+\alpha +\beta )\) holds true as well.
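
For orientation, the case of Theorem 1.2 that enters the proof of Theorem 1.1 is \(\beta =-1\). The following check uses only the statement above. For \(\beta =-1\) we have \(\alpha +\beta +1=\alpha \), so the condition on the parameters reads \(0\le \alpha <d/p\) and the bound becomes

$$\begin{aligned} \Vert {\mathrm {M}}_{\alpha ,-1} f\Vert _{(p^{-1}-\alpha /d)^{-1}} \le C_{d,\alpha ,-1,p} \Vert \nabla f\Vert _p , \end{aligned}$$

which has the same exponents as formula (1.2).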

We prove Theorem 1.2 in Sect. 4. There had also been progress on \(0<\alpha \le 1\) similarly as for the Hardy–Littlewood maximal operator. For the uncentered fractional maximal function Carneiro and Madrid proved Theorem 1.1 for \(d=1\) in [10], and Luiro proved Theorem 1.1 for radial functions in [25]. Beltran and Madrid transferred Luiro’s result to the centered fractional maximal function in [5]. In [6] Beltran et al. proved Theorem 1.1 for \(d\ge 2\) and a centered maximal operator that only uses balls with lacunary radius and for maximal operators with respect to smooth kernels. The next step after boundedness is continuity of the gradient of the fractional maximal operator, as it implies boundedness, but does not follow from it. In [4, 26] Beltran and Madrid already proved it for the uncentered fractional maximal operator in the cases where the boundedness is known.

For a dyadic cube Q we denote by \({{\,\mathrm{l}\,}}(Q)\) the sidelength of Q. The fractional dyadic maximal function is defined by

$$\begin{aligned} \mathrm M^{{\mathrm {d}}}_\alpha f(x) = \sup _{Q:Q\ni x} {{\,\mathrm{l}\,}}(Q)^\alpha f_Q , \end{aligned}$$

where the supremum is taken over all dyadic cubes that contain x. The dyadic maximal operator has enjoyed a bit less attention than its continuous counterparts, such as the centered and the uncentered Hardy–Littlewood maximal operator. The dyadic maximal operator is different in the sense that formula (1.2) only holds for \(\alpha =0\), \(p=1\) and only in the variation sense, for which formula (1.2) has been proved in [29]. But for any other \(\alpha \) and p formula (1.2) fails because \(\nabla \mathrm M^{{\mathrm {d}}}_\alpha f\) is not a Sobolev function. We can however prove Theorem 1.4, the dyadic analog of Theorem 1.2. For \(\alpha \ge 0\) and a function \(f\in L^1({\mathbb {R}}^d)\) define \(\mathcal {Q}_\alpha \) to be the set of all cubes Q such that for all dyadic cubes \(P\supsetneq Q\) we have \({{\,\mathrm{l}\,}}(P)^\alpha f_P<{{\,\mathrm{l}\,}}(Q)^\alpha f_Q\).

Remark 1.3

In the uncentered setting one could also define \(\mathcal {B}_\alpha \) in a similar way as \(\mathcal {Q}_\alpha \).

For \(\beta \in {\mathbb {R}}\) with \(-1\le \alpha +\beta <d\) also define in the dyadic setting

$$\begin{aligned} \mathrm M^{{\mathrm {d}}}_{\alpha ,\beta } f(x) = \sup _{Q\in \mathcal {Q}_\alpha :x\in \overline{Q}}{{\,\mathrm{l}\,}}(Q)^{\alpha +\beta }f_Q . \end{aligned}$$

Then

Theorem 1.4

Let \(1\le p<\infty \) and \(0<\alpha <d\) and \(\beta \in {\mathbb {R}}\) with \(0\le \alpha +\beta +1<d/p\). Then for all \(f\in W^{1,p}({\mathbb {R}}^d)\) we have

$$\begin{aligned} \Vert \mathrm M^{{\mathrm {d}}}_{\alpha ,\beta } f\Vert _{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \le C_{d,\alpha ,\beta ,p} \Vert \nabla f\Vert _p \end{aligned}$$

where the constant \(C_{d,\alpha ,\beta ,p}\) depends only on d, \(\alpha \), \(\beta \) and p. In the endpoint \(p=1\) we can replace \(f\in W^{1,1}({\mathbb {R}}^d)\) by \(f\in \mathrm {BV}({\mathbb {R}}^d)\). The endpoint result for \(p=d/(1+\alpha +\beta )\) holds true as well.

Our main result in the dyadic setting is the following.

Theorem 1.5

Let \(1\le p<\infty \) and \(0<\alpha <d\). Then for all \(f\in W^{1,p}({\mathbb {R}}^d)\) we have

$$\begin{aligned} \Biggl ( \sum _{Q\in \mathcal {Q}_\alpha } ({{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q)^p \Biggr )^{\frac{1}{p}} \le C_{d,\alpha ,p} \Vert \nabla f\Vert _p \end{aligned}$$

where the constant \(C_{d,\alpha ,p}\) depends only on d, \(\alpha \) and p. In the endpoint \(p=1\) we can replace \(f\in W^{1,1}({\mathbb {R}}^d)\) by \(f\in \mathrm {BV}({\mathbb {R}}^d)\). The endpoint result for \(p=\infty \) holds true as well.

Remark 1.6

Note that in Theorem 1.5 we restrict \(0<\alpha <d\) and not \(0<\alpha <d/p\).

In Sect. 2.2 we conclude Theorem 1.4 from Theorem 1.5, and in Sect. 3 we prove Theorem 1.5.

Remark 1.7

Theorem 1.5 fails for \(\alpha =0\). However for \(\alpha =0\) and \(p=1\), a version with \(f_Q\) replaced by \(f_Q-\lambda _Q\) holds for certain \(\lambda _Q\), see [29, Proposition 2.5].

Remark 1.8

For the centered, uncentered and dyadic maximal operators, Theorems 1.2, 1.4 and 1.5 admit localized versions of the following form. For \(D\subset {\mathbb {R}}^d\) we set \( \mathcal {B}_\alpha (D) = \bigcup _{x\in D}\mathcal {B}_\alpha (x) \) and \( E = \bigcup \{c B:B\in \mathcal {B}_\alpha (D)\} \) with some large \(c>1\). Then Theorem 1.2 also holds in the form

$$\begin{aligned} \Vert {\mathrm {M}}_{\alpha ,-1} f\Vert _{L^{(p^{-1}-\alpha /d)^{-1}}(D)} \le C_{d,\alpha ,p} \Vert \nabla f\Vert _{L^p(E)} . \end{aligned}$$

Theorem 1.4 holds with the dyadic version of E and Theorem 1.5 where the sum on the left hand side is over any subset \(\mathcal {Q}\subset \mathcal {Q}_\alpha \) and the integral on the right is over \(\bigcup \{cQ:Q\in \mathcal {Q}\}\). These localized results directly follow from the same proof as the global results, if one keeps track of the balls and cubes which are being dealt with. The respective localized version of Theorem 1.1 can be proven if one has Lemma 2.4 without the differentiability assumption. Then in the reduction of Theorem 1.1 to Theorem 1.2 one could apply Theorem 1.2 to the same function f and \(\mathcal {Q}_\alpha \) for which one is showing Theorem 1.1, bypassing the approximation step and therefore preserving the locality of Theorem 1.2. This is in contrast to the actual local fractional maximal operator, for which Theorem 1.1 fails by [17, Example 4.2], which works for \(\alpha >0\). However if \(\alpha =0\) and \(p>1\) then the local fractional maximal operator is again bounded due to [19], and by [30] for \(\alpha =0\) and \(p=1\) and characteristic functions.

Dyadic cubes are much easier to deal with than balls, but the dyadic version still serves as a model case for the continuous versions since both versions share many properties. This can be observed in [30], where we proved \({{\,\mathrm{var}\,}}{\mathrm {M}}_01_{E}\le C_d{{\,\mathrm{var}\,}}1_{E}\) for the dyadic maximal operator and the uncentered Hardy–Littlewood maximal operator. The proof for the dyadic maximal operator is much shorter, but the same proof idea also works for the uncentered maximal operator. In this paper, too, a part of the proof of Theorem 1.4 for the dyadic maximal operator is reused in the proof of Theorem 1.2 for the Hardy–Littlewood maximal operator.

The plan for the proof of Theorem 1.1 is the following. For simplicity we write it down for \(p=1\).

$$\begin{aligned} \int |\nabla {\mathrm {M}}_\alpha f|^{\frac{d}{d-\alpha }}&\le (d-\alpha )^{\frac{d}{d-\alpha }} \int ({\mathrm {M}}_{\alpha ,-1}f)^{\frac{d}{d-\alpha }} \\&= d(d-\alpha )^{\frac{\alpha }{d-\alpha }} \int _0^\infty \lambda ^{\frac{\alpha }{d-\alpha }}{\mathcal {L}}(\{{\mathrm {M}}_{\alpha ,-1}f>\lambda \})\mathop {}\!{\mathrm {d}}\lambda \\&= d(d-\alpha )^{\frac{\alpha }{d-\alpha }} \int _0^\infty \lambda ^{\frac{\alpha }{d-\alpha }}{\mathcal {L}}\left( \bigcup \{\overline{B}:B\in \mathcal {B}_\alpha ,r(B)^{\alpha -1}f_B>\lambda \}\right) \mathop {}\!{\mathrm {d}}\lambda \\&\lesssim _\alpha \int _0^\infty \lambda ^{\frac{\alpha }{d-\alpha }}\sum _{B\in {\tilde{\mathcal {B}}}_\alpha ,cr(B)^{\alpha -1}f_B>\lambda }{\mathcal {L}}(B)\mathop {}\!{\mathrm {d}}\lambda \\&= \sum _{B\in {\tilde{\mathcal {B}}}_\alpha }{\mathcal {L}}(B)\int _0^{cr(B)^{\alpha -1}f_B}\lambda ^{\frac{\alpha }{d-\alpha }}\mathop {}\!{\mathrm {d}}\lambda \\&= \frac{(1-\alpha /d)\sigma _dc^{\frac{d}{d-\alpha }}}{(d\sigma _d)^{\frac{d}{d-\alpha }}} \sum _{B\in {\tilde{\mathcal {B}}}_\alpha }(f_B{\mathcal {H}}^{d-1}(\partial B))^{\frac{d}{d-\alpha }} \\&\le \frac{(1-\alpha /d)\sigma _dc^{\frac{d}{d-\alpha }}}{(d\sigma _d)^{\frac{d}{d-\alpha }}} \biggl ( \sum _{B\in {\tilde{\mathcal {B}}}_\alpha }f_B{\mathcal {H}}^{d-1}(\partial B) \biggr )^{\frac{d}{d-\alpha }} \\&\lesssim _\alpha \biggl ( \sum _{Q\in {\tilde{\mathcal {Q}}}_\alpha }f_Q{\mathcal {H}}^{d-1}(\partial Q) \biggr )^{\frac{d}{d-\alpha }} \\&\le C_{d,\alpha } ({{\,\mathrm{var}\,}}f)^{\frac{d}{d-\alpha }} , \end{aligned}$$

where \(\sigma _d\) is the volume of the d-dimensional unit ball. In the second step we apply the layer cake formula, in the fourth step we pass from a union of arbitrary balls to very disjoint balls \({\tilde{\mathcal {B}}}_\alpha \) with a Vitali covering argument, in the eighth step we pass from those balls to comparable dyadic cubes and as the last step use a result from the dyadic setting.
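
The bookkeeping of exponents in the second step and in the passage from volumes to surface areas can be checked as follows. In this sketch the abbreviation \(q=d/(d-\alpha )\) and the generic nonnegative measurable function g are introduced only for illustration. The layer cake formula and the identity \(q-1=\alpha /(d-\alpha )\) give

$$\begin{aligned} \int g^q = q\int _0^\infty \lambda ^{q-1}{\mathcal {L}}(\{g>\lambda \})\mathop {}\!{\mathrm {d}}\lambda , \qquad (d-\alpha )^q\,q = d(d-\alpha )^{\frac{\alpha }{d-\alpha }} . \end{aligned}$$

For a ball B of radius r we have \({\mathcal {L}}(B)=\sigma _dr^d\) and \({\mathcal {H}}^{d-1}(\partial B)=d\sigma _dr^{d-1}\), and since \(d+(\alpha -1)q=(d-1)q\),

$$\begin{aligned} {\mathcal {L}}(B)\bigl (r^{\alpha -1}f_B\bigr )^q = \sigma _dr^{(d-1)q}f_B^q = \frac{\sigma _d}{(d\sigma _d)^q}\bigl (f_B{\mathcal {H}}^{d-1}(\partial B)\bigr )^q . \end{aligned}$$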

We use \(\alpha >0\) as follows. Let A be a ball and \(B(x,r)\) be a smaller ball that intersects A. Then by \(A\subset B(x,3r(A))\) we have \(3^{\alpha -d} r(A)^\alpha f_A\le (3r(A))^\alpha f_{B(x,3r(A))}\). Thus if \(r^\alpha f_{B(x,r)}\le 3^{\alpha -d} r(A)^\alpha f_A\) then \(B(x,r)\) is not used by the fractional maximal operator. Hence it suffices to consider balls B with \(3^{d-\alpha }(r(B)/r(A))^\alpha f_B > f_A\). From that we can conclude \(f_B>2f_A\) or \(r(B)\gtrsim _\alpha r(A)\). Thus for any two balls B, A used by the fractional maximal operator, one of the following alternatives applies.

  1. (1)

    The balls B and A are disjoint.

  2. (2)

    The intervals \((f_B/2,f_B)\) and \((f_A/2,f_A)\) are disjoint.

  3. (3)

    The radii r(B) and r(A) are comparable.

We use this in the fourth step of the proof strategy above. We use a dyadic version of these alternatives in the last step. Note that for \(\alpha =0\) optimal balls B of arbitrarily different sizes with similar values \(f_B\) can intersect.
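
The dichotomy behind these alternatives can be spelled out as follows. This is a sketch under the assumption \(f_A>0\), and the constant is not optimized. Suppose \(3^{d-\alpha }(r(B)/r(A))^\alpha f_B>f_A\) and \(f_B\le 2f_A\). Then

$$\begin{aligned} f_A < 3^{d-\alpha }\Bigl (\frac{r(B)}{r(A)}\Bigr )^\alpha f_B \le 2\cdot 3^{d-\alpha }\Bigl (\frac{r(B)}{r(A)}\Bigr )^\alpha f_A , \qquad \text {hence}\qquad r(B) > \Bigl (\frac{3^{\alpha -d}}{2}\Bigr )^{1/\alpha } r(A) . \end{aligned}$$

The factor \((3^{\alpha -d}/2)^{1/\alpha }\) tends to zero as \(\alpha \rightarrow 0\), which is consistent with the fact that the argument is not available for \(\alpha =0\).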

Remark 1.9

There is a proof of Theorem 1.1 which has a structure parallel to the one presented above, but three steps are replaced. The estimate \(|\nabla {\mathrm {M}}_\alpha f|^{\frac{d}{d-\alpha }}\le (d-\alpha )^{\frac{d}{d-\alpha }}({\mathrm {M}}_{\alpha ,-1} f)^{\frac{d}{d-\alpha }}\) is replaced by \(|\nabla {\mathrm {M}}_\alpha f|^{\frac{d}{d-\alpha }}\le (d-\alpha )^{\frac{\alpha }{d-\alpha }}|\nabla {\mathrm {M}}_\alpha f|({\mathrm {M}}_{\alpha ,-1} f)^{\frac{\alpha }{d-\alpha }}\), the layer cake formula is replaced by the coarea formula [13, Theorem 3.11] and the Vitali covering argument is replaced by [30, Lemma 4.1] which deals with the boundary of balls instead of their volume. Otherwise it is identical to the proof presented in this paper.

$$\begin{aligned} \int |\nabla {\mathrm {M}}_\alpha f|^{\frac{d}{d-\alpha }}&\le (d-\alpha )^{\frac{\alpha }{d-\alpha }} \int |\nabla {\mathrm {M}}_\alpha f|({\mathrm {M}}_{\alpha ,-1}f)^{\frac{\alpha }{d-\alpha }} \\&=(d-\alpha )^{\frac{\alpha }{d-\alpha }} \int _0^\infty \int _{\partial _*{\{{\mathrm {M}}_\alpha f>\lambda \}}}({\mathrm {M}}_{\alpha ,-1}f)^{\frac{\alpha }{d-\alpha }}\mathop {}\!{\mathrm {d}}\lambda \\&=(d-\alpha )^{\frac{\alpha }{d-\alpha }} \int _0^\infty \int _{\partial _*{\bigcup \{\overline{B}:B\in \mathcal {B}_\alpha ,r(B)^\alpha f_B>\lambda \}}}(r(B_x)^{\alpha -1}f_{B_x})^{\frac{\alpha }{d-\alpha }}\mathop {}\!{\mathrm {d}}{\mathcal {H}}^{d-1}(x)\mathop {}\!{\mathrm {d}}\lambda \\&\lesssim _\alpha \int _0^\infty \sum _{B\in {\tilde{\mathcal {B}}}_\alpha ,r(B)^\alpha f_B>\lambda }{\mathcal {H}}^{d-1}(\partial B)(r(B)^{\alpha -1}f_B)^{\frac{\alpha }{d-\alpha }}\mathop {}\!{\mathrm {d}}\lambda \\&\lesssim _\alpha \sum _{B\in {\tilde{\mathcal {B}}}_\alpha }(f_B{\mathcal {H}}^{d-1}(\partial B))^{\frac{d}{d-\alpha }} \end{aligned}$$

and from there on arrive exactly as before at the bound by \(({{\,\mathrm{var}\,}}f)^{\frac{d}{d-\alpha }}\). This motivates a similar replacement in the dyadic setting. Instead of proving the boundedness of \(\Vert {\mathrm {M}}_{\alpha ,-1}f\Vert _{d/(d-\alpha )}\), Theorem 1.4, one might bound

$$\begin{aligned} \int _0^\infty \int _{\partial _*{\{{\mathrm {M}}_\alpha f>\lambda \}}}({\mathrm {M}}_{\alpha ,-1}f)^{\frac{\alpha }{d-\alpha }}\mathop {}\!{\mathrm {d}}\lambda . \end{aligned}$$

Note that formally

$$\begin{aligned} \int |\nabla {\mathrm {M}}_\alpha f(x)|({\mathrm {M}}_{\alpha ,-1} f(x))^{\frac{\alpha }{d-\alpha }} \mathop {}\!{\mathrm {d}}x \end{aligned}$$

is not well defined because \({\mathrm {M}}_{\alpha ,-1}f\) jumps where \(\nabla {\mathrm {M}}_\alpha f\) is supported.

Remark 1.10

In the proof of Theorems 1.1, 1.2, 1.4 and 1.5 we do not a priori need \(f\in L^p({\mathbb {R}}^d)\), it suffices to have \(f\in L^q({\mathbb {R}}^d)\) for some \(1\le q\le p\). However from \(\Vert \nabla f\Vert _p<\infty \) we can then anyway conclude \(f\in L^p({\mathbb {R}}^d)\) by Sobolev embedding.

2 Reformulation

In order to avoid writing absolute values, we consider only nonnegative functions f for the rest of the paper. We can still conclude Theorems 1.1, 1.2, 1.4 and 1.5 for signed functions because \(|f|_B=f_B\) and \(\bigl |\nabla |f|(x)\bigr |\le |\nabla f(x)|\). Recall the set of dyadic cubes

$$\begin{aligned} \bigcup _{n\in {\mathbb {Z}}} \Bigl \{ [x_1,x_1+2^n)\times \cdots \times [x_d,x_d+2^n):\forall i\in \{1,\ldots ,d\}\ x_i\in 2^n{\mathbb {Z}} \Bigr \} . \end{aligned}$$

For a set \(\mathcal {B}\) of balls or dyadic cubes we denote

$$\begin{aligned} \bigcup \mathcal {B}= \bigcup _{B\in \mathcal {B}}B \end{aligned}$$

as is commonly used in set theory. By \(a\lesssim _{\gamma _1,\ldots ,\gamma _n} b\) we mean that there exists a constant \(C_{d,\gamma _1,\ldots ,\gamma _n}\) that depends only on the values of \(\gamma _1,\ldots ,\gamma _n\) and the dimension d and such that \(a\le C_{d,\gamma _1,\ldots ,\gamma _n} b\).

We work in the setting of functions of bounded variation, as in Evans–Gariepy [13, Section 5]. For an open set \(\Omega \subset {\mathbb {R}}^d\) a function \(u\in L^1_{\mathrm {loc}}(\Omega )\) is said to have locally bounded variation if for each open set V that is compactly contained in \(\Omega \) we have

$$\begin{aligned} \sup \Bigl \{\int _Vu\,\text {div}\,\varphi :\varphi \in C^1_{\text{ c }}(V;{\mathbb {R}}^d),\ |\varphi |\le 1\Bigr \}<\infty . \end{aligned}$$

Such a function comes with a measure \(\mu \) and a function \(\nu :\Omega \rightarrow {\mathbb {R}}^d\) that has \(|\nu |=1\) \(\mu \)-a.e. such that for all \(\varphi \in C^1_{\text{ c }}(\Omega ;{\mathbb {R}}^d)\) we have

$$\begin{aligned} \int u\,\text {div}\,\varphi =\int \varphi \nu \mathop {}\!{\mathrm {d}}\mu . \end{aligned}$$

We denote \(\nabla u=-\nu \mu \) and define the variation of u by

$$\begin{aligned} {{\,\mathrm{var}\,}}_\Omega u = \mu (\Omega ) = \Vert \nabla u\Vert _{L^1(\Omega )} . \end{aligned}$$

If \(\nabla u\) is a locally integrable function we call u weakly differentiable.

Lemma 2.1

Let \(1<p\le \infty \) and \((u_n)_n\) be a sequence of locally integrable functions with

$$\begin{aligned} \sup _n \Vert \nabla u_n\Vert _p <\infty \end{aligned}$$

which converge to u in \(L^1_{\mathrm {loc}}({\mathbb {R}}^d)\). Then u is weakly differentiable and

$$\begin{aligned} \Vert \nabla u\Vert _p \le \limsup _n \Vert \nabla u_n\Vert _p . \end{aligned}$$

Proof

By the weak compactness of \(L^p({\mathbb {R}}^d)\) there is a subsequence, for simplicity also denoted by \((u_n)_n\), and a \(v\in L^p({\mathbb {R}}^d)^d\) such that \(\nabla u_n\rightarrow v\) weakly in \(L^p({\mathbb {R}}^d)\) and \(\Vert v\Vert _p\le \limsup _n\Vert \nabla u_n\Vert _p\). Let \(\varphi \in C^\infty _c({\mathbb {R}}^d)\) and \(i\in \{1,\ldots ,d\}\). Then

$$\begin{aligned} \int u\partial _i\varphi = \lim _{n\rightarrow \infty } \int u_n\partial _i\varphi = - \lim _{n\rightarrow \infty } \int \partial _iu_n\varphi = - \int v_i\varphi \end{aligned}$$

which means \(\nabla u=v\). \(\square \)

2.1 Hardy–Littlewood maximal operator

In this section we reduce Theorem 1.1 to Theorem 1.2. Let \(1\le p<d/\alpha \) and \(f\in L^p({\mathbb {R}}^d)\). For \(x\in {\mathbb {R}}^d\) consider for the uncentered maximal operator the set of balls B with \(x\in \overline{B}\) and \( {\mathrm {M}}_\alpha f(x) = r(B)^\alpha f_B , \) and for the centered maximal operator such balls B which are centered in x. Recall that we denote by \(\mathcal {B}_\alpha (x)\) the subset of those balls that have the largest radius.

Lemma 2.2

Let \({\mathrm {M}}_\alpha \in \{\mathrm M^{{\mathrm {c}}}_\alpha ,\widetilde{{\mathrm {M}}}_\alpha \}\) and \(1\le p<d/\alpha \). Let \(f\in L^p({\mathbb {R}}^d)\) and \(x\in {\mathbb {R}}^d\) be a Lebesgue point of f. Then \(\mathcal {B}_\alpha (x)\) is nonempty.

Proof

We formulate one proof that works both for the centered and uncentered fractional maximal operator. Let \((B_n)_n\) be a sequence of balls with \(x\in B_n\) and

$$\begin{aligned} {\mathrm {M}}_\alpha f(x)=\lim _{n\rightarrow \infty }r(B_n)^\alpha f_{B_n} . \end{aligned}$$

Assume there is a subsequence \((n_k)_k\) with \(r(B_{n_k})\rightarrow 0\). Then \(f_{B_{n_k}}\rightarrow f(x)\) and thus

$$\begin{aligned} \limsup _{k\rightarrow \infty } r(B_{n_k})^\alpha f_{B_{n_k}} \le f(x) \limsup _{k\rightarrow \infty } r(B_{n_k})^\alpha =0 , \end{aligned}$$

a contradiction. Assume there is a subsequence \((n_k)_k\) with \(r(B_{n_k})\rightarrow \infty \). Then

$$\begin{aligned} \limsup _{k\rightarrow \infty } r(B_{n_k})^\alpha f_{B_{n_k}}&\le \limsup _{k\rightarrow \infty } r(B_{n_k})^\alpha {\mathcal {L}}(B_{n_k})^{-1} {\mathcal {L}}(B_{n_k})^{1-\frac{1}{p}} \Bigl (\int _{B_{n_k}} f^p\Bigr )^{\frac{1}{p}} \\&= \limsup _{k\rightarrow \infty } \sigma _d^{-\frac{1}{p}} r(B_{n_k})^{\alpha -\frac{d}{p}} \Bigl (\int _{B_{n_k}} f^p\Bigr )^{\frac{1}{p}} \\&\le \sigma _d^{-\frac{1}{p}} \limsup _{k\rightarrow \infty } r(B_{n_k})^{\alpha -\frac{d}{p}} \Vert f\Vert _p = 0 \end{aligned}$$

since \(\Vert f\Vert _p<\infty \), a contradiction. Hence there is a subsequence \((n_k)_k\) such that \(r(B_{n_k})\) converges to some value \(r\in (0,\infty )\). We can conclude that there is a ball B with \(x\in \overline{B}\) and \(r(B)=r\) and \( \int _{B_{n_k}} f \rightarrow \int _B f . \) So we have

$$\begin{aligned} {\mathrm {M}}_\alpha f(x) = \lim _{k\rightarrow \infty } r(B_{n_k})^\alpha f_{B_{n_k}} = r(B)^\alpha f_B . \end{aligned}$$

A similar argument shows that there exists a largest ball B for which \(\sup _{\overline{B}\ni x}r(B)^\alpha f_B\) is attained. \(\square \)

Lemma 2.3

Let \({\mathrm {M}}_\alpha \in \{\mathrm M^{{\mathrm {c}}}_\alpha ,\widetilde{{\mathrm {M}}}_\alpha \}\) and let \(f\in L^\infty ({\mathbb {R}}^d)\) have bounded variation. Then \({\mathrm {M}}_\alpha f\) is locally Lipschitz.

Proof

If \(f=0\) then the statement is obvious, so consider \(f\ne 0\). Let B be a ball. Then there is a ball \(A\supset B\) with \(f_A>0\). Define

$$\begin{aligned} r_0 = 2r(A) \Bigl ( \frac{f_A}{2^d\Vert f\Vert _\infty } \Bigr )^{1/\alpha } \end{aligned}$$

and let \(x\in B\). Then \(A\subset B(x,2r(A))\) so that for \(r<r_0\) we have

$$\begin{aligned} r^\alpha f_{B(x,r)} < (2r(A))^\alpha \frac{f_A}{2^d\Vert f\Vert _\infty }\Vert f\Vert _\infty \le (2r(A))^\alpha f_{B(x,2r(A))} . \end{aligned}$$

That means that on B the maximal function \({\mathrm {M}}_\alpha f\) is the supremum over all functions \(\sigma _d^{-1}r^{\alpha -d}f*1_{B(z,r)}\) with \(r\ge r_0\) and z such that \(0\in B(z,r)\) for the uncentered operator and \(z=0\) for the centered. Those convolutions are weakly differentiable with

$$\begin{aligned} \nabla (r^{\alpha -d}f*1_{B(z,r)}) = r^{\alpha -d}(\nabla f)*1_{B(z,r)} \end{aligned}$$

so that

$$\begin{aligned} |\nabla (r^{\alpha -d}f*1_{B(z,r)})| \le r^{\alpha -d}{{\,\mathrm{var}\,}}f \le r_0^{\alpha -d}{{\,\mathrm{var}\,}}f . \end{aligned}$$

Thus on B the maximal function \({\mathrm {M}}_\alpha f\) is a supremum of functions with Lipschitz constant \(\sigma _d^{-1}r_0^{\alpha -d}{{\,\mathrm{var}\,}}f\) and hence itself Lipschitz with the same constant. \(\square \)
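
For the reader's convenience, the chain of inequalities comparing \(B(x,r)\) with \(B(x,2r(A))\) in the proof above can be unpacked as follows. This only spells out steps already used there, with \(f\ge 0\) as assumed throughout this section. For \(r<r_0\) the definition of \(r_0\) gives

$$\begin{aligned} r^\alpha f_{B(x,r)} \le r^\alpha \Vert f\Vert _\infty < r_0^\alpha \Vert f\Vert _\infty = (2r(A))^\alpha \frac{f_A}{2^d} , \end{aligned}$$

and since \(A\subset B(x,2r(A))\),

$$\begin{aligned} f_{B(x,2r(A))} = \frac{1}{\sigma _d2^dr(A)^d}\int _{B(x,2r(A))}f \ge \frac{1}{2^d\sigma _dr(A)^d}\int _Af = 2^{-d}f_A . \end{aligned}$$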

The following has essentially already been observed in [17, 20, 23, 25].

Lemma 2.4

Let \({\mathrm {M}}_\alpha \in \{\mathrm M^{{\mathrm {c}}}_\alpha ,\widetilde{{\mathrm {M}}}_\alpha \}\) and let \({\mathrm {M}}_\alpha f\) be differentiable in x. Then for every \(B\in \mathcal {B}_\alpha (x)\) we have

$$\begin{aligned} |\nabla {\mathrm {M}}_\alpha f(x)| \le (d-\alpha )r(B)^{\alpha -1}f_B . \end{aligned}$$

In the uncentered case if \(x\in B\) we have \( \nabla \widetilde{{\mathrm {M}}}_\alpha f(x)=0 . \)

Proof

Let \(B(z,r)\in \mathcal {B}_\alpha (x)\) and let e be a unit vector. Note that for the centered maximal operator we have \(z=x\). Then for all \(h>0\) we have \(x+he\in \overline{B(z+eh,r+h)}\) and \(B(z,r)\subset B(z+eh,r+h)\). Thus

$$\begin{aligned} |\nabla {\mathrm {M}}_\alpha f(x)|&= \sup _e\lim _{h\rightarrow 0} \frac{{\mathrm {M}}_\alpha f(x)-{\mathrm {M}}_\alpha f(x+he)}{h} \\&\le \frac{1}{\sigma _d} \lim _{h\rightarrow 0} \frac{1}{h}(r^{\alpha -d}\int _{B(z,r)}f-(r+h)^{\alpha -d}\int _{B(z+eh,r+h)}f) \\&\le \frac{1}{\sigma _d} \lim _{h\rightarrow 0} \frac{1}{h}(r^{\alpha -d}\int _{B(z,r)}f-(r+h)^{\alpha -d}\int _{B(z,r)}f) \\&= \frac{1}{\sigma _d} \lim _{h\rightarrow 0} \frac{1}{h}(r^{\alpha -d}-(r+h)^{\alpha -d})\int _{B(z,r)}f \\&= \frac{1}{\sigma _d} (d-\alpha )r^{\alpha -d-1}\int _{B(z,r)}f . \end{aligned}$$

If \(x\in B(z,r)\) then since for all \(y\in B(z,r)\) we have \({\mathrm {M}}_\alpha f(y)\ge {\mathrm {M}}_\alpha f(x)\) we get \(\nabla {\mathrm {M}}_\alpha f(x)=0\). \(\square \)
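
The last equality in the display of this proof is the derivative of \(r\mapsto r^{\alpha -d}\), and it matches the right-hand side of the lemma. As a short check:

$$\begin{aligned} \lim _{h\rightarrow 0}\frac{r^{\alpha -d}-(r+h)^{\alpha -d}}{h}&= -\frac{\mathrm {d}}{\mathrm {d}r}r^{\alpha -d} = (d-\alpha )r^{\alpha -d-1} ,\\ \frac{1}{\sigma _d}(d-\alpha )r^{\alpha -d-1}\int _{B(z,r)}f&= (d-\alpha )r^{\alpha -1}\frac{1}{\sigma _dr^d}\int _{B(z,r)}f = (d-\alpha )r^{\alpha -1}f_{B(z,r)} . \end{aligned}$$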

Now we reduce Theorem 1.1 to Theorem 1.2. We prove Theorem 1.2 in Sect. 4.

Proof of Theorem 1.1

For each \(n\in {\mathbb {N}}\) define a cutoff function \(\varphi _n\) by

$$\begin{aligned} \varphi _n(x) = {\left\{ \begin{array}{ll} 1,&{}0\le |x|\le 2^n,\\ 2-2^{-n}|x|,&{}2^n\le |x|\le 2^{n+1},\\ 0,&{}2^{n+1}\le |x|<\infty . \end{array}\right. } \end{aligned}$$

Then \(|\nabla \varphi _n(x)|=2^{-n}1_{2^n\le |x|\le 2^{n+1}}\) and thus

$$\begin{aligned} \Vert f\nabla \varphi _n\Vert _p = 2^{-n}\Vert f\Vert _{L^p(B(0,2^{n+1}){\setminus } B(0,2^n))} \rightarrow 0 \end{aligned}$$
(2.1)

as \(n\rightarrow \infty \). Denote \(f_n(x)=\min \{f(x),n\}\cdot \varphi _n(x)\). Then by formula (2.1) we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \Vert \nabla f_n\Vert _p = \lim _{n\rightarrow \infty } \Vert \nabla f_n-\min \{f,n\}\nabla \varphi _n\Vert _p = \lim _{n\rightarrow \infty } \Vert \varphi _n\nabla \min \{f,n\}\Vert _p = \Vert \nabla f\Vert _p . \end{aligned}$$
(2.2)
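
The equalities in formula (2.2) rest on the product rule. The following sketch records the individual steps; it assumes that f is nonnegative (as the truncation \(\min \{f,n\}\) suggests), so that \(\nabla \min \{f,n\}=1_{\{f<n\}}\nabla f\) almost everywhere.

$$\begin{aligned} \nabla f_n&= \varphi _n\nabla \min \{f,n\} + \min \{f,n\}\nabla \varphi _n , \\ \Vert \min \{f,n\}\nabla \varphi _n\Vert _p&\le \Vert f\nabla \varphi _n\Vert _p \rightarrow 0 , \\ \Vert \varphi _n\nabla \min \{f,n\}\Vert _p&= \Vert \varphi _n1_{\{f<n\}}\nabla f\Vert _p \rightarrow \Vert \nabla f\Vert _p . \end{aligned}$$

The second line is formula (2.1), and the last convergence would follow from monotone convergence, since \(\varphi _n1_{\{f<n\}}\) increases pointwise to 1 almost everywhere.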

Since \(1\le p<d/\alpha \) and \(f\in L^p({\mathbb {R}}^d)\) we have \({\mathrm {M}}_\alpha f\in L^{(p^{-1}-\alpha /d)^{-1},\infty }({\mathbb {R}}^d)\subset L^1_{\mathrm {loc}}({\mathbb {R}}^d)\). Then since \({\mathrm {M}}_\alpha f_n\rightarrow {\mathrm {M}}_\alpha f\) pointwise from below, \({\mathrm {M}}_\alpha f_n\) converges to \({\mathrm {M}}_\alpha f\) in \(L^1_{\mathrm {loc}}({\mathbb {R}}^d)\). So from Lemma 2.1 it follows that

$$\begin{aligned} \Vert \nabla {\mathrm {M}}_\alpha f\Vert _{(p^{-1}-\alpha /d)^{-1}} \le \limsup _{n\rightarrow \infty } \Vert \nabla {\mathrm {M}}_\alpha f_n\Vert _{(p^{-1}-\alpha /d)^{-1}} . \end{aligned}$$

By Lemma 2.3 we have that \({\mathrm {M}}_\alpha f_n\) is weakly differentiable and differentiable almost everywhere, so that by Lemmas 2.2 and 2.4 and Theorem 1.2 we have

$$\begin{aligned} \Vert \nabla {\mathrm {M}}_\alpha f_n\Vert _{(p^{-1}-\alpha /d)^{-1}}&\le (d-\alpha ) \Vert {\mathrm {M}}_\alpha f_n/r(B_x)\Vert _{(p^{-1}-\alpha /d)^{-1}} \\&\le (d-\alpha ) \Vert {\mathrm {M}}_{\alpha ,-1} f_n\Vert _{(p^{-1}-\alpha /d)^{-1}} \\&\lesssim _\alpha \Vert \nabla f_n\Vert _p , \end{aligned}$$

which by formula (2.2) converges to \(\Vert \nabla f\Vert _p\) as \(n\rightarrow \infty \). For the endpoint \(p=d/\alpha \) the proof works the same. \(\square \)

2.2 Dyadic maximal operator

In this section we reduce Theorem 1.4 to Theorem 1.5. Let \(1\le p<d/\alpha \) and \(f\in L^p({\mathbb {R}}^d)\). Recall that we denote by \(\mathcal {Q}_\alpha \) the set of all dyadic cubes Q such that for every dyadic cube \(P\supsetneq Q\) we have \({{\,\mathrm{l}\,}}(P)^\alpha f_P<{{\,\mathrm{l}\,}}(Q)^\alpha f_Q\). For \(x\in {\mathbb {R}}^d\), we denote by \(\mathcal {Q}_\alpha (x)\) the set of dyadic cubes Q with \(x\in \overline{Q}\) and

$$\begin{aligned} \mathrm M^{{\mathrm {d}}}_\alpha f(x)={{\,\mathrm{l}\,}}(Q)^\alpha f_Q . \end{aligned}$$

Lemma 2.5

Let \(1\le p<d/\alpha \) and \(f\in L^p({\mathbb {R}}^d)\) and \(x\in {\mathbb {R}}^d\) be a Lebesgue point of f. Then \(\mathcal {Q}_\alpha (x)\) contains a dyadic cube \(Q_x\) with

$$\begin{aligned} {{\,\mathrm{l}\,}}(Q_x) = \sup _{Q\in \mathcal {Q}_\alpha (x)}{{\,\mathrm{l}\,}}(Q) \end{aligned}$$

and that cube also belongs to \(\mathcal {Q}_\alpha \).

Proof

Let \((Q_n)_n\) be a sequence of cubes with \({{\,\mathrm{l}\,}}(Q_n)\rightarrow \infty \). Then

$$\begin{aligned} \limsup _{n\rightarrow \infty } {{\,\mathrm{l}\,}}(Q_n)^\alpha f_{Q_n}&\le \limsup _{n\rightarrow \infty } {{\,\mathrm{l}\,}}(Q_n)^{\alpha -d} {\mathcal {L}}(Q_n)^{1-\frac{1}{p}} \Bigl (\int _{Q_n} f^p\Bigr )^{\frac{1}{p}} \\&= \limsup _{n\rightarrow \infty } {{\,\mathrm{l}\,}}(Q_n)^{\alpha -d+d-\frac{d}{p}} \Bigl (\int _{Q_n} f^p\Bigr )^{\frac{1}{p}} \\&= \limsup _{n\rightarrow \infty } {{\,\mathrm{l}\,}}(Q_n)^{\alpha -\frac{d}{p}} \Bigl (\int _{Q_n} f^p\Bigr )^{\frac{1}{p}} \\&\le \limsup _{n\rightarrow \infty } {{\,\mathrm{l}\,}}(Q_n)^{\alpha -\frac{d}{p}} \Vert f\Vert _p = 0 . \end{aligned}$$

Let \((Q_n)_n\) be a sequence of cubes with \(x\in \overline{Q_n}\) and \({{\,\mathrm{l}\,}}(Q_n)\rightarrow 0\). Then since \(f_{Q_n}\rightarrow f(x)\) and \({{\,\mathrm{l}\,}}(Q_n)^\alpha \rightarrow 0\), we have \({{\,\mathrm{l}\,}}(Q_n)^\alpha f_{Q_n}\rightarrow 0\). Thus since for each k there are at most \(2^d\) many cubes Q with \({{\,\mathrm{l}\,}}(Q)=2^k\) and whose closure contains x, the supremum has to be attained for a finite set of cubes from which we can select the largest. \(\square \)

Now we reduce Theorem 1.4 to Theorem 1.5. We prove Theorem 1.5 in Sect. 3.

Proof of Theorem 1.4

By Lemma 2.5, \(\mathrm M^{{\mathrm {d}}}_{\alpha ,\beta }f\) is defined almost everywhere. We have

$$\begin{aligned} \int (\mathrm M^{{\mathrm {d}}}_{\alpha ,\beta } f(x))^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \mathop {}\!{\mathrm {d}}x&\le \int \sum _{Q\in \mathcal {Q}_\alpha } 1_{Q}(x) ({{\,\mathrm{l}\,}}(Q)^{\alpha +\beta }f_Q)^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \mathop {}\!{\mathrm {d}}x \\&= \sum _{Q\in \mathcal {Q}_\alpha } {\mathcal {L}}(Q) ({{\,\mathrm{l}\,}}(Q)^{\alpha +\beta }f_Q)^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \\&= \sum _{Q\in \mathcal {Q}_\alpha } ({{\,\mathrm{l}\,}}(Q)^{d/p-1}f_Q)^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \\&\le \biggl ( \sum _{Q\in \mathcal {Q}_\alpha } \bigl ( {{\,\mathrm{l}\,}}(Q)^{d/p-1}f_Q \bigr )^p \biggr )^{(1-p(1+\alpha +\beta )/d)^{-1}} \\&\lesssim _\alpha \Vert \nabla f\Vert _p^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}}, \end{aligned}$$

where the last step follows from Theorem 1.5. In the endpoint case we have by Theorem 1.5

$$\begin{aligned} \Vert \mathrm M^{{\mathrm {d}}}_{\alpha ,\beta } f\Vert _\infty&= \sup _{Q\in \mathcal {Q}_\alpha } {{\,\mathrm{l}\,}}(Q)^{\alpha +\beta }f_Q \\&= \sup _{Q\in \mathcal {Q}_\alpha } {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \le \Biggl ( \sum _{Q\in \mathcal {Q}_\alpha } ({{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q)^p \Biggr )^{\frac{1}{p}} \lesssim _p \Vert \nabla f\Vert _p . \end{aligned}$$

\(\square \)
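
The exponent arithmetic behind the third and fourth lines of the first display in the proof of Theorem 1.4 can be verified directly. As a sketch, write q for the target exponent; the symbol q is shorthand introduced here only, and we use \({\mathcal {L}}(Q)={{\,\mathrm{l}\,}}(Q)^d\).

$$\begin{aligned} \frac{1}{q}&= \frac{1}{p}-\frac{1+\alpha +\beta }{d} , \\ d+(\alpha +\beta )q&= q\Bigl (\frac{d}{q}+\alpha +\beta \Bigr ) = q\Bigl (\frac{d}{p}-1\Bigr ) , \\ {\mathcal {L}}(Q)\bigl ({{\,\mathrm{l}\,}}(Q)^{\alpha +\beta }f_Q\bigr )^q&= \bigl ({{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q\bigr )^q , \\ \frac{q}{p}&= \Bigl (1-\frac{p(1+\alpha +\beta )}{d}\Bigr )^{-1} . \end{aligned}$$

Provided \(1+\alpha +\beta \ge 0\), which the statement of Theorem 1.4 presumably assumes, we have \(q\ge p\), and the embedding \(\ell ^p\subset \ell ^q\) gives \(\sum _Qa_Q^q\le (\sum _Qa_Q^p)^{q/p}\) for nonnegative \(a_Q\). This is the fourth line of the display.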

3 Dyadic maximal operator

In this section we prove Theorem 1.5. For a measurable set \(E\subset {\mathbb {R}}^d\) we define the measure theoretic boundary by

$$\begin{aligned} \partial _*{E} = \left\{ x:\limsup _{r\rightarrow 0}\frac{{\mathcal {L}}(B(x,r){\setminus } E)}{r^d}>0,\ \limsup _{r\rightarrow 0}\frac{{\mathcal {L}}(B(x,r)\cap E)}{r^d}>0\right\} . \end{aligned}$$

We denote the topological boundary by \(\partial E\). As in [29, 30], our approach to the variation is the coarea formula rather than the definition of the variation, see for example [13, Theorem 5.9].

Lemma 3.1

Let \(f\in L^1_{\mathrm {loc}}({\mathbb {R}}^d)\) with locally bounded variation and \(U\subset {\mathbb {R}}^d\). Then

$$\begin{aligned} {{\,\mathrm{var}\,}}_U f = \int _{\mathbb {R}}{\mathcal {H}}^{d-1}(\partial _*{\{f>\lambda \}}\cap U)\mathop {}\!{\mathrm {d}}\lambda . \end{aligned}$$

Lemma 3.2

Let \(f\in L^1_{\mathrm {loc}}({\mathbb {R}}^d)\) be weakly differentiable and \(U\subset {\mathbb {R}}^d\) and \(\lambda _0<\lambda _1\). Then

$$\begin{aligned} \int _{\{x\in U:\lambda _0<f(x)<\lambda _1\}} |\nabla f| = \int _{\lambda _0}^{\lambda _1}{\mathcal {H}}^{d-1}(\partial _*{\{f>\lambda \}}\cap U)\mathop {}\!{\mathrm {d}}\lambda . \end{aligned}$$

Recall also the relative isoperimetric inequality for cubes.

Lemma 3.3

Let Q be a cube and E be a measurable set. Then

$$\begin{aligned} \min \{{\mathcal {L}}(Q\cap E),{\mathcal {L}}(Q{\setminus } E)\}^{d-1}\lesssim {\mathcal {H}}^{d-1}(\partial _*{E}\cap Q)^d . \end{aligned}$$

We will use a result from the case \(\alpha =0\). For a subset \(\mathcal {Q}\subset \mathcal {Q}_0\) and \(Q\in \mathcal {Q}_0\), we denote

$$\begin{aligned} \lambda _Q^\mathcal {Q}= \min \biggl \{ \max \Bigl \{ \inf \{\lambda :{\mathcal {L}}(\{f>\lambda \}\cap Q)<2^{-d-2}{\mathcal {L}}(Q)\} ,\ \sup \{f_P:P\in \mathcal {Q},\ P\supsetneq Q\} \Bigr \} ,f_Q \biggr \} . \end{aligned}$$

Proposition 3.4

Let \(1\le p<\infty \) and \(f\in L^1_{\mathrm {loc}}({\mathbb {R}}^d)\) and \(|\nabla f|\in L^p({\mathbb {R}}^d)\). Then for every set \(\mathcal {Q}\subset \mathcal {Q}_0\) we have

$$\begin{aligned} \sum _{Q\in \mathcal {Q}} ({{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}(f_Q-\lambda _Q^\mathcal {Q}))^p \lesssim _p \Vert \nabla f\Vert _p^p . \end{aligned}$$

For \(p=1\) it also holds with \(\Vert \nabla f\Vert _1\) replaced by \({{\,\mathrm{var}\,}}f\).

Remark 3.5

We have that \(\alpha <\beta \) implies \(\mathcal {Q}_\beta \subset \mathcal {Q}_\alpha \). This is because for \({{\,\mathrm{l}\,}}(Q)<{{\,\mathrm{l}\,}}(P)\), \({{\,\mathrm{l}\,}}(Q)^\alpha f_Q>{{\,\mathrm{l}\,}}(P)^\alpha f_P\) becomes a stronger estimate the larger \(\alpha \) becomes.
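
The monotonicity in Remark 3.5 can be checked in one line. A sketch, for dyadic cubes \(Q\subsetneq P\) with \(f_P>0\) and \(\alpha <\beta \), so that \({{\,\mathrm{l}\,}}(P)/{{\,\mathrm{l}\,}}(Q)\ge 2\):

$$\begin{aligned} {{\,\mathrm{l}\,}}(Q)^\beta f_Q>{{\,\mathrm{l}\,}}(P)^\beta f_P \ \Longleftrightarrow \ \frac{f_Q}{f_P}> \Bigl (\frac{{{\,\mathrm{l}\,}}(P)}{{{\,\mathrm{l}\,}}(Q)}\Bigr )^\beta \ge \Bigl (\frac{{{\,\mathrm{l}\,}}(P)}{{{\,\mathrm{l}\,}}(Q)}\Bigr )^\alpha \ \Longrightarrow \ {{\,\mathrm{l}\,}}(Q)^\alpha f_Q>{{\,\mathrm{l}\,}}(P)^\alpha f_P . \end{aligned}$$

If \(f_P=0\) then also \(f_Q=0\) and neither strict inequality holds, so this case does not occur for \(Q\in \mathcal {Q}_\beta \).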

By Remark 3.5 we can apply Proposition 3.4 to \(\mathcal {Q}=\mathcal {Q}_\alpha \). For \(p=1\) Proposition 3.4 is Proposition 2.5 in [29]. For the proof for all \(p\ge 1\) we follow the strategy in [29]. In particular we use the following result. For \(Q\in \mathcal {Q}_0\) we denote

$$\begin{aligned} {\bar{\lambda }}_Q = \min \biggl \{ \max \Bigl \{ \inf \{\lambda :{\mathcal {L}}(\{f>\lambda \}\cap Q)<{\mathcal {L}}(Q)/2\} ,\ \sup \{f_P:P\in \mathcal {Q}_0,\ P\supsetneq Q\} \Bigr \} ,f_Q \biggr \} . \end{aligned}$$

Lemma 3.6

(Corollary 3.3 in [29]) Let \(f\in L^1_{\mathrm {loc}}({\mathbb {R}}^d)\). Then for every \(Q\in \mathcal {Q}_0\) we have

$$\begin{aligned} {\mathcal {L}}(Q)(f_Q-\lambda _Q^\emptyset ) \le 2^{d+2} \sum _{P\in \mathcal {Q}_0,P\subsetneq Q} \int _{{\bar{\lambda }}_P}^{f_P}{\mathcal {L}}(P\cap \{f>\lambda \})\mathop {}\!{\mathrm {d}}\lambda . \end{aligned}$$

Note that \(f_P>{\bar{\lambda }}_P\) implies \(P\in \mathcal {Q}_0\).

Proof of Proposition 3.4

By Lemmas 3.3 and 3.2 we have for each \(P\in \mathcal {Q}_0\) and \(P\subsetneq Q\) that

$$\begin{aligned} \int _{{\bar{\lambda }}_P}^{f_P}{\mathcal {L}}(\{f>\lambda \}\cap P)\mathop {}\!{\mathrm {d}}\lambda&\le {{\,\mathrm{l}\,}}(P) \int _{{\bar{\lambda }}_P}^{f_P}{\mathcal {L}}(\{f>\lambda \}\cap P)^{1-\frac{1}{d}}\mathop {}\!{\mathrm {d}}\lambda \\&\lesssim {{\,\mathrm{l}\,}}(P) \int _{{\bar{\lambda }}_P}^{f_P}{\mathcal {H}}^{d-1}(\partial _*{\{f>\lambda \}}\cap P)\mathop {}\!{\mathrm {d}}\lambda \\&= {{\,\mathrm{l}\,}}(P) \int _{x\in P:{\bar{\lambda }}_P<f(x)<f_P}|\nabla f| \\&= {{\,\mathrm{l}\,}}(P) \int _Q|\nabla f| 1_{P\times ({\bar{\lambda }}_P,f_P)}(x,f(x))\mathop {}\!{\mathrm {d}}x . \end{aligned}$$

We note that for any \(Q\in \mathcal {Q}\) we have \(\lambda _Q^\mathcal {Q}\ge \lambda _Q^\emptyset \) and use Lemma 3.6. Then we apply the above calculation, Hölder’s inequality and use that \(({\bar{\lambda }}_P,f_P)\) and \(({\bar{\lambda }}_Q,f_Q)\) are disjoint for \(P\subsetneq Q\),

$$\begin{aligned}&\sum _{Q\in \mathcal {Q}} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}(f_Q-\lambda _Q^\mathcal {Q}) \Bigr )^p \\&\quad \le 2^{d+2} \sum _{Q\in \mathcal {Q}}\Biggl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1-d} \sum _{P\in \mathcal {Q}_0,P\subsetneq Q} \int _{{\bar{\lambda }}_P}^{f_P}{\mathcal {L}}(\{f>\lambda \}\cap P)\mathop {}\!{\mathrm {d}}\lambda \Biggr )^p \\&\quad \lesssim \sum _{Q\in \mathcal {Q}}\Biggl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1-d} \int _Q|\nabla f| \sum _{P\in \mathcal {Q}_0,P\subsetneq Q}{{\,\mathrm{l}\,}}(P) 1_{P\times ({\bar{\lambda }}_P,f_P)}(x,f(x))\mathop {}\!{\mathrm {d}}x \Biggr )^p \\&\quad \le \sum _{Q\in \mathcal {Q}}\Biggl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1-d+d(1-\frac{1}{p})} \Biggl [ \int _Q|\nabla f|^p\biggl ( \sum _{P\in \mathcal {Q}_0,P\subsetneq Q}{{\,\mathrm{l}\,}}(P)1_{P\times ({\bar{\lambda }}_P,f_P)}(x,f(x)) \biggr )^p\mathop {}\!{\mathrm {d}}x \Biggr ]^{\frac{1}{p}}\Biggr )^p \\&\quad = \sum _{Q\in \mathcal {Q}}\Biggl ( {{\,\mathrm{l}\,}}(Q)^{-1}\Biggl [ \sum _{P\in \mathcal {Q}_0,P\subsetneq Q}{{\,\mathrm{l}\,}}(P)^p \int _{(x,f(x))\in P\times ({\bar{\lambda }}_P,f_P)}|\nabla f|^p \Biggr ]^{\frac{1}{p}}\Biggr )^p \\&\quad = \sum _{Q\in \mathcal {Q}} {{\,\mathrm{l}\,}}(Q)^{-p} \sum _{P\in \mathcal {Q}_0,P\subsetneq Q}{{\,\mathrm{l}\,}}(P)^p \int _{(x,f(x))\in P\times ({\bar{\lambda }}_P,f_P)}|\nabla f|^p \\&\quad = \sum _{P\in \mathcal {Q}_0} {{\,\mathrm{l}\,}}(P)^p \int _{x\in P:f(x)\in ({\bar{\lambda }}_P,f_P)}|\nabla f|^p \sum _{Q\in \mathcal {Q},Q\supsetneq P}{{\,\mathrm{l}\,}}(Q)^{-p} \\&\quad \le \frac{1}{2^p-1} \sum _{P\in \mathcal {Q}_0} \int _{x\in P:f(x)\in ({\bar{\lambda }}_P,f_P)}|\nabla f|^p\\&\quad \le \frac{1}{2^p-1} \int |\nabla f|^p . \end{aligned}$$

For \(p=1\) with \({{\,\mathrm{var}\,}}f\) instead of \(\Vert \nabla f\Vert _1\) we do not use Lemma 3.2 or Hölder’s inequality, but interchange the order of summation first and then apply Lemma 3.1. \(\square \)
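
The constant \(1/(2^p-1)\) in the penultimate line of the long display comes from a geometric series. As a sketch: every dyadic cube \(Q\supsetneq P\) has side length \(2^k{{\,\mathrm{l}\,}}(P)\) for some integer \(k\ge 1\), and for each k there is exactly one such dyadic cube, which may or may not belong to \(\mathcal {Q}\). Hence

$$\begin{aligned} \sum _{Q\in \mathcal {Q},Q\supsetneq P}{{\,\mathrm{l}\,}}(Q)^{-p} \le \sum _{k=1}^\infty \bigl (2^k{{\,\mathrm{l}\,}}(P)\bigr )^{-p} = {{\,\mathrm{l}\,}}(P)^{-p}\frac{2^{-p}}{1-2^{-p}} = \frac{{{\,\mathrm{l}\,}}(P)^{-p}}{2^p-1} . \end{aligned}$$

The factor \({{\,\mathrm{l}\,}}(P)^{-p}\) cancels the factor \({{\,\mathrm{l}\,}}(P)^p\) in front of the integral.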

For a dyadic cube Q denote by \({{\,\mathrm{prt}\,}}(Q)\) the dyadic parent cube of Q.

Lemma 3.7

Let \(1\le p<d/\alpha \) and \(f\in L^p({\mathbb {R}}^d)\) and let \(\varepsilon >0\). Then there is a subset \({\tilde{\mathcal {Q}}}_\alpha \) of \(\mathcal {Q}_\alpha \) such that for each \(Q\in \mathcal {Q}_\alpha \) with \({{\,\mathrm{l}\,}}(Q)^\alpha f_Q>\varepsilon \) there is a \(P\in {\tilde{\mathcal {Q}}}_\alpha \) with \(Q\subset {{\,\mathrm{prt}\,}}(P)\) and \(f_Q\le 2^d f_P\). Furthermore for any two \(Q,P\in {\tilde{\mathcal {Q}}}_\alpha \) one of the following holds.

  (1) \({{\,\mathrm{prt}\,}}(Q)={{\,\mathrm{prt}\,}}(P)\).

  (2) \({{\,\mathrm{prt}\,}}(Q)\) and \({{\,\mathrm{prt}\,}}(P)\) do not intersect.

  (3) \(f_Q/f_P\not \in (2^{-d},2^d)\).

Proof

Set \({\tilde{\mathcal {Q}}}_\alpha ^0\) to be the set of maximal cubes Q with \({{\,\mathrm{l}\,}}(Q)^\alpha f_Q>\varepsilon \). For any dyadic cube Q with \({{\,\mathrm{l}\,}}(Q)^\alpha f_Q>\varepsilon \) we have

$$\begin{aligned} \varepsilon < {{\,\mathrm{l}\,}}(Q)^{\alpha -d}\int _Q f \le {{\,\mathrm{l}\,}}(Q)^{\alpha -d+d-\frac{d}{p}}\Bigl (\int _Q f^p\Bigr )^{\frac{1}{p}} \le {{\,\mathrm{l}\,}}(Q)^{\alpha -\frac{d}{p}}\Vert f\Vert _p \end{aligned}$$

which implies

$$\begin{aligned} {{\,\mathrm{l}\,}}(Q) < (\Vert f\Vert _p/\varepsilon )^{(d/p-\alpha )^{-1}} . \end{aligned}$$
(3.1)

Hence

$$\begin{aligned} \bigcup {\tilde{\mathcal {Q}}}_\alpha ^0 = \bigcup \{Q\in \mathcal {Q}_\alpha :{{\,\mathrm{l}\,}}(Q)^\alpha f_Q>\varepsilon \} . \end{aligned}$$

Assume we have already defined \({\tilde{\mathcal {Q}}}_\alpha ^n\). Then define \({\tilde{\mathcal {Q}}}_\alpha ^{n+1}\) to be the set of maximal cubes \(Q\in \mathcal {Q}_\alpha \) with

$$\begin{aligned} f_Q > 2^d\sup _{P\in {\tilde{\mathcal {Q}}}_\alpha ^n:Q\subset {{\,\mathrm{prt}\,}}(P)}f_P . \end{aligned}$$
(3.2)

Set \({\tilde{\mathcal {Q}}}_\alpha ={\tilde{\mathcal {Q}}}_\alpha ^0\cup {\tilde{\mathcal {Q}}}_\alpha ^1\cup \ldots .\)

Assume there is a cube Q with \({{\,\mathrm{l}\,}}(Q)^\alpha f_Q>\varepsilon \) such that for all \(P\in {\tilde{\mathcal {Q}}}_\alpha \) with \(Q\subset {{\,\mathrm{prt}\,}}(P)\) we have \(f_Q>2^df_P\). Then by formula (3.1) there is a maximal such cube Q. Furthermore there is a smallest \(P\in {\tilde{\mathcal {Q}}}_\alpha \) with \(Q\subset {{\,\mathrm{prt}\,}}(P)\) and an n with \(P\in {\tilde{\mathcal {Q}}}_\alpha ^n\). But then Q is a maximal cube that satisfies formula (3.2), which implies \(Q\in {\tilde{\mathcal {Q}}}_\alpha ^{n+1}\), a contradiction.

If for \(Q,P\in {\tilde{\mathcal {Q}}}_\alpha \) neither (1) nor (2) holds, then after renaming we have \({{\,\mathrm{prt}\,}}(Q)\subsetneq {{\,\mathrm{prt}\,}}(P)\). Then P has been added to \({\tilde{\mathcal {Q}}}_\alpha \) before Q, and since \(Q\subset {{\,\mathrm{prt}\,}}(P)\) this means \(f_Q>2^df_P\). \(\square \)

Lemma 3.8

Let \(1\le p<\infty \) and \(f\in W^{1,p}({\mathbb {R}}^d)\) and let \(\varepsilon >0\). Let \(\mathcal {Q}\subset \mathcal {Q}_0\) be a set of dyadic cubes such that

  (1) for each \(Q\in \mathcal {Q}\) there is an ancestor cube \(p(Q)\supsetneq Q\) with \({{\,\mathrm{l}\,}}(p(Q))\le {{\,\mathrm{l}\,}}(Q)/\varepsilon \) and \(f_Q>2^\varepsilon f_{p(Q)},\)

  (2) and for any two distinct \(Q,P\in \mathcal {Q}\) such that p(Q) and p(P) intersect we have \(f_Q/f_P\not \in (2^{-\varepsilon },2^\varepsilon )\).

Then

$$\begin{aligned} \Biggl ( \sum _{Q\in \mathcal {Q}} ({{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q)^p \Biggr )^{\frac{1}{p}} \lesssim _\varepsilon \Vert \nabla f\Vert _p . \end{aligned}$$

The endpoint \(p=\infty \) holds as well.

Proof

We divide into two types of cubes and deal with them separately. Denote

$$\begin{aligned} \mathcal {Q}_-&= \{Q\in \mathcal {Q}: {\mathcal {L}}(\{f>2^{-\varepsilon /3}f_Q\}\cap Q)<2^{-d-2}{\mathcal {L}}(Q) \} , \\ \mathcal {Q}_+&= \{Q\in \mathcal {Q}: {\mathcal {L}}(\{f>2^{-\varepsilon /3}f_Q\}\cap Q)\ge 2^{-d-2}{\mathcal {L}}(Q) \} . \end{aligned}$$

Let \(Q\in \mathcal {Q}_-\) and recall \(\lambda _Q^{\mathcal {Q}}\) from Proposition 3.4. Then since

$$\begin{aligned} \inf \{\lambda :{\mathcal {L}}(\{f>\lambda \}\cap Q)<2^{-d-2}{\mathcal {L}}(Q)\}&\le 2^{-\varepsilon /3}f_Q , \\ \sup \{f_P:P\in \mathcal {Q},\ P\supsetneq Q\}&\le 2^{-\varepsilon }f_Q \end{aligned}$$

we have

$$\begin{aligned} f_Q-\lambda _Q^{\mathcal {Q}} \ge (1-2^{-\varepsilon /3})f_Q . \end{aligned}$$

Since \(\mathcal {Q}\subset \mathcal {Q}_0\) we conclude from Proposition 3.4

$$\begin{aligned} \sum _{Q\in \mathcal {Q}_-} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p \le (1-2^{-\varepsilon /3})^{-p} \sum _{Q\in \mathcal {Q}_-} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1} (f_Q-\lambda _Q^{\mathcal {Q}}) \Bigr )^p \lesssim _{\varepsilon ,p} \Vert \nabla f\Vert _p^p . \end{aligned}$$

Let \(Q\in \mathcal {Q}_+\) and \(\lambda >2^{-2\varepsilon /3}f_Q\). Since by (1) we have \(2^{\varepsilon /3}f_{p(Q)}<2^{-2\varepsilon /3}f_Q\), we obtain from Chebyshev’s inequality

$$\begin{aligned} {\mathcal {L}}(p(Q)\cap \{f>\lambda \}) \le 2^{-\varepsilon /3}{\mathcal {L}}(p(Q)) . \end{aligned}$$
(3.3)
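
Formula (3.3) can be unpacked as follows. This is a sketch that uses only Chebyshev's inequality on \(p(Q)\) and condition (1), for nonnegative f with \(f_Q>0\) and \(\lambda >2^{-2\varepsilon /3}f_Q\):

$$\begin{aligned} {\mathcal {L}}(p(Q)\cap \{f>\lambda \}) \le \frac{1}{\lambda }\int _{p(Q)}f = \frac{f_{p(Q)}}{\lambda }{\mathcal {L}}(p(Q)) < \frac{2^{-\varepsilon }f_Q}{2^{-2\varepsilon /3}f_Q}{\mathcal {L}}(p(Q)) = 2^{-\varepsilon /3}{\mathcal {L}}(p(Q)) . \end{aligned}$$

In particular the complement of \(\{f>\lambda \}\) occupies at least the fraction \(1-2^{-\varepsilon /3}\) of \(p(Q)\).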

Since \(Q\in \mathcal {Q}_+\), for \(\lambda <2^{-\varepsilon /3}f_Q\) we have

$$\begin{aligned} 2^{-d-2}\varepsilon ^d{\mathcal {L}}(p(Q)) \le 2^{-d-2}{\mathcal {L}}(Q) \le {\mathcal {L}}(Q\cap \{f>\lambda \}) \le {\mathcal {L}}(p(Q)\cap \{f>\lambda \}) . \end{aligned}$$
(3.4)

So for all \( 2^{-2\varepsilon /3}f_Q \le \lambda \le 2^{-\varepsilon /3}f_Q \) we can conclude by the isoperimetric inequality Lemma 3.3 and formulas (3.3) and (3.4) that

$$\begin{aligned} {\mathcal {H}}^{d-1}(\partial _*{\{f>\lambda \}}\cap p(Q))^d&\gtrsim \min \{{\mathcal {L}}(p(Q)\cap \{f>\lambda \}),{\mathcal {L}}(p(Q){\setminus }\{f>\lambda \})\}^{d-1} \\&\ge ({\mathcal {L}}(p(Q))\min \{\varepsilon ^d2^{-d-2},1-2^{-\varepsilon /3}\})^{d-1} \\&\gtrsim _\varepsilon {\mathcal {L}}(p(Q))^{d-1} . \end{aligned}$$

Thus for each \(Q\in \mathcal {Q}_+\) by Lemma 3.2 and Hölder’s inequality we have

$$\begin{aligned} \int _{2^{-2\varepsilon /3}f_Q}^{2^{-\varepsilon /3}f_Q} {{\,\mathrm{l}\,}}(p(Q))^{d-1} \mathop {}\!{\mathrm {d}}\lambda&\lesssim _\varepsilon \int _{2^{-2\varepsilon /3}f_Q}^{2^{-\varepsilon /3}f_Q} {\mathcal {H}}^{d-1}(\partial _*{\{f>\lambda \}}\cap p(Q)) \mathop {}\!{\mathrm {d}}\lambda \\&= \int _{x\in p(Q):f(x)\in (2^{-2\varepsilon /3},2^{-\varepsilon /3})f_Q} |\nabla f| \\&\le {{\,\mathrm{l}\,}}(p(Q))^{d-\frac{d}{p}} \Biggl ( \int _{x\in p(Q):f(x)\in (2^{-2\varepsilon /3},2^{-\varepsilon /3})f_Q} |\nabla f|^p \Biggr )^{\frac{1}{p}} . \end{aligned}$$

Now we use (2) and conclude

$$\begin{aligned} \sum _{Q\in \mathcal {Q}_+} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p&\lesssim _{\varepsilon ,p} \sum _{Q\in \mathcal {Q}_+} \Bigl ( {{\,\mathrm{l}\,}}(p(Q))^{\frac{d}{p}-1}f_{p(Q)} \Bigr )^p \\&\lesssim _{\varepsilon ,p} \sum _{Q\in \mathcal {Q}_+} \Biggl ( {{\,\mathrm{l}\,}}(p(Q))^{\frac{d}{p}-d} \int _{2^{-2\varepsilon /3}f_Q}^{2^{-\varepsilon /3}f_Q} {{\,\mathrm{l}\,}}(p(Q))^{d-1} \mathop {}\!{\mathrm {d}}\lambda \Biggr )^p \\&\lesssim _{\varepsilon ,p} \sum _{Q\in \mathcal {Q}_+} \int _{x\in p(Q):f(x)\in (2^{-2\varepsilon /3},2^{-\varepsilon /3})f_Q} |\nabla f|^p \\&\le \int |\nabla f|^p . \end{aligned}$$

For \(p=1\) with \({{\,\mathrm{var}\,}}f\) instead of \(\Vert \nabla f\Vert _1\) we use Lemma 3.1 instead of Lemma 3.2 and Hölder’s inequality. For \(p=\infty \) let \(Q\in \mathcal {Q}\). Then by the Sobolev-Poincaré inequality we have

$$\begin{aligned} \Vert \nabla f\Vert _\infty \ge \Vert \nabla f\Vert _{L^\infty (p(Q))}&\gtrsim {{\,\mathrm{l}\,}}(p(Q))^{-d-1} \int _{p(Q)}|f-f_{p(Q)}| \\&\ge {{\,\mathrm{l}\,}}(Q)^{-d-1} \varepsilon ^{d+1} \int _Q|f-f_{p(Q)}| \\&\ge {{\,\mathrm{l}\,}}(Q)^{-d-1} \varepsilon ^{d+1} \int _Qf-f_{p(Q)} \\&= {{\,\mathrm{l}\,}}(Q)^{-1} \varepsilon ^{d+1} (f_Q-f_{p(Q)}) \\&\ge {{\,\mathrm{l}\,}}(Q)^{-1} \varepsilon ^{d+1}(1-2^{-\varepsilon }) f_Q . \end{aligned}$$

\(\square \)

Proof of Theorem 1.5

Let \(\varepsilon >0\) and \({\tilde{\mathcal {Q}}}_\alpha \) be the set of cubes from Lemma 3.7. Let \(Q\in \mathcal {Q}_\alpha \) with \({{\,\mathrm{l}\,}}(Q)^\alpha f_Q>\varepsilon \). Then there is a \(P\in {\tilde{\mathcal {Q}}}_\alpha \) with \(Q\subset {{\,\mathrm{prt}\,}}(P)\) and \(f_Q\le 2^d f_P\). Since \(f_P\le 2^df_{{{\,\mathrm{prt}\,}}(P)}\), this gives \(f_Q\le 4^d f_{{{\,\mathrm{prt}\,}}(P)}\). Thus since \({{\,\mathrm{l}\,}}(Q)^\alpha f_Q>{{\,\mathrm{l}\,}}({{\,\mathrm{prt}\,}}(P))^\alpha f_{{{\,\mathrm{prt}\,}}(P)}\) we have \({{\,\mathrm{l}\,}}(Q)>4^{-d/\alpha }{{\,\mathrm{l}\,}}({{\,\mathrm{prt}\,}}(P))\). Thus for each P there are at most \(c_\alpha \) many \(Q\in \mathcal {Q}_\alpha \) with \(Q\subset {{\,\mathrm{prt}\,}}(P)\) and \(f_Q\le 2^df_P\). We conclude

$$\begin{aligned} \sum _{Q\in \mathcal {Q}_\alpha ,{{\,\mathrm{l}\,}}(Q)^\alpha f_Q>\varepsilon } \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p&\le \sum _{P\in {\tilde{\mathcal {Q}}}_\alpha } \sum _{Q\in \mathcal {Q}_\alpha ,\ Q\subset {{\,\mathrm{prt}\,}}(P),\ f_Q\le 2^df_P} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p \\&\lesssim _{\alpha ,p} c_\alpha \sum _{P\in {\tilde{\mathcal {Q}}}_\alpha } \Bigl ( {{\,\mathrm{l}\,}}(P)^{\frac{d}{p}-1}f_P \Bigr )^p . \end{aligned}$$
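
One admissible value of the constant \(c_\alpha \) can be made explicit. As a sketch, for \(\alpha >0\): a dyadic cube \(Q\subset {{\,\mathrm{prt}\,}}(P)\) has side length \(2^{-k}{{\,\mathrm{l}\,}}({{\,\mathrm{prt}\,}}(P))\) with an integer \(k\ge 0\), the bound \({{\,\mathrm{l}\,}}(Q)>4^{-d/\alpha }{{\,\mathrm{l}\,}}({{\,\mathrm{prt}\,}}(P))\) forces \(k<2d/\alpha \), and \({{\,\mathrm{prt}\,}}(P)\) contains exactly \(2^{kd}\) dyadic cubes of that side length. Hence

$$\begin{aligned} c_\alpha \le \sum _{0\le k<2d/\alpha }2^{kd} \le 2^{d\lceil 2d/\alpha \rceil } . \end{aligned}$$

Only the finiteness of this count and its dependence on \(\alpha \) and d alone are used in what follows.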

For each dyadic cube \(P\in \{{{\,\mathrm{prt}\,}}(Q):Q\in {\tilde{\mathcal {Q}}}_\alpha \}\) pick a \(Q\in {\tilde{\mathcal {Q}}}_\alpha \) with \(P={{\,\mathrm{prt}\,}}(Q)\) such that for all \(Q'\in {\tilde{\mathcal {Q}}}_\alpha \) with \(P={{\,\mathrm{prt}\,}}(Q')\) we have \(f_{Q'}\le f_Q\). Denote by \({\hat{\mathcal {Q}}}_\alpha \) the set of all such dyadic cubes Q. Then

$$\begin{aligned} \sum _{Q\in {\tilde{\mathcal {Q}}}_\alpha } \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p&\le \sum _{P\in \{{{\,\mathrm{prt}\,}}(Q):Q\in {\tilde{\mathcal {Q}}}_\alpha \}} \sum _{Q\in {\tilde{\mathcal {Q}}}_\alpha :P={{\,\mathrm{prt}\,}}(Q)} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p \\&\le \sum _{P\in \{{{\,\mathrm{prt}\,}}(Q):Q\in {\tilde{\mathcal {Q}}}_\alpha \}} 2^d\sum _{Q\in {\hat{\mathcal {Q}}}_\alpha :P={{\,\mathrm{prt}\,}}(Q)} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p \\&= 2^d\sum _{Q\in {\hat{\mathcal {Q}}}_\alpha } \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p. \end{aligned}$$

We want to show that Lemma 3.8 applies to \({\hat{\mathcal {Q}}}_\alpha \) with \(p(Q)={{\,\mathrm{prt}\,}}(Q)\). Since \({\hat{\mathcal {Q}}}_\alpha \subset \mathcal {Q}_\alpha \) we have \({\hat{\mathcal {Q}}}_\alpha \subset \mathcal {Q}_0\) by Remark 3.5, and (1) follows from \(f_Q>2^\alpha f_{{{\,\mathrm{prt}\,}}(Q)}\). For (2) let \(Q,P\in {\hat{\mathcal {Q}}}_\alpha \) be distinct such that \({{\,\mathrm{prt}\,}}(Q)\) and \({{\,\mathrm{prt}\,}}(P)\) intersect. Since we have \({{\,\mathrm{prt}\,}}(Q)\ne {{\,\mathrm{prt}\,}}(P)\), Lemma 3.7 implies \(f_Q/f_P\not \in (2^{-d},2^d)\). Thus by Lemma 3.8 we have

$$\begin{aligned} 2^d\sum _{Q\in {\hat{\mathcal {Q}}}_\alpha } \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p \lesssim _{\alpha ,p} \Vert \nabla f\Vert _p^p . \end{aligned}$$

We have proven for every \(\varepsilon >0\) that

$$\begin{aligned} \sum _{Q\in \mathcal {Q}_\alpha ,{{\,\mathrm{l}\,}}(Q)^\alpha f_Q>\varepsilon } \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p \lesssim _{\alpha ,p} \Vert \nabla f\Vert _p^p \end{aligned}$$

with constant independent of \(\varepsilon \). So we can let \(\varepsilon \) go to zero and conclude Theorem 1.5.

For the endpoint \(p=\infty \) let \(Q\in \mathcal {Q}_\alpha \). Then we use \(f_{{{\,\mathrm{prt}\,}}(Q)}\le 2^{-\alpha }f_Q\) and copy the proof of the endpoint in Lemma 3.8 with \(p(Q)={{\,\mathrm{prt}\,}}(Q)\) and \(\varepsilon =1/2\). \(\square \)

4 Hardy–Littlewood maximal operator

In this section we prove Theorem 1.2.

4.1 Making the balls disjoint

Lemma 4.1

Let \({\mathrm {M}}_\alpha \in \{\mathrm M^{{\mathrm {c}}}_\alpha ,\widetilde{{\mathrm {M}}}_\alpha \}\) and \(1\le p<d/(1+\alpha +\beta )\) and \(f\in L^p({\mathbb {R}}^d)\) and let \(\varepsilon >0\). Then for any \(c_1\ge 2,c_2\ge 1\) there is a set of balls \({\widetilde{\mathcal {B}}}\subset \mathcal {B}_\alpha \) such that for any two distinct balls \(B,A\in {\widetilde{\mathcal {B}}}\) we have \(c_1B\cap c_1A=\emptyset \) or \(f_A/f_B\not \in (c_2^{-1},c_2),\) and furthermore

$$\begin{aligned}&\int _\varepsilon ^\infty \lambda ^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}-1} {\mathcal {L}}\Bigl ( \bigcup \bigl \{ B\in \mathcal {B}_\alpha :r(B)^{\alpha +\beta }f_B>\lambda \bigr \} \Bigr ) \mathop {}\!{\mathrm {d}}\lambda \\&\quad \lesssim _{\alpha ,\beta ,p,c_1,c_2} \biggl ( \sum _{B\in {\widetilde{\mathcal {B}}}} \Bigl ( r(B)^{\frac{d}{p}-1}f_B \Bigr )^p \biggr )^{(1-p(1+\alpha +\beta )/d)^{-1}} . \end{aligned}$$

Proof

Let \(B\in \mathcal {B}_\alpha \) with \(r(B)^{\alpha +\beta }f_B>\varepsilon \). Then

$$\begin{aligned} \varepsilon < r(B)^{\alpha +\beta } f_B \le r(B)^{\alpha +\beta }{\mathcal {L}}(B)^{-1}{\mathcal {L}}(B)^{1-1/p} \Bigl (\int _B f^p\Bigr )^{1/p} \le \sigma _d^{-1/p} r(B)^{\alpha +\beta -d/p} \Vert f\Vert _p , \end{aligned}$$

which means that r(B) is bounded by

$$\begin{aligned} K = (\sigma _d^{-1/p}\Vert f\Vert _p/\varepsilon )^{1/(d/p-\alpha -\beta )} . \end{aligned}$$
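
The value of K is obtained by solving the preceding chain of inequalities for \(r(B)\). A sketch of this step, using that \(p<d/(1+\alpha +\beta )\) gives \(d/p-\alpha -\beta >1>0\):

$$\begin{aligned} \varepsilon< \sigma _d^{-1/p}r(B)^{\alpha +\beta -d/p}\Vert f\Vert _p \ \Longleftrightarrow \ r(B)^{d/p-\alpha -\beta }< \sigma _d^{-1/p}\Vert f\Vert _p/\varepsilon \ \Longleftrightarrow \ r(B) < K . \end{aligned}$$

The positivity of the exponent \(d/p-\alpha -\beta \) is what allows taking the root without reversing the inequality.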

Define \(\mathcal {B}^0=\{B\in \mathcal {B}_\alpha :r(B)\in [1/2,1]K\}\). Then for all \(B\in \mathcal {B}^0\) we have that \(r(B)^\alpha f_B\) is uniformly bounded. Inductively define a sequence of balls as follows. For \(B_0,\ldots ,B_{k-1}\) already defined choose a ball \(B_k\in \mathcal {B}^0\) such that \(c_1B_k\) is disjoint from \(c_1B_0,\ldots ,c_1B_{k-1}\) and which attains at least half of

$$\begin{aligned} \sup \{ f_B: B\in \mathcal {B}^0,c_1B\cap (c_1B_0\cup \ldots \cup c_1B_{k-1})=\emptyset \} \end{aligned}$$

if one exists. Set \(\widetilde{\mathcal {B}^0}=\{B_0,B_1,\ldots \}\). Then for all \(B\in \mathcal {B}^0\) we have that \(c_1B\) intersects \(\bigcup \{c_1B:B\in \widetilde{\mathcal {B}^0}\}\). Define

$$\begin{aligned} \overline{\mathcal {B}^0}=\{B(x,r)\in \mathcal {B}_\alpha :\exists A\in \widetilde{\mathcal {B}^0}\ A\subset B(x,5c_1r(A)),\ f_{B(x,r)}\le c_2f_A\} . \end{aligned}$$

Then \(\mathcal {B}^0\subset \overline{\mathcal {B}^0}\). We proceed by induction. For each \(n\in {\mathbb {N}}\) define

$$\begin{aligned} \mathcal {B}^n = \bigl \{ B\in \mathcal {B}_\alpha {\setminus }(\overline{\mathcal {B}^0}\cup \ldots \cup \overline{\mathcal {B}^{n-1}}):r(B)\in [1/2,1]2^{-n}K \bigr \} , \end{aligned}$$

as above greedily select a sequence \(\widetilde{\mathcal {B}^n}\) of balls \(B\in \mathcal {B}^n\) with almost maximal \(f_B\) such that for every already selected \(A\in \widetilde{\mathcal {B}^n}\) we have \(c_1B\cap c_1A=\emptyset \), and define

$$\begin{aligned} \overline{\mathcal {B}^n} = \bigl \{ B(x,r)\in \mathcal {B}_\alpha :\exists A\in \widetilde{\mathcal {B}^n}\ A\subset B(x,5c_1r(A)),\ f_{B(x,r)}\le c_2f_A \bigr \} . \end{aligned}$$

Note that we have \(\mathcal {B}^n\subset \overline{\mathcal {B}^n}\). Finally set \({\widetilde{\mathcal {B}}}=\widetilde{\mathcal {B}^0}\cup \widetilde{\mathcal {B}^1}\cup \ldots \). For \(A\in {\widetilde{\mathcal {B}}}\), we denote

$$\begin{aligned} U_{A,\lambda } = \bigl \{ B(x,r)\in \mathcal {B}_\alpha : A\subset B(x,5c_1r(A)),\ f_{B(x,r)}\le c_2f_A,r^{\alpha +\beta } f_{B(x,r)}>\lambda \bigr \} . \end{aligned}$$

Let \(\lambda >\varepsilon \) and \(B\in \mathcal {B}_\alpha \) with \(r(B)^{\alpha +\beta }f_B>\lambda \). Then there is an n with \(B\in \overline{\mathcal {B}^n}\), and hence an \(A\in \widetilde{\mathcal {B}^n}\) with \(B\in U_{A,\lambda }\). Let \(A\in {\widetilde{\mathcal {B}}}\) and \(B(x,r)\in U_{A,\lambda }\). Then \(A\subset B(x,5c_1r(A))\). Since \(r\in R_\alpha f(x)\) we have

$$\begin{aligned} r^\alpha f_{B(x,r)} \ge (5c_1r(A))^\alpha f_{B(x,5c_1r(A))} \ge (5c_1r(A))^\alpha (5c_1)^{-d} f_A \end{aligned}$$

which implies

$$\begin{aligned} r \ge (5c_1)^{1-d/\alpha }r(A)(f_A/f_{B(x,r)})^{1/\alpha } \ge (5c_1)^{1-d/\alpha }c_2^{-1/\alpha }r(A) , \end{aligned}$$

where the second inequality uses \(f_{B(x,r)}\le c_2f_A\). Since \(r\le 5c_1r(A)\) it follows that

$$\begin{aligned} r^\beta \le r(A)^\beta {\left\{ \begin{array}{ll} (5c_1)^\beta , &{} \beta \ge 0 , \\ (5c_1)^{\beta -d\beta /\alpha }c_2^{-\beta /\alpha }, &{} \beta <0 . \end{array}\right. } \end{aligned}$$

Together with

$$\begin{aligned} r^\alpha f_{B(x,r)} \le (5c_1r(A))^\alpha c_2f_A \end{aligned}$$

we obtain

$$\begin{aligned} r^{\alpha +\beta } f_{B(x,r)} \le c_3 r(A)^{\alpha +\beta }f_A , \end{aligned}$$

where

$$\begin{aligned} c_3 = {\left\{ \begin{array}{ll} (5c_1)^{\alpha +\beta } c_2, &{} \beta \ge 0 , \\ (5c_1)^{\alpha +\beta -d\beta /\alpha }c_2^{1-\beta /\alpha }, &{} \beta <0. \end{array}\right. } \end{aligned}$$

Thus \(U_{A,\lambda }\) is only nonempty if

$$\begin{aligned} \lambda < c_3 r(A)^{\alpha +\beta }f_A . \end{aligned}$$

We can conclude

$$\begin{aligned}&\int _\varepsilon ^\infty \lambda ^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}-1} {\mathcal {L}}\Bigl ( \bigcup \{B\in \mathcal {B}_\alpha :r(B)^{\alpha +\beta }f_B>\lambda \} \Bigr ) \mathop {}\!{\mathrm {d}}\lambda \\&\quad = \int _\varepsilon ^\infty \lambda ^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}-1} {\mathcal {L}}\Bigl ( \bigcup _{A\in {\widetilde{\mathcal {B}}}} \bigcup U_{A,\lambda } \Bigr ) \mathop {}\!{\mathrm {d}}\lambda \\&\quad \le \sum _{A\in {\widetilde{\mathcal {B}}}} \int _\varepsilon ^\infty \lambda ^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}-1} {\mathcal {L}}\Bigl ( \bigcup U_{A,\lambda } \Bigr ) \mathop {}\!{\mathrm {d}}\lambda \\&\quad = \sum _{A\in {\widetilde{\mathcal {B}}}} \int _\varepsilon ^{c_3 r(A)^{\alpha +\beta }f_A} \lambda ^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}-1} {\mathcal {L}}\Bigl ( \bigcup U_{A,\lambda } \Bigr ) \mathop {}\!{\mathrm {d}}\lambda \\&\quad \le \sum _{A\in {\widetilde{\mathcal {B}}}} (5c_1)^d{\mathcal {L}}(A) \int _\varepsilon ^{c_3 r(A)^{\alpha +\beta }f_A} \lambda ^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}-1} \mathop {}\!{\mathrm {d}}\lambda \\&\quad \le (1/p-(1+\alpha +\beta )/d) \sum _{A\in {\widetilde{\mathcal {B}}}} (5c_1)^d{\mathcal {L}}(A) \Bigl ( c_3 r(A)^{\alpha +\beta }f_A \Bigr )^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \\&\quad = (1/p-(1+\alpha +\beta )/d) (5c_1)^d c_3^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \sigma _d \sum _{A\in {\widetilde{\mathcal {B}}}} \Bigl ( r(A)^{\frac{d}{p}-1}f_A \Bigr )^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \\&\quad \le (1/p-(1+\alpha +\beta )/d) (5c_1)^d c_3^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \sigma _d \biggl ( \sum _{A\in {\widetilde{\mathcal {B}}}} \Bigl ( r(A)^{\frac{d}{p}-1}f_A \Bigr )^p \biggr )^{(1-p(1+\alpha +\beta )/d)^{-1}} . \end{aligned}$$

\(\square \)
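
The last three lines of the final display in the proof of Lemma 4.1 combine an elementary integral with a rescaling of the measure of A. As a sketch, with the shorthand q and c introduced here only for illustration:

$$\begin{aligned} \frac{1}{q}&= \frac{1}{p}-\frac{1+\alpha +\beta }{d} , \\ \int _\varepsilon ^{c}\lambda ^{q-1}\mathop {}\!{\mathrm {d}}\lambda&\le \int _0^{c}\lambda ^{q-1}\mathop {}\!{\mathrm {d}}\lambda = \frac{c^q}{q} , \qquad c = c_3r(A)^{\alpha +\beta }f_A , \\ {\mathcal {L}}(A)\bigl (r(A)^{\alpha +\beta }f_A\bigr )^q&= \sigma _dr(A)^{d+(\alpha +\beta )q}f_A^q = \sigma _d\bigl (r(A)^{\frac{d}{p}-1}f_A\bigr )^q . \end{aligned}$$

The second line accounts for the prefactor \(1/p-(1+\alpha +\beta )/d=1/q\), and the third uses \(d+(\alpha +\beta )q=q(d/p-1)\).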

4.2 Transfer to dyadic cubes

In this subsection we pass from disjoint balls to dyadic cubes and then conclude Theorem 1.2 using a result from the dyadic setting.

Remark 4.2

There are \(3^d\) dyadic grids \(\mathcal {D}_1,\ldots ,\mathcal {D}_{3^d}\) such that each ball B is contained in a dyadic cube \(Q_B\in \mathcal {D}=\mathcal {D}_1\cup \cdots \cup \mathcal {D}_{3^d}\) with \({{\,\mathrm{l}\,}}(Q_B)\lesssim r(B)\).

Lemma 4.3

Let \({\mathrm {M}}_\alpha \in \{\mathrm M^{{\mathrm {c}}}_\alpha ,\widetilde{{\mathrm {M}}}_\alpha \}\) and \(f\in L^1_{\mathrm {loc}}({\mathbb {R}}^d)\). Then for each \(B\in \mathcal {B}_\alpha \) we have \(f_{Q_B}\sim f_B\) and \({{\,\mathrm{l}\,}}(Q_B)\sim r(B)\).

Proof

Let x be the center of B, and \(Q_B\) be the cube from Remark 4.2, and \(A=B(x,\sqrt{d}{{\,\mathrm{l}\,}}(Q_B))\). Then \(r(B)\sim {{\,\mathrm{l}\,}}(Q_B)\sim r(A)\) and \(f_B\lesssim f_{Q_B}\lesssim f_A\). Since \(B\in \mathcal {B}_\alpha \) we also have \(r(A)^\alpha f_A<r(B)^\alpha f_B\) and therefore conclude \(f_{Q_B}\lesssim f_A\lesssim f_B\). \(\square \)

Lemma 4.4

Let \({\mathrm {M}}_\alpha \in \{\mathrm M^{{\mathrm {c}}}_\alpha ,\widetilde{{\mathrm {M}}}_\alpha \}\) and \(f\in L^1_{\mathrm {loc}}({\mathbb {R}}^d)\). For each \(\alpha >0\), each \(B\in \mathcal {B}_\alpha \) and each cube \(P\supset Q_B\) we have \( {{\,\mathrm{l}\,}}(P)^\alpha f_P \lesssim _\alpha {{\,\mathrm{l}\,}}(Q_B)^\alpha f_{Q_B}.\)

Proof

For x the center of B define \(A=B(x,\sqrt{d}{{\,\mathrm{l}\,}}(P))\). Then from \(f_P\lesssim f_A\) and \(r(A)^\alpha f_A<r(B)^\alpha f_B\) and \(f_B\lesssim f_{Q_B}\), together with \({{\,\mathrm{l}\,}}(P)\le r(A)\) and \(r(B)\le {{\,\mathrm{l}\,}}(Q_B)\), we obtain \( {{\,\mathrm{l}\,}}(P)^\alpha f_P \lesssim r(A)^\alpha f_A < r(B)^\alpha f_B \lesssim _\alpha {{\,\mathrm{l}\,}}(Q_B)^\alpha f_{Q_B} . \) \(\square \)

Proof of Theorem 1.2

For \(B\in \mathcal {B}_\alpha \) denote by \(P_B\) the largest cube that attains \(\max _{P\supset Q_B}f_P\). Then \(P_B\in \mathcal {Q}_0\) and by Lemmas 4.3 and 4.4 we have \({{\,\mathrm{l}\,}}(P_B)\sim _\alpha r(B)\) and \(f_{P_B}\sim _\alpha f_B\). By Lemma 4.4 there further exists a cube \(p(P_B)\supset P_B\) with \(f_{p(P_B)}\le f_{P_B}/2\) and \({{\,\mathrm{l}\,}}(p(P_B))\lesssim _\alpha {{\,\mathrm{l}\,}}(P_B)\).

Let \(\varepsilon >0\) and let \({\widetilde{\mathcal {B}}}\) be the set of balls from Lemma 4.1. By Lemmas 4.3 and 4.4 there are \(c_1,c_2\) such that for any two distinct \(B,A\in {\widetilde{\mathcal {B}}}\) we have that \(p(P_B)\) and \(p(P_A)\) are disjoint or \(f_{P_B}/f_{P_A}\not \in (1/2,2)\). Define \(\mathcal {Q}=\{P_B:B\in {\widetilde{\mathcal {B}}}\}\). By the layer cake formula and Lemmas 4.1 and 4.3 we have

$$\begin{aligned}&\int ({\mathrm {M}}_{\alpha ,\beta }f)^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \\&\quad = {(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \int _0^\infty \lambda ^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}-1} {\mathcal {L}}(\{{\mathrm {M}}_{\alpha ,\beta } f>\lambda \}) \mathop {}\!{\mathrm {d}}\lambda \\&\quad = {(p^{-1}-(1+\alpha +\beta )/d)^{-1}} \lim _{\varepsilon \rightarrow 0} \int _\varepsilon ^\infty \lambda ^{(p^{-1}-(1+\alpha +\beta )/d)^{-1}-1} {\mathcal {L}}\Bigl ( \bigcup \{B\in \mathcal {B}_\alpha :r(B)^{\alpha +\beta }f_B>\lambda \} \Bigr ) \mathop {}\!{\mathrm {d}}\lambda \\&\quad \lesssim _{\alpha ,\beta ,p} \lim _{\varepsilon \rightarrow 0} \biggl ( \sum _{B\in {\widetilde{\mathcal {B}}}} \Bigl ( r(B)^{\frac{d}{p}-1}f_B \Bigr )^p \biggr )^{(1-p(1+\alpha +\beta )/d)^{-1}} \\&\quad \sim _{\alpha ,\beta ,p} \lim _{\varepsilon \rightarrow 0} \biggl ( \sum _{Q\in \mathcal {Q}} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p \biggr )^{(1-p(1+\alpha +\beta )/d)^{-1}} . \end{aligned}$$

For each \(i=1,\ldots ,3^d\) we apply Lemma 3.8 to \(\mathcal {Q}\cap \mathcal {D}_i\) and obtain

$$\begin{aligned} \sum _{Q\in \mathcal {Q}} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p = \sum _{i=1}^{3^d}\sum _{Q\in \mathcal {Q}\cap \mathcal {D}_i} \Bigl ( {{\,\mathrm{l}\,}}(Q)^{\frac{d}{p}-1}f_Q \Bigr )^p \lesssim _{\alpha ,\beta ,p} \Vert \nabla f\Vert _p^p . \end{aligned}$$
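The two preceding displays combine as follows; this is a sketch of the step left implicit, with the shorthand \(q=(p^{-1}-(1+\alpha +\beta )/d)^{-1}\) introduced here only for readability. Since the last bound is uniform in \(\varepsilon \) and \(q/p=(1-p(1+\alpha +\beta )/d)^{-1}\), we obtain

$$\begin{aligned} \int ({\mathrm {M}}_{\alpha ,\beta }f)^q \lesssim _{\alpha ,\beta ,p} \bigl ( \Vert \nabla f\Vert _p^p \bigr )^{q/p} = \Vert \nabla f\Vert _p^q , \qquad \text {that is,}\qquad \Vert {\mathrm {M}}_{\alpha ,\beta }f\Vert _q \lesssim _{\alpha ,\beta ,p} \Vert \nabla f\Vert _p . \end{aligned}$$

This is presumably the bound asserted in Theorem 1.2 in the range \(p<d/(1+\alpha +\beta )\).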

For the endpoint \(p=d/(1+\alpha +\beta )\) we use \(\Vert {\mathrm {M}}_{\alpha ,\beta } f\Vert _\infty =\sup _{B\in \mathcal {B}_\alpha }r(B)^{\alpha +\beta }f_B\). Let \(B\in \mathcal {B}_\alpha \). Then \(f_{2B}\le 2^{-\alpha }f_B\), and by the Sobolev–Poincaré inequality we have

$$\begin{aligned} \Vert \nabla f\Vert _{d/(1+\alpha +\beta )}&\ge \biggl ( \int _{2B}|\nabla f|^{d/(1+\alpha +\beta )} \biggr )^{(1+\alpha +\beta )/d} \\&\gtrsim r(2B)^{\alpha +\beta -d} \int _{2B}|f-f_{2B}| \\&\ge 2^{\alpha +\beta -d} r(B)^{\alpha +\beta -d} \int _B|f-f_{2B}| \\&\ge 2^{\alpha +\beta -d} r(B)^{\alpha +\beta -d} \int _B(f-f_{2B}) \\&= \sigma _d 2^{\alpha +\beta -d} r(B)^{\alpha +\beta } (f_B-f_{2B}) \\&\ge \sigma _d 2^{\alpha +\beta -d} r(B)^{\alpha +\beta } (1-2^{-\alpha }) f_B . \end{aligned}$$
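Since \(B\in \mathcal {B}_\alpha \) was arbitrary, the endpoint bound follows by rearranging and taking the supremum. As a sketch, assuming \(\alpha >0\) so that \(1-2^{-\alpha }>0\):

$$\begin{aligned} \Vert {\mathrm {M}}_{\alpha ,\beta }f\Vert _\infty = \sup _{B\in \mathcal {B}_\alpha }r(B)^{\alpha +\beta }f_B \lesssim \frac{2^{d-\alpha -\beta }}{\sigma _d(1-2^{-\alpha })} \Vert \nabla f\Vert _{d/(1+\alpha +\beta )} . \end{aligned}$$

The implicit constant is the one from the Sobolev–Poincaré inequality.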

\(\square \)