Guide to nonlinear potential estimates

One of the basic achievements in nonlinear potential theory is that the typical linear pointwise estimates via fundamental solutions find a precise analog in the case of nonlinear equations. We give a comprehensive account of this fact and prove new unifying families of potential estimates. We also describe new fine properties of solutions to measure data problems.


A synopsis
The aim of this paper is to give a rather comprehensive introduction to nonlinear potential estimates, i.e., pointwise estimates for solutions to quasilinear, possibly degenerate elliptic equations via linear and nonlinear potentials. The paper contains both new and old results, which fall into two categories. The first consists of results that have been proved elsewhere and that are given here in a different and/or streamlined version. The second contains new results that are presented for the first time. These fill some of the gaps that were leaving the current theory somewhat incomplete.
To ease the reading, the already known results will be stated together with the reference to the corresponding original paper where they have appeared for the first time. The new ones will be presented pointing at the place of this paper where the proof can be found. In general, the first part of the paper is devoted to the presentation of the general setting, with the statements of the main theorems; this goes up to Sect. 8. The remaining parts are instead devoted to the proofs.
We shall start from a presentation of the classical results valid for linear elliptic equations in Sect. 2, and this will serve to give the general guideline to the topics we are going to cover in the nonlinear case. The first potential estimates for nonlinear equations will be introduced in Sect. 3. There the classical pointwise estimates will be presented. In the subsequent Sects. 4 and 5 we shall instead give a class of estimates aimed at unifying the theory. These allow us to frame the classical pointwise potential inequalities in a more general setting, allowing for estimates of both size and oscillations of solutions and their derivatives, including fractional derivatives. At this stage we shall use "intermediate" Wolff and Riesz potentials and various fractional maximal operators. In particular, we prove Theorem 10, whose role is to unify estimates for $u$ and $Du$ via a family of estimates for suitable fractional operators of $u$. We shall eventually add a few remarks on the case of equations with coefficients in Sect. 6. All the results up to Sect. 6 are presented for energy solutions, that is for solutions belonging to the natural energy space $W^{1,p}$ associated to the problems considered. In Sect. 7 we then turn to the case of general measure data problems, that is when very weak solutions come into play. This will give us the opportunity, in Sect. 8, to present a few interesting consequences of potential estimates. Specifically, we shall prove two theorems about the possibility of describing the fine behaviour of solutions to nonlinear measure data problems and of their gradient via potentials. These theorems are the nonlinear analogs of classical results about fine properties of solutions to linear equations, which are usually derived via linear potentials; see Remark 3 below. In Sect. 9 we then gather a series of regularity results that follow as corollaries of the theory presented.
Some of these results are well-known and they are now framed in the general and unifying context of nonlinear potential estimates. The remaining sections are devoted to the proofs of the results introduced in the preceding ones, and their titles are self-explanatory. In this paper we are not going to deal with parabolic problems, for which we refer to [42][43][44].
Before starting, we find it useful to establish some notation. In what follows we denote by $c$ a general positive constant, possibly varying from line to line; special occurrences will be denoted by $c_1, c_2, \bar c_1, \bar c_2$ or the like. All these constants will always be larger than or equal to one; moreover relevant dependencies on parameters will be emphasized using parentheses, i.e., $c_1 \equiv c_1(n, p, \nu, L)$ means that $c_1$ depends only on $n, p, \nu, L$. We denote by $B(x, r) \equiv B_r(x) := \{\tilde x \in \mathbb{R}^n : |\tilde x - x| < r\}$ the open ball with center $x$ and radius $r > 0$. When not important we shall omit denoting the center as follows: $B_r \equiv B(x, r)$. Moreover, with $B$ being a generic ball with radius $r$, we will denote by $\sigma B$ the ball concentric to $B$ having radius $\sigma r$, $\sigma > 0$. Unless otherwise stated, different balls in the same context will have the same center. The symbol $\Omega$ will denote a bounded open subset of $\mathbb{R}^n$ and we shall always consider the case $n \geq 2$. With $O \subset \mathbb{R}^n$ being a measurable subset with positive measure, and with $g : O \to \mathbb{R}^k$, $k \geq 1$, being a measurable map, we shall denote by $(g)_O := \fint_O g(x)\,dx := \frac{1}{|O|}\int_O g(x)\,dx$ its integral average; here $|O|$ denotes the Lebesgue measure of $O$. The map $g$ will typically be a gradient. We shall denote $|g|(O) := \int_O |g|\,dx$, thereby adopting a unified notation for both measures and integrals of $L^1$-maps. In the following, $\mu$ will always denote a Borel measure with finite total mass, which is initially defined on a certain open subset $\Omega \subset \mathbb{R}^n$. Since this will not affect the rest, with no loss of generality, all such measures will be considered as defined in the whole $\mathbb{R}^n$ so that $|\mu|(\mathbb{R}^n) < \infty$. We shall denote by $M_b$ the space of all such measures.

From linear to nonlinear
When considering linear elliptic equations, a powerful tool for establishing the qualitative properties of solutions is given by representation formulas via fundamental solutions. Eventually, these lead one to consider linear Riesz potentials and singular integrals. Let us recall the situation for the simplest example, which is of course given by the classical Poisson equation $-\Delta u = \mu$. (1) For the sake of exposition, we initially consider the last equation in the whole $\mathbb{R}^n$, with $n \geq 2$. Here $\mu$, which is eventually taken to be a measure, is again for simplicity assumed to be a smooth and compactly supported function, while $u$ is the unique solution which decays to zero at infinity. The point we are interested in now is that $u$ can be recovered via convolution with the so-called fundamental solution $G(x, y) \approx |x - y|^{2-n}$ if $n > 2$, $G(x, y) \approx -\log|x - y|$ if $n = 2$, (2) where $\approx$ means equality up to a normalization constant, and this means that the following representation formula holds: $u(x) = \int_{\mathbb{R}^n} G(x, y)\,d\mu(y)$. (3) The formula in the last display allows one to shift the study of solutions to the analysis of a related integral operator. It is therefore time for the following: Definition 1 (Riesz potentials) Let $\alpha \in (0, n]$; the linear operator defined by $I_\alpha(\mu)(x) := \int_{\mathbb{R}^n} \frac{d\mu(y)}{|x - y|^{n-\alpha}}$ (4) is called the $\alpha$-Riesz potential of $\mu$, where $\mu$ is a Borel measure defined on $\mathbb{R}^n$. Now, (3) allows us to conclude with the following pointwise potential estimates: $|u(x)| \lesssim I_2(|\mu|)(x)$ and $|Du(x)| \lesssim I_1(|\mu|)(x)$. (5)
The first inequality actually holds in the case $n \geq 3$; the second one follows by differentiating (3). Since the behaviour of Riesz potentials with respect to various relevant function spaces is known (see for instance [1,26]), using (5) allows one to infer size properties of solutions from those of the relevant potentials, and in a sharp way. For instance, the following regularizing property of Riesz potentials is known for every $q > 1$ and $\alpha > 0$ such that $0 < \alpha q < n$: $\|I_\alpha(\mu)\|_{L^{nq/(n-\alpha q)}(\mathbb{R}^n)} \leq c(n, \alpha, q)\,\|\mu\|_{L^q(\mathbb{R}^n)}$.
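The exponent appearing in the regularizing property above can be double-checked by a dilation argument. The following is only a consistency sketch, not part of the original text; the rescaled datum $\mu_\lambda$ and the unknown target exponent $t$ are notation introduced here for illustration.

```latex
% Dilation check for an inequality of the form
%   \| I_\alpha(\mu) \|_{L^t} \le c \, \| \mu \|_{L^q}.
% (\mu_\lambda and t are notation introduced for this check only.)
\[
\mu_\lambda(x) := \mu(\lambda x), \quad \lambda > 0
\;\Longrightarrow\;
I_\alpha(\mu_\lambda)(x) = \lambda^{-\alpha}\, I_\alpha(\mu)(\lambda x).
\]
% Behaviour of the two norms under the dilation:
\[
\|\mu_\lambda\|_{L^q(\mathbb{R}^n)} = \lambda^{-n/q}\,\|\mu\|_{L^q(\mathbb{R}^n)},
\qquad
\|I_\alpha(\mu_\lambda)\|_{L^t(\mathbb{R}^n)}
  = \lambda^{-\alpha - n/t}\,\|I_\alpha(\mu)\|_{L^t(\mathbb{R}^n)}.
\]
% The inequality can hold for every \lambda > 0 only if the powers of \lambda agree:
\[
\alpha + \frac{n}{t} = \frac{n}{q}
\;\Longleftrightarrow\;
t = \frac{nq}{n - \alpha q}, \qquad 0 < \alpha q < n .
\]
```

The condition $0 < \alpha q < n$ is exactly what keeps the resulting exponent $t$ positive and finite.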
A point that is less emphasised concerning estimates (5) is that (3) also allows one to bound oscillations of the solution $u$ in terms of suitable Riesz potentials. This point actually turns out to be of primary interest for us here. To see this, by using the elementary inequality $\big| |x - \tilde x|^{2-n} - |y - \tilde x|^{2-n} \big| \leq c(n) \big[ |x - \tilde x|^{2-n-\alpha} + |y - \tilde x|^{2-n-\alpha} \big] |x - y|^{\alpha}$, which is valid whenever $x, y, \tilde x \in \mathbb{R}^n$, we get the estimate $|u(x) - u(y)| \leq c \big[ I_{2-\alpha}(|\mu|)(x) + I_{2-\alpha}(|\mu|)(y) \big] |x - y|^{\alpha}$, $0 \leq \alpha \leq 1$. (9) The previous estimate allows one to infer, again via potentials, information on the solutions in function spaces measuring smoothness rather than size. For instance, the Hölder continuity criterion $I_{2-\alpha}(|\mu|) \in L^\infty \Rightarrow u \in C^{0,\alpha}$ is immediate. Again, prescribing that $I_{2-\alpha}(|\mu|) \in L^q$ implies that $u$ belongs to the so-called Calderón space $C^\alpha_q$; see Definition 2 below, and [17,40], for a discussion. In a similar way one gets estimates for the gradient: $|Du(x) - Du(y)| \leq c \big[ I_{1-\alpha}(|\mu|)(x) + I_{1-\alpha}(|\mu|)(y) \big] |x - y|^{\alpha}$, $0 \leq \alpha < 1$. (10) As a matter of fact, formulas (9)-(10) can be read as nonlocal, fractional derivative versions of the classical estimates in (5). This is clarified by the following definition: Definition 2 (Calderón spaces) Let $\alpha \in (0, 1]$, $q \geq 1$, and let $\Omega \subset \mathbb{R}^n$ be an open subset. A measurable function $v$, finite a.e. in $\Omega$, belongs to the Calderón space $C^\alpha_q(\Omega)$ if and only if there exists a nonnegative function $m \in L^q(\Omega)$ such that $|v(x) - v(y)| \leq [m(x) + m(y)]\,|x - y|^{\alpha}$ (11) holds for almost every couple $(x, y) \in \Omega \times \Omega$.
This is just another way to say that $v$ has "fractional derivatives". The advantage is that nonlocality is reduced to a minimal status: only two points are considered in (11). Moreover, when $\Omega$ is a suitably regular domain, Calderón spaces are closely related to the usual fractional Sobolev spaces $W^{\alpha,q}$; see [17]. The function $m$ plays in fact the role of a fractional derivative of $v$ of order $\alpha$ in the $L^q$-sense. Definition 2 is implicit in the work of DeVore and Sharpley [17], where the authors fix the canonical choice $m = M^\#_\alpha(v)$ that is indeed always possible in (11) (see Proposition 1 below). The symbol $M^\#_\alpha(v)$ denotes the standard fractional sharp maximal operator introduced in the following (take $\Omega = \mathbb{R}^n$ and $R = \infty$): Definition 3 Let $\beta \in [0, 1]$, $x \in \Omega$ and $R < \operatorname{dist}(x, \partial\Omega)$, and let $f \in L^1(\Omega)$; the function defined by $M^\#_{\beta,R}(f)(x) := \sup_{0 < r \leq R} r^{-\beta} \fint_{B(x,r)} |f - (f)_{B(x,r)}|\,d\xi$ is called the restricted (centered) sharp fractional maximal function of $f$.
Notice that for $\beta = 0$ the previous definition gives back the classical sharp maximal operator of Fefferman and Stein (in the restricted version), which is in fact denoted by $M^\#_R(f)$. Definition 2 and the interpretation of $m$ as the $\alpha$-order fractional derivative of $v$ allow us to read the inequalities in (9)-(10) as a way to bound fractional derivatives of $u$ via intermediate Riesz potentials. We may express this concept in the following imprecise yet suggestive way: $|\partial^\alpha u(x)| \lesssim I_{2-\alpha}(|\mu|)(x)$, $0 \leq \alpha \leq 1$. (12) Needless to say, the last formula has only a symbolic meaning. At this point it is clear that the classical estimates (5) embed in the family of interpolating estimates (12) as particular borderline cases corresponding to the choices $\alpha = 0$ and $\alpha = 1$, respectively. In turn, the estimates in (12) provide a unifying approach to the regularity of solutions to (1). They allow one to get results both in spaces aimed at measuring the size of functions and in those measuring smoothness.
One of the aims of this paper is to show that completely similar estimates actually hold for a large class of quasilinear, possibly degenerate equations. A potential theory which is completely analogous to the linear one can be constructed in the nonlinear case too.
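To make the passage from the oscillation estimate (9) to function spaces concrete, here is a short worked derivation of the two criteria mentioned above. It is a sketch under the assumption that (9) has the same two-point form as the gradient estimate (10), with $I_{2-\alpha}$ in place of $I_{1-\alpha}$; the bound $M$ is notation introduced here.

```latex
% Assumed form of (9), by analogy with the displayed estimate (10):
\[
|u(x) - u(y)| \le c\,\big[ I_{2-\alpha}(|\mu|)(x) + I_{2-\alpha}(|\mu|)(y) \big]\,|x-y|^{\alpha}.
\]
% Hoelder criterion: if \| I_{2-\alpha}(|\mu|) \|_{L^\infty} \le M
% (M is notation introduced for this sketch), then
\[
|u(x) - u(y)| \le 2cM\,|x-y|^{\alpha}
\quad\Longrightarrow\quad u \in C^{0,\alpha}.
\]
% Calderon criterion: the choice m := c\, I_{2-\alpha}(|\mu|) is admissible
% in the two-point inequality (11) of Definition 2, hence
\[
I_{2-\alpha}(|\mu|) \in L^q \;\Longrightarrow\; u \in C^{\alpha}_q .
\]
```

In both cases the potential plays the role of the function $m$ in Definition 2, which is why the estimates can be read as bounds on fractional derivatives.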

Basic nonlinear estimates
The main question is now whether and in which sense estimates like (5) and (12) extend to the case of solutions to quasilinear elliptic equations of the type $-\operatorname{div} a(Du) = \mu$. (13) In the most general case, the right hand side $\mu$ is a Borel measure with finite total mass in $\mathbb{R}^n$, i.e. $\mu \in M_b$. The one in (13) is a nonlinear version of the Poisson equation when the assumptions $|\partial a(z)| \leq L$, $\nu |\lambda|^2 \leq \langle \partial a(z)\lambda, \lambda \rangle$ (14) are considered. The numbers $0 < \nu \leq L$ provide bounds for the lowest and highest eigenvalue of the matrix $\partial a(\cdot)$, respectively ($\nu = L = 1$ gives the case of the Laplacean operator). We also want to consider cases where Eq. (13) might be degenerate, thereby examining for instance the so-called $p$-Laplacean operator: $-\operatorname{div}(|Du|^{p-2} Du) = \mu$. (15) In order to catch the essential properties of the equation in (15), we shall consider general vector fields $a : \mathbb{R}^n \to \mathbb{R}^n$. These are assumed to be $C^1$-regular and to satisfy the following growth and ellipticity assumptions: $|a(z)| + |\partial a(z)|(|z|^2 + s^2)^{1/2} \leq L (|z|^2 + s^2)^{(p-1)/2}$, $\nu (|z|^2 + s^2)^{(p-2)/2} |\lambda|^2 \leq \langle \partial a(z)\lambda, \lambda \rangle$ (16) whenever $z, \lambda \in \mathbb{R}^n$. The case $p = 2$ gives back (14). We remark that we confine ourselves to the case $p \geq 2$ as we are mainly interested in presenting the main ideas in the most accessible way; for results in the case $p \leq 2$ we refer to [22,38,40,58]. In the rest of the paper $\nu, L$ are fixed parameters whose role is to establish the rate of ellipticity of the vector field $a(\cdot)$. The role of $s \geq 0$ is more peculiar. This parameter serves to distinguish the degenerate case ($s = 0$) from the nondegenerate one ($s > 0$). In this last case a model is given, for instance taking $s = 1$, by the nondegenerate $p$-Laplacean type equation (see [49]) $-\operatorname{div}\big((1 + |Du|^2)^{(p-2)/2} Du\big) = \mu$. Sometimes we shall also consider equations with measurable coefficients, that is when the vector field $a(\cdot)$ exhibits an explicit dependence on the variable $x$, i.e., $-\operatorname{div} a(x, Du) = \mu$. (17) In this case, we shall use a set of assumptions which is weaker than (16), namely, we shall consider a Carathéodory vector field $a : \Omega \times \mathbb{R}^n \to \mathbb{R}^n$ such that $|a(x, z_1)| \leq L (|z_1|^2 + s^2)^{(p-1)/2}$, $\nu (|z_1|^2 + |z_2|^2 + s^2)^{(p-2)/2} |z_1 - z_2|^2 \leq \langle a(x, z_1) - a(x, z_2), z_1 - z_2 \rangle$ (18) are satisfied whenever $z_1, z_2 \in \mathbb{R}^n$, $x \in \Omega$.
These assumptions are, up to adjusting the constants $\nu, L$ in a universal way, implied by those in (16). Results for the case when $x \mapsto a(x, \cdot)$ is more regular and satisfies (16) uniformly with respect to $x \in \Omega$ are described in Sect. 6 below.
Our primary emphasis will be on a priori estimates. This means that, according to a scheme which is typical in regularity theory, we shall mainly confine ourselves to presenting results in the form of a priori estimates for more regular solutions and data. We are therefore most of the time considering weak energy solutions to (17), that is functions $u \in W^{1,p}(\Omega)$ satisfying $\int_\Omega \langle a(x, Du), D\varphi \rangle\,dx = \int_\Omega \varphi\,d\mu$ (19) for every choice of $\varphi \in C^\infty(\Omega)$ with compact support in $\Omega$. Here we are still assuming that $\mu$ is a Borel measure with finite mass, but we could assume that $\mu \in C^\infty$ as well, stating the same results involving only the total variation of $\mu$, considered as a measure. Results for the original context would then follow via approximation. Such an approach is obviously restrictive when considering general measure data problems. Indeed, observe for instance that due to the monotonicity properties of the operator assumed in (16), considering a solution that belongs to the space $W^{1,p}(\Omega)$ automatically implies that the measure $\mu$ belongs to the dual space $W^{-1,p'}$, which is certainly not the case for a general measure when $p \leq n$. As a matter of fact, in general, distributional solutions to measure data problems do not belong to the natural energy space $W^{1,p}(\Omega)$ and for this reason they are called very weak solutions. Treating the case of general measures then needs greater care, already in specifying the notion of solution one is dealing with. We briefly discuss these aspects in Sects. 7 and 14.4 below, where we shall see how to extend the results presented for energy solutions to the case of general solutions to measure data problems. Since we are going to deal with local results, we shall use in a standard way the truncated version of the classical Riesz potentials. Definition 4 (Truncated Riesz potentials) Let $\mu$ be a Borel measure with finite total mass on $\mathbb{R}^n$; the (truncated) Riesz potential is defined by $I^\mu_\beta(x, R) := \int_0^R \frac{|\mu|(B(x, \rho))}{\rho^{n-\beta}} \frac{d\rho}{\rho}$. The word truncated refers to the inequality $I^\mu_\beta(x, R) \leq c(n) I_\beta(|\mu|)(x)$.
Now, let us go back to the estimates in (5), and let us notice that these cannot hold when $p \neq 2$. Indeed, they clearly do not respect the homogeneity properties of the equation. To see this, we consider a non-null solution to $-\operatorname{div}(|Du|^{p-2} Du) = \mu$ with $p > 2$; then, for $\gamma > 0$, the function $\tilde u := \gamma^{1/(p-1)} u$ solves $-\operatorname{div}(|D\tilde u|^{p-2} D\tilde u) = \gamma \mu =: \tilde\mu$. Assuming now that the first estimate in (5) would hold, this would yield $|\tilde u(x)| \lesssim I_2(|\tilde\mu|)(x)$, that is, recalling the definitions of $\tilde u$, $\tilde\mu$ and $I_2$, $|u(x)| \lesssim \gamma^{(p-2)/(p-1)} I_2(|\mu|)(x)$. Upon letting $\gamma \to 0$ and using that $p > 2$, we would then conclude with $|u(x)| = 0$ whenever $I_2(|\mu|)(x)$ is finite. A similar argument, letting $\gamma \to \infty$, also gives that estimates as in (5) cannot hold in the case $p < 2$. In order to overcome this point one is led to consider a new family of potentials. They are obtained by, in some sense, incorporating in Riesz potentials the scaling properties of the $p$-Laplacean equation.
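The homogeneity argument above can be written out step by step. The following is a sketch only; the explicit normalisation $\tilde u := \gamma^{1/(p-1)} u$ is the natural choice compatible with the exponents quoted in the text and is stated here as an assumption, and $c$ denotes a constant assumed independent of the particular solution.

```latex
% Scaling of the p-Laplacean equation, \gamma > 0.
% (The normalisation of \tilde u below is an assumption of this sketch.)
\[
\tilde u := \gamma^{1/(p-1)} u
\;\Longrightarrow\;
|D\tilde u|^{p-2} D\tilde u
  = \gamma^{(p-2)/(p-1)}\,\gamma^{1/(p-1)}\,|Du|^{p-2} Du
  = \gamma\,|Du|^{p-2} Du ,
\]
\[
-\operatorname{div}\big(|D\tilde u|^{p-2} D\tilde u\big) = \gamma\mu =: \tilde\mu .
\]
% If |\tilde u(x)| \le c\, I_2(|\tilde\mu|)(x) held with c independent of the solution,
% linearity of I_2 in the measure would give:
\[
\gamma^{1/(p-1)}\,|u(x)| \le c\,\gamma\, I_2(|\mu|)(x)
\;\Longleftrightarrow\;
|u(x)| \le c\,\gamma^{(p-2)/(p-1)}\, I_2(|\mu|)(x).
\]
% p > 2: the exponent (p-2)/(p-1) is positive, so \gamma \to 0 forces u(x) = 0.
% p < 2: the exponent is negative, so \gamma \to \infty forces u(x) = 0.
% p = 2: the exponent vanishes and no contradiction arises.
```

The same bookkeeping shows which power is needed to restore homogeneity: a potential that scales like $\gamma^{1/(p-1)}$ when $\mu$ is replaced by $\gamma\mu$.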
Definition 5 (Wolff potentials) Let $\mu$ be a Borel measure with finite total mass on $\mathbb{R}^n$; the nonlinear Wolff potential is defined by $W^\mu_{\beta,p}(x, R) := \int_0^R \left( \frac{|\mu|(B(x, \rho))}{\rho^{n-\beta p}} \right)^{1/(p-1)} \frac{d\rho}{\rho}$, $\beta > 0$. Wolff potentials, which despite their name were first considered and studied in [28], reduce to Riesz potentials when $p = 2$, i.e. $I^\mu_\beta \equiv W^\mu_{\beta/2,2}$. They play a crucial role in nonlinear potential theory and in the description of the fine properties of solutions to nonlinear equations in divergence form [2,3,28,29,32,59,60].
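The identity $I^\mu_\beta \equiv W^\mu_{\beta/2,2}$ can be verified in one line. The check below is a sketch that assumes the standard truncated forms of the two potentials (integrals over $\rho \in (0, R)$ against $d\rho/\rho$); these explicit formulas are an assumption of the sketch.

```latex
% Assumed standard truncated forms of the Wolff and Riesz potentials:
\[
W^{\mu}_{\beta,p}(x,R) := \int_0^R
  \left( \frac{|\mu|(B(x,\rho))}{\rho^{\,n-\beta p}} \right)^{1/(p-1)} \frac{d\rho}{\rho},
\qquad
I^{\mu}_{\beta}(x,R) := \int_0^R \frac{|\mu|(B(x,\rho))}{\rho^{\,n-\beta}} \, \frac{d\rho}{\rho}.
\]
% Take p = 2 (so 1/(p-1) = 1) and replace \beta by \beta/2 (so \beta p becomes \beta):
\[
W^{\mu}_{\beta/2,2}(x,R)
  = \int_0^R \frac{|\mu|(B(x,\rho))}{\rho^{\,n-\beta}} \, \frac{d\rho}{\rho}
  = I^{\mu}_{\beta}(x,R).
\]
% Two instances used repeatedly in the text:
\[
W^{\mu}_{1,2} = I^{\mu}_{2}, \qquad W^{\mu}_{1/2,2} = I^{\mu}_{1}.
\]
```

The exponent $1/(p-1)$ is the one dictated by the scaling of the $p$-Laplacean equation: replacing $\mu$ by $\gamma\mu$ multiplies the Wolff potential by $\gamma^{1/(p-1)}$.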
An important fact about Wolff potentials is that their behaviour can be in several aspects recovered from that of Riesz potentials via the so-called Havin-Maz'ya potentials $V_{\beta,p}(|\mu|)(x)$. Indeed, the following inequality holds provided $p\beta < n$: $W^\mu_{\beta,p}(x, \infty) = \int_0^\infty \left( \frac{|\mu|(B(x, \rho))}{\rho^{n-\beta p}} \right)^{1/(p-1)} \frac{d\rho}{\rho} \lesssim I_\beta\big\{ [I_\beta(|\mu|)]^{1/(p-1)} \big\}(x) =: V_{\beta,p}(|\mu|)(x)$. The above estimate allows one to derive, in a sharp way, almost all types of local estimates starting from the properties of the Riesz potentials, whose action in several function spaces is in fact known; for this we refer to [28,29]. As first shown in the fundamental works of Kilpeläinen and Malý [37,38] for the case of nonnegative measures, a neat analog of the first estimate in (5) holds using Wolff potentials. Later on, a new and interesting proof was offered by Trudinger and Wang [65,66], and this allows one to cover the case of general subelliptic operators. Yet different proofs can be found in [39], and in [21], where an approach covering the case of general signed measures has been developed. The final outcome is summarized in the following: Theorem 1 [21,37,38,65] Let $u \in W^{1,p}(\Omega)$ be a weak solution to the equation with measurable coefficients (17) under the assumptions (18). There exists a constant $c \equiv c(n, p, \nu, L)$ such that the pointwise estimate $|u(x)| \leq c\,W^\mu_{1,p}(x, R) + c \fint_{B(x,R)} (|u| + Rs)\,d\xi$ (20) holds whenever $B(x, R) \subset \Omega$ and the right hand side is finite. Moreover, if $\lim_{R \to 0} W^\mu_{1,p}(x, R) = 0$ locally uniformly in $\Omega$ with respect to $x$, (21) then $u$ is continuous in $\Omega$.
A sketch of the proof of this theorem is given in Sect. 18 below; extensions to more general operators have been given in [50,53]. If $p = 2$, then $W^\mu_{1,p} \equiv I^\mu_2$ and we retrieve a local analog of the first estimate in (5). A remarkable point here is that estimate (20) is sharp, and the nonlinear potential $W^\mu_{1,p}$ cannot be replaced by any other smaller potential. This is in fact reported in the following: Theorem 2 [37,38] Let $u \in W^{1,p}(\Omega)$ be a nonnegative weak solution to the Eq. (13) under the assumptions (18), where $\mu$ is a positive measure and $s = 0$. There exists a constant $c \equiv c(n, p, \nu, L)$ such that the following pointwise estimate holds whenever $B(x, 2R) \subset \Omega$ and the Wolff potential is finite: $c^{-1} W^\mu_{1,p}(x, R) \leq u(x) \leq c\,W^\mu_{1,p}(x, 2R) + c \inf_{B(x,R)} u$. (22) The possibility of extending pointwise potential estimates to the gradient of solutions has remained an open and much discussed issue since the paper [38]. The answer came only recently and here we consider the case of equations of the type (13). The first result in this direction is contained in [57] for the case $p = 2$, that is when assumptions (14) are in force. Indeed, the following analog of the second estimate in (5) is proved in [57]: $|Du(x)| \leq c\,I^\mu_1(x, R) + c \fint_{B(x,R)} (|Du| + s)\,d\xi$. For the case $p > 2$ a first result has been given in [21], where the following, apparently natural estimate has been proved to hold at every Lebesgue point $x$ of $Du$, under assumptions (16): $|Du(x)| \leq c\,W^\mu_{1/p,p}(x, R) + c \fint_{B(x,R)} (|Du| + s)\,d\xi$. (23) The previous estimate seems to put a final word on the problem of a sharp analog of the second estimate in (5), since the orthodoxy of nonlinear potential theory prescribes that Wolff potentials replace Riesz potentials everywhere when $p \neq 2$. Surprisingly enough, in [41] we have shown that this is not the case. Using Wolff potentials is necessary only when estimating solutions, while it is not when passing to their gradients. In fact the following holds: Theorem 3 [41] Let $u \in W^{1,p}(\Omega)$ be a solution to the Eq. (13), under the assumptions (16).
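There exists a constant $c \equiv c(n, p, \nu, L)$ such that the Riesz potential estimate $|Du(x)|^{p-1} \leq c\,I^\mu_1(x, R) + c \left( \fint_{B(x,R)} (|Du| + s)\,d\xi \right)^{p-1}$ (24) holds whenever $B(x, R) \subset \Omega$ and the right hand side is finite. Moreover, if $\lim_{R \to 0} I^\mu_1(x, R) = 0$ locally uniformly in $\Omega$ with respect to $x$, (25) then $Du$ is continuous in $\Omega$.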
The proof of Theorem 3 will be presented in Sect. 15 below, where it will be obtained as a corollary of more general potential estimates. An extension of the previous result to a class of general operators including the p-Laplacean has been recently given by Baroni in [5].
Remark 1 An obvious ambiguity arises in the statements of Theorems 1-3 when saying that the related estimates hold whenever the right hand side is finite. This might appear obvious. Another point is that the estimates in Theorems 1-3 are stated for every $x$, while both $u(x)$ and $Du(x)$ are only defined if $x$ is a Lebesgue point, where the so-called precise representative can be defined. Both ambiguities are clarified in Sect. 8. There we prove that, both for $u$ and $Du$, the set of Lebesgue points coincides with the one for which inequalities (20) and (24) feature a finite right hand side, respectively.
Observe that estimate (24) obviously improves the one in (23) as, if $p \geq 2$ then $1/(p-1) \leq 1$, and therefore $[I^\mu_1(x, R)]^{1/(p-1)} \lesssim W^\mu_{1/p,p}(x, 2R)$. (26) The implications of Theorem 3 are rather surprising: when switching to the gradient, estimates linearize and they reduce to the ones already available for the Poisson equation. The nonlinear, possibly degenerate character of the equations considered plays no role here. In particular, the regularity theory of equations such as (13) reduces to that of the Poisson equation up to the $C^1$-level. Several facts typical of the linear theory can now be reproduced verbatim. For instance, for solutions $u \in W^{1,p}(\mathbb{R}^n)$ to (13) defined in the whole $\mathbb{R}^n$, the classical estimate $|Du(x)|^{p-1} \leq c \int_{\mathbb{R}^n} \frac{d|\mu|(y)}{|x - y|^{n-1}}$ via classical Riesz potentials follows immediately. For a relevant connection between Theorem 3 and fundamental solutions to general nonlinear measure data problems we refer to Sect. 7.1 below. Another very precise analogy with the classical linear theory is a striking nonlinear extension of a fundamental theorem of Stein [63]. This claims that if $v \in W^{1,1}$ is a Sobolev function defined in $\mathbb{R}^n$ with $n \geq 2$, then $Dv \in L(n, 1) \Rightarrow v$ is continuous. (27) This can be regarded as the limiting case of the Sobolev-Morrey embedding theorem that reads $Dv \in L^{n+\varepsilon} \Rightarrow v \in C^{0,\varepsilon/(n+\varepsilon)}$ whenever $\varepsilon > 0$. Note indeed that $L^{n+\varepsilon} \subset L(n, 1) \subset L^n$ for every $\varepsilon > 0$ with all the inclusions being strict. Another way to state Stein's theorem concerns the regularity of solutions $u : \Omega \to \mathbb{R}^N$ to the Laplacean system, and reads $\operatorname{div} Du = \Delta u \in L(n, 1) \Rightarrow Du$ is continuous.
This follows by (27) and classical Calderón-Zygmund theory. The point is now that if μ ∈ L(n, 1), then a computation involving the definition of norm in Lorentz spaces (see [23,41] for discussion) implies that (25) takes place, and therefore we conclude with the following: Theorem 4 (Nonlinear Stein theorem [41]) Let u ∈ W 1, p ( ) be a solution to the Eq. (13), under the assumptions (16) and such that μ ∈ L(n, 1) locally in . Then Du is continuous in .
See [12] for a global Lipschitz bound. Without appealing to potentials, but by using different means, the result of the previous theorem also holds for systems.
Theorem 5 (Vectorial nonlinear Stein theorem [45]) Let u ∈ W 1, p ( , R N ), N ≥ 1, be a vector valued solution to the p-Laplacean system Assume that the components of the vector field F : → R N locally belong to the space L(n, 1). Then Du is continuous in .
Finally, let us mention that starting from the techniques developed for the last two theorems, similar results can be proved in the case of fully nonlinear equations [15].
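The claim made earlier in this section, that the Riesz potential bound (24) improves the Wolff potential bound (23) because $1/(p-1) \leq 1$, rests on a comparison between the two truncated potentials. The following is a worked sketch of that comparison via a dyadic decomposition. It assumes the standard truncated definitions of $I^\mu_1$ and $W^\mu_{1/p,p}$; the symbols $R_k$, $a_k$ and $t$ are notation introduced here, and the constants are not optimized.

```latex
% Notation introduced for this sketch:
%   R_k := 2^{-k} R,   a_k := |\mu|(B(x,R_k)) / R_k^{\,n-1},   t := 1/(p-1) \le 1.
% Step 1: split (0,R] into the dyadic intervals (R_{k+1}, R_k]; there
%   |\mu|(B(x,\rho)) \le |\mu|(B(x,R_k)) and \rho^{-(n-1)} \le 2^{n-1} R_k^{-(n-1)}.
\[
I^{\mu}_1(x,R)
  = \sum_{k \ge 0} \int_{R_{k+1}}^{R_k} \frac{|\mu|(B(x,\rho))}{\rho^{\,n-1}} \frac{d\rho}{\rho}
  \le 2^{n-1} \log 2 \sum_{k \ge 0} a_k .
\]
% Step 2: subadditivity of s \mapsto s^t, valid because 0 < t \le 1 (i.e. p \ge 2).
\[
\Big( \sum_{k \ge 0} a_k \Big)^{t} \le \sum_{k \ge 0} a_k^{\,t}.
\]
% Step 3: on [R_k, 2R_k] one has |\mu|(B(x,\rho)) \ge |\mu|(B(x,R_k))
%   and \rho^{n-1} \le 2^{n-1} R_k^{n-1}, hence
\[
a_k^{\,t} \le \frac{2^{(n-1)t}}{\log 2}
  \int_{R_k}^{2R_k} \left( \frac{|\mu|(B(x,\rho))}{\rho^{\,n-1}} \right)^{t} \frac{d\rho}{\rho}.
\]
% Step 4: the intervals [R_k, 2R_k], k \ge 0, tile (0, 2R], and n - \beta p = n - 1
%   for \beta = 1/p, so the sum of the integrals is W^{\mu}_{1/p,p}(x,2R):
\[
\big[ I^{\mu}_1(x,R) \big]^{1/(p-1)} \le c(n,p)\, W^{\mu}_{1/p,p}(x,2R).
\]
```

Step 2 is the only place where $p \geq 2$ is used; for $p < 2$ the power $t$ exceeds one and the inequality between the sum and the sum of powers reverses.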

Universal potential estimates
The results of this section have a double aim. On the one hand, they show that estimates (9)-(10) have analogs for nonlinear equations. On the other hand, they show that those in (20), (23) and (24) are actually special cases of a more general class of nonlinear potential estimates. These, in turn, allow one to recover in an optimal way all the basic regularity properties of solutions in terms of the regularity of the assigned datum $\mu$. The range of the results implied by such estimates is of course limited by the regularity theory of homogeneous equations such as $\operatorname{div} a(x, Dw) = 0$. (28)
Therefore we first recall what the maximal regularity of solutions is for equations as in (28) and in (29) below. In turn, this changes dramatically according to the smoothness assumed on the partial map $x \mapsto a(x, \cdot)$. When only measurability is assumed, a very low degree of regularity is expected and solutions are just Hölder continuous for some exponent. When instead the dependence on $x$ becomes more regular, then higher regularity follows. In this case, again for the sake of simplicity, and since this does not affect the exposition of the main ideas, we shall confine ourselves to the case of equations with no coefficients such as $\operatorname{div} a(Dw) = 0$. (29)
The first result we present upgrades estimate (20) to low order fractional derivatives, and allows to give a sharp formulation of the classical De Giorgi's theory via nonlinear Wolff potential estimates. De Giorgi's theory for equations with measurable provides the existence of a universal Hölder continuity exponent α m ∈ (0, 1), depending only on n, p, ν, L, such that w ∈ C 0,α loc ( ) for every α < α m (30) and The previous estimate holds whenever x, y ∈ B R/2 and B R ⊂ , for a constant depending only on n, p, ν, L and α. The exponent α m can be thought as the maximal Hölder regularity exponent associated to the vector field a(·). It is universal in the sense that it is independent of a(·) and of the particular solution considered, but just depends only on n, p, ν, L. For more precise information on α m see Theorem 18 and Remark 6 below. It then holds the following: Theorem 6 (De Giorgi's theory via potentials [40]) Let u ∈ W 1, p ( ) be a weak solution to the equation with measurable coefficients (17) under assumptions (18). Let B R ⊂ be such that x, y ∈ B R/2 ; then holds provided the right hand side is finite and 0 ≤ α < α m , where the exponent α m has been defined in (30)- (31). Moreover, wheneverα ∈ [0, α m ) is fixed, the dependence of the constant c is uniform for α ∈ [0,α], in the sense that c depends only on n, p, ν, L andα.
Estimate (32) does not catch up, when α → 1, the optimal gradient bound in (24). There is a natural reason for such a lack of endpoint property: it catches the other optimal bound (20) as α → 0, and the two cases involve different potentials. In fact, the change in the nature of the estimates when passing from (20) to (24) requires another theorem, parallel to Theorem 7.
Theorem 8 (Uniform Riesz potential estimate) Let u ∈ W 1, p ( ) be a weak solution to the Eq. (13) under assumptions (16). Let B R ⊂ be such that x, y ∈ B R/4 ; then holds provided the right hand side is finite and 0 < α ≤ 1. Moreover, whenever α ∈ (0, 1] is fixed, the dependence of the constant c is uniform for α ∈ [α, 1] as c depends only n, p, ν, L andα. The previous result is here presented for the first time and for the proof we refer to Sect. 5 below. There Theorem 8 is actually obtained as a corollary of a more general estimate for certain maximal operators of Du. From the discussion made there it will be clear that the points x, y are Lebesgue points of u when the right hand side in (33) is finite; compare with Remark 1. Theorems 6 and 8 provide altogether the optimal analog of estimate (9) and in fact they both unify with Theorem 7 when p = 2, when Wolff and Riesz potentials do coincide. We remark that is also possible to quantify the blow-up of the constant c in (33) as α → 0; see Remark 7 below.
We now examine the situation for the gradient, giving an estimate which is again bound to give back (10) when p = 2. As already done before Theorem 6, we recall the basic information about the maximal regularity of solutions to homogeneous equations as in (29). These in turn involve the fundamentals of the regularity theory of the p-Laplacean equations and can be summarised by saying that there exists a positive exponent α M ∈ (0, 1), depending only on n, p, ν and L, such that whenever x, y ∈ B R/2 , and hold for any local solution w to (29). The constant c depends only on n, p, ν, L and α. For estimate (35) and the exponent α M see Theorem 17 and Remark 5 below. The result in (34) finds its origins in the basic work of Uraltseva [67] while further approaches can be found in [18,24,46,51]. We remark that all these results are again based on the groundbreaking work of De Giorgi [16]. The exponent α M is in general strictly less than one (see again [67]); while lower bounds on α M can be obtained by tracking the constant dependence in the various proofs, its precise (optimal) value is not known. An optimal lower bound for α M is conjectured to be 1/3, according to the regularity exhibited by the solutions to the so called ∞-Laplacean equation [4]. By taking (34) into account we have the following nonlinear version of estimate (10): Theorem 9 (Uniform Wolff potential estimate) Let u ∈ W 1, p ( ) be a weak solution to the Eq. (13) under assumptions (16). Let B R ⊂ be such that x, y ∈ B R/4 ; then holds provided the right hand side is finite and The exponent α M has been defined in (34). Moreover, whenever 0 ≤α < min{1/( p − 1), α M } is fixed, the dependence of the constant c is uniform for α ∈ [0,α], in the sense that c depends only n, p, ν, L andα.
The proof of this theorem will be presented in Sect. 16 below. Notice that the limitation in (37) makes the statement of Theorem 9 consistent with the definition of Wolff potential given in Definition 5, where it must be β > 0.

Maximal-potential estimates
Let us now explain Theorem 8 proceeding for some while with a few purely heuristic arguments. By considering the function m appearing in (11) as a fractional derivative of order α, we can express the content of Theorem 8 in the following way: thereby obtaining a nonlinear version of (12). The singularity of the case α = 0 essentially stems from the fact that Wolff potentials come then into the play when considering pointwise estimates for u. Let us proceed heuristically and see why Riesz potentials intervene as long as fractional derivatives are considered. We rewrite Eq. (15) as the decoupled system This viewpoint tells us that an equation as in (15), which is commonly seen as a nonlinear equation in the gradient, can be also seen as linear equation in a nonlinear vector field of the gradient, that is H . The analysis of equations as −div H = μ, being f → div f a differential operator of order one, typically involves estimates of H via operators as I μ 1 . In turn this explains the appearance of estimates as in (24). The reader will immediately recognize that no similar argument applies to u, as the equation depends directly on Du and not u. This is ultimately the reason for the appearance of Wolff potentials when deriving pointwise for u as (20). Let us now go to intermediate derivatives and let us believe that fractional derivatives ∂ α u can be controlled by in turn controlling quantities as on every possible ball B R . Using a typical dimension analysis viewpoint, the appearance in the estimates of a multiplicative factor as R −β amounts to consider a corresponding derivative of order β in the estimates. Now, let us rewrite Eq. (15) as and note that in this scheme of reasoning the operator f → R (α−1)( p−1) div f has now order p − α( p − 1). Its inversion should therefore involve potentials as I μ p−α( p−1) . At this point, recalling the identification (39) leads to conclude with an estimate with the dimensional meaning of (38). 
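The dimensional bookkeeping in the heuristic above can be checked at the endpoints. The short computation below is only a consistency sketch; the function $\beta(\alpha)$ is notation introduced here for the order $p - \alpha(p-1)$ mentioned in the text, and the last remark assumes the standard form of the Wolff potential.

```latex
% Order of the Riesz potential suggested by the heuristic
% (\beta(\alpha) is notation introduced for this check):
\[
\beta(\alpha) := p - \alpha(p-1), \qquad 0 \le \alpha \le 1 .
\]
% Endpoint \alpha = 1: one recovers the potential of the gradient estimate (24).
\[
\beta(1) = 1 \;\Longrightarrow\; I^{\mu}_{1}.
\]
% Endpoint \alpha = 0: the order is p, the same scaling as in W^{\mu}_{1,p},
% whose integrand (in the standard form) involves |\mu|(B(x,\rho)) / \rho^{\,n-p}.
\[
\beta(0) = p \;\Longrightarrow\; I^{\mu}_{p}.
\]
% Linear case p = 2: the family coincides with the one in (12).
\[
p = 2 \;\Longrightarrow\; \beta(\alpha) = 2 - \alpha .
\]
```

At $\alpha = 0$ the Riesz and Wolff potentials therefore share the same homogeneity in $\rho$ and differ only in where the power $1/(p-1)$ is applied, which is consistent with the special role the text assigns to that endpoint.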
The point is now to make everything rigorous. A natural way to make the identification in (39) is to use a nonlocal operator as for instance a fractional maximal operator. It is therefore time for the following: , and let f be an L 1 ( )-function or a measure with finite mass; the function defined by Using fractional maximal operators allows to get Riesz potential estimates which are uniform in the whole range [0, 1], and that provide a rigorous interpretation of previous heuristic arguments. The outcome is summarized in the following theorem, which appears here for the first time. The proof is contained in Sect. 12 below (see Sect. 14.4 for the case of SOLA).
The maximal operators of Definition 3 naturally connect to those presented in Definition 6 via the Poincaré inequality, so that a uniform estimate on $M^{\#}_{\alpha,R}(u)$ follows by appealing to Theorem 10, that is We remark that in the estimate in the last display the constant c is independent of α, and in fact the estimate holds uniformly for α ∈ [0, 1]. The turning point towards pointwise estimates is now given by the fact that the sharp maximal operator $M^{\#}_{\alpha,R}(u)(x)$ controls the pointwise behaviour of u provided α > 0. This fact is expressed in the following proposition, a first form of which is present in [17]. It in turn relies on some original arguments of Campanato [10].
holds whenever $x, y \in B_{2R/5}$, for a constant c depending only on n. More precisely, x and y are Lebesgue points of f whenever $M^{\#}_{\alpha,R}(f)(x)$ and $M^{\#}_{\alpha,R}(f)(y)$ are finite, respectively. Therefore, whenever the right hand side in (43) is finite, the values of f are defined as follows: Proposition 1 allows us to obtain Theorem 8 as a corollary of Theorem 10 and in particular of (42); see Sect. 12.7 below for the proofs, including the one of Proposition 1. We believe that Theorem 10 helps to understand the peculiar nature of the case α = 0 and the occurrence of Wolff potentials instead of Riesz potentials. Indeed, an $L^{\infty}$-bound on the sharp maximal operator is essentially equivalent to requiring that u belongs to the space BMO (introduced in [34], see (68) below). On the other hand, a real $L^{\infty}$-bound on u itself requires an $L^{\infty}$-bound on the Wolff potential $W^{\mu}_{1,p}$. In the next section we shall in fact describe a few additional results showing that the case α = 1 also turns out to be special when using maximal operators instead of potentials.
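The mechanism by which a sharp maximal operator of positive order controls pointwise values is a telescoping argument of Campanato type. The following is a minimal sketch under the assumption that $M^{\#}_{\alpha,R}$ is the standard restricted sharp fractional maximal operator; it is meant as an illustration and not as the proof given in Sect. 12.7.

```latex
% Assumptions: 0 < r \le R, \alpha \in (0,1], B_i := B(x, 2^{-i} r),
% (f)_{B} is the average of f over B, and
% M^{\#}_{\alpha,R}(f)(x) = \sup_{0<s\le R} s^{-\alpha} |B(x,s)|^{-1}\int_{B(x,s)} |f-(f)_{B(x,s)}| dy.
\begin{align*}
  |(f)_{B_{i+1}}-(f)_{B_i}|
    &\le \frac{1}{|B_{i+1}|}\int_{B_{i+1}} |f-(f)_{B_i}|\,dy
     \le \frac{2^{n}}{|B_i|}\int_{B_i} |f-(f)_{B_i}|\,dy \\
    &\le 2^{n}\,(2^{-i}r)^{\alpha}\, M^{\#}_{\alpha,R}(f)(x),\\[4pt]
  \sum_{i\ge 0} |(f)_{B_{i+1}}-(f)_{B_i}|
    &\le \frac{2^{n}\, r^{\alpha}}{1-2^{-\alpha}}\, M^{\#}_{\alpha,R}(f)(x)
     \le \frac{2^{n+1}}{\alpha}\, r^{\alpha}\, M^{\#}_{\alpha,R}(f)(x).
\end{align*}
% Last step: 1 - 2^{-\alpha} \ge \alpha/2 on [0,1], since the left-hand side
% is concave in \alpha, vanishes at \alpha = 0 and equals 1/2 at \alpha = 1.
```

When $M^{\#}_{\alpha,R}(f)(x)$ is finite the averages $(f)_{B_i}$ therefore form a Cauchy sequence, and their limit differs from $(f)_{B(x,r)}$ by at most $(2^{n+1}/\alpha)\, r^{\alpha} M^{\#}_{\alpha,R}(f)(x)$. The factor $1/\alpha$ makes visible why the argument degenerates as α → 0.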

Non-endpoint maximal estimates
An interesting fact concerning the two cases α = 0, 1 of (40) is that, when looking for bounds on intermediate derivatives, potentials can be replaced by maximal operators. These give in fact smaller quantities. Maximal operators, as we shall see, allow one to replace conditions typically given in Lebesgue or Lorentz spaces with weaker conditions formulated in terms of Marcinkiewicz spaces. Let us recall the following elementary inequality, which holds for γ ∈ (0, 1): see for instance [40, Lemma 4.1]. Then we have the following: Theorem 11 (Intermediate maximal estimates [40]) Let $u \in W^{1,p}(\Omega)$ be a weak solution to (13) under the assumptions (16). Let $B_R \subset \Omega$ be a ball centred at x; then the estimate holds uniformly in $\alpha \in [0, \tilde{\alpha}]$, whenever $0 < \tilde{\alpha} < 1$. The constant c depends only on $n, p, \nu, L, \tilde{\alpha}$.
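One elementary way to see that maximal operators give smaller quantities than potentials is the following comparison. It assumes the usual truncated Riesz potential and restricted fractional maximal function, and it is not claimed to coincide with the inequality recalled above from [40, Lemma 4.1].

```latex
% Assumed definitions, for 0 < \beta \le n and B(x,2R) \subset \Omega:
%   I^{|\mu|}_{\beta}(x,R) := \int_0^R \frac{|\mu|(B(x,\varrho))}{\varrho^{n-\beta}}\,\frac{d\varrho}{\varrho},
%   M_{\beta,R}(\mu)(x)    := \sup_{0<r\le R} r^{\beta}\,\frac{|\mu|(B(x,r))}{|B(x,r)|},
% and \omega_n := |B(0,1)|.
\begin{align*}
  I^{|\mu|}_{\beta}(x,2R)
    &\ge \int_r^{2r} |\mu|(B(x,\varrho))\,\varrho^{\beta-n-1}\,d\varrho
     && (0 < r \le R)\\
    &\ge |\mu|(B(x,r))\; r\,(2r)^{\beta-n-1}
     && (\varrho \mapsto |\mu|(B(x,\varrho)) \text{ is nondecreasing},\ \beta-n-1<0)\\
    &= 2^{\beta-n-1}\,\omega_n\; r^{\beta}\,\frac{|\mu|(B(x,r))}{|B(x,r)|}.
\end{align*}
% Taking the supremum over r \in (0,R]:
\[
  M_{\beta,R}(\mu)(x) \le \frac{2^{\,n+1-\beta}}{\omega_n}\; I^{|\mu|}_{\beta}(x,2R).
\]
```

The converse inequality fails in general, which is consistent with the remark that maximal operators lead to weaker assumptions on the datum.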
As for gradient oscillations, we instead have the following result, whose proof is contained in Sect. 13 below.
Theorem 12 (Gradient sharp maximal estimate [40]) Let $u \in W^{1,p}(\Omega)$ be a weak solution to (13) under the assumptions (16). Let $B_R \subset \Omega$ be a ball centred at x; then the following estimate: for a constant c depending only on $n, p, \nu, L, \tilde{\alpha}$. The exponent $\alpha_M$ has been introduced in (34)-(35).

Equations with coefficients
Obtaining nonlinear potential estimates that capture regularity beyond Theorem 6 for equations as in (17) requires assuming more regularity on the partial map $x \mapsto a(x, \cdot)$. Keep in mind the discussion before Theorem 6. We therefore define the following averaged vector field: whenever $B(x, r) \subset \Omega$, and then the averaged (and renormalized) modulus of continuity of $x \mapsto a(x, \cdot)$ as follows: When considering equations of the type the definition in (46) gives something which is comparable to the usual modulus of continuity of the function c(·). We then have Theorem 13 Let $u \in W^{1,p}(\Omega)$ be a weak solution to (17), with assumptions (16) (uniformly) verified by the partial map $z \mapsto a(\cdot, z)$ for every $x \in \Omega$. Then
- [40] For every $\tilde{\alpha} \in (0, 1)$ there exists a number δ ∈ (0, 1), depending only on $n, p, \nu, L, \tilde{\alpha}$, such that implies the validity of estimate (32) whenever $\alpha \in [0, \alpha_m)$. In particular, if the limit in (48) is zero, estimate (32) holds for every α < 1
- then estimates (33) and (40) hold
- then estimate (36) holds whenever $\alpha < \tilde{\alpha}$.
In all three cases the constants involved in the estimates additionally depend on the quantities appearing in (48)-(50).
Let us now briefly comment on the assumptions made in the last theorem, remarking that all of them are essentially necessary. Assumption (48) allows one to cover the case in which, when referring to Eq. (47), the function c(·) is not even continuous, but of class VMO or BMO with small seminorm. This reconnects the theory presented here to the classical Calderón-Zygmund theory, where VMO-regularity of coefficients is required to establish integrability estimates related to estimate (32); see for instance [9]. Assumption (49) is necessary too. Indeed, estimate (33) implies gradient boundedness when μ is good enough, while Dini continuity of coefficients is known to be necessary to get Lipschitz regularity of solutions; see [33]. The proof of the second statement in the last theorem has the techniques developed for Theorem 10 as its starting point, and will appear in the forthcoming paper [6]. Finally, assumption (50) seems to be natural in view of the usual Schauder theory. This prescribes that, in order to have Hölder continuity of the gradient, the Hölder continuity of $x \mapsto a(x, \cdot)$ is necessary.

Interlude on measure data problems
When looking at equations as in (17) we have considered distributional solutions lying in $W^{1,p}$. This choice is actually aimed at simplifying the presentation in a way that emphasises the results obtained in the form of a priori estimates. Dealing with $W^{1,p}$-solutions is anyway very restrictive in view of the fact that, typically, distributional solutions to measure data problems do not enjoy this regularity. A typical instance is the so-called nonlinear Green's function, which solves, in the sense of distributions, the problem In the right hand side of $(52)_1$ there appears the Dirac measure δ charging the origin. A straightforward calculation reveals that $G_p \notin W^{1,p}$, while it holds that We recall here that the Marcinkiewicz space $M^q(\Omega)$, for 1 ≤ q < ∞, is the set of all measurable maps μ satisfying the condition The above discussion now motivates the following: The terminology "very weak" is used to emphasize the fact that such solutions do not belong, in general, to the natural Sobolev space $W^{1,p}(\Omega)$, i.e. they are not energy solutions. Note that, when considering assumptions (18), in order to guarantee that $a(x, Du) \in L^1(\Omega, \mathbb{R}^n)$ it suffices, for instance, to have $u \in W^{1,p-1}(\Omega)$. Definition 7, not surprisingly, poses problems. For instance, very weak solutions may exist besides usual energy solutions [61], and are therefore not in general unique. As a matter of fact, one of the basic open issues of the theory of measure data problems is to find a function class in which to solve, in a unique way, Dirichlet problems of the type with μ being in the most general case a (signed) Borel measure with finite total mass. We will not pursue this matter here, rather referring to [14,36] for a more comprehensive discussion. Here we are interested in a class of solutions which we regard as very natural (see in fact the equivalence results obtained in [36]) and for which all the potential estimates described in this paper continue to hold.
This is the class of SOLA (Solutions Obtained by Limits of Approximations), which is introduced in the following: SOLA are special because they are selected via an approximation procedure using more regular energy solutions, and thereby they inherit a few of their basic properties. For instance, their precise representative is defined outside a set of zero p-capacity, exactly as for $W^{1,p}$-functions; see Theorem 16 below. Let us recall how to build a SOLA to (55), following [8]. One considers solutions The right hand side data $\mu_k := \mu * \phi_k \in C^{\infty}$ are canonically obtained by smoothing μ via convolution with a sequence of standard smooth mollifiers $\{\phi_k\}$. This approach leads to the existence of a very weak solution $u \in W^{1,p-1}_0(\Omega)$ (up to a not relabeled subsequence). SOLA are not known to be unique, except in a few special cases (for instance when $\mu \in L^1$). See [8,13,14,36] for a broader discussion. We summarise the basic existence and regularity results available for SOLA in the following: Theorem 14 [8,55] Under the assumptions (18) with p ≤ n, there exists a SOLA $u \in W^{1,p-1}_0(\Omega)$ to (55). Moreover, every SOLA $u \in W^{1,1}_{\mathrm{loc}}(\Omega)$ to (17) is such that In the case of equations of the type (13) under assumptions (16), we also have A preliminary integrability result was obtained for a different kind of solutions in the pioneering paper of Lindqvist [48]. The restriction p ≤ n is aimed at focusing on the case of SOLA, since when p > n then μ belongs to the dual of $W^{1,p}$ and it is possible to consider standard energy solutions. In display (57) a fractional Sobolev space appears. The meaning is actually that holds whenever A . We also remark that, in fact, it can be proved that every SOLA u is such that This means that every SOLA exhibits exactly the same integrability as the nonlinear Green's function $G_p$ described in (53) (see [7,20,55]).
Both the results in (56) and (57) are sharp, as follows again by considering $G_p$, which can in fact be proved to be the only SOLA to the problem in (52). This last fact follows by combining the results in [35,62]. As mentioned above, by means of approximation arguments, all the potential estimates in this paper continue to hold for any SOLA to (55). More details are in Sect. 14.4 below, where we in particular discuss the possibility of characterizing the Lebesgue points of SOLA via linear and nonlinear potentials.
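The approximation scheme behind SOLA described in this section can be summarised schematically as follows. This is a sketch under stated assumptions: we take (55) to be the homogeneous Dirichlet problem for −div a(x, Du) = μ, and the mode of convergence indicated is the one usually adopted in the literature, not a quotation of the definition used in the paper.

```latex
% Assumptions: \Omega bounded, \mu a Borel measure with finite total mass,
% \{\phi_k\} a sequence of standard mollifiers, and (55) reads
%   -div a(x,Du) = \mu in \Omega,  u = 0 on \partial\Omega.
\begin{gather*}
  \mu_k := \mu * \phi_k \in C^{\infty}, \\[2pt]
  % regularised problems, solved in the energy space
  \begin{cases}
    -\operatorname{div} a(x, Du_k) = \mu_k & \text{in } \Omega,\\
    u_k = 0 & \text{on } \partial\Omega,
  \end{cases}
  \qquad u_k \in W^{1,p}_0(\Omega), \\[2pt]
  % passage to the limit, along a not relabeled subsequence
  u_k \to u \quad \text{in } W^{1,p-1}_0(\Omega).
\end{gather*}
```

Since each $u_k$ is an energy solution, every a priori estimate that is stable under this convergence passes to the limit u; this is the mechanism by which the potential estimates extend to SOLA.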

Potential estimates and fundamental solutions
To check to what extent estimates (20) and (24) replace the usual linear representation formulas in the nonlinear case, the best thing to do is to see how they reproduce the behaviour of the nonlinear fundamental solution $G_p$. We already know the optimality of estimate (20) from Theorem 2. We therefore concentrate on (24) and show that it actually reverses when considering the nonlinear Green's function $G_p$. This means that for with R = 2|x|, and for a constant c depending only on n and p. Indeed, by the coarea formula we have On the other hand, as $\delta(B(x, \varrho)) = 0$ for $\varrho \le R/2$ and $\delta(B(x, \varrho)) = 1$ otherwise, we also have so that (59) follows by combining the last two inequalities.
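The Riesz potential of the Dirac mass appearing in this argument can be computed in closed form. The sketch below assumes n ≥ 2, x ≠ 0, the usual truncated potential $I_1^{\delta}(x,R) = \int_0^R \delta(B(x,\varrho))\,\varrho^{1-n}\,d\varrho/\varrho$, and the explicit homogeneity $|DG_p(x)| = c(n,p)\,|x|^{(1-n)/(p-1)}$ of the nonlinear Green's function; the latter is recalled here as an assumption, since display (53) is not reproduced above.

```latex
% Assumptions: n \ge 2, x \ne 0, R = 2|x|, \delta = Dirac mass at the origin.
% The ball B(x,\varrho) contains the origin exactly when \varrho > |x| = R/2.
\begin{align*}
  I_1^{\delta}(x,R)
    &= \int_{|x|}^{2|x|} \varrho^{-n}\,d\varrho
     = \frac{|x|^{1-n}-(2|x|)^{1-n}}{n-1}
     = \frac{1-2^{1-n}}{n-1}\,|x|^{1-n},\\[4pt]
  |DG_p(x)|^{p-1}
    &= c(n,p)^{p-1}\,|x|^{1-n}
     = \frac{(n-1)\,c(n,p)^{p-1}}{1-2^{1-n}}\; I_1^{\delta}(x,2|x|).
\end{align*}
```

Thus $|DG_p(x)|^{p-1}$ and $I_1^{\delta}(x, 2|x|)$ are comparable up to a constant depending only on n and p, which is the sense in which the Riesz potential estimate for the gradient cannot be improved.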

Fine properties of solutions via potentials
We have seen that linear and nonlinear potentials locally control the behaviour of solutions. It is at this point not surprising to discover that potentials also control their so-called fine properties. In this respect we present two theorems. The former is concerned with the pointwise behaviour of gradients of SOLA, and in particular with their Lebesgue points. This result employs Riesz potentials. The latter is instead concerned with Lebesgue points of solutions and uses Wolff potentials.

Theorem 15
Theorem 3 continues to hold whenever u ∈ W 1, p−1 ( ) is a SOLA to (13). Moreover, the condition implies that the following limit exists and therefore defines the precise representative of Du at the point x: The proof of this last theorem is included in Sect. 17 below. The analogous statement for solutions is the following: (17). Moreover, the condition implies that the following limit exists and therefore defines the precise representative of u at the point x: In particular, the set of non-Lebesgue points of u has p-capacity zero.
We notice that the assertion about the p-capacity follows directly from the fact that the set of points where the Wolff potential $W^{\mu}_{1,p}$ blows up has zero p-capacity (see for instance [30]).
Remark 2 (Hausdorff dimension of singular sets of SOLA) Theorem 16 allows one to define SOLA, which are initially defined only almost everywhere via convergence, outside a singular set (i.e. the set of non-Lebesgue points) of Hausdorff dimension not larger than n − p, when p ≤ n. This establishes a connection with another class of solutions to measure data problems. These are called p-superharmonic functions and are defined in the case the measure μ is nonnegative; see [30,38]. For such solutions every point is a Lebesgue point, by construction. The connection is now given by the fact that, in view of [36], every SOLA has a superharmonic representative (that is, they coincide almost everywhere) whenever the measure is nonnegative. In the case of Theorem 15 we can instead conclude that the Hausdorff dimension of the singular set of Du is not larger than n − 1. The last estimate also follows from (57). Indeed, the set of non-Lebesgue points of a general $W^{s,\gamma}$-map, with sγ < n, has Hausdorff dimension not larger than n − sγ; see [54].
Remark 3 (Analogies with linear potential theory) Theorems 15 and 16 can also be considered as analogs of classical facts in linear potential theory, concerning the pointwise behaviour of solutions to the Poisson equation −Δu = μ. As usual, in the linear case they follow from explicit representation formulas and abstract analysis of potentials. Here they are replaced by nonlinear potential estimates. Indeed, define where F denotes the Fourier transform and α ∈ (0, n). Then the limit exists, and thereby defines the precise representative of f, up to a set of zero (α, q)-capacity. For these facts we refer to [3], and in particular to [3, Proposition 6.1.3].

Regularity and corollaries
The estimates presented up to now allow us to give a comprehensive and unified picture of the regularity results available for quasilinear equations. We will now briefly describe a few consequences of such estimates. First of all, let us recall that combining Theorem 11 and Proposition 1 yields the following pointwise inequality: that holds whenever $x, y \in B_{R/4}$, $B_R \subset \Omega$, α ∈ (0, 1), where c ≡ c(n, p, ν, L, α). Notice here that the constant c blows up as α approaches 0 or 1. In a completely similar way, by Theorem 12 and again Proposition 1, we then have that the inequality holds with x, y, r, $B_R$ and c as above, but for $\alpha \in (0, \alpha_M)$. This time the constant c blows up when α → 0 or $\alpha \to \alpha_M$. We are now interested in seeing how the results presented up to now, including estimates (64)-(65), imply regularity of solutions in various relevant function spaces. Let us briefly recall that the Morrey space $L^{q,\theta}(\Omega)$, for q ≥ 1 and 0 ≤ θ ≤ n, is defined by requiring that its members $f \in L^q(\Omega)$ satisfy while their local version is defined in the usual way. We refer to [1,56] for more on Morrey spaces and their basic properties with related references. We now have the following, comprehensive: (65)).
Proof In order to prove (C1), we begin by recalling the following mapping property of the fractional maximal operator: that holds whenever 1 < t < n/β (see [26,56]). Then we go to estimate (64) and recall the definition of Calderón spaces given in Definition 2. This leads us to determine At this point (C1) follows using this fact together with estimate (64). Notice that, recalling that we are assuming p ≥ 2, we have used that We have also used the fact that, with the previous definitions, we in fact have t[p − α(p − 1)] < n. The assertion in (C2) follows immediately by taking α = 0 in Theorem 11 when p = n. We indeed recall that by the very definition of BMO spaces we have In the case 2 ≤ p < n, we recall the following Hölder type inequality valid in Marcinkiewicz spaces $M^t$, t > 1: This implies that in our case $|\mu|(B_\varrho) \lesssim \varrho^{n-p}$ holds and, by the definition of fractional maximal operator, we conclude that $M_{p,R}(\mu)$ is locally bounded. At this stage (C2) follows from estimate (44), where we again take α = 0.
The statement in (C3) is simply an obvious consequence of Theorem 1.
In (C5) the first implication follows again by the Hölder type inequality in (69). On the other hand, we observe that the inequality $|\mu|(B_\varrho) \lesssim \varrho^{n-p+\alpha(p-1)}$ implies that $M_{p-\alpha(p-1),R}(\mu) \in L^{\infty}$, locally, by the very definition of fractional maximal operator in Definition 6, so that (C5) follows by estimate (64).
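The passage from a Marcinkiewicz condition on μ to the boundedness of a fractional maximal function, used in the proofs of (C2), (C5) and (C9), rests on a short computation. The sketch below assumes the standard quasi-norm $\|\mu\|_{M^t} := \sup_{\lambda>0} \lambda\,|\{|\mu|>\lambda\}|^{1/t}$ and derives the Hölder type inequality with an explicit constant; the notation β := p − α(p − 1) is introduced here only for brevity.

```latex
% Assumptions: \mu \in M^t with t > 1, B a ball, \omega_n := |B(0,1)|,
% \|\mu\|_{M^t} := \sup_{\lambda > 0} \lambda\, |\{ |\mu| > \lambda \}|^{1/t}.
% Step 1 (Hoelder type inequality): split the layer-cake integral at a level T > 0.
\begin{align*}
  \int_B |\mu|\,dx
    &= \int_0^{\infty} |\{|\mu|>\lambda\}\cap B|\,d\lambda
     \le T\,|B| + \|\mu\|_{M^t}^{t}\int_T^{\infty}\lambda^{-t}\,d\lambda
     = T\,|B| + \frac{\|\mu\|_{M^t}^{t}\,T^{1-t}}{t-1},\\
  \intertext{and the choice $T = \|\mu\|_{M^t}\,|B|^{-1/t}$ gives}
  \int_B |\mu|\,dx &\le \frac{t}{t-1}\,|B|^{1-1/t}\,\|\mu\|_{M^t}.
\end{align*}
% Step 2: take t = n/\beta with \beta := p - \alpha(p-1) < n and B = B_\varrho.
\[
  |\mu|(B_\varrho) \le \frac{n}{n-\beta}\,\omega_n^{1-\beta/n}\,\varrho^{\,n-\beta}\,\|\mu\|_{M^{n/\beta}}
  \quad\Longrightarrow\quad
  \varrho^{\beta}\,\frac{|\mu|(B_\varrho)}{|B_\varrho|}
  \le \frac{n}{n-\beta}\,\omega_n^{-\beta/n}\,\|\mu\|_{M^{n/\beta}} .
\]
% \alpha = 0 gives the exponent n - p of (C2); \beta = 1 gives the exponent n - 1 of (C9).
```

Taking the supremum over $\varrho \le R$ shows that the fractional maximal function of order β of μ is bounded by a multiple of the Marcinkiewicz quasi-norm.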
(C6) is just an obvious consequence of the following imbedding property of Riesz potentials (see [1]): and of Theorem 3.
Similarly, (C7) follows from Theorem 3 and the classical mapping property See also [26,56]. (C8) is essentially a corollary of Theorem 3 and of the mapping property of Riesz potentials originally proved by Adams [1] (see also [56] for a localization).
The first implication in (C9) is a consequence of (69), while the second follows by applying Theorem 12 with α = 0. Indeed, assuming that $|\mu|(B_\varrho) \lesssim \varrho^{n-1}$ implies that $M_1(\mu)$ is locally bounded, and therefore so is the sharp maximal function of Du. By definition of sharp maximal function this implies that the gradient belongs to BMO, locally; see (68).
(C10) follows from Theorem 8 in an obvious way. (C11) is again a consequence of (25) and of some basic computations involving the definition of the Lorentz norm (see [22,41]). Indeed, the condition μ ∈ L(n, 1) allows one to conclude that (25) holds, again by basic manipulations on equivalent Lorentz norms.
Some of the points in the previous corollary are well-known results when considering for instance the model equation (15). The theory above allows us to extend and embed them in a more general context where results follow in a unified way. Specifically, for (C2) see [55,64], for (C6) see [7,8,20], for (C7) see [19,31,56], for (C8) see [56]. We also remark that the previous corollary is just a sample of what it is possible to obtain using the nonlinear potential estimates approach; further spaces, for instance Lorentz-Morrey or Besov-Morrey spaces, can be considered as well (see [55,56] for relevant definitions). More regularity results in rearrangement invariant function spaces via potentials are contained in [11].

A basic comparison estimate
This section is devoted to the proof of a comparison estimate between a given solution $u \in W^{1,p}(\Omega)$ to (17) (notice that we are allowing for measurable coefficients here) and the function $v \in u + W^{1,p}_0(B_R)$ Here $B_R \subset \Omega$ denotes a fixed ball with radius R. The monotonicity properties of the vector field a(·) described in (18) can be restated and managed more easily via the use of the auxiliary vector field $V : \mathbb{R}^n \to \mathbb{R}^n$ defined by The number s is the one defined in (16) and (18). Indeed, we start by observing that the following inequality: holds, and is valid for all vectors $z_1, z_2 \in \mathbb{R}^n$ that are not simultaneously zero (in the case s = 0) and for every p > 1. The constant c depends only on n and p. See [27,55] for basic properties and for a discussion on the use of the map V(·) in this context. Since we are assuming p ≥ 2, in particular we have that Combining $(18)_2$ and (72) yields again for every choice of $z_1, z_2$, $x \in \Omega$ and where c ≡ c(n, p, ν). The lemmas in this section are already scattered in [21,41,55]. The proofs proposed here are anyway different and shorter. We start with a basic, weighted type energy estimate.
Lemma 1 Let u ∈ W 1, p ( ) be a solution to (17) under assumptions (18), and let v ∈ u + W 1, p 0 (B R ) be as in (70). Then . We recall the standard notation Then, notice that and therefore, using (74), we conclude with so that (75) follows.
Lemma 1 and Sobolev embedding theorem in turn imply a first comparison estimate.
Lemma 2 Let $u \in W^{1,p}(\Omega)$ be a solution to (17) under assumptions (18), and let $v \in u + W^{1,p}_0(B_R)$ be as in (70). Then and therefore hold for constants $c_1, \tilde{c}_1 \equiv c_1, \tilde{c}_1(n, p, \nu, q)$, whenever Proof We restrict ourselves to the proof of (76), since this obviously implies (77) via Poincaré's inequality. Moreover, we can restrict ourselves to proving (76) assuming q > 1, since the result for the remaining values would follow by Hölder's inequality. Now, let us observe that we can always reduce to the situation when there exists ξ > 1 such that where $q^*$ is the Sobolev conjugate of q. Indeed, when p ≤ n then $p_m = n(p-1)/(n-1)$ and therefore $q < p_m$ implies that $q/(p-q) < q^* = nq/(n-q)$, so that (79) follows for the corresponding choice of ξ. When instead p > n, we first observe that it suffices to prove (76) in the case n ≤ q < p, since (76) for lower values of q would then follow by Hölder's inequality. Now, since by definition of the Sobolev embedding exponent we can take $q^*$ as large as we please, we again find a number ξ > 1 for which (79) holds. We then plan to apply Lemma 1 with this choice of ξ and with the following choice of h: Notice that we can always assume h > 0, otherwise the assertion of Lemma 2 trivializes. By using the definitions in the last two displays, inequalities (73) and (75), and finally the Sobolev embedding theorem, we then have so that (76) follows.
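The algebraic step in the case p ≤ n of the previous proof, namely that $q < p_m$ forces $q/(p-q) < q^*$, can be checked directly. The short verification below uses only the definitions $p_m = n(p-1)/(n-1)$ and $q^* = nq/(n-q)$ quoted in the proof, with n ≥ 2.

```latex
% Setting: n \ge 2, 1 < p \le n, 1 < q < p_m := n(p-1)/(n-1).
% First, p \le n gives p_m \le p and p_m \le n, so that q < p and q < n:
\[
  p_m \le p \iff n(p-1)\le p(n-1) \iff p \le n,
  \qquad
  p_m \le n \iff p-1 \le n-1 .
\]
% Both denominators below are therefore positive, and
\begin{align*}
  \frac{q}{p-q} < \frac{nq}{n-q}
    &\iff n-q < n(p-q)\\
    &\iff q(n-1) < n(p-1)\\
    &\iff q < \frac{n(p-1)}{n-1} = p_m .
\end{align*}
```

The two conditions are in fact equivalent, which is why the threshold $p_m$ is the natural one for this comparison estimate.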

Remark 4
By examining the arguments of the previous lemma it is possible to see that the constant c 1 appearing in (76) shows the following natural asymptotic for q → p m : where c is a constant that remains bounded whenever p varies in a compact subset of (1, ∞); see (79) and the constant appearing in Sobolev-Morrey embedding theorem. The asymptotic in (80) is typical in situations when a borderline estimate in Lebesgue spaces fails, being replaced by an estimate in Marcinkiewicz spaces; recall (58).

A sequence of comparison estimates
Given a number δ 1 ∈ (0, 1/4) and a ball B(x, r ) ⊂ , we define the sequence of shrinking balls whenever j ≥ 0 is an integer. The related comparison solutions v j ∈ u + W 1, p Proof We start fixing the following quantities: and in view of the application of Lemmas 1 and 2, that we shall use with exponents q such that .
This in particular fixes the constant $c_1$ from Lemma 2. We also set In the rest of the proof constants denoted by c will only depend on $n, p, \nu, \delta_1, A$ and will in general vary from line to line, as usual. We start estimating the term on the left in (85) with the aid of (84) as follows: To continue, we estimate the second-last integral appearing in the previous display. For this, let us preliminarily note that, by using (76) and recalling that $r_j = \delta_1 r_{j-1}$, the following estimate holds for the same range of exponents q as in (76): Then, appealing to Hölder's inequality together with (76), gives us But now, as with $c \equiv c(n, p, \nu, \delta_1)$. Combining this last estimate with (88) gives us It remains to estimate the first term on the right hand side in the above display. Applying Hölder's inequality, together with (72) and (75), and recalling that ξ = 1 + 2γ, we obtain for any h > 0 that Now we choose for some small positive δ ∈ (0, 1) to get Notice that the presence in (94) of the parameter δ is aimed at guaranteeing that h is positive; we shall eventually let δ → 0 at the end of the proof. In the above display, we use (89) and (84) to obtain where the constant c ultimately depends only on $n, p, \nu, \delta_1, A$. Plugging the last estimate in (93) now yields while applying Young's inequality and recalling again that $r_j = \delta_1 r_{j-1}$ gives whenever ε ∈ (0, 1), where $c \equiv c(n, p, \nu, \delta_1, A)$ is in particular independent of ε. It finally remains to estimate h, which has been defined in (94). We have Using (76) and (89) to estimate $I_1$ leads to In turn we bound $I_2$ appealing to (76) and using (84) repeatedly as follows: Similarly to (90) we find Combining this last estimate with (97) and eventually with (96) yields where $c_*$ depends only on $n, p, \nu, L, A, \delta_1$.
Plugging the inequality in the above display in (95), choosing ε = 1/(2c * ) and reabsorbing terms leads to Now (85) follows using this last inequality in combination with the one in display (92), and finally letting δ → 0.

Proof of Theorems 8 and 10
The proof of Theorem 10 is divided into six steps, going through Sects. 12.1-12.6 respectively. It requires the use of several different tools, and ultimately relies on a delicate iteration technique. Finally, in Sect. 12.7, we shall briefly show how to get Theorem 8 from Theorem 10. In the following, given a ball $B \subset \Omega$, we define the excess of a vector field $f \in L^1(B, \mathbb{R}^k)$ over B as This functional, roughly speaking, provides an integral measure of the oscillations of the vector field f in the ball B. An elementary property of the excess is given by the following inequality: We remark that here and in the following we shall use the comparison estimates presented in Sects. 10 and 11, which in fact apply in particular to the case of the solutions considered in Theorem 10.
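For the reader's convenience we spell out the standard form of the excess functional and of its elementary property. Both are stated under the assumption that they coincide with the displays meant above, which are not reproduced here; the last line is the consequence typically used to dispose of the range σ ∈ (1/2, 1] in decay estimates.

```latex
% Assumed definition: excess of f \in L^1(B,\mathbb{R}^k) over a ball B,
% with (f)_B the integral average of f over B.
\[
  E(f,B) := \frac{1}{|B|}\int_B |f-(f)_B|\,dx .
\]
% Elementary property: for every \gamma \in \mathbb{R}^k,
\begin{align*}
  |f-(f)_B| &\le |f-\gamma| + |\gamma-(f)_B|,
  \qquad |\gamma-(f)_B| \le \frac{1}{|B|}\int_B |f-\gamma|\,dx,\\
  E(f,B) &\le \frac{2}{|B|}\int_B |f-\gamma|\,dx .
\end{align*}
% Consequence for concentric balls \sigma B \subset B with \sigma \in (1/2,1]:
% choose \gamma = (f)_B on \sigma B and enlarge the domain of integration.
\[
  E(f,\sigma B) \le \frac{2}{|\sigma B|}\int_{\sigma B} |f-(f)_B|\,dx
  \le 2\,\sigma^{-n}\,E(f,B) \le 2^{\,n+1}\,E(f,B).
\]
```

In other words, the average $(f)_B$ is optimal in the excess up to the factor 2, and the excess is stable when the ball shrinks by a bounded factor.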

A density property of a-harmonic functions
In this section we shall consider a solution $v \in W^{1,p}$ where $B \subset \mathbb{R}^n$ is a given ball. We here restate in a suitable form the basic a priori regularity estimates for solutions to the equation in (99); proofs can be found for instance in [18,21,51,55]. The first is the classical gradient $L^{\infty}$-$L^1$ bound which is valid whenever γ ∈ (0, 1) and for a constant $c_l$ depending only on n, p, ν, L. Next, we restate in a suitable way the gradient Hölder continuity estimate included in display (35); more precisely, we select $\beta \equiv \beta(n, p, \nu, L) \in (0, \alpha_M)$ such that holds whenever $\sigma B \subset B$ is a ball concentric to B with σ ∈ (0, 1/2), such that $x_1, x_2 \in \sigma B$, where again $c_h \ge 1$ depends only on n, p, ν, L. Based on the previous regularity estimates, we can prove the following result, which roughly says that something that happens on average actually happens everywhere, and in a precisely quantifiable way.

Proposition 2 (Density improvement)
Let v ∈ W hold for some integer k ≥ 1 and for numbers Γ ≥ 1, λ > 0 and σ ∈ (0, 1/4) such that where β and c h are the constants appearing in (101). Then Proof The first inequality in (102) implies that there exists a point x 0 ∈ σ k B such that On the other hand, as (101) holds, the second inequality in (102) gives , the last two inequalities and (103) then give for all x ∈ σ B.
Finally, we recall two decay estimates that hold for the excess functional of the gradient and of the solutions. They are again related to the Hölder continuity of the gradient of solutions to (99) and of the solutions themselves, in the case we are considering homogeneous equations with measurable coefficients of the type div a(x, Dv) = 0 in B.
We now connect two basic regularity results for solutions to (99) holds whenever σ B ⊂ B are concentric balls.

Remark 5
The exponent β appearing in (105) can be chosen arbitrarily close to the exponent $\alpha_M$ appearing in (34). As a matter of fact, estimate (35) is obtained as a corollary of the one in (105) via the standard Campanato integral characterisation of Hölder continuous functions [10]. The exponent $\alpha_M$ considered in Theorem 9 is indeed nothing but the supremum of the numbers β admissible in Theorem 17. Explicit, though not optimal, estimates on these numbers via the structure parameters n, p, ν, L can be retrieved by tracking the dependence of the constants in the proofs in [19,21,47].

Theorem 18
Let $v \in W^{1,p}_{\mathrm{loc}}(B) \cap W^{1,1}(B)$ be a weak solution to (104) under the assumptions (18), where the vector field $a : \Omega \times \mathbb{R}^n \to \mathbb{R}^n$ has measurable dependence on x. There exist constants $\tilde{\beta} \in (0, 1]$ and $\tilde{c}_d \ge 1$, both depending only on n, p, ν, L, such that the estimate holds whenever $\sigma B \subset B$ are concentric balls, where r is the radius of B. Proof The proof combines a few well-known facts in the regularity theory of the equation considered. Let us recall the following Caccioppoli type inequality, which states that there exists a constant c ≡ c(n, p, ν, L) such that where $\tilde{\beta} < \alpha_m$ and $c \equiv c(n, p, \nu, L, \tilde{\beta})$. Both the inequalities in the last displays are typical in the literature (see for instance [25, Chapter 7]) when using $L^p$-norms instead of the $L^1$-ones. Combining them then yields that proves (107) in the case σ ∈ (0, 1/2]. On the other hand, the remaining case σ ∈ (1/2, 1] is trivial by using (98).

Remark 6
Exactly as in Remark 5, the exponent $\tilde{\beta}$ appearing in Theorem 18 can actually be taken as close to $\alpha_m$, appearing in (30)-(31), as we please. As a matter of fact, estimates (31) and (106) are equivalent, by noticing that if v solves then so does v − k, whenever k is a real number. Again, the exponent $\alpha_m$ used in Theorem 6 is actually defined as the supremum of the numbers β for which (106) works. Estimates for such numbers are available in terms of the structure parameters n, p, ν, L.

Setting of the constants
In the following all the balls considered will be centered at the point $x \in \Omega$, and we start from a ball $B_R \equiv B(x, R) \subset \Omega$ as in the statement of the theorem. We introduce $\lambda_M$ as and fix the constant H ≥ 1 in a few lines (see (114) below), in a way that makes it depend only on n, p, ν, L; we will then prove that holds uniformly in α ∈ [0, 1], for yet another constant c ≡ c(n, p, ν, L). Clearly, we may assume without loss of generality that $\lambda_M > 0$, otherwise there is nothing to prove. Let us now fix a few constants that will be relevant in the following; the constants $c_l, c_h, c_d$ have been defined in (100), (101) and Theorem 17, respectively, while $c_1$ has been introduced in Lemma 2 (that here will be used with q = 1); all these constants depend only on n, p, ν, L. Again, β ≡ β(n, p, ν, L) ∈ (0, 1) is the exponent appearing in (101) and (105). Then we fix It follows that $\delta_1$ also depends only on n, p, ν, L. With such a choice of $\delta_1$ we consider the balls in (81), that is Again, with $\delta_1$ now being fixed, we determine the constant $c_2$ from Lemma 3 as follows: The following inclusions hold for every j ≥ 0, and they will be used throughout the proof: and $\bar{r}_j := (r_j + r_{j+1})/2$ for j ≥ 0, and observe that Therefore, by (109) and the choice in (114) it follows that Of course we are using constants like $10^8$ to emphasize the fact that in certain places of the proof what matters is to take large/small quantities. In the rest of the proof we shall denote, for every j ≥ 0, and

Iterating quantities
The proof is based on the fact that certain quantities, related to the fractional maximal operator we are considering, stay bounded. Specifically, we shall prove that the following inequality holds: Let us briefly recall that the last inequality implies the one in display (110) with $c = (2/\delta_1)^n$. Indeed, for $0 < \varrho \le r_1$ we find i ≥ 1 such that $r_{i+1} = \delta_1 r_i < \varrho \le r_i$, therefore On the other hand, if $r_1 < \varrho \le R$, recalling that r = R/2, then we similarly have where this time we have used (109) directly. All in all (110) follows with $c = (2/\delta_1)^n$. Let us now define whenever j ≥ 1. Notice that, since $r_{j-1}^{1-\alpha} \le R^{1-\alpha}$ for every j ≥ 1, then the definitions in (109) and (114) imply that Notice that the last inequality implies, in particular, that The rest of the proof is now devoted to establishing (119). This will be done using induction on certain iteration chains defined in the next step.

Iteration chains
We consider the set L defined by and, accordingly, for k ≥ 1, we then define the set (123) and call it maximal iteration chain of length k, starting at i. In other words, we have $C^k_i = \{i, \ldots, i + k\}$ and each element of $C^k_i$ but i lies outside of L; $C^k_i$ is maximal in the sense that there cannot be another set of the same type properly containing it. Obviously, such sets do not exist when L = N. In the same way we define as the infinite maximal chain starting at i. Notice that, in every case, the smallest element of such a chain always belongs to L, being then the only one of the chain to have such a property. Now, observe that if L = N\{0}, then we are finished, as in this case we have $C_j \le \lambda_M/100$ for every j ≥ 1 and therefore where we also used (121). Let us then assume that L ≠ N\{0}. Since anyway 1 ∈ L by (120), then there must be at least one iteration chain. Let us therefore consider one of these, $C^k_i$, k ≤ ∞. We shall use finite induction to show that $\sup_{i \le j \le i+k}$ This, due to the fact that $C^k_i$ is arbitrary, will finally prove (119) since if a certain index j does not belong to L, then it must belong to some chain. On the other hand, if j does not belong to any chain then it belongs to L and the corresponding inequality in (119) is automatically verified by the definition of $C_j$ and (125). We are now reduced to showing the validity of (126) for any iteration chain $C^k_i$.

Density and decay properties along iteration chains
The idea is now that along iteration chains, a condition of the type $C_j \ge \lambda_M/100$ guarantees that the equation becomes nondegenerate and can be in a sense linearized. The consequence is that the excess functional of the gradient decays in a way that resembles that of solutions to the Poisson equation. To implement this we need a few density estimates. Let us first prove a cheap decay estimate, which actually also holds outside the context of the present proof, but whose assumptions are obviously satisfied here by (111).

Lemma 4 (Cheap decay estimate)
Let the maps v j ∈ u + W 1, p 0 (B j ) be defined in (112) and the balls {B j } be defined in (81). Assume that the number δ 1 satisfies Then the following estimate holds for every j ≥ 0 and for a constant depending only on n, p, ν, L but not on α ∈ [0, 1]: .

(128)
Here the numbers {E j } have been defined in (117), and the constants c 1 , c d have been introduced in (76) and (105), respectively.
Proof By using, in order, triangle inequality, Lemma 2 and Theorem 17 with v ≡ v j , and also using (98) repeatedly, we have so that (4) follows by (127) and noticing that .

(129)
The following lemma is an easy consequence of the previous one: where λ M > 0 is defined in (109), then it also holds that Proof Using (128) we have where in the last estimate we have employed also (116).
The cheap decay estimate for the excess functional of Lemma 4 is not sufficient to get gradient estimates via linear Riesz potentials. Indeed, it only implies Wolff potential estimates as in (23), as shown in [21]. Anyway, by using it together with the density properties of Proposition 2 and Lemma 3, we deduce another, better decay estimate, which this time implies linear potential estimates.

Lemma 6 (Linearized decay estimate)
Let j ≥ 1 be an integer; if holds and if it happens that then we have where $c_2$ depends only on n, p, ν, L and has been defined in (113).
Proof Using triangle inequality, the second inequality in (132) and Lemma 2 we estimate where we have used (129) so that we conclude with max k= j, j+1 and in any case, since holds, where the numbers λ j have been defined in (118). On the other hand, observe that using Lemma 2 and (131) we have and taking again (116) into account we have Applying estimate (100) then yields We are now in position to apply the density improvement results from Proposition 2; indeed conditions in (102) are satisfied with the (134)), λ ≡ λ j−1 and = 10 3n c l ; observe that this is possible by the choice of δ 1 made in (111). Notice that (103) is satisfied due to the choice of δ 1 in (111). Proposition 2 now yields The previous inequality and (135) summarise in the following: On the other hand, notice that again by (116) we have We are thus in position to apply Lemma 3 thereby obtaining where the constant c 2 has been fixed in (113). With the last comparison estimate we can conclude as in Lemma 4, that is using Theorem 17 and the inequality in the last display as follows: and the proof of (133) is complete recalling that 4c d δ β 1 ≤ 1/4 by (111).

Iteration and conclusion
Here we finally prove that (126) holds for any iteration chain C k i , thereby concluding the proof; we remark that by construction any chain C k i is such that i ≥ 1 since the numbers C j are defined only for j ≥ 1. We shall first prove by induction that holds for every j ∈ {i, . . . , i + k − 1}. Notice that by definition of iteration chains we always have k ≥ 1. Moreover, notice also that we are going to use finite induction when k < ∞, that is when the length of the iteration chain is finite; when k = ∞ we are simply going to prove (137) for every j ≥ i. We start by considering the case j = i, which is the induction basis. This follows by the very definitions of C k i and C i as far as (137) 2 is concerned. As for (137) 1 , notice that using (121) we have Exactly in the same way we also have This last inequality allows to get (137) 3 . Indeed, we need Lemma 6 with j = i; this applies by the inequality in the last display and since by definition of iteration chain C k i we have C i+1 ≥ λ M /100; note that here we are also applying (137) 2 . Next, we proceed verifying the induction step; notice that this case occurs only when k ≥ 2 while when k = 1 the first step already concludes the proof of (137) since {i} = {i, . . . , i + k − 1}. We assume that (137) holds for all indexes j ∈ {i, . . . , h} with h < i + k − 1, and then we prove it for the index h + 1. By Lemma 5 and (137) 2 (for j = h), we immediately have Next, we are going to prove that To this aim we start observing that, whenever 0 ≤ k 1 < k 2 are integers, we have so that by letting k 1 = i and k 2 = h + 1 we obtain By adding up inequalities (137) 3 for j ∈ {i, . . . , h} and yet adding r 1−α i E i to both sides we get so that reabsorbing terms yields Using then (137) 2 for j = i and (116) allows to conclude with The last inequality together (141) gives (recall that Merging the last inequality with (121) and (138), and again using Jensen's inequality, we estimate so that (139) follows. 
To complete the induction step we shall finally prove (137) 3 with j = h + 1, that is For this we want to apply Lemma 6 in the case j = h +1; let us verify that assumptions are satisfied. Notice that since here we are assuming that h ≤ i + k − 2 then C h+2 ≥ λ M /100; moreover, (138) holds and by induction assumption we have At this point Lemma 6 applies and (143) follows. All in all, using induction (in particular, finite induction when k < ∞) this proves sup i≤ j≤i+k−1 In particular, this implies (126) when k = ∞, otherwise we still have to prove that This has been on the other hand implicitly proved before. Indeed, since now (137) holds in the full range j ∈ {i, . . . , i + k − 1} we repeat the computation from (141) to (142) with h = i + k − 1, that indeed gives (145). This, together with (144), finally gives (126). The proof is complete.
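The telescoping step used repeatedly in the induction (comparing averages of the gradient on two balls of the chain) can be written out as follows. This is a sketch under stated assumptions: the balls $B_j$ are concentric with radii $r_{j+1} = \delta_1 r_j$, and $E_j$ is taken to be the mean oscillation of $Du$ on $B_j$; the paper's own definition in (117) and the exact form of (140) are not reproduced here and may carry different constants.

```latex
% Assumption: E_j := \frac{1}{|B_j|}\int_{B_j} |Du - (Du)_{B_j}|\,dx, \quad r_{j+1} = \delta_1 r_j.
\[
\bigl|(Du)_{B_{j+1}} - (Du)_{B_j}\bigr|
  \le \frac{1}{|B_{j+1}|}\int_{B_{j+1}} \bigl|Du - (Du)_{B_j}\bigr|\,dx
  \le \frac{|B_j|}{|B_{j+1}|}\,\frac{1}{|B_j|}\int_{B_j} \bigl|Du - (Du)_{B_j}\bigr|\,dx
  = \delta_1^{-n} E_j .
\]
% Summing over j and using the triangle inequality, for integers 0 \le k_1 < k_2:
\[
\bigl|(Du)_{B_{k_2}} - (Du)_{B_{k_1}}\bigr|
  \le \sum_{j=k_1}^{k_2-1} \bigl|(Du)_{B_{j+1}} - (Du)_{B_j}\bigr|
  \le \delta_1^{-n} \sum_{j=k_1}^{k_2-1} E_j .
\]
```

In this form it is visible why summable control of the excess quantities along a chain yields control of the averages, which is the mechanism behind the bound on the numbers $C_j$.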

Proposition 1 and Theorem 8
Proof of Proposition 1 Let us define $\varrho_i = r/2^i$ for every integer i ≥ 0 and r = 101|x − y|/100. Then observe that and, upon summation, we get We have used the elementary inequality Notice that this shows that $\{(f)_{B(x,\varrho_i)}\}$ is a Cauchy sequence and therefore the following limit exists: Let us now show that the limit exists too, and therefore defines the precise representative of f at x, which is indeed denoted by f(x). To this aim, take $0 < \varrho \le \varrho_0$. There exists an integer i such that $\varrho_{i+1} < \varrho \le \varrho_i$; therefore, as for (146), we have The last inequality, together with the existence of the limit in (148), proves the existence of the limit in (149). Replacing x by y in the previous argument, and choosing this time and in the same way we can prove that the precise representative of f can be defined at y. Notice now that telescoping summation and (147) give with a similar estimate for balls centred at y following from (150). Therefore, using the triangle inequality, and recalling the choices of r made for the points x and y, we conclude with To estimate the first term in the right hand side of the previous inequality we proceed as follows: Collecting the inequalities in the last two displays yields (43). Notice that in the last display we have used that B(y, |x − y|/100) ⊂ B(x, 101|x − y|/100). We have also used that

Proof of Theorem 8 Applying Proposition 1 in the ball $B_{R/4}$ and then (42), we get that the estimate holds for every choice x, y ∈ $B_{R/4}$. The constant c appearing in the previous estimate depends only on n, and is independent of α ∈ [0, 1]. In order to estimate the last two integrals, we recall the following Caccioppoli type inequality: that holds whenever γB ⊂ Ω is a ball with radius γr, and γ > 1. The constant c depends only on n, p, ν, L and γ. See [40, Proposition 4.1] for a proof. We apply (152) to B(x, 5R/8) and B(y, 5R/8) with the choice γ = 7/6 and we use it in (151). Observe that this is possible since Then we notice that the following elementary estimate holds whenever σ ∈ (0, 1): (with a similar one centred at y), and, using it with σ = 35/48, we finally conclude with (33).
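The numerical choices γ = 7/6 and σ = 35/48 in the proof of Theorem 8 are consistent with a simple inclusion of balls. The computation below is a sketch of one plausible reading, since the display justifying the application of (152) is not reproduced above; it assumes that $B_{R/4}$ and $B_R$ are concentric and, only to simplify notation, centred at the origin.

```latex
% Assumption: B_{R/4} and B_R are concentric and centred at the origin.
\[
\gamma\cdot\frac{5R}{8} = \frac{7}{6}\cdot\frac{5R}{8} = \frac{35R}{48},
\qquad
|x| < \frac{R}{4}
\;\Longrightarrow\;
B\Bigl(x,\frac{35R}{48}\Bigr) \subset B_{R/4+35R/48} = B_{47R/48} \subset B_R .
\]
```

Under this reading the enlarged balls $\gamma B(x, 5R/8)$ and $\gamma B(y, 5R/8)$ stay inside $B_R$, and the ratio of their radius to R is exactly 35/48, the value of σ used in the last step.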

Remark 7
There is a variant of estimate (33), which is now uniform in the whole range α ∈ [0, 1], and it is the following: The constant c appearing in the above estimate is this time independent of α, and only exhibits a dependence on n, p, ν, L. Estimate (153) follows exactly as estimate (33), but tracking the dependence of the constants on α starting from (151). Estimate (153) describes in a precise way the blow-up rate of the constant c in (33) when α → 0. More precisely, the coefficient 1/α appears to be related to the loss of $L^\infty$-estimates towards BMO-ones already discussed after Proposition 1. Moreover, it is the same blow-up rate appearing when fractional estimates get lost as the fractional differentiability parameter vanishes. See for instance [52].
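The factor 1/α can be traced to a geometric sum over dyadic radii. The following worked computation is a sketch, assuming dyadic radii $\varrho_i = r/2^i$ as in the proof of Proposition 1 and assuming that each dyadic step contributes a factor $\varrho_i^{\alpha}$; the precise constants in (33) and (153) are not reproduced here and may differ.

```latex
% Assumption: \varrho_i = r/2^i, and each dyadic step contributes at most \varrho_i^{\alpha}.
\[
\sum_{i=0}^{\infty} \varrho_i^{\alpha}
  = r^{\alpha} \sum_{i=0}^{\infty} 2^{-i\alpha}
  = \frac{r^{\alpha}}{1 - 2^{-\alpha}},
  \qquad 0 < \alpha \le 1 .
\]
% g(\alpha) = 1 - 2^{-\alpha} is concave with g(0) = 0 and g(1) = 1/2,
% hence g(\alpha) \ge \alpha/2 on [0,1].
\[
\frac{1}{1 - 2^{-\alpha}} \le \frac{2}{\alpha}
  \quad\Longrightarrow\quad
\sum_{i=0}^{\infty} \varrho_i^{\alpha} \le \frac{2\, r^{\alpha}}{\alpha} .
\]
```

The bound degenerates exactly like 1/α as α → 0, where the geometric series stops converging; this matches the transition from pointwise control to BMO-type control described in the remark.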

Proof of Theorem 12
This theorem has been proved in [40] in a slightly different form and with a different proof; Theorem 12 is essential for the proof of Theorem 9. We consider the sequence of shrinking balls introduced in (81) with the corresponding solutions v j ∈ u +W 1, p 0 (B i ) defined in (112); the number δ 1 ∈ (0, 1/4) is this time chosen as while we take r := R. Here we take ε ∈ (0, 1/2) to be a fixed number. Finally, we recall the definition of the excess quantities {E j } in (117). Before going on let us record an identity of later use, that is We are now going to use estimate (105) as we have done for Lemma 4. In particular, in light of Remark 5, withα < α M , we can choose in (105) a number β such that α < β < α M and we take the one in (154). Therefore, recalling that .
By recalling (154) we conclude with where $c \equiv c(n, p, \nu, L, \tilde\alpha)$ as a consequence of the fact that the choice in (154) determines a number $\delta_1$ which again depends on $n, p, \nu, L, \tilde\alpha$ and ε via the dependence of the quantities β and $c_d$. Observe that the resulting estimate is independent of α as long as this belongs to $[0, \tilde\alpha]$. From (156) it follows that Taking for instance ε = 1/4, reabsorbing terms and adding $R^{-\alpha}E_0$ to both sides of the resulting inequality, we conclude with This last estimate implies the statement of the theorem, that is where $c \equiv c(n, p, \nu, L, \tilde\alpha)$ and S in fact denotes the quantity appearing in the right hand side of (158). Indeed, let us consider a positive radius $\varrho \le R$, and determine j ≥ 0 such that $r_{j+1} < \varrho \le r_j$; then using also (98) we have and (159) follows keeping in mind the dependence of $\delta_1$ on $n, p, \nu, L, \tilde\alpha$. The proof is complete.
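The passage from the discrete radii $r_j$ to an arbitrary radius can be made explicit. The lines below are a sketch under assumptions: $E(f, B)$ denotes the mean oscillation of $f$ on $B$, the balls are concentric with $r_{j+1} = \delta_1 r_j$, and the property quoted as (98) is taken to be the standard fact that the mean oscillation is at most twice the mean deviation from any fixed vector; the paper's own statement of (98) is not reproduced here.

```latex
% Assumptions: E(f,B) := \frac{1}{|B|}\int_B |f - (f)_B|\,dx, \quad E_j := E(Du, B_j),
% and r_{j+1} < \varrho \le r_j with r_{j+1} = \delta_1 r_j.
\[
E(Du, B_\varrho)
  \le \frac{2}{|B_\varrho|} \int_{B_\varrho} \bigl|Du - (Du)_{B_j}\bigr|\,dx
  \le 2 \Bigl(\frac{r_j}{\varrho}\Bigr)^{n} E_j
  \le 2\,\delta_1^{-n} E_j .
\]
% Weighted version, using \varrho^{-\alpha} \le r_{j+1}^{-\alpha} = \delta_1^{-\alpha} r_j^{-\alpha}:
\[
\varrho^{-\alpha} E(Du, B_\varrho) \le 2\,\delta_1^{-n-\alpha}\, r_j^{-\alpha} E_j .
\]
```

Since $\delta_1$ depends only on the structural data, the extra factor is harmless, which is why a bound along the sequence $\{r_j\}$ suffices for the full maximal-type estimate.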

A VMO-type result
A modification to the proof of Theorem 12 allows to get a corollary that is interesting in itself and that will be useful in order to obtain subsequent results.
Theorem 19 (VMO gradient regularity) Let $u \in W^{1,p}(\Omega)$ be a weak solution to the Eq. (13) under the assumptions (16). Assume that
Proof In order to prove (162) we show that, for every choice of $\bar\varepsilon$, there exists $\bar R > 0$, depending on $n, p, \nu, L, \bar\varepsilon$, μ and the point x, such that We revisit the proof of Theorem 12, where we consider the case $\alpha = \tilde\alpha = 0$. Assume that B(x, R) ⊂ Ω. Estimate (157) then gives which is valid as soon as $\delta_1$ and ε are linked as in (154), and ε ∈ (0, 1/2). In particular, we also have where we have used the abbreviation $S_1 \equiv S_1(x)$. Now, let us notice how the previous argument implies that holds for every $\varrho \le \delta_1 R = r_1$, where $\delta_1$ is again chosen as in (154). Indeed, consider a number $\varrho \le \delta_1 R$ with $\tilde\alpha = 0$; then there exists an integer k ≥ 1 such that $r_{k+1} < \varrho \le r_k$. This means that $\varrho = \delta_1^k \tilde R$ for some $\tilde R \in (\delta_1 R, R]$. Now, in the proof of Theorem 12 replace R by $\tilde R$, and we gain a new chain of balls $B_j$ for which (164) obviously holds; indeed, notice that With this new definition we have $E(Du, B_\varrho) = E_k$ and therefore (165) is nothing but (164) with this new choice of the radii $\{r_j\} \equiv \{\delta_1^j \tilde R\}$. All in all we have proved that (165) holds with $\delta_1$ depending only on n, p, ν, L and ε. Let us recall that estimate (165) holds with any choice of the initial radius R, which is a free parameter that is going to be chosen in the next lines; notice indeed that all the constants determined up to now are independent of the starting radius R.
Now, first choose ε ≡ ε(n, p, ν, L ,ε) small enough in order to have This determines the constant c(ε) as a function of n, p, ν, L ,ε; notice that this is possible by the first assumption in (161), while using the second we can determine a radius R 1 , depending essentially on n, p, ν, L ,ε and the point x (via the rate of convergence in (161)), such that Combining the content of the last two displays with the one of (165) (where we now take with R ≡ R 1 ), and recalling the definition of the excess functional E(Du, B R ), yields (163) with the choiceR = δ 1 R 1 . As for the local VMO-regularity of Du we recall that this means that, for everyε, the choice ofR in (163) can be done independently of the point x as long as BR varies in a fixed compact subset . We can now check this by showing that all the choices of the constants above can be done independently of the point x, but depending only on the fixed subset . Indeed (166) can be replaced by The choice of R 1 in (167) can be made uniform as well by (161), which is now uniform with respect to x, and the proof of the VMO-regularity of Du is complete, too.
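The rescaling device used in the proof above (writing an arbitrary small radius as an exact member of a geometric sequence with a slightly smaller starting radius) is easy to check numerically. The sketch below is illustrative only: the function name `rescale_radius` and its parameters are hypothetical, with `delta` playing the role of the shrinking rate and `R` that of the starting radius.

```python
from typing import Tuple


def rescale_radius(rho: float, R: float, delta: float) -> Tuple[int, float]:
    """Write rho = delta**k * R_tilde with delta * R < R_tilde <= R.

    Assumes 0 < rho <= R and 0 < delta < 1. The integer k is the index of the
    shell (delta**(k+1) * R, delta**k * R] that contains rho.
    """
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    if not (0.0 < rho <= R):
        raise ValueError("rho must lie in (0, R]")
    k = 0
    r_tilde = rho
    # Undo one shrinking step at a time until the radius exceeds delta * R.
    while r_tilde <= delta * R:
        r_tilde /= delta
        k += 1
    return k, r_tilde


if __name__ == "__main__":
    k, r_tilde = rescale_radius(0.03, 1.0, 0.25)
    print(k, r_tilde)
```

Starting the geometric sequence from the returned radius instead of `R` makes `rho` one of its members, so an estimate proved along the sequence transfers to every radius below `delta * R` without any loss in the constants.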

Proof of Theorem 15
We first prove the theorem in the case we have an energy solution u ∈ W 1, p ( ); this will be done through Steps 14.1-14.3. Then, in the final Step 14.4, we describe the approximation argument to prove the theorem for any SOLA u ∈ W 1, p−1 0 ( ) to problem (55). We indeed prefer a separate treatment of the two cases since in this second part we shall also present the general arguments to prove that the other potential estimates presented in this paper for energy solutions actually hold for SOLA too.

Proof in the case of energy solutions: beginning
We fix a ball B R ≡ B(x, R) ⊂ and then consider in the rest of the proof balls which are concentric to it. We notice that by assumption (60) the quantity is finite, where c ≡ c(n, p, ν, L) is the constant appearing in (40). We shall then prove that, for every ε > 0, there exists a radius r ε ≤ R, in general depending only on n, p, ν, L and the point x, such that This will finally prove (61). Notice that with the above definitions, by (40) Notice that, in order to verify the assumptions in (161) we have implicitly used the fact that I

Setting of the constants and a sequence
We revisit the arguments of the proof of Theorem 10, while in the following the number ε is fixed as the one introduced in (170). As usual the constants c l , c h , c d have been defined in (100), (101) and (105), respectively; c 1 is from Lemma 2 (where we take q = 1) and β ≡ β(n, p, ν, L) ∈ (0, 1) appears in (101) and (105). We this time fix δ 1 ≡ δ 1 (n, p, ν, L , ε) as With such a choice of δ 1 we determine the constant c 2 from Lemma 3 as follows: Notice that all the constants appearing up to now show a global dependence on the constants n, p, ν, L and ε. Now we can take a radius R l ≡ R l (n, p, ν, L , ε, x) ≤ R such that the following two conditions are satisfied:  (117). Notice also that (177) and a computation totally similar to the one in (115) (where we take α = 1) give In Step 14.3 below, we will prove that Let us assume for a moment that (180) and let us finish the proof of (170) with the choice Indeed, let us fix 0 < τ < ≤ r ε . This means that there exist two integers k and h, such that 2 ≤ k ≤ h, hold. Applying (178), and taking (182) into account, we get Similarly, we also gain Using the inequalities in the last two displays together with (180) and triangle inequality establishes (170). It remains to prove (180) and this will be done in the next step.
14.3 Non-degenerate iteration chains and proof of (180) We set, for j ≥ 1 and then, as in (122), we define Accordingly, we also define the chains C k i and C ∞ i as in (123) and (124), respectively. The difference here is the appearance of the parameter ε in the new definition of the set L in (183); this allows to interpret the sets C k i as maximal chains along which the equation becomes nondegenerate. In this case it will be useful to consider the following number: Needless to say, in case L is empty we have j m = ∞ and this is actually a favourable case since the problem never becomes degenerate. The following lemma is similar to Lemma 6: holds; then we have where c 2 depends only on n, p, ν, L , ε and has been defined in (176).
Proof As for Lemma 6 we estimate where we used (76) and (179). On the other hand, using triangle inequality, (171), (179) and yet Lemma 2, we have Applying estimate (100) with v ≡ v j−1 then yields In view of (186)-(187) we are in position to apply Proposition 2 with the choices v ≡ v j−1 , B ≡ B j−1 , σ = δ 1 , k = 1, λ ≡ λ M and Γ = 10 3n c l /ε; observe that this is again possible by the choice of δ 1 made in (175). This yields The inequality in the above display and (187) summarise in the following line: and therefore we are in position to apply Lemma 3 thereby obtaining where the constant c 2 has been fixed in (176). With the last comparison estimate we can conclude with (185) as in (136).
We now proceed with the proof of (180), obviously assuming k < h. We then analyse three different cases.
Case 1: $k < h \le j_m$, where we recall that the number $j_m$ has been defined in (184). By the very definition of $j_m$ it follows that $C_{j+1} \ge \lambda_M\varepsilon/100$ holds for every j ∈ {k − 1, . . . , h − 2} and therefore we apply Lemma 7 to get that Finally, recalling (178) and (179), we conclude with On the other hand, taking $k_1 = k$ and $k_2 = h$ in (140) and using the inequality in the last display we can estimate and (180) follows.
Case 2: $j_m \le k < h$. Here (180) follows as a consequence of the inequalities If h ∈ L, the first inequality in (190) follows immediately from the definition of L; we can therefore assume h ∉ L. Then, as $h > j_m$, it is possible to consider a nondegenerate iteration chain $C^{m_h}_{i_h}$ with $m_h > 0$, such that $h \in C^{m_h}_{i_h}$; notice that $h > i_h$ as h ∉ L, while $i_h \in L$ since by definition the first index in a chain does belong to L. We are again in position to apply Lemma 7 for $j \in \{i_h, \ldots, i_h + m_h - 1\}$, so that (188) holds for the corresponding indices. Summing up, proceeding as after (188), and adding $E_{i_h}$ to both sides of the resulting inequality, we arrive at We have again used (178) and (179). Proceeding as in (140) (taking $k_1 = i_h$ and $k_2 = h$), and using the definition of $C^{m_h}_{i_h}$ to estimate that is the first inequality in (190). The proof of the second inequality in (190) is completely similar; we just observe that we can assume that $k > j_m$, otherwise the inequality itself is trivial as k ∈ L.
Case 3: k < j m < h. This can be actually treated by a combination of the first two cases. It is indeed sufficient to prove that the inequalities in display (190) still hold. The first inequality in (190) follows exactly as in Case 2. As for the second estimate in (190), let us remark that, as j m ∈ L, we have that On the other hand, we can repeat the argument of Case 1, with h replaced by j m , iterating from h to j m and thereby obtaining as in (189). Finally, the inequalities in the last two displays and yet triangle inequality give and therefore the second inequality in (190) follows. The proof of Theorem 15 is therefore complete in the case of an energy solution u ∈ W 1, p ( ).

Very weak solutions
In this section we are going to prove Theorem 15 for SOLA. We shall also obtain the extensions of Theorems 10 and 19 to SOLA, since these are indeed needed as preliminary results. The method we propose here is of course based on an approximation argument, but, rather than passing to the limits in the final estimates, we will pass to the limits "in the proofs". Strictly speaking, we shall pass to the limits in some estimates from the proofs. The same methods developed for energy solutions will then work for SOLA. We start with a preliminary result. Let us consider a ball B(x, R) ⊂ , and a sequence of shrinking balls {B j }, concentric to B(x, R): Then we have the following: and hold whenever q is as in (78) -There exists a constant c ≡ c(n, p, ν) such that holds whenever h > 0 and ξ > 1, where c ≡ c(n, p, ν) ≥ 1. The function V (·) has been defined in (71).
Proof The proof goes via approximation and we confine ourselves to explain the arguments for the inequality in display (192), those for the other ones being completely analogous. We preliminary recall that by the results in [8,55]-see also Theorem 14any SOLA u to (55) belongs to W 1,q loc ( ) for the numbers q in the range described in (76). With u being now a fixed SOLA to (17), we consider the associated sequence {u k } ⊂ W 1, p loc ( ) from Definition 8 and note that using a standard diagonal we may assume, up to passing to not relabelled subsequences, that Du k → Du a.e.. With B j being fixed we now build the function v j as required in the statement of the lemma. We start defining w k ∈ u k + W 1, p 0 (B j ) as the unique solution to the Dirichlet problem div a(x, Applying Lemma 2 in this context gives . The next step is now letting k → ∞ in the previous inequality, and for this we need to recall a few regularity properties of the functions w k . An important point is that the way the functions w k have been defined gives that they belong to the Sobolev space W 1, p (B j ) and this allows to apply the regularity theory available for such solutions. By (196) and the very definition of the functions u k (they are converging to u in W 1, p−1 0 ( )) it follows that the sequence {w k } is bounded in W 1, p−1 (B j ); in turn this fact and the bound in (100) allows to get a uniform bound of the type Dw k L ∞ (γ B j ) ≤ c(γ ). In the same way we deduce, by using (101), that the sequence of maps {Dw k } is equicontinuous in γ B j . We can therefore invoke Ascoli-Arzelá's theorem that, together with a standard diagonal argument (recall that here γ < 1 is arbitrary), gives that there exists a function w ∈ W 1,∞ loc (B j ) ∩ u + W p−1 0 (B j ) such that, up to a not relabelled subsequence, we have w k → w and Dw k → Dw locally uniformly in B j . 
Now, the locally uniform convergence of the gradients $Dw_k$ allows us to pass to the limit in the equations (195) and to conclude that div a(x, Dw) = 0. Fatou's lemma now gives  (B j ) and therefore standard regularity theory applies to them. The only difference is that the quantities $|\mu|(\overline{B_j})$ must replace everywhere the analogous $|\mu|(B_j)$. This is clearly not a problem since the basic inequalities the quantities $|\mu|(\overline{B_j})$ have to satisfy are implied by (116) and (179), which already involve $|\mu|(\overline{B_j})$. Needless to say, whenever we are considering vanishing limits as in (161) the same thing holds when considering closed balls.

Theorem 3 as a corollary
The assertion of Theorem 3 can be now obtained as a corollary of the results presented up to now, and, by the approach in the previous section, directly for SOLA. Indeed, estimate (24) follows directly by estimate (40) and the fact that the finiteness of I μ 1 (x, R) implies that x is a Lebesgue point of Du. This is ensured by Theorem 15, see also Remark 1. It remains to prove that (25) implies the continuity of Du. For this we shall essentially revisit the proof of Theorem 15 given in the previous section and we shall see that the Cauchy sequence information in (170) holds locally uniformly in x. This means that the radius r ε can be chosen independently of the point x as long as this varies in a fixed open subset . The continuity of Du then follows by showing that it can be obtained as the locally uniform limit of the net of continuous maps defined by which are obviously continuous. Note that we already know that the above maps converge pointwise to Du (that is, they converges to its precise representative at every point) as established in Theorem 15. Now, a standard covering argument and the assumption (25) imply that Du is locally bounded in . Therefore, selecting another open subset such that , we set for some R > 0, where c is the constant appearing in (40). This number is going to replace the analogous one picked in (169), making it independent of x. Then, we observe that assuming (25) allows to satisfy condition (174), uniformly in x (with an argument for instance based on computations as in (115)) and this implies that (172) holds again locally uniformly in x via Theorem 19. This, and the assumption (25), imply that we can replace (177)  respectively, where now in addition we take R l < dist( , ∂ ) and λ M is defined as in (198). With the inequalities of the last two displays being now in force it is easy to check that all the inequalities after (179) become uniform with respect to the point x considered. 
In particular, the numbers δ 1 and R l that determine r ε via (181), that is the radius for which (170) holds, are independent of x. This means that the maps defined in (197) converge uniformly to Du and the continuity of Du therefore follows.

Proof of Theorem 9
Theorem 9 has again been proved in a slightly different form in [40]. The proof of Theorem 9 goes essentially in two stages: we first obtain a pointwise estimate for a quantity like |Du − G| (for any vector $G \in \mathbb{R}^n$). Dimensionally speaking, this quantity represents an oscillation of the gradient when G is suitably chosen, i.e. when for instance G is taken to be an average of the gradient. Then we conclude by applying the sharp maximal inequality of Theorem 12, which in fact allows one to estimate, in an integral way, oscillations of the gradient. Notice that in the rest of the proof we shall assume that holds, otherwise there is nothing to prove. We notice that this condition allows us to conclude that both x and y are Lebesgue points of the gradient Du; this fact is indeed implied by Theorem 15, as the Wolff potentials appearing in (199) control the Riesz potential $I^\mu_1$ at the points x, y via a computation similar to the one in (26). We will anyway give a shorter proof of this fact to make the proof of Theorem 9 self-contained; see Lemma 9 below.

General setting
We readopt the general setting of Theorem 10. Therefore we start by x, y ∈ B R/4 and we select r < R/2 to be determined later. Then we follow the scheme described in Sect. 11. In particular, we define the shrinking balls {B i } as in (81); they are all centred at x, while eventually all the arguments will be reapplied when all the balls will be centred at y. The functions v j are defined in (82) while as usual we denote The number δ 1 determining the shrinking rate of the balls B j is instead defined as where the constants c d and β are from Theorem 17. This in turn yields a dependence of δ 1 on n, p, ν, L only.

Estimates on the excess
By the choice of δ 1 we can apply Lemma 4 with α = 1; this gives .
Next, we estimate the last sum in terms of Wolff potentials as follows: .
The content of the last two displays and (98) yields Connecting the last inequality to (212) and eventually to (211) completes the proof of Theorem 9.

Proof of Theorem 16
The proof goes in two steps, reported in Sects. 17.1 and 17.2, respectively. We here give the whole proof of Theorem 16 in the case of an energy solution u ∈ W 1, p ( ); the case of a general SOLA can be treated adapting the arguments valid for the energy case along the lines described in Sect. 14.4 for Theorem 15. In the rest of the proof all the balls will be centred at the point x, starting from an initial ball B(x, R) ⊂ with R ≤ 1, which is not restrictive of course.

Smallness of the excess
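Here we prove that $\lim_{\varrho \to 0} E(u, B(x, \varrho)) = 0$, that is, we prove that for every $\bar\varepsilon > 0$ there exists $\bar R > 0$, basically depending on $\bar\varepsilon$ and on the point x, such that $\varrho \le \bar R \Rightarrow E(u, B(x, \varrho)) < \bar\varepsilon$.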
To start with, we notice that by using (44) With ε been determined, we define δ 1 as where the constantsc d andβ are from Theorem 18. In turn, having δ 1 being defined, we choose the radius r ≤ R/2 such that the smallness condition is satisfied. The constantc 1 has been introduced in (77). Notice that this is possible since the assumption in (62) and the definition of Wolff potential imply that lim →0 W μ 1, p (x, ) = 0.
Determining r allows to consider the sequence of shrinking balls {B j } as defined in (81); the functions {v j } are as in (112). Accordingly, we definẽ Finally, an argument similar to the ones in (203) yields This is indeed an inequality that actually works whenever δ 1 ∈ (0, 1/4) is a free parameter and the balls {B j } are defined as B j := B(x, δ j 1 r ). Now, the first thing we do is looking at Lemma 4 (when α = 0) and make computations similar to the one involved there, but using Theorem 18 instead of Theorem 17, and using (77) instead of (76). Specifically, for every j ≥ 0 we havẽ , which is valid whenever 1 ≤ k ≤ h are integers. We then use (217) to estimate the first two terms in the right hand side of the previous inequality and (219)-(221) to estimate the remaining one; therefore we conclude with On the other hand, take now ≤ r 1 = δ 1 r and determine k ≥ 1 such that r k+1 < ≤ r k . This means that = δ k 1 r for r ∈ (δ 1 r, r ). Now, in the proof of (225) we replace r by r and we get a new chain of cylinders B j = B(x, δ j r ) to which we apply the same reasoning as above; then (225) follows for = r j and every j ≥ 1. In particular it follows that E(u, B(x, )) ≤ε and this means now that now (215) holds forR = δ 1 r where r has been determined via the choice in (219).

Proof that x is a Lebesgue point of u
We are now ready for the proof of (63), therefore showing that for every ε > 0, there exists a radius $r_\varepsilon$, depending on ε, such that We now define $\delta_1 := \left(\frac{1}{10^8\,\tilde c_d}\right)^{1/\tilde\beta}$ (227) where $\tilde c_d$ appears in (106), and then we take a radius $\tilde R_l$ such that $\sup_{0<\varrho\le\tilde R_l} E(u, B(x,\varrho)) + 16\,\tilde c_1\tilde c_d\,\delta_1^{-2n}$ We notice that this is possible thanks to (214) and (220). With $\delta_1$ and $\tilde R_l$ having been defined, we then use the usual shrinking chain of balls $\{B_j\}$ defined in (81) with $r = \tilde R_l$, so that, in particular, we have $r_j = \delta_1^j \tilde R_l$; again, the functions $\{v_j\}$ are as in (112). Similarly to (207), whenever h > k ≥ 1 are integers it holds that We now observe that (223) still works in this setting with ε = 1, according to the choice in (227). This is $\sum_{j=k}^{h}\tilde E_j \le \tilde E_{k-1} + rs\sum_{j=0}^{\infty}\delta_1^{j} + 16\,\tilde c_1\tilde c_d\,\delta_1^{-2n}\,W^{\mu}_{1,p}(x, 2r)$, and we also used (221), which is again an inequality that holds whenever $\delta_1 \in (0, 1/4)$ is a fixed parameter. Combining the inequalities in the last two displays together with (228), we obtain that holds for every choice of h > k ≥ 1.
We can now prove (226) with the choice $r_\varepsilon := \delta_1 \tilde R_l$. Indeed, let us fix $0 < \tau < \varrho \le r_\varepsilon$. This means that there exist two integers k and h, such that 1 ≤ k ≤ h, and, similarly, The last two inequalities and (230) establish (226) and the proof of Theorem 16 is complete.

Back to the roots: Theorem 1 (sketch)
The arguments developed for Theorem 16 allow to get a quick proof of Theorem 1, that we sketch in the following lines. We repeat the construction of Sects. 17.1-17.2 with δ 1 as in (227), but starting from any ball B(x, R) ⊂ , with r = R/2; the sequence {B j } is once again accordingly defined as in (81). In (229) we take k = 1 and let h → ∞; this yields Using this last inequality together with (221) ≡ c(n, p, ν, L), that is (20). The proof of the continuity of u under assumption (21) follows instead making the arguments for the proof of Theorem 16 uniform with respect to x, in the same way we have done in Sect. 15 with respect to the proof of Theorem 15.