Higher-Order Linearization and Regularity in Nonlinear Homogenization

We prove large-scale C^∞ regularity for solutions of nonlinear elliptic equations with random coefficients, thereby obtaining a version of the statement of Hilbert's 19th problem in the context of homogenization. The analysis proceeds by iteratively improving three statements together: (i) the regularity of the homogenized Lagrangian L̄, (ii) the commutation of higher-order linearization and homogenization, and (iii) large-scale C^{0,1}-type regularity for higher-order linearization errors. We consequently obtain a quantitative estimate on the scaling of linearization errors, a Liouville-type theorem describing the polynomially-growing solutions of the system of higher-order linearized equations, and an explicit (heterogeneous analogue of the) Taylor series for an arbitrary solution of the nonlinear equations, with the remainder term optimally controlled. These results give a complete generalization to the nonlinear setting of the large-scale regularity theory in homogenization for linear elliptic equations.


Motivation: Quantitative Homogenization for Nonlinear Equations
This article is concerned with nonlinear, divergence-form, uniformly elliptic equations of the form

−∇ · D_p L(∇u(x), x) = 0 in U ⊆ R^d, d ≥ 2. (1.1)

The Lagrangian L(p, x) is assumed to be uniformly convex and regular in p and to possess some mild regularity in x. Furthermore, L is a stochastic object: it is sampled by a probability measure P which is statistically stationary and satisfies a unit range of dependence. This essentially means that x ↦ L(·, x) is a random field, valued in the space of uniformly convex functions, the law of which is independent of x (or, to be precise, periodic in x; see Section 1.4 for the assumptions).
The objective is to describe the statistical behavior of the solutions of (1.1), with respect to the probability measure P, on large length scales. In other words, we want to understand what the solution u looks like in the case that the "macroscopic" domain U is very large relative to the "microscopic" scale, which is the correlation length scale of the coefficients (taken to be the unit scale).
At a qualitative level, a satisfactory characterization of the solutions, in the regime in which the ratio of these two length scales is large, is provided by the principle of homogenization. First proved in this context by Dal Maso and Modica [11,12], it asserts roughly that a solution of (1.1) is, with probability approaching one, close in L² (relative to its size in L²) to a solution of a deterministic equation of the form

−∇ · D_p L̄(∇u_hom) = 0 in U (1.2)

for an effective Lagrangian L̄ which is also uniformly convex and C^{1,1}.
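Before turning to quantitative questions, it may help to see the homogenization principle in the simplest possible setting. The following sketch (our toy example, not taken from the paper; the coefficient a and all numerical parameters are arbitrary choices) works in one dimension with a periodic linear coefficient, where the solution is explicit and the effective coefficient is the harmonic mean of a; the heterogeneous solution visibly converges to the homogenized one as ε → 0:

```python
import numpy as np

# Toy illustration (ours, not from the paper): one-dimensional homogenization
# in the linear, periodic case.  For -(a(x/eps) u')' = 0 on (0,1) with
# u(0) = 0, u(1) = 1, the exact solution is u_eps(x) = F(x)/F(1) with
# F(x) = \int_0^x dt / a(t/eps); the homogenized coefficient is the harmonic
# mean of a, and the homogenized solution here is simply u_hom(x) = x.
def a(y):
    return 2.0 + np.sin(2.0 * np.pi * y)  # 1-periodic, uniformly elliptic

def u_eps(x, eps, n=None):
    # resolve the oscillations: at least ~200 quadrature points per period
    n = n if n is not None else max(20001, int(200.0 / eps) + 1)
    t = np.linspace(0.0, 1.0, n)
    integrand = 1.0 / a(t / eps)
    # cumulative trapezoid rule for F(t)
    F = np.concatenate([[0.0],
                        np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))])
    return np.interp(x, t, F / F[-1])

x = np.linspace(0.0, 1.0, 101)
for eps in [0.1, 0.01, 0.001]:
    err = np.max(np.abs(u_eps(x, eps) - x))
    print(f"eps={eps:g}  sup distance to homogenized solution = {err:.5f}")
```

The observed error decays linearly in ε, consistent with the first-order corrector expansion; note that with the arithmetic mean in place of the harmonic mean the limit equation would be wrong, which is the simplest instance of the subtlety of computing homogenized coefficients.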
This result is of great importance, from both the theoretical and computational points of view, since the complexity of the homogenized equation (1.2) is significantly less than that of (1.1): it is both deterministic and spatially homogeneous. It hints that (1.1) should, on large domains and with high probability, possess some of the structure of a constant-coefficient equation, and thus we may expect it to be more amenable to analysis than the worst-possible heterogeneous equation of the form (1.1). In other words, since the Lagrangian L is sampled by a probability measure P with nice ergodic properties, rather than given to us by the devil, can we expect its solutions to have a nicer structure? In order to answer this kind of question, we need to build a quantitative theory of homogenization.
To be of practical use, the principle of homogenization must be made quantitative. We need to have answers to questions such as:
• How large does the ratio of scale separation need to be before we can be reasonably sure that solutions of (1.1) are close to those of (1.2)? In other words, what is the size of a typical error in the homogenization approximation in terms of the size of U?
• Can we estimate the probability of the unlikely event that the error is large?
• What is D_p L̄ and how can we efficiently compute it? How regular can we expect it to be? Can we efficiently compute its derivatives?
• Can we describe the fluctuations of the solutions?
In this paper we show that (1.1) has a C^∞ structure. In particular, we will essentially answer the third question posed above by demonstrating that the effective Lagrangian L̄ is as regular as L(·, x), with estimates for its derivatives. We will identify the higher derivatives of L̄ as the homogenized coefficients of certain linearized equations and give quantitative homogenization estimates for these, implicitly indicating a computational method for approximating them and thus a Taylor approximation for L̄. Finally, we will prove large-scale C^{k,1}-type estimates for solutions of (1.1), for k ∈ N as large as can be expected from the regularity assumptions on L, a result analogous to the resolution of Hilbert's 19th problem, famously given for spatially homogeneous Lagrangians by De Giorgi and Nash. For that classical result, the main roadblock was in obtaining the continuity of solutions to linear equations, as it is completely straightforward to differentiate the equation and apply the Schauder theory, which was previously known. In the case of homogenization, the situation is reversed, as it is less clear how one should "differentiate the equation": literally doing so would produce negative powers of ε, even if the coefficients were smooth. Our analysis resolves this difficulty and reveals the interplay between three seemingly different kinds of results: (i) the regularity of L̄, (ii) the homogenization of the linearized equations and the commutability of homogenization and linearization, and (iii) the large-scale regularity of the solutions. These three statements must be proved together, iteratively in the parameter k ∈ N, which represents the degree of regularity of L̄, the order of the linearized equation, and the order of the C^{k,1} estimate.

Background: Large-Scale Regularity Theory and Its Crucial Role in Quantitative Homogenization
In the last decade, beginning with the work of Gloria and Otto [16,18], a quantitative theory of homogenization has been developed to give precise answers to questions like the ones stated in the previous subsection. Until now, most of the progress has come in the case of linear equations

−∇ · (a(x)∇u) = 0, (1.3)

which corresponds to the special case L(p, x) = ½ p · a(x)p of (1.1), where a(x) is a symmetric matrix. By now there is an essentially complete quantitative theory for linear equations, and we refer to the monograph [4] and the references therein for a comprehensive presentation of this theory. Quantitative homogenization for the nonlinear equation (1.1) has a comparatively sparse literature; in fact, the only such results of which we are aware are those of [5,6] (see also [4, Chapter 11]), our previous paper [1] and a new paper of Fischer and Neukamm [14] which was posted to arXiv as we were finishing the present article.
Quantitative homogenization is inextricably linked to regularity estimates on the solutions of the heterogeneous equation. This is not surprising when we consider that the homogenized flux D_p L̄(∇u_hom) should be related to the spatial average (say, on some mesoscopic scale) of the heterogeneous flux D_p L(∇u(x), x). In order for spatial averages of the latter to converge nicely, we need to have bounds. The rate of homogenization could be very slow if, for instance, the flux were concentrated on sets of very small measure which percolate only on very large scales. To rule this out we need much better estimates: ideally, we would like to know that the size of the flux on small scales is the same as on large scales, which amounts to a W^{1,∞} estimate on solutions.
Unfortunately, solutions of equations with highly oscillating coefficients do not possess very strong regularity, in general. Indeed, the best deterministic elliptic regularity estimate for solutions of (1.1), which does not degenerate as the size of the domain U becomes large, is C^{0,δ} in terms of Hölder regularity (the De Giorgi–Nash estimate) and W^{1,2+δ} in terms of Sobolev regularity (the Meyers estimate). The tiny exponent δ > 0 in each estimate becomes small as the ellipticity ratio becomes large (see [4, Example 3.1]) and thus both estimates fall far short of the desired regularity class W^{1,∞} = C^{0,1}.
One of the main insights in the quantitative theory of homogenization is that, compared to a generic ("worst-case") L, solutions of the equation (1.1) have much better regularity if L is sampled by an ergodic probability measure P. This is an effect of homogenization itself: on large scales, (1.1) should be a "small perturbation" of (1.2), and therefore better regularity estimates for the former can be inherited from the latter. This is the same idea used to prove the classical Schauder estimates. In the context of homogenization, the result states that there exists a random variable X (sometimes called the minimal scale), which is finite almost surely, such that, for every X ≤ r ≤ R/2 and solution u ∈ H¹(B_R) of (1.1) with U = B_R, we have the estimate

( ffl_{B_r} |∇u|² )^{1/2} ≤ C ( ffl_{B_R} |∇u|² )^{1/2}. (1.4)

Here C depends only on the dimension and the ellipticity, and ffl_U := (1/|U|) ∫_U denotes the mean of a function in U. If we could send r → 0 in (1.4), it would imply that

‖∇u‖_{L^∞(B_{R/2})} ≤ C ( ffl_{B_R} |∇u|² )^{1/2},

which is a true Lipschitz estimate, the same estimate in fact as holds for the homogenized equation (1.2). As (1.4) is valid only for r ≥ X, it is sometimes called a "large-scale C^{0,1} estimate" or a "Lipschitz estimate down to the microscopic scale". This estimate, first demonstrated in [6] in the stochastic setting, is a generalization of the celebrated result in the case of (non-random) periodic coefficients due to Avellaneda and Lin [7]. Of course, it then becomes very important to quantify the size of X. The estimate proved in [6], which is essentially optimal, states that X is bounded up to "almost volume-order large deviations": for every s < d and r ≥ 1,

P[X > r] ≤ C exp(−r^s/C). (1.5)

Here the constant C depends only on s, d, and the ellipticity. A proof of this large-scale regularity estimate together with (1.5) can be found in [4, Chapter 3] in the linear case and in [4, Chapter 11] in the nonlinear case. The right side of (1.5) represents the probability of the unlikely event that the L sampled by P will be a "worst-case" L in the ball of radius r.
A proof of the optimality of (1.5) can be found in [4, Section 3.6]. This large-scale regularity theory introduced in [6] was further developed in the case of (1.3) in [2,3,5,15,17] and now plays an essential role in the quantitative theory of stochastic homogenization. Whether one employs functional inequalities [13,17] or renormalization arguments [2,3,19], it is a crucial ingredient in the proof of the optimal error estimates in homogenization for (1.3): see the monograph [4] and the references therein for a complete presentation of these developments.
The large-scale C^{0,1} estimate is, from one point of view, the best regularity one can expect solutions of (1.1) or (1.3) to satisfy: since the coefficients are rapidly oscillating, there is no hope for the gradient to exhibit continuity on the macroscopic scale. However, as previously shown in the periodic case in [7,9], the solutions of the linear equation (1.3) still have a C^∞ structure. To state what we mean, let us first think of an (interior) C^{k,1} estimate not as a pointwise bound on the (k+1)th derivatives of a function, but as an estimate on how well a function may be approximated on small balls by a kth-order polynomial. By Taylor's theorem, these are of course equivalent in the following sense: a function u is C^{k,1} in a neighborhood of the origin if and only if

sup_{r ∈ (0,1]} r^{−(k+1)} inf_{p ∈ P_k} ‖u − p‖_{L²(B_r)} < ∞,

where P_k denotes the set of polynomials of degree at most k, and we use the notation ‖w‖_{L²(U)} := ( ffl_U |w|² )^{1/2} to denote the volume-normalized L²(U) norm.
Thus the interior C^{k,1} estimate for a harmonic function can be stated in the form: for any harmonic function u in B_R and any r ∈ (0, R/2],

inf_{p ∈ P_k} ‖u − p‖_{L²(B_r)} ≤ C (r/R)^{k+1} ‖u‖_{L²(B_R)}. (1.6)

Moreover, the infimum on the left side may be taken over the set of harmonic polynomials of degree at most k.
As we cannot expect a solution of (1.1) to have regularity beyond C^{0,1} in a classical (pointwise) sense, in order to make sense of a C^{k,1} estimate for the heterogeneous equation (1.3) we need to replace the set of polynomials by a heterogeneous analogue. The classical Liouville theorem says that the set of harmonic functions which grow like o(|x|^{k+1}) is just the set of harmonic polynomials of degree at most k. This suggests that we should use the (random) vector space

A_k := { u ∈ H¹_loc(R^d) : −∇ · (a∇u) = 0 in R^d, lim_{r→∞} r^{−(k+1)} ‖u‖_{L²(B_r)} = 0 }.

We think of these as "a(x)-harmonic polynomials." It turns out that one can prove that, P-almost surely, this set is finite-dimensional and has the same dimension as the set of at most kth-order harmonic polynomials. In fact, one can match any a(x)-harmonic polynomial to an ā-harmonic polynomial in the highest degree, and vice versa. In close analogy to (1.6), the statement of large-scale C^{k,1} regularity is then as follows: there exists a minimal scale X satisfying (1.5) such that, for any R ≥ 2X and any solution u ∈ H¹(B_R) of −∇ · (a∇u) = 0, we can find φ ∈ A_k such that, for every r ∈ [X, R/2],

‖u − φ‖_{L²(B_r)} ≤ C (r/R)^{k+1} ‖u‖_{L²(B_R)}; (1.7)

see [4, Theorem 3.8] for the full statement, which was first proved in the periodic setting by Avellaneda and Lin [8]. Subsequent versions of this result, which are based on the ideas of [7,8] in their more quantitative formulation given in [6], were proved in various works [2,15,17], with the full statement here given in [3,10]. In all of its various forms, higher regularity in stochastic homogenization is based on the simple idea that solutions of the heterogeneous equation should be close to those of the homogenized equation, which has much better regularity. In the case of the linear equation (1.3), it is not a large leap, technically or philosophically, to go from (1.4) to (1.7).
Indeed, to gain control over higher derivatives, one just needs to differentiate the equation (not in the microscopic parameters, of course, but in macroscopic ones) and, luckily, since the equation is linear, this does not change the equation. Roughly speaking, the idea is analogous to bootstrapping the regularity of a constant-coefficient, linear equation by differentiating it. Therefore the estimate (1.7) is perhaps not too surprising once one has the large-scale C^{0,1} estimate in hand.

Summary of the Results Proved in this Paper
The situation is very different in the nonlinear case. When one differentiates the equation (again, in a macroscopic parameter), one gets a new equation, namely the first-order linearized equation. If we want to apply a large-scale regularity result to this equation, we must first (quantitatively) homogenize it! Achieving higher-order regularity estimates requires repeatedly differentiating the equation, which leads to a hierarchy of linearized equations requiring homogenization estimates.
Let us be a bit more explicit. The gradient of the homogenized Lagrangian L̄ is given by the well-known formula

D_p L̄(p) = E[ ∫_{[0,1]^d} D_p L(p + ∇φ_p(x), x) dx ]
         = lim_{r→∞} ffl_{B_r} D_p L(p + ∇φ_p(x), x) dx, (1.8)

where φ_p is the first-order corrector with slope p ∈ R^d, that is, it satisfies

−∇ · D_p L(p + ∇φ_p, x) = 0 in R^d,

with ∇φ_p stationary and of mean zero. The limit in the second line of (1.8) is to be understood in a P-almost sure sense, and it is a consequence of the ergodic theorem, which states that macroscopic averages of stationary fields must converge to their expectations. The formula (1.8) says that D_p L̄(p) is the flux per unit volume of the first-order corrector with slope p ∈ R^d. It naturally arises when we homogenize the nonlinear equation. We can try to show that L̄ ∈ C² by formally differentiating (1.8), which leads to the expression

D²_p L̄(p) e_i = lim_{r→∞} ffl_{B_r} D²_p L(p + ∇φ_p(x), x) ( e_i + ∇∂_{p_i}φ_p(x) ) dx.

If we define the linearized coefficients around p + φ_p by a_p(x) := D²_p L(p + ∇φ_p(x), x) and put ψ^(1)_{p,e_i} := ∂_{p_i}φ_p(x), then we see that ψ^(1)_{p,e_i} is the first-order corrector with slope e_i of the linear equation with coefficients a_p:

−∇ · ( a_p ( e_i + ∇ψ^(1)_{p,e_i} ) ) = 0 in R^d.

We call ψ^(1)_{p,e} a first-order linearized corrector. Moreover, we have the formula

D²_p L̄(p) = ā_p.

That is, "linearization and homogenization commute": the Hessian of L̄ at p should be equal to the homogenized coefficients ā_p corresponding to the linear equation with coefficient field a_p = D²_p L(p + ∇φ_p(·), ·). This reasoning is only formal, but since the right side of the above formula for D²_p L̄ is well-defined (and only needs qualitative homogenization), we should expect it to be rather easy to confirm rigorously. Moreover, while quantitative homogenization of the original nonlinear equation gives us a C^{0,1} estimate, we should expect that quantitative homogenization of the linearized equation should give us a C^{0,1} estimate for differences of solutions and a large-scale C^{1,1} estimate for solutions.
This is indeed the case and was proved in our previous paper [1], where we were motivated by the goal of obtaining this regularity estimate for differences of solutions in anticipation of its important role in the proof of optimal quantitative homogenization estimates. Indeed, in the very recent preprint [14], Fischer and Neukamm showed that this estimate can be combined with spectral gap-type assumptions on the probability measure to obtain quantitative bounds on the first-order correctors which are optimal in the scaling of the error.
We may attempt to differentiate the formula for the homogenized Lagrangian a second time, with the ambition of obtaining a C³ estimate for L̄, a C^{2,1} estimate for solutions and a higher-order improvement of our C^{0,1} estimate for differences (which will be a C^{0,1} estimate for linearization errors): we get

D³_p L̄(p) e_i e_j = lim_{r→∞} ffl_{B_r} [ a_p(x) ∇ψ^(2)_{p,e_i,e_j}(x) + D³_p L(p + ∇φ_p(x), x) ( e_i + ∇ψ^(1)_{p,e_i}(x) ) ( e_j + ∇ψ^(1)_{p,e_j}(x) ) ] dx.

If we define the vector field

F_{2,p,e_i,e_j} := D³_p L(p + ∇φ_p(·), ·) ( e_i + ∇ψ^(1)_{p,e_i} ) ( e_j + ∇ψ^(1)_{p,e_j} ),

then we see that ψ^(2)_{p,e_i,e_j} is the first-order corrector with slope zero of the linear equation

−∇ · ( a_p ∇ψ^(2)_{p,e_i,e_j} ) = ∇ · F_{2,p,e_i,e_j} in R^d, (1.9)

and the formula for the tensor D³_p L̄ becomes

D³_p L̄(p) e_i e_j = lim_{r→∞} ffl_{B_r} [ a_p(x) ∇ψ^(2)_{p,e_i,e_j}(x) + F_{2,p,e_i,e_j}(x) ] dx = F̄_{2,p,e_i,e_j}, (1.10)

the corresponding homogenized coefficient. Unlike the case of the Hessian of L̄, we should not expect this formula to be valid under qualitative ergodic assumptions! Indeed, the qualitative homogenization of (1.9), and thus the validity of (1.10), requires that F_{2,p,e_i,e_j} belong to L², in the sense that

E[ ‖F_{2,p,e_i,e_j}‖²_{L²(□₀)} ] < ∞,

and due to the product of two first-order correctors we only have L¹-type integrability for F_{2,p,e_i,e_j}.¹ This is a serious problem which can only be fixed using the large-scale regularity theory for the first-order linearized equation, together with a suitable bound on the minimal scale X, thereby obtaining a bound in L^∞(□₀), and hence in L⁴(□₀), for ∇ψ^(1)_{p,e_i}, with at least a fourth moment in expectation. Note that this also requires some regularity of the Lagrangian L on the smallest scale.
If we differentiate the equation once more in an effort to prove that L̄ ∈ C⁴, we will be faced with similar difficulties, this time with the more complicated vector field

F_{3,p,e_i,e_j,e_k} := D³_p L(p + ∇φ_p(·), ·) [ ( e_i + ∇ψ^(1)_{p,e_i} ) ∇ψ^(2)_{p,e_j,e_k} + ( e_j + ∇ψ^(1)_{p,e_j} ) ∇ψ^(2)_{p,e_i,e_k} + ( e_k + ∇ψ^(1)_{p,e_k} ) ∇ψ^(2)_{p,e_i,e_j} ]
+ D⁴_p L(p + ∇φ_p(·), ·) ( e_i + ∇ψ^(1)_{p,e_i} ) ( e_j + ∇ψ^(1)_{p,e_j} ) ( e_k + ∇ψ^(1)_{p,e_k} ).
Notice that the last term has three factors of the first-order linearized correctors instead of two, and is thus "even further" from being obviously L² than was F_{2,p,e_i,e_j}. Homogenizing the third-order linearized equation will therefore require large-scale regularity estimates for both the first-order and second-order linearized equations, and one can now see that the situation will get worse as the order increases beyond three. Moreover, proving quantitative homogenization for these equations will also require some smoothness of the homogenized coefficients associated to the lower-order equations, due to the need for the homogenized solutions to be smooth in quantitative two-scale expansion arguments.
This suggests a bootstrap argument for progressively and simultaneously obtaining (i) the smoothness of L̄; (ii) the higher-order large-scale regularity of solutions (and of solutions of the linearized equations); and (iii) the homogenization of the higher-order linearized equations and the commutation of homogenization and higher-order linearization. The point of this paper is to formalize this idea and thereby give a proof of "Hilbert's 19th problem for homogenization". Here is a rough schematic of the argument, as discussed above, which comes in three distinct steps, discussed in more detail below:
• Homogenization & large-scale C^{0,1} regularity for the linearized equations up to order N ⇒ L̄ ∈ C^{2+N}.
• L̄ ∈ C^{2+N} and large-scale C^{0,1} regularity for the linearized equations up to order N ⇒ homogenization for the linearized equations up to order N + 1.
• L̄ ∈ C^{2+N}, large-scale C^{0,1} regularity for the linearized equations up to order N and homogenization for the linearized equations up to order N + 1 ⇒ large-scale C^{0,1} regularity for the linearized equations up to order N + 1.

¹ To be pedantic, we actually have L^{1+δ}-type integrability for a tiny δ > 0 by the Meyers estimate, but this does not help.
The three implications above are the focus of most of the paper. Once this induction argument is completed, we consequently obtain a full C^{k,1}-type large-scale regularity estimate for solutions of the original nonlinear equation, generalizing (1.7). The main question becomes what the replacement for A_k should be, that is, what the "polynomials" should be. We show that these are certain solutions of the system of linearized equations (linearized around a first-order corrector) exhibiting polynomial-type growth, which we classify by providing a Liouville-type result which is part of the statement of the theorem (see the discussion between the statements of Theorems 1.5 and 1.6, below, for a definition of these spaces, which are denoted by W^p_n). The resulting theorem, which is a version of the statement of Hilbert's 19th problem in the context of homogenization, provides a very precise description of the solutions of (1.1) in terms of the first-order correctors and the correctors of a hierarchy of linear equations.
We also obtain, as a corollary, an improvement of the scaling of linearization errors, which is closely related to the regularity of solutions. To motivate this result, suppose we are given two solutions u, v ∈ H¹(B_R) of (1.1) in a large ball (R ≥ 1) which are close to each other, in the sense that ‖∇u − ∇v‖_{L²(B_R)} is small relative to ‖∇u‖_{L²(B_R)}. Suppose that we attempt to approximate the difference u − v by the solution w of the first-order linearized equation around u, with boundary data u − v. Then we may ask the question of how small we should expect the first-order linearization error u − v − w to be. The best answer that we have from deterministic elliptic regularity estimates is that there exists a small exponent δ(d, Λ) > 0 such that

‖∇(u − v − w)‖_{L²(B_{R/2})} ≤ C ‖∇u‖_{L²(B_R)} ( ‖∇u − ∇v‖_{L²(B_R)} / ‖∇u‖_{L²(B_R)} )^{1+δ}. (1.12)

This can be easily proved, for instance, using the Meyers gradient L^{2+δ} estimate, and it is sharp in the sense that it is not possible to do better than the very small exponent δ. We can say roughly that the space of solutions of (1.1) is a C^{1,δ} manifold, but no better. Of course, if L does not depend on x, or if R is of order one and L is smooth in both variables (p, x), then one expects to have the estimate above with δ = 1 and to be able to prove more precise estimates using higher-order linearized equations. In fact, this is essentially a reformulation of the statement of Hilbert's 19th problem (indeed, see Appendix E, where we give a proof of Hilbert's 19th problem in its classical formulation by following this line of reasoning).
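To indicate where the small exponent δ comes from, here is a sketch in our own notation, with constants and boundary terms suppressed: subtract the equations for u and v, compare with the equation for w, and estimate the resulting source term by Hölder's inequality and the Meyers L^{2+δ} estimate.

```latex
\begin{aligned}
-\nabla\cdot\big(\tilde a\,\nabla(u-v)\big) &= 0,
&& \tilde a(x) := \int_0^1 D^2_p L\big(t\nabla u(x)+(1-t)\nabla v(x),\,x\big)\,dt,\\
-\nabla\cdot\big(a\,\nabla(u-v-w)\big) &= \nabla\cdot\big((\tilde a - a)\,\nabla(u-v)\big),
&& a(x) := D^2_p L(\nabla u(x),x),\\
\big\|\nabla(u-v-w)\big\|_{L^2} &\lesssim \big\|(\tilde a-a)\,\nabla(u-v)\big\|_{L^2}
\le \|\tilde a-a\|_{L^{q}}\,\|\nabla(u-v)\|_{L^{2+\delta}},
&& \tfrac1q=\tfrac12-\tfrac1{2+\delta}.
\end{aligned}
```

Since |ã − a| is bounded and is also controlled by |∇u − ∇v| (by the Lipschitz continuity of D²_p L in p), interpolating the L^q norm between L^{2+δ} and L^∞ turns the right side into a power 1 + δ′ of the size of ∇u − ∇v, with δ′ > 0 small, which is the shape of (1.12).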
In this paper we also prove a large-scale version of the quadratic response to first-order linearization in the context of homogenization, which states that (1.12) holds with δ = 1 whenever R is larger than a random minimal scale. Moreover, we prove a full slate of higher-order versions of this result: see Corollary 1.4. These results roughly assert that, with probability one, the large-scale structure of solutions of (1.1) resembles that of a smooth manifold.
In the next two subsections, we state our assumptions and give the precise statements of the results discussed above.

Assumptions and Notation
In this subsection, we state the standing assumptions in force throughout the paper.
We fix the following global parameters: the dimension d ∈ N with d ≥ 2, a constant Λ ∈ [1, ∞) measuring the ellipticity, an integer N ∈ N with N ≥ 1 measuring the smoothness of the Lagrangian, and constants M₀, K₀ ∈ [1, ∞). This allows us to, for instance, denote constants C which depend on (d, Λ, N, M₀, K₀). The probability space is the set of all Lagrangians L : R^d × R^d → R, written as a function of (p, x) ∈ R^d × R^d, satisfying the following conditions:
(L1) L is 2 + N times differentiable in the variable p and, for every k ∈ {2, . . . , 2 + N}, the function D^k_p L is uniformly Lipschitz in both variables and satisfies

‖D^k_p L‖_{L^∞(R^d × R^d)} ≤ K₀. (1.13)

For k = 1, we assume that, for z ∈ R^d,

|D_p L(p + z, x) − D_p L(p, x)| ≤ Λ|z|. (1.14)

(L2) L is uniformly convex in the variable p: for every p ∈ R^d and x ∈ R^d,

Λ^{−1} I_d ≤ D²_p L(p, x) ≤ Λ I_d.

(L3) D_p L(0, ·) is uniformly bounded:

sup_{x ∈ R^d} |D_p L(0, x)| ≤ M₀.

We define Ω to be the set of all such Lagrangians L. Note that Ω depends on the fixed parameters (d, Λ, N, M₀, K₀). It is endowed with the following family of σ-algebras: for each Borel subset U ⊆ R^d, define

F(U) := the σ-algebra generated by the family of random variables L ↦ L(p, x), with p ∈ R^d and x ∈ U.

The largest of these is denoted by F := F(R^d).
We assume that the law of the "canonical Lagrangian" L is a probability measure P on (Ω, F) satisfying the following two assumptions:
(P1) P has a unit range of dependence: for all Borel subsets U, V ⊆ R^d such that dist(U, V) ≥ 1,

F(U) and F(V) are P-independent.

(P2) P is stationary with respect to Z^d-translations: for every z ∈ Z^d and E ∈ F,

P[E] = P[T_z E],

where the translation group {T_z}_{z∈Z^d} acts on Ω by (T_z L)(p, x) = L(p, x + z).
The expectation with respect to P is denoted by E.
Since we will often be concerned with measuring the stretched exponential moments of the random variables we encounter, the following notation is convenient: for every σ ∈ (0, ∞), θ > 0, and random variable X on Ω, we write

X ≤ O_σ(θ) ⟺ E[ exp( (θ^{−1} X₊)^σ ) ] ≤ 2.

This is essentially notation for an Orlicz norm on (Ω, P). Some basic properties of this notation are given in [4, Appendix A].
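In particular (a standard consequence of this definition, of the kind recorded in [4, Appendix A]), the notation encodes stretched exponential tail bounds via Markov's inequality:

```latex
X \le \mathcal{O}_\sigma(\theta)
\quad\Longrightarrow\quad
\mathbb{P}\big[X \ge t\,\theta\big]
\;\le\; \mathbb{E}\Big[\exp\big((\theta^{-1}X_+)^\sigma\big)\Big]\,e^{-t^\sigma}
\;\le\; 2\,e^{-t^\sigma}
\qquad \text{for every } t > 0.
```

Thus a bound of the form X ≤ O_σ(C) is a compact way of expressing exactly the kind of minimal-scale estimate appearing in (1.5).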

Statement of the Main Results
We begin by introducing the higher-order linearized equations. These can be computed by hand, as we did for the second and third linearized equations in Section 1.3, but it is convenient to work with more compact formulas. Observe that, by Taylor's formula with remainder, we have, for every n ∈ {0, . . . , N + 1},

D_p L(p + h, x) = Σ_{k=0}^{n} (1/k!) D^{k+1}_p L(p, x) h^{⊗k} + O(|h|^{n+1}).

Define, for p, x, h₁, . . . , h_N ∈ R^d and t ∈ R,

F_n(p, h₁, . . . , h_{n−1}, x) := ∂ⁿ_t ( D_p L( p + Σ_{k=1}^{n−1} (t^k/k!) h_k, x ) ) |_{t=0}.

Observe that F₁ ≡ 0 by definition (that is, the right side of the first linearized equation is zero, as we have already seen). Our first main result concerns the regularity of the effective Lagrangian L̄ and states that it has essentially the same regularity in p as we assumed for L.
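The combinatorics of this definition can be checked symbolically in a scalar toy model. The sketch below is our illustration, not the paper's code: g stands for the map p ↦ D_p L(p, x) at a frozen x, and we expand along a path carrying all of the unknowns, so that the n-th t-derivative at t = 0 splits as g′(p)·h_n (the linearized operator acting on the new unknown) plus the lower-order right-hand side F_n(p, h₁, . . . , h_{n−1}).

```python
import sympy as sp

# Toy symbolic check (ours): g(p) plays the role of p -> D_p L(p, x) at a
# frozen x, in one scalar variable.  Differentiating g along the path
# p + t*h1 + t^2/2*h2 + t^3/6*h3 generates the linearized hierarchy.
t, p, h1, h2, h3 = sp.symbols('t p h1 h2 h3')
g = sp.Function('g')

path = p + t*h1 + t**2/2*h2 + t**3/6*h3

def t_derivative(n):
    # n-th derivative in t of the flux g(path), evaluated at t = 0
    return sp.expand(sp.diff(g(path), t, n).subs(t, 0).doit())

d2 = t_derivative(2)  # g'(p)*h2 + g''(p)*h1**2
d3 = t_derivative(3)  # g'(p)*h3 + 3*g''(p)*h1*h2 + g'''(p)*h1**3
print(d2)
print(d3)
```

The term g″(p)h₁² reproduces the structure of F₂ from Section 1.3 (with g″ standing for D³_p L), while g″′(p)h₁³ + 3g″(p)h₁h₂ reproduces F₃: the three symmetric third-derivative terms pairing first- and second-order data (collapsed here by symmetry) plus a fourth-derivative term cubic in the first-order data.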
In view of Theorem 1.1, we may introduce homogenized versions of the above functions. We define, for every p ∈ R^d, the homogenized coefficients ā_p := D²_p L̄(p), and then, for every m ∈ {1, . . . , N + 1}, the homogenized vector fields F̄_m, obtained by replacing L with L̄ (which has no x-dependence) in the definition of F_m. As above, we have that F̄₁ ≡ 0 by definition.
In the next theorem, we present a statement concerning the commutability of homogenization and higher-order linearizations.
and, for each m ∈ {1, . . . , n + 1}, fix g_m ∈ W^{1,2+δ}(U_m), and let u^ε ∈ H¹(U_1) and the functions w^ε_m ∈ H¹(U_m) solve the heterogeneous Dirichlet problems (1.19). Finally, let u ∈ H¹(U_1) and, for every m ∈ {1, . . . , n + 1}, the function w_m ∈ H¹(U_m) satisfy the homogenized problems (1.21). Observe that, due to the assumed regularity of L in the spatial variable and the Schauder estimates, the vector fields F_m(∇u^ε, ∇w^ε_1, . . . , ∇w^ε_{m−1}, x/ε) on the right side of the equations for w^ε_m in (1.19) belong to L^∞(U_m). In particular, they belong to L²(U_m) and therefore the Dirichlet problems in (1.19) are well-posed in the sense that the solutions w^ε_m belong to H¹(U_m). Of course, the regularity given by the application of the Schauder estimate depends on ε (and indeed the constants blow up like a large power of ε^{−1}), and therefore this remark is not very useful as a quantitative statement. To prove the homogenization result for the mth linearized equation, we will need to possess much better bounds on these vector fields, which amounts to better regularity on the solutions ∇u, ∇w_1, . . . , ∇w_{m−1}.
This is the reason we must prove Theorems 1.1 and 1.2 at the same time (in an induction on the order m of the linearized equation and on the regularity of D²_p L̄), together with the following result on the large-scale regularity of solutions of the linearized equations and of the linearization errors. Its statement is a generalization of the large-scale C^{0,1} estimates for the linearized equation and for differences of solutions proved in [1].
As mentioned above, throughout the paper we use the following notation for volume-normalized L^p norms: for each p ∈ [1, ∞), U ⊆ R^d with |U| < ∞ and f ∈ L^p(U), we write ‖f‖_{L^p(U)} := ( ffl_U |f|^p )^{1/p}. Then we have, for every r ∈ [X, R/2] and m ∈ {0, . . . , n}, the estimates (1.22) and (1.23). The main interest in the above theorem is the case of the exponent q = 2. However, we must consider arbitrarily large exponents q ∈ [2, ∞) in order for the induction argument to work. In particular, in order to show that Theorem 1.3 for some n implies Theorem 1.2 for n + 1, we need to consider potentially very large q (depending on n).
As mentioned above, Theorems 1.1, 1.2 and 1.3 are proved together in an induction argument. Each of the theorems has already been proved in the case N = 0 and q = 2 in our previous paper [1]. The integrability in Theorem 1.3 is upgraded to q ∈ (2, ∞) in Propositions 4.7 and 5.1 below for w_1 and ξ_0, respectively. These serve as the base case of the induction. The main induction step is comprised of three implications, following the schematic given at the end of Section 1.3:
• Regularity of L̄ (Section 2): homogenization and large-scale C^{0,1} regularity for the linearized equations up to a given order imply a corresponding degree of smoothness of L̄.
• Homogenization of the next linearized equation: the smoothness of L̄ and the large-scale regularity of the lower-order linearized equations allow us to homogenize the linearized equation of the next order.
• Large-scale C^{0,1} regularity for the next linearized equation: this is obtained by applying a quantitative excess decay iteration, based on the "harmonic" approximation provided by the quantitative homogenization statement. This estimate controls the regularity of the w_m's on "large" scales (that is, larger than a multiple of the microscopic scale). To obtain L^q-type integrability for ∇w_m, it is also necessary to control the small scales, and for this we apply deterministic Calderón–Zygmund-type estimates (this is our reason for assuming L possesses some small-scale spatial regularity). The estimates for the linearization errors ξ_{m−1} are then obtained as a consequence by comparing them to w_m.
From a high-level point of view, the induction argument summarized above resembles the resolution of Hilbert's 19th problem on the regularity of minimizers of integral functionals with uniformly convex and smooth integrands. The previous three theorems allow us to prove the next two results, which can be considered as resolutions of Hilbert's 19th problem in the context of homogenization.
The first is the following result on the precision of the higher-order linearization approximations, which matches the one we have in the constant-coefficient case, as discussed near the end of Section 1.3:
The analysis of the linearized equations presented in the theorems above allows us to develop a higher regularity theory for solutions of the nonlinear equation on large scales, in analogy to the role of the Schauder theory in the resolution of Hilbert's 19th problem on the regularity of solutions of nonlinear equations with smooth (or constant) coefficients. This result generalizes the large-scale C^{1,1}-type estimate proved in our previous paper [1] to higher-order regularity, as well as the result in the linear case [4, Theorem 3.6].
Before giving the statement of this result, we introduce some additional notation and provide some motivational discussion. Given a domain U ⊆ R^d, we define
L(U) := { u ∈ H^1_loc(U) : −∇ · D_pL(∇u, x) = 0 in U }.
This is the set of solutions of the nonlinear equation in the domain U, which we note is a stochastic object. We next define L_1 to be the set of global solutions of the nonlinear equation which exhibit at most linear growth at infinity:
L_1 := { u ∈ L(R^d) : lim sup_{r→∞} (1/r) ‖u − (u)_{B_r}‖_{L^2(B_r)} < ∞ }.
For each p ∈ R^d, we denote by ℓ_p the affine function ℓ_p(x) := p · x. Observe that if the difference of two elements of L_1 has strictly sublinear growth at infinity, then it must be constant, by the C^{0,1}-type estimate for differences (the estimate (1.23) with m = 0). Therefore the following theorem, which was proved in [1], gives a complete classification of L_1.
such that the following statements are valid:
(i) For every u ∈ L_1 satisfying lim sup_{r→∞} (1/r) ‖u − (u)_{B_r}‖_{L^2(B_r)} ≤ M, there exists an affine function ℓ_p such that, for every R ≥ X_σ,
(ii) For every p ∈ B_M, there exists u ∈ L_1 satisfying, for every R ≥ X_σ, (1.28)
Statements (i) and (ii) of the above theorem, which give the characterization of L_1, can be considered as a first-order Liouville-type theorem. In the case of deterministic, periodic coefficient fields, this result was proved by Moser and Struwe [23], who generalized the result of Avellaneda and Lin [8] in the linear case. Part (iii) of the theorem is a quantitative version of this Liouville-type result, which we call a "large-scale C^{1,1} estimate" since it states that, on large scales, an arbitrary solution of the nonlinear equation can be approximated by an element of L_1 with the same precision as harmonic functions can be approximated by affine functions. It can be compared to similar statements in the linear case (see for instance [4,17]).
In this paper we prove a higher-order version of Theorem 1.5. We will show that, just as a harmonic function can be approximated locally by harmonic polynomials, we can approximate an arbitrary element of L(B_R) by elements of a random set of functions which are the natural analogue of harmonic polynomials. In order to state this result, we must first define this space of functions.
Let us first discuss the constant-coefficient case. If L is a smooth Lagrangian, we know from the resolution of Hilbert's 19th problem that solutions of −∇ · D_pL(∇u) = 0 are smooth and thus may be approximated by a Taylor expansion at each point. One may then ask: can we characterize the possible Taylor polynomials? In Appendix E we provide such a characterization in terms of the linearized equations. The quadratic part is an a_p := D²L̄(p)-harmonic polynomial, and the higher-order polynomials satisfy the linearized equations, involving the F̄_m's as right-hand sides. More precisely, for each p ∈ R^d and n ∈ N, we define W^{p,hom}_n to be the set of tuples (w_1, . . . , w_n) such that, for each m ∈ {1, . . . , n},
−∇ · (a_p ∇w_m + F̄_m(p, ∇w_1, . . . , ∇w_{m−1})) = 0 in R^d,
subject to the growth condition |w_m(x)| ≤ C|x|^{m+1} for all x ∈ R^d. It is not too hard to show that W^{p,hom}_n ⊆ P^hom_2 × · · · × P^hom_{n+1}, where P^hom_j stands for homogeneous polynomials of degree j. Indeed, we see, by Liouville's theorem, that w_1 is an a_p-harmonic polynomial of degree two. More importantly, according to Appendix E, if u solves −∇ · D_pL(∇u) = 0 in a neighborhood of the origin, and we set p = ∇u(0) and define w_m by means of the Taylor expansion of u at the origin, then (w_1, . . . , w_n) belongs to W^{p,hom}_n. In particular, w_m is the sum of a special solution of −∇ · (a_p ∇w_m + F̄_m(p, ∇w_1, . . . , ∇w_{m−1})) = 0 in P^hom_{m+1} and an a_p-harmonic polynomial in P^hom_{m+1}. For our purposes it is convenient to relax the growth condition at the origin and define simply
W^p_n := { (w_1, . . . , w_n) ∈ P_2 × · · · × P_{n+1} : for m ∈ {1, . . . , n} we have −∇ · (a_p ∇w_m + F̄_m(p, ∇w_1, . . . , ∇w_{m−1})) = 0 in R^d }.
With this definition, we lose the homogeneity of the polynomials. Following the approach in [4, Chapter 3], it is natural to define heterogeneous versions of these spaces, that is, the tuples of solutions of the heterogeneous linearized equations (linearized around ℓ_p + φ_p) with the same prescribed growth. Here ℓ_p + φ_p is the unique element of L_1 (up to additive constants) whose gradient is a Z^d-stationary field with mean p; in other words, φ_p is the first-order corrector with slope p: see Lemma 2.10. The next theorem gives a higher-order Liouville-type result which classifies the spaces W^p_n and states that they may be used to approximate any solution of the nonlinear equation with the precision of a C^{n,1} estimate.
(iii)_n There exists C(n, M, data) < ∞ such that, for every R ≥ X and v ∈ L(B_R) satisfying ‖∇v‖_{L²(B_R)} ≤ M, there exist p ∈ B_C and (w_1, . . . , w_n) ∈ W^p_n such that, defining the corresponding approximation, we have, for every r ∈ [X, ½R] and k ∈ {0, 1, . . . , n}, the following estimates: (1.31) and (1.32). The proof of Theorem 1.6 is given in Section 7.
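To make the constant-coefficient discussion above concrete, here is a sketch of how the hierarchy of linearized equations arises; this is our reconstruction, consistent with the surrounding text, and the precise normalization of the F̄_m in Appendix E may differ. Substituting a one-parameter family u_s = u + Σ_{m≥1} s^m w_m of solutions into the equation and expanding in powers of s:

```latex
% Expand D_p\overline{L} around \nabla u:
D_p\overline{L}(\nabla u_s)
  = D_p\overline{L}(\nabla u)
  + \sum_{k\ge 1}\frac{1}{k!}\,
    D_p^{k+1}\overline{L}(\nabla u)\Big(\sum_{m\ge 1}s^m\nabla w_m\Big)^{\otimes k}.

% Collecting the coefficient of s^m in -\nabla\cdot D_p\overline{L}(\nabla u_s)=0
% yields the m-th linearized equation
-\nabla\cdot\Big( D_p^2\overline{L}(\nabla u)\,\nabla w_m
  + \overline{F}_m\big(\nabla u,\nabla w_1,\dots,\nabla w_{m-1}\big)\Big) = 0,

% where
\overline{F}_m\big(\nabla u,\nabla w_1,\dots,\nabla w_{m-1}\big)
  := \sum_{k=2}^{m}\frac{1}{k!}
     \sum_{\substack{m_1+\cdots+m_k=m\\ m_i\ge 1}}
     D_p^{k+1}\overline{L}(\nabla u)
     \big(\nabla w_{m_1},\dots,\nabla w_{m_k}\big).

% For instance, \overline{F}_2 = \tfrac12\,D_p^3\overline{L}(\nabla u)
% (\nabla w_1,\nabla w_1), a quadratic form in \nabla w_1.
```

In the constant-coefficient case with ∇u(0) = p, evaluating the coefficients at p produces exactly the kind of hierarchy defining W^{p,hom}_n.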
We conclude this introduction with some comments about the possibility of generalizing our results in two different directions. First, our results cannot be extended from the scalar case to general nonlinear elliptic systems. Indeed, even for constant-coefficient, analytic Lagrangians exhibiting uniform ellipticity and quadratic growth, there are explicit counterexamples (in dimensions larger than two) demonstrating that solutions may not even be Lipschitz: see for instance the "dark side" survey [22, Sections 3.3 & 3.4]. However, our arguments readily adapt to certain special cases (as might be expected), such as the two-dimensional case, or under an assumption that L has a particular form such as an Uhlenbeck-type structure (see [22, Section 4.7]). Roughly speaking, if one can identify conditions under which constant-coefficient elliptic systems have C^1 regularity, and these conditions are preserved by homogenization, then our techniques should be expected to apply.
Second, one might wonder whether it is possible to develop an analogous theory (in the scalar case) for functionals exhibiting non-quadratic growth, for example heterogeneous versions of the p-Laplacian for p ∈ (1, ∞)\{2}, under assumptions such as [22, (2.20)]. This is indeed an interesting problem which seems to be wide open. Of course, qualitative homogenization is well known in such a setting [11,12]. The main roadblock at the present time to establishing a quantitative theory of homogenization, as well as a large-scale regularity theory, is the lack of a better understanding of the homogenized Lagrangian. In particular, we would need to show that ellipticity and growth assumptions such as [22, (2.20)] are preserved by homogenization, without which there is no hope for higher regularity. To our knowledge this is open even in the periodic setting.
Appendices C-E of this paper contain estimates for constant-coefficient equations which are essentially known but, to our knowledge, not written down anywhere. We also collect some auxiliary estimates and computations in Appendices A and B.

Regularity Estimates for the Effective Lagrangian
In this section, we suppose that n ∈ {0, . . . , N − 1} is such that
the statements of Theorems 1.1, 1.2 and 1.3 are valid for n. (2.1)
The goal is to prove Theorem 1.1 for n + 1.
We proceed by constructing the linearized correctors ψ^{(m)}_{p,h} up to m = n + 2 and relating the correctors of different orders to each other via differentiation in the parameter p. We show that these results allow us to improve the regularity of D²L̄ up to C^{n+1,β} and obtain the statement of Theorem 1.1 for n + 1. In particular, this allows us to define the effective coefficient F̄_{n+1}. We also give formulas for the derivatives of L̄ and for F̄_m in terms of the correctors, which allow us to relate them to each other and show that (1.16) holds.

The First-Order Correctors and Linearized Correctors
In this subsection we construct the linearized correctors up to order n + 2. For each p ∈ R^d, we define φ_p to be the first-order corrector of the nonlinear equation, that is, the unique solution (up to additive constants) of
−∇ · D_pL(p + ∇φ_p, x) = 0 in R^d,
with ∇φ_p a Z^d-stationary field satisfying E[∇φ_p] = 0. The existence and uniqueness (up to additive constants) of the first-order corrector φ_p is classical: it can be obtained from a variational argument (applied to an appropriate function space of stationary functions). Alternatively, it can be shown (following the proof given in [4, Section 3.4]) that the elements of L_1, which were characterized in Theorem 1.5 above (proved already in [1]), have stationary gradients. We define the coefficient field a_p(x) to be the coefficients for the linearized equation around the solution ℓ_p + φ_p, namely
a_p(x) := D²_pL(p + ∇φ_p(x), x).
Given p, h ∈ R^d, we define the first linearized corrector ψ^{(1)}_{p,h} to satisfy
−∇ · (a_p (h + ∇ψ^{(1)}_{p,h})) = 0 in R^d.
In other words, ψ^{(1)}_{p,h} is the first-order corrector with slope h for the linear equation with coefficients a_p, which is the linearization around the first-order corrector ℓ_p + φ_p. For m ∈ {2, . . . , N + 2}, we define the mth linearized corrector ψ^{(m)}_{p,h} to be the unique (modulo additive constants) random field which is the corrector with slope zero for the mth linearized equation. (2.3)
Notice that this gives us the complete collection of correctors for the latter equation, since by linearity we observe that ψ^{(m)}_{p,h} + ψ^{(1)}_{p,h} is the corrector with slope h. Furthermore, by the linearity of the map h ↦ h + ∇ψ^{(1)}_{p,h}(x), it is easy to see from the structure of the equations that ψ^{(m)}_{p,h} is homogeneous of degree m in the parameter h. We first show that the problem (2.3) for the mth linearized corrector is well-posed for m ∈ {2, . . . , n + 1}. This is accomplished by checking, inductively, using our hypothesis (2.1) (and in particular the validity of Theorem 1.3 for m ≤ n), that we have appropriate estimates on the vector fields F_m(· · · ) on the right side. We have to make this argument at the same time that we obtain estimates on the smoothness of the correctors ψ^{(m)}_{p,h} as functions of p.
In fact, we prove that the map p ↦ ∇ψ^{(m)}_{p,h} is differentiable, with derivative expressed in terms of the higher-order linearized correctors. We will also obtain C^{0,1}-type bounds on the linearized correctors, which together with this differentiability yield good quantitative control on the smoothness of the correctors in p.
Before the main statements, let us collect a few preliminary elementary results needed in the proofs. The following lemma is well-known and can be proved by the Lax-Milgram lemma (see for instance [21,Chapter 7]).

Lemma 2.1. Let a(·) be a Z^d-stationary random field valued in the symmetric matrices satisfying the ellipticity bound I_d ≤ a(·) ≤ Λ I_d.
Then there exists a unique random potential field ∇z satisfying

Next, a central object in our analysis is the quantity
It then follows, by Young's inequality, that there exists C(m, data) < ∞ such that
We have, similarly, that
Using these we get, for x, p, p̃, (2.10)
Therefore, by Young's inequality, we get, for all δ > 0,
Furthermore, as we will be employing an induction argument in m, it is useful to notice that the leading term in F_m has a simple form, and we have
Our first result in this section gives direct consequences of (2.1) for estimates on the first-order correctors and linearized correctors.
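The formal expansion behind F_m suggests the following shape and leading term; this is our sketch, consistent with the statement above that the leading term is simple, and we do not claim it reproduces the paper's (2.12) verbatim:

```latex
F_m\big(p,\xi_1,\dots,\xi_{m-1},x\big)
  = \sum_{k=2}^{m}\frac{1}{k!}
    \sum_{\substack{m_1+\cdots+m_k=m\\ m_i\ge 1}}
    D_p^{k+1}L(p,x)\big(\xi_{m_1},\dots,\xi_{m_k}\big),

% so that, for m \ge 3, the only term containing \xi_{m-1} is the leading one:
F_m\big(p,\xi_1,\dots,\xi_{m-1},x\big)
  = D_p^{3}L(p,x)\big(\xi_1,\xi_{m-1}\big)
  + \big(\text{terms involving only } \xi_1,\dots,\xi_{m-2}\big).

% (For m = 2 the leading term is \tfrac12\,D_p^3L(p,x)(\xi_1,\xi_1).)
```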
C|h|^{m+1} (2.14) and, for every
Finally, for q ∈ [2, ∞) and m ∈ {1, . . . , n + 1}, there exist constants δ(m, d, Λ) ∈ (0, ½] and C(q, m, M, data) < ∞ such that, for every p ∈ B_M and h ∈ B_1,
We first collect two consequences of Theorem 1.3 assumed for n. Fix q ∈ [2, ∞). Theorem 1.3 implies that there is a minimal scale X such that (1.22) and (1.23) are valid with q(n + 1) instead of q and for every r ∈ [X, ½R]. Hence, for every r ∈ [X, ½R] and m ∈ {0, . . . , n}, we get the estimates
and
On the other hand, using (2.7) we obtain
In particular, it follows by (2.17).
Since ∇ψ^{(i)}_{p,h} is a Z^d-stationary random field, we have, by the ergodic theorem, after sending R → ∞, that, a.s.,
Furthermore, by Lemma 2.1 and the previous display we get, for m ∈ {1, . . . , n + 1},
Observe that the limiting behavior of ξ_0 and ψ^{(1)}_{p,h} can be identified via their equations. By the Z^d-stationarity of ∇φ_p and ∇φ_{p+h}, implying the Z^d-stationarity of a_p and ã_p, we may apply Lemma 2.1 to obtain

(2.23)
To show (2.23), we first claim that the difference quotient solves the equation (2.24); then (2.23) follows by Lemma 2.1. Rewriting, we observe that the first three terms on the right are solenoidal by the equations of ψ^{(n)}_{p+th,h}, ψ^{(n)}_{p,h} and ψ^{(n+1)}_{p,h}, respectively, and the fourth term on the right is zero by (2.22). We thus obtain (2.24).
Step 2. We will estimate the terms on the right in (2.23) separately, and in this step we first show that, for t ∈ (−1, 1), t = 0, (2.25) By the triangle inequality we have Therefore, by Hölder's inequality and (2.16), which is (2.25).
Step 3. We show that We have that and since we obtain (2.26) by (2.16).
Step 4. We then prove (2.27). Using the decomposition (2.12), we have that, for n ≥ 2, the relevant quantity is differentiable in p. In particular, by (2.16), for every q ∈ [2, ∞) there are δ(q, n, data) > 0 and C(q, n, M, data) < ∞ such that

Using the above decomposition for
We have by (2.16) that and therefore The right hand side can be estimated with the aid of (2.25) to obtain (2.27).
Step 5. Conclusion. Combining the estimates of Steps 2-4 with (2.23), we get (2.20) for m = n with β = 1. Finally, (2.20) implies that t ↦ ∇φ_{p+th} is C^{n+2,β} in a neighborhood of t = 0, and thus (2.21) is valid by (2.13). The proof is complete.
Then p ↦ p + ∇φ_p is (n + 2) times differentiable with respect to p and, for q ∈ [2, ∞), there are constants δ(q, n, data) ∈ (0, ½] and C(q, n, M, data) < ∞ such that, for m ∈ {1, . . . , n + 1},
The proof of Lemma 2.5 relies on a general principle of polarization in multilinear analysis, which is formalized in the following lemma.
Lemma 2.6. Let V be a real, finite-dimensional vector space, and let Φ : V^n → R be a multilinear, symmetric form, that is, Φ(v_{σ(1)}, . . . , v_{σ(n)}) = Φ(v_1, . . . , v_n) for all v_1, . . . , v_n ∈ V and any permutation σ of {1, . . . , n}. Then, with φ(v) := Φ(v, . . . , v), we have
Σ_{∅ ≠ A ⊆ {1,...,n}} (−1)^{n−|A|} φ(Σ_{j∈A} v_j) = n! Φ(v_1, . . . , v_n),
where the sum on the left is over all non-empty subsets A of {1, 2, . . . , n}. For this, we begin by expanding each summand φ(Σ_{j∈A} v_j) = Φ(Σ_{j∈A} v_j, . . . , Σ_{j∈A} v_j) fully, as a sum of terms of the form Φ(v_{j_1}, . . . , v_{j_n}) with j_1, . . . , j_n ∈ A. Using the symmetry of Φ, each such term can be written as Φ(v_{f(1)}, . . . , v_{f(n)}) for a nondecreasing function f : {1, . . . , n} → {1, . . . , n} with range contained in A. Letting c_A(f) denote the number of ordered n-tuples (j_1, . . . , j_n) of elements of A which can be reordered to form (f(1), . . . , f(n)), it follows that the expression Σ_{A⊆{1,...,n}} (−1)^{n−|A|} φ(Σ_{j∈A} v_j) can be expanded to give a sum over pairs (A, f) of (−1)^{n−|A|} c_A(f) Φ(v_{f(1)}, . . . , v_{f(n)}). Changing the order of summation gives an outer sum over f and an inner sum over A, where the inner sum is over all subsets A of {1, 2, . . . , n} containing the range of f.
Finally, it is a well-known combinatorial fact that, for every nonempty finite set S, we have
Σ_{B ⊆ S} (−1)^{|B|} = 0,
where the sum on the left is over all subsets B of S. Thus, in the expression above, the inner sum over A vanishes unless the range of f is all of {1, . . . , n}; the only surviving f is the identity, which carries the weight c(f) = n!, giving n! Φ(v_1, . . . , v_n), as was to be shown. This proves the polarization formula.
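For orientation, the case n = 2 of the polarization formula of Lemma 2.6 can be checked by hand (with φ(v) := Φ(v, v) and Φ bilinear and symmetric):

```latex
\sum_{\emptyset\neq A\subseteq\{1,2\}}(-1)^{2-|A|}\,
  \varphi\Big(\sum_{j\in A}v_j\Big)
  = \varphi(v_1+v_2)-\varphi(v_1)-\varphi(v_2)
  = 2\,\Phi(v_1,v_2)
  = 2!\,\Phi(v_1,v_2),

% since, by bilinearity and symmetry,
\varphi(v_1+v_2)=\Phi(v_1,v_1)+2\,\Phi(v_1,v_2)+\Phi(v_2,v_2).
```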
Proof of Lemma 2.5. Therefore, in view of Lemma 2.6, it is enough to show that
Since F_{n+2} is C^{0,1} in its first argument and polynomial in its last n arguments, we obtain by homogeneity, the chain rule and Lemma 2.5 that
Therefore, the leading term is the first one on the right in (2.31). Indeed, we observe by the above display that
To conclude, we need to estimate the first term on the right. To this end, we use the triangle inequality to get, for any β ∈ [0, 1],
It follows, by Hölder's inequality, that
By Lemma 2.5 and Hölder's inequality, we get
Consequently, combining the above displays yields (2.30), finishing the proof.

Smoothness of L̄
It is a well-known and easy consequence of qualitative homogenization that DL̄ is given by the formula
DL̄(p) = E[ ∫_{□_0} D_pL(p + ∇φ_p(x), x) dx ].
In [1], we proved that L̄ ∈ C² and that D²L̄ is given by the formula (2.32). We next generalize this result to higher derivatives of L̄ by essentially differentiating the previous formula many times and applying the results of the previous subsection. Moreover, we validate the statement of Theorem 1.1 for n + 1.
We begin by showing (2.33), starting from (2.32). We argue by induction, assuming that (2.34) is valid for k ≤ m, and prove that (2.34) is valid for k = m + 1 as well. Differentiating with respect to p, using (2.19) and (2.21), and applying Lemmas 2.5 and 2.7 together with Hölder's inequality, we obtain the required estimate. In view of Lemma 2.6 this yields (2.33), which is the statement of Theorem 1.1 for n + 1. The proof is complete.
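The differentiation step can be sketched as follows, assuming the standard corrector representation of DL̄ and the fact, established in Section 2.1, that p ↦ ∇φ_p is differentiable with directional derivative ∇ψ^{(1)}_{p,h}; this is our sketch of the mechanism, not a verbatim reproduction of (2.32)-(2.33):

```latex
D\overline{L}(p)
  = \mathbb{E}\Big[\int_{\square_0} D_pL\big(p+\nabla\phi_p(x),x\big)\,dx\Big]
\quad\Longrightarrow\quad
D^2\overline{L}(p)\,h
  = \mathbb{E}\Big[\int_{\square_0} D^2_pL\big(p+\nabla\phi_p(x),x\big)
    \big(h+\nabla\psi^{(1)}_{p,h}(x)\big)\,dx\Big].

% Heuristically, each further derivative in p brings in one more linearized
% corrector, which is why higher derivatives of \overline{L} are controlled
% by the higher-order linearized correctors.
```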

Sublinearity of Correctors
By the ergodic theorem, we have that, for every p, h ∈ R^d and m ∈ {1, . . . , n + 1}, the correctors and linearized correctors are (qualitatively) sublinear at infinity:
lim_{r→∞} (1/r) ( ‖φ_p − (φ_p)_{B_r}‖_{L²(B_r)} + ‖ψ^{(m)}_{p,h} − (ψ^{(m)}_{p,h})_{B_r}‖_{L²(B_r)} ) = 0 a.s.
The assumption (2.1) allows us to give a quantitative estimate of this sublinearity.

Lemma 2.10. (Sublinearity of correctors). Assume
We relabel Y as X and α/2 as α. We further take X larger, if necessary, so that [1, Proposition 4.3] is at our disposal. Suppressing both p and h from the notation, we note that φ and the ψ^{(m)} all homogenize to zero, and we get, by Theorem 1.2, that, for r ≥ X,
This and the induction assumption, that is, that (2.36) is valid for m ∈ {1, . . . , k}, together with Lemma 2.11 below, imply that
Combining the previous two displays yields
Since the limit is a_p-harmonic in B_r, we have by the Lipschitz estimate [1, Proposition 4.3] that, for r ≥ X and t ∈ [X, r],
Therefore, by compactness, there exists ψ̃^{(k+1)} such that, for t ∈ [X, r],
Proceeding now as in [4, Section 3.4] proves that ∇ψ̃^{(k+1)} is Z^d-stationary. Finally, by integration by parts we also obtain that E[∫_{□_0} ∇ψ̃^{(k+1)}] = 0 and, therefore, since ψ̃^{(k+1)} solves the same equation as ψ^{(k+1)}, by uniqueness we have that ψ̃^{(k+1)} = ψ^{(k+1)} up to a constant. The proof is hence complete by the previous display.
Above we made use of a lemma, which roughly states that if two solutions of the systems of linearized equations are close in L 2 then their gradients are also close. This lemma will also be applied repeatedly in the following sections.
and the same holds with (ũ, w̃_1, . . . , w̃_N) in place of (u, w_1, . . . , w_N). Then
and, denoting a := D²_pL(∇u, x) and f_m := F_m(∇u, ∇w_1, . . . , ∇w_{m−1}, x), and analogously defining ã and f̃_m, we assume that R ≥ 2^{m+2} X, where X is as in Theorem 1.3 for n, valid by the assumption of (3.1).
The estimate (2.38) is just the estimate for ξ_0 in Theorem 1.3. It also implies
Using Hölder's inequality and applying Theorem 1.3 for n, we obtain, for any p ∈ [2, ∞) and δ > 0, (2.42)
We observe that ζ := w_m − w̃_m satisfies the equation
Combining these and using (2.41) and the validity of Theorem 1.3, we get
Taking δ_0 sufficiently small and putting δ_m := 2^{−m} δ_0, we get by induction (using Young's inequality and rearranging several sums) that, for every m ∈ {1, . . . , N},
Combining this with (2.42), we get
These imply (2.39) and (2.40) after a covering argument.
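The basic mechanism in Lemma 2.11 is a Caccioppoli-type inequality. For a single linear equation it takes the following standard form, stated here for orientation with a generic coefficient field a satisfying I_d ≤ a ≤ Λ I_d and a generic right-hand side g (the names a, g are ours):

```latex
-\nabla\cdot\big(\mathbf{a}\nabla\zeta\big)=\nabla\cdot\mathbf{g}
  \ \text{ in } B_r
\quad\Longrightarrow\quad
\Big(\fint_{B_{r/2}}|\nabla\zeta|^2\Big)^{1/2}
  \le \frac{C}{r}\Big(\fint_{B_r}\big|\zeta-(\zeta)_{B_r}\big|^2\Big)^{1/2}
  + C\Big(\fint_{B_r}|\mathbf{g}|^2\Big)^{1/2}.

% Applied to \zeta := w_m - \widetilde{w}_m, with \mathbf{g} collecting the
% differences of coefficients and right-hand sides, this converts
% L^2-closeness of solutions into L^2-closeness of their gradients.
```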

Quantitative Homogenization of the Linearized Equations
In this section, we suppose that n ∈ {0, . . . , N − 1} is such that
the statements of Theorems 1.1, 1.2 and 1.3 are valid for n. (3.1)
The goal is to prove that Theorem 1.2 is also valid for n + 1 in place of n. That is, we need to homogenize the (n + 2)th linearized equation.
In order to prove homogenization for the (n + 2)th linearized equation, we follow the procedure used in [1] for homogenizing the first linearized equation. We first show, using the induction hypothesis (3.1), that the coefficients D²_pL(∇u^ε, x/ε) and F_{n+1}(x/ε, ∇u^ε, ∇w^ε_1, . . . , ∇w^ε_n) can be approximated by random fields which are local (they satisfy a finite range of dependence condition) and locally stationary (they are stationary up to dependence on a slowly varying macroscopic variable). This then allows us to apply known quantitative homogenization results for linear elliptic equations, which can be found for instance in [4].

Stationary, Finite Range Coefficient Fields
We proceed in a similar fashion as in [1,Section 3.4] by introducing approximating equations which are stationary and localized.
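Throughout we use the triadic cube convention of [4]; for the reader's convenience, the normalization we assume, chosen to match the lattice 3^k Z^d used below, is:

```latex
\square_k := \Big(-\frac{3^k}{2},\,\frac{3^k}{2}\Big)^{d},
\qquad
\mathbb{R}^d=\bigcup_{z\in 3^k\mathbb{Z}^d}\overline{z+\square_k},
```

so that the cubes {z + □_k : z ∈ 3^k Z^d} partition R^d up to a Lebesgue-null set.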
We fix an integer k ∈ N which represents the scale on which we localize.
where ℓ_p is the affine function ℓ_p(x) := p · x. We then define, for each z ∈ 3^k Z^d, a coefficient field ã, and then recursively define, for each m ∈ {2, . . . , n + 2}, the corresponding functions. Finally, we create 3^k Z^d-stationary fields by gluing the above functions together. The glued functions, denoted by v^{(k)} and w^{(k)}_{m,·}, might not be H¹ functions globally, but we can nevertheless define their gradients locally in z + □_k. The R^{d(n+3)}-valued random field ∇v^{(k)} is 3^k Z^d-stationary and has a range of dependence of at most 3^k (15 + d)^{1/2}, by construction. The same is also true of the corresponding coefficient fields, since these are local functions of this random field: each is 3^k Z^d-stationary and has a range of dependence of at most 3^k (15 + d)^{1/2}. (3.3)
In the next subsection, we will apply some known quantitative homogenization estimates to the linear equation (3.4).
In order to anchor these estimates, we require some deterministic bounds on the vector field F^{(k)}_{n+2,·}, and thus on the gradients of the functions w^{(k)}_{m,·} defined above. These bounds blow up like a large power of 3^k, but we will eventually choose 3^k to be relatively small compared with the macroscopic scale (so these large powers of 3^k will be absorbed). Fix a parameter tuple (p, h_1, . . . , h_{n+1}). We deduce the existence of C(M, data) < ∞ such that
We will argue by induction in m ∈ {1, . . . , n + 2} that there exist Q(data) < ∞ and C(M, data) < ∞ such that
By the Caccioppoli inequality, we get
In view of (3.6) and the previous two displays, another application of Proposition A.2 yields, after enlarging Q and C, the bound (3.7) for m = M. This completes the induction argument and the proof of (3.7). The bound (3.5) immediately follows.
By the assumed validity of Theorem 1.3 for n, we also have that, for each M, q ∈ [2, ∞), there exist δ(q, data) > 0 and C(M, q, data) < ∞ and a random variable X = O_δ(C) such that, for every k ∈ N with 3^k ≥ X, every m ∈ {1, . . . , n + 1} and every parameter tuple (p, h_1, . . . , h_{n+1}) with |p| ≤ M,
and hence, for such k and m ∈ {1, . . . , n + 2},
Observe that (3.5) and (3.9) together imply that, for δ and C as above and every m ∈ {1, . . . , n + 2},
We next study the continuity of a^{(k)}_p and F^{(k)}_{m,·} in the parameter. There exist constants δ(q, data) > 0 and Q(q, data), C(q, M, data) < ∞ and a random variable X = O_δ(C) such that, for every k ∈ N with 3^k ≥ X, (3.11) holds and, for every m ∈ {1, . . . , n + 2}, (3.12) holds.

Proof.
We take X to be larger than the random variables in the statements of Lemma 2.11 and of Theorems 1.2 and 1.3 for n. The bound (3.11) is then an immediate consequence of (2.38) and of the construction of ∇v^{(k)}. By induction we now obtain, for every m ∈ {1, . . . , n + 1},
This implies (3.12).
By combining (3.5) and (3.12) and using interpolation, we obtain the existence of α(data) > 0, Q(data) < ∞ and C(data) > 0 such that, with X as in the statement of Lemma 3.2, for every k ∈ N with 3^k ≥ X and m ∈ {1, . . . , n + 2},
Likewise, we can use (3.6) and (3.11) to obtain
This variation of Lemma 3.2 will be needed below.

Setup of the Proof of Theorem 1.2
We are now ready to begin the proof of the implication (3.1) ⇒ the statement of Theorem 1.2 with n + 1 in place of n.
We fix parameters M ∈ [1, ∞), δ > 0 and ε ∈ (0, 1), a sequence of Lipschitz domains, and a sequence of boundary conditions as in (3.17), and define the right-hand sides F_m(x, ∇u(x), ∇w_1(x), . . . , ∇w_{m−1}(x)). (3.18)
We also choose K ∈ N to be the unique positive integer satisfying 3^{−K−1} < ε ≤ 3^{−K}. We collect the gradients into the tuple (∇u(x), ∇w_1(x), . . . , ∇w_{n+1}(x)). (3.19)
By the assumption (3.1), we only need to homogenize the linearized equation for m = n + 2. As we have already proved a homogenization result in [1, Theorem 1.1] for the linearized equation with zero right-hand side, it suffices to prove (1.21) for m = n + 2 under the assumption that the boundary condition vanishes, cf. (3.20). We can write the equations for w^ε_{n+2} and w_{n+2} respectively as
−∇ · (a^ε(x)∇w^ε_{n+2}) = ∇ · F^ε_{n+2} in U_{n+2}, w^ε_{n+2} = 0 on ∂U_{n+2},
and
−∇ · (ā(x)∇w_{n+2}) = ∇ · F̄_{n+2} in U_{n+2}, w_{n+2} = 0 on ∂U_{n+2}. (3.22)
Our goal is to prove the following estimate: there exist a constant C(M, data) < ∞, exponents σ(data) > 0 and α(data) > 0 and a random variable X satisfying (3.23), as in the statement of the theorem, such that (3.24) holds. To prove (3.24), we first compare the solution w_{n+2} to the solution w̃^ε_{n+2} of a second heterogeneous problem, namely (3.25), where the coefficient fields ã^ε and F̃^ε_{n+2} are defined in terms of the localized, stationary approximating coefficients (introduced above in Section 3.1) by (3.26). The parameter k ∈ N will be chosen below in such a way that 1 ≤ 3^k ≤ ε^{−1}. We also need to declare a second mesoscopic scale by taking l ∈ N such that 1 ≤ 3^k ≤ 3^l ≤ ε^{−1} and, for every m ∈ {1, . . . , n + 1}, (3.27) holds. Like k, the parameter l will be declared later in this section. For convenience we will also take a slightly smaller domain U_{n+3} than U_{n+2}, which also depends on ε and l and is defined by (3.28). The estimate (3.24) follows from the following two estimates, which are proved separately below (here X denotes a random variable as in the statement of the theorem): (3.29) and (3.30).
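The defining property of K determines it explicitly:

```latex
3^{-K-1}<\varepsilon\le 3^{-K}
\quad\Longleftrightarrow\quad
K\le \log_3 \varepsilon^{-1} < K+1
\quad\Longleftrightarrow\quad
K=\big\lfloor \log_3 \varepsilon^{-1}\big\rfloor.
```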

Estimates on Size of the w ε m and w m
To prepare for the proofs of (3.29) and (3.30), we present some preliminary bounds on the size of the w ε m 's. The first is a set of deterministic bounds representing the "worst-case scenario" in which nothing has homogenized up to the current scale.
Then by the basic energy estimate, we get In view of (3.6) and the previous two displays, another application of Proposition A.2 yields, after enlarging Q and C, the bound (3.33) for m = M. This completes the induction argument and the proof of the lemma.
The typical size of |∇w ε m | is much better than (3.31) gives.
We also require bounds on the homogenized solutions u and w 1 , . . . , w n+2 . These are consequences of elliptic regularity estimates presented in Appendix D, namely Lemma D.1, which is applicable here because of the assumption (3.1) which ensures that L ∈ C 3+N,β for every β ∈ (0, 1). We obtain the existence of C(M, data) < ∞ such that, for every m ∈ {1, . . . , n + 1}, (3.37) In particular, the function defined in (3.19) is Lipschitz continuous. By the global Meyers estimate, we also have, for some δ(d, ) > 0 and C(M, data) < ∞, the bound We may also apply Lemma D.1 to get a bound on w n+2 . In view of the merely mesoscopic gap between ∂U n+2 and U n+3 , which is much smaller than the macroscopic gaps between ∂U m and U m+1 for m ∈ {1, . . . , n + 1}, cf. (3.15) and (3.28), the estimate we obtain is, for every β ∈ (0, 1), for an exponent Q(β, data) < ∞ which can be explicitly computed rather easily (but for our purposes is not worth the bother) and a constant C(β, M, data) < ∞.

The Mesoscopic Parameters k and l
Here we enter into a discussion regarding the choice of the parameters k and l which specify the mesoscopic scales. Recall that 1 ≤ 3^k ≤ 3^l ≤ ε^{−1}. As we will eventually see later in this section, we will estimate the left sides of (3.29) and (3.30) by expressions of the form (3.40), where Q(data) < ∞ is a large exponent and α(data) > 0 is a small exponent. We then need to choose k and l so that the expression in parentheses is a positive power of ε. We may do this as follows. First, we may assume throughout, for convenience, that Q ≥ 1 ≥ 4α. Then we may take care of the first two terms by choosing l so that ε3^l is a "very large" mesoscale: we let l be defined so that
Then we see that, for some β > 0,
Next, we take care of the last term: since, for some β > 0,
we can make this smaller than ε^{β/2} by taking ε3^k to be a "very small" mesoscale. We take k so that, for β and Q as in the previous display,
From this we deduce that (ε3^l)^α 3^{kQ} ≤ C ε^{β/2}. We see that this choice of k also makes the third term inside the parentheses on the right side of (3.40) smaller than a positive power of ε. With these choices, we obtain that, for some β > 0, (3.41) holds. Throughout the rest of this section, we will allow the exponents α ∈ (0, ¼] and Q ∈ [1, ∞) to vary in each occurrence, but they will always depend only on data. Similarly, we let c and C denote positive constants which may vary in each occurrence and whose dependence will be clear from the context.

The Minimal Scales
Many of the estimates we will use in the proof of Theorem 1.2 are deterministic (that is, the constants C in the estimates are not random) but are valid only above a minimal scale X which satisfies X = O_δ(C) for some δ(data) > 0 and C(M, data) < ∞. This includes, for instance, the estimates of Theorems 1.1, 1.2 and 1.3 assumed to be valid by our assumption (3.1), as well as Lemmas 2.11, 3.1, 3.2 and 3.4, some estimates like (3.13) which appear in the text, and future estimates such as those of Lemmas 3.5, 3.6 and 3.7. We stress that this list is not exhaustive.
In our proofs of (3.29) and (3.30), we may always suppose that 3^k ≥ X, where X is the maximum of all of these minimal scales. In fact, we may suppose that 3^k is larger than the stationary translation T_z X of X by any element z ∈ Z^d ∩ ε^{−1}U_1. To see why, first we remark that if X is any random variable satisfying X = O_δ(C), then a union bound gives that, for any Q < ∞, the corresponding random variable X̃ satisfies, for a possibly smaller δ and larger C < ∞, depending also on Q, the estimate X̃ = O_{δ/2}(C). Since, as explained in the previous subsection, 3^k is a small but positive power of ε^{−1}, we see that by choosing Q suitably we have that 3^k ≥ X̃ implies the validity of all of our estimates. Finally, in the event that 3^k < X̃, we can use the deterministic bounds obtained in Lemma 3.3 and (3.38) very crudely as follows:
Then we use that
where in the previous display the exponent Q is larger in the second instance than in the first. This yields (3.29) and (3.30) on the event {3^k < X̃}, so we do not have to worry about this event. Therefore, throughout the rest of this section, we let X denote a minimal scale satisfying X = O_δ(C) which, in addition to both δ and C, may vary in each occurrence.
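The union bound invoked above is the following standard computation. Recall (our normalization, following the convention in [4]) that X = O_δ(C) may be taken to mean E[exp((X/C)^δ)] ≤ 2, so that P[X > t] ≤ 2 exp(−(t/C)^δ) by Chebyshev's inequality. For N translates T_{z_1}X, . . . , T_{z_N}X, each of which is O_δ(C):

```latex
\mathbb{P}\Big[\max_{1\le i\le N} T_{z_i}X > t\Big]
  \;\le\; \sum_{i=1}^{N}\mathbb{P}\big[T_{z_i}X>t\big]
  \;\le\; 2N\exp\big(-(t/C)^{\delta}\big),
```

and the factor N ≈ ε^{−d} can be absorbed either into the constant, at the price of a factor (1 + log N)^{1/δ}, or, as in the text, into a reduction of the stochastic integrability exponent from δ to δ/2.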

The Proof of the Local Stationary Approximation
We now turn to the proof of (3.29), which amounts to showing that the difference between the coefficient fields (a^ε, F^ε_{n+2}) and (ã^ε, F̃^ε_{n+2}) is small. This is accomplished by an application of Lemma 2.11.
Lemma 3.5. There exist α(data) > 0, Q(data) < ∞ and C(M, data) < ∞ and a minimal scale X = O_δ(C) such that, if 3^k ≥ X, then (3.44) holds.
For each z ∈ ε3^l Z^d with z + ε□_{l+1} ⊆ U_{n+1}, we compare ∇u and ∇w^ε_m to the corresponding functions ∇v^{(k)} and ∇w^{(k)}_m. Likewise, if 3^l ≥ X, then for every z as above and every m ∈ {1, . . . , n + 1},
Here we used (3.37) in order that the constant C not depend implicitly on |∇u(z)|, and in order that the prefactor on the right side of the second line be as it is, rather than a product of the |∇w_j(z)|^{m_j}. Using (3.37) again, we see that
Combining the three previous displays with the triangle inequality, we get, for all such m and z and provided that 3^l ≥ X,
An application of Lemma 2.11 then yields
By L^p interpolation and the bounds (3.8) and (3.34), we obtain, for every q ∈ [2, ∞), an X such that 3^l ≥ X implies that, for every m and z as above,
This estimate implies
By a very similar argument, comparing the corresponding functions ∇v, and using (3.13) and (3.14), in view of (3.37), we find an exponent Q(data) < ∞ such that, for every m and z as above,
Combining the above estimates, we finally obtain that, for every z and m as above,
Using that these coefficient fields are bounded and that the boundary layer (that is, the set of points which lie in U_{n+2} but not in any cube of the form z + ε□_l with z as above) has thickness O(ε3^l), we get from the previous estimate that
where (in order to shorten the expressions) we denote
To complete the proof of the lemma, we observe that ζ := w̃^ε_{n+2} − w^ε_{n+2} ∈ H¹_0(U_{n+2}) satisfies the equation
By the basic energy estimate (testing the equation for ζ with ζ), we obtain
The first term on the right side is already estimated above.
For the second term, we use the Meyers estimate and (3.34), without forgetting (3.20), to find δ(d, Λ) > 0 such that
Therefore, by Hölder's inequality and the above estimate on a^ε − ã^ε, we obtain
This completes the proof of (3.44).
In view of the discussion in Sections 3.4 and 3.5, Lemma 3.5 implies (3.29).

Homogenization Estimates for the Approximating Equations
To prepare for the proof of (3.30), we apply the quantitative homogenization estimates proved in [4] for the linear equation (3.4). We denote by a^{(k)}_p and F^{(k)} the corresponding homogenized coefficients. The solutions, indexed over bounded Lipschitz domains U ⊆ R^d, e ∈ B_1, and parameter tuples in R^{d(n+2)}, satisfy the following quantitative homogenization estimate: there exist α(s, d, Λ) > 0, β(data) > 0, δ(data) > 0 and Q(data) < ∞ and C(s, M, data) < ∞ and a random variable X = O_δ(C) such that, for every (p, h_1, . . . , h_{n+1}) ∈ R^{d(n+2)} with |p| ≤ M, e ∈ R^d and every l ∈ N with 3^{l−k} ≥ X, (3.47) holds, as well as (3.48). To obtain (3.47), we change the scale by performing a dilation x ↦ 3^k(15 + d)^{1/2} x so that the resulting coefficient fields have a unit range of dependence and are still Z^d-stationary, cf. (3.3). We then apply [4, Lemma 11.11] to obtain (3.47), using Lemma 3.1, namely (3.5), to give a bound on the parameter K in the application of the former. The reason we quote results for general nonlinear equations, despite the fact that (3.46) is linear, is that (3.46) has a nonzero right-hand side and the results for linear equations have not been formalized in that generality.
The bound (3.48) is a consequence of the large-scale C 0,1 -type estimate for the equation (3.4) (see [4, Theorem 11.13]) which, together with a routine union bound, yields (3.48) with the left side replaced by sup z∈Z d ∩ l ∇v L 2 (z+(l−k) k ) . Indeed, if we denote by X z the minimal scale in the large-scale C 0,1 estimate, then a union bound yields (for any s ∈ (0, d), so in particular we can take s > 1),

We obtain (3.48) by combining this bound for sup z∈Z d ∩ l ∇v L 2 (z+(l−k) k ) with the deterministic bounds (3.5), (3.6) and an application of the Schauder estimates (see Proposition A.2).
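The "routine union bound" for minimal scales can be made explicit in the following schematic form, using the convention (as in [4]) that X = O s (θ) means E[exp((X/θ) s )] ≤ 2; the constants here are illustrative:

```latex
% If each minimal scale satisfies \mathcal X_z = O_s(C), then by Chebyshev's
% inequality applied to \exp((\mathcal X_z/C)^s),
\[
\mathbb P\big[ \mathcal X_z > t \big] \le 2 \exp\big( -(t/C)^s \big),
\qquad t > 0 ,
\]
% so a union bound over the O(3^{dl}) points z \in \mathbb Z^d \cap \square_l gives
\[
\mathbb P\Big[ \max_{z \in \mathbb Z^d \cap \square_l} \mathcal X_z > t \Big]
  \le C\, 3^{dl} \exp\big( -(t/C)^s \big).
\]
% Absorbing the volume factor 3^{dl} into the exponential (at the cost of
% enlarging C) shows that the maximum is again a minimal scale of the
% same stretched-exponential type.
```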
We also need the following estimate for the difference of v's on overlapping cubes, which can be inferred from (3.47) and the Caccioppoli inequality: there exist α(s, d, ) > 0, β(data) > 0, δ(data) > 0 and Q(data) < ∞ and C(s, M, data) < ∞ and a random variable X = O δ (C) such that, for every = ( p, h 1 , . .
We next observe that the homogenized coefficients a (k) n+2, agree, up to a small error, with a p := D 2 L( p) and F n+2 ( ). This will help us eventually to prove that the homogenized coefficients for the linearized equations are the linearized coefficients of the homogenized equation.

Lemma 3.7. Fix M ∈ [1, ∞). There exist α(data) > 0 and C(n, M, data) < ∞ such that, for every = ( p, h 1 , . . . , h n+1 ), (3.56) and (3.57) hold.

Proof. We first give the argument in the case that = ( p, h, 0, . . . , 0).

We also require the following continuity estimate for the functions w(·, l , e, ) in (e, ). Following the proof of Lemma 3.2 for L 2 dependence in (we need to do one more step of the iteration described there), using (3.47) for L 2 dependence in e, and then interpolating this result with (3.48), we obtain exponents α(data) > 0 and Q(data) < ∞ and a random variable X = O δ (C) such that, for every e, e ∈ R d , (3.60)

Homogenization of the Locally Stationary Equation
We next present the proof of (3.30), which is the final step in the proof of Theorem 1.2. The argument follows a fairly routine (albeit technical) two-scale expansion and requires the homogenization estimates given in Section 3.7.
The proof of (3.30). In view of the discussion in Section 3.5, we may assume throughout that 3 k X , where X is any minimal scale described above. We begin by building an approximation of the solution w ε n+2 of (3.25) out of the solutions of (3.46). We select a smooth function ζ ∈ C ∞ c (R d ) such that, for some C(d) < ∞ and each i ∈ {1, 2}, (3.61) We can construct such a ζ by a suitable mollification of 1 ε l . We write ζ z := ζ(· − z) for short. We next define a function ṽ ε . (3.62) A preliminary goal is therefore to prove the estimate (3.63) The proof of (3.63) is essentially completed in Steps 1-5 below: see (3.72) and (3.73).
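Although the displays defining ζ and ṽ ε are abbreviated above, the construction is the usual partition-of-unity patching of the local solutions; a plausible form, stated only as an illustrative assumption, is:

```latex
% Two-scale patching (illustrative form of (3.62)): with
% \zeta_z := \zeta(\cdot - z) a partition of unity subordinate to the cubes
% z + \varepsilon\square_{l+1},
\[
\tilde v^\varepsilon(x)
  := \sum_{z \in \varepsilon 3^l \mathbb Z^d} \zeta_z(x)\, v_z(x),
\qquad
\nabla \tilde v^\varepsilon
  = \sum_{z} \zeta_z \nabla v_z + \sum_{z} v_z \nabla \zeta_z .
\]
% Since \sum_z \nabla\zeta_z = 0, the second sum may be rewritten as
\[
\sum_{z} v_z \nabla \zeta_z
  = \sum_{z} \big( v_z - \bar v \big)\, \nabla\zeta_z
\qquad \text{for any fixed function } \bar v ,
\]
% which is why comparing the v_z's on overlapping cubes (as in Step 4 of
% the proof below) suffices to control this term.
```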
After obtaining this estimate, we will prove (3.64). Together, these inequalities imply (3.30).
Denote v z := v(·, z ε + l+1 , ∇w n+2 (z), (z)) for short and compute. Using the equation for v z , we find that

Therefore, we obtain the decomposition (3.67).
We next estimate each of the five terms on the right side of the previous display.
Step 1. The estimate of the first term on the right side of (3.67). This is where we use the homogenized equation for w n+2 .

Therefore we may write
For the first term on the right, we use (3.37), (3.47), (3.56) and (3.57) to obtain

For the second term, we denote (3.68), observe that |U n+2 \ U n+4 | Cε3 l , and compute, using the Hölder inequality, to obtain, using also (3.38),

Combining the above estimates and taking the supremum,

Step 2. The estimate of the second term on the right side of (3.67). We use (3.14), (3.37) and (3.50) to find

Step 3. The estimate of the third term on the right side of (3.67). We have by (3.13) and (3.37),

Step 4. The estimate of the fourth term on the right side of (3.67). For convenience, extend v z so that it is equal to the affine function ∇w n+2 (z) outside of the cube z + ε l+1 . By the triangle inequality, we have, (3.69) where here and below we use the notation y ∼ z to mean that y + l+1 and z + l+1 have nonempty intersection. Using (3.61) and the fact that z∈ε3 l Z d ∇ζ z = 0, we see that the relevant contribution is supported in a set V which is a boundary layer of U n+3 of thickness ε3 l . Observe that |V | Cε3 l . Thus by (3.47) we find that

Combining (3.61) with (3.49), (3.37) and the triangle inequality to compare the v z 's on overlapping cubes, we find that

For the last term, we compute, using (3.39),

Putting the above inequalities together, we find that

Step 5. The estimate of the fifth term on the right side of (3.67). Recall that U n+4 is defined in (3.68). We have, using (3.38) and (3.39),

Step 6. Let us summarize what we have shown so far. Substituting the results of the five previous steps into (3.67), we obtain that

(3.73)
Therefore, to obtain the lemma it suffices to prove (3.64). To prove this bound, we use the identity which, combined with (3.47) and (3.37), yields

This is (3.64).
In view of the selection of the mesoscopic parameters k and l in Section 3.4, which gives us (3.41), the proof of (3.30) is now complete.

Large-Scale C 0,1 -Type Estimates for Linearized Equations
In the next two sections, we suppose that n ∈ {0, . . . , N − 1} is such that (4.1) holds. The goal is to prove that Theorem 1.3 is also valid for n + 1. Combined with the results of the previous two sections and an induction argument, this completes the proof of Theorems 1.1, 1.2 and 1.3. The goal of this section is to prove one half of Theorem 1.3, namely the estimate (1.22) for w n+2 . The estimate (1.23) will be the focus of Section 5.
Both of the estimates in the conditional proof of Theorem 1.3 are obtained by "harmonic" approximation: homogenization says that on large scales the heterogeneous equations behave like the homogenized equations, and therefore we may expect the former to inherit some of the better regularity estimates of the latter. The quantitative version of the homogenization statement provided by Theorem 1.2 allows us to prove a C 0,1 -type estimate, following a well-known excess decay argument originally introduced in [6].
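Schematically, the excess decay mechanism runs as follows. This is a standard sketch with illustrative notation, not the precise statement used below:

```latex
% Define the excess at scale r by comparison with affine functions:
\[
E(r) := \inf_{\ell \in \mathcal P_1} \frac{1}{r}\, \| u - \ell \|_{\underline L^2(B_r)} .
\]
% Harmonic approximation: at every scale r \in [\mathcal X, R] there is a
% solution \bar u of the homogenized equation with
\[
\| \nabla u - \nabla \bar u \|_{\underline L^2(B_{r/2})}
  \le C r^{-\alpha}\, \| \nabla u \|_{\underline L^2(B_r)} .
\]
% Since \bar u satisfies interior C^{1,\beta} estimates, for a suitable
% \theta(d) \in (0, \tfrac12] one obtains the one-step decay
\[
E(\theta r) \le \tfrac12\, E(r)
  + C r^{-\alpha}\, \|\nabla u\|_{\underline L^2(B_r)} ,
\]
% which can be iterated from the macroscopic scale R down to the random
% minimal scale \mathcal X to produce the large-scale C^{0,1} bound.
```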

Approximation of w n+2 by Smooth Functions
The large-scale regularity estimates are obtained by approximating the solutions of the linearized equations for the heterogeneous Lagrangian L, as well as the linearization errors, by the solutions of the linearized equations for the homogenized Lagrangian L. Since the latter possess better smoothness properties, this allows us to implement an excess decay iteration for the former.
We begin by giving a quantitative statement concerning the smoothness of solutions to the linearized equations for the homogenized operator L. This is essentially well-known, but we need a more precise statement than we could find in the literature.
We next present the statement concerning the approximation of the solutions w m of the linearized equations for L, as well as the linearization errors ξ m , by solutions of the linearized equations for L. For the w m 's, this is essentially a rephrasing of the assumed validity of Theorem 1.2 for n + 1 in place of N, see (4.1). For the ξ m 's, this can be thought of as a homogenization result. There exist δ(n, d, ) ∈ (0, d), α(d, ) ∈ (0, 1 2 ), C(n, M, d, ) < ∞ and a random variable X satisfying such that the following statement is valid. For every R X and u, v, w 1 , . . .

Proof. Denote, in short, R m := 1 2 (1 + 2 −m )R and r m := 1 4 (R m − R m−1 ). Since we assume (4.1), we have that Theorem 1.2 applied for n + 1 instead of N and Theorem 1.3, assumed now for n in place of N, are both valid. In particular, there are σ (n, data) ∈ (0, 1) and C(n, M, data) < ∞ and a random variable X O σ (C) such that, for R X and m ∈ {1, . . . , n + 2}, Theorem 1.3 gives

and Theorem 1.2 yields

We set

Clearly X = O 1 2 ασ (C) and X X , and if R X , then X R −α R − 1 2 α . In conclusion, by taking α smaller if necessary, we obtain by (4.6) that, for m ∈ {1, . . . , n + 2} and R X , (4.7) Furthermore, we notice that (2.7) yields

and thus we get, by (4.5) and (4.7), that

It then follows by the Meyers estimate and the equation for w m that
finishing the proof of (4.4) by (4.7).

Excess Decay Iteration for w n+2
In this subsection we conditionally prove the statement of Theorem 1.3 for n + 1 and for q = 2. The restriction on q will be removed in the next subsection. The proof is by a decay of excess iteration, following along similar lines as the argument from [6], using "harmonic" approximation. The statement we prove is summarized in the following proposition: such that the following statement is valid: let R ∈ [X , ∞) and u, w 1 , . . . , w n satisfy, for each m ∈ {1, . . . , n + 1},

Then, for m ∈ {1, . . . , n + 2} and for every r ∈ [X , 1 2 R], we have

This choice, in particular, implies that there exist constants C(M, data) < ∞ and σ (data) ∈ (0, 1 2 ) such that X O σ (C) and X R. We will prove the statement using induction on shrinking radii. Indeed, we set, for j ∈ N and θ ∈ (0, 1 2 ], r j := θ j r 0 , where r 0 ∈ [X , δ R] and δ ∈ (0, 1 2 ). The parameters θ, δ and R will all be fixed in the course of the proof. Having fixed θ, δ and R, we assume that there is J ∈ N such that r J +1 < X r J . If there is no such J or X δ R, the result will follow simply by giving up a volume factor. Furthermore, we devise the notation of this proof in such a way that it will also allow us to prove the result of the next lemma, Lemma 4.4. We denote, in short, for m ∈ {1, . . . , n + 2} and γ ∈ [0, 1) to be fixed,

and

Notice then that, by (4.12) and Poincaré's inequality, we have, for r ∈ [X , 1 2 R] and m ∈ {1, . . . , n + 2}, that

Step 1. Letting u j solve − ∇ · D p L(∇u j ) = 0 in B 2r j , we show that there exist, for η ∈ (0, 1], constants α 1 (d, ) ∈ (0, 1 2 ), δ(η, M, data) < ∞ and R(η, M, data) < ∞ such that, for j ∈ {0, . . . , J }, (4.14) The parameter η shall be fixed later in (4.19).
On the one hand, we have from [1, Theorem 2.3] that there exist constants β 1 (d, ) ∈ (0, 1) and C(M, d, ) < ∞ such that

On the other hand, by harmonic approximation ([1, Corollary 2.2]) and the Lipschitz estimate for u ([1, Theorem 2.3]) we get that there exist constants β 2 (d, ) ∈ (0, 1) and

Thus (4.14) follows by the triangle inequality, taking α 1 := 1 2 (β 1 ∧ β 2 ) and choosing δ small enough and R large enough so that (4.15) holds. We assume, from now on, that δ and R are such that (4.15) is valid.
Step 4. We show that the first inequality in (4.18) continues to hold for j = j * + 1. First, by the triangle inequality, (4.21) and (4.18), we have that By a similar computation, using also the induction assumption (4.18), Now, applying (D.7) we obtain by the previous three displays, (4.20) and (4.13), for each m ∈ {1, . . . , n + 2}, that r j r j+1 and, consequently, Thus, choosing θ small enough so that Cθ β 2 1 2 , we obtain that the first inequality in (4.18) is valid for j = j * + 1.
Step 5. The last step in the proof is to show the second inequality in (4.18) for j = j * + 1. Let j be the minimizing affine function in inf ∈P 1 w m − L 2 (B r j ) . Then, by the triangle inequality and the first inequality in (4.18), valid for j ∈ {0, . . . , j * },

Thus, summation gives that

Therefore,

By the triangle inequality we have that

Choosing C = C, where C is as in the above inequality, verifies the second inequality in (4.18) for j = j * + 1. This finishes the proof of the induction step, and thus completes the proof.
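The summation step at the end of the proof is the usual telescoping estimate for the slopes of the minimizing affine functions; in schematic notation (with ℓ_j denoting the minimizer at scale r_j, and constants depending on θ left implicit):

```latex
% Comparing consecutive minimizers on the smaller ball and using the
% equivalence of norms on the finite-dimensional space \mathcal P_1,
\[
|\nabla \ell_{j} - \nabla \ell_{j+1}|
  \le \frac{C}{r_{j+1}} \, \| \ell_j - \ell_{j+1} \|_{\underline L^2(B_{r_{j+1}})}
  \le C \big( \theta^{-1} E_m(r_j) + E_m(r_{j+1}) \big),
\]
% so that, telescoping over j,
\[
|\nabla \ell_0 - \nabla \ell_{j^*+1}|
  \le C \sum_{j=0}^{j^*} \big( E_m(r_j) + E_m(r_{j+1}) \big),
\]
% and the geometric decay of the excess E_m(r_j), obtained in the previous
% steps, makes the sum on the right converge.
```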
To show Lipschitz estimates for the linearization errors in the next section, we need a small variant of the previous proposition.
(4.24) and, for r ∈ [X , 1 2 R], where u satisfies the normalization

Proof. The proof is a rearrangement of the argument in the proof of Proposition 4.2. Indeed, we take r 0 = r and combine the first inequality of (4.23) with (4.16) and (4.9), which is valid for m ∈ {1, . . . , n + 1}.

Improvement of Spatial Integrability
We next complete the conditional proof of (1.22) by improving the spatial integrability of (4.25) from L 2 to L q for every q ∈ [2, ∞). To do this, we use the estimate (4.25) to pass from the large scale R to the microscopic, random scale X . We then use deterministic estimates from classical elliptic regularity theory to obtain local L q estimates in balls of radius one, picking up a volume factor (a power of X ) as the price to pay. We first formalize the latter step in the following lemma:

Lemma 4.5. Assume (4.1) and the hypotheses of Theorem 1.3. Let β ∈ (0, 1) and q ∈ (2, ∞). Then there exist σ (q, data) ∈ (0, d), C(q, M, data) < ∞ and a random variable X satisfying X = O σ (C) such that, for every r X and m ∈ {1, . . . , n + 2},

Proof. In view of the assumption of (4.1) and thus the validity of Theorem 1.3 for n, we only need to prove (4.26) for m = n + 2. Fix q ∈ (2, ∞), r ∈ [2, ∞) and β ∈ (0, 1) to be selected below. The C 1,β -estimate in Proposition A.2, together with a covering argument, yields

Setting a := D 2 p L(∇u, x), we deduce by assumption (L1) that

Applying Proposition A.1 we find that, for each x ∈ B r/2 and q ∈ (2, ∞],

By a covering argument, we therefore obtain

If we now take X to be the maximum of the minimal scales in the statements of: (1) [4, Theorem 11.13]; (2) Theorem 1.3 for n in place of N and with a sufficiently large exponent of spatial integrability q in place of q (which can be computed explicitly in terms of our q using the Hölder inequality, although we omit this computation), the validity of which is given by assumption (4.1); (3) Proposition 4.2; then we have that X = O σ (C) as stated in the lemma and that r X implies the following estimates:

Combining these with (4.27), we obtain

This completes the proof of the lemma.
In the next lemma we finally achieve the goal of this section, which is to show that (4.1) implies (1.22) for m = n + 1.

Lemma 4.6. Assume that (4.1) holds. Fix M ∈ [1, ∞) and q ∈ (2, ∞). Then there exist constants σ (q, data) ∈ (0, d), C(q, M, data) < ∞ and a random variable X satisfying such that the following is valid. Suppose that R ∈ [X , ∞) and u, w 1 , . . . , w n ∈ H 1 (B R ) are such that, for every m ∈ {1, . . . , n + 2}, (4.28) holds. Then, for every r ∈ [X , 1 2 R], we have (4.29).

Proof. Fix q ∈ (2, ∞). Select a parameter θ ∈ (0, 1) which will denote a mesoscopic scale. For each z ∈ R d , we take X z to be the random variable X in the statement of Proposition 4.2, centered at the point z. Define another random variable (a minimal scale) by

It is clear from Proposition 4.2 and a routine union bound argument that

Next, for every k ∈ N and z ∈ R d we let Z k,z denote the random variable where the supremum is taken over all (u, w 1 , . .
satisfying, for every m ∈ {1, . . . , n + 2}, (4.30) Observe that Z k,z is F(z + k+1 )-measurable and, by Lemma 4.5 and an easy covering argument, it satisfies the estimate (4.31) Fix A ∈ [1, ∞) and define another random variable (a minimal scale) Z by We will show below that, if A is chosen sufficiently large (depending of course on the appropriate parameters) then Assuming that (4.32) holds for the moment, let us finish the proof of the lemma. Suppose now that k ∈ N satisfies Y ∨ Z 3 k 3 k+1 R. Let (u, w 1 , . . . , w n+2 ) ∈ H 1 (B R ) n+3 satisfy (4.28). Then we have that Note that in the third and final lines we used that 3 k Y, that is, we used the result of Proposition 4.2. This is the desired estimate for X = Y ∨ Z, and so the proof of the lemma is complete subject to the verification of (4.32).
Turning to the proof of (4.32), we notice first that it suffices to show, for A sufficiently large, the existence of σ (q, data) > 0 and C(q, M, data) < ∞ such that, for every k ∈ N, (4.33) holds. Indeed, we can see that (4.33) implies (4.32) using a simple union bound. Fix then a parameter λ ∈ [1, ∞) and compute, using (4.31) and the Chebyshev inequality,

Using the simple large deviations bound for sums of bounded, independent random variables, we have

Here we are careful to notice that, while the collection {Z q θk ,z ∧ λ : z ∈ 3 θk Z d ∩ k+1 } of random variables is not independent (since adjacent cubes are touching and thus not separated by a unit distance), we can partition this collection into 3 d many subcollections which have an equal number of elements and each of which is independent. The large deviations estimate can then be applied to each subcollection, and then a union bound yields (4.35). Taking λ := 3 d 4 (1−θ)k and A q := C + 1 yields (4.33).
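The large deviations bound invoked above is the classical Hoeffding inequality for sums of independent, bounded random variables, stated here in a generic form:

```latex
% Hoeffding's inequality: if Y_1, \ldots, Y_N are independent and
% 0 \le Y_i \le \lambda almost surely, then for every t > 0,
\[
\mathbb P\Big[ \sum_{i=1}^N \big( Y_i - \mathbb E[Y_i] \big) \ge N t \Big]
  \le \exp\Big( - \frac{2 N t^2}{\lambda^2} \Big).
\]
% In the proof above it is applied, within each of the 3^d independent
% subcollections, to the truncated variables Y = Z^q_{\theta k, z} \wedge \lambda,
% and the resulting bounds are combined by a union bound.
```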
The above proof, in view of Remark 4.3, gives the following result without assuming (4.1). This, together with (5.1) below, serves as the base case for the induction.

Large-Scale C 0,1 -Type Estimate for Linearization Errors
In this section we continue to suppose that n ∈ {0, . . . , N − 1} is such that (4.1) holds. The goal is to complete the proof that Theorem 1.3 is also valid for n + 1. Combined with the results of the previous three sections and an induction argument, this completes the proof of Theorems 1.1, 1.2 and 1.3.

Excess Decay Iteration for ξ n+1
We start by proving higher integrability for a difference of two solutions. This, together with Proposition 4.7, yields the base case for the induction.
Proposition 5.1. Fix M ∈ [1, ∞) and q ∈ [2, ∞). There exist α(data), σ (q, data) ∈ (0, 1 2 ), C(q, M, data) < ∞ and a random variable X satisfying such that the following statement is valid: for every R X and u, v ∈ H 1 (B R ) satisfying, for each m ∈ {1, . . . , n + 1},

Then, for m ∈ {1, . . . , n + 1}, ξ 0 = u − v and r ∈ [X , 1 2 R], we have

Proof. On the one hand, the estimate (5.3) follows by [1, Proposition 4.2]. On the other hand, the proof of (5.4) is similar to the proof of the L q -integrability of w 1 presented in Section 4.3. Indeed, noticing that ξ 0 satisfies the equation

we have by the normalization in (5.1) and the C 1,α regularity of u and v that we may replace w 1 with ξ 0 in Lemma 4.5 applied with m = 1. Using this together with the Lipschitz estimate (5.3) for ξ 0 , as in the proof of Lemma 4.6, concludes the proof of (5.4). We omit further details.
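For the reader's convenience, the equation satisfied by ξ 0 alluded to in the proof follows from the fundamental theorem of calculus; a sketch (the notation ã here is ours):

```latex
% Since u and v solve the same nonlinear equation, subtracting the two
% equations and writing, for \xi_0 = u - v,
\[
D_p L(\nabla u, x) - D_p L(\nabla v, x)
  = \underbrace{\Big( \int_0^1
      D^2_p L\big( t\nabla u + (1-t)\nabla v,\, x \big)\,dt
    \Big)}_{=:\ \tilde{\mathbf a}(x)} \, \nabla \xi_0(x),
\]
% we see that \xi_0 solves the linear equation
\[
-\nabla\cdot\big( \tilde{\mathbf a}(x)\, \nabla \xi_0 \big) = 0
\quad \text{in } B_R ,
\]
% with \tilde{\mathbf a} uniformly elliptic by the uniform convexity of L.
```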
(5.6)

Proof. We start by fixing some notation. Let q(n, d, ) be as in Lemma C.1, applied for n + 1 instead of n. Corresponding to this q, choose X to be the maximum of the minimal scales appearing in Proposition 5.1, Theorem 1.3 and Lemma 4.4, the last two of which are valid for n + 1 in place of N by (4.1). We also assume that X R, by taking X ∨ R instead of X , where R will be fixed in the course of the proof to depend on the parameters (n, M, data). Chosen this way, X satisfies X O σ (C) for some constants σ (n, data) > 0 and C(n, M, data) < ∞.
Let r j = θ j η R, where θ is as in Lemma 4.4 and η ∈ 0, 1 2 . The constant η, as well as R, will be fixed in the course of the proof, so that η is small and R is large. We track down the dependencies on η and R carefully and, in particular, constants denoted by C below do not depend on them. We may assume, without loss of generality, that η R X .
Step 2. Caccioppoli estimate for ξ k+1 . We show that, for all r ∈ [X , R k ],

We first have by (C.2) that
By Theorem 1.3 and the choice of q in the beginning of the proof, we obtain, for every r ∈ [X , 1 2 R] and m ∈ {1, . . . , k + 1}, the estimates

and, by Proposition 5.1,

Combining the above displays yields (5.9) in view of the induction assumption, that is, that (5.8) holds for m ∈ {0, . . . , k}.
Step 3. We prove that there is a constant C < ∞, independent of η and R, such that, for j ∈ N 0 such that r j 2(X ∨ R), we have (5.11), where ε j is defined in (5.12) and α(data) is the minimum of the parameters α appearing in the statements of Proposition 5.1 and Lemma 4.4. Let us fix j ∈ N 0 such that r j 2(X ∨ R). We argue in two cases, namely, either (5.13) or (5.15) below is valid. We prove that in both cases (5.11) follows.
Step 3.1. We first assume that (5.13) holds, where ε j has been fixed in (5.12). We show that this implies (5.14). Notice that this gives (5.11). To show (5.14), we have by the triangle inequality that

By the choice of α, we get by Proposition 5.1 that

and, by Proposition 4.2,

Combining the estimates and using (5.13), we have that

We then obtain (5.14) by the choice of ε j in (5.12), provided that (5.13) is valid.
Step 4. We show that, for r ∈ [2(X ∨ R), r 0 ], we have (5.19). We proceed inductively, and assume that J ∈ N 0 is such that r J 2(X ∨ R) and that, for all j ∈ {0, . . . , J }, there exists a constant C(d, θ) ∈ [1, ∞), independent of η and R, such that

This is true for J = 0 by the definition of E k . We claim that it continues to hold for j = J + 1 as well. Combining (5.14) and (5.11) with the induction assumption we have, for j ∈ {1, . . . , J }, that

Since r J R, we may take R large and η small enough so that

Thus, by summing and reabsorbing, (5.20)

Letting j be the minimizing affine function in inf ∈P 1 ξ k+1 − L 2 (B r j ) , we obtain by the above display and the triangle inequality that

By the triangle inequality again, we obtain that

We thus obtain by the triangle inequality, together with (5.20) and the above display, that there exists C(d, θ) < ∞ such that

Hence we can take C = C, proving the induction step. Letting then J be such that r ∈ (r J +1 , r J ], we obtain (5.19) by giving up a volume factor.
Step 5. Conclusion. To conclude, we obtain from (5.11) and (5.19) by iterating that

We hence find α such that, for all r ∈ [2(X ∨ R), r 0 ],

Moreover, by (5.19) and (5.9) we deduce that, for all r ∈ [2(X ∨ R), R k+1 ],

Therefore, (5.8) is valid for m = k + 1, by giving up a volume factor. This proves the induction step and completes the proof.
These, together with Proposition 5.1, yield by (C.13) that, for R X ,

Having this at our disposal, we may repeat the proof of Section 4.3 to conclude (5.6) for q ∈ [2, ∞).

Sharp Estimates of Linearization Errors
Here we show that Corollary 1.4 is a consequence of the theorems proved above. Let Z be the finite set consisting of the centers of these balls. The size of Z depends only on the geometry of the sequence of domains U 0 , U 1 , . . . , U n . Let X be the maximum of the random variables X given in Theorem 1.3, centered at elements of Z, divided by the radius of the smallest ball. We assume that r X . This ensures the validity of Theorem 1.3 in each of the balls B (k) j : that is, for every q ∈ [2, ∞) and m ∈ {1, . . . , n}, we have the estimate (6.2) and hence, by the covering,

Proceeding with the proof of the corollary, we define, as usual,

Arguing by induction, let us suppose that m ∈ N with m 1 and θ > 0 are such that, for every j ∈ {0, . . . , m − 1} and q ∈ [2, ∞), there exists C(q, data) < ∞ such that (6.4) holds. This obviously holds for m = 1 and some θ(d, ) > 0 by the Meyers estimate and Theorem 1.3. We will show that it must also hold for m + 1 and some other (possibly smaller) exponent θ > 0.
Step 1. We show that

By the basic energy estimate,

By Theorem 1.3 (using our definition of X and the fact that r X ) and the induction hypothesis, we have, for every k ∈ {1, . . . , m},

This completes the proof of (6.5).
The corollary now follows by induction.
Consequently, Remark 7.1 provides us with w such that (w 1 , . . . , w n−1 , w) ∈ W. Using the fact that F( p, ∇w 1 , . . . , ∇w n−1 ) is a polynomial of degree n, it is straightforward to show by homogeneity that

We have the estimate

for all m ∈ {1, . . . , n}, and by the equations of w 1 , . . . , w n−1 we see that

Therefore, for all R > 0 and m ∈ {1, . . . , n},

Thus we have that, for t R,

which yields (7.13).
We now turn to the proof of the large-scale C n,1 estimate.

Proof of Theorem 1.6(iii). Fix M ∈ [1, ∞). By Theorem 1.3 there exist constants σ (n, M, data) ∈ (0, 1) and C(n, M, data) < ∞ and a random variable X satisfying X O σ (C) so that the statement of Theorem 1.3 is valid with q = 2(n + 2). We now divide the proof into two steps.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Deterministic Regularity Estimates
In this first appendix, we record some deterministic regularity estimates of Schauder and Calderón-Zygmund type for linear equations with Hölder continuous coefficients. These estimates, while well-known, are not typically written with explicit dependence on the regularity of the coefficients, which is needed for our purposes in this paper.
Proposition A.1. (Calderón-Zygmund gradient L q estimates). Let β ∈ (0, 1], q ∈ [2, ∞) and a ∈ R d×d be a symmetric matrix with entries in C 0,β (B 2 ) satisfying

Then u ∈ W 1,q loc (B 2 ) and there exists C(q, d, ) < ∞ such that

Proof. We will explain how to extract the statement of the proposition from that of [4, Proposition 7.3]. The latter asserts the existence of δ 0 (q, d, ) > 0 such that, for every x ∈ B 1 and r ∈ (0, 1 2 ) satisfying

we have, for a constant C(q, d, ) < ∞, the estimate

Since osc B 2r a (2r ) β [a] C 0,β (B 2 ) , we have the above estimate for every x ∈ B 1 and r :=

From this, Fubini's theorem and Young's inequality for convolutions, we obtain

This completes the proof.

(A.3)
Proof. We will explain how to extract the statement of the proposition from the gradient Hölder estimate found in [20, Theorem 3.13]. The latter states that, under the assumption that [a] C 0,β (B 2 ) 1, there exists C(β, d, ) < ∞ such that

After changing the scale, we obtain the corresponding statement in B r , which asserts that, under the assumption that

Therefore we take r := and apply the previous statement in every ball B r (x) with x ∈ B 1 to obtain

After a covering argument, we obtain (B.1), concluding the proof.

Appendix C: Linearization Errors
In this appendix we compute the equation satisfied by a higher-order linearization error and thereby obtain gradient estimates.

(C.2)
Proof. Throughout the proof we use the notation s k = k j=1 w j j! and ξ 0 = v − u, so that ξ k = ξ 0 − s k .
Step 1. Recalling that F 1 = 0, we may rewrite

By the equations of u, v and w k , we have that

It thus remains to estimate E n .
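To orient the reader, the algebra behind s_k and ξ_k can be summarized as a formal Taylor expansion. The sketch below is schematic; the precise form of the right-hand sides F_j is as defined in the body of the paper:

```latex
% The k-th order linearization error of v around u is
\[
\xi_k := v - u - \sum_{j=1}^{k} \frac{w_j}{j!} = \xi_0 - s_k ,
\qquad
s_k := \sum_{j=1}^{k} \frac{w_j}{j!}, \quad \xi_0 := v - u .
\]
% Formally, if u_t solves -\nabla\cdot D_pL(\nabla u_t, x) = 0 with u_0 = u,
% then w_j plays the role of \partial_t^j u_t \big|_{t=0} and solves the
% j-th linearized equation
\[
-\nabla\cdot\big( D^2_p L(\nabla u, x)\, \nabla w_j \big)
  = \nabla\cdot \mathbf F_j\big( \nabla u, \nabla w_1, \ldots, \nabla w_{j-1}, x \big),
\qquad \mathbf F_1 = 0 ,
\]
% so that \xi_k is the remainder of the k-th order Taylor expansion of v
% around u, which is what the estimates of this appendix quantify.
```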
Step 3. We show that, for k ∈ {2, . . . , n}, there exists a constant C(n, k, d) < ∞ such that

The statement is easy to verify by induction. Indeed, for m = j = 2 we have by (C.5)

By (C.8), using Fubini for sums, we obtain

This, together with (C.9), proves the induction step, and gives also (C.7).
Set also R h := 1 2 (1 + 2 −h )R. With this notation the previous display yields, by Hölder's inequality, that Now (C.2) follows by the Caccioppoli estimate, concluding the proof.
Throughout the next steps of the proof, we let constants C depend on the parameters ({K i } m−1 i=1 , m, η, M, β, K 0 , d, ), and they may change from line to line.

Step 3. Bounds on f. We show that, under the induction assumptions (D.14) and (D.15), we have (D.16) and, for r ∈ (0, r m ] and y ∈ B R m , defining (D.19), which yields (D.16) by (2.7).
To show (D.18), using the Hölder regularity of f m with respect to the ∇u variable, similarly to (2.11), gives us

Step 5. Induction assumption on the scale. We now assume that we find ε ∈ (0, 1], a constant C ε , and r * ∈ (0, εr m ] such that

implying that (D.21) is valid for r * = εr m provided that C ε C ε .
Therefore, by the triangle inequality, we get a bound on ∇w m − (∇w m ) B θr (y) L 2 (B θr (y)) .

By an iteration argument we thus obtain that, for r ∈ [θr * , εr m ],
Step 7. We now prove that (D.14) is valid for j = m, giving also (D.7). An application of the Caccioppoli estimate (D.20), together with (D.5), which was proved in Step 6 above, gives, after giving up volume factors,

This yields, via telescoping summation as in Step 6, that

Thus, if, on the one hand, r = |x − y| ∈ (0, r m ], we get by the above two displays that

Step 8. We finally sketch the proof of (D.8). Since it is very similar to the above reasoning, we will omit most of the details. We prove the statement by using induction on m and on k.

(D.24)
Next, since w 1 solves −∇ · (b∇w 1 ) = 0 and z → b(z) := D 2 p L(∇u(z)) is in C n,β by (D.24), we may differentiate the equation at most n times and obtain that w 1 ∈ C n+1,β ; it is also straightforward to show that w 1 satisfies (D.8) for m = 1, this being just classical Schauder theory. We then assume that (D.8)

This proves the induction step, and finishes the proof.

Lemma E.2. Let n ∈ N and let α ∈ (n, n + 1). Let M ∈ [0, ∞) and ε ∈ (0, 1). Suppose that A is a constant symmetric matrix having eigenvalues in the interval [1, ]. There is a constant C(n, α, ε, d, ) such that, if u ∈ H 1 (B 1 ) solves ∇ · A∇u = ∇ · F and F ∈ L p (B 1 ) satisfies, for r ∈ (0, 1),

then there is an A-harmonic q ∈ P n+1 such that

In particular, letting q t be the minimizing element of A n+1 in the definition of D(t), we get by the triangle inequality that t −n ∇q t/2 − ∇q t L p (B t ) Ct α−n (D(t/2) + D(t)) Ct α−n (D(r ) + M) .
This allows us to identify q ∈ A n+1 such that, for t ∈ (0, r ),

The proof is complete by the easy estimate D(1 − ε) C ε ( ∇u L 2 (B 1 ) + M).