Higher-order linearization and regularity in nonlinear homogenization

We prove large-scale $C^\infty$ regularity for solutions of nonlinear elliptic equations with random coefficients, thereby obtaining a version of the statement of Hilbert's 19th problem in the context of homogenization. The analysis proceeds by iteratively improving three statements together: (i) the regularity of the homogenized Lagrangian $\bar{L}$, (ii) the commutation of higher-order linearization and homogenization, and (iii) large-scale $C^{0,1}$-type regularity for higher-order linearization errors. We consequently obtain a quantitative estimate on the scaling of linearization errors, a Liouville-type theorem describing the polynomially-growing solutions of the system of higher-order linearized equations, and an explicit (heterogeneous analogue of the) Taylor series for an arbitrary solution of the nonlinear equations---with the remainder term optimally controlled. These results give a complete generalization to the nonlinear setting of the large-scale regularity theory in homogenization for linear elliptic equations.


1.1. Motivation: quantitative homogenization for nonlinear equations.
This article is concerned with nonlinear, divergence-form, uniformly elliptic equations of the form

(1.1) −∇ ⋅ (D_pL(∇u, x)) = 0 in U ⊆ R^d.

The Lagrangian L(p, x) is assumed to be uniformly convex and regular in p and to possess some mild regularity in x. Furthermore, L is a stochastic object: it is sampled by a probability measure P which is statistically stationary and satisfies a unit range of dependence. This essentially means that x ↦ L(⋅, x) is a random field, valued in the space of uniformly convex functions, the law of which is independent of x (or, to be precise, periodic in x; see Subsection 1.4 for the assumptions).
The objective is to describe the statistical behavior of the solutions of (1.1), with respect to the probability measure P, on large length scales. In other words, we want to understand what the solution u looks like in the case that the "macroscopic" domain U is very large relative to "microscopic" scale, which is the correlation length scale of the coefficients (taken to the unit scale).
At a qualitative level, a satisfactory characterization of the solutions, in the regime in which the ratio of these two length scales is large, is provided by the principle of homogenization. First proved in this context by Dal Maso and Modica [11,12], it asserts roughly that a solution of (1.1) is, with probability approaching one, close in L² (relative to its size in L²) to a solution of a deterministic equation of the form

(1.2) −∇ ⋅ D_pL̄(∇u_hom) = 0 in U,

for an effective Lagrangian L̄ which is also uniformly convex and C^{1,1}. This result is of great importance, from both the theoretical and computational points of view, since the complexity of the homogenized equation (1.2) is significantly less than that of (1.1): it is both deterministic and spatially homogeneous. It hints that (1.1) should, on large domains and with high probability, possess some of the structure of a constant-coefficient equation, and thus we may expect it to be more amenable to analysis than the worst-possible heterogeneous equation of the form (1.1). In other words, since the Lagrangian L is sampled by a probability measure P with nice ergodic properties, rather than given to us by the devil, can we expect its solutions to have a nicer structure? In order to answer this kind of question, we need to build a quantitative theory of homogenization.
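To fix ideas, in one space dimension the homogenized coefficient of the linear equation −(a(x)u′)′ = 0 can be computed in closed form: the flux a(x)u′ is constant, so u′ = c/a(x) and the effective coefficient is the harmonic mean of a over a period. The following sketch (our own toy illustration with a hypothetical coefficient, not part of the paper's setting) checks this numerically:

```python
import numpy as np

# Toy 1D homogenization check: for -(a(x) u')' = 0 with a periodic coefficient,
# the homogenized coefficient abar is the harmonic mean of a over one period.
# Hypothetical coefficient a(x) = 2 + sin(2*pi*x); here abar = sqrt(3) exactly,
# since the integral of 1/(2 + sin(2*pi*x)) over one period equals 1/sqrt(3).
a = lambda x: 2.0 + np.sin(2 * np.pi * x)
x = (np.arange(100000) + 0.5) / 100000   # midpoint rule on the unit period
abar = 1.0 / np.mean(1.0 / a(x))         # harmonic mean
arith = np.mean(a(x))                    # arithmetic mean, for comparison
print(abar, arith)                       # abar ≈ 1.73205 < 2.0 = arith
```

The gap between the harmonic and arithmetic means already illustrates that the homogenized coefficient is not simply the average of the heterogeneous one.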
To be of practical use, the principle of homogenization needs to be made quantitative. We need to have answers to questions such as these:
• How large does the ratio of scale separation need to be before we can be reasonably sure that solutions of (1.1) are close to those of (1.2)? In other words, what is the size of a typical error in the homogenization approximation in terms of the size of U?
• Can we estimate the probability of the unlikely event that the error is large?
• What is D_pL̄ and how can we efficiently compute it? How regular can we expect it to be? Can we efficiently compute its derivatives?
• Can we describe the fluctuations of the solutions?
In this paper we show that (1.1) has a C^∞ structure. In particular, we will essentially answer the third question posed above by demonstrating that the effective Lagrangian L̄ is as regular as L(⋅, x), with estimates for its derivatives. We will identify the higher derivatives of L̄ as the homogenized coefficients of certain linearized equations and give quantitative homogenization estimates for these, implicitly indicating a computational method for approximating them and thus a Taylor approximation for L̄. Finally, we will prove large-scale C^{k,1}-type estimates for solutions of (1.1), for k ∈ N as large as can be expected from the regularity assumptions on L, a result analogous to the resolution of Hilbert's 19th problem, famously achieved for spatially homogeneous Lagrangians by De Giorgi and Nash. Our analysis reveals the interplay between three seemingly different kinds of results: (i) the regularity of L̄, (ii) the homogenization of linearized equations, and (iii) the large-scale regularity of the solutions. In analogy to the way that the Schauder estimates are iterated in the resolution of Hilbert's 19th problem, these three statements must be proved together, iteratively in the parameter k ∈ N, which represents the degree of regularity of L̄, the order of the linearized equation, and the order of the C^{k,1} estimate.

1.2. Background: large-scale regularity theory and its crucial role in quantitative homogenization. In the last decade, beginning with the work of Gloria and Otto [18,16], a quantitative theory of homogenization has been developed to give precise answers to questions like the ones stated in the previous subsection. Until now, most of the progress has come in the case of linear equations, which corresponds to the special case L(p, x) = ½ p ⋅ a(x)p of (1.1), where a(x) is a symmetric matrix; in this case (1.1) becomes the linear equation

(1.3) −∇ ⋅ (a(x)∇u) = 0 in U.

By now there is an essentially complete quantitative theory for linear equations, and we refer to the monograph [4] and the references therein for a comprehensive presentation of this theory. Quantitative homogenization for the nonlinear equation (1.1) has a comparatively sparse literature; in fact, the only such results of which we are aware are those of [6,5] (see also [4, Chapter 11]), our previous paper [1] and a new paper of Fischer and Neukamm [14], which was posted to the arXiv as we were finishing the present article.
Quantitative homogenization is inextricably linked to regularity estimates on the solutions of the heterogeneous equation. This is not surprising when we consider that the homogenized flux D_pL̄(∇u_hom) should be related to the spatial average (say, on some mesoscopic scale) of the heterogeneous flux D_pL(∇u(x), x). In order for spatial averages of the latter to converge nicely, we need to have bounds. It could be unfortunate, and lead to a very slow rate of homogenization, if, for instance, the flux were concentrated on sets of very small measure which percolate only on very large scales. To rule this out we need good gradient estimates: ideally, we would like to know that the size of the flux on small scales is the same as on large scales, which amounts to a W^{1,∞} estimate on solutions.
Unfortunately, solutions of equations with highly oscillating coefficients do not possess very strong regularity, in general. Indeed, the best deterministic elliptic regularity estimate for solutions of (1.1), which does not degenerate as the size of the domain U becomes large, is C^{0,δ} in terms of Hölder regularity (the De Giorgi-Nash estimate) and W^{1,2+δ} in terms of Sobolev regularity (the Meyers estimate). The exponent δ > 0 in each estimate becomes small as the ellipticity ratio becomes large (see [4, Example 3.1]), and thus both estimates fall far short of the desired regularity class W^{1,∞} = C^{0,1}.
One of the main insights in the quantitative theory of homogenization is that, compared to a generic ("worst-case") L, solutions of the equation (1.1) have much better regularity if L is sampled by an ergodic probability measure P. This is an effect of homogenization itself: on large scales, (1.1) should be a "small perturbation" of (1.2), and therefore better regularity estimates for the former can be inherited from the latter. This is the same idea used to prove the classical Schauder estimates. In the context of homogenization, the result states that there exists a random variable X (sometimes called the minimal scale), which is finite almost surely, such that, for every r with X ≤ r ≤ ½R and every solution u ∈ H¹(B_R) of (1.1) with U = B_R, we have the estimate

(1.4) ( ⨏_{B_r} |∇u|² )^{1/2} ≤ C ( ⨏_{B_R} |∇u|² )^{1/2}.

Here C depends only on the dimension and ellipticity, and ⨏_U := (1/|U|) ∫_U denotes the mean of a function on U. If we could send r → 0 in (1.4), it would imply that

‖∇u‖_{L^∞(B_{R/2})} ≤ C ( ⨏_{B_R} |∇u|² )^{1/2},

which is a true Lipschitz estimate, in fact the same estimate that holds for the homogenized equation (1.2). As (1.4) is valid only for r ≥ X, it is sometimes called a "large-scale C^{0,1} estimate" or a "Lipschitz estimate down to the microscopic scale." This estimate, first demonstrated in [6] in the stochastic setting, is a generalization of the celebrated result in the case of (non-random) periodic coefficients due to Avellaneda and Lin [7]. Of course, it then becomes very important to quantify the size of X. The estimate proved in [6], which is essentially optimal, states that X is bounded up to "almost volume-order large deviations": for every s < d and r ≥ 1,

(1.5) P[X > r] ≤ C exp(−cr^s).

Here the constants C and c depend only on s, d, and the ellipticity. A proof of this large-scale regularity estimate together with (1.5) can be found in [4, Chapter 3] in the linear case and in [4, Chapter 11] in the nonlinear case.
The right side of (1.5) represents the probability of the unlikely event that the L sampled by P will be a "worst-case" L in the ball of radius r. A proof of the optimality of (1.5) can be found in [4,Section 3.6].
This large-scale regularity theory introduced in [6] was further developed in the case of (1.3) in [17,5,15,2,3] and now plays an essential role in the quantitative theory of stochastic homogenization. Whether one employs functional inequalities [17,13] or renormalization arguments [2,3,19], it is a crucial ingredient in the proof of the optimal error estimates in homogenization for (1.3): see the monograph [4] and the references therein for a complete presentation of these developments.
The large-scale C^{0,1} estimate is, from one point of view, the best regularity one can expect solutions of (1.1) or (1.3) to satisfy: since the coefficients are rapidly oscillating, there is no hope for the gradient to exhibit continuity on the macroscopic scale. However, as previously shown in the periodic case in [7,9], the solutions of the linear equation (1.3) still have a C^∞ structure. To state what we mean, let us first think of an (interior) C^{k,1} estimate not as a pointwise bound on the (k+1)th derivatives of a function, but as an estimate of how well a function may be approximated on small balls by a kth order polynomial. By Taylor's theorem, these are of course equivalent in the following sense:

inf_{p ∈ P_k} ‖u − p‖_{L²(B_r)} ≤ C r^{k+1} ‖∇^{k+1}u‖_{L^∞(B_r)},

where P_k denotes the set of polynomials of degree at most k, and we use the notation ‖w‖_{L²(U)} := ( ⨏_U w² )^{1/2} to denote the volume-normalized L²(U) norm. Thus the interior C^{k,1} estimate for a harmonic function can be stated in the form: for any harmonic function u in B_R and any r ∈ (0, ½R],

(1.6) inf_{p ∈ P_k} ‖u − p‖_{L²(B_r)} ≤ C (r/R)^{k+1} inf_{p ∈ P_k} ‖u − p‖_{L²(B_R)}.

Moreover, the infimum on the left side may be taken over the set of harmonic polynomials of degree at most k.
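For orientation, here is the standard two-line proof of an estimate of the form (1.6) for harmonic functions, combining Taylor's theorem with the interior derivative estimates (a sketch of well-known material, not an excerpt from the paper):

```latex
% For u harmonic in B_R and 0 < r <= R/4, Taylor expansion at the origin gives
\inf_{p \in \mathcal{P}_k} \| u - p \|_{L^2(B_r)}
   \le C r^{k+1} \sup_{B_{R/4}} \big| \nabla^{k+1} u \big|
   \le C r^{k+1} R^{-(k+1)} \| u \|_{L^2(B_R)} ,
% where the second inequality is the interior derivative estimate for
% harmonic functions. Applying this with u replaced by u - p for a harmonic
% polynomial p of degree at most k (note \nabla^{k+1} p = 0, and u - p is
% still harmonic) and optimizing over p yields an estimate of the form (1.6).
```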
As we cannot expect a solution of (1.1) to have regularity beyond C^{0,1} in a classical (pointwise) sense, in order to make sense of a C^{k,1} estimate for the heterogeneous equation (1.3) we need to replace the set of polynomials by a heterogeneous analogue. The classical Liouville theorem says that the set of harmonic functions which grow like o(|x|^{k+1}) is precisely the set of harmonic polynomials of degree at most k. This suggests that we should use the (random) vector space

A_k := { u ∈ H¹_loc(R^d) : −∇ ⋅ a∇u = 0, limsup_{r→∞} r^{−(k+1)} ‖u‖_{L²(B_r)} = 0 }.
We think of these as "a(x)-harmonic polynomials." It turns out that one can prove that, P-almost surely, this set is finite dimensional and has the same dimension as the set of harmonic polynomials of degree at most k. In fact, one can match any a(x)-harmonic polynomial to an ā-harmonic polynomial in the highest degree, and vice versa. In close analogy to (1.6), the statement of large-scale C^{k,1} regularity is then as follows: there exists a minimal scale X satisfying (1.5) such that, for any R > 2X and any solution u ∈ H¹(B_R) of −∇ ⋅ a∇u = 0, we can find φ ∈ A_k such that, for every r ∈ [X, ½R],

(1.7) ‖u − φ‖_{L²(B_r)} ≤ C (r/R)^{k+1} ‖u‖_{L²(B_R)}.

See [4, Theorem 3.8] for the full statement, which was first proved in the periodic setting by Avellaneda and Lin [8]. Subsequent versions of this result, which are based on the ideas of [7,8] in their more quantitative formulation given in [6], were proved in various works [17,15,2], with the full statement here given in [3,10]. In all of its various forms, higher regularity in stochastic homogenization is based on the simple idea that solutions of the heterogeneous equation should be close to those of the homogenized equation, which has much better regularity. In the case of the linear equation (1.3), it is not a large leap, technically or philosophically, to go from (1.4) to (1.7). Indeed, to gain control over higher derivatives, one just needs to differentiate the equation (not in the microscopic parameters, of course, but in macroscopic ones) and, luckily, since the equation is linear, this does not change the equation. Roughly speaking, the idea is analogous to bootstrapping the regularity of a constant-coefficient, linear equation by differentiating it. Therefore the estimate (1.7) is perhaps not too surprising once one has the large-scale C^{0,1} estimate in hand.

1.3. Summary of the results proved in this paper. The situation is very different in the nonlinear case.
When one differentiates the equation (again, in a macroscopic parameter), we get a new equation, namely the first-order linearized equation. If we want to apply a large-scale regularity result to this equation, we must first (quantitatively) homogenize it! Achieving higher-order regularity estimates requires repeatedly differentiating the equation, which leads to a hierarchy of linearized equations, each requiring homogenization estimates.
Let us be a bit more explicit. The gradient of the homogenized Lagrangian L̄ is given by the well-known formula

(1.8) D_pL̄(p) = E[ ∫_{[0,1]^d} D_pL(p + ∇φ_p(x), x) dx ] = lim_{r→∞} ⨏_{B_r} D_pL(p + ∇φ_p(x), x) dx,

where φ_p is the first-order corrector with slope p ∈ R^d, that is, it satisfies

−∇ ⋅ D_pL(p + ∇φ_p, x) = 0 in R^d,  with ∇φ_p Z^d-stationary and E[ ∫_{[0,1]^d} ∇φ_p(x) dx ] = 0.

The limit in the second line of (1.8) is to be understood in a P-almost sure sense; it is a consequence of the ergodic theorem, which states that macroscopic averages of stationary fields must converge to their expectations. The formula (1.8) says that D_pL̄(p) is the flux per unit volume of the first-order corrector with slope p ∈ R^d. It arises naturally when we homogenize the nonlinear equation. We can try to show that L̄ ∈ C² by formally differentiating (1.8), which leads to the expression

∂_{p_i} D_pL̄(p) = E[ ∫_{[0,1]^d} D²_pL(p + ∇φ_p(x), x) ( e_i + ∂_{p_i}∇φ_p(x) ) dx ].

If we define the linearized coefficients around p + φ_p by a_p(x) := D²_pL(p + ∇φ_p(x), x) and put ψ^{(1)}_{p,e_i} := ∂_{p_i}φ_p, then we see that ψ^{(1)}_{p,e_i} is the first-order corrector with slope e_i of the linear equation with coefficients a_p:

−∇ ⋅ a_p( e_i + ∇ψ^{(1)}_{p,e_i} ) = 0 in R^d,  with ∇ψ^{(1)}_{p,e_i} Z^d-stationary and E[ ∫_{[0,1]^d} ∇ψ^{(1)}_{p,e_i}(x) dx ] = 0.
We call ψ^{(1)}_{p,e} a first-order linearized corrector. Moreover, we have the formula

D²_pL̄(p) e_i = E[ ∫_{[0,1]^d} a_p(x) ( e_i + ∇ψ^{(1)}_{p,e_i}(x) ) dx ] = ā_p e_i.
That is, "linearization and homogenization commute": the Hessian of L̄ at p should be equal to the homogenized coefficient ā_p corresponding to the linear equation with coefficient field a_p = D²_pL(p + ∇φ_p(⋅), ⋅). This reasoning is only formal, but since the right side of the above formula for D²_pL̄ is well-defined (and requires only qualitative homogenization), we should expect it to be rather easy to confirm rigorously. Moreover, while quantitative homogenization of the original nonlinear equation gives us a C^{0,1} estimate, we should expect quantitative homogenization of the linearized equation to give us a C^{0,1} estimate for differences of solutions and a large-scale C^{1,1} estimate for solutions. This is indeed the case, and it was proved in our previous paper [1], where we were motivated by the goal of obtaining this regularity estimate for differences of solutions in anticipation of its important role in the proof of optimal quantitative homogenization estimates. Indeed, in the very recent preprint [14], Fischer and Neukamm showed that this estimate can be combined with spectral gap-type assumptions on the probability measure to obtain quantitative bounds on the first-order correctors which are optimal in the scaling of the error.
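In one dimension this commutation can be verified directly by a numerical toy computation (our own illustration with a hypothetical Lagrangian, not part of the paper's analysis). For L(q, x) = a(x)(q²/2 + q⁴/4), the corrector equation forces the flux D_pL(p + φ′_p(x), x) to equal a constant c = L̄′(p), and in one dimension the homogenized coefficient of the linearized equation is the harmonic mean of a_p(x) = D²_pL(p + φ′_p(x), x). The sketch below checks that a finite-difference approximation of L̄″(p) agrees with that harmonic mean:

```python
import numpy as np

# Hypothetical 1D Lagrangian L(q, x) = a(x) * (q^2/2 + q^4/4),
# a(x) = 2 + sin(2*pi*x), so D_pL = a*(q + q^3) and D_p^2 L = a*(1 + 3q^2).
N = 4096
x = (np.arange(N) + 0.5) / N
a = 2.0 + np.sin(2 * np.pi * x)

def corrector_slope(p):
    """Find the constant flux c and slope field q(x) = p + phi_p'(x) solving
    a(x)*(q + q^3) = c with mean slope p; then c = Lbar'(p)."""
    def slopes(c):
        q = np.full_like(x, np.cbrt(c / 2.0))   # pointwise Newton solve
        for _ in range(40):
            q -= (a * (q + q**3) - c) / (a * (1.0 + 3.0 * q**2))
        return q
    lo, hi = -100.0, 100.0                      # bisection for the flux c
    for _ in range(60):
        c = 0.5 * (lo + hi)
        if slopes(c).mean() < p:
            lo = c
        else:
            hi = c
    return slopes(c), c

p, h = 1.0, 1e-3
d2Lbar_fd = (corrector_slope(p + h)[1] - corrector_slope(p - h)[1]) / (2 * h)

q, _ = corrector_slope(p)
a_p = a * (1.0 + 3.0 * q**2)            # linearized coefficient field a_p(x)
a_p_hom = 1.0 / np.mean(1.0 / a_p)      # its 1D homogenization: harmonic mean
print(d2Lbar_fd, a_p_hom)               # the two values agree
```

In higher dimensions no such closed form is available, and the point of the paper is precisely that the analogous identity must be proved together with regularity and homogenization estimates for the linearized equations.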
We may attempt to differentiate the formula for the homogenized Lagrangian a second time, with the ambition of obtaining a C³ estimate for L̄, a C^{2,1} estimate for solutions and a higher-order improvement of our C^{0,1} estimate for differences (which will be a C^{0,1} estimate for linearization errors): we get

D³_pL̄(p) e_i e_j = E[ ∫_{[0,1]^d} ( a_p(x) ∇ψ^{(2)}_{p,e_i,e_j}(x) + D³_pL(p + ∇φ_p(x), x) ( e_i + ∇ψ^{(1)}_{p,e_i} )( e_j + ∇ψ^{(1)}_{p,e_j} ) ) dx ].
If we define the vector field

F_{2,p,e_i,e_j}(x) := D³_pL(p + ∇φ_p(x), x) ( e_i + ∇ψ^{(1)}_{p,e_i} )( e_j + ∇ψ^{(1)}_{p,e_j} ),

then we see that ψ^{(2)}_{p,e_i,e_j} is the first-order corrector with slope zero of the linear equation

(1.9) −∇ ⋅ a_p ∇ψ^{(2)}_{p,e_i,e_j} = ∇ ⋅ F_{2,p,e_i,e_j} in R^d,

and the formula (1.10) for the tensor D³_pL̄ identifies it as the corresponding homogenized coefficient. Unlike the case of the Hessian of L̄, we should not expect this formula to be valid under qualitative ergodic assumptions! Indeed, the qualitative homogenization of (1.9), and thus the validity of (1.10), requires that F_{2,p,e_i,e_j} belong to L², in the sense that E ∫_{[0,1]^d} |F_{2,p,e_i,e_j}(x)|² dx < ∞, whereas, due to the product of two first-order correctors, we only have L¹-type integrability for F_{2,p,e_i,e_j}. This is a serious problem, which can only be fixed using the large-scale regularity theory for the first-order linearized equation together with a suitable bound on the minimal scale X, thereby obtaining a bound in L^∞(◻_0), and hence L⁴(◻_0), for ∇ψ^{(1)}_{p,e_i}, with at least a fourth moment in expectation. Note that this also requires some regularity of the Lagrangian L on the smallest scale.
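The reason F_{2,p,e_i,e_j} is a priori only in L¹ is elementary: the basic energy estimates give ∇ψ^{(1)} square-integrability, and, using that D³_pL is bounded by assumption (L1), the Cauchy-Schwarz inequality then controls the quadratic expression only in L¹:

```latex
\mathbb{E}\!\int_{[0,1]^d} \big| F_{2,p,e_i,e_j} \big| \, dx
  \le C \, \mathbb{E}\!\int_{[0,1]^d}
        \big( 1 + \big|\nabla \psi^{(1)}_{p,e_i}\big| \big)
        \big( 1 + \big|\nabla \psi^{(1)}_{p,e_j}\big| \big) \, dx
  \le C \prod_{l \in \{i, j\}}
        \Big( \mathbb{E}\!\int_{[0,1]^d}
          \big( 1 + \big|\nabla \psi^{(1)}_{p,e_l}\big|^2 \big) \, dx \Big)^{1/2}
  < \infty .
% Squaring F_2 would instead require fourth moments of the gradients,
% which is exactly what the large-scale regularity theory provides.
```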
If we differentiate the equation once more, in an effort to prove that L̄ ∈ C⁴, we will be faced with similar difficulties, this time with the more complicated vector field

F_{3,p,e_i,e_j,e_k}(x) := D³_pL(p + ∇φ_p(x), x) ( e_i + ∇ψ^{(1)}_{p,e_i} ) ∇ψ^{(2)}_{p,e_j,e_k} + D³_pL(p + ∇φ_p(x), x) ( e_j + ∇ψ^{(1)}_{p,e_j} ) ∇ψ^{(2)}_{p,e_i,e_k} + D³_pL(p + ∇φ_p(x), x) ( e_k + ∇ψ^{(1)}_{p,e_k} ) ∇ψ^{(2)}_{p,e_i,e_j} + D⁴_pL(p + ∇φ_p(x), x) ( e_i + ∇ψ^{(1)}_{p,e_i} )( e_j + ∇ψ^{(1)}_{p,e_j} )( e_k + ∇ψ^{(1)}_{p,e_k} ).

The last term has three factors of first-order linearized correctors instead of two, and is thus "even further" from being obviously L² than was F_{2,p,e_i,e_j}. Homogenizing the third-order linearized equation will therefore require large-scale regularity estimates for both the first-order and second-order linearized equations, and one can see that the situation only gets worse as the order increases beyond three. Moreover, proving quantitative homogenization for these equations will also require some smoothness of the homogenized coefficients associated to the lower-order equations, due to the need for the homogenized solutions to be smooth in quantitative two-scale expansion arguments.
This suggests a bootstrap argument for progressively and simultaneously obtaining (i) the smoothness of L̄; (ii) the higher-order large-scale regularity of solutions (and of solutions of the linearized equations); and (iii) the homogenization of the higher-order linearized equations and the commutation of homogenization and higher-order linearization. The point of this paper is to formalize this idea and thereby give a proof of "Hilbert's 19th problem for homogenization." Here is a rough schematic of the argument, which comes in three distinct steps, discussed in more detail below:
• Homogenization & large-scale C^{0,1} regularity for the linearized equations up to order N ⇒ L̄ ∈ C^{2+N}.
• L̄ ∈ C^{2+N} and large-scale C^{0,1} regularity for the linearized equations up to order N ⇒ homogenization for the linearized equations up to order N + 1.
• L̄ ∈ C^{2+N}, large-scale C^{0,1} regularity for the linearized equations up to order N and homogenization for the linearized equations up to order N + 1 ⇒ large-scale C^{0,1} regularity for the linearized equations up to order N + 1.
The three implications above are the focus of most of the paper; their proofs are given in Sections 2-5. Once this induction argument is completed, we consequently obtain a full C^{k,1}-type large-scale regularity estimate for solutions of the original nonlinear equation, generalizing (1.7). The main question becomes what the replacement for A_k should be, that is, what the "polynomials" should be. We show that these are certain solutions of the system of linearized equations (linearized around a first-order corrector) exhibiting polynomial-type growth, which we classify by providing a Liouville-type result which is part of the statement of the theorem (see the discussion between the statements of Theorems 1.5 and 1.6 below for a definition of these spaces, which are denoted by W^p_n).
The resulting theorem we obtain, which is a version of the statement of Hilbert's 19th problem in the context of homogenization, provides a very precise description of the solutions of (1.1) in terms of the first-order correctors and the correctors of a hierarchy of linear equations.
We also obtain, as a corollary, an improvement of the scaling of linearization errors, which is closely related to the regularity of solutions. To motivate this result, suppose we are given two solutions u, v ∈ H¹(B_R) of (1.1) in a large ball (R ≫ 1) which are close to each other in the sense that ‖∇(u − v)‖_{L²(B_R)} ≤ 1. Suppose that we attempt to approximate the difference u − v by the solution w ∈ H¹(B_{R/2}) of the first-order linearized equation

−∇ ⋅ ( D²_pL(∇u, x)∇w ) = 0 in B_{R/2},  w = u − v on ∂B_{R/2}.

Then we may ask how small we should expect the first-order linearization error u − v − w to be. The best answer we have from deterministic elliptic regularity estimates is that there exists a small exponent δ(d, Λ) > 0 such that

(1.12) ‖∇(u − v − w)‖_{L²(B_{R/2})} ≤ C ‖∇(u − v)‖_{L²(B_R)}^{1+δ}.

This can be proved easily, for instance, using the Meyers gradient L^{2+δ} estimate, and it is sharp in the sense that it is not possible to do better than the very small exponent δ. We can say roughly that the space of solutions of (1.1) is a C^{1,δ} manifold, but no better. Of course, if L does not depend on x, or if R is of order one and L is smooth in both variables (p, x), then one expects to have the estimate above with δ = 1 and to be able to prove more precise estimates using higher-order linearized equations. In fact, this is essentially a reformulation of the statement of Hilbert's 19th problem (indeed, see Appendix E, where we give a proof of Hilbert's 19th problem in its classical formulation by following this line of reasoning).
In this paper we also prove a large-scale version of the quadratic response to first-order linearization in the context of homogenization, which states that (1.12) holds with δ = 1 whenever R is larger than a random minimal scale. Moreover, we prove a full slate of higher-order versions of this result: see Corollary 1.4. These results roughly assert that, with probability one, the large-scale structure of solutions of (1.1) resembles that of a smooth manifold.
In the following two subsections, we state our assumptions and give the precise statements of the results discussed above.
1.4. Assumptions and notation. In this subsection, we state the standing assumptions in force throughout the paper.
We fix the following global parameters: the dimension d ∈ N with d ≥ 2, a constant Λ ∈ [1, ∞) measuring the ellipticity, an integer N ∈ N with N ≥ 1 measuring the smoothness of the Lagrangian, and constants M_0, K_0 ∈ [1, ∞). For short, we denote

data := (d, Λ, N, M_0, K_0).

This allows us, for instance, to denote constants C which depend on (d, Λ, N, M_0, K_0) simply by C(data). The probability space is the set Ω of all Lagrangians L : R^d × R^d → R, written as a function of (p, x) ∈ R^d × R^d, satisfying the following conditions:
(L1) L is 2+N times differentiable in the variable p and, for every k ∈ {2, . . . , 2+N}, the function D^k_pL is uniformly Lipschitz in both variables and satisfies the bound (1.13). For k = 1, we assume a corresponding growth condition for z ∈ R^d.
(L2) L is uniformly convex in the variable p: for every p ∈ R^d and x ∈ R^d,

I_d ≤ D²_pL(p, x) ≤ Λ I_d.

We define Ω to be the set of all such Lagrangians L. Note that Ω depends on the fixed parameters (d, Λ, N, M_0, K_0). It is endowed with the following family of σ-algebras: for each Borel subset U ⊆ R^d, we define F(U) to be the σ-algebra generated by the family of random variables L ↦ L(p, x), for p ∈ R^d and x ∈ U. The largest of these is denoted by F := F(R^d).
We assume that the law of the "canonical Lagrangian" L is a probability measure P on (Ω, F) satisfying the following two assumptions: (P1) P has a unit range of dependence: for all Borel subsets U, V ⊆ R d such that dist(U, V ) ≥ 1, F(U ) and F(V ) are P-independent.
(P2) P is stationary with respect to Z^d-translations: for every z ∈ Z^d and E ∈ F,

P[T_z E] = P[E],

where T_z denotes the action on Ω of translation by z. The expectation with respect to P is denoted by E.
Since we will often be concerned with measuring stretched exponential moments of the random variables we encounter, the following notation is convenient: for every σ ∈ (0, ∞), θ > 0, and random variable X on Ω, we write

X ≤ O_σ(θ)  ⟺  E[ exp( (X_+/θ)^σ ) ] ≤ 2.

This is essentially notation for an Orlicz norm on (Ω, P). Some basic properties of this notation are given in [4, Appendix A].
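For example, this normalization immediately converts moment bounds into stretched exponential tail bounds, by Markov's inequality:

```latex
X \le \mathcal{O}_\sigma(\theta)
\quad \Longrightarrow \quad
\mathbb{P}\left[ X \ge t\theta \right]
  \le e^{-t^\sigma} \, \mathbb{E}\!\left[ \exp\!\big( (X_+/\theta)^\sigma \big) \right]
  \le 2 e^{-t^\sigma}
\qquad \text{for every } t > 0 .
```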
1.5. Statement of the main results. We begin by introducing the higher-order linearized equations. These can be computed by hand, as we did for the second and third linearized equations in Subsection 1.3, but it is convenient to work with more compact formulas. Observe that, by Taylor's formula with remainder, we have an expansion of D_pL valid for every n ∈ {0, . . . , N + 1}. Define, for p, x, h_1, . . . , h_N ∈ R^d and t ∈ R, the corresponding remainder, and also define, for each m ∈ {1, . . . , N + 1} and p, x, h_1, . . . , h_{m−1} ∈ R^d, the vector fields F_m appearing on the right sides of the linearized equations. Observe that F_1 ≡ 0 by definition (that is, the right side of the first linearized equation is zero, as we have already seen).
Our first main result concerns the regularity of the effective Lagrangian L̄ and states that it has essentially the same regularity in p as we assumed for L.
In the next theorem, we present a statement concerning the commutability of homogenization and higher-order linearizations. It generalizes [1, Theorem 1.1], which proved the result in the case N = 1. There exist σ(data) > 0, α({U_m}, δ, data) > 0, C({U_m}, M, δ, data) < ∞, and a random variable X satisfying X ≤ O_σ(C) such that the following statement is valid. Let ε ∈ (0, 1] and f ∈ W^{1,2+δ}(U_0) be such that ‖f‖_{W^{1,2+δ}(U_0)} ≤ M and, for each m ∈ {1, . . . , n + 1}, fix g_m ∈ W^{1,2+δ}(U_m), and let u^ε ∈ H¹(U_1) and the functions w^ε_1 ∈ H¹(U_1), w^ε_2 ∈ H¹(U_2), . . . , w^ε_{n+1} ∈ H¹(U_{n+1}) satisfy, for every m ∈ {1, . . . , n + 1}, the Dirichlet problems (1.19). Finally, let u ∈ H¹(U_1) and, for every m ∈ {1, . . . , n + 1}, the function w_m ∈ H¹(U_m) satisfy the homogenized problems. Then, for every m ∈ {1, . . . , n + 1}, we have an estimate comparing w^ε_m to w_m, with rate ε^α. Observe that, due to the assumed regularity of L in the spatial variable and the Schauder estimates, the vector fields F_m(∇u^ε, ∇w^ε_1, . . . , ∇w^ε_{m−1}, x/ε) on the right sides of the equations for w^ε_m in (1.19) belong to L^∞(U_m). In particular, they belong to L²(U_m), and therefore the Dirichlet problems in (1.19) are well-posed in the sense that the solutions w^ε_m belong to H¹(U_m). Of course, the regularity given by the application of the Schauder estimate depends on ε (indeed, the constants blow up like a large power of ε^{−1}), and therefore this remark is not very useful as a quantitative statement. To prove the homogenization result for the mth linearized equation, we will need much better bounds on these vector fields, which amounts to better regularity of the solutions ∇u, ∇w_1, . . . , ∇w_{m−1}. This is the reason we must prove Theorems 1.1 and 1.2 at the same time as the following result on the large-scale regularity of solutions of the linearized equations and of the linearization errors, in an induction on the order m of the linearized equation and the degree of regularity of D²_pL̄.
Its statement is a generalization of the large-scale C^{0,1} estimates for the linearized equation and for differences of solutions proved in [1].
As mentioned above, throughout the paper we use the following notation for volume-normalized L^p norms: for each p ∈ [1, ∞), U ⊆ R^d with |U| < ∞ and f ∈ L^p(U),

‖f‖_{L^p(U)} := ( ⨏_U |f|^p )^{1/p}.

Let n ∈ {0, . . . , N}, q ∈ [2, ∞), and M ∈ [1, ∞). Then there exist σ(q, data) > 0, a constant C(q, M, data) < ∞ and a random variable X satisfying X ≤ O_σ(C) such that the following statement is valid. Suppose that R ∈ [2X, ∞) and that u, v, w_1, . . . , w_{n+1} ∈ H¹(B_R) satisfy, for every m ∈ {1, . . . , n + 1},

−∇ ⋅ ( D²_pL(∇u, x)∇w_m ) = ∇ ⋅ ( F_m(∇u, ∇w_1, . . . , ∇w_{m−1}, x) ) in B_R.

Define, for m ∈ {0, . . . , n}, the mth-order linearization error ξ_m ∈ H¹(B_R) as the difference between v − u and the linearization approximation built from w_1, . . . , w_{m+1}. Then we have, for every r ∈ [X, ½R] and m ∈ {0, . . . , n}, the estimates (1.22) and (1.23). The main interest in the above theorem is the case of the exponent q = 2. However, we must consider arbitrarily large exponents q ∈ [2, ∞) in order for the induction argument to work. In particular, in order to show that Theorem 1.3 for some n implies Theorem 1.2 for n + 1, we need to consider potentially very large q (depending on n).
As mentioned above, Theorems 1.1, 1.2 and 1.3 are proved together in an induction argument. Each of the theorems has already been proved in the case N = 0 and q = 2 in our previous paper [1]. The integrability in Theorem 1.3 is upgraded to q ∈ (2, ∞) in Propositions 4.7 and 5.1 below for w_1 and ξ_0, respectively. These serve as the base case of the induction. The main induction step consists of the following three implications:
• Regularity of L̄ (Section 2). We show that if Theorems 1.2 and 1.3 are valid up to order n, then L̄ gains one more degree of smoothness. This allows us to apply homogenization estimates from [4].
• Large-scale C^{0,1} regularity of linearized solutions and linearization errors (Sections 4 and 5). We argue, for n ∈ {0, . . . , N − 1}, that if Theorems 1.1 and 1.2 are valid for n + 1 and Theorem 1.3 is valid for n, then we may conclude that Theorem 1.3 is also valid for n + 1. Here we use the method introduced in [6] of applying a quantitative excess decay iteration, based on the "harmonic" approximation provided by the quantitative homogenization statement. This estimate controls the regularity of the w_m's on "large" scales (i.e., larger than a multiple of the microscopic scale). To obtain L^q-type integrability for ∇w_m, it is also necessary to control the small scales, and for this we apply deterministic Calderón-Zygmund-type estimates (this is our reason for assuming that L possesses some small-scale spatial regularity). The estimates for the linearization errors ξ_{m−1} are then obtained as a consequence by comparing them to w_m.
From a high-level point of view, the induction argument summarized above resembles the resolution of Hilbert's 19th problem on the regularity of minimizers of integral functionals with uniformly convex and smooth integrands. The previous three theorems allow us to prove the next two results, which can be considered as resolutions of Hilbert's 19th problem in the context of homogenization.
The first is the following result on the precision of the higher-order linearization approximations, which matches the one available in the constant-coefficient case, as discussed near the end of Subsection 1.3. There exist constants σ(data) ∈ (0, ½], C({U_m}, M, data) < ∞ and a random variable X satisfying X ≤ O_σ(C) such that the following statement is valid. Let r ∈ [X, ∞), n ∈ {1, . . . , N} and u, v ∈ H¹(rU_0) satisfy

−∇ ⋅ (D_pL(∇u, x)) = 0 and −∇ ⋅ (D_pL(∇v, x)) = 0 in rU_0.

Then, for every m ∈ {1, . . . , n}, the mth-order linearization error satisfies an estimate analogous to (1.12) with δ = 1, with the mth-order linearization approximation in place of the first-order one. Corollary 1.4 is an easy consequence of the previous theorems stated above. Its proof is presented in Section 6.
The analysis of the linearized equations presented in the theorems above allows us to develop a higher regularity theory for solutions of the nonlinear equation on large scales, in analogy to the role of the Schauder theory in the resolution of Hilbert's 19th problem on the regularity of solutions of nonlinear equations with smooth (or constant) coefficients. This result generalizes the large-scale $C^{1,1}$-type estimate proved in our previous paper [1] to higher-order regularity, as well as the result in the linear case [4, Theorem 3.6].
Before giving the statement of this result, we introduce some additional notation and provide some motivational discussion. Given a domain $U \subseteq \mathbb{R}^d$, we define $\mathcal{L}(U)$ to be the set of solutions of the nonlinear equation in the domain $U$, which we note is a stochastic object. We next define $\mathcal{L}_1$ to be the set of global solutions of the nonlinear equation which exhibit at most linear growth at infinity. For each $p \in \mathbb{R}^d$, we denote by $\ell_p$ the affine function $\ell_p(x) := p \cdot x$. Observe that if the difference of two elements of $\mathcal{L}_1$ has strictly sublinear growth at infinity, then it must be constant, by the $C^{0,1}$-type estimate for differences (the estimate (1.23) with m = 0). Therefore the following theorem, which was proved in [1], gives a complete classification of $\mathcal{L}_1$. Fix $\sigma \in (0, d)$ and $M \in [1, \infty)$. There exist $\delta(\sigma, d, \Lambda) \in (0, \frac{1}{2}]$, $C(M, \sigma, \mathrm{data}) < \infty$ and a random variable $\mathcal{X}_\sigma$ such that the following statements are valid.
(i) For every $u \in \mathcal{L}_1$ satisfying $\limsup_{r\to\infty} \frac{1}{r}\,\| u - (u)_{B_r} \|_{L^2(B_r)} \le M$, there exists an affine function such that the approximation estimate holds for every $R \ge \mathcal{X}_\sigma$. (ii) For every $p \in B_M$, there exists $u \in \mathcal{L}_1$ satisfying the corresponding estimate for every $R \ge \mathcal{X}_\sigma$. (iii) For every $R \ge \mathcal{X}_\sigma$ and $u \in \mathcal{L}(B_R)$ satisfying $\frac{1}{R}\,\| u - (u)_{B_R} \|_{L^2(B_R)} \le M$, there exists $\phi \in \mathcal{L}_1$ such that the corresponding estimate holds for every $r \in [\mathcal{X}_\sigma, R]$. Statements (i) and (ii) of the above theorem, which give the characterization of $\mathcal{L}_1$, can be considered as a first-order Liouville-type theorem. In the case of deterministic, periodic coefficient fields, this result was proved by Moser and Struwe [22], who generalized the result of Avellaneda and Lin [8] in the linear case. Part (iii) of the theorem is a quantitative version of this Liouville-type result, which we call a "large-scale $C^{1,1}$ estimate" since it states that, on large scales, an arbitrary solution of the nonlinear equation can be approximated by an element of $\mathcal{L}_1$ with the same precision as harmonic functions can be approximated by affine functions. It can be compared to similar statements in the linear case (see for instance [4, 17]).
In this paper we prove a higher-order version of Theorem 1.5. We will show that, just as a harmonic function can be approximated locally by harmonic polynomials, an arbitrary element of $\mathcal{L}(B_R)$ can be approximated by elements of a random set of functions which are the natural analogue of harmonic polynomials. In order to state this result, we must first define this space of functions.
Let us first discuss the constant-coefficient case. If L is a smooth Lagrangian, we know from the resolution of Hilbert's 19th problem that solutions of $-\nabla\cdot D_pL(\nabla u) = 0$ are smooth and thus may be approximated by a Taylor expansion at each point. One may then ask: can we characterize the possible Taylor polynomials? In Appendix E we provide such a characterization in terms of the linearized equations. The quadratic part is an $a_p := D^2L(p)$-harmonic polynomial, and the higher-order polynomials satisfy the linearized equations, involving the $\mathbf{F}_m$'s as right-hand sides. More precisely, for each $p \in \mathbb{R}^d$ and $n \in \mathbb{N}$, we define the corresponding set $W^{p,\mathrm{hom}}_n$ of tuples of polynomials. It is not too hard to show that $W^{p,\mathrm{hom}}_n \subseteq \mathcal{P}^{\mathrm{hom}}_2 \times \cdots \times \mathcal{P}^{\mathrm{hom}}_{n+1}$, where $\mathcal{P}^{\mathrm{hom}}_j$ stands for the homogeneous polynomials of degree $j$. Indeed, we see, by Liouville's theorem, that $w_1$ is an $a_p$-harmonic polynomial of degree two. More importantly, according to Appendix E, if $u$ solves $-\nabla\cdot D_pL(\nabla u) = 0$ in a neighborhood of the origin, and we set $p = \nabla u(0)$ and $w_m(x) := \frac{1}{(m+1)!}\,\nabla^{m+1}u(0)\, x^{\otimes(m+1)}$, then $(w_1, \ldots, w_n) \in W^{p,\mathrm{hom}}_n$.
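The following sketch (our own rendering; the rigorous statement is the content of Appendix E) shows how the hierarchy of linearized equations arises by differentiating a smooth one-parameter family of solutions:

```latex
% Sketch: if $t \mapsto u_t$ is a smooth family of solutions of
% $-\nabla \cdot D_p L(\nabla u_t) = 0$ with $u_0 = u$, set
% $w_m := \frac{1}{m!} \partial_t^m u_t \big|_{t=0}$.
% Differentiating the equation once and twice in $t$ at $t = 0$ gives
\begin{align*}
  -\nabla \cdot \big( D^2 L(\nabla u)\, \nabla w_1 \big) &= 0, \\
  -\nabla \cdot \Big( D^2 L(\nabla u)\, \nabla w_2
      + \tfrac{1}{2}\, D^3 L(\nabla u)\, \nabla w_1 \otimes \nabla w_1 \Big) &= 0.
\end{align*}
% In general, the $m$th equation is linear in $w_m$, with a lower-order term
% depending polynomially on $\nabla w_1, \dots, \nabla w_{m-1}$.
```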
In particular, $w_m$ is the sum of a special solution of $\nabla\cdot\big(a_p\nabla w_m + \mathbf{F}_m(p, \nabla w_1, \ldots, \nabla w_{m-1})\big) = 0$ in $\mathcal{P}^{\mathrm{hom}}_{m+1}$ and an $a_p$-harmonic polynomial in $\mathcal{P}^{\mathrm{hom}}_{m+1}$. For our purposes it is convenient to relax the growth condition at the origin and define simply
$$ W^p_n := \big\{ (w_1, \ldots, w_n) \in \mathcal{P}_2 \times \cdots \times \mathcal{P}_{n+1} \,:\, \text{for } m \in \{1, \ldots, n\},\ w_m \text{ solves the } m\text{th linearized equation} \big\}. $$
With this definition, we lose the homogeneity of the polynomials. Following the approach in [4, Chapter 3], it is natural to define heterogeneous versions $\mathcal{W}^p_n$ of these spaces, that is, the tuples of heterogeneous solutions with prescribed growth. Here $\ell_p + \phi_p$ is the unique element of $\mathcal{L}_1$ (up to additive constants) with slope $p$; in other words, $\phi_p$ is the first-order corrector with slope $p$: see Lemma 2.10. The next theorem gives a higher-order Liouville-type result which classifies the spaces $\mathcal{W}^p_n$ and states that they may be used to approximate any solution of the nonlinear equation with the precision of a $C^{n,1}$ estimate.

Theorem 1.6 (Large-scale regularity). Fix $n \in \{1, \ldots, N\}$ and $M \in [1, \infty)$. There exist constants $\sigma(n, M, \mathrm{data}), \delta(n, \mathrm{data}) \in (0, \frac{1}{2}]$ and a random variable $\mathcal{X}$ such that the following statements hold:
(i)$_n$ There exists a constant $C(n, M, \mathrm{data}) < \infty$ such that, for every $p \in B_M$ and $(w_1, \ldots, w_n) \in W^p_n$, there exists $(\tilde{w}_1, \ldots, \tilde{w}_n) \in \mathcal{W}^p_n$ such that the estimate (1.30) holds for every $R \ge \mathcal{X}$ and $m \in \{1, \ldots, n\}$.
(ii)$_n$ For every $p \in B_M$ and $(\tilde{w}_1, \ldots, \tilde{w}_n) \in \mathcal{W}^p_n$, there exists $(w_1, \ldots, w_n) \in W^p_n$ satisfying (1.30) for every $R \ge \mathcal{X}$ and $m \in \{1, \ldots, n\}$.
(iii)$_n$ There exists $C(n, M, \mathrm{data}) < \infty$ such that, for every $R \ge \mathcal{X}$ and $v \in \mathcal{L}(B_R)$ satisfying $\|\nabla v\|_{L^2(B_R)} \le M$, there exist $p \in B_C$ and $(\tilde{w}_1, \ldots, \tilde{w}_n) \in \mathcal{W}^p_n$ such that, for every $r \in [\mathcal{X}, \frac{1}{2}R]$ and $k \in \{0, 1, \ldots, n\}$, the stated estimates hold.
The proof of Theorem 1.6 is given in Section 7.
Appendices C–E of this paper contain estimates for constant-coefficient equations which are essentially known but which, to our knowledge, have not appeared in the literature. We also collect some auxiliary estimates and computations in Appendices A and B.

2. Regularity estimates for the effective Lagrangian
In this section, we suppose that $n \in \{0, \ldots, N-1\}$ is such that the induction hypothesis (2.1) holds, namely that Theorems 1.1, 1.2 and 1.3 are valid for $n$. The goal is to prove Theorem 1.1 for $n+1$.
We proceed by constructing the linearized correctors $\psi^{(m)}_{p,h}$ up to $m = n+2$ and relating the correctors of different orders to each other via differentiation in the parameter $p$. We show that these results allow us to improve the regularity of $D^2\bar{L}$ up to $C^{n+1,\beta}$ and obtain the statement of Theorem 1.1 for $n+1$. In particular, this allows us to define the effective coefficient $\bar{\mathbf{F}}_{n+1}$. We also give formulas for the derivatives of $\bar{L}$ and for $\bar{\mathbf{F}}_m$ in terms of the correctors, which allow us to relate them to each other and show that (1.16) holds.
2.1. The first-order correctors and linearized correctors. In this subsection we construct the linearized correctors up to order n + 2.
For each $p \in \mathbb{R}^d$, we define $\phi_p$ to be the first-order corrector of the nonlinear equation, that is, the unique solution (up to additive constants) of the corrector equation. The existence and uniqueness of the first-order corrector $\phi_p$ is classical: it can be obtained from a variational argument (applied to an appropriate function space of stationary functions). Alternatively, it can be shown (following the proof given in [4, Section 3.4]) that the elements of $\mathcal{L}_1$, which was characterized in Theorem 1.5 above (proved already in [1]), have stationary gradients.
We define the coefficient field $a_p(x)$ to be the coefficients of the linearized equation around the solution $x \mapsto p \cdot x + \phi_p(x)$. Given $p, h \in \mathbb{R}^d$, we define the first linearized corrector $\psi^{(1)}_{p,h}$; in other words, $\psi^{(1)}_{p,h}$ is the first-order corrector with slope $h$ for the equation obtained by linearizing around the first-order corrector $x \mapsto p \cdot x + \phi_p(x)$. For $m \in \{2, \ldots, N+2\}$, we define the $m$th linearized corrector to be the unique (modulo additive constants) random field $\psi^{(m)}_{p,h}(x)$ solving the $m$th linearized equation, whose right-hand side depends on $\nabla\psi^{(1)}_{p,h}(x), \ldots, \nabla\psi^{(m-1)}_{p,h}(x)$. In other words, $\psi^{(m)}_{p,h}$ is the corrector with slope zero for the $m$th linearized equation. Notice that this gives us the complete collection of correctors for the latter equation, by linearity. Furthermore, by the linearity of the map $h \mapsto h + \nabla\psi^{(1)}_{p,h}(x)$, it is easy to see from the structure of the equations for the $\psi^{(m)}_{p,h}$'s how they scale under $h \mapsto th$ for $p, h \in \mathbb{R}^d$ and $t \in \mathbb{R}$. We first show that the problem (2.3) for the $m$th linearized corrector is well-posed for $m \in \{2, \ldots, n+1\}$. This is accomplished by checking, inductively, using our hypothesis (2.1) (and in particular the validity of Theorem 1.3 for $m \le n$), that we have appropriate estimates on the vector fields $\mathbf{F}_m(\cdots)$ on the right side. We have to make this argument at the same time that we obtain estimates on the smoothness of the correctors $\psi^{(m)}_{p,h}$ as functions of $p$. In fact, we prove that $\nabla\psi^{(m)}_{p,h}$ is differentiable in $p$. We will also obtain $C^{0,1}$-type bounds on the linearized correctors, which, together with the above, yields good quantitative control on the smoothness of the correctors in $p$.
Before the main statements, let us collect a few preliminary elementary results needed in the proofs. The following lemma is well-known and can be proved by the Lax–Milgram lemma (see for instance [21, Chapter 7]).

Lemma 2.1. Let $a(\cdot)$ be a $\mathbb{Z}^d$-stationary random field valued in the symmetric matrices satisfying the ellipticity bound. Then there exists a unique random potential field $\nabla z$ solving the associated stationary problem.

Next, a central object in our analysis is the quantity $\mathbf{F}_m$ defined in (1.15). Fix $m \in \{2, \ldots, N+1\}$ and $x, p, h_1, \ldots, h_{m-1} \in \mathbb{R}^d$. One can easily read off bounds from the definition, and it then follows, by Young's inequality, that there exists $C(m, \mathrm{data}) < \infty$ such that $\mathbf{F}_m$ is controlled in terms of its arguments. We have, similarly, a bound for differences; using these we get, for $x, p, \tilde{p}$, an estimate for the difference of the corresponding values of $\mathbf{F}_m$, and therefore, by Young's inequality, a bound for all $\delta > 0$. Furthermore, as we will be employing an induction argument in $m$, it is useful to notice that the leading term in $\mathbf{F}_m$ has a simple form. Our first result in this section gives direct consequences of (2.1) for estimates on the first-order correctors and linearized correctors.
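For orientation, here is a hedged reconstruction of the lowest-order instances of $\mathbf{F}_m$, obtained by Taylor-expanding $D_pL$ in its first argument (the authoritative definition is (1.15), which is not reproduced here):

```latex
% Expanding $\xi \mapsto D_p L(p + \xi, x)$ and collecting, at order $m$,
% the terms which do not involve $\xi_m$:
\begin{align*}
  \mathbf{F}_2(x, p, \xi_1)
    &= \tfrac{1}{2}\, D^3_p L(p, x)\, \xi_1 \otimes \xi_1, \\
  \mathbf{F}_3(x, p, \xi_1, \xi_2)
    &= D^3_p L(p, x)\, \xi_1 \otimes \xi_2
     + \tfrac{1}{3!}\, D^4_p L(p, x)\, \xi_1 \otimes \xi_1 \otimes \xi_1.
\end{align*}
% The leading (bilinear) contribution at general order $m$ is
% $D^3_p L(p, x)\, \xi_1 \otimes \xi_{m-1}$.
```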
Since $\nabla\psi^{(m)}_{p,h}$ is a $\mathbb{Z}^d$-stationary random field, we have by the ergodic theorem, after sending $R \to \infty$, that the claimed identity holds almost surely.
To show (2.23), we first claim that the difference quotient solves the equation (2.24); granted the claim, (2.23) follows by Lemma 2.1. To prove the claim, we rewrite the difference quotient and observe that the first three terms on the right are solenoidal by the equations of the $\psi^{(m)}_{p,h}$'s, respectively, and the fourth term on the right is zero by (2.22). We thus obtain (2.24).
Step 2. We estimate the terms on the right in (2.23) separately. In this step we first show a bound valid for $t \in (-1, 1)$, $t \ne 0$. This follows from the triangle inequality, then Hölder's inequality and (2.16). Step 3. We show the bound (2.26), which we obtain by (2.16).
Step 4. We then prove (2.27). Using the decomposition (2.12), we have that, for $n \ge 2$, the relevant map is differentiable in $p$. In particular, by (2.16), for every $q \in [2, \infty)$ there are $\delta(q, n, \mathrm{data}) > 0$ and $C(q, n, M, \mathrm{data}) < \infty$ such that the corresponding bound holds. We have a further estimate by (2.16), and the right-hand side can then be estimated with the aid of (2.25) to obtain (2.27).
Step 5. We have obtained (2.19) for $m = n$. Therefore, we may replace $n$ by $n+1$ in Steps 1–4 above and conclude that (2.20) is valid for $m = n+1$ as well, which then gives (2.19) for $m = n+1$. Using the formula obtained above, it is straightforward to show that (2.20) is valid for $m = n$ with $\beta = 1$. Finally, (2.20) implies that $t \mapsto \nabla\phi_{p+th}$ is $C^{n+2,\beta}$ near $t = 0$, and thus (2.21) is valid by (2.13). The proof is complete.
Suppose that (2.1) holds and $p \in B_M$. Then $p \mapsto p + \nabla\phi_p$ is $(n+2)$ times differentiable with respect to $p$ and, for $q \in [2, \infty)$, there are constants $\delta(q, n, \mathrm{data}) \in (0, \frac{1}{2}]$ and $C(q, n, M, \mathrm{data}) < \infty$ such that, for $m \in \{1, \ldots, n+1\}$, the corresponding bounds hold. The proof of Lemma 2.5 relies on a general polarization principle for multilinear forms, which is formalized in the following lemma.

Lemma 2.6. Let $V$ be a real, finite-dimensional vector space, and let $\Phi: V^n \to \mathbb{R}$ be a multilinear, symmetric form, that is, $\Phi(v_{\sigma(1)}, \ldots, v_{\sigma(n)}) = \Phi(v_1, \ldots, v_n)$ for all $v_1, \ldots, v_n \in V$ and any permutation $\sigma$ of $\{1, \ldots, n\}$. Then
$$ \Phi(v_1, \ldots, v_n) = \frac{1}{n!} \sum_{\emptyset \ne A \subseteq \{1, \ldots, n\}} (-1)^{\,n - |A|}\, \Phi\Big( \sum_{j \in A} v_j, \ldots, \sum_{j \in A} v_j \Big), $$
where the leftmost summation is over all non-empty subsets $A \subseteq \{1, 2, \ldots, n\}$ and $|A|$ is the number of elements of $A$.
Proof. We show the equivalent statement that
$$ \sum_{\emptyset \ne A \subseteq \{1, \ldots, n\}} (-1)^{\,n-|A|}\, \varphi\Big( \sum_{j \in A} v_j \Big) = n!\, \Phi(v_1, \ldots, v_n), \qquad \text{where } \varphi(v) := \Phi(v, \ldots, v) $$
and the sum on the left is over all non-empty subsets $A$ of $\{1, 2, \ldots, n\}$. For this, we begin by expanding each summand $\varphi(\sum_{j \in A} v_j)$ by multilinearity. Using the symmetry of $\Phi$, each resulting term can be written as $\Phi(v_{f(1)}, \ldots, v_{f(n)})$ for a map $f: \{1, \ldots, n\} \to A$. Letting $c_A(f)$ denote the number of ordered $n$-tuples $(j_1, \ldots, j_n)$ of elements of $A$ which can be reordered to form $(f(1), \ldots, f(n))$, it follows that the expression $\sum_{A \subseteq \{1, \ldots, n\}} (-1)^{n-|A|}\varphi(\sum_{j \in A} v_j)$ can be expanded into a sum over maps $f$ weighted by $c_A(f)$. Changing the order of summation gives a sum over maps $f$ in which the inner sum is over all subsets $A$ of $\{1, 2, \ldots, n\}$ which contain $\mathrm{im}\, f$. Each such subset $A$ can be written as the union of $\mathrm{im}\, f$ with a subset of its complement. Finally, it is a well-known combinatorial fact that, for every nonempty finite set $S$,
$$ \sum_{B \subseteq S} (-1)^{|B|} = 0, $$
where the sum on the left is over all subsets $B$ of $S$. Thus, above, only those $f$ with $\mathrm{im}\, f = \{1, \ldots, n\}$ contribute; such $f$ are precisely the permutations, and in this case $\Phi(v_{f(1)}, \ldots, v_{f(n)}) = \Phi(v_1, \ldots, v_n)$, so the left side equals $n!\,\Phi(v_1, \ldots, v_n)$, as was to be shown; this proves the polarization formula.

We next prove Hölder continuity of the $(n+2)$th derivative in $p$. Proof. Fix $M \in [1, \infty)$ and $\beta \in (0, 1)$, and fix $p, p' \in B_M$ and $h \in B_1 \setminus \{0\}$. By Lemma 2.1 and the equations of the correctors, and in view of Lemma 2.6, it is enough to establish the estimate (2.30). Since $\tilde{\mathbf{F}}_{n+2}$ is $C^{0,1}$ in its first argument and polynomial in its last $n$ arguments, we obtain, by homogeneity, the chain rule and Lemma 2.5, a bound in which the leading term is the first one on the right in (2.31). To conclude, we need to estimate this term. To this end, we use the triangle inequality, for any $\beta \in [0, 1]$, followed by Hölder's inequality; by Lemma 2.5 and Hölder's inequality, we can then bound each factor. Consequently, combining the above displays yields (2.30), finishing the proof.
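As an illustration (not part of the paper), the polarization formula of Lemma 2.6 can be checked numerically for a symmetric trilinear form; the helper names below are our own.

```python
import itertools
import math
import random

def make_symmetric_form(a_vectors):
    """Symmetric multilinear form Phi(v_1,...,v_n) built from fixed vectors
    a_1,...,a_n by symmetrizing the product of inner products."""
    n = len(a_vectors)

    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    def Phi(*vs):
        assert len(vs) == n
        total = 0.0
        for sigma in itertools.permutations(range(n)):
            prod = 1.0
            for idx in range(n):
                prod *= dot(a_vectors[idx], vs[sigma[idx]])
            total += prod
        return total / math.factorial(n)

    return Phi

def polarization_rhs(Phi, vs):
    """Right side of the polarization identity:
    sum over non-empty A of (-1)^(n-|A|) * Phi(s_A,...,s_A),
    where s_A = sum_{j in A} v_j."""
    n = len(vs)
    dim = len(vs[0])
    total = 0.0
    for r in range(1, n + 1):
        for A in itertools.combinations(range(n), r):
            s = [sum(vs[j][i] for j in A) for i in range(dim)]
            total += (-1) ** (n - r) * Phi(*([s] * n))
    return total

rng = random.Random(0)
n, dim = 3, 4
a = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
vs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
Phi = make_symmetric_form(a)
lhs = math.factorial(n) * Phi(*vs)      # n! * Phi(v_1,...,v_n)
rhs = polarization_rhs(Phi, vs)
assert abs(lhs - rhs) < 1e-9            # the polarization identity holds
```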

2.2. Smoothness of $\bar{L}$.
It is a well-known and easy consequence of qualitative homogenization that $D\bar{L}$ is given by a formula in terms of the first-order correctors. In [1], we proved that $\bar{L} \in C^2$ and that $D^2\bar{L}$ is given by an analogous formula. We next generalize this result to higher derivatives of $\bar{L}$ by essentially differentiating the previous formula many times and applying the results of the previous subsection. Moreover, we validate the statement of Theorem 1.1 for $n+1$.
There exist $\alpha(\mathrm{data}), \delta(\mathrm{data}) > 0$, $C(M, \mathrm{data}) < \infty$ and a random variable $\mathcal{X}$ satisfying $\mathcal{X} \le O_\delta(C)$ such that the stated estimate holds for every $r \ge \mathcal{X}$, $p \in B_M$, $h \in B_1$ and $m \in \{1, \ldots, n+1\}$.

Proof. We argue by induction, supposing that (2.36) is valid for $m \in \{1, \ldots, k\}$ for some $k \in \{1, \ldots, n\}$; we then show that it continues to hold for $m = k+1$. Since (2.1) is valid, we may take $\alpha$ and $\mathcal{X}$ as in Theorem 1.2 and define $\mathcal{Y}$ accordingly. We relabel $\mathcal{Y}$ to $\mathcal{X}$ and $\frac{\alpha}{2}$ to $\alpha$. We further take $\mathcal{X}$ larger, if necessary, so that [1, Proposition 4.3] is at our disposal. Suppressing both $p$ and $h$ from the notation, we let $\phi_r$ and $\psi^{(m)}_r$, for $m \in \{1, \ldots, n+1\}$, solve the corresponding Dirichlet problems. The differences all homogenize to zero, and we get, by Theorem 1.2, an estimate valid for $r \ge \mathcal{X}$. This and the induction assumption, i.e., that (2.36) is valid for $m \in \{1, \ldots, k\}$, together with Lemma 2.11 below, imply a further bound. Combining the previous two displays yields convergence. Therefore, by compactness, there exists $\tilde{\psi}^{(k+1)}$ such that, for $t \in [\mathcal{X}, r]$, the approximation holds. Proceeding now as in [4, Section 3.4] proves that $\nabla\tilde{\psi}^{(k+1)}$ is $\mathbb{Z}^d$-stationary. Finally, by integration by parts we also obtain that $\mathbb{E}\int_{\square_0} \nabla\tilde{\psi}^{(k+1)} = 0$ and, therefore, since $\tilde{\psi}^{(k+1)}$ solves the same equation as $\psi^{(k+1)}$, by uniqueness we have that $\tilde{\psi}^{(k+1)} = \psi^{(k+1)}$ up to a constant. The proof is then complete by the previous display.
Above we made use of a lemma which roughly states that if two solutions of the systems of linearized equations are close in $L^2$, then their gradients are also close. This lemma will also be applied repeatedly in the following sections.
The estimate (2.38) is just the estimate for $\xi_0$ in Theorem 1.3; it also implies (2.41). By (2.10), then using Hölder's inequality and applying Theorem 1.3 for $n$, we obtain corresponding bounds for any $q \in [2, \infty)$ and $\delta > 0$. We observe that $\zeta := w_m - \tilde{w}_m$ satisfies the equation (2.43). Combining these and using (2.41) and the validity of Theorem 1.3, we get a recursive estimate. Taking $\delta_0$ sufficiently small and putting $\delta_m := 2^{-m}\delta_0$, we get by induction (using Young's inequality and rearranging several sums) the desired bound for every $m \in \{1, \ldots, N\}$. Combining this with (2.42), and after a covering argument, we obtain (2.39) and (2.40).

3. Quantitative homogenization of the linearized equations
In this section, we suppose that $n \in \{0, \ldots, N-1\}$ is such that the hypothesis (3.1) holds, namely that Theorems 1.1, 1.2 and 1.3 are valid for $n$. The goal is to prove that Theorem 1.2 is also valid for $n+1$ in place of $n$; that is, we need to homogenize the $(n+2)$th linearized equation.
In order to prove homogenization for the $(n+2)$th linearized equation, we follow the procedure used in [1] for homogenizing the first linearized equation. We first show, using the induction hypothesis (3.1), that the coefficients $D^2_pL(\nabla u^\varepsilon, \frac{x}{\varepsilon})$ and $\mathbf{F}_{n+1}(\frac{x}{\varepsilon}, \nabla u^\varepsilon, \nabla w^\varepsilon_1, \ldots, \nabla w^\varepsilon_n)$ can be approximated by random fields which are local (they satisfy a finite range of dependence condition) and locally stationary (they are stationary up to dependence on a slowly varying macroscopic variable). This then allows us to apply known quantitative homogenization results for linear elliptic equations, which can be found for instance in [4].
3.1. Stationary, finite range coefficient fields. We proceed in a fashion similar to [1, Section 3.4] by introducing approximating equations which are stationary and localized.
We fix an integer $k \in \mathbb{N}$ which represents the scale on which we localize. We first define a localized analogue of the first-order problem and then recursively define, for each $\Theta = (p, h_1, \ldots, h_{n+1})$, localized versions of the higher-order linearized problems. Finally, we create $3^k\mathbb{Z}^d$-stationary fields by gluing the above functions together: for each $x \in \mathbb{R}^d$, we define the glued field $v^{(k)}_{n+2,\Theta}$. Notice that $v^{(k)}_{n+2,\Theta}$ is $3^k\mathbb{Z}^d$-stationary and has a range of dependence of at most $3^k\sqrt{15+d}$, by construction. The same is also true of the corresponding coefficient fields, since these are local functions of this random field. In the next subsection, we will apply some known quantitative homogenization estimates to the linear equation satisfied by $v^{(k)}_{n+2,\Theta}$. In order to anchor these estimates, we require some deterministic bounds on the vector field $\mathbf{F}^{(k)}_{n+2,\Theta}$ and thus on the gradients of the functions $w^{(k)}_{m,\Theta}$ defined above. These bounds blow up like a large power of $3^k$, but we will eventually choose $3^k$ to be relatively small compared with the macroscopic scale (so these large powers of $3^k$ will be absorbed).
We then obtain a bound for some $Q(M, \mathrm{data}) < \infty$, and, by the Caccioppoli inequality, a corresponding gradient bound. In view of (3.6) and the previous two displays, another application of Proposition A.2 yields, after enlarging $Q$ and $C$, the bound (3.7) for $m = M$. This completes the induction argument and the proof of (3.7). The bound (3.5) follows immediately.
By the assumed validity of Theorem 1.3 for $n$, we also have that, for each $M, q \in [2, \infty)$, there exist $\delta(q, \mathrm{data}) > 0$, $C(M, q, \mathrm{data}) < \infty$ and a random variable $\mathcal{X} = O_\delta(C)$ such that, for every $k \in \mathbb{N}$ with $3^k \ge \mathcal{X}$, every $m \in \{1, \ldots, n+1\}$ and every $\Theta = (p, h_1, \ldots, h_{n+1}) \in \mathbb{R}^{d(n+2)}$ with $|p| \le M$, we have (3.8), and hence, for such $k$ and $\Theta$ and $m \in \{1, \ldots, n+2\}$, (3.9). Observe that (3.5) and (3.9) together imply that, for $\delta$ and $C$ as above and every $m \in \{1, \ldots, n+2\}$, the bound (3.10) holds. We next study the continuity of the approximating coefficient fields in $\Theta$.

Lemma 3.2. There exist constants $\delta(q, \mathrm{data}) > 0$ and $Q(q, \mathrm{data}), C(q, M, \mathrm{data}) < \infty$ and a random variable $\mathcal{X} = O_\delta(C)$ such that (3.11) holds and, for every $m \in \{1, \ldots, n+2\}$, (3.12) holds.

Proof. We take $\mathcal{X}$ to be larger than the random variables in the statements of Lemma 2.11 and Theorems 1.2 and 1.3 for $n$. The bound (3.11) is then an immediate consequence of (2.38) and the obvious bound on the gradients of the localized first-order solutions. We then use the equation for the difference $w^{(k)}_{m,\Theta} - w^{(k)}_{m,\Theta'}$ (see (2.43)) and apply the result of Lemma 2.11 to obtain an estimate for every $m \in \{1, \ldots, n+1\}$; by induction we then obtain the desired bound for every $m \in \{1, \ldots, n+1\}$. This implies (3.12).
By combining (3.5) and (3.12) and using interpolation, we obtain the existence of $\alpha(\mathrm{data}) > 0$, $Q(\mathrm{data}) < \infty$ and $C(\mathrm{data}) > 0$ such that, with $\mathcal{X}$ as in the statement of Lemma 3.2, for every $k \in \mathbb{N}$ with $3^k \ge \mathcal{X}$, $m \in \{1, \ldots, n+2\}$ and $\Theta = (p, h_1, \ldots, h_{n+1})$, the interpolated estimate (3.13) holds. Likewise, we can use (3.6) and (3.11) to obtain (3.14). This variation of Lemma 3.2 will be needed below.

3.2. Setup of the proof of Theorem 1.2. We are now ready to begin the proof of the implication (3.1) ⇒ the statement of Theorem 1.2 with $n+1$ in place of $n$.
By the assumption (3.1), we only need to homogenize the linearized equation for $m = n+2$. As we have already proved a homogenization result in [1, Theorem 1.1] for the linearized equation with zero right-hand side, it suffices to prove (1.21) for $m = n+2$ under the assumption that the boundary condition vanishes. We can write the equations for $w^\varepsilon_{n+2}$ and $\bar{w}_{n+2}$, respectively, in divergence form. Our goal is to prove the following estimate: there exist a constant $C(M, \mathrm{data}) < \infty$, exponents $\sigma(\mathrm{data}), \alpha(\mathrm{data}) > 0$ and a random variable $\mathcal{X}$ as in the statement of the theorem such that (3.24) holds. To prove (3.24), we first compare the solution $w^\varepsilon_{n+2}$ to the solution $\tilde{w}^\varepsilon_{n+2}$ of a second heterogeneous problem, namely (3.25), where the coefficient fields $\tilde{a}^\varepsilon$ and $\tilde{\mathbf{F}}^\varepsilon_{n+2}$ are defined in terms of the localized, stationary approximating coefficients (introduced above in Subsection 3.1), evaluated at $\frac{x}{\varepsilon}$. The parameter $k \in \mathbb{N}$ will be chosen below in such a way that $1 \ll 3^k \ll \varepsilon^{-1}$. We also need to declare a second mesoscopic scale by taking $l \in \mathbb{N}$ such that $1 \ll 3^k \ll 3^l \ll \varepsilon^{-1}$ and, for every $m \in \{1, \ldots, n+1\}$, the corresponding smallness condition holds. Like $k$, the parameter $l$ will be declared later in this section. For convenience we will also take a slightly smaller domain $U_{n+3}$ than $U_{n+2}$, which also depends on $\varepsilon$ and $l$ and is defined by (3.28). The estimate (3.24) follows from the two estimates (3.29) and (3.30), which are proved separately below (here $\mathcal{X}$ denotes a random variable as in the statement of the theorem).

3.3. Estimates on the size of the $w^\varepsilon_m$ and $\bar{w}_m$. To prepare for the proofs of (3.29) and (3.30), we present some preliminary bounds on the size of the $w^\varepsilon_m$'s. The first is a set of deterministic bounds representing the "worst-case scenario" in which nothing has homogenized up to the current scale.
We argue by induction in $m \in \{1, \ldots, n+2\}$ that there exist $Q(m, \mathrm{data}) < \infty$ and $C(m, \{U_k\}, M, K_0, \mathrm{data}) < \infty$ such that the bound (3.33) holds. We first obtain a bound for some $Q(M, \mathrm{data}) < \infty$; then, by the basic energy estimate, we get a corresponding gradient bound. In view of (3.6) and the previous two displays, another application of Proposition A.2 yields, after enlarging $Q$ and $C$, the bound (3.33) for $m = M$. This completes the induction argument and the proof of the lemma.
The typical size of $\nabla w^\varepsilon_m$ is much better than (3.31) suggests.
We also require bounds on the homogenized solutions $\bar{u}$ and $\bar{w}_1, \ldots, \bar{w}_{n+2}$. These are consequences of elliptic regularity estimates presented in Appendix D, namely Lemma D.1, which is applicable here because of the assumption (3.1), which ensures that $\bar{L} \in C^{3+N,\beta}$ for every $\beta \in (0,1)$. We obtain the existence of $C(M, \mathrm{data}) < \infty$ such that the pointwise bounds (3.34) hold for every $m \in \{1, \ldots, n+1\}$. In particular, the function $\Theta$ defined in (3.19) is Lipschitz continuous. By the global Meyers estimate, we also have, for some $\delta(d, \Lambda) > 0$ and $C(M, \mathrm{data}) < \infty$, the bound (3.35). We may also apply Lemma D.1 to get a bound on $\bar{w}_{n+2}$. In view of the merely mesoscopic gap between $\partial U_{n+2}$ and $U_{n+3}$, which is much smaller than the macroscopic gaps between $\partial U_m$ and $U_{m+1}$ for $m \in \{1, \ldots, n+1\}$, cf. (3.15) and (3.28), the estimate we obtain is, for every $\beta \in (0,1)$, of the form (3.38), for an exponent $Q(\beta, \mathrm{data}) < \infty$ which can be computed explicitly rather easily (but for our purposes is not worth the bother) and a constant $C(\beta, M, \mathrm{data}) < \infty$.

3.4. The mesoscopic parameters $k$ and $l$. Here we discuss the choice of the parameters $k$ and $l$ which specify the mesoscopic scales. Recall that $1 \ll 3^k \ll 3^l \ll \varepsilon^{-1}$. As we will see later in this section, we will estimate the left sides of (3.29) and (3.30) by expressions of the form (3.40), where $Q(\mathrm{data}) < \infty$ is a large exponent and $\alpha(\mathrm{data}) > 0$ is a small exponent. We then need to choose $k$ and $l$ so that the expression in parentheses is a positive power of $\varepsilon$. We may do this as follows. First, we may assume throughout, for convenience, that $Q \ge 1 \ge 4\alpha$. Then we may take care of the first two terms by choosing $l$ so that $\varepsilon 3^l$ is a "very large" mesoscale, and we see that, for some $\beta > 0$, these terms are bounded by a power of $\varepsilon$. Next, we take care of the last term: since it is controlled, for some $\beta > 0$, by a power of $\varepsilon 3^k$, we can make it smaller than $\varepsilon^{\frac{\beta}{2}}$ by taking $\varepsilon 3^k$ to be a "very small" mesoscale. We take $k$ accordingly, for $\beta$ and $Q$ as above. From this we deduce that $(\varepsilon 3^l)^\alpha 3^{kQ} \le C\varepsilon^{\frac{\beta}{2}}$. We see that this choice of $k$ also makes the third term inside the parentheses on the right side of (3.40) smaller than a positive power of $\varepsilon$. With these choices, we obtain that, for some $\beta > 0$, the total error is at most $C\varepsilon^\beta$. Throughout the rest of this section, we will allow the exponents $\alpha \in (0, \frac{1}{4}]$ and $Q \in [1, \infty)$ to vary in each occurrence, but they will always depend only on data. Similarly, we let $c$ and $C$ denote positive constants which may vary in each occurrence and whose dependence will be clear from the context.

3.5. The minimal scales. Many of the estimates we will use in the proof of Theorem 1.2 are deterministic estimates (i.e., the constants $C$ in the estimates are not random) but are valid only above a minimal scale $\mathcal{X}$ which satisfies $\mathcal{X} = O_\delta(C)$ for some $\delta(\mathrm{data}) > 0$ and $C(M, \mathrm{data}) < \infty$.
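For illustration only (the exponent values below are hypothetical choices, not the authors'), the two-mesoscale bookkeeping can be checked numerically: with $3^l \approx \varepsilon^{-1/2}$ (so that $\varepsilon 3^l = \varepsilon^{1/2}$) and $3^k$ a sufficiently small power of $\varepsilon^{-1}$, both $\varepsilon 3^l$ and $(\varepsilon 3^l)^\alpha 3^{kQ}$ are bounded by a positive power of $\varepsilon$:

```python
import math

def choose_mesoscales(eps, alpha, Q):
    """Pick l, k with 1 << 3^k << 3^l << 1/eps so that both error terms
    eps*3^l and (eps*3^l)^alpha * 3^(k*Q) are positive powers of eps.
    The specific exponents below are illustrative, not the paper's."""
    # 3^l ~ eps^{-1/2}: the "very large" mesoscale
    l = math.floor(math.log(eps ** -0.5, 3))
    # 3^k ~ eps^{-alpha/(8Q)}: the "very small" mesoscale
    k = max(1, math.floor(math.log(eps ** (-alpha / (8 * Q)), 3)))
    return k, l

eps, alpha, Q = 3.0 ** -40, 0.2, 2.0
k, l = choose_mesoscales(eps, alpha, Q)
term1 = eps * 3.0 ** l                              # boundary-layer term
term2 = (eps * 3.0 ** l) ** alpha * 3.0 ** (k * Q)  # localization penalty
beta = alpha / 8                                    # a positive power of eps
assert 1 < 3 ** k < 3 ** l < 1 / eps
assert term1 <= eps ** beta
assert term2 <= eps ** beta
```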
This includes, for instance, the estimates of Theorems 1.1, 1.2 and 1.3 assumed to be valid by our assumption (3.1), as well as Lemmas 2.11, 3.1, 3.2 and 3.4, some estimates like (3.13) which appear in the text, and future estimates such as those of Lemmas 3.5, 3.6 and 3.7. We stress that this list is not exhaustive.
In our proofs of (3.29) and (3.30), we may always suppose that $3^k \ge \mathcal{X}$, where $\mathcal{X}$ is the maximum of all of these minimal scales. In fact, we may suppose that $3^k$ is larger than the stationary translation $T_z\mathcal{X}$ of $\mathcal{X}$ by any element $z \in \mathbb{Z}^d \cap \varepsilon^{-1}U_1$.
To see why, first we remark that if $\mathcal{X}$ is any random variable satisfying $\mathcal{X} = O_\delta(C)$, then a union bound gives that, for any $Q < \infty$, the corresponding maximal random variable $\tilde{\mathcal{X}}$ satisfies, for a possibly smaller $\delta$ and larger $C < \infty$, depending also on $Q$, the estimate $\tilde{\mathcal{X}} = O_{\delta/2}(C)$. Since, as explained in the previous subsection, $3^k$ is a small but positive power of $\varepsilon^{-1}$, we see that, by choosing $Q$ suitably, $3^k \ge \tilde{\mathcal{X}}$ implies the validity of all of our estimates. Finally, on the event that $3^k < \tilde{\mathcal{X}}$, we can use the deterministic bounds obtained in Lemma 3.3 and (3.38) very crudely, together with the bound on the probability of this event (with the exponent $Q$ larger in the second instance than in the first). This yields (3.29) and (3.30) on the event $\{3^k < \tilde{\mathcal{X}}\}$, so we do not have to worry about this event.
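A minimal sketch of this union bound, assuming the convention of [4] that $\mathcal{X} = O_\delta(C)$ means $\mathbb{E}\big[\exp\big((\mathcal{X}/C)^\delta\big)\big] \le 2$ (our simplified, unweighted version of the maximal random variable):

```latex
% Chebyshev's inequality gives
% $\mathbb{P}[\mathcal{X} > t] \le 2\exp(-(t/C)^\delta)$.
% Taking the maximum over the translated minimal scales and applying a
% union bound over the $\le C\varepsilon^{-d}$ lattice points,
\begin{align*}
  \mathbb{P}\Big[\max_{z \in \mathbb{Z}^d \cap \varepsilon^{-1}U_1} T_z\mathcal{X} > t\Big]
    \;\le\; \#\big(\mathbb{Z}^d \cap \varepsilon^{-1}U_1\big)\,
         \sup_{z} \mathbb{P}\big[T_z\mathcal{X} > t\big]
    \;\le\; C\varepsilon^{-d} \exp\big(-(t/C)^{\delta}\big),
\end{align*}
% which is bounded by $2\exp(-(t/C')^{\delta/2})$ as soon as
% $t \ge C''(\log \varepsilon^{-1})^{2/\delta}$ -- far below $3^k$,
% which is a positive power of $\varepsilon^{-1}$.
```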
Therefore, throughout the rest of this section, we let X denote a minimal scale satisfying X = O δ (C) which, in addition to both δ and C, may vary in each occurrence.
3.6. The proof of the local stationary approximation. We now turn to the proof of (3.29), which amounts to showing that the difference between the coefficient fields $(a^\varepsilon, \mathbf{F}^\varepsilon_{n+2})$ and $(\tilde{a}^\varepsilon, \tilde{\mathbf{F}}^\varepsilon_{n+2})$ is small. This is accomplished by an application of Lemma 2.11.

Lemma 3.5. There exist $\alpha(\mathrm{data}) > 0$, $Q(\mathrm{data}) < \infty$, $C(M, \mathrm{data}) < \infty$ and a minimal scale $\mathcal{X}$ such that (3.44) holds.

Proof. Throughout, $\mathcal{X}$ denotes a random scale which may change from line to line and satisfies $\mathcal{X} = O_\delta(C)$.
For each $z \in \varepsilon 3^l\mathbb{Z}^d$ with $z + \varepsilon\square_{l+1} \subseteq U_{n+1}$, we compare $\nabla u$ and $\nabla w^\varepsilon_m$ to the gradients of the localized functions. If $3^l \ge \mathcal{X}$, then for every such $z$ and every $m \in \{1, \ldots, n+1\}$, the corresponding estimates hold. Here we used (3.37) in order that the constant $C$ not depend implicitly on $\nabla\bar{u}(z)$, and in order that the prefactor on the right side of the second line is as it is. Using (3.37) again, and combining the three previous displays with the triangle inequality, we get the claimed comparison for all such $m$ and $z$, provided that $3^l \ge \mathcal{X}$.

An application of Lemma 2.11 then yields
By $L^p$ interpolation and the bounds (3.8) and (3.34), we obtain, for every $q \in [2, \infty)$, an $\mathcal{X}$ such that $3^l \ge \mathcal{X}$ implies the interpolated bound for every $m$ and $z$ as above. By a very similar argument, comparing the functions $\nabla v^\varepsilon$ to their localized counterparts, also using Lemma 2.11 and the assumed validity of Theorems 1.2 and 1.3 for $n$, we obtain an analogous bound for all $m$ and $z$ as above. Using (3.13) and (3.14), in view of (3.37), we find an exponent $Q(\mathrm{data}) < \infty$ such that a further estimate holds for every $m$ and $z$ as above. Combining the above estimates, we finally obtain an estimate valid for every $z$ and $m$ as above. Using that these coefficient fields are bounded and that the boundary layer (i.e., the set of points which lie in $U_{n+2}$ but not in any cube of the form $z + \varepsilon\square_l$ with $z$ as above) has thickness $O(\varepsilon 3^l)$, we get from the previous estimate a bound over all of $U_{n+2}$. To complete the proof of the lemma, we observe that $\zeta := \tilde{w}^\varepsilon_{n+2} - w^\varepsilon_{n+2} \in H^1_0(U_{n+2})$ satisfies a divergence-form equation. By the basic energy estimate (test the equation for $\zeta$ with $\zeta$), we obtain a bound by two terms. The first term on the right side is already estimated above. For the second term, we use the Meyers estimate and (3.34), without forgetting (3.20), to find $\delta(d, \Lambda) > 0$ with the required integrability. Therefore, by Hölder's inequality and the above estimate on $\|a^\varepsilon - \tilde{a}^\varepsilon\|_{L^q(z + \varepsilon\square_l)}$ with exponent $q := \frac{4+2\delta}{\delta}$, we obtain the desired bound. This completes the proof of (3.44).
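Schematically, the energy estimate invoked above takes the following form (a sketch under our reading; the precise right-hand side is in the omitted displays):

```latex
% If $-\nabla\cdot(\tilde a^\varepsilon \nabla\zeta)
%      = \nabla\cdot\big((a^\varepsilon - \tilde a^\varepsilon)\,
%        \nabla w^\varepsilon_{n+2}
%        + \mathbf{F}^\varepsilon_{n+2} - \tilde{\mathbf{F}}^\varepsilon_{n+2}\big)$
% with $\zeta \in H^1_0(U_{n+2})$, then testing with $\zeta$ and using
% uniform ellipticity and Young's inequality gives
\begin{align*}
  \int_{U_{n+2}} |\nabla\zeta|^2
    \;\le\; C \int_{U_{n+2}}
      \Big( |a^\varepsilon - \tilde a^\varepsilon|^2\,
            |\nabla w^\varepsilon_{n+2}|^2
          + \big|\mathbf{F}^\varepsilon_{n+2}
               - \tilde{\mathbf{F}}^\varepsilon_{n+2}\big|^2 \Big).
\end{align*}
```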
In view of the discussion in Subsections 3.4 and 3.5, Lemma 3.5 implies (3.29).

3.7. Homogenization estimates for the approximating equations. To prepare for the proof of (3.30), we apply the quantitative homogenization estimates proved in [4] to the linear equation (3.4). We denote by $v(\cdot, U, e, \Theta)$ the corresponding solutions, indexed over bounded Lipschitz domains $U \subseteq \mathbb{R}^d$, $e \in B_1$ and $\Theta \in \mathbb{R}^{d(n+2)}$. These satisfy the following quantitative homogenization estimate: there exist $\alpha(s, d, \Lambda) > 0$, $\beta(\mathrm{data}) > 0$, $\delta(\mathrm{data}) > 0$, $Q(\mathrm{data}) < \infty$, $C(s, M, \mathrm{data}) < \infty$ and a random variable $\mathcal{X} = O_\delta(C)$ such that, for every $\Theta = (p, h_1, \ldots, h_{n+1}) \in \mathbb{R}^{d(n+2)}$ with $|p| \le M$, $e \in \mathbb{R}^d$ and every $l \in \mathbb{N}$ with $3^{l-k} \ge \mathcal{X}$, the estimate (3.47) holds. To obtain (3.47), we change the scale by performing a dilation $x \mapsto 3^k\lceil(15+d)^{1/2}\rceil x$ so that the resulting coefficient fields have a unit range of dependence and are still $\mathbb{Z}^d$-stationary, cf. (3.3). We then apply [4, Lemma 11.11] to obtain (3.47), using Lemma 3.1, namely (3.5), to give a bound on the parameter $K$ in the application of the former. The reason we quote results for general nonlinear equations, despite the fact that (3.46) is linear, is that (3.46) has a nonzero right-hand side and the results for linear equations have not been formalized in that generality.
The bound (3.48) is a consequence of the large-scale $C^{0,1}$-type estimate for the equation (3.4) (see [4, Theorem 11.13]) which, together with a routine union bound, yields (3.48) with the left side replaced by $\sup_{z \in \mathbb{Z}^d \cap \square_l} \|\nabla v\|_{L^2(z + \square_k)}$. Indeed, if we denote by $\mathcal{X}_z$ the minimal scale in the large-scale $C^{0,1}$ estimate, then a union bound yields the desired tail bound (for any $s \in (0, d)$, so in particular we can take $s > 1$). We obtain (3.48) by combining this bound for $\sup_{z \in \mathbb{Z}^d \cap \square_l} \|\nabla v\|_{L^2(z + \square_k)}$ with the deterministic bounds (3.5) and (3.6) and an application of the Schauder estimates (see Proposition A.2).
Our next goal is to compare the homogenized coefficients of the approximating equations to those of the original equation.

Lemma 3.6. Fix $q \in [2, \infty)$ and $M \in [1, \infty)$. There exist $\delta(q, \mathrm{data}), \alpha(q, \mathrm{data}) \in (0, \frac{1}{2}]$ and $C(q, M, \mathrm{data}) < \infty$ such that the stated estimates hold for every $k \in \mathbb{N}$ with $3^k \ge \mathcal{X}$, $m \in \{1, \ldots, n+2\}$ and $p, h \in \mathbb{R}^d$ with $|p| \le M$.

Proof. We take $\mathcal{X}$ to be the maximum of the random scales in Theorem 2.2, Lemmas 2.10 and 2.11 and Theorems 1.2 and 1.3 for $n$, the latter two being valid by assumption (3.1). Assume $3^k \ge \mathcal{X}$. By Lemma 2.10 and the assumed validity of Theorem 1.2 for $n$, we obtain first estimates, and, for $m \in \{2, \ldots, n+1\}$, analogous ones. By Theorem 2.2 and the assumed validity of Theorem 1.3 for $n$, we also have corresponding bounds for every $m \in \{1, \ldots, n+1\}$. Using these estimates and Lemma 2.11, we obtain, for $m \in \{2, \ldots, n+1\}$ and $3^k \ge \mathcal{X}$, the comparison estimate. Combining this estimate with (2.16) and (3.10) and using $L^p$ interpolation, we obtain the remaining bound. The conclusion of the lemma now follows.
We next observe that the homogenized coefficients a^{(k)}_p and F^{(k)}_{n+2,Θ} agree, up to a small error, with ā_p := D²L̄(p) and F̄_{n+2}(Θ). This will help us eventually to prove that the homogenized coefficients for the linearized equations are the linearized coefficients of the homogenized equation. By Lemma 3.6 and the bounds (3.5) and (2.16), we find that the corresponding estimate holds for every q ∈ [2, ∞). We also require the following continuity estimate for the functions w(⋅, ◻_l, e, Θ) in (e, Θ). Following the proof of Lemma 3.2 for L^2 dependence in Θ (we need to do one more step of the iteration described there), using (3.47) for L^2 dependence in e, and then interpolating this result with (3.48), we obtain exponents α(data) > 0 and Q(data) < ∞ and a random variable X = O_δ(C) such that the continuity estimate below holds for every e, e′ ∈ R^d and Θ, Θ′ ∈ R^{d(n+2)}.

3.8. Homogenization of the locally stationary equation. We next present the proof of (3.30), which is the final step in the proof of Theorem 1.2. The argument follows a fairly routine (albeit technical) two-scale expansion argument and requires the homogenization estimates given in Subsection 3.7.
The proof of (3.30). In view of the discussion in Subsection 3.5, we may assume throughout that 3^k ≥ X, where X is any minimal scale described above. We begin by building an approximation of the solution w^ε_{n+2} of (3.25), constructed from the solutions of (3.46). We select a smooth function ζ ∈ C^∞_c(R^d) such that, for some C(d) < ∞ and each i ∈ {1, 2}, the properties below hold. We can construct such a ζ by a suitable mollification of 1_{ε◻_l}. We write ζ_z := ζ(⋅ − z) for short. We next define a function ṽ^ε_{n+2} ∈ H^1. A preliminary goal is therefore to prove the estimate (3.63). The proof of (3.63) is essentially completed in Steps 1-5 below: see (3.72) and (3.73).
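One concrete construction of such a ζ, assuming a standard mollifier is used at a scale comparable to the cube size (an assumption on our part; the text only says "a suitable mollification"), is:

```latex
% Let \phi\in C_c^\infty(B_1) with \int\phi = 1 be a standard mollifier, and set
\zeta := \mathbf{1}_{\varepsilon\square_l} * \phi_{\varepsilon 3^l},
\qquad \phi_r(x) := r^{-d}\phi(x/r).
% Since the cubes \{z+\varepsilon\square_l\}_{z\in\varepsilon 3^l\mathbb{Z}^d}
% tile \mathbb{R}^d, the translates \zeta_z := \zeta(\cdot-z) form a
% partition of unity with the required derivative bounds:
\sum_{z\in\varepsilon 3^l\mathbb{Z}^d}\zeta_z \equiv 1,
\qquad
\|\nabla^i\zeta\|_{L^\infty} \le C(d)\,(\varepsilon 3^l)^{-i},
\quad i\in\{1,2\}.
```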
After obtaining this estimate, we will prove (3.64); together these inequalities imply (3.30).
Denote v_z := v(⋅, z/ε + ◻_{l+1}, ∇w_{n+2}(z), Θ(z)) for short and compute; using the equation for v_z, we arrive at the identity (3.67).
We next estimate each of the five terms on the right side of the previous display.
Step 1. The estimate of the first term on the right side of (3.67). This is where we use the homogenized equation for w_{n+2}. Fix h ∈ H^1_0(U_{n+2}) with ‖h‖_{H^1(U_{n+2})} = 1. By the equation for w_{n+2} and integration by parts, we find that

Therefore we have
For the first term on the right, we use (3.37), (3.47), (3.56) and (3.57) to obtain the desired bound. For the second term, we observe that |U_{n+2} ∖ U_{n+4}| ≤ Cε3^l and compute, using the Hölder inequality together with (3.38), the corresponding bound. Combining the above estimates and taking the supremum over h ∈ H^1_0(U_{n+2}) with ‖h‖_{H^1(U_{n+2})} = 1 yields the conclusion of Step 1.

Step 2. The estimate of the second term on the right side of (3.67). We use (3.14), (3.37) and (3.50).

Step 3. The estimate of the third term on the right side of (3.67). This follows from (3.13) and (3.37).

Step 4. The estimate of the fourth term on the right side of (3.67). For convenience, extend v_z so that it is equal to the affine function ∇w_{n+2}(z) outside of the cube z + ε◻_{l+1}. By the triangle inequality, we may split the term, where here and below we use the notation y ∼ z to mean that y + ◻_{l+1} and z + ◻_{l+1} have nonempty intersection. Using (3.61) and the fact that ∑_{z∈ε3^lZ^d} ∇ζ_z = 0, we obtain a bound involving a set V which is a boundary layer of U_{n+3} of thickness ε3^l. Observe that |V| ≤ Cε3^l. Thus by (3.47) we obtain the first estimate. Combining (3.61) with (3.49), (3.37) and the triangle inequality to compare the v_z's on overlapping cubes, we obtain the second. For the last term, we compute, using (3.39). Putting the above inequalities together completes Step 4.

Step 5. The estimate of the fifth term on the right side of (3.67). Recall that U_{n+4} is defined in (3.68). We conclude using (3.38) and (3.39).

Step 6. Let us summarize what we have shown so far. Substituting the results of the five previous steps into (3.67), we obtain an estimate which, combined with (3.62), gives the desired bound. Therefore to obtain the lemma it suffices to prove (3.64). To prove this bound, we use an identity which, combined with (3.47) and (3.37), yields (3.64).
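The identity ∑_z ∇ζ_z = 0 used in Step 4 is a direct consequence of the partition-of-unity property; as a sanity check:

```latex
% Since the translates of \zeta sum to one,
\sum_{z\in\varepsilon 3^l\mathbb{Z}^d}\zeta_z(x) \equiv 1
\quad\Longrightarrow\quad
\sum_{z\in\varepsilon 3^l\mathbb{Z}^d}\nabla\zeta_z(x) = \nabla(1) = 0,
% so the sum \sum_z v_z\nabla\zeta_z may be replaced by
% \sum_z (v_z - c)\nabla\zeta_z for any constant c; this is what makes the
% comparison of the v_z's on overlapping cubes effective.
```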
In view of the selection of the mesoscopic parameters k and l in Section 3.4, which gives us (3.41), the proof of (3.30) is now complete.

Large-scale C^{0,1}-type estimates for linearized equations
In the next two sections, we suppose that n ∈ {0, . . . , N − 1} is such that (4.1) Theorems 1.1 and 1.2 are valid for n + 1 in place of n, and Theorem 1.3 is valid for n.
The goal is to prove that Theorem 1.3 is also valid for n + 1. Combined with the results of the previous two sections and an induction argument, this completes the proof of Theorems 1.1, 1.2 and 1.3.
The goal of this section is to prove one half of Theorem 1.3, namely the estimate (1.22) for w_{n+2}. The estimate (1.23) will be the focus of Section 5.
Both of the estimates in the conditional proof of Theorem 1.3 are obtained by "harmonic" approximation: homogenization says that on large scales the heterogeneous equations behave like the homogenized equations, and therefore we may expect the former to inherit some of the better regularity estimates of the latter. The quantitative version of the homogenization statement provided by Theorem 1.2 allows us to prove a C^{0,1}-type estimate, following a well-known excess decay argument.

We begin by giving a quantitative statement concerning the smoothness of solutions to the linearized equations for the homogenized operator L̄. This is essentially well-known, but we need a more precise statement than we could find in the literature.
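The mechanism behind excess decay iterations of this kind can be summarized in the following model statement, which is a simplified caricature of what is actually proved below (the symbols E, θ, α here are ours, not the paper's):

```latex
% Define the excess at scale r by
E(r) := \frac{1}{r}\inf_{\ell\in\mathcal{P}_1}\|u-\ell\|_{\underline{L}^2(B_r)}.
% Suppose that for some \theta\in(0,1/2] one can show the one-step contraction
E(\theta r) \le \tfrac{1}{2}\,E(r) \qquad \text{for every } r\in[\mathcal{X},R].
% Iterating from scale R down to scale r \ge \mathcal{X} then yields
E(r) \le C\Big(\frac{r}{R}\Big)^{\alpha} E(R),
\qquad \alpha := \frac{\log 2}{\log(1/\theta)},
% and controlling the slopes of the minimizing affine functions \ell along
% the iteration upgrades this algebraic decay to a Lipschitz (C^{0,1}-type)
% bound down to the random minimal scale \mathcal{X}.
```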

It then follows by the Meyers estimate and the equation for w_m that
finishing the proof of (4.4) by (4.7).

4.2. Excess decay iteration for w_{n+2}. In this subsection we conditionally prove the statement of Theorem 1.3 for n + 1 and for q = 2. The restriction on q will be removed in the next subsection. The proof is by a decay-of-excess iteration, following similar lines to the argument from [6], using "harmonic" approximation. The statement we prove is summarized in the following proposition.
Proposition 4.2. Assume that (4.1) holds. Fix M ∈ [1, ∞). Then there exist constants σ(data), α(d, Λ) ∈ (0, 1/2], C(M, data) < ∞ and a random variable X satisfying X = O_σ(C) such that the following statement is valid. Let R ∈ [X, ∞) and u, w_1, . . . , w_n satisfy, for each m ∈ {1, . . . , n + 1}, the system above. Then, for m ∈ {1, . . . , n + 2} and for every r ∈ [X, R/2], the stated decay estimate holds.

Proof. The proof is based on a combination of harmonic approximation and decay estimates for homogenized solutions presented in Appendix D. The necessary estimate is (D.7). We take the minimal scale X to be the maximum of the minimal scales in Lemma 4.1 and in Theorem 1.3, which is valid with n in place of N and with the corresponding q = 2, and of a constant R(M, data) ∈ [1, ∞) to be fixed in the course of the proof. This choice, in particular, implies that there exist constants C(M, data) < ∞ and σ(data) ∈ (0, 1/2] such that X ≤ O_σ(C) and X ≥ R. We will prove the statement by induction over shrinking radii. Indeed, we set, for j ∈ N and θ ∈ (0, 1/2], r_j := θ^j r_0, where r_0 ∈ [X, δR] and δ ∈ (0, 1/2]. The parameters θ, δ and R will all be fixed in the course of the proof. Having fixed θ, δ and R, we assume that there is J ∈ N such that r_{J+1} < X ≤ r_J. If there is no such J, or if X ≥ δR, the result follows simply by giving up a volume factor. Furthermore, we devise the notation of this proof in such a way that it will also allow us to prove the result of the next lemma, Lemma 4.4. We denote, in short, for m ∈ {1, . . . , n + 2} and γ ∈ [0, 1) to be fixed, the quantities below. Theorem 1.3 implies that, for r ∈ [X, R/2] and m ∈ {1, . . . , n + 1}, the corresponding bound holds. Notice then that, by (4.12) and Poincaré's inequality, we have the analogous estimate for r ∈ [X, R/2] and m ∈ {1, . . . , n + 2}.

Step 1. Letting u_j solve the homogenized problem, we show that there exist, for η ∈ (0, 1], constants α_1(d, Λ) ∈ (0, 1/2), δ(η, M, data) < ∞ and R(η, M, data) < ∞ such that (4.14) holds for j ∈ {0, . . . , J}. The parameter η shall be fixed later in (4.19).
On the one hand, we have from [1, Theorem 2.3] that there exist constants β_1(d, Λ) ∈ (0, 1) and C(M, d, Λ) < ∞ for which the homogenized solution satisfies an excess decay estimate. On the other hand, by harmonic approximation ([1, Corollary 2.2]) and the Lipschitz estimate for u ([1, Theorem 2.3]), we get that there exist constants β_2(d, Λ) ∈ (0, 1) and C(M, d, Λ) < ∞ controlling the approximation error. Thus (4.14) follows by the triangle inequality, taking α_1 := (β_1 ∧ β_2)/2 and choosing δ small enough and R large enough so that (4.15) holds. We assume, from now on, that δ and R are such that (4.15) is valid.
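Schematically, the two ingredients are combined as follows (a sketch; the precise smallness parameters enter through δ and R^{-1} in the harmonic approximation error):

```latex
% With \bar u_j the homogenized approximation of u on B_{r_j}:
\inf_{\ell\in\mathcal{P}_1}\|u-\ell\|_{\underline{L}^2(B_{\theta r_j})}
\le \|u-\bar u_j\|_{\underline{L}^2(B_{\theta r_j})}
  + \inf_{\ell\in\mathcal{P}_1}\|\bar u_j-\ell\|_{\underline{L}^2(B_{\theta r_j})}.
% The first term is small by harmonic approximation (quantitative
% homogenization); the second decays like \theta^{1+\beta} by the C^{1,\beta}
% estimate for the homogenized equation. Choosing \theta first, then \delta
% small and R large, yields a contraction with exponent
% \alpha_1 = \tfrac{1}{2}(\beta_1\wedge\beta_2).
```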
Step 5. The last step in the proof is to show the second inequality in (4.18) for j = j* + 1. Let ℓ_j be the minimizing affine function in inf_{ℓ∈P_1} ‖w_m − ℓ‖_{L^2(B_{r_j})}. Then, by the triangle inequality and the first inequality in (4.18), valid for j ∈ {0, . . . , j*}, we can compare consecutive minimizers, and summation gives a bound on the slopes. By the triangle inequality we then obtain the claimed estimate. Choosing the constant in (4.18) to be the constant C appearing in the above inequality verifies the second inequality in (4.18) for j = j* + 1. This finishes the proof of the induction step, and thus completes the proof.
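The summation step here follows the usual telescoping pattern for the slopes of the minimizing affine functions; a sketch, with ℓ_j denoting the minimizer at scale r_j:

```latex
% For affine functions, norms of differences control slopes, so comparing
% minimizers at consecutive scales via the triangle inequality gives
|\nabla\ell_{j+1}-\nabla\ell_j|
\le \frac{C}{r_{j+1}}\Big(
  \inf_{\ell\in\mathcal{P}_1}\|w_m-\ell\|_{\underline{L}^2(B_{r_{j+1}})}
 +\inf_{\ell\in\mathcal{P}_1}\|w_m-\ell\|_{\underline{L}^2(B_{r_j})}\Big).
% By the first inequality in (4.18) the right side decays like r_j^{\alpha}
% (schematically), so the increments form a convergent geometric series:
\sum_{j\le J}|\nabla\ell_{j+1}-\nabla\ell_j| \le C\sum_{j\le J} r_j^{\alpha}
\le C\,r_0^{\alpha},
% which keeps the slopes uniformly bounded along the iteration.
```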
To show Lipschitz estimates for the linearization errors in the next section, we need a small variant of the previous proposition (Lemma 4.4): there exists a random variable X such that the following statement is valid. Let R ∈ [X, ∞) and u, w_1, . . . , w_n satisfy, for each m ∈ {1, . . . , n + 1}, the system above.

Proof. The proof is a rearrangement of the argument in the proof of Proposition 4.2. Indeed, we take r_0 = r and combine the first inequality of (4.23) with (4.16) and (4.9), which is valid for m ∈ {1, . . . , n + 1}.
4.3. Improvement of spatial integrability. We next complete the conditional proof of (1.22) by improving the spatial integrability of (4.25) from L^2 to L^q for every q ∈ [2, ∞). To do this, we use the estimate (4.25) to pass from the large scale R to the microscopic, random scale X. We then use deterministic estimates from classical elliptic regularity theory to obtain local L^q estimates in balls of radius one, picking up a volume factor, which is a power of X, as the price to pay. We first formalize the latter step in the following lemma.
If we now take X to be the maximum of the minimal scales in the statements of: (1) [4, Theorem 11.13]; (2) Theorem 1.3 for n in place of N and with a sufficiently large exponent of spatial integrability q′ in place of q (which can be computed explicitly in terms of our q using the Hölder inequality, although we omit this computation), the validity of which is given by assumption (4.1); (3) Proposition 4.2; then we have that X = O_σ(C) as stated in the lemma and that r ≥ X implies the following estimates. Combining these with (4.27), we obtain the claimed bound. This completes the proof of the lemma.
In the next lemma we finally achieve the goal of this section, which is to show that (4.1) implies (1.22) for m = n + 1.
Lemma 4.6. Assume that (4.1) holds. Fix M ∈ [1, ∞) and q ∈ (2, ∞). Then there exist constants σ(q, data) ∈ (0, d), C(q, M, data) < ∞ and a random variable X satisfying X = O_σ(C) such that the following is valid. Suppose that R ∈ [X, ∞) and that u, w_1, . . . , w_{n+2} ∈ H^1(B_R) satisfy (4.28) for every m ∈ {1, . . . , n + 2}.

Proof. Fix q ∈ (2, ∞). Select a parameter θ ∈ (0, 1) which will denote a mesoscopic scale. For each z ∈ R^d, we take X_z to be the random variable X in the statement of Proposition 4.2, centered at the point z. Define another random variable (a minimal scale) Y accordingly. It is clear from Proposition 4.2 and a routine union bound argument that Y has the required stochastic integrability. Next, for every k ∈ N and z ∈ R^d we let Z_{k,z} denote the random variable defined as a supremum taken over all (u, w_1, . . . , w_{n+2}) ∈ (H^1(z + ◻_{k+1}))^{n+3} satisfying (4.28). Observe that Z_{k,z} is F(z + ◻_{k+1})-measurable and, by Lemma 4.5 and an easy covering argument, it satisfies the estimate (4.31). Fix A ∈ [1, ∞) and define another random variable (a minimal scale) Z. We will show below that, if A is chosen sufficiently large (depending of course on the appropriate parameters), then (4.32) holds. Assuming that (4.32) holds for the moment, let us finish the proof of the lemma. Suppose now that k ∈ N satisfies Y ∨ Z ≤ 3^k ≤ 3^{k+1} ≤ R. Let (u, w_1, . . . , w_{n+2}) ∈ (H^1(B_R))^{n+3} satisfy (4.28). Then the desired bound follows; note that in the third and final lines of the computation we used that 3^k ≥ Y, that is, we used the result of Proposition 4.2. This is the desired estimate for X = Y ∨ Z, and so the proof of the lemma is complete subject to the verification of (4.32).
Turning to the proof of (4.32), we notice first that it suffices to show, for A sufficiently large, the existence of σ(q, data) > 0 and C(q, M, data) < ∞ such that (4.33) holds for every k ∈ N. Indeed, we can see that (4.33) implies (4.32) using a simple union bound. Fix then a parameter λ ∈ [1, ∞) and compute, using (4.31) and the Chebyshev inequality, the bound (4.34). Using the simple large deviations bound for sums of bounded, independent random variables, we obtain (4.35). Here we are careful to notice that, while the collection {Z^q_{⌈θk⌉,z} ∧ λ : z ∈ 3^{⌈θk⌉}Z^d ∩ ◻_{k+1}} of random variables is not independent (since adjacent cubes are touching and thus not separated by a unit distance), we can partition this collection into 3^d subcollections which have an equal number of elements and each of which is independent. The large deviations estimate can then be applied to each subcollection, and a union bound then yields (4.35). Combining (4.34), (4.35) and the observation that E[Z^q_{⌈θk⌉,z} ∧ λ] ≤ E[Z^q_{⌈θk⌉,z}] ≤ C by (4.31), we obtain the claim. Taking λ := 3^{d(1−θ)k/4} and A^q := C + 1 yields (4.33). The above proof, in view of Remark 4.3, gives the following result without assuming (4.1). This, together with (5.1) below, serves as the base case for the induction.
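The "simple large deviations bound" invoked above may, for instance, be taken to be Hoeffding's inequality for sums of independent random variables taking values in [0, λ]:

```latex
% For independent Z_1,\dots,Z_N with 0 \le Z_i \le \lambda:
\mathbb{P}\Big[\frac{1}{N}\sum_{i=1}^N\big(Z_i-\mathbb{E}[Z_i]\big)\ge t\Big]
\le \exp\Big(-\frac{2Nt^2}{\lambda^2}\Big), \qquad t>0.
% Applied to each of the 3^d independent subcollections of
% \{Z^q_{\lceil\theta k\rceil,z}\wedge\lambda\} and combined with a union
% bound, this gives a stretched-exponential bound on the probability that
% the empirical average exceeds its mean by a fixed constant.
```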

5. Large-scale C^{0,1}-type estimate for linearization errors

In this section we continue to suppose that n ∈ {0, . . . , N − 1} is such that (4.1) holds. The goal is to complete the proof that Theorem 1.3 is also valid for n + 1. Combined with the results of the previous three sections and an induction argument, this completes the proof of Theorems 1.1, 1.2 and 1.3.

5.1. Excess decay iteration for ξ_{n+1}. We start by proving higher integrability for a difference of two solutions. This, together with Proposition 4.7, yields the base case for the induction.
Proposition 5.1. Fix M ∈ [1, ∞) and q ∈ [2, ∞). There exist α(data), σ(q, data) ∈ (0, 1/2], C(q, M, data) < ∞ and a random variable X satisfying X = O_σ(C) such that the following statement is valid. For every R ≥ X and u, v ∈ H^1(B_R) satisfying, for each m ∈ {1, . . . , n + 1}, the equations above, we have, for m ∈ {1, . . . , n + 1}, ξ_0 = u − v and r ∈ [X, R/2], the estimates (5.3) and (5.4).

Proof. On the one hand, the estimate (5.3) follows from [1, Proposition 4.2]. On the other hand, the proof of (5.4) is similar to the proof of the L^q-integrability of w_1 presented in Section 4.3. Indeed, noticing that ξ_0 satisfies a linear, uniformly elliptic equation, we have by the normalization in (5.1) and the C^{1,α} regularity of u and v that we may replace w_1 with ξ_0 in Lemma 4.5, applied with m = 1. Using this together with the Lipschitz estimate (5.3) for ξ_0, as in the proof of Lemma 4.6, concludes the proof of (5.4). We omit further details.
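The equation for ξ_0 referred to in the proof is, presumably, the standard linear equation obtained by subtracting the equations for u and v and writing the difference of the fluxes in divergence form:

```latex
% Since -\nabla\cdot(D_pL(\nabla u,x)) = 0 = -\nabla\cdot(D_pL(\nabla v,x)),
% the fundamental theorem of calculus gives
-\nabla\cdot\big(\tilde{\mathbf{a}}(x)\,\nabla\xi_0\big) = 0,
\qquad
\tilde{\mathbf{a}}(x)
 := \int_0^1 D^2_pL\big(t\,\nabla u(x)+(1-t)\,\nabla v(x),\,x\big)\,dt.
% The matrix \tilde{\mathbf{a}} is uniformly elliptic by the uniform
% convexity of L, so \xi_0 = u - v solves a linear, uniformly elliptic
% equation, to which the linear theory applies.
```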
Proposition 5.2. Assume that (4.1) holds. Fix M ∈ [1, ∞). There exist constants α(n, data), σ(n, data) ∈ (0, 1/2], C(n, M, data) < ∞ and a random variable X satisfying X = O_σ(C) such that the following statement is valid. For every R ≥ X and u, v, w_1, . . . , w_{n+1} ∈ H^1(B_R) solving, for each m ∈ {1, . . . , n + 1}, the system above, we have the stated estimate for m ∈ {1, . . . , n + 1} and r ∈ [X, R/2].

Proof. We start by fixing some notation. Let q(n, d, Λ) be as in Lemma C.1, applied with n + 1 instead of n. Corresponding to this q, choose X to be the maximum of the minimal scales appearing in Proposition 5.1, Theorem 1.3 and Lemma 4.4, the last two of which are valid for n + 1 in place of N by (4.1). We also assume that X ≥ R, by taking X ∨ R instead of X, where R will be fixed in the course of the proof to depend on the parameters (n, M, data). Chosen this way, X satisfies X ≤ O_σ(C) for some constants σ(n, data) > 0 and C(n, M, data) < ∞.
Let r_j = θ^j ηR, where θ is as in Lemma 4.4 and η ∈ (0, 1/2]. The constant η, as well as R, will be fixed in the course of the proof, so that η is small and R is large. We track the dependence on η and R carefully; in particular, constants denoted by C below do not depend on them. We may assume, without loss of generality, that ηR ≥ X.
Step 2. Caccioppoli estimate for ξ_{k+1}. We show that (5.9) holds for all r ∈ [X, R_k]. We first use (C.2). By Theorem 1.3 and the choice of q at the beginning of the proof, we obtain, for every r ∈ [X, R/2] and m ∈ {1, . . . , k + 1}, the corresponding estimates and, by Proposition 5.1, the analogous bound. Combining the above displays yields (5.9) in view of the induction assumption, i.e., that (5.8) holds for m ∈ {0, . . . , k}.
Step 3. We prove that there is a constant C < ∞, independent of η and R, such that (5.11) holds for every j ∈ N_0 with r_j ≥ 2(X ∨ R), where ε_j is defined in (5.12) and α(data) is the minimum of the parameters α appearing in the statements of Proposition 5.1 and Lemma 4.4. Let us fix j ∈ N_0 such that r_j ≥ 2(X ∨ R). We argue in two cases, namely, either (5.13) or (5.15) below is valid. We prove that in both cases (5.11) follows.
Step 3.1. We first assume that (5.13) holds, where ε_j has been fixed in (5.12). We show that this implies (5.14); notice that this gives (5.11). To show (5.14), we use the triangle inequality. By the choice of α, we get one bound from Proposition 5.1 and another from Proposition 4.2. Combining the estimates and using (5.13), we obtain the intermediate bound. We then obtain (5.14) by the choice of ε_j in (5.12), provided that (5.13) is valid.
Step 3.2. We then assume that (5.13) fails, that is, (5.15) holds with ε_j defined in (5.12). We verify (5.11) also in this case. To this end, let us first observe an immediate consequence of (5.15), obtained by Young's inequality.
Step 4. We show that (5.19) holds for r ∈ [2(X ∨ R), r_0]. We proceed inductively, and assume that J ∈ N_0 is such that r_J ≥ 2(X ∨ R) and that, for all j ∈ {0, . . . , J}, there exists a constant C(d, θ) ∈ [1, ∞), independent of η and R, such that (5.20) holds. This is true for J = 0 by the definition of E_k. We claim that it continues to hold for j = J + 1 as well. Combining (5.14) and (5.11) with the induction assumption we have, for j ∈ {1, . . . , J}, a dyadic bound. Since r_J ≥ R, we may take R large and η small enough so that the error terms can be reabsorbed. Thus, by summing and reabsorbing, we obtain a bound on the sum. Letting ℓ_j be the minimizing affine function in inf_{ℓ∈P_1} ‖ξ_{k+1} − ℓ‖_{L^2(B_{r_j})}, we obtain by the above display and the triangle inequality a bound on the slopes of the ℓ_j's, and, by the triangle inequality again, the corresponding bound at scale r_{J+1}. We thus obtain by the triangle inequality, together with (5.20) and the above display, that there exists C(d, θ) < ∞ verifying the claimed bound for j = J + 1. Hence we can take this constant as C, proving the induction step. Letting then J be such that r ∈ (r_{J+1}, r_J], we obtain (5.19) by giving up a volume factor.
Step 5. Conclusion. To conclude, we obtain from (5.11) and (5.19), by iterating, the decay estimate. We hence find α such that the bound holds for all r ∈ [2(X ∨ R), r_0]. Moreover, by (5.19) and (5.9) we deduce the corresponding bound for all r ∈ [2(X ∨ R), R_{k+1}]. Therefore, (5.8) is valid for m = k + 1, by giving up a volume factor. This proves the induction step and completes the proof.

Sharp estimates of linearization errors
Here we show that Corollary 1.4 is a consequence of the theorems proved above. Let {B^{(k)}_j} be a finite family of balls covering the relevant domains. By the Vitali covering lemma, we may further assume that the concentric balls with one third of the radii are pairwise disjoint. Let Z be the finite set consisting of the centers of these balls. The size of Z depends only on the geometry of the sequence of domains U_0, U_1, . . . , U_n. Let X be the maximum of the random variables X given in Theorem 1.3, centered at elements of Z, divided by the radius of the smallest ball. We assume that r ≥ X. This ensures the validity of Theorem 1.3 in each of the balls B^{(k)}_j: that is, for every q ∈ [2, ∞) and m ∈ {1, . . . , n}, we have the corresponding estimate and hence, by the covering, its global version. Proceeding with the proof of the corollary, we define, as usual, the linearization errors. Arguing by induction, let us suppose that m ∈ N with m ≥ 1 and θ > 0 are such that, for every j ∈ {0, . . . , m − 1} and q ∈ [2, ∞), there exists C(q, data) < ∞ such that the corresponding estimate holds. This obviously holds for m = 1 and some θ(d, Λ) > 0 by the Meyers estimate and Theorem 1.3. We will show that it must also hold for m + 1 and some other (possibly smaller) exponent θ > 0.
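The covering reduction here uses the Vitali covering lemma in its standard finite form; we record the usual statement:

```latex
% From any finite family of balls \{B_j\} one may extract a pairwise
% disjoint subfamily \{B_{j_i}\} such that
\bigcup_j B_j \subseteq \bigcup_i 3B_{j_i};
% equivalently, after re-indexing, the family can be arranged so that the
% concentric balls \tfrac{1}{3}B_j are pairwise disjoint while the B_j's
% still cover. This bounds the number of balls, and hence the cardinality
% of the set Z of centers, purely in terms of the geometry of the domains
% U_0, U_1, \dots, U_n.
```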
Step 1. We show that (6.5) holds. By the basic energy estimate, by Theorem 1.3 (using our definition of X and the fact that r ≥ X) and by the induction hypothesis, we have the required bounds for every k ∈ {1, . . . , m}. This completes the proof of (6.5).

Liouville theorems and higher regularity
In this section we prove Theorem 1.6 by an induction on the degree n. The initial step n = 1 has already been established in [1]. Indeed, (i)_1 and (ii)_1 are consequences of [1, Theorem 5.2], and (iii)_1 follows from Theorem 1.5, which is [1, Theorem 1.3]. Moreover, these estimates hold with optimal stochastic integrability, namely we may take any σ ∈ (0, d) for n = 1 (with the constant C then depending additionally on σ).
Throughout the section we will use the following notation. Given p ∈ R^d and k ∈ N, we denote by w the function which solves −∇ ⋅ (a∇w) = ∇ ⋅ F(p, ∇w_1, . . . , ∇w_{n−1}) in R^d. We have the estimate for all m ∈ {1, . . . , n}, and by the equations of w_1, . . . , w_{n−1} we obtain the analogous bound. Therefore, for all R > 0 and m ∈ {1, . . . , n}, we have a decay estimate, and thus, for t ≥ R, a bound which yields (7.13).
We now turn to the proof of the large-scale C n,1 estimate.
Proof of Theorem 1.6 (iii)_n. Fix M ∈ [1, ∞). By Theorem 1.3 there exist constants σ(n, M, data) ∈ (0, 1) and C(n, M, data) < ∞ and a random variable X satisfying X ≤ O_σ(C) such that the statement of Theorem 1.3 is valid with q = 2(n + 2). We now divide the proof into two steps.

Appendix A. Deterministic regularity estimates
In this first appendix, we record some deterministic regularity estimates of Schauder and Calderón-Zygmund type for linear equations with Hölder continuous coefficients. These estimates, while well-known, are not typically written with explicit dependence on the regularity of the coefficients, which is needed for our purposes in this paper.
Proposition A.1 (Calderón-Zygmund gradient L^q estimates). Let β ∈ (0, 1], q ∈ [2, ∞) and a ∈ R^{d×d} be a symmetric matrix with entries in C^{0,β}(B_2) satisfying the uniform ellipticity condition. Then u ∈ W^{1,q}_loc(B_2) and there exists C(q, d, Λ) < ∞ such that the stated estimate holds. Proof. We will explain how to extract the statement of the proposition from that of [4, Proposition 7.3]. The latter asserts the existence of δ_0(q, d, Λ) > 0 such that, for every x ∈ B_1 and r ∈ (0, 1/2] satisfying the smallness condition, we have, for a constant C(q, d, Λ) < ∞, the corresponding estimate. Since osc_{B_{2r}} a ≤ (2r)^β [a]_{C^{0,β}(B_2)}, we have the above estimate for every x ∈ B_1 and every sufficiently small r. From this, Fubini's theorem and Young's inequality for convolutions, we obtain the conclusion. This completes the proof.
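The Fubini/Young step at the end of the proof rests on the convolution inequality, which we record:

```latex
% Young's inequality for convolutions (the case needed here):
\|f * g\|_{L^q(\mathbb{R}^d)}
\le \|f\|_{L^1(\mathbb{R}^d)}\,\|g\|_{L^q(\mathbb{R}^d)},
\qquad q\in[1,\infty].
% Averaging the local estimate over ball centers x \in B_1 via Fubini's
% theorem produces a convolution of |\nabla u|^q (extended by zero) against
% a normalized indicator, to which the inequality above applies.
```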
Proposition A.2. Let β ∈ (0, 1) and a ∈ R^{d×d} be a symmetric matrix with entries in C^{0,β}(B_2). Then u ∈ C^{1,β}_loc(B_2) and there exists C(β, d, Λ) < ∞ such that the two stated estimates hold. Proof. We will explain how to extract the statement of the proposition from the gradient Hölder estimate found in [20, Theorem 3.13]. The latter states the estimate under a smallness assumption on the oscillation of the coefficients. After changing the scale, we obtain the corresponding statement in B_r, which asserts that, under the analogous assumption at scale r, there exists C(β, d, Λ) < ∞ such that the scaled estimate holds. Therefore we take r := 1/2 ∧ [a]^{−1/β}_{C^{0,β}(B_2)} and apply the previous statement in every ball B_r(x) with x ∈ B_1 to obtain the local estimates. After a covering argument, we obtain the global bound, which yields the proposition.
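The covering argument concluding the proof is quantitative in the Hölder seminorm; a sketch of the counting:

```latex
% With r := \tfrac{1}{2}\wedge[\mathbf{a}]_{C^{0,\beta}(B_2)}^{-1/\beta},
% the ball B_1 can be covered by at most
N \le C(d)\,r^{-d}
  \le C(d)\big(1+[\mathbf{a}]_{C^{0,\beta}(B_2)}^{d/\beta}\big)
% balls of radius r. Taking the maximum of the local C^{1,\beta} estimates
% over these balls yields a global estimate on B_1 whose constant depends
% polynomially (with explicit exponent) on [\mathbf{a}]_{C^{0,\beta}(B_2)},
% which is exactly the explicit dependence advertised in this appendix.
```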
and similarly for F m+1 . Computing the directional derivative, recalling that we assume that z ↦ L(z, x) is a (m + 2)th degree polynomial, we get by (B.2) that Consequently, we have which is (B.1), concluding the proof.

Appendix C. Linearization errors
In this appendix we compute the equation satisfied by a higher-order linearization error and thereby obtain gradient estimates.
Proof. Throughout the proof we use the notation s_k = ∑_{j=1}^k w_j/j! and ξ_0 = v − u, so that ξ_k = ξ_0 − s_k.
Step 1. Recalling that F_1 = 0, we may rewrite the equation. By the equations of u, v and w_k, we obtain the identity below. It thus remains to estimate E_n.
Step 5. We show that (C.11) holds. For this, we first abbreviate F_k = F_k(∇u, ∇w_1, . . . , ∇w_{k−1}, x) and observe the identity that follows by definition. Second, we observe that, by induction on j ≥ 2, the corresponding identity holds for all n ≥ j. Third, by commutativity of addition, we may reorder the sum. Finally, letting F_m = 0 for m < 2 and S^{(j)}_n = 0 for j > n for notational convenience, we note that the above equation may be rewritten as (C.11).
Step 6. Conclusion. We have that (C.12) holds. Indeed, by a Taylor expansion, we obtain the pointwise bound below; applying this with z_0 = ∇u and z = ∇v − ∇u gives (C.12). Combining this with the previous steps yields the desired estimate (C.1) for E_n. Finally, by the Hölder and Young inequalities, we obtain the conclusion for all q ∈ [2, ∞) and r ∈ (0, R]. Let δ_0 be the Meyers exponent corresponding to Λ. Let n + 1 − h n + 1 δ 0 and q := 16 δ_0 n(n + 1).
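The Taylor expansion referred to in this step is presumably the classical one with uniform remainder bound; for a C^{n+2} function of the gradient variable it reads:

```latex
% For G \in C^{n+2} and z_0, z \in \mathbb{R}^d:
\Big| G(z_0+z) - \sum_{j=0}^{n+1} \frac{1}{j!}\,D^jG(z_0)\,z^{\otimes j} \Big|
\le \frac{\|D^{n+2}G\|_{L^\infty}}{(n+2)!}\,|z|^{n+2}.
% Applying this with z_0 = \nabla u and z = \nabla v - \nabla u bounds the
% nonlinear remainder pointwise by a power of |\nabla v - \nabla u|, which
% is then integrated using the H\"older and Young inequalities.
```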

Appendix D. Regularity for constant coefficient linearized equations
In this appendix we prove a lemma tracking the regularity of a solution (w_1, . . . , w_n) of the linearized system in the case that L is a smooth, constant-coefficient Lagrangian.
Below we denote by C a constant depending only on parameters (m, η, M, β, d, Λ). It may change from line to line.
Step 7. We now prove that (D.14) is valid for j = m, giving also (D.7). Applying the Caccioppoli estimate (D.20), together with (D.5), which was proved in Step 6 above, and giving up volume factors, we obtain that ‖∇w_m − (∇w_m)_{B_{εr_m}(y)}‖_{L^2(B_{εr_m}(y))} ≤ CE. Thus, if, on the one hand, r = |x − y| ∈ (0, r_m], we get the desired bound by the above two displays.

Step 8. We finally sketch the proof of (D.8). Since it is very similar to the above reasoning, we will omit most of the details. We prove the statement by induction in m and in k. First, we observe by differentiation that ∂^k_{x_j}u satisfies the equation −∇ ⋅ (b∇∂^k_{x_j}u) = ∇ ⋅ F_k(∇u, ∇∂_{x_j}u, . . . , ∇∂^{k−1}_{x_j}u). Thus we can apply (D.5) and (D.8) for w_k = ∂^k_{x_j}u recursively and obtain, by polarization as in Lemma 2.6, that, for every k ∈ {1, . . . , n + 1} and η ∈ [1/2, 1), there is a constant C(k, η, M, β, d, Λ) such that the stated bound holds. Next, since w_1 solves −∇ ⋅ (b∇w_1) = 0 and z ↦ b(z) = D²_pL(∇u(z)) is in C^{n,β} by (D.24), we may differentiate the equation at most n times and obtain that w_1 ∈ C^{n+1,β}; it is also straightforward to show that w_1 satisfies (D.8) for m = 1. This is just classical Schauder theory.
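The differentiation step at the start of Step 8 can be checked at first order, under the assumption that u is smooth enough to justify the computation:

```latex
% Differentiate -\nabla\cdot(D_pL(\nabla u)) = 0 in the direction x_j:
0 = \partial_{x_j}\big(\nabla\cdot D_pL(\nabla u)\big)
  = \nabla\cdot\big(D^2_pL(\nabla u)\,\nabla\partial_{x_j}u\big),
% so \partial_{x_j}u solves the first linearized equation
-\nabla\cdot\big(\mathbf{b}\,\nabla\partial_{x_j}u\big) = 0,
\qquad \mathbf{b}(z) := D^2_pL(\nabla u(z)).
% Iterating, each further derivative \partial^k_{x_j}u picks up lower-order
% terms in the previous derivatives, which is the source of the right-hand
% side \nabla\cdot\mathbf{F}_k(\nabla u,\nabla\partial_{x_j}u,\dots,
% \nabla\partial^{k-1}_{x_j}u).
```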
By homogeneity, we find a homogeneous polynomial q_{n+1} of degree n + 1 solving the equation −∇ ⋅ (D²_pL(∇q_1) ∇(q_{n+1}/(n + 1))) = ∇ ⋅ F_n(∇q_1, ∇(q_2/2), . . . , ∇(q_n/n)) in R^d. Notice that there is a degree of freedom in the choice of q_{n+1}: the solution is unique up to an a-harmonic polynomial of degree n + 1. We will fix this shortly.
To draw parallels between this appendix and Appendix C, we set w_n := q_{n+1}/(n + 1) and define ξ_n accordingly. By the estimate in Appendix C, we have a bound on E_n. Taking the divergence gives us, by the equations of u, w_1, . . . , w_n, that −∇ ⋅ (a∇ξ_n) = ∇ ⋅ E_n.
Using the induction assumption we get that ‖E_n‖_{L^σ(B_r)} ≤ Cr^{n+1−ε}.
Now Lemma E.2 below allows us to identify the homogeneous a-harmonic polynomial part of q_{n+1} of degree n + 1 in such a way that the required bound holds uniformly over small radii. This proves the induction step, and finishes the proof.
Proof. We proceed via harmonic approximation. Let v_r ∈ u + H^1_0(B_r) be A-harmonic.
In particular, letting q̄_t be the minimizing element of A_{n+1} in the definition of D(t), we get by the triangle inequality that t^{−n}‖∇q̄_{t/2} − ∇q̄_t‖_{L^p(B_t)} ≤ Ct^{α−n}(D(t/2) + D(t)) ≤ Ct^{α−n}(D(r) + M).
This allows us to identify q ∈ A_{n+1} such that the convergence estimate holds for t ∈ (0, r). The proof is complete by the easy estimate D(1 − ε) ≤ C_ε(‖∇u‖_{L^2(B_1)} + M).