Relaxation analysis in a data driven problem with a single outlier

We study a scalar elliptic problem in the data driven context. Our interest is the relaxation of a data set that consists of the union of a linear relation and a single outlier. The data driven relaxation is given by the union of the linear relation and a truncated cone that connects the outlier with the linear subspace.


Introduction
The data driven perspective is new in the field of material science and partial differential equations; we mention [16] and [6] as the two fundamental contributions of this young field. In the data driven perspective, certain laws of physics are accepted as invariable, e.g. balance of forces or compatibility. On the other hand, material laws (such as Hooke's law) can be questionable. In the classical approach, measurements are used to estimate the constants of material laws. The new paradigm is to use a set of data points, obtained from measurements; the data points are not interpreted as realizations of some law, but calculations and analysis are based directly on the cloud of data points.
On a more formal level, one introduces a set E of functions that satisfy the invariable physical laws. A second set D denotes those functions that are consistent with the data. In this setting, the aim is to find functions in E that minimize the distance to the data set D.
The emphasis in [16] was to derive computing algorithms for this new approach. The mathematical analysis in [6] establishes well-posedness properties and introduces, among other tools, data convergence and relaxation in the data driven context. It is shown that data driven relaxation differs markedly from traditional relaxation, see the discussion below.
In the work at hand, we investigate a scalar setting, which can be used, e.g., in the modelling of porous media. We seek two functions, G (a gradient) and J (a negative flux). Given a domain Q ⊂ R^n and a source f : Q → R, the invariable physical laws are the compatibility G = ∇U for some U and the mass conservation ∇ · J = f (in other contexts, the second law is the balance of forces). We introduce

E := { (G, J) ∈ L^2(Q; R^n) × L^2(Q; R^n) | G = ∇U for some U ∈ H^1_0(Q; R), ∇ · J = f } .   (1.1)

In the classical approach, one might be interested in the linear material law given by J = AG for A ∈ R^{n×n}. We note that a pair (G, J) ∈ E with J = AG can be found by solving the scalar elliptic equation ∇ · (A∇U) = f.
In the data driven perspective, the material law is replaced by a data set D. In a simple setting, we are given a local data set D_loc := {(g_i, j_i) | i ∈ I} ⊂ R^n × R^n for some index set I. This data set might be obtained by measurements; in this case, the index set I is finite and D_loc is a cloud of points in R^n × R^n. The set of functions that respect the data is

D := { (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) | (g(x), j(x)) ∈ D_loc for a.e. x ∈ Q } .   (1.2)

In the data driven perspective, the task is: find a pair (G, J) ∈ E that minimizes the distance to the set D.
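To fix ideas, the distance that is to be minimized can be evaluated pointwise for a finite data cloud. The following sketch is purely illustrative; the helper name and the sampled cloud are our own choices (here with the linear law j = g in R^2):

```python
import numpy as np

def pointwise_distance(g, j, data_points):
    """Distance of a single state (g, j) to a finite local data set D_loc."""
    z = np.concatenate([g, j])
    return min(np.linalg.norm(z - np.concatenate([gi, ji]))
               for gi, ji in data_points)

# Data cloud sampled from the linear law j = g (i.e. A = id in R^2):
cloud = [(np.array([t, 0.0]), np.array([t, 0.0]))
         for t in np.linspace(-1.0, 1.0, 21)]

# A state on the law is (essentially) at distance 0 from the cloud:
d_on = pointwise_distance(np.array([0.5, 0.0]), np.array([0.5, 0.0]), cloud)
# The outlier (0, e1) keeps a positive distance to the sampled law:
d_out = pointwise_distance(np.array([0.0, 0.0]), np.array([1.0, 0.0]), cloud)
```

In a data driven solver, this pointwise distance would be integrated over Q and minimized among pairs (G, J) ∈ E.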
We remark that we recover the classical problem if we introduce

D^A_loc := { (g, j) ∈ R^n × R^n | j = Ag }   (1.3)

and the corresponding set of functions D^A as in (1.2). For typical choices of Q, A, and f, the linear problem can be solved; in this case, there exists (G, J) ∈ E ∩ D^A and the minimization task has a solution that realizes the distance 0. The advantage of the data driven perspective is the generality of the data set. In the minimization task above, an arbitrary data set D can be considered. Three different types of questions can be asked: Is the minimization problem well-posed? Which limits are attainable in the sense of data convergence? What is the relaxation of a given data set?
The present paper is devoted to the third question. We investigate a special data set: D_loc is the union of D^A_loc and D^B_loc, where D^A_loc is as in (1.3) and D^B_loc is a one-point set containing a single outlier. In this setting, the minimization problem is solvable with distance 0, since D is larger than D^A. Our interest is to study the relaxation problem.
The motivation to study the data set D_loc = D^A_loc ∪ D^B_loc is to understand the effect of a single outlier in a cloud of measurement points. When an increasing number of data points approximates the plane of Hooke's law D^A_loc, then the data driven solutions to these data sets approximate the classical solution with Hooke's law; this is one of the results in [6]. Our interest is an outlier: When the measurements contain a single point that is not in D^A_loc, the data driven solutions can always make use of this data point. How far off can the data driven solutions be because of the single outlier? Our result characterizes the relaxed data set and shows that it is changed only locally, in the vicinity of the outlier. In this sense, the outlier has only a limited effect on the data driven solutions.
In more mathematical terms, the analysis of this article is concerned with sequences of pairs (G_h, J_h) ∈ E and (g_h, j_h) ∈ D that converge in the sense of data convergence. We are interested in possible limit functions (g, j). Possible values of constant limit functions are denoted as D^relax_*. The set D^relax_* contains D_loc; it is the "data driven convexification" of D_loc. Our main result is the characterization of D^relax_*. We find that the set is strictly larger than D_loc, but smaller than the convex hull of D_loc. We characterize D^relax_* as the union of D^A_loc with a truncated cone that connects the additional point D^B_loc with the hyperplane D^A_loc. Denoting the truncated cone by C, our main result states D^relax_* = C ∪ D^A_loc, see Theorem 1.2. The proof consists of two parts. The inclusion D^relax_* ⊃ C ∪ D^A_loc requires the construction of a sequence of functions that use a fine mixture of materials. We construct simple and iterated laminates. In order to realize a point on the lateral boundary of the cone C, it is sufficient to construct a simple laminate with phases A and B. For a point in the interior of C, an iterated laminate must be constructed. Such iterated laminates are quite standard; we mention [11] and [20].
The other part of the proof regards the inclusion D^relax_* ⊂ C ∪ D^A_loc. We show this inclusion with an application of the div-curl lemma. In our context, the notion of data convergence of [6] provides exactly the prerequisites for applying the div-curl lemma along data convergent sequences.
Literature. Relaxation is a classical problem in the calculus of variations. For a functional I : X → R on a Banach space X, one introduces the relaxed functional I^relax : X → R as I^relax(u) := inf { lim inf_k I(u_k) | u_k ⇀ u }. A related notion is that of quasiconvexity; loosely speaking, quasiconvex functionals coincide with their relaxation. For fundamental results on these important concepts we refer to [2, 7, 10]. For a functional I which is not quasiconvex, one can construct laminates or more complex patterns in order to find the relaxed functional and/or the quasiconvex envelope of the integrand, see e.g. [3] and [5]. For an introduction we refer to [20].
The data driven perspective introduces a new concept of relaxation. For a data set D, the task is to study the relaxed data set, which consists of points that are attainable as limits in the sense of data convergence. A relaxed data set in this sense has been calculated in [6] for a problem in the vectorial case: For a data set that describes a non-monotone material law (corresponding to a non-convex energy), the authors determine the relaxed data set, compare (3.26) and Theorem 3.6 in [6]. The relaxed data set is larger than the original data set, but it is smaller than the convex hull of the original data set. A similar phenomenon appears in our main result.
We want to emphasize the close relation to homogenization. In the primal problem of homogenization, one prescribes different material laws at different points x of the macroscopic domain and asks for the effective law of fine mixtures. Building upon such results, one then asks: With arbitrary material laws at different points x (material laws from some admissible set), which effective material laws can be obtained by homogenization? This leads to bounds for effective material laws as in [12, 13, 17] and to the optimization of the distribution of the single material laws, see [1, 4]. For early results in this direction, which also highlight the relation to relaxation, see [18, 19].
Our main result may be interpreted from the perspective of homogenization. We use the two material laws D^A and D^B in different regions of the macroscopic domain, possibly in a fine mixture. We ask which effective laws can be obtained in the limit. A caveat regarding this description is that D^B is not a linear relation and hence does not describe a material law in the classical setting of homogenization.
We make use of the div-curl lemma in the second part of the proof. This lemma is also used in the compensated compactness method of homogenization, see [14, 21]. Related concepts are those of Γ-convergence [8], Young measures [10], and H-convergence [11].
For recent developments of the data driven approach we refer to [9] and [15], which are both concerned with numerical aspects.

The main result
Let n ≥ 2 be the dimension, Q ⊂ R^n a bounded Lipschitz domain, f ∈ H^{-1}(Q; R) a given source, and A ∈ R^{n×n} a positive definite symmetric matrix. We consider the local material data sets

D^A_loc := { (g, j) ∈ R^n × R^n | j = Ag } ,   (1.4)
D^B_loc := { (0, e_1) } ,   (1.5)

and D_loc := D^A_loc ∪ D^B_loc. (1.6) We therefore enrich the data set D^A_loc of the classical approach with the one-point set D^B_loc. We choose here (0, e_1) ∉ D^A_loc as the position of the outlier; by elementary transformations, an arbitrary outlier can be analyzed. Functions with values in the data set are defined by

D := { (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) | (g(x), j(x)) ∈ D_loc for a.e. x } .   (1.7)

We recall that the fundamental task in the data driven approach is to find (G, J) ∈ E from (1.1) that minimizes the distance to D. In the above setting, a vanishing distance can be realized, since D is larger than D^A.
Our interest is to study the relaxed data set. We focus here on constant states that can be approximated in the sense of data convergence with sequences in E × D. We use the notion of data convergence of Definition 3.1 in [6].

Definition 1.1 (Relaxed data set). We use E from (1.1) and D from (1.7). A pair (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) is in the relaxed data set, and we write (g, j) ∈ D^relax, if the following holds: There exist sequences (g_h, j_h) and (G_h, J_h) and a limit (G, J) ∈ E such that, for every h, (g_h, j_h) ∈ D and (G_h, J_h) ∈ E. Furthermore, we demand that the pair ((g_h, j_h), (G_h, J_h)) converges in the sense of data convergence to ((g, j), (G, J)) as h → 0, which means that

g_h ⇀ g, j_h ⇀ j, G_h ⇀ G, J_h ⇀ J weakly in L^2(Q; R^n),
g_h − G_h → g − G, j_h − J_h → j − J strongly in L^2(Q; R^n).

We introduce the subset of attainable values,

D^relax_* := { (g, j) ∈ R^n × R^n | the constant functions (g, j) belong to D^relax } .   (1.9)

We remark that the relaxed data set D^relax can also be characterized as a Kuratowski limit. The precise statement is provided in Lemma 1.4 below.
In our main result, we characterize the relaxed data set D^relax_*. We prove that it is the union of two sets: the hyperplane D^A_loc and a truncated cone C with vertex in the outlier D^B_loc. The cone is truncated by the hyperplane D^A_loc. We define the cone in the following steps. For b ∈ [0, 1], we set

C_b := { (g, j) ∈ R^n × R^n | j = b e_1 + Ag , g · Ag ≤ (1 − b) g_1 } ,   (1.10)
C := ⋃_{b ∈ [0,1]} C_b .   (1.11)

Our main result is the characterization of the relaxed data set.
Theorem 1.2. With the truncated cone C of (1.11), the set D^relax_* of Definition 1.1 is given by

D^relax_* = C ∪ D^A_loc .

Theorem 1.2 characterizes the relaxation of the data set in the context of data driven analysis. The data driven convexification of a set consisting of a hyperplane and an outlier yields the union of the plane with a truncated cone that connects the outlier with the plane, compare Figure 2. In particular, the data driven relaxation does not yield the (classical) convexification of the original set, which is an infinite strip (the infinite strip can be regarded as a truncated cone with opening angle π; in this sense, the data driven relaxation yields a cone with smaller opening angle).
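For concreteness, the characterization can be probed numerically. The sketch below assumes the description of the cone used later in the paper (j = b e_1 + Ag and g · Ag ≤ (1 − b) g_1 for some b ∈ [0, 1]); the function name and the tolerance are our own choices:

```python
import numpy as np

def in_relaxed_set(g, j, A, tol=1e-9):
    """Check (g, j) ∈ D^relax_* = D^A_loc ∪ C for the outlier (0, e1).

    C is the union over b in [0, 1] of the sets
    C_b = {(g, b e1 + A g) : g · A g <= (1 - b) g_1}.
    """
    g = np.asarray(g, dtype=float)
    j = np.asarray(j, dtype=float)
    r = j - A @ g                      # residual; on C_b it equals b * e1
    if np.linalg.norm(r) < tol:        # b = 0: the hyperplane D^A_loc
        return True
    b = r[0]
    b_e1 = np.zeros_like(r)
    b_e1[0] = b
    if np.linalg.norm(r - b_e1) > tol or not (-tol <= b <= 1.0 + tol):
        return False                   # j - A g is not of the form b e1 with b in [0, 1]
    return g @ (A @ g) <= (1.0 - b) * g[0] + tol

A = np.eye(2)
on_plane = in_relaxed_set([0.5, -0.3], A @ np.array([0.5, -0.3]), A)  # True (b = 0)
at_vertex = in_relaxed_set([0.0, 0.0], [1.0, 0.0], A)                 # True (b = 1)
in_strip_not_cone = in_relaxed_set([-0.2, 0.0], [0.3, 0.0], A)        # False
```

The last example lies in the classical convex hull (the infinite strip) but violates g · Ag ≤ (1 − b)g_1, illustrating that the relaxed set is strictly smaller than the convex hull.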

Comments on the main result
In this work, we concentrate on the study of constant functions g and j that can be approximated in the sense of data convergence. We therefore include the following open problem regarding the relaxed data set D^relax.
Open Problem 1.3. It is not clear whether or not D^relax is given by some local set D^relax_loc as in (1.2). Furthermore, even if this is the case, it is not clear whether or not D^relax_loc coincides with D^relax_*.
Our definition of D^relax was given in terms of sequences. As noted above, the set D^relax can also be described in terms of a Kuratowski limit as in [6].

Lemma 1.4 (Kuratowski limit). Let data convergence be denoted as ∆-lim. We use Kuratowski convergence of sets, which coincides with Γ-convergence of the indicator functions. With these topological tools, the data relaxation can be written as a limit:

D^relax × E = K(∆)-lim (D × E) .   (1.13)

Proof. Similar to [6], the sequential characterization of the Kuratowski limit follows from an (equi-)transversality condition.
Step 1: Transversality. We claim that there exist constants C_1, C_2 > 0 such that every pair z = (g, j) ∈ D and Z = (G, J) ∈ E satisfies

‖z‖ + ‖Z‖ ≤ C_1 ‖z − Z‖ + C_2 .   (1.14)

The inequality is concluded with the help of the positivity of A ∈ R^{n×n}: ξ · Aξ ≥ c_0 |ξ|^2 for some c_0 > 0. From this estimate and the fact that z ∈ D implies g = 0 on {j ≠ Ag}, we deduce (1.15), where we have used Poincaré's inequality and G = ∇U in the last step; here and below, C denotes a constant that depends only on A, Q, n and that may change from line to line. Together with (1.15), we deduce a first estimate. The triangle inequality yields an analogous inequality for g. Using (1.18) and Young's inequality, this provides (1.19). This estimate can be inserted in (1.17), and we obtain the corresponding estimate for G. By the triangle inequality, we control all functions g, G, j, J in L^2(Q; R^n) by the right-hand side of (1.19). This proves the transversality (1.14).
Step 2: Sequential characterization of Kuratowski convergence. The Kuratowski limit K(∆)-lim D × E is given by the domain of the Γ-limit of the (constant sequence of the) indicator function of D × E. To characterize this set, consider any point (z_0, Z_0) ∈ (L^2(Q; R^n))^2. Since Γ-convergence is a local property, when computing the Γ-limit in this point we may restrict ourselves to any neighborhood of (z_0, Z_0) with respect to the ∆-topology. In particular, we may choose a neighborhood in which all pairs (z, Z) ∈ (L^2(Q; R^n))^2 satisfy ‖(z − z_0) − (Z − Z_0)‖_{L^2(Q; R^n)} < 1 (note that strong convergence of differences is part of the definition of ∆-convergence). Then the transversality property implies that we can restrict the computation of the Γ-limit to a bounded set. (Figure 1 shows only the plane (g_1, j_1) ∈ R^2. The diagonal line corresponds to the set D^A_loc of points with j = g. The exceptional point P is (g_1, j_1) = (0, 1), corresponding to the one-point set of additional data points.) On bounded sets, the data convergence topology is metrizable. Hence the topological and the sequential characterization of Γ-convergence coincide [8, Proposition 8.1].
The lim inf and lim sup inequalities that characterize Γ-convergence of the indicator function of D × E to that of D^relax × E are described by sequential properties; these are equivalent to the characterization of D^relax given in Definition 1.1.

Equivalent descriptions for the truncated cone C
A special case. In the case n = 2 and A = id ∈ R^{2×2}, the cone C is

C = { (g, j) ∈ R^2 × R^2 | j_2 = g_2 , j_1 = g_1 + 1 − 2r , |g − (r, 0)| ≤ r for some r ∈ [0, 1/2] } .   (1.20)

The last condition expresses that g = (g_1, g_2) is contained in the disc B_r((r, 0)) with radius r and center (r, 0). Because of j_1 = g_1 + 1 − 2r, the disc is mapped into an inclined plane. In order to see the equivalence, it suffices to use the new variable r = (1 − b)/2. The condition g · Ag ≤ (1 − b)g_1 becomes g_1^2 + g_2^2 ≤ 2r g_1.
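The completing-the-square step behind this equivalence can be spelled out:

```latex
g\cdot g \le (1-b)\,g_1 = 2r\,g_1
\;\Longleftrightarrow\;
g_1^2 - 2r\,g_1 + g_2^2 \le 0
\;\Longleftrightarrow\;
(g_1-r)^2 + g_2^2 \le r^2
\;\Longleftrightarrow\;
g \in \overline{B_r\big((r,0)\big)} .
```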
The lateral boundary of C can be expressed as ∂_lat C := ⋃_{b∈[0,1]} ∂_lat C_b, where ∂_lat C_b is given by equality in the cone condition, g · Ag = (1 − b) g_1. With this notation, the boundary of C is given by the union of ∂_lat C with the base C_0 ⊂ D^A_loc. We can generalize (1.20) as follows. Let y ∈ R^n be the vector that satisfies Ay = e_1. We introduce the scalar product ⟨v_1, v_2⟩_A := v_1 · Av_2 and the associated norm |·|_A. The corresponding sphere with center (1/2)y that contains 0 is

S^A_y := { v ∈ R^n | |v − (1/2) y|_A = (1/2) |y|_A } .   (1.21)

Then (g, j) ∈ ∂_lat C_b if and only if g ∈ (1 − b) S^A_y and j = b e_1 + Ag. In fact, for b = 1, there holds ∂_lat C_b = {(0, e_1)} and the equivalence is valid. For b ∈ [0, 1), we write g = (1 − b)v and find that g · Ag = (1 − b) g_1 is equivalent to v · Av = v · Ay = v_1, i.e. to v ∈ S^A_y. For later use, we include the following alternative characterization of ∂_lat C_b.
Lemma 1.5. The lateral boundary can also be written as

∂_lat C_b = { (g, j) | j = b e_1 + Ag , g = (1 − b) (ν_1/(ν · Aν)) ν for some ν ∈ R^n \ {0} } .   (1.22)

Proof. We fix b ∈ [0, 1] and denote by K_b the right-hand side of (1.22). Consider any (g, j) ∈ ∂_lat C_b with j = b e_1 + Ag and g ≠ 0. Then g · Ag = (1 − b) g_1, and hence (g, j) ∈ K_b, where the choice ν = g provides the last implication.
Vice versa, let (g, j) with g ≠ 0 be in K_b. By definition, there exists ν ≠ 0 with g = (1 − b)(ν_1/(ν · Aν)) ν. A direct computation yields g · Ag = (1 − b)^2 ν_1^2/(ν · Aν) = (1 − b) g_1, so that (g, j) ∈ ∂_lat C_b.

Construction of approximating sequences
The goal of this section is to prove the inclusion C ∪ D^A_loc ⊂ D^relax_*. For an arbitrary point on the lateral boundary of the cone C, we will use laminates to construct data convergent sequences (g_h, j_h) and (G_h, J_h).
In order to motivate the subsequent constructions, let us present what can be achieved in the case A = id with simple laminates of horizontal or vertical layers. With respect to Figure 1, we can say: the simple laminates show that all points on the vertical line of the cone and all points on the horizontal line of the cone can be constructed.
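The weak limits realized by these two constructions are just volume-fraction averages of the phase values specified in Remarks 2.1 and 2.2 below; a minimal numerical sketch (the helper function is our own):

```python
import numpy as np

def laminate_average(value_A, value_B, b):
    """Weak limit of a two-phase laminate: piecewise constant fields with
    volume fractions (1 - b, b) converge weakly to their cell average."""
    return (1.0 - b) * np.asarray(value_A, float) + b * np.asarray(value_B, float)

b = 0.3
# Horizontal layers (Remark 2.1): (g, j) = (0, 0) in A-layers, (0, e1) in B-layers.
g_hor = laminate_average([0.0, 0.0], [0.0, 0.0], b)   # limit g = 0
j_hor = laminate_average([0.0, 0.0], [1.0, 0.0], b)   # limit j = b e1 (vertical line)
# Vertical layers (Remark 2.2): (g, j) = (e1, e1) in A-layers, (0, e1) in B-layers.
g_ver = laminate_average([1.0, 0.0], [0.0, 0.0], b)   # limit g = (1 - b) e1
j_ver = laminate_average([1.0, 0.0], [1.0, 0.0], b)   # limit j = e1 (horizontal line)
```

As b ranges over [0, 1], the two constructions sweep out the vertical and the horizontal line of the cone in Figure 1.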

Remark 2.1 (Horizontal layers).
We consider A = id and fix b ∈ (0, 1). We decompose Q into thin horizontal layers such that e_1 is a tangential vector of the interfaces. The layers have widths (1 − b)h and bh in an alternating fashion. The layers of width (1 − b)h are called A-layers, the other layers are B-layers. In the A-layers, we set J_h := j_h := G_h := g_h := 0; in the B-layers, we set G_h := g_h := 0 and J_h := j_h := e_1.
By construction, (g_h, j_h) ∈ D. Since the layers are horizontal, J_h has a vanishing divergence. As a trivial function, G_h is a gradient. We find (G_h, J_h) ∈ E. The functions converge weakly in L^2(Q), and the differences g_h − G_h and j_h − J_h converge strongly. We therefore obtain that the vertical line {(g, j) | g = 0, j = (j_1, 0, ..., 0), j_1 ∈ [0, 1]} is contained in D^relax_*.

Remark 2.2 (Vertical layers). We consider again A = id. We proceed as in Remark 2.1, but we now decompose Q into thin layers with normal vector e_1. In the interior of Q, in the A-layers, we set J_h := j_h := G_h := g_h := e_1; in the B-layers, we set G_h := g_h := 0 and J_h := j_h := e_1.
Up to truncations near the boundary, one can verify (G_h, J_h) ∈ E, (g_h, j_h) ∈ D, and the convergence properties. We therefore obtain that the horizontal line {(g, j) | g = (g_1, 0, ..., 0), g_1 ∈ [0, 1], j = e_1} is contained in D^relax_*.

After these motivating examples, we move on to the construction in the general case.

Lemma 2.3 (Simple laminates). There holds D^A_loc ∪ ∂_lat C ⊂ D^relax_*.

Proof. The inclusion D^A_loc ⊂ D^relax_* holds trivially. Indeed, given g ∈ R^n and j = Ag ∈ R^n, it suffices to use the constant functions j_h = j, J_h = 0, g_h = g, and G_h = 0. Analogously, the single point D^B_loc, which is the vertex of the cone, is contained in D^relax_*. It therefore suffices to show that, for b ∈ (0, 1), the set ∂_lat C_b belongs to D^relax_*. We consider a point (g, j) ∈ ∂_lat C_b. By Lemma 1.5, we can express this point in the form g = (1 − b)(ν_1/(ν · Aν)) ν, j = b e_1 + Ag for some ν ∈ R^n \ {0}.
Step 1: Construction of approximating sequences. For h > 0, we consider a layered subdivision of Q using the direction ν: layers A^h and B^h of widths (1 − b)h and bh alternate, with ν as normal vector of the interfaces. For the volume fractions, we note that |B^h| → b|Q| and |A^h| → (1 − b)|Q| as h → 0. With g_a := (ν_1/(ν · Aν)) ν, the field (g_h, j_h) is chosen as

(g_h, j_h) := (g_a, A g_a) in A^h ,   (g_h, j_h) := (0, e_1) in B^h .   (2.3)

By definition of the fields, (g_h, j_h) ∈ D is satisfied. We note that the construction assures j_h · ν = e_1 · ν in B^h and j_h · ν = (A g_a) · ν = ν_1 = e_1 · ν in A^h; the normal flux is continuous across the interfaces. This shows ∇ · j_h = 0 in Q.
Using v_h, we define u_h as the continuous, piecewise affine function with ∇u_h = g_h. We may introduce the affine limit function u(x) := g · x. Then u_h ⇀ u and ‖u_h − u‖_{L^∞} ≤ Ch hold for a constant C that does not depend on h.
In order to define a corresponding pair (G_h, J_h), we choose a cut-off function ϕ_h ∈ C_c^∞(Q; [0, 1]) and set G_h := ∇(ϕ_h u_h) as well as J_h := j_h + J_f, where J_f is an h-independent field with ∇ · J_f = f.

Step 2: Verification of the properties. By definition, G_h is the gradient of a function in H^1_0(Q). The field J_h has the divergence ∇ · J_h = ∇ · j_h + ∇ · J_f = f. This shows (G_h, J_h) ∈ E.
We now verify the data convergence property. The weak convergences are clear; the strong convergence of the differences follows from the following facts: (1 − ϕ_h) → 0 strongly in L^2(Q) implies convergence to 0 for the first term. The pointwise convergence ϕ_h ∇u → ∇u with the uniform bound |ϕ_h ∇u| ≤ |∇u| implies strong convergence of the second term to g = ∇u. The last term (u_h − u)∇ϕ_h is uniformly bounded and converges to zero almost everywhere, hence strongly to 0. Altogether, we obtain that ((g_h, j_h), (G_h, J_h)) → ((g, j), (G, J)) in the sense of data convergence and conclude that (g, j) ∈ D^relax_*.

We next show that also the interior of the cone C can be reached, by suitable iterated laminate constructions.

Lemma 2.4 (Iterated laminates). The cone C of (1.11) is contained in the relaxed data set,

C ⊂ D^relax_* .   (2.9)

Proof. In view of Lemma 2.3, it remains to show that the interior of the cone is contained in D^relax_*. Let therefore p_C = (g_C, j_C) ∈ C \ ∂C be arbitrary; our aim is to show p_C ∈ D^relax_*. This is done by constructing sequences (g_h, j_h) ∈ D and (G_h, J_h) ∈ E as before. In this proof, however, we have to use iterated laminates.
Step 1: Preparations. Let p_C = (g_C, j_C) be a point in the interior of the cone. We show in Lemma A.1 of the appendix that we can write p_C as a convex combination as follows: There exist two points p_A = (g_A, j_A) ∈ D^A_loc and p_L = (g_L, j_L) ∈ ∂_lat C and a parameter λ ∈ (0, 1) such that

p_C = λ p_L + (1 − λ) p_A ,   (2.10)

and such that, additionally, (2.11) holds. As in the proof of Lemma 2.3, we exploit Lemma 1.5: We can express the point p_L ∈ ∂_lat C_b as a convex combination p_L = (1 − b) p_a + b p_b of the two points p_a := (g_a, A g_a), with g_a := (ν_1/(ν · Aν)) ν for some vector ν ∈ R^n \ {0}, and p_b := (0, e_1). The iterated laminate is constructed as a coarse laminate with layers of width √h and a fine laminate with layers of order h. Every second layer of the coarse mesh uses p_A = (g_A, j_A). The fine laminate uses (g_a, j_a) and (g_b, j_b). The two fields in the fine layers produce, on average, p_L = (g_L, j_L). The mixture of the coarse layers with values p_A and p_L provides the desired value p_C. For a sketch, see Figure 3.
Step 2: Construction of the approximating sequence. From now on, the points p_C, p_A, p_L, p_a, p_b and the volume fractions λ and b are fixed. In addition to ν, we introduce the normal vector θ of the coarse layering, see (2.12). For every k ∈ Z, the coarse layers L^h_k and M^h_k are defined as slabs with normal θ, of widths λ√h and (1 − λ)√h, respectively. The unions are denoted as L^h := ⋃_{k∈Z} L^h_k and M^h := ⋃_{k∈Z} M^h_k. The iterated laminate is based on a subdivision of every layer L^h_k into fine layers L^h_{k,a} and L^h_{k,b} with normal ν and widths of order (1 − b)h and bh. The unions are denoted as L^h_b := ⋃_{k∈Z} L^h_{k,b} and L^h_a := ⋃_{k∈Z} L^h_{k,a}. We define the fields g_h and j_h as

(g_h, j_h) := p_A in M^h ,   (g_h, j_h) := p_a in L^h_a ,   (g_h, j_h) := p_b in L^h_b .   (2.13)

We next define a function u_h : R^n → R that is piecewise affine and has, piecewise, the gradient g_h. In order to construct u_h, we introduce points x_k; the point x_k is chosen such that, if x_k happens to be in Q, it is a point in ∂L^h_k ∩ ∂M^h_{k−1}. By construction, the weak limit of g_h is g_C. We therefore set u_h(x_k) := g_C · x_k. Accordingly, in the layer M^h_{k−1}, we set u_h(x) := g_C · x_k + g_A · (x − x_k). In the layer L^h_k, we define u_h as the unique continuous function with u_h(x_k) = g_C · x_k, with the gradient g_a in L^h_{k,a} and the gradient g_b in L^h_{k,b}. A continuous function u_h exists in L^h_k since (g_a − g_b) ∥ ν. As in the proof of Lemma 2.3, we can use a cutoff function ϕ^k_h in the layer L^h_k to construct U^h_k : L^h_k → R with bounded gradient. The function U_h can be defined on all of Q by combining the functions U^h_k in the layers L^h_k with the affine functions in the layers M^h_k. By construction, the function U_h is continuous. This property follows by inserting the vector λ√h θ in normal direction of layer L^h_k, where U_h has the averaged gradient g_L, and the vector (1 − λ)√h θ in normal direction of layer M^h_k, where U_h has the gradient g_A; this is consistent with the choice of U_h(x_{k+1}). Furthermore, the function U_h has a bounded gradient.
This can be seen as in the proof of Lemma 2.3: In the layer L^h_k, the difference between u_h(x) and g_C · x_k + g_L · (x − x_k) is of order h (uniformly in x), since g_L is the average slope of u_h and u_h oscillates at order h. The gradient of the cutoff function ϕ^k_h is of order h^{−1}.
The gradient of U_h coincides with g_h except on a set with volume bounded by C√h: the strips of width h in the layers L^h_k, of which there are O(1/√h). With the choice G_h := ∇U_h, this guarantees the strong convergence ‖g_h − G_h‖_{L^2(Q)} → 0. We do not perform here the modification of U_h at the boundary ∂Q. We restrict ourselves to the observation that the weak limit of the sequence U_h is the function U : R^n → R, U(x) = g_C · x. Moreover, there holds ‖U_h − U‖_{L^∞} ≤ C√h for some constant C > 0 that is independent of h. This fact allows us to use the cutoff argument of Lemma 2.3 at ∂Q.
The construction is complete up to the choice of the sequence J_h, which we postpone to Step 3. At this point, we have found the following functions: (g_h, j_h) are functions that are compatible with the data set, G_h is a gradient (after the modification at ∂Q, it is the gradient of an H^1_0(Q)-function), and g_h − G_h converges strongly in L^2(Q). All functions converge weakly in L^2(Q), with limits g_h ⇀ g_C and G_h ⇀ g_C. If an appropriate sequence J_h can be constructed (with the right divergence and such that the difference to j_h is strongly convergent), this shows that p_C = (g_C, j_C) is in the relaxed data set D^relax_*.
Step 3: The divergence of the approximation. Let us calculate the divergence of j_h. In M^h, the flux is constant and hence ∇ · j_h = ∇ · j_A = 0 in M^h. In L^h, the construction uses the fluxes j_a and j_b, which satisfy

(j_a − j_b) · ν = (A g_a − e_1) · ν = ((ν_1/(ν · Aν)) Aν − e_1) · ν = ν_1 − ν_1 = 0 .

This shows that j_h satisfies ∇ · j_h = 0 in L^h.
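This normal-flux compatibility can be double-checked numerically for a randomly chosen positive definite matrix A and direction ν (a sketch; the particular sampling is our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)          # random symmetric positive definite matrix
nu = rng.normal(size=n)              # layer normal
e1 = np.zeros(n)
e1[0] = 1.0

g_a = (nu[0] / (nu @ A @ nu)) * nu   # phase-a gradient from Lemma 1.5
j_a = A @ g_a                        # phase-a flux
j_b = e1                             # phase-b flux (the outlier)
jump_normal_flux = (j_a - j_b) @ nu  # should vanish: (A g_a)·nu = nu_1 = e1·nu
```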
Along ∂L^h, the function j_h has jumps in the normal component, see (2.15). Important for the following construction is that the total flux through two subsequent pieces of ∂M^h vanishes, see (2.16), which follows from (2.7) and (2.11). After a rescaling by h and a shift into the origin, the local geometry is as in the right part of Figure 3; we emphasize that only three regions of unit dimensions are considered.
We claim that there exists a bounded vector field p : Σ_L → R^n with support in {x ∈ Σ_L | x · θ < 1} whose distributional divergence compensates the jumps of the rescaled flux. The function can be constructed in R^2 as follows: We use an ansatz with a rotated gradient, p := ∇^⊥Φ = (−∂_2 Φ, ∂_1 Φ), with a smooth function Φ that is piecewise affine on the boundary ∂(Σ_{L,A} ∪ Σ_{L,B}). The fact that the total flux vanishes by (2.16) implies that Φ can be chosen such that it vanishes on ∂(Σ_{L,A} ∪ Σ_{L,B}) \ ∂Σ_L. This allows, in particular, to choose a compactly supported function Φ. The rotated gradient p has all the desired properties. In higher dimensions, the two-dimensional function can be extended as a constant function in the remaining directions.
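The key point that a rotated gradient is automatically divergence-free can be verified with a quick finite-difference experiment; the potential Φ below is an arbitrary smooth test function of our own choosing, not the specific one of the proof:

```python
import numpy as np

# Sample a smooth potential Phi on a grid and form p = (-d2 Phi, d1 Phi).
h = 1e-3
x = np.arange(-0.5, 0.5, h)
X, Y = np.meshgrid(x, x, indexing="ij")
Phi = np.exp(-(X**2 + Y**2)) * np.sin(3.0 * X + Y)

def d1(F):
    return np.gradient(F, h, axis=0)   # discrete partial derivative in x1

def d2(F):
    return np.gradient(F, h, axis=1)   # discrete partial derivative in x2

p1, p2 = -d2(Phi), d1(Phi)             # rotated gradient
div_p = d1(p1) + d2(p2)                # = -d1 d2 Phi + d2 d1 Phi, vanishes

# ignore the one-sided stencils at the boundary of the grid
max_div = np.abs(div_p[2:-2, 2:-2]).max()
```

In the interior, the two mixed central differences commute exactly, so `max_div` is at the level of rounding errors.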
Rescaling p as p_h(x) := p(x/h) and extending the function p_h first periodically with period h in all directions perpendicular to θ, then extending the result periodically with period √h in the direction θ, we obtain a function p_h that has the same distributional divergence as j_h, see (2.14) and (2.15).
We set J_h(x) := j_h(x) − p_h(x). This choice assures ∇ · J_h = 0. Furthermore, the strong convergence j_h − J_h = p_h → 0 in L^2(Q) is a consequence of the boundedness of p together with the fact that p_h ≠ 0 holds only on a set with volume fraction of order h/√h = √h. This concludes the proof for f = 0. If a function J_h with ∇ · J_h = f ≠ 0 has to be constructed, it suffices to add an h-independent function J_f as in the proof of Lemma 2.3.

Necessary conditions for relaxed data points
The goal of this section is to prove the inclusion

D^relax_* ⊂ C ∪ D^A_loc ,   (3.1)

which is the inclusion in the claim of Theorem 1.2 that is not yet shown. In order to show (3.1), it suffices to fix an arbitrary pair of vectors (g, j) ∈ D^relax_* ⊂ R^n × R^n and to show (g, j) ∈ C ∪ D^A_loc. By Definition 1.1, the condition (g, j) ∈ D^relax_* means that there exist sequences (g_h, j_h) and (G_h, J_h) and a limit (G, J) ∈ E such that g_h ⇀ g, j_h ⇀ j, G_h ⇀ G, J_h ⇀ J weakly in L^2(Q; R^n) and g_h − G_h → g − G, j_h − J_h → j − J strongly in L^2(Q; R^n) as h → 0. The pairs (G_h, J_h) are in E, i.e.: G_h = ∇U_h is the gradient of some U_h ∈ H^1_0(Q) and ∇ · J_h = f. The pairs (g_h, j_h) are in the data set D of (1.7).

Calculations for A = id and n = 2
In this subsection, we obtain (3.1) in a simple case, namely A = id and n = 2. The general case is treated in the next subsection and does not use any of the intermediate results of this section, which is included only in order to illustrate the approach in a simple setting.
We denote by B^h ⊂ Q the set of those points x for which (g_h(x), j_h(x)) lies in the one-point set D^B_loc of (1.5). The complement is denoted as A^h := Q \ B^h. Because of the bounds 0 ≤ |B^h| ≤ |Q|, we can select a subsequence h → 0 (not relabelled) and a limit b ∈ [0, 1] such that |B^h| → b|Q|.

Averages. We can calculate, using the weak convergence of g_h, the property g_h(x) = 0 for x ∈ B^h, then g_h(x) = j_h(x) for x ∈ A^h, then j_h(x) = e_1 for x ∈ B^h, and finally the weak convergence of j_h. We conclude j = g + b e_1. With reference to Figure 1, we see that the point (g_1, j_1) lies above the diagonal.
Div-curl lemma. The convergence properties allow us to calculate limits of integrals over the product g_h · j_h. In the subsequent calculation, we use the standard div-curl lemma in L^2(Q) for the product G_h · J_h, and the strong convergence of differences in the other terms. In the limit h → 0, we obtain ∫_Q g_h · j_h → ∫_Q g · j.

Calculation 1 with the div-curl lemma. We calculate with the div-curl lemma, exploiting |g_h|^2 = g_h · j_h almost everywhere in Q (there holds j_h = g_h on A^h and g_h = 0 on B^h). Forming the limes inferior (and recalling that g and j are constant functions), we obtain |g|^2 ≤ g · j. Hence, because of j_2 = g_2, the relation g_1^2 ≤ g_1 j_1 = g_1^2 + b g_1. This implies, in particular, b g_1 ≥ 0 and hence g_1 ≥ 0 for b > 0. Referring to Figure 1, we see that (g_1, j_1) is to the right of the vertical axis.
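Spelled out, Calculation 1 combines a pointwise identity for the sequence with weak lower semicontinuity of the norm and the div-curl limit:

```latex
\begin{aligned}
&|g_h|^2 = g_h\cdot j_h \quad \text{a.e. in } Q
  \qquad (j_h = g_h \text{ on } A^h,\ g_h = 0 \text{ on } B^h),\\
&\int_Q |g|^2 \;\le\; \liminf_{h\to 0}\int_Q |g_h|^2
  \;=\; \lim_{h\to 0}\int_Q g_h\cdot j_h \;=\; \int_Q g\cdot j .
\end{aligned}
```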
Calculation 2 with the div-curl lemma. We now exploit the div-curl lemma slightly differently, using |j_h|^2 = g_h · j_h + 1_{B^h} almost everywhere in Q. Forming the limes inferior, we obtain |j|^2 ≤ b + g · j. Inserting j_2 = g_2 and j_1 = g_1 + b, we find 0 ≤ b(1 − j_1) and hence j_1 ≤ 1 for b > 0. In Figure 1, the point (g_1, j_1) is below the horizontal line j_1 = 1.
The cone conditions. The above considerations do not imply any conditions on the second components, g_2 and j_2. With the next calculation, we do not only find conditions for g_2, but we additionally reproduce most of the above findings. Assuming b ≠ 1, we use the shorthand notation β := (1 − b)^{-1}. Dividing the resulting integral inequality by |Q|, we obtain (3.9). Inequality (3.9) yields the desired restrictions on the pair (g, j). We distinguish three cases.
In the case b = 0, the relation j = g + b e_1 = g implies (g, j) ∈ D^A_loc. In the case b = 1, there holds g_h = g_h 1_{A^h} ⇀ 0, since 1_{A^h} → 0 strongly and g_h is bounded in L^2(Q). Hence, in this case, g = 0 and j = g + b e_1 = e_1. This yields (g, j) ∈ D^B_loc. In the case b ∈ (0, 1), we conclude with (3.9): |g|^2 ≤ (1 − b)g_1 can be written as |g − (r, 0)|^2 ≤ r^2 with r = (1 − b)/2. This is the defining relation of the cone C, compare (1.20). Claim (3.1) is shown for A = id and n = 2.

The general case
In this subsection, we treat the case of a general matrix A. Moreover, we show a result on D^relax, and not only a result on D^relax_*. The proof of Theorem 1.2 is complete with relation (3.10) of the subsequent proposition. The proposition provides additionally relation (3.12), which is slightly stronger.

Proposition 3.1. There holds

D^relax_* ⊂ C ∪ D^A_loc .   (3.10)

Moreover, with D^{C ∪ D^A_loc}_fct denoting the set of functions with values in C ∪ D^A_loc as in (1.2), there holds

D^relax ⊂ D^{C ∪ D^A_loc}_fct .   (3.12)

Proof. We note that (3.12) implies (3.10). Indeed, let (g, j) ∈ D^relax_* be a pair of vectors in R^n × R^n. Once more, we identify the vectors with constant functions on Q. The constant functions are in D^relax by definition of D^relax_*, see (1.9). Relation (3.12) implies (g, j) ∈ D^{C ∪ D^A_loc}_fct. Since the functions are constant, there holds (g, j) ∈ C ∪ D^A_loc. This shows (3.10).
Step 1: Preparation. In order to prove (3.12), we fix a pair (g, j) ∈ (L^2(Q; R^n))^2 in the relaxed data set D^relax, which means that there exist sequences (g_h, j_h) in D and (G_h, J_h) in E that converge in the sense of data convergence, with (g_h, j_h) ⇀ (g, j). Our aim is to show (g(x), j(x)) ∈ C ∪ D^A_loc for almost every x ∈ Q. The approximating sequences (g_h, j_h) in D and (G_h, J_h) in E with limit (G, J) ∈ E satisfy, as h → 0, the convergence properties of Definition 1.1. We denote by B^h ⊂ Q the set of those points x ∈ Q for which (g_h(x), j_h(x)) is in D^B_loc,

B^h := { x ∈ Q | (g_h(x), j_h(x)) ∈ D^B_loc } .   (3.13)

The complement is denoted as A^h := Q \ B^h. Because of the bound 0 ≤ |B^h| ≤ |Q|, we can select a subsequence (not relabeled) and a limit b ∈ L^∞(Q) such that

1_{B^h} → b weakly-* in L^∞(Q) as h → 0, 0 ≤ b(x) ≤ 1 for a.e. x ∈ Q .   (3.14)

As a consequence, 1_{A^h} → (1 − b) weakly-* in L^∞(Q).
Step 2: Localization. For any ϕ ∈ L^2(Q; R^n), we can calculate, using the weak convergence of g_h, the property g_h(x) = 0 for x ∈ B^h, then Ag_h(x) = j_h(x) for x ∈ A^h, then j_h(x) = e_1 for x ∈ B^h, and finally the weak convergence of j_h:

∫_Q ϕ · Ag ← ∫_Q ϕ · Ag_h = ∫_Q ϕ · j_h − ∫_{B^h} ϕ · e_1 → ∫_Q ϕ · (j − b e_1) .   (3.15)

This shows

Ag = j − b e_1 in Q .   (3.16)

In particular, we find (g, j)(x) ∈ D^A_loc for almost all x ∈ {x ∈ Q | b(x) = 0}.
Step 3: Div-curl lemma. The data convergence properties allow us to calculate the distributional limit of the product g_h · j_h. In the subsequent calculation, we use the standard div-curl lemma in L^2(Q) for the product G_h · J_h, and the strong convergence of differences in the other terms. In the limit h → 0, we obtain, for any ϕ ∈ C^∞_c(Q), ∫_Q ϕ g_h · j_h → ∫_Q ϕ g · j.

Step 4: The cone condition. We choose ε > 0 and set β_ε := (1 − b + ε)^{-1}. For arbitrary ϕ ∈ C^∞_c(Q) with ϕ ≥ 0, we can calculate, exploiting the positivity and symmetry of A, the estimate (3.17). Using the div-curl lemma, the strong convergence of differences, and the weak-* convergence of 1_{A^h}, we deduce (3.18), where we have used that j = b e_1 + Ag. Since ϕ was arbitrary, almost everywhere in Q there holds

0 ≤ b g_1 + g · Ag (1 − 2β_ε + (1 − b) β_ε^2) .   (3.19)

Evaluating this inequality in {b = 1} = {β_ε = ε^{-1}}, we find g · Ag ≤ (ε/(2 − ε)) g_1 in this set. Since ε > 0 was arbitrary, we find g = 0 and j = e_1 almost everywhere in {b = 1}. In particular, (g, j)(x) ∈ D^B_loc ⊂ C for almost all x ∈ {b = 1}. We next consider the set {0 < b < 1}. In this set, for ε → 0, there holds β_ε → (1 − b)^{-1}. Relation (3.19) implies g · Ag ≤ (1 − b) g_1. This is one of the defining relations of the cone C, compare (1.10). Combined with (3.16), we obtain that (g, j) ∈ C almost everywhere in {0 < b < 1}. Finally, in {b = 0}, relation (3.16) yields (g, j) ∈ D^A_loc. This provides (3.12) and concludes the proof of the proposition.

... of two identical terms and hence nonnegative. This implies that there exists a value b_L ∈ (b, 1] such that the expression vanishes. For this parameter b_L, the above condition is satisfied and hence (g_L, j_L) ∈ ∂_lat C. We set λ := b/b_L ∈ (0, 1). With this choice, by definition of g_L in (A.2), we obtain g_L = (1/λ)(g_C − g_A) + g_A and therefore the convex combination for the first components. Regarding the component j, we find the analogous identity. Together with (A.4), this shows (2.10). Finally, the definitions of g_A, g_L, j_A, and j_L imply g_A − g_L = (b_L/b)(g_A − g_C) = b_L (ν_1/(ν · Aν)) ν, and hence (2.11) follows. This completes the proof of the lemma.
We set λ := b b L ∈ (0, 1). With this choice, by definition of g L in (A.2), we obtain g L = 1 λ (g C − g A ) + g A and therefore Regarding the component j, we find Together with (A.4), this shows (2.10). Finally, the definitions of g A , g L , j A , and j L imply g A −g L = b L b (g A −g C ) = b L ν 1 ν and hence This shows (2.11) and completes the proof of the lemma.