Relaxation analysis in a data driven problem with a single outlier

We study a scalar elliptic problem in the data driven context. Our interest is to study the relaxation of a data set that consists of the union of a linear relation and a single outlier. The data driven relaxation is given by the union of the linear relation and a truncated cone that connects the outlier with the linear subspace.


Introduction
The data driven perspective is new in the field of material science and partial differential equations; we mention [18] and [6] as the two fundamental contributions of this young field. In the data driven perspective, certain laws of physics are accepted as invariable, e.g. balance of forces or compatibility. On the other hand, material laws (such as Hooke's law) can be questionable. In the classical approach, measurements are used to estimate constants of material laws. The new paradigm is to use a set of data points, obtained from measurements; the data points are not interpreted as realizations of some law, but calculations and analysis are based directly on the cloud of data points.
On a more formal level, one introduces a set E of functions that satisfy the invariable physical laws. A second set D denotes those functions that are consistent with the data. In this setting, the aim is to find functions in E that minimize the distance to the data set D.
The emphasis in [18] was to derive computing algorithms for this new approach. The mathematical analysis in [6] establishes well-posedness properties and introduces, among other tools, data convergence and relaxation in the data driven context. It is shown that data driven relaxation differs markedly from traditional relaxation, see the discussion below. In the work at hand, we investigate a scalar setting, which can be used, e.g., in the modelling of porous media. We seek two functions, G (a gradient) and J (a negative flux). Given a domain Q ⊂ R^n and a source f : Q → R, the invariable physical laws are the compatibility G = ∇U for some U : Q → R and the mass conservation ∇ · J = f (in other contexts, the second law is the balance of forces). We introduce

E_f := { (G, J) ∈ L^2(Q; R^n) × L^2(Q; R^n) | G = ∇U for some U ∈ H^1_0(Q), ∇ · J = f in Q } .  (1.1)

In the classical approach, one might be interested in the linear material law given by J = AG for A ∈ R^{n×n}. We note that a pair (G, J) ∈ E_f with J = AG can be found by solving the scalar elliptic equation ∇ · (A∇U) = f.
In the data driven perspective, the material law is replaced by a data set D. In a simple setting, we are given a local data set D_loc := {(g_i, j_i) | i ∈ I} ⊂ R^n × R^n for some index set I. This data set might be obtained by measurements; in this case, the index set I is finite and D_loc is a cloud of points in R^n × R^n. The set of functions that respect the data is

D := { (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) | (g(x), j(x)) ∈ D_loc for a.e. x ∈ Q } .  (1.2)

In the data driven perspective, the task is: Find a pair (G, J) ∈ E_f that minimizes the distance to the set D. We remark that we recover the classical problem if we introduce

D^A_loc := { (g, j) ∈ R^n × R^n | j = Ag }  (1.3)

and the corresponding set of functions D^A as in (1.2). For typical choices of Q, A, and f, the linear problem can be solved; in this case, there exists (G, J) ∈ E_f ∩ D^A and the minimization task has a solution that realizes the distance 0. The advantage of the data driven perspective is the generality of the data set: in the minimization task above, an arbitrary data set D can be considered. Three different types of questions can be asked:
1. Minimality conditions: When E_f ∩ D is empty, what are conditions for minimizers of the distance?
2. Families of data sets: Given a family of data sets D_h and solutions (G_h, J_h) of the minimization problems, what can we say about limits?
3. Relaxation: Given D and sequences of pairs (G_h, J_h) ∈ E_f and (g_h, j_h) ∈ D, which limits are attainable in the sense of data convergence?
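To make the minimization task concrete, the following toy sketch (our illustration, not from the paper) reduces the situation to a single "direction" with A = 1, so that D^A_loc is the diagonal line {(t, t)} and the outlier is the point (0, 1); it computes the pointwise distance of a state (g, j) to D_loc:

```python
import math

# Toy illustration (one direction, A = 1): D_A = {(t, t)} is the graph of
# Hooke's law, D_B = {(0, 1)} is the single outlier.
def dist_to_DA(g, j):
    # Euclidean distance from (g, j) to the line {(t, t) : t real}
    return abs(g - j) / math.sqrt(2.0)

def dist_to_Dloc(g, j):
    # distance to the union D_A u D_B: the minimum of the two distances
    return min(dist_to_DA(g, j), math.hypot(g - 0.0, j - 1.0))

print(dist_to_Dloc(2.0, 2.0))  # 0.0: a state on Hooke's law
print(dist_to_Dloc(0.0, 1.0))  # 0.0: the outlier itself is admissible
```

Since the outlier belongs to D_loc, states that use it realize distance 0; this is the mechanism that makes the relaxation question interesting.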
The present paper is devoted to the third question. We investigate a special data set: D loc is the union of D A loc and D B loc , where D A loc is as in (1.3) and D B loc is a one-point set of a single outlier. In this setting, the minimization problem is solvable with distance 0 since D is larger than D A . Our interest is to study the relaxation problem.
The motivation to study the data set D loc = D A loc ∪ D B loc is to understand the effect of a single outlier in a cloud of measurement points. When an increasing number of data points approximates the plane of Hooke's law D A loc , then the data driven solutions to these data sets approximate the classical solution with Hooke's law; this is one of the results in [6]. Our interest is an outlier: When the measurements contain a single point that is not in D A loc , the data driven solutions can always use this data point in the further process. How far off can the data driven solutions be because of the single outlier? Our result characterizes the relaxed data set and shows that it is only changed locally in the vicinity of the outlier. In this sense, the outlier has only a limited effect on the data driven solutions.
In more mathematical terms, the analysis of this article is concerned with sequences of pairs (G_h, J_h) ∈ E_f and (g_h, j_h) ∈ D that converge in the sense of data convergence. The set of all limits (g, j) constitutes the relaxed data set D^relax. Our main result is the characterization of this set. We prove that it consists of functions that attain values in a local relaxed data set D^relax_loc. This set contains D_loc; it is the "data driven convexification" of D_loc. We find that the set is strictly larger than D_loc, but smaller than the convex hull of D_loc. We will characterize D^relax_loc as the union of D^A_loc with a truncated cone that connects the additional point D^B_loc with the hyperplane D^A_loc. Denoting the truncated cone by C, our main result states D^relax_loc = C ∪ D^A_loc, see Theorem 1. The proof consists of two parts. The fact that any pair (g, j) with values in C ∪ D^A_loc belongs to D^relax requires the construction of a sequence of functions that use a fine mixture of materials. We will first approximate constant functions with values in C ∪ D^A_loc by constructions of simple and iterated laminates. In order to realize a point in D^A_loc, it suffices to use constant functions. In order to realize a point on the lateral boundary of the cone C, it is sufficient to construct a simple laminate with phases A and B. For a point in the interior of C, an iterated laminate must be constructed. Such iterated laminates are quite standard; we mention [13] and [22]. The technical difficulty in the derivation of the inclusion for functions lies in the glueing process for the local constructions. We adapt an approach of [6] and use a suitable Vitali covering.
The other part of the proof regards necessary conditions for limits of data convergent sequences. More precisely, we have to show that limits take only values in C ∪ D A loc . This part of the proof relies on the div-curl lemma [24]. In our context, the notion of data convergence of [6] provides exactly the prerequisites in order to use the div-curl lemma for data convergent sequences.
Literature. Relaxation is a classical problem in the calculus of variations. For a functional I : X → R ∪ {+∞} on a Banach space X, one introduces the relaxed functional I^relax : X → R ∪ {+∞} as I^relax(u) := inf { lim inf_k I(u_k) | u_k ⇀ u }. A related notion is that of quasiconvexity; loosely speaking, quasiconvex functionals coincide with their relaxation. For fundamental results on these important concepts we refer to [2,8,12]. For a functional I which is not quasiconvex, one can construct laminates or more complex patterns in order to find the relaxed functional and/or the quasiconvex envelope of the integrand, see e.g. [3] and [5]. For an introduction we refer to [22].
The data driven perspective introduces a new concept of relaxation. For a data set D, the task is to study the relaxed data set, which consists of points that are attainable as limits in the sense of data convergence. A relaxed data set in this sense has been calculated in [6] for a problem in the vectorial case: For a data set that describes a non-monotone material law (corresponding to a non-convex energy), the authors determine the relaxed data set, compare (3.26) and Theorem 3.6 in [6]. The relaxed data set is larger than the original data set, but it is smaller than the convex hull of the original data set. A similar phenomenon appears in our main result.
We want to emphasize the close relation to homogenization. In the primal problem of homogenization, one prescribes different material laws in different points x of the macroscopic domain, and asks for the effective law for fine mixtures. Building upon such results, one then asks: With any material laws in different points x (material laws of some admissible set), which effective material laws can be obtained by homogenization? This leads to bounds for effective material laws as in [14,15,19] and to optimization of the distribution of the single material laws, see [1,4]. For early results in this direction which also highlight the relation to relaxation see [20,21].
Our main result may be interpreted in the perspective of homogenization. We use the two material laws D A and D B in different regions of the macroscopic domain, possibly in a fine mixture. We ask what effective laws can be obtained in the limit. The warning about this description is that D B is not a linear relation and hence does not describe a material law in the classical setting of homogenization.
We will make use of the div-curl lemma in the second part of the proof. This lemma is also used in the compensated compactness method of homogenization, see [16,23]. Related concepts are those of Γ-convergence [9], Young measures [12], and H-convergence [13].
For recent developments of the data driven approach we refer to [10] and [17], which are both concerned with numerical aspects. Finite plasticity in the context of data driven analysis is treated in [7].

The main result
Let n ≥ 2 be the dimension, Q ⊂ R^n be a bounded Lipschitz domain, f ∈ H^{−1}(Q; R) a given source, and A ∈ R^{n×n} a positive definite symmetric matrix. We consider the local material data sets

D^A_loc := { (g, j) ∈ R^n × R^n | j = Ag }  and  D^B_loc := { (0, e_1) } .

We therefore enrich the data set D^A_loc of the classical approach with the one-point set D^B_loc. We choose here (0, e_1) ∉ D^A_loc as the position of the outlier; by elementary transformations, an arbitrary outlier can be analyzed. Functions with values in the data set are defined by

D := { (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) | (g(x), j(x)) ∈ D^A_loc ∪ D^B_loc for a.e. x ∈ Q } .  (1.7)

We recall that the fundamental task in the data driven approach is to find a pair (G, J) ∈ E_f from (1.1) that minimizes the distance to D. In the above setting, a vanishing distance can be realized since D is larger than D^A.
Our interest is to study the relaxed data set, which is the set of states that can be approximated in the sense of data convergence with sequences in E f × D. We use the notion of data convergence of Definition 3.1 in [6].
Definition 1 (Relaxed data set D^relax; data convergence) We use E_f from (1.1) and D from (1.7). A pair (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) is in the relaxed data set, and we write (g, j) ∈ D^relax, if the following holds: There exists a sequence h → 0 and sequences (g_h, j_h)_h and (G_h, J_h)_h and a limit (G, J) ∈ E_f such that, for every h,

(g_h, j_h) ∈ D  and  (G_h, J_h) ∈ E_f .  (1.8)

The sequence of pairs ((g_h, j_h), (G_h, J_h))_h converges in the sense of data convergence to ((g, j), (G, J)), which means that

(g_h, j_h) ⇀ (g, j)  and  (G_h, J_h) ⇀ (G, J)  weakly in L^2(Q; R^n) × L^2(Q; R^n),
(g_h − G_h, j_h − J_h) → (g − G, j − J)  strongly in L^2(Q; R^n) × L^2(Q; R^n),

as h → 0.
We remark that the relaxed data set D relax can also be characterized as a Kuratowski limit. The precise statement is provided in Lemma 1 below.
We will characterize the relaxed data set D^relax in terms of a local relaxed data set D^relax_loc which, in turn, is the union of two sets: the hyperplane D^A_loc and a truncated cone C with vertex in the outlier D^B_loc. The cone is truncated by the hyperplane D^A_loc. We define the cone in the following steps. Let y ∈ R^n be the vector with Ay = e_1 and let |·|_A denote the norm induced by the scalar product ⟨v_1, v_2⟩_A := v_1 · Av_2. For b ∈ [0, 1], we set

C_b := { (g, j) ∈ R^n × R^n | j = Ag + b e_1 , |g − (1−b) y/2|_A ≤ (1−b) |y|_A/2 } .  (1.9)

For fixed b, the set C_b is an n-dimensional closed ellipsoid in R^n × R^n. For b = 1, the ellipsoid degenerates to a point, C_1 = {(0, e_1)} = D^B_loc. On the other hand, for b = 0, every vector in C_0 satisfies j = Ag, hence C_0 ⊂ D^A_loc. We define the truncated cone C as

C := ⋃_{b ∈ [0,1]} C_b .  (1.10)

Our main result is the characterization of the relaxed data set.
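The description as a cone over the base C_0 with vertex in the outlier can be read off directly (a reader's verification, using the explicit description of C_b as the set of pairs (g, Ag + b e_1) with |g − (1−b)y/2|_A ≤ (1−b)|y|_A/2, where Ay = e_1): for q = (g_0, A g_0) ∈ C_0 and b ∈ [0, 1],

```latex
(1-b)\,(g_0, A g_0) + b\,(0, e_1)
  = \bigl( (1-b)\,g_0 ,\; A\bigl((1-b)\,g_0\bigr) + b\, e_1 \bigr) \in C_b ,
```

since |(1−b)g_0 − (1−b)y/2|_A = (1−b) |g_0 − y/2|_A ≤ (1−b) |y|_A/2. Hence C = { (1−b) q + b (0, e_1) | q ∈ C_0, b ∈ [0, 1] }.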

Theorem 1 (Characterization of the relaxed data set) The set D^relax of Definition 1 is characterized as

D^relax = { (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) | (g(x), j(x)) ∈ D^relax_loc for a.e. x ∈ Q } ,  (1.11)

where the local relaxed data set D^relax_loc is given by

D^relax_loc = C ∪ D^A_loc  (1.12)

with the truncated cone C of (1.9)-(1.10).
Theorem 1 characterizes the relaxation of the data set in the context of data driven analysis. The data driven convexification of a set consisting of a hyperplane and an outlier yields the union of the plane with a truncated cone that connects the outlier with the plane, compare Fig. 1 and Fig. 2. In particular, the data driven relaxation does not yield the (classical) convexification of the original set, which is an infinite strip (the infinite strip can be regarded as the truncated cone with opening angle π; in this sense, the data driven relaxation yields a cone with smaller opening angle).

Fig. 1 A sketch of the cone C

Fig. 2 Three-dimensional illustration of the cone C and part of the plane D^A_loc in the (g_1, g_2, j_1)-space for A = id

We note that the representation of D^relax in particular implies that the local relaxed data set D^relax_loc coincides with the set of attainable values. This fact will be a crucial part in the proof of Theorem 1, see Section 2.1 below.

An alternative description of the relaxed data set
Our definition of D relax was given in terms of sequences. As noted above, the set D relax can also be described in terms of a Kuratowski limit as in [6].

Lemma 1 (Kuratowski limit)
Let data convergence be denoted as Δ-lim. We use Kuratowski convergence of sets, which coincides with Γ-convergence of the indicator functions. With these topological tools, the data relaxation can be written as a limit:

D^relax × E_f = K(Δ)-lim (D × E_f) .  (1.14)

Proof Similar to [6], the sequential characterization of the Kuratowski limit follows from an (equi-)transversality condition.
Step 1: Transversality. We claim that there exist constants C_1, C_2 > 0 such that every pair z = (g, j) ∈ D and Z = (G, J) ∈ E_f satisfies

‖z‖_{L^2} + ‖Z‖_{L^2} ≤ C_1 + C_2 ‖z − Z‖_{L^2} .  (1.15)

The inequality is concluded with the help of the positivity of A ∈ R^{n×n}: ξ · Aξ ≥ c_0 |ξ|^2 for some c_0 > 0. From this estimate and the fact that z ∈ D implies g = 0 on {j ≠ Ag}, we deduce (1.16), where we have used Poincaré's inequality and G = ∇U in the last step; here and below, C denotes a constant that depends only on A, Q, n and that may change from line to line. Together with (1.16) we deduce an estimate for j; the triangle inequality yields an analogous inequality for g. Using (1.19) and Young's inequality, this provides an estimate that can be inserted in (1.18), and we obtain the corresponding estimate for G. By the triangle inequality, we control all functions g, G, j, J in L^2(Q; R^n) by the right-hand side of (1.20). This proves the transversality (1.15).
Step 2: Sequential characterization of Kuratowski convergence. The Kuratowski limit K(Δ)-lim D × E_f is given by the domain of the Γ-limit of the (constant sequence of the) indicator function of D × E_f. To characterize this set, consider any point (z_0, Z_0). Since Γ-convergence is a local property, when computing the Γ-limit in this point we may restrict ourselves to any neighborhood of (z_0, Z_0) with respect to the Δ-topology. In particular, we may choose a neighborhood in which all pairs (z, Z) have a bounded difference z − Z (note that strong convergence of differences is part of the definition of Δ-convergence). Then the transversality property implies that we can restrict the computation of the Γ-limit to a bounded set in L^2(Q; R^n)^2. On bounded sets the data convergence topology is metrizable. Hence the topological and the sequential characterization of convergence coincide [9, Proposition 8.1].
The sequential characterizations of the lim inf and lim sup inequalities that define Γ-convergence of the indicator function of D × E_f to the indicator function of D^relax × E_f are precisely the approximation properties of data convergent sequences. This is equivalent to the characterization of D^relax given in Definition 1.

Equivalent descriptions for the truncated cone C
The main purpose of this section is to derive a convenient description of the lateral boundary of C. Before we do so, let us study briefly the cone C in the special case that the dimension is n = 2 and that the linear law is given by A = id ∈ R^{2×2}. In this situation, we can use the parameter r := (1−b)/2 ∈ [0, 1/2] and write

C = { (g, j) ∈ R^2 × R^2 | j_1 = g_1 + 1 − 2r , j_2 = g_2 , |g − (r, 0)| ≤ r , r ∈ [0, 1/2] } .  (1.21)

The last condition expresses that g = (g_1, g_2) is contained in the disc B_r((r, 0)) with radius r and center (r, 0). Because of j_1 = g_1 + 1 − 2r, the disc is mapped into an inclined plane.
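The planar description can be compared numerically with the parametrization of the lateral boundary used later in Lemma 2; the following sketch (our illustration for n = 2, A = id; the function names are ours) checks that parametrized lateral-boundary points satisfy the disc condition with r = (1−b)/2:

```python
import math

# Illustration for n = 2, A = id: the lateral boundary point with parameters
# (b, nu) is g = (1-b) * (nu_1 / |nu|^2) * nu and j = g + b*e_1.  It should
# lie on the circle of radius r = (1-b)/2 around (r, 0).
def lateral_point(b, nu):
    n1, n2 = nu
    s = (1.0 - b) * n1 / (n1 * n1 + n2 * n2)
    g = (s * n1, s * n2)
    j = (g[0] + b, g[1])
    return g, j

def on_circle(g, b, tol=1e-12):
    r = (1.0 - b) / 2.0
    return abs(math.hypot(g[0] - r, g[1]) - r) < tol

b = 0.4
for nu in [(1.0, 0.0), (1.0, 2.0), (-3.0, 1.0), (0.5, -0.7)]:
    g, j = lateral_point(b, nu)
    assert on_circle(g, b)          # g lies on the boundary of the disc
    assert j == (g[0] + b, g[1])    # inclined-plane relation j1 = g1 + 1 - 2r
```

The check passes for every direction nu, in agreement with the circle description above.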

The lateral boundary of C

The lateral boundary of C can be expressed as ∂_lat C := ⋃_{b∈[0,1]} ∂_lat C_b, where ∂_lat C_b denotes the boundary of the ellipsoid C_b within the affine subspace {(g, j) | j = Ag + b e_1}. With this notation, the boundary of C is given by ∂C = C_0 ∪ ∂_lat C. We can generalize (1.21) as follows. Let y ∈ R^n be a vector that satisfies Ay = e_1. We introduce the scalar product ⟨v_1, v_2⟩_A := v_1 · Av_2 and the associated norm |·|_A. The corresponding sphere with center y/2 that contains 0 is

S := { g ∈ R^n | |g − y/2|_A = |y/2|_A } .  (1.22)

For later use we include the following alternative characterization of ∂_lat C_b.

Lemma 2
The lateral boundary can also be written as

∂_lat C_b = { (1−b) (g_ν, A g_ν) + b (0, e_1) | ν ∈ R^n \ {0} } ,  where  g_ν := (ν_1/(ν · Aν)) ν .  (1.23)

Proof
We fix b ∈ [0, 1] and denote by K_b the right-hand side of (1.23). Consider any (g, j) with j = b e_1 + Ag and g ≠ 0. Then

(g, j) ∈ ∂_lat C_b  ⟺  |g|_A^2 = (1−b) ⟨g, y⟩_A = (1−b) g_1  ⟺  g = (1−b) (g_1/(g · Ag)) g  ⟹  (g, j) ∈ K_b ,

where the choice ν = g provides the last implication (Fig. 2).
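For the reverse inclusion K_b ⊂ ∂_lat C_b, one can verify directly that every parametrized point lies on the sphere; the following computation (a reader's aid, spelling out the definitions g_ν = (ν_1/(ν·Aν)) ν and ⟨·,·⟩_A, and using Ay = e_1) shows g_ν ∈ S:

```latex
\Bigl| g_\nu - \tfrac12 y \Bigr|_A^2
   = |g_\nu|_A^2 - \langle g_\nu , y \rangle_A + \tfrac14 |y|_A^2 ,
\qquad
|g_\nu|_A^2 = \frac{\nu_1^2}{\nu\cdot A\nu}
   = \frac{\nu_1}{\nu\cdot A\nu}\, \nu\cdot Ay
   = \langle g_\nu , y \rangle_A .
```

The first two terms on the right cancel, hence |g_ν − y/2|_A = |y/2|_A, i.e. g_ν ∈ S; scaling g_ν by (1−b) and setting j = A((1−b)g_ν) + b e_1 produces a point of ∂_lat C_b.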

Construction of approximating sequences
The goal of this section is to prove the inclusion

{ (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) | (g(x), j(x)) ∈ C ∪ D^A_loc for a.e. x ∈ Q } ⊂ D^relax .  (2.1)

This is one of the inclusions in (1.11). It is that part of Theorem 1 that must be verified by the construction of approximations. We show in Subsection 2.1 that constant functions with values in C ∪ D^A_loc can be approximated by data convergence, i.e. the inclusion C ∪ D^A_loc ⊂ D^relax_*, where D^relax_* denotes the set of points (ḡ, j̄) ∈ R^n × R^n such that the constant function (g, j) ≡ (ḡ, j̄) belongs to D^relax. In Subsection 2.2 the construction is extended to nonconstant functions with values in C ∪ D^A_loc.

Approximation of constant functions
The aim of this subsection is to verify C ∪ D^A_loc ⊂ D^relax_*. To this end, we choose an arbitrary point (g, j) ∈ C ∪ D^A_loc ⊂ R^n × R^n. We have to construct sequences (g_h, j_h) ∈ D and (G_h, J_h) ∈ E_f such that the pairs data converge, satisfying (g_h, j_h) ⇀ (g, j). To fulfil the condition ∇ · J_h = f, it is convenient to fix a vector field J_f ∈ L^2(Q; R^n) with ∇ · J_f = f.
In the case (g, j) ∈ D A loc there holds j = Ag and we can use trivial sequences (in particular, j h = j and g h = g) in order to obtain (g, j) ∈ D relax * . When (g, j) ∈ C is a point on the lateral boundary of the cone, we use simple laminates to construct data convergent sequences (g h , j h ) and (G h , J h ). Finally, when (g, j) ∈ C is an inner point, we use iterated laminates to construct the data convergent sequences.
In order to motivate the subsequent constructions, we present what can be achieved in the case A = id with simple laminates of horizontal or vertical layers. With respect to Fig. 1 we can say: The simple laminates show that all points in the vertical line of the cone and all points in the horizontal line of the cone can be constructed.

Remark 1 (Horizontal layers)
We consider A = id and fix b ∈ (0, 1). We decompose Q into thin horizontal layers such that e_1 is a tangential vector of the interfaces. The layers have the widths (1 − b)h and bh in an alternating fashion. The layers with width (1 − b)h are called A-layers, the other layers are B-layers. In the A-layers, we set j_h := G_h := g_h := 0, in the B-layers we set G_h := g_h := 0 and j_h := e_1. We finally set J_h := j_h + J_f, for J_f as above.
By construction, (g_h, j_h) ∈ D. Since the layers are horizontal, j_h has a vanishing divergence and J_h has the divergence f. As a trivial function, G_h is a gradient. We find (G_h, J_h) ∈ E_f. The functions converge weakly in L^2(Q; R^n) and the differences g_h − G_h = 0 and J_h − j_h = J_f converge strongly. We therefore obtain that the vertical line {(g, j) | g = 0, j = (j_1, 0, ..., 0), j_1 ∈ [0, 1]} is contained in D^relax_*.

Remark 2 (Vertical layers) We consider again A = id. We proceed as in Remark 1, but we now decompose Q into thin layers with normal vector e_1. In the interior of Q, in the A-layers, we set j_h := G_h := g_h := e_1, in the B-layers we set G_h := g_h := 0 and j_h := e_1, and again J_h := j_h + J_f. Up to truncations near the boundary, one can verify (G_h, J_h) ∈ E_f, (g_h, j_h) ∈ D, and the convergence properties. We therefore obtain that the horizontal line {(g, j) | g = (g_1, 0, ..., 0), g_1 ∈ [0, 1], j = e_1} is contained in D^relax_*.

After these motivating examples, we move on to the construction in the general case.
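The weak limits claimed in the two remarks are plain volume-fraction averages of the phase values; the following sketch (our illustration for n = 2, A = id; the helper name is ours) recomputes them:

```python
# Our illustration (n = 2, A = id): the weak limit of a laminate is the
# volume-fraction average of its two phase values.
def laminate_average(phase_A, phase_B, b):
    # phases are pairs (g, j) of 2-vectors; b is the B-phase volume fraction
    g = tuple((1.0 - b) * a + b * c for a, c in zip(phase_A[0], phase_B[0]))
    j = tuple((1.0 - b) * a + b * c for a, c in zip(phase_A[1], phase_B[1]))
    return g, j

e1, zero = (1.0, 0.0), (0.0, 0.0)
b = 0.25

# Horizontal layers (Remark 1): phases (0, 0) and (0, e1) -> the vertical line
g, j = laminate_average((zero, zero), (zero, e1), b)
assert g == (0.0, 0.0) and j == (b, 0.0)

# Vertical layers (Remark 2): phases (e1, e1) and (0, e1) -> the horizontal line
g, j = laminate_average((e1, e1), (zero, e1), b)
assert g == (1.0 - b, 0.0) and j == (1.0, 0.0)
```

Varying b in [0, 1] traces out exactly the two lines of the cone mentioned above.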

Lemma 3 (Simple laminates) Let (g, j) ∈ D^A_loc ∪ ∂C be a point in the plane or on the boundary of the cone. Then, for every sequence h ↘ 0, there exist (G_h, J_h) ∈ E_f and (g_h, j_h) ∈ D such that the sequence of pairs is data convergent with (g_h, j_h) ⇀ (g, j) and (G_h, J_h) ⇀ (0, J_f + j) in L^2(Q; R^n). In particular, there holds (g, j) ∈ D^relax_*.

The sequence can be chosen such that |g_h| + |j_h| ≤ C(n, A) in Q.
Proof For a point (g, j) ∈ D^A_loc, i.e., j = Ag, trivial sequences can be used: We choose g_h := g, j_h := j, G_h := 0, and J_h := j + J_f. The vertex of the cone (i.e., the case (g, j) ∈ D^B_loc) is also treated by choosing constant functions.
It remains to treat the case that, for some parameter b ∈ (0, 1), there holds (g, j) ∈ ∂_lat C_b. By Lemma 2, we can express this point in the form

(g, j) = (1−b) (g_a, A g_a) + b (0, e_1)  with  g_a := (ν_1/(ν · Aν)) ν

for some ν ∈ R^n \ {0}.
Step 1: Construction of approximating sequences. For h > 0, we consider the following layered subdivision of Q, using the direction ν:

A_h := { x ∈ Q | (x · ν/h) mod 1 ∈ [0, 1−b) } ,  B_h := Q \ A_h .

For the volume fractions we note that |B_h| → b|Q| and |A_h| → (1−b)|Q| as h ↘ 0. The field (g_h, j_h) is chosen as

(g_h, j_h) := (g_a, A g_a)  in A_h ,  (g_h, j_h) := (0, e_1)  in B_h .

By definition of the fields, (g_h, j_h) ∈ D is satisfied and |g_h| + |j_h| ≤ C(n, A) holds in Q.
We note that the construction assures j_h · ν = e_1 · ν in B_h and j_h · ν = A g_a · ν = (ν_1/(ν · Aν)) (Aν · ν) = ν_1 = e_1 · ν in A_h; the normal component of j_h is continuous across the interfaces. Since j_h is piecewise constant, this shows ∇ · j_h = 0 in Q.
We want to find a function u_h : Q → R that satisfies ∇u_h = g_h. The function u_h can be constructed explicitly: since g_h is a multiple of ν in A_h and vanishes in B_h, we can use the continuous (and piecewise affine) sawtooth primitive of the one-dimensional profile in the direction ν. We may introduce u(x) := g · x, the affine function with the averaged slope g = (1−b) g_a. Then u_h → u and ‖u_h − u‖_{L^∞} ≤ Ch hold for a constant C that does not depend on h.
In order to define a corresponding pair (G_h, J_h), we choose a cut-off function ϕ_h ∈ C_c^∞(Q; [0, 1]) with ϕ_h = 1 away from a boundary strip of vanishing width and |∇ϕ_h| ≤ C/h. With these preparations we define

U_h := ϕ_h (u_h − u) ,  G_h := ∇U_h ,  J_h := j_h + J_f .

Step 2: Verification of the properties. By definition, G_h is a gradient of a function in H^1_0(Q). We now verify the data convergence property. We clearly have

g_h ⇀ g ,  j_h ⇀ j ,  G_h ⇀ 0 ,  J_h ⇀ j + J_f

in L^2(Q; R^n). Finally we have J_h − j_h = J_f and

g_h − G_h = (1 − ϕ_h) g_h + ϕ_h ∇u − (u_h − u) ∇ϕ_h → g .

Here, the convergence follows from the following facts: (1−ϕ_h) → 0 strongly in L^2(Q; R^n) implies convergence to 0 for the first term. The pointwise convergence ϕ_h ∇u → ∇u with the uniform bound |ϕ_h ∇u| ≤ |∇u| implies strong convergence of the second term to g = ∇u.
The last term (u_h − u)∇ϕ_h is uniformly bounded and converges to zero almost everywhere, hence strongly to 0. Altogether, we obtain that ((g_h, j_h), (G_h, J_h)) converges to ((g, j), (0, J_f + j)) in the sense of data convergence. In particular, (g, j) ∈ D^relax_*.

We next show that also the interior of the cone C will be reached by suitable iterated laminate constructions.

Lemma 4 (Iterated laminates) Let (g, j) ∈ C be a point of the cone. Then, for every sequence h ↘ 0, there exist (G_h, J_h) ∈ E_f and (g_h, j_h) ∈ D such that the sequence of pairs is data convergent with (g_h, j_h) ⇀ (g, j) and (G_h, J_h) ⇀ (0, J_f + j) in L^2(Q; R^n). In particular, there holds (g, j) ∈ D^relax_*.

The sequence can be chosen such that |g h | + | j h | ≤ C(n, A) in Q and, in the case f = 0, such that in addition J h ≡ j holds in a neighborhood of ∂ Q.
Proof In view of Lemma 3 it remains to consider interior points of the cone C. In this case we use iterated laminates for the construction.
Step 1: Preparations. Let p_C = (g_C, j_C) ∈ C̊ be an arbitrary point in the interior of the cone. We show in Lemma A.1 of the "Appendix" that we can write p_C as a convex combination as follows: There exist two points p_A = (g_A, j_A) ∈ D^A_loc and p_L = (g_L, j_L) ∈ ∂_lat C and a parameter λ ∈ (0, 1) such that

p_C = (1−λ) p_A + λ p_L

and such that, additionally, the points p_A and p_L are compatible for a laminate construction and bounded in terms of n and A. As in the proof of Lemma 3 we exploit Lemma 2: We can express the point p_L ∈ ∂_lat C as a convex combination with some vector ν ∈ R^n \ {0}:

p_L = (1−b) p_a + b p_b ,  p_a = (g_a, j_a) = (g_a, A g_a)  with  g_a = (ν_1/(ν · Aν)) ν ,  p_b = (g_b, j_b) = (0, e_1) .

The iterated laminate is constructed as a coarse laminate with layers of width √h and a fine laminate with layers of order h. Every second layer of the coarse mesh uses p_A = (g_A, j_A). The fine laminate uses (g_a, j_a) and (g_b, j_b). The two fields in the fine layers produce, in average, p_L = (g_L, j_L). The mixture of the coarse layers with values p_A and p_L provides the desired value p_C. For a sketch see Fig. 3.
Step 2: Construction of the approximating sequence. From now on, the points p_C, p_A, p_L, p_a, p_b and the volume fractions λ and b are fixed. In addition to ν, we introduce the unit normal vector θ of the coarse layer interfaces; it is chosen parallel to g_A − g_L, see (2.10). For every k ∈ Z, the coarse layers L^h_k and M^h_k are defined as the slabs with normal θ of widths λ√h and (1−λ)√h that subdivide, in an alternating fashion, the periodicity cells of width √h. The unions are denoted as L^h := ⋃_{k∈Z} L^h_k and M^h := ⋃_{k∈Z} M^h_k. The iterated laminate is based on a subdivision of every layer L^h_k: each L^h_k is subdivided, in the direction ν, into fine layers L^h_{k,a} and L^h_{k,b} of widths proportional to (1−b)h and bh. The unions are denoted as L^h_a := ⋃_{k∈Z} L^h_{k,a} and L^h_b := ⋃_{k∈Z} L^h_{k,b}.

Fig. 3 Left: A sketch of the iterated laminate. Right: The local geometry as considered in Step 3

We define the fields g_h and j_h as

(g_h, j_h) := p_A  in M^h ∩ Q ,  (g_h, j_h) := p_a  in L^h_a ∩ Q ,  (g_h, j_h) := p_b  in L^h_b ∩ Q .  (2.11)

By definition, (g_h, j_h) ∈ D. Using (A.2) we deduce that |g_h| + |j_h| ≤ C(n, A) holds in Q. We next define a function u_h : R^n → R that is piecewise affine and which has piecewise the gradient g_h. In order to construct u_h we introduce the points x_k := k√h θ ∈ R^n for k ∈ Z. The point x_k is chosen such that, if x_k happens to be in Q, it is a point in ∂L^h_k ∩ ∂M^h_{k−1}. By construction, the weak limit of g_h is g_C. We therefore set u_h(x_k) := g_C · x_k. Accordingly, in the layer M^h_{k−1}, we set u_h(x) := g_C · x_k + g_A · (x − x_k). In the layer L^h_k, we define u_h as the unique continuous function with u_h(x_k) = g_C · x_k, with the gradient g_a in L^h_{k,a} and the gradient g_b in L^h_{k,b}. A continuous function u_h exists in L^h_k since (g_a − g_b) ∥ ν. As in the proof of Lemma 3, we can use a cutoff function ϕ^k_h in the layer L^h_k to construct a modification U_h of u_h that coincides, near the interfaces of L^h_k, with the affine function of averaged slope g_L. The function U_h can be defined on all of Q by setting U_h := u_h in M^h ∩ Q. By construction, the function U_h is continuous. This property follows by inserting the vector λ√h θ across the layer L^h_k, where U_h has the averaged gradient g_L, and the vector (1−λ)√h θ across the layer M^h_k, where U_h has the gradient g_A: the total increment over one period is √h θ · (λ g_L + (1−λ) g_A) = √h θ · g_C. This is consistent with the choice of U_h(x_{k+1}). Furthermore, the function U_h has a bounded gradient.
This can be seen as in the proof of Lemma 3: In the layer L^h_k, the difference between u_h(x) and g_C · x_k + g_L · (x − x_k) is of order h (uniformly in x), since g_L is the average slope of u_h and u_h oscillates at order h. The gradient of the cutoff function ϕ^k_h is of order h^{−1}. The gradient of U_h coincides with g_h except for a set with a volume bounded by C√h: the strips of width h in the layers L^h_k, and there are O(1/√h) such layers. With the choice G_h := ∇U_h, this guarantees the strong convergence ‖g_h − G_h‖_{L^2(Q)} → 0. We do not perform here the modification of U_h at the boundary ∂Q. We restrict ourselves to the observation that the weak limit of the sequence U_h is the affine function U : R^n → R, U(x) = g_C · x. This fact allows to use the cutoff argument of Lemma 3 at ∂Q.
After this modification we have U_h ⇀ 0, see Lemma 3.
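The order √h of the mismatch set in the preceding argument can be summarized by elementary counting (constants suppressed; a reader's aid):

```latex
\#\{\text{coarse layers in } Q\} \,\sim\, \frac{1}{\sqrt h},
\qquad
\bigl|\{\nabla U_h \neq g_h\} \cap L^h_k\bigr| \,\le\, C\,h,
\qquad\Longrightarrow\qquad
\bigl|\{\nabla U_h \neq g_h\}\bigr| \,\le\, C\,\frac{h}{\sqrt h} = C\sqrt h .
```

Since g_h and ∇U_h are uniformly bounded, this yields ‖g_h − ∇U_h‖²_{L²(Q)} ≤ C√h → 0.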
The construction is complete up to the choice of the sequence J_h, which we postpone to Step 3. At this point, we have found the following functions: (g_h, j_h) are functions that are compatible with the data set, G_h is a gradient (after the modification at ∂Q, it is the gradient of an H^1_0(Q)-function), and g_h − G_h converges strongly in L^2(Q; R^n). All functions converge weakly in L^2(Q; R^n) with

g_h ⇀ g_C ,  j_h ⇀ j_C ,  G_h ⇀ 0 .

If an appropriate sequence J_h can be constructed (with the right divergence and such that the difference to j_h is strongly convergent), this shows that p_C = (g_C, j_C) is in the relaxed data set D^relax_*.
Step 3: The divergence of the approximation. Let us calculate the divergence of j_h. In M^h, the flux is constant and hence ∇ · j_h = ∇ · j_A = 0 in M^h. In L^h, the construction uses the fluxes j_a and j_b, which satisfy

(j_a − j_b) · ν = (A g_a − e_1) · ν = (ν_1/(ν · Aν)) Aν · ν − e_1 · ν = ν_1 − ν_1 = 0 .

This shows that j_h satisfies ∇ · j_h = 0 in L^h.
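This compatibility can also be checked numerically; the following sketch (our illustration with an arbitrarily chosen symmetric positive definite matrix A in dimension 3) verifies the identity for several directions ν:

```python
import random

# Numerical sanity check (an illustration, not part of the proof): for a
# symmetric positive definite A and g_a = (nu_1 / (nu . A nu)) nu, the flux
# jump A g_a - e_1 has vanishing component in the fine-layer normal nu, so
# the laminate flux is divergence free inside L^h.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return tuple(dot(row, v) for row in A)

A = ((2.0, 0.3, 0.1), (0.3, 1.5, 0.2), (0.1, 0.2, 1.0))  # SPD by diagonal dominance
e1 = (1.0, 0.0, 0.0)

random.seed(1)
for _ in range(5):
    nu = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
    g_a = tuple(nu[0] / dot(nu, matvec(A, nu)) * x for x in nu)
    jump = tuple(a - b for a, b in zip(matvec(A, g_a), e1))
    assert abs(dot(jump, nu)) < 1e-12  # (A g_a - e_1) . nu = 0
```

The cancellation is exact up to floating-point rounding, mirroring the algebraic computation above.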
Along ∂L^h, the function j_h has jumps in the normal component, see (2.13). Important for the following construction is that the total flux through two subsequent pieces of ∂M^h vanishes; this follows from (2.10) and (2.6). After a rescaling by h and a shift into the origin, the local geometry is as in the right part of Fig. 3. We emphasize that only three regions of unit dimensions are considered.
We claim that there exists a bounded vector field p : Ω_L → R^n with support in {x ∈ Ω_L | x · θ < 1} whose distributional divergence compensates the jumps of j_h in the rescaled geometry, see (2.14). The function can be constructed in R^2 as follows: We use an ansatz with a rotated gradient, p := ∇^⊥ψ = (−∂_2 ψ, ∂_1 ψ), with a smooth function ψ that is piecewise affine on the boundary ∂Ω_{L,A} ∪ ∂Ω_{L,B}. The fact that the total flux vanishes by (2.14) implies that ψ can be chosen such that it vanishes on ∂Ω_{L,A} ∪ ∂Ω_{L,B} \ ∂Ω_L. This allows, in particular, to choose a compactly supported function ψ. The rotated gradient p has all the desired properties. In higher dimension, the two-dimensional function can be extended as a constant function in the remaining directions.
Rescaling p as p h (x) := p(x/h) and extending the function p h first periodically with period h in all directions perpendicular to θ , then extending the result periodically with period √ h in direction θ , we obtain a function p h that has the same distributional divergence as j h , see (2.12) and (2.13).
We construct J_h := j_h − p_h. This choice assures ∇ · J_h = 0. Furthermore, the strong convergence j_h − J_h = p_h → 0 in L^2(Q; R^n) is a consequence of the boundedness of p together with the fact that p_h ≠ 0 holds only on a set with volume fraction of order h/√h = √h. This concludes the proof for f = 0. If a function J_h with ∇ · J_h = f ≠ 0 has to be constructed, it suffices to add an h-independent function J_f as in the proof of Lemma 3.
In the case f = 0, the property J_h ≡ j in a neighborhood of ∂Q can be achieved by a similar construction: One defines all sequences as constructed above in the smaller domain Q_h := {x ∈ Q | dist(x, ∂Q) > h^{1/4}}. In the boundary region Q \ Q_h we set U_h ≡ 0, G_h := ∇U_h = 0, J_h := j, j_h := g_h := 0. The extension of J_h to all of Q is defined as above with a local construction. The local construction is possible since all averages of J_h over periodicity cells are given by j.
Remark on the last step in the above proof. The property J h ≡ j near ∂ Q can be obtained also with cut-off functions: One constructs J h in all of Q by means of a potential with respect to a suitable differential operator, and uses a cut-off function to construct a transition from J h to j. Such a construction is used in Lemma 3.14 of [6].

Approximation of general functions
The next lemma is a variant of Lemma 4. It contains no more than the observation that we can perform a local construction and, at the same time, keep boundary conditions for J h .

Lemma 5 (Iterated laminates with boundary condition) Let (ḡ, j̄) ∈ C ∪ D^A_loc be a point and let j ∈ L^2(Q; R^n) be a function. We set F := ∇ · j. Then there exist sequences (G_h, J_h) ∈ E_F and (g_h, j_h) ∈ D that are data convergent with (g_h, j_h) ⇀ (ḡ, j̄) and (G_h, J_h) ⇀ (0, j) in L^2(Q; R^n)^2. The sequences can be chosen such that |g_h| + |j_h| ≤ C(n, A)(1 + |ḡ|) pointwise in Q and such that J_h ≡ j holds in a neighborhood of ∂Q.
Proof We set f = 0 and take (ḡ, j̄) as the limit point. Let (g_h, j_h) and (G_h, J̃_h) be sequences in D and E_0 as in Lemma 3 or Lemma 4; the limits are (ḡ, j̄) and (0, j̄). Furthermore, we can assume that J̃_h coincides with j̄ close to ∂Q.

We set J_h := J̃_h − j̄ + j. The new sequence is still data convergent, there holds ∇ · J_h = F, J_h ⇀ j, and J_h = j near ∂Q.
In the last subsection, we have shown the local property C ∪ D A loc ⊂ D relax * . Glueing together the constructions, we will now obtain the corresponding approximation result for non-constant functions with values in C ∪ D A loc .

Proposition 1 (Glueing the constructions) There holds

{ (g, j) ∈ L^2(Q; R^n) × L^2(Q; R^n) | (g(x), j(x)) ∈ C ∪ D^A_loc for a.e. x ∈ Q } ⊂ D^relax .
Proof Let (g, j) : Q → C ∪ D^A_loc of class L^2 be given. Our aim is to construct data convergent sequences as in Definition 1. We set F := ∇ · j. We will construct a sequence with ∇ · J_k = F. This is sufficient, since we can add a function J_{f−F} with ∇ · J_{f−F} = f − F in the very last step of the construction. Without loss of generality we assume |Q| ≤ 1.
Step 1: Lebesgue points and covering. We use balls Q r (x) := {y ∈ R n | |y − x| < r }, where x ∈ Q is the center and r > 0 is the radius; we only consider balls that are contained in Q. Since almost all points x ∈ Q are Lebesgue points for g and j, there exists a set of points ω ⊂ Q with full measure, |Q \ ω| = 0, such that every x ∈ ω is a Lebesgue point of (g, j); we write (ḡ x , j̄ x ) := (g(x), j(x)) for the corresponding values. For arbitrary δ > 0, we consider the balls Q r (x) with x ∈ ω, r > 0, and ∫ Q r (x) |(g, j) − (ḡ x , j̄ x )| 2 ≤ δ |Q r (x)|. This family of balls forms a regular Vitali covering of ω. By the Vitali covering theorem, there exists a countable disjoint covering of ω \ N for some N ⊂ Q with |N | = 0. Furthermore, by absolute continuity of the integral, we find a finite disjoint family of balls (Q i ) i∈I of this covering that covers Q up to a set of small measure.
Step 2: Local construction. We can now construct approximations with Lemma 5. In each ball Q i , i ∈ I , of the finite and disjoint covering, we use the functions (G h i , J h i ) ∈ E F (Q i ) and (g h i , j h i ) ∈ D(Q i ) provided by Lemma 5. Here, E F (Q i ) and D(Q i ) are defined analogously to (1.1) and (1.2), with Q replaced by Q i . We note that G h i is the gradient of a function u h i with u h i = 0 on ∂ Q i . The function J h i satisfies J h i = j in a neighborhood of ∂ Q i . This means that these functions can be extended to Q \ ∪ i Q i by 0 and j, respectively. This provides pairs (G h , J h ) ∈ E F . Similarly, we can extend the functions (g h i , j h i ) ∈ D(Q i ). Because of (0, 0) ∈ D A loc , we can extend both functions by 0 and obtain pairs (g h , j h ) ∈ D.
Step 3: Properties of the constructed sequence. Our construction provides, for all δ > 0, a sequence h → 0 (that depends on δ) and sequences of pairs (g h δ , j h δ ) ∈ D and (G h δ , J h δ ) ∈ E F . The data convergence property of the construction in Lemma 5 provides that, for all δ > 0, along a subsequence h → 0, (g h δ , j h δ ) ⇀ (g δ , j δ ) (2.23) in L 2 (Q; R n ), with limit functions g δ = Σ i ḡ i 1 Q i and j δ = Σ i j̄ i 1 Q i . In particular, ∫ Q |g δ − g| 2 ≤ 2δ (2.24) and similarly ∫ Q | j δ − j| 2 ≤ 2δ. The functions (g h δ , j h δ ) are uniformly bounded for all 0 < δ < 1. In fact, by Lemma 5, we have the pointwise bound |g h δ | + | j h δ | ≤ C(n, A)(1 + |g δ |) in each ball Q i . Similarly, one obtains the uniform bound for j h δ . From (2.23) and (2.24) we deduce that, for small h, the pairs (g h δ , j h δ ) are close to (g, j) in the weak topology. We then choose δ(k) → 0 and h(k) → 0 such that the corresponding quantitative estimate (2.28) holds. We finally define g k := g h(k) δ(k) and analogously j k , G k , J k . Let us verify the properties. By construction, (G k , J k ) ∈ E F and (g k , j k ) ∈ D. The boundedness of the sequences (g k , j k ) k , estimate (2.28) and an identification argument show their weak convergence in L 2 (Q; R n ) to (g, j). By (2.28) we therefore have (g k , j k , G k , J k ) ⇀ (g, j, 0, j) weakly in L 2 (Q; R n ). This provides the required approximation of (g, j).
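The role of the parameter δ in the estimate (2.24) can be illustrated in a one-dimensional toy computation (our own numerical sketch, not part of the proof; the function g(x) = x² and the grid sizes are chosen for illustration): replacing a function by its cell averages on a disjoint family of intervals makes the L² error as small as desired.

```python
import numpy as np

def piecewise_average(g_vals, n_cells):
    """Replace samples of g on [0, 1] by cell averages over n_cells equal intervals."""
    cells = np.array_split(g_vals, n_cells)
    return np.concatenate([np.full(len(c), c.mean()) for c in cells])

x = np.linspace(0.0, 1.0, 10_000, endpoint=False)
g = x**2  # any L^2 function on the unit interval serves as an example

errors = []
for n in [4, 16, 64]:
    g_delta = piecewise_average(g, n)          # analogue of g_delta in Step 3
    err = np.sqrt(np.mean((g_delta - g) ** 2)) # discrete L^2 distance to g
    errors.append(err)

# refining the covering drives the L^2 error to zero, as in (2.24)
assert errors[0] > errors[1] > errors[2]
```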
Remark on related statements in [6]. The proof of Proposition 1 can be compared, in both its statement and its method of proof, with Theorem 3.16 in [6]. A difference is that we start from the explicit constructions and do not assume, e.g., that g is a gradient.

Necessary conditions for relaxed data points
The goal of this section is to prove the inclusion "⊂" in (1.11) of Theorem 1. For the proof, we fix an arbitrary pair of functions (g, j) ∈ D relax and show that (g, j)(x) ∈ C ∪ D A loc for almost all x ∈ Q.
By Definition 1, the condition (g, j) ∈ D relax means that there exist sequences (g h , j h ) in D and (G h , J h ) in E f and a limit (G, J ) ∈ E f such that the convergence properties of data convergence hold. In the arguments below the div-curl lemma plays an important role. Data convergence provides the right properties to apply the lemma. The sequences G h have vanishing curl and the sequences J h have controlled divergence since the pairs (G h , J h ) belong to E f . Therefore the standard formulation of the div-curl lemma in L 2 (Q; R n ), see for example [11, Theorem 5.2.1], yields the distributional convergence of G h · J h . The convergence of the products g h · j h can then be deduced from the strong convergence of differences, inherited from the definition of data convergence.
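The mechanism of the div-curl lemma can be illustrated with a classical oscillation example (our own numerical sketch; the fields below are standard textbook choices and are not the sequences of the proof): a curl-free sequence times a divergence-free sequence has the product of the weak limits as its distributional limit, while products of unstructured weakly convergent sequences do not.

```python
import numpy as np

# G_n = grad(sin(n x1)/n + sin(n x2)/n) is curl-free, J_n is divergence-free,
# both converge weakly to 0 on the torus, and G_n . J_n also converges to 0.
n = 8
N = 400  # grid points per axis on [0, 2*pi)^2
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
x1, x2 = np.meshgrid(x, x, indexing="ij")

G = np.stack([np.cos(n * x1), np.cos(n * x2)])  # curl-free (a gradient)
J = np.stack([np.cos(n * x2), np.cos(n * x1)])  # divergence-free

dot_GJ = (G * J).sum(axis=0).mean()  # average of G_n . J_n
dot_GG = (G * G).sum(axis=0).mean()  # average of |G_n|^2

# G_n . J_n averages to 0 = (weak limit of G_n) . (weak limit of J_n),
# while |G_n|^2 averages to 1, far from |weak limit|^2 = 0.
assert abs(dot_GJ) < 1e-10
assert abs(dot_GG - 1.0) < 1e-10
```

The second assertion shows why some structural assumption is needed: without the div/curl constraints, weak limits of products need not be products of weak limits.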

Proposition 2 (Necessary condition on limit functions) Every pair (g, j) ∈ D relax satisfies (g, j)(x) ∈ C ∪ D A loc for almost all x ∈ Q; this is the inclusion (3.1).
Proof We split the proof into several parts.
Step 1: Preparation. In order to prove (3.1), we fix a pair (g, j) ∈ L 2 (Q; R n ) 2 in the relaxed data set D relax , which means that there exist sequences (g h , j h ) in D and (G h , J h ) in E f with data convergence such that (g h , j h ) converges weakly to (g, j). Our aim is to show (g(x), j(x)) ∈ C ∪ D A loc for almost every x ∈ Q. The approximating sequences allow to introduce the sets B h := {x ∈ Q | (g h , j h )(x) = (0, e 1 )}, i.e., the sets in which the data pair coincides with the outlier. The complement is denoted as A h := Q \ B h . Because of the bound 0 ≤ 1 B h ≤ 1, we can select a subsequence (not relabeled) and a limit b ∈ L ∞ (Q) such that 1 B h ⇀ b weakly-* in L ∞ (Q) as h → 0, with 0 ≤ b(x) ≤ 1 for a.e. x ∈ Q . (3.3) As a consequence, 1 A h ⇀ (1 − b) weakly-* in L ∞ (Q).
Step 2: A relation for averages. For any ϕ ∈ L 2 (Q; R n ) we can calculate, using the weak convergence of g h , the property g h (x) = 0 for x ∈ B h , then Ag h (x) = j h (x) for x ∈ A h , then j h (x) = e 1 for x ∈ B h , and finally the weak convergence of j h :
∫ Q Ag · ϕ = lim h ∫ Q Ag h · ϕ = lim h ∫ A h Ag h · ϕ = lim h ∫ A h j h · ϕ = lim h ( ∫ Q j h · ϕ − ∫ B h e 1 · ϕ ) = ∫ Q j · ϕ − ∫ Q b e 1 · ϕ . (3.4)

This shows
Ag = j − be 1 in Q . (3.5) In particular, we find (g, j)(x) ∈ D A loc for almost all x ∈ {x ∈ Q | b(x) = 0}.
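The averaging mechanism behind (3.4) and (3.5) can be checked in a one-dimensional toy computation (our own illustrative sketch; the scalar value A = 2, the volume fraction b and the value g0 are chosen by us, and e 1 = 1 in one dimension): a fine mixture that uses the outlier (0, e 1 ) on a volume fraction b reproduces, on the level of averages, the relation Ag = j − b e 1 .

```python
import numpy as np

# Toy 1D data set: linear law j = A g with A = 2, plus the outlier (g, j) = (0, 1).
A = 2.0
b = 0.3        # target volume fraction of cells on the outlier
g0 = 1.5       # value of g on the cells following the linear law
N = 10_000
outlier = np.random.default_rng(0).random(N) < b   # cells B_h using the outlier

g_h = np.where(outlier, 0.0, g0)       # g_h = 0 on B_h
j_h = np.where(outlier, 1.0, A * g0)   # j_h = e1 on B_h, j_h = A g_h on A_h

g_bar, j_bar, b_bar = g_h.mean(), j_h.mean(), outlier.mean()

# the averaged (weakly limiting) quantities satisfy A g = j - b e1, as in (3.5)
assert abs(A * g_bar - (j_bar - b_bar * 1.0)) < 1e-12
```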
Step 3: Div-curl lemma. The data convergence properties allow to calculate the distributional limit of the product g h · j h . In the subsequent calculation, we use the standard div-curl lemma in L 2 (Q; R n ) for the product G h · J h , and the strong convergence of differences in the other terms. In the limit h → 0, we obtain, for any ϕ ∈ C ∞ c (Q), ∫ Q (g h · j h ) ϕ → ∫ Q (g · j) ϕ .
Step 4: The cone condition. We choose ε ∈ (0, 1) and set β ε := (1 − b + ε) −1 . For arbitrary ϕ ∈ C ∞ c (Q) with ϕ ≥ 0 we can calculate, exploiting the positivity and symmetry of A, 0 ≤ ∫ Q ϕ 1 A h (g h − β ε g) · A(g h − β ε g) = ∫ Q ϕ ( 1 A h g h · Ag h − 2β ε 1 A h g h · Ag + β 2 ε 1 A h g · Ag ) . (3.6) Using Step 3 and the weak-* convergence of 1 A h we deduce 0 ≤ ∫ Q ϕ ( g · j − 2β ε g · Ag + (1 − b)β 2 ε g · Ag ) = ∫ Q ϕ ( b g 1 + g · Ag (1 − 2β ε + (1 − b)β 2 ε ) ) , (3.7) where we have used that j = be 1 + Ag. Since ϕ was arbitrary, almost everywhere in Q holds 0 ≤ bg 1 + g · Ag ( 1 − 2β ε + (1 − b)β 2 ε ) . (3.8) Evaluating this inequality in {b = 1} = {β ε = ε −1 }, we find g · Ag ≤ (ε/(2 − ε)) g 1 in this set. Since ε > 0 was arbitrary, we find g = 0 and, as a consequence of (3.5), j = e 1 almost everywhere in {b = 1}. In particular, (g, j)(x) ∈ D B loc ⊂ C for almost all x ∈ {b = 1}. We next consider the set {0 < b < 1}. In this set, for ε → 0, there holds β ε → 1/(1 − b). Relation (3.8) implies 0 ≤ b g 1 − (b/(1 − b)) g · Ag, i.e., g · Ag ≤ (1 − b) g 1 . This is one of the defining relations of the cone C, compare (1.9). Combined with (3.5), we obtain that (g, j) ∈ C almost everywhere in {0 < b < 1}. We recall that (g, j) ∈ D A loc in the set {b = 0} was already obtained in Step 2. This implies (3.1) and concludes the proof of the proposition.
We note that the expression on the right-hand side is negative for b L = b by (A.1). On the other hand, for b L = 1, the expression on the right-hand side is a product of two identical terms and hence nonnegative. By continuity of the expression in b L , there exists a value b L ∈ (b, 1] such that the expression vanishes. For this parameter b L , the above condition is satisfied and hence (g L , j L ) ∈ ∂ lat C.
We set λ := b/b L ∈ (0, 1). With this choice, by definition of g L in (A.3), we obtain g L = (1/λ)(g C − g A ) + g A and therefore g C = λ g L + (1 − λ) g A . Regarding the component j, an analogous computation yields j C = λ j L + (1 − λ) j A . Together with (A.5), this shows (2.8).
Finally, the definitions of g A , g L , j A , and j L imply g A − g L = (b L /b)(g A − g C ) = b L ν 1 ν and hence b 2 L ν 2 1 ν · Aν − b L e 1 · (b L ν 1 ν) = 0 . This shows (2.9) and completes the proof of the lemma.