Rigidity of branching microstructures in shape memory alloys

We analyze generic sequences for which the geometrically linear energy \[E_\eta(u,\chi):= \eta^{-\frac{2}{3}}\int_{B_{0}(1)} \left| e(u)- \sum_{i=1}^3 \chi_ie_i\right|^2 d x+\eta^\frac{1}{3} \sum_{i=1}^3 |D\chi_i|(B_{0}(1))\] remains bounded in the limit $\eta \to 0$. Here $e(u) := \frac{1}{2}(Du + Du^T)$ is the (linearized) strain of the displacement $u$, the strains $e_i$ correspond to the martensite strains of a shape memory alloy undergoing cubic-to-tetragonal transformations, and $\chi_i:B_{0}(1) \to \{0,1\}$ is the partition into phases. In this regime it is known that branched structures are possible in addition to simple laminates. If austenite were present, these would enable the alloy to form habit planes. In an ansatz-free manner we prove that the alignment of macroscopic interfaces between martensite twins is as predicted by well-known rank-one conditions. Our proof proceeds via the non-convex, non-discrete-valued differential inclusion \[e(u) \in \bigcup_{1\leq i\neq j\leq 3} \operatorname{conv} \{e_i,e_j\},\] which is satisfied by the weak limits of bounded energy sequences and all of whose solutions we classify. In particular, there exist no convex integration solutions of the inclusion with complicated geometric structures.

Figure 1: A sketch of the cubic-to-tetragonal transformation. The left-hand side represents the cubic austenite phase, while the right-hand side represents the martensite variants that are elongated in the direction of one of the axes of the cube and shortened in the other two. Adapted from [5, Figure 4.5].

Introduction
Due to the many possible applications of the eponymous shape memory effect, shape memory alloys have attracted a lot of attention from the engineering, materials science and mathematical communities. Their remarkable properties are due to certain diffusionless solid-solid phase transitions in the crystal lattice of the alloy, which enable the material to form microstructures. More specifically, the lattice transitions between the cubic austenite phase and multiple lower-symmetry martensite phases, triggered by crossing a critical temperature or by applying stresses.
In shape memory alloys undergoing cubic-to-tetragonal transformations, see Figure 1, one frequently observes the following types of microstructures:

3. Second-order laminates, or twins within a twin: Essentially sharp interfaces between two different refining twins, see Figure 2b.
4. Crossing second-order laminates: Two crossing interfaces between twins and pure phases, see for example [3, Figure 17].
5. Wedges: Materials whose lattice parameters satisfy a certain relation can form a wedge of two martensite twins in austenite, see [5, Chapter 7.3.1] and Figure 2c.
Furthermore, at least in Microstructures 1, 2 and 5, all observed interfaces form parallel to finitely many different hyperplanes relative to the crystal orientation. The first use of energy minimization in the modeling of martensitic phase transformations was made by Khatchaturyan, Roitburd and Shatalov [22-24, 36, 37] on the basis of linearized elasticity. This allowed them to predict certain large-scale features of the microstructure, such as the orientation of interfaces between phases.
Variational models based on nonlinear elasticity go back to Ball and James [1,2]. They formulated a model in which the microstructures correspond to minimizing sequences of energy functionals vanishing on $\bigcup_i SO(3)U_i$ for finitely many suitable symmetric matrices $U_i$. In their theory, the orientation of interfaces arises from a kinematic compatibility condition known as rank-one connectedness, see [5, Chapter 2.5]. For cubic-to-tetragonal transformations, Ball and James prove in an ansatz-free way that the fineness of the martensite twins in a habit plane is due to the fact that only certain mixtures of martensite variants are compatible with austenite. Their approach is closely related to the phenomenological (or crystallographic) theory of martensite, introduced independently by Wechsler, Lieberman and Read [42] and by Bowles and MacKenzie [7,33]. In fact, the variational model can be used to deduce the phenomenological theory.
A comparison of the nonlinear and the geometrically linear theories can be found in an article by Bhattacharya [4]. Formal derivations of the geometrically linear theory from the nonlinear one have been given by Kohn [27] and Ball and James [2]. A rigorous derivation via Γ-convergence has been given by Schmidt [41] with the limiting energy in general taking a more complicated form than the usually used piecewise quadratic energy densities.

Rigidity of differential inclusions
The interpretation of microstructures as minimizing sequences naturally leads to analyzing differential inclusions of the form $Du \in K$, sometimes called the $m$-well problem. It also leads to variants thereof, such as looking for sequences $u_k$ such that $\operatorname{dist}(Du_k, K) \to 0$ in measure. In fact, the statements of Ball and James are phrased in this way [1,2]. A detailed discussion of these problems, which includes the theory of Young measures, has been provided by Müller [34].
However, differential inclusions by themselves are not accurate models. Using convex integration, Müller and Šverák [35] constructed solutions of the differential inclusion $Du \in SO(2)A \cup SO(2)B$ with $\det A = \det B = 1$ in two space dimensions. These solutions exhibit a complex arrangement of phases, although one would naively expect only laminar solutions. Later, Conti, Dolzmann and Kirchheim [15] extended their result to three dimensions and to the case of cubic-to-tetragonal transformations.
But Dolzmann and Müller [17] also noted the following: if the inclusion $Du \in SO(2)A \cup SO(2)B$ is augmented with the information that the set $\{Du \in SO(2)A\}$ has finite perimeter, then $Du$ is in fact laminar. This result also holds in the case of cubic-to-tetragonal transformations, as shown by Kirchheim [25]. There has been a series of generalizations including stresses [16,31,32], culminating in the papers by Conti and Chermisi [13] and Jerrard and Lorent [20]. However, these are more in the spirit of the geometric rigidity theorem due to Friesecke, James and Müller [19]. They rely on the perimeter being too small for lamination, and as such they do not give insight into the rigidity of twins.
In contrast, consider the differential inclusion $e(u) \in \{e_1, e_2, e_3\}$ arising from the geometrically linear setting, where $e_i$ for $i = 1, 2, 3$ are the linearized strains corresponding to the cubic-to-tetragonal transformation, see (4). Dolzmann and Müller [17] proved that this inclusion is rigid in the sense that all solutions are laminates, even without further regularization. Quantifying this result, Capella and Otto [10,11] proved that laminates are stable: if the energy (1) (including an interfacial penalization) is small, then the geometric structure of the configuration is close to a laminate. Additionally, either only austenite or only a mixture of martensites is present. Capella and Otto also noted that such a result cannot hold for sequences with bounded energy. The reason is a well-known branching construction of habit planes (Figure 2a) given by Kohn and Müller [28,29].
Therein, Kohn and Müller used a simplified scalar version of the geometrically linear model with surface energy. They demonstrated that compatibility of austenite with a mixture of martensites only requires a fine mixture close to the interface. Driven by the interfacial energy, the twins then coarsen away from the interface. Kohn and Müller also conjectured that minimizers exhibit this so-called branching. Conti [14] confirmed the conjecture by proving minimizers of the Kohn-Müller functional to be asymptotically self-similar.
In view of the results by Kohn and Müller, and Capella and Otto it is natural to consider sequences with bounded energy in order to analyze the rigidity of branching microstructures.

Some related problems
So far, we have mostly discussed the literature describing the microstructure of single crystals undergoing cubic-to-tetragonal transformations. However, the variational framework can be used to address related problems. An exhaustive overview is outside the scope of this introduction, so we highlight only a few contributions. An overview of microstructures arising in other transformations can be found in the book by Bhattacharya [5]. Rigorous results for cubic-to-orthorhombic transformations in the geometrically linear theory can be found in a number of works by Rüland [38,39]. Cubic-to-monoclinic-I transformations are much more complicated, with their twelve martensite variants. For these, Chenchiah and Schlömerkemper [12] proved the existence of certain non-laminate microstructures in the geometrically linear case without surface energy.
For an overview of the available literature on polycrystalline shape memory alloys, we refer the reader once again to Bhattacharya's book [5, Chapter 13] and to an article by Bhattacharya and Kohn [6].
Another problem is determining the shape of energy-minimizing inclusions of martensite with given volume in a matrix of austenite, for which scaling laws have been obtained by Kohn, Knüpfer and Otto [26] for cubic-to-tetragonal transformations in the geometrically linear setting.

Definition of the energy
In order to analyze the rigidity properties of branched microstructures, we choose the geometrically linear setting, since the quantitative rigidity of twins is well understood due to the results by Capella and Otto [10,11]. In fact, we continue to work with the same already non-dimensionalized functional, namely $E_\eta(u,\chi) := E_{\mathrm{elast}}(u,\chi) + E_{\mathrm{inter},\eta}(u,\chi)$. Here $E_{\mathrm{elast}}(u,\chi) := \eta^{-\frac{2}{3}}\int_\Omega |e(u) - \sum_{i=1}^3 \chi_i e_i|^2\,dx$ and $E_{\mathrm{inter},\eta}(u,\chi) := \eta^{\frac{1}{3}} \sum_{i=1}^3 |D\chi_i|(\Omega)$. Furthermore, $\Omega \subset \mathbb{R}^3$ is a bounded Lipschitz domain, $u : \Omega \to \mathbb{R}^3$ is the displacement and $e(u) = \frac{1}{2}(Du + Du^T)$ denotes the strain. The partition into the phases is given by $\chi_i : \Omega \to \{0,1\}$ for $i = 1, \dots, 3$ with $\sum_{i=1}^3 \chi_i = 1$. The strains $e_1, e_2, e_3$ associated to the phases are given by (4). In particular, we assume the reference configuration to be in the austenite state, but that the transformation has occurred throughout the sample, i.e., there is no austenite present. This simplifying assumption does rule out habit planes, see Figure 2a. However, a look at Figure 2b suggests that we can still hope for an interesting result. Furthermore, the mechanism responsible for macroscopic rigidity is the rank-one connectedness of the average strains $e(u_\eta) \rightharpoonup e(u)$ in $L^2$ (encoded in the decomposition provided by Lemma 3.2). This mechanism cannot distinguish between pure phases and mixtures.
The condition that the material is a shape memory alloy is encoded in the fact that $\operatorname{tr}(e_i) = 0$ for $i = 1, 2, 3$, as this corresponds to the transformation being volume-preserving.
Further simplifying choices are to use equal isotropic elastic moduli with vanishing second Lamé constant and to penalize interfaces by the total variation of $\chi_i$, i.e., by $|D\chi_i|$. Of course, with these simplifications the model is unlikely to give quantitatively correct predictions. Bhattacharya, for example, argues that assuming equal elastic moduli is not reasonable [4, Page 238].
We still expect our analysis to give relevant insight, as we will for the most part prove compactness properties of generic sequences $u_\eta \in W^{1,2}(\Omega;\mathbb{R}^3)$ and partitions $\chi_\eta$ such that $\limsup_{\eta \to 0} E_\eta(u_\eta, \chi_\eta) < \infty$. This regime is the appropriate one to analyze branching microstructures. On the one hand, (generalizations of) the Kohn-Müller branching construction of habit planes have bounded energy. On the other hand, once the energy is small, the stability result of Capella and Otto [10] rules out branching. It ensures that, in a strong topology, either there is almost exclusively austenite or the configuration is close to a laminate. In other words, the branching construction shows that the stability result is sharp with respect to the energy regime, as pointed out by Capella and Otto in their paper.

Compatibility properties of the stress-free strains
It is well known, see [5, Chapter 11.1], that for symmetric $A, B \in \mathbb{R}^{3\times3}$ and $n \in S^2$ the following two statements are equivalent:
• There exists a continuous function $u : \mathbb{R}^3 \to \mathbb{R}^3$ with $e(u) = A$ on $\{x \cdot n > 0\}$ and $e(u) = B$ on $\{x \cdot n < 0\}$, see Figure 3a.
• The two strains are (symmetrically) rank-one connected in the sense that there exists $a \in \mathbb{R}^3$ such that \[A - B = \frac{1}{2}\left(a \otimes n + n \otimes a\right).\]

Note that the condition is symmetric in $a$ and $n$. Thus every rank-one connection generically gives rise to two possible normals. Additionally, rank-one connectedness is also symmetric in $A$ and $B$, which allows for the construction of laminates.

Figure 3: a) Geometry of an interface parallel to the plane $\{x \cdot n = 0\}$ in a laminate joining the strains $A$ and $B$. b) Sketch relating the martensite strains to the cone $C$ (dotted) of symmetrized rank-one matrices in the two-dimensional strain space $S$. Note that $C$ is a union of three lines parallel to the edges of the triangle $K$.
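To see the two normals in a concrete case, the following NumPy sketch factorizes a symmetric matrix with vanishing middle eigenvalue as $\frac{1}{2}(a \otimes n + n \otimes a)$. It uses the extreme eigenpairs, a standard construction stated without proof in the docstring. The helper name `twin_normals` is hypothetical. The example jump $\operatorname{diag}(3,-3,0)$ assumes the normalization $e_1 = \operatorname{diag}(-2,1,1)$, $e_2 = \operatorname{diag}(1,-2,1)$, which this section does not fix.

```python
import numpy as np


def twin_normals(E, tol=1e-10):
    """Return vectors (a, n) with E = (a⊗n + n⊗a)/2.

    Assumes E is symmetric with vanishing middle eigenvalue. With extreme
    eigenpairs (lam_1, v_1), (lam_3, v_3) one may take
    a = sqrt(lam_3) v_3 + sqrt(-lam_1) v_1 and n = sqrt(lam_3) v_3 - sqrt(-lam_1) v_1.
    """
    lam, V = np.linalg.eigh(E)  # eigenvalues in ascending order, eigenvectors as columns
    if abs(lam[1]) > tol:
        raise ValueError("E is not a symmetrized rank-one matrix")
    alpha = np.sqrt(max(lam[2], 0.0))
    beta = np.sqrt(max(-lam[0], 0.0))
    a = alpha * V[:, 2] + beta * V[:, 0]
    n = alpha * V[:, 2] - beta * V[:, 0]
    return a, n


# Strain jump across a twin between variants 1 and 2 under the assumed
# normalization: A - B = e_2 - e_1 = diag(3, -3, 0).
A_minus_B = np.diag([3.0, -3.0, 0.0])
a, n = twin_normals(A_minus_B)
assert np.allclose(0.5 * (np.outer(a, n) + np.outer(n, a)), A_minus_B)

# Swapping the roles of a and n yields the second admissible normal.
normals = [v / np.linalg.norm(v) for v in (n, a)]
print(np.round(normals, 3))  # rows are (1, ±1, 0)/sqrt(2), each up to sign
```

Swapping $a$ and $n$ exchanges the two unit normals. This is the generic two-fold ambiguity of twin normals mentioned above.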
In order to present the result of applying the rank-one connectedness condition to the case of cubic-to-tetragonal transformations, notice that $e_0, \dots, e_3 \in S := \{e \in \mathbb{R}^{3\times3} : e \text{ diagonal}, \operatorname{tr} e = 0\}$.
Here, we call the two-dimensional space $S$ strain space. It can be shown, either by direct computation or by an application of [12, Lemma 3.1], that all rank-one directions in $S$ are multiples of $e_2 - e_1$, $e_3 - e_2$ and $e_1 - e_3$. This means that they are parallel to one of the sides of the equilateral triangle spanned by $e_1$, $e_2$ and $e_3$ shown in Figure 3b. In particular, the martensite strains are mutually compatible. Austenite, however, is only compatible with certain convex combinations of martensites, which turn out to be $\frac{1}{3}e_i + \frac{2}{3}e_j$ for $i, j \in \{1, 2, 3\}$ with $i \neq j$.
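The compatibility statements can be checked with exact arithmetic. The sketch below rests on two assumptions, flagged in its comments. The first is the normalization $e_1 = \operatorname{diag}(-2,1,1)$, $e_2 = \operatorname{diag}(1,-2,1)$, $e_3 = \operatorname{diag}(1,1,-2)$. The second is the standard criterion that a symmetric $3\times3$ matrix has the form $\frac{1}{2}(a\otimes n + n\otimes a)$ if and only if its middle eigenvalue vanishes. The grid of mixing parameters is an arbitrary illustrative choice.

```python
from fractions import Fraction as F
from itertools import permutations

# Assumed normalization of the martensite strains (diagonal entries only);
# flipping all signs changes none of the conclusions below.
E = {1: (-2, 1, 1), 2: (1, -2, 1), 3: (1, 1, -2)}


def is_sym_rank_one(diag):
    """Criterion (assumed standard): a symmetric 3x3 matrix is (a⊗n + n⊗a)/2
    iff its middle eigenvalue is zero; a diagonal matrix's eigenvalues are its entries."""
    return sorted(diag)[1] == 0


def mix(lam, i, j):
    """Diagonal entries of the convex combination (1 - lam) e_i + lam e_j."""
    return tuple((1 - lam) * x + lam * y for x, y in zip(E[i], E[j]))


# Every difference of two martensite strains is a rank-one direction ...
assert all(is_sym_rank_one(tuple(y - x for x, y in zip(E[i], E[j])))
           for i, j in permutations(E, 2))

# ... while austenite (zero strain) is compatible only with special mixtures.
grid = [F(k, 12) for k in range(13)]
compatible = [lam for lam in grid if is_sym_rank_one(mix(lam, 1, 2))]
print(compatible)  # [Fraction(1, 3), Fraction(2, 3)]
```

The diagonal of the mixture is $(3\lambda - 2, 1 - 3\lambda, 1)$, so these two values are in fact the only compatible points on the whole edge. They correspond to $\frac{2}{3}e_1 + \frac{1}{3}e_2$ and $\frac{1}{3}e_1 + \frac{2}{3}e_2$.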

The contributions of the paper
We study the rigidity of branching microstructures due to "macroscopic" effects. By this we mean that we only look at the limiting volume fractions $\chi_{i,\eta} \overset{*}{\rightharpoonup} \theta_i$ in $L^\infty$ after passage to a subsequence. These completely determine the limiting strain $e(u_\eta) \rightharpoonup e(u)$ in $L^2$.
Similarly to the result of Capella and Otto [10], our main result, Theorem 2.1, is local: for $\Omega = B_1(0)$ we can classify the function $\theta$ on a smaller ball $B_r(0)$ of universal radius $0 < r < 1$. As the characterization of each of the four possible cases is a bit lengthy, we postpone a detailed discussion to Subsection 2.3. An important point is that all interfaces between different mixtures of martensites are hypersurfaces whose normals are as predicted by the rank-one connectedness of the average strains on either side. In this respect our theorem improves on previously available ones, which either explicitly assume the correct alignment of a habit plane, see e.g. Kohn and Müller [29], or require other ad-hoc assumptions. For example, Ball and James [1, Theorem 3] show habit planes to be flat under the condition that the set occupied by the austenite phase is topologically well-behaved.
The broad strategy of our proof proceeds in two steps. First, we ensure that in the limit the displacement satisfies the non-convex differential inclusion $e(u) \in K$, which encodes that locally at most two variants are involved, see definition (6) and Figure 3. Second, we classify all solutions. We strongly stress that we do not need to assume any additional regularity in order to do so. In particular, the differential inclusion is rigid in the sense that it does not allow for convex integration solutions with extremely intricate geometric structure. To our knowledge, this is the first instance of a rigidity result for a non-discrete differential inclusion in the framework of linearized elasticity.
The main idea is that "discontinuity" of $e(u)$ and the differential inclusion $e(u) \in K$ balance each other. If $e(u) \notin \mathrm{VMO}$, see Definition 3.7, a blow-up argument proves that the strain is independent of one direction. This argument makes use of measures describing the distribution of values $e(u) \in K$, similar in spirit to Young measures. If $e(u) \in \mathrm{VMO}$, the differential inclusion gives us less information, but we can still prove that only two martensite variants are involved by using an approximation argument. Finally, we classify all solutions which are independent of one direction.
The structure of the paper is as follows: In Section 2 we state and discuss our main theorem in detail. We then proceed to break down its proof into several main steps in Section 3, and give an in-depth explanation of all necessary auxiliary results. Finally, we give the proofs of these results in Section 4.

The main rigidity theorem
Note that any sequence with asymptotically bounded energy has subsequences such that $u_\eta \rightharpoonup u$ in $W^{1,2}$ and $\chi_\eta \overset{*}{\rightharpoonup} \theta$ in $L^\infty$.
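Theorem 2.1. There exists a universal radius $r > 0$ such that the following holds: Let $(u_\eta, \chi_\eta)$ be a sequence of displacements and partitions such that $E_\eta(u_\eta, \chi_\eta) < C$ for some $0 < C < \infty$. Then, for any subsequence along which they exist, the weak limits $u_\eta \rightharpoonup u$ in $W^{1,2}$ and $\chi_\eta \overset{*}{\rightharpoonup} \theta$ in $L^\infty$ satisfy $e(u) = \sum_{i=1}^3 \theta_i e_i$ and the differential inclusion $e(u) \in K := \bigcup_{1 \le i \ne j \le 3} \operatorname{conv}\{e_i, e_j\}$, see Figure 4, for almost all $x \in B_1(0)$.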
Furthermore, on the smaller ball $B_r(0)$ all solutions to this differential inclusion are two-variant configurations, planar second-order laminates, planar checkerboards or planar triple intersections, according to Definitions 2.4-2.8 below.
The first part of the conclusion states that the volume fractions $\theta_i$ for $i = 1, 2, 3$ act as barycentric coordinates for the triangle in strain space with vertices $e_1$, $e_2$ and $e_3$. In terms of these, the differential inclusion $e(u) \in K$ boils down to only two martensite variants being present locally. In plain words, the classification of solutions states that

3. or they are independent of one direction and look like a checkerboard of up to two crossing second-order laminates, see Definition 2.7,
4. or they are independent of one direction and macroscopically look like three second-order laminates crossing in an axis, see Definition 2.8.
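The barycentric description can be checked in a few lines. The sketch assumes the normalization $e_i = \operatorname{diag}(1 - 3\delta_{i1}, 1 - 3\delta_{i2}, 1 - 3\delta_{i3})$, i.e., $e_1 = \operatorname{diag}(-2,1,1)$ and so on. This is an assumption, since definition (4) is not reproduced in this section. Under it, a trace-free diagonal strain satisfies $e_{kk} = 1 - 3\theta_k$, so the volume fractions can be read off directly. Membership $e \in K$ then amounts to the smallest barycentric coordinate being exactly zero. All function names are hypothetical.

```python
from fractions import Fraction as F

# Assumed normalization (hypothetical stand-in for definition (4)):
# e_i = diag(1 - 3*delta_i1, 1 - 3*delta_i2, 1 - 3*delta_i3), e.g. e_1 = diag(-2, 1, 1).


def strain_from_fractions(theta):
    """Diagonal of sum_i theta_i e_i for volume fractions theta summing to 1."""
    return tuple(sum(t * (1 - 3 * (i == k)) for i, t in enumerate(theta))
                 for k in range(3))


def fractions_from_strain(diag):
    """Barycentric coordinates of a trace-free diagonal strain via e_kk = 1 - 3 theta_k."""
    return tuple((1 - F(x)) / 3 for x in diag)


def in_K(diag):
    """e lies on an edge of the triangle iff all barycentric coordinates are
    non-negative and one of them vanishes, i.e. the smallest one is exactly zero."""
    return min(fractions_from_strain(diag)) == 0


# A 1:2 mixture of variants 1 and 2 lies on the edge conv{e_1, e_2} ...
print(strain_from_fractions((F(1, 3), F(2, 3), 0)))  # entries 0, -1, 1
print(in_K((0, -1, 1)))                              # True
# ... whereas the zero strain (austenite) is the barycenter and not in K.
print(in_K((0, 0, 0)))                               # False
```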
Comparing this list to the list of observed microstructures in the introduction, we see that three crossing second-order laminates are missing from the latter. Indeed, we are unaware of them being mentioned in the currently available literature. One possible explanation is that planar triple intersections are an artifact of the linear theory. Another is that their very rigid geometry, see Definition 2.8, makes them unlikely to develop during the inherently dynamic process of microstructure formation.
Furthermore, the theorem of course captures neither wedges (which are known to be missing in the geometrically linearized theory anyway [4]) nor habit planes, since austenite is absent. Unfortunately, an extension of the theorem including austenite does not seem tractable with the methods used here. The central step that allows us to classify all solutions of the differential inclusion is to show that most configurations are independent of some direction. Even those that do depend on all three variables have a direction in which they vary only very mildly. However, this property is lost when austenite is present, as the example of Lemma 2.2 shows: there exists a displacement $u$ with $e(u) \in K \cup \{0\}$ almost everywhere such that $e(u)$ has a fully three-dimensional structure.
We will give the construction in Subsection 2.4.
Note that Theorem 2.1 strongly restricts the geometric structure of the strain, even if the four cases exhibit varying degrees of rigidity. Therefore, we can interpret it as a rigidity statement for the differential inclusion $e(u) \in K$. For example, it can be used to prove that $e(u) \equiv e$ is the only solution of the differential inclusion with affine boundary data $u(x) = ex$ for $e \in K$. For this problem, convex integration constructions would give a staggering number of solutions with complicated geometric structures. To see this, transport the decomposition into one-dimensional functions of Definitions 2.4-2.8 to the boundary, using the fact that it is unique up to affine functions, see [10, Lemma 5].

Inferring the microscopic behavior
In order to properly interpret the various cases Theorem 2.1 provides, we first need a clear idea of precisely what information the local volume fractions contain. In principle, they share the downside of using Young measures to describe microstructures: they do not retain information about the microscopic geometric properties of the microstructures. In fact, the Young measures generated by finite energy sequences are determined by the volume fractions and are given by $\sum_{i=1}^3 \theta_i \delta_{e_i}$. This is because the Young measures concentrate on the matrices $e_1$, $e_2$ and $e_3$, which span a non-degenerate triangle.
As every rank-one connection has two possible normals, see equations (7), giving rise to two different twins, we cannot infer from the volume fractions which twin is used. Consequently, what looks like a homogeneous limit could in principle be generated by a patchwork of different twins. In fact, Figure 5 shows an experimental picture of such a situation.
Additionally, without knowing which twin is present, the interpretation of changes in volume fractions is further complicated by the fact that there are at least three mechanisms which could be responsible: of patches where only two martensite variants are involved. In the following, we will see that on these patches the microstructures are usually much more rigid than those in Figure 7a. This is a result of the non-local nature of kinematic compatibility when gluing two different two-variant configurations together to obtain a more complicated one.
Apart from two-variant configurations, all others will only depend on two variables. We will call such configurations planar. There exist measurable functions $f_{\nu_i}$ depending only on $x \cdot \nu_i$ and affine functions $g_j$ with $\partial_d g_j = 0$ such that on $B_r(0)$. Here $\nu_i$ is the unique normal $\nu_i \in N_i$ with $\nu_i \cdot d = 0$, see Figure 6b.
There will be three cases of planar configurations, which at least in terms of their volume fractions look like one of the following: single second-order laminates, "checkerboard" structures of two second order laminates crossing, and three single interfaces of second order laminates crossing.
The first two cases are closely related to each other, the first one being almost contained in the second. However, the first case has slightly more flexibility away from macroscopic interfaces. Despite the caveat discussed in Subsection 2.1, we will name them planar second-order laminates.
Definition 2.6. A configuration is a planar second-order laminate on a ball $B_r(0)$ for $r > 0$ if there exists an index $i \in \{1, 2, 3\}$ and $\nu \in N_i$ such that with $A \subset \mathbb{R}$ measurable and $a, b \in \mathbb{R}$ such that $0 \le \theta_i \le 1$ for almost all $x \in B_r(0)$.
A sketch of a planar second-order laminate can be found in Figure 8, along with a matching experimental picture of a Cu-Al-Ni alloy, which, admittedly, undergoes a cubic-to-orthorhombic transformation.

Figure 8: a) Cross-section of a planar second-order laminate arranged in such a way that it is constant in the direction perpendicular to the plane of the paper. b) Color code for the mixtures involved at one of the interfaces in the center of Subfigure 8a. The set $\{x \cdot \nu \in A\}$ is shown as mostly green. c) Second-order laminate in a Cu-Al-Ni alloy, by courtesy of C. Chu and R.D. James. The fine twins correspond to mixtures of pure blue and green and, respectively, blue and red in Subfigure 8a.
Indeed, such configurations can be interpreted and constructed as limits of finite-energy sequences as follows, using Figure 8 as a guide. For simplicity, let us assume that $A$ is a finite union of intervals and that $i = 1$. Then on the interior of $\{x \cdot \nu \in A\}$ the configuration will be generated by twins of variants 1 and 2. On the interior of $\{x \cdot \nu \in A^c\}$ it will be generated by twins of variants 1 and 3. At interfaces, a branching construction on both sides will be necessary to join these twins in a second-order laminate. In order to realize the affine change in the direction of $\nu$, we will need to combine Mechanisms 1 and 2 of Subsection 2.1. The reason is that $\nu$ is not a possible direction of lamination between variants 1 and 2 or between variants 1 and 3, nor is it normal to one of them.
The second case consists of configurations in which two second-order laminates cross. In contrast to the first case, the strains are required to be constant away from macroscopic interfaces, so that only four different macroscopic strains are involved.
Definition 2.7. We will say that a configuration is a planar checkerboard on $B_r(0)$ for $r > 0$ if it is planar and there exists $i \in \{1, 2, 3\}$ such that For a sketch of such configurations, see Figure 9. An experimental picture can be found in [3, Figure 17]. Again, we briefly discuss the construction of such limiting strains. On $\{x \cdot \nu_{i+1} \in A^c\} \cap \{x \cdot \nu_{i-1} \in B^c\}$ there is of course only the martensite variant $i$ present. On all other patches there will be twinning. The macroscopic interfaces there require branching constructions unless the interface and the twinning normal coincide, which can only happen if both strains lie on the same edge of $K$. In particular, on $\{x \cdot \nu_{i+1} \in A,\ x \cdot \nu_{i-1} \in B\}$ there has to be branching towards all interfaces, i.e., the structure has to branch in two linearly independent directions.
Lastly, we remark on the case of three crossing second-order laminates.
Definition 2.8. A configuration is called a planar triple intersection on $B_r(0)$ for $r > 0$ if it is planar and we have for almost all $x \in B_r(0)$. Here $\tilde\nu_i = \pm\nu_i$ for $i = 1, 2, 3$ are oriented such that they are linearly dependent by virtue of $\tilde\nu_1 + \tilde\nu_2 + \tilde\nu_3 = 0$, see Remark 2.3. Furthermore, we have either for some $x_0 \in B_r(0)$ and $a, b_i \in \mathbb{R}$ for $i = 1, 2, 3$ such that A sketch of a planar triple intersection can be found in Figure 10.
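The orientation condition $\tilde\nu_1 + \tilde\nu_2 + \tilde\nu_3 = 0$ can be illustrated numerically. This is a hedged illustration only. It assumes the six twin normals $N_1 = \{(0,1,\pm1)/\sqrt{2}\}$, $N_2 = \{(1,0,\pm1)/\sqrt{2}\}$, $N_3 = \{(1,\pm1,0)/\sqrt{2}\}$, a labeling consistent with the notation $f_{(011)}$ but not fixed in this section. For a hypothetical direction $d$, it picks the unique $\nu_i \in N_i$ with $\nu_i \cdot d = 0$ and then searches for signs $\tilde\nu_i = \pm\nu_i$ summing to zero. All names are hypothetical.

```python
import itertools

import numpy as np

# Assumed twin normals (hypothetical labeling): N[i] holds the two normals of
# twins between the two variants other than i.
c = 1 / np.sqrt(2)
N = {
    1: [c * np.array([0.0, 1.0, 1.0]), c * np.array([0.0, 1.0, -1.0])],
    2: [c * np.array([1.0, 0.0, 1.0]), c * np.array([1.0, 0.0, -1.0])],
    3: [c * np.array([1.0, 1.0, 0.0]), c * np.array([1.0, -1.0, 0.0])],
}


def planar_normals(d, tol=1e-12):
    """For each i, the unique nu in N[i] with nu . d = 0, or None if there is none."""
    result = {}
    for i, normals in N.items():
        hits = [nu for nu in normals if abs(nu @ d) < tol]
        result[i] = hits[0] if len(hits) == 1 else None
    return result


def signs_summing_to_zero(nus, tol=1e-12):
    """Search for signs s_i in {+1, -1} with s_1 nu_1 + s_2 nu_2 + s_3 nu_3 = 0."""
    for signs in itertools.product((1, -1), repeat=3):
        total = sum(sg * nu for sg, nu in zip(signs, nus))
        if np.allclose(total, 0.0, atol=tol):
            return signs
    return None


d = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)  # hypothetical invariant direction
nus = planar_normals(d)
print(signs_summing_to_zero([nus[1], nus[2], nus[3]]))  # (1, -1, 1)
```

Now take a generic direction such as $d = (1,2,3)/\sqrt{14}$. No normal in any $N_i$ is orthogonal to it, so `planar_normals` returns `None` for every $i$.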
There are a number of possible choices of microscopic twins for constructing triple intersections. We will only describe the simplest one here, which is depicted in Figure 10c.
Going around the central axis, the macroscopic interfaces alternate between two mechanisms from Subsection 2.1. The first is Mechanism 1, namely varying the relative thickness of layers in a twin. The second is Mechanism 3, i.e., branching. Similarly to the case of second-order laminates, the affine changes require a combination of Mechanisms 1 and 2 on the individual patches in Figure 10c.

Figure 11: Sketch showing the basis $\{\nu_1^+, \nu_1^-, \nu_3\}$ and a plane with normal $\nu_1^-$, parallel to which the cross-sections of Figure 12 are chosen.

Construction of a fully three-dimensional structure in the presence of austenite
Here we flesh out the previously announced example in Lemma 2.2. The idea is to construct planar checkerboards on hyperplanes $H(c, \nu)$ for some normal $\nu \in N$ and $c \in \mathbb{R}$. These checkerboards include austenite, and we can switch between them as $c$ varies, see Figure 12. For the basis $\{\nu_1^+, \nu_1^-, \nu_3\}$ used below, see Figure 11. Let $\chi_1^+, \chi_1^-, \chi_3 : \mathbb{R} \to \{0, 1\}$ be measurable characteristic functions. We define the volume fractions to be
Straightforward case distinctions ensure that almost everywhere either $\theta_i = 0$ for some $i \in \{1, 2, 3\}$ or $\theta_i = \frac{1}{3}$ for all $i = 1, 2, 3$. Setting $G := \sum_{i=1}^3 \theta_i e_i$, we see that this implies $G \in K \cup \{0\}$ almost everywhere. A sketch of cross-sections through $G$ on $H(c, \nu_1^-)$, both with $\chi_1^-(c) = 0$ and with $\chi_1^-(c) = 1$, is given in Figure 12. Finally, in order to identify $G$ as the symmetric gradient of a displacement, we set

Figure 12: a) The left-hand side shows a cross-section with $x \cdot \nu_1^- = c$ such that $\chi_1^-(c) = 0$. (Be warned that the angles between interfaces are not accurate in the picture because we have $\nu_1^- \cdot \nu_1^+, \nu_1^- \cdot \nu_3 \neq 0$.) The involved strains are marked on the right-hand side. b) The left-hand side depicts a cross-section with $x \cdot \nu_1^- = c$ such that $\chi_1^-(c) = 1$. Again, the right-hand side indicates the involved strains.
The identity e(u) ≡ G is straightforward to check.

Outline of the proof
We will give the ideas behind each individual part of the proof of our main theorem in its own subsection. The contents of each are organized by increasing detail, so that the reader may skip to the next subsection once they are satisfied with the explanations given. However, we will first prove Theorem 2.1 itself here to provide a road map to the following subsections.
Throughout the paper the number r denotes a generic, universal radius that in proofs may decrease from line to line.
Proof of Theorem 2.1. We first use Lemma 3.1 to see that the limiting differential inclusion $e(u) \in K$ in fact holds. Next, we apply Lemma 3.2 to deduce the existence of six one-dimensional functions $f_\nu \in L^\infty$, each depending only on $x \cdot \nu$ for $\nu \in N$, and three affine functions $g_i$ for $i = 1, 2, 3$ such that on some smaller ball $B_r(0)$.
If $f_\nu \in \mathrm{VMO}(-r, r)$ for all $\nu \in N$, then Proposition 3.11 implies that the solution of the differential inclusion is a two-variant configuration. If $f_\nu \notin \mathrm{VMO}(-r, r)$ for some $\nu \in N_i$ and $i \in \{1, 2, 3\}$, we can use Proposition 3.6 to deduce that the configuration is planar or involves only two variants. Furthermore, if it is not a two-variant configuration, then there exists a plane $H(\alpha, \nu)$ for some $\alpha \in (-r, r)$ with the following property: It holds that for some $0 < b < 1$ and a Borel-measurable subset $B \subset \{x \cdot \nu = \alpha\} \cap B_r(0)$ of non-zero $\mathcal{H}^2$-measure. This is measure-theoretically meaningful since $H(\alpha, \nu)$ is not normal to the directions involved in the decomposition of $\theta_j$, see Lemma 3.4.
We are thus left with classifying planar configurations. If, additionally, one of the one-dimensional functions $f_{\nu_j}$ for $j \in \{1, 2, 3\} \setminus \{i\}$ is affine, we can apply Lemma 3.14 using the additional information (10). This shows that the configuration is a planar second-order laminate or a planar checkerboard. Otherwise, an application of Proposition 3.15 yields that the configuration is a planar triple intersection.

The differential inclusion
We first mention that the inclusion e(u) ∈ K holds.
Lemma 3.1. Let $(u_\eta, \chi_\eta)$ be a sequence of displacements and partitions such that Then for any subsequence for which the weak limits The statement $e(u) \equiv \sum_{i=1}^3 \theta_i e_i$ is an immediate consequence of the elastic energy vanishing in the limit. The proof of the non-convex inclusion relies on the rescaling properties of the energy. We will set where $\eta$ needs to be rescaled as well, since it plays the role of a length scale, to obtain The right-hand side consequently behaves better than just taking averages, which allows us to apply the result by Capella and Otto [10] locally to get the statement.
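The following computation makes the role of the rescaling explicit. It carries out one natural version of the rescaling for the energy from the abstract. The blow-up maps $u_r, \chi_r$ and the localized notation $E_\eta(\,\cdot\,; B)$ (the energy with $B_0(1)$ replaced by a ball $B$) are our own choices and need not coincide with the paper's. For $x_0$ with $B_r(x_0) \subset B_1(0)$, set
\[
u_r(y) := \frac{1}{r}\,u(x_0 + r y), \qquad \chi_r(y) := \chi(x_0 + r y), \qquad y \in B_1(0).
\]
Then $e(u_r)(y) = e(u)(x_0 + r y)$, and the change of variables $x = x_0 + r y$ gives
\[
\int_{B_r(x_0)} \Big| e(u) - \sum_{i=1}^3 \chi_i e_i \Big|^2 dx = r^3 \int_{B_1(0)} \Big| e(u_r) - \sum_{i=1}^3 \chi_{r,i} e_i \Big|^2 dy, \qquad |D\chi_i|(B_r(x_0)) = r^2\,|D\chi_{r,i}|(B_1(0)).
\]
Since $r^3 \eta^{-\frac{2}{3}} = r^{\frac{7}{3}} (\eta/r)^{-\frac{2}{3}}$ and $r^2 \eta^{\frac{1}{3}} = r^{\frac{7}{3}} (\eta/r)^{\frac{1}{3}}$, this yields
\[
E_{\eta/r}\big(u_r, \chi_r; B_1(0)\big) = r^{-\frac{7}{3}}\, E_\eta\big(u, \chi; B_r(x_0)\big).
\]
Averaging over $B_r(x_0)$ would produce a factor $r^{-3}$, and the prefactor $r^{-7/3}$ is smaller than this by $r^{2/3}$. This is one way to read the statement that the rescaled energy behaves better than just taking averages.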

Decomposing the strain
Next, we link the convex differential inclusion to a decomposition of the strain into simpler objects, namely functions of only one variable and affine functions. Already Dolzmann and Müller [17] used the interplay of this decomposition with the non-convex inclusion $e(u) \in \{e_1, e_2, e_3\}$ to get their rigidity result.
Lemma 3.2. There exists a universal $r > 0$ with the following property: Let a displacement $u \in W^{1,2}(B_1(0))$ be such that $e(u) \in K$ a.e., where $K \subset S$ is a compact set. Then there exist
1. a function $f_\nu \in L^\infty([-r, r])$ for each $\nu \in N$, which will take $\nu \cdot x$ as its argument, and
2. affine functions $g_1, g_2, g_3$
such that we have on $B_r(0)$.
Here we abuse notation by dropping the factor $\frac{1}{\sqrt{2}}$ when referring to the one-dimensional functions, e.g., we write $f_{(011)}$ instead of $f_{\frac{1}{\sqrt{2}}(011)}$. Furthermore, we will at times not distinguish between $f_\nu$ and $f_\nu(\nu \cdot x)$ as long as the context clearly determines which we mean.
Throughout the paper, the only way we use that the inclusion $e(u)(x) \in K$ a.e. involves a differential is through decomposition (11). Therefore, we can easily transfer all the relevant information to the volume fractions $\theta$ via the relation for all $i = 1, 2, 3$. In fact, most of the arguments in the following subsections become much more transparent if we reformulate the differential inclusion in terms of the volume fractions as $\theta(x) \in \tilde K$ a.e. with The only (marginally) new aspect of Lemma 3.2, compared to the previously known versions [17, Lemma 3.2] and [10, Proposition 1], is the statement $f_\nu \in L^\infty$ for all $\nu \in N$. We will thus only highlight the required changes to the proof of Capella and Otto [10, Proposition 1]. Essentially, the strategy here is to integrate the Saint-Venant compatibility conditions for linearized strains, which in our situation take the form of six two-dimensional wave equations, see Lemma 3.5. Thus it is not surprising that the decomposition is closely related to the strain being a symmetric gradient. This reassures us in our approach of only appealing to the differential information through equations (11).
A central part of the proof of Lemma 3.2 is the uniqueness of the decomposition up to affine functions [10, Lemma 3.8]. We can apply this result to characterize two-variant configurations as the only ones with $\theta_i \equiv 0$ for some $i = 1, 2, 3$, i.e., as the only ones that indeed combine just two variants. Another very useful consequence of the decomposition (11) is that such functions have traces on hyperplanes as long as none of the individual one-dimensional functions is necessarily constant on them. See Figure 13 for the geometry in a typical application.

Figure 13: Sketch indicating that $\theta_2$ has traces on hyperplanes with normal $\nu_2^+$, since its decomposition only involves continuous functions and the normals $\nu_i^\pm$ for $i = 1, 3$. As usual, we do not keep track of the lengths of the drawn vectors.
Lemma 3.4. Let $F: \mathbb{R}^n \to C$ for a closed convex set $C \subset \mathbb{R}^m$ satisfy the decomposition (13) with locally integrable functions $f_i: \mathbb{R} \to \mathbb{R}^m$ and directions $\nu_i \in S^{n-1}$ for $i = 1, \dots, P$.
Then the decomposition (13) defines a locally integrable restriction $F|_V : V \to C$ and

Finally, we give the wave equations constituting the Saint-Venant compatibility conditions.
Lemma 3.5. If $e(u) \in S$, the diagonal elements of the strain satisfy the following wave equations:

Planarity in the case of non-trivial blow-ups
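While the statements in the previous subsections either rely on rather soft arguments or were previously known, we now come to the main ideas of the paper. As $\tilde K$, see definition (12), is a connected set, there are no restrictions on varying single points continuously in $\tilde K$. However, the crucial insight is that two different points $\hat\theta, \check\theta \in \tilde K$ with $\hat\theta_1 = \check\theta_1 > 0$ are much more constrained: since at least one component of a point of $\tilde K$ vanishes, $\theta_1 > 0$ forces $\theta_2 = 0$ or $\theta_3 = 0$, so that for each value $\theta_1 \in (0, 1)$ the only admissible points are $(\theta_1, 1 - \theta_1, 0)$ and $(\theta_1, 0, 1 - \theta_1)$.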
To exploit this rigidity, we first, for simplicity, assume the decomposition
Furthermore, suppose that $f_1$ is a BV-function with a jump discontinuity of size $\delta f_1$ at $x_1 = 0$ and that the other functions are continuous. Thus the blow-up of $\theta$ at some point $(0, x') \in B_1(0)$ takes two values $\hat\theta, \check\theta$, both of which satisfy $\hat\theta_1 = \check\theta_1 = \theta_1(0, x')$. A look at Figure 14 hopefully convinces the reader that $\theta_1(0, x')$ can take at most two values, which furthermore are independent of $x'$. As $\theta_1(0, \cdot)$ is a sum of two one-dimensional functions, some straightforward combinatorics imply that one of the two functions must be constant. Consequently, $\theta$ only depends on two directions. This can be adapted to our more complex decomposition (11), even without any a priori regularity of the one-dimensional functions. To do so, we need a topology for the blow-ups which respects the non-convex inclusion $e(u) \in K$, and a quantification of discontinuity for $f_\nu$ which ensures that its blow-up is non-constant.
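The "straightforward combinatorics" can be made explicit in a discrete toy setting. Suppose the two one-dimensional functions take their values in finite sets $A$ and $B$. Then their sum takes the values $A + B$. Ordering $a_1 < \dots < a_m$ and $b_1 < \dots < b_n$ exhibits the $m + n - 1$ distinct sums $a_1 + b_1 < \dots < a_1 + b_n < a_2 + b_n < \dots < a_m + b_n$. Hence at most two values force $m = 1$ or $n = 1$. The brute-force Python check below only illustrates this finite-set fact. It is not part of the argument in the text, which works with measurable functions.

```python
from itertools import combinations


def sumset(A, B):
    """All values a + b of a sum of two functions with value sets A and B."""
    return {a + b for a in A for b in B}


def two_values_force_constant(values, max_size=3):
    """Check, over all subsets A, B of `values` with at most `max_size` elements,
    that |A + B| >= |A| + |B| - 1, and hence that |A + B| <= 2 forces
    |A| == 1 or |B| == 1."""
    subsets = [set(c) for k in range(1, max_size + 1)
               for c in combinations(values, k)]
    for A in subsets:
        for B in subsets:
            S = sumset(A, B)
            if len(S) < len(A) + len(B) - 1:
                return False
            if len(S) <= 2 and len(A) > 1 and len(B) > 1:
                return False
    return True


print(two_values_force_constant(range(-2, 3)))  # True
```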
In order to keep the non-convexity, we consider the push-forwards
for $x \in \mathbb{R}^3$ and $\varepsilon \to 0$. This approach is very similar in spirit to using Young measures, but without a further localization in the variable $y$. Positing that $f_\nu$ does not have a constant blow-up along some sequence then means that $f_\nu$ does not converge strongly to a constant on average, i.e., it does not converge to its average on average. If one allows the midpoints $x$ of the blow-ups to depend on $\varepsilon$, we see that this is equivalent to $f_\nu \notin \mathrm{VMO}$ according to Definition 3.7 given below. The resulting statement is the following.

Proposition 3.6. There exists a universal radius $r > 0$ with the following property: Let
Note that the second part is measure-theoretically meaningful by Lemma 3.4, see in particular Figure 13.
For the convenience of the reader, we provide a definition of the space $\mathrm{VMO}(U)$ for an open domain $U \subset \mathbb{R}^n$, $n \in \mathbb{N}$, which is modeled after the one given by Sarason [40] in the whole-space case. It can be shown that, at least for sufficiently nice sets $U$, the space VMO is the BMO-closure of the continuous functions on $U$, and as such it serves as a substitute for $C(U)$ in our setting. Functions of vanishing mean oscillation need not be continuous, although they do share some properties with continuous functions, such as the "mean value theorem", see Lemma 3.12. We stress that the uniformity of the convergence in $x$ is crucial and cannot be omitted without changing the space, as can be seen by considering a function consisting of very thin spikes of height one clustering at some point.
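The following Python sketch shows why a jump is excluded from VMO. It approximates the mean oscillation by a midpoint rule. The mean oscillation is the average of $|f - \text{(average of } f)|$ over an interval of radius $\varepsilon$ around a point. We assume that Definition 3.7 is built from this Sarason-type quantity, taking a supremum over centres and radii at most $\varepsilon$ before letting $\varepsilon \to 0$. The sketch only evaluates it at a single centre, and the helper names are hypothetical. At the jump of the Heaviside function, the mean oscillation equals $1/2$ on every scale. For the Lipschitz function $t \mapsto t$, it equals $\varepsilon/2$.

```python
def mean_oscillation(f, x, eps, n=2000):
    """Midpoint-rule approximation of the average of |f - (average of f)|
    over the interval [x - eps, x + eps]."""
    h = 2 * eps / n
    samples = [f(x - eps + (k + 0.5) * h) for k in range(n)]
    avg = sum(samples) / n
    return sum(abs(s - avg) for s in samples) / n


def heaviside(t):
    return 1.0 if t > 0 else 0.0


for eps in (0.1, 0.01, 0.001):
    jump = mean_oscillation(heaviside, 0.0, eps)         # stays at 1/2
    lipschitz = mean_oscillation(lambda t: t, 0.0, eps)  # equals eps / 2
    print(eps, jump, lipschitz)
```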
There is another, slightly more subtle issue in the proof of Proposition 3.6: As already explained, our argument works by looking at a single plane at which we blow up. Consequently, we can only distinguish the two cases $\theta_i \equiv 0$ and $\theta_i \not\equiv 0$ on said hyperplane. Therefore, we need a way of transporting the information $\theta_i \equiv 0$ from the hyperplane to an open ball. Given our combinatorics, this turns out to be the 3D analog of the question whether a function $F(x_1, x_2) = f(x_1) + g(x_2)$ that is constant on the diagonal of a square has to be constant on the entire square. In general this fails, e.g., for $f(t) = t$ and $g(t) = -t$. However, the fact that 0 is an extremal value for $\theta_1$ saves us: If $F$ is constant on the diagonal of a square and achieves its minimum there, then it has to be constant on the entire square, see also Figure 15a. For later use, we already state this fact in its perturbed form.
and some constant $c \in \mathbb{R}$. Let $\varepsilon \geq 0$ and let one of the following two statements be true:
2. The sum satisfies $f(t) + g(t) \leq c + \varepsilon$ for almost all $t \in (0, 1)$.
If $\varepsilon = 0$, then all three statements are equivalent.
This statement can be lifted to three-dimensional domains. It states that in order to deduce that $\theta_i$ is constant and extremal, it is enough to know that the extremal value is attained on a suitable line, which we will parametrize by $l(t) := x_0 + \sqrt{2}\, t E_i$. Here, $E_i$ is the $i$-th standard basis vector of $\mathbb{R}^3$, and the restriction of $\theta_i$ to the image of $l$ is defined by Lemma 3.4. It will later be important that we have a precise description of the maximal set to which the information $\theta_i = 0$ can be transported, which turns out to be the polyhedron $P$ defined in Lemma 3.9 below, see Figure 15b. The general strategy of the proof is described in Figure 16.
There is also a generalization of the statement that the one-dimensional functions are almost constant in two dimensions: In three dimensions, the one-dimensional functions are close to being affine on $P$ in the sense that inequality (17) holds. (Lemma 3.13 ensures that there then exist affine functions which are close.) As we only need this part of the statement in approximation arguments, we may additionally assume that the one-dimensional functions are continuous to avoid technicalities.

Figure 15: Sketch of the polyhedron $P$ with normals $\nu_i^\pm$ for $i = 2, 3$, which is the maximal set to which we can propagate the information $\theta_1 \equiv 0$ or $\theta_1 \equiv 1$ on the dashed line $l(I)$.
The resulting statement is the following.

Lemma 3.9. There exists a radius $0 < r < 1$ with the following property: Let $\theta$ satisfy decomposition (11) on $B_1(0)$ and let $0 \leq \theta_i \leq 1$ for all $i = 1, 2, 3$. Let $I \subset \mathbb{R}$ be a closed interval, let $x_0 \in \mathbb{R}^3$ and let $l(t) := x_0 + \sqrt{2}\, t E_i \in B_r(0)$ for $t \in I$ and some $i \in \{1, 2, 3\}$. Additionally, let $\nu \in N_i$. We define the polyhedron $P$ to be
see also Figure 15.
For $\varepsilon > 0$ assume that either
Then for almost all $x \in P \subset B_1(0)$ we have

Figure 16: a) First, we transport the information $\{\theta_1 \approx 0\}$ from the dashed line $l(I)$ to the gray plane $H(0, (011)) \cap P$ using the two-dimensional result. b) In a second step, we use $\{\theta_1 \approx 0\}$ along another dashed line $\tilde l(\tilde I)$ parallel to $E_1$ to propagate the information to $H(\alpha, (011)) \cap P$ for all $\alpha \in \mathbb{R}$.
Furthermore, if additionally the one-dimensional functions $f_\nu$ are continuous for every $\nu \in N_{i+1} \cup N_{i-1}$, then they are almost affine in the sense that

There is yet another minor subtlety of a measure-theoretic nature. We already mentioned that we require the midpoints of the blow-ups to depend on their radius. It is thus entirely possible that the radii vanish much faster than the midpoints converge. This means that we cannot use Lebesgue point theory in an entirely straightforward manner to prove that the blow-ups of $f_\nu$ converge to their point values almost everywhere. We deal with this issue by exploiting the density of continuous functions in $L^p$.

To fix ideas, let us first illustrate the argument in the case of continuous functions in the whole space: By the mean value theorem, the case $e(u) \in \{e_1, e_2, e_3\}$ is trivial, so let us suppose that there is a point $x$ such that $e(u)(x)$ lies strictly between two pure martensite strains. We may as well suppose $x = 0$, $\theta_1(0) = 0$ and $0 < \theta_2(0), \theta_3(0) < 1$, see Figure 17. By continuity, $\theta_2$ and $\theta_3$ remain positive near $0$, so that $\theta \in \tilde K$ forces $\theta_1 = 0$ there; hence the set $\{\theta_1 = 0\}$ has non-empty interior. By the decomposition (11), any connected component of it should be a polyhedron $P$ whose faces have normals lying in $N_2 \cup N_3$, see Figure 18a. Additionally, continuity implies that $e(u) \equiv e_2$ or $e(u) \equiv e_3$ on each face.
Unfortunately, after dropping continuity, only $\theta_i$ will be a well-defined function on a face with normal in $N_i$ for $i = 2, 3$, due to Lemmas 3.2 and 3.4. Therefore, on such a face we can only use the above information in the form $\theta_i \equiv 0$ or $\theta_i \equiv 1$. Using Lemma 3.9, we get a polyhedron $Q$ that transports this information back inside $P$, see Figure 18b. The goal is then to show that we can reach $x$ in order to get a contradiction to $e(u)(x)$ lying strictly between $e_2$ and $e_3$, which we will achieve by using the face of $P$ closest to $x$.
In order to turn this string of arguments into a proof in the case $f_\nu \in \mathrm{VMO}$ for all $\nu \in N$, the key insight is that non-convex inclusions and approximation by convolutions interact very nicely for VMO-functions. As has been pointed out to us by Radu Ignat, this elementary, if maybe a bit surprising, fact has previously been used in the degree theory for VMO-functions, see Brezis and Nirenberg [8, Inequality (7)], who attribute it to L. Boutet de Monvel and O. Gabber. For the convenience of the reader, we include the statement and present a proof later.

Figure 18: a) On the red face we get the information $\theta_2 \equiv 0$ or $\theta_2 \equiv 1$. In particular, we get it along the line $l$, which is parallel to $E_2$. b) Sketch of the polyhedron $Q$ that transports the information $\theta_2 \equiv 0$ or $\theta_2 \equiv 1$ along $l$ to the inside of $P$.

Unfortunately, formalizing the set $\{\theta_{1,\delta} \approx 0\}$ in such a way that its connected components are polyhedra is a bit tricky. We do get that they contain polyhedra on which the one-dimensional functions are close to affine ones, see Lemmas 3.9 and 3.13. However, we do not immediately get the other inclusion: As the directions in the decomposition are linearly dependent, one of the one-dimensional functions deviating too much from its affine replacement does not translate into $\theta_1$ deviating too much from zero.
We side-step this issue by first working on hyperplanes $H(\alpha, (011))$. In that case, the decomposition of $\theta_1$ simplifies to two one-dimensional functions, and thus we do get that connected components of $\{\theta_{1,\delta} \approx 0\} \cap H(\alpha, (011))$ are parallelograms. The goal is then to prove that at least some of them, let us call them $R_\delta$, do not shrink away in the limit $\delta \to 0$. Making use of Lemma 3.9, we can go back to a full-dimensional ball and get that the set $\{\theta_1 = 0\}$ has non-empty interior. This allows the argument for continuous functions to be generalized to VMO-functions.
In order to prove that $R_\delta$ does not get too small, we choose it such that we are in the situation depicted in Figure 19. We will show that $\theta_{2,\delta}(y_\delta) \approx 0$ and $\theta_{2,\delta}(z_\delta) \approx 1$, or vice versa. Together with the fact that $\theta_2 \circ l$ is close to an affine function in a strong topology by Lemma 3.13 below, this shows that $\theta_2$ would not have vanishing mean oscillation if $R_\delta$ shrank away, i.e., if $|y_\delta - z_\delta| \to 0$.
Then there exists an affine function $\tilde g$ such that

This is closely related to the so-called Hyers-Ulam-Rassias stability of additive functions, on which there is a large body of literature determining rates for the closeness to linear functions, see e.g. Jung [21]. As such, this statement may well already be present in the literature. However, as far as we can see, the corresponding community seems to be mostly concerned with the whole-space case.
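The following Python sketch gives a numerical illustration of "closeness to affine functions". It samples a function on a uniform grid of $[0,1]$ and compares two quantities. The first is the largest discrete second difference $|g(s+h+\tilde h) - g(s+h) - g(s+\tilde h) + g(s)|$, as it appears in the proofs of this section. The second is the sup-distance to the affine function interpolating the endpoint values. The sketch does not reproduce the precise statement or constants of Lemma 3.13, and the helper names are hypothetical. For $g(t) = \frac{1}{10}t^2$, the second differences equal $\frac15 h\tilde h$, so the first quantity is $\frac{1}{20}$ (attained for $s = 0$, $h = \tilde h = \frac12$). The distance to the chord is $\frac{1}{40}$.

```python
def second_difference_defect(values):
    """max |g(s+h+k) - g(s+h) - g(s+k) + g(s)| over grid indices s, h, k >= 0."""
    n = len(values) - 1
    worst = 0.0
    for s in range(n + 1):
        for h in range(n + 1 - s):
            for k in range(n + 1 - s - h):
                d = values[s + h + k] - values[s + h] - values[s + k] + values[s]
                worst = max(worst, abs(d))
    return worst


def distance_to_chord(values):
    """Sup-distance to the affine function matching the two endpoint values."""
    n = len(values) - 1
    a, b = values[0], values[-1]
    return max(abs(v - (a + (b - a) * i / n)) for i, v in enumerate(values))


n = 20
quadratic = [0.1 * (i / n) ** 2 for i in range(n + 1)]
print(second_difference_defect(quadratic), distance_to_chord(quadratic))
```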

Classification of planar configurations
It remains to exploit the two-dimensionality that was the result of Proposition 3.6. It allowed us to reduce the complexity of the decomposition (11) to three one-dimensional functions with linearly dependent normals and three affine functions. We first deal with the easier case where one of the one-dimensional functions is affine and can be absorbed into the affine ones.
Then the configuration is a planar second-order laminate or a planar checkerboard on $B_r(0)$.
While the preceding lemma is mostly an issue of efficient book-keeping to reap the rewards of previous work, we now have to make a last effort to prove the rather strong rigidity properties of planar triple intersections:
Then the configuration is a planar triple intersection on $B_r(0)$.
The idea is to prove that the sets $\theta_i^{-1}(0)$ for $i = 1, 2, 3$ take the form
where $J_j \subset \mathbb{R}$ and $\pi_j(x) := \nu_j \cdot x$ for $j = 1, 2, 3$, i.e., they are product sets in suitable coordinates. Expressing the condition $\bigcup_{i=1}^3 \theta_i^{-1}(0) = B_1(0)$ in terms of these sets allows us to apply Lemma 3.16 below to conclude that $J_j$ is an interval for $j = 1, 2, 3$. The actual representation of the strain is then straightforward to obtain.

Lemma 3.16. There exists a universal radius $0 < r < \frac{1}{2}$ such that the following holds: Let $\nu_1, \nu_2, \nu_3 \in S^1$ be linearly dependent by virtue of $\nu_1 + \nu_2 + \nu_3 = 0$. Let $\pi_i(x) := x \cdot \nu_i$ for $x \in \mathbb{R}^2$ and $i = 1, 2, 3$. Let $J_1, J_2, J_3 \subset [-1, 1]$ be measurable such that
2. the two sets $J_1$ and $J_2$ neither have zero nor full measure, i.e., it holds that
Then there exists a point $x_0 \in \mathbb{R}^2$ such that $x_0 \cdot \nu_i \in (-r, r)$ for all $i = 1, 2, 3$ and, up to sets of $\mathcal{L}^1$-measure zero, either

To illustrate the proof, let us first assume that $J_1$ and $J_2$ are intervals of matching "orientations", e.g., we have $J_1 = J_2 = [-r, 0]$, in which case Figure 20a suggests that also $J_3 = [-r, 0]$.
If they are not intervals of matching "orientations", we will see that, locally and up to symmetry, more of $J_1$ lies below, for example, the value 0 than above, while the opposite holds for $J_2$. The corresponding parts of $J_1$ and $J_2$ are shown in Figure 20b. One then needs to prove that sufficiently many lines $\pi_3^{-1}(s)$ for parameters $s$ close to 0 intersect the "surface" of $\pi_1^{-1}(J_1) \cap \pi_2^{-1}(J_2)$, see Lemma 3.17 below. As a result, less than half of the parameters around 0 are contained in $J_3$. The same argument for the complements ensures that also less than half of them are not contained in $J_3$, which cannot be true.

Figure 20: Sketches illustrating the proof of Lemma 3.16. The arrows in the middle indicate the three linearly dependent directions $\nu_1$, $\nu_2$ and $\nu_3$.
To link intersecting lines to the "surface area", we use that our sets are of product structure, i.e., they can be thought of as unions of parallelograms, and that the intersecting lines are not parallel to any of the sides of said parallelograms. In the following and final lemma, we measure-theoretically ensure that the line $\pi_3^{-1}(s)$ intersects a product set $\pi_1^{-1}(K_1) \cap \pi_2^{-1}(K_2)$ by asking
is measurable and satisfies $|M| \geq |K_1| + |K_2|$.
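For intuition on the bound $|M| \geq |K_1| + |K_2|$, consider the idealised, non-measure-theoretic version. Since $\nu_1 + \nu_2 + \nu_3 = 0$, we have $\pi_3 = -(\pi_1 + \pi_2)$. Since $\nu_1$ and $\nu_2$ are linearly independent, every pair of values $(\pi_1(x), \pi_2(x))$ is attained. Hence the line $\pi_3^{-1}(s)$ meets $\pi_1^{-1}(K_1) \cap \pi_2^{-1}(K_2)$ exactly when $-s \in K_1 + K_2$. For finite unions of intervals, the bound then reduces to the one-dimensional Brunn-Minkowski inequality $|K_1 + K_2| \geq |K_1| + |K_2|$, which the Python sketch below checks on examples. We do not claim that this is how the set $M$ in Lemma 3.17 is defined.

```python
def merge(intervals):
    """Union of closed intervals (a, b), returned as sorted disjoint intervals."""
    out = []
    for a, b in sorted(intervals):
        if out and a <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], b))
        else:
            out.append((a, b))
    return out


def measure(intervals):
    """Lebesgue measure of a finite union of closed intervals."""
    return sum(b - a for a, b in merge(intervals))


def minkowski_sum(K1, K2):
    """K1 + K2 for finite unions of closed intervals."""
    return merge([(a1 + a2, b1 + b2) for a1, b1 in K1 for a2, b2 in K2])


K1 = [(0, 1), (3, 4)]
K2 = [(0, 1)]
print(measure(minkowski_sum(K1, K2)), measure(K1) + measure(K2))  # 4 3
```

Equality holds when both sets are single intervals, e.g., $[0,1] + [0,2] = [0,3]$.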

The differential inclusion
Proof of Lemma 3.1. Fixing the sequence $(u_\eta, \chi_\eta)$, we interpret the energies as a sequence of finite Radon measures on $B_1(0)$.
Let $y \in B_1(0)$ and $r > 0$ be such that $B_r(y) \subset \Omega$. By translation invariance, we can assume $y = 0$. We rescale our functions to the unit ball by setting $\hat x := \frac{x}{r}$ and $\hat\eta := \frac{\eta}{r}$, and defining $\hat u_{\hat\eta}$:
After passing to a subsequence, we have $E_\eta \overset{*}{\rightharpoonup} E$ as Radon measures in the limit $\eta \to 0$.
By standard covering arguments one can see that
Thus, for almost every point $x \in B_1(0)$, we have
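The scaling behind the rescaling step of this proof can be made explicit. We assume, as our own choice for illustration, that $\hat u_{\hat\eta}(\hat x) := r^{-1} u_\eta(r\hat x)$ and $\hat\chi_{\hat\eta}(\hat x) := \chi_\eta(r\hat x)$. Then $e(\hat u_{\hat\eta})(\hat x) = e(u_\eta)(r\hat x)$. The bulk term scales like $r^3$ and the total variation like $r^2$. Writing $E_\eta(\cdot\,; B_r(0))$ for the energy with the domain of integration replaced by $B_r(0)$, we obtain:

```latex
\begin{aligned}
E_{\hat\eta}\big(\hat u_{\hat\eta},\hat\chi_{\hat\eta};B_1(0)\big)
&= \Big(\frac{\eta}{r}\Big)^{-\frac23} r^{-3}\int_{B_r(0)}\Big|e(u_\eta)-\sum_{i=1}^3\chi_{\eta,i}\,e_i\Big|^2\,dx
 + \Big(\frac{\eta}{r}\Big)^{\frac13} r^{-2}\sum_{i=1}^3|D\chi_{\eta,i}|\big(B_r(0)\big)\\
&= r^{-\frac73}\,E_\eta\big(u_\eta,\chi_\eta;B_r(0)\big).
\end{aligned}
```

In particular, bounded energies stay bounded under this rescaling, up to the fixed factor $r^{-7/3}$, and $\hat\eta \to 0$ as $\eta \to 0$.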

Decomposing the strain
Proof of Lemma 3.2. The proof is essentially a translation of the proofs of Capella and Otto [10, Lemma 4 and Proposition 1] into our setting. To this end, we use the "dictionary"
where the left-hand side shows our objects and the right-hand side shows the corresponding ones of Capella and Otto. The two main changes are the following:
1. In our case all relevant second mixed derivatives vanish (see Lemma 3.5), instead of being controlled by the energy. Furthermore, whenever Capella and Otto refer to their "austenitic result", we just have to use the fact that $e(u)_{11} + e(u)_{22} + e(u)_{33} \equiv 0$.
2. We need to check at every step that boundedness of all involved functions is preserved.
We will briefly indicate how boundedness of all functions is ensured. The functions in [10, Lemma 4] are constructed by averaging in certain directions, which clearly preserves boundedness. The proof of [10, Proposition 1] works by applying pointwise linear operations to all functions, which again preserves boundedness, and by identifying certain functions as being affine, which are also bounded on the unit ball.
Proof of Corollary 3.3. By symmetry we can assume $i = 1$. Applying [10, Lemma 5] to $\theta_1$, we see that the four functions $f_\nu$ with $\nu \in N_2 \cup N_3$ are affine on some ball $B_r(0)$ with a universal radius $r > 0$. Thus the decomposition reduces to
on $B_r(0)$. As the vectors $(011)$ and $(01\bar1)$ form a basis of the plane $H(0, E_1)$, we can absorb the parts of $g_2$ depending on $x_2$ and $x_3$ into $f_{(011)}$ and $f_{(01\bar1)}$. Due to $\theta_1 + \theta_2 + \theta_3 = 1$, we have $g_2(x) + g_3(x) \equiv 1$ and the decomposition simplifies to $\theta_1 \equiv 0$, $\theta_2 = f_{(011)} + f_{(01\bar1)} + \lambda x_1 + 1$,

Proof of Lemma 3.4. Let
For $x \in V$ and $\delta > 0$ we have that
since $B_1(0)$ is invariant under rotations and $C$ is convex. By standard statements about convolutions and sequences converging in $L^1$, we get a subsequence in $\delta$, which we will not relabel, and a measurable set $T \subset \mathbb{R}$ with $\mathcal{L}^1(\mathbb{R} \setminus T) = 0$ such that $\varphi_\delta * f_i(t) \to f_i(t)$ for all $i = 1, \dots, P$ and all $t \in T$. Let $\bar\nu_i \in V \cap B_1(0) \setminus \{0\}$ be the orthogonal projection of $\nu_i$ onto $V$ for all $i = 1, \dots, P$. A simple calculation implies that
Thus for almost all $x \in V$ we have that

Proof of Lemma 3.5. By symmetry it is sufficient to prove the equations involving $\theta_1$. We calculate
We also know
Taking a further derivative we see

Planarity in the case of non-trivial blow-ups
Proof of Proposition 3.6.
Step 1: Identification of a suitable plane to blow up at. By symmetry, we may assume $\nu = \frac{1}{\sqrt{2}}(011)$. We use two symbols for universal radii throughout the proof. The radius $\bar r > 0$, which will be the radius referred to in the statement of the proposition, stays fixed throughout the proof, and its value will be chosen at the end of the proof. In contrast, the radius $r > \bar r$ may decrease from line to line.
Step 2: There exists a subsequence, which we will not relabel, such that for almost all $(\beta, \gamma) \in B_r(0)$ we have
Additionally, the probability measure $\mu$ on $\mathbb{R}^2$ is not a Dirac measure.
The combinatorics behind the first convergence can be found in Figure 21a.
As $\nu \cdot X_k(\beta, \gamma)$ and $\nu \cdot X(\beta, \gamma)$ depend on at least $\beta$ or $\gamma$, see equations (23)-(27) and (30)-(34), and we have the uniform convergence $X_k \to X$, we can apply Lemma 3.10 to deduce that the integral in the last line vanishes in the limit. Passing to a subsequence, we get strong convergence in $\xi$ for almost all $(\beta, \gamma) \in B_r(0)$.
Due to the fact that $X_k(\beta, \gamma) \cdot \frac{1}{\sqrt{2}}(011) = \alpha_k$, we see that $f^{(k)}_{(011)}$ does not depend on $\beta$ and $\gamma$. Hence we may drop them in equation (35). As $f_{(011)}$ is a bounded function, the push-forward measures defined by the left-hand side have uniformly bounded supports. Consequently, there exists a limiting probability measure $\mu$ such that along a subsequence we have
for all $\psi \in C(\mathbb{R}^2)$. Finally, if we had $\mu = \delta_{\bar f}$, then testing this convergence with the function $\psi(\hat g) = |\hat g_2 - \bar f_2|$, we would see that
because in $L^1$ the average is almost the constant closest to a function. However, this would contradict the convergence (21) to a strictly positive number after undoing the rescaling.
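The claim that "in $L^1$ the average is almost the closest constant" can be quantified. Let $\bar f$ denote the average of $f$ and let $c$ be any constant. The triangle inequality gives $|f - \bar f| \leq |f - c| + |c - \bar f|$. Moreover, $|c - \bar f|$ is at most the average of $|f - c|$. Hence the mean deviation from $\bar f$ is at most twice the mean deviation from $c$. The Python sketch below checks this factor 2 on random samples, using the sample median as the optimal constant. The helper names are hypothetical.

```python
import random
import statistics


def mean_abs_dev(samples, c):
    """Average of |f - c| for a function sampled at equally weighted points."""
    return sum(abs(s - c) for s in samples) / len(samples)


def average_vs_best_constant(samples):
    """Return (deviation from the mean, smallest deviation from any constant).

    The median minimises c -> mean_abs_dev(samples, c)."""
    mean = sum(samples) / len(samples)
    return mean_abs_dev(samples, mean), mean_abs_dev(samples, statistics.median(samples))


def worst_ratio(n_trials=100, n_samples=51, seed=0):
    """Largest observed ratio of the two deviations over random samples."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_trials):
        samples = [rng.expovariate(1.0) for _ in range(n_samples)]
        dev, best = average_vs_best_constant(samples)
        worst = max(worst, dev / best)
    return worst


print(worst_ratio())  # lies between 1 and 2
```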
Step 3: For all $(\beta, \gamma)$ as in Step 2 we have
for all $\psi \in C_0(\mathbb{R}^3)$, where $z_2, z_3$ are defined by equations (38) and (39). The measure $\bar\mu$ defined by the right-hand side is supported on $\tilde K$, see definition (12). The previous calculations immediately give that $\theta^{(k)}_1$ converges strongly in $\xi$ to
Similarly, the blow-ups (θ resp.
As the required convergence (36) is induced by a topology, we only have to identify the limit along subsequences, which may depend on $\beta$ and $\gamma$, of arbitrary subsequences. Thus we may extract a subsequence to obtain pointwise convergence a.e. of the sequences θ (011))(β, γ; ξ). Applying both Egoroff's and Lusin's Theorems, these convergences can be taken to be uniform and the limits to be continuous on sets of almost full measure. Consequently, we get that
for all $\psi \in C_0(\mathbb{R}^3)$. Testing with $\psi = \operatorname{dist}(\cdot, \tilde K)$, we see that the measure $\bar\mu$ has support in $\tilde K$.
To see that $(z_2, z_3)$ is constant on $B$, note that the above implies
for $(\beta, \gamma), (\tilde\beta, \tilde\gamma) \in B$. As a non-empty set which is invariant under a single, non-vanishing shift has to be at least countably infinite, we see that $(z_2, z_3)$ has to be constant on $B$.
As the plane $H(\alpha, (011))$ contains plenty of lines parallel to $E_1$, see Figure 21b, an application of Lemma 3.9 ensures that $\theta_1 \equiv 0$ on $B_r(0)$. Corollary 3.3 then implies that we are dealing with a two-variant configuration.
Step 5: By the decomposition of $\theta_1 \circ X(\beta, \gamma)$, see equation (37), and its interplay with the coordinates $X$, see equations (29)-(34), we have
where $\lambda_1, \lambda_2, c \in \mathbb{R}$ and
As by Step 4 the function $\theta_1 \circ X(\beta, \gamma)$ takes at most two values almost everywhere, either $F_1$ or $F_2$ is constant almost everywhere.
We only deal with the case in which $F_2$ is constant; the argument for the other one works analogously. Consequently, we get a measurable set $D \subset (-r, r)$ such that $|D| > 0$ and $D \times (-r, r) \subset B$, see Figure 23.
We will follow the notation of Capella and Otto [10] in writing discrete derivatives of a function $\varphi(\gamma)$ as

Figure 23: Sketch of the set $D \times (-r, r)$. We take differences of the constant shifts $(z_2, z_3)$ in $\gamma$ and in $\beta$ in order to isolate a single function $f_\nu$ by Remark 2.3 and prove that it is affine.
We proved in Step 4 that the shift $(z_2, z_3)$ is constant almost everywhere on $B$. Thus we get for $h \in (-r, r)$, $\beta \in D$ and almost all $\gamma \in (-r, r)$ that
The fact that $g_2$ is affine implies that $\partial^h_\gamma (g_2 \circ X)$ is independent of $\beta$. Thus, "differentiating" again under the constraint $\beta, \tilde\beta \in D$, we get
Even though in general we have $D \neq (-r, r)$, we can still apply [10, Lemma 7] due to $|D| > 0$ to get $\partial^h \partial^{\tilde h} f_{(011)}(t) = 0$ for almost all $t \in (-r, r)$ and shifts $h, \tilde h \in (-r, r)$. Consequently, the function $f_{(011)}$ is affine, see e.g. Lemma 3.13. Referring back to equation (41), we see that also $f_{(110)}$ is affine.
In the standard basis of $\mathbb{R}^3$ this translates to
since $\partial_\gamma$ corresponds to differentiating in the direction of $[111]$ by equation (28). At last, we are in the position to choose $\bar r := \frac{1}{2} r$, so that we get
The analogue of (41) using $z_3$ rather than $z_2$ gives that $f_{(101)}$ is affine and that we may find an affine function $\tilde g_3$ with $\partial_{[111]} \tilde g_3 = 0$ such that
Equations (42)

$f(x_1) + g(x_2) = c = 0$.
Let $\delta > 0$. We know that
Consequently, we have that
As a result, we know $-\delta \leq \operatorname{ess\,inf} f + \operatorname{ess\,inf} g$ for all $\delta > 0$, which implies the claim.
For almost all $x \in (0, 1)^2$ we know that
In particular, we know $\operatorname{ess\,inf} f + \operatorname{ess\,inf} g \leq \varepsilon$.
By Fubini's Theorem there exists an $x_2 \in (0, 1)$ such that we have
for almost all $x_1 \in (0, 1)$. Thus we see
A similar argument ensures $g \leq \operatorname{ess\,inf} g + \varepsilon$.
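In a discrete setting, the mechanism of this proof becomes a two-line estimate. The Python sketch below assumes, as the proof suggests, that the standing hypothesis of Lemma 3.8 is $f(x_1) + g(x_2) \geq c$ on the square (normalised to $c = 0$). It further assumes that the sum is at most $c + \varepsilon$ on the diagonal. Under these hypotheses it confirms $\max f - \min f \leq \varepsilon$ and $\max g - \min g \leq \varepsilon$. This is the grid analogue of $f \leq \operatorname{ess\,inf} f + \varepsilon$ and $g \leq \operatorname{ess\,inf} g + \varepsilon$. The exhaustive enumeration over small integer-valued examples is only a sanity check, and the function names are hypothetical.

```python
from itertools import product


def diagonal_control(f, g, eps):
    """Discrete analogue of the mechanism in the proof above.

    f, g: lists of length n (values on a grid of (0, 1)).
    Hypotheses: f[i] + g[j] >= 0 for all i, j, and f[t] + g[t] <= eps on the
    diagonal.  Then f[t] <= eps - g[t] <= eps - min(g) <= eps + min(f), so
    both oscillations are at most eps.
    """
    n = len(f)
    assert len(g) == n
    assert all(a + b >= 0 for a in f for b in g), "sum must be >= 0 on the square"
    assert all(f[t] + g[t] <= eps for t in range(n)), "sum must be <= eps on the diagonal"
    return max(f) - min(f), max(g) - min(g)


def exhaustive_check(values, n, eps):
    """Enumerate all f, g in values**n satisfying the hypotheses and confirm the
    oscillation bound; returns the number of admissible pairs."""
    count = 0
    for f in product(values, repeat=n):
        for g in product(values, repeat=n):
            if (all(a + b >= 0 for a in f for b in g)
                    and all(f[t] + g[t] <= eps for t in range(n))):
                osc_f, osc_g = diagonal_control(list(f), list(g), eps)
                assert osc_f <= eps and osc_g <= eps
                count += 1
    return count
```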
Proof of Lemma 3.9. The radius $r > 0$ is only required to ensure that $P \subset B_1(0)$. We may thus translate, rescale and use the symmetries of the problem to only work in the case $i = 1$, $x_0 = 0$, $I = (-1, 1)$. These additional assumptions imply
for $\nu \in N_2 \cup N_3$ and, consequently, $P = \bigcap_{\nu \in N_2 \cup N_3} \{x \in \mathbb{R}^3 : |\nu \cdot x| < 1\}$. Furthermore, we only have to deal with the case $\theta_1 \circ l \leq \varepsilon$, as the other one can be dealt with by working with $\tilde\theta_1 := 1 - \theta_1$. We remind the reader that Figure 16 depicts the general strategy of the proof.
Step 2: Prove inequality (16) on a suitable subset of P .
The proof so far ensured that the argument of θ 1 in this inequality lies in P . We now need to prove that we did not miss significant parts.
To this end, we exploit that $\bar P = \bigcap_{\nu \in N_2 \cup N_3} \{x \in \mathbb{R}^3 : |\nu \cdot x| \leq 1\}$ is a three-dimensional polyhedron. A fundamental result in the theory of bounded, non-empty polyhedra, see Brøndsted [9, Corollary 8.7 and Theorem 7.2], is that they can be represented as the convex hull of their extremal points. Following Brøndsted [9, Chapter 1, §5], extremal points $x \in \bar P$ are defined by the property that $\bar P \setminus \{x\}$ is still convex, see also Figure 24. Thus, in order to prove that $0 \leq \theta_1(x) \leq 6\varepsilon$ holds for $x \in P$, we only have to argue that the closure of the set
[111] : z ∈ (−1, 1) and α, β ∈ I(z)
contains all extremal points and is convex.
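The vertex computation in the next paragraph is purely combinatorial and can be checked mechanically. The Python sketch below assumes $N_2 \cup N_3 = \{\frac{1}{\sqrt2}(1,0,\pm1), \frac{1}{\sqrt2}(1,\pm1,0)\}$; the signs of the normals do not affect $\bar P$. It works in the coordinates $y = x/\sqrt2$, in which $\bar P = \{|n \cdot y| \leq 1\}$ for the integer vectors $n = \sqrt2\,\nu$, which keeps the arithmetic exact. The sketch intersects all triples of bounding planes and keeps the intersection points that satisfy every constraint. The names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Integer normals n = sqrt(2) * nu for the (assumed) normals nu in N_2 u N_3.
NORMALS = [(1, 0, 1), (1, 0, -1), (1, 1, 0), (1, -1, 0)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Cramer's rule with exact fractions; returns None for singular systems."""
    d = det3(m)
    if d == 0:
        return None
    sol = []
    for col in range(3):
        mc = [list(row) for row in m]
        for row in range(3):
            mc[row][col] = rhs[row]
        sol.append(Fraction(det3(mc), d))
    return tuple(sol)


def vertices():
    """Vertices, in y = x / sqrt(2), of {y : |n . y| <= 1 for all n in NORMALS}."""
    found = set()
    for planes in combinations(NORMALS, 3):
        for signs in product((1, -1), repeat=3):
            y = solve3(planes, signs)
            if y is None:
                continue
            if all(abs(sum(a * b for a, b in zip(n, y))) <= 1 for n in NORMALS):
                found.add(y)
    return found


# six points: (1, 0, 0), (-1, 0, 0), (0, 1, 1), (0, -1, -1), (0, 1, -1), (0, -1, 1)
print(sorted(vertices()))
```

Multiplying by $\sqrt2$ gives $\pm\sqrt2 E_1$, $\pm\sqrt2(011)$ and $\pm\sqrt2(01\bar1)$, so $\bar P$ is an octahedron.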
The extremal points can be computed in a straightforward manner by finding all intersections of three of its two-dimensional faces still lying in $\bar P$. The resulting points are $\pm\sqrt{2} E_1$, $\pm\sqrt{2}(011) = \pm 2\nu_1^+$ and $\pm\sqrt{2}(01\bar1) = \pm 2\nu_1^-$, see Figure 24. These can be presented as
Furthermore, in order to see that $Q$ is convex, we only have to prove
for all $-1 \leq z_1, z_2 \leq 1$. Indeed, by the triangle inequality we have

Step 4: Prove that $f_\nu$ is almost affine for $\nu \in N_2 \cup N_3$ if the one-dimensional functions are continuous. We will only deal with $\nu = \frac{1}{\sqrt{2}}(101)$. The advantage of working with continuous functions is that we do not have to bother with sets of measure zero. Let $(s, h, \tilde h) \in \mathbb{R}^3$ be such that $s, s + h, s + \tilde h, s + h + \tilde h \in (-1, 1)$. In order to exploit Remark 2.3 we set
To prove $x_j \in P$ for all $j = 1, 2, 3, 4$, we go through the cases:
• The facts $x_0 \cdot \nu = s$ and $\frac{1}{\sqrt{2}}[111] \cdot \nu = 1$ clearly imply $x_j \cdot \nu \in (-1, 1)$ for $j = 1, 2, 3, 4$.

By Step 3 we have
Inserting the decomposition into the one-dimensional functions and making use of the combinatorics above, we see that

$|f(x + z + \tau y) - f(x)|^p \, dy \, dx = 0.$

The case $f_\nu \in \mathrm{VMO}$ for all $\nu \in N$
Proof of Proposition 3.11. Throughout the proof, let $\bar r > 0$ be a universal, fixed radius, which we will choose later. We will denote generic radii by $r > \bar r$. These may decrease from line to line.
Applying the mean value theorem for VMO-functions, Lemma 3.12, we get that if $\theta \in \{e_1, e_2, e_3\}$ almost everywhere on $B_{\bar r}(0)$, then $\theta \equiv e_i$ for some $i \in \{1, 2, 3\}$ on $B_{\bar r}(0)$, which implies degeneracy by Corollary 3.3. Thus, exploiting the symmetry of the problem, we may additionally assume that on $B_{\bar r}(0)$ we have

Step 1: Find a set $A \subset B_{\bar r}(0)$ with $|A| > 0$ and $\varepsilon = \varepsilon(\delta) \searrow 0$ as $\delta \searrow 0$ such that the following hold:
• On $A$ we have
where $\operatorname{conv}(\tilde K)$ denotes the convex hull, see Figure 25.
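A one-dimensional toy computation shows why the interplay of averaging with the non-convex set $\tilde K$ hinges on the absence of jumps. Averaging keeps values in $\operatorname{conv}(\tilde K)$, the simplex, but not in $\tilde K$. As a simple proxy for the distance to $\tilde K$, the sketch uses $\min_i \theta_i$, which for points of the simplex vanishes exactly on $\tilde K$. The two profiles below are hypothetical examples. At a jump from $(1,0,0)$ to $(0,\frac12,\frac12)$, the averaged value is $(\frac12,\frac14,\frac14)$ on every scale $\delta$. In contrast, a continuous profile in $\tilde K$ passing through $(0,1,0)$ has defect $\delta/4$.

```python
def average(profile, x, delta, n=2000):
    """Midpoint-rule approximation of the average of a vector-valued profile
    over [x - delta, x + delta]."""
    h = 2 * delta / n
    acc = [0.0, 0.0, 0.0]
    for k in range(n):
        theta = profile(x - delta + (k + 0.5) * h)
        for i in range(3):
            acc[i] += theta[i] / n
    return tuple(acc)


def defect(theta):
    """min_i theta_i; for points of the simplex it vanishes exactly on K-tilde."""
    return min(theta)


def jump_profile(t):
    # jumps from the pure variant (1, 0, 0) to the twin mixture (0, 1/2, 1/2)
    return (1.0, 0.0, 0.0) if t < 0 else (0.0, 0.5, 0.5)


def continuous_profile(t):
    # moves from the face {theta_3 = 0} to the face {theta_1 = 0} through (0, 1, 0)
    return (-t, 1.0 + t, 0.0) if t <= 0 else (0.0, 1.0 - t, t)


for delta in (0.1, 0.01):
    print(delta, defect(average(jump_profile, 0.0, delta)),
          defect(average(continuous_profile, 0.0, delta)))
```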
We may furthermore assume that $0 \in A$, see property (50), and that $0$ is a point of density one of $A$ in the sense that $\frac{|A \cap B_\kappa(0)|}{|B_\kappa(0)|} \to 1$ as $\kappa \to 0$. Recall that we defined $\theta_\delta(x) = \frac{1}{|B_\delta(x)|}\int_{B_\delta(x)} \theta(y)\, dy$. As convolutions are convex operations, we obtain $\theta_\delta \in \operatorname{conv}(\tilde K)$ a.e. Another application of Lemma 3.12 gives the fuzzy inclusion (49) with $\varepsilon = \varepsilon(\delta) \to 0$ as $\delta \to 0$. The additional assumption (46) implies that there exists $\eta > 0$ such that on $B_{\bar r}(0)$ we have
Lebesgue point theory implies that $\theta_\delta \to \theta$ pointwise almost everywhere. Using Egoroff's Theorem, we may upgrade this convergence to uniform convergence on some set $A$ with $|A| > 0$ and such that all points in $A$ have density one. Using both uniform convergences above, we get that for $\delta > 0$ small enough we have

To see that we may assume property (50), namely $0 \in A$, let $\tilde r \leq 1$ be a universal radius with which the conclusion of the proposition holds under the assumption that we indeed have $0 \in A$. We may then choose the radius $\bar r = \frac{1}{4}\tilde r$ in inequality (51), so that $A \subset B_{\frac14 \tilde r}(0)$. For any point $x \in A$ we then clearly have $B_{\frac12}(x) \subset B_1(0)$. Shifting and rescaling said ball to $B_1(0)$ and applying the conclusion in the new coordinates, we see that the configuration only involves two variants on $B_{\frac12 \tilde r}(x)$. Consequently, it is a two-variant configuration on $B_{\frac14 \tilde r}(0)$.

Step 2: On the plane $H\big(0, \frac{1}{\sqrt{2}}(011)\big)$ we split up $\theta_1$ into two one-dimensional functions and find maximal intervals on which they are essentially constant. Similarly to the proof of Proposition 3.6, we parametrize the plane $H\big(0, \frac{1}{\sqrt{2}}(011)\big)$ by
which gives the relations
Absorbing the affine function $g_1$ in decomposition (11) into the four one-dimensional functions $f_\nu$ for $\nu \in N_2 \cup N_3$, we may assume
As before, we exploit the combinatorial structure of the normals discussed in Remark 2.3 and sort these according to their dependence on $\beta$ or $\gamma$ on the plane $H(0, (011))$ by defining
As a result of Lemma 3.8, we may shuffle around some constants so that we can assume
The decomposition then turns into
after averaging.
Due to our assumption that $0 \in A$ and the fact that inequality (47) is an open condition, continuity of $\theta_\delta$ implies that there exists $\kappa = \kappa(\delta) > 0$ such that
As $\theta_{1,\delta}$ is a sum of two one-dimensional functions that is small due to the first inequality of (60), the individual terms are small by Lemma 3.8, i.e., we have
where we used continuity to replace the essential infima. In particular, for the oscillations on closed intervals $I$, defined as $\operatorname{osc}_I F := \sup_I F - \inf_I F$, we have
By continuity of $F_{1,\delta}$ and $F_{2,\delta}$, the oscillations are continuous when varying the endpoints of the involved intervals. Thus there exist unique maximal intervals $I^\delta_1$ and $I^\delta_2$ such that $\operatorname{osc}_{I^\delta_1} F_{1,\delta} \leq \varepsilon$ and $\operatorname{osc}_{I^\delta_2} F_{2,\delta} \leq \varepsilon$.
We would like to prove that $[-\bar r, \bar r] \subset I^\delta_1, I^\delta_2$, but for the next couple of steps we will be content with making sure that they do not shrink away as $\delta \to 0$, see Figure 26 for an outline of the argument. Note that we will drop the dependence of $I_1$ and $I_2$ on $\delta$ in the following as long as we keep $\delta$ fixed.
Together with (59), we obtain for $\gamma \in I_2 \cap (-r, r)$ that
Swapping the roles of $\beta$ and $\gamma$ and using Step 1 and the definition of $\tilde K$, we thus see
on the set $\partial(I_1 \times I_2) \cap (-r, r)^2$.
Step 4: The functions $f_{\nu,\delta} \circ X$ for $\nu \in N_2 \cup N_3$, $\theta_{2,\delta} \circ X$ and $\theta_{3,\delta} \circ X$ are almost affine along $l(t) := t(1, 1)$ as long as $t^\delta_{\min} < t < t^\delta_{\max}$. Here $t^\delta_{\min} < 0 < t^\delta_{\max}$ are the two parameters for which $l$ intersects $\partial(I^\delta_1 \times I^\delta_2)$, see Figure 26. We again drop the superscripts in the notation of these objects as long as we keep $\delta$ fixed.
Step 5: If $\delta > 0$ is sufficiently small and we have $-r < t_{\min} < t_{\max} < r$, then
We also get the same implication at $t_{\min}$.
In order to transport this information to the point $l(0)$, we use that $\theta_{3,\delta} \circ X$ is almost affine along $l(t)$, see (63), to get
with $t := t_{\min}$, $h := -t_{\min}$ and $\tilde h := t_{\max}$.
Combining this inequality with $\theta_{3,\delta} \circ X \circ l(t_{\min} + t_{\max}) \geq 0$ and the supposedly incorrect assumption (64), we arrive at
However, this contradicts the fact that the strain lies strictly between two martensite strains at 0 for small $\delta$, see (60), which proves the claim.
Step 6: We do not have $\liminf_{\delta \to 0} (t^\delta_{\max} - t^\delta_{\min}) = 0$. Towards a contradiction, we assume that the difference does vanish in the limit. Let $g_\delta(s) := f_{(101),\delta} + f_{(10\bar1),\delta}\big((1-s)t^\delta_{\min} + s t^\delta_{\max}\big)$ for $s \in [0, 1]$. By Lemma 3.13, the sequence $g_\delta$ converges uniformly to an affine function $g$. By Step 5 we know that the linear part of $g$ has to be nontrivial; recall that $f_{(011),\delta}$ and $f_{(01\bar1),\delta}$ drop out in the decomposition of $\theta_2$ along $X \circ l$. Hence we get that
Undoing the rescaling, we conclude that
Due to Jensen's inequality, this implies
However, this is a contradiction to our assumption that
has a connected component $P$ such that $0 \in P$. Furthermore, the set $P$ satisfies
for open, non-empty intervals $I_\nu \subset \mathbb{R}$, i.e., up to localization it is a polyhedron whose faces' normals are contained in $N_2 \cup N_3$. By Step 6 and Lemma 3.9, we find a connected component $P$ of the above set such that $0 \in P$ in the limit $\delta \to 0$. In the following, we will choose the precise representatives of all involved functions, see Evans and Gariepy [18, Chapter 1.7.1], so that we can evaluate $\theta_1$ in a pointwise manner.
By distributionally differentiating the condition on $P$ in two different directions $d, \tilde d \in D$, see Subsection 2.2, and making use of Remark 2.3, we see that $f_\nu$ is locally affine on $P$ for $\nu \in N_2 \cup N_3$. By connectedness of $P$, they must be globally affine: Let $\nu \in N_2 \cup N_3$ and let $G := \{g : \mathbb{R}^3 \to \mathbb{R} : g \text{ is affine}\}$. Let $U_g := \{x \in P : f_\nu(\nu \cdot y) = g(y) \text{ for } y \in B_\kappa(x) \text{ for some } \kappa > 0\}$.
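By construction, these sets are open. They are also pairwise disjoint, because two affine functions agreeing on a non-empty open set have to coincide globally. Finally, we have $P = \bigcup_{g \in G} U_g$ by assumption. As $P$ is connected, it cannot be split into two non-empty disjoint open sets, so exactly one of the sets $U_g$ is non-empty. Therefore, there exists a single affine function $g$ such that $f_\nu = g$ on $P$.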
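We may thus re-define $f_\nu$ for $\nu\in N_2\cup N_3$ such that $f_\nu(\nu\cdot x)=0$ for all $x\in P$. The image $I_\nu:=\nu\cdot P$ is open and connected, and thus an interval. It is also clearly non-empty, and by construction we have
\[P\subset\bigcap_{\nu\in N_2\cup N_3}\{\nu\cdot x\in I_\nu\}\cap B_r(0).\]
As $f_{\tilde\nu}=0$ on $\bigcap_{\nu\in N_2\cup N_3}\{\nu\cdot x\in I_\nu\}\cap B_r(0)$ for all $\tilde\nu\in N_2\cup N_3$, we get the other inclusion, which proves the claim.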
Step 8: Let $F$ be a face of $P$ with normal $\nu\in N_i$ for some $i\in\{2,3\}$ and $F\cap B_r(0)\neq\emptyset$. Then $\theta_i\equiv 0$ or $\theta_i\equiv 1$ on $F\cap B_r(0)$. The claim is meaningful by Lemma 3.4. To keep notation simple, we assume that $\nu=\frac{1}{\sqrt{2}}(101)$ and that $\nu$ is the outer normal to $P$ at $F$, i.e., we have $P\subset\{x\cdot\nu<b\}$ with $\{b\}=\nu\cdot F$. A two-dimensional sketch of this situation can be found in Figure 27a, while a less detailed three-dimensional one is shown in Figure 18a.
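Furthermore, we only have to prove the dichotomy $\theta_2\equiv 0$ or $\theta_2\equiv 1$ locally on $F$, i.e., on $B_{\tilde\kappa}(x_0)$ for all $x_0\in F$ and some $\tilde\kappa=\tilde\kappa(x_0)>0$ such that the inclusions (66) hold. Indeed, by Lemma 3.4 and $f_\nu\in VMO$ for all $\nu\in N$, we have $\theta_i\circ X\in VMO(\tilde F)$, where $X:\tilde F\to F$ is an affine parametrization of $F$. An application of the mean value theorem for VMO functions, Lemma 3.12, then yields the "global" statement on $F$ due to the connectedness of $F$.

Let $x_0\in F$ be such that there exists $\kappa>0$ with the inclusions (66) satisfied for $\tilde\kappa=\kappa$, where in the following $\kappa$ may decrease from line to line in a universal manner. We can use the identities (65) to conclude $f_{(101)}\equiv 0$ on $B_{2\kappa}(x_0)\cap\{x\cdot\nu<b\}\subset P$ and, after averaging, a corresponding statement for the approximations, provided that $\delta<c$ for a constant $0<c<\kappa$ to be chosen later. The latter, together with the decomposition (58), shows that we cannot have $f_{(101)}\equiv 0$ on the larger set $B_\kappa(x_0)\cap\{x\cdot\nu<b+c\}$, as otherwise we would arrive at a contradiction. Written in terms of the approximation $f_{(101),\delta}$, and recalling that $\varepsilon=\varepsilon(\delta)\to 0$ as $\delta\to 0$, this yields the corresponding statement for suitable $b_\delta$ and $\delta>0$ small enough. By equation (67) and continuity, we may additionally assume that $f_{(101),\delta}(b_\delta)=\varepsilon$, which due to equation (68) yields the corresponding bound for all $x\in\tilde H:=H_{b_\delta,\frac{1}{\sqrt{2}}(101)}\cap B_\kappa(x_0)$, see Figure 27b. Combining this with the inclusion $\theta_\delta\in\tilde K+B_\varepsilon(0)$, we consequently get $\min\{\theta_{2,\delta}(x),\theta_{3,\delta}(x)\}<\varepsilon$ on $\tilde H$. Due to $\theta_1+\theta_2+\theta_3\equiv 1$, we convert this into
\[\min\{\theta_{2,\delta}(x),\,1-\theta_{2,\delta}(x)\}<2\varepsilon\]
for all $x\in\tilde H$. As $\tilde H$ is connected, $\theta_{2,\delta}$ is continuous and $4\varepsilon<1$ for small $\delta$, we obtain the dichotomy that either $\theta_{2,\delta}(x)<2\varepsilon$ for all $x\in\tilde H$ or $1-\theta_{2,\delta}(x)<2\varepsilon$ for all $x\in\tilde H$.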
Step 9: Transport the information $\theta_i\equiv 0$ or $\theta_i\equiv 1$ on the face $F$ closest to the origin back into $P$. Let $I_\nu=(a_\nu,b_\nu)$ be the intervals obtained in Step 7. The proposition is proven once we show that $a_\nu\leq-\tilde r<\tilde r\leq b_\nu$ for all $\nu\in N_2\cup N_3$. Towards a contradiction, we assume otherwise. Furthermore, for the sake of concreteness, we assume that $b:=b_{(101)}=\min_{\nu\in N_2\cup N_3}\{-a_\nu,b_\nu\}<\tilde r$, i.e., that the face $F$ of $P$ considered in the previous step is the one closest to the origin. All other cases work the same way. Step 8 then provides the corresponding information for almost all $t\in J:=l^{-1}(F\cap B_r(0))$, and Lemma 3.9 implies that $\theta_2\equiv 0$ on a convex polyhedron $Q$; see Figure 15b for a sketch relating $P$ and $Q$ in three dimensions. As every point of the closure $\overline{Q}$ has positive density, we only have to prove $0\in\overline{Q}$ to get a contradiction to $0$ being a point of density one of the corresponding set from Step 1. Finally, we may suppose that $b>0$, as $b=0$ would imply $0\in F$, which by $F\subset\overline{Q}$ trivially gives the statement.
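The reduction in Step 9 rests on elementary geometry of slabs. If every $\nu$ is a unit vector and $a_\nu<0<b_\nu$, then $b=\min_\nu\min\{-a_\nu,b_\nu\}$ is the distance from the origin to the nearest bounding hyperplane, and $|x|<b$ implies $|x\cdot\nu|<b$, so $B_b(0)$ lies in $\bigcap_\nu\{a_\nu<x\cdot\nu<b_\nu\}$. In particular, $a_\nu\leq-\tilde r<\tilde r\leq b_\nu$ for all $\nu$ holds exactly when $b\geq\tilde r$. The following Python sketch checks this on made-up data. The four normals are hypothetical representatives of $N_2\cup N_3$ built from the directions $(101)$, $(10\bar1)$, $(110)$, $(1\bar10)$, and the intervals are arbitrary numbers chosen for illustration, not quantities from the proof.

```python
import math

# Hypothetical unit normals standing in for N_2 ∪ N_3; the labels only name the
# underlying directions (101), (10-1), (110), (1-10) and are not taken from the proof.
S = 1 / math.sqrt(2)
NORMALS = {
    "(101)": (S, 0.0, S),
    "(10-1)": (S, 0.0, -S),
    "(110)": (S, S, 0.0),
    "(1-10)": (S, -S, 0.0),
}


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def in_polyhedron(x, intervals):
    """True if a_nu < x.nu < b_nu for every normal nu (the open slab intersection)."""
    return all(a < dot(x, NORMALS[k]) < b for k, (a, b) in intervals.items())


def in_closure(x, intervals, tol=1e-12):
    """Closed version of the same test, with a small floating-point tolerance."""
    return all(a - tol <= dot(x, NORMALS[k]) <= b + tol
               for k, (a, b) in intervals.items())


def closest_face(intervals):
    """Return (b, label, side) with b = min over nu of min(-a_nu, b_nu).

    Since every normal has length one, b is the distance from the origin to the
    nearest bounding hyperplane {x.nu = a_nu} or {x.nu = b_nu}."""
    best = None
    for label, (a, b) in intervals.items():
        for dist, side in ((-a, "lower"), (b, "upper")):
            if best is None or dist < best[0]:
                best = (dist, label, side)
    return best


def contains_ball(intervals, r):
    """The condition a_nu <= -r < r <= b_nu for all nu."""
    return all(a <= -r and r <= b for a, b in intervals.values())


# Made-up intervals I_nu = (a_nu, b_nu), each containing 0.
intervals = {"(101)": (-0.5, 0.3), "(10-1)": (-0.8, 0.9),
             "(110)": (-0.7, 0.6), "(1-10)": (-0.4, 0.95)}
print(closest_face(intervals))  # (0.3, '(101)', 'upper')
print(contains_ball(intervals, 0.25), contains_ball(intervals, 0.35))  # True False
```

With these numbers the nearest face is the upper face with normal $(101)$ at distance $0.3$, so a ball of radius $0.25$ fits while one of radius $0.35$ does not, which mirrors the case $b<\tilde r$ assumed towards a contradiction.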
Proof of Lemma 3.12. The continuity of $f_\delta(x):=\frac{1}{|B_\delta(x)|}\int_{B_\delta(x)}f(y)\,dy$ follows easily from the observation that $f_\delta$ is the convolution of $f$ with $\frac{1}{|B_\delta(0)|}\chi_{B_\delta(0)}$. As long as $B_\delta(x)\subset U$, we have that

Proof of Lemma 3.13. By convolution (and restriction to a slightly smaller interval), we may suppose that $g$ is continuous. Without loss of generality, we may additionally assume $g(0)=0$. Recall
\[\varepsilon:=\sup_{t,\,t+h,\,t+\tilde h,\,t+h+\tilde h\in[0,1]}\big|g(t+h+\tilde h)-g(t+h)-g(t+\tilde h)+g(t)\big|.\]
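By induction, we can prove an estimate for all $n\in\mathbb{N}$ with $nx\in[0,1]$. Indeed, the case $n=1$ is trivial, and the crucial part of the induction step follows from the definition of $\varepsilon$. In particular, for $x\in[0,1]$ and $n\in\mathbb{N}$ such that $nx\in[0,1]$, we obtain a bound, which implies a further estimate. Choosing $|x|\leq\frac{1}{2}$ and $n=\lfloor\frac{1}{x}\rfloor$ in this inequality gives another bound. Plugging $x=\frac{1}{m}$, $n=k$ into estimate (71) and $x=\frac{1}{m}$, $n=m$ into estimate (72) for numbers $k,m\in\mathbb{N}$ with $k\leq m$ gives a bound for $g$ at the rational points $\frac{k}{m}$. Additionally, a corresponding estimate holds for $x\in[0,1]$ and $N\in\mathbb{N}$. Collecting all of the above, we obtain the desired estimate for $N\geq 2$ and $x\in[0,1]$. If instead we have $\|g\|_\infty^{1/2}\varepsilon^{-1/2}<2$, we set $\tilde g\equiv 0$ and get $\|g-\tilde g\|_\infty\leq 2\varepsilon$.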
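The quantity $\varepsilon$ from the proof of Lemma 3.13 measures how far $g$ is from having vanishing mixed second differences, and the induction can be tested numerically. The sketch below assumes that the induction claim takes the form $|g(nx)-n\,g(x)|\leq(n-1)\varepsilon$ for $g(0)=0$ and $nx\in[0,1]$; this form follows from the definition of $\varepsilon$ with $t=0$, $h=nx$, $\tilde h=x$, since then $|g((n+1)x)-g(nx)-g(x)|\leq\varepsilon$, but it is our reconstruction rather than a quotation. The supremum is approximated on a uniform grid, and the test function $g$ is an arbitrary choice.

```python
import math


def second_difference_sup(vals):
    """Grid version of eps for samples vals[i] = g(i/m): the maximum of
    |g(t+h+h~) - g(t+h) - g(t+h~) + g(t)| over index quadruples
    (i, j, k, j+k-i) that stay inside {0, ..., m}."""
    m = len(vals) - 1
    eps = 0.0
    for i in range(m + 1):          # t          = i/m
        for j in range(m + 1):      # t + h      = j/m
            for k in range(m + 1):  # t + h~     = k/m
                q = j + k - i       # t + h + h~ = q/m
                if 0 <= q <= m:
                    eps = max(eps, abs(vals[q] - vals[j] - vals[k] + vals[i]))
    return eps


def check_induction_bound(g, m, tol=1e-12):
    """Check |g(n x) - n g(x)| <= (n - 1) eps for all grid points x = k/m and
    all n >= 1 with n x <= 1, where eps is the grid version above (needs g(0) = 0)."""
    vals = [g(i / m) for i in range(m + 1)]
    if vals[0] != 0.0:
        raise ValueError("normalize g(0) = 0 first")
    eps = second_difference_sup(vals)
    for k in range(1, m + 1):
        n = 1
        while n * k <= m:
            if abs(vals[n * k] - n * vals[k]) > (n - 1) * eps + tol:
                return False, eps
            n += 1
    return True, eps


def g(t):
    return 0.3 * t + 0.01 * math.sin(7 * t)  # nearly affine test function, g(0) = 0


ok, eps = check_induction_bound(g, 40)
print(ok)  # True
```

Restricting $x$ and $n$ to grid points is harmless here, because every quadruple used in the induction step lies on the grid, so the grid supremum already controls it.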

Classification of planar configurations
Proof of Lemma 3.14. Without loss of generality, we may assume that $f_{\nu_1}$ is affine and that the decomposition takes the corresponding reduced form, where $B$ has non-vanishing measure. Absorbing $f_{\nu_1}$ into $g_2$ and $g_3$, as well as $g_1-1$ into $f_{\nu_2}$ and $f_{\nu_3}$, which is possible because $\partial_d g_1=0$ and the remaining variables are spanned by $\nu_2$ and $\nu_3$, we are left with a decomposition in terms of $f_{\nu_2}$, $f_{\nu_3}$ and an affine function $g$ with $\partial_d g=0$. At least one of the two functions $f_{\nu_2}$ and $f_{\nu_3}$ is not affine, as otherwise we would be dealing with a two-variant configuration by Proposition 3.11. Therefore, there are two cases: precisely one of the two remaining one-dimensional functions is affine, or neither is.
Let us first deal with the case that $f_{\nu_2}$ is affine. We cannot have $|\theta_3^{-1}(0)|>0$, because two affine functions agreeing on a set of positive measure have to agree everywhere; this would imply $\theta_3\equiv 0$, so that only two martensite variants would be present. We thus have $|\theta_1^{-1}(0)|>0$ and $|\theta_2^{-1}(0)|>0$. The same argument applied to the $x\cdot\nu_2$-dependence of $\theta_1$ and $\theta_2$ implies that $f_{\nu_2}$ is constant and that $g$ only depends on $x\cdot\nu_3$. Consequently, there exist $a,b\in\mathbb{R}$ such that the decomposition simplifies accordingly. For $x\in B_r(0)$ such that $f_{\nu_3}(x)=1$, we must have $\theta_2(x)=0$, which determines $f_{\nu_3}$ in terms of some measurable set $A\subset\mathbb{R}$. Plugging this into the decomposition shows that it is a planar second-order laminate according to Definition 2.6. The argument for $f_{\nu_3}$ being affine is the same.
Finally, let us treat the case that neither function is affine. Using the two-valuedness (73) on $H(\alpha,\nu_2)$, we may split up $g(x)=\tilde g_2(x\cdot\nu_2)+\tilde g_3(x\cdot\nu_3)$ into two affine functions such that $\tilde g_2(\alpha)=0$ and the corresponding identity holds for $x\in B_r(0)$ with $x\cdot\nu_2=\alpha$. Therefore, $\chi_B$ captures the entire dependence on $x\cdot\nu_3$, and we abuse notation accordingly. As $f_{\nu_3}$ is not affine, the set $B$ has neither zero nor full measure. Choosing $x$ such that $\chi_B(x\cdot\nu_3)=0$, we see that $\tilde g_2\geq 0$. Thus $\tilde g_2$ is an affine function which attains its minimum $\tilde g_2(\alpha)=0$ at $\alpha$, which in turn ensures that $\tilde g_2\equiv 0$. Consequently, we can re-define the functions on the right-hand side accordingly.
For $x$ such that $x\cdot\nu_3\in B$, the decomposition yields an identity for $f_{\nu_2}$. This implies $f_{\nu_2}=-(1-b)\chi_A$ for a measurable set $A$ of neither zero nor full measure, since $f_{\nu_2}$ is not affine. On the set $\{x\cdot\nu_2\in A^c\}\cap\{x\cdot\nu_3\in B\}$ of positive measure, we get $\theta_3(x)=0$ due to our assumption that $0<b<1$, resulting in $\tilde g_3\equiv 0$. Hence the decomposition takes a form which shows that the configuration is a planar checkerboard according to Definition 2.7.
Proof of Proposition 3.15. We denote the fixed radius for which the assumptions of the proposition hold by $\tilde r$, while $r>\tilde r$ is a generic radius that may decrease from line to line.
Step 1: Rewrite the problem in a two-dimensional domain and bring the decomposition (11) into an appropriate form.
Using the specific form of the normals $\nu_i$ and the fact that they are linearly dependent, we can find orientations $\tilde\nu_i=\pm\nu_i$ for $i=1,2,3$ which satisfy $\tilde\nu_1+\tilde\nu_2+\tilde\nu_3=0$. Furthermore, the strain $e(u)$ only depends on directions in $V:=\operatorname{span}(\nu_1,\nu_2,\nu_3)$, which has dimension at most two by the linear dependence. Thus we can rotate the domain of definition such that $V=\mathbb{R}^2$ and treat $e(u)$ as a function defined on $B_1(0)\subset\mathbb{R}^2$. In the following, we abuse notation by writing $\nu_i$ for the images of $\nu_i$ under this rotation.
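Step 1 uses two elementary facts about unit vectors with $\tilde\nu_1+\tilde\nu_2+\tilde\nu_3=0$: they are linearly dependent, so they span at most a plane, and $|\tilde\nu_1+\tilde\nu_2|^2=|\tilde\nu_3|^2$ gives $\tilde\nu_1\cdot\tilde\nu_2=-\frac{1}{2}$, and likewise for the other pairs, i.e., they meet at angles of $120^\circ$. The Python sketch below searches for such an orientation and computes coordinates in an orthonormal basis of $V$, which is one way to realize the rotation of $V$ onto $\mathbb{R}^2$. The normals $\frac{1}{\sqrt2}(011)$, $\frac{1}{\sqrt2}(101)$, $\frac{1}{\sqrt2}(1\bar10)$ are a hypothetical example and not the normals fixed by the hypotheses of Proposition 3.15.

```python
import itertools
import math

S = 1 / math.sqrt(2)
# Hypothetical example normals (0,1,1)/sqrt2, (1,0,1)/sqrt2, (1,-1,0)/sqrt2.
NU = [(0.0, S, S), (S, 0.0, S), (S, -S, 0.0)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(c, a):
    return tuple(c * x for x in a)


def zero_sum_orientation(normals, tol=1e-12):
    """Find signs s_i in {+1, -1} with s_1 nu_1 + s_2 nu_2 + s_3 nu_3 = 0, or None."""
    for signs in itertools.product((1, -1), repeat=3):
        total = [sum(s * n[c] for s, n in zip(signs, normals)) for c in range(3)]
        if math.sqrt(dot(total, total)) < tol:
            return signs
    return None


def planar_coordinates(vectors):
    """Coordinates of vectors lying in the plane V spanned by vectors[0], vectors[1],
    taken in an orthonormal basis (e1, e2) of V; this is an isometry V -> R^2."""
    e1 = scale(1 / math.sqrt(dot(vectors[0], vectors[0])), vectors[0])
    n = cross(vectors[0], vectors[1])
    n = scale(1 / math.sqrt(dot(n, n)), n)  # unit normal of V
    e2 = cross(n, e1)                       # unit vector in V orthogonal to e1
    return [(dot(v, e1), dot(v, e2)) for v in vectors]


signs = zero_sum_orientation(NU)
oriented = [scale(s, v) for s, v in zip(signs, NU)]
print(signs)                              # (1, -1, 1)
print(dot(NU[0], cross(NU[1], NU[2])))    # determinant ~0: linearly dependent
print(planar_coordinates(oriented))       # three unit vectors in R^2 summing to ~0
```

For this example the search returns the signs $(1,-1,1)$, and the planar images of the oriented normals are unit vectors at mutual angles of $120^\circ$.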
Step 2: If $|\theta_i^{-1}(0)\cap B_r(0)|>0$ for some $i=1,2,3$, we re-define the corresponding functions $f$.

Step 3: There exist measurable sets $J_j\subset\mathbb{R}$ for $j=1,2,3$ such that the corresponding representation holds up to null-sets, and two associated sets have measure zero.
Depending on whether $|\theta_i^{-1}(0)\cap B_r(0)|>0$ or not, we define the sets $I^{(i)}_j$ differently (in the second case using the affine function $f$); in any case, the resulting identities hold up to null-sets. Consequently, in terms of $J_j:=I^{(j-1)}_j$ for $j=1,2,3$, we obtain a description of the zero sets $\theta_i^{-1}(0)\cap B_r(0)$ up to null-sets. Since the sets $\pi_{i+1}^{-1}(J_{i+1})\cap\pi_{i-1}^{-1}(J_{i-1}^c)$ are pairwise disjoint for $i=1,2,3$ (with indices taken modulo 3) and, again up to null-sets, we have $\bigcup_{i=1,2,3}\theta_i^{-1}(0)\cap B_r(0)=B_r(0)$, we get the claim up to null-sets.
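The pairwise disjointness in Step 3 is pure bookkeeping: whether a point $x$ lies in $\pi_{i+1}^{-1}(J_{i+1})\cap\pi_{i-1}^{-1}(J_{i-1}^c)$ only depends on which of the values $\pi_j(x)$ fall into the corresponding $J_j$. Reading $\pi_j$ as $x\mapsto x\cdot\nu_j$ (our assumption about the notation) and taking indices modulo 3, the Python sketch below enumerates the eight possible membership patterns. Any two of the three sets impose contradictory conditions on one common index, so at most one set is hit, and the only patterns hit by none of them are "in all three $J_j$" and "in none of them". Hence, if the three sets cover $B_r(0)$ up to null-sets, the points whose values lie in all three $J_j$, or in none, form a null set.

```python
from itertools import product


def membership_table():
    """For each pattern (b_1, b_2, b_3) with b_j = [pi_j(x) in J_j], list the indices i
    (1-based, cyclic) of the sets pi_{i+1}^{-1}(J_{i+1}) ∩ pi_{i-1}^{-1}(J_{i-1}^c)
    that contain x, i.e. those i with b_{i+1} true and b_{i-1} false."""
    table = {}
    for b in product((False, True), repeat=3):
        # 0-based position k stands for i = k + 1; (k - 1) % 3 wraps i - 1 = 0 to 3.
        table[b] = [k + 1 for k in range(3) if b[(k + 1) % 3] and not b[(k - 1) % 3]]
    return table


table = membership_table()
for pattern, hits in table.items():
    print(pattern, hits)
```

Every non-constant pattern is hit by exactly one of the three sets, which is why the disjoint union can exhaust $B_r(0)$ only away from the two constant patterns.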
Some straightforward combinatorics ensure the required relations between these sets, and thus we obtain the corresponding estimate.

Figure 28: Graphs of $\chi_{K_1}$ and $\chi_{K_2}$ (plotted over $x\cdot\nu_1$ and $x\cdot\nu_2$, respectively) in the case that $K_1$ and $K_2$ are intervals such that one of them has an endpoint at $-r$ and the other one at $r$. In this case we choose $p_1,p_2$ and $q_1,q_2$ on opposite sides of the respective other endpoint.

Figure 29: Graphs of $\chi_{K_1}$ and $\chi_{K_2}$ (plotted over $x\cdot\nu_1$ and $x\cdot\nu_2$, respectively) in the case that $K_1$ is not an interval with one endpoint at $-r$ or $r$. In this specific instance we choose $p_1:=\tilde p_2$ and $p_2:=\tilde p_3$.
If $K_2$ is not an interval with one endpoint at $-r$ or $r$, the same reasoning applies.
As for such lines a locally significant part is missing from $\pi_1^{-1}(K_1)\cap\pi_2^{-1}(K_2)$, inequality (78) yields an estimate, and by algebraic manipulation of this inequality we obtain a bound whose right-hand side vanishes in the limit $\varepsilon\to 0$. Hence $0$ is a point of density one for $M$ by definition of $S_\varepsilon$.
Step 2: We have $|M|\geq|K_1|+|K_2|$. The geometric situation in the following argument can be found in Figure 32. Let $\tilde K_i\subset K_i$ for $i=1,2$ be the points of density one contained in the respective sets. By Lebesgue point theory, we have $|\tilde K_i|=|K_i|$ for $i=1,2$. Let $\tilde t_1:=\inf\tilde K_1$ and $\tilde t_2:=\sup\tilde K_2$. Since