1 Introduction

Due to the many possible applications of the eponymous shape memory effect, shape memory alloys have attracted a lot of attention from the engineering, materials science and mathematical communities. Their remarkable properties are due to certain diffusionless solid–solid phase transitions in the crystal lattice of the alloy, enabling the material to form microstructures. More specifically, the lattice transitions between the cubic austenite phase and multiple lower-symmetry martensite phases, triggered by crossing a critical temperature or applying stresses; see Bhattacharya [6] for a thorough introduction.

Fig. 1

A sketch of the cubic-to-tetragonal transformation. The left-hand side represents the cubic austenite phase, while the right-hand side represents the martensite variants that are elongated in the direction of one of the axes of the cube and shortened in the other two. Adapted from [6, Figure 4.5]

As a result, these materials often form microstructures. In shape memory alloys undergoing cubic-to-tetragonal transformations, see Fig. 1, one frequently observes the following types of microstructures:

  1. Twins: Fine-scale laminates of martensite variants, see Fig. 2a and both sides of the interface at the center of Fig. 2b.

  2. Habit planes: Almost sharp interfaces between austenite and a twin of martensites, where the twin refines as it approaches the interface, see Fig. 2a.

  3. Second-order laminates, or twins within a twin: Essentially sharp interfaces between two different refining twins, see Fig. 2b.

  4. Crossing second-order laminates: Two crossing interfaces between twins and pure phases, see for example Fig. 2c.

  5. Wedges: Materials whose lattice parameters satisfy a certain relation can form a wedge of two martensite twins in austenite, see [6, Chapter 7.3.1] and Fig. 2d.

Furthermore, at least in Microstructures 1, 2 and 5, all observed interfaces form parallel to finitely many different hyperplanes relative to the crystal orientation. In this paper, we present a theorem characterizing all possible microstructures whose energy is comparable to that of a habit plane in the geometrically linear theory.

Fig. 2

a Optical micrograph of a habit plane with austenite on the right-hand side and twinned martensite on the left-hand side in a Cu–Al–Ni alloy undergoing cubic-to-orthorhombic transformations. Reprinted by permission from Springer Customer Service Centre GmbH [17]. b Optical micrograph of a second-order laminate in a Cu–Al–Ni alloy, by courtesy of C. Chu and R.D. James. c Optical micrograph of two crossing second-order laminates in an Indium–Thallium crystal. The bottom region is in the austenite phase. All other regions show twinned martensite variants with the twinning in the left-hand side one being almost parallel to the surface of the sample. Reprinted from [4], with permission from Elsevier. d Optical micrograph of a wedge in a Cu–Al–Ni alloy, by courtesy of C. Chu and R.D. James

1.1 Contributions of the Mathematical Community

1.1.1 Modeling

Energy minimization was first used in the modeling of martensitic phase transformations by Khatchaturyan, Roitburd and Shatalov [24,25,26, 36, 37] on the basis of linearized elasticity. This allowed the prediction of certain large-scale features of the microstructure, such as the orientation of interfaces between phases.

Variational models based on nonlinear elasticity go back to Ball and James [2, 3]. They formulated a model in which the microstructures correspond to minimizing sequences of energy functionals vanishing on

$$\begin{aligned} K = \bigcup _{i=1}^m SO(3)U_i \end{aligned}$$

for finitely many suitable symmetric matrices \(U_i\) with \(i=1,\ldots ,m\) and \(m\in \mathbb {N}\). In their theory, the orientations of interfaces arise from a kinematic compatibility condition known as rank-one connectedness, see [6, Chapter 2.5]. For cubic-to-tetragonal transformations, Ball and James prove in an ansatz-free way that the fineness of the martensite twins in a habit plane is due to only certain mixtures of martensite variants being compatible with austenite. Their approach is closely related to the phenomenological (or crystallographic) theory of martensite independently introduced by Wechsler, Lieberman and Read [42] and by Bowles and MacKenzie [8, 33].

A comparison of the nonlinear and the geometrically linear theories can be found in an article by Bhattacharya [5]. Formal derivations of the geometrically linear theory from the nonlinear one have been given by Kohn [28] and Ball and James [3]. A rigorous derivation via \(\varGamma \)-convergence has been given by Schmidt [41] with the limiting energy in general taking a more complicated form than the usually used piecewise quadratic energy densities.

1.1.2 Rigidity of Differential Inclusions

The interpretation of microstructure as minimizing sequences naturally leads to analyzing the differential inclusions

$$\begin{aligned} Du \in K = \bigcup _{i=1}^m SO(3)U_i, \end{aligned}$$

sometimes called the m-well problem, or variants thereof such as looking for sequences \(u_k\) such that \({\text {dist}}(Du_k,K) \rightarrow 0\) in measure. In fact, the statements of Ball and James are phrased in this way [2, 3]. A detailed discussion of these problems which includes the theory of Young measures has been provided by Müller [34].

However, differential inclusions by themselves are not accurate models: Using convex integration, Müller and Šverák [35] constructed solutions of the two-dimensional differential inclusion \(Du \in SO(2) A \cup SO(2) B\) with \({\text {det}} A = {\text {det}} B = 1\) exhibiting a complex arrangement of phases, although one would naively expect only laminar solutions. Later, Conti, Dolzmann and Kirchheim [15] extended their result to three dimensions and to the case of cubic-to-tetragonal transformations.

But Dolzmann and Müller [19] also noted that if the inclusion \(Du \in SO(2) A \cup SO(2) B\) is augmented with the information that the set \(\{Du \in SO(2)A\}\) has finite perimeter, then Du is in fact laminar. This result also holds in the case of cubic-to-tetragonal transformations, as shown by Kirchheim. There has been a series of generalizations including stresses [13, 16, 22, 31, 32]. However, these are more in the spirit of the geometric rigidity theorem due to Friesecke, James and Müller [21], since they rely on the perimeter being too small for lamination and as such do not give insight into the rigidity of twins. In the presence of a single rank-one connection and an additional anisotropic perturbation of the energy, this problem has more recently been overcome by Davoli and Friedrich [18] by exploiting a version of the geometric rigidity theorem for matrix fields with non-zero curl.

In contrast, the differential inclusion arising from the geometrically linear setting

$$\begin{aligned} \frac{1}{2}(Du + Du^T) \in \{e_1,e_2,e_3\}, \end{aligned}$$

where \(e_i\) for \(i=1,2,3\) are the linearized strains corresponding to the cubic-to-tetragonal transformation, see (4), is rigid in the sense that all solutions are laminates even without further regularizations, as proven by Dolzmann and Müller [19]. Quantifying this result, Capella and Otto [10, 11] proved that laminates are stable in the sense that if the energy (1) (including an interfacial penalization) is small, then the geometric structure of the configuration is close to a laminate. Additionally, there is either only austenite or only mixtures of martensite present. Capella and Otto also noted that for sequences with bounded energy such a result cannot hold due to a well-known branching construction of habit planes (Fig. 2a) given by Kohn and Müller [29, 30].

Therein, Kohn and Müller used a simplified scalar version of the geometrically linear model with surface energy to demonstrate that compatibility of austenite with a mixture of martensites only requires a fine mixture close to the interface, so that the interfacial energy coarsens the twins away from the interface. Kohn and Müller also conjectured that minimizers exhibit this so-called branching, which Conti [14] confirmed by proving minimizers of the Kohn–Müller functional to be asymptotically self-similar.

In view of the results of Kohn and Müller and of Capella and Otto, it is natural to consider sequences with bounded energy in order to analyze the rigidity of branching microstructures.

1.1.3 Some Related Problems

So far, we have mostly discussed the literature describing the microstructure of single crystals undergoing cubic-to-tetragonal transformations. However, the variational framework can be used to address related problems, for which we highlight a few contributions as an exhaustive overview is outside the scope of this introduction.

An overview of microstructures arising in other transformations can be found in the book by Bhattacharya [6]. Rigorous results for cubic-to-orthorhombic transformations in the geometrically linear theory can be found in a number of works by Rüland [38, 39]. For the much more complicated cubic-to-monoclinic-I transformations with their twelve martensite variants, Chenchiah and Schlömerkemper [12] proved the existence of certain non-laminate microstructures in the geometrically linear case without surface energy.

For an overview of the available literature on polycrystalline shape memory alloys we refer the reader once again to Bhattacharya’s book [6, Chapter 13] and an article by Bhattacharya and Kohn [7].

Another problem is determining the shape of energy-minimizing inclusions of martensite with given volume in a matrix of austenite, for which scaling laws have been obtained by Kohn, Knüpfer and Otto [27] for cubic-to-tetragonal transformations in the geometrically linear setting.

1.2 Definition of the Energy

In order to analyze the rigidity properties of branched microstructures we choose the geometrically linear setting, since the quantitative rigidity of twins is well understood due to the results by Capella and Otto [10, 11]. In fact, we continue to work with the same already non-dimensionalized functional, namely

$$\begin{aligned} E_\eta (u,\chi )\,{:}{=} \,E_{{\mathrm{elast}},\eta }(u,\chi )+ E_{{\mathrm{inter}},\eta }(u,\chi ), \end{aligned}$$
(1)

where

$$\begin{aligned} E_{{\mathrm{elast}},\eta }(u,\chi )&\,{:}{=}\,\eta ^{-\frac{2}{3}}\int _{{B_{1}\left( 0\right) }} \left| e(u) -\sum _{i=1}^3\chi _i e_i\right| ^2 \, \mathrm {d}\mathscr {L}^3, \end{aligned}$$
(2)
$$\begin{aligned} E_{{\mathrm{inter}},\eta }(u,\chi )&\,{:}{=}\,\eta ^{\frac{1}{3}} \sum _{i=1}^3|D \chi _i|({{B_{1}\left( 0\right) }}). \end{aligned}$$
(3)

Here \(u \in W^{1,2}({B_{1}\left( 0\right) }; \mathbb {R}^3)\) is the displacement and \(e(u) =\frac{1}{2}\left( Du + Du^T\right) \) denotes the strain. Furthermore, the partition into the phases is given by \(\chi _i \in L^\infty ({B_{1}\left( 0\right) } ; \{0,1\})\) for \(i=1,2,3\) with \( \sum _{i=1}^3\chi _i = 1\) and the strains associated to the phases are given by

$$\begin{aligned} e_0 \,{:}{=} \,0, e_1\,{:}{=}\,\begin{pmatrix} -2 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 1\\ \end{pmatrix}, e_2\,{:}{=}\,\begin{pmatrix} 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad -2 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 1\\ \end{pmatrix}, e_3\,{:}{=}\,\begin{pmatrix} 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad -2\\ \end{pmatrix}. \end{aligned}$$
(4)

In particular, we assume the reference configuration to be in the austenite state, but that the transformation has occurred throughout the sample, i.e., there is no austenite present. This simplifying assumption does rule out habit planes, see Fig. 2a, but a look at Fig. 2b suggests that we can still hope for an interesting result. Besides, the mechanism responsible for macroscopic rigidity is the rank-one connectedness of the average strains \(e(u_\eta ) \rightharpoonup e(u)\) in \(L^2\), which cannot distinguish between pure phases and mixtures.

The condition of the material being a shape memory alloy is encoded in the fact that \({\text {tr}}(e_i) = 0\) for all \(i=1,2,3\) as this corresponds to the transformation being volume-preserving.
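These algebraic properties of the stress-free strains can be checked directly from (4). The following Python sketch (ours, not part of the paper) verifies that all \(e_i\) are trace-free and that the three martensite strains average to the austenite strain \(e_0 = 0\):

```python
# Stress-free strains from equation (4), stored as 3x3 nested lists.
e1 = [[-2, 0, 0], [0, 1, 0], [0, 0, 1]]
e2 = [[1, 0, 0], [0, -2, 0], [0, 0, 1]]
e3 = [[1, 0, 0], [0, 1, 0], [0, 0, -2]]

def trace(m):
    return sum(m[i][i] for i in range(3))

# Volume preservation: all stress-free strains are trace-free.
assert all(trace(e) == 0 for e in (e1, e2, e3))

# The martensite strains sum to zero, i.e. they average to the austenite
# strain e_0 = 0, reflecting the cubic symmetry of the parent phase.
assert all(e1[i][j] + e2[i][j] + e3[i][j] == 0
           for i in range(3) for j in range(3))
```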

Further simplifying choices are using equal isotropic elastic moduli with vanishing second Lamé constant for all martensite phases and penalizing interfaces by the total variation of \(D\chi _i\) for \(i=1,2,3\). Of course, as such it is unlikely that the model can give quantitatively correct predictions. Bhattacharya for example argues that assuming equal elastic moduli is not reasonable [5, Page 238].

We still expect our analysis to give relevant insight as we will for the most part prove compactness properties of generic displacements \(u_\eta \in W^{1,2}({{B_{1}\left( 0\right) }};\mathbb {R}^3)\) and partitions \(\chi _\eta \) for \(\eta >0\) such that

$$\begin{aligned} \limsup _{\eta \rightarrow 0} E_{\eta }(u_\eta ,\chi _\eta ) < \infty . \end{aligned}$$

This regime is the appropriate one to analyze branching microstructures: On the one hand, (generalizations of) the Kohn–Müller branching construction of habit planes have bounded energy. On the other hand, the stability result of Capella and Otto [11] rules out branching in sequences with asymptotically vanishing energy by ensuring that in a strong topology there is either almost exclusively austenite or the configuration is close to a laminate. In other words, the branching construction implies that the stability result is sharp with respect to the energy regime as pointed out by Capella and Otto in their paper.

1.2.1 Compatibility Properties of the Stress-Free Strains

It is well known, see [6, Chapter 11.1], that for \(M_1\), \(M_2 \in \mathbb {R}^{3\times 3}\) and \(n \in {\mathbb {S}}^2\) the following two statements are equivalent:

  • There exists a continuous function \(u:\mathbb {R}^3 \rightarrow \mathbb {R}^3\) such that for almost all \(x\in \mathbb {R}^3\) it holds that

    $$\begin{aligned} e(u)(x) = {\left\{ \begin{array}{ll} M_1 &{}\quad {\text { if }} x \cdot n > 0,\\ M_2 &{}\quad {\text { if }} x \cdot n < 0, \end{array}\right. } \end{aligned}$$
    (5)

    see Fig. 3a.

  • The two strains are (symmetrically) rank-one connected in the sense that there exists \(a \in \mathbb {R}^3\) such that

    $$\begin{aligned} M_1 - M_2 = a \odot n \,{:}{=}\, \frac{1}{2}( a \otimes n + n \otimes a) . \end{aligned}$$

Note that the condition is symmetric in \(a\) and \(n\); thus every rank-one connection generically gives rise to two possible normals. Additionally, as rank-one connectedness is also symmetric in \(M_1\) and \(M_2\), this allows for the construction of laminates.
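The implication from rank-one connectedness to the existence of a continuous displacement can be made concrete: given \(M_1 - M_2 = a \odot n\), the piecewise affine function \(u(x) = M_2 x + \max (x\cdot n, 0)\, a\) is continuous and realizes the strains in (5). The Python sketch below is our illustration, not from the paper; the vectors a and n are one concrete choice for the pair \(e_1\), \(e_2\):

```python
import math

def outer(a, b):
    return [[a[i] * b[j] for j in range(3)] for i in range(3)]

def sym(m):
    return [[(m[i][j] + m[j][i]) / 2 for j in range(3)] for i in range(3)]

def add(m, p):
    return [[m[i][j] + p[i][j] for j in range(3)] for i in range(3)]

def close(m, p, tol=1e-12):
    return all(abs(m[i][j] - p[i][j]) < tol for i in range(3) for j in range(3))

# Twin pair M2 = e_1, M1 = e_2; a and n below are one concrete choice
# (ours, for illustration) realizing M1 - M2 = a (sym tensor) n.
M2 = [[-2, 0, 0], [0, 1, 0], [0, 0, 1]]
M1 = [[1, 0, 0], [0, -2, 0], [0, 0, 1]]
s = math.sqrt(2)
a = [3 * s, -3 * s, 0.0]
n = [1 / s, 1 / s, 0.0]

# Check M1 - M2 = a (sym tensor) n = diag(3, -3, 0).
assert close(sym(outer(a, n)), [[3, 0, 0], [0, -3, 0], [0, 0, 0]])

# The displacement u(x) = M2 x + max(x.n, 0) a is continuous across the
# interface {x.n = 0}; its gradient on {x.n > 0} is M2 + a (tensor) n,
# whose symmetric part is exactly M1.
assert close(sym(add(M2, outer(a, n))), M1)
```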

Fig. 3

a Geometry of an interface parallel to the plane \(\{x\cdot n =0\}\) in a laminate joining the strains \(M_1\) and \(M_2\). b Sketch relating the martensite strains with the cone C (dotted) of symmetrized rank-one matrices in the two-dimensional strain space S. Note that C is a union of three lines parallel to the edges of the triangle \(\mathscr {K}\)

In order to present the result of applying the rank-one connectedness condition to the case of cubic-to-tetragonal transformations notice that

$$\begin{aligned} e_0, \ldots , e_3 \in S\,{:}{=}\,\left\{ e\in \mathbb {R}^{3\times 3}: e \text { diagonal, }{\text {tr}}e = 0\right\} . \end{aligned}$$
(6)

Here, we call the two-dimensional space S strain space. It can be shown, either by direct computation or an application of [12, Lemma 3.1], that all rank-one directions in S are multiples of \(e_2 - e_1\), \(e_3 - e_2\) and \(e_1 - e_3\); this means that they are parallel to one of the sides of the equilateral triangle

$$\begin{aligned} \mathscr {K}\,{:}{=}\, \bigcup _{i=1}^3 \{\lambda e_{i+1} + (1-\lambda ) e_{i-1}: \lambda \in [0,1]\} \end{aligned}$$
(7)

spanned by \(e_1, e_2\) and \(e_3\) shown in Fig. 3b. In particular, the martensite strains are mutually compatible but austenite is only compatible to certain convex combinations of martensites which turn out to be \(\frac{1}{3} e_i + \frac{2}{3} e_j\) for \(i,j =1,2,3\) with \(i\ne j\).
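The claim about the compatible volume fractions can be verified by brute force: a diagonal, trace-free strain is symmetrized rank-one connected to \(e_0 = 0\) exactly when its middle eigenvalue (here, the middle diagonal entry) vanishes. The following Python sketch (ours, using exact arithmetic) scans convex combinations of pairs of martensite strains and recovers the fractions \(\frac{1}{3}\) and \(\frac{2}{3}\):

```python
from fractions import Fraction

e1 = [-2, 1, 1]  # diagonals of the stress-free strains from (4)
e2 = [1, -2, 1]
e3 = [1, 1, -2]

def compatible_with_austenite(d):
    # A diagonal trace-free strain is symmetrized rank-one connected to
    # e_0 = 0 exactly when its middle eigenvalue vanishes, since a (sym) n
    # has eigenvalues lam >= 0 >= mu with middle eigenvalue 0.
    return sorted(d)[1] == 0

# Scan convex combinations theta*e_i + (1-theta)*e_j on a fine grid.
found = set()
for ei, ej, name in [(e1, e2, (1, 2)), (e2, e3, (2, 3)), (e3, e1, (3, 1))]:
    for k in range(0, 61):
        t = Fraction(k, 60)
        d = [t * ei[m] + (1 - t) * ej[m] for m in range(3)]
        if compatible_with_austenite(d):
            found.add((name, t))

# Only the volume fractions 1/3 and 2/3 are compatible, for every pair.
assert {t for _, t in found} == {Fraction(1, 3), Fraction(2, 3)}
```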

1.3 The Contributions of the Paper

We study the rigidity of branching microstructures due to “macroscopic” effects in the sense that we only look at the limiting volume fractions \(\chi _{i,\eta } \overset{*}{\rightharpoonup } \theta _i\) in \(L^\infty \) after passage to a subsequence, which completely determines the limiting strain \(e(u_\eta ) \rightharpoonup e(u)\) in \(L^2\).

Similarly to the result of Capella and Otto [11], our main result, Theorem 1, is local in the sense that we can classify the function \(\theta \) on a smaller ball \({B_{r}\left( 0\right) }\) of universal radius \(0<r<1\). As such, having posed the problem on \({B_{1}\left( 0\right) }\) instead of a more general domain does not present a significant restriction of the result. As the characterization of each of the four possible cases is a bit lengthy, we postpone a detailed discussion to Section 2.3. An important point is that we deduce all interfaces between different mixtures of martensites to be hypersurfaces whose normals are as predicted by the rank-one connectedness of the average strains on either side. In this respect our theorem improves on previously available ones, as they either explicitly assume the correct alignment of a habit plane, see e.g. Kohn and Müller [30], or require other ad-hoc assumptions; for example, Ball and James [2, Theorem 3] show habit planes to be flat under the condition that the austenite phase defines a connected set.

The broad strategy of our proof is to first ensure that in the limit the displacement satisfies the non-convex differential inclusion

$$\begin{aligned} e(u) \in \mathscr {K}\end{aligned}$$

encoding that locally at most two variants are involved, see Definition (7) and Fig. 3, and then to classify all solutions. We strongly stress the point that we do not need to assume any additional regularity in order to do so. In particular, the differential inclusion is rigid in the sense that it does not allow for convex integration solutions with extremely intricate geometric structure. To our knowledge this is the first instance of a rigidity result for a non-discrete differential inclusion in the framework of linearized elasticity.

The main idea is that “discontinuity” of e(u) and the differential inclusion \(e(u) \in \mathscr {K}\) balance each other: If \(e(u) \notin VMO\), see Definition 1, a blow-up argument making use of measures describing the distribution of values \(e(u)\in \mathscr {K}\), similar in spirit to Young measures, proves that the strain is independent of one direction. If \(e(u) \in VMO\) the differential inclusion gives us less information, but we can still prove that throughout some smaller ball \({B_{r}\left( 0\right) }\), \(r\in (0,1)\), only two martensite variants are involved by using an approximation argument. Finally, we classify all solutions which are independent of one direction.

The structure of the paper is as follows: in Section 2 we state and discuss our main theorem in detail. We then give an in-depth explanation of most necessary auxiliary results required to prove Theorem 1 in Section 3. All proofs of the statements in Sections 2 and 3 are presented in Section 4 in the order of their appearance. Finally, “Appendix A” contains two lemmas of a technical nature, along with their proofs.

2 The Main Rigidity Theorem

Theorem 1

There exist universal radii \(r,\tilde{r} \in (0,1)\) such that the following holds: For \(n \in \mathbb {N}\) let \(\eta _n >0\) be a sequence with \(\lim _{n \rightarrow \infty } \eta _n = 0\). Let \(u_{\eta _n}\in W^{1,2}({B_{1}\left( 0\right) }; \mathbb {R}^3)\) and \(\chi _{\eta _n} \in L^\infty ({B_{1}\left( 0\right) } ; \{0,1\}^3)\) with \(\sum _{i=1}^3 \chi _{\eta _n,i} \equiv 1\) almost everywhere be sequences of displacements and partitions such that \(\limsup _{n\rightarrow \infty } E_{\eta _n}(u_{\eta _n},\chi _{\eta _n}) < \infty \) and such that there exist \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) and \(\theta \in L^\infty ({B_{1}\left( 0\right) };[0,1]^3)\) with

$$\begin{aligned} u_{\eta _n} \rightharpoonup u \text { in } W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\text {, } \chi _{\eta _n} {\mathop {\rightharpoonup }\limits ^{*}} \theta \text { in } L^\infty ({B_{1}\left( 0\right) };\mathbb {R}^3) \end{aligned}$$

in the limit \(n\rightarrow \infty \). Then for almost all \(x\in {B_{1}\left( 0\right) }\) we have \(\theta _i(x) \in [0,1]\) for \(i=1,2,3\),

$$\begin{aligned} e(u)(x) = \sum _{i=1}^3 \theta _i(x) e_i \text { and }e(u)(x) \in \mathscr {K}= \bigcup _{i=1}^3 \{\lambda e_{i+1} + (1-\lambda ) e_{i-1}: \lambda \in [0,1]\}. \end{aligned}$$

Furthermore, all solutions to this differential inclusion are two-variant configurations, planar second-order laminates, planar checkerboards on \({B_{r}\left( 0\right) }\) or planar triple intersections on \({B_{\tilde{r}}\left( 0\right) }\), according to Definitions 2–6 below.

Note that after modifying \(u_{\eta _n}\) for \(n\in \mathbb {N}\) so that

$$\begin{aligned} \int _{{B_{1}\left( 0\right) }} u_{\eta _n} \, \mathrm {d}x= 0, \qquad \int _{{B_{1}\left( 0\right) }} \frac{1}{2} \left( D u_{\eta _n} - (D u_{\eta _n})^T\right) \, \mathrm {d}x= 0, \end{aligned}$$

any sequence with asymptotically bounded energy has subsequences (not relabeled) such that \(u_{\eta _n} \rightharpoonup u\) in \(W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) and \(\chi _{\eta _n} \overset{*}{\rightharpoonup } \theta \) in \(L^\infty ({B_{1}\left( 0\right) };\mathbb {R}^3)\) due to Korn’s and Poincaré’s inequalities.

The first part of the conclusion states that the volume fractions \(\theta _i\) for \(i=1,2,3\) act as barycentric coordinates for the triangle in strain space with vertices \(e_1\), \(e_2\) and \(e_3\). In terms of these, the differential inclusion \(e(u) \in \mathscr {K}\) boils down to locally only two martensite variants being present.

In plain words, the classification of solutions states that

  1. only two martensite variants are involved, see Definition 2,

  2. or the volume fractions \(\theta \) only depend on one direction and look like a second-order laminate, see Definition 4,

  3. or they are independent of one direction and look like a checkerboard of up to two second-order laminates crossing, see Definition 5,

  4. or they are independent of one direction and macroscopically look like three second-order laminates crossing in an axis, see Definition 6.

Comparing this list to the list of observed microstructures in the introduction, we see that three crossing second-order laminates are missing. Indeed, we are unaware of them being mentioned in the currently available literature. One possible explanation for the absence of planar triple intersections in observations is that they could be an artifact of the linear theory. Another one is that their very rigid geometry, see Definition 6, could lead to them being unlikely to develop during the inherently dynamic process of microstructure formation.

Furthermore, we see that the theorem of course captures neither wedges (which are known to be missing in the geometrically linearized theory anyway [5]) nor habit planes, due to austenite being absent. Unfortunately, an extension of the theorem including austenite does not seem tractable with the methods used here: The central step allowing us to classify all solutions of the differential inclusion is to show that most configurations are independent of some direction. Even those that do depend on all three variables have a direction in which they are very well-behaved, i.e., they are affine. However, with austenite being present this property is lost, as the following example shows:

Lemma 1

There exist solutions \(u: \mathbb {R}^3 \rightarrow \mathbb {R}^3\) of the differential inclusion \(e(u) \in \mathscr {K}\cup \{0\}\) such that e(u) has a fully three-dimensional structure.

Note that Theorem 1 strongly restricts the geometric structure of the strain, even if the four cases exhibit varying degrees of rigidity. Therefore, we can interpret it as a rigidity statement for the differential inclusion \(e(u) \in \mathscr {K}\). For example, it can be used to prove that \(u(x) \equiv M x\) with \(M \in \mathscr {K}\) is the only solution of the boundary value problem

$$\begin{aligned} {\left\{ \begin{array}{ll} e(u) \in \mathscr {K}&{}\quad \text { in }{B_{1}\left( 0\right) },\\ u(x) \equiv M x &{}\quad \text { on } \partial {B_{1}\left( 0\right) } \end{array}\right. } \end{aligned}$$

with affine boundary data, for which convex integration constructions would give a staggering amount of solutions with complicated geometric structures. This can be seen by transporting the decomposition into one-dimensional functions of Definitions 2–6 to the boundary using the fact that they are unique up to affine functions, see [11, Lemma 5].

2.1 Inferring the Microscopic Behavior

In order to properly interpret the various cases Theorem 1 provides, we first need a clear idea of precisely what information the local volume fractions contain. In principle, they have the same downside as Young measures used to describe microstructures: they do not retain information about the microscopic geometric properties of the microstructures. In fact, the Young measures generated by finite energy sequences are fully determined by the volume fractions via the expression \(\sum _{i=1}^3\theta _i \delta _{e_i}\), since the Young measures concentrate on the matrices \(e_1\), \(e_2\) and \(e_3\), which span a non-degenerate triangle.

As every rank-one connection has two possible normals, see Equation (8), giving rise to two different twins, we cannot infer from the volume fractions which twin is used. Consequently, what looks like a homogeneous limit could in principle be generated by a patchwork of different twins. In fact, Fig. 4 shows an experimental picture of such a situation.

Fig. 4

Experimental picture of a two-variant microstructure in a Cu–Al–Ni alloy, by courtesy of R.D. James and C. Chu

Additionally, without knowing which twin is present, the interpretation of changes in volume fractions is further complicated by the fact that there are at least three mechanisms which could be responsible:

  1. If there is only one twin throughout \({B_{1}\left( 0\right) }\), then the volume fractions can vary freely in the direction of lamination, because there are no restrictions on the thickness of martensite layers in twins apart from the very mild control coming from the interface energy.

  2. If there is only one twin, the volume fractions may, perhaps somewhat surprisingly, vary perpendicularly to the direction of lamination in a sufficiently regular manner. Constructions exhibiting this behavior have been given by Conti [14, Lemma 3.1] and Kohn, Misiats and Müller for the scalar Kohn–Müller model.

  3. There is a jump in volume fractions across a habit plane or a second-order twin. As such behavior costs energy, one would expect that it cannot happen too often. However, in the present setting we can only prove, roughly speaking, that the corresponding set of interfaces has Hausdorff dimension at most \(3-\frac{2}{3}\), which will be presented in a forthcoming paper.

2.2 Some Notation

The rank-one connections between the martensite strains are

$$\begin{aligned} e_2 - e_1&= 6 \, \nu _3^+ \odot \nu _3^-,\nonumber \\ e_3 - e_2&= 6\, \nu _1^+ \odot \nu _1^-,\nonumber \\ e_1 - e_3&= 6\, \nu _2^+ \odot \nu _2^-, \end{aligned}$$
(8)

where the possible normals are given by

$$\begin{aligned} \nu _1^+&\,{:}{=}\, \frac{1}{\sqrt{2}}(011), \; \nu _1^- \,{:}{=}\, \frac{1}{\sqrt{2}}(01\overline{1}),\nonumber \\ \nu _2^+&\,{:}{=}\, \frac{1}{\sqrt{2}}(101),\; \nu _2^- \,{:}{=}\, \frac{1}{\sqrt{2}}(\overline{1}01),\nonumber \\ \nu _3^+&\,{:}{=}\, \frac{1}{\sqrt{2}}(110), \; \nu _3^- \,{:}{=}\, \frac{1}{\sqrt{2}}(1\overline{1}0). \end{aligned}$$
(9)

Here, we use crystallographic notation, meaning we define \(\overline{1} \,{:}{=}\, -1\). In addition, we use round brackets “( )” for dual vectors, i.e., normals of planes, while square brackets “[ ]” are used for primal vectors, i.e., directions in real space.

These normals can be visualized as the surface diagonals of a cube aligned with the coordinate axes and with side lengths \(\frac{1}{\sqrt{2}}\), see Fig. 5a. We group them into three pairs according to which surface of the cube they lie in, i.e., according to the relation \(\nu _i\cdot E_i=0\), where \(E_i\) is the standard i-th basis vector of \(\mathbb {R}^3\): Let

$$\begin{aligned} N_1 \, {:}{=}\, \{\nu _1^+,\nu _1^-\},\\ N_2 \, {:}{=}\, \{\nu _2^+,\nu _2^-\},\\ N_3 \, {:}{=}\, \{\nu _3^+,\nu _3^-\}. \end{aligned}$$

Note that this grouping also appears in Equation (8). We will also frequently want to talk about the set of all possible twin and habit plane normals, which we will refer to by \(N\,{:}{=}\, N_1\cup N_2 \cup N_3\).
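The rank-one connections (8) with the normals (9) are finite identities and can be checked numerically; the sketch below (ours, not from the paper) does so for all three pairs:

```python
import math

s = 1 / math.sqrt(2)
nu = {  # the twin normals from equation (9)
    ('1', '+'): (0, s, s), ('1', '-'): (0, s, -s),
    ('2', '+'): (s, 0, s), ('2', '-'): (-s, 0, s),
    ('3', '+'): (s, s, 0), ('3', '-'): (s, -s, 0),
}

def sym_outer(a, b):  # symmetrized tensor product a (sym) b
    return [[(a[i] * b[j] + a[j] * b[i]) / 2 for j in range(3)] for i in range(3)]

def close(m, p, tol=1e-12):
    return all(abs(m[i][j] - p[i][j]) < tol for i in range(3) for j in range(3))

def scale(c, m):
    return [[c * m[i][j] for j in range(3)] for i in range(3)]

def sub(m, p):
    return [[m[i][j] - p[i][j] for j in range(3)] for i in range(3)]

e1 = [[-2, 0, 0], [0, 1, 0], [0, 0, 1]]
e2 = [[1, 0, 0], [0, -2, 0], [0, 0, 1]]
e3 = [[1, 0, 0], [0, 1, 0], [0, 0, -2]]

# The three rank-one connections of equation (8).
assert close(sub(e2, e1), scale(6, sym_outer(nu[('3', '+')], nu[('3', '-')])))
assert close(sub(e3, e2), scale(6, sym_outer(nu[('1', '+')], nu[('1', '-')])))
assert close(sub(e1, e3), scale(6, sym_outer(nu[('2', '+')], nu[('2', '-')])))
```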

Throughout the paper we make use of cyclical indices 1, 2 and 3 corresponding to martensite variants whenever it is convenient.

Remark 1

An essential combinatorial property is that for \(i\in \{1,2,3\}\) and any \(\nu _i \in N_i\), \(\nu _{i+1} \in N_{i+1}\) there exists exactly one \(\nu _{i-1} \in N_{i-1}\) such that \(\{\nu _{i},\nu _{i+1},\nu _{i-1}\}\) is linearly dependent: Indeed, the linear relation is given by \(\nu _j \cdot d = 0\) for \(j \in \{1,2,3\}\) and a space diagonal

$$\begin{aligned} d\in \mathscr {D}\, {:}{=}\, \{[111],[\overline{1}11],[1\overline{1}1],[11\overline{1}]\} \end{aligned}$$
(10)

of the unit cube; see Fig. 5b. By virtue of \(|\nu \cdot \tilde{\nu }|=\frac{1}{2}\) for \(\nu \in N_i\) and \(\tilde{\nu } \in N_j\) with \(i\ne j\), the two vectors form \(60^{\circ }\) or \(120^{\circ }\) angles. In particular, we have for all \(\nu \in N_i\) and \(\tilde{\nu } \in N_j\) with \(i\ne j\), all \(x \in {\text {span}}\{\nu ,\tilde{\nu }\}\) and \(r>0\) that

$$\begin{aligned} |x\cdot \nu |<r \text { and } |x\cdot \tilde{\nu }|< r \text { imply } |x|<2r. \end{aligned}$$
(11)

Additionally, for every \(\nu \in N\) there exist precisely two \(d,\tilde{d} \in \mathscr {D}\) such that \(\nu \cdot d \ne 0\) and \(\nu \cdot \tilde{d} \ne 0\), and for every \(\tilde{\nu } \in N {\setminus } \{\nu \}\) we have \(\tilde{\nu } \cdot d = 0\) or \(\tilde{\nu } \cdot \tilde{d} = 0\). Furthermore, for all \(\nu \in N_i\) and \(\tilde{\nu } \in N_{i+1}\) with \(i\in \{1,2,3\}\) there exists a single \(d \in \mathscr {D}\) such that \(\nu \cdot d = \tilde{\nu } \cdot d =0\). In contrast, for each \(d\in \mathscr {D}\) we have \(\nu _i^+\cdot d=0\) and \(\nu _i^- \cdot d \ne 0\) or vice versa.
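The combinatorial properties collected in Remark 1 are finite statements and can be verified by enumeration. A Python sketch (ours, not from the paper; the normals are scaled by \(\sqrt{2}\) so that all arithmetic is exact):

```python
from itertools import product

# Twin normals of (9), scaled by sqrt(2) so all entries are integers.
N = {
    1: [(0, 1, 1), (0, 1, -1)],
    2: [(1, 0, 1), (-1, 0, 1)],
    3: [(1, 1, 0), (1, -1, 0)],
}

def det(a, b, c):
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# For every nu_i in N_i and nu_{i+1} in N_{i+1} there is exactly one
# nu_{i-1} in N_{i-1} making the triple linearly dependent ...
for i in (1, 2, 3):
    j, k = i % 3 + 1, (i + 1) % 3 + 1
    for a, b in product(N[i], N[j]):
        assert len([c for c in N[k] if det(a, b, c) == 0]) == 1

# ... and the linear relation comes from a common orthogonal space diagonal.
D = [(1, 1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
for a, b in product(N[1], N[2]):
    c = [v for v in N[3] if det(a, b, v) == 0][0]
    assert any(dot(a, d) == dot(b, d) == dot(c, d) == 0 for d in D)

# Normals from different families always form 60 or 120 degree angles:
# |nu . nu~| = 1/2, i.e. |dot| = 1 for the sqrt(2)-scaled vectors.
for i, j in product((1, 2, 3), repeat=2):
    if i != j:
        for a, b in product(N[i], N[j]):
            assert abs(dot(a, b)) == 1
```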

Fig. 5

a Sketch relating the normals \(\nu _{3}^+, \nu _3^- \in N_3\) of the gray planes and \(E_3\). Primal vectors are shown as dashed, dual vectors as continuous lines. The picture does not attempt to accurately capture the lengths of the vectors. b Sketch showing the linearly dependent normals \(\nu _1^+\), \(\nu _2^+\) and \(\nu _3^-\) spanning the gray plane. The point p indicates the intersection of the affine span of the space diagonal \([11\overline{1}] \in \mathscr {D}\), see definition (10), with the span of the normals

Additionally, we will frequently want to express e(u) in terms of barycentric coordinates with respect to \(e_1\), \(e_2\) and \(e_3\), which are given by the function \(\theta : {B_{1}\left( 0\right) } \rightarrow [0,1]^3\) due to \(e(u) = \sum _{i=1}^3 \theta _i e_i\), see Theorem 1 or Lemma 2 below. For almost all \(x \in {B_{1}\left( 0\right) }\), the inclusion \(e(u)(x) \in \mathscr {K}\) can then be expressed as

$$\begin{aligned} \theta (x) \in \widetilde{\mathscr {K}} \,{:}{=}\, \left\{ \bar{\theta } \in [0,1]^3: \sum _{i=1}^3 \bar{\theta }_i = 1, \bar{\theta }_1 \bar{\theta }_2 \bar{\theta }_3 = 0 \right\} . \end{aligned}$$
(12)

Furthermore, for \(\nu \in N\), \(x\in \mathbb {R}^3\) and \(\alpha \in \mathbb {R}\) we will also set

$$\begin{aligned} \pi _\nu (x) \, {:}{=} \,\nu \cdot x \text { and } H(\alpha ,\nu ) \, {:}{=}\, \left\{ \tilde{x} \in \mathbb {R}^3: \tilde{x} \cdot \nu = \alpha \right\} \end{aligned}$$
(13)

to be the projection onto \({\text {span}} (\nu )\), respectively the plane normal to \(\nu \) containing \(\alpha \nu \). For \(x \in \mathbb {R}^3\) and \(r>0\) the symbol \({B_{r}\left( x\right) }\) denotes the corresponding three-dimensional ball, while for \(y \in \mathbb {R}^2\) the symbol \(B_r^{\,2}(y)\) denotes the two-dimensional ball. The essential infimum of a function \(h \in L^\infty (U)\) for \(U \subset \mathbb {R}^n\) is defined as \(\hbox {ess inf}_U h\, {:}{=}\, - \hbox {ess sup}_U - h\).

For the convenience of the reader, we also provide a definition of the space VMO(U) for an open Lipschitz domain \(U \subset \mathbb {R}^n\) for \(n\in \mathbb {N}\), which is modeled after the one given by Sarason [40] in the whole space case.

Definition 1

Let \(U \subset \mathbb {R}^n\) with \(n \in \mathbb {N}\) be an open domain and let \(f\in L^1(U)\). We say that the function f is of bounded mean oscillation, or \(f\in BMO(U)\), if we have

$$\begin{aligned} \sup _{x\in U, 0< r< 1} \fint _{{B_{r}\left( x\right) }\cap U} \left| f(y) - \fint _{{B_{r}\left( x\right) }\cap U} f(z) \, \mathrm {d}z \right| \, \mathrm {d}y < \infty . \end{aligned}$$

If we additionally have

$$\begin{aligned} \lim _{r\rightarrow 0} \sup _{x\in U} \fint _{{B_{r}\left( x\right) }\cap U} \left| f(y) - \fint _{{B_{r}\left( x\right) }\cap U} f(z) \, \mathrm {d}z \right| \, \mathrm {d}y =0, \end{aligned}$$

then f is of vanishing mean oscillation, in which case we write \(f\in VMO(U)\).

It can be shown that at least for sufficiently nice sets U the space VMO is the BMO-closure of the continuous functions on U and as such it serves as a substitute for C(U) in our setting. Functions of vanishing mean oscillation need not be continuous, although they do share some properties with continuous functions, such as the “mean value theorem”, see Lemma 8 below. We stress that the uniformity in x of the convergence as \(r\rightarrow 0\) is crucial and cannot be omitted without changing the space, as can be proven by considering a function consisting of very thin spikes of height one clustering at some point.
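The spike example can be made quantitative. The sketch below builds a hypothetical bounded function with spikes of height one and width \(4^{-n}\) at the points \(2^{-n}\), clustering at zero; for an indicator-type function the mean oscillation over an interval with average \(m\) equals \(2m(1-m)\), which allows exact evaluation. At every fixed center the oscillation vanishes as \(r \rightarrow 0\), but the supremum over centers stays at \(\frac{1}{2}\) along the scales \(r = 4^{-n}\), so the function is bounded (hence in BMO) but not in VMO.

```python
# f = sum_n indicator([2^-n, 2^-n + 4^-n]): bounded, hence in BMO, but the
# convergence of the mean oscillation is not uniform in the center, so f
# is not in VMO.  For an indicator function the mean oscillation over an
# interval equals 2 m (1 - m), where m is the average over that interval.
NMAX = 40
spikes = [(2.0 ** -n, 2.0 ** -n + 4.0 ** -n) for n in range(1, NMAX)]

def overlap(a, b):
    """Measure of [a, b] intersected with the spike set."""
    return sum(max(0.0, min(b, s1) - max(a, s0)) for s0, s1 in spikes)

def osc(x, r):
    """Mean oscillation of f over (x - r, x + r)."""
    m = overlap(x - r, x + r) / (2 * r)
    return 2 * m * (1 - m)

# Non-uniformity: at scale r = 4^-n, centering at the n-th spike gives
# average 1/2 and hence oscillation 1/2, no matter how small r is.
for n in range(2, 20):
    r = 4.0 ** -n
    assert abs(osc(2.0 ** -n + r / 2, r) - 0.5) < 1e-12

# ...while at the cluster point x = 0 the oscillation does vanish as r -> 0.
vals = [osc(0.0, 2.0 ** -n) for n in range(1, 15)]
assert vals[-1] < 1e-3 and all(b <= a for a, b in zip(vals, vals[1:]))
```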

Finally, for two real numbers \(s,t>0\) we use the notation \(s \lesssim t\) if there exists a universal constant \(C>0\) such that \(s \le C t\). In proofs, such constants may grow from line to line.

2.3 Description of the Limiting Configurations

In what follows we describe all types of configurations we can obtain as weak limits. We start with those in which globally only two martensite variants are involved.

Definition 2

Let \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) solve the differential inclusion \(e(u) \in \mathscr {K}\), i.e., there exists a measurable function \(\theta : {B_{1}\left( 0\right) } \rightarrow [0,1]^3\) such that for almost all \(x\in {B_{1}\left( 0\right) }\) we have \( e(u)(x) \equiv \sum _{i=1}^3 \theta _i(x) e_i\) and \(\theta (x) \in \widetilde{\mathscr {K}}\), see definition (12).

We say that the configuration e(u) is a two-variant configuration on \({B_{r}\left( 0\right) }\) with \(r>0\) if there exist \(i \in \{1,2,3\}\), \(\lambda \in \mathbb {R}\) and functions \(\smash {f_{\nu _i^+}, f_{\nu _i^-} }\in L^\infty (-r,r)\) such that for almost all \(x \in {B_{r}\left( 0\right) }\) we have

$$\begin{aligned} \theta _i(x)&= 0,\\ \theta _{i+1}(x)&= f_{\nu _i^+}\left( \nu _i^+\cdot x \right) + f_{\nu _i^-}\left( \nu _i^-\cdot x \right) + \lambda x_i + 1,\\ \theta _{i-1}(x)&= - f_{\nu _i^+}\left( \nu _i^+\cdot x \right) - f_{\nu _i^-}\left( \nu _i^-\cdot x \right) - \lambda x_i. \end{aligned}$$

For a definition of the normals \(\nu \) see Section 2.2.
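As a sanity check, the formulas of Definition 2 can be evaluated numerically: whatever the one-dimensional profiles and the slope \(\lambda\) are, the volume fractions sum to one and \(\theta_i\) vanishes, so the strain only mixes two variants. The concrete normals and the normalization \((e_j)_{kk} = 1 - 3\delta_{jk}\) below are assumptions (consistent with the identity \(e(u)_{ii} = 1 - 3\theta_i\) in Section 3.2), not taken from this chunk.

```python
import math, random

# Two-variant formulas of Definition 2 for i = 1, with assumed normals
# nu_1^+/- = (E_2 +/- E_3)/sqrt(2) and assumed trace-free diagonal
# matrices e_j with entries (e_j)_{kk} = 1 - 3*delta_{jk}.
random.seed(0)
SQ2 = math.sqrt(2.0)
nu_p = (0.0, 1.0 / SQ2, 1.0 / SQ2)    # assumed nu_1^+
nu_m = (0.0, 1.0 / SQ2, -1.0 / SQ2)   # assumed nu_1^-
f_p, f_m, lam = math.sin, math.cos, 0.3   # arbitrary bounded profiles

def theta(x):
    s = f_p(sum(a * b for a, b in zip(nu_p, x)))
    t = f_m(sum(a * b for a, b in zip(nu_m, x)))
    return (0.0, s + t + lam * x[0] + 1.0, -s - t - lam * x[0])

e = [[1.0 - 3.0 * (j == k) for k in range(3)] for j in range(3)]  # diagonals

for _ in range(1000):
    x = tuple(random.uniform(-1, 1) for _ in range(3))
    th = theta(x)
    assert abs(sum(th) - 1.0) < 1e-12          # barycentric: sum = 1
    assert th[0] == 0.0                        # variant 1 never appears
    diag = [sum(th[j] * e[j][k] for j in range(3)) for k in range(3)]
    assert abs(sum(diag)) < 1e-12              # resulting strain is trace-free
```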

An experimental picture of a two-variant configuration resolving the actual microstructure can be found in Fig. 4. In contrast, Fig. 6a only keeps track of the local volume fractions and indicates how they can vary in space. The deceptively similar overall geometric structure of both figures is due to the rank-one connections for the microscopic and macroscopic interfaces coinciding. This is also the reason why we cannot infer the microscopic structure from the limiting volume fractions. We can only attribute the affine change in \(x_i\) to Mechanism 2 from Section 2.1.

In the context of the other structures appearing in Theorem 1, two-variant configurations are best interpreted as their building blocks, since said structures typically consist of patches where only two martensite variants are involved. In the following, we will see that on these patches the microstructures are much more rigid than those in Fig. 6a as a result of the non-local nature of kinematic compatibility.

Apart from two-variant configurations, all other configurations will depend on only two variables. We will call such configurations planar.

Definition 3

In the setting of Definition 2, a configuration e(u) is planar with respect to \(d \in \mathscr {D}\), see (10), on a ball \({B_{r}\left( 0\right) }\) with \(r>0\) if the following holds: For \(i\in \{1,2,3\}\) let \(\nu _i\) be the unique normal \(\nu _i \in N_i\) with \(\nu _i \cdot d = 0\), see Remark 1. Then for all \(i=1,2,3\) there exist functions \(f_{\nu _i}\in L^\infty (-r,r)\) and affine functions \(g_i:\mathbb {R}^3 \rightarrow \mathbb {R}\) with \(\partial _d g_i=0\) such that for almost all \(x\in {B_{r}\left( 0\right) }\) we have

$$\begin{aligned} \theta _1(x)&= f_{\nu _2}(x\cdot \nu _2) - f_{\nu _3}(x\cdot \nu _3) + g_1(x),\nonumber \\ \theta _2(x)&= {}-{} f_{\nu _1} (x\cdot \nu _1)+ f_{\nu _3}(x\cdot \nu _3) + g_2(x),\nonumber \\ \theta _3(x)&= f_{\nu _1}(x\cdot \nu _1) - f_{\nu _2}(x\cdot \nu _2) +g_3(x). \end{aligned}$$
(14)

There will be three cases of planar configurations, which, at least in terms of their volume fractions, look like a single second-order laminate, a “checkerboard” structure of two crossing second-order laminates, or three single interfaces of second-order laminates crossing in a common axis.

The first two cases are closely related to each other, the first one being almost contained in the second. However, the first case has slightly more flexibility away from macroscopic interfaces. Despite the caveat discussed in Section 2.1, we will name them planar second-order laminates.

Fig. 6
figure 6

a Cross-section through a two-variant configuration with \(i\in \{1,2,3\}\). The configuration may be affine in the direction perpendicular to the cross-section. Created using MATLAB. b The grayscale color code indicates the volume fractions of the martensite variants \(e_j\) and \(e_k\)

Definition 4

In the setting of Definition 2, a configuration e(u) is a planar second-order laminate on a ball \({B_{r}\left( 0\right) }\) for \(r>0\) if it is planar and takes the following form: There exist an index \(i\in \{1,2,3\}\), \(\nu \in N_i\), \(A\subset (-r,r)\) measurable and a, \(b \in \mathbb {R}\) such that for almost all \(x \in {B_{r}\left( 0\right) }\) we have

$$\begin{aligned} \theta _{i-1}(x)&= (1 - a x\cdot \nu - b) \chi _{{A}^{\mathsf {c}}}(x\cdot \nu ),\\ \theta _i (x)&= a x\cdot \nu + b,\\ \theta _{i+1}(x)&= (1 - a x\cdot \nu - b) \chi _{A}(x\cdot \nu ). \end{aligned}$$
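The formulas above can be checked pointwise against the constraint set \(\widetilde{\mathscr{K}}\) from (12): the fractions sum to one and their product vanishes, since \(\chi_A\) and \(\chi_{A^{\mathsf{c}}}\) never overlap. The normal \(\nu\) and the set \(A\) in the following sketch are assumed sample data.

```python
import math, random

# Pointwise check that a planar second-order laminate (Definition 4, here
# with i = 2) lies in K~ from (12).  The normal nu (an assumed element of
# N_2) and the measurable set A are sample choices.
random.seed(1)
a, b = 0.2, 0.4
nu = (1.0 / math.sqrt(2), 0.0, 1.0 / math.sqrt(2))
A = lambda t: (-0.3 < t < -0.1) or (0.05 < t < 0.2)

def theta_at(t):
    """(theta_{i-1}, theta_i, theta_{i+1}) as functions of t = x . nu."""
    chiA = 1.0 if A(t) else 0.0
    return ((1 - a * t - b) * (1 - chiA), a * t + b, (1 - a * t - b) * chiA)

for _ in range(1000):
    x = tuple(random.uniform(-0.25, 0.25) for _ in range(3))
    th = theta_at(sum(p * q for p, q in zip(nu, x)))
    assert abs(sum(th) - 1.0) < 1e-12           # barycentric coordinates
    assert th[0] * th[2] == 0.0                 # hence theta_1*theta_2*theta_3 = 0
    assert all(0.0 <= v <= 1.0 for v in th)     # volume fractions
```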

A sketch of a planar second-order laminate can be found in Fig. 7, along with a matching experimental picture of a Cu–Al–Ni alloy, which, admittedly, undergoes a cubic-to-orthorhombic transformation.

Fig. 7
figure 7

a Cross-section of a planar second-order laminate with \(i\in \{1,2,3\}\), \(a=0\) and average strains \(M_1\) and \(M_2\). b Sketch relating the strains \(M_1\) and \(M_2\) in strain space. c Second-order laminate in a Cu–Al–Ni alloy, by courtesy of C. Chu and R.D. James

Indeed, such configurations can be interpreted and constructed as limits of finite-energy sequences as follows, using Fig. 7 as a guide: for simplicity let us assume that A is a finite union of intervals, and that \(i=1\). Then on the interior of \(\{x\cdot \nu \in A\}\) the configuration will be generated by twins of variants 1 and 2, while on the interior of \(\{x\cdot \nu \in {A}^{\mathsf {c}}\}\), it will be generated by twins of variants 1 and 3. At interfaces, a branching construction on both sides will be necessary to join these twins in a second-order laminate. In order to realize the affine change in the direction of \(\nu \) we will need to combine Mechanisms 1 and 2 of Section 2.1 because \(\nu \) is neither a possible direction of lamination between variants 1 and 2 nor between variants 1 and 3, nor is it normal to one of them.

The second case consists of configurations in which two second-order laminates cross. In contrast to the first case, the strains are required to be constant away from macroscopic interfaces, so that only four different macroscopic strains are involved.

Definition 5

In the setting of Definition 2, we will say that a configuration e(u) is a planar checkerboard on \({B_{r}\left( 0\right) }\) for \(r>0\) if it is planar and takes the following form: There exist \(i\in \{1,2,3\}\), \(A,B \subset (-r,r)\) measurable, \(a,b \ge 0 \) with \(a + b = 1\) and \(\nu _j \in N_j\) for \(j \in \{1,2,3\}{\setminus } \{i\}\) such that for almost all \(x \in {B_{r}\left( 0\right) }\) we have

$$\begin{aligned} \theta _i(x)&= - a \chi _A (x\cdot \nu _{i+1}) - b \chi _B(x\cdot \nu _{i-1}) + 1,\\ \theta _{i+1}(x)&= b \chi _B (x\cdot \nu _{i-1}) ,\\ \theta _{i-1}(x)&= a \chi _A(x\cdot \nu _{i+1}). \end{aligned}$$
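Tabulating these formulas over the four patches shows that a planar checkerboard takes exactly four macroscopic values, each lying in \(\widetilde{\mathscr{K}}\) from (12); note that on the patch where both indicators are active, admissibility uses \(a + b = 1\). The values of \(a\) and \(b\) below are sample data.

```python
# The planar checkerboard of Definition 5 over its four patches; we record
# theta = (theta_i, theta_{i+1}, theta_{i-1}) per patch.
a = 0.25
b = 1.0 - a   # Definition 5 requires a + b = 1
patches = {}
for chiA in (0, 1):
    for chiB in (0, 1):
        th_i  = -a * chiA - b * chiB + 1.0
        th_ip = b * chiB
        th_im = a * chiA
        patches[(chiA, chiB)] = (th_i, th_ip, th_im)

for th in patches.values():
    assert abs(sum(th) - 1.0) < 1e-12                  # barycentric coordinates
    assert min(th) >= 0.0 and th[0] * th[1] * th[2] == 0.0  # lies in K~

assert len(set(patches.values())) == 4   # four distinct macroscopic strains
assert patches[(1, 1)][0] == 0.0         # pure twinning patch: theta_i = 0
```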
Fig. 8
figure 8

a Sketch of a planar checkerboard with strains \(M_1\), ..., \(M_4\). The cross-section can be chosen such that the average strains are independent of the direction perpendicular to the cross-section. b Relation of the strains in strain space. c Checkerboard structure in an Indium–Thallium crystal. Note that here the bottom region is in the austenite phase. Reprinted from [4], with permission from Elsevier

For a sketch of such configurations, see Fig. 8. Again, we briefly discuss the construction of such limiting strains. On \(\{x\cdot \nu _{i+1} \in {A}^{\mathsf {c}}\}\cap \{ x\cdot \nu _{i-1} \in {B}^{\mathsf {c}}\}\) there is of course only the martensite variant i present. On all other patches there will be twinning and the macroscopic interfaces require branching constructions unless the interface and the twinning normal coincide, which can only happen if both strains lie on the same edge of \(\mathscr {K}\). In particular, on \(\{x\cdot \nu _{i+1} \in A,\, x\cdot \nu _{i-1} \in B\}\) there has to be branching towards all interfaces, i.e., the structure has to branch in two linearly independent directions.

Lastly, we remark on the case of three crossing second-order laminates.

Definition 6

In the setting of Definition 2, a configuration is called a planar triple intersection on \({B_{r}\left( 0\right) }\) for \(r>0\) if it is planar and the following holds: For \(i=1,2,3\) let \(\nu _i \in N_i\) and let \(\tilde{\nu }_i \in \{\nu _i,-\nu _i\}\) be oriented such that we have \(\tilde{\nu }_1 + \tilde{\nu }_2 + \tilde{\nu }_3 = 0\), see Remark 1. For all \(i=1,2,3\), there exist sets \(J_i \subset \mathbb {R}\) and \(x_0 \in {B_{r}\left( 0\right) }\) such that we have either

$$\begin{aligned} J_i \cap (-r,r) = (-r,x_0\cdot \tilde{\nu }_i]\text { for all } i=1,2,3 \end{aligned}$$

or

$$\begin{aligned} J_i \cap (-r,r) = [x_0 \cdot \tilde{\nu }_i,r) \text { for all } i=1,2,3. \end{aligned}$$

Furthermore, there exist \(a \in \mathbb {R}\) and \(b_1,b_2,b_3 \in \mathbb {R}\) with \(\sum _{i=1}^3 b_i =1\) such that for almost all \(x\in {B_{r}\left( 0\right) }\) we have

$$\begin{aligned} \theta _1(x)&= (a x\cdot \tilde{\nu }_2 +b_2)\chi _{{J_2}^{\mathsf {c}}}(x\cdot \tilde{\nu }_2) + (ax\cdot \tilde{\nu }_3 + b_3)\chi _{J_3}(x\cdot \tilde{\nu }_3),\\ \theta _2(x)&= (ax\cdot \tilde{\nu }_1+b_1)\chi _{J_1}(x\cdot \tilde{\nu }_1) + (ax\cdot \tilde{\nu }_3 + b_3) \chi _{{J_3}^{\mathsf {c}}}(x\cdot \tilde{\nu }_3),\\ \theta _3 (x)&= (ax\cdot \tilde{\nu }_1 + b_1)\chi _{{J_1}^{\mathsf {c}}}(x\cdot \tilde{\nu }_1) + (ax\cdot \tilde{\nu }_2 + b_2)\chi _{J_2}(x\cdot \tilde{\nu }_2). \end{aligned}$$
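In these formulas the affine parts cancel precisely because the oriented normals satisfy \(\tilde{\nu}_1 + \tilde{\nu}_2 + \tilde{\nu}_3 = 0\) and \(\sum_i b_i = 1\), so the fractions sum to one for every \(x\). The concrete normals in the following numerical check are one assumed admissible choice, each lying in \(\pm N_i\).

```python
import math, random

# Check that the planar triple intersection of Definition 6 has volume
# fractions summing to one.  The normals below are an assumed choice with
# nu~_1 + nu~_2 + nu~_3 = 0; a, b_i and x_0 are sample data.
random.seed(2)
SQ2 = math.sqrt(2.0)
nus = [tuple(c / SQ2 for c in v) for v in [(0, 1, -1), (-1, 0, 1), (1, -1, 0)]]
assert all(abs(sum(v[k] for v in nus)) < 1e-12 for k in range(3))

a, bs = 0.3, (0.2, 0.5, 0.3)
x0 = (0.05, -0.1, 0.02)
dot = lambda u, v: sum(p * q for p, q in zip(u, v))
chi = [lambda t, i=i: 1.0 if t <= dot(nus[i], x0) else 0.0 for i in range(3)]

def theta(x):
    s = [dot(nus[i], x) for i in range(3)]
    L = [a * s[i] + bs[i] for i in range(3)]   # affine profiles a*(x.nu~_i) + b_i
    c = [chi[i](s[i]) for i in range(3)]       # indicators of the J_i
    th1 = L[1] * (1 - c[1]) + L[2] * c[2]
    th2 = L[0] * c[0] + L[2] * (1 - c[2])
    th3 = L[0] * (1 - c[0]) + L[1] * c[1]
    return th1, th2, th3

for _ in range(1000):
    x = tuple(random.uniform(-0.2, 0.2) for _ in range(3))
    # every L[i] appears once with chi and once with 1 - chi, so the sum is
    # L[0] + L[1] + L[2] = a * x . (nu~_1 + nu~_2 + nu~_3) + sum_i b_i = 1
    assert abs(sum(theta(x)) - 1.0) < 1e-12
```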

A sketch of a planar triple intersection can be found in Fig. 9. Note that due to the requirement \(x_0 \in {B_{r}\left( 0\right) }\), i.e., that the axis of intersection of the discontinuities intersects \({B_{r}\left( 0\right) }\), the restriction of a planar triple intersection to a smaller ball does not necessarily yield a triple intersection again, which is why Theorem 1 is formulated with two different universal radii.

Fig. 9
figure 9

a Sketch of a planar triple intersection with \(a=0\) with average strains \(M_1\),..., \(M_6\). Again the cross-section is chosen such that the strains are independent of the direction perpendicular to the cross-section. The hatching indicates a possible choice for the microscopic twins, but does not encode the necessary branching. b Relation of the strains in strain space

There are a number of possible choices of microscopic twins for constructing triple intersections. We will only describe the simplest one here, which is depicted in Fig. 9a. Going around the central axis, the macroscopic interfaces alternate between being a result of Mechanism 1 from Section 2.1, namely varying the relative thickness of layers in a twin, and of Mechanism 3, i.e., branching. Similarly to the case of second-order laminates, the affine changes require a combination of Mechanisms 1 and 2 on the individual patches in Fig. 9a.

3 Outline of the Proof

We will give the ideas behind each individual part of the proof of our main theorem in its own subsection. The contents of each are organized by increasing detail, so that the reader may skip to the next subsection once they are satisfied with the explanations given.

3.1 The Differential Inclusion

We first mention that the inclusion \(e(u) \in \mathscr {K}\) holds.

Lemma 2

For \(n \in \mathbb {N}\) let \(\eta _n >0\) be such that \(\lim _{n \rightarrow \infty } \eta _n = 0\). Consider sequences of displacements and partitions \(u_{\eta _n}\in W^{1,2}({B_{1}\left( 0\right) }; \mathbb {R}^3)\) and \(\chi _{\eta _n} \in L^\infty ({B_{1}\left( 0\right) }; \{0,1\}^3)\) with \(\sum _{i=1}^3 \chi _{\eta _n,i} =1\) almost everywhere such that \(\limsup _{n\rightarrow \infty } E_{\eta _n}(u_{\eta _n},\chi _{\eta _n}) < \infty \) and such that there exist \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) and \(\theta \in L^\infty ({B_{1}\left( 0\right) };[0,1]^3)\) with

$$\begin{aligned} u_{\eta _n} \rightharpoonup u \text { in } W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\text {, } \chi _{\eta _n} {\mathop {\rightharpoonup }\limits ^{*}} \theta \text { in } L^\infty ({B_{1}\left( 0\right) };\mathbb {R}^3) \end{aligned}$$

in the limit \(n\rightarrow \infty \). Then almost everywhere on \( {B_{1}\left( 0\right) }\) we have

$$\begin{aligned} e(u) \equiv \sum _{i=1}^3 \theta _i e_i\text {, } \theta \in \widetilde{\mathscr {K}} \text { and } e(u) \in \mathscr {K}. \end{aligned}$$

The statement \(e(u) \equiv \sum _{i=1}^3 \theta _i e_i\) is an immediate consequence of the elastic energy vanishing in the limit, while the proof of the non-convex inclusion relies on the rescaling properties of the energy and the Capella–Otto rigidity result [11]. For \(x\in \mathbb {R}^3\) and \(r>0\) we will set

$$\begin{aligned} r \hat{x} = x\text {, }\hat{u} ( \hat{x}) = \frac{1}{r} u (x)\text {, } \hat{\chi } (\hat{x}) = \chi (x)\text {, } r \hat{\eta } = \eta , \end{aligned}$$

where \(\eta \) needs to be re-scaled as well due to it playing the role of a length scale, to obtain

$$\begin{aligned} E_{\hat{\eta }}(\hat{u}, \hat{\chi }) = r^{-3 + \frac{2}{3}} E_\eta (u,\chi ). \end{aligned}$$

The right-hand side thus decays faster than the plain average \(r^{-3} E_\eta (u,\chi )\), which allows us to locally apply the result by Capella and Otto to get the statement.

3.2 Decomposing the Strain

Next, we link the convex differential inclusion

$$\begin{aligned} e(u) \in S = \{e \in \mathbb {R}^{3\times 3}: e \text { diagonal, } {\text {tr}}e = 0\}, \end{aligned}$$

see also definition (6), to a decomposition of the strain into simpler objects, namely functions of only one variable and affine functions. Already Dolzmann and Müller [19] used the interplay of this decomposition with the non-convex inclusion \(e(u) \in \{e_1,e_2,e_3\}\) to get their rigidity result. In our case, it will however be more convenient to directly state the decomposition in terms of the barycentric coordinates \(\theta \) with respect to \(e_1\), \(e_2\) and \(e_3\). For all cyclical indices \(i=1,2,3\) the relation, valid almost everywhere in \({B_{1}\left( 0\right) }\),

$$\begin{aligned} e(u)_{ii} = \sum _{j=1}^3 \theta _j (e_j)_{ii} = -2\theta _i + \theta _{i+1} + \theta _{i-1} = 1-3\theta _i \end{aligned}$$

ensures that both viewpoints are equivalent.
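The equivalence can be verified directly; the following sketch assumes the normalization \((e_j)_{ii} = 1 - 3\delta_{ij}\), i.e., trace-free diagonal matrices with entries \(-2\) and \(1\), which is consistent with the displayed chain of identities.

```python
import random

# With (e_j)_{ii} = 1 - 3*delta_{ij} (assumed normalization) and
# sum_j theta_j = 1, the diagonal strain entries satisfy
# e(u)_{ii} = -2 theta_i + theta_{i+1} + theta_{i-1} = 1 - 3 theta_i.
random.seed(3)

def strain_diag(th):
    """Diagonal of sum_j theta_j e_j for the assumed matrices e_j."""
    return [sum(th[j] * (1.0 - 3.0 * (i == j)) for j in range(3)) for i in range(3)]

for _ in range(100):
    th = [random.random() for _ in range(3)]
    s = sum(th)
    th = [t / s for t in th]                  # barycentric: sum_j theta_j = 1
    d = strain_diag(th)
    for i in range(3):
        assert abs(d[i] - (1.0 - 3.0 * th[i])) < 1e-12
        assert abs(d[i] - (-2 * th[i] + th[(i + 1) % 3] + th[(i + 2) % 3])) < 1e-12
```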

Lemma 3

There exists a universal \(r \in (0,1)\) with the following property: Let the displacement \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) and the function \(\theta \in L^\infty ({B_{1}\left( 0\right) };\mathbb {R}^3)\) almost everywhere on \({B_{1}\left( 0\right) }\) satisfy

$$\begin{aligned} \sum _{i=1}^3 \theta _i \equiv 1 \;\mathrm{and}\; e(u) \equiv \sum _{i=1}^3 \theta _i e_i. \end{aligned}$$

Then for all \(\nu \in N\) and all cyclical indices \(i=1,2,3\) there exist \(f_\nu \in L^\infty (-r,r)\) and affine functions \(g_i : \mathbb {R}^3 \rightarrow \mathbb {R}\) such that for almost all \(x\in {B_{r}\left( 0\right) }\) we have

$$\begin{aligned} \theta _{i}(x) = \sum _{\nu \in N_{i+1}} f_\nu (x\cdot \nu ) - \sum _{\nu \in N_{i-1}} f_\nu (x\cdot \nu ) + g_i (x). \end{aligned}$$
(15)

The only (marginally) new aspect of Lemma 3 compared to the previously known versions [19, Lemma 3.2] and [11, Proposition 3.9] is the statement \(f_\nu \in L^\infty \) for all \(\nu \in N\). We will thus only highlight the required changes to the proof of Capella and Otto [11, Proposition 3.9]. Essentially, the strategy here is to integrate the Saint–Venant compatibility conditions for linearized strains, which in our situation take the form of six two-dimensional wave equations, see Lemma 5. Thus it is not surprising that the decomposition is in fact equivalent to

$$\begin{aligned} \begin{pmatrix} e(u)_{11} &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad e(u)_{22} &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad e(u)_{33} \end{pmatrix} \end{aligned}$$

being a symmetric gradient.

A central part of the proof of Lemma 3 is uniqueness up to affine functions of the decomposition [11, Lemma 3.8]. We can apply this result to characterize two-variant configurations as the only ones with \(\theta _i \equiv 0\) for some \(i =1,2,3\), i.e., as the only ones that indeed only combine two variants.

Corollary 1

Almost everywhere on \({B_{1}\left( 0\right) }\), let \(e(u) \in \mathscr {K}\) with \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) be such that the barycentric coordinates \(\theta \in L^\infty ({B_{1}\left( 0\right) };[0,1]^3)\) satisfy \(\theta \in \widetilde{\mathscr {K}}\). Furthermore, for all \(\nu \in N\) and all cyclical indices \(i=1,2,3\) let there exist \(f_\nu \in L^\infty (-1,1)\) and affine functions \(g_i:\mathbb {R}^3 \rightarrow \mathbb {R}\) such that for almost all \(x \in {B_{1}\left( 0\right) }\) we have

$$\begin{aligned} \theta _{i}(x) = \sum _{\nu \in N_{i+1}} f_\nu (x\cdot \nu ) - \sum _{\nu \in N_{i-1}} f_\nu (x\cdot \nu ) + g_i (x). \end{aligned}$$
(16)

If for some \(i \in \{1,2,3\}\) we have \(\theta _i \equiv 0\) a.e. on \({B_{1}\left( 0\right) }\), then the solution of the differential inclusion is a two-variant configuration on \({B_{1}\left( 0\right) }\) according to Definition 2.

Another very useful consequence of the decomposition (15) is that such functions have traces on hyperplanes as long as none of the individual one-dimensional functions is necessarily constant on them. See Fig. 10 for the geometry in a typical application.

Fig. 10
figure 10

Sketch indicating that \(\theta _2\) has traces on hyperplanes with normal \(\nu _2^+\) since its decomposition only involves continuous functions and the normals \(\nu _i^\pm \) for \(i=1,3\). As usual we do not keep track of the lengths of the drawn vectors

Lemma 4

Let \(m,n,k,P \in \mathbb {N}{\setminus }\{0\}\) with \(n\ge 2\) and \(k<n\). Let \(\mathscr {C}\subset \mathbb {R}^m\) be a closed, convex set. For \(i=1,\ldots , P\) let \( \nu _i\in {\mathbb {S}}^{n-1}\) and \(f_i \in L^1_{\mathrm{loc}}( \mathbb {R}; \mathbb {R}^m) \) be such that \(F: \mathbb {R}^n \rightarrow \mathscr {C}\) for almost all \(x\in \mathbb {R}^n\) satisfies the decomposition

$$\begin{aligned} F(x) \equiv \sum _{i=1}^P f_i(x\cdot \nu _i). \end{aligned}$$
(17)

Furthermore, let \(V\subset \mathbb {R}^n\) be a k-dimensional subspace such that \(\nu _i \notin V^\perp \) for all indices \(i=1,\ldots , P\).

Then the decomposition (17) defines a locally integrable trace \(F|_V: V \rightarrow \mathscr {C}\), and for all \(\delta >0\) and \(\mathscr {H}^k\)-almost all \(x \in V\) we have

$$\begin{aligned} F_\delta (x)\, {:}{=}\, \fint _{B_{\delta }\left( x\right) }F(y) \, \mathrm {d}\mathscr {L}^n(y) \rightarrow F(x) \end{aligned}$$

in the limit \(\delta \rightarrow 0\).

Finally, we give the wave equations constituting the Saint–Venant compatibility conditions.

Lemma 5

For \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) and \(\theta \in L^\infty ({B_{1}\left( 0\right) };\mathbb {R}^3)\) such that for almost all \(x\in {B_{1}\left( 0\right) }\) we have \(\sum _{i=1}^3 \theta _i(x) = 1\) and \( e(u)(x) = \sum _{i=1}^3 \theta _i(x) e_i\), the barycentric coordinates \(\theta \) distributionally satisfy the following wave equations:

$$\begin{aligned} \partial _{[111]}\partial _{[\overline{1}11]} \theta _1 = 0,&\quad \partial _{[1\overline{1}1]}\partial _{[ 11\overline{1}]} \theta _1 = 0,\nonumber \\ \partial _{[1\overline{1}1]}\partial _{[111]} \theta _2 = 0,&\quad \partial _{[\overline{1}11]}\partial _{[ 11\overline{1}]} \theta _2 = 0,\nonumber \\ \partial _{[111]}\partial _{[ 11\overline{1}]} \theta _3 = 0,&\quad \partial _{[1\overline{1}1]}\partial _{[\overline{1}11]} \theta _3 = 0. \end{aligned}$$
(18)
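These wave equations can be checked against the decomposition (15) by finite differences: every normal in \(N_{i+1} \cup N_{i-1}\) is orthogonal to one space diagonal in each derivative pair, so each mixed derivative annihilates every summand. The normals below are the assumed choice \((E_j \pm E_k)/\sqrt{2}\), and the profiles \(f_\nu\) are arbitrary smooth sample functions.

```python
import math

# Finite-difference check that a decomposition of theta_1 as in (15)
# satisfies the two wave equations for theta_1 in (18).  N_2, N_3 are
# assumed normals; the smooth profiles f_nu are arbitrary.
SQ2 = math.sqrt(2.0)
N2 = [(1 / SQ2, 0, 1 / SQ2), (1 / SQ2, 0, -1 / SQ2)]
N3 = [(1 / SQ2, 1 / SQ2, 0), (1 / SQ2, -1 / SQ2, 0)]
profiles = [math.sin, math.cos, math.tanh, math.atan]
dot = lambda u, v: sum(p * q for p, q in zip(u, v))

def theta1(x):
    vals = [f(dot(nu, x)) for f, nu in zip(profiles, N2 + N3)]
    return vals[0] + vals[1] - vals[2] - vals[3] + 0.1 * x[0] - 0.2  # + affine g_1

def mixed(fn, x, a, b, h=1e-4):
    """Central-difference approximation of the mixed derivative d_a d_b fn."""
    p = lambda sa, sb: fn(tuple(x[k] + sa * h * a[k] + sb * h * b[k]
                                for k in range(3)))
    return (p(1, 1) - p(1, -1) - p(-1, 1) + p(-1, -1)) / (4 * h * h)

x = (0.1, -0.2, 0.3)
d111, dm111 = (1, 1, 1), (-1, 1, 1)
d1m11, d11m1 = (1, -1, 1), (1, 1, -1)
assert abs(mixed(theta1, x, d111, dm111)) < 1e-6   # first equation for theta_1
assert abs(mixed(theta1, x, d1m11, d11m1)) < 1e-6  # second equation for theta_1
```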

3.3 Planarity in the Case of Non-trivial Blow-Ups

While the statements in the previous subsections either rely on rather soft arguments or were previously known, we now come to the main ideas of the paper. As \(\smash {\widetilde{\mathscr {K}}}\), see definition (12), is a connected set, there are no restrictions on varying single points continuously in \(\smash {\widetilde{\mathscr {K}}}\). However, the crucial insight is that two different points \(\theta ^{(1)}, \theta ^{(2)} \in \smash {\widetilde{\mathscr {K}}}\) with \(\smash {\theta ^{(1)}_1 = \theta ^{(2)} _1} >0\) are much more constrained.

To illustrate this rigidity, we first for simplicity assume that there exist functions \(f_1, f_2, f_3 \in L^\infty (-1,1)\) such that for almost all \(x \in {B_{1}\left( 0\right) }\) we have the decomposition

$$\begin{aligned} \theta _1(x)&= f_2(x_2) - f_3(x_3) + 1,\\ \theta _2(x)&= - f_1(x_1) + f_3(x_3),\\ \theta _3(x)&= f_1(x_1) - f_2(x_2). \end{aligned}$$

Furthermore, suppose that \(f_1\) is a BV-function with a jump discontinuity of size \(\delta f_1\) at \(x_1=0\) and that the other functions are continuous. Thus the blow-up of \(\theta \) at some point \((0,x')\in {B_{1}\left( 0\right) }\) takes two values \(\smash {\theta ^{(1)},\theta ^{(2)}} \in \mathscr {\widetilde{K}}\), which agree in their first component. Specifically, we have \(\smash {\theta ^{(1)}_1 = \theta ^{(2)}_1} =\theta _1(0,x')\). A look at Fig. 11 suggests that \(\theta _1(0,x')\) can take at most two values, which furthermore are independent of \(x'\). As \(\theta _1\) is a sum of two one-dimensional functions, some well-known, straightforward combinatorics imply that one of the two functions must be constant. Consequently, \(\theta \) only depends on two directions.

Fig. 11
figure 11

Illustration of the argument for two-valuedness of \(\theta _1\) near \(x_1= 0\). The length of the dotted line has to be \(3\sqrt{2}\,\, \delta f_1\), where \(\delta f_1 >0\) is the size of the jump of \(f_1\) at zero. Consequently, the function \(\theta _1\) can only take the two values 0 or \(1- \delta f_1\)

This can be adapted to our more complex decomposition (15), even without any a priori regularity of the one-dimensional functions. To this end, we require a topology for the blow-ups which respects the non-convex inclusion \(e(u) \in \mathscr {K}\), and a quantification of discontinuity for \(f_\nu \) which ensures that its blow-up is non-constant.

In order to keep the non-convexity, for \(x\in \mathbb {R}^3\) and \(\varepsilon > 0\) we consider the push-forward measures

$$\begin{aligned} f \mapsto \fint _{{B_{1}\left( 0\right) }} f(\theta (x+\varepsilon y)) \, \mathrm {d}y \text { for } f\in C_0(\mathbb {R}^3) \end{aligned}$$

in the limit \(\varepsilon \rightarrow 0\). This approach is very similar in spirit to using Young-measures, but without a further localization in the variable y. Positing that \(f_\nu \) does not have a constant blow-up along some sequence then means that \(f_\nu \) does not converge strongly to a constant on average, i.e., it does not converge to its average on average. If one allows the midpoints x of the blow-ups to depend on \(\varepsilon \), we see that this is equivalent to \(f_\nu \notin VMO\) according to Definition 1 given above.

The resulting statement is the following:

Proposition 1

There exist universal radii \(r\in (0,\frac{1}{64})\) and \(\tilde{r} \in (0,r)\) with the following property: Almost everywhere on \({B_{1}\left( 0\right) }\), let \(e(u) \in \mathscr {K}\) with \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) be such that the barycentric coordinates \(\theta \in L^\infty ({B_{1}\left( 0\right) };[0,1]^3)\) satisfy \(\theta \in \widetilde{\mathscr {K}}\). Furthermore, for all \(\nu \in N\) and all cyclical indices \(i=1,2,3\) let there exist \(f_\nu \in L^\infty (-1,1)\) and affine functions \(g_i:\mathbb {R}^3 \rightarrow \mathbb {R}\) such that for almost all \(x \in {B_{1}\left( 0\right) }\) we have

$$\begin{aligned} \theta _{i}(x) = \sum _{\nu \in N_{i+1}} f_\nu (x\cdot \nu ) - \sum _{\nu \in N_{i-1}} f_\nu (x\cdot \nu ) + g_i (x). \end{aligned}$$
(19)

Additionally, let there exist \(i\in \{1,2,3\}\) and \(\nu \in N_i\) such that \(f_{\nu } \notin VMO(-\tilde{r}, \tilde{r})\). Then there exists \(d \in \mathscr {D}\), see (10), with \(d\cdot \nu =0\) such that the configuration is planar with respect to d on \({B_{r}\left( 0\right) }\), or we have \(\theta _i \equiv 0\) on \({B_{\tilde{r}}\left( 0\right) }\), i.e., a two-variant configuration.

Furthermore, in the first case there exist \(\alpha \in [-\tilde{r}, \tilde{r}]\), \(0<b<1\) and a Borel-measurable set \(B \subset H(\alpha ,\nu )\cap B_{1/8}(0)\) with \(\mathscr {H}^2(B \cap B_{r}(0))>0\) such that the trace \(\theta _i|_{H(\alpha ,\nu )}\) defined by Lemma 4 satisfies \(\theta _i|_{ H(\alpha ,\nu )} = b \chi _B \) on \(H(\alpha ,\nu )\cap B_{1/8}(0)\), \(\mathscr {H}^2\)-almost everywhere. (Here, Lemma 4 applies due to \(\tilde{\nu } \cdot \nu \ne 0\) for all \(\tilde{\nu } \in N_{i+1}\cup N_{i-1}\), which are precisely the normals of the decomposition (15) of \(\theta _i\), see Fig. 10.)

There is another slightly more subtle issue in the proof of Proposition 1: As already explained, our argument works by looking at a single plane at which we blow-up. Consequently, we can only distinguish the two cases \(\theta _i \equiv 0\) and \(\theta _i \not \equiv 0\) on said hyperplane. Therefore we need a way of transporting the information \(\theta _i \equiv 0\) from the hyperplane to an open ball. Given our combinatorics this turns out to be the 3D analog of the question: “If \(F(x,y) = f(x)+g(y)\) is constant on the diagonal, is it constant on a non-empty open set?” Looking at the function \(F(x,y) = x - y\) one might think that the argument is doomed since F vanishes on the diagonal but clearly does not do us the favor of vanishing on a non-empty open set.

However, the fact that 0 is an extremal value for \(\theta _1\) saves us: If F is constant on the diagonal of a square and achieves its minimum there, then it has to be constant on the entire square, see also Fig. 12a. For later use we already state this fact in its perturbed form.

Lemma 6

Let \(f, g \in L^\infty (0,1)\) and \(c\in \mathbb {R}\) be such that \(f(x_1) + g(x_2) \ge c\) for almost all \(x \in (0,1)^2\). Let \(\varepsilon \ge 0\) and let one of the following two statements be true:

  1. 1.

    For almost all \(x\in (0,1)^2\) the sum satisfies \(f(x_1) + g(x_2) \le c +\varepsilon \).

  2. 2.

    For almost all \(t\in (0,1)\) the sum satisfies \(f(t) + g(t) \le c + \varepsilon \).

With the essential infima as defined in Section 2.2, it then holds that

  1. 3.

    For almost every \(t\in (0,1)\) we have \(f(t) \le \hbox {ess inf}_{(0,1)} f + \varepsilon \), \(g(t) \le \hbox {ess inf}_{(0,1)} g + \varepsilon \) and \(c \le \hbox {ess inf}_{(0,1)} f + \hbox {ess inf}_{(0,1)} g \le c + \varepsilon \) .

If \(\varepsilon = 0\), then all three statements are equivalent.
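The \(\varepsilon = 0\) case of Lemma 6 can be verified exhaustively in a discrete toy model: if \(f(x_1) + g(x_2) \ge c\) everywhere and \(f(t) + g(t) = c\) on the diagonal, then \(f\) and \(g\) are forced to be constant, so \(F \equiv c\) on the whole square.

```python
import itertools

# Exhaustive search over all pairs (f, g) with values in a small grid:
# whenever the hypotheses of the epsilon = 0 case hold, both functions
# turn out constant.  (Analytically: f(x) + g(y) >= c = f(y) + g(y)
# gives f(x) >= f(y) for all x, y.)
VALS = (0.0, 0.5, 1.0)
Npts = 4
found = 0
for f in itertools.product(VALS, repeat=Npts):
    for g in itertools.product(VALS, repeat=Npts):
        c = f[0] + g[0]
        if any(f[t] + g[t] != c for t in range(Npts)):
            continue    # F not constant on the diagonal
        if any(f[x] + g[y] < c for x in range(Npts) for y in range(Npts)):
            continue    # minimum not attained on the diagonal
        found += 1
        assert len(set(f)) == 1 and len(set(g)) == 1   # forced to be constant

assert found > 0   # the hypothesis class is non-empty (constant pairs)
```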

This statement can be lifted to three-dimensional domains. Its analogue states that in order to deduce that \(\theta _i\) for some \(i=1,2,3\) is constant and extremal, it is enough to know that the extremal value is attained on a line parametrized by \(l(t)\, {:}{=}\, x_0 + \sqrt{2}t E_i\) for \(t\in I\) and some interval \(I \subset \mathbb {R}\). Here, \(E_i\) is the i-th standard basis vector of \(\mathbb {R}^3\) and the restriction of \(\theta _i\) to the image of l is defined by Lemma 4. It will later be important that we have a precise description of the maximal set to which the information \(\theta _i = 0\) can be transported, which turns out to be the polyhedron

$$\begin{aligned} P \, {:}{=}\, \bigcap _{\nu \in N_{i+1} \cup N_{i-1}} \{x \in \mathbb {R}^3: \nu \cdot x \in \nu \cdot l(I)\}; \end{aligned}$$

see Fig. 12b. The general strategy of the proof is described in Fig. 13.

Fig. 12
figure 12

a For \(f,g \in L^\infty (0,1)\), the information \(f(x_1) +g(x_2) = c\) along the dashed diagonal can be transported to the whole gray square provided \(f(x_1) +g(x_2) \ge c\). b Sketch of the polyhedron P with normals \(\nu _i^\pm \) for \(i=2,3\), which in the setting of Lemma 7 is the maximal set to which we can propagate the information \(\theta _1 \equiv 0\) or \(\theta _1\equiv 1\) on the dashed line l(I)

The statement that the one-dimensional functions are almost constant also has a three-dimensional generalization: there, the one-dimensional functions are close to being affine on P in the sense that the inequality (23) holds. (Lemma 13, see Appendix A, ensures that then there exists an affine function which is uniformly close.) As we only need this part of the statement in approximation arguments, we may additionally assume that the one-dimensional functions are continuous to avoid technicalities.

Fig. 13
figure 13

a In the setting of Lemma 7 we first transport the information \(\{\theta _1 \approx 0 \}\) from the dashed line l(I) to the gray plane \(H(0,\frac{1}{\sqrt{2}}(011))\cap P\) using the two-dimensional result. b In a second step, we use \(\{\theta _1 \approx 0 \}\) along another dashed line \(\tilde{l} (\tilde{I})\) parallel to \(E_1\) to propagate the information to \(H(\alpha ,\frac{1}{\sqrt{2}}(01\overline{1}))\cap P\) for all \(\alpha \in \mathbb {R}\)

The resulting statement is the following:

Lemma 7

There exists a universal radius \(r\in (0,1)\) with the following property: Let \(\theta \in L^\infty ({B_{1}\left( 0\right) };[0,1]^3)\) be such that for all \(\nu \in N\) and all cyclical indices \(i=1,2,3\) there exist \(f_\nu \in L^\infty (-1,1)\) and affine functions \(g_i:\mathbb {R}^3 \rightarrow \mathbb {R}\) such that for almost all \(x \in {B_{1}\left( 0\right) }\) we have

$$\begin{aligned} \theta _{i}(x) = \sum _{\nu \in N_{i+1}} f_\nu (x\cdot \nu ) - \sum _{\nu \in N_{i-1}} f_\nu (x\cdot \nu ) + g_i (x). \end{aligned}$$
(20)

Let \(i \in \{1,2,3\}\). Let \(I \subset \mathbb {R}\) be a closed interval and let \(x_0 \in \mathbb {R}^3\) such that \(x_0 + \sqrt{2} I E_i \subset {B_{r}\left( 0\right) }\). For \(t \in I\) we define \(l(t) \, {:}{=}\, x_0 + \sqrt{2} tE_i\) and the polyhedron P to be

$$\begin{aligned} P \,{:}{=}\, \bigcap _{\nu \in N_{i+1} \cup N_{i-1}} \{x \in \mathbb {R}^3: \nu \cdot x \in \nu \cdot l(I)\}, \end{aligned}$$

see also Fig. 12b. For \(\varepsilon >0\) assume that either

$$\begin{aligned} \theta _i\circ l(t) \le \varepsilon \text { for almost all } t \in I \text { or } 1 -\theta _i\circ l(t) \le \varepsilon \text { for almost all } t \in I. \end{aligned}$$
(21)

Then it holds that \(P \subset {B_{1}\left( 0\right) }\), and in the respective cases we have

$$\begin{aligned} 0 \le \theta _i(x) \le 9\varepsilon \text { for almost all }x \in P \text { or } 1-9\varepsilon \le \theta _i(x) \le 1 \text { for almost all }x \in P. \end{aligned}$$
(22)

Furthermore, if additionally the one-dimensional functions \(f_\nu \) are continuous for all \(\nu \in N_{i+1} \cup N_{i-1}\), then they are all almost affine in the sense that for all \((s,h,\tilde{h}) \in \mathbb {R}^3\) with \(s, s+ h, s+ \tilde{h}, s+h + \tilde{h} \in \nu \cdot l(I)\) for all \(\nu \in N_{i+1} \cup N_{i-1}\) we have

$$\begin{aligned} \left| f_\nu (s + h + \tilde{h}) + f_\nu (s) - f_\nu (s + h) - f_\nu (s + \tilde{h})\right| \le 36 \varepsilon . \end{aligned}$$
(23)
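As an aside, inequality (23) indeed quantifies the deviation from being affine: the four-point expression vanishes identically for affine functions, while already for the model function \(t \mapsto t^2\) it is non-zero. A minimal symbolic check, illustrative only and not part of the argument, using Python with sympy:

```python
import sympy as sp

s, h, ht, a, b = sp.symbols('s h htilde a b')

def second_difference(f):
    # the four-point expression controlled by inequality (23)
    return f(s + h + ht) + f(s) - f(s + h) - f(s + ht)

# for an affine function the expression vanishes identically ...
affine = lambda u: a * u + b
assert sp.expand(second_difference(affine)) == 0

# ... while for the model non-affine function t -> t^2 it equals 2*h*htilde,
# so (23) genuinely measures the deviation from being affine
quadratic = lambda u: u**2
assert sp.expand(second_difference(quadratic)) == 2 * h * ht
```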

3.4 The Case \(f_\nu \in VMO\) for all \(\nu \in N\)

Having simplified the case where one of the one-dimensional functions is not of vanishing mean oscillation, we now turn to the case where all of them lie in VMO. The statement we will need to prove here is the following:

Proposition 2

There exists a universal constant \(r\in (0,1)\) such that the following holds: Almost everywhere on \({B_{1}\left( 0\right) }\), let \(e(u) \in \mathscr {K}\) with \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) be such that the barycentric coordinates \(\theta \in L^\infty ({B_{1}\left( 0\right) };[0,1]^3)\) satisfy \(\theta \in \widetilde{\mathscr {K}}\). Furthermore, for all \(\nu \in N\) and all cyclical indices \(i=1,2,3\) let there exist \(f_\nu \in VMO(-1,1)\) and affine functions \(g_i:\mathbb {R}^3 \rightarrow \mathbb {R}\) such that for almost all \(x \in {B_{1}\left( 0\right) }\) we have

$$\begin{aligned} \theta _{i}(x) = \sum _{\nu \in N_{i+1}} f_\nu (x\cdot \nu ) - \sum _{\nu \in N_{i-1}} f_\nu (x\cdot \nu ) + g_i (x). \end{aligned}$$
(24)

Then e(u) is a two-variant configuration on \({B_{r}\left( 0\right) }\) in the sense of Definition 2.

Fig. 14
figure 14

Sketch of how e(u)(x) lies in \(\mathscr {K}\). At the boundary of \(\theta _1^{-1}(0)\) the strain needs to take the two values \(e_2\) and \(e_3\)

Remark 2

To fix ideas, let us first illustrate the argument in the case of continuous functions in the whole space: By the intermediate value theorem the case \(e(u) \in \{e_1,e_2,e_3\}\) is trivial. Therefore, we may suppose that there is a point \(x \in \mathbb {R}^3\) such that e(u)(x) lies strictly between two pure martensite strains, e.g., we have \(\theta _1(x) = 0\) and \(0< \theta _2(x), \theta _3(x) <1\), see Fig. 14. By continuity of \(\theta _2 \) and \(\theta _3\), the set \(\{\theta _1 = 0\}\) has non-empty interior, and, by the decomposition (15), any connected component of it should be a polyhedron P whose faces have normals lying in \(N_2\cup N_3\), see Fig. 15a. Additionally, continuity implies that

$$\begin{aligned} e(u) \equiv e_2 \text { or } e(u) \equiv e_3 \text { on each face}. \end{aligned}$$

Unfortunately, once we drop continuity, on a face with normal in \(N_i\) for \(i=2,3\) only \(\theta _i\) will later be a well-defined function, due to Lemmas 3 and 4. Therefore, on such a face we can only use the above information in the form

$$\begin{aligned} \theta _i \equiv 0 \text { or } \theta _i \equiv 1. \end{aligned}$$
Fig. 15
figure 15

In the setting of Remark 2: a Sketch of a connected component P of \(\theta _1^{-1}(0)\) with normals \(\nu _2^+\), \(\nu _3^-\) and \(\nu _3^+\). On the gray face we get the information \(\theta _2 \equiv 0\) or \(\theta _2 \equiv 1\). In particular, we get it along the line l, which is parallel to \(E_2\). b Sketch of the polyhedron Q that transports the information \(\theta _2 \equiv 0\) or \(\theta _2 \equiv 1\) along l to the inside of P

Using Lemma 7 we get a polyhedron Q that transports this information back inside P, see Fig. 15b. The goal is then to show that we can reach x in order to get a contradiction to e(u)(x) lying strictly between \(e_2\) and \(e_3\), which we will achieve by using the face of P closest to x.

In order to turn this string of arguments into a proof in the case \(f_\nu \in VMO(-1,1)\) for all \(\nu \in N\), the key insight is that non-convex inclusions and approximation by convolutions interact very nicely for VMO-functions. This elementary, if perhaps somewhat surprising, fact has previously been used in the degree theory for VMO-functions, see Brezis and Nirenberg [9, Inequality (7)], who attribute it to L. Boutet de Monvel and O. Gabber. For the convenience of the reader, we present a proof in Section 4.

Lemma 8

(L. Boutet de Monvel and O. Gabber) For \(n,d \in \mathbb {N}\) let \(U \subset \mathbb {R}^n\) be open and let \(K \subset \mathbb {R}^d\) be compact. Let \(f \in VMO(U)\) with \(f\in K\) almost everywhere. For \(\delta >0\) and \(x\in U\), let \(f_\delta (x)\, {:}{=}\, \fint _{{B_{\delta }\left( x\right) }} f(y) \, \mathrm {d}y\), where we extend f by 0 outside of U. Then \(f_\delta \) is continuous and we have that \({\text {dist}}(f_\delta , K) \rightarrow 0\) locally uniformly in U as \(\delta \rightarrow 0\).

Unfortunately, formalizing the set \(\{\theta _{1,\delta } \approx 0\}\) in such a way that connected components are polyhedra is a bit tricky. We do get that they contain polyhedra on which the one-dimensional functions are close to affine ones, see Lemmas 7 and 13. (The latter can be found in Appendix A.) However, we do not immediately get the other inclusion: As the directions in the decomposition are linearly dependent, one of the one-dimensional functions deviating too much from its affine replacement does not translate into \(\theta _1\) deviating too much from zero.

We side-step this issue by first working on hyperplanes \(H(\alpha ,\nu _1^+)\) for some \(\alpha \in (-r,r)\), where \(r \in (0,1)\). In that case, the decomposition of \(\theta _1\) simplifies to two one-dimensional functions, and thus we do get that connected components of \(\{\theta _{1,\delta } \approx 0\} \cap H(\alpha ,\nu _1^+)\) are parallelograms. The goal is then to prove that at least some of them do not shrink away in the limit \(\delta \rightarrow 0\). Making use of Lemma 7 we can go back to a full-dimensional ball and get that the set \(\{\theta _1 =0\}\) has non-empty interior. This allows the argument for continuous functions to be generalized to VMO-functions.

3.5 Classification of Planar Configurations

It remains to exploit the two-dimensionality that was the result of Proposition 1. It allowed us to reduce the complexity of the decomposition (15) to three one-dimensional functions with linearly dependent normals and three affine functions. We first deal with the easier case where one of the one-dimensional functions is affine.

Lemma 9

There exists a universal number \(r\in (0,1)\) with the following property: Almost everywhere on \({B_{1}\left( 0\right) }\), let \(e(u) \in \mathscr {K}\) with \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) be such that the barycentric coordinates \(\theta \in L^\infty ({B_{1}\left( 0\right) };[0,1]^3)\) satisfy \(\theta \in \widetilde{\mathscr {K}}\). Let the configuration be planar on \({B_{1}\left( 0\right) }\) with respect to the direction \(d \in \mathscr {D}\), and let the functions of one variable involved in the decomposition (14) be given by \(f_{\nu _i} \in L^\infty (-1,1)\) for \(\nu _i \in N_i\) with \(\nu _i \cdot d= 0\) for all \(i=1,2,3\). Furthermore, assume that there exists \(j\in \{1,2,3\}\) with \(\nu _j \in N_j\) such that the function \(f_{\nu _j}\) is affine on \((-1,1)\).

Then, on \({B_{r}\left( 0\right) }\), the configuration is a two-variant configuration, a planar second-order laminate or a planar checkerboard.

While the preceding lemma is mostly an issue of efficient book-keeping to reap the rewards of previous work, we now have to make a last effort to prove the rather strong rigidity properties of planar triple intersections:

Proposition 3

There exists a universal radius \(r\in (0,\frac{1}{2})\) with the following property: Almost everywhere on \({B_{1}\left( 0\right) }\), let \(e(u) \in \mathscr {K}\) with \(u \in W^{1,2}({B_{1}\left( 0\right) };\mathbb {R}^3)\) be such that the barycentric coordinates \(\theta \in L^\infty ({B_{1}\left( 0\right) };[0,1]^3)\) satisfy \(\theta \in \widetilde{\mathscr {K}}\). Let the configuration be planar on \({B_{1}\left( 0\right) }\) with respect to the direction \(d \in \mathscr {D}\) and let the functions of one variable involved in the decomposition (14) be given by \(f_{\nu _i} \in L^\infty (-1,1)\) for \(\nu _i \in N_i\) with \(\nu _i \cdot d= 0\) for all \(i=1,2,3\). Furthermore, let all \(f_{\nu _i}\) for \(i=1,2,3\) be non-affine on \(\left( -r,r \right) \).

Then the configuration is a two-variant configuration on \({B_{r}\left( 0\right) }\) or a planar triple intersection on \({B_{2r}\left( 0\right) }\).

Letting \(\pi _i(x)\, {:}{=} \, \nu _i \cdot x\) for \(i=1,2,3\) and \(x\in \mathbb {R}^3\), the idea is to prove for all \(i=1,2,3\) that there exist \(J_i \subset (-\frac{1}{2},\frac{1}{2})\) such that we have

$$\begin{aligned} \theta _i^{-1}(0) \cap {B_{\frac{1}{2}}\left( 0\right) }= \pi ^{-1}_{i+1}(J_{i+1}) \cap \pi ^{-1}_{i-1}({J}^{\mathsf {c}}_{i-1}) \cap {B_{\frac{1}{2}}\left( 0\right) }, \end{aligned}$$

i.e., they are product sets in suitable coordinates. Note that the condition \(e(u) \in \mathscr {K}\) almost everywhere is then equivalent to \(\bigcup _{i=1}^3 \theta _i^{-1}(0) = {B_{1}\left( 0\right) }\). Writing this in terms of \(J_i\) with \(i=1,2,3\) allows us to apply Lemma 10 below to conclude that \(J_i\) is an interval for all \(i=1,2,3\). The actual representation of the strain is then straightforward to obtain.

Lemma 10

Let \(\nu _1, \nu _2, \nu _3 \in {\mathbb {S}}^1\) be linearly dependent by virtue of \(\nu _1 + \nu _2 + \nu _3 = 0\). For \(i=1,2,3\) and \(x \in \mathbb {R}^2\) let \(\pi _i(x) = x\cdot \nu _i\). Let \(J_1,J_2,J_3 \subset \left( -8,8\right) \) be measurable such that

  1. 1.

    we have

    $$\begin{aligned} \left| B^{\,2}_{8}(0) \cap \left( \pi _1^{-1}(J_1)\cap \pi _2^{-1}(J_2)\cap \pi _3^{-1}(J_3)\right) \right|&=0,\nonumber \\ \left| B^{\,2}_{8}(0) \cap \left( \pi _1^{-1}({J}^{\mathsf {c}}_1)\cap \pi _2^{-1}({J}^{\mathsf {c}}_2)\cap \pi _3^{-1}({J}^{\mathsf {c}}_3)\right) \right|&= 0, \end{aligned}$$
    (25)
  2. 2.

    and the two sets \(J_1\) and \(J_2\) neither have zero nor full measure in \(\left( -1,1 \right) \) in the sense that

    $$\begin{aligned} 0< \left| J_1\cap \left( -1,1 \right) \right|&<2,\nonumber \\ 0< \left| J_2\cap \left( -1,1\right) \right|&<2. \end{aligned}$$
    (26)

Then there exists a point \(x_0 \in B^{\,2}_2(0)\) such that

$$\begin{aligned} \left| \left( J_i {\Delta } (-2,x_0\cdot \nu _i) \right) \cap (-2,2) \right| = 0 \;\text { for all }\; i=1,2,3 \end{aligned}$$

or

$$\begin{aligned} \left| \left( J_i {\Delta } (x_0\cdot \nu _i,2)\right) \cap (-2,2) \right| = 0\; \text { for all } \; i=1,2,3. \end{aligned}$$

To illustrate the proof let us first assume that \(J_1\) and \(J_2\) are intervals of matching “orientations”, e.g., we have \(J_1 = J_2 = (-\infty ,0)\), in which case Fig. 16a suggests that also \(J_3 = (-\infty ,0)\).

If they are not intervals of matching “orientations”, we will see that, locally and up to symmetry, more of \(J_1\) lies below, for example, the value 0 than above, while the opposite holds for \(J_2\). The corresponding parts of \(J_1\) and \(J_2\) are shown in Fig. 16b. One then needs to prove that sufficiently many lines \(\pi _3^{-1}(s)\) for parameters \(s\in \mathbb {R}\) close to 0 intersect the “surface” of \(\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2)\), see Lemma 11 below. As a result, less than half of the parameters around 0 are contained in \(J_3\). The same argument for the complements ensures that also less than half of them are not contained in \(J_3\), which cannot be true.
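The matching-orientation case can also be sanity-checked numerically. The following Monte Carlo sketch, illustrative only (the specific choice of unit vectors at mutual angles of \(120^\circ\) is ours), confirms that with \(J_1 = J_2 = J_3 = (-\infty ,0)\) both conditions in (25) hold, since \(\pi _1 + \pi _2 + \pi _3 = 0\):

```python
import numpy as np

rng = np.random.default_rng(0)
# three unit vectors at mutual angles of 120 degrees, so nu1 + nu2 + nu3 = 0
nu = np.array([[1.0, 0.0],
               [-0.5, np.sqrt(3) / 2],
               [-0.5, -np.sqrt(3) / 2]])
x = rng.uniform(-8, 8, size=(100_000, 2))
x = x[np.linalg.norm(x, axis=1) < 8]     # sample the ball B_8^2(0)
proj = x @ nu.T                          # columns: pi_1(x), pi_2(x), pi_3(x)

# with J_1 = J_2 = J_3 = (-infinity, 0): since pi_1 + pi_2 + pi_3 = 0, no point
# can have all three projections negative, nor all three non-negative
assert not np.any(np.all(proj < 0, axis=1))
assert not np.any(np.all(proj >= 0, axis=1))
```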

Fig. 16
figure 16

Sketches illustrating the proof of Lemma 10. The arrows in the middle indicate the three linearly dependent directions \(\nu _1, \nu _2, \nu _3\). a In the setting of Lemma 10, the set \(\pi _3^{-1}(J_3)\) (hatched) may only intersect \(\pi _1^{-1}({J_1}^{\mathsf {c}}) \cap \pi _2^{-1}({J_2}^{\mathsf {c}})\) (light gray) and its complement may only intersect \(\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2)\) (dark gray). b The line \(\pi _3^{-1}(s)\) intersects both a subset of \(\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2)\) (dark gray) and a subset of \(\pi _1^{-1}({J_1}^{\mathsf {c}}) \cap \pi _2^{-1}({J_2}^{\mathsf {c}})\) (light gray)

To link intersecting lines to the “surface area” we use that our sets are of product structure, i.e., they can be thought of as unions of parallelograms, and that the intersecting lines are not parallel to one of the sides of said parallelograms. In the following and final lemma, we measure-theoretically ensure that the line \(\pi _3^{-1}(s)\) for \(s\in \mathbb {R}\) intersects the product set \(\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2)\) by requiring

$$\begin{aligned} \int _{\{x\cdot \nu _3 = s\}} \chi _{J_1}(x\cdot \nu _1) \chi _{J_2}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) > 0. \end{aligned}$$

Lemma 11

Let \(\nu _1, \nu _2, \nu _3 \in {\mathbb {S}}^1\) with \(\nu _1 + \nu _2 + \nu _3 = 0\). Let \(J_1, J_2 \subset \mathbb {R}\)  be measurable with \(|J_1|,|J_2|>0\). Then the set

$$\begin{aligned} A\, {:}{=}\, \left\{ s \in \mathbb {R}: \int _{\{x\cdot \nu _3 = s\}} \chi _{J_1}(x\cdot \nu _1) \chi _{J_2}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) > 0\right\} \end{aligned}$$

is measurable and satisfies \(A \subset \pi _3\left( \pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \right) \), as well as \(|A|\ge |J_1| + |J_2|\).
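For intervals the lower bound \(|A|\ge |J_1| + |J_2|\) is sharp: on the line \(\{x\cdot \nu _3 = s\}\) we have \(s = -(x\cdot \nu _1) - (x\cdot \nu _2)\), and \((x\cdot \nu _1, x\cdot \nu _2)\) ranges over all of \(\mathbb {R}^2\), so A coincides with the sumset \(-(J_1 + J_2)\) up to null sets. A small numeric illustration, with concrete vectors and intervals of our choosing:

```python
import numpy as np

nu1 = np.array([1.0, 0.0])
nu2 = np.array([-0.5, np.sqrt(3) / 2])
nu3 = -(nu1 + nu2)                       # nu1 + nu2 + nu3 = 0
assert np.isclose(np.linalg.norm(nu3), 1.0)

J1, J2 = (0.0, 1.0), (2.0, 2.5)          # |J1| = 1, |J2| = 1/2
# on {x . nu3 = s} we have s = -(x . nu1) - (x . nu2); as (x . nu1, x . nu2)
# ranges over R^2, the set A equals -(J1 + J2) up to null sets
A_min, A_max = -(J1[1] + J2[1]), -(J1[0] + J2[0])
measure_A = A_max - A_min
# for intervals the sumset has length exactly |J1| + |J2|: the bound is sharp
assert np.isclose(measure_A, (J1[1] - J1[0]) + (J2[1] - J2[0]))
```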

4 Proofs

Proof of Theorem 1

We first use Lemma 2 to see that the limiting differential inclusion \(e(u)\in \mathscr {K}\) in fact holds. Furthermore, we obtain a function \(\theta \in L^\infty ({B_{1}\left( 0\right) };\mathbb {R}^3)\) satisfying \(e(u) = \sum _{i=1}^3 \theta _i e_i\) and \(\theta \in \widetilde{\mathscr {K}}\) almost everywhere. Next, we apply Lemma 3 to deduce the existence of a universal radius \(r_1\in (0,1)\) such that for all \(\nu \in N\) and all cyclical indices \(i=1,2,3\) there exist \(f_\nu \in L^\infty (-r_1,r_1)\) and affine functions \(g_i :\mathbb {R}^3 \rightarrow \mathbb {R}\) such that for almost all \(x \in {B_{r_1}\left( 0\right) }\) we have

$$\begin{aligned} \theta _{i}(x) = \sum _{\nu \in N_{i+1}} f_{\nu }(x\cdot \nu ) - \sum _{\nu \in N_{i-1}} f_{\nu }(x\cdot \nu ) + g_i(x). \end{aligned}$$

Let \( r_3 < r_2 \in (0,\frac{1}{2})\) be the two universal radii of Proposition 1. By a rescaling argument, we may suppose that \(r_2\) is also the universal radius of Proposition 3. Let \(r_4\, {:}{=}\, r_1r_2\) and \(r_5\, {:}{=}\, r_1r_3\), so that by another rescaling argument, the conclusions of Propositions 1 and 3 hold for the respective radii \(r_5<r_4\). If \(f_\nu \in VMO(-r_5,r_5)\) for all \(\nu \in N\), then a rescaling argument and Proposition 2 imply that there exists a universal \(r_6 \in (0,r_5)\) such that the solution of the differential inclusion is a two-variant configuration on \({B_{r_6}\left( 0\right) }\). If \(f_{\nu _i} \notin VMO(-r_5,r_5)\) for some \(\nu _i \in N_i\) and \(i \in \{1,2,3\}\) we can use the same rescaling argument and Proposition 1 to deduce that the configuration is planar on \({B_{r_4}\left( 0\right) }\) or a two-variant configuration on \({B_{r_5}\left( 0\right) }\).

We are thus left with classifying planar configurations on \({B_{r_4}\left( 0\right) }\), i.e., there exists \(d\in \mathscr {D}\) such that for all \(j=1,2,3\) there exist \(\nu _j \in N_j\) with \(d\cdot \nu _j = 0\), functions \(\tilde{f}_{\nu _j} \in L^\infty (-r_4,r_4)\) and affine functions \(\tilde{g}_j:\mathbb {R}^3 \rightarrow \mathbb {R}\) with \(\partial _d \tilde{g}_j = 0\) such that for almost all \(x\in {B_{r_4}\left( 0\right) }\) we have

$$\begin{aligned} \theta _j(x) = \tilde{f}_{\nu _{j+1}}(x\cdot \nu _{j+1}) - \tilde{f}_{\nu _{j-1}}(x\cdot \nu _{j-1}) + \tilde{g}_j(x). \end{aligned}$$

If, additionally, one of the functions \(\tilde{f}_{\nu _j}\) for \(j\in \{1,2,3\}{\setminus } \{i\}\) is affine on \((-r_4,r_4)\), we can apply Lemma 9 after rescaling to see that the configuration is a two-variant configuration, a planar second-order laminate or a planar checkerboard on \({B_{r_7}\left( 0\right) }\) for some universal \(r_7>0\). Otherwise, \(\tilde{f}_{\nu _j}\) is not affine on \((-r_4,r_4)\) for all \(j \in \{1,2,3\}{\setminus }\{i\}\), and \(\tilde{f}_{\nu _i}\) is not affine on \((-r_4,r_4)\) by virtue of \(f_{\nu _i} \not \in VMO(-r_5,r_5)\) and \(r_5<r_4\). Therefore, remembering that we can apply Proposition 3 with the radius \(r_4\), we obtain that the configuration is a two-variant configuration on \({B_{r_4}\left( 0\right) }\) or a planar triple intersection on \({B_{2r_4}\left( 0\right) }\).

Let \(r\, {:}{=}\, \min \{r_4,r_5,r_6,r_7\}\) and \(\tilde{r}\, {:}{=}\, 2r_4\). Restricting to the smaller ball \({B_{r}\left( 0\right) }\) where possible, we see that we have a two-variant configuration, a planar second-order laminate or a planar checkerboard on \({B_{r}\left( 0\right) }\), or a planar triple intersection on \({B_{\tilde{r}}\left( 0\right) }\), concluding the proof. \(\quad \square \)

4.1 Construction of a Fully Three-Dimensional Structure in the Presence of Austenite

Here we flesh out the previously announced example in Lemma 1. The idea is to construct a two-variant configuration which can be shifted in strain space to include austenite, see Fig. 18.

Proof of Lemma 1

Recall \(\nu _1^+= \frac{1}{\sqrt{2}} (011),\nu _1^- = \frac{1}{\sqrt{2}} (01\overline{1}) \) from Section 2.2 and let \(\nu _3 \, {:}{=}\, \nu _3^+ =\frac{1}{\sqrt{2}} (110)\). It is clear that \(\{\nu _1^+,\nu _1^-,\nu _3\}\) is a basis of \(\mathbb {R}^3\), see also Fig. 17. Let \(\chi _1^+, \chi _1^-, \chi _3: \mathbb {R}\rightarrow \{0,1\}\) be measurable characteristic functions. For \(x\in \mathbb {R}^3\), we define the volume fractions to be

$$\begin{aligned} \theta _1(x)&\, {:}{=}\, \frac{1}{3}\chi _3(x\cdot \nu _3),\\ \theta _2(x)&\,{:}{=}\, 1 - \frac{1}{3}\chi _1^+(x\cdot \nu _1^+) -\frac{1}{3}\chi _1^-(x\cdot \nu _1^-) -\frac{1}{3}\chi _3(x\cdot \nu _3),\\ \theta _3(x)&\, {:}{=}\, \frac{1}{3}\chi _1^+(x\cdot \nu _1^+) + \frac{1}{3}\chi _1^-(x\cdot \nu _1^-), \end{aligned}$$

which clearly a.e. satisfy \(0\le \theta _i \le 1\) for all \(i=1,2,3\) and \(\theta _1 + \theta _2 + \theta _3 \equiv 1\). As \(\{\nu _1^+,\nu _1^-,\nu _3\}\) constitutes a basis of \(\mathbb {R}^3\), the structure is indeed fully three-dimensional.
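The basis claim reduces to a non-vanishing determinant, which can be checked mechanically; a tiny numeric sketch, illustrative only:

```python
import numpy as np

s2 = np.sqrt(2)
nu1p = np.array([0.0, 1.0, 1.0]) / s2    # nu_1^+
nu1m = np.array([0.0, 1.0, -1.0]) / s2   # nu_1^-
nu3  = np.array([1.0, 1.0, 0.0]) / s2    # nu_3 = nu_3^+

M = np.column_stack([nu1p, nu1m, nu3])
assert np.allclose(np.linalg.norm(M, axis=0), 1.0)   # unit normals
assert abs(np.linalg.det(M)) > 1e-12                 # a basis of R^3
```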

Straightforward case distinctions ensure that \(\theta _i = 0\) for some \(i=1,2,3\) or \(\theta _i =\frac{1}{3}\) for all \(i=1,2,3\) almost everywhere. Setting \(G\, {:}{=}\, \sum _{i=1}^3 \theta _i e_i\) we see that this implies \(G \in \mathscr {K}\cup \{0\}\) almost everywhere. A sketch of cross-sections through G on \(H(c,\nu _1^-)\) both with \(\chi _1^-(c) =0\) and \(\chi _1^-(c) =1\) is given in Fig. 18.
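The case distinctions can be verified by brute force over the eight possible values of \((\chi _1^+,\chi _1^-,\chi _3)\); a short exact-arithmetic check, illustrative only:

```python
from fractions import Fraction
from itertools import product

third = Fraction(1, 3)
for c1p, c1m, c3 in product((0, 1), repeat=3):
    # the volume fractions of the construction for this choice of chi-values
    theta = (third * c3,
             1 - third * (c1p + c1m + c3),
             third * (c1p + c1m))
    # admissible barycentric coordinates ...
    assert sum(theta) == 1 and all(0 <= th <= 1 for th in theta)
    # ... and in each of the 8 cases either some variant is absent
    # or all fractions equal 1/3
    assert min(theta) == 0 or theta == (third, third, third)
```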

Fig. 17
figure 17

Sketch showing the basis \(\{\nu _1^+, \nu _1^-,\nu _3\}\) and, in gray, a plane normal to \(\nu _3\), parallel to which the cross-sections of Fig. 18 are chosen

Fig. 18
figure 18

a Cross-section of the construction for Lemma 1 with \(x\cdot \nu _3 = c\). If \(c\in \mathbb {R}\) is such that \(\chi _3(c) = 0\), then the strains \(M_1, M_2,M_3\) are as in Subfigure (b); if \(\chi _3(c) = 1\), they are as in Subfigure (c)

Finally, in order to identify G as the symmetric gradient of a displacement we define functions \(F_1^+,F_1^-, F_3 : \mathbb {R}\rightarrow \mathbb {R}\) such that for almost all \(s\in \mathbb {R}\) we have

$$\begin{aligned} (F_1^+)'(s) = \sqrt{2} \chi _1^+(s), \, (F_1^-)'(s) = \sqrt{2} \chi _1^-(s) \text { and } (F_3)'(s) = \sqrt{2} \chi _3(s). \end{aligned}$$

For \(x\in \mathbb {R}^3\) we then set

$$\begin{aligned} u_1(x)&\, {:}{=} \, - F_3(x\cdot \nu _3) + x_1,\\ u_2(x)&\, {:}{=}\, F_1^+(x\cdot \nu _1^+) + F_1^-(x\cdot \nu _1^-) + F_3(x\cdot \nu _3) - 2x_2,\\ u_3(x)&\, {:}{=}\, - F_1^+(x\cdot \nu _1^+) + F_1^-(x\cdot \nu _1^-) + x_3. \end{aligned}$$

The identity \(e(u) \equiv G\) a.e. on \(\mathbb {R}^3\) is straightforward to check. \(\quad \square \)
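The verification of \(e(u) \equiv G\) can also be carried out symbolically. The sketch below is illustrative only; it encodes the martensite strains as \(e_1 = \mathrm{diag}(-2,1,1)\), \(e_2 = \mathrm{diag}(1,-2,1)\), \(e_3 = \mathrm{diag}(1,1,-2)\), a convention consistent with the identity \(e(u)_{ii} = 1-3\theta _i\) used in the proof of Lemma 3 below:

```python
import sympy as sp

x1, x2, x3, t = sp.symbols('x1 x2 x3 t')
c1p, c1m, c3 = sp.Function('chi1p'), sp.Function('chi1m'), sp.Function('chi3')
s2 = sp.sqrt(2)

def F(chi, s):
    # an antiderivative with F'(s) = sqrt(2) * chi(s)
    return s2 * sp.Integral(chi(t), (t, 0, s))

a = (x2 + x3) / s2   # x . nu_1^+
b = (x2 - x3) / s2   # x . nu_1^-
c = (x1 + x2) / s2   # x . nu_3

u = [-F(c3, c) + x1,
     F(c1p, a) + F(c1m, b) + F(c3, c) - 2 * x2,
     -F(c1p, a) + F(c1m, b) + x3]
X = [x1, x2, x3]
# the symmetric gradient e(u)
e = sp.Matrix(3, 3, lambda i, j: (sp.diff(u[i], X[j]) + sp.diff(u[j], X[i])) / 2)

theta1 = c3(c) / 3
theta3 = (c1p(a) + c1m(b)) / 3
theta2 = 1 - theta1 - theta3
e1, e2, e3 = sp.diag(-2, 1, 1), sp.diag(1, -2, 1), sp.diag(1, 1, -2)
G = theta1 * e1 + theta2 * e2 + theta3 * e3

assert all(sp.simplify(entry) == 0 for entry in (e - G))   # e(u) = G
```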

4.2 The Differential Inclusion

Proof of Lemma 2

For simplicity, we suppress the sequence parameter in the notation throughout the proof. Let \(\eta >0\). For Borel sets \(B\subset {B_{1}\left( 0\right) }\) we interpret the energies

$$\begin{aligned} E_{\eta }(B) \,{:}{=}\, \eta ^{-\frac{2}{3}}\int _B \left| e(u)- \sum _{i=1}^3 \chi _ie_i\right| ^2 \, \mathrm {d}x+\eta ^\frac{1}{3} \sum _{i=1}^3 |D\chi _i|(B) \end{aligned}$$

as finite Radon measures on \({B_{1}\left( 0\right) }\).

Let \(y \in {B_{1}\left( 0\right) }\) and \(r>0\) be such that \({B_{r}\left( y\right) }\subset {B_{1}\left( 0\right) }\). We rescale \({B_{r}\left( y\right) }\) to the unit ball by setting \(\hat{\eta }\, {:}{=} \,\frac{\eta }{r}\) and defining \(\hat{u}_{\hat{\eta }} \in W^{1,2}({B_{1}\left( 0\right) }; \mathbb {R}^3)\) and \(\hat{\chi }_{\hat{\eta }} \in L^\infty ( {B_{1}\left( 0\right) } ;\{0,1\}^3)\) with \(\sum _{i=1}^3 \hat{\chi }_{\hat{\eta }, i}=1 \) almost everywhere, for \(\hat{x} \in {B_{1}\left( 0\right) } \), to be

$$\begin{aligned} \hat{u}_{\hat{\eta }}(\hat{x}){:}{=} \frac{1}{r}u_\eta \left( r \hat{x}+y\right) \text {, }\hat{\chi }_{\hat{\eta }}(\hat{x})\, {:}{=}\, \chi _\eta \left( r \hat{x} + y\right) . \end{aligned}$$

By the Capella-Otto rigidity result [11, Theorem 2.2] there exists a universal radius \(0<s<1\) such that

$$\begin{aligned} \min \left\{ \Vert \hat{\chi }_{1,\hat{\eta }}\Vert _{L^1({B_{s}\left( 0\right) })},\Vert \hat{\chi }_{2,\hat{\eta }}\Vert _{L^1({B_{s}\left( 0\right) })},\Vert \hat{\chi }_{3,\hat{\eta }}\Vert _{L^1({B_{s}\left( 0\right) })}\right\} \lesssim E_{\hat{\eta }}(\hat{u}_{\hat{\eta }},\hat{\chi }_{\hat{\eta }})^\frac{1}{2}. \end{aligned}$$

The energy of the rescaled functions is

$$\begin{aligned} E_{\hat{\eta }}(\hat{u}_{\hat{\eta }},\hat{\chi }_{\hat{\eta }})&= \left( \frac{\eta }{r}\right) ^{-\frac{2}{3}}\int _{B_{1}\left( 0\right) } \left| e(u_\eta )(r\hat{x}+y) - \sum _{i=1}^3 \chi _i(r\hat{x}+y) e_i \right| ^2 \, \mathrm {d}\hat{x} \\&\quad + \left( \frac{\eta }{r}\right) ^{\frac{1}{3}}|D\hat{\chi }_{\hat{\eta }}|({B_{1}\left( 0\right) })\\&= r^{-3 + \frac{2}{3}}E_{\eta }({B_{r}\left( y\right) }), \end{aligned}$$

so that rescaling back to \({B_{r}\left( y\right) }\), we get

$$\begin{aligned} \frac{1}{r^3}\min \left\{ \Vert \chi _{1,\eta }\Vert _{L^1({B_{sr}\left( y\right) })},\Vert \chi _{2,\eta }\Vert _{L^1({B_{sr}\left( y\right) })},\Vert \chi _{3,\eta }\Vert _{L^1({B_{sr}\left( y\right) })}\right\} \lesssim \left( r^{-3 + \frac{2}{3}}E_{\eta }({B_{r}\left( y\right) })\right) ^\frac{1}{2}. \end{aligned}$$
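The bookkeeping behind the prefactor \(r^{-3+\frac{2}{3}}\) — the bulk term gains \(r^{\frac{2}{3}}\) from \(\hat{\eta }^{-\frac{2}{3}}\) and \(r^{-3}\) from the change of variables, while the interface term gains \(r^{-\frac{1}{3}}\) from \(\hat{\eta }^{\frac{1}{3}}\) and \(r^{-2}\) from the scaling of perimeters — can be summarized in exact arithmetic; an illustrative check only:

```python
from fractions import Fraction

F = Fraction
# bulk term: eta^(-2/3) picks up r^(2/3); the change of variables gives r^(-3)
bulk = F(2, 3) - 3
# interface term: eta^(1/3) picks up r^(-1/3); perimeters scale as r^2
interface = F(-1, 3) - 2
# both terms rescale with the same power of r, namely r^(-3 + 2/3) = r^(-7/3)
assert bulk == interface == -3 + F(2, 3)
```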

After passing to a subsequence (not relabeled), we have \(E_\eta {\mathop {\rightharpoonup }\limits ^{*}} E\) as Radon measures in the limit \(\eta \rightarrow 0\). Consequently, weak lower semi-continuity of the \(L^1\)-norm and upper semi-continuity of the total variation on compact sets imply

$$\begin{aligned} \frac{1}{r^3}\min \left\{ \Vert \theta _{1}\Vert _{L^1({B_{sr}\left( y\right) })},\Vert \theta _{2}\Vert _{L^1({B_{sr}\left( y\right) })},\Vert \theta _{3}\Vert _{L^1({B_{sr}\left( y\right) })}\right\} \lesssim \left( r^{-3 + \frac{2}{3}}E(\overline{{B_{r}\left( y\right) }})\right) ^\frac{1}{2}. \end{aligned}$$

For \(T\, {:}{=}\, \left\{ y \in {B_{1}\left( 0\right) } : \limsup _{r\rightarrow 0} r^{-3 + \frac{2}{3}}E\left( \overline{{B_{r}\left( y\right) }}\right) >0\right\} \) and for all \(\varepsilon >0\) we get \(\mathscr {H}^{3-\frac{2}{3}+\varepsilon }(T) = 0\) by [1, statement (2.40)]. Using [1, Theorem 2.49 (iii) and Theorem 2.53], along with Lebesgue point theory, we thus obtain for almost all \(y \in {B_{1}\left( 0\right) }\) that

$$\begin{aligned} \min \left\{ \theta _{1}(y),\theta _{2}(y),\theta _{3}(y)\right\} = 0. \end{aligned}$$

\(\square \)

4.3 Decomposing the Strain

Proof of Lemma 3

First we notice that for all \(i=1,2,3\) and almost all \(x\in {B_{1}\left( 0\right) }\) the relation \(e(u)(x) = \sum _{i=1}^3 \theta _i(x) e_i\), the definition (4) of \(e_i\) and the assumption \(\sum _{i=1}^3 \theta _i(x) = 1\) imply

$$\begin{aligned} e(u)_{ii}(x) = \sum _{j=1}^3 \theta _j(x) (e_j)_{ii} = -2\theta _i(x) + \theta _{i+1}(x) + \theta _{i-1}(x) = 1-3\theta _i(x). \end{aligned}$$

Therefore, we only have to argue that for all \(i=1,2,3\) a decomposition for \(e(u)_{ii}\) analogous to the decomposition (15) holds.
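With the convention \(e_1 = \mathrm{diag}(-2,1,1)\), \(e_2 = \mathrm{diag}(1,-2,1)\), \(e_3 = \mathrm{diag}(1,1,-2)\) (our reading of definition (4), consistent with the computation above), the identity can be confirmed symbolically; an illustrative sketch:

```python
import sympy as sp

t1, t2, t3 = sp.symbols('theta1 theta2 theta3')
theta = [t1, t2, t3]
e_mats = [sp.diag(-2, 1, 1), sp.diag(1, -2, 1), sp.diag(1, 1, -2)]
strain = sum((th * em for th, em in zip(theta, e_mats)), sp.zeros(3, 3))

constraint = {t3: 1 - t1 - t2}   # theta_1 + theta_2 + theta_3 = 1
for i in range(3):
    # e(u)_{ii} = -2 theta_i + theta_{i+1} + theta_{i-1} = 1 - 3 theta_i
    assert sp.simplify((strain[i, i] - (1 - 3 * theta[i])).subs(constraint)) == 0
```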

The remaining proof is essentially a translation of the proofs of Capella and Otto [11, Lemma 3.7 and Proposition 3.9] into our setting. To this end, we use the “dictionary”

$$\begin{aligned} e(u)_{11}&\longleftrightarrow \chi _1, \\ e(u)_{22}&\longleftrightarrow \chi _2, \\ e(u)_{33}&\longleftrightarrow \chi _3, \\ 0&\longleftrightarrow \chi _0, \end{aligned}$$

where the left-hand side shows our objects and the right-hand side shows the corresponding ones of Capella and Otto. The two main changes are the following:

  1. 1.

    In our case all relevant second mixed derivatives vanish (see Lemma 5), instead of being controlled by the energy. Furthermore, whenever Capella and Otto refer to their “austenitic result”, we just have to use the fact that \(e(u)_{11} + e(u)_{22} + e(u)_{33} \equiv 0\).

  2. 2.

    We need to check at every step that boundedness of all involved functions is preserved.

We will briefly indicate how boundedness of all functions is ensured. The functions in [11, Lemma 3.7] are constructed by averaging in certain directions. This clearly preserves boundedness. The proof of [11, Proposition 3.9] works by applying pointwise linear operations to all functions, which again preserves boundedness, and by identifying certain functions as being affine, which are also bounded on the unit ball. \(\quad \square \)

Proof of Corollary 1

By symmetry we can assume \(i=1\). Applying [11, Lemma 3.8] to \(\theta _1\) we see that for every \(x\in {B_{1}\left( 0\right) }\) there exists \(r>0\) such that the functions \(y \mapsto f_\nu (y\cdot \nu )\) for \(\nu \in N_2 \cup N_3\) are affine on the ball \({B_{r}\left( x\right) }\). Consequently, they are affine on \({B_{1}\left( 0\right) }\). Thus we can find affine functions \(\tilde{g}_2, \tilde{g}_3 : \mathbb {R}^3 \rightarrow \mathbb {R}\) such that for almost all \(x\in {B_{1}\left( 0\right) }\) the decomposition (16) reduces to

$$\begin{aligned} \theta _1(x)&= 0, \\ \theta _{2}(x)&= - f_{\nu _1^+}\big (x\cdot \nu _1^+\big ) - f_{\nu _1^-}\big (x\cdot \nu _1^-\big ) + \tilde{g}_{2}(x),\\ \theta _{3}(x)&= f_{\nu _1^+}\big (x\cdot \nu _1^+\big ) + f_{\nu _1^-}\big (x\cdot \nu _1^-\big ) + \tilde{g}_{3}(x). \end{aligned}$$

As the vectors \(\nu _1^+\) and \(\nu _1^-\) form a basis of the plane \(H(0,E_1)\) defined in (13), we can find functions \(\tilde{f}_{\nu _1^+}, \tilde{f}_{\nu _1^-} \in L^\infty (-1,1)\), an affine function \(\hat{g}_3 :\mathbb {R}^3 \rightarrow \mathbb {R}\) and \(\lambda \in \mathbb {R}\) such that for almost all \(x\in {B_{1}\left( 0\right) }\) we have

$$\begin{aligned} \theta _2(x)&= - \tilde{f}_{\nu _1^+}\big (x\cdot \nu _1^+\big ) - \tilde{f}_{\nu _1^-}\big (x\cdot \nu _1^-\big ) + \lambda x_1 +1,\nonumber \\ \theta _3(x)&= \tilde{f}_{\nu _1^+}\big (x\cdot \nu _1^+\big ) + \tilde{f}_{\nu _1^-}\big (x\cdot \nu _1^-\big ) + \hat{g}_3(x). \end{aligned}$$
(27)

Therefore, for almost all \(x\in {B_{1}\left( 0\right) }\) we have \(\hat{g}_3(x) = -\lambda x_1\) due to the assumption \(\theta (x)\in \widetilde{\mathscr {K}}\) in the form of \(\sum _{i=1}^3\theta _i(x)= 1\), and thus the decomposition simplifies to

$$\begin{aligned} \theta _1(x)&= 0, \\ \theta _{2}(x)&= - \tilde{f}_{\nu _1^+}\big (x \cdot \nu _1^+\big ) - \tilde{f}_{\nu _1^-}\big (x \cdot \nu _1^-\big ) + \lambda x_1 +1,\\ \theta _{3}(x)&= \tilde{f}_{\nu _1^+}\big (x \cdot \nu _1^+\big ) + \tilde{f}_{\nu _1^-}\big (x \cdot \nu _1^-\big ) -\lambda x_1. \end{aligned}$$

\(\square \)

Proof of Lemma 4

For \(t \in \mathbb {R}\) and \(\delta >0\) let

$$\begin{aligned} \phi (t)\, {:}{=}\, \int _{\{x_1=t\}} \frac{1}{\mathscr {L}^n({B_{1}\left( 0\right) })}\chi _{B_{1}\left( 0\right) }(t,x') \, \mathrm {d}\mathscr {L}^{n-1}(x')\text { and }\phi _\delta (t)\, {:}{=}\, \frac{1}{\delta }\phi \left( \frac{t}{\delta }\right) . \end{aligned}$$
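For instance, in dimension \(n=3\) the kernel is \(\phi (t) = \frac{3}{4}(1-t^2)\) on \([-1,1]\), an even probability density, so that \(\phi _\delta \) is a symmetric mollifier of unit mass for every \(\delta >0\); a quick symbolic check, illustrative only:

```python
import sympy as sp

t = sp.symbols('t')
# slice {x_1 = t} of the unit ball in R^3: a 2-ball of radius sqrt(1 - t^2)
# and area pi*(1 - t^2); the volume of B_1(0) in R^3 is 4*pi/3
phi = (sp.pi * (1 - t**2)) / (sp.Rational(4, 3) * sp.pi)

assert sp.simplify(phi - sp.Rational(3, 4) * (1 - t**2)) == 0
assert sp.integrate(phi, (t, -1, 1)) == 1                                 # unit mass
assert phi.subs(t, sp.Rational(1, 2)) == phi.subs(t, -sp.Rational(1, 2))  # even
```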

For all \(x\in V\) and all \(\delta >0\) we have that

$$\begin{aligned} \sum _{i=1}^P \phi _\delta * f_i(x\cdot \nu _i) = \fint _{B_{\delta }\left( x\right) }F(y) \, \mathrm {d}\mathscr {L}^n(y) \in \mathscr {C}, \end{aligned}$$

since \({B_{1}\left( 0\right) }\) is invariant under rotation and \(\mathscr {C}\) is convex. Let \(i\in \{1,\ldots ,P\}\). By [20, Theorem 4.1 (iv)] and [20, Theorem 1.33] we get a measurable set \(T_i \subset \mathbb {R}\) with \(\mathscr {L}(\mathbb {R}{\setminus } T_i) = 0\) such that for all \(t \in T_i\) we have \(\phi _\delta * f_i (t) \rightarrow f_i (t)\) in the limit \(\delta \rightarrow 0\). For all \(i=1,\dots ,P\) let \(\widetilde{\nu }_i \in V \cap {B_{1}\left( 0\right) }{\setminus }\{0\}\) be the orthogonal projection of \(\nu _i\) onto V. As for all \(x \in V\) we have \(x\cdot \widetilde{\nu }_i = x \cdot \nu _i \not \in T_i\) if and only if \(x\cdot \frac{\widetilde{\nu }_i}{|\widetilde{\nu }_i|} \not \in \frac{1}{|\widetilde{\nu }_i|} T_i\), Fubini's theorem gives, for all measurable sets \(U \subset V\) of finite \(\mathscr {H}^k\)-measure,

$$\begin{aligned} \mathscr {H}^k\left( \{x\in U: x\cdot \nu _i \in \mathbb {R}{\setminus } T_i\}\right) = 0. \end{aligned}$$

Thus for almost all \(x\in V\) we have in the limit \(\delta \rightarrow 0\) that

$$\begin{aligned} \fint _{B_{\delta }\left( x\right) }F(y) \, \mathrm {d}\mathscr {L}^n(y) \rightarrow F|_V(x)\, {:}{=}\, \sum _{i=1}^P f_i(x\cdot \nu _i) \in \mathscr {C}. \end{aligned}$$

\(\square \)

Proof of Lemma 5

By symmetry it is sufficient to prove the equations involving \(\theta _1\). We calculate

$$\begin{aligned} \partial _{[111]}\partial _{[\overline{1}11]}&= -\partial ^2_1 + \partial _1\partial _2 + \partial _1\partial _3 - \partial _1\partial _2 + \partial ^2_2 + \partial _2\partial _3 - \partial _1\partial _3 + \partial _2\partial _3 + \partial ^2_3 \\&= -\partial _1^2 +\partial _2^2 + \partial _3^2 + 2 \partial _2\partial _3 \end{aligned}$$

and, similarly,

$$\begin{aligned} \partial _{[1\overline{1}1]}\partial _{[ 11\overline{1}]} = \partial _1^2 - \partial _2^2 - \partial _3^2 + 2 \partial _2\partial _3. \end{aligned}$$
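Both expansions are elementary to confirm by treating the partial derivatives as commuting symbols; an illustrative symbolic check:

```python
import sympy as sp

p1, p2, p3 = sp.symbols('p1 p2 p3')   # symbols of d_1, d_2, d_3 (they commute)

d_111  = p1 + p2 + p3    # directional derivative along [111]
d_m111 = -p1 + p2 + p3   # along [-111]
d_1m11 = p1 - p2 + p3    # along [1-11]
d_11m1 = p1 + p2 - p3    # along [11-1]

assert sp.expand(d_111 * d_m111) == -p1**2 + p2**2 + p3**2 + 2*p2*p3
assert sp.expand(d_1m11 * d_11m1) == p1**2 - p2**2 - p3**2 + 2*p2*p3
```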

For almost all \(x\in {B_{1}\left( 0\right) }\) we have due to \(\frac{1}{2}(Du + Du^T)(x) = e(u)(x) \in S\), see definition (6), the distributional identities

$$\begin{aligned} \big (-\partial _1^2 +\partial _2^2 + \partial _3^2\big )u_{1}&= -\partial _1^2 u_{1} -\partial _2\partial _1 u_{2} -\partial _3\partial _1u_{3} \\&= -\partial _1 {\text {tr}} Du \\&= 0. \end{aligned}$$

Distributionally, we also know that

$$\begin{aligned} \partial _2\partial _3 u_{1} = - \partial _2\partial _1 u_{3} = \partial _1\partial _3 u_{2} = - \partial _2\partial _3 u_{1}, \end{aligned}$$

which gives

$$\begin{aligned} \partial _2\partial _3 u_{1} =0. \end{aligned}$$

Taking a further distributional derivative we see

$$\begin{aligned} \partial _{[111]}\partial _{[\overline{1}11]} \partial _1 u_1 = 0 \text { and } \partial _{[1\overline{1}1]}\partial _{[ 11\overline{1}]} \partial _1 u_1 = 0. \end{aligned}$$

For almost all \(x\in {B_{1}\left( 0\right) }\) we have the identity \(\partial _1 u_1(x) = 1-3\theta _1(x)\), as a result of the assumptions \(e(u)(x) =\sum _{i=1}^3 \theta _i(x) e_i\) and \(\sum _{i=1}^3 \theta _i (x) = 1\), so that the above turns into

$$\begin{aligned} \partial _{[111]}\partial _{[\overline{1}11]} \theta _1 = 0 \text { and } \partial _{[1\overline{1}1]}\partial _{[ 11\overline{1}]} \theta _1 = 0. \end{aligned}$$

\(\square \)

4.4 Planarity in the Case of Non-trivial Blow-Ups

Proof of Proposition 1

Step 1: Identification of a suitable plane at which to blow up.

By symmetry, we may assume \(\nu = \smash {\frac{1}{\sqrt{2}} (011)}\). Recall that for \(y\in \mathbb {R}^2\) and \( r >0\) the symbol \(B_r^{\,2}(y)\) refers to the two-dimensional ball. Additionally, we always drop the factor \(\smash {\frac{1}{ \sqrt{2}}}\) in the index of \(f_{\tilde{\nu }}\) for \(\tilde{\nu } \in N\) whenever we explicitly write out \(\widetilde{\nu }\). Throughout the argument, \(0<\tilde{r}<r<\frac{1}{64}\) are fixed, universal radii we will determine later.

As \(f_{(011)} \notin VMO(- \tilde{r}, \tilde{r})\) by assumption, there exist sequences \(\alpha _k \in (-\tilde{r}, \tilde{r})\) and \(\delta _k > 0\) for \(k\in \mathbb {N}\) such that \((\alpha _k - \delta _k ,\alpha _k + \delta _k ) \subset (-\tilde{r}, \tilde{r})\) and

  1. 1.
    $$\begin{aligned} \lim _{k\rightarrow \infty } \fint _{(\alpha _k -\delta _k, \alpha _k + \delta _k)} \left| f_{(011)}(s) - \fint _{(\alpha _k -\delta _k, \alpha _k + \delta _k)} f_{(011)}(\tilde{s}) \, \mathrm {d}\tilde{s} \right| \, \mathrm {d}s > 0, \end{aligned}$$
    (28)
  2. 2.

    \(\lim _{k\rightarrow \infty }\delta _k = 0\),

  3. 3.

    \(\lim _{k\rightarrow \infty } \alpha _k = \alpha \in [-\tilde{r}, \tilde{r}]\).

For \(k\in \mathbb {N}\) we parametrize the plane \(H\left( \alpha _k,\smash {\frac{1}{\sqrt{2}} (011)}\right) \) at which we will blow up using \(\beta ,\gamma \in \mathbb {R}\) such that \((\beta ,\gamma )\in \smash {B_{1/8}^{\,2}}(0)\) and

$$\begin{aligned} X_k(\beta ,\gamma )\,{:}{=}\, \alpha _k \frac{1}{\sqrt{2}} (011) + \left( \beta - \frac{1}{2} \alpha _k\right) \frac{1}{\sqrt{2}}[1\overline{1}1] + \left( \gamma -\frac{1}{2}\alpha _k\right) \frac{1}{\sqrt{2}}[11\overline{1}]. \end{aligned}$$

Note that \(X_k(\beta ,\gamma ) \in {B_{1}\left( 0\right) }\) for all \(k\in \mathbb {N}\) and all \((\beta ,\gamma ) \in B_{1/8}^{\,2}(0)\) due to \(\tilde{r} < \frac{1}{64}\).

It is straightforward to see that then for all \(k \in \mathbb {N}\) and \((\beta ,\gamma ) \in B_{1/8}^{\,2}(0)\) we have the relations

$$\begin{aligned} X_k(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (011) = \alpha _k,&\quad X_k(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (01 \overline{1}) = \gamma - \beta , \end{aligned}$$
(29)
$$\begin{aligned} X_k(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (101) = \beta ,&\quad X_k(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (\overline{1} 0 1) = \alpha _k - \gamma , \end{aligned}$$
(30)
$$\begin{aligned} X_k(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (110) = \gamma ,&\quad X_k(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (1\overline{1} 0) = \beta -\alpha _k. \end{aligned}$$
(31)

Note that they nicely capture the combinatorics we discussed in Remark 1: The expression \( X_k(\beta ,\gamma ) \cdot \nu _1^+\) depends on neither \(\beta \) nor \(\gamma \), while \( X_k(\beta ,\gamma ) \cdot \nu _1^-\) depends on both. Furthermore, we see that \( X_k(\beta ,\gamma ) \cdot \nu _i^\pm \) for \(i=2,3\) depend on precisely one of the two. For a sketch relating \(H\left( \alpha _k,\nu _1^+\right) \) with the normals \(\nu \in N\) see Fig. 19a.
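The relations (29)–(31) reduce to six elementary dot products. The following symbolic check is an illustration only, with `a` standing for \(\alpha _k\):

```python
import sympy as sp

a, b, g = sp.symbols('alpha_k beta gamma')
s = sp.sqrt(2)

# X_k(beta, gamma) as defined above; a stands for alpha_k.
X = (a/s) * sp.Matrix([0, 1, 1]) \
    + ((b - a/2)/s) * sp.Matrix([1, -1, 1]) \
    + ((g - a/2)/s) * sp.Matrix([1, 1, -1])

def dot(v):
    # X_k(beta, gamma) . (v / sqrt(2))
    return sp.simplify(X.dot(sp.Matrix(v)) / s)

# Relations (29)-(31).
assert sp.simplify(dot([0, 1, 1]) - a) == 0            # = alpha_k
assert sp.simplify(dot([0, 1, -1]) - (g - b)) == 0     # = gamma - beta
assert sp.simplify(dot([1, 0, 1]) - b) == 0            # = beta
assert sp.simplify(dot([-1, 0, 1]) - (a - g)) == 0     # = alpha_k - gamma
assert sp.simplify(dot([1, 1, 0]) - g) == 0            # = gamma
assert sp.simplify(dot([1, -1, 0]) - (b - a)) == 0     # = beta - alpha_k
print("relations (29)-(31) verified")
```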

For all \((\beta ,\gamma ) \in \smash {B_{1/8}^{\,2}(0)}\) and \(k\rightarrow \infty \) we get the uniform convergence

$$\begin{aligned}&X_k(\beta ,\gamma ) \rightarrow X(\beta ,\gamma ) =\alpha \frac{1}{\sqrt{2}} (011)\nonumber \\&\quad +\left( \beta - \frac{1}{2} \alpha \right) \frac{1}{\sqrt{2}} [1\overline{1}1] +\left( \gamma - \frac{1}{2} \alpha \right) \frac{1}{\sqrt{2}}[11\overline{1}] \end{aligned}$$
(32)

and the relations with the normals turn into

$$\begin{aligned} X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (011) = \alpha ,&\quad X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (01\overline{1}) = \gamma - \beta , \end{aligned}$$
(33)
$$\begin{aligned} X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (101) = \beta ,&\quad X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (\overline{1} 0 1) = \alpha - \gamma , \end{aligned}$$
(34)
$$\begin{aligned} X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (110) = \gamma ,&\quad X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (1\overline{1} 0) = \beta - \alpha . \end{aligned}$$
(35)

We still have \(X(\beta ,\gamma ) \in {B_{1}\left( 0\right) }\) for all \((\beta ,\gamma ) \in B_{1/8}^{\,2}(0)\).

For \(i=1,2,3\); \(\nu \in N\); \((\beta ,\gamma ) \in B_{1/8}^{\,2}(0)\); \(\xi \in {B_{1}\left( 0\right) }\); and \(k \in \mathbb {N}\) sufficiently large to have \(X_k(B_{1/8}^{\,2}(0)) + {B_{2\delta _k}\left( 0\right) }\subset {B_{1}\left( 0\right) }\) we define the blow-ups

$$\begin{aligned} \theta _i^{(k)}(\beta ,\gamma ; \xi )&\, {:}{=}\, \theta _i(X_k(\beta ,\gamma ) + 2 \delta _k \xi ),\\ f_{\nu }^{(k)}(\beta ,\gamma ;\xi )&\,{:}{=}\, f_\nu (\nu \cdot (X_k(\beta ,\gamma ) + 2 \delta _k \xi )), \\ g_i^{(k)}(\beta ,\gamma ; \xi )&\, {:}{=}\, g_i ( X_k(\beta ,\gamma ) + 2 \delta _k \xi ), \end{aligned}$$

where \(\theta ^{(k)} \in L^\infty (B_{1/8}^{\,2}(0) \times {B_{1}\left( 0\right) };\widetilde{\mathscr {K}})\), \(f_{\nu }^{(k)} \in L^\infty (B_{1/8}^{\,2}(0) \times {B_{1}\left( 0\right) })\), and \(g_i^{(k)} :\mathbb {R}^5 \rightarrow \mathbb {R}\) is affine. We furthermore remark that for all \(\nu \in N {\setminus } \left\{ \smash {\frac{1}{\sqrt{2}} (011)} \right\} \) and \((\beta ,\gamma ) \in B_{1/8}^{\,2}(0)\) the composition \(f_\nu \circ X(\beta ,\gamma )\, {:}{=} \, f_\nu \left( \nu \cdot X (\beta ,\gamma )\right) \) is well-defined almost everywhere due to Lemma 4, see also Fig. 19a.

Fig. 19
figure 19

a Sketch relating planes \(H(\tilde{\alpha },\nu _1^+)\) for \(\tilde{\alpha } \in \mathbb {R}\), shown in gray, with all \(\nu \in N\). b Planes \(H(\tilde{\alpha },\nu _1^+)\) for \(\tilde{\alpha } \in \mathbb {R}\) contain lines parallel to \(E_1\)

Step 2: There exists a subsequence, which we will not relabel, and a probability measure \(\mu \) on \(\mathbb {R}^2\) such that for almost all \((\beta ,\gamma ) \in \smash {B_{1/8}^{\,2}(0)}\); all \(\nu \in \smash { N {\setminus } \left\{ \smash {\frac{1}{\sqrt{2}} (011) }\right\} }\); all \(i=1,2,3\); and all \(\psi \in C(\mathbb {R}^2)\) we have in the limit \(k\rightarrow \infty \) that

$$\begin{aligned} \Vert f_{\nu }^{(k)}(\beta ,\gamma ;\bullet ) - f_\nu \circ X (\beta ,\gamma )\Vert _{L^1({B_{1}\left( 0\right) })}&\rightarrow 0 \nonumber \\ \Vert g_i^{(k)}(\beta ,\gamma ; \bullet ) - g_i\circ X (\beta ,\gamma )\Vert _{L^1({B_{1}\left( 0\right) })}&\rightarrow 0 \nonumber \\ \fint _{B_{1}\left( 0\right) } \psi \left( \left( -f_{(011)}^{(k)},f_{(011)}^{(k)}\right) (\beta ,\gamma ;\xi )\right) \mathrm {d} \xi&\rightarrow \int _{\mathbb {R}^2} \psi (\hat{f}) \, \mathrm {d}\mu (\hat{f}). \end{aligned}$$
(36)

Additionally, \(\mu \) is not a Dirac measure.

For all \(k\in \mathbb {N}\) and all \(\nu \in N{\setminus } \left\{ \smash {\frac{1}{\sqrt{2}} (011)}\right\} \) we have by definition of \(\smash {f_\nu ^{(k)}}\) and Fubini’s Theorem that

$$\begin{aligned}&\int _{B_{1/8}^{\,2}(0)} \fint _{B_{1}\left( 0\right) } \left| f_\nu ^{(k)}(\beta ,\gamma ;\xi ) -f_\nu \circ X(\beta ,\gamma )\right| \, \mathrm {d}\xi \, \mathrm {d}(\beta ,\gamma ) \\&\quad = \int _{B_{1/8}^{\,2}(0)} \fint _{B_{1}\left( 0\right) } \left| f_\nu \left( \nu \cdot X_k(\beta ,\gamma ) + 2 \delta _k \nu \cdot \xi \right) - f_\nu (\nu \cdot X(\beta ,\gamma )) \right| \, \mathrm {d}\xi \, \mathrm {d}(\beta ,\gamma ) \\&\quad \lesssim \int _{B_{1/8}^{\,2}(0)} \fint _{-1}^1 \left| f_\nu (\nu \cdot X_k(\beta ,\gamma ) + 2 \delta _k s ) - f_\nu (\nu \cdot X(\beta ,\gamma )) \right| \, \mathrm {d}s \, \mathrm {d}(\beta ,\gamma ). \end{aligned}$$

As \(\nu \cdot X_k(\beta ,\gamma )\) and \(\nu \cdot X(\beta ,\gamma )\) depend on at least one of \(\beta \) and \(\gamma \), see Equations (29)–(31) and (33)–(35), and we have the uniform convergence \(X_k \rightarrow X\) for \(k\rightarrow \infty \), we can apply Lemma 12 from Appendix A to deduce that the integral in the last line vanishes in the limit. Passing to a subsequence, we get strong \(L^1\)-convergence in \(\xi \) for almost all \((\beta ,\gamma )\in B_{1/8}^{\,2}(0)\). Also, for all \((\beta ,\gamma ) \in B_{1/8}^{\,2}(0)\) and all \(i=1,2,3\) we have \(\smash {g_i^{(k)}}(\beta ,\gamma ;\bullet ) \rightarrow g_i \circ X(\beta ,\gamma )\) pointwise and in \(L^1\) in the limit \(k\rightarrow \infty \) by continuity of affine functions.

Due to the fact that \(X_k(\beta ,\gamma )\cdot \smash {\frac{1}{\sqrt{2}}} (011) = \alpha _k\) for all \((\beta ,\gamma ) \in \smash {B_{1/8}^{\,2}(0)}\) we see that \(\smash {f^{(k)}_{(011)}}\) does not depend on \(\beta \) and \(\gamma \). Hence we may drop them as arguments of \(f^{(k)}_{(011)}\). As \(f_{(011)}\) is a bounded function, the sequence of push-forward measures defined by the left-hand side of (36) has uniformly bounded supports. Consequently, there exists a limiting probability measure \(\mu \) such that along a subsequence (not relabeled) we have for all \(\psi \in C(\mathbb {R}^2)\) in the limit \(k\rightarrow \infty \) that

$$\begin{aligned} \int _{B_{1}\left( 0\right) } \psi \left( \left( -f_{\nu _1^+}^{(k)},f_{\nu _1^+}^{(k)}\right) (\xi )\right) \, \mathrm {d}\xi \rightarrow \int _{\mathbb {R}^2} \psi (\hat{f}) \, \mathrm {d}\mu (\hat{f}). \end{aligned}$$

Finally, towards a contradiction we assume \(\mu = \smash {\delta _{\hat{f}}}\) for some \(\hat{f} \in \mathbb {R}^2\). Then for all \(k\in \mathbb {N}\) we would have

$$\begin{aligned}&\fint _{(\alpha _k -\delta _k, \alpha _k + \delta _k)} \left| f_{(011)}(s) - \fint _{(\alpha _k -\delta _k, \alpha _k + \delta _k)} f_{(011)}(\tilde{s}) \, \mathrm {d}\tilde{s} \right| \, \mathrm {d}s \nonumber \\&\quad = \fint _{(\alpha _k -\delta _k, \alpha _k + \delta _k)} \left| f_{(011)}(s) - \hat{f}_2 - \fint _{(\alpha _k -\delta _k, \alpha _k + \delta _k)} \left( f_{(011)}(\tilde{s}) -\hat{f}_2 \right) \, \mathrm {d}\tilde{s} \right| \, \mathrm {d}s \nonumber \\&\quad \le 2 \fint _{(\alpha _k -\delta _k, \alpha _k + \delta _k)} \left| f_{(011)}(s) - \hat{f}_2 \right| \, \mathrm {d}s. \end{aligned}$$
(37)

Testing the convergence (36) with the function \(\psi (\hat{g}) \,{:}{=} \, |\hat{g} _2 - \hat{f}_2|\) for \(\hat{g} \in \mathbb {R}^2\) and using the assumption \(\mu = \delta _{\hat{f}}\), we would see in the limit \(k\rightarrow \infty \) that

$$\begin{aligned} \fint _{(\alpha _k-\delta _k,\alpha _k+\delta _k)} \left| f_{(011)}(s) -\hat{f}_2 \right| \, \mathrm {d}s&= \int _{-\frac{1}{2}}^\frac{1}{2} \left| f_{(011)}(\alpha _k +2 \delta _k s) - \hat{f}_2 \right| \, \mathrm {d}s \nonumber \\&\le C \fint _{B_{1}\left( 0\right) } \left| f_{\nu _1^+}^{(k)}(\xi ) - \hat{f}_2\right| \, \mathrm {d}\xi \rightarrow 0. \end{aligned}$$
(38)

However, the combination of (37) and (38) would contradict the condition (28).

Before we come to the third step, we define for \((\beta , \gamma )\in B_{1/8}^{\,2}(0)\) the two shifts \(z_2,z_3 \in L^\infty (B_{1/8}^{\,2}(0))\) via

$$\begin{aligned} z_2(\beta ,\gamma )&\,{:}{=}\, \left( f_{(110)} + f_{(1\overline{1}0)} - f_{(01\overline{1})} + g_2\right) \circ X(\beta ,\gamma ), \end{aligned}$$
(39)
$$\begin{aligned} z_3(\beta ,\gamma )&\, {:}{=}\, \left( f_{(01\overline{1})} - f_{(101)} -f_{(\overline{1}01)} + g_3 \right) \circ X(\beta ,\gamma ), \end{aligned}$$
(40)

and the push-forward measure \(\bar{\mu }_{\beta ,\gamma }\) on \(\mathbb {R}^3\) via setting, for \(\psi \in C(\mathbb {R}^3)\),

$$\begin{aligned} \bar{\mu }_{\beta ,\gamma } (\psi )\, {:}{=}\, \int _{\mathbb {R}^2} \psi \left( (\theta _1\circ X,\hat{f} + (z_2,z_3))(\beta ,\gamma )\right) \, \mathrm {d}\mu (\hat{f}). \end{aligned}$$
(41)

Step 3: For almost all \((\beta ,\gamma ) \in B_{1/8}^{\,2}(0)\) and all \(\psi \in C(\mathbb {R}^3)\) we have in the limit \(k\rightarrow \infty \) that

$$\begin{aligned} \fint _{B_{1}\left( 0\right) } \psi \left( \theta ^{(k)}(\beta ,\gamma ;\xi )\right) \, \mathrm {d}\xi \rightarrow \bar{\mu }_{\beta ,\gamma }\left( \psi \right) , \end{aligned}$$
(42)

and the measure \(\bar{\mu }_{\beta ,\gamma }\) is supported on \(\smash {\widetilde{\mathscr {K}}}\), see definition (12).

The previous calculations immediately give that for almost all \((\beta ,\gamma ) \in \smash {B_{1/8}^{\,2}(0)}\) the sequence \(\smash {\theta _1^{(k)}}(\beta ,\gamma ;\bullet )\) converges strongly in \(L^1({B_{1}\left( 0\right) })\) as \(k\rightarrow \infty \) to

$$\begin{aligned} \theta _1\circ X(\beta ,\gamma ) = \left( f_{(101)} + f_{(\overline{1}01)} - f_{(110)} - f_{(1\overline{1}0)} + g_1\right) \circ X(\beta ,\gamma ). \end{aligned}$$
(43)

Similarly, for almost all \((\beta ,\gamma ) \in \smash {B_{1/8}^{\,2}(0)}\) the blow-ups \((\theta _2^{(k)} + f^{(k)}_{(011)})(\beta ,\gamma ;\bullet )\) and \((\smash {\theta _3^{(k)} - f^{(k)}_{(011)}})(\beta ,\gamma ;\bullet )\) converge strongly in \(L^1({B_{1}\left( 0\right) })\) to \(z_2(\beta ,\gamma )\) and \(z_3(\beta ,\gamma )\) in the limit \(k\rightarrow \infty \).

As the required convergence (42) is induced by the metrizable weak\(^*\)-topology on compactly-supported measures, we only have to identify the limit along subsequences, which may depend on \(\beta \) and \(\gamma \), of arbitrary subsequences. Let \((\beta ,\gamma ) \in B_{1/8}^{\,2}(0)\). Given a subsequence, we may extract a further subsequence (neither is relabeled) to upgrade the above convergences in the limit \(k\rightarrow \infty \) to pointwise convergence for almost all \(\xi \in {B_{1}\left( 0\right) }\) of the sequences \(\theta _1^{(k)}(\beta ,\gamma ;\xi )\), \((\theta _2^{(k)} + f^{(k)}_{(011)})(\beta ,\gamma ;\xi )\) and \(\smash {(\theta _3^{(k)} - f^{(k)}_{(011)})(\beta ,\gamma ;\xi )}\) to expressions independent of \(\xi \). Applying Egoroff’s theorem, for any \(\varepsilon >0\) there exists a measurable set \(K_\varepsilon \subset {B_{1}\left( 0\right) }\) such that \(| {B_{1}\left( 0\right) } {\setminus } K_\varepsilon | < \varepsilon \) and such that on \(K_\varepsilon \) these convergences are uniform. Consequently, suppressing \(\beta \) and \(\gamma \), and in the last step exploiting the definitions (36) and (41) of \(\mu \) and \(\bar{\mu }\), respectively, we get for all \(\psi \in C(\mathbb {R}^3)\) that

$$\begin{aligned}&\limsup _{k\rightarrow \infty } \left| \fint _{B_{1}\left( 0\right) } \psi \left( \theta ^{(k)}(\xi )\right) \, \mathrm {d}\xi -\int _{\mathbb {R}^3} \psi \, \mathrm {d}\bar{\mu } \right| \\&\quad \le \limsup _{k\rightarrow \infty } \left| \fint _{{B_{1}\left( 0\right) }} \psi \left( \theta _1\circ X, z_2 - f^{(k)}_{(011)}(\xi ) ,z_3+ f^{(k)}_{(011)}(\xi )\right) \, \mathrm {d}\xi -\int _{\mathbb {R}^3} \psi \, \mathrm {d}\bar{\mu } \right| \\&\qquad + 2 \Vert \psi \Vert _\infty | {B_{1}\left( 0\right) } {\setminus } K_\varepsilon | \\&\quad \le 2 \Vert \psi \Vert _\infty \varepsilon . \end{aligned}$$

As \(\varepsilon >0\) was arbitrary, the convergence (42) follows. Testing with \(\psi = {\text {dist}}\smash {\left( \bullet ,\widetilde{\mathscr {K}}\right) }\) we obtain \({\text {supp}} \bar{\mu } \subset \widetilde{\mathscr {K}}\).

Step 4: For some \(0<b<1\) and some measurable set \(B \subset B_{1/8}^{\,2}(0)\) we have that \(\theta _1\circ X = b\chi _B\) almost everywhere. Furthermore, there exist \(\tilde{z}_2, \tilde{z}_3 \in \mathbb {R}\) such that for almost all \((\beta ,\gamma ) \in B\) we have \((z_2,z_3)(\beta ,\gamma ) = (\tilde{z}_2,\tilde{z}_3)\).

Note that what we claim to prove in Step 4 is an empty statement if \(\theta _1 \circ X \equiv 0\) a.e. in \(B_{1/8}^{\,2}(0)\). We may thus suppose that

$$\begin{aligned} B\,:= & {} \, \left\{ (\beta ,\gamma ) \in B_{1/8}^{\,2}(0):\, \theta _1\circ X(\beta ,\gamma ) > 0 \right. \\&\quad \left. \text { and the conclusions of Steps 2 and 3 hold} \right\} \end{aligned}$$

satisfies \(|B|>0\). Let \(T_z\) for \(z\in \mathbb {R}^2\) be the translation operator acting on measures \(\hat{\mu }\) on \(\mathbb {R}^2\) via the formula \((T_z \hat{\mu })(A) = \hat{\mu } (A-z)\) for Borel sets \(A\subset \mathbb {R}^2\). Let \((\beta ,\gamma ) \in B\). We have by Step 3 and definition (41), see also Fig. 20, that

$$\begin{aligned} {\text {supp}} \bar{\mu }_{\beta ,\gamma } \subset \widetilde{\mathscr {K}} \cap \left\{ \hat{\theta } \in \mathbb {R}^3: \hat{\theta }_1 = \theta _1 \circ X(\beta ,\gamma )\right\} , \end{aligned}$$

and thus in combination with definition (12) and \(\theta _1\circ X(\beta ,\gamma )>0\) that

$$\begin{aligned} {\text {supp}}T_{-(z_2,z_3)(\beta ,\gamma )}\mu \subset \left\{ \left( 0,1-\theta _1\circ X(\beta ,\gamma )\right) , \left( 1-\theta _1\circ X(\beta ,\gamma ),0\right) \right\} . \end{aligned}$$

Together with the fact that \(\mu \) is not a Dirac measure by Step 2 we therefore obtain \(0<\lambda <1\) and \(\hat{f}, \hat{g} \in \mathbb {R}^2\) with \(\hat{f} \ne \hat{g}\) such that

$$\begin{aligned} \mu = \lambda \delta _{\hat{f}} + (1-\lambda ) \delta _{\hat{g}}. \end{aligned}$$
Fig. 20
figure 20

For all \((\beta ,\gamma ) \in B\), the intersection of the dotted line with \(\mathscr {K}\) is given by the two points \(\theta _1\circ X(\beta ,\gamma ) e_1 + (1-\theta _1\circ X(\beta ,\gamma ))e_3\) and \(\theta _1\circ X(\beta ,\gamma ) e_1 + (1-\theta _1\circ X(\beta ,\gamma ) )e_2\)

Consequently, we get

$$\begin{aligned} \{\hat{f},\hat{g}\} - (z_2, z_3)(\beta ,\gamma ) = \left\{ \left( 0,1-\theta _1\circ X(\beta ,\gamma )\right) , \left( 1-\theta _1\circ X(\beta ,\gamma ),0\right) \right\} . \end{aligned}$$
(44)

Calculating the distance of the two points in both representations gives

$$\begin{aligned} 2\left( 1-\theta _1\circ X(\beta ,\gamma )\right) = |\hat{f} - \hat{g}|>0. \end{aligned}$$

Therefore, we have \(\theta _1\circ X< 1\) on B. Furthermore, as \(\mu \) is independent of \((\beta ,\gamma )\), so are \(\hat{f}\) and \(\hat{g}\), which implies that there exists \(b\in (0,1)\) such that \(\theta _1\circ X \equiv b \) on B. To see that \((z_2,z_3)\) is constant on B, note that for all \((\beta ,\gamma ), (\tilde{\beta }, \tilde{\gamma }) \in B\) the representation (44) implies

$$\begin{aligned} \{\hat{f},\hat{g}\} - (z_2, z_3)(\beta ,\gamma ) = \{\hat{f},\hat{g}\} - (z_2, z_3)(\tilde{\beta }, \tilde{\gamma }). \end{aligned}$$

As a non-empty set which is invariant under a single, non-vanishing shift has to be at least countably infinite, we see that there exist \(\tilde{z}_2,\tilde{z}_3 \in \mathbb {R}\) such that for all \((\beta ,\gamma ) \in B\) we have \((z_2,z_3)(\beta ,\gamma ) = (\tilde{z}_2,\tilde{z}_3)\).

Step 5: We can choose a sufficiently small, universal number \(r \in (0,\frac{1}{64})\) such that if \(|B\cap B_r^{\,2}(0)| > 0\), then there exists \(d \in \mathscr {D}\), see definition (10), such that the configuration is planar on \({B_{r}\left( 0\right) }\) with respect to d.

By the decomposition of \(\theta _1\circ X\), see (43), and its interplay with the coordinates X, see (33)–(35), there exist \(\lambda _1\), \(\lambda _2\), \(c\in \mathbb {R}\) such that, setting for almost all \(\beta , \gamma \in (-\frac{1}{16},\frac{1}{16})\)

$$\begin{aligned} F_1(\beta )&\,{:}{=}\, f_{(101)}(\beta )- f_{(1\overline{1}0)}(\beta - \alpha ) + \lambda _1 \beta ,\\ F_2(\gamma )&\, {:}{=}\, f_{(\overline{1}01)}(\alpha -\gamma ) - f_{(110)}(\gamma ) + \lambda _2 \gamma + c \end{aligned}$$

we have

$$\begin{aligned} \theta _1 \circ X(\beta ,\gamma )&= f_{(101)}(\beta ) + f_{(\overline{1}01)}(\alpha -\gamma ) - f_{(110)}(\gamma ) - f_{(1\overline{1}0)}(\beta - \alpha ) + \lambda _1 \beta + \lambda _2 \gamma + c\\&= F_1(\beta ) + F_2(\gamma ). \end{aligned}$$

As by Step 4 the function \(\theta _1\circ X\) takes at most two values almost everywhere, either \(F_1\) or \(F_2\) has to be constant almost everywhere on \((-\frac{1}{16},\frac{1}{16})\): if both took at least two essential values on sets of positive measure, the sum \(F_1 + F_2\) would take at least three.

We only deal with the case of \(F_2\) being constant; the argument for the other case works analogously. Consequently, we get a measurable set \(D \subset (-r,r)\) such that \(|D|>0\) and \(D\times (-\frac{1}{16},\frac{1}{16}) \subset B\). We will follow the notation of Capella and Otto [11] in writing discrete derivatives of a function \(\phi : I \rightarrow \mathbb {R}\) on a non-empty, open interval \(I\subset \mathbb {R}\) for \(\gamma , h \in \mathbb {R}\) with \(\gamma , \gamma + h \in I\) as

$$\begin{aligned} \partial _\gamma ^h \phi (\gamma ) \,{:}{=}\, \phi (\gamma + h) - \phi (\gamma ). \end{aligned}$$
(45)
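As an illustration of the notation (45), and not part of the proof, the following sketch (with helper names of our choosing) checks that second discrete differences \(\partial ^{\tilde{h}}\partial ^h \phi \) vanish identically for an affine \(\phi \) and equal \(2h\tilde{h}\) for \(\phi (t) = t^2\); this dichotomy is the mechanism behind the appeals to [11, Lemma 3.11] and Lemma 13 below:

```python
# Helper names here are ours, chosen for illustration.
def discrete_derivative(phi, gamma, h):
    """Discrete derivative from (45): phi(gamma + h) - phi(gamma)."""
    return phi(gamma + h) - phi(gamma)

def second_difference(phi, t, h, h_tilde):
    # d^{h_tilde} d^{h} phi(t) = phi(t+h+h_tilde) - phi(t+h) - phi(t+h_tilde) + phi(t)
    return discrete_derivative(lambda s: discrete_derivative(phi, s, h), t, h_tilde)

affine = lambda t: 3.0 * t - 1.0   # affine: second differences vanish
quadratic = lambda t: t * t        # non-affine comparison: they do not

assert abs(second_difference(affine, 0.2, 0.05, -0.03)) < 1e-12
# For phi(t) = t^2 the second difference equals 2 * h * h_tilde.
assert abs(second_difference(quadratic, 0.2, 0.05, -0.03) - 2 * 0.05 * (-0.03)) < 1e-12
print("affine functions have vanishing second differences")
```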

We proved in Step 4 that the shift \((z_2,z_3)\) is constant almost everywhere on B. Thus we get for almost all \(\gamma \in (-\frac{1}{32},\frac{1}{32})\), \(\beta \in D\) and \(h\in (-\frac{1}{32},\frac{1}{32})\) that

$$\begin{aligned} 0&{=} \partial _\gamma ^h z_2 \circ X(\beta ,\gamma ) \overset{(39)}{=} \partial _\gamma ^h \left( f_{(110)} + f_{(1\overline{1}0)} - f_{(01\overline{1})} + g_2 \right) \circ X(\beta ,\gamma ) \nonumber \\&\overset{(33) - (35)}{=} \partial _\gamma ^h \left( f_{(110)}(\gamma ) + f_{(1\overline{1}0)}(\beta - \alpha ) - f_{(01\overline{1})}(\gamma -\beta )\right) + \partial _\gamma ^h g_2 \circ X(\beta ,\gamma ) \nonumber \\&{=} \partial _\gamma ^h \left( f_{(110)}(\gamma ) - f_{(01\overline{1})}(\gamma -\beta )\right) + \partial _\gamma ^h g_2 \circ X(\beta ,\gamma ) . \end{aligned}$$
(46)

The fact that \(g_2\) is affine implies that \(\partial _\gamma ^h g_2 \circ X\) is independent of \(\beta \). Thus, “differentiating” again under the constraint \(\beta \), \(\tilde{\beta } \in D\), we get for almost all \(\gamma \in (-\frac{1}{32},\frac{1}{32})\) and \( h \in (-\frac{1}{32},\frac{1}{32})\) that

$$\begin{aligned} 0 = \partial _\gamma ^h f_{(01\overline{1})}(\gamma - \beta ) - \partial _\gamma ^h f_{(01\overline{1})}(\gamma - \tilde{\beta }). \end{aligned}$$

Fix \(\beta \in D\). Setting \(t\, {:}{=}\, \gamma -\beta \) and \(\tilde{h}\, {:}{=}\, \beta - \tilde{\beta }\) the above turns into \(\partial ^{\tilde{h}} \partial ^h f_{(01\overline{1})}(t) = 0\) for almost all \(t\in (-\frac{1}{64},\frac{1}{64})\), \(h \in (-\frac{1}{32},\frac{1}{32})\) and \(\tilde{h} \in -D +\beta \) due to \(D\subset (-\frac{1}{64},\frac{1}{64})\). As a result of \(| - D + \beta | >0\), we can choose a sufficiently small, universal number \(r \in (0,\frac{1}{64})\) and apply [11, Lemma 3.11] to get for almost all \(t \in (-4r,4r)\) and shifts \(h,\tilde{h} \in (-4r,4r)\) that

$$\begin{aligned} \partial ^h\partial ^{\tilde{h}} f_{(01\overline{1})}(t) = 0. \end{aligned}$$

Consequently, the function \(f_{(01\overline{1})}\) is affine on \((-4r,4r)\), see e.g. Lemma 13 in Appendix A. Referring back to Equation (46) we see that \(f_{(110)}\) is affine on \((-2r,2r)\) as well.

The upshot is that for \(x \in {B_{2r}\left( 0\right) }\) and with the affine function

$$\begin{aligned} \tilde{g}_2(x) \,{:}{=}\, f_{(110)}\left( \frac{1}{\sqrt{2}}(110)\cdot x\right) - f_{(01\overline{1})}\left( \frac{1}{\sqrt{2}}(01\overline{1})\cdot x\right) + g_2(x) \end{aligned}$$

the decomposition (19) for \(\theta _2\) can almost everywhere be re-written as

$$\begin{aligned} \theta _2(x) = -f_{(011)}\left( \frac{1}{\sqrt{2}}(011)\cdot x\right) + f_{(1\overline{1}0)}\left( \frac{1}{\sqrt{2}}(1\overline{1}0)\cdot x\right) + \tilde{g}_2(x). \end{aligned}$$
(47)

By Equation (46) we furthermore have \(\partial _\gamma \tilde{g}_2 \circ X = 0 \) on \(B_{2r}^{\,2}(0)\). In the standard basis of \(\mathbb {R}^3\) this translates to

$$\begin{aligned} \partial _{[11\overline{1}]} \tilde{g}_2 = 0 \text { on } {B_{2r}\left( 0\right) }, \end{aligned}$$

since \(\partial _\gamma \) corresponds to differentiating in the direction of \([11\overline{1}]\) by Equation (32) and \(\partial _{[11\overline{1}]} \tilde{g}_2\) is constant on \({B_{2r}\left( 0\right) }\).

The analogue of (46) using \(z_3\) rather than \(z_2\) gives that \(f_{(\overline{1}01)}\) is affine on \((-r,r)\) and that we may find an affine function \(\tilde{g}_3\) with \(\partial _{[11\overline{1}]} \tilde{g}_3= 0\) such that for almost all \(x\in {B_{r}\left( 0\right) }\) we have

$$\begin{aligned} \theta _3(x) = f_{(011)}\left( \frac{1}{\sqrt{2}} (011) \cdot x \right) - f_{(101)} \left( \frac{1}{\sqrt{2}} (101) \cdot x \right) + \tilde{g}_3(x). \end{aligned}$$
(48)

For almost all \(x\in {B_{r}\left( 0\right) }\), the assumption \(\theta _1(x) + \theta _2(x) + \theta _3(x) = 1\), see also (12), and the observations \(\partial _{[11\overline{1}]} \theta _2(x) = \partial _{[11\overline{1}]} \theta _3(x) = 0\), due to (47) and (48), imply \(\partial _{[11\overline{1}]} \theta _1(x) = 0\). Together with the decomposition (19) we see for \(x\in {B_{r}\left( 0\right) }\) that the affine function

$$\begin{aligned} \tilde{g}_1(x)\, {:}{=}\, f_{(\overline{1}01)}\left( \frac{1}{\sqrt{2}}(\overline{1}01) \cdot x \right) - f_{(110)}\left( \frac{1}{\sqrt{2}}(110) \cdot x \right) +g_1(x) \end{aligned}$$

satisfies

$$\begin{aligned} \partial _{[11\overline{1}]} \tilde{g}_1(x) = \partial _{[11\overline{1}]} \theta _1(x) = 0 \end{aligned}$$

as well and we almost everywhere get the decomposition

$$\begin{aligned} \theta _1(x) = f_{(101)}\left( \frac{1}{\sqrt{2}}(101) \cdot x \right) - f_{(1\overline{1}0)}\left( \frac{1}{\sqrt{2}}(1\overline{1}0) \cdot x \right) + \tilde{g}_1(x). \end{aligned}$$
(49)

Equations (47)–(49), together with the affine functions \(\tilde{g}_i\) being independent of the \([11\overline{1}]\)-direction, constitute planarity of the configuration on \({B_{r}\left( 0\right) }\), see Definition 3.

Step 6: There exists a universal constant \(\tilde{r} \in (0,r)\) such that if we have \(|B\cap B_r^{\,2}(0)|=0\), i.e., \(\theta _1 \circ X(\beta ,\gamma ) = 0\) for almost all \((\beta ,\gamma ) \in B^{\,2}_r(0)\), then the solution u is a two-variant configuration on \({B_{\tilde{r}}\left( 0\right) }\).

As the plane \(H(\alpha ,\smash {\frac{1}{\sqrt{2}}}(011))\) contains plenty of lines parallel to \(E_1\), see Fig. 19b, an application of Lemma 7 together with \(\alpha \in [-\tilde{r},\tilde{r}]\) ensures that \(\theta _1 \equiv 0\) on \({B_{\tilde{r}}\left( 0\right) }\) for some universal \(\tilde{r} \in (0,r)\). Corollary 1 then implies that we are dealing with a two-variant configuration on \({B_{\tilde{r}}\left( 0\right) }\). \(\quad \square \)

Proof of Lemma 6

Without loss of generality, we may assume

$$\begin{aligned} \hbox {ess inf}_{x_1,x_2 \in (0,1)} \left( f(x_1) + g(x_2)\right) = 0 \ge c. \end{aligned}$$
(50)

Step 1: We have \(\hbox {ess inf}f + \hbox {ess inf}g \ge 0\).

Let \(\delta >0\). We know that

$$\begin{aligned} \left| \left\{ t \in (0,1) : f(t) < \hbox {ess inf}f + \frac{\delta }{2}\right\} \right| >0 \end{aligned}$$

and

$$\begin{aligned} \left| \left\{ t \in (0,1) : g(t) < \hbox {ess inf}g + \frac{\delta }{2}\right\} \right| >0. \end{aligned}$$

Consequently, we have that

$$\begin{aligned} \left| \left\{ x \in (0,1)^2 : f(x_1) + g(x_2) < \hbox {ess inf}f + \hbox {ess inf}g + \delta \right\} \right| >0. \end{aligned}$$

With (50) we thus know for all \(\delta >0\) that \(-\delta \le \hbox {ess inf}f + \hbox {ess inf}g\), which gives the claim.

Step 2: Statement 1 implies statement 3.

For almost all \(x \in (0,1)^2\) we know that

$$\begin{aligned} \varepsilon + c\ge f(x_1) + g(x_2) \ge \hbox {ess inf}f + g(x_2) \ge \hbox {ess inf}f + \hbox {ess inf}g \ge 0\ge c. \end{aligned}$$

In particular, we know

$$\begin{aligned} \hbox {ess inf}f + \hbox {ess inf}g \le \varepsilon +c . \end{aligned}$$

By Fubini’s Theorem there exists an \(x_2 \in (0,1)\) such that for almost all \(x_1 \in (0,1)\) we have

$$\begin{aligned} \varepsilon +c \ge f(x_1) + g(x_2) \ge \hbox {ess inf}f + g(x_2)\ge c. \end{aligned}$$

With such an \(x_2 \in (0,1)\) we thus see for almost all \(x_1 \in (0,1)\) that

$$\begin{aligned} f(x_1) -\hbox {ess inf}f = f(x_1) + g(x_2) - ( \hbox {ess inf}f + g(x_2))\le \varepsilon . \end{aligned}$$

A similar argument ensures \(g \le \hbox {ess inf}g + \varepsilon \).

Step 3: Conclusion.

The proof for the implication “2 \(\implies \) 3” is very similar to Step 2. Lastly, if \(\varepsilon = 0\), the implications “3 \(\implies \) 1, 2” are trivial. \(\quad \square \)
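The following numeric sketch illustrates the implication “1 \(\implies \) 3” on sampled step functions. The grids and values are invented for illustration, under the reading that statement 1 bounds \(f(x_1) + g(x_2)\) by \(\varepsilon + c\) almost everywhere:

```python
# Numeric illustration of the implication "1 => 3" on sampled functions.
# The sample values are invented; eps plays the role of epsilon in the lemma.
eps = 0.1
f = [0.30, 0.33, 0.35, 0.31]   # samples of f on a grid in (0, 1)
g = [-0.30, -0.27, -0.26]      # samples of g on a grid in (0, 1)

# Normalization (50): the minimal pairwise sum plays the role of c = 0.
c = min(fi + gj for fi in f for gj in g)

# Hypothesis of statement 1: every pairwise sum lies within eps of c.
assert all(fi + gj <= c + eps for fi in f for gj in g)

# Conclusion of statement 3, checked on the samples: each function lies
# within eps of its own infimum.
assert all(fi <= min(f) + eps for fi in f)
assert all(gj <= min(g) + eps for gj in g)
print("statement 3 holds on this sample")
```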

Proof of Lemma 7

The radius \(r>0\) is only required to ensure that \(P \subset {B_{1}\left( 0\right) }\). After choosing it correspondingly, we may thus translate, re-scale and use the symmetries of the problem to only work in the case \(i=1\), \(x_0 = 0\) and \(I=[-1,1]\). For all \(\nu \in N_2 \cup N_3\) these additional assumptions imply

$$\begin{aligned} \nu \cdot l(I) = \sqrt{2}\,(E_1\cdot \nu )\, [-1,1] = [-1,1] \end{aligned}$$

and, consequently, \(P = \bigcap _{\nu \in N_{2} \cup N_{3}} \{x \in \mathbb {R}^3: |\nu \cdot x| \le 1\}\). Furthermore, we only have to deal with the case \(\theta _1\circ l \le \varepsilon \), as the other one can be dealt with by working with \(\tilde{\theta }_j\, {:}{=}\, 1-\theta _j\) for \(j=1,2,3\). We remind the reader that Fig. 13 depicts the general strategy of the proof.

Step 1: Extend \(0 \le \theta _1 \le \varepsilon \) to the plane \(H\big (0,\frac{1}{\sqrt{2}}(011)\big )\).

For \((\alpha ,\beta ) \in [-1,1]^2\) we parametrize \(H\big (0,\smash {\frac{1}{\sqrt{2}}}(011)\big )\) via

$$\begin{aligned} X(\alpha ,\beta )\, {:}{=}\, \alpha \frac{1}{\sqrt{2}}[11\overline{1}] + \beta \frac{1}{\sqrt{2}}[1\overline{1}1] . \end{aligned}$$

As usual, we omit the factor \(\frac{1}{\sqrt{2}}\) in the index of \(f_\nu \) for \(\nu \in N\), see decomposition (20). By said decomposition and the existence of traces, see Lemma 4, we have for almost all \((\alpha ,\beta ) \in [-1,1]^2 \) that

$$\begin{aligned} 0\le \theta _1 \circ X \left( \alpha , \beta \right) = f_{(\overline{1}01)}(-\alpha ) - f_{(110)}( \alpha ) + f_{(101)}(\beta ) - f_{(1\overline{1}0)}(\beta ) \le 1. \end{aligned}$$

As for \(t\in [-1,1]\) the line t(1, 1) parametrizes the diagonal and \(l(t)=X(t,t)\), the assumption (21) of \(\theta _1\) almost achieving its minimum along l and the two-dimensional statement Lemma 6 imply that for almost all points \(\alpha ,\beta \in [-1,1]\) we have

$$\begin{aligned} f_{(\overline{1}01)}(-\alpha ) - f_{(110)}( \alpha ) \le&\hbox {ess inf}_{\tilde{\alpha } \in [-1,1]} \left( f_{(\overline{1}01)}(-\tilde{\alpha }) - f_{(110)}(\tilde{\alpha }) \right) + \varepsilon , \\ f_{(101)}(\beta ) - f_{(1\overline{1}0)}(\beta ) \le&\hbox {ess inf}_{\tilde{\beta }\in [-1,1]} \left( f_{(101)}\left( \tilde{\beta }\right) - f_{(1\overline{1}0)}\left( \tilde{\beta }\right) \right) + \varepsilon \end{aligned}$$

and

$$\begin{aligned} \hbox {ess inf}_{\tilde{\beta }\in [-1,1]} \left( f_{(101)}\left( \tilde{\beta }\right) - f_{(1\overline{1}0)}\left( \tilde{\beta }\right) \right) + \hbox {ess inf}_{\tilde{\alpha } \in [-1,1]} \left( f_{(\overline{1}01)}(-\tilde{\alpha }) - f_{(110)}(\tilde{\alpha }) \right) \le \varepsilon . \end{aligned}$$

Adding the first two inequalities and using the assumption (21) we get for almost all \((\alpha ,\beta ) \in [-1,1]^2\) that

$$\begin{aligned} 0 \le \theta _1 \circ X \left( \alpha , \beta \right) \le 3\varepsilon . \end{aligned}$$

Changing coordinates to \(y\, {:}{=}\, \frac{1}{2} \left( \alpha + \beta \right) \), \(z \,{:}{=}\, \frac{1}{2} \left( \alpha - \beta \right) \) we see for almost all \((y,z)\in \mathbb {R}^2\) with \(y + z\), \(y - z \in [-1,1]\) that

$$\begin{aligned} 0 \le \theta _1\left( \sqrt{2} (y, z,- z)\right) \le 3\varepsilon . \end{aligned}$$
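The change of coordinates can be checked directly. Writing the parametrization as \(X(\alpha ,\beta ) = \alpha \frac{1}{\sqrt{2}} [11\overline{1}] + \beta \frac{1}{\sqrt{2}} [1\overline{1}1]\) (an assumption here, as the definition of X is fixed earlier in the proof; it is consistent with \(l(t) = X(t,t) = \sqrt{2}tE_1\) and with the traces in the preceding displays), we obtain

$$\begin{aligned} X(\alpha ,\beta ) = \frac{1}{\sqrt{2}} \left( \alpha + \beta , \alpha - \beta , \beta - \alpha \right) = \sqrt{2} \left( y, z, -z \right) \end{aligned}$$

with \(y = \frac{1}{2} \left( \alpha + \beta \right) \) and \(z = \frac{1}{2} \left( \alpha - \beta \right) \).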

Step 2: Prove inequality (22) on a subset of P of full measure.

For all \(y \in \mathbb {R}\) we have \(y + z\), \(y - z \in [-1,1]\) if and only if \(y \in I(z)\, {:}{=}\, \left[ -1 + |z|,\right. \left. 1 - |z|\right] \). Therefore, Fubini’s theorem implies for almost all \(z \in [-1,1]\) and for almost all \(y \in I(z)\) that

$$\begin{aligned} 0\le \theta _1\left( \sqrt{2}(y,z,-z)\right) \le 3\varepsilon . \end{aligned}$$
(51)

We may thus repeat the above argument for almost all \(z \in [-1,1]\) with \(\tilde{l}(t) = \sqrt{2}t E_1 + \sqrt{2}(0,z,-z)\) for \(t\in I(z)\) and the plane \(H\big (2z,\frac{1}{\sqrt{2}}(01\overline{1})\big )\) to see for almost all \(\alpha ,\beta \in I(z)\) that

$$\begin{aligned} 0 \le \theta _1\left( \sqrt{2} (0,z,-z) + \alpha \frac{1}{\sqrt{2}}[111] + \beta \frac{1}{\sqrt{2}}[\overline{1}11]\right) \le 9\varepsilon . \end{aligned}$$

It is straightforward to check that the conditions \(z\in [-1,1]\) and \(\alpha ,\beta \in I(z)\) are equivalent to

$$\begin{aligned} \sqrt{2} (0,z,-z) + \alpha \frac{1}{\sqrt{2}}[111] + \beta \frac{1}{\sqrt{2}}[\overline{1}11] \in P. \end{aligned}$$

Due to measurability of \(\theta _1\), another application of Fubini’s theorem implies that for almost all \(x\in P\) we have the inequality

$$\begin{aligned} 0 \le \theta _1(x) \le 9\varepsilon . \end{aligned}$$

Step 3: Prove that for all \(\nu \in N_2 \cup N_3\) the function \(f_\nu \) is almost affine in the sense of estimate (23) if \(f_{\tilde{\nu }} \in C([-1,1])\) for all \(\tilde{\nu } \in N\).

We will only deal with \(\nu = \smash {\frac{1}{\sqrt{2}}(101)}\). The advantage of working with continuous functions is that we do not have to bother with sets of measure zero. Let \((s,h,\tilde{h}) \in \mathbb {R}^3\) be such that s, \(s + h\), \(s + \tilde{h}\), \(s + h + \tilde{h} \in [-1,1]\). In order to exploit Remark 1 we set

$$\begin{aligned} x_1&\, {:}{=}\, \sqrt{2}sE_1 ,\\ x_2&\, {:}{=}\, \sqrt{2}sE_1 + h \frac{1}{\sqrt{2}} [111],\\ x_3&\, {:}{=}\, \sqrt{2}sE_1 + \tilde{h} \frac{1}{\sqrt{2}} [1\overline{1}1],\\ x_4&\, {:}{=}\, \sqrt{2}sE_1 + h \frac{1}{\sqrt{2}} [111] + \tilde{h} \frac{1}{\sqrt{2}} [1\overline{1}1]. \end{aligned}$$

Let \(j\in \{1,2,3,4\}\). To prove \(x_j \in \overline{P}\), we check \(x_j \cdot \tilde{\nu } \in [-1,1]\) for all \(\tilde{\nu } \in N_2 \cup N_3\): For \(\tilde{\nu }=\frac{1}{\sqrt{2}} (101)\), this is clearly the case due to \(x_1 \cdot \tilde{\nu } = s\) and \(\frac{1}{\sqrt{2}}[111] \cdot \tilde{\nu } = \frac{1}{\sqrt{2}}[1\overline{1}1] \cdot \tilde{\nu } =1 \). In contrast, for the other normal \(\tilde{\nu } = \frac{1}{\sqrt{2}} (\overline{1}01) \in N_2\) we have \(x_1 \cdot \tilde{\nu } = - s\) and \(\frac{1}{\sqrt{2}}[111] \cdot \tilde{\nu } = \frac{1}{\sqrt{2}}[1\overline{1}1] \cdot \tilde{\nu } =0 \), which still implies \(x_j \cdot \tilde{\nu } \in [-1,1]\).

For \(\tilde{\nu } \in N_3\) we have \(x_1 \cdot \tilde{\nu } = s\) and

$$\begin{aligned} \left\{ \frac{1}{\sqrt{2}}[111] \cdot \tilde{\nu } , \frac{1}{\sqrt{2}}[1\overline{1}1] \cdot \tilde{\nu } \right\} =\{0,1\}, \end{aligned}$$

which also implies \(x_j \cdot \tilde{\nu } \in [-1,1]\).

By Step 2 we then have

$$\begin{aligned} |\theta _1(x_4) + \theta _1(x_1) - \theta _1(x_2) - \theta _1(x_3) | \le 36 \varepsilon . \end{aligned}$$

Inserting the decomposition (20) and the definition of the points \(x_j\) for \(j=1,2,3,4\) we see that all functions except \(f_{(101)}\) cancel and we get

$$\begin{aligned} \left| f_{(101)}(s + h + \tilde{h}) + f_{(101)}(s) - f_{(101)}(s + h) - f_{(101)} (s + \tilde{h})\right| \le 36 \varepsilon . \end{aligned}$$
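The cancellation can be verified normal by normal. For instance, for \(\tilde{\nu } = \frac{1}{\sqrt{2}}(\overline{1}01)\) all four points have the same trace \(x_j \cdot \tilde{\nu } = -s\), while for \(\tilde{\nu } = \frac{1}{\sqrt{2}}(110) \in N_3\), with \(\frac{1}{\sqrt{2}}[111] \cdot \tilde{\nu } = 1\) and \(\frac{1}{\sqrt{2}}[1\overline{1}1] \cdot \tilde{\nu } = 0\), the traces pair up. In the alternating combination \(x_4\), \(x_1\), \(-x_2\), \(-x_3\) this gives

$$\begin{aligned} f_{(\overline{1}01)}(-s) + f_{(\overline{1}01)}(-s) - f_{(\overline{1}01)}(-s) - f_{(\overline{1}01)}(-s)&= 0,\\ f_{(110)}(s+h) + f_{(110)}(s) - f_{(110)}(s+h) - f_{(110)}(s)&= 0, \end{aligned}$$

so that each such contribution vanishes and only \(f_{(101)}\) survives.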

\(\square \)

4.5 The Case \(f_\nu \in VMO\) for all \(\nu \in N\)

Proof of Proposition 2

For \(\nu \in N\), \(x \in {B_{2/3}\left( 0\right) }\) and \(\delta \in (0,\smash {\frac{1}{3}})\) we define the convolution \(\theta _\delta \in C({B_{2/3}\left( 0\right) };[0,1]^3)\) via \(\theta _\delta (x)\, {:}{=}\, \smash {\fint _{{B_{\delta }\left( x\right) }} \theta (y) \, \mathrm {d}y}\) and \(f_{\nu ,\delta } \in C([\smash {-\frac{2}{3}},\smash {\frac{2}{3}}])\) such that

$$\begin{aligned} f_{\nu ,\delta }(x\cdot \nu ) = \fint _{{B_{\delta }\left( x\right) }} f_{\nu }(y \cdot \nu ) \, \mathrm {d}y. \end{aligned}$$
(52)

For functions \(u: {B_{1}\left( 0\right) } \rightarrow \mathbb {R}^3\) and \(\theta : {B_{1}\left( 0\right) } \rightarrow \mathscr {\widetilde{K}}\), see definition (12), satisfying the assumptions of the proposition and \(r \in (0,\smash {\frac{1}{3}})\) we say that \((u,\theta ) \in \mathscr {U}_{r}\) if there exist \(\eta >0\), \( \delta _0 \in (0,\smash {\frac{1}{3}})\), a function \(\varepsilon (\delta )>0\) for \(\delta \in (0, \delta _0)\) with \(\varepsilon (\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\) and \(i\in \{1,2,3\}\) such that the following holds:

  1. 1.

    The set

    $$\begin{aligned}&A_{r,\eta , \delta _0,\varepsilon ,i}(u,\theta )\, {:}{=}\, \Bigg \{ x\in {B_{r}\left( 0\right) }: \theta _i(x) = 0, \eta< \theta _{i+1}(x) , \theta _{i-1}(x)< 1 - \eta , \nonumber \\&\quad \quad \theta _{i,\delta } (x)< \varepsilon (\delta ) , \eta< \theta _{i+1,\delta }(x) , \theta _{i-1,\delta }(x) < 1 -\eta \, \forall \delta \in (0, \delta _0) \Bigg \} \end{aligned}$$
    (53)

    satisfies \(|A_{r,\eta , \delta _0,\varepsilon ,i}(u,\theta )|>0\).

  2. 2.

    For all \(\delta \in (0, \delta _0)\) and all \(x \in {B_{2/3}\left( 0\right) }\) we have that

$$\begin{aligned} \theta _\delta (x) \in \widetilde{\mathscr {K}}_{\varepsilon (\delta )}\, {:}{=}\, \left( \widetilde{\mathscr {K}} + {B_{\varepsilon (\delta )}\left( 0\right) }\right) \cap {\text {conv}}\left( \widetilde{\mathscr {K}}\right) . \end{aligned}$$
    (54)

Here \(\smash {{\text {conv}}(\widetilde{\mathscr {K}}\,)}\) denotes the convex hull, see Fig. 21.

Similarly, we say that \((u,\theta ) \in \mathscr {U}^0\) if there exist \(\eta >0\), \(\delta _0 \in (0,\smash {\frac{1}{3}})\), a function \(\varepsilon (\delta )>0\) for \(\delta \in (0,\delta _0)\) with \(\varepsilon (\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\) and \(i\in \{1,2,3\}\) such that we have the inclusion (54) for all \(\delta \in (0,\delta _0)\) on \({B_{1/3}\left( 0\right) }\) and such that \(0\in A_{1/3,\eta , \delta _0,\varepsilon ,i}(u,\theta )\) is a point of density one. By this we mean that as \(\kappa \searrow 0\) we have

$$\begin{aligned} \frac{ |A_{1/3,\eta , \delta _0,\varepsilon ,i}(u,\theta )\cap {B_{\kappa }\left( 0\right) }| }{ |{B_{\kappa }\left( 0\right) }| } \rightarrow 1. \end{aligned}$$
Fig. 21
figure 21

Sketch of the strains taking the form \(e= \sum _{i=1}^3 \theta _ie_i\) for \(\theta \in \smash {\widetilde{\mathscr {K}}_\varepsilon }\). The strain \(e(u_\delta )(0)= \sum _{i=1}^3\theta _{i,\delta } e_i\) essentially lies strictly between \(e_2\) and \(e_3\)

Step 1: Let u and \(\theta \) satisfy the assumptions of the proposition. For \(r \in (0,\smash {\frac{1}{3}})\) assume that \((u,\theta ) \notin \mathscr {U}_r\). Then there exists \(i \in \{1,2,3\}\) such that for almost all \(x\in {B_{r}\left( 0\right) }\) we have \(e(u)(x) = e_i\), which in particular implies \(\theta _i(x) = 1\) and \(\theta _{i+1}(x) = \theta _{i-1}(x) =0\).

We argue by contraposition, assuming that for all \(i=1,2,3\) we have \(e(u) \not \equiv e_i\) on \({B_{r}\left( 0\right) }\). As convolutions are convex operations we obtain \(\theta _\delta \in {\text {conv}}(\smash {\widetilde{\mathscr {K}} }\, )\) a.e. on \({B_{2/3}\left( 0\right) }\) for \(\delta \in (0,\smash {\frac{1}{3}})\). Furthermore, Lemma 8 gives existence of \(\varepsilon _1(\delta )>0\) for \(\delta \in (0,\frac{1}{3})\) such that \(\varepsilon _1(\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\) and such that the fuzzy inclusion (54) holds on \({B_{2/3}\left( 0\right) }\) with \(\varepsilon _1\).

Next, we prove that there exists \(i\in \{1,2,3\}\) such that we have

$$\begin{aligned} \left| \left\{ x\in {B_{r}\left( 0\right) }: \theta _i(x) = 0 , 0< \theta _{i+1}(x), \theta _{i-1}(x) < 1\right\} \right| > 0. \end{aligned}$$
(55)

Otherwise, for almost all \(x\in {B_{r}\left( 0\right) }\) we would by the assumption \(\theta \in \mathscr {\widetilde{K}}\) have \(e(u) \in \{e_1,e_2,e_3\}\). Using the uniform convergence of averages provided by the mean value theorem for VMO-functions, Lemma 8, it would hold for some \(i \in \{1,2,3\}\) that \(e(u) \equiv e_i\) on \({B_{r}\left( 0\right) }\), giving a contradiction.

Let \(i\in \{1,2,3\}\) be the index such that (55) holds, which implies that there exists \(\eta >0\) with

$$\begin{aligned} \left| \left\{ x\in {B_{r}\left( 0\right) }: \theta _i(x) = 0 , 2 \eta< \theta _{i+1}(x), \theta _{i-1}(x) < 1-2 \eta \right\} \right| > 0. \end{aligned}$$
(56)

Lebesgue point theory implies that \(\theta _\delta \rightarrow \theta \) pointwise almost everywhere as \(\delta \rightarrow 0\). Using Egoroff’s theorem, we may upgrade this convergence to uniform convergence on some measurable set

$$\begin{aligned} \widetilde{A} \subset \left\{ x\in {B_{r}\left( 0\right) }: \theta _i(x) = 0 , 2\eta< \theta _{i+1}(x), \theta _{i-1}(x) < 1-2\eta \right\} \end{aligned}$$

with \(|\smash {\widetilde{A}}| >0\). Using both uniform convergences above we get \( \delta _0 \in (0,\frac{1}{3})\) such that there exists \(\varepsilon _2(\delta ) >0 \) for \(\delta \in (0, \delta _0)\) with \(\varepsilon _2(\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\) and with the property that for all \(x\in \smash {\widetilde{A}}\) and \(\delta \in (0, \delta _0)\) we have \( \theta _{i,\delta }(x) < \varepsilon _2(\delta ) \) and

$$\begin{aligned} \eta< \theta _{i+1,\delta }(x) \text {, } \theta _{i-1,\delta }(x) < 1 - \eta . \end{aligned}$$

For \(\varepsilon (\delta )\, {:}{=}\, \max \{\varepsilon _1(\delta ),\varepsilon _2(\delta )\}\) with \(\delta \in (0, \delta _0)\) we thus have \(|A_{r,\eta , \delta _0,\varepsilon ,i}(u,\theta )|>0\) and \((u,\theta ) \in \mathscr {U}_r\).

Step 2: In order to prove the proposition, it is sufficient to prove that there exists a universal radius \( r_2 \in (0,\smash {\frac{1}{3}})\) such that if \((u,\theta ) \in \mathscr {U}^0\) then e(u) is a two-variant configuration on \({B_{r_2}\left( 0\right) }\).

Let \( r\, {:}{=}\, \frac{1}{3} r_{2}\). By Step 1 we know that if \((u,\theta ) \notin \smash { \mathscr {U}_{r}}\) then e(u) is a two-variant configuration on \(\smash {{B_{r}\left( 0\right) }}\) on account of being a pure phase. If we do have \((u,\theta ) \in \smash { \mathscr {U}_{r}}\), then by definition there exist \(\eta >0\), \( \delta _0 \in (0,\frac{1}{3})\), a function \(\varepsilon (\delta )>0\) for \(\delta \in (0,\delta _0)\) with \(\varepsilon (\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\), and \(i\in \{1,2,3\}\) such that \(|A_{ r,\eta , \delta _0,\varepsilon ,i}(u,\theta )|>0\).

Let \(x_0\in A_{ r,\eta , \delta _0,\varepsilon ,i}(u,\theta )\subset {B_{ r}\left( 0\right) }\) be a point of density one. For \(x\in {B_{1}\left( 0\right) }\) and \(i=1,2,3\) we define

$$\begin{aligned} \hat{u} (x)&\,{:}{=} \, \frac{1}{1- r} u((1- r) x+x_0),\\ \hat{\theta }(x)&\,{:}{=}\, \theta ((1- r) x + x_0),\\ \hat{g}_i (x)&\, {:}{=}\, g_i((1- r)x +x_0), \end{aligned}$$

and \(\hat{f}_\nu \in L^\infty (-1,1)\) for \(\nu \in N\) such that

$$\begin{aligned} \hat{f}_\nu (\nu \cdot x) = f_\nu \left( \nu \cdot ( (1- r) x+x_0)\right) . \end{aligned}$$

Then \(\hat{u}\) satisfies the assumptions of the proposition with \(\hat{\theta }\), \(\hat{f}_\nu \) and \(\hat{g}_i\) for \(\nu \in N\) and \(i=1,2,3\). Furthermore, for all \(x\in {B_{1/3}\left( 0\right) }\) and \(\delta \in (0, \delta _0)\) we have \(| (1- r) x + x_0| \le \smash {\frac{2}{3}}\) and \(\hat{\theta }_\delta (x) = \theta _{(1- r) \delta } ( (1- r) x + x_0)\), so that \(\hat{\theta }_\delta \) satisfies the inclusion (54) on \(\smash {{B_{1/3}\left( 0\right) }}\). Additionally, due to \(x_0\in A_{ r,\eta , \delta _0,\varepsilon ,i}(u,\theta )\) being a point of density one we get that \(0 \in A_{1/3,\eta , \delta _0,\varepsilon ,i}(\hat{u},\hat{\theta })\) is also a point of density one. Therefore we have \((\hat{u}, \hat{\theta }) \in \mathscr {U}^0\). By the assumption of Step 2, \(e(\hat{u})\) is a two-variant configuration on \({B_{r_{2}}\left( 0\right) }\) and thus e(u) is a two-variant configuration on \({B_{(1- r) r_{2}}\left( x_0\right) }\). For \(x\in {B_{ r}\left( 0\right) }\) we have \(|x-x_0| \le \frac{2}{3} r_{2} \le (1- r) r_{2}\) due to \( r \le \frac{1}{3}\). Therefore e(u) is a two-variant configuration on \({B_{ r}\left( 0\right) }\).
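The identity \(\hat{\theta }_\delta (x) = \theta _{(1- r) \delta } ( (1- r) x + x_0)\) used above follows from the substitution \(w = (1-r)y + x_0\), which maps \({B_{\delta }\left( x\right) }\) onto \({B_{(1-r)\delta }\left( (1-r)x + x_0\right) }\) and leaves averages invariant:

$$\begin{aligned} \hat{\theta }_\delta (x) = \fint _{{B_{\delta }\left( x\right) }} \theta ((1-r)y + x_0) \, \mathrm {d}y = \fint _{{B_{(1-r)\delta }\left( (1-r)x + x_0\right) }} \theta (w) \, \mathrm {d}w = \theta _{(1- r)\delta }((1- r)x + x_0). \end{aligned}$$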

Throughout the rest of the proof we assume that \((u,\theta ) \in \mathscr {U}^0\) and we accordingly choose \(\eta >0\), \( \delta _0 \in (0,\smash {\frac{1}{3}})\), a function \(\varepsilon (\delta )>0\) for \(\delta \in (0, \delta _0)\) with \(\varepsilon (\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\), and \(i\in \{1,2,3\}\) such that \(0 \in A\, {:}{=}\, A_{1/3,\eta , \delta _0,\varepsilon ,i}(u,\theta )\) is a point of density one. By symmetry we may furthermore choose \(i=1\).

Step 3: On the plane \(\smash {H\big (0,\nu _1^+\big )}\) we split up \(\theta _1\) into two one-dimensional functions and find maximal intervals on which they are essentially constant.

Similarly to the proof of Proposition 1 we parametrize the plane \(\smash {H\big (0,\nu _1^+\big )}\) for \(\beta ,\gamma \in \mathbb {R}\) via

$$\begin{aligned} X(\beta ,\gamma )\, {:}{=}\, \beta \frac{1}{\sqrt{2}} [1\overline{1}1] + \gamma \frac{1}{\sqrt{2}} [11\overline{1}]. \end{aligned}$$

Let \(\tilde{r}\) be the universal radius of Lemma 7. For a small enough, universal number \(r_1 \in \smash {(0, \frac{1}{3})\cap (0,\tilde{r})}\) we have \(X(\beta ,\gamma ) \in {B_{1/3}\left( 0\right) }\cap {B_{\tilde{r}}\left( 0\right) }\) for all \(\beta ,\gamma \in [-r_1,r_1]\). Thus, recalling (54), for all \(\nu \in N\) and \(\delta \in (0, \delta _0)\) the functions \(\theta _{i,\delta }\circ X \in C([-r_1,r_1]^2; \smash {\mathscr {\widetilde{K}}_{\varepsilon (\delta )}})\) and \(f_{\nu , \delta } \circ X \in C([-r_1,r_1]^2)\) are well-defined. Furthermore, for all \(\beta ,\gamma \in [-r_1,r_1]\) we have the relations

$$\begin{aligned} X(\beta ,\gamma ) \cdot \nu _1^+ = X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (011)&= 0, \end{aligned}$$
(57)
$$\begin{aligned} X(\beta ,\gamma ) \cdot \nu _2^+ = X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (101)&= \beta , \end{aligned}$$
(58)
$$\begin{aligned} X(\beta ,\gamma ) \cdot \nu _3^+ = X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (110)&= \gamma , \end{aligned}$$
(59)
$$\begin{aligned} X(\beta ,\gamma ) \cdot \nu _1^- = X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (01\overline{1})&= \gamma - \beta , \end{aligned}$$
(60)
$$\begin{aligned} X(\beta ,\gamma ) \cdot \nu _2^- = X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (\overline{1}01)&= - \gamma , \end{aligned}$$
(61)
$$\begin{aligned} X(\beta ,\gamma ) \cdot \nu _3^- = X(\beta ,\gamma ) \cdot \frac{1}{\sqrt{2}} (1\overline{1}0)&= \beta . \end{aligned}$$
(62)

Absorbing the affine function \(g_1\) in decomposition (24) into the four functions \(f_\nu \in L^\infty (-1,1)\) for \(\nu \in N_2 \cup N_3\) and redefining the affine functions \(g_2\) and \(g_3\) accordingly, we may, for almost all \(x\in {B_{1}\left( 0\right) }\), assume that

$$\begin{aligned} \theta _1(x) = \sum _{\nu \in N_2 } f_{\nu }(\nu \cdot x) - \sum _{\nu \in N_3} f_{\nu }(\nu \cdot x), \end{aligned}$$
(63)

while the decomposition (24) is still valid for \(\theta _2\) and \(\theta _3\). As in the proof of Proposition 1, we exploit the combinatorial structure of the normals discussed in Remark 1 and sort the functions \(f_\nu \) with \(\nu \in N_2\cup N_3\) according to their dependence on \(\beta \) or \(\gamma \) on the plane \(H\left( 0,\nu _1^+\right) \). To this end, for \(\beta ,\gamma \in [- r_1, r_1]\) and \(\delta \in (0,\delta _0)\) we define \(F_1,F_2 \in L^\infty ((- r_1, r_1))\) and \(F_{1,\delta },F_{2,\delta } \in C([- r_1, r_1])\) via

$$\begin{aligned} F_{1}(\beta )&\,{:}{=}\, f_{\nu _2^+}(\beta ) - f_{\nu _3^-}(\beta ), \\ F_{1,\delta }(\beta )&\,{:}{=}\, f_{\nu _2^+,\delta }(\beta ) - f_{\nu _3^-,\delta }(\beta ), \\ F_{2}(\gamma )&\, {:}{=}\, f_{\nu _2^-}(-\gamma ) - f_{\nu _3^+}(\gamma ),\\ F_{2,\delta }(\gamma )&\, {:}{=} \, f_{\nu _2^-,\delta }(-\gamma ) - f_{\nu _3^+,\delta }(\gamma ). \end{aligned}$$

In particular, by the decomposition (63) we get \(\theta _1 \circ X(\beta ,\gamma ) = F_1(\beta ) + F_2(\gamma )\) for almost all \(\beta ,\gamma \in [-r_1,r_1] \), which for \(\delta \in (0, \delta _0)\) then turns into

$$\begin{aligned} \theta _{1,\delta } \circ X(\beta ,\gamma ) = F_{1,\delta }(\beta ) + F_{2,\delta }(\gamma ) \end{aligned}$$
(64)

after averaging.
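Indeed, on the plane the decomposition (63) combined with the relations (58), (59), (61) and (62) reads

$$\begin{aligned} \theta _1 \circ X(\beta ,\gamma ) = f_{\nu _2^+}(\beta ) + f_{\nu _2^-}(-\gamma ) - f_{\nu _3^+}(\gamma ) - f_{\nu _3^-}(\beta ) = F_1(\beta ) + F_2(\gamma ). \end{aligned}$$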

Let \(\delta \in (0, \delta _0)\). Due to our assumption that \(0 \in A\) and the fact that the inequalities for \(\theta _\delta \) in the definition (53) are open conditions, continuity of \(\theta _\delta \) implies that there exists \(\kappa (\delta ) >0\) such that for all \(\beta , \gamma \in [-\kappa (\delta ),\kappa (\delta )]\) we have

$$\begin{aligned} \theta _{1,\delta } \circ X(\beta ,\gamma )&< \varepsilon (\delta ),\nonumber \\ \eta< \theta _{2,\delta }\circ X(\beta ,\gamma )&<1-\eta ,\nonumber \\ \eta< \theta _{3,\delta }\circ X(\beta ,\gamma )&< 1 - \eta . \end{aligned}$$
(65)

By (64) and (65), \(\theta _{1,\delta }\) is small and a sum of two one-dimensional functions. Lemma 6 implies that the individual terms are small, i.e., for all \(\beta ,\gamma \in [-\kappa (\delta ),\kappa (\delta )]\) we have

$$\begin{aligned} F_{1,\delta }(\beta ) - \min _{[-\kappa (\delta ), \kappa (\delta )]} F_{1,\delta }&\le \varepsilon (\delta ) , \\ F_{2,\delta }(\gamma ) - \min _{[-\kappa (\delta ), \kappa (\delta )]} F_{2,\delta }&\le \varepsilon (\delta ) , \end{aligned}$$

where we used continuity to replace the essential infima by minima. In particular, for the oscillations on closed intervals \(I\subset [-r_1,r_1]\), defined as

$$\begin{aligned} {\text {osc}}_{I} F_{1,\delta }&\, {:}{=}\, \max _{ I} F_{1,\delta } - \min _{[-\kappa (\delta ), \kappa (\delta )]} F_{1,\delta } , \\ {\text {osc}}_{I} F_{2,\delta }&\, {:}{=}\, \max _{ I} F_{2,\delta } - \min _{[-\kappa (\delta ), \kappa (\delta )] } F_{2,\delta }, \end{aligned}$$

we have that

$$\begin{aligned} 0 \le {\text {osc}}_{[-\kappa (\delta ), \kappa (\delta )]} F_{1,\delta }&\le \varepsilon (\delta ) , \\ 0 \le {\text {osc}}_{[-\kappa (\delta ), \kappa (\delta )]} F_{2,\delta }&\le \varepsilon (\delta ). \end{aligned}$$

By continuity of \(F_{1,\delta }\) and \(F_{2,\delta }\) the oscillations are continuous when varying the endpoints of the involved intervals. Thus there exist unique maximal, closed intervals

$$\begin{aligned} {[}-\kappa (\delta ), \kappa (\delta )]\subset I_{1,\delta } \subset \left[ -r_1,r_1 \right] \text { and }[-\kappa (\delta ), \kappa (\delta )] \subset I_{2,\delta } \subset \left[ -r_1,r_1 \right] \end{aligned}$$

such that

$$\begin{aligned} {\text {osc}}_{I_{1,\delta }} F_{1,\delta } \le \varepsilon (\delta ) \text { and } {\text {osc}}_{I_{2,\delta }} F_{2,\delta } \le \varepsilon (\delta ). \end{aligned}$$

For the remainder of the proof, we aim to argue that there exists \( r \in (0,r_1)\) universal such that \([- r, r] \subset I_{1,\delta }\) and \([- r, r] \subset I_{2,\delta }\). However, the goal of the next couple of steps will be to first make sure the intervals do not shrink away as \(\delta \rightarrow 0\), see Fig. 22 for an outline of the argument.

Fig. 22
figure 22

Sketch relating \(I_{1,\delta }\times I_{2,\delta }\) for \(\delta \in (0, \delta _0)\) and the line \(l(t) = t(1,1)\) for \(t\in \mathbb {R}\). Step 4 ensures \(\min (\theta _{2,\delta }, \theta _{3,\delta }) <\varepsilon (\delta )\) on \(\partial (I_{1,\delta }\times I_{2,\delta })\). In Step 5 we will show that \(\theta _2\) is almost affine along the dashed part of l, which we will exploit in Step 6 to argue that \(\theta _2\circ X \circ l(t_{\mathrm{min},\delta }) \approx 0\) and \(\theta _2\circ X \circ l(t_{\mathrm{max},\delta }) \approx 1\) or vice versa due to \(\theta _2\circ X \circ l(0) \not \approx 0,1\). The function \(\theta _2\) being of vanishing mean oscillation allows us then to deduce that \(t_{\mathrm{min},\delta }\) and \(t_{\mathrm{max},\delta }\) cannot get too close as \(\delta \rightarrow 0\)

Step 4: For all \(\delta \in (0, \delta _0)\) and all \((\beta ,\gamma ) \in \partial \left( I_{1,\delta }\times I_{2,\delta }\right) \cap (-r_1, r_1)^2\) it holds that

$$\begin{aligned} \min \{\theta _{2,\delta }(\beta ,\gamma ),\theta _{3,\delta }(\beta ,\gamma )\}<\varepsilon (\delta ). \end{aligned}$$

Let \(\delta \in (0, \delta _0)\) and let us consider the case \(\beta \in \partial I_{1,\delta } \cap (-r_1, r_1)\). We then have

$$\begin{aligned} F_{1,\delta }(\beta ) - \min _{[-\kappa (\delta ), \kappa (\delta )]} F_{1,\delta } = \varepsilon (\delta ). \end{aligned}$$

Together with (64) we obtain for all \(\gamma \in I_{2,\delta } \) that

$$\begin{aligned} \theta _{1,\delta }(\beta , \gamma ) = F_{1,\delta }(\beta ) + F_{2,\delta }(\gamma ) = \varepsilon (\delta ) + \min _{[-\kappa (\delta ), \kappa (\delta )]} F_{1,\delta } + F_{2,\delta }(\gamma ). \end{aligned}$$

The same decomposition, together with (54) in the form \(\theta _{1,\delta }\circ X \ge 0\) everywhere, then implies for all \(\gamma \in I_{2,\delta }\) that

$$\begin{aligned} \theta _{1,\delta }(\beta , \gamma ) = \varepsilon (\delta ) + \min _{[-\kappa (\delta ), \kappa (\delta )]} F_{1,\delta } + F_{2,\delta }(\gamma ) \ge \varepsilon (\delta ) + \min _{[-r_1, r_1]^2} \theta _{1,\delta } \circ X \ge \varepsilon (\delta ) . \end{aligned}$$

Due to the assumption \((u,\theta ) \in \mathscr {U}^0\) and the inclusion (54), we see for all \(\beta \in \partial I_{1,\delta } \cap (-r_1,r_1) \) and \( \gamma \in I_{2,\delta }\) that

$$\begin{aligned} \min \{\theta _{2,\delta }(\beta ,\gamma ),\theta _{3,\delta }(\beta ,\gamma )\}<\varepsilon (\delta ), \end{aligned}$$
(66)

which is one part of the claim. Swapping the roles of \(\beta \) and \(\gamma \) we obtain the remaining part.

In the following we for \(t \in \mathbb {R}\) define \(l(t)\, {:}{=}\, t(1,1)\) and let \(-r_1 \le t_{\mathrm{min},\delta }< 0< t_{\mathrm{max},\delta } \le r_1\) be the two parameters for which l intersects \(\partial (I_{1,\delta } \times I_{2,\delta })\), see Fig. 22.

Step 5: Let \(\delta \in (0, \delta _0)\). Then the functions \(\theta _{2,\delta }\circ X\) and \(\theta _{3,\delta } \circ X\) are almost affine along l on \([t_{\mathrm{min},\delta },t_{\mathrm{max},\delta }]\) in the sense that for all t, h, \(\smash {\tilde{h}} \in \mathbb {R}\) with t, \(t + h\), \(t + \smash {\tilde{h}}\), \(t + h + \smash {\tilde{h}} \in [t_{\mathrm{min},\delta },t_{\mathrm{max},\delta }]\) we have

$$\begin{aligned}&\big | \theta _{2,\delta }\circ X \circ l(t + h + \tilde{h}) + \theta _{2,\delta } \circ X \circ l(t) \nonumber \\&\quad -\theta _{2,\delta }\circ X \circ l(t + h) - \theta _{2,\delta }\circ X \circ l(t + \tilde{h}) \big |< 216 \varepsilon (\delta ),\nonumber \\&\big | \theta _{3,\delta }\circ X \circ l(t + h + \tilde{h}) + \theta _{3,\delta } \circ X \circ l(t) \nonumber \\&\quad -\theta _{3,\delta }\circ X \circ l(t + h) - \theta _{3,\delta }\circ X \circ l(t + \tilde{h})\big | < 216 \varepsilon (\delta ). \end{aligned}$$
(67)

Note that as \(\theta _{2,\delta }\) and \(\theta _{3,\delta }\) are continuous, existence of traces is trivial.

For all points \(\bar{\beta } \in \hbox {arg min}_{[ -\kappa (\delta ), \kappa (\delta )] } F_{1,\delta }\) and \(\bar{\gamma } \in \hbox {arg min}_{[ -\kappa (\delta ), \kappa (\delta )] } F_{2,\delta }\) we have

$$\begin{aligned} \theta _{1,\delta } \circ X(\bar{\beta },\bar{\gamma }) \le \varepsilon (\delta ) \end{aligned}$$

due to decomposition (64) and estimate (65). Consequently, together with the assumption \(\theta _1 \ge 0\) almost everywhere, we have for any \((\beta ,\gamma ) \in I_{1,\delta }\times I_{2,\delta }\) that

$$\begin{aligned} 0 \le \theta _{1,\delta } \circ X(\beta ,\gamma ) \le \theta _{1,\delta } \circ X(\bar{\beta },\bar{\gamma }) + {\text {osc}}_{I_{1,\delta }} F_{1,\delta } + {\text {osc}}_{I_{2,\delta }} F_{2,\delta } \le 3\varepsilon (\delta ). \end{aligned}$$
(68)

As we have that \(X \circ l(t) = \sqrt{2} t E_1\) for all \(t\in [-r_1,r_1]\) and we chose \(r_1\) sufficiently small at the beginning of Step 3, we can apply Lemma 7 to get for all \(\nu \in N_2\cup N_3\) and all t, h, \(\smash {\tilde{h}} \in \mathbb {R}\) with t, \(t + h\), \(t + \smash {\tilde{h}}\), \(t + h + \smash {\tilde{h}} \in [t_{\mathrm{min},\delta },t_{\mathrm{max},\delta }]\) that

$$\begin{aligned}&\big | f_{\nu ,\delta }\circ X\circ l(t + h + \tilde{h}) + f_{\nu ,\delta } \circ X \circ l(t) \\&\quad -f_{\nu ,\delta }\circ X \circ l(t + h) - f_{\nu ,\delta }\circ X \circ l(t + \tilde{h}) \big | < 108 \varepsilon (\delta ). \end{aligned}$$

We now plug this information into the decomposition (24) of \(\theta _2\) and \(\theta _3\). Observing that affine functions drop out in second discrete derivatives and that \(f_{\nu _1^+,\delta }\) and \(f_{\nu _1^-,\delta }\) drop out as the line \(X\circ l\) is parallel to \(E_1\), we obtain the claim.

Step 6: There exists \( \delta _1 \in (0, \delta _0)\) such that the following holds: Let \(\delta \in (0, \delta _1)\). If we have \( -r_1< t_{\mathrm{min},\delta }< 0< t_{\mathrm{max},\delta } < r_1\), then we either have

$$\begin{aligned} \theta _{2,\delta } \circ X \circ l(t_{\mathrm{min},\delta })&< \varepsilon (\delta ),\\ \theta _{3,\delta } \circ X \circ l (t_{\mathrm{min},\delta })&> 1-4\varepsilon (\delta ),\\ \theta _{2,\delta } \circ X \circ l (t_{\mathrm{max},\delta })&>1-4\varepsilon (\delta ),\\ \theta _{3,\delta } \circ X \circ l(t_{\mathrm{max},\delta })&<\varepsilon (\delta ) \end{aligned}$$

or

$$\begin{aligned} \theta _{2,\delta } \circ X \circ l(t_{\mathrm{min},\delta })&> 1-4\varepsilon (\delta ), \\ \theta _{3,\delta } \circ X \circ l (t_{\mathrm{min},\delta })&< \varepsilon (\delta ), \\ \theta _{2,\delta } \circ X \circ l (t_{\mathrm{max},\delta })&<\varepsilon (\delta ) ,\\ \theta _{3,\delta } \circ X \circ l(t_{\mathrm{max},\delta })&>1-4\varepsilon (\delta ). \end{aligned}$$

Once the upper bounds by \(\varepsilon (\delta )\) are proven, the lower bounds by \(1-4\varepsilon (\delta )\) follow from the identity \(\theta _{1,\delta } + \theta _{2,\delta } + \theta _{3,\delta } \equiv 1\) everywhere (which, by linearity of convolution, follows from \(\sum _{i=1}^3\theta _i\equiv 1\) almost everywhere) and the inequality (68). Aiming for a contradiction, we assume that

$$\begin{aligned} \theta _{3,\delta } \circ X\circ l(t_{\mathrm{min},\delta })&< \varepsilon (\delta ),\nonumber \\ \theta _{3,\delta } \circ X \circ l(t_{\mathrm{max},\delta })&< \varepsilon (\delta ). \end{aligned}$$
(69)

Recalling Step 4 we see that the only other undesirable case is \(\theta _{2,\delta } \circ X\circ l(t_{\mathrm{min},\delta })<\varepsilon (\delta )\), \(\theta _{2,\delta } \circ X \circ l(t_{\mathrm{max},\delta }) < \varepsilon (\delta )\), which can be dealt with in the same manner.

In order to transport the information (69) to the point l(0) we use that \(\theta _{3,\delta } \circ X \) is almost affine along l: For \(t \,{:}{=}\, t_{\mathrm{min},\delta }\), \(h\, {:}{=}\, -t_{\mathrm{min},\delta }\) and \(\tilde{h} \,{:}{=}\, t_{\mathrm{max},\delta }\) we have

$$\begin{aligned} t + h&= 0&\in [t_{\mathrm{min},\delta },t_{\mathrm{max},\delta }],\\ t + \tilde{h}&= t_{\mathrm{min},\delta } + t_{\mathrm{max},\delta }&\in [t_{\mathrm{min},\delta },t_{\mathrm{max},\delta }],\\ t+h+\tilde{h}&= t_{\mathrm{max},\delta }&\in [t_{\mathrm{min},\delta },t_{\mathrm{max},\delta }]. \end{aligned}$$

Therefore estimate (67) implies

$$\begin{aligned}&\big | \theta _{3,\delta } \circ X \circ l(t_{\mathrm{max},\delta }) + \theta _{3,\delta } \circ X \circ l(t_{\mathrm{min},\delta })\\&\quad \quad \quad - \theta _{3,\delta } \circ X\circ l(0) -\theta _{3,\delta }\circ X\circ l(t_{\mathrm{min},\delta } + t_{\mathrm{max},\delta }) \big | < 216 \varepsilon (\delta ). \end{aligned}$$

Combining this inequality with \(\theta _{3,\delta }\circ X\circ l(t_{\mathrm{min},\delta } + t_{\mathrm{max},\delta }) \ge 0 \) and the assumption (69) we arrive at

$$\begin{aligned} \theta _{3,\delta } \circ X \circ l(0) < 218\varepsilon (\delta ). \end{aligned}$$
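Explicitly, the constant \(218\varepsilon (\delta )\) arises from rearranging the almost-affine estimate and inserting \(\theta _{3,\delta }\circ X\circ l(t_{\mathrm{min},\delta } + t_{\mathrm{max},\delta }) \ge 0 \) together with (69):

$$\begin{aligned} \theta _{3,\delta } \circ X \circ l(0)&\le \theta _{3,\delta } \circ X \circ l(t_{\mathrm{max},\delta }) + \theta _{3,\delta } \circ X \circ l(t_{\mathrm{min},\delta }) \\&\quad - \theta _{3,\delta }\circ X\circ l(t_{\mathrm{min},\delta } + t_{\mathrm{max},\delta }) + 216 \varepsilon (\delta ) < 2\varepsilon (\delta ) + 216 \varepsilon (\delta ). \end{aligned}$$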

However, this is in contradiction to the strain lying strictly between two martensite strains at 0 for all \(\delta \in (0, \delta _1)\) with \( \delta _1 \in (0, \delta _0)\) sufficiently small, see estimate (65), which proves the claim.

Step 7: We do not have \( \liminf _{\delta \rightarrow 0} ( t_{\mathrm{max},\delta } - t_{\mathrm{min},\delta }) = 0\).

Towards a contradiction, we assume that the difference does vanish along some subsequence, which we do not relabel in the following. By \(t_{\mathrm{min},\delta }<0< t_{\mathrm{max},\delta }\) for all \(\delta \in (0,\delta _0)\) we have \(\lim _{\delta \rightarrow 0} t_{\mathrm{max},\delta } = \lim _{\delta \rightarrow 0} t_{\mathrm{min},\delta } = 0\). For \(s \in [0,1]\), let

$$\begin{aligned} G_\delta (s)\, {:}{=}\, \theta _{2,\delta } \circ X \circ l ( (1-s)t_{\mathrm{min},\delta } + st_{\mathrm{max},\delta } ). \end{aligned}$$
(70)

We can apply estimate (67) and Lemma 13 from Appendix A to get that the sequence \(G_\delta \) converges uniformly on [0, 1] to an affine function \(G:\mathbb {R}\rightarrow \mathbb {R}\). Due to Step 6 and \(\lim _{\delta \rightarrow 0} t_{\mathrm{max},\delta } = \lim _{\delta \rightarrow 0} t_{\mathrm{min},\delta } = 0\) we know that the linear part of G has to be nontrivial, and we thus get that

$$\begin{aligned} \lim _{\delta \rightarrow 0} \int _0^1 \left| G_\delta (s) - \int _0^1 G_\delta (\tilde{s}) \, \mathrm {d}\tilde{s} \right| \, \mathrm {d}s = \int _0^1 \left| G (s) - \int _0^1 G (\tilde{s}) \, \mathrm {d}\tilde{s} \right| \, \mathrm {d}s > 0. \end{aligned}$$

Due to \(X\circ l(t) = \sqrt{2} t E_1 \) for all \(t\in [-r_1,r_1]\), the functions \(f_{\nu _1^+,\delta } \circ X \circ l \) and \(f_{\nu _1^-,\delta } \circ X \circ l\) are constant. Furthermore, \(g_2\) is affine by definition and thus continuous. Undoing the rescaling in the definition (70) and using (24), (59), and (62) together with \(l(t) = t(1,1)\) for all \(t\in \mathbb {R}\), we therefore conclude that

$$\begin{aligned} \lim _{\delta \rightarrow 0} \fint _{t_{\mathrm{min},\delta }}^{t_{\mathrm{max},\delta }} \left| \left( f_{\nu _3^+,\delta } + f_{\nu _3^-, \delta } \right) (t ) - \fint _{t_{\mathrm{min},\delta }}^{t_{\mathrm{max},\delta }}\left( f_{\nu _3^+,\delta } + f_{\nu _3^-, \delta } \right) ( \tilde{t} ) \, \mathrm {d}\tilde{t} \right| \, \mathrm {d}t>0. \end{aligned}$$

By the analogue of (37) for maps defined on \((t_{\mathrm{min},\delta },t_{\mathrm{max},\delta })\), we get for all \(\delta >0\) that

$$\begin{aligned}&\fint _{t_{\mathrm{min},\delta }}^{t_{\mathrm{max},\delta }} \left| \left( f_{\nu _3^+,\delta } + f_{\nu _3^-, \delta } \right) (t ) - \fint _{t_{\mathrm{min},\delta }-\delta }^{t_{\mathrm{max},\delta } + \delta }\left( f_{\nu _3^+} + f_{\nu _3^-} \right) ( \tilde{t} ) \, \mathrm {d}\tilde{t} \right| \, \mathrm {d}t\\&\quad \ge \frac{1}{2} \fint _{t_{\mathrm{min},\delta }}^{t_{\mathrm{max},\delta }} \left| \left( f_{\nu _3^+,\delta } + f_{\nu _3^-, \delta } \right) (t ) - \fint _{t_{\mathrm{min},\delta }}^{t_{\mathrm{max},\delta }}\left( f_{\nu _3^+,\delta } + f_{\nu _3^-, \delta } \right) ( \tilde{t} ) \, \mathrm {d}\tilde{t} \right| \, \mathrm {d}t. \end{aligned}$$

The above two inequalities, together with Young’s inequality for convolutions, then imply

$$\begin{aligned} \liminf _{\delta \rightarrow 0} \fint _{t_{\mathrm{min},\delta }-\delta }^{t_{\mathrm{max},\delta }+\delta } \left| \left( f_{\nu _3^+} + f_{\nu _3^-} \right) (t ) - \fint _{t_{\mathrm{min},\delta }-\delta }^{t_{\mathrm{max},\delta } + \delta }\left( f_{\nu _3^+} + f_{\nu _3^-} \right) ( \tilde{t} ) \, \mathrm {d}\tilde{t} \right| \, \mathrm {d}t>0. \end{aligned}$$

However, this is a contradiction to our assumption that \(\smash {f_{\nu _3^+}, f_{\nu _3^-}} \in VMO(-1,1)\) since we have \( t_{\mathrm{max},\delta } - t_{\mathrm{min},\delta } + 2\delta \rightarrow 0\) as \(\delta \rightarrow 0\).

Step 8: For \(n\in \mathbb {N}\) with \(n>1\) let \(\delta _n \in (0,\delta _1)\) be such that \(\lim _{n\rightarrow \infty } \delta _n = 0\) and such that there exist \(t_{\mathrm{min}}\, {:}{=}\, \lim _{n\rightarrow \infty } t_{\mathrm{min},\delta _n}\) and \(t_{\mathrm{max}}\, {:}{=}\, \lim _{n\rightarrow \infty } t_{\mathrm{max},\delta _n}\). Then the open set

$$\begin{aligned} B \, {:}{=}\, \left\{ x\in {B_{\frac{1}{3}}\left( 0\right) }: \theta _1(x) = 0 \right\} ^{\mathrm {o}} \end{aligned}$$

has a connected component P such that \(0 \in \overline{P}\). Furthermore, for all \(\nu \in N_2\cup N_3\) there exist open, non-empty intervals \(I_\nu \subset \mathbb {R}\) such that the set P satisfies

$$\begin{aligned} P = \bigcap _{\nu \in N_2 \cup N_3} \left\{ x\in {B_{\frac{1}{3}}\left( 0\right) }: \nu \cdot x \in I_\nu \right\} , \end{aligned}$$

i.e., up to localization it is a polyhedron whose faces’ normals are contained in \(N_2\cup N_3\).

Let \(n\in \mathbb {N}\) and for \(\hat{r} \in (0,\frac{1}{3})\) let

$$\begin{aligned} P_{\hat{r},\delta _n} \, {:}{=}\, \bigcap _{\nu \in N_2 \cup N_3} \left\{ x \in {B_{1/3}\left( 0\right) }: \exists \, t \in (t_{\mathrm{min},\delta _n},t_{\mathrm{max},\delta _n})\cap (-\hat{r},\hat{r}): \nu \cdot x = \nu \cdot X\circ l(t) \right\} . \end{aligned}$$

Estimate (68) gives for all \(t\in (t_{\mathrm{min},\delta _n},t_{\mathrm{max},\delta _n})\) that \(0\le \theta _{1,{\delta _n}} \circ X \circ l (t) \le 3 \varepsilon (\delta _n)\). For \(\hat{r} \in (0,\smash {\frac{1}{3}})\) small enough we can thus apply Lemma 7 to obtain \(0\le \theta _{1,{\delta _n}}(x) \le 27 \varepsilon (\delta _n)\) for all \(x\in P_{\hat{r},\delta _n}\). We have \(t_{\mathrm{min}} \le 0 \le t_{\mathrm{max}}\) and, by Step 7, \(t_{\mathrm{min}}<t_{\mathrm{max}}\), so that, passing to the limit \(n \rightarrow \infty \), the set

$$\begin{aligned} P_{\hat{r}}\, {:}{=}\, \bigcap _{\nu \in N_2 \cup N_3} \left\{ x \in {B_{1/3}\left( 0\right) }: \exists \, t \in (t_{\mathrm{min}},t_{\mathrm{max}})\cap (-\hat{r},\hat{r}): \nu \cdot x = \nu \cdot X\circ l(t) \right\} \end{aligned}$$

is non-empty, open, and satisfies \( P_{\hat{r}} \subset B\) and \(0 \in \overline{ P_{\hat{r}}}\). Consequently, we have for the connected component P of B containing \(P_{\hat{r}}\) that \(0 \in \overline{P}\).

Recall that, by Equation (63), for almost all \(x\in P\) we have

$$\begin{aligned} \sum _{\nu \in N_2} f_{\nu }(\nu \cdot x) - \sum _{\nu \in N_3} f_{\nu }(\nu \cdot x) = \theta _1(x) = \lim _{n \rightarrow \infty } \theta _{1,\delta _n} (x) = 0. \end{aligned}$$

For every \(\nu \in N_2 \cup N_3\) there exist two different directions \(d, \tilde{d} \in \mathscr {D}\) such that distributionally differentiating this identity in the directions d and \(\tilde{d}\) leaves only \(f_\nu \), see Remark 1. Therefore, the map \(x\mapsto f_\nu (x\cdot \nu )\) is locally affine on P. By connectedness of P, for each \(\nu \in N_2 \cup N_3\) there must exist an affine function \(g_\nu : \mathbb {R}\rightarrow \mathbb {R}\) such that for almost all \(x \in P\) we have \(f_\nu (x\cdot \nu )= g_\nu (x\cdot \nu )\). As we have \(\sum _{\nu \in N_2} g_\nu (x\cdot \nu ) - \sum _{\nu \in N_3} g_\nu (x\cdot \nu ) = 0\) for all \(x \in \mathbb {R}^3\), we may for all \(\nu \in N_2 \cup N_3\) and \(i=2,3\) redefine the functions \(f_\nu \in L^\infty (-1,1)\) and the affine functions \(g_i\) to satisfy

$$\begin{aligned} f_\nu \equiv 0 \text { on } P \end{aligned}$$
(71)

and to leave the decompositions (63) and (24) still valid (the latter for \(\theta _2\) and \(\theta _3\)).

Since P is open and connected, for all \(\nu \in N_2 \cup N_3\) the image \(I_\nu \, {:}{=} \, \nu \cdot P\) is open and connected, and thus an interval. It is also clearly non-empty and by construction we have

$$\begin{aligned} P \subset \bigcap _{\nu \in N_2 \cup N_3} \left\{ x\in {B_{1/3}\left( 0\right) }: \nu \cdot x \in I_\nu \right\} . \end{aligned}$$

As, by (71), we have \(f_{\tilde{\nu }} \equiv 0\) on \(\bigcap _{\nu \in N_2 \cup N_3} \{x\in {B_{1/3}\left( 0\right) }: \nu \cdot x \in I_\nu \}\) for all \(\tilde{\nu } \in N_2 \cup N_3\), we get the other inclusion

$$\begin{aligned} \bigcap _{\nu \in N_2 \cup N_3} \left\{ x\in {B_{1/3}\left( 0\right) }: \nu \cdot x \in I_\nu \right\} \subset P, \end{aligned}$$

which proves the claim.

For \(\nu \in N_2 \cup N_3\) let \(a_\nu \le 0\) and \(b_\nu \ge 0\) with \(a_\nu < b_\nu \) be such that \(I_\nu = (a_\nu ,b_\nu )\).

Step 9: Let \(i\in \{2,3\}\) and \(\nu \in N_i\) be such that \(R= H(a_\nu , \nu )\cap \overline{P}\) or \(R=H(b_\nu ,\nu )\cap \overline{P}\), see definition (13), is a face of the polyhedron P with \(R\cap {B_{r_1}\left( 0\right) } \ne \emptyset \). Then we have \(\theta _i (x) = 0\) for almost all \(x \in R\cap {B_{r_1}\left( 0\right) }\) or \(\theta _i(x) = 1\) for almost all \(x \in R\cap {B_{r_1}\left( 0\right) }\), which is meaningful by decomposition (24) and Lemma 4.

In order to keep notation simple, we assume that \(\nu = \nu _2^+\) and that it is the outer normal to P at R, i.e., for \(b\, {:}{=}\, b_{\nu _2^+}\) we have \(\{b\} = \nu _2^+ \cdot R\) and \(P\subset \{x\cdot \nu _2^+ < b\}\). A two-dimensional sketch of this situation can be found in Fig. 23a, while a less detailed three-dimensional one is shown in Fig. 15a. Furthermore, by connectedness of the convex set \(R\cap {B_{r_1}\left( 0\right) }\), we only have to prove the dichotomy locally, i.e., for all \(x_0\in R\cap {B_{r_1}\left( 0\right) }\) it is sufficient to find some \( \kappa = \kappa (x_0)>0\) with

$$\begin{aligned}&{B_{ \kappa }\left( x_0\right) } \subset {B_{r_1}\left( 0\right) }, \nonumber \\&{B_{ \kappa }\left( x_0\right) } \cap H(b,\nu _2^+) \subset R \cap {B_{r_2}\left( 0\right) },\nonumber \\&{B_{ \kappa }\left( x_0\right) } \cap \{x\cdot \nu _2^+ < b\} \subset P \end{aligned}$$
(72)

such that on \(R \cap {B_{ \kappa }\left( x_0\right) }\) we have either \(\theta _2 \equiv 0\) or \(\theta _2 \equiv 1\) almost everywhere.

Fig. 23
figure 23

a Inside \({B_{\tilde{\kappa }}\left( x_0\right) }\), the polyhedron P looks like a half-space with boundary R and exterior normal \(\nu _2^+\). b The dichotomy \(\theta _{2,\delta } \approx 0\) or \(\theta _{2,\delta } \approx 1\) on the dashed line \(\widetilde{H} = H(b_\delta ,\nu _2^+) \cap {B_{\kappa /2}\left( x_0\right) }\) can be propagated to the gray neighborhood of \(x_0\) as long as \({\text {dist}}(x_0,H(b_\delta ,\nu _2^+))\) is small enough

Let \(x_0 \in R \cap {B_{r_1}\left( 0\right) }\) and let \( \kappa _1>0\) be such that the inclusions (72) hold. We can use the identity (71) to get \(f_{\nu _2^+} (x\cdot \nu _2^+)=0\) for almost all \(x\in {B_{\kappa _1}\left( x_0\right) } \cap \{x\cdot \nu _2^+ < b\}\subset P\). Similarly, for all \(\tilde{\nu } \in N_2 \cup N_3 {\setminus }\left\{ \nu _2^+\right\} \) and almost all \(x \in {B_{\kappa _2}\left( x_0\right) }\) with some \(\kappa _2\in (0,\kappa _1)\) we obtain \(f_{\tilde{\nu }}(x\cdot \tilde{\nu }) = 0\). For all \(\delta \in \left( 0,\frac{\kappa _2}{2}\right) \) we therefore get after averaging

$$\begin{aligned} f_{\nu _2^+,\delta }\left( b - \delta \right) =0 \end{aligned}$$
(73)

and \( f_{\tilde{\nu },\delta } (x\cdot \tilde{\nu })= 0\) for all \(\tilde{\nu } \in N_2 \cup N_3 {\setminus }\left\{ \nu _2^+\right\} \) and almost all \(x\in {B_{\kappa _2/2}\left( x_0\right) } \). In particular, together with the decomposition (63), the latter implies for almost all \(x\in \smash {{B_{\kappa _2/2}\left( x_0\right) }}\) that

$$\begin{aligned} f_{\nu _2^+,\delta } (x\cdot \nu _2^+) = \theta _{1,\delta }(x) \ge 0. \end{aligned}$$
(74)

Let \(c\in (0,\smash {\frac{1}{4}})\) be a universal number to be chosen later. Towards a contradiction, assume that for all \(\tilde{\delta } \in (0,\frac{\kappa _2}{2})\) there existed \(\delta \in (0, \tilde{\delta })\) with \(f_{\nu _2^+,\delta }(b') < \varepsilon ( \delta )\) for all \(b' \in (b- c \kappa _2, b+c \kappa _2) \), where \( \varepsilon (\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\) is as in the definition of \(\smash {\mathscr {U}^0}\). Then we would have \(f_{\nu _2^+}(x\cdot \nu _2^+) = 0\) for almost all \(x\in \smash {{B_{c\kappa _2}\left( x_0\right) }}\), and we would get \(\smash {{B_{c\kappa _2}\left( x_0\right) }} \subset P\) by the definition of P, see Step 8. As a result, the fact \(P\subset \{x\cdot \nu _2^+ < b\}\) would give the contradiction \(\smash {{B_{c\kappa _2}\left( x_0\right) }} \subset \{x\cdot \nu _2^+ < x_0 \cdot \nu _2^+\}\).

Consequently, there exists \(\tilde{\delta } \in \left( 0,\frac{\kappa _2}{2} \right) \) such that for all \(\delta \in (0,\tilde{\delta })\) there exists \(b'_{\delta } \in (b-c\kappa _2, b+c\kappa _2) \) with \(f_{\nu _2^+,\delta }(b'_{\delta }) \ge \varepsilon (\delta )\). In the following, let \(\delta \in (0,\tilde{\delta })\). By Equation (73) and continuity there even exists \(b_{\delta } \in (b-c\kappa _2, b+c\kappa _2) \) with \(f_{\nu _2^+,\delta }(b_{\delta })= \varepsilon (\delta )\). Due to Equation (74) this implies for all \(x\in \widetilde{H}\, {:}{=}\, H(b_\delta ,\nu _2^+) \cap {B_{\frac{\kappa _2}{2}}\left( x_0\right) }\), see Fig. 23b, that

$$\begin{aligned} \theta _{1,\delta }(x) = \varepsilon (\delta ). \end{aligned}$$
(75)

Combining this with the inclusion \(\theta _\delta \in \smash {\widetilde{\mathscr {K}}} + {B_{\varepsilon (\delta )}\left( 0\right) }\) on \({B_{1/3}\left( 0\right) }\) (see (54) and the definition of \(\mathscr {U}^0\)), we get, for all \(x\in \smash {\widetilde{H}}\),

$$\begin{aligned} \min \{\theta _{2,\delta }(x), \theta _{3,\delta }(x) \} < \varepsilon (\delta ), \end{aligned}$$

which, due to the identity \(\theta _{1,\delta }(x) + \theta _{2,\delta }(x) + \theta _{3,\delta }(x) = 1\) inherited from the assumption (12) by averaging, can be converted into

$$\begin{aligned} \min \{\theta _{2,\delta }(x), 1- \theta _{2,\delta }(x) \} < 2\varepsilon (\delta ). \end{aligned}$$

For \(\delta >0\) small enough, continuity then implies the dichotomy

$$\begin{aligned} \theta _{2,\delta }(x)< 2\varepsilon (\delta ) \text { for all } x\in \widetilde{H} \text { or } 1- \theta _{2,\delta }(x) < 2\varepsilon (\delta ) \text { for all } x\in \widetilde{H}. \end{aligned}$$
(76)

In order to propagate this information back to \(x_0\), see again Fig. 23b, let \(x_\delta \, {:}{=}\, x_0 + \left( b_\delta - b \right) \nu _2^+\). The line \(l_\delta (t)\, {:}{=}\, x_\delta + \sqrt{2} t E_2\) for all \(t\in \smash {[-\frac{1}{4\sqrt{2}}\kappa _2 ,\frac{1}{4\sqrt{2}}\kappa _2]}\) satisfies \(l_\delta (t) \in {B_{\kappa _2/2}\left( x_0\right) }\) on account of \(|x_\delta - x_0| = |b_\delta - b| < c\kappa _2\) and \(c\in (0,\frac{1}{4})\). Furthermore, for all \(t\in \smash {[-\frac{1}{4\sqrt{2}}\kappa _2 ,\frac{1}{4\sqrt{2}}\kappa _2]}\) we have

$$\begin{aligned} l_\delta (t) \cdot \nu _2^+ = l_\delta (t)\cdot \frac{1}{\sqrt{2}} (101) = b_\delta , \end{aligned}$$

by \(x_0\in R \subset H(b,\nu _2^+)\). Thus, the dichotomy (76) implies

$$\begin{aligned} \theta _{2,\delta }\circ l_\delta (t) \le 2\varepsilon (\delta )&\text { for all } t\in \left[ -\frac{1}{4\sqrt{2}}\kappa _2 ,\frac{1}{4\sqrt{2}}\kappa _2\right] \\ \text { or } 1- \theta _{2,\delta }\circ l_\delta (t) \le 2\varepsilon (\delta )&\text { for all } t\in \left[ -\frac{1}{4\sqrt{2}}\kappa _2 ,\frac{1}{4\sqrt{2}}\kappa _2\right] . \end{aligned}$$

As we have \(l_\delta (t) \in {B_{\kappa _2/2}\left( x_0\right) }\subset {B_{r_1}\left( 0\right) }\) for all \(t\in \smash {[-\frac{1}{4\sqrt{2}}\kappa _2 ,\frac{1}{4\sqrt{2}}\kappa _2]}\) and in Step 3 we chose \(r_1>0\) to be small enough to apply Lemma 7, we get a universal constant \(\tilde{c}>0\) such that

$$\begin{aligned} \theta _{2,\delta }(x)&<18 \varepsilon (\delta ) \text { for all } x\in {B_{\tilde{c} \kappa _2}\left( x_\delta \right) } \nonumber \\ \text { or } 1- \theta _{2,\delta }(x)&< 18 \varepsilon (\delta ) \text { for all } x\in {B_{\tilde{c} \kappa _2}\left( x_\delta \right) } . \end{aligned}$$
(77)

Recall \(|x_0- x_\delta | < c\kappa _2\). As a result, the choice \(c = \min \{\frac{\tilde{c}}{2},\frac{1}{5}\}<\frac{1}{4}\) ensures that estimate (77) holds on \({B_{ \kappa }\left( x_0\right) }\) for \( \kappa = c \kappa _2 \). In the limit \(\delta \rightarrow 0\) this gives \(\theta _2 (x) =0\) for almost all \(x \in {B_{ \kappa }\left( x_0\right) }\) or \(\theta _2(x)=1\) for almost all \(x \in {B_{ \kappa }\left( x_0\right) }\). By Lemma 4 and the decomposition (24) we see that \(\theta _2 \equiv 0\) or \(\theta _2 \equiv 1\) on \({B_{ \kappa }\left( x_0\right) } \cap R\). As the inclusions (72) hold for \(\kappa <\kappa _1\) this concludes Step 9.

Step 10: The universal radius \(r_{2}\, {:}{=} \, \frac{r_1}{2}\) satisfies that for all \(\nu \in N_2 \cup N_3\), the intervals \(I_\nu = (a_\nu ,b_\nu )\) of Step 8 with \(a_\nu \le 0\) and \(b_\nu \ge 0\) satisfy \(a_\nu \le - r_{2}\) and \( b_\nu \ge r_{2}\), which allows us to conclude the proof by Corollary 1 and Step 2.

Towards a contradiction we assume that there exists \(\nu \in N_2\cup N_3\) such that \(a_\nu > - r_{2}\) or \(b_\nu < r_{2} \). For the sake of concreteness we assume that

$$\begin{aligned} b\, {:}{=}\, b_{\nu _2^+} = \min _{\nu \in N_2 \cup N_3} \{-a_\nu ,b_\nu \} \in [0, r_{2}), \end{aligned}$$
(78)

so that the combinatorics here match those of the previous step. All other cases work the same.

Let \(R\, {:}{=} \, H(b,\nu _2^+) \cap \overline{P} \). For \(t\in \mathbb {R}\) let \(\tilde{l}(t)\, {:}{=} \, b\nu _2^+ + \sqrt{2} tE_2\). Let \(J\, {:}{=}\, \tilde{l}^{-1}(R\cap {B_{r_{1}}\left( 0\right) })\) and note that J is an interval with \(0\in J\) due to Equation (78). By Step 9, decomposition (24) and Lemma 4 we have

$$\begin{aligned} \theta _2 \circ \tilde{l}(t) = 0&\text { for almost all }t \in J \\ \text {or } \theta _2 \circ \tilde{l}(t) = 1&\text { for almost all }t \in J . \end{aligned}$$

At the beginning of Step 3 we chose \(r_1\) such that we can apply Lemma 7 on \({B_{r_1}\left( 0\right) }\). Therefore, we get \(\theta _2 \equiv 0\) or \(\theta _2 \equiv 1\) on the convex polyhedron

$$\begin{aligned} Q\, {:}{=}\, \bigcap _{\nu \in N_1 \cup N_3} \left\{ x \in \mathbb {R}^3: \nu \cdot x = \nu \cdot \tilde{l}(t) \text { for some } t \in J\right\} \subset {B_{1}\left( 0\right) }; \end{aligned}$$

see Fig. 15b for a sketch relating P and Q in three dimensions.

Let \(\tilde{A}\, {:}{=}\, \{\theta _1=0 \text {, } 0< \theta _2, \theta _3 <1\}\). By definition (53) and \((u,\theta ) \in \mathscr {U}^0\), the point \(0 \in A \subset \tilde{A}\) is a point of density one of A and, therefore, also of \(\tilde{A}\). In order to arrive at a contradiction, we only have to prove \(0 \in \overline{Q}\): all points of \(\overline{Q}\) are points of positive density of the open set Q, while \(Q \subset {B_{1}\left( 0\right) }{\setminus } \tilde{A}\) by virtue of \(\theta _2 \equiv 0\) or \(\theta _2 \equiv 1\) on Q.

Step 11: Prove \(0 \in \overline{Q}\).

In the case \(b=0\) we have \(0\in R\), which trivially gives \(0 \in \tilde{l}(J) \subset \overline{Q}\). Therefore, we only consider the case \(b >0\) in the following.

As a result of \(b < r_{2} = \frac{r_1}{2}\) we obtain \(\tilde{l}(- \frac{1}{2}b) = \smash {\frac{b}{\sqrt{2}}}(1\overline{1}1) \in {B_{r_1}\left( 0\right) }\). Furthermore, by \(0< b = \min _{\nu \in N_2\cup N_3} \{-a_\nu ,b_\nu \}\) we have

$$\begin{aligned} a_{\nu _2^+}&< \tilde{l}\left( - \frac{1}{2}b\right) \cdot \frac{1}{\sqrt{2}} (101) = b = b_{\nu _2^+},\\ a_{\nu _2^-}&< \tilde{l}\left( - \frac{1}{2}b\right) \cdot \frac{1}{\sqrt{2}} (\overline{1} 01) = 0< b_{\nu _2^-},\\ a_{\nu _3^+}&< \tilde{l}\left( - \frac{1}{2}b\right) \cdot \frac{1}{\sqrt{2}} (110) = 0< b_{\nu _3^+},\\ a_{\nu _3^-}&< \tilde{l}\left( - \frac{1}{2}b\right) \cdot \frac{1}{\sqrt{2}} (1\overline{1}0) = b \le b_{\nu _3^-}. \end{aligned}$$

Therefore, for all \(\varepsilon >0\) sufficiently small, we have \(\tilde{l}(-\frac{1}{2}b) - \varepsilon E_1 \in P\), which implies \(\tilde{l}(- \frac{1}{2}b) \in R\). Thus we have \(-\smash {\frac{1}{2}}b \in J\). Computation gives \(\tilde{l}(- \frac{1}{2}b) \cdot \nu _1^+ = 0 = 0 \cdot \nu _1^+ \) and \( \tilde{l}(- \frac{1}{2}b) \cdot \nu _3^+ = 0 = 0 \cdot \nu _3^+\). For \(\nu = \nu _1^+\) and \(\nu =\nu _3^+\) this proves

$$\begin{aligned} 0 \in \left\{ x\in \mathbb {R}^3: \nu \cdot x = \nu \cdot \tilde{l}(t) \text { for some } t \in J\right\} . \end{aligned}$$

Similarly, we have \(\tilde{l}( \frac{1}{2}b) = \smash {\frac{b}{\sqrt{2}}}(111) \in {B_{r_1}\left( 0\right) }\). We also get \(\tilde{l}(\frac{1}{2}b) \in R\) by the computations

$$\begin{aligned} a_{\nu _2^+}&< \tilde{l}\left( \frac{1}{2}b\right) \cdot \frac{1}{\sqrt{2}} (101) = b = b_{\nu _2^+},\\ a_{\nu _2^-}&< \tilde{l}\left( \frac{1}{2}b\right) \cdot \frac{1}{\sqrt{2}} (\overline{1} 01) = 0< b_{\nu _2^-},\\ a_{\nu _3^+}&< \tilde{l}\left( \frac{1}{2}b\right) \cdot \frac{1}{\sqrt{2}} (110) = b \le b_{\nu _3^+},\\ a_{\nu _3^-}&< \tilde{l}\left( \frac{1}{2}b\right) \cdot \frac{1}{\sqrt{2}} (1\overline{1}0) = 0 < b_{\nu _3^-}, \end{aligned}$$

so that we have \(\frac{b}{2} \in J\). Due to \(\tilde{l}( \frac{1}{2}b) \cdot \nu _1^- = 0 =0 \cdot \nu _1^- \) and \( \tilde{l}(\frac{1}{2}b) \cdot \nu _3^- = 0 = 0 \cdot \nu _3^-\), we get \(0 \in \left\{ x\in \mathbb {R}^3: \nu \cdot x = \nu \cdot \tilde{l}(t) \text { for some } t \in J \right\} \) for \(\nu = \nu _1^-\) and \(\nu = \nu _3^-\), which finally ensures \(0 \in \overline{Q}\). \(\quad \square \)
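The dot products in Steps 10 and 11 are elementary but easy to get wrong. The following numeric sketch re-checks them; it is not part of the proof, and it assumes the concretization \(\nu _1^\pm = \frac{1}{\sqrt{2}}(0,\pm 1,1)\), \(\nu _2^\pm = \frac{1}{\sqrt{2}}(\pm 1,0,1)\), \(\nu _3^\pm = \frac{1}{\sqrt{2}}(1,\pm 1,0)\) and \(E_2 = (0,1,0)\) suggested by the crystallographic notation \((101)\), \((110)\), etc. used above.

```python
import math

s = 1 / math.sqrt(2)
# hypothetical concrete normals, matching the (101), (110), ... notation
nu = {
    "1+": (0.0, s, s), "1-": (0.0, -s, s),
    "2+": (s, 0.0, s), "2-": (-s, 0.0, s),
    "3+": (s, s, 0.0), "3-": (s, -s, 0.0),
}
E2 = (0.0, 1.0, 0.0)  # assumed direction entering the line \tilde{l}
b = 0.01              # any 0 < b < r_2; the identities hold for every such b

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def l_tilde(t):
    # the line t -> b*nu_2^+ + sqrt(2)*t*E_2 from Step 10
    return tuple(b * n + math.sqrt(2) * t * e for n, e in zip(nu["2+"], E2))

# endpoints used in Step 11: (b/sqrt(2))(1,-1,1) and (b/sqrt(2))(1,1,1)
for t, signs in ((-b / 2, (1, -1, 1)), (b / 2, (1, 1, 1))):
    target = tuple(b / math.sqrt(2) * c for c in signs)
    assert all(math.isclose(p, q, abs_tol=1e-15) for p, q in zip(l_tilde(t), target))

# the projections computed in Step 11
expected = {
    (-1, "2+"): b, (-1, "2-"): 0, (-1, "3+"): 0, (-1, "3-"): b, (-1, "1+"): 0,
    (+1, "2+"): b, (+1, "2-"): 0, (+1, "3+"): b, (+1, "3-"): 0, (+1, "1-"): 0,
}
for (sign, key), value in expected.items():
    assert math.isclose(dot(l_tilde(sign * b / 2), nu[key]), value, abs_tol=1e-15)
print("all projections used in Step 11 confirmed")
```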

Proof of Lemma 8

The continuity of \(f_\delta \), given by \(f_\delta (x)= \smash {\fint _{{B_{\delta }\left( x\right) }} f(y)\, \mathrm {d}y}\) for \(x\in U\), follows from the observation that \(f_\delta \) is the convolution of an extension of f with the kernel \(\frac{1}{|{B_{\delta }\left( 0\right) }|} \chi _{{B_{\delta }\left( 0\right) }}\). Let \(\smash {\widetilde{U}} \subset \subset U\) be open. Let \(x \in \smash {\widetilde{U}}\) and let \(\delta >0\) be small enough to have \({B_{\delta }\left( x\right) } \subset U\). We then have that

$$\begin{aligned} {\text {dist}}(f_\delta (x), K)&= \inf _{\hat{f} \in K} | f_\delta (x) - \hat{f}| \\&= \fint _{{B_{\delta }\left( x\right) }} \inf _{\hat{f} \in K} | f_\delta (x) - \hat{f}| \, \mathrm {d}y \le \fint _{{B_{\delta }\left( x\right) }} | f_\delta (x) - f(y)| \, \mathrm {d}y \rightarrow 0, \end{aligned}$$

where the convergence in \(\delta \rightarrow 0\) is uniform in \(\widetilde{U}\) by definition of VMO. \(\quad \square \)
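The mechanism of the proof admits a toy discrete analogue, which is not part of Lemma 8: for a \(\{0,1\}\)-valued function, the distance of a local average to \(K=\{0,1\}\) is dominated pointwise by the local mean oscillation. The uniform grid and window below are assumptions of this sketch.

```python
import math

# toy instance: a {0,1}-valued function sampled on a uniform grid
K = (0.0, 1.0)
f = [1.0 if math.sin(40 * k / 1000) > 0 else 0.0 for k in range(1000)]

w = 25  # half-width of the averaging window, playing the role of delta
for x in range(w, len(f) - w):
    window = f[x - w : x + w + 1]
    f_delta = sum(window) / len(window)           # local average f_delta(x)
    dist_to_K = min(abs(f_delta - k) for k in K)  # dist(f_delta(x), K)
    mean_osc = sum(abs(f_delta - y) for y in window) / len(window)
    # pointwise version of the displayed chain of (in)equalities
    assert dist_to_K <= mean_osc + 1e-12
print("dist(f_delta, K) is bounded by the local mean oscillation everywhere")
```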

4.6 Classification of Planar Configurations

Proof of Lemma 9

As the configuration is planar, there exist \(d\in \mathscr {D}\) and, for \(i=1,2,3\) and \(\nu _i \in N_i\) with \(\nu _i \cdot d=0\), functions \(\tilde{f}_{\nu _i} \in L^\infty (-1,1)\) and affine functions \(\tilde{g}_i :\mathbb {R}^3 \rightarrow \mathbb {R}\) with \(\partial _d \tilde{g}_i = 0\) such that for almost all \(x\in {B_{1}\left( 0\right) }\) we have the decomposition

$$\begin{aligned} \theta _1(x)&= \tilde{f}_{\nu _2}(x\cdot \nu _2) - \tilde{f}_{\nu _3}(x\cdot \nu _3) + \tilde{g}_1(x),\nonumber \\ \theta _2(x)&= {}-{} \tilde{f}_{\nu _1} (x\cdot \nu _1) + \tilde{f}_{\nu _3}(x\cdot \nu _3) + \tilde{g}_2(x),\nonumber \\ \theta _3(x)&= \tilde{f}_{\nu _1}(x\cdot \nu _1) - \tilde{f}_{\nu _2}(x\cdot \nu _2) + \tilde{g}_3(x). \end{aligned}$$
(79)

Without loss of generality, we may assume that \(\tilde{f}_{\nu _1}\) is affine.

Because \(\nu _2\) and \(\nu _3\) span the space \(\{x\in \mathbb {R}^3:x\cdot d=0\}\) and we have \(\partial _d \tilde{g}_1 = 0\), there exist \(a_{\nu _2}, a_{\nu _3}, c \in \mathbb {R}\) such that for all \(x\in \mathbb {R}^3\) we have \(\tilde{g}_1(x) = a_{\nu _2} x \cdot \nu _2 + a_{\nu _3}x\cdot \nu _3 + c\). For \(t\in (-1,1)\) and \(x\in {B_{1}\left( 0\right) }\) we define \(f_{\nu _2},f_{\nu _3} \in L^\infty (-1,1)\) and \(g: {B_{1}\left( 0\right) } \rightarrow \mathbb {R}\) affine via

$$\begin{aligned} f_{\nu _2}(t)&\, {:}{=}\, \tilde{f}_{\nu _2}(t) + a_{\nu _2} t + c-1,\\ f_{\nu _3}(t)&\, {:}{=}\, \tilde{f}_{\nu _3}(t) - a_{\nu _3} t,\\ g (x)&\, {:}{=}\, \tilde{g}_2(x) - \tilde{f}_{\nu _1}(x\cdot \nu _1) + a_{\nu _3} x\cdot \nu _3 \end{aligned}$$

and note \(\partial _d g (x) = \partial _d \tilde{g}_2 (x) = 0\) for all \(x \in {B_{1}\left( 0\right) }\). Thus, by the decomposition (79) and the assumption \(\theta \in \mathscr {\widetilde{K}}\) almost everywhere (see also (12)), we get for all \(x\in {B_{1}\left( 0\right) }\) that

$$\begin{aligned} g(x) + \tilde{g}_3(x) + \tilde{f}_{\nu _1}(x\cdot \nu _1) + a_{\nu _2} x\cdot \nu _2 + c -1&= \tilde{g}_1(x) + \tilde{g}_2(x) + \tilde{g}_3(x) -1 \\&= \theta _1(x) +\theta _2(x) +\theta _3(x) -1\\&= 0, \end{aligned}$$

so that in turn the decomposition (79) for almost all \(x\in {B_{1}\left( 0\right) }\) simplifies to

$$\begin{aligned} \theta _1(x)&= f_{\nu _2}(x\cdot \nu _2) - f_{\nu _3}(x\cdot \nu _3) + 1, \nonumber \\ \theta _2(x)&= f_{\nu _3}(x\cdot \nu _3) +g(x),\nonumber \\ \theta _3(x)&= {}-{} f_{\nu _2}(x\cdot \nu _2) - g(x). \end{aligned}$$
(80)
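The passage from (79) to (80) is elementary bookkeeping; the following sketch double-checks it numerically with arbitrary placeholder data. The only structural inputs are that \(\tilde{f}_{\nu _1}\) and \(\tilde{g}_1\) are affine and that \(\tilde{g}_1+\tilde{g}_2+\tilde{g}_3 \equiv 1\), as derived above; every concrete coefficient and function in the code is a hypothetical choice.

```python
import math, random

random.seed(0)
# placeholder data: any bounded functions and affine maps will do
f1t = lambda t: 0.7 * t - 0.2              # \tilde f_{nu_1}, affine (WLOG)
f2t = lambda t: math.sin(3 * t)            # \tilde f_{nu_2}
f3t = lambda t: math.cos(2 * t) ** 2       # \tilde f_{nu_3}
a2, a3, c = 0.3, -0.5, 0.4                 # coefficients of \tilde g_1
g1t = lambda s2, s3: a2 * s2 + a3 * s3 + c
g2t = lambda s2, s3: 0.1 * s2 + 0.8 * s3 - 0.6      # some affine \tilde g_2
g3t = lambda s2, s3: 1 - g1t(s2, s3) - g2t(s2, s3)  # enforces sum = 1

# the redefined quantities of the proof
f2 = lambda t: f2t(t) + a2 * t + c - 1
f3 = lambda t: f3t(t) - a3 * t
g = lambda s1, s2, s3: g2t(s2, s3) - f1t(s1) + a3 * s3

for _ in range(100):
    s1, s2, s3 = (random.uniform(-1, 1) for _ in range(3))
    th1 = f2t(s2) - f3t(s3) + g1t(s2, s3)   # decomposition (79)
    th2 = -f1t(s1) + f3t(s3) + g2t(s2, s3)
    th3 = f1t(s1) - f2t(s2) + g3t(s2, s3)
    assert math.isclose(th1 + th2 + th3, 1, abs_tol=1e-9)
    assert math.isclose(th1, f2(s2) - f3(s3) + 1, abs_tol=1e-9)      # (80)
    assert math.isclose(th2, f3(s3) + g(s1, s2, s3), abs_tol=1e-9)
    assert math.isclose(th3, -f2(s2) - g(s1, s2, s3), abs_tol=1e-9)
print("decompositions (79) and (80) agree for arbitrary data")
```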

Let \(0< r_2< r_1 <\frac{1}{64}\) be the two universal radii of Proposition 1. If we have \(f_{\nu _2}, f_{\nu _3} \in VMO(-r_2,r_2)\), then Proposition 2 implies that e(u) is a two-variant configuration on \({B_{r}\left( 0\right) }\) for some universal constant \(r \in (0,r_2)\). Therefore, it is sufficient to consider the case \(f_{\nu _2} \not \in VMO(-r_2,r_2)\), the remaining one being similar. Then Proposition 1 tells us that there exist \(\alpha \in [-r_2,r_2]\), \(b\in (0,1)\) and \(B\subset H(\alpha ,\nu _2) \cap B_{1/8}(0)\) with \(\mathscr {H}^2(B \cap B_{r_1}(0))>0\) such that \(\mathscr {H}^2\)-almost everywhere on \(H(\alpha ,\nu _2) \cap B_{1/8}(0)\) we have

$$\begin{aligned} \theta _2 |_{H(\alpha ,\nu _2)\cap B_{1/8}(0) } = b \chi _{ B}. \end{aligned}$$
(81)

As \(\nu _2\) and \(\nu _3\) span \(\{x\in \mathbb {R}^3:x\cdot d=0\}\), we can find two affine functions \(\hat{g}_2, \hat{g}_3:\mathbb {R}\rightarrow \mathbb {R}\) with \(\hat{g}_2(\alpha )=0\) so that for all \(x\in {B_{1}\left( 0\right) }\) we have \(g(x) = \hat{g}_2(x\cdot \nu _2) +\hat{g}_3(x\cdot \nu _3)\). Thus for \(\mathscr {H}^2\)-almost all \(x\in H(\alpha ,\nu _2) \cap {B_{1/8}\left( 0\right) }\) we have

$$\begin{aligned} f_{\nu _3}(x\cdot \nu _3) + \hat{g}_3(x\cdot \nu _3) = \theta _2(x) = b \chi _{ B}(x). \end{aligned}$$
(82)

Exploiting (11) and \(r_2\in (0,\frac{1}{64})\), we get that for all \(t\in (-\frac{1}{16},\frac{1}{16})\) there exists \( \tilde{x}\in H(\alpha ,\nu _2) \cap {B_{1/8}\left( 0\right) }\) with \(\tilde{x}\cdot \nu _3 = t \). Consequently, there exists a set \(\tilde{B} \subset (-\frac{1}{16},\frac{1}{16})\) with \(|\tilde{B} \cap (-r_1,r_1)|>0\) such that for almost all \(t\in (-\frac{1}{16},\frac{1}{16})\) we have

$$\begin{aligned} f_{\nu _3}(t) + \hat{g}_3(t) = b \chi _{\tilde{B}}(t). \end{aligned}$$
(83)

Similarly, for all \(x\in B_{1/16}(0)\) there exists \(\tilde{x} \in H(\alpha ,\nu _2) \cap {B_{1/8}\left( 0\right) }\) with \(\tilde{x}\cdot \nu _3 = x\cdot \nu _3 \) and we almost everywhere get from (83) that

$$\begin{aligned} \theta _2(x)&=f_{\nu _3}(x\cdot \nu _3) + g(x) \nonumber \\&= f_{\nu _3}(\tilde{x}\cdot \nu _3) + \hat{g}_2(x\cdot \nu _2) + \hat{g}_3(\tilde{x} \cdot \nu _3)\nonumber \\&= b\chi _{\tilde{B}}(x\cdot \nu _3) + \hat{g}_2(x\cdot \nu _2). \end{aligned}$$
(84)

Case 1: We have \(|\tilde{B} \cap (-\frac{1}{32},\frac{1}{32})| < \frac{1}{16}\).

Again by (11), for every \(t\in (-\frac{1}{32},\frac{1}{32})\) we can choose \(x\in B_{1/16}(0)\) with \(x\cdot \nu _3 \in {\tilde{B}}^{\mathsf {c}}\cap (-\frac{1}{32},\frac{1}{32}) \) and \(x\cdot \nu _2 =t\), so that (84) implies \(\hat{g}_2(t) \ge 0\). Thus \(\hat{g}_2\) is an affine function achieving a local minimum value of 0 at \(\alpha \), which in turn ensures that \(\hat{g}_2 \equiv 0\) on \(\mathbb {R}\). Consequently, for almost all \(x\in B_{1/16}(0)\), we get \(\theta _2(x) = b\chi _{\tilde{B}}(x\cdot \nu _3)\), so that together with the identity (83) the decomposition (80) becomes

$$\begin{aligned} \theta _1(x)&= f_{\nu _2}(x\cdot \nu _2) - b\chi _{\tilde{B}}(x\cdot \nu _3) +1 + \hat{g}_3(x\cdot \nu _3), \nonumber \\ \theta _2(x)&= b\chi _{\tilde{B}}(x\cdot \nu _3) ,\nonumber \\ \theta _3(x)&= {}-{} f_{\nu _2}(x\cdot \nu _2) - \hat{g}_3(x\cdot \nu _3). \end{aligned}$$
(85)

Since we have \(|\tilde{B} \cap (-r_1,r_1)|>0\) and \(\theta \in \widetilde{\mathscr {K}}\) almost everywhere on \({B_{1}\left( 0\right) }\), see also (12), we have for almost all \(x\in B_{1/16}(0)\) with \(x\cdot \nu _3 \in \tilde{B}\) that

$$\begin{aligned} \theta _1(x) = 1-b,\, \theta _3(x) = 0 \text { or } \theta _1(x) = 0,\, \theta _3(x) = 1-b. \end{aligned}$$
(86)

Looking at all \(x\in {B_{1/16}\left( 0\right) }\) for which in addition \(x\cdot \nu _2 \in (-\frac{1}{32},\frac{1}{32})\) is a Lebesgue point of \(f_{\nu _2}\), this implies that there exists \(\hat{c} \in \mathbb {R}\) such that \(\hat{g}_3\equiv \hat{c}\). In a second step, considering all \(x\in B_{1/16}(0)\) such that \(x\cdot \nu _3 \in \tilde{B} \cap (-r_1,r_1) \subset (-\frac{1}{64},\frac{1}{64})\), the alternative (86) and the third line of (85) imply that there exists a measurable set \(A\subset (-\frac{1}{32},\frac{1}{32})\) with \(-f_{\nu _2}(s) - \hat{c} = (1-b)\chi _A(s)\) for almost all \(s\in (-\frac{1}{32},\frac{1}{32})\). Hence the decomposition (85) can for almost all \(x\in B_{1/32}(0)\) be written as

$$\begin{aligned} \theta _1(x)&= - (1-b)\chi _A(x\cdot \nu _2) -b\chi _{\tilde{B}}(x\cdot \nu _3) + 1, \\ \theta _2(x)&= b\chi _{\tilde{B}}(x\cdot \nu _3) ,\\ \theta _3(x)&= (1-b)\chi _A(x\cdot \nu _2), \end{aligned}$$

meaning the configuration is a planar checkerboard on \(B_{1/32}(0)\) according to Definition 5.
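As a sanity check, the checkerboard ansatz indeed takes values in \(\widetilde{\mathscr {K}}\) for every volume fraction \(b\in (0,1)\), assuming (as suggested by (12)) that \(\widetilde{\mathscr {K}}\) consists of the vectors \(\theta \in [0,1]^3\) summing to one with at least one vanishing entry. A minimal numeric sketch over all indicator values:

```python
import itertools

def checkerboard(b, chiA, chiB):
    """Checkerboard volume fractions for indicator values chiA, chiB in {0, 1}."""
    th1 = 1 - (1 - b) * chiA - b * chiB
    th2 = b * chiB
    th3 = (1 - b) * chiA
    return th1, th2, th3

for b in (0.25, 0.5, 0.75):  # dyadic fractions keep the arithmetic exact
    for chiA, chiB in itertools.product((0, 1), repeat=2):
        th = checkerboard(b, chiA, chiB)
        assert all(0 <= t <= 1 for t in th)  # admissible volume fractions
        assert sum(th) == 1                  # partition of unity
        assert min(th) == 0                  # at least one variant absent
print("the checkerboard takes values in the set K-tilde")
```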

Case 2: We have \(|\tilde{B} \cap (-\frac{1}{32},\frac{1}{32})| = \frac{1}{16}\).

In this case, (84) gives that \(f_{\nu _3}\) is affine on \((-\frac{1}{32},\frac{1}{32})\) and that there exist \(\tilde{a}, \tilde{b}\in \mathbb {R}\) such that for almost all \(x\in B_{1/32}(0)\) we have \(\theta _2(x)= \tilde{a} (x\cdot \nu _2) +\tilde{b}\). Therefore, if we have \(|\{x\in B_{1/32}(0): \theta _2(x)=0\}|>0\), then \(\theta _2 \equiv 0\) on \(B_{1/32}(0)\). By Corollary 1, the strain e(u) then is a two-variant configuration on \(B_{1/32}(0)\). In the following, we may thus assume \(\theta _2(x)>0\) for all \(x\in B_{1/32}(0)\). Due to the assumption \(\theta \in \mathscr {\widetilde{K}}\), we may furthermore also assume

$$\begin{aligned} |\theta _1^{-1}(0)\cap B_{1/32} |>0 \text { and } |\theta _3^{-1}(0)\cap B_{1/32} |>0 \end{aligned}$$
(87)

as otherwise e(u) is also a two-variant configuration on \(B_{1/32}(0)\) by virtue of Corollary 1.

By (87), since \(f_{\nu _3}\) is affine on \((-\frac{1}{32},\frac{1}{32})\) and constitutes the only \(x\cdot \nu _3\)-dependence of \(\theta _1\) in the decomposition (80), there exists \(\tilde{c} \in \mathbb {R}\) such that \(f_{\nu _3}(t)= \tilde{c}\) for all \(t\in (-\frac{1}{32},\frac{1}{32})\). Similarly, we see that \(\theta _3\) only depends on \(x\cdot \nu _2\) for \(x\in B_{1/32}(0)\). Therefore, there exist two functions \(h_1, h_3: (-\frac{1}{32},\frac{1}{32}) \rightarrow \mathbb {R}\) such that (80) for almost all \(x\in {B_{1/32}\left( 0\right) }\) simplifies to

$$\begin{aligned} \theta _1(x)&= h_1(x\cdot \nu _2), \\ \theta _2(x)&= \tilde{a} x \cdot \nu _2 + \tilde{b}, \\ \theta _3(x)&= h_3(x\cdot \nu _2) . \end{aligned}$$

For almost all \(x\in B_{1/32}(0)\), by the facts \(\theta _2(x)>0\), \(\theta _1(x)\theta _2(x)\theta _3(x)=0\) and \(\theta _1(x)+\theta _2(x)+\theta _3(x)=1\) resulting from the assumption \(\theta (x) \in \mathscr {\widetilde{K}}\), we get that there exists a measurable set \(\tilde{A}\subset (-\frac{1}{32}, \frac{1}{32})\) such that

$$\begin{aligned} \theta _1(x)&= (1 -\tilde{a} x \cdot \nu _2 - \tilde{b})\chi _{\tilde{A}}(x\cdot \nu _2), \\ \theta _2(x)&= \tilde{a} x \cdot \nu _2 + \tilde{b}, \\ \theta _3(x)&= (1-\tilde{a} x \cdot \nu _2 - \tilde{b})\chi _{{\tilde{A}}^{\mathsf {c}}}(x\cdot \nu _2). \end{aligned}$$

Consequently, e(u) is a planar second-order laminate on \( B_{1/32}(0)\) according to Definition 4.
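The second-order laminate ansatz admits the same kind of sanity check: for every value of \(\theta _2\) in \([0,1]\), the triple lies in \(\widetilde{\mathscr {K}}\), assuming, as suggested by (12), that \(\widetilde{\mathscr {K}}\) is the set of \(\theta \in [0,1]^3\) with unit sum and at least one vanishing entry. A brief numeric sketch:

```python
def laminate(theta2, chi):
    """Second-order laminate fractions; theta2 plays the role of the affine
    function a~(x . nu2) + b~, and chi in {0, 1} is the indicator of A~."""
    return ((1 - theta2) * chi, theta2, (1 - theta2) * (1 - chi))

for theta2 in (0.0, 0.125, 0.5, 0.875, 1.0):  # dyadic values, exact arithmetic
    for chi in (0, 1):
        th = laminate(theta2, chi)
        assert all(0 <= t <= 1 for t in th)  # admissible volume fractions
        assert sum(th) == 1                  # partition of unity
        assert min(th) == 0  # theta1 or theta3 vanishes, so theta is in K-tilde
print("the planar second-order laminate takes values in K-tilde")
```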

Finally, the lemma follows due to \(r\le \frac{1}{32}\). \(\quad \square \)

Proof of Proposition 3

Step 1: Rewrite the problem in a two-dimensional domain and bring the decomposition (14) into an appropriate form.

We will show that we may take \(r \, {:}{=}\, \frac{1}{16}\). Using the specific form of the normals \(\nu _i\), see Section 2.2, we can find orientations \(\tilde{\nu }_i = \pm \nu _i\) for \(i=1,2,3\) which satisfy \(\tilde{\nu }_1+\tilde{\nu }_2+\tilde{\nu }_3 =0\). Furthermore, the strain e(u) and the affine function g only depend on directions in \(V \, {:}{=} \, {\text {span}} (\tilde{\nu }_1,\tilde{\nu }_2,\tilde{\nu }_3)\). Thus we can rotate the domain of definition such that \(V = \mathbb {R}^2\) and treat e(u), \(\theta \) and g as functions defined on \(B_1^{\,2}(0) \subset \mathbb {R}^2\). In the following we will abuse the notation by writing \(\nu _i\) for the images of \(\tilde{\nu }_i\) under this rotation.

It is straightforward to see that the angle between \(\nu _i\) and \(\nu _{i+1}\) for cyclical indices \(i=1,2,3\) is given by \(120^{\circ }\), and hence the two vectors are linearly independent. Similarly to (11), we therefore have for all \(x\in \mathbb {R}^2\), \(\tilde{r} >0\) and \(i=1,2,3\) that

$$\begin{aligned} |x\cdot \nu _i|<\tilde{r} \text { and } |x\cdot \nu _{i+1}|< \tilde{r} \text { imply } |x|<2\tilde{r}. \end{aligned}$$
(88)
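The constant 2 in (88) can be checked numerically; the concrete planar unit normals at mutual angle \(120^{\circ }\) below are an assumption of the sketch (any rotation of them works). The bound \(|x|<2\tilde{r}\) is sharp, being attained in the limit \(x\cdot \nu _i = x\cdot \nu _{i+1} = \pm \tilde{r}\).

```python
import math, random

random.seed(1)
# assumed concrete unit normals with pairwise angles of 120 degrees
nus = [(math.cos(2 * math.pi * i / 3), math.sin(2 * math.pi * i / 3))
       for i in range(3)]

r = 0.3
worst = 0.0
for _ in range(20000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    for i in range(3):
        a = x[0] * nus[i][0] + x[1] * nus[i][1]
        b = x[0] * nus[(i + 1) % 3][0] + x[1] * nus[(i + 1) % 3][1]
        if abs(a) < r and abs(b) < r:   # hypothesis of (88)
            norm = math.hypot(x[0], x[1])
            assert norm < 2 * r         # conclusion of (88)
            worst = max(worst, norm / r)
print(f"largest observed |x|/r: {worst:.3f} (the bound is 2)")
```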

Additionally, for all \(i,j \in \{1,2,3\}\) with \(i\ne j\) there exist functions \(f_{i,j} \in L^\infty (-1,1)\) such that for cyclical indices \(k=1,2,3\) the sum \(f_{k+1,k} + f_{k-1,k}\) coincides almost everywhere with an affine function and such that for almost all \(x \in B_1^{\,2}(0)\) we can rewrite the decomposition (14) as

$$\begin{aligned} \theta _1(x)&= f_{1,2}(x\cdot \nu _2) + f_{1,3}(x\cdot \nu _3), \nonumber \\ \theta _2(x)&= f_{2,1}(x\cdot \nu _1) + f_{2,3}(x\cdot \nu _3), \nonumber \\ \theta _3 (x)&= f_{3,1}(x\cdot \nu _1) + f_{3,2}(x\cdot \nu _2). \end{aligned}$$
(89)

The assumption that for all \(i=1,2,3\) the functions \(f_{\nu _i}\) are not affine on \(\left( -r,r\right) \) then translates into \(f_{i,j}\) also not being affine on \(\left( -\frac{1}{16},\frac{1}{16}\right) \) for all \(i,j \in \{1,2,3\}\) with \(i\ne j\) due to the choice \(r=\frac{1}{16}\).

Step 2: If \(k\in \{1,2,3\}\) is such that \(\left| \theta _k^{-1}(0)\cap B_{1/2}^{\,2}(0)\right| > 0\), then we can additionally assume \(f_{k,k+1}, f_{k,k-1} \in L^\infty (-1,1)\) to satisfy \(f_{k,k+1}(t),f_{k,k-1}(t)\ge 0\) for almost all \(t \in (- \frac{1}{2}, \frac{1}{2})\), and \(f_{k,k+1}(x\cdot \nu _{k+1}) = f_{k,k-1}(x\cdot \nu _{k-1}) = 0\) for almost all \(x\in \theta _k^{-1}(0)\cap B_{1/2}^{\,2}(0)\).

Let \(k\in \{1,2,3\}\) be as in the claim. As a result of (88), we have for almost all \(x\in \theta _k^{-1}(0)\cap B_{1/2}^{\,2}(0)\) that

$$\begin{aligned} 0&= f_{k,k+1}(x\cdot \nu _{k+1}) + f_{k,k-1}(x\cdot \nu _{k-1}) \\&\ge \hbox {ess inf}_{(- \frac{1}{2}, \frac{1}{2})} f_{k,k+1} + \hbox {ess inf}_{(- \frac{1}{2}, \frac{1}{2})} f_{k,k-1} \\&\ge \hbox {ess inf}_{y \in B_1^{\,2}(0) } f_{k,k+1}(y\cdot \nu _{k+1}) + f_{k,k-1}(y\cdot \nu _{k-1})\\&\ge 0. \end{aligned}$$

Since the decomposition (89) is invariant under adding a constant to \(f_{k,k+1}\) and subtracting it from \(f_{k,k-1}\), the claim then follows by arranging for

$$\begin{aligned} \hbox {ess inf}_{(- \frac{1}{2}, \frac{1}{2})} \smash {f_{k,k+1}} = \hbox {ess inf}_{(- \frac{1}{2}, \frac{1}{2})} \smash {f_{k,k-1}} = 0. \end{aligned}$$

Step 3: For all \(k=1,2,3\) there exist measurable sets \(J_k \subset \left( -\frac{1}{2},\frac{1}{2}\right) \) such that we have

$$\begin{aligned} \left| \left( \theta _k^{-1}(0) {\Delta } \left( \pi _{k+1}^{-1}\left( J_{k+1}\right) \cap \pi _{k-1}^{-1}\left( {J}^{\mathsf {c}}_{k-1}\right) \right) \right) \cap B_{1/2}^{\,2}(0) \right| = 0 \end{aligned}$$
(90)

and

$$\begin{aligned} \left| \left( \pi _1^{-1}(J_1)\cap \pi _2^{-1}(J_2)\cap \pi _3^{-1}(J_3)\right) \cap B_{1/2}^{\,2}(0) \right|&=0,\nonumber \\ \left| \left( \pi _1^{-1}({J}^{\mathsf {c}}_1)\cap \pi _2^{-1}({J}^{\mathsf {c}}_2)\cap \pi _3^{-1}({J}^{\mathsf {c}}_3)\right) \cap B_{1/2}^{\,2}(0) \right|&=0. \end{aligned}$$
(91)

Let \(k\in \{1,2,3\}\). If \(\left| \theta _k^{-1}(0)\cap B_{1/2}^{\,2}(0)\right| > 0\) we set

$$\begin{aligned} I_{k,k+1}&\,{:}{=}\, \left( f_{k,k+1}\right) ^{-1}(0) \cap \left( -\frac{1}{2}, \frac{1}{2}\right) ,\nonumber \\ I_{k,k-1}&\, {:}{=}\, \left( f_{k,k-1}\right) ^{-1}(0) \cap \left( -\frac{1}{2}, \frac{1}{2}\right) . \end{aligned}$$
(92)

Otherwise we set \(I_{k,k+1}\, {:}{=}\, I_{k,k-1}\, {:}{=}\, \emptyset \). In any case, by Step 2 we have

$$\begin{aligned} \left| \left( \theta _k^{-1}(0) {\Delta } \left( \pi _{k+1}^{-1}\left( I_{k,k+1}\right) \cap \pi _{k-1}^{-1}\left( I_{k,k-1}\right) \right) \right) \cap B_{1/2}^{\,2}(0) \right| =0 . \end{aligned}$$
(93)

Claim 3.1: For all \(k=1,2,3\) we have \(\left| I_{k+1,k} \cap I_{k-1,k}\right| = 0\).

Let \(k\in \{1,2,3\}\). If \(|\theta _{k+1}^{-1}(0)\cap B_{1/2}^{\,2}(0)|=0\) or \(|\theta _{k-1}^{-1}(0)\cap B_{1/2}^{\,2}(0)|=0\) then there is nothing to prove. Otherwise we assume towards a contradiction that

$$\begin{aligned} \left| I_{k+1,k} \cap I_{k-1,k} \right| > 0. \end{aligned}$$

In that case, the affine function \(f_{k+1,k}+f_{k-1,k}\) vanishes on a set of positive measure. Consequently, we see \(f_{k+1,k} \equiv -f_{k-1,k}\). Since both functions are non-negative on \((-\frac{1}{2}, \frac{1}{2})\) by Step 2 we get \(f_{k+1,k} \equiv f_{k-1,k} \equiv 0\) on \((-\frac{1}{2}, \frac{1}{2})\). However, this contradicts our assumption that they are non-affine on \(\left( -\frac{1}{16},\frac{1}{16}\right) \), which proves Claim 3.1.

As a result of the equality (93) and Claim 3.1, we obtain for all \(k=1,2,3\) that

$$\begin{aligned} \left| \left( \theta _k^{-1}(0) {\setminus } \left( \pi _{k+1}^{-1}\left( I_{k,k+1}\right) \cap \pi _{k-1}^{-1}\left( {I}^{\mathsf {c}}_{k+1,k-1}\right) \right) \right) \cap B_{1/2}^{\,2}(0) \right| =0, \end{aligned}$$

which in terms of

$$\begin{aligned} J_i\, {:}{=}\, I_{i-1,i}\text { for } i=1,2,3 \end{aligned}$$
(94)

reads

$$\begin{aligned} \left| \left( \theta _k^{-1}(0) {\setminus } \left( \pi _{k+1}^{-1}\left( J_{k+1}\right) \cap \pi _{k-1}^{-1}\left( {J}^{\mathsf {c}}_{k-1}\right) \right) \right) \cap B_{1/2}^{\,2}(0) \right| =0. \end{aligned}$$
(95)
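For the reader's convenience, the index substitution behind (95) can be spelled out (all indices understood cyclically modulo 3):

```latex
J_{k+1} = I_{(k+1)-1,\,k+1} = I_{k,\,k+1},
\qquad
J_{k-1} = I_{(k-1)-1,\,k-1} = I_{k+1,\,k-1},
```

so the set \(I^{\mathsf {c}}_{k+1,k-1}\) appearing above is precisely \(J^{\mathsf {c}}_{k-1}\).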

Together with the assumption \(\theta \in \mathscr {\widetilde{K}}\) almost everywhere on \(B_{1/2}^{\,2}(0)\) in the form of \(\theta _1\theta _2\theta _3 =0\) we therefore get

$$\begin{aligned} \left| B_{1/2}^{\,2}(0) {\setminus } \left( \bigcup _{k=1,2,3} \pi _{k+1}^{-1}\left( J_{k+1}\right) \cap \pi _{k-1}^{-1}\left( {J}^{\mathsf {c}}_{k-1}\right) \right) \right| =0, \end{aligned}$$
(96)

so that straightforward combinatorics ensure the two equalities (91) by virtue of

$$\begin{aligned}&B_{1/2}^{\,2}(0) {\setminus } \left( \bigcup _{k=1,2,3} \pi _{k+1}^{-1}\left( J_{k+1}\right) \cap \pi _{k-1}^{-1}\left( {J}^{\mathsf {c}}_{k-1}\right) \right) \\&\quad = B_{1/2}^{\,2}(0)\cap \left( \left( \bigcap _{k=1,2,3} \pi _k^{-1}(J_k) \right) \cup \left( \bigcap _{k=1,2,3} \pi _k^{-1}({J}^{\mathsf {c}}_k) \right) \right) . \end{aligned}$$
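The combinatorics here can also be checked mechanically: classifying each point by its membership pattern with respect to the three preimages \(\pi _k^{-1}(J_k)\), the complement of the union consists exactly of the patterns \((0,0,0)\) and \((1,1,1)\). The following enumeration is an illustration only, not part of the proof:

```python
from itertools import product

# Pattern b = (b[0], b[1], b[2]); b[i] = 1 iff the point lies in pi_{i+1}^{-1}(J_{i+1}),
# i.e. b records which of the three sets J_1, J_2, J_3 the projections of x hit.
# All index arithmetic below is cyclic modulo 3.
def in_union(b):
    # membership in the union over k of  pi_{k+1}^{-1}(J_{k+1}) ∩ pi_{k-1}^{-1}(J_{k-1}^c)
    return any(b[(k + 1) % 3] == 1 and b[(k - 1) % 3] == 0 for k in range(3))

# The patterns missed by the union are exactly (0,0,0) and (1,1,1), i.e. the
# intersection of all three preimages of the J_k resp. of all three complements.
excluded = [b for b in product((0, 1), repeat=3) if not in_union(b)]
print(excluded)  # [(0, 0, 0), (1, 1, 1)]
```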

In order to obtain the identity (90), by the equality (95) it only remains to estimate, for all \(k = 1,2,3\),

$$\begin{aligned}&\left| \left( \left( \pi _{k+1}^{-1}\left( J_{k+1}\right) \cap \pi _{k-1}^{-1}\left( {J}^{\mathsf {c}}_{k-1}\right) \right) {\setminus } \theta _k^{-1}(0) \right) \cap B_{1/2}^{\,2}(0)\right| \\&\quad \le \left| \left( \left( \pi _{k+1}^{-1}\left( J_{k+1}\right) \cap \pi _{k-1}^{-1}\left( {J}^{\mathsf {c}}_{k-1}\right) \right) \cap \theta _{k+1}^{-1}(0) \right) \cap B_{1/2}^{\,2}(0)\right| \\&\qquad + \left| \left( \left( \pi _{k+1}^{-1}\left( J_{k+1}\right) \cap \pi _{k-1}^{-1}\left( {J}^{\mathsf {c}}_{k-1}\right) \right) \cap \theta _{k-1}^{-1}(0) \right) \cap B_{1/2}^{\,2}(0)\right| , \end{aligned}$$

where we again used \(\theta _1\theta _2\theta _3 =0\) almost everywhere on \(B^{\,2}_{1/2}(0)\). Together with (95) and the fact that the sets \(\pi _{i+1}^{-1}\left( J_{i+1}\right) \cap \pi _{i-1}^{-1}\left( {J}^{\mathsf {c}}_{i-1}\right) \) for \(i=1,2,3\) are pairwise disjoint by definition, we thus obtain the equality (90).

Step 4: The conclusion of the proposition holds.

We now make sure that we can apply Lemma 10. Unless we are dealing with a two-variant configuration on \(B_{1/16}^{\,2}(0)\), the fact that \(\theta _1\theta _2\theta _3 = 0\) almost everywhere, together with the identity (90), yields \(i,j \in \{1,2,3\}\) with \(i\ne j\) such that

$$\begin{aligned} \left| \left( \pi _{i+1}^{-1}\left( J_{i+1}\right) \cap \pi _{i-1}^{-1}\left( {J}^{\mathsf {c}}_{i-1}\right) \right) \cap B_{1/16}^{\,2}(0) \right|&>0,\\ \left| \left( \pi _{j+1}^{-1}\left( J_{j+1}\right) \cap \pi _{j-1}^{-1}\left( {J}^{\mathsf {c}}_{j-1}\right) \right) \cap B_{1/16}^{\,2}(0) \right|&>0. \end{aligned}$$

By relabeling, we may suppose \(i=3\) and \(j=1\). Consequently, we get

$$\begin{aligned} \begin{aligned} \left| J_1\cap \left( -\frac{1}{16},\frac{1}{16}\right) \right|&>0 , \qquad&\left| \left( -\frac{1}{16},\frac{1}{16}\right) {\setminus } J_2\right|&>0,\\ \left| J_2\cap \left( -\frac{1}{16},\frac{1}{16}\right) \right|&>0, \qquad&\left| \left( -\frac{1}{16},\frac{1}{16}\right) {\setminus } J_3\right|&>0, \end{aligned} \end{aligned}$$
(97)

which, in particular, implies \(0< \left| J_2\cap \left( -\frac{1}{16},\frac{1}{16}\right) \right| < \frac{1}{8} \). As a result of (94) and (92) we have \(f_{3,1}(t) = 0\) for almost all \(t\in J_1\), so that the assumption of \(f_{3,1}\) not being affine on \(\left( -\frac{1}{16},\frac{1}{16} \right) \), combined with (97), implies \(0< \left| J_1\cap \left( -\frac{1}{16}, \frac{1}{16} \right) \right| < \frac{1}{8}\). After a rescaling, Lemma 10 implies that there exists a point \(x_0 \in B_{1/8}^{\,2}(0)\) such that we have either

$$\begin{aligned} \left| \left( J_k {\Delta } \left( -\frac{1}{8},x_0\cdot \nu _k\right) \right) \cap \left( -\frac{1}{8},\frac{1}{8} \right) \right| = 0 \text { for all } k=1,2,3 \end{aligned}$$

or

$$\begin{aligned} \left| \left( J_k {\Delta } \left( x_0\cdot \nu _k,\frac{1}{8}\right) \right) \cap \left( -\frac{1}{8},\frac{1}{8} \right) \right| = 0 \text { for all } k=1,2,3. \end{aligned}$$

Tracing back the definitions using (94) and (92), we see that for all \(k=1,2,3\) we have \(f_{k-1,k}=0\) almost everywhere on \(J_{k}\). From (90), (93) and (94) we get for all \(k=1,2,3\) that

$$\begin{aligned} \left| \left( \pi _{k-1}^{-1}\left( {J}^{\mathsf {c}}_{k-1}\right) {\Delta } \pi _{k-1}^{-1}\left( I_{k,k-1}\right) \right) \cap \pi _{k+1}^{-1}\left( J_{k+1}\right) \cap B_{1/2}^2(0) \right| =0. \end{aligned}$$

Therefore, Fubini’s theorem together with (97) and (92) implies \(f_{3,2}=0\) almost everywhere on \(\left( -\frac{1}{8},\frac{1}{8}\right) {\setminus } J_{2}\) and \(f_{1,3} = 0\) almost everywhere on \(\left( -\frac{1}{8},\frac{1}{8}\right) {\setminus } J_3\). As \(f_{1,3}\) is not affine on \(\left( -\frac{1}{16},\frac{1}{16}\right) \) by assumption we get that \(|J_3 \cap \left( -\frac{1}{8},\frac{1}{8}\right) |>0\), which in turn implies \(f_{2,1} =0\) almost everywhere on \(\left( -\frac{1}{8},\frac{1}{8}\right) {\setminus } J_1\). As a result we can rewrite the decomposition (89) of \(\theta \) for almost all \(x\in B^{\,2}_{1/8}(0)\) to be

$$\begin{aligned} \theta _1(x)&= f_{1,2}(x\cdot \nu _2)\chi _{{J}^{\mathsf {c}}_2}(x\cdot \nu _2) + f_{1,3}(x\cdot \nu _3)\chi _{J_3}(x\cdot \nu _3) ,\nonumber \\ \theta _2(x)&= f_{2,1}(x\cdot \nu _1)\chi _{J_1}(x\cdot \nu _1) + f_{2,3}(x\cdot \nu _3) \chi _{{J}^{\mathsf {c}}_3}(x\cdot \nu _3) ,\nonumber \\ \theta _3 (x)&= f_{3,1}(x\cdot \nu _1)\chi _{{J}^{\mathsf {c}}_1}(x\cdot \nu _1) + f_{3,2}(x\cdot \nu _2)\chi _{J_2}(x\cdot \nu _2). \end{aligned}$$
(98)

The condition that the function \(f_{k+1,k} + f_{k-1,k}\) is affine for all \(k=1,2,3\) implies that there exist \(a_k, b_k \in \mathbb {R}\) such that for almost all \(t\in (-\frac{1}{8} ,\frac{1}{8})\) we have

$$\begin{aligned} \left( f_{k-1,k}\chi _{{J}^{\mathsf {c}}_{k}} + f_{k+1,k}\chi _{J_{k}}\right) (t) = a_kt + b_k. \end{aligned}$$

Due to the assumption \(\sum _{k=1}^3\theta _k \equiv 1\), see also (12), summing the equations in the decomposition (98) gives for almost all \(x \in B^{\,2}_{1/8}(0)\) that

$$\begin{aligned} \sum _{k=1}^3 \left( a_k x\cdot \nu _k + b_k \right) = 1. \end{aligned}$$

Comparing the coefficients of the affine functions on both sides, we see that

$$\begin{aligned} \sum _{k=1}^3 b_k =1\text {, } \sum _{k=1}^3 a_k\nu _k = 0. \end{aligned}$$

Subtracting \(a_1(\nu _1 + \nu _2 + \nu _3) = 0\) from the second equation and remembering from Step 1 that \(\nu _2\) and \(\nu _3\) are linearly independent, we see that \(a\, {:}{=}\, a_1= a_2 = a_3\). For almost all \(x\in B^{\,2}_{1/8}(0)\), the decomposition (98) thus turns into

$$\begin{aligned} \theta _1(x)&= (a x\cdot \nu _2 +b_2)\chi _{{J}^{\mathsf {c}}_2}(x\cdot \nu _2) + (ax\cdot \nu _3 + b_3)\chi _{J_3}(x\cdot \nu _3) ,\\ \theta _2(x)&= (ax\cdot \nu _1+b_1)\chi _{J_1}(x\cdot \nu _1) + (ax\cdot \nu _3 + b_3) \chi _{{J}^{\mathsf {c}}_3}(x\cdot \nu _3) ,\\ \theta _3 (x)&= (ax\cdot \nu _1 + b_1)\chi _{{J}^{\mathsf {c}}_1}(x\cdot \nu _1) + (ax\cdot \nu _2 + b_2)\chi _{J_2}(x\cdot \nu _2), \end{aligned}$$

with \(\sum _{k=1}^3 b_k = 1\). \(\quad \square \)

Proof of Lemma 10

Claim 1: There exist \(a_1, a_2 \in (-4,4)\) such that we have either

$$\begin{aligned} \left| \left( J_1 {\Delta } \left( -4,a_1\right) \right) \cap (-4,4) \right| = 0\text { and }\left| \left( J_2 {\Delta } \left( -4,a_2\right) \right) \cap (-4,4) \right| = 0 \end{aligned}$$

or

$$\begin{aligned} \left| \left( J_1 {\Delta } \left( a_1,4\right) \right) \cap (-4,4) \right| = 0 \text { and } \left| \left( J_2 {\Delta } \left( a_2,4\right) \right) \cap (-4,4) \right| = 0 . \end{aligned}$$

Towards a contradiction we assume the negation of Claim 1.

Step 1.1: Up to symmetries of the problem, find Lebesgue points \(-4< p_1< p_2 < 4\) of \(\chi _{J_1}\) and \(-4< q_1< q_2 < 4\) of \(\chi _{J_2}\) such that

$$\begin{aligned} \chi _{J_1}(p_1) = \chi _{J_2}(q_2)= 1\text { and }\chi _{J_1}(p_2) = \chi _{J_2}(q_1) = 0. \end{aligned}$$

We first demonstrate that if the negation of Claim 1 holds, then we can find Lebesgue points \(-4< p_1< p_2 < 4\) of \(\chi _{J_1}\) and \(-4< q_1< q_2 < 4\) of \(\chi _{J_2}\) such that

$$\begin{aligned} \chi _{J_1}(p_1)&\ne \chi _{J_1}(p_2),\\ \chi _{J_2}(q_1)&\ne \chi _{J_2}(q_2),\\ \chi _{J_1}(p_1)&\ne \chi _{J_2}(q_1),\\ \chi _{J_1}(p_2)&\ne \chi _{J_2}(q_2). \end{aligned}$$

If there exist \(a_1,a_2 \in \left( -4,4\right) \) such that

$$\begin{aligned} \left| \left( J_1 {\Delta } \left( -4,a_1\right) \right) \cap (-4,4)\right| = 0 \text { and } \left| \left( J_2 {\Delta } \left( a_2,4\right) \right) \cap (-4,4)\right| = 0, \end{aligned}$$

then one may take, for \(\delta >0\) small enough, Lebesgue points \(p_1 \in ( a_1 - \delta ,a_1) \), \(p_2 \in (a_1, a_1 + \delta )\) of \(\chi _{J_1}\) and Lebesgue points \(q_1 \in ( a_2 -\delta ,a_2)\) and \(q_2 \in (a_2, a_2 + \delta )\) of \(\chi _{J_2}\), see Fig. 24. The case \(\left| \left( J_1 {\Delta } \left( a_1,4\right) \right) \cap (-4,4)\right| = 0\) and \(\left| \left( J_2 {\Delta } \left( -4,a_2 \right) \right) \cap (-4,4)\right| = 0\) is handled in the same way.

Fig. 24
figure 24

Graphs of \(\chi _{J_1}\) (left) and \(\chi _{J_2}\) (right) in the case that \(J_1\) and \(J_2\) are intervals such that one of them has an endpoint at \(-4\) and the other one at 4. In this case we choose \(p_1, p_2\) and \(q_1, q_2\) on opposite sides of the respective other endpoint

By the assumption (26) we have neither \(|J_1 \cap (-4,4)| =0\) nor \(\left| \left( -4,4\right) {\setminus } J_1 \right| =0\). Thus, if there exists no \(a_1 \in \left( - 4,4\right) \) such that

$$\begin{aligned} \left| \left( J_1 {\Delta } \left( -4,a_1\right) \right) \cap (-4,4) \right| = 0 \text { or } \left| \left( J_1 {\Delta } \left( a_1,4\right) \right) \cap (-4,4) \right| = 0, \end{aligned}$$
(99)

then there exist three Lebesgue points \(-4<\bar{p}_1<\bar{p}_2< \bar{p}_3<4\) of \(\chi _{J_1}\) such that \(\chi _{J_1}(\bar{p}_1)\ne \chi _{J_1}(\bar{p}_2) \ne \chi _{J_1}(\bar{p}_3)\), see Fig. 25. Since also \(J_2\) has neither full nor zero measure in \(\left( -4,4\right) \) by (26), there exist Lebesgue points \(-4<q_1< q_2<4\) of \(\chi _{J_2}\) with \(\chi _{J_2}(q_1) \ne \chi _{J_2}(q_2)\). In the case \(\chi _{J_2}(q_1)\ne \chi _{J_1}(\bar{p}_1)\), set \(p_1 \,{:}{=} \,\bar{p}_1\) and \(p_2 \,{:}{=}\, \bar{p}_2\). Otherwise set \(p_1\, {:}{=}\, \bar{p}_2\) and \(p_2\, {:}{=}\, \bar{p}_3\).

Fig. 25
figure 25

Graphs of \(\chi _{J_1}\) (left) and \(\chi _{J_2}\) (right) in the case that \(J_1\) is not an interval with one endpoint at \(-4\) or 4. In this specific instance we choose \(p_1 = \bar{p}_2\) and \(p_2 = \bar{p}_3\)

If similarly there exists no \(a_2\in (-4,4)\) such that the analogue of (99) holds for \(J_2\), the same reasoning applies.

Furthermore, we may assume \(\chi _{J_1}(p_1) = 1\) because the statement of the lemma is clearly invariant under replacing all sets by their complements. The above collection of unordered inequalities then turns into \(\chi _{J_1}(p_1) = \chi _{J_2}(q_2) = 1\) and \(\chi _{J_1}(p_2) = \chi _{J_2}(q_1) = 0\).

Step 1.2: Find \(\delta >0\) and \(s_1, s_2 \in (-4+ \delta , 4-\delta )\) such that for

$$\begin{aligned} J_1^<&\,{:}{=} \, J_1 \cap (s_1 - \delta , s_1),&J_1^>&\, {:}{=}\, J_1 \cap (s_1, s_1 + \delta ),\\ J_2^<&\, {:}{=}\, J_2 \cap (s_2 - \delta , s_2),&J_2^>&\, {:}{=}\, J_2 \cap (s_2, s_2 + \delta ), \end{aligned}$$

we have

$$\begin{aligned} |J_1^<|> | J_1^>| \text { and } |J_2^> | > |J_2^<|, \end{aligned}$$

see Fig. 26.

By Step 1.1, there exists \(\tilde{\delta } > 0\) such that we have \(p_i \pm 3\tilde{\delta }\), \(q_i \pm 3\tilde{\delta } \in (-4,4)\) for all \(i=1,2\) and

$$\begin{aligned} \fint _{p_1-\tilde{\delta }}^{p_1+\tilde{\delta }} \chi _{J_1} \, \mathrm {d}t,&\fint _{q_2-\tilde{\delta }}^{q_2+\tilde{\delta }} \chi _{J_2} \, \mathrm {d}t > \frac{3}{4}, \\ \fint _{p_2-\tilde{\delta }}^{p_2+\tilde{\delta }} \chi _{J_1} \, \mathrm {d}t,&\fint _{q_1-\tilde{\delta }}^{q_1+\tilde{\delta }} \chi _{J_2} \, \mathrm {d}t < \frac{1}{4}. \end{aligned}$$

Since the map \(s\mapsto \fint _{s-\tilde{\delta }}^{s+\tilde{\delta }} \chi _{J_1} \, \mathrm {d}t\) with \(s\in [p_1,p_2]\) is continuous and takes a value above \(\frac{3}{4}\) at \(s = p_1\) and below \(\frac{1}{4}\) at \(s = p_2\), the intermediate value theorem guarantees that there exists

$$\begin{aligned} \tilde{s}_1 \,{:}{=}\, \max \left\{ p_1 \le s \le p_2 : \fint _{s-\tilde{\delta }}^{s+\tilde{\delta }} \chi _{J_1} \, \mathrm {d}t = \frac{1}{2} \right\} . \end{aligned}$$
Fig. 26
figure 26

The sets \(J^<_1\) and \(J^>_1\) (both left), and \(J^<_2\) and \(J^>_2\) (both right) locally split up \(J_1\) and \(J_2\). The graphs of their characteristic functions are shown in black

Let \(s_1\, {:}{=}\, \tilde{s}_1 +\tilde{\delta }\) and \(\delta \, {:}{=}\, 2 \tilde{\delta }\). Then we have \(s_1 \in (-4+\delta , 4-\delta )\) and

$$\begin{aligned} |J_1 \cap (s_1 - \delta , s_1)|= \frac{\delta }{2} > |J_1 \cap (s_1, s_1 + \delta )|, \end{aligned}$$

which with the notation \(J_1^<= J_1 \cap (s_1 - \delta , s_1)\) and \(J_1^> = J_1 \cap (s_1, s_1 + \delta )\) reads

$$\begin{aligned} |J_1^< |> |J_1^>|. \end{aligned}$$

Using the same reasoning we can find \(s_2 \in (-4+\delta , 4-\delta )\) such that for \(J_2^< = J_2 \cap (s_2-\delta , s_2)\) and \(J_2^> = J_2 \cap (s_2 , s_2 + \delta )\) we get

$$\begin{aligned} |J_2^>| > |J_2^<|. \end{aligned}$$

Step 1.3: Derive the contradiction.

In Fig. 16b, which illustrates the strategy of the argument, \(\pi ^{-1}_1(J_1^<) \cap \pi ^{-1}_2(J_2^>)\) is the darker set, while \(\pi _1^{-1}((s_1,s_1+\delta ){\setminus } J_1^>) \cap \pi _2^{-1}((s_2-\delta ,s_2){\setminus } J_2^<)\) is the lighter one. Let

$$\begin{aligned} D_1&\, {:}{=}\, \pi _1^{-1}(s_1-\delta ,s_1)\cap \pi _2^{-1}(s_2, s_2 +\delta ),\\ D_2&\,{:}{=}\, \pi _1^{-1}(s_1,s_1 +\delta )\cap \pi _2^{-1}(s_2 -\delta , s_2), \end{aligned}$$

and note that by \(s_1, s_2 \in (-4,4)\) and the still valid fact (88), we have \(D_1, D_2 \subset B^{\,2}_8(0)\). Let

$$\begin{aligned} A_1\, {:}{=}\, \left\{ s \in \mathbb {R}: \int _{\{x\cdot \nu _3 = s\}} \chi _{J_1^<}(x\cdot \nu _1) \chi _{J_2^>}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) > 0 \right\} \end{aligned}$$
(100)

and

$$\begin{aligned} A_2\, {:}{=}\, \left\{ s \in \mathbb {R}: \int _{\{x\cdot \nu _3 = s\}} \chi _{(s_1,s_1+\delta ){\setminus } J_1^>}(x\cdot \nu _1) \chi _{(s_2 -\delta ,s_2){\setminus } J_2^<}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) > 0 \right\} . \end{aligned}$$

By Lemma 11 we have

$$\begin{aligned} |A_1| \ge |J_1^<| + |J_2^>| \end{aligned}$$

and

$$\begin{aligned} |A_2| \ge |(s_1,s_1+\delta ){\setminus } J_1^>| + |(s_2 -\delta ,s_2){\setminus } J_2^<| = 2\delta - |J_1^>| - |J_2^<|. \end{aligned}$$

Summing these two inequalities and using the strict inequalities of Step 1.2 we see that

$$\begin{aligned} |A_1| + |A_2| \ge 2\delta + |J_1^<| - |J_1^>| + |J_2^>| - |J_2^<| > 2\delta = |\pi _3(D_1)|. \end{aligned}$$

As we also have \(A_1\subset \pi _3(D_1)\) and \( A_2 \subset \pi _3(D_2)\) by Lemma 11, the observation \(\pi _3(D_1) = (-s_1- s_2 - \delta , -s_1 - s_2 + \delta ) = \pi _3(D_2)\), resulting from \(\nu _1+\nu _2+\nu _3 =0\), implies

$$\begin{aligned} |A_1 \cap A_2| >0. \end{aligned}$$
(101)
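For completeness, the identity \(\pi _3(D_1) = (-s_1- s_2 - \delta , -s_1 - s_2 + \delta ) = \pi _3(D_2)\) used here is immediate from \(\nu _1+\nu _2+\nu _3 =0\): for \(x \in D_1\),

```latex
x\cdot \nu _3 = -x\cdot \nu _1 - x\cdot \nu _2
  \in \bigl( -s_1 - (s_2+\delta ),\; -(s_1-\delta ) - s_2 \bigr)
  = \left( -s_1-s_2-\delta ,\, -s_1-s_2+\delta \right) ,
```

and the computation for \(x \in D_2\) gives the same interval with the roles of the two shifts by \(\delta \) exchanged.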

By Fubini’s theorem and assumption (25), we have

$$\begin{aligned}&\int _{A_1 \cap J_3} \int _{\{x\cdot \nu _3 =s\}} \chi _{J_1}(x\cdot \nu _1) \chi _{J_2}(x\cdot \nu _2) \chi _{B^{\,2}_8(0)}(x) \, \mathrm {d}\mathscr {H}^1(x) \, \mathrm {d}s\nonumber \\&\quad = \left| \pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap \pi _3^{-1}(A_1 \cap J_3) \cap B^{\,2}_8(0) \right| \nonumber \\&\quad = 0. \end{aligned}$$
(102)

As a result of \(\pi _1^{-1}(J_1^<) \cap \pi _2^{-1}(J_2^>) \subset D_1 \subset B_8^{\,2}(0)\) we have, for almost all \(s\in \mathbb {R}\),

$$\begin{aligned}&\int _{\{x\cdot \nu _3 =s\}} \chi _{J_1}(x\cdot \nu _1) \chi _{J_2}(x\cdot \nu _2) \chi _{B^{\,2}_8(0)}(x) \, \mathrm {d}\mathscr {H}^1(x) \\&\quad \ge \int _{\{x\cdot \nu _3 =s\}} \chi _{J^<_1}(x\cdot \nu _1) \chi _{J^>_2}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x), \end{aligned}$$

so that the inner integral on the left-hand side of (102) is positive on \(A_1\), see definition (100). Consequently, we must have \(|A_1 \cap J_3| = 0\), and we similarly also get \(|A_2 \cap {J}^{\mathsf {c}}_3| = 0\). However, together with (101), this would imply

$$\begin{aligned} 0< |A_1\cap A_2| = |A_1\cap A_2 \cap J_3| + |A_1 \cap A_2 \cap {J}^{\mathsf {c}}_3| = 0, \end{aligned}$$

which clearly is a contradiction and allows us to conclude the proof of Claim 1.

Claim 2: There exists \(x_0 \in B^{\,2}_2(0)\) with \(x_0 \cdot \nu _1 = a_1\) and \(x_0 \cdot \nu _2 = a_2\). Depending on the “orientation” of \(J_1\) and \(J_2\) we either have

$$\begin{aligned} \left| \left( J_3 {\Delta } (-2, x_0 \cdot \nu _3)\right) \cap (-2,2) \right| =0 \text { or } \left| \left( J_3 {\Delta } ( x_0 \cdot \nu _3,2) \right) \cap (-2,2) \right| =0. \end{aligned}$$

Here, Fig. 16a offers an illustration of the argument. Assumption (26) together with Claim 1 gives \(a_1,a_2 \in (-1,1)\). Again using the fact that \(\nu _1\) and \(\nu _2\) form a \(120^\circ \) angle in the form of (88), there thus exists \(x_0 \in B^{\,2}_2(0)\) with \(x_0 \cdot \nu _1 = a_1\) and \(x_0 \cdot \nu _2 = a_2\). This ensures that \(J_1\) and \(J_2\) have the form advertised in the statement of the lemma.

Let us assume we are in the case

$$\begin{aligned} \left| \left( J_1 {\Delta } (-4, x_0 \cdot \nu _1) \right) \cap (-4,4) \right| =0\text { and } \left| \left( J_2 {\Delta } (-4, x_0 \cdot \nu _2) \right) \cap (-4,4) \right| =0, \end{aligned}$$

the other case being similar. By (88) we have \(\pi _1^{-1}(-4,x_0 \cdot \nu _1) \cap \pi _2^{-1}(-4,x_0\cdot \nu _2) \subset B^{\,2}_8(0)\), so that using (25) we as before get

$$\begin{aligned}&\int _{J_3} \int _{\{x\cdot \nu _3 =s\}} \chi _{(-4,x_0 \cdot \nu _1)}(x\cdot \nu _1) \chi _{(-4,x_0 \cdot \nu _2)}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) \, \mathrm {d}s \nonumber \\&\quad = \int _{J_3} \int _{\{x\cdot \nu _3 =s\}} \chi _{(-4,x_0 \cdot \nu _1)}(x\cdot \nu _1) \chi _{(-4,x_0 \cdot \nu _2)}(x\cdot \nu _2) \chi _{B^{\,2}_8(0)}(x) \, \mathrm {d}\mathscr {H}^1(x) \, \mathrm {d}s \nonumber \\&\quad = \left| \pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap \pi _3^{-1}(J_3) \cap B^{\,2}_8(0)\right| \nonumber \\&\quad = 0. \end{aligned}$$
(103)

Recall that \(x_0 \cdot \nu _3 <2\). For \(s \in (x_0\cdot \nu _3,2)\) let \(\sigma >0\) be such that \(s- x_0 \cdot \nu _3 > \sigma \) and \(-4 < x_0 \cdot \nu _1 -\sigma \). Then for all \(t\in (x_0\cdot \nu _1 - \sigma , x_0 \cdot \nu _1)\) and \(x\in \mathbb {R}^2\) with \(x\cdot \nu _1 =t\) and \(x\cdot \nu _3 = s\) we have by \(x_0 \cdot \nu _1 < 1\) that

$$\begin{aligned} x\cdot \nu _2 = - t - s&\in ( - x_0\cdot \nu _1 -2, - x_0\cdot \nu _1 + \sigma - s ) \\&\subset ( -4, - x_0\cdot \nu _1 -x_0\cdot \nu _3 ) \\&= (-4,x_0\cdot \nu _2). \end{aligned}$$

Thus we see, for almost all \(s\in (x_0\cdot \nu _3,2) \), that

$$\begin{aligned} \int _{\{x\cdot \nu _3 =s\}} \chi _{(-4,x_0 \cdot \nu _1)}(x\cdot \nu _1) \chi _{(-4,x_0 \cdot \nu _2)}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) > 0, \end{aligned}$$

which, together with (103), gives \(|J_3 \cap (x_0\cdot \nu _3,2) | =0\). A similar argument ensures \(|{J}^{\mathsf {c}}_3 \cap (-2,x_0\cdot \nu _3)| =0\). \(\quad \square \)

Proof of Lemma 11

The measurability of

$$\begin{aligned} A=\left\{ s \in \mathbb {R}: \int _{\{x\cdot \nu _3 = s\}} \chi _{J_1}(x\cdot \nu _1) \chi _{J_2}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) > 0\right\} \end{aligned}$$

is a consequence of Fubini’s theorem. For all \(x \in \mathbb {R}^2\) and \(i=1,2,3\) we recall \(\pi _i(x) = x\cdot \nu _i\), so that the statement \(A \subset \pi _3(\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) )\) immediately follows from the definitions. By monotonicity of the Lebesgue and \(\mathscr {H}^1\) measures it is sufficient to prove the statement for bounded \(J_1\) and \(J_2\).

Step 1: If \(t_1\) is a point of density one of \(J_1\) and \(t_2\) is a point of density one of \(J_2\), then \(-t_1-t_2\) is a point of density one of \(A\).

For convenience, we may assume \(t_1 = t_2 =0\). Let \(\varepsilon >0\) and let \(D_\varepsilon \,{:}{=} \, \pi _1^{-1}(-\varepsilon ,\varepsilon )\cap \pi _2^{-1}(-\varepsilon ,\varepsilon )\). Since, for measurable \(E,F\subset \mathbb {R}\), sets of the form \(\pi _1^{-1}(E)\cap \pi _2^{-1}(F)\) are product sets in suitably transformed coordinates, we can compute

$$\begin{aligned}&1- \frac{1}{|D_\varepsilon |} |\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap D_\varepsilon | \\&\quad = \,\frac{1}{|D_\varepsilon |} \left( |D_\varepsilon | - \left| \pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap D_\varepsilon \right| \right) \\&\quad = \, \frac{1}{|D_\varepsilon |} \left| \left( \pi _1^{-1}({J}^{\mathsf {c}}_1)\cap D_\varepsilon \right) \cup \left( \pi _2^{-1}({J}^{\mathsf {c}}_2)\cap D_\varepsilon \right) \right| \\&\quad \lesssim \frac{1}{\varepsilon ^2}\left( \varepsilon |{J}^{\mathsf {c}}_1\cap (-\varepsilon ,\varepsilon )| + \varepsilon |{J}^{\mathsf {c}}_2\cap (-\varepsilon ,\varepsilon )| \right) \\&\quad = \, \frac{1}{\varepsilon } (|{J}^{\mathsf {c}}_1\cap (-\varepsilon ,\varepsilon )|+ |{J}^{\mathsf {c}}_2\cap (-\varepsilon ,\varepsilon )|). \end{aligned}$$

As 0 is a point of density one for \(J_1\) and \(J_2\), we obtain that

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} 1- \frac{1}{|D_\varepsilon |} |\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap D_\varepsilon | = 0. \end{aligned}$$
(104)
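The comparability \(|D_\varepsilon | \simeq \varepsilon ^2\) implicit in the estimate above can be made explicit: assuming, as the \(120^\circ \) geometry of (88) suggests, that \(\nu _1\) and \(\nu _2\) are unit vectors, \(D_\varepsilon \) is a parallelogram of area

```latex
|D_\varepsilon | = \frac{(2\varepsilon )^2}{\sin 120^\circ } = \frac{8}{\sqrt{3}}\,\varepsilon ^2 .
```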
Fig. 27
figure 27

a Sketch of \(D_\varepsilon = \pi _1^{-1}(-\varepsilon ,\varepsilon ) \cap \pi _2^{-1}(-\varepsilon , \varepsilon )\). b A significant part of the dashed line \(l \,{:}{=}\, \{x\cdot \nu _2 =s\}\) for \(s\in (-c\varepsilon , c\varepsilon )\) intersects \(D_\varepsilon \)

By scaling arguments there exist \(0<c<1\) and \(\eta >0\) such that for all \(\varepsilon >0\) and all \(s \in (-c\varepsilon ,c\varepsilon )\) we have

$$\begin{aligned} \int _{\{x\cdot \nu _3 = s\}}\chi _{D_\varepsilon }(x) \, \mathrm {d}\mathscr {H}^1(x) \ge \eta \varepsilon ; \end{aligned}$$
(105)

see Fig. 27. Let \(\varepsilon >0\) and

$$\begin{aligned} S_\varepsilon \,{:}{=}\, \left\{ s\in (-c\varepsilon ,c\varepsilon ): \int _{\{x\cdot \nu _3 = s\}} \chi _{J_1}(x\cdot \nu _1) \chi _{J_2}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) = 0 \right\} , \end{aligned}$$

so that (105), together with Fubini’s theorem, implies \( \left| \pi _3^{-1}(S_\varepsilon ) \cap D_\varepsilon \right| \ge \eta \varepsilon |S_\varepsilon | \). By the definition of \(S_\varepsilon \) we furthermore have, for almost all \(s \in S_\varepsilon \),

$$\begin{aligned} \int _{\{x\cdot \nu _3 = s\}} \chi _{J_1\cap (-\varepsilon ,\varepsilon )}(x\cdot \nu _1) \chi _{J_2\cap (-\varepsilon ,\varepsilon )}(x\cdot \nu _2) \, \mathrm {d}\mathscr {H}^1(x) = 0. \end{aligned}$$

Therefore, another application of Fubini’s theorem gives

$$\begin{aligned} \left| \pi _3^{-1} (S_\varepsilon ) \cap \pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap D_\varepsilon \right| = 0, \end{aligned}$$

and consequently we have

$$\begin{aligned} |\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap D_\varepsilon | \le |D_\varepsilon | - \eta \varepsilon |S_\varepsilon |. \end{aligned}$$

By algebraic manipulation of this inequality we see that

$$\begin{aligned} \frac{|S_\varepsilon |}{2c\varepsilon } \le&\, \frac{1}{2\eta c \varepsilon ^2}\left( |D_\varepsilon |-|\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap D_\varepsilon |\right) \\ \lesssim&\, 1- \frac{1}{|D_\varepsilon |} |\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2) \cap D_\varepsilon |. \end{aligned}$$

As the right-hand side of this inequality vanishes in the limit \(\varepsilon \rightarrow 0\) due to (104), we see that 0 is a point of density one for A by definition of \(S_\varepsilon \).

Step 2: We have \(|A| \ge |J_1|+|J_2|\).

The geometric situation in the following argument can be found in Fig. 28. For \(i=1,2\) let \(\tilde{J}_i \subset J_i\) be the set of points of density one contained in the respective set. By Lebesgue point theory we have \(|J_i| = |\tilde{J}_i|\) for \(i=1,2\). Let \( t_1 \, {:}{=}\, \inf \tilde{J}_1\) and \( t_2 \, {:}{=}\, \sup \tilde{J}_2\). Since both sets are non-empty and bounded, \(t_1\) and \(t_2\) are finite. Let \(n\in \mathbb {N}\). Let \(t_{1,n} \in \tilde{J}_1\) with \(0 \le t_{1,n}- t_1 < \frac{1}{n}\) and let \(t_{2,n}\in \tilde{J}_2\) with \(0 \le t_2- t_{2,n} < \frac{1}{n}\). Let

$$\begin{aligned} A_{1,n}\, {:}{=} \, A \cap \left( -\infty ,- t_1 - \frac{1}{n} - t_{2,n}\right) \end{aligned}$$

and

$$\begin{aligned} A_{2,n}\, {:}{=}\, A \cap \left( -t_{1,n}- t_2 + \frac{1}{n}, \infty \right) . \end{aligned}$$

Adding the two closeness conditions on \(t_{1,n}\) and \(t_{2,n}\), we see that

$$\begin{aligned} t_{1,n}- t_1 + t_2- t_{2,n} < \frac{2}{n}, \end{aligned}$$

which in turn implies

$$\begin{aligned} - t_1 - t_{2,n} -\frac{1}{n}< -t_{1,n} - t_2 + \frac{1}{n}. \end{aligned}$$

Thus \(A_{1,n}\) and \(A_{2,n}\) are disjoint and we have

$$\begin{aligned} |A| \ge |A_{1,n}|+|A_{2,n}|. \end{aligned}$$
(106)
Fig. 28
figure 28

Sketch of \(\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2)\) with the corner \(p\in \mathbb {R}^2\) satisfying \( p\cdot \nu _1 = \inf J_1\) and \(p\cdot \nu _2 =\sup J_2\). Lines parallel to \(l\, {:}{=}\, \{x\cdot \nu _3 = p\cdot \nu _3\}\) intersecting \(\pi _1^{-1}(J_1) \cap \pi _2^{-1}(J_2)\) are sorted into \(A_1\) if they lie to the left of l, and into \(A_2\) otherwise

As \(t_{2,n}\) is a point of density one of \(J_2\), we know by Step 1 that every point of \(-t_{2,n} - \tilde{J}_1\) is a point of density one of \(A\). By the Lebesgue density theorem there thus exists a set \(\tilde{A}_{1,n} \subset \mathbb {R}\) of measure zero such that

$$\begin{aligned} \left( -t_{2,n} - \tilde{J}_1 \right) \cap \left( -\infty , - t_1 - \frac{1}{n} - t_{2,n}\right) \subset A_{1,n} \cup \tilde{A}_{1,n}. \end{aligned}$$

We thus know that \(|A_{1,n}| \ge |\tilde{J}_1 \cap ( t_1 + \frac{1}{n},\infty )|\). Similarly we obtain \(|A_{2,n}| \ge |\tilde{J}_2 \cap (-\infty , t_2 - \frac{1}{n})|\). Combining both inequalities with inequality (106), we see that

$$\begin{aligned} |A|\ge \left| \tilde{J}_1 \cap \left( t_1 + \frac{1}{n},\infty \right) \right| + \left| \tilde{J}_2 \cap \left( -\infty , t_2 - \frac{1}{n}\right) \right| . \end{aligned}$$

In the limit \(n \rightarrow \infty \), we obtain

$$\begin{aligned} |A| \ge |\tilde{J}_1| + |\tilde{J}_2| = |J_1| + |J_2|. \end{aligned}$$

\(\square \)
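As a purely illustrative sanity check (not part of the proof): since \(\nu _1+\nu _2+\nu _3 =0\), the set \(A\) contains, up to null sets, the negated essential sumset \(-(J_1+J_2)\), so the inequality \(|A| \ge |J_1| + |J_2|\) of Step 2 can be probed numerically on a grid. The sets \(J_1, J_2\) below are arbitrary test data, not taken from the paper:

```python
import numpy as np

# Grid discretization of (-4, 4) with step h; J1, J2 are hypothetical test sets.
h = 1e-3
t = np.arange(-4.0, 4.0, h)
J1 = (t > -3.0) & (t < -1.0)                              # interval, length 2
J2 = ((t > 0.0) & (t < 0.5)) | ((t > 1.0) & (t < 2.0))    # two intervals, length 1.5

# The support of the convolution of the indicators is the (discretized)
# Minkowski sumset J1 + J2; up to sign this corresponds to the set A of Lemma 11.
conv = np.convolve(J1.astype(float), J2.astype(float))
sumset_len = h * np.count_nonzero(conv > 0.5)
length_sum = h * (np.count_nonzero(J1) + np.count_nonzero(J2))

# Here J1 + J2 = (-3, -0.5) ∪ (-2, 1) = (-3, 1), of length 4 >= 2 + 1.5.
print(f"|J1 + J2| ~ {sumset_len:.3f} >= |J1| + |J2| ~ {length_sum:.3f}")
```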