1 Introduction

In this paper we continue the study started in [5, 26] of functionals arising from geometric variational problems from the point of view of differential inclusions. The energies we consider are of the form

$$\begin{aligned} \Sigma _\Psi (T)\doteq \int _{E}\Psi (\vec T(x))\theta (x)d{\mathcal {H}}^m(x), \end{aligned}$$
(1.1)

defined on m-dimensional rectifiable currents (resp. varifolds) \(T = \llbracket E,\vec T, \theta \rrbracket \) of \(\Omega \times {\mathbb {R}}^n\), where \(\Omega \subset {\mathbb {R}}^m\) is a convex and bounded open set, and the integrand \(\Psi \) is defined on the oriented (resp. non-oriented) Grassmannian space. In order to keep the technicalities at a minimum level, we defer all the definitions of these geometric objects to Section A. The main interest is the regularity of stationary points for energies as in (1.1) satisfying suitable ellipticity conditions. From the celebrated regularity theorem of Allard [2], it is known that an \(\varepsilon \)-regularity theorem holds for stationary points of the area functional, namely the case in which \(\Psi \equiv 1\). Since then, the question of extending this result to more general energies has been an important open problem in Geometric Measure Theory, see [1] for a result in this direction and [6, 8, 9] for more recent contributions. On the other hand, the situation is better understood for minimizers of energies of the form (1.1), for which similar partial regularity theorems are known, see for instance [11, Ch. 5], [22].

In [5], the second author, together with C. De Lellis, G. De Philippis and B. Kirchheim, already approached this regularity problem through the viewpoint of differential inclusions. The theory of differential inclusions has a rich history; we refer the reader to [17] for an overview and to [18, 19] for more recent results. Since this work is also based on that viewpoint, let us briefly explain what this means. The strategy of [5] consisted first in rewriting (1.1) on a special class of geometric objects, namely multiplicity one graphs of Lipschitz maps, and then in studying the differential inclusion associated to the system of PDEs arising from the stationarity condition. Namely, it can be shown, see [5, Sec. 6] or Sect. A.5, that to a \(C^k\) integrand \(\Psi \) as the one appearing in (1.1) one can naturally associate a \(C^k\) function \(f: {\mathbb {R}}^{n\times m}\rightarrow {\mathbb {R}}\) with the property that

$$\begin{aligned} \mathbb {E}_f(u) \doteq \int _{\Omega }f(Du(x))dx= \Sigma _{\Psi }(T_u), \end{aligned}$$
(1.2)

where \(T_u = \llbracket \Gamma _u,\vec \xi _u,1\rrbracket \) is the current associated to the graph of u, i.e., if \(v(x)\doteq (x,u(x))\) is the graph map, we have \(T_u={v}_{\#} \llbracket \Omega \rrbracket \). In particular, it is possible to prove, see [5, Prop. 6.8], that \(T_u\) is stationary for the energy (1.1) if and only if u solves the following equations:

$$\begin{aligned} \int _{\Omega }\langle Df(D u),D v\rangle dx = 0, \quad \forall v\in C^1_c(\Omega ,{\mathbb {R}}^n) \end{aligned}$$
(1.3)

and

$$\begin{aligned} \int _{\Omega }\langle Df(D u), D u D \phi \rangle dx - \int _{\Omega }f(D u){{\,\mathrm{div}\,}}\phi \; dx = 0,\quad \forall \phi \in C_c^1(\Omega ,{\mathbb {R}}^m). \end{aligned}$$
(1.4)

The Euler–Lagrange equation (1.3) corresponds to variations of the form

$$\begin{aligned} \frac{d}{d\varepsilon }|_{\varepsilon = 0}\mathbb {E}_f(u + \varepsilon v) = 0, \end{aligned}$$

usually called outer variations, and (1.4) corresponds to variations of the form

$$\begin{aligned} \frac{d}{d\varepsilon }|_{\varepsilon = 0}\mathbb {E}_f(u\circ ({{\,\mathrm{id}\,}} + \varepsilon \phi )) = 0, \end{aligned}$$

called inner (or domain) variations. The second step is to study (1.3) and (1.4) from the point of view of differential inclusions. This amounts to rewriting (1.3)–(1.4) equivalently as

$$\begin{aligned} \left( \begin{array}{c} Du\\ A\\ B \end{array} \right) \in K_f\doteq \left\{ C\in {\mathbb {R}}^{(2n + m)\times m}: C = \left( \begin{array}{c} X\\ Df(X)\\ X^TDf(X) - f(X){{\,\mathrm{id}\,}}\end{array} \right) , \text { for some } X \in {\mathbb {R}}^{n\times m}\right\} , \end{aligned}$$
(1.5)

for \(A \in L^\infty (\Omega ,{\mathbb {R}}^{n\times m})\), \(B \in L^\infty (\Omega ,{\mathbb {R}}^{m\times m})\) with \({{\,\mathrm{div}\,}}(A) = 0\), \({{\,\mathrm{div}\,}}(B) = 0\).
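To fix ideas, let us illustrate (1.5) in the model case of the Dirichlet integrand \(f(X) = \frac{1}{2}|X|^2\), chosen here purely for illustration. Then \(Df(X) = X\), and a triple \((Du, A, B)\) as above satisfies (1.5) if and only if

$$\begin{aligned} A = Du \quad \text { and }\quad B = Du^TDu - \tfrac{1}{2}|Du|^2{{\,\mathrm{id}\,}}, \end{aligned}$$

so that the constraints \({{\,\mathrm{div}\,}}(A) = 0\) and \({{\,\mathrm{div}\,}}(B) = 0\) express, respectively, that u is harmonic and that the associated energy-momentum tensor is divergence-free, i.e. exactly the outer and inner variation equations (1.3)–(1.4) for this f.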

This paper focuses on the same problem as [5], i.e. regularity of stationary points for geometric integrands, but with the addition of considering graphs with arbitrary positive multiplicity. This of course enlarges the class of competitors and might allow for more flexibility in the regularity of solutions. In particular, we consider polyconvex functions f, i.e.

$$\begin{aligned} f(X) = g(X,\Phi (X)), \end{aligned}$$

where \(g \in C^1({\mathbb {R}}^{k})\) is a convex function and \(\Phi :{\mathbb {R}}^{n\times m} \rightarrow {\mathbb {R}}^{k - nm}\) is the vector containing all the minors (subdeterminants) of order larger than or equal to 2 of \(X \in {\mathbb {R}}^{n\times m}\); for instance, if \(n = m = 2\), then \(k = 5\), \(\Phi (X) = \det (X)\) and \(f(X) = g(X,\det (X))\). In analogy with (1.3)–(1.4), we will be interested in the following system of PDEs

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \int _{\Omega }\langle Df(D u),D v\rangle \beta dx = 0 &{}\forall v\in C^1_c(\Omega ,{\mathbb {R}}^n) \\ \displaystyle \int _{\Omega }\langle Df(D u), D u D \phi \rangle \beta \;dx - \int _{\Omega }f(D u){{\,\mathrm{div}\,}}\phi \beta \;dx = 0\qquad &{} \forall \phi \in C_c^1(\Omega ,{\mathbb {R}}^m). \end{array}\right. \end{aligned}$$
(1.6)

for a Lipschitz map \(u \in {{\,\mathrm{Lip}\,}}(\Omega ,{\mathbb {R}}^n)\) and a Borel function \(\beta \in L^\infty (\Omega ,{\mathbb {R}}^+)\). The study of objects with multiplicity is rather natural in the context of stationary rectifiable varifolds or currents. When dealing with these objects, one is interested in showing a so-called constancy theorem, see [23, Theorem 8.4.1]. A constancy theorem in the sense of [23, Theorem 8.4.1] asserts that if a stationary (for the area) varifold of dimension m has support contained in a \(C^2\) manifold of the same dimension, then the varifold must be given by a fixed multiple of the manifold, so that in particular the multiplicity must be constant. In [10], it was shown that instead of \(C^2\), even Lipschitz regularity of the manifold is sufficient to guarantee the validity of the Constancy Theorem. This is connected to the following algebraic fact. If a \(C^2\) map u solves (1.3), then it necessarily solves also (1.4), hence the system (1.3)–(1.4) reduces to equation (1.3). Nonetheless, if \(u \in C^2\) solves the first equation of (1.6) for a bounded multiplicity \(\beta \), it is no longer true that u automatically solves the second one. One therefore would like to show a priori that the multiplicity is constant, so that one is again in the situation given by (1.3)–(1.4). As for regularity theorems, no constancy result is known at the moment for general functionals, except in the codimension one case, see [7].

As said, the tools we use are the same as the ones of [5], namely we rewrite (1.6) as

$$\begin{aligned} \left( \begin{array}{c} Du\\ A\\ B \end{array} \right) \in C_f\doteq \left\{ C\in {\mathbb {R}}^{(2n + m)\times m}: C = \left( \begin{array}{c} X\\ \beta Df(X)\\ \beta X^TDf(X) - \beta f(X){{\,\mathrm{id}\,}}\end{array} \right) , \text { for some }\beta > 0 \text { and } X \in {\mathbb {R}}^{n\times m}\right\} , \end{aligned}$$
(1.7)

again for \(A \in L^\infty (\Omega ,{\mathbb {R}}^{n\times m})\), \(B \in L^\infty (\Omega ,{\mathbb {R}}^{m\times m})\) with \({{\,\mathrm{div}\,}}(A) = 0\), \({{\,\mathrm{div}\,}}(B) = 0\). Our result is twofold. First, we will show that, if f is assumed to be non-negative, then the same result as [5, Theorem 1] holds, namely \(C_f\) contains no \(T'_N\) configurations. Secondly, we show the optimality of this result by proving that, if we drop the hypothesis of non-negativity of f, one can not only embed a special family of matrices in \(C_f\), but one can actually construct a stationary current for the energy given in (1.1) whose support lies on the graph of a Lipschitz and nowhere \(C^1\) map. In order to formulate these results properly, we need some terminology concerning differential inclusions.

Differential inclusions are relations of the form

$$\begin{aligned} M(x) \in K \subset {\mathbb {R}}^{n\times m} \text { a.e. in }\Omega \end{aligned}$$
(1.8)

for \(M \in L^\infty (\Omega ,{\mathbb {R}}^{n\times m})\) satisfying \({\mathscr {A}}(M) = 0\) in the weak sense for some constant-coefficient, linear differential operator \({\mathscr {A}}(\cdot )\). To every operator \({\mathscr {A}}(\cdot )\), one can associate a wave cone, denoted by \(\Lambda _{\mathscr {A}}\), that is made of those directions A in which it is possible to have plane wave solutions, i.e. \(A \in {\Lambda _{{\mathscr {A}}}}\) if and only if there exists \(\xi \in {\mathbb {R}}^m{\setminus }\{0\}\) such that

$$\begin{aligned} {\mathscr {A}}\big (h(\langle x,\xi \rangle )A\big ) = 0,\quad \forall h \in C^1({\mathbb {R}}). \end{aligned}$$
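For orientation, let us recall two standard examples, stated here only to illustrate the definition. For \({\mathscr {A}} = {{\,\mathrm{curl}\,}}\), the wave cone consists of the rank-one matrices: if \(A = a\otimes \xi \), then

$$\begin{aligned} h(\langle x,\xi \rangle )\, a\otimes \xi = D\big (H(\langle x,\xi \rangle )\,a\big ), \quad H' = h, \end{aligned}$$

is curl-free for every \(h \in C^1({\mathbb {R}})\). For \({\mathscr {A}} = {{\,\mathrm{div}\,}}\), acting row-wise on matrix fields, one has instead \(A \in \Lambda _{{{\,\mathrm{div}\,}}}\) precisely when \(A\xi = 0\) for some \(\xi \ne 0\), since \({{\,\mathrm{div}\,}}\big (h(\langle x,\xi \rangle )A\big ) = h'(\langle x,\xi \rangle )A\xi \).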

In this work, we will not need to consider various differential operators, as we will only work with the mixed div-curl operator introduced in (1.5). In that case, we denote the cone with \(\Lambda _{dc}\) and we will introduce it in detail in Sect. 2.1. Due to the connection of the wave-cone to the existence of oscillatory solutions of (1.8), a very first step to exclude wild solutions of (1.8) is to check that

$$\begin{aligned} A - B \notin \Lambda _{{\mathscr {A}}}, \quad \forall A,B \in K. \end{aligned}$$
(1.9)

This is usually quite simple to verify, and indeed we will show in Proposition 3.1 that, if f is non-negative, then (1.9) holds with \(\Lambda _{{\mathscr {A}}} = \Lambda _{dc}\) and K replaced by \(C_f\). Property (1.9) is in general not sufficient to guarantee good regularity properties of solutions of (1.8). Indeed, in [20], S. Müller and V. Šverák constructed a striking counterexample to elliptic regularity for solutions of

$$\begin{aligned} Dv(x) \in K'_f \doteq \left\{ C\in {\mathbb {R}}^{4\times 2}: C = \left( \begin{array}{cc} X\\ Df(X)J \end{array} \right) \right\} \subset {\mathbb {R}}^{4\times 2}, \end{aligned}$$
(1.10)

where the function \(f \in C^\infty ({\mathbb {R}}^{2\times 2})\) is quasiconvex (for the definition of quasiconvex function, we refer the reader to [20]), and J is a matrix satisfying \(J = -J^T\) and \(J^2 = -{{\,\mathrm{id}\,}}\). In particular, they were able to show that there exists a Lipschitz and nowhere \(C^1\) function \(v: \Omega \subset {\mathbb {R}}^2 \rightarrow {\mathbb {R}}^4\) satisfying the differential inclusion (1.10). Their strategy was subsequently improved by L. Székelyhidi in [24], who showed that f can even be chosen polyconvex. In both cases, \(K'_f\) does not contain rank one connections, i.e.

$$\begin{aligned} {{\,\mathrm{rank}\,}}(A-B) = 2, \quad \forall A,B \in K'_f, \end{aligned}$$

and this can be proved to be equivalent to (1.9) in the case \({\mathscr {A}} = {{\,\mathrm{curl}\,}}\). Their strategy was based on showing that other suitable families of matrices, the so-called \(T_N\) configurations, can be embedded in \(K'_f\). In our situation, since we are dealing with mixed div-curl operators, we need to consider a slightly different version of \(T_N\) configurations, that we have named \(T'_N\) configurations in [5]. We postpone the definition of \(T_N\) and \(T'_N\) configurations to Sect. 2, but we are finally able to formally state our main positive result:

Theorem

If \(f\in C^1 ({\mathbb {R}}^{n\times m})\) is a strictly polyconvex function, then \(C_f\) does not contain any set \(\{A_1, \ldots , A_N\} \subset {\mathbb {R}}^{(2n + m)\times m}\) which induces a \(T_N'\) configuration, provided that \(f(X_1) \ge 0,\dots , f(X_N) \ge 0\), where

$$\begin{aligned} A_i = \left( \begin{array}{c} X_i\\ Y_i\\ Z_i \end{array} \right) , \quad X_i,Y_i \in {\mathbb {R}}^{n\times m}, Z_i \in {\mathbb {R}}^{m\times m}, \forall i \in \{1,\dots , N\}. \end{aligned}$$

This result, as [5, Theorem 1], shows that it is not possible to apply the convex integration methods of [20, 24] to prove the existence of an irregular solution of the system (1.6). This theorem is stronger than [5, Theorem 1], in the sense that we are able to show [5, Theorem 1] as a corollary:

Corollary

If \(f\in C^1 ({\mathbb {R}}^{n\times m} )\) is a strictly polyconvex function (not necessarily non-negative), then \(K_f\) does not contain any set \(\{A_1, \ldots , A_N\}\) which induces a \(T_N'\) configuration.

Finally, in Sect. 4, we show the optimality of the hypothesis of non-negativity of the previous theorem by proving the following:

Theorem

There exists a smooth and elliptic integrand \(\Psi : \Lambda _2({\mathbb {R}}^4)\rightarrow {\mathbb {R}}\) such that the associated energy \(\Sigma _\Psi \) admits a stationary point T whose (integer) multiplicities are not constant. Moreover, the rectifiable set supporting T is given by the graph of a Lipschitz map \(u: \Omega \rightarrow {\mathbb {R}}^2\) that fails to be \(C^1\) in any open subset \({\mathcal {V}} \subset \Omega \).

The last Theorem is obtained by embedding in the differential inclusion (1.7) a so-called large \(T_N\) configuration, as introduced in [13]. Following the strategy of [24], we do not choose a priori a polyconvex \(f \in C^\infty ({\mathbb {R}}^{2\times 2})\), but rather we construct it in such a way that \(C_f\) already contains this special family of matrices. Once the polyconvex function f has been built, we prove an extension result for f to the Grassmannians, thus obtaining the integrand \(\Psi \) of the statement of the Theorem. The extension results are quite simple and might be of independent interest. The construction of our counterexample cannot be carried out in the varifold setting. The reason is quite elementary, as the integrand \(\Psi \) we would need to construct in the varifold case should be even, convex and positively 1-homogeneous, hence positive. We refer the reader to Remark 5.6 for more details. Moreover, let us point out that positivity of the integrand is a necessary assumption when studying existence of minima, but to the best of our knowledge there was no available example showing that it is a necessary assumption also when studying regularity properties of stationary points.

The paper is organized as follows. In Sect. 2, we recall the statements of our main results in the case of non-negative integrands f and we collect some crucial preliminary results of [5]. The proofs of the main results in the positive case, i.e. Proposition 3.1, Theorem 3.3 and Corollary 3.4, will be given in Sect. 3. In Sect. 4, we provide a counterexample to regularity when dropping the hypothesis of positivity of the integrand. Some lemmas of Sect. 4 concerning the extension of polyconvex functions to the Grassmannian manifold can be easily extended to general dimensions and codimensions. Therefore, we give the proofs of these general versions in Sect. 5. Finally, the appendix contains a concise introduction to the tools of geometric measure theory used throughout the paper.

2 Positive case: absence of \(T_N\) configurations

In this section we collect some preliminary results proved in [5], that will be essential for the proofs of the next section.

2.1 Div-curl differential inclusions, wave cones and inclusion sets

In this subsection, we explain how to rephrase the system (1.3)–(1.4) as a differential inclusion. As recalled in the introduction, the Euler–Lagrange equations defining stationary points for energies \({\mathbb {E}}_f\) are the pair of equations (1.3), (1.4), which can be written in the classical form:

$$\begin{aligned} {\left\{ \begin{array}{ll} {{\,\mathrm{div}\,}}(Df(Du)) = 0 \\ {{\,\mathrm{div}\,}}(Du^TDf(Du) - f(Du){{\,\mathrm{id}\,}}) = 0 \end{array}\right. } \end{aligned}$$

Thus we are led to study the following div-curl differential inclusion for a triple of maps \(X, Y\in L^\infty (\Omega , {\mathbb {R}}^{n\times m})\) and \(Z\in L^\infty (\Omega , {\mathbb {R}}^{m\times m})\):

$$\begin{aligned}&{{{\,\mathrm{curl}\,}}}\, X = 0, \qquad {{{\,\mathrm{div}\,}}}\, Y =0, \qquad {{{\,\mathrm{div}\,}}}\, Z = 0\, , \end{aligned}$$
(2.1)
$$\begin{aligned}&W \doteq \left( \begin{array}{c} X\\ Y\\ Z \end{array} \right) \in K_f = \left\{ A \in {\mathbb {R}}^{(2n + m)\times m}: A = \left( \begin{array}{c} X\\ Df(X)\\ X^TDf(X) - f(X){{\,\mathrm{id}\,}}\end{array} \right) , \text { for some } X \in {\mathbb {R}}^{n\times m}\right\} , \end{aligned}$$
(2.2)

where \(f\in C^1 ({\mathbb {R}}^{n\times m})\) is a fixed function.

Moreover, we also consider the following more general system of PDEs, for \(u \in {{\,\mathrm{Lip}\,}}(\Omega , {\mathbb {R}}^n)\) and a Borel map \(\beta \in L^\infty (\Omega ,(0,+\infty ))\):

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \int _{\Omega }\langle Df(D u),D v\rangle \beta dx = 0 &{}\forall v\in C^1_c(\Omega ,{\mathbb {R}}^n) \\ \displaystyle \int _{\Omega }\langle Df(D u), D u D \phi \rangle \beta dx - \int _{\Omega }f(D u){{\,\mathrm{div}\,}}\phi \beta \;dx = 0\qquad &{} \forall \phi \in C_c^1(\Omega ,{\mathbb {R}}^m). \end{array}\right. \end{aligned}$$
(2.3)

This system is equivalent to the stationarity in the sense of varifolds of the varifold \(V = \llbracket \Gamma _u,\beta \rrbracket \), where \(\Gamma _u\) is the graph of u. This is discussed in Sect. A.5. The div-curl differential inclusion associated to this system is, again for a triple of maps \(X, Y\in L^\infty (\Omega , {\mathbb {R}}^{n\times m})\) and \(Z\in L^\infty (\Omega , {\mathbb {R}}^{m\times m})\):

$$\begin{aligned}&{{{\,\mathrm{curl}\,}}}\, X = 0, \qquad {{{\,\mathrm{div}\,}}}\, Y =0, \qquad {{{\,\mathrm{div}\,}}}\, Z = 0\, , \end{aligned}$$
(2.4)
$$\begin{aligned}&W \doteq \left( \begin{array}{c} X\\ Y\\ Z \end{array} \right) \in C_f \end{aligned}$$
(2.5)

where

$$\begin{aligned} C_f = \left\{ C\in {\mathbb {R}}^{(2n + m)\times m}: C = \left( \begin{array}{c} X\\ \beta Df(X)\\ \beta X^TDf(X) - \beta f(X){{\,\mathrm{id}\,}}\end{array} \right) , \text { for some }\beta > 0 \text { and } X \in {\mathbb {R}}^{n\times m}\right\} . \end{aligned}$$
(2.6)

This discussion proves the following

Lemma 2.1

Let \(f\in C^1 ({\mathbb {R}}^{n\times m})\). A map \(u \in {{\,\mathrm{Lip}\,}}(\Omega ,{\mathbb {R}}^n)\) is a stationary point of the energy (1.2) if and only if there are matrix fields \(Y\in L^\infty (\Omega , {\mathbb {R}}^{n\times m})\) and \(Z\in L^\infty (\Omega , {\mathbb {R}}^{m\times m})\) such that \(W = (Du, Y,Z)\) solves the div-curl differential inclusion (2.1)–(2.2).

Moreover, the couple \((u,\beta ) \in {{\,\mathrm{Lip}\,}}(\Omega ,{\mathbb {R}}^n)\times L^\infty (\Omega ,(0,+\infty ))\) solves (2.3) if and only if there are matrix fields \(Y\in L^\infty (\Omega , {\mathbb {R}}^{n\times m})\) and \(Z\in L^\infty (\Omega , {\mathbb {R}}^{m\times m})\) such that \(W = (Du, Y,Z)\) solves the div-curl differential inclusion (2.4)–(2.5).

Finally, we introduce here the wave-cone associated to the mixed div-curl operator that is relevant for us.

Definition 2.2

The cone \(\Lambda _{dc}\subset {\mathbb {R}}^{(2n+m)\times m}\) consists of the matrices in block form

$$\begin{aligned} \left( \begin{array}{l} X\\ Y\\ Z \end{array}\right) \end{aligned}$$

with the property that there is a direction \(\xi \in {\mathbb {S}}^{m-1}\) and a vector \(u\in {\mathbb {R}}^n\) such that \(X = u\otimes \xi \), \(Y \xi =0\) and \(Z\xi =0\).
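Let us briefly record why \(\Lambda _{dc}\) is the wave cone associated to the operator in (2.1): if \((X,Y,Z)\) is as in Definition 2.2 and \(h \in C^1({\mathbb {R}})\), then the plane wave \(x\mapsto h(\langle x,\xi \rangle )(X,Y,Z)\) solves (2.1). Indeed,

$$\begin{aligned} h(\langle x,\xi \rangle )\,u\otimes \xi = D\big (H(\langle x,\xi \rangle )\,u\big ) \text { with } H' = h, \qquad {{\,\mathrm{div}\,}}\big (h(\langle x,\xi \rangle )Y\big ) = h'(\langle x,\xi \rangle )Y\xi = 0, \end{aligned}$$

and analogously \({{\,\mathrm{div}\,}}\big (h(\langle x,\xi \rangle )Z\big ) = h'(\langle x,\xi \rangle )Z\xi = 0\).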

2.2 \(T_N\) configurations and \(T'_N\) configurations

We start defining \(T_N\) configurations for classical curl-type differential inclusions.

Definition 2.3

An ordered set of \(N\ge 2\) distinct matrices \(\{X_i\}_{i=1}^N \subset {\mathbb {R}}^{n\times m}\) is said to induce a \(T_N\) configuration if there exist matrices \(P, C_i \in {\mathbb {R}}^{n\times m}\) and real numbers \(k_i > 1\) such that:

  1. (a)

    each \(C_i\) belongs to the wave cone of \({{{\,\mathrm{curl}\,}}}\, X=0\), namely \({{\,\mathrm{rank}\,}}(C_i) \le 1\) for each i;

  2. (b)

    \(\sum _i C_i = 0\);

  3. (c)

    \(X_1, \ldots , X_N\), P and \(C_1, \ldots , C_N\) satisfy the following N linear conditions

    $$\begin{aligned} \begin{aligned}&X_1 = P + k_1 C_1 ,\\&X_2 = P + C_1 + k_2C_2 ,\\&\dots \\&\dots \\&X_N = P + C_1 +\dots + k_NC_N\, . \end{aligned} \end{aligned}$$
    (2.7)

In the rest of the paper we will use the word \(T_N\) configuration for the data

$$\begin{aligned} P, C_1, \ldots , C_N, k_1, \ldots k_N. \end{aligned}$$

We will moreover say that the configuration is nondegenerate if \({{\,\mathrm{rank}\,}}(C_i)=1\) for every i.
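For the reader's convenience, here is a simple example, whose specific numbers are chosen purely for illustration. In \({\mathbb {R}}^{2\times 2}\), take \(P = 0\), \(k_i = 2\) for every i, and the rank-one matrices \(C_1 = {{\,\mathrm{diag}\,}}(2,0)\), \(C_2 = {{\,\mathrm{diag}\,}}(0,2)\), \(C_3 = {{\,\mathrm{diag}\,}}(-2,0)\), \(C_4 = {{\,\mathrm{diag}\,}}(0,-2)\), so that \(\sum _i C_i = 0\). Then (2.7) produces the diagonal matrices

$$\begin{aligned} X_1 = {{\,\mathrm{diag}\,}}(4,0),\quad X_2 = {{\,\mathrm{diag}\,}}(2,4),\quad X_3 = {{\,\mathrm{diag}\,}}(-2,2),\quad X_4 = {{\,\mathrm{diag}\,}}(0,-2), \end{aligned}$$

which therefore induce a nondegenerate \(T_4\) configuration; one can also check that no difference \(X_i - X_j\), \(i \ne j\), has rank one.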

As in [5], we give a slightly more general definition of \(T_N\) configuration than the one usually given in the literature (cf. [20, 24, 25]), in that we drop the requirement that there are no rank-one connections between distinct \(X_i\) and \(X_j\). We refer the reader to [5] for discussions concerning \(T_N\) configurations.

Adapted to the div-curl operator, we now introduce \(T'_N\) configurations, which were first defined in [5].

Definition 2.4

A family \(\{A_1, \ldots , A_N\}\subset {\mathbb {R}}^{(2n+m)\times m}\) of \(N\ge 2\) distinct matrices

$$\begin{aligned} A_i\doteq \left( \begin{array}{c} X_i\\ Y_i\\ Z_i \end{array} \right) \end{aligned}$$

induces a \(T_N'\) configuration if there are matrices \(P, Q, C_i, D_i \in {\mathbb {R}}^{n\times m}\), \(R, E_i\in {\mathbb {R}}^{m\times m}\) and coefficients \(k_i >1\) such that

$$\begin{aligned} \left( \begin{array}{c} X_i\\ Y_i\\ Z_i \end{array} \right) = \left( \begin{array}{c} P\\ Q\\ R \end{array} \right) + \left( \begin{array}{c} C_1\\ D_1\\ E_1 \end{array} \right) + \cdots + \left( \begin{array}{c} C_{i-1}\\ D_{i-1}\\ E_{i-1} \end{array} \right) + k_i \left( \begin{array}{c} C_i\\ D_i\\ E_i \end{array} \right) \end{aligned}$$
(2.8)

and the following properties hold:

  1. (a)

    each element \((C_i, D_i, E_i)\) belongs to the wave cone \(\Lambda _{dc}\) of (2.1);

  2. (b)

    \(\sum _\ell C_\ell = 0\), \(\sum _\ell D_\ell =0 \) and \(\sum _\ell E_\ell = 0\).

We say that the \(T'_N\) configuration is nondegenerate if \({{\,\mathrm{rank}\,}}(C_i)=1\) for every i.

We collect here some simple consequences of the definition above.

Proposition 2.5

Assume \(A_1, \ldots , A_N\) induce a \(T_N'\) configuration with \(P,Q, R, C_i, D_i, E_i\) and \(k_i\) as in Definition 2.4. Then:

  1. (i)

    \(\{X_1, \ldots , X_N\}\) induce a \(T_N\) configuration of the form (2.7), if they are distinct; moreover the \(T_N'\) configuration is nondegenerate if and only if the \(T_N\) configuration induced by \(\{X_1, \ldots , X_N\}\) is nondegenerate;

  2. (ii)

    For each i there is an \(n_i\in {\mathbb {S}}^{m-1}\) and a \(u_i\in {\mathbb {R}}^n\) such that \(C_i = u_i\otimes n_i\), \(D_i n_i =0\) and \(E_i n_i =0\);

  3. (iii)

    \({{\,\mathrm{tr}\,}}C_i^T D_i = \langle C_i, D_i\rangle = 0\) for every i.
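For later use, let us also record the elementary computation behind (iii): writing \(C_i = u_i\otimes n_i\) as in (ii) and using \(D_in_i = 0\),

$$\begin{aligned} \langle C_i, D_i\rangle = {{\,\mathrm{tr}\,}}(C_i^T D_i) = {{\,\mathrm{tr}\,}}\big ((n_i\otimes u_i)D_i\big ) = \langle u_i, D_in_i\rangle = 0. \end{aligned}$$

In particular each matrix \(C_i^TD_i\) is trace-free, a fact that will be used in Sect. 3.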

2.3 Strategy

Before starting with the proof of the main result of this paper, it is convenient to explain the strategy we intend to follow. In order to do so, let us consider here the case \(n = m = 2\), \(N = 5\). Suppose by contradiction that there exist a strictly polyconvex function \(f : {\mathbb {R}}^{2\times 2 }\rightarrow {\mathbb {R}}\), \(f(X) = g(X,\det (X))\), and a \(T'_5\) configuration \(A_1,A_2,A_3,A_4,A_5\),

$$\begin{aligned} A_i = \left( \begin{array}{c} X_i\\ Y_i\\ Z_i \end{array} \right) ,\quad \forall i \in \{1,\dots ,5\}, \end{aligned}$$

where \(X_i,Y_i,Z_i\) fulfill the relations of (2.8), i.e.

$$\begin{aligned} \left( \begin{array}{c} X_i\\ Y_i\\ Z_i \end{array} \right) = \left( \begin{array}{c} P\\ Q\\ R \end{array} \right) + \left( \begin{array}{c} C_1\\ D_1\\ E_1 \end{array} \right) + \cdots + \left( \begin{array}{c} C_{i-1}\\ D_{i-1}\\ E_{i-1} \end{array} \right) + k_i \left( \begin{array}{c} C_i\\ D_i\\ E_i \end{array} \right) . \end{aligned}$$

We will see below that we can assume without loss of generality that \(P=0\). The first part of the strategy follows the same lines as that of [5]. Indeed, we regard the relations \(A_i \in C_f\), \(\forall i\), where \(C_f\) has been defined in (2.6), as two separate pieces of information:

$$\begin{aligned} \left( \begin{array}{c} X_i\\ Y_i \end{array} \right) \in K'_f = \left\{ A \in {\mathbb {R}}^{4\times 2}: A = \left( \begin{array}{c} X\\ \beta Df(X) \end{array} \right) , \beta >0, X \in {\mathbb {R}}^{2\times 2}\right\} \end{aligned}$$
(2.9)

and

$$\begin{aligned} Z_i = X_i^TY_i - \beta _if(X_i){{\,\mathrm{id}\,}}. \end{aligned}$$
(2.10)

Let us set \(c_i\doteq f(X_i)\). As in [5], we use (2.9) to obtain inequalities involving \(X_i,Y_i\) and quantities related to f. These are deduced from the polyconvexity of f, analogously to [24, Lemma 3]. In particular, (2.9) is rewritten as

$$\begin{aligned} c_i - c_j +\frac{1}{\beta _i}\langle Y_i,X_j - X_i\rangle + d_i\det (X_i - X_j) < 0, \end{aligned}$$
(2.11)

for \(d_i \doteq \partial _{y_5}g(y_1,y_2,y_3,y_4,y_5)|_{(X_i,\det (X_i))}\). This is proved in Proposition 2.9. The final goal is to prove that these inequalities can not be fulfilled at the same time. As in [5], we can simplify (2.11) using the structure result on \(T_N\) configurations in \({\mathbb {R}}^{2\times 2}\) of [25, Proposition 1]. This asserts, in the specific case of the ongoing example, the existence of 5 vectors \((t_1^i,\dots , t_5^i), i \in \{1,\dots , 5\}\) with positive components, such that

$$\begin{aligned} \sum _{j = 1}^5t_j^i\det (X_j - X_i) = 0,\quad \forall i \in \{1,\dots , 5\}. \end{aligned}$$
(2.12)

If we use this result in (2.11), we can eliminate from the expression the variable \(d_i\), thus obtaining

$$\begin{aligned} \nu _i&\doteq \sum _{j = 1}^5t_j^i\left( c_i - c_j +\frac{1}{\beta _i}\langle Y_i,X_j - X_i\rangle + d_i\det \left( X_i - X_j\right) \right) \\&= \sum _{j = 1}^5t_j^i\left( c_i - c_j +\frac{1}{\beta _i}\langle Y_i,X_j - X_i\rangle \right) < 0, \quad \forall i \in \{1,\dots ,5\}, \end{aligned}$$

compare Corollary 2.10. In [5], [25, Proposition 1] was extended to \(T_N\) configurations in \({\mathbb {R}}^{n\times m}\), so that relations (2.12) remain true in every dimension and target dimension. This extension is recalled in Proposition 2.8. Despite being very useful, the last simplification is not enough to conclude the proof. Indeed, up to now we have exploited (2.9) and the fact that \(\{X_1,\dots , X_5\}\) induce a \(T_5\) configuration, but, if \(\beta _i = 1,\forall i\), this is exactly the situation considered in [24]. Since from that paper we know the existence of \(T_5\) configurations in \(K'_f\), clearly we cannot reach a contradiction at this point of the strategy. This is where the inner variations come into play. We rewrite (2.10) using the definition of \(T'_5\) configuration and, after some manipulations, we find that the numbers

$$\begin{aligned} \mu _i\doteq \sum _{j = 1}^5 t_j^i(\langle X_i - X_j ,Y_i\rangle - \beta _ic_i + \beta _jc_j) \end{aligned}$$

must all be 0. For the index I such that \(\beta _I = \min _{i}\beta _i\), and essentially using the positivity of \(c_j\), we find that

$$\begin{aligned}&0 = -\mu _I = \sum _{j = 1}^5 t_j^I(\langle X_j - X_I ,Y_I\rangle + \beta _Ic_I - \beta _jc_j) \\&\quad \le \sum _{j = 1}^5 t_j^I(\langle X_j - X_I ,Y_I\rangle + \beta _Ic_I - \beta _I c_j) = \beta _I\nu _I, \end{aligned}$$

which is in contradiction with the negativity of \(\nu _I\).

2.4 Preliminary results: \(T_N\) configurations

To follow the strategy explained in Sect. 2.3, we need to recall the extension of [25, Proposition 1] proved in [5]. Here we will only recall the essential results without proofs; we refer the interested reader to [5] for the details. First, it is possible to associate to a \(T_N\) configuration of the form (2.7), i.e.

$$\begin{aligned} \begin{aligned}&X_1 = P + k_1 C_1 ,\\&X_2 = P + C_1 + k_2C_2 ,\\&\dots \\&\dots \\&X_N = P + C_1 +\dots + k_NC_N\, , \end{aligned} \end{aligned}$$

a defining vector \((\lambda ,\mu ) \in {\mathbb {R}}^{N + 1}\), see [5, Definition 3.7], defined as follows:

$$\begin{aligned} \mu \doteq \frac{k_1\dots k_{N}}{(k_1-1)\dots (k_{N} - 1)} \text { and } \lambda _i \doteq \frac{k_1\dots k_{i - 1}}{(\mu - 1)(k_1-1)\dots (k_{i - 1} - 1)}. \end{aligned}$$
(2.13)

These relations can be inverted; in fact, one can express

$$\begin{aligned} k_i = \frac{\mu \lambda _1 + \dots + \mu \lambda _i + \lambda _{i + 1} + \dots + \lambda _N}{(\mu - 1)\lambda _i}\, . \end{aligned}$$
(2.14)

Since \(k_i > 1, \forall i \in \{1,\dots ,N\}\), (2.13) implies that \(\lambda _i> 0, \forall i, \mu > 1\) and also

$$\begin{aligned} \sum _i \lambda _i = 1. \end{aligned}$$

As in [25, Proposition 1], we define N vectors of \({\mathbb {R}}^N\) with positive components

$$\begin{aligned} t^{i} \doteq \frac{1}{\xi _i}(\mu \lambda _1,\dots ,\mu \lambda _{i - 1},\lambda _{i},\dots ,\lambda _N), \text { for } i \in \{1,\dots ,N\}, \end{aligned}$$
(2.15)

where \(\xi _i \ge 1\) are normalization constants chosen in such a way that \(\Vert t^{i}\Vert _{1} = 1\). For a vector \(v = (v_1,\dots , v_N) \in {\mathbb {R}}^N\), we set

$$\begin{aligned} \Vert v\Vert _1 = \sum _{j = 1}^N |v_j|. \end{aligned}$$
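As a quick sanity check of the formulas (2.13)–(2.15), whose values below are chosen purely for illustration, take \(N = 2\) and \(k_1 = k_2 = 2\). Then

$$\begin{aligned} \mu = \frac{k_1k_2}{(k_1-1)(k_2-1)} = 4,\qquad \lambda _1 = \frac{1}{\mu - 1} = \frac{1}{3},\qquad \lambda _2 = \frac{k_1}{(\mu -1)(k_1-1)} = \frac{2}{3}, \end{aligned}$$

so that \(\lambda _1 + \lambda _2 = 1\), and (2.14) indeed returns \(k_1 = k_2 = 2\). Moreover, (2.15) gives \(t^{1} = (\tfrac{1}{3},\tfrac{2}{3})\) (with \(\xi _1 = 1\)) and \(t^{2} = \tfrac{1}{2}(\mu \lambda _1,\lambda _2) = (\tfrac{2}{3},\tfrac{1}{3})\) (with \(\xi _2 = 2\)).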

The importance of these vectors \(t^i\) comes from [25, Proposition 1], where it is proved that, for a \(T_N\) configuration of the form (2.7) in \({\mathbb {R}}^{2\times 2}\),

$$\begin{aligned} \sum _{j = 1}^Nt_j^iX_j = P + C_1 + \dots + C_{i - 1} \end{aligned}$$
(2.16)

Moreover, the following relation holds for every i:

$$\begin{aligned} \det \left( \sum _{j = 1}^Nt_j^iX_j\right) = \sum _{j=1}^Nt_j^i\det (X_j)\, . \end{aligned}$$
(2.17)
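As an illustration, these identities can be checked by hand on the diagonal \(T_4\) configuration described after Definition 2.3 (again, the numbers serve only as an example). There \(k_i = 2\) for every i, so (2.13) gives \(\mu = 16\) and \((\lambda _1,\dots ,\lambda _4) = (\tfrac{1}{15},\tfrac{2}{15},\tfrac{4}{15},\tfrac{8}{15}) = t^1\), and indeed

$$\begin{aligned} \sum _{j = 1}^4t_j^1X_j = \tfrac{1}{15}\big ({{\,\mathrm{diag}\,}}(4,0) + 2\,{{\,\mathrm{diag}\,}}(2,4) + 4\,{{\,\mathrm{diag}\,}}(-2,2) + 8\,{{\,\mathrm{diag}\,}}(0,-2)\big ) = 0 = P, \end{aligned}$$

while \(\sum _{j = 1}^4t_j^1\det (X_j) = \tfrac{1}{15}\,(0 + 2\cdot 8 + 4\cdot (-4) + 8\cdot 0) = 0 = \det (P)\), in accordance with (2.16) and (2.17).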

We need to state the generalization of the previous relations to \(T_N\) configurations in arbitrary dimension and codimension. In [5, Lemma 3.10], the following general linear algebra result was proved:

Lemma 2.6

Assume the real numbers \(\mu >1\), \(\lambda _1, \ldots , \lambda _N >0\) and \(k_1, \ldots , k_N >1\) are linked by the formulas (2.13). Assume \(v, v_1, \ldots , v_N, w_1, \ldots , w_N\) are elements of a vector space satisfying the relations

$$\begin{aligned} w_i&= v + v_1 + \ldots + v_{i-1} + k_i v_i \end{aligned}$$
(2.18)
$$\begin{aligned} 0&= v_1+ \ldots + v_N\, . \end{aligned}$$
(2.19)

If we define the vectors \(t^i\) as in (2.15), then

$$\begin{aligned} \sum _j t^i_j w_j = v + v_1 + \ldots + v_{i-1}\, . \end{aligned}$$
(2.20)

This lemma allows us to generalize (2.16) and (2.17), compare [5, Proposition 3.8]. To state this result, we need some notation concerning multi-indexes. We will use I for multi-indexes referring to ordered sets of rows of matrices and J for multi-indexes referring to ordered sets of columns. In our specific case, where we deal with matrices in \({\mathbb {R}}^{n\times m}\), we will thus have

$$\begin{aligned} I&= (i_1,\dots ,i_r),\qquad 1\le i_1<\dots< i_r \le n\, ,\\ \text { and } \qquad J&= (j_1,\dots ,j_s),\qquad 1\le j_1< \dots < j_s\le m\, \end{aligned}$$

and we will use the notation \(|I|\doteq r\) and \(|J|\doteq s\). In the sequel we will always have \(r = s\).

Definition 2.7

We denote by \({\mathcal {A}}_r\) the set

$$\begin{aligned} {\mathcal {A}}_r = \{(I,J): |I| = |J| = r\},\qquad 1\le r \le \min (n,m) . \end{aligned}$$

For a matrix \(M = (m_{ij})\in {\mathbb {R}}^{n\times m}\) and for \(Z\in {\mathcal {A}}_r\) of the form \(Z = (I,J)\), we denote by \(M^Z\) the square \(r\times r\) matrix obtained from M by considering just the elements \(m_{ij}\) with \(i\in I\), \(j\in J\) (using the order induced by I and J).

We are finally in position to state [5, Proposition 3.8].

Proposition 2.8

Let \(\{X_1, \ldots , X_N\}\subset {\mathbb {R}}^{n\times m}\) induce a \(T_N\) configuration as in (2.7) with defining vector \((\lambda , \mu )\). Define the vectors \(t^1,\dots ,t^N\) as in (2.15) and for every \(Z\in {\mathcal {A}}_r\) of order \(1\le r \le \min \{n,m\}\) define the minor \({\mathcal {S}} : {\mathbb {R}}^{n\times m} \ni X \mapsto {\mathcal {S}} (X) \doteq \det (X^Z)\in {\mathbb {R}}\). Then

$$\begin{aligned} \sum _{j = 1}^Nt_j^i {\mathcal {S}}(X_j) = {\mathcal {S}}\left( \sum _{j = 1}^Nt_j^iX_j\right) = {\mathcal {S}}(P + C_1 + \dots + C_{i - 1})\, . \end{aligned}$$
(2.21)

and \(A^\mu _Z \lambda = 0\), where \(A^\mu _Z\) denotes the matrix associated to \(Z\) and \(\mu \) introduced in [5].

It is clear that the previous result extends (2.16) and (2.17) to all the minors.

2.5 Preliminary results: inclusion set associated to polyconvex functions

As in [5, Section 4], we write a necessary condition for a set of distinct matrices \(A_i \in {\mathbb {R}}^{2n\times m}\)

$$\begin{aligned} A_i\doteq \left( \begin{array}{c} X_i\\ Y_i \end{array} \right) \, , \end{aligned}$$
(2.22)

to belong to a set of the form

$$\begin{aligned} K'_f \doteq \left\{ \left( \begin{array}{c} X\\ Df(X)\\ \end{array} \right) :X \in {\mathbb {R}}^{n\times m}\right\} \end{aligned}$$
(2.23)

for some strictly polyconvex function \(f:{\mathbb {R}}^{n\times m}\rightarrow {\mathbb {R}}\). First, introduce the following notation, that is the same as in [5]. Let \(f:{\mathbb {R}}^{n\times m}\rightarrow {\mathbb {R}}\) be a strictly polyconvex function of the form \(f(X) =g(\Phi (X))\), where \(g \in C^1({\mathbb {R}}^k)\) is strictly convex and \(\Phi \) is the vector of all the subdeterminants of X, i.e.

$$\begin{aligned} \Phi (X) = (X,v_2(X),\dots ,v_{\min (n,m)}(X)), \end{aligned}$$

and

$$\begin{aligned} v_s(X) = (\det (X_{Z_1}),\dots , \det (X_{Z_{\#{\mathcal {A}}_s}})) \end{aligned}$$

for some fixed (but arbitrary) ordering of all the elements \(Z\in {\mathcal {A}}_s\). Variables of \({\mathbb {R}}^k\), and hence partial derivatives in \({\mathbb {R}}^k\), are labeled using the ordering induced by \(\Phi \). The first nm partial derivatives, corresponding in \(\Phi (X)\) to X, are collected in an \(n\times m\) matrix denoted by \(D_Xg\). The j-th partial derivative, \(mn + 1\le j \le k\), is instead denoted by \(\partial _Zg\), where Z is the element of \({\mathcal {A}}_s\) corresponding to the j-th position of \(\Phi \). Let us give an example in low dimensions: if \(n = 3,m = 2\), then \(k = 9\), and we choose the ordering of \(\Phi \) to be

$$\begin{aligned} \Phi (X) = (X,\det (X_{(12,12)}),\det (X_{(13,12)}),\det (X_{(23,12)})). \end{aligned}$$

In this case, \(y \in {\mathbb {R}}^k\) has coordinates

$$\begin{aligned} y = (y_{11},y_{12},y_{21},y_{22},y_{31},y_{32},y_{(12,12)}, y_{(13,12)},y_{(23,12)}). \end{aligned}$$

The partial derivatives with respect to the first 6 variables are collected in the \(3\times 2\) matrix:

$$\begin{aligned} D_Xg = \left( \begin{array}{cc} \partial _{11}g &{} \partial _{12}g\\ \partial _{21}g&{} \partial _{22}g \\ \partial _{31}g&{} \partial _{32}g \\ \end{array} \right) \end{aligned}$$

The partial derivatives with respect to the remaining variables are denoted as \(\partial _{(12,12)}g\), \(\partial _{(13,12)}g\) and \(\partial _{(23,12)}g\), i.e. following the ordering induced by \(\Phi \). Finally, for a matrix \(A \in {\mathbb {R}}^{r\times r}\), we denote with \({{\,\mathrm{cof}\,}}(A)\) the matrix defined as

$$\begin{aligned} {{\,\mathrm{cof}\,}}(A)_{ij} = (-1)^{i + j}\det (M_{ji}(A)), \end{aligned}$$

where \(M_{ji}(A)\) denotes the \((r-1)\times (r-1)\) submatrix of A obtained by eliminating from A the j-th row and the i-th column. In particular, the following relation holds

$$\begin{aligned} {{\,\mathrm{cof}\,}}(A)A = A{{\,\mathrm{cof}\,}}(A) = \det (A){{\,\mathrm{id}\,}}_r. \end{aligned}$$
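For instance, with this convention one computes, for a matrix chosen purely for illustration,

$$\begin{aligned} A = \left( \begin{array}{cc} 1 &{} 2\\ 3 &{} 4 \end{array}\right) , \qquad {{\,\mathrm{cof}\,}}(A) = \left( \begin{array}{cc} 4 &{} -2\\ -3 &{} 1 \end{array}\right) , \qquad {{\,\mathrm{cof}\,}}(A)A = -2\,{{\,\mathrm{id}\,}}_2 = \det (A){{\,\mathrm{id}\,}}_2. \end{aligned}$$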

We are ready to state the following:

Proposition 2.9

Let \(f:{\mathbb {R}}^{n\times m}\rightarrow {\mathbb {R}}\) be a strictly polyconvex function of the form \(f(X) =g(\Phi (X))\), where \(g \in C^1\) is strictly convex and \(\Phi \) is the vector of all the subdeterminants of X, i.e.

$$\begin{aligned} \Phi (X) = (X,v_2(X),\dots ,v_{\min (n,m)}(X)), \end{aligned}$$

and

$$\begin{aligned} v_s(X) = (\det (X_{Z_1}),\dots , \det (X_{Z_{\#{\mathcal {A}}_s}})) \end{aligned}$$

for some fixed (but arbitrary) ordering of all the elements \(Z\in {\mathcal {A}}_s\). If \(A_i \in K'_f\) and \(A_i \ne A_j\) for \(i \ne j\), then \(X_i\), \(Y_i = D f(X_i)\) and \(c_i = f (X_i)\) fulfill the following inequalities for every \(i\ne j\):

$$\begin{aligned} c_i - c_j +\langle Y_i,X_j - X_i\rangle - \sum _{r = 2}^{\min (m,n)}\sum _{Z\in {\mathcal {A}}_r}d^i_{Z}\left( \langle {{{\,\mathrm{cof}\,}}}(X_i^Z)^T,X^Z_j - X^Z_i\rangle -\det (X_j^Z) + \det (X_i^Z)\right) <0,\nonumber \\ \end{aligned}$$
(2.24)

where \(d^i_Z = \partial _Zg(\Phi (X_i))\).

This result was proved in [5, Proposition 4.1]. We now introduce the set

$$\begin{aligned} C'_f \doteq \left\{ C'\in {\mathbb {R}}^{2n\times m}: C' = \left( \begin{array}{c} X\\ \beta Df(X)\\ \end{array} \right) , \text { for some }\beta > 0 \text { and } X \in {\mathbb {R}}^{n\times m}\right\} . \end{aligned}$$

Notice that \(C'_f\) is the projection of \(C_f\) on the first \(2n\times m\) coordinates. We immediately obtain from the previous proposition and the definition of \(C'_f\) that

$$\begin{aligned} A_i \in C'_f, \quad \forall i\in \{1,\dots ,N\} \end{aligned}$$

if and only if there exist numbers \(\beta _i > 0, \forall i\), such that

$$\begin{aligned} c_i - c_j +\frac{1}{\beta _i}\langle Y_i,X_j - X_i\rangle -\sum _{r = 2}^{\min (m,n)}\sum _{Z\in {\mathcal {A}}_r}d^i_{Z}\left( \langle {{{\,\mathrm{cof}\,}}}(X_i^Z)^T,X^Z_j - X^Z_i\rangle -\det (X_j^Z) + \det (X_i^Z)\right) <0.\nonumber \\ \end{aligned}$$
(2.25)

The expressions in (2.25) can be simplified when the matrices \(X_1, \ldots , X_N\) induce a \(T_N\) configuration:

Corollary 2.10

Let f be a strictly polyconvex function and let \(A_1, \ldots , A_N\) be distinct elements of \(C'_f\) with the additional property that \(\{X_1, \ldots , X_N\}\) induces a \(T_N\) configuration of the form (2.7) with defining vector \((\lambda , \mu )\). Then,

$$\begin{aligned} c_i - \sum _{j}t_j^ic_j -\frac{k_i}{\beta _i}\langle Y_i,C_i\rangle < 0 ,\quad \forall i \in \{1,\dots , N\}, \end{aligned}$$
(2.26)

where the \(t^i\)’s are given by (2.15).

This corresponds to [5, Corollary 4.3], and concludes the list of preliminary results needed for the results of this paper.

3 Positive case: proof of the main results

Before checking whether the inclusion set \(C_f\) contains \(T_N\) or \(T'_N\) configurations, we need to exclude more basic building blocks for wild solutions, such as rank-one connections or, as in this case, \(\Lambda _{dc}\)-connections in \(C_f\). It is rather easy to see, compare for instance [24], that if f is strictly polyconvex, then for \(A,B \in K_f\) it is not possible to have

$$\begin{aligned} A-B \in \Lambda _{dc}. \end{aligned}$$

Indeed, the same result holds even considering \(K'_f\). To prove this, it is sufficient to observe that if \(X,Y \in {\mathbb {R}}^{n\times m}\) satisfy, for some \(u \in {\mathbb {S}}^{m - 1}\),

$$\begin{aligned} (X - Y)v = 0,\; \forall v\perp u, \end{aligned}$$
(3.1)

and

$$\begin{aligned} (Df(X) - Df(Y))u = 0, \end{aligned}$$
(3.2)

then

$$\begin{aligned} \langle Df(X) - Df(Y), X - Y \rangle&= \sum _{i = 1}^m ((Df(X) - Df(Y))u_i, (X-Y) u_i)\\&\overset{(3.1)}{=} ((Df(X) - Df(Y))u, (X-Y) u) \overset{(3.2)}{=} 0, \end{aligned}$$

where \(\{u_1,\dots , u_m\}\) is an orthonormal basis of \({\mathbb {R}}^m\) with \(u_1 = u\). On the other hand, since f is strictly polyconvex, it is easy to see that

$$\begin{aligned} \langle Df(X) - Df(Y), X - Y \rangle > 0 \end{aligned}$$

if \({{\,\mathrm{rank}\,}}(X-Y) = 1\). The first result of this section shows that the same rigidity holds for \(C_f\), provided f is non-negative.
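Let us briefly recall the standard argument behind the last inequality, included only for the reader's convenience. If \(X - Y = a\otimes u\) with \(a \ne 0\), then, by the Matrix Determinant Lemma (Lemma 3.2 below, applied to each square submatrix), every minor of order at least 2 is affine along the line \(t\mapsto Y + t\, a\otimes u\). Hence \(t \mapsto \Phi (Y + t\, a\otimes u)\) is affine and injective, so \(\varphi (t) \doteq f(Y + t\, a\otimes u) = g(\Phi (Y + t\, a\otimes u))\) is strictly convex and \(C^1\), and the inequality \(\varphi '(1) > \varphi '(0)\) is precisely \(\langle Df(X), X - Y\rangle > \langle Df(Y), X - Y\rangle \).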

Proposition 3.1

Let f be strictly polyconvex. If

$$\begin{aligned} A =\left( \begin{array}{cc} X\\ Y\\ Z \end{array} \right) ,\; B= \left( \begin{array}{cc} X' \\ Y' \\ Z' \end{array} \right) \in C_f, \end{aligned}$$

and \(f(X) \ge 0, f(X') \ge 0\), then

$$\begin{aligned} A- B \notin \Lambda _{dc}. \end{aligned}$$

Proof

Suppose by contradiction that there exist

$$\begin{aligned} A= \left( \begin{array}{cc} X\\ Y\\ Z \end{array} \right) \in C_f,\;B= \left( \begin{array}{cc} X' \\ Y' \\ Z' \end{array} \right) = \left( \begin{array}{cc} X + C\\ Y + D\\ Z + E \end{array} \right) \in C_f, \end{aligned}$$

with \(c\doteq f(X) \ge 0, c'\doteq f(X') \ge 0\), and there is a vector \(\xi \in {\mathbb {R}}^m\) with \(\Vert \xi \Vert = 1\) such that for every \(v\perp \xi \),

$$\begin{aligned} Cv = 0,\;D\xi =0,\; E\xi = 0. \end{aligned}$$

Now we can use the so-called Matrix Determinant Lemma (see Lemma 3.2 below) to see that the expressions found in (2.25), evaluated at

$$\begin{aligned} A_1 = \left( \begin{array}{c}X\\ Y\end{array}\right) , \; A_2 = \left( \begin{array}{c}X + C\\ Y + D\end{array}\right) , \end{aligned}$$

yield the following inequalities:

$$\begin{aligned}&c - c' - \frac{1}{\beta }\langle X- X', Y\rangle < 0, \end{aligned}$$
(3.3)
$$\begin{aligned}&c' - c - \frac{1}{\beta '}\langle X' - X, Y'\rangle < 0. \end{aligned}$$
(3.4)

Moreover by assumption \((Z' - Z)\xi = 0\), i.e.

$$\begin{aligned} (Z' - Z)\xi = 0 = (X')^TY'\xi - X^TY\xi -( c'\beta ' - c\beta )\xi . \end{aligned}$$

Thus, using \((Y' - Y)\xi = 0\),

$$\begin{aligned} 0 = (X' - X)^TY'\xi -( c'\beta ' - c\beta )\xi = \langle C,Y\rangle \xi -( c'\beta ' - c\beta )\xi , \end{aligned}$$

that yields, since \(\Vert \xi \Vert = 1\),

$$\begin{aligned} \langle C,Y\rangle = c'\beta ' - c\beta . \end{aligned}$$
(3.5)

In the previous lines we have used the fact that

$$\begin{aligned} (X' - X)^TY'\xi = C^T(Y + D)\xi = C^TY\xi , \end{aligned}$$

and, since C is of rank one with \(Cv = 0, \forall v\perp \xi \),

$$\begin{aligned} C^TY\xi = \langle C, Y\rangle \xi . \end{aligned}$$

Exploiting (3.5), we rewrite (3.3) as

$$\begin{aligned} c - c' - \frac{1}{\beta }\langle X- X', Y\rangle = c - c' + \frac{1}{\beta }\langle C, Y\rangle = c - c' + \frac{1}{\beta } ( c'\beta ' - c\beta ) < 0, \end{aligned}$$
(3.6)

and (3.4) as

$$\begin{aligned} c' - c - \frac{1}{\beta '}\langle C, Y\rangle = c' - c - \frac{1}{\beta '}( c'\beta ' - c\beta ) < 0 \end{aligned}$$
(3.7)

From (3.6), we infer

$$\begin{aligned} \beta c - \beta c' + (c'\beta ' - c\beta )< 0 \Leftrightarrow c'(\beta ' - \beta ) < 0 \end{aligned}$$

and from (3.7)

$$\begin{aligned} \beta ' c' - \beta ' c - (c'\beta ' - c\beta )< 0 \Leftrightarrow c(\beta - \beta ')<0. \end{aligned}$$

Since \(c \ge 0\) and \(c' \ge 0\), we get a contradiction. \(\square \)

Let us recall the Matrix Determinant Lemma used in the proof of the last proposition:

Lemma 3.2

Let A, B be matrices in \({\mathbb {R}}^{m\times m}\), and let \({{\,\mathrm{rank}\,}}(B) \le 1\). Then,

$$\begin{aligned} \det (A + B) = \det (A) + \langle {{\,\mathrm{cof}\,}}(A)^T,B\rangle . \end{aligned}$$
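As an elementary illustration, with matrices chosen purely for this purpose, let

$$\begin{aligned} A = \left( \begin{array}{cc} 1 &{} 2\\ 3 &{} 4 \end{array}\right) , \qquad B = \left( \begin{array}{cc} 1 &{} 0\\ 2 &{} 0 \end{array}\right) , \qquad {{\,\mathrm{rank}\,}}(B) = 1. \end{aligned}$$

Then \(\det (A) = -2\), \(\langle {{\,\mathrm{cof}\,}}(A)^T,B\rangle = 4\cdot 1 + (-3)\cdot 0 + (-2)\cdot 2 + 1\cdot 0 = 0\), and indeed \(\det (A + B) = 2\cdot 4 - 2\cdot 5 = -2 = \det (A) + \langle {{\,\mathrm{cof}\,}}(A)^T,B\rangle \).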

Now that we have excluded \(\Lambda _{dc}\)-connections, we can ask ourselves the same question concerning \(T'_N\) configurations. In particular we want to prove the main Theorem of this part of the paper:

Theorem 3.3

If \(f\in C^1 ({\mathbb {R}}^{n\times m})\) is a strictly polyconvex function, then \(C_f\) does not contain any set \(\{A_1, \ldots , A_N\} \subset {\mathbb {R}}^{(2n + m)\times m}\) which induces a \(T_N'\) configuration, provided that \(f(X_1) \ge 0,\dots , f(X_N) \ge 0\), where

$$\begin{aligned} A_i = \left( \begin{array}{c} X_i\\ Y_i\\ Z_i \end{array} \right) , \quad X_i,Y_i \in {\mathbb {R}}^{n\times m}, Z_i \in {\mathbb {R}}^{m\times m}, \forall i \in \{1,\dots , N\}. \end{aligned}$$

At the end of the section we will show the following

Corollary 3.4

If \(f\in C^1 ({\mathbb {R}}^{n\times m} )\) is strictly polyconvex, then \(K_f\) does not contain any set \(\{A_1, \ldots , A_N\}\) which induces a \(T_N'\) configuration.

Let us fix the notation. We will always consider \(T'_N\) configurations of the following form:

$$\begin{aligned} A_i\doteq \left( \begin{array}{cc} X_i\\ Y_i\\ Z_i\\ \end{array} \right) ,\quad \; X_i,Y_i\in {\mathbb {R}}^{n\times m}, Z_i \in {\mathbb {R}}^{m\times m}, \end{aligned}$$
(3.8)

with:

$$\begin{aligned} X_i = P + \sum _{j = 1}^{i - 1}C_j + k_iC_i, \; Y_i = Q + \sum _{j = 1}^{i - 1}D_j + k_iD_i, \; Z_i = R + \sum _{j = 1}^{i - 1}E_j + k_iE_i, \end{aligned}$$
(3.9)

and we denote with \(n_i \in {\mathbb {S}}^{m - 1}\) the vectors such that

$$\begin{aligned} D_in_i = 0, E_in_i = 0, C_iv = 0,\quad \forall v \perp n_i,\; \forall 1\le i \le N. \end{aligned}$$

3.1 Idea of the proof

Before proving the theorem, let us give an idea of the key steps of the proof. First of all, in Lemma 3.5, we will see that without loss of generality we can choose \(P = 0\). As already explained in Sect. 2.3, we want to prove that the system of inequalities

$$\begin{aligned} \nu _i \doteq \beta _ic_i - \sum _{j}\beta _it_j^ic_j -k_i\langle Y_i,C_i\rangle < 0, \forall i \, , \end{aligned}$$
(3.10)

cannot be fulfilled at the same time. This gives a contradiction with Corollary 2.10. In particular, we show that for the index \(\sigma \) such that \(\beta _{\sigma } = \min _j\beta _j \),

$$\begin{aligned} \nu _{\sigma } \ge 0. \end{aligned}$$

To do so, we prove that the quantities

$$\begin{aligned} \mu _i\doteq -\beta _ic_i + \sum _{j}\beta _jt_j^ic_j +k_i\langle Y_i,C_i\rangle \end{aligned}$$
(3.11)

are equal to 0 for every i. Then, choosing \(\sigma \) as above and exploiting the positivity of \(c_j, \forall j\), we estimate

$$\begin{aligned} 0 = -\mu _{\sigma } = \beta _{\sigma }c_{\sigma } - \sum _{j}\beta _jt_j^{\sigma }c_j -k_{\sigma }\langle Y_{\sigma },C_{\sigma }\rangle \le \beta _{\sigma }c_{\sigma } - \sum _{j}\beta _{\sigma }t_j^{\sigma }c_j -k_{\sigma }\langle Y_{\sigma },C_{\sigma }\rangle = \nu _{\sigma }. \end{aligned}$$
(3.12)

This will then yield the required contradiction. In order to show \(\mu _i = 0,\forall 1\le i \le N\), we consider N matrices \(M_i\) defined as

$$\begin{aligned} M_i \doteq \mu \sum _{j \le i -1}\alpha _jC_j^TD_j + \sum _{j \ge i}\alpha _jC_j^TD_j, \end{aligned}$$

where \(\mu > 1\) is part of the defining vector of the \(T_N\) configuration \(\{X_1,\dots ,X_N\}\), compare (2.13), and the \(\alpha _j\) are suitable positive numbers. We prove that for numbers \(\xi _j > 0\), a subset \({\mathcal {I}}_i \subset \{\xi _1\mu _1,\dots , \xi _N\mu _N\}\) is made of generalized eigenvalues of \(M_i\), see (3.24). This is achieved thanks to Lemma 3.6. Since \(M_i\) is trace-free, as can be seen from the structure of \(C_j\) and \(D_j\), we will find N relations of the form

$$\begin{aligned} \sum _{\xi _j\mu _j \in {\mathcal {I}}_i}\xi _j\mu _j = 0. \end{aligned}$$

This can be read as the equation for the kernel of a specific \(N\times N\) matrix W. Proving that W has trivial kernel will yield \(\xi _j\mu _j = 0,\forall j\), and thus \(\mu _j = 0\), since \(\xi _j > 0\). The proof of the invertibility of W is the content of the final Lemma 3.10.

3.2 Proof of Theorem 3.3

Lemma 3.5

If f is a strictly polyconvex function such that \(A_i \in C_f\), \(\forall 1\le i \le N\) and \(f(X_i) \ge 0, \forall 1\le i\le N\), then there exists another strictly polyconvex function F such that the \(T_N'\) configuration \(B_i\) defined as

$$\begin{aligned} B_i =\left( \begin{array}{cc} X_i - P\\ Y_i\\ Z_i - P^TY_i\\ \end{array} \right) \end{aligned}$$

satisfies \(B_i \in C_F,\) for every \(1\le i \le N\) and moreover \(F(X_i - P) \ge 0, \forall i\).

Proof

Simply define the new polyconvex function F by \(F(X)\doteq f(X + P)\). Clearly the newly defined family \(\{B_1,\dots, B_N\}\) still induces a \(T'_N\) configuration, and it is straightforward to check that \(B_i \in C_F\): indeed \(DF(X_i - P) = Df(X_i)\), so that \(Y_i = \beta _i DF(X_i - P)\), and \((X_i - P)^TY_i - \beta _iF(X_i - P){{\,\mathrm{id}\,}}= X_i^TY_i - \beta _if(X_i){{\,\mathrm{id}\,}}- P^TY_i = Z_i - P^TY_i\). Moreover, this does not affect positivity, in the sense that \(F(X_i - P) = f(X_i - P + P) = f(X_i) \ge 0\). \(\square \)

Lemma 3.6

Suppose \(A_i \in C_f\), \(\forall i\), and \(P = 0\). Then, for every \(i \in \{1,\dots , N\}\):

$$\begin{aligned} \sum _{j = 1}^N k_j(k_j - 1)t^i_jC_j^TD_jn_i = \left( k_i\langle C_i,Y_i\rangle - \beta _ic_i + \sum _{j = 1}^N\beta _jt_j^ic_j\right) n_i \overset{(3.11)}{=} \mu _in_i,\quad \forall i = 1,\dots , N, \end{aligned}$$

where \(t^i\) is the vector defined in (2.15).

Proof

We need to compute the following sums:

$$\begin{aligned} \sum _jt_j^iZ_j = \sum _jt_j^iX^T_jY_j - \sum _jt_j^ic_j\beta _j{{\,\mathrm{id}\,}}. \end{aligned}$$
(3.13)

Let us start computing the sum for \(i = 1\), \(\sum _j\lambda _jX_j^TY_j.\) First, notice that

$$\begin{aligned} \sum _j\lambda _j X_j^TY_j = \sum _j\lambda _j X_j^T(Y_j-Q) + \sum _j\lambda _j X_j^TQ = \sum _j\lambda _j X_j^T(Y_j-Q), \end{aligned}$$

since, by Lemma 2.6 or (2.21),

$$\begin{aligned} \sum _j \lambda _j X_j^TQ = P^TQ = 0. \end{aligned}$$

We rewrite it in the following way:

$$\begin{aligned} \begin{aligned} \sum _j\lambda _j X_j^TY_j&= \sum _j\lambda _j X_j^T(Y_j-Q) \\&= \sum _{j = 1}^N\lambda _j\left( \sum _{1\le a,b\le j - 1}C_a^TD_b + k_j\sum _{1\le a \le j - 1} C_a^TD_j + k_j\sum _{1\le b \le j - 1} C_j^TD_b + k_j^2C_j^TD_j\right) \\&=\sum _{i,j}g_{ij}C_i^TD_j, \end{aligned} \end{aligned}$$
(3.14)

where we collected in the coefficients \(g_{ij}\) the following quantities:

$$\begin{aligned} g_{ij}= {\left\{ \begin{array}{ll} \lambda _ik_i + {\sum }_{r = i + 1}^N\lambda _r,\text { if } i \ne j\\ \lambda _ik_i^2 + {\sum }_{r = i + 1}^N\lambda _r,\text { if } i = j. \end{array}\right. } \end{aligned}$$

Using (2.14), we have, if \(i \ne j\):

$$\begin{aligned} g_{ij} = g_{ji} = \lambda _ik_i + \sum _{r = i + 1}^N\lambda _r = \frac{\mu }{\mu - 1}. \end{aligned}$$

On the other hand, again using (2.14),

$$\begin{aligned} g_{ii} = k_i^2\lambda _i + \sum _{r = i + 1}^N\lambda _r = k_i(k_i - 1)\lambda _i + \frac{\mu }{\mu - 1}. \end{aligned}$$

Using the equalities \(\sum _\ell C_\ell = 0 = \sum _\ell D_\ell \), then also \(\sum _{i,j}C_i^TD_j = 0\), and so \(\sum _{i\ne j}C_i^TD_j = -\sum _iC_i^TD_i\). Hence, (3.14) becomes

$$\begin{aligned} \sum _{i,j}g_{ij}C_i^TD_j = \frac{\mu }{\mu - 1}\sum _{i\ne j}C_i^TD_j + \sum _i\left( k_i(k_i - 1)\lambda _i + \frac{\mu }{\mu - 1}\right) C_i^TD_i = \sum _i k_i(k_i - 1)\lambda _iC_i^TD_i. \end{aligned}$$

We just proved that

$$\begin{aligned} \sum _j\lambda _jX_j^TY_j = \sum _j k_j(k_j - 1)\lambda _jC_j^TD_j. \end{aligned}$$
(3.15)

Recall the definition of \(t^i\), namely

$$\begin{aligned} t^i= \frac{1}{\xi _i}(\mu \lambda _1,\dots ,\mu \lambda _{i - 1},\lambda _i,\dots ,\lambda _N)\, . \end{aligned}$$

By the previous computation (\(i = 1\)), it is convenient to rewrite (3.13) using (3.15) as

$$\begin{aligned} R + \sum _{j=1}^{i - 1}E_j = \frac{1}{\xi _i}\left( \sum _j k_j(k_j - 1)\lambda _jC_j^TD_j + (\mu - 1)\sum _{j = 1}^{i - 1}\lambda _jX_j^TY_j\right) - \sum _jt_j^ic_j\beta _j{{\,\mathrm{id}\,}}. \end{aligned}$$
(3.16)

In the previous equation, we have used the equality

$$\begin{aligned} \sum _{j = 1}^Nt_j^iZ_j = R + \sum _{j=1}^{i - 1}E_j,\quad \forall i \in \{1,\dots , N\}, \end{aligned}$$
(3.17)

that easily follows from Lemma 2.6. Once again, let us express the sum up to \(i - 1\) in the following way:

$$\begin{aligned} \sum _{j = 1}^{i - 1}\lambda _jX_j^TY_j = \sum _{j = 1}^{i - 1}\lambda _jX_j^TQ + \sum _{j = 1}^{i - 1}\lambda _jX_j^T(Y_j-Q) = \sum _{j = 1}^{i - 1}\lambda _j X_j^TQ+ \sum _{k,j}^{i - 1}s_{kj}C_k^TD_j. \end{aligned}$$

A combinatorial argument analogous to the one in the previous case gives

$$\begin{aligned}&s_{\ell \ell }= k_\ell ^2\lambda _\ell + \dots + \lambda _{i - 1} \\&= (k_\ell ^2 - k_\ell )\lambda _\ell + k_\ell \lambda _\ell + \dots + \lambda _{i - 1}, \\&s_{\alpha \beta } = k_\alpha \lambda _\alpha + \dots + \lambda _{i - 1},\qquad \alpha \ne \beta \\ \end{aligned}$$

Now

$$\begin{aligned} k_r\lambda _r + \dots + \lambda _{i - 1}=\frac{\mu ( \sum _{j = 1}^{i - 1}\lambda _j) + \sum _{j = i}^{N}\lambda _j }{\mu - 1} \end{aligned}$$

and so

$$\begin{aligned} k_r\lambda _r + \dots + \lambda _{i - 1}=\frac{(\mu -1)( \sum _{j = 1}^{i - 1}\lambda _j) + 1 }{\mu - 1} = \frac{\xi _i}{\mu - 1} =: b_{i - 1} \end{aligned}$$

Hence

$$\begin{aligned}&\sum _{j = 1}^{i - 1}\lambda _jX_j^TY_j = \sum _{j = 1}^{i - 1}\lambda _j X_j^TQ\\&\quad + \sum _{k,j}^{i - 1}s_{kj}C_k^TD_j =\sum _{j = 1}^{i - 1}\lambda _j X_j^TQ + b_{i - 1}\sum _{k,j}^{i - 1}C_k^TD_j \\&\quad + \sum _{j = 1}^{i - 1} k_j(k_j - 1)\lambda _j C_j^TD_j. \end{aligned}$$

We rewrite (3.16) as

$$\begin{aligned} \begin{aligned} R + \sum _{j = 1}^{i - 1}E_j&= \frac{1}{\xi _i}\left( \sum _j k_j(k_j - 1)\lambda _jC_j^TD_j + \xi _i\sum _{k,j}^{i - 1}C_k^TD_j \right. \\&\quad \left. + (\mu - 1)\sum _{j = 1}^{i - 1} (k_j(k_j - 1)\lambda _j C_j^TD_j + \lambda _j X_j^TQ)\right) \\&\quad - \sum _j\beta _jt_j^ic_j{{\,\mathrm{id}\,}}\end{aligned} \end{aligned}$$
(3.18)

Now we substitute (3.18) in the definition (3.9) of \(Z_i\) in order to compute \(E_i\):

$$\begin{aligned} k_iE_i&+ \frac{1}{\xi _i}\left( \sum _j k_j(k_j - 1)\lambda _jC_j^TD_j + \xi _i\sum _{k,j}^{i - 1}C_k^TD_j + (\mu - 1)\sum _{j = 1}^{i - 1} (k_j(k_j - 1)\lambda _j C_j^TD_j + \lambda _j X_j^TQ)\right) \\&- \sum _j\beta _jt_j^ic_j{{\,\mathrm{id}\,}}= X_i^TY_i - \beta _ic_i{{\,\mathrm{id}\,}}. \end{aligned}$$

Multiply the previous expression by \(n_i\) and recall that \(E_in_i = 0\) to find:

$$\begin{aligned} \begin{aligned} \frac{1}{\xi _i}&\left( \sum _j k_j(k_j - 1)\lambda _jC_j^TD_jn_i + \xi _i\sum _{k,j}^{i - 1}C_k^TD_jn_i + (\mu - 1)\sum _{j = 1}^{i - 1} (k_j(k_j - 1)\lambda _j C_j^TD_j n_i + \lambda _j X_j^TQn_i)\right) \\&- \sum _j\beta _jt_j^ic_jn_i = X_i^TY_in_i - \beta _ic_in_i. \end{aligned} \end{aligned}$$
(3.19)

Now notice that, since \(D_in_i = 0\),

$$\begin{aligned} \begin{aligned} X_i^TY_in_i&= X_i^TQn_i + \sum _{j,k}^{i- 1}C_{k}^TD_jn_i + k_i\sum _{j = 1}^{i-1}C_i^TD_jn_i + k_i\sum _{j = 1}^{i-1}C_j^TD_in_i + k_i^2C_i^TD_in_i \\&= X_i^TQn_i + \sum _{j,k}^{i- 1}C_{k}^TD_jn_i + k_i\sum _{j = 1}^{i-1}C_i^TD_jn_i. \end{aligned} \end{aligned}$$

Thus (3.19) becomes

$$\begin{aligned} \begin{aligned} \frac{1}{\xi _i}&\left( \sum _j k_j(k_j - 1)\lambda _jC_j^TD_jn_i + (\mu - 1)\sum _{j = 1}^{i - 1} (k_j(k_j - 1)\lambda _j C_j^TD_j n_i + \lambda _j X_j^TQn_i)\right) \\&- \sum _j\beta _jt_j^ic_jn_i = X_i^TQn_i + k_i\sum _{j = 1}^{i-1}C_i^TD_jn_i - \beta _ic_in_i. \end{aligned} \end{aligned}$$
(3.20)

Now we need to compute

$$\begin{aligned} \sum _{j = 1}^{i - 1}\lambda _j X_j = \sum _{j = 1}^{i- 1}y_jC_j, \end{aligned}$$

and

$$\begin{aligned} y_j = k_j\lambda _j + \dots + \lambda _{i - 1} = \frac{\xi _i}{\mu - 1}, \; \forall j \in \{1,\dots , i - 1\}. \end{aligned}$$

Using this computation, (3.20) reads as:

$$\begin{aligned} \begin{aligned} \frac{1}{\xi _i}&\left( \sum _j k_j(k_j - 1)\lambda _jC_j^TD_jn_i + (\mu - 1)\sum _{j = 1}^{i - 1} k_j(k_j - 1)\lambda _j C_j^TD_j n_i\right) \\&\quad + \sum _{j = 1}^{i - 1}C_j^TQn_i - \sum _j\beta _jt_j^ic_jn_i = \\&X_i^TQn_i + k_i\sum _{j = 1}^{i-1}C_i^TD_jn_i - \beta _ic_in_i. \end{aligned} \end{aligned}$$
(3.21)

Exploiting the definition of \(t_j^i\), we see that we can rewrite

$$\begin{aligned} \frac{1}{\xi _i}\left( \sum _j k_j(k_j - 1)\lambda _jC_j^TD_jn_i + (\mu - 1)\sum _{j = 1}^{i - 1} k_j(k_j - 1)\lambda _j C_j^TD_j n_i\right) = \sum _j k_j(k_j - 1)t^i_jC_j^TD_jn_i, \end{aligned}$$

and

$$\begin{aligned} X_i^TQn_i&+ k_i\sum _{j = 1}^{i-1}C_i^TD_jn_i - \sum _{j = 1}^{i - 1}C_j^TQn_i \\&= \sum _{j = 1}^{i - 1}C_j^TQn_i + k_iC^T_iQn_i + k_i\sum _{j = 1}^{i-1}C_i^TD_jn_i - \sum _{j = 1}^{i - 1}C_j^TQn_i = k_iC_i^TY_in_i. \end{aligned}$$

Thus (3.21) becomes

$$\begin{aligned} \sum _j k_j(k_j - 1)t^i_jC_j^TD_jn_i - \sum _j\beta _jt_j^ic_jn_i = k_iC_i^TY_in_i - \beta _ic_in_i. \end{aligned}$$

Since \(C_iv = 0, \forall v\perp n_i\), we have \(C_i^TY_in_i = \langle C_i,Y_i\rangle n_i\), and we finally obtain the desired equalities:

$$\begin{aligned} \sum _{j = 1}^N k_j(k_j - 1)t^i_jC_j^TD_jn_i = \left( k_i\langle C_i,Y_i\rangle - \beta _ic_i + \sum _{j = 1}^N\beta _jt_j^ic_j\right) n_i, \quad \forall i = 1,\dots , N. \end{aligned}$$

\(\square \)

We are finally in position to prove the main Theorem.

Proof of Theorem 3.3

Assume by contradiction the existence of a \(T_N'\) configuration induced by matrices \(\{A_1, \ldots , A_N\}\) of the form (3.8) which belong to the inclusion set \(C_f\) of some strictly polyconvex function \(f\in C^1 ({\mathbb {R}}^{n\times m})\) with \(f(X_i) \ge 0\) for every i. We can assume, without loss of generality by Lemma 3.5, that

$$\begin{aligned} P = 0\, . \end{aligned}$$

Using Lemma 3.6, we find

$$\begin{aligned} \sum _{j = 1}^N k_j(k_j - 1)t^i_jC_j^TD_jn_i = \left( k_i\langle C_i,Y_i\rangle - \beta _ic_i + \sum _{j = 1}^N\beta _jt_j^ic_j\right) n_i = \mu _in_i,\quad \forall i. \end{aligned}$$
(3.22)

Define \(\alpha _{j}\doteq k_j(k_{j} -1)\lambda _j > 0\), and

$$\begin{aligned} M_i\doteq \mu \sum _{j \le i -1}\alpha _jC_j^TD_j + \sum _{j \ge i}\alpha _jC_j^TD_j, \end{aligned}$$

for \(i \in \{1,\dots , N\}\). Also set

$$\begin{aligned} M_i\doteq \mu M_{i - N},\quad \forall i \in \{N + 1,\dots , 2N\}. \end{aligned}$$

Then, (3.22) can be rewritten as

$$\begin{aligned} M_in_i = \xi _i\mu _in_i,\quad \forall i \in \{1,\dots ,N\}. \end{aligned}$$
(3.23)

We define \(n_{s}\doteq n_{s - N}\), \(\xi _{s}\doteq \xi _{s - N}\) and \(\mu _{s}\doteq \mu _{s - N}\), for \(s \in \{N + 1,\dots , 2N\}\). As explained in Sect. 3.1, the idea is to show that a subset of the vectors \(n_j\) are generalized eigenvectors and a subset of the numbers \(\xi _j \mu _j\) are generalized eigenvalues of \(M_i\). In particular, for every \(i \in \{1,\dots ,N\}\), we want to show the following equalities:

$$\begin{aligned} {\left\{ \begin{array}{ll} M_in_{i + a} = \xi _{i + a}\mu _{i + a}n_{i + a} + v_{i,a}, &{}\text { if } a: i \le i + a \le N\\ M_in_{i + a} = \mu \xi _{i + a}\mu _{i + a}n_{i + a} + v_{i,a}, &{}\text { if } a: N + 1 \le i + a \le N + i - 1, \end{array}\right. } \end{aligned}$$
(3.24)

where \(v_{i,a} \in {{\,\mathrm{span}\,}}\{n_i,\dots , n_{i + a - 1}\}\). From now on, we fix \(i \in \{1,\dots , N\}\). To prove (3.24), first we rewrite

$$\begin{aligned} M_{i}n_{i + a} = (M_{i} - M_{i + a})n_{i + a} + M_{i + a}n_{i + a}, \end{aligned}$$
(3.25)

and then we use (3.23) to obtain

$$\begin{aligned} (M_{i} - M_{i + a})n_{i + a} + M_{i + a}n_{i + a} = {\left\{ \begin{array}{ll} \xi _{i + a}\mu _{i + a}n_{i + a} + (M_{i} - M_{i + a})n_{i + a}, \text { if } i + a \le N,\\ \mu \xi _{i + a}\mu _{i + a}n_{i + a} + (M_{i} - M_{i + a})n_{i + a}, \text { if } i + a > N. \end{array}\right. }. \end{aligned}$$

To conclude the proof of (3.24), we only need to show that

$$\begin{aligned} (M_{i} - M_{i + a})n_{i + a} \in {{\,\mathrm{span}\,}}\{n_i,\dots , n_{i + a - 1}\}, \quad \forall a \in \{0,\dots , N-1\}. \end{aligned}$$
(3.26)

To do so, we compute \(M_i - M_{i + a}\). Let us start from the case \(1\le i + a \le N\):

$$\begin{aligned} M_{i} - M_{i + a}&= \mu \sum _{j< i}\alpha _jC_j^TD_j + \sum _{j \ge i}\alpha _jC_j^TD_j - \mu \sum _{j< i + a}\alpha _jC_j^TD_j - \sum _{j \ge i + a}\alpha _jC_j^TD_j\\&= \sum _{i \le j< i +a}\alpha _jC_j^TD_j - \mu \sum _{i \le j < i + a}\alpha _jC_j^TD_j. \end{aligned}$$

On the other hand, if \(N + 1 \le i + a \le i + N - 1\), then

$$\begin{aligned} M_{i} - M_{i + a}&= M_{i} - \mu M_{i + a - N} \\&= \mu \sum _{j< i}\alpha _jC_j^TD_j + \sum _{j \ge i}\alpha _jC_j^TD_j - \mu ^2\sum _{j< i + a - N}\alpha _jC_j^TD_j - \mu \sum _{j \ge i + a - N}\alpha _jC_j^TD_j\\&= \mu \sum _{j< i + a - N}\alpha _jC_j^TD_j - \mu \sum _{j \ge i}\alpha _jC_j^TD_j + \sum _{j \ge i}\alpha _jC_j^TD_j - \mu ^2\sum _{j< i + a - N}\alpha _jC_j^TD_j\\&= (1 - \mu )\sum _{j \ge i}\alpha _jC_j^TD_j + \mu (1 - \mu )\sum _{j < i + a - N}\alpha _jC_j^TD_j. \end{aligned}$$

Now the crucial observation is that, due to the fact that \(C_jv = 0\) for every \(v\perp n_j\), the image of \(C_j^TD_j\) is contained in the line \({{\,\mathrm{span}\,}}(n_j)\), for every \(j \in \{1,\dots , N\}\). Therefore, the previous computations prove (3.26) and hence (3.24). Now we introduce

$$\begin{aligned} V_i\doteq \{n_i,n_{i + 1}, n_{i + 2},\dots , n_{N}, n_{N + 1}, \dots , n_{N + i - 1}\}. \end{aligned}$$

We can extract a basis for \({{\,\mathrm{span}\,}}(V_i)\) in the following way. First, choose indexes

$$\begin{aligned} {{\overline{S}}}_i\doteq \{k: k = i \text { or } i< k \le N + i - 1, n_k \notin {{\,\mathrm{span}\,}}(n_i,\dots ,n_{k - 1})\}. \end{aligned}$$
(3.27)

Then, consider the basis \({\mathcal {B}}_i\doteq \{n_k: k \in {\overline{S}}_i\}\) for \({{\,\mathrm{span}\,}}(V_i)\). Since

$$\begin{aligned} {{\,\mathrm{span}\,}}({\mathcal {B}}_i) = {{\,\mathrm{span}\,}}(\{n_1,\dots , n_N\}),\quad \forall i, \end{aligned}$$

then \(\#{\overline{S}}_i = C \le \min \{m,N\}, \forall i\). Indexes in \({{\overline{S}}}_i\) lie in the set \(\{1,\dots , 2N\}\). For technical reasons, we also need to consider the modulo N counterpart of \({{\overline{S}}}_i\), that is

$$\begin{aligned} S_i\doteq \{k \in \{1,\dots , N\}: k \in {\overline{S}}_i \text { or } k + N \in {\overline{S}}_i\}. \end{aligned}$$
(3.28)

In \(S_i\), consider furthermore \(S_i'\doteq S_i \cap \{i,\dots , N\}\), \(S_i''\doteq S_i\cap \{1,\dots ,i - 1\}\). If necessary, complete \({\mathcal {B}}_i\) to a basis of \({\mathbb {R}}^m\) with elements \(\gamma _j\) orthogonal to the ones of \({\mathcal {B}}_i\). Note that, since \({{\,\mathrm{Im}\,}}(C_j^TD_j) \subset {{\,\mathrm{span}\,}}(n_j)\) for every j, we have \({{\,\mathrm{Im}\,}}(M_i)\subset {{\,\mathrm{span}\,}}\{n_1,\dots , n_N\}\). Then, the matrix representing \(M_i\) with respect to this completed basis is

$$\begin{aligned} \left( \begin{array}{ccccc} a_{1i} &{} \mathbf {*} &{} \cdots &{} \mathbf {*} &{} \\ 0 &{} a_{2i} &{} \ddots &{} \vdots &{} {\mathbf {T}}\\ \vdots &{} &{} \ddots &{} \mathbf {*} &{} \\ 0 &{} \cdots &{} 0 &{} a_{Ci} &{} \\ &{} {\mathbf {0}}_{m - C,C} &{} &{} &{} {\mathbf {0}}_{m - C,m - C} \end{array} \right) \end{aligned}$$
(3.29)

We denoted by \({\mathbf {0}}_{c,d}\) the zero matrix with c rows and d columns, by \({\mathbf {T}}\) the \(C\times (m - C)\) matrix of the coefficients of \(M_i\gamma _j\) with respect to \(\{n_s:s\in {\overline{S}}_i\}\), and by \(\mathbf {*}\) numbers we are not interested in computing explicitly. Finally, we have chosen an enumeration \(s_1< s_2<\dots< s_\ell< \dots < s_C\) of the elements of \({\overline{S}}_i\), and we have defined

$$\begin{aligned} a_{\ell i}= {\left\{ \begin{array}{ll} \xi _{s_{\ell }}\mu _{s_{\ell }}, &{}\text { if } s_{\ell } \in S_i',\\ \mu \xi _{s_{\ell }-N}\mu _{s_{\ell }-N}, &{}\text { if } s_{\ell } - N \in S_i''. \end{array}\right. } \end{aligned}$$

The triangular form of the matrix representing \(M_i\) is exactly due to (3.24). Now, \({{\,\mathrm{tr}\,}}(M_i) = 0, \forall i\), since \(C_i^TD_i\) is trace-free for every i. This implies that the matrix in (3.29) must be trace-free, hence:

$$\begin{aligned} 0 = {{\,\mathrm{tr}\,}}(M_i) = \sum _{\ell = 1}^C a_{\ell i} = \sum _{a \in S'_i}\xi _{a}\mu _{a} + \mu \sum _{b \in S''_i}\xi _{b}\mu _{b}. \end{aligned}$$
(3.30)

We have thus reduced the problem to the following simple Linear Algebra statement: we wish to show that, if W is the \(N\times N\) matrix defined as

$$\begin{aligned} W_{ij}= {\left\{ \begin{array}{ll} 1, &{} \text { if } j \in S_i',\\ \mu , &{} \text { if } j \in S_i'',\\ 0, &{} \text { if } j \notin S_i, \end{array}\right. } \end{aligned}$$

then, \(Wx = 0 \Rightarrow x = 0\). By (3.30), the vector \(x \in {\mathbb {R}}^{N}\) defined as \(x_j\doteq \xi _j\mu _j, \forall 1\le j \le N\), is such that \(Wx = 0\), thus if the statement is true we get \(\xi _j \mu _j = 0, \forall 1\le j \le N\), and since \(\xi _j > 1\), also \(\mu _j = 0,\forall 1 \le j \le N\). By (3.12), this is sufficient to reach a contradiction. Therefore, we only need to show that \(Wx = 0 \Rightarrow x = 0\). This proof will be given in Lemma 3.10. \(\square \)

Before giving the proof of the final Lemma, let us give some examples of possible matrices W arising from the previous construction. For the sake of illustration, let us take N as small as possible, i.e. \(N = 4\).

Example 3.7

Consider the case in which \(C = 2\). This corresponds, for instance, to the case \(m = 2\). Then, by Proposition 3.1 and (3.27), the only possible form of W is

$$\begin{aligned} W = \left( \begin{array}{c} \begin{matrix} 1 &{}\quad 1 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1&{}\quad 1\\ \mu &{}\quad 0 &{}\quad 0 &{}\quad 1 \end{matrix} \end{array} \right) , \; Wx = \left( \begin{array}{cc} x_1 + x_2\\ x_2 + x_3\\ x_3 + x_4\\ \mu x_1 + x_4 \end{array} \right) = 0. \end{aligned}$$

Let \(W_i\) be the i-th row of W. We notice that for \(i = 1,2,3\), \(W_{i + 1}\) differs from \(W_i\) by exactly two elements, while \(W_4\) does not differ from \(W_1\) by only two elements; it does, however, differ from \(\mu W_1\) by exactly two elements. Hence we equivalently rewrite the system \(Wx = 0\) as \((W_i - W_{i + 1},x) = 0\), \((W_4 - \mu W_1,x) = 0\):

$$\begin{aligned} 0 = \left( \begin{array}{cc} x_1 - x_3\\ x_2 - x_4\\ x_3 - \mu x_1\\ x_4 - \mu x_2 \end{array} \right) , \text { i.e. } x_i = a_i x_{h(i)}, a_i = {\left\{ \begin{array}{ll} 1, &{}\quad \text { if } h(i) > i,\\ \mu , &{}\quad \text { if } h(i) \le i, \end{array}\right. } \end{aligned}$$

for some function \(h: \{1,\dots , 4\} \rightarrow \{1,\dots , 4\}\). Since \(\mu > 1\), this immediately implies \(x_i = 0, \forall i\).

Example 3.8

Consider the case in which \(C = 4\), corresponding to \(n_1,n_2,n_3,n_4\) linearly independent. Then,

$$\begin{aligned} W = \left( \begin{array}{cc} \begin{matrix} 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ \mu &{}\quad 1 &{}\quad 1 &{}\quad 1 \\ \mu &{}\quad \mu &{}\quad 1&{}\quad 1\\ \mu &{}\quad \mu &{}\quad \mu &{}\quad 1 \end{matrix} \end{array} \right) , \; Wx = \left( \begin{array}{cc} x_1 + x_2 + x_3 + x_4\\ \mu x_1 + x_2 + x_3 + x_4\\ \mu x_1 + \mu x_2 + x_3 + x_4\\ \mu x_1 + \mu x_2 + \mu x_3 + x_4 \end{array} \right) = 0. \end{aligned}$$

As in the previous example, for \(i = 1,2,3\), \(W_{i + 1}\) differs from \(W_i\) by exactly one element, and the same holds for \(W_4\) and \(\mu W_1\). Thus, as before, we equivalently rewrite the system \(Wx = 0\) as \((W_i - W_{i + 1},x) = 0\), \((W_4 - \mu W_1,x) = 0\):

$$\begin{aligned} 0 = \left( \begin{array}{cc} (\mu - 1)x_1\\ (\mu - 1)x_2\\ (\mu - 1)x_3\\ (\mu - 1)x_4 \end{array} \right) , \text { i.e. } x_i = a_i x_{h(i)}, a_i = {\left\{ \begin{array}{ll} 1, &{}\text { if } h(i) > i,\\ \mu , &{}\text { if } h(i) \le i, \end{array}\right. } \end{aligned}$$

In this case, \(h(i) = i, \forall i \in \{1,\dots , 4\}\). Clearly, also in this case, \(\mu > 1\) implies \(x_i = 0, \forall i\).

Finally, let us show a less symmetric example:

Example 3.9

Consider the case in which \(C = 3\). Then, a possible matrix is:

$$\begin{aligned} W = \left( \begin{array}{cc} \begin{matrix} 1 &{} 1 &{} 0 &{} 1\\ 0 &{} 1 &{} 1 &{} 1 \\ \mu &{} 0 &{} 1&{} 1\\ \mu &{} \mu &{} 0 &{} 1 \end{matrix} \end{array} \right) , \; Wx = \left( \begin{array}{cc} x_1 + x_2 + x_4\\ x_2 + x_3 + x_4\\ \mu x_1 + x_3 + x_4\\ \mu x_1 + \mu x_2 + x_4 \end{array} \right) = 0. \end{aligned}$$

First, let us explain why this is a possible matrix appearing in the proof of the previous Theorem. Indeed, let us consider its first two rows:

$$\begin{aligned} \left( \begin{matrix} 1 &{} 1 &{} 0 &{} 1\\ 0 &{} 1 &{} 1 &{} 1 \\ \end{matrix} \right) . \end{aligned}$$

The fact that \(W_{13} = 0\) means that \(n_3 \in {{\,\mathrm{span}\,}}(n_1,n_2)\), since \(3 \notin S_1\). On the other hand, Proposition 3.1 ensures that \(n_{3}\) is not a multiple of \(n_2\), hence \(3 \in S_2\), and \(W_{23} = 1 \ne 0\). For this reason, the matrix

$$\begin{aligned} W = \left( \begin{array}{cc} \begin{matrix} 1 &{} 0 &{} 1 &{} 1\\ 0 &{} 1 &{} 1 &{} 1 \\ \mu &{} 0 &{} 1&{} 1\\ \mu &{} \mu &{} 0 &{} 1 \end{matrix} \end{array} \right) \end{aligned}$$

would, for instance, not have been admissible. Now, in order to prove \(Wx = 0 \Rightarrow x = 0\), we argue as in the previous examples, noticing that for \(i = 1,2,3\), \(W_{i + 1}\) differs from \(W_i\) by at most two elements, while \(W_4\) must be compared with \(\mu W_1\). Thus we write \(W_i - W_{i + 1}\), \(W_4 - \mu W_1\):

$$\begin{aligned} 0 = \left( \begin{array}{cc} x_1 - x_3\\ x_2 - \mu x_1\\ x_3 - \mu x_2\\ (\mu - 1)x_4 \end{array} \right) , \text { i.e. } x_i = a_i x_{h(i)}, a_i = {\left\{ \begin{array}{ll} 1, &{}\text { if } h(i) > i,\\ \mu , &{}\text { if } h(i) \le i. \end{array}\right. } \end{aligned}$$

It is an elementary computation to show that \(x_i = 0, \forall i\).

Even though the examples we have given are too simple to fully appreciate the usefulness of the function h such that \(x_i = a_ix_{h(i)}\), this function will be crucial in the proof of the Lemma.
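As a quick sanity check of Examples 3.7–3.9 (purely illustrative and not needed for the proofs), one can verify by a short symbolic computation, e.g. with sympy, that the three matrices W above have determinants \(1 - \mu \), \((1-\mu )^3\) and \(1 - \mu ^2\) respectively (up to sign), hence they are invertible for \(\mu > 1\) and \({{\,\mathrm{Ker}\,}}(W) = \{0\}\) in all three cases.

```python
# Illustrative symbolic check of Examples 3.7-3.9:
# each example matrix W has nonzero determinant for mu > 1.
import sympy as sp

mu = sp.symbols('mu', positive=True)

W_37 = sp.Matrix([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [mu, 0, 0, 1]])

W_38 = sp.Matrix([[1, 1, 1, 1],
                  [mu, 1, 1, 1],
                  [mu, mu, 1, 1],
                  [mu, mu, mu, 1]])

W_39 = sp.Matrix([[1, 1, 0, 1],
                  [0, 1, 1, 1],
                  [mu, 0, 1, 1],
                  [mu, mu, 0, 1]])

for name, W in [("Example 3.7", W_37), ("Example 3.8", W_38), ("Example 3.9", W_39)]:
    # the factored determinant vanishes only at mu = 1 (or mu = -1), never for mu > 1
    print(name, sp.factor(W.det()))
```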

Lemma 3.10

Let W be the matrix defined in the proof of Theorem 3.3. Then, \({{\,\mathrm{Ker}\,}}(W) = \{0\}\).

Proof

Throughout the proof, we always consider a given vector \(x \in {\mathbb {R}}^N\) such that \(Wx = 0\). The proof, partially suggested by the previous examples, consists of the following steps. First, we show that the rows \(W_i\) and \(W_{i + 1}\) of W (if \(i = N\), we compare \(W_N\) with \(\mu W_1\)) differ by at most two elements, one of which is always the i-th one. This immediately yields the existence of a function \(h: \{1,\dots , N\} \rightarrow \{1,\dots , N\}\) such that \(x_i = a_ix_{h(i)}\). We will then use this and the crucial fact that \(\mu > 1\) to conclude that \(x_i = 0, \forall i\). Let us state the following claims, and see how the proof of the present Lemma follows from them. We will freely use the notation introduced at the end of the proof of Theorem 3.3.

Claim 1: Let \(i \in \{1,\dots , N\}\). Then \({\overline{S}}_i\) differs from \({\overline{S}}_{i + 1}\) (if \(i = N\), \({\overline{S}}_{i + 1}= {\overline{S}}_1\)) by at most two elements, in the sense that

$$\begin{aligned} {\overline{S}}_i\Delta {\overline{S}}_{i + 1}\doteq \left( {\overline{S}}_i\setminus {\overline{S}}_{i + 1}\right) \cup \left( {\overline{S}}_{i+1}\setminus {\overline{S}}_{i}\right) \end{aligned}$$

contains at most 2 elements. Moreover, if \({\overline{S}}_i \Delta {\overline{S}}_{i + 1} \ne \emptyset \), then \({\overline{S}}_i \Delta {\overline{S}}_{i + 1} = \{i,I(i)\}\), with \(i \in {\overline{S}}_{i}\setminus {\overline{S}}_{i +1}\), and \(I(i) \in {\overline{S}}_{i + 1}\setminus {\overline{S}}_i\).

Claim 2: Let \(i \in \{1,\dots , N - 1\}\). The pair of rows \(W_i\), \(W_{i + 1}\), as well as the pair \(\mu W_1\), \(W_N\), differ by at most two elements, in the sense that if \(W_i = (W_{i1},\dots ,W_{iN})\) and \(W_{i + 1} = (W_{(i+1)1},\dots ,W_{(i+1)N})\), then there are at most two indexes \(j_1,j_2\) such that \(W_{ij_1}-W_{(i + 1)j_1} \ne 0\) and \(W_{ij_2}-W_{(i + 1)j_2} \ne 0\) (and analogously for \(\mu W_1\) and \(W_N\)).

Finally, with this claim at hand, we are going to prove

Claim 3: There exists a function \(h: \{1,\dots , N\}\rightarrow \{1,\dots , N\}\) and numbers \(a_i, i \in \{1,\dots , N\}\), such that

$$\begin{aligned} x_i = a_ix_{h(i)} \end{aligned}$$
(3.31)

with the property

$$\begin{aligned} a_i = {\left\{ \begin{array}{ll} 1, &{}\text { if } h(i) > i,\\ \mu , &{}\text { if } h(i) \le i. \end{array}\right. } \end{aligned}$$

Let us show how the proof of the Lemma follows from Claim 3, and postpone the proofs of the claims. Fix \(i \in \{1,\dots , N\}\) and use (3.31) recursively to find

$$\begin{aligned} x_i = a_ia_{h(i)}\dots a_{h^{(n - 1)}(i)}x_{h^{(n)}(i)}, \end{aligned}$$

where \(h^{(n)}\) denotes the function obtained by applying h to itself n times. We also use the notation \(h^{(0)}\) to denote the identity function: \(h^{(0)}(i) = i\), \(\forall i \in \{1,\dots ,N\}\). By the properties of \(a_{j}\), we have, \(\forall r \in \{1,\dots , n\}\),

$$\begin{aligned} a_{h^{(r)}(i)} = {\left\{ \begin{array}{ll} 1, &{}\text { if } h^{(r)}(i) > h^{(r - 1)}(i),\\ \mu , &{}\text { if } h^{(r)}(i) \le h^{(r - 1)}(i). \end{array}\right. } \end{aligned}$$

Fix \(k \in {{\mathbb {N}}}\), and let \(r \in \{k + 1, \dots , k + N + 1\}\). Then, \( h^{(r)}(i) > h^{(r - 1)}(i)\) can occur at most N times in this range, since otherwise we would find

$$\begin{aligned} 1 \le h^{(k)}(i)< h^{(k+ 1 )}(i)< h^{(k + 2)}(i)< \dots < h^{(k + N + 1)}(i) \le N, \end{aligned}$$

and this is impossible since we would have \(N + 1\) distinct elements in the set \(\{1,\dots , N\}\). Now clearly this observation implies that for every fixed \(l \in {{\mathbb {N}}}\), there exists \(s \in {{\mathbb {N}}}\) such that

$$\begin{aligned} x_i = \mu ^tx_{h^{(s)}(i)}, \hbox { for some}\ t \ge l. \end{aligned}$$

This can only happen if \(x_i = 0\): otherwise \(x_{h^{(s)}(i)} \ne 0\), and choosing l so large that \(\mu ^l\min \{|x_j|: x_j \ne 0\} > \max _j|x_j|\) we would get \(|x_i| = \mu ^t|x_{h^{(s)}(i)}| > \max _j|x_j|\), a contradiction. Since i is arbitrary, the conclusion follows. \(\square \)

Let us now turn to the proof of the claims.

Proof of claim 1:

To prove the claim, we need to use the definition of \({\overline{S}}_i\), given in (3.27). To build \({\overline{S}}_i\), we consider the ordered set \(\{n_i,n_{i + 1},\dots , n_{i + N - 1}\}\) and select from it a basis of \({{\,\mathrm{span}\,}}\{n_1,\dots , n_N\}\), starting from \(n_i\) and then, at step \(1 \le k \le N-1\), deciding whether to insert the vector \(n_{i + k}\) in our collection according to whether it is linearly independent of the previous ones or not. Recall also that \(S_i\) is the modulo N version of \({\overline{S}}_i\), see (3.28), and that we define \(n_j \doteq n_{j - N}\), for \(j \in \{N + 1,\dots , 2N\}\). Hence now fix \(i \in \{1,\dots , N\}\) and consider \(S_i\). If \(S_i = \{1,\dots , N\}\), then \(\#S_i = N\), thus \(S_j = \{1,\dots , N\}, \forall 1\le j \le N\) and the claim holds. Otherwise, let \(i + 1 < I = I(i) \le i + N -1\) be the first element in \((\overline{S_i})^c\). There are two cases:

  1. (1)

    \(n_{I} \in {{\,\mathrm{span}\,}}(n_i,\dots , n_{I - 1}) \setminus {{\,\mathrm{span}\,}}(n_{i + 1},\dots , n_{I - 1})\);

  2. (2)

    \(n_{I} \in {{\,\mathrm{span}\,}}(n_{i + 1},\dots , n_{I - 1})\).

At the same time, consider what happens for \({\overline{S}}_{i + 1}\): the spanning procedure in the \((i + 1)\)-th case starts with one vector fewer than in the i-th case, simply because the collection of indexes in \({\overline{S}}_{i + 1}\) starts from \(n_{i + 1}\). Hence, since I is the first missing index in \({\overline{S}}_i\), I is also the first possible missing index for \({\overline{S}}_{i + 1}\). Therefore, consider the first case

$$\begin{aligned} n_{I} \in {{\,\mathrm{span}\,}}(n_i,\dots , n_{I - 1}) \setminus {{\,\mathrm{span}\,}}(n_{i + 1},\dots , n_{I - 1}). \end{aligned}$$

This implies that \(I \in {\overline{S}}_{i + 1}\). Moreover, we are now adding \(n_I\) to the set of vectors \(n_{i+1},\dots , n_{I - 1}\), and \(n_{I} \in {{\,\mathrm{span}\,}}(n_i,\dots , n_{I - 1})\setminus {{\,\mathrm{span}\,}}(n_{i + 1},\dots , n_{I - 1})\), hence \(n_I\) adds to the previous vectors the component relative to \(n_i\), in the sense that

$$\begin{aligned} {{\,\mathrm{span}\,}}(n_{i + 1},\dots ,n_I) = {{\,\mathrm{span}\,}}(n_i,\dots , n_{I - 1}). \end{aligned}$$

This moreover implies that \(j \in {\overline{S}}_i \Leftrightarrow j \in {\overline{S}}_{i + 1}\), \(\forall I \le j < N + i - 1\). Since \(n_i \in {{\,\mathrm{span}\,}}(n_{i + 1},\dots ,n_I)\), \(i \notin {\overline{S}}_{i + 1}\). Thus \({\overline{S}}_i\) and \({\overline{S}}_{i + 1}\) differ by at most two elements, and we have \(i \in {\overline{S}}_{i}\setminus {\overline{S}}_{i + 1}\) and \(I = I(i) \in {\overline{S}}_{i + 1}\setminus {\overline{S}}_i\). This concludes the case

$$\begin{aligned} n_{I} \in {{\,\mathrm{span}\,}}(n_i,\dots , n_{I - 1}) \setminus {{\,\mathrm{span}\,}}(n_{i + 1},\dots , n_{I - 1}). \end{aligned}$$

If instead \(n_{I} \in {{\,\mathrm{span}\,}}(n_{i + 1},\dots , n_{I - 1})\), then we see that \(I \notin {\overline{S}}_{i + 1}\), and we can iterate this reasoning from there, in the sense that we look for the next index \(I'\) such that \({I'}\notin {\overline{S}}_{i}\) and divide again into the two cases above. Clearly, for the indexes \(i + 1 \le j < I'\), we have \(j \in {\overline{S}}_{i + 1}\) and \(j \in {\overline{S}}_i\). Either this iteration eventually enters case (1) of the previous subdivision for some element \(I \notin {\overline{S}}_i\), or we conclude \({\overline{S}}_i = {\overline{S}}_{i + 1}\). This concludes the proof of the claim. \(\square \)

Proof of claim 2:

Note that nonzero elements of \(W_i\) are found in positions corresponding to elements of \(S_i\). Hence now fix \(i \in \{1,\dots , N- 1\}\) and consider \(W_i\) and \(W_{i + 1}\). If \(S_i = S_{i + 1}\), then \(W_{ij} = 0 \Leftrightarrow W_{(i + 1)j} = 0\). Moreover, we introduce the modulo N counterpart of the number I(i) found in Claim 1, i.e. \(I'(i) = I(i)\) if \(I(i) \in \{1,\dots ,N\}\), and \(I'(i) = I(i) - N\) if \(I(i) \in \{N + 1,\dots , 2N\}\). Thus using the definition of W, we can deduce, if \(S_i = S_{i + 1}\),

$$\begin{aligned} {\left\{ \begin{array}{ll} W_{(i+ 1)j} = W_{ij} = 0, &{}\text { if } j \notin S_i\\ W_{(i+ 1)j} = W_{ij} = \mu , &{}\text { if } j \in S_i, j < i\\ W_{(i+ 1)j} = W_{ij} = 1, &{}\text { if } j \in S_i, j > i\\ W_{(i + 1)i} = \mu , W_{ii} = 1,&{} \end{array}\right. } \end{aligned}$$
(3.32)

and the claim holds in this case. Finally, if \(S_i \Delta S_{i +1} = \{i,I'(i)\}\), then:

$$\begin{aligned} {\left\{ \begin{array}{ll} W_{(i+ 1)j} = W_{ij} = 0, &{}\text { if } j \notin S_i,j \ne I'(i)\\ W_{(i+ 1)j} = 1, W_{ij} = 0, &{}\text { if } j = I'(i)> i + 1\\ W_{(i+ 1)j} = \mu , W_{ij} = 0, &{}\text { if } j = I'(i)< i + 1\\ W_{(i+ 1)j} = W_{ij} = \mu , &{}\text { if } j \in S_i, j < i\\ W_{(i+ 1)j} = W_{ij} = 1, &{}\text { if } j \in S_i, j > i\\ W_{(i + 1)i} =0, W_{ii} = 1. &{} \end{array}\right. } \end{aligned}$$
(3.33)

This concludes the proof of the claim if \(i \in \{1,\dots , N-1\}\). If \(i = N\), then we need to compare \(W_N\) with \(\mu W_1\), and we obtain two cases, in analogy with the previous situation:

$$\begin{aligned} \text {if } S_N\Delta S_{1} = \emptyset , \text { then}: {\left\{ \begin{array}{ll} \mu W_{1j} = W_{Nj} = 0, &{}\text { if } j \notin S_N\\ \mu W_{1j} = W_{Nj} = \mu , &{}\text { if } j \in S_N, j < N\\ \mu W_{1N} = \mu , W_{NN} = 1, &{} \text { otherwise}, \end{array}\right. } \end{aligned}$$
(3.34)

and

$$\begin{aligned} \text {if } S_N\Delta S_{1} = \{N,I'(N)\}, \text { then}: {\left\{ \begin{array}{ll} \mu W_{1j} = W_{Nj} = 0, &{}\text { if } j \notin S_N, j \ne I'(N)\\ \mu W_{1j} = \mu , W_{Nj} = 0, &{}\text { if } j \notin S_N, j = I'(N)\\ \mu W_{1j} = W_{Nj} = \mu , &{}\text { if } j \in S_N, j < N\\ \mu W_{1N} = 0, W_{NN} = 1, &{} \text { otherwise}. \end{array}\right. } \end{aligned}$$
(3.35)

\(\square \)

Proof of Claim 3:

Fix \(i \in \{1,\dots , N\}\). We consider the equations given by

$$\begin{aligned} (W_{i + 1}- W_{i},x) =0, \text { if } i\in \{1,\dots , N-1\}, \text { and } (W_{N} - \mu W_{1},x) = 0. \end{aligned}$$

If we consider \(i \in \{1,\dots ,N -1\}\), we see from (3.32) and (3.33) that

$$\begin{aligned}&0 = (W_{i}-W_{i + 1},x) = \sum _{j = 1}^N(W_{ij} - W_{(i+1)j})x_j \\&\quad ={\left\{ \begin{array}{ll} (1 - \mu )x_i,&{} \text { if } S_i\Delta S_{i + 1} = \emptyset \\ x_i - x_{I'(i)},&{}\text { if } S_i\Delta S_{i + 1} =\{i,I'(i)\}, I'(i) > i + 1\\ x_i - \mu x_{I'(i)},&{} \text { if } S_i\Delta S_{i + 1} =\{i,I'(i)\}, I'(i) < i + 1 \end{array}\right. } \end{aligned}$$

and from (3.34) and (3.35) we infer

$$\begin{aligned} 0 = (W_{N}-\mu W_{1},x) = {\left\{ \begin{array}{ll} (1 - \mu )x_N,&{} \text { if } S_N\Delta S_{1} = \emptyset \\ x_N - \mu x_{I'(N)},&{}\text { if } S_N\Delta S_{1} =\{N,I'(N)\}. \end{array}\right. } \end{aligned}$$

From these equations we see that (3.31) holds with the choice \(h(i)\doteq I'(i)\), when i is such that \(S_i\Delta S_{i + 1} \ne \emptyset \), and \(h(i)\doteq i\) otherwise. \(\square \)

3.3 Proof of Corollary 3.4

We end this section by showing that Theorem 3.3 implies Corollary 3.4. Assume by contradiction that there exists a family of matrices

$$\begin{aligned} \{A_1,\dots ,A_N\} \subset K_f \end{aligned}$$

inducing a \(T'_N\) configuration of the form (3.8). We show that then there exists another \(T'_N\) configuration \(\{B_1,\dots , B_N\}\) such that \(B_i \in K_F \subset C_F, \forall 1\le i \le N\) for some strictly polyconvex F with

$$\begin{aligned} F(X'_i) \ge 0,\; \forall 1\le i \le N, \end{aligned}$$

where

$$\begin{aligned} B_i= \left( \begin{array}{cc} X_i' \\ Y_i' \\ Z_i' \end{array} \right) ,\; \forall 1 \le i \le N. \end{aligned}$$

This contradicts Theorem 3.3. To accomplish this, it is sufficient to define \(F(X)\doteq f(X) - \min _{i}f(X_i)\). This function is clearly strictly polyconvex, since f is. Moreover, we define

$$\begin{aligned} X_i'\doteq X_i,\; Y_i' \doteq Y_i \text { and } Z_i'\doteq Z_i + \min _{j}f(X_j){{\,\mathrm{id}\,}}. \end{aligned}$$

In this way, \(B_i\) is still a \(T_N'\) configuration. Moreover, \(B_i \in K_F, \; \forall 1\le i \le N\). To see this, it is sufficient to notice that, since \(A_i \in K_f\),

$$\begin{aligned} Y_i' = Y_i = Df(X_i) = DF(X_i'),\;\forall 1\le i \le N, \end{aligned}$$

and

$$\begin{aligned} Z_i' = Z_i +\min _{i}f(X_i){{\,\mathrm{id}\,}}= X_i^TY_i - f(X_i){{\,\mathrm{id}\,}}+ \min _if(X_i){{\,\mathrm{id}\,}}= (X'_i)^TY'_i - F(X_i){{\,\mathrm{id}\,}}. \end{aligned}$$

This finishes the proof.

4 Sign-changing case: the counterexample

In this section, we construct a counterexample to regularity in the case in which the hypothesis of non-negativity on f is dropped. Let us explain the strategy, which follows the one of [24]. First of all, we consider the following equivalent formulation of the differential inclusion of div-curl type considered in the previous sections. Indeed, due to the fact that, for \(a \in {{\,\mathrm{Lip}\,}}({\mathbb {R}}^2,{\mathbb {R}}^2)\),

$$\begin{aligned} {{\,\mathrm{div}\,}}(a) = {{\,\mathrm{curl}\,}}(aJ), \end{aligned}$$

if

$$\begin{aligned} J = \left( \begin{array}{ll} 0 &{} -1\\ 1 &{} 0 \end{array}\right) \,, \end{aligned}$$

one easily sees that (2.3) holds if and only if

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle {{\,\mathrm{curl}\,}}(Df(D u)J) = 0,\\ \displaystyle {{\,\mathrm{curl}\,}}(Du^TDf(Du)J - f(Du)J)= 0, \end{array}\right. \end{aligned}$$

in the weak sense. Since \(\Omega \) is convex, the latter allows us to say that (2.3) holds for \(u \in {{\,\mathrm{Lip}\,}}(\Omega ,{\mathbb {R}}^2)\) if and only if there exist \(w_1,w_2:\Omega \rightarrow {\mathbb {R}}^2\) such that

$$\begin{aligned} w\doteq \left( \begin{array}{c}u\\ w_1\\ w_2\end{array}\right) \end{aligned}$$

solves a.e. in \(\Omega \):

$$\begin{aligned} Dw \in {{\tilde{C}}}_f \doteq \left\{ C\in {\mathbb {R}}^{(2n + m)\times m}: C = \left( \begin{array}{cc} X\\ \beta Df(X)J\\ \beta X^TDf(X)J - \beta f(X)J \end{array} \right) , \text { for some }\beta > 0\right\} . \end{aligned}$$
(4.1)

From now on, we will always use this reformulation of the problem. Let us also introduce

$$\begin{aligned} {{\tilde{C}}}_f'\doteq \left\{ C\in {\mathbb {R}}^{(2n + m)\times m}: C = \left( \begin{array}{cc} X\\ \beta Df(X)J \end{array} \right) , \text { for some }\beta > 0\right\} . \end{aligned}$$

In order to construct the counterexample, we want to find a set of non-rigid matrices \(\{A_1,A_2,A_3,A_4,A_5\}\), \(A_i \in {\mathbb {R}}^{6\times 2}, \forall i\), satisfying

$$\begin{aligned} A_i = \left( \begin{array}{cc} X_i\\ Y_i\\ Z_i \end{array} \right) \in {{\tilde{C}}}_f. \end{aligned}$$
(4.2)

Roughly, non-rigidity means that there exists a non-affine solution of the problem

$$\begin{aligned} Dw \in \{A_1,\dots , A_5\}, \end{aligned}$$

see Lemma 4.3. The integrand f is of the form

$$\begin{aligned} f(X)= \varepsilon {\mathcal {A}}(X) + g(X,\det (X)), \end{aligned}$$
(4.3)

for some convex and smooth \(g: {\mathbb {R}}^5 \rightarrow {\mathbb {R}}\) and

$$\begin{aligned} {\mathcal {A}}(X) = a(X,\det (X)), \quad \text { where } a(X,d) \doteq \sqrt{1 + \Vert X\Vert ^2 + d^2}, \end{aligned}$$
(4.4)

is the area function. As in [24], f is not fixed from the beginning, but rather becomes another unknown of the problem. In particular, in order to find f, it is sufficient for the following condition to be fulfilled:

Condition 1

There exist \(2\times 2\) matrices \(\{X_1,\dots ,X_5\}\), \(\{Y_1,\dots , Y_5\}\), real numbers \(c_1,\dots , c_5, d_1,\dots , d_5\) and positive integers \(\beta _1,\dots , \beta _5\) such that for \(Q_{ij}\doteq \displaystyle c_i - c_j + d_i\det (X_i - X_j) + \frac{1}{\beta _i}\langle X_i - X_j ,Y_iJ\rangle \), one has

$$\begin{aligned} Q_{ij} < 0, \forall i \ne j. \end{aligned}$$
(4.5)

If this condition is satisfied, then one has

$$\begin{aligned} \left( \begin{array}{cc} X_i\\ Y_i \end{array} \right) \in {{\tilde{C}}}'_f, \quad \text {i.e. } Y_i = \beta _iDf(X_i)J. \end{aligned}$$
(4.6)

The construction of f is the content of Lemma 4.2. Moreover, we will be able to build f in such a way that for some large \(R > 0\),

$$\begin{aligned} g(z) = M\sqrt{1 + \Vert z\Vert ^2} - L = Ma(z) - L,\quad \forall z \in {\mathbb {R}}^5, \Vert z\Vert \ge R, \end{aligned}$$
(4.7)

and constants \(M,L > 0\). The non-rigidity of \(A_1,\dots ,A_5\) stems from the fact that we choose \(\{X_1,\dots ,X_5\}\) forming a large \(T_5\)-configuration, in the terminology of [13]. Therefore we introduce:

Condition 2

\(\{X_1,X_2,X_3,X_4, X_5\}\) form a large \(T_5\) configuration, i.e. there exist at least three permutations \(\sigma _1,\sigma _2,\sigma _3 : \{1,2,3,4,5\} \rightarrow \{1,2,3,4,5\}\) such that the ordered set \([X_{\sigma _i(1)},X_{\sigma _i(2)},\dots , X_{\sigma _i(5)}]\) is a \(T_5\) configuration and moreover \(\{C_{\sigma _1(i)},C_{\sigma _2(i)},C_{\sigma _3(i)}\}\) are linearly independent for every \(i \in \{1,\dots ,5\}\).

Once this condition is guaranteed, by [13, Theorem 1.2], we find a non-affine Lipschitz map \(u:\Omega \subset {\mathbb {R}}^2 \rightarrow {\mathbb {R}}^2\) such that

$$\begin{aligned} Du \in \{X_1,X_2,X_3,X_4,X_5\} \end{aligned}$$

almost everywhere in \(\Omega \). Furthermore, we can choose u with the property that, for every open subset \({\mathcal {V}}\subset \Omega \), Du attains each of these matrices on a subset of \({\mathcal {V}}\) of positive measure. This is proved in Lemma 4.3.

In order to find Lipschitz maps \(w_1,w_2: \Omega \rightarrow {\mathbb {R}}^2\) such that

$$\begin{aligned} w = \left( \begin{array}{c}u\\ w_1\\ w_2\end{array}\right) : \Omega \rightarrow {\mathbb {R}}^6 \end{aligned}$$

satisfies

$$\begin{aligned} Dw \in {{\tilde{C}}}_f\quad \text {a.e. in }\Omega , \end{aligned}$$

we simply consider \(w_1 = Au + B\), \(w_2 = Cu + D\), for suitable \(2\times 2\) matrices ABCD. We therefore get our last

Condition 3

\(Y_i\) and \(Z_i\) can be chosen of the form

$$\begin{aligned} Y_i = AX_i + B, \quad Z_i = CX_i + D, \end{aligned}$$

and \(Z_i = X_i^TY_i - \beta _ic_iJ\), where \(c_i = f(X_i)\).

In Sect. 4.1, we will give an explicit example of values such that Conditions 1, 2 and 3 are fulfilled.

Once this is achieved, we need to extend the energy \(\mathbb {E}_f\) to an energy defined on integral currents of dimension 2 in \({\mathbb {R}}^4\). Some of the results we present in this section in our specific case can be easily generalized to more general polyconvex integrands. Therefore, we defer their proofs to Sect. 5.

In order to extend our polyconvex function f to a geometric functional, we first recall (4.3), i.e.

$$\begin{aligned} f(X) = \varepsilon {\mathcal {A}}(X) + g(X,\det (X)), \end{aligned}$$

for \(g: {\mathbb {R}}^{5} \rightarrow {\mathbb {R}}\) convex and smooth, and introduce the convex function \(h: {\mathbb {R}}^5 \rightarrow {\mathbb {R}}\):

$$\begin{aligned} h(z)\doteq \varepsilon \sqrt{1 + \Vert z\Vert ^2} + g(z). \end{aligned}$$

We consider the perspective function of h:

$$\begin{aligned} G(z,t)\doteq t\, h\left( \frac{z}{t}\right) , \quad \forall z \in {\mathbb {R}}^5,\, t >0. \end{aligned}$$
(4.8)

It is a standard result in convex analysis that G is convex on \({\mathbb {R}}^5\times {\mathbb {R}}_+\) as soon as h is convex on \({\mathbb {R}}^5\), compare [4, Lemma 2]. Property (4.7) reads as

$$\begin{aligned} h(z) = (M+\varepsilon )\sqrt{1 + \Vert z\Vert ^2} - L,\quad \forall z\in B_R^c(0), \end{aligned}$$
(4.9)

therefore we also find that the recession function of h is

$$\begin{aligned} h^*(z) = \lim _{t \rightarrow 0^+}G(z,t) = (M + \varepsilon )\Vert z\Vert ,\quad \forall z \in {\mathbb {R}}^5, \end{aligned}$$

since, for every fixed \(z \ne 0\) and every sufficiently small \(t > 0\), one has \(z/t \in B_R^c(0)\) and hence \(G(z,t) = (M+\varepsilon )\sqrt{t^2 + \Vert z\Vert ^2} - Lt\) by (4.9).

Hence, G can be extended to the hyperplane \(t = 0\) as

$$\begin{aligned} G(z,0) \doteq h^*(z). \end{aligned}$$

In Lemma 4.4, we will prove that G(z,t) admits a finite, positively 1-homogeneous convex extension \({\mathcal {G}}\) to the whole space \({\mathbb {R}}^6\). We are finally able to define an integrand on the space of 2-vectors of \({\mathbb {R}}^4\), \(\Lambda _2({\mathbb {R}}^4)\). For a more thorough introduction to k-vectors, see Sect. A.1. Recall that

$$\begin{aligned} \Lambda _2({\mathbb {R}}^4) = {{\,\mathrm{span}\,}}\{v_1\wedge v_2: v_1,v_2 \in {\mathbb {R}}^4\}. \end{aligned}$$

A basis for \(\Lambda _2({\mathbb {R}}^4)\) is given by the six elements \(e_i\wedge e_j, 1\le i < j\le 4\), where \(e_1,e_2,e_3,e_4\) is the canonical basis of \({\mathbb {R}}^4\). Recall moreover that this vector space can be endowed with a scalar product that acts on simple vectors as

$$\begin{aligned} \langle v_1\wedge v_2, w_1\wedge w_2\rangle \doteq \det \left( \begin{array}{cc} (v_1,w_1) &{} (v_1,w_2)\\ (v_2,w_1) &{} (v_2,w_2) \end{array}\right) , \end{aligned}$$

where (uv) denotes as usual the standard scalar product of \({\mathbb {R}}^4\). The integrand

$$\begin{aligned} \Psi : \Lambda _2({\mathbb {R}}^4) \rightarrow {\mathbb {R}}, \end{aligned}$$

is thus defined as, for \(\tau \in \Lambda _2({\mathbb {R}}^4)\),

$$\begin{aligned} \Psi (\tau ) \doteq {\mathcal {G}}(\langle \tau ,e_3\wedge e_2\rangle ,\langle \tau ,e_4\wedge e_2\rangle ,\langle \tau ,e_1\wedge e_3\rangle ,\langle \tau ,e_1\wedge e_4\rangle ,\langle \tau ,e_3\wedge e_4\rangle ,\langle \tau ,e_1\wedge e_2\rangle ). \end{aligned}$$
(4.10)

Consequently, we define an energy on \({\mathcal {I}}_2({\mathbb {R}}^4)\) as

$$\begin{aligned} \Sigma (T) \doteq \int _E\Psi (\vec T(z))\theta (z) d{\mathcal {H}}^2(z), \end{aligned}$$

if \(T = \llbracket E, \vec T, \theta \rrbracket \). For the notation concerning rectifiable currents and graphs, we refer the reader to Sect. A.4. The energy defined in this way satisfies Almgren’s ellipticity condition (A.11), as we will prove in Lemma 4.5. Finally, in Lemma 4.6, we will prove that the current

$$\begin{aligned} T_{u,\theta } = \llbracket \Gamma _u,\vec \xi _u,\theta \rrbracket \end{aligned}$$
(4.11)

is stationary for the energy \(\Sigma \). The definition of stationarity for geometric functionals is recalled in Sect. A.5. In (4.11), \(\Gamma _u\) is the graph of u, \(\vec \xi _u\) is its orientation, see (A.5), and \(\theta \) is an integer multiplicity, defined as \(\theta (x,u(x)) = \beta _i\) if \(x \in \Omega \) is such that

$$\begin{aligned} Dw(x) = \left( \begin{array}{cc} X_i\\ Y_i\\ Z_i\\ \end{array} \right) = \left( \begin{array}{c}X_i\\ \beta _iDf(X_i)J\\ \beta _iX_i^TDf(X_i)J - \beta _if(X_i)J\end{array}\right) . \end{aligned}$$

This discussion constitutes the proof of the following:

Theorem 4.1

There exists a smooth and elliptic integrand \(\Psi : \Lambda _2({\mathbb {R}}^4)\rightarrow {\mathbb {R}}\) such that the associated energy \(\Sigma \) admits a stationary point T whose (integer) multiplicities are not constant. Moreover, the rectifiable set supporting T is given by the graph of a Lipschitz map \(u: \Omega \rightarrow {\mathbb {R}}^2\) that fails to be \(C^1\) on any open subset \({\mathcal {V}} \subset \Omega \).

Lemma 4.2

There exists a smooth function \(f: {\mathbb {R}}^{2\times 2}\rightarrow {\mathbb {R}}\) of the form

$$\begin{aligned} f(X)\doteq \varepsilon {\mathcal {A}}(X) + g(X,\det (X)) \end{aligned}$$

with \(g:{\mathbb {R}}^5 \rightarrow {\mathbb {R}}\) convex and smooth, such that

  1. (1)

    (4.6) is fulfilled;

  2. (2)

    \(g(X) = M{\mathcal {A}}(X) - L\) for constants \(M,L > 0\), if \(\Vert X\Vert \ge R\).

Proof

We will roughly follow the strategy of [24, Lemma 3]. At first we construct the function g in several steps. Let \(\{(X_i, Y_i, Z_i, \beta _i)\}_{i=1}^5\) be the set of admissible matrices. For \(\varepsilon >0\), consider for each i the perturbed values

$$\begin{aligned} \begin{aligned} Y_i^\varepsilon&\doteq Y_i - \varepsilon \beta _iD{\mathcal {A}}(X_i)J \\ c_i^\varepsilon&\doteq c_i - \varepsilon {\mathcal {A}}(X_i)\\ d_i^\varepsilon&\doteq d_i - \varepsilon \partial _d a(X_i, \det (X_i)) \end{aligned} \end{aligned}$$
(4.12)

where \(a(X,d) = \sqrt{1+ |X|^2 + d^2}\) and \({\mathcal {A}}(X)= a(X,\det (X))\), as defined in (4.4). Furthermore we introduce the perturbed matrix

$$\begin{aligned} Q_{ij}^\varepsilon \doteq c^\varepsilon _i - c^\varepsilon _j + d^\varepsilon _i\det (X_i - X_j) + \frac{1}{\beta _i}\langle X_i - X_j ,Y^\varepsilon _iJ\rangle . \end{aligned}$$

Thanks to the strict inequality in (4.5), we can fix \(\varepsilon , \sigma >0\) such that \(Q^\varepsilon _{ij}\le -\sigma <0\) for all \(i \ne j\). Let us define the linear functions

$$\begin{aligned}l_i(X,d)\doteq c^\varepsilon _i - \frac{1}{\beta _i}\langle Y^\varepsilon _iJ, X-X_i\rangle + d_i^\varepsilon \left( \langle {{\,\mathrm{cof}\,}}(X_i)^T, X_i - X\rangle + d - \det (X_i)\right) \,\end{aligned}$$

and the convex function

$$\begin{aligned} g_1(X,d)\doteq \max _{1\le i \le 5}\, l_i(X,d). \end{aligned}$$

Note that \(l_j(X_j,\det (X_j)) = c_j^\varepsilon \) and

$$\begin{aligned} l_i(X_j, \det (X_j))= c^\varepsilon _j + Q^\varepsilon _{ij} < c^\varepsilon _j\,. \end{aligned}$$

Hence there is \(\delta >0\) such that \(l_i(X,d)< l_j(X,d)\) for all \((X,d) \in B_{\delta }(X_j,\det (X_j))\) and all \(i\ne j\), which implies that \(g_1 = l_j\) on \(B_{\delta }(X_j,\det (X_j))\). Choosing a radially symmetric, non-negative smoothing kernel \(\rho _\varepsilon \) on \({\mathbb {R}}^5\), with \(0< \varepsilon \ll \delta \), we have that \(g_2\doteq \rho _\varepsilon \star g_1\) satisfies

  1. (1)

    \(g_2\) is smooth and convex

  2. (2)

    \(g_2 = l_j\) in a neighbourhood of \((X_j, \det (X_j))\) for all \(j \in \{1,\dots , 5\}\).

  3. (3)

\(|g_2(X,d)| \le C \Vert (1,X,d)\Vert \) for all (X, d), for some \(C>0\).

We choose any \(R> 2 \max _{1\le i \le 5} \{\Vert X_i\Vert + |\det (X_i)| \}\), and any \(M>C\). Now we may choose \(L>0\) such that

$$\begin{aligned} F(X,d)\doteq M a(X,d) - L < g_2(X,d) \text { on } B_R. \end{aligned}$$
(4.13)

Since \(M>C\) we have that

$$\begin{aligned} F(X,d) = M\Vert (1,X,d)\Vert - L > g_2(X,d) \end{aligned}$$
(4.14)

for all \((X,d) \notin B_{R_2}\), for some \(R_2 > R\). Now let us fix a smooth approximation of the \(\max \) function, say

$$\begin{aligned} m(a,b)\doteq (\phi _\varepsilon \star \max )(a,b), \end{aligned}$$

where \(\phi _\varepsilon \) is a radial symmetric, non-negative smoothing kernel in \({\mathbb {R}}^2\). Note that \(m(a,b) = \max (a,b)\) outside a neighborhood of \(\{ a= b\}\). In particular if we choose \(\varepsilon \) sufficiently small we can ensure that

$$\begin{aligned}g(X,d)\doteq m( F(X,d), g_2(X,d))\end{aligned}$$

agrees with \(g_2\) on \(B_{\frac{2R}{3}}\) by (4.13), and that it agrees with \(F(X,d)\) outside \(B_{2R_2}\) by (4.14). It remains to check that \(g(X,d)\) is still convex. First note that \(\partial _am \ge 0\) and \(\partial _bm\ge 0\), since \(\partial _a\max = {\mathbf {1}}_{\{a>b\}} \ge 0\) and \(\partial _b\max = {\mathbf {1}}_{\{b>a\}} \ge 0\) almost everywhere. Now it is a direct computation on the Hessian (spelled out after this proof) to see that if \(f_1, f_2 \in C^2({\mathbb {R}}^N)\) are two convex functions and \({\tilde{m}} \in C^2({\mathbb {R}}^2)\) is convex with \(\partial _a{\tilde{m}}(a,b), \partial _b{\tilde{m}}(a,b) \ge 0\), then the composition \(k(x)\doteq {\tilde{m}}(f_1(x), f_2(x))\) is convex. Thus we conclude that g is convex. Let us summarize the properties of g and of the related polyconvex integrand \(f_1(X)\doteq g(X, \det (X))\):

  1. (1)

    g is a smooth, convex function;

  2. (2)

    \(g=M\, a - L\) outside a ball \(B_{R_3}\)

  3. (3)

\(g = g_2\) on a ball \(B_{R_0}\), which implies that \(f_1(X_i)=c_i^\varepsilon \) and \(\beta _i Df_1(X_i)J = Y_i^\varepsilon \) for all i.

In particular from the last conditions and (4.12) we conclude that \(h(X,d)\doteq \varepsilon a(X,d) + g(X,d)\) is convex, \(f(X)\doteq \varepsilon {\mathcal {A}}(X) + f_1(X)\) is smooth, polyconvex and satisfies the desired properties, in particular \(f(X_i)=c_i\), \(\beta _i Df(X_i)J= Y_i\) for all i and \(f = (\varepsilon + M){\mathcal {A}} - L\) outside a ball centered at 0. \(\square \)
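For the reader's convenience, the Hessian computation invoked in the proof above can be spelled out as follows (it is a standard fact, recorded here only to keep the argument self-contained). For \(k(x)\doteq {\tilde{m}}(f_1(x),f_2(x))\) and every \(x, v \in {\mathbb {R}}^N\), setting \(w\doteq (\langle Df_1(x),v\rangle ,\langle Df_2(x),v\rangle ) \in {\mathbb {R}}^2\), the chain rule gives

$$\begin{aligned} \langle D^2k(x)v,v\rangle = \partial _a{\tilde{m}}\,\langle D^2f_1(x)v,v\rangle + \partial _b{\tilde{m}}\,\langle D^2f_2(x)v,v\rangle + \langle D^2{\tilde{m}}\,w,w\rangle \ge 0, \end{aligned}$$

where the derivatives of \({\tilde{m}}\) are evaluated at \((f_1(x),f_2(x))\): the first two terms are non-negative because \(\partial _a{\tilde{m}},\partial _b{\tilde{m}} \ge 0\) and \(f_1,f_2\) are convex, and the last one because \({\tilde{m}}\) is convex.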

Lemma 4.3

Given a large \(T_5\) configuration \(\{X_1,\dots , X_5\} \subset {{\,\mathrm{Sym}\,}}(2)\), where \({{\,\mathrm{Sym}\,}}(2)\) is the space of symmetric matrices of \({\mathbb {R}}^{2\times 2}\), there exists a map \(u \in {{\,\mathrm{Lip}\,}}(\Omega ,{\mathbb {R}}^2)\) such that

$$\begin{aligned} Du \in \{X_1,\dots , X_5\} \end{aligned}$$
(4.15)

and such that for every open \({\mathcal {V}}\subset \Omega \),

$$\begin{aligned} |\{x \in \Omega : Du(x) = X_i\}\cap {\mathcal {V}}| > 0,\quad \forall i\in \{1,\dots , 5\}. \end{aligned}$$
(4.16)

Proof

This statement is well-known, so we will only sketch its proof and give references to the relevant results. As shown in [13, Theorem 2.8], if \(K\doteq \{X_1,\dots , X_5\}\) forms a large \(T_5\) configuration, then there exists an in-approximation of K inside \({{\,\mathrm{Sym}\,}}(2)\). This means, compare [13, Definition 1.3], that there exists a sequence of sets \(\{U_k\}_{k \in {{\mathbb {N}}}}\), relatively open in \({{\,\mathrm{Sym}\,}}(2)\), such that

  • \(\sup _{X \in U_k}{{\,\mathrm{d}\,}}(X,K) \rightarrow 0\) as \(k \rightarrow \infty \);

  • \(U_k\subset U_{k + 1}^{rc}, \forall k \in {{\mathbb {N}}}\).

For a compact \(C \subset {\mathbb {R}}^{2\times 2}\), the rank-one convex hull is defined as

$$\begin{aligned} C^{rc} \doteq \{P \in {\mathbb {R}}^{2\times 2}: f(P) \le 0, \forall f \text { rank-one convex such that } \sup _{X \in C}f(X) \le 0\}, \end{aligned}$$

where \(f: {\mathbb {R}}^{2\times 2} \rightarrow {\mathbb {R}}\) is said to be rank-one convex if

$$\begin{aligned} f(tA + (1-t)B) \le tf(A) + (1-t)f(B),\quad \forall A,B \in {\mathbb {R}}^{2\times 2}, \det (A - B) = 0, t \in [0,1]. \end{aligned}$$
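For instance (a standard example, recalled only as an illustration), the determinant is rank-one affine on \({\mathbb {R}}^{2\times 2}\): if \(\det (A - B) = 0\), the function \(t \mapsto \det (B + t(A-B))\) is a polynomial of degree at most two in t whose quadratic coefficient is \(\det (A - B) = 0\), hence it is affine in t, so that

$$\begin{aligned} \det (tA + (1-t)B) = t\det A + (1-t)\det B,\quad \forall t \in [0,1]. \end{aligned}$$

In particular, both \(\det \) and \(-\det \) are rank-one convex.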

For an open set \(U \subset {\mathbb {R}}^{2\times 2}\),

$$\begin{aligned} U^{rc} \doteq \bigcup _{C \subset U, C \text { compact }}C^{rc}. \end{aligned}$$

In this way, if U is open, then \(U^{rc}\) is open as well. The existence of an in-approximation for K implies the existence of a non-affine map u such that \(Du \in \{X_1,\dots , X_5\}\), hence (4.15). This is proved in [13, Theorem 1.1]. To show (4.16), there are two ways. Either one can use the same proof as in [20, Theorem 4.1] or [24, Proposition 2] to show that the essential oscillation of Du is positive on any open subset of \(\Omega \); since there is rigidity for the four gradient problem, see [3], this implies (4.16). Alternatively, one can use the Baire category approach to convex integration introduced by Kirchheim in [16]. In particular, the following is proved in [16, Corollary 4.15]. Define

$$\begin{aligned} {\mathcal {U}}\doteq \bigcup _{k \in {{\mathbb {N}}}}U^{rc}_k, \end{aligned}$$

fix \(A \in {\mathcal {U}}\), and set

$$\begin{aligned} {\mathcal {P}} \doteq \{v \in {{\,\mathrm{Lip}\,}}(\Omega ,{\mathbb {R}}^2): Dv \in {\mathcal {U}}, v \text { piecewise affine}, v|_{\partial \Omega } = A\}, \end{aligned}$$

then the typical (in the sense of Baire) map

$$\begin{aligned} u \in {\mathcal {P}}^{\Vert \cdot \Vert _\infty } \end{aligned}$$

has the property that \(Du \in K\). Then, we can use [26, Lemma 7.4] to show that actually the typical map is non-affine on any open set, hence again by the rigidity for the four gradient problem, we conclude (4.16). \(\square \)

Lemma 4.4

Let \(G: {\mathbb {R}}^{5}\times {\mathbb {R}}_{\ge 0}\rightarrow {\mathbb {R}}\) be the convex function defined in (4.8). Then, there exists a positively 1-homogeneous, convex function \({\mathcal {G}} \in C^\infty ({\mathbb {R}}^6\setminus \{0\})\cap {{\,\mathrm{Lip}\,}}({\mathbb {R}}^6)\) such that

$$\begin{aligned} {\mathcal {G}}(z,t) = G(z,t), \end{aligned}$$

if \(z \in {\mathbb {R}}^5,t \in {\mathbb {R}}_+\).

Proof

To prove the statement, it is sufficient to notice that the convexity of h and (4.9) tell us that h has property (P), see the beginning of Sect. 5, and therefore we can simply apply Proposition 5.2. The smoothness is a consequence of the smoothness of h, property (4.9) and Corollary 5.3. \(\square \)

Lemma 4.5

The energy \(\Sigma _\Psi \) satisfies the uniform Almgren ellipticity condition (A.11).

Proof

By construction, it is immediate to see that also

$$\begin{aligned} {\mathcal {G}}_\varepsilon (z,t)\doteq {\mathcal {G}}(z,t) - \frac{\varepsilon }{2}\sqrt{t^2 + \Vert z\Vert ^2} \end{aligned}$$

is still convex and positively 1-homogeneous. Define \(\Psi _\varepsilon \) as in (4.10) by substituting \({\mathcal {G}}_\varepsilon \) for \({\mathcal {G}}\). By the general Proposition 5.5, we see that \(\Sigma _{\Psi _\varepsilon }\) satisfies Almgren's condition, hence \(\Sigma _\Psi \) satisfies (A.11) with constant \(\frac{\varepsilon }{2}\). \(\square \)

Lemma 4.6

The current \(T_{u,\theta } = \llbracket \Gamma _u,\vec \xi _u,\theta \rrbracket \) defined in (4.11) is stationary in \(\Omega \times {\mathbb {R}}^2\) for the energy \(\Sigma _\Psi \).

Proof

A direct computation shows that f and \(\Psi \) fulfill

$$\begin{aligned} f(X) = \Psi (W(X)){\mathcal {A}}(X), \forall X \in {\mathbb {R}}^{2\times 2}, \end{aligned}$$

where \(W(X) = M^1(X)\wedge M^2(X)\) and \(M^i\) are the columns of the matrix

$$\begin{aligned} M(X)\doteq \left( \begin{array}{c} {{\,\mathrm{id}\,}}_m\\ X \end{array} \right) . \end{aligned}$$

Once this is checked, the proof is entirely analogous to the one of [5, Proposition  6.8], and will be sketched in the appendix, see Proposition 5.8. \(\square \)

4.1 Explicit values

The following values were found using Maple 2020. Define the following quantities:

$$\begin{aligned}&(\beta _1,\beta _2,\beta _3,\beta _4,\beta _5) \doteq (2,5,10,1,2);\\&(d_1,d_2,d_3,d_4,d_5) \doteq \left( -\frac{1204}{828115},0,\frac{-1309}{454800},\frac{-10097}{2546880},0\right) ;\\&(c_1,c_2,c_3,c_4,c_5) \doteq \left( 0,0,-\frac{2929}{1137000},\frac{5233}{113700}, -\frac{33}{15160}\right) . \end{aligned}$$

The large \(T_5\) configuration is given by:

$$\begin{aligned} A_1\doteq \left( \begin{array}{cc} \frac{8}{5}&{} -2\\ -2 &{} \frac{8}{5}\\ -\frac{8}{1137} &{} \frac{7361}{454800}\\ \frac{267}{151600}&{} \frac{8}{1137}\\ \frac{-3361}{227400}&{} \frac{3361}{284250}\\ \frac{4801}{284250} &{} -\frac{4801}{227400} \end{array} \right) ; A_2\doteq \left( \begin{array}{cc} \frac{8}{5}&{} 2\\ 2 &{} \frac{8}{5}\\ \frac{8}{1137} &{} \frac{7361}{454800}\\ \frac{267}{151600}&{} -\frac{8}{1137}\\ \frac{3361}{227400}&{} \frac{3361}{284250}\\ \frac{4801}{284250} &{} \frac{4801}{227400} \end{array} \right) ; A_3\doteq \left( \begin{array}{cc} \frac{2}{5}&{} 0\\ 0 &{} -\frac{18}{5}\\ 0 &{} -\frac{959}{454800}\\ \frac{907}{151600}&{} 0\\ 0&{} -\frac{10083}{379000}\\ \frac{4801}{1137000} &{} 0 \end{array} \right) ; \end{aligned}$$
$$\begin{aligned} A_4\doteq \left( \begin{array}{cc} -\frac{18}{5}&{} 0\\ 0 &{} \frac{2}{5}\\ 0 &{} \frac{5441}{454800}\\ \frac{9121}{454800}&{} 0\\ 0&{} \frac{3361}{1137000}\\ -\frac{14403}{379000} &{} 0 \end{array} \right) ; A_5\doteq \left( \begin{array}{cc} \frac{3}{4}&{} 0\\ 0 &{} \frac{3}{4}\\ 0 &{} \frac{6001}{454800}\\ \frac{2161}{454800}&{} 0\\ 0&{} \frac{3361}{606400}\\ \frac{4801}{606400} &{} 0 \end{array} \right) . \end{aligned}$$

Define \(X_i, Y_i,Z_i \in {\mathbb {R}}^{2\times 2}\) through the relations

$$\begin{aligned} \left( \begin{array}{cc} X_i\\ Y_i\\ Z_i\\ \end{array} \right) = A_i. \end{aligned}$$

The matrices ABCD appearing in Condition 3 are given by:

$$\begin{aligned} A\doteq \left( \begin{array}{cc} 0&{} \frac{4}{1137}\\ - \frac{4}{1137} &{} 0 \end{array} \right) ; B\doteq \left( \begin{array}{cc} 0&{} \frac{4801}{454800}\\ \frac{3361}{454800} &{} 0 \end{array} \right) ; C\doteq \left( \begin{array}{cc} 0&{} \frac{3361}{454800}\\ \frac{4801}{454800} &{} 0 \end{array} \right) , D\doteq 0. \end{aligned}$$

These values fulfill Conditions 1, 2 and 3. In particular, the three permutations in the definition of a large \(T_5\) configuration in Condition 2 are: [1, 2, 3, 5, 4], [1, 2, 4, 5, 3], [1, 2, 5, 3, 4].
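As an illustrative sanity check of these values (limited to \(i = 1\); the remaining indices can be verified in exactly the same way, and the small helper functions below are not part of the construction), the relations \(Y_1 = AX_1 + B\) and \(Z_1 = CX_1 + D\) of Condition 3 can be confirmed with exact rational arithmetic:

```python
from fractions import Fraction as F

# Illustrative check of Condition 3 for i = 1: read X_1, Y_1, Z_1 off A_1
# and verify Y_1 = A X_1 + B and Z_1 = C X_1 + D with exact rationals.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matadd(P, Q):
    return [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

X1 = [[F(8, 5), F(-2)], [F(-2), F(8, 5)]]
Y1 = [[F(-8, 1137), F(7361, 454800)], [F(267, 151600), F(8, 1137)]]
Z1 = [[F(-3361, 227400), F(3361, 284250)], [F(4801, 284250), F(-4801, 227400)]]

A = [[F(0), F(4, 1137)], [F(-4, 1137), F(0)]]
B = [[F(0), F(4801, 454800)], [F(3361, 454800), F(0)]]
C = [[F(0), F(3361, 454800)], [F(4801, 454800), F(0)]]
D = [[F(0), F(0)], [F(0), F(0)]]

assert matadd(matmul(A, X1), B) == Y1   # Y_1 = A X_1 + B
assert matadd(matmul(C, X1), D) == Z1   # Z_1 = C X_1 + D
print("Condition 3 verified for i = 1")
```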

5 Extension of polyconvex functions

Let \(\Phi : {\mathbb {R}}^{n\times m} \rightarrow {\mathbb {R}}^{k}\) be the usual map that, to a matrix \(X \in {\mathbb {R}}^{n\times m}\), associates the vector of the subdeterminants of X. Consider a polyconvex function

$$\begin{aligned} f(X) = h(\Phi (X)), \end{aligned}$$

\(h: {\mathbb {R}}^k \rightarrow {\mathbb {R}}\) being \(C^1\). The purpose of this section is to generalize the arguments of the previous section to arbitrary n, m, and hence to prove some of the lemmas of that section. Consider the following set of assumptions:

  1. (i)

    h is convex;

  2. (ii)

    h has linear growth, i.e. \(|h(z)| \le A\Vert z\Vert + B, \forall z \in {\mathbb {R}}^k\), for \(A,B\ge 0\);

  3. (iii)

    \(\lambda \doteq \inf \{h(z) - (Dh(z),z): z\in {\mathbb {R}}^k\} > -\infty \);

  4. (iv)

    \((Dh(z_2),z_2 - z_1) \le h(z_1) + h(z_2), \quad \forall z_1,z_2 \in {\mathbb {R}}^k\).

If h fulfills (i)-(ii)-(iii), we will say it has property (P). If, in addition, h satisfies (iv), we will say that h fulfills property (PE).
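A guiding example (recorded only as an illustration, although it is the model case underlying the previous section) is the area integrand \(h(z) = \sqrt{1 + \Vert z\Vert ^2}\): it is convex, has linear growth, and

$$\begin{aligned} h(z) - (Dh(z),z) = \frac{1}{\sqrt{1 + \Vert z\Vert ^2}} \in (0,1], \end{aligned}$$

so (iii) holds with \(\lambda = 0\); moreover \((Dh(z_2),z_2 - z_1) \le \Vert z_2\Vert + \Vert z_1\Vert \le h(z_1) + h(z_2)\), so (iv) holds as well and h has property (PE). Its perspective function is \(G(z,y) = \sqrt{y^2 + \Vert z\Vert ^2}\) for \(y > 0\), and \(h^*(z) = \Vert z\Vert \).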

Remark 5.1

Notice that (iii) is a consequence of (iv): indeed, if (iv) holds, we can write, for \(z_1 = 0\) and any \(z_2 = z \in {\mathbb {R}}^k\):

$$\begin{aligned} (Dh(z),z) \le h(0) + h(z), \end{aligned}$$

hence

$$\begin{aligned} -h(0) \le h(z) - (Dh(z),z), \quad \forall z \in {\mathbb {R}}^k, \end{aligned}$$

which implies (iii).

We denote with \(h^*\) the recession function of h:

$$\begin{aligned} h^*(x)\doteq \lim _{y\rightarrow 0^+}yh\left( \frac{x}{y}\right) , \quad \forall x \in {\mathbb {R}}^k. \end{aligned}$$

It is not difficult to prove that the limit above always exists and is finite for a function h satisfying (P). To show it, one can use the fact that the function

$$\begin{aligned} y \mapsto yh\left( \frac{x}{y}\right) \end{aligned}$$

defined for \(y > 0\) is convex for every fixed \(x \in {\mathbb {R}}^k\), see [4, Lemma 2].

As above, we define the perspective function

$$\begin{aligned} G(x,y)\doteq yh\left( \frac{x}{y}\right) ,\quad \text {if } y > 0. \end{aligned}$$

We consider the smallest convex extension of G to the whole \({\mathbb {R}}^{k + 1}\):

$$\begin{aligned} {\mathcal {G}}(z,t) \doteq \sup \{G(x,y) + (DG(x,y),(z,t) - (x,y)): (x,y) \in {\mathbb {R}}^{k}\times (0,+\infty )\}. \end{aligned}$$

By the 1-homogeneity of G and the resulting Euler identity \((DG(x,y),(x,y)) = G(x,y)\), we can write

$$\begin{aligned} {\mathcal {G}}(z,t) = \sup \{(DG(x,y),(z,t)): (x,y) \in {\mathbb {R}}^{k}\times (0,+\infty )\} \end{aligned}$$
(5.1)

First, we prove

Proposition 5.2

Let \({\mathcal {G}}\) be defined as in (5.1). Then, if h satisfies (P), \({\mathcal {G}}\)

  1. (1)

    is convex and extends G on \({\mathbb {R}}^k\times (0,+\infty )\);

  2. (2)

    is positively 1-homogeneous;

  3. (3)

    is finite everywhere.

Conversely, if there exists a function \({\mathcal {G}}\) that fulfills (1)–(2)–(3), then h fulfills (P).

Furthermore, we can prove the following characterization of \({\mathcal {G}}\):

Corollary 5.3

Let h fulfill property (P), and let \({\mathcal {G}}\) be defined as in (5.1). Assume further that there exist \(\lambda ' \in {\mathbb {R}}\) and \(R > 0\) such that

$$\begin{aligned} h(z) = h^*(z) + \lambda ',\quad \text { for } \Vert z\Vert \ge R. \end{aligned}$$
(5.2)

Then, \(\lambda ' = \lambda \) and for \(t < 0\), we have

$$\begin{aligned} {\mathcal {G}}(z,t) = h^*(z) + \lambda ' t, \end{aligned}$$

where \(\lambda \) is the quantity appearing in (iii).

Before starting with the proof of the proposition, we need to recall some results concerning the notion of subdifferential at \(x \in {\mathbb {R}}^N\) of a convex function \(f: {\mathbb {R}}^{N} \rightarrow {\mathbb {R}}\).

5.1 Subdifferentials

The subdifferential of f at x, denoted with \(\partial f(x)\), is the collection of those vectors \(v \in {\mathbb {R}}^N\) such that

$$\begin{aligned} (v,y-x) \le f(y) - f(x), \quad \forall y \in {\mathbb {R}}^N. \end{aligned}$$

We will use the following facts concerning the subdifferential. For a convex function with finite values, \(\partial f(x) \ne \emptyset \) at all \(x \in {\mathbb {R}}^N\), see [21, Theorem 23.4]. Conversely, if \(f:{\mathbb {R}}^N \rightarrow {\mathbb {R}}\) is such that \(\partial f(x) \ne \emptyset \) at every \(x \in {\mathbb {R}}^N\), then f is convex, since in that case

$$\begin{aligned} f(x) = \sup _{y \in {\mathbb {R}}^N}\sup _{v \in \partial f(y)}\{(v,x - y) + f(y)\}. \end{aligned}$$

As can be seen from the definition of subdifferential,

$$\begin{aligned} |f(x) - f(y)|\le \max \left\{ \sup _{v \in \partial f(x)}\Vert v\Vert ,\sup _{w \in \partial f(y)}\Vert w\Vert \right\} \Vert x - y\Vert . \end{aligned}$$

This, together with the fact that if K is compact, then \(\partial f(K) \doteq \bigcup _{x \in K}\partial f(x)\) is compact, see [12, Lemma A.22], yields the fact that every convex function is locally Lipschitz. Moreover, if f is positively 1-homogeneous, a simple application of the definition of subdifferential shows that

$$\begin{aligned} v \in \partial f(x) \Leftrightarrow v \in \partial f(\lambda x), \quad \forall \lambda > 0, x \in {\mathbb {R}}^N. \end{aligned}$$
(5.3)

In particular, combining (5.3) with the local Lipschitz property of convex functions, we infer that if \(f: {\mathbb {R}}^N \rightarrow {\mathbb {R}}\) is convex and positively 1-homogeneous, f must be globally Lipschitz. Furthermore, using the definition of subdifferential and (5.3) for f convex and positively 1-homogeneous, it is easy to see that the following generalized Euler formula holds

$$\begin{aligned} (v,x) = f(x), \quad \forall v \in \partial f(x), \forall x \in {\mathbb {R}}^N. \end{aligned}$$
(5.4)
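For completeness, the short argument behind (5.4) is the following: by positive 1-homogeneity \(f(0) = 0\), so testing the definition of the subdifferential with \(y = 2x\) and \(y = 0\) gives, for every \(v \in \partial f(x)\),

$$\begin{aligned} (v,x) = (v,2x - x) \le f(2x) - f(x) = f(x) \quad \text {and}\quad -(v,x) = (v,0 - x) \le f(0) - f(x) = -f(x), \end{aligned}$$

hence \((v,x) = f(x)\).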

Finally, we recall that at x, the convex function \(f: {\mathbb {R}}^N \rightarrow {\mathbb {R}}\) is differentiable if and only if

$$\begin{aligned} \partial f(x) = \{Df(x)\}, \end{aligned}$$

see [12, Lemma A.20-A.21] and references therein. We can now start the proof of the proposition.

5.2 Proof of Proposition 5.2

First we assume that h has property (P). \({\mathcal {G}}\) is convex since it is a supremum of linear functions. Moreover, the convexity of h yields the convexity of G on \({\mathbb {R}}^k\times (0,+\infty )\). Having established that G is convex, the fact that \({\mathcal {G}}\) as in (5.1) extends G is classical. This proves (1). Since G is positively 1-homogeneous, \({\mathcal {G}}\) is positively 1-homogeneous as well. Therefore (2) is checked, and we only need to prove (3). By (5.1), we see that in order to conclude we only need to show that, for fixed \((z,t) \in {\mathbb {R}}^{k + 1}\),

$$\begin{aligned} (DG(x,y),(z,t)) \le L < + \infty , \quad \forall (x,y) \in {\mathbb {R}}^k\times (0,+\infty ), \end{aligned}$$

where L possibly depends on (z, t). Let us compute DG. First we have that

$$\begin{aligned} \partial _{x_i}G(x,y) = \partial _{x_i}h\left( \frac{x}{y}\right) . \end{aligned}$$

Now, exploiting the convexity of h, we can choose any \(v \in {\mathbb {R}}^k\) with \(\Vert v\Vert = 1\) and write

$$\begin{aligned} (Dh(x),v) \le \frac{h(x + sv) - h(x)}{s}, \quad \forall s \in {\mathbb {R}}^+. \end{aligned}$$

Using the linear growth of h, i.e. (ii), we bound:

$$\begin{aligned} (Dh(x),v) \le \frac{A\Vert x + sv\Vert + B + A\Vert x\Vert + B}{s}. \end{aligned}$$

Letting \(s \rightarrow + \infty \), the previous expression yields

$$\begin{aligned} (Dh(x),v) \le A,\quad \forall x,v \in {\mathbb {R}}^k, \Vert v\Vert =1. \end{aligned}$$
(5.5)

Thus, if we can show that \(\partial _yG(x,y)\) is uniformly bounded, then we conclude the proof. We compute explicitly, for every \((x,y)\in {\mathbb {R}}^{k}\times (0,+\infty )\)

$$\begin{aligned} \partial _yG(x,y) = h\left( \frac{x}{y}\right) - \left( Dh\left( \frac{x}{y}\right) ,\frac{x}{y}\right) . \end{aligned}$$

We are therefore left to study the boundedness of the function \(z\mapsto h(z) - (Dh(z),z)\): the bound from above, \(h(z) - (Dh(z),z) \le h(0)\), follows directly from the convexity of h, while the bound from below is exactly (iii) of property (P).

Finally, let us show the necessity of (P). If \({\mathcal {G}}\) is convex and extends G, then in particular

$$\begin{aligned} {\mathcal {G}}(z,1) = G(z,1) = h(z), \quad \forall z \in {\mathbb {R}}^k, \end{aligned}$$

hence h is convex. By the discussion of Sect. 5.1, we know that \({\mathcal {G}}\) is globally Lipschitz with constant \(L > 0\). Since

$$\begin{aligned} {\mathcal {G}}(z,1) = h(z), \quad \forall z \in {\mathbb {R}}^k, \end{aligned}$$

we infer that h has linear growth, i.e. it enjoys property (ii). Finally, we need to show (iii). Since \({\mathcal {G}}\) extends G in the upper half-space, we obtain

$$\begin{aligned} |\partial _yG(x,y)| \le L, \quad \forall (x,y) \in {\mathbb {R}}^k\times (0,+\infty ). \end{aligned}$$

By the definition of G, we deduce

$$\begin{aligned} |\partial _yG(z,1)| = |h(z) - (Dh(z),z)| \le L, \quad \forall z \in {\mathbb {R}}^k, \end{aligned}$$

hence (iii).

5.3 Proof of Corollary 5.3

First we show that \(\lambda = \lambda '\). To see this, consider for any \(z \ne 0\) the auxiliary function \(g(t) \doteq h(tz) - (Dh(tz),tz)\), for \(t > 0\). Then, g is non-increasing. Indeed,

$$\begin{aligned} g\left( \frac{1}{t}\right) = \partial _y{\mathcal {G}}(z,t), \end{aligned}$$

and we can use that \({\mathcal {G}}\) is convex to deduce that \(t\mapsto \partial _y{\mathcal {G}}(z,t)\) is non-decreasing, hence that \(t\mapsto g(t)\) is non-increasing. Now, for any t sufficiently large, by assumption (5.2), we have that

$$\begin{aligned} h(tz) - (Dh(tz),tz) = \lambda '. \end{aligned}$$

This shows that

$$\begin{aligned} \lambda ' = \lim _{t \rightarrow +\infty }[h(tz) - (Dh(tz),tz)] = \inf _{t > 0}[h(tz) - (Dh(tz),tz)] \ge \lambda . \end{aligned}$$

In particular, notice that \(h(0) = \lim _{t \rightarrow 0^+}g(t) \ge \lambda '\). To show the equality between \(\lambda \) and \(\lambda '\), consider now a sequence \(z_n \in {\mathbb {R}}^k\) such that \(a_n \doteq h(z_n) - (Dh(z_n),z_n) \rightarrow \lambda \) as \(n \rightarrow \infty \). If \(z_n = 0\) for infinitely many n, we can write \(\lambda = \lim _{n \rightarrow \infty }a_n = h(0) \ge \lambda '\) and the proof is concluded. Otherwise, by the computation above, we have, for every \(t \ge 1\)

$$\begin{aligned} a_n \ge h(tz_n) - (Dh(tz_n),tz_n). \end{aligned}$$

By choosing t in dependence of \(z_n\), we can ensure through assumption (5.2) that

$$\begin{aligned} h(tz_n) - (Dh(tz_n),tz_n) = \lambda '. \end{aligned}$$

Therefore,

$$\begin{aligned} \lambda = \lim _na_n \ge \lambda ' \end{aligned}$$

and the proof of the first part of the Corollary is finished.
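As a quick sanity check (our own example, not taken from the paper), consider the one-dimensional \(C^1\) convex function \(h(x) = \frac{x^2 + 1}{2}\) for \(|x| \le 1\) and \(h(x) = |x|\) for \(|x| > 1\). Here \(g(t) = h(tz) - h'(tz)tz\) equals \(\frac{1 - (tz)^2}{2}\) for \(|tz| \le 1\) and vanishes for \(|tz| \ge 1\), so the eventual constancy required in (5.2) holds with \(\lambda = \lambda ' = 0\) and \(h(0) = \frac{1}{2} \ge \lambda '\). The following Python sketch verifies the monotonicity of g and the value of \(\lambda '\) numerically.

```python
import numpy as np

# Illustration only (our own example): a C^1 convex h with linear growth for
# which g(t) = h(tz) - h'(tz) t z is non-increasing and eventually constant,
# so that lambda = lambda' = 0 and h(0) = 1/2 >= lambda'.
def h(x):
    return 0.5 * (x * x + 1.0) if abs(x) <= 1.0 else abs(x)

def dh(x):
    return x if abs(x) <= 1.0 else float(np.sign(x))

def g(z, t):
    return h(t * z) - dh(t * z) * t * z

z = 0.3
ts = np.linspace(0.01, 20.0, 2000)
vals = np.array([g(z, t) for t in ts])
assert np.all(np.diff(vals) <= 1e-12)     # g is non-increasing in t
assert abs(vals[-1]) < 1e-12              # g(t) = lambda' = 0 once t >= 1/|z|
print("h(0) =", h(0.0), ">= lambda' = 0")
```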

Now we wish to show the characterization of \({\mathcal {G}}\). Fix \((z,t) \in {\mathbb {R}}^k\times (-\infty ,0)\). Let \((x,y) \in {\mathbb {R}}^k\times (0,+\infty )\). Then, using the definition \(G(x,y) = yh\left( \frac{x}{y}\right) \)

$$\begin{aligned} (DG(x,y),(z,t)) = \left( Dh\left( \frac{x}{y}\right) ,z\right) + \left( h\left( \frac{x}{y}\right) - \left( Dh\left( \frac{x}{y}\right) ,\frac{x}{y}\right) \right) t. \end{aligned}$$
(5.6)

By (iii), we get

$$\begin{aligned} h\left( \frac{x}{y}\right) - \left( Dh\left( \frac{x}{y}\right) ,\frac{x}{y}\right) \ge \lambda , \end{aligned}$$

hence, since \(t < 0\),

$$\begin{aligned} (D{\mathcal {G}}(x,y),(z,t)) \le \left( Dh\left( \frac{x}{y}\right) ,z\right) + \lambda t. \end{aligned}$$

We now show that

$$\begin{aligned} \left( Dh\left( \frac{x}{y}\right) ,z\right) \le h^*(z). \end{aligned}$$
(5.7)

Let \(a,b \in {\mathbb {R}}^k\), \(r > 0\). Then, using the convexity of h,

$$\begin{aligned} 0 \le \left( Dh(a) - Dh\left( \frac{b}{r}\right) , a - \frac{b}{r}\right) , \end{aligned}$$

or, multiplying by \(r > 0\),

$$\begin{aligned} 0 \le \left( Dh(a) - Dh\left( \frac{b}{r}\right) , ra - b\right) . \end{aligned}$$
(5.8)

To conclude (5.7), we could directly invoke assumption (5.2); however, we use a slightly more general argument, so that the same inequality can be reused below. By (5.5), we have that

$$\begin{aligned} \left\{ Dh\left( \frac{b}{r}\right) \right\} _{r > 0} \end{aligned}$$

is an equibounded family of vectors, hence up to subsequences it admits a limit \(\lim _{j \rightarrow \infty } Dh\left( \frac{b}{r_j}\right) = w \in {\mathbb {R}}^k\), where \(\lim _{j \rightarrow \infty }r_j = 0\). Hence,

$$\begin{aligned} 0 \le \lim _{j \rightarrow \infty }\left( Dh(a) - Dh\left( \frac{b}{r_j}\right) , r_ja - b\right) = -(Dh(a) - w,b). \end{aligned}$$
(5.9)

Now, \(w \in \partial h^*(b)\): indeed, using the convexity of h, we can write

$$\begin{aligned} \left( Dh\left( \frac{b}{r_j}\right) ,\frac{a}{r_j}-\frac{b}{r_j}\right) \le h\left( \frac{a}{r_j}\right) - h\left( \frac{b}{r_j}\right) . \end{aligned}$$

Multiplying by \(r_j\) and letting \(j \rightarrow \infty \), we find that \(w \in \partial h^*(b)\). By (5.4), (5.7) now follows from (5.9). Therefore, we can conclude that, for \(t < 0\),

$$\begin{aligned} {\mathcal {G}}(z,t) \le h^*(z) +\lambda t. \end{aligned}$$

To conclude the assertion, we consider, for any \(y > 0\):

$$\begin{aligned} (D{\mathcal {G}}(z,y),(z,t)) = \left( Dh\left( \frac{z}{y}\right) ,z\right) + \left( h\left( \frac{z}{y}\right) - \left( Dh\left( \frac{z}{y}\right) ,\frac{z}{y}\right) \right) t. \end{aligned}$$

If we choose y sufficiently small (in dependence of z), once again using (5.2), we see that

$$\begin{aligned} (D{\mathcal {G}}(z,y),(z,t)) = h^*(z) + \lambda ' t = h^*(z) + \lambda t, \end{aligned}$$

the latter equality being true by the first part of the proof. Since convexity and the 1-homogeneity of \({\mathcal {G}}\) give \({\mathcal {G}}(z,t) \ge (D{\mathcal {G}}(z,y),(z,t))\), this yields the reverse inequality \({\mathcal {G}}(z,t) \ge h^*(z) + \lambda t\) and concludes the proof of the corollary.

5.4 Symmetric extension

Now we show the link between (PE) and a symmetric extension. Notice that requiring h to admit a 1-homogeneous and even extension \({\mathcal {G}}\) with \({\mathcal {G}}(z,1) = h(z)\) forces this extension to have the form

$$\begin{aligned} {\mathcal {G}}(z,t) = |t|h\left( \frac{z}{t}\right) \end{aligned}$$
(5.10)

for \(t \ne 0\). If we require that \({\mathcal {G}}\) is convex too, then it is continuous, hence it becomes uniquely determined on \(\{(x,y): y = 0\}\) as \({\mathcal {G}}(z,0) = h^*(z)\). Therefore, instead of considering a general convex extension as in (5.1), we are going to work with the function \({\mathcal {G}}\) obtained in (5.10).
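For instance, for the area-type integrand \(h(z) = \sqrt{1 + |z|^2}\) one has \(h^*(z) = \lim _{y \rightarrow 0^+}y\,h\left( \frac{z}{y}\right) = |z|\), so that (5.10), together with the prescription \({\mathcal {G}}(z,0) = h^*(z)\), gives

$$\begin{aligned} {\mathcal {G}}(z,t) = |t|\sqrt{1 + \left| \frac{z}{t}\right| ^2} = \sqrt{|z|^2 + t^2}, \end{aligned}$$

i.e. the Euclidean norm of \((z,t)\), which is indeed even and convex.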

Proposition 5.4

h satisfies (PE) if and only if \({\mathcal {G}}: {\mathbb {R}}^{k + 1}\rightarrow {\mathbb {R}}\) defined as

$$\begin{aligned} {\mathcal {G}}(z,t) = {\left\{ \begin{array}{ll} |t|h\left( \frac{z}{t}\right) , &{}\text { if } t \ne 0\\ h^*(z), &{}\text { if } t = 0 \end{array}\right. } \end{aligned}$$
(5.11)

is even and convex.

Proof

Assume that h satisfies (PE). First we prove that \({\mathcal {G}}\) is even. This amounts to showing that

$$\begin{aligned} h^*(z) = \lim _{t\rightarrow 0^+}t\,h\left( \frac{z}{t}\right) = \lim _{t\rightarrow 0^+}t\,h\left( -\frac{z}{t}\right) = h^*(-z). \end{aligned}$$
(5.12)

To see this, we simply evaluate (iv) at \(z_1 = \frac{-z}{t}\) and \(z_2 = \frac{z}{t}\) for any \(z \in {\mathbb {R}}^k\), \(t > 0\) to find

$$\begin{aligned} 2\left( Dh\left( \frac{z}{t}\right) ,\frac{z}{t} \right) \le h\left( \frac{z}{t}\right) + h\left( -\frac{z}{t}\right) . \end{aligned}$$
(5.13)

We now use the same argument employed in the proof of (5.7) to find a sequence of positive numbers \(\{t_j\}_{j \in {{\mathbb {N}}}}\) with \(\lim _jt_j = 0\) such that \(\lim _{j \rightarrow \infty }Dh\left( \frac{z}{t_j}\right) = w \in \partial h^*(z)\). Therefore, multiplying (5.13) by \(t_j\) and passing to the limit along this sequence, we get

$$\begin{aligned} 2(w,z) \le h^*(z) + h^*(-z),\quad \forall z \in {\mathbb {R}}^k. \end{aligned}$$

By (5.4), \((w,z) = h^*(z)\); inserting this into the last inequality, we see that

$$\begin{aligned} 2h^*(z) \le h^*(z) + h^*(-z) \Rightarrow h^*(z) \le h^*(-z),\quad \forall z \in {\mathbb {R}}^k, \end{aligned}$$

which, applied also to \(-z\), implies (5.12).

Now we show that \({\mathcal {G}}\) is convex. We rely on the results of Sect. 5.1, and we aim to show that at every point \(p = (z,t) \in {\mathbb {R}}^{k + 1}\),

$$\begin{aligned} \partial {\mathcal {G}}(p) \ne \emptyset . \end{aligned}$$

Let us first consider \(t > 0\). Since \({\mathcal {G}}\) is differentiable at p, the only possible candidate for an element of the subdifferential is \(v\doteq D{\mathcal {G}}(p)\). Notice moreover that, by the 1-homogeneity of \({\mathcal {G}}\), \((D{\mathcal {G}}(p),p)={\mathcal {G}}(p)\). Thus we have, for any \(q = (x,y) \in {\mathbb {R}}^{k + 1}\):

$$\begin{aligned} (D{\mathcal {G}}(p),q-p) \le {\mathcal {G}}(q) -{\mathcal {G}}(p) \Leftrightarrow (D{\mathcal {G}}(p),q) \le {\mathcal {G}}(q). \end{aligned}$$
(5.14)

If we establish \((D{\mathcal {G}}(p),q) \le {\mathcal {G}}(q)\) for any \(y \ne 0\), then we can use the pointwise convergence

$$\begin{aligned} \lim _{y \rightarrow 0^+}{\mathcal {G}}(x,y) = \lim _{y \rightarrow 0^+}yh\left( \frac{x}{y}\right) = h^*(x) = {\mathcal {G}}(x,0) \end{aligned}$$

to infer that the inequality holds also for \(y = 0\). We therefore compute, for any \(y \ne 0\):

$$\begin{aligned} (D{\mathcal {G}}(p),q) = \left( Dh\left( \frac{z}{t}\right) ,x\right) + h\left( \frac{z}{t}\right) y - \left( Dh\left( \frac{z}{t}\right) ,\frac{z}{t}\right) y = y\left[ h\left( \frac{z}{t}\right) - \left( Dh\left( \frac{z}{t}\right) ,\frac{z}{t}-\frac{x}{y}\right) \right] \end{aligned}$$

In view of (5.14), \(v = D{\mathcal {G}}(p)\) defines a supporting hyperplane, i.e. \(v \in \partial {\mathcal {G}}(p)\), if and only if

$$\begin{aligned} (D{\mathcal {G}}(p), q) = y\left[ h\left( \frac{z}{t}\right) - \left( Dh\left( \frac{z}{t}\right) ,\frac{z}{t}-\frac{x}{y}\right) \right] \le {\mathcal {G}}(q). \end{aligned}$$
(5.15)

Following the same argument as at the beginning of the proof of Proposition 5.2, since h is convex, \({\mathcal {G}}\) is convex on \({\mathbb {R}}^k\times (0,+\infty )\). Thus (5.15) is surely fulfilled if \(y > 0\). If \(y < 0\), (5.15) becomes

$$\begin{aligned} y\left[ h\left( \frac{z}{t}\right) - \left( Dh\left( \frac{z}{t}\right) ,\frac{z}{t} - \frac{x}{y}\right) \right] \le {\mathcal {G}}(q) = -yh\left( \frac{x}{y}\right) , \end{aligned}$$

that can be rewritten as

$$\begin{aligned} h\left( \frac{z}{t}\right) - \left( Dh\left( \frac{z}{t}\right) ,\frac{z}{t} - \frac{x}{y}\right) \ge -h\left( \frac{x}{y}\right) , \forall (z,t) \in {\mathbb {R}}^k\times (0,+\infty ),(x,y) \in {\mathbb {R}}^k\times (-\infty ,0). \end{aligned}$$

The last condition is equivalent to (iv). Now we need to prove that an element of the subdifferential also exists at points \(p = (z,t)\) with \(t < 0\). This follows from the evenness of \({\mathcal {G}}\) and the argument above: indeed, the evenness of \({\mathcal {G}}\) yields

$$\begin{aligned} D{\mathcal {G}}(p) = -D{\mathcal {G}}(-p),\quad \forall p \in {\mathbb {R}}^{k}\times ({\mathbb {R}}\setminus \{0\}). \end{aligned}$$

Therefore, for any \(q = (x,y) \in {\mathbb {R}}^{k + 1}\),

$$\begin{aligned} (D{\mathcal {G}}(p),q - p)&= -(D{\mathcal {G}}(-p),q - p) = (D{\mathcal {G}}(-p),p - q) = (D{\mathcal {G}}(-p),- q - (-p)) \\&\le {\mathcal {G}}(-q)-{\mathcal {G}}(-p) = {\mathcal {G}}(q)-{\mathcal {G}}(p), \end{aligned}$$

where we exploited the fact that \(D{\mathcal {G}}(-p) \in \partial {\mathcal {G}}(-p)\), as proved above. Finally, we need to produce an element in the subdifferential at points \(p = (z,0)\). To do so, we again use the fact that for any \(p' = (z,t)\) with \(t > 0\), \(q = (x,y) \in {\mathbb {R}}^{k + 1}\),

$$\begin{aligned} (D{\mathcal {G}}(p'),q - p')\le {\mathcal {G}}(q) - {\mathcal {G}}(p'). \end{aligned}$$

We only need to observe that \(\{D{\mathcal {G}}(z,t)\}_{t > 0}\) is an equibounded family of vectors. This allows us to choose a sequence \(t_j>0\) convergent to 0 such that \(\{D{\mathcal {G}}(z,t_j)\}_{j \in {{\mathbb {N}}}}\) converges to a vector \(w \in {\mathbb {R}}^{k + 1}\). Since \(\lim _{t\rightarrow 0^+}{\mathcal {G}}(z,t) = {\mathcal {G}}(z,0)\), we have

$$\begin{aligned} (w,q - p) = \lim _{j \rightarrow \infty }(D{\mathcal {G}}(p_j),q - p_j) \le \lim _{j \rightarrow \infty }({\mathcal {G}}(q) - {\mathcal {G}}(p_j)) = {\mathcal {G}}(q) - {\mathcal {G}}(p), \end{aligned}$$

where \(p_j = (z,t_j), \forall j \in {{\mathbb {N}}}\). To show the equi-boundedness of \(\{D{\mathcal {G}}(z,t)\}_{t>0}\), we observe that

$$\begin{aligned} D_z{\mathcal {G}}(z,t) = Dh\left( \frac{z}{t}\right) , \end{aligned}$$

that is equibounded in z and t by (5.5). Exactly as in the proof of Proposition 5.2, we use (iii) to say that

$$\begin{aligned} \lambda \le \partial _t {\mathcal {G}}(z,t),\quad \forall (z,t) \in {\mathbb {R}}^k \times (0,+\infty ). \end{aligned}$$

Hence we only need to provide a bound from above. To show it, we use the convexity of h to estimate

$$\begin{aligned}&\partial _t {\mathcal {G}}(z,t) = h\left( \frac{z}{t}\right) - \left( Dh\left( \frac{z}{t}\right) ,\frac{z}{t}\right) = h\left( \frac{z}{t}\right) + \left( Dh\left( \frac{z}{t}\right) ,0 - \frac{z}{t}\right) \\&\le h\left( \frac{z}{t}\right) + h(0)- h\left( \frac{z}{t}\right) = h(0), \end{aligned}$$

that provides the desired bound. This finishes the proof of the convexity of \({\mathcal {G}}\).

To conclude, we need to show the converse statement, i.e. that if \({\mathcal {G}}\) is even and convex, then h fulfills (PE). The fact that h fulfills (i)-(ii)-(iii) can be proved in complete analogy with the proof of Proposition 5.2. By Remark 5.1, one could also infer (iii) as a corollary of (iv). Finally, to see (iv), one can simply follow the chain of logical equivalences of the previous part of the proof. This proves that (PE) is also necessary for the existence of the even extension. \(\square \)
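As a numerical illustration of Proposition 5.4 (ours, not part of the proof), take the even, \(C^1\), convex function \(h(x) = \frac{x^2 + 1}{2}\) for \(|x| \le 1\), \(h(x) = |x|\) for \(|x| > 1\), already used after the first part of Sect. 5.3, for which \(h^*(x) = |x|\). The following Python sketch checks that the extension (5.11) is midpoint convex on random pairs of points; evenness is immediate here, since h itself is even.

```python
import numpy as np

# Illustration only: midpoint-convexity check of the symmetric extension (5.11)
# for the even, C^1, convex function (k = 1)
#   h(x) = (x^2 + 1)/2 for |x| <= 1,   h(x) = |x| otherwise,
# whose recession function is h^*(x) = |x|.
def h(x):
    return 0.5 * (x * x + 1.0) if abs(x) <= 1.0 else abs(x)

def G(z, t):
    return abs(t) * h(z / t) if t != 0.0 else abs(z)   # extension (5.11)

rng = np.random.default_rng(1)
for _ in range(50000):
    p = rng.uniform(-5.0, 5.0, size=2)
    q = rng.uniform(-5.0, 5.0, size=2)
    m = 0.5 * (p + q)
    assert G(m[0], m[1]) <= 0.5 * (G(p[0], p[1]) + G(q[0], q[1])) + 1e-9
print("midpoint convexity of (5.11) verified on random pairs")
```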

5.5 Extension to geometric functionals

Consider an orthonormal basis of \(\Lambda _m({\mathbb {R}}^{m + n})\), denoted by \(E_1,\dots , E_{\left( {\begin{array}{c}m + n\\ m\end{array}}\right) }\), where

$$\begin{aligned} E_1 \doteq e_1\wedge \dots \wedge e_{m}, \end{aligned}$$

as done in (A.1). We define, for every \(\tau \in \Lambda _m({\mathbb {R}}^{m + n})\)

$$\begin{aligned} \Psi (\tau )\doteq {\mathcal {G}}(\langle \tau ,E_2\rangle , \dots , \langle \tau ,E_{\left( {\begin{array}{c}m + n\\ m\end{array}}\right) }\rangle , \langle \tau ,E_1\rangle ), \end{aligned}$$
(5.16)

and consequently the energy

$$\begin{aligned} \Sigma _\Psi (T)\doteq \int _{E}\Psi (\vec T(x))\theta (x)d{\mathcal {H}}^m(x), \end{aligned}$$

for \(T = \llbracket E, \vec T,\theta \rrbracket \in {\mathcal {R}}_m({\mathbb {R}}^{n + m})\). For convenience, let us denote

$$\begin{aligned} \phi (\tau ) \doteq \left( \langle \tau ,E_2\rangle , \dots , \langle \tau ,E_{\left( {\begin{array}{c}m + n\\ m\end{array}}\right) }\rangle , \langle \tau ,E_1\rangle \right) . \end{aligned}$$

We have

Proposition 5.5

Let \({\mathcal {G}}\) be positively 1-homogeneous and convex, and define \(\Psi \) as in (5.16). Then, \(\Sigma _\Psi \) fulfills Almgren’s condition (A.10).

Proof

Let \(R,S \in {\mathcal {R}}_m({\mathbb {R}}^{n + m})\) be such that \(\partial R = \partial S \), \({{\,\mathrm{spt}\,}}S\) is contained in the vector subspace of \({\mathbb {R}}^{n+m}\) associated with a simple m-vector \(\vec {S}_0\), and \(\vec {S}(z)=\vec {S}_0\) for \(\Vert S\Vert \)-almost every z. Since \(\partial R = \partial S\), we have that

$$\begin{aligned} \int \vec {R} \, d\Vert R\Vert = \int \vec {S} \, d\Vert S\Vert = {\mathbb {M}}(S) \,\vec {S}_0, \end{aligned}$$

compare [11, 5.1.2]. Note that this implies by the linearity of \(\phi \) that

$$\begin{aligned} \int \phi \circ \vec {R} \, d\Vert R\Vert = \int \phi \circ \vec {S} \, d\Vert S\Vert = {\mathbb {M}}(S) \,\phi \circ \vec {S}_0. \end{aligned}$$

Now we may use Jensen's inequality and the 1-homogeneity of \({\mathcal {G}}\) to deduce that

$$\begin{aligned} \int \Psi \circ \vec {R}\, d\Vert R\Vert&= \int {\mathcal {G}}\circ \phi \circ \vec {R}\, d\Vert R\Vert \ge {\mathcal {G}} \left( \int \phi \circ \vec {R} \, d\Vert R\Vert \right) = {\mathcal {G}} \left( {\mathbb {M}}(S) \,\phi \circ \vec {S}_0\right) \\&= {\mathbb {M}}(S)\, {\mathcal {G}}\circ \phi \circ \vec {S}_0 = \int \Psi \circ \vec {S}\, d\Vert S\Vert \,, \end{aligned}$$

where we used again in the last line that \({\mathcal {G}}\) is 1-homogeneous. \(\square \)
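The key step above is the combination of Jensen's inequality with the 1-homogeneity of \({\mathcal {G}}\), which in discrete form reads \(\sum _i \theta _i\,{\mathcal {G}}(v_i) \ge {\mathcal {G}}\left( \sum _i \theta _i v_i\right) \) for weights \(\theta _i > 0\). The following Python sketch illustrates this inequality with \({\mathcal {G}}\) taken, purely for the sake of the example, to be the Euclidean norm; any convex, positively 1-homogeneous \({\mathcal {G}}\) would do.

```python
import numpy as np

# Illustration only: the discrete form of "Jensen + 1-homogeneity" used in the
# proof, namely  sum_i theta_i * G(v_i) >= G( sum_i theta_i * v_i )  for any
# convex, positively 1-homogeneous G and weights theta_i > 0. Here G is the
# Euclidean norm, a stand-in for the extension constructed above.
G = np.linalg.norm

rng = np.random.default_rng(2)
theta = rng.uniform(0.1, 2.0, size=50)        # multiplicities
v = rng.standard_normal((50, 4))              # values of phi along the current
lhs = float(np.sum(theta * np.array([G(vi) for vi in v])))
rhs = float(G(np.sum(theta[:, None] * v, axis=0)))
assert lhs >= rhs - 1e-12
print(f"{lhs:.6f} >= {rhs:.6f}")
```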

Remark 5.6

If \({\mathcal {G}}\) is even, then \(\Sigma _\Psi \) is a well-defined energy on varifolds. Notice that in this case \({\mathcal {G}}\) is convex, even and 1-homogeneous. A simple computation in convex analysis shows that these properties force \({\mathcal {G}}\) to be non-negative. This observation is what makes it impossible to extend an integrand f such as the one constructed in Sect. 4 to an integrand defined on varifolds using the methods introduced here.
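For completeness, the computation alluded to is presumably the following: by positive 1-homogeneity, \({\mathcal {G}}(0) = 0\), hence convexity and evenness give

$$\begin{aligned} 0 = {\mathcal {G}}(0) = {\mathcal {G}}\left( \frac{p + (-p)}{2}\right) \le \frac{{\mathcal {G}}(p) + {\mathcal {G}}(-p)}{2} = {\mathcal {G}}(p), \end{aligned}$$

so that \({\mathcal {G}} \ge 0\).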