Functions with Bounded Hessian–Schatten Variation: Density, Variational, and Extremality Properties

In this paper we analyze in detail a few questions related to the theory of functions with bounded p-Hessian–Schatten total variation, which are relevant in connection with the theory of inverse problems and machine learning. We prove an optimal density result, relative to the p-Hessian–Schatten total variation, of continuous piecewise linear (CPWL) functions in any space dimension d, using a construction based on a mesh whose local orientation is adapted to the function to be approximated. We show that not all extremal functions with respect to the p-Hessian–Schatten total variation are CPWL. Finally, we prove the existence of minimizers of certain relevant functionals involving the p-Hessian–Schatten total variation in the critical dimension d = 2.


Introduction
Broadly speaking, the goal of an inverse problem is to reconstruct an unknown signal of interest from a collection of (possibly noisy) observations. Linear inverse problems, in particular, are prevalent in various areas of signal processing. They are defined via the specification of three principal components:
• a hypothesis space $S$ from which we aim to reconstruct the unknown signal $f^* \in S$,
• a linear forward operator $\nu : S \to \mathbb{R}^N$ that models the data acquisition process,
• the observed data, stored in an array $y \in \mathbb{R}^N$, with the implicit assumption that $y \approx \nu(f^*)$.
The task is then to (approximately) reconstruct the unknown signal $f^*$ from the observed data $y$. From a variational perspective, the problem can be formulated as a minimization of the form
$$f^* \in \arg\min_{f \in S} \big( E(y, \nu(f)) + \lambda R(f) \big), \qquad (0.1)$$
where
• $E : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}$ is a convex loss function that measures the data discrepancy,
• $R : S \to \mathbb{R}$ is the regularization functional that enforces prior knowledge and regularity on the reconstructed signal,
• $\lambda > 0$ is a tunable parameter that balances the two terms.
In general, regularization (obtained by the presence of $R$) enhances the stability of the problem and alleviates its inherent ill-posedness. Also, the presence of $R$ leads to a key theoretical result, the so-called "representer theorem", that provides a parametric form for optimal solutions of (0.1) and has recently been extended to cover generic convex optimization problems over Banach spaces [BCDC+19, BC20, Uns21, UA22]. In simple terms (and under suitable assumptions), this abstract result characterizes the solution set of (0.1) in terms of the extreme points of the unit ball of the regularization functional
$$\{f \in S : R(f) \le 1\}. \qquad (0.2)$$
Hence, the original problem can be translated into finding the extreme points of the unit ball appearing in (0.2).
In this paper, we study problems arising from a particular, yet general, choice of the items appearing in the functional in (0.1). In particular:
a) the hypothesis space consists of the functions $f : \Omega \to \mathbb{R}$ with bounded $p$-Hessian-Schatten variation (see item b)), for some open $\Omega \subseteq \mathbb{R}^d$. This space coincides with Demengel's space [Dem84] of functions with bounded Hessian, which was introduced to study models of plastic deformations of solids and has also proven useful in the context of image processing, but the norm we adopt is specific and allows for optimal approximation results by continuous and piecewise affine functions when $p = 1$;
b) the regularizing term is the $p$-Hessian-Schatten variation $|D^2_p \cdot|(\Omega)$, which coincides with the relaxation of the functional
$$C^2(\Omega) \ni f \mapsto \int_\Omega |\nabla^2 f|_p \, d\mathcal{L}^d$$
(here and below $|\cdot|_p$ denotes the $p$-Schatten norm). This is a variant of the classical second-order total variation ([ACU21]). It has been inspired by [HS06, BP10, KBPS11, LWU13, LU13] and used in [CAU21, PGU22];
c) in the critical case $d = 2$ we consider as linear forward operator the evaluation functional at certain points $x_1, \dots, x_N \in \mathbb{R}^2$, with observed data $(y_1, \dots, y_N) \in \mathbb{R}^N$;
d) still in the critical case, the error term is taken to be an $\ell^q$ norm, i.e. $E(f) := \|(f(x_i) - y_i)_{i=1,\dots,N}\|_{\ell^q}$;
e) the tunable parameter is $\lambda \in (0, \infty]$, where by convention $\lambda = \infty$ imposes a perfect fit with the data.
In view of the discussion above, some questions arise naturally.
i) The description of the extremal points of the ball (cf. (0.2)) modulo additive affine functions (since the Hessian-Schatten seminorm is invariant under the addition of affine functions, this factorization is necessary). A reasonable description of these extremal points was given in [AABU22], under the assumption that a certain density conjecture holds true. Namely, it has been proved that if CPWL functions are dense in energy in the space of functions with bounded Hessian-Schatten variation, then all extremal points, which obviously are on the sphere, are found in the closure of the CPWL extremal points (and this last set is rather manageable, see [AABU22]). Here and below, a CPWL (Continuous and PieceWise Linear) function is a piecewise affine function, affine on certain simplexes. In Section 2 we give a positive answer to the just mentioned conjecture, proved only in the two-dimensional case in [AABU22] with a different, more constructive, strategy. As any CPWL function can be exactly represented by a neural network with rectified linear unit (ReLU) activation functions [ABMM16], our result (Theorem 2.4) in particular implies approximability of any function whose Hessian has bounded total variation by means of neural networks with ReLU activation functions, with convergence of the 1-Hessian-Schatten norm.
ii) Again with respect to the extremal points of the set described in (0.3), one may wonder whether all the extremal points are CPWL. By a delicate measure-theoretic analysis, in Section 3 we show that the answer is negative: functions whose graphs are cut cones are extremal, modulo affine functions, and these functions are not CPWL if $d \ge 2$.
In connection with this negative answer, as for compact convex sets exposed points are dense in the class of extreme points, it would be interesting to know whether cut cones are also exposed, namely if there exist linear continuous functionals attaining their minimum, when restricted to the closed unit ball of the Hessian-Schatten seminorm, only at a cut cone.
iii) In the two-dimensional case, one may wonder whether the functional (0.1) admits minimizers, with the choice of error and regularizing term described above. In Section 4 we give a positive answer, for a large set of choices of the parameters $\lambda$, $p$ and $q$.
Now we pass to a more detailed description of the content of the paper. Namely, we examine separately the answers to items i), ii) and iii) above and we sketch their proofs.
Density of CPWL functions. In Section 2 we address the problem of density in energy $|D^2_1 \cdot|(\Omega)$ of CPWL functions in the set of functions with bounded Hessian-Schatten variation. Our main result is Theorem 2.2, stated for $C^2$ targets; the localized version, Theorem 2.4, for targets with finite $p$-Hessian-Schatten variation, then follows. The proof of Theorem 2.2 heavily relies on a fine study of triangulations of $\mathbb{R}^d$ and consists morally of three parts.
Part 1 is Section 2.1 and deals with general properties of triangulations (considered as couples of sets, the set of vertices and the set of elements), the most important ones being the Delaunay, non-degeneracy and uniformity properties (items (a), (b) and (c) of Definition 2.7). Roughly speaking, the Delaunay property states that, given an element of the triangulation, no vertex of the triangulation lies inside the circumsphere of the given element. It entails regularity properties, among them the fact that angles in the elements are not too small. This leads to the non-degeneracy property, crucial to estimate geometric quantities related to an element in terms of the volume of the given element. Finally, uniformity states that the vertices of the triangulation look like a rotation of a rescaling of the lattice $\mathbb{Z}^d$. The main results are Lemma 2.9, which allows us to obtain a Delaunay triangulation starting from a uniform set of vertices, and Lemma 2.13, which studies Delaunay triangulations whose vertices locally coincide with a rotation of a rescaling of the lattice $\mathbb{Z}^d$.
Part 2 is Section 2.2 and aims at constructing a "good" triangulation (in the sense of Part 1) that locally follows a prescribed orientation. The outcome is Theorem 2.14, and the main difficulty in its proof lies in "gluing" the various sub-triangulations to allow for the variable orientation (see Figure 3).
Part 3 is the proof of the density result, Section 2.3. We exploit the outcome of Part 2 to build a triangulation that locally follows the orientation given by the Hessian of $w$, $\nabla^2 w$, in the sense that it is given by an orthonormal basis of eigenvectors for $\nabla^2 w$. Then we take $u$, the affine interpolation of $w$ with respect to this triangulation, which will be a good approximation. The contribution of the Hessian-Schatten variation of $u$ on regions in which the orientation of the triangulation is constant (and hence adapted to the Hessian of $w$) is estimated thanks to the good choice of the orientation, whereas the contribution around the boundaries of these regions, i.e. where the gluing took place, is controlled by the regularity properties of the triangulation and the smallness of these regions.
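As a small numerical illustration of what density in energy means (this is not the construction used in the proof), one can approximate the planar cut cone $x \mapsto (1 - |x|/r)_+$ by regular $n$-gon pyramids. The sketch below assumes the standard fact that, for a CPWL function in the plane, the Hessian-Schatten variation is the sum over mesh edges of the gradient jump times the edge length (the jumps are rank one, so the value does not depend on $p$). The function name `pyramid_hs_variation` is hypothetical.

```python
import math


def pyramid_hs_variation(n, r=1.0, height=1.0):
    """Sum of |gradient jump| * edge length for a regular n-gon pyramid.

    The pyramid equals `height` at the origin, vanishes at the n base
    vertices (on the circle of radius r) and is extended by zero outside.
    """
    if n < 3:
        raise ValueError("need at least 3 sides")
    verts = [(r * math.cos(2 * math.pi * k / n),
              r * math.sin(2 * math.pi * k / n)) for k in range(n)]

    # Gradient on sector k (triangle 0, v_k, v_{k+1}): solve
    # g . v_k = g . v_{k+1} = -height by Cramer's rule.
    grads = []
    for k in range(n):
        (x1, y1), (x2, y2) = verts[k], verts[(k + 1) % n]
        det = x1 * y2 - x2 * y1
        grads.append((-height * (y2 - y1) / det, -height * (x1 - x2) / det))

    total = 0.0
    for k in range(n):
        g, g_next = grads[k], grads[(k + 1) % n]
        (x1, y1), (x2, y2) = verts[k], verts[(k + 1) % n]
        # Interior ridge from the apex to v_{k+1}, shared by sectors k, k+1.
        total += math.hypot(g[0] - g_next[0], g[1] - g_next[1]) * r
        # Base edge [v_k, v_{k+1}]: the function is zero on the other side.
        total += math.hypot(g[0], g[1]) * math.hypot(x2 - x1, y2 - y1)
    return total
```

Under these assumptions the sum equals $4n\tan(\pi/n)$ for unit height, independently of $r$, and decreases to $4\pi$ as $n \to \infty$, which a direct computation for radial functions suggests is the variation of the cone itself in dimension 2.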

Extremality of cones.
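In Section 3, we prove that functions whose graphs are cut cones are extremal with respect to the Hessian-Schatten total variation seminorm. Namely, we prove that functions defined as
$$f^{\mathrm{cone}}(x) := (1 - |x|)_+$$
are extremal modulo affine functions, in the sense that if, for some $\lambda \in (0, 1)$,
$$f^{\mathrm{cone}} = \lambda f_1 + (1 - \lambda) f_2 \quad \text{with} \quad |D^2_p f_1|(\mathbb{R}^d) = |D^2_p f_2|(\mathbb{R}^d) = |D^2_p f^{\mathrm{cone}}|(\mathbb{R}^d),$$
for some $p \in [1, \infty)$, then $f_1$ and $f_2$ are equal to $f^{\mathrm{cone}}$, up to affine functions (Theorem 3.1).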
Our strategy is as follows. First, we set $f_i^{\mathrm{rad}}$ to be the radial symmetrization of $f_i$, for $i = 1, 2$. As $f^{\mathrm{cone}}$ is radial, a simple computation yields that the same relations still hold with $f_i^{\mathrm{rad}}$ in place of $f_i$. This implies without much effort that $f_i^{\mathrm{rad}} = f^{\mathrm{cone}}$, up to affine terms, thanks to the explicit computation of the Hessian-Schatten total variation of radial functions (Proposition 1.13).
The bulk of the proof is then to prove that whenever we have $f$ such that ), then $f$ equals $f^{\mathrm{cone}}$, up to affine terms. In other words, in the case $f^{\mathrm{rad}} = f^{\mathrm{cone}}$, we have rigidity of the property that

Case $p = 1$ is dealt with in Proposition 3.5. For its proof, a key remark is the fact that, if $\Delta$ denotes the distributional Laplacian, then $|\Delta f| \le |D^2_1 f|$ as measures (by the inequality $|\mathrm{Tr}(A)| \le |A|_1$). Hence, by $f^{\mathrm{rad}} = f^{\mathrm{cone}}$, we have that where the second inequality is obtained by explicit computation (or by concavity of $f^{\mathrm{cone}}$ in $B_1$). This then implies that (at the right hand side there is the total variation of the matrix-valued measure $D\nabla f$ with respect to the 1-Schatten norm) so that $\mathrm{tr}(D\nabla f) = -|D\nabla f|_1$ almost everywhere, which implies that the eigenvalues of $D\nabla f$ are all negative, almost everywhere (Lemma 3.3), by rigidity in the inequality $|\mathrm{Tr}(A)| \le |A|_1$. Then, by Lemma 3.2, it follows that $f$ has a continuous concave representative in $B_1$. Finally we exploit concavity to obtain the pointwise bound $f \ge f^{\mathrm{cone}}$ in $B_1$, which, combined with the integral equality $f^{\mathrm{rad}} = f^{\mathrm{cone}}$, implies the claim.
Case $p \in (1, \infty)$ is dealt with in Proposition 3.6, where we reduce ourselves to the case $p = 1$, namely we show that the information ), whence we can use what was proved in the case $p = 1$. This reduction is done by treating separately the absolutely continuous and singular parts of $|D^2_p f|$. The former is treated by exploiting the strict convexity of the $p$-Schatten norm together with the scaling property of the map $p \mapsto |D^2_p f^{\mathrm{cone}}|$, whereas the latter is treated by Alberti's rank 1 Theorem ([Alb93]), in conjunction with the fact that the $p$-Schatten norm of rank 1 matrices is independent of $p$.
Solutions to the minimization problem. In Section 4 we restrict ourselves to the two-dimensional Euclidean space. Indeed, we want to exploit the continuity of functions with bounded Hessian-Schatten variation in dimension 2 ([AABU22], see Proposition 1.11) to have a meaningful evaluation functional and define, for $\lambda \in [0, \infty]$,
$$F_\lambda(f) := |D^2_1 f|(\Omega) + \lambda \big\| (f(x_i) - y_i)_{i=1,\dots,N} \big\|_{\ell^1},$$
where $x_1, \dots, x_N \in \Omega$ are distinct points and $y_1, \dots, y_N \in \mathbb{R}$. Also, we are adopting the convention that $\infty \cdot 0 = 0$. Notice that $F_\lambda$ is the sum of the regularizing term $|D^2_1 f|$ and the weighted (by $\lambda$) error term $\lambda \|(f(x_i) - y_i)_{i=1,\dots,N}\|_{\ell^1}$, and that $F_\lambda$ can be seen as a relaxed version of $F_\infty$.
In Section 4, we will consider slightly more general functionals, see (4.1), but for the sake of clarity we restrict ourselves to a particular case in this introduction. Our aim is to prove existence of minimizers of $F_\lambda$ (Theorem 4.2). Notice that in higher dimension ($d \ge 3$), $F_\lambda$ is not well defined (by the lack of continuity), and, even if we try to define it by imposing continuity on its domain, minimizers do not exist in general, as the infimum of $F_\lambda$ is always zero. To see this last claim, simply exploit the scaling property of the Hessian-Schatten total variation (or use Proposition 1.13) for functions of the kind $x \mapsto y_i (1 - |x - x_i|/r)_+$ as $r \searrow 0$.
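The scaling claim can be checked with a short closed-form computation. The sketch below is an illustration under stated assumptions, not the statement of Proposition 1.13: it uses the fact that the Hessian of a radial function $g(|x|)$ has eigenvalues $g''$ (once) and $g'(|x|)/|x|$ ($d-1$ times), and adds the contribution of the gradient jump across the sphere $|x - x_i| = r$. The names `unit_ball_volume` and `cut_cone_hs_variation` are hypothetical.

```python
import math


def unit_ball_volume(d):
    """Lebesgue measure of the unit ball of R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def cut_cone_hs_variation(d, p, r, height=1.0):
    """p-Hessian-Schatten variation of x -> height * (1 - |x|/r)_+ in R^d.

    Assumed derivation: inside the ball the Hessian has d-1 eigenvalues of
    size |height| / (r |x|) and one zero eigenvalue; across the sphere of
    radius r the gradient jumps by |height| / r.
    """
    if d < 2:
        raise ValueError("the formula below is written for d >= 2")
    sphere_area = d * unit_ball_volume(d)  # surface area of the unit sphere
    # Absolutely continuous part: integral of (d-1)^(1/p) |height| / (r |x|).
    ac_part = (sphere_area * (d - 1) ** (1.0 / p)
               * abs(height) * r ** (d - 2) / (d - 1))
    # Singular part: rank-one jump, so its Schatten norm does not depend on p.
    jump_part = sphere_area * abs(height) * r ** (d - 2)
    return ac_part + jump_part
```

With these assumptions the value is proportional to $r^{d-2}$: it tends to zero as $r \searrow 0$ when $d \ge 3$, while for $d = 2$ and $p = 1$ it equals $4\pi |y_i|$ for every $r$, which matches the critical value $\lambda_c = 4\pi$ appearing in the proof sketch.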
We now sketch the proof of the existence of minimizers of $F_\lambda$. There are two key steps. We denote by $\lambda_c := 4\pi$ the "critical" value for $\lambda$.
Step 1. First we prove existence of minimizers of $F_\lambda$, for $\lambda \in [0, \lambda_c]$. This is done via the direct method of the calculus of variations, after we prove relative compactness of minimizing sequences and semicontinuity of this functional. Compactness, proved in Proposition 4.9, is mostly due to the estimates of [AABU22], see Proposition 1.11. Semicontinuity is then proved in Lemma 4.8, and here the choice of $\lambda \in [0, \lambda_c]$ plays a role. The key idea is that, given a point $x_i$ and a converging sequence $f_k \to f$, either $|D^2_1 f_k|$ concentrates at $x_i$ or it does not. In the former case (Lemma 4.7), as a part of $|D^2_1 f_k|$ concentrates at $x_i$ (and $|D^2_1 f|(\{x_i\}) = 0$, points being of codimension 2), we experience a drop in the regularizing term of the functional, and this drop is enough to offset the lack of convergence of the evaluation term $f_k(x_i)$ in the error term. In the latter case (Lemma 4.7 again), we have instead convergence of $k \mapsto f_k(x_i)$.
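Step 2. We prove the existence of minimizers of $F_\lambda$, for $\lambda \in [\lambda_c, \infty]$. By Step 1, we can take a minimizer $f$ of $F_{\lambda_c}$. Then we modify $f$ to obtain $\tilde f$ with a perfect fit with the data, i.e. $\tilde f(x_i) = y_i$ for every $i$. Such a modification is obtained by adding to $f$ a suitable linear combination of "cut cones", namely functions $x \mapsto y_i (1 - |x - x_i|/r)_+$, for $r$ small enough. As $\tilde f$ has a perfect fit with the data, for any $\lambda$,
$$F_\lambda(\tilde f) = |D^2_1 \tilde f|(\Omega) \le F_{\lambda_c}(f),$$
where the inequality is due to the construction of $\tilde f$. Now, as $F_\lambda \ge F_{\lambda_c}$ (here the choice $\lambda \in [\lambda_c, \infty]$ plays a role) and as $f$ is a minimizer of $F_{\lambda_c}$, we see that $\tilde f$ is a minimizer of $F_\lambda$.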

Therefore, putting together what we have seen in Step 1 and in Step 2, we have that for every $\lambda \in [0, \infty]$ there exists a minimizer of $F_\lambda$.
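The role of the critical value can be made concrete by comparing two simple competitors, the zero function and a sum of cut cones with pairwise disjoint small supports that fits the data exactly. The sketch below assumes that each planar cut cone of height $y_i$ costs $4\pi |y_i|$ in the regularizing term (independently of its radius) and that the error term is the $\ell^1$ misfit; the function names are hypothetical and the comparison is only heuristic, not a substitute for the proof.

```python
import math

LAMBDA_C = 4 * math.pi  # critical value of lambda from the proof sketch


def energy_of_zero(lam, ys):
    """F_lambda at f = 0: no regularization cost, misfit sum |y_i|."""
    misfit = sum(abs(y) for y in ys)
    # Convention infinity * 0 = 0, so that lam = inf means "perfect fit".
    return 0.0 if misfit == 0 else lam * misfit


def energy_of_cones(ys):
    """F_lambda at a sum of disjoint cut cones interpolating the data.

    The misfit vanishes, so the value does not depend on lambda; each cone
    is assumed to contribute 4 * pi * |y_i| to the regularizing term.
    """
    return LAMBDA_C * sum(abs(y) for y in ys)
```

For $\lambda < 4\pi$ the zero function is the cheaper of the two, for $\lambda > 4\pi$ the interpolating cones are, and at $\lambda = 4\pi$ the two values coincide, which is consistent with the way cut cones are used in Step 2.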

Preliminaries
In this short section we first recall basic facts about Hessian-Schatten seminorms and then in Section 1.3 we add an explicit formula to compute Hessian-Schatten variations of radial functions.
1.1. Schatten norms. We recall basic facts about Schatten norms, see [AABU22] and the references therein. We recall that the scalar product between $M, N \in \mathbb{R}^{d \times d}$ is defined by
$$M \cdot N := \mathrm{tr}(M^t N) = \sum_{i,j} M_{i,j} N_{i,j}$$
and induces the Hilbert-Schmidt norm. Next, we enumerate several properties of the Schatten norms that shall be used throughout the paper , where $C = C(d, p, q)$ depends only on $d$, $p$ and $q$.
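For readers who want to experiment, here is a minimal sketch of the $p$-Schatten norm, i.e. the $\ell^p$ norm of the singular values, restricted to symmetric $2 \times 2$ matrices, where the singular values are the absolute values of the eigenvalues and can be written in closed form. The restriction to this case and the name `schatten_norm_sym2` are assumptions made for the illustration.

```python
import math


def schatten_norm_sym2(a, b, c, p):
    """p-Schatten norm of the symmetric matrix [[a, b], [b, c]].

    For a symmetric matrix the singular values are the absolute values of
    the eigenvalues, which for a 2x2 matrix are mean +/- radius below.
    """
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    s1, s2 = abs(mean + radius), abs(mean - radius)
    if p == math.inf:
        return max(s1, s2)
    return (s1 ** p + s2 ** p) ** (1 / p)
```

For instance, the rank-one matrix $v \otimes v$ with $v = (3, 4)$ has norm $|v|^2 = 25$ for every $p$, while the identity has norm $2^{1/p}$; the first observation is the independence of $p$ on rank-one matrices mentioned in the introduction.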
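Definition 1.4. Let $A \subseteq \mathbb{R}^d$ be a domain. We say that $A$ supports Poincaré inequalities if for every $q \in [1, d)$ there exists a constant $C = C(A, q)$ depending on $A$ and $q$ such that
$$\Big\| f - \frac{1}{\mathcal{L}^d(A)} \int_A f \, d\mathcal{L}^d \Big\|_{L^{q^*}(A)} \le C \, \|\nabla f\|_{L^q(A)} \quad \text{for every } f \in W^{1,q}(A),$$
where $1/q^* = 1/q - 1/d$.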
1.2. Hessian-Schatten total variation. For this section fix $\Omega \subseteq \mathbb{R}^d$ open and $p \in [1, \infty]$. We let $p^*$ denote the conjugate exponent of $p$. Now we recall the definition of Hessian-Schatten total variation and some basic properties, see [AABU22] and the references therein.
Definition 1.5 (Hessian-Schatten variation). Let $f \in L^1_{\mathrm{loc}}(\Omega)$. For every $A \subseteq \Omega$ open we define
$$|D^2_p f|(A) := \sup_F \int_A \sum_{i,j=1}^d f \, \partial_i \partial_j F_{i,j} \, d\mathcal{L}^d, \qquad (1.1)$$
where the supremum runs among all $F \in C^\infty_c(A)^{d \times d}$ with $\|F\|_{p^*, \infty} \le 1$. We say that $f$ has bounded $p$-Hessian-Schatten variation in $\Omega$ if $|D^2_p f|(\Omega) < \infty$.

Remark 1.6. If $f$ has bounded $p$-Hessian-Schatten variation in $\Omega$, then the set function defined in (1.1) is the restriction to open sets of a finite Borel measure, that we still call $|D^2_p f|$. This can be proved with a classical argument, building upon [DGL77] (see also [AFP00, Theorem 1.53]).
By its very definition, the p-Hessian-Schatten variation is lower semicontinuous with respect to convergence in distributions.
For any couple $p, q \in [1, \infty]$, $f$ has bounded $p$-Hessian-Schatten variation if and only if $f$ has bounded $q$-Hessian-Schatten variation, and moreover
$$C^{-1} |D^2_p f| \le |D^2_q f| \le C |D^2_p f|$$
for some constant $C = C(d, p, q)$ depending only on $d$, $p$ and $q$. This is due to the equivalence of matrix norms.

The next proposition connects Definition 1.5 with Demengel's space of functions with bounded Hessian [Dem84], namely Sobolev functions whose partial derivatives are functions of bounded variation. We shall use $D$ to denote the distributional derivative, to keep the distinction with the $\nabla$ notation (used also for gradients of Sobolev functions).
Proposition 1.7.Let f ∈ L 1 loc (Ω).Then the following are equivalent: If this is the case, then, as measures, In particular, there exists a constant C = C(d, p) depending only on d and p such that where the infimum is taken among all sequences above can be replaced by convergence in L 1 (A).
In the statement of the next lemma and in the sequel we denote by Lemma 1.9.Let f ∈ L 1 loc (Ω) with bounded Hessian-Schatten variation in Ω.Let also In the same spirit of Lemma 1.9, we have the following lemma.
Lemma 1.10.Let f ∈ L 1 loc (Ω) with bounded Hessian-Schatten variation in Ω. Assume that A ⊆ Ω is open and invariant under the action of SO(R d ).For any In particular, setting where µ d is the Haar measure on SO(R d ), by convexity one has Proof.The proof is very similar to the one of Lemma 1.9 above i.e. [AABU22, Lemma 12], but we sketch it anyway for the reader's convenience and for future reference.We take any F ∈ C ∞ c (A) n×n with F p * ,∞ ≤ 1 and we set G := U F (U t • )U t .A straightforward computation shows that i,j Then we compute, by a change of variables, In particular,

Now, by Fubini's Theorem
A i,j whence the claim as F was arbitrary.
Proposition 1.11 (Sobolev embedding). Let $f \in L^1_{\mathrm{loc}}(\Omega)$ with bounded Hessian-Schatten variation in $\Omega$. Then and, if $d = 2$, $f$ has a continuous representative. More explicitly, for every bounded domain $A \subseteq \Omega$ that supports Poincaré inequalities and $r \in [1, \infty)$, there exist $C = C(A, r)$ and an affine map $g = g(A, f)$ such that, setting $\tilde f := f - g$, it holds that

Lemma 1.12 (Rigidity). Let $f, g \in L^1_{\mathrm{loc}}(\Omega)$ with bounded Hessian-Schatten variation in $\Omega$ and assume that p g| as measures on $\Omega$.
1.3. Hessian-Schatten variation of radial functions. The following result is new and aims at computing the Hessian-Schatten variation of radial functions. This will be needed in Section 3 and Section 4. Notice also that, as expected, the contribution involving the singular part of $|Dg'|$ in (1.2) below does not depend on $p$.
In the proof we shall use the auxiliary function where $v_1$ is repeated $d - 1$ times and $\omega_d := \mathcal{L}^d(B_1)$ ($d$ will be the dimension of the Euclidean ambient space). Notice that $F$ is continuous, convex and 1-homogeneous with respect to the $(v_1, v_2)$ variable. Therefore, for intervals $(r_1, r_2) \subseteq (0, R)$, the functional
$$\Phi(\mu) := \int_{(r_1, r_2)} F\Big(s, \frac{d\mu}{d\lambda}\Big) \, d\lambda \quad \text{whenever } |\mu| \ll \lambda,$$
defined on $\mathbb{R}^2$-valued measures $\mu$, makes sense and is convex. Furthermore, Reshetnyak's lower semicontinuity theorem (e.g. [AFP00, Theorem 2.38]) grants its lower semicontinuity with respect to weak convergence in duality with $C_c((r_1, r_2))$.
Proof. Let $r \in (0, R)$. Let $\rho_k$ be radial Friedrichs mollifiers for $\mathbb{R}^d$ and define Notice that the eigenvalues of the matrix appearing at the right hand side of the equation above are $g_k''(|x|)$ with multiplicity 1 and $g_k'(|x|)/|x|$ with multiplicity $d - 1$, the eigenvectors being $x$ and a basis of $x^\perp$. Therefore, by Proposition 1.7, on $B_r(0)$ one has ) is uniformly bounded by Lemma 1.9, we obtain the claimed membership for $g$, letting eventually $r \nearrow R$.

, where we define the open annulus
Also, there is no loss of generality in assuming that $r_1$ and $r_2$ are such that $|Dg'|(\{r_1\}) = |Dg'|(\{r_2\}) = 0$, as well as $|D\nabla f|(\partial A_{r_1, r_2}) = 0$, hence we will tacitly assume this condition in what follows.
From (1.3), with the notation $\mu_g := (g' \mathcal{L}^1, Dg')$, we get Now notice that Lemma 1.9 and our choice of radii grant ), so that the lower semicontinuity of $\Phi$ together with the weak* convergence of $\mu_{g_k}$ to $\mu_g$ grants Letting $r_1 \to 0$ and $r_2 \to r$ provides the inequality $\ge$ in (1.2).

Now we prove the converse implication and inequality. This time we denote by $(\rho_k)$ a sequence of Friedrichs mollifiers on $\mathbb{R}$ and we call Notice that, with our choice of the radii, Letting $r_1 \to 0$ and $r_2 \to R$ gives that $f$ has bounded Hessian-Schatten total variation in $B_R(0) \setminus \{0\}$. To conclude, obtaining also the converse inequality in (1.2), we just need to apply the classical Lemma 1.14 below to $f$ and to the partial derivatives of $f$, taking into account the mutual absolute continuity of $|D^2_p f|$ and $|D\nabla f|$ (Proposition 1.7).
Proof.By a truncation argument, we can assume with no loss of generality that h is bounded.Then, the approximation of h by the functions in a neighbourhood of 0, together with Leibniz rule, provides the result.

Density of CPWL functions
We recall the definition of continuous piecewise linear (CPWL) functions. In view of this definition, we recall that a simplex in $\mathbb{R}^d$ is the convex hull of $d + 1$ points (called vertices of the simplex) that do not lie on a hyperplane, and a face of a simplex is the convex hull of a subset of its vertices.
either empty or a common face of $P_k$ and $P_h$, for every $h \ne k$;
ii) for every $k$, the restriction of $f$ to $P_k \cap \Omega$ is affine;
iii) the decomposition is locally finite, in the sense that for every ball $B$, only finitely many $P_k$ intersect $B$.
The main theorem of this section is the following density result.
Theorem 2.2. For any $w \in C^2(\mathbb{R}^d)$ there exists a sequence

Recall that, as explained in [AABU22, Remark 22], because of lower semicontinuity the exponent $p = 1$ is the only meaningful exponent in a density result as above, namely this sharp approximation by CPWL functions is not possible for the energy $|D^2_p f|$ when $p > 1$. We defer the proof of Theorem 2.2 to Section 2.3, after having studied properties of "good" triangulations in Section 2.1 and Section 2.2. Namely, we aim to construct triangulations of $\mathbb{R}^d$ which locally follow a prescribed orientation. The general scheme is illustrated in Figure 2. In each of the large squares the triangulation coincides with a rotation of a triangulation of $\varepsilon \mathbb{Z}^d$; the difficulty resides in the interpolation region between different squares. In Section 2.1 we discuss standard material on general properties of triangulations. In Section 2.2 we present the specific construction; the key result is Theorem 2.14. This is then used to prove density in Theorem 2.2.

First, we start with a brief discussion around the result of Theorem 2.2. We recall the following extension result, [AABU22, Lemma 17]. Its last claim is immediate, once one takes into account also Proposition 1.11.
Lemma 2.3. Let $\Omega := (0, 1)^d \subseteq \mathbb{R}^d$ and let $f \in L^1_{\mathrm{loc}}(\Omega)$ with bounded Hessian-Schatten variation in $\Omega$. Then there exist an open neighbourhood $\tilde\Omega$ of $\overline{\Omega}$ and $\tilde f \in L^1_{\mathrm{loc}}(\tilde\Omega)$ with bounded Hessian-Schatten variation in $\tilde\Omega$ such that

The following result gives a positive answer to [AABU22, Conjecture 1], partially proved in the two-dimensional case in [AABU22, Theorem 21]. The proof is based on Theorem 2.2 and a diagonal argument.
Theorem 2.4.Let Ω := (0, 1) d ⊆ R d .Then CPWL functions are dense with respect to the energy |D 2 1 • |(Ω) in the space {f ∈ L 1 loc (Ω) : f has bounded Hessian-Schatten variation in Ω} with respect to the L 1 (Ω) topology.Namely, for any f ∈ L 1 loc (Ω) with bounded Hessian-Schatten variation in Ω, there exists Proof.Take f as in the statement, and let f be given by Lemma 2.3.By using smooth cut-off functions, there is no loss of generality in assuming that f is compactly supported in Ω, hence, in particular, f ∈ L 1 (R d ).Also, we see that we can assume that , thanks to Proposition 1.9 and lower semicontinuity.Now, for any k, take ( fk,h ) ⊆ CPWL(R d ) be given by Theorem 2.2 for fk .With a diagonal argument, we obtain (g  Remark 2.6.The set of extremal points is not closed with respect to the convergence considered here.For example, with d = 2, one can easily check that the function g(x) Let us briefly comment on the proof of extremality of G h (the same argument implies extremality of g).
), then by Lemma 1.12 the support of $|D^2_p f|$ is contained in the support of $|D^2_p G_h|$, so that $f$ (after choosing the continuous representative) is affine in each of the sets on which $G_h$ is affine. Adding an irrelevant affine function, we can reduce to the case that $f = 0$ outside $R_h$. Using the fact that if two affine functions coincide on three non-collinear points then they coincide everywhere, one obtains $f = a G_h$, where $a := f((1 + h) e_1) \in \mathbb{R}$ (see Fig. 1); by equality of the norms $a = \pm 1$. Similarly, $f = \pm G_h$, so that by

2.1. General properties of triangulations. We define a triangulation of $\mathbb{R}^d$ as a pair of two sets, the first one, $V$, containing the vertices (nodes), the second one, $E$, containing the elements, which are nondegenerate compact simplexes with pairwise disjoint interior. Each simplex is the convex hull of its $d + 1$ vertices. One further requires a compatibility condition that ensures that neighbouring elements share a complete face (and not a strict subset of a face). We remark that there is a large literature which studies this in the more general framework of simplicial complexes. For the present application the metric and regularity properties are crucial; we present in this section, in a self-contained way, the few properties which are relevant here.

for all $e \in E$.
The triangulation is locally finite if, for every ball B, only finitely many elements of E intersect B.
Condition iii) states that two distinct elements of E are either disjoint or share a face of dimension between 0 and d − 1; in particular distinct elements have disjoint interior.Notice that conv (∅) = ∅.
The Delaunay property (a) states that the circumscribed sphere to each simplex does not contain any other vertex, and implies ∂e ∩ V = v e for all e ∈ E. It can be interpreted as a statement that the vertices have been matched to form simplexes in an "optimal" way.
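In the plane the Delaunay property can be tested directly. The following sketch, written under the assumption that triangles are given as triples of vertex coordinates, computes the circumscribed circle of each triangle and checks that no vertex lies strictly inside it; the function names and the tolerance are hypothetical choices for the illustration.

```python
import math


def circumcircle(a, b, c):
    """Center and radius of the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("degenerate (collinear) triangle")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)


def is_delaunay(triangles, vertices, tol=1e-12):
    """True if no vertex lies strictly inside any circumscribed circle."""
    for tri in triangles:
        (ux, uy), radius = circumcircle(*tri)
        for (vx, vy) in vertices:
            if math.hypot(vx - ux, vy - uy) < radius - tol:
                return False
    return True
```

For the four points $(0,0)$, $(4,0)$, $(2,1)$, $(2,-1)$, splitting the quadrilateral along the short diagonal gives a Delaunay triangulation, while splitting along the long diagonal does not.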
The non-degeneracy property (b) states that simplexes are uniformly non-degenerate, so that the affine bijection that maps $e$ onto the standard simplex has a uniformly bounded condition number. It implies that there is $C = C(c_*, d)$ such that for any $e \in E$, any $x \in v_e$, any $F \in \mathbb{R}^d$ one has

The uniformity property (c) of a set $V$ of vertices ensures (for Delaunay triangulations) that all sides of all elements have length comparable to $\varepsilon$. Also, property (c) immediately implies property (d), as it forces $V$ to be a locally finite set.
Proof. Take $e \in E$ and let $q \in \mathbb{R}^d$ and $r \in (0, \infty)$ be such that $v_e \subseteq \partial B_r(q)$. By the Delaunay property, $V \cap B_r(q) = \emptyset$, so that, by $(c, \varepsilon)$-uniformity, $c\varepsilon > r \ge \mathrm{diam}(e)/2$.

We next show how, given the set of vertices $V$, one can abstractly obtain a good triangulation. The construction is standard up to a perturbation argument. As we could not find a reference with the complete result, we prove it.
Lemma 2.9.Let V ⊆ R d be uniform in the sense of property (c) of Definition 2.7.Then there is E ⊆ P(R d ) such that (V, E) is a triangulation of R d with the Delaunay property (a).
Proof.We define f : Let g be the convex envelope of f , which is CPWL (see Lemma 2.10 below).Moreover, notice that has nonempty interior.Notice that A is compact, convex and coincides with the closure of its interior, and g(x) > µ + 2x • q for every x ∈ R d \ A. Also, we set then, µ + 2x • q < |x| 2 for all x ∈ V \ w.Now we show that ext (A) ⊆ V so that ext (A) ⊆ w and hence A = conv (w) with #w ≥ d+1 (as A has nonempty interior).Take indeed p ∈ ext (A) and assume p / ∈ V .Then, take a minimal set of points {p 1 , . . ., (this is possible by (2.7) of Lemma 2.10 below).As p ∈ ext (A), up to reordering, we can assume that p 1 / ∈ A, hence by g(p 1 ) > µ + 2p 1 • q we have that g(p) > µ + 2p • q, a contradiction.
The above equations can be rewritten as We set $r := \sqrt{\mu + |q|^2}$, so that these conditions are $w \subseteq \partial B_r(q)$ and $V \cap B_r(q) = \emptyset$, so that the set $w$ has the Delaunay property. Notice then that for every $x \in V$, there is at least one set $A$ as in (2.3) with nonempty interior and with $x \in A \cap V$ (this set was called $w$): this follows from the fact that $g$ is CPWL.
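The rewriting rests on completing the square: $|x|^2 - (\mu + 2x \cdot q) = |x - q|^2 - (\mu + |q|^2)$, so a point lies on, outside or inside the sphere $\partial B_r(q)$ with $r^2 = \mu + |q|^2$ exactly when its lift to the paraboloid lies on, above or below the affine function $x \mapsto \mu + 2x \cdot q$. A minimal numerical check of this identity, with hypothetical function names, could look as follows.

```python
def paraboloid_gap(x, q, mu):
    """|x|^2 - (mu + 2 x.q): height of the lifted point above the plane."""
    return sum(xi * xi for xi in x) - (mu + 2 * sum(xi * qi for xi, qi in zip(x, q)))


def sphere_gap(x, q, mu):
    """|x - q|^2 - r^2 with r^2 = mu + |q|^2."""
    r_squared = mu + sum(qi * qi for qi in q)
    return sum((xi - qi) ** 2 for xi, qi in zip(x, q)) - r_squared
```

With $q = (1, 2)$ and $\mu = 20$ (so $r = 5$), the point $(4, 6)$ gives a zero gap, points inside the disc give a negative gap and points outside a positive one, and the two functions agree at every point.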
Any decomposition of those elements A in (2.3) with nonempty interior into non degenerate simplexes with vertices in w leads to a pair (V, E) with all 4 claimed properties of triangulations, except for iii) of Definition 2.7.In the rest of the proof we show by a perturbation argument that a decomposition exists such that property iii), which relates neighbouring pieces in which g is affine, also holds.
We first remark that property iii) is automatically true if g is non degenerate, in the sense that each A is a simplex, which is the same as #w = d + 1 (we are going to add a few details about this in the sequel of the proof).In turn, this is true if for every choice of X := {x 1 , . . ., x d+2 } ⊆ V the d + 2 points {(x, g(x))} x∈X ∈ R d+1 do not lie in a d-dimensional hyperplane, so that (2.4) cannot hold for all x ∈ X.
We fix an enumeration ϕ : For a given set in the $d + 1$ unknowns $(\mu, q)$. The affine map $T : \mathbb{R}^{d+1} \to \mathbb{R}^{d+2}$ defined by $T_i(\mu, q) := \mu + 2 x_i \cdot q - |x_i|^2$ has an image which is at most $(d + 1)$-dimensional, hence contained in a set of the form $\{\Xi \in \mathbb{R}^{d+2} : \Xi \cdot \nu = a\}$ for some $\nu \in S^{d+1}$, $a \in \mathbb{R}$ (which depend on $X$). If the system (2.5) has a solution, then As $|\nu| = 1$ and the exponents are all distinct, this is a nontrivial polynomial equation in $\rho$, and has at most finitely many solutions. As there are countably many possible choices of the set $X \subseteq V$, for all but countably many values of $\rho$ no such system has a solution. Therefore we can choose $\rho_j \searrow 0$ such that (2.5) has no solution for any choice of $X$ with $X = \{x_1, \dots, x_{d+2}\} \subseteq V$.
Fix now an index j and let g ρ j be the convex envelope of f ρ j .Notice that if ρ j is sufficiently small (that we are going to assume from here on), then, as V is discrete and |x| 2 is strictly convex, for every x ∈ V.
We define E j as the family of those sets.
Let us justify why (V, E j ) is a triangulation of R d .It is enough to show that property iii) holds.Take then e 1 , e 2 ∈ E j (with vertices w 1 , w 2 ), so that there exist two affine functions L 1 , L 2 such that g ρ j = L i on e i and g The conditions Therefore w ⊆ B r (q), and either r ≤ ρ j or V ∩ B r−ρ j (q) = ∅, where r := µ + |q| 2 .By uniformity of the grid, necessarily r − ρ j < cε, which gives diam(A) ≤ 2r < 2cε + 2ρ j ≤ 2(c + 1)ε.
For any $x \in V$, the possible choices of $e$ with $x \in v_e$ are restricted by $\mathrm{diam}(e) < 2(c + 1)\varepsilon$, which implies $v_e \subseteq V \cap B_{2(c+1)\varepsilon}(x)$. As the grid is uniform, the latter set is finite, with a bound depending only on $c$. Therefore for any $x \in V$ we can choose a subsequence of $\rho_j$ such that the set $\{e \in E_j : x \in v_e\}$ is, after finitely many steps, constant. As there are countably many $x \in V$, we can choose a common diagonal subsequence. Along this sequence, for any bounded set $K$ the set $\{e \in E_j : e \subseteq K\}$ is, after finitely many steps, constant. Property iii) holds for $E_j$, and therefore for those sets. Therefore we obtain a common set $E$ with all desired properties. We remark that the Delaunay property indeed follows from the construction of $E$ and the discussion in the first part of the proof: if $e \in E$, it is easy to see that there exists an affine function coinciding with $g$ on $e$.
We next present the result on the regularity of convex envelopes used above.
Lemma 2.10.Let V ⊆ R d be a uniform set of vertices, in the sense of item (c) of Definition 2.7.Let f : V → [0, ∞) be superlinear, in the sense that (notice that we are not taking the closure of the convex hull at the right hand side).
Remark 2.11. The following facts are easy to verify.
i) The fact that V is uniform implies that g is real-valued.
ii) The assumption of superlinearity is necessary.Indeed, consider Proof of Lemma 2.10.For r ∈ (0, ∞), we write and let g r ≥ g be the convex envelope of f r .Since V is uniform, any set V ∩ B r is finite, and therefore g r is CPWL on conv (V ∩ B r ), and infinity outside.If r ≥ cε, with c, ε > 0 the constants from item (c) of Definition 2.7, the set V ∩ B r is nonempty.We shall show below that for any r > 0 there is R > 0 such that g = g R on B r/4 .This implies that g is CPWL on B r/4 for any r, and therefore the assertion.The choice of R (which depends on f and r) is done in (2.9) below.
For $r \ge c\varepsilon$ we define $\alpha_r := \max f(V \cap [-r, r]^d)$. We first prove that if $R/\sqrt{d} > r \ge 4c\varepsilon$ then
$$g_R(x) \le \alpha_r \quad \text{for all } x \in B_{r/2}. \tag{2.8}$$
To see this, let $q_1, \dots, q_{2^d}$ denote the vertices of the cube $[-1, 1]^d$. By uniformity of $V$, for each $i$ we can pick for all $i$, and therefore $g_R \le \alpha_r$ on $B_{r/2}$, which proves (2.8).
We next show that, if R is chosen sufficiently large, then g R = g on B r/4 .By convexity, (2.8), and g R ≥ 0 we obtain Lip(g R ; B r/4 ) ≤ 4α r /r.As g R is CPWL in B r/4 , for any y ∈ B r/4 there is an affine function a : R d → R such that y ∈ T a := {g R = a} ∩ B r/4 and T a has nonempty interior.The Lipschitz bound on g R then carries over to a, and we obtain |∇a| ≤ 4α r /r.By convexity of g R , we have a ≤ g R , so that a ≤ f on V ∩ B R .In order to obtain the same inequality outside B R , we consider any Finally, by (2.6) we can choose R > √ dr such that (2.9) Therefore a ≤ f everywhere, which implies a ≤ g ≤ g R , and in turn g = g R on T a and therefore on B r/4 .We prove now (2.7).Take x ∈ R d , so that, by what proved above, g(x) = g R (x) for some R > 0. Now notice that the epigraph of g R coincides with the convex hull of the epigraph of f R (here we are using that the convex hull of the epigraph of f R is closed), so that the conclusion is easily achieved.
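As an illustration of the objects appearing in Lemma 2.10, the following minimal sketch computes the convex envelope of a function given on finitely many points of the real line (the case d = 1, with V ∩ B r finite). It is not the construction used in the paper: the function names `cross`, `lower_hull` and `convex_envelope` are hypothetical, and the monotone-chain lower hull is simply one standard way to obtain the envelope in one dimension. The result is piecewise affine between hull vertices and equal to infinity outside the convex hull of the points, as stated for g r in the proof.

```python
def cross(o, a, b):
    """Signed area test: positive when o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Vertices of the lower convex hull of finitely many points (x, f(x))."""
    hull = []
    for p in sorted(points):
        # Drop the last vertex while it lies on or above the segment hull[-2] -> p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def convex_envelope(points, x):
    """Evaluate the convex envelope at x by interpolating along the lower hull."""
    hull = lower_hull(points)
    if x < hull[0][0] or x > hull[-1][0]:
        return float("inf")  # outside the convex hull of the sample points
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return (1 - t) * y0 + t * y1
    return hull[0][1]  # hull consists of a single point


# Samples of the strictly convex function |x|^2 on integers: every point is a hull vertex.
parabola = [(x, x * x) for x in range(-3, 4)]
# Raising the value at 0 removes that point from the envelope.
bumped = [(x, 5 if x == 0 else x * x) for x in range(-3, 4)]
```

For the strictly convex samples every point of V lies on the envelope, which is the one-dimensional analogue of the statement g ρ j = f ρ j on V used in the proof of Lemma 2.9.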
We next investigate in more detail Delaunay triangulations such that V locally coincides with Z d (possibly up to translations and rotations). We show in Lemma 2.13 below that the elements necessarily are the "natural" ones. Before doing so, we recall some basic properties of Z d , where, as usual, for
Remark 2.12. The following hold. i)
(2.10)
Proof.To prove the first item, we can change coordinates to assume that R = Id, and then, by scaling, we see that we can assume ε = 1.For each i = 1, . . ., d we select For the second one, by translation we can assume 0 ∈ v.The volume of the simplex conv v is given by 1/d! times the absolute value of the determinant of the matrix whose columns are the vectors of v \ {0}.As each component of each vector is integer, the determinant is an integer.Hence it is either 0, or at least 1.
The proof of the third item is similar.Again, assume 0 ∈ w.At least one e i is not contained in the linear space generated by w.We apply the first assertion to v := w ∪ {e i }, and obtain that the volume of T := conv v is either zero or at least 1/d!.Since the volume of T is also given by 1/d times the area of conv w times the distance of e i to the space generated by w, which is at most 1 since 0 ∈ w, we obtain (2.10).
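The determinant argument in the proof can be checked by brute force in low dimension. The sketch below is only a sanity check under stated assumptions: the helper names `det`, `simplex_volume` and `min_positive_volume` are hypothetical, and the search is restricted to simplices with vertices in the small grid {0, ..., side}^d. It uses exact rational arithmetic and the formula volume = |det| / d! quoted in the proof.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def det(m):
    """Exact determinant of a square integer matrix (Laplace expansion)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def simplex_volume(vertices):
    """Volume of conv(vertices) for d + 1 points in R^d: |det(edge vectors)| / d!."""
    base, rest = vertices[0], vertices[1:]
    d = len(base)
    edges = [[p[k] - base[k] for k in range(d)] for p in rest]
    return Fraction(abs(det(edges)), factorial(d))


def min_positive_volume(d, side=1):
    """Smallest nonzero volume among simplices with vertices in {0..side}^d."""
    points = [list(p) for p in product(range(side + 1), repeat=d)]
    volumes = {simplex_volume(list(v)) for v in combinations(points, d + 1)}
    return min(v for v in volumes if v > 0)
```

In dimensions 2 and 3 the smallest positive volume found this way is 1/2 and 1/6 respectively, in agreement with the lower bound 1/d! for nondegenerate lattice simplices.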
Lemma 2.13. Let (V, E) be a triangulation of R d with the Delaunay property and let B r (q) be a ball such that V ∩ B r (q) = εRZ d ∩ B r (q), for some ε > 0 and We remark that the assumption e ∩ B r−
Proof. By scaling and a change of coordinates it suffices to consider the case ε = 1, R = Id. Let e be as in the statement, and let B ρ (y) be such that v e ⊆ ∂B ρ (y). By the Delaunay property, using also the assumption in force here, (2.12)
We now want to show that ρ = √d/2. First, assume by contradiction that ρ > √d/2. We define ρ′ := min{ρ, r, (r + ρ − |q − y|)/2}. Condition (2.12) implies ρ′ > √d/2 and the definition of ρ′ gives so that there exists y′ ∈ B r−ρ′ (q) ∩ B ρ−ρ′ (y) (we adopt the convention that B 0 (x) = {x}). The point y′ then obeys B ρ′ (y′) ⊆ B r (q) ∩ B ρ (y) and therefore, recalling (2.11), B ρ′ (y′) ∩ Z d = ∅, which contradicts ρ′ > √d/2 (Remark 2.12(i)). Hence ρ ≤ √d/2, so that, using also (2.12), B ρ (y) ⊆ B r (q), and therefore, recalling (2.11), B ρ (y) ∩ Z d = ∅ and v e ⊆ Z d . We define z ∈ Z d by choosing for each i a component Assume that there exists i with |z i − y i | < 1/2, so that |z i − x i | < 1 for all x ∈ v e . As x i , z i ∈ Z, this implies x i = z i for all x ∈ v e , hence v e is contained in a (d − 1)-dimensional subspace of R d . As e is non degenerate (i.e. has nonempty interior), this is impossible, hence |z i − y i | = 1/2 for all i. We conclude that ρ = √d/2 and then v e ⊆ y + {−1/2, 1/2} d , which, since v e ⊆ Z d , also implies that y belongs to (Z + 1/2) d .
2.2. Construction of the triangulation.
We write Q ℓ (x) := x + (−ℓ/2, ℓ/2) d and Q ℓ := Q ℓ (0). Notice the factor 1/2, i.e. ℓ is the length of the edge of the open cube Q ℓ (x).
The aim of this section is to prove the following result (see Figure 3 for an illustration).
Theorem 2.14. For any d ≥ 2 there is C G = C G (d) with the following property. Let 0 < ε < δ with δ ≥ C G ε, and let R : Then there is a triangulation (V, E) of R d , in the sense of Definition 2.7, with the following properties:
i) Regularity: the triangulation has the Delaunay property (property (a)), is C G -non degenerate (property (b)), and is (C G , ε)-uniform (property (c)).
ii) Orientation: for each z ∈ δZ d one has
We start by proving that in a single cube we can construct a set of vertices V which coincides with εZ d on the boundary and with a rotation of the same lattice inside, and which is uniform and non-degenerate, in a sense made precise in the statement below. This will then be used to prove Theorem 2.14.
contained in a (d − 1)-dimensional affine subspace, and there is a ball B r (y) with v ⊆ ∂B r (y), B r (y) Proof.We divide the proof in several steps.
Step 1: general setting.To simplify notation we denote by Q out := Q M ε (z) the outer cube, by Q in := Q (M −2)ε (z) the inner cube, and by Q mid := Q (M −1)ε (z) the intermediate one (see Figure 4).We set has the desired properties.The property i) is true for any choice of V mid .Next we deal with ii), and leave the more delicate treatment of iii) at the end.
We show that for any q ∈ R d one has B 2dε (q) ∩ (V in ∪ V out ) = ∅.Consider first the case q ∈ Q mid .Let q be the point of and B √ dε/2 (q ) ⊆ Q in .By Remark 2.12, we can take we have p ∈ B √ dε/2 (q ) ⊆ B 2dε (q), and the first assertion in ii) is proved in this case.In the case q ∈ Q mid we argue similarly, projecting onto Therefore the first assertion in ii) is true for any choice of V mid .
It remains to choose V mid so that the property |x − y| ≥ ε/(2d) for all x = y ∈ V (i.e. the second assertion in ii)) is preserved, and iii) holds.In order to understand the strategy (cf.iii)), consider a set v and a ball B r (y) such that (2.14) The construction strategy of V mid then will ensure that: (a) sets v as in (2.14) cannot contain elements of both V in and V out ; (b) for any choice of v as in (2.14), with additionally Step 2: construction of U ε .We show here that there is a finite set U ε ⊆ ∂Q mid such that if the set V mid is constructed picking exactly one point z of each B ε/(4d) (u), for u ∈ U ε , then (a) and the second assertion in ii) hold.The specific choice of the points z will be done in Step 3 to ensure (b) of (and hence iii), by (a)).We let The shift p is chosen so that the set is nonempty; we recall that Q mid is a cube of side length (M − 1)ε ∈ εZ, but the centre z is a generic point in R d .
Assume now that V mid is chosen so that it contains exactly one point of each B ε/(4d) (u), for u ∈ U ε .We claim that then V satisfies also the second assertion in ii).Let indeed x, y ∈ V , x = y.If both are in V in , or both in V out , then |x − y| ≥ ε.If both are in V mid , then there are In the other cases, we use and similarly dist(V in , V mid ) ≥ ε/4 to conclude.This proves the second assertion in ii).We finally check that (a) holds.Let v ⊆ V be as in (2.14).Assume by contradiction that v contains elements of both V in and V out , then the sphere ∂B r (y) intersects both ∂Q out and ∂Q in .We show that there exists x ∈ ∂Q mid such that B ε/2 (x ) ⊆ B r (y).Assume first y ∈ Q mid .Let y ∈ ∂B r (y) ∩ ∂Q out , and choose and proceed analogously.Let x be the point in U ε closest to x .As every component x i is the element of which contradicts the condition V ∩ B r (y) = ∅ stated in (2.14).Therefore this cannot happen, and hence (a) holds.
Step 3: choice of the elements of V mid .We write {u 1 , . . ., u J } := U ε and iteratively for every j pick a point z j ∈ B ε/(4d) (u j ) which ensures (b).We collect in V j mid := {z 1 , . . ., z j } the points chosen in the first j steps, and at the end we will use V mid := V J mid .Fix := 1 + 2d, (2.15) the reason for this specific choice will be clear later.An admissible set of vertices at stage j is a set v with #v = d + 1 such that there is q ∈ ∂Q mid with v ⊆ B ε (q), L d (conv v) > 0, and either v ⊆ V j mid ∪ V in or v ⊆ V j mid ∪ V out .An admissible face at stage j is a set w with #w = d such that there is q ∈ ∂Q mid with w ⊆ B ε (q), H d−1 (conv w) > 0, and either w ⊆ V j mid ∪ V in or w ⊆ V j mid ∪ V out .We denote by N w := #(w ∩ V j mid ) the number of items of w in V j mid , clearly N w ≤ d.We intend to show that there are α, β, γ, C F > 0 (depending only on d) such that we can choose z j ∈ B ε/(4d) (u j ) iteratively with the following two properties: i) If v is an admissible set of vertices at stage j, then ii) If w is an admissible face at stage j, then (2.17) The key to the choice of z j , which eventually leads to (2.16) at stage j building upon (2.17) at stage j − 1, is the following geometric observation.If v is an admissible set of vertices at stage j, and it contains the point z j , then w := v \ {z j } is an admissible face at stage j − 1 and for any q ∈ w we have where ν w is a unit normal to the affine space generated by w.The factor H d−1 (conv w) will be estimated via (2.17) at stage j − 1, the choice of z j needs to ensure that the first factor is not too small, for any possible choice of w.Now we start choosing z 1 , . . ., z J .As stated before, we proceed by iteration.Assume that we have already chosen z 1 , . . 
., z j−1 , we want to choose z j (if j = 1 we use V 0 mid = ∅).Let w be an admissible face at stage j − 1 such that w ⊆ B (2 +1/(4d))ε (u j ).If no such face exists, choose z j := u j .Since no two points in V are at distance smaller than ε/(2d) (by ii)), the number of possible choices of w is bounded by a number K which depends only on d.Let w 1 , . . ., w K be these possible choices.We choose z j such that for all k = 1, . . ., K and an arbitrary choice of p k ∈ w k (the condition does not depend on the choice of p k , as ν w k is orthogonal to p k − p k for any p k , p k ∈ w k ).We show now why we can choose such z j .We observe that and thus the total volume of these sets is controlled by Kγ2 2−d d 1−d ε d .Then we choose γ such that this expression equals 1 2 L d (B ε/(4d) (u j )) and hence we have a suitable z j .Continuing in this way, we have thus constructed V J mid .It remains to show by induction that the points we constructed have the properties (2.16) and (2.17).Assume first j = 0, and recall V 0 mid = ∅, so that N w = 0.By Remark 2.12, (2.16) and (2.17) hold provided C F ≥ (d − 1)! and β ≤ 1/d!.Assume now that (2.16) and (2.17) hold at stage j − 1, we are going to prove that they hold also at stage j.
Let v be an admissible set of vertices at stage j.If z j ∈ v, then v was already admissible at stage j − 1, hence (2.16) holds.Then we assume that z j ∈ v, so that w := v \ {z j } is an admissible face at stage j−1 and v ⊆ B ε (q) ⊆ B 2 ε (z j ) ⊆ B (2 +1/(4d))ε (u j ), where q ∈ ∂Q mid is given by the admissibility of v.In particular, w ⊆ B (2 +1/(4d))ε (u j ), so that (2.19) holds for w in place of w k .By (2.17) at stage j − 1, (2.18), (2.19) and N w ≤ d we have, provided α ≤ 1, for any p ∈ w, so that setting Let w be an admissible face at stage j.As above, by the inductive assumption it suffices to consider the case z j ∈ w.Assume w ⊆ V j mid ∪ V in , the other case is analogous and will not be treated.Being w admissible, w ⊆ B ε (q), for some q ∈ ∂Q mid .Let q be the point of . By the choice of made in (2.15), we get Then the 2d points p * ± εRe i are all in B ε (q) ∩ V in , and at least one of them is not in the affine space generated by w \ {z j }.Denote it by p, and set ŵ := w \ {z j } ∪ {p}.
At this point we conclude the proof of Theorem 2.14.
Proof of Theorem 2.14.Set := 2d and (2.20) We first select a background lattice, This set obviously has the orientation property stated in ii), provided that C G ≥ 4 + 3. We show that for any We next similarly show that for any q ∈ R d one has , and the required property follows from item ii) of Lemma 2.15, since V ⊇ V z ∩ Q δ (z).If not, then B ε (q) does not intersect any Q M ε (z), so that B ε (q) ∩ V 0 = B ε (q) ∩ εZ d , which is nonempty by Remark 2.12.This proves that the set V is ( , ε)-uniform, in the sense of Property (c) of Definition 2.7.By Lemma 2.9 there is a set E so that (V, E) is a triangulation with the Delaunay property.
It only remains to show that (V, E) is non-degenerate.Let e ∈ E be a simplex, and let ∂B r (q) ⊇ v e be its circumscribed sphere.By the Delaunay property B r (q) ∩ V = ∅, by the ( , ε)-uniformity proven above this implies r < ε.If there is z ∈ δZ d such that q ∈ Q (M +2 )ε (z) then v e ⊆ V z , and item iii) of Lemma 2.15 implies L d (e) ≥ ε d /C .Otherwise v e ⊆ V 0 ⊆ εZ d , and since L d (e) > 0 by Remark 2.12 we obtain L d (e) ≥ ε d /d!.This concludes the proof, with C G := max{7 + 2d + 4 , 4 + 3, C , d!}.
2.3. Proof of the main result.
We now recall how one can use a triangulation to define continuous, piecewise affine approximations.
Lemma 2.16.Let (V, E) be a triangulation of R d .For any w : V → R there is a unique u ∈ C 0 (R d ) which coincides with w on V and is affine on each e ∈ E.
If the triangulation is c * -non degenerate, and if moreover w is obtained as the restriction to V of a C 2 (R d ) function that we still denote w, then the function u obtained above obeys (2.21) for all e ∈ E, with C depending on c * and d.
Proof. For each e ∈ E one defines u e : e → R by u e = w on v e and as the affine interpolation in the rest of e = conv (v e ). To prove existence of u we only need to check that u e = u e′ on e ∩ e′ , for any pair e ≠ e′ ∈ E. Assume e ∩ e′ ≠ ∅. Then e ∩ e′ = conv (v e ∩ v e′ ). As u e = u e′ on v e ∩ v e′ , and both are affine in conv (v e ∩ v e′ ), they coincide on e ∩ e′ . This concludes the proof of the first assertion.
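The gluing argument can be made concrete in the plane. The following sketch, with the hypothetical helper names `affine_interpolant` and `evaluate`, computes the affine interpolant of prescribed vertex values on a single triangle by solving the 2 × 2 linear system for its gradient G, and then compares two neighbouring triangles along their common edge. The sample function w(x, y) = xy and the two triangles are assumptions chosen only for illustration.

```python
def affine_interpolant(vertices, values):
    """Return (a, G) such that u(x) = a + G . x matches `values` at the
    three vertices of a nondegenerate triangle in R^2."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    w0, w1, w2 = values
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Cramer's rule for G . (p1 - p0) = w1 - w0 and G . (p2 - p0) = w2 - w0.
    gx = ((w1 - w0) * (y2 - y0) - (w2 - w0) * (y1 - y0)) / det
    gy = ((x1 - x0) * (w2 - w0) - (x2 - x0) * (w1 - w0)) / det
    a = w0 - gx * x0 - gy * y0
    return a, (gx, gy)


def evaluate(interpolant, point):
    """Evaluate u(x) = a + G . x at a point of R^2."""
    a, (gx, gy) = interpolant
    return a + gx * point[0] + gy * point[1]


def w(x, y):
    """Sample smooth function to interpolate (an assumption for this example)."""
    return x * y


# Two triangles of the unit square sharing the edge from (1, 0) to (0, 1).
lower = [(0, 0), (1, 0), (0, 1)]
upper = [(1, 0), (0, 1), (1, 1)]
u_lower = affine_interpolant(lower, [w(*p) for p in lower])
u_upper = affine_interpolant(upper, [w(*p) for p in upper])
```

Here the two interpolants agree on the whole shared edge although their gradients differ, and the gradient jump (1, 1) is parallel to the normal of that edge.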
To prove the two estimates, we focus on an element e ∈ E and let G be the constant gradient of u on e.For any pair x, y ∈ v e ,
We are ready to prove our main result, Theorem 2.2.
Proof of Theorem 2.2.Before entering into the proof of the theorem, we stress that we are going to use the fact that for a piecewise affine function u j , This follows from the fact that u j is piecewise affine, hence the distributional derivative of D∇u j is only of jump type, so that the density of D∇u j with respect to |D∇u j | is a rank 1 matrix, and hence we can use item v) of Proposition 1.2 in conjunction with Proposition 1.7.Fix two sequences δ j → 0, ε j → 0, with δ j > 0, ε j > 0, and ε j /δ j → 0. For each j and each z ∈ δ j Z d we select a matrix R z ∈ SO(R d ) such that R t z ∇ 2 w(z)R z is diagonal, and let (V j , E j ) be the grid constructed in Theorem 2.14 with these parameters.We define u j as the piecewise affine interpolation of w, constructed as in Lemma 2.16.This concludes the construction.
In order to prove convergence and the energy bound, it suffices to work in a large ball B r , with Ω ⊆ B r/2 .For large j, we can assume C G ε j ≤ δ j ≤ r/(2d).Here and below C G is the (fixed) constant from Theorem 2.14, we can assume C G > 2 √ d.We use C for a generic constant that depends only on d (and C G ) and may vary from line to line.By Lemma 2.16 one immediately obtains a uniform Lipschitz bound on u j , By the uniformity property of the grid, for any x ∈ B r and any j there is y ∈ This proves local uniform convergence.
Since ∇ 2 w is continuous, one has that converges to zero as ρ → 0. The estimate of the energy is done separately in the interior of the cubes, where the grid is regular, and in the boundary regions. We start from the boundary, where the grid is irregular. As ∇w is continuous, equation (2.22) in Lemma 2.16 allows us to estimate |[∇u j ]|, the jump in ∇u j across the boundary between two neighbouring elements e and e′ which intersect B r , and gives in all e with e ∩ B r ≠ ∅, where we also used Remark 2.8. Using non-degeneracy and uniformity of the triangulation to control the volume of e, we obtain for all elements e ∈ E j with e ⊆ B r . Fix now z ∈ δ j Z d such that Q δ j (z) ∩ Ω ≠ ∅. Summing the previous estimate over all elements e ∈ E j with e ∩ Q provided j is large enough, since ε j ≪ δ j . Here we used that diam(e) ≤ 2C G ε j for every e ∈ E j , since the triangulation (V j , E j ) is (C G , ε j )-uniform and has the Delaunay property.
We next estimate the energy inside In the next estimates we write briefly δ and ε for δ j and ε j .
For any element e ∈ E j with e , so that the orientation property of Theorem 2.14 gives Let F y := ∇w(y).For all x ∈ v e , Taylor remainder term in integral form and (2.25) yield (this can be seen as the definition of R( • )) with which does not depend on the γ i , and therefore is the same for all x ∈ v e .Hence The function u j is affine on the element e, assume it has the form u j (ξ) = a e + G e ξ for ξ ∈ e.As u j = w on v e , for every pair x, x ∈ v e we obtain Recalling that e is a non-degenerate simplex by (2.2), (2.27) and what just proved we obtain |G e − F y | ≤ Cεω δ .
Consider now some y ∈ εR , so that the above discussion applies and (2.28) gives |G e − G e | ≤ Cεω δ , having used that the above discussion forces y = y e (since y, y e ∈ εR z (Z + 1 2 ) d and y = y e imply that ⊇ v e has at most dimension d − 1) and analogously y = y e .In particular, those elements constitute a decomposition of y + R z Q ε .Arguing as before, summing over all pairs, (2.29) In order to estimate the contribution from the boundary of these cubes, let y = y±εR z e i be the centre of one of the neighbouring small cubes.Since , so that (2.28) holds for any element e contained in y + R z Q ε (with e in place of e and y in place of y).As the common boundary has area As we did before, we represent F y − F y = ∇w(y ) − ∇w(y) with Taylor's theorem (this can be seen as the definition of R ( • , • )) to obtain where we used that the R z e i are eigenvectors of H z by the choice of R z , the definition of the Schatten norm and in the final step (2.25).Let Summing over all y ∈ A z , taking into account (2.29) and (2.30) and recalling that the boundaries between the cubes appear twice in the sum, gives |∇ 2 w| 1 dL d and combining with (2.26) Summing over all z such that Q δ (z) ∩ Ω = ∅, and inserting back the indices j, where (Ω) ρ := {x ∈ R d : dist(x, Ω) ≤ ρ √ d}.Taking the limit j → ∞, and recalling that δ j → 0, ω δ j → 0 and ε j /δ j → 0, concludes the proof (recalling (2.24)).

Extremality of cones
In this section we consider functions of the kind f cone (x) := (1 − |x|) + . (3.1)
The discussion below also applies to slightly different functions, e.g. a(1 − b|x − x 0 |) + for a, b ∈ R with b > 0 and x 0 ∈ R d , since one can reduce to the particular case of (3.1) via a change of coordinates and a rescaling. Notice that, by Proposition 1.13, if d ≥ 2,
Our aim is to investigate extremality of functions of this kind with respect to p-Hessian-Schatten seminorms, for p ∈ [1, ∞]. It turns out that these functions are extremal, and we now state our main result in this direction. Its proof is deferred to Section 3.3 and follows easily from the results of Section 3.1 and Section 3.2, taking into account also Section 1.3.
) and such that for some λ ∈ (0, 1), Then f 1 and f 2 are equal to f cone , up to affine terms: there exist affine functions L 1 , L 2 : R d → R such that f i = f cone + L i for i = 1, 2.
Notice that Theorem 3.1 is stated only for d ≥ 2. Indeed, for d = 1, it is easy to see that f cone is not extremal in the sense described in the statement of the theorem.
To simplify the notation, as in this section we are going to consider only balls centred at the origin, we will omit to write the centre of the ball, i.e.B r := B r (0).Before going on, we recall that given f ∈ L 1 loc (R d ), we denote by f rad the function given by Lemma 1.10.As an explicit expression, notice that (3.3) Notice also that f rad (x) = g(|x|) for g(r) given by the right hand side of (3.3) with r in place of |x|.
3.1. Convexity.
We prove that if a function ), then f is the cone. The case p = 1 is treated in Proposition 3.5, using the fact that the absolutely continuous part of D∇f has a sign, which makes f concave inside the unit ball. The case p > 1 is treated in Proposition 3.6, using strict convexity of the p-Schatten norm to show that the absolutely continuous part of D∇f is a scalar multiple of the absolutely continuous part of D∇f cone , and then scaling to reduce to the p = 1 case.
First, we need a couple of lemmas.The first is an extension of a well known criterion to recognize convexity.Lemma 3.2.Let Ω ⊆ R d be open and convex and let f ∈ L 1 loc (Ω) with bounded Hessian-Schatten variation in Ω. Assume that D∇f ≥ 0 (as a measure with values in symmetric matrices).Then f has a representative which is continuous and convex.
Proof.The property of having a continuous representative is clearly local.Since Ω is open and convex, a continuous function g : Ω → R is convex if and only if it is convex in a neighbourhood of any point.Therefore it suffices to prove the assertion in a neighbourhood of any point, so that we can assume f ∈ W 1,1 (Ω) with ∇f ∈ BV(Ω; R d ), by Proposition 1.11 and Proposition 1.7.
Let x ∈ Ω, and pick r > 0 such that Q 4r (x) ⊆ Ω (we write here Q (y It remains to show that f ε (possibly after passing to a subsequence) converges uniformly in Q r , which implies the conclusion in Q r and therefore in a neighbourhood of any point of Ω.
We prove now uniform convergence in Q r , the argument is classical, see e.g. the proof of [EG15,Theorem 7.6].Passing to a subsequence, f ε j → f pointwise almost everywhere.Pick x ∈ Q r/2 (x) such that the sequences f ε j (x) and f ε j (y), for any vertex y of Q 2r (x) ⊆ Q 3r (x), are bounded (as we can assume them to be convergent), and let M = M x,r be the common bound.By convexity, f ε j ≤ M on Q2r (x).To prove the uniform lower bound, we observe that for any w ∈ Q 2r (x) \ {x} there is z ∈ ∂Q 2r (x) such that x is in the interior of the segment joining w with z.As convexity implies monotonicity of the difference quotients, where in the last step we used |z − x| ≥ 2r.Since Passing to the smaller cube Q r (x) and using again monotonicity of the difference quotients we obtain Lip(f ε j ; Q r (x)) ≤ C M for all j, so that f ε j converges uniformly in Q r (x) to a continuous convex function, which coincides almost everywhere with f .This concludes the proof.
The following lemma builds upon Lemma 3.2 and gives an integral characterization of convexity, which is more manageable, and follows from the rigidity in the inequality | Tr A| ≤ |A| 1 .Proof.We can assume that TrD∇f (Ω) ≥ 0, otherwise one replaces f by −f .
Let now A ∈ R d×d be a symmetric matrix and let λ 1 , . . ., λ d denote its eigenvalues. By item i) of Proposition 1.2,
$$\operatorname{Tr} A = \sum_{i=1}^{d} \lambda_i \le \sum_{i=1}^{d} |\lambda_i| = |A|_1,$$
and equality holds if and only if λ i ≥ 0 for all i, which is the same as A ≥ 0 as a symmetric matrix.
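The rigidity in the trace inequality is easy to test numerically for symmetric 2 × 2 matrices. The sketch below assumes that, for a symmetric matrix, the p-Schatten norm is the ℓ p norm of its eigenvalues; the function names `eigenvalues_sym2`, `schatten_norm` and `trace_gap` are hypothetical.

```python
import math


def eigenvalues_sym2(a, b, c):
    """Eigenvalues (smaller first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def schatten_norm(eigs, p):
    """p-Schatten norm of a symmetric matrix given its eigenvalues."""
    if math.isinf(p):
        return max(abs(l) for l in eigs)
    return sum(abs(l) ** p for l in eigs) ** (1 / p)


def trace_gap(a, b, c):
    """|A|_1 - Tr A for A = [[a, b], [b, c]]; it vanishes exactly when A >= 0."""
    eigs = eigenvalues_sym2(a, b, c)
    return schatten_norm(eigs, 1) - (a + c)
```

For the positive definite matrix with entries a = c = 2, b = 1 the gap is zero, while for the indefinite matrix with a = c = 1, b = 2 (eigenvalues −1 and 3) the gap equals 2.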
By Proposition 1.7 (in particular, which means that D∇f ≥ 0 as a matrix-valued measure, so that the conclusion then follows by Lemma 3.2.
3.2. Extremality with respect to spherical averaging.
In this section we consider only the case d ≥ 2, since its results are auxiliary to the proof of Theorem 3.1, which holds only for d ≥ 2. We start with some explicit computations involving the Hessian-Schatten total variation of f cone . First, by Proposition 1.7, This computation is easily justified by locality, as f cone is smooth on B 1 \ {0} and on R d \ B̄ 1 . Now we claim that Taking into account that D∇f cone does not charge points, this formula is easily justified on R d \ ∂B 1 by locality, as above. As for the singular part, on ∂B 1 , it is enough to use the representation formula for the singular part of differentials of vector valued functions of bounded variation, see e.g. [AFP00]; notice indeed that the unit outer normal to ∂B 1 at x is x and that the jump of ∇f cone at x ∈ ∂B 1 is exactly x.
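The following numerical sanity check covers the planar case only, assuming d = 2 and f cone (x) = (1 − |x|) + . It is a sketch, not the computation carried out in the paper, and the names `cone`, `hessian_fd`, `eigenvalues_sym2` and `cone_total_variation_2d` are hypothetical. Inside the unit disc a finite-difference Hessian has eigenvalues close to −1/|x| and 0, so its Schatten norm is 1/|x|. Integrating this density in polar coordinates and adding the contribution of the unit jump of the gradient along the circle of length 2π gives 4π, the value quoted for cut cones in the proof of Theorem 4.2.

```python
import math


def cone(x, y):
    """Planar cone (1 - |x|)_+ ."""
    return max(0.0, 1.0 - math.hypot(x, y))


def hessian_fd(f, x, y, h=1e-3):
    """Central finite-difference Hessian entries (f_xx, f_xy, f_yy)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return fxx, fxy, fyy


def eigenvalues_sym2(a, b, c):
    """Eigenvalues (smaller first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def cone_total_variation_2d(n=1000):
    """Absolutely continuous part (midpoint rule in the radius) plus jump part."""
    radii = ((k + 0.5) / n for k in range(n))
    # Density 1/r integrated over circles of length 2*pi*r, for r in (0, 1).
    ac_part = sum((1.0 / r) * 2 * math.pi * r for r in radii) / n
    # Gradient jump of size 1 across the unit circle, which has length 2*pi.
    jump_part = 1.0 * 2 * math.pi
    return ac_part + jump_part
```

At the point (0.3, 0.4), where |x| = 0.5, the computed eigenvalues are approximately −2 and 0.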
Taking traces, we have that The next lemma states that this inequality is somehow rigid.
Then f is equal to f cone up to a linear term: there exists α ∈ R d such that Proof.Let r > 0 and let U ∈ SO(R d ).By Lemma 1.10, f U := f (U • ) has finite Hessian-Schatten total variation.Also, for any radial function g so that, integrating both sides with respect to dµ d (U ) and using Fubini's Theorem, Then, as f rad = f cone and integrating by parts, In particular, taking into account (3.2) and (3.8) ).Now Lemma 3.3 can be applied, to obtain that the function f has a continuous and concave representative in B 1 that, without loss of generality, we still denote by f .By (3.8) again, Setting also f (x) := f (x) − α • x, we conclude the proof by showing f = f cone .Notice that still f is continuous and concave on B 1 and f rad = f cone .Notice that this last fact implies f (0) = 1.Now, for any σ ∈ ∂B 1 , define fσ (s) := f (sσ) for s ∈ [0, ∞), a function continuous and concave in [0, 1) with fσ (0) = 1.Notice that for H d−1 -a.e.σ ∈ ∂B 1 , fσ ∈ W 1,1 loc ((0, ∞)).This can be seen either with a change of coordinates and the characterization of Sobolev functions on lines or by approximation, using repeatedly integration in polar coordinates.Hence, for H d−1 -a.e.σ ∈ ∂B 1 , the function fσ has a continuous representative in [1, ∞).Now, for H d−1 -a.e.σ ∈ ∂B 1 , fσ vanishes a.e. in (1, ∞) (as f vanishes identically on R d \ B1 ), therefore this implies fσ (s) → 0 as s ↑ 1 and the continuous representative is the one null in [1, ∞).Then, exploiting continuity and concavity, for ) with bounded Hessian-Schatten variation and assume that Then f is equal to f cone up to a linear term: there exists α ∈ R d such that Proof.We focus on the case p > 1 as the case p = 1 has already been proved in Proposition 3.5.Let now g := 1 2 (f + f cone ).Recalling (3.8), |D 2 p g|(R d \ B1 ) = 0. 
Still, g rad = f cone , so that, by Lemma 1.10 and (3.10), hence equality holds throughout and therefore g satisfies (3.10) in place of f .We next decompose D∇f in absolutely continuous and singular part, use that the singular one has a rank one density with respect to the total variation, and show that the absolutely continuous one is proportional to the one of D∇f cone .We are going to use the theory of functions of bounded variation throughout, see e.g.[AFP00].The superscript s denotes the singular part of a measure with respect to L d .We have a L hence equality holds throughout and in particular, 3.5).Therefore, by Proposition 1.7, where we also used (3.10) for f and g and (3.8) in the last equality.Hence equality holds throughout, so that By strict convexity of the p-Schatten norm (item vi) of Proposition 1.2), and the fact (by (3.5)) that the density of D∇f cone with respect to L d is nonzero L d -a.e. on B 1 , we Therefore, to sum up, we have, for i = 1, 2,

Solutions of the minimization problem
In this section we stick to the two dimensional case d = 2. Recall that, by Proposition 1.11, functions with bounded Hessian-Schatten variation are continuous, as we are in dimension 2 and hence the evaluation functionals in (4.1) below are meaningful (we will implicitly take the continuous representative, whenever it is possible).
Our aim is to establish conditions under which F p,q λ has minimizers, i.e. we want to ensure the existence of a minimizer of inf f ∈L 1 loc (Ω) F p,q λ (f ).
It turns out that for many values of λ, p, q, minimizers indeed exist.Here we state our main results in this direction.
Theorem 4.1. Let p, q ∈ [1, ∞] and let λ ∈ [0, 2 1/p−1 4π]. Then there exists a minimizer of F p,q λ .
Theorem 4.2. Let λ ∈ [0, ∞]. Then there exists a minimizer of F 1,1 λ .
Theorem 4.1 and Theorem 4.2 will follow easily from the results of Section 4.1. We defer their proofs to Section 4.2.
4.1. Auxiliary results.
For the following lemma, we recall again that functions with bounded Hessian-Schatten variation in dimension 2 are automatically continuous. Hence, the evaluation functional (at 0) appearing in the infimum is meaningful. The spirit of this lemma is to provide us with "bump" functions whose Hessian-Schatten total variation is almost optimal.
We prove now the opposite inequality in (4.2).Take then f ∈ L 1 loc (R 2 ), compactly supported, with bounded Hessian-Schatten variation and such that f (0) = 1.We have to prove that |D 2 p f |(R 2 ) ≥ 2 1+1/p π.Using Lemma 1.9, Lemma 1.10, we see that we can assume with no loss of generality that f ∈ C ∞ c (R 2 ) and f is radial, say f (x) = g(|x|), with g(0) = 1 and g + (0) = 0. Now, by Proposition 1.7 and the inequality (|a|+|b|) ≤ 2 1−1/p (|a| p +|b| p ) 1/p , we obtain that ).Hence, it is enough to show the claim in the case p = 1, i.The existence of "good bump functions" granted by Lemma 4.3 allows us to prove, in Proposition 4.4 below, that for λ large enough the infimum of F p,q λ does not depend on λ, namely that minimizing F p,q λ asymptotically promotes the perfect fit with the data.Proposition 4.4.Let p, q ∈ [1, ∞] and let λ ∈ [2π2 1/p N 1−1/q , ∞].Then F p,q ∞ (f ).
In particular, in this range of λ, the infima are also independent of q.
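The constants appearing in this section can be compared numerically. The sketch below uses hypothetical function names and only the formulas quoted in the text: the bump constant 2^{1+1/p} π of Lemma 4.3, the upper bound 2^{1/p−1} 4π on λ in Theorem 4.1, the lower bound 2π 2^{1/p} N^{1−1/q} on λ in Proposition 4.4, and the elementary inequality |a| + |b| ≤ 2^{1−1/p} (|a|^p + |b|^p)^{1/p}. The convention 1/∞ = 0 is an assumption of the example.

```python
import math


def inv(p):
    """1/p with the convention 1/inf = 0."""
    return 0.0 if math.isinf(p) else 1.0 / p


def bump_constant(p):
    """The value 2^{1+1/p} * pi from Lemma 4.3."""
    return 2 ** (1 + inv(p)) * math.pi


def existence_threshold(p):
    """Upper end 2^{1/p-1} * 4 * pi of the range of lambda in Theorem 4.1."""
    return 2 ** (inv(p) - 1) * 4 * math.pi


def perfect_fit_threshold(p, q, n_points):
    """Lower end 2 * pi * 2^{1/p} * N^{1-1/q} of the range in Proposition 4.4."""
    return 2 * math.pi * 2 ** inv(p) * n_points ** (1 - inv(q))


def power_mean_gap(a, b, p):
    """2^{1-1/p} (|a|^p + |b|^p)^{1/p} - (|a| + |b|), which is nonnegative."""
    if math.isinf(p):
        norm = max(abs(a), abs(b))
    else:
        norm = (abs(a) ** p + abs(b) ** p) ** (1 / p)
    return 2 ** (1 - inv(p)) * norm - (abs(a) + abs(b))
```

The first two constants coincide for every p. For p = q = 1 the threshold of Proposition 4.4 equals 4π for any number N of data points, so the two ranges of λ together cover [0, ∞], which is consistent with the statement of Theorem 4.2.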
Proof.We let r ∈ (0, ∞) small enough so that dist(x i , x j ) > 3r if i = j.Let ε ∈ (0, 1).For i = 1, . . ., N , by Lemma 4.3 and a scaling argument, we take g i ∈ C c (R 2 ) with g(x i ) = 1, supp g i ⊆ B r (x i ) and |D 2 p g i |(R 2 ) ≤ 2 1+1/p π + ε.Then we consider f ∈ L 1 loc (Ω) and we set f := f − i (f (x i ) − y i )g i .Remark 4.6.Notice that the constant 1/(4π) in front of |D 2 p f |(B) in (4.4) is somehow optimal.We can realize this considering the sequence of functions f ε used to prove Lemma 4.3.By Lemma 4.5, there is no surprise in knowing that, given a weakly convergent sequence f k f , in duality with the space L ∞ c (Ω) of L ∞ function with compact (essential) support, we can estimate how much the evaluation functional fails to converge in terms of concentration of Hessian-Schatten total variation at x. Proof.First, take a non relabelled subsequence so that lim k |f (x) − f k (x)| exists and equals the lim sup k at the left hand side of (4.8).We assume that there exists r 1 > 0 small enough so that B r 1 (x) ⊆ Ω and moreover that lim sup k |D 2 f k |(B r 1 (x)) < ∞, otherwise there is nothing to show.By lower semicontinuity this implies that f has bounded Hessian-Schatten variation in B r 1 (x).We extract a further non relabelled subsequence such that, for some finite measure µ on B r 1 (x), |D 2 p f k | µ in duality with C c (B r 1 (x)).
Let now r ∈ (0, r 1 /2). Then, Now notice that by continuity of f the first summand converges to 0 as r ↓ 0, whereas, by the convergence assumption, the second summand converges to 0 as k → ∞. Also, by Lemma 4.5, we bound the third summand as follows To conclude, it is enough to notice that lim sup
By using the results above, we can prove the lower semicontinuity of F p,q λ . In the case q = 1, notice that the argument used in the proof of Proposition 4.4 together with the next result can be used to show that F p,1 λ is precisely the relaxed functional of F p,1 ∞ when λ = 2 1+1/p π.
Lemma 4.8. Let p, q ∈ [1, ∞] and let λ ∈ [0, 2 1/p−1 4π]. Then F p,q λ is lower semicontinuous with respect to weak convergence in duality with L ∞ c (Ω).
Proof. Let (f k ) ⊆ L 1 loc (Ω) be such that f k ⇀ f in duality with L ∞ c (Ω), for some f ∈ L 1 loc (Ω). We have to prove that F p,q λ (f ) ≤ lim inf k F p,q λ (f k ). First, extract a non relabelled subsequence such that F p,q λ (f k ) has a limit, as k → ∞, which equals the right hand side of the inequality above. Then, we can assume that lim inf k |D 2 p f k |(Ω) < ∞, otherwise there is nothing to show. Hence f has bounded Hessian-Schatten variation in Ω and, up to the extraction of a non relabelled subsequence, we can assume that |D 2 p f k | ⇀ µ in duality with C c (Ω) for some finite measure µ on Ω. Even though µ depends on p, we do not make this dependence explicit. Also, we extract a non relabelled subsequence such that for every i = 1, . . ., N , |f (x i ) − f k (x i )| has a (finite) limit as k → ∞.
Notice that for every $z \in \Omega$ one has $\mu(\{z\}) \le \lim$ $L^\infty(\mathbb{R}^2)$, in particular $\tilde f_k$ are uniformly bounded in $L^\infty(K)$. Now, as $|g_k(x_i)| = |\tilde f_k(x_i) - f(x_i)|$ are bounded for every $i = i_1, i_2, i_3$, it is easy to infer, by the assumption in (a), that the perturbations $g_k$ are uniformly bounded. Hence $\|f_k\|_{L^\infty(K)}$ is bounded and, since $K$ is arbitrary, the claim follows by weak compactness.
Case (b). If $N \le 2$, there is an affine function $f^*$ with $f^*(x_i) = y_i$ for all $i$, and therefore $F^{p,q}_\lambda(f^*) = 0$. We can therefore assume $N \ge 3$. Let $v^\perp$ be a unit vector orthogonal to $v$, and choose $\varepsilon \in (0, 1)$ sufficiently small that $x_0 := x_1 + \varepsilon v^\perp \in \Omega$. Define $\tilde f_k(x$ As $F^{p,q}_\lambda(\tilde f_k) = F^{p,q}_\lambda(f_k)$, this is also a minimizing sequence, with the additional property that $\tilde f_k(x_0) = 0$ for all $k$. The conclusion then follows from the argument of the previous case.

Proof of Theorem 4.1. The statement is proved by the direct method of the calculus of variations, using Lemma 4.8 and Lemma 4.9.
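The direct method invoked in the proof of Theorem 4.1 can be spelled out as the short chain below. This is only a sketch: it assumes (the statement of Lemma 4.9 is not reproduced in this section) that Lemma 4.9 supplies a minimizing sequence $(f_k)$ converging weakly, in duality with $L^\infty_c(\Omega)$, to some $f \in L^1_{\mathrm{loc}}(\Omega)$, and that $\lambda$ lies in the range allowed by Lemma 4.8.

```latex
% Sketch of the direct method; (f_k) and f are as in the assumption above.
\begin{align*}
\inf F^{p,q}_\lambda
  &= \lim_{k\to\infty} F^{p,q}_\lambda(f_k)
     && \text{($(f_k)$ is a minimizing sequence)} \\
  &\ge F^{p,q}_\lambda(f)
     && \text{(lower semicontinuity, Lemma 4.8)} \\
  &\ge \inf F^{p,q}_\lambda .
\end{align*}
```

All inequalities are therefore equalities, so the weak limit $f$ attains the infimum.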
Proof of Theorem 4.2. Let $\lambda_c := 4\pi$. We argue as in Proposition 4.4, starting from a minimizer $f$ of $F^{1,1}_{\lambda_c}$ granted by Theorem 4.1. We modify $f$ by subtracting $\sum_i (f(x_i) - y_i) g_i$, where this time the $g_i$ are rescaled cut cones (see (4.3)), in such a way that $\tilde f := f - \sum_i (f(x_i) - y_i) g_i$ fits the data exactly. Since $|D^2_1 g_i|(\mathbb{R}^2) = 4\pi$ (recall e.g. Lemma 4.3), one has This, taking the inequality $F^{1,1}_\lambda \le F^{1,1}_\infty$ into account, proves that $\tilde f$ is a minimizer of $F^{1,1}_\lambda$ for any $\lambda \ge \lambda_c$.
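The estimate behind the last step can be sketched as follows, writing $\tilde f$ for the corrected function $f - \sum_i (f(x_i) - y_i) g_i$. The sketch rests on two assumptions that are not restated in this section: that, as in Proposition 4.4, $g_i(x_i) = 1$ and $x_j \notin \operatorname{supp} g_i$ for $j \neq i$; and that the functionals have the form written in the comment below (this is our reading of the definition, and should be checked against the paper's own).

```latex
% Assumed form of the functionals (hypothetical reading, not from this section):
%   F^{1,1}_\lambda(h) = |D^2_1 h|(\Omega) + \lambda \sum_i |h(x_i) - y_i|,
%   F^{1,1}_\infty(h)  = |D^2_1 h|(\Omega)  whenever h(x_i) = y_i for all i.
\begin{align*}
\tilde f(x_j)
  &= f(x_j) - \sum_i \bigl(f(x_i) - y_i\bigr) g_i(x_j)
   = f(x_j) - \bigl(f(x_j) - y_j\bigr) = y_j, \\
F^{1,1}_\infty(\tilde f) = |D^2_1 \tilde f|(\Omega)
  &\le |D^2_1 f|(\Omega) + \sum_i |f(x_i) - y_i|\, |D^2_1 g_i|(\mathbb{R}^2) \\
  &= |D^2_1 f|(\Omega) + 4\pi \sum_i |f(x_i) - y_i|
   = F^{1,1}_{\lambda_c}(f).
\end{align*}
% Consequently, for every \lambda \ge \lambda_c and every competitor h:
\begin{equation*}
F^{1,1}_\lambda(\tilde f) \le F^{1,1}_\infty(\tilde f)
  \le F^{1,1}_{\lambda_c}(f) \le F^{1,1}_{\lambda_c}(h) \le F^{1,1}_\lambda(h).
\end{equation*}
```

The second display uses the subadditivity and positive homogeneity of the total variation; the last inequality of the final chain uses that, under the assumed form, $F^{1,1}_\lambda$ is nondecreasing in $\lambda$.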
Remark 2.5. Let $\Omega := (0, 1)^d$. As a consequence of Theorem 2.4, the description of the extremal points of the unit ball with respect to the $|D^2_1 \cdot|(\Omega)$ seminorm obtained in [AABU22, Theorem 25] remains valid in arbitrary dimension. In a slightly imprecise way, the result states that CPWL extremal points are dense in 1-Hessian–Schatten energy in the set of extremal points with respect to the $L^1(\Omega)$ topology. Notice that the description of CPWL extremal points is made explicit in [AABU22, Proposition 23].

Figure 1. Sketch of the function $G_h$ used in proving Remark 2.6. The function equals $1$ on the two points marked by black dots, $-h$ on the two points marked by black squares, vanishes outside the large rectangle, and is affine in each of the ten polygons in the figure.

Figure 2. Sketch of the desired triangulation without the interpolation region. The aim of this section is to find a suitable interpolation between the squares.
(2.11) by $e \cap B_{r-\sqrt{d}}(q) = \emptyset$ and $e \subseteq B_\rho(y)$ we have $|q - y| < r$ we have $|z - y| \ge \rho$. By minimality of $z_i$, for any $x \in v_e \subseteq \mathbb{Z}^d$ and any $i$ we have $|x_i - y_i| \ge |z_i - y_i|$, which by $x \in \partial B_\rho(y)$ implies $\rho = |x - y| \ge |z - y| \ge \rho$. Therefore, equality holds throughout, so that $\rho = |x - y| = |z - y|$ and $|x_i - y_i| = |z_i - y_i|$ for every $i \in \{1, \dots, d\}$ and $x \in v_e$.

Figure 3. Sketch of the set of vertices V built in Theorem 2.14.The blue squares indicate the irregular regions where V mid is used.

Figure 4. Sketch of the boundary region as considered in Lemma 2.15.

$\mathcal{L}^d$-a.e. on $B_1$.

g
(4.3)

Now we write $\{g > 0\} \cap (0, \xi) = \bigcup_k I_k$ and $\{g < 0\} \cap (0, \xi) = \bigcup_k J_k$, where $I_k$ and $J_k$ are countably many pairwise disjoint open intervals. Notice that if $p \in \partial I_k$ for some $k$, then either $p = \xi$ or $g(p) = 0$. Then, if we take $I_k$ such that $\xi \in \partial I_k$, $ds - \xi g(\xi) = \int_{I_k} |g|\,ds - \xi |g|(\xi)$, whereas if we take $I_k$ such that $\xi \notin \partial I_k$, Similar inequalities hold in the case of an interval of the type $J_k$. Therefore, summing over all intervals $I_k$ and $J_k$, $\xi |g|(\xi)$, so that, by the choice of $\xi$ due to (4.7), $(2B \setminus B)$, whence the claim for $p = 1$. For the general case, simply notice that $|D^2_1 f|(B) \le 2^{1-1/p} |D^2_p f|(B)$, and that the same holds for $2B \setminus B$, by the $\ell^1$-$\ell^p$ inequality and Proposition 1.7.
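The pointwise matrix inequality behind the factor $2^{1-1/p}$ can be checked directly. The following is a sketch under the assumption that $|M|_p$ denotes the Schatten $p$-norm in its usual sense, i.e. the $\ell^p$ norm of the singular values, and that $d = 2$, so that $M$ has two singular values $\sigma_1, \sigma_2 \ge 0$; the passage from matrices to measures is what the text attributes to Proposition 1.7 and is not reproduced here.

```latex
% sigma_1, sigma_2 >= 0: singular values of M in R^{2x2};
% p^* is the conjugate exponent of p, 1/p + 1/p^* = 1.
\begin{align*}
|M|_1 = \sigma_1 \cdot 1 + \sigma_2 \cdot 1
  &\le \bigl(\sigma_1^p + \sigma_2^p\bigr)^{1/p} (1 + 1)^{1/p^*}
     && \text{(H\"older's inequality in $\mathbb{R}^2$)} \\
  &= 2^{1-1/p}\, |M|_p .
\end{align*}
% For p = \infty the same bound reads |M|_1 \le 2 \max\{\sigma_1, \sigma_2\}.
```

Equality holds when $\sigma_1 = \sigma_2$; for instance, for the identity matrix, $|M|_1 = 2$ and $|M|_p = 2^{1/p}$.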

4.2. Proof of the main results.
With the results of Section 4.1 at hand, Theorem 4.1 and Theorem 4.2 follow in an immediate, classical way.
where the supremum is taken among all $N \in \mathbb{R}^{d \times d}$ with $|N|_{p^*} \le 1$, for $p^*$ the conjugate exponent of $p$, defined by $1/p + 1/p^* = 1$.
v) If $M$ has rank 1, then $|M|_p$ coincides with the Hilbert–Schmidt norm of $M$ for every