1 Introduction

In recent years, action functionals of the form

$$\begin{aligned} I_{f}(\gamma )=\int _{0}^{1}|\dot{\gamma }|^{2}+|\nabla f(\gamma )|^2 \end{aligned}$$
(1.1)

have received the attention of many authors, due to their appearance in several areas of Mathematics. In the theory of gradient flows, for instance, they correspond to the integral form of the energy dissipation (see [5]), and they are also related to the so-called entropic regularization of the Wasserstein distance, when f is a multiple of the logarithmic entropy, defined on the space of probability measures with finite quadratic moment (see [9, 12, 13]). In all these cases, the main obstruction to the application of standard results of Calculus of Variations stems from the lack of differentiability, and even of continuity, of the Lagrangian with respect to \(\gamma \).

When \(f:X\rightarrow (-\infty ,+\infty ]\) is a \(\lambda \)-convex function defined on a metric space \((X,d)\), the term \(|\nabla f(x)|\) has to be interpreted as the descending slope of f at x, namely

$$\begin{aligned} |\nabla ^{-} f|(x):=\underset{y\rightarrow x}{\limsup }\,\frac{[f(x)-f(y)]^{+}}{d(x,y)}, \end{aligned}$$
(1.2)

and, if X is a Hilbert space, it also coincides with the norm of the minimal selection in the subdifferential \(\partial f(x)\), known as the extended gradient of f at x (see [5]). In this general framework, stability of the functionals \(I_{f}\) with respect to \(\Gamma \)-convergence of the functions f was investigated in [2, 3, 4]. In particular, [2] addressed a rigorous derivation, along the lines of [8], of a dynamical system of interacting particles closely related to the optimal transport problem, known as the discrete Monge–Ampère gravitational (MAG) model (see Sect. 2.2). What emerged from this work is that the dynamics of MAG can be conveniently studied as the Euler–Lagrange equation associated to an action functional of type (1.1), where \(f=f_{K}\) is the \((-1)\)-convex function given by the (halved) opposite squared distance from a specific discrete set \(K\subset \mathbb {R}^{d}\), namely

$$\begin{aligned} f_{K}(x):=-\frac{{{\,\textrm{dist}\,}}^{2}_{K}(x)}{2}=-\underset{y\in K}{\min }\,\frac{|x-y|^{2}}{2}. \end{aligned}$$
(1.3)

Clearly, \(f_{K}\) is not everywhere differentiable in general, so that \(\nabla f_{K}\), denoting the extended gradient of \(f_K\), is not even continuous, and standard results of Calculus of Variations are not directly applicable in this case.

The present work aims at a systematic analysis of the properties of local minimizers for the functional \(I_{f_{K}}\), where K is a generic discrete set in \(\mathbb {R}^{d}\). Our results apply in particular to solutions of the discrete MAG model, thereby addressing the general n-dimensional case, left open in [2], where the most involved part of the analysis was carried out only in dimension 1.

The plan of the paper is the following. In Sect. 2 we present the general framework and the motivations of our work. More precisely, in Sect. 2.1 we provide a contextualization of the problem in the general Hilbertian setting, with particular emphasis on the variational properties of functionals of type (1.1), when f is a \(\lambda \)-convex function. The main references for this part are [3, 5]. Then, in Sect. 2.2, we introduce the Monge–Ampère gravitational model in the flat torus \(\mathbb {T}^{n}\) as a modification of the classical Newtonian gravitation in which the linear Poisson equation is replaced by the fully non-linear Monge–Ampère equation (see (2.7)). Following the ideas of [8], we underline the intriguing link of this dynamical system with the optimal transport problem, whose powerful tools can be used in order to derive a Lagrangian reformulation of MAG particularly meaningful in the discrete setting, where additional foundation to the model is given by the results in [2]. By means of a least action principle, we then interpret solutions of the discrete MAG model as local minimizers of the functional \(I_{f_{K}}\), where \(f_{K}\) is the opposite squared distance function from a discrete set \(K\subset \mathbb {R}^{d}\), as defined in (1.3). Section 3, being the core of the paper, is then devoted to the analysis of local minimizers for the Lagrangian problem associated to \(I_{f_{K}}\), when K is a finite collection of points in \(\mathbb {R}^{d}\):

$$\begin{aligned} K=\left\{ p_{1},\dots ,p_{N}\right\} . \end{aligned}$$

We crucially consider the Voronoi partition of the space carried by K, which encodes the underlying geometrical structure of the problem, and exploit it in order to obtain, in Proposition 3.8, the existence of some specific directions along which momentum is locally conserved by the dynamics. As a byproduct, we show in Corollary 3.9 that a local minimizer \(\gamma \) is regular as long as it stays in a single Voronoi cell, possibly developing singularities only at those times at which the optimality class changes. As shown later, the set K also carries a partition of \(\mathbb {R}^{d}\) into “potential zones” (see Proposition 3.6). This second partition is in general less fine than the Voronoi one, and coincides with it when the set K is “balanced”, as is the case, for instance, for a cubic lattice. We then define \(S(\gamma )\) to be the set of “shock times”, at which the curve \(\gamma \) jumps from one Voronoi cell to another, and \(NDS(\gamma )\subseteq S(\gamma )\) the set of “non-degenerate shock times”, at which \(\gamma \) changes not only the Voronoi cell, but also the potential zone. With this in mind, our main regularity results, Theorem 3.15 and Corollary 3.16, can be collected in a single statement as follows:

Theorem 1 (Partial regularity) Let \(\gamma \) be a local minimizer of \(I_{f_K}\) with endpoints constraints. Then

  1. (i)

    \(\gamma \) has a finite number of non-degenerate shock times out of which it is \(C^{1,1}\).

  2. (ii)

    Under the additional assumption that K is balanced, \(\gamma \) has a finite number of shock times out of which it is \(C^{\infty }\).

This result in particular provides an extension to any space dimension of [2, Theorem 13], where regularity out of a finite number of shock times was proved for minimizers of a one-dimensional version of the MAG model.

2 General framework and motivations

2.1 Action functionals depending on the gradient of convex functions

\(\lambda \)-convex functions. Given a Hilbert space H, we consider a function \(f:H\rightarrow (-\infty ,+\infty ]\), and denote by \({{\,\textrm{dom}\,}}(f)\) its finiteness domain. We say that f is \(\lambda \)-convex if \(x\mapsto f(x)-\frac{\lambda }{2}|x|^2\) is convex. It is easily seen that \(\lambda \)-convex functions are precisely those functions that satisfy the perturbed convexity inequality

$$\begin{aligned} f((1-t)x+ty)\le (1-t)f(x)+tf(y)-\frac{\lambda }{2}t(1-t)|x-y|^{2},\quad t\in [0,1]. \end{aligned}$$

By \(\partial f(x)\) we denote the Gateaux subdifferential of f at \(x\in {{\,\textrm{dom}\,}}(f)\), namely the (possibly empty) closed convex set

$$\begin{aligned}\partial f(x):=\left\{ \xi \in H: \underset{t\rightarrow 0^{+}}{\liminf }\,\frac{f(x+tv)-f(x)}{t}\ge \xi \cdot v, \, \forall v \in H\right\} .\end{aligned}$$

We denote by \({{\,\textrm{dom}\,}}(\partial f)\) the domain of the subdifferential. For a \(\lambda \)-convex function, we can exploit the monotonicity of difference quotients to derive the equivalent non-asymptotic definition of the subdifferential

$$\begin{aligned}\partial f(x):=\left\{ \xi \in H: f(y)\ge f(x)+\langle \xi ,y-x \rangle + \frac{\lambda }{2}|y-x|^{2}, \, \forall y \in H\right\} .\end{aligned}$$

Whenever \(x\in {{\,\textrm{dom}\,}}(\partial f)\), there exists a unique element \(\xi \) with minimal norm in \(\partial f(x)\), obtained by projecting 0 on \(\partial f(x)\). This element is called the extended gradient of f at x, and is denoted by \(\nabla f(x)\). The concept of extended gradient is closely related to that of descending slope of f at \(x\in {{\,\textrm{dom}\,}}(f)\), namely

$$\begin{aligned}|\nabla ^{-}f|(x):=\underset{y\rightarrow x}{\limsup }\,\frac{[f(x)-f(y)]^{+}}{|x-y|}.\end{aligned}$$

In fact, for \(\lambda \)-convex functions, it can be proved that \(\partial f(x)\) is not empty if and only if \(|\nabla ^{-}f|(x)<+\infty \), and that, in this case, the following equalities hold (see [5]):

$$\begin{aligned} |\nabla f(x)|=|\nabla ^{-}f|(x)=\underset{y\ne x}{\sup }\,\frac{[f(x)-f(y)+\frac{\lambda }{2}|x-y|^{2}]^{+}}{|x-y|}. \end{aligned}$$
(2.1)

By setting \(|\nabla f|\) equal to \(+\infty \) out of \({{\,\textrm{dom}\,}}(\partial f)\), we easily deduce from (2.1) that the function \(H\ni x \mapsto |\nabla f(x)|\in [0,+\infty ]\) is lower semicontinuous, being the supremum of a collection of continuous functions.

In this paper we deal with the following specialization of the above setting. Given a closed set \(K\subseteq H\), we consider the (halved) opposite squared distance function from K, namely

$$\begin{aligned} f_{K}(x):=-\frac{{{\,\textrm{dist}\,}}^{2}_{K}(x)}{2}=-\underset{y\in K}{\inf }\frac{|x-y|^{2}}{2}. \end{aligned}$$
(2.2)

The infimum is not attained in general, unless K is either convex or compact, or H is finite-dimensional. By defining the convex function

$$\begin{aligned}g_{K}(x):=\underset{y\in K}{\sup }\left( \langle x, y\rangle -\frac{|y|^2}{2}\right) ,\end{aligned}$$

we derive from the equality \(g_{K}(x)=f_{K}(x)+\frac{|x|^{2}}{2}\) that \(f_{K}\) is \((-1)\)-convex.
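The identity \(g_{K}=f_{K}+\frac{|x|^{2}}{2}\) follows by expanding \(|x-y|^{2}\) inside the infimum, and it can be checked numerically. The following is a minimal sketch, assuming a small finite set K in \(\mathbb {R}^{2}\) chosen only for illustration; the names `f_K`, `g_K`, `K`, `points` and `gaps` are hypothetical and do not come from the paper.

```python
# Illustrative sketch: K is a hypothetical finite set in R^2 (an assumption).

def f_K(x, K):
    """Halved opposite squared distance of x from the finite set K."""
    return -min(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) for y in K) / 2.0


def g_K(x, K):
    """Maximum over y in K of the affine functions x -> <x, y> - |y|^2 / 2."""
    return max(
        sum(xi * yi for xi, yi in zip(x, y)) - sum(yi * yi for yi in y) / 2.0
        for y in K
    )


K = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
points = [(0.3, -0.7), (0.5, 0.5), (2.0, 1.0)]

# Gap between g_K(x) and f_K(x) + |x|^2 / 2 at each sample point.
gaps = [
    abs(g_K(x, K) - (f_K(x, K) + sum(xi * xi for xi in x) / 2.0))
    for x in points
]
```

Since `g_K` is a maximum of affine functions, it is convex, which is the content of the \((-1)\)-convexity of \(f_{K}\); the gaps above vanish up to rounding.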

A class of action functionals. We now introduce, in the general Hilbertian setting, the class of action functionals that we are going to study throughout the paper. We fix a function \(h:[0,+\infty ]\rightarrow [0,+\infty ] \) representing a “potential shape”. Then, for \(\delta >0\) and \(f:H\rightarrow (-\infty ,+\infty ]\) proper, \(\lambda \)-convex and lower semicontinuous, we consider the functional \(I^{\delta }_{f}:C([0,\delta ], H)\rightarrow [0,+\infty ]\) defined by

$$\begin{aligned} I^{\delta }_{f}(\gamma ):= \left\{ \begin{array}{ll} \displaystyle \int _{0}^{\delta }|\dot{\gamma }|^2+h(|\nabla f|^{2}(\gamma ))\quad &{}\text {if }\gamma \in AC([0,\delta ], H),\\ +\infty &{}\text {otherwise}. \end{array}\right. \end{aligned}$$
(2.3)

Compared to the one of type (1.1) studied so far in the literature, we consider here the enriched class of functionals in which the potential shape h is allowed to be different from the identity. In the sequel we assume h to be continuous, and \(C^{1}\) when restricted to \([0,+\infty )\).

As we have in mind to study this type of functionals from the variational point of view, it is crucial to realize that (2.3) is lower semicontinuous with respect to the \(C([0,\delta ],H)\) topology. This in fact easily follows from the lower semicontinuity of the classical action and the above characterization of the extended gradient (2.1). Then, for \(x_{0}, x_{\delta } \in H\), the infimum

$$\begin{aligned} \Gamma ^{\delta }_{f}(x_0,x_{\delta }):=\inf \left\{ I^{\delta }_{f}(\gamma ):\gamma (0)=x_{0}, \gamma (\delta )=x_{\delta }\right\} \end{aligned}$$

is attained under suitable coercivity conditions. Note in particular that this is the case if H is finite-dimensional.

Due to the lack of continuity of the potential term, however, very little is known about the regularity of minimizers of this type of functional, even in the finite-dimensional case. We could ask, for instance, whether some higher regularity, or at least some form of Euler–Lagrange equation such as, formally,

$$\begin{aligned} \ddot{\gamma }=h'(|\nabla f(\gamma )|^2)\nabla ^{2}f(\gamma )\nabla f(\gamma ) \end{aligned}$$
(2.4)

could be derived for local minimizers. It is worth mentioning here that a rigorous Euler–Lagrange equation like the one in (2.4) has been derived in [10] in the context of the entropic regularization of the Wasserstein distance, where f is the Boltzmann–Shannon relative entropy. In the very specific case in which f is the opposite squared distance function from a discrete set in \(\mathbb {R}^{d}\), we will prove in the sequel that local minimizers are piecewise \(C^{1,1}\), and that, out of a finite number of singularities, (2.4) holds taking the modulus on both sides and replacing the equality with a \(\le \) sign (see Theorem 3.15). Nevertheless, in the general setting, one can exploit the fact that the functional \(I^{\delta }_{f}\) is autonomous in order to perform “horizontal” variations of the independent variable, and eventually derive the Du Bois–Reymond equation for a local minimizer \(\gamma \) (see [1]):

$$\begin{aligned} \frac{d}{dt}\left\{ |\dot{\gamma }|^2-h(|\nabla f|^{2}(\gamma ))\right\} =0 \end{aligned}$$
(2.5)

in the sense of distributions in \((0,\delta )\). Equivalently, there exists a constant \(c\in \mathbb {R}\) such that

$$\begin{aligned}|\dot{\gamma }|^2=h(|\nabla f|^{2}(\gamma ))+c\end{aligned}$$

a.e. in \((0,\delta )\). This implies in particular that every local minimizer of \(I^{\delta }_{f}\) is Lipschitz continuous, provided that \(|\nabla f|\) is bounded on bounded sets.
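The conservation law (2.5) can be observed on an explicit example. The sketch below assumes the simplest possible situation, which is not the general one treated in the paper: \(h={{\,\textrm{id}\,}}\) and \(K=\{p\}\) a single point, so that \(\nabla f_{K}(x)=p-x\) and (2.4) reads \(\ddot{\gamma }=\gamma -p\), solved by \(\gamma (t)=p+a\cosh t+b\sinh t\). The names `curve`, `conserved_quantity` and the sample vectors are hypothetical.

```python
import math

# Assumptions: h = id and K = {p}; then grad f_K(x) = p - x and the curve
# gamma(t) = p + a*cosh(t) + b*sinh(t) solves gamma'' = gamma - p.

def curve(t, p, a, b):
    """Position and velocity of gamma(t) = p + a*cosh(t) + b*sinh(t)."""
    pos = [pi + ai * math.cosh(t) + bi * math.sinh(t)
           for pi, ai, bi in zip(p, a, b)]
    vel = [ai * math.sinh(t) + bi * math.cosh(t) for ai, bi in zip(a, b)]
    return pos, vel


def conserved_quantity(t, p, a, b):
    """|gamma'(t)|^2 - |grad f_K(gamma(t))|^2, which (2.5) predicts constant."""
    pos, vel = curve(t, p, a, b)
    kinetic = sum(v * v for v in vel)
    potential = sum((pi - xi) ** 2 for pi, xi in zip(p, pos))
    return kinetic - potential


p, a, b = (1.0, -2.0), (0.3, -0.1), (0.2, 0.5)
values = [conserved_quantity(t, p, a, b) for t in (0.0, 0.5, 1.0)]
```

In this example the constant c equals \(|b|^{2}-|a|^{2}\), as one sees from \(\cosh ^{2}t-\sinh ^{2}t=1\).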

We end this part by quoting a result from [3] addressing the matter of stability for the class of functionals considered so far. By adding endpoints constraints \(x_{0},x_{\delta }\in H\), we define the functional \(I^{\delta }_{f,x_{0},x_{\delta }}:C([0,\delta ], H)\rightarrow [0,+\infty ]\) such that

$$\begin{aligned} I^{\delta }_{f,x_{0},x_{\delta }}(\gamma ):= \left\{ \begin{array}{ll} \displaystyle \int _{0}^{\delta }|\dot{\gamma }|^2+h(|\nabla f|^{2}(\gamma ))\quad &{}\text {if }\quad \gamma \in AC([0,\delta ], H)\text { and } \gamma (0)=x_{0}, \gamma (\delta )=x_{\delta },\\ +\infty &{}\text {otherwise}. \end{array}\right. \end{aligned}$$

Theorem 2.1

(Stability, [3]) Let \(f_{j}, f\) be uniformly \(\lambda \)-convex functions, and let \(x_{j,0}, x_{j,\delta }, x_{0}, x_{\delta }\in H\). Suppose that

  1. (i)

    \(f_{j}\rightarrow f\) w.r.t. Mosco convergence.

  2. (ii)

    \(\underset{j\rightarrow \infty }{\lim }x_{j,i}=x_{i}\), for \(i=0, \delta \).

  3. (iii)

    \(\underset{j}{\sup }|\nabla f_{j}|(x_{j,i})<\infty \), for \(i= 0, \delta \).

Then \(I^{\delta }_{f_{j},x_{j,0},x_{j,\delta }}\) \(\Gamma \)-converge to \(I^{\delta }_{f,x_{0},x_{\delta }}\) in the \(C([0,\delta ], H)\) topology.

As a byproduct, under an additional equi-coercivity assumption, this theorem grants convergence of minimal values to minimal values and of minimizers to minimizers. Notice that Theorem 2.1 is stated in [3] for \(h={{\,\textrm{id}\,}}\), but the same proof is seen to work in the general case with only minor modifications.

2.2 The Monge–Ampère gravitational model

In a periodic spatial domain like the flat torus \(\mathbb {T}^{n}=\mathbb {R}^{n}/\mathbb {Z}^{n}\), we can describe classical Newtonian gravitation of a unit of mass in a “parametric” way as follows. We first choose a reference probability space \((\mathcal {A}, \lambda )\) of labels for the gravitating particles. Then we assign to each particle \(a\in \mathcal {A}\) its position \(X_{t}(a)\in \mathbb {T}^{n}\) at time t. Typical choices for the reference space are the unit cube \([0,1]^n\) with the n-dimensional Lebesgue measure in the continuous case, and a finite set of points with the renormalized counting measure in the discrete case. Denoting by \(\mu _{t}:=(X_{t})_{\#}\lambda \) the image measure of \(\lambda \) under \(X_t\), the Newtonian model can be written as

$$\begin{aligned} \left\{ \begin{array}{ll} \frac{d^{2}}{dt^{2}}X_{t}(a)=-\nabla \phi _{t}(X_{t}(a)),\\ \Delta \phi _{t} = \mu _{t}-1. \end{array}\right. \end{aligned}$$
(2.6)

Here \(\phi _{t}\) is the gravitational potential generated by \(\mu _{t}\), defined on \(\mathbb {T}^{n}\). Notice that due to the periodicity of the space, the average density 1 has been subtracted from the right-hand side of the Poisson equation, in order to let the uniform measure \(\mathscr {L}^{n}\) be a stationary solution of the system. This is a perfectly meaningful assumption, because by symmetry, the attractive force of the uniform density has to be zero everywhere on \(\mathbb {T}^{n}\).

In this section we are interested in the related Monge–Ampère gravitational model (MAG in short), which is simply obtained from (2.6) by replacing the Poisson equation with the fully non-linear Monge–Ampère equation:

$$\begin{aligned} \left\{ \begin{array}{ll} \frac{d^{2}}{dt^{2}}X_{t}(a)=-\nabla \phi _{t}(X_{t}(a)),\\ \det (\mathbb {I}+\nabla ^{2}\phi _{t}) = \mu _{t}. \end{array}\right. \end{aligned}$$
(2.7)

Notice that (2.6) can be recovered from (2.7) by expanding the determinant in the Monge–Ampère equation and keeping only the linear term:

$$\begin{aligned}\det (\mathbb {I}+\nabla ^{2}\phi _{t})\approx 1 + {{\,\textrm{Tr}\,}}(\nabla ^{2}\phi _t)=1+\Delta \phi _{t}.\end{aligned}$$
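The size of the neglected terms in this linearization can be made explicit in dimension 2, where \(\det (\mathbb {I}+\varepsilon A)=1+\varepsilon {{\,\textrm{Tr}\,}}(A)+\varepsilon ^{2}\det (A)\). The following is a small numerical sketch of this fact; the symmetric matrix `A`, standing in for a Hessian \(\nabla ^{2}\phi _{t}\), and the scale `eps` are arbitrary illustrative choices, not data from the paper.

```python
# Sketch for 2x2 matrices only; A and eps are illustrative assumptions.

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def linearization_error(A, eps):
    """det(I + eps*A) minus its first-order expansion 1 + eps*Tr(A)."""
    shifted = [
        [1.0 + eps * A[0][0], eps * A[0][1]],
        [eps * A[1][0], 1.0 + eps * A[1][1]],
    ]
    return det2(shifted) - (1.0 + eps * (A[0][0] + A[1][1]))


A = [[2.0, 1.0], [1.0, -3.0]]
eps = 1e-2
error = linearization_error(A, eps)  # equals eps**2 * det(A) up to rounding
```

The error is quadratic in `eps`, which is the sense in which the Poisson equation in (2.6) is the linearization of the Monge–Ampère equation in (2.7) for small \(\nabla ^{2}\phi _{t}\).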

We refer to [8] and the references therein for a broader introduction to this dynamical system, as well as for a comparison with the classical Newtonian model.

The MAG model in optimal transportation terms. System (2.7) appears to have an intriguing geometrical interpretation if we look at it from the optimal transportation point of view. In order to better illustrate this link, we first quote the following specialization to the flat torus \(\mathbb {T}^{n}\) of the classical Brenier–McCann theorem on the existence and uniqueness of optimal transport maps on Riemannian manifolds (see [7, 11, 15]). Let us begin with some notation. We denote by \(\pi :\mathbb {R}^{n}\rightarrow \mathbb {T}^{n}\) the projection to the quotient. We say that a vector field \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) is \(\mathbb {Z}^{n}\)-translation invariant if \(F(\cdot +z)=F(\cdot )+z\), for every \(z\in \mathbb {Z}^{n}\). If this is the case, with a slight abuse of notation we also regard F as a map from \(\mathbb {T}^{n}\) to itself. Given a Borel probability measure \(\lambda \) on \(\mathbb {T}^{n}\), we consider the Hilbert space

$$\begin{aligned}H_{\lambda }:= L^{2}(\mathbb {T}^{n}, \lambda ; \mathbb {R}^{n})\end{aligned}$$

and its closed subset \(K_{\lambda }\) given by all the \(\lambda \)-preserving vector fields

$$\begin{aligned}K_{\lambda }:=\left\{ Y\in H_{\lambda }: (\pi \circ Y)_{\#}\lambda = \lambda \right\} .\end{aligned}$$

Finally, we recall that \(f_{K_{\lambda }}\) denotes the opposite squared distance function from \(K_{\lambda }\), as defined in (2.2).

Theorem 2.2

(Existence and uniqueness of optimal transport maps in \(\mathbb {T}^{n}\)) Let \(\mu \) and \(\lambda \) be Borel probability measures on \(\mathbb {T}^{n}\), and suppose that \(\mu \ll \mathscr {L}^{n}\). Then

  1. (i)

    There exists a locally Lipschitz convex function \(\psi :\mathbb {R}^{n}\rightarrow \mathbb {R}\), such that \(\phi (x):=\psi (x)-\frac{|x|^{2}}{2}\) is \(\mathbb {Z}^{n}\)-periodic (therefore \(\nabla \psi (x)=x + \nabla \phi (x)\) is \(\mathbb {Z}^{n}\)-translation invariant), and \(T:=\nabla \psi : \mathbb {T}^{n}\rightarrow \mathbb {T}^{n}\) is the unique optimal transport map from \(\mu \) to \(\lambda \).

  2. (ii)

    If \(\mu = \rho \mathscr {L}^{n}\) and \(\lambda = \eta \mathscr {L}^{n}\) are both absolutely continuous w.r.t. the Lebesgue measure, then \(\phi \) solves the Monge–Ampère equation

    $$\begin{aligned} \det (\mathbb {I}+\nabla ^{2}\phi )\eta (T(x))=\rho (x) \end{aligned}$$
    (2.8)

    in the almost everywhere sense. Furthermore, if \(\rho \) and \(\eta \) are of class \(C^{0,\alpha }\) and \(\rho , \eta >0\), then \(\phi \) is of class \(C^{2,\beta }\), for \(0<\beta <\alpha \), and solves (2.8) in the classical sense.

  3. (iii)

    Let \(Y\in H_{\lambda }\) be such that \((\pi \circ Y)_{\#}\lambda = \mu \). Then \(T \circ Y\) is the unique projection of Y on \(K_{\lambda }\), and

    $$\begin{aligned} \Vert T \circ Y - Y \Vert _{H_{\lambda }} = W_{2}(\mu ,\lambda ), \end{aligned}$$
    (2.9)

    where \(W_{2}\) is the Wasserstein distance in the space \(\mathscr {P}_{2}(\mathbb {T}^{n})\). Moreover, the map \(f_{K_{\lambda }}\) is Gateaux differentiable at Y and it holds

    $$\begin{aligned} \nabla f_{K_{\lambda }}(Y)= T \circ Y - Y= \nabla \phi \circ Y. \end{aligned}$$
    (2.10)

In order to reformulate the Monge–Ampère gravitational model in optimal transportation terms, we look at the continuous case, in which the reference space is given by \((\mathbb {T}^{n},\mathscr {L}^{n})\). Fix then \(\lambda = \mathscr {L}^{n}\) in the theorem above, and consider a parametrization \(X_{t}:\mathbb {T}^{n}\rightarrow \mathbb {T}^{n}\) such that \((X_{t})_{\#}\lambda = \mu _{t}\) and \(\mu _{t}=\rho _{t}\mathscr {L}^{n}\) is absolutely continuous w.r.t. the Lebesgue measure. If \(Y_{t}\in H_{\lambda }\) is any lifting of \(X_{t}\), that is to say a map that satisfies \(\pi \circ Y_{t} = X_{t}\), then Theorem 2.2 ensures that the Kantorovich potential \(\phi _{t}\) solves the Monge–Ampère equation

$$\begin{aligned}\det (\mathbb {I}+\nabla ^{2}\phi _{t})=\mu _{t},\end{aligned}$$

and \(-\nabla \phi _{t} = Y_{t}-T_{t}\circ Y_{t}\), where \(T_{t}\) is the unique optimal transport map from \(\mu _{t}\) to \(\lambda \). So we see that (2.7) reduces to

$$\begin{aligned} \frac{d^{2}}{dt^{2}}Y_{t}(a)= Y_{t}(a)-T_{t}(Y_{t}(a)). \end{aligned}$$
(2.11)

Moreover, from (2.10) we obtain

$$\begin{aligned}|\nabla f_{K_{\lambda }}(Y)|^{2}= |T \circ Y - Y|^{2}= -2 f_{K_{\lambda }}(Y),\end{aligned}$$

suggesting an interpretation of (2.11) as the Euler–Lagrange equation associated to the functional

$$\begin{aligned} \int _{0}^{\delta }|\dot{\gamma }|^{2}+|\nabla f_{K_{\lambda }}(\gamma )|^{2},\quad \gamma :[0,\delta ]\rightarrow H_{\lambda }. \end{aligned}$$
(2.12)
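The formal computation behind this interpretation can be sketched as follows, under the assumption (not justified in general, and precisely what fails in the discrete setting) that \(f_{K_{\lambda }}\) is differentiable along \(\gamma \). By the identity above, the Lagrangian in (2.12) equals \(|\dot{\gamma }|^{2}-2f_{K_{\lambda }}(\gamma )\), so that for a smooth variation w vanishing at the endpoints we get

$$\begin{aligned} \frac{d}{d\varepsilon }\Big |_{\varepsilon =0}\int _{0}^{\delta }|\dot{\gamma }+\varepsilon \dot{w}|^{2}-2f_{K_{\lambda }}(\gamma +\varepsilon w)=\int _{0}^{\delta }2\langle \dot{\gamma },\dot{w}\rangle -2\langle \nabla f_{K_{\lambda }}(\gamma ),w\rangle =-2\int _{0}^{\delta }\langle \ddot{\gamma }+\nabla f_{K_{\lambda }}(\gamma ),w\rangle . \end{aligned}$$

Stationarity for every such w gives, formally, \(\ddot{\gamma }=-\nabla f_{K_{\lambda }}(\gamma )\), and by (2.10) the right-hand side is \(\gamma -T\circ \gamma \), in agreement with (2.11).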

This variational reformulation appears natural in the attempt to give a meaning to system (2.7) also in the discrete setting, where, as is well known, Theorem 2.2 fails.

The discrete MAG model. As already mentioned before, one of the aims of this work is to go deeper in the analysis of the discrete version of the Monge–Ampère gravitational model, first introduced in [8] and then formalized in [2]. Here we choose as reference measure

$$\begin{aligned}\lambda = \frac{1}{m}\overset{m}{\underset{i=1}{\sum }}\delta _{a_i},\end{aligned}$$

where the \(a_i\)’s are distinct points on \(\mathbb {T}^{n}\) (think for instance of a regular lattice approximating the uniform measure). In this case, the space \(H_{\lambda }\) is easily seen to be finite-dimensional, and isomorphic to \(\mathbb {R}^{nm}\), through the identification of a map \(Y\in H_{\lambda }\) with the m-tuple \((Y(a_{1}),\dots ,Y(a_{m}))\in \left( \mathbb {R}^{n}\right) ^{m}\). Under this correspondence, \(K_{\lambda }\) is represented by the discrete set of all points \((b_1,\dots ,b_m)\) in \(\mathbb {R}^{nm}\) such that \(\pi \left( \left\{ b_1,\dots ,b_m\right\} \right) =\left\{ a_1,\dots ,a_m\right\} \). By regarding, a bit improperly, the \(a_i\)’s as elements of \([0,1)^{n}\), the set \(K_\lambda \) can be written as the union of m! cubic lattices in \(\mathbb {R}^{nm}\):

$$\begin{aligned}K_{\lambda }=\underset{\sigma \in \mathfrak {S}_m}{\bigcup }(a_{\sigma (1)},\dots ,a_{\sigma (m)})+\mathbb {Z}^{nm}.\end{aligned}$$

In this discrete scenario, the MAG model describes the motion of m particles of equal mass 1/m in the torus \(\mathbb {T}^{n}\), whose dynamics is ruled by the optimal transport problem as follows. The position of the ith particle at time t is denoted by \(x_{i}(t)=X_{t}(a_{i})\), and a lifting of \(x_{i}(t)\) to \(\mathbb {R}^{n}\) by \(y_{i}(t)=Y_{t}(a_i)\). The equivalent of (2.11) in this setting is, at least formally,

$$\begin{aligned} \ddot{y}_{i}(t)=y_{i}(t)-b_{i}^{{{\,\textrm{opt}\,}}}(t),\quad i\in \left\{ 1,\dots , m\right\} , \end{aligned}$$
(2.13)

where \((b_{1}^{{{\,\textrm{opt}\,}}}(t),\dots ,b_{m}^{{{\,\textrm{opt}\,}}}(t))\) is the closest point to \((y_{1}(t),\dots ,y_{m}(t))\) in \(K_{\lambda }\). The system (2.13) is easily seen to be ill posed, because of the general non-uniqueness of the projection on \(K_\lambda \), ultimately due to the non-uniqueness of the optimal transport map in the discrete setting, in contrast with the absolutely continuous one. As already pointed out in [2], in order to fix this problem, it is convenient to switch to a variational reformulation of the dynamical system, by considering an action functional of type (2.12). Therefore, relying on a least action principle, we say that \(y\in AC([0,\delta ], \mathbb {R}^{nm})\) is a solution of the discrete MAG model if it is a local minimizer of the functional

$$\begin{aligned}\int _{0}^{\delta }|\dot{\gamma }|^{2}+|\nabla f_{K_{\lambda }}(\gamma )|^{2}\end{aligned}$$

subject to endpoints constraints. In the next section, we are going to study a more general functional in which \(K_{\lambda }\) is replaced by a generic discrete set K in \(\mathbb {R}^{d}\).
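The non-uniqueness of the projection on \(K_{\lambda }\), which makes (2.13) ill posed, can be observed by brute force on a tiny example. The sketch below relies on the description of \(K_{\lambda }\) as a union of m! shifted copies of \(\mathbb {Z}^{nm}\): for each permutation, the closest lattice point is found by componentwise rounding. The function names and the sample data (n = 1, m = 2, \(a_{1}=0\), \(a_{2}=1/2\)) are hypothetical, and ties inside the rounding step itself are not tracked.

```python
import itertools

# Illustrative sketch; names and sample data are assumptions, not the paper's.

def nearest_in_shifted_lattice(y, a):
    """A closest point to y in a + Z^n, found by componentwise rounding."""
    return tuple(ai + round(yi - ai) for yi, ai in zip(y, a))


def project_on_K_lambda(ys, a_points, tol=1e-12):
    """Closest points to ys = (y_1, ..., y_m) in K_lambda, one candidate per
    permutation, together with the squared distance."""
    best, best_d2 = [], float("inf")
    for sigma in itertools.permutations(range(len(a_points))):
        b = tuple(
            nearest_in_shifted_lattice(y, a_points[s])
            for y, s in zip(ys, sigma)
        )
        d2 = sum(
            (yi - bi) ** 2
            for y, bpt in zip(ys, b)
            for yi, bi in zip(y, bpt)
        )
        if d2 < best_d2 - tol:
            best, best_d2 = [b], d2
        elif abs(d2 - best_d2) <= tol and b not in best:
            best.append(b)
    return best, best_d2


a_points = [(0.0,), (0.5,)]
unique_case = project_on_K_lambda(((0.1,), (0.6,)), a_points)
tie_case = project_on_K_lambda(((0.25,), (0.25,)), a_points)
```

In `tie_case` the two particles sit at the same position, halfway between the two reference points, and both assignments are optimal: this is the kind of configuration in which the right-hand side of (2.13) is not well defined.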

Before concluding this part, we would like to briefly turn the attention of the reader to an analogous Lagrangian problem in the space of probability measures \((\mathscr {P}(\mathbb {T}^{n}),W_{2})\). This can be obtained from MAG by dropping the parametric description of the gravitating matter, required by the Hilbertian setting of Sect. 2.1, and directly considering the evolution of a probability measure \(\mu _{t}\) in \(\mathbb {T}^{n}\).

A related Lagrangian problem in \((\mathscr {P}(\mathbb {T}^{n}),W_{2}).\) Far from being limited to the Hilbertian context, functionals of type (1.1) can be considered in a much more general metric setting, provided that we interpret \(|\nabla f(x)|\) as the descending slope \(|\nabla ^{-}f|(x)\) defined in (1.2), and \(|\dot{\gamma }|\) as the metric derivative of an absolutely continuous curve \(\gamma :[0,1]\rightarrow X\). We avoid repeating all the constructions in this new scenario (see [4] for a systematic introduction), and prefer to immediately specialize to our case of interest. We take \((X,d)=(\mathscr {P}(\mathbb {T}^{n}), W_{2})\), the space of probability measures on \(\mathbb {T}^{n}\) endowed with the Wasserstein distance induced by the optimal transport problem with quadratic cost. As is well known, \((X,d)\) is compact, geodesic and positively curved (see [5]). Given a “reference” probability measure \(\lambda \in \mathscr {P}(\mathbb {T}^{n})\), we consider the (halved) opposite squared distance function from \(\lambda \), namely

$$\begin{aligned}f_{\lambda }(\mu )=-\frac{W_{2}^{2}(\mu , \lambda )}{2}.\end{aligned}$$

Since \((X,d)\) is positively curved, we easily deduce that \(f_{\lambda }\) is \((-1)\)-convex (in the metric setting, convexity is understood along geodesics). Moreover, we can bound the descending slope of \(f_{\lambda }\) at \(\mu \) as follows:

$$\begin{aligned} |\nabla ^{-}f_{\lambda }|(\mu )= & {} \underset{\nu \rightarrow \mu }{\limsup }\,\frac{[W_{2}^{2}(\nu , \lambda )-W_{2}^{2}(\mu ,\lambda )]^{+}}{2W_{2}(\mu , \nu )}\nonumber \\\le & {} \underset{\nu \rightarrow \mu }{\limsup }\,\frac{W_{2}(\nu ,\lambda )+W_{2}(\mu ,\lambda )}{2}=W_{2}(\mu ,\lambda ). \end{aligned}$$
(2.14)
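The inequality in (2.14) rests on factoring the difference of squares and on the triangle inequality for \(W_{2}\). Spelled out, for \(\nu \ne \mu \) with \(W_{2}(\nu ,\lambda )\ge W_{2}(\mu ,\lambda )\) (otherwise the positive part vanishes and there is nothing to prove), one has

$$\begin{aligned} \frac{W_{2}^{2}(\nu ,\lambda )-W_{2}^{2}(\mu ,\lambda )}{2W_{2}(\mu ,\nu )}=\frac{W_{2}(\nu ,\lambda )-W_{2}(\mu ,\lambda )}{W_{2}(\mu ,\nu )}\cdot \frac{W_{2}(\nu ,\lambda )+W_{2}(\mu ,\lambda )}{2}\le \frac{W_{2}(\nu ,\lambda )+W_{2}(\mu ,\lambda )}{2}, \end{aligned}$$

since \(W_{2}(\nu ,\lambda )-W_{2}(\mu ,\lambda )\le W_{2}(\mu ,\nu )\). The final equality in (2.14) then follows from the continuity of \(\nu \mapsto W_{2}(\nu ,\lambda )\).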

A precise characterization of \(|\nabla ^{-}f_{\lambda }|(\mu )\) is given in [5, Theorem 10.4.12], and involves the minimal \(L^{2}\) norm of the barycentric projection of optimal transport plans. Inspired by the MAG model, and in particular by formulas (2.9) and (2.10), one could study the Lagrangian problem associated to the lower semicontinuous functional \(I^{\delta }_{f_{\lambda },\mu _{0},\mu _{\delta }}:C([0,\delta ],X)\rightarrow [0,+\infty ]\) defined by

$$\begin{aligned}I^{\delta }_{f_{\lambda },\mu _{0},\mu _{\delta }}(\gamma )= \left\{ \begin{array}{ll} \displaystyle \int _{0}^{\delta }|\dot{\gamma }|^{2}+|\nabla ^{-} f_{\lambda }|^{2}(\gamma )\quad &{}\text {if}\quad \gamma \in AC([0,\delta ],X)\quad \text {and} \quad \gamma (0)=\mu _{0}, \gamma (\delta )=\mu _{\delta },\\ +\infty &{}\text {otherwise}. \end{array}\right. \end{aligned}$$

From the compactness of X, we immediately obtain the existence of minimizers of \(I^{\delta }_{f_{\lambda },\mu _{0},\mu _{\delta }}\). In addition, by exploiting a generalization of Theorem 2.1 to the general metric setting provided by [4, Theorem 17], as well as the bound on the descending slope (2.14), we obtain the following stability result:

Proposition 2.3

Let \(\lambda _{j}, \lambda \in \mathscr {P}(\mathbb {T}^{n})\) be reference measures, and \(\mu _{j,0},\mu _{j,\delta },\mu _{0},\mu _{\delta }\in \mathscr {P}(\mathbb {T}^{n})\) be endpoints. Suppose that

  1. (i)

    \(\lambda _{j}\rightarrow \lambda \) in \(W_{2}\).

  2. (ii)

    \(\mu _{j,i}\rightarrow \mu _{i}\) in \(W_{2}\), for \(i=0,\delta \).

Then \(I^{\delta }_{f_{\lambda _{j}},\mu _{j,0},\mu _{j,\delta }}\) \(\Gamma \)-converge to \(I^{\delta }_{f_{\lambda },\mu _{0},\mu _{\delta }}\) in the \(C([0,\delta ],X)\) topology. Moreover, we have convergence of minimal values to minimal values and of minimizers to minimizers.

3 The case of the opposite squared distance function in \(\mathbb {R}^{d}\)

In this section the main results of the paper will be derived. We study functionals of type (2.3) in the special case in which \(H=\mathbb {R}^{d}\) and \(f=f_{K}\) is the opposite squared distance function from a closed subset \(K\subseteq \mathbb {R}^{d}\). Motivated by the variational reformulation of the discrete MAG model, derived in the previous section, we will in particular focus on the case in which K is a discrete collection of points in \(\mathbb {R}^{d}\). We highlight here that the functional associated to a general closed set can be approximated, in the sense of \(\Gamma \)-convergence, by functionals associated to discrete sets (see Corollary 3.4 below). In the discrete setting, we will exploit the geometrical structure given by the associated Voronoi decomposition of the space in order to get regularity for local minimizers out of a finite number of “shock times”.

Given a closed set \(K\subseteq \mathbb {R}^{d}\), we consider the opposite squared distance function from K, defined by

$$\begin{aligned} f_{K}(x):=-\frac{{{\,\textrm{dist}\,}}^{2}_{K}(x)}{2}=-\underset{y\in K}{\min }\frac{|x-y|^{2}}{2}. \end{aligned}$$
(3.1)

Notice that the infimum in (2.2) is always attained here, due to the local compactness of the ambient space. The convex function

$$\begin{aligned} g_{K}(x):=\underset{y\in K}{\max }\left( x\cdot y-\frac{|y|^2}{2}\right) \end{aligned}$$
(3.2)

satisfies \(g_{K}(x)=f_{K}(x)+\frac{|x|^{2}}{2}\), thus implying the \((-1)\)-convexity of \(f_{K}\). We fix a potential shape \(h:[0,+\infty )\rightarrow [0,+\infty )\) of class \(C^{1}\) and consider the action functional \(I_{f_{K}, x_{0}, x_{\delta }}^{\delta }:C([0,\delta ],\mathbb {R}^{d})\rightarrow [0,+\infty ]\) defined by

$$\begin{aligned} I^{\delta }_{f_{K},x_{0},x_{\delta }}(\gamma ):= \left\{ \begin{array}{ll} \displaystyle \int _{0}^{\delta }|\dot{\gamma }|^2+h(|\nabla f_{K}|^{2}(\gamma ))\quad &{}\text {if}\quad \gamma \in AC([0,\delta ], \mathbb {R}^{d})\quad \text {and} \quad \gamma (0)=x_{0}, \gamma (\delta )=x_{\delta },\\ +\infty &{}\text {otherwise}. \end{array}\right. \end{aligned}$$

We stress that \(\nabla f_{K}\) is to be understood as an extended gradient, because \(f_{K}\) is differentiable only at those points at which the projection on K is unique.

In order to get a useful characterization of \(\nabla f_{K}\), we need a well-known Lemma of convex analysis providing an explicit formula for the subdifferential at x of the maximum of a family of convex functions, under suitable assumptions (see [14]).

Lemma 3.1

(Subdifferential of the sup function) Let \(\left\{ g_{\alpha }:\mathbb {R}^{d}\rightarrow \mathbb {R}\right\} _{\alpha \in \mathcal {A}}\) be a collection of convex functions indexed on a compact metric space \(\mathcal {A}\), and suppose that \(\alpha \mapsto g_{\alpha }(x)\) is upper semicontinuous for every \(x\in \mathbb {R}^{d}\). We consider the supremum function

$$\begin{aligned}g:=\underset{\alpha \in \mathcal {A}}{\sup }\,g_{\alpha }.\end{aligned}$$

Then, if the supremum in the definition of g(x) is attained, the following formula holds for the subdifferential of g at x:

$$\begin{aligned} \partial g(x)={{\,\textrm{conv}\,}}\left( \bigcup \left\{ \partial g_{\alpha }(x): g_{\alpha }(x)=g(x)\right\} \right) . \end{aligned}$$
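A one-dimensional instance may help to fix ideas: for \(g(x)=\max \{x,-x\}=|x|\), both functions are active at \(x=0\), so the formula predicts \(\partial g(0)={{\,\textrm{conv}\,}}\{-1,1\}=[-1,1]\). The Python sketch below tests the subgradient inequality on a finite sample of points; the sample and the candidate slopes are assumptions chosen for illustration, so this is a sanity check and not a proof.

```python
# Sketch: subgradient inequality g(y) >= g(x) + s * (y - x) for g = max(x, -x) at x = 0.

def g(x):
    return max(a * x for a in (-1.0, 1.0))

def is_subgradient(s, x, samples):
    """True if s satisfies the subgradient inequality at x on all sample points."""
    return all(g(y) >= g(x) + s * (y - x) - 1e-12 for y in samples)

samples = [k / 10.0 for k in range(-30, 31)]
inside = [s for s in (-1.0, -0.5, 0.0, 0.5, 1.0) if is_subgradient(s, 0.0, samples)]
outside = [s for s in (-1.5, 1.2) if is_subgradient(s, 0.0, samples)]
print(inside, outside)
```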

We call \({{\,\textrm{opt}\,}}_{K}(x)\) the compact subset of K containing all the points that minimize the distance from x:

$$\begin{aligned}{{\,\textrm{opt}\,}}_{K}(x):=\left\{ y \in K: {{{\,\textrm{dist}\,}}}_{K}(x)=|x-y|\right\} =\left\{ y\in K: g_{K}(x)=x\cdot y-\frac{|y|^2}{2}\right\} .\end{aligned}$$

In the sequel we will refer to \({{\,\textrm{opt}\,}}_{K}(x)\) as the optimality class of x. Applying Lemma 3.1 we get:

Proposition 3.2

(Subdifferential of the opposite squared distance function) Let \(K\subseteq \mathbb {R}^{d}\) be a closed set, and let \(f_K, g_K\) be defined as in (3.1) and (3.2). Then

  1. (i)

    The subdifferential of \(g_K\) at x is given by

    $$\begin{aligned}\partial g_{K}(x)={{\,\textrm{conv}\,}}({{\,\textrm{opt}\,}}_{K}(x)).\end{aligned}$$
  2. (ii)

    The subdifferential of \(f_{K}\) at x is given by

    $$\begin{aligned}\partial f_{K}(x)={{\,\textrm{conv}\,}}({{\,\textrm{opt}\,}}_{K}(x))-x.\end{aligned}$$

    Moreover, denoting by \(\eta _{K}(x)\) the unique projection of x on the closed convex set \({{\,\textrm{conv}\,}}({{\,\textrm{opt}\,}}_{K}(x))\), the following formula holds for the extended gradient of \(f_{K}\) at x:

    $$\begin{aligned} \nabla f_{K}(x)=\eta _{K}(x)-x. \end{aligned}$$
    (3.3)
  3. (iii)

    The point \(\eta _{K}(x)\) depends only on the optimality class of x. That is to say, \(\eta _{K}(x)=\eta _{K}(y)\), whenever \({{\,\textrm{opt}\,}}_{K}(x)={{\,\textrm{opt}\,}}_{K}(y)\).

Proof

Point (i) easily follows from Lemma 3.1 if K is compact. To deal with the general case it is enough to notice that for every \(x\in \mathbb {R}^{d}\) and every radius \(R>{{\,\textrm{dist}\,}}_{K}(x)\), we have \(g_{K}=g_{K\cap \overline{B_{R}(x)}}\) in a neighborhood of x. The formula for \(\partial f_{K}(x)\) is a consequence of the rule for the subdifferential of the sum of two functions, one of which is smooth. Then, by definition, \(\nabla f_{K}(x)\) is the projection of 0 on \(\partial f_{K}(x)={{\,\textrm{conv}\,}}({{\,\textrm{opt}\,}}_{K}(x))-x\), and formula (3.3) follows after a translation by x. Let us now address point (iii). Suppose that x and y share the same optimality class, \({{\,\textrm{opt}\,}}_{K}(x)={{\,\textrm{opt}\,}}_{K}(y)\). Consider the affine space A spanned by \({{\,\textrm{opt}\,}}_{K}(x)\) and its orthogonal space B passing through x. From the hypothesis on x and y we deduce that y also belongs to B. Then, denoting by p the point of intersection of A and B, by orthogonality we have:

$$\begin{aligned}{} & {} |x-z|^2=|x-p|^2+|p-z|^2,\\{} & {} |y-z|^2=|y-p|^2+|p-z|^2\quad \text {for every }\quad z\in {{{\,\textrm{conv}\,}}}({{\,\textrm{opt}\,}}_{K}(x)).\end{aligned}$$

Hence, both distances are minimized by the point z obtained by projecting p on \({{{\,\textrm{conv}\,}}}({{\,\textrm{opt}\,}}_{K}(x))\), thus \(\eta _{K}(x)=\eta _{K}(y)\).\(\square \)
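For a finite planar set K, the objects of Proposition 3.2 can be computed directly. The Python sketch below is a brute-force illustration restricted to \(d=2\): the helper names (`proj_hull`, `opt`, `grad_f`) and the sample set `K` are assumptions made for this example, and the projection on the convex hull is obtained by testing triangles and segments, which is adequate only for small sets.

```python
from itertools import combinations

def sqdist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def cross(p, q, r):
    """Twice the signed area of the triangle p, q, r."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def proj_segment(x, a, b):
    """Nearest point to x on the segment [a, b] (a == b is allowed)."""
    ab = (b[0] - a[0], b[1] - a[1])
    den = ab[0] ** 2 + ab[1] ** 2
    t = 0.0 if den == 0.0 else ((x[0] - a[0]) * ab[0] + (x[1] - a[1]) * ab[1]) / den
    t = max(0.0, min(1.0, t))
    return (a[0] + t * ab[0], a[1] + t * ab[1])

def proj_hull(x, pts, tol=1e-12):
    """Projection of x on conv(pts) in the plane, by brute force."""
    for a, b, c in combinations(pts, 3):
        s = (cross(a, b, x), cross(b, c, x), cross(c, a, x))
        if abs(cross(a, b, c)) > tol and (min(s) >= -tol or max(s) <= tol):
            return x  # x already lies in the hull
    return min((proj_segment(x, a, b) for a in pts for b in pts),
               key=lambda p: sqdist(x, p))

def opt(x, K, tol=1e-12):
    """Optimality class of x: the points of K at minimal distance from x."""
    m = min(sqdist(x, y) for y in K)
    return [y for y in K if sqdist(x, y) <= m + tol]

def grad_f(x, K):
    """Extended gradient (3.3): eta_K(x) - x."""
    eta = proj_hull(x, opt(x, K))
    return (eta[0] - x[0], eta[1] - x[1])

K = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
print(opt((0.0, -1.0), K), grad_f((0.0, -1.0), K))
print(opt((2.0, 0.0), K), grad_f((2.0, 0.0), K))
```

In this sample, the points \((0,-1)\) and \((0,-3)\) share the optimality class \(\{(1,0),(-1,0)\}\) and therefore the same \(\eta _{K}\), in agreement with point (iii).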

Remark 3.3

From (3.3) we deduce that the potential term \(|\nabla f_{K}(x)|^2\) is always less than or equal to \(-2f_{K}(x)={{\,\textrm{dist}\,}}_{K}^{2}(x)\), and equality holds if and only if x has a unique projection on K. It is interesting to see what this means for the MAG model. Using the notation of Sect. 2.2, the potential of a configuration \((y_{1},\dots ,y_{m})\in \mathbb {R}^{nm}\) is always smaller than \(W_{2}(\mu ,\lambda )\), where, setting \(x_i = \pi (y_{i})\),

$$\begin{aligned}\lambda = \frac{1}{m}\overset{m}{\underset{i=1}{\sum }}\delta _{a_i},\quad \mu = \frac{1}{m}\overset{m}{\underset{i=1}{\sum }}\delta _{x_i} \end{aligned}$$

and \(W_{2}\) is the Wasserstein distance in \(\mathscr {P}_{2}(\mathbb {T}^{n})\). Moreover, equality holds if and only if there exists a unique optimal transport map from \(\mu \) to \(\lambda \). So we see that in the context of the MAG model, the potential term should be interpreted as a “measure of the ambiguity in the optimal transport problem”. Typical manifestations of ambiguity in the discrete scenario appear when two or more particles collapse, thus sharing the same position in \(\mathbb {T}^{n}\). Compare this phenomenon also with the continuous framework of Theorem 2.2, where this ambiguity does not occur.

To end this part, we briefly come back to the matter of stability in this specialized context, stating the following Corollary of Theorem 2.1:

Corollary 3.4

Let \(K_{j}, K\) be closed subsets of \(\mathbb {R}^{d}\), and let \(x_{j,0}, x_{j,\delta }, x_{0}, x_{\delta }\in \mathbb {R}^{d}\). Suppose that

  1. (i)

    \(K_{j}\rightarrow K\) in the sense of Hausdorff in every compact set.

  2. (ii)

    \(x_{j,i}\rightarrow x_{i}\) for \(i= 0,\delta \).

Then \(I^{\delta }_{f_{K_{j}},x_{j,0},x_{j,\delta }}\) \(\Gamma \)-converge to \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\) in the \(C([0,\delta ],\mathbb {R}^{d})\) topology. Moreover, we have convergence of minimal values to minimal values and of minimizers to minimizers.

As a consequence, the functional associated to a closed set K can be approximated by functionals associated to \(K_{j}\), where each \(K_{j}\) is a finite collection of points in \(\mathbb {R}^{d}\). The following subsection focuses on this simpler situation.

3.1 The discrete case

From now on, we restrict our analysis to the case in which K is given by a collection of N distinct points in \(\mathbb {R}^{d}\):

$$\begin{aligned}K=\left\{ p_{1},\dots ,p_{N}\right\} .\end{aligned}$$

We are particularly interested in studying properties of local minimizers for \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\) because of the link with the variational reformulation of the discrete MAG model (see the discussion above). There, K was an infinite discrete set, but we can clearly restrict our analysis, which is essentially local, to the case in which K is finite, due to the compactness of the range of every continuous curve \(\gamma :[0,\delta ]\rightarrow \mathbb {R}^{d}\). Let us fix K, so that we may omit all the subscripts involving it. For instance, we will write \(f, g, \eta , {{\,\textrm{opt}\,}}\) in place of \(f_{K}, g_{K}, \eta _{K}, {{\,\textrm{opt}\,}}_{K}\).

Polyhedra, Voronoi cells and potential zones. We say that \(P\subseteq \mathbb {R}^{d}\) is a polyhedron if it is a non-empty closed convex set admitting a representation of the form

$$\begin{aligned} P=\overset{\ell }{\underset{j=1}{\bigcap }}\left\{ x\in \mathbb {R}^{d}: T_{j}(x)\le 0\right\} , \end{aligned}$$
(3.4)

where \(\ell \in \mathbb {N}\) and \(T_{j}:\mathbb {R}^{d}\rightarrow \mathbb {R}\) are affine functions. A bounded polyhedron is called a polytope.

The Voronoi partition associated to a finite collection of points \(K=\left\{ p_{1},\dots ,p_{N}\right\} \) is the finite decomposition \(\left\{ V_{H}\right\} _{H\in \mathcal {P}(K)}\) of \(\mathbb {R}^{d}\), indexed by the power set \(\mathcal {P}(K)\) of K (the set of all subsets of K), and such that

$$\begin{aligned}V_{H}=\left\{ x\in \mathbb {R}^{d}: {{\,\textrm{opt}\,}}(x)=H\right\} .\end{aligned}$$

We call \(V_H\) the Voronoi cell corresponding to the optimality class H. The following are well-known facts about this remarkable cellular decomposition of the space (see [6]).

Proposition 3.5

(Properties of the Voronoi partition) Let \(H\in \mathcal {P}(K)\) be such that the Voronoi cell \(V_{H}\) is non-empty. Then

  1. (i)

    \(V_{H}\) is a convex set. Moreover, denoting by \(A_H\) the affine space spanned by H, and by \(B_H\) the affine space

    $$\begin{aligned}B_{H}:=\left\{ x\in \mathbb {R}^{d}: |x-p_i|=|x-p_j|, \, \forall p_i, p_j \in H\right\} ,\end{aligned}$$

    we have that \(A_H\) is orthogonal to \(B_H\), they have complementary dimensions in \(\mathbb {R}^{d}\), and \(V_H\) is relatively open in \(B_H\). We call \(p_H\) the unique intersection point of \(A_H\) and \(B_H\).

  2. (ii)

    The closure \(\overline{V_{H}}\) is a polyhedron, whose relative boundary in \(B_H\) is precisely given by the disjoint union of all the Voronoi cells \(V_{L}\) with \(H\subsetneq L\).
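The Voronoi partition can be explored by grouping sample points according to their optimality class. The Python sketch below does this on an integer grid, so that all squared distances are exact; the set `K` and the grid are assumptions chosen for illustration, and a finite grid only samples the cells.

```python
from collections import defaultdict

K = [(1, 0), (0, 1), (-1, 0)]  # integer coordinates keep all squared distances exact

def opt(x, K):
    """Optimality class of x, returned as a frozenset so it can index a dictionary."""
    d2 = [(x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2 for p in K]
    m = min(d2)
    return frozenset(p for p, d in zip(K, d2) if d == m)

def voronoi_cells(K, grid):
    """Map each optimality class H to the grid points lying in the cell V_H."""
    cells = defaultdict(list)
    for x in grid:
        cells[opt(x, K)].append(x)
    return dict(cells)

grid = [(i, j) for i in range(-3, 4) for j in range(-3, 4)]
cells = voronoi_cells(K, grid)
for H, pts in sorted(cells.items(), key=lambda kv: len(kv[0])):
    print(sorted(H), len(pts))
```

In this sample, the cell of the full class K reduces to the origin, which lies in the closure of each of the cells indexed by a two-point class, as point (ii) predicts.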

In order to introduce the second fundamental decomposition associated to K, we also need to define, for every \(\eta \in \mathbb {R}^{d}\), the sets:

$$\begin{aligned} Q_{\eta }&:=\left\{ x\in \mathbb {R}^{d}: \eta (x)=\eta \right\} ,\\ P_{\eta }&:=\left\{ x\in \mathbb {R}^{d}: \eta \in \partial g(x)\right\} . \end{aligned}$$

The following proposition encodes the underlying geometrical structure conferred to our variational problem by the particular choice we made for the potential. It will be of fundamental importance in deriving regularity results for local minimizers of the functional \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\).

Proposition 3.6

(Voronoi cells and potential zones) The following facts hold:

  1. (i)

    The map \(\eta \) is constant in each Voronoi cell, and hence has a finite range, that we denote by \(\mathcal {E}\).

  2. (ii)

    \(\left\{ Q_{\eta }\right\} _{\eta \in \mathcal {E}}\) is a partition of \(\mathbb {R}^{d}\), and \(x\in Q_{\eta }\) if and only if \(\nabla f(x)=\eta -x\). In the sequel we will refer to the \(Q_{\eta }\)’s as potential zones.

  3. (iii)

    For every \(\eta \in \mathcal {E}\), both \(Q_{\eta }\) and \(P_{\eta }\) are unions of Voronoi cells, and \(Q_{\eta } \subseteq P_{\eta }\).

  4. (iv)

    For every \(\eta \in \mathcal {E}\), \(P_{\eta }\) is a polyhedron.

  5. (v)

    Let \(\beta \) be the positive constant defined by

    $$\begin{aligned} \beta :=\underset{\eta , \bar{\eta } \in \mathcal {E}}{\underset{\eta \ne \bar{\eta }}{\min }}|\eta -\bar{\eta }|^2. \end{aligned}$$
    (3.5)

    Then we have

    $$\begin{aligned} |\bar{\eta }-x|^2\ge |\eta -x|^2+\beta \quad \text {for every distinct }\eta , \bar{\eta } \in \mathcal {E}\text { and }x\in Q_{\eta }\cap P_{\bar{\eta }}. \end{aligned}$$
    (3.6)

Proof

Points (i), (ii) and (iii) are direct consequences of Proposition 3.2. To prove point (iv) it is enough to notice that \(P_{\eta }\) is a closed convex set that can be written as the union of a finite number of polyhedra (the closures of the Voronoi cells contained in \(P_{\eta }\)). Finally, point (v) easily follows from the fact that \(\eta \) is the projection of x on the closed convex set \(\partial g(x)\) containing \(\bar{\eta }\).\(\square \)
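In dimension one the range \(\mathcal {E}\), the constant \(\beta \) of (3.5) and inequality (3.6) can be computed by hand, and the Python sketch below automates this on a sample. It uses exact rational arithmetic; the set `K` and the sample grid are assumptions, and the computed range is complete only because the grid contains the midpoints of consecutive points of `K`.

```python
from fractions import Fraction as F

def opt(x, K):
    m = min(abs(x - p) for p in K)
    return frozenset(p for p in K if abs(x - p) == m)

def eta(x, K):
    """In dimension one conv(opt(x)) is an interval, so the projection is a clamp."""
    H = opt(x, K)
    return min(max(x, min(H)), max(H))

def eta_range_and_beta(K, samples):
    """Sampled range of eta and the minimal squared gap beta of (3.5)."""
    E = sorted({eta(x, K) for x in samples})
    beta = min((a - b) ** 2 for a in E for b in E if a != b)
    return E, beta

def check_3_6(K, samples):
    """Test inequality (3.6) at every sample point."""
    E, beta = eta_range_and_beta(K, samples)
    for x in samples:
        H, e = opt(x, K), eta(x, K)
        for e_bar in E:
            # x lies in P_{e_bar} exactly when e_bar belongs to conv(opt(x))
            if e_bar != e and min(H) <= e_bar <= max(H):
                if (e_bar - x) ** 2 < (e - x) ** 2 + beta:
                    return False
    return True

K = [F(-1), F(1), F(4)]
samples = [F(k, 4) for k in range(-12, 25)]  # step 1/4 on [-3, 6]
E, beta = eta_range_and_beta(K, samples)
print(E, beta, check_3_6(K, samples))
```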

So we see that K carries two partitions of \(\mathbb {R}^{d}\), one finer than the other: the first into Voronoi cells and the second into potential zones. Simple examples show that for a general K they do not coincide (see for instance Example 3.14 below). If they coincide, we say that K is balanced. In such a case, the map \(\eta \) defines a bijection between Voronoi cells and potential zones, that is to say,

$$\begin{aligned} \eta (x)=\eta (y)\iff {{\,\textrm{opt}\,}}(x)={{\,\textrm{opt}\,}}(y). \end{aligned}$$

Clearly, a sufficient condition for K to be balanced is given by

$$\begin{aligned} {{\,\textrm{opt}\,}}(\eta (x))={{\,\textrm{opt}\,}}(x)\quad \text {for every }x\in \mathbb {R}^{d}. \end{aligned}$$
(3.7)

It is worth noting that in dimension \(d=1\) every K is balanced, and that the same is true in any dimension for cubic lattices.
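The sufficient condition (3.7) is easy to test in dimension one. The Python sketch below checks it on a sample of points, together with the equivalence \(\eta (x)=\eta (y)\iff {{\,\textrm{opt}\,}}(x)={{\,\textrm{opt}\,}}(y)\); the set `K` and the grid are assumptions, and a check on finitely many points illustrates the claim without proving it.

```python
from fractions import Fraction as F

def opt(x, K):
    m = min(abs(x - p) for p in K)
    return frozenset(p for p in K if abs(x - p) == m)

def eta(x, K):
    """Projection of x on the interval conv(opt(x))."""
    H = opt(x, K)
    return min(max(x, min(H)), max(H))

def satisfies_3_7(K, samples):
    """Condition (3.7): opt(eta(x)) == opt(x) at every sample point."""
    return all(opt(eta(x, K), K) == opt(x, K) for x in samples)

def is_balanced_on(K, samples):
    """eta(x) == eta(y) if and only if opt(x) == opt(y), over all sample pairs."""
    return all(
        (eta(x, K) == eta(y, K)) == (opt(x, K) == opt(y, K))
        for x in samples for y in samples
    )

K = [F(-1), F(1), F(4)]
samples = [F(k, 4) for k in range(-12, 25)]
print(satisfies_3_7(K, samples), is_balanced_on(K, samples))
```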

Conserved quantities. In this paragraph we underline the presence of some conserved quantities for local minimizers of our variational problem. They naturally arise by testing the local minimality against variations along some specific directions. The following Lemma collects two crucial observations in order to suitably perform such variations:

Lemma 3.7

(Local properties of the Voronoi diagram) The following facts hold:

  1. (i)

    If \(x\in V_H\), for some \(H\subseteq K\), then there exists a neighborhood U of x such that \({{\,\textrm{opt}\,}}(y)\subseteq H\), for every \(y\in U\).

  2. (ii)

    If \(x\in V_H\), then \(x+\epsilon v \in V_H\), provided that the vector v is parallel to \(B_H\), and \(\epsilon \) is sufficiently small.

Proof

Point (i) follows from the fact that a point of K which is not optimal for x is not optimal for y either, provided that y is chosen close enough to x. Point (ii) is instead a direct consequence of the fact that \(V_{H}\) is relatively open in the affine space \(B_H\).\(\square \)

By using very classical variational arguments as well as Lemma 3.7 we derive the following

Proposition 3.8

(Conservation laws) Let \(\gamma \) be a local minimizer of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\). Then

  1. (i)

    (Conservation of the energy). There exists a constant \(c\in \mathbb {R}\) such that

    $$\begin{aligned}|\dot{\gamma }|^{2}-h(|\nabla f(\gamma )|^{2})=c\quad \text {a.e.\ in }(0,\delta ).\end{aligned}$$

    In particular, \(\gamma \) is Lipschitz continuous.

  2. (ii)

    (Local conservation of momentum). Let \((t_1, t_2)\subseteq [0,\delta ]\) be a time interval. Suppose that there exists an optimality class \(H\subseteq K\), with \(V_H \ne \emptyset \), such that for every \(s\in (t_1, t_2)\) the inclusion \({{\,\textrm{opt}\,}}(\gamma (s))\subseteq H\) holds. Let \(\gamma _H\) be the curve obtained by projecting \(\gamma \) on the affine space \(B_H\). Then \(\gamma _H\) is a \(C^{1,1}\) curve in \((t_1, t_2)\) and, in this interval, it satisfies

    $$\begin{aligned}\ddot{\gamma }_H=h'(|\gamma -\eta (\gamma )|^2)(\gamma _H - p_H).\end{aligned}$$

    In particular, for each time \(t\in (0,\delta )\), denoting \(H={{\,\textrm{opt}\,}}(\gamma (t))\), there exists a neighborhood of t in which the component \(\dot{\gamma }_H\) of the momentum parallel to \(B_H\) is continuous.

Proof

Point (i) states that \(\gamma \) solves the Du Bois–Reymond equation (2.5). This can be shown by testing the local minimality through “horizontal” variations of the independent variable of the form \(\gamma _{\epsilon }= \gamma \circ \rho _{\epsilon }^{-1}\), where \(\rho _{\epsilon }={{\,\textrm{id}\,}}+ \epsilon \varphi \), \(\varphi \in C^{\infty }_{c}((0,\delta ))\), and \(\epsilon \) is small enough so that \(\rho _{\epsilon }\) is a diffeomorphism. The Lipschitz continuity of \(\gamma \) then follows from (3.3) via \(|\nabla f_{K}(x)|\le {{\,\textrm{dist}\,}}_{K}(x)\) and the continuity of \(\gamma \). To get point (ii), instead, we need to perform “vertical” variations of the form \(\gamma _{\epsilon } = \gamma + \epsilon \varphi v\), where \(\varphi \in C^{\infty }_{c}((t_1,t_2))\) and v is any vector parallel to the affine space \(B_{H}\). We then use point (ii) of Lemma 3.7 to get \(\eta (\gamma _{\epsilon })=\eta (\gamma )\), for \(\epsilon \) sufficiently small.\(\square \)

Point (ii) of the previous proposition implies in particular that \(\gamma \) is regular as long as it stays in a single Voronoi cell. More precisely:

Corollary 3.9

(Regularity inside a Voronoi cell) Let \(\gamma \) be a local minimizer of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\). Suppose that \({{\,\textrm{opt}\,}}(\gamma (s))=H\) is constant in \((t_1, t_2)\) and let \(\eta _H = \eta (V_{H})\). Then, in this interval, \(\gamma =\gamma _H\), \(\eta (\gamma )=\eta _H\) is constant, and \(\gamma \) is a \(C^{2}\) solution of

$$\begin{aligned}\ddot{\gamma }=h'(|\gamma -\eta _H|^2)(\gamma - p_H).\end{aligned}$$

Moreover, if h is \(C^{\infty }\), then \(\gamma \in C^{\infty }((t_1,t_2))\).
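As an illustration of Corollary 3.9 together with the conservation of energy, take \(d=1\), \(K=\{-1,1\}\), \(h={{\,\textrm{id}\,}}\) and the cell \(V_{\{1\}}=(0,+\infty )\), where \(\eta _H=p_H=1\) and the equation reads \(\ddot{\gamma }=\gamma -1\). The Python sketch below evaluates the explicit solution \(\gamma (t)=1+a\cosh t+b\sinh t\), checks the equation by finite differences, and verifies that \(|\dot{\gamma }|^{2}-|\gamma -1|^{2}\) is constant. The coefficients `a`, `b` are arbitrary sample values chosen so that the curve stays in the cell on \([0,1]\); the curve is a solution of the equation and is not claimed to be a minimizer for any particular boundary data.

```python
import math

A_COEF, B_COEF = -0.5, 0.3  # hypothetical coefficients; the curve stays in (0, +inf) on [0, 1]

def gamma(t):
    return 1.0 + A_COEF * math.cosh(t) + B_COEF * math.sinh(t)

def gamma_dot(t):
    return A_COEF * math.sinh(t) + B_COEF * math.cosh(t)

def energy(t):
    """|gamma'|^2 - h(|gamma - eta_H|^2) with h = id and eta_H = 1."""
    return gamma_dot(t) ** 2 - (gamma(t) - 1.0) ** 2

def ode_residual(t, step=1e-3):
    """Finite-difference second derivative minus the right-hand side gamma - p_H."""
    second = (gamma(t + step) - 2.0 * gamma(t) + gamma(t - step)) / step ** 2
    return second - (gamma(t) - 1.0)

times = [k / 20.0 for k in range(1, 20)]
print(max(abs(energy(t) - energy(0.0)) for t in times))
print(max(abs(ode_residual(t)) for t in times))
```

For this family the conserved quantity equals \(b^{2}-a^{2}\), by the identity \(\cosh ^{2}-\sinh ^{2}=1\).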

As a natural consequence, any singularity for a local minimizer of the functional \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\) appears only when the optimality class “changes”. In the next paragraph we will try to give a more precise meaning to this statement.

Shock times and minimal deviation. Given a curve \(\gamma :[0,\delta ]\rightarrow \mathbb {R}^{d}\), and a time \(t\in [0,\delta ]\), we say that

  • t is a shock for \(\gamma \) if \({{\,\textrm{opt}\,}}(\gamma )\) is not constant on I for every neighborhood I of t in \([0,\delta ]\).

  • t is a non-degenerate shock for \(\gamma \) if \(\eta (\gamma )\) is not constant on I for every neighborhood I of t in \([0,\delta ]\).

  • t is an effective shock for \(\gamma \) if there are two distinct potential zones \(Q_{\eta }, Q_{\bar{\eta }}\), and two distinct Voronoi cells \(V_{H}\subseteq Q_{\eta }\), \(V_{\bar{H}}\subseteq Q_{\bar{\eta }}\) such that, for some \(\epsilon >0\), one of the following holds:

    • \(\gamma ((t-\epsilon , t))\subset V_{H}\) and \(\gamma ([t, t+\epsilon ))\subset V_{\bar{H}}\). In this case, \(H\subsetneq \bar{H}\) and we say that t is a left effective shock.

    • \(\gamma ((t-\epsilon , t])\subset V_{H}\) and \(\gamma ((t, t+\epsilon ))\subset V_{\bar{H}}\). In this case, \(\bar{H}\subsetneq H\) and we say that t is a right effective shock.

We denote by \(S(\gamma )\), \(NDS(\gamma )\) and \(ES(\gamma )\) respectively the sets of shocks, non-degenerate shocks and effective shocks for \(\gamma \). Notice that \(ES(\gamma )\subseteq NDS(\gamma )\subseteq S(\gamma )\), and that \(S(\gamma )\) and \(NDS(\gamma )\) are compact. According to the definitions above, during a shock there must be a change of Voronoi cell, while during a non-degenerate shock there is also a change of potential zone. Clearly we have \(S(\gamma )=NDS(\gamma )\) provided that K is balanced. Finally, we have an effective shock when the curve passes cleanly from a Voronoi cell to an adjacent one with different potential. By conservation of the energy, we expect the dynamics to develop a singularity in the kinetic term here. This is the content of the following proposition, which is a direct consequence of the conservation laws stated in Proposition 3.8.

Proposition 3.10

(Minimal deviation during an effective shock) Suppose that the potential shape h is strictly increasing. Let \(\gamma \) be a local minimizer of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\). Suppose that \(t\in (0,\delta )\) is a left effective shock for \(\gamma \), in which \(\gamma \) jumps from \(V_H\) to \(V_{\bar{H}}\). Let \(\eta _H = \eta (V_{H})\) and \(\eta _{\bar{H}}=\eta (V_{\bar{H}})\). Defining \(\dot{\gamma }_{-}(t)\) and \(\dot{\gamma }_{+}(t)\) respectively by

$$\begin{aligned}\dot{\gamma }_{-}(t):=\underset{s\rightarrow t^{-}}{\lim }\dot{\gamma }(s)\quad \quad \dot{\gamma }_{+}(t):=\underset{s\rightarrow t^{+}}{\lim }\dot{\gamma }(s),\end{aligned}$$

then \(\dot{\gamma }_{+}(t)\) is the component of \(\dot{\gamma }_{-}(t)\) parallel to \(B_{\bar{H}}\). Moreover

$$\begin{aligned}|\dot{\gamma }_{-}(t)-\dot{\gamma }_{+}(t)|^2=|\dot{\gamma }_{-}(t)|^2-|\dot{\gamma }_{+}(t)|^2=h(|\gamma (t)-\eta _H|^2)-h(|\gamma (t)-\eta _{\bar{H}}|^2)>0\end{aligned}$$

Clearly an analogous result holds for right effective shocks.

Remark 3.11

The fact that \(\dot{\gamma }_{+}(t)\) is the component of \(\dot{\gamma }_{-}(t)\) parallel to \(B_{\bar{H}}\) has the following interpretation: in the dynamics, when there is a clean transition from a given Voronoi cell to an adjacent one (hence lower-dimensional), the component of the momentum \(\dot{\gamma }\) parallel to the second cell is continuous, while the orthogonal one has a jump. In the specific case of a superadditive shape h (for instance when \(h={{\,\textrm{id}\,}}\), as in the MAG model), we can also derive a uniform lower bound for such a jump:

$$\begin{aligned}|\dot{\gamma }_{-}(t)-\dot{\gamma }_{+}(t)|^2 = h(|\gamma (t)-\eta _H|^2)-h(|\gamma (t)-\eta _{\bar{H}}|^2)\ge h(|\eta _H-\eta _{\bar{H}}|^2)\ge h(\beta ),\end{aligned}$$

where \(\beta \) was defined in (3.5).
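The jump formula of Proposition 3.10 and the lower bound above can be evaluated on small configurations. The Python sketch below does so for \(h={{\,\textrm{id}\,}}\) in two sample situations; the function name `jump_sq` and the chosen points are assumptions made for illustration, and the code only evaluates the formula at a prescribed shock position without simulating the dynamics.

```python
def sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def identity(s):
    """Potential shape h = id, as in the MAG model."""
    return s

def jump_sq(h, x, eta_H, eta_Hbar):
    """Squared velocity jump at a left effective shock located at x (Proposition 3.10)."""
    return h(sqdist(x, eta_H)) - h(sqdist(x, eta_Hbar))

# d = 1, K = {-1, 1}: passage from V_{-1} to V_{-1, 1} = {0}
j1 = jump_sq(identity, (0.0,), (-1.0,), (0.0,))
bound1 = identity(sqdist((-1.0,), (0.0,)))

# d = 2, K = {(1,0), (0,1), (-1,0)}: passage from V_{(1,0)} to V_{(1,0),(0,1)} at x = (1, 1)
j2 = jump_sq(identity, (1.0, 1.0), (1.0, 0.0), (0.5, 0.5))
bound2 = identity(sqdist((1.0, 0.0), (0.5, 0.5)))

print(j1, bound1, j2, bound2)
```

In both samples the jump coincides with \(h(|\eta _H-\eta _{\bar{H}}|^2)\), so the first inequality of the lower bound is attained.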

Remark 3.12

In the MAG dynamics, shocks typically happen when two or more particles collide or separate, generating an instant change in the optimality class. For instance an effective shock occurs when two particles collide and remain stuck together. Notice that Proposition 3.8 tells us that energy and momentum are conserved in a collision.

To end this part, we show, through a couple of simple examples, that all of the three types of shock times defined in this paragraph may occur for a minimizer of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\).

Example 3.13

(Effective and non-effective shocks) Take \(h={{\,\textrm{id}\,}}\), \(\delta =1\), \(d=1\), \(K=\left\{ -1, 1\right\} \), \(x_{0}=-c\), \(x_{1}=c\), with \(c\ge 0\). We see that \(\mathbb {R}\) is partitioned into three Voronoi cells, namely: \(V_{\left\{ -1\right\} }=(-\infty ,0)\), \(V_{\left\{ 1\right\} }=(0,+\infty )\) and \(V_{\left\{ -1, 1\right\} }=\left\{ 0\right\} \). The potential term takes the form

$$\begin{aligned}|\nabla f(x)|^2 = \left\{ \begin{array}{ll} |x+1|^2\quad &{}\text {if }\quad x\in (-\infty ,0),\\ 0 &{}\text {if }\quad x=0,\\ |x-1|^2 &{}\text {if }\quad x\in (0,+\infty ). \end{array}\right. \end{aligned}$$

It is easily seen that a minimizer \(\gamma \) of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\) has to be non-decreasing, and thus \(\gamma ^{-1}(0)=[t_0, t_1]\) is a closed interval (possibly degenerate if \(t_0 = t_1\)), and \(S(\gamma )=NDS(\gamma )=\left\{ t_0, t_1\right\} \). Then, there are two possible qualitatively different behaviors of \(\gamma \), according to whether \(t_0 = t_1\) or not. If \(t_0 = t_1\), then \(\gamma \) has a single non-degenerate non-effective shock time. If instead \(t_0 \ne t_1\), then \(t_0\) and \(t_1\) are respectively left and right effective shocks. Now, direct computations show that the first case occurs if for instance \(c=1\). On the other hand, we can prove that the second case necessarily occurs if c is chosen sufficiently small. As a matter of fact, for \(c<1\), by the monotonicity of \(\gamma \), the minimum value \(\Gamma (-c,c)\) of the functional can be bounded below as follows:

$$\begin{aligned}\Gamma (-c,c)\ge (1-t_1 + t_0)(1-c)^2.\end{aligned}$$

At the same time, by Corollary 3.4, we have

$$\begin{aligned}\Gamma (-c,c)\rightarrow \Gamma (0,0) = 0,\quad \text {as }c\rightarrow 0^{+}.\end{aligned}$$

Therefore, \(t_1>t_0\), provided that we choose c small enough.
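The advantage of resting at the origin for small c can also be observed numerically. The Python sketch below compares the action of two competitors for \(c=0.2\): the straight line from \(-c\) to c, and a hypothetical curve that reaches 0 at time \(t_0=-\log (1-c)\), rests there, and leaves symmetrically. The closed form of the resting curve is an assumption of this sketch (it is chosen so that \(|\dot{\gamma }|^{2}-|\nabla f(\gamma )|^{2}=0\)), and it is not claimed here to be the minimizer; the action is approximated by a midpoint rule.

```python
import math

def potential(x):
    """|grad f(x)|^2 for K = {-1, 1}: (1 - |x|)^2 away from the origin, 0 at x = 0."""
    return 0.0 if x == 0.0 else (1.0 - abs(x)) ** 2

def action(curve, n=4000):
    """Midpoint-rule approximation of the action on [0, 1] with h = id."""
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        a, b = curve(k * dt), curve((k + 1) * dt)
        mid = curve((k + 0.5) * dt)
        total += (((b - a) / dt) ** 2 + potential(mid)) * dt
    return total

def line(c):
    return lambda t: c * (2.0 * t - 1.0)

def resting(c):
    """Hypothetical competitor: exponential approach to 0, rest, symmetric departure."""
    t0 = -math.log(1.0 - c)  # requires t0 < 1/2
    def curve(t):
        if t <= t0:
            return -1.0 + (1.0 - c) * math.exp(t)
        if t >= 1.0 - t0:
            return 1.0 - (1.0 - c) * math.exp(1.0 - t)
        return 0.0
    return curve

c = 0.2
a_line, a_rest = action(line(c)), action(resting(c))
print(a_line, a_rest)
```

A direct integration gives \(4c^{2}+1-c+c^{2}/3\) for the line and \(4c-2c^{2}\) for the resting curve, so for \(c=0.2\) the resting competitor has the smaller action.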

Example 3.14

(Degenerate shocks) Take \(h={{\,\textrm{id}\,}}\), \(\delta = 1\), \(d=2\), \(K=\left\{ (1,0), (0,1), (-1,0)\right\} \), \(x_0 = (0,-1)\), \(x_{1}=(0,0)\). Here we notice that the two distinct Voronoi cells \(V_{\left\{ (1,0), (0,1), (-1,0)\right\} }=\left\{ (0,0)\right\} \) and \(V_{\left\{ (1,0), (-1,0)\right\} }=\left\{ 0\right\} \times (-\infty ,0)\) share the same potential zone \(Q_{\eta }=\left\{ 0\right\} \times (-\infty ,0]\) with \(\eta = (0,0)\). We claim that every minimizer \(\gamma \) of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\) must live in \(Q_\eta \). As a consequence, we would deduce that the first time \(\tau \) at which \(\gamma (\tau )=(0,0)\) has to be a degenerate shock for \(\gamma \). To prove the claim we test the minimality of \(\gamma \) against the competitor \(\delta (t)= \pi (\gamma (t))\), where \(\pi \) denotes the projection on \(Q_{\eta }\). It can be easily checked via \(\pi (K)=\left\{ \eta \right\} \) and the contractivity of \(\pi \) that, for every t,

$$\begin{aligned}{} & {} |\nabla f(\delta (t))|=|\eta (\delta (t))-\delta (t)|=|\eta -\delta (t)|\\{} & {} \quad =|\pi (\eta (\gamma (t)))-\pi (\gamma (t))|\le |\eta (\gamma (t))-\gamma (t)|=|\nabla f(\gamma (t))|.\end{aligned}$$

Moreover, again by the contractivity of \(\pi \): \(|\dot{\delta }|\le |\dot{\gamma }|\) a.e., with equality if and only if \(\gamma =\delta \). Comparing the actions of \(\gamma \) and \(\delta \) and imposing the minimality of \(\gamma \), we see that necessarily \(\gamma =\delta \), that is to say \(\gamma \) lives in \(Q_{\eta }\).
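The pointwise inequality \(|\nabla f(\pi (x))|\le |\nabla f(x)|\) used in this example can be tested on a grid. The Python sketch below does this for the set K of Example 3.14; the helper names and the grid are assumptions of this illustration, and the projection on \({{\,\textrm{conv}\,}}({{\,\textrm{opt}\,}}(x))\) is computed by brute force, which suffices because optimality classes here contain at most three non-collinear points.

```python
from itertools import combinations
import math

K = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]

def sqdist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def cross(p, q, r):
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def proj_segment(x, a, b):
    ab = (b[0] - a[0], b[1] - a[1])
    den = ab[0] ** 2 + ab[1] ** 2
    t = 0.0 if den == 0.0 else ((x[0] - a[0]) * ab[0] + (x[1] - a[1]) * ab[1]) / den
    t = max(0.0, min(1.0, t))
    return (a[0] + t * ab[0], a[1] + t * ab[1])

def eta(x, tol=1e-12):
    """Projection of x on conv(opt(x)) for the three-point set K above."""
    m = min(sqdist(x, y) for y in K)
    H = [y for y in K if sqdist(x, y) <= m + tol]
    for a, b, c in combinations(H, 3):
        s = (cross(a, b, x), cross(b, c, x), cross(c, a, x))
        if min(s) >= -tol or max(s) <= tol:
            return x  # x lies in the triangle spanned by its optimality class
    return min((proj_segment(x, a, b) for a in H for b in H),
               key=lambda p: sqdist(x, p))

def slope(x):
    """|grad f(x)| = |eta(x) - x|."""
    return math.sqrt(sqdist(eta(x), x))

def pi(x):
    """Projection on the potential zone Q_eta = {0} x (-inf, 0]."""
    return (0.0, min(x[1], 0.0))

grid = [(i / 2.0, j / 2.0) for i in range(-6, 7) for j in range(-6, 7)]
ok = all(slope(pi(x)) <= slope(x) + 1e-12 for x in grid)
same_eta = eta((0.0, -1.0)) == eta((0.0, 0.0)) == (0.0, 0.0)
print(ok, same_eta)
```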

3.2 Regularity results

Here we state and prove our main regularity results. Recall that \(K=\left\{ p_1,\dots ,p_N\right\} \) is a finite collection of points in \(\mathbb {R}^{d}\), and \(h:[0,+\infty )\rightarrow [0,+\infty )\) is a \(C^{1}\) potential shape.

Theorem 3.15

(\(C^{1,1}\)-regularity out of a finite number of non-degenerate shock times) Let \(\gamma \) be a local minimizer of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\). Suppose that h is strictly increasing. Then:

  1. (i)

    The set \(NDS(\gamma )\) of non-degenerate shock times of \(\gamma \) is finite. That is to say, there is a finite number of times \(0\le t_1<\dots <t_{\ell }\le \delta \) such that \(\eta (\gamma )\) is constant in each connected component of \([0,\delta ]{\setminus } \left\{ t_1,\dots ,t_{\ell }\right\} \).

  2. (ii)

    Setting \(t_0 = 0\) and \(t_{\ell +1}=\delta \), then \(\gamma \) is \(C^{1,1}\)-regular in the interval \([t_i, t_{i+1}]\) for every \(i\in \left\{ 0,\dots ,\ell \right\} \). Moreover, if we let \(\eta _i \in \mathcal {E}\) be such that \(\gamma ((t_i, t_{i+1}))\subseteq Q_{\eta _i}\), then we can estimate

    $$\begin{aligned}|\ddot{\gamma }(r)|\le h'(|\gamma (r)-\eta _i|^2)|\gamma (r)-\eta _i|\quad \text {for a.e.}\quad r\in (t_i,t_{i+1}).\end{aligned}$$

Actually, if K is balanced and h is smooth, Theorem 3.15 can be improved to reach piecewise smooth regularity for any local minimizer \(\gamma \). In fact, since K is balanced, the equality \(NDS(\gamma )=S(\gamma )\) holds and point i) of Theorem 3.15 implies that \(\gamma \) has a finite number of shock times. On the other hand, Corollary 3.9 together with the smoothness of h ensures that \(\gamma \) is smooth in each connected component of \([0,\delta ]{\setminus } S(\gamma )\), where clearly \({{\,\textrm{opt}\,}}(\gamma (t))\) is constant.

Corollary 3.16

(\(C^{\infty }\)-regularity out of a finite number of shock times) Let \(\gamma \) be a local minimizer of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\). Suppose that K is balanced and h is \(C^{\infty }\) and strictly increasing. Then:

  1. (i)

    The set \(S(\gamma )\) of shock times of \(\gamma \) is finite. That is to say, there is a finite number of times \(0\le t_1<\dots <t_{\ell }\le \delta \) such that the optimality class \({{\,\textrm{opt}\,}}(\gamma )\) is constant in each connected component of \([0,\delta ]{\setminus } \left\{ t_1,\dots ,t_{\ell }\right\} \).

  2. (ii)

    Setting \(t_0 = 0\) and \(t_{\ell +1}=\delta \), then \(\gamma \) is \(C^{\infty }\) in the interval \([t_i, t_{i+1}]\) for every \(i\in \left\{ 0,\dots ,\ell \right\} \). Moreover, if \(t_{i+1}>t_{i}\), denoting by \(H_i\) the optimality class of \(\gamma \) in the interval \((t_i, t_{i+1})\), and defining \(\eta _{H_i}=\eta (V_{H_i})\), then \(\gamma \) solves

    $$\begin{aligned}\ddot{\gamma }=h'(|\gamma -\eta _{H_i}|^2)(\gamma - p_{H_i})\end{aligned}$$

    in \((t_i, t_{i+1})\).

Remark 3.17

Corollary 3.16 offers a generalization of [2, Theorem 13], in which smooth regularity out of a finite number of shock times was derived for solutions of a one-dimensional version of the discrete MAG model. In that framework, h was simply the identity, while K consisted of all the d! points of \(\mathbb {R}^{d}\) obtainable by permuting the components of a fixed vector \(A=(a_1,\dots ,a_d)\in \mathbb {R}^{d}\), with \(a_1<\dots <a_d\). Exploiting the rearrangement inequality provided by the order structure of \(\mathbb {R}\), it is not difficult to show that hypothesis (3.7) holds in this case, therefore implying that K is balanced.

The rest of the paper is devoted to the proof of Theorem 3.15, which is organized as follows. We first prove the two Lemmas 3.18 and 3.19. As a byproduct of Lemma 3.19 we deduce that a local minimizer is \(C^{1,1}\)-regular as long as it stays in a single potential zone (see Corollary 3.20). From this the implication point i) \(\implies \) point ii) in Theorem 3.15 follows easily. Finally, we prove point i) in the form of the equivalent local statement given by Claim (3.14).

Two Lemmas. The first Lemma has a strong geometric flavour, highlighting a metric property of two given intersecting polytopes. It will be a crucial ingredient in the proof of point i) of Theorem 3.15.

Lemma 3.18

(Reciprocal distance of intersecting polytopes) Let \(A,B \subset \mathbb {R}^{d}\) be two polytopes with \(A\cap B \ne \emptyset \). Then there exists a sufficiently large constant \(M>0\) such that

$$\begin{aligned}{{\,\textrm{dist}\,}}_{A\cap B}(x)\le M {{\,\textrm{dist}\,}}_{B}(x)\quad \text {for every }x \in A.\end{aligned}$$

Proof

Let \(P\subset \mathbb {R}^{d}\) be a polytope endowed with a representation of the form (3.4), where we assume without loss of generality that each of the affine functions \(T_{j}\) has Lipschitz constant equal to 1. Let us consider the associated vector-valued function \(z_{P}:\mathbb {R}^{d}\rightarrow [0,+\infty )^{\ell }\) defined by

$$\begin{aligned} z_{P}(x)=\left( T_{1}(x)^{+},\dots ,T_{\ell }(x)^{+}\right) . \end{aligned}$$
(3.8)

As a first step we show that there exists a constant \(c_P>0\) such that

$$\begin{aligned} T_{j}(x)^{+}\le {{\,\textrm{dist}\,}}_{P}(x)\le c_{P}|z_{P}(x)|_{1}\quad \text {for every }\quad j\in \left\{ 1,\dots ,\ell \right\} \quad \text { and every }\quad x\in \mathbb {R}^{d}. \end{aligned}$$
(3.9)

(recall that the 1-norm of a vector \(z=(z_{1},\dots , z_{\ell })\in \mathbb {R}^{\ell }\) is defined as \(|z|_{1}:= |z_{1}|+\dots +|z_{\ell }|\)). The left inequality follows by the 1-Lipschitz assumption on \(T_j\), while the right one can be obtained using a compactness argument as follows. Suppose by contradiction that there exists a sequence of points \(x_n \in \mathbb {R}^{d}{\setminus } P\) such that

$$\begin{aligned}{{\,\textrm{dist}\,}}_{P}(x_n)\ge n|z_{P}(x_n)|_{1}.\end{aligned}$$

Up to replacing \(x_n\) with

$$\begin{aligned}\pi _{P}(x_n)+ \frac{x_n - \pi _{P}(x_n)}{{{\,\textrm{dist}\,}}_{P}(x_n)},\end{aligned}$$

we can assume that \({{\,\textrm{dist}\,}}_{P}(x_n)=1\). Now, if x is any cluster point for \(x_n\), we obtain that \({{\,\textrm{dist}\,}}_{P}(x)=1\) and \(z_{P}(x)=0\), which is clearly a contradiction. We are now in the position to prove the Lemma. Let \(\ell \in \mathbb {N}\) be the number of affine functions in a representation of B of the form (3.4), with 1-Lipschitz affine functions \(T_{j}\). By exploiting the estimates in (3.9), for every \(x\in A\) we can bound

$$\begin{aligned}{{\,\textrm{dist}\,}}_{A\cap B}(x)\le c_{A\cap B}|z_{A\cap B}(x)|_1\le \ell c_{A\cap B}{{\,\textrm{dist}\,}}_{B}(x),\end{aligned}$$

where \(c_{A\cap B}\) is a positive constant. The claim follows by choosing \(M=\ell c_{A\cap B}\).\(\square \)
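The constant M of Lemma 3.18 depends on the angle at which the two polytopes meet. The Python sketch below illustrates this with two segments in the plane issuing from the origin, viewed as degenerate polytopes: A joins \((0,0)\) to \((1,0)\) and B joins \((0,0)\) to \((1,s)\), so that \(A\cap B=\{(0,0)\}\). The configuration and the sampling are assumptions made for this example, and the observed ratio is only a lower bound for the optimal M.

```python
import math

def dist_segment(x, a, b):
    """Euclidean distance from x to the segment [a, b], with a != b."""
    ab = (b[0] - a[0], b[1] - a[1])
    t = ((x[0] - a[0]) * ab[0] + (x[1] - a[1]) * ab[1]) / (ab[0] ** 2 + ab[1] ** 2)
    t = max(0.0, min(1.0, t))
    return math.hypot(x[0] - a[0] - t * ab[0], x[1] - a[1] - t * ab[1])

def worst_ratio(slope, n=200):
    """Largest sampled value of dist_{A cap B}(x) / dist_B(x) for x in A, x != (0, 0)."""
    a_points = [(k / n, 0.0) for k in range(1, n + 1)]
    b_end = (1.0, slope)
    return max(math.hypot(x[0], x[1]) / dist_segment(x, (0.0, 0.0), b_end)
               for x in a_points)

print(worst_ratio(0.5), worst_ratio(0.1))
```

In this configuration the ratio equals \(\sqrt{1+s^{2}}/s\), the reciprocal of the sine of the angle between the segments, so M blows up as the segments become parallel.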

The second Lemma, of independent interest, concerns the regularity of local minimizers of action functionals restricted to curves living in a given closed convex set. Because of boundary effects, such constrained minimizers are in general not \(C^{2}\), even if the Lagrangian is smooth. Nevertheless, they must be at least \(C^{1,1}\), whenever the Lagrangian is \(C^{1}\).

Lemma 3.19

(Regularity for a problem with a convex constraint) Let \(P\subseteq \mathbb {R}^{d}\) be a closed convex set, and let \(\Psi :\mathbb {R}^{d}\rightarrow [0,+\infty )\) be of class \(C^1\) in a neighborhood of P. Given two points \(x_0, x_\delta \in P\), we consider the functional \(\mathcal {G}:C([0,\delta ],\mathbb {R}^d)\rightarrow [0,+\infty ]\) defined by

$$\begin{aligned}\mathcal {G}(\gamma )= \left\{ \begin{array}{ll} \displaystyle \int _{0}^{\delta }|\dot{\gamma }|^2+\Psi (\gamma )\quad &{}\text {if }\quad \gamma \in AC([0,\delta ],\mathbb {R}^d), \gamma (0)=x_0, \gamma (\delta )=x_\delta , \gamma ([0,\delta ])\subseteq P,\\ +\infty &{}\text {otherwise}. \end{array}\right. \end{aligned}$$

Then every local minimizer \(\gamma \) of \(\mathcal {G}\) is of class \(C^{1,1}\) and we can estimate

$$\begin{aligned}|\ddot{\gamma }(r)|\le \frac{1}{2}|\nabla \Psi (\gamma (r))|\quad \text {for a.e. }r\in (0,\delta ).\end{aligned}$$

Proof

We call \(\pi _P\) the projection on P. We start by defining, for each point \(x\in P\), the “blow up” of P at x, namely the closed convex cone \(P_{x}\) defined by the formula

$$\begin{aligned} P_{x}+x= \left\{ \begin{array}{ll} \mathbb {R}^{d} &{}{}\text{ if }\quad x\in \mathring{P},\\ \bigcap \left\{ \mathcal {H} \text{ half } \text{ space }: P\subset \mathcal {H},\, x\in \partial \mathcal {H}\right\} \quad &{}{}\text{ if }\quad x\in \partial P. \end{array}\right. \end{aligned}$$

We then call \(S_{x}\) the projection on \(P_{x}\), which turns out to be positively homogeneous and 1-Lipschitz. It is not difficult to see that

$$\begin{aligned} \frac{P-x}{\epsilon }\xrightarrow {\epsilon \rightarrow 0^{+}}P_{x} \end{aligned}$$
(3.10)

in the sense of Hausdorff on every compact set. Since the inclusion \((P-x)/\epsilon \subseteq P_x\) always holds, in order to show the convergence (3.10), we only need to check, using a separation argument, that, for every \(\epsilon _j \rightarrow 0^{+}\), and every point \(z\in P_x\), there exists a sequence \(y_j \in (P-x)/\epsilon _j\) converging to z. Hence, the projection on \((P-x)/\epsilon \) pointwise converges to \(S_x\) as \(\epsilon \rightarrow 0^{+}\). By exploiting also the homogeneity of \(S_x\) we eventually obtain that, for every \(v\in \mathbb {R}^{d}\),

$$\begin{aligned} \pi _{P}(x+\epsilon v)=x+\epsilon S_{x}(v)+o(\epsilon ),\quad \text { as }\epsilon \rightarrow 0^{+}. \end{aligned}$$
(3.11)
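The expansion (3.11) can be checked numerically in a simple case. The following is only an illustrative sketch under assumptions that are ours and not taken from the text: P is the closed unit disk in \(\mathbb {R}^{2}\), x is a boundary point, so that \(P_{x}\) is the half-plane \(\{z: z\cdot x\le 0\}\), and the function names are hypothetical. The quantity computed is \(|\pi _{P}(x+\epsilon v)-x-\epsilon S_{x}(v)|/\epsilon \), which should tend to zero.

```python
import math

def project_disk(p):
    """Euclidean projection of a 2D point onto the closed unit disk (assumed choice of P)."""
    n = math.hypot(p[0], p[1])
    return p if n <= 1.0 else (p[0] / n, p[1] / n)

def project_tangent_cone(x, v):
    """Projection S_x(v) onto the blow-up cone of the disk at a boundary point x, |x| = 1.
    For the disk this cone is the half-plane {z : z . x <= 0}."""
    s = v[0] * x[0] + v[1] * x[1]
    if s <= 0.0:
        return v  # v already lies in the half-plane
    return (v[0] - s * x[0], v[1] - s * x[1])  # remove the outward normal component

def expansion_error(x, v, eps):
    """Return |pi_P(x + eps v) - x - eps S_x(v)| / eps, the normalized remainder in (3.11)."""
    p = project_disk((x[0] + eps * v[0], x[1] + eps * v[1]))
    s = project_tangent_cone(x, v)
    return math.hypot(p[0] - x[0] - eps * s[0], p[1] - x[1] - eps * s[1]) / eps

x = (1.0, 0.0)    # boundary point of the disk
v = (0.7, -1.3)   # direction pointing partly outwards
errors = [expansion_error(x, v, 10.0 ** (-k)) for k in range(1, 6)]
print(errors)
```

In this example the normalized remainder shrinks roughly linearly in \(\epsilon \), which is consistent with (and slightly stronger than) the \(o(\epsilon )\) behaviour stated in (3.11); for a general closed convex set only \(o(\epsilon )\) is claimed.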

Now, let \(\gamma \) be a local minimizer for the functional \(\mathcal {G}\). Given a test function \(\varphi \in C^{\infty }_{c}((0,\delta ); \mathbb {R}^{d})\), we consider the following competitors:

$$\begin{aligned}\gamma _{\epsilon }:=\pi _{P}(\gamma +\epsilon \varphi ),\quad \text {for }\epsilon >0.\end{aligned}$$

Since \(\pi _{P}\) is 1-Lipschitz, we have \(|\gamma _{\epsilon }'|\le |\gamma '+\epsilon \varphi '|\). Then

$$\begin{aligned}\mathcal {G}(\gamma _{\epsilon })\le \mathcal {G}(\gamma )+2\epsilon \int _{0}^{\delta }\left\{ \gamma '\cdot \varphi '+\frac{\Psi (\gamma _{\epsilon })-\Psi (\gamma )}{2\epsilon }\right\} +\epsilon ^{2}\int _{0}^{\delta }|\varphi '|^2.\end{aligned}$$

Using the previous pointwise expansion (3.11), as well as the dominated convergence Theorem, we obtain

$$\begin{aligned} \underset{\epsilon \rightarrow 0^{+}}{\limsup }\,\frac{\mathcal {G}(\gamma _{\epsilon })-\mathcal {G}(\gamma )}{2\epsilon }\le \int _{0}^{\delta }\gamma '\cdot \varphi ' + \frac{1}{2}\nabla \Psi (\gamma )\cdot S_{\gamma }(\varphi ). \end{aligned}$$
(3.12)

From the local minimality of \(\gamma \), it follows immediately that the right-hand side in (3.12) is non-negative. Finally, we can use the contractivity of the projections \(S_x\) to get the inequality

$$\begin{aligned}\langle \gamma '', \varphi \rangle \le \int _{0}^{\delta }\frac{1}{2}|\nabla \Psi (\gamma )||\varphi |\quad \text {for every }\quad \varphi \in C^{\infty }_{c}((0,\delta ); \mathbb {R}^{d}),\end{aligned}$$

and the thesis follows.\(\square \)
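As a consistency check on the constant \(\frac{1}{2}\), consider the unconstrained case \(P=\mathbb {R}^{d}\), which we single out here only for illustration. Then \(\pi _{P}\) and every \(S_{x}\) are the identity, so the right-hand side of (3.12) is linear in \(\varphi \). Being non-negative for both \(\varphi \) and \(-\varphi \), it must vanish, and one recovers the classical Euler–Lagrange equation in the sense of distributions:

$$\begin{aligned}\int _{0}^{\delta }\gamma '\cdot \varphi '+\frac{1}{2}\nabla \Psi (\gamma )\cdot \varphi =0\quad \text {for every }\varphi \in C^{\infty }_{c}((0,\delta ); \mathbb {R}^{d}),\qquad \text {that is,}\qquad \ddot{\gamma }=\frac{1}{2}\nabla \Psi (\gamma ).\end{aligned}$$

In this special case the estimate of Lemma 3.19 is attained with equality.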

To conclude this paragraph we propose the following easy consequence of Lemma 3.19.

Corollary 3.20

(Regularity in a potential zone) Let h be a non-decreasing potential shape, and let \(\gamma \) be a local minimizer of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\). Suppose that there exists an \(\eta \in \mathcal {E}\) such that \(\gamma ((s,t))\subseteq Q_{\eta }\), where \((s,t)\subset [0,\delta ]\). Then \(\gamma \) is \(C^{1,1}\)-regular on \([s,t]\) and the following estimate holds

$$\begin{aligned}|\ddot{\gamma }(r)|\le h'(|\gamma (r)-\eta |^2)|\gamma (r)-\eta |\quad \text {for a.e.\ }r\in (s,t).\end{aligned}$$

Proof

We clearly have the inclusion \(\gamma ([s,t])\subset P_{\eta }\). Then we observe that, for every absolutely continuous curve \(\rho :[s,t]\rightarrow P_{\eta }\), due to the monotonicity of h and inequality (3.6), the following inequality holds:

$$\begin{aligned} \int _{s}^{t}|\dot{\rho }|^2+ h(|\nabla f|^{2}(\rho )) \le \int _{s}^{t}|\dot{\rho }|^2+h(|\eta - \rho |^2). \end{aligned}$$
(3.13)

By a comparison argument, knowing that \(\gamma \) is a local minimizer of the functional on the left-hand side of (3.13), and that \(\rho = \gamma \) saturates the inequality, we deduce that \(\gamma \) is a local minimizer for the functional on the right-hand side of (3.13), if one restricts to curves living in the closed convex set \(P_\eta \). At this point the proof is easily concluded by invoking Lemma 3.19.\(\square \)
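For the reader's convenience, here is how the estimate can be read off from Lemma 3.19, assuming that h is differentiable (as the statement implicitly requires). Applying the Lemma on the interval \([s,t]\) with \(P=P_{\eta }\) and \(\Psi (x)=h(|x-\eta |^2)\), the chain rule gives

$$\begin{aligned}\nabla \Psi (x)=2h'(|x-\eta |^2)(x-\eta ),\qquad \frac{1}{2}|\nabla \Psi (x)|=h'(|x-\eta |^2)|x-\eta |,\end{aligned}$$

where no absolute value is needed on \(h'\) because h is non-decreasing.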

Proof of the regularity result. In this paragraph we prove our main regularity result, Theorem 3.15. Point (ii) is immediately derived from point (i) via Corollary 3.20. Thus, only point (i) remains to be proved. To do so, it suffices to prove the following equivalent local statement:

$$\begin{aligned} {\textbf {Claim}}:\quad&\text {for each time }t\in (0,\delta ],\text { there exists }\epsilon >0\\&\text {such that }\eta (\gamma )\text { is constant in }(t-\epsilon , t). \end{aligned}$$
(3.14)

In fact, from Claim (3.14) and the analogous one for right intervals (obtainable by exploiting the fact that the functional is autonomous), it clearly follows that \(NDS(\gamma )\) is discrete. Then, by compactness, we derive that \(NDS(\gamma )\) is finite. The proof of Claim (3.14) is elementary, but it relies on a rather intricate series of “cut and paste” constructions of competitors. Let us first outline the general heuristic idea before entering into the details of the rigorous proof. We divide the argument into three steps:

Step 1. Suppose by contradiction that t is a cluster point for the “jumps” in the potential. Then, approaching t from the left, \(\gamma \) would visit high and low potential zones infinitely often. If only one low potential zone were visited asymptotically, then it would be convenient for \(\gamma \) to remain in it for all times sufficiently close to t. Therefore we can assume that \(\gamma \) visits at least two different low potential zones infinitely often while approaching t from the left. Moreover, in alternating between different low potential zones, \(\gamma \) necessarily spends a non-negligible amount of time in high potential ones.

Step 2. Approaching t from the left, the fraction of time spent by \(\gamma \) in high potential zones tends to zero, which forces at least two different low potential zones to be very close to each other, so that the Lipschitz curve \(\gamma \) can jump from one to the other in a short time.

Step 3. By Lemma 3.18 we will finally be able to slightly deviate \(\gamma \) so that it reaches an even lower potential zone (the interface between the two), thus contradicting the local optimality of \(\gamma \).

Proof of Claim (3.14)

Throughout the proof, \(\gamma \) will be a fixed local minimizer of \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\) and L the Lipschitz constant of \(\gamma \). Moreover, for the sake of simplicity, we denote \(I^{\delta }_{f_{K},x_{0},x_{\delta }}\) by \(\mathcal {F}\). As a preliminary observation, notice that we can choose an \(R>0\) large enough such that \(\gamma ([0,\delta ])\subset B_{R}\), and then, by point (v) in Proposition 3.6, we can find an \(\alpha >0\) small enough such that

$$\begin{aligned} |\bar{\eta }-x|\ge |\eta -x|+\alpha \quad \text{ for } \text{ every } x\in B_{2R}\cap Q_{\eta }\cap P_{\bar{\eta }} \text{ and } \text{ every } \text{ distinct } \eta , \bar{\eta } \in \mathcal {E}. \end{aligned}$$
(3.15)

By translation invariance, we can assume without loss of generality that \(\gamma (t)=0\), thus simplifying some further computations. Then we consider the asymptotic lowest potential threshold

$$\begin{aligned}a:=\underset{s\rightarrow t^{-}}{\liminf }\,|\eta (\gamma (s))|.\end{aligned}$$

We call \(\tilde{\mathcal {E}}\) the subset of \(\mathcal {E}\) indexing the potential zones that are visited infinitely often before t. Namely,

$$\begin{aligned}\tilde{\mathcal {E}}:=\left\{ \eta \in \mathcal {E}: \gamma ^{-1}(Q_{\eta })\cap (0,t) \text { accumulates in } t\right\} .\end{aligned}$$

Notice that the thesis is equivalent to \(\# \tilde{\mathcal {E}}=1\). The set \(\tilde{\mathcal {E}}\) can be partitioned into two subsets \(\tilde{\mathcal {E}}_{1}\) and \(\tilde{\mathcal {E}}_{2}\) defined by

$$\begin{aligned} \tilde{\mathcal {E}}_{1}&:=\left\{ \eta \in \tilde{\mathcal {E}}: |\eta |=a\right\} ,\\ \tilde{\mathcal {E}}_{2}&:=\left\{ \eta \in \tilde{\mathcal {E}}: |\eta |>a\right\} . \end{aligned}$$

The set \(\tilde{\mathcal {E}}_{1}\), which is clearly non-empty by the definition of a, corresponds to those potential zones which are infinitely often visited by \(\gamma \) before t and that share the asymptotic lowest potential threshold a. We expect the curve \(\gamma \) to spend most of the time there.

We make the following technical choices of constants in order to simplify later arguments. We fix \(r,\mu ,\epsilon >0\) small enough so that these requirements are satisfied:

  1. R1)

    For every \(x\in B_{r}\) we have \({{\,\textrm{opt}\,}}(x)\subseteq {{\,\textrm{opt}\,}}(0)\).

  2. R2)

    \(\gamma ((t-\epsilon ,t))\subset B_{r}\).

  3. R3)

    For every \(x\in B_{r}\), and for every \(\eta \in \mathcal {E}\), we have \(|\eta |-\mu<|x-\eta |<|\eta |+\mu \).

  4. R4)

    \(3\mu \le \alpha \).

  5. R5)

    For every choice of \(\eta , \bar{\eta } \in \mathcal {E}\), exactly one of the following holds:

    $$\begin{aligned}|\eta |=|\bar{\eta }|,\quad \min \left\{ |\eta |, |\bar{\eta }|\right\} \le \max \left\{ |\eta |, |\bar{\eta }|\right\} -3\mu .\end{aligned}$$
  6. R6)

    \((t-\epsilon ,t)\subseteq \underset{\eta \in \tilde{\mathcal {E}}}{\bigcup }\gamma ^{-1}(Q_{\eta })\).

Notice that thanks to R1), for every \(x\in B_{r}\) the segment [x, 0) will be entirely contained in the Voronoi cell \(V_{{{\,\textrm{opt}\,}}(x)}\), while R2) assures us that the curve belongs to this good area. Conditions R3), R4) and R5) will be useful to effectively distinguish between different potential zones. Finally, condition R6) implies that the potential zones touched by \(\gamma \) in the interval \((t-\epsilon ,t)\) are exactly those touched asymptotically. Thus, in particular, we have

$$\begin{aligned}a=\underset{s\in (t-\epsilon ,t)}{\min }|\eta (\gamma (s))|.\end{aligned}$$

By condition R6), the interval \((t-\epsilon ,t)\) can be partitioned into the following two sets:

$$\begin{aligned} C_{1}&:=(t-\epsilon ,t)\cap \underset{\eta \in \tilde{\mathcal {E}}_1}{\bigcup }\gamma ^{-1}(Q_{\eta }),\\ C_{2}&:=(t-\epsilon ,t)\cap \underset{\eta \in \tilde{\mathcal {E}}_2}{\bigcup }\gamma ^{-1}(Q_{\eta }). \end{aligned}$$

We observe that \(C_{1}\) is closed in \((t-\epsilon ,t)\). In fact, if \(s_j \in C_{1}\) and \(s_j \rightarrow s_{\infty }\in (t-\epsilon ,t)\), then by lower semicontinuity of the modulus of the extended gradient and by R3), we have \(|\nabla f(\gamma (s_{\infty }))|\le a+\mu \), whence, again by R3), \(|\eta (\gamma (s_{\infty }))|\le a+2\mu \), which by R5) implies \(|\eta (\gamma (s_{\infty }))|= a\), that is \(s_{\infty }\in C_1\). Then \(C_{2}\) is open and can be written as a finite or countable (possibly empty) union of open intervals.

Recall that the thesis is equivalent to \(\# \tilde{\mathcal {E}}=1\). We now assume that \(\# \tilde{\mathcal {E}}\ge 2\) and derive a contradiction.

Step 1. (Reduction to the case in which \(\# \tilde{\mathcal {E}}_2 \ge 1\) and \(\# \tilde{\mathcal {E}}_1 \ge 2\)).

We first show that \(\# \tilde{\mathcal {E}}_2 \ge 1\). If this were not the case, then we would have \(|\eta (\gamma (s))|=a\) for every \(s\in (t-\epsilon ,t)\) and \(\# \tilde{\mathcal {E}}_1 \ge 2\). Then we could find two distinct \(\eta , \bar{\eta }\) both belonging to \(\tilde{\mathcal {E}}_{1}\) and a time \(s\in (t-\epsilon , t)\) such that \(x:=\gamma (s)\in Q_{\eta }\cap P_{\bar{\eta }}\). Now, by (3.15), this would imply that

$$\begin{aligned}|x-\eta |\le |x-\bar{\eta }|-\alpha ,\end{aligned}$$

and we reach a contradiction through the following chain of inequalities:

$$\begin{aligned}a =|\eta |&\le |x-\eta |+\mu \le |x-\bar{\eta }|-\alpha + \mu \\ {}&\le |\bar{\eta }|+2\mu -\alpha = a+2\mu - \alpha \le a-\mu <a.\end{aligned}$$

We now show that \(\# \tilde{\mathcal {E}}_1 \ge 2\). Suppose by contradiction that \(\tilde{\mathcal {E}}_{1}=\left\{ \eta \right\} \). Then we can build a better competitor \(\tilde{\gamma }\) by performing arbitrarily small perturbations of \(\gamma \) in the following way. We choose \(s\in (t-\epsilon ,t)\) as close to t as we want, such that \(\gamma (s)\in Q_{\eta }\). Then we modify \(\gamma \) only in the interval \([s,t]\), by replacing it with its projection on the closed convex set \(P_{\eta }\). Namely, denoting by \(\pi _{\eta }\) the projection on \(P_{\eta }\), we define

$$\begin{aligned} \tilde{\gamma }(u)=\left\{ \begin{array}{ll} \gamma (u) &{}{}\text{ for }\quad u<s\quad \text{ or }\quad u>t,\\ \pi _{\eta }(\gamma (u))\quad &{}{}\text{ for }\quad u\in [s,t]. \end{array}\right. \end{aligned}$$

Now, we clearly have \(|\dot{\tilde{\gamma }}|\le |\dot{\gamma }|\), by the 1-Lipschitz property of \(\pi _{\eta }\). On the other hand, as for the potential, in the interval \([s,t]\) we have

$$\begin{aligned}&|\nabla f(\gamma (u))|=|\nabla f(\tilde{\gamma }(u))| \quad \text{ if }\quad u\in C_{1},\\&\quad |\nabla f(\gamma (u))|>|\nabla f(\tilde{\gamma }(u))| \quad \text{ if }\quad u\in C_{2}. \end{aligned}$$

The second inequality easily follows from the fact that if \(u\in C_{2}\), then \(|\eta (\gamma (u))|\ge a+3\mu \). From these estimates, the strict monotonicity of the potential shape h, and the local minimality of \(\gamma \), we obtain that \(C_{2}\cap (s,t)\) must be negligible with respect to the one-dimensional Lebesgue measure. But this contradicts the fact that \(C_{2}\cap (s,t)\) is open and accumulates in t. Thus Step 1 is completed.

To summarize, after Step 1 the situation is the following. There must exist

  • Two elements \(\eta ,\bar{\eta } \in \tilde{\mathcal {E}}_{1}\), with \(\eta \ne \bar{\eta }\);

  • Two distinct Voronoi cells \(V_H\) and \(V_{\bar{H}}\), with \(V_{H}\subseteq Q_{\eta }\) and \(V_{\bar{H}}\subseteq Q_{\bar{\eta }}\);

  • A sequence of open intervals \((s_{\ell }, r_{\ell })\subset (t-\epsilon ,t)\) accumulating in t and such that, for every \(\ell \), the following conditions hold:

    • \(r_{\ell }<s_{\ell +1}\);

    • \(\gamma (s_{\ell })\in V_{H}\);

    • \(\gamma (r_{\ell })\in V_{\bar{H}}\);

    • \((s_{\ell },r_{\ell })\subset C_{2}\).

Step 2. (We have \((r_{\ell }-s_{\ell })=O((t-s_{\ell })^2)\)).

This will be handled once again by comparison with a suitably chosen competitor. We define, for every \(\ell \), the competitor \(\gamma _{\ell }\), obtained by modifying \(\gamma \) only in the interval \([s_{\ell },t]\), where \(\gamma \) is replaced with its projection on the segment \([\gamma (s_{\ell }), 0]\). That is to say, calling \(\pi _{\ell }\) the projection on \([\gamma (s_{\ell }), 0]\), we set

$$\begin{aligned} \gamma _{\ell }(u)= \left\{ \begin{array}{ll} \gamma (u) &{}{}\text{ for } \quad u<s_{\ell } \text{ or } u>t,\\ \pi _{\ell }(\gamma (u))\quad &{}{}\text{ for } \quad u\in [s_{\ell },t]. \end{array}\right. \end{aligned}$$
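The kinetic comparison for this competitor rests on the fact that the projection onto a segment is 1-Lipschitz. The following is a minimal sketch of that projection in \(\mathbb {R}^{2}\), with hypothetical function names and an arbitrarily chosen endpoint p playing the role of \(\gamma (s_{\ell })\); it checks non-expansiveness on a fixed grid of sample points and is an illustration only, not part of the proof.

```python
import math

def project_on_segment(q, p):
    """Euclidean projection of the point q onto the segment [p, 0] in R^2."""
    pp = p[0] * p[0] + p[1] * p[1]
    if pp == 0.0:
        return (0.0, 0.0)  # degenerate segment reduced to the origin
    # Parameter of the closest point on the line through 0 and p, clamped to [0, 1].
    lam = max(0.0, min(1.0, (q[0] * p[0] + q[1] * p[1]) / pp))
    return (lam * p[0], lam * p[1])

def dist(a, b):
    """Euclidean distance between two points of R^2."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

p = (2.0, 1.0)  # assumed endpoint, standing in for gamma(s_l)
# Deterministic grid of sample points around the segment.
samples = [(0.5 * i, 0.5 * j) for i in range(-6, 7) for j in range(-6, 7)]
# Largest ratio |pi(a) - pi(b)| / |a - b| over all pairs of distinct samples.
max_ratio = max(
    dist(project_on_segment(a, p), project_on_segment(b, p)) / dist(a, b)
    for a in samples
    for b in samples
    if a != b
)
print(max_ratio)
```

Since the ratio never exceeds 1 (up to rounding), composing an absolutely continuous curve with this projection cannot increase its speed, which is the only property of \(\pi _{\ell }\) used for the kinetic term.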

Then \(|\dot{\gamma }_{\ell }|\le |\dot{\gamma }|\) as before. Regarding the potential part, we have the following estimates, for \(u\in (s_{\ell },t)\):

$$\begin{aligned} |\nabla f(\gamma (u))|^2&\ge |\eta (\gamma (u))|^2-2|\eta (\gamma (u))||\gamma (u)|-|\gamma (u)|^2\\&\ge |\eta (\gamma (u))|^2-L(2S+r)(t-u),\\ |\nabla f(\gamma _{\ell }(u))|^2&\le |\eta (\gamma _{\ell }(u))|^2+2|\eta (\gamma _{\ell }(u))||\gamma _{\ell }(u)|+ |\gamma _{\ell }(u)|^2 \\&\le |\eta (\gamma _{\ell }(u))|^2 +L(2S+r)(t-u), \end{aligned}$$

where we set \(S:=\max \{|\eta |: \eta \in \tilde{\mathcal {E}}\}\). We now show that \(|\eta (\gamma _{\ell }(u))|\le a\) for every \(u\in (s_{\ell },t)\). By the definition of \(\gamma _{\ell }\) it is enough to show that \(|\eta (x)|\le a\) for every \(x\in [\gamma (s_{\ell }), 0]\). Remember that, by condition R1), \({{\,\textrm{opt}\,}}(\gamma (s_{\ell }))\subseteq {{\,\textrm{opt}\,}}(0)\) so that \(0\in \overline{V_{H}}\subseteq P_{\eta }\). In particular, \([\gamma (s_{\ell }),0)\subseteq V_{H}\), hence \(\eta (x)= \eta \) and \(|\eta (x)|=a\) for every \(x\in [\gamma (s_{\ell }), 0)\). Moreover, since \(0\in P_{\eta }\), \(|\eta (0)|\le |\eta |=a\). This justifies the following estimates:

$$\begin{aligned}&|\eta (\gamma (u))|^2 \ge a^2\ge |\eta (\gamma _{\ell }(u))|^2 \quad \text {for }u\in (s_{\ell },t),\\&|\eta (\gamma (u))|^2 \ge a^2+9\mu ^2 \ge |\eta (\gamma _{\ell }(u))|^2 +9\mu ^2 \quad \text {for }u\in (s_{\ell },r_{\ell }). \end{aligned}$$

After renaming the constant \(c_0:= L(2S+r)\), we obtain, for sufficiently large \(\ell \):

$$\begin{aligned} 0&\ge \mathcal {F}(\gamma )-\mathcal {F}(\gamma _{\ell })\\&\ge \int _{s_{\ell }}^{r_{\ell }}\left\{ h(a^2 + 9\mu ^{2}-c_{0}(t-u))-h(a^{2}+c_{0}(t-u))\right\} \\&\quad +\int _{r_{\ell }}^{t}\left\{ h(a^2 -c_{0}(t-u))-h(a^{2}+c_{0}(t-u))\right\} \\&\ge c_{1}(r_{\ell }-s_{\ell })-c_{2}(t-s_{\ell })^{2}. \end{aligned}$$

Here \(c_{1}:=h(a^{2}+8\mu ^{2})-h(a^{2}+\mu ^{2})>0\) and \(\ell \) is large enough so that \(c_{0}(t-s_{\ell })\le \mu ^{2}\). The constant \(c_{2}\) is instead defined as

$$\begin{aligned}c_{2}:=c_{0} {{\,\textrm{Lip}\,}}\left( h; [a^{2}-\mu ^{2}, a^{2}+\mu ^{2}]\right) .\end{aligned}$$

By defining \(c_{3}:=c_{2}/c_{1}\), we eventually get

$$\begin{aligned}(r_{\ell }-s_{\ell })\le c_{3}(t-s_{\ell })^2\quad \text {for }\ell \text { large enough}.\end{aligned}$$
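For completeness, the last inequality in the chain estimating \(\mathcal {F}(\gamma )-\mathcal {F}(\gamma _{\ell })\) can be unpacked as follows, where \(\ell \) is so large that \(c_{0}(t-s_{\ell })\le \mu ^{2}\) and \(\Lambda \) is a shorthand, introduced only here, for \({{\,\textrm{Lip}\,}}\left( h; [a^{2}-\mu ^{2}, a^{2}+\mu ^{2}]\right) \). Since h is non-decreasing, the two integrands are bounded from below by

$$\begin{aligned} h(a^2 + 9\mu ^{2}-c_{0}(t-u))-h(a^{2}+c_{0}(t-u))&\ge h(a^{2}+8\mu ^{2})-h(a^{2}+\mu ^{2})=c_{1}\quad \text {for }u\in (s_{\ell },r_{\ell }),\\ h(a^2 -c_{0}(t-u))-h(a^{2}+c_{0}(t-u))&\ge -2c_{0}\Lambda (t-u)\quad \text {for }u\in (r_{\ell },t), \end{aligned}$$

and integrating the second bound gives

$$\begin{aligned}\int _{r_{\ell }}^{t}-2c_{0}\Lambda (t-u)\,du=-c_{0}\Lambda (t-r_{\ell })^{2}\ge -c_{2}(t-s_{\ell })^{2}.\end{aligned}$$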

Step 3. (A slight deviation of \(\gamma \) through \(\overline{V_{H}}\cap \overline{V_{\bar{H}}}\) reduces the action, thus contradicting its local minimality).

Notice that \(\overline{V_{H}}\) and \(\overline{V_{\bar{H}}}\) are two polyhedra whose intersection contains 0. Possibly replacing them with their intersection with a large d-dimensional cube, we can assume that they are polytopes. Then we can apply Lemma 3.18 to deduce the existence of a constant \(M>0\) and a sequence of points \(x_{\ell }\in \overline{V_{H}}\cap \overline{V_{\bar{H}}}\) such that

$$\begin{aligned}|x_{\ell } - \gamma (s_{\ell })|\le M|\gamma (r_{\ell })-\gamma (s_{\ell })|\le MLc_{3}(t-s_{\ell })^{2}=c_{4}(t-s_{\ell })^2.\end{aligned}$$

We call \(\epsilon _{\ell }:= c_{4}(t-s_{\ell })\) and assume that \(\ell \) is large enough so that \(\epsilon _{\ell }\in (0,1)\). We crucially consider the following competitor:

$$\begin{aligned} \delta _{\ell }(u)=\left\{ \begin{array}{ll} \gamma (u) &{}{}\text{ for } \quad u<s_{\ell } \text{ or } u>t,\\ \gamma (s_{\ell })+ \frac{x_{\ell }-\gamma (s_{\ell })}{\epsilon _{\ell }(t-s_{\ell })}(u-s_{\ell }) &{}{}\text{ for } \quad u\in [s_{\ell },s_{\ell }+\epsilon _{\ell }(t-s_{\ell })),\\ x_{\ell }+\frac{-x_{\ell }}{(1-\epsilon _{\ell })(t-s_{\ell })}(u-s_{\ell }-\epsilon _{\ell }(t-s_{\ell }))\quad &{}{}\text{ for } \quad u\in [s_{\ell }+\epsilon _{\ell }(t-s_{\ell }),t]. \end{array}\right. \end{aligned}$$

Notice that in the interval \([s_{\ell }, t]\), the curve \(\delta _{\ell }\) is simply a piecewise linear modification of \(\gamma \), going from \(\gamma (s_{\ell })\) to \(x_{\ell }\) in time \(\epsilon _{\ell }(t-s_{\ell })\), and then from \(x_{\ell }\) to 0 in the remaining time. We will see that \(\delta _{\ell }\) has strictly less action than \(\gamma \) for \(\ell \) large enough, thus reaching the desired contradiction. We first want to be sure that \(\overline{V_{H}}\cap \overline{V_{\bar{H}}}\) is a very low potential zone, so that we can lower the action of \(\gamma \) by a slight deviation through it. This can be seen as follows. Both \(\eta \) and \(\bar{\eta }\) belong to \(\partial g(x_{\ell })\), which is convex, thus also \(\frac{\eta +\bar{\eta }}{2} \in \partial g(x_{\ell })\). But then

$$\begin{aligned}|\eta (x_{\ell })|&\le |\eta (x_{\ell })-x_{\ell }|+|x_{\ell }|\le \left| \frac{\eta + \bar{\eta }}{2}-x_{\ell }\right| +|x_{\ell }|\le \left| \frac{\eta + \bar{\eta }}{2}\right| +2|x_{\ell }|\\&= \left( a^2-\left| \frac{\eta -\bar{\eta }}{2}\right| ^2\right) ^{\frac{1}{2}}+2|x_{\ell }|,\end{aligned}$$

where we used in the very last equality that \(|\eta |=|\bar{\eta }|=a\). Therefore, for \(\ell \) large enough, we can assume that \(x_{\ell } \in B_{r}\), and \(|\eta (x_{\ell })|<a\). The latter in particular implies that

$$\begin{aligned}|\eta (x)|\le a-3\mu \quad \text {for every }x\in [x_{\ell },0]\text { and }\ell \text { large enough}.\end{aligned}$$
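The equality used in the last step of the estimate for \(|\eta (x_{\ell })|\) above is the parallelogram law. Since \(|\eta |=|\bar{\eta }|=a\),

$$\begin{aligned}\left| \frac{\eta +\bar{\eta }}{2}\right| ^2+\left| \frac{\eta -\bar{\eta }}{2}\right| ^2=\frac{|\eta |^2+|\bar{\eta }|^2}{2}=a^2.\end{aligned}$$

Because \(\eta \ne \bar{\eta }\), the quantity \(\left( a^2-\left| \frac{\eta -\bar{\eta }}{2}\right| ^2\right) ^{\frac{1}{2}}\) is strictly smaller than a and does not depend on \(\ell \), while \(|x_{\ell }|\le |\gamma (s_{\ell })|+|x_{\ell }-\gamma (s_{\ell })|\le L(t-s_{\ell })+c_{4}(t-s_{\ell })^2\) tends to zero. This is why \(|\eta (x_{\ell })|<a\) for \(\ell \) large enough.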

We can also estimate

$$\begin{aligned}|x_{\ell }|^2\le (|\gamma (s_{\ell })|+|x_{\ell }-\gamma (s_{\ell })|)^2\le |\gamma (s_{\ell })|^2+2L\epsilon _{\ell }(t-s_{\ell })^2+\epsilon _{\ell }^2(t-s_{\ell })^2.\end{aligned}$$

Let us then compare the action of \(\delta _{\ell }\) with the one of \(\gamma \). We start from the kinetic part:

$$\begin{aligned} |\dot{\delta }_{\ell }(u)|^2&=\left( \frac{|x_{\ell }-\gamma (s_{\ell })|}{\epsilon _{\ell }(t-s_{\ell })}\right) ^2\le 1\quad \text {for }u\in [s_{\ell },s_{\ell }+\epsilon _{\ell }(t-s_{\ell })),\\ |\dot{\delta }_{\ell }(u)|^2&=\left( \frac{|x_{\ell }|}{(1-\epsilon _{\ell })(t-s_{\ell })}\right) ^2\\&\le \frac{|\gamma (s_{\ell })|^2}{(1-\epsilon _{\ell })^{2}(t-s_{\ell })^2}+\frac{2L\epsilon _{\ell }}{(1-\epsilon _{\ell })^2}+\frac{\epsilon _{\ell }^2}{(1-\epsilon _{\ell })^2}\quad \text {for }u\in [s_{\ell }+\epsilon _{\ell }(t-s_{\ell }),t). \end{aligned}$$

Whence, integrating:

$$\begin{aligned} \int _{s_{\ell }}^{t}|\dot{\gamma }|^2&\ge \frac{|\gamma (s_{\ell })|^2}{t-s_{\ell }},\\ \int _{s_{\ell }}^{t}|\dot{\delta }_{\ell }|^2&\le \epsilon _{\ell }(t-s_{\ell })+\frac{|\gamma (s_{\ell })|^2}{(1-\epsilon _{\ell })(t-s_{\ell })}+\frac{2L\epsilon _{\ell }(t-s_{\ell })}{(1-\epsilon _{\ell })}+\frac{\epsilon _{\ell }^{2}(t-s_{\ell })}{(1-\epsilon _{\ell })}\\&=\frac{|\gamma (s_{\ell })|^2}{(1-\epsilon _{\ell })(t-s_{\ell })}+O(\epsilon _{\ell }^{2}). \end{aligned}$$
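Two elementary facts are used in this integration, which we spell out for convenience. The lower bound for \(\gamma \) follows from the Cauchy–Schwarz inequality together with \(\gamma (t)=0\):

$$\begin{aligned}|\gamma (s_{\ell })|^2=\left| \int _{s_{\ell }}^{t}\dot{\gamma }\right| ^2\le (t-s_{\ell })\int _{s_{\ell }}^{t}|\dot{\gamma }|^2.\end{aligned}$$

The remainder is of order \(\epsilon _{\ell }^{2}\) because \(t-s_{\ell }=\epsilon _{\ell }/c_{4}\) by the definition of \(\epsilon _{\ell }\), so that

$$\begin{aligned}\epsilon _{\ell }(t-s_{\ell })+\frac{2L\epsilon _{\ell }(t-s_{\ell })}{1-\epsilon _{\ell }}+\frac{\epsilon _{\ell }^{2}(t-s_{\ell })}{1-\epsilon _{\ell }}=\frac{\epsilon _{\ell }^{2}}{c_{4}}\left( 1+\frac{2L+\epsilon _{\ell }}{1-\epsilon _{\ell }}\right) =O(\epsilon _{\ell }^{2}).\end{aligned}$$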

On the other hand, for the potential part, computations very similar to the ones of Step 2 give

$$\begin{aligned}&|\nabla f(\gamma (u))|^2\ge a^2-c_{0}(t-u)\quad \text {for }u\in [s_{\ell },t),\\&|\nabla f(\delta _{\ell }(u))|^2\le a^{2}+c_{0}(t-u)\quad \text {for }u\in [s_{\ell },s_{\ell }+\epsilon _{\ell }(t-s_{\ell })),\\&|\nabla f(\delta _{\ell }(u))|^2\le a^{2}-9\mu ^2 + c_{0}(t-u)\quad \text {for }u\in [s_{\ell }+\epsilon _{\ell }(t-s_{\ell }),t). \end{aligned}$$

Integration yields

$$\begin{aligned}&\int _{s_{\ell }}^{t}h(|\nabla f(\gamma )|^2) \ge h(a^{2})(t-s_{\ell })-c_{5}(t-s_{\ell })^2,\\&\int _{s_{\ell }}^{t}h(|\nabla f(\delta _{\ell })|^2) \le h(a^{2})(t-s_{\ell })+c_{5}(t-s_{\ell })^2 - c_{6}(1-\epsilon _{\ell })(t-s_{\ell }). \end{aligned}$$

Here \(\ell \) is chosen large enough so that \(c_{0}(t-s_{\ell })\le \mu ^2\). Moreover, we set

$$\begin{aligned}c_{5}:= \frac{c_{0}}{2}{{\,\textrm{Lip}\,}}\left( h;[a^{2}-\mu ^{2},a^{2}+\mu ^{2}]\right) ,\quad c_{6}:=h(a^{2})-h(a^{2}-8\mu ^{2})>0.\end{aligned}$$

Finally, collecting all the estimates together, we obtain

$$\begin{aligned} \mathcal {F}(\gamma )-\mathcal {F}(\delta _{\ell })&\ge \frac{|\gamma (s_{\ell })|^2}{t-s_{\ell }}\left( \frac{-\epsilon _{\ell }}{(1-\epsilon _{\ell })}\right) +c_{6}(1-\epsilon _{\ell })(t-s_{\ell })+O(\epsilon _{\ell }^{2})\\&\ge \Bigg (-\frac{L^2}{c_{4}(1-\epsilon _{\ell })}\epsilon _{\ell }+\frac{c_{6}}{c_{4}}(1-\epsilon _{\ell })+O(\epsilon _{\ell })\Bigg )\epsilon _{\ell }. \end{aligned}$$

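Now the contradiction comes from the fact that the right-hand side is strictly positive for \(\ell \) large enough: indeed, as \(\ell \rightarrow \infty \) we have \(\epsilon _{\ell }\rightarrow 0^{+}\), and the quantity in brackets converges to \(c_{6}/c_{4}>0\). This concludes the proof.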

\(\square \)