An Eikonal Equation with Vanishing Lagrangian Arising in Global Optimization

We show a connection between global unconstrained optimization of a continuous function f and weak KAM theory for an eikonal-type equation arising also in ergodic control. A solution v of the critical Hamilton–Jacobi equation is built by a small discount approximation as well as the long time limit of an associated evolutive equation. Then v is represented as the value function of a control problem with target, whose optimal trajectories are driven by a differential inclusion describing the gradient descent of v. Such trajectories are proved to converge to the set of minima of f, using tools from control theory and occupational measures. We also prove that in some cases the set of minima is reached in finite time.


Introduction
Let f ∈ C(ℝⁿ) be a bounded function attaining its global minimum. Global optimization is concerned with the search for the minimum points, i.e., finding the set M = argmin f. For convex smooth functions this is achieved by the gradient flow, i.e., by following the trajectories of ẏ(s) = −∇f(y(s)) from any initial point x = y(0). However, if the function f is not convex, the trajectory y(·) may converge to a local minimum or a saddle point. Several alternative algorithms have been designed to handle non-convex optimization, such as stochastic gradient descent, simulated annealing, or consensus-based methods. In particular, the case of nonsmooth f in high dimensions is important for applications to machine learning; see, e.g., the recent paper [14] and the references therein.
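The trapping phenomenon is easy to observe numerically. The following is a minimal sketch (our toy example, not from the paper), with a hypothetical one-dimensional non-convex f whose gradient flow, started in the wrong basin, stalls at the local minimum:

```python
# Toy non-convex function (our illustrative example): global minimum near
# x = -1.04, local minimum near x = 0.96, local maximum in between.
def f(x):
    return (x**2 - 1.0)**2 + 0.3 * x

def grad_f(x):
    return 4.0 * x * (x**2 - 1.0) + 0.3

# Explicit Euler discretisation of the gradient flow  y' = -grad f(y).
x = 1.2                      # initial point in the basin of the local minimum
for _ in range(10_000):
    x -= 1e-2 * grad_f(x)
print(x)                     # ~ 0.96: trapped at the local minimum, far from argmin f
```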
In this paper we construct and study a Lipschitz function v : ℝⁿ → ℝ such that the following normalized non-smooth gradient descent differential inclusion
(1.1) ẏ(s) ∈ −{ p/|p| : p ∈ D⁻v(y(s)) }, for a.e. s > 0,
has a solution for any initial condition x = y(0), and all solutions converge to M as s → +∞. Here D⁻v is the sub-differential of the theory of viscosity solutions (see, e.g., [4]). The construction of such a generating function v is based on a classical problem for Hamilton–Jacobi equations: find a constant c such that the stationary equation
(1.2) H(x, Dv(x)) = c, in ℝⁿ,
has a solution v. The minimal c with this property is the critical value of the Hamiltonian H and, if H(x,·) is convex, it is also the value of an optimal control problem with ergodic cost having H as its Bellman Hamiltonian. If the critical solution v is interpreted in the viscosity sense, the problem fits in the weak KAM theory, and it is well-known that, for H = ½|p|² − f(x) with f periodic, c = −min f [29,18]; moreover, the same holds for any bounded f ∈ C²(ℝⁿ) by a result of Fathi and Maderna [20], and for uniformly continuous f as proved by Barles and Roquejoffre [5]. In Section 2 we extend such results to f ∈ C(ℝⁿ), bounded, and attaining its minimum. We also prove that min f and v solving the critical equation can be approximated in two ways: by the solution u_λ of the stationary equation
(1.3) λu_λ + ½|Du_λ|² = f(x), in ℝⁿ,
as λ → 0+, the so-called small discount limit, as well as by the long-time limit of the solution of the evolution equation
(1.4) ∂ₜu + ½|Du|² = f(x), in ℝⁿ × (0,+∞), u(x,0) = 0.
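As a sanity check of the identity c = −min f beyond the periodic case, consider the following one-dimensional computation (our example, for illustration only):

```latex
% One-dimensional illustration (our example): n = 1, f(x) = 1 - e^{-x^2},
% which is bounded, with \min f = 0 attained only at x = 0.
% The critical equation H(x, v') = c with c = -\min f = 0 reads
\[
\tfrac12 |v'(x)|^2 = f(x) - \min f = 1 - e^{-x^2},
\]
% and is solved by the nonnegative Lipschitz function
\[
v(x) = \int_0^{|x|} \sqrt{2\bigl(1 - e^{-s^2}\bigr)}\; ds ,
\]
% which vanishes exactly on M = \operatorname{argmin} f = \{0\}; near the origin
% \sqrt{2(1-e^{-s^2})} \approx \sqrt{2}\,|s|, so v is differentiable also at x = 0.
```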
Note that the two problems (1.3) and (1.4) do not require the a-priori knowledge of min f and argmin f. In particular we prove the long-time behavior
(1.5) u(x,t) − t min f → v(x) locally uniformly as t → +∞.
If, in addition, f is Lipschitz and semiconcave, we show that v is semiconcave and that Du_λ and D_xu(·,t) both converge (a.e.) to Dv, therefore giving an approximation of the gradient descent equation (1.1). Moreover, in this case (1.1) becomes the classical normalised gradient descent ẏ(t) = −Dv(y(t))/|Dv(y(t))|, for all t > 0.
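A hedged numerical sketch of the normalised descent (our discretisation, using the explicit Dv of the one-dimensional example above):

```python
import numpy as np

# Normalised gradient descent  y' = -Dv(y)/|Dv(y)|  for the 1D example
# f(x) = 1 - exp(-x^2), where Dv(x) = sign(x) * sqrt(2 f(x)) away from 0.
def Dv(x):
    return np.sign(x) * np.sqrt(2.0 * (1.0 - np.exp(-x**2)))

y, dt, t = 3.0, 1e-3, 0.0
while abs(y) > 1e-3:              # stop once (numerically) on the target M = {0}
    g = Dv(y)
    y -= dt * g / abs(g)          # unit-speed step in the direction -Dv/|Dv|
    t += dt
print(t)                          # ~ 3.0 = |y(0)|: the target is reached in finite time
```

Unlike the plain gradient flow, the normalised dynamics moves at unit speed, so it does not slow down near critical points of f.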
The main result of the paper is the convergence of the gradient descent trajectories (1.1) to the set M of minima of f. This is done in Section 3.1 after observing that v also solves the Dirichlet problem for the eikonal equation
(1.6) |Dv(x)| = ℓ(x) in ℝⁿ \ M, v = 0 on ∂M,
with ℓ(x) := √(2(f(x) − min f)). (In fact, our analysis of this problem requires only that ℓ ∈ C(ℝⁿ) is bounded, non-negative, and M = {x : ℓ(x) = 0}.) We exploit that the unique solution of (1.6) is the value function
v(x) = inf_α ∫₀^{t_x(α)} ℓ(y_x^α(s)) ds,
where α is measurable, |α(s)| ≤ 1, and t_x(α) is the first time the trajectory y_x^α hits M. We show that optimal trajectories exist, satisfy the gradient descent inclusion (1.1), and tend to M as t → +∞ under a slightly strengthened positivity condition at infinity for ℓ. A crucial new tool for the proofs is given by the occupational measures associated to these trajectories.
In the final section of the paper we give sufficient conditions under which the optimal trajectories reach M in finite time. This is a nontrivial problem even when v is smooth, because it is equivalent to the finiteness of the length of the gradient orbits ż(s) = −Dv(z(s)), a question with a very large literature and open problems; see, e.g., [7,16] and the references therein. Here we prove the finiteness of the hitting time by assuming a bound from below on ℓ near the target and showing an inequality of Łojasiewicz type along optimal trajectories. In a forthcoming companion paper we also study the approximation of v and M by vanishing viscosity: we add to (1.3) a term −εΔu_λ and let λ → 0+ to get the viscous critical equation
½|Du|² − εΔu = f(x) − U_ε, in ℝⁿ,
where U_ε is a constant. We prove that 0 ≤ U_ε − min f ≤ Cε^β for some β > 0.
Then we define the approximate stochastic gradient descent and show that its trajectories converge to M in a suitable sense, for small λ and ε. These results can also be found in the second author's thesis [28].
Note that (1.4) is the classical Hamilton–Jacobi equation with the mechanical Hamiltonian H(x,p) = ½|p|² − f(x), where −f is the potential energy. Then our results of Section 2 have an interpretation in analytical mechanics. For instance, the long-time behavior (1.5) describes a thermodynamical trend to equilibrium in a non-turbulent gas or fluid: see [12,13].
We do not attempt to review all the literature related to the topics mentioned above. For weak KAM theory on compact manifolds we refer to [19,18,17], and for the PDE approach to ergodic control, mostly under periodicity assumptions, the reader can consult [2,1] and the references therein. When the state space is not bounded one must add conditions to get some compactness. In addition to [20,5], already quoted, such problems were studied in all of ℝⁿ by [3,31,33,9,10,25], assuming that f is large enough at infinity, and by [23,24,26] for equations involving a linear first order term that satisfies a recurrence condition; see also the references therein. Here, instead, we get compactness from the boundedness of f and the assumption that its minimum is attained. Several of the results just quoted were used for homogenisation and singular perturbation problems, e.g., [29,3,1,33], so we believe that our results will also find such applications.
The Dirichlet problem (1.6) with ℓ vanishing at the boundary was studied, e.g., in [35,30,32]. The case of a cost that does not vanish is part of time-optimal control and is treated in [4]; see also the references therein. The synthesis of an optimal feedback from the value function v leading to (1.1) uses methods from [4] based on the earlier papers [6,22].
We do not try here to design algorithms for global optimization based on the previous results. Let us mention, however, that an efficient numerical method for computing at the same time c and v in the critical/ergodic PDE (1.2) was proposed in [8].
The paper is organized as follows. In Section 2.1 we prove the weak KAM theorem by the small discount approximation (1.3), and in Section 2.2 we study the long-time asymptotics of solutions to (1.4). Section 3.1 is devoted to the optimal control problem with target M associated to (1.6), and Section 3.2 to deriving the gradient descent inclusion (1.1) for the optimal trajectories. In Section 3.3 we prove that such trajectories converge to M, and in Section 3.4 we show two cases where the hitting time is finite.

A weak KAM theorem and approximation of the critical solution
We introduce the following assumptions and refer to them wherever needed:
(A1) f ∈ C(ℝⁿ) is bounded;
(A2) f attains its global minimum, i.e., M := argmin f ≠ ∅;
we refer to (A1)–(A2) jointly as (A);
(B) f is C₁-Lipschitz continuous and C₂-semiconcave, i.e., D²_{ξξ}f ≤ C₂ in the viscosity sense for every unit vector ξ, where D²_{ξξ}f is the second order derivative of f in the direction ξ.
A weak KAM theorem for the Hamiltonian H(x,p) = ½|p|² − f(x) should give conditions under which there exists a constant U ∈ ℝ, the (Mañé) critical value, such that the equation
(2.3) ½|Dv(x)|² = f(x) − U, in ℝⁿ,
has a viscosity solution v. Clearly any critical value must satisfy U ≤ min f, since ½|Dv|² = f − U ≥ 0 at a.e. point and f − U is continuous. In this section we prove, under the current assumptions, that min f is a critical value, and we construct the solution v by two different approximation procedures, both having an interpretation in terms of ergodic problems in optimal control.
The fact that min f is the maximal critical value was proved in [20] for f ∈ C² and with ℝⁿ replaced by any complete Riemannian manifold, by methods of weak KAM theory different from ours.
2.1. The small discount limit. We consider the stationary approximation of (2.3)
(2.4) λu_λ + ½|Du_λ|² = f(x), in ℝⁿ,
where λ > 0 will be sent to 0. The viscosity solution u_λ is known to be the value function of the following infinite horizon discounted optimal control problem
(2.5) u_λ(x) = inf_α ∫₀^{+∞} ( ½|α(s)|² + f(y_x^α(s)) ) e^{−λs} ds, ẏ_x^α(s) = α(s), y_x^α(0) = x,
where the controls α : [0,+∞) → ℝⁿ are measurable functions (see, e.g., [4, Chapter III]). The main result of this section is the following.
Theorem 2.1. Under assumptions (A), as λ → 0,
λu_λ(x) → min f and u_λ(x) − (min f)/λ → v(x), locally uniformly in ℝⁿ,
where v(·) is a Lipschitz continuous viscosity solution to
(2.6) ½|Dv(x)|² = f(x) − min f, in ℝⁿ.
Moreover v ≥ 0 in ℝⁿ and null on M, and it is the unique viscosity solution of (2.6) in ℝⁿ \ M vanishing on ∂M and bounded from below.
If we assume moreover that assumptions (B) hold, then v is semiconcave and Du_λ → Dv almost everywhere in ℝⁿ. For the proof we need some estimates uniform in λ. The first Lemma is known and we omit the proof (see [28] for the details).
Proof. We will skip the more standard parts and refer to [28] for the complete details. We use the vanishing viscosity approximation
(2.9) λu^ε_λ + ½|Du^ε_λ|² − εΔu^ε_λ = f(x), in ℝⁿ, 0 < ε ≤ 1.
The bounds on λu^ε_λ and Du^ε_λ, uniform in λ and ε, are standard and can be obtained, for instance, by representing u^ε_λ as the value function of the stochastic infinite-horizon discounted optimal control problem associated to (2.9) and exploiting the C₂-semiconcavity and C₁-Lipschitz continuity of f. Next we differentiate (2.9) twice in the direction ξ and obtain an inequality (2.11) for ω_λ := D²_{ξξ}u^ε_λ. In the case that ω_λ attains its maximum at some point we evaluate (2.11) there and easily reach the conclusion. For the general case we set, for β > 0 to be chosen, Ψ_λ := ω_λ − βg, with g(x) := log(1 + |x|²). Since ω_λ is bounded from above, Ψ_λ attains a global maximum in ℝⁿ, say at x̄ (which depends on λ and β). By evaluating (2.11) at x̄, after some calculations and using the bound (2.10), we arrive at (2.12). To prove the claim we suppose by contradiction that there exists x₀ with δ := ω_λ(x₀) − C₃ > 0; choosing β such that βg(x₀) ≤ δ/2, we get Ψ_λ(x̄) ≥ Ψ_λ(x₀) > C₃. On the other hand (2.12) gives ω_λ(x̄) ≤ C₃, and hence Ψ_λ(x̄) ≤ C₃, which is the desired contradiction. This proves the claim and the C₃-semiconcavity of u^ε_λ, uniformly in λ, for every 0 < ε ≤ 1. Finally we let ε → 0 in (2.9) and get that the solution u_λ to (2.4) is semiconcave with constant C₃.
Proof. (Theorem 2.1) First we observe that λu_λ(x) = min f for all x ∈ M, where the inequality ≤ follows from the choice α(·) ≡ 0, and the other inequality ≥ is true for all x ∈ ℝⁿ by Lemma 2.1, so the claim is proved. Now we denote R := (4‖f‖_∞)^{1/2} and use the gradient bound (2.8) to get |λu_λ(x) − min f| ≤ λR dist(x,M). Then λu_λ(x) → min f locally uniformly. Define ϕ_λ(·) := u_λ(·) − (min f)/λ. Hence {ϕ_λ(·)}_{λ∈(0,1)} is a uniformly bounded and equi-continuous family on any ball of ℝⁿ. So we can choose a sequence λ_k → 0 along which ϕ_{λ_k} converges locally uniformly to a function v. We let λ_k → 0 and use the stability of viscosity solutions to find that v satisfies (2.6).
Now we note that (2.6) is an eikonal equation with right-hand side f(x) − min f > 0 in ℝⁿ \ M, with v ≥ 0 and v = 0 on ∂M. This Dirichlet boundary value problem is known to have a unique viscosity solution bounded from below. Therefore the convergence of ϕ_λ holds for λ → 0 and not only along subsequences.
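A hedged numerical check of the small discount limit (our semi-Lagrangian discretisation, not an algorithm from the paper): value iteration for a discrete version of (2.4) on a one-dimensional grid, verifying that λu_λ ≈ min f:

```python
import numpy as np

# Semi-Lagrangian value iteration for  lam*u + 0.5|u'|^2 = f  (our discretisation):
#   u(x) = min_a { dt*(0.5*a^2 + f(x)) + (1 - lam*dt) * u(x + a*dt) }.
f = lambda x: 1.0 - np.exp(-x**2)        # bounded, min f = 0 attained at x = 0
X = np.linspace(-4.0, 4.0, 401)          # space grid (np.interp clamps at the ends)
A = np.linspace(-2.0, 2.0, 81)           # control grid, covers |Du| <= sqrt(2)
dt, lam = 0.05, 0.05
u = np.zeros_like(X)
for _ in range(5000):                    # contraction with factor (1 - lam*dt)
    u = np.min([dt*(0.5*a**2 + f(X)) + (1.0 - lam*dt)*np.interp(X + a*dt, X, u)
                for a in A], axis=0)
print(lam * u[200], lam * u[300])        # at x = 0 and x = 2: close to min f = 0
```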

2.2. Long-time asymptotics.
Here we consider the evolutive Hamilton–Jacobi equation
(2.14) ∂ₜu + ½|Du|² = f(x), in ℝⁿ × (0,+∞), u(x,0) = 0,
where D = ∇ = D_x denotes the gradient with respect to the space variables x, and we study the limit as t → +∞. The viscosity solution u(x,t) is known to be the value function of the following finite-horizon optimal control problem
u(x,t) = inf_α ∫₀^t ( ½|α(s)|² + f(y_x^α(s)) ) ds, ẏ_x^α(s) = α(s), y_x^α(0) = x.
The main result of this section is the following.
Theorem 2.2. Under assumptions (A), u(x,t) − t min f → v(x) locally uniformly in ℝⁿ as t → +∞, where v is the solution of (2.6) given by Theorem 2.1.
To proceed with its proof we need some estimates uniform in t.
Lemma 2.3. Under the assumption (A1), for all (x,t) ∈ ℝⁿ × (0,+∞), u(x,t) ≥ t min f, and |D_xu(·,t)| is bounded uniformly in t (the gradient bound (2.18)).
Proof. As we did in the proof of Lemma 2.2, we consider the vanishing viscosity approximation
(2.19) ∂ₜu^ε + ½|Du^ε|² − εΔu^ε = f(x), in ℝⁿ × (0,+∞), u^ε(x,0) = 0.
It is known that u^ε is the value function of the stochastic control problem (2.20) associated to (2.19). Take ξ ∈ ℝⁿ with |ξ| = 1 and let ω(x,t) := D²_{ξξ}u^ε(x,t) be the second order derivative in space in the direction ξ. We claim first that ω(x,t) ≤ tC₂ or, equivalently, that the value function u^ε(·,t) is tC₂-semiconcave in the spatial variable x. Let δ > 0 and take a δ/2-optimal control for the initial point x. By using the same control for the initial points x + h and x − h we get (2.21). From the controlled diffusion in (2.20), the three trajectories differ only by the translations ±h, so the C₂-semiconcavity of f bounds the right-hand side of (2.21) by tC₂|h|² + δ. Since δ > 0 is arbitrary we have proved the claim. Similar computations (see [28]) yield (2.23). Next we differentiate (2.19) twice in the direction ξ and obtain (2.24). Since ω² ≤ |D_ξDu^ε|², and by the semiconcavity assumption on f, we deduce (2.25). Now set g(x) := log(1 + |x|²) and Φ(x,t) := ω(x,t) − βg(x) in ℝⁿ × (0,+∞), for some β > 0 to be made precise. Since ω is bounded from above for 0 ≤ t ≤ T, Φ admits a global maximum in ℝⁿ × [0,T]. Let (x̄,t̄) be such a maximum point. We consider first the case t̄ ∈ (0,T) and evaluate (2.25) at (x̄,t̄) to get (2.26); here we use that Δg has a global maximum at x = 0 for n ≥ 2 and that x/(1+|x|²) is bounded. Then, by (2.23), the bound in (2.26) gives an estimate of ω(x̄,t̄). We choose β and T such that (2.27) holds. On the other hand, if t̄ = 0, then u^ε(x,0) = 0 for all x implies ω(x,0) = 0 and (2.27) still holds. And if t̄ = T, then ∂ₜΦ(x̄,T) ≥ 0, i.e., ∂ₜω(x̄,T) ≥ 0, and (2.27) still holds. Therefore we have (2.27) in all cases. We are now ready to prove that ω(x,t) ≤ C₃ for all (x,t) ∈ ℝⁿ × (0,+∞). As in the proof of Theorem 2.1, we suppose by contradiction that there exists (y,s) such that ω(y,s) − C₃ =: δ > 0. Without loss of generality we can choose T > 0 large enough that s < T. Then we argue exactly as in the proof of Theorem 2.1 and reach a contradiction by choosing β such that βg(y) ≤ δ/2. This proves the C₃-semiconcavity of u^ε with respect to x, uniformly in t, for every 0 < ε ≤ 1. Finally, we let ε → 0 in (2.19) and get that the solution u to (2.14) is semiconcave in x with constant C₃.
Proof. (Theorem 2.2) First we observe that u(x,t) = t min f for all x ∈ M, where the inequality ≤ follows from the choice α(·) ≡ 0, and the other inequality ≥ is true for all x ∈ ℝⁿ by Lemma 2.3.
Denote R := (4‖f‖_∞)^{1/2} and use the gradient bound (2.18) to get |u(x,t) − t min f| ≤ R dist(x,M). Then u(x,t)/t → min f locally uniformly as t → +∞.
Define now ϕ_t(·) := u(·,t) − t min f. We observe that, in view of (2.18), |ϕ_t(x)| ≤ R dist(x,M) and |ϕ_t(x) − ϕ_t(y)| ≤ R|x − y|. Hence {ϕ_t(·)}_{t≥0} is a locally uniformly bounded and equi-continuous family. We claim that ϕ_t(·) → ψ(·) ∈ C(ℝⁿ) locally uniformly as t → +∞ and that ψ(·) is a viscosity solution of
(2.29) ½|Dψ(x)|² = f(x) − min f, in ℝⁿ.
To prove the claim define u_η(x,t) := ϕ_{t/η}(x) = u(x, t/η) − (t/η) min f, and consider the half-relaxed limits θ(x,t) := lim sup*_{η→0} u_η(x,t) = lim sup_{s→+∞, x′→x} ϕ_s(x′), where the last equality comes from the equicontinuity of ϕ_t. Similarly, ζ(x,t) := lim inf*_{η→0} u_η(x,t) = lim inf_{s→+∞, x′→x} ϕ_s(x′), and so both θ and ζ do not depend on t. Next note that ϕ_s(x) = 0 for all x ∈ M and that ϕ_s is non-negative everywhere. Then θ(x) = ζ(x) = 0 on ∂M, and they are a sub- and a supersolution, bounded from below, of (2.29) in ℝⁿ \ M, where f(x) − min f > 0. Then a standard comparison principle for the Dirichlet problem associated with eikonal equations gives θ(x) = ζ(x). This proves that ϕ_t converges pointwise to ψ := θ = ζ ≥ 0, and the convergence is locally uniform by the Ascoli–Arzelà theorem, which gives the claim. Moreover, ψ coincides with the function v found in Theorem 2.1.
Finally, the convergence of the gradient D_xu(·,t) = Dϕ_t to Dψ is a direct consequence of [11, Theorem 3.3.3], recalling that |ϕ_t(x)| ≤ R dist(x,M) and using the uniform semiconcavity estimate in Lemma 2.4.
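A matching numerical check of the long-time limit (again our discretisation, hedged): time-stepping the evolutive problem and observing that ϕ_t = u(·,t) − t·min f stabilises to the critical solution:

```python
import numpy as np

# Semi-Lagrangian time-stepping for  u_t + 0.5|u_x|^2 = f,  u(.,0) = 0 (our scheme):
#   u^{n+1}(x) = min_a { dt*(0.5*a^2 + f(x)) + u^n(x + a*dt) }.
f = lambda x: 1.0 - np.exp(-x**2)        # bounded, min f = 0 attained at x = 0
X = np.linspace(-4.0, 4.0, 401)
A = np.linspace(-2.0, 2.0, 81)
dt, nsteps = 0.05, 2000                   # final time t = 100
u = np.zeros_like(X)
for _ in range(nsteps):
    u = np.min([dt*(0.5*a**2 + f(X)) + np.interp(X + a*dt, X, u) for a in A], axis=0)
phi = u                                   # phi_t = u(.,t) - t*min f, and min f = 0 here
print(phi[200], phi[300])                 # ~ v(0) = 0 and ~ v(2) > 0: phi_t -> v
```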

Reaching the minima via optimal control
3.1. The optimal control problem with target. In this section we consider the Dirichlet problem
(3.1) |Dv(x)| = ℓ(x) in ℝⁿ \ M, v = 0 on ∂M,
motivated by the ergodic equation (2.6) of the previous section if ℓ(x) = √(2(f(x) − min f)).
Here, however, the standing assumptions are only that M ⊆ ℝⁿ is a closed nonempty set, possibly unbounded, and
(F) ℓ ∈ C(ℝⁿ) is bounded and non-negative, and M = {x ∈ ℝⁿ : ℓ(x) = 0}.
Also define ℓ̄ := sup_{x∈ℝⁿ} ℓ(x). The Lipschitz and semiconcavity conditions of the previous section (assumptions (B)) will not be needed in most statements of the present section.
We recall that the continuous viscosity solution of (3.1) is the value function of the control problem
(3.2) v(x) = inf_α ∫₀^{t_x(α)} ℓ(y_x^α(s)) ds,
where α (an admissible control) is a measurable function [0,+∞) → B(0,1), the unit ball of ℝⁿ, t_x(α) := inf{s ≥ 0 : y_x^α(s) ∈ M}, and
(3.3) ẏ_x^α(s) = α(s), y_x^α(0) = x.
The infimum in (3.2) is attained (see [28]): there exists a pair (y*, α*) with v(x) = ∫₀^{t_x(α*)} ℓ(y*(s)) ds, which says that (y*, α*) is an optimal pair solution to (3.2).
Next we show that the fraction of time spent by an optimal trajectory away from the minimizers of ℓ tends to zero as t → +∞. For a given fixed δ > 0 we define the set of quasi-minimizers
K_δ := {x ∈ ℝⁿ : ℓ(x) ≤ δ}
and the fraction of time ρ_δ(t) spent away from K_δ by an optimal trajectory starting from x,
ρ_δ(t) := |{s ∈ [0,t] : y_x^{α*}(s) ∉ K_δ}| / t,
where |I| denotes the Lebesgue measure of I ⊆ ℝ.
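For the reader's convenience, the occupational measure referred to below is the standard one (we only fix notation):

```latex
% Occupational measure of the optimal trajectory up to time t (standard definition):
\[
\mu_t(A) \;:=\; \frac{1}{t}\,\bigl|\{\, s \in [0,t] \;:\; y^{\alpha^*}_x(s) \in A \,\}\bigr| ,
\qquad A \subseteq \mathbb{R}^n \ \text{Borel},
\]
% so that \rho_\delta(t) = \mu_t\bigl(\mathbb{R}^n \setminus K_\delta\bigr).
```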
In other words, ρ_δ(t) is the measure of the complement of K_δ under the occupational measure μ_t of the optimal trajectory y_x^{α*}.
Theorem 3.2. Under assumption (F), for any x ∈ ℝⁿ and δ > 0, an optimal trajectory y_x^{α*}(·) for the problem (3.2) satisfies
(3.6) ρ_δ(t) ≤ ℓ̄ dist(x,M) / (δt), for all t > 0.
In particular, lim_{t→+∞} ρ_δ(t) = 0.
Proof. Since ℓ ≥ 0, using the characteristic function of the complement of K_δ we get
δ ρ_δ(t) t ≤ ∫₀^t ℓ(y_x^{α*}(s)) ds.
Now, since ℓ(y_x^{α*}(s)) = 0 for all s ≥ t_x(α*) and ℓ(·) ≤ ℓ̄, we have for all t ≥ 0
∫₀^t ℓ(y_x^{α*}(s)) ds ≤ v(x) ≤ ℓ̄ T(x).
The second factor on the right-hand side, T(x), is the minimal time function, whose optimal trajectories are the straight lines from the initial position x to its orthogonal projection on the set M, run with maximal speed 1. Therefore the right-hand side of the last inequality is less than or equal to ℓ̄|z − x| for any z ∈ M, and then v(x) ≤ ℓ̄ dist(x,M). Combining the inequalities we get
ρ_δ(t) ≤ ℓ̄ dist(x,M) / (δt),
which concludes the proof.

3.2. A gradient descent inclusion for the optimal trajectories. So far we have shown that an optimal control exists and that the corresponding optimal trajectory does not leave the set of minimizers in average as time goes to infinity, i.e., in the sense of (3.6). We now synthesize optimal feedback controls that give the gradient descent differential inclusion anticipated in the Introduction. We recall the definition of the subdifferential of a continuous function v at x:
D⁻v(x) := { p ∈ ℝⁿ : v(z) ≥ v(x) + p·(z − x) + o(|z − x|) as z → x }.
Theorem 3.3. Assume (F) and let (y, α) be an optimal pair for (3.2) at x. Then
(DI) ẏ(s) ∈ −{ p/|p| : p ∈ D⁻v(y(s)), p ≠ 0 }, for a.e. s ∈ (0, t_x(α)).
Proof. By the dynamic programming principle, the function
(3.7) h(t) := ∫₀^t ℓ(y_x^α(s)) ds + v(y_x^α(t))
is non-decreasing for all admissible α, and non-increasing (hence constant) if and only if α is optimal. And since h is locally Lipschitz, we get that α is optimal if and only if h′(t) ≤ 0 for a.e. t.
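To see why optimality forces the descent direction, here is a heuristic computation (ours, under the simplifying assumption that v is differentiable at y(t)):

```latex
% Heuristic (assuming v differentiable at y(t)): h constant along an optimal y gives
\[
0 \;=\; h'(t) \;=\; \ell(y(t)) + Dv(y(t))\cdot\dot y(t)
\;\ge\; \ell(y(t)) - |Dv(y(t))|\,|\dot y(t)|
\;\ge\; \ell(y(t)) - |Dv(y(t))| \;=\; 0,
\]
% using |\dot y| \le 1 and the eikonal equation |Dv| = \ell. Equality throughout forces
% |\dot y(t)| = 1 and
\[
\dot y(t) \;=\; -\,\frac{Dv(y(t))}{|Dv(y(t))|}\,, \qquad \text{wherever } Dv(y(t)) \neq 0 .
\]
```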
We recall the definitions of the limiting gradient of a Lipschitz function v,
D*v(x) := { p ∈ ℝⁿ : p = lim_k Dv(x_k), x_k → x, v differentiable at x_k },
and of the super-differential of a continuous function v,
D⁺v(x) := { p ∈ ℝⁿ : v(z) ≤ v(x) + p·(z − x) + o(|z − x|) as z → x }.
The following necessary and sufficient conditions for optimality hold.
(iv) If ℓ(x) = √(2(f(x) − min f)) and assumptions (A) and (B) are satisfied, then v is differentiable at all points y(t) with t ∈ (0, t_x(α*)) and
(3.8) ẏ(t) = −Dv(y(t))/|Dv(y(t))|.
(II) A sufficient condition for the optimality of y(·) is
(3.9) ẏ(t) ∈ −{ p/|p| : p ∈ D*v(y(t)) }, for a.e. t ∈ (0, t_x(α)).
Proof. To prove (I.i) we take h defined by (3.7) and let ∂⁺v(x; q) be the upper Dini derivative of v at x in the direction q, with |q| = 1. Using [4, Lemma 2.37, p. 126], one can estimate, for any z ∈ ℝⁿ, the Dini derivative ∂⁺v(z; q) by means of the elements of D⁺v(z). Hence, for p ∈ D⁺v(y(t)), one gets an upper bound on the derivative of t ↦ v(y(t)). But, as in Claim 1 in the proof of Theorem 3.3, and since y is optimal, one gets the opposite bound, which proves the claim.
Claim 2. ẏ(t) = −p/|p| for all p ∈ D⁺v(y(t)), p ≠ 0, for a.e. t.
To prove (I.ii) we use the fact that h is non-increasing if and only if y(·) is optimal. Hence, for t > 0 and τ > 0 small, one has h(t + τ) ≤ h(t), i.e., v(y(t + τ)) − v(y(t)) ≤ −∫_t^{t+τ} ℓ(y(s)) ds. Recalling the definition of p ∈ D⁺v(y(t)), one can bound v(y(t + τ)) − v(y(t)) from above by p·(y(t + τ) − y(t)) + o(τ), and together with the previous inequality this yields one of the two desired inequalities. The other inequality is a direct consequence of p being in D⁺v(y(t)) and of v being a subsolution. This concludes the proof of statement (I.ii).
The property (I.iii) follows immediately from the equality |p| = ℓ(y(t)) for all p ∈ D⁺v(y(t)) and the convexity of the set D⁺v(y(t)).
To prove (II) note that at all points z of differentiability of v one has |Dv(z)| = ℓ(z). Then, for all p ∈ D*v(z), |p| = ℓ(z), which is positive off M, so p ≠ 0. Then, for y solving (3.9), one concludes the proof as it was done for Theorem 3.3.
We now strengthen the positivity of ℓ away from M by assuming
(H) γ(ε) := inf{ ℓ(x) : dist(x,M) ≥ ε } > 0, for every ε > 0.
If M is bounded, then it is easy to see that this condition is equivalent to
lim inf_{|x|→∞} ℓ(x) > 0,
which is also equivalent to Assumption (A3) in [25], Assumption (L3)-(3.2) in [10], and Assumption (L3) in [9]. The last inequality, however, is impossible when M is unbounded.
A direct consequence of Theorem 3.3 is that, for any initial point x, the optimal trajectories are solutions of the differential inclusion (DI).
We are now ready to show stability properties of the set of global minimizers M with respect to the optimal trajectories y_x^{α*}(·).
Theorem 3.5. Assume (F) and (H) hold. Then for y*(·) as in (DI):
(i) for every ε > 0 there is δ > 0 such that dist(x,M) ≤ δ implies dist(y*(t),M) ≤ ε for all t ≥ 0;
(ii) lim_{t→+∞} dist(y*(t),M) = 0 for all x ∈ ℝⁿ.
This means that M is Lyapunov stable and lim_{t→+∞} dist(y_x^{α*}(t), M) = 0 for all x ∈ ℝⁿ.
The previous inequality can be rewritten as a lower bound, linear in the number N(t) of excursions of the trajectory away from M performed up to time t, for the time spent outside K_{γ(ε/2)}. On the other hand, we know from Theorem 3.2, in particular (3.6), that this time is at most ℓ̄ dist(x,M)/γ(ε/2), and so we have N(t) < ℓ̄ dist(x,M)/(ε γ(ε/2)). But this cannot be true, since N(t) → +∞ as t → +∞, and hence it concludes the proof.
3.4. On reaching the argmin in finite time. Here we investigate whether the hitting time t_x(α*) of an optimal trajectory with the target M is finite or not. In view of the gradient descent inclusion (1.1), or its smooth version (3.8), the question is equivalent to the finiteness of the length of the orbits of the gradient flow ẏ ∈ −D⁻v(y), or ẏ = −∇v(y). This is a classical problem with a large literature. Positive results require strong regularity of v, such as quasiconvexity and subanalyticity [7]. On the other hand, counterexamples are known for v ∈ C^∞(ℝ²) and target a circle [34] or a single point [16].
In our case v is not smooth, but it is the value function of a control problem and solves an eikonal equation. These properties can be exploited to prove that the hitting time is finite in some cases.
The first sufficient condition, which complements the hypothesis (H), is the following, where d(x) := dist(x,M):
(L) there exist a continuous function γ, with γ(s) > 0 for all s > 0 and γ(0) = 0, and some r > 0 such that ℓ(x) = γ(d(x)) whenever d(x) ≤ r.
Proposition 3.1. Assume (F), (H), and (L) hold, and let α* be an optimal control for problem (3.2). Then the hitting time satisfies t_x(α*) = d(x) whenever d(x) ≤ r, and it is finite for all x.
Proof. Let us first note that the finiteness for all x follows from the property in the case d(x) ≤ r, because by Theorem 3.5 (ii) there exists a finite time t̄_x such that d(y_x^{α*}(t̄_x)) ≤ r.
We assume now that the initial position x satisfies d(x) ≤ r and aim to prove that
v(x) = ∫₀^{d(x)} γ(s) ds,
where v(x) is the value function defined in (3.2). Denote by V(x) the right-hand side of the last equality.
We first claim that v(x) ≤ V(x). Take z in the set of projections of x onto M and consider the straight line from x to z, given by the trajectory y_x(t) = x − pt, t ≥ 0, where p = (x − z)/|x − z|. Note that t̂_x := inf{t ≥ 0 : y_x(t) ∈ M} = d(x), and that d(y_x(t)) ≤ r for all 0 ≤ t ≤ t̂_x. Then, by (L),
v(x) ≤ ∫₀^{d(x)} ℓ(y_x(t)) dt = ∫₀^{d(x)} γ(d(y_x(t))) dt.
Observe now that d(y_x(t)) = |x − z| − t = d(x) − t. Therefore, using the change of variable s := d(y_x(t)) = d(x) − t, we obtain
v(x) ≤ ∫₀^{d(x)} γ(s) ds = V(x),
and this proves the claim.

Next we show that v(x) ≥ V(x). For any admissible control α whose trajectory reaches M, set φ(t) := d(y_x^α(t)), a 1-Lipschitz function, and let τ be the last time with φ(τ) = d(x), so that φ(t) ≤ d(x) ≤ r for t ∈ [τ, t_x(α)]. Then, by (L) and the coarea formula for Lipschitz functions,
∫₀^{t_x(α)} ℓ(y_x^α(t)) dt ≥ ∫_τ^{t_x(α)} γ(φ(t)) dt ≥ ∫_τ^{t_x(α)} γ(φ(t)) |φ′(t)| dt ≥ ∫₀^{d(x)} γ(s) ds = V(x),
since φ takes all values in [0, d(x)] on [τ, t_x(α)]. Taking the infimum over α gives v(x) ≥ V(x), hence v(x) = V(x). This proves that y_x(t) := x − pt is an optimal trajectory and that d(x) is its hitting time.
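A concrete instance of condition (L) (our illustrative example):

```latex
% Example (ours): take \gamma(s) = \sqrt{s}, i.e. \ell(x) = d(x)^{1/2} for d(x) \le r.
% Then, for d(x) \le r, Proposition 3.1 gives
\[
v(x) \;=\; \int_0^{d(x)} \sqrt{s}\; ds \;=\; \tfrac{2}{3}\, d(x)^{3/2},
\]
% the straight line to the projection of x onto M is optimal, and M is reached
% exactly at time d(x). Note the {\L}ojasiewicz-type relation
% |Dv(x)| = \ell(x) = (3/2)^{1/3}\, v(x)^{1/3} at points of differentiability,
% anticipating the mechanism used in the results below.
```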
Remark 3.3. In some control problems it may happen that an optimal trajectory remains arbitrarily close to a target without ever reaching it. Such a behavior has been observed in a linear-quadratic control problem studied in [27, §6.1], where the target is a singleton {x•} and the time t_ε of being ε-close to x• is computed explicitly in terms of ε and the initial state x. Moreover, an optimal trajectory oscillates periodically around x• (see [27, p. 55]).
Next we show that, under the set of assumptions of Section 2, a bound from below on ℓ near the target is a sufficient condition for the finiteness of the hitting time. The proof uses an inequality of Łojasiewicz type along optimal trajectories. If α* is an optimal control for x, then the hitting time t_x(α*) is finite for all x, and for d(x) sufficiently small it satisfies the estimate (3.15). Therefore it is not restrictive to assume that d(y(t)) ≤ r for all t > 0.
We re-parametrise the trajectory y to obtain a gradient orbit, and we combine the resulting Łojasiewicz-type inequality with (3.17) to get the estimate (3.15).