1 Introduction

Let \(f\in C(\mathbb {R}^n)\) be a bounded function attaining its global minimum. Global optimization is concerned with the search for the minimum points, i.e., with finding the set \({\mathfrak {M}}={{\,\textrm{argmin}\,}}f\). For convex smooth functions this is achieved by the gradient flow, i.e., by following the trajectories of \({\dot{y}}(s) = -\nabla f(y(s))\) from any initial point \(x=y(0)\). However, if the function f is not convex, the trajectory \(y(\cdot )\) may converge to a local minimum or a saddle point. Several alternative algorithms have been designed to handle non-convex optimization, such as stochastic gradient descent, simulated annealing, and consensus-based methods. In particular, the case of non-smooth f in high dimensions is important for applications to machine learning, see, e.g., the recent paper [14] and the references therein.

In this paper we construct and study a Lipschitz function \(v : \mathbb {R}^n \rightarrow \mathbb {R}\) such that the following normalized non-smooth gradient descent differential inclusion

$$\begin{aligned} {\dot{y}}(s)\in \left\{ - \frac{p}{|p|}\,,\; p\in D^{-}v(y(s))\right\} ,\; \text { for a.e. }\, s>0, \end{aligned}$$
(1.1)

has a solution for any initial condition \(x=y(0)\) and all solutions converge to \({\mathfrak {M}}\) as \(t\rightarrow +\infty \). Here \(D^{-}v\) is the sub-differential of the theory of viscosity solutions (see, e.g., [4]). The construction of such a generating function v is based on a classical problem for Hamilton–Jacobi equations: find a constant c such that the stationary equation

$$\begin{aligned} H(x, Dv)=c \quad \text {in } \mathbb {R}^n, \end{aligned}$$
(1.2)

has a solution v. The minimal c with this property is the critical value of the Hamiltonian H and, if \(H(x,\cdot )\) is convex, it is also the value of an optimal control problem with ergodic cost having H as its Bellman Hamiltonian. If the critical solution v is interpreted in the viscosity sense, the problem fits into weak KAM theory, and it is well-known that, for \(H=\frac{1}{2}|p|^2 - f(x)\) with f periodic, \(c=-\min f\) [19, 28]; moreover the same holds for any bounded \(f\in C^2(\mathbb {R}^n)\) by a result of Fathi and Maderna [20], and for uniformly continuous f as proved by Barles and Roquejoffre [5]. In Sect. 2 we extend this result to \(f\in C(\mathbb {R}^n)\), bounded, and attaining its minimum. We also prove that \(\min f\) and v solving the critical equation

$$\begin{aligned} \min f + \frac{1}{2}|\nabla v(x)|^{2} = f(x) \quad \text { in } \mathbb {R}^{n}, \end{aligned}$$

can be approximated in two ways: by the solution of the stationary equation

$$\begin{aligned} \lambda u_{\lambda } + \frac{1}{2}|Du_{\lambda }|^{2} = f(x),\quad x\in \mathbb {R}^{n}, \end{aligned}$$
(1.3)

as \(\lambda \rightarrow 0+\), the so-called small discount limit, as well as by the long-time limit of the solution of the evolution equation

$$\begin{aligned} \partial _{t}u + \frac{1}{2}|Du |^{2} =\, f(x),\; \text { in }\;\mathbb {R}^n \times (0,+\infty ), \quad u(x,0) = 0. \end{aligned}$$
(1.4)

More precisely, for the evolutive Eq. (1.4) we prove

$$\begin{aligned} \lim _{t\rightarrow +\infty }\left( u(x,t) - t \min f\right) = v(x) \quad \text {locally uniformly in } \mathbb {R}^{n}. \end{aligned}$$
(1.5)

Note that the two problems (1.3) and (1.4) do not require the a-priori knowledge of \(\min f\) and \({{\,\textrm{argmin}\,}}f\). If, in addition, f is Lipschitz and semiconcave, we show that v is semiconcave and \(Du_\lambda \) and \(D_xu(\cdot ,t)\) both converge (a.e.) to Dv, therefore giving an approximation of the gradient descent Eq. (1.1). Moreover, in this case (1.1) becomes the classical normalised gradient descent

$$\begin{aligned} {\dot{y}}(t)= - \frac{Dv(y(t))}{|Dv(y(t))|},\quad \forall \,t>0. \end{aligned}$$
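As a purely computational illustration, not used in our analysis, the following minimal Python sketch implements an explicit Euler discretization of this normalized descent. Here grad_v is a placeholder for any routine returning an element of \(D^{-}v\), for instance one of the approximations \(Du_\lambda \) or \(D_xu(\cdot ,t)\) mentioned above; the step size and iteration count are arbitrary.

    import numpy as np

    def normalized_descent(grad_v, x0, step=1e-2, n_steps=10_000, tol=1e-9):
        """Explicit Euler scheme for y' = -p/|p|, with p an element of D^- v(y).

        Illustrative sketch only: v is in general just Lipschitz, so grad_v
        stands for any selection of the subdifferential D^- v.
        """
        y = np.array(x0, dtype=float)
        for _ in range(n_steps):
            p = np.asarray(grad_v(y), dtype=float)
            norm = np.linalg.norm(p)
            if norm < tol:               # numerically stationary: 0 is (nearly) in D^- v(y)
                break
            y = y - step * p / norm      # unit-speed step in the descent direction
        return y

    # Example with the smooth function v(x) = |x|^2/2, so that D^- v(x) = {x}:
    # normalized_descent(lambda y: y, np.array([3.0, -4.0])) ends up near the origin.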

The main result of the paper is the convergence of the gradient descent trajectories (1.1) to the set \({\mathfrak {M}}\) of minima of f. This is done in Sect. 3.1 after observing that v solves also the Dirichlet problem for the eikonal equation

$$\begin{aligned} \left\{ \quad \begin{aligned} |\nabla v(x)|&= \ell (x),&x\in \mathbb {R}^{n}\setminus {\mathfrak {M}}\\ v(x)&= 0,&x\in {\mathfrak {M}} \end{aligned}\right. , \end{aligned}$$
(1.6)

with \(\ell (x) := \sqrt{2(f(x) - \min f)}\). (In fact, our analysis of this problem requires only that \(\ell \in C(\mathbb {R}^n)\) is bounded, non-negative, and \({\mathfrak {M}}=\{x : \ell (x)=0\}\)). We exploit that the unique solution of (1.6) is the value function

$$\begin{aligned} v(x)=\inf \limits _{\alpha ({\cdot }) }\int _{0}^{t_{x}(\alpha )} \ell (y_{x}^{\alpha }(s))\,\text {d}s , \quad {\dot{y}}^{\alpha }_{x}(s) = \alpha (s) ,\, \text { for } s > 0,\quad y_{x}^{\alpha }(0)=x, \end{aligned}$$

where \(\alpha \) is measurable, \(|\alpha (s)|\le 1\), and \(t_{x}(\alpha )\) is the first time the trajectory \(y_{x}^{\alpha }\) hits \({\mathfrak {M}}\). We show that optimal trajectories exist, satisfy the gradient descent inclusion (1.1), and tend to \({\mathfrak {M}}\) as \(t\rightarrow +\infty \) under a slightly strengthened positivity condition at infinity for \(\ell \). A crucial new tool for the proof is the occupational measure associated with these trajectories.

In the final section of the paper we give sufficient conditions such that the optimal trajectories reach \({\mathfrak {M}}\) in finite time. This is a nontrivial problem even when v is smooth, because it is equivalent to the finite length of gradient orbits \(\dot{z}(s)=-Dv(z(s))\), a question with a very large literature and open problems, see, e.g., [7, 16] and the references therein. Here we prove the finite hitting time by assuming a bound from below on \(\ell \) near the target and showing an inequality of Łojasiewicz type along optimal trajectories.

In a forthcoming companion paper we also study the approximation of v and \({\mathfrak {M}}\) by vanishing viscosity. We add to (1.3) a term \(-\varepsilon \varDelta u_\lambda \) and let \(\lambda \rightarrow 0+\) to get the viscous critical equation

$$\begin{aligned} U^\varepsilon - \varepsilon \varDelta v^\varepsilon (x) + \frac{1}{2}|\nabla v^\varepsilon (x)|^{2} = f(x) \quad \text { in } \mathbb {R}^{n}, \end{aligned}$$

where \(U^\varepsilon \) is a constant. We prove that \(0\le U^\varepsilon - \min f \le C\varepsilon ^\beta \) for some \(\beta >0\). Then we define the approximate stochastic gradient descent

$$\begin{aligned} \text {d}X _{s} = -\nabla u _{\lambda }(X _{s})\,\text {d}s + \sqrt{2\varepsilon }\,\text {d}W_{s}, \end{aligned}$$

and show that the trajectories converge to \({\mathfrak {M}}\) in a suitable sense, for small \(\lambda \) and \(\varepsilon \). These results can be found also in the second author’s thesis [27].
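As a side illustration (the analysis is in the companion paper and in [27]), the SDE above can be simulated with a standard Euler–Maruyama scheme. In the Python sketch below, grad_u is a placeholder for a routine evaluating \(\nabla u _{\lambda }\), which we do not construct here, and all numerical parameters are arbitrary.

    import numpy as np

    def euler_maruyama(grad_u, x0, eps, dt=1e-3, n_steps=50_000, seed=None):
        """Euler-Maruyama discretization of dX = -grad_u(X) dt + sqrt(2*eps) dW."""
        rng = np.random.default_rng(seed)
        X = np.array(x0, dtype=float)
        for _ in range(n_steps):
            noise = rng.standard_normal(X.shape)   # Brownian increment, scaled by sqrt(2*eps*dt) below
            X = X - grad_u(X) * dt + np.sqrt(2.0 * eps * dt) * noise
        return X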

Note that (1.4) is the classical Hamilton–Jacobi equation with the mechanical Hamiltonian \(H(x,p)=\frac{1}{2} |p|^2-f(x)\), where \(-f\) is the potential energy. Then our results of Sect. 2 have an interpretation in analytical mechanics. For instance, the long-time behavior (1.5) describes a thermodynamical trend to equilibrium in a non-turbulent gas or fluid: see [12, 13].

We do not attempt to review all the literature related to the topics mentioned above. For weak KAM theory on compact manifolds we refer to [17,18,19], and for the PDE approach to ergodic control, mostly under periodicity assumptions, the reader can consult [1, 2] and the references therein. When the state space is not bounded one must add conditions to get some compactness. In addition to [5, 20] already quoted, such problems were studied in all \(\mathbb {R}^n\) by [3, 9, 10, 24, 30, 32] assuming that f is large enough at infinity, and by [22, 23, 25] for equations involving a linear first order term that satisfies a recurrence condition, see also the references therein. Here, instead, we get compactness from the boundedness of f and the assumption that its minimum is attained. Several of the results just quoted were used for homogenisation and singular perturbation problems, e.g., [1, 3, 28, 32], so we believe that also our results will have such applications.

The Dirichlet problem (1.6) with \(\ell \) vanishing at the boundary was studied, e.g., in [29, 31, 34]. The case of a cost that does not vanish is part of time-optimal control and it is treated in [4], see also the references therein. The synthesis of an optimal feedback from the value function v leading to (1.1) uses methods from [4], based on the earlier papers [6, 21].

We do not try here to design algorithms for global optimization based on the previous results. Let us mention, however, that an efficient numerical method for computing at the same time c and v in the critical/ergodic PDE (1.2) was proposed in [8].

The paper is organized as follows. In Sect. 2.1 we prove the weak KAM theorem by the small discount approximation (1.3) and in Sect. 2.2 we study the long-time asymptotics of solutions to (1.4). Section 3.1 is devoted to the optimal control problem with target \({\mathfrak {M}}\) associated to (1.6) and Sect. 3.2 to deriving the gradient descent inclusion (1.1) for the optimal trajectories. In Sect. 3.3 we prove that such trajectories converge to \({\mathfrak {M}}\), and in Sect. 3.4 we show two cases where the hitting time is finite.

2 A Weak KAM Theorem and Approximation of the Critical Solution

We introduce the following assumptions and refer to them wherever needed. Assumptions (A)

  1. (A1)

    \(f : \mathbb {R}^n\rightarrow \mathbb {R}\) is continuous and

    $$\begin{aligned} \exists \;\underline{f},\,\overline{f}\; \text {s.t. }\; \underline{f} \le f(x) \le \overline{f},\quad \forall \;x\in \mathbb {R}^n. \end{aligned}$$
    (2.1)
  2. (A2)

    f attains the minimum, i.e.,

    $$\begin{aligned} {\mathfrak {M}}:=\{x\in \mathbb {R}^{n}\,:\, f(x) = \underline{f}:= \min \limits _{z\in \mathbb {R}^{n}}f(z)\} \ne \emptyset . \end{aligned}$$
    (2.2)

Assumptions (B)

  1. (B1)

    f is Lipschitz continuous with constant \(C_{1}= \Vert \nabla f\Vert _\infty \).

  2. (B2)

    f is \(C_{2}\)-semiconcave, i.e., \(D^{2}_{\xi \xi }f \le C_{2}\) a.e. for all \(\xi \in \mathbb {R}^{n}\) s.t. \(|\xi |=1\), where \(D^{2}_{\xi \xi }f\) is the second order derivative of f in the direction \(\xi \).

A weak KAM theorem for the Hamiltonian \(H(x,p)=\frac{1}{2}|p|^2 - f(x)\) should give conditions under which there exists a constant \(U\in \mathbb {R}\), the (Mañé) critical value, such that the equation

$$\begin{aligned} U + \frac{1}{2}|\nabla v(x)|^{2} = f(x),\quad \text { in } \mathbb {R}^{n}, \end{aligned}$$
(2.3)

has a viscosity solution v. Clearly any critical value must satisfy \(U\le \underline{f}\). In this section we prove under the current assumptions that \(\underline{f}\) is a critical value and construct the solution v by two different approximation procedures, both having an interpretation in terms of ergodic problems in optimal control.

The fact that \(\underline{f}\) is the maximal critical value was proved in [20] for \(f\in C^{2}\) and with \(\mathbb {R}^n\) replaced by any complete Riemannian manifold, by methods of weak KAM theory different from ours.

2.1 The Small Discount Limit

We consider the stationary approximation of (2.3)

$$\begin{aligned} \lambda u_{\lambda } + \frac{1}{2}|Du_{\lambda }|^{2} = f(x),\quad x\in \mathbb {R}^{n}, \end{aligned}$$
(2.4)

where \(\lambda >0\) will be sent to 0. The viscosity solution \(u_\lambda \) is known to be the value function of the following infinite horizon discounted optimal control problem

$$\begin{aligned} \begin{aligned} u_{\lambda }(x) = \inf \limits _{\alpha _{\cdot }} \;&J(x,\alpha _{\cdot }):=\int _{0}^{+\infty }\left( \frac{1}{2}|\alpha _{t}|^{2}+f(x(t))\right) e^{-\lambda t}\,\text {d}t,\\&\text {s.t. }\; {\dot{x}}(s) = \alpha _{s},\quad x(0)=x\in \mathbb {R}^n,\quad s\ge 0, \end{aligned} \end{aligned}$$
(2.5)

where the controls \(\alpha _{\cdot }:[0,+\infty )\rightarrow \mathbb {R}^{n}\) are measurable functions. The main result of this section is the following.

Theorem 1

Under assumptions (A), as \(\lambda \rightarrow 0\)

$$\begin{aligned} \lambda u_{\lambda }(x) \rightarrow \underline{f} \quad \text {and} \quad u_{\lambda }(x) - \underline{f}\lambda ^{-1} \rightarrow v(x) \quad \text {locally uniformly in } \mathbb {R}^{n}, \end{aligned}$$

where \(v(\cdot )\) is a Lipschitz continuous viscosity solution to

$$\begin{aligned} \underline{f} + \frac{1}{2}|Dv(x)|^{2} = f(x),\quad x\in \mathbb {R}^n. \end{aligned}$$
(2.6)

Moreover \(v\ge 0\) in \(\mathbb {R}^n\), it vanishes on \({\mathfrak {M}}\), and it is the unique viscosity solution of (2.6) in \(\mathbb {R}^n\setminus {\mathfrak {M}}\) that vanishes on \(\partial {\mathfrak {M}}\) and is bounded from below.

If we assume moreover that assumptions (B) hold, then

$$\begin{aligned} Du_{\lambda }(x)\rightarrow Dv(x) \quad a.e. \end{aligned}$$

For the proof we need some estimates uniform in \(\lambda \). The first Lemma is known and we omit the proof (see [27] for the details).

Lemma 1

Under the assumption (A1), for all \(x\in \mathbb {R}^n\) and \(\lambda >0\),

$$\begin{aligned} \underline{f}\;\le \; \lambda u_{\lambda }(x)\;\le \; \overline{f}, \end{aligned}$$
(2.7)
$$\begin{aligned} |Du_{\lambda }(x)| \le \sqrt{4\Vert f\Vert _{\infty }} \quad \text {a.e.} \end{aligned}$$
(2.8)
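Although we refer to [27] for the proof, we note for the reader's convenience that (2.8) can be read off from (2.7) and Eq. (2.4) at any point where \(u_{\lambda }\) is differentiable:

$$\begin{aligned} \tfrac{1}{2}|Du_{\lambda }(x)|^{2} = f(x) - \lambda u_{\lambda }(x) \le \overline{f} - \underline{f}\le 2\Vert f\Vert _{\infty }. \end{aligned}$$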

Lemma 2

Assume (A) and (B) hold. Then \(u_{\lambda }\) is \({\widetilde{C}}_{3}\)-semiconcave, where \({\widetilde{C}}_{3}\) is a positive constant independent of \(\lambda > 0\).

Proof

We will skip the more standard parts and refer to [27] for the complete details. We use the vanishing viscosity approximation

$$\begin{aligned} \lambda u^{\varepsilon }_{\lambda } - \varepsilon \varDelta u^{\varepsilon }_{\lambda } + \frac{1}{2}|Du^{\varepsilon }_{\lambda }|^{2} = f(x),\quad x\in \mathbb {R}^{n}. \end{aligned}$$
(2.9)

We fix \(\xi \in \mathbb {R}^{n}\) such that \(|\xi | = 1\) and denote by \(\omega _{\lambda } (x) := D^{2}_{\xi \xi }u^{\varepsilon }_{\lambda }(x)\) the second order derivative in the direction \(\xi \). The estimates \(\omega _{\lambda }(x)\le \lambda ^{-1}C_{2}\) and

$$\begin{aligned} |Du^{\varepsilon }_{\lambda }(x)|\le \lambda ^{-1}C_{1}, \end{aligned}$$
(2.10)

are standard and can be obtained, for instance, by representing \(u^{\varepsilon }_{\lambda }\) as the value function of the stochastic infinite-horizon discounted optimal control problem associated to (2.9) and exploiting the \(C_2\)-semiconcavity and \(C_1\)-Lipschitz continuity of f.

Next we differentiate twice (2.9) in the direction of \(\xi \) and obtain

$$\begin{aligned} - \varepsilon \varDelta \omega _{\lambda } + Du^{\varepsilon }_{\lambda }\cdot D\omega _{\lambda } + |D_{_{\xi }}Du^{\varepsilon }_{\lambda }|^{2} + \lambda \omega _{\lambda } = D^{2}_{_{\xi \xi }}f,\quad \text {in } \mathbb {R}^n. \end{aligned}$$

By \(\omega _{\lambda }^{2}\le |D_{\xi }D u^{\varepsilon }_{\lambda }|^{2}\) and the semiconcavity assumption \(D^{2}_{\xi \xi }f \le C_{2}\) we get

$$\begin{aligned} - \varepsilon \varDelta \omega _{\lambda } + Du^{\varepsilon }_{\lambda }\cdot D\omega _{\lambda } + \omega _{\lambda }^{2} + \lambda \omega _{\lambda } \le C_{2},\quad \text {in } \mathbb {R}^n. \end{aligned}$$
(2.11)

If \(\omega _\lambda \) attains its maximum at some \({{\bar{x}}}\), then, using \(D\omega _{\lambda }({{\bar{x}}})=0\) and \(\varDelta \omega _{\lambda }({{\bar{x}}})\le 0\) in (2.11), we have

$$\begin{aligned} \omega _\lambda ^2({{\bar{x}}}) + \lambda \omega _\lambda ({{\bar{x}}}) \le C_2. \end{aligned}$$

By the elementary inequality \(\frac{1}{2}\left( z^{2} - \lambda ^{2}\right) \le z^{2} + \lambda z\) we get, for \(\lambda \le 1\),

$$\begin{aligned} \omega _\lambda ^2({{\bar{x}}}) \le 2C_2 +1, \end{aligned}$$

and then we easily reach the conclusion. For the general case we set, for \(\beta >0\) to be chosen,

$$\begin{aligned} \varPsi _{\lambda }(x):= \omega _{\lambda }(x) - \beta \log \bigg (1+|x|^{2}\bigg ). \end{aligned}$$

Since \(\omega _{\lambda }\) is bounded from above, \(\varPsi _{\lambda }\) attains a global maximum in \(\mathbb {R}^n\), say at \(\overline{x}\) (which depends on \(\lambda \) and \(\beta \)). By evaluating (2.11) in \(\overline{x}\), after some calculations and using the bound (2.10) we arrive at

$$\begin{aligned} \omega _{\lambda }^{2}(\overline{x}) + \lambda \omega _{\lambda }(\overline{x}) \le C_{2} + 2\varepsilon \beta n + 2\beta \lambda ^{-1} C_{1}. \end{aligned}$$

Arguing as above we get, for \(\beta \le \lambda /2\) and \(\lambda \le 1\),

$$\begin{aligned} \omega _{\lambda }(\overline{x})^{2} \le 2(C_{1} + C_{2} + 2\varepsilon n) + 1. \end{aligned}$$
(2.12)

Now we claim that

$$\begin{aligned} \omega _{\lambda }(x) \le C_{3} :=\sqrt{2(C_{1} + C_{2} + 2\varepsilon \,n) + 1},\quad \text {for all }\, x\in \mathbb {R}^n. \end{aligned}$$

To prove the claim we suppose by contradiction there exists \(y\in \mathbb {R}^{n}\) such that \(\omega _{\lambda }(y) - C_{3}=: \delta >0\). Denote \(g(x) := \log (1+|x|^{2})\) and choose \(\beta >0\) small enough such that \(\beta \le \lambda /2\) and \(\beta g(y)\le \frac{\delta }{2}\). Then

$$\begin{aligned} 0< \frac{\delta }{2}\le \delta -\beta g(y) = \omega _{\lambda }(y) - \beta g(y) - C_{3} = \varPsi _{\lambda }(y) - C_{3}, \end{aligned}$$

and hence \(\varPsi _{\lambda }(\overline{x}) - C_{3} >0\). On the other hand (2.12) gives \( \omega _{\lambda }(\overline{x})\le C_{3}\) and

$$\begin{aligned} \varPsi _{\lambda }(\overline{x}) - C_{3} \le -\beta g(\overline{x}) \le 0, \end{aligned}$$

which is the desired contradiction. This proves the claim and the \(C_3\)-semiconcavity of \(u^\varepsilon _{\lambda }\), uniformly in \(\lambda \), for every \(0<\varepsilon \le 1\). Finally we let \(\varepsilon \rightarrow 0\) in (2.9) and get that the solution \(u_{\lambda }\) to (2.4) is semi-concave with constant \({\widetilde{C}}_{3}:=\sqrt{2(C_{1} + C_{2}) + 1}\). \(\square \)

Proof of Theorem 1

First we claim that \(\lambda u_{\lambda }({\bar{x}})=\underline{f}\) if \({{\bar{x}}}\in {\mathfrak {M}}\) (i.e., \(f(\bar{x})=\underline{f}= \min f\)), for all \(\lambda >0\). In fact, for such \({{\bar{x}}}\),

$$\begin{aligned} u_{\lambda }(\overline{x}) = \inf \limits _{\alpha _{\cdot }}\int _{0}^{+\infty }\left( \frac{1}{2}|\alpha _{t}|^{2} + f(x(t))\right) e^{-\lambda \,t} \, \text {d}t \le \int _{0}^{+\infty }f(\overline{x})e^{-\lambda \,t}\,\text {d}t = \underline{f}\lambda ^{-1}, \end{aligned}$$

where the inequality follows from the choice \(\alpha _{\cdot }\equiv 0\). The other inequality \(\ge \) is true for all \(x\in \mathbb {R}^{n}\) by Lemma 1, so the claim is proved.

Now we denote \(R:=\sqrt{4\Vert f\Vert _{\infty }}\) and use the gradient bound (2.8) to get

$$\begin{aligned} |\lambda u_{\lambda } (x)-\underline{f}|\le \lambda R \,\text {dist}(x, {\mathfrak {M}}) \quad \forall \, x\in \mathbb {R}^n. \end{aligned}$$

Then \(\lambda u_{\lambda }(x) \rightarrow \underline{f}\) locally uniformly.

Define \(\varphi _{\lambda }(\cdot ) := u_\lambda (\cdot ) - \underline{f}\lambda ^{-1}\ge 0\) and use (2.8) to get, for all \(x,y\in \mathbb {R}^n\),

$$\begin{aligned} |\varphi _{\lambda }(x)| \le R \,\text {dist}(x, {\mathfrak {M}}) , \quad \quad |\varphi _{\lambda }(x)-\varphi _{\lambda }(y)| \le R\,|x-y|. \end{aligned}$$
(2.13)

Hence, \(\{\varphi _{\lambda }(\cdot )\}_{\lambda \in (0,1)}\) is a uniformly bounded and equi-continuous family on any ball of \(\mathbb {R}^n\). So we can choose a sequence \(\lambda _{k}\rightarrow 0\) as \(k\rightarrow +\infty \), such that \(\varphi _{\lambda _{k}}(\cdot )\rightarrow v(\cdot )\in C(\mathbb {R}^n) \) locally uniformly. Plugging \(\varphi _{\lambda }\) in (2.4) we get

$$\begin{aligned} \lambda \varphi _{\lambda } + \underline{f}+ \frac{1}{2}|D\varphi _{\lambda }(x)|^{2} = f(x),\quad x\in \mathbb {R}^n. \end{aligned}$$

We let \(\lambda _{k}\rightarrow 0\) and use the stability of viscosity solutions to find that v satisfies (2.6).

Now we note that (2.6) is an eikonal equation with right hand side \(f(x) - \underline{f}> 0\) in \(\mathbb {R}^n \setminus {\mathfrak {M}}\), \(v\ge 0\) and \(v=0\) on \(\partial {\mathfrak {M}}\). This Dirichlet boundary value problem is known to have a unique viscosity solution bounded from below. Therefore the limit does not depend on the subsequence, and the whole family \(\varphi _{\lambda }\) converges as \(\lambda \rightarrow 0\).

The convergence of the gradient \(Du_{\lambda }(\cdot )\) to \(Dv(\cdot )\) is a direct consequence of [11, Theorem 3.3.3], recalling the bounds (2.13) and using the uniform semiconcavity estimate in Lemma 2. \(\square \)

2.2 Long Time Asymptotics

Here we consider the evolutive Hamilton-Jacobi equation

$$\begin{aligned} \left\{ \begin{aligned} \partial _{t}u(x,t) + \frac{1}{2}|Du(x,t)|^{2}&= f(x),\quad (x,t)\in \mathbb {R}^n \times (0,+\infty ) ,\\ u(x,0)&= 0,\quad x\in \mathbb {R}^n, \end{aligned}\right. \end{aligned}$$
(2.14)

where \(D=\nabla =D_x\) denotes the gradient with respect to the space variables x, and we will study the limit as \(t\rightarrow +\infty \). The viscosity solution \(u(x,t)\) is known to be the value function of the following finite-horizon optimal control problem

$$\begin{aligned} \begin{aligned} u(x,t) = \inf \limits _{\alpha _{\cdot }} \;&J(x,t,\alpha _{\cdot }):=\int _{0}^{t}\; \frac{1}{2}|\alpha _{s}|^{2} + f(x(s))\;\text {d}s,\\&\text {s.t. }\; {\dot{x}}(s) = \alpha _{s},\quad x(0)=x\in \mathbb {R}^n, \end{aligned} \end{aligned}$$
(2.15)

where \(\alpha _{\cdot }:[0,+\infty )\rightarrow \mathbb {R}^{n}\) are measurable functions. The main result of this section is the following.

Theorem 2

Under assumptions (A), as \(t\rightarrow +\infty \),

$$\begin{aligned} \frac{u(x,t)}{t} \rightarrow \underline{f} \quad \text {and} \quad u(x,t) - \underline{f}\,t \rightarrow v(x) \quad \text {locally uniformly in } \mathbb {R}^{n}, \end{aligned}$$

where \(v(\cdot )\) is the viscosity solution of (2.6) found in Theorem 1.

If we assume moreover that assumptions (B) hold, then

$$\begin{aligned} D_x u(x,t) \rightarrow Dv(x) \quad a.e. \end{aligned}$$

To proceed with its proof we need some estimates uniform in t.

Lemma 3

Under the assumption (A1), for all \((x,t)\in \mathbb {R}^n \times (0,+\infty )\),

$$\begin{aligned} \underline{f}\le \frac{u(x,t)}{t} \le \overline{f} , \end{aligned}$$
(2.16)
$$\begin{aligned} \left| \partial _{t}u(x,t)\right| \le \Vert f\Vert _{\infty } \quad \text {a.e.}, \end{aligned}$$
(2.17)
$$\begin{aligned} |Du(x,t)|\le \sqrt{4\Vert f\Vert _{\infty }} \quad \text {a.e.} \end{aligned}$$
(2.18)

Proof

The arguments are standard; for the reader's convenience we show (2.17). Fix \(h\in \mathbb {R}\) and \(x\in \mathbb {R}^n\). Note first that \( |u(x,h)|\le |h|\Vert f\Vert _{\infty }\). Let us now denote \(\overline{v}(x,t) := u(x,t+h) + |h|\Vert f\Vert _{\infty }\). Both u and \(\overline{v}\) solve the same PDE in (2.14), with initial conditions \(u(x,0) = 0\) and \(\overline{v}(x,0) = u(x,h) + |h|\Vert f\Vert _{\infty } \ge 0\); hence by the comparison principle in [15, Theorem 2.1] we get \(u(x,t)\le \overline{v}(x,t)\).

Conversely, \(\underline{v}(x,t) := u(x,t+h) - |h|\Vert f\Vert _{\infty }\) solves the same PDE in (2.14) with initial condition \(\underline{v}(x,0) = u(x,h) - |h|\Vert f\Vert _{\infty } \le u(x,0) = 0\). The same comparison principle now implies that \(\underline{v}(x,t)\le u(x,t)\). Therefore, one gets \(|u(x,t+h)-u(x,t)|\le |h|\Vert f\Vert _{\infty }\). \(\square \)
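In the same way, the space gradient bound (2.18) can be read off from (2.17) and the equation in (2.14) at any point of differentiability of u:

$$\begin{aligned} \tfrac{1}{2}|Du(x,t)|^{2} = f(x) - \partial _{t}u(x,t) \le 2\Vert f\Vert _{\infty }. \end{aligned}$$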

Lemma 4

Assume (A) and (B) hold. Then u is \({\widetilde{C}}_{3}\)-semiconcave, where \({\widetilde{C}}_{3}\) is a positive constant independent of \(t\ge 0\).

Proof

As we did in the proof of Lemma 2, we consider the vanishing viscosity approximation

$$\begin{aligned} \left\{ \begin{aligned}&\partial _{t}u^{\varepsilon } - \varepsilon \varDelta u^{\varepsilon } + \frac{1}{2}|\nabla u^{\varepsilon }|^{2} = f(x),\quad (x,t)\in \mathbb {R}^{n}\times (0, \infty )\\&u^{\varepsilon }(x,0) = 0,\quad x\in \mathbb {R}^n \end{aligned} \right. \end{aligned}$$
(2.19)

It is known that \(u^\varepsilon \) is the value function of the stochastic control problem

$$\begin{aligned} u^{\varepsilon }(x,t) = \inf \limits _{\alpha _{\cdot }\in {\mathcal {A}}} \mathbb {E}\left[ \int _{0}^{t}\frac{1}{2}|\alpha _{s}|^{2} + f(X_{s})\,\text {d}s\,\bigg |\, X_{0}=x\right] , \quad \text {d}X_{s} = \alpha _{s}\,\text {d}s + \sqrt{2\varepsilon }\,\text {d}W_{s}. \end{aligned}$$
(2.20)

Take \(\xi \in \mathbb {R}^{n}\) with \(|\xi | = 1\) and let \(\omega (x,t) := D^{2}_{\xi \xi }u^{\varepsilon }(x,t)\) be the second order derivative in space in the direction \(\xi \). We claim first that \(\omega (x,t)\le t\,C_{2}\) or, equivalently, the value function \(u^{\varepsilon }(x,t)\) is \(t\,C_{2}\)-semiconcave in the spatial variable x. Let \(\delta >0\) and take a \(\frac{\delta }{2}\)-optimal control for the initial point x. By using the same control for the initial points \(x+h\) and \(x-h\) we get

$$\begin{aligned} \begin{aligned}&u^{\varepsilon }(x+h,t) - 2 u^{\varepsilon }(x,t) + u^{\varepsilon }(x-h,t) - \delta \\&\le \mathbb {E}\left[ \int _{0}^{t} f(X^{x+h}_{s}) - 2 f(X^{x}_{s}) + f(X^{x-h}_{s})\; \text {d}s\right] . \end{aligned} \end{aligned}$$
(2.21)

From the controlled diffusion in (2.20), since the three trajectories are driven by the same control and the same Brownian motion, we have \(X^{x}_{s} = \frac{1}{2}\left( X^{x+h}_{s} + X^{x-h}_{s}\right) \), and the \(C_{2}\)-semiconcavity of f implies

$$\begin{aligned} \begin{aligned}&\mathbb {E}\left[ \int _{0}^{t} f(X^{x+h}_{s}) - 2 f(X^{x}_{s}) + f(X^{x-h}_{s}) \; \text {d}s\right] \\&\le \; C_{2} \mathbb {E}\left[ \int _{0}^{t} \frac{1}{4}\left| X^{x+h}_{s} - X^{x-h}_{s}\right| ^{2} \; \text {d}s\right] \le \; t\,C_{2} \,|h|^{2}. \end{aligned} \end{aligned}$$
(2.22)

Since \(\delta >0\) is arbitrary we have proved the claim. Similar computations (see [27]) yield

$$\begin{aligned} |Du^{\varepsilon }(x,t)| \le t \,C_{1}. \end{aligned}$$
(2.23)

Next we differentiate twice (2.19) in the direction of \(\xi \) and obtain

$$\begin{aligned} \partial _{t}\omega - \varepsilon \varDelta \omega + Du^{\varepsilon }\cdot D\omega + |D_{\xi }Du^{\varepsilon }|^{2} = D_{\xi \xi }f,\quad \text {in } \mathbb {R}^n\times (0,T]. \end{aligned}$$
(2.24)

Since \(\omega ^{2}\le |D_{\xi }D u^{\varepsilon }|^{2}\), the semiconcavity assumption \(D^{2}_{\xi \xi }f \le C_{2}\) gives

$$\begin{aligned} \partial _{t}\omega - \varepsilon \varDelta \omega + Du^{\varepsilon }\cdot D\omega + \omega ^{2} \le C_{2},\quad \text {in } \mathbb {R}^n\times (0,+\infty ). \end{aligned}$$
(2.25)

Now set \(g(x) := \log (1+|x|^{2})\) and \(\varPhi (x,t):= \omega (x,t) - \beta g(x)\), in \(\mathbb {R}^n\times (0,+\infty )\) for some \(\beta >0\) to be made precise. Since \(\omega \) is bounded from above for \(0\le t\le T\), \(\varPhi \) admits a global maximum in \(\mathbb {R}^n \times [0,T]\). Let \((\overline{x},\overline{t})\) be such a maximum point. We consider first the case \(\overline{t}\in (0,T)\) and evaluate (2.25) in \((\overline{x},\overline{t})\) to get

$$\begin{aligned} \omega ^{2}(\overline{x},\overline{t}) \le C_{2} + 2 \varepsilon \beta \frac{n+(n-2)|\overline{x}|^{2}}{(1+|\overline{x}|^{2})^{2}}-2\beta Du^{\varepsilon }(\overline{x},\overline{t})\cdot \frac{\overline{x}}{1+|\overline{x}|^{2}}. \end{aligned}$$
(2.26)

Note that \(x\in \mathbb {R}^n\mapsto \frac{n+(n-2)|x|^{2}}{(1+|x|^{2})^{2}}\) has a global maximum in \(x=0\), and \(\frac{x}{1+|x|^{2}}\) is bounded. Then, by (2.23) the bound in (2.26) gives

$$\begin{aligned} \omega ^{2}(\overline{x},\overline{t}) \le C_{2} + 2\varepsilon \beta n + 2\beta \,T\, C_{1}. \end{aligned}$$

We choose \(\beta \) and T such that \(\beta \le 1/(2T)<1\). Then

$$\begin{aligned} \omega (\overline{x},\overline{t})^{2} \le C_{2} + C_{1} + 2n\varepsilon . \end{aligned}$$
(2.27)

On the other hand, if \(\overline{t}= 0\), then \(u^{\varepsilon }(x,0)=0\) for all x implies \(\omega (\overline{x},0) = 0\) and (2.27) still holds. And if \(\overline{t}= T\), then \(\partial _{t}\varPhi (\overline{x},T) \ge 0\), i.e., \(\partial _{t}\omega (\overline{x},T)\ge 0\), and (2.27) still holds. Therefore we have

$$\begin{aligned} \omega (\overline{x},\overline{t}) \le C_{3}:=\sqrt{C_{1} + C_{2} + 2\varepsilon n}. \end{aligned}$$
(2.28)

We are now ready to prove that \(\omega (x,t) \le C_{3}\) for all \((x,t)\in \mathbb {R}^n\times (0,+\infty )\). As in the proof of Lemma 2 we suppose by contradiction there exists \((y,s)\) such that \(\omega (y,s) - C_{3}=: \delta >0\). Without loss of generality, we can choose \(T>0\) large enough such that \(s< T\). Then we argue exactly as in the proof of Lemma 2 and reach a contradiction by choosing \(\beta \) such that \(\beta g(y)\le \frac{\delta }{2}\). This proves the \(C_3\)-semiconcavity of \(u^{\varepsilon }\) with respect to x uniformly in t, for every \(0<\varepsilon \le 1\). Finally, we let \(\varepsilon \rightarrow 0\) in (2.19) and get that the solution u to (2.14) is semiconcave in x with constant \({\widetilde{C}}_{3}:=\sqrt{C_{1} + C_{2}}\). \(\square \)

Proof of Theorem 2

First we observe that \(\frac{1}{t} u({\bar{x}},t) = \underline{f}\) for all \(t>0\) if \({{\bar{x}}}\in {\mathfrak {M}}\).

In fact, for such \({{\bar{x}}}\),

$$\begin{aligned} u(\overline{x},t)=\inf \limits _{\alpha _{\cdot }} \int _{0}^{t}\frac{1}{2}|\alpha _{s}|^{2}+ f(x(s))\,\text {d}s \le \int _{0}^{t}f(\overline{x})\,\text {d}s = t\,\underline{f}, \end{aligned}$$

where the inequality follows from the choice \(\alpha _{\cdot }\equiv 0\). The other inequality \(\ge \) is true for all \(x\in \mathbb {R}^{n}\) by Lemma 3.

Denote \(R:=\sqrt{4\Vert f\Vert _{\infty }}\) and use the gradient bound (2.18) to get

$$\begin{aligned} \left| \frac{1}{t} u(x,t)-\underline{f}\right| \le \frac{1}{t} R \,\text {dist}(x, {\mathfrak {M}}) \quad \forall \, x\in \mathbb {R}^n, \; t>0. \end{aligned}$$

Then \(u(x,t)/ t\rightarrow \underline{f}\) locally uniformly as \(t\rightarrow \infty \).

Define now \(\varphi _{t}(\cdot ):=u(\cdot ,t)-\underline{f}t\). We observe that, in view of (2.18), \(|\varphi _{t}(x)|\le R\, \text {dist}(x, {\mathfrak {M}})\) and \(|\varphi _{t}(x) - \varphi _{t}(y)|\le R|x-y|\). Hence, \(\{\varphi _{t}(\cdot )\}_{t\ge 0}\) is a locally uniformly bounded and equi-continuous family. We claim that \(\varphi _{t}(\cdot )\rightarrow \psi (\cdot )\in C(\mathbb {R}^n)\) locally uniformly as \(t\rightarrow +\infty \) and \(\psi (\cdot )\) is a viscosity solution of

$$\begin{aligned} \underline{f} + \frac{1}{2}|D\psi (x)|^{2} = f(x),\quad \text { in } \mathbb {R}^{n}. \end{aligned}$$
(2.29)

To prove the claim define \(u_{\eta }(x,t) :=\varphi _{ {t}/{\eta }} \left( x\right) = u\left( x,\frac{t}{\eta }\right) -\frac{t}{\eta }\underline{f}\). Then we have

$$\begin{aligned} \eta \partial _{t}u_{\eta } + \underline{f} + \frac{1}{2}|Du_{\eta }|^{2} = f(x) ,\quad \text { in } \mathbb {R}^{n}\times (0,\infty ). \end{aligned}$$

Now consider the upper and lower relaxed semilimits

$$\begin{aligned} \theta (x,t):= \limsup _{\eta \rightarrow 0,\, s\rightarrow t,\, y\rightarrow x} u_\eta (y,s) , \quad \zeta (x,t) :=\liminf _{\eta \rightarrow 0,\, s\rightarrow t,\, y\rightarrow x} u_\eta (y,s), \end{aligned}$$

and note that they are finite by the local equiboundedness of \(\varphi _t\). It is well-known from the stability properties of viscosity solutions (see, e.g., [4]) that they are, respectively, a sub- and supersolution of (2.29) for any \(t>0\). Moreover, for all \(t>0\),

$$\begin{aligned} \theta (x,t)= \limsup _{s\rightarrow +\infty ,\, y\rightarrow x} \varphi _s(y) = \limsup _{s\rightarrow +\infty } \varphi _s(x), \end{aligned}$$

where the last equality comes from the equicontinuity of \(\varphi _t\). Similarly,

$$\begin{aligned} \zeta (x,t)=\liminf _{s\rightarrow +\infty } \varphi _s(x), \end{aligned}$$

and so both \(\theta \) and \(\zeta \) do not depend on t. Next note that \(\varphi _s(x)=0\) for all \(x\in {\mathfrak {M}}\) and it is non-negative everywhere. Then \(\theta (x) = \zeta (x) = 0\) on \(\partial {\mathfrak {M}}\), and they are a sub- and a supersolution bounded from below of (2.29) in \(\mathbb {R}^n\setminus {\mathfrak {M}}\), where \(f(x)-\underline{f}>0\). Then a standard comparison principle for the Dirichlet problem associated to eikonal equations gives \(\theta (x) = \zeta (x)\). This proves that \(\varphi _t\) converges pointwise to \(\psi :=\theta =\zeta \ge 0\), and the convergence is locally uniform by the Ascoli–Arzelà theorem, which gives the claim. Moreover \(\psi \) coincides with the function v found in Theorem 1, by the uniqueness for the same Dirichlet problem.

Finally, the convergence of the gradient \(D_xu(\cdot ,t)=D\varphi _t\) to \(D\psi \) is a direct consequence of [11, Theorem 3.3.3], recalling that \(|\varphi _{t}(x)|\le R\, \text {dist}(x, {\mathfrak {M}})\) and using the uniform semiconcavity estimate in Lemma 4. \(\square \)
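To illustrate Theorem 2, the following Python sketch (purely numerical and not part of the analysis) integrates (2.14) by a standard monotone upwind scheme for the one-dimensional example \(f(x)=1-\cos x\), treated with periodic boundary conditions; here \(\underline{f}=0\), \({\mathfrak {M}}=2\pi \mathbb {Z}\), and on \([0,2\pi ]\) the critical solution vanishing on \({\mathfrak {M}}\) is \(v(x)=4(1-|\cos (x/2)|)\). The grid size, time step, and final time are arbitrary choices.

    import numpy as np

    # f(x) = 1 - cos(x) on a periodic grid over [0, 2*pi); min f = 0.
    N = 400
    x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
    h = x[1] - x[0]
    f = 1.0 - np.cos(x)

    u = np.zeros(N)            # initial condition u(., 0) = 0
    dt = 0.1 * h               # well below the CFL limit, since |Du| <= sqrt(4*max f) = 2*sqrt(2)
    T = 30.0
    for _ in range(int(T / dt)):
        pm = (u - np.roll(u, 1)) / h       # backward difference D^- u (periodic)
        pp = (np.roll(u, -1) - u) / h      # forward difference  D^+ u (periodic)
        H = 0.5 * (np.maximum(pm, 0.0)**2 + np.minimum(pp, 0.0)**2)  # monotone upwind Hamiltonian for |p|^2/2
        u = u + dt * (f - H)               # explicit step of u_t + H(Du) = f

    v_exact = 4.0 * (1.0 - np.abs(np.cos(x / 2.0)))
    print(np.max(np.abs(u - v_exact)))     # u(., T) - T*min f = u(., T) is close to v
    print(np.max(u) / T)                   # u(., T)/T is close to min f = 0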

3 Reaching the Minima Via Optimal Control

3.1 The Optimal Control Problem with Target

In this section we consider the Dirichlet problem

$$\begin{aligned} \left\{ \quad \begin{aligned} |\nabla v(x)|&= \ell (x),&x\in \mathbb {R}^{n}\setminus {\mathfrak {M}} ,\\ v(x)&= 0,&x\in {\mathfrak {M}}, \end{aligned}\right. \end{aligned}$$
(3.1)

motivated by the ergodic equation (2.6) of the previous section if \(\ell (x)=\sqrt{ 2(f(x) - \underline{f}) }\). Here, however, the standing assumptions are only that \({\mathfrak {M}}\subseteq \mathbb {R}^n\) is a closed nonempty set, possibly unbounded, and

$$\begin{aligned} \ell \in C(\mathbb {R}^n) \text { is bounded }, \; \ell (x)>0 \text { if } x\in \mathbb {R}^{n}\setminus {\mathfrak {M}} , \;\; \ell \equiv 0 \text { on } {\mathfrak {M}}. \end{aligned}$$
(F)

Also define \({\overline{\ell }}:=\sup \limits _{x\in \mathbb {R}^{n}}\ell (x)\). The Lipschitz and semiconcavity conditions of the previous section (assumptions (B)) will not be needed in most statements of the present section.

We recall that the continuous viscosity solution of (3.1) is the value function of the control problem

$$\begin{aligned} v(x)=\inf \limits _{\alpha }\int _{0}^{t_{x}(\alpha )} \ell (y_{x}^{\alpha }(s))\,\text {d}s, \end{aligned}$$
(3.2)

where \(\alpha \) (an admissible control) is a measurable function \([0,+\infty ) \rightarrow B(0,1)\), the unit ball in \(\mathbb {R}^{n}\), \(t_{x}(\alpha ):=\inf \{s \ge 0\,:\, y_{x}^{\alpha }(s) \in {\mathfrak {M}}\}\), and

$$\begin{aligned} {\dot{y}}^{\alpha }_{x}(s) = \alpha (s),\,\forall \,s\ge 0,\quad y_{x}^{\alpha }(0)=x. \end{aligned}$$
(3.3)
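For orientation, in dimension one the infimum in (3.2) can be computed by hand: since \(\ell \ge 0\) and any admissible trajectory must pass through every point between x and the point where it hits the target, while the unit-speed straight motion is admissible, one can check that

$$\begin{aligned} v(x)=\min \left\{ \int _{z^{-}(x)}^{x}\ell (\xi )\,\text {d}\xi \,,\; \int _{x}^{z^{+}(x)}\ell (\xi )\,\text {d}\xi \right\} , \end{aligned}$$

where \(z^{-}(x)\le x\le z^{+}(x)\) are the closest points of \({\mathfrak {M}}\) on each side of x (a term is dropped if no such point exists), and the minimum is achieved by moving at unit speed towards the corresponding point.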

Theorem 3

Under Assumption (F) there exists an optimal control \(\alpha ^{*}\) for the problem (3.2).

Proof

Notice first that (F) allows us to rewrite v as

$$\begin{aligned} v(x)=\inf \int _{0}^{+\infty } \ell (y_{x}^{\alpha }(s))\,\text {d}s,\; \text { s.t.:}\; (3.3)\; \text {with}\; s\mapsto \alpha (s)\in B(0,1) \, \text { measurable}. \end{aligned}$$

Fix \(x\in \mathbb {R}^{n}\) and consider a minimizing sequence \((y_{k},\alpha _{k})_{k}\), i.e., satisfying

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty } \int _{0}^{+\infty } \ell (y_{k}(t))\,\text {d}t = v(x) , \quad y_{k}(t) = x +\int _{0}^{t}\alpha _{k}(s)\,\text {d}s,\; \forall \, t\ge 0 . \end{aligned}$$
(3.4)

Fix \(N\in \mathbb {N}\). Using Alaoglu’s theorem, we can extract a subsequence that we denote by \((y_{k(N)}, \alpha _{k(N)})\), where \(k(N)\rightarrow +\infty \), such that

$$\begin{aligned} \begin{aligned}&\alpha _{k(N)} \overset{*}{\rightharpoonup }\ \alpha ^{*}_{N}\; \text { in }\, L^{\infty }(0,N;\mathbb {R}^{n}) ,\\&y_{k(N)} \rightarrow y_{N}^{*},\; \text {loc. unif. on } \, [0,N] , \\ \text {and }\;&y_{N}^{*}(t) = x + \int _{0}^{t}\alpha _{N}^{*}(s)\,\text {d}s,\; \text {for all }\, t\in [0,N]. \end{aligned} \end{aligned}$$

We repeat this procedure in the interval \([0,N+1]\) and extract from the previous subsequence another subsequence \((y_{k(N+1)}, \alpha _{k(N+1)})\) with the same properties in \([0,N+1]\). Note that

$$\begin{aligned} \begin{aligned}&\alpha ^{*}_{N+1}=\alpha _{N}^{*},\; \text { a.e. in }\, [0,N].\\&y^{*}_{N+1}=y^{*}_{N},\; \text { in }\, [0,N]. \end{aligned} \end{aligned}$$

This suggests the definition of the candidate optimal pair \((y^{*},\alpha ^{*})\) as

$$\begin{aligned} (y^{*},\alpha ^{*}) :=(y^{*}_{N},\alpha ^{*}_{N})\quad \text { in } [0,N]. \end{aligned}$$

To prove its optimality consider the diagonal subsequence \((y_{N(N)},\alpha _{N(N)})\). By the previous construction, for any fixed \(T>0\) we have

$$\begin{aligned} \begin{aligned}&\alpha _{N(N)} \overset{*}{\rightharpoonup }\ \alpha ^{*}\; \text { in }\, L^{\infty }(0,T;\mathbb {R}^{n}),\\&y_{N(N)} \rightarrow y^{*},\; \text {loc. unif. on } \, [0,T],\\ \text {and }\;&y^{*}(t) = x + \int _{0}^{t}\alpha ^{*}(s)\,\text {d}s,\; \text {for all }\, t\in [0,T]. \end{aligned} \end{aligned}$$
(3.5)

Now use Fatou’s lemma

$$\begin{aligned} \int _{0}^{\infty } \liminf \limits _{N\rightarrow +\infty } \ell (y_{N(N)}(t))\,\text {d}t \le \liminf \limits _{N\rightarrow \infty }\int _{0}^{+\infty } \ell (y_{N(N)}(t))\,\text {d}t. \end{aligned}$$

By (3.4) the right-hand side equals v(x), because \(y_{N(N)}\) is a subsequence of \(y_{k}\). On the left-hand side, the continuity of \(\ell \) and the pointwise convergence \(y_{N(N)}(t)\rightarrow y^{*}(t)\) give

$$\begin{aligned} \int _{0}^{\infty } \ell (y^{*}(t))\,\text {d}t = \int _{0}^{\infty } \liminf \limits _{N\rightarrow +\infty } \ell (y_{N(N)}(t))\,\text {d}t \le v(x), \end{aligned}$$

which says that \((y^{*},\alpha ^{*})\) is an optimal pair solution to (3.2). \(\square \)

Next we show that the fraction of time spent by an optimal trajectory away from the minimizers of \(\ell \) tends to zero as \(t\rightarrow +\infty \).

For a given fixed \(\delta >0\) we define the set of quasi-minimizers

$$\begin{aligned} K_{\delta }:= \{x\in \mathbb {R}^n\,:\, \ell (x)\le \delta \}, \end{aligned}$$

and the fraction of time \(\rho ^{\delta }(t)\) spent by an optimal trajectory starting from x away from \(K_{\delta }\)

$$\begin{aligned} \rho ^{\delta }(t)= \rho ^{\delta }(t,x, \alpha ^{*}) := \frac{1}{t}\big |\{s\in [0,t]\,:\, y_{x}^{\alpha ^{*}}(s)\notin K_{\delta }\}\big |, \end{aligned}$$

where \(\big |I\big |\) denotes the Lebesgue measure of \(I\subseteq \mathbb {R}\). In other words, \(\rho ^{\delta }(t)\) is the mass that the occupational measure of the optimal trajectory \( y_{x}^{\alpha ^{*}}\) assigns to the complement of \(K_{\delta }\).
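Explicitly, denoting by \(\mu _{t,x}\) the occupational measure of \(y_{x}^{\alpha ^{*}}\) up to time t, i.e.

$$\begin{aligned} \mu _{t,x}(A):= \frac{1}{t}\big |\{s\in [0,t]\,:\, y_{x}^{\alpha ^{*}}(s)\in A\}\big |\quad \text {for Borel sets } A\subseteq \mathbb {R}^{n}, \end{aligned}$$

one has \(\rho ^{\delta }(t)=\mu _{t,x}\big (\mathbb {R}^{n}\setminus K_{\delta }\big )\).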

Theorem 4

Under Assumption (F), for any \(x\in \mathbb {R}^{n}\) and \(\delta >0\), an optimal trajectory \(y_{x}^{\alpha ^{*}}(\cdot )\) for the problem (3.2) satisfies

$$\begin{aligned} \rho ^{\delta }(t,x, \alpha ^{*}) \le \, \frac{{\overline{\ell }}}{t\,\delta }\,\text {dist}(x,{\mathfrak {M}}) . \end{aligned}$$
(3.6)

In particular, \( \lim \limits _{t\rightarrow + \infty } \rho ^{\delta }(t) = 0\).

Proof

Since \(\ell \ge 0\), using the characteristic function \(\mathbb {1}_{Q}(y)=1\) if \(y\in Q\) and 0 otherwise,

$$\begin{aligned} \int _{0}^{t}\ell (y_{x}^{\alpha ^{*}}(s))\text {d}s \ge \int _{0}^{t}\mathbb {1}_{K^{c}_{\delta }}(y_{x}^{\alpha ^{*}}(s))\,\ell (y_{x}^{\alpha ^{*}}(s))\,\text {d}s\;\ge \delta \,\int _{0}^{t}\mathbb {1}_{K^{c}_{\delta }}(y_{x}^{\alpha ^{*}}(s))\,\text {d}s, \end{aligned}$$

and hence

$$\begin{aligned} \frac{1}{t}\int _{0}^{t}\ell (y_{x}^{\alpha ^{*}}(s))\text {d}s\; \ge \; \delta \,\rho ^{\delta }(t). \end{aligned}$$

Now, since \(\ell (y_{x}^{\alpha ^{*}}(s)) = 0\) for all \(s\ge t_{x}(\alpha ^{*})\) and \(\ell (\cdot )\le \bar{\ell }\), we have for all \(t\ge 0\)

$$\begin{aligned} \int _{0}^{t}\ell (y_{x}^{\alpha ^{*}}(s))\,\text {d}s&\;\le \; \int _{0}^{t_{x}(\alpha ^{*})}\ell (y_{x}^{\alpha ^{*}}(s))\,\text {d}s \;=\; v(x)\\&\;\le \; \bar{\ell }\; \inf \left\{ t_{x}(\alpha ) : (3.3) \text { holds with } |\alpha (s)|\le 1\right\} . \end{aligned}$$

The second factor on the right-hand side is the minimal time function, whose optimal trajectories are the straight lines from the initial position x to its orthogonal projection on the set \({\mathfrak {M}}\), travelled with maximal speed 1. Therefore the right-hand side in the last inequality is less than or equal to \(\bar{\ell }\,|z-x|\) for any \(z\in {\mathfrak {M}}\), and then

$$\begin{aligned} v(x) \le \bar{\ell } \;\text {dist}(x,{\mathfrak {M}}). \end{aligned}$$

Combining the inequalities we get

$$\begin{aligned} 0 \le \; \delta \,\rho ^{\delta }(t) \le \; \frac{1}{t}\int _{0}^{t}\ell (y_{x}^{\alpha ^{*}}(s))\,\text {d}s \le \; \frac{v(x)}{t} \le \; \frac{\bar{\ell }}{t} \,\text {dist}(x,{\mathfrak {M}}), \end{aligned}$$

which concludes the proof. \(\square \)

3.2 A Gradient Descent Inclusion for the Optimal Trajectories

So far we have shown that an optimal control exists and that the fraction of time the corresponding optimal trajectory spends away from the quasi-minimizers of \(\ell \) vanishes as time goes to infinity, in the sense of (3.6). We now synthesize optimal feedback controls that give the gradient descent differential inclusion anticipated in the Introduction. We recall the definition of the subdifferential of a continuous function

$$\begin{aligned} D^{-}v(z) :=\left\{ \, p \, :\; \liminf \limits _{x\rightarrow z} \frac{v(x) - v(z) - p\cdot (x-z)}{|x-z|} \ge 0 \right\} . \end{aligned}$$
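For example, for \(v(x)=|x|\) one has

$$\begin{aligned} D^{-}v(x)=\left\{ \frac{x}{|x|}\right\} \;\text { for } x\ne 0,\qquad D^{-}v(0)=\overline{B}(0,1), \end{aligned}$$

so that at \(z=0\) the subdifferential is multivalued, while \(D^{+}v(0)=\emptyset \).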

Theorem 5

Assume (F). A control \(\alpha \) with corresponding trajectory \(y(\cdot ):=y_{x}^{\alpha }(\cdot )\) is optimal if and only if

$$\begin{aligned} {\dot{y}}(s)\in \left\{ - \frac{p}{|p|}\,,\; p\in D^{-}v(y(s))\right\} ,\; \text { for a.e. }\, s\in \;(0,t_{x}(\alpha )). \end{aligned}$$
(DI)

Proof

By the dynamic programming principle, the function

$$\begin{aligned} h(t):=v(y_{x}^{\alpha }(t)) + \int _{0}^{t} \ell (y_{x}^{\alpha }(s))\text {d}s,\quad 0\le t\le t_{x}(\alpha ), \end{aligned}$$
(3.7)

is non-decreasing for all \(\alpha \), and non-increasing (hence constant) if and only if \(\alpha \) is optimal. And since h is locally Lipschitz, we get

$$\begin{aligned} \alpha \,\text { is optimal }\, \text { if and only if }\;\; h'(t)\le 0\; \text { a.e. } t. \end{aligned}$$

Proof of Necessity. Assume \(\alpha \) is optimal, and so \(h'\le 0\). Let \(y(\cdot ):=y_{x}^{\alpha }(\cdot )\).

Claim 1. \(p\cdot {\dot{y}}(t) + \ell (y(t))\le 0\) for all \(p\in D^{-}v(y(t))\) a.e. t.

Let \(\partial ^{-}v(x;q)\) be the lower Dini derivative at x in the direction q (see Eq. (2.47) in [4, p. 125]). Then by [4, Lemma 2.50, p. 135], one has

$$\begin{aligned} \partial ^{-}(v\circ y)(s;1) = \partial ^{-}v(y(s);{\dot{y}}(s)), \end{aligned}$$

and for almost every t, \(h'(t) = \partial ^{-}v(y(t);{\dot{y}}(t)) + \ell (y(t))\). Next, using [4, Lemma 2.37, p. 126], one has, for any \(z\in \mathbb {R}^{n}\),

$$\begin{aligned} D^{-}v(z) = \{\,p\,:\;p\cdot q \le \partial ^{-}v(z;q),\;\forall \,q\in \mathbb {R}^{n}\}, \end{aligned}$$

and hence, for almost every t and for all \(p\in D^{-}v(y(t))\),

$$\begin{aligned} p\cdot {\dot{y}}(t) + \ell (y(t))\le \partial ^{-}v(y(t);{\dot{y}}(t)) + \ell (y(t)) = h'(t)\le 0. \end{aligned}$$

Claim 2. \({\dot{y}}(t) = -\frac{p}{|p|}\) for all \(p\in D^{-}v(y(t))\), a.e. t.

By [4, Proposition 5.3, p. 344], v is a bilateral supersolution of \(|Dv(x)|-\ell (x) = 0\) in \(\mathbb {R}^{n}\setminus {\mathfrak {M}}\), i.e. \(|p|-\ell (x)=0\) for all \(p\in D^{-}v(x)\). This implies in particular that \(p\ne 0\) if \(x\notin {\mathfrak {M}}\). Hence, using Claim 1 together with \(|{\dot{y}}(t)|\le 1\), one gets

$$\begin{aligned} |p|=\ell (y(t)) \le -p\cdot {\dot{y}}(t)\le |p|, \end{aligned}$$

that is, \({\dot{y}}(t) = - {p}/{|p|}\).

Proof of Sufficiency. By the non-smooth calculus rule just recalled, for a.e. t,

$$\begin{aligned} h'(t)&= - \partial ^{-}v(y(t);-{\dot{y}}(t)) + \ell (y(t)) \\&\le -p\cdot (-{\dot{y}}(t)) + \ell (y(t)),\quad \forall \,p\in D^{-}v(y(t)). \end{aligned}$$

Then, if we assume \(y(\cdot )\) solves (DI),

$$\begin{aligned} h'(t) \le -p\cdot \frac{p}{|p|} + \ell (y(t)) = -|p|+\ell (y(t)) \le 0 \end{aligned}$$

because v is a supersolution of \(|Dv|-\ell = 0\) and \(p\in D^{-}v(y(t))\). \(\square \)

Remark 1

Combining Theorems 3 and 5, the differential inclusion (DI) has at least one solution, and all its solutions are optimal.

We recall the definition of limiting gradient of a Lipschitz function

$$\begin{aligned} D^{*}v(z) :=\{ \,p\,:\; p=\lim \limits _{n\rightarrow +\infty } Dv(x_{n})\;\text { for some }\, x_{n}\rightarrow z \,\text { at which } v \text { is differentiable} \}, \end{aligned}$$

and the super-differential of a continuous function

$$\begin{aligned} D^{+}v(z) :=\left\{ \, p \, :\; \limsup \limits _{x\rightarrow z} \frac{v(x) - v(z) - p\cdot (x-z)}{|x-z|} \le 0 \right\} . \end{aligned}$$

Theorem 6

Assume (F). The following necessary and sufficient conditions of optimality hold.

  1. (I)

    If \(y(\cdot )\) is optimal, then

    1. (i)

      \({\dot{y}}(t)=-\frac{p}{|p|}\), for all \(p\in D^{+}v(y(t)),\, p\ne 0\) and almost all \(t\in (0,t_{x}(\alpha ^{*}))\),

    2. (ii)

      \(|p|=\ell (y(t))\), for all \(p\in D^{+}v(y(t))\) and all \(t\in (0,t_{x}(\alpha ^{*}))\),

    3. (iii)

      \(D^{+}v(y(t))\) is a singleton for all \(t\in (0,t_{x}(\alpha ^{*}))\).

    4. (iv)

      If \(\ell (x)=\sqrt{ 2(f(x) - \underline{f}) }\) and assumptions (A) and (B) are satisfied, then v is differentiable at all points y(t) with \(t\in (0,t_{x}(\alpha ^{*}))\) and

      $$\begin{aligned} {\dot{y}}(t)= - \frac{Dv(y(t))}{|Dv(y(t))|},\quad \forall \,t\in (0,t_{x}(\alpha ^{*})). \end{aligned}$$
      (3.8)
  2. (II)

    A sufficient condition for the optimality of \(y(\cdot )\) is

    $$\begin{aligned} {\dot{y}}(t) \in - \left\{ \,\frac{p}{|p|}\;:\;p\in D^{*}v(y(t))\cap D^{+}v(y(t)),\,p\ne 0 \right\} ,\, \text { a.e. } t. \end{aligned}$$
    (3.9)

Proof

To prove (I.i) we take h defined by (3.7) and let \(\partial ^{+}v(x;q)\) be the upper Dini derivative of v in direction q, with \(|q|=1\).

Claim 1. \(p\cdot {\dot{y}}(t) + \ell (y(t))\le 0\),  for all \(p\in D^{+}v(y(t))\), a.e. t.

Using [4, Lemma 2.37, p. 126], one has, for any \(z\in \mathbb {R}^{n}\)

$$\begin{aligned} D^{+}v(z) = \bigg \{\,p\;:\;p\cdot q \ge \partial ^{+}v(z;q),\; \forall \,q\in \mathbb {R}^{n} \bigg \}. \end{aligned}$$

Hence, for \(p\in D^{+}v(y(t))\), one has

$$\begin{aligned} p\cdot {\dot{y}}(t) + \ell (y(t)) = -p\cdot (-{\dot{y}}(t)) + \ell (y(t)) \le -\partial ^{+}v(y(t);-{\dot{y}}(t)) + \ell (y(t)). \end{aligned}$$

But, as in Claim 1 in the proof of Theorem 5, and since y is optimal, one gets

$$\begin{aligned} -\partial ^{+}v(y(t);-{\dot{y}}(t)) + \ell (y(t)) = h'(t)\le 0, \end{aligned}$$

which proves the claim.

Claim 2. \({\dot{y}}(t) = -\frac{p}{|p|}\) for all \(p\in D^{+}v(y(t)), p\ne 0\), a.e. t.

Recalling that \(|{\dot{y}}(t)|\le 1\), that v is a subsolution of \(|Dv|-\ell =0\), and Claim 1, we have for all \(p\in D^{+}v(y(t))\), \(|p|\le \ell (y(t))\le -p\cdot {\dot{y}}(t) \le |p|\), and hence either \(p=0\) or \({\dot{y}}(t) = -\frac{p}{|p|}\).

To prove (I.ii) we use the fact that h is non-increasing if and only if \(y(\cdot )\) is optimal. Hence, for \(t>0\) and \(\tau >0\) small, one has

$$\begin{aligned} { \begin{aligned} h(t) - h(t-\tau ) \le 0&\Rightarrow v(y(t))-v(y(t-\tau )) + \int _{t-\tau }^{t}\ell (y(s))\text {d}s \le 0\\&\Rightarrow v(y(t))-v(y(t-\tau )) \le -\ell (y(t))\tau + o(\tau ). \end{aligned} } \end{aligned}$$

Recalling the definition of \(p\in D^{+}v(y(t))\), one has

$$\begin{aligned} { \begin{aligned}&v(y(t)) - v(y(t-\tau )) \ge p\cdot (y(t) - y(t-\tau ) ) + o(\tau )\\ \Rightarrow \;&v(y(t)) - v(y(t-\tau )) \ge \int _{t-\tau }^{t} p\cdot \alpha (s)\text {d}s + o(\tau ) \ge -|p|\tau + o(\tau ), \end{aligned} } \end{aligned}$$

and together with the previous inequality this yields

$$\begin{aligned} |p| \ge \ell (y(t)),\quad \forall \, t\in (0,t_{x}(\alpha ^*)). \end{aligned}$$

The other inequality is a direct consequence of p being in \(D^{+}v(y(t))\) and v a subsolution. This concludes the proof of statement (I.ii).

The property (I.iii) follows immediately from the equality \(|p| = \ell (y(t))\) for all \(p\in D^{+}v(y(t))\) and the convexity of the set \(D^{+}v(y(t))\): if it contained two distinct points of equal norm, their midpoint would belong to it and have strictly smaller norm.

Under the additional conditions of (I.iv), v is semiconcave thanks to Lemma 2 (or Lemma 4).

This implies that v is differentiable at all points where the superdifferential is a singleton (see, e.g., [4, Proposition II.4.7 (c), p. 66]), and then at all y(t) with \(t\in (0,t_{x}(\alpha ^{*}))\). Hence, (DI) becomes (3.8).

To prove (II) note that at all points of differentiability of v one has \(|Dv(z)| = \ell (z)\); hence \(|p|=\ell (z)\) for all \(p\in D^{*}v(z)\). Moreover, for a.e. t one has

$$\begin{aligned} h'(t) = \partial ^{+}v(y(t);{\dot{y}}(t)) + \ell (y(t)) \le p\cdot {\dot{y}}(t) + \ell (y(t)),\quad \forall \,p\in D^{+}v(y(t)). \end{aligned}$$

Then, if y solves (3.9), taking p as in (3.9) (so that \(p\in D^{*}v(y(t))\cap D^{+}v(y(t))\) and \(p\ne 0\)), we get

$$\begin{aligned} h'(t)\le -p\cdot \frac{p}{|p|} + \ell (y(t)) = -|p|+\ell (y(t)) = 0 \end{aligned}$$

which concludes the proof, arguing as for Theorem 5. \(\square \)

3.3 Convergence of Optimal Trajectories to the Argmin

In order to show stability of \({\mathfrak {M}}\), we need an assumption which prevents \(\ell (\cdot )\) from approaching 0 when \(\text {dist}(x,{\mathfrak {M}})\rightarrow \infty \), that is,

  • for all \(\delta >0\), there exists \(\gamma =\gamma (\delta )>0\) such that

    $$\begin{aligned} \inf \{\ell (x)\,:\,\text {dist}(x,{\mathfrak {M}})\,> \delta \} \,>\, \gamma (\delta ). \end{aligned}$$
    (H)

If \({\mathfrak {M}}\) is bounded, then it is easy to see that this condition is equivalent to

$$\begin{aligned} \liminf _{|x|\rightarrow \infty } \ell (x)>0 , \end{aligned}$$

which is also equivalent to Assumption (A3) in [24], Assumption (L3)–(3.2) in [10], and Assumption (L3) in [9]. The last inequality, however, is impossible when \({\mathfrak {M}}\) is unbounded.

Remark 2

An example of function with a unique global minimizer that does not satisfy hypothesis (H) is \(\ell (x)=|x|e^{-x^2}.\) In this case \({\mathfrak {M}}=\{0\}\) and \(\inf \{ \ell (x) : |x|>\delta \} = 0\) for all \(\delta \).

A direct consequence of Theorem 5 is the following result.

Corollary 1

Assume the conditions (F) and (H). Let \(y_{x}^{\alpha ^{*}}(\cdot )\) be an optimal trajectory and \(\delta >0\). If there exists \(\tau >0\) such that \(\text {dist}(y^{*}(\tau ),{\mathfrak {M}})>\delta \), then, for \(\gamma (\cdot )\) defined in (H),

$$\begin{aligned} \rho ^{\gamma (\delta /2)}(t) \ge \frac{\delta }{t},\quad \quad \forall \,t>\tau + \frac{\delta }{2}. \end{aligned}$$
(3.10)

Proof

Set \(y^{*}(\cdot ):=y_{x}^{\alpha ^{*}}(\cdot )\). Since it satisfies (DI), we have \(|{\dot{y}}^{*}(\cdot )|\le 1\) and hence \(y^{*}(\cdot )\) is Lipschitz continuous. Therefore, given \(\delta >0\), if there exists \(\tau >0\) such that \(\text {dist}(y^{*}(\tau ),{\mathfrak {M}})>\delta \), then

$$\begin{aligned} \begin{aligned} \delta < \text {dist}(y^{*}(\tau ),{\mathfrak {M}})&\le \text {dist}(y^{*}(s),{\mathfrak {M}}) + |y^{*}(s)-y^{*}(\tau )|\\&\le \text {dist}(y^{*}(s),{\mathfrak {M}}) + |s-\tau |,\\ \end{aligned} \end{aligned}$$

which yields

$$\begin{aligned} \text {dist}(y^{*}(s),{\mathfrak {M}})\,>\,\frac{\delta }{2},\quad \forall \,s\in ]\tau -\delta /2,\tau +\delta /2[. \end{aligned}$$

Hence one has

$$\begin{aligned} \ell (y^{*}(s))\ge \inf \left\{ \ell (x)\,:\,\text {dist}(x,{\mathfrak {M}})\,> \frac{\delta }{2}\right\} ,\quad \forall \,s\in ]\tau -\delta /2,\tau +\delta /2[, \end{aligned}$$

and together with (H), one gets

$$\begin{aligned} \ell (y^{*}(s))> \gamma (\delta /2),\quad \forall \,s\in ]\tau -\delta /2,\tau +\delta /2[. \end{aligned}$$
(3.11)

Therefore

$$\begin{aligned} \bigg |\{s\in [0,t]\,:\, y^{*}(s) \notin K_{\gamma (\delta /2)}\}\bigg | \ge \bigg |\,]\tau -\delta /2,\tau +\delta /2[\,\bigg |,\quad \forall \,t>\tau +\frac{\delta }{2}. \end{aligned}$$

The latter can be rewritten as

$$\begin{aligned} t\,\rho ^{\gamma (\delta /2)}(t) \ge \delta \end{aligned}$$

and concludes the proof. \(\square \)

We are now ready to show stability properties of the set of global minimizers \({\mathfrak {M}}\) with respect to the optimal trajectories \(y_{x}^{\alpha ^{*}}(\cdot )\).

Theorem 7

Assume (F) and (H) hold. Then for \(y^{*}(\cdot )\) as in (DI),

  1. (i)

    \({\mathfrak {M}}\) is Lyapunov stable,

  2. (ii)

    \({\mathfrak {M}}\) is globally asymptotically stable.

Proof

Let \(y^{*}(\cdot ):=y_{x}^{\alpha ^{*}}(\cdot )\) be an optimal trajectory, i.e., a solution of (DI). We proceed by contradiction.

Proof of (i). Let \(\varepsilon >0\) be fixed and suppose, by contradiction, that for every \(\eta >0\) there exist an initial point x with \(\text {dist}(x,{\mathfrak {M}})<\eta \), an optimal trajectory \(y^{*}\) from x, and \(\tau >0\) such that \(\text {dist}(y^{*}(\tau ),{\mathfrak {M}})>\varepsilon \). Then from Corollary 1, one has

$$\begin{aligned} \rho ^{\gamma (\varepsilon /2)}(t) \ge \frac{\varepsilon }{t},\quad \forall \, t > \tau +\frac{\varepsilon }{2}. \end{aligned}$$

And from Theorem 4, one has

$$\begin{aligned} \frac{t\,\gamma (\varepsilon /2)}{{\overline{\ell }}} \rho ^{\gamma (\varepsilon /2)}(t) \le \text {dist}(x,{\mathfrak {M}}). \end{aligned}$$

Therefore one gets

$$\begin{aligned} \frac{\varepsilon \,\gamma (\varepsilon /2)}{{\overline{\ell }}} \le \text {dist}(x,{\mathfrak {M}}), \end{aligned}$$

which contradicts \(\text {dist}(x,{\mathfrak {M}})<\eta \) when we choose \(\eta < \frac{\varepsilon \,\gamma (\varepsilon /2)}{{\overline{\ell }}}\). Hence we can conclude that, for all \(\varepsilon >0\), there exists \(\eta >0\) such that if \(\text {dist}(x,{\mathfrak {M}})\le \eta \) then \(\text {dist}(y^{*}(t),{\mathfrak {M}})\le \varepsilon \) for all t.

Proof of (ii). Suppose there exist a diverging sequence \(\{\tau _{k}\}_{k\ge 0}\) and \(\varepsilon >0\) such that \(\text {dist}(y^{*}(\tau _{k}),{\mathfrak {M}})>\varepsilon \). Without loss of generality, one can extract a subsequence (again denoted by \(\tau _{k}\)) such that \(\tau _{k+1}-\tau _{k}\ge \varepsilon \). Using Corollary 1, in particular (3.11), one has for all \(k\ge 0\)

$$\begin{aligned} \ell (y^{*}(s))\ge \gamma (\varepsilon /2),\quad \forall \,s\in ]\tau _{k}-\varepsilon /2,\tau _{k}+\varepsilon /2[, \end{aligned}$$

and therefore

$$\begin{aligned} \bigg |\{s\in [0,t]\,:\, y^{*}(s) \notin K_{\gamma (\varepsilon /2)}\}\bigg |\, > \sum \limits _{\{k \ge 0\,:\,\tau _{k} \le t-\frac{\varepsilon }{2}\}} \bigg |\,]\tau _{k}-\varepsilon /2,\tau _{k}+\varepsilon /2[\,\bigg | = N(t)\,\varepsilon , \end{aligned}$$

where N(t) is the number of distinct elements of \(\{\tau _{k}\}_{k\ge 0}\) that lie in \([0,t-\varepsilon /2]\), i.e.

$$\begin{aligned} N(t) :=\#\{\tau _{k}\,:\, \tau _{k}\le t-\varepsilon /2,\;k\ge 0\}. \end{aligned}$$

The previous inequality writes as

$$\begin{aligned} t\rho ^{\gamma (\varepsilon /2)}(t)\,> N(t)\,\varepsilon . \end{aligned}$$

On the other hand, we know from Theorem 4, in particular (3.6), that

$$\begin{aligned} t\rho ^{\gamma (\varepsilon /2)}(t) \le \frac{{\overline{\ell }}\,\text {dist}(x,{\mathfrak {M}})}{\gamma (\varepsilon /2)}, \end{aligned}$$

and so we have \(N(t) < \frac{{\overline{\ell }}\,\text {dist}(x,{\mathfrak {M}})}{\varepsilon \,\gamma (\varepsilon /2)}\). But this cannot be true since \(N(t)\rightarrow +\infty \) as \(t\rightarrow +\infty \), and hence it concludes the proof. \(\square \)

3.4 On Reaching the Argmin in Finite Time

Here we investigate whether the hitting time \(t_{x}(\alpha ^{*})\) of an optimal trajectory with the target \({\mathfrak {M}}\) is finite or not. In view of the gradient descent inclusion (1.1), or its smooth version (3.8), the question is equivalent to the finiteness of the length of the orbits of the gradient flow \(\dot{y} \in - D^- v(y)\), or \(\dot{y} = -\nabla v(y)\). This is a classical problem with a large literature. Positive results require strong regularity of v, such as quasiconvexity and subanalyticity [7]. On the other hand, counterexamples are known for \(v\in C^\infty (\mathbb {R}^2)\) and target a circle [33] or a single point [16].

In our case v is not smooth, but it is the value function of a control problem and solves an eikonal equation. These properties can be exploited to prove that the hitting time is finite in some cases.

The first sufficient condition, which complements the hypothesis (H), is the following, where \(d(x):=\text {dist}(x,{\mathfrak {M}})\):

  • there exist a continuous function \({\tilde{\gamma }}\), with \({\tilde{\gamma }}(s)>0\) for all \(s>0\) and \({\tilde{\gamma }}(0)=0\), and some \(r>0\), such that

    $$\begin{aligned} \ell (x)={\tilde{\gamma }}(d(x)),\quad \forall \,x \,\text { s.t. }\, d(x)\le r. \end{aligned}$$
    (L)

Proposition 1

Assume (F), (H), and (L) hold, and let \(\alpha ^{*}\) be an optimal control for problem (3.2). Then the hitting time satisfies \(t_{x}(\alpha ^{*})=d(x)\) whenever \(d(x)\le r\), and it is finite for all x.

Proof

Let us first note that the finiteness for all x follows from the property in the case \(d(x)\le r\), because by Theorem 7(ii) there exists a finite time \({\widetilde{t}}_{x}\) such that \(d(y_{x}^{\alpha ^{*}}({\widetilde{t}}_{x})) \le r\).

We assume that the initial position x satisfies \(d(x)\le r\) and aim to prove that

$$\begin{aligned} v(x) = \int _{0}^{d(x)} {\tilde{\gamma }}(s)\,\text {d}s, \end{aligned}$$
(3.12)

where v(x) is the value function defined in (3.2). Denote by V(x) the right-hand side of the last equality.

We first claim that \(v(x)\le V(x)\). Take z in the set of projections of x onto \({\mathfrak {M}}\) and consider the straight line from x to z given by the trajectory \(\overline{y}_{x}(t) = x - pt\), \(t\ge 0\), where \(p =\frac{x-z}{|x-z|}\). Note that \(\overline{t}_{x}:=\inf \{t\ge 0\,:\; \overline{y}_{x}(t)\in {\mathfrak {M}}\} = d(x)\), and that \(d(x-pt)\le r\) for all \(0\le t\le \overline{t}_{x}\). Then, by (L),

$$\begin{aligned} v(x) \le \int _{0}^{\overline{t}_{x}}\ell (\overline{y}_{x}(t))\,\text {d}t = \int _{0}^{\overline{t}_{x}}{\tilde{\gamma }}(d(\overline{y}_{x}(t)))\,\text {d}t =: J(x). \end{aligned}$$

Observe now that \(d(\overline{y}_{x}(t)) = \big | |x-z| - t \big |=d(x) - t\). Therefore, using the change of variable \(s:=d(\overline{y}_{x}(t))=d(x)-t\), we obtain

$$\begin{aligned} J(x) = \int _{0}^{d(x)}{\tilde{\gamma }}(d(\overline{y}_{x}(t)))\,\text {d}t = \int _{0}^{d(x)}{\tilde{\gamma }}(s)\,\text {d}s = V(x), \end{aligned}$$

and this proves the claim.

Next we show that \(v(x)\ge V(x)\). Since v(x) is a continuous viscosity solution to (3.1), by [34, Theorem 3.2 (ii)] it satisfies the upper optimality principle [34, Definition 3.1], that is,

$$\begin{aligned} v(x) \ge \inf \limits _{\alpha } \int _{0}^{t}\ell (y_{x}^{\alpha }(s))\,\text {d}s + v(y_{x}^{\alpha }(t)),\quad \forall t\ge 0, \end{aligned}$$

where the dynamics of \(y_{x}^{\alpha }(\cdot )\) is again (3.3) with \(|\alpha (s)|\le 1\). Using (L) and \(v\ge 0\) we get

$$\begin{aligned} v(x) \ge \inf \limits _{\alpha } \int _{0}^{t}{\tilde{\gamma }}(d(y_{x}^{\alpha }(s)))\,\text {d}s,\quad \forall t\ge 0. \end{aligned}$$

In particular, since \({\tilde{\gamma }}(s) = 0\) if and only if \(s=0\), we have

$$\begin{aligned} v(x) \ge \inf \limits _{\alpha \in B(0,1)} \int _{0}^{t_{x}(\alpha )}{\tilde{\gamma }}(d(y_{x}^{\alpha }(s)))\,\text {d}s\;=:\,W(x) . \end{aligned}$$

Then the function W(x) solves in the viscosity sense the Dirichlet problem

$$\begin{aligned} \left\{ \quad \begin{aligned} |\nabla W(x)|&= {\tilde{\gamma }}(d(x)),&x\in \mathbb {R}^{n}\setminus {\mathfrak {M}}\\ W(x)&= 0,&x\in {\mathfrak {M}}. \end{aligned}\right. \end{aligned}$$
(3.13)

But \(V(x):=\int _{0}^{d(x)} {\tilde{\gamma }}(s)\,\text {d}s\) is also a viscosity solution of this Dirichlet problem because \(|D^{\pm }V(x)|=|D^{\pm }d(x)|{\tilde{\gamma }}(d(x))\). We conclude using [29, Theorem 1 and Remark 3.1] that \(V(x)=W(x)\) and hence \(v(x)\ge V(x)\).

Finally, applying in the integral of formula (3.12) the same change of variable as above, we get

$$\begin{aligned} v(x) = \int _{0}^{d(x)}{\tilde{\gamma }}(d(\overline{y}_{x}(t)))\,\text {d}t = \int _{0}^{d(x)}\ell (\overline{y}_{x}(t))\,\text {d}t. \end{aligned}$$

This proves that \(\overline{y}_{x}(t):=x-pt\) is an optimal trajectory and d(x) is its hitting time. \(\square \)
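For instance, if \(\ell (x)=d(x)\) for \(d(x)\le r\), i.e. \({\tilde{\gamma }}(s)=s\) in (L), then (3.12) gives

$$\begin{aligned} v(x)=\int _{0}^{d(x)} s\,\text {d}s = \tfrac{1}{2}\,d(x)^{2},\quad d(x)\le r, \end{aligned}$$

and the straight-line motion constructed in the proof reaches \({\mathfrak {M}}\) exactly at time d(x).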

Remark 3

In some control problems it may happen that an optimal trajectory remains arbitrarily close to a target without ever reaching it. Such a behavior has been observed in a linear-quadratic control problem studied in [26, Sect. 6.1], where the target is a singleton \(\{x_{\circ }\}\) and the time \(t_{\varepsilon }\) of being \(\varepsilon \)-close to \(x_{\circ }\) is shown to be \(t_{\varepsilon }=C\,\ln \left( \frac{|x-x_{\circ }|}{\varepsilon }\right) \), where x is the initial state. Moreover, an optimal trajectory oscillates periodically around \(x_{\circ }\) (see [26, p. 55]).

Next we show that, under the set of assumptions of Sect. 2, a bound from below on \(\ell \) near the target is a sufficient condition for the finite hitting time. The proof uses an inequality of Łojasiewicz type along optimal gradient orbits.

Theorem 8

Assume \(\ell (x)=\sqrt{ 2(f(x) - \underline{f}) }\), (A), (B), and (H) are satisfied, and for some \(c, r>0\), \(0<\beta <3/2\),

$$\begin{aligned} \ell (x) \ge c\, d(x)^\beta ,\quad \forall \,x \,\text { s.t. }\, d(x)\le r. \end{aligned}$$
(3.14)

If \(\alpha ^{*}\) is an optimal control for x, then the hitting time \(t_{x}(\alpha ^{*})\) is finite for all x, and for d(x) sufficiently small

$$\begin{aligned} t_{x}(\alpha ^{*})\le \frac{C}{1- 2\beta /3}\, d(x)^{\frac{3}{2} - \beta }. \end{aligned}$$
(3.15)

Proof

Set \(y(t):= y_x^{\alpha ^{*}}(t)\) and recall from Theorem 7 that

$$\begin{aligned} \lim _{t\rightarrow t_{x}(\alpha ^{*})} d(y(t)) = 0. \end{aligned}$$

Therefore it is not restrictive to assume that \(d(y(t))\le r\) for all \(t>0\).

We re-parametrise the trajectory y to get a gradient orbit. Set

$$\begin{aligned} s(t):=\int _0^t|Dv(y(\tau ))|^{-1} d\tau \in [0, T) , \quad 0\le t < t_{x}(\alpha ^{*}), \end{aligned}$$

where \(T\le +\infty \). Define \(s\mapsto t(s)\), \([0, T)\rightarrow [0, t_{x}(\alpha ^{*}))\), the inverse function of s(t) and \(z(s):= y(t(s))\). Then

$$\begin{aligned} \dot{z}(s) = - Dv(z(s)) , \quad z(0)=x , \quad \lim _{s\rightarrow T} d(z(s)) = 0, \end{aligned}$$

and

$$\begin{aligned} t(s)=\int _0^s |Dv(z(\tau ))| d\tau = \int _0^s |\dot{z}(\tau )|d\tau . \end{aligned}$$

Therefore

$$\begin{aligned} t_{x}(\alpha ^{*})=\lim _{s\rightarrow T} t(s) = \int _0^{T} |\dot{z}(\tau )|d\tau , \end{aligned}$$

and so \(t_{x}(\alpha ^{*})<\infty \) if the length of the gradient orbit \(z(\cdot )\) is finite. By Theorem 6, v is differentiable at all points z(s), \(s>0\), and then

$$\begin{aligned} |Dv(z(s))|= \ell (z(s)) \ge c\, d(z(s))^\beta , \quad \forall s>0, \end{aligned}$$
(3.16)

by (3.14) and \(d(z(s))\le r\). On the other hand, by assumptions (A2) and (B1), for some \(C_3>0\)

$$\begin{aligned} \ell (x)\le C_3 \sqrt{d(x)}. \end{aligned}$$

By repeating the first half of the proof of Proposition 1 we get

$$\begin{aligned} v(x)\le \int _{0}^{d(x)} C_3 \sqrt{s}\,\text {d}s = \frac{2C_3}{3} d(x)^{3/2}. \end{aligned}$$
(3.17)

By combining this with (3.16) we obtain

$$\begin{aligned} |Dv(z(s))| \ge C_4 v(z(s))^\rho , \end{aligned}$$

where \(\rho := 2\beta /3 <1\). This is a Łojasiewicz inequality along the gradient orbit \(z(\cdot )\), and we can use the following classical argument:

$$\begin{aligned} \frac{-1}{1-\rho } \frac{d}{ds}[v(z(s))^{1-\rho }] = \frac{-Dv(z(s))\cdot \dot{z}(s)}{v(z(s))^{\rho }} = \frac{|Dv(z(s))| |\dot{z}(s)|}{v(z(s))^{\rho }} \ge C_4 |\dot{z}(s)|, \end{aligned}$$

which integrated from 0 to T gives

$$\begin{aligned} t_{x}(\alpha ^{*}) \le \frac{v(x)^{1-\rho }}{C_4(1-\rho )}. \end{aligned}$$

Now we combine this with (3.17) to get the estimate (3.15). \(\square \)
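For instance, if f grows at least quadratically near the set of minima, namely \(f(x)-\underline{f}\ge \frac{c^{2}}{2}\,d(x)^{2}\) for \(d(x)\le r\), then (3.14) holds with \(\beta =1\) and \(\rho =2/3\), and the estimate (3.15) reads

$$\begin{aligned} t_{x}(\alpha ^{*})\le C\, d(x)^{1/2} \quad \text {for } d(x) \text { sufficiently small}. \end{aligned}$$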