Approaching nonsmooth nonconvex minimization through second-order proximal-gradient dynamical systems

We investigate the asymptotic properties of the trajectories generated by a second-order dynamical system of proximal-gradient type stated in connection with the minimization of the sum of a nonsmooth convex and a (possibly nonconvex) smooth function. The convergence of the generated trajectory to a critical point of the objective is ensured provided a regularization of the objective function satisfies the Kurdyka–Łojasiewicz property. We also provide convergence rates for the trajectory formulated in terms of the Łojasiewicz exponent.


Introduction
Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper, convex, and lower semicontinuous function, and let $g : \mathbb{R}^n \to \mathbb{R}$ be a (possibly nonconvex) Fréchet differentiable function with $\beta$-Lipschitz continuous gradient, i.e., there exists $\beta \geq 0$ such that $\|\nabla g(x) - \nabla g(y)\| \leq \beta \|x - y\|$ for all $x, y \in \mathbb{R}^n$. In this paper, we investigate the optimization problem (1) of minimizing $f + g$ over $\mathbb{R}^n$ by associating to it the second-order dynamical system of implicit type (2), with initial data $u_0, v_0 \in \mathbb{R}^n$ and parameters $\gamma, \lambda \in (0, +\infty)$.
Dynamical systems of proximal-gradient type associated with optimization problems have been intensively treated in the literature. In [16], Bolte studied the convergence of the trajectories of the first-order dynamical system (4), where $g : \mathbb{R}^n \to \mathbb{R}$ is a convex smooth function, $C \subseteq \mathbb{R}^n$ is a nonempty, closed, and convex set, $x_0 \in \mathbb{R}^n$ is the starting point, and $\operatorname{proj}_C$ denotes the projection operator onto the set $C$.
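To make the proximal step underlying systems of proximal-gradient type concrete, the following minimal sketch evaluates the proximal operator $\operatorname{prox}_{\lambda f}(x) = \operatorname{argmin}_y \{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \}$ for the particular choice $f = \|\cdot\|_1$, where it reduces to componentwise soft-thresholding. The choice of $f$, the helper names `prox_l1` and `prox_objective`, and the test vector are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal operator of lam * ||.||_1, i.e. componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_objective(y, x, lam):
    """Objective whose minimizer over y defines prox_{lam f}(x) for f = ||.||_1."""
    return np.sum(np.abs(y)) + np.sum((y - x) ** 2) / (2.0 * lam)

# Hypothetical test vector: entries with |x_i| <= lam are mapped to zero,
# the remaining ones are shrunk towards zero by lam.
x = np.array([2.0, -0.3, 0.5])
p = prox_l1(x, lam=0.5)
```

Since $\operatorname{prox}_{\lambda f}$ is nonexpansive for every proper, convex and lower semicontinuous $f$, the same pattern applies to other simple choices of $f$, such as indicator functions of convex sets, where the proximal operator becomes a projection.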
The trajectory of (4) has been proved to converge to a minimizer of the optimization problem (5), $\inf_{x \in C} g(x)$, provided the latter is solvable. We refer also to the work of Antipin [7] for further results related to (4).
The extension (6) of the dynamical system (4), where $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper, convex and lower semicontinuous function, $g : \mathbb{R}^n \to \mathbb{R}$ is a convex smooth function and $x_0 \in \mathbb{R}^n$, has been recently considered by Abbas and Attouch [1] in relation to the optimization problem (1). In case (1) is solvable, the trajectory generated by (6) has been proved to converge to a global minimizer of it.
In connection with the optimization problem (5), the second-order projected-gradient system (7), with damping parameter $\gamma > 0$ and step size $\lambda > 0$, has been considered in [7,8]. In case $C = \mathbb{R}^n$, the system (7) becomes the so-called heavy ball method with friction. This nonlinear oscillator with damping is, in case $n = 2$, a simplified version of the differential system describing the motion of a heavy ball that rolls over the graph of $g$ and keeps rolling under its own inertia until friction stops it at a critical point of $g$ (see [14]). Implicit dynamical systems related to both optimization problems and monotone inclusions have been considered in the literature also by Attouch and Svaiter in [15], Attouch, Abbas and Svaiter in [2] and Attouch, Alvarez and Svaiter in [9]. These investigations have been continued and extended in [21-24].
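The behaviour of the heavy ball with friction can be illustrated numerically. The sketch below integrates the classical system $\ddot{x}(t) + \gamma \dot{x}(t) + \nabla g(x(t)) = 0$ for $n = 2$ with a semi-implicit Euler scheme. The quadratic $g(x) = \frac{1}{2} x^\top Q x$, the matrix `Q`, the step `dt` and the function name `heavy_ball` are hypothetical choices made only for this illustration.

```python
import numpy as np

# Hypothetical smooth convex objective g(x) = 0.5 * x^T Q x on R^2.
Q = np.diag([1.0, 10.0])

def grad_g(x):
    return Q @ x

def heavy_ball(x0, v0, gamma, dt=0.01, steps=4000):
    """Semi-implicit Euler integration of x'' + gamma * x' + grad g(x) = 0."""
    x = np.array(x0, dtype=float)
    v = np.array(v0, dtype=float)
    for _ in range(steps):
        v = v + dt * (-gamma * v - grad_g(x))  # update the velocity first
        x = x + dt * v                         # then move with the new velocity
    return x, v

# Start at rest away from the minimizer; friction dissipates the energy.
x_end, v_end = heavy_ball([1.0, 1.0], [0.0, 0.0], gamma=1.0)
```

In this example both the position and the velocity decay towards zero, so the ball comes to rest at the unique critical point of $g$.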
The aim of this manuscript is to study the asymptotic properties of the trajectory generated by the second-order dynamical system (2) under convexity assumptions for $f$ and by allowing $g$ to be nonconvex. In the same setting, a first-order dynamical system of type (6) attached to (1) has been recently studied in [25]. An asymptotic analysis for a gradient-like second-order dynamical system (which corresponds to (7) when $C = \mathbb{R}^n$) has been made in [29] (see also the recent review [30]) in the analytic setting.
Vol. 18 (2018) Approaching nonsmooth nonconvex minimization 1293
The main results of the current work are Theorem 16, where we prove convergence of the trajectories to a critical point of the objective function of (1), provided a regularization of it satisfies the Kurdyka-Łojasiewicz property, and Theorem 20, where convergence rates in terms of the Łojasiewicz exponent are provided for both the trajectory and the velocity. The convergence analysis relies on methods and techniques of real algebraic geometry introduced by Łojasiewicz [32] and Kurdyka [31] and extended to the nonsmooth setting by Attouch et al. [13] and Bolte et al. [17].
The explicit discretization of (2) with respect to the time variable $t$, with step size $h_k > 0$, damping variable $\gamma_k > 0$, and initial points $x_0 := u_0$ and $x_1 := v_0$, yields an iterative scheme which, for $h_k = 1$, becomes a relaxed proximal-gradient algorithm for minimizing $f + g$ with inertial effects. For inertial-type algorithms, we refer the reader to [3-5]. The dynamical system investigated in this paper can be seen as a continuous counterpart of the inertial-type algorithms presented in [26] and [34].
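A discretization of this kind can be sketched as follows. The sketch rests on two assumptions that are ours: that (2) has the form $\ddot{x} + \gamma \dot{x} + x = \operatorname{prox}_{\lambda f}(x - \lambda \nabla g(x))$, and that the derivatives are replaced by the differences $(x_{k+1} - 2x_k + x_{k-1})/h^2$ and $(x_{k+1} - x_k)/h$. Solving for $x_{k+1}$ then gives $x_{k+1} = x_k + \frac{1}{1+\gamma h}(x_k - x_{k-1}) + \frac{h^2}{1+\gamma h}\big(\operatorname{prox}_{\lambda f}(x_k - \lambda \nabla g(x_k)) - x_k\big)$. The test problem $f = \mu\|\cdot\|_1$, $g = \frac{1}{2}\|\cdot - b\|^2$ and all names in the code are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def inertial_prox_grad(b, mu, lam, gamma, h=1.0, iters=300):
    """Relaxed inertial proximal-gradient iteration for
    f(x) = mu * ||x||_1 and g(x) = 0.5 * ||x - b||^2 (assumed test problem)."""
    x_prev = np.zeros_like(b)   # plays the role of x_0
    x = np.zeros_like(b)        # plays the role of x_1
    a = 1.0 / (1.0 + gamma * h)
    for _ in range(iters):
        # proximal-gradient step: grad g(x) = x - b, prox of lam * f
        t = soft_threshold(x - lam * (x - b), lam * mu)
        # inertial term a * (x - x_prev) plus relaxed step towards t
        x_next = x + a * (x - x_prev) + a * h**2 * (t - x)
        x_prev, x = x, x_next
    return x

b = np.array([3.0, 0.2, -3.0])
x_star = inertial_prox_grad(b, mu=1.0, lam=0.5, gamma=2.0)
```

For this separable test problem the minimizer of $f + g$ is the soft-thresholding of $b$ at level $\mu$, which gives a simple reference value for the iterates.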

Preliminaries
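In this section we introduce some basic notions and present preliminary results that will be used in the sequel. The finite-dimensional spaces considered in the manuscript are endowed with the Euclidean norm topology. The domain of the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is defined by $\operatorname{dom} f = \{x \in \mathbb{R}^n : f(x) < +\infty\}$. We say that $f$ is proper if $\operatorname{dom} f \neq \emptyset$.
For the following generalized subdifferential notions and their basic properties, we refer to [33,35]. Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper and lower semicontinuous function. For $x \in \operatorname{dom} f$, the Fréchet (viscosity) subdifferential of $f$ at $x$ is defined as
\[ \hat{\partial} f(x) = \left\{ v \in \mathbb{R}^n : \liminf_{y \to x} \frac{f(y) - f(x) - \langle v, y - x \rangle}{\|y - x\|} \geq 0 \right\}, \]
and the limiting (Mordukhovich) subdifferential of $f$ at $x$ is defined as
\[ \partial f(x) = \{ v \in \mathbb{R}^n : \exists\, x_k \to x,\ f(x_k) \to f(x) \text{ and } \exists\, v_k \in \hat{\partial} f(x_k),\ v_k \to v \text{ as } k \to +\infty \}. \]
For $x \notin \operatorname{dom} f$, both sets are taken to be empty. In case $f$ is convex, these notions coincide with the convex subdifferential, which means that $\hat{\partial} f(x) = \partial f(x) = \{ v \in \mathbb{R}^n : f(y) \geq f(x) + \langle v, y - x \rangle \ \forall y \in \mathbb{R}^n \}$ for all $x \in \operatorname{dom} f$.
We will use the following closedness criterion concerning the graph of the limiting subdifferential: if $(x_k)_{k}$ and $(v_k)_{k}$ are sequences in $\mathbb{R}^n$ such that $v_k \in \partial f(x_k)$ for all $k$, $(x_k, v_k) \to (x, v)$ and $f(x_k) \to f(x)$ as $k \to +\infty$, then $v \in \partial f(x)$.
The Fermat rule reads in this nonsmooth setting as: if $x \in \mathbb{R}^n$ is a local minimizer of $f$, then $0 \in \partial f(x)$. We denote by $\operatorname{crit}(f) = \{ x \in \mathbb{R}^n : 0 \in \partial f(x) \}$ the set of (limiting)-critical points of $f$. We also mention the following subdifferential rule: if $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is proper and lower semicontinuous and $h : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, then $\partial (f + h)(x) = \partial f(x) + \nabla h(x)$ for all $x \in \mathbb{R}^n$.
DEFINITION 1. A function $x : [0, T] \to \mathbb{R}^n$ (where $T > 0$) is said to be absolutely continuous if one of the following equivalent properties holds: (i) there exists an integrable function $y : [0, T] \to \mathbb{R}^n$ such that $x(t) = x(0) + \int_0^t y(s)\,ds$ for all $t \in [0, T]$; (ii) $x$ is continuous and its distributional derivative is Lebesgue integrable on $[0, T]$; (iii) for every $\varepsilon > 0$, there exists $\eta > 0$ such that for any finite family of pairwise disjoint intervals $I_k = (a_k, b_k) \subseteq [0, T]$ with $\sum_k |b_k - a_k| < \eta$ one has $\sum_k \|x(b_k) - x(a_k)\| < \varepsilon$.
REMARK 1. (a) It follows from the definition that an absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere, and one can recover the function from its derivative $\dot{x} = y$ by the integration formula (i). (b) If $x : [0, T] \to \mathbb{R}^n$ is absolutely continuous and $B : \mathbb{R}^n \to \mathbb{R}^n$ is $L$-Lipschitz continuous (where $L \geq 0$), then the function $z = B \circ x$ is absolutely continuous, too. This can be easily seen by using the characterization of absolute continuity in Definition 1(iii). Moreover, $z$ is almost everywhere differentiable and the inequality $\|\dot{z}(\cdot)\| \leq L \|\dot{x}(\cdot)\|$ holds almost everywhere.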
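The Fermat rule and the sum rule for a nonsmooth plus smooth function can be checked by hand in one dimension. The sketch below tests whether $0 \in \partial f(x) + h'(x)$ for the hypothetical pair $f = \mu|\cdot|$ and $h(x) = \frac{1}{2}(x - b)^2$, using that $\partial f(x) = \{\mu \operatorname{sign}(x)\}$ for $x \neq 0$ and $\partial f(0) = [-\mu, \mu]$. The function name `is_critical` and the numerical values are assumptions made for illustration.

```python
def is_critical(x, b, mu, tol=1e-12):
    """Check 0 in d(f + h)(x) = df(x) + h'(x) for f = mu * |.| and
    h(x) = 0.5 * (x - b)**2 (both hypothetical choices)."""
    grad_h = x - b
    if x != 0.0:
        # f is differentiable away from 0 with derivative mu * sign(x)
        sign = 1.0 if x > 0 else -1.0
        return abs(mu * sign + grad_h) <= tol
    # at 0 the subdifferential of f is the interval [-mu, mu]
    return abs(grad_h) <= mu + tol

# For b = 3 and mu = 1 the unique critical point is x = 2;
# for b = 0.2 and mu = 1 it is x = 0.
checks = [is_critical(2.0, 3.0, 1.0), is_critical(1.0, 3.0, 1.0),
          is_critical(0.0, 0.2, 1.0), is_critical(0.0, 3.0, 1.0)]
```

Because $f + h$ is convex here, the critical points found this way are exactly the global minimizers; in the nonconvex case the inclusion is only a necessary condition.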
Further, we recall the following result of Brézis [27].
LEMMA 2. Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper, convex and lower semicontinuous function. […] Then the function $t \mapsto f(x(t))$ is absolutely continuous and for every $t$ such that […]
The following central results will be used when proving the convergence of the trajectories generated by the dynamical system (2); see, for example, […]. […] Then there exists $\lim_{t \to +\infty} F(t) \in \mathbb{R}$.

Existence and uniqueness of the trajectories
Existence and uniqueness of the trajectories of (2) are obtained in the framework of the global version of the Cauchy-Lipschitz Theorem (see for instance [12, Theorem 17.1.2(b)]), by rewriting (2) as a first-order dynamical system in a suitable product space and by employing the Lipschitz continuity of the proximal operator and of the gradient.
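Proof. By making use of the notation $X(t) = (x(t), \dot{x}(t))$, the system (2) can be rewritten as a first-order dynamical system (8) in the product space $\mathbb{R}^n \times \mathbb{R}^n$, governed by a map $F : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^n$. We prove the existence and uniqueness of a global solution of (8) by using the Cauchy-Lipschitz Theorem. To this aim it is enough to show that $F$ is globally Lipschitz continuous. Let $(u, v), (\bar{u}, \bar{v}) \in \mathbb{R}^n \times \mathbb{R}^n$. By the nonexpansiveness of $\operatorname{prox}_{\lambda f}$ and the $\beta$-Lipschitz property of $\nabla g$, the norm of $F(u, v) - F(\bar{u}, \bar{v})$ is bounded by a constant multiple $L_1$ of $\|(u, v) - (\bar{u}, \bar{v})\|$. Consequently, $F$ is globally Lipschitz continuous, which implies that (8) has a unique global solution $X \in C^1([0, +\infty), \mathbb{R}^n \times \mathbb{R}^n)$. This shows that $x \in C^2([0, +\infty), \mathbb{R}^n)$.
REMARK 6. Another Lipschitz constant, $L_2$, can be obtained by using different inequalities in the estimate of $\|F(u, v) - F(\bar{u}, \bar{v})\|$.
REMARK 7. Considering again the setting of the proof of Theorem 5, from Remark 1(b) it follows that $\ddot{X}$ exists almost everywhere on $[0, +\infty)$ and that for almost every $t \in [0, +\infty)$ one has $\|\ddot{X}(t)\| \leq L_1 \|\dot{X}(t)\|$. Hence, for almost every $t \in [0, +\infty)$, $\|\ddot{x}(t)\|^2 + \|x^{(3)}(t)\|^2 \leq L_1^2 \big( \|\dot{x}(t)\|^2 + \|\ddot{x}(t)\|^2 \big)$, or, equivalently, $\|x^{(3)}(t)\|^2 \leq L_1^2 \|\dot{x}(t)\|^2 + (L_1^2 - 1) \|\ddot{x}(t)\|^2$. Similarly, by using $L_2$, one obtains for almost every $t \in [0, +\infty)$ the same inequalities with $L_1$ replaced by $L_2$.
REMARK 8. Obviously, $L_1 > 2$ and $L_2 > 2$. One can easily verify that $L_2 \leq L_1$, provided $\gamma \leq \sqrt{3}$. Moreover, if $\gamma \leq \sqrt{3}$, then […]. However, for $\gamma > \sqrt{3}$, one may have $L_2 > L_1$ and also $L_2 < L_1$, as can be seen by comparing the two constants for $\gamma = 2$, $\lambda\beta = \frac{1}{10}$ and for $\gamma = 2$, $\lambda\beta = 1$.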
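The global Lipschitz continuity used in the proof can be probed numerically. The sketch below assumes that the first-order reformulation has the vector field $F(u, v) = (v, -\gamma v - u + \operatorname{prox}_{\lambda f}(u - \lambda \nabla g(u)))$; this form is our assumption, since the displayed system is not reproduced here. The triangle inequality, the nonexpansiveness of the proximal operator and the Cauchy-Schwarz inequality give the crude bound $\sqrt{(1+\gamma)^2 + (2+\lambda\beta)^2}$, which is not claimed to coincide with the constants $L_1$ or $L_2$ of the paper. The functions $f = \mu\|\cdot\|_1$ and $g(u) = -\sum_i \cos(u_i)$ and all parameter values are hypothetical.

```python
import numpy as np

gamma, lam, beta, mu = 2.0, 0.5, 1.0, 0.1

def grad_g(u):
    # g(u) = -sum(cos(u_i)) is nonconvex with a 1-Lipschitz gradient sin(u)
    return np.sin(u)

def prox_l1(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def F(u, v):
    """Assumed vector field of the first-order reformulation."""
    return v, -gamma * v - u + prox_l1(u - lam * grad_g(u), lam * mu)

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(2000):
    u, v, ub, vb = rng.normal(size=(4, 3)) * 3.0
    num = np.linalg.norm(np.concatenate(F(u, v)) - np.concatenate(F(ub, vb)))
    den = np.linalg.norm(np.concatenate([u - ub, v - vb]))
    worst = max(worst, num / den)   # largest observed difference quotient

# Crude analytic bound derived in the lead-in (not the paper's constant).
crude_bound = np.hypot(1.0 + gamma, 2.0 + lam * beta)
```

The sampled difference quotients stay below the crude bound, which is consistent with the global Lipschitz property; the sampling of course proves nothing on its own.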
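The convergence of the trajectory generated by the dynamical system (2) will be shown in the framework of functions satisfying the Kurdyka-Łojasiewicz property. For $\eta \in (0, +\infty]$, we denote by $\Theta_\eta$ the class of concave and continuous functions $\varphi : [0, \eta) \to [0, +\infty)$ such that $\varphi(0) = 0$, $\varphi$ is continuously differentiable on $(0, \eta)$, continuous at $0$ and $\varphi'(s) > 0$ for all $s \in (0, \eta)$. In the following definition (see [11,17]), we use the distance function to a set, defined for $A \subseteq \mathbb{R}^n$ as $\operatorname{dist}(x, A) = \inf_{y \in A} \|x - y\|$ for all $x \in \mathbb{R}^n$.
DEFINITION (Kurdyka-Łojasiewicz property). Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper and lower semicontinuous function. We say that $f$ satisfies the Kurdyka-Łojasiewicz (KL) property at $\bar{x} \in \operatorname{dom} \partial f = \{ x \in \mathbb{R}^n : \partial f(x) \neq \emptyset \}$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\bar{x}$ and a function $\varphi \in \Theta_\eta$ such that for all $x$ in the intersection $U \cap \{ x \in \mathbb{R}^n : f(\bar{x}) < f(x) < f(\bar{x}) + \eta \}$ the inequality $\varphi'(f(x) - f(\bar{x})) \operatorname{dist}(0, \partial f(x)) \geq 1$ holds. If $f$ satisfies the KL property at each point of $\operatorname{dom} \partial f$, then $f$ is called a KL function.
The origins of this notion go back to the pioneering work of Łojasiewicz [32], where it is proved that for a real-analytic function $f : \mathbb{R}^n \to \mathbb{R}$ and a critical point $\bar{x} \in \mathbb{R}^n$ (that is, $\nabla f(\bar{x}) = 0$), there exists $\theta \in [1/2, 1)$ such that the function $|f - f(\bar{x})|^\theta \|\nabla f\|^{-1}$ is bounded around $\bar{x}$. This corresponds to the situation when $\varphi(s) = C(1 - \theta)^{-1} s^{1-\theta}$. The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [31] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [11,18-20].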
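The role of the exponent $\theta$ can be seen on a simple example of our own choosing, the real-analytic function $f(x) = x^4$ with critical point $\bar{x} = 0$. Here $|f(x) - f(\bar{x})|^\theta |f'(x)|^{-1} = |x|^{4\theta - 3}/4$, which is bounded near $0$ exactly when $\theta \geq 3/4$. The sketch below evaluates this ratio for $\theta = 3/4$ and for the too small value $\theta = 1/2$; the function name `lojasiewicz_ratio` and the sample grid are hypothetical.

```python
import numpy as np

def lojasiewicz_ratio(x, theta):
    """|f(x) - f(0)|**theta / |f'(x)| for the example f(x) = x**4."""
    return np.abs(x**4) ** theta / np.abs(4.0 * x**3)

# Sample points approaching the critical point 0 from the right.
xs = np.logspace(-6, -1, 50)

r_good = lojasiewicz_ratio(xs, 0.75)  # constant 1/4: the ratio stays bounded
r_bad = lojasiewicz_ratio(xs, 0.5)    # equals 1/(4x): blows up as x -> 0
```

With $\theta = 3/4$ one may take $C = 1/4$, so that $\varphi(s) = C(1-\theta)^{-1} s^{1-\theta} = s^{1/4}$ and $\varphi'(f(x))\,|f'(x)| = 1$ for $x \neq 0$.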
One of the remarkable properties of the KL functions is their ubiquity in applications, according to [17]. To the class of KL functions belong semi-algebraic, real sub-analytic, semiconvex, uniformly convex, and convex functions satisfying a growth condition. We refer the reader to [10,11,13,[17][18][19][20] and the references therein for more details regarding all the classes mentioned above and illustrating examples.
An important role in our convergence analysis will be played by the following uniformized KL property given in [17,Lemma 6].
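LEMMA 15. Let $\Omega \subseteq \mathbb{R}^n$ be a compact set and let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper and lower semicontinuous function. Assume that $f$ is constant on $\Omega$ and $f$ satisfies the KL property at each point of $\Omega$. Then there exist $\varepsilon, \eta > 0$ and $\varphi \in \Theta_\eta$ such that for all $\bar{x} \in \Omega$ and for all $x$ in the intersection $\{ x \in \mathbb{R}^n : \operatorname{dist}(x, \Omega) < \varepsilon \} \cap \{ x \in \mathbb{R}^n : f(\bar{x}) < f(x) < f(\bar{x}) + \eta \}$ the following inequality holds: $\varphi'(f(x) - f(\bar{x})) \operatorname{dist}(0, \partial f(x)) \geq 1$.
We state the first main result of the paper.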
THEOREM 16. Assume that $f + g$ is bounded from below and $\gamma, \lambda$ satisfy the set of conditions ($\rho$), and let the constants $L$, $A$, $B$ and $C$ be defined as in Lemma 9. For $u_0, v_0 \in \mathbb{R}^n$, let $x \in C^2([0, +\infty), \mathbb{R}^n)$ be the unique global solution of (2). Consider the function $H$ […]. Suppose that $x$ is bounded and $H$ is a KL function. Then the following statements are true: (a) $\dot{x} \in L^1([0, +\infty), \mathbb{R}^n)$; (b) $\ddot{x} \in L^1([0, +\infty), \mathbb{R}^n)$; (c) there exists $\bar{x} \in \operatorname{crit}(f + g)$ such that $\lim_{t \to +\infty} x(t) = \bar{x}$.
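The conclusions of the theorem can be observed on a small numerical experiment. The sketch below assumes that (2) has the form $\ddot{x} + \gamma \dot{x} + x = \operatorname{prox}_{\lambda f}(x - \lambda \nabla g(x))$, which is our assumption, and integrates it with a classical fourth-order Runge-Kutta scheme in dimension one for the hypothetical choices $f = \mu|\cdot|$ and the nonconvex $g(x) = -\cos x$. The parameter conditions of the theorem are not reproduced here, so the values of $\gamma$, $\lambda$ and $\mu$ are merely illustrative and are not claimed to satisfy them.

```python
import numpy as np

gamma, lam, mu = 3.0, 0.5, 0.1

def soft(z, tau):
    """Proximal operator of tau * |.| in one dimension."""
    return np.sign(z) * max(abs(z) - tau, 0.0)

def rhs(state):
    """First-order form of the assumed system, with state = (x, x')."""
    x, v = state
    target = soft(x - lam * np.sin(x), lam * mu)  # grad g(x) = sin(x)
    return np.array([v, -gamma * v - x + target])

def rk4(state, dt, steps):
    """Classical fourth-order Runge-Kutta integration; returns all states."""
    traj = [state]
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * dt * k1)
        k3 = rhs(state + 0.5 * dt * k2)
        k4 = rhs(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj.append(state)
    return np.array(traj)

dt = 0.01
traj = rk4(np.array([1.0, 0.0]), dt=dt, steps=8000)   # u_0 = 1, v_0 = 0
x_T, v_T = traj[-1]
# Riemann-sum approximation of the integral of |x'(t)| over [0, 80].
speed_integral = dt * np.sum(np.abs(traj[:, 1]))
```

In this run the trajectory approaches $\bar{x} = 0$, which satisfies $0 \in \partial f(0) + \nabla g(0) = [-\mu, \mu]$, the velocity vanishes, and the approximated integral of $|\dot{x}|$ remains finite, in line with statements (a) and (c).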