Continuous dynamics related to monotone inclusions and non-smooth optimization problems

The aim of this survey is to present the most important techniques and tools from variational analysis used for first and second order dynamical systems of implicit type for solving monotone inclusions and non-smooth optimization problems. The differential equations are expressed by means of the resolvent (in the case of a maximally monotone set-valued operator) or the proximal operator for non-smooth functions. The asymptotic analysis of the generated trajectories relies on Lyapunov theory, where the appropriate energy functional plays a decisive role. While most of the paper is related to monotone inclusions and convex optimization problems in the variational case, we also present results for dynamical systems for solving non-convex optimization problems, where the Kurdyka-Lojasiewicz property is used.


Introduction and preliminaries
Dynamical systems approaching monotone inclusions and optimization problems have enjoyed much attention since the 1970s (Brézis, Baillon and Bruck, see [26,53,54]), not only due to their intrinsic importance in areas like differential equations and applied functional analysis, but also because they have been recognized as a valuable tool for discovering and studying numerical algorithms for optimization problems obtained by time discretization of the continuous dynamics. The dynamical approach to iterative methods in optimization can furnish deep insights into the expected behavior of the method, and the techniques used in the continuous case can be adapted to obtain results for the discrete algorithms. For more on the relations between the continuous and discrete dynamics we refer the reader to [77].
Let us mention that the discretization ẋ(t) ≈ (1/γ)(x_{n+1} − x_n) (with γ > 0) of the gradient flow ẋ(t) = −∇g(x(t)), where g : H → R is a smooth function, leads to the well-known steepest descent method (gradient method) x_{n+1} = x_n − γ∇g(x_n), used for solving the minimization problem min_{x∈H} g(x).
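As a minimal numerical sketch of this correspondence (the quadratic g below is an illustrative choice of ours, not taken from the survey), the explicit Euler discretization of the gradient flow yields the gradient method:

```python
# Steepest descent as the explicit Euler discretization of the gradient flow
#   x'(t) = -grad g(x(t)),  here for the toy function g(x) = 0.5*x1^2 + 5*x2^2.

def grad_g(x):
    # gradient of g(x) = 0.5*x1^2 + 5*x2^2
    return [x[0], 10.0 * x[1]]

def steepest_descent(x0, gamma=0.05, iters=200):
    # x_{n+1} = x_n - gamma * grad g(x_n); gamma is kept below 2/L with L = 10 here
    x = list(x0)
    for _ in range(iters):
        g = grad_g(x)
        x = [xi - gamma * gi for xi, gi in zip(x, g)]
    return x

x = steepest_descent([1.0, 1.0])
```

Smaller step sizes γ follow the continuous trajectory more closely, at the price of more iterations.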
In the case of convex non-smooth optimization problems, the discretization of the differential inclusion ẋ(t) ∈ −∂f(x(t)) leads to the subgradient method x_{n+1} ∈ x_n − γ∂f(x_n) or, when we use the implicit discretization x_n − x_{n+1} ∈ γ∂f(x_{n+1}), to the proximal point algorithm [82] x_{n+1} = (Id + γ∂f)^{−1}(x_n).
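For a one-dimensional illustration (the toy choice f(x) = |x| is ours, not from the survey), the resolvent (Id + γ∂f)^{−1} is the soft-thresholding operator, and the proximal point algorithm can be sketched as:

```python
# Proximal point algorithm x_{n+1} = (Id + gamma*df)^{-1}(x_n) for the
# non-smooth function f(x) = |x| (illustrative choice); its resolvent is
# the soft-thresholding operator.

def prox_abs(x, gamma):
    # (Id + gamma * d|.|)^{-1}(x) = sign(x) * max(|x| - gamma, 0)
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0

def proximal_point(x0, gamma=0.3, iters=10):
    x = x0
    for _ in range(iters):
        x = prox_abs(x, gamma)
    return x

x = proximal_point(2.0)  # reaches the minimizer 0 of |x| in finitely many steps
```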
Let us proceed and consider continuous implicit-type dynamical systems associated with monotone inclusions/optimization problems, which are ordinary differential equations formulated via resolvents of maximally monotone operators. In [33], Bolte studied the convergence of the trajectories of the following dynamical system: ẋ(t) + x(t) = proj_C(x(t) − η∇g(x(t))), x(0) = x₀, (1) where g : H → R is a smooth convex function, η > 0, C is a nonempty, closed and convex subset of H, x₀ ∈ H, and proj_C denotes the projection operator onto the set C. It has been shown in the convex setting that the trajectory of (1) converges weakly to a minimizer of the optimization problem inf_{x∈C} g(x), (2) provided the latter is solvable. We refer also to the work of Antipin [5] for further statements and results concerning (1).
The following generalization of the dynamical system (1) has been considered by Abbas and Attouch in [1, Section 5.2]: ẋ(t) + x(t) = prox_{γf}(x(t) − γB(x(t))), x(0) = x₀, (3) where f : H → R ∪ {+∞} is a proper, convex and lower semicontinuous function, γ > 0 and B is a cocoercive operator. According to [1], in case zer(∂f + B) ≠ ∅, the weak asymptotic convergence of the orbit x of (3) to an element in zer(∂f + B) is ensured by choosing the step size γ in a suitable range bounded in terms of the cocoercivity parameter of the operator B.
Let us also mention that dynamical systems of implicit type have been considered in the literature by Attouch and Svaiter in [25], Attouch, Abbas and Svaiter in [2] and Attouch, Alvarez and Svaiter in [22].
For the minimization of the smooth and convex function g : H → R over the nonempty, convex and closed set C ⊆ H, a continuous in time second order gradient-projection approach has been considered in [5,8], having as starting point the dynamical system ẍ(t) + γẋ(t) + x(t) = proj_C(x(t) − η∇g(x(t))), (4) with constant damping parameter γ > 0 and constant step size η > 0. In case C = H, the system (4) becomes the "heavy ball method", sometimes also called the "heavy ball method with friction". This nonlinear oscillator with damping is, in case H = R², a simplified version of the differential system describing the motion of a heavy ball that rolls over the graph of g and keeps rolling under its own inertia until friction stops it at a critical point of g (see [18]). Later (see [14,89] and the references therein) it was highlighted that a damping which vanishes in time is more suitable in the context of second order dynamics, since it induces fast convergence results in comparison to first order dynamics. Second order dynamics provide through discretization numerical schemes with inertial terms, where the new iterate is computed based on the previous two. Taking into account the history of the iterates makes the asymptotic analysis more involved, but may provide better convergence rates. It is an interesting topic to take more than two iterates into consideration when computing the next one; in the case of algorithms we refer the reader to the recent contribution [57] and to [16] for a third order gradient evolution system. In this survey we consider first and second order dynamics.
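In the unconstrained case C = H, the finite-difference discretization of the heavy ball dynamics with unit step gives an inertial gradient scheme; the following sketch uses a toy quadratic objective and parameter values of our own choosing:

```python
# Heavy ball with friction, discretized with unit time step:
#   x_{n+1} = x_n + (1 - gamma)*(x_n - x_{n-1}) - eta*grad g(x_n)
# for the toy function g(x) = 0.5*x1^2 + 2*x2^2.

def grad_g(x):
    return [x[0], 4.0 * x[1]]

def heavy_ball(x0, gamma=0.7, eta=0.1, iters=300):
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        g = grad_g(x)
        x_next = [xi + (1.0 - gamma) * (xi - pi) - eta * gi
                  for xi, pi, gi in zip(x, x_prev, g)]
        x_prev, x = x, x_next  # the new iterate uses the previous two iterates
    return x

x = heavy_ball([2.0, 1.0])
```

The inertial term (1 − γ)(x_n − x_{n−1}) stems from the damping γẋ; for the parameters above the iterates converge to the unique minimizer 0.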
Finally, motivated by concrete applications where non-convex objects appear, let us say a few words about this setting. In the context of minimizing a non-convex smooth function, several first- and second-order gradient-type dynamical systems have been investigated by Lojasiewicz [70], Simon [86], Haraux and Jendoubi [67], Alvarez, Attouch, Bolte and Redont [4, Section 4], Bolte, Daniilidis and Lewis [34, Section 4], etc. In the aforementioned papers, the convergence of the trajectories is obtained in the framework of functions satisfying the Kurdyka-Lojasiewicz property, see Section 4 for details.
1.1 Contents of the paper. In this manuscript we revisit the most important first and second-order dynamical systems of implicit type studied in the context of solving monotone inclusions and non-smooth optimization problems (convex and non-convex). The main focus is on the techniques which are used in the investigation of the asymptotic behaviour of the generated trajectories. The connection to discrete schemes will also be underlined in order to see the algorithmic impact of the dynamical systems considered. Moreover, the proofs are not complete at every stage of this survey, but we will always refer to appropriate references when something is missing. The existence and uniqueness of the trajectory usually follows from the Cauchy-Lipschitz-Picard Theorem and its variants and will be omitted in the following; however, the reader will find the corresponding details in the literature mentioned here. Let us also mention that there are no new results in this survey, apart from Theorem 15, which is a generalization to the continuous setting of the O(1/n) rate for objective function values for prox-gradient schemes with no acceleration (see also ISTA in [30]). The main challenge in the analysis of the dynamical systems is defining an appropriate energy functional in order to conduct the investigations in the framework of Lyapunov theory. We start with first order dynamics governed by a nonexpansive operator with the aim of approaching the set of its fixed points [40]. While at first glance this seems to be quite far from monotone inclusions or convex optimization problems, let us underline that it offers a setting in which the aforementioned problems can also be embedded. In this setting the energy functional is usually expressed by means of the distance of the trajectory to a solution of the problem under consideration. The continuous version of the Opial Lemma plays a decisive role in the analysis.
As particular cases we mention the dynamical systems of forward-backward type, forward-backward-forward or Douglas-Rachford. The case of convex optimization problems is also considered.
Afterwards we continue with second-order dynamical systems [39]. These are motivated by the fact that by time discretization inertial terms are induced in the algorithmic schemes which are responsible for acceleration (see also the works of Polyak [78], Nesterov [73,74], Bertsekas [31] in the discrete case). Also in this setting we start with a very general dynamical system formulated by means of a cocoercive operator and then consider particular cases for monotone inclusions and convex optimization problems. Relations to other types of dynamics considered in the literature are also mentioned.
Finally, we consider dynamical systems for non-convex and non-smooth optimization problems [42]. How to find the appropriate energy functional in this context is not trivial. The approach for proving asymptotic convergence of the trajectory towards a critical point of the objective function uses three main ingredients (see [4] for the continuous case and also [11,37] for a similar approach in the discrete setting). Namely, we show a sufficient decrease property along the trajectories of a regularization of the objective function, the existence of a subgradient lower bound for the trajectories and, finally, we obtain convergence by making use of the Kurdyka-Lojasiewicz property of the objective function.
1.2 Notations. Let us fix a few notations used throughout the paper. Let N = {0, 1, 2, ...} be the set of nonnegative integers. Let H be a real Hilbert space with inner product ⟨·, ·⟩ and associated norm ‖·‖ = √⟨·, ·⟩. For the reader's convenience let us recall some standard notions and results in monotone operator theory which will be used in the following (see also [29,38,87]). For an arbitrary set-valued operator A : H ⇒ H we denote by Gr A = {(x, u) ∈ H × H : u ∈ Ax} its graph. We use also the notation zer A = {x ∈ H : 0 ∈ Ax} for the set of zeros of A. We say that A is monotone if ⟨x − y, u − v⟩ ≥ 0 for all (x, u), (y, v) ∈ Gr A, and that a monotone operator A is maximally monotone if there exists no proper monotone extension of its graph. The operator A is said to be uniformly monotone if there exists an increasing function φ_A : [0, +∞) → [0, +∞] that vanishes only at 0 such that ⟨x − y, u − v⟩ ≥ φ_A(‖x − y‖) for every (x, u) ∈ Gr A and (y, v) ∈ Gr A. If this property is fulfilled in the special case φ_A(t) = γt², with γ > 0, we say that A is γ-strongly monotone. We consider also the class of cocoercive operators: the single-valued mapping B : H → H is said to be γ-cocoercive, with γ > 0, if ⟨x − y, Bx − By⟩ ≥ γ‖Bx − By‖² for all x, y ∈ H. Notice that this is nothing else than saying that B⁻¹ is γ-strongly monotone.
We recall some standard notations and facts from convex analysis. For a proper, convex and lower semicontinuous function f : H → R ∪ {+∞}, its (convex) subdifferential at x ∈ H is defined as ∂f(x) = {u ∈ H : f(y) ≥ f(x) + ⟨u, y − x⟩ for all y ∈ H}. When seen as a set-valued mapping, it is a maximally monotone operator (see [81]) and, for γ > 0, its resolvent is given by J_{γ∂f} = prox_{γf} (see [29]), where prox_{γf} : H → H, prox_{γf}(x) = argmin_{y∈H} {f(y) + (1/(2γ))‖y − x‖²}, denotes the proximal point operator of f. The function f is said to be ν-strongly convex, where ν > 0, if f − (ν/2)‖·‖² is a convex function. Let us mention that if f is ν-strongly convex, then ∂f is ν-strongly monotone, see [29, Example 22.3(iv)].

First order dynamical systems
Let T : H → H be a nonexpansive mapping (that is, ‖Tx − Ty‖ ≤ ‖x − y‖ for all x, y ∈ H), λ : [0, +∞) → [0, 1] be a Lebesgue measurable function and x₀ ∈ H. We are concerned with the following dynamical system: ẋ(t) = λ(t)(T(x(t)) − x(t)), x(0) = x₀. (7) As in [2,25], we consider the following definition of an absolutely continuous function.
Definition 1 A function f : [0, b] → H (where b > 0) is said to be absolutely continuous if one of the following equivalent properties holds: (i) there exists an integrable function g : [0, b] → H such that f(t) = f(0) + ∫₀ᵗ g(s) ds for all t ∈ [0, b]; (ii) f is continuous and its distributional derivative is Lebesgue integrable on [0, b]; (iii) for every ε > 0 there exists η > 0 such that, for any finite family of pairwise disjoint intervals I_k = (a_k, b_k) ⊆ [0, b], Σ_k (b_k − a_k) < η implies Σ_k ‖f(b_k) − f(a_k)‖ < ε.

Remark 1 (a) It follows from the definition that an absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere and one can recover the function from its derivative f′ = g by the integration formula (i). We say that f : [0, +∞) → H is locally absolutely continuous if it is absolutely continuous on every interval [0, b] for 0 < b < +∞.

(b) If f : [0, b] → H is absolutely continuous and g : H → H is L-Lipschitz continuous (where L ≥ 0), then the composition g ∘ f is absolutely continuous, too; moreover, g ∘ f is differentiable almost everywhere and ‖(g ∘ f)′(t)‖ ≤ L‖f′(t)‖ for almost every t ∈ [0, b].

Definition 2 We say that x : [0, +∞) → H is a strong global solution of (7) if the following properties are satisfied: (i) x is locally absolutely continuous; (ii) ẋ(t) = λ(t)(T(x(t)) − x(t)) for almost every t ∈ [0, +∞); (iii) x(0) = x₀. The existence and uniqueness of strong global solutions of (7) is a consequence of the Cauchy-Lipschitz theorem for absolutely continuous trajectories (see for example [66, Proposition 6.2.1], [88, Theorem 54]).
In order to proceed with the asymptotic analysis, we need the following preparatory results.
Proof. We rely on Lyapunov analysis combined with the Opial Lemma. We take an arbitrary y ∈ Fix T. From the nonexpansiveness of T and [29, Corollary 2.14] we obtain the estimate (8). Since λ(t) ∈ [0, 1] for all t ≥ 0, from (8) it follows that t → ‖x(t) − y‖ is decreasing, hence lim_{t→+∞} ‖x(t) − y‖ exists. From here we obtain the boundedness of the trajectory and, by integrating (8), we deduce also that ∫₀^{+∞} λ(t)(1 − λ(t))‖T(x(t)) − x(t)‖² dt < +∞; thus (i) holds. Since y ∈ Fix T has been chosen arbitrarily, the first assumption in the continuous version of the Opial Lemma is fulfilled. We show in the following that lim_{t→+∞} ‖T(x(t)) − x(t)‖ exists and is a real number. This is immediate if we show that the function t → ½‖T(x(t)) − x(t)‖² is decreasing. According to Remark 1(b), the function t → T(x(t)) is almost everywhere differentiable and ‖(d/dt)T(x(t))‖ ≤ ‖ẋ(t)‖ holds for almost all t ≥ 0. Moreover, by using the first equation of (7), one obtains that the derivative of t → ½‖T(x(t)) − x(t)‖² is nonpositive almost everywhere, hence lim_{t→+∞} ‖T(x(t)) − x(t)‖ exists and is a real number. (a) Firstly, let us assume that ∫₀^{+∞} λ(t)(1 − λ(t)) dt = +∞. This immediately implies by (9) that lim_{t→+∞}(T(x(t)) − x(t)) = 0, thus (ii) holds. Taking into account that λ is bounded, from (7) and (ii) we deduce (iii). For the last property of the theorem we need to verify the second assumption of the Opial Lemma. Let x̄ ∈ H be a weak sequential cluster point of x, that is, there exists a sequence t_n → +∞ (as n → ∞) such that (x(t_n))_{n∈N} converges weakly to x̄. Applying the demiclosedness principle [29, Corollary 4.18] and (ii) we obtain x̄ ∈ Fix T and the conclusion follows.
(b) We suppose now that inf_{t≥0} λ(t) > 0. From the first relation of (7) and (i) we easily deduce that t → ‖T(x(t)) − x(t)‖² belongs to L¹([0, +∞)). Since, in addition, its derivative admits an integrable upper bound for almost all t ≥ 0, we obtain by applying Lemma 3 that lim_{t→∞} ‖T(x(t)) − x(t)‖² = 0, thus (ii) holds. The rest of the proof can be carried out along the lines of case (a) considered above.

Remark 6
We refer the reader to [61] for a recent contribution concerning convergence rates of the trajectories generated by (7) in case the operator T satisfies some regularity assumptions, like boundedly linear/Hölder regularity.

Remark 7
The explicit discretization of (7) with respect to the time variable t, with step size h_n > 0, yields for an initial point x₀ the following iterative scheme: x_{n+1} = x_n + h_n λ_n (T(x_n) − x_n). By taking h_n = 1 this becomes x_{n+1} = x_n + λ_n (T(x_n) − x_n), (10) which is the classical Krasnosel'skiĭ-Mann algorithm for approaching the set of fixed points of the nonexpansive operator T (see [29, Theorem 5.14]). Let us mention that the convergence of (10) is guaranteed under the condition Σ_{n∈N} λ_n(1 − λ_n) = +∞. Notice that in case λ_n = 1 for all n ∈ N and for an initial point x₀ different from 0, the convergence of (10) can fail, as happens for instance for the operator T = −Id. In contrast to this, as pointed out in Theorem 5, the dynamical system (7) has a strong global solution and the convergence of the trajectory is guaranteed also in case λ(t) = 1 for all t ≥ 0.
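The failure of the unrelaxed iteration for T = −Id, and the convergence of the relaxed one, can be checked directly (a minimal sketch; the operator and parameter values follow the remark above):

```python
# Krasnosel'skii-Mann iteration x_{n+1} = x_n + lam*(T(x_n) - x_n) for the
# nonexpansive operator T = -Id: with lam = 1 the iterates oscillate between
# x0 and -x0, while a constant relaxation lam in (0, 1) converges to the
# unique fixed point 0 (the condition sum lam_n*(1 - lam_n) = +infinity holds).

def km(T, x0, lam, iters=60):
    x = x0
    for _ in range(iters):
        x = x + lam * (T(x) - x)
    return x

T = lambda x: -x
x_relaxed = km(T, 1.0, 0.25)   # contraction by factor 1 - 2*lam = 0.5 per step
x_unrelaxed = km(T, 1.0, 1.0)  # pure oscillation x -> -x, no convergence
```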

Remark 8
The conclusions of Theorem 5 remain valid for the class of averaged operators. Let α ∈ (0, 1) be fixed. We say that R : H → H is α-averaged if there exists a nonexpansive operator T : H → H such that R = (1 − α) Id + αT. In this case we consider λ : [0, +∞) → [0, 1/α] and the condition ∫₀^{+∞} λ(t)(1 − αλ(t)) dt = +∞.

Remark 9

Relying on time rescaling arguments and using the link with a dynamical system governed by a cocoercive operator, it has been shown in [40, Section 4] that the conclusion of the theorem remains valid under the condition ∫₀^{+∞} λ(t) dt = +∞, which is weaker than the ones considered above. Also the specific upper bound for λ is not needed anymore. Our option to present the results in this manner is motivated by the link with the discrete case and by the fact that for the convergence rates in Theorem 10 the upper bound will be used again.
In the following we investigate the convergence rate of the trajectories of the dynamical system (7). This will be done in terms of the fixed point residual function t → T (x(t)) − x(t) and of t → ẋ(t) . Notice that convergence rates for the discrete iteratively generated algorithm (10) have been investigated in [60,63,69].
Let x : [0, +∞) → H be the unique strong global solution of (7). Then for all t > 0 the fixed point residual ‖T(x(t)) − x(t)‖ satisfies an estimate of order O(1/√t). Proof. According to (9) we have that lim_{t→+∞} f(t) ∈ R. Since t → ½‖T(x(t)) − x(t)‖² is decreasing (see the proof of Theorem 5), we have for all t ≥ 0 the corresponding pointwise bound; taking into account the definition of τ, we easily derive the claimed rate, and the conclusion follows by using again (7).
In the following we investigate a continuous version of the forward-backward algorithm. Let A : H ⇒ H be a maximally monotone operator, β > 0 and B : H → H be β-cocoercive. Consider the dynamical system ẋ(t) = λ(t)(J_{γA}(x(t) − γB(x(t))) − x(t)), x(0) = x₀. (11) It is immediate that (11) can be written in the form (7), since for γ ∈ (0, 2β) the operator J_{γA} ∘ (Id − γB) is 1/δ-averaged, where δ := (4β − γ)/(2β). Let x : [0, +∞) → H be the unique strong global solution of (11). Then the following statements are true: (i) the trajectory x is bounded and ẋ ∈ L²([0, +∞); H). Remark 12 (i) The explicit discretization of (11) with respect to the time variable t, with step size h_n > 0 and initial point x₀, yields the following iterative scheme: x_{n+1} = x_n + h_n λ_n (J_{γA}(x_n − γB(x_n)) − x_n). For h_n = 1 this becomes x_{n+1} = x_n + λ_n (J_{γA}(x_n − γB(x_n)) − x_n), (12) which is the classical forward-backward algorithm for finding the zeros of A + B (see [29, Theorem 25.8]). Let us mention that the convergence of (12) is guaranteed under the condition Σ_{n∈N} λ_n(δ − λ_n) = +∞.
(ii) Let us notice that in case A + B is strongly monotone, the rate of convergence in (vi) is exponential, see [41, Theorem 1].
(iii) Notice that strong convergence of the trajectory can be induced by Tikhonov regularization. We refer the reader to [51], where the dynamics has been investigated under appropriate conditions imposed on the Tikhonov regularization in order to generate strong convergence of the trajectory towards the minimal norm solution. Notice that (13) already appears in [33, Section 5] in case λ(t) = 1, A is the normal cone to a nonempty, closed convex set C ⊆ H (that is, J_{γA} is the projection operator onto C) and B is the gradient of a convex and smooth function.
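A minimal numerical sketch of the forward-backward iteration from Remark 12 (the operators A = N_C with C = [0, +∞) and B = ∇g for g(x) = ½(x + 1)² are illustrative choices of ours): the resolvent J_{γA} is then the projection onto C and zer(A + B) = {0}.

```python
# Forward-backward iteration x_{n+1} = x_n + lam*(J_{gamma A}(x_n - gamma*B(x_n)) - x_n)
# for A = normal cone of C = [0, +inf), so J_{gamma A} = projection onto C,
# and B = grad g with g(x) = 0.5*(x+1)^2 (toy problem).
# Then zer(A + B) = argmin_{x >= 0} g(x) = {0}.

def forward_backward(x0, gamma=0.5, lam=0.5, iters=80):
    x = x0
    for _ in range(iters):
        y = max(0.0, x - gamma * (x + 1.0))  # backward (projection) after forward step
        x = x + lam * (y - x)                # relaxed update
    return x

x = forward_backward(2.0)
```

Here B is 1-cocoercive, so the step size γ = 0.5 lies in the admissible range (0, 2β).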
Remark 13 Let us mention that in case A = ∂f , where f : H → R ∪ {+∞} is a proper, convex and lower semicontinuous function defined on a real Hilbert space H, and for λ(t) = 1 for all t ≥ 0, the dynamical system (11) has been studied in [1]. Notice that the weak convergence is obtained in [1, Theorem 5.2] for a constant step-size γ ∈ (0, 4β).

Remark 14
In Theorem 11 above related to the dynamical system (11), the cocoercivity of the operator B plays an important role. Now let us stress the fact that in applications arising from game theory for example, the operator (x, y) → (∇ x φ(x, y), −∇ y φ(x, y)) is not cocoercive (even in the bilinear case when φ(x, y) = x, Ky with K a linear and continuous operator). Under further conditions imposed on φ : H × G → R (G is another real Hilbert space), like convexity in x, concavity in y and some smoothness conditions, this operator is monotone and Lipschitz. Hence the desire to consider dynamical systems and numerical schemes also in this case is well motivated.
With respect to the case when B is monotone and Lipschitz, let us mention two types of dynamics considered in the literature. First, a dynamical system of forward-backward-forward type was introduced and investigated in [27]. The time discretization of this dynamics leads to the well-known Tseng algorithm [91], y_n = J_{γA}(x_n − γB(x_n)), x_{n+1} = y_n + γ(B(x_n) − B(y_n)), (15) which plays an important role also in primal-dual type algorithms for highly structured monotone inclusion/optimization problems, see for example [58]. Let us also mention that the dynamics (14) and several discretizations have been considered in the context of pseudo-monotone variational inequalities, see [50]. Second, the dynamical system (16) has been investigated in [62]. By using the notation z(t) = x(t) + y(t), it can be shown that the dynamics (16) is equivalent to a system (17) governed by the reflected resolvent R_{γA} = 2J_{γA} − Id of A. Notice that this is a continuous version of the Douglas-Rachford algorithm, see [29]. Now, (17) can be written in the language of (7), since the reflected resolvent is a nonexpansive mapping. Further, let us mention that time discretization of (16) leads to a numerical scheme which, compared to Tseng's algorithm (15), has the advantage that each iteration requires only one forward evaluation of the single-valued operator B (see also [71] for another numerical scheme with this property). Finally, we mention also [80] for continuous in time dynamics and numerical schemes for monotone inclusion problems involving monotone and Lipschitz operators.
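To illustrate why the forward-backward-forward correction matters (a sketch with toy data of our own choosing), consider the skew, hence monotone and 1-Lipschitz but not cocoercive, rotation operator B(x₁, x₂) = (x₂, −x₁), with A = 0 so that J_{γA} = Id:

```python
# Tseng's forward-backward-forward algorithm for B monotone and L-Lipschitz
# but NOT cocoercive: the rotation B(x1, x2) = (x2, -x1), L = 1, with A = 0
# and zer(A + B) = {(0, 0)}. Plain forward steps x_{n+1} = x_n - gamma*B(x_n)
# increase the norm for every gamma > 0, while Tseng's extra forward
# correction converges for gamma < 1/L.

def B(x):
    return [x[1], -x[0]]

def tseng(x0, gamma=0.5, iters=100):
    x = list(x0)
    for _ in range(iters):
        bx = B(x)
        y = [xi - gamma * bi for xi, bi in zip(x, bx)]  # y_n = J_{gamma A}(x_n - gamma*B(x_n))
        by = B(y)
        x = [yi + gamma * (bxi - byi)                   # correction: x_{n+1} = y_n + gamma*(B(x_n) - B(y_n))
             for yi, bxi, byi in zip(y, bx, by)]
    return x

x = tseng([1.0, 0.0])
```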
The results presented in this section can be specialized in the context of the convex optimization problem inf_{x∈H} {f(x) + g(x)}, (19) where f : H → R ∪ {+∞} is a proper, convex and lower semicontinuous function, β > 0 and g : H → R is a convex and (Fréchet) differentiable function with 1/β-Lipschitz continuous gradient (which implies that ∇g is β-cocoercive, due to the celebrated Baillon-Haddad Theorem, see [29, Corollary 8.16]). Notice that argmin_{x∈H} {f(x) + g(x)} = zer(∂f + ∇g) and the system (11) becomes ẋ(t) = λ(t)(prox_{γf}(x(t) − γ∇g(x(t))) − x(t)), x(0) = x₀. (20) Let us underline the remarkable result that in case λ(t) = 1 the trajectory of (20) converges weakly to a solution of (19) with no further restriction on the positive step size γ. This has been proved in [1, Theorem 5.2] by using the energy functional (21), which is formulated in terms of a solution x* of (19). For the prox-gradient scheme (see ISTA in [30]) it is known that the rate O(1/n) holds for the function values along the generated iterates. The following theorem can be seen as the continuous counterpart of this result.
Proof. From the characterization of the proximal operator we obtain (23). The convexity of f yields (24). Further, by using the convexity of g and the Descent Lemma we obtain (25). Combining (24) and (25) we obtain (26). Let us fix an arbitrary T > 0. By integrating (26) from 0 to T we derive (27). We notice that, due to (20), the trajectory satisfies the inequality (28). Due to the continuity properties of the trajectory on [0, T], one has ẍ + ẋ ∈ L²([0, T]; H).
From (23) and [53, Lemme 4, p. 73] (see also [17, Lemma 3.2]) we obtain that the function t → f(ẋ(t) + x(t)) is absolutely continuous; a corresponding statement holds for t → g(ẋ(t) + x(t)). Summing up the last two equalities we derive (29)-(30), where in (29) we used the Lipschitz continuity of ∇g and in (30) the inequality (28). Altogether, we conclude that for almost every t ≥ 0 the estimate (31) holds. Due to the condition imposed on the step size, we get that the function t → (f + g)(ẋ(t) + x(t)) − (f + g)(x*) + (1/(2γ))‖ẋ(t)‖² is nonincreasing. This together with (27) yields the desired O(1/T) estimate for the function values and the proof is complete.
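The decrease of the function values along the trajectory of (20) can also be observed numerically. The following Euler simulation sketch uses a toy problem of our own choosing, f(x) = |x| and g(x) = ½(x − 2)², whose minimizer is x* = 1:

```python
# Euler simulation (step dt) of the trajectory of
#   x'(t) = prox_{gamma f}(x(t) - gamma*grad g(x(t))) - x(t),  with lam(t) = 1,
# for the toy objective f(x) = |x|, g(x) = 0.5*(x - 2)^2; the minimizer of
# f + g is x* = 1, where |x| + 0.5*(x - 2)^2 attains the value 1.5.

def prox_abs(v, gamma):
    # soft-thresholding = prox of gamma*|.|
    return max(abs(v) - gamma, 0.0) * (1.0 if v >= 0 else -1.0)

def simulate(x0=-3.0, gamma=0.5, dt=0.01, T=20.0):
    x = x0
    vals = []
    for _ in range(int(T / dt)):
        x = x + dt * (prox_abs(x - gamma * (x - 2.0), gamma) - x)
        vals.append(abs(x) + 0.5 * (x - 2.0) ** 2)  # objective along the trajectory
    return x, vals

x, vals = simulate()
```

In this example the objective values decrease monotonically along the simulated trajectory and x(t) approaches the minimizer 1.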
Remark 16 Let us mention now a recent contribution related to structured optimization problems. For H and G real Hilbert spaces, we consider the convex minimization problem inf_{x∈H} {f(x) + h(x) + g(Ax)}, (32) where A : H → G is a linear continuous operator (for the precise assumptions on f, h and g we refer to [48]). Due to the presence of the composition with a linear operator, dual variables appear both when writing the Fenchel dual problem and in the primal-dual numerical schemes existing in the literature for solving highly structured optimization problems. The following dynamical system introduced in [48] can be seen as a continuous version of the mentioned numerical schemes for solving (32): ẏ(t) = cA(x(t) + ẋ(t)) − c(z(t) + ż(t)), x(0) = x₀ ∈ H, z(0) = z₀ ∈ G, y(0) = y₀ ∈ G, (33) together with the governing equations for x and z, for which we refer to [48]. Let us stress the fact that the analysis of the dynamical system (33) is involved: additional technical results are needed in order to show that the system is well-posed, and the asymptotic analysis requires deep machinery in order to derive an appropriate energy functional. We invite the reader to consult [48] for all these details, just mentioning that, under appropriate hypotheses, (x(t), z(t), y(t)) converges weakly to a saddle point of the Lagrangian l : H × G × G → R, l(x, z, y) = f(x) + h(x) + g(z) + ⟨y, Ax − z⟩.
There are not too many works in the literature devoted to solving optimization problems involving compositions with linear operators by means of continuous in time dynamics. Let us also mention a recent contribution of Attouch [7], where a different dynamical system is proposed in [7, Section 2.2] for solving block-structured optimization problems with linear constraints. We mention also [32], where a dynamical system attached to a more involved optimization problem is investigated, which is a continuous counterpart of the proximal alternating minimization algorithm AMA (see also the work of Tseng on AMA [90]).
The dynamics (33) provides through explicit time discretization a numerical algorithm which is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm.
Indeed, writing the inclusion in an equivalent form and applying the explicit discretization of (33) with respect to the time variable t, with constant step size h_k ≡ 1, yields the iterative scheme (34) (see [48, Remark 1]), where (M₁ⁿ)_{n≥0} and (M₂ⁿ)_{n≥0} are two operator sequences in S₊(H) and S₊(G), respectively.
A particular choice of the linear maps M₁ and M₂ transforms (33) into a dynamical system of primal-dual type formulated in the spirit of the full splitting paradigm. With this choice, made for every t ∈ [0, +∞), the dynamical system (33) can be equivalently written as (35) (see [48, Remark 2]), where g* : G → R ∪ {+∞} is the Fenchel conjugate of the proper, convex and lower semicontinuous function g. Let us also mention that when h = 0 and γ = 1, the discretization of the dynamical system (35) leads to the primal-dual algorithm proposed by Chambolle and Pock in [55].

Second order dynamical systems
As in the previous section, we start with a general dynamical system which will then be considered in several special instances, including the link to optimization problems: ẍ(t) + γ(t)ẋ(t) + λ(t)B(x(t)) = 0, x(0) = u₀, ẋ(0) = v₀, (36) where u₀, v₀ ∈ H, B : H → H is a β-cocoercive operator, λ : [0, +∞) → [0, +∞) is a relaxation function in time and γ : [0, +∞) → [0, +∞) is a continuous damping function.
The existence and uniqueness of the trajectory follows (under appropriate conditions imposed on the parameters) from the Cauchy-Lipschitz-Picard Theorem by writing (36) as a first order dynamical system in a product space (see [39,Theorem 4]).
In order to prove the convergence of the trajectories of (36), we make the following assumptions on the relaxation function λ and the damping function γ: (A1) λ, γ : [0, +∞) → (0, +∞) are locally absolutely continuous and there exists θ > 0 such that for almost every t ∈ [0, +∞) we have γ̇(t) ≤ 0 ≤ λ̇(t) and βγ²(t) ≥ (1 + θ)λ(t). (37) Due to Definition 1 and Remark 1(a), λ̇(t) and γ̇(t) exist for almost every t ≥ 0 and λ̇, γ̇ are Lebesgue integrable on each interval [0, b] for 0 < b < +∞. This, combined with γ̇(t) ≤ 0 ≤ λ̇(t) and the fact that λ and γ take only positive values, yields the existence of a positive lower bound for λ and of a positive upper bound for γ. Furthermore, the second condition in (37) provides also a positive upper bound for λ and a positive lower bound for γ.
We would also like to point out that under the conditions considered in (A1) the global version of the Cauchy-Lipschitz-Picard Theorem allows us to conclude that, for u 0 , v 0 ∈ H, there exists a unique trajectory x : [0, +∞) → H which is a C 2 -function and which satisfies the first relation in (36) for every t ∈ [0, +∞). The considerations we make in the following take into account this fact.
By multiplying this inequality by exp(γt) and then integrating from 0 to T, where T > 0, one easily obtains (43); thus h is bounded and, consequently, the trajectory x is bounded.
On the other hand, from (42) it follows that a further estimate holds for every t ∈ [0, +∞). This inequality in combination with (44) yields that ẋ is bounded, which further implies that ḣ is bounded.
(iii) We are going to prove that both assumptions of the Opial Lemma are fulfilled. The first one concerns the existence of lim_{t→+∞} ‖x(t) − x*‖. As seen in the proof of part (i), the function t → ḣ(t) + γ(t)h(t) + β(γ(t)/λ(t))‖ẋ(t)‖² is monotonically decreasing, thus from (i), (ii) and (A1) we deduce that lim_{t→+∞} γ(t)h(t) exists and is a real number. By taking also into account that lim_{t→+∞} γ(t) exists and belongs to (0, ∞), we obtain the existence of lim_{t→+∞} ‖x(t) − x*‖.
We come now to the second assumption of the Opial Lemma. Let x̄ be a weak sequential cluster point of x, that is, there exists a sequence t_n → +∞ (as n → +∞) such that (x(t_n))_{n∈N} converges weakly to x̄. Since B is a maximally monotone operator (see for instance [29, Example 20.28]), its graph is sequentially closed with respect to the weak-strong topology of the product space H × H. By using also that lim_{n→+∞} B(x(t_n)) = 0, we conclude that B(x̄) = 0, hence x̄ ∈ zer B and the proof is complete.
A standard choice of a cocoercive operator defined on a real Hilbert space is B = Id − T, where T : H → H is a nonexpansive operator; as easily follows from the nonexpansiveness of T, B is in this case 1/2-cocoercive. For this particular operator B the dynamical system (36) becomes ẍ(t) + γ(t)ẋ(t) + λ(t)(x(t) − T(x(t))) = 0, x(0) = u₀, ẋ(0) = v₀. (47) Theorem 17 gives rise to a corresponding result where the trajectory converges weakly to a fixed point of T, where in (A1) we use the condition γ²(t)/λ(t) ≥ 2(1 + θ).
Remark 18 (i) The analysis can be further extended to the situation when T in (47) is an α-averaged operator (with 0 < α < 1). In this case we use in (A1) the condition γ²(t)/λ(t) ≥ 2α(1 + θ). (ii) In the particular case when γ(t) = γ > 0 and λ(t) = 1 for all t ∈ [0, +∞), the dynamical system (47) has been studied in [8, Theorem 3.2] under the condition γ² > 2, which is covered by the theory presented above. We would also like to mention that in [3] an anisotropic damping has been considered in the context of approaching the minimization of a smooth convex function via second order dynamical systems.
Let us now turn our attention to monotone inclusions and consider the dynamics ẍ(t) + γ(t)ẋ(t) + λ(t)(x(t) − J_{ηA}(x(t) − ηB(x(t)))) = 0, x(0) = u₀, ẋ(0) = v₀. (48) Further, in (A1) we use γ²(t)/λ(t) ≥ 2(1 + θ)/δ, where δ := (4β − η)/(2β). The statements (i)-(iii) in the next result follow from Remark 18(ii) by taking into account that, under the hypotheses, T = J_{ηA} ∘ (Id − ηB) is 1/δ-averaged; see also the comments before Theorem 11. For (iv) and (v) we invite the reader to consult [39, Theorem 12]. Moreover, under appropriate hypotheses, the rate of convergence in (v) is exponential, see [41, Theorem 7]. Remark 20 For second order dynamical systems with vanishing damping we refer the reader to [23], where the following differential equation is investigated: ẍ(t) + (α/t)ẋ(t) + A_{λ(t)}(x(t)) = 0, (49) A_{λ(t)} : H → H being the Yosida regularization of A, defined by A_{λ(t)} = (1/λ(t))(Id − J_{λ(t)A}). Convergence of the trajectory towards a zero of the maximally monotone operator A : H ⇒ H is reported in [23], together with convergence rates concerning the velocity and the acceleration. Let us also mention the recent contribution [20], where a Newton-like correction term is used: ẍ(t) + (α/t)ẋ(t) + β(d/dt)A_{λ(t)}(x(t)) + A_{λ(t)}(x(t)) = 0, (50) with β > 0. In the case of convex potentials, the correction term becomes the Hessian driven damping, which is responsible for the attenuation of the oscillations of the trajectories and for better convergence rates for the gradients along the orbits, see also [13] and [85]. For the algorithmic consequences of (49) and (50) we invite the reader to consult [23] and [19].
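For A = ∂|·| the objects above can be computed in closed form (a sketch; the one-dimensional example is ours): the resolvent J_{λA} is soft-thresholding and the Yosida regularization A_λ = (1/λ)(Id − J_{λA}) is the single-valued, (1/λ)-Lipschitz clipped map, with the same zeros as A.

```python
# Yosida regularization A_lam = (1/lam)*(Id - J_{lam A}) for A = d|.|,
# whose resolvent J_{lam A} is soft-thresholding. The result is the
# "clipped" map x -> max(-1, min(1, x/lam)): single-valued, (1/lam)-Lipschitz,
# and with the same unique zero 0 as A.

def J(x, lam):
    # resolvent (Id + lam*d|.|)^{-1}: soft-thresholding by lam
    return max(abs(x) - lam, 0.0) * (1.0 if x >= 0 else -1.0)

def yosida(x, lam):
    return (x - J(x, lam)) / lam

lam = 0.5
inside = yosida(0.2, lam)   # |x| <= lam: equals x/lam
outside = yosida(3.0, lam)  # |x| > lam: saturates at sign(x)
```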
In the remainder of this section we turn our attention to optimization problems of the form (19); see the setting described there.
In the following statement we approach the minimizers of f + g via the second order dynamical system ẍ(t) + γ(t)ẋ(t) + λ(t)(x(t) − prox_{ηf}(x(t) − η∇g(x(t)))) = 0, x(0) = u₀, ẋ(0) = v₀. (51) Corollary 21 (iii) x(t) converges weakly to a minimizer of f + g as t → +∞; (iv) if x* is a minimizer of f + g, then ∇g(x(·)) − ∇g(x*) ∈ L²([0, +∞); H), lim_{t→+∞} ∇g(x(t)) = ∇g(x*) and ∇g is constant on argmin_{x∈H} {f(x) + g(x)}; (v) if f or g is uniformly convex, then x(t) converges strongly to the unique minimizer of f + g as t → +∞.

Remark 22
Consider again the setting in Remark 18(ii), namely, γ(t) = γ > 0 for every t ≥ 0 and λ(t) = 1 for every t ∈ [0, +∞). Furthermore, for C a nonempty, convex, closed subset of H, let f = δ_C be the indicator function of C, which is equal to 0 for x ∈ C and to +∞ otherwise. The dynamical system (20) attached in this setting to the minimization of g over C becomes

ẍ(t) + γẋ(t) + x(t) − P_C(x(t) − η∇g(x(t))) = 0, (52)

where P_C denotes the projection onto C. The asymptotic convergence of the trajectories of (52) has been studied in [8, Theorem 3.1] under the conditions γ² > 2 and 0 < η ≤ 2β. In this case the corresponding assumption (A1) trivially holds by choosing θ such that 0 < θ ≤ (γ² − 2)/2 ≤ (δγ² − 2)/2. Thus, in order to verify the corresponding (A1) in case λ(t) = 1 for every t ∈ [0, +∞), one needs to equivalently assume that γ² > 2/δ. Since δ ≥ 1, this provides a slight improvement over [8, Theorem 3.1] concerning the choice of γ. We refer the reader also to [5] for an analysis of the convergence rates of the trajectories of (52) when g is endowed with supplementary properties.
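When f = δ_C, the proximal operator prox_{ηf} reduces to the metric projection P_C, independently of the step size η. For a box constraint this is a componentwise clipping; a small sketch (the names and the box example are ours):

```python
import numpy as np

def proj_box(x, lo, hi):
    # Projection onto the box C = [lo, hi]^n, i.e. the proximal
    # operator of the indicator function delta_C, for any step size.
    return np.clip(x, lo, hi)
```

Components already inside the box are left unchanged; components outside are moved to the nearest face.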
Remark 23 By using the energy functional (21), it is possible to go beyond the condition 0 < η ≤ 2β considered in Corollary 21. We refer the reader to [39, Corollary 16] for more details. Moreover, in [39, Remark 17] it has been pointed out that in case γ(t) = γ > 0 for every t ≥ 0 and λ(t) = 1 for every t ∈ [0, +∞), the only condition imposed on the parameters is γ² > η/β + 1. In other words, this allows in this particular setting a more relaxed choice of the parameters, beyond the standard assumptions 0 < η ≤ 2β and γ² > 2 considered in [8].

Remark 24
The explicit discretization of (51) with respect to the time variable t, with step size h_n > 0, relaxation variable λ_n > 0, damping variable γ_n > 0 and initial points x_0 := u_0 and x_1 := v_0, yields an iterative scheme which, for h_n = 1, becomes the relaxed forward-backward algorithm (53) for minimizing f + g with inertial effects. In order to investigate the convergence properties of this iterative scheme, it is natural to assume that (γ_n)_{n≥1} is nonincreasing, (λ_n)_{n≥1} is nondecreasing, and that there exists a lower bound for (γ²_n/λ_n)_{n≥1}. The control of the inertial term by means of the variable parameters λ_n and γ_n could increase the speed of convergence of the algorithm (53). Making use of the sequence (λ_n)_{n≥1} in (53), one obtains a relaxed forward-backward scheme; relaxation is usually considered in the literature in order to achieve more freedom in the choice of the parameters involved in the numerical scheme. We refer the reader to [12], where this fact has been underlined in the context of monotone inclusion problems.
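As a rough illustration of such a scheme (not the exact iteration (53), whose parameter combination is determined by the discretization above), the following sketch performs an inertial extrapolation, a forward-backward step, and a relaxation step, here for f = |·| via soft-thresholding; all names and the parameter coupling are our simplifications:

```python
import numpy as np

def prox_l1(x, step):
    # Proximal operator of step * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - step, 0.0)

def inertial_fb(grad_g, prox_f, x0, eta, gamma, lam, n_iter=100):
    # Sketch of a relaxed forward-backward scheme with inertial effects:
    # extrapolate using the previous iterate, take a prox-gradient step,
    # then relax. gamma controls the inertia, lam the relaxation.
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        y = x + gamma * (x - x_prev)               # inertial extrapolation
        z = prox_f(y - eta * grad_g(y), eta)       # forward-backward step
        x_prev, x = x, (1.0 - lam) * y + lam * z   # relaxation
    return x
```

For g(x) = (1/2)(x − 3)² and f = |·|, the unique minimizer of f + g is x = 2, and the iteration converges to it for moderate inertia.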

Remark 25
In case f vanishes and ηλ(t) = 1 for every t ≥ 0, the dynamical system (51) becomes

ẍ(t) + γ(t)ẋ(t) + ∇g(x(t)) = 0.

There is intensive research in the literature devoted mainly to vanishing damping of the form γ(t) = α/t, with α > 0. This is due to the fact that in case α > 3 one obtains a fast convergence result in terms of the objective function, of order o(1/t²). The starting point of these investigations was the work of Su, Boyd and Candès [89], followed by relevant contributions of Attouch and his co-workers (see for example [14,15] and the references therein). This is related to the accelerated gradient method in the sense of Nesterov, where inertial terms built from the iterates accelerate the convergence behaviour of the numerical scheme. Let us also mention the recent contributions [13,85], where inertial terms induced by the gradient of g are also considered; these come from the discretization of Hessian driven damping terms, see also [24] and [49].
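As a numerical illustration (the integration scheme, step size and test function are our choices), one can integrate the vanishing-damping dynamics ẍ(t) + (α/t)ẋ(t) + ∇g(x(t)) = 0 with a semi-implicit Euler method:

```python
import numpy as np

def avd_trajectory(grad_g, x0, alpha=3.5, h=0.01, t0=1.0, n_steps=5000):
    # Semi-implicit Euler discretization of
    #   x''(t) + (alpha / t) x'(t) + grad_g(x(t)) = 0,
    # updating the velocity first, then the position.
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    t = t0
    for _ in range(n_steps):
        v = v - h * ((alpha / t) * v + grad_g(x))
        x = x + h * v
        t += h
    return x
```

For g(x) = (1/2)‖x‖² the trajectory oscillates towards the minimizer 0, with amplitude decaying roughly like t^(−α/2), which illustrates the attenuating effect of the vanishing damping.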

First order dynamics for nonconvex optimization problems
In this section we approach the solving of the optimization problem

inf_{x∈R^n} [f(x) + g(x)], (55)

where f : R^n → R ∪ {+∞} is a proper, convex, lower semicontinuous function and g : R^n → R is a (possibly nonconvex) Fréchet differentiable function with β-Lipschitz continuous gradient for β ≥ 0, i.e., ‖∇g(x) − ∇g(y)‖ ≤ β‖x − y‖ for all x, y ∈ R^n, by associating to it the implicit dynamical system

ẋ(t) = prox_{ηf}(x(t) − η∇g(x(t))) − x(t), x(0) = x_0, (56)

where η > 0 and x_0 ∈ R^n is chosen arbitrarily (we work in the Euclidean space R^n).
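Assuming the flow has the forward-backward form ẋ(t) = prox_{ηf}(x(t) − η∇g(x(t))) − x(t), an explicit Euler discretization with step h ∈ (0, 1] gives x_{n+1} = (1 − h)x_n + h·prox_{ηf}(x_n − η∇g(x_n)); for h = 1 this is exactly the proximal-gradient iteration. A minimal sketch (the test function and names are ours):

```python
import numpy as np

def prox_abs(x, eta):
    # Proximal operator of eta * |.| (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - eta, 0.0)

def euler_flow(grad_g, prox_f, x0, eta, h=0.1, n_steps=500):
    # Explicit Euler discretization of the (assumed) forward-backward flow
    #   x'(t) = prox_{eta f}(x(t) - eta * grad_g(x(t))) - x(t).
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x + h * (prox_f(x - eta * grad_g(x), eta) - x)
    return x
```

With g(x) = (1/2)(x − 3)² and f = |·|, the trajectory approaches the stationary point x = 2, where 0 ∈ (x − 3) + ∂|x|.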
Further, from the characterization of the proximal point operator we obtain, as in the proof of Theorem 15, an estimate which holds for almost every t ∈ [0, T]. By integration, by using (57), the fact that f + g is bounded from below and by taking into account that T > 0 has been arbitrarily chosen, we obtain ẋ ∈ L²([0, +∞); R^n).
Furthermore, for almost every t ∈ [0, +∞) we have a corresponding estimate on the derivative; by applying Lemma 3, it follows that lim_{t→+∞} ẋ(t) = 0 and the proof of (a) is complete. From (60), (57) and by using that T > 0 has been arbitrarily chosen, we obtain a decrease inequality for almost every t ∈ [0, +∞). From Lemma 2 it follows that the limit of the function values along the trajectory exists and is a real number, hence from lim_{t→+∞} ẋ(t) = 0 the conclusion follows.
For the following generalized subdifferential notions and their basic properties we refer to [72,83]. Let h : R^n → R ∪ {+∞} be a proper and lower semicontinuous function. If x ∈ dom h, we consider the Fréchet (viscosity) subdifferential of h at x as the set

∂̂h(x) = { v ∈ R^n : liminf_{y→x} [h(y) − h(x) − ⟨v, y − x⟩]/‖y − x‖ ≥ 0 };

for x ∉ dom h we set ∂̂h(x) := ∅. The limiting (Mordukhovich) subdifferential is defined, for x ∈ dom h, by

∂_L h(x) = { v ∈ R^n : ∃ sequences x_k → x, v_k → v such that h(x_k) → h(x) and v_k ∈ ∂̂h(x_k) for all k },

while ∂_L h(x) := ∅ for x ∉ dom h. Notice that in case h is convex, these subdifferential notions coincide with the convex subdifferential

∂h(x) = { v ∈ R^n : h(y) ≥ h(x) + ⟨v, y − x⟩ ∀y ∈ R^n }

for each x ∈ R^n. The graph of the limiting subdifferential fulfills the following closedness criterion: if (x_k)_{k≥0} and (v_k)_{k≥0} are sequences such that v_k ∈ ∂_L h(x_k) for all k, (x_k, v_k) → (x, v) and h(x_k) → h(x) as k → +∞, then v ∈ ∂_L h(x). The Fermat rule reads in this nonsmooth setting as follows: if x ∈ R^n is a local minimizer of h, then 0 ∈ ∂_L h(x). We denote by crit(h) = {x ∈ R^n : 0 ∈ ∂_L h(x)} the set of (limiting) critical points of h. We define the limit set of the trajectory x as

ω(x) = { x̄ ∈ R^n : ∃ t_k → +∞ such that x(t_k) → x̄ as k → +∞ }.

Lemma 27 Suppose that f + g is bounded from below and η > 0 fulfills the inequality (57). For x_0 ∈ R^n, let x ∈ C¹([0, +∞), R^n) be the unique global solution of (56). Then ω(x) ⊆ crit(f + g).

Proof. Let x̄ ∈ ω(x) and t_k → +∞ be such that x(t_k) → x̄ as k → +∞. From the characterization of the prox operator we have (64). Lemma 26(a) and the Lipschitz continuity of ∇g ensure (65) and (66). We claim that lim_{k→+∞} f(ẋ(t_k) + x(t_k)) = f(x̄). Since ẋ(t_k) + x(t_k) → x̄, due to the lower semicontinuity of f it holds liminf_{k→+∞} f(ẋ(t_k) + x(t_k)) ≥ f(x̄). Further, since ẋ(t) + x(t) = prox_{ηf}(x(t) − η∇g(x(t))) for every t ≥ 0, we have the inequality (68). Taking the limit as k → +∞ in (68) we derive, by using again Lemma 26(a), that limsup_{k→+∞} f(ẋ(t_k) + x(t_k)) ≤ f(x̄), which combined with the lower semicontinuity estimate implies the claim. By using (66) and the continuity of g we conclude that (67) is true. Altogether, from (64), (65), (66), (67) and the closedness criterion of the limiting subdifferential we obtain 0 ∈ ∂_L(f + g)(x̄) and the proof is complete.
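The gap between the Fréchet and the limiting subdifferential recalled above is already visible for h(x) = −|x| on the real line, a standard example we add here for illustration:

```latex
% h(x) = -|x|, \ x \in \mathbb{R}: at the origin,
\hat{\partial} h(0)
  = \Big\{ v \in \mathbb{R} \,:\, \liminf_{y \to 0}
      \frac{-|y| - v\, y}{|y|} \ge 0 \Big\}
  = \emptyset,
\qquad
\partial_L h(0) = \{-1, 1\},
% since h is differentiable at every x \neq 0 with
% h'(x) = -\operatorname{sign}(x), and these values accumulate at \pm 1.
% Note that 0 \notin \partial_L h(0): the Fermat rule correctly does not
% mark the origin, a local maximizer of h, as a critical point.
```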
Lemma 28 Suppose that f + g is bounded from below and η > 0 fulfills the inequality (57). For x_0 ∈ R^n, let x ∈ C¹([0, +∞), R^n) be the unique global solution of (56) and consider the function H. Then the statements (H1)-(H3) below are true, where (H2) holds for every t ∈ [0, +∞).

Proof. (H1) follows from Lemma 26. The first statement in (H2) follows from the characterization of the proximal operator and the relation ẋ(t) + x(t) = prox_{ηf}(x(t) − η∇g(x(t))), while the second one is a consequence of the Lipschitz continuity of ∇g. Finally, (H3) has been shown as an intermediate step in the proof of Lemma 27.
Proof. (a), (b) and (d) are direct consequences of Lemma 26, Lemma 27 and Lemma 28. Finally, (c) is a classical result from [66]. We also refer the reader to the proof of Theorem 4.1 in [4], where it is shown that the properties of ω(x) of being nonempty, compact and connected are generic for bounded trajectories fulfilling lim_{t→+∞} ẋ(t) = 0.
Since the lower level sets of f + g are bounded, the above inequality yields the boundedness of ẋ + x, which combined with lim_{t→+∞} ẋ(t) = 0 delivers the boundedness of x.
If h satisfies the KL property at each point in dom ∂h, then h is called a KL function.
The origins of this notion go back to the pioneering work of Lojasiewicz [70], where it is proved that for a real-analytic function h : R^n → R and a critical point x ∈ R^n (that is, ∇h(x) = 0), there exists θ ∈ [1/2, 1) such that the function |h − h(x)|^θ ‖∇h‖^{−1} is bounded around x. This corresponds to the situation when ϕ(s) = Cs^{1−θ}, where C > 0. The result of Lojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [68] extended this property to differentiable functions definable in o-minimal structures. Further extensions to the non-smooth setting can be found in [10,34-36].
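As an elementary illustration (our example, with the notation above): for h(x) = x², the Lojasiewicz inequality holds at the critical point 0 with exponent θ = 1/2:

```latex
% h(x) = x^2, \ \nabla h(x) = 2x, \ \text{critical point } \, 0:
\frac{|h(x) - h(0)|^{1/2}}{\|\nabla h(x)\|}
  = \frac{|x|}{2|x|} = \frac{1}{2} \quad (x \neq 0),
% so |h - h(0)|^{\theta}\, \|\nabla h\|^{-1} is bounded around 0 with
% \theta = 1/2, corresponding to the desingularization function
% \varphi(s) = C s^{1-\theta} = C \sqrt{s}, e.g. with C = 1.
```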
One of the remarkable properties of KL functions is their ubiquity in applications (see [37]). The class of KL functions includes semi-algebraic, real sub-analytic, semiconvex, uniformly convex and convex functions satisfying a growth condition. We refer the reader to [9-11,34-37] and the references therein for more on KL functions and illustrating examples.
We come now to the main result of this section.
Theorem 31 Suppose that f + g is bounded from below and η > 0 fulfills the inequality (57). For x_0 ∈ R^n, let x ∈ C¹([0, +∞), R^n) be the unique global solution of (56) and consider the function H : R^n × R^n → R ∪ {+∞}, H(u, v) = (f + g)(u) + (1/2)‖u − v‖². Suppose that x is bounded and H is a KL function. Then the following statements are true:

Remark 32
The construction of the function H, which we used in the above arguments in order to derive a descent property, has been inspired by the decrease property obtained in (60). Similar regularizations of the objective function of (55) have been considered also in [46,75], in the context of the investigation of non-relaxed forward-backward methods involving inertial and memory effects in the nonconvex setting.
Remark 33 (i) We invite the reader to consult [42,Section 3.2] for results concerning convergence rates of the trajectory, which are expressed by means of the Lojasiewicz exponent.
(iii) Finally, let us mention the recent contribution [52], related to a dynamical system proposed and investigated in connection with block-structured optimization problems inf_{(x,y)∈R^n×R^m} [f(x) + g(y) + H(x, y)], where f, g are proper and lower semicontinuous and H satisfies certain smoothness conditions. The dynamics considered in [52] can be seen as a continuous counterpart of the PALM method introduced in [37].