Second Order Splitting Dynamics with Vanishing Damping for Additively Structured Monotone Inclusions

In the framework of a real Hilbert space, we address the problem of finding the zeros of the sum of a maximally monotone operator A and a cocoercive operator B. We study the asymptotic behaviour of the trajectories generated by a second order equation with vanishing damping, attached to this problem and governed by a time-dependent forward–backward-type operator. This is a splitting system, as it only requires forward evaluations of B and backward evaluations of A. A proper tuning of the system parameters ensures the weak convergence of the trajectories to the set of zeros of A + B, as well as fast convergence of the velocities towards zero. A particular case of our system allows us to derive fast convergence rates for the problem of minimizing the sum of a proper, convex and lower semicontinuous function and a smooth and convex function with Lipschitz continuous gradient. We illustrate the theoretical outcomes by numerical experiments.


Problem formulation and a continuous time splitting scheme with vanishing damping
Let H be a real Hilbert space, A : H → 2^H a maximally monotone operator and B : H → H a β-cocoercive operator for some β > 0 such that zer(A + B) ≠ ∅. Devising fast convergent continuous and discrete time dynamics for solving monotone inclusions of this type is of great importance in many fields, including, but not limited to, optimization, equilibrium theory, economics and game theory, partial differential equations, and statistics. One of our main motivations comes from the fact that solving the convex optimization problem of minimizing f + g, where f : H → R ∪ {+∞} is proper, convex and lower semicontinuous and g : H → R is convex and Fréchet differentiable with a Lipschitz continuous gradient, is equivalent to solving the monotone inclusion 0 ∈ (∂f + ∇g)(x).
We want to exploit the additive structure of (1) and approach A and B separately, in the spirit of the splitting paradigm. For t ≥ t₀ > 0, α > 1, ξ ≥ 0, and functions λ, γ : [t₀, +∞) → (0, +∞), we will study the asymptotic behaviour of the trajectories of the second order differential equation

(Split-DIN-AVD)  ẍ(t) + (α/t) ẋ(t) + ξ (d/dt) T_{λ(t),γ(t)}(x(t)) + T_{λ(t),γ(t)}(x(t)) = 0,

where, for λ, γ > 0, the operator T_{λ,γ} : H → H is given by T_{λ,γ} = (1/λ)(Id − J_{γA} ∘ (Id − γB)). The sets of zeros of A + B and of T_{λ,γ}, for λ, γ > 0, coincide. The nomenclature (Split-DIN-AVD) comes from the splitting feature of the continuous time scheme, as well as from the link with the (DIN-AVD) system (Dynamic Inertial Newton with Asymptotic Vanishing Damping) developed by Attouch and László in [9], which we will emphasize later. We will discuss the existence and uniqueness of the trajectories generated by (Split-DIN-AVD), show their weak convergence to the set of zeros of A + B as well as the fast convergence of the velocities to zero, and derive convergence rates for T_{λ(t),γ(t)}(x(t)) and (d/dt) T_{λ(t),γ(t)}(x(t)) as t → +∞.
For the particular case B = 0, we are left with the monotone inclusion problem of finding x ∈ H such that 0 ∈ A(x), and the attached system of the same form, where, for λ, γ > 0, the operator A_{λ,γ} : H → H can be seen as a generalized Moreau envelope of the operator A, i.e., A_{λ,γ} = (1/λ)(Id − J_{γA}). In particular, we will be able to set γ(t) = λ(t) for every t ≥ t₀. Since A_{λ,λ} = A_λ for λ > 0, this allows us to recover the system

(DIN-AVD)  ẍ(t) + (α/t) ẋ(t) + ξ (d/dt) A_{λ(t)}(x(t)) + A_{λ(t)}(x(t)) = 0,

addressed by Attouch and László in [9]. If A = 0, then, after properly redefining some parameters, we obtain a system governed directly by B, with a function η : [t₀, +∞) → (0, +∞), which addresses the monotone equation of finding x ∈ H such that B(x) = 0.
This dynamical system approaches the cocoercive operator B directly through a forward evaluation, which is more natural, instead of having to resort to its Moreau envelope, as in (DIN-AVD).
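To make the splitting structure tangible, here is a small numerical sketch of the forward–backward operator, under the assumption (consistent with the operator U_γ := Id − J_{γA} ∘ (Id − γB) appearing later in the text) that T_{λ,γ}(x) = (1/λ)(x − J_{γA}(x − γB(x))). The concrete choices A = ∂|·| (whose resolvent is soft-thresholding) and B = Id are ours, purely for illustration:

```python
import numpy as np

def soft_threshold(x, gamma):
    """Resolvent J_{γA} of A = ∂|·|, i.e. the prox of γ|·|."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def T(x, lam, gamma):
    """Forward-backward operator: forward step on B = Id, backward step on A = ∂|·|."""
    forward = x - gamma * x                       # x − γB(x) with B = Id
    return (x - soft_threshold(forward, gamma)) / lam

# zer(A + B) = {0} here, and T vanishes exactly at the zeros of A + B.
for x in [-2.0, 0.0, 3.0]:
    print(x, T(np.array(x), lam=1.0, gamma=0.5))
```

As the loop illustrates, T_{λ,γ} is zero precisely at x = 0, the unique zero of A + B in this toy setting; forward evaluations of B and one resolvent of A per evaluation is all that is needed.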

Notation and preliminaries
In this subsection, we explain the notions mentioned in the previous subsection and introduce some definitions and preliminary results that will be required later. Throughout the paper, we work in a real Hilbert space H with inner product ⟨·, ·⟩ and corresponding norm ‖·‖. Let A : H → 2^H be a set-valued operator, that is, Ax is a subset of H for every x ∈ H. The operator A is totally characterized by its graph gra A = {(x, u) ∈ H × H : u ∈ Ax}. The inverse of A is the operator A⁻¹ : H → 2^H defined through the equivalence x ∈ A⁻¹u if and only if u ∈ Ax. The set of zeros of A is the set zer A = {x ∈ H : 0 ∈ Ax}. For a subset C ⊆ H, we set A(C) = ∪_{x∈C} Ax. The range of A is the set ran A = A(H).
A set-valued operator A is said to be monotone if ⟨v − u, y − x⟩ ≥ 0 whenever (x, u), (y, v) ∈ gra A, and maximally monotone if it is monotone and its graph is not properly contained in the graph of any other monotone operator. Let λ > 0. The resolvent of index λ of A is the operator J_{λA} := (Id + λA)⁻¹, and the Moreau envelope (or Yosida approximation, or Yosida regularization) of index λ of A is the operator A_λ := (1/λ)(Id − J_{λA}), where Id : H → H, defined by Id(x) = x for every x ∈ H, is the identity operator of H. A single-valued operator B : H → H is said to be β-cocoercive for some β > 0 if for every x, y ∈ H we have ⟨B(x) − B(y), x − y⟩ ≥ β‖B(x) − B(y)‖². In this case, B is (1/β)-Lipschitz continuous, namely, for every x, y ∈ H we have ‖B(x) − B(y)‖ ≤ (1/β)‖x − y‖. We say B is nonexpansive if it is 1-Lipschitz continuous, and firmly nonexpansive if it is 1-cocoercive. For α ∈ (0, 1), we say B is α-averaged if there exists a nonexpansive operator R : H → H such that B = (1 − α) Id + αR. Let λ > 0 and A : H → 2^H. According to Minty's Theorem, A is maximally monotone if and only if ran(Id + λA) = H. In this case, J_{λA} is single-valued and firmly nonexpansive, A_λ is single-valued and λ-cocoercive, and for every x ∈ H and every λ₁, λ₂ > 0 we have (A_{λ₁})_{λ₂}(x) = A_{λ₁+λ₂}(x). Finally, if a single-valued operator is monotone and continuous, then it is maximally monotone.
The following concepts and results show the strong interplay between the theory of monotone operators and convex analysis.
Let f : H → R ∪ {+∞} be a proper, convex and lower semicontinuous function. We denote the infimum of f over H by min_H f and the set of global minimizers of f by argmin_H f. The subdifferential of f is the operator ∂f : H → 2^H defined, for every x ∈ H, by ∂f(x) = {u ∈ H : f(y) ≥ f(x) + ⟨u, y − x⟩ for all y ∈ H}. The subdifferential operator of f is maximally monotone, and x ∈ zer ∂f if and only if x is a global minimizer of f.
Let λ > 0. The proximal operator of f of index λ is the operator prox_{λf} : H → H defined, for every x ∈ H, by prox_{λf}(x) = argmin_{u∈H} { f(u) + (1/(2λ))‖u − x‖² }. It holds that prox_{λf} = J_{λ∂f}, which also means that prox_{λf} is firmly nonexpansive. The Moreau envelope of f of index λ is the function f_λ(x) = min_{u∈H} { f(u) + (1/(2λ))‖x − u‖² }. The function f_λ is Fréchet differentiable and ∇f_λ = (1/λ)(Id − prox_{λf}). Finally, if f : H → R has full domain and is Fréchet differentiable with (1/β)-Lipschitz continuous gradient for some β > 0, then, according to the Baillon-Haddad Theorem, ∇f is β-cocoercive.
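The identity ∇f_λ = (1/λ)(Id − prox_{λf}) can be verified numerically for a concrete function. The choice f = |·| below is ours (its prox is soft-thresholding and its Moreau envelope is the Huber function); the finite-difference check is only a sanity test of the stated formula:

```python
import numpy as np

lam = 0.7  # envelope index, our arbitrary choice

def prox(x, lam):
    # prox of λ|·|: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau(x, lam):
    # f_λ(x) = f(prox_{λf}(x)) + (1/2λ)|x − prox_{λf}(x)|² with f = |·|
    p = prox(x, lam)
    return abs(p) + (x - p) ** 2 / (2 * lam)

def grad_moreau(x, lam):
    # the claimed identity: ∇f_λ(x) = (x − prox_{λf}(x)) / λ
    return (x - prox(x, lam)) / lam

# central finite differences agree with the closed-form gradient
for x in [-1.3, 0.2, 2.5]:
    h = 1e-6
    fd = (moreau(x + h, lam) - moreau(x - h, lam)) / (2 * h)
    assert abs(fd - grad_moreau(x, lam)) < 1e-5
```

Note that ∇f_λ is exactly the Yosida regularization (∂f)_λ, tying the two halves of this subsection together.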

A brief history of inertial systems attached to optimization problems and monotone inclusions
In recent years there have been many advances in the study of continuous time inertial systems with vanishing damping attached to monotone inclusion problems. We briefly visit them in the following paragraphs.

The Heavy Ball Method with friction
Consider a convex and continuously differentiable function f : H → R with at least one minimizer. The heavy ball with friction system was introduced by Álvarez in [2] as a suitable continuous time scheme to approach the minimization of the function f. This system can be seen as the equation of the horizontal position x(t) of an object that moves, under the force of gravity, along the graph of the function f, subject to a kinetic friction represented by the term µẋ(t) (a nice derivation can be found in the work of Attouch-Goudou-Redont [8]). It is known that, if x is a solution of (HBF), then x converges weakly to a minimizer of f and f(x(t)) − min_H f = O(1/t) as t → +∞. In recent times, the question was raised whether the damping coefficient µ could be chosen to be time-dependent. An important contribution was made by Su-Boyd-Candès (in [20]), who studied the case of an asymptotic vanishing damping coefficient µ(t) = α/t and proved that, when α ≥ 3, the rate of convergence for the functional values is f(x(t)) − min_H f = O(1/t²) as t → +∞. This second order system can be seen as a continuous counterpart to Nesterov's accelerated gradient method from [19]. Weak convergence of the trajectories generated by (AVD) when α > 3 has been shown by Attouch-Chbani-Peypouquet-Redont [6] and May [18], together with the improved rate of convergence f(x(t)) − min_H f = o(1/t²) as t → +∞. For α = 3, the convergence of the trajectories remains an open question, except for the one dimensional case (see [7]). In the subcritical case α ≤ 3, it has been shown by Apidopoulos-Aujol-Dossal [5] and Attouch-Chbani-Riahi [7] that the objective values converge at a rate O(t^{−2α/3}) as t → +∞.

Heavy Ball dynamics and cocoercive operators
If f : H → R ∪ {+∞} is a proper, convex and lower semicontinuous function which is not necessarily differentiable, then we cannot make direct use of (3). However, since argmin f = argmin f_λ for λ > 0, we can replace f by its Moreau envelope f_λ in the system. In line with this idea, and in analogy with (3), Álvarez and Attouch [3] and Attouch and Maingé [11] studied the dynamics

ẍ(t) + µẋ(t) + B(x(t)) = 0, (5)

where B : H → H is a β-cocoercive operator. They were able to prove that the solutions of this system converge weakly to elements of zer B provided that the cocoercivity parameter β and the damping coefficient µ satisfy βµ² > 1. For a maximally monotone operator A : H → 2^H, we know that its Moreau envelope A_λ is λ-cocoercive and thus, under the condition λµ² > 1, the trajectories of ẍ(t) + µẋ(t) + A_λ(x(t)) = 0 converge weakly to elements of zer A_λ = zer A. Also related to (5), Boţ-Csetnek [16] considered the system (6) with time-dependent coefficients µ(t) and ν(t), where B : H → H is again β-cocoercive. Under the assumptions that µ and ν are locally absolutely continuous, that µ̇(t) ≤ 0 ≤ ν̇(t) for almost every t ∈ [0, +∞), and that a suitable lower bound involving inf_{t≥0} µ²(t) and β holds, the authors were able to prove that the solutions of this system converge weakly to zeros of B.
In [12], Attouch and Peypouquet addressed the system (7), where α > 1 and the time-dependent regularizing parameter λ(t) satisfies λ(t) α²/t² > 1 for every t ≥ t₀ > 0. Besides ensuring the weak convergence of the trajectories towards elements of zer A, choosing the regularizing parameter in such a fashion allowed the authors to obtain fast convergence of the velocities and accelerations towards zero.

Inertial dynamics with Hessian damping
Let us return briefly to the (AVD) system (4). In addition to the viscous vanishing damping term (α/t)ẋ(t), the following system with Hessian-driven damping was considered by Attouch-Peypouquet-Redont in [13], where ξ ≥ 0. While preserving the fast convergence properties of the Nesterov accelerated method, the Hessian-driven damping term reduces the oscillatory aspect of the trajectories. In [9], Attouch and László studied a version of (7) with an added Hessian-driven damping term. While preserving the convergence results of (7), the main benefit of the introduction of this damping term is the fast convergence rates that can be obtained for A_{λ(t)}(x(t)) and (d/dt) A_{λ(t)}(x(t)) as t → +∞. The regularizing parameter λ(t) is again chosen to be time-dependent; in the general case, the authors take λ(t) = λt², and in [12] it is shown that taking λ(t) this way is critical. However, in the case where A = ∂f for a proper, convex and lower semicontinuous function f, it is also allowed to take λ(t) = λt^r with r ≥ 0.

Layout of the paper
In Section 2, we prove the existence and uniqueness of strong global solutions to (Split-DIN-AVD) by means of a Cauchy-Lipschitz-Picard argument. In Section 3, we state the main theorem of this work and show the weak convergence of the solutions of (2) to elements of zer(A + B), as well as the fast convergence of the velocities and accelerations to zero. We also provide convergence rates for T_{λ(t),γ(t)}(x(t)) and (d/dt) T_{λ(t),γ(t)}(x(t)) as t → +∞. We explore the particular cases A = 0 and B = 0, and show improvements with respect to previous works. In Section 4, we address the convex minimization case, namely, when A = ∂f and B = ∇g, where f : H → R ∪ {+∞} is a proper, convex and lower semicontinuous function and g : H → R is a convex and Fréchet differentiable function with Lipschitz continuous gradient, and derive, in addition, a fast convergence rate for the function values. In Section 5, we illustrate the theoretical results by numerical experiments. In Section 6, we provide an algorithm that arises from a time discretization of (Split-DIN-AVD) and discuss its convergence properties.

Existence and uniqueness of trajectories
In this section, we show the existence and uniqueness of strong global solutions to (Split-DIN-AVD).For the sake of clarity, first we state the definition of a strong global solution.
Definition 2.1. We say that x : [t₀, +∞) → H is a strong global solution of (Split-DIN-AVD) with Cauchy data (x₀, u₀) ∈ H × H if x and ẋ are locally absolutely continuous, (Split-DIN-AVD) holds for almost every t ∈ [t₀, +∞), and x(t₀) = x₀ and ẋ(t₀) = u₀. A classical solution is just a strong global solution which is C². Sometimes we will use the terms strong global solution or classical global solution without explicit mention of the Cauchy data.
The following lemma will be used to prove the existence of strong global solutions of our system, and we will need it in the proof of the main theorem as well.

Now, notice that, using (i) and the fact that T_{γ₁,γ₂}(x) = 0, we obtain (8). Altogether, plugging (8) into our initial inequality yields the first inequality. To show the second inequality, we use the previous one: we have an estimate whose last line is a consequence of T_{λ₂,γ₂} being (λ₂/2)-cocoercive, and hence we can use (ii) to obtain, for every t ≥ t₀, the corresponding bound. By taking the limit as s → t we then get the claim for any t ≥ t₀. The next theorem concerns the existence and uniqueness of strong global solutions to (Split-DIN-AVD).
Proof. We will rely on [17, Proposition 6.2.1] and distinguish between the cases ξ > 0 and ξ = 0. For each case, we will check that the conditions of the aforementioned proposition are fulfilled. We will be working in the real Hilbert space H × H endowed with the norm ‖(x, y)‖ = ‖x‖ + ‖y‖. Let x̄ ∈ zer(A + B) be fixed.
The case ξ > 0. First, it can be easily checked (see also [4, 9, 13]) that for all t ≥ t₀ the two dynamical systems are equivalent. In other words, (2) with Cauchy data (x₀, u₀) = (x(t₀), ẋ(t₀)) is equivalent to the first order system ż(t) = F(t, z(t)), where z(t) = (x(t), y(t)), F is given accordingly for every t ≥ t₀, and the Cauchy data is z(t₀). Let t ∈ [t₀, +∞) be fixed. We need to verify the Lipschitz continuity of F in the z variable. Set z = (x, y) and w = (u, v), and set λ := inf_{t≥t₀} λ(t) > 0. According to Lemma 2.2(i), the term involving the operator T_{λ(t),γ(t)} satisfies the required Lipschitz estimate, and a suitable Lipschitz function follows. (ii) Now, we claim that F fulfills a boundedness condition. For t ∈ [t₀, +∞) and z = (x, y) ∈ H × H, by Lemma 2.2(i) we have, for every t ≥ t₀, the corresponding bound. Hence, if we choose P accordingly, then P ∈ L¹_loc([t₀, +∞), R) and the boundedness condition holds. We have checked that the conditions of [17, Proposition 6.2.1] hold. Therefore, there exists a unique locally absolutely continuous solution t → x(t) of (2) that satisfies x(t₀) = x₀ and ẋ(t₀) = u₀.
The case ξ = 0. Now, (2) is easily seen to be equivalent to a first order system ż(t) = F(t, z(t)), where z(t) = (x(t), y(t)) and F is given accordingly for every t ≥ t₀. Showing that F fulfills the required properties is straightforward.
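The phase-space reduction used in the proof also gives a direct way to integrate the dynamics numerically. The sketch below is our own (ξ = 0, A = 0, B = Id, so that T_{λ,γ}(x) = γx/λ, and λ(t) = λt²); it is not the map F from the proof, only the same first-order reduction z = (x, ẋ) combined with a standard RK4 stepper:

```python
import numpy as np

alpha, lam_coef = 3.0, 2.0
gamma = lambda t: 1.0                 # constant γ, assumed to lie in (0, 2β)
lam = lambda t: lam_coef * t**2       # λ(t) = λt²

def T(x, lam_t, gam_t):
    # with A = 0 we have J_{γA} = Id, hence T_{λ,γ}(x) = γ x / λ
    return gam_t * x / lam_t

def F(t, z):
    # first order reformulation of ẍ + (α/t)ẋ + T_{λ(t),γ(t)}(x) = 0
    x, v = z
    return np.array([v, -(alpha / t) * v - T(x, lam(t), gamma(t))])

def rk4(z, t, h):
    k1 = F(t, z); k2 = F(t + h / 2, z + h / 2 * k1)
    k3 = F(t + h / 2, z + h / 2 * k2); k4 = F(t + h, z + h * k3)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

t, h, z = 1.0, 1e-2, np.array([5.0, 0.0])   # Cauchy data x(t0) = 5, ẋ(t0) = 0
while t < 50.0:
    z = rk4(z, t, h); t += h
print(z)  # the position drifts towards zer B = {0} while the velocity vanishes
```

For this linear toy problem the trajectory solves an Euler-type equation, so the slow algebraic decay of the position and the fast decay of the velocity are both visible already at moderate times.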

The convergence properties of the trajectories
In this section, we study the asymptotic behaviour of the trajectories of the system (2). We will show weak convergence of the trajectories generated by (2) to elements of zer(A + B), as well as the fast convergence of the velocities and accelerations to zero. Additionally, we will provide convergence rates for T_{λ(t),γ(t)}(x(t)) and (d/dt) T_{λ(t),γ(t)}(x(t)) as t → +∞. To avoid repetition of the statement "for almost every t", in the following theorem we assume we are working with a classical global solution of our system. Theorem 3.1. Let A : H → 2^H be a maximally monotone operator and B : H → H a β-cocoercive operator for some β > 0 such that zer(A + B) ≠ ∅. Assume that λ(t) = λt² for all t ≥ t₀, with λ > 2/(α−1)², and that γ : [t₀, +∞) → (0, 2β) is a differentiable function satisfying γ̇(t)/γ(t) = O(1/t) as t → +∞. Then, for a solution x : [t₀, +∞) → H to (Split-DIN-AVD), the following statements hold: ... (iii) we have convergence rates for T_{λ(t),γ(t)}(x(t)) and (d/dt) T_{λ(t),γ(t)}(x(t)) as t → +∞; (iv) if 0 < inf_{t≥t₀} γ(t) ≤ sup_{t≥t₀} γ(t) < 2β, then x(t) converges weakly to an element of zer(A + B) as t → +∞.
Proof. Integral estimates and rates. To develop the analysis, we fix x̄ ∈ zer(A + B) and make use of a Lyapunov function E : [t₀, +∞) → R ∪ {+∞}. Differentiating E with respect to time and then reducing and employing (2) yields, for every t ≥ t₀, a first inequality. Now, by Lemma 2.2(i), we know that T_{λ(t),γ(t)} is (λ(t)/2)-cocoercive for every t ≥ t₀. Using this on the first summand of the right hand side of the previous inequality yields a refined estimate for t ≥ t₁ := max{ξ, t₀}. Now, since λ > 2/(α−1)², we can choose ε > 0 such that (11) holds. From (10) we then get, for every t ≥ t₁, the estimate (12). By (11) and the definition of λ(t), the relevant coefficient is negative, so we can find t₂ ≥ t₁ such that for every t ≥ t₂ the previous expression becomes nonpositive. According to Lemma A.2, the right hand side of (12) is nonpositive whenever R(t) ≤ 0, and rewriting this quantity shows that we can find t₃ ≥ t₂ such that R(t) ≤ 0 for every t ≥ t₃; that is, (13) holds for every t ≥ t₃. Integrating (13) from t₃ to t yields an integral estimate, and from (13) and the form of E we immediately obtain (15). From Lemma 2.2(i), we know that for every t ≥ t₀ the operator T_{λ(t),γ(t)} is (2/λ(t))-Lipschitz continuous, which gives a pointwise bound for every t ≥ t₀. Thus, from (15), and recalling that λ(t) = λt², we arrive at (19). By combining (15), (18) and (19) we obtain sup_{t≥t₀} t‖ẋ(t)‖ < +∞ and therefore (20). From Lemma 2.2, (15), (20) and the fact that B is (1/β)-Lipschitz continuous we deduce (21) as t → +∞. On the other hand, combining (19), (21), (22) and the fact that λ̇(t) = 2λt, we arrive at (23). Let us now improve (19). According to (19) and (21), there exists a constant K > 0 such that the corresponding derivative estimate holds for every t ≥ t₀. By (17), the right hand side belongs to L¹([t₀, +∞), R), so the limit exists; in particular, this implies the existence of L := lim_{t→+∞} λ(t)‖T_{λ(t),γ(t)}(x(t))‖².

By using (17) again, we deduce that we must have L = 0, which gives the improved rate. By combining (2), (19), (20) and (23) we obtain, as t → +∞, a rate for the acceleration. Moreover, by using the well-known inequality ‖a + b + c‖² ≤ 3‖a‖² + 3‖b‖² + 3‖c‖² for all a, b, c ∈ H, for every t ≥ t₀ we obtain a further estimate, and from (16), (23) and (17) it follows that the corresponding integral is finite. To see that ‖ẋ(t)‖ = o(1/t) as t → +∞, we consider, for every t ≥ t₀, the derivative of t²‖ẋ(t)‖². From (16) and (26) we deduce that the left hand side belongs to L¹([t₀, +∞), R), from which we infer that the limit lim_{t→+∞} t²‖ẋ(t)‖² exists. Using (16) again, we finally deduce lim_{t→+∞} t²‖ẋ(t)‖² = 0, and therefore ‖ẋ(t)‖ = o(1/t) as t → +∞. Notice that a suitable decomposition holds for every t ≥ t₀; hence, multiplying both sides of (25) by λ(t)/γ(t) and remembering the definition of λ(t), we obtain (28). Therefore, by using (23) and (28), and recalling that λ(t) = λt², we obtain the rates claimed in (iii). The fact that ‖ẍ(t)‖ = O(1/t²) as t → +∞ comes from (2), (27), (23) and (24). Weak convergence of the trajectories. Let x̄ ∈ zer(A + B). We work with the energy function h : [t₀, +∞) → R given by h(t) = (1/2)‖x(t) − x̄‖². Combining (2) and (29) gives an identity for every t ≥ t₀. By using the (λ(t)/2)-cocoercivity of T_{λ(t),γ(t)} on the left hand side, the Cauchy-Schwarz inequality on the right hand side, and multiplying both sides by t, the previous inequality entails an integrable bound for every t ≥ t₀. Now, putting (15), (16) and (23) together shows that lim_{t→+∞} ‖x(t) − x̄‖ exists, which is the first condition of Opial's Lemma. Let us now move on to the second condition. Suppose x̄ is a weak sequential cluster point of t → x(t), that is, there exists a sequence (t_n)_{n∈N} ⊆ [t₀, +∞) such that t_n → +∞ and x_n := x(t_n) converges weakly to x̄ as n → +∞. Define U_γ := Id − J_{γA} ∘ (Id − γB).

Now we apply Lemma 2.2. According to (25), and since δ ≤ γ(t) ≤ 2β − δ for all t ≥ t₀ and some δ > 0, we can extract a subsequence (γ(t_{n_k}))_{k∈N} such that γ(t_{n_k}) → γ̄ ∈ (0, 2β) as k → +∞. We may then assume without loss of generality that γ_n := γ(t_n) → γ̄ as n → +∞. For every n ∈ N we thus obtain a bound on ‖U_{γ̄}(x_n)‖. Now, since every weakly convergent sequence is bounded and the operators B and J_{γ̄A} are Lipschitz continuous, we deduce that the right-hand side of the previous inequality approaches zero as n → +∞, and therefore U_{γ̄}(x_n) → 0 as n → +∞. From the proof of part (i) of Lemma 2.2, we know that U_{γ̄} is ((4β − γ̄)/(4β))-cocoercive, thus monotone and Lipschitz continuous, and therefore maximally monotone. Summarizing:
1. U_{γ̄} is maximally monotone, and thus its graph is closed in the weak×strong topology of H × H (see [14, Proposition 20.38(ii)]);
2. x_n converges weakly to x̄ and U_{γ̄}(x_n) → 0 as n → +∞;
which allows us to conclude that U_{γ̄}(x̄) = 0 and finally gives x̄ ∈ zer(A + B). Invoking Opial's Lemma, x(t) converges weakly to some x̄ ∈ zer(A + B) as t → +∞.
In the following subsections, we explore the particular cases B = 0 and A = 0, and we will show improvements with respect to previous results from the literature addressing continuous time approaches to monotone inclusions.
Remark 3.3. The hypotheses required for γ are fulfilled at least by the following families of functions. First, take r ≥ 0 and set γ(t) = e^{t^{−r}}. Then γ̇(t)/γ(t) = −r t^{−r−1} = O(1/t) as t → +∞, and inf_{t≥t₀} γ(t) > 0. If γ is a polynomial of degree n for some n ∈ N, the conditions are also fulfilled. Assume γ(t) = a_n tⁿ + a_{n−1} t^{n−1} + ⋯ + a₀ for all t ≥ t₀, for some a_i ∈ R for i ∈ {0, …, n} and a_n > 0. Then γ̇(t)/γ(t) = O(1/t) as t → +∞. Since we also have γ(t) → +∞ as t → +∞, the condition inf_{t≥t₀} γ(t) > 0 is fulfilled for large enough t₀.
In particular, we can choose γ(t) = λ(t) = λt², which fulfills γ(t) ≥ λt₀² > 0 for any t ≥ t₀ and any t₀. Since A_{λ,λ} = A_λ for λ > 0, this choice of γ allows us to recover the (DIN-AVD) system studied by Attouch and László in [9]. Notice the way the convergence rates for A_{γ(t)}(x(t)) and (d/dt) A_{γ(t)}(x(t)) exhibited in part (iii) of Theorem 3.2 depend on γ(t). If we set γ(t) = tⁿ for every t ≥ t₀ and any natural number n > 2, then from this point of view (Split-DIN-AVD) performs better than (DIN-AVD), without increasing the complexity of the governing operator.

Remark 3.5. (a)
As we mentioned in the introduction, the dynamical system (31) provides a way of finding the zeros of a cocoercive operator directly through forward evaluations, instead of having to resort to its Moreau envelope when following the approach in [9].
(b) The dynamics (31) bear some resemblance to the system (6) (see also [16]) with µ(t) = α/t and ν(t) = 1/η(t), with an additional Hessian-driven damping term. However, in our case we have ν̇(t) = −2/(λt³) ≤ 0 for all t ≥ t₀, so one of the hypotheses needed for (6) is not fulfilled, which shows that one cannot address the dynamical system (31) as a particular case of it; indeed, (6) does not allow a vanishing damping. With our system, we moreover obtain convergence rates for ‖ẋ(t)‖ and ‖ẍ(t)‖ as t → +∞, which are not obtained in [16].

Structured convex minimization
We can specialize the previous results to the case of convex minimization and additionally show the convergence of the functional values along the generated trajectories to the optimal objective value, at a rate that will depend on the choice of γ. Let f : H → R ∪ {+∞} be a proper, convex and lower semicontinuous function, and let g : H → R be a convex and Fréchet differentiable function with L_∇g-Lipschitz continuous gradient. Assume that argmin_H (f + g) ≠ ∅, and consider the minimization problem (32) of minimizing f + g over H. Fermat's rule tells us that x is a global minimizer of f + g if and only if 0 ∈ ∂f(x) + ∇g(x). Therefore, solving (32) is equivalent to solving the monotone inclusion 0 ∈ (A + B)(x) addressed in the first section, with A = ∂f and B = ∇g. Moreover, recall that if ∇g is L_∇g-Lipschitz continuous, then it is (1/L_∇g)-cocoercive (the Baillon-Haddad Theorem, see [14, Corollary 18.17]). Therefore, associated to the problem (32) we have the dynamics (33), where we have denoted u(t) = x(t) − γ(t)∇g(x(t)) for all t ≥ t₀ for convenience.
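For intuition about what the stationary points of these dynamics are, the fixed points can be checked with the plain (non-inertial) forward-backward iteration x ← prox_{γf}(x − γ∇g(x)); the toy choices f = |·| and g = (x − 2)²/2 below are ours, not from the paper:

```python
import numpy as np

# f(x) = |x|, g(x) = (x − 2)²/2, so L_∇g = 1 and the unique minimizer of
# f + g solves 0 ∈ sign(x) + x − 2, i.e. x̄ = 1.
gamma = 1.5          # step γ ∈ (0, 2/L_∇g)
x = -4.0
for _ in range(200):
    forward = x - gamma * (x - 2.0)                        # forward step on ∇g
    x = np.sign(forward) * max(abs(forward) - gamma, 0.0)  # prox of γ|·|
print(x)  # → 1.0, the minimizer of f + g
```

This is only the static counterpart of (33): it identifies zer(∂f + ∇g), while the continuous inertial dynamics additionally carry the fast rates established below.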
Theorem 4.1. Assume that λ(t) = λt² for all t ≥ t₀, with λ as in Theorem 3.1, and that γ : [t₀, +∞) → (0, 2/L_∇g) is a differentiable function that satisfies γ̇(t)/γ(t) = O(1/t) as t → +∞. Then, for a solution x : [t₀, +∞) → H to (33), statements analogous to (i)-(iv) of Theorem 3.1 hold, and, (v) setting u(t) := x(t) − γ(t)∇g(x(t)) for every t ≥ t₀, we obtain a convergence rate for the function values. Proof. Parts (i)-(iv) are a direct consequence of Theorem 3.1. For checking (v), first notice that for all t ≥ t₀ the system can be rewritten in terms of prox_{γ(t)f}(u(t)). Now, let x̄ ∈ argmin_H (f + g). According to [15, Lemma 2.3], for every t ≥ t₀ a suitable inequality holds; after adding the squared norm term and using the Cauchy-Schwarz inequality, for every t ≥ t₀ we obtain the claimed rate, as a consequence of x being bounded. It is also worth mentioning the system obtained in the case g ≡ 0, since there we also get improved rates for the objective functional values when comparing (Split-DIN-AVD) to (DIN-AVD) [9]. In this case, we have the system (35). If we assume λ > 1/(α−1)², allow γ : [t₀, +∞) → (0, +∞) to be unbounded from above, and otherwise keep the hypotheses of Theorem 4.1, then for a solution x : [t₀, +∞) → H to (35) the following statements hold: (i)-(iii) the analogous estimates and convergence rates; (iv) if 0 < inf_{t≥t₀} γ(t), then x(t) converges weakly to a minimizer of f as t → +∞.
(v) We also obtain a rate for f_{γ(t)}(x(t)) − min_H f as t → +∞. Parts (i)-(iv) are a direct consequence of Theorem 3.2 for the case A = ∂f. For showing part (v), first notice that for λ > 0 and u ∈ H we have, according to the definitions of f_λ and prox_{λf}, the identity f_λ(u) = f(prox_{λf}(u)) + (1/(2λ))‖u − prox_{λf}(u)‖². Let x̄ ∈ H be a minimizer of f. We apply the gradient inequality to the convex and differentiable function f_{γ(t)}, from which we obtain, for every t ≥ t₀, an upper bound whose last step follows from the Cauchy-Schwarz inequality. Since ‖∇f_{γ(t)}(x(t))‖ = o(1/γ(t)) as t → +∞ and x is bounded, the previous inequality entails the first statement of (v). Again recalling the definition of the Moreau envelope of f, this finally gives the remaining estimates as t → +∞, which implies the last two statements and concludes the proof.
As pointed out in Remark 3.3, we can choose γ(t) = λ(t) = λt² for every t ≥ t₀ and recover the (DIN-AVD) system for nonsmooth convex minimization problems studied in [9]. Moreover, we can also set γ(t) = tⁿ for a natural number n > 3 and all t ≥ t₀. Now, not only are the convergence rates for ∇f_{γ(t)}(x(t)) and (d/dt) ∇f_{γ(t)}(x(t)) as t → +∞ improved with respect to the system in [9], but (Split-DIN-AVD) also provides a better rate for the convergence of f_{γ(t)}(x(t)) to min_H f as t → +∞.

Numerical experiments
In the following paragraphs we describe some numerical experiments that portray some aspects of the theory.

Minimizing a nonsmooth and convex function
As an example of a continuous time scheme minimizing a proper, convex and lower semicontinuous function f : H → R ∪ {+∞} via (Split-DIN-AVD), we consider the corresponding system with A = ∂f and B = 0. We consider three choices of f and plot for each of them the trajectories, the objective function values and the gradients of the Moreau envelopes:
• f(x) = (1/2)x² (Figures 3a and 4a),
• f(x) = |x| (Figures 3b and 4b),
• a third choice of f (Figures 3c and 4c).
In order to fulfill α > 1 and λ > 1/(α−1)², we choose the parameters α = 2 and λ = 1.1, and we take ξ = 0 and γ(t) = t⁸. We compare the results given by (DIN-AVD) (that is, the choice γ(t) = λt²) with the ones given by our system (Split-DIN-AVD). The choice of ξ does not seem to change the plots in a significant way for the examples we have chosen. Figure 3 portrays the trajectories and the function values f(prox_{γ(t)f}(x(t))) for our choices of f as t → +∞, while Figure 4 portrays the fast convergence to zero of ∇f_{γ(t)}(x(t)) as t → +∞. Notice the big improvement over (DIN-AVD) for nonsmooth convex minimization in [9] when choosing γ(t) = t⁸, a result which we already knew theoretically. Polynomials of high degree seem to be the ones which give the biggest improvements in terms of rates.
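This experiment can be reproduced in a few lines; the integrator below is our own rough sketch (ξ = 0, f = |·|, γ(t) = t⁸, λ(t) = λt²), not the authors' code:

```python
import numpy as np

alpha, lam_coef = 2.0, 1.1          # α > 1 and λ > 1/(α−1)²
lam = lambda t: lam_coef * t**2
gamma = lambda t: t**8

def prox_abs(x, g):
    # prox of g|·| (soft-thresholding), the resolvent J_{gA} for A = ∂|·|
    return np.sign(x) * max(abs(x) - g, 0.0)

def A_op(x, t):
    # generalized Moreau envelope A_{λ(t),γ(t)}(x) = (x − J_{γ(t)A}(x)) / λ(t)
    return (x - prox_abs(x, gamma(t))) / lam(t)

def F(t, z):
    x, v = z
    return np.array([v, -(alpha / t) * v - A_op(x, t)])

t, h, z = 1.0, 1e-3, np.array([3.0, 0.0])
while t < 20.0:                     # RK4 integration of the dynamics
    k1 = F(t, z); k2 = F(t + h / 2, z + h / 2 * k1)
    k3 = F(t + h / 2, z + h / 2 * k2); k4 = F(t + h, z + h * k3)
    z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4); t += h

x = z[0]
grad_env = (x - prox_abs(x, gamma(t))) / gamma(t)   # ∇f_{γ(t)}(x(t))
print(abs(grad_env))  # very small: the large-degree polynomial γ drives it down fast
```

The quantity `grad_env` is the one plotted in Figure 4; with γ(t) = t⁸ it is already negligible at moderate times, mirroring the theoretical improvement over γ(t) = λt².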

Now we consider the monotone inclusion problem (1) for
and so the unique zero of A + B is (0, 0).
We choose the parameters α = 7, λ = 0.056, γ(t) ≡ 1.5, and the Cauchy data x₀ = (1, 2) and u₀ = (−1, −1). Figure 5a corresponds to the case ξ = 0, and Figure 5b depicts the trajectory when the Hessian damping parameter is ξ = 0.8. Again, notice how, not only for optimization problems but also for monotone inclusions which cannot be reduced to them, the presence of ξ seems to attenuate the oscillations present in the trajectories.

A numerical algorithm
In the following, we derive via time discretization of (Split-DIN-AVD) a numerical algorithm for solving the monotone inclusion problem (1). We perform a discretization of (Split-DIN-AVD) with stepsize 1 and set, for an integer k ≥ 1, λ_k := λ(k) and γ_k := γ(k), so we get, for every k ≥ 1, a finite-difference scheme (38). After rearranging the terms of (38), for every k ≥ 1 we obtain (39). In other words, after setting α_k := 1 − α/k and denoting the right hand side of (39) by y_k for every k ≥ 1, we obtain the iterative scheme

(∀k ≥ 1)  x_{k+1} = (Id + T_{λ_k,γ_k})⁻¹(y_k).  (40)

Observe that the second step in (40) is always well-defined. Indeed, for λ, γ > 0, T_{λ,γ} is (λ/2)-cocoercive, hence monotone (see Lemma 2.2(i)). This also implies that T_{λ,γ} is (2/λ)-Lipschitz continuous, and a monotone and continuous operator is maximally monotone, according to [14, Corollary 20.28]. Hence, by Minty's Theorem (see [14, Theorem 21.1]), we know that Id + T_{λ,γ} : H → H is surjective.
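A compact implementation sketch of scheme (40) follows. The inertial extrapolation y_k = x_k + (1 − α/k)(x_k − x_{k−1}) (with ξ = 0) is our reconstruction of y_k, since the displayed right-hand side of (39) did not survive extraction, and the operators A = ∂|·|, B = Id are again our toy choices; the resolvent of Id + T_{λ_k,γ_k} is computed by a fixed-point iteration:

```python
import numpy as np

alpha, lam_coef = 3.0, 2.0
lam = lambda k: lam_coef * k**2     # λ_k = λk², mirroring λ(t) = λt²
gam = lambda k: 0.5                 # constant γ_k, assumed to lie in (0, 2β)

def soft(x, g):
    return np.sign(x) * max(abs(x) - g, 0.0)

def T(x, k):
    # forward-backward operator for A = ∂|·|, B = Id (toy choices)
    g, l = gam(k), lam(k)
    return (x - soft(x - g * x, g)) / l

def resolvent(y, k, iters=100):
    # solve x + T_k(x) = y via x ← y − T_k(x); T_k is (2/λ_k)-Lipschitz,
    # so this map is a contraction (here even at k = 1 for this concrete T)
    x = y
    for _ in range(iters):
        x = y - T(x, k)
    return x

x_prev = x_cur = 4.0
for k in range(1, 2000):
    y = x_cur + (1 - alpha / k) * (x_cur - x_prev)   # assumed inertial form of y_k
    x_prev, x_cur = x_cur, resolvent(y, k)
print(x_cur)  # drifts slowly towards 0, the unique zero of A + B
```

In practice the resolvent step is the expensive part; Remark 6.2 below explains how it simplifies to resolvents of A when B = 0.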
We are now in a position to state the main theorem concerning the previous algorithm. (ii) The sequence (x_k)_{k≥0} converges weakly to an element of zer(A + B).
(iii) The sequence (y_k)_{k≥1} converges weakly to an element of zer(A + B). The proof can be done by transposing the techniques used in the continuous time case to the discrete time setting. Algorithm (40) can be seen as a splitting version of the (PRINAM) algorithm studied by Attouch and László in [10]. Remark 6.2. The second step in (40) can be quite complicated to compute. However, if B = 0, we can resort to the fact that (A_{λ₁})_{λ₂} = A_{λ₁+λ₂} for λ₁, λ₂ > 0. It is then possible to write (40) in terms of the resolvents of A, which yields, for every k ≥ 1, the scheme (41). Now, if we assume 0 < inf_{k≥0} γ_k and λ > (2ξ+1)/(α−1)², and otherwise keep the hypotheses of Theorem 6.1, then for the sequences (x_k)_{k≥0} and (y_k)_{k≥1} generated by (41) the following statements hold: (i) we have the corresponding estimates; (ii) the sequence (x_k)_{k≥0} converges weakly to an element of zer A.
(iii) The sequence (y_k)_{k≥1} converges weakly to an element of zer A as well. Notice that the condition required for (γ_k)_{k≥0} is fulfilled in particular for γ_k = kⁿ for every k ≥ 1 and any natural number n ≥ 1. Thus, by choosing n large, we obtain a fast convergence rate for A_{γ_k}(x_k) as k → +∞.

A Appendix
The following are three auxiliary lemmas that are used in the proof of Theorem 3.1.The proof for Lemma A.1 can be found in [12], while the proof of Lemma A.

Figure 2: Fast convergence of the velocities

Figure 3: Trajectories and objective function values in the case A = ∂f

Figure 5: Trajectories of (Split-DIN-AVD) for finding the zeros of A + B