Time Rescaling of a Primal-Dual Dynamical System with Asymptotically Vanishing Damping

In this work, we approach the minimization of a continuously differentiable convex function under linear equality constraints by a second-order dynamical system with an asymptotically vanishing damping term. The system under consideration is a time rescaled version of another system previously found in the literature. We show fast convergence of the primal-dual gap, the feasibility measure, and the objective function value along the generated trajectories. These convergence rates now depend on the rescaling parameter, and thus can be improved by choosing said parameter appropriately. When the objective function has a Lipschitz continuous gradient, we show that the primal-dual trajectory asymptotically converges weakly to a primal-dual optimal solution to the underlying minimization problem. We also exhibit improved rates of convergence of the gradient along the primal trajectories and of the adjoint of the corresponding linear operator along the dual trajectories. We illustrate the theoretical outcomes and also carry out a comparison with other classes of dynamical systems through numerical experiments.

In recent years, there has been a flurry of research on the relationship between continuous-time dynamical systems and the numerical algorithms that arise from their discretizations. For unconstrained optimization problems, it has long been known that inertial systems with damped velocities enjoy good convergence properties. For a convex, smooth function $f \colon \mathcal{X} \to \mathbb{R}$, Polyak was the first to consider the heavy ball with friction dynamics [43,42]
$$\ddot{x}(t) + \gamma \dot{x}(t) + \nabla f(x(t)) = 0. \tag{HBF}$$
Alvarez and Attouch continued this line of study, focusing on inertial dynamics with a fixed viscous damping coefficient [2,3,14]. Later on, Cabot, Engler, and Gadat [26,27] considered systems in which $\gamma$ is replaced by a time-dependent damping coefficient $\gamma(t)$. In [46], Su, Boyd, and Candès showed that fast convergence rates can be achieved by introducing a time-dependent damping coefficient which vanishes in a controlled manner, neither too fast nor too slowly, as $t$ goes to infinity:
$$\ddot{x}(t) + \frac{\alpha}{t} \dot{x}(t) + \nabla f(x(t)) = 0. \tag{AVD}$$
For $\alpha \ge 3$, the authors showed that a solution $x \colon [t_0, +\infty) \to \mathcal{X}$ of (AVD) satisfies $f(x(t)) - f(x^*) = O\big(1/t^2\big)$ as $t \to +\infty$. In fact, the choice $\alpha = 3$ provides a continuous-time counterpart to Nesterov's celebrated accelerated gradient algorithm [39,40,19]. Weak convergence of the trajectories to minimizers of $f$ when $\alpha > 3$ has been shown by Attouch, Chbani, Peypouquet, and Redont in [17] and by May in [38], together with the improved rate of convergence $f(x(t)) - f(x^*) = o\big(1/t^2\big)$ as $t \to +\infty$. In the meantime, similar results for the discrete counterpart were reported by Chambolle and Dossal in [28], and by Attouch and Peypouquet in [15].
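Although the analysis in this paper is purely theoretical, the qualitative behavior of (AVD) is easy to reproduce numerically. The following sketch (not part of the original work; the integrator, step size, and test function are our own illustrative choices) integrates (AVD) for the quadratic $f(x) = \tfrac{1}{2}\|x\|^2$ with a crude semi-implicit Euler scheme and observes the decay of $f(x(t)) - f^*$:

```python
import numpy as np

def solve_avd(grad_f, x0, v0, alpha=3.0, t0=1.0, t_end=50.0, h=1e-3):
    """Integrate  x''(t) + (alpha/t) x'(t) + grad_f(x(t)) = 0
    with a semi-implicit Euler scheme (illustrative only)."""
    t, x, v = t0, np.asarray(x0, float), np.asarray(v0, float)
    while t < t_end:
        a = -(alpha / t) * v - grad_f(x)  # acceleration prescribed by the ODE
        v = v + h * a                     # update velocity first (more stable)
        x = x + h * v
        t += h
    return x

# Quadratic test: f(x) = 0.5 * ||x||^2, minimizer x* = 0, f* = 0.
x_end = solve_avd(lambda x: x, x0=[1.0], v0=[0.0])
f_gap = 0.5 * x_end[0] ** 2
```

With $\alpha = 3$ and this quadratic, the objective gap at $t = 50$ is several orders of magnitude below its initial value of $0.5$, consistent with the $O(1/t^2)$ rate.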
In [10], Attouch, Chbani, and Riahi proposed an inertial proximal-type algorithm which results from a discretization of the time rescaled (AVD) system
$$\ddot{x}(t) + \frac{\alpha}{t} \dot{x}(t) + \delta(t) \nabla f(x(t)) = 0,$$
where the time scaling function $\delta \colon [t_0, +\infty) \to \mathbb{R}_+$ satisfies a certain growth condition and also enters the convergence rate statement $f(x(t)) - f(x^*) = O\Big(\frac{1}{t^2 \delta(t)}\Big)$ as $t \to +\infty$. The resulting algorithm is considerably simpler than the founding proximal point algorithm proposed by Güler in [31], while providing comparable convergence rates for the function values.
In order to approach constrained optimization problems, the Augmented Lagrangian Method (ALM) [44] (for linearly constrained problems), the Alternating Direction Method of Multipliers (ADMM) [29,25] (for problems with separable objectives and block variables linearly coupled in the constraints), and some of their variants have been shown to enjoy substantial success. Continuous-time approaches for structured convex minimization problems formulated in the spirit of the full splitting paradigm have recently been addressed in [24] and, closely connected to our approach, in [49,32,6,23], at which we will take a closer look in Subsection 2.2. The temporal discretization of these dynamics gives rise to numerical algorithms with fast convergence rates [34,33] and with convergence guarantees for the generated iterates [22], without additional assumptions such as strong convexity.
In this paper, we investigate a second-order dynamical system with asymptotically vanishing damping and a time rescaling term, which is associated with the optimization problem (1.1) and formulated in terms of its augmented Lagrangian. The case without the time rescaling term has been studied in [23]. We show that, by introducing this time rescaling function, we are able to derive faster convergence rates for the primal-dual gap, the feasibility measure, and the objective function value along the generated trajectories, while still maintaining the asymptotic convergence of the trajectories towards a primal-dual optimal solution. On the other hand, this work can also be viewed as an extension of the time rescaling technique developed in [10,13] to the constrained case. To our knowledge, the trajectory convergence for dynamics with time scaling is new, even in the unconstrained case.

Notations and a preliminary result
For both Hilbert spaces $\mathcal{X}$ and $\mathcal{Y}$, the inner product and the associated norm will be denoted by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, respectively. The Cartesian product $\mathcal{X} \times \mathcal{Y}$ will be endowed with the inner product and the associated norm defined for $(x, \lambda), (z, \mu) \in \mathcal{X} \times \mathcal{Y}$ as
$$\big\langle (x, \lambda), (z, \mu) \big\rangle = \langle x, z \rangle + \langle \lambda, \mu \rangle \quad \text{and} \quad \|(x, \lambda)\| = \sqrt{\|x\|^2 + \|\lambda\|^2}.$$
Under the assumptions (1.2), $L$ is convex with respect to $x \in \mathcal{X}$ and affine with respect to $\lambda \in \mathcal{Y}$.
A pair $(x^*, \lambda^*) \in \mathcal{X} \times \mathcal{Y}$ is said to be a saddle point of the Lagrangian function $L$ if for every $(x, \lambda) \in \mathcal{X} \times \mathcal{Y}$
$$L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*).$$
If $(x^*, \lambda^*) \in \mathcal{X} \times \mathcal{Y}$ is a saddle point of $L$, then $x^* \in \mathcal{X}$ is an optimal solution of (1.1) and $\lambda^* \in \mathcal{Y}$ is an optimal solution of its Lagrange dual problem. If $x^* \in \mathcal{X}$ is an optimal solution of (1.1) and a suitable constraint qualification is fulfilled, then there exists an optimal solution $\lambda^* \in \mathcal{Y}$ of the Lagrange dual problem such that $(x^*, \lambda^*) \in \mathcal{X} \times \mathcal{Y}$ is a saddle point of $L$. For details and insights into the topic of constraint qualifications for convex duality we refer to [18,20]. The set of saddle points of $L$, also called primal-dual optimal solutions of (1.1), will be denoted by $S$ and, as stated in the assumptions, is assumed to be nonempty. The set of feasible points of (1.1) will be denoted by $\mathcal{F} := \{x \in \mathcal{X} : Ax = b\}$ and the optimal objective value of (1.1) by $f^*$.
The system of primal-dual optimality conditions for (1.1) reads
$$(x^*, \lambda^*) \in S \quad \Longleftrightarrow \quad \begin{cases} \nabla f(x^*) + A^* \lambda^* = 0, \\ Ax^* - b = 0, \end{cases} \tag{2.3}$$
where $A^* \colon \mathcal{Y} \to \mathcal{X}$ denotes the adjoint operator of $A$.
For $\beta \ge 0$, we also consider the augmented Lagrangian $L_\beta \colon \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ associated with (1.1),
$$L_\beta(x, \lambda) := L(x, \lambda) + \frac{\beta}{2} \|Ax - b\|^2.$$
For every $(x, \lambda) \in \mathcal{F} \times \mathcal{Y}$ it holds
$$f(x) = L_\beta(x, \lambda) = L(x, \lambda). \tag{2.5}$$
If $(x^*, \lambda^*) \in S$, then for every $(x, \lambda) \in \mathcal{X} \times \mathcal{Y}$ we have $L_\beta(x^*, \lambda) \le L_\beta(x^*, \lambda^*) \le L_\beta(x, \lambda^*)$. In addition, from (2.3) it follows that, for any $\beta \ge 0$, the sets of saddle points of $L$ and $L_\beta$ are identical.
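The identity (2.5) can be sanity-checked numerically. The sketch below (our own illustration; the random data and the quadratic objective are assumptions, not taken from the paper) verifies that $f$, $L$ and $L_\beta$ coincide at a feasible point:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))       # illustrative constraint operator
b = rng.standard_normal(2)
f = lambda x: 0.5 * np.dot(x, x)      # illustrative convex objective

def L(x, lam):                        # Lagrangian
    return f(x) + lam @ (A @ x - b)

def L_beta(x, lam, beta):             # augmented Lagrangian
    return L(x, lam) + 0.5 * beta * np.linalg.norm(A @ x - b) ** 2

# A feasible point: the underdetermined system Ax = b is solved exactly.
x_feas = np.linalg.lstsq(A, b, rcond=None)[0]
lam = rng.standard_normal(2)
vals = (f(x_feas), L(x_feas, lam), L_beta(x_feas, lam, beta=10.0))
```

On the feasible set the penalty term and the multiplier term both vanish, so all three values agree up to floating-point error.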
The case of (2.6) in which there is no time rescaling, i.e., when $\delta(t) \equiv 1$, was studied by Zeng, Lei, and Chen in [49], and by Boţ and Nguyen in [23]. Systems with more general damping, extrapolation and time rescaling coefficients were addressed by He, Hu, and Fang in [32,35] and by Attouch, Chbani, Fadili, and Riahi in [6].
It is well known that the viscous damping term $\frac{\alpha}{t}$ plays a vital role in achieving fast convergence in unconstrained minimization [7,9,38]. The role of the extrapolation $\theta t$ is to induce more flexibility in the dynamical system and in the associated discrete schemes, as has recently been noticed in [6,11,32,49]. The time scaling function $\delta(\cdot)$ serves to further improve the rates of convergence of the objective function value along the trajectory, as was noticed in the context of unconstrained minimization problems in [5,10,12] and of linearly constrained minimization problems in [6,35].
Finally, we mention that extending the results in this paper to the multi-block case is possible. For further details, we refer the reader to [23, Section 2.4].

Associated monotone inclusion problem
The optimality system (2.3) can be equivalently written as
$$(x^*, \lambda^*) \in S \quad \Longleftrightarrow \quad T_L(x^*, \lambda^*) = 0, \tag{2.7}$$
where
$$T_L(x, \lambda) := \big(\nabla f(x) + A^* \lambda, \; b - Ax\big)$$
is the maximally monotone operator associated with the convex-concave function $L$. Indeed, it is immediate to verify that $T_L$ is monotone. Since it is also continuous, it is maximally monotone (see, for instance, [18, Corollary 20.28]). Therefore $S$ can be interpreted as the set of zeros of the maximally monotone operator $T_L$, which means that it is a closed convex subset of $\mathcal{X} \times \mathcal{Y}$ (see, for instance, [18, Proposition 23.39]).
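The monotonicity of $T_L$ is easy to verify numerically for a convex quadratic objective. The following sketch (our own illustration; the random problem data are assumptions) checks $\langle T_L(u) - T_L(v), u - v \rangle \ge 0$ on random pairs:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)
Q = rng.standard_normal((4, 4)); Q = Q.T @ Q   # PSD Hessian, so f is convex
grad_f = lambda x: Q @ x

def T_L(x, lam):
    """Operator (grad f(x) + A^T lam, b - A x) stacked into one vector."""
    return np.concatenate([grad_f(x) + A.T @ lam, b - A @ x])

# Check <T(u) - T(v), u - v> >= 0 for random pairs u = (x, lam), v = (z, mu):
# the skew parts cancel, leaving the quadratic form of Q, which is >= 0.
gaps = []
for _ in range(100):
    x, z = rng.standard_normal(4), rng.standard_normal(4)
    lam, mu = rng.standard_normal(2), rng.standard_normal(2)
    du = np.concatenate([x - z, lam - mu])
    gaps.append(np.dot(T_L(x, lam) - T_L(z, mu), du))
min_gap = min(gaps)
```

The cross terms $\langle A^*(\lambda - \mu), x - z \rangle$ and $\langle -A(x - z), \lambda - \mu \rangle$ cancel exactly, which is why the affine coupling does not break monotonicity.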
Even in the non-rescaled case, applying the fast continuous-time approaches recently proposed in [16,4] to the solving of (2.7) would require the use of the Moreau-Yosida approximation of the operator $T_L$, for which in general no closed formula is available. The resulting dynamical system would therefore not be formulated in the spirit of the full splitting paradigm, which is undesirable from the point of view of numerical computations. We also mention the work [21], which is related to this time rescaling approach.

Faster convergence rates via time rescaling
In this section we derive fast convergence rates for the primal-dual gap, the feasibility measure, and the objective function value along the trajectories generated by the dynamical system (2.6). Throughout this section we make the following assumptions on the parameters $\alpha, \theta, \beta$ and the function $\delta$.

Assumption 1. In (2.6), assume that $\delta \colon [t_0, +\infty) \to (0, +\infty)$ is continuously differentiable. Moreover, suppose that the parameters $\alpha, \beta, \theta$ and the function $\delta$ satisfy
$$\alpha \ge 3, \qquad \beta \ge 0, \qquad \frac{1}{2} \ge \theta \ge \frac{1}{\alpha - 1} \qquad \text{and} \qquad \sup_{t \ge t_0} \frac{t \dot{\delta}(t)}{\delta(t)} \le \frac{1 - 2\theta}{\theta}.$$
Besides the first three conditions, which are known from [23], it is worth pointing out that from the last one we can deduce the following inequality for every $t \ge t_0$:
$$\theta t \dot{\delta}(t) \le (1 - 2\theta)\, \delta(t).$$
This gives a connection to the condition which appears in [10]. A few more comments regarding the function $\delta$ will come later, after the convergence rate statements.

The energy function
Let $(x, \lambda) \colon [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (2.6), where $f^*$ denotes the optimal objective value of (1.1). Assumption 1 implies the nonnegativity of the following quantity, which will appear many times in our analysis:
$$L_\beta\big(x(t), \lambda^*\big) - f^* = L_\beta\big(x(t), \lambda^*\big) - L_\beta(x^*, \lambda^*) \ge 0,$$
where we recall that the equality $f^* = L_\beta(x^*, \lambda^*)$ comes from (2.5). By multiplying this inequality by $\theta t \delta(t)$ and combining it with (3.10), the coefficient attached to the primal-dual gap becomes the desired one, which finally gives the claimed statement.
(ii) Inequality (3.16) tells us that $E$ is nonincreasing on $[t_0, +\infty)$. Hence, for every $t \ge t_0$ it holds $E(t) \le E(t_0)$. Assuming $\alpha > 3$ and $\frac{1}{2} \ge \theta > \frac{1}{\alpha - 1}$, we immediately see that $\xi > 0$. From (3.17) we obtain, for every $t \ge t_0$, a bound which implies the boundedness of the trajectory. On the other hand, the same inequality gives, for all $t \ge t_0$, a bound on $\|v(t)\|$. Using the triangle inequality and (3.18), we obtain for all $t \ge t_0$ a bound on $t \big\|\big(\dot{x}(t), \dot{\lambda}(t)\big)\big\|$, which gives the desired convergence rate.

Fast convergence rates for the primal-dual gap, the feasibility measure and the objective function value
The following are the main convergence rates results of the paper.
Some comments regarding the previous proof and results are in order.
Remark 3.4. The proof provided here is significantly shorter than the one in [23], thanks to Lemma A.1. This lemma is inspired by the one used in [34] for showing the fast convergence to zero of the feasibility measure, although the authors there study a different dynamical system. On the other hand, when $\delta(t) \equiv 1$, the results in [23] are stronger than the ones we obtain here, as they give the $O\big(1/t^2\big)$ rate for the sum of the primal-dual gap and the feasibility measure, rather than for each one individually.
Remark 3.5.Here are some remarks comparing our rates of convergence to those in [6,35].
• Primal-dual gap: According to (3.21), the following rate of convergence for the primal-dual gap is exhibited:
$$L\big(x(t), \lambda^*\big) - L\big(x^*, \lambda(t)\big) = O\Big(\frac{1}{t^2 \delta(t)}\Big) \quad \text{as } t \to +\infty,$$
which coincides with the findings of [6,35].
• In [6], only the upper bound exhibits this order of convergence; the lower bound obtained there is of order $O\Big(\frac{1}{t \sqrt{\delta(t)}}\Big)$ as $t \to +\infty$. In [35], there are no comments on the rate attained by the function values in the case of a general time rescaling parameter.
Remark 3.6. To further illustrate, notice that $\delta(t) := \delta_0 t^n$ for all $t \ge t_0$ fulfills Assumption 1 provided that $\delta_0 > 0$ and $0 \le n \le \frac{1 - 2\theta}{\theta} = \frac{1}{\theta} - 2$. With the maximal choice of $n$, all the statements derived above are of order $O\big(1/t^{1/\theta}\big)$. If we desire fast convergence rates, we must take $\theta$ small, which in the light of Assumption 1 can be achieved by choosing $\alpha$ large enough. Such behavior can be seen in the unconstrained case [10] and in other settings [35,21].

Weak convergence of the trajectory to a primal-dual solution

In this section we show that the solutions of (2.6) converge weakly to an element of $S$. The fact that $\delta(t)$ enters the convergence rate statements suggests that one can benefit from this time rescaling function when it is at least nondecreasing on $[t_0, +\infty)$. We are, in fact, going to need this condition when showing trajectory convergence.
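The arithmetic behind Remark 3.6 can be checked directly. The snippet below (illustrative only; $\theta = 1/4$ is an assumed value, chosen so that all exponents are exact in floating point) confirms that for $\delta(t) = \delta_0 t^n$ with $n = 1/\theta - 2$, the quantity $1/(t^2 \delta(t))$ equals $t^{-1/\theta}$, and that this $n$ satisfies the growth bound $n \le (1 - 2\theta)/\theta$:

```python
theta = 0.25
n = 1 / theta - 2                     # maximal admissible exponent (= 2 here)
delta = lambda t, d0=1.0: d0 * t**n   # time rescaling delta(t) = delta0 * t^n

t = 10.0
rate = 1.0 / (t**2 * delta(t))        # the O(1/(t^2 delta(t))) quantity
expected = t ** (-1 / theta)          # the claimed O(1/t^(1/theta)) order

cond = n <= (1 - 2 * theta) / theta   # growth bound from Assumption 1
```

Since $t^2 \cdot t^{1/\theta - 2} = t^{1/\theta}$, the two expressions agree identically; smaller $\theta$ makes the exponent $1/\theta$, and hence the rate, larger.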

where $F \colon [t_0, +\infty) \times \mathcal{X} \times \mathcal{Y} \times \mathcal{X} \times \mathcal{Y} \to \mathcal{X} \times \mathcal{Y} \times \mathcal{X} \times \mathcal{Y}$ is given by
$$F(t, z, \mu, w, \eta) = \Big(w, \; \eta, \; -\frac{\alpha}{t} w - \delta(t) \nabla_x L_\beta\big(z, \mu + \theta t \eta\big), \; -\frac{\alpha}{t} \eta + \delta(t) \nabla_\lambda L_\beta\big(z + \theta t w, \mu\big)\Big).$$
We omit the detailed proof and only state the result in the following theorem.
The additional Lipschitz continuity condition on $\nabla f$ and the fact that $\delta$ is nondecreasing give rise to the following two essential integrability statements.

Proposition 4.2. Let $(x, \lambda) \colon [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (2.6). Integration of this inequality produces (4.4).
The finiteness of the second integral is entailed by (3.13) only when $\beta > 0$. For the general case $\beta \ge 0$, we use (3.22) and the fact that $\delta$ is nondecreasing on $[t_0, +\infty)$ to obtain the claim, and the proof is complete.

Now, for a given primal-dual solution $(x^*, \lambda^*) \in S$, we define the following mappings on $[t_0, +\infty)$, where the last inequality follows from Assumption 2:
$$1 - \theta\alpha \le -\theta < 0, \qquad -\delta(t) + \theta t \dot{\delta}(t) \le (2\theta - 1)\delta(t) + \theta t \dot{\delta}(t) \le 0.$$
The desired result then follows after some rearranging.
The following lemma ensures that the first condition of Opial's Lemma is met.
Proof. For any $t \ge t_0$, we multiply (4.3) by $t$ and drop the last two squared-norm terms to obtain
$$t \ddot{\varphi}(t) + \alpha \dot{\varphi}(t) + \theta t^2 \dot{W}(t) \le 0.$$
Recall from (4.6) that for every $t \ge t_0$ we have (4.12). On the one hand, according to (3.14), the second summand of the previous expression belongs to $L^1[t_0, +\infty)$. On the other hand, using (4.3) and (3.12), we see that the corresponding integral is finite. Hence, the first summand of (4.12) also belongs to $L^1[t_0, +\infty)$, which implies that the mapping $t \mapsto t W(t)$ belongs to $L^1[t_0, +\infty)$ as well. To achieve the desired conclusion, we make use of Lemma A.4 with $\phi := \varphi$ and $w := \theta W$.
The following results guarantee that the second condition of Opial's Lemma is also met. On the one hand, the conclusion follows after dividing the inequality by $\delta(t)$.
It follows that
• The integral $I_4(t)$. Yet again, integration by parts produces the bound, and from here (4.25) follows.
• The integral $I_5(t)$. Integration by parts entails a further bound. By the Cauchy-Schwarz inequality, we deduce the estimate, and thus arrive at an inequality where, for $s \ge t_0$,
$$V(s) := \theta(\alpha + 1) W(s) + \Big(\frac{\alpha(\alpha - 1)}{t_0^2 \delta(t_0)} + \|A\|\Big) \big\|\big(\dot{x}(s), \dot{\lambda}(s)\big)\big\|^2 + C_3\, \delta(s) \|\nabla f(x(s)) - \nabla f(x^*)\|^2 + C_4\, \delta(s) \|Ax(s) - b\|^2,$$
and the constant $C_5$ is given accordingly. Now we divide (4.27) by $t^\alpha$, thus obtaining a new inequality, and integrate it from $t_0$ to $r$ to get (4.28). We now recall some important facts. First of all, we have (4.29). In addition, according to Lemma A.2, (4.30) and (4.31) hold, respectively. Finally, integrating by parts leads to
$$-\int_{t_0}^{r} \big\langle A\dot{x}(t), \lambda(t) - \lambda^* \big\rangle \, dt = -\big\langle Ax(r) - b, \lambda(r) - \lambda^* \big\rangle + \big\langle Ax(t_0) - b, \lambda(t_0) - \lambda^* \big\rangle + \int_{t_0}^{r} \big\langle Ax(t) - b, \dot{\lambda}(t) \big\rangle \, dt$$
$$\le \|Ax(r) - b\| \|\lambda(r) - \lambda^*\| + \|Ax(t_0) - b\| \|\lambda(t_0) - \lambda^*\| + \int_{t_0}^{r} \|Ax(t) - b\| \|\dot{\lambda}(t)\| \, dt.$$
We now come to the final step and show the weak convergence of the trajectories of (2.6) to elements of $S$.

Theorem 4.9. Let $(x, \lambda) \colon [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (2.6) and $(x^*, \lambda^*) \in S$. Then $\big(x(t), \lambda(t)\big)$ converges weakly to a primal-dual solution of (1.1) as $t \to +\infty$.
In order to show condition (ii), we recall the operator $T_L$ defined in (2.7). Fix $(x^*, \lambda^*) \in S$ (in other words, $T_L(x^*, \lambda^*) = 0$) and let $(\bar{x}, \bar{\lambda})$ be any weak sequential cluster point of $\big(x(t), \lambda(t)\big)$ as $t \to +\infty$, which means that there exists a strictly increasing sequence $(t_n)_{n \in \mathbb{N}} \subseteq [t_0, +\infty)$ such that $\big(x(t_n), \lambda(t_n)\big) \rightharpoonup (\bar{x}, \bar{\lambda})$ as $n \to +\infty$.
Given that $\delta$ is nondecreasing on $[t_0, +\infty)$, from Theorem 4.7 and (3.22) we deduce that
$$\nabla f(x(t_n)) - \nabla f(x^*) \to 0, \qquad A^* \lambda(t_n) - A^* \lambda^* \to 0, \qquad Ax(t_n) - b \to 0 \qquad \text{as } n \to +\infty.$$
Since $A^* \lambda^* = -\nabla f(x^*)$, the previous three statements imply $T_L\big(x(t_n), \lambda(t_n)\big) \to 0$ as $n \to +\infty$.
The dynamical system in $x$ reads
$$\ddot{x}(t) + \frac{\alpha}{t} \dot{x}(t) + \delta(t) \nabla f(x(t)) = 0, \qquad x(t_0) = x_0 \text{ and } \dot{x}(t_0) = \dot{x}_0,$$
for $\alpha \ge 3$, and for $\delta(t) \equiv 1$ is nothing else than Nesterov's accelerated gradient system. The trajectory generated by the system in $\lambda$ is
$$\lambda(t) = \lambda_0 + \frac{\dot{\lambda}_0 t_0}{\alpha - 1} \left(1 - \Big(\frac{t_0}{t}\Big)^{\alpha - 1}\right) \quad \text{for every } t \ge t_0.$$
The parameters $\theta$ and $\beta$ play no role in the system. Therefore, the condition on $\delta(t)$ now becomes
$$\sup_{t \ge t_0} \frac{t \dot{\delta}(t)}{\delta(t)} \le \alpha - 3,$$
which is exactly the one imposed in [10], see also [13].
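The closed form of the $\lambda$-trajectory can be verified against the ODE it solves. The sketch below (our own derivation, assuming the $\lambda$-dynamics reduce to $\ddot{\lambda}(t) + \frac{\alpha}{t}\dot{\lambda}(t) = 0$ when $A = 0$ and $b = 0$; the initial values are illustrative) checks the residual by finite differences:

```python
# Assumed reduction: lam'' + (alpha/t) lam' = 0 integrates to
#   lam'(t) = lam1 * (t0/t)**alpha,
#   lam(t)  = lam0 + lam1*t0/(alpha-1) * (1 - (t0/t)**(alpha-1)).
alpha, t0 = 8.0, 1.0
lam0, lam1 = 0.5, -2.0        # illustrative values for lam(t0), lam'(t0)

def lam(t):
    return lam0 + lam1 * t0 / (alpha - 1) * (1 - (t0 / t) ** (alpha - 1))

# Verify the ODE residual lam'' + (alpha/t) lam' = 0 via central differences.
t, h = 3.0, 1e-4
d1 = (lam(t + h) - lam(t - h)) / (2 * h)
d2 = (lam(t + h) - 2 * lam(t) + lam(t - h)) / h**2
residual = d2 + (alpha / t) * d1
```

Note that $\lambda(t)$ converges to a finite limit as $t \to +\infty$ precisely because $\alpha > 1$, so the dual trajectory stagnates rather than tracking a multiplier in this degenerate case.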
If $\alpha \ge 3$, then Theorem 3.3 (iii) says that $f(x(t))$ converges to $f^*$ with a rate of convergence of $O\Big(\frac{1}{t^2 \delta(t)}\Big)$ as $t \to +\infty$, which is the rate derived in [10,13].
If $\alpha > 3$, then Theorem 4.9 gives that the trajectory $x(t)$ converges weakly to an optimal solution of (4.40) as $t \to +\infty$. Notice that no trajectory convergence has been reported in [10,13].
Finally, we mention that the convergence of the trajectory in the critical case $\alpha = 3$ is still an open question, even in the case without time rescaling ([7,46]), as is the convergence of the iterates of Nesterov's original acceleration algorithm ([19,39]).

Numerical experiments
We illustrate the theoretical results with two numerical examples, with $\mathcal{X} = \mathbb{R}^4$ and $\mathcal{Y} = \mathbb{R}^2$. We address two minimization problems with linear constraints: one with a strongly convex objective function, and another with a convex objective function which is not strongly convex. In both cases, the linear constraints are dictated by the same operator $A$ and right-hand side $b$. For the first example, the optimality conditions can be solved explicitly and lead to a primal-dual solution pair $(x^*, \lambda^*)$. The second problem is similar to the regularized logistic regression frequently used in machine learning; here we cannot explicitly solve the optimality conditions, so we instead use the last iterate of the numerical experiment as an approximation of the solution.
To comply with Assumption 2, we choose $t_0 > 0$, $\alpha = 8$, $\beta = 10$, $\theta = \frac{1}{6}$, and we test four different choices of the rescaling parameter: $\delta(t) = 1$ (i.e., the (PD-AVD) dynamics in [49,23]), $\delta(t) = t$, $\delta(t) = t^2$ and $\delta(t) = t^3$. For each choice of $\delta$, we plot, using a logarithmic scale, the primal-dual gap $L\big(x(t), \lambda^*\big) - L\big(x^*, \lambda(t)\big)$, the feasibility measure $\|Ax(t) - b\|$ and the function values $|f(x(t)) - f^*|$, to highlight the theoretical results in Theorem 3.3. We also illustrate the findings of Theorem 4.7; namely, we plot the quantities $\|\nabla f(x(t)) - \nabla f(x^*)\|$ and $\|A^*(\lambda(t) - \lambda^*)\|$, as well as the velocity $\|(\dot{x}(t), \dot{\lambda}(t))\|$. Figures 5.1 and 5.2 display these plots for Examples 5.1 and 5.2, respectively. As predicted by the theory, choosing faster-growing time rescaling parameters yields better convergence rates. This is not the case for the velocities.
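For readers who wish to reproduce experiments of this kind, the following sketch (not the authors' code; the explicit Euler discretization, the small random quadratic program, and all numerical values are our own assumptions, and the right-hand sides reflect our reading of a system of type (2.6), with the extrapolation entering as in [23]) integrates the primal-dual dynamics and monitors the feasibility measure:

```python
import numpy as np

# Our reading of a system of type (2.6):
#   x''  + (alpha/t) x'   + delta(t) * grad_x L_beta(x, lam + theta*t*lam') = 0
#   lam'' + (alpha/t) lam' - delta(t) * (A(x + theta*t*x') - b)             = 0
rng = np.random.default_rng(2)
A = rng.standard_normal((2, 4)); b = rng.standard_normal(2)
grad_f = lambda x: x                    # f(x) = 0.5 ||x||^2 (strongly convex)
alpha, beta, theta = 8.0, 10.0, 1 / 6   # parameter choices from the text
delta = lambda t: t                     # one of the tested rescalings

t, h = 1.0, 1e-4
x, vx = np.zeros(4), np.zeros(4)
lam, vlam = np.zeros(2), np.zeros(2)
feas0 = np.linalg.norm(A @ x - b)       # initial feasibility measure
while t < 10.0:
    r = A @ x - b
    gx = grad_f(x) + A.T @ (lam + theta * t * vlam) + beta * (A.T @ r)
    ax = -(alpha / t) * vx - delta(t) * gx
    alam = -(alpha / t) * vlam + delta(t) * (A @ (x + theta * t * vx) - b)
    vx += h * ax; x += h * vx           # semi-implicit Euler updates
    vlam += h * alam; lam += h * vlam
    t += h
feas_end = np.linalg.norm(A @ x - b)
```

The crude explicit discretization is only meant for illustration; by the final time the feasibility measure has decreased from its initial value, in line with the decay predicted by Theorem 3.3.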

A Appendix
Here we collect the auxiliary results that are required to carry out many steps in our analysis.
A proof of the following lemma in the finite-dimensional case can be found in [34, Lemma 6]. The proof in the infinite-dimensional case is short and virtually identical, so we include it here for the sake of completeness.

Lemma A.1. Assume that $t_0 > 0$, that $g \colon [t_0, +\infty) \to \mathcal{Y}$ is a continuously differentiable function, that $a \colon [t_0, +\infty) \to [0, +\infty)$ is a continuous function, and that $C \ge 0$.
The following lemma is a slight variation of results already present in the literature.See, for example, [8,Lemma A.2].
Then the positive part $[\dot{\phi}]_+$ of $\dot{\phi}$ belongs to $L^1[t_0, +\infty)$ and the limit $\lim_{t \to +\infty} \phi(t)$ exists as a real number.

Problem statement and motivation

In this paper we consider the optimization problem
$$\min f(x), \quad \text{subject to } Ax = b, \tag{1.1}$$
where $\mathcal{X}$, $\mathcal{Y}$ are real Hilbert spaces; $f \colon \mathcal{X} \to \mathbb{R}$ is a continuously differentiable convex function; $A \colon \mathcal{X} \to \mathcal{Y}$ is a continuous linear operator and $b \in \mathcal{Y}$. (1.2)

In both examples, the initial conditions are $x(t_0) = x_0$, $\dot{x}(t_0) = \dot{x}_0$, $\lambda(t_0) = \lambda_0$ and $\dot{\lambda}(t_0) = \dot{\lambda}_0$.

The supremum term is finite due to the boundedness of the trajectory [23]. Now, by using the nonnegativity of $\varphi$ and the facts (4.29), (4.30), (4.31) and (4.32) in (4.28), we conclude that the mappings involving $\|Ax(t) - b\|^2$ and $\|\dot{\lambda}(t)\|^2$ belong to $L^1[t_0, +\infty)$. Therefore, by taking the limit as $r \to +\infty$ in (4.33) we obtain the claim.

Let $(x, \lambda) \colon [t_0, +\infty) \to \mathcal{X} \times \mathcal{Y}$ be a solution of (2.6) and $(x^*, \lambda^*) \in S$. According to (3.14) and (4.4), the right-hand side of the previous inequality belongs to $L^1[t_1, +\infty)$. Since $\delta$ is nondecreasing, for every $t \ge t_1$ the corresponding bound holds. According to (3.14) and (4.19), the right-hand side of the previous inequality belongs to $L^1[t_1, +\infty)$. Arguing as in (4.39), we deduce that the function being differentiated also belongs to $L^1[t_1, +\infty)$. Again applying Lemma A.3, we reach the conclusion. Finally, recalling that $A^* \lambda^* = -\nabla f(x^*)$, we deduce from the triangle inequality that
$$\big\|\nabla_x L\big(x(t), \lambda(t)\big)\big\| = \|\nabla f(x(t)) + A^* \lambda(t)\| \le \|\nabla f(x(t)) - \nabla f(x^*)\| + \|A^*(\lambda(t) - \lambda^*)\|.$$
The previous theorem is also of independent interest. It tells us that the time rescaling parameter also plays a role in accelerating the rates of convergence of $\|\nabla f(x(t)) - \nabla f(x^*)\|$ and $\|A^*(\lambda(t) - \lambda^*)\|$ as $t \to +\infty$. Moreover, we deduce from (4.34) that the mapping $(x, \lambda) \mapsto (\nabla f(x), A^* \lambda)$ is constant along $S$, as reported in [23, Proposition A.4].
Indeed, the system of optimality conditions (2.3) reads in this case
$$(x^*, \lambda^*) \in S \quad \Longleftrightarrow \quad \nabla f(x^*) = 0 \text{ and } \lambda^* \in \mathcal{Y};$$
in particular, $x^* \in \mathcal{X}$ is an optimal solution of (4.40) if and only if $\nabla f(x^*) = 0$.
Consider the minimization problem $\min f(x_1, x_2, x_3, x_4)$ subject to $Ax = b$.

Proof of Lemma A.1. Define, for every $t \ge t_0$, the function $G$. Fix $t \ge t_0$. The time derivative of $G$ can be computed, so by employing (A.2) and the previous equality we obtain, for every $t \ge t_0$, a bound with constant $2C$, which leads to the announced statement. Since $G(t_0) = 0$, we have $G(t) = G(t) - G(t_0)$.

The proofs of the following results can be found in [23, Lemma A.1] and [1, Lemma 5.2], respectively.

Lemma A.2. Let $0 < t_0 \le r \le +\infty$ and $h \colon [t_0, +\infty) \to [0, +\infty)$ be a continuous function. For every $\alpha > 1$ it holds
$$\int_{t_0}^{r} \frac{1}{t^{\alpha}} \left( \int_{t_0}^{t} s^{\alpha - 1} h(s) \, ds \right) dt \le \frac{1}{\alpha - 1} \int_{t_0}^{r} h(t) \, dt.$$