Levenberg-Marquardt dynamics associated to variational inequalities

In connection with the optimization problem $$\inf_{x\in argmin \Psi}\{\Phi(x)+\Theta(x)\},$$ where $\Phi$ is a proper, convex and lower semicontinuous function and $\Theta$ and $\Psi$ are convex and smooth functions defined on a real Hilbert space, we investigate the asymptotic behavior of the trajectories of the nonautonomous Levenberg-Marquardt dynamical system \begin{equation*}\left\{ \begin{array}{ll} v(t)\in\partial\Phi(x(t))\\ \lambda(t)\dot x(t) + \dot v(t) + v(t) + \nabla \Theta(x(t))+\beta(t)\nabla \Psi(x(t))=0, \end{array}\right.\end{equation*} where $\lambda$ and $\beta$ are functions of time controlling the velocity and the penalty term, respectively. We show weak convergence of the generated trajectory to an optimal solution as well as convergence of the objective function values along the trajectories, provided $\lambda$ is monotonically decreasing, $\beta$ satisfies a growth condition and a relation expressed via the Fenchel conjugate of $\Psi$ is fulfilled. When the objective function is assumed to be strongly convex, we can even show strong convergence of the trajectories.

In order to overcome the fact that the classical Newton method requires the solving of an equation which is in general not well-posed, one can use instead the Levenberg-Marquardt method $$\left(\lambda_n \mathrm{Id} + T'(x_n)\right)\frac{x_{n+1}-x_n}{\Delta t_n} + T(x_n) = 0 \quad \forall n \geq 0,$$ where $\mathrm{Id}: H \to H$ denotes the identity operator on $H$, $\lambda_n$ a regularizing parameter and $\Delta t_n > 0$ the step size. When $T: H \rightrightarrows H$ is a (set-valued) maximally monotone operator, Attouch and Svaiter showed in [13] that the above Levenberg-Marquardt algorithm can be seen as a time discretization of the dynamical system \begin{equation*}\left\{ \begin{array}{l} v(t) \in T(x(t))\\ \lambda(t)\dot x(t) + \dot v(t) + v(t) = 0 \end{array}\right.\tag{1}\end{equation*} for approaching the inclusion problem of finding $x \in H$ such that $0 \in Tx$.
This includes as a special instance the problem of minimizing a proper, convex and lower semicontinuous function, when T is taken as its convex subdifferential. Later on, this investigation has been continued in [2] in the context of minimizing the sum of a proper, convex and lower semicontinuous function with a convex and smooth one.
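To make the scheme above concrete, the following minimal sketch applies the Levenberg-Marquardt step to the illustrative choice $T = \nabla f$ for the strongly convex quadratic $f(x) = \frac12\langle Ax, x\rangle - \langle b, x\rangle$ on $H = \mathbb{R}^2$, so that $T'(x) = A$; the matrix $A$, the values $\lambda_n$ and $\Delta t_n$ are arbitrary illustrative choices, not taken from [13].

```python
# Sketch of the Levenberg-Marquardt step
#   (lambda_n * Id + T'(x_n)) (x_{n+1} - x_n) / dt_n + T(x_n) = 0
# for the hypothetical choice T = grad f, f(x) = 0.5 <Ax, x> - <b, x>,
# so that T'(x) = A.  All numerical constants are illustrative.

def lm_step(x, A, b, lam, dt):
    """One Levenberg-Marquardt step in R^2: solve (lam*I + A) d = -T(x_n)."""
    # T(x) = A x - b
    g = [A[0][0]*x[0] + A[0][1]*x[1] - b[0],
         A[1][0]*x[0] + A[1][1]*x[1] - b[1]]
    # M = lam*I + A, inverted explicitly (2x2 case)
    m00, m01 = lam + A[0][0], A[0][1]
    m10, m11 = A[1][0], lam + A[1][1]
    det = m00*m11 - m01*m10
    d = [(-m11*g[0] + m01*g[1]) / det,
         ( m10*g[0] - m00*g[1]) / det]
    return [x[0] + dt*d[0], x[1] + dt*d[1]]

A = [[3.0, 1.0], [1.0, 2.0]]   # positive definite, so T = grad f is monotone
b = [1.0, 1.0]
x = [0.0, 0.0]
for _ in range(80):
    x = lm_step(x, A, b, lam=0.1, dt=1.0)
# the iterates approach the unique zero of T, i.e. the solution of A x = b
```

Since $A$ is positive definite, the error is contracted at each step by the factors $\lambda_n/(\lambda_n + \mu_i)$, where $\mu_i$ are the eigenvalues of $A$, so the iteration converges to $A^{-1}b = (0.2, 0.4)$.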
The condition (7) has its origins in the paper of Attouch and Czarnecki [7], where the solving of $$\inf_{x\in \operatorname{argmin}\Psi} \Phi(x), \tag{8}$$ for $\Phi, \Psi: H \to \mathbb{R}\cup\{+\infty\}$ proper, convex and lower semicontinuous functions, is approached through the nonautonomous first order dynamical system $$0 \in \dot x(t) + \partial\Phi(x(t)) + \beta(t)\partial\Psi(x(t)), \tag{9}$$ by assuming that the penalization function $\beta: [0,+\infty) \to (0,+\infty)$ tends to $+\infty$ as $t \to +\infty$. Several ergodic and nonergodic convergence results have been reported in [7] under the key assumption (7). The paper of Attouch and Czarnecki [7] was the starting point of a remarkable number of research articles devoted to penalization techniques for solving optimization problems of type (3), but also generalizations of the latter in the form of variational inequalities expressed with maximally monotone operators (see [5, 7, 9, 10, 12, 15, 17, 19-21, 26, 27]). In the literature enumerated above, the monotone inclusion problems have been approached either through continuous dynamical systems or through their discrete counterparts formulated as splitting algorithms. In both cases we speak about methods of penalty type, which means in this context that the operator describing the underlying set of the variational inequality under investigation is evaluated as a penalty functional. In the above-listed references one can find more general formulations of the key assumption (7), as well as further examples for which these conditions are satisfied. In Remark 5 and Remark 6 we provide more insights into the relations of the dynamical system (14) to other continuous systems (and their discrete counterparts) from the literature.
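As a standard illustration (a classical example in this line of work, not reproduced from the text above), for the choice $\Psi = \frac12 \operatorname{dist}^2(\cdot, C)$, where $C = \operatorname{argmin}\Psi$ is a nonempty closed convex set, the integrand of the key condition of [7] can be computed in closed form:

```latex
\Psi = \tfrac{1}{2}\operatorname{dist}^2(\cdot, C)
\ \Longrightarrow\ \Psi^* = \sigma_C + \tfrac{1}{2}\|\cdot\|^2,
\qquad\text{hence}\qquad
\beta(t)\Big[\Psi^*\Big(\tfrac{p}{\beta(t)}\Big) - \sigma_C\Big(\tfrac{p}{\beta(t)}\Big)\Big]
  = \frac{\|p\|^2}{2\beta(t)}.
```

Consequently, the condition $\int_0^{+\infty}\beta(t)\big[\Psi^*(p/\beta(t)) - \sigma_C(p/\beta(t))\big]\,dt < +\infty$ is fulfilled for every $p$ as soon as $\int_0^{+\infty} \frac{dt}{\beta(t)} < +\infty$, for instance for $\beta(t) = (1+t)^\theta$ with $\theta > 1$.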

Preliminaries
In this section we present some preliminary definitions, results and tools that will be useful throughout the paper. We consider the following definition of an absolutely continuous function.

Definition 1 A function $x: [0,b] \to H$ (where $b > 0$) is said to be absolutely continuous if one of the following equivalent properties holds: (i) there exists an integrable function $y: [0,b] \to H$ such that $$x(t) = x(0) + \int_0^t y(s)\,ds \quad \forall t \in [0,b];$$ (ii) $x$ is continuous and its distributional derivative is Lebesgue integrable on $[0,b]$; (iii) for every $\varepsilon > 0$, there exists $\eta > 0$ such that for any finite family of pairwise disjoint intervals $I_k = (a_k, b_k) \subseteq [0,b]$ the implication $$\sum_k |b_k - a_k| < \eta \ \Longrightarrow\ \sum_k \|x(b_k) - x(a_k)\| < \varepsilon$$ holds. A function $x: [0,+\infty) \to H$ is said to be locally absolutely continuous if it is absolutely continuous on each interval $[0,b]$ for $0 < b < +\infty$.
Remark 1 (a) It follows from the definition that an absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere, and one can recover the function from its derivative $\dot x = y$ by the integration formula (i).
(b) If $x: [0,b] \to H$ is absolutely continuous and $B: H \to H$ is $L$-Lipschitz continuous for $L \geq 0$, then the function $z = B \circ x$ is absolutely continuous, too. This can be easily seen by using the characterization of absolute continuity in Definition 1(iii). Moreover, $z$ is differentiable almost everywhere on $[0,b]$ and the inequality $\|\dot z(t)\| \leq L\|\dot x(t)\|$ holds for almost every $t \in [0,b]$.
The following results, which can be interpreted as continuous counterparts of the quasi-Fejér monotonicity for sequences, will play an important role in the asymptotic analysis of the trajectories of the dynamical system investigated in this paper. For the proof of Lemma 2 we refer the reader to [2, Lemma 5.1]. Lemma 3 follows by using similar arguments as in [2, Lemma 5.2].
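For the reader's convenience, we give a hedged restatement of [2, Lemma 5.1] in its commonly used form (the precise hypotheses should be checked in [2]):

```latex
% Restatement of [2, Lemma 5.1] (Lemma 2 above); see [2] for the exact hypotheses.
\textbf{Lemma 2.} \textit{If $F : [0,+\infty) \to \mathbb{R}$ is locally absolutely
continuous and bounded from below, and there exists $G \in L^1([0,+\infty))$ such that}
\[
  \dot F(t) \le G(t) \quad \text{for almost every } t \in [0,+\infty),
\]
\textit{then $\lim_{t \to +\infty} F(t)$ exists and is a real number.}
```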
The next result which we recall here is the continuous version of the Opial Lemma.
Lemma 4 Let $S \subseteq H$ be a nonempty set and $x: [0,+\infty) \to H$ a given map. Assume that (i) for every $x^* \in S$, $\lim_{t\to+\infty} \|x(t) - x^*\|$ exists; (ii) every weak sequential cluster point of the map $x$ belongs to $S$. Then there exists $x_\infty \in S$ such that $x(t)$ converges weakly to $x_\infty$ as $t \to +\infty$.

A Levenberg-Marquardt dynamical system: existence and uniqueness of the trajectories
Consider the optimization problem $$\inf_{x \in \operatorname{argmin}\Psi}\{\Phi(x) + \Theta(x)\}, \tag{13}$$ where $H$ is a real Hilbert space and the following conditions hold: $(H_\Phi)$ $\Phi: H \to \mathbb{R}\cup\{+\infty\}$ is proper, convex and lower semicontinuous; $\Theta: H \to \mathbb{R}$ is convex and differentiable with Lipschitz continuous gradient; $\Psi: H \to \mathbb{R}$ is convex, nonnegative and differentiable with Lipschitz continuous gradient, with $\operatorname{argmin}\Psi = \Psi^{-1}(0) \neq \emptyset$ and $\operatorname{argmin}\Psi \cap \operatorname{dom}\Phi \neq \emptyset$. Here, $\operatorname{dom}\Phi = \{x \in H : \Phi(x) < +\infty\}$ denotes the effective domain of the function $\Phi$.
In connection with (13), we investigate the nonautonomous dynamical system \begin{equation*}\left\{ \begin{array}{l} v(t) \in \partial\Phi(x(t))\\ \lambda(t)\dot x(t) + \dot v(t) + v(t) + \nabla\Theta(x(t)) + \beta(t)\nabla\Psi(x(t)) = 0\\ x(0) = x_0,\ v(0) = v_0, \end{array}\right.\tag{14}\end{equation*} where $x_0, v_0 \in H$ and $$\partial\Phi(x) := \{v \in H : \Phi(y) \geq \Phi(x) + \langle v, y - x\rangle \ \forall y \in H\}$$ for $\Phi(x) \in \mathbb{R}$ and $\partial\Phi(x) := \emptyset$ for $\Phi(x) = +\infty$, denotes the convex subdifferential of $\Phi$. We denote by $\operatorname{dom}\partial\Phi = \{x \in H : \partial\Phi(x) \neq \emptyset\}$ the domain of the operator $\partial\Phi$. Furthermore, we make the following assumptions regarding the functions of time controlling the velocity and the penalty: $(H_\lambda^1)$ $\lambda: [0,+\infty) \to (0,+\infty)$ is locally absolutely continuous; $(H_\beta^1)$ $\beta: [0,+\infty) \to (0,+\infty)$ is locally absolutely continuous. Let us mention that, due to $(H_\lambda^1)$, $\dot\lambda(t)$ exists for almost every $t \geq 0$.
Remark 5 (a) In case $\Phi(x) = 0$ for all $x \in H$, we have $v(t) = 0$ for every $t \geq 0$ and the dynamical system (14) becomes $$\lambda(t)\dot x(t) + \nabla\Theta(x(t)) + \beta(t)\nabla\Psi(x(t)) = 0. \tag{15}$$ The asymptotic convergence of the trajectories generated by (15) has been investigated in [7] under the assumption $\lambda(t) = 1$ for all $t \geq 0$, for $\Theta$ and $\Psi$ nonsmooth functions, by replacing their gradients with convex subdifferentials and, consequently, by treating the differential equation as a monotone inclusion (see (9)).
(c) In case $\Theta(x) = 0$ and $\Psi(x) = \frac{1}{2}\|x\|^2$ for all $x \in H$ and $\lambda(t) = \lambda \in \mathbb{R}$ for every $t \in [0,+\infty)$, the Levenberg-Marquardt dynamical system (14) becomes \begin{equation*}\left\{ \begin{array}{l} v(t) \in \partial\Phi(x(t))\\ \lambda\dot x(t) + \dot v(t) + v(t) + \beta(t)x(t) = 0. \end{array}\right.\tag{17}\end{equation*} The dynamical system (17) has been considered in [1] in connection with the problem of finding the minimal norm element among the minima of $\Phi$, namely, $$\min\{\|x\| : x \in \operatorname{argmin}\Phi\} \tag{18}$$ (see also [6] and [12]). In contrast to (14), where the function describing the constraint set of (13) is penalized, in (17) the objective function of (18) is penalized via a vanishing penalization function (see [1]).
In the following we specify what we mean by a solution of the dynamical system (14).

Definition 2
We say that the pair $(x, v)$ is a strong global solution of (14) if the following properties are satisfied: (i) $x, v: [0,+\infty) \to H$ are locally absolutely continuous; (ii) $v(t) \in \partial\Phi(x(t))$ for every $t \in [0,+\infty)$; (iii) $\lambda(t)\dot x(t) + \dot v(t) + v(t) + \nabla\Theta(x(t)) + \beta(t)\nabla\Psi(x(t)) = 0$ for almost every $t \in [0,+\infty)$; (iv) $x(0) = x_0$ and $v(0) = v_0$.

Similarly to the techniques used in [13], we will show the existence and uniqueness of the trajectories generated by (14) by converting it to an equivalent first order differential equation with respect to $z(\cdot)$, defined by $$z(t) = x(t) + \mu(t)v(t), \tag{19}$$ where $$\mu(t) := \frac{1}{\lambda(t)}.$$ To this end we will make use of the resolvent and the Yosida approximation of the convex subdifferential of $\Phi$. For $\gamma > 0$, we denote by $$J_{\gamma\partial\Phi} := (\mathrm{Id} + \gamma\partial\Phi)^{-1}$$ the resolvent of $\gamma\partial\Phi$. Due to the maximal monotonicity of $\partial\Phi$, the resolvent $J_{\gamma\partial\Phi}: H \to H$ is a single-valued operator with full domain, which is, furthermore, nonexpansive, that is, 1-Lipschitz continuous. The Yosida regularization of $\partial\Phi$ is defined by $$(\partial\Phi)_\gamma := \frac{1}{\gamma}\left(\mathrm{Id} - J_{\gamma\partial\Phi}\right)$$ and it is $\gamma^{-1}$-Lipschitz continuous. For more properties of these operators we refer the reader to [14].

Assume now that $(x, v)$ is a strong global solution of (14). From (19) we have for every $t \in [0,+\infty)$ that $z(t) \in (\mathrm{Id} + \mu(t)\partial\Phi)(x(t))$; thus, from the definition of the resolvent we derive that relation (ii) in Definition 2 is equivalent to $$x(t) = J_{\mu(t)\partial\Phi}(z(t)) \quad \forall t \in [0,+\infty). \tag{20}$$ From (19), (20) and the definition of the Yosida regularization we obtain $$v(t) = (\partial\Phi)_{\mu(t)}(z(t)) \quad \forall t \in [0,+\infty). \tag{21}$$ Further, by differentiating (19) and taking into account (iii) in Definition 2 together with $\mu(t)\lambda(t) = 1$, we get for almost every $t \in [0,+\infty)$ $$\dot z(t) = (\dot\mu(t) - \mu(t))v(t) - \mu(t)\nabla\Theta(x(t)) - \mu(t)\beta(t)\nabla\Psi(x(t)). \tag{22}$$ Taking into account (20), (21) and (22) we conclude that $z$ defined in (19) is a strong global solution of the dynamical system \begin{equation*}\left\{ \begin{array}{l} \dot z(t) = (\dot\mu(t) - \mu(t))(\partial\Phi)_{\mu(t)}(z(t)) - \mu(t)\nabla\Theta(J_{\mu(t)\partial\Phi}z(t)) - \mu(t)\beta(t)\nabla\Psi(J_{\mu(t)\partial\Phi}z(t))\\ z(0) = x_0 + \mu(0)v_0. \end{array}\right.\tag{23}\end{equation*} Vice versa, if $z$ is a strong global solution of (23), then one obtains via (20) and (21) a strong global solution of (14).
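Both operators can be made concrete on a toy example. The following sketch assumes the illustrative choice $\Phi = |\cdot|$ on $H = \mathbb{R}$, for which the resolvent $J_{\gamma\partial\Phi}$ is the classical soft-thresholding map; it numerically checks the nonexpansiveness of the resolvent and the $\gamma^{-1}$-Lipschitz continuity of the Yosida regularization on a few arbitrary test values.

```python
# Resolvent and Yosida regularization for the illustrative choice
# Phi = |.| on H = R: here J_{gamma dPhi} is the soft-thresholding map.

def resolvent(x, gamma):
    """J_{gamma dPhi}(x) = argmin_y { |y| + (1/(2*gamma)) * (y - x)^2 }."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0

def yosida(x, gamma):
    """(dPhi)_gamma(x) = (x - J_{gamma dPhi}(x)) / gamma."""
    return (x - resolvent(x, gamma)) / gamma

pts = [-2.0, -0.3, 0.0, 0.7, 3.5]   # arbitrary test points
gamma = 0.5

# nonexpansiveness of the resolvent: |J x - J y| <= |x - y|
for x in pts:
    for y in pts:
        assert abs(resolvent(x, gamma) - resolvent(y, gamma)) <= abs(x - y) + 1e-12

# gamma^{-1}-Lipschitz continuity of the Yosida regularization
for x in pts:
    for y in pts:
        assert abs(yosida(x, gamma) - yosida(y, gamma)) <= abs(x - y) / gamma + 1e-12

# (dPhi)_gamma(x) is a subgradient of Phi at J_{gamma dPhi}(x):
# dPhi(u) = {sign(u)} for u != 0 and [-1, 1] for u = 0
assert yosida(3.5, gamma) == 1.0          # resolvent gives 3.0 > 0
assert -1.0 <= yosida(0.3, gamma) <= 1.0  # resolvent gives 0.0
```

For this choice of $\Phi$, the Yosida regularization is simply $x \mapsto x/\gamma$ clipped to $[-1, 1]$, which makes both Lipschitz properties visible by direct computation.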
Remark 6 By considering the time discretization $\dot z(t) \approx \frac{z_{n+1} - z_n}{h_n}$ of the above dynamical system and by taking $\mu$ constant, from (20) and (23) we obtain the iterative scheme $$z_{n+1} = z_n + h_n\left[J_{\mu\partial\Phi}(z_n) - z_n - \mu\nabla\Theta(J_{\mu\partial\Phi}z_n) - \mu\beta_n\nabla\Psi(J_{\mu\partial\Phi}z_n)\right] \quad \forall n \geq 0,$$ which for $h_n = 1$ yields the following algorithm: \begin{equation*}\left\{ \begin{array}{l} x_n = J_{\mu\partial\Phi}(z_n)\\ z_{n+1} = x_n - \mu\left(\nabla\Theta(x_n) + \beta_n\nabla\Psi(x_n)\right). \end{array}\right.\end{equation*} The convergence of the above algorithm has been investigated in [20] in the more general framework of monotone inclusion problems, under the use of variable step sizes $(\mu_n)_{n\geq 0}$ and by assuming a summability condition that can be seen as a discretized version of the one stated in (7). The case $\Theta(x) = 0$ for all $x \in H$ has been treated in [10] (see also the references therein).
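The discretized scheme with $h_n = 1$ and constant $\mu$ can be sketched numerically. The data below are illustrative and not from the paper: we take $H = \mathbb{R}^2$, $\Phi(x) = |x_1| + |x_2|$ (so the resolvent is soft-thresholding), $\Theta(x) = \frac12\|x - (3,0)\|^2$ and $\Psi(x) = \frac12(x_1 - x_2)^2$, so that $\operatorname{argmin}\Psi = \{x_1 = x_2\}$ and the constrained problem has the solution $(0.5, 0.5)$; the penalty parameter $\beta_n$ is nondecreasing but capped so that the explicit gradient step stays stable for the fixed $\mu$.

```python
# Sketch of the iterative scheme (h_n = 1, mu constant):
#   x_n     = J_{mu dPhi}(z_n)
#   z_{n+1} = x_n - mu * (grad Theta(x_n) + beta_n * grad Psi(x_n))
# Illustrative data (assumptions of this sketch): H = R^2,
#   Phi(x) = |x_1| + |x_2|, Theta(x) = 0.5*||x - (3,0)||^2,
#   Psi(x) = 0.5*(x_1 - x_2)^2, argmin Psi = {x_1 = x_2}.
# Exact solution of min_{x_1 = x_2} Phi + Theta: x* = (0.5, 0.5).

def soft(u, t):
    """Componentwise resolvent of t*d|.| (soft-thresholding)."""
    return max(abs(u) - t, 0.0) * (1.0 if u >= 0 else -1.0)

mu = 0.05
z = [0.0, 0.0]
for n in range(2000):
    x = [soft(z[0], mu), soft(z[1], mu)]        # x_n = J_{mu dPhi}(z_n)
    beta = min(1.0 + 0.05 * n, 15.0)            # nondecreasing, capped penalty
    g_theta = [x[0] - 3.0, x[1] - 0.0]          # grad Theta(x_n)
    d = x[0] - x[1]
    g_psi = [d, -d]                             # grad Psi(x_n)
    z = [x[0] - mu * (g_theta[0] + beta * g_psi[0]),
         x[1] - mu * (g_theta[1] + beta * g_psi[1])]
# x settles near the penalized minimizer, which for the final beta is
# close to the constrained solution (0.5, 0.5), and Psi(x) is small.
```

Since $\beta_n$ is capped here, the iterates converge to the minimizer of the penalized problem rather than of the constrained one; letting $\beta_n \to +\infty$ (with suitable step sizes, as in [20]) is what drives the penalization error to zero.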
Next we show that, given $x_0, v_0 \in H$ and by assuming $(H_\lambda^1)$ and $(H_\beta^1)$, there exists a unique strong global solution of the dynamical system (23). This will be done in the framework of the Cauchy-Lipschitz theorem for absolutely continuous trajectories (see for example [25, Proposition 6.2.1] and [28, Theorem 54]). To this end we will make use of the following Lipschitz property of the resolvent operator as a function of the step size, which is a consequence of classical results.

Proposition 7 Assume that $(H_\Phi)$ holds, $x \in H$ and $0 < \delta < +\infty$. Then the mapping $\tau \mapsto J_{\tau\partial\Phi}x$ is Lipschitz continuous on $[\delta, +\infty)$. More precisely, for any $\lambda_1, \lambda_2 \in [\delta, +\infty)$ the following inequality holds: $$\left\|J_{\lambda_1\partial\Phi}x - J_{\lambda_2\partial\Phi}x\right\| \leq |\lambda_1 - \lambda_2|\left\|(\partial\Phi)_\delta x\right\|.$$ Furthermore, the function $\lambda \mapsto \|(\partial\Phi)_\lambda x\|$ is nonincreasing.
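The Lipschitz dependence of the resolvent on the step size, namely $\|J_{\lambda_1\partial\Phi}x - J_{\lambda_2\partial\Phi}x\| \leq |\lambda_1 - \lambda_2|\,\|(\partial\Phi)_\delta x\|$ for $\lambda_1, \lambda_2 \geq \delta > 0$, can be checked numerically on a toy example; the sketch below again assumes the illustrative choice $\Phi = |\cdot|$ on $H = \mathbb{R}$, with arbitrary test values.

```python
# Numerical check of the Lipschitz dependence of the resolvent on the
# step size for Phi = |.| on H = R (an illustrative choice):
#   |J_{l1 dPhi}(x) - J_{l2 dPhi}(x)| <= |l1 - l2| * |(dPhi)_delta(x)|
# for all l1, l2 >= delta > 0.

def resolvent(x, tau):            # soft-thresholding
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def yosida(x, tau):
    return (x - resolvent(x, tau)) / tau

delta = 0.5
taus = [0.5, 0.9, 1.3, 2.7, 6.0]   # all >= delta
for x in [-4.0, -1.2, 0.0, 0.8, 2.0, 5.0]:
    bound_factor = abs(yosida(x, delta))   # |(dPhi)_delta(x)|
    for l1 in taus:
        for l2 in taus:
            lhs = abs(resolvent(x, l1) - resolvent(x, l2))
            assert lhs <= abs(l1 - l2) * bound_factor + 1e-12
```

The monotonicity statement is also visible here: $|(\partial\Phi)_\lambda x| = \min(|x|/\lambda, 1)$ is nonincreasing in $\lambda$.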
Notice that the dynamical system (23) can be written as $$\dot z(t) = f(t, z(t)), \quad z(0) = z_0, \tag{27}$$ where $z_0 = x_0 + \mu(0)v_0$ and $f: [0,+\infty) \times H \to H$ is defined by $$f(t, w) = (\dot\mu(t) - \mu(t))(\partial\Phi)_{\mu(t)}(w) - \mu(t)\nabla\Theta(J_{\mu(t)\partial\Phi}w) - \mu(t)\beta(t)\nabla\Psi(J_{\mu(t)\partial\Phi}w). \tag{28}$$ In the following we denote by $L_{\nabla\Theta}$ and $L_{\nabla\Psi}$ the Lipschitz constants of $\nabla\Theta$ and $\nabla\Psi$, respectively.
(a) Notice that for every $t \geq 0$ and every $w_1, w_2 \in H$ we have $$\|f(t, w_1) - f(t, w_2)\| \leq L(t)\|w_1 - w_2\|, \quad \text{where } L(t) := \frac{|\dot\mu(t) - \mu(t)|}{\mu(t)} + \mu(t)\left(L_{\nabla\Theta} + \beta(t)L_{\nabla\Psi}\right).$$ Indeed, this follows from (28), the Lipschitz properties of the operators involved (the resolvent is nonexpansive and the Yosida regularization is $\mu(t)^{-1}$-Lipschitz continuous) and the definition of $\mu(t)$. Further, notice that due to $(H_\lambda^1)$ and $(H_\beta^1)$, the function $L(\cdot)$, which is for every $t \geq 0$ equal to the Lipschitz constant of $f(t, \cdot)$, satisfies $L(\cdot) \in L^1([0,b])$ for every $0 < b < +\infty$.
(b) We fix $w \in H$ and $b > 0$. Due to $(H_\lambda^1)$, there exist $\lambda_{\min}, \lambda_{\max} > 0$ such that $$\lambda_{\min} \leq \lambda(t) \leq \lambda_{\max} \quad \forall t \in [0,b].$$ Relying on Proposition 7, we obtain for all $t \in [0,b]$ a chain of inequalities bounding $\|f(t, w)\|$ in terms of $|\dot\mu(t)|$, $\mu(t)$, $\beta(t)$ and quantities depending only on $w$, $b$, $\lambda_{\min}$ and $\lambda_{\max}$; this shows that $f(\cdot, w) \in L^1([0,b]; H)$, as a consequence of the properties of the functions $\mu$ and $\beta$ and of the fact that the map $\tau \mapsto \|(\partial\Phi)_\tau w\|$ is nonincreasing. In the light of the statements proven in (a) and (b), the existence and uniqueness of a strong global solution of the dynamical system (23) follow from [25, Proposition 6.2.1] (see also [28, Theorem 54]).
Finally, similarly to the proof of [13, Theorem 2.4(ii)], one can guarantee the existence and uniqueness of the trajectories generated by (14) by relying on the properties of the dynamical system (23) and on (20) and (21). The details are left to the reader.

Convergence of the trajectories and of the objective function values
In this section we prove weak convergence of the trajectory generated by the dynamical system (14) to an optimal solution of (13), as well as convergence of the objective function values of the latter along the trajectory. Some techniques from [7] and [13] will be useful in this context. To this end we will make the following supplementary assumptions: $(H_\lambda^2)$ $\dot\lambda(t) \leq 0$ for almost every $t \in [0,+\infty)$; $(H)$ for every $p \in \operatorname{ran} N_{\operatorname{argmin}\Psi}$, $$\int_0^{+\infty}\beta(t)\left[\Psi^*\left(\frac{p}{\beta(t)}\right) - \sigma_{\operatorname{argmin}\Psi}\left(\frac{p}{\beta(t)}\right)\right]dt < +\infty;$$ $(\widetilde H)$ $\partial(\Phi + \Theta + \delta_{\operatorname{argmin}\Psi}) = \partial\Phi + \nabla\Theta + N_{\operatorname{argmin}\Psi}$; where
• $N_{\operatorname{argmin}\Psi}$ is the normal cone to the set $\operatorname{argmin}\Psi$: $N_{\operatorname{argmin}\Psi}(x) = \{p \in H : \langle p, y - x\rangle \leq 0 \ \forall y \in \operatorname{argmin}\Psi\}$ for $x \in \operatorname{argmin}\Psi$ and $N_{\operatorname{argmin}\Psi}(x) = \emptyset$ for $x \notin \operatorname{argmin}\Psi$;
• $\operatorname{ran} N_{\operatorname{argmin}\Psi}$ is the range of the normal cone $N_{\operatorname{argmin}\Psi}$: $p \in \operatorname{ran} N_{\operatorname{argmin}\Psi}$ if and only if there exists $x \in \operatorname{argmin}\Psi$ such that $p \in N_{\operatorname{argmin}\Psi}(x)$;
• $\Psi^*: H \to \mathbb{R}\cup\{+\infty\}$ is the Fenchel conjugate of $\Psi$: $\Psi^*(p) = \sup_{x\in H}\{\langle p, x\rangle - \Psi(x)\}$ for all $p \in H$;
• $\sigma_{\operatorname{argmin}\Psi}: H \to \mathbb{R}\cup\{+\infty\}$ is the support function of the set $\operatorname{argmin}\Psi$: $\sigma_{\operatorname{argmin}\Psi}(p) = \sup_{x\in\operatorname{argmin}\Psi}\langle p, x\rangle$ for all $p \in H$;
• $\delta_{\operatorname{argmin}\Psi}: H \to \mathbb{R}\cup\{+\infty\}$ is the indicator function of $\operatorname{argmin}\Psi$: it takes the value $0$ on the set $\operatorname{argmin}\Psi$ and $+\infty$ otherwise.
Remark 8 (a) The condition $\dot\lambda(t) \leq 0$ for almost every $t \in [0,+\infty)$ has been used in [13] in the study of the asymptotic convergence of the dynamical system (1), when approaching the monotone inclusion problem (2).
(e) Due to the continuity of $\Theta$, the condition $(\widetilde H)$ is equivalent to $$\partial(\Phi + \delta_{\operatorname{argmin}\Psi}) = \partial\Phi + N_{\operatorname{argmin}\Psi},$$ which holds when $0 \in \operatorname{sqri}(\operatorname{dom}\Phi - \operatorname{argmin}\Psi)$, a condition that is fulfilled if $\Phi$ is continuous at a point in $\operatorname{dom}\Phi \cap \operatorname{argmin}\Psi$ or $\operatorname{int}(\operatorname{argmin}\Psi) \cap \operatorname{dom}\Phi \neq \emptyset$ (we invite the reader to consult also [14], [16] and [29] for other sufficient conditions for the above subdifferential sum formula). Here, for $M \subseteq H$ a convex set, $$\operatorname{sqri} M := \Big\{x \in M : \bigcup_{\lambda > 0}\lambda(M - x) \text{ is a closed linear subspace of } H\Big\}$$ denotes its strong quasi-relative interior. We always have $\operatorname{int} M \subseteq \operatorname{sqri} M$ (in general this inclusion may be strict). If $H$ is finite-dimensional, then $\operatorname{sqri} M$ coincides with $\operatorname{ri} M$, the relative interior of $M$, which is the interior of $M$ with respect to its affine hull. For instance, for $M = [0,1] \times \{0\} \subseteq \mathbb{R}^2$ we have $\operatorname{int} M = \emptyset$, while $\operatorname{sqri} M = \operatorname{ri} M = (0,1) \times \{0\}$.
The following differentiability result for the composition of convex functions with absolutely continuous trajectories, which is due to Brézis (see [24, Lemme 4, p. 73] and also [7, Lemma 3.2]), will play an important role in our analysis: if $f: H \to \mathbb{R}\cup\{+\infty\}$ is proper, convex and lower semicontinuous and $x$ is an absolutely continuous trajectory with $x(t) \in \operatorname{dom}\partial f$ (under the integrability assumptions made precise in [24]), then $t \mapsto f(x(t))$ is absolutely continuous and for almost every $t$ $$\frac{d}{dt}f(x(t)) = \langle\dot x(t), h\rangle \quad \forall h \in \partial f(x(t)).$$
Proof. To begin, we notice that from the definition of $S$ and $(\widetilde H)$ we have
For almost every $t \geq 0$ it holds, according to (14), $\frac{d}{dt}$
From (14) and the convexity of $\Phi$, $\Theta$ and $\Psi$ we have for every $t \in [0,+\infty)$ and
From (33) and the convexity of $\Phi$ and $\Theta$ we obtain for every $t \in [0,+\infty)$ and
Further, due to Lemma 10(ii) it holds for almost every $t \in [0,+\infty)$
On the other hand, using (32) and the Young-Fenchel inequality we obtain for every $t \in [0,+\infty)$
Finally, we obtain for almost every $t \in [0,+\infty)$ $\frac{d}{dt}$
where the first inequality follows from (41), the second one from (38) and (39), the next one from (35) and (36), and the last one from $(H_\lambda^2)$, (34), (40) and (37). (i) Since for almost every $t \in [0,+\infty)$ we have (see (42)) $\frac{d}{dt}$
the conclusion follows from Lemma 2, (H) and the fact that $g_z(t) \geq 0$ for every $t \geq 0$.

From (H) and Lemma 2 it follows that $\lim_{t\to+\infty} F(t)$ exists and is a real number. Hence
Further, since $\Psi \geq 0$, we obtain for every $t \in [0,+\infty)$
Similarly to (41), one can show that for every $t \in [0,+\infty)$
while from (42) we obtain that for almost every $t \in [0,+\infty)$ it holds $\frac{d}{dt}$
By using the same arguments as in the proof of (43), it follows that
Finally, from (43) and (44) we obtain (ii).
In order to proceed with the asymptotic analysis of the dynamical system (14), we make the following more involved assumptions on the functions $\lambda$ and $\beta$, respectively: $\lim_{t\to+\infty}\lambda(t) > 0$, and there exists $k > 0$ such that $\dot\beta(t) \leq k\beta(t)$ for almost every $t \in [0,+\infty)$.
(ii) $\lim_{t\to+\infty}\Psi(x(t)) = 0$.
Proof. Take an arbitrary $z \in S$ and (according to $(\widetilde H)$) $p \in N_{\operatorname{argmin}\Psi}(z)$ such that $-p - \nabla\Theta(z) \in \partial\Phi(z)$, and consider the functions $g_z, h_z$ defined in Lemma 11. (i) According to Lemma 11(i), since $g_z \geq 0$, we have that $t \mapsto \lambda(t)\|x(t) - z\|^2$ is bounded, which, combined with $\lim_{t\to+\infty}\lambda(t) > 0$, implies that $x$ is bounded.

Remark 14
One can notice that the condition $\dot\beta(t) \leq k\beta(t)$ has not been used in the proofs of Lemma 12 and Lemma 13.
Proof. Let $\gamma > 0$ be such that $\Phi + \Theta$ is $\gamma$-strongly convex. It is a well-known fact that in this case the optimization problem (13) has a unique optimal solution, which we denote by $z$. From $(\widetilde H)$ there exists $p \in N_{\operatorname{argmin}\Psi}(z)$ such that $-p - \nabla\Theta(z) \in \partial\Phi(z)$. Consider again the functions $g_z, h_z$ defined in Lemma 11.
Taking into account (H), by integration of the above inequality we obtain $$\int_0^{+\infty}\|x(t) - z\|^2\,dt < +\infty.$$
Since, according to the proof of Theorem 15, $\lim_{t\to+\infty}\|x(t) - z\|$ exists, we conclude that $\|x(t) - z\|$ converges to $0$ as $t \to +\infty$, and the proof is complete.

Remark 17
The results presented in this paper remain true even if the assumed growth condition on $\beta$ is satisfied only starting with some $t_0 \geq 0$, that is, if there exists $t_0 \geq 0$ such that $\dot\beta(t) \leq k\beta(t)$ for almost every $t \geq t_0$.