Stochastic Approximation Procedures for Lévy-Driven SDEs

We consider a continuous-time Robbins–Monro-type stochastic approximation procedure for a system described by a (multidimensional) stochastic differential equation driven by a general Lévy process, and we find sufficient conditions for its convergence in terms of Lyapunov functions. While the jump part of the noise may spoil convergence to the root of the drift in some cases, we show that by a suitable choice of noise coefficients we obtain convergence under hypotheses on the drift weaker than those used in the diffusion case, or convergence to a selected root in the case of multiple roots of the drift.


Introduction
Stochastic approximation algorithms concern convergence of sequences (Y_n) of random variables defined recursively, i.e., by a stochastic difference equation Y_{n+1} = Y_n + α_n U_n, where the U_n's represent noisy observations and the step sizes α_n > 0 satisfy suitable smallness assumptions. Originally proposed as a tool for finding a root of a function (the Robbins-Monro procedure) or its minimum (the Kiefer-Wolfowitz procedure), these algorithms have found various applications in optimization and machine learning. See, e.g., the books [2-4, 7, 13, 14] for a thorough discussion of various aspects of stochastic approximation algorithms and their use. (Let us mention also [8, Chapter 8] for very recent applications to variational inequalities with random data.) Nevel'son and Khas'minskii developed a continuous-time approach to stochastic approximation, which in the case of the Robbins-Monro-type procedure leads to a stochastic differential equation driven by a Wiener process W. Having advanced tools of stochastic analysis at their disposal, in particular the Lyapunov functions method from the stability theory of stochastic differential equations, they showed that sufficient conditions on the coefficients of (1) implying convergence of its solutions almost surely as t → ∞ to the (unique) root of the drift R may be found and proved in a straightforward and transparent way. See their book [21] for a systematic development of these ideas and, for example, the papers [6, 11, 22] and the book [12] for further results on continuous-time stochastic approximation. As discrete-time systems indicate, it is reasonable to consider more general driving noises in Eq. (1). Stochastic recursive procedures described by equations driven by semimartingales were considered by Mel'nikov [20] and Lazrieva et al. [15-18].
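The discrete-time recursion above can be sketched in a few lines. The sketch below is a minimal illustration, not taken from the paper: it uses a hypothetical noisy drift observation with R(x) = −(x − 2) and the classical step sizes α_n = 1/n, which satisfy Σα_n = ∞ and Σα_n² < ∞.

```python
import random

def robbins_monro(grad_obs, y0, n_steps, seed=0):
    """Discrete Robbins-Monro iteration Y_{n+1} = Y_n + a_n * U_n,
    where U_n is a noisy observation of R(Y_n) and a_n = 1/n."""
    rng = random.Random(seed)
    y = y0
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n               # step sizes: sum a_n = inf, sum a_n^2 < inf
        u_n = grad_obs(y, rng)      # noisy observation of the drift at y
        y = y + a_n * u_n
    return y

# Hypothetical drift R(x) = -(x - 2), whose unique root is x = 2,
# observed with additive uniform noise.
noisy_R = lambda x, rng: -(x - 2.0) + rng.uniform(-0.5, 0.5)
estimate = robbins_monro(noisy_R, y0=10.0, n_steps=200000)
```

Under the classical step-size conditions, the iterates converge almost surely to the root of the drift, here x = 2.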
Precise statements of their results are rather technical, but roughly speaking, the martingale part of the driving noise is a locally square integrable martingale or a random measure like a compensated Poisson random measure; proofs in these papers are based on results on convergence of semimartingales. A number of results concerning equations driven by square integrable processes with independent increments are stated in the book [12]; proofs, using Lyapunov functions techniques, are given, however, only in the discrete-time case.
In our paper, we shall study equations of the type (1), but driven by a general (multidimensional) Lévy process. Owing to the Lévy-Itô decomposition, such an equation may be written in the form (2), where N and Ñ are a non-compensated and a compensated Poisson random measure, respectively, and W is a Wiener process. Compared with the available results, we admit a non-compensated Poisson process as a driving noise, and essentially no hypotheses of the L²-integrability type are needed. Employing the Lyapunov functions approach, we generalize the results on convergence of the Robbins-Monro procedure from [21] to Eq. (2). It may look odd that the noise in Eq. (2) is not centered, since then the last term on the right-hand side influences the drift R (e.g., if c is changed) and hence also its roots. Indeed, it may happen that solutions of (2) converge to a given point which, however, is not a root of R. Nevertheless, a nontrivial class of coefficients H and K exists such that solutions to (2) converge to the root of R under conditions weaker than those used in the diffusion case (1), as no monotonicity-type hypotheses are needed. Moreover, in the case of a drift with multiple roots, by choosing K in a suitable way we may select a unique root of R to which the solutions will converge. Again, in the diffusion case the behavior is different. In Remark 4.1, we discuss the differences between the behavior of solutions to (1) and (2) in detail.
Let us note that the coefficients H and K in (2) are defined on the disjoint sets R_{≥0} × R^m × {|y| < c} and R_{≥0} × R^m × {|y| ≥ c}, respectively, so we may, and will, treat them as restrictions of a single function defined on R_{≥0} × R^m × R^n. This convention simplifies the form of the Itô formula.
In Sect. 2, we introduce the equation we deal with precisely, and we state the Itô formula in a form required in our proofs. In Sect. 3, the main results are proved: Theorem 3.1, giving general sufficient conditions for convergence of solutions of a stochastic differential equation driven by a Lévy process to a singleton, and its Corollary 3.1, concerning the Robbins-Monro procedure, i.e., the problem (2). In Sect. 4, we show how to apply these results to particular systems.
In the rest of this section, let us introduce some notation to be used in the sequel. We set R_{≥0} = [0, ∞) and R_{>0} = (0, ∞). By R^{m×n}, we denote the space of all m × n matrices with real entries. If A ∈ R^{m×n}, then A^T ∈ R^{n×m} is the transpose of the matrix A. Further, we denote by C_b(R^m; R^k) the set of all bounded continuous R^k-valued functions on R^m, and by ‖·‖_∞ its norm, i.e., ‖u‖_∞ = sup_{R^m} |u|. Let C²(R^m) be the space of all continuous real-valued functions on R^m having two continuous derivatives, and let the first and second Fréchet derivatives of V ∈ C²(R^m) be denoted by DV and D²V, respectively.

Preliminaries
Let m, n ∈ N and suppose that Borel coefficients and a Borel probability measure μ on R^m are given. We consider Eq. (3) for some c ∈ R_{>0} and a pair (W, N), where N is a Poisson random measure, Ñ is its compensated counterpart, and W is a Wiener process independent of N. We define a solution of (3) as follows.
(iii) N is a Poisson random measure whose intensity is dt ν(dy) for some Lévy measure ν on R^n \ {0} and which is independent of W, (iv) Ñ = N − dt ν(dy), and (v) X is an R^m-valued (F_t)-progressively measurable càdlàg process such that the distribution of X_0 is μ and Eq. (3) holds. In paragraph (v) of Definition 2.1, it is supposed implicitly that all integrals appearing in (3) are well defined for all t ≥ 0.
Throughout the paper, we impose Assumption 2.1, consisting of conditions (4) and (5); in particular, the function appearing in (5) is required to be locally bounded on R_{≥0} × R^m. Now, let us introduce an operator L associated with Eq. (3) that will henceforth play a crucial role: for V ∈ V, we define L V by (7). Using hypotheses (4) and (5), we can check easily that the definition of L is correct; see the analogous considerations in the proof of Proposition 2.1.

Remark 2.1 (a) Assumption (4) can be omitted if we define L V as a function on the set {(t, x) ∈ R_{≥0} × R^m : the right-hand side of (7) makes sense}; this is a direct consequence of the integrability condition in part (v) of Definition 2.1. We adopted (4) only so that the formulation of our main results may be more straightforward. (b) On the other hand, (5) is important and cannot be dispensed with easily. In a companion paper [19], related results on stability of solutions to (3) are obtained under a weaker hypothesis, (8), involving an exponent p ∈ (0, 1). The same choice is possible in the present paper; however, under (8) we would have to restrict ourselves to a narrower class of Lyapunov functions than V, and the proofs become rather complicated while the gain is not very impressive: the final criterion for convergence of the Robbins-Monro procedure remains almost the same. That is why we opted for (5).
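For orientation, for jump-diffusion equations of the type (3), an operator of this kind standardly takes the following form. This is a sketch under the assumption that the drift, diffusion, small-jump, and large-jump coefficients of (3) are denoted b, σ, H, and K, respectively; note that the large-jump term carries no gradient correction precisely because it is driven by N rather than Ñ:

```latex
\begin{aligned}
\mathscr{L}V(t,x) ={}& \langle DV(x),\, b(t,x)\rangle
  + \tfrac12 \operatorname{Tr}\!\bigl(\sigma(t,x)\,\sigma(t,x)^{T} D^{2}V(x)\bigr) \\
&+ \int_{\{|y|<c\}} \bigl[\,V\bigl(x+H(t,x,y)\bigr) - V(x) - \langle DV(x),\, H(t,x,y)\rangle\,\bigr]\,\nu(\mathrm{d}y) \\
&+ \int_{\{|y|\ge c\}} \bigl[\,V\bigl(x+K(t,x,y)\bigr) - V(x)\,\bigr]\,\nu(\mathrm{d}y).
\end{aligned}
```

The small-jump integrand is of second order in H (by Taylor's theorem), while the large-jump integrand is controlled by condition (5).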
Using the operator L , we can state the Itô formula for smooth functions of solutions to (3) in a suitable form.

Proposition 2.1 Assume that V ∈ V and that X solves (3); then the formula (9) holds.
Now, adding and subtracting the term (11) on the right-hand side of (10), we obtain the formula (9), provided (11) is well defined for every t ≥ 0 P-almost surely. However, realizing that the relevant map is a smooth function on [0, 1] and invoking the boundedness of DV, we get the required bound for all x ∈ R^m and s ∈ R_{≥0}. Hence, the claim follows by (5), since the paths of X are locally bounded.
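The boundedness argument sketched in the proof amounts to the following mean value estimate, written here for the large-jump coefficient K; the map θ ↦ V(x + θK(s, x, y)) is the smooth function on [0, 1] referred to above:

```latex
\bigl|V\bigl(x + K(s,x,y)\bigr) - V(x)\bigr|
  = \biggl|\int_{0}^{1} \bigl\langle DV\bigl(x + \theta K(s,x,y)\bigr),\, K(s,x,y)\bigr\rangle\,\mathrm{d}\theta\biggr|
  \le \|DV\|_{\infty}\,\bigl|K(s,x,y)\bigr| .
```

Integrating against ν over {|y| ≥ c} and using (5) then gives the required local bound along the (locally bounded) paths of X.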

Main Results
In this section, we first state a criterion based on Lyapunov functions for a solution to (3) to converge to a given point of the state space R m . The following theorem and its corollary generalize results from [21] to equations driven by Lévy processes.

Theorem 3.1 Let Assumption 2.1 be satisfied, and let there exist a point x_0 ∈ R^m and a Lyapunov function V ∈ V satisfying the hypotheses (H1)-(H3), with either (12) or (13) in force. Then lim_{t→∞} X_t = x_0 P-almost surely for any solution X of (3).
Proof Let us first introduce the auxiliary function U.

Step 1: We establish convergence of V(X_t) as t → ∞. To this end, we first show that (U(t, X_t))_{t≥0} is a supermartingale. Define τ_n for n ∈ N. Obviously, the τ_n's are stopping times and τ_n → ∞ P-almost surely as n → ∞. By the product rule for semimartingales, we get (18). Hence, combining (9) and (18), we obtain (19) for any n ∈ N and t ∈ R_{≥0} (fixed but arbitrary). By the hypothesis (H3), we may estimate as in (20), since α and ϕ are nonnegative. Therefore, from (19) we get (21). We aim at showing that the right-hand side of (21) is a martingale for any n ∈ N. This having been established, we may apply the Fatou lemma and arrive at the corresponding bound for every t ∈ R_{≥0}, as V ∈ L¹(μ). Using the Fatou lemma for conditional expectations, we get in a completely analogous way that (U(t, X_t), t ∈ R_{≥0}) is a supermartingale; we skip the details.
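The product rule for semimartingales invoked here simplifies when one factor is deterministic, continuously differentiable, and of finite variation. Writing A for such a factor (a sketch; the concrete factor is the one appearing in the definition of U), the covariation term vanishes and

```latex
A(t)\,V(X_t) = A(0)\,V(X_0) + \int_{0}^{t} A(s)\,\mathrm{d}V(X_s) + \int_{0}^{t} V(X_{s})\,A'(s)\,\mathrm{d}s .
```

Substituting the Itô formula (9) for dV(X_s) then yields a semimartingale decomposition of the type (18).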
Hence, we now fix n ∈ N and treat the terms on the right-hand side of (21) separately. First, the integrand is bounded for all t ∈ R_{≥0} due to the definition of τ_n², so the stochastic integral is a martingale; next, proceeding as in the proof of Proposition 2.1 and invoking the definition of τ_n³, we get the analogous estimate. Finally, the last term is bounded for all t ∈ R_{≥0} owing to (5); therefore, by the same argument as before, it is again a martingale. Hence, the proof that (U(t, X_t)) is a supermartingale is complete. Since U(t, X_t) is plainly nonnegative and right-continuous, the martingale convergence theorem implies that there exists an integrable random variable U_∞ ∈ L¹(P) such that lim_{t→∞} U(t, X_t) = U_∞ P-a.s., whence the convergence (22) of V(X_t) follows P-almost surely.
Step 2: Now we show that (25) holds. Let ω ∈ Ω be such that the corresponding lower bound holds for some t_0 ∈ R_{≥0}, some ε > 0, and all t ≥ t_0. If (12) is satisfied, then clearly a δ > 0 may be found such that (24) holds. If (13) is satisfied, then note that by (22) we may assume that V(X_t(ω)) converges to a finite limit as t → ∞, so by the first part of (13) there exists a constant ζ = ζ(ω) bounding the relevant quantity. Hence, the second part of (13) implies that the estimate holds with some δ > 0 for all t ≥ t_0, that is, (24) again holds. Thus, we have (24) in either case. As ξ ≥ 1, we have by (20) the corresponding bound for all t ∈ R_{≥0} and n ∈ N. Using (19), together with the fact that the stochastic integrals in (19) are centered and U ≥ 0, we obtain an estimate valid for all t ∈ R_{≥0} and n ∈ N; thus, passing first to the limit n → ∞ and then t → ∞ and applying the monotone convergence theorem twice, we arrive at an estimate whose right-hand side is finite by (H2). We see that (25) holds true.
Step 3: It remains to show that lim_{t→∞} X_t = x_0 P-a.s.
Suppose that ω ∈ Ω is such that |X_{t_n}(ω) − x_0| ≥ ε for some ε > 0 and a sequence t_n ↗ ∞. By the hypothesis (H2) of Theorem 3.1, an η > 0 may be found for which the corresponding estimate holds for every n ∈ N. We shall show that then either (28) or (29) does not hold, where V_∞ is defined by (22). Indeed, (27) then yields a contradiction. However, we have already shown that both (28) and (29) hold for P-almost all ω ∈ Ω, which concludes the proof of Theorem 3.1.
Now we focus on a particular case of Eq. (3) corresponding to the continuous-time stochastic approximation procedure of Robbins-Monro type with a general Lévy noise. Recall that in this setting we are looking for a stochastic differential equation whose solutions converge to a root of the drift R for a class of noise coefficients as wide as possible. Namely, we consider Eq. (30) with Borel coefficients and a Borel probability measure μ on R^m. The driving noise (W, N) is the same as in (3). Since the function K is independent of time now, Assumption 2.1 takes the form of Assumption 3.1. Let us state a result which one obtains by applying Theorem 3.1 to (30).
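For intuition, the dynamics of an equation of the type (30) can be simulated with an Euler scheme. The sketch below is illustrative only; the concrete choices (scalar state, drift R(x) = −(x − 1) with unique root 1, unit diffusion, step process α(t) = 1/(1 + t), unit-rate ±1 jumps with centered coefficient K(x, y) = y) are assumptions, not taken from the paper.

```python
import math, random

def simulate_rm_levy(T=2000.0, dt=0.01, x0=10.0, seed=1):
    """Euler scheme for a scalar instance of an equation of type (30):
    dX = a(t) * [ R(X) dt + dW + y dN ],
    with R(x) = -(x - 1), a(t) = 1/(1 + t), and unit-rate Poisson jumps
    of size y = +-1 (centered, so the jump term does not shift the drift)."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    for _ in range(int(T / dt)):
        a = 1.0 / (1.0 + t)                  # step process alpha(t)
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment
        x += a * (-(x - 1.0) * dt + dw)
        if rng.random() < dt:                # one jump in [t, t+dt) w.p. ~ dt
            x += a * rng.choice((-1.0, 1.0)) # centered jump coefficient K = y
        t += dt
    return x

x_T = simulate_rm_levy()
```

With the centered jump coefficient, the trajectory settles near the root x = 1, in line with part (e) of Remark 4.1 below.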

Corollary 3.1 Let Assumption 3.1 be satisfied, and let there exist a point x_0 ∈ R^m and a Lyapunov function V ∈ V satisfying conditions (31)-(35). Let there exist a constant K_σ ∈ R_{≥0} and a function β such that (36) holds for all x ∈ R^m and t ∈ R_{≥0}. Then the convergence (37), that is, lim_{t→∞} X_t = x_0 P-almost surely, holds for any solution X of (30).
Proof To see that Corollary 3.1 follows immediately from Theorem 3.1, it suffices to check that the hypothesis (H3) is satisfied. However, the operator L associated with (30) takes an explicit form for any x ∈ R^m and t ∈ R_{>0}, in which the last term on the right-hand side is well defined owing to Assumption 3.1. The assumptions of Corollary 3.1 thus imply the required estimate on L V. Since (K_σ α² + 2β) ∈ L¹(R_{≥0}) ∩ C(R_{≥0}), the proof is complete.
Remark 3.1 If the function under consideration is continuous on R^m, then both (31) and (34) are satisfied.

Applications
Sufficient conditions for convergence of a solution X of (30) to a point are given in Corollary 3.1 in terms of a Lyapunov function V. Choosing a particular Lyapunov function, we get more readily applicable criteria in terms of the coefficients of (30). If K = 0, then V = |· − x_0|² is a standard choice; however, in the general case we must proceed in a different way, since we need a Lyapunov function belonging to the system V.

Example 4.1 Let x_0 ∈ R^m and let us define V as follows.
Obviously, the Fréchet derivatives of V are given by the formulas valid for all x ∈ R^m, and thus V ∈ V; furthermore, V(x) → +∞ as |x| → ∞.
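For concreteness, a typical Lyapunov function with the properties used in this example (bounded gradient, V(x) → +∞ as |x| → ∞, and compatibility with the inequality log y ≤ y − 1 invoked below) is V(x) = log(1 + |x − x_0|²); this particular formula is an assumption of this sketch, not quoted from the displayed definition. Its derivatives are

```latex
DV(x) = \frac{2\,(x - x_0)}{1 + |x - x_0|^{2}}, \qquad
D^{2}V(x) = \frac{2}{1 + |x - x_0|^{2}}\,I
  - \frac{4\,(x - x_0)(x - x_0)^{T}}{\bigl(1 + |x - x_0|^{2}\bigr)^{2}} ,
```

so that |DV(x)| ≤ 1 for all x, which is exactly the kind of boundedness required for membership in the class V.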
Let Assumption 3.1 be satisfied and suppose that the coefficients σ and K of (30) satisfy the linear growth condition: there exists a constant L ∈ R_{≥0} such that the bound (39) holds for all x ∈ R^m and t ≥ 0. Denote by k the function introduced above. Since the corresponding estimate holds for all x ∈ R^m, (34) is satisfied with the choice (41). The function ϕ defined by (41) surely satisfies (31) if k is continuous and (42) holds. If k is not continuous, it may be difficult to check (31), and a more feasible way may be to strengthen (42) by assuming that there exists an η > 0 for which the strengthened bound holds. In this case, we may set ϕ accordingly, obtaining a function that clearly satisfies (31). We claim that the other hypotheses of Corollary 3.1 (in the version of Remark 3.1) are also satisfied.
For any x ∈ R^m, we may compute L V using (39), and (35) follows. Finally, we verify that (36) holds with the choice β = 2α²L(1 + |x_0|²). Using the elementary inequality log(y) ≤ y − 1, valid for all y > 0, and the definition of V, we obtain the required bound for all t ∈ R_{≥0} and x ∈ R^m. Note also that Assumption 3.1 clearly follows from (39). Therefore, whenever α ∈ C(R_{≥0}, R_{>0}) obeys (33) and ((W, N), X) is a solution to (30), X converges almost surely to x_0 as t → ∞.
Remark 4.1 (a) Hence, if R is continuous (which is a rather natural assumption), we have R(x_0) = 0 (as is well known from the theory of monotone mappings; see, e.g., [5, Lemma 1] for a much more general result), and plainly x_0 is the unique root of R. If σ satisfies the linear growth condition and R is a continuous function such that (46) holds, then lim_{t→∞} X_t = x_0 P-almost surely for any solution of the corresponding diffusion equation. This is a classical result going back to [21].
(b) If the driving Lévy noise has a purely discontinuous component but there are no large jumps, that is, ν{|y| ≥ a} = 0 for some a ∈ (0, ∞), then the results are virtually the same as in the diffusion case. Indeed, if R is continuous, obeys (46), and σ and K have at most linear growth, then (47) holds for any solution of (49). Again, x_0 is the unique root of R. Related results, obtained by different methods, may be found in [15, 20].
(c) In the general case K ≠ 0 and ν{|y| ≥ c} > 0, the situation changes considerably.
This should not be surprising: the last term on the right-hand side of (30), that is, the process (50), is not centered in general, and compensating it would change the driving Lévy noise in (3). The following simple example illustrates this phenomenon. Define the coefficients R and K as above for some a, b ∈ R^m and matrices A, B ∈ R^{m×m} such that A + B is invertible and negative definite, and A(x_0 − a) ≠ 0, where we set x_0 = (A + B)^{−1}(Aa + Bb). We can assume for simplicity that ν{|y| ≥ c} = 1. Then the convergence estimate holds for some η > 0 and all x ≠ x_0; however, R(x_0) ≠ 0.
(d) Therefore, in the general case of (30), we must add the assumption R(x_0) = 0 if x_0 is to be a root of the drift. (In a simple scalar counterexample with (t, x) ∈ R_{≥0} × R, V = |·|², and α(t) = (1 + t)^{−1} for t ≥ 0, all assumptions of Corollary 3.1 are satisfied except the hypothesis (34); R is plainly globally Lipschitz continuous, having 0 as its only root; nevertheless, a simple direct calculation shows that X_t → ∞ P-a.s. as t → ∞.)
(e) If ∫_{|y|≥c} K(x, y) ν(dy) = 0 for all x ∈ R^m, then the process (50) is centered, and we see that any solution X to (30) converges to the unique root of R under the hypothesis that R is a continuous function satisfying (46) (and σ and K have at most linear growth). This result may be compared with the theorems stated in [12], where equations driven by centered square integrable processes with independent increments are dealt with. We do not need L²-integrability; on the other hand, sharper asymptotic results than mere almost sure convergence are established in [12], at the price of more restrictive assumptions on the noise coefficients and the cumulant process of the driving Lévy process.
(f) Finally, note that the hypotheses of Example 4.1 may be satisfied even if R has multiple roots. The coefficient K then "selects" the root of R to which a solution of (30) converges. This may happen only if a noncentered, non-compensated Poisson process is allowed as a driving noise.
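The drift-shifting computation in the example of part (c) can be checked numerically. The concrete matrices and vectors below are hypothetical choices consistent with the stated structure; in particular, R(x) = A(x − a) and a large-jump contribution B(x − b) are assumed forms, so that the effective drift is (A + B)(x − x_0) with x_0 = (A + B)^{−1}(Aa + Bb).

```python
import numpy as np

# Hypothetical concrete instance of the example in part (c):
# R(x) = A(x - a), large-jump term contributing B(x - b) to the drift,
# with A + B invertible and negative definite, as required.
A = np.array([[-2.0, 0.0], [0.0, -1.0]])
B = np.array([[-1.0, 0.0], [0.0, -3.0]])
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# The effective drift R(x) + B(x - b) = (A + B)(x - x0) vanishes at
# x0 = (A + B)^{-1}(A a + B b), the point the procedure converges to.
x0 = np.linalg.solve(A + B, A @ a + B @ b)

R = lambda x: A @ (x - a)
effective_drift = lambda x: R(x) + B @ (x - b)
```

Here the effective drift vanishes at x_0 even though R(x_0) ≠ 0, which is precisely the "fake root" phenomenon described above.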
As we have already indicated above, large jumps of the Lévy process virtually change the drift, and, consequently, it is possible that a solution to (30) no longer converges to some (or all) of its roots. Again, in the diffusion case or for Eq. (49), the situation is completely different; see, e.g., [21, Chapter 5]. For example, let m = 1 and let σ and K satisfy (39) and x · ∫_{|y|≥c} K(x, y) ν(dy) ≤ −2|x|² for all x ∈ R.
(g) It is possible to allow coefficients K depending on time, i.e., defined on R_{≥0} × R^m × R^n. If Eq. (49) is considered, that is, if there are no large jumps, this change results in a trivial modification of the assumptions. In the general case, however, the hypotheses become cumbersome, and thus we content ourselves with time-independent K's.

Conclusions
We extended a Lyapunov-functions-based approach to convergence of a continuous-time Robbins-Monro procedure of stochastic approximation from diffusion processes to systems defined by a stochastic differential equation driven by a general Lévy process. While for a driving noise with small jumps only, our results are essentially comparable with the available ones (albeit our proofs are different), if large jumps are allowed, we showed that new phenomena may occur: the large jumps may force the procedure to converge to a "fake" root of the drift; on the other hand, if the noise coefficient is properly chosen, we obtain convergence under hypotheses weaker than those of the standard theory.