Abstract
We consider a continuous-time Robbins–Monro-type stochastic approximation procedure for a system described by a (multidimensional) stochastic differential equation driven by a general Lévy process, and we find sufficient conditions for its convergence in terms of Lyapunov functions. While the jump part of the noise may spoil convergence to the root of the drift in some cases, we show that by a suitable choice of noise coefficients we obtain convergence under hypotheses on the drift weaker than those used in the diffusion case or convergence to a selected root in the case of multiple roots of the drift.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Stochastic approximation algorithms concern convergence of sequences \((Y_n)\) of random variables defined recursively, i.e., by a stochastic difference equation \(Y_{n+1} = Y_n + \alpha _n U_n\) where \(U_n\)’s represent noisy observations and the step sizes \(\alpha _n>0\) satisfy suitable smallness assumptions. Originally proposed as a tool for finding a root of a function (the Robbins–Monro procedure) or its minimum (the Kiefer–Wolfowitz procedure), these algorithms found various applications in optimization and machine learning. See, e.g., the books [2,3,4, 7, 13, 14] for a thorough discussion of various aspects of stochastic approximation algorithms and their use. (Let us mention also [8, Chapter 8] for very recent applications to variational inequalities with random data.)
Nevel’son and Khas’minskii developed a continuous-time approach to stochastic approximation, which in the case of the Robbins–Monro-type procedure leads to a stochastic differential equation
driven by a Wiener process W. Having advanced tools of stochastic analysis at their disposal—in particular the Lyapunov functions method from the stability theory of stochastic differential equations—they showed that sufficient conditions on coefficients of (1) implying convergence of its solutions almost surely as \(t\rightarrow \infty \) to the (unique) root of the drift R may be found and proved in a straightforward and transparent way. See their book [21] for a systematic development of these ideas and, for example, the papers [6, 11, 22] and the book [12] for further results on continuous-time stochastic approximation.
As discrete-time systems indicate, it is reasonable to consider more general driving noises in Eq. (1). Stochastic recursive procedures described by equations driven by semimartingales were considered by Mel’nikov [20] and Lazrieva et al. [15,16,17,18]. Precise statements of their results are rather technical, but roughly speaking, the martingale part of the driving noise is a locally square integrable martingale or a random measure like a compensated Poisson random measure; proofs in these papers are based on results on convergence of semimartingales. A number of results concerning equations driven by square integrable processes with independent increments are stated in the book [12]; proofs, using Lyapunov functions techniques, are given, however, only in the discrete-time case.
In our paper, we shall study equations of the type (1) but driven by a general (multidimensional) Lévy process. Owing to the Lévy–Itô decomposition, such an equation may be written as
where N and \({\tilde{N}}\) are an uncompensated and a compensated Poisson random measures, respectively, and W is a Wiener process. Compared with the available results, we admit a non-compensated Poisson process as a driving noise and essentially no hypotheses of the \(L^2\)-integrability type are needed. Employing the Lyapunov functions approach, we generalize results on convergence of the Robbins–Monro procedure from [21] to Eq. (2). It may look odd that the noise in Eq. (2) is not centered since then the last term on the right-hand side influences the drift R (e.g., if c is changed) and hence also its roots. Indeed, it may happen that solutions of (2) converge to a given point which, however, is not a root of R. Nevertheless, a nontrivial class of coefficients H and K exists such that solutions to (2) converge to the root of R under conditions weaker than those used in the diffusion case (1) as no monotonicity-type hypotheses are needed. Moreover, in the case of a drift with multiple roots, by choosing K in a suitable way we may select a unique root of R the solutions will converge to. Again, in the diffusion case the behavior is different. In Remark 4.1, we discuss the differences between behavior of solutions to (1) and (2) in detail.
Let us note that the coefficients H and K is (2) are defined on disjoint sets \({\mathbb {R}}_{\ge 0}\times {\mathbb {R}}^m\times \{|y|<c\}\) and \({\mathbb {R}}_{\ge 0}\times {\mathbb {R}}^m\times \{|y|\ge c\}\), respectively, so we may—and will—treat them as restrictions of a single function defined on \({\mathbb {R}}_{\ge 0}\times {\mathbb {R}}^m\times {\mathbb {R}}^n\). This convention simplifies the form of the Itô formula.
In the next section, we introduce the equation we deal with precisely and we state the Itô formula in a form required in our proofs. In Sect. 2, the main results are proved: Theorem 3.1 giving general sufficient conditions for convergence of solutions to a stochastic differential equation driven by a Lévy process to a singleton and its Corollary 3.1 concerning the Robbins–Monro procedure, i.e., the problem (2). In Sect. 3, we show how to apply these results to particular systems.
In the rest of this section, let us introduce some notations to be used in the sequel. We set \({\mathbb {R}}_{\ge 0} = [0,\infty )\) and \({\mathbb {R}}_{> 0} = (0,\infty )\). By \({\mathbb {R}}^{m\times n}\), we denote the space of all \(m\times n\) matrices with real entries. If \(A\in \mathbb R^{m\times n}\), then \(A^T\in {\mathbb {R}}^{n\times m}\) is the transpose of the matrix A. Further, we denote by \(\mathscr {C}_b(\mathbb {R}^m; {\mathbb {R}}^k)\) the set of all bounded continuous \({\mathbb {R}}^k\)-valued functions on \({\mathbb {R}}^m\), and by \(\Vert \cdot \Vert _\infty \) its norm, i.e., \(\Vert u\Vert _\infty = \sup _{{\mathbb {R}}^m}|u|\). Let \(\mathscr {C}^{2} (\mathbb {R}^m)\) be the space of all continuous real-valued functions on \({\mathbb {R}}^m\) having two continuous derivatives, and let the first and second Fréchet derivatives of \(V\in {\mathscr {C}}^2({\mathbb {R}}^m)\) be denoted by DV and \(D^2V\), respectively.
2 Preliminaries
Let \(m, n\in \mathbb {N}\) and suppose that Borel functions
and a Borel probability measure \(\mu \) on \(\mathbb {R}^m\) are given. We consider the equation
for some \(c\in {\mathbb {R}}_{>0}\) and a pair (W, N), where N is a Poisson random measure, \(\tilde{N}\) is its compensated counterpart, and W is a Wiener process independent of N, see, e.g., [1, Section 2.3.1]. More precisely, recalling that a Borel measure on \(\mathbb {R}^n \setminus \lbrace 0\rbrace \) is called a Lévy measure if
we define a solution of (3) as follows.
Definition 2.1
A triplet \(((\varOmega , \mathscr {F}, (\mathscr {F}_t)_{t\ge 0}, \mathbb {P}\,), (W, N), X)\) is called a solution to Eq. (3) provided
-
(i)
\((\varOmega , \mathscr {F}, (\mathscr {F}_t)_{t\ge 0}, \mathbb {P}\,)\) is a stochastic basis with a normal filtration \((\mathscr {F}_t)_{t\ge 0}\),
-
(ii)
W is an (\(\mathscr {F}_t\))-Wiener process with values in \(\mathbb {R}^n\),
-
(iii)
N is an (\(\mathscr {F}_t\))-Poisson random measure N on \(\mathbb {R}_{\ge 0}\times (\mathbb {R}^n{\setminus }\lbrace 0\rbrace )\) whose intensity is \(\textrm{d}t \nu (\textrm{d}y)\) for some Lévy measure \(\nu \) on \(\mathbb {R}^n{\setminus } \lbrace 0\rbrace \) and which is independent of W,
-
(iv)
\({\tilde{N}} = N - \textrm{d}t\nu (\textrm{d}y)\), and
-
(v)
X is an \(\mathbb {R}^m\)-valued (\(\mathscr {F}_t\))-progressively measurable càdlàg process such that the distribution of \(X_0\) is \(\mu \) and
$$\begin{aligned}{} & {} X_t=X_0+\int _0^t f(s, X_{s})\,\textrm{d}s+\int _0^tg(s, X_{s})\,\textrm{d}W_s \\{} & {} \quad \qquad +\int _0^t\int _ {\lbrace \left| y\right| < c\rbrace } {H(s, X_{s-}, y)}\,\tilde{N}(\textrm{d}s,\textrm{d}y)\\{} & {} \quad \qquad +\int _0^t\int _ {\lbrace \left| y\right| \ge c\rbrace } {H(s, X_{s-}, y)\,N(\textrm{d}s, \textrm{d}y)} \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$for all \(t\in \mathbb {R}_{\ge 0}\).
In paragraph (v) of Definition 2.1, it is supposed implicitly that all integrals are well defined, that is,
for all \(t\ge 0\).
Throughout the paper, we impose the following assumption:
Assumption 2.1
We shall assume that
and the function
is locally bounded on \(\mathbb {R}_{\ge 0}\times \mathbb {R}^m\).
Now, let us set
and introduce an operator \({\mathscr {L}}\) associated with Eq. (3) that will henceforth play a crucial role. For \(V\in \mathscr {V}\), we define
Using hypotheses (4) and (5), we can check easily that the definition of \({\mathscr {L}}\) is correct, see analogous considerations in the proof of Proposition 2.1.
Remark 2.1
-
(a)
Assumption (4) can be omitted if we define \(\mathscr {L}V\) as a function on the set \(\{(t,x)\in {\mathbb {R}}_{\ge 0}\times \mathbb {R}^m; \; \text {the right-hand side of } (7) \text { makes sense}\}\). It is a direct consequence of the integrability condition in part (v) of Definition 2.1. We only adopted (4) so that the formulation of our main results may be more straightforward.
-
(b)
On the other hand, (5) is important and cannot be dispensed with easily. In a companion paper [19], related results on stability of solutions to (3) are obtained under a weaker hypothesis that
$$\begin{aligned} (t, x)\longmapsto \int _ {\lbrace \left| y\right| \ge c\rbrace } |H(t, x, y)|^p\,\nu (\textrm{d}y) \quad \text {is locally bounded on }\mathbb {R}_{\ge 0}\times \mathbb {R}^m \end{aligned}$$(8)for some \(p\in (0,1)\). The same choice is possible in the present paper. Under (8), we have to restrict ourselves to a narrower class of Lyapunov functions than \(\mathscr {V}\), proofs become rather complicated while the gain is not very impressive: the final criterion for convergence of the Robbins–Monro procedure remains almost the same. That is why we opted for (5).
Using the operator \({\mathscr {L}}\), we can state the Itô formula for smooth functions of solutions to (3) in a suitable form.
Proposition 2.1
Assume that \(V\in \mathscr {V}\) and X solves (3), then
Proof
By [1, Theorem 4.4.7], we have
Now adding and substracting
to the right-hand side of (10) we obtain the formula (9) provided (11) is well defined for every \(t\ge 0\) \(\mathbb {P}\,\)-almost surely. However, realizing that \(\theta \longmapsto V(x+\theta H(s,x,y))\) is a smooth function on [0, 1] and invoking boundedness of DV, we get
for all \(x\in \mathbb {R}^m\) and \(s\in \mathbb {R}_{\ge 0}\). Hence,
follows by (5) since the paths of X are locally bounded. \(\square \)
3 Main Results
In this section, we first state a criterion based on Lyapunov functions for a solution to (3) to converge to a given point of the state space \({\mathbb {R}}^m\). The following theorem and its corollary generalize results from [21] to equations driven by Lévy processes.
Theorem 3.1
Let Assumption 2.1 be satisfied and let there exist \(x_0\in \mathbb {R}^m\), a measurable function \(\varphi :\mathbb {R}^m\longrightarrow \mathbb {R}_{\ge 0}\), a function \(V\in {\mathscr {V}}\), and measurable functions \(\alpha , \gamma :\mathbb {R}_{\ge 0}\longrightarrow \mathbb {R}_{>0}\) such that
-
(H1)
either
$$\begin{aligned} \inf _{\left| x-x_0\right| \ge \varepsilon }{\varphi (x)}>0 \quad \text {for all }\varepsilon >0 \end{aligned}$$(12)or
$$\begin{aligned} \lim _{|x|\rightarrow \infty } V(x) = + \infty \quad \text {and} \quad \inf _{\varrho \ge |x-x_0|\ge \varepsilon } \varphi (x)>0 \quad \text {for all }\varrho>\varepsilon >0, \end{aligned}$$(13) -
(H2)
\(V(x_0)=0\), \(V\in L^1(\mu )\) and
$$\begin{aligned} \inf _{\left| x-x_0\right| \ge \varepsilon }{V(x)}>0 \end{aligned}$$(14)for any \(\varepsilon >0\),
-
(H3)
\(\alpha \in L_{\textrm{loc}}^1(\mathbb {R}_{\ge 0})\setminus L^1(\mathbb {R}_{\ge 0})\), \(\gamma \in L^1(\mathbb {R}_{\ge 0})\cap \mathscr {C}(\mathbb {R}_{\ge 0})\) and
$$\begin{aligned} \mathscr {L}V(t,x)\le -\alpha (t)\varphi (x)+\gamma (t)(1+V(x)) \end{aligned}$$(15)for all \(t\ge 0\) and \(x\in \mathbb {R}^m\).
Then, any solution \((\varOmega , \mathscr {F}, (\mathscr {F}_t), (W, N), X)\) to (3) satisfies
Proof
Let us set
and
- Step 1:
-
We establish convergence of \(V(X_t)\) as \(t\rightarrow \infty \). To this end, we first show that \(\left( U(t, X_t)\right) _{t\ge 0}\) is a supermartingale. Define
for \(n\in \mathbb {N}\). Obviously, \(\tau _n\)’s are stopping times and \(\tau _n\rightarrow \infty \) \(\mathbb {P}\,\)-almost surely as \(n\rightarrow \infty \).
By the product rule for semimartingales, we get
Hence, combining (9) and (18), we obtain for any \(n\in \mathbb {N}\) and \(t\in \mathbb {R}_{\ge 0}\) (fixed but arbitrary)
By the hypothesis (H3), we may estimate
as \(\alpha \) and \(\varphi \) are nonnegative. Therefore, from (19) we get
We aim at showing that the right-hand side of (21) is a martingale for any \(n\in {\mathbb {N}}\). This having been established we find that
so we may apply the Fatou lemma and arrive at
for every \(t\in \mathbb {R}_{\ge 0}\), as \(V\in L^1(\mu )\). Using the Fatou lemma for conditional expectations, we get in a completely analogous way that \((U(t, X_t),\,t\in \mathbb {R}_{\ge 0})\) is a supermartingale, we skip the details.
Hence, now we fix \(n\in \mathbb {N}\) and we shall proceed with the terms on the right-hand side of (21) separately.
First, since \(DV\in \mathscr {C}_b(\mathbb {R}^m;{\mathbb {R}}^m)\) by assumption we get
for all \(t\in \mathbb {R}_{\ge 0}\) due to the definition of \(\tau ^2_n\), so the stochastic integral
is a martingale.
Similarly, the compensated integral
is a martingale, since proceeding as in the proof of Proposition 2.1 and invoking the definition of \(\tau ^3_n\) we get
for every \(t\in \mathbb {R}_{\ge 0}\).
Finally,
for all \(t\in \mathbb {R}_{\ge 0}\) owing to (5). Therefore, by the same argument as in [9, Lemma II.3.1] (see the proof of formula (3.8) on page 62 therein) or by modifying slightly the definition of \(\tau _n\)’s and using [10, Theorem II.1.8] we have that
is again a martingale.
Hence, the proof that \((U(t, X_t))\) is a supermartingale is completed. Since \(U(t,X_t)\) is plainly nonnegative and right-continuous, the martingale convergence theorem implies that there exists an integrable random variable \(U_\infty \in L^1(\mathbb {P}\,)\) such that \(\lim _{t\rightarrow \infty } U(t,X_t) = U_\infty \) \(\mathbb {P}\,\)-a.s., whence it follows that
\(\mathbb {P}\,\)-almost surely.
- Step 2:
-
Now we show that
$$\begin{aligned} \liminf _{t\rightarrow \infty } \bigl | X_t-x_0 \bigr | = 0 \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$(23)
Let \(\omega \in \varOmega \) be such that
for some \(t_0\in {\mathbb {R}}_{\ge 0}\) and \(\varepsilon >0\) and all \(t\ge t_0\). If (12) is satisfied, then clearly a \(\delta >0\) may be found such that
If (13) is satisfied, then note that by (22) we may assume that \(V(X_t(\omega ))\) converges to a finite limit as \(t\rightarrow \infty \), so by the first part of (13) there exists a constant \(\zeta = \zeta (\omega )\) such that
Hence, the second part of (13) implies that
for some \(\delta >0\) and all \(t\ge t_0\), that is, (24) again holds. Thus, we have
because \(\alpha \in L_{\textrm{loc}}^1(\mathbb {R}_{\ge 0}) {\setminus } L^1(\mathbb {R}_{\ge 0})\). Therefore, (23) is established provided we show that
As \(\xi \ge 1\), we have
for all \(t\in \mathbb {R}_{\ge 0}\) and \(n\in \mathbb {N}\) by (20). Using (19) together with the fact that the stochastic integrals in (19) are centered and \(U\ge 0\), we obtain
for all \(t\in \mathbb {R}_{\ge 0}\) and \(n\in {\mathbb {N}}\), thus passing first \(n\rightarrow \infty \) and then \(t\rightarrow \infty \) and applying the monotone convergence theorem twice, we find the estimate
the right-hand side of which is finite by (H2). We see that (25) holds true.
- Step 3:
-
It remains to show that
$$\begin{aligned} \lim _{t\rightarrow \infty } X_t = x_0 \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$(26)
Suppose that \(\omega \in \varOmega \) is such that
for some \(\varepsilon >0\) and a sequence \(t_n\nearrow \infty \). By the hypothesis (H2) of Theorem 3.1, an \(\eta >0\) may be found for which
for every \(n\in \mathbb {N}\). We shall show that then either
or
does not hold, where \(V_\infty \) is defined by (22). Indeed, (27) together with (28) imply that \(V_\infty (\omega )\ge \eta \). On the other hand, if (29) is satisfied, then there exists a sequence \(r_n\nearrow \infty \) such that
hence, again by (28) and (H2),
which is a contradiction. However, we have already shown that both (28) and (29) hold for \(\mathbb {P}\,\)-almost all \(\omega \in \varOmega \), which concludes the proof of Theorem 3.1. \(\square \)
Now we focus on a particular case of Eq. (3) corresponding to the continuous-time stochastic approximation procedure of Robbins–Monro type with a general Lévy noise. Recall that in this setting we are looking for a stochastic differential equation such that its solutions converge to a root of the drift R for a class of noise coefficients as wide as possible. Namely, we consider the equation
with Borel coefficients
and a Borel probability measure \(\mu \) on \(\mathbb {R}^m\). The driving noise (W, N) is the same as in (3). Since the function K is independent of time now, Assumption 2.1 takes the following form:
Assumption 3.1
We shall assume that
and the function
is locally bounded on \({\mathbb {R}}^m\).
Let us state a result which one obtains applying Theorem 3.1 to (30).
Corollary 3.1
Let Assumption 3.1 be satisfied. Let there exist \(x_0\in {\mathbb {R}}^m\), a function \(V\in \mathscr {V}\cap L^1(\mu )\) with \(V(x_0)=0\) and a measurable function \(\varphi :\mathbb {R}^m\longrightarrow \mathbb {R}_{\ge 0}\) such that
and
Assume further that \(\alpha \in \mathscr {C}(\mathbb {R}_{\ge 0},\mathbb {R}_{>0})\) satisfies
Let there exist a constant \(K_\sigma \in \mathbb {R}_{\ge 0}\) and a function \(\beta \in \mathscr {C}(\mathbb {R}_{\ge 0})\cap L^1(\mathbb {R}_{\ge 0})\) such that
and
for all \(x\in \mathbb {R}^m\) and \(t\in \mathbb {R}_{\ge 0}\).
If \((\varOmega ,\mathscr {F}, (\mathscr {F}_t), (W, N), X)\) is a solution to (30), then
Proof
To see that Corollary 3.1 follows immediately from Theorem 3.1, it suffices to check that the hypothesis (H3) is satisfied. However, the operator \({\mathscr {L}}\) associated with (30) takes the form
for any \(x\in \mathbb {R}^m\) and \(t\in {\mathbb {R}}_{>0}\); the last term on the right-hand side is well defined owing to Assumption 3.1. The assumptions of Corollary 3.1 thus imply that
Since \((K_\sigma \alpha ^2 + 2\beta )\in L^1({\mathbb {R}}_{\ge 0})\cap {\mathscr {C}}({\mathbb {R}}_{\ge 0})\), the proof is completed. \(\square \)
Remark 3.1
-
(a)
As in Theorem 3.1, we may replace (31) and (32) with
$$\begin{aligned} \inf _{|x-x_0|\ge \varepsilon }\bigl (V(x)\wedge \varphi (x)\bigr )> 0 \quad \text {for any }\varepsilon > 0. \end{aligned}$$(38) -
(b)
If the function
$$\begin{aligned} x\longmapsto \Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x, y)\,\nu (\textrm{d}y), DV(x) \Bigr \rangle \end{aligned}$$is continuous on \({\mathbb {R}}^m\) and
$$\begin{aligned} \Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x, y)\,\nu (\textrm{d}y), DV(x)\Bigr \rangle <0 \quad \text {for }x\ne x_0 \end{aligned}$$we may set
$$\begin{aligned} \varphi (x) = - \Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x, y)\,\nu (\textrm{d}y), DV(x)\Bigr \rangle , \quad x\in {\mathbb {R}}^m, \end{aligned}$$
If \(H=0\) and \(K=0\), then Theorem 3.1 and Corollary 3.1 correspond essentially to [21], Theorems 3.8.1 and 4.4.1, respectively.
4 Applications
Sufficient conditions for convergence of a solution X of (30) to a point are given in Corollary 3.1 in terms of a Lyapunov function V. Choosing a particular Lyapunov function, we get more applicable criteria in terms of the coefficients of (30). If \(K=0\), then \(V = \vert \cdot -x_0\vert ^2\) is a standard choice; however, in the general case, we must proceed in a different way since we need a Lyapunov function belonging to the system \({\mathscr {V}}\).
Example 4.1
Let \(x_0\in \mathbb {R}^m\) and let us set
Obviously, the Fréchet derivatives of V are given by
for all \(x\in {\mathbb {R}}^m\) and thus \(V\in {\mathscr {V}}\), furthermore, \(V(x)\rightarrow +\infty \) as \(|x|\rightarrow \infty \).
Let Assumption 3.1 be satisfied and suppose that the coefficients \(\sigma \) and K of (30) satisfy the linear growth condition: there exists a constant \(L\in \mathbb {R}_{\ge 0}\) such that
for all \(x\in {\mathbb {R}}^m\) and \(t\ge 0\). Denote by \({\mathfrak {k}}\) the function
Since
for all \(x\in {\mathbb {R}}^m\), (34) is satisfied with the choice
The function \(\varphi \) defined by (41) surely satisfies (31) if \({\mathfrak {k}}\) is continuous and
If \({\mathfrak {k}}\) is not continuous, it may be difficult to check (31) and a more feasible way may be to strengthen (42) assuming that there exists \(\eta >0\) such that
In this case, we may set
obtaining a function that clearly satisfies (31). We claim that the other hypotheses of Corollary 3.1 (in the version of Remark 3.1) are also satisfied.
For any \(x\in \mathbb {R}^m\), we may compute using (39)
and (35) follows. Finally, we verify that (36) holds with the choice \(\beta = 2\alpha ^2\,L(1+|x_0|^2)\). Using that \(\log (y)\le y-1\) for all \(y>0\) plainly and the definition of V, we obtain
for all \(t\in \mathbb {R}_{\ge 0}\) and \(x\in {\mathbb {R}}^m\). Note also that Assumption 3.1 clearly follows from (39).
Therefore, whenever \(\alpha \in \mathscr {C}(\mathbb {R}_{\ge 0}, \mathbb {R}_{> 0})\) obeys (33) and ((W, N), X) is a solution to (30), then X converges almost surely to \(x_0\) as \(t\rightarrow \infty \).
Remark 4.1
It should be stressed that under the hypotheses of Example 4.1 the point \(x_0\in {\mathbb {R}}^m\) the solution of (30) converges to need not be a root of the drift R; therefore, a priori it might be misleading to speak about a Robbins–Monro stochastic approximation procedure. Let us discuss this problem more carefully: Our main positive results are illustrated in paragraphs (d) and (f), while (c) contains a counterexample. In (a), (b) and (e), particular cases related to hitherto available results are treated.
-
(a)
Assume that \(K=0\). Then, (42) reduces to
$$\begin{aligned} \langle R(x), x-x_0\rangle < 0 \quad \text {for all }x\ne x_0. \end{aligned}$$(46)Hence, if R is continuous (which is a rather natural assumption) we have \(R(x_0) = 0\) (as it is well known from the theory of monotone mappings, see, e.g., [5, Lemma 1] for a much more general result) and plainly \(x_0\) is the unique root of R. If \(\sigma \) satisfies the linear growth condition and R is a continuous function such that (46) holds, then
$$\begin{aligned} \lim _{t\rightarrow \infty } X_t = x_0 \quad \mathbb {P}\,\text {-almost surely} \end{aligned}$$(47)for any solution of the equation
$$\begin{aligned} \textrm{d}X_t = \alpha (t)\Bigl ( R(X_t)\,\textrm{d}t + \sigma (t,X_t) \,\textrm{d}W_t\Bigr ), \quad X_0\sim \mu . \end{aligned}$$(48)This is a classical result going back to [21].
-
(b)
If the driving Lévy noise has a purely discontinuous component, but there are no large jumps, that is, \(\nu \{|x|\ge a\} = 0\) for some \(a\in (0,\infty )\), then the results are virtually the same as in the diffusion case. Indeed, if R is continuous, obeys (46), and \(\sigma \) and K have at most linear growth, then (47) holds for any solution of
$$\begin{aligned}{} & {} \,\textrm{d}X_t = \alpha (t)\Bigl ( R(X_t)\,\textrm{d}t + \sigma (t,X_t) \,\textrm{d}W_t \nonumber \\{} & {} \qquad \qquad + \int _{\{|y|<a\}} K(X_{t-},y){\tilde{N}}(\textrm{d}t,\textrm{d}y)\Bigr ), \quad X_0\sim \mu . \end{aligned}$$(49)Again, \(x_0\) is the unique root of R. Related results, obtained by different methods, may be found in [15, 20].
-
(c)
In the general case \(K\ne 0\) and \(\nu \{|y|\ge c\}>0\), the situation changes considerably. This should not be surprising: the last term on the right-hand side of (30), that is, the process
$$\begin{aligned} \int ^\cdot _0 \int _{\{|y|\ge c\}} K(X_{t-},y)\, N(\textrm{d}t, \textrm{d}y) \end{aligned}$$(50)is not centered in general. Moreover, if we would like to keep the driving Lévy noise in (3) but to use a representation with a different c it results in a change of the drift (and, a fortiori, of the roots of the drift). Hence, Corollary 3.1 need not be applicable to the Robbins–Monro procedure, as it implies convergence to a point \(x_0\) such that \(R(x_0)\ne 0\). Indeed, if in the setting of Example 4.1 the function \({\mathfrak {k}}\) is continuous and satisfies (42), then we only know that
$$\begin{aligned} R(x_0) + \int _{\{|y|\ge c\}} K(x_0,y)\,\nu (\textrm{d}y) = 0 \end{aligned}$$The following simple example illustrates this phenomenon. Define the coefficients R and K by
$$\begin{aligned} R:x\longmapsto A(x-a), \quad K:(x,y)\longmapsto B(x-b) \end{aligned}$$for some \(a,b\in {\mathbb {R}}^m\) and matrices \(A,B\in {\mathbb {R}}^{m\times m}\) such that \(A+B\) is invertible and negative definite, and \(A(x_0-a)\ne 0\) where we set \(x_0 = (A+B)^{-1} (Aa+Bb)\). We can assume for simplicity that \(\nu \{|y|\ge c\}=1\). Then,
$$\begin{aligned} {\mathfrak {k}}(x)&= \Bigl \langle A(x-a) + \int _{\{|y|\ge c\}} B(x-b) \,\nu (\textrm{d}y),x-x_0\Bigr \rangle \\&= \bigl \langle (A+B)x - (Aa+Bb), x-x_0 \bigr \rangle \\&= \bigl \langle (A+B)(x-x_0),x-x_0\bigr \rangle \\&\le -\eta |x-x_0|^2 \end{aligned}$$for some \(\eta >0\) and all \(x\ne x_0\), however, \(R(x_0)\ne 0\).
-
(d)
Therefore, in the general case of (30) we must add the assumption \(R(x_0) = 0\) if Corollary 3.1 is to be applied to stochastic approximation; for Eqs. (48) and (49) this is redundant. On the other hand, by choosing K in an appropriate way we may obtain (47) under rather mild hypotheses on R. Let us assume that \(R(x_0) = 0\) and R is Lipschitz continuous, denote by \({\text {Lip}}(R)\) its Lipschitz constant. If K satisfies, still in the setting of Example 4.1,
$$\begin{aligned} \Bigl \langle \int _{\{|y|\ge c\}} K(x,y)\,\nu (\textrm{d}y), x-x_0 \Bigr \rangle \le -({\text {Lip}}(R)+1)|x-x_0|^2 \quad \text {for all }x\in \mathbb R^m, \end{aligned}$$then Corollary 3.1 is applicable. In the diffusion case (48), the mere Lipschitz continuity of R need not be sufficient for the convergence of the stochastic approximation procedure. (Indeed, consider (48) with the choice \(m=n=1\), \(R(x) = \sigma (t,x) = x\) for \((t,x)\in \mathbb R_{\ge 0}\times {\mathbb {R}}\), \(V = |\cdot |^2\), and \(\alpha (t) = (1+t)^{-1}\) for \(t\ge 0\), then all assumptions of Corollary 3.1 are satisfied except the hypothesis (34), R is plainly globally Lipschitz continuous having 0 as its only root, nevertheless, a simple direct calculation shows that \(X_t\rightarrow \infty \) \({\mathbb {P}}\)-a.s. as \(t\rightarrow \infty \).)
-
(e)
If
$$\begin{aligned} \int _{\{|y|\ge c\}} K(x,y)\,\nu (\textrm{d}y) = 0 \quad \text {for all } x\in {\mathbb {R}}^m \end{aligned}$$then the process (50) is centered and we see that any solution X to (30) converges to the unique root of R under the hypothesis that R is a continuous function satisfying (46) (and \(\sigma \) and K has at most linear growth). This result may be compared with theorems stated in [12] where equations driven by centered square integrable processes with independent increments are dealt with. We do not need \(L^2\)-integrability, on the other hand sharper asymptotic results than mere convergence almost surely are established in [12] at the price of more restrictive assumptions on noise coefficients and the cumulant process of the driving Lévy process.
-
(f)
Finally, note that the hypotheses of Example 4.1 may be satisfied even if R has multiple roots. The coefficient K then “selects” a root of R which a solution to (30) converges to. This may happen only if a noncentered non-compensated Poisson process is allowed as a driving noise. As we have already indicated above, large jumps of the Lévy process virtually change the drift and, consequently, it is possible that a solution to (30) no longer converges to some (or all) of its roots. Again, in the diffusion case or for Eq. (49) the situation is completely different, see, e.g., [21, Chapter 5]. For example, let \(m=1\) and let \(\sigma \) and K satisfy (39) and
$$\begin{aligned} x\cdot \int _{\{|y|\ge c\}} K(x,y)\,\nu (\textrm{d}y) \le -2|x|^2 \quad \text {for all }x\in {\mathbb {R}}. \end{aligned}$$Then, any solution to
$$\begin{aligned}{} & {} \textrm{d}X_t=\alpha (t)\Big ( \sin X_{t} \,\textrm{d}t+\sigma (t,X_{t}) \,\textrm{d}W_t + \int _ {\lbrace \left| y\right| < c\rbrace } K(X_{t-}, y)\,\tilde{N}(\textrm{d}t, \textrm{d}y) \\{} & {} \qquad + \int _ {\lbrace \left| y\right| \ge c\rbrace } K(X_{t-}, y)\,N(\textrm{d}t, \textrm{d}y)\Big ), \quad t\ge 0 \\{} & {} \quad X_0 \sim \mu , \end{aligned}$$satisfies
$$\begin{aligned} \lim _{t\rightarrow \infty } X_t = 0 \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$ -
(g)
It is possible to allow coefficients K depending on time, i.e., defined on \({\mathbb {R}}_{\ge 0}\times {\mathbb {R}}^m\times {\mathbb {R}}^n\). If Eq. (49) is considered, that is, there are no large jumps, this change results in a trivial modification of the assumptions. In the general case, however, the hypotheses become cumbersome and thus we content ourselves with time-independent K’s.
5 Conclusions
We extended a Lyapunov-functions-based approach to convergence of a continuous-time Robbins–Monro procedure of stochastic approximation from diffusion processes to systems defined by a stochastic differential equation driven by a general Lévy process. While for a driving noise with small jumps only our results are essentially comparable with available results (albeit our proofs are different), if large jumps are allowed we showed that new phenomena may occur: the large jumps may force the procedure to converge to a “fake“ root of the drift, on the other hand, if the noise coefficient is properly chosen, we obtain convergence under hypothesis weaker than those of the standard theory.
References
Applebaum, D.: Lévy Processes and Stochastic Calculus, 2nd edn. Cambridge University Press, Cambridge (2009)
Benveniste, A., Métivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Springer, Berlin (1990)
Bhatnagar, S., Prasad, H. L., Prashanth, L. A.: Lecture notes in control information science. In: Stochastic Recursive Algorithms for Optimization, vol. 434. Springer, London (2013)
Borkar, V.S.: Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, Cambridge (2008)
Browder, F.E.: Nonlinear elliptic boundary value problems. Bull. Am. Math. Soc. 69, 862–874 (1963)
Chen, H.-F.: Continuous-time stochastic approximation: convergence and asymptotic efficiency. Stoch. Stoch. Rep. 51, 217–239 (1994)
Chen, H.-F.: Stochastic Approximation and its Applications. Kluwer, Dordrecht (2002)
Gwinner, J., Jadamba, B., Khan, A.A., Raciti, F.: Uncertainty Quantification in Variational Inequalities. Theory, Numerics, and Applications. CRC Press, Boca Raton (2022)
Ikeda, N., Watanabe, S.: Stochastic Differential Equations and Diffusion Processes. North-Holland, Amsterdam (1981)
Jacod, J., Shiryaev, A.N.: Limit Theorems for Stochastic Processes, 2nd edn. Springer, Berlin (2003)
Komarov, S.V., Krasulina, T.P.: One-sided convergence of continuous processes of stochastic approximation. J. Math. Sci. 93, 379–384 (1999)
Korostelev, A.P.: Stochastic Recurrent Procedures. Local properties (Russian). Nauka, Moscow (1984)
Kushner, H.J., Clark, D.S.: Stochastic Approximation Methods for Constrained and Unconstrained systems. Springer, New York (1978)
Kushner, H.J., Yin, G.G.: Stochastic Approximation and Recursive Algorithms and Applications, 2nd edn. Springer, New York (2003)
Lazrieva, N., Sharia, T., Toronjadze, T.: The Robbins-Monro type stochastic differential equations. I. Convergence of solutions. Stoch. Stoch. Rep. 61, 67–87 (1997)
Lazrieva, N., Sharia, T., Toronjadze, T.: The Robbins-Monro type stochastic differential equations. II. Asymptotic behaviour of solutions. Stoch. Stoch. Rep. 75, 153–180 (2003)
Lazrieva, N., Sharia, T., Toronjadze, T.: Semimartingale stochastic approximation procedure and recursive estimation. J. Math. Sci. 153, 211–261 (2008)
Lazrieva, N., Toronjadze, T.: The Robbins-Monro type stochastic differential equations. III. Polyak’s averaging. Stochastics 82, 165–188 (2010)
Maslowski, B., Týbl, O.: Invariant measures and boundedness in the mean for stochastic equations driven by Lévy noise. Stoch. Dyn. 22, 2240019 (2022)
Mel’nikov, A. V.: Stochastic approximation procedures for semimartingales (Russian). In: Statistics and Control of Random Processes, pp. 147–256. Nauka (Russian) (1989)
Nevel’son, M.B., Khas’minskiĭ, R.Z.: Stochastic approximation and recurrent estimation (Russian). Nauka, Moscow (1972) (English translation: Nevel’son, M.B., Has’minskiĭ, R.Z.: Stochastic approximation and recursive estimation. American Mathematical Society, Providence (1973))
Pflug, G.: Stetige stochastische approximation. Metrika 26, 139–150 (1979)
Acknowledgements
This research was supported in part by Czech Science Foundation (GA ČR) Grant No. 19-07140 S. Moreover, the second author was also supported by the Grant schemes at Charles University, Project No. CZ.02.2.690.00.019 073/0016935. Thanks are due to the referee for useful comments.
Funding
Open access publishing supported by the National Technical Library in Prague.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Negash G. Medhin.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Seidler, J., Týbl, O. Stochastic Approximation Procedures for Lévy-Driven SDEs. J Optim Theory Appl 197, 817–837 (2023). https://doi.org/10.1007/s10957-023-02198-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10957-023-02198-0
Keywords
- Stochastic approximation algorithms
- Robbins–Monro procedure
- Lévy-driven stochastic differential equations