1 Introduction

Stochastic approximation algorithms concern convergence of sequences \((Y_n)\) of random variables defined recursively, i.e., by a stochastic difference equation \(Y_{n+1} = Y_n + \alpha _n U_n\), where the \(U_n\)’s represent noisy observations and the step sizes \(\alpha _n>0\) satisfy suitable smallness assumptions. Originally proposed as a tool for finding a root of a function (the Robbins–Monro procedure) or its minimum (the Kiefer–Wolfowitz procedure), these algorithms have found numerous applications in optimization and machine learning. See, e.g., the books [2,3,4, 7, 13, 14] for a thorough discussion of various aspects of stochastic approximation algorithms and their use. (Let us mention also [8, Chapter 8] for very recent applications to variational inequalities with random data.)
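For orientation, here is a minimal sketch of the discrete procedure in Python; the target function, the noise, and the step sizes below are our own illustrative choices, not taken from the cited literature.

```python
import numpy as np

rng = np.random.default_rng(0)

def R(x):
    # function whose root x0 = 1 is sought; <R(x), x - x0> < 0 for x != x0
    return 1.0 - x

y = 5.0                        # arbitrary initial guess
for n in range(1, 200_000):
    alpha_n = 1.0 / n          # sum alpha_n = infinity, sum alpha_n^2 < infinity
    U_n = R(y) + rng.normal()  # noisy observation of R at the current iterate
    y += alpha_n * U_n         # Robbins-Monro update Y_{n+1} = Y_n + alpha_n U_n

print(y)                       # approximately 1.0
```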

Nevel’son and Khas’minskii developed a continuous-time approach to stochastic approximation, which in the case of the Robbins–Monro-type procedure leads to a stochastic differential equation

$$\begin{aligned} \textrm{d}Y_t = \alpha (t)\bigl (R(Y_t)\,\textrm{d}t + \sigma (t,Y_t) \,\textrm{d}W_t\bigr ) \end{aligned}$$
(1)

driven by a Wiener process W. Having advanced tools of stochastic analysis at their disposal—in particular the Lyapunov functions method from the stability theory of stochastic differential equations—they showed that sufficient conditions on the coefficients of (1) implying almost sure convergence of its solutions, as \(t\rightarrow \infty \), to the (unique) root of the drift R can be found and proved in a straightforward and transparent way. See their book [21] for a systematic development of these ideas and, for example, the papers [6, 11, 22] and the book [12] for further results on continuous-time stochastic approximation.

As discrete-time systems indicate, it is reasonable to consider more general driving noises in Eq. (1). Stochastic recursive procedures described by equations driven by semimartingales were considered by Mel’nikov [20] and Lazrieva et al. [15,16,17,18]. Precise statements of their results are rather technical, but roughly speaking, the martingale part of the driving noise is a locally square integrable martingale or a random measure like a compensated Poisson random measure; proofs in these papers are based on results on convergence of semimartingales. A number of results concerning equations driven by square integrable processes with independent increments are stated in the book [12]; proofs, using Lyapunov functions techniques, are given, however, only in the discrete-time case.

In our paper, we shall study equations of the type (1) but driven by a general (multidimensional) Lévy process. Owing to the Lévy–Itô decomposition, such an equation may be written as

$$\begin{aligned} \begin{aligned}&\textrm{d}X_t=\alpha (t)\Big ( R(X_{t})\,\textrm{d}t+\sigma (t,X_{t})\,\textrm{d}W_t + \int _ {\lbrace \left| y\right| < c\rbrace } H(X_{t-}, y)\,\tilde{N}(\textrm{d}t, \textrm{d}y) \\&\qquad \quad + \int _ {\lbrace \left| y\right| \ge c\rbrace } K(X_{t-}, y)\,N(\textrm{d}t, \textrm{d}y)\Big ), \end{aligned} \end{aligned}$$
(2)

where N and \({\tilde{N}}\) are an uncompensated and a compensated Poisson random measure, respectively, and W is a Wiener process. Compared with the available results, we admit a non-compensated Poisson process as a driving noise, and essentially no hypotheses of the \(L^2\)-integrability type are needed. Employing the Lyapunov functions approach, we generalize results on convergence of the Robbins–Monro procedure from [21] to Eq. (2). It may look odd that the noise in Eq. (2) is not centered, since then the last term on the right-hand side influences the drift R (e.g., if c is changed) and hence also its roots. Indeed, it may happen that solutions of (2) converge to a given point which, however, is not a root of R. Nevertheless, a nontrivial class of coefficients H and K exists such that solutions to (2) converge to the root of R under conditions weaker than those used in the diffusion case (1), as no monotonicity-type hypotheses are needed. Moreover, in the case of a drift with multiple roots, by choosing K in a suitable way we may select the root of R to which the solutions converge. Again, in the diffusion case the behavior is different. In Remark 4.1, we discuss the differences between the behavior of solutions to (1) and (2) in detail.

Let us note that the coefficients H and K in (2) are defined on disjoint sets \({\mathbb {R}}_{\ge 0}\times {\mathbb {R}}^m\times \{|y|<c\}\) and \({\mathbb {R}}_{\ge 0}\times {\mathbb {R}}^m\times \{|y|\ge c\}\), respectively, so we may—and will—treat them as restrictions of a single function defined on \({\mathbb {R}}_{\ge 0}\times {\mathbb {R}}^m\times {\mathbb {R}}^n\). This convention simplifies the form of the Itô formula.

In the next section, we introduce precisely the equation we deal with and state the Itô formula in a form required in our proofs. In Sect. 3, the main results are proved: Theorem 3.1, giving general sufficient conditions for convergence of solutions to a stochastic differential equation driven by a Lévy process to a singleton, and its Corollary 3.1, concerning the Robbins–Monro procedure, i.e., the problem (2). In Sect. 4, we show how to apply these results to particular systems.

In the rest of this section, let us introduce some notation to be used in the sequel. We set \({\mathbb {R}}_{\ge 0} = [0,\infty )\) and \({\mathbb {R}}_{> 0} = (0,\infty )\). By \({\mathbb {R}}^{m\times n}\), we denote the space of all \(m\times n\) matrices with real entries. If \(A\in \mathbb R^{m\times n}\), then \(A^T\in {\mathbb {R}}^{n\times m}\) is the transpose of the matrix A. Further, we denote by \(\mathscr {C}_b(\mathbb {R}^m; {\mathbb {R}}^k)\) the set of all bounded continuous \({\mathbb {R}}^k\)-valued functions on \({\mathbb {R}}^m\), and by \(\Vert \cdot \Vert _\infty \) its norm, i.e., \(\Vert u\Vert _\infty = \sup _{{\mathbb {R}}^m}|u|\). Let \(\mathscr {C}^{2} (\mathbb {R}^m)\) be the space of all continuous real-valued functions on \({\mathbb {R}}^m\) having two continuous derivatives, and let the first and second Fréchet derivatives of \(V\in {\mathscr {C}}^2({\mathbb {R}}^m)\) be denoted by DV and \(D^2V\), respectively.

2 Preliminaries

Let \(m, n\in \mathbb {N}\) and suppose that Borel functions

$$\begin{aligned} f:\mathbb {R}_{\ge 0}\times \mathbb {R}^m\longrightarrow \mathbb {R}^m, \ g:\mathbb {R}_{\ge 0}\times \mathbb {R}^m\longrightarrow \mathbb {R}^{m\times n}, \ H:\mathbb {R}_{\ge 0}\times \mathbb {R}^{m}\times {\mathbb {R}}^{n}\ \longrightarrow \mathbb {R}^m, \end{aligned}$$

and a Borel probability measure \(\mu \) on \(\mathbb {R}^m\) are given. We consider the equation

$$\begin{aligned} \begin{aligned}&\textrm{d}X_t=f(t, X_{t})\,\textrm{d}t +g(t, X_{t})\,\textrm{d}W_t + \int \limits _{\lbrace y\in \mathbb {R}^n;\; |y|<c\rbrace } {H(t, X_{t-}, y)}\,\tilde{N} (\textrm{d}t, \textrm{d}y) \\&\quad \qquad +\int \limits _{\{y\in {\mathbb {R}}^n;\; |y|\ge c\}} {H(t, X_{t-}, y)\, N(\textrm{d}t, \textrm{d}y)}, \qquad t\ge 0, \\&\qquad X_0\sim \mu , \end{aligned} \end{aligned}$$
(3)

for some \(c\in {\mathbb {R}}_{>0}\) and a pair (W, N), where N is a Poisson random measure, \(\tilde{N}\) is its compensated counterpart, and W is a Wiener process independent of N; see, e.g., [1, Section 2.3.1]. More precisely, recalling that a Borel measure \(\nu \) on \(\mathbb {R}^n \setminus \lbrace 0\rbrace \) is called a Lévy measure if

$$\begin{aligned} \int _{\mathbb {R}^n\setminus \lbrace 0\rbrace }\bigl (|y|^2\wedge 1\bigr )\,\nu (\textrm{d}y)<\infty \end{aligned}$$

we define a solution of (3) as follows.

Definition 2.1

A triplet \(((\varOmega , \mathscr {F}, (\mathscr {F}_t)_{t\ge 0}, \mathbb {P}\,), (W, N), X)\) is called a solution to Eq. (3) provided

  (i)

    \((\varOmega , \mathscr {F}, (\mathscr {F}_t)_{t\ge 0}, \mathbb {P}\,)\) is a stochastic basis with a normal filtration \((\mathscr {F}_t)_{t\ge 0}\),

  (ii)

    W is an (\(\mathscr {F}_t\))-Wiener process with values in \(\mathbb {R}^n\),

  (iii)

    N is an (\(\mathscr {F}_t\))-Poisson random measure on \(\mathbb {R}_{\ge 0}\times (\mathbb {R}^n{\setminus }\lbrace 0\rbrace )\) whose intensity is \(\textrm{d}t \nu (\textrm{d}y)\) for some Lévy measure \(\nu \) on \(\mathbb {R}^n{\setminus } \lbrace 0\rbrace \) and which is independent of W,

  (iv)

    \({\tilde{N}} = N - \textrm{d}t\nu (\textrm{d}y)\), and

  (v)

    X is an \(\mathbb {R}^m\)-valued (\(\mathscr {F}_t\))-progressively measurable càdlàg process such that the distribution of \(X_0\) is \(\mu \) and

    $$\begin{aligned} X_t&=X_0+\int _0^t f(s, X_{s})\,\textrm{d}s+\int _0^tg(s, X_{s})\,\textrm{d}W_s \\&\quad +\int _0^t\int _ {\lbrace \left| y\right| < c\rbrace } {H(s, X_{s-}, y)}\,\tilde{N}(\textrm{d}s,\textrm{d}y)\\&\quad +\int _0^t\int _ {\lbrace \left| y\right| \ge c\rbrace } {H(s, X_{s-}, y)\,N(\textrm{d}s, \textrm{d}y)} \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$

    for all \(t\in \mathbb {R}_{\ge 0}\).

In paragraph (v) of Definition 2.1, it is supposed implicitly that all integrals are well defined, that is,

$$\begin{aligned} \int ^t_0\Bigl \{ |f(s,X_s)|+|g(s,X_s)|^2 + \int _{\{|y|< c\}} |H(s,X_s,y)|^2\,\nu (\textrm{d}y)\Bigr \}\,\textrm{d}s < \infty \quad {\mathbb {P}}\text {-a.s.} \end{aligned}$$

for all \(t\ge 0\).

Throughout the paper, we impose the following assumption:

Assumption 2.1

We shall assume that

$$\begin{aligned} \int _ {\lbrace \left| y\right|< c\rbrace } {\left| H(t, x, y)\right| ^2\,\nu (\textrm{d}y)}<\infty \quad \text {for all }(t, x)\in \mathbb {R}_{\ge 0}\times \mathbb {R}^m \end{aligned}$$
(4)

and the function

$$\begin{aligned} (t, x)\longmapsto \int _ {\lbrace \left| y\right| \ge c\rbrace } {\left| H(t, x, y)\right| \,\nu (\textrm{d}y)} \end{aligned}$$
(5)

is locally bounded on \(\mathbb {R}_{\ge 0}\times \mathbb {R}^m\).

Now, let us set

$$\begin{aligned} \mathscr {V} = \bigl \{V\in \mathscr {C}^{2}(\mathbb {R}^m);\; DV\in {\mathscr {C}}_b ({\mathbb {R}}^m;{\mathbb {R}}^m), \ D^2V\in {\mathscr {C}}_b({\mathbb {R}}^m;{\mathbb {R}}^{m\times m}) \bigr \} \end{aligned}$$
(6)

and introduce an operator \({\mathscr {L}}\) associated with Eq. (3) that will henceforth play a crucial role. For \(V\in \mathscr {V}\), we define

$$\begin{aligned} \begin{aligned}&\mathscr {L}V:\mathbb {R}_{\ge 0} \times \mathbb {R}^m \longrightarrow {\mathbb {R}}, \\&\quad (t, x)\longmapsto \bigl \langle f(t, x), DV(x)\bigr \rangle + \frac{1}{2} {\text {Tr}}\left( g(t, x)^TD^2V(x)g(t, x)\right) \\&\qquad \quad + \int _{\mathbb {R}^n\setminus \lbrace 0\rbrace } \bigl [V(x+H(t, x,y))-V(x)\\&\qquad \quad -\textbf{1}_{ {\lbrace \left| y\right| < c\rbrace } }(y)\bigl \langle H(t, x,y), DV(x) \bigr \rangle \bigr ] \,\nu (\textrm{d}y). \end{aligned} \end{aligned}$$
(7)

Using hypotheses (4) and (5), one can easily check that \(\mathscr {L}V\) is well defined; see analogous considerations in the proof of Proposition 2.1.
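For a finite Lévy measure, the value \(\mathscr {L}V(t,x)\) may be approximated numerically directly from (7). The following one-dimensional Python sketch does so, using Monte Carlo integration for the jump term; all concrete coefficients and the choice \(\nu = \lambda \cdot N(0,1)\) are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative one-dimensional coefficients of Eq. (3); these choices are ours
f = lambda t, x: -x                  # drift
g = lambda t, x: 0.5                 # diffusion coefficient
H = lambda t, x, y: 0.1 * x * y      # jump coefficient
c = 1.0                              # threshold separating small and large jumps
lam = 2.0                            # total mass of the finite Levy measure
# nu(dy) = lam * rho(y) dy, rho being the standard normal density

def LV(V, dV, d2V, t, x, n_mc=100_000):
    """Monte Carlo approximation of formula (7) for a finite Levy measure."""
    y = rng.normal(size=n_mc)        # samples from rho
    jump = V(x + H(t, x, y)) - V(x) - (np.abs(y) < c) * H(t, x, y) * dV(x)
    return (f(t, x) * dV(x)
            + 0.5 * g(t, x) ** 2 * d2V(x)
            + lam * jump.mean())     # integral against nu = lam * rho

V   = lambda x: np.log(1.0 + x ** 2)               # Lyapunov function of Example 4.1
dV  = lambda x: 2.0 * x / (1.0 + x ** 2)
d2V = lambda x: 2.0 * (1.0 - x ** 2) / (1.0 + x ** 2) ** 2

print(LV(V, dV, d2V, t=0.0, x=1.5))  # negative here, so V decreases on average
```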

Remark 2.1

  (a)

    Assumption (4) can be omitted if we define \(\mathscr {L}V\) as a function on the set \(\{(t,x)\in {\mathbb {R}}_{\ge 0}\times \mathbb {R}^m; \; \text {the right-hand side of } (7) \text { makes sense}\}\); this is a direct consequence of the integrability condition in part (v) of Definition 2.1. We adopted (4) only to make the formulation of our main results more straightforward.

  (b)

    On the other hand, (5) is important and cannot be dispensed with easily. In a companion paper [19], related results on stability of solutions to (3) are obtained under a weaker hypothesis that

    $$\begin{aligned} (t, x)\longmapsto \int _ {\lbrace \left| y\right| \ge c\rbrace } |H(t, x, y)|^p\,\nu (\textrm{d}y) \quad \text {is locally bounded on }\mathbb {R}_{\ge 0}\times \mathbb {R}^m \end{aligned}$$
    (8)

    for some \(p\in (0,1)\). The same choice is possible in the present paper. Under (8), we have to restrict ourselves to a narrower class of Lyapunov functions than \(\mathscr {V}\), proofs become rather complicated while the gain is not very impressive: the final criterion for convergence of the Robbins–Monro procedure remains almost the same. That is why we opted for (5).

Using the operator \({\mathscr {L}}\), we can state the Itô formula for smooth functions of solutions to (3) in a suitable form.

Proposition 2.1

Let \(V\in \mathscr {V}\) and let X solve (3). Then

$$\begin{aligned} \begin{aligned} \textrm{d}V(X_t)=\mathscr {L}V(t,X_{t})\textrm{d}t&+ \bigl \langle g(t, X_{t})^T DV(X_{t}), \cdot \bigr \rangle \, \textrm{d}W_t \\&+ \int _ {\lbrace \left| y\right| < c\rbrace } \bigl [V(X_{t-}+H(t, X_{t-}, y))-V(X_{t-}) \bigr ]\,\tilde{N}(\textrm{d}t, \textrm{d}y) \\&+ \int _ {\lbrace \left| y\right| \ge c\rbrace } \bigl [V(X_{t-}+H(t, X_{t-}, y))-V(X_{t-}) \bigr ]\, N(\textrm{d}t, \textrm{d}y) \\&- \int _ {\lbrace \left| y\right| \ge c\rbrace } \bigl [V(X_{t}+H(t, X_{t}, y))-V(X_{t}) \bigr ]\,\nu (\textrm{d}y)\,\textrm{d}t. \end{aligned} \end{aligned}$$
(9)

Proof

By [1, Theorem 4.4.7], we have

$$\begin{aligned} \begin{aligned} \textrm{d}V(X_t)&= \Bigl (\bigl \langle f(t, X_{t}), DV(X_{t})\bigr \rangle + \frac{1}{2}{\text {Tr}}\left( g(t, X_{t})^TD^2V(X_{t})g(t, X_{t})\right) \Bigr )\,\textrm{d}t \\&\quad + \bigl \langle g(t, X_{t})^TDV(X_{t}),\cdot \bigr \rangle \,\textrm{d}W_t \\&\quad +\int _ {\lbrace \left| y\right|< c\rbrace } \bigl [V(X_{t-}+H(t, X_{t-}, y))-V(X_{t-}) \bigr ]\,\tilde{N}(\textrm{d}t, \textrm{d}y) \\&\quad + \int _ {\lbrace \left| y\right| < c\rbrace } \bigl [V(X_{t}+H(t, X_{t}, y))-V(X_{t}) -\bigl \langle DV(X_{t}), H(t, X_{t}, y)\bigr \rangle \bigr ]\,\nu (\textrm{d}y)\,\textrm{d}t \\&\quad +\int _ {\lbrace \left| y\right| \ge c\rbrace } \bigl [V(X_{t-}+H(t, X_{t-}, y))-V(X_{t-}) \bigr ]\, N(\textrm{d}t, \textrm{d}y). \end{aligned} \end{aligned}$$
(10)

Now, adding and subtracting

$$\begin{aligned} \int _0^t\int _ {\lbrace \left| y\right| \ge c\rbrace } \bigl [V(X_{s}+H(s, X_{s}, y))-V(X_{s})\bigr ] \,\nu (\textrm{d}y)\,\textrm{d}s, \end{aligned}$$
(11)

to the right-hand side of (10), we obtain the formula (9) provided (11) is well defined for every \(t\ge 0\) \(\mathbb {P}\,\)-almost surely. Indeed, realizing that \(\theta \longmapsto V(x+\theta H(s,x,y))\) is a smooth function on [0, 1] and invoking boundedness of DV, we get

$$\begin{aligned} \int _ {\lbrace \left| y\right| \ge c\rbrace }&\bigl | V(x+H(s, x, y))-V(x)\bigr \vert \,\nu (\textrm{d}y) \\&=\int _ {\lbrace \left| y\right| \ge c\rbrace } \bigg |\int _0^1 \bigl \langle DV(x+\theta H(s, x, y)), H(s, x, y) \bigr \rangle \,\textrm{d}\theta \biggr |\,\nu (\textrm{d}y) \\&\le \Vert DV\Vert _{\infty }\int _ {\lbrace \left| y\right| \ge c\rbrace } \bigl \vert H(s, x, y) \bigr \vert \,\nu (\textrm{d}y) \end{aligned}$$

for all \(x\in \mathbb {R}^m\) and \(s\in \mathbb {R}_{\ge 0}\). Hence,

$$\begin{aligned} \int _0^t\int _ {\lbrace \left| y\right| \ge c\rbrace } \bigl \vert V(X_{s}+H(s, X_{s}, y))-V(X_{s})\bigr \vert \,\nu (\textrm{d}y) \,\textrm{d}s < \infty \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$

follows by (5) since the paths of X are locally bounded. \(\square \)

3 Main Results

In this section, we first state a criterion based on Lyapunov functions for a solution to (3) to converge to a given point of the state space \({\mathbb {R}}^m\). The following theorem and its corollary generalize results from [21] to equations driven by Lévy processes.

Theorem 3.1

Let Assumption 2.1 be satisfied and let there exist \(x_0\in \mathbb {R}^m\), a measurable function \(\varphi :\mathbb {R}^m\longrightarrow \mathbb {R}_{\ge 0}\), a function \(V\in {\mathscr {V}}\), and measurable functions \(\alpha , \gamma :\mathbb {R}_{\ge 0}\longrightarrow \mathbb {R}_{>0}\) such that

  (H1)

    either

    $$\begin{aligned} \inf _{\left| x-x_0\right| \ge \varepsilon }{\varphi (x)}>0 \quad \text {for all }\varepsilon >0 \end{aligned}$$
    (12)

    or

    $$\begin{aligned} \lim _{|x|\rightarrow \infty } V(x) = + \infty \quad \text {and} \quad \inf _{\varrho \ge |x-x_0|\ge \varepsilon } \varphi (x)>0 \quad \text {for all }\varrho>\varepsilon >0, \end{aligned}$$
    (13)
  (H2)

    \(V(x_0)=0\), \(V\in L^1(\mu )\) and

    $$\begin{aligned} \inf _{\left| x-x_0\right| \ge \varepsilon }{V(x)}>0 \end{aligned}$$
    (14)

    for any \(\varepsilon >0\),

  (H3)

    \(\alpha \in L_{\textrm{loc}}^1(\mathbb {R}_{\ge 0})\setminus L^1(\mathbb {R}_{\ge 0})\), \(\gamma \in L^1(\mathbb {R}_{\ge 0})\cap \mathscr {C}(\mathbb {R}_{\ge 0})\) and

    $$\begin{aligned} \mathscr {L}V(t,x)\le -\alpha (t)\varphi (x)+\gamma (t)(1+V(x)) \end{aligned}$$
    (15)

    for all \(t\ge 0\) and \(x\in \mathbb {R}^m\).

Then, any solution \((\varOmega , \mathscr {F}, (\mathscr {F}_t), (W, N), X)\) to (3) satisfies

$$\begin{aligned} \lim _{t\rightarrow \infty }X_t=x_0 \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$
(16)

Proof

Let us set

$$\begin{aligned} \xi (t)=\exp {\left( \int _t^\infty \gamma (r)\,\textrm{d}r\right) }, \quad t\in \mathbb {R}_{\ge 0}, \end{aligned}$$

and

$$\begin{aligned} U(t,x)=\xi (t)(1+V(x))=\exp {\left( \int _t^\infty \gamma (r)\,\textrm{d}r\right) } \left( 1+V(x)\right) , \quad (t, x)\in \mathbb {R}_{\ge 0}\times \mathbb {R}^m. \end{aligned}$$
Step 1:

We establish convergence of \(V(X_t)\) as \(t\rightarrow \infty \). To this end, we first show that \(\left( U(t, X_t)\right) _{t\ge 0}\) is a supermartingale. Define

$$\begin{aligned} \begin{aligned} \tau _n^1&=\inf \lbrace t\ge 0: \left| X_t\right|>n\rbrace , \\ \tau _n^2&=\inf \Bigl \{ t\ge 0: \int _0^t{\left| g(s, X_{s})\right| ^2\,\textrm{d}s}>n\Bigr \}, \\ \tau _n^3&=\inf \Bigl \{ t\ge 0: \int _0^t\int _ {\lbrace \left| y\right| < c\rbrace } {\left| H(s, X_{s}, y)\right| ^2} \,\nu (\textrm{d}y)\,\textrm{d}s >n \Bigr \}, \\ \tau _n&= \tau ^1_n \wedge \tau ^2_n \wedge \tau ^3_n \end{aligned} \end{aligned}$$
(17)

for \(n\in \mathbb {N}\). Obviously, the \(\tau _n\)’s are stopping times and \(\tau _n\rightarrow \infty \) \(\mathbb {P}\,\)-almost surely as \(n\rightarrow \infty \).

By the product rule for semimartingales, we get

$$\begin{aligned} \textrm{d}U(t, X_t)=(1+V(X_t))\,\textrm{d}\xi (t)+\xi (t) \,\textrm{d}V(X_{t}), \quad t\in \mathbb {R}_{\ge 0}. \end{aligned}$$
(18)

Hence, combining (9) and (18), we obtain for any \(n\in \mathbb {N}\) and \(t\in \mathbb {R}_{\ge 0}\) (fixed but arbitrary)

$$\begin{aligned} \begin{aligned}&U(\tau _n\wedge t, X_{\tau _n\wedge t})-U(0, X_0) \\&\quad =\int _0^{\tau _n\wedge t}\bigl [(1+V(X_{s}))\xi ^\prime (s) +\xi (s)\mathscr {L}V(s,X_{s}) \bigr ]\,\textrm{d}s \\&\qquad +\int _0^{\tau _n\wedge t}\xi (s)\bigl \langle g(s, X_{s})^T DV(X_{s}),\cdot \bigr \rangle \,\textrm{d}W_s \\&\qquad +\int _0^{\tau _n\wedge t}\int _ {\lbrace \left| y\right| < c\rbrace } \xi (s)\bigl [V(X_{s-} +H(s, X_{s-}, y))-V(X_{s-})\bigr ]\,\tilde{N}(\textrm{d}s, \textrm{d}y) \\&\qquad +\int _0^{\tau _n\wedge t}\int _ {\lbrace \left| y\right| \ge c\rbrace } \xi (s)\bigl [V(X_{s-}+H(s, X_{s-}, y)) -V(X_{s-})\bigr ]\, N(\textrm{d}s, \textrm{d}y) \\&\qquad -\int _0^{\tau _n\wedge t}\int _ {\lbrace \left| y\right| \ge c\rbrace } \xi (s)\bigl [V(X_{s}+H(s, X_{s}, y)) -V(X_{s})\bigr ]\,\nu (\textrm{d}y)\,\textrm{d}s. \end{aligned} \end{aligned}$$
(19)

By the hypothesis (H3), we may estimate

$$\begin{aligned} \begin{aligned}&\int _0^{\tau _n\wedge t}\bigl [ (1+V(X_{s}))\xi '(s) + \xi (s)\mathscr {L}V(s,X_{s})\bigr ] \,\textrm{d}s \\&\quad =\int _0^{\tau _n\wedge t}\xi (s)\bigl \{-\gamma (s)(1+V(X_{s}))+ \mathscr {L}V(s,X_{s})\bigr \} \,\textrm{d}s \\&\quad \le - \int _0^{\tau _n\wedge t} \xi (s)\alpha (s)\varphi (X_{s}) \,\textrm{d}s \\&\quad \le 0 \end{aligned} \end{aligned}$$
(20)

as \(\alpha \) and \(\varphi \) are nonnegative. Therefore, from (19) we get

$$\begin{aligned} \begin{aligned}&U(\tau _n\wedge t,\, X_{\tau _n\wedge t})-U(0, X_0) \\&\quad \le \int _0^{\tau _n\wedge t}\xi (s)\bigl \langle g(s, X_{s})^TDV(X_{s}), \cdot \bigr \rangle \,\textrm{d}W_s \\&\qquad +\int _0^{\tau _n\wedge t}\int _ {\lbrace \left| y\right| < c\rbrace } \xi (s)\bigl [V(X_{s-}+H(s, X_{s-}, y)) -V(X_{s-})\bigr ]\,\tilde{N}(\textrm{d}s, \textrm{d}y) \\&\qquad +\int _0^{\tau _n\wedge t}\int _ {\lbrace \left| y\right| \ge c\rbrace } \xi (s)\bigl [ V(X_{s-}+H(s, X_{s-}, y)) -V(X_{s-})\bigr ]\, N(\textrm{d}s, \textrm{d}y) \\&\qquad -\int _0^{\tau _n\wedge t}\int _ {\lbrace \left| y\right| \ge c\rbrace } \xi (s)\bigl [ V(X_{s}+H(s, X_{s}, y)) -V(X_{s})\bigr ]\,\nu (\textrm{d}y)\,\textrm{d}s. \end{aligned} \end{aligned}$$
(21)

We aim at showing that the right-hand side of (21) is a martingale for any \(n\in {\mathbb {N}}\). Once this is established, we find that

$$\begin{aligned} \mathbb {E}\,\bigl [ U(t\wedge \tau _n,X_{t\wedge \tau _n})-U(0,X_0)\bigr ] \le 0, \end{aligned}$$

so we may apply the Fatou lemma and arrive at

$$\begin{aligned} \mathbb {E}\,U(t, X_t)&= \mathbb {E}\,\lim _{n\rightarrow \infty }U(t\wedge \tau _n, X_{t\wedge \tau _n})\le \liminf _{n\rightarrow \infty } \mathbb {E}\,U(t\wedge \tau _n, X_{t\wedge \tau _n}) \\&\le \mathbb {E}\,U(0,X_0) \\&= e^{\Vert \gamma \Vert _{L^1}}\bigl (1+\mathbb {E}\,V(X_0)\bigr )<\infty \end{aligned}$$

for every \(t\in \mathbb {R}_{\ge 0}\), as \(V\in L^1(\mu )\). Using the Fatou lemma for conditional expectations, we get in a completely analogous way that \((U(t, X_t),\,t\in \mathbb {R}_{\ge 0})\) is a supermartingale; we omit the details.

Hence, we now fix \(n\in \mathbb {N}\) and treat the terms on the right-hand side of (21) separately.

First, since \(DV\in \mathscr {C}_b(\mathbb {R}^m;{\mathbb {R}}^m)\) by assumption, we get

$$\begin{aligned} \mathbb {E}\,\int _0^{t\wedge \tau _n}\bigl |\xi (s)\bigl \langle g(s, X_{s})^TDV(X_{s}),\cdot \bigr \rangle \bigr |^2 \,\textrm{d}s \le e^{2\Vert \gamma \Vert _{L^1}}\Vert DV\Vert _{\infty }^2 nt<\infty \end{aligned}$$

for all \(t\in \mathbb {R}_{\ge 0}\) due to the definition of \(\tau ^2_n\), so the stochastic integral

$$\begin{aligned} \int _0^{\cdot \wedge \tau _n}\xi (s)\bigl \langle g(s, X_{s})^T DV(X_{s}), \cdot \bigr \rangle \,\textrm{d}W_s \end{aligned}$$

is a martingale.

Similarly, the compensated integral

$$\begin{aligned} \int _0^{\cdot \wedge \tau _n}\int _ {\lbrace \left| y\right| < c\rbrace } \xi (s)\bigl (V(X_{s-} +H(s, X_{s-}, y))-V({X_{s-}})\bigr )\,\tilde{N}(\textrm{d}s, \textrm{d}y) \end{aligned}$$

is a martingale, since proceeding as in the proof of Proposition 2.1 and invoking the definition of \(\tau ^3_n\) we get

$$\begin{aligned}&\mathbb {E}\,\int _0^{t\wedge \tau _n}\int _ {\lbrace \left| y\right|< c\rbrace } \bigl |\xi (s) \bigl (V(X_{s}+H(s, X_{s}, y))-V(X_{s})\bigr )\bigr |^2\, \nu (\textrm{d}y)\,\textrm{d}s \\&\quad = \mathbb {E}\,\int _0^{t\wedge \tau _n}\int _ {\lbrace \left| y\right|< c\rbrace } \biggl |\int _0^1\xi (s) \bigl \langle DV(X_{s}+\theta H(s, X_{s}, y)), H(s, X_{s}, y) \bigr \rangle \,\textrm{d}\theta \biggr |^2\,\nu (\textrm{d}y)\, \textrm{d}s \\&\quad \le e^{2\Vert \gamma \Vert _{L^1}}\Vert DV\Vert _{\infty }^2 \mathbb {E}\,\int _0^{t\wedge \tau _n}\int _ {\lbrace \left| y\right|< c\rbrace } |H(s, X_{s}, y)|^2 \,\nu (\textrm{d}y)\, \textrm{d}s \\&\quad \le e^{2\Vert \gamma \Vert _{L^1}}\Vert DV\Vert _{\infty }^2 nt \\&\quad < \infty \end{aligned}$$

for every \(t\in \mathbb {R}_{\ge 0}\).

Finally,

$$\begin{aligned}&\mathbb {E}\,\int _0^{t\wedge \tau _n}\int _ {\lbrace \left| y\right| \ge c\rbrace } \bigl |\xi (s)\bigl ( V(X_{s}+H(s, X_{s}, y))-V(X_{s})\bigr )\bigr |\nu (\textrm{d}y)\,\textrm{d}s \\&\quad = \mathbb {E}\,\int _0^{t\wedge \tau _n}\int _ {\lbrace \left| y\right| \ge c\rbrace } \biggl |\int _0^1\xi (s) \bigl \langle DV(X_{s}+\theta H(s, X_{s}, y)), H(s, X_{s}, y) \bigr \rangle \,\textrm{d}\theta \biggr |\,\nu (\textrm{d}y)\,\textrm{d}s \\&\quad \le e^{\Vert \gamma \Vert _{L^1}}\Vert DV\Vert _{\infty } \mathbb {E}\,\int _0^{t\wedge \tau _n}\int _ {\lbrace \left| y\right| \ge c\rbrace } |H(s, X_{s}, y)|\,\nu (\textrm{d}y)\, \textrm{d}s \\&\quad <\infty \end{aligned}$$

for all \(t\in \mathbb {R}_{\ge 0}\) owing to (5). Therefore, by the same argument as in [9, Lemma II.3.1] (see the proof of formula (3.8) on page 62 therein) or by modifying slightly the definition of \(\tau _n\)’s and using [10, Theorem II.1.8] we have that

$$\begin{aligned} \int _0^{\cdot \wedge \tau _n}&\int _ {\lbrace \left| y\right| \ge c\rbrace } \xi (s)\bigl (V(X_{s-} +H(s, X_{s-}, y))-V(X_{s-})\bigr )\, N(\textrm{d}s, \textrm{d}y) \\&- \int _0^{\cdot \wedge \tau _n}\int _ {\lbrace \left| y\right| \ge c\rbrace } \xi (s)\bigl (V(X_{s} +H(s, X_{s}, y))-V(X_{s})\bigr )\,\nu (\textrm{d}y)\,\textrm{d}s \end{aligned}$$

is again a martingale.

Hence, the proof that \((U(t, X_t))\) is a supermartingale is complete. Since \(U(t,X_t)\) is plainly nonnegative and right-continuous, the martingale convergence theorem implies that there exists an integrable random variable \(U_\infty \in L^1(\mathbb {P}\,)\) such that \(\lim _{t\rightarrow \infty } U(t,X_t) = U_\infty \) \(\mathbb {P}\,\)-a.s., whence it follows that

$$\begin{aligned} \lim _{t\rightarrow \infty } V(X_t) = \lim _{t\rightarrow \infty }\exp \Bigl (-\int ^\infty _t \gamma (r)\,\textrm{d}r\Bigr )U(t,X_t) -1 = U_\infty -1 =\mathrel {\mathop :} V_\infty \end{aligned}$$
(22)

\(\mathbb {P}\,\)-almost surely.

Step 2:

Now we show that

$$\begin{aligned} \liminf _{t\rightarrow \infty } \bigl | X_t-x_0 \bigr | = 0 \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$
(23)

Let \(\omega \in \varOmega \) be such that

$$\begin{aligned} \bigl | X_t(\omega )-x_0\bigr |\ge \varepsilon \end{aligned}$$

for some \(t_0\in {\mathbb {R}}_{\ge 0}\) and \(\varepsilon >0\) and all \(t\ge t_0\). If (12) is satisfied, then clearly a \(\delta >0\) may be found such that

$$\begin{aligned} \varphi (X_t(\omega ))\ge \delta \quad \text {for all }t\ge t_0. \end{aligned}$$
(24)

If (13) is satisfied, then note that by (22) we may assume that \(V(X_t(\omega ))\) converges to a finite limit as \(t\rightarrow \infty \), so by the first part of (13) there exists a constant \(\zeta = \zeta (\omega )\) such that

$$\begin{aligned} \sup _{t\ge 0} |X_t(\omega )| \le \zeta . \end{aligned}$$

Hence, the second part of (13) implies that

$$\begin{aligned} \varphi (X_t(\omega )) \ge \inf _{\zeta +|x_0|\ge |x-x_0|\ge \varepsilon } \varphi (x) \ge \delta \end{aligned}$$

for some \(\delta >0\) and all \(t\ge t_0\), that is, (24) again holds. Thus, we have

$$\begin{aligned} \int _{t_0}^\infty \alpha (s)\varphi (X_s(\omega ))\,\textrm{d}s =\infty , \end{aligned}$$

because \(\alpha \in L_{\textrm{loc}}^1(\mathbb {R}_{\ge 0}) {\setminus } L^1(\mathbb {R}_{\ge 0})\). Therefore, (23) is established provided we show that

$$\begin{aligned} \int _0^\infty \alpha (s)\varphi (X_s)\,\textrm{d}s <\infty \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$
(25)

As \(\xi \ge 1\), we have

$$\begin{aligned} \int _0^{t\wedge \tau _n} \alpha (s)\varphi (X_s)\,\textrm{d}s \le -\int _0^{t\wedge \tau _n}\bigl [(1+V(X_{s}))\xi ^\prime (s) +\xi (s)\mathscr {L}V(s,X_{s})\bigr ]\,\textrm{d}s \end{aligned}$$

for all \(t\in \mathbb {R}_{\ge 0}\) and \(n\in \mathbb {N}\) by (20). Using (19) together with the fact that the stochastic integrals in (19) are centered and \(U\ge 0\), we obtain

$$\begin{aligned} \mathbb {E}\,\int _0^{t\wedge \tau _n} \alpha (s)\varphi (X_s)\,\textrm{d}s&\le -\mathbb {E}\,\int _0^{t\wedge \tau _n}\bigl [(1+V(X_{s}))\xi ^\prime (s) +\xi (s)\mathscr {L}V(s,X_{s})\bigr ]\,\textrm{d}s \\&=\mathbb {E}\,\bigl \{U(0, X_0)- U(t\wedge \tau _n, X_{t\wedge \tau _n})\bigr \} \\&\le \mathbb {E}\,U(0, X_0) \end{aligned}$$

for all \(t\in \mathbb {R}_{\ge 0}\) and \(n\in {\mathbb {N}}\); thus, letting first \(n\rightarrow \infty \) and then \(t\rightarrow \infty \) and applying the monotone convergence theorem twice, we arrive at the estimate

$$\begin{aligned} \mathbb {E}\,\int _0^\infty \alpha (s)\varphi (X_s)\,\textrm{d}s \le \mathbb {E}\,U(0,X_0) = e^{\Vert \gamma \Vert _{L^1}} \bigl (1+\mathbb {E}\,V(X_0)\bigr ) \end{aligned}$$

the right-hand side of which is finite by (H2). We see that (25) holds true.

Step 3:

It remains to show that

$$\begin{aligned} \lim _{t\rightarrow \infty } X_t = x_0 \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$
(26)

Suppose that \(\omega \in \varOmega \) is such that

$$\begin{aligned} \bigl | X_{t_n}(\omega )-x_0\bigr |\ge \varepsilon \end{aligned}$$

for some \(\varepsilon >0\) and a sequence \(t_n\nearrow \infty \). By the hypothesis (H2) of Theorem 3.1, an \(\eta >0\) may be found for which

$$\begin{aligned} V(X_{t_n}(\omega ))\ge \eta \end{aligned}$$
(27)

for every \(n\in \mathbb {N}\). We shall show that then the two statements

$$\begin{aligned} \lim _{t\rightarrow \infty } V(X_t(\omega )) = V_\infty (\omega ) \end{aligned}$$
(28)

and

$$\begin{aligned} \liminf _{t\rightarrow \infty }\bigl \vert X_t(\omega )-x_0\bigr \vert = 0 \end{aligned}$$
(29)

cannot both hold, where \(V_\infty \) is defined by (22). Indeed, (27) together with (28) implies that \(V_\infty (\omega )\ge \eta \). On the other hand, if (29) is satisfied, then there exists a sequence \(r_n\nearrow \infty \) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } X_{r_n}(\omega ) = x_0, \end{aligned}$$

hence, again by (28) and (H2),

$$\begin{aligned} V_\infty (\omega )=\lim _{n\rightarrow \infty }V(X_{r_n}(\omega ))=V(x_0)=0, \end{aligned}$$

which is a contradiction. However, we have already shown that both (28) and (29) hold for \(\mathbb {P}\,\)-almost all \(\omega \in \varOmega \), which concludes the proof of Theorem 3.1. \(\square \)

Now we focus on a particular case of Eq. (3) corresponding to the continuous-time stochastic approximation procedure of Robbins–Monro type with a general Lévy noise. Recall that in this setting we are looking for a stochastic differential equation such that its solutions converge to a root of the drift R for a class of noise coefficients as wide as possible. Namely, we consider the equation

$$\begin{aligned} \begin{aligned}&\textrm{d}X_t=\alpha (t)\Big ( R(X_{t})\,\textrm{d}t+\sigma (t,X_{t})\,\textrm{d}W_t + \int _ {\lbrace \left| y\right| < c\rbrace } K(X_{t-}, y)\,\tilde{N}(\textrm{d}t, \textrm{d}y) \\&\qquad + \int _ {\lbrace \left| y\right| \ge c\rbrace } K(X_{t-}, y)\,N(\textrm{d}t, \textrm{d}y)\Big ), \quad t\ge 0 \\&\quad X_0 \sim \mu , \end{aligned} \end{aligned}$$
(30)

with Borel coefficients

$$\begin{aligned} \alpha :\mathbb {R}_{\ge 0}\longrightarrow \mathbb {R}_{>0}, \ R:\mathbb {R}^m\longrightarrow \mathbb {R}^m, \ \sigma :{\mathbb {R}}_{\ge 0}\times \mathbb {R}^m \longrightarrow \mathbb {R}^{m\times n}, \ K:\mathbb {R}^m\times \mathbb {R}^n\longrightarrow \mathbb {R}^m \end{aligned}$$

and a Borel probability measure \(\mu \) on \(\mathbb {R}^m\). The driving noise (W, N) is the same as in (3). Since the function K is now independent of time, Assumption 2.1 takes the following form:

Assumption 3.1

We shall assume that

$$\begin{aligned} \int _ {\lbrace \left| y\right|< c\rbrace } {\left| K(x, y)\right| ^2\,\nu (\textrm{d}y)}<\infty \quad \text {for all }x\in \mathbb {R}^m \end{aligned}$$

and the function

$$\begin{aligned} \int _ {\lbrace \left| y\right| \ge c\rbrace } \bigl | K(\cdot , y) \bigr |\,\nu (\textrm{d}y) \end{aligned}$$

is locally bounded on \({\mathbb {R}}^m\).

Let us state a result which one obtains by applying Theorem 3.1 to (30).

Corollary 3.1

Let Assumption 3.1 be satisfied. Let there exist \(x_0\in {\mathbb {R}}^m\), a function \(V\in \mathscr {V}\cap L^1(\mu )\) with \(V(x_0)=0\) and a measurable function \(\varphi :\mathbb {R}^m\longrightarrow \mathbb {R}_{\ge 0}\) such that

$$\begin{aligned} \inf _{\varrho \ge |x-x_0|\ge \varepsilon } \varphi (x)>0 \quad \text {for all }\varrho>\varepsilon >0 \end{aligned}$$
(31)

and

$$\begin{aligned} \lim _{|x|\rightarrow \infty } V(x) = +\infty , \quad \inf _{|x-x_0|\ge \varepsilon } V(x)>0 \quad \text {for all }\varepsilon >0. \end{aligned}$$
(32)

Assume further that \(\alpha \in \mathscr {C}(\mathbb {R}_{\ge 0},\mathbb {R}_{>0})\) satisfies

$$\begin{aligned} \int _0^\infty \alpha (r)\,\textrm{d}r=\infty , \quad \int _0^\infty \alpha ^2(r) \,\textrm{d}r<\infty . \end{aligned}$$
(33)
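(For instance, the classical choice \(\alpha (t)=(1+t)^{-1}\) satisfies (33), since \(\int _0^\infty (1+r)^{-1}\,\textrm{d}r=\infty \) while \(\int _0^\infty (1+r)^{-2}\,\textrm{d}r=1<\infty \); cf. Remark 4.1 below.)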

Let there exist a constant \(K_\sigma \in \mathbb {R}_{\ge 0}\) and a function \(\beta \in \mathscr {C}(\mathbb {R}_{\ge 0})\cap L^1(\mathbb {R}_{\ge 0})\) such that

$$\begin{aligned}&\Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x, y)\,\nu (\textrm{d}y), DV(x)\Bigr \rangle \le -\varphi (x), \end{aligned}$$
(34)
$$\begin{aligned}&{\text {Tr}}\bigl (\sigma (t,x)^TD^2V(x)\sigma (t,x)\bigr )\le K_\sigma \bigl (1+V(x)\bigr ) \end{aligned}$$
(35)

and

$$\begin{aligned} \int _{\mathbb {R}^n\setminus \lbrace 0\rbrace }\bigl [V(x+\alpha (t)K(x,y))-V(x) -\alpha (t)\bigl \langle K(x,y), DV(x)\bigr \rangle \bigr ]\,\nu (\textrm{d}y) \le \beta (t)\bigl (1+V(x)\bigr ) \end{aligned}$$
(36)

for all \(x\in \mathbb {R}^m\) and \(t\in \mathbb {R}_{\ge 0}\).

If \((\varOmega ,\mathscr {F}, (\mathscr {F}_t), (W, N), X)\) is a solution to (30), then

$$\begin{aligned} \lim _{t\rightarrow \infty } X_t = x_0 \quad {\mathbb {P}}\text {-a.s.} \end{aligned}$$
(37)

Proof

To see that Corollary 3.1 follows immediately from Theorem 3.1, it suffices to check that the hypothesis (H3) is satisfied. Indeed, the operator \({\mathscr {L}}\) associated with (30) takes the form

$$\begin{aligned} \mathscr {L}V(t, x)&= \alpha (t)\bigl \langle R(x), DV(x)\bigr \rangle + \frac{\alpha ^2(t)}{2} {\text {Tr}}\bigl (\sigma (t,x)^T D^2V(x) \sigma (t,x)\bigr )\\&\quad + \int _{\mathbb {R}^n\setminus \lbrace 0\rbrace }\bigl [ V(x+\alpha (t)K(x,y))-V(x) -\alpha (t)\textbf{1}_{ {\lbrace \left| y\right| < c\rbrace } }(y)\bigl \langle K(x,y), DV(x)\bigr \rangle \bigr ]\,\nu (\textrm{d}y) \\&= \alpha (t)\Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x,y)\,\nu (\textrm{d}y), DV(x)\Bigr \rangle + \frac{\alpha ^2(t)}{2}{\text {Tr}}\bigl (\sigma (t,x)^T D^2V(x) \sigma (t,x)\bigr ) \\&\quad + \int _{\mathbb {R}^n\setminus \lbrace 0\rbrace } \bigl [V(x+\alpha (t)K(x,y))-V(x)-\alpha (t)\bigl \langle K(x,y), DV(x) \bigr \rangle \bigr ] \,\nu (\textrm{d}y) \end{aligned}$$

for any \(x\in \mathbb {R}^m\) and \(t\in {\mathbb {R}}_{\ge 0}\); the last term on the right-hand side is well defined owing to Assumption 3.1. The assumptions of Corollary 3.1 thus imply that

$$\begin{aligned} \mathscr {L}V(t,x) \le -\alpha (t) \varphi (x) + \frac{1}{2}\bigl (K_\sigma \alpha ^2(t) + 2\beta (t)\bigr ) \bigl (1+V(x)\bigr ). \end{aligned}$$

Since \((K_\sigma \alpha ^2 + 2\beta )\in L^1({\mathbb {R}}_{\ge 0})\cap {\mathscr {C}}({\mathbb {R}}_{\ge 0})\), the proof is complete. \(\square \)

Remark 3.1

  (a)

    As in Theorem 3.1, we may replace (31) and (32) with

    $$\begin{aligned} \inf _{|x-x_0|\ge \varepsilon }\bigl (V(x)\wedge \varphi (x)\bigr )> 0 \quad \text {for any }\varepsilon > 0. \end{aligned}$$
    (38)
  (b)

    If the function

    $$\begin{aligned} x\longmapsto \Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x, y)\,\nu (\textrm{d}y), DV(x) \Bigr \rangle \end{aligned}$$

    is continuous on \({\mathbb {R}}^m\) and

    $$\begin{aligned} \Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x, y)\,\nu (\textrm{d}y), DV(x)\Bigr \rangle <0 \quad \text {for }x\ne x_0 \end{aligned}$$

    we may set

    $$\begin{aligned} \varphi (x) = - \Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x, y)\,\nu (\textrm{d}y), DV(x)\Bigr \rangle , \quad x\in {\mathbb {R}}^m, \end{aligned}$$

    and then both (31) and (34) are satisfied.

If \(H=0\) and \(K=0\), then Theorem 3.1 and Corollary 3.1 correspond essentially to [21], Theorems 3.8.1 and 4.4.1, respectively.

4 Applications

Sufficient conditions for convergence of a solution X of (30) to a point are given in Corollary 3.1 in terms of a Lyapunov function V. Choosing a particular Lyapunov function, we get more applicable criteria in terms of the coefficients of (30). If \(K=0\), then \(V = \vert \cdot -x_0\vert ^2\) is a standard choice; however, in the general case, we must proceed in a different way since we need a Lyapunov function belonging to the class \({\mathscr {V}}\).

Example 4.1

Let \(x_0\in \mathbb {R}^m\) and let us set

$$\begin{aligned} V:{\mathbb {R}}^m\longrightarrow {\mathbb {R}}_{\ge 0}, \ x\longmapsto \log \bigl (1 + |x-x_0|^2\bigr ). \end{aligned}$$

Obviously, the Fréchet derivatives of V are given by

$$\begin{aligned} DV(x)&= 2\frac{x-x_0}{1+\left| x-x_0\right| ^2}, \\ D^2V(x)&= \frac{2}{1+\left| x-x_0\right| ^2}I - \frac{4}{\bigl ( 1+|x-x_0|^2\bigr )^2} (x-x_0)(x-x_0)^T, \end{aligned}$$

for all \(x\in {\mathbb {R}}^m\), and thus \(V\in {\mathscr {V}}\); furthermore, \(V(x)\rightarrow +\infty \) as \(|x|\rightarrow \infty \).
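These formulas are easily confirmed numerically; the following minimal Python check (ours, with arbitrarily chosen test points) compares them with central finite differences.

```python
import numpy as np

def V(x, x0):
    return np.log(1.0 + np.sum((x - x0) ** 2))

def DV(x, x0):
    return 2.0 * (x - x0) / (1.0 + np.sum((x - x0) ** 2))

def D2V(x, x0):
    d = x - x0
    q = 1.0 + np.sum(d ** 2)
    return 2.0 / q * np.eye(len(x)) - 4.0 / q ** 2 * np.outer(d, d)

x0 = np.array([1.0, -2.0])
x = np.array([0.3, 0.7])
h = 1e-6
e = np.eye(2) * h

# rows of the finite-difference gradient/Hessian, coordinate by coordinate
fd_grad = np.array([(V(x + e[i], x0) - V(x - e[i], x0)) / (2 * h) for i in range(2)])
fd_hess = np.array([(DV(x + e[i], x0) - DV(x - e[i], x0)) / (2 * h) for i in range(2)])

print(np.allclose(fd_grad, DV(x, x0), atol=1e-6))   # True
print(np.allclose(fd_hess, D2V(x, x0), atol=1e-6))  # True
```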

Let Assumption 3.1 be satisfied and suppose that the coefficients \(\sigma \) and K of (30) satisfy the linear growth condition: there exists a constant \(L\in \mathbb {R}_{\ge 0}\) such that

$$\begin{aligned} |\sigma (t,x)|^2+\int _{\mathbb {R}^n\setminus \lbrace 0\rbrace } |K(x,y)|^2\,\nu (\textrm{d}y) \le L\bigl ( 1+ |x|^2\bigr ) \end{aligned}$$
(39)

for all \(x\in {\mathbb {R}}^m\) and \(t\ge 0\). Denote by \({\mathfrak {k}}\) the function

$$\begin{aligned} {\mathfrak {k}}:{\mathbb {R}}^m\longrightarrow {\mathbb {R}}, \ x\longmapsto \Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } K(x,y)\,\nu (\textrm{d}y), x-x_0\Bigr \rangle . \end{aligned}$$

Since

$$\begin{aligned} \Bigl \langle R(x)+\int _ {\lbrace \left| y\right| \ge c\rbrace } {K(x, y)\,\nu (\textrm{d}y)}, DV(x)\Bigr \rangle = \frac{2}{1+ |x-x_0|^2}{\mathfrak {k}}(x) \end{aligned}$$
(40)

for all \(x\in {\mathbb {R}}^m\), (34) is satisfied with the choice

$$\begin{aligned} \varphi :x\longmapsto -\frac{2{\mathfrak {k}}(x)}{1+|x-x_0|^2}. \end{aligned}$$
(41)

The function \(\varphi \) defined by (41) surely satisfies (31) if \({\mathfrak {k}}\) is continuous and

$$\begin{aligned} {\mathfrak {k}}(x) < 0 \quad \text {for all }x\ne x_0. \end{aligned}$$
(42)

If \({\mathfrak {k}}\) is not continuous, it may be difficult to check (31), and a more feasible way may be to strengthen (42) by assuming that there exists \(\eta >0\) such that

$$\begin{aligned} {\mathfrak {k}}(x) \le -\eta |x-x_0|^2 \quad \text {for all } x\in {\mathbb {R}}^m. \end{aligned}$$
(43)

In this case, we may set

$$\begin{aligned} \varphi :x\longmapsto \frac{2\eta |x-x_0|^2}{1+|x-x_0|^2} \end{aligned}$$

obtaining a function that clearly satisfies (31): indeed, as \(r\longmapsto r/(1+r)\) is increasing on \({\mathbb {R}}_{\ge 0}\), the infimum in (31) equals \(2\eta \varepsilon ^2/(1+\varepsilon ^2)>0\). We claim that the other hypotheses of Corollary 3.1 (in the version of Remark 3.1) are also satisfied.

For any \(x\in \mathbb {R}^m\), we may compute using (39)

$$\begin{aligned} \begin{aligned}&{\text {Tr}} \bigl (\sigma (t,x)^TD^2V(x)\sigma (t,x)\bigr ) \\&\quad = \frac{2}{1+|x-x_0|^2} |\sigma (t,x)|^2 - \frac{4}{\bigl (1+|x-x_0|^2\bigr )^2} \bigl |\sigma (t,x)^T(x-x_0)\bigr |^2 \\&\quad \le \frac{2}{1+ |x-x_0|^2}|\sigma (t,x)|^2 \\&\quad \le 2L\frac{1+ |x|^2}{1+|x-x_0|^2} \\&\quad \le 4L\Bigl ( 1+\frac{|x_0|^2}{1+|x-x_0|^2}\Bigr ) \\&\quad \le 4L\bigl ( 1+|x_0|^2\bigr )\bigl ( 1+V(x)\bigr ) \end{aligned} \end{aligned}$$
(44)

and (35) follows. Finally, we verify that (36) holds with the choice \(\beta (t) = 2L\bigl (1+|x_0|^2\bigr )\alpha ^2(t)\). Using the elementary inequality \(\log y\le y-1\), \(y>0\), and the definition of V, we obtain

$$\begin{aligned}&\int _{\mathbb {R}^n\setminus \lbrace 0\rbrace } \bigl [V(x+\alpha (t) K(x,y))-V(x)-\alpha (t)\bigl \langle K(x,y), DV(x)\bigr \rangle \bigr ] \,\nu (\textrm{d}y) \\&\quad = \int _{\mathbb {R}^n\setminus \lbrace 0\rbrace } \Bigl [\log \left( \frac{1+|x+\alpha (t)K(x,y)-x_0|^2}{1+|x-x_0|^2}\right) - \frac{2\alpha (t)}{1+|x-x_0|^2} \bigl \langle K(x,y),x-x_0\bigr \rangle \Bigr ]\,\nu (\textrm{d}y) \\&\quad \le \frac{1}{1+|x-x_0|^2}\int _{\mathbb {R}^n\setminus \lbrace 0\rbrace } \bigl [|x-x_0+\alpha (t) K(x,y)|^2 -|x-x_0|^2 -2\alpha (t)\bigl \langle K(x,y),x-x_0\bigr \rangle \bigr ]\,\nu (\textrm{d}y) \\&\quad = \frac{\alpha ^2 (t)}{1+|x-x_0|^2} \int _{\mathbb {R}^n\setminus \lbrace 0\rbrace } |K(x,y)|^2\,\nu (\textrm{d}y) \\&\quad \le \alpha ^2 (t)L \frac{1+|x|^2}{1+|x-x_0|^2} \\&\quad \le 2\alpha ^2 (t)L\bigl ( 1+|x_0|^2\bigr )\bigl (1+V(x)\bigr ) \end{aligned}$$
(45)

for all \(t\in \mathbb {R}_{\ge 0}\) and \(x\in {\mathbb {R}}^m\). Note also that Assumption 3.1 clearly follows from (39).

Therefore, whenever \(\alpha \in \mathscr {C}(\mathbb {R}_{\ge 0}, \mathbb {R}_{> 0})\) obeys (33) and ((W, N), X) is a solution to (30), then X converges almost surely to \(x_0\) as \(t\rightarrow \infty \).

Remark 4.1

It should be stressed that under the hypotheses of Example 4.1, the point \(x_0\in {\mathbb {R}}^m\) to which the solution of (30) converges need not be a root of the drift R; therefore, a priori it might be misleading to speak about a Robbins–Monro stochastic approximation procedure. Let us discuss this problem more carefully: Our main positive results are illustrated in paragraphs (d) and (f), while (c) contains a counterexample. In (a), (b) and (e), particular cases related to hitherto available results are treated.

  (a)

    Assume that \(K=0\). Then, (42) reduces to

    $$\begin{aligned} \langle R(x), x-x_0\rangle < 0 \quad \text {for all }x\ne x_0. \end{aligned}$$
    (46)

    Hence, if R is continuous (which is a rather natural assumption), we have \(R(x_0) = 0\) (as is well known from the theory of monotone mappings, see, e.g., [5, Lemma 1] for a much more general result) and plainly \(x_0\) is the unique root of R. If \(\sigma \) satisfies the linear growth condition and R is a continuous function such that (46) holds, then

    $$\begin{aligned} \lim _{t\rightarrow \infty } X_t = x_0 \quad \mathbb {P}\,\text {-almost surely} \end{aligned}$$
    (47)

    for any solution of the equation

    $$\begin{aligned} \textrm{d}X_t = \alpha (t)\Bigl ( R(X_t)\,\textrm{d}t + \sigma (t,X_t) \,\textrm{d}W_t\Bigr ), \quad X_0\sim \mu . \end{aligned}$$
    (48)

    This is a classical result going back to [21].

  (b)

    If the driving Lévy noise has a purely discontinuous component, but there are no large jumps, that is, \(\nu \{|y|\ge a\} = 0\) for some \(a\in (0,\infty )\), then the results are virtually the same as in the diffusion case. Indeed, if R is continuous, obeys (46), and \(\sigma \) and K have at most linear growth, then (47) holds for any solution of

    $$\begin{aligned} \textrm{d}X_t = \alpha (t)\Bigl ( R(X_t)\,\textrm{d}t + \sigma (t,X_t) \,\textrm{d}W_t + \int _{\{|y|<a\}} K(X_{t-},y)\,{\tilde{N}}(\textrm{d}t,\textrm{d}y)\Bigr ), \quad X_0\sim \mu . \end{aligned}$$
    (49)

    Again, \(x_0\) is the unique root of R. Related results, obtained by different methods, may be found in [15, 20].

  (c)

    In the general case \(K\ne 0\) and \(\nu \{|y|\ge c\}>0\), the situation changes considerably. This should not be surprising: the last term on the right-hand side of (30), that is, the process

    $$\begin{aligned} \int ^\cdot _0 \int _{\{|y|\ge c\}} K(X_{t-},y)\, N(\textrm{d}t, \textrm{d}y) \end{aligned}$$
    (50)

    is not centered in general. Moreover, if we keep the driving Lévy noise in (3) but use a representation with a different c, the drift changes (and, a fortiori, so do its roots). Hence, Corollary 3.1 need not be applicable to the Robbins–Monro procedure, as it may imply convergence to a point \(x_0\) such that \(R(x_0)\ne 0\). Indeed, if in the setting of Example 4.1 the function \({\mathfrak {k}}\) is continuous and satisfies (42), then we only know that

    $$\begin{aligned} R(x_0) + \int _{\{|y|\ge c\}} K(x_0,y)\,\nu (\textrm{d}y) = 0. \end{aligned}$$

    The following simple example illustrates this phenomenon. Define the coefficients R and K by

    $$\begin{aligned} R:x\longmapsto A(x-a), \quad K:(x,y)\longmapsto B(x-b) \end{aligned}$$

    for some \(a,b\in {\mathbb {R}}^m\) and matrices \(A,B\in {\mathbb {R}}^{m\times m}\) such that \(A+B\) is invertible and negative definite, and \(A(x_0-a)\ne 0\) where we set \(x_0 = (A+B)^{-1} (Aa+Bb)\). We can assume for simplicity that \(\nu \{|y|\ge c\}=1\). Then,

    $$\begin{aligned} {\mathfrak {k}}(x)&= \Bigl \langle A(x-a) + \int _{\{|y|\ge c\}} B(x-b) \,\nu (\textrm{d}y),x-x_0\Bigr \rangle \\&= \bigl \langle (A+B)x - (Aa+Bb), x-x_0 \bigr \rangle \\&= \bigl \langle (A+B)(x-x_0),x-x_0\bigr \rangle \\&\le -\eta |x-x_0|^2 \end{aligned}$$

    for some \(\eta >0\) and all \(x\ne x_0\); however, \(R(x_0)\ne 0\).
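    A concrete numerical instance (the matrices and vectors below are our own choice) makes the phenomenon explicit:

```python
import numpy as np

A = np.diag([-1.0, -1.0])      # R(x) = A(x - a)
B = np.diag([-2.0, -2.0])      # K(x, y) = B(x - b)
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
# A + B = -3 I is invertible and negative definite

x0 = np.linalg.solve(A + B, A @ a + B @ b)   # limit point of the solutions
print(x0)                                    # [1/3, 2/3]
print(A @ (x0 - a))                          # R(x0) = [2/3, -2/3] != 0
```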

  (d)

    Therefore, in the general case of (30) we must add the assumption \(R(x_0) = 0\) if Corollary 3.1 is to be applied to stochastic approximation; for Eqs. (48) and (49) this is redundant. On the other hand, by choosing K in an appropriate way we may obtain (47) under rather mild hypotheses on R. Let us assume that \(R(x_0) = 0\) and R is Lipschitz continuous, and denote by \({\text {Lip}}(R)\) its Lipschitz constant. If K satisfies, still in the setting of Example 4.1,

    $$\begin{aligned} \Bigl \langle \int _{\{|y|\ge c\}} K(x,y)\,\nu (\textrm{d}y), x-x_0 \Bigr \rangle \le -({\text {Lip}}(R)+1)|x-x_0|^2 \quad \text {for all }x\in \mathbb R^m, \end{aligned}$$

    then Corollary 3.1 is applicable (a short verification follows this paragraph). In the diffusion case (48), the mere Lipschitz continuity of R need not be sufficient for the convergence of the stochastic approximation procedure. (Indeed, consider (48) with the choice \(m=n=1\), \(R(x) = \sigma (t,x) = x\) for \((t,x)\in \mathbb R_{\ge 0}\times {\mathbb {R}}\), \(V = |\cdot |^2\), and \(\alpha (t) = (1+t)^{-1}\) for \(t\ge 0\). Then all assumptions of Corollary 3.1 except the hypothesis (34) are satisfied and R is plainly globally Lipschitz continuous having 0 as its only root; nevertheless, a simple direct calculation shows that \(X_t\rightarrow \infty \) \({\mathbb {P}}\)-a.s. as \(t\rightarrow \infty \).)
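    To verify the claim made above: since \(R(x_0)=0\), the Lipschitz property of R and the assumed bound on K give

    $$\begin{aligned} {\mathfrak {k}}(x)&= \bigl \langle R(x)-R(x_0), x-x_0\bigr \rangle + \Bigl \langle \int _{\{|y|\ge c\}} K(x,y)\,\nu (\textrm{d}y), x-x_0\Bigr \rangle \\&\le {\text {Lip}}(R)\,|x-x_0|^2 - \bigl ({\text {Lip}}(R)+1\bigr )|x-x_0|^2 = -|x-x_0|^2, \end{aligned}$$

    that is, (43) holds with \(\eta = 1\), and Example 4.1 applies.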

  (e)

    If

    $$\begin{aligned} \int _{\{|y|\ge c\}} K(x,y)\,\nu (\textrm{d}y) = 0 \quad \text {for all } x\in {\mathbb {R}}^m \end{aligned}$$

    then the process (50) is centered and we see that any solution X to (30) converges to the unique root of R under the hypothesis that R is a continuous function satisfying (46) (and \(\sigma \) and K have at most linear growth). This result may be compared with theorems stated in [12], where equations driven by centered square integrable processes with independent increments are dealt with. We do not need \(L^2\)-integrability; on the other hand, sharper asymptotic results than mere almost sure convergence are established in [12], at the price of more restrictive assumptions on the noise coefficients and the cumulant process of the driving Lévy process.

  (f)

    Finally, note that the hypotheses of Example 4.1 may be satisfied even if R has multiple roots. The coefficient K then “selects” the root of R to which a solution to (30) converges. This may happen only if a noncentered non-compensated Poisson process is allowed as a driving noise. As we have already indicated above, large jumps of the Lévy process virtually change the drift and, consequently, it is possible that a solution to (30) no longer converges to some (or all) of its roots. Again, in the diffusion case or for Eq. (49) the situation is completely different, see, e.g., [21, Chapter 5]. For example, let \(m=1\) and let \(\sigma \) and K satisfy (39) and

    $$\begin{aligned} x\cdot \int _{\{|y|\ge c\}} K(x,y)\,\nu (\textrm{d}y) \le -2|x|^2 \quad \text {for all }x\in {\mathbb {R}}. \end{aligned}$$

    Then, any solution to

    $$\begin{aligned} \textrm{d}X_t&=\alpha (t)\Big ( \sin X_{t} \,\textrm{d}t+\sigma (t,X_{t}) \,\textrm{d}W_t + \int _ {\lbrace \left| y\right| < c\rbrace } K(X_{t-}, y)\,\tilde{N}(\textrm{d}t, \textrm{d}y) \\&\qquad + \int _ {\lbrace \left| y\right| \ge c\rbrace } K(X_{t-}, y)\,N(\textrm{d}t, \textrm{d}y)\Big ), \quad t\ge 0, \\ X_0&\sim \mu , \end{aligned}$$

    satisfies

    $$\begin{aligned} \lim _{t\rightarrow \infty } X_t = 0 \quad \mathbb {P}\,\text {-a.s.} \end{aligned}$$
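    A minimal Euler-type simulation in Python illustrates this selection of the root 0. All concrete choices are ours: \(\sigma = 0\), no small jumps, \(\nu \) a finite measure of total mass 2 carried by \(\{|y|\ge c\}\), and \(K(x,y) = -x\), so that the displayed condition holds with equality.

```python
import numpy as np

rng = np.random.default_rng(1)

lam = 2.0                          # nu({|y| >= c}), total mass of nu
alpha = lambda t: 1.0 / (1.0 + t)  # step-size function satisfying (33)

def simulate(x, T=500.0, dt=1e-2):
    """Euler scheme for dX = alpha(t)(sin X dt + int K(X-, y) N(dt, dy))."""
    t = 0.0
    while t < T:
        a = alpha(t)
        x += a * np.sin(x) * dt                # drift term R(x) = sin x
        for _ in range(rng.poisson(lam * dt)):
            x += a * (-x)                      # large jump, K(x, y) = -x
        t += dt
    return x

# trajectories started near different roots k*pi of the drift sin x
for x0 in (-7.0, -3.0, 3.5, 9.0):
    print(x0, "->", simulate(x0))              # all end near 0
```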
  (g)

    It is possible to allow coefficients K depending on time, i.e., defined on \({\mathbb {R}}_{\ge 0}\times {\mathbb {R}}^m\times {\mathbb {R}}^n\). If Eq. (49) is considered, that is, there are no large jumps, this change results in a trivial modification of the assumptions. In the general case, however, the hypotheses become cumbersome and thus we content ourselves with time-independent K’s.

5 Conclusions

We extended a Lyapunov-functions-based approach to convergence of a continuous-time Robbins–Monro procedure of stochastic approximation from diffusion processes to systems defined by a stochastic differential equation driven by a general Lévy process. While for driving noises with small jumps only our results are essentially comparable with the available ones (albeit our proofs are different), we showed that if large jumps are allowed, new phenomena may occur: the large jumps may force the procedure to converge to a “fake” root of the drift; on the other hand, if the noise coefficient is chosen properly, we obtain convergence under hypotheses weaker than those of the standard theory.