Lévy Langevin Monte Carlo

Analogously to the well-known Langevin Monte Carlo method, in this article we provide a method to sample from a target distribution π by simulating a solution of a stochastic differential equation. Here, the stochastic differential equation is driven by a general Lévy process which, unlike in the case of Langevin Monte Carlo, allows for non-smooth targets. Our method will be fully explored in the particular setting of target distributions supported on the half-line (0, ∞) and a compound Poisson driving noise. Several illustrative examples conclude the article.


Introduction
Monte Carlo methods based on stationary Markov processes appear frequently in fields such as statistics, computer simulation and machine learning, and they have a variety of applications, for example in physics and biology, cf. [1, 4, 9, 20, 21]. These methods have in common that, in order to sample from a target distribution π, one considers sample paths of certain Markov processes to approximate π. Langevin Monte Carlo (LMC) is one of these methods, and it originates from statistical physics. It applies to absolutely continuous target distributions π(dx) = π(x)dx with smooth density functions π : R^d → R_+, and its associated process (X_t)_{t≥0} is the so-called Langevin diffusion, that is, a strong solution of the stochastic differential equation (SDE) (1.1), where (B_t)_{t≥0} is a standard Brownian motion on R^d and ∇π denotes the gradient of π. For LMC to produce samples from π it is required that (X_t)_{t≥0} is a unique strong solution of (1.1) and that π is an invariant distribution for (X_t)_{t≥0}, that is,

∫_{R^d} P^x(X_t ∈ B) π(dx) = π(B)  for all t ≥ 0, B ∈ B(R^d).   (1.2)

However, for this to be the case it is only natural that assumptions must be made on π, e.g. that ∇π exists in a suitable sense. Moreover, to sample from π using (X_t)_{t≥0} it is essential that (X_t)_{t≥0} converges to π in a suitable sense from any starting point x ∈ supp π. As solutions (X_t)_{t≥0} of (1.1) are almost surely continuous, supp π is necessarily connected for convergence to even be possible. For more on LMC see [4] or [17].
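For orientation, the standard time discretization of the Langevin diffusion, the unadjusted Langevin algorithm (ULA), can be sketched as follows. This is a minimal illustration of classic LMC, not the method developed in this article; the target (a standard normal), the step size h and the iteration counts are arbitrary illustrative choices.

```python
import numpy as np

def ula(grad_log_pi, x0, h=0.01, n_steps=200_000, rng=None):
    """Unadjusted Langevin algorithm: Euler-Maruyama discretization of
    dX_t = grad log pi(X_t) dt + sqrt(2) dB_t."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = x0
    samples = np.empty(n_steps)
    for k in range(n_steps):
        x = x + h * grad_log_pi(x) + np.sqrt(2.0 * h) * rng.standard_normal()
        samples[k] = x
    return samples

# Target: standard normal density, so grad log pi(x) = -x.
samples = ula(lambda x: -x, x0=0.0)
burned = samples[50_000:]  # discard burn-in
```

With a smooth log-density the empirical mean and variance of `burned` approximate those of the target up to a discretization bias of order h.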
Due to the constraints of LMC it is reasonable to ask whether one could construct similar methods by replacing the Brownian motion with a more general process. In this article we consider Lévy processes as driving noises. In particular, we are interested in the following question: given a distribution π and a Lévy process (L_t)_{t≥0}, can we choose a drift coefficient φ such that we can sample from π by simulation of a solution (X_t)_{t≥0} of

dX_t = φ(X_t) dt + dL_t?   (1.3)

There are various cases in the literature in which SDEs of the form (1.3) are considered with Lévy processes as driving noises. In [15] and [20] a fractional Langevin Monte Carlo (FLMC) method is introduced for which (L_t)_{t≥0} is an α-stable process, and in [5] several examples are produced for the case when the driving noise is a pure jump Lévy process. However, both of these studies focus on practical aspects and disregard some of the theoretical foundations. This will be discussed further in Remark 3.3.
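For concreteness, a compound Poisson driving noise, the pure-jump case treated in detail below, can be simulated exactly; the intensity lam and the Exp(1) jump distribution in the usage line are arbitrary illustrative choices.

```python
import numpy as np

def compound_poisson_path(T, lam, jump_sampler, rng):
    """Exact simulation of L_t = sum_{i <= N_t} xi_i on [0, T]:
    N_T ~ Poisson(lam*T), jump times uniform on [0, T], sizes i.i.d."""
    n = rng.poisson(lam * T)
    times = np.sort(rng.uniform(0.0, T, size=n))
    sizes = jump_sampler(rng, n)
    return times, np.cumsum(sizes)  # jump times and values of L at those times

rng = np.random.default_rng(1)
# Spectrally positive example: intensity 2 and Exp(1) jump sizes.
times, values = compound_poisson_path(
    T=10.0, lam=2.0, jump_sampler=lambda r, n: r.exponential(1.0, n), rng=rng
)
```

Because the jump measure is finite, the path is piecewise constant with finitely many jumps per interval, which is what makes the sampling in the examples below straightforward.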
To thoroughly answer the above question it is essential to distinguish between the notions of infinitesimally invariant distribution, invariant distribution, and limiting distribution. These and some further introductory notions and well-known facts can be found in Section 2 of this article. After that, in Section 3, we investigate under which conditions a drift coefficient φ exists such that π is infinitesimally invariant for (X_t)_{t≥0}. Clearly, there are cases in which this is impossible; think of discrete distributions, or distributions on a half-space while jumps can occur in all directions. Hence, a general answer can only exist under certain assumptions on the regularity of π and the compatibility of π and (L_t)_{t≥0}.
In the same section we then find a particular set of conditions under which π is invariant and limiting for (X_t)_{t≥0}. Various examples subsequently illustrate our results. Afterwards, in Section 4, we present the more technical aspects of the proofs, followed in Section 5 by a list of possible extensions with comments on the difficulties they might pose. Methodologically, we rely on the results of [2] on invariant measures of Lévy-type processes, on the Foster-Lyapunov methods originating in a series of articles by S. P. Meyn and R. L. Tweedie ([12]-[14]), and on standard techniques from the theory of ordinary differential equations.
Linear functionals on C_c^∞(U) that are continuous w.r.t. uniform convergence on compact subsets of all derivatives are called Schwartz distributions. The distributional derivative of a Schwartz distribution T is defined by T′(f) := −T(f′) for f ∈ C_c^∞(U). If there exists N ∈ N_0 such that for all compact sets K ⊂ U there exists c > 0 such that |T(f)| ≤ c max_{k≤N} sup_K |f^{(k)}| for all f ∈ C_c^∞(U) with supp f ⊂ K, then T is called a Schwartz distribution of order (at most) N.

Markov processes and generators. Let (X_t)_{t≥0} be a Markov process in R on the probability space (Ω, F, P). We denote P^x(·) := P(· | X_0 = x), and by E^x the corresponding expectation, for all x ∈ R. The pointwise generator of (X_t)_{t≥0} is the pair (A, D(A)) defined by

Af(x) := lim_{t↓0} (E^x f(X_t) − f(x)) / t,

where D(A) is the set of measurable functions f : R → R for which this limit exists for all x ∈ R. Further, denote by D(G) the set of all functions f : R → R for which there exists a measurable function g : R → R such that for all x ∈ R and t > 0 it holds that

E^x f(X_t) = f(x) + E^x ∫_0^t g(X_s) ds.

Setting Gf = g, the pair (G, D(G)) is called the extended generator of (X_t)_{t≥0}.
Lévy processes. A (one-dimensional) Lévy process (L_t)_{t≥0} is a Markov process with stationary and independent increments whose characteristic exponent ϕ(β) is given by the Lévy-Khintchine formula

ϕ(β) = iγβ − σ²β²/2 + ∫_R (e^{iβz} − 1 − iβz 1_{[−1,1]}(z)) Π(dz),  Π = µ + ρ.

Here, γ ∈ R is the location parameter, σ² ≥ 0 is the Gaussian parameter, and µ and ρ are two measures on R such that µ({0}) = ρ({0}) = 0 and ∫_R (1 ∧ z²) Π(dz) < ∞. Note that the decomposition of Π into µ and ρ is not unique. Further, denote by µ̄ the integrated tail of µ, and by µ_s(x) := sgn(x) µ̄(x) the signed integrated tail of µ. We similarly define the double integrated tail of ρ.

Invariant and limiting distributions. A σ-finite measure π is called invariant for (X_t)_{t≥0} if it satisfies (1.2), and π is called limiting for (X_t)_{t≥0} if π is a distribution, i.e. |π| = 1, and ||P^x(X_t ∈ ·) − π||_TV → 0 as t → ∞, where ||·||_TV denotes the total variation norm. The process (X_t)_{t≥0} is called Harris recurrent if there exists a non-trivial σ-finite measure a on R such that for all B ∈ B(R) with a(B) > 0 it holds that P^x(τ_B < ∞) = 1, where τ_B := inf{t ≥ 0 : X_t ∈ B}. It is well known (cf. [14]) that for any Harris recurrent Markov process (X_t)_{t≥0} an invariant measure π exists which is unique up to multiplication by a constant. If π is finite it can be normalized to be a distribution; in this case (X_t)_{t≥0} is called positive Harris recurrent.
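As a quick numerical sanity check of the characteristic exponent in the compound Poisson case (γ = 0, σ² = 0, finite jump measure), one can compare the empirical characteristic function of simulated copies of L_t with exp(t ∫ (e^{iβz} − 1) µ(dz)); the intensity and the Exp(1) jump law are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, beta = 2.0, 1.0, 0.7

# Simulate 100000 copies of L_t = sum_{i <= N_t} xi_i,
# with N_t ~ Poisson(lam * t) and xi_i ~ Exp(1).
counts = rng.poisson(lam * t, size=100_000)
L_t = np.array([rng.exponential(1.0, k).sum() for k in counts])

empirical = np.mean(np.exp(1j * beta * L_t))
# Exp(1) jumps have E e^{i beta xi} = 1 / (1 - i beta), hence:
theoretical = np.exp(lam * t * (1.0 / (1.0 - 1j * beta) - 1.0))
```

The Monte Carlo error of `empirical` is of order 1/sqrt(100000), so the two values agree to a few decimal places.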

Lévy Langevin Monte Carlo
Let π be a probability distribution on R, and let (L_t)_{t≥0} be a Lévy process on R. Further, let (X_t)_{t≥0} be a solution of (1.3). Can we choose φ : R → R in such a way that π is limiting for (X_t)_{t≥0}? In the spirit of FLMC we call the sampling of (X_t)_{t≥0} in order to sample from π Lévy Langevin Monte Carlo (LLMC).
As mentioned in the introduction, a general answer to this question cannot be given without certain conditions on π and (L_t)_{t≥0}. Throughout, we assume that π(dx) = π(x)dx is absolutely continuous, and that the Lévy process (L_t)_{t≥0} with characteristic triplet (γ, σ², Π) is not purely deterministic, i.e. σ² > 0 or Π ≠ 0. Recall that Π = µ + ρ with µ and ρ as in Section 2. Define E := {x ∈ R : π(x) > 0} and assume that either E = R or E is some open half-line; without loss of generality we choose in this case E = (0, ∞). This choice of E is not a real restriction, as explained further in Section 5. Additionally, we assume the following: (a1) If E = (0, ∞), then (L_t)_{t≥0} is a spectrally positive compound Poisson process.

Infinitesimally invariant distributions
Theorem 3.1. Let Assumptions (a1)-(a3) hold and let φ be given by (3.4). Then π is an infinitesimally invariant distribution of any solution (X_t)_{t≥0} of (1.3).
The proof of Theorem 3.1 will be clearer if we first point out the primary ideas behind Assumptions (a1)-(a3).

Remark 3.2. Assumptions (a1) and (a2) make sure that the process (X_t)_{t≥0} stays in the open half-line (0, ∞) if E = (0, ∞). The former does so by allowing only upward jumps, while the latter guarantees that (X_t)_{t≥0} cannot drift onto 0, as we will see in the proof below. Clearly, Assumption (a3) only becomes relevant if E = R, and it ensures that our choice of the drift coefficient in (3.4) is well-defined. Note that it can be weakened if σ² = 0, as π ∈ W^{1,1}_loc(0, ∞) is sufficient but not necessary for (ρ * π)′ to be well-defined. However, since in this article we mostly discuss processes with paths of bounded variation, we choose to omit various special cases for the sake of clarity.
Proof of Theorem 3.1. Denote by O ⊂ R the state space of (X_t)_{t≥0}. In order to show that O = E we prove that X_t ∈ E for all t ≥ 0 if X_0 ∈ E. As this is trivially true for E = R, we show it only for E = (0, ∞). In this case (L_t)_{t≥0} is a spectrally positive compound Poisson process by (a1). Thus, (X_t)_{t≥0} cannot exit E via jumps. We are going to show that (X_t)_{t≥0} cannot exit via drift either. If no jump of (L_t)_{t≥0} interrupts the path of (X_t)_{t≥0}, then t ↦ X_t is monotonically decreasing and follows the autonomous differential equation x′ = φ(x) = −(µ_s * π)(x)/π(x). Separation of variables yields that the time T it takes for (X_t)_{t≥0} to drift from x to x′ ∈ [0, x] is given by

T = ∫_{x′}^{x} π(u) / (µ_s * π)(u) du.

By (a1) and (a2), (µ_s * π)(x) ≤ c x π(x) for some constant c > 0, and thereby T ≥ (1/c) ln(x/x′) for all x > 0. Hence, (X_t)_{t≥0} cannot drift onto 0 in finite time. Therefore, O = E in this case as well.
We now return to the general case. A straightforward application of Itô's lemma and the Lévy-Itô decomposition, similar to [19, Thm. 2.50], yields the following representation of the pointwise generator.

Invariant distributions
In general, proving that an infinitesimally invariant distribution is an invariant distribution is hard. The best-case scenario occurs when (X_t)_{t≥0} is a Feller process and the test functions constitute a core of the pointwise generator of (X_t)_{t≥0}. In this case, infinitesimally invariant and invariant are equivalent notions, cf. [11]. Although there exist easily verifiable conditions on the drift coefficient φ and on (L_t)_{t≥0} such that a solution of (1.3) is a Feller process (cf. [10]), these have some drawbacks. The fact that φ is typically required to be continuous and to fulfill a linear growth condition, i.e. |φ(x)| ≤ C(1 + |x|) for some C > 0, excludes many interesting cases. Moreover, even if (X_t)_{t≥0} is a Feller process, we are still left with the question of whether the test functions form a core. Finding conditions for this to be the case is an open problem (cf. [3]) which, to the best of our knowledge, has not yet been solved.

Remark 3.3. Neither the article [5] on Lévy Langevin dynamics nor the original article [20] on FLMC provides an argument as to why the considered target measures are invariant for the respective processes. The approach chosen in both articles revolves around finding a stationary solution of Kolmogorov's forward equation for the underlying SDE (1.3). This equation, also known as the Fokker-Planck equation, is inherently connected to invariant distributions, as any weak stationary solution of it can be associated with an infinitesimally invariant measure of a solution (X_t)_{t≥0} of (1.3). In the aforementioned articles it is suggested that the transition densities p(t, x, y) of (X_t)_{t≥0}, defined via

P^x(X_t ∈ B) = ∫_B p(t, x, y) dy,  B ∈ B(R),
solve the associated Kolmogorov forward equation. For many processes this is true, e.g. for Feller diffusions (cf. [8]), to name just one class. However, for SDEs (1.3) with general Lévy noise we were not able to find a reference with a rigorous proof of this claim. If it were indeed true, then any invariant measure π of (X_t)_{t≥0} would necessarily be a stationary solution of Kolmogorov's forward equation. Moreover, both articles lack an argument as to why the stationary solution of Kolmogorov's forward equation is unique. Although in [20] another article (cf. [18]) is cited on this topic, in said reference uniqueness is merely argued heuristically but not proved.
To ensure methodological rigor, we present in this section a different way of showing that π is an invariant distribution of (X_t)_{t≥0}, and as such even unique. We consider this approach in the special case of E = (0, ∞) with (L_t)_{t≥0} a compound Poisson process, but similar results can be achieved in other frameworks by adjusting the individual steps in a suitable way. This will also be discussed in Section 3.4 below.
Let us now briefly describe our setting. Call a sequence (x_i)_{i∈Z} ⊂ (0, ∞) with x_i < x_{i+1} for all i ∈ Z, and such that 0 is the unique accumulation point of (x_i)_{i∈Z}, a partition of the open interval (0, ∞). We call a function f ∈ L^1_loc(0, ∞) piecewise weakly differentiable if there exists a partition (x_i)_{i∈Z} such that f restricted to (x_i, x_{i+1}) is weakly differentiable for every i ∈ Z. Let (L_t)_{t≥0} be a Lévy process in R and let π(dx) = π(x)dx be an absolutely continuous distribution on (0, ∞). Our assumptions are as follows:

(b1) π is a positive, piecewise weakly differentiable function, and there exist constants C, C′, α > 0 such that lim_{x→∞} π(x)e^{αx} = C and ∫_0^x π(z)dz ≤ C′ π(x) x for x ≪ 1.
(b2) (L_t)_{t≥0} is a spectrally positive compound Poisson process, i.e. a Lévy process with characteristic triplet (0, 0, µ) such that supp µ ⊂ R_+ and ∫_0^∞ (1 ∨ z) µ(dz) < ∞.

Note that our standing assumptions (a1)-(a3) are direct consequences of (b1) and (b2). In this setting, the drift coefficient given by (3.4) reduces to (3.7), and it is easy to see that φ(x) ∈ (−∞, 0) for all x > 0. Sometimes it will be advantageous to write L_t = Σ_{i=1}^{N_t} ξ_i, where (N_t)_{t≥0} is a Poisson process with intensity |µ| and (ξ_i)_{i∈N} is a sequence of i.i.d. random variables distributed according to µ/|µ|. Note that Eξ_1 < ∞ by (b2). With the following theorem we show that, under (b1) and (b2), a solution (X_t)_{t≥0} of (1.3) has the unique invariant distribution π if, additionally, one of the two conditions (c1) and (c2) is met.

Theorem 3.4. Assume that (b1) and (b2) hold, and let (X_t)_{t≥0} be a solution of (1.3) with φ as in (3.7). Then (i) (X_t)_{t≥0} is positive Harris recurrent, and (ii) any invariant distribution of (X_t)_{t≥0} is an infinitesimally invariant distribution of (X_t)_{t≥0}.
The proof of Theorem 3.4 is presented in Section 4, and it is divided into several steps. The first assertion is shown using the Foster-Lyapunov method of [12]-[14], while the second assertion is a simple application of [2, Cor. 5.4]. Under (c1) we show Theorem 3.4 (iii) via techniques from the theory of ordinary differential equations. If instead (c2) holds, we approximate (X_t)_{t≥0} by a sequence of processes fulfilling (c1) to prove the claim.

Limiting distributions
The natural follow-up question to Theorem 3.4 is whether existence and uniqueness of an invariant distribution η for (X_t)_{t≥0} implies that η is a limiting distribution. This property of (X_t)_{t≥0}, i.e. the existence of a limiting distribution, is called ergodicity. As before, π(dx) = π(x)dx, E = (0, ∞), and (L_t)_{t≥0} is a spectrally positive compound Poisson process.

Corollary 3.5. Assume (b1) and (b2) hold, and let (X_t)_{t≥0} be a solution of (1.3) with φ as in (3.7). Further assume that some skeleton chain of (X_t)_{t≥0} is irreducible, i.e. there exists ∆ > 0 such that for all B ∈ B((0, ∞)) with λ_Leb(B) > 0 and all x ∈ (0, ∞) there exists n ∈ N such that P^x(X_{n∆} ∈ B) > 0. Then (X_t)_{t≥0} is ergodic.
Proof. Without loss of generality we assume µ_1 = 0, since otherwise we may simply condition on the event that the jumps are only sampled from µ_2. Let B ∈ B((0, ∞)) with λ_Leb(B) > 0. Our goal is to show that for all x ∈ (0, ∞) there exists n ∈ N such that (3.8) holds. It suffices to show (3.8) only for sets B with inf B > 0, since for arbitrary B ∈ B((0, ∞)) with λ_Leb(B) > 0 there exist 0 < a < b such that λ_Leb(B ∩ (a, b)) > 0. Moreover, it also suffices to consider only x < inf B and n = 1. This is due to the fact that for arbitrary x ∈ (0, ∞) and m ∈ N we obtain P^x(X_m ∈ B) ≥ P^x(X_m ∈ B, N_{m−1} = 0), where we recall that (N_t)_{t≥0} is the Poisson process counting the jumps of (L_t)_{t≥0}, and therefore also of (X_t)_{t≥0}.
As φ(x) < 0 for all x > 0, we may choose m large enough such that X^x_{m−1} < inf B on {N_{m−1} = 0}, and consider X^x_{m−1} as a new starting point. Thus, let 0 < x < inf B. In the following we condition on the event that exactly one jump occurs until t = 1. Denote Y_t := (X^x_t | N_1 = 1). It holds that P(N_1 = 1) = c for some c > 0. Further, denote by T ∈ (0, 1) the uniformly distributed time of the jump. We show that the joint cumulative distribution function of (T, Y_1) is strictly monotone on (0, 1) × (x, ∞) in both arguments. Let 0 < t < t′ < 1 and y ∈ (x, ∞). We obtain that the joint cumulative distribution function strictly increases from (t, y) to (t′, y). Indeed, if T ∈ (t, t′] and additionally ξ_1 ≤ x − Y_{t−}, then Y_1 < y, and this event has positive probability. Now, let t ∈ (0, 1) and x < y < y′ < ∞. We note that for every t ∈ (0, 1) there exists some interval I ⊂ (0, ∞) such that Y_1 ∈ (y, y′] if T = t and ξ_1 ∈ I. This is due to the fact that the paths of (X_t)_{t≥0} between two jumps are continuous and strictly decreasing. Moreover, since Y_1 depends continuously on T and ξ_1, there exist ε > 0 and an interval I such that Y_1 ∈ (y, y′] whenever T ∈ (t − ε, t] and ξ_1 ∈ I, and this event has positive probability by the assumption on µ.
As both T and Y_1 clearly have no atoms in (0, 1) and (x, ∞), respectively, there exists a joint density function f_{(T,Y_1)} of (T, Y_1) on (0, 1) × (x, ∞) which is strictly positive. Hence,

P^x(X_1 ∈ B) ≥ P(N_1 = 1) P(Y_1 ∈ B) > 0.

This, together with Corollary 3.5, concludes the proof.

Examples
In this section we illustrate Theorem 3.4 with various examples by sampling from (1.3). To this end, we first compute a realization of the path of the driving noise (L_t)_{t≥0}. With (L_t)_{t≥0} being a compound Poisson process this is straightforward. It then remains to solve a (deterministic) differential equation between the jumps, which is done via the classic Euler method.
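The scheme just described can be sketched as follows: jump times and sizes of the compound Poisson noise are generated first, and between consecutive jumps the deterministic equation x′ = φ(x) is advanced with the Euler method. The drift phi = −x used in the usage line is a placeholder; in the examples below it would be the coefficient (3.7), evaluated numerically.

```python
import numpy as np

def llmc_path(phi, x0, T, lam, jump_sampler, dt=1e-3, rng=None):
    """Simulate dX_t = phi(X_t) dt + dL_t with compound Poisson noise:
    Euler steps between jumps, plus the jump sizes at Poisson event times."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = rng.poisson(lam * T)
    jump_times = np.sort(rng.uniform(0.0, T, size=n))
    jump_sizes = jump_sampler(rng, n)
    ts, xs = [0.0], [x0]
    t, x = 0.0, x0
    for tau, xi in zip(jump_times, jump_sizes):
        while t + dt < tau:               # Euler steps up to the next jump
            x += dt * phi(x)
            t += dt
            ts.append(t); xs.append(x)
        x += (tau - t) * phi(x) + xi      # final partial step, then the jump
        t = tau
        ts.append(t); xs.append(x)
    while t + dt < T:                     # drift after the last jump
        x += dt * phi(x)
        t += dt
        ts.append(t); xs.append(x)
    return np.array(ts), np.array(xs)

# Placeholder drift phi(x) = -x and Exp(1) jumps, both illustrative choices.
ts, xs = llmc_path(phi=lambda x: -x, x0=1.0, T=5.0, lam=1.0,
                   jump_sampler=lambda r, n: r.exponential(1.0, n))
```

With a negative drift and only upward jumps the simulated path stays in (0, ∞), mirroring the state-space discussion of Section 3.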
Example 3.7 (double-well). In [20] it is pointed out that sampling from a target distribution π(dx) = π(x)dx with two separated modes is challenging for classic LMC. The lower the values of π between the modes, the longer it takes on average for the continuous Langevin diffusion to move from one mode to the other. This issue can be circumvented by allowing jumps. Take the double-well density built from x(x − 4)(x − 6.02)(x − 10) + 0.5, x > 0, which is taken from [20, Sec. 4] but shifted to the right such that both modes are contained in (0, ∞). As driving noise we choose a Lévy process (L_t)_{t≥0} with characteristic triplet (0, 0, µ) where µ(dx) = e^{−x}dx + δ_4 + 2δ_8.
Clearly, conditions (b1), (b2), and (c2) are fulfilled. Thus, π is invariant for a solution (X_t)_{t≥0} of (1.3) with φ as in (3.7) by Theorem 3.4. Further, since Lemma 3.6 applies, π is even limiting. We demonstrate this example in Figure 1. Note that, in general, there is no closed-form expression for φ due to the convolution term appearing in its definition. Hence, we use numerical integration to evaluate φ for this and the following two examples.
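The numerical integration mentioned above can be done, for instance, with a simple trapezoidal rule for convolutions of the form (g * π)(x) = ∫_0^x g(x − y) π(y) dy; the concrete g, π and grid below are illustrative assumptions chosen so that the result has a closed form to check against, not the expressions of (3.7).

```python
import numpy as np

def trapz(vals, ys):
    """Composite trapezoidal rule for samples vals on the grid ys."""
    return 0.5 * np.sum((vals[1:] + vals[:-1]) * np.diff(ys))

def convolve_on_grid(g, pi, xs):
    """Approximate (g * pi)(x) = int_0^x g(x - y) pi(y) dy for each x in xs."""
    out = np.empty_like(xs)
    for i, x in enumerate(xs):
        ys = xs[: i + 1]
        out[i] = trapz(g(x - ys) * pi(ys), ys)
    return out

# Illustrative check: g(z) = e^{-z} and pi(y) = e^{-y} give (g * pi)(x) = x e^{-x}.
xs = np.linspace(1e-6, 10.0, 2001)
num = convolve_on_grid(lambda z: np.exp(-z), lambda y: np.exp(-y), xs)
exact = xs * np.exp(-xs)
```

On a grid with spacing h the trapezoidal error is of order h², which is more than sufficient for evaluating the drift inside the Euler scheme.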
Example 3.9 (Dresden Frauenkirche). To illustrate that our result also covers target densities with a lot of detail, we consider the density π shown in Figure 2 (left), which represents the silhouette of the Dresden Frauenkirche, continued by an exponential tail. We manufactured π in such a way that (b1) is met. As driving noise we choose a spectrally positive Lévy process with characteristic triplet (0, 0, µ). Let (X_t)_{t≥0} be a solution of (1.3) with φ as in (3.7). As for both prior examples, π is clearly invariant and limiting for (X_t)_{t≥0}. The density function, the sampled distribution, and an exemplary sample path of (X_t)_{t≥0} can be seen in Figure 2.
Taking a closer look, we see that the process (X_t)_{t≥0} slows down considerably upon entering the interval (0, 7.5), on which most of the mass of π concentrates. In general, the drift coefficient takes large values in areas of small mass and small values in areas of high mass. This stems from π appearing in the denominator of (3.7), and can be observed by inspecting the slopes of the sample path in Figure 2. Moreover, the process becomes slower the closer it gets to the origin. On the one hand, this is due to Assumption (b1) (and (a2), respectively), which ensures that 0 cannot be reached in finite time. On the other hand, this slowing down is caused by the convolution with the signed tail function µ_s in the numerator of (3.7). This is reasonable: because jumps go only upwards, it is less likely for (X_t)_{t≥0} to reach the area of the left half of the silhouette (approximately the interval (0, 3.75)) than the area of its right half. But since both sides are symmetric, the drift must compensate for that.

Proof of Theorem 3.4
In the following, whenever constants C, C′ or α appear in a proof, we mean the constants of Assumption (b1).

Proof of Theorem 3.4 (i): Positive Harris recurrence
The Foster-Lyapunov method of [12]-[14] is tailored to processes which cover the whole real line. Hence, we consider the auxiliary process Y_t = s(X_t), where s : (0, ∞) → R is a smooth, strictly monotone function such that s(x) = ln(x) for x ∈ (0, 1 − ε) and s(x) = x for x ≥ 1. Further, let (X^{(m)}_t)_{t≥0} be the unique strong solution of the SDE (1.3) with coefficients truncated via h_m. Clearly, this construction implies that for all m ∈ N the truncated and the original process coincide as long as X_t ∈ O_m. It is typical for the Foster-Lyapunov method that one only requires a single norm-like function (sometimes also called a Foster-Lyapunov function) which fulfills a certain inequality. Our particular choice is presented in the lemma below. In the following, to ease notation, we write y := s(x) for given x ∈ (0, ∞), and f_0(x) := f(y) = f(s(x)).
Proof. Fix m ∈ N. Itô's formula yields (4.10), where µ is the jump measure of (L_t)_{t≥0}. To verify that the jump measure may be replaced by its compensator under the expectation, and to subsequently swap the order of integration, we need some estimates. Clearly, f_0(x + h_m(x)z) − f_0(x) = 0 for all z > 0 and x ∉ [e^{−m−1}, m + 1]. On the other hand, for all x ∈ [e^{−(m+1)}, m + 1] there exists M > 0 such that |f_0(x + h_m(x)z) − f_0(x)| ≤ M(1 ∨ z) for all z > 0, by the definition of f_0. Hence, by [19, Thm. 2.21] and the fact that ∫_0^∞ (1 ∨ z) µ(dz) < ∞ by Assumption (b2), µ(·, ds, dz) may be replaced by ds µ(dz) under the expectation in (4.10). Applying Fubini's theorem and reversing the space transform, i.e. going back to (Y_t)_{t≥0}, we obtain (4.11) with the function g defined there. This function is clearly measurable. We observe that the integral term is continuous in y and vanishes for |y| ≥ m + 1. Therefore, g is bounded and Tonelli's theorem is applicable, yielding

E^y f(Y^{(m)}_t) = f(y) + E^y ∫_0^t g(Y^{(m)}_s) ds

for all y ∈ R and all t ≥ 0. Hence, f ∈ D(G_m) for all m ∈ N. This completes the proof, as (4.9) follows from the definition of the extended generator and upon realizing that the representation in (4.11) agrees with (4.9) for all y ∈ (−m, m).
The second key ingredient of the Foster-Lyapunov method is the following: a set K ⊂ R is called petite for a Markov process (Y_t)_{t≥0} if there exist a distribution a on (0, ∞) and a non-trivial measure ϕ on B(R) such that for all y ∈ K and B ∈ B(R)

∫_0^∞ P^y(Y_t ∈ B) a(dt) ≥ ϕ(B).

Proof. We start with some helpful notation. For y ∈ R denote by q_y(·) the solution of the autonomous differential equation (4.12), i.e. the deterministic flow started in y. Then q_y(t) represents the (deterministic) state Y^y_t under the assumption that no jump occurs in the time interval [0, t], that is, N_t = 0. The inverse function q_y^{−1}(y′) exists since φ(x) < 0 for all x > 0. We note that it represents the time it takes to drift from y > 0 to y′ ∈ (0, y), and it is hence decreasing in y′ and increasing in y. Let K ⊂ R be compact; without loss of generality assume K = [k_1, k_2]. Let a(dt) = e^{−t} dt and ϕ(dz) = c 1_{(k_1−1, k_1)}(z) dz for some c > 0 which is yet to be chosen. Let y ∈ K and B ∈ B(R). Using P(N_t = 0) = e^{−t|µ|} we bound ∫_0^∞ P^y(Y_t ∈ B) a(dt) from below by a chain of five inequalities. For the third inequality we substitute z := q_y(t) and use the fact that for all y, k_1 ∈ R it holds that sup{|q′_y(t)| : t ≤ q_y^{−1}(k_1 − 1)} < ∞; indeed, this is implied by (4.12) and the properties of φ and s. The fourth inequality is due to the reduction of the area of integration, while the fifth inequality uses the monotonicity properties of q_y^{−1} described above. Lastly, choosing c accordingly yields the lower bound ϕ(B) and completes the proof.

Finally, we are ready to prove the first claim of Theorem 3.4.

Proof of Theorem 3.4 (i).
We show that there exist positive constants c, d > 0 and a closed petite set K ⊂ R such that

G_m f(y) ≤ −c + d 1_K(y)   (4.13)

for all m ∈ N and y ∈ (−m, m). Then [14, Thm. 4.2] implies that (Y_t)_{t≥0} is positive Harris recurrent, and therefore (X_t)_{t≥0} is positive Harris recurrent as well.
Clearly, the function G_m f is continuous and bounded on (−m, m). Hence, (4.13) follows if we can show that lim sup_{y→±∞} G_m f(y) ≤ −c for some c > 0. We start with y → +∞. Note that for y ≫ 1 we have x = s^{−1}(y) = y and hence, on the one hand, f′_0(y) = 1, and, on the other hand, a corresponding bound for every arbitrary but fixed M > 0. Thus, together with Assumption (b1), and since M > 0 was arbitrary, this yields

lim sup_{y→+∞} φ(y) f′_0(y) ≤ −∫_{(0,∞)} µ_s(z) dz = −Eξ_1.

Now, for the second term of (4.14) an analogous estimate holds for x = y ≫ 1. Consequently, there exists c > 0 such that G_m f(y) < −c < 0 for y ≫ 1. Next, consider the behavior for y → −∞, and start with the observation that for y ≪ −1 one has x = s^{−1}(y) = e^y and f′_0(x) = e^{−y}. With the definition of φ and Assumption (b1) we obtain |φ(e^y)| ≤ C′|µ| e^y for y ≪ −1. Therefore, φ(e^y) f′_0(e^y) is bounded for y ≪ −1. Finally, to find a suitable estimate for the second term of (4.14) for y ≪ −1, we fix M > 0 such that µ([M, ∞)) > 0. Observe that for y ≪ −1 it holds that f_0(e^y + z) − f_0(e^y) < 0 for all z ∈ (0, M). Further, there exists M′ > 0 such that f_0(e^y + z) < M′ + z for all z ∈ [M, ∞). We then compute the resulting estimate for the integral term, and therefore G_m f(y) < −c for y ≪ −1 with the same c as above. This completes the proof.

Proof of Theorem 3.4 (ii): Invariant distributions are infinitesimally invariant
Proof of Theorem 3.4 (ii). The claim follows from [2, Cor. 5.4] if we can show that the defining identity of the extended generator holds for all test functions f ∈ C_c^∞((0, ∞)) and all t ≥ 0. Analogously to the proof of Lemma 4.1, a straightforward application of Itô's formula yields precisely this.

Proof of Theorem 3.4 (iii): Uniqueness of the invariant distribution
For the third assertion we require one of the additional assumptions. As described above, we start with Assumption (c1), i.e. there exists n ∈ N such that supp µ ⊂ (1/n, n).
Proof of Theorem 3.4 (iii) under (c1). It has been shown in [2, Thm. 4.2] that any infinitesimally invariant measure η of a solution (X_t)_{t≥0} of (1.3) necessarily solves the distributional equation (4.15) on (0, ∞). To show that there exists only one probability distribution solving (4.15), we first need some regularity properties of φ. A straightforward calculation yields that for all x ≥ 0 the representation (4.16) holds. As π is piecewise weakly differentiable and π(x) > 0 for x > 0, it follows that 1/π is piecewise weakly differentiable as well. Thus, φ is piecewise weakly differentiable w.r.t. the same partition as π, since the numerator of the right-hand side of (4.16) is the primitive of a locally integrable function, and as such contained in W^{1,1}_loc(0, ∞). Further, as φ(x) < 0 for all x > 0, we infer that at least 1/φ ∈ L^1_loc(0, ∞). This property of φ allows us to transform (4.15) into (4.17). We note that the right-hand side of (4.17) defines a Schwartz distribution (cf. [2, Lem. 2.2]) if η is a real-valued Radon measure, which we may assume as we are only looking for solutions which are probability distributions. More importantly, in this case the right-hand side of (4.17) is even a Schwartz distribution of order 0, i.e. it can be identified with some real-valued Radon measure on (0, ∞). In summary, the distributional derivative of η can be identified with a real-valued Radon measure, which implies that η itself can be identified with a locally integrable function. But now, if we insert a locally integrable function η into the right-hand side of (4.17), we obtain a locally integrable function plus a discrete measure with atoms at the discontinuities of φ. Integrating on both sides of (4.17) tells us that any solution of (4.17) satisfies (4.18). Using that µ * (H′)(x) = 0 for x ∈ (0, 1/n], we obtain from (4.18) the equation (4.19), where F(x) := π((0, x]) is the cumulative distribution function of our target distribution π. This is easily seen with the definition of φ. For x > 1/n, Equation (4.18) reads as (4.20). From Assumption (c1) it follows that (µ * f)(x) depends only on the values of f on (x − n, x − 1/n), for all 0 < x < b and any function f. Consider Equation (4.20) on [m/n, (m + 1)/n] for some m ∈ N. As initial condition we assume H(x) = c_3 F(x) for all x ∈ (0, m/n] and some c_3 ∈ R. This results in an equation for which Carathéodory's theorem again ensures a unique solution. Hence, by induction over m and subsequent normalization, it follows that π is the unique infinitesimally invariant distribution of (X_t)_{t≥0}. Finally, Theorem 3.4 (i), that is, positive Harris recurrence, implies existence and uniqueness of an invariant distribution (cf. [14, Sec. 4]). But this unique distribution must be π due to Theorem 3.4 (ii). This proves the claim.

Proof of Theorem 3.4 (iii) under (c2). Between jumps we compare the deterministic flow q of (X_t)_{t≥0} with the flows q_n of the approximating processes, given by q(t) = a + ∫_0^t φ(q(s)) ds and q_n(t) = a + ∫_0^t φ_n(q_n(s)) ds, respectively. Hence, for all t ∈ [0, t_1),

|q(t) − q_n(t)| ≤ c/n + ℓ ∫_0^t |q(s) − q_n(s)| ds

for some constants ℓ, c > 0.
This is due to the estimate in (4.21), the fact that q is strictly decreasing, and the piecewise Lipschitz continuity of φ. For the latter we note that φ ≤ φ_n, which implies q(t) ≤ q_n(t) for all t ∈ [0, t_1]. In other words, if q reaches a discontinuity of φ at t_1, i.e. q(t_1) is a point of the partition, then it reaches it ahead of q_n, which allows the estimate above. Grönwall's inequality (cf. [6, Cor. I.6.6]) then yields a bound valid for all t ∈ [0, t_1) which vanishes as n → ∞.
Our strategy is now to iterate this step until we surpass the time T. By design there are two cases: q either jumps at t_1 or hits a discontinuity of φ. It can be ruled out that both events occur at the same time, as the probability of this happening is zero. In the same way we exclude q jumping onto a discontinuity of φ, because q is strictly decreasing, jumps are space-homogeneous, and the set of discontinuities of φ has no accumulation points in (0, ∞). The first case, i.e. q jumps at t_1, is simple. Clearly, there exists N ∈ N such that for all n > N it holds that ∆L_{t_1} = ∆L^{(n)}_{t_1}. Consequently, for all ε > 0 there exists N′ ∈ N such that for all n > N′ it holds that |q(t_1) − q_n(t_1)| < ε. Choosing ε small enough ensures that q and q_n both jump into the same interval (x_i, x_{i+1}) of the partition w.r.t. which φ is piecewise Lipschitz continuous. This yields an estimate of the same form as above for all t ∈ [t_1, t_2), with some (possibly different) constants ℓ, c > 0. Applying Grönwall's inequality again concludes this iteration step. For the second case, i.e. if q hits a discontinuity of φ at t_1, we argue differently. We denote by t_2(n) := inf{t ≥ t_1 : q_n(t) = q(t_1)} the time at which q_n also reaches the discontinuity of φ at q(t_1). We require some observations: first, t_2(n) < ∞ for all n ∈ N large enough, since φ_n < 0 is bounded away from zero there.

Driving noises of infinite activity. In this case the third assertion, that is Theorem 3.4 (iii), has to be shown differently. This is because there exists no subinterval of (0, ∞) that cannot be reached by jumps, and because a non-zero number of jumps occurs almost surely during every time interval. Thus, the proofs of Theorem 3.4 (iii) under (c1) and (c2), respectively, do not apply in this case.
Target distributions with full support. One might wish to extend the results of Theorem 3.4 to the case E = R. However, just like in the previous paragraph, the approach used in the proof of Theorem 3.4 (iii) fails due to the fact that there exists no subinterval of R that cannot be reached by jumps. Therefore, the proof of the uniqueness of the solution of (3.6) requires different arguments.

Target measures with disconnected supports
Allowing only E = R or E = (0, ∞) seems restrictive, as the ability to cross gaps is one of the main advantages of the presence of jumps. Intending to allow disconnected supports of the target measure π, one has to assume three things. With those three assumptions one can show that, apart from E = R and E being some half-line, the only remaining option is that E is periodic, that is, there exists p > 0 such that E + p = E. However, if E ≠ R is periodic and (X_t)_{t≥0} with state space O = E solves (1.3), then (X_t)_{t≥0} cannot be positive Harris recurrent, regardless of the drift coefficient φ. The reason for this is simple: the jumps of (X_t)_{t≥0} are space-homogeneous, and E consists of countably many intervals of the same length that can only be connected by jumps. Thus, the mass of an invariant measure concentrated on each of these segments is the same. Therefore, no invariant measure can be finite (apart from the trivial measure).
Target measures with atoms. An invariant measure π with π({x_0}) > 0 for one or more x_0 ∈ R can only be achieved by a solution (X_t)_{t≥0} of (1.3) if (X_t)_{t≥0} comes to a halt at x_0. One possible solution might be to set φ(x_0) = 0. At least heuristically this makes sense, considering that the denominator in the original definition (3.4) of φ is the density function of π. However, extending Theorem 3.4 to this case needs a new idea, since the current proof relies on the fact that any solution of (3.6) can be associated with a locally integrable function.
Target measures with arbitrary tails. By Assumption (b1) we require π to have an exponential tail. This is mostly needed in the proof of Theorem 3.4 (i). As with heavy-tailed jumps, using a more sophisticated norm-like function will most likely enable us to consider target measures π for which only π(x) ≤ c e^{−αx} for all x ≫ 1 and some constants c, α > 0.
In case π has a heavy tail, that is, when π(x) ≤ c x^{−(1+α)} for all x ≫ 1 and some constants c, α > 0, it is not clear whether (X_t)_{t≥0} is positive Harris recurrent or not.

Figure 1:
Figure 1: Illustration of Example 3.7 (left) and Example 3.8 (right). The target densities are displayed in the top images, while the bottom images show the respective histograms with a sample size of N = 50000 each.

Figure 2:
Figure 2: From left to right: the target density function π, a histogram of the sampled distribution with sample size N = 50000, and an exemplary sample path of (X_t)_{t≥0}.
some constant. Clearly, (X_t)_{t≥0} is positive Harris recurrent if and only if (Y_t)_{t≥0} is positive Harris recurrent. Central to this method are the so-called norm-like functions, whose precise definition requires additional notation: for m ∈ N denote O_m := (e^{−m}, m) and choose h_m ∈ C^∞