
Quadratic hedging: an actuarial view extended to solvency control


An investment strategy or portfolio is uniquely determined by an exposure process specifying the number of shares held in risky assets at any time and a cost process representing deposits into and withdrawals from the portfolio account. The strategy is a hedge of a contractual payment stream if the payments are currently deposited on/withdrawn from the portfolio account and the terminal value of the portfolio is 0 (ultimate settlement of the contractual liabilities). The purpose of the hedge is stated as an optimization criterion for the investment strategy. The purpose of the present paper is two-fold. Firstly, it reviews the core of quadratic hedging theory in a scenario where insurance risk can partly be offset by trading in available insurance-linked derivatives (e.g. catastrophe bonds or mortality bonds) and relates it to actuarial principles of premium rating and provision of reserves. Working under a martingale measure and some weak integrability conditions allows simple proofs based on orthogonal projections: quadratic hedging theory without agonizing pain. Secondly, it is pointed out that certain quadratic hedging principles lead to the same optimal exposure process but different optimal cost processes, special cases being mean-variance hedging and risk minimization. It is shown that these results are preserved if the value of the portfolio is required to coincide with a given adapted process, a case in point being the capital requirement introduced through regulatory regimes like the Basel accords and Solvency II.



  1. Björk T (2004) Arbitrage theory in continuous time, 2nd edn. Oxford University Press, Oxford

  2. Föllmer H, Sondermann D (1986) Hedging of non-redundant claims. In: Hildenbrand W, Mas-Colell A (eds) Contributions to mathematical economics in honor of Gérard Debreu. North-Holland, pp 205–223

  3. Jeanblanc M, Mania M, Santacroce M, Schweizer M (2012) Mean-variance hedging via stochastic control and BSDEs for general semimartingales. Ann Appl Probab 22:2388–2428

  4. Møller T (2001) Risk-minimizing hedging strategies for insurance payment processes. Finance Stoch 5:419–446

  5. Norberg R (2013) Optimal hedging of demographic risk in life insurance. Finance Stoch 17:197–222. doi:10.1007/s00780-012-0182-3

  6. Norberg R, Savina O (2012) A quadratic hedging approach to comparison of catastrophe indices. Int J Theor Appl Finance 15(4):20. doi:10.1142/S0219024912500306

  7. Protter P (2004) Stochastic integration and differential equations, 2nd edn. Springer, Berlin

  8. Schweizer M (1991) Option hedging for semimartingales. Stoch Process Appl 37:339–363

  9. Schweizer M (2001) From actuarial to financial valuation principles. Insur Math Econ 28:31–47

  10. Schweizer M (2001) A guided tour through quadratic hedging approaches. In: Jouini E, Cvitanic J, Musiela M (eds) Option pricing, interest rates and risk management. Cambridge University Press, Cambridge, pp 538–574

  11. Schweizer M (2008) Local risk-minimization for multidimensional assets and payment streams. Banach Center Publ 83:213–229



The author thanks the BNP Paribas Cardif Chair “Management de la modélisation” for financial support. The views expressed in this document are the author’s own and do not necessarily reflect those endorsed by BNP Paribas Cardif. Special thanks are due to an anonymous referee or editor whose diligence saved the author from blundering in public.

Corresponding author

Correspondence to Ragnar Norberg.


Appendix 1: Excerpts from stochastic calculus

Filtered probability spaces

Basic measure theoretic probability is taken as a prerequisite. We list here some notions and results in the theory of stochastic processes and sketch their motivating heuristics. Let \((\Omega, \mathcal{F}, \mathbb{P})\) be a probability space. The sigma-algebra \(\mathcal{F}\) is the collection of all subsets of \(\Omega\) that qualify as “events”. An event A occurs \(\mathbb{P}\)-almost surely (a.s.) if \(\mathbb{P}[A] = 1\), and it is a \(\mathbb{P}\)-null set if \(\mathbb{P}[A] = 0\). The qualifier \(\mathbb{P}\) may be omitted when it is clear from the context. The probability space is assumed to be complete, meaning that all subsets of null sets are measurable (hence null sets themselves). Completeness can always be arranged and is needed for purely technical reasons in a context of continuous-time stochastic processes. Two probability measures defined on the same sigma-algebra are equivalent if they have the same null sets.

Suppose the probability space is meant to model certain phenomena that evolve in a random manner over time, commencing at time 0 (say). For each t ≥ 0 let \(\mathcal{F}_t\) be a sub-sigma-algebra of \(\mathcal{F}\) representing all events whose occurrence or non-occurrence can be established at time t. It is assumed that the collection of sigma-algebras \({\bf F} = (\mathcal{F}_t)_{t \geq 0}\) is increasing, which means \(\mathcal{F}_s \subset \mathcal{F}_t\) if s < t (no information is sacrificed at any time), and that it is right-continuous, which means \(\mathcal{F}_t = \bigcap_{u > t} \mathcal{F}_u\) for all t (another purely technical necessity). Then F is called a filtration, and \((\Omega, \mathcal{F}, {\bf F}, \mathbb{P})\) is called a filtered probability space. The smallest sigma-algebra containing all the sigma-algebras \(\mathcal{F}_s\), s < t, is denoted \(\mathcal{F}_{t-}\). It represents the information provided by F before time t.

Stochastic processes

A stochastic process is a collection of random variables, \(X = (X_t)_{t \geq 0}\), representing some real- or vector-valued quantity that develops in a random manner over time. The process X is adapted to F if \(X_t\) is measurable with respect to \(\mathcal{F}_t\) for each t. The interpretation is that the history of X is written by the history F. It is henceforth understood, without further mention, that all processes considered are adapted to F. A real-valued process X is integrable if \(\mathbb{E}|X_t| < \infty\) for all t, and it is square integrable if \(\mathbb{E} X_t^2 < \infty\) for all t. These definitions extend to vector-valued processes by applying them to each component.

Seen as a function of t for a given outcome \(\omega \in \Omega\), \(X(\omega) = (X_t(\omega))_{t \geq 0}\) is called the path of X (at ω). The analytic properties of the paths are essential when it comes to extending the calculus of integration and differentiation to random functions. If \(X_{t-} = \lim_{s \nearrow t} X_s\) exists and \(X_t = \lim_{u \searrow t} X_u\) for all t, then X is said to be right-continuous with left limits (RCLL). If X is RCLL, then \(\Delta X_t = X_t - X_{t-}\) is the jump made by X at time t. Let \(\mathcal{F}_t^X\) be the sigma-algebra generated by \((X_s)_{s \in [0,t]}\), that is, the smallest sigma-algebra with respect to which \(X_s\) is measurable for each s ≤ t. A stochastic process X is F-predictable if it is left-continuous (\(X_{t-} = X_t\) for all t) or is the point-wise limit of left-continuous processes, the interpretation being that the state of X is at any time determined by its course in the strict past.

Stochastic integration

Let X be an RCLL square integrable process and Y a process with measurable paths. If X has paths of bounded variation, then the integral \(\int\nolimits_0^t Y_\tau \, dX_\tau\) can be defined path-by-path as the Stieltjes integral when it exists: it is the limit of integrand-weighted sums of forward increments of the integrator, \(\sum\nolimits_{i=0}^{n-1} Y_{t_{i}} (X_{t_{i+1}} - X_{t_{i}})\), as the partition \(0 = t_0 < t_1 < \cdots < t_{n-1} < t_n = t\) becomes increasingly fine. If X does not have paths of bounded variation, then the limit is taken in an \(L^2\) sense, and the construction works for predictable integrands Y. The two definitions coincide for bounded variation integrators. The resulting stochastic process, denoted \(Y \cdot X\), is called the stochastic integral of Y with respect to X. Its state at time t is denoted in the suggestive manner \((Y \cdot X)_t = \int\nolimits_0^t Y_\tau \, dX_\tau\) or, in differential form,

$$ d(Y \cdot X)_t = Y_t dX_t , $$

where the differentials can be thought of as forward increments in the small time interval \([t, t + dt)\).
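The forward-increment construction can be illustrated numerically. The sketch below is a toy bounded-variation example, not taken from the paper: integrator \(X_t = t^2\) and integrand \(Y_t = t\) on [0, 1], whose Stieltjes integral is \(\int_0^1 \tau \, d(\tau^2) = \int_0^1 2\tau^2 \, d\tau = 2/3\).

```python
# Integrand-weighted sum of forward increments of X over a partition of [0, t]:
# sum_i Y_{t_i} (X_{t_{i+1}} - X_{t_i}), converging to the Stieltjes integral
# as the partition becomes fine.  Toy example: X_t = t^2, Y_t = t on [0, 1].

def forward_stieltjes(Y, X, t, n):
    """Forward-increment sum over the uniform partition of [0, t] with n steps."""
    grid = [t * i / n for i in range(n + 1)]
    return sum(Y(grid[i]) * (X(grid[i + 1]) - X(grid[i])) for i in range(n))

approx = forward_stieltjes(lambda s: s, lambda s: s * s, 1.0, 10_000)
assert abs(approx - 2 / 3) < 1e-3            # converges to the exact value 2/3
```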


An RCLL integrable real-valued process M is an \({({\bf F},{\mathbb{P}})}\)-martingale if \({{\mathbb{E}} [M_t|{\mathcal{F}}_s] = M_s}\) for 0 ≤ s ≤ t. The martingale property can be expressed in differential form as

$$ {{\mathbb{E}}} [dM_t | {{\mathcal{F}}}_{t-}] =0 . $$

If \(M_\infty\) is an integrable random variable, then the process M defined by \({M_t = {\mathbb{E}} [ M_\infty | {\mathcal{F}}_t]}\) is called the \({({\bf F},{\mathbb{P}})}\)-martingale associated with \(M_\infty.\) Integrability of M and its martingale property are simple consequences of the tower property of conditional expectation: for s < t

$$ {{\mathbb{E}}} [M_t|{{\mathcal{F}}}_s] = {{\mathbb{E}}} [{{\mathbb{E}}} [ M_\infty | {{\mathcal{F}}}_t] |{{\mathcal{F}}}_s] = {{\mathbb{E}}} [ M_\infty |{{\mathcal{F}}}_s] = M_s . $$

The martingale M is said to be closed by the random variable \(M_\infty.\) If \(M_\infty\) is square integrable, then the martingale associated with it is square integrable, a consequence of Jensen’s inequality and the tower property:

$$ {{\mathbb{E}}} M_t^2 = {{\mathbb{E}}} \left[ \left( {{\mathbb{E}}} [ M_\infty| {{\mathcal{F}}}_t] \right)^2 \right] \leq {{\mathbb{E}}} \left[ {{\mathbb{E}}} [ M_\infty^2| {{\mathcal{F}}}_t] \right] = {{\mathbb{E}}} M_\infty^2 . $$
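The tower-property argument and the Jensen bound can be checked exactly in a minimal discrete model. The sketch below uses a purely illustrative filtration, not from the paper: two fair coin flips, with \(M_\infty\) the total number of heads, and exact rational arithmetic.

```python
from itertools import product
from fractions import Fraction

# Toy filtered space (an illustrative assumption): two fair coin flips,
# M_infinity = total number of heads.  The associated martingale
# M_t = E[M_infinity | F_t] averages over the flips not yet revealed at time t.
paths = list(product([0, 1], repeat=2))      # outcomes omega = (flip 1, flip 2)
p = Fraction(1, 4)                           # uniform probability of each path

def M(t, omega):
    """E[M_infinity | F_t]: heads seen so far plus 1/2 per unrevealed flip."""
    return Fraction(sum(omega[:t])) + Fraction(2 - t, 2)

# Tower property: E[M_1] = M_0 (= 1 here).
E_M1 = sum(p * M(1, w) for w in paths)
assert E_M1 == M(0, paths[0]) == 1

# Jensen + tower: E[M_1^2] <= E[M_infinity^2]  (5/4 <= 3/2 here).
E_M1_sq = sum(p * M(1, w) ** 2 for w in paths)
E_Minf_sq = sum(p * M(2, w) ** 2 for w in paths)
assert E_M1_sq <= E_Minf_sq
```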

If Y is left-continuous and possesses right-limits, and if M is a square integrable martingale, then the stochastic integral \(Y \cdot M\) is a martingale (Section II.5 in [7]). The intuitive motivation of this result rests on (51) and (52):

$$ {{\mathbb{E}}} [d(Y \cdot M)_t | {{\mathcal{F}}}_{t-}] = {{\mathbb{E}}} [Y_t dM_t | {{\mathcal{F}}}_{t-}] = Y_t {{\mathbb{E}}} [dM_t | {{\mathcal{F}}}_{t-}] = 0 . $$

If the time horizon is a finite fixed interval [0, T], then integrability conditions are usually easy to check. In particular, a martingale \((M_t)_{t \in [0,T]}\) is closed by \(M_T\), which is a random variable whose integrability properties can typically be established through what is known about its distribution.

Predictable covariance processes

Let M and N be square integrable martingales. Their predictable covariance process, denoted \(\langle M,N \rangle,\) is given by

$$ d \langle M,N \rangle_t = {{{\mathbb{C}}\hbox{ov}}} [d M_t , d N_t | {{\mathcal{F}}}_{t-}] = {{\mathbb{E}}} [d M_t d N_t | {{\mathcal{F}}}_{t-}] . $$

For vector-valued processes M and N it is defined as the matrix process \(\langle {\bf M},{\bf N'} \rangle\) with \(\langle M_i,N_j \rangle\) in row i and column j.

If M is a square integrable martingale, then

$$ {{\mathbb{E}}} \left[\left.\left(M_T - M_t\right)^2 \right| {{\mathcal{F}}}_t\right] = {{\mathbb{E}}} \left[\left. \int\limits_t^T {{\mathbb{E}}}[ (dM_\tau)^2 | {{\mathcal{F}}}_{\tau -} ] \right| {{\mathcal{F}}}_t \right] = {{\mathbb{E}}}\left[\left. \int\limits_t^T d \langle M,M \rangle_\tau \right| {{\mathcal{F}}}_t \right] $$

for fixed times t < T. A heuristic explanation goes by writing

$$ \begin{aligned} {{\mathbb{E}}} [(M_T - M_t)^2 | {{\mathcal{F}}}_t] &= {{\mathbb{E}}} \left[\left. \left( \int\limits_t^T dM_\tau \right)^2 \right| {{\mathcal{F}}}_t \right] = {{\mathbb{E}}} \left[\left. \int\limits_t^T dM_{\sigma} \int\limits_t^T dM_\tau \right| {{\mathcal{F}}}_t \right]\\ &= \int\limits_t^T \int\limits_t^T {{\mathbb{E}}}[ dM_{\sigma} dM_\tau | {{\mathcal{F}}}_t] . \end{aligned} $$

In this double integral the off-diagonal terms (e.g. σ < τ) vanish because

$$ {{\mathbb{E}}}\left[ \left. dM_{\sigma} dM_\tau \right| {{\mathcal{F}}}_t\right] = {{\mathbb{E}}} \left[ \left.{{\mathbb{E}}}[ dM_{\sigma} dM_\tau | {{\mathcal{F}}}_{\tau-}] \right| {{\mathcal{F}}}_t \right] = {{\mathbb{E}}} \left[\left. dM_{\sigma} {{\mathbb{E}}}[ dM_\tau | {{\mathcal{F}}}_{\tau-}] \right| {{\mathcal{F}}}_t \right] = 0 , $$

and what remains are the diagonal terms (σ = τ), which are

$$ {{\mathbb{E}}}[ (dM_\tau)^2 | {{\mathcal{F}}}_t] = {{\mathbb{E}}} [ {{\mathbb{E}}}[ (dM_\tau)^2 | {{\mathcal{F}}}_{\tau-}] | {{\mathcal{F}}}_t] = {{\mathbb{E}}} [ d \langle M,M \rangle_\tau | {{\mathcal{F}}}_t] . $$
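The diagonal argument can be verified exactly in a toy discrete-time model. In the sketch below (an illustrative stand-in for the continuous-time setting) M has independent zero-mean increments \(\xi_i = \pm a_i\) with equal probability, so the predictable variance increment is \(d\langle M,M \rangle_i = \mathbb{E}[\xi_i^2 | \mathcal{F}_{i-1}] = a_i^2\) and the identity reduces to \(\mathbb{E}(M_T - M_0)^2 = \sum_i a_i^2\).

```python
from itertools import product

# Exact enumeration of all sign paths for increments xi_i = +-a_i:
# the off-diagonal cross terms E[xi_i xi_j], i != j, cancel, and the second
# moment of M_T - M_0 equals the accumulated predictable variance sum_i a_i^2.
a = [1, 2, 3]
second_moment = sum((s1 * a[0] + s2 * a[1] + s3 * a[2]) ** 2
                    for s1, s2, s3 in product([-1, 1], repeat=3)) / 2 ** 3
bracket = sum(x * x for x in a)              # sum of d<M,M> increments
assert second_moment == bracket              # both equal 14
```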

For a rigorous introduction to the stochastic calculus underlying this paper, a suitable reference is Chapters 1–4 in [7].

Appendix 2: Supplement to Sect. 4

Proof of (47)

The heuristic argument goes as follows:

$$ \begin{aligned} d \langle {\tilde{M}}^{(f)} , {\tilde{M}}^{(f')} \rangle_t &= {\tilde{\mathbb{E}}} \left[ d {\tilde{M}}_t^{(f)} d {\tilde{M}}_t^{(f')} \big| {{\mathcal{F}}}_{t-} \right] \\ &= {\tilde{\mathbb{E}}} \left[ \int\limits_{z \in {{\mathbb{Z}}}} f(t,z) \left[N(dt,dz) - {\tilde{\nu}}(dt,dz)\right] \left.\int\limits_{z' \in {{\mathbb{Z}}}} f{'}(t,z{'}) \left[N(dt,dz{'}) - {\tilde{\nu}}(dt,dz{'})\right]\, \right| {{\mathcal{F}}}_{t-} \right]\\ &= \int\limits_{z \in {{\mathbb{Z}}}} \int\limits_{z{'} \in {{\mathbb{Z}}}} f(t,z) f{'}(t,z{'}) {\tilde{\mathbb{E}}} [ N(dt,dz) N(dt,dz{'}) | {{\mathcal{F}}}_{t-} ] + o(dt), \end{aligned} $$

where we have used

$$ {\tilde{\mathbb{E}}} [ N(dt,dz) {\tilde{\nu}}(dt,dz') | {{\mathcal{F}}}_{t-}] = {\tilde{\nu}}(dt,dz) {\tilde{\nu}}(dt,dz') = o(dt) . $$

Now, for z ≠ z′, 

$$ {\tilde{\mathbb{E}}} [ N(dt,dz) N(dt,dz') | {{\mathcal{F}}}_{t-} ] = o(dt) $$

(two different catastrophes cannot occur at a time) and, for z = z′, 

$$ {\tilde{\mathbb{E}}}[ N^2(dt,dz) | {{\mathcal{F}}}_{t-}] = {\tilde{\mathbb{E}}}[ N(dt,dz) | {{\mathcal{F}}}_{t-}] + o(dt) = {\tilde{\nu}}(dt,dz) + o(dt) $$

(\(N(dt, dz)\) is essentially zero or one, hence equal to its square). Thus, off-diagonal terms in the “double sum” (54) vanish, and what remains on the diagonal is precisely the integrand in (47).

Derivation of the formulas (48)–(50)

Consider the stochastic process X defined by

$$ X_t = \sum_{i; T_i \leq t} f(T_i,Z_i) = \int\limits_0^t \int\limits_{{{\mathbb{Z}}}} f(\tau,z) N(d \tau,dz) , $$

where \({f: {\mathbb{R}}_+\times{\mathbb{Z}} \mapsto {\mathbb{R}}}\) is measurable. Assume the first three moments of \(X_t\) exist, and denote them by

$$ m_t^{(j)} = {\tilde{\mathbb{E}}} X_t^j , \quad j = 1,2,3. $$

Obviously, these are continuous functions of t. We are going to show that the functions

$$ v_t^{(j)} = \int\limits_0^t \int\limits_{{{\mathbb{Z}}}} f(\tau,z)^j {\tilde{\nu}}(d \tau,dz) , \quad j = 1,2,3, $$

are the mean, variance, and central third moment of \(X_t\). Straightforwardly, the mean is

$$ m_t^{(1)} = v_t^{(1)} . $$

To deal with higher order moments, write \(X_t^j\) as the sum of its increments:

$$ \begin{aligned} X_t^j &= \int\limits_0^t \int\limits_{{{\mathbb{Z}}}} \left( (X_{\tau-} + f(\tau,z))^j - X_{\tau-}^j \right) N(d \tau,dz)\\ &= \sum_{i=0}^{j-1} {j \choose i} \int\limits_0^t \int\limits_{{{\mathbb{Z}}}} X_{\tau-}^i f(\tau,z)^{j-i} N(d \tau,dz) . \end{aligned} $$

Taking expectation, shifting the order of integration and expectation in the last expression, and using the tower property \({{\mathbb{E}} [\,\cdot\,] = {\mathbb{E}} [{\mathbb{E}}[ \,\cdot\, |\, {\mathcal{F}}_{\tau -}]]}\) and the fact that N has independent increments, we obtain

$$ m_t^{(j)} = \sum_{i=0}^{j-1} {j \choose i} \int\limits_0^t m_\tau^{(i)} \int\limits_{{{\mathbb{Z}}}} f(\tau,z)^{j-i} {\tilde{\nu}}(d \tau,dz) = \sum_{i=0}^{j-1} {j \choose i} \int\limits_0^t m_\tau^{(i)} d v_\tau^{(j-i)} . $$

In particular,

$$ \begin{aligned} m_t^{(2)} &= \int\limits_0^t d v_\tau^{(2)} + 2 \int\limits_0^t m_\tau^{(1)} d v_\tau^{(1)} = v_t^{(2)} + 2 \int\limits_0^t v_\tau^{(1)} d v_\tau^{(1)}\\ &= v_t^{(2)} + \left( v_t^{(1)} \right)^2, \end{aligned} $$
$$ \begin{aligned} m_t^{(3)} &= \int\limits_0^t d v_\tau^{(3)} + 3 \int\limits_0^t m_\tau^{(1)} d v_\tau^{(2)} + 3 \int\limits_0^t m_\tau^{(2)} d v_\tau^{(1)} \\ &= v_t^{(3)} + 3 \int\limits_0^t v_\tau^{(1)} d v_\tau^{(2)} + 3 \int\limits_0^t \left( v_\tau^{(2)} + \left( v_\tau^{(1)} \right)^2 \right) d v_\tau^{(1)} \\ &= v_t^{(3)} + 3 \int\limits_0^t d \left( v_\tau^{(1)} v_\tau^{(2)} \right) + 3 \int\limits_0^t \left( v_\tau^{(1)} \right)^2 d v_\tau^{(1)}\\ &= v_t^{(3)} + 3 v_t^{(1)} v_t^{(2)} + \left( v_t^{(1)} \right)^3. \end{aligned} $$

The expressions (55)–(57) show that \(v_t^{(2)}\) and \(v_t^{(3)}\) are precisely the central moments corresponding to \(m_t^{(2)}\) and \(m_t^{(3)}\).
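These relations can be checked numerically for a concrete compound Poisson process. In the sketch below the jump distribution and intensity are illustrative assumptions (jump size Y is 1 with probability 0.7 and 2 with probability 0.3, and the expected number of jumps on [0, t] is 2, so \(v^{(j)} = 2\,\mathbb{E}Y^j\)); the raw moments \(m^{(j)}\) are computed by conditioning on the number of jumps N.

```python
from math import exp, factorial

# Compound Poisson X = Y_1 + ... + Y_N with N ~ Poisson(lam).  Here
# v^(j) = lam * E Y^j, and (55)-(57) say these are the mean, variance and
# central third moment of X.  Raw moments are computed by conditioning on N
# (truncating the Poisson sum at n = 60, whose tail is negligible).
lam = 2.0
mu = [None, 0.7 * 1 + 0.3 * 2, 0.7 * 1 + 0.3 * 4, 0.7 * 1 + 0.3 * 8]  # E Y^j
v = [None] + [lam * mu[j] for j in (1, 2, 3)]

m = [None, 0.0, 0.0, 0.0]
for n in range(61):
    pmf = exp(-lam) * lam ** n / factorial(n)
    s1 = n * mu[1]                                        # E[S_n]
    s2 = n * mu[2] + n * (n - 1) * mu[1] ** 2             # E[S_n^2]
    s3 = (n * mu[3] + 3 * n * (n - 1) * mu[2] * mu[1]
          + n * (n - 1) * (n - 2) * mu[1] ** 3)           # E[S_n^3]
    m[1] += pmf * s1; m[2] += pmf * s2; m[3] += pmf * s3

# (55)-(57): v^(1) is the mean, v^(2) the variance, v^(3) the central third moment.
assert abs(m[1] - v[1]) < 1e-8
assert abs(m[2] - (v[2] + v[1] ** 2)) < 1e-8
assert abs(m[3] - (v[3] + 3 * v[1] * v[2] + v[1] ** 3)) < 1e-8
```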

For each t the moments in (48)–(50) are given by \({\tilde{V}}_t = v_T^{(1)}\) and \({\tilde{V}}_t^{(j)} = v_T^{(j)}\), j = 2, 3, with \(f(\tau,z) = \ell \, 1_{[t,T]}(\tau)\). The first two are straightforward to calculate, and for the third one uses the following elementary result: if \(Y_1, \ldots, Y_k\) are i.i.d. replicates of a random variable Y, then

$$ {{\mathbb{E}}} (Y_1 + \cdots + Y_k)^3 = k {{\mathbb{E}}} Y^3 + 3 k (k-1) {{\mathbb{E}}} Y^2 {{\mathbb{E}}} Y + k (k-1)(k-2) ({{\mathbb{E}}}Y)^3. $$
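This identity can be confirmed by exhaustive enumeration. The sketch below uses a freely chosen toy distribution (Y uniform on {1, 2, 4}, k = 3) and exact rational arithmetic.

```python
from itertools import product
from fractions import Fraction

# Exhaustive check of E(Y_1 + ... + Y_k)^3
#   = k E Y^3 + 3 k (k-1) E Y^2 E Y + k (k-1)(k-2) (E Y)^3
# for a toy distribution: Y uniform on {1, 2, 4}, k = 3 replicates.
vals = [Fraction(v) for v in (1, 2, 4)]
k = 3
p = Fraction(1, len(vals)) ** k              # probability of each k-tuple

lhs = sum(p * (y1 + y2 + y3) ** 3 for y1, y2, y3 in product(vals, repeat=k))

EY = sum(vals) / len(vals)
EY2 = sum(v * v for v in vals) / len(vals)
EY3 = sum(v ** 3 for v in vals) / len(vals)
rhs = k * EY3 + 3 * k * (k - 1) * EY2 * EY + k * (k - 1) * (k - 2) * EY ** 3

assert lhs == rhs                            # exact equality with Fractions
```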


Norberg, R. Quadratic hedging: an actuarial view extended to solvency control. Eur. Actuar. J. 3, 45–68 (2013).


  • Mean-variance hedging
  • Risk minimization
  • Constrained portfolio value
  • Solvency control