Appendix
In this Appendix, we prove the main results of the paper, namely Theorems 2 and 3. Throughout this section, we use the same notation as in Sect. 3. In particular, \(\varPsi _{t}(x,U,\theta )\) is defined by (15), and under Assumption 1 this function is differentiable w.r.t. x. We start with a simple proposition.
Proposition 1
Let the multiscale process \(( Z^N_\theta (t) )_{t \ge 0}\) and the PDMP \(( Z_\theta (t) )_{t \ge 0}\) be as in Sect. 3. Suppose that Assumption 1 holds and \(Z^N_\theta \Rightarrow Z_\theta \) as \(N \rightarrow \infty \). Then,
$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{ \partial }{ \partial \theta } \mathbb {E}( f( Z^{N}_\theta ( T ) ) ) = {\bar{S}}_\theta (f,T) \end{aligned}$$
where
$$\begin{aligned} {\bar{S}}_\theta (f,T)&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \varPsi _{T-t}(x_\theta (t),U_\theta (t),\theta ) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _{k} \varPsi _{T-t}(x_\theta (t),U_\theta (t),\theta ) {\text {d}}t \right] . \end{aligned}$$
(24)
Proof
Let
$$\begin{aligned} S^N_\theta (f,T) = \frac{ \partial }{ \partial \theta } \mathbb {E}( f( Z^{N}_\theta ( T ) ) ) \end{aligned}$$
and analogous to \(\varPsi _t\) (15) define the map \(\varPsi ^N_t\) as
$$\begin{aligned} \varPsi ^N_t (x,U,\theta ) = \mathbb {E}( f( Z^N_\theta (t) ) ), \quad \text {for any} \quad t \ge 0, \end{aligned}$$
where \(Z^N\) is the scaled process describing the multiscale reaction dynamics with (x, U) as its initial state. Due to Theorem 3.1 in (Gupta et al. 2017), we obtain
$$\begin{aligned}&S^N_\theta (f,T) \\&= \sum _{ k =1}^K \mathbb {E}\left( N^{\rho _k + r } \int _0^T \partial _\theta \lambda ^N_k( Z^N_\theta (t) ,\theta ) ( \varPsi ^N_{T-t}( Z^N_\theta (t)+ \zeta ^N_k,\theta ) - \varPsi ^N_{T-t}( Z^N_\theta (t) , \theta ) ) {\text {d}}t \right) , \end{aligned}$$
where \(\zeta ^N_k:= \varLambda _N \zeta _k\), \(\rho _k = \beta _k + \langle \nu _k, \alpha \rangle \) and r is the timescale of observation (10). We can write \(Z^N_\theta (t) = (x^N_\theta (t), U^N_\theta (t) )\) where \(x^N_\theta (t) \in \mathbb {R}^{S_c}\) denotes the states of species in \({\mathscr {S}}_c\) and \(U^N_\theta (t) \in \mathbb {N}_0^{S_d}\) denotes the states of species in \({\mathscr {S}}_d\). Exploiting the analysis in Sect. 2.3, we can express \(S^N_\theta (f,T) \) as
$$\begin{aligned} S^N_\theta (f,T) = S^{N,c}_\theta (f,T) + S^{N,d}_\theta (f,T) +o(1) \end{aligned}$$
where the o(1) term converges to 0 as \(N \rightarrow \infty \),
$$\begin{aligned} S^{N,c}_\theta (f,T)&= \sum _{ k \in {\mathscr {R}}_c} \mathbb {E}\left( \int _0^T \partial _\theta \lambda _k( x^N_\theta (t), U^N_\theta (t) ,\theta ) N^{\rho _k + r } ( \varPsi ^N_{T-t}( x^N_\theta (t)\right. \\&\quad \left. +N^{-(\rho _k + r) } \zeta ^{(c)}_k , U^N_\theta (t) , \theta ) - \varPsi ^N_{T-t}( x^N_\theta (t), U^N_\theta (t) ,\theta ) ) {\text {d}}t \right) \end{aligned}$$
and
$$\begin{aligned} S^{N,d}_\theta (f,T)&= \sum _{ k \in {\mathscr {R}}_d}\mathbb {E}\left( \int _0^T \partial _\theta \lambda _k( x^N_\theta (t), U^N_\theta (t),\theta ) \right. \\&\qquad \left. ( \varPsi ^N_{T-t}( x^N_\theta (t), U^N_\theta (t)+ \zeta ^{(d)}_k,\theta ) - \varPsi ^N_{T-t}( x^N_\theta (t), U^N_\theta (t) , \theta ) ) {\text {d}}t \right) . \end{aligned}$$
We know that as \(N \rightarrow \infty \), process \((x^N_\theta , U^N_\theta )\) converges in distribution to process \((x_\theta , U_\theta )\) in the Skorohod topology on \(\mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0\). This ensures that for any (x, U) and \(t \ge 0\), \(\varPsi ^N_t(x,U,\theta ) \rightarrow \varPsi _t(x,U,\theta )\) as \(N \rightarrow \infty \), and this convergence holds uniformly over compact sets, i.e.,
$$\begin{aligned} \lim _{N \rightarrow \infty } \sup _{ (x,U) \in C , t \in [0,T] } \left| \varPsi ^N_t(x,U,\theta ) - \varPsi _t(x,U,\theta ) \right| = 0 \end{aligned}$$
(25)
for any \(T >0\) and any compact set \(C \subset \mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0\). In fact, under Assumption 1 we also have
$$\begin{aligned} \lim _{N \rightarrow \infty } \sup _{ (x,U) \in C , t \in [0,T] } \left\| \nabla \varPsi ^N_t(x,U,\theta ) - \nabla \varPsi _t(x,U,\theta ) \right\| = 0. \end{aligned}$$
(26)
As \((x^N_\theta , U^N_\theta ) \Rightarrow (x_\theta , U_\theta )\), using (25), it is straightforward to conclude that
$$\begin{aligned}&\lim _{N \rightarrow \infty } S^{N,d}_\theta (f,T)\nonumber \\&= \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _{k} \varPsi _{T-t}(x_\theta (t),U_\theta (t),\theta ) {\text {d}}t \right] . \end{aligned}$$
(27)
Noting that
$$\begin{aligned}&N^{\rho _k + r } \left[ \varPsi ^N_{T-t}\left( x^N_\theta (t)+ \frac{1}{ N^{\rho _k + r } }\zeta ^{(c)}_k , U^N_\theta (t) , \theta \right) - \varPsi ^N_{T-t}\left( x^N_\theta (t), U^N_\theta (t) ,\theta \right) \right] \\&\quad = \left\langle \nabla \varPsi ^N_{T-t}\left( x^N_\theta (t), U^N_\theta (t) ,\theta \right) , \zeta ^{(c)}_k \right\rangle +o(1), \end{aligned}$$
(26) allows us to obtain
$$\begin{aligned}&\lim _{N \rightarrow \infty } S^{N,c}_\theta (f,T) \\&\quad = {\sum }_{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \varPsi _{T-t}(x_\theta (t),U_\theta (t),\theta ) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] . \end{aligned}$$
This relation along with (27) proves the proposition. \(\square \)
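The first-order expansion used in the proof above can be spelled out as follows. This is only a sketch: it assumes, beyond what is stated here, that \(\varepsilon _N := N^{-(\rho _k + r)} \rightarrow 0\) for \(k \in {\mathscr {R}}_c\) and that \(\nabla \varPsi _t\) is continuous in x; the symbols \(\varepsilon _N\) and \(\xi _N\) are introduced only for this illustration. By the mean value theorem, for fixed (x, U) there is some \(\xi _N \in [0,1]\) with
$$\begin{aligned} \frac{1}{\varepsilon _N} \left[ \varPsi ^N_{T-t}\left( x + \varepsilon _N \zeta ^{(c)}_k , U , \theta \right) - \varPsi ^N_{T-t}\left( x , U , \theta \right) \right] = \left\langle \nabla \varPsi ^N_{T-t}\left( x + \xi _N \varepsilon _N \zeta ^{(c)}_k , U , \theta \right) , \zeta ^{(c)}_k \right\rangle , \end{aligned}$$
so that the o(1) remainder is
$$\begin{aligned} \left\langle \nabla \varPsi ^N_{T-t}\left( x + \xi _N \varepsilon _N \zeta ^{(c)}_k , U , \theta \right) - \nabla \varPsi ^N_{T-t}\left( x , U , \theta \right) , \zeta ^{(c)}_k \right\rangle . \end{aligned}$$
Under the stated assumptions, this remainder tends to 0 as \(N \rightarrow \infty \): by (26), both gradients of \(\varPsi ^N_{T-t}\) may be replaced by the corresponding gradients of \(\varPsi _{T-t}\) up to a uniformly small error on compact sets, and the resulting difference vanishes by continuity of \(\nabla \varPsi _{T-t}\).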
In light of Proposition 1, to prove Theorem 2 it suffices to show that \({\bar{S}}_\theta (f,T) = \hat{S}_\theta (f,T) \), where \(\hat{S}_\theta (f,T) \) is the sensitivity for the limiting PDMP \((Z_\theta (t))_{t \ge 0 }\) defined by
$$\begin{aligned} \hat{S}_\theta (f,T)&= \lim _{ h \rightarrow 0 } \frac{ \mathbb {E}( f(Z_{\theta +h} (T) ) ) - \mathbb {E}( f(Z_{\theta } (T) ) ) }{h} \nonumber \\&= \lim _{ h \rightarrow 0 } \frac{ \mathbb {E}( f(x_{\theta +h} (T), U_{\theta +h} (T) ) ) - \mathbb {E}( f(x_{\theta } (T), U_{\theta } (T) ) ) }{h}. \end{aligned}$$
(28)
The next proposition derives a formula for \(\hat{S}_\theta (f,T) \) by coupling processes \(Z_\theta = (x_\theta , U_\theta )\) and \(Z_{\theta +h} = (x_{\theta +h}, U_{\theta +h} )\). This formula will be useful later in proving both Theorems 2 and 3.
Proposition 2
Let \(y_\theta (t)\) be the solution of IVP (16) and let \(D_\theta \lambda _k( x_\theta (t), U_\theta (t) ,\theta )\) be given by (17). Then, the PDMP sensitivity \(\hat{S}_\theta (f,T) \) defined by (28) can be expressed as
$$\begin{aligned} \hat{S}_\theta (f,T)&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{T} \partial _{\theta } \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] \\&\quad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{T} \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] , y_\theta (t) \right\rangle {\text {d}}t \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{T} \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) , y_\theta (t) \right\rangle {\text {d}}t \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d } \mathbb {E}\left[ \int _0^T \partial _{\theta } \lambda _k (x_\theta (t) , U_\theta (t) ,\theta ) \varDelta _k \varPsi _{T- t}( x_\theta (t), U_\theta (t), \theta ) {\text {d}}t \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d } \mathbb {E}\left[ \int _0^T \left\langle \nabla \lambda _k (x_\theta (t) , U_\theta (t) ,\theta ), y_\theta (t) \right\rangle \varDelta _k \varPsi _{T- t}( x_\theta (t), U_\theta (t), \theta ) {\text {d}}t \right] . \end{aligned}$$
Proof
Analogous to the “split-coupling” introduced in (Anderson 2012), we couple the PDMPs \(Z_\theta = (x_\theta , U_\theta )\) and \(Z_{\theta +h} = (x_{\theta +h} , U_{\theta +h} )\) as follows
$$\begin{aligned} x_\theta (t)&= x_0 + \sum _{k \in {\mathscr {R}}_c} \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right) \zeta ^{(c)}_k \\ x_{\theta +h}(t)&= x_0 + \sum _{k \in {\mathscr {R}}_c} \left( \int _{0}^t \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(c)}_k \\ U_\theta (t)&= U_0 \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(d)}_k \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y^{(1)}_k \left( \int _{0}^t \lambda ^{(1)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(d)}_k \\ U_{\theta +h}(t)&= U_0\\&\quad + \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(d)}_k \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y^{(2)}_k \left( \int _{0}^t \lambda ^{(2)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(d)}_k, \end{aligned}$$
where \(a \wedge b\) denotes the minimum of a and b, \(\{ Y_k , Y^{ (1) }_k , Y^{ (2)}_{k}\}\) is a collection of independent unit-rate Poisson processes, and
$$\begin{aligned}&\lambda ^{(1)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) = \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \\&\quad - \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) \quad \text {and} \\&\lambda ^{(2)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) = \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s) ,\theta +h ) \\&\quad - \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k(x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ). \end{aligned}$$
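To see why this coupling preserves the law of each process, it may help to abbreviate the two rates at time s; the shorthand a, b and \((\cdot )^+ = \max \{ \cdot , 0\}\) is introduced only for this illustration. Writing \(a = \lambda _k( x_\theta (s), U_\theta (s) ,\theta )\) and \(b = \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h )\), the definitions above give
$$\begin{aligned} \lambda ^{(1)}_k&= a - a \wedge b = (a-b)^+ , \qquad \lambda ^{(2)}_k = b - a \wedge b = (b-a)^+ , \\ a \wedge b + \lambda ^{(1)}_k&= a , \qquad a \wedge b + \lambda ^{(2)}_k = b , \qquad \lambda ^{(1)}_k + \lambda ^{(2)}_k = \vert a - b \vert . \end{aligned}$$
Hence reaction k fires in \(U_\theta \) at total rate a and in \(U_{\theta +h}\) at total rate b, as required, while the two processes can separate through reaction k only at rate \(\vert a - b \vert \), which is small when h is small.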
Define a stopping time as the first time that processes \(U_\theta \) and \(U_{ \theta +h}\) separate, i.e.,
$$\begin{aligned} \tau _h = \inf \{ t \ge 0: U_{\theta }(t) \ne U_{\theta +h}(t) \}. \end{aligned}$$
Observe that the generator for the PDMP \(Z_\theta = (x_\theta , U_\theta )\) is
$$\begin{aligned} \mathbb {A}_\theta g(x,u) = \sum _{ k \in {\mathscr {R}}_c} \lambda _k(x,u,\theta ) \left\langle \nabla g(x,u) , \zeta ^{(c)}_k \right\rangle + \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,u,\theta ) \varDelta _k g(x,u), \end{aligned}$$
where \(g : \mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0 \rightarrow \mathbb {R}\) is any function that is differentiable in the first \(S_c\) coordinates. Applying Dynkin’s formula, we obtain
$$\begin{aligned} \mathbb {E}\left( f(x_\theta (t) ,U_\theta (t) ) \right)&= f(x_0,U_0) + \mathbb {E}\left( \int _{0}^t \mathbb {A}_\theta f( x_\theta (s) , U_\theta (s) ) {\text {d}}s \right) \qquad \text {and} \\ \quad \mathbb {E}\left( f(x_{\theta +h}(t) ,U_{\theta +h}(t) ) \right)&= f(x_0,U_0) + \mathbb {E}\left( \int _{0}^t \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) {\text {d}}s \right) . \end{aligned}$$
The above coupling between processes \(Z_\theta = (x_\theta , U_\theta )\) and \(Z_{\theta +h} = (x_{\theta +h} , U_{\theta +h} )\) ensures that for \(0 \le s \le \tau _h\) we have \(U_{\theta +h}(s) = U_{\theta }(s)\) and \(x_{\theta +h}(s) = x_\theta (s) + h y_\theta (s) +o(h)\). Noting that \(\tau _h \rightarrow \infty \) a.s. as \(h \rightarrow 0\), we obtain
$$\begin{aligned}&\lim _{h \rightarrow 0 } \frac{1}{h} \left[ \mathbb {E}\left( \int _{0}^{\tau _h\wedge t} \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) {\text {d}}s \right) - \mathbb {E}\left( \int _{0}^{\tau _h\wedge t} \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) {\text {d}}s \right) \right] \nonumber \\&\quad = \lim _{h \rightarrow 0 } \frac{1}{h} \left[ \mathbb {E}\left( \int _{0}^{\tau _h\wedge t} \left[ \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta }(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right] {\text {d}}s \right) \right] \nonumber \\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{ t} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] \nonumber \\&\qquad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle \right] , y_\theta (s) \right\rangle {\text {d}}s \right] \nonumber \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s),U_\theta (s) ) \right] , y_\theta (s) \right\rangle {\text {d}}s \right] \nonumber \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \partial _\theta \lambda _k(x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s),U_\theta (s)) {\text {d}}s \right] . \end{aligned}$$
(29)
Let \(\sigma _0 = 0\) and for each \(i=1,2,\dots \) let \(\sigma _i\) denote the ith jump time of the process
$$\begin{aligned} \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \end{aligned}$$
which counts the common jump times among processes \(U_\theta \) and \(U_{\theta +h}\). Observe that
$$\begin{aligned}&\lim _{h \rightarrow 0 } \frac{1}{h} \left[ \mathbb {E}\left( \int _{\tau _h\wedge t}^t \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) {\text {d}}s \right) - \mathbb {E}\left( \int _{\tau _h\wedge t}^t \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) {\text {d}}s \right) \right] \\&\quad = \sum _{i=0}^\infty \lim _{h \rightarrow 0 } \frac{1}{h} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \wedge t \le \tau _h < \sigma _{i+1} \wedge t \} } \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) \right. \right. \\&\left. \left. \qquad \qquad - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] . \end{aligned}$$
Recall the definition of \(D_\theta \lambda _k( x_\theta (t), U_\theta (t) ,\theta )\) from (17). We shall soon prove that
$$\begin{aligned}&\lim _{h \rightarrow 0 } \frac{1}{h} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \wedge t \le \tau _h < \sigma _{i+1} \wedge t \} } \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) )\right. \right. \nonumber \\&\left. \left. \qquad \qquad \qquad - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s\right] \nonumber \\&\quad = \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ t \wedge \sigma _i}^{ t \wedge \sigma _{i +1} } D_\theta \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \left( \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) \right. \right. \nonumber \\&\left. \left. \qquad \qquad \qquad - \varDelta _k f( x_\theta (s), U_\theta (s)) \right) {\text {d}}s \right] . \end{aligned}$$
(30)
Assuming this for now, we get
$$\begin{aligned}&\lim _{h \rightarrow 0 } \frac{1}{h} \left[ \mathbb {E}\left( \int _{\tau _h\wedge t}^t \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) {\text {d}}s \right) - \mathbb {E}\left( \int _{\tau _h\wedge t}^t \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) {\text {d}}s \right) \right] \\&\quad = \sum _{k \in {\mathscr {R}}_d} \sum _{i=0}^\infty \mathbb {E}\left[ \int _{t \wedge \sigma _i}^{ t \wedge \sigma _{i+1} } D_\theta \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \left( \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) \right. \right. \\&\left. \left. \qquad \qquad \qquad - \varDelta _k f( x_\theta (s), U_\theta (s)) \right) {\text {d}}s \right] \\&\quad = \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \partial _\theta \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right] \\&\qquad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \partial _\theta \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \varDelta _k f( x_\theta (s), U_\theta (s)) {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y_\theta ( s) \right\rangle \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right] \\&\qquad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y_\theta ( s) \right\rangle \varDelta _k f( x_\theta (s), U_\theta (s)) {\text {d}}s \right] . \end{aligned}$$
Combining this formula with (29), we obtain
$$\begin{aligned}&\hat{S}_\theta (f,t) \\&\quad = \lim _{h \rightarrow 0} \frac{ \mathbb {E}\left( f(x_{\theta +h} (t), U_{\theta +h} (t) ) \right) - \mathbb {E}\left( f(x_{\theta } (t), U_{\theta } (t) ) \right) }{h} \\&\quad = \lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \int _{0}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\quad = \lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \int _{0}^{t \wedge \tau _h } \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\qquad + \lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \int _{t \wedge \tau _h }^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle \right] , y_\theta (s) \right\rangle {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s),U_\theta (s) ) \right] , y_\theta (s) \right\rangle {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{t} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s) , U_\theta ( s) ) {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y_\theta ( s) \right\rangle \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right] \\&\qquad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s) , U_\theta ( s) ) {\text {d}}s \right] \\&\qquad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y_\theta ( s) \right\rangle \varDelta _k f(x_\theta (s) , U_\theta ( s) ) {\text {d}}s \right] . \end{aligned}$$
In the last expression, the fourth term cancels with the seventh term. Expanding the third term via the product rule \(\nabla (gh) = g \nabla h + h \nabla g\) produces two terms, one of which cancels with the last term, and we then obtain the formula stated in this proposition. Therefore, to prove this proposition the only step remaining is to show (30). This is what we do next.
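Before turning to (30), the product-rule step can be written out explicitly. Suppressing the arguments \((x_\theta (s), U_\theta (s), \theta )\) of \(\lambda _k\) and \((x_\theta (s), U_\theta (s))\) of f, the integrand of the third term expands as
$$\begin{aligned} \left\langle \nabla \left[ \lambda _k \, \varDelta _k f \right] , y_\theta (s) \right\rangle = \lambda _k \left\langle \nabla \left( \varDelta _k f \right) , y_\theta (s) \right\rangle + \left\langle \nabla \lambda _k , y_\theta (s) \right\rangle \varDelta _k f . \end{aligned}$$
The second summand on the right is exactly the integrand of the last term, which enters with a minus sign, so the two cancel; the first summand is the integrand of the third term in the statement of the proposition.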
Assume that \(x_\theta ( \sigma _i ) = x\), \(x_{\theta +h}( \sigma _i ) = x(h) = x + o(1)\), \(U_\theta (\sigma _i) =U_{\theta +h}(\sigma _i) =U\) and \(\{\tau _h > \sigma _i\}\). Given this information \({\mathscr {F}}_i\), the random time \(\delta _i = ( \tau _h -\sigma _i ) \wedge (\sigma _{i+1} -\sigma _i )\) has distribution that satisfies
$$\begin{aligned} \mathbb {P}\left( \delta _i \le w \vert {\mathscr {F}}_i \right) = 1 - \exp \left( - \int _{0}^w \lambda _0( x_\theta (s + \sigma _i) , U ,\theta ) {\text {d}}s \right) + o(1), \quad \text {for } w \in [0, \infty ) \end{aligned}$$
(31)
where \(\lambda _0(x,U,\theta ) = \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,U,\theta )\). Given \(\delta _i = w\), the probability that event \(\{ (\sigma _{i+1} -\sigma _i ) > ( \tau _h -\sigma _i ) \}\) occurs (i.e., \(\delta _i = \tau _h -\sigma _i\)) and the perturbation reaction is \(k \in {\mathscr {R}}_d\) is simply
$$\begin{aligned}&\frac{1}{ \lambda _0(x_\theta ( \sigma _i +w) , U,\theta ) }\left| D_\theta \lambda _k(x_\theta (\sigma _i+w) , U,\theta ) \right| h +o(h). \end{aligned}$$
If \(D_\theta \lambda _k(x_\theta (\sigma _i+w) , U,\theta ) > 0\), then at time \(\tau _h\) process \(U_{ \theta +h}\) jumps by \(\zeta ^{ (d) }_k\), and if \(D_\theta \lambda _k(x_\theta (\sigma _i+w) , U,\theta ) <0\), process \(U_{ \theta }\) jumps by \(\zeta ^{ (d) }_k\). We will suppose that the first situation holds, but the other case can be handled similarly. Assuming \(w < (t - \sigma _i)\), we have
$$\begin{aligned}&\lim _{h \rightarrow 0} \mathbb {E}\left( \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \bigg \vert {\mathscr {F}}_i, \tau _h = \sigma _i + w ,k \right) \\&\quad = \varDelta _{k} \varPsi _{t - \sigma _i -w} ( x_\theta ( \sigma _i + w), U_\theta ( \sigma _i+w ), \theta ) - \varDelta _{k} f( x_\theta ( \sigma _i + w), U_\theta ( \sigma _i+w ) ) \\&\quad := G_k( x_\theta ( \sigma _i +w) , U_\theta ( \sigma _i +w ) , t - \sigma _i-w) \end{aligned}$$
and as \(\delta _i\) has distribution (31), we obtain
$$\begin{aligned}&\lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \wedge t \le \tau _h < \sigma _{i+1} \wedge t \} } \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \nonumber \\&\quad = \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \le t \} } \int _{0}^{t - \sigma _i} G_k( x_\theta ( \sigma _i +w) , U_\theta ( \sigma _i +w ) , t - \sigma _i-w) \right. \nonumber \\&\quad \left. D_\theta \lambda _k(x_\theta (\sigma _i+w) , U_\theta (\sigma _i+w),\theta ) \exp \left( -\int _{0}^w \lambda _0(x_\theta ( \sigma _i+s) , U_\theta (\sigma _i+s),\theta ) {\text {d}}s \right) dw \right] . \end{aligned}$$
(32)
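The form of \(G_k\) used above can be traced back to Dynkin's formula. The following is a sketch rather than a complete argument: it assumes that \(\varDelta _k g(x,U) = g(x, U + \zeta ^{(d)}_k) - g(x,U)\), and it takes for granted the strong Markov property at \(\tau _h\) and the continuity of the quantities involved as \(h \rightarrow 0\). For the PDMP started at (x, U) and any \(u \ge 0\), Dynkin's formula gives
$$\begin{aligned} \mathbb {E}\left( \int _{0}^{u} \mathbb {A}_\theta f( x_\theta (s) , U_\theta (s) ) {\text {d}}s \right) = \varPsi _u(x,U,\theta ) - f(x,U). \end{aligned}$$
In the case considered, at time \(\tau _h = \sigma _i + w\) the perturbed process sits at \((x_{\theta +h}(\tau _h), U + \zeta ^{(d)}_k)\) and the unperturbed one at \((x_\theta (\tau _h), U)\). Writing \(x = x_\theta (\sigma _i + w)\) and \(u = t - \sigma _i - w\) for brevity, and letting \(h \rightarrow 0\), the difference of the two remaining integrals becomes
$$\begin{aligned} \left[ \varPsi _u(x,U + \zeta ^{(d)}_k,\theta ) - f(x,U+ \zeta ^{(d)}_k) \right] - \left[ \varPsi _u(x,U,\theta ) - f(x,U) \right] = \varDelta _k \varPsi _u(x,U,\theta ) - \varDelta _k f(x,U), \end{aligned}$$
which is \(G_k(x,U,u)\).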
Note that given \(\sigma _i < t\) and \({\mathscr {F}}_i\), the random variable \(\gamma _i = (t \wedge \sigma _{i +1} - t \wedge \sigma _{i })\) has probability density function given by
$$\begin{aligned} p(w) = \lambda _0(x_\theta ( \sigma _i+ w ) , U_\theta (\sigma _i+w),\theta ) \exp \left( -\int _{0}^w \lambda _0(x_\theta ( \sigma _i+ u ) , U_\theta (\sigma _i+u),\theta ) {\text {d}}u \right) , \end{aligned}$$
for \(w \in [0, t -\sigma _i)\) and \( \mathbb {P}\left( \gamma _i \le w \vert {\mathscr {F}}_i \right) =1\) if \(w \ge (t - \sigma _i)\). Letting
$$\begin{aligned}&G(s,t) = G_k( x_\theta ( s) , U_\theta ( s) , t - s) D_\theta \lambda _k(x_\theta (s) , U_\theta (s),\theta )\\&\quad \text {and} \quad P(w) = \int _{w}^\infty p(u){\text {d}}u = \exp \left( -\int _{0}^w \lambda _0(x_\theta ( \sigma _i+ u ) , U_\theta (\sigma _i+u),\theta ) {\text {d}}u \right) \end{aligned}$$
we have
$$\begin{aligned}&\mathbb {E}\left( \int _{t \wedge \sigma _i}^{ t \wedge \sigma _{i+1} } G(s,t) {\text {d}}s \bigg \vert {\mathscr {F}}_i , \sigma _i< t \right) = \mathbb {E}\left( \int _{0}^{ \gamma _i } G(s+\sigma _i,t) {\text {d}}s \bigg \vert {\mathscr {F}}_i , \sigma _i< t \right) \\&\quad = \mathbb {P}\left( \gamma _i \ge t - \sigma _i \bigg \vert {\mathscr {F}}_i , \sigma _i< t \right) \int _{0}^{t - \sigma _i} G(s+\sigma _i,t) {\text {d}}s \\&\qquad + \mathbb {E}\left( \mathrm{1l}_{ \{ 0 \le \gamma _i<(t - \sigma _i) \} } \int _{0}^{ \gamma _i } G(s+\sigma _i,t) {\text {d}}s \bigg \vert {\mathscr {F}}_i , \sigma _i < t \right) \\&\quad = P(t - \sigma _i) \int _{0}^{t - \sigma _i} G(s+\sigma _i,t) {\text {d}}s + \int _{0}^{t - \sigma _i} p(w) \left( \int _{0}^{ w } G(s+\sigma _i,t) {\text {d}}s \right) {\text {d}}w. \end{aligned}$$
Using integration by parts
$$\begin{aligned} \int _{0}^{t - \sigma _i} p(w) \left( \int _{0}^{ w } G(s+\sigma _i,t) {\text {d}}s \right) dw&= -P(t - \sigma _i)\left( \int _{0}^{ t -\sigma _i } G(s+\sigma _i,t) {\text {d}}s \right) \\&\quad +\int _{0}^{t - \sigma _i} P(w) G(w+\sigma _i,t) dw \end{aligned}$$
which shows that
$$\begin{aligned} \int _{0}^{t - \sigma _i} P(w) G(w+\sigma _i,t) {\text {d}}w = \mathbb {E}\left( \int _{t \wedge \sigma _i}^{ t \wedge \sigma _{i+1} } G(s,t) {\text {d}}s \bigg \vert {\mathscr {F}}_i , \sigma _i < t \right) . \end{aligned}$$
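For completeness, the integration by parts can be checked term by term. The auxiliary function F below is introduced only for this illustration. Set \(F(w) = \int _{0}^{w} G(s+\sigma _i,t) {\text {d}}s\), so that \(F(0) = 0\) and \(F'(w) = G(w+\sigma _i,t)\), and note that the exponential form of P gives \(P(0) = 1\) and \(P'(w) = -p(w)\). Then
$$\begin{aligned} \int _{0}^{t - \sigma _i} p(w) F(w) {\text {d}}w&= \Big [ -P(w) F(w) \Big ]_{0}^{t - \sigma _i} + \int _{0}^{t - \sigma _i} P(w) F'(w) {\text {d}}w \\&= -P(t - \sigma _i) F(t - \sigma _i) + \int _{0}^{t - \sigma _i} P(w) G(w+\sigma _i,t) {\text {d}}w . \end{aligned}$$
Adding the term \(P(t - \sigma _i) F(t - \sigma _i)\) from the preceding display cancels the boundary contribution and leaves only the integral of \(P(w) G(w+\sigma _i,t)\).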
Substituting this expression in (32) gives us
$$\begin{aligned}&\lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\quad = \sum _{i=0}^\infty \lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \wedge t \le \tau _h < \sigma _{i+1} \wedge t \} } \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) \right. \right. \\&\left. \left. \qquad \qquad \qquad - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\quad = \sum _{k \in {\mathscr {R}}_d} \sum _{i=0}^\infty \mathbb {E}\left[ \int _{ t \wedge \sigma _i}^{ t \wedge \sigma _{i +1} } G_k( x_\theta ( s) , U_\theta ( s) , t - s) D_\theta \lambda _k(x_\theta (s) , U_\theta (s),\theta ) {\text {d}}s \right] . \end{aligned}$$
This proves (30) and completes the proof of this proposition. \(\square \)
Define an \(S_c \times S_c\) matrix by
$$\begin{aligned} M(x,U,\theta ) = \sum _{k \in {\mathscr {R}}_c } \zeta ^{ (c) }_{k} ( \nabla \lambda _k(x,U,\theta ) )^* \end{aligned}$$
for any \((x , U, \theta ) \in \mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0 \times \mathbb {R}\), where \(v^*\) denotes the transpose of v. Let \(\varPhi (x_0,U_0,t)\) be the solution of the linear matrix-valued equation
$$\begin{aligned} \frac{{\text {d}} }{{\text {d}}t} \varPhi (x_0,U_0,t) = M(x_\theta (t) , U_\theta (t) ,\theta ) \varPhi (x_0,U_0,t) \end{aligned}$$
(33)
with \(\varPhi (x_0,U_0,0) = \mathbf{I}\), which is the \(S_c \times S_c\) identity matrix. Here \((x_0, U_0)\) denotes the initial state of \((x_\theta (t) ,U_\theta (t) )\). It can be seen that \(y_\theta (t)\), which is the solution of IVP (16), can be written as
$$\begin{aligned} y_\theta (t) = \sum _{k \in {\mathscr {R}}_c } \int _0^t \partial _\theta \lambda _k ( x_\theta (s) , U_\theta (s) , \theta ) \varPhi (x_\theta (s),U_\theta (s),t - s) \zeta _k^{(c)} {\text {d}}s. \end{aligned}$$
(34)
This shall be useful in proving the next proposition which considers the sensitivity of \(\varPsi _t (x_\theta (t) , U_\theta (t) ,\theta )\) to the initial value of the continuous state \(x_0\).
Proposition 3
Let \(\varPhi (x_0,U_0,t)\) be the matrix-valued function defined above. Then, we can express the gradient of \(\varPsi _t(x_0,U_0,\theta ) \) w.r.t. \(x_0\) as
$$\begin{aligned}&\nabla \varPsi _t(x_0,U_0,\theta ) = \nabla f(x_0,U_0) \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \varPhi ^*(x_0, U_0,s) \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle \right] {\text {d}}s \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{t} \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varPhi ^*(x_0, U_0,s) \nabla \left( \varDelta _k f(x_\theta (s),U_\theta (s) ) \right) {\text {d}}s \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \varPhi ^*(x_0, U_0,s) \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) \varDelta _k \varPsi _{t-s}( x_\theta ( s) , U_\theta ( s ) ,\theta ) {\text {d}}s \right] . \end{aligned}$$
(35)
Proof
To prove this proposition, it suffices to show that for any vector \(v \in \mathbb {R}^{S_c}\), the inner product of v with the l.h.s. of (35) is the same as the inner product of v with the r.h.s. of (35). Defining
$$\begin{aligned} y(t) = \varPhi (x_0, U_0,t) v \end{aligned}$$
our aim is to prove that
$$\begin{aligned}&\left\langle \nabla \varPsi _t(x_0,U_0,\theta ) , v \right\rangle = \left\langle \nabla f(x_0,U_0) , v \right\rangle \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle \right] , y(s) \right\rangle {\text {d}}s \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{t} \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (s),U_\theta (s) ) \right) , y(s) \right\rangle {\text {d}}s \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y(s) \right\rangle \varDelta _k \varPsi _{t-s}( x_\theta ( s) , U_\theta ( s ) ,\theta ) {\text {d}}s \right] . \end{aligned}$$
(36)
Note that y(t) solves the IVP
$$\begin{aligned} \frac{{\text {d}} y}{{\text {d}}t}&= \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x_\theta (t) , U_\theta (t) ,\theta ) , y(t) \right\rangle \zeta ^{(c)}_k \nonumber \\ \text {and}&\qquad y(0) = v, \end{aligned}$$
(37)
which shows that y(t) is the directional derivative of \(x_\theta (t)\) [see (12)] w.r.t. the initial state \(x_0\) in the direction v.
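The claim that \(y(t) = \varPhi (x_0, U_0,t) v\) solves (37) follows directly from (33) and the definition of M. As a short check, valid at times t between the jumps of \(U_\theta \), where \(\varPhi \) is differentiable:
$$\begin{aligned} \frac{{\text {d}} y}{{\text {d}}t}&= \frac{{\text {d}} }{{\text {d}}t} \varPhi (x_0,U_0,t) v = M(x_\theta (t) , U_\theta (t) ,\theta ) \varPhi (x_0,U_0,t) v \\&= \sum _{k \in {\mathscr {R}}_c } \zeta ^{ (c) }_{k} \left( \nabla \lambda _k(x_\theta (t) , U_\theta (t) ,\theta ) \right) ^* y(t) = \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x_\theta (t) , U_\theta (t) ,\theta ) , y(t) \right\rangle \zeta ^{(c)}_k , \end{aligned}$$
and the initial condition is \(y(0) = \varPhi (x_0,U_0,0) v = \mathbf{I} v = v\).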
This proposition can be proved in the same way as Proposition 2, by coupling process \((x_\theta , U_\theta )\) with another process \((x_{\theta ,h} , U_{\theta ,h} )\) according to
$$\begin{aligned} x_\theta (t)&= x_0 + \sum _{k \in {\mathscr {R}}_c} \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right) \zeta ^{(c)}_k \\ x_{\theta ,h}(t)&= x_0 + h v + \sum _{k \in {\mathscr {R}}_c} \left( \int _{0}^t \lambda _k( x_{\theta , h}(s), U_{\theta , h}(s), \theta ) {\text {d}}s \right) \zeta ^{(c)}_k \\ U_\theta (t)&= U_0 + \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta , h}(s), U_{\theta ,h}(s), \theta ) {\text {d}}s \right) \zeta ^{(d)}_k \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y^{(1)}_k \left( \int _{0}^t \lambda ^{(1)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta , h}(s), U_{\theta , h}(s), \theta ) {\text {d}}s \right) \zeta ^{(d)}_k \\ U_{\theta ,h}(t)&= U_0 + \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta , h}(s), U_{\theta , h}(s), \theta ) {\text {d}}s \right) \zeta ^{(d)}_k \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y^{(2)}_k \left( \int _{0}^t \lambda ^{(2)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta , h}(s), U_{\theta , h}(s), \theta ) {\text {d}}s \right) \zeta ^{(d)}_k, \end{aligned}$$
where \(\{ Y_k , Y^{ (1) }_k , Y^{ (2)}_{k}\}\) is a collection of independent unit-rate Poisson processes, and \(\lambda ^{(1)}_k\), \(\lambda ^{(2)}_k\) are as in the proof of Proposition 2. An important difference between this proposition and Proposition 2 is that the value of \(\theta \) is the same in the coupled processes, and hence the only difference between the two processes is due to the difference in the initial continuous state \(x_0\). Consequently, the \( \partial _\theta \lambda _k\) terms in the statement of Proposition 2 disappear and we obtain (36). \(\square \)
Proof
(Proof of Theorem 2) Define
$$\begin{aligned} L(t) = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^t \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla \varPsi _{t-s}(x_\theta (s),U_\theta (s),t-s) , \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] . \end{aligned}$$
Due to Proposition 2, to prove Theorem 2 it suffices to prove that
$$\begin{aligned} L(T)&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] \nonumber \\&\quad +\sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \rangle \right] , y_\theta (t) \right\rangle {\text {d}}t \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) , y_\theta (t) \right\rangle {\text {d}}t \right] \nonumber \\&\quad +\sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \left\langle \nabla \lambda _k(x_\theta ( t) ,U_\theta (t),\theta ) ,y_\theta (t) \right\rangle \varDelta _k \varPsi _{T- t}( x_\theta (t), U_\theta (t), \theta ) {\text {d}}t \right] . \end{aligned}$$
(38)
Let \(\{ {\mathscr {F}}_t \}\) be the filtration generated by the process \((x_\theta , U_\theta )\). For any \(t \ge 0\), let \(\mathbb {E}_t (\cdot )\) denote the conditional expectation \(\mathbb {E}( \cdot \vert {\mathscr {F}}_t )\). Proposition 3 allows us to write
$$\begin{aligned}&\nabla \varPsi _{t - s}(x_\theta (s) , U_\theta (s), t-s ) \\&\quad = \nabla f(x_\theta (s) , U_\theta (s) ) \\&\qquad + \sum _{k \in {\mathscr {R}}_c} \int _{s}^{t} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s),u-s) \right. \\&\left. \qquad \qquad \nabla \left[ \lambda _k (x_\theta (u) ,U_\theta (u) ,\theta ) \left\langle \nabla f(x_\theta (u),U_\theta (u) ) , \zeta ^{(c)}_k \right\rangle \right] \right] {\text {d}}u \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \int _{s}^{t} \mathbb {E}_s \left[ \lambda _k (x_\theta (u) ,U_\theta (u) ,\theta ) \varPhi ^*(x_\theta (s), U_\theta (s),u-s) \nabla \left( \varDelta _k f(x_\theta (u),U_\theta (u) ) \right) \right] {\text {d}}u \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \int _{ s}^{ t } \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s), u -s) \right. \\&\quad \left. \qquad \qquad \nabla \lambda _k(x_\theta ( u) ,U_\theta (u),\theta ) \varDelta _k \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) \right] {\text {d}}u . \end{aligned}$$
This shows that
$$\begin{aligned}&\frac{{\text {d}}}{{\text {d}}t} \nabla \varPsi _{t - s}(x_\theta (s) , U_\theta (s), t-s )\\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s),t-s) \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}_s \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varPhi ^*(x_\theta (s), U_\theta (s),t-s) \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s), t -s) \nabla \lambda _k(x_\theta ( t) ,U_\theta (t),\theta ) \varDelta _k f( x_\theta ( t) , U_\theta (t ) ) \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \int _{ s}^{ t } \mathbb {E}_s \Bigg [ \varPhi ^*(x_\theta (s), U_\theta (s), u -s)\\&\qquad \qquad \nabla \lambda _k(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _k \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) \Bigg ] {\text {d}}u. \end{aligned}$$
The middle two terms can be combined using the product rule \(\nabla (gh) = g \nabla h + h \nabla g\) to yield
$$\begin{aligned}&\frac{{\text {d}}}{{\text {d}}t} \nabla \varPsi _{t - s}(x_\theta (s) , U_\theta (s), t-s ) \\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s),t-s) \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s),t-s) \nabla \left( \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \int _{ s}^{ t } \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s), u -s) \right. \\&\left. \qquad \qquad \nabla \lambda _k(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _k \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) \right] {\text {d}}u. \end{aligned}$$
Using this, we can compute the time derivative of L(t) as
$$\begin{aligned} \frac{{\text {d}} L(t) }{{\text {d}}t} = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] + A+ B+ C, \end{aligned}$$
(39)
where
$$\begin{aligned}&A := \sum _{k \in {\mathscr {R}}_c} \sum _{j \in {\mathscr {R}}_c} \int _{0}^{t} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \varPhi ^*(x_\theta ( s) , U_\theta ( s) , t-s) \right. \right. \\&\left. \left. \qquad \qquad \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_j \rangle \right] , \zeta ^{(c)}_k \right\rangle \right] {\text {d}}s, \\&B := \sum _{k \in {\mathscr {R}}_c} \sum _{j \in {\mathscr {R}}_d} \int _{0}^{t} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \right. \\&\left. \left\langle \varPhi ^*(x_\theta ( s) , U_\theta ( s) , t-s) \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _j f(x_\theta (t),U_\theta (t) ) \right] , \zeta ^{(c)}_k \right\rangle \right] {\text {d}}s, \\&\qquad \text {and} \\&C := \sum _{k \in {\mathscr {R}}_c} \sum _{j \in {\mathscr {R}}_d} \int _{0}^{t} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta )\left\langle \int _{ s}^{ t } \varPhi ^*(x_\theta (s), U_\theta (s), u -s) \right. \right. \\&\left. \left. \qquad \qquad \nabla \lambda _j(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _j \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) {\text {d}}u, \zeta ^{(c)}_k \right\rangle \right] {\text {d}}s. \end{aligned}$$
This definition of A, B and C ensures that
$$\begin{aligned}&A+ B+ C \\&\quad = \sum _{k \in {\mathscr {R}}_c} \int _{0}^{t} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \frac{{\text {d}}}{{\text {d}}t} \nabla \varPsi _{t - s}(x_\theta (s) , U_\theta (s), t-s ), \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] . \end{aligned}$$
Recall that \(y_\theta (t)\) can be expressed as (34). Therefore, we can write A as
$$\begin{aligned} A&= \sum _{j \in {\mathscr {R}}_c} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_j \rangle \right] , \right. \right. \nonumber \\&\qquad \left. \left. \sum _{k \in {\mathscr {R}}_c} \int _{0}^{t} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varPhi (x_\theta ( s) , U_\theta ( s) , t-s) \zeta ^{(c)}_k {\text {d}}s \right\rangle \right] \nonumber \\&= \sum _{j \in {\mathscr {R}}_c} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_j \rangle \right] , y_\theta (t) \right\rangle \right] . \end{aligned}$$
(40)
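The passage from the definition of A to (40) rests on moving \(\varPhi ^*\) across the inner product. As a sketch, assuming that \(\varPhi ^*\) denotes the transpose (adjoint) of the matrix \(\varPhi \), and writing a and b for generic vectors in \(\mathbb {R}^{S_c}\) (symbols introduced here only for illustration), one has
$$\begin{aligned} \left\langle \varPhi ^* a , b \right\rangle = \left\langle a , \varPhi b \right\rangle . \end{aligned}$$
Here this is applied with \(a = \nabla \left[ \lambda _j \langle \nabla f , \zeta ^{(c)}_j \rangle \right] \) evaluated along the process at time t and \(b = \zeta ^{(c)}_k\), after which the sum over k and the integral over s are taken inside the second argument of the inner product.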
Similarly, we can write B as
$$\begin{aligned} B = \sum _{j \in {\mathscr {R}}_d} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _j f(x_\theta (t),U_\theta (t) ) \right] , y_\theta (t) \right\rangle \right] . \end{aligned}$$
(41)
Changing the order of integration, we can write C as
$$\begin{aligned}&C= \sum _{j \in {\mathscr {R}}_d} \int _{0}^{t} \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _j \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u) ,\theta ) , \right. \right. \\&\qquad \left. \left. \sum _{k \in {\mathscr {R}}_c} \int _{ 0}^{ u} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varPhi (x_\theta (s), U_\theta (s), u -s)\zeta ^{(c)}_k {\text {d}}s \right\rangle {\text {d}}u \right] \\&\quad = \sum _{j \in {\mathscr {R}}_d} \int _{0}^{t} \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _j \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) , y_\theta (u)\right\rangle {\text {d}}u \right] \\&\quad = \sum _{j \in {\mathscr {R}}_d} \frac{{\text {d}}}{{\text {d}}t} \int _{0}^{t} \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta ( u) ,U_\theta (u),\theta )\varDelta _j \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) , y_\theta (u)\right\rangle {\text {d}}u \right] \\&\qquad - \sum _{j \in {\mathscr {R}}_d} \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta ( t) ,U_\theta (t),\theta )\varDelta _j f ( x_\theta ( t) , U_\theta ( t ) ) , y_\theta (t)\right\rangle \right] . \end{aligned}$$
This relation along with (40), (41) and (39) implies that
$$\begin{aligned} \frac{{\text {d}} L(t) }{{\text {d}}t}&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] \\&\quad +\sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \rangle \right] , y_\theta (t) \right\rangle \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d}\mathbb {E}\left[ \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _k f(x_\theta (t),U_\theta (t) ) \right] , y_\theta (t) \right\rangle \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d} \frac{{\text {d}}}{{\text {d}}t} \int _{0}^t \mathbb {E}\left[ \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) \varDelta _k \varPsi _{t-s}( x_\theta ( s) , U_\theta ( s ) ,\theta ) ,y_\theta (s) \right\rangle {\text {d}}s \right] \\&\quad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \left\langle \nabla \lambda _k(x_\theta ( t) ,U_\theta (t),\theta )\varDelta _k f ( x_\theta ( t) , U_\theta ( t ) ) , y_\theta (t)\right\rangle \right] . \end{aligned}$$
Applying the product rule to the third term produces two terms, one of which cancels with the last term, yielding
$$\begin{aligned} \frac{{\text {d}} L(t) }{{\text {d}}t}&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] \\&\quad +\sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \rangle \right] , y_\theta (t) \right\rangle \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d}\mathbb {E}\left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) , y_\theta (t) \right\rangle \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d} \frac{{\text {d}}}{{\text {d}}t} \int _{0}^t \mathbb {E}\left[ \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) \varDelta _k \varPsi _{t-s}( x_\theta ( s) , U_\theta ( s ) ,\theta ) ,y_\theta (s) \right\rangle {\text {d}}s \right] . \end{aligned}$$
Since \(L(0) = 0\), integrating this equation from \(t = 0\) to \(t = T\) proves (38), and this completes the proof of Theorem 2. \(\square \)
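For reference, the last equality in the rewriting of C in the proof above is an instance of the Leibniz integral rule. As a sketch, for a generic integrand g(t, u) that is differentiable in t (the symbol g is introduced here only for illustration),
$$\begin{aligned} \frac{{\text {d}}}{{\text {d}}t} \int _{0}^{t} g(t,u) {\text {d}}u = g(t,t) + \int _{0}^{t} \frac{\partial }{\partial t} g(t,u) {\text {d}}u . \end{aligned}$$
Taking \(g(t,u) = \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta (u) ,U_\theta (u),\theta ) \varDelta _j \varPsi _{t-u}( x_\theta (u) , U_\theta (u) ,\theta ) , y_\theta (u) \right\rangle \right] \) and using \(\varPsi _0 = f\), the boundary term g(t, t) is the term involving \(\varDelta _j f\) that is subtracted in the final line of that computation.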
Proof
(Proof of Theorem 3) Consider the Markov process \(( x_\theta (t) , U_\theta (t) , y_\theta (t) )_{t \ge 0}\). The generator of this process is given by
$$\begin{aligned} \mathbb {H} F(x,u,y)&= \sum _{k \in {\mathscr {R}}_c} \lambda _k(x,u,\theta ) \left\langle \nabla F(x,u,y), \zeta ^{(c)}_k \right\rangle + \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,u,\theta ) \varDelta _k F(x,u,y) \\&\quad + \sum _{k \in {\mathscr {R}}_c} \partial _\theta \lambda _k(x,u,\theta ) \left\langle \nabla _y F(x,u,y), \zeta ^{(c)}_k \right\rangle \\&\quad + \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x,u,\theta ) , y \right\rangle \left\langle \nabla _y F(x,u,y), \zeta ^{(c)}_k \right\rangle \end{aligned}$$
for any real-valued function \(F: \mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0 \times \mathbb {R}^{S_c} \rightarrow \mathbb {R}\). Here, \(\nabla _y F\) denotes the gradient of function F w.r.t. the last \(S_c\) coordinates. Setting
$$\begin{aligned} F(x,u,y) = \left\langle \nabla f(x,u), y \right\rangle \end{aligned}$$
we obtain
$$\begin{aligned} \mathbb {H} F(x,u,y)&= \sum _{k \in {\mathscr {R}}_c} \lambda _k(x,u,\theta ) \left\langle \varDelta f(x,u) y, \zeta ^{(c)}_k \right\rangle + \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,u,\theta ) \varDelta _k \left\langle \nabla f(x,u), y \right\rangle \\&\quad + \sum _{k \in {\mathscr {R}}_c} \partial _\theta \lambda _k(x,u,\theta ) \left\langle \nabla f(x,u), \zeta ^{(c)}_k \right\rangle \\&\quad + \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x,u,\theta ) , y \right\rangle \left\langle \nabla f(x,u), \zeta ^{(c)}_k \right\rangle \end{aligned}$$
where \(\varDelta f\) denotes the Hessian matrix of f w.r.t. the first \(S_c\) coordinates. However, note that the first and the fourth terms can be combined using the product rule as
$$\begin{aligned}&\sum _{k \in {\mathscr {R}}_c} \lambda _k(x,u,\theta ) \left\langle \varDelta f(x,u) y, \zeta ^{(c)}_k \right\rangle +\sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x,u,\theta ) , y \right\rangle \left\langle \nabla f(x,u), \zeta ^{(c)}_k \right\rangle \\&\quad = \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \left[ \lambda _k (x, u ,\theta ) \langle \nabla f(x,u ) , \zeta ^{(c)}_k \rangle \right] , y \right\rangle \end{aligned}$$
and hence we get
$$\begin{aligned} \mathbb {H} F(x,u,y)&= \sum _{k \in {\mathscr {R}}_c} \partial _\theta \lambda _k(x,u,\theta ) \left\langle \nabla f(x,u), \zeta ^{(c)}_k \right\rangle \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \left[ \lambda _k (x, u ,\theta ) \langle \nabla f(x,u ) , \zeta ^{(c)}_k \rangle \right] , y \right\rangle \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,u,\theta ) \varDelta _k \left\langle \nabla f(x,u), y \right\rangle . \end{aligned}$$
(42)
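The product-rule step used to combine the first and fourth terms can be spelled out for a fixed \(k \in {\mathscr {R}}_c\). As a sketch, assuming f is twice continuously differentiable in x so that its Hessian \(\varDelta f\) is symmetric, and suppressing the arguments \((x,u,\theta )\),
$$\begin{aligned} \nabla \left[ \lambda _k \langle \nabla f , \zeta ^{(c)}_k \rangle \right]&= \langle \nabla f , \zeta ^{(c)}_k \rangle \nabla \lambda _k + \lambda _k \varDelta f \zeta ^{(c)}_k , \\ \left\langle \nabla \left[ \lambda _k \langle \nabla f , \zeta ^{(c)}_k \rangle \right] , y \right\rangle&= \left\langle \nabla \lambda _k , y \right\rangle \left\langle \nabla f , \zeta ^{(c)}_k \right\rangle + \lambda _k \left\langle \varDelta f y , \zeta ^{(c)}_k \right\rangle . \end{aligned}$$
The second line uses the symmetry of the Hessian, \(\langle \varDelta f \zeta ^{(c)}_k , y \rangle = \langle \varDelta f y , \zeta ^{(c)}_k \rangle \), and summing over \(k \in {\mathscr {R}}_c\) gives the combined term in (42).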
Since \(y_\theta (0) = 0\) by (34), the function F vanishes at the initial state, and Dynkin’s formula gives
$$\begin{aligned} \mathbb {E}\left( F(x_\theta (T) , U_\theta (T) , y_\theta (T) ) \right) = \mathbb {E}\left[ \int _0^T \mathbb {H} F(x_\theta (t) , U_\theta (t) , y_\theta (t)){\text {d}}t \right] \end{aligned}$$
and substituting (42) yields
$$\begin{aligned}&\mathbb {E}\left[ \left\langle \nabla f (x_\theta (T) ,U_\theta (T) ) , y_\theta (T) \right\rangle \right] \\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] \\&\qquad +\sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \rangle \right] , y_\theta (t) \right\rangle {\text {d}}t \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) , y_\theta (t) \right\rangle {\text {d}}t \right] . \end{aligned}$$
This relation along with Proposition 2 proves Theorem 3. \(\square \)