
Sensitivity Analysis for Multiscale Stochastic Reaction Networks Using Hybrid Approximations

  • Special Issue: Gillespie and His Algorithms
  • Journal: Bulletin of Mathematical Biology

Abstract

We consider the problem of estimating parameter sensitivities for stochastic models of multiscale reaction networks. These sensitivity values are important for model analysis, and the methods that currently exist for sensitivity estimation mostly rely on simulations of the stochastic dynamics. This is problematic because such simulations become computationally infeasible for multiscale networks, whose reactions fire at several different timescales. However, it is often possible to exploit the multiscale property to derive a “model reduction” and approximate the dynamics as a piecewise deterministic Markov process (PDMP), a hybrid process consisting of both discrete and continuous components. The aim of this paper is to show that such PDMP approximations can be used to accurately and efficiently estimate the parameter sensitivities of the original multiscale stochastic model. We prove the convergence of the original sensitivity to the corresponding PDMP sensitivity in the limit where the PDMP approximation becomes exact. Moreover, we establish a representation of the PDMP parameter sensitivity that separates the contributions of the discrete and continuous components of the dynamics and allows both contributions to be estimated efficiently.

Notes

  1. The generator of a Markov process is an operator specifying the infinitesimal rate of change of the distribution of the process [see Chapter 4 in (Ethier and Kurtz 1986) for more details].
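
For reference, the generator \(\mathbb {A}\) of a Markov process \(( Z(t) )_{t \ge 0}\) can be described informally (setting aside domain questions, which are treated in the cited chapter) by

$$\begin{aligned} (\mathbb {A} f)(z) = \lim _{t \downarrow 0} \frac{ \mathbb {E}\left( f( Z(t) ) \, \vert \, Z(0) = z \right) - f(z) }{t} \end{aligned}$$

for functions f in its domain; through such expectations of test functions it determines the rate of change of the distribution mentioned above.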

References

  • Anderson DF (2007) A modified next reaction method for simulating chemical systems with time dependent propensities and delays. J Chem Phys 127(21):214107

  • Anderson D (2012) An efficient finite difference method for parameter sensitivities of continuous time markov chains. SIAM J Numer Anal 50(5):2237–2258

  • Anderson DA, Kurtz TG (2011) Continuous time Markov chain models for chemical reaction networks. In: Koeppl H, Setti G, di Bernardo M, Densmore D (eds) Design and analysis of biomolecular circuits. Springer, Berlin

  • Arkin AP, Rao CV, Wolf DM (2002) Control, exploitation and tolerance of intracellular noise. Nature 420:231–237. https://doi.org/10.1038/nature01258

  • Ball K, Kurtz TG, Popovic L, Rempala G (2006) Asymptotic analysis of multiscale approximations to reaction networks. Ann Appl Probab 16(4):1925–1961

  • Cao Y, Petzold LR, Rathinam M, Gillespie DT (2004) The numerical stability of leaping methods for stochastic simulation of chemically reacting systems. J Chem Phys 121(24):12169–12178

  • Cao Y, Gillespie DT, Petzold LR (2005) The slow-scale stochastic simulation algorithm. J Chem Phys 122(1):1–18

  • Cao Y, Gillespie DT, Petzold LR (2006) Efficient step size selection for the tau-leaping simulation method. J Chem Phys 124(4):044109

  • Crudu A, Debussche A, Radulescu O (2009) Hybrid stochastic simplifications for multiscale gene networks. BMC Syst Biol 3(1):89

  • Darden T (1979) A pseudo-steady state approximation for stochastic chemical kinetics. Rocky Mt J Math 9(1):51–71

  • Davis MHA (1993) Markov models and optimization, vol 49. Monographs on statistics and applied probability. Chapman & Hall, London

  • Duncan A, Erban R, Zygalakis K (2016) Hybrid framework for the simulation of stochastic chemical kinetics. J Comput Phys 326:398–419

  • Elowitz MB, Levine AJ, Siggia ED, Swain PS (2002) Stochastic gene expression in a single cell. Science 297(5584):1183–1186. https://doi.org/10.1126/science.1070919

  • Ethier SN, Kurtz TG (1986) Markov processes. Probability and mathematical statistics. Wiley series in probability and mathematical statistics. ISBN 0-471-08186-8. Characterization and convergence. Wiley, New York

  • Eymard R, Mercier S, Roussignol M (2011) Importance and sensitivity analysis in dynamic reliability. Methodol Comput Appl Probab 13(1):75–104

  • Feng X, Hooshangi S, Chen D, Li G, Weiss R, Rabitz H (2004) Optimizing genetic circuits by global sensitivity analysis. Biophys J 87(4):2195–2202

  • Fink M, Noble D (2009) Markov models for ion channels: versatility versus identifiability and speed. Philos Trans R Soc A Math Phys Eng Sci 367(1896):2161–2179

  • Ganguly A, Altintan D, Koeppl H (2015) Jump-diffusion approximation of stochastic reaction dynamics: error bounds and algorithms. Multiscale Model Simul 13(4):1390–1419

  • Gibson MA, Bruck J (2000) Efficient exact stochastic simulation of chemical systems with many species and many channels. J Phys Chem A 104(9):1876–1889

  • Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. J Phys Chem 81(25):2340–2361

  • Gillespie DT (2001) Approximate accelerated stochastic simulation of chemically reacting systems. J Chem Phys 115(4):1716–1733

  • Goutsias J (2007) Classical versus stochastic kinetics modeling of biochemical reaction systems. Biophys J 92(7):2350–2365

  • Gunawan R, Cao Y, Doyle FJ (2005) Sensitivity analysis of discrete stochastic systems. Biophys J 88(4):2530–2540

  • Gupta A, Khammash M (2013) Unbiased estimation of parameter sensitivities for stochastic chemical reaction networks. SIAM J Sci Comput 35(6):2598–2620

  • Gupta A, Khammash M (2014) An efficient and unbiased method for sensitivity analysis of stochastic reaction networks. J R Soc Interface 11(101):20140979

  • Gupta A, Rathinam M, Khammash M (2018) Estimation of parameter sensitivities for stochastic reaction networks using tau-leap simulations. SIAM J Numer Anal 56(2):1134–1167

  • Gupta A, Rathinam M, Khammash M (2017) Estimation of parameter sensitivities for stochastic reaction networks using tau-leap simulations. arXiv:1703.00947

  • Hepp B, Gupta A, Khammash M (2015) Adaptive hybrid simulations for multiscale stochastic reaction networks. J Chem Phys 142(3):034118

  • Kang H-W, Kurtz TG (2013) Separation of time-scales and model reduction for stochastic reaction networks. Ann Appl Probab 23(2):529–583

  • Kurtz TG (1978) Strong approximation theorems for density dependent Markov chains. ISSN 03044149

  • McAdams HH, Arkin A (1999a) It’s a noisy business! Genetic regulation at the nanomolar scale. TIG 15(2):65–69 (ISSN 0168-9525)

  • McAdams HH, Arkin A (1999b) It’s a noisy business! Genetic regulation at the nanomolar scale. TIG 15(2):65–69 (ISSN 0168-9525)

  • Michaelis L, Menten ML (2007) Die kinetik der invertinwirkung. Universitätsbibliothek Johann Christian Senckenberg

  • Plyasunov S, Arkin AP (2007) Efficient stochastic sensitivity analysis of discrete event systems. J Comput Phys 221:724–738

  • Rathinam M, Petzold LR, Cao Y, Gillespie DT (2003) Stiffness in stochastic chemically reacting systems: the implicit tau-leaping method. J Chem Phys 119(24):12784–12794

  • Rathinam M, Sheppard PW, Khammash M (2010) Efficient computation of parameter sensitivities of discrete stochastic chemical reaction networks. J Chem Phys 132(3):034103

  • Rudnicki R, Tyran-Kamińska M (2017) Piecewise deterministic processes in biological models. Springer, Berlin

  • Sheppard PW, Rathinam M, Khammash M (2012) A pathwise derivative approach to the computation of parameter sensitivities in discrete stochastic chemical systems. J Chem Phys 136(3):034115

  • Stelling J, Gilles ED, Doyle FJ (2004) Robustness properties of circadian clock architectures. Proc Natl Acad Sci USA 101(36):13210–13215

  • Thattai M, van Oudenaarden A (2001) Intrinsic noise in gene regulatory networks. Proc Natl Acad Sci 98(15):8614–8619

  • Weinan E, Liu D, Vanden-Eijnden E (2005) Nested stochastic simulation algorithm for chemical kinetic systems with disparate rates. J Chem Phys 123(19):1–8

  • Weinan E, Liu D, Vanden-Eijnden E (2007) Nested stochastic simulation algorithms for chemical kinetic systems with multiple time scales. J Comput Phys 221(1):158–180 (ISSN 0021-9991)

Acknowledgements

Funding was provided by the European Research Council (Grant no. 743269).

Author information

Corresponding author

Correspondence to Mustafa Khammash.

Appendix

In this Appendix, we prove the main results of the paper, namely Theorems 2 and 3. Throughout this section, we use the same notation as in Sect. 3. In particular, \(\varPsi _{t}(x,U,\theta )\) is defined by (15), and under Assumption 1 this function is differentiable w.r.t. x. We begin with a simple proposition.

Proposition 1

Let the multiscale process \(( Z^N_\theta (t) )_{t \ge 0}\) and the PDMP \(( Z_\theta (t) )_{t \ge 0}\) be as in Sect. 3. Suppose that Assumption 1 holds and \(Z^N_\theta \Rightarrow Z_\theta \) as \(N \rightarrow \infty \). Then,

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{ \partial }{ \partial \theta } \mathbb {E}( f( Z^{N}_\theta ( T ) ) ) = {\bar{S}}_\theta (f,T) \end{aligned}$$

where

$$\begin{aligned} {\bar{S}}_\theta (f,T)&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \varPsi _{T-t}(x_\theta (t),U_\theta (t),T-t) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _{k} \varPsi _{T-t}(x_\theta (t),U_\theta (t),T-t) {\text {d}}t \right] . \end{aligned}$$
(24)

Proof

Let

$$\begin{aligned} S^N_\theta (f,T) = \frac{ \partial }{ \partial \theta } \mathbb {E}( f( Z^{N}_\theta ( T ) ) ) \end{aligned}$$

and, analogously to \(\varPsi _t\) in (15), define the map \(\varPsi ^N_t\) as

$$\begin{aligned} \varPsi ^N_t (x,U,\theta ) = \mathbb {E}( f( Z^N_\theta (t) ) ), \quad \text {for any } t \ge 0, \end{aligned}$$

where \(Z^N\) is the scaled process describing the multiscale reaction dynamics with \((x,U)\) as its initial state. Due to Theorem 3.1 in Gupta et al. (2017), we obtain

$$\begin{aligned}&S^N_\theta (f,T) \\&= \sum _{ k =1}^K \mathbb {E}\left( N^{\rho _k + r } \int _0^T \partial _\theta \lambda ^N_k( Z^N_\theta (t) ,\theta ) ( \varPsi ^N_{T-t}( Z^N_\theta (t)+ \zeta ^N_k,\theta ) - \varPsi ^N_{T-t}( Z^N_\theta (t) , \theta ) ) {\text {d}}t \right) , \end{aligned}$$

where \(\zeta ^N_k:= \varLambda _N \zeta _k\), \(\rho _k = \beta _k + \langle \nu _k, \alpha \rangle \) and r is the timescale of observation (10). We can write \(Z^N_\theta (t) = (x^N_\theta (t), U^N_\theta (t) )\) where \(x^N_\theta (t) \in \mathbb {R}^{S_c}\) denotes the states of species in \({\mathscr {S}}_c\) and \(U^N_\theta (t) \in \mathbb {N}_0^{S_d}\) denotes the states of species in \({\mathscr {S}}_d\). Exploiting the analysis in Sect. 2.3, we can express \(S^N_\theta (f,T) \) as

$$\begin{aligned} S^N_\theta (f,T) = S^{N,c}_\theta (f,T) + S^{N,d}_\theta (f,T) +o(1) \end{aligned}$$

where the o(1) term converges to 0 as \(N \rightarrow \infty \),

$$\begin{aligned} S^{N,c}_\theta (f,T)&= \sum _{ k \in {\mathscr {R}}_c} \mathbb {E}\left( \int _0^T \partial _\theta \lambda _k( x^N_\theta (t), U^N_\theta (t) ,\theta ) N^{\rho _k + r } ( \varPsi ^N_{T-t}( x^N_\theta (t)\right. \\&\quad \left. +N^{-(\rho _k + r) } \zeta ^{(c)}_k , U^N_\theta (t) , \theta ) - \varPsi ^N_{T-t}( x^N_\theta (t), U^N_\theta (t) ,\theta ) ) {\text {d}}t \right) \end{aligned}$$

and

$$\begin{aligned} S^{N,d}_\theta (f,T)&= \sum _{ k \in {\mathscr {R}}_d}\mathbb {E}\left( \int _0^T \partial _\theta \lambda _k( x^N_\theta (t), U^N_\theta (t),\theta ) \right. \\&\qquad \left. ( \varPsi ^N_{T-t}( x^N_\theta (t), U^N_\theta (t)+ \zeta ^{(d)}_k,\theta ) - \varPsi ^N_{T-t}( x^N_\theta (t), U^N_\theta (t) , \theta ) ) {\text {d}}t \right) . \end{aligned}$$

We know that as \(N \rightarrow \infty \), the process \((x^N_\theta , U^N_\theta )\) converges in distribution to the process \((x_\theta , U_\theta )\) in the Skorohod topology on \(\mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0\). This ensures that for any \((x,U)\) and \(t \ge 0\), \(\varPsi ^N_t(x,U,\theta ) \rightarrow \varPsi _t(x,U,\theta )\) as \(N \rightarrow \infty \), and this convergence holds uniformly over compact sets, i.e.,

$$\begin{aligned} \lim _{N \rightarrow \infty } \sup _{ (x,U) \in C , t \in [0,T] } \left| \varPsi ^N_t(x,U,\theta ) - \varPsi _t(x,U,\theta ) \right| = 0 \end{aligned}$$
(25)

for any \(T >0\) and any compact set \(C \subset \mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0\). In fact, under Assumption 1 we also have

$$\begin{aligned} \lim _{N \rightarrow \infty } \sup _{ (x,U) \in C , t \in [0,T] } \left\| \nabla \varPsi ^N_t(x,U,\theta ) - \nabla \varPsi _t(x,U,\theta ) \right\| = 0. \end{aligned}$$
(26)

As \((x^N_\theta , U^N_\theta ) \Rightarrow (x_\theta , U_\theta )\), using (25), it is straightforward to conclude that

$$\begin{aligned}&\lim _{N \rightarrow \infty } S^{N,d}_\theta (f,T)\nonumber \\&= \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _{k} \varPsi _{T-t}(x_\theta (t),U_\theta (t),T-t) {\text {d}}t \right] . \end{aligned}$$
(27)

Noting that

$$\begin{aligned}&N^{\rho _k + r } \left[ \varPsi ^N_{T-t}\left( x^N_\theta (t)+ \frac{1}{ N^{\rho _k + r } }\zeta ^{(c)}_k , U^N_\theta (t) , \theta \right) - \varPsi ^N_{T-t}\left( x^N_\theta (t), U^N_\theta (t) ,\theta \right) \right] \\&\quad = \left\langle \nabla \varPsi ^N_{T-t}\left( x^N_\theta (t), U^N_\theta (t) ,\theta \right) , \zeta ^{(c)}_k \right\rangle +o(1), \end{aligned}$$

(26) allows us to obtain

$$\begin{aligned}&\lim _{N \rightarrow \infty } S^{N,c}_\theta (f,T) \\&\quad = {\sum }_{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \varPsi _{T-t}(x_\theta (t),U_\theta (t),T-t) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] . \end{aligned}$$

This relation along with (27) proves the proposition. \(\square \)

In light of Proposition 1, to prove Theorem 2 it suffices to show that \({\bar{S}}_\theta (f,T) = \hat{S}_\theta (f,T) \), where \(\hat{S}_\theta (f,T) \) is the sensitivity for the limiting PDMP \((Z_\theta (t))_{t \ge 0 }\) defined by

$$\begin{aligned} \hat{S}_\theta (f,T)&= \lim _{ h \rightarrow 0 } \frac{ \mathbb {E}( f(Z_{\theta +h} (T) ) ) - \mathbb {E}( f(Z_{\theta } (T) ) ) }{h} \nonumber \\&= \lim _{ h \rightarrow 0 } \frac{ \mathbb {E}( f(x_{\theta +h} (T), U_{\theta +h} (T) ) ) - \mathbb {E}( f(x_{\theta } (T), U_{\theta } (T) ) ) }{h}. \end{aligned}$$
(28)

The next proposition derives a formula for \(\hat{S}_\theta (f,T) \) by coupling processes \(Z_\theta = (x_\theta , U_\theta )\) and \(Z_{\theta +h} = (x_{\theta +h}, U_{\theta +h} )\). This formula will be useful later in proving both Theorems 2 and 3.

Proposition 2

Let \(y_\theta (t)\) be the solution of IVP (16) and let \(D_\theta \lambda _k( x_\theta (t), U_\theta (t) ,\theta )\) be given by (17). Then, the PDMP sensitivity \(\hat{S}_\theta (f,T) \) defined by (28) can be expressed as

$$\begin{aligned} \hat{S}_\theta (f,T)&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{T} \partial _{\theta } \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] \\&\quad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{T} \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] , y_\theta (t) \right\rangle {\text {d}}t \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{T} \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) , y_\theta (t) \right\rangle {\text {d}}t \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d } \mathbb {E}\left[ \int _0^T \partial _{\theta } \lambda _k (x_\theta (t) , U_\theta (t) ,\theta ) \varDelta _k \varPsi _{T- t}( x_\theta (t), U_\theta (t), \theta ) {\text {d}}t \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d } \mathbb {E}\left[ \int _0^T \left\langle \nabla \lambda _k (x_\theta (t) , U_\theta (t) ,\theta ), y_\theta (t) \right\rangle \varDelta _k \varPsi _{T- t}( x_\theta (t), U_\theta (t), \theta ) {\text {d}}t \right] . \end{aligned}$$

Proof

Analogously to the “split-coupling” introduced in Anderson (2012), we couple the PDMPs \(Z_\theta = (x_\theta , U_\theta )\) and \(Z_{\theta +h} = (x_{\theta +h} , U_{\theta +h} )\) as follows

$$\begin{aligned} x_\theta (t)&= x_0 + \sum _{k \in {\mathscr {R}}_c} \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right) \zeta ^{(c)}_k \\ x_{\theta +h}(t)&= x_0 + \sum _{k \in {\mathscr {R}}_c} \left( \int _{0}^t \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(c)}_k \\ U_\theta (t)&= U_0 \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(d)}_k \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y^{(1)}_k \left( \int _{0}^t \lambda ^{(1)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(d)}_k \\ U_{\theta +h}(t)&= U_0\\&\quad + \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(d)}_k \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y^{(2)}_k \left( \int _{0}^t \lambda ^{(2)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \zeta ^{(d)}_k, \end{aligned}$$

where \(a \wedge b\) denotes the minimum of a and b, \(\{ Y_k , Y^{ (1) }_k , Y^{ (2)}_{k}\}\) is a collection of independent unit-rate Poisson processes, and

$$\begin{aligned}&\lambda ^{(1)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) = \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \\&\quad - \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) \quad \text {and} \\&\lambda ^{(2)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) = \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s) ,\theta +h ) \\&\quad - \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k(x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ). \end{aligned}$$
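
To make this construction concrete, below is a minimal Python sketch of such a split coupling for a hypothetical toy network (one continuous species x and one binary gene state U, with made-up rate constants); it uses a fixed-step Euler/thinning discretization purely for illustration and is not the exact simulation scheme analyzed in the paper. The shared uniform draws of the common channel play the role of the common Poisson processes \(Y_k\) above.

```python
import numpy as np

# Hypothetical toy network (illustration only, not from the paper):
#   continuous species x : production at rate theta*U (stoichiometry +1),
#                          degradation at rate x      (stoichiometry -1)
#   discrete species  U  : 0 -> 1 at rate K_ON*(1-U),  1 -> 0 at rate K_OFF*U*x
K_ON, K_OFF = 0.5, 0.2

def drift(x, U, theta):
    # sum over continuous reactions of lambda_k * zeta_k^(c)
    return theta * U - x

def discrete_rates(x, U, theta):
    # propensities of the discrete reactions and their jump sizes for U
    return np.array([K_ON * (1 - U), K_OFF * U * x]), np.array([+1, -1])

def coupled_paths(theta, h, T, dt=1e-3, x0=1.0, U0=0, seed=None):
    """Split coupling of the theta and theta+h copies: each discrete reaction
    fires through a common channel at the minimum of its two propensities plus
    two residual channels (fixed-step Euler/thinning approximation)."""
    rng = np.random.default_rng(seed)
    x1, U1 = x0, U0                      # copy with parameter theta
    x2, U2 = x0, U0                      # copy with parameter theta + h
    for _ in range(int(T / dt)):
        r1, zeta = discrete_rates(x1, U1, theta)
        r2, _ = discrete_rates(x2, U2, theta + h)
        x1 += drift(x1, U1, theta) * dt          # Euler step for the continuous parts
        x2 += drift(x2, U2, theta + h) * dt
        common = np.minimum(r1, r2)
        for k in range(len(zeta)):
            u, c = rng.random(), common[k] * dt
            if u < c:                                     # common channel: both copies jump
                U1 += zeta[k]; U2 += zeta[k]
            elif u < c + (r1[k] - common[k]) * dt:        # residual channel 1: only copy 1 jumps
                U1 += zeta[k]
            elif u < c + (r1[k] - common[k]) * dt + (r2[k] - common[k]) * dt:
                U2 += zeta[k]                             # residual channel 2: only copy 2 jumps
    return (x1, U1), (x2, U2)

# Crude coupled finite-difference estimate of d/dtheta E[x(T)]
theta, h, T = 2.0, 1e-2, 5.0
pairs = [coupled_paths(theta, h, T, seed=i) for i in range(200)]
print("coupled finite-difference estimate:",
      np.mean([(z2[0] - z1[0]) / h for z1, z2 in pairs]))
```

With \(f(x,U) = x\), the printed value is a coupled finite-difference approximation of the quantity in (28); keeping the discrete jumps synchronized through the common channel until a residual channel fires is what keeps the difference of the two copies well behaved, which is the usual motivation for such couplings (cf. Anderson 2012).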

Define a stopping time as the first time that processes \(U_\theta \) and \(U_{ \theta +h}\) separate, i.e.,

$$\begin{aligned} \tau _h = \inf \{ t \ge 0: U_{\theta }(t) \ne U_{\theta +h}(t) \}. \end{aligned}$$

Observe that the generator for the PDMP \(Z_\theta = (x_\theta , U_\theta )\) is

$$\begin{aligned} \mathbb {A}_\theta g(x,u) = \sum _{ k \in {\mathscr {R}}_c} \lambda _k(x,u,\theta ) \left\langle \nabla g(x,u) , \zeta ^{(c)}_k \right\rangle + \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,u,\theta ) \varDelta _k g(x,u), \end{aligned}$$

where \(g : \mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0 \rightarrow \mathbb {R}\) is any function that is differentiable in the first \(S_c\) coordinates. Applying Dynkin’s formula, we obtain

$$\begin{aligned} \mathbb {E}\left( f(x_\theta (t) ,U_\theta (t) ) \right)&= f(x_0,U_0) + \mathbb {E}\left( \int _{0}^t \mathbb {A}_\theta f( x_\theta (s) , U_\theta (s) ) {\text {d}}s \right) \qquad \text {and} \\ \quad \mathbb {E}\left( f(x_{\theta +h}(t) ,U_{\theta +h}(t) ) \right)&= f(x_0,U_0) + \mathbb {E}\left( \int _{0}^t \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) {\text {d}}s \right) . \end{aligned}$$

The above coupling between processes \(Z_\theta = (x_\theta , U_\theta )\) and \(Z_{\theta +h} = (x_{\theta +h} , U_{\theta +h} )\) ensures that for \(0 \le s \le \tau _h\) we have \(U_{\theta +h}(s) = U_{\theta }(s)\) and \(x_{\theta +h}(s) = x_\theta (s) + h y_\theta (s) +o(h)\). Noting that \(\tau _h \rightarrow \infty \) a.s. as \(h \rightarrow 0\), we obtain

$$\begin{aligned}&\lim _{h \rightarrow 0 } \frac{1}{h} \left[ \mathbb {E}\left( \int _{0}^{\tau _h\wedge t} \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) {\text {d}}s \right) - \mathbb {E}\left( \int _{0}^{\tau _h\wedge t} \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) {\text {d}}s \right) \right] \nonumber \\&\quad = \lim _{h \rightarrow 0 } \frac{1}{h} \left[ \mathbb {E}\left( \int _{0}^{\tau _h\wedge t} \left[ \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta }(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right] {\text {d}}s \right) \right] \nonumber \\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{ t} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] \nonumber \\&\qquad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle \right] , y_\theta (s) \right\rangle {\text {d}}s \right] \nonumber \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s),U_\theta (s) ) \right] , y_\theta (s) \right\rangle {\text {d}}s \right] \nonumber \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \partial _\theta \lambda _k(x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s),U_\theta (s)) {\text {d}}s \right] . \end{aligned}$$
(29)

Let \(\sigma _0 = 0\) and for each \(i=1,2,\dots \) let \(\sigma _i\) denote the ith jump time of the process

$$\begin{aligned} \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta +h}(s), U_{\theta +h}(s), \theta +h ) {\text {d}}s \right) \end{aligned}$$

which counts the common jump times among processes \(U_\theta \) and \(U_{\theta +h}\). Observe that

$$\begin{aligned}&\lim _{h \rightarrow 0 } \frac{1}{h} \left[ \mathbb {E}\left( \int _{\tau _h\wedge t}^t \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) {\text {d}}s \right) - \mathbb {E}\left( \int _{\tau _h\wedge t}^t \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) {\text {d}}s \right) \right] \\&\quad = \sum _{i=0}^\infty \lim _{h \rightarrow 0 } \frac{1}{h} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \wedge t \le \tau _h < \sigma _{i+1} \wedge t \} } \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) \right. \right. \\&\left. \left. \qquad \qquad - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] . \end{aligned}$$

Recall the definition of \(D_\theta \lambda _k( x_\theta (t), U_\theta (t) ,\theta )\) from (17). We shall soon prove that

$$\begin{aligned}&\lim _{h \rightarrow 0 } \frac{1}{h} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \wedge t \le \tau _h < \sigma _{i+1} \wedge t \} } \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) )\right. \right. \nonumber \\&\left. \left. \qquad \qquad \qquad - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s\right] . \nonumber \\&\quad = \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ t \wedge \sigma _i}^{ t \wedge \sigma _{i +1} } D_\theta \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \left( \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) )\right. \right. \nonumber \\&\left. \left. \qquad \qquad \qquad - \varDelta _k f( x_\theta (s), U_\theta (s)) \right) {\text {d}}s \right] . \end{aligned}$$
(30)

Assuming this for now, we get

$$\begin{aligned}&\lim _{h \rightarrow 0 } \frac{1}{h} \left[ \mathbb {E}\left( \int _{\tau _h\wedge t}^t \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) {\text {d}}s \right) - \mathbb {E}\left( \int _{\tau _h\wedge t}^t \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) {\text {d}}s \right) \right] \\&\quad = \sum _{k \in {\mathscr {R}}_d} \sum _{i=0}^\infty \mathbb {E}\left[ \int _{t \wedge \sigma _i}^{ t \wedge \sigma _{i+1} } D_\theta \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \left( \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) \right. \right. \\&\left. \left. \qquad \qquad \qquad - \varDelta _k f( x_\theta (s), U_\theta (s)) \right) {\text {d}}s \right] \\&\quad = \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \partial _\theta \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right] \\&\qquad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \partial _\theta \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \varDelta _k f( x_\theta (s), U_\theta (s)) {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y_\theta ( s) \right\rangle \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right] \\&\qquad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{ t} \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y_\theta ( s) \right\rangle \varDelta _k f( x_\theta (s), U_\theta (s)) {\text {d}}s \right] . \end{aligned}$$

Combining this formula with (29), we obtain

$$\begin{aligned}&\hat{S}_\theta (f,t) \\&\quad = \lim _{h \rightarrow 0} \frac{ \mathbb {E}\left( f(x_{\theta +h} (t), U_{\theta +h} (t) ) \right) - \mathbb {E}\left( f(x_{\theta } (t), U_{\theta } (t) \right) }{h} \\&\quad = \lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \int _{0}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\quad = \lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \int _{0}^{t \wedge \tau _h } \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\qquad + \lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \int _{t \wedge \tau _h }^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle \right] , y_\theta (s) \right\rangle {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s),U_\theta (s) ) \right] , y_\theta (s) \right\rangle {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{t} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s) , U_\theta ( s) ) {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y_\theta ( s) \right\rangle \varDelta _k \varPsi _{t- s}( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right] \\&\qquad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varDelta _k f(x_\theta (s) , U_\theta ( s) ) {\text {d}}s \right] \\&\qquad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y_\theta ( s) \right\rangle \varDelta _k f(x_\theta (s) , U_\theta ( s) ) {\text {d}}s \right] . \end{aligned}$$

In the last expression, the fourth term cancels with the seventh term. Expanding the third term via the product rule \(\nabla (gh) = g \nabla h + h \nabla g\) produces two terms, one of which cancels with the last term, and we obtain the formula in the statement of this proposition. Therefore, the only remaining step is to show (30), which we do next.

Assume that \(x_\theta ( \sigma _i ) = x\), \(x_{\theta +h}( \sigma _i ) = x(h) = x + o(1)\), \(U_\theta (\sigma _i) =U_{\theta +h}(\sigma _i) =U\) and that the event \(\{\tau _h > \sigma _i\}\) holds. Given this information \({\mathscr {F}}_i\), the random time \(\delta _i = ( \tau _h -\sigma _i ) \wedge (\sigma _{i+1} -\sigma _i )\) has a distribution that satisfies

$$\begin{aligned} \mathbb {P}\left( \delta _i \le w \vert {\mathscr {F}}_i \right) = 1 - \exp \left( - \int _{0}^w \lambda _0( x_\theta (s + \sigma _i) , U ,\theta ) {\text {d}}s \right) + o(1), \quad \text {for } w \in [0, \infty ) \end{aligned}$$
(31)

where \(\lambda _0(x,U,\theta ) = \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,U,\theta )\). Given \(\delta _i = w\), the probability that event \(\{ (\sigma _{i+1} -\sigma _i ) > ( \tau _h -\sigma _i ) \}\) occurs (i.e., \(\delta _i = \tau _h -\sigma _i\)) and the perturbation reaction is \(k \in {\mathscr {R}}_d\) is simply

$$\begin{aligned}&\frac{1}{ \lambda _0(x_\theta ( \sigma _i +w) , U,\theta ) }\left| D_\theta \lambda _k(x_\theta (\sigma _i+w) , U,\theta ) \right| h +o(h). \end{aligned}$$

If \(D_\theta \lambda _k(x_\theta (\sigma _i+w) , U,\theta ) > 0\), then at time \(\tau _h\) process \(U_{ \theta +h}\) jumps by \(\zeta ^{ (d) }_k\), and if \(D_\theta \lambda _k(x_\theta (\sigma _i+w) , U,\theta ) <0\), process \(U_{ \theta }\) jumps by \(\zeta ^{ (d) }_k\). We will suppose that the first situation holds, but the other case can be handled similarly. Assuming \(w < (t - \sigma _i)\), we have

$$\begin{aligned}&\lim _{h \rightarrow 0} \mathbb {E}\left( \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \bigg \vert {\mathscr {F}}_i, \tau _h = \sigma _i + w ,k \right) \\&\quad = \varDelta _{k} \varPsi _{t - \sigma _i -w} ( x_\theta ( \sigma _i + w), U_\theta ( \sigma _i+w ), \theta ) - \varDelta _{k} f( x_\theta ( \sigma _i + w), U_\theta ( \sigma _i+w ) ) \\&\quad := G_k( x_\theta ( \sigma _i +w) , U_\theta ( \sigma _i +w ) , t - \sigma _i-w) \end{aligned}$$

and as \(\delta _i\) has distribution (31), we obtain

$$\begin{aligned}&\lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \wedge t \le \tau _h < \sigma _{i+1} \wedge t \} } \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \nonumber \\&\quad = \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \le t \} } \int _{0}^{t - \sigma _i} G_k( x_\theta ( \sigma _i +w) , U_\theta ( \sigma _i +w ) , t - \sigma _i-w) \right. \nonumber \\&\quad \left. D_\theta \lambda _k(x_\theta (\sigma _i+w) , U_\theta (\sigma _i+w),\theta ) \exp \left( -\int _{0}^w \lambda _0(x_\theta ( \sigma _i+s) , U_\theta (\sigma _i+s),\theta ) {\text {d}}s \right) dw \right] . \end{aligned}$$
(32)

Note that given \(\sigma _i < t\) and \({\mathscr {F}}_i\), the random variable \(\gamma _i = (t \wedge \sigma _{i +1} - t \wedge \sigma _{i })\) has probability density function given by

$$\begin{aligned} p(w) = \lambda _0(x_\theta ( \sigma _i+ w ) , U_\theta (\sigma _i+w),\theta ) \exp \left( -\int _{0}^w \lambda _0(x_\theta ( \sigma _i+ u ) , U_\theta (\sigma _i+u),\theta ) {\text {d}}u \right) , \end{aligned}$$

for \(w \in [0, t -\sigma _i)\) and \( \mathbb {P}\left( \gamma _i \le w \vert {\mathscr {F}}_i \right) =1\) if \(w \ge (t - \sigma _i)\). Letting

$$\begin{aligned}&G(s,t) = G_k( x_\theta ( s) , U_\theta ( s) , t - s) D_\theta \lambda _k(x_\theta (s) , U_\theta (s),\theta )\\&\quad \text {and} \quad P(w) = \int _{w}^\infty p(u){\text {d}}u = \exp \left( -\int _{0}^w \lambda _0(x_\theta ( \sigma _i+ u ) , U_\theta (\sigma _i+u),\theta ) {\text {d}}u \right) \end{aligned}$$

we have

$$\begin{aligned}&\mathbb {E}\left( \int _{t \wedge \sigma _i}^{ t \wedge \sigma _{i+1} } G(s,t) {\text {d}}s \bigg \vert {\mathscr {F}}_i , \sigma _i< t \right) = \mathbb {E}\left( \int _{0}^{ \gamma _i } G(s+\sigma _i,t) {\text {d}}s \bigg \vert {\mathscr {F}}_i , \sigma _i< t \right) \\&\quad = \mathbb {P}\left( \gamma _i \ge t - \sigma _i \bigg \vert {\mathscr {F}}_i , \sigma _i< t \right) \int _{0}^{t - \sigma _i} G(s+\sigma _i,t) {\text {d}}s \\&\qquad + \mathbb {E}\left( \mathrm{1l}_{ \{ 0 \le \gamma _i<(t - \sigma _i) \} } \int _{0}^{ \gamma _i } G(s+\sigma _i,t) {\text {d}}s \bigg \vert {\mathscr {F}}_i , \sigma _i < t \right) \\&\quad = P(t - \sigma _i) \int _{0}^{t - \sigma _i} G(s+\sigma _i,t) {\text {d}}s + \int _{0}^{t - \sigma _i} p(w) \left( \int _{0}^{ w } G(s+\sigma _i,t) {\text {d}}s \right) dw. \end{aligned}$$

Using integration by parts

$$\begin{aligned} \int _{0}^{t - \sigma _i} p(w) \left( \int _{0}^{ w } G(s+\sigma _i,t) {\text {d}}s \right) dw&= -P(t - \sigma _i)\left( \int _{0}^{ t -\sigma _i } G(s+\sigma _i,t) {\text {d}}s \right) \\&\quad +\int _{0}^{t - \sigma _i} P(w) G(w+\sigma _i,t) dw \end{aligned}$$

which shows that

$$\begin{aligned} \int _{0}^{t - \sigma _i} P(w) G(w+\sigma _i,t) {\text {d}}w = \mathbb {E}\left( \int _{t \wedge \sigma _i}^{ t \wedge \sigma _{i+1} } G(s,t) {\text {d}}s \bigg \vert {\mathscr {F}}_i , \sigma _i < t \right) . \end{aligned}$$

Substituting this expression in (32) gives us

$$\begin{aligned}&\lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\quad = \sum _{i=0}^\infty \lim _{h \rightarrow 0} \frac{1}{h} \mathbb {E}\left[ \mathrm{1l}_{ \{ \sigma _i \wedge t \le \tau _h < \sigma _{i+1} \wedge t \} } \int _{\tau _h\wedge t}^t \left( \mathbb {A}_{\theta +h} f( x_{\theta +h}(s) , U_{\theta +h}(s) ) \right. \right. \\&\left. \left. \qquad \qquad \qquad - \mathbb {A}_{\theta } f( x_{\theta }(s) , U_{\theta }(s) ) \right) {\text {d}}s \right] \\&\quad = \sum _{k \in {\mathscr {R}}_d} \sum _{i=0}^\infty \mathbb {E}\left[ \int _{ t \wedge \sigma _i}^{ t \wedge \sigma _{i +1} } G_k( x_\theta ( s) , U_\theta ( s) , t - s) D_\theta \lambda _k(x_\theta (s) , U_\theta (s),\theta ) {\text {d}}s \right] . \end{aligned}$$

This proves (30) and completes the proof of this proposition. \(\square \)

Define an \(S_c \times S_c\) matrix by

$$\begin{aligned} M(x,U,\theta ) = \sum _{k \in {\mathscr {R}}_c } \zeta ^{ (c) }_{k} ( \nabla \lambda _k(x,U,\theta ) )^* \end{aligned}$$

for any \((x , U, \theta ) \in \mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0 \times \mathbb {R}\), where \(v^*\) denotes the transpose of v. Let \(\varPhi (x_0,U_0,t)\) be the solution of the linear matrix-valued equation

$$\begin{aligned} \frac{{\text {d}} }{{\text {d}}t} \varPhi (x_0,U_0,t) = M(x_\theta (t) , U_\theta (t) ,\theta ) \varPhi (x_0,U_0,t) \end{aligned}$$
(33)

with \(\varPhi (x_0,U_0,0) = \mathbf{I}\), which is the \(S_c \times S_c\) identity matrix. Here \((x_0, U_0)\) denotes the initial state of \((x_\theta (t) ,U_\theta (t) )\). It can be seen that \(y_\theta (t)\), which is the solution of IVP (16), can be written as

$$\begin{aligned} y_\theta (t) = \sum _{k \in {\mathscr {R}}_c } \int _0^t \partial _\theta \lambda _k ( x_\theta (s) , U_\theta (s) , \theta ) \varPhi (x_\theta (s),U_\theta (s),t - s) \zeta _k^{(c)} {\text {d}}s. \end{aligned}$$
(34)
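
To verify this representation (a short check recorded here for completeness), fix a realization of \((x_\theta , U_\theta )\) and differentiate (34) with respect to t, using \(\varPhi (x_\theta (s),U_\theta (s),0) = \mathbf {I}\) together with (33) (the fundamental solution being propagated along that same path); for almost every t this gives

$$\begin{aligned} \frac{{\text {d}} y_\theta (t)}{{\text {d}}t} = \sum _{k \in {\mathscr {R}}_c } \partial _\theta \lambda _k ( x_\theta (t) , U_\theta (t) , \theta ) \zeta _k^{(c)} + M(x_\theta (t) , U_\theta (t) ,\theta ) \, y_\theta (t), \qquad y_\theta (0) = 0. \end{aligned}$$

By the definition of M, the second term equals \(\sum _{k \in {\mathscr {R}}_c } \left\langle \nabla \lambda _k ( x_\theta (t) , U_\theta (t) , \theta ) , y_\theta (t) \right\rangle \zeta _k^{(c)}\), so \(y_\theta \) solves a linear IVP of the same form as (37) below, with additional \(\partial _\theta \lambda _k\) forcing terms coming from the parameter perturbation.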

This shall be useful in proving the next proposition, which considers the sensitivity of \(\varPsi _t (x_0 , U_0 ,\theta )\) w.r.t. the initial value of the continuous state \(x_0\).

Proposition 3

Let \(\varPhi (x_0,U_0,t)\) be the matrix-valued function defined above. Then, we can express the gradient of \(\varPsi _t(x_0,U_0,\theta ) \) w.r.t. \(x_0\) as

$$\begin{aligned}&\nabla \varPsi _t(x_0,U_0,\theta ) = \nabla f(x_0,U_0) \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \varPhi ^*(x_0, U_0,s) \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle \right] {\text {d}}s \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{t} \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varPhi ^*(x_0, U_0,s) \nabla \left( \varDelta _k f(x_\theta (s),U_\theta (s) ) \right) {\text {d}}s \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \varPhi ^*(x_0, U_0,s) \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) \varDelta _k \varPsi _{t-s}( x_\theta ( s) , U_\theta ( s ) ,\theta ) {\text {d}}s \right] . \end{aligned}$$
(35)

Proof

To prove this proposition, it suffices to show that for any vector \(v \in \mathbb {R}^{S_c}\), the inner product of v with the l.h.s. of (35) is the same as the inner product of v with the r.h.s. of (35). Defining

$$\begin{aligned} y(t) = \varPhi (x_0, U_0,t) v \end{aligned}$$

our aim is to prove that

$$\begin{aligned}&\left\langle \nabla \varPsi _t(x_0,U_0,\theta ) , v \right\rangle = \left\langle \nabla f(x_0,U_0) , v \right\rangle \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^{t} \left\langle \nabla \left[ \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla f(x_\theta (s),U_\theta (s) ) , \zeta ^{(c)}_k \right\rangle \right] , y(s) \right\rangle {\text {d}}s \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^{t} \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (s),U_\theta (s) ) \right) , y(s) \right\rangle {\text {d}}s \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{ 0}^{ t } \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) , y(s) \right\rangle \varDelta _k \varPsi _{t-s}( x_\theta ( s) , U_\theta ( s ) ,\theta ) {\text {d}}s \right] . \end{aligned}$$
(36)

Note that y(t) solves the IVP

$$\begin{aligned} \frac{{\text {d}} y}{{\text {d}}t}&= \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x_\theta (t) , U_\theta (t) ,\theta ) , y(t) \right\rangle \zeta ^{(c)}_k \nonumber \\ \text {and}&\qquad y(0) = v, \end{aligned}$$
(37)

which shows that y(t) is the directional derivative of \(x_\theta (t)\) [see (12)] w.r.t. the initial state \(x_0\) in the direction v.

This proposition can be proved in the same way as Proposition 2, by coupling process \((x_\theta , U_\theta )\) with another process \((x_{\theta ,h} , U_{\theta ,h} )\) according to

$$\begin{aligned} x_\theta (t)&= x_0 + \sum _{k \in {\mathscr {R}}_c} \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s), \theta ) {\text {d}}s \right) \zeta ^{(c)}_k \\ x_{\theta ,h}(t)&= x_0 + h v + \sum _{k \in {\mathscr {R}}_c} \left( \int _{0}^t \lambda _k( x_{\theta , h}(s), U_{\theta , h}(s), \theta ) {\text {d}}s \right) \zeta ^{(c)}_k \\ U_\theta (t)&= U_0 + \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta , h}(s), U_{\theta ,h}(s), \theta ) {\text {d}}s \right) \zeta ^{(d)}_k \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y^{(1)}_k \left( \int _{0}^t \lambda ^{(1)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta , h}(s), U_{\theta , h}(s), \theta ) {\text {d}}s \right) \zeta ^{(d)}_k \\ U_{\theta ,h}(t)&= U_0 + \sum _{k \in {\mathscr {R}}_d} Y_k \left( \int _{0}^t \lambda _k( x_\theta (s), U_\theta (s) ,\theta ) \wedge \lambda _k( x_{\theta , h}(s), U_{\theta , h}(s), \theta ) {\text {d}}s \right) \zeta ^{(d)}_k \\&\quad + \sum _{k \in {\mathscr {R}}_d} Y^{(2)}_k \left( \int _{0}^t \lambda ^{(2)}_k( x_\theta (s), U_\theta (s) ,\theta , x_{\theta , h}(s), U_{\theta , h}(s), \theta ) {\text {d}}s \right) \zeta ^{(d)}_k, \end{aligned}$$

where \(\{ Y_k , Y^{ (1) }_k , Y^{ (2)}_{k}\}\) is a collection of independent unit-rate Poisson processes, and \(\lambda ^{(1)}_k\), \(\lambda ^{(2)}_k\) are as in the proof of Proposition 2. An important difference between this proposition and Proposition 2 is that the value of \(\theta \) is the same for the two coupled processes, and hence the only difference between them comes from the difference in the initial continuous state \(x_0\). Consequently, the \( \partial _\theta \lambda _k\) terms in the statement of Proposition 2 disappear and we obtain (36). \(\square \)

Proof

(Proof of Theorem 2) Define

$$\begin{aligned} L(t) = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^t \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \nabla \varPsi _{t-s}(x_\theta (s),U_\theta (s),t-s) , \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] . \end{aligned}$$

Due to Proposition 2, to prove Theorem 2 it suffices to prove that

$$\begin{aligned} L(T)&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] \nonumber \\&\quad +\sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \rangle \right] , y_\theta (t) \right\rangle {\text {d}}t \right] \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) , y_\theta (t) \right\rangle {\text {d}}t \right] \nonumber \\&\quad +\sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \left\langle \nabla \lambda _k(x_\theta ( t) ,U_\theta (t),\theta ) ,y_\theta (t) \right\rangle \varDelta _k \varPsi _{T- t}( x_\theta (t), U_\theta (t), \theta ) {\text {d}}t \right] . \end{aligned}$$
(38)

Let \(\{ {\mathscr {F}}_t \}\) be the filtration generated by process \((x_\theta , U_\theta )\). For any \(t \ge 0\), let \(\mathbb {E}_t (\cdot )\) denote the conditional expectation \(\mathbb {E}( \cdot \vert {\mathscr {F}}_t )\). Proposition 3 allows us to write

$$\begin{aligned}&\nabla \varPsi _{t - s}(x_\theta (s) , U_\theta (s), t-s ) \\&\quad = \nabla f(x_\theta (s) , U_\theta (s) ) \\&\qquad + \sum _{k \in {\mathscr {R}}_c} \int _{s}^{t} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s),u-s) \right. \\&\left. \qquad \qquad \nabla \left[ \lambda _k (x_\theta (u) ,U_\theta (u) ,\theta ) \left\langle \nabla f(x_\theta (u),U_\theta (u) ) , \zeta ^{(c)}_k \right\rangle \right] \right] {\text {d}}u \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \int _{s}^{t} \mathbb {E}_s \left[ \lambda _k (x_\theta (u) ,U_\theta (u) ,\theta ) \varPhi ^*(x_\theta (s), U_\theta (s),u-s) \nabla \left( \varDelta _k f(x_\theta (u),U_\theta (u) ) \right) \right] {\text {d}}u \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \int _{ s}^{ t } \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s), u -s) \right. \\&\quad \left. \qquad \qquad \nabla \lambda _k(x_\theta ( u) ,U_\theta (u),\theta ) \varDelta _k \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) \right] {\text {d}}u . \end{aligned}$$

This shows that

$$\begin{aligned}&\frac{{\text {d}}}{{\text {d}}t} \nabla \varPsi _{t - s}(x_\theta (s) , U_\theta (s), t-s )\\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s),t-s) \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}_s \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varPhi ^*(x_\theta (s), U_\theta (s),t-s) \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s), t -s) \nabla \lambda _k(x_\theta ( t) ,U_\theta (t),\theta ) \varDelta _k f( x_\theta ( t) , U_\theta (t ) ) \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \int _{ s}^{ t } \mathbb {E}_s \Bigg [ \varPhi ^*(x_\theta (s), U_\theta (s), u -s)\\&\qquad \qquad \nabla \lambda _k(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _k \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) \Bigg ] {\text {d}}u. \end{aligned}$$

The middle two terms can be combined using the product rule \(\nabla (gh) = g \nabla h + h \nabla g\) to yield

$$\begin{aligned}&\frac{{\text {d}}}{{\text {d}}t} \nabla \varPsi _{t - s}(x_\theta (s) , U_\theta (s), t-s ) \\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s),t-s) \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s),t-s) \nabla \left( \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \int _{ s}^{ t } \mathbb {E}_s \left[ \varPhi ^*(x_\theta (s), U_\theta (s), u -s) \right. \\&\left. \qquad \qquad \nabla \lambda _k(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _k \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) \right] {\text {d}}u. \end{aligned}$$

Using this, we can compute the time derivative of L(t) as

$$\begin{aligned} \frac{{\text {d}} L(t) }{{\text {d}}t} = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] + A+ B+ C, \end{aligned}$$
(39)

where

$$\begin{aligned}&A := \sum _{k \in {\mathscr {R}}_c} \sum _{j \in {\mathscr {R}}_c} \int _{0}^{t} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \varPhi ^*(x_\theta ( s) , U_\theta ( s) , t-s) \right. \right. \\&\left. \left. \qquad \qquad \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_j \rangle \right] , \zeta ^{(c)}_k \right\rangle \right] {\text {d}}s, \\&B:= \sum _{k \in {\mathscr {R}}_c} \sum _{j \in {\mathscr {R}}_d} \int _{0}^{t} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \right. \\&\left. \left\langle \varPhi ^*(x_\theta ( s) , U_\theta ( s) , t-s) \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _j f(x_\theta (t),U_\theta (t) ) \right] , \zeta ^{(c)}_k \right\rangle \right] {\text {d}}s \\&\qquad and \\&C: = \sum _{k \in {\mathscr {R}}_c} \sum _{j \in {\mathscr {R}}_d} \int _{0}^{t} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta )\left\langle \int _{ s}^{ t } \varPhi ^*(x_\theta (s), U_\theta (s), u -s) \right. \right. \\&\left. \left. \qquad \qquad \nabla \lambda _j(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _j \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) {\text {d}}u, \zeta ^{(c)}_k \right\rangle \right] {\text {d}}s. \end{aligned}$$

This definition of A, B and C ensures that

$$\begin{aligned}&A+ B+ C \\&\quad = \sum _{k \in {\mathscr {R}}_c} \int _{0}^{t} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \left\langle \frac{{\text {d}}}{{\text {d}}t} \nabla \varPsi _{t - s}(x_\theta (s) , U_\theta (s), t-s ), \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] . \end{aligned}$$

Recall that \(y_\theta (t)\) can be expressed as (34). Therefore, we can write A as

$$\begin{aligned} A&= \sum _{j \in {\mathscr {R}}_c} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_j \rangle \right] , \right. \right. \nonumber \\&\qquad \left. \left. \sum _{k \in {\mathscr {R}}_c} \int _{0}^{t} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varPhi (x_\theta ( s) , U_\theta ( s) , t-s) \zeta ^{(c)}_k \right\rangle {\text {d}}s \right] \nonumber \\&= \sum _{j \in {\mathscr {R}}_c} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_j \rangle \right] , y_\theta (t) \right\rangle \right] . \end{aligned}$$
(40)

Similarly, we can write B as

$$\begin{aligned} B = \sum _{j \in {\mathscr {R}}_d} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _j (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _j f(x_\theta (t),U_\theta (t) ) \right] , y_\theta (t) \right\rangle \right] . \end{aligned}$$
(41)

Changing the order of integration, we can write C as

$$\begin{aligned}&C= \sum _{j \in {\mathscr {R}}_d} \int _{0}^{t} \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _j \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u) ,\theta ) , \right. \right. \\&\qquad \left. \left. \sum _{k \in {\mathscr {R}}_c} \int _{ 0}^{ u} \partial _\theta \lambda _k (x_\theta (s) ,U_\theta (s) ,\theta ) \varPhi (x_\theta (s), U_\theta (s), u -s)\zeta ^{(c)}_k {\text {d}}s \right\rangle {\text {d}}u \right] \\&\quad = \sum _{j \in {\mathscr {R}}_d} \int _{0}^{t} \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta ( u) ,U_\theta (u),\theta ) \frac{{\text {d}}}{{\text {d}}t} \varDelta _j \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) , y_\theta (u)\right\rangle {\text {d}}u \right] \\&\quad = \sum _{j \in {\mathscr {R}}_d} \frac{{\text {d}}}{{\text {d}}t} \int _{0}^{t} \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta ( u) ,U_\theta (u),\theta )\varDelta _j \varPsi _{t-u}( x_\theta ( u) , U_\theta ( u ) ,\theta ) , y_\theta (u)\right\rangle {\text {d}}u \right] \\&\qquad - \sum _{j \in {\mathscr {R}}_d} \mathbb {E}\left[ \left\langle \nabla \lambda _j(x_\theta ( t) ,U_\theta (t),\theta )\varDelta _j f ( x_\theta ( t) , U_\theta ( t ) ) , y_\theta (t)\right\rangle \right] . \end{aligned}$$

This relation along with (40), (41) and (39) implies that

$$\begin{aligned} \frac{{\text {d}} L(t) }{{\text {d}}t}&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] \\&\quad +\sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \rangle \right] , y_\theta (t) \right\rangle \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d}\mathbb {E}\left[ \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \varDelta _k f(x_\theta (t),U_\theta (t) ) \right] , y_\theta (t) \right\rangle \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d} \frac{{\text {d}}}{{\text {d}}t} \int _{0}^t \mathbb {E}\left[ \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) \varDelta _k \varPsi _{t-s}( x_\theta ( s) , U_\theta ( s ) ,\theta ) ,y_\theta (s) \right\rangle {\text {d}}s \right] \\&\quad - \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \left\langle \nabla \lambda _k(x_\theta ( t) ,U_\theta (t),\theta )\varDelta _k f ( x_\theta ( t) , U_\theta ( t ) ) , y_\theta (t)\right\rangle \right] . \end{aligned}$$

Applying the product rule to the third term produces two terms, one of which cancels with the last term, yielding

$$\begin{aligned} \frac{{\text {d}} L(t) }{{\text {d}}t}&= \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle \right] \\&\quad +\sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \rangle \right] , y_\theta (t) \right\rangle \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d}\mathbb {E}\left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) , y_\theta (t) \right\rangle \right] \\&\quad + \sum _{k \in {\mathscr {R}}_d} \frac{{\text {d}}}{{\text {d}}t} \int _{0}^t \mathbb {E}\left[ \left\langle \nabla \lambda _k(x_\theta ( s) ,U_\theta (s),\theta ) \varDelta _k \varPsi _{t-s}( x_\theta ( s) , U_\theta ( s ) ,\theta ) ,y_\theta (s) \right\rangle {\text {d}}s \right] . \end{aligned}$$

Integrating this equation from \(t = 0\) to \(t =T\) will prove (38), and this completes the proof of Theorem 2. \(\square \)

Proof

(Proof of Theorem 3) Consider the Markov process \(( x_\theta (t) , U_\theta (t) , y_\theta (t) )_{t \ge 0}\). The generator of this process is given by

$$\begin{aligned} \mathbb {H} F(x,u,y)&= \sum _{k \in {\mathscr {R}}_c} \lambda _k(x,u,\theta ) \left\langle \nabla F(x,u,y), \zeta ^{(c)}_k \right\rangle + \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,u,\theta ) \varDelta _k F(x,u,y) \\&\quad + \sum _{k \in {\mathscr {R}}_c} \partial _\theta \lambda _k(x,u,\theta ) \left\langle \nabla _y F(x,u,y), \zeta ^{(c)}_k \right\rangle \\&\quad + \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x,u,\theta ) , y \right\rangle \left\langle \nabla _y F(x,u,y), \zeta ^{(c)}_k \right\rangle \end{aligned}$$

for any real-valued function \(F: \mathbb {R}^{S_c} \times \mathbb {N}^{S_d}_0 \times \mathbb {R}^{S_c} \rightarrow \mathbb {R}\). Here, \(\nabla _y F\) denotes the gradient of function F w.r.t. the last \(S_c\) coordinates. Setting

$$\begin{aligned} F(x,u,y) = \left\langle \nabla f(x,u), y \right\rangle \end{aligned}$$

we obtain

$$\begin{aligned} \mathbb {H} F(x,u,y)&= \sum _{k \in {\mathscr {R}}_c} \lambda _k(x,u,\theta ) \left\langle \varDelta f(x,u) y, \zeta ^{(c)}_k \right\rangle + \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,u,\theta ) \varDelta _k \left\langle \nabla f(x,u), y \right\rangle \\&\quad + \sum _{k \in {\mathscr {R}}_c} \partial _\theta \lambda _k(x,u,\theta ) \left\langle \nabla f(x,u), \zeta ^{(c)}_k \right\rangle \\&\quad + \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x,u,\theta ) , y \right\rangle \left\langle \nabla f(x,u), \zeta ^{(c)}_k \right\rangle \end{aligned}$$

where \(\varDelta F\) denotes the Hessian matrix of a function F w.r.t. the first \(S_c\) coordinates. Note that the first and the fourth terms can be combined using the product rule as

$$\begin{aligned}&\sum _{k \in {\mathscr {R}}_c} \lambda _k(x,u,\theta ) \left\langle \varDelta f(x,u) y, \zeta ^{(c)}_k \right\rangle +\sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \lambda _k(x,u,\theta ) , y \right\rangle \left\langle \nabla f(x,u), \zeta ^{(c)}_k \right\rangle \\&\quad = \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \left[ \lambda _k (x, u ,\theta ) \langle \nabla f(x,u ) , \zeta ^{(c)}_k \rangle \right] , y \right\rangle \end{aligned}$$

and hence we get

$$\begin{aligned} \mathbb {H} F(x,u,y)&= \sum _{k \in {\mathscr {R}}_c} \partial _\theta \lambda _k(x,u,\theta ) \left\langle \nabla f(x,u), \zeta ^{(c)}_k \right\rangle \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_c} \left\langle \nabla \left[ \lambda _k (x, u ,\theta ) \langle \nabla f(x,u ) , \zeta ^{(c)}_k \rangle \right] , y \right\rangle \nonumber \\&\quad + \sum _{k \in {\mathscr {R}}_d} \lambda _k(x,u,\theta ) \varDelta _k \left\langle \nabla f(x,u), y \right\rangle . \end{aligned}$$
(42)

Using Dynkin’s formula and noting that \(F(x_0, U_0, y_\theta (0)) = 0\) because \(y_\theta (0) = 0\), we have

$$\begin{aligned} \mathbb {E}\left( F(x_\theta (T) , U_\theta (T) , y_\theta (T) ) \right) = \mathbb {E}\left[ \int _0^T \mathbb {H} F(x_\theta (t) , U_\theta (t) , y_\theta (t)){\text {d}}t \right] \end{aligned}$$

and substituting (42) yields

$$\begin{aligned}&\mathbb {E}\left[ \left\langle \nabla f (x_\theta (T) ,U_\theta (T) ) , y_\theta (T) \right\rangle \right] \\&\quad = \sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \partial _\theta \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla f(x_\theta (t) ,U_\theta (t) ) , \zeta ^{(c)}_k \right\rangle {\text {d}}t \right] \\&\qquad +\sum _{k \in {\mathscr {R}}_c} \mathbb {E}\left[ \int _{0}^T \left\langle \nabla \left[ \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \langle \nabla f(x_\theta (t),U_\theta (t) ) , \zeta ^{(c)}_k \rangle \right] , y_\theta (t) \right\rangle {\text {d}}t \right] \\&\qquad + \sum _{k \in {\mathscr {R}}_d} \mathbb {E}\left[ \int _{0}^T \lambda _k (x_\theta (t) ,U_\theta (t) ,\theta ) \left\langle \nabla \left( \varDelta _k f(x_\theta (t),U_\theta (t) ) \right) , y_\theta (t) \right\rangle {\text {d}}t \right] . \end{aligned}$$

This relation along with Proposition 2 proves Theorem 3. \(\square \)
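
As a final illustration, the augmented process \((x_\theta , U_\theta , y_\theta )\) with generator \(\mathbb {H}\) can itself be simulated; the sketch below (same hypothetical toy network and fixed-step discretization as in the earlier sketch, shown only as an illustration and not as the paper's estimator) computes the quantity \(\mathbb {E}\left[ \left\langle \nabla f (x_\theta (T) ,U_\theta (T) ) , y_\theta (T) \right\rangle \right] \) appearing on the left of the last display, i.e., the contribution of the continuous dynamics; the remaining terms of Proposition 2, which involve \(\varDelta _k \varPsi \), require a separate estimation and are not computed here.

```python
import numpy as np

# Same hypothetical toy network as before: continuous x (production at rate
# theta*U, stoichiometry +1; degradation at rate x, stoichiometry -1) and a
# binary gene state U (on at rate K_ON*(1-U), off at rate K_OFF*U*x).
K_ON, K_OFF = 0.5, 0.2

def augmented_path(theta, T, dt=1e-3, x0=1.0, U0=0, seed=None):
    """Fixed-step simulation of (x, U, y).  Between the jumps of U, y follows
    dy/dt = sum_k [ d_theta lambda_k + <grad_x lambda_k, y> ] * zeta_k^(c),
    which for this toy model reduces to dy/dt = U - y; y itself never jumps."""
    rng = np.random.default_rng(seed)
    x, U, y = x0, U0, 0.0            # y(0) = 0: the initial state has no theta dependence
    for _ in range(int(T / dt)):
        r_on, r_off = K_ON * (1 - U), K_OFF * U * x
        u1, u2 = rng.random(2)
        if u1 < r_on * dt:           # thinning step for the discrete jumps of U
            U += 1
        elif u2 < r_off * dt:
            U -= 1
        x, y = x + (theta * U - x) * dt, y + (U - y) * dt
    return x, U, y

theta, T = 2.0, 5.0
paths = np.array([augmented_path(theta, T, seed=i) for i in range(500)])
# f(x, U) = x, so <grad f, y(T)> = y(T); average over sample paths
print("estimate of E[<grad f, y(T)>]:", paths[:, 2].mean())
```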

Cite this article

Gupta, A., Khammash, M. Sensitivity Analysis for Multiscale Stochastic Reaction Networks Using Hybrid Approximations. Bull Math Biol 81, 3121–3158 (2019). https://doi.org/10.1007/s11538-018-0521-4
