
Langevin and Kalman Importance Sampling for Nonlinear Continuous-Discrete State-Space Models

Continuous Time Modeling in the Behavioral and Related Sciences

Abstract

The likelihood function of a nonlinear continuous-discrete state-space model with state dependent diffusion function is computed by integrating out the latent variables with the help of Langevin sampling. The continuous-time paths are discretized on a time grid in order to obtain a finite-dimensional integration and densities w.r.t. Lebesgue measure. We use importance sampling, where the exact importance density is the conditional density of the latent states, given the measurements. This unknown density is either estimated from the sampler data or approximated by an estimated normal density. Then, new trajectories are drawn from this Gaussian measure. Alternatively, a Gaussian importance density is directly derived from an extended Kalman smoother with subsequent sampling of independent trajectories (extended Kalman sampling (EKS)). We compare the Monte Carlo results with numerical methods based on extended, unscented, and Gauss-Hermite Kalman filtering (EKF, UKF, GHF) and a grid-based solution of the Fokker-Planck equation between measurements. This comprises the repeated multiplication of transition matrices based on Euler transition kernels, finite differences, and discretized integral operators. The methods are illustrated for the geometrical Brownian motion and the Ginzburg-Landau model for phase transitions.


Notes

  1. Mother behaviors: look at infant, smile, vocalize, touch, or hold infant. Infant behaviors: look at mother, smile, vocalize, touch or hold mother, fuss/cry.

  2. In order to avoid misunderstandings, one must distinguish between (non)linearity in the continuous-time dynamical specification (differential equation) w.r.t. the state variables, and in the derived “exact discrete model” w.r.t. the parameters.

  3. W.r.t. the parameters.

  4. One has \(\int \delta (x-x') \phi (x')dx' =\phi (x)\) and \(\sum _{\rho '} \delta _{\rho \rho '}\phi _{\rho '}=\phi _{\rho }\).

  5. Otherwise, one can use the singular normal distribution (cf. Mardia et al. 1979, ch. 2.5.4, p. 41). In this case, the generalized inverse of \(\varOmega_{j}\) is used and the determinant \(|\cdot|\), which is zero, is replaced by the product of positive eigenvalues. Singular covariance matrices occur, for example, in autoregressive models of higher order, when the state vector contains derivatives of a variable.

  6. In statistical mechanics, one assumes the equivalence of time averages and ensemble averages (cross sections of identical systems).

  7. In the case of a state-dependent diffusion matrix, \(\eta_{j+1} = \eta_{j} + G(\eta_{j}, x_{j}, \psi)\delta W_{j}\) generates a more general martingale process. Expression (16.16) remains finite in a continuum limit (see Appendix 2).

  8. These are called irreducible diffusions. A transformation \(z = h(y)\) leading to unit diffusion for \(z\) must fulfil the system of differential equations \(h_{\alpha,\beta}\, g_{\beta\gamma} = \delta_{\alpha\gamma}\), \(\alpha, \beta = 1, \ldots, p\); \(\gamma = 1, \ldots, r\). The inverse transformation \(y = v(z)\) fulfills \(v_{\alpha,\gamma}(z) = g_{\alpha\gamma}(v(z))\). Thus \(v_{\alpha,\gamma\delta} = g_{\alpha\gamma,\epsilon}\, v_{\epsilon,\delta} = v_{\alpha,\delta\gamma} = g_{\alpha\delta,\epsilon}\, v_{\epsilon,\gamma}\). Inserting \(v\), one obtains the commutativity condition \(g_{\alpha \gamma, \epsilon } \; g_{\epsilon \delta }=g_{\alpha \delta ,\epsilon } \; g_{\epsilon \gamma }\), which is necessary and sufficient for reducibility. See Kloeden and Platen (1992, ch. 10, p. 348), Aït-Sahalia (2008).

Correspondence to Hermann Singer.

Appendices

Appendix 1: Langevin Sampler: Analytic Drift Function

16.1.1 Notation

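In the following, the components of vectors and matrices are denoted by Greek letters, e.g., \(f_{\alpha}\), \(\alpha = 1, \ldots, p\), and partial derivatives by commas, i.e., \(f_{\alpha,\beta} := \partial f_{\alpha}/\partial \eta_{\beta} = \partial_{\beta} f_{\alpha} = (f_{\eta})_{\alpha\beta}\). The Jacobian matrix \(\partial f/\partial \eta\) is written as \(f_{\eta}\) and its \(\beta\)th column as \((f_{\eta})_{\bullet\beta}\). Likewise, \(\varOmega_{\alpha\bullet}\) denotes row \(\alpha\) of matrix \(\varOmega_{\alpha\beta}\) and \(\varOmega_{\bullet\bullet} = \varOmega\) for short.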

Latin indices denote time, e.g., \(f_{j\alpha} = f_{\alpha}(\eta_{j})\). Furthermore, a sum convention is used for the Greek indices (i.e., \(f_{\alpha}g_{\alpha} = \sum_{\alpha} f_{\alpha}g_{\alpha}\)). The difference operators \(\delta = B^{-1} - 1\) and \(\nabla = 1 - B\), with the backshift \(B\eta_{j} = \eta_{j-1}\), are used frequently. One has \(\delta \cdot \nabla = B^{-1} - 2 + B := \Delta\) for the central second difference.
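The identity \(\delta \cdot \nabla = \Delta\) can be illustrated on a sampled path. The following is a minimal Python sketch; the function names and the test path \(\eta_{j} = j^{2}\) are illustrative assumptions, not part of the chapter.

```python
import numpy as np

def forward_diff(x):
    """delta x_j = x_{j+1} - x_j (one element shorter than x)."""
    return x[1:] - x[:-1]

def backward_diff(x):
    """nabla x_j = x_j - x_{j-1}, returned for j = 1..J."""
    return x[1:] - x[:-1]

def central_second_diff(x):
    """Delta x_j = x_{j+1} - 2 x_j + x_{j-1}, returned for j = 1..J-1."""
    return x[2:] - 2.0 * x[1:-1] + x[:-2]

# Hypothetical test path eta_j = j**2, whose second difference is constant.
eta = np.arange(5, dtype=float) ** 2

# delta applied to nabla eta: forward difference of the backward differences.
composed = forward_diff(backward_diff(eta))
direct = central_second_diff(eta)
```

Both arrays hold the second differences at the interior points \(j = 1, \ldots, J-1\); the two first-difference helpers compute the same numbers and differ only in the index they are attached to.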

16.1.2 Functional Derivatives

The functional \(\varPhi(y)\) may be expanded to first order by using the functional derivative \((\delta \varPhi /\delta y)(h) = \int (\delta \varPhi /\delta y(s)) h(s)ds\). One has \(\varPhi(y + h) - \varPhi(y) = (\delta\varPhi/\delta y)(h) + O(\|h\|^{2})\).

A discrete version is \(\varPhi(\eta) = \varPhi(\eta_{0}, \ldots, \eta_{J})\) and \(\varPhi(\eta + h) - \varPhi(\eta) = \sum_{j}[\partial\varPhi(\eta)/\partial(\eta_{j}\delta t)]h_{j}\delta t + O(\|h\|^{2})\). As a special case, consider the functional \(\varPhi(\eta) = \eta_{j}\). Since \(\eta _{j}+h_{j}-\eta _{j}=\sum_{k} (\delta _{jk}/\delta t) h_{k} \delta t\), one has the continuous analogue \(y(t)+h(t)-y(t)=\int \delta (t-s)h(s)ds\), thus \(\delta y(t)/\delta y(s) = \delta(t-s)\).

16.1.2.1 State-Independent Diffusion Coefficient

First we assume a state-independent diffusion coefficient \(\varOmega_{j} = \varOmega\), but later we set \(\varOmega_{j} = \varOmega(\eta_{j}, x_{j})\). This is important if the Lamperti transformation does not lead to constant coefficients in multivariate models (see Note 8). In components, the term (16.15) reads

$$\displaystyle \begin{aligned} \begin{array}{rcl} S_{0} &\displaystyle =&\displaystyle \frac{1}{2} \sum_{j=0}^{J-1} (\eta_{j+1; \beta}-\eta_{j\beta}) (\varOmega_{\beta \gamma} \delta t)^{-1} (\eta_{j+1; \gamma}- \eta_{j \gamma}), \end{array} \end{aligned} $$

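Note that \((\varOmega_{\beta\gamma}\delta t)^{-1} \equiv [(\varOmega\delta t)^{-1}]_{\beta\gamma}\) and that the semicolon in \(\eta_{j+1;\beta}\) serves to separate the indices; it is not a derivative. Differentiation w.r.t. the state \(\eta_{j}\) yields (\(j = 1, \ldots, J-1\))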

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial S_{0}/\partial(\eta_{j \alpha}\delta t) &\displaystyle =&\displaystyle -\varOmega_{\alpha \gamma}^{-1} \delta t^{-2} (\eta_{j+1; \gamma}-2\eta_{j\gamma}+ \eta_{j-1; \gamma}) \end{array} \end{aligned} $$
(16.40)

In vector notation, we have \(\partial S_{0}/\partial(\eta_{j}\delta t) = -\varOmega^{-1}\delta t^{-2}\Delta\eta_{j}\). On the boundaries \(j = 0\), \(j = J\) we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial S_{0}/\partial(\eta_{0 \alpha}\delta t) &\displaystyle =&\displaystyle -\varOmega_{\alpha \gamma}^{-1} \delta t^{-2} (\eta_{1\gamma}-\eta_{0\gamma}) \\ \partial S_{0}/\partial(\eta_{J \alpha}\delta t) &\displaystyle =&\displaystyle \varOmega_{\alpha \gamma}^{-1} \delta t^{-2} (\eta_{J\gamma}-\eta_{J-1;\gamma}) \end{array} \end{aligned} $$
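The interior formula (16.40) and the two boundary expressions can be checked against a finite-difference gradient of \(S_{0}\). The sketch below assumes a bivariate state, a hypothetical constant diffusion matrix, and a random test path; all names and numerical values are illustrative choices.

```python
import numpy as np

def s0(eta, omega, dt):
    """S_0 = 1/2 sum_j d_j' (Omega dt)^{-1} d_j with d_j = eta_{j+1} - eta_j."""
    d = np.diff(eta, axis=0)                      # shape (J, p)
    prec = np.linalg.inv(omega * dt)
    return 0.5 * np.einsum("jb,bc,jc->", d, prec, d)

def grad_s0_scaled(eta, omega, dt):
    """Analytic dS_0 / d(eta_j dt) for j = 0..J and constant Omega."""
    omega_inv = np.linalg.inv(omega)              # symmetric, so v @ inv == inv @ v
    d = np.diff(eta, axis=0)
    g = np.empty_like(eta)
    g[1:-1] = -(d[1:] - d[:-1]) @ omega_inv / dt**2   # interior: -Omega^{-1} dt^{-2} Delta eta_j
    g[0] = -d[0] @ omega_inv / dt**2                  # boundary j = 0
    g[-1] = d[-1] @ omega_inv / dt**2                 # boundary j = J
    return g

def numeric_grad(fun, x, h=1e-6):
    """Central finite-difference gradient of a scalar function of an array."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        e = np.zeros_like(x)
        e[idx] = h
        g[idx] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)
dt = 0.1
omega = np.array([[1.0, 0.3], [0.3, 2.0]])        # hypothetical diffusion matrix
eta = rng.normal(size=(6, 2))                     # J = 5, p = 2

analytic = grad_s0_scaled(eta, omega, dt)
numeric = numeric_grad(lambda x: s0(x, omega, dt), eta) / dt
max_abs_err = np.max(np.abs(analytic - numeric))
```

The division by `dt` in the numerical gradient reflects the scaling \(\partial/\partial(\eta_{j}\delta t)\) used throughout the appendix.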

Next, the derivatives of \(\log \alpha (\eta )\) are needed. One gets

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial S_{1}/\partial(\eta_{j \alpha}\delta t) &\displaystyle =&\displaystyle -\delta t^{-1}[f_{j\beta,\alpha}\varOmega_{\beta \gamma}^{-1}\delta \eta_{j\gamma} -\varOmega_{\alpha \gamma}^{-1}(f_{j\gamma}-f_{j-1;\gamma})] \end{array} \end{aligned} $$

or in vector form, using difference operators

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial S_{1}/\partial(\eta_{j\alpha}\delta t) &\displaystyle =&\displaystyle -\delta t^{-1}[f_{j\bullet,\alpha}^{\prime}\varOmega^{-1}\delta \eta_{j} -\varOmega^{-1}\delta f_{j-1}], \end{array} \end{aligned} $$
(16.41)

where \(f_{j\bullet,\alpha}\) is column \(\alpha\) of the Jacobian \(f_{\eta}(\eta_{j})\). The second term yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial S_{2}/\partial(\eta_{j \alpha}\delta t) &\displaystyle =&\displaystyle \partial/\partial\eta_{j \alpha}\frac{1}{2}[f_{j\beta}\varOmega_{\beta \gamma}^{-1}f_{j\gamma}] = f_{j\beta,\alpha}\varOmega_{\beta \gamma}^{-1}f_{j\gamma}\\ &\displaystyle =&\displaystyle f_{j\bullet,\alpha}^{\prime}\varOmega^{-1}f_{j}. \end{array} \end{aligned} $$
(16.42)

Finally, one has to determine the drift component corresponding to the measurements, which is contained in the conditional density p(z|η). Since it was assumed that the error of measurement is Gaussian (see 16.2), we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} p(z|\eta) &\displaystyle =&\displaystyle \prod_{i=0}^{T} p(z_{i}|\eta_{j_{i}}) = \prod_{i=0}^{T} \phi(z_{i};h_i,R_{i}), \end{array} \end{aligned} $$

where ϕ(y;μ, Σ) is the multivariate Gaussian density, \(h_i=h(\eta _{j_{i}},x_{j_{i}})\) is the output function and \(R_{i}=R(x_{j_{i}})\) is the measurement error covariance matrix. Thus the derivative reads (matrix form in the second line)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial \log p(z|\eta)/\partial(\eta_{j \alpha}\delta t) &\displaystyle =&\displaystyle \sum_{i=0}^{T}h_{i\gamma,\alpha}R_{i\beta\gamma}^{-1}(z_{i\beta}-h_{i\beta}) (\delta_{j j_{i}}/\delta t)\\ &\displaystyle =&\displaystyle \sum_{i=0}^{T}h_{i\bullet,\alpha}^{\prime}R_{i}^{-1}(z_{i}-h_{i}) (\delta_{j j_{i}}/\delta t) \end{array} \end{aligned} $$
(16.43)

The Kronecker symbol \(\delta _{j j_{i}}\) only gives contributions at the measurement times \(t_{i}=\tau _{j_{i}}\). Together we obtain for the drift of the Langevin equation (16.14)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta_{\eta}\log p(\eta|z) &\displaystyle =&\displaystyle \delta_{\eta}[\log p(z|\eta) + \log p(\eta)]\\ &\displaystyle =&\displaystyle \text{(16.43)} - (\text{16.40} +\text{16.41}+\text{16.42}) + \delta_{\eta}\log p(\eta_{0}). \end{array} \end{aligned} $$
(16.44)

Here, p(η 0) is an arbitrary density for the initial latent state.
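To make the role of this drift concrete, the following Python sketch performs Euler-Maruyama steps of a discretized Langevin equation in the fictitious time \(u\). Several points are assumptions: equation (16.14) is not reproduced in this excerpt, so the noise scaling \(\sqrt{2\,\delta u/\delta t}\) is taken from the remark in Appendix 2 that the cylindrical Wiener process is the continuum limit of \(W_{j}(u)/\sqrt{\delta t}\); the example uses a scalar state with \(f = 0\), no measurements, and a hypothetical initial density \(\eta_{0} \sim N(0, s_{0}^{2})\). Function names and parameter values are illustrative.

```python
import numpy as np

def drift_wiener(eta, omega, dt, s0):
    """Scaled gradient d log p(eta) / d(eta_j dt) for scalar eta with f = 0,
    no measurements, and an assumed initial density eta_0 ~ N(0, s0**2)."""
    d = np.diff(eta)                               # d[j] = eta_{j+1} - eta_j
    g = np.empty_like(eta)
    g[1:-1] = (d[1:] - d[:-1]) / (omega * dt**2)   # minus (16.40)
    g[0] = d[0] / (omega * dt**2) - eta[0] / (s0**2 * dt)
    g[-1] = -d[-1] / (omega * dt**2)
    return g

def langevin_step(eta, drift, du, dt, rng):
    """One Euler-Maruyama step in u; the noise scaling sqrt(2 du / dt) is an assumption."""
    noise = rng.normal(size=eta.shape)
    return eta + drift(eta) * du + np.sqrt(2.0 * du / dt) * noise

rng = np.random.default_rng(1)
omega, dt, s0, du = 1.0, 0.1, 1.0, 1e-3            # hypothetical values
drift = lambda e: drift_wiener(e, omega, dt, s0)

eta = np.zeros(11)                                 # J = 10, start at the zero path
for _ in range(200):
    eta = langevin_step(eta, drift, du, dt, rng)
```

The step size `du` is chosen small relative to \(\varOmega\,\delta t^{2}\) because the explicit Euler scheme is only stable when `du` times the largest eigenvalue of the discretized second-difference operator stays below 2.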

16.1.2.2 State-Dependent Diffusion Coefficient

In the case of Ω j = Ω(η j, x j) the expressions get more complicated. The derivative of S 0 now reads

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial S_{0}/\partial(\eta_{j \alpha}\delta t) &\displaystyle =&\displaystyle \delta t^{-2}[\varOmega_{j-1;\alpha \beta}^{-1} \delta\eta_{j-1; \beta} -\varOmega_{j\alpha \beta}^{-1} \delta\eta_{j \beta} \\ &\displaystyle &\displaystyle +\frac{1}{2} \delta\eta_{j \beta}\varOmega_{j\beta\gamma,\alpha }^{-1} \delta\eta_{j \gamma}], \end{array} \end{aligned} $$
(16.45)

where \(\varOmega _{j\beta \gamma ,\alpha }^{-1}\equiv (\varOmega ^{-1})_{j\beta \gamma ,\alpha }\). A closer relation to expression (16.40) may be obtained by the Taylor expansion

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \varOmega_{j-1;\alpha \beta}^{-1} &\displaystyle =&\displaystyle \varOmega_{j\alpha \beta}^{-1} + \varOmega_{j\alpha \beta,\gamma}^{-1} (\eta_{j-1;\gamma}-\eta_{j\gamma})+O(\|\delta\eta_{j-1}\|{}^{2}) \end{array} \end{aligned} $$
(16.46)

leading to

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial S_{0}/\partial(\eta_{j \alpha}\delta t) &\displaystyle =&\displaystyle -\varOmega_{j\alpha \beta}^{-1} \delta t^{-2} (\eta_{j+1; \beta}-2\eta_{j \beta}+ \eta_{j-1; \beta}) \\ &\displaystyle &\displaystyle - \varOmega_{j\alpha \beta,\gamma}^{-1} \delta t^{-2}\delta\eta_{j-1; \beta}\delta\eta_{j-1; \gamma} +O(\delta t^{-2}\|\delta\eta_{j-1}\|{}^{3})\\ &\displaystyle &\displaystyle + \frac{1}{2} \varOmega_{j \beta \gamma, \alpha}^{-1} \delta t^{-2}\delta\eta_{j\beta}\delta\eta_{j\gamma}. \end{array} \end{aligned} $$
(16.47)

In the state-dependent case, the derivative of the Jacobian term \(\log Z^{-1}=-\frac {1}{2} \sum _{j} \log |2\pi\varOmega_{j}\delta t|\) is also needed. Since the derivative of a log determinant is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial \log|\varOmega| /\partial\varOmega_{\alpha\beta} &\displaystyle =&\displaystyle \varOmega_{\beta \alpha}^{-1}, \end{array} \end{aligned} $$

one obtains

$$\displaystyle \begin{aligned} \begin{array}{rcl} \partial \log Z^{-1}/\partial(\eta_{j \alpha}\delta t) &\displaystyle =&\displaystyle -\frac{1}{2} \delta t^{-1}\varOmega_{j\beta\gamma}^{-1} \varOmega_{j\beta\gamma,\alpha} = -\frac{1}{2} \delta t^{-1} \mbox{tr}[\varOmega_{j}^{-1} \varOmega_{j,\alpha}], \end{array} \end{aligned} $$

where \(\varOmega_{j,\alpha} = \varOmega_{j\bullet\bullet,\alpha}\) for short. Using the formula \(\varOmega _{j} \varOmega _{j}^{-1} = I\), which implies \(\varOmega _{j,\alpha } = -\varOmega _{j} \varOmega ^{-1}_{j,\alpha } \varOmega _{j}\), we find

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial \log Z^{-1}/\partial(\eta_{j \alpha}\delta t) = \frac{1}{2} \delta t^{-1} \mbox{tr}[\varOmega^{-1}_{j,\alpha} \varOmega_{j} ]. \end{array} \end{aligned} $$
(16.48)
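The two trace identities behind (16.48) can be verified numerically. The sketch below assumes a scalar state \(x\) and a hypothetical symmetric positive definite \(2\times 2\) diffusion matrix \(\varOmega(x)\); the derivative of \(\varOmega\) is coded analytically, while the derivatives of \(\varOmega^{-1}\) and \(\log|\varOmega|\) are taken by central differences so that the check is not circular.

```python
import numpy as np

def omega(x):
    """Hypothetical state-dependent diffusion matrix (symmetric positive definite)."""
    return np.array([[1.0 + x**2, 0.3 * x],
                     [0.3 * x, 2.0 + np.sin(x)]])

def omega_x(x):
    """Analytic derivative of omega(x) with respect to x."""
    return np.array([[2.0 * x, 0.3],
                     [0.3, np.cos(x)]])

x, h = 0.7, 1e-5
om = omega(x)
om_inv = np.linalg.inv(om)

# Central differences for the derivative of the inverse and of log|Omega|.
om_inv_x = (np.linalg.inv(omega(x + h)) - np.linalg.inv(omega(x - h))) / (2.0 * h)
logdet_x = (np.linalg.slogdet(omega(x + h))[1]
            - np.linalg.slogdet(omega(x - h))[1]) / (2.0 * h)

trace_direct = np.trace(om_inv @ omega_x(x))     # tr[Omega^{-1} Omega_{,alpha}]
trace_inverse = -np.trace(om_inv_x @ om)         # -tr[Omega^{-1}_{,alpha} Omega]
```

All three quantities `logdet_x`, `trace_direct`, and `trace_inverse` should agree up to the finite-difference error.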

The contributions of S 1 and S 2 are now (see 16.16)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} &\displaystyle {}{}{}&\displaystyle {\partial S_{1}/\partial(\eta_{j \alpha}\delta t)=} \\ &\displaystyle &\displaystyle -\delta t^{-1}[f_{j\beta,\alpha}\varOmega_{j\beta \gamma}^{-1}\delta \eta_{j\gamma} -(\varOmega_{j\alpha \gamma}^{-1}f_{j\gamma} - \varOmega_{j-1;\alpha \gamma}^{-1}f_{j-1;\gamma}) +f_{j\beta}\varOmega_{j\beta\gamma,\alpha}^{-1}\delta\eta_{j\gamma}] \end{array} \end{aligned} $$
(16.49)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial S_{2}/\partial(\eta_{j \alpha}\delta t) &\displaystyle =&\displaystyle f_{j\beta,\alpha}\varOmega_{j\beta \gamma}^{-1}f_{j\gamma}+ \frac{1}{2} f_{j\beta}\varOmega_{j\beta \gamma,\alpha}^{-1}f_{j\gamma}. \end{array} \end{aligned} $$
(16.50)

It is interesting to compare the terms in (16.45, 16.49, 16.50) depending on the derivative \(\varOmega _{j\beta \gamma ,\alpha }^{-1}\), which read in vector form

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \frac{1}{2} \delta t^{-2} \mbox{tr}[\varOmega_{j, \alpha}^{-1} \delta\eta_{j}\delta\eta_{j}^{\prime}] -\delta t^{-1} \mbox{tr}[\varOmega_{j,\alpha}^{-1}\delta\eta_{j}f_{j}^{\prime}] +\frac{1}{2} \mbox{tr}[\varOmega_{j,\alpha}^{-1}f_{j}f_{j}^{\prime}], \end{array} \end{aligned} $$

and the Jacobian derivative (16.48). The terms can be collected to yield

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \frac{1}{2} \delta t^{-2} \mbox{tr}\{\varOmega^{-1}_{j,\alpha} [\varOmega_{j}\delta t - (\delta\eta_{j}-f_{j}\delta t)(\delta\eta_{j}-f_{j}\delta t)^{\prime}]\}, \end{array} \end{aligned} $$
(16.51)

as may be directly seen from the Lagrangian (16.7).

In summary, the Langevin drift component for \(j = 0, \ldots, J\); \(\alpha = 1, \ldots, p\) is, in vector-matrix form,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta_{\eta_{j\alpha}}\log p(\eta|z) &\displaystyle =&\displaystyle \delta_{\eta_{j\alpha}}[\log p(z|\eta) + \log p(\eta)] \\ &\displaystyle =&\displaystyle \sum_{i=0}^{T} h_{i\bullet,\alpha}^{\prime} R_{i}^{-1}(z_{i}-h_{i}) (\delta_{j j_{i}}/\delta t)\\ &\displaystyle &\displaystyle + \delta t^{-2}[ \varOmega_{j\alpha\bullet}^{-1} \delta\eta_{j} -\varOmega_{j-1;\alpha\bullet}^{-1} \delta\eta_{j-1}]\\ &\displaystyle &\displaystyle + \delta t^{-1}[f_{j\bullet,\alpha}^{\prime}\varOmega_{j}^{-1}\delta \eta_{j} -(\varOmega_{j\alpha\bullet}^{-1}f_{j} - \varOmega_{j-1;\alpha\bullet}^{-1}f_{j-1})]\\ &\displaystyle &\displaystyle - f_{j\bullet,\alpha}^{\prime}\varOmega_{j}^{-1}f_{j} \\ &\displaystyle &\displaystyle + \frac{1}{2} \delta t^{-2} \mbox{tr}\{\varOmega^{-1}_{j,\alpha} [\varOmega_{j}\delta t - (\delta\eta_{j}-f_{j}\delta t)(\delta\eta_{j}-f_{j}\delta t)^{\prime}]\}\\ &\displaystyle &\displaystyle + \delta_{\eta_{j\alpha}}\log p(\eta_{0}). \end{array} \end{aligned} $$
(16.52)

Here, \(h_{i\bullet,\alpha}\) is column \(\alpha\) of the Jacobian \(h_{\eta }(\eta _{j_{i}})\), \(\varOmega _{j\alpha \bullet }^{-1}\) is row \(\alpha\) of \(\varOmega(\eta_{j})^{-1}\), \(\varOmega _{j,\alpha }^{-1}:=\varOmega _{j \bullet \bullet ,\alpha }^{-1}\), and \(f_{j\bullet,\alpha}\) denotes column \(\alpha\) of the Jacobian \(f_{\eta}(\eta_{j})\).
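As a plausibility check of (16.52), the prior part of the drift (without the measurement and initial-density terms) can be compared with a finite-difference gradient of the Euler log-density. The sketch below assumes a scalar geometric Brownian motion, \(f(y) = \mu y\) and \(\varOmega(y) = \sigma^{2}y^{2}\), with hypothetical parameter values and a hand-picked positive test path; only interior grid points \(j = 1, \ldots, J-1\) are compared.

```python
import numpy as np

MU, SIGMA, DT = 0.07, 0.2, 0.1                    # hypothetical parameter values

def f(y):
    return MU * y

def f_y(y):
    return MU * np.ones_like(y)

def om(y):
    return (SIGMA * y) ** 2

def om_inv_y(y):
    """Derivative of Omega(y)^{-1} = 1 / (sigma^2 y^2) with respect to y."""
    return -2.0 / (SIGMA**2 * y**3)

def log_prior(eta):
    """Euler log-density log p(eta_1, ..., eta_J | eta_0)."""
    r = np.diff(eta) - f(eta[:-1]) * DT
    v = om(eta[:-1]) * DT
    return np.sum(-0.5 * np.log(2.0 * np.pi * v) - 0.5 * r**2 / v)

def drift_interior(eta):
    """Scalar form of (16.52) for j = 1..J-1, prior terms only."""
    d = np.diff(eta)                              # d[j] = eta_{j+1} - eta_j
    fj, oi = f(eta[:-1]), 1.0 / om(eta[:-1])      # values at j = 0..J-1
    y = eta[1:-1]
    r = d[1:] - fj[1:] * DT
    t0 = (oi[1:] * d[1:] - oi[:-1] * d[:-1]) / DT**2
    t1 = (f_y(y) * oi[1:] * d[1:] - (oi[1:] * fj[1:] - oi[:-1] * fj[:-1])) / DT
    t2 = -f_y(y) * oi[1:] * fj[1:]
    t3 = 0.5 * om_inv_y(y) * (om(y) * DT - r**2) / DT**2   # term (16.51)
    return t0 + t1 + t2 + t3

def numeric_grad(fun, x, h=1e-6):
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (fun(x + e) - fun(x - e)) / (2.0 * h)
    return g

eta = np.array([1.0, 1.05, 0.98, 1.10, 1.07, 1.15])       # positive test path
analytic = drift_interior(eta)
numeric = numeric_grad(log_prior, eta)[1:-1] / DT
```

Agreement of `analytic` and `numeric` indicates that the collected trace term (16.51) correctly combines the derivative of the Jacobian with the \(\varOmega^{-1}_{,\alpha}\) contributions of \(S_{0}\), \(S_{1}\), and \(S_{2}\) in this scalar setting.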

Appendix 2: Continuum Limit

The expressions in the main text were obtained by using an Euler discretization of the SDE (16.1), so in the limit \(\delta t \to 0\), one expects convergence of \(\eta_{j}\) to the true state \(y(\tau_{j})\) (see Kloeden and Platen 1999, ch. 9). Likewise, the \((J+1)p\)-dimensional Langevin equation (16.14) for \(\eta(u)\) will be an approximation of the stochastic partial differential equation (SPDE) for the random field \(Y_{\alpha}(u,t)\) on the temporal grid \(\tau_{j} = t_{0} + j\delta t\).

A rigorous theory (assuming constant diffusion matrices) is presented in the work of Reznikoff and Vanden-Eijnden (2005); Hairer et al. (2005, 2007); Apte et al. (2007); Hairer et al. (2011). In this section, we attempt to derive the terms that this literature obtains by functional derivatives directly from the discretization, especially in the case of state-dependent diffusions. Clearly, the finite-dimensional densities w.r.t. Lebesgue measure lose their meaning in the continuum limit, but the idea is to use large but finite \(J\), so that the Euler densities \(p(\eta_{0}, \ldots, \eta_{J})\) are good approximations of the unknown finite-dimensional densities \(p(y_{0}, \tau_{0}; \ldots; y_{J}, \tau_{J})\) of the process \(Y(t)\) (cf. Stratonovich 1971, 1989, Bagchi 2001 and the references cited therein).

16.1.1 Constant Diffusion Matrix

First we consider constant and (nonsingular) diffusion matrices Ω. The Lagrangian (16.15) attains the formal limit (Onsager-Machlup functional)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} S &\displaystyle =&\displaystyle \frac{1}{2} \int dy(t)^{\prime} (\varOmega dt)^{-1}dy(t) \end{array} \end{aligned} $$
(16.53)
$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle - \int f(y)^{\prime} \varOmega^{-1} dy(t) + \frac{1}{2} \int f(y)^{\prime} \varOmega^{-1} f(y) dt. {} \vspace{-2pt}\end{array} \end{aligned} $$
(16.54)

If \(y(t)\) is a sample function of the diffusion process \(Y(t)\) in (16.1), the first term (16.53) does not exist, since the quadratic variation \(dy(t)dy(t)^{\prime} = \varOmega dt\) is of order \(dt\). Thus we have \(dy(t)^{\prime}(\varOmega dt)^{-1}dy(t) = \mbox{tr}[(\varOmega dt)^{-1} dy(t)dy(t)^{\prime}] = \mbox{tr}[I_{p}] = p\) for each increment, so that the sum over all increments grows without bound as \(\delta t \to 0\). Usually, (16.53) is written as the formal expression \(\frac {1}{2} \int \dot {y}(t)^{\prime } \varOmega ^{-1} \dot {y}(t) dt\), which contains the (nonexisting) derivatives \(\dot {y}(t)\). Moreover, partial integration yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} -\frac{1}{2} \int {y}(t)^{\prime}\varOmega^{-1}\ddot{y}(t) dt \vspace{-2pt}\end{array} \end{aligned} $$
(16.55)

so that \(C^{-1}(t,s) = \varOmega^{-1}(-\partial^{2}/\partial t^{2})\delta(t-s)\) is the kernel of the inverse covariance (precision) operator of \(Y(t)\) (for drift \(f = 0\), i.e., a Wiener process). Indeed, since

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \partial^{2}/\partial t^{2} \min(t,s)=-\delta(t-s), \end{array} \end{aligned} $$
(16.56)

the covariance operator kernel C(t, s) is

$$\displaystyle \begin{aligned} \begin{array}{rcl} C(t,s) &\displaystyle =&\displaystyle \varOmega(-\partial^{2}/\partial t^{2})^{-1}\delta(t-s) = \varOmega \min(t,s). \end{array} \end{aligned} $$

Thus, \(p(y)\propto \exp [-\frac {1}{2} \int {y}(t)^{\prime }\varOmega ^{-1}\ddot {y}(t) dt]\) is the formal density of a Gaussian process Y (t) ∼ N(0, C).
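On a finite grid, the statement that the negative second derivative is the precision operator of the Wiener process can be checked directly. The sketch below assumes a scalar process with \(\eta_{0} = 0\) fixed, so that the precision matrix of \((\eta_{1}, \ldots, \eta_{J})\) implied by \(S_{0}\) is a scaled tridiagonal second-difference matrix with a free right boundary; grid size and parameter values are illustrative.

```python
import numpy as np

J, dt, omega = 6, 0.25, 1.5                       # hypothetical grid and diffusion

# Precision matrix of (eta_1, ..., eta_J) given eta_0 = 0, read off from S_0:
# tridiagonal (-1, 2, -1) with last diagonal entry 1, scaled by 1 / (omega dt).
precision = 2.0 * np.eye(J) - np.eye(J, k=1) - np.eye(J, k=-1)
precision[-1, -1] = 1.0
precision /= omega * dt

# Covariance kernel of the scaled Wiener process on the grid t_j = j dt.
t = dt * np.arange(1, J + 1)
cov_wiener = omega * np.minimum.outer(t, t)

cov_from_precision = np.linalg.inv(precision)
```

The matrix `cov_from_precision` reproduces \(\varOmega\min(t_{j}, t_{k})\), the discrete counterpart of \(C(t,s)\).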

In contrast, the terms in (16.54) are well defined and yield the Radon-Nikodym derivative (cf. 16.17)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \alpha(y) &\displaystyle =&\displaystyle \exp\Big\{\int f(y)^{\prime} \varOmega^{-1} dy(t) - \frac{1}{2} \int f(y)^{\prime} \varOmega^{-1} f(y) dt\Big\}. \vspace{-2pt}\end{array} \end{aligned} $$
(16.57)

This expression can be obtained as the ratio of the finite-dimensional density functions \(p(y_{J}, \tau_{J}, \ldots, y_{1}, \tau_{1}|y_{0}, \tau_{0})\) for drifts \(f\) and \(f = 0\), respectively, in the limit \(\delta t \to 0\) (cf. Wong and Hajek 1985, ch. 6, p. 215 ff.). In this limit, the (unknown) exact densities can be replaced by the Euler densities (16.5). Now, the terms of the Langevin equation (16.14) will be given. We start with the measurement term (16.43), \(\alpha = 1, \ldots, p\)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta \log p(z|y)/\delta y_{\alpha}(t) &\displaystyle =&\displaystyle \sum_{i=0}^{T} h_{i\bullet,\alpha}^{\prime} R_{i}^{-1}(z_{i}-h_{i}) \delta(t-t_{i}) \end{array} \end{aligned} $$
(16.58)

where the scaled Kronecker delta \((\delta _{j j_{i}}/\delta t)\) was replaced by the delta function (see Appendix 1). Clearly, in numerical implementations, a certain term of the delta sequence δ n(t) must be used (cf. Lighthill 1958). Next, the term stemming from the driftless part (16.40) is

$$\displaystyle \begin{aligned} \begin{array}{rcl} -\delta S_{0}/\delta y_{\alpha}(t) &\displaystyle =&\displaystyle \varOmega^{-1}_{\alpha\bullet} \ddot{y}(t) = \varOmega^{-1}_{\alpha\bullet} y_{tt}(t), \vspace{2pt}\end{array} \end{aligned} $$

or \(\varOmega^{-1}y_{tt}(t)\) in matrix form, which corresponds to (16.55). The contributions of \(S_{1}\) are (cf. 16.41)

$$\displaystyle \begin{aligned} \begin{array}{rcl} -\delta S_{1}/\delta y_{\alpha}(t) &\displaystyle =&\displaystyle f(y)_{\beta,\alpha}\varOmega_{\beta \gamma}^{-1}dy_{\gamma}(t)/dt -\varOmega_{\alpha \gamma}^{-1} df_{\gamma}(y)/dt. \vspace{2pt}\end{array} \end{aligned} $$

The first term is of Itô form. Transformation to Stratonovich calculus (Apte et al. 2007, sects. 4, 9) yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} h_{\alpha\beta} dy_{\beta} &\displaystyle =&\displaystyle h_{\alpha\beta} \circ dy_{\beta} - \frac{1}{2} {h}_{\alpha\beta,\gamma}\varOmega_{\beta\gamma}dt \end{array} \end{aligned} $$
(16.59)
$$\displaystyle \begin{aligned} \begin{array}{rcl} df_{\alpha} &\displaystyle =&\displaystyle f_{\alpha,\beta}dy_{\beta} + \frac{1}{2} f_{\alpha,\beta\gamma}\varOmega_{\beta\gamma}dt = f_{\alpha,\beta}\circ dy_{\beta}{} \vspace{2pt}\end{array} \end{aligned} $$
(16.60)

Thus, we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} -\delta S_{1}/\delta y_{\alpha}(t) &\displaystyle =&\displaystyle f(y)_{\beta,\alpha}\varOmega_{\beta \gamma}^{-1}\circ dy_{\gamma}(t)/dt -\frac{1}{2} f(y)_{\beta,\alpha\beta}\\ &\displaystyle &\displaystyle - \varOmega_{\alpha \gamma}^{-1} f(y)_{\gamma,\delta}\circ dy_{\delta}(t)/dt\\ &\displaystyle =&\displaystyle (f_{y}^{\prime}\varOmega^{-1}-\varOmega^{-1}f_{y})\circ y_{t}(t) -\frac{1}{2} \partial_{y}[\partial_{y}\cdot f(y)] \vspace{2pt}\end{array} \end{aligned} $$

where \(\partial_{y}\cdot f(y) = f_{\beta,\beta} = \mbox{div}(f)\). Finally we have (cf. 16.42)

$$\displaystyle \begin{aligned} \begin{array}{rcl} -\delta S_{2}/\delta y(t) &\displaystyle =&\displaystyle -f_{y}^{\prime}\varOmega^{-1}f \vspace{2pt}\end{array} \end{aligned} $$

and \(\delta _{y(t)}\log p(y(t_{0}))=\partial _{y_{0}}\log p(y_{0})\delta (t-t_{0})\). Putting all together, one finds the Langevin drift functional (in matrix form)

$$\displaystyle \begin{aligned} \begin{array}{rcl} -\frac{\delta \varPhi(y|z)}{\delta y(t)} &\displaystyle :=&\displaystyle F(y|z) \\ &\displaystyle =&\displaystyle \sum_{i=0}^{T} h_{iy}^{\prime}(y) R_{i}^{-1}(z_{i}-h_{i}(y)) \delta(t-t_{i})\\ &\displaystyle &\displaystyle + \varOmega^{-1} y_{tt}+(f_{y}^{\prime}\varOmega^{-1}-\varOmega^{-1}f_{y})\circ y_{t}\\ &\displaystyle &\displaystyle - \frac{1}{2} \partial_{y}[\partial_{y}\cdot f(y)]-f_{y}^{\prime}\varOmega^{-1}f \\ &\displaystyle &\displaystyle + \partial_{y_{0}}\log p(y_{0})\delta(t-t_{0}) \end{array} \end{aligned} $$

and the SPDE (cf. Hairer et al. 2007)

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} dY(u,t) &\displaystyle =&\displaystyle F(Y(u,t)|z) du +\sqrt{2}\; dW_{t}(u,t), \end{array} \end{aligned} $$
(16.61)

where \(W_{t}(u,t) = \partial_{t}W(u,t)\) is a cylindrical Wiener process with \(E[W_{t}(u,t)] = 0\), \(E[W_{t}(u,t) W_{s}(v,s)^{\prime }]=I_{p}\min (u,v)\delta (t-s)\), and \(W(u,t)\) is a Wiener field (Brownian sheet). See, e.g., Jetschke (1986); Da Prato and Zabczyk (1992, ch. 4.3.3). The cylindrical Wiener process may be viewed as the continuum limit of \(W_{j}(u)/\sqrt {\delta t}\), \(E[W_{j}(u)/\sqrt {\delta t} \; W^{\prime }_{k}(v)/\sqrt {\delta t}]=I_{p}\min (u,v)\delta t^{-1}\delta _{jk}\).
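The Itô-to-Stratonovich correction (16.59) used in this derivation can be illustrated with discrete sums. The sketch below assumes the simplest scalar case \(h(y) = y\) with \(y\) a scaled Wiener path, so that (16.59) predicts a difference of \(\frac{1}{2}\varOmega T\) between the symmetrized and the left-point sum; the seed, the number of steps, and the value of \(\varOmega\) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
omega, T, n = 0.5, 1.0, 100_000                   # hypothetical values
dt = T / n

# Increments and path of the scaled Wiener process, y(0) = 0.
dy = rng.normal(scale=np.sqrt(omega * dt), size=n)
y = np.concatenate(([0.0], np.cumsum(dy)))

ito = np.sum(y[:-1] * dy)                         # left-point (Ito) sum
strat = np.sum(0.5 * (y[:-1] + y[1:]) * dy)       # symmetrized (Stratonovich) sum

# Algebraically, strat - ito = 0.5 * sum(dy**2), which approximates 0.5 * omega * T.
correction = strat - ito
```

The symmetrized sum telescopes to \(\frac{1}{2}y(T)^{2}\), which is the ordinary chain rule, whereas the left-point sum differs from it by half the realized quadratic variation.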

16.1.2 State-Dependent Diffusion Matrix

In this case, new terms appear. Starting with the first term in (16.47), one gets

$$\displaystyle \begin{aligned} \begin{array}{rcl} -\varOmega_{j\alpha \beta}^{-1} \delta t^{-2} (\eta_{j+1; \beta}-2\eta_{j \beta}+ \eta_{j-1; \beta}) &\displaystyle \rightarrow&\displaystyle -\varOmega(y(t))^{-1} \circ \ddot{y}(t). \end{array} \end{aligned} $$

The second term in (16.47) contains terms of the form \(h_{j}(\eta_{j} - \eta_{j-1})\), which appear in a backward Itô integral. Here we attempt to write them in symmetrized (Stratonovich) form. It turns out that the Taylor expansion (16.46) must be carried to higher orders. Writing (for simplicity in scalar form)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \varOmega_{j-1}^{-1} \delta\eta_{j-1} -\varOmega_{j}^{-1} \delta\eta_{j} &\displaystyle :=&\displaystyle h_{j-1}\delta\eta_{j-1}-h_{j}\delta\eta_{j} \vspace{-2pt}\end{array} \end{aligned} $$

and expanding around η j

$$\displaystyle \begin{aligned} \begin{array}{rcl} h_{j-1} &\displaystyle =&\displaystyle h_{j} + \sum_{k=1}^{\infty} \frac{1}{k!} h_{j,k} (\eta_{j-1}-\eta_{j})^{k} \vspace{-2pt}\end{array} \end{aligned} $$

one obtains

$$\displaystyle \begin{aligned} \begin{array}{rcl} h_{j-1}\delta\eta_{j-1}-h_{j}\delta\eta_{j} &\displaystyle =&\displaystyle h_{j}(\delta\eta_{j-1}-\delta\eta_{j}) + \sum_{k=1}^{\infty} \frac{(-1)^{k}}{k!} h_{j,k} \delta \eta_{j-1}^{k+1}. \vspace{-2pt}\end{array} \end{aligned} $$
(16.62)

To obtain a symmetric expression, \(h_{j,k}\) is expanded around \(\eta _{j-1/2}:=\frac {1}{2}(\eta _{j-1}+\eta _{j})\). Noting that \(\eta _{j}-\eta _{j-1/2}=\frac {1}{2} \delta \eta _{j-1}\), we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} h_{j,k} &\displaystyle =&\displaystyle \sum_{l=0}^{\infty} \frac{(\frac{1}{2})^l}{l!} h_{j-1/2,k+l} \delta \eta_{j-1}^{l} \end{array} \end{aligned} $$
(16.63)

and together

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} h_{j}(\delta\eta_{j-1}-\delta\eta_{j})+ \sum_{k=1,l=0}^{\infty} \frac{(-1)^{k}(\frac{1}{2})^l}{k! \; l!} h_{j-1/2,k+l} \delta \eta_{j-1}^{k+l+1}. \end{array} \end{aligned} $$
(16.64)

Multiplying with \(\delta t^{-2}\) and collecting terms to order \(O(\delta t^{2})\), one gets the continuum limit

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} -\varOmega^{-1} \circ \ddot{y} - \varOmega^{-1}_{\eta}\circ \dot{y}^{2} - \tfrac{1}{24} \varOmega^{-1}_{\eta\eta\eta} \varOmega^{2}. \end{array} \end{aligned} $$
(16.65)

The last term in (16.47) is absorbed in the expression (16.51).

The continuum limit of the first two terms in the derivative of \(S_{1}\) (see (16.49)) is

$$\displaystyle \begin{aligned} \begin{array}{rcl} - f(y)_{\beta,\alpha}\varOmega(y)_{\beta \gamma}^{-1}dy_{\gamma}(t)/dt + d[\varOmega(y)_{\alpha \gamma}^{-1} f_{\gamma}(y)]/dt. \end{array} \end{aligned} $$

Transforming to Stratonovich calculus (16.59), (16.60) yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} -\{f(y)_{\beta,\alpha}\varOmega(y)_{\beta \gamma}^{-1} -[\varOmega(y)_{\alpha \beta}^{-1} f_{\beta}(y)]_{, \gamma}\} \circ dy_{\gamma}(t)/dt \\ +\frac{1}{2} [f(y)_{\beta,\alpha}\varOmega(y)_{\beta \gamma}^{-1}]_{,\delta}\varOmega_{\gamma\delta}. \end{array} \end{aligned} $$
(16.66)

Equation (16.50) yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta S_{2}/\delta y_\alpha(t) &\displaystyle =&\displaystyle f(y)_{\beta,\alpha}\varOmega(y)_{\beta \gamma}^{-1}f(y)_{\gamma}+ \frac{1}{2} f(y)_{\beta}\varOmega(y)_{\beta \gamma,\alpha}^{-1}f(y)_{\gamma}. \end{array} \end{aligned} $$
(16.67)

The last term to be discussed is (16.51). Formally,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \frac{1}{2} \delta t^{-2} \mbox{tr}&\displaystyle &\displaystyle \{\varOmega^{-1}_{,\alpha} [\varOmega dt - (dy-f dt)(dy-f dt)^{\prime}]\} \\ &\displaystyle &\displaystyle =\frac{1}{2} \mbox{tr}\{\varOmega^{-1}_{,\alpha} [\varOmega \delta t^{-1} - (\dot{y}-f)(\dot{y}-f)^{\prime}]\} . \end{array} \end{aligned} $$
(16.68)

From the quadratic variation formula \((dy - fdt)(dy - fdt)^{\prime} = \varOmega dt\), it seems that it can be dropped. But setting \(\delta \eta _{j}-f_{j}\delta t = g_{j} z_{j}\sqrt {\delta t}\) (from the Euler scheme, see (16.3)), one gets

$$\displaystyle \begin{aligned} \begin{array}{rcl} X := \frac{1}{2} \delta t^{-1} \mbox{tr}\{\varOmega^{-1}_{j,\alpha} \varOmega_{j} \;(I- z_{j} z_{j}^{\prime})\} \end{array} \end{aligned} $$

In scalar form, one has \(X := \frac {1}{2} \delta t^{-1} \varOmega ^{-1}_{j,\alpha } \varOmega _{j} \;(1- z_{j}^{2})\), which is (up to shift and scale) \(\chi _{1}^{2}\)-distributed, conditionally on \(\eta_{j}\). One has \(E[1 - z^{2}] = 0\) and \(\mbox{Var}(1 - z^{2}) = 1 - 2 + 3 = 2\), thus \(E[X] = 0\) and \(\mbox{Var}[X]=\frac {1}{2} \delta t^{-2}E[\varOmega ^{-2}_{j,\alpha }\varOmega _{j}^{2}]\).
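The moments quoted here follow from \(E[z^{2}] = 1\) and \(E[z^{4}] = 3\) for a standard normal \(z\). The sketch below confirms them with Gauss-Hermite quadrature, which is exact for the low-degree polynomials involved; the constants `c` (standing in for \(\varOmega^{-1}_{j,\alpha}\varOmega_{j}\) at a fixed \(\eta_{j}\)) and `dt` are hypothetical values.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Nodes and weights for the weight function exp(-z**2 / 2); normalizing by
# sqrt(2 pi) turns the weights into standard normal probabilities.
nodes, weights = hermegauss(10)
weights = weights / np.sqrt(2.0 * np.pi)

def expect(g):
    """E[g(z)] for z ~ N(0, 1), exact for polynomials up to degree 19."""
    return np.sum(weights * g(nodes))

mean_term = expect(lambda z: 1.0 - z**2)
var_term = expect(lambda z: (1.0 - z**2) ** 2) - mean_term**2

# Conditional variance of X = 0.5 * c * (1 - z**2) / dt for fixed eta_j.
c, dt = -0.8, 0.1
var_x = expect(lambda z: (0.5 * c * (1.0 - z**2) / dt) ** 2)
```

Since the conditional variance of \(X\) is proportional to \(\delta t^{-2}\), the term does not vanish as \(\delta t \to 0\) even though its conditional mean is zero.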

Therefore, the drift functional in the state-dependent case is

$$\displaystyle \begin{aligned} \begin{array}{rcl} -\frac{\delta \varPhi(y|z)}{\delta y(t)} &\displaystyle :=&\displaystyle F(y|z) \\ &\displaystyle =&\displaystyle (\text{16.58})-(\text{16.65})-(\text{16.66})-(\text{16.67})+(\text{16.68})\\ &\displaystyle &\displaystyle +\partial_{y_{0}}\log p(y_{0})\delta(t-t_{0}) \end{array} \end{aligned} $$

16.1.3 Discussion

The second-order time derivative (diffusion term w.r.t. \(t\)) \(\varOmega^{-1}y_{tt}\) in the SPDE (16.61) resulted from the first term (16.53) in the Lagrangian corresponding to the driftless process (random walk process). Usually this term, which is infinite in the continuum limit, is not considered; it is removed by computing a density ratio (16.17), which leads to a well-defined Radon-Nikodym density (16.54). On the other hand, the term is necessary to obtain the correct SPDE. Starting from the Radon-Nikodym density (16.57) for the process \(dY(t) = fdt + GdW(t)\) at the outset, it is not quite clear how to construct the appropriate SPDE. Setting for simplicity \(f = 0\) and dropping the initial condition and the measurement part, Eq. (16.61) reads

$$\displaystyle \begin{aligned} \begin{array}{rcl} dY(u,t) &\displaystyle =&\displaystyle \varOmega^{-1} \;Y_{tt}(u,t) du +\sqrt{2}\; dW_{t}(u,t). \end{array} \end{aligned} $$

This linear equation (Ornstein-Uhlenbeck process) can be solved using a stochastic convolution as (\(A:=\varOmega ^{-1}\partial _t^2\))

$$\displaystyle \begin{aligned} \begin{array}{rcl} Y(u,t) &\displaystyle =&\displaystyle \exp(A u) Y(0,t) + \int_{0}^{u} \exp(A (u-s)) \sqrt{2}\; dW_{t}(s,t). \end{array} \end{aligned} $$

(cf. Da Prato 2004, ch. 2). It is a Gaussian process with mean \(\mu (u) = \exp (A u) E[Y(0)]\) and variance \(Q(u)=\exp (A u) \mbox{Var}(Y(0)) \exp (A^{*} u) + \int _{0}^{u} \exp (A s) 2 \exp (A^{*} s) ds\), where \(A^{*}\) is the adjoint of \(A\). Thus the stationary distribution (\(u \to \infty\)) is the Gaussian measure \(N(0, Q(\infty))\) with \(Q(\infty )=-A^{-1}=-\varOmega \cdot [\partial _{t}^{2}]^{-1}\), since \(A = A^{*}\). But this coincides with \(C(t,s)=\varOmega \min (t,s)\), the covariance function of the scaled Wiener process \(G \cdot W(t)\) (see (16.56); \(\varOmega = GG^{\prime}\)). Thus, for large \(u\), \(Y(u,t)\) generates trajectories of \(GW(t)\). More generally (\(f \neq 0\)), one obtains solutions of SDE (16.1). A related problem occurs in the state-dependent case \(\varOmega(y)\). Again, the term \(\int dy^{\prime}(\varOmega dt)^{-1} dy\) yields a second-order derivative in the SPDE, but after transforming to symmetrized Stratonovich form, higher-order terms also appear (16.64), (16.65).
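A finite-dimensional analogue of the relation \(Q(\infty) = -A^{-1}\) can be checked with the stationary Lyapunov equation. The sketch below assumes a scalar process on a grid with \(\eta_{0} = 0\) fixed, a discretized operator \(A = \varOmega^{-1}\delta t^{-2}\Delta\) with a free right boundary, and noise intensity \(2/\delta t\) (the discrete counterpart of \(2\delta(t-s)\)); the factor \(1/\delta t\) in the last line is the discrete delta function. All values are illustrative.

```python
import numpy as np

J, dt, omega = 6, 0.25, 1.5                       # hypothetical grid and diffusion

# Discretized A = Omega^{-1} d^2/dt^2 on (eta_1, ..., eta_J), eta_0 = 0 fixed.
second_diff = -2.0 * np.eye(J) + np.eye(J, k=1) + np.eye(J, k=-1)
second_diff[-1, -1] = -1.0                        # free right boundary
A = second_diff / (omega * dt**2)

# Noise covariance per unit of fictitious time u: 2 * delta_jk / dt.
noise = 2.0 / dt * np.eye(J)

# Candidate stationary covariance: Omega * min(t_j, t_k).
t = dt * np.arange(1, J + 1)
Q = omega * np.minimum.outer(t, t)

# Stationarity of dY = A Y du + noise requires A Q + Q A' + noise = 0.
residual = A @ Q + Q @ A.T + noise
Q_from_A = -np.linalg.inv(A) / dt
```

A vanishing `residual` shows that, in this discretization, the Langevin dynamics with drift \(A\eta\) leave the Wiener covariance \(\varOmega\min(t,s)\) invariant.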

Moreover, the differential of \(\varOmega^{-1}\) in the Lagrangian (16.53)–(16.54) imports a problematic term similar to (16.53) into the SPDE, namely, \(\frac {1}{2} (\dot {y}-f)^{\prime }(\varOmega ^{-1})_{y} (\dot {y}-f)\), which can be combined with the derivative of the Jacobian (cf. 16.68). Formally, it is squared white noise where the differentials are in Itô form. A procedure similar to (16.63), i.e.,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} h_{j,k} &\displaystyle =&\displaystyle \sum_{l=0}^{\infty} \frac{(-\frac{1}{2})^l}{l!} h_{j+1/2,k+l} \delta \eta_{j}^{l} \end{array} \end{aligned} $$
(16.69)

can be applied to obtain Stratonovich-type expressions. Because of the dubious nature of these expressions, only the quasi-continuous approach based on approximate finite-dimensional densities and Langevin equations is used in this paper.


© 2018 Springer International Publishing AG, part of Springer Nature


Singer, H. (2018). Langevin and Kalman Importance Sampling for Nonlinear Continuous-Discrete State-Space Models. In: van Montfort, K., Oud, J.H.L., Voelkle, M.C. (eds) Continuous Time Modeling in the Behavioral and Related Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-77219-6_16
