Abstract
The likelihood function of a nonlinear continuous-discrete state-space model with state-dependent diffusion function is computed by integrating out the latent variables with the help of Langevin sampling. The continuous-time paths are discretized on a time grid in order to obtain a finite-dimensional integration and densities w.r.t. Lebesgue measure. We use importance sampling, where the exact importance density is the conditional density of the latent states, given the measurements. This unknown density is either estimated from the sampler data or approximated by an estimated normal density. Then, new trajectories are drawn from this Gaussian measure. Alternatively, a Gaussian importance density is directly derived from an extended Kalman smoother with subsequent sampling of independent trajectories (extended Kalman sampling (EKS)). We compare the Monte Carlo results with numerical methods based on extended, unscented, and Gauss-Hermite Kalman filtering (EKF, UKF, GHF) and a grid-based solution of the Fokker-Planck equation between measurements. This comprises the repeated multiplication of transition matrices based on Euler transition kernels, finite differences, and discretized integral operators. The methods are illustrated for the geometric Brownian motion and the Ginzburg-Landau model for phase transitions.
Notes
1. Mother behaviors: look at infant, smile, vocalize, touch, or hold infant. Infant behaviors: look at mother, smile, vocalize, touch or hold mother, fuss/cry.
2. In order to avoid misunderstandings, one must distinguish between (non)linearity in the continuous-time dynamical specification (differential equation) w.r.t. the state variables, and in the derived “exact discrete model” w.r.t. the parameters.
3. W.r.t. the parameters.
4. One has \(\int \delta (x-x') \phi (x')dx' =\phi (x)\) and \(\sum _{\rho '} \delta _{\rho \rho '}\phi _{\rho '}=\phi _{\rho }\).
5. Otherwise, one can use the singular normal distribution (cf. Mardia et al. 1979, ch. 2.5.4, p. 41). In this case, the generalized inverse of \(\varOmega_{j}\) is used and the determinant \(|\cdot|\), which is zero, is replaced by the product of positive eigenvalues. Singular covariance matrices occur, for example, in autoregressive models of higher order, when the state vector contains derivatives of a variable.
6. In statistical mechanics, one assumes the equivalence of time averages and ensemble averages (cross sections of identical systems).
7. In the case of a state-dependent diffusion matrix, \(\eta_{j+1} = \eta_{j} + G(\eta_{j}, x_{j}, \psi)\,\delta W_{j}\) generates a more general martingale process. Expression (16.16) remains finite in a continuum limit (see Appendix 2).
8. These are called irreducible diffusions. A transformation \(z = h(y)\) leading to unit diffusion for \(z\) must fulfil the system of differential equations \(h_{\alpha,\beta}\,g_{\beta\gamma} = \delta_{\alpha\gamma}\), \(\alpha, \beta = 1, \ldots, p\); \(\gamma = 1, \ldots, r\). The inverse transformation \(y = v(z)\) fulfills \(v_{\alpha,\gamma}(z) = g_{\alpha\gamma}(v(z))\). Thus \(v_{\alpha,\gamma\delta} = g_{\alpha\gamma,\epsilon}\,v_{\epsilon,\delta} = v_{\alpha,\delta\gamma} = g_{\alpha\delta,\epsilon}\,v_{\epsilon,\gamma}\). Inserting \(v\), one obtains the commutativity condition \(g_{\alpha\gamma,\epsilon}\,g_{\epsilon\delta} = g_{\alpha\delta,\epsilon}\,g_{\epsilon\gamma}\), which is necessary and sufficient for reducibility. See Kloeden and Platen (1992, ch. 10, p. 348), Aït-Sahalia (2008).
References
Aït-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach. Econometrica, 70(1), 223–262. https://doi.org/10.1111/1468-0262.00274
Aït-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. Annals of Statistics, 36(2), 906–937. https://doi.org/10.1214/009053607000000622
Apte, A., Hairer, M., Stuart, A. M., & Voss, J. (2007). Sampling the posterior: An approach to non-Gaussian data assimilation. Physica D: Nonlinear Phenomena, 230(1–2), 50–64. https://doi.org/10.1016/j.physd.2006.06.009
Apte, A., Jones, C. K. R. T., Stuart, A. M., & Voss, J. (2008). Data assimilation: Mathematical and statistical perspectives. International Journal for Numerical Methods in Fluids, 56(8), 1033–1046. https://doi.org/10.1002/fld.1698
Arasaratnam, I., Haykin, S., & Hurd, T. (2010). Cubature Kalman filtering for continuous-discrete systems: Theory and simulations. IEEE Transactions on Signal Processing, 58, 4977–4993. https://doi.org/10.1109/TSP.2010.2056923
Arnold, L. (1974). Stochastic differential equations. New York: Wiley.
Åström, K. J. (1970). Introduction to stochastic control theory. Mineola, NY: Courier Corporation.
Bagchi, A. (2001). Onsager-Machlup function. In Encyclopedia of mathematics. Berlin: Springer. https://www.encyclopediaofmath.org/index.php/Onsager-Machlup_function
Ballreich, D. (2017). Stable and efficient cubature-based filtering in dynamical systems. Berlin: Springer International Publishing.
Bartlett, M. S. (1946). On the theoretical specification and sampling properties of autocorrelated time-series. Journal of the Royal Statistical Society (Supplement), 7, 27–41. https://doi.org/10.2307/2983611
Basawa, I. V., & Prakasa Rao, B. L. S. (1980). Statistical inference for stochastic processes. London: Academic Press.
Bergstrom, A. R. (1976a). Non-recursive models as discrete approximations to systems of stochastic differential equations. In A. R. Bergstrom (Ed.), Statistical inference in continuous time models (pp. 15–26). Amsterdam: North Holland.
Bergstrom, A. R. (Ed.). (1976b). Statistical inference in continuous time economic models. Amsterdam: North Holland.
Bergstrom, A. R. (1983). Gaussian estimation of structural parameters in higher order continuous time dynamic models. Econometrica: Journal of the Econometric Society, 51(1), 117–152. https://doi.org/10.2307/1912251
Bergstrom, A. R. (1988). The history of continuous-time econometric models. Econometric Theory, 4, 365–383. https://doi.org/10.1017/S0266466600013359
Beskos, A., Papaspiliopoulos, O., Roberts, G. O., & Fearnhead, P. (2006). Exact and efficient likelihood-based inference for discretely observed diffusion processes (with discussion). Journal of the Royal Statistical Society Series B, 68, 333–382. https://doi.org/10.1111/j.1467-9868.2006.00552.x
Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81, 637–654. https://doi.org/10.1086/260062
Chang, J., & Chen, S. X. (2011). On the approximate maximum likelihood estimation for diffusion processes. The Annals of Statistics, 39(6), 2820–2851. https://doi.org/10.1214/11-AOS922
Chow, S.-M., Ferrer, E., & Nesselroade, J. R. (2007). An unscented Kalman filter approach to the estimation of nonlinear dynamical systems models. Multivariate Behavioral Research, 42(2), 283–321. https://doi.org/10.1080/00273170701360423
Chow, S.-M., Lu, Z., Sherwood, A., & Zhu, H. (2016). Fitting nonlinear ordinary differential equation models with random effects and unknown initial conditions using the stochastic approximation expectation–maximization (SAEM) algorithm. Psychometrika, 81(1), 102–134. https://doi.org/10.1007/s11336-014-9431-z
Da Prato, G. (2004). Kolmogorov equations for stochastic PDEs. Basel: Birkhäuser. https://doi.org/10.1007/978-3-0348-7909-5
Da Prato, G., & Zabczyk, J. (1992). Stochastic equations in infinite dimensions. New York: Cambridge University Press. https://doi.org/10.1017/CBO9780511666223
Doreian, P., & Hummon, N. P. (1976). Modelling social processes. New York, Oxford, Amsterdam: Elsevier.
Durham, G. B., & Gallant, A. R. (2002). Numerical techniques for simulated maximum likelihood estimation of stochastic differential equations. Journal of Business and Economic Statistics, 20, 297–316. https://doi.org/10.1198/073500102288618397
Elerian, O., Chib, S., & Shephard, N. (2001). Likelihood inference for discretely observed nonlinear diffusions. Econometrica, 69(4), 959–993. https://doi.org/10.1111/1468-0262.00226
Feller, W. (1951). Two singular diffusion problems. Annals of Mathematics, 54, 173–182. https://doi.org/10.2307/1969318
Flury, T., & Shephard, N. (2011). Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models. Econometric Theory, 27(5), 933–956. https://doi.org/10.1017/S0266466610000599
Fuchs, C. (2013). Inference for diffusion processes: With applications in life sciences. Berlin: Springer. https://doi.org/10.1007/978-3-642-25969-2
Gard, T. C. (1988). Introduction to stochastic differential equations. New York: Dekker.
Girolami, M., & Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), 123–214. https://doi.org/10.1111/j.1467-9868.2010.00765.x
Hairer, M., Stuart, A. M., & Voss, J. (2007). Analysis of SPDEs arising in path sampling, part II: The nonlinear case. Annals of Applied Probability, 17(5), 1657–1706. https://doi.org/10.1214/07-AAP441
Hairer, M., Stuart, A. M., & Voss, J. (2009). Sampling conditioned diffusions. In J. Blath, P. Morters, & M. Scheutzow (Eds.), Trends in stochastic analysis (pp. 159–186). Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139107020.009
Hairer, M., Stuart, A. M., & Voss, J. (2011). Signal processing problems on function space: Bayesian formulation, stochastic PDEs and effective MCMC methods. In D. Crisan & B. Rozovsky (Eds.), The Oxford handbook of nonlinear filtering (pp. 833–873). Oxford: Oxford University Press.
Hairer, M., Stuart, A. M., Voss, J., & Wiberg, P. (2005). Analysis of SPDEs arising in path sampling, part I: The Gaussian case. Communications in Mathematical Sciences, 3(4), 587–603. https://doi.org/10.4310/CMS.2005.v3.n4.a8
Haken, H. (1977). Synergetics. Berlin: Springer.
Haken, H., Kelso, J. A. S., & Bunz, H. (1985). A theoretical model of phase transitions in human hand movements. Biological Cybernetics, 51(5), 347–356. https://doi.org/10.1007/BF00336922
Hamerle, A., Nagl, W., & Singer, H. (1991). Problems with the estimation of stochastic differential equations using structural equations models. Journal of Mathematical Sociology, 16(3), 201–220. https://doi.org/10.1080/0022250X.1991.9990088
Harvey, A. C., & Stock, J. (1985). The estimation of higher order continuous time autoregressive models. Econometric Theory, 1, 97–112. https://doi.org/10.1017/S0266466600011026
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109. https://doi.org/10.1093/biomet/57.1.97
Herings, J. P. (1996). Static and dynamic aspects of general disequilibrium theory. Boston: Springer. https://doi.org/10.1007/978-1-4615-6251-1
Jazwinski, A. H. (1970). Stochastic processes and filtering theory. New York: Academic Press.
Jetschke, G. (1986). On the equivalence of different approaches to stochastic partial differential equations. Mathematische Nachrichten, 128, 315–329. https://doi.org/10.1002/mana.19861280127
Jetschke, G. (1991). Lattice approximation of a nonlinear stochastic partial differential equation with white noise. In International series of numerical mathematics (Vol. 102, pp. 107–126). Basel: Birkhäuser. https://doi.org/10.1007/978-3-0348-6413-8_8
Jones, R. H. (1984). Fitting multivariate models to unequally spaced data. In E. Parzen (Ed.), Time series analysis of irregularly observed data (pp. 158–188). New York: Springer. https://doi.org/10.1007/978-1-4684-9403-7_8
Jones, R. H., & Tryon, P. V. (1987). Continuous time series models for unequally spaced data applied to modeling atomic clocks. SIAM Journal on Scientific and Statistical Computing, 8, 71–81. https://doi.org/10.1137/0908007
Kac, M. (1980). Integration in function spaces and some of its applications. Pisa: Scuola normale superiore.
Kloeden, P. E., & Platen, E. (1992). Numerical solution of stochastic differential equations. Berlin: Springer. https://doi.org/10.1007/978-3-662-12616-5
Kloeden, P. E., & Platen, E. (1999). Numerical solution of stochastic differential equations. Berlin: Springer. (corrected third printing)
Langevin, P. (1908). Sur la théorie du mouvement brownien [On the theory of Brownian motion]. Comptes Rendus de l’Academie des Sciences (Paris), 146, 530–533.
Li, C. (2013). Maximum-likelihood estimation for diffusion processes via closed-form density expansions. The Annals of Statistics, 41(3), 1350–1380. https://doi.org/10.1214/13-AOS1118
Lighthill, M. J. (1958). Introduction to Fourier analysis and generalised functions. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139171427
Liptser, R. S., & Shiryayev, A. N. (2001). Statistics of random processes (Vols. I and II, 2nd ed.). New York: Springer.
Lorenz, E. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130–141.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society: Series B, 44(2), 226–233.
Malik, S., & Pitt, M. K. (2011). Particle filters for continuous likelihood evaluation and maximisation. Journal of Econometrics, 165(2), 190–209. https://doi.org/10.1016/j.jeconom.2011.07.006
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. London: Academic Press.
Molenaar, P., & Newell, K. M. (2003). Direct fit of a theoretical model of phase transition in oscillatory finger motions. British Journal of Mathematical and Statistical Psychology, 56(2), 199–214. https://doi.org/10.1348/000711003770480002
Moler, C., & Van Loan, C. (2003). Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review, 45(1), 1–46. https://doi.org/10.1137/S00361445024180
Newell, K. M., & Molenaar, P. C. M. (2014). Applications of nonlinear dynamics to developmental process modeling. Hove: Psychology Press.
Okano, K., Schülke, L., & Zheng, B. (1993). Complex Langevin simulation. Progress of Theoretical Physics Supplement, 111, 313–346. https://doi.org/10.1143/PTPS.111.313
Onsager, L., & Machlup, S. (1953). Fluctuations and irreversible processes. Physical Review, 91(6), 1505–1515. https://doi.org/10.1103/PhysRev.91.1505
Oud, J. H. L., & Jansen, R. A. R. G. (2000). Continuous time state space modeling of panel data by means of SEM. Psychometrika, 65, 199–215. https://doi.org/10.1007/BF02294374
Oud, J. H. L., & Singer, H. (2008). Continuous time modeling of panel data: SEM versus filter techniques. Statistica Neerlandica, 62(1), 4–28.
Ozaki, T. (1985). Nonlinear time series and dynamical systems. In E. Hannan (Ed.), Handbook of statistics (pp. 25–83). Amsterdam: North Holland.
Parisi, G., & Wu, Y.-S. (1981). Perturbation theory without gauge fixing. Scientia Sinica, 24, 483.
Pedersen, A. R. (1995). A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics, 22, 55–71.
Pitt, M. K. (2002). Smooth particle filters for likelihood evaluation and maximisation (Warwick economic research papers No. 651). University of Warwick. http://wrap.warwick.ac.uk/1536/
Reznikoff, M. G., & Vanden-Eijnden, E. (2005). Invariant measures of stochastic partial differential equations and conditioned diffusions. Comptes Rendus Mathematiques, 340, 305–308. https://doi.org/10.1016/j.crma.2004.12.025
Risken, H. (1989). The Fokker-Planck equation (2nd ed.). Berlin: Springer. https://doi.org/10.1007/978-3-642-61544-3
Roberts, G. O., & Stramer, O. (2001). On inference for partially observed nonlinear diffusion models using the Metropolis–Hastings algorithm. Biometrika, 88(3), 603–621. https://doi.org/10.1093/biomet/88.3.603
Roberts, G. O., & Stramer, O. (2002). Langevin diffusions and Metropolis-Hastings algorithms. Methodology and Computing in Applied Probability, 4(4), 337–357. https://doi.org/10.1023/A:1023562417138
Rümelin, W. (1982). Numerical treatment of stochastic differential equations. SIAM Journal on Numerical Analysis, 19(3), 604–613. https://doi.org/10.1137/0719041
Särkkä, S. (2013). Bayesian filtering and smoothing (Vol. 3). Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139344203
Särkkä, S., Hartikainen, J., Mbalawata, I. S., & Haario, H. (2013). Posterior inference on parameters of stochastic differential equations via non-linear Gaussian filtering and adaptive MCMC. Statistics and Computing, 25(2), 427–437. https://doi.org/10.1007/s11222-013-9441-1
Schiesser, W. E. (1991). The numerical method of lines. San Diego: Academic Press.
Schuster, H. G., & Just, W. (2006). Deterministic chaos: An introduction. New York: Wiley.
Shoji, I., & Ozaki, T. (1997). Comparative study of estimation methods for continuous time stochastic processes. Journal of Time Series Analysis, 18(5), 485–506. https://doi.org/10.1111/1467-9892.00064
Shoji, I., & Ozaki, T. (1998a). Estimation for nonlinear stochastic differential equations by a local linearization method 1. Stochastic Analysis and Applications, 16(4), 733–752. https://doi.org/10.1080/07362999808809559
Shoji, I., & Ozaki, T. (1998b). A statistical method of estimation and simulation for systems of stochastic differential equations. Biometrika, 85(1), 240–243. https://doi.org/10.1093/biomet/85.1.240
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall. https://doi.org/10.1007/978-1-4899-3324-9
Singer, H. (1986). Depressivität und gelernte Hilflosigkeit als Stochastischer Prozeß [Depression and learned helplessness as stochastic process] (Unpublished master’s thesis). Universität Konstanz, Konstanz, Germany.
Singer, H. (1990). Parameterschätzung in zeitkontinuierlichen dynamischen Systemen [Parameter estimation in continuous time dynamical systems]. Konstanz: Hartung-Gorre-Verlag.
Singer, H. (1993). Continuous-time dynamical systems with sampled data, errors of measurement and unobserved components. Journal of Time Series Analysis, 14(5), 527–545. https://doi.org/10.1111/j.1467-9892.1993.tb00162.x
Singer, H. (1995). Analytical score function for irregularly sampled continuous time stochastic processes with control variables and missing values. Econometric Theory, 11, 721–735. https://doi.org/10.1017/S0266466600009701
Singer, H. (1998). Continuous panel models with time dependent parameters. Journal of Mathematical Sociology, 23, 77–98. https://doi.org/10.1080/0022250X.1998.9990214
Singer, H. (2002). Parameter estimation of nonlinear stochastic differential equations: Simulated maximum likelihood vs. extended Kalman filter and Itô-Taylor expansion. Journal of Computational and Graphical Statistics, 11(4), 972–995. https://doi.org/10.1198/106186002808
Singer, H. (2003). Simulated maximum likelihood in nonlinear continuous-discrete state space models: Importance sampling by approximate smoothing. Computational Statistics, 18(1), 79–106. https://doi.org/10.1007/s001800300133
Singer, H. (2005). Continuous-discrete unscented Kalman filtering (Diskussionsbeiträge Fachbereich Wirtschaftswissenschaft No. 384). FernUniversität in Hagen. http://www.fernunihagen.de/lsstatistik/publikationen/ukf2005.shtml
Singer, H. (2008a). Generalized Gauss-Hermite filtering. Advances in Statistical Analysis, 92(2), 179–195. https://doi.org/10.1007/s10182-008-0068-z
Singer, H. (2008b). Nonlinear continuous time modeling approaches in panel research. Statistica Neerlandica, 62(1), 29–57.
Singer, H. (2010). SEM modeling with singular moment matrices. Part I: ML-estimation of time series. Journal of Mathematical Sociology, 34(4), 301–320. https://doi.org/10.1080/0022250X.2010.509524
Singer, H. (2011). Continuous-discrete state-space modeling of panel data with nonlinear filter algorithms. Advances in Statistical Analysis, 95, 375–413. https://doi.org/10.1007/s10182-011-0172-3
Singer, H. (2012). SEM modeling with singular moment matrices. Part II: ML-estimation of sampled stochastic differential equations. Journal of Mathematical Sociology, 36(1), 22–43. https://doi.org/10.1080/0022250X.2010.532259
Singer, H. (2014). Importance sampling for Kolmogorov backward equations. Advances in Statistical Analysis, 98(4), 345–369. https://doi.org/10.1007/s10182-013-0223-z
Singer, H. (2016). Simulated maximum likelihood for continuous-discrete state space models using Langevin importance sampling (Diskussionsbeiträge Fakultät Wirtschaftswissenschaft No. 497). Paper presented at the 9th International Conference on Social Science Methodology (RC33), 11–16 September 2016, Leicester, UK.
Stramer, O., Bognar, M., & Schneider, P. (2010). Bayesian inference for discretely sampled Markov processes with closed-form likelihood expansions. Journal of Financial Econometrics, 8(4), 450–480. https://doi.org/10.1093/jjfinec/nbp027
Stratonovich, R. L. (1971). On the probability functional of diffusion processes. In Selected translations in mathematical statistics and probability, Vol. 10 (pp. 273–286).
Stratonovich, R. L. (1989). Some Markov methods in the theory of stochastic processes in nonlinear dynamic systems. In F. Moss & P. McClintock (Eds.), Noise in nonlinear dynamic systems (pp. 16–71). Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511897818.004
Stuart, A. M., Voss, J., & Wiberg, P. (2004). Conditional path sampling of SDEs and the Langevin MCMC method. Communications in Mathematical Sciences, 2(4), 685–697. https://doi.org/10.4310/CMS.2004.v2.n4.a7
Thomas, E. A. C., & Martin, J. A. (1976). Analyses of parent-infant-interaction. Psychological Review, 83(2), 141–156. https://doi.org/10.1037/0033-295X.83.2.141
Van Kampen, N. G. (1981). Itô vs. Stratonovich. Journal of Statistical Physics, 24, 175–187. https://doi.org/10.1007/BF01007642
Wei, G. W., Zhang, D. S., Kouri, D. J., & Hoffman, D. K. (1997). Distributed approximating functional approach to the Fokker-Planck equation: Time propagation. Journal of Chemical Physics, 107(8), 3239–3246. https://doi.org/10.1063/1.474674
Weidlich, W., & Haag, G. (1983). Quantitative sociology. Berlin: Springer.
Wong, E., & Hajek, B. (1985). Stochastic processes in engineering systems. New York: Springer. https://doi.org/10.1007/978-1-4612-5060-9
Yoo, H. (2000). Semi-discretization of stochastic partial differential equations on R1 by a finite-difference method. Mathematics of Computation, 69(230), 653–666. https://doi.org/10.1090/S0025-5718-99-01150-3
Appendices
Appendix 1: Langevin Sampler: Analytic Drift Function
16.1.1 Notation
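In the following, the components of vectors and matrices are denoted by Greek letters, e.g., \(f_{\alpha}\), \(\alpha = 1, \ldots, p\), and partial derivatives by commas, i.e., \(f_{\alpha,\beta} := \partial f_{\alpha}/\partial \eta_{\beta} = \partial_{\beta} f_{\alpha} = (f_{\eta})_{\alpha\beta}\). The Jacobian matrix \(\partial f/\partial \eta\) is written as \(f_{\eta}\) and its \(\beta\)th column as \((f_{\eta})_{\bullet\beta}\). Likewise, \(\varOmega_{\alpha\bullet}\) denotes row \(\alpha\) of matrix \(\varOmega_{\alpha\beta}\) and \(\varOmega_{\bullet\bullet} = \varOmega\) for short.
Latin indices denote time, e.g., \(f_{j\alpha} = f_{\alpha}(\eta_{j})\). Furthermore, a sum convention is used for the Greek indices (i.e., \(f_{\alpha}g_{\alpha} = \sum_{\alpha} f_{\alpha}g_{\alpha}\)). The difference operators \(\delta = B^{-1} - 1\), \(\nabla = 1 - B\), with the backshift \(B\eta_{j} = \eta_{j-1}\), are used frequently. One has \(\delta \cdot \nabla = B^{-1} - 2 + B := \Delta\) for the central second difference.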
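The operator identity \(\delta \cdot \nabla = \Delta\) can be checked on a small grid. The following is a minimal sketch in Python; the function names and the test sequence \(\eta_{j} = j^{2}\) are illustrative assumptions and do not appear in the source.

```python
def delta(seq):
    """Forward difference (B^{-1} - 1): j -> seq(j+1) - seq(j)."""
    return lambda j: seq(j + 1) - seq(j)


def nabla(seq):
    """Backward difference (1 - B): j -> seq(j) - seq(j-1)."""
    return lambda j: seq(j) - seq(j - 1)


def central(seq):
    """Central second difference (B^{-1} - 2 + B)."""
    return lambda j: seq(j + 1) - 2 * seq(j) + seq(j - 1)


# Illustrative grid values eta_j = j^2 (hypothetical data, not from the source).
eta = [j * j for j in range(6)]
seq = lambda j: eta[j]

interior = range(1, len(eta) - 1)
composed = [delta(nabla(seq))(j) for j in interior]   # delta applied to nabla(eta)
direct = [central(seq)(j) for j in interior]          # Delta applied directly
print(composed, direct)  # both lists are [2, 2, 2, 2]
```

Writing the operators as functions of the time index keeps the bookkeeping of shifted indices out of the way; the same composition in the opposite order, `nabla(delta(seq))`, gives the same values because both operators are polynomials in the backshift.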
16.1.2 Functional Derivatives
The functional \(\varPhi(y)\) may be expanded to first order by using the functional derivative \((\delta \varPhi /\delta y)(h) = \int (\delta \varPhi /\delta y(s)) h(s)ds\). One has \(\varPhi(y + h) - \varPhi(y) = (\delta\varPhi/\delta y)(h) + O(\|h\|^{2})\).
A discrete version is \(\varPhi(\eta) = \varPhi(\eta_{0}, \ldots, \eta_{J})\) and \(\varPhi(\eta + h) - \varPhi(\eta) = \sum_{j}[\partial\varPhi(\eta)/\partial(\eta_{j}\delta t)]h_{j}\delta t + O(\|h\|^{2})\). As a special case, consider the functional \(\varPhi(\eta) = \eta_{j}\). Since \(\eta_{j}+h_{j}-\eta_{j}=\sum_{k} (\delta_{jk}/\delta t) h_{k} \delta t\), one has the continuous analogue \(y(t)+h(t)-y(t)=\int \delta (t-s)h(s)ds\), thus \(\delta y(t)/\delta y(s) = \delta(t - s)\).
16.1.2.1 State-Independent Diffusion Coefficient
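First we assume a state-independent diffusion coefficient \(\varOmega_{j} = \varOmega\), but later we set \(\varOmega_{j} = \varOmega(\eta_{j}, x_{j})\). This is important if the Lamperti transformation does not lead to constant coefficients in multivariate models (see Note 8). In components, the term (16.15) reads
\[
S_{0} = \frac{1}{2}\sum_{j=0}^{J-1} (\eta_{j+1;\beta}-\eta_{j\beta})\,(\varOmega_{\beta\gamma}\delta t)^{-1}\,(\eta_{j+1;\gamma}-\eta_{j\gamma}).
\]
Note that \((\varOmega_{\beta\gamma}\delta t)^{-1} \equiv [(\varOmega\delta t)^{-1}]_{\beta\gamma}\) and the semicolon in \(\eta_{j+1;\beta}\) serves to separate the indices; it is not a derivative. Differentiation w.r.t. the state \(\eta_{j\alpha}\) yields (\(j = 1, \ldots, J-1\))
\[
\frac{\partial S_{0}}{\partial \eta_{j\alpha}} = -(\varOmega_{\alpha\gamma}\delta t)^{-1}\,(\eta_{j+1;\gamma}-2\eta_{j\gamma}+\eta_{j-1;\gamma}).
\]
In vector notation, we have \(\partial S_{0}/\partial(\eta_{j}\delta t) = -\varOmega^{-1}\delta t^{-2}\Delta\eta_{j}\). On the boundaries \(j = 0\), \(j = J\) we obtain
\[
\frac{\partial S_{0}}{\partial(\eta_{0}\delta t)} = -\varOmega^{-1}\delta t^{-2}\,\delta\eta_{0}, \qquad \frac{\partial S_{0}}{\partial(\eta_{J}\delta t)} = \varOmega^{-1}\delta t^{-2}\,\nabla\eta_{J}.
\]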
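For a scalar state, the interior and boundary derivatives of the driftless term can be checked numerically. The sketch below assumes \(S_{0} = \frac{1}{2}\sum_{j}(\eta_{j+1}-\eta_{j})^{2}/(\varOmega\delta t)\), which is the form implied by the vector expression \(-\varOmega^{-1}\delta t^{-2}\Delta\eta_{j}\); the function names and numerical values are illustrative and not taken from the source. Because \(S_{0}\) is quadratic, the first-order expansion of the preceding subsection holds with an exactly known remainder, \(S_{0}(\eta+h)-S_{0}(\eta)=\sum_{j}(\partial S_{0}/\partial\eta_{j})h_{j}+S_{0}(h)\).

```python
import random


def s0(eta, omega, dt):
    """Driftless action S_0 = 1/2 * sum_j (eta[j+1] - eta[j])^2 / (omega * dt)."""
    return 0.5 * sum((eta[j + 1] - eta[j]) ** 2 for j in range(len(eta) - 1)) / (omega * dt)


def grad_s0(eta, omega, dt):
    """dS_0/d eta_j written with the difference operators (interior and boundaries)."""
    J = len(eta) - 1
    c = 1.0 / (omega * dt)
    g = [0.0] * (J + 1)
    g[0] = -c * (eta[1] - eta[0])                 # j = 0: forward difference only
    g[J] = c * (eta[J] - eta[J - 1])              # j = J: backward difference only
    for j in range(1, J):
        g[j] = -c * (eta[j + 1] - 2.0 * eta[j] + eta[j - 1])  # central second difference
    return g


rng = random.Random(0)
omega, dt, J = 0.5, 0.1, 20                       # illustrative values
eta = [rng.gauss(0.0, 1.0) for _ in range(J + 1)]
h = [rng.gauss(0.0, 1.0) for _ in range(J + 1)]

# S_0 is quadratic, so the expansion around eta has the exact remainder S_0(h).
lhs = s0([e + d for e, d in zip(eta, h)], omega, dt) - s0(eta, omega, dt)
rhs = sum(g * d for g, d in zip(grad_s0(eta, omega, dt), h)) + s0(h, omega, dt)
print(abs(lhs - rhs))  # rounding error only
```

The gradient is taken w.r.t. \(\eta_{j}\); dividing each component by \(\delta t\) gives the scaled derivative \(\partial S_{0}/\partial(\eta_{j}\delta t)\) used in the text.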
Next, the derivatives of \(\log \alpha (\eta )\) are needed. One gets
or in vector form, using difference operators
where f j•,α is column α of the Jacobian f η(η j). The second term yields
Finally, one has to determine the drift component corresponding to the measurements, which is contained in the conditional density p(z|η). Since it was assumed that the error of measurement is Gaussian (see 16.2), we obtain
where ϕ(y;μ, Σ) is the multivariate Gaussian density, \(h_i=h(\eta _{j_{i}},x_{j_{i}})\) is the output function and \(R_{i}=R(x_{j_{i}})\) is the measurement error covariance matrix. Thus the derivative reads (matrix form in the second line)
The Kronecker symbol \(\delta _{j j_{i}}\) only gives contributions at the measurement times \(t_{i}=\tau _{j_{i}}\). Together we obtain for the drift of the Langevin equation (16.14)
Here, p(η 0) is an arbitrary density for the initial latent state.
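The pieces of the drift (initial density, Euler transitions, and measurements) can be assembled for a scalar state with constant \(\varOmega\). The following is a minimal sketch, not the author's implementation: it assumes a Gaussian initial density, a noise intensity of \(\sqrt{2}\) in the Langevin equation, and a hypothetical cubic drift and linear output function; all names and numerical values are illustrative.

```python
import math
import random


def log_joint(eta, z, obs_idx, f, h, omega, r, dt, m0, v0):
    """log p(eta, z) up to a constant: Gaussian p(eta_0), Euler transitions, Gaussian errors."""
    lp = -0.5 * (eta[0] - m0) ** 2 / v0
    for j in range(len(eta) - 1):
        res = eta[j + 1] - eta[j] - f(eta[j]) * dt
        lp -= 0.5 * res ** 2 / (omega * dt)
    for zi, ji in zip(z, obs_idx):
        lp -= 0.5 * (zi - h(eta[ji])) ** 2 / r
    return lp


def langevin_drift(eta, z, obs_idx, f, df, h, dh, omega, r, dt, m0, v0):
    """Analytic gradient of log_joint w.r.t. every eta_j (scalar state, constant omega)."""
    J = len(eta) - 1
    drift = [0.0] * (J + 1)
    drift[0] -= (eta[0] - m0) / v0                       # initial density p(eta_0)
    for j in range(J):
        res = eta[j + 1] - eta[j] - f(eta[j]) * dt       # Euler residual of step j
        drift[j] += res * (1.0 + df(eta[j]) * dt) / (omega * dt)
        drift[j + 1] -= res / (omega * dt)
    for zi, ji in zip(z, obs_idx):                       # Kronecker delta: only j = j_i
        drift[ji] += dh(eta[ji]) * (zi - h(eta[ji])) / r
    return drift


def langevin_step(eta, drift, du, rng):
    """One Euler step of d eta = drift du + sqrt(2) dW(u) in the sampler time u."""
    return [e + d * du + math.sqrt(2.0 * du) * rng.gauss(0.0, 1.0) for e, d in zip(eta, drift)]


# Hypothetical cubic drift and linear output function, chosen only for illustration.
f = lambda y: y - 0.1 * y ** 3
df = lambda y: 1.0 - 0.3 * y ** 2
h = lambda y: y
dh = lambda y: 1.0

rng = random.Random(1)
dt, omega, r, m0, v0 = 0.1, 1.0, 0.01, 0.0, 1.0
eta = [rng.gauss(0.0, 1.0) for _ in range(11)]
z, obs_idx = [0.5, -0.2], [0, 10]                        # measurements at grid indices j_i

drift = langevin_drift(eta, z, obs_idx, f, df, h, dh, omega, r, dt, m0, v0)
eta_new = langevin_step(eta, drift, 1e-4, rng)
```

A finite-difference gradient of `log_joint` is a convenient cross-check for `langevin_drift` before running long chains, since sign errors in the boundary or measurement terms are otherwise hard to spot.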
16.1.2.2 State-Dependent Diffusion Coefficient
In the case of Ω j = Ω(η j, x j) the expressions get more complicated. The derivative of S 0 now reads
\(\varOmega _{j\beta \gamma ,\alpha }^{-1}\equiv (\varOmega ^{-1})_{j\beta \gamma ,\alpha }\). A closer relation to expression (16.40) may be obtained by the Taylor expansion
leading to
In the state-dependent case, also the derivative of the Jacobian term \(\log Z^{-1}=-\frac{1}{2}\sum_{j}\log|2\pi\varOmega_{j}\delta t|\) is needed. Since the derivative of a log determinant is
one obtains
Ω j,α = Ω j••,α for short. Using the formula \(\varOmega _{j} \varOmega _{j}^{-1} = I; \varOmega _{j,\alpha } = -\varOmega _{j} \varOmega ^{-1}_{j,\alpha } \varOmega _{j}\), we find
The contributions of S 1 and S 2 are now (see 16.16)
It is interesting to compare the terms in (16.45, 16.49, 16.50) depending on the derivative \(\varOmega _{j\beta \gamma ,\alpha }^{-1}\), which read in vector form
and the Jacobian derivative (16.48). The terms can be collected to yield
as may be directly seen from the Lagrangian (16.7).
In summary, the Langevin drift component \((j\alpha)\), \(j = 0, \ldots, J\); \(\alpha = 1, \ldots, p\), is in vector-matrix form
Here, \(h_{i\bullet,\alpha}\) is column \(\alpha\) of Jacobian \(h_{\eta}(\eta_{j_{i}})\), \(\varOmega_{j\alpha\bullet}^{-1}\) is row \(\alpha\) of \(\varOmega(\eta_{j})^{-1}\), \(\varOmega_{j,\alpha}^{-1}:=\varOmega_{j\bullet\bullet,\alpha}^{-1}\), and \(f_{j\bullet,\alpha}\) denotes column \(\alpha\) of Jacobian \(f_{\eta}(\eta_{j})\).
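The log-determinant step of this subsection can be spelled out with two standard matrix identities. The following is a sketch reconstructed from the formulas quoted in the text, assuming that \(\varOmega_{j}\) is nonsingular and differentiable in \(\eta_{j}\); it is not copied from the source displays.

\[
\frac{\partial}{\partial\eta_{j\alpha}}\log|\varOmega_{j}|
= \mbox{tr}\big[\varOmega_{j}^{-1}\varOmega_{j,\alpha}\big]
= -\mbox{tr}\big[\varOmega_{j,\alpha}^{-1}\varOmega_{j}\big],
\qquad
\frac{\partial}{\partial\eta_{j\alpha}}\log Z^{-1}
= -\frac{1}{2}\,\mbox{tr}\big[\varOmega_{j}^{-1}\varOmega_{j,\alpha}\big]
= \frac{1}{2}\,\mbox{tr}\big[\varOmega_{j,\alpha}^{-1}\varOmega_{j}\big].
\]

The second equality in the first expression uses \(\varOmega_{j,\alpha} = -\varOmega_{j}\varOmega^{-1}_{j,\alpha}\varOmega_{j}\), and only the \(j\)th summand of \(\log Z^{-1}\) depends on \(\eta_{j}\). As an illustration (an assumption, not an example from the source), a scalar diffusion of geometric Brownian motion type with \(\varOmega(\eta) = \sigma^{2}\eta^{2}\) gives \(\partial\log Z^{-1}/\partial\eta_{j} = -1/\eta_{j}\).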
Appendix 2: Continuum Limit
The expressions in the main text were obtained by using an Euler discretization of the SDE (16.1), so in the limit δt → 0, one expects a convergence of η j to the true state y(τ j) (see Kloeden and Platen 1999, ch. 9). Likewise, the (J + 1)p-dimensional Langevin equation (16.14) for η jα(u) will be an approximation of the stochastic partial differential equation (SPDE) for the random field Y α(u, t) on the temporal grid τ j = t 0 + jδt.
A rigorous theory (assuming constant diffusion matrices) is presented in the work of Reznikoff and Vanden-Eijnden (2005); Hairer et al. (2005, 2007); Apte et al. (2007); Hairer et al. (2011). In this section, we attempt to derive the terms that this literature obtains by functional derivatives directly from the discretization, especially in the case of state-dependent diffusions. Clearly, the finite-dimensional densities w.r.t. Lebesgue measure lose their meaning in the continuum limit, but the idea is to use large but finite \(J\), so that the Euler densities \(p(\eta_{0}, \ldots, \eta_{J})\) are good approximations of the unknown finite-dimensional densities \(p(y_{0}, \tau_{0}; \ldots; y_{J}, \tau_{J})\) of the process \(Y(t)\) (cf. Stratonovich 1971, 1989, Bagchi 2001 and the references cited therein).
16.1.1 Constant Diffusion Matrix
First we consider constant and (nonsingular) diffusion matrices Ω. The Lagrangian (16.15) attains the formal limit (Onsager-Machlup functional)
If \(y(t)\) is a sample function of the diffusion process \(Y(t)\) in (16.1), the first term (16.53) does not exist, since the quadratic variation \(dy(t)dy(t)^{\prime} = \varOmega dt\) is of order \(dt\). Thus we have \(dy(t)^{\prime}(\varOmega dt)^{-1}dy(t) = \mbox{tr}[(\varOmega dt)^{-1}dy(t)dy(t)^{\prime}] = \mbox{tr}[I_{p}] = p\), so each time step contributes a term of order one and the sum over the \(J\) steps diverges as \(\delta t \to 0\). Usually, (16.53) is written as the formal expression \(\frac{1}{2}\int \dot{y}(t)^{\prime}\varOmega^{-1}\dot{y}(t)dt\), which contains the (nonexistent) derivatives \(\dot{y}(t)\). Moreover, partial integration yields
so that C −1(t, s) = Ω −1(−∂ 2∕∂t 2)δ(t − s) is the kernel of the inverse covariance (precision) operator of Y (t) (for drift f = 0; i.e., a Wiener process). Indeed, since
the covariance operator kernel C(t, s) is
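Thus, \(p(y)\propto \exp [-\frac {1}{2} \int {y}(t)^{\prime }\varOmega ^{-1}(-\ddot {y}(t)) dt]\) is the formal density of a Gaussian process \(Y(t) \sim N(0, C)\), in agreement with the precision kernel \(C^{-1}(t,s) = \varOmega^{-1}(-\partial^{2}/\partial t^{2})\delta(t-s)\).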
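On a finite grid, the relation between the second-difference precision operator and the Wiener covariance can be verified directly. The sketch below is an illustration under stated assumptions: a scalar state, \(\eta_{0} = 0\) held fixed, and the covariance \(C(t,s) = \varOmega\min(t,s)\) that the text quotes for the scaled Wiener process; function names and numerical values are hypothetical.

```python
def precision(J, omega, dt):
    """Hessian of S_0 w.r.t. (eta_1, ..., eta_J) with eta_0 = 0 held fixed."""
    c = 1.0 / (omega * dt)
    P = [[0.0] * J for _ in range(J)]
    for j in range(J):
        P[j][j] = 2.0 * c if j < J - 1 else c     # the last grid point has one neighbour
        if j > 0:
            P[j][j - 1] = -c
        if j < J - 1:
            P[j][j + 1] = -c
    return P


def wiener_cov(J, omega, dt):
    """C[j][k] = omega * min(tau_j, tau_k) on the grid tau_j = j * dt, j = 1..J."""
    return [[omega * dt * min(j, k) for k in range(1, J + 1)] for j in range(1, J + 1)]


def matmul(A, B):
    """Plain matrix product for small square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


J, omega, dt = 8, 0.5, 0.25                       # illustrative values
product = matmul(precision(J, omega, dt), wiener_cov(J, omega, dt))
max_dev = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(J) for j in range(J))
print(max_dev)  # deviation of P * C from the identity matrix
```

The tridiagonal matrix is \(-(\varOmega\delta t)^{-1}\Delta\) with a one-sided difference at \(j = J\), so the check is the discrete counterpart of inverting \(\varOmega^{-1}(-\partial_{t}^{2})\).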
In contrast, the terms in (16.54) are well defined and yield the Radon-Nikodym derivative (cf. 16.17)
This expression can be obtained as the ratio of the finite-dimensional density functions \(p(y_{J}, \tau_{J}, \ldots, y_{1}, \tau_{1}\,|\,y_{0}, \tau_{0})\) for drifts \(f\) and \(f = 0\), respectively, in the limit \(\delta t \to 0\) (cf. Wong and Hajek 1985, ch. 6, p. 215 ff.). In this limit, the (unknown) exact densities can be replaced by the Euler densities (16.5). Now, the terms of the Langevin equation (16.14) will be given. We start with the measurement term (16.43), \(\alpha = 1, \ldots, p\)
where the scaled Kronecker delta \((\delta _{j j_{i}}/\delta t)\) was replaced by the delta function (see Appendix 1). Clearly, in numerical implementations, a certain term of the delta sequence δ n(t) must be used (cf. Lighthill 1958). Next, the term stemming from the driftless part (16.40) is
or Ω −1y tt(t) in matrix form, which corresponds to (16.55). The contributions of S 1 are (cf. 16.41)
The first term is of Itô form. Transformation to Stratonovich calculus (Apte et al. 2007, sects. 4, 9) yields
Thus, we obtain
where ∂ y ⋅ f(y) = f β,β = div(f). Finally we have (cf. 16.42)
and \(\delta _{y(t)}\log p(y(t_{0}))=\partial _{y_{0}}\log p(y_{0})\delta (t-t_{0})\). Putting all together, one finds the Langevin drift functional (in matrix form)
and the SPDE (cf. Hairer et al. 2007)
where \(W_{t}(u,t) = \partial_{t}W(u,t)\) is a cylindrical Wiener process with \(E[W_{t}(u,t)] = 0\), \(E[W_{t}(u,t)W_{s}(v,s)^{\prime}]=I_{p}\min(u,v)\delta(t-s)\), and \(W(u,t)\) is a Wiener field (Brownian sheet). See, e.g., Jetschke (1986); Da Prato and Zabczyk (1992, ch. 4.3.3). The cylindrical Wiener process may be viewed as continuum limit of \(W_{j}(u)/\sqrt{\delta t}\), \(E[W_{j}(u)/\sqrt{\delta t}\; W^{\prime}_{k}(v)/\sqrt{\delta t}]=I_{p}\min(u,v)\delta t^{-1}\delta_{jk}\).
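The Itô-to-Stratonovich transformation used in this subsection can be illustrated numerically for a scalar state with constant \(\varOmega\). In that case the Itô sum \(\sum_{j} f(y_{j})\varOmega^{-1}\delta y_{j}\) and its symmetrized counterpart differ by approximately \(\frac{1}{2}\int f^{\prime}(y)dt\), the scalar form of the divergence term. The sketch below is an illustration only: it uses a driftless simulated path and a hypothetical cubic function \(f\), and the names and values are not taken from the source.

```python
import math
import random


def wiener_path(J, dt, omega, rng):
    """Driftless path with increments N(0, omega * dt), started at zero."""
    y = [0.0]
    for _ in range(J):
        y.append(y[-1] + math.sqrt(omega * dt) * rng.gauss(0.0, 1.0))
    return y


def ito_sum(f, y, omega):
    """sum_j f(y_j) (y_{j+1} - y_j) / omega: integrand at the left end point."""
    return sum(f(y[j]) * (y[j + 1] - y[j]) for j in range(len(y) - 1)) / omega


def stratonovich_sum(f, y, omega):
    """Symmetrized sum: integrand averaged over both end points of each step."""
    return sum(0.5 * (f(y[j]) + f(y[j + 1])) * (y[j + 1] - y[j])
               for j in range(len(y) - 1)) / omega


def divergence_term(df, y, dt):
    """(1/2) * sum_j f'(y_j) dt, the scalar version of (1/2) * int div(f) dt."""
    return 0.5 * sum(df(y[j]) for j in range(len(y) - 1)) * dt


# Hypothetical cubic function and grid, chosen only for illustration.
f = lambda y: y - 0.1 * y ** 3
df = lambda y: 1.0 - 0.3 * y ** 2
J, dt, omega = 20000, 5e-5, 1.0

y = wiener_path(J, dt, omega, random.Random(2))
ito = ito_sum(f, y, omega)
strat = stratonovich_sum(f, y, omega)
corr = divergence_term(df, y, dt)
print(ito, strat - corr)  # the two numbers agree up to discretization noise
```

The remaining discrepancy is a mean-zero fluctuation of order \(\sqrt{\delta t}\), so it shrinks as the grid is refined.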
16.1.2 State-Dependent Diffusion Matrix
In this case, new terms appear. Starting with the first term in (16.47), one gets
The second term in (16.47) contains terms of the form h j (η j − η j−1) which appear in a backward Itô integral. Here we attempt to write them in symmetrized (Stratonovich) form. It turns out that the Taylor expansion (16.46) must be carried to higher orders. Writing (for simplicity in scalar form)
and expanding around η j
one obtains
To obtain a symmetric expression, h j,k is expanded around \(\eta _{j-1/2}:=\frac {1}{2}(\eta _{j-1}+\eta _{j})\). Noting that \(\eta _{j}-\eta _{j-1/2}=\frac {1}{2} \delta \eta _{j-1}\), we have
and together
Multiplying with δt −2 and collecting terms to order O(δt 2), one gets the continuum limit
The last term in (16.47) is absorbed in the expression (16.51).
The continuum limit of the first two terms in the derivative of \(S_{1}\) (see (16.49)) is
Transforming to Stratonovich calculus (16.59–16.60) yields
Equation (16.50) yields
The last term to be discussed is (16.51). Formally,
From the quadratic variation formula (dy − fdt)(dy − fdt)′ = Ωdt, it seems that it can be dropped. But setting \(\delta \eta _{j}-f_{j}\delta t = g_{j} z_{j}\sqrt {\delta t}\) (from the Euler scheme, see (16.3)), one gets
In scalar form, one has \(X := \frac{1}{2}\delta t^{-1}\varOmega^{-1}_{j,\alpha}\varOmega_{j}\,(1-z_{j}^{2})\), which is a shifted and scaled \(\chi_{1}^{2}\) variable, conditionally on \(\eta_{j}\). One has \(E[1-z^{2}]=0\) and \(\mbox{Var}(1-z^{2}) = E[1 - 2z^{2} + z^{4}] = 1-2+3 = 2\), thus \(E[X]=0\) and \(\mbox{Var}[X]=\frac{1}{2}\delta t^{-2}E[\varOmega^{-2}_{j,\alpha}\varOmega_{j}^{2}]\). Hence the term fluctuates around zero with a variance that grows as \(\delta t^{-2}\), and it cannot simply be dropped.
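The moments of \(1 - z^{2}\) and the \(\delta t^{-2}\) scaling of \(\mbox{Var}[X]\) can be checked with a short Monte Carlo experiment. The sketch assumes a standard normal \(z\) and, for the scaling, a hypothetical diffusion of geometric Brownian motion type, \(\varOmega(\eta) = \sigma^{2}\eta^{2}\); names and numerical values are illustrative.

```python
import random


def sample_moments(n, rng):
    """Monte Carlo mean and variance of 1 - z^2 for standard normal z."""
    vals = [1.0 - rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean, var


def var_x(eta, dt, omega, d_inv_omega):
    """Conditional variance of X: (1/2) * dt^{-2} * (d(omega^{-1})/d eta * omega)^2."""
    return 0.5 * dt ** -2 * (d_inv_omega(eta) * omega(eta)) ** 2


# Hypothetical state-dependent diffusion omega(eta) = sigma^2 * eta^2.
sigma = 0.2
omega = lambda eta: sigma ** 2 * eta ** 2
d_inv_omega = lambda eta: -2.0 / (sigma ** 2 * eta ** 3)   # derivative of 1 / omega

mean, var = sample_moments(200_000, random.Random(0))
ratio = var_x(1.5, 0.05, omega, d_inv_omega) / var_x(1.5, 0.1, omega, d_inv_omega)
print(mean, var, ratio)  # mean near 0, variance near 2, ratio near 4
```

Halving \(\delta t\) multiplies the conditional variance by four, which is the numerical counterpart of the statement that the term does not vanish in the continuum limit.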
Therefore, the drift functional in the state-dependent case is
16.1.3 Discussion
The second-order time derivative (diffusion term w.r.t. \(t\)) \(\varOmega^{-1}y_{tt}\) in the SPDE (16.61) resulted from the first term (16.53) in the Lagrangian corresponding to the driftless process (random walk process). Usually this term, which is infinite in the continuum limit, is not considered; it is removed by computing the density ratio (16.17), which leads to a well-defined Radon-Nikodym density (16.54). On the other hand, the term is necessary to obtain the correct SPDE. Starting from the Radon-Nikodym density (16.57) for the process \(dY(t) = fdt + GdW(t)\) at the outset, it is not quite clear how to construct the appropriate SPDE. Setting for simplicity \(f = 0\) and dropping the initial condition and the measurement part, Eq. (16.61) reads
This linear equation (Ornstein-Uhlenbeck process) can be solved using a stochastic convolution as (\(A:=\varOmega ^{-1}\partial _t^2\))
(cf. Da Prato 2004, ch. 2). It is a Gaussian process with mean \(\mu (u) = \exp (A u) E[Y(0)]\) and variance \(Q(u)=\exp (A u) \mbox{Var}(Y(0)) \exp (A^{*} u) + \int _{0}^{u} \exp (A s) 2 \exp (A^{*} s) ds\) where A ∗ is the adjoint of A. Thus the stationary distribution (u →∞) is the Gaussian measure N(0, Q(∞)) with \(Q(\infty )=-A^{-1}=-\varOmega \cdot [\partial _{t}^{2}]^{-1}\), since A = A ∗. But this coincides with \(C(t,s)=\varOmega \min (t,s)\), the covariance function of the scaled Wiener process G ⋅ W(t) (see (16.56); Ω = GG′). Thus, for large u, Y (u, t) generates trajectories of GW(t). More generally (f≠0), one obtains solutions of SDE (16.1). A related problem occurs in the state-dependent case Ω(y). Again, the term \(\int dy'(\varOmega dt)^{-1} dy\) yields a second-order derivative in the SPDE, but after transforming to symmetrized Stratonovich form, also higher-order terms appear (16.64), (16.65).
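The stationary covariance can be verified in one line. The following is a sketch that assumes \(A = A^{*}\) is negative definite, so that \(\exp(Au) \to 0\) for \(u \to \infty\); the scalar example is an illustration and not part of the source.

\[
Q(u) = e^{Au}\,\mbox{Var}(Y(0))\,e^{Au} + \int_{0}^{u} 2\,e^{2As}\,ds
= e^{Au}\,\mbox{Var}(Y(0))\,e^{Au} + A^{-1}\big(e^{2Au} - I\big)
\;\longrightarrow\; -A^{-1} \quad (u \to \infty).
\]

In the scalar case \(dY = aY\,du + \sqrt{2}\,dW(u)\) with \(a < 0\), this reduces to the familiar stationary variance \(-1/a\) of an Ornstein-Uhlenbeck process.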
Moreover, the differential of Ω −1 in the Lagrangian (16.53)–(16.54) imports a problematic term similar to (16.53) into the SPDE, namely, \(\frac {1}{2} (\dot {y}-f)^{\prime }(\varOmega ^{-1})_{y} (\dot {y}-f)\), which can be combined with the derivative of the Jacobian (cf. 16.68). Formally, it is squared white noise where the differentials are in Itô form. A procedure similar to (16.63), i.e.,
can be applied to obtain Stratonovich-type expressions. Because of the dubious nature of these expressions, only the quasi-continuous approach based on approximate finite-dimensional densities and Langevin equations is used in this paper.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Singer, H. (2018). Langevin and Kalman Importance Sampling for Nonlinear Continuous-Discrete State-Space Models. In: van Montfort, K., Oud, J.H.L., Voelkle, M.C. (eds) Continuous Time Modeling in the Behavioral and Related Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-77219-6_16
DOI: https://doi.org/10.1007/978-3-319-77219-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77218-9
Online ISBN: 978-3-319-77219-6
eBook Packages: Mathematics and Statistics (R0)