A Kalman particle filter for online parameter estimation with applications to affine models

In this paper we address the problem of estimating the posterior distribution of the static parameters of a continuous-time state space model with discrete-time observations by an algorithm that combines the Kalman filter and a particle filter. The proposed algorithm is semi-recursive and has a two-layer structure, in which the outer layer provides the estimation of the posterior distribution of the unknown parameters and the inner layer provides the estimation of the posterior distribution of the state variables. This algorithm has a structure similar to that of the so-called recursive nested particle filter, but unlike the latter filter, in which both layers use a particle filter, our algorithm introduces a dynamic kernel to sample the parameter particles in the outer layer to obtain a higher convergence speed. Moreover, the algorithm implements the Kalman filter in the inner layer to reduce the computational time. The algorithm can also be used to estimate parameters that suddenly change value.
We prove that, for a state space model with a certain structure, the estimated posterior distributions of the unknown parameters and the state variables converge to the actual distributions in L^p with rate of order O(N^{-1/2} + Δ^{1/2}), where N is the number of particles for the parameters in the outer layer and Δ is the maximum time step between two consecutive observations. We present numerical results of the implementation of this algorithm; in particular, we implement it for affine interest rate models, possibly with stochastic volatility, although the algorithm can be applied to a much broader class of models.


Introduction
We pose the problem, describe its background and give a brief sketch of earlier approaches. After that we explain our approach and contribution to the literature and outline the organization of the present paper.

Problem description and background
When using stochastic models in a business environment, the model parameters need to be estimated, which turns out to be a very challenging problem. The main methods for parameter estimation can be classified into two groups: Bayesian and maximum likelihood estimation (MLE) methods. Such methods can also be categorized as online or offline, depending on whether the data are used sequentially or in batches of observations. The MLE approach is to find the estimate which maximizes the marginal likelihood of the observed data. The Bayesian approach, however, considers the parameters as random variables which are updated recursively using prior knowledge of the parameters and the likelihood of the observations. In applications, the MLE based offline method is often linked to the Kalman filter or its modifications, such as the extended Kalman filter, see [14,36], or the unscented Kalman filter, see [37], because these algorithms can compute or approximate the likelihood function analytically. However, a common problem of MLE calibration is that the likelihood function is usually not convex. Hence the numerical optimization of the likelihood often ends up at a local maximum instead of the global maximum. This problem can be even more severe when dealing with models with many parameters, such as multi-factor Hull-White models, popular in interest rate modeling. Moreover, the MLE method normally requires static model parameters, while in reality the model parameters, such as the volatility in financial models, may change over time. These issues restrict the application of offline methods, for instance in financial modeling. Therefore, in recent decades, online methods have received increasing attention.
Early attempts to solve the problem of estimating the static parameters online included simulations (particles) of parameter values. One then has a particle filter, see for example [10,19,26,28] and [24] for a survey. However, through successive time steps this approach can quickly lead to what is called particle degeneracy of the parameter space. One solution to this degeneracy problem is to use a kernel density estimate of the posterior distribution of the parameters, from which new parameter particles can be drawn at each time step [27]. However, such a method only works for some models with low-dimensional parameters. Moreover, a convergence analysis of such a method is missing.
In recent years, some new methods have been proposed to deal with the online parameter estimation problem, including iterated batch importance sampling (IBIS), see [4], sequential Monte Carlo square (SMC2) simulation, see [5], and the recursive nested particle filter (RNPF for short), see [8]. The SMC2 and the RNPF use two layers of Monte Carlo methods to overcome certain difficulties with the IBIS method, see [29]. An important difference between SMC2 and the RNPF is that SMC2 is a non-recursive method, whereas the RNPF is recursive. Hence, in general, the RNPF is more efficient than SMC2.
In [8] the posterior measure of the parameters estimated by an RNPF algorithm is shown to converge to the actual measure in L^p-norm with rate N^{-1/2} + M^{-1/2}, where N is the number of particles for the parameter estimation in the outer layer and N × M is the number of particles for the state variables in the inner layer. The RNPF has some drawbacks for practical applications. One is that the computation of the two Monte Carlo layers is very time-consuming; another is that the RNPF requires the parameter mutation step to be small enough. As a consequence, the RNPF converges very slowly to the actual value of the parameters and hence requires a very long time series of data, which is often not available in applications.

Contribution
In this paper, we consider joint parameter and state estimation for a state space model where the state evolves continuously in time, whereas the observations are made at discrete time instants. We use a Bayesian online approach to parameter estimation. We propose an algorithm which combines the Kalman filter and a particle filter for online estimation of the posterior distribution of the unknown parameters. This algorithm has a structure similar to that of the RNPF: it is a semi-recursive algorithm, also with a two-layer structure. The inner layer approximates the posterior distribution of the state variables conditioned on the parameter particles generated in the outer layer, while the outer layer approximates the posterior distribution of the parameters by using the outcome of the inner layer.
Our proposed methodology has two main differences when compared to the RNPF algorithm. One difference is that in the inner layer the posterior distribution of the state variables is estimated by the Kalman filter instead of a particle filter. The implementation of the Kalman filter reduces the computational complexity and hence results in a much faster and more robust algorithm. The second difference is in the outer layer. In the RNPF the parameter samples are generated from a certain kernel function. In order to obtain a recursive algorithm, some requirements on the kernel function are imposed. This results in a kernel that significantly reduces the convergence speed of the RNPF. We overcome this problem by using dynamic jittering kernels. Specifically, in this paper we implement two different kernel functions. One is applied at the beginning stage to obtain a higher convergence speed. The consequence, however, is that the algorithm is not recursive at this beginning stage, since this kernel function does not satisfy the requirements of a recursive algorithm. The other kernel is applied when the variance of the parameter particles has decreased to a level at which this kernel function satisfies the conditions for a recursive method. From that time on, the algorithm is truly recursive. In the numerical experiments we performed, we observed that the variance of the particles decreases very fast at the beginning stage, usually within a few hundred steps. Hence, by using these two different kernel functions, the algorithm converges much faster than the RNPF.
This paper also provides theoretical results on the asymptotic behavior of the proposed algorithm. When dealing with non-Gaussian or non-linear models, the Kalman filter in the inner layer may produce a biased estimate of the posterior distribution of the state variables. This makes it difficult to study the convergence of the posterior distribution of the parameters in general. Although it is shown in [30] that, under certain assumptions, the bias introduced in the inner layer makes the posterior distribution of the parameters converge to a biased distribution, this bias is intractable in general. In this paper, for models with a certain structure, we show that the estimated distributions of the parameters and the states converge to the actual distributions in L^p with rate of order O(N^{-1/2} + δ^{1/2}) under certain regularity assumptions, where N is the number of particles for the parameter space and δ is the maximum time step between consecutive observations. Note that we do not have to deal with particles in the inner layer, which removes the M^{-1/2} term present in the convergence rate of the RNPF. Our proofs are inspired by those in [8], but at crucial steps we obtain novel results. These are due to the use of the Kalman filter in one of the layers and to the size of the time discretization that governs the observations of the continuous-time system, the latter not playing a role in the setting of the cited reference.
To illustrate the performance of the algorithm, we present numerical results of the parameter estimation for several affine interest rate models, some allowing for stochastic volatility, including the two-factor Hull-White model and the Cox-Ingersoll-Ross (CIR) model. For the CIR model we have also implemented the RNPF, and we observed that our algorithm outperforms the RNPF. Although the algorithm is designed for static parameter estimation, it can also be used to estimate parameters that undergo sudden changes in value. We also present an implementation of the algorithm in such a situation, and we observe that the algorithm is able to quickly track such a sudden change.

Organization of the paper
In Section 2 we present the state space model of interest. This section also provides brief reviews of Bayesian filters, including the Kalman filter and the particle filter, and of online parameter estimation using particle filters. Section 3 contains an encompassing framework for various affine models that are used in interest rate modeling and to which we apply our proposed Kalman particle algorithm, which is introduced in Section 4. In Section 5 we provide the convergence analysis and in Section 6 the numerical results are presented. Finally, Section 7 is devoted to the conclusions. In the Appendix we collect some background results on affine processes.

Notation
Let d ≥ 1, S ⊆ R^d, and let B(S) be the sigma-algebra of Borel subsets of S. We denote by 1_A the indicator function of A ∈ B(S) and by δ_x the Dirac measure at a given x ∈ S, i.e., δ_x(A) = 1_A(x). Suppose given a function f : S → R and a probability measure µ on (S, B(S)). We denote the integral of f w.r.t. µ by (f, µ) := ∫_S f(x) µ(dx) and the supremum norm of f by ‖f‖_∞ = sup_{x∈S} |f(x)|. We use the notation x_{0:k} := (x_0, ..., x_k) for a discrete-time sequence up to time k of a process (x_k)_{k∈N}. By ⊤ we denote the transpose of a vector or a matrix. The Euclidean norm of an element x ∈ R^d is denoted by ‖x‖ and the L^p-norm, for p ≥ 1, of a random variable X, defined on some probability space (Ω, F, P), is denoted by ‖X‖_p = (E|X|^p)^{1/p}. Densities of random variables or vectors x (always assumed to exist w.r.t. the Lebesgue measure) are often denoted p, or p(x), and conditional densities of X given Y = y are often denoted p(x | y), possibly endowed with sub- or superscripts.
Set up and background on parameter estimation using filters
In this section we outline the set up, pose the problem formulation, give a brief survey of various filters (Bayesian, Kalman, particle filter) and address the parameter estimation problem using particle filters. Time is assumed to be discrete.

Discrete-time state space model
We consider the following general state space model,

x_k = f_k(x_{k-1}) + u_k,
y_k = h_k(x_k) + v_k, k ∈ N_+, (2.1)

where f_k and h_k are given functions and {u_k}_{k∈N_+} and {v_k}_{k∈N_+} are d-dimensional white noise processes, possibly independent, and both independent of the initial condition x_0, all defined on some (Ω, F, P). The parameters in the functions f_k and h_k, together with the covariances of u_k and v_k, can be seen as the parameters of the state space model, to which we refer as θ.
It follows that the model (2.1) satisfies the properties of a stochastic system, i.e., at every (present) time k ≥ 1 the future states and observations (x_j, y_j), j ≥ k, are conditionally independent of the past states and observations (x_j, y_{j-1}), j ≤ k, given the present state x_k, see [35]. It then follows that {x_k}_{k∈N} is a Markov process, and for every k ≥ 1 one has that y_k and y_{1:k-1} are conditionally independent given x_k; in terms of densities,

p(y_k | x_k, y_{1:k-1}) = p(y_k | x_k). (2.2)

Moreover, one also has, for every k ≥ 1, that x_k and y_{1:k-1} are conditionally independent given x_{k-1}; in terms of densities,

p(x_k | x_{k-1}, y_{1:k-1}) = p(x_k | x_{k-1}). (2.3)

The latter equation has the consequence

p(x_k | y_{1:k-1}) = ∫ p(x_k | x_{k-1}) p(x_{k-1} | y_{1:k-1}) dx_{k-1}.

We are interested in estimating the (latent) state process {x_k}_{k∈N_+}, but only have access to the process {y_k}_{k∈N_+} which represents the observations. Because of the white noise in the data, estimating the value of the latent states {x_k}_{k∈N_+} from the observations {y_k}_{k∈N_+} is not trivial. There are different methodologies in the literature to estimate the latent process (see e.g. [32,6,1]). We introduce some of these methodologies in our paper since we will need them in our analysis later. We first introduce the Bayesian filter.

Bayesian filter of discrete-time Markovian state space model
The Bayesian filter, see e.g. [32,33] for an overview, is used to estimate the latent states {x_k}_{k∈N_+} in (2.1) given the parameter θ. We define the initial probability measure π_0 of x_0 by π_0(A) = P(x_0 ∈ A), and the transition measure π^θ_k of x_k under a given parameter θ at time k by

π^θ_k(A | x) = P(x_k ∈ A | x_{k-1} = x, θ), A ∈ B(R^d).

The methodology in Bayesian filtering consists of two parts: prediction and update. At every time point k, the prediction part computes (estimates) the prior measure of x_k (at time k given the past observations up to time k - 1) and the update part computes (estimates) the posterior measure of x_k given the past up to time k, respectively given by

γ^θ_k(A) = P(x_k ∈ A | y_{1:k-1}, θ), Γ^θ_k(A) = P(x_k ∈ A | y_{1:k}, θ). (2.6)

Using Bayes' rule, we deduce that the density function of the prior distribution is given by

p(x_k | y_{1:k-1}, θ) = ∫ p(x_k | x_{k-1}, y_{1:k-1}, θ) p(x_{k-1} | y_{1:k-1}, θ) dx_{k-1} = ∫ p(x_k | x_{k-1}, θ) p(x_{k-1} | y_{1:k-1}, θ) dx_{k-1},

where we used (2.3) to get the last equality. This implies the relation

γ^θ_k(A) = ∫ π^θ_k(A | x) Γ^θ_{k-1}(dx). (2.7)

Let g : R^d → R be an integrable function w.r.t. the measure γ^θ_k. Then we get by Fubini's theorem

(g, γ^θ_k) = ∫ (g, π^θ_k(· | x)) Γ^θ_{k-1}(dx),

which we abbreviate by

(g, γ^θ_k) = ((g, π^θ_k), Γ^θ_{k-1}). (2.8)

The purpose of the Bayesian algorithm is to sequentially compute the posterior measure Γ^θ_k. Let

l^θ_{y_k}(x) := p(y_k | x_k = x, θ)

be the probability (with some abuse of statistical terminology we often also call it likelihood) of the realized observation y_k conditional on the state value x_k = x and the model parameter θ.
Then, using Bayes' rule, (2.2) and (2.7), we obtain for a function g that is integrable w.r.t. Γ^θ_k that

(g, Γ^θ_k) = ∫ g(x) l^θ_{y_k}(x) γ^θ_k(dx) / ∫ l^θ_{y_k}(x) γ^θ_k(dx),

which we abbreviate, similar to (2.8), by

(g, Γ^θ_k) = (g l^θ_{y_k}, γ^θ_k) / (l^θ_{y_k}, γ^θ_k). (2.9)

If we assume the likelihood function l^θ_{y_k} and the transition measure π^θ_k are known, then, given the posterior measure Γ^θ_{k-1}, we can use Equation (2.9) to compute the posterior measure Γ^θ_k. In this way the posterior measures {Γ^θ_k}_{k∈N_+} can be computed recursively. Moreover, using (2.2) again, the conditional likelihood p(y_k | y_{1:k-1}, θ) and the likelihood p(y_{1:k} | θ) can be respectively computed as

p(y_k | y_{1:k-1}, θ) = (l^θ_{y_k}, γ^θ_k) and p(y_{1:k} | θ) = ∏_{j=1}^{k} (l^θ_{y_j}, γ^θ_j). (2.10)

When (2.1) is a linear Gaussian model, the Bayesian filter is equivalent to the Kalman filter, which we briefly review in the next subsection.
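The prediction-update recursion (2.7)-(2.9) can be made concrete on a discretized state space. The following sketch is not from the paper: the scalar toy model, the grid and all numerical values are assumptions made purely for illustration. It approximates the prior by applying the transition density to the previous posterior, and the posterior by weighting with the likelihood and normalizing.

```python
import numpy as np

# Illustrative sketch (not from the paper): the prediction-update
# recursion of the Bayesian filter evaluated on a fixed grid, for the
# assumed scalar toy model
#   x_k = 0.9 x_{k-1} + u_k,  y_k = x_k + v_k,  u_k, v_k ~ N(0, 1).

rng = np.random.default_rng(0)
grid = np.linspace(-10.0, 10.0, 401)       # discretized state space
dx = grid[1] - grid[0]

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# transition density matrix: T[i, j] ~ p(x_k = grid[i] | x_{k-1} = grid[j])
T = gauss(grid[:, None], 0.9 * grid[None, :], 1.0)

def bayes_step(posterior, y):
    prior = T @ posterior * dx             # prediction step, cf. (2.7)
    lik = gauss(y, grid, 1.0)              # likelihood l_y(x)
    unnorm = lik * prior                   # update step, cf. (2.9)
    return unnorm / (unnorm.sum() * dx)

# simulate a short trajectory and run the filter on its observations
x = 0.0
posterior = gauss(grid, 0.0, 1.0)          # initial density of x_0
for _ in range(50):
    x = 0.9 * x + rng.normal()
    y = x + rng.normal()
    posterior = bayes_step(posterior, y)

post_mean = np.sum(grid * posterior) * dx  # posterior mean estimate of x_50
```

Such a grid approximation is only feasible in low dimensions; the particle filter discussed below replaces the fixed grid by random samples.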

Kalman filter
We assume that the state and observations in (2.1) evolve according to a linear Gaussian model. That is, the functions f_k and h_k take the linear forms

x_k = F_k x_{k-1} + u_k,
y_k = H_k x_k + v_k, (2.11)

where F_k is a d × d matrix, H_k is an m × d matrix and the noise terms u_k (d-dimensional) and v_k (m-dimensional) are assumed to be Gaussian with mean 0 and covariance Q_k and R_k, respectively. Moreover, the initial state x_0 is assumed to be Gaussian. Due to the Gaussian assumptions and the linear structure of the model in (2.11), one can derive analytic expressions for the prior and posterior measures defined in (2.6), and the algorithm in the Kalman filter, see e.g. [6,20], yields the exact solution to the estimation problem. Denote by N(dx; µ, Σ) or N(µ, Σ) the Gaussian distribution with mean µ and covariance Σ. We also use the generic notation N(x; µ, Σ) to denote the density at x of this normal distribution. Recall from (2.6) the prior and posterior measures γ^θ_k and Γ^θ_k, and denote by B_{k-1} and P_{k-1}, respectively, the mean and the covariance of the posterior measure at time k - 1. Then the prior measure is given by

γ^θ_k(dx) = N(dx; F_k B_{k-1}, F_k P_{k-1} F_k^⊤ + Q_k),

which implies that the prior measure is a conditionally Gaussian measure with mean and covariance respectively given by

B̄_k = F_k B_{k-1}, P̄_k = F_k P_{k-1} F_k^⊤ + Q_k.

Moreover, the posterior measure is given by

Γ^θ_k(dx) = N(dx; B_k, P_k),

where

K_k = P̄_k H_k^⊤ (H_k P̄_k H_k^⊤ + R_k)^{-1}, B_k = B̄_k + K_k (y_k - H_k B̄_k), P_k = (I - K_k H_k) P̄_k.

Finally, the conditional likelihood is given by

p(y_k | y_{1:k-1}, θ) = N(y_k; H_k B̄_k, S_k), with S_k = H_k P̄_k H_k^⊤ + R_k.

Then we obtain the recursion for the log-likelihood of the observations, log p(y_{1:k} | θ), as follows,

log p(y_{1:k} | θ) = log p(y_{1:k-1} | θ) - (m/2) log(2π) - (1/2) log det S_k - (1/2) (y_k - H_k B̄_k)^⊤ S_k^{-1} (y_k - H_k B̄_k),

where m is the dimensionality of the data y_k, k ∈ N_+. Hence, by maximizing the likelihood of the observations, one can determine the optimal parameters of the linear Gaussian system (2.11).
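The prediction-update recursion above can be sketched in a few lines of code. The following is a minimal illustration, not the paper's implementation; the scalar toy model and all numerical values are assumptions. It computes the prior mean and covariance, the Kalman gain, the posterior mean and covariance, and accumulates the log-likelihood.

```python
import numpy as np

# A minimal Kalman filter sketch following the recursion above:
# prediction of the prior mean/covariance, update via the Kalman gain,
# and accumulation of the log-likelihood log p(y_{1:k} | theta).
# The scalar toy model below is an assumption made for illustration.

def kalman_filter(ys, F, H, Q, R, B0, P0):
    B, P = B0, P0
    loglik = 0.0
    m = H.shape[0]                          # dimension of the observations
    for y in ys:
        B_prior = F @ B                     # prior mean
        P_prior = F @ P @ F.T + Q           # prior covariance
        S = H @ P_prior @ H.T + R           # innovation covariance
        resid = y - H @ B_prior             # innovation
        K = P_prior @ H.T @ np.linalg.inv(S)        # Kalman gain
        B = B_prior + K @ resid             # posterior mean
        P = (np.eye(len(B)) - K @ H) @ P_prior      # posterior covariance
        loglik += -0.5 * (m * np.log(2 * np.pi)
                          + np.log(np.linalg.det(S))
                          + resid @ np.linalg.solve(S, resid))
    return B, P, loglik

# toy linear Gaussian model: x_k = 0.9 x_{k-1} + u_k, y_k = x_k + v_k
rng = np.random.default_rng(1)
F = np.array([[0.9]]); H = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
x, ys = np.zeros(1), []
for _ in range(200):
    x = F @ x + rng.normal(size=1)
    ys.append(H @ x + rng.normal(size=1))

B, P, loglik = kalman_filter(ys, F, H, Q, R, np.zeros(1), np.eye(1))
```

For this toy model the posterior covariance P converges quickly to the fixed point of the Riccati recursion, close to 0.6.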
For most non-linear, non-Gaussian models it is not possible to compute the prior and posterior measures analytically, and numerical methods are called for. In this case the particle filter, which we introduce in the next subsection, is widely used.

Particle filter
In the particle filter, see e.g. [1,3,11], the prior and posterior distributions are estimated by a Monte Carlo method. With a Monte Carlo method, a certain measure µ is generally estimated by

µ^N(dx) = Σ_{i=1}^{N} w^{(i)} δ_{x^{(i)}}(dx),

where the {x^{(i)}, i = 1, ..., N} are i.i.d. random samples from a so-called importance density and the w^{(i)} are the corresponding normalized importance weights. The key part of the particle filter is to choose the importance density and compute the importance weights, see e.g. [9]. For the general state space model (2.1), suppose the posterior measure Γ^θ_{k-1} at time k - 1 is estimated by

Γ^{θ,N}_{k-1}(dx) = (1/N) Σ_{i=1}^{N} δ_{x^{(i)}_{k-1}}(dx).

At time k, new samples x̄^{(i)}_k are generated from the transition measure π^θ_k(· | x^{(i)}_{k-1}). Moreover, using Equation (2.9), the prior and posterior measures are respectively estimated by

γ^{θ,N}_k(dx) = (1/N) Σ_{i=1}^{N} δ_{x̄^{(i)}_k}(dx) and Γ^{θ,N}_k(dx) = Σ_{i=1}^{N} a^{(i)}_k δ_{x̄^{(i)}_k}(dx), (2.14)

and from (2.10) we deduce the following approximation of the conditional likelihood,

p^N(y_k | y_{1:k-1}, θ) = (l^θ_{y_k}, γ^{θ,N}_k) = (1/N) Σ_{i=1}^{N} l^θ_{y_k}(x̄^{(i)}_k).

Consequently, the integral (f, Γ^θ_k) can be estimated by

(f, Γ^{θ,N}_k) = Σ_{i=1}^{N} a^{(i)}_k f(x̄^{(i)}_k), (2.15)

where the weights a^{(i)}_k are defined by

a^{(i)}_k = l^θ_{y_k}(x̄^{(i)}_k) / Σ_{j=1}^{N} l^θ_{y_k}(x̄^{(j)}_k). (2.16)

Equations (2.14) and (2.15) show how to sequentially estimate the posterior measure Γ^θ_k by the Monte Carlo method. This type of particle filter is often referred to as a sequential particle filter. In [9] it is shown that the variance of the importance weights increases stochastically over time. This leads the importance weights to concentrate on a small number of sampled particles. This problem is called degeneracy. To address the rapid degeneracy problem, the sampling-importance resampling (SIR) method, see e.g. [9,31], is introduced to eliminate the samples with low importance weights and multiply the samples with high importance weights. In SIR, once the approximation Γ^{θ,N}_k of the posterior measure is obtained, one resamples: each x̃^{(i)}_k is set equal to x̄^{(j)}_k with probability a^{(j)}_k. Then the new estimate of the posterior measure Γ^θ_k is given by

(1/N) Σ_{i=1}^{N} δ_{x̃^{(i)}_k}(dx),

and the estimate of the conditional likelihood remains p^N(y_k | y_{1:k-1}, θ) as above.
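A bootstrap particle filter with SIR resampling along the lines of this subsection can be sketched as follows. The non-linear toy model and all constants are assumptions chosen for illustration; in particular, the likelihood is evaluated only up to a multiplicative constant, which cancels in the normalized weights.

```python
import numpy as np

# A bootstrap particle filter with sampling-importance resampling (SIR).
# The non-linear toy model
#   x_k = 0.5 x_{k-1} + sin(x_{k-1}) + u_k,  y_k = x_k + v_k,
# with standard Gaussian noises, is an assumption made for illustration.

rng = np.random.default_rng(2)
N = 2000                                   # number of particles

def transition(x):                         # sample from the transition measure
    return 0.5 * x + np.sin(x) + rng.normal(size=x.shape)

def log_lik(y, x):                         # log l_y(x), up to a constant
    return -0.5 * (y - x) ** 2

def pf_step(particles, y):
    particles = transition(particles)      # propagate the particles
    w = np.exp(log_lik(y, particles))
    a = w / w.sum()                        # normalized weights, cf. (2.16)
    cond_lik = w.mean()                    # ~ p(y_k | y_{1:k-1}), up to a constant
    idx = rng.choice(N, size=N, p=a)       # SIR: resample, equal weights
    return particles[idx], cond_lik

# run the filter on a simulated trajectory
x = 0.0
particles = rng.normal(size=N)
for _ in range(100):
    x = 0.5 * x + np.sin(x) + rng.normal()
    y = x + rng.normal()
    particles, cond_lik = pf_step(particles, y)

est = particles.mean()                     # posterior mean estimate of x_100
```

Resampling at every step, as done here, is the simplest variant; in practice one often resamples only when an effective sample size criterion is triggered.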

Static model parameter estimation using the particle filter
When the parameters are known, the particle filter is a quite effective algorithm for latent variable estimation. However, if the parameters are not known beforehand, it is a very challenging task to estimate the parameters and the latent states using the particle filter. Here we take a Bayesian approach to estimate the parameters. Online estimation of the parameters requires the computation of the posterior distribution of θ, i.e., p(θ | y_{1:k}), k ∈ N_+. Using Bayes' rule, one can represent the posterior density as

p(θ | y_{1:k}) ∝ p(y_k | y_{1:k-1}, θ) p(θ | y_{1:k-1}).

To estimate the density and the distribution of θ given y_{1:k}, a straightforward way is to sample parameter particles from the former posterior distribution p(θ | y_{1:k-1}). Denote the samples by {θ^{(i)}, i = 1, ..., N}. The posterior distribution of θ given y_{1:k} can then be evaluated as

p(θ | y_{1:k}) dθ ≈ Σ_{i=1}^{N} w^{(i)}_k δ_{θ^{(i)}}(dθ), (2.17)

where the weights w^{(i)}_k are given by

w^{(i)}_k = p(y_k | y_{1:k-1}, θ^{(i)}) / Σ_{j=1}^{N} p(y_k | y_{1:k-1}, θ^{(j)}). (2.18)
There are two issues in implementing (2.17). One is that sampling from the former posterior distribution p(θ | y_{1:k-1}) usually cannot be carried out exactly. Another is that the likelihood p(y_k | y_{1:k-1}, θ^{(i)}) often cannot be computed analytically. These two issues can be tackled by using the recursive nested particle filter (RNPF), recently introduced in [8], which is presented below.

Recursive nested particle filter
In the RNPF, a two-layer Monte Carlo method is used. In the first layer, also referred to as the outer layer, new parameter samples are generated by using a kernel function. This step is usually called jittering and the kernel is referred to as the jittering kernel. In the second layer, also called the inner layer, a particle filter is applied to approximate the conditional likelihood p(y_k | y_{1:k-1}, θ^{(i)}). In the remainder of this section we present the RNPF in more detail and introduce the ensuing Algorithm 2.1. First, assume that θ has a compact support D_θ ⊂ R^{d_θ}, where d_θ is the dimension of θ. Moreover, assume that at time k - 1 one has a random grid of samples in the parameter space D_θ, say {θ^{(i)}_{k-1}, i = 1, ..., N}, and that for each θ^{(i)}_{k-1} one has a set of particles in the state space, {x^{(i,j)}_{k-1}, j = 1, ..., M}.
• Jittering. Given the parameter samples {θ^{(i)}_{k-1}}, new samples {θ̂^{(i)}_k} are generated from the jittering kernel.
• Update. From Equations (2.8) and (2.10), we know that for a given θ the marginal likelihood is obtained by calculating the integral

p(y_k | y_{1:k-1}, θ) = ((l^θ_{y_k}, π^θ_k), Γ^θ_{k-1}).

In order to compute this latter integral, the posterior measure at time k - 1, Γ^θ_{k-1}, needs to be known. In the standard Bayesian filter, the parameters are fixed over time and this posterior measure is computed at time k - 1 by using Equation (2.9). However, in this case the measure is not directly available, since the parameter has evolved from θ at time k - 1 to θ̂ at time k. In order to compute the posterior measure for θ̂ at time k - 1, one would need to re-run a filter from time 1 to k, which makes the algorithm non-recursive and very time consuming. The authors in [8] solved this latter problem by assuming that Γ^θ_{k-1} depends continuously on θ. Therefore, by considering a rather small variance in the jittering kernel, one can use the particle approximation of the filter computed for θ at time k - 1 as a particle approximation of the filter for the newly sampled θ̂ at time k.
In the RNPF, the jittering kernel is chosen such that the mutation step from θ_{k-1} to θ̂_k is sufficiently small, see Section 4.2 in [8]. Then for each θ̂^{(i)}_k, i = 1, ..., N, a sequential particle filter (see Section 2.4 for the description of the particle filter methodology) is used for the state space to obtain the particles {x^{(i,j)}_k, j = 1, ..., M}, see the filtering steps up to 1.e in Algorithm 2.1 below. This is the inner Monte Carlo layer.
• Resampling. The outer-layer Monte Carlo method in the update step above provides an approximation of the likelihoods p(y_k | y_{1:k-1}, θ̂^{(i)}_k), which are used to re-weight the parameter particles and obtain {θ^{(i)}_k}.
The RNPF is introduced in [8]. We reproduce it here for the sake of completeness.
Algorithm 2.1 (sequential nested particle filter for parameter estimation).
Initialization: Assume an initial distribution p(θ_0) for the parameters and p(x_0) for the states, and sample from these initial distributions to get N particles {θ^{(i)}_0, i = 1, ..., N} and, for each i, M particles {x^{(i,j)}_0, j = 1, ..., M}.

Recursion:
1. Filtering: given {θ^{(i)}_{k-1}} and {x^{(i,j)}_{k-1}} (together with the next two steps, this is the update part):
a. sample new parameters θ̂^{(i)}_k from the jittering kernel;
b. for each θ̂^{(i)}_k, sample new state particles x̄^{(i,j)}_k from the transition measure;
c. approximate the conditional likelihood p(y_k | y_{1:k-1}, θ̂^{(i)}_k);
d. compute the weights for the state space using Equation (2.16);
e. resample the x̄^{(i,j)}_k: set x^{(i,j)}_k equal to x̄^{(i,p)}_k with probability a^{(i,p)}_k.

2. Resampling of the {θ^{(i)}_k}: compute the weights for the parameter space using Equation (2.18) and resample the parameter particles accordingly.
Go back to the filtering step.
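The nested structure of Algorithm 2.1 can be sketched compactly as follows. This is not the authors' implementation: the AR(1) toy model, the plain truncated Gaussian jittering kernel, its scale and all tuning constants are assumptions made for illustration.

```python
import numpy as np

# A compact sketch of the nested structure of Algorithm 2.1 (not the
# authors' implementation): N parameter particles, each carrying M state
# particles. It estimates the AR coefficient phi of the assumed toy model
#   x_k = phi x_{k-1} + u_k,  y_k = x_k + v_k.

rng = np.random.default_rng(3)
N, M, K = 200, 50, 300
phi_true = 0.8

x, ys = 0.0, []                            # simulate K observations
for _ in range(K):
    x = phi_true * x + rng.normal()
    ys.append(x + rng.normal())

theta = rng.uniform(-1.0, 1.0, size=N)     # parameter particles
states = rng.normal(size=(N, M))           # M state particles per theta

for y in ys:
    # jittering: small Gaussian mutation, truncated to the support
    theta = np.clip(theta + rng.normal(scale=0.5 / np.sqrt(N), size=N),
                    -1.0, 1.0)
    # inner layer: one bootstrap particle filter step per theta^(i)
    states = theta[:, None] * states + rng.normal(size=(N, M))
    logw = -0.5 * (y - states) ** 2
    w = np.exp(logw - logw.max())          # globally stabilized weights
    lik = w.mean(axis=1)                   # ~ p(y_k | y_{1:k-1}, theta^(i))
    a = w / w.sum(axis=1, keepdims=True)
    for i in range(N):                     # resample the state particles
        states[i] = states[i, rng.choice(M, size=M, p=a[i])]
    # outer layer: resample the parameter particles, cf. (2.18)
    wt = lik / lik.sum()
    idx = rng.choice(N, size=N, p=wt)
    theta, states = theta[idx], states[idx]

phi_est = theta.mean()                     # point estimate of phi
```

Note how each parameter particle carries its own set of state particles, which are resampled together with it in the outer layer; this is precisely what makes the computational burden of the RNPF heavy.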

A note on the convergence of the RNPF
A convergence study of Algorithm 2.1 was carried out in Lemmas 3 to 6 and Theorems 2 and 3 in [8], where the reasoning was split according to the three steps of the algorithm: the jittering, the update and the resampling. In this latter paper it was proven that, under some regularity conditions, the L^p-norms of the approximation errors induced by these different steps vanish with rates proportional to 1/√N and 1/√M. Recall here that N and N × M are respectively the number of samples in the parameter space and the number of particles in the state space. A similar result was proven for the approximation of the joint posterior distribution of the parameters and the state variables. We will make use of some of these convergence results later in Section 5 to carry out the convergence study of our proposed algorithm, Algorithm 4.3.
Under the assumption that the posterior measure Γ^θ_k(dx) is continuous w.r.t. the parameter θ and that the mutation step of the parameters is small enough, the RNPF is a recursive algorithm. This makes the RNPF more efficient than non-recursive methods such as sequential Monte Carlo square, see [5], and Markov chain Monte Carlo methods, see [16,17,22]. The drawbacks of the RNPF are its heavy computational burden and slow convergence, which are respectively due to the nested simulations in the two Monte Carlo layers and the small mutation step of the parameters. In many applications, such as financial modeling, the time length of the data is quite limited. Hence the available time series are often not long enough for the RNPF to converge. To tackle this problem, we propose a new methodology in Section 4.

Parameter estimation in short rate models
Here we present a rather general model, an affine process, which is particularly relevant in mathematical finance, for instance when one is interested in estimating the parameters of the short rate curve given the observed data. It motivates the kind of system that we will consider and to which the new (Kalman particle) filter of Section 4 will be applied.
Let (Ω, F, (F_t)_{t≥0}, P) be a filtered probability space satisfying the usual conditions and let (W_t)_{t≥0} be a d-dimensional Brownian motion. In this paper, although our results can be applied to general state space models of type (2.1), we will mainly consider dynamics of the type (3.1), where A, Σ and Σ̄ are d × d matrices, β is a d-vector whose first component is non-negative, and (x¹_t)_{t≥0} is the first component of (x_t)_{t≥0}. We assume the matrix A is diagonal and we denote the diagonal elements of A by α_1, ..., α_d. Consider some integers p, q ≥ 0 with p + q = d. When Σ^⊤Σ̄ = Σ̄^⊤Σ = 0 and the parameters of the model (3.1) satisfy certain conditions known in the literature as admissibility conditions, the process (x_t)_{t≥0} is an R^p_+ × R^q-valued affine process, see [12,13,25] for an overview of affine processes. In Appendix A.1 we specify the admissibility of the parameters of the dynamics (3.1). In our context, the short rate evolution will be described by a process (r_t)_{t≥0} given in terms of (x_t)_{t≥0} by

r_t = c + γ^⊤ x_t,

where c ∈ R, γ ∈ R^d. Let T > 0 be the maturity time; then the zero coupon bond price at time t < T is defined as

P(t, T) = E[exp(-∫_t^T r_s ds) | F_t],

and the corresponding zero rates, also called yields, are defined as -log P(t, T)/(T - t). The fact that the process (x_t)_{t≥0} is affine, which happens if Σ^⊤Σ̄ is zero, allows one to obtain an explicit formula for the zero coupon bond price, i.e.

P(t, T) = exp(φ(T - t) + ψ(T - t)^⊤ x_t). (3.2)
The functions φ and ψ are the solutions to certain ordinary differential equations, often referred to as the Riccati equations, see Theorem A.1 in Appendix A.2 for details. If Σ̄ = 0 (first case), Equation (3.2) holds for (φ, ψ) the solution to (A.1). If Σ = 0 (second case), then (3.2) holds for (φ, ψ) the solution to (A.2). Denote the time to maturity T - t by τ; then the zero rate at time t with time to maturity τ can be computed as

R(t, τ) = -(φ(τ) + ψ(τ)^⊤ x_t)/τ.

In the market, we can obtain data for zero rates at discrete time instants t_k with certain times to maturity τ_1, ..., τ_L; call these data R_k(τ_l). We believe these data contain noise, hence at time t_k we observe

y_{k,l} = -(φ(τ_l) + ψ(τ_l)^⊤ x_{t_k})/τ_l + v_{k,l}, l = 1, ..., L,

where v_k is an L-dimensional random vector which represents the noise in the observed data. Let 0 = t_0 ≤ t_1 ≤ ... ≤ t_n = T be a partition of the time interval [0, T]. Then, considering a time-discrete version x_k := x_{t_k}, k ∈ N_+, of the affine process (x_t)_{t≥0}, our aim is to estimate the parameters of the latent state process (x_k)_{k∈N_+} given the observations (y_k)_{k∈N_+}. To be more precise, we consider the state space model (3.4), (3.5) consisting of a time discretization of (3.1) together with the observation equation above; the observation equation can be seen to be of the general form in (2.11) by enlarging the state vector, where I is the identity matrix, (x_k)_{k∈N_+} is the latent process, (y_k)_{k∈N_+} represents the observations, and v_k represents the noise. The aim is to estimate the model parameters A, β, Σ, Σ̄ given the observation vector y_{1:k} and the variance of v_k.
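As a concrete instance of the zero-rate formula, consider the one-factor Vasicek model, a special case with explicit Riccati solutions. The sketch below is an illustration under our own assumptions: the function name, the parameter values and the choice of this special case are ours, not the paper's.

```python
import numpy as np

# Zero rates from the affine bond price formula
#   P(t, t + tau) = exp(A(tau) - B(tau) r_t),
# illustrated with the one-factor Vasicek model
#   dr_t = alpha (beta - r_t) dt + sigma dW_t,
# whose Riccati solutions are explicit. This special case and the
# parameter values below are assumptions chosen for illustration.

def vasicek_zero_rate(r, tau, alpha, beta, sigma):
    B = (1.0 - np.exp(-alpha * tau)) / alpha
    A = ((beta - sigma ** 2 / (2 * alpha ** 2)) * (B - tau)
         - sigma ** 2 * B ** 2 / (4 * alpha))
    # yield R(tau) = -log P(t, t + tau) / tau = (B(tau) r - A(tau)) / tau
    return (B * r - A) / tau

taus = np.array([0.5, 1.0, 2.0, 5.0, 10.0])   # times to maturity, in years
rates = vasicek_zero_rate(0.02, taus, alpha=0.5, beta=0.04, sigma=0.01)
```

For a short rate of 2%, below the long-run level, the resulting curve increases monotonically toward the asymptotic yield beta - sigma²/(2 alpha²), and for very short maturities the zero rate approaches the current short rate.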

Kalman particle filter for online parameter estimation
In this section, we introduce the Kalman particle filter for online parameter estimation. It is a semi-recursive algorithm that combines the Kalman filter and the particle filter. In this new approach, we consider a two-layer method as in the RNPF algorithm. In the outer layer, we sample the particles of the model parameters using a Markovian Gaussian kernel which is updated at each time step. In the inner layer, the distribution of the state process and the marginal likelihood p(y_k | y_{1:k-1}, θ), which is used to re-weight the parameter particles in the outer layer, are estimated given the sampled parameter particles.
There are two main differences between our proposed Kalman particle filter algorithm and the RNPF algorithm. The first difference is that in the outer layer we use dynamic jittering functions, i.e. the jittering functions can change over time. Specifically, in this paper we specify two jittering functions to sample the model parameters, see (4.4) and (4.7) as described in Subsection 4.1 below. The second difference is that we use the Kalman filter, instead of the particle filter, to update the underlying states in the inner layer. Note that in case the state space model does not follow linear Gaussian dynamics, the literature offers different alternatives, see [2] for a Monte Carlo approach, or the Gaussian mixture, see [34], or Kalman filter extensions such as the extended Kalman filter, see [14,36], and the unscented Kalman filter, see [37]. In these latter methodologies, the idea is to consider an approximation of the state variables which is linear and Gaussian, and then run a Kalman filter on the approximation. When the model is not Gaussian, such an approximation introduces a bias. In Section 5, we will carry out a convergence analysis of our algorithm and we will prove that the bias induced by the Gaussian approximation of the model (3.4), (3.5) indeed vanishes when the time step tends to zero.
The use of the two jittering functions in the outer layer and of the Kalman filter in the inner layer allows us to obtain an algorithm that has a faster convergence speed and a lower computational complexity than the RNPF algorithm. This will be further illustrated by the examples in Section 6.

Static model estimation
As described in Section 2.5, in order to sequentially estimate the posterior density p(θ | y_{1:k}), k = 1, ..., K, we face two issues: how to sample particles from the former posterior distribution p(θ | y_{1:k-1}) and how to compute the conditional likelihood p(y_k | y_{1:k-1}, θ). First, we consider the sampling problem and introduce the first jittering kernel that we use to update the parameter particles.

Gaussian kernel with changing covariance
Recall that we use the generic notation N(x; µ, Σ) to denote the density at x of the normal distribution with mean vector µ and covariance matrix Σ. We choose a Gaussian kernel such that the conditional density of θ_k is given by (4.1), with µ(θ_{k−1}) and Σ(θ_{k−1}) being respectively the conditional mean and covariance of θ_k given θ_{k−1}. Then, if the parameters θ_k are jittered from this Gaussian kernel, one can easily derive the resulting mean and variance. Ideally, the jittering should not introduce bias or information loss (an artificial increase in the variance), see [27], which means that E(θ_k) = E(θ_{k−1}) and Var(θ_k) = Var(θ_{k−1}). The latter, together with Equations (4.1), imply (4.2). To achieve that, the Liu & West filter [27] applies a shrinkage to the kernel. We will apply the same technique, although the jittering function is used differently in our case. If one assumes a deterministic jittering covariance, i.e. Σ(θ_{k−1}) = Σ_{k−1}, and a linear mean function (4.3), for some a ∈ (0, 1), then the jittering kernel satisfying (4.2) is given by (4.4), where Σ(θ_{k−1}) = (1 − a²)Var(θ_{k−1}). The kernel in (4.4) is the same jittering kernel as used in the Liu & West filter [27]. We will refer to the number a as the discount factor. Before we present our methodology for jittering in detail, we introduce the following assumption, which we need in our recursive algorithm later. We will use this assumption in Section 5 to prove the convergence of our proposed algorithm.
for some positive constant e_{1,k} and bounded function f : D_θ → R, and, taking the supremum over the parameter space, for p ≥ 1 and some positive constant e_{2,k}.
Let f : D_θ → R be a bounded Lipschitz function. In Proposition 1 of Appendix C in [8] (set n = 1 there), it is shown that if the jittering kernels satisfy, for any p ≥ 1, a bound with some positive constant c independent of n, then Assumption 4.1 holds.
The jittering kernels of type (4.4) have an appealing property: the covariance can change over time. This aspect helps us to design an algorithm with the following attractive feature. Initially, since we lack information on the unknown parameters, a larger covariance can lead to a faster convergence of the parameters to the high likelihood area. Over time, the filter refines the estimate of the fixed parameters until at some point a very small variance is reached, which makes the parameter estimation more accurate. However, a direct application of this kernel does not yield a recursive method, since it generally does not satisfy Assumption 4.1. Hence, it is unclear whether the algorithm converges. To tackle this issue, we introduce a second Gaussian jittering kernel which does satisfy Assumption 4.1; hence we obtain a recursive method once we switch to this second kernel. The details are described in Section 4.1.2.
Remark 4.2. Although in this paper we have specified the use of Gaussian kernels with a certain mean and variance, the dynamic kernel setup is very generic. In an implementation, one can freely choose another kernel that fits one's purpose. For example, one can define the variance of the jittering kernel as a monotonically decreasing function of time, so that the convergence speed can be controlled manually.
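To make the shrinkage construction concrete, here is a minimal sketch of jittering kernel (4.4) in Python. The function name and the array layout (one particle per row) are our own illustrative choices, not notation from the paper.

```python
import numpy as np

def jitter_liu_west(theta, a=0.98, rng=None):
    """Liu & West shrinkage jittering (kernel (4.4), sketch).

    theta: (N, d) array of parameter particles.
    a: discount factor in (0, 1).
    Returns jittered particles whose mean and covariance match those
    of the input in expectation: the shrinkage towards the sample mean
    exactly compensates the added noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta_bar = theta.mean(axis=0)
    # shrink each particle towards the sample mean: mu = a*theta + (1-a)*theta_bar
    mu = a * theta + (1.0 - a) * theta_bar
    # add Gaussian noise with covariance (1 - a^2) Var(theta)
    cov = np.atleast_2d((1.0 - a ** 2) * np.cov(theta, rowvar=False))
    noise = rng.multivariate_normal(np.zeros(theta.shape[1]), cov, size=theta.shape[0])
    return mu + noise
```

Note that the jittering covariance (1 − a²)Var(θ_{k−1}) shrinks automatically as the particle cloud concentrates, which is exactly the dynamic-covariance behavior discussed above.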

Description of the Kalman particle filter methodology
Here we present our methodology for Gaussian linear models. When the model is not Gaussian and linear, we approximate it by a Gaussian linear model and then apply the same methodology as described below to the approximation.
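For a Gaussian linear (approximation of the) model, the inner layer reduces to standard Kalman predict/update steps, with the Gaussian innovation density supplying the marginal likelihood p(y_k | y_{1:k−1}, θ) that the outer layer uses to weight the parameter particles. The sketch below uses generic system matrices F, Q, H, R as placeholders for whatever the sampled parameter θ induces; it is an illustration, not the paper's exact implementation.

```python
import numpy as np

def kalman_step(B, P, y, F, Q, H, R):
    """One predict/update step of the inner-layer Kalman filter (sketch).

    State:  x_k = F x_{k-1} + w,  w ~ N(0, Q)
    Obs:    y_k = H x_k + v,      v ~ N(0, R)
    B, P: posterior mean/covariance at time k-1.
    Returns the updated (B, P) and the marginal likelihood
    p(y_k | y_{1:k-1}), which re-weights the parameter particle.
    """
    # predict
    B_pred = F @ B
    P_pred = F @ P @ F.T + Q
    # innovation and gain
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    resid = y - H @ B_pred
    # update
    B_new = B_pred + K @ resid
    P_new = (np.eye(len(B)) - K @ H) @ P_pred
    # Gaussian marginal likelihood of the new observation
    d = len(y)
    lik = np.exp(-0.5 * resid @ np.linalg.solve(S, resid)) \
        / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
    return B_new, P_new, lik
```

In the algorithm, one such step is run per parameter particle, and the returned likelihoods are normalized into the particle weights.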
The non-recursive step. Assume that at time k − 1 one can generate a random grid of samples in the parameter space, say {θ^(i)_{k−1}, i = 1, …, N}.
• Jittering step 1. Here we apply the kernel (4.4), referred to as jittering kernel 1, to obtain new samples {θ̃^(i)_k}.
• Update. In order to compute the posterior measure at time k, we need the posterior measure Γ_{k−1} at time k − 1, see Formula (2.12); note that we make the dependence on θ̃^(i)_k clear in the notation. However, these latter quantities are not available, since the parameter has evolved from θ_{k−1} at time k − 1 to θ̃_k at time k, see also the discussion in Section 2.5. Hence at this step the algorithm does not run recursively, and at every time k where a new parameter particle is sampled, the inner filter re-runs from time t_0 to t_k (step 1.a.ii in Algorithm 4.3). Moreover, the marginal likelihood p(y_k | y_{1:k−1}, θ̃^(i)_k) is computed in the inner filter, using Equation (2.13). This latter quantity is used to re-weight the parameter particles, see step 1.c in Algorithm 4.3 below.
• Resampling. We use a resampling technique to obtain {θ^(i)_k, i = 1, …, N}.
The recursive step. Once at some time point t_l the variance Σ(θ_l) = (1 − a²)Var(θ_l) is smaller than a certain level V_N ensuring that Assumption 4.1 holds (we also set a floor 0 < V_f < V_N on the jittering variance to prevent the algorithm from getting stuck), then
• Jittering step 2. We apply the jittering kernel 2, given by (4.7), for k ≥ l + 1, see step 1.b.i of Algorithm 4.3.
• Update. From time t_{l+1} on, we have a recursive algorithm based on the idea of approximating the posterior measure Γ_k (step 1.b.ii in Algorithm 4.3). Note that here we use the Kalman filter. Note also that, as in Subsection 2.5.1, we assume here that Γ^θ_k is continuous w.r.t. θ ∈ D_θ. Moreover, the marginal likelihood p(y_k | y_{1:k−1}, θ̃^(i)_k) is approximated by the inner filter using Equation (2.13), step 1.c in Algorithm 4.3. This latter quantity is used to re-weight the parameter particles.
• Resampling. Apply a resampling technique to obtain {θ^(i)_k, i = 1, …, N}.
We now introduce the Kalman particle algorithm. Recall µ(θ_{k−1}) and Σ(θ_{k−1}) from (4.4) and the discount factor a from (4.3). Recursion: 1. Filtering: given {θ^(i)_{k−1}}, ii. based on the parameter θ̃^(i)_k, use the Kalman filter to compute the mean and covariance of the posterior distribution from time 1 to k and hence obtain Γ̃^(i)_k; i. sample new parameters from the kernel (4.7), i.e.
ii. based on the parameters θ̃^(i)_k, B_{k−1} and P_{k−1}, compute the marginal likelihood using Equation (2.13); consequently, using (2.18), obtain an approximation of the normalized weights given by w̃^(i), which gives an update to be used in the next step. 2. Resampling.
Note that Algorithm 4.3 could be applied to general state space models (2.1). For our convergence analysis and in the financial applications, we will focus on the special type (3.4), (3.5) of affine state space models. When Σ = 0, the transition measure π^θ_k(dx | x_{k−1}) of (3.4) is not Gaussian, and we need to approximate it by a Gaussian transition in order to apply the Kalman filter. The approximation is obtained by replacing x_t in the diffusion coefficient by x_{k−1} on each time interval, see (4.8). Note that, given x_{k−1}, the variable x̃_k admits a Gaussian transition for k ∈ N_+. Hence, given the model parameters θ (i.e. A, β, Σ_1, Σ_2), we can compute the approximated transition measure π̃^θ_k = p̃(dx_k | x_{k−1}, θ). Recall from Equations (2.17), (2.8) and (2.10) that the weights for the parameter space are computed by (4.9). In the recursive step of Algorithm 4.3, the measure Γ_k is needed in the jittering step at time k; in order to have a recursive algorithm, the estimate Γ̃^(i) is used instead. Hence, when the model is linear and Gaussian, we obtain the following estimation of the weights for the parameter space, (4.10). When the model is nonlinear or non-Gaussian, we consider the approximation (4.8) to (3.4).
In this case, the transition probability π is approximated by a Gaussian transition probability π̃^{θ(i)_k}_k. Therefore, in the non-recursive step, the estimation of the weights for the parameter space is given by (4.11). In the recursive step, the measure Γ is replaced by its estimate, and hence we obtain the estimation (4.12) of the weights for the parameter space, together with the estimation of the posterior measure of x_k, for A ∈ B(R^d) a Borel set. We will make use of the weights (4.10), (4.11) and (4.12) in our convergence analysis later in Section 5.
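The resampling steps above can use any standard scheme; the paper does not prescribe a particular one. As an illustration, a systematic resampler (one common low-variance choice) can be sketched as follows.

```python
import numpy as np

def resample_systematic(particles, weights, rng=None):
    """Systematic resampling of parameter particles (sketch).

    particles: array of N particles (first axis indexes particles).
    weights: normalized weights summing to 1.
    Returns N particles drawn so that particle i appears with
    expected multiplicity N * weights[i].
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(weights)
    # one uniform offset, then a regular grid of N points in [0, 1)
    positions = (rng.random() + np.arange(N)) / N
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0  # guard against floating-point round-off
    idx = np.searchsorted(cumsum, positions)
    return particles[idx]
```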

Kalman Particle filter for models with piece-wise constant parameters
So far in this paper, the model parameters are assumed to be fixed over time. But in many applications it is more realistic to assume that at least some parameters are time-varying, for instance piecewise constant. Offline estimation methods such as MLE are able to deal with this situation only if the change point of the parameter is known beforehand, which does not hold in most cases. For particle filter methods that treat the model parameters as static, the variance of the parameter samples decreases as more data are observed. Hence the marginal distribution of the model parameters becomes increasingly concentrated around certain values. The consequence is that the particle filter algorithm is not able to capture abrupt changes of the parameters. We extend our proposed algorithm for static parameters to adapt to abrupt parameter changes. To achieve that, we first identify the change points. This step is done by comparing the marginal likelihood between two consecutive steps. Suppose the parameter samples have already converged to the actual value. If at some point the actual parameter value jumps to another value, then the marginal likelihood based on the existing parameter samples is far from optimal. Hence the marginal likelihood at this time point should be significantly smaller than that at the previous time point. On the other hand, if the actual parameter value does not change at this time point, then the marginal likelihood should be very close to the previous value. So we set a threshold b < 1, and if at some time point k the maximum marginal likelihood over the parameter particles drops below b times its value at time k − 1, then we consider the time point k to be a change point of the parameters. Of course, this condition is not sufficient, but it is necessary.
Another issue is that the parameter samples may have already (nearly) converged before the change point, so that the variance of these samples is too small to capture the change. This problem can be tackled by adding new samples from the initial parameter distribution to increase the sample variance. But since the variance is increased, the jittering kernel (4.7) no longer satisfies Assumption 4.1, and hence the jittering kernel should switch back to (4.4). Moreover, since the model parameters change, the posterior distributions from the previous time point are no longer valid, and new initial values for the mean and the variance of the posterior distribution should be set. So once the change point is determined, one can treat the calibration of the model as a new calibration based on the data after this change point. We now introduce the Kalman particle algorithm extended to time-varying parameters. We present the algorithm in full detail, noting that the differences with the previous Algorithm 4.3 are in the two jittering cases in the filtering step. 1. Filtering: given {θ^(i)_{k−1}}, ii. based on the parameter θ̃^(i)_k, use the Kalman filter to compute the mean and the covariance of the posterior distribution from time 1 to k; iii. compute the likelihood p(y_k | y_{1:k−1}, θ̃^(i)_k) and consequently obtain the normalized weights; i. sample new parameters from the kernel (4.7), i.e.
ii. based on the parameters θ̃^(i)_k, B^(i)_{k−1} and P^(i)_{k−1}, use the Kalman filter to compute the mean B̃^(i)_k and the covariance P̃^(i)_k of the approximated posterior distribution at time k;
iii. compute an approximation p̃(y_k | y_{1:k−1}, θ̃^(i)_k); if the change-point condition is met, go to the Initialization step of the algorithm and initialize the algorithm using the data after time k; otherwise compute the normalized weights.
2. Resampling: for each i = 1, …, N, resample according to the normalized weights.
As in this setting the parameters are time-varying, there is of course no point in providing a convergence analysis.But in Section 6 we will present a numerical study where Algorithm 4.4 is seen to be capable of tracking a sudden parameter change.
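As a sketch of the change-point logic of Algorithm 4.4: compare the maximum marginal likelihood over the particles with b times its previous value and, on a detected change, refresh part of the particle cloud from the initial parameter distribution to restore sample variance (after which the jittering kernel switches back to (4.4)). The fraction of refreshed particles and the function names below are illustrative choices, not specified in the paper.

```python
import numpy as np

def check_and_reset(theta, prior_sampler, max_lik_prev, max_lik_curr,
                    b=0.1, frac_new=0.5, rng=None):
    """Change-point handling (sketch of Algorithm 4.4's test).

    Flags time k as a change point when the maximum marginal
    likelihood drops below b times its previous value. On a change,
    replaces a fraction `frac_new` of the particles with fresh draws
    from the prior, so the sample variance grows again. `frac_new`
    is an illustrative parameter, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    changed = max_lik_curr < b * max_lik_prev
    if changed:
        n_new = int(frac_new * len(theta))
        replace = rng.choice(len(theta), size=n_new, replace=False)
        theta = theta.copy()
        theta[replace] = prior_sampler(n_new)  # fresh prior draws
    return theta, changed
```

On a detected change, the caller would also re-initialize the inner-layer posterior mean and covariance, as prescribed in the Initialization step of Algorithm 4.4.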

Convergence analysis
In the Kalman particle Algorithm 4.3 introduced in Section 4.1.2, the (conditional) measure µ^N_{k−1} is propagated through three steps. In the jittering step, as described in Algorithm 4.3, new samples θ̃^(i)_k are sampled from the kernel function N(θ; µ(θ^(i)_{k−1}), Σ(θ^(i)_{k−1})). In this step, no extra information is used to refine the estimates of the parameters. Hence the aim is to prove that the measure μ̃^N_k converges to the measure µ^N_{k−1} in some sense when δ goes to zero and the number of samples N goes to infinity. In the update step, as described in Algorithm 4.3 of Section 4.1.2, there are four cases to analyze: combinations of Gaussian linear or non-Gaussian/nonlinear models and the recursive or non-recursive parts of the algorithm. When the model is linear and Gaussian and we consider the non-recursive step of the algorithm, the normalized weights can be computed without further approximation. For the non-Gaussian and nonlinear model, we consider the approximation (4.8) to (3.4); the normalized weights are then estimated by (4.11) and (4.12) in the non-recursive step and the recursive step, respectively. Since the convergence of the weights in this latter case implies the convergence of the weights in the first three cases, we will only consider the model (4.8) and the recursive step of the algorithm. In this case, we obtain a new estimated measure μ̂^N_k, and the convergence analysis of μ̂^N_k depends on that of the estimated weights ŵ^{θ(i)_k}_k. The aim is to prove that μ̂^N_k converges to µ_k in some sense when δ goes to 0 and N goes to infinity. In the resampling step, the new particles {θ^(i)_k} are obtained by resampling, and we need to prove that the resulting measure converges to µ_k when δ goes to zero and N goes to infinity.
For our convergence analysis we will need, besides Assumption 4.1, the following assumptions.
for some constant e_{3,k}.
Assumption 5.2. Let (x_k)_{k∈N_+} be as in (3.4). There exists a constant M > 0 such that sup_k E|x_k| ≤ M. Assumption 5.3. For any fixed observation sequence y_{1:k}, the likelihood satisfies, among other conditions, 3. inf_{θ∈D_θ} l^θ_{y_t} > 0. Inequality (4.5) in Assumption 4.1 is used for the convergence analysis of the jittering step; Inequality (4.6) in Assumption 4.1, together with Assumptions 5.1 and 5.2, is used for the analysis of the update step; and Assumption 5.3 is used for the analysis of both the update and the resampling steps. Given the convergence of µ^N_{k−1} to µ_{k−1} in an L^p sense when δ goes to zero and N goes to infinity, we present the convergence of the measures μ̃^N_k, μ̂^N_k and µ^N_k respectively in the three Lemmas 5.4, 5.7 and 5.8 in the subsections below. In Section 5.4, we study the convergence of Algorithm 4.3 presented in Subsection 4.1.2 by induction, using these three lemmas. The main result there is Theorem 5.9. Our analysis is inspired by the results in [8] for the nested particle filter and follows a similar pattern. Note, however, the crucial differences between our Algorithm 4.3 and their nested particle filter: we have to deal with only one layer with a particle filter (instead of two such layers), as we use the Kalman filter in the other layer, but we also pay attention to the time discretization of the continuous processes.

Jittering
In the jittering step 1.b.i, for low values of Σ(θ_{k−1}), the new parameter particles θ̃^(i)_k are sampled from a kernel function κ. The following lemma shows that the error due to the jittering step vanishes. This lemma can be seen as the analogue of Lemma 3 in [8], but with the term 1/√M there (which plays no role in our analysis) replaced by √δ, as we treat the influence of time discretization. It can be proven in the same way, and we present it here for the sake of completeness and in a form that suits our purposes.
for some constants c_{1,k−1} and d_{1,k−1} which are independent of N and δ, then there exist constants ĉ_{1,k} and d̂_{1,k} which are independent of N and δ, such that (5.7) holds.

Update
To prove the convergence in the update step 1.c, we first need to show that the error introduced by the approximation of the weights can be bounded by a quantity as in (5.7). The following lemma is a core result in our convergence analysis; in its proof we exploit the affine nature of the state process. Lemma 5.5. Let the observation sequence y_{1:k} be fixed. Suppose the function f is bounded and continuous and Assumptions 4.1, 5.1 and 5.2 hold. If (5.8) holds for some constants c_{2,k−1} and d_{2,k−1} which are independent of N and δ, then there exist constants ĉ_{2,k} and d̂_{2,k}, independent of N and δ, such that the corresponding bound holds. To prove Lemma 5.5, we exploit the structure (3.4) of the state process. Besides, we need the following auxiliary result. Its proof follows the same lines as the proof of Lemma 4 in [8], but note again that the 1/√M term is replaced by √δ.
Lemma 5.6. Suppose the function f is bounded and continuous. Moreover, suppose Assumptions 4.1 and 5.1 and Inequality (5.8) hold. Then there exist constants ĉ_{2,k−1} and d̂_{2,k−1}, independent of N, δ and of θ, such that the stated bound holds. We are now ready to prove Lemma 5.5.
Proof. Using the triangle inequality, one obtains the bound (5.11). Note that (f, π̃^{θ(i)_k}_k) is bounded by ‖f‖_∞. Hence, using Inequality (5.10), we get the bound (5.12) for the first term on the right-hand side of (5.11). Recall (x_k)_{k∈N} and (x̃_k)_{k∈N} from (3.4) and (4.8), respectively. For δ, M_1 > 0, define the sets A_δ and B_{M_1}. The quantity (f, π^{θ(i)_k}_k) can be seen as the (conditional) expectation of f taken under the measure π^{θ(i)_k}_k; stated otherwise, we can see it as the expectation of f(x_k), and similarly for these expectations under π̃^{θ(i)_k}_k. With these interpretations, the second term on the right-hand side of (5.11) yields the decomposition (5.13). We need to find an upper bound for the three terms on the right-hand side of (5.13).
Consider the first term and define M_1 = M/√δ. Using Assumption 5.2 and the Markov inequality, we get (5.14). Substituting (5.14) into the first term on the right-hand side of (5.13), we obtain (5.15). Next we consider the second term. Note that the random variables x̃^{θ(i)} restricted to B_{M_1} take values in a compact set. Hence the continuous function f is also uniformly continuous on that set. Take ε = √δ; then there exists a δ_ε such that |f(x) − f(y)| ≤ √δ for all |x − y| < δ_ε. We obtain for the second term on the right-hand side of (5.13), using the definition of A_δ, the bound (5.16). For the last term on the right-hand side of (5.13), we apply the Markov inequality again. We denote the i-th component of an R^d-valued process (x_k)_{k∈N} by (x^i_k)_{k∈N}. Furthermore, we temporarily suppress the dependence on θ̃^(i)_k in the notation. Given x_{k−1}, according to Equations (3.4) and (4.8), we obtain an expression for the difference between x_k and x̃_k. Using the Burkholder-Davis-Gundy inequality, we know there exists a constant C which does not depend on (x_t)_{t≥0} such that the required moment bound holds.
Using Jensen's inequality and Fubini's theorem, we can bound the latter expectation by an integral over [t_{k−1}, t_k].
Note that e^x − 1 = O(x) as x → 0. Since the parameters are assumed to have a compact domain, we conclude that there exists a constant C_1, independent of the parameters, such that the bound (5.17) holds. Returning to the previously used notation and combining (5.15), (5.16) and (5.17) together with (5.12), we prove the statement of the lemma. If the hypothesis bounds hold for some constants c_{1,k−1}, d_{1,k−1}, c_{2,k−1} and d_{2,k−1} which are independent of N and δ, then there exist constants ĉ_{1,k}, d̂_{1,k}, ĉ_{2,k} and d̂_{2,k}, independent of N and δ, such that the corresponding bounds hold. Proof. First we prove the convergence of the estimated normalized weights ŵ^{θ(i)_k}_k, which is then used to prove the convergence of the measure μ̂^N_k. Denote the unnormalized weights by v̂^{θ(i)_k}_k, see (4.12), and v^{θ(i)_k}_k, see (4.9). Using Lemma 5.5, we have (5.20), where ĉ_{2,k} and d̂_{2,k} are constants independent of N, δ and θ. By Assumption 5.3, we obtain an infimum bound; hence for the normalized weights it follows that the normalizing sum is bounded from below by a constant times N.
Since (w_k, μ̃^N_{k−1}) and (w_k, µ_{k−1}) ≥ 0, using Assumption 5.3 and the triangle inequality, we get (5.21). Hence, we need to find upper bounds for the two quantities on the right-hand side of (5.22). For the first term on the right-hand side of (5.22), it follows from Lemma 5.4 that (5.23) holds. For the second term on the right-hand side of (5.22), we use (5.20). Inserting the latter together with (5.23) in (5.22), we obtain (5.24), where c_k and d_k are constants independent of N and δ. Letting f = 1 in (5.24) implies (5.25). Therefore, substituting (5.25) and (5.24) into the right-hand side of Inequality (5.21), we obtain constants c_k, d_k < ∞ independent of N and δ, and the statement for μ̂^N_k follows. To prove the statement for Γ̂^{θ(i)_k}_k, we compute, using (2.9) and (4.12), and noting that inf_{θ∈D_θ} l^θ_{y_k} > 0, then using Equation (5.20) and Lemma 5.5 (note that l^{θ̃(i)_k}_{y_k} is bounded by Assumption 5.3), we prove the statement of the lemma.

Resampling
In the following lemma we study the convergence of the measure µ^N_k.
Numerical results on affine term structure models

In this section we illustrate our results by considering some specific examples of the affine class (3.1) to which we apply our algorithms. In particular, we consider the one factor Cox-Ingersoll-Ross (CIR) model, the two factor Hull-White model, and the one factor Hull-White model with stochastic volatility. To test how the designed algorithm works on these models, the models are calibrated on simulated data using pre-determined parameters. We also compare the behavior of our algorithm to that of the recursive nested particle filter (RNPF) for the CIR model. The comparison shows that in this case our algorithm outperforms the RNPF.

One factor CIR model
First we consider the CIR model (3.6). One important property of the CIR model is that the yield curves are always non-negative. The transition density of the CIR model is a non-central chi-square distribution, with non-centrality parameter determined by the current state and scale c = σ²(1 − e^{−α(t_{k+1}−t_k)})/(4α). We define the short rate process to be r_t = x_t. The analytical solution of the functions φ and ψ defined in (3.2) can be obtained by solving the ODE (A.2), with d = 1, γ = −1 and c = 0.
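Since the CIR transition is a scaled non-central chi-square, it can be simulated exactly. The sketch below assumes the drift parametrization dx_t = (β − αx_t)dt + σ√x_t dW_t, under which the degrees of freedom equal 4β/σ²; the paper's exact form of (3.6) may differ in parametrization.

```python
import numpy as np

def cir_exact_step(x, alpha, beta, sigma, dt, rng=None):
    """Exact one-step simulation of the CIR transition (sketch),
        dx_t = (beta - alpha x_t) dt + sigma sqrt(x_t) dW_t,
    via the non-central chi-square transition:
        x_{t+dt} = c * chi'^2(d, lam),
        c   = sigma^2 (1 - e^{-alpha dt}) / (4 alpha),
        d   = 4 beta / sigma^2,
        lam = x_t e^{-alpha dt} / c.
    """
    rng = np.random.default_rng() if rng is None else rng
    c = sigma ** 2 * (1.0 - np.exp(-alpha * dt)) / (4.0 * alpha)
    d = 4.0 * beta / sigma ** 2
    lam = x * np.exp(-alpha * dt) / c
    return c * rng.noncentral_chisquare(d, lam)
```

Such an exact sampler is useful for generating the simulated yield-curve data in this section, independently of the Gaussian approximation used inside the filter.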
In the tests, the parameters are set to α = 0.45, β = 0.001 and σ = 0.017. Based on these parameter values, we generate daily data for yield curves with times to maturity ranging from 1 year to 30 years. The time length is T = 2000, i.e., the data set contains 2000 days. Furthermore, we add white noise with variance h = 1 × 10⁻⁸ to the simulated zero rates.
We use our Kalman particle filter algorithm (Algorithm 4.3, henceforth referred to as KPF) to calibrate the model parameters. The transition distribution of the CIR model is not Gaussian, but note that the CIR model fits the structure of (3.1) with Σ = 0. Hence we use the approximation (4.8) and apply the Kalman filter in the inner layer. In the outer layer, we set the number of particles to N = 5000 and the initial prior distribution of the parameters to be uniform, i.e., α ∼ U(0, 1), β ∼ U(0, 0.01), and σ ∼ U(0, 0.1).
Moreover, the sampled parameters at each step are bounded by the boundaries of the corresponding uniform distributions. In the inner layer, the mean and variance of the initial prior distribution of x_0 are 0.005 and 0.01. The discount factor a is set to 0.98 and the variance boundary V_N is set to 1/√N³. We will also use the same a and V_N in the other experiments later. Figure 6.1 shows that the estimated parameters converge over time. Figure 6.2 shows the convergence of the standard deviation of the jittering kernel for the three parameters. It is seen that after 437 steps the variance is smaller than the required level.
We also implemented the recursive nested particle filter (RNPF) for comparison. The sample sizes in the two layers are set to 1000 and 300, respectively. Moreover, we simulated 18000 more days of data, so that in total we obtain 20000 days of data. For the rest we use the same settings as in the previous example, i.e. the same initial sample distributions for the parameter generation, the same boundary on the parameter samples in the outer layer, and the variance of the jittering kernel also set equal to V_N. Figure 6.3 shows the behavior of the estimated parameters over time. One observes that even after 20000 time steps the RNPF estimates of α and σ do not reach the correct parameter values.

Two factor Hull-White model
In this subsection we consider the two factor Hull-White model (3.7). The short rate process is given by the sum of the two factors, r_t = x¹_t + x²_t. Similarly to the previous example, the yield curve data are simulated based on pre-determined parameters, and a white noise process is then added to the simulated data. The variance of the white noise is 6 × 10⁻⁷. The times to maturity of the yield curves range from 1 year up to 30 years. The time step of the data is daily and the time length is 2000.
Using these noisy simulated data, we apply our Kalman particle Algorithm 4.3 to calibrate the model parameters. The Hull-White model fits the structure of (3.1) with Σ = 0. Since the Hull-White model is a Gaussian model, the Kalman filter in the inner layer gives an optimal filter. The number of particles at each step is N = 2000. For the initialization of the parameter samples, the initial prior distribution of the parameters is chosen to be uniform, namely: α_1, α_2 ∼ U(0, 0.4), σ_1, σ_2 ∼ U(0, 0.1), and ρ ∼ U(−0.8, −0.3).
The initial prior distribution of the state x_k is chosen to be Gaussian with mean 0 and variance diag([0.1, 0.1]). Figure 6.4 shows how the estimated parameters converge over time. One can observe that the convergence is very fast and accurate.

Hull-White model with stochastic volatility
We consider the stochastic volatility model (3.8), written in a notation that is more suitable for the purposes of this section. The process V_t represents the fluctuation of the volatility of the system. Note that the long term mean of the process V_t is fixed at the known constant 0.1, since otherwise the model would be over-parametrized: by scaling the volatility process and the parameters σ_1, σ_2 one can obtain an equivalent model. The transition density of this stochastic volatility model is not analytically available. To tackle this issue, we apply the same approximation as in the CIR model test of Section 6.1, namely, we approximate the stochastic diffusion by a constant diffusion between the time steps, see (4.8). Under this approximation, the transition density of the model is Gaussian, and the mean and variance can be computed theoretically. The short rate is defined by r_t = X_t, and hence the yield curve can be computed by R_t(τ) = φ(τ) + ψ(τ)X̃_t, with X̃_t = [V_t, X_t]. The functions φ, ψ are the solutions to the Riccati equations (A.2), with d = 2, γ = (0, −1) and c = 0. The solutions to these latter equations are not known in closed form; we introduce an efficient numerical algorithm to compute these functions, detailed in Appendix A. The mean and variance of the initial prior distribution of X̃_t are [0.1, 0] and diag([0.01, 0.01]). Figure 6.5 shows that also for a model with stochastic volatility the parameter estimates quickly converge.
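To illustrate the frozen-diffusion idea behind (4.8) on a CIR-type factor: freezing the diffusion coefficient at x_{k−1} over each step makes the transition Gaussian, while the mean of the linear drift can be kept exact. The particular variance formula below is one natural variant under this assumption, not necessarily the paper's exact expression in (4.8).

```python
import numpy as np

def cir_gaussian_approx(x_prev, alpha, beta, sigma, dt):
    """Gaussian (frozen-diffusion) approximation of the CIR transition
        dx_t = (beta - alpha x_t) dt + sigma sqrt(x_t) dW_t.
    The diffusion sigma*sqrt(x) is frozen at x_{k-1}, so
    x_k | x_{k-1} ~ N(mean, var) with
        mean = x_{k-1} e^{-alpha dt} + (beta/alpha)(1 - e^{-alpha dt}),
        var  = sigma^2 x_{k-1} (1 - e^{-2 alpha dt}) / (2 alpha).
    The bias of such approximations vanishes as dt -> 0 (Section 5).
    """
    e = np.exp(-alpha * dt)
    mean = x_prev * e + (beta / alpha) * (1.0 - e)
    var = sigma ** 2 * np.maximum(x_prev, 0.0) \
        * (1.0 - np.exp(-2.0 * alpha * dt)) / (2.0 * alpha)
    return mean, var
```

With this Gaussian transition in hand, the inner-layer Kalman filter can be applied exactly as in the Gaussian linear case.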

One factor CIR model with jump parameters
This experiment can be seen as an extension of the CIR calibration experiment of Section 6.1. We assume the parameters have a jump at time T = 2001, from [α, β, σ] = [0.45, 0.001, 0.017] to [α, β, σ] = [0.55, 0.0015, 0.023]. We simulate new data from time point T = 2001 to T = 4000 based on the new parameters; the settings for the other parameters are the same as in Section 6.1. To identify the parameter change, we set b = 0.1. In this experiment we use Algorithm 4.4. Figure 6.6 shows that in this study the KPF algorithm for models with time-varying parameters is able to track a sudden change in the parameter values and quickly stabilizes at the new values.

Conclusion
In this paper we have introduced a semi-recursive algorithm combining the Kalman filter and the particle filter in a two-layer structure. In the outer layer, a dynamic Gaussian kernel is implemented to sample the parameter particles. Moreover, the Kalman filter is applied in the inner layer to estimate the posterior distribution of the state variables given the parameters sampled in the outer layer. These two changes provide faster convergence and reduce the computational time. The theoretical result is complemented by numerical results for several affine term structure models with static parameters or jump parameters. Although our numerical illustrations are for term structure models, the Kalman particle algorithm can also be applied to many other models.
• Aβ ∈ R^p_+ × R^q,
• A_{IJ} = 0,
• A_{II} has nonpositive off-diagonal elements.
These latter conditions on the parameters ensure that the process of (3.1) remains in the state space R^p_+ × R^q. The aim in the following theorem is to compute the zero coupon bond price P(t, T) introduced in (3.2). For a proof, we refer to Theorem 3.1 in [25].
In any of the above cases, it holds for all 0 ≤ t ≤ T ≤ τ and for all x ∈ R^p_+ × R^q that E[e^{−∫_t^T r(s) ds} | F_t] = e^{−φ(T−t,0) − ψ(T−t,0)x(t)}.
We use a Taylor series to approximate the solution (φ, ψ). In order to do so, we first need to determine the coefficients in the Taylor expansion.
In practice, we can set the approximation error level to ε. If at each step we choose ∆ = (ε/A_N(u))^{1/N}, then the last term in the Taylor expansion is A_N(u)∆^N = ε. Hence we can control the approximation error at the level ε.
Remark A.3. The values of the functions φ(t, u) and ψ(t, u) might go to infinity for some values of t and u. In these situations, the Taylor expansion approximation does not work. However, in financial applications we assume these cases do not occur, since in finance we always assume that the moments of the underlying process exist.
time k are generated by some Markov kernels denoted by κ(θ | θ^(i)_{k−1}) (step 1.a in Algorithm 2.1 below). This step is the outer Monte Carlo layer.

Algorithm 4.3 (Kalman particle filter for static parameter model). Initialization:
1. set the number of particles N, a value for the discount factor a, a switching variance level V_N and a floored variance level V_f,
2. assume an initial distribution p(θ_0) for the parameters,
3. sample from the initial distribution to get N particles {θ^(i)_0, i = 1, …, N} for the parameters,
4. for each particle θ^(i)_0, assign the same initial mean B_0 and covariance of the posterior distribution.

Algorithm 4.4 (Kalman particle filter for models with time-varying parameters). Initialization:
1. set the number of particles N, a value for the discount factor a, a switching variance level V_N, a floored variance level V_f, and the threshold parameter b,
2. assume an initial distribution p(θ_0) for the parameters,
3. sample from the initial distribution to get N particles {θ^(i)_0, i = 1, …, N} for the parameters,
4. for each particle θ^(i)_0, assign the same initial mean and covariance value B_0 of the posterior distribution.

(5.1) is estimated for k = 1, …, K. From now on, for convenience, we assume the algorithm starts at time 1. At each time, the algorithm has three main steps: jittering, update and resampling. Define the maximum step size δ = sup_{k=1,…,K}(t_k − t_{k−1}). Let the parameter θ ∈ D_θ. At time k, suppose the estimated measure µ^N_{k−1} from the previous time is available. When we consider the recursive step of the algorithm for a Gaussian linear model, the normalized weights are given by (4.10).

Lemma 5.4. Let f be a bounded function and suppose Assumption 4.1 holds. If

Σ(θ_{k−1}) can be controlled in an appropriate manner, guaranteeing Σ(θ_{k−1}) below a threshold value V_N. This allows us to run the outer layer in Algorithm 4.3 recursively, see step 1.b. Adding Assumption 5.3, we present in the following lemma the convergence of μ̂^N_k.
Lemma 5.7. Let the observation sequence y_{1:k} be fixed and let Assumptions 4.1, 5.1, 5.2 and 5.3 hold. Then, for any bounded and continuous function f, if

Figure 6.1: Convergence of parameter estimates by KPF for the CIR model; the red lines represent the true values of the parameters.

Figure 6.2: Convergence of the standard deviation of the jittering kernel; the orange line represents the square root of the switching level.

Figure 6.4: Convergence of parameter estimates by KPF for the Hull-White model.

Figure 6.5: Convergence of parameter estimates by KPF for the Hull-White model with stochastic volatility.

Figure 6.6: Parameter estimates by KPF for the CIR model with a sudden jump.