Ensemble Kalman filter based sequential Monte Carlo sampler for sequential Bayesian inference

Many real-world problems require one to estimate parameters of interest, in a Bayesian framework, from data that are collected sequentially in time. Conventional methods for sampling from posterior distributions, such as Markov chain Monte Carlo, cannot efficiently address such problems as they do not take advantage of the data's sequential structure. To this end, sequential methods, which seek to update the posterior distribution whenever a new collection of data becomes available, are often used to solve these types of problems. Two popular choices of sequential method are the ensemble Kalman filter (EnKF) and the sequential Monte Carlo sampler (SMCS). While EnKF only computes a Gaussian approximation of the posterior distribution, SMCS can draw samples directly from the posterior. Its performance, however, depends critically upon the kernels that are used. In this work, we present a method that constructs the kernels of SMCS using an EnKF formulation, and we demonstrate the performance of the method with numerical examples.


Introduction
Bayesian inference is a popular method for estimating unknown parameters from data, largely due to its ability to quantify uncertainty in the estimation results [16].
In the current work we consider a special class of Bayesian inference problems where data have to be collected in a sequential manner. A typical example of this type of problem is the estimation of parameters, such as the initial states or the equation coefficients, in a dynamical system from observations related to the state vector at discrete times. Such problems arise in many real-world applications, ranging from weather prediction [1] to biochemical networks [20]. It should be emphasized that, unlike many data assimilation problems that seek to estimate the time-dependent states of dynamical systems, the parameters that we want to estimate here are assumed not to vary in time. To distinguish the two types of problems, we refer to the former as state estimation problems and the latter as parameter estimation. We should also note that in this work we focus on methods which use samples to represent the posterior distribution; approximation-based methods, such as variational Bayes [6] and expectation propagation [30], will not be discussed here. Conventional sampling methods, such as Markov chain Monte Carlo (MCMC) simulations [19], use all the data in a single batch and are thus unable to take advantage of the sequential structure of the problems. On the other hand, sequential methods utilize the sequential structure of the problem and update the posterior whenever a new collection of data becomes available, which makes them particularly convenient and efficient for sequential inference problems.
A popular sequential method for parameter estimation is the ensemble Kalman filtering (EnKF) algorithm, which was initially developed to address dynamical state estimation problems [13]. The EnKF method was extended to estimate parameters in many practical problems, e.g., [1,2], and more recently it was generically formulated as a derivative-free-optimization based parameter estimation method in [26]. The EnKF method for parameter estimation was further developed and analyzed in [4,25,33], among others. The basic idea of the EnKF method for parameter estimation is to construct an artificial dynamical system, turning the parameters of interest into the states of the constructed dynamical system, before applying the standard EnKF procedure to estimate the states of that system. A major limitation of the EnKF method is that, just like the original version for dynamical state estimation, it can only compute a Gaussian approximation of the posterior distribution, and thus methods that sample the posterior distribution directly are certainly desirable. To this end, the sequential Monte Carlo sampler (SMCS) method [11] can draw samples directly from the posterior distribution. The SMCS algorithm is a generalisation of the particle filter [5,12] for dynamic state estimation, generating weighted samples from the posterior distribution. Since the SMCS algorithm was proposed in [11], considerable improvements and extensions of the method have been developed, such as [15,7,22,14]; more information on the developments of the SMCS methods can be found in the recent reviews [10,9]. We also note that there are other parameter estimation schemes based on particle filtering, e.g., [18,8]; the differences and connections between SMCS and these schemes are discussed in [11]. The SMCS method makes no assumption or approximation of the posterior distribution, and can directly draw (weighted) samples from any posterior. As will be discussed later, a key issue in the implementation of SMCS is to choose suitable forward and backward kernels, as the performance of SMCS depends critically on these choices. As has been shown in [11], the optimal forward and backward kernels exist in principle, but designing effective kernels for specific problems is nevertheless a highly challenging task. In dynamic state estimation problems, the EnKF approximation is often used as the proposal distribution in the particle filtering algorithm [31,36], especially for problems in which the posteriors are only modestly non-Gaussian. Building upon similar ideas, we propose in this work to construct the kernels in SMCS using an EnKF framework. Specifically, the forward kernel is obtained directly from an EnKF approximation, and the backward kernel is derived by making a Gaussian approximation of the optimal backward kernel.
The rest of this work is organized as follows. In Section 2 we present the generic setup of the sequential inference problems considered in this work. In Sections 3 and 4 we review, respectively, the SMCS and the EnKF methods for solving sequential inference problems. In Section 5 we present the proposed EnKF-SMCS method, and in Section 6 we provide several numerical examples to illustrate its performance. Finally, Section 7 offers some concluding remarks.

Problem setup
We consider a sequential inference problem formulated as follows. Suppose that we want to estimate the parameter x ∈ R^{n_x} from data y_1, ..., y_t, ..., y_T which become available sequentially in time. In particular, each datum y_t ∈ R^{n_y} is related to the parameter of interest x via the following model,

y_t = G_t(x) + η_t, (2.1)

where each G_t(·) is a mapping from R^{n_x} to R^{n_y}, and the observation noise η_t ∼ N(0, R_t). It follows that the likelihood function can be written as

π(y_t | x) ∝ exp( −(1/2) (y_t − G_t(x))^T R_t^{−1} (y_t − G_t(x)) ).

It is important to note that the restriction to an additive Gaussian error model, as in Eq. (2.1), is due to the use of EnKF. While noting that relaxing this restriction is possible, we emphasize that the additive Gaussian noise assumption is reasonable for a wide range of practical problems. We can now write the posterior distribution in a sequential form:

π_t(x) := π(x | y_{1:t}) ∝ π(y_t | x) π_{t−1}(x), (2.2)

where π_0(x) is the prior distribution of x, and our goal is to draw samples from π_t for any 0 < t ≤ T.
The posterior in Eq. (2.2) is essentially of a data tempering form and, as is pointed out in [15,37], such problems pose challenges for usual MCMC methods, especially when the amount of data is large, as they cannot conveniently exploit the sequential structure of the problem. In what follows, we first discuss two sequential methods for this type of problem, the EnKF and the SMCS algorithms, and we then propose a scheme to combine these two methods.
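The sequential update π_t(x) ∝ π(y_t | x) π_{t−1}(x) can be illustrated with a minimal sketch. All model choices below (a scalar parameter, the identity forward map G_t(x) = x, and a conjugate Gaussian prior) are illustrative assumptions, not the setup of the paper; they are chosen so that the sequential posterior can be checked against the batch posterior computed from all data at once:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of Eq. (2.2): G_t(x) = x, eta_t ~ N(0, r), prior N(m0, v0).
# With this linear-Gaussian model each sequential update is conjugate, so the
# posterior after T steps can be compared to the batch formula.
m0, v0, r = 0.0, 1.0, 0.5**2
x_true = 0.7
T = 20
y = x_true + np.sqrt(r) * rng.standard_normal(T)

m, v = m0, v0
for t in range(T):
    # One step of pi_t(x) ∝ pi(y_t | x) * pi_{t-1}(x)
    k = v / (v + r)            # scalar Kalman gain
    m = m + k * (y[t] - m)     # posterior mean
    v = (1.0 - k) * v          # posterior variance

# Batch posterior (all T data at once) for comparison
v_batch = 1.0 / (1.0 / v0 + T / r)
m_batch = v_batch * (m0 / v0 + y.sum() / r)
print(m, m_batch)
```

With a nonlinear G_t the update is no longer conjugate, which is precisely where the sampling methods discussed next come in.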

Sequential Monte Carlo Sampler
We first give a brief introduction to the SMCS method for sampling the posterior distribution π_t(x), following [11]. The key idea of SMCS is to construct a joint distribution π(x_1, ..., x_t) whose marginal is equal to the target distribution π_t(·). Note that π(x_1, ..., x_t) needs to be known only up to a normalization constant. One then applies the sequential importance sampling algorithm [5,12] to draw weighted samples from π(x_1, ..., x_t), which, after being marginalized over x_1, ..., x_{t−1}, yields samples from π_t(·).
Next we describe SMCS in a recursive formulation. Given an arbitrary conditional distribution L_{t−1}(x_{t−1} | x_t), we can construct a joint distribution of x_{t−1} and x_t in the form of

p_t(x_{t−1}, x_t) = π_t(x_t) L_{t−1}(x_{t−1} | x_t), (3.1)

such that the marginal distribution of p_t(x_{t−1}, x_t) over x_{t−1} is π_t(x_t). Now, given a marginal distribution q_{t−1}(x_{t−1}) and a conditional distribution K_t(x_t | x_{t−1}), we can construct an importance sampling (IS) distribution for p_t:

q_t(x_{t−1}, x_t) = q_{t−1}(x_{t−1}) K_t(x_t | x_{t−1}). (3.2)

A key requirement of the IS distribution q_t(x_{t−1}, x_t) is that we can directly draw samples from it. We let {x^m_{t−1:t}}_{m=1}^M be an ensemble drawn from q_t(x_{t−1}, x_t), and note that the weighted ensemble {(x^m_{t−1:t}, w^m_t)}_{m=1}^M follows the distribution p_t(x_{t−1:t}), where the weights are computed according to

w_t(x_{t−1:t}) = w_{t−1}(x_{t−1}) α_t(x_{t−1:t}), (3.3)

where the incremental weight is

α_t(x_{t−1:t}) = [π_t(x_t) L_{t−1}(x_{t−1} | x_t)] / [π_{t−1}(x_{t−1}) K_t(x_t | x_{t−1})].

As can be seen, once the two conditional distributions K_t and L_{t−1} (respectively referred to as the forward and backward kernels in the rest of the paper) are chosen, we can draw samples from Eq. (3.2) and compute the associated weights from Eq. (3.3), obtaining weighted samples from p_t(x_{t−1}, x_t) as well as its marginal π_t(x_t). SMCS essentially conducts this procedure in the following sequential manner:
1. Let t = 0, draw an ensemble {x^m_0}_{m=1}^M from q_0(x_0), and compute w^m_0 = π_0(x^m_0)/q_0(x^m_0) for m = 1, ..., M;
2. For t = 1, ..., T, draw x^m_t ∼ K_t(· | x^m_{t−1}) and update the weights via Eq. (3.3) for m = 1, ..., M.
Note that a resampling step is often used in the SMCS algorithm to alleviate the "sample degeneracy" issue [11]. The resampling techniques are well documented in the PF literature, e.g., [5,12], and so are not discussed here.
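A single SMCS step with the weight update of Eq. (3.3) can be sketched as follows. The targets and kernels here are toy choices for illustration only: unit-variance Gaussian targets, a symmetric random-walk forward kernel, and the same symmetric kernel reused as the backward kernel (a valid, if suboptimal, choice):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_normal_pdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Assumed toy targets: pi_{t-1} = N(0,1), pi_t = N(0.5,1).
# Forward kernel K_t: random walk N(x_{t-1}, tau2); backward kernel
# L_{t-1}(x'|x) = K_t(x'|x).
tau2 = 0.25
M = 100_000

x_prev = rng.standard_normal(M)            # ensemble from pi_{t-1}
w_prev = np.full(M, 1.0 / M)               # equal weights
x_new = x_prev + np.sqrt(tau2) * rng.standard_normal(M)  # draw from K_t

# Incremental weight of Eq. (3.3):
# pi_t(x_t) L_{t-1}(x_{t-1}|x_t) / (pi_{t-1}(x_{t-1}) K_t(x_t|x_{t-1}))
log_inc = (log_normal_pdf(x_new, 0.5, 1.0)
           + log_normal_pdf(x_prev, x_new, tau2)
           - log_normal_pdf(x_prev, 0.0, 1.0)
           - log_normal_pdf(x_new, x_prev, tau2))
w = w_prev * np.exp(log_inc - log_inc.max())
w /= w.sum()

post_mean = np.sum(w * x_new)              # should be close to 0.5
print(post_mean)
```

The weighted ensemble {(x_new, w)} targets π_t; its weighted mean recovers the target mean 0.5 up to Monte Carlo error.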
As can be seen from the discussion above, to use SMCS one must choose the two kernels. In principle, optimal choices of these kernels are available. For example, it is known that once K_t(x_t | x_{t−1}) is provided, the optimal choice of L_{t−1}(x_{t−1} | x_t) is [11]:

L^{opt}_{t−1}(x_{t−1} | x_t) = q_{t−1}(x_{t−1}) K_t(x_t | x_{t−1}) / q_t(x_t), with q_t(x_t) = ∫ q_{t−1}(x_{t−1}) K_t(x_t | x_{t−1}) dx_{t−1}, (3.4)

where the optimality is in the sense of yielding the minimal estimator variance. We also note that use of the optimal L-kernel allows the weights to be written as

w_t(x_{t−1:t}) = w_{t−1}(x_{t−1}) π_t(x_t) / q_t(x_t).

Moreover, we can see that if we can choose K_t such that q_t = π_t, then the weight function is always unity, which means that we sample directly from the target distribution (the ideal case). While obtaining such an ideal K_t is usually not possible in practice, it nevertheless provides a useful guideline regarding the choice of the forward kernel K_t: it should be chosen such that the resulting q_t is close to π_t. For example, it is proposed in [11] to use MCMC moves as the forward kernel. A main limitation of the MCMC kernel is that it typically requires a number of MCMC moves to propose a "good" particle, and since each MCMC move involves an evaluation of the underlying mathematical model G_t, the total computational cost can be high when G_t is computationally intensive.
In this work we consider an alternative to the MCMC kernels. Specifically, we propose to choose K_t of the form

K_t(x_t | x_{t−1}) = N(x_t | T_t(x_{t−1}), Σ^K_t),

i.e., a Gaussian distribution with mean T_t(x_{t−1}) and covariance Σ^K_t, where T_t(·) is an R^{n_x} → R^{n_x} transformation. We shall compute T_t and Σ^K_t (or equivalently the forward kernel K_t) using the EnKF method.

Ensemble Kalman Filter
In this section we give a brief overview of the EnKF parameter estimation method proposed in [26], which essentially aims to compute a Gaussian approximation of π_t(x_t) at each time step t. To formulate the problem in an EnKF framework, we first construct an artificial dynamical system denoted by F_t: at any time t, the state is u_t = [x_t, z_t]^T with z_t = G_t(x_t), and the dynamical model is

x_t = x_{t−1}, z_t = G_t(x_{t−1}). (4.1)

The data are associated to the states through y_t = z_t + η_t, or equivalently

y_t = H u_t + η_t, H = [0_{n_y×n_x}, I_{n_y×n_y}],

where I_{n_y×n_y} is the n_y × n_y identity matrix and 0_{n_y×n_x} is the n_y × n_x zero matrix.
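The construction above can be sketched in a few lines of code; the forward map G_t below is a hypothetical placeholder, not a model from the paper:

```python
import numpy as np

# Artificial dynamical system of Eq. (4.1): the parameter x is frozen while
# z_t = G_t(x) is appended to form the augmented state u_t = [x, z_t].
def G_t(x, t):
    # Placeholder forward model, R^2 -> R^1 (an illustrative assumption)
    return np.array([np.sin(t * x[0]) + x[1]])

def F_t(u_prev, t, nx):
    x = u_prev[:nx]            # parameters are not propagated: x_t = x_{t-1}
    z = G_t(x, t)              # model output recomputed at time t
    return np.concatenate([x, z])

u = np.array([0.5, 1.0, 0.0])  # [x_1, x_2, z]
u = F_t(u, t=1, nx=2)
print(u)
```

Running the state through F_t leaves the parameter block untouched and refreshes only the model-output block, which is what lets the standard EnKF state-update machinery act on a static parameter.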
We emphasize that once we have the posterior distribution π(u_t | y_{1:t}), we can obtain the posterior π_t(x_t) = π(x_t | y_{1:t}) by marginalizing π(u_t | y_{1:t}) over z_t. Now let us see how EnKF proceeds to compute a Gaussian approximation of the posterior distribution π(u_t | y_{1:t}). At time t, suppose that the prior π(u_t | y_{1:t−1}) can be approximated by a Gaussian distribution with mean μ̃_t and covariance C̃_t. It follows that the posterior distribution π(u_t | y_{1:t}) is also Gaussian, and its mean and covariance can be obtained analytically:

µ_t = μ̃_t + Q_t (y_t − H μ̃_t), C_t = (I − Q_t H) C̃_t, (4.2)

where I is the identity matrix and

Q_t = C̃_t H^T (H C̃_t H^T + R_t)^{−1} (4.3)

is the so-called Kalman gain matrix.
In the EnKF method, one avoids computing the mean and the covariance directly at each step. Instead, both the prior and the posterior distributions are represented by a set of samples. Suppose that at time t − 1 we have an ensemble of particles {u^m_{t−1}}_{m=1}^M drawn according to the posterior distribution π(u_{t−1} | y_{1:t−1}). We can propagate the particles via the dynamical model (4.1),

ũ^m_t = F_t(u^m_{t−1}) for m = 1, ..., M, (4.4)

obtaining an ensemble {ũ^m_t}_{m=1}^M following the prior π(u_t | y_{1:t−1}). We can compute a Gaussian approximation, N(u_t | μ̃_t, C̃_t), of π(u_t | y_{1:t−1}), where the mean and covariance are estimated from the samples:

μ̃_t = (1/M) Σ_{m=1}^M ũ^m_t, C̃_t = (1/(M−1)) Σ_{m=1}^M (ũ^m_t − μ̃_t)(ũ^m_t − μ̃_t)^T. (4.5)

Once μ̃_t and C̃_t are obtained, we can compute µ_t and C_t directly from Eq. (4.2), and by design the posterior distribution π(u_t | y_{1:t}) is approximated by N(µ_t, C_t). Moreover, it can be verified that the samples

u^m_t = ũ^m_t + Q_t (y_t + η^m_t − H ũ^m_t), η^m_t ∼ N(0, R_t), (4.6)

with Q_t computed by Eq. (4.3), form an approximate ensemble of π(u_t | y_{1:t}); consequently, the associated samples {x^m_t}_{m=1}^M approximately follow the distribution π_t(x_t) = π(x_t | y_{1:t}).
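The analysis step above can be sketched on a linear-Gaussian toy problem, where the exact posterior mean is available for comparison; the dimensions, observation operator, and numbers below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# EnKF analysis step (sample statistics, Kalman gain, perturbed-observation
# update) on a 2-D linear-Gaussian problem with exact answer for reference.
n, ny, M = 2, 1, 50_000
H = np.array([[1.0, 0.0]])            # observe the first component only
R = np.array([[0.5**2]])
y = np.array([1.0])

mu0 = np.zeros(n)
C0 = np.array([[1.0, 0.3], [0.3, 1.0]])
U = rng.multivariate_normal(mu0, C0, size=M)     # prior ensemble, shape (M, n)

mu_hat = U.mean(axis=0)                          # sample mean
C_hat = np.cov(U.T)                              # sample covariance

# Kalman gain and perturbed-observation update
K = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + R)
eta = rng.multivariate_normal(np.zeros(ny), R, size=M)
U_post = U + (y + eta - U @ H.T) @ K.T

# Exact posterior mean for comparison
K_exact = C0 @ H.T @ np.linalg.inv(H @ C0 @ H.T + R)
mu_exact = mu0 + K_exact @ (y - H @ mu0)
print(U_post.mean(axis=0), mu_exact)
```

The posterior ensemble mean matches the exact Kalman update up to Monte Carlo error, which is the property the EnKF parameter-estimation method relies on.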

EnKF-SMCS
Now we discuss how to use the EnKF scheme to construct the forward kernel K_t for SMCS. First recall that u_t = [x_t, z_t]^T, H = [0_{n_y×n_x}, I_{n_y×n_y}], and the propagation model is x_t = x_{t−1}; we can thus derive from Eq. (4.6) that

x_t = x_{t−1} + Q^x_t (y_t + η_t − G_t(x_{t−1})) + η′_t, η′_t ∼ N(0, δ Σ^q_{t−1}), (5.1)

where δ is a small constant, Σ^q_{t−1} is the covariance of q_{t−1} (the evaluation of Σ^q_{t−1} is provided in Eq. (5.5)), and Q^x_t is the submatrix of Q_t formed by taking the first n_x rows and the first n_y columns of Q_t. Eq. (5.1) can also be written as a conditional distribution:

K_t(x_t | x_{t−1}) = N(x_t | T_t(x_{t−1}), Σ^K_t), (5.2)

where

T_t(x_{t−1}) = x_{t−1} + Q^x_t (y_t − G_t(x_{t−1})), Σ^K_t = Q^x_t R_t (Q^x_t)^T + δ Σ^q_{t−1}.

Note that the purpose of introducing the small noise term η′_t in Eq. (5.1) is to ensure that Σ^K_t is strictly positive definite, so that K_t is a valid Gaussian conditional distribution. In all numerical implementations in this work, δ is set to 10^{−4}. According to the discussion in Section 4, if q_{t−1} is a good approximation to π_{t−1}, then

q_t(x_t) = ∫ q_{t−1}(x_{t−1}) K_t(x_t | x_{t−1}) dx_{t−1} (5.3)

is a good approximation to π_t. That is, Eq. (5.2) provides a good forward kernel for the SMC sampler. It should be noted that, since T_t is a nonlinear transform, in general we cannot derive an analytical expression for q_t, and as a result we cannot use the optimal backward kernel given in Eq. (3.4). Nonetheless, we can use a sub-optimal backward kernel:

L̂_{t−1}(x_{t−1} | x_t) = q̂_{t−1}(x_{t−1}) K̂_t(x_t | x_{t−1}) / q̂_t(x_t), (5.4)

where q̂_{t−1} is a Gaussian approximation of q_{t−1} and K̂_t is an approximation of K_t. Next we need to determine q̂_{t−1} and K̂_t. Here q̂_{t−1} can be estimated from the weighted ensemble:

q̂_{t−1}(x) = N(x | ξ_{t−1}, Σ^q_{t−1}), ξ_{t−1} = Σ_{m=1}^M w^m_{t−1} x^m_{t−1}, Σ^q_{t−1} = Σ_{m=1}^M w^m_{t−1} (x^m_{t−1} − ξ_{t−1})(x^m_{t−1} − ξ_{t−1})^T. (5.5)

Now recall that the issue with the optimal backward kernel L^{opt}_{t−1} is that the transform T_t inside the forward kernel K_t is nonlinear, and as a result q_t cannot be computed analytically. To obtain L̂_{t−1} in Eq. (5.4) explicitly, we take

K̂_t(x_t | x_{t−1}) = N(x_t | x_{t−1} + Q^x_t (y_t − ȳ), Σ^K_t), (5.6)

i.e., we replace G_t(x_{t−1}) in T_t by a constant ȳ, and in practice ȳ is evaluated from the particles, i.e.,

ȳ = Σ_{m=1}^M w^m_{t−1} G_t(x^m_{t−1}). (5.7)

Since K̂_t is linear-Gaussian, the backward kernel L̂_{t−1} given by Eq. (5.4) is also Gaussian, L̂_{t−1}(x_{t−1} | x_t) = N(x_{t−1} | ξ^L_{t−1}(x_t), Σ^L_{t−1}), where

q̂_t(x_t) = N(x_t | ξ_{t−1} + Q^x_t (y_t − ȳ), Σ^q_{t−1} + Σ^K_t),
ξ^L_{t−1}(x_t) = ξ_{t−1} + Σ^q_{t−1} (Σ^q_{t−1} + Σ^K_t)^{−1} (x_t − Q^x_t (y_t − ȳ) − ξ_{t−1}),
Σ^L_{t−1} = Σ^q_{t−1} − Σ^q_{t−1} (Σ^q_{t−1} + Σ^K_t)^{−1} Σ^q_{t−1}. (5.8)

It follows that the resulting incremental weight function is

α_t(x_{t−1:t}) = [π_t(x_t) L̂_{t−1}(x_{t−1} | x_t)] / [π_{t−1}(x_{t−1}) K_t(x_t | x_{t−1})]. (5.9)

Now, using the ingredients presented above, we summarize the EnKF-SMCS scheme in Algorithm 1.

Algorithm 1 The EnKF-SMCS algorithm
Initialization: draw samples {x^m_0}_{m=1}^M from the distribution q_0(x_0); compute the weights w^m_0 = π_0(x^m_0)/q_0(x^m_0) for m = 1, ..., M, and normalize {w^m_0}_{m=1}^M so that Σ_m w^m_0 = 1.
for t = 1 to T do
  estimate ξ_{t−1} and Σ^q_{t−1} from the ensemble {(x^m_{t−1}, w^m_{t−1})}_{m=1}^M using Eq. (5.5);
  let ũ^m_t = [x^m_{t−1}, G_t(x^m_{t−1})]^T for m = 1, ..., M;
  evaluate μ̃_t and C̃_t with Eq. (4.5), and compute Q_t with Eq. (4.3);
  draw x^m_t ∼ K_t(x_t | x^m_{t−1}) for m = 1, ..., M, with K_t given by Eq. (5.2);
  compute L̂_{t−1} from Eq. (5.8);
  update the weights via Eq. (5.9).
end for

It is important to note that a key challenge is yet to be addressed in Algorithm 1, namely the computational cost of computing the particle weights. First recall that the main computational cost arises from the evaluation of the forward model G_t; therefore, the total computational cost can be approximately measured by the number of evaluations of G_t. We can see from Eq. (5.9) that, when updating the particle weights, we need to compute π_t(x_t), which involves evaluating the forward models from G_1 to G_t. This operation is required at each time step, and as a result the number of model evaluations is of the order O(T^2) for each particle. Therefore, the total computational cost can be prohibitive if T is large. We propose a method to tackle this issue, based on the following two observations. First, we mainly consider sequential inference problems where one is primarily interested in the posterior distribution at the final step, where all data are incorporated (we appreciate that there are problems where all the intermediate posteriors are of interest, and in this case the method proposed here does not apply). Second, in many practical problems, after some number of observations, the posteriors may not vary substantially over several consecutive steps. It therefore may not be necessary to exactly compute the posterior distribution at each time step and, as a result, we only need to sample the
posterior distribution at a relatively small number of selected steps. Based on this idea, we propose the following scheme in each time step to reduce the computational cost: we first compute an approximate weight for each particle, and we then assess whether some prescribed conditions (based on the approximate weights) are satisfied. If such conditions are satisfied, we evaluate the actual weights of the particles. To implement this scheme, we have to address the following issues:
- First we need a method to compute the approximate weight, which should be much easier to compute than the exact weight. Recall that in Eq. (5.9) one has to evaluate π_t(x_t)/π_{t−1}(x_{t−1}), which involves computing the forward models from G_1(x_t) all the way to G_t(x_t), and so the computational cost is high. To reduce the computational cost, we propose the following approximate method to evaluate Eq. (5.9). Namely, we first write

π_t(x_t)/π_{t−1}(x_{t−1}) = [π_{t−1}(x_t)/π_{t−1}(x_{t−1})] π(y_t | x_t),

and naturally we can approximate π_{t−1} with q_{t−1}, yielding

π_t(x_t)/π_{t−1}(x_{t−1}) ≈ [q_{t−1}(x_t)/q_{t−1}(x_{t−1})] π(y_t | x_t).
Though q_{t−1} is formally given by Eq. (5.3), it is not computationally tractable. Thus we make another approximation, replacing q_{t−1} with q̂_{t−1}, where q̂_{t−1} is the Gaussian approximation of q_{t−1} given by Eq. (5.5); as a result, we obtain

π_t(x_t)/π_{t−1}(x_{t−1}) ≈ [q̂_{t−1}(x_t)/q̂_{t−1}(x_{t−1})] π(y_t | x_t),

which is used to compute the approximate weights.
- Second we need to prescribe the conditions for triggering the computation of the actual weights. Following [21], we use the effective sample size (ESS) [12] (based on the approximate weights) as the main indicator for computing the actual weights. Namely, if the ESS calculated with the approximate weights is smaller than a threshold value, the actual weights are computed. Moreover, we have two additional conditions that can also trigger the computation of the actual weights: 1) if the actual weights have not been computed for a given number of steps; 2) if the inference reaches the final step, i.e., t = T. We refer to such a step as a weight refinement.
- Finally we shall discuss how to compute the actual weights w_t. It should be noted that the recursive formula (3.3) cannot be used here, since the actual value of w_{t−1} is not available. However, letting t_0 be the preceding step at which the actual weights were computed, it can be shown that

w_t = w_{t_0} ∏_{s=t_0+1}^{t} α_s(x_{s−1:s}),

which is used to calculate the actual weights of the particles.
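The ESS-based trigger described above can be sketched as follows; the threshold M/2 and the maximum lag of 10 steps are illustrative choices, as the paper does not fix particular values here:

```python
import numpy as np

# Effective sample size of a weighted ensemble: ESS = 1 / sum_m (w_m)^2 for
# normalized weights; it ranges from 1 (degenerate) to M (equal weights).
def ess(weights):
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w**2)

def needs_refinement(approx_weights, t, t_last_refine, T, max_lag=10):
    """Mirror the three trigger conditions in the text: low approximate ESS,
    refinement overdue, or final step reached."""
    M = len(approx_weights)
    return (ess(approx_weights) < M / 2.0     # ESS below threshold
            or t - t_last_refine >= max_lag   # too long since last refinement
            or t == T)                        # final step

M = 1000
equal = np.full(M, 1.0 / M)
degenerate = np.zeros(M); degenerate[0] = 1.0
print(ess(equal), ess(degenerate))
```

For equal weights the ESS equals M and no refinement is triggered; a degenerate ensemble drives the ESS toward 1 and forces an actual-weight computation.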
We refer to this modified scheme as EnKF-SMCS with weight refinement (EnKF-SMCS-WR); the complete procedure is described in Algorithm 2. Note that in both EnKF-SMCS algorithms a resampling step is needed. Finally, we can see that in EnKF-SMCS-WR the number of forward model evaluations can potentially be reduced significantly, and the actual number of evaluations depends on how frequently the weight refinement is triggered.

Numerical examples
We provide three examples in this section to demonstrate the performance of the proposed method. We emphasize that in all these examples the forward model G_t is computationally intensive, and thus the main computational cost arises from simulating G_t. As a result, the computational cost of all methods is measured by the number of forward model evaluations, which, for all the methods used in this section, is equal to the product of the number of steps and the number of particles.

The Bernoulli model
Our first example is the Bernoulli equation, which admits an analytical solution. This model is an often-used benchmark problem for data assimilation methods as it exhibits certain non-Gaussian behavior [3]. Here we pose it as a sequential inference problem. Namely, suppose that we can observe the solution of the equation, v(τ), at times τ = t · ∆_t for t = 1, ..., T, and we aim to estimate the initial condition x from the sequentially observed data. The observation noise is assumed to follow a zero-mean Gaussian distribution with standard deviation σ. In this example we take T = 50 and ∆_t = 0.3, and we consider two different noise levels: σ = 0.4 and σ = 0.8. In the numerical experiments we set the ground truth to be x = 10^{−4}, and the data are simulated from the model (6.1) for σ = 0.4 and σ = 0.8, as shown in Fig. 1. In the experiments the prior distribution for x is taken to be uniform. We sample the posterior distribution with four methods: the EnKF method in [26], EnKF-SMCS (Algorithm 1), EnKF-SMCS-WR (Algorithm 2), and MH-SMCS. Note that MH-SMCS is the SMCS with a Metropolis-Hastings forward proposal, a commonly used implementation of SMCS (we provide a detailed description of the algorithm in Appendix A). In each method we use 200 particles, and the bias error, i.e., the difference between the sample mean (a commonly used estimator) and the ground truth, is computed at each time step. The procedure is repeated 100 times and the averaged results are shown in Fig. 2, where the left figure shows the results for the small noise case (σ = 0.4) and the right figure shows those for the large noise case (σ = 0.8). First, one can see from the figures that all the methods perform better in the small noise case, which is sensible, as intuitively the inference should be more accurate when the observation noise is small. More importantly, we can also see that in both cases EnKF results in significantly higher errors than the three SMCS methods, suggesting that EnKF performs poorly for this example. On the other hand, we observe that the three SMCS algorithms produce largely the same results in both cases, while EnKF-SMCS-WR only calculates the actual sample weights at 9 time steps on average in the small noise case and 6 in the large noise case, compared to 50 in EnKF-SMCS. This difference suggests that the EnKF-SMCS-WR algorithm can significantly reduce the computational cost associated with the weight computation. The two EnKF-SMCS algorithms and MH-SMCS yield similar results, but we emphasize that MH-SMCS is substantially more expensive than EnKF-SMCS-WR, as its procedure is similar to that of MCMC (see Appendix A for details).
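For readers who wish to reproduce the data generation, a sketch is given below. The text above does not restate the Bernoulli equation, so we assume the form dv/dτ = v − v³ commonly used in data assimilation benchmarks (e.g., [3]); its closed-form solution with v(0) = x is used directly:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed Bernoulli ODE: dv/dtau = v - v^3, with analytical solution below.
def bernoulli_solution(x, tau):
    return x / np.sqrt(x**2 + (1.0 - x**2) * np.exp(-2.0 * tau))

T, dt, sigma = 50, 0.3, 0.4
x_true = 1e-4                                 # ground-truth initial condition
tau = dt * np.arange(1, T + 1)
v = bernoulli_solution(x_true, tau)           # exact trajectory
y = v + sigma * rng.standard_normal(T)        # noisy sequential observations
print(v[-1])
```

With this assumed form, a tiny positive initial condition drives the solution toward the stable equilibrium v = 1, so late observations carry little information about x, which is what makes the inference problem non-trivial.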

Lorenz 63 model
Our second example is the Lorenz 63 model, a popular example used in several works on parameter estimation, such as [1,29]. Specifically, the model consists of three variables x, y and z, evolving according to the differential equations

dx/dτ = α(y − x), dy/dτ = x(ρ − z) − y, dz/dτ = x y − β z,

where α, ρ and β are three constant parameters. In this example we take the true values of the three parameters to be α = 10, β = 8/3 and ρ = 28, of which we assume no prior knowledge. Now suppose that observations of (x, y, z) are made at a sequence of discrete time points τ = t · ∆_t for ∆_t = 0.1 and t = 1, ..., 50, and we want to estimate the three parameters (α, β, ρ) from these observed data.
The measurement noise is taken to be zero-mean Gaussian with variance 3², and the priors of the three parameters are also taken to be Gaussian (the priors are chosen so that they cover the regime that can result in chaotic behavior). The data used in our numerical experiments are shown in Fig. 3.
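The data generation for this example can be sketched as follows; the initial state and the fixed-step RK4 integrator are illustrative assumptions, since neither is specified in the text:

```python
import numpy as np

rng = np.random.default_rng(4)

# Lorenz 63 vector field with the true parameters of the example.
def lorenz63(u, alpha, rho, beta):
    x, y, z = u
    return np.array([alpha * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(u, h, *args):
    # Classical fourth-order Runge-Kutta step (assumed solver choice)
    k1 = lorenz63(u, *args)
    k2 = lorenz63(u + 0.5 * h * k1, *args)
    k3 = lorenz63(u + 0.5 * h * k2, *args)
    k4 = lorenz63(u + h * k3, *args)
    return u + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

alpha, rho, beta = 10.0, 28.0, 8.0 / 3.0
dt, T, substeps = 0.1, 50, 10        # observations every dt = 0.1
u = np.array([1.0, 1.0, 1.0])        # assumed initial state
obs = []
for t in range(T):
    for _ in range(substeps):        # integrate between observation times
        u = rk4_step(u, dt / substeps, alpha, rho, beta)
    obs.append(u + 3.0 * rng.standard_normal(3))   # noise std 3
obs = np.asarray(obs)
print(obs.shape)
```

The resulting (50, 3) array mimics the sequentially observed data of Fig. 3; in the inference experiments only one of the three observed components is used at a time.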
In the numerical experiments, we conduct inference for two different cases: in one, the variable x is observed; in the other, y is observed. In each case we draw samples from the posterior distributions with EnKF, EnKF-SMCS, EnKF-SMCS-WR and MH-SMCS, where 500 samples are drawn with each method. All the numerical experiments are repeated 10 times. We plot the average errors for the case where x is observed in Fig. 4 and those for the case where y is observed in Fig. 5. One can see that, in both cases, the errors of EnKF are larger than those of the three SMCS methods, especially for the parameter α. Once again, the three SMCS methods yield similar errors, while EnKF-SMCS-WR employs far fewer computations of the actual weights: on average at 9 time steps in the first case and 8 in the second. The example shows that even for problems where the posterior distributions are rather close to Gaussian, SMCS can further improve the estimation accuracy.

A kinetic model of the ERK pathway
In the last example we consider parameter estimation problems in kinetic models. Estimating the kinetic parameters is an essential task in the modeling of biochemical reaction networks, including genetic regulatory networks and signal transduction pathways [32]. In particular, we consider the kinetic model of the Extracellular signal-Regulated Kinase (ERK) pathway suppressed by Raf-1 kinase inhibitor protein (RKIP) [27,35]. Here we shall omit further details of the biological background of the problem and proceed directly to its mathematical formulation; readers who are interested in more application-related information may consult [27,35].

Fig. 4 The average estimation error of each parameter when x is observed.
In this problem the mathematical model is derived based on enzyme kinetics and is represented by a dynamical system:

dx/dτ = S V(x),

where τ is the time, x is a vector of state variables (concentrations of metabolites, enzymes and proteins, or gene expression levels), S is a stoichiometric matrix that describes the biochemical transformations in the network, and V(x) is the vector of reaction rates, usually a nonlinear function of the state and input variables. Specifically, the ERK pathway model forms a system of 11 ordinary differential equations; the reaction rates V(x), which depend on the kinetic parameters k_1, ..., k_11, and the stoichiometric matrix S are given in [27,35].

Fig. 5 The average estimation error of each parameter when y is observed.
In this problem, we can make observations of some of the concentrations x_1, ..., x_11 at different times, from which we estimate the 11 kinetic parameters k_1, ..., k_11. The specific setup of our numerical experiments is the following. In many practical problems, not all the species' concentrations can be conveniently observed [27,35]. To mimic this situation we assume that observations can only be made on 4 of the states, {x_1, x_4, x_7, x_10}, while {x_2, x_3, x_5, x_6, x_8, x_9, x_11} are not observed. The observations are made 50 times with time spacing ∆_t = 0.001, and the measurement noise is taken to be zero-mean Gaussian with the standard deviations (STD) shown in Table 1. The initial values of the concentrations are also given in Table 1. We use simulated data in this example, where the true values of the eleven parameters are shown in Table 2. The priors of the eleven parameters are also taken to be Gaussian, with means and standard deviations both shown in Table 2.
In this example we focus on the four SMCS algorithms: EnKF-SMCS, EnKF-SMCS-WR, MH-SMCS, and IS-SMCS (a special MH-SMCS implementation with an independent proposal; see Appendix A for details), where the main purpose is to compare the EnKF-based and the MH-based forward proposals. We test two sample sizes, M = 5000 and M = 10000, for each method, and all the tests are repeated 10 times.
First we examine the computational cost of EnKF-SMCS and EnKF-SMCS-WR. To do this, we plot in Fig. 6 the CPU time of the two algorithms as a function of t, where for each algorithm we show both the total time cost and the time used for evaluating the forward model. First, we can see from the figure that in both algorithms the main computational cost arises from the forward model evaluation; second, EnKF-SMCS-WR can significantly reduce the computational cost by using fewer forward model evaluations. As discussed earlier, for the purpose of sequential inference we should devote the majority of our attention to the estimator accuracy at the final step, and therefore in Table 3 we show the estimation error for each parameter at the final step t = 50. Specifically, we provide in the table the mean-squared errors (MSE) of the estimation results. We can see from the table that the two EnKF-SMCS algorithms yield lower estimation errors than MH-SMCS in all the cases, and in particular the difference is substantially large for parameters k_1, k_4, k_5, k_6, k_8, k_9 and k_11, with both sample sizes. The results of IS-SMCS are considerably better than those of MH-SMCS, suggesting that the posterior distributions in this problem may be reasonably close to Gaussian. That said, IS-SMCS results in clearly higher MSE for k_1 and k_9 (in the M = 10000 case). On the other hand, the two EnKF-based SMCS algorithms yield similar performance in terms of the estimation error, but the EnKF-SMCS-WR method only conducts weight refinement at 12 steps on average for both sample sizes, resulting in much higher computational efficiency than EnKF-SMCS.

Conclusions
In this work we propose a sampling method to compute the posterior distributions that arise in sequential Bayesian inference problems. The method is based on SMCS, which seeks to generate weighted samples from the posterior in a sequential manner; specifically, we propose to construct the forward kernel in SMCS using an EnKF framework and also derive a backward kernel associated with it. With numerical examples, we demonstrate that the EnKF-SMCS method can often yield more accurate estimates than the direct use of either SMCS or EnKF for a class of problems. We believe that the method can be useful in a large range of real-world parameter estimation problems where data become available sequentially in time.
Some extensions and improvements of the EnKF-SMCS algorithm are possible. First, in this work we focus on problems with a sequential structure, but we expect that the method can be applied to batch-inference problems (where the data are available and used for inference all together) as well. In fact, many batch inference problems can be artificially "sequentialized" by some data tempering treatment [17] and, consequently, the EnKF-SMCS algorithm can be applied in these scenarios. In this respect, combining data tempering methods and the EnKF-SMCS method to address batch inference problems can be a highly interesting research problem. Second, as discussed previously, the proposed method relies on the assumption that the posterior distributions do not deviate strongly from Gaussianity. For problems with highly nonlinear models, the posterior distributions may depart far from Gaussian, and as a result the kernels obtained with the EnKF method may not be effective for SMCS. In this case, the performance of the EnKF-SMCS method may be improved by approximating the posterior with a mixture distribution (e.g., [23,34]). Finally, as the method is based on an EnKF scheme, it requires that the observation noise be additive Gaussian. In the EnKF literature, a number of methods have been developed to deal with non-Gaussian observations. One simple approach is to calculate the Kalman gain matrix using the sample covariance between u_t and y_t [24], and another example is the EnKF variant proposed in [28]. In principle, all these ideas can be used in our method to construct the EnKF forward kernel, and to this end an important question is how to incorporate them effectively into the EnKF-SMCS framework. We plan to investigate these issues in the future.

A The MH-SMCS algorithms
In this section we provide a brief description of the MH-SMCS algorithm, largely following [11]. As mentioned in Section 6.1, a commonly used forward proposal is the MH kernel, which generates a new sample x̃_t with the following procedure:
- Draw x* from a conditional distribution q(· | x_{t−1});
- Calculate the acceptance probability a(x*, x_{t−1}) = min{1, [π_t(x*) q(x_{t−1} | x*)] / [π_t(x_{t−1}) q(x* | x_{t−1})]};
- Draw ν from U[0, 1];
- If ν ≤ a, let x̃_t = x*; otherwise let x̃_t = x_{t−1}.
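A single random-walk MH move of this kind can be sketched as follows; the target π_t here is a stand-in standard normal, and the random-walk scale λ is an arbitrary illustrative value rather than the tuned value used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in target pi_t (standard normal), log-density up to a constant.
def log_pi_t(x):
    return -0.5 * x**2

def mh_move(x_prev, lam):
    x_star = x_prev + lam * rng.standard_normal()
    # Symmetric proposal, so q(x_prev|x*) / q(x*|x_prev) = 1
    log_a = min(0.0, log_pi_t(x_star) - log_pi_t(x_prev))
    if np.log(rng.uniform()) <= log_a:
        return x_star        # accept
    return x_prev            # reject: keep previous particle

x = 3.0
samples = []
for _ in range(20_000):
    x = mh_move(x, 1.0)
    samples.append(x)
samples = np.asarray(samples[2_000:])   # discard burn-in
print(samples.mean(), samples.var())
```

Repeated moves produce samples whose mean and variance approach those of the target; in MH-SMCS, however, each such move costs a full evaluation of π_t, which is the expense the EnKF-based kernel avoids.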
It is worth mentioning that, since the MH kernel includes an acceptance-rejection step that involves π_t(·), the number of forward model evaluations is also of the order O(T^2), where T is the total number of time steps. In the numerical experiments we choose two proposal distributions within the MH kernel: (1) a random walk proposal, q(· | x_{t−1}) = N(x_{t−1}, λ^2 I), where the variance λ^2 is adjusted so that the average acceptance rate is about 50%; (2) an independence sampler proposal q(· | x_{t−1}) = q̂_{t−1}(·), where q̂_{t−1}(·) is given by Eq. (5.5). We refer to the former as MH-SMCS and the latter as IS (independence sampler)-SMCS. Finally, we note that, in principle, a multiple-iteration MH proposal can be used in SMCS; however, since each MH iteration requires an evaluation of the forward model, which may result in formidable computational cost, we perform only one MH iteration in both MH-SMCS and IS-SMCS.

Fig. 1 The simulated data for σ = 0.4 (left) and σ = 0.8 (right). The lines show the simulated states in continuous time and the dots are the noisy observations.

Fig. 2 The average bias error (the difference between the sample mean and the ground truth) plotted at each time step; the insets are the same plots on a logarithmic scale. The left plot shows the error for σ = 0.4 and the right plot that for σ = 0.8.

Fig. 3 The simulated data for the Lorenz 63 example. The lines show the simulated states in continuous time and the dots are the noisy observations.

Fig. 6 The total CPU time and the CPU time for the forward model evaluation (marked with G_t) in both EnKF-SMCS and EnKF-SMCS-WR. Insets are the zoom-in plots.
Fig. 7

Table 1
The initial values and observation noise of the concentrations (states x i )

Table 2
The true values and priors of the kinetic parameters

Table 3
Comparison of the MSE results of the kinetic model at t = 50.