1 Introduction

State-space models (SSMs) describe time series observations as dependent on an unobserved system process (Durbin and Koopman 2012). The unobserved system process of an SSM consists of a sequence of continuously-valued latent states that evolve through time but are not observed directly. Instead, observations are modeled as a function of these unobserved latent states via the observation process. SSMs have been applied to various problems, including the modeling of inflation rates (Koopman and Bos 2004), neuron responses (Smith and Brown 2003; Lin et al. 2019), animal movement (Patterson et al. 2008; McClintock et al. 2012; Auger-Méthé et al. 2016), and population dynamics (Buckland et al. 2004; King et al. 2009; King 2012; Newman et al. 2014).

Fitting an SSM usually refers to inference on the model parameters given the observations. However, outside of the discrete-state hidden Markov model (HMM) case (Rabiner 1989), or special continuous cases, for example, linear Gaussian SSMs (Kalman 1960; Durbin and Koopman 2012), a closed-form expression for the likelihood does not exist, leading to model fitting challenges. The various approaches to approximating the likelihood include the use of linear Gaussian approximations (Julier and Uhlmann 1997; Wan and Van Der Merwe 2000), numerical methods using Laplace approximations or the HMM approximations of point mass filters (Bucy and Senne 1971; Koyama et al. 2010; Langrock and King 2013; Thygesen et al. 2017; Herliansyah et al. 2022), the use of sequential Monte Carlo to obtain an unbiased approximation to the likelihood (Beaumont 2003; Andrieu and Roberts 2009; Kantas et al. 2015; Deligiannidis et al. 2018), and Bayesian data augmentation approaches (Frühwirth-Schnatter 2004; Fearnhead 2011; Borowska and King 2023). A review of these methods, and others, is given in Newman et al. (2022).

Here, we focus on Bayesian data augmentation approaches, which can be applied to general SSMs by specifying the latent states as additional auxiliary variables that need to be estimated. In doing so, the data augmentation approach forms the joint posterior distribution of the model parameters and additional latent states (Tanner and Wong 1987; Hobert 2011). The resulting joint likelihood of the parameters and states is typically available in closed-form, permitting the application of standard MCMC algorithms, such as the Gibbs sampler, Metropolis–Hastings algorithms, or Metropolis-within-Gibbs algorithms. An estimate of the marginal posterior distribution of the model parameters can then be obtained by simply retaining the sampled parameter values and disregarding the latent state samples to effectively marginalize over the states. It is, however, typically challenging to sample the latent state vector efficiently within a Metropolis–Hastings algorithm, since the states are numerous and highly correlated, leading to poor mixing (Frühwirth-Schnatter 2004). Updating the states one at a time typically performs very poorly due to the high correlation between states (Fearnhead 2011), whilst designing good block-update proposal distributions can be challenging (Shephard and Pitt 1997).

Historically, data augmentation has been combined with numerical approximation, sequential Monte Carlo, and/or MCMC to fit the parameters of an SSM. When an SSM is well-approximated by Gaussian distributions, efficient proposal distributions for the latent states in a data augmentation approach include those derived from Laplace approximation (Kristensen et al. 2016) and extensions to the Kalman filter (Giordani and Kohn 2010). In the general case, more computationally intensive methods are usually required, such as the particle Gibbs (PG) sampler (Lindsten et al. 2012; Chopin and Singh 2015; Fearnhead and Meligkotsidou 2016) or particle Gibbs with ancestor sampling (PGAS) (Lindsten et al. 2014). PG and PGAS are classes of particle MCMC method (Andrieu et al. 2010) which use sequential Monte Carlo to update all states simultaneously conditional on the parameters. In their use of sequential Monte Carlo, PG and PGAS methods rely on few assumptions about the underlying state and observation distributions. However, these methods can be computationally expensive, particularly if many samples are required in the sequential Monte Carlo steps due to sample impoverishment (Rainforth et al. 2016; Wang et al. 2017). One approach to overcoming sample impoverishment is the introduction of deterministic resampling, which uses a deterministic grid over the state space to retain samples that may otherwise be discarded in the sequential Monte Carlo steps (Li et al. 2012). Although we do not develop a particle MCMC approach here, the use of a deterministic grid to efficiently sample the latent states is related to our approach.

In this paper, we propose a novel and efficient approach to Metropolis-within-Gibbs data augmentation. In particular, we focus on the challenging aspect of efficiently updating the latent states conditional on the model parameters and present the point mass proposal Metropolis–Hastings (PMPMH) algorithm, which proposes candidate values for (blocks of) latent states in two steps. The first step uses a discretization of the state space into a set of pre-defined intervals (a grid) and deterministic approximations to reduce the SSM to an HMM. We then sample a sequence of intervals from the HMM-approximated distribution of the latent states given the model parameters, employing the forward-filtering backward-sampling (FFBS) algorithm (Rabiner 1989). In the second step of the algorithm, we sample values from within the intervals using a continuous proposal distribution. Once the candidate values have been proposed using the two-step procedure, we correct for the discretization error imposed by the approximate HMM using a Metropolis–Hastings acceptance step. As with standard data augmentation algorithms, the model parameters are then updated conditionally on the latent states using standard Gibbs or Metropolis-within-Gibbs updates. Further to describing a general method for combining HMM approximations and MCMC steps, we discuss, in detail, several tuning parameters that can be used to improve the performance of the algorithm. Such tuning parameters include the block size for the latent states (i.e. the number of latent states to update simultaneously), the number of intervals used in the HMM approximation, and the distribution of the intervals in the state space.

The first step of the proposed algorithm is related to embedded HMMs and point mass filtering, which both use the tractable discrete case of an SSM, an HMM, to sample latent states. Embedded HMMs (Neal 2003; Shestopaloff and Neal 2013; Finke et al. 2016; Shestopaloff and Neal 2018) construct an HMM in the indices of stochastically-generated ‘pool’ states, and sample states such that they target the correct posterior distribution. Here, we use the embedding of an HMM via deterministic grid cells to ensure latent states are proposed with reasonable posterior mass. Point mass filters, similarly to the proposed approach, use a deterministic grid to reduce the SSM to an HMM (Bucy and Senne 1971; Kitagawa 1987; Langrock et al. 2012; Langrock and King 2013). The HMM can be used to approximate the likelihood or posterior distribution of the model parameters directly. As such, point mass filters can be (relatively) computationally inexpensive to implement, but generally produce biased estimators. Further, the bias and variance of estimators often depend on the grid (for example, the number and location of the grid cells and their coverage of the state space), and the integration method used to approximate the HMM (de Valpine and Hastings 2002; Matousek et al. 2019). Instead, we use the grid cells simply to inform a Metropolis–Hastings proposal distribution and are thus able to define the HMM and grid cells coarsely (and inexpensively), while being able to achieve convergence to the correct posterior distribution, correcting for the bias introduced by the discretization.

The rest of the paper is structured as follows. We start by defining SSMs and HMMs and model fitting procedures for SSMs via Bayesian data augmentation in Sect. 2. We describe the proposed data augmentation scheme in Sect. 3, the PMPMH algorithm, in which we use a grid-based HMM approximation within a proposal distribution for the latent states. In Sect. 4, we provide three particular cases of the algorithm, each given by a different method for defining the grid cells, and describe when each can be usefully applied. Finally, we illustrate our approach with two case studies in Sect. 5, including a near-chaotic system, where our proposed approach demonstrates a substantial improvement in performance compared to alternative approaches considered. We conclude in Sect. 6 with a discussion of our proposed algorithm and further possible avenues of research.

2 State-space model inference

Here we define SSMs, the notation used throughout the paper, and the special case of discrete-valued system processes: HMMs. We then discuss the challenges of fitting SSMs to data and computational approaches in Sect. 2.1, motivating our proposed data augmentation algorithm.

We consider time series data observed at discrete (regular) time points up to time T, \(y_{1:T}=(y_1,\dots ,y_T)\). An SSM models these data via (i) a system process of unobserved (latent) states, \(x_{1:T}=(x_1,\dots ,x_T)\); and (ii) an observation process linking the latent process with the observed \(y_{1:T}\). We allow the states to be continuous, with \(x_t \in \chi \) for each t, and describe their evolution using initial state and transition distributions. Similarly, the observation process is described using an observed state distribution. As standard, we assume a first-order Markov process for the system process and that the observed data are conditionally independent given the underlying latent states, that is, the observation process is only a function of the latent state at time t.

Given the set of model parameters, collectively denoted \(\pmb {\theta }\), the SSM can be described mathematically by

Initial state distribution:

$$\begin{aligned} p(x_1 \vert \pmb {\theta }), \end{aligned}$$

State transition distribution:

$$\begin{aligned} p(x_t \vert x_{t-1}, \pmb {\theta }), t=2, \dots , T, \end{aligned}$$

Observed state distribution:

$$\begin{aligned} p(y_t \vert x_t, \pmb {\theta }), t=1, \dots , T. \end{aligned}$$

A particular case of an SSM is an HMM, in which the state space is discrete and finite. In other words, the states at time t, \(x_t\), take values in the set \(\{1, \dots , N\}\). Therefore, when the SSM considered is an HMM, we simply re-define the initial state and transition probabilities for \(k, n=1, \dots , N\) as

Initial state probabilities:

$$\begin{aligned} P(x_1=n \vert \pmb {\theta }), \end{aligned}$$

State transition probabilities:

$$\begin{aligned} P(x_t=n \vert x_{t-1}=k, \pmb {\theta }), t=2, \dots , T. \end{aligned}$$

2.1 Model fitting

We assume that we are primarily interested in inference on the model parameters, \(\pmb {\theta }\), although this may also extend to the latent states, \(x_{1:T}\), depending on the application. In general, inference on the posterior distribution for \(\pmb {\theta }\) requires a closed-form expression for the observed data (or marginal) likelihood, \(p(y_{1:T} \vert \pmb {\theta })\). The joint likelihood of the model parameters and latent states is given by

$$\begin{aligned} p(x_{1:T}, y_{1:T} \vert \pmb {\theta })&= p(x_1 \vert \pmb {\theta }) \prod _{t=2}^T p(x_t \vert x_{t-1}, \pmb {\theta })\prod _{t=1}^T p(y_t \vert x_t, \pmb {\theta }). \end{aligned}$$
(1)
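As an illustration of Equation (1), a minimal sketch evaluating the log joint likelihood for a generic SSM is given below; `log_init`, `log_trans`, and `log_obs` are placeholders for the model-specific log-densities \(\log p(x_1 \vert \pmb {\theta })\), \(\log p(x_t \vert x_{t-1}, \pmb {\theta })\), and \(\log p(y_t \vert x_t, \pmb {\theta })\).

```python
def log_joint_likelihood(x, y, theta, log_init, log_trans, log_obs):
    """Evaluate the log of the joint likelihood in Equation (1).

    x, y      : latent states and observations, each of length T
    theta     : model parameters (whatever structure the densities accept)
    log_init  : callable returning log p(x_1 | theta)
    log_trans : callable returning log p(x_t | x_{t-1}, theta)
    log_obs   : callable returning log p(y_t | x_t, theta)
    """
    T = len(y)
    ll = log_init(x[0], theta) + log_obs(y[0], x[0], theta)
    for t in range(1, T):
        ll += log_trans(x[t], x[t - 1], theta) + log_obs(y[t], x[t], theta)
    return ll
```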

A closed-form expression for the marginal likelihood typically only exists in the linear Gaussian case, where \(p(y_{1:T} \vert \pmb {\theta })\) can be calculated using the Kalman filter (Kalman 1960; Durbin and Koopman 2012), or when the state space is discrete and finite (an HMM), where the marginalization of Equation (1) amounts to summation over the state space (Rabiner 1989). In general, however, the marginal likelihood is intractable for continuous SSMs, and standard MCMC implementations cannot be used. One solution is to use pseudo-marginal likelihood methods (Beaumont 2003; Andrieu and Roberts 2009), replacing the marginal likelihood with an unbiased estimate to formulate a valid Metropolis–Hastings approach, but we do not focus on these here.

Instead, we consider a Bayesian data augmentation approach. In this case, we form the joint posterior distribution over the model parameters and latent states (Tanner and Wong 1987; Hobert 2011), \(p(x_{1:T}, \pmb {\theta } \vert y_{1:T}) \propto p(\pmb {\theta }) p(x_{1:T}, y_{1:T} \vert \pmb {\theta })\), thus using the joint likelihood in Equation (1) directly. Since, in general, a closed-form expression exists for the joint likelihood term up to proportionality, the proportional likelihood can be evaluated, and standard MCMC approaches can be used targeting \(p(x_{1:T}, \pmb {\theta } \vert y_{1:T})\). Samples from the (marginal) posterior distribution of the parameters, \(p(\pmb {\theta } \vert y_{1:T})\), are then obtained by disregarding the state samples and retaining the parameter samples.

To sample from \(p(x_{1:T}, \pmb {\theta } \vert y_{1:T})\), a common approach is to sample the latent states and parameters in turn via their conditional distributions, simplifying the parameter updates (Carter and Kohn 1994; Frühwirth-Schnatter 2004; Durbin and Koopman 2012). A Metropolis-within-Gibbs algorithm can be applied to target the joint distribution: in each iteration, we update the latent states conditional on the model parameters, and the model parameters conditional on the latent states (for which we now have a closed-form conditional likelihood expression). However, designing an efficient proposal distribution for the underlying state vector can be challenging. Ideally, we would design a proposal distribution that updates all states simultaneously and minimizes the correlation between consecutive samples. In general, this state distribution is intractable, and designing an efficient approximation to it is difficult since it is often high-dimensional in the number of temporal states. One approach updates all states simultaneously using a particle Gibbs (PG) algorithm (Chopin and Singh 2015; Murphy and Godsill 2016; Lindsten et al. 2014): values for the entire state vector are proposed at each iteration from a conditional sequential Monte Carlo algorithm. In this approach, proposed values for the entire state vector are always accepted, and samples from the correct posterior distribution can be obtained. However, PG methods can suffer from sample impoverishment, in which degenerating particles are discarded upon resampling, impairing mixing and convergence (Wang et al. 2017; Rainforth et al. 2016). Instead of updating all states simultaneously, for example via PG updates, an alternative is a single-update algorithm, updating each latent state individually to reduce the required dimension of the proposal distribution; however, this approach typically leads to very poor mixing due to the correlation in the state process (Fearnhead 2011; King 2011).

A compromise between single and global updates of the latent states is to use block updates, simultaneously updating \(\ell \) consecutive states (Shephard and Pitt 1997). The posterior conditional distribution of the states in a given block is, generally, of non-standard form, thus it can be challenging to define an efficient proposal distribution that accounts for the correlation between the states. We propose the use of the discrete analogy to SSMs, HMMs, to define an informative and efficient proposal distribution for blocks of latent states.

3 Point mass proposal Metropolis–Hastings

In this section, we propose the point mass proposal Metropolis–Hastings (PMPMH) algorithm. The setting is a Metropolis-within-Gibbs algorithm for data augmentation. The model parameters, \(\pmb {\theta }\), are updated conditionally on \(x_{1:T}\) using a standard Metropolis–Hastings or Gibbs update, and we propose the PMPMH algorithm to sample Metropolis–Hastings proposals for the latent states given the model parameters.

Fig. 1 Example partition of a state space \(\chi ^T\) for each time t. Each grid cell is labeled by the interval it covers

Step 1 of the algorithm initially converts the SSM into an HMM by discretizing the state space at each time point, forming a deterministic grid, and the HMM is approximated deterministically. Grid cells are then sampled from the associated (discretized and approximate) posterior distribution, conditional on the model parameters, using the standard forward-filtering backward-sampling (FFBS) algorithm for HMMs (Rabiner 1989) at a computational cost of \(\mathcal {O}(NT)\) for N grid cells at each time point. Given the sampled grid cells, Step 2 proposes values for \(x_{1:T}\) from within the cells using some specified (bounded) continuous proposal distribution. We then correct for the approximate sampling distribution using a Metropolis–Hastings step targeting the correct posterior distribution.

For pedagogical purposes, we make a few assumptions in our initial description of the algorithm, for example, we derive the algorithm for one-dimensional state spaces. Extensions to higher-dimensional state spaces are possible, and we demonstrate the algorithm in the two-dimensional case with an example in Sect. 5, and discuss scalability further in Sect. 6. However, we note here that there may be practically increased computational demands for higher-dimensional state spaces. For notational simplicity, we present the algorithm for SSMs where the state space and the number of grid cells are the same for each time point, that is, the state space at all time points is denoted by \(\chi \). However, the generalization to state spaces that vary over time is immediate by adapting the notation. Finally, we initially describe the algorithm in terms of updating the full set of latent states, with a version of the algorithm for block updates of the states described later.

3.1 Step 1: sampling a grid cell trajectory

In this initial step, we formulate a discrete HMM representation of the SSM, approximate the HMM, and sample a trajectory from the discrete approximation.

3.1.1 Discretization to an HMM

First, we propose a partition of the whole state space into a grid. The state space at each time t, \(\chi \), is partitioned into intervals, forming grid cells. The N grid cells at time t are denoted \(I_t(n),\) \(n=1, \dots , N\) and span \(\chi \) with no overlap. The respective outer grid cells have infinite length if the state space is unbounded. An example of such a partition is shown in Fig. 1.

The grid cells discretize the state space. Thus, the grid cell indices are the discrete states of an HMM. For the grid cells given by \(\{I_t(1), \dots , I_t(N)\}\), the indices are simply \(\{1, \dots , N\}\), with dynamics given by the SSM density of each grid cell. Mathematically, let \(B_t\) denote the random variable of the grid cell indices under the specified grid cell boundaries so that \(B_t \in \{1,\dots ,N\}\) for \(t=1,\dots ,T\). Then the HMM probabilities are given for states \(k, n \in \{1, \dots , N\}\) as follows:

Initial state probabilities:

$$\begin{aligned} P(B_1=n \vert \pmb {\theta }), \end{aligned}$$

State transition probabilities:

$$\begin{aligned} P(B_t = n \vert B_{t-1}=k, \pmb {\theta }), t=2, \dots , T, \end{aligned}$$

Observed state distribution:

$$\begin{aligned} p(y_t \vert B_t=n, \pmb {\theta }), t=1, \dots , T. \end{aligned}$$

These HMM probabilities under the original SSM are therefore defined by integrals of the state space equations over the given intervals:

$$\begin{aligned}&P(B_1=n \vert \pmb {\theta }) = \int \limits _{I_1(n)} p(x_1 \vert \pmb {\theta }) dx_1,\nonumber \\&P(B_t = n \vert B_{t-1}=k, \pmb {\theta }) = \nonumber \\&\qquad \int \limits _{I_t(n)} \int \limits _{I_{t-1}(k)} p(x_t \vert x_{t-1}, \pmb {\theta }) dx_{t-1} dx_t, t=2, \dots , T, \nonumber \\&p(y_t \vert B_t=n, \pmb {\theta }) = \int \limits _{I_t(n)} p(y_t \vert x_t, \pmb {\theta }) dx_t, t=1, \dots , T. \end{aligned}$$
(2)

3.1.2 Approximate HMM

In general, the probabilities associated with the HMM, given in Equation (2), are not available in closed form, thus we approximate these integrals. To formulate an approximation to the HMM under the considered SSM, we use simple (fast) deterministic methods, for example, the midpoint rule or other Riemann sum methods. Mathematically, we define \(L_t(n)\) as the length of the interval \(I_t(n)\), with the length of any infinite grid cells set at some arbitrary (finite) length (discussed further in Appendix B). Within each grid cell, we choose a set of node points, \(\{\xi _t^s(n)\}_{s=1}^S\). The set of nodes is defined a priori and is used to approximate the density functions across each grid cell by a polynomial. In the simple case where \(S=1\) (that is, one node in each grid cell such as the mid-point), we have that \(\{\xi _t^s(n)\}_{s=1}^S=\xi _t(n)\). The approximate HMM, for grid cell indices \(k, n=1, \dots , N\), can thus be given by the mid-point approximations:

$$\begin{aligned}&\hat{P}(B_1 = n \vert \pmb {\theta }) \propto L_1(n)\, p(\xi _1(n) \vert \pmb {\theta }),\\&\hat{P}(B_t=n \vert B_{t-1}=k, \pmb {\theta }) \propto L_t(n) L_{t-1}(k)\, p(\xi _t(n) \vert \xi _{t-1}(k), \pmb {\theta }), \quad t=2, \dots , T,\\&\hat{P}(y_t \vert B_t=n, \pmb {\theta }) \propto L_t(n)\, p(y_t \vert \xi _t(n), \pmb {\theta }), \quad t=1, \dots , T. \end{aligned}$$

Each set of approximate probabilities is normalized to sum to one, giving valid probability mass functions. Once the probability mass function approximations have been obtained, grid cell indices are proposed from

$$\begin{aligned} \hat{P}(B_{1:T} \vert y_{1:T}, \pmb {\theta })&\propto \hat{P}(B_1 \vert \pmb {\theta })\hat{P}(y_1 \vert B_1, \pmb {\theta }) \prod _{t=2}^T \hat{P}(B_t \vert B_{t-1}, \pmb {\theta })\hat{P}(y_t \vert B_t, \pmb {\theta }). \end{aligned}$$

We sample indices from this distribution, denoted \(b'_{1:T}\), using the forward-filtering backward-sampling (FFBS) algorithm in Rabiner (1989).
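As an illustrative sketch (not the implementation provided in the Supplementary Information), the following code builds the mid-point approximation for a one-dimensional state space with a single node per cell (\(S=1\)) and then runs FFBS over the resulting HMM; `init_pdf`, `trans_pdf`, and `obs_pdf` are placeholders for the model densities, and infinite outer cells are assumed to have been truncated to a finite pseudo-length beforehand. The FFBS routine also returns the log-probability of the sampled path under \(\hat{P}(B_{1:T} \vert y_{1:T}, \pmb {\theta })\), which is required later for the Metropolis–Hastings correction.

```python
import numpy as np

def approximate_hmm(grid, y, theta, init_pdf, trans_pdf, obs_pdf):
    """Mid-point approximation to the HMM probabilities (one node per cell).

    grid : cell boundaries, shape (T, N + 1); infinite outer cells are assumed
           to have been replaced by a finite pseudo-length already.
    Returns initial probabilities (N,), transition matrices (T - 1, N, N) and
    unnormalized observation weights (T, N); constant factors in the weights
    cancel when the filtering distribution is normalized.
    """
    T, N = grid.shape[0], grid.shape[1] - 1
    lengths = np.diff(grid, axis=1)                  # L_t(n)
    mids = 0.5 * (grid[:, :-1] + grid[:, 1:])        # mid-points xi_t(n)

    p_init = lengths[0] * init_pdf(mids[0], theta)
    p_init /= p_init.sum()

    p_trans = np.empty((T - 1, N, N))
    for t in range(1, T):
        # rows k index the previous cell, columns n the current cell
        dens = trans_pdf(mids[t][None, :], mids[t - 1][:, None], theta)
        mat = lengths[t - 1][:, None] * lengths[t][None, :] * dens
        p_trans[t - 1] = mat / mat.sum(axis=1, keepdims=True)

    w_obs = lengths * np.stack([obs_pdf(y[t], mids[t], theta) for t in range(T)])
    return p_init, p_trans, w_obs


def ffbs_sample(p_init, p_trans, w_obs, rng):
    """Forward-filtering backward-sampling for the approximate HMM (Step 1)."""
    T, N = w_obs.shape
    alpha = np.empty((T, N))                         # filtering probabilities
    alpha[0] = p_init * w_obs[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ p_trans[t - 1]) * w_obs[t]
        alpha[t] /= alpha[t].sum()

    b = np.empty(T, dtype=int)
    log_prob = 0.0                                   # log prob of the sampled path
    w = alpha[T - 1]
    b[T - 1] = rng.choice(N, p=w)
    log_prob += np.log(w[b[T - 1]])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * p_trans[t][:, b[t + 1]]
        w /= w.sum()
        b[t] = rng.choice(N, p=w)
        log_prob += np.log(w[b[t]])
    return b, log_prob
```

The grid itself can be constructed using any of the approaches discussed in Sect. 4.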

3.2 Step 2: sampling a point trajectory

Step 2 proposes continuous values for the states which will be subsequently corrected in Metropolis–Hastings steps. Given the discrete grid cell indices sampled previously, we propose values for \(x_{1:T}\) conditional on the corresponding intervals \(\{I_1(b'_1),\dots ,I_T(b'_T)\}\). This amounts to sampling \(x_{1:T}\) from the domain \(I_1(b'_1)\times I_2(b'_2)\times \cdots \times I_T(b'_T)\). For simplicity, we consider proposal distributions for each \(x_t\) independently and independently of \(\pmb {\theta }\), giving proposal distributions of the form \(q(x_t \vert x_t \in I_t(b'_t))=q(x_t \vert B_t=b'_t)\), for \(t=1,\dots ,T\). For example, a natural choice is to propose each \(x_t\) from a uniform proposal distribution over \(I_t(b'_t)\) for finite \(I_t(b'_t)\), or a truncated Gaussian distribution if \(I_t(b'_t)\) has infinite length. However, any distribution with domain \(I_t(b'_t)\) is applicable.
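For concreteness, a minimal sketch of this step for a single cell is given below, using a uniform proposal on finite cells and a truncated Gaussian, anchored at the finite boundary, on infinite outer cells; the truncated Gaussian scale `sd` is an illustrative tuning choice rather than part of the algorithm.

```python
import numpy as np
from scipy import stats

def propose_within_cell(lower, upper, rng, sd=1.0):
    """Sample x_t and log q(x_t | B_t = b'_t) for one grid cell I_t(b'_t)."""
    if np.isfinite(lower) and np.isfinite(upper):
        x = rng.uniform(lower, upper)                # uniform on a finite cell
        logq = -np.log(upper - lower)
    else:
        # truncated Gaussian anchored at the finite boundary of an outer cell
        centre = lower if np.isfinite(lower) else upper
        a, b = (lower - centre) / sd, (upper - centre) / sd
        x = stats.truncnorm.rvs(a, b, loc=centre, scale=sd, random_state=rng)
        logq = stats.truncnorm.logpdf(x, a, b, loc=centre, scale=sd)
    return x, logq
```

Calling this routine for each t and summing the returned log-densities, together with the FFBS path log-probability, gives the log of the proposal density defined in the next subsection.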

3.3 The PMPMH proposal distribution

The process of sampling a grid cell trajectory and sampling a value for \(x_{1:T}\) defines an independent proposal distribution on \(\chi ^T\). The density function of the proposed trajectories is therefore the probability of selecting the grid cell indices from a specified set of grid cell boundaries, combined with the density of the (continuous) values within those grid cells. Let \(B_{1:T}\) denote the random variables of the grid cell indices under the proposed grid (which may change across iterations), with intervals \(I_t(n)\), \(n=1, \dots , N\), \(t=1, \dots , T\). Then the density of the subsequently proposed state values, \(x'_{1:T}\), is given by

$$\begin{aligned} q(x'_{1:T} \vert y_{1:T}, \pmb {\theta })&= \hat{P}(B_{1:T}=b'_{1:T} \vert y_{1:T}, \pmb {\theta }) \nonumber \\&\quad \prod _{t=1}^T q(x'_t \vert B_t=b'_t). \end{aligned}$$
(3)

The resulting proposed trajectory, \(x'_{1:T}\), is retained according to the Metropolis–Hastings acceptance probability, given by

$$\begin{aligned} \alpha (x_{1:T}, x'_{1:T}) = \min \left( 1, \frac{ p(x'_{1:T} \vert y_{1:T}, \pmb {\theta }) q(x_{1:T} \vert y_{1:T}, \pmb {\theta }) }{p(x_{1:T} \vert y_{1:T}, \pmb {\theta })q(x'_{1:T} \vert y_{1:T}, \pmb {\theta })} \right) , \end{aligned}$$
(4)

where \(p(x_{1:T} \vert y_{1:T}, \pmb {\theta })\) denotes the posterior conditional distribution of \(x_{1:T}\) given \(y_{1:T}\) and \(\pmb {\theta }\), \(q(x'_{1:T} \vert y_{1:T}, \pmb {\theta })\) is as in Eq. (3), and \(q(x_{1:T} \vert y_{1:T}, \pmb {\theta })\) uses the same definition for the grid cells, that is, their indices, \(b_{1:T}\), are such that \(x_t \in I_t(b_t)\) for all t.
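A hedged sketch of this correction on the log scale is given below; `log_target` stands for \(\log p(x_{1:T} \vert y_{1:T}, \pmb {\theta })\) up to a constant, and `logq_cur` and `logq_prop` are the log proposal densities of Eq. (3) (each the FFBS path log-probability plus the sum of within-cell log-densities).

```python
import numpy as np

def pmpmh_accept(x_cur, x_prop, log_target, logq_cur, logq_prop, rng):
    """Metropolis-Hastings correction of Equation (4), computed on the log scale."""
    log_alpha = (log_target(x_prop) - log_target(x_cur)) + (logq_cur - logq_prop)
    if np.log(rng.uniform()) < log_alpha:
        return x_prop, True       # accept the proposed trajectory
    return x_cur, False           # retain the current trajectory
```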

The PMPMH method for obtaining a sample of \(x_{1:T}\) conditional on \(\pmb {\theta }\) is summarized in Algorithm 1. For completeness, we also include the updates of \(\pmb {\theta }\) conditional on \(x_{1:T}\), which can be updated using standard Metropolis–Hastings or Gibbs updates conditional on the current chain value of the states.

3.4 Block updates of the latent states

Updating all of the states simultaneously may be inefficient if there are many latent states. We can, instead, update the states in smaller blocks: for a block of \(\ell \) states starting at time t, we sample \(x_{t:t+\ell -1}\) from a proposal distribution of the form \(q(x_{t:t+\ell -1} \vert x_{t-1}, x_{t+\ell }, y_{t:t+\ell -1}, \pmb {\theta })\).

A block PMPMH proposal distribution is given by a simple adaptation of the case where all states are updated simultaneously, conditioning on the current states at either side of the block, \(x_{t-1}\) and \(x_{t+\ell }\). Therefore, in Step 1, we condition on the current grid cell indices, \(b_{t-1}\) and \(b_{t+\ell }\), such that \(x_{t-1} \in I_{t-1}(b_{t-1})\) and \(x_{t+\ell } \in I_{t+\ell }(b_{t+\ell })\) under the current grid definition. We then use FFBS to sample proposed indices from the grid cells in the block. In Step 2, we simply define the proposal distribution, as before, for each latent state in the given block, \(q(x_s \vert B_s=b'_s)\), for \(s=t, \dots , t+\ell -1\). The PMPMH algorithm for block updates is summarized in Appendix C.
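A minimal sketch of one sweep over the latent states in overlapping blocks is given below; `pmpmh_update_block` is a hypothetical helper applying Steps 1 and 2 and the Metropolis–Hastings correction to a block conditional on the neighbouring states (where they exist) and \(\pmb {\theta }\).

```python
def sweep_blocks(x, y, theta, block_size, pmpmh_update_block, rng):
    """One sweep of PMPMH block updates; consecutive blocks overlap by one state.

    x is the current array of latent states; pmpmh_update_block is a hypothetical
    helper returning updated values for x[t:end] given x[t-1], x[end] and theta.
    """
    T = len(x)
    step = max(block_size - 1, 1)        # overlap consecutive blocks by one state
    for t in range(0, T, step):
        end = min(t + block_size, T)
        x[t:end] = pmpmh_update_block(x, y, theta, t, end, rng)
    return x
```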

We note that it is possible to parallelize the updates of the latent states to improve computational efficiency. For example, blocks of states can be updated in parallel provided that the states to be updated and those being conditioned on do not overlap. For the sake of comparison with other methods, we do not parallelize our computation but simply note its possibility. See also King et al. (2023) for further discussion on parallelization within a state-space model fitting context.

3.5 Conditions for convergence

A benefit of the proposed method is that estimators of integrable functions are consistent under some mild conditions ensuring the validity of the proposal distribution in the Metropolis–Hastings steps.

Theorem

The PMPMH algorithm provides consistent ergodic average estimators of integrable functions with respect to \(p(x_{1:T}, \pmb {\theta } \vert y_{1:T})\) if,

  1. for all \(b_t \in \{1, \dots , N\}\), \(t \in \{1, \dots , T\}\), the HMM probabilities are defined such that \(\hat{P}(B_{1:T} = b_{1:T} \vert y_{1:T}, \pmb {\theta })>0\) and the proposal distributions within each grid cell are defined such that \(q(x_t\vert B_t=b_t)>0\) for all \(x_t \in I_t(b_t)\), and

  2. either

     (a) the grid cells are defined and fixed at the start of the algorithm, or

     (b) the grid cells are defined as a function of the current MCMC iteration parameter or state values.

Proof

We assume that the transition kernel on the parameter space is constructed such that it admits \(p(\pmb {\theta }\vert x_{1:T}, y_{1:T})\) as its limiting distribution. By standard results for Metropolis-within-Gibbs algorithms (Latuszyński et al. 2013; Gamerman and Lopes 2006), proving that the PMPMH algorithm provides consistent estimators of integrable functions reduces to ensuring that the transition kernel on the blocks of \(\ell \) states converges to the correct conditional posterior distribution. We therefore show that irreducibility and detailed balance are satisfied by the proposal distributions with respect to the correct conditional posterior distribution for each block of \(\ell \) states.

By construction, we assume that \(\cup _{n=1}^N I_t(n) = \chi \) for all \(t=1, \dots , T\). Condition 1 implies that \(q(x_{1:T} \vert y_{1:T}, \pmb {\theta })>0\) for all \(x_{1:T} \in \chi ^T\), thus irreducibility is satisfied since any value in the state space can be proposed at each iteration of the algorithm. Note that Condition 1 can be ensured by arbitrarily thresholding the HMM probabilities above zero and attributing non-zero density within each grid cell using many standard distributions.

It is immediate that detailed balance holds under Condition 2(a) that the grid cells are defined and fixed at the start of the Metropolis–Hastings algorithm. It is also straightforward to show that detailed balance holds under Condition 2(b) that the grid cells are defined as a function of the current MCMC iteration parameter or state values.

Thus, Condition 1 and Condition 2(a) or 2(b) ensure that irreducibility and detailed balance are satisfied by the PMPMH proposal distribution, and the algorithm provides consistent ergodic average estimators of integrable functions with respect to \(p(x_{1:T}, \pmb {\theta } \vert y_{1:T})\).

4 Defining the grid cells

We have, so far, given a broad framework for using an approximate HMM as a Metropolis–Hastings proposal distribution. The method fundamentally relies on the use of a deterministic grid to discretize the state space. There are, however, many different ways to define the grid cells (their size and location), possibly resulting in substantially different proposal distributions and computational costs. We see in Sect. 5 that the efficiency of the algorithm is highly dependent on the choice of grid cell definition and resulting proposal distribution. Since the optimal choice depends on the application, we provide the reader with three approaches for defining the grid cells and describe when each can be usefully (and efficiently) applied. We discuss the other practical decisions in Appendix B.

4.1 Approach 1: equal grid cells

The first approach sets all finite cells to be equally sized and identical across all time points, that is, \(I_t(n) = I_s(n)\) for all \(t,s = 1, \dots , T\) and \(n=2, \dots , N-1\) (assuming an infinite lower and upper bound on the latent states). We assume that these equally sized grid cells are centered around the mean of the data, \(\frac{1}{T} \sum _{t=1}^T y_t\), spanning a range of the state space denoted by \(\mathcal {S}\), though this approach could be adapted by centering the grid cells around another function of the data.
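A sketch of this construction for a one-dimensional, unbounded state space is given below; `span` corresponds to \(\mathcal {S}\).

```python
import numpy as np

def equal_grid(y, N, span):
    """Approach 1 sketch: N cells per time point, identical across time.

    The N - 2 finite cells are equally sized, centred on the mean of the data
    and jointly cover `span` units; the two outer cells are infinite.
    Returns boundaries of shape (T, N + 1).
    """
    centre = np.mean(y)
    inner = np.linspace(centre - span / 2.0, centre + span / 2.0, N - 1)
    boundaries = np.concatenate(([-np.inf], inner, [np.inf]))
    return np.tile(boundaries, (len(y), 1))
```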

If the model parameters do not vary with time, only one transition matrix needs to be calculated per MCMC iteration, or whenever the model parameters are updated. For a fixed number of grid cells, this approach has the lowest computational cost of the approaches considered. However, since the grid cells are the same for each time point, but the regions of high posterior density may change at each time point, the grid cells should cover a large range of the state space to ensure coverage of these high-density regions and good mixing. This may require many grid cells, and a greater computational cost, if the high posterior regions of the SSM are highly non-uniform over time; this approach is therefore most efficient for SSMs whose high posterior regions are relatively uniform over time.

4.2 Approach 2: data-driven quantiles

If the high posterior density regions vary over time, a time-inhomogeneous approximation to the HMM transition matrix may be more efficient than the previous approach. Hence, we set grid cells in this approach using the observed data at each time point. In particular, in the implementations of this paper, we set the grid cell boundaries at the quantiles of one-dimensional Gaussian distributions centered around each observed data point. For each time point, t, we set the boundaries at the quantiles of X where \(X \sim N(y_t, {\sigma _y}^2)\). The variance of the Gaussian distribution, \({\sigma _y}^2\), is set at a scalar value, or as a function of the current observation process variance within the MCMC iterations and can be used to control the range that the majority of the grid cells cover via pilot tuning. The resulting quantiles are simply rounded if integer values are required. Note that simple extensions to this approach may include using a different distribution to set the quantiles, provided that its domain is in the state space.
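The following sketch implements this choice for a one-dimensional state space, placing the \(N-1\) finite boundaries at equally spaced quantile levels of \(N(y_t, {\sigma _y}^2)\); the equally spaced levels are an illustrative choice.

```python
import numpy as np
from scipy import stats

def data_quantile_grid(y, N, sigma_y):
    """Approach 2 sketch: boundaries at Gaussian quantiles centred on each y_t.

    Returns boundaries of shape (T, N + 1); the outer cells are infinite.
    """
    probs = np.linspace(0.0, 1.0, N + 1)[1:-1]      # interior quantile levels
    y = np.asarray(y, dtype=float)
    grid = np.empty((len(y), N + 1))
    grid[:, 0], grid[:, -1] = -np.inf, np.inf
    grid[:, 1:-1] = stats.norm.ppf(probs)[None, :] * sigma_y + y[:, None]
    return grid
```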

When the model parameters are updated, a transition matrix needs to be recalculated for each time point. It is, however, possible to reduce the computational cost of this approach by using the same transition matrix across several time points, centering, for example, on a single data point or the mean of the corresponding sub-series. By using the observed data at each time point, we aim to approximately concentrate the grid cells in areas that are likely to have high posterior mass at each time point. This method may be most efficient if the observed data used in the centering is a good proxy for the underlying state process and its dependencies.

4.3 Approach 3: latent state quantiles

The previous approaches define the grid cells independently of the state process, which may result in slow mixing if the latent states of the SSM are highly correlated over time. The accuracy of the proposal distribution and mixing of the latent states can be improved through the definition of the grid cells. Matousek et al. (2019) show empirically that the accuracy of deterministic grid cell approximations to the posterior distribution can be improved by placing the grid cells according to rough approximations to the posterior distribution. Thus, here we set the grid cells using the current latent states in the MCMC iterations at each time t. The underlying intuition is that, after the MCMC algorithm has converged, the current states reflect a sample from the posterior distribution. Thus, over several iterations, grid cells centered around the current state lead to proposed values that are distributed similarly to the (conditional) posterior distribution (Haario et al. 1999), improving the accuracy of the proposal distribution and mixing of the MCMC steps.

To formulate grid cells using the current state, we let \({x_t}^{(m-1)}\) denote the current state in the MCMC iterations at time t; then, similarly to the previous approach, we define the grid cell boundaries at the quantiles of a Gaussian distribution X for each t, where \(X \sim N({x_t}^{(m-1)}, {\sigma _x}^2)\). As with the previous approach, the variance of the Gaussian distribution used to define the grid cells, \({\sigma _x}^2\), controls the span of the finite cells. Again, this variance could be set, for example, as a fixed value (chosen via pilot-tuning), as a function of the current estimate of the system process variance, or as a function of the current state within the MCMC iterations. We note that setting the majority of the grid cells to cover a small range relative to the high posterior density ranges worked well in practice (see Sect. 5).
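The construction mirrors the previous approach, centering on the current state values rather than the data; a sketch allowing \({\sigma _x}\) to be a scalar or to vary with t (for example, proportional to the current state, as used in Sect. 5.2) is given below.

```python
import numpy as np
from scipy import stats

def state_quantile_grid(x_cur, N, sigma_x):
    """Approach 3 sketch: boundaries at Gaussian quantiles centred on the current states.

    x_cur   : current MCMC values of x_{1:T}
    sigma_x : scalar, or array of length T
    Returns boundaries of shape (T, N + 1); the outer cells are infinite.
    """
    x_cur = np.asarray(x_cur, dtype=float)
    sd = np.broadcast_to(np.asarray(sigma_x, dtype=float), x_cur.shape)
    probs = np.linspace(0.0, 1.0, N + 1)[1:-1]
    grid = np.empty((len(x_cur), N + 1))
    grid[:, 0], grid[:, -1] = -np.inf, np.inf
    grid[:, 1:-1] = stats.norm.ppf(probs)[None, :] * sd[:, None] + x_cur[:, None]
    return grid
```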

In this approach, the transition matrices need to be recalculated at each time point for every change in the model parameters but will typically require fewer grid cells than the other approaches to achieve good mixing.

In general, there is a trade-off between the extent to which the grid cells are well-placed and the associated computational expense. For example, equally-sized grid cells are computationally fast but may give lower acceptance probabilities if they give coarse approximations in high-density regions. Where equally-sized grid cells give poor acceptance probabilities, for example, due to non-uniformity in high posterior regions over time, acceptance probabilities may be improved using one of the other approaches at a greater computational expense.

5 Numerical illustrations

We demonstrate the proposed PMPMH algorithm via two case studies. The first is an SSM with a simple one-dimensional Gaussian mixture state process, demonstrating how the algorithm can be implemented, and the properties of the algorithm when different practical decisions are made. We then show how a similar PMPMH implementation can be used to efficiently sample the latent states of a more challenging 2-dimensional population growth model that can display near-chaotic behavior.

In each case study, we compare the performance of the algorithm to two particle Gibbs algorithms: the particle Gibbs (PG) sampler (Chopin and Singh 2015; Murphy and Godsill 2016) and the particle Gibbs with ancestor sampling (PGAS) algorithm (Lindsten et al. 2014), both with multinomial resampling. The two methods are particle MCMC algorithms and, like the proposed algorithm, sample from the joint posterior distribution of the states and parameters by updating each in turn from their full conditional distributions. The PGAS algorithm, in particular, is a state-of-the-art method that often improves upon the mixing properties of the PG algorithm by reducing sample impoverishment (Nonejad 2015; Meent et al. 2015; Kantas et al. 2015), though at an increased computational cost. The code used to implement the PG, PGAS, and PMPMH algorithms in each numerical illustration is given in the Supplementary Information.

5.1 Gaussian mixture state-space model

We consider a simple one-dimensional Gaussian mixture state process:

$$\begin{aligned}&x_1 \sim w_1 N(1, \sigma ^2_{\eta _1}) + (1-w_1)N(1, \sigma ^2_{\eta _2}),\\&x_t \vert x_{t-1} \sim w_t N(x_{t-1}, \sigma ^2_{\eta _1}) + \\&\qquad \qquad \quad (1-w_t) N(x_{t-1}, \sigma ^2_{\eta _2}), t= 2, \dots , T,\\&w_t \sim Bernoulli(p), t= 1, \dots , T, \end{aligned}$$

where \(p \in [0,1]\) denotes the probability of selecting the first mixture component of the state distribution (the Gaussian distribution with variance \(\sigma ^2_{\eta _1}\)), with the second component (variance \(\sigma ^2_{\eta _2}\)) selected with probability \(1-p\). Data, \(y_{1:T}\), are observed according to \(y_t \vert x_t \sim N(x_t, \sigma ^2_{\epsilon })\) and the model parameters are given by \(\pmb {\theta }=(p, \sigma ^2_{\eta _1}, \sigma ^2_{\eta _2}, \sigma ^2_{\epsilon })\). We simulate two data sets from this model, \(y_{1:600}^{(1)}\) using \(\pmb {\theta }=(0.9, 1, 700, 1)\) and \(T=600\) (Model 1), and \(y_{1:1000}^{(2)}\) using \(\pmb {\theta }=(0.99, 1, 10{,}000, 10)\) and \(T=1000\) (Model 2), shown in Fig. 2.
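For reference, a short sketch simulating states and data from this model (for example, Model 1 with \(\pmb {\theta }=(0.9, 1, 700, 1)\) and \(T=600\)) is given below.

```python
import numpy as np

def simulate_mixture_ssm(T, p, var1, var2, var_eps, rng):
    """Simulate states and data from the Gaussian mixture SSM of this section.

    e.g. Model 1: simulate_mixture_ssm(600, 0.9, 1.0, 700.0, 1.0, np.random.default_rng(1))
    """
    w = rng.binomial(1, p, size=T)                        # mixture indicators w_t
    sd = np.where(w == 1, np.sqrt(var1), np.sqrt(var2))   # component standard deviations
    x = np.empty(T)
    x[0] = 1.0 + sd[0] * rng.standard_normal()            # x_1 centred at 1
    for t in range(1, T):
        x[t] = x[t - 1] + sd[t] * rng.standard_normal()   # random-walk transition
    y = x + np.sqrt(var_eps) * rng.standard_normal(T)     # y_t | x_t ~ N(x_t, var_eps)
    return x, y
```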

Fig. 2 Simulated data from the Gaussian mixture SSM: (a) \(y^{(1)}_{1:600}\) using \(\pmb {\theta }=(0.9, 1, 700, 1),\) \(T=600\), and (b) \(y^{(2)}_{1:1000}\) using \(\pmb {\theta }=(0.99, 1, 10{,}000, 10),\) \(T=1000\)

Table 1 Average effective sample size (ESS) and effective sample size per second (ESS/s) for Approaches 2 and 3 of the PMPMH algorithm with temporal blocks of size \(\ell =4\) (Model 1)

Both data sets are simulated with a high-variance second mixture component, selected with a low probability, resulting in infrequent but large jumps in the state process. The parameter values for Model 2 result in more occasional and larger jumps in the state process than the first. We compare the properties of the algorithm in both cases.

5.1.1 PMPMH implementation

We sample from the joint posterior distribution of the states and parameters, updating \(x_{1:T}\) and \(\pmb {\theta }\) from their conditional distributions. The sampling schemes and priors for updating \(\pmb {\theta }\) are given in Appendix A. Here, we apply the two stages of the PMPMH algorithm to update \(x_{1:T}\) conditional on \(\pmb {\theta }\). The PMPMH framework presented can be adapted in several ways, for example, by changing the deterministic integration method used to approximate the HMM probabilities, the number of grid cells, and the distribution used to propose values from within grid cells. However, we show in this section that we achieve stable and efficient performance by fixing a number of these decisions and opting for the simple choices given in Appendix B.

Instead, we focus on the practical decisions that are relevant to the efficiency of the algorithm: (a) the choice among Approaches 1–3 of Sect. 4 (including the range of the state space covered by the finite grid cells, \(\mathcal {S}\), which we set using scalar variances), (b) the number of grid cells, N, and (c) the temporal block sizes in which the states are updated, \(\ell \). We test the efficiency of the algorithm under various combinations of these tuning parameters:

  (a) the span of the finite grid cells: for Approach 1, \(\mathcal {S}=150, 250, 350, 450, 550\) units for both data sets (compared to the ranges of the observations, \(y_{1:600}^{(1)}\) and \(y_{1:1000}^{(2)}\), of 334 and 478 units respectively); for Approaches 2 and 3, \(\mathcal {S}=1, 3, 5, 7, 9, 11, 13, 15\) units for both models,

  (b) the number of grid cells, \(N=5, 10, 20\),

  (c) the temporal block size, \(\ell =1, 4, 10\), overlapping blocks by one state to improve the mixing of the states at the “boundaries” of each block, as suggested by Fearnhead (2011).

5.1.2 Results

For each combination of the tuning parameters listed above, the results are based on 10 separate runs of 10,000 iterations each, taking \(1.3-7.5\) hours on a 1.6 GHz CPU for Model 1 (depending on the number of grid cells and the approach chosen) and \(2.2-11\) hours for Model 2. We present the results for each model in Tables 1 and 2 respectively.

Table 2 Average effective sample size (ESS) and effective sample size per second (ESS/s) for Approaches 2 and 3 of the PMPMH algorithm with temporal blocks of size \(\ell =4\) (Model 2)

Increasing the number of grid cells, N, results in a more accurate approximation to \(p(x_{1:T} \vert y_{1:T}, \pmb {\theta })\) in the region covered by the finite grid cells, improving mixing but at a higher computational cost. The equally-spaced grid cell Approach 1 performed poorly on both simulated models compared to the quantile-based Approaches 2 and 3: because the high posterior regions cover a large range of values, it required a large span for both models (350 and 550 units respectively), and thus a large number of grid cells and a high computational cost, to provide a reasonable HMM approximation.

Under Approaches 2 and 3, using blocks of size \(\ell =10\) or a span for the finite cells greater than 11 units required more than \(N=20\) grid cells for convergence and resulted in much more costly and less efficient implementations than the other sets of tuning parameters. Further, using single-site updates (\(\ell =1\)) gave poor mixing when compared to blocks of size \(\ell =4\) due to the correlation between consecutive states. The results when using \(\ell =1, 10\), a finite span greater than 11 units and \(N>20\) are thus excluded from Tables 1 and 2 for brevity.

The results for Approaches 2 and 3 in Table 1 use blocks of size \(\ell =4\), and where convergent, converge within \(1000-5000\) iterations according to the Brooks–Gelman–Rubin (BGR) diagnostics of Gelman and Rubin (1992) and Brooks and Gelman (1998). In the implementations relating to Model 1, both approaches for defining the grid cell boundaries yield similar levels of efficiency when considering the effective sample size (ESS) per second, likely since the small observation error means that both approaches focus grid cells in roughly the same region of the state space around the current state. This is potentially also the reason that both methods required a small span for the finite cells and a small number of grid cells for convergence relative to the range of the data (\(1-9\) vs. 344 units, around \(0.3-2.6\%\) of the range of the data): as is true of proposal distributions that make local moves, grid cells can be focused over a smaller range of the data compared to global-move approaches and still achieve good acceptance probabilities and mixing. The smaller range also means that as few as 5 grid cells can be used to achieve good convergence properties (compared to 50 in Approach 1) via a good HMM approximation over the region, reducing the computational cost of the approaches.

For Model 2, in contrast, where the observation process variance is larger, Approach 2 departs from a local-move proposal distribution and generally exhibits poor mixing and convergence compared to Approach 3. Conversely, the local moves of Approach 3 are effective at achieving convergence and exhibit relatively stable performance even for \(N=5\). The efficient ranges for the finite grid cells are similar to those in the implementations of the first model as a percentage of the range of the data (here, \(1-2.3\%\) of the range of the data).

We also fitted both of the models using the PG and PGAS algorithms with 5–1000 particles, and various combinations of resampling thresholds based on the standard percentage ESS criterion (Cappé et al. 2005). The PG sampler did not converge for either model using as many as 1000 particles, resulting in computational times of around 14 h for Model 1 and 28 h for Model 2 on a 1.6 GHz CPU. Conversely, the PGAS sampler converged using as few as 5 particles for both models. On the whole, the PGAS sampler gave greater levels of efficiency than the PMPMH algorithm for both of the models, achieving an average ESS per second of around 4.45 for Model 1 and 1.49 for Model 2. However, this efficiency was not uniform across all states. The average ESS per second of the states simulated according to the second mixture component was 0.25–0.41 for Model 1 and 0.004–0.008 for Model 2. In contrast, where convergent, the PMPMH algorithm was more robust to the mixture component associated with each state, with the ESS per second of states in the second mixture at least \(99\%\) of those quoted in Tables 1 and 2 (the average ESS per second of the second mixture-distributed states ranging from 0.01 to 0.24 for Model 1 and 0.01–0.06 for Model 2). We now investigate the comparative performance of the algorithm on a model that can display near-chaotic behavior.

5.2 Nicholson’s blowfly model

We consider Nicholson’s Blowfly model for chaotic population growth in Wood (2010). The population counts over time, denoted by \(N_{1:T}\), arise from two correlated survival and birth processes, denoted \(S_{1:T}\) and \(R_{\tau +1:T}\), \(\tau >0\), respectively. Following Wood (2010), we let \(\exp (-\delta \epsilon _t)\) denote the daily survival probability with associated environmental error term \(\epsilon _t\), such that \(\epsilon _t \sim \Gamma (\beta _{\epsilon }, \beta _{\epsilon })\), \(\beta _{\epsilon }>0\). The survival component of the system process is given for \(t=1, \dots , T\) by

$$\begin{aligned} S_t \sim Binom(N_{t-1}, \exp (-\delta \epsilon _t)). \end{aligned}$$

Letting \(e_t\) denote an environmental noise term in the reproductive process such that \(e_t \sim \Gamma (\beta _e, \beta _e)\), \(\beta _e>0\), the reproductive component of the system process is given for \(t=\tau +1, \dots , T,\) by

$$\begin{aligned} R_t \sim Po\left( PN_{t-\tau -1} \exp (-\frac{N_{t-\tau -1}}{N_0}) e_t\right) , \end{aligned}$$
Fig. 3 Simulated blowfly population count data using \(T=300\) and \(\pmb {\theta }=(0.7, 50, 1, 0.1, 1)\)

where \(N_t = S_t + R_t\) for \(t=\tau +1, \dots , T\) and \(N_t=S_t\) for \(t=1, \dots , \tau \). We let \(\tau =5\) be the known time lag between population count and subsequent birth count, and \(N_0=50\) be the known initial population count. Further, \(\epsilon _{1:T}\) and \(e_{\tau +1:T}\) are known. The survival and birth processes, \(S_{1:T}\) and \(R_{\tau +1:T}\), are unknown latent states, with observed population counts \(y_t \sim Po(\phi N_t)\), for all t. Finally, the model parameters \(\pmb {\theta }=(\delta , P, \beta _\epsilon , \beta _e, \phi )\) are unknown and positively valued. Figure 3 shows the simulated data used in this case study, \(y_{1:T}\), using \(\pmb {\theta }=(0.7, 50, 1, 0.1, 1)\) and \(T=300\).
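For reference, a sketch simulating from this model is given below, assuming the Gamma terms are parameterized by shape and rate so that \(\epsilon _t\) and \(e_t\) have mean one; the argument order follows \(\pmb {\theta }=(\delta , P, \beta _\epsilon , \beta _e, \phi )\).

```python
import numpy as np

def simulate_blowflies(T, delta, P, beta_eps, beta_e, phi, tau=5, N0=50, rng=None):
    """Simulate Nicholson's blowfly model.

    e.g. the data in Fig. 3: simulate_blowflies(300, 0.7, 50, 1, 0.1, 1)
    """
    if rng is None:
        rng = np.random.default_rng()
    eps = rng.gamma(beta_eps, 1.0 / beta_eps, size=T)      # survival noise epsilon_t, mean one
    e = rng.gamma(beta_e, 1.0 / beta_e, size=T)            # reproductive noise e_t, mean one
    N = np.empty(T + 1, dtype=np.int64)
    S = np.empty(T, dtype=np.int64)
    R = np.zeros(T, dtype=np.int64)
    N[0] = N0
    for t in range(T):                                     # index t corresponds to time t + 1
        S[t] = rng.binomial(N[t], np.exp(-delta * eps[t]))
        if t >= tau:                                       # births only after the time lag
            lam = P * N[t - tau] * np.exp(-N[t - tau] / N0) * e[t]
            R[t] = rng.poisson(lam)
        N[t + 1] = S[t] + R[t]
    y = rng.poisson(phi * N[1:])                           # observed counts y_t ~ Po(phi * N_t)
    return N[1:], S, R, y
```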

5.2.1 PMPMH implementation

We devise a Metropolis–Hastings scheme targeting the joint posterior distribution of the unknown latent states and model parameters, \(p(S_{1:T}, R_{\tau +1:T}, \pmb {\theta } \vert y_{1:T})\). The model parameters are updated from their full conditional distribution, \(p(\pmb {\theta } \vert S_{1:T}, R_{\tau +1:T}, y_{1:T})\), using the priors and sampling schemes given in Appendix A. In this section, we describe how the PMPMH algorithm can efficiently update the latent states, \(S_{1:T}\) and \(R_{\tau +1:T}\), conditional on \(\pmb {\theta }\).

First, the state process is now two-dimensional at each time point. To simplify the design of the grid used within the PMPMH algorithm, we build separate PMPMH proposal distributions targeting the full conditional distributions in each state dimension, \(p(S_{1:T} \vert R_{\tau +1:T}, y_{1:T}, \pmb {\theta })\) and \(p(R_{\tau +1:T} \vert S_{1:T}, y_{1:T}, \pmb {\theta })\), respectively. To sample from each of the full conditional distributions in turn, we keep a number of the practical decisions the same as in the previous example of Sect. 5.1. We first fix several of the practical decisions to those in Appendix B and use blocks of size \(\ell =4\), overlapping blocks by one state to reduce the correlation between states at the block boundaries. We test the performance of the algorithm for \(N \in \{10, 20, 50\}\) grid cells.

There are two main differences in this implementation compared to the previous example. The first accounts for the near-chaotic state processes and the large range of the data (\(1.59 \times 10^{5}\) units). We apply the current state-centered Approach 3 of Sect. 4, which requires fewer grid cells for good mixing properties due to the lower span for the finite cells required to achieve efficient (more local) moves. Within this approach, we permit large variability around large values of the state process by adjusting the variance of the Gaussian distributions used to set the boundaries of the grid cells: we set the variance proportional to the current state at each time point. We try factors of proportionality of 0.1, 0.25, 0.5, 1, with finite grid cells between the q and \(1-q\) quantiles for \(q=0.01, 0.1, 0.2\), resulting in (average) spans for the finite cells ranging from \(\mathcal {S}=52\) to \(400\) units. The second difference is that we ensure the grid cells are over a discrete space bounded at zero by rounding the quantiles used to determine the grid cells, setting the lower boundary to zero if needed.
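As an illustration of this grid construction, the cell boundaries for one count-valued state dimension might be computed as in the sketch below; the floor on the variance is an illustrative safeguard, and a full implementation would merge any duplicate boundaries created by the rounding.

```python
import numpy as np
from scipy import stats

def count_state_grid(x_cur, N, c, q):
    """Sketch of the Sect. 5.2 grid for one count-valued state dimension.

    Boundaries are Gaussian quantiles between q and 1 - q, centred on the
    current state with variance c * x_cur, rounded to integers and bounded
    below by zero.
    """
    x_cur = np.asarray(x_cur, dtype=float)
    probs = np.linspace(q, 1.0 - q, N - 1)                 # N - 1 finite boundary levels
    sd = np.sqrt(np.maximum(c * x_cur, 1.0))               # floor of 1 avoids zero variance (illustrative)
    inner = np.rint(stats.norm.ppf(probs)[None, :] * sd[:, None] + x_cur[:, None])
    inner = np.maximum(inner, 0.0)
    grid = np.empty((len(x_cur), N + 1))
    grid[:, 0], grid[:, -1] = 0.0, np.inf                  # state space bounded below at zero
    grid[:, 1:-1] = inner
    return grid
```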

5.2.2 Results

We assessed the performance of the PMPMH algorithm using 10 independent MCMC simulations of 50,000 iterations and present the performance metrics in Table 3. Where the implementations of the PMPMH algorithm converged, convergence was achieved within \(4000-18{,}000\) iterations, taking \(11-27\) hours on a 1.6 GHz CPU. Fairly consistent and efficient performance (in terms of ESS per second, Table 3) is achieved using \(20-50\) grid cells with the finite cells covering a region of the state space \(0.05-0.25\%\) of the range of the data (\(\mathcal {S} \in [82, 400]\)).

Table 3 Average ESS and ESS/s across the state and parameter samples for various sets of tuning parameters under Approach 3 with blocks of size \(\ell =4\)

Approach 3 mixes effectively by producing a reasonable HMM approximation in a relatively small region around the current state and is also computationally cheap to implement. The efficiency of the algorithm decreases at the upper limit of 50 grid cells, indicating that the improved mixing properties are not justified by the extra cost compared with using fewer grid cells. Conversely, 10 grid cells, although lower in computational cost, do not produce a sufficiently accurate approximation to the posterior in the region of the current state. We also note that, similarly to the previous case study, implementations relying on extremely small moves around the state space (finite cells \(<0.05\%\) of the range of the data) gave poor mixing and convergence properties.

The PGAS sampler, when applied to the model using both joint updating of the latent states and updating each latent state process in turn from its full conditional distribution, did not converge in 50,000 iterations with 1000 particles, remaining in a range of the state space unrepresentative of the posterior due to sample impoverishment and taking 160 hours on a 1.6 GHz CPU.

6 Discussion

We provide a novel and efficient approach for fitting general SSMs to observed data. The approach uses tractable HMM approximations to efficiently update the unobserved latent states in a Metropolis–Hastings algorithm. We demonstrate the generality of the proposed approach by its application to two problems, including a challenging near-chaotic problem. The proposed PMPMH algorithm is demonstrated to provide reliable posterior estimates within reasonable computational time frames, especially when compared to a state-of-the-art method, which did not converge within a reasonable time frame for the near-chaotic problem. The flexibility of the PMPMH approach via its tuning parameters, including the location, size, and number of grid cells, and the temporal block size, provides an adaptable and efficient algorithm. Using simple methods for placing the grid cells, such as equally-sized grid cells, can be efficient since they are computationally cheap. However, such simple methods may require a large number of grid cells to achieve good mixing if the range of the data is large. A current state-centered approach for approximating the HMM is useful, especially when the state space covers a large range, whereas a data-centered approach is useful provided the range of the data is small.

The PMPMH approach motivates several interesting points for future research. One avenue for research is to consider the computational efficiency of parallelization when applied to different components of the algorithm. The computational efficiency of algorithms using block-updating strategies can be improved via parallelization (King et al. 2023). Within the framework of the PMPMH algorithm specifically, serially independent blocks of states can be updated in parallel within each Metropolis–Hastings iteration. Further, for the state-centered grid cell approach presented here, computing the approximate HMM probabilities in each block in parallel can lead to an improvement in the execution time of the approximate HMM computations. However, as with parallelization schemes for particle MCMC methods, such as in Henriksen et al. (2012), the user must balance memory limitations and the computational cost of re-synchronization against the computational gains from parallel implementation. Within the sequential computing context of this paper, one way to reduce the overall computational cost of the algorithm is to reduce the number of times the HMM approximation is calculated across iterations. This is valid for approximation methods that do not depend on the current state or model parameters. For deterministic grids that adapt to previous samples (for example, Approaches 2 and 3), however, the resulting posterior estimate may become biased since the chain is no longer Markovian (Haario et al. 1999). Haario et al. (1999) also show that the bias introduced is negligible in some applications and that unbiased samples can be obtained by introducing some pre-determined stopping criterion for updating the grid, that is, specifying a number of iterations beyond which the grid is no longer updated. The computational gains from updating the HMM approximation less frequently should, however, be balanced against potentially reduced mixing properties. For example, local-move grids using a low span for finite cells are more sensitive to the frequency of the HMM approximation since they rely on proposed moves being made in the region of the current state. Updating the grid less frequently may nevertheless give an efficient approach when applied to more global-move samplers.

One may also attempt to find efficient ways to adapt the HMM approximations. With any such adaptations, there is a trade-off between the improved mixing properties and the computational cost arising from the complexity of the method. For example, in Step 1 of the algorithm, more complex numerical integration methods could be used to approximate the SSM density in each grid cell. Such methods include increasing the number of points in each grid cell used for approximation calculations, simple linear functions joining evaluations at midpoints, or other fast evaluations that include the gradient in each grid cell. There may also be ways to improve the efficiency of the within-cell proposal distributions in Step 2, including approaches that approximate the posterior density in each grid cell where the conditional posteriors are irregular or vary in an unsystematic way depending on the value of the parameter. These approaches could be valuable to explore for more complex SSMs, in which the improved mixing may justify the added computational cost. Further, these approaches to improving mixing in each step may permit the use of fewer grid cells, decreasing the overall computational cost, which may also permit greater scalability of the HMM to higher dimensional latent spaces. Although we demonstrated the scalability of the proposed algorithm to higher dimensional spaces by updating each state dimension conditionally on the remaining state dimensions, higher dimensional spaces are a particular challenge if individual state dimensions are highly correlated, resulting in poor mixing. In this case, if low-dimensional sets of state dimensions are independent, for example, Factorial SSMs (Ghahramani and Jordan 1995; Rimella and Whiteley 2022), or if dimensions of the state space are partially integrable (Borowska and King 2023), the poor mixing from highly-correlated state dimensions could be improved by performing joint updates using lower-dimensional transition matrix approximations. However, in general, the issue of the scalability of grid-based methods to high-dimensional SSMs is a challenge to the proposed algorithm and an active area of research.