A rare event approach to high-dimensional approximate Bayesian computation
Abstract
Approximate Bayesian computation (ABC) methods permit approximate inference for intractable likelihoods when it is possible to simulate from the model. However, they perform poorly for high-dimensional data and in practice must usually be used in conjunction with dimension reduction methods, resulting in a loss of accuracy which is hard to quantify or control. We propose a new ABC method for high-dimensional data based on rare event methods which we refer to as RE-ABC. This uses a latent variable representation of the model. For a given parameter value, we estimate the probability of the rare event that the latent variables correspond to data roughly consistent with the observations. This is performed using sequential Monte Carlo and slice sampling to systematically search the space of latent variables. In contrast, standard ABC can be viewed as using a more naive Monte Carlo estimate. We use our rare event probability estimator as a likelihood estimate within the pseudo-marginal Metropolis–Hastings algorithm for parameter inference. We provide asymptotics showing that RE-ABC has a lower computational cost for high-dimensional data than standard ABC methods. We also illustrate our approach empirically, on a Gaussian distribution and an application in infectious disease modelling.
Keywords
ABC · Markov chain Monte Carlo · Sequential Monte Carlo · Slice sampling · Infectious disease modelling

1 Introduction
Approximate Bayesian computation (ABC) is a family of methods for approximate inference, used when likelihoods are impossible or impractical to evaluate numerically but simulating datasets from the model of interest is straightforward. ABC can be viewed as a nearest neighbours method. It simulates datasets given various parameter values, and finds the closest matches, in some sense, to the observed dataset. The corresponding parameters are used as the basis for inference. Various Monte Carlo methods have been adapted to implement this idea, including rejection sampling (Beaumont et al. 2002), Markov chain Monte Carlo (MCMC) (Marjoram et al. 2003) and sequential Monte Carlo (SMC) (Sisson et al. 2009). However, it is well known that nearest neighbours approaches become less effective for higher-dimensional data, a phenomenon referred to as the curse of dimensionality. The problem is that even under the best parameter values, it is rare for a high-dimensional simulation to match a fixed target well, essentially because there are many random components all of which must be close matches to observations.
In this paper, we propose a method to deal with this issue and permit higher-dimensional data or summary statistics to be used in ABC. The idea involves introducing latent variables x. We assume data are a deterministic function \(y(\theta ,x)\), where \(\theta \) is a vector of parameters. Hence, x encapsulates all the randomness which occurs in the simulation process. Our approach is, for a particular \(\theta \) value, to use rare event methods to estimate the probability of x values occurring which produce \(y(\theta ,x) \approx y_{\text {obs}}\). As discussed later, this probability equals, up to proportionality, the approximate likelihood of \(\theta \) used in existing ABC algorithms. We estimate this probability using SMC algorithms for rare events from Cérou et al. (2012). The resulting estimates are unbiased or low bias, depending on the algorithm, and can be used by many inference methods. We concentrate on the pseudo-marginal Metropolis–Hastings algorithm (Andrieu and Roberts 2009), which outputs a sample from a distribution approximating the Bayesian posterior.
The intuition for the rare event probability estimates we use is as follows. Given \(\theta \), standard ABC methods effectively simulate one or several x values from their prior and calculate a Monte Carlo estimate of \(\Pr (y(\theta ,x) \approx y_{\text {obs}})\). The relative error of this estimate has high variance when the probability is small, as is the case when we require close matches. The rare event technique of splitting uses nested sets of latent variables \(A_1 \supset A_2 \supset \ldots \supset A_T\), representing increasingly close matches. We aim to estimate \(\Pr (A_1)\), \(\Pr (A_2 \mid A_1)\), \(\Pr (A_3 \mid A_2), \ldots \) and take the product. If these probabilities are all relatively large, then the variance of the final estimator's relative error is smaller than using a single stage of Monte Carlo [for a crude variance analysis justifying this, see L'Ecuyer et al. (2007); Cérou et al. (2012) prove more detailed results for their SMC algorithms which we summarise later]. We can estimate \(\Pr (A_1)\) using Monte Carlo with N samples. Next, we reuse the x samples with \(x \in A_1\). We sample randomly from these N times and, to avoid duplicates, perturb each appropriately. We found a good perturbation method was a slice sampling algorithm from Murray and Graham (2016). The resulting sample is used to find a Monte Carlo estimate of \(\Pr (A_2 \mid A_1)\). We carry on similarly to estimate the remaining conditional probabilities.
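As a toy illustration of the splitting idea (not the paper's RE-SMC algorithms: the thresholds here are fixed by hand and the perturbation kernel is a simple reflective random walk rather than slice sampling), consider estimating the small probability that a uniform point on \([0,1]^2\) falls within a tiny disc:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x):
    # Distance of a latent point from the "observation" at the centre.
    return np.linalg.norm(x - 0.5)

def perturb(x, eps, steps=10):
    # Reflective random-walk Metropolis kernel, invariant for the uniform
    # distribution on {x in [0,1]^2 : phi(x) <= eps}. (The paper uses
    # slice sampling instead; this simpler kernel is for illustration.)
    scale = eps / 2
    for _ in range(steps):
        prop = np.abs(x + rng.normal(0, scale, size=x.shape)) % 2
        prop = np.where(prop > 1, 2 - prop, prop)  # reflect into [0,1]
        if phi(prop) <= eps:
            x = prop
    return x

def splitting_estimate(thresholds, N=2000):
    # Estimate Pr(phi(x) <= thresholds[-1]) as a product of estimated
    # conditional probabilities Pr(A_{t+1} | A_t) for nested sets A_t.
    x = rng.random((N, 2))              # N draws from the latent prior
    p_hat = 1.0
    for eps in thresholds:
        hits = [xi for xi in x if phi(xi) <= eps]
        p_hat *= len(hits) / N
        if not hits:
            return 0.0
        # Resample survivors, then perturb to avoid duplicates.
        idx = rng.integers(len(hits), size=N)
        x = np.array([perturb(hits[i], eps) for i in idx])
    return p_hat

eps_final = 0.01
p_split = splitting_estimate([0.2, 0.05, eps_final])
p_true = np.pi * eps_final ** 2   # disc area: the exact rare event probability
print(p_split, p_true)
```

Each stage's conditional probability is moderate (here roughly 0.13, 0.06 and 0.04), so the relative error of the product is far smaller than that of naive Monte Carlo, which would need on the order of \(1/p\) samples to see a single hit.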
For this approach to work well, a small perturbation of the xs must produce a corresponding small perturbation of the ys. Hence, the mapping \(y(\theta ,x)\) must be well chosen. This requirement is explored in Sect. 6.1.
We consider two rare event SMC algorithms proposed by Cérou et al. (2012). In one, the nested sets must be fixed in advance and in the other they are selected adaptively during the algorithm. A contribution of this paper is to compare the efficiency of these algorithms within the setting of ABC. Our recommendation, discussed in Sect. 6.2, is a combination of the two approaches: a single run of the adaptive algorithm to select the nested sets, followed by using these in the fixed algorithm.
1.1 Related literature
First, we highlight the difference between our approach and ABC-SMC (Sisson et al. 2009; Moral et al. 2012). These methods find parameter values which are most likely to produce simulations closely matching the observations. We argue that for high-dimensional observations, such simulations are rare even for the best parameter values. Instead, we use SMC in a different way, to find latent variables which produce successful simulations. In Sect. 6.3, we discuss the possibility of combining these two approaches. Another method that seeks to find promising parameter values is ABC subset simulation (Chiachio et al. 2014). To our knowledge, this is the only other approach to ABC using rare event methods. Again, our approach differs from this by instead searching a space of latent variables.
The most popular approach to deal with the curse of dimensionality in ABC is dimension reduction. Here, high-dimensional datasets are mapped to lower-dimensional vectors of features, often referred to as summary statistics. The quality of a match between simulated and observed data is then judged based only on their corresponding summary vectors. However, using summary statistics involves some loss of information about the posterior which is hard to quantify. Low-dimensional sufficient statistics would avoid this problem but generally do not exist, and there are many competing methods to choose summaries which make a good trade-off between low dimension and informativeness (Blum et al. 2013; Prangle 2017). An alternative approach of Nott et al. (2014) is to improve ABC output by adjusting each parameter's margin to agree with a separate marginal ABC analysis. These analyses can each use different low-dimensional summary statistics, so that the effect of the curse of dimensionality on the margins is reduced. However, there are still issues in selecting these summaries and dealing with approximation error in the dependence structure. Recently, an extension has looked at assuming a Gaussian copula dependence structure (Li et al. 2017). More high-dimensional ABC methods are reviewed in Nott et al. (2017).
Several other authors have recently investigated latent variable approaches to ABC. Neal (2012) introduced coupled ABC for household epidemics. This simulates latent variable vectors from their prior and, for each, finds one or many parameter vectors leading to closely matching simulated datasets. These parameters, weighted appropriately, form a sample from an approximate posterior. A similar strategy is employed for more general applications in Meeds and Welling (2015)’s optimisation Monte Carlo and the reverse sampler of Forneron and Ng (2016). Alternatively, Moreno et al. (2016) perform variational inference, using latent variable vectors drawn from their prior in the estimation of loss function gradients. Another related method is Graham and Storkey (2016), who sample from the \((\theta ,x)\) space conditioned exactly on the observations using constrained Hamiltonian Monte Carlo (HMC). A limitation is that the \(y(\theta ,x)\) mapping must be differentiable with respect to both arguments.
A similar SMC approach to ours is outlined, but not implemented, by Andrieu et al. (2012). Analogous methods have been implemented for ABC inference of state space models, using ABC particle filtering to estimate likelihoods for a sequence of observations (Jasra 2015).
Targino et al. (2015) use similar methods to us in a non-ABC context. They use SMC to estimate posterior quantities for a copula model conditional on a rare event. Like us, they use increasingly rare events as intermediate targets and use slice sampling for perturbation moves. A difference is our focus on estimating the probability of the rare event, and providing results on the asymptotic efficiency of this. Also, their perturbation updates each component of x in turn with a univariate slice sampler, while we use truly multivariate updates.
1.2 Contributions and overview
We provide an approximate inference method for the same class of intractable problems as ABC. Our algorithm samples from the same family of posterior approximations as ABC, but can reach more accurate approximations for the same computational cost. In particular, its cost rises more slowly with the data dimension. Therefore, it is feasible to perform inference using a larger, and hence more informative, set of summary statistics. In some cases, it is even feasible to use the full data.
Our method differs in several ways from competing methods using latent variables. Unlike the majority of these, it does not rely solely on randomly sampling latent variables, but instead searches their space more efficiently. Also, unlike HMC approaches, we do not require differentiability assumptions for \(y(\theta ,x)\).
Typically, SMC methods have many tuning choices. Another benefit of our approach is that these can all be automated. The tuning choices required are simply those for the ABC and PMMH algorithms.
Section 2 describes background information on the methods we use. Section 3 presents our algorithm to estimate the likelihood given a particular parameter vector, and how we use this within an MCMC inference algorithm. Asymptotic results on computational cost are also given here, quantifying the improvement over standard ABC. The method is evaluated on a simple Gaussian example in Sect. 4 and used in an infectious disease application in Sect. 5. Code for these examples is available at https://github.com/dennisprangle/RareEventABC.jl. Section 6 gives a concluding discussion, including when we expect our scheme to work well. "Appendix A" contains technical details of our asymptotics.
2 Background
2.1 Approximate Bayesian computation
ABC rejection sampling is inefficient in the common situation where the prior is much more diffuse than the posterior, as a lot of time is spent on simulations that have very little chance of being accepted. Several more sophisticated ABC algorithms have been proposed which concentrate on performing simulations for \(\theta \) values believed to have high posterior density. These include versions of importance sampling, MCMC and SMC. These also output samples (sometimes weighted) from an approximation to the posterior, usually \(\pi _\text {ABC}\) as given in (1). See Marin et al. (2012) for a review of ABC, including these algorithms and related theory.
As mentioned earlier, ABC suffers from a curse of dimensionality issue. Intuitively, the problem is that simulations producing good matches of all summaries simultaneously become increasingly unlikely as \(\dim (y)\) grows. For Algorithm 1, it has been proved (Blum 2010; Barber et al. 2015; Biau et al. 2015) that for a fixed value of N the quality of the output sample as an approximation of the posterior deteriorates as d increases, even taking into account the possibility of adjusting \(\epsilon \). See Fearnhead and Prangle (2012) for heuristic arguments that the problem also applies to other ABC algorithms.
2.2 Pseudomarginal Metropolis–Hastings
The approach of this paper is to estimate the ABC likelihood (2) more accurately than standard ABC methods. This section reviews one approach for how such estimates can be used to sample from \(\pi _\text {ABC}\).
The Metropolis–Hastings (MH) algorithm samples from a Markov chain with stationary distribution proportional to an unnormalised density \(\psi (\theta )\). It is often used in Bayesian inference to produce samples from a close approximation to the posterior distribution. Despite the non-independence of these samples, they can still be used to produce highly accurate Monte Carlo estimates of functions of the posterior. Simulating \(\theta _t\), the tth state of the Markov chain, is based on sampling a state \(\theta '\) from a proposal density \(q(\theta ' \mid \theta _{t-1})\), typically centred on the preceding state \(\theta _{t-1}\). This proposal is accepted as \(\theta _t\) with probability \( \min \left( 1, \frac{\psi (\theta ') q(\theta _{t-1} \mid \theta ')}{\psi (\theta _{t-1}) q(\theta ' \mid \theta _{t-1})} \right) \). Otherwise \(\theta _t=\theta _{t-1}\).
This algorithm remains valid if likelihood evaluations are replaced with unbiased non-negative estimates as follows (Andrieu and Roberts 2009). The state of the Markov chain is now \((\theta _t, \hat{\psi }_t)\), where \(\hat{\psi }_t\) is an estimate of \(\psi (\theta _t)\), and the acceptance probability must be \(\min \left( 1,\frac{\hat{\psi }' q(\theta _{t-1} \mid \theta ')}{\hat{\psi }_{t-1} q(\theta ' \mid \theta _{t-1})}\right) .\) Crucially, upon acceptance \(\hat{\psi }_t\) is set to the estimate \(\hat{\psi }'\) for the proposal \(\theta '\). So, rather than being recalculated in every iteration, this estimate is used in all future iterations until another proposal is accepted. A version of the resulting pseudo-marginal Metropolis–Hastings (PMMH) algorithm, specialised to this paper's setting, is presented below as Algorithm 5.
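The mechanics above can be sketched in a few lines. This is not Algorithm 5: the toy simulator, the naive Monte Carlo likelihood estimate, and all names are illustrative choices of ours, and the uniform prior means the prior and proposal terms cancel in the acceptance ratio. The key PMMH feature is that the stored estimate `like` is reused, never recomputed, until a proposal is accepted.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_prior_support(theta):
    # Uniform(0, 10) prior on a scalar parameter (illustrative choice).
    return 0 < theta < 10

def estimate_likelihood(theta, y_obs, M=50, eps=0.5):
    # Unbiased Monte Carlo estimate of an ABC-style likelihood:
    # Pr(|y(theta, x) - y_obs| <= eps), with toy simulator y = theta * x.
    sims = theta * rng.random(M)
    return np.mean(np.abs(sims - y_obs) <= eps)

def pmmh(y_obs, iters=2000, step=1.0):
    theta = 5.0
    like = estimate_likelihood(theta, y_obs)   # stored estimate for current state
    chain = []
    for _ in range(iters):
        prop = theta + step * rng.normal()
        if in_prior_support(prop):
            like_prop = estimate_likelihood(prop, y_obs)
            # Acceptance ratio uses the STORED estimate for the current
            # state; recomputing it would break exactness of PMMH.
            if like > 0:
                accept = rng.random() < like_prop / like
            else:
                accept = like_prop > 0
            if accept:
                theta, like = prop, like_prop
        chain.append(theta)
    return np.array(chain)

chain = pmmh(y_obs=3.0)
print(chain.mean())
```

Despite the likelihood being estimated noisily, the chain targets exactly the posterior corresponding to the expected estimate, which is the key pseudo-marginal property.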
Optimal tuning of PMMH has been examined theoretically by Pitt et al. (2012), Doucet et al. (2015) and Sherlock et al. (2015), covering the case where each \(\hat{\psi }'\) estimate is generated by an SMC algorithm. A central issue is how many SMC particles should be used to optimise the computational efficiency of PMMH. All the authors conclude that this number should be tuned to achieve a particular variance of \(\log \hat{\psi }\) (it is assumed, unrealistically, that this variance does not depend on \(\theta \); in practice it is typical to investigate the variance at a fixed value of \(\theta \) believed to have high posterior density). The value derived for this optimal variance differs between the authors due to their different assumptions, but all values lie in the range 0.8–3.3. Sherlock et al. (2015) also investigate tuning the proposal distribution q, and suggest using proposal variance \(\frac{2.562^2}{\dim (\theta )} \Sigma \) where \(\Sigma \) is the posterior variance. They perform simulation studies generally supporting both these results. One key assumption made by all the authors is that \(\log \hat{\psi }\) follows a normal distribution. The validity of this assumption in our setting will be investigated later. It is also assumed that the computational cost of SMC is proportional to the number of particles used and does not depend on \(\theta \), which is generally true for SMC algorithms.
2.3 Rare event sequential Monte Carlo
To estimate the ABC likelihood (2) in Sect. 3, we will use two algorithms of Cérou et al. (2012) for estimating rare event probabilities using a SMC approach. This section reviews existing work on these algorithms. A few novel remarks which are relevant later are given at the end.
The aim is to estimate a small probability, \(P=\Pr (\varPhi (x) \le \epsilon \mid \theta )\). Here, x is a random variable, \(\theta \) is a vector of parameters, \(\varPhi \) maps x values to \(\mathbb {R}\), and \(\epsilon \) is a threshold. In the ABC setting of later sections, P will equal \(L_\text {ABC}(\theta ;\epsilon )\) up to proportionality. As discussed informally in Sect. 1, both algorithms act by estimating conditional probabilities \(\Pr (\varPhi (x) \le \epsilon _{k+1} \mid \theta , \varPhi (x) \le \epsilon _k)\) for a decreasing sequence of \(\epsilon \) values. In Algorithm 2 (FIXED-RE-SMC), a fixed \(\epsilon \) sequence must be prespecified. In Algorithm 3 (ADAPT-RE-SMC), the sequence is selected adaptively. Whenever we use RE-SMC without an additional prefix, we are referring to both algorithms.
Remark

1. Step 2 of ADAPT-RE-SMC selects a threshold sequence in the same way as the ABC-SMC algorithm of Moral et al. (2012). Unlike that work, however, this sequence is specialised to one particular \(\theta \) value rather than being used for many proposed \(\theta \)s.

2. In ADAPT-RE-SMC, typically \(N_{\text {acc}}\) particles are accepted so that \(I_t=N_{\text {acc}}\). However, there may be more acceptances in the final iteration or if ties in distance are possible.

3. For \(t \le T\), \(\prod _{\tau =1}^t \hat{P}_\tau \) is an upper bound on \(\hat{P}\) in either RE-SMC algorithm. This bound can be calculated during the tth iteration of the algorithms. This will be used below to terminate the algorithms early once the estimate is guaranteed to be below some prespecified bound.

4. The \(x^{(i)}_T\) values can be used for inference of \(x \mid \theta , \varPhi (x) \le \epsilon \). When this is not of interest, as in this paper, the computational cost can be reduced by omitting step 3 (resampling and Markov kernel propagation) in the final iteration of either algorithm.

5. It is possible for ADAPT-RE-SMC not to terminate. This could occur if the \(x^{(i)}_t\) particles become stuck near a mode where \(\varPhi (x) > \epsilon \) and the Markov kernel is unable to move them to other modes. In Sect. 3.2, we will discuss how our proposed method can avoid this problem by terminating once it becomes clear the final likelihood estimate will be very low.

6. When ties in the distance are possible, ADAPT-RE-SMC iterations can fail to reduce the threshold. That is, sometimes step 2 can give \(\epsilon _{t+1}=\epsilon _t\). This can produce very long run times. Possible improvements to deal with this are discussed in Sect. 6.3 (note that when ADAPT-RE-SMC is being used to select a sequence of thresholds, repeated values should be removed).

7. These algorithms use multinomial resampling. More efficient schemes exist, but are not investigated by the theoretical results of Cérou et al. (2012).
2.4 Slice sampling
We require a suitable Markov kernel to use within the RE-SMC algorithms. This must have invariant density \(\pi (x \mid \theta , \varPhi (x) \le \epsilon _{t-1})\). As discussed below in Sect. 3.2, our ABC setting will assume \(\pi (x \mid \theta )\) is uniform on \([0,1]^m\). Hence, the required invariant distribution is uniform on the subset of \([0,1]^m\) such that \(\varPhi (x) \le \epsilon _{t-1}\). We will use slice sampling as the Markov kernel. This section outlines the general idea of slice sampling and a particular algorithm. We also include some novel material on how it can be adapted to our setting and advantages over alternative choices.
Slice sampling is a family of MCMC methods to sample from an unnormalised target density \(\gamma (x)\). The general idea is to sample uniformly from the set \(\{ (x,h) : 0 \le h \le \gamma (x) \}\) and marginalise. We will concentrate on an algorithm of Murray and Graham (2016) for the case where the support of \(\gamma (x)\) is \([0,1]^m\), or a subset of this. Their algorithm updates the current state x by first drawing h from \(\text {Uniform}(0,\gamma (x))\), then proposing \(x'\) values, accepting the first one for which \(\gamma (x') \ge h\). The proposal scheme initially considers large changes from x in a randomly chosen direction, and then, if these are rejected, progressively smaller changes.
Next, we describe two advantages of using slice sampling within RE-SMC, particularly in relation to the alternative of using a Metropolis–Hastings kernel. Firstly, slice sampling requires little tuning. If tuning choices were required, for example a proposal distribution for Metropolis–Hastings, then RE-SMC would need to include rules to make a good choice automatically, which may be difficult. Another advantage of slice sampling is that each iteration outputs a unique x value. On the other hand, Metropolis–Hastings rejections can lead to duplicates, which is problematic within SMC because it leads to increased variance of probability estimates.
The only tuning choice required by Algorithm 4 is the initial search width w. A default choice is \(w=1\), but this means that the number of loops required will increase for small \(\epsilon \). To deal with this, we choose \(w=1\) in the first SMC iteration and then select w adaptively, as \(\min (1, 2 \bar{z})\) where \(\bar{z}\) is the maximum final value of z from all slice sampling calls in the previous SMC iteration. This choice generally shrinks w based on the most recent value of \(\bar{z}\), while avoiding some unwanted behaviours. Firstly, it avoids forcing w to decrease at a fixed rate, so that eventually only very small steps would be attempted. Secondly, it avoids w growing above 1, which would make slice sampling expensive when local moves are required. The effect of our choice is investigated empirically later (see Fig. 3).
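In our setting the target is the indicator of \(\{\varPhi (x) \le \epsilon\}\) on \([0,1]^m\), so the height h is redundant and acceptance reduces to checking \(\varPhi (x') \le \epsilon\). The following is a simplified sketch of such a shrinking-bracket slice update (the function names are ours, and the kernel is simpler than Algorithm 4); it also returns the final |z|, mirroring the quantity used above to adapt the width w:

```python
import numpy as np

rng = np.random.default_rng(2)

def slice_update(x, phi, eps, w=1.0, max_tries=100):
    # One slice-sampling update targeting the uniform distribution on
    # {x in [0,1]^m : phi(x) <= eps}. A simplified sketch in the spirit
    # of Murray and Graham (2016): propose along a random direction and
    # shrink the bracket towards the current point after each rejection.
    v = rng.normal(size=len(x))
    v /= np.linalg.norm(v)          # random search direction
    lo = -w * rng.random()          # bracket of width w containing 0
    hi = lo + w
    for _ in range(max_tries):
        z = rng.uniform(lo, hi)
        prop = x + z * v
        if np.all((prop >= 0) & (prop <= 1)) and phi(prop) <= eps:
            return prop, abs(z)     # final |z| can inform the next width w
        if z < 0:                   # shrink the bracket towards x
            lo = z
        else:
            hi = z
    return x, 0.0

# Usage: sample uniformly from a disc inside the unit square.
phi = lambda x: np.linalg.norm(x - 0.5)
x = np.array([0.5, 0.5])
samples = []
for _ in range(500):
    x, _ = slice_update(x, phi, eps=0.2)
    samples.append(x)
samples = np.array(samples)
print(samples.mean(axis=0))         # close to (0.5, 0.5)
```

Note that every update returns a state satisfying the constraint, and a rejection only shrinks the bracket rather than duplicating the current particle, which is the advantage over Metropolis–Hastings noted above.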
3 High-dimensional ABC
This section presents our approach to inference in the ABC setting, using the algorithms reviewed in Sect. 2. Section 3.1 describes how the RE-SMC algorithms can estimate the ABC likelihood given values of \(\theta \) and \(\epsilon \), and a latent variable structure. Such likelihood estimators can be used within several inference algorithms to produce approximate Bayesian inference. In this paper, we concentrate on PMMH. Section 3.2 presents the resulting method. Section 3.3 discusses the computational cost of the resulting RE-ABC algorithm in comparison with standard ABC, with particular note of the high-dimensional case.
Two versions of RE-ABC are possible, depending on whether likelihood estimates are produced using FIXED-RE-SMC or ADAPT-RE-SMC. We present both and compare them throughout the remainder of the paper. As explained in Sects. 3.2 and 6.2, we conclude by arguing in favour of using FIXED-RE-SMC together with an initial run of ADAPT-RE-SMC to select the \(\epsilon \) sequence.
3.1 Likelihood estimation
For now, suppose \(\theta \) and \(\epsilon >0\) are fixed. We aim to produce an unbiased estimate of \(L_\text {ABC}(\theta ; \epsilon )\), as defined in (2).
Suppose there exist latent variables x such that the observations can be written as a deterministic function \(y=y(\theta , x)\). The idea is that x and \(\theta \) suffice to specify a complete realisation of the simulation process, even including details such as observation error, and \(y(\theta , x)\) is a vector of partial observations. Neglecting \(\theta \), which is fixed for now, \(y(\theta , x)\) will be written below as simply \(y=y(x)\). See Sect. 6.1 for a discussion of properties of \(y(\theta , x)\) which help our approach work well.
We specify a density \(\pi (x \mid \theta )\) (with respect to Lebesgue measure) for the latent variables. This is part of the specification of the model, but it can also be viewed as representing prior beliefs about the latent variables. Throughout the paper, we take \(\pi (x \mid \theta )\) to be uniform on \([0,1]^m\) regardless of \(\theta \). Under this interpretation, x is a vector of m independent standard uniform random variables which suffice to carry out the simulation process.
Note that we assume \(\pi (x \mid \theta )\) to be uniform simply for convenience. Firstly, many latent variable representations can easily be re-expressed in this form. Secondly, given this assumption, the slice sampling method of Algorithm 4 is well suited to be the Markov kernel within RE-SMC. Our methodology could be adapted to use other \(\pi (x \mid \theta )\) distributions if desired. The main change needed would be to use alternative Markov kernels, for example elliptical slice sampling (see Murray and Graham 2016) for the Gaussian case, or Gibbs updates for the discrete case. These changes could well improve performance for particular applications.
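To make the uniform latent representation concrete, the Gaussian model of Sect. 4 can be written as a deterministic map \(y(\theta ,x)\) of uniform latents via inverse-CDF transforms. A minimal sketch (the function name is ours):

```python
from statistics import NormalDist

def y_from_latents(theta, x):
    # Deterministic simulator y(theta, x): y_i ~ N(0, sigma^2) re-expressed
    # as y_i = sigma * F^{-1}(x_i), where F is the standard normal CDF and
    # the x_i are independent Uniform(0, 1) latent variables.
    sigma = theta
    std_normal = NormalDist()
    return [sigma * std_normal.inv_cdf(u) for u in x]

# A small perturbation of x yields a correspondingly small change in y,
# which is the continuity property the RE-SMC search relies on.
x = [0.1, 0.5, 0.9]
print(y_from_latents(2.0, x))
```

The same device applies to most simulators: any sequence of random draws can be generated from uniforms by inverse-CDF or similar transforms, so the simulation becomes deterministic given \((\theta ,x)\).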
3.2 Inference
For FIXED-RE-ABC, the likelihood estimates are unbiased estimates of \(L_{\text {ABC}}(\theta ; \epsilon )\) up to proportionality. Therefore, the probability of acceptance in step 4 corresponds to a target density proportional to \(\pi (\theta ) L_\text {ABC}(\theta ; \epsilon )\), i.e., the standard ABC posterior (1). ADAPT-RE-ABC involves biased likelihood estimates, so it does not sample from exactly this density. However, the bias introduced is small and may have little effect compared to the efficiency benefits of the variance reduction which ADAPT-RE-SMC provides (theoretical and practical aspects of MCMC algorithms of this character are discussed in Alquier et al. 2016). We investigate this empirically in Sects. 4 and 5 and find no noticeable effect of bias. However, we find ADAPT-RE-SMC to sometimes be less computationally efficient in practice, and so we recommend using the FIXED-RE-SMC algorithm, together with a single run of ADAPT-RE-SMC to select an \(\epsilon \) sequence. Reasons for this are described shortly and discussed in more detail in Sect. 6, together with possibilities for improvement.
Earlier we commented that ADAPT-RE-SMC can fail to terminate in some situations. When ties in the distance are not possible, this is usually not a problem within RE-ABC due to the early termination rule just outlined. However, care is still required the first time ADAPT-RE-SMC is run, and when it is used in pilot runs. Ties in the distance are potentially more problematic and are discussed further in Sect. 6.3.
There are numerous tuning choices required in this PMMH algorithm. Most of these can be based on the output of a pilot analysis, for example an ABC analysis or a short initial run of PMMH. The estimated posterior mean \(\hat{\mu }\) can be used as an initial PMMH state. The estimated posterior variance \(\hat{\Sigma }\) can be used to tune the PMMH proposal density. Following the PMMH theory discussed in Sect. 2.2, we sample proposal increments from \(N \left( 0, \frac{2.562^2}{\dim (\theta )} \hat{\Sigma } \right) \) (note that the early termination rule avoids SMC calls having very long run times for some \(\theta \) values, approximately meeting the assumptions of the PMMH tuning literature). The threshold sequence for FIXED-RE-SMC can be selected by running ADAPT-RE-SMC with \(\theta =\hat{\mu }\). To select the number of particles, a few preliminary runs of FIXED-RE-SMC (or ADAPT-RE-SMC) can be performed with \(\theta =\hat{\mu }\), aiming to produce a log-likelihood variance of roughly 1. This is at the more conservative end of the range suggested by the theory reviewed earlier.
A crucial tuning choice which remains is \(\epsilon \). As in other ABC methods, we suggest tuning this pragmatically based on the computational resources available. This can be done by running ADAPT-RE-SMC with \(\theta =\hat{\mu }\) and \(\epsilon =0\) and stopping after a prespecified time, corresponding to how long is available for an iteration of PMMH. The value of \(\epsilon _t\) when the algorithm is stopped can be used as \(\epsilon \). It is still possible for the SMC algorithms to take much longer to run for other \(\theta \) values. However, the early termination rule will usually mitigate this. Diagnostic plots can be used to investigate whether the \(\epsilon \) value selected produces simulations judged to be sufficiently similar to the observations. For example, see Figure 1 of the supplementary material.
3.3 Cost
Here, we summarise results on the cost of ABC and RE-ABC in terms of time per sample produced (or effective sample size for PMMH algorithms), in the asymptotic case of small \(\epsilon \). Arguments supporting these results are given in "Appendix A". Several assumptions are required, principally that \(\pi (y \mid \theta )\) is a density with respect to Lebesgue measure—informally, the observations must be continuous. Weakening these assumptions is discussed in the supplementary material. Note that the results are the same whether FIXED-RE-ABC or ADAPT-RE-ABC is used.
The time per sample is asymptotic to \(1/V(\epsilon )\) for ABC and \([\log V(\epsilon )]^2\) for RE-ABC (see (3) for the definition of \(V(\epsilon )\)). So, asymptotically, RE-ABC has a significantly lower cost to reach the same target density. To illustrate the effect of \(D = \dim (y)\), we can consider the asymptotic case of large D (n.b. as shown in the supplementary material, when some observations are non-continuous, D can be replaced with the dimension of \(\{ y : d(y,y_{\text {obs}}) < \epsilon \}\) for small \(\epsilon \)). Under the Lebesgue assumption, (3) gives that \(V(\epsilon ) \propto \epsilon ^D\). Hence, the time per sample is asymptotic to the following expressions, written in terms of \(\tau = 1/\epsilon \) for interpretability: \(C_1 = \tau ^D\) for ABC and \(C_2 = D^2 [\log \tau ]^2 = [\log C_1]^2\) for RE-ABC. Hence, ABC has an exponential cost in D, while RE-ABC has only a quadratic cost. This makes high-dimensional inference more tractable for RE-ABC, but dimension reduction via summary statistics will remain useful in controlling the cost when D is large.
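To make the comparison concrete, the two cost expressions can be tabulated for a few data dimensions. This is an illustrative calculation only: proportionality constants are ignored, and \(\tau = 10\) is an arbitrary choice.

```python
import math

tau = 10.0                      # inverse tolerance, tau = 1 / eps
for D in [2, 5, 10, 25]:
    c_abc = tau ** D                         # C_1 = tau^D: standard ABC
    c_reabc = D ** 2 * math.log(tau) ** 2    # C_2 = D^2 (log tau)^2: RE-ABC
    print(D, c_abc, c_reabc)
```

Already at \(D=25\) the ABC cost term is of order \(10^{25}\) while the RE-ABC term is of order \(10^3\), illustrating the exponential versus quadratic growth in D.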
These results assume the algorithms are run sequentially. The PMMH stage of RE-ABC is innately sequential, but particle updates can be run in parallel, providing a benefit from parallelisation. Compared to the most efficient ABC algorithms, this is an advantage over ABC-MCMC and seems roughly comparable to that of ABC-SMC algorithms.
4 Gaussian example
In this section, we compare ABC (Algorithm 1) and RE-ABC (Algorithm 5) on a simple Gaussian model. The model is \(Y_i \sim N(0, \sigma ^2)\) independently for \(1 \le i \le 25\). We use the prior \(\sigma \sim \text {Uniform}(0, 10)\). This is an interesting test case because \(\dim (y)\) is large enough to cause difficulties for ABC methods but calculations are quick, and the results can be compared to those of likelihood-based methods.
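For reference, basic ABC rejection for this model is only a few lines. The sketch below is in the style of Algorithm 1 but is not a reproduction of it: the pseudo-observed data, the Euclidean distance, and the threshold \(\epsilon = 20\) are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(3)

y_obs = rng.normal(0, 3.0, size=25)   # pseudo-observed data, true sigma = 3

def abc_rejection(y_obs, N=20000, eps=20.0):
    # ABC rejection: simulate sigma from the Uniform(0, 10) prior, then
    # simulate data and accept sigma if the simulation lies within eps
    # of the observations in Euclidean distance.
    sigmas = rng.uniform(0, 10, size=N)
    accepted = []
    for s in sigmas:
        y_sim = rng.normal(0, s, size=25)
        if np.linalg.norm(y_sim - y_obs) <= eps:
            accepted.append(s)
    return np.array(accepted)

post = abc_rejection(y_obs)
print(len(post), post.mean())
```

Even with 25 dimensions, a fairly loose threshold is needed to accept a useful number of draws, which is exactly the curse-of-dimensionality behaviour the comparison in this section examines.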
4.1 Comparison of ABC and RE-ABC
Figure 1 shows the results. The left panel illustrates that accuracy improves as the acceptance threshold \(\epsilon \) is reduced below roughly 15, and, as expected, all methods produce very similar results. In particular, the biased likelihood estimates in ADAPT-RE-ABC have a negligible effect overall. The right panel investigates the time taken per sample by each method. For MCMC output, this is time divided by the effective sample size (the IMSE estimate of Geyer 1992). Under ABC and ABC-MCMC, time per sample increases rapidly as \(\epsilon \) is reduced. For both RE-ABC algorithms, the increase is slower, allowing smaller values of \(\epsilon \) to be investigated. Neither RE-ABC algorithm is obviously more efficient than the other. This difference between ABC and RE-ABC is consistent with the asymptotics on computational cost described in Sect. 3.3. However, for large \(\epsilon \) values ABC and ABC-MCMC are cheaper. Overall, RE-ABC permits smaller \(\epsilon \) values to be investigated at a reasonable computational cost, producing more accurate approximations.
4.2 Validity of assumptions
5 Epidemic application
Infectious disease data are often modelled using compartment models, in which members of a population pass through several stages. We will consider a model with susceptible, infectious and removed stages: the so-called SIR model (Andersson and Britton 2000). A susceptible individual has not yet been infected with the disease but is vulnerable. An infectious individual has been infected and may spread the disease to others. A removed individual can no longer spread the disease; depending on the disease, this may be due to immunity following recovery, or to death.
We will use a stochastic version of this model based on a continuous-time stochastic process \(\{S(t), I(t): t \ge 0\}\) for the numbers susceptible and infectious at time t. The total population size is fixed at n, so the number removed at time t can be derived as \(R(t)=n-S(t)-I(t)\). The initial conditions are \((S(0), I(0)) = (n-1, 1)\). Two jump transitions are possible: infection \((i,j) \mapsto (i-1,j+1)\) and removal \((i,j) \mapsto (i,j-1)\). The simplest version of the model is Markovian and is defined by the instantaneous hazard functions of the two transitions, which are \(\frac{\lambda }{n} S(t) I(t)\) for infection and \(\gamma I(t)\) for removal. The unknown parameters are \(\lambda \), controlling infection rates, and \(\gamma \), the removal rate. A goal of inference is often to learn about the basic reproduction number \(R_0 = \lambda / \gamma \). This is the expected number of further infections caused by an initial infected individual in a large susceptible population. When \(R_0<1\), most epidemics will infect an insignificant proportion of a large population. Many variations on the Markovian SIR model are possible, some of which are outlined below.
Likelihood-based inference is straightforward for fully observed data from an SIR model. However, in practice only partial and possibly noisy observations of removal times are available, producing an intractable likelihood. For many models, near-exact inference is possible by MCMC methods (summarised by McKinley et al. 2014), but small changes to the model details require new, model-specific algorithms. Approximate inference can be performed by ABC (summarised by Kypraios et al. 2016), which is more adaptable but does not scale well to high-dimensional data. Here, we illustrate how RE-ABC can, without modification, perform inference for several variations on the SIR model, and do so more efficiently than standard ABC methods. As we concentrate on a classic and well-studied dataset, our analysis does not provide any novel subject-area insights.
Section 5.1 describes a method of simulating from SIR models. Section 5.2 discusses the distance function we use to implement RE-ABC. Data analysis is performed in Sect. 5.3.
5.1 Sellke construction
The Sellke construction (Sellke 1983) for an SIR model provides an appealing way to simulate epidemic models. It introduces latent infectious periods \(g_i \sim F_{\text {inf}}\) and pressure thresholds \(p_i \sim F_{\text {press}}\) for \(1 \le i \le n\), all independent. For the Markovian SIR model, \(F_{\text {inf}}\) is \(\text{Exp}(\gamma )\) and \(F_{\text {press}}\) is \(\text{Exp}(1)\), but other choices are possible and may be more biologically plausible. We condition on \(p_1=0\) so that the first infection occurs at time 0. Algorithm 6 shows how these variables and the parameter \(\lambda \) are converted to simulated removal times. To use slice sampling, we require the latent variables to be uniformly distributed a priori; therefore, we use the quantiles of the \(g_i\)s and \(p_i\)s as the latent variables.
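A minimal sketch of the Sellke construction follows. This is our own illustration of the idea rather than a reproduction of Algorithm 6: infection pressure accrues at rate \(\lambda I(t)/n\) per susceptible, individual i is infected when the accumulated pressure passes the i-th smallest threshold, and the first threshold is set to zero so that the epidemic starts at time 0.

```python
import heapq
import random

def sellke_sir(n, lam, gamma, rng=random.Random(0)):
    """Sellke construction for the Markovian SIR model (sketch).

    Infectious periods g ~ Exp(gamma); pressure thresholds ~ Exp(1), sorted,
    with the first set to 0. Returns the removal times in increasing order."""
    g = [rng.expovariate(gamma) for _ in range(n)]
    thresholds = [0.0] + sorted(rng.expovariate(1.0) for _ in range(n - 1))
    t, pressure, k = 0.0, 0.0, 1       # k = number infected so far
    heap = [g[0]]                       # pending removal times (min-heap)
    removals = []
    while heap:
        rate = lam / n * len(heap)      # pressure accrual rate while I constant
        if k < n:
            t_inf = t + (thresholds[k] - pressure) / rate
        else:
            t_inf = float("inf")
        if t_inf < heap[0]:             # next event: k-th infection
            pressure, t = thresholds[k], t_inf
            heapq.heappush(heap, t + g[k])
            k += 1
        else:                           # next event: a removal
            t_new = heapq.heappop(heap)
            pressure += rate * (t_new - t)
            t = t_new
            removals.append(t)
    return removals

removals = sellke_sir(n=120, lam=0.15, gamma=0.1)
```

Storing the pending removal times in a heap gives the \(O(n \log n)\) cost discussed below.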
The cost of Algorithm 6 is \(O(n \log n)\), where n is the population size. This is because the main loop runs at most \(2n-1\) times and involves finding the minimum of a set of up to \(n-1\) pending removal times, which requires \(O(\log n)\) steps if the set is stored as an ordered structure (adding a new item then also costs \(O(\log n)\)).
Alternative simulation methods exist, principally the Gillespie algorithm (described in Kypraios et al. 2016, for example). Here, the latent variables form a sequence controlling the behaviour of each successive jump event. The Gillespie algorithm has the advantage of O(n) cost. However, it seems hard for slice sampling to explore the resulting space of latent variables, due to the behaviour of the mapping \(y(\theta , x)\). In particular, a small change in latent variables which alters the type of one jump will typically have a large and unpredictable effect on all the subsequent jumps. For more discussion of desirable properties of \(y(\theta , x)\), see Sect. 6.
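For comparison, here is a minimal Gillespie-style simulator for the Markovian SIR model (the standard algorithm with illustrative parameter values, not code from the paper):

```python
import random

def gillespie_sir(n, lam, gamma, rng=random.Random(0)):
    """Gillespie simulation of the Markovian SIR model.

    Competing hazards: infection lam/n * S * I and removal gamma * I.
    Runs until no infectives remain; returns the removal times."""
    S, I, t = n - 1, 1, 0.0
    removals = []
    while I > 0:
        inf_rate = lam / n * S * I
        rem_rate = gamma * I
        total = inf_rate + rem_rate
        t += rng.expovariate(total)            # waiting time to next event
        if rng.random() * total < inf_rate:
            S, I = S - 1, I + 1                # infection event
        else:
            I -= 1                             # removal event
            removals.append(t)
    return removals

removals = gillespie_sir(n=120, lam=0.15, gamma=0.1)
```

Note that each event consumes fresh randomness in sequence, which illustrates why perturbing the latent variables of one early jump can reshuffle every subsequent jump.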
5.2 Distance function
Approximate posterior estimates of the basic reproduction number \(R_0\) and of the means and standard deviations of pressure thresholds and infectious periods for the Abakaliki data under three models, computed using FIXED-RE-ABC. Entries are posterior means, with posterior standard deviations in parentheses

Model                         \(R_0\)       Pressure thresholds          Infectious period
                                            Mean         SD              Mean        SD
5-day bins                    1.16 (0.30)   0.11 (0.03)  0.11 (0.03)     11.1 (3.0)  11.1 (3.0)
Gamma infectious period       1.18 (0.24)   0.09 (0.03)  0.09 (0.03)     13.6 (3.8)  6.8 (2.2)
Weibull pressure thresholds   –             0.10 (0.04)  0.11 (0.03)     12.4 (3.3)  12.4 (3.3)
5.3 Analysis of Abakaliki data
The Abakaliki dataset contains the times between removals from a smallpox epidemic in which 30 individuals were infected from a closed population of 120. It has been studied by many authors under many variations of the basic SIR model. We study three models. The first uses a Gamma\((k, \gamma )\) infectious period (similar to Neal and Roberts 2005). The second assumes the pressure thresholds follow a Weibull(k, 1) distribution (as in Streftaris and Gibson 2012). The third is the Markovian SIR model, but with removal times only recorded within 5-day bins. This is realised by altering the \(s_{\text {obs}, (i)} - s_{(i)}\) term (the difference between the simulated and observed day of removal) in (5) to \(f(s_{\text {obs}, (i)}) - f(s_{(i)})\), where \(f(s) = 5\lfloor s/5 \rfloor \) is the greatest multiple of 5 less than or equal to s. In each model, there are two or three unknown parameters: \(\lambda \), controlling infection rates; \(\gamma \), the infectious period scale; and k, a shape parameter. These are all assigned independent exponential prior distributions with rate 0.1, representing weakly informative prior beliefs that large values are less likely.
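The 5-day binning used in the third model amounts to a one-line function (our own illustration):

```python
import math

def f(s):
    """Greatest multiple of 5 less than or equal to s, i.e. the 5-day bin."""
    return 5 * math.floor(s / 5)

# Removal days falling in the same bin become indistinguishable:
assert f(12.3) == f(10.0) == 10
assert f(15.0) == 15
```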
Figure 1 of the supplementary material shows simulated epidemics from each model. It shows that our choice of \(\epsilon \) produces epidemics reasonably close to the observed data for every model. Formal model choice is not straightforward in our framework (see the discussion in Sect. 6), but it is easy to explore whether the models produce large differences in log-likelihood. In this case, the differences were modest, as shown by Figure 2 in the supplementary material, and within what would be explained, using BIC-type arguments, by the differing numbers of parameters in the models. So we conclude qualitatively that there are no clear differences in fit between the models.
ADAPT-RE-ABC was also tried and returned parameter inference results extremely similar to those for FIXED-RE-ABC; see Table 1 in the supplementary material. This shows that, as in Sect. 4, the bias in its likelihood estimates has a negligible effect on the final results. However, for some analyses the run times were longer. For example, the Gamma infectious period model took 263 minutes under FIXED-RE-ABC and 323 minutes under ADAPT-RE-ABC. Figure 5 investigates this in more detail. It shows that the run time difference arises because most calls to RE-SMC terminate early, and these are generally quicker under FIXED-RE-SMC. It is also interesting that ADAPT-RE-SMC is typically faster for completed RE-SMC calls. These findings are discussed in the next section.
We also ran ABC-MCMC for comparison, using the same MCMC and \(\epsilon \) tuning choices as for RE-ABC. For run times of comparable length to RE-ABC, ABC-MCMC produced too few acceptances to calculate effective sample sizes accurately. Instead, we consider the time per acceptance. For ABC-MCMC this was at least 12 minutes for all models; for RE-ABC it was always less than 2 minutes.
6 Discussion
We have presented a method for approximate inference under an intractable likelihood when simulation of data is possible. It uses the same posterior approximation as ABC, (1), which is controlled by a tuning parameter \(\epsilon \). The advantage of our method is that smaller values of \(\epsilon \) can be achieved for the same computational cost, resulting in more accurate inference. We have shown this is the case through asymptotics (Sect. 3.3) and empirically (Sects. 4, 5). This increased accuracy allows higher-dimensional data or summary statistics to be analysed in practice.
6.1 Latent variable considerations

Evaluating \(y(\theta ,x)\) is reasonably cheap.

Sets of the form \(\{ x : d(y_{\text {obs}},y(\theta ,x)) \le \epsilon \}\) are easy to explore using slice sampling. This would be difficult for sets made up of many disconnected components, or which are lower-dimensional manifolds. Smoothness of y with respect to changes in x helps meet this condition.
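The kind of update involved can be sketched with a coordinate-wise shrinkage slice sampler on the uniform latent space. This is a generic sketch rather than the paper's exact algorithm; `accept` stands for the test \(d(y_{\text{obs}}, y(\theta, x)) \le \epsilon\).

```python
import random

def shrinkage_slice_update(x, i, accept, rng=random.Random(0)):
    """Update coordinate i of latent vector x (uniform on [0,1]^q) by
    shrinkage slice sampling, staying inside the region where accept() holds.

    Rejected proposals shrink the bracket towards the current point, which
    is itself in the region, so the loop terminates with probability 1."""
    lo, hi = 0.0, 1.0
    while True:
        prop = rng.uniform(lo, hi)
        candidate = x[:i] + [prop] + x[i + 1:]
        if accept(candidate):
            return candidate
        if prop < x[i]:
            lo = prop
        else:
            hi = prop
```

Because the bracket shrinks onto the current point, the update cannot get stuck; but, as noted above, it only explores well when the acceptance region is connected and full-dimensional.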
6.2 Adaptive and nonadaptive algorithms
The RE-ABC algorithm can use RE-SMC with a fixed \(\epsilon \) sequence (FIXED-RE-SMC) or one that is chosen adaptively (ADAPT-RE-SMC). FIXED-RE-SMC provides unbiased estimates of the ABC likelihood, as required by the PMMH algorithm, while ADAPT-RE-SMC has a small bias. In practice, we observe very little difference in the posterior results between the two algorithms, suggesting that this bias has a negligible effect. We also note that, if desired, a bias correction approach from Cérou et al. (2012) could be applied.
Nonetheless, we recommend using the FIXED-RE-SMC algorithm within RE-ABC (together with a pilot run of ADAPT-RE-SMC to choose the \(\epsilon \) sequence). The main reason is that it is faster to run in practice, as found in Sect. 5. Figure 5 shows that this is because FIXED-RE-SMC can terminate more quickly for poor proposed \(\theta \) values. Interestingly, in the iterations where early termination is not required, ADAPT-RE-SMC is slightly quicker; we speculate that this is because it often finds a shorter \(\epsilon \) sequence. Furthermore, the theory of Cérou et al. (2012) suggests that ADAPT-RE-SMC produces less variable ABC likelihood estimates, which would improve PMMH efficiency. Therefore, there may be some scope for a more efficient RE-SMC algorithm which combines the best features of the adaptive and non-adaptive approaches.
6.3 Possible extensions
More efficient \(\epsilon \) sequence adaptation ADAPT-RE-ABC adapts the \(\epsilon \) sequence for each \(\theta \) value separately. One alternative is to instead update the sequence based on information from SMC runs at previous \(\theta \) values used by PMMH. This could be done using stochastic approximation (see, e.g., Andrieu and Thoms 2008; Garthwaite et al. 2016), with the aim of making the \(\hat{P}_t\) values in Algorithm 2 as similar as possible, which minimises the asymptotic variance of the likelihood estimates, as discussed in Sect. 2.3. The result would be an adaptive MCMC algorithm, and it may be theoretically challenging to prove that it has desirable convergence properties (Andrieu and Thoms 2008).
Joint exploration of \((\theta ,x)\) Many \(\theta \) values proposed by RE-ABC are rejected after calculating an expensive likelihood estimate. An appealing alternative is to update the parameters \(\theta \) conditional on sampled x values, for example through a Gibbs sampler with state \((\theta ,x)\). Unfortunately, in exploratory analyses of such methods, we found the \(\theta \) updates generally did not mix well. The reason is that x is much more informative about \(\theta \) than the observations \(y_{\text {obs}}\) are, which results in small \(\theta \) moves relative to the posterior's scale.
Alternatively, one could consider nesting an SMC algorithm to explore x within one that explores \(\theta \), following Chopin et al. (2013) and Crisan and Miguez (2016). Exploring \(\theta \) could proceed by reducing \(\epsilon \) at each iteration. This might avoid the time penalty of ADAPT-RE-SMC when used in PMMH, discussed in Sect. 6.2.
Discrete data RE-SMC can struggle if there is a discrete data variable \(x^*\). It can be hard for SMC to move from accepting a set of latent variables A to another set \(A'\) in which the range of possible \(x^*\) values is smaller, because \(\Pr (x \in A' \mid x \in A, \theta )\) may be very small. The issue is particularly obvious for ADAPT-RE-SMC, as the \(\epsilon \) sequence may fail to move below some threshold for a large number of iterations; for FIXED-RE-SMC, it instead results in high-variance likelihood estimates. In Sect. 5.2, this problem occurs for \(\nu \), the number of removals. There we adopt an application-specific solution by introducing continuous latent variables (pressure thresholds) into the distance function (5). It would be useful to investigate more general solutions from the rare event literature (e.g., Walter 2015). Despite these potential issues, RE-ABC can perform well with discrete data in practice, for example in the binned data model of Sect. 5.3.
Non-uniform ABC kernels In this paper, the ABC likelihood (2) is a convolution of the exact likelihood and a uniform kernel (4). Alternative kernel functions have also been used in ABC (e.g., Wilkinson 2013), such as the Gaussian kernel \(k(y;\epsilon ) \propto \exp [-\tfrac{1}{2 \epsilon ^2} d(y, y_{\text {obs}})^2]\). RE-ABC could easily be adapted to make use of these, but it is not clear what effect this would have on our asymptotic results.
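The two kernels can be compared directly. This is a generic illustration (with the Gaussian kernel written on squared distance, up to normalisation) rather than a specific choice from the paper:

```python
import math

def uniform_kernel(dist, eps):
    """Uniform ABC kernel: indicator of the eps-ball (up to normalisation)."""
    return 1.0 if dist <= eps else 0.0

def gaussian_kernel(dist, eps):
    """Gaussian ABC kernel: exp(-d^2 / (2 eps^2)), up to normalisation."""
    return math.exp(-dist ** 2 / (2 * eps ** 2))

# The Gaussian kernel downweights distant simulations smoothly instead of
# discarding them outright:
print(uniform_kernel(1.5, 1.0), gaussian_kernel(1.5, 1.0))
```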
Estimating log-likelihood gradients Where log-likelihood gradients can be estimated, they allow more efficient inference schemes based on stochastic gradient descent (Poyiadjis et al. 2011) or MCMC (Dahlin et al. 2015). Estimating such gradients from SMC algorithms is possible using the Fisher identity (Poyiadjis et al. 2011). However, the calculation would involve evaluating \(\nabla _\theta y(\theta ,x)\), which may be demanding for complicated y functions. Moreno et al. (2016) use automatic differentiation to evaluate this for some models. Alternatively, Andrieu et al. (2012) propose using infinitesimal perturbation analysis methods. It would be interesting to use either approach with RE-ABC.
Model choice A desirable extension to RE-ABC would be methods for model choice. Possible ways to extend our PMMH approach include reversible jump MCMC or a deviance information criterion; see Chkrebtii et al. (2015) and François and Laval (2011) for versions of these methods in the ABC context. Alternatively, it may be more fruitful to use our likelihood estimate in algorithms which directly output model evidence estimates, such as importance sampling or population Monte Carlo (Cappé et al. 2004).
Acknowledgements
We thank Chris Sherlock for suggesting the use of slice sampling and Andrew Golightly for helpful discussions.
References
 Alquier, P., Friel, N., Everitt, R., Boland, A.: Noisy Monte Carlo: convergence of Markov chains with approximate transition kernels. Stat. Comput. 26(1), 29–47 (2016)
 Andersson, H., Britton, T.: Stochastic Epidemic Models and Their Statistical Analysis. Springer, Berlin (2000)
 Andrieu, C., Doucet, A., Lee, A.: Contribution to the discussion of Fearnhead and Prangle (2012). J. R. Stat. Soc. B 74, 451–452 (2012)
 Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37(2), 697–725 (2009)
 Andrieu, C., Thoms, J.: A tutorial on adaptive MCMC. Stat. Comput. 18(4), 343–373 (2008)
 Barber, S., Voss, J., Webster, M.: The rate of convergence for approximate Bayesian computation. Electron. J. Stat. 9, 80–105 (2015)
 Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035 (2002)
 Biau, G., Cérou, F., Guyader, A.: New insights into approximate Bayesian computation. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques 51(1), 376–403 (2015)
 Blum, M.G.B.: Approximate Bayesian computation: a nonparametric perspective. J. Am. Stat. Assoc. 105(491), 1178–1187 (2010)
 Blum, M.G.B., Nunes, M.A., Prangle, D., Sisson, S.A.: A comparative review of dimension reduction methods in approximate Bayesian computation. Stat. Sci. 28, 189–208 (2013)
 Cappé, O., Guillin, A., Marin, J.M., Robert, C.P.: Population Monte Carlo. J. Comput. Graph. Stat. 13(4), 907–929 (2004)
 Cérou, F., Del Moral, P., Furon, T., Guyader, A.: Sequential Monte Carlo for rare event estimation. Stat. Comput. 22(3), 795–808 (2012)
 Chiachio, M., Beck, J.L., Chiachio, J., Rus, G.: Approximate Bayesian computation by subset simulation. SIAM J. Sci. Comput. 36(3), A1339–A1358 (2014)
 Chkrebtii, O.A., Cameron, E.K., Campbell, D.A., Bayne, E.M.: Transdimensional approximate Bayesian computation for inference on invasive species models with latent variables of unknown dimension. Comput. Stat. Data Anal. 86, 97–110 (2015)
 Chopin, N., Jacob, P.E., Papaspiliopoulos, O.: \(\text{SMC}^2\): an efficient algorithm for sequential analysis of state space models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 75(3), 397–426 (2013)
 Crisan, D., Miguez, J.: Nested particle filters for online parameter estimation in discrete-time state-space Markov models. arXiv:1308.1883 (2016)
 Dahlin, J., Lindsten, F., Schön, T.B.: Particle Metropolis–Hastings using gradient and Hessian information. Stat. Comput. 25(1), 81–92 (2015)
 Del Moral, P., Doucet, A., Jasra, A.: An adaptive sequential Monte Carlo method for approximate Bayesian computation. Stat. Comput. 22(5), 1009–1020 (2012)
 Doucet, A., Pitt, M.K., Deligiannidis, G., Kohn, R.: Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102(2), 295–313 (2015)
 Fearnhead, P., Prangle, D.: Constructing summary statistics for approximate Bayesian computation: semi-automatic ABC. J. R. Stat. Soc. B 74, 419–474 (2012)
 Forneron, J.J., Ng, S.: A likelihood-free reverse sampler of the posterior distribution. In: González-Rivera, G., Hill, R.C., Lee, T.H. (eds.) Essays in Honor of Aman Ullah, pp. 389–415. Emerald Group Publishing Limited (2016)
 François, O., Laval, G.: Deviance information criteria for model selection in approximate Bayesian computation. Stat. Appl. Genet. Mol. Biol. 10(1) (2011). doi:10.2202/1544-6115.1678
 Garthwaite, P.H., Fan, Y., Sisson, S.A.: Adaptive optimal scaling of Metropolis–Hastings algorithms using the Robbins–Monro process. Commun. Stat. Theory Methods 45(17), 5098–5111 (2016)
 Geyer, C.J.: Practical Markov chain Monte Carlo. Stat. Sci. 7, 473–483 (1992)
 Graham, M.M., Storkey, A.: Asymptotically exact conditional inference in deep generative models and differentiable simulators. arXiv:1605.07826 (2016)
 Jasra, A.: Approximate Bayesian computation for a class of time series models. Int. Stat. Rev. 83, 405–435 (2015)
 Kypraios, T., Neal, P., Prangle, D.: A tutorial introduction to Bayesian inference for stochastic epidemic models using approximate Bayesian computation. Math. Biosci. 287, 42–53 (2016)
 L'Ecuyer, P., Demers, V., Tuffin, B.: Rare events, splitting, and quasi-Monte Carlo. ACM Trans. Model. Comput. Simul. (TOMACS) 17(2), 9 (2007)
 Li, J., Nott, D.J., Fan, Y., Sisson, S.A.: Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model. Comput. Stat. Data Anal. 106, 77–89 (2017)
 Marin, J.M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22(6), 1167–1180 (2012)
 Marjoram, P., Molitor, J., Plagnol, V., Tavaré, S.: Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. 100(26), 15324–15328 (2003)
 McKinley, T.J., Ross, J.V., Deardon, R., Cook, A.R.: Simulation-based Bayesian inference for epidemic models. Comput. Stat. Data Anal. 71, 434–447 (2014)
 Meeds, T., Welling, M.: Optimization Monte Carlo: efficient and embarrassingly parallel likelihood-free inference. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 2071–2079. Curran Associates, Inc. (2015)
 Moreno, A., Adel, T., Meeds, E., Rehg, J.M., Welling, M.: Automatic variational ABC. arXiv:1606.08549 (2016)
 Murray, I., Graham, M.M.: Pseudo-marginal slice sampling. J. Mach. Learn. Res. 51, 911–919 (2016)
 Neal, P.: Efficient likelihood-free Bayesian computation for household epidemics. Stat. Comput. 22(6), 1239–1256 (2012)
 Neal, P., Roberts, G.: A case study in non-centering for data augmentation: stochastic epidemics. Stat. Comput. 15(4), 315–327 (2005)
 Nott, D.J., Fan, Y., Marshall, L., Sisson, S.A.: Approximate Bayesian computation and Bayes linear analysis: toward high-dimensional ABC. J. Comput. Graph. Stat. 23(1), 65–86 (2014)
 Nott, D.J., Ong, V.M.H., Fan, Y., Sisson, S.A.: High-dimensional ABC. In: Sisson, S.A., Fan, Y., Beaumont, M.A. (eds.) Handbook of Approximate Bayesian Computation (forthcoming). Chapman and Hall/CRC Press, Boca Raton (2017)
 Pitt, M.K., Silva, R.D.S., Giordani, P., Kohn, R.: On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J. Econom. 171(2), 134–151 (2012)
 Poyiadjis, G., Doucet, A., Singh, S.S.: Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika 98(1), 65–80 (2011)
 Prangle, D.: Summary statistics. In: Sisson, S.A., Fan, Y., Beaumont, M.A. (eds.) Handbook of Approximate Bayesian Computation (forthcoming). Chapman and Hall/CRC Press, Boca Raton (2017)
 Sellke, T.: On the asymptotic distribution of the size of a stochastic epidemic. J. Appl. Probab. 20, 390–394 (1983)
 Sherlock, C., Thiery, A.H., Roberts, G.O., Rosenthal, J.S.: On the efficiency of pseudo-marginal random walk Metropolis algorithms. Ann. Stat. 43(1), 238–275 (2015)
 Sisson, S.A., Fan, Y., Tanaka, M.M.: Correction: sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. 106(39), 16889–16890 (2009)
 Smith, R.L.: The hit-and-run sampler: a globally reaching Markov chain sampler for generating arbitrary multivariate distributions. In: Proceedings of the 28th Conference on Winter Simulation, pp. 260–264. IEEE Computer Society (1996)
 Stein, E.M., Shakarchi, R.: Real Analysis: Measure Theory, Integration, and Hilbert Spaces. Princeton University Press, Princeton (2009)
 Streftaris, G., Gibson, G.J.: Non-exponential tolerance to infection in epidemic systems—modeling, inference, and assessment. Biostatistics 13(4), 580–593 (2012)
 Targino, R.S., Peters, G.W., Shevchenko, P.V.: Sequential Monte Carlo samplers for capital allocation under copula-dependent risk models. Insur. Math. Econ. 61, 206–226 (2015)
 Walter, C.: Rare event simulation and splitting for discontinuous random variables. ESAIM Probab. Stat. 19, 794–811 (2015)
 Wilkinson, R.D.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12(2), 129–141 (2013)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.