Without-replacement sampling for particle methods on finite state spaces

Shah, Rohan; Kroese, Dirk P.

doi:10.1007/s11222-017-9752-8

Published: 19 May 2017

Volume 28, pages 633–652, (2018)

Statistics and Computing

Abstract

Combinatorial estimation is a new area of application for sequential Monte Carlo methods. We use ideas from sampling theory to introduce new without-replacement sampling methods in such discrete settings. These without-replacement sampling methods allow the addition of merging steps, which can significantly improve the resulting estimators. We give examples showing the use of the proposed methods in combinatorial rare-event probability estimation and in discrete state-space models.

References

Aires, N.: Comparisons between conditional Poisson sampling and Pareto $\pi $ps sampling designs. J. Stat. Plan. Inference 88(1), 133–147 (2000)
Bondesson, L., Traat, I., Lundqvist, A.: Pareto sampling versus Sampford and conditional Poisson sampling. Scand. J. Stat. 33(4), 699–720 (2006)
Brewer, K.R.W., Hanif, M.: Sampling with Unequal Probabilities, vol. 15. Springer, New York (1983)
Brockwell, A., Del Moral, P., Doucet, A.: Sequentially interacting Markov chain Monte Carlo methods. Ann. Stat. 38(6), 3387–3411 (2010)
Carpenter, J., Clifford, P., Fearnhead, P.: Improved particle filter for nonlinear problems. IEE Proc. Radar Sonar Navig. 146(1), 2–7 (1999)
Chen, R., Liu, J.S.: Mixture Kalman filters. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 62(3), 493–508 (2000)
Chen, Y., Diaconis, P., Holmes, S.P., Liu, J.S.: Sequential Monte Carlo methods for statistical analysis of tables. J. Am. Stat. Assoc. 100(469), 109–120 (2005)
Cochran, W.G.: Sampling Techniques, 3rd edn. Wiley, New York (1977)
Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(3), 411–436 (2006)
Douc, R., Cappé, O., Moulines, E.: Comparison of resampling schemes for particle filtering. In: ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, pp. 64–69 (2005)
Doucet, A., de Freitas, N., Gordon, N. (eds.): Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, New York (2001)
Elperin, T.I., Gertsbakh, I., Lomonosov, M.: Estimation of network reliability using graph evolution models. IEEE Trans. Reliab. 40(5), 572–581 (1991)
Fearnhead, P.: Sequential Monte Carlo Methods in Filter Theory. Ph.D. thesis, University of Oxford (1998)
Fearnhead, P., Clifford, P.: On-line inference for hidden Markov models via particle filters. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 65(4), 887–899 (2003)
Gerber, M., Chopin, N.: Sequential quasi Monte Carlo. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 77(3), 509–579 (2015)
Gilks, W.R., Berzuini, C.: Following a moving target-Monte Carlo inference for dynamic Bayesian models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 63(1), 127–146 (2001)
Gordon, N., Salmond, D., Smith, A.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F Radar Signal Process. 140(2), 107–113 (1993)
Hammersley, J.M., Morton, K.W.: Poor man’s Monte Carlo. J. R. Stat. Soc. Ser. B (Methodol.) 16(1), 23–38 (1954)
Hartley, H.O., Rao, J.N.K.: Sampling with unequal probabilities and without replacement. Ann. Math. Stat. 33(2), 350–374 (1962)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952)
Iachan, R.: Systematic sampling: a critical review. Int. Stat. Rev. 50(3), 293–303 (1982)
Kong, A., Liu, J.S., Wong, W.H.: Sequential imputations and Bayesian missing data problems. J. Am. Stat. Assoc. 89(425), 278–288 (1994)
Kou, S.C., McCullagh, P.: Approximating the $\alpha $-permanent. Biometrika 96(3), 635–644 (2009)
L’Ecuyer, P., Rubino, G., Saggadi, S., Tuffin, B.: Approximate zero-variance importance sampling for static network reliability estimation. IEEE Trans. Reliab. 60(3), 590–604 (2011)
Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, New York (2001)
Liu, J.S., Chen, R.: Blind deconvolution via sequential imputations. J. Am. Stat. Assoc. 90(430), 567–576 (1995)
Liu, J.S., Chen, R., Logvinenko, T.: A theoretical framework for sequential importance sampling with resampling. In: Doucet, A., de Freitas, N., Gordon, N. (eds.) Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science, pp. 225–246. Springer, New York (2001)
Lomonosov, M.: On Monte Carlo estimates in network reliability. Probab. Eng. Inf. Sci. 8, 245–264 (1994)
Madow, W.G.: On the theory of systematic sampling, II. Ann. Math. Stat. 20(3), 333–354 (1949)
Madow, W.G., Madow, L.H.: On the theory of systematic sampling, I. Ann. Math. Stat. 15(1), 1–24 (1944)
Marshall, A.: The use of multi-stage sampling schemes in Monte Carlo computations. In: Meyer, H.A. (ed.) Symposium on Monte Carlo Methods. Wiley, Hoboken (1956)
Ó Ruanaidh, J.J.K., Fitzgerald, W.J.: Numerical Bayesian Methods Applied to Signal Processing. Springer, New York (1996)
Paige, B., Wood, F., Doucet, A., Teh, Y.W.: Asynchronous anytime sequential Monte Carlo. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 3410–3418. Curran Associates, Inc. (2014)
Rosén, B.: Asymptotic theory for order sampling. J. Stat. Plan. Inference 62(2), 135–158 (1997a)
Rosén, B.: On sampling with probability proportional to size. J. Stat. Plan. Inference 62(2), 159–191 (1997b)
Rosenbluth, M.N., Rosenbluth, A.W.: Monte Carlo calculation of the average extension of molecular chains. J. Chem. Phys. 23(2), 356–359 (1955)
Rubinstein, R.Y., Kroese, D.P.: Simulation and the Monte Carlo Method, 3rd edn. Wiley, New York (2017)
Sampford, M.R.: On sampling without replacement with unequal probabilities of selection. Biometrika 54(3–4), 499–513 (1967)
Tillé, Y.: Sampling Algorithms. Springer, New York (2006)
Vaisman, R., Kroese, D.P.: Stochastic enumeration method for counting trees. Methodol. Comput. Appl. Probab. 19(1), 31–73 (2017)
Wall, F.T., Erpenbeck, J.J.: New method for the statistical computation of polymer dimensions. J. Chem. Phys. 30(3), 634–637 (1959)

Acknowledgements

This work was supported by the Australian Research Council Centre of Excellence for Mathematical & Statistical Frontiers, under grant number CE140100049. The authors would like to thank the reviewers for their valuable comments, which improved the quality of this paper.

Author information

Authors and Affiliations

School of Mathematics and Physics, The University of Queensland, Brisbane, QLD, 4702, Australia
Rohan Shah & Dirk P. Kroese

Corresponding author

Correspondence to Rohan Shah.

Appendices

Appendix 1: Unbiasedness of sequential without-replacement Monte Carlo

Let $h^*\left( {\mathbf {x}}_t\right) = \mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t\right] $. Note that

$$\begin{aligned} \sum _{{\mathbf {x}}_{t} \in \mathscr {S}_{t}\left( {\mathbf {x}}_{t-1}\right) } h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right)&= h^*\left( {\mathbf {x}}_{t-1}\right) f\left( {\mathbf {x}}_{t-1}\right) . \end{aligned}$$

Consider the expression

$$\begin{aligned} \sum _{{\mathbf {x}}_{t} \in \mathbf {S}_{t}} \frac{h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t} \pi ^i\left( {\mathbf {x}}_t\right) }, \end{aligned}$$

(20)

where $1 \le t < d$. Let $I_t\left( {\mathbf {x}}_t\right) $ be a binary variable, where $I_t\left( {\mathbf {x}}_t\right) = 1$ indicates the inclusion of element ${\mathbf {x}}_{t}$ of $\mathscr {S}_{t}\left( \mathbf {S}_{t-1}\right) $ in $\mathbf {S}_t$. We can rewrite (20) as

$$\begin{aligned} \sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( \mathbf {S}_{t-1}\right) } I_t\left( {\mathbf {x}}_t\right) \frac{h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t} \pi ^i\left( {\mathbf {x}}_t\right) }. \end{aligned}$$

(21)

Recall that $\mathbb {E}\left[ I_t\left( {\mathbf {x}}_t\right) \;\vert \;\mathbf {S}_{t-1}\right] = \pi ^t\left( {\mathbf {x}}_t\right) $. So the expectation of (21) conditional on $\mathbf {S}_1, \ldots , \mathbf {S}_{t-1}$ is

$$\begin{aligned}&\sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( \mathbf {S}_{t-1}\right) } \frac{h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t-1} \pi ^i\left( {\mathbf {x}}_t\right) }\\&\quad = \sum _{{\mathbf {x}}_{t-1}\in \mathbf {S}_{t-1}} \frac{\sum _{{\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) }h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t-1} \pi ^i\left( {\mathbf {x}}_{t-1}\right) }\\&\quad = \sum _{{\mathbf {x}}_{t-1}\in \mathbf {S}_{t-1}} \frac{h^*\left( {\mathbf {x}}_{t-1}\right) f\left( {\mathbf {x}}_{t-1}\right) }{\prod _{i=1}^{t-1} \pi ^i\left( {\mathbf {x}}_{t-1}\right) }. \end{aligned}$$

So

$$\begin{aligned}&\mathbb {E}\left[ \sum _{{\mathbf {x}}_{t} \in \mathbf {S}_{t}} \frac{h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t} \pi ^i\left( {\mathbf {x}}_t\right) }\;\vert \;\mathbf {S}_1, \ldots , \mathbf {S}_{t-1}\right] \nonumber \\&\quad = \sum _{{\mathbf {x}}_{t-1} \in \mathbf {S}_{t-1}} \frac{h^*\left( {\mathbf {x}}_{t-1}\right) f\left( {\mathbf {x}}_{t-1}\right) }{\prod _{i=1}^{t-1} \pi ^i\left( {\mathbf {x}}_{t-1}\right) } \end{aligned}$$

(22)

Applying Eq. (22) $d$ times to

$$\begin{aligned} \widehat{\ell }&= \sum _{{\mathbf {x}}_d \in \mathbf {S}_d} \frac{h\left( {\mathbf {x}}_d\right) f\left( {\mathbf {x}}_d\right) }{\prod _{i=1}^{d-1} \pi ^i\left( {\mathbf {x}}_d\right) } = \sum _{{\mathbf {x}}_d \in \mathbf {S}_d} \frac{h^*\left( {\mathbf {x}}_d\right) f\left( {\mathbf {x}}_d\right) }{\prod _{i=1}^{d-1} \pi ^i\left( {\mathbf {x}}_d\right) } \end{aligned}$$

shows that $\mathbb {E}\left[ \widehat{\ell }\right] = \ell $.
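The telescoping argument above can be checked exactly on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical two-state Markov chain (`INIT`, `TRANS`), a hypothetical function `h`, and simple random sampling without replacement of at most `n` units at every step, so that each candidate has inclusion probability `k/N`. It enumerates every possible sample rather than simulating, and divides by the inclusion probability at every step; if the final step keeps all candidates, that factor is 1 and drops out.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical two-state Markov chain (illustrative numbers, not from the paper).
INIT = {0: Fraction(1, 3), 1: Fraction(2, 3)}
TRANS = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
         1: {0: Fraction(2, 5), 1: Fraction(3, 5)}}


def step_prob(path, s):
    """f(x_t | x_{t-1}) for appending state s to path (INIT when path is empty)."""
    return INIT[s] if not path else TRANS[path[-1]][s]


def h(path):
    """Hypothetical function of the full path: indicator of at least two 1s."""
    return Fraction(int(sum(path) >= 2))


def exact_value(d):
    """ell = E[h(X_d)], by enumerating every path of length d."""
    total = Fraction(0)
    for path in product((0, 1), repeat=d):
        prob = Fraction(1)
        for t in range(d):
            prob *= step_prob(path[:t], path[t])
        total += h(path) * prob
    return total


def expected_estimate(sample, t, d, n):
    """Exact expectation of the estimator, given the sample from step t-1.

    `sample` holds pairs (path, weight), where weight is f(path) divided by
    the product of the inclusion probabilities of the path's prefixes.
    """
    # Candidate set: every one-step extension of every sampled path.
    candidates = [(x + (s,), w * step_prob(x, s))
                  for x, w in sample for s in (0, 1)]
    k = min(n, len(candidates))
    pi = Fraction(k, len(candidates))  # inclusion probability under SRSWOR
    subsets = list(combinations(candidates, k))
    total = Fraction(0)
    for subset in subsets:
        kept = [(x, w / pi) for x, w in subset]
        if t == d:
            total += sum(h(x) * w for x, w in kept)
        else:
            total += expected_estimate(kept, t + 1, d, n)
    return total / len(subsets)  # all subsets are equally likely


d, n = 4, 3
truth = exact_value(d)
mean_of_estimator = expected_estimate([((), Fraction(1))], 1, d, n)
print(truth, mean_of_estimator)
```

Because the arithmetic uses `Fraction`, the two printed values can be compared for exact equality rather than up to Monte Carlo error.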

Appendix 2: Unbiasedness of sequential without-replacement Monte Carlo, with merging

The proof is similar to “Appendix 1.” In this case, all the sample spaces and samples are sets of triples. Consider any expression of the form

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_t, w, p\right) \in \mathscr {T}_t\left( \mathbf {S}_{t-1}\right) } h^*\left( {\mathbf {x}}_t\right) w. \end{aligned}$$

(23)

It is clear that if the proposed merging rule is applied to $\mathscr {T}_t\left( \mathbf {S}_{t-1}\right) $, then the value of (23) is unchanged. Using the definition of $\mathscr {T}_t\left( \mathbf {S}_{t-1}\right) $, Eq. (23) can be written as

$$\begin{aligned}&\sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathbf {S}_{t-1}} w \sum _{{\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) } h^*\left( {\mathbf {x}}_t\right) \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathbf {S}_{t-1}} \frac{\mathbb {E}\left[ h^*\left( {\mathbf {X}}_t\right) \;\vert \;{\mathbf {X}}_{t-1} = {\mathbf {x}}_{t-1}\right] w}{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathbf {S}_{t-1}} \frac{h^*\left( {\mathbf {x}}_{t-1}\right) w}{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }. \end{aligned}$$

(24)

The expectation of (24) conditional on $\mathbf {S}_{t-2}$ is

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathscr {T}_{t-1}\left( \mathbf {S}_{t-2}\right) } h^*\left( {\mathbf {x}}_{t-1}\right) w. \end{aligned}$$

(25)

So

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_{t}, w, p\right) \in \mathscr {T}_{t}\left( \mathbf {S}_{t-1}\right) } h^*\left( {\mathbf {x}}_{t}\right) w \;\vert \;\mathbf {S}_{t-2}\right] \nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathscr {T}_{t-1}\left( \mathbf {S}_{t-2}\right) } h^*\left( {\mathbf {x}}_{t-1}\right) w. \end{aligned}$$

(26)

Applying Eq. (26) $d-1$ times to

$$\begin{aligned} \mathbb {E}\left[ \widehat{\ell } \;\vert \;\mathbf {S}_{d-1}\right] = \sum _{\left( {\mathbf {x}}_d, w, p\right) \in \mathscr {T}_d\left( \mathbf {S}_{d-1}\right) } h^*\left( {\mathbf {x}}_d\right) w \end{aligned}$$

shows that $\widehat{\ell }$ is unbiased.
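The invariance of (23) under merging can be illustrated with a small sketch. The merging rule itself is defined in the main text, which is not reproduced here, so the code below rests on an assumption: units whose paths share the same value of $h^*$ (identified by a hypothetical `key` function, here the last state of the path) are replaced by a single unit whose weight and size variable are the sums over the group. All names and numbers are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def merge_units(units, key):
    """Merge triples (x, w, p) that share key(x); weights and sizes add up."""
    groups = defaultdict(list)
    for x, w, p in units:
        groups[key(x)].append((x, w, p))
    merged = []
    for members in groups.values():
        # Any member can represent the group, since h_star is constant on it.
        representative = members[0][0]
        merged.append((representative,
                       sum(w for _, w, _ in members),
                       sum(p for _, _, p in members)))
    return merged


def weighted_sum(units, h_star):
    """The quantity in (23): sum of h_star(x) * w over all units."""
    return sum(h_star(x) * w for x, w, _ in units)


# Hypothetical h_star that depends on a path only through its last state.
H_STAR_BY_LAST_STATE = {0: Fraction(1, 4), 1: Fraction(3, 4)}


def h_star(path):
    return H_STAR_BY_LAST_STATE[path[-1]]


def last_state(path):
    return path[-1]


units = [((0, 1), Fraction(1, 5), Fraction(1, 5)),
         ((1, 1), Fraction(3, 10), Fraction(3, 10)),
         ((1, 0), Fraction(1, 2), Fraction(1, 2))]
merged = merge_units(units, key=last_state)
before = weighted_sum(units, h_star)
after = weighted_sum(merged, h_star)
print(len(units), len(merged), before == after)
```

The sum is preserved because $h^*$ factors out of each group, leaving only the sum of the group's weights.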

Appendix 3: Without-replacement sampling for the change-point example

We now give the details of the application of without-replacement sampling to the change-point example in Sect. 1. Recall that ${\mathbf {X}}_d = \left\{ X_t \right\} _{t=1}^d$ is a Markov chain and ${\mathbf {Y}}_d = \left\{ Y_t \right\} _{t=1}^d$ are the observations. Let $f$ be the joint density of ${\mathbf {X}}_d$ and ${\mathbf {Y}}_d$. Note that

$$\begin{aligned} f\left( {\mathbf {x}}_{t}\;\vert \;{\mathbf {y}}_{t}\right)&= c_t f\left( {\mathbf {x}}_{t-1} \;\vert \;{\mathbf {y}}_{t-1}\right) f\left( x_{t}\;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_{t} \;\vert \;x_{t}\right) , \end{aligned}$$

(27)

$$\begin{aligned} f\left( {\mathbf {x}}_{1}\;\vert \;{\mathbf {y}}_{1}\right)&= c_1 f\left( x_{1}\right) f\left( y_{1} \;\vert \;x_{1}\right) , \end{aligned}$$

(28)

for some unknown constants $\left\{ c_t\right\} _{t=1}^d$. Define the size variables recursively as

$$\begin{aligned} p\left( {\mathbf {x}}_t\right)&= p\left( {\mathbf {x}}_{t-1}\right) \frac{f\left( x_{t} \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }, \end{aligned}$$

(29)

$$\begin{aligned} p\left( x_1\right)&= f\left( x_1\right) f\left( y_1 \;\vert \;x_1\right) . \end{aligned}$$

(30)

This updating rule is slightly different from that given in (17). Equations (28) and (30) require an initial distribution for $X_1 = \left( C_1, O_1\right) $, which we take to be

$$\begin{aligned} \mathbb {P}\left( C_1 = 2, O_1 = 2\right) = \frac{1}{250}, \quad \mathbb {P}\left( C_1 = 2, O_1 = 1\right) = \frac{249}{250}. \end{aligned}$$

Define

$$\begin{aligned} {\mathscr {U}}_1 = {\mathscr {U}}_1\left( \emptyset \right) = \left\{ \left( x_1, f\left( x_1\right) f\left( y_1 \;\vert \;x_1\right) \right) :x_1 \in {\mathscr {S}}_1\right\} , \end{aligned}$$

and let $\mathbf {S}_1$ be a sample chosen from $\mathscr {U}_1$, with probability proportional to the last component. Assume that sample $\mathbf {S}_{t-1}$ has been chosen, and let

$$\begin{aligned} \mathscr {U}_{t}\left( \mathbf {S}_{t-1}\right)&= \left\{ \left( {\mathbf {x}}_t, w \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\right) :\right. \\&\qquad \left. \left( {\mathbf {x}}_{t-1}, w\right) \in \mathbf {S}_{t-1},\; {\mathbf {x}}_t \in \mathrm {Support}\left( {\mathbf {X}}_t \;\vert \;{\mathbf {X}}_{t-1} = {\mathbf {x}}_{t-1}\right) \right\} . \end{aligned}$$

We account for the unknown normalizing constants in (27) by using an estimator of the form (12). This results in Algorithm 5.
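The construction of $\mathscr {U}_{t}\left( \mathbf {S}_{t-1}\right) $ from $\mathbf {S}_{t-1}$ can be sketched as follows. This is an illustration under stated assumptions rather than Algorithm 5 itself: the state labels, the transition table `TRANS`, the emission table `EMIT`, the observation `y_t` and the inclusion probabilities in `incl_prob` are all hypothetical, and the function name `expand` is not from the paper.

```python
from fractions import Fraction


def expand(sample, incl_prob, trans, emit, y_t, states):
    """Return U_t(S_{t-1}) as a list of (path, weight) pairs.

    Each sampled (path, w) is extended by every state s with positive
    transition probability, and the new weight is
    w * f(s | path) * f(y_t | s) / pi^{t-1}(path).
    """
    units = []
    for path, w in sample:
        for s in states:
            p_move = trans(path, s)
            if p_move == 0:
                continue  # s lies outside Support(X_t | X_{t-1} = x_{t-1})
            weight = w * p_move * emit(y_t, s) / incl_prob[path]
            units.append((path + (s,), weight))
    return units


# Hypothetical two-state model, for illustration only.
TRANS = {"a": {"a": Fraction(9, 10), "b": Fraction(1, 10)},
         "b": {"a": Fraction(0), "b": Fraction(1)}}
EMIT = {"a": {0: Fraction(4, 5), 1: Fraction(1, 5)},
        "b": {0: Fraction(3, 10), 1: Fraction(7, 10)}}


def trans(path, s):
    return TRANS[path[-1]][s]


def emit(y, s):
    return EMIT[s][y]


sample = [(("a",), Fraction(1, 2)), (("b",), Fraction(1, 4))]
incl_prob = {("a",): Fraction(1, 2), ("b",): Fraction(1)}
units = expand(sample, incl_prob, trans, emit, y_t=1, states=("a", "b"))
for path, weight in units:
    print(path, weight)
```

Dividing by the inclusion probability when a unit is expanded, rather than when it is selected, mirrors the placement of $\pi ^{t-1}$ in the definition of $\mathscr {U}_{t}$.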

Proposition 2

The set $\mathbf {S}_d$ generated by Algorithm 5 has the property that

$$\begin{aligned} \mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_d, w\right) \in \mathbf {S}_d}\frac{h\left( {\mathbf {x}}_d\right) w}{\pi ^d\left( {\mathbf {x}}_d\right) }\right]&= \mathbb {E}\left( h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {Y}}_d\right) \prod _{t=1}^d c_t^{-1}. \end{aligned}$$

Proof

Define

$$\begin{aligned} H\left( {\mathbf {x}}_t\right)&= \frac{\mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right] f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_d\right) }{f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_t\right) \prod _{i=t+1}^d c_i}. \end{aligned}$$

Using (27),

$$\begin{aligned}&\sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( {\mathbf {x}}_{t-1}\right) }H\left( {\mathbf {x}}_t\right) f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) \\&\quad = \sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( {\mathbf {x}}_{t-1}\right) }\frac{\mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right] f\left( {\mathbf {x}}_t\;\vert \;{\mathbf {y}}_d\right) }{f\left( {\mathbf {x}}_{t-1}\;\vert \;{\mathbf {y}}_{t-1}\right) \prod _{i=t}^d c_i}\\&\quad = \frac{\mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {X}}_{t-1} = {\mathbf {x}}_{t-1}, {\mathbf {Y}}_d = {\mathbf {y}}_d\right] f\left( {\mathbf {x}}_{t-1}\;\vert \;{\mathbf {y}}_d\right) }{f\left( {\mathbf {x}}_{t-1}\;\vert \;{\mathbf {y}}_{t-1}\right) \prod _{i=t}^d c_i}\\&\quad = H\left( {\mathbf {x}}_{t-1}\right) . \end{aligned}$$

Consider any expression of the form

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_{t}, w\right) \in \mathscr {U}_{t}\left( \mathbf {S}_{t-1}\right) }H\left( {\mathbf {x}}_t\right) w. \end{aligned}$$

(31)

Equation (31) can be written as

$$\begin{aligned}&\sum _{\left( {\mathbf {x}}_{t-1}, w\right) \in \mathbf {S}_{t-1}}\sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( {\mathbf {x}}_{t-1}\right) }H\left( {\mathbf {x}}_t\right) w \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w\right) \in \mathbf {S}_{t-1}}\frac{w H\left( {\mathbf {x}}_{t-1}\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }. \end{aligned}$$

(32)

The expectation of (32) conditional on $\mathbf {S}_{t-2}$ is

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_{t-1}, w\right) \in \mathscr {U}_{t-1}\left( \mathbf {S}_{t-2}\right) }H\left( {\mathbf {x}}_{t-1}\right) w. \end{aligned}$$

So

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_{t}, w\right) \in \mathscr {U}_{t}\left( \mathbf {S}_{t-1}\right) }H\left( {\mathbf {x}}_t\right) w\;\vert \;\mathbf {S}_{t-2} \right] \nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w\right) \in \mathscr {U}_{t-1}\left( \mathbf {S}_{t-2}\right) }H\left( {\mathbf {x}}_{t-1}\right) w. \end{aligned}$$

(33)

Applying Eq. (33) $d-1$ times to

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_d, w\right) \in \mathbf {S}_d}\frac{h\left( {\mathbf {x}}_d\right) w}{\pi ^d\left( {\mathbf {x}}_d\right) }\;\vert \;\mathbf {S}_{d-1}\right] \\&\quad = \sum _{\left( {\mathbf {x}}_d, w\right) \in \mathscr {U}_{d}\left( \mathbf {S}_{d-1}\right) }h\left( {\mathbf {x}}_d\right) w\\&\quad = \sum _{\left( {\mathbf {x}}_d, w\right) \in \mathscr {U}_{d}\left( \mathbf {S}_{d-1}\right) }H\left( {\mathbf {x}}_d\right) w \end{aligned}$$

completes the proof. $\square $
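The constant on the right-hand side of Proposition 2 can be made explicit. As a short worked step, unrolling the recursion (27) down to (28) gives

$$\begin{aligned} f\left( {\mathbf {x}}_d \;\vert \;{\mathbf {y}}_d\right)&= \left( \prod _{t=1}^d c_t\right) f\left( x_1\right) \prod _{t=2}^d f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) \prod _{t=1}^d f\left( y_t \;\vert \;x_t\right) \\&= \left( \prod _{t=1}^d c_t\right) f\left( {\mathbf {x}}_d, {\mathbf {y}}_d\right) . \end{aligned}$$

Since $f\left( {\mathbf {x}}_d \;\vert \;{\mathbf {y}}_d\right) = f\left( {\mathbf {x}}_d, {\mathbf {y}}_d\right) / f\left( {\mathbf {y}}_d\right) $, it follows that $\prod _{t=1}^d c_t^{-1} = f\left( {\mathbf {y}}_d\right) $, the marginal density of the observations. The expectation in Proposition 2 is therefore $\mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {Y}}_d = {\mathbf {y}}_d\right] f\left( {\mathbf {y}}_d\right) $, and the same sum with $h \equiv 1$ has expectation $f\left( {\mathbf {y}}_d\right) $. A ratio of the two sums is thus free of the unknown constants, which is presumably the role played by the estimator of the form (12).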

We now describe the merging step outlined in Fearnhead and Clifford (2003), applied to the estimation of the posterior change-point probabilities

$$\begin{aligned} \left\{ \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right\} _{t=1}^d. \end{aligned}$$

The method we describe here can be extended fairly trivially to also estimate $\left\{ \mathbb {P}\left( O_t = 2 \;\vert \;{\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right\} _{t=1}^d$.

In order to perform this merging, we must add more information to all the sample spaces and the samples chosen from them. The extended space will have ${\mathbf {x}}_t$ as the first entry, the particle weight $w$ as the second entry, and a vector $\mathbf m_t$ of $t$ values as the third entry. The last entry will be an estimate of $\left\{ \mathbb {P}\left( C_i = 2 \;\vert \;{\mathbf {y}}_t\right) \right\} _{i=1}^t$. Let

$$\begin{aligned} \mathscr {V}_1&= \left\{ \left( x_1, f\left( x_1\right) f\left( y_1 \;\vert \;x_1\right) , \mathbb {P}\left( C_1 = 2 \;\vert \;x_1\right) \right) :x_1 \in \mathscr {S}_1\right\} . \end{aligned}$$

Note that the third component of every element of $\mathscr {V}_1$ is either 0 or 1. Let $\mathbf {S}_1$ be a sample drawn from $\mathscr {V}_1$, with probability proportional to the second element. Assume that sample $\mathbf {S}_{t-1}$ has been chosen, and let $\mathscr {V}_t\left( \mathbf {S}_{t-1}\right) $ be

$$\begin{aligned}&\left\{ \left( {\mathbf {x}}_t,\; w \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) },\; \left( \mathbf m_{t-1}, \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right) \right) :\right. \\&\quad \left. \left( {\mathbf {x}}_{t-1}, w, \mathbf m_{t-1}\right) \in \mathbf {S}_{t-1},\; {\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) \right\} . \end{aligned}$$

We can now define Algorithm 6, which uses the merging step outlined in Proposition 4.

Proposition 3

If the merging step is omitted, then the set $\mathbf {S}_d$ generated by Algorithm 6 has the property that

$$\begin{aligned} \mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_d, w, \mathbf m_d\right) \in \mathbf {S}_d}\frac{\mathbf m_d w}{\pi ^d\left( {\mathbf {x}}_d\right) }\right] = \frac{\left\{ \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right\} _{t=1}^d}{\prod _{t=1}^d c_t}. \end{aligned}$$

Proof

Define

$$\begin{aligned} G\left( {\mathbf {x}}_t, \mathbf m_t\right)&= \left( \mathbf m_t, \mathbb {P}\left( C_{t+1} = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) , \right. \\&\qquad \left. \ldots , \mathbb {P}\left( C_{d} = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right) \\&\quad \times \,\frac{f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_d\right) }{f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_t\right) \prod _{i=t+1}^d c_i}. \end{aligned}$$

It can be shown that

$$\begin{aligned}&\sum _{{\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) } G\left( {\mathbf {x}}_t, \left( {\mathbf {m}}_{t-1}, \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right) \right) \\&\quad \times \,f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) \\&\quad = G\left( {\mathbf {x}}_{t-1}, {\mathbf {m}}_{t-1}\right) . \end{aligned}$$

Consider any expression of the form

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_t, w, \mathbf m_t\right) \in \mathscr {V}_t\left( \mathbf {S}_{t-1}\right) } G\left( {\mathbf {x}}_t, \mathbf m_t\right) w. \end{aligned}$$

(34)

Equation (34) can be written as

$$\begin{aligned}&\sum _{\left( {\mathbf {x}}_{t-1}, w, {\mathbf {m}}_{t-1}\right) \in \mathbf {S}_{t-1}}w\sum _{{\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad G\left( {\mathbf {x}}_t, \left( {\mathbf {m}}_{t-1}, \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right) \right) \nonumber \\&\quad \times \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w, {\mathbf {m}}_{t-1}\right) \in \mathbf {S}_{t-1}}w\frac{G\left( {\mathbf {x}}_{t-1}, \mathbf m_{t-1}\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }. \end{aligned}$$

(35)

The expectation of (35) conditional on $\mathbf {S}_{t-2}$ is

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_{t-1}, w, {\mathbf {m}}_{t-1}\right) \in \mathscr {V}_{t-1}\left( \mathbf {S}_{t-2}\right) }w G\left( {\mathbf {x}}_{t-1}, \mathbf m_{t-1}\right) . \end{aligned}$$

So

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_t, w, \mathbf m_t\right) \in \mathscr {V}_t\left( \mathbf {S}_{t-1}\right) } G\left( {\mathbf {x}}_t, \mathbf m_t\right) w \;\vert \;\mathbf {S}_{t-2}\right] \nonumber \\&= \sum _{\left( {\mathbf {x}}_{t-1}, w, {\mathbf {m}}_{t-1}\right) \in \mathscr {V}_{t-1}\left( \mathbf {S}_{t-2}\right) }w G\left( {\mathbf {x}}_{t-1}, \mathbf m_{t-1}\right) . \end{aligned}$$

(36)

Applying Eq. (36) $d-1$ times to

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_d, w, \mathbf m_d\right) \in \mathbf {S}_d}\frac{\mathbf m_d w}{\pi ^d\left( {\mathbf {x}}_d\right) }\;\vert \;\mathbf {S}_{d-1}\right] \\&\quad =\,\sum _{\left( {\mathbf {x}}_{d}, w, {\mathbf {m}}_{d}\right) \in \mathscr {V}_{d}\left( \mathbf {S}_{d-1}\right) }w G\left( {\mathbf {x}}_{d}, \mathbf m_{d}\right) \end{aligned}$$

completes the proof. $\square $

Proposition 4

Assume we have two units $\left( {\mathbf {x}}_t, w, \mathbf m_t\right) $ and $\left( {\mathbf {x}}_t', w', \mathbf m_t'\right) $, both corresponding to paths of the Markov chain with $C_t = 2$ and $O_t = 2$. Then, we can remove these units, and replace them with the single unit

$$\begin{aligned} \left( {\mathbf {x}}_t, w + w', \frac{w \mathbf m_t + w' \mathbf m_t'}{w + w'}\right) . \end{aligned}$$

This rule also applies if both units correspond to $C_t = 2$ and $O_t = 1$.

Proof

Under the specified conditions on ${\mathbf {x}}_t$ and ${\mathbf {x}}_t'$,

$$\begin{aligned}&\mathbb {P}\left( C_i = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \\&\quad = \mathbb {P}\left( C_i = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t', {\mathbf {Y}}_d = {\mathbf {y}}_d\right) , \quad \forall \, t+ 1 \le i \le d, \\&f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_t\right) = f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_d\right) ,\\&f\left( {\mathbf {x}}_t' \;\vert \;{\mathbf {y}}_t\right) = f\left( {\mathbf {x}}_t' \;\vert \;{\mathbf {y}}_d\right) . \end{aligned}$$

This shows that

$$\begin{aligned}&\left( w + w'\right) G\left( {\mathbf {x}}_t, \frac{w {\mathbf {m}}_t + w' {\mathbf {m}}_t'}{w + w'}\right) \\&\quad = w G\left( {\mathbf {x}}_t, {\mathbf {m}}_t\right) + w' G\left( {\mathbf {x}}_t', {\mathbf {m}}_t'\right) . \end{aligned}$$

So replacement of this pair of units by the specified single unit does not bias the resulting estimator. $\square $
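The merging rule of Proposition 4 is simple to express in code. The following is a minimal sketch: the function name `merge_pair`, the encoding of a path as a tuple of $(C_i, O_i)$ pairs, and the example weights are assumptions made for illustration. It does not check the precondition of the proposition; the caller is assumed to pass two units whose paths end in the same $(C_t, O_t)$ with $C_t = 2$.

```python
from fractions import Fraction


def merge_pair(unit_a, unit_b):
    """Merge two units (x, w, m) as in Proposition 4.

    The first path is kept as the representative, the weights add, and the
    vectors m are averaged entrywise with the weights as coefficients.
    """
    x, w, m = unit_a
    _, w_other, m_other = unit_b
    total = w + w_other
    merged_m = tuple((w * a + w_other * b) / total
                     for a, b in zip(m, m_other))
    return (x, total, merged_m)


# Two hypothetical paths, both ending with C_t = 2 and O_t = 2.
unit_a = (((2, 1), (1, 1), (2, 2)), Fraction(3, 10), (1, 0, 1))
unit_b = (((2, 1), (2, 1), (2, 2)), Fraction(1, 10), (1, 1, 1))
merged = merge_pair(unit_a, unit_b)

# The weighted vector w * m, summed over units, is the same before and after.
before = tuple(unit_a[1] * a + unit_b[1] * b
               for a, b in zip(unit_a[2], unit_b[2]))
after = tuple(merged[1] * c for c in merged[2])
print(merged[1], merged[2], before == after)
```

The weighted average is the only choice of third component that leaves the sum of $w \mathbf m_t$ unchanged once the weights are added, which is the property the proof relies on.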

About this article

Cite this article

Shah, R., Kroese, D.P. Without-replacement sampling for particle methods on finite state spaces. Stat Comput 28, 633–652 (2018). https://doi.org/10.1007/s11222-017-9752-8

Received: 30 August 2016
Accepted: 02 May 2017
Published: 19 May 2017
Issue Date: May 2018
DOI: https://doi.org/10.1007/s11222-017-9752-8
