Without-replacement sampling for particle methods on finite state spaces

Abstract

Combinatorial estimation is a new area of application for sequential Monte Carlo methods. We use ideas from sampling theory to introduce new without-replacement sampling methods in such discrete settings. These without-replacement sampling methods allow the addition of merging steps, which can significantly improve the resulting estimators. We give examples showing the use of the proposed methods in combinatorial rare-event probability estimation and in discrete state-space models.

References

  • Aires, N.: Comparisons between conditional Poisson sampling and Pareto \(\pi \)ps sampling designs. J. Stat. Plan. Inference 88(1), 133–147 (2000)

  • Bondesson, L., Traat, I., Lundqvist, A.: Pareto sampling versus Sampford and conditional Poisson sampling. Scand. J. Stat. 33(4), 699–720 (2006)

  • Brewer, K.R.W., Hanif, M.: Sampling with Unequal Probabilities, vol. 15. Springer, New York (1983)

  • Brockwell, A., Del Moral, P., Doucet, A.: Sequentially interacting Markov chain Monte Carlo methods. Ann. Stat. 38(6), 3387–3411 (2010)

  • Carpenter, J., Clifford, P., Fearnhead, P.: Improved particle filter for nonlinear problems. IEE Proc. Radar Sonar Navig. 146(1), 2–7 (1999)

  • Chen, R., Liu, J.S.: Mixture Kalman filters. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 62(3), 493–508 (2000)

  • Chen, Y., Diaconis, P., Holmes, S.P., Liu, J.S.: Sequential Monte Carlo methods for statistical analysis of tables. J. Am. Stat. Assoc. 100(469), 109–120 (2005)

  • Cochran, W.G.: Sampling Techniques, 3rd edn. Wiley, New York (1977)

  • Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(3), 411–436 (2006)

  • Douc, R., Cappé, O., Moulines, E.: Comparison of resampling schemes for particle filtering. In: Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis (ISPA 2005), pp. 64–69 (2005)

  • Doucet, A., de Freitas, N., Gordon, N. (eds.): Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, New York (2001)

  • Elperin, T.I., Gertsbakh, I., Lomonosov, M.: Estimation of network reliability using graph evolution models. IEEE Trans. Reliab. 40(5), 572–581 (1991)

  • Fearnhead, P.: Sequential Monte Carlo Methods in Filter Theory. Ph.D. thesis, University of Oxford (1998)

  • Fearnhead, P., Clifford, P.: On-line inference for hidden Markov models via particle filters. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 65(4), 887–899 (2003)

  • Gerber, M., Chopin, N.: Sequential quasi-Monte Carlo. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 77(3), 509–579 (2015)

  • Gilks, W.R., Berzuini, C.: Following a moving target-Monte Carlo inference for dynamic Bayesian models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 63(1), 127–146 (2001)

  • Gordon, N., Salmond, D., Smith, A.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F Radar Signal Process. 140(2), 107–113 (1993)

  • Hammersley, J.M., Morton, K.W.: Poor man’s Monte Carlo. J. R. Stat. Soc. Ser. B (Methodol.) 16(1), 23–38 (1954)

  • Hartley, H.O., Rao, J.N.K.: Sampling with unequal probabilities and without replacement. Ann. Math. Stat. 33(2), 350–374 (1962)

  • Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47(260), 663–685 (1952)

  • Iachan, R.: Systematic sampling: a critical review. Int. Stat. Rev. 50(3), 293–303 (1982)

  • Kong, A., Liu, J.S., Wong, W.H.: Sequential imputations and Bayesian missing data problems. J. Am. Stat. Assoc. 89(425), 278–288 (1994)

  • Kou, S.C., McCullagh, P.: Approximating the \(\alpha \)-permanent. Biometrika 96(3), 635–644 (2009)

  • L’Ecuyer, P., Rubino, G., Saggadi, S., Tuffin, B.: Approximate zero-variance importance sampling for static network reliability estimation. IEEE Trans. Reliab. 60(3), 590–604 (2011)

  • Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, New York (2001)

  • Liu, J.S., Chen, R.: Blind deconvolution via sequential imputations. J. Am. Stat. Assoc. 90(430), 567–576 (1995)

  • Liu, J.S., Chen, R., Logvinenko, T.: A theoretical framework for sequential importance sampling with resampling. In: Doucet, A., de Freitas, N., Gordon, N. (eds.) Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science, pp. 225–246. Springer, New York (2001)

  • Lomonosov, M.: On Monte Carlo estimates in network reliability. Probab. Eng. Inf. Sci. 8, 245–264 (1994)

  • Madow, W.G.: On the theory of systematic sampling, II. Ann. Math. Stat. 20(3), 333–354 (1949)

  • Madow, W.G., Madow, L.H.: On the theory of systematic sampling, I. Ann. Math. Stat. 15(1), 1–24 (1944)

  • Marshall, A.: The use of multi-stage sampling schemes in Monte Carlo computations. In: Meyer, H.A. (ed.) Symposium on Monte Carlo Methods. Wiley, Hoboken (1956)

  • Ó Ruanaidh, J.J.K., Fitzgerald, W.J.: Numerical Bayesian Methods Applied to Signal Processing. Springer, New York (1996)

  • Paige, B., Wood, F., Doucet, A., Teh, Y.W.: Asynchronous anytime sequential Monte Carlo. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 3410–3418. Curran Associates, Inc. (2014)

  • Rosén, B.: Asymptotic theory for order sampling. J. Stat. Plan. Inference 62(2), 135–158 (1997a)

  • Rosén, B.: On sampling with probability proportional to size. J. Stat. Plan. Inference 62(2), 159–191 (1997b)

  • Rosenbluth, M.N., Rosenbluth, A.W.: Monte Carlo calculation of the average extension of molecular chains. J. Chem. Phys. 23(2), 356–359 (1955)

  • Rubinstein, R.Y., Kroese, D.P.: Simulation and the Monte Carlo Method, 3rd edn. Wiley, New York (2017)

  • Sampford, M.R.: On sampling without replacement with unequal probabilities of selection. Biometrika 54(3–4), 499–513 (1967)

  • Tillé, Y.: Sampling Algorithms. Springer, New York (2006)

  • Vaisman, R., Kroese, D.P.: Stochastic enumeration method for counting trees. Methodol. Comput. Appl. Probab. 19(1), 31–73 (2017)

  • Wall, F.T., Erpenbeck, J.J.: New method for the statistical computation of polymer dimensions. J. Chem. Phys. 30(3), 634–637 (1959)

Acknowledgements

This work was supported by the Australian Research Council Centre of Excellence for Mathematical & Statistical Frontiers, under grant number CE140100049. The authors would like to thank the reviewers for their valuable comments, which improved the quality of this paper.

Author information

Correspondence to Rohan Shah.

Appendices

Appendix 1: Unbiasedness of sequential without-replacement Monte Carlo

Let \(h^*\left( {\mathbf {x}}_t\right) = \mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t\right] \). Note that

$$\begin{aligned} \sum _{{\mathbf {x}}_{t} \in \mathscr {S}_{t}\left( {\mathbf {x}}_{t-1}\right) } h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right)&= h^*\left( {\mathbf {x}}_{t-1}\right) f\left( {\mathbf {x}}_{t-1}\right) . \end{aligned}$$

Consider the expression

$$\begin{aligned} \sum _{{\mathbf {x}}_{t} \in \mathbf {S}_{t}} \frac{h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t} \pi ^i\left( {\mathbf {x}}_t\right) }, \end{aligned}$$
(20)

where \(1 \le t < d\). Let \(I_t\left( {\mathbf {x}}_t\right) \) be a binary variable, where \(I_t\left( {\mathbf {x}}_t\right) = 1\) indicates the inclusion of element \({\mathbf {x}}_{t}\) of \(\mathscr {S}_{t}\left( \mathbf {S}_{t-1}\right) \) in \(\mathbf {S}_t\). We can rewrite (20) as

$$\begin{aligned} \sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( \mathbf {S}_{t-1}\right) } I_t\left( {\mathbf {x}}_t\right) \frac{h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t} \pi ^i\left( {\mathbf {x}}_t\right) }. \end{aligned}$$
(21)

Recall that \(\mathbb {E}\left[ I_t\left( {\mathbf {x}}_t\right) \;\vert \;\mathbf {S}_{t-1}\right] = \pi ^t\left( {\mathbf {x}}_t\right) \). So the expectation of (21) conditional on \(\mathbf {S}_1, \ldots , \mathbf {S}_{t-1}\) is

$$\begin{aligned}&\sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( \mathbf {S}_{t-1}\right) } \frac{h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t-1} \pi ^i\left( {\mathbf {x}}_t\right) }\\&\quad = \sum _{{\mathbf {x}}_{t-1}\in \mathbf {S}_{t-1}} \frac{\sum _{{\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) }h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t-1} \pi ^i\left( {\mathbf {x}}_{t-1}\right) }\\&\quad = \sum _{{\mathbf {x}}_{t-1}\in \mathbf {S}_{t-1}} \frac{h^*\left( {\mathbf {x}}_{t-1}\right) f\left( {\mathbf {x}}_{t-1}\right) }{\prod _{i=1}^{t-1} \pi ^i\left( {\mathbf {x}}_{t-1}\right) }. \end{aligned}$$

So

$$\begin{aligned}&\mathbb {E}\left[ \sum _{{\mathbf {x}}_{t} \in \mathbf {S}_{t}} \frac{h^*\left( {\mathbf {x}}_t\right) f\left( {\mathbf {x}}_t\right) }{\prod _{i=1}^{t} \pi ^i\left( {\mathbf {x}}_t\right) }\;\vert \;\mathbf {S}_1, \ldots , \mathbf {S}_{t-1}\right] \nonumber \\&\quad = \sum _{{\mathbf {x}}_{t-1} \in \mathbf {S}_{t-1}} \frac{h^*\left( {\mathbf {x}}_{t-1}\right) f\left( {\mathbf {x}}_{t-1}\right) }{\prod _{i=1}^{t-1} \pi ^i\left( {\mathbf {x}}_{t-1}\right) } \end{aligned}$$
(22)

Applying Eq. (22) d times to

$$\begin{aligned} \widehat{\ell }&= \sum _{{\mathbf {x}}_d \in \mathbf {S}_d} \frac{h\left( {\mathbf {x}}_d\right) f\left( {\mathbf {x}}_d\right) }{\prod _{i=1}^{d-1} \pi ^i\left( {\mathbf {x}}_d\right) } = \sum _{{\mathbf {x}}_d \in \mathbf {S}_d} \frac{h^*\left( {\mathbf {x}}_d\right) f\left( {\mathbf {x}}_d\right) }{\prod _{i=1}^{d-1} \pi ^i\left( {\mathbf {x}}_d\right) } \end{aligned}$$

shows that \(\mathbb {E}\left[ \widehat{\ell }\right] = \ell \).
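
As a concrete check of the argument above, the following minimal sketch runs the sequential without-replacement scheme on a toy problem where everything is computable. The design choices here are assumptions for illustration, not the \(\pi \)ps designs studied in the paper: resampling is simple random sampling without replacement, so the inclusion probabilities \(\pi ^t = n/M\) are known exactly; the state space is the binary tree \(\{0,1\}^d\) with uniform law f; and h counts ones, so that \(\ell = d/2\).

```python
import random
import statistics

def wor_estimate(d=6, n=8, seed=0):
    # Sequential without-replacement sampling on the binary tree {0,1}^d.
    # Each unit is (path, pi_prod), where pi_prod is the accumulated product
    # of inclusion probabilities pi^1, ..., pi^t.
    rng = random.Random(seed)
    units = [((), 1.0)]
    for _ in range(d):
        # Expand every retained path into its two successors.
        candidates = [(path + (b,), pi) for path, pi in units for b in (0, 1)]
        if len(candidates) <= n:
            units = candidates                      # keep all: pi^t = 1
        else:
            m = len(candidates)                     # SRSWOR: pi^t = n / m
            units = [(path, pi * n / m)
                     for path, pi in rng.sample(candidates, n)]
    # Horvitz-Thompson form of the estimator: sum of h(x) f(x) / prod pi^i(x),
    # with f(x) = 2^{-d} uniform and h(x) the number of ones in the path.
    return sum(sum(path) * 2**-d / pi for path, pi in units)

estimates = [wor_estimate(seed=i) for i in range(2000)]
print(statistics.mean(estimates))   # close to ell = d/2 = 3.0
```

Averaging over replications reproduces \(\ell = 3\) up to Monte Carlo error, as the unbiasedness result predicts.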

Appendix 2: Unbiasedness of sequential without-replacement Monte Carlo, with merging

The proof is similar to “Appendix 1.” In this case, all the sample spaces and samples are sets of triples. Consider any expression of the form

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_t, w, p\right) \in \mathscr {T}_t\left( \mathbf {S}_{t-1}\right) } h^*\left( {\mathbf {x}}_t\right) w. \end{aligned}$$
(23)

It is clear that if the proposed merging rule is applied to \(\mathscr {T}_t\left( \mathbf {S}_{t-1}\right) \), then the value of (23) is unchanged. Using the definition of \(\mathscr {T}_t\left( \mathbf {S}_{t-1}\right) \), Eq. (23) can be written as

$$\begin{aligned}&\sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathbf {S}_{t-1}} w \sum _{{\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) } h^*\left( {\mathbf {x}}_t\right) \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathbf {S}_{t-1}} \frac{\mathbb {E}\left[ h^*\left( {\mathbf {X}}_t\right) \;\vert \;{\mathbf {X}}_{t-1} = {\mathbf {x}}_{t-1}\right] w}{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathbf {S}_{t-1}} \frac{h^*\left( {\mathbf {x}}_{t-1}\right) w}{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }. \end{aligned}$$
(24)

The expectation of (24) conditional on \(\mathbf {S}_{t-2}\) is

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathscr {T}_{t-1}\left( \mathbf {S}_{t-2}\right) } h^*\left( {\mathbf {x}}_{t-1}\right) w. \end{aligned}$$
(25)

So

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_{t}, w, p\right) \in \mathscr {T}_{t}\left( \mathbf {S}_{t-1}\right) } h^*\left( {\mathbf {x}}_{t}\right) w \;\vert \;\mathbf {S}_{t-2}\right] \nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w, p\right) \in \mathscr {T}_{t-1}\left( \mathbf {S}_{t-2}\right) } h^*\left( {\mathbf {x}}_{t-1}\right) w. \end{aligned}$$
(26)

Applying Eq. (26) \(d-1\) times to

$$\begin{aligned} \mathbb {E}\left[ \widehat{\ell } \;\vert \;\mathbf {S}_{d-1}\right] = \sum _{\left( {\mathbf {x}}_d, w, p\right) \in \mathscr {T}_d\left( \mathbf {S}_{d-1}\right) } h^*\left( {\mathbf {x}}_d\right) w \end{aligned}$$

shows that \(\widehat{\ell }\) is unbiased.
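
The step at the start of this proof, that merging leaves (23) unchanged, is easy to check numerically. The fragment below is a stripped-down sketch: units sharing a state (and hence sharing \(h^*\)) merge by adding their weights. The pair representation and the toy values of \(h^*\) are assumptions for illustration; the triples above additionally carry the size variable p, which is omitted here.

```python
# Units are (state, weight). Units with equal state have equal h*(state),
# so replacing them by one unit with the summed weight preserves
# the quantity sum_x h*(x) w, i.e. (23) above.
def merge(units):
    total = {}
    for x, w in units:
        total[x] = total.get(x, 0.0) + w
    return list(total.items())

h_star = {"s": 2.0, "t": -1.0}
units = [("s", 0.3), ("s", 0.5), ("t", 0.2)]
before = sum(h_star[x] * w for x, w in units)
after = sum(h_star[x] * w for x, w in merge(units))
assert abs(before - after) < 1e-12
```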

Appendix 3: Without-replacement sampling for the change-point example

We now give the details of the application of without-replacement sampling to the change-point example in Sect. 1. Recall that \({\mathbf {X}}_d = \left\{ X_t \right\} _{t=1}^d\) is a Markov chain and \({\mathbf {Y}}_d = \left\{ Y_t \right\} _{t=1}^d\) are the observations. Let f be the joint density of \({\mathbf {X}}_d\) and \({\mathbf {Y}}_d\). Note that

$$\begin{aligned} f\left( {\mathbf {x}}_{t}\;\vert \;{\mathbf {y}}_{t}\right)&= c_t f\left( {\mathbf {x}}_{t-1} \;\vert \;{\mathbf {y}}_{t-1}\right) f\left( x_{t}\;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_{t} \;\vert \;x_{t}\right) , \end{aligned}$$
(27)
$$\begin{aligned} f\left( {\mathbf {x}}_{1}\;\vert \;{\mathbf {y}}_{1}\right)&= c_1 f\left( x_{1}\right) f\left( y_{1} \;\vert \;x_{1}\right) , \end{aligned}$$
(28)

for some unknown constants \(\left\{ c_t\right\} _{t=1}^d\). Define the size variables recursively as

$$\begin{aligned} p\left( {\mathbf {x}}_t\right)&= p\left( {\mathbf {x}}_{t-1}\right) \frac{f\left( x_{t} \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }, \end{aligned}$$
(29)
$$\begin{aligned} p\left( x_1\right)&= f\left( x_1\right) f\left( y_1 \;\vert \;x_1\right) . \end{aligned}$$
(30)

This updating rule is slightly different from that given in (17). Equations (28) and (30) require an initial distribution for \(X_1 = \left( C_1, O_1\right) \), which we take to be

$$\begin{aligned} \mathbb {P}\left( C_1 = 2, O_1 = 2\right) = \frac{1}{250}, \quad \mathbb {P}\left( C_1 = 1, O_1 = 1\right) = \frac{249}{250}. \end{aligned}$$

Define

$$\begin{aligned} {\mathscr {U}}_1 = {\mathscr {U}}_1\left( \emptyset \right) = \left\{ \left( x_1, f\left( x_1\right) f\left( y_1 \;\vert \;x_1\right) \right) :x_1 \in {\mathscr {S}}_1\right\} , \end{aligned}$$

and let \(\mathbf {S}_1\) be a sample chosen from \(\mathscr {U}_1\), with probability proportional to the last component. Assume that sample \(\mathbf {S}_{t-1}\) has been chosen, and let

$$\begin{aligned} \mathscr {U}_{t}\left( \mathbf {S}_{t-1}\right)&= \left\{ \left( {\mathbf {x}}_t, w \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\right) :\right. \\ \left( {\mathbf {x}}_{t-1}, w\right)&\left. \in \mathbf {S}_{t-1}, {\mathbf {x}}_t \in \mathrm {Support}\left( {\mathbf {X}}_t \;\vert \;{\mathbf {X}}_{t-1} = {\mathbf {x}}_{t-1}\right) \right\} . \end{aligned}$$

We account for the unknown normalizing constants in (27) by using an estimator of the form (12). This results in Algorithm 5.

[Algorithm 5: without-replacement sampling for the change-point example; pseudocode figure not reproduced]
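
Since the pseudocode figure is not reproduced, the following is a minimal sketch of the structure of Algorithm 5, under stated assumptions: init, trans and lik are hypothetical stand-ins for \(f\left( x_1\right) \), \(f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) \) and \(f\left( y_t \;\vert \;x_t\right) \); resampling is simple random sampling without replacement rather than the designs used in the paper; and each retained weight is divided by its inclusion probability at selection time, which plays the role of the \(\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) \) division in \(\mathscr {U}_{t}\left( \mathbf {S}_{t-1}\right) \) and of the final \(\pi ^d\left( {\mathbf {x}}_d\right) \) division in Proposition 2.

```python
import random

def resample_wor(units, n, rng):
    # Keep everything when possible (pi = 1); otherwise SRSWOR with pi = n/m,
    # dividing each retained weight by its inclusion probability.
    if len(units) <= n:
        return units
    m = len(units)
    return [(path, w * m / n) for path, w in rng.sample(units, n)]

def algorithm5_sketch(y, init, trans, lik, n, rng):
    # U_1: pairs (path, w) with w = f(x1) f(y1 | x1).
    units = [((x1,), p1 * lik(y[0], x1)) for x1, p1 in init().items()]
    units = resample_wor(units, n, rng)
    for t in range(1, len(y)):
        # U_t(S_{t-1}): extend each path, multiplying the weight by
        # f(x_t | x_{t-1}) f(y_t | x_t), as in (29).
        units = [(path + (x,), w * q * lik(y[t], x))
                 for path, w in units
                 for x, q in trans(path[-1]).items()]
        units = resample_wor(units, n, rng)
    return units

# Toy two-state chain, purely illustrative.
init = lambda: {0: 0.9, 1: 0.1}
trans = lambda x: {0: 0.8, 1: 0.2} if x == 0 else {0: 0.3, 1: 0.7}
lik = lambda yt, x: 0.7 if yt == x else 0.3
S = algorithm5_sketch([0, 0, 1, 1], init, trans, lik, n=8, rng=random.Random(1))
print(sum(w for _, w in S))   # unbiased estimate of f(y) (take h = 1)
```

In the algorithm proper the weights are the size variables \(p\left( {\mathbf {x}}_t\right) \) of (29)–(30) and the design is probability proportional to size; only the sampling step differs from this sketch.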

Proposition 2

The set \(\mathbf {S}_d\) generated by Algorithm 5 has the property that

$$\begin{aligned} \mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_d, w\right) \in \mathbf {S}_d}\frac{h\left( {\mathbf {x}}_d\right) w}{\pi ^d\left( {\mathbf {x}}_d\right) }\right]&= \mathbb {E}\left( h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {Y}}_d\right) \prod _{t=1}^d c_t^{-1}. \end{aligned}$$

Proof

Define

$$\begin{aligned} H\left( {\mathbf {x}}_t\right)&= \frac{\mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right] f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_d\right) }{f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_t\right) \prod _{i=t+1}^d c_i}. \end{aligned}$$

Using (27),

$$\begin{aligned}&\sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( {\mathbf {x}}_{t-1}\right) }H\left( {\mathbf {x}}_t\right) f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) \\&\quad = \sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( {\mathbf {x}}_{t-1}\right) }\frac{\mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right] f\left( {\mathbf {x}}_t\;\vert \;{\mathbf {y}}_d\right) }{f\left( {\mathbf {x}}_{t-1}\;\vert \;{\mathbf {y}}_{t-1}\right) \prod _{i=t}^d c_i}\\&\quad = \frac{\mathbb {E}\left[ h\left( {\mathbf {X}}_d\right) \;\vert \;{\mathbf {X}}_{t-1} = {\mathbf {x}}_{t-1}, {\mathbf {Y}}_d = {\mathbf {y}}_d\right] f\left( {\mathbf {x}}_{t-1}\;\vert \;{\mathbf {y}}_d\right) }{f\left( {\mathbf {x}}_{t-1}\;\vert \;{\mathbf {y}}_{t-1}\right) \prod _{i=t}^d c_i}\\&\quad = H\left( {\mathbf {x}}_{t-1}\right) . \end{aligned}$$

Consider any expression of the form

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_{t}, w\right) \in \mathscr {U}_{t}\left( \mathbf {S}_{t-1}\right) }H\left( {\mathbf {x}}_t\right) w. \end{aligned}$$
(31)

Equation (31) can be written as

$$\begin{aligned}&\sum _{\left( {\mathbf {x}}_{t-1}, w\right) \in \mathbf {S}_{t-1}}\sum _{{\mathbf {x}}_t \in \mathscr {S}_{t}\left( {\mathbf {x}}_{t-1}\right) }H\left( {\mathbf {x}}_t\right) w \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w\right) \in \mathbf {S}_{t-1}}\frac{w H\left( {\mathbf {x}}_{t-1}\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }. \end{aligned}$$
(32)

The expectation of (32) conditional on \(\mathbf {S}_{t-2}\) is

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_{t-1}, w\right) \in \mathscr {U}_{t-1}\left( \mathbf {S}_{t-2}\right) }H\left( {\mathbf {x}}_{t-1}\right) w. \end{aligned}$$

So

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_{t}, w\right) \in \mathscr {U}_{t}\left( \mathbf {S}_{t-1}\right) }H\left( {\mathbf {x}}_t\right) w\;\vert \;\mathbf {S}_{t-2} \right] \nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w\right) \in \mathscr {U}_{t-1}\left( \mathbf {S}_{t-2}\right) }H\left( {\mathbf {x}}_{t-1}\right) w. \end{aligned}$$
(33)

Applying Eq. (33) \(d-1\) times to

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_d, w\right) \in \mathbf {S}_d}\frac{h\left( {\mathbf {x}}_d\right) w}{\pi ^d\left( {\mathbf {x}}_d\right) }\;\vert \;\mathbf {S}_{d-1}\right] \\&\quad = \sum _{\left( {\mathbf {x}}_d, w\right) \in \mathscr {U}_{d}\left( \mathbf {S}_{d-1}\right) }h\left( {\mathbf {x}}_d\right) w\\&\quad = \sum _{\left( {\mathbf {x}}_d, w\right) \in \mathscr {U}_{d}\left( \mathbf {S}_{d-1}\right) }H\left( {\mathbf {x}}_d\right) w \end{aligned}$$

completes the proof. \(\square \)

We now describe the merging step outlined in Fearnhead and Clifford (2003), applied to the estimation of the posterior change-point probabilities

$$\begin{aligned} \left\{ \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right\} _{t=1}^d. \end{aligned}$$

The method we describe here extends straightforwardly to also estimate \(\left\{ \mathbb {P}\left( O_t = 2 \;\vert \;{\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right\} _{t=1}^d\).

In order to perform this merging, we must add more information to all the sample spaces and the samples chosen from them. The extended space will have \({\mathbf {x}}_t\) as the first entry, the particle weight w as the second entry, and a vector \(\mathbf m_t\) of t values as the third entry. This last entry is an estimate of \(\left\{ \mathbb {P}\left( C_i = 2 \;\vert \;{\mathbf {y}}_t\right) \right\} _{i=1}^t\). Let

$$\begin{aligned} \mathscr {V}_1&= \left\{ \left( x_1, f\left( x_1\right) f\left( y_1 \;\vert \;x_1\right) , \mathbb {P}\left( C_1 = 2 \;\vert \;x_1\right) \right) :x_1 \in \mathscr {S}_1\right\} . \end{aligned}$$

Note that the third component of every element of \(\mathscr {V}_1\) is either 0 or 1. Let \(\mathbf {S}_1\) be a sample drawn from \(\mathscr {V}_1\), with probability proportional to the second element. Assume that sample \(\mathbf {S}_{t-1}\) has been chosen, and let \(\mathscr {V}_t\left( \mathbf {S}_{t-1}\right) \) be

$$\begin{aligned}&\left\{ \left( {\mathbf {x}}_t, w \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;{\mathbf {x}}_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }, \left( \mathbf m_{t-1}, \right. \right. \right. \\&\quad \left. \left. \left. \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right) \right) :\right. \\&\quad \left. \left( {\mathbf {x}}_{t-1}, w, \mathbf m_{t-1}\right) \in \mathbf {S}_{t-1}, {\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) \right\} . \end{aligned}$$

We can now define Algorithm 6, which uses the merging step outlined in Proposition 4.

[Algorithm 6: without-replacement sampling with merging for the change-point example; pseudocode figure not reproduced]

Proposition 3

If the merging step is omitted, then the set \(\mathbf {S}_d\) generated by Algorithm 6 has the property that

$$\begin{aligned} \mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_d, w, \mathbf m_d\right) \in \mathbf {S}_d}\frac{\mathbf m_d w}{\pi ^d\left( {\mathbf {x}}_d\right) }\right] = \frac{\left\{ \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right\} _{t=1}^d}{\prod _{t=1}^d c_t}. \end{aligned}$$

Proof

Define

$$\begin{aligned} G\left( {\mathbf {x}}_t, \mathbf m_t\right)&= \left( \mathbf m_t, \mathbb {P}\left( C_{t+1} = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) , \right. \\&\qquad \left. \ldots , \mathbb {P}\left( C_{d} = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right) \\&\quad \times \,\frac{f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_d\right) }{f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_t\right) \prod _{i=t+1}^d c_i}. \end{aligned}$$

It can be shown that

$$\begin{aligned}&\sum _{{\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) } G\left( {\mathbf {x}}_t, \left( {\mathbf {m}}_{t-1}, \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right) \right) \\&\quad \times \,f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;x_t\right) \\&\quad = G\left( {\mathbf {x}}_{t-1}, {\mathbf {m}}_{t-1}\right) . \end{aligned}$$

Consider any expression of the form

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_t, w, \mathbf m_t\right) \in \mathscr {V}_t\left( \mathbf {S}_{t-1}\right) } G\left( {\mathbf {x}}_t, \mathbf m_t\right) w. \end{aligned}$$
(34)

Equation (34) can be written as

$$\begin{aligned}&\sum _{\left( {\mathbf {x}}_{t-1}, w, {\mathbf {m}}_{t-1}\right) \in \mathbf {S}_{t-1}}w\sum _{{\mathbf {x}}_t \in \mathscr {S}_t\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad G\left( {\mathbf {x}}_t, \left( {\mathbf {m}}_{t-1}, \mathbb {P}\left( C_t = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \right) \right) \nonumber \\&\quad \times \frac{f\left( x_t \;\vert \;{\mathbf {x}}_{t-1}\right) f\left( y_t \;\vert \;{\mathbf {x}}_t\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }\nonumber \\&\quad = \sum _{\left( {\mathbf {x}}_{t-1}, w, {\mathbf {m}}_{t-1}\right) \in \mathbf {S}_{t-1}}w\frac{G\left( {\mathbf {x}}_{t-1}, \mathbf m_{t-1}\right) }{\pi ^{t-1}\left( {\mathbf {x}}_{t-1}\right) }. \end{aligned}$$
(35)

The expectation of (35) conditional on \(\mathbf {S}_{t-2}\) is

$$\begin{aligned} \sum _{\left( {\mathbf {x}}_{t-1}, w, {\mathbf {m}}_{t-1}\right) \in \mathscr {V}_{t-1}\left( \mathbf {S}_{t-2}\right) }w G\left( {\mathbf {x}}_{t-1}, \mathbf m_{t-1}\right) . \end{aligned}$$

So

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_t, w, \mathbf m_t\right) \in \mathscr {V}_t\left( \mathbf {S}_{t-1}\right) } G\left( {\mathbf {x}}_t, \mathbf m_t\right) w \;\vert \;\mathbf {S}_{t-2}\right] \nonumber \\&= \sum _{\left( {\mathbf {x}}_{t-1}, w, {\mathbf {m}}_{t-1}\right) \in \mathscr {V}_{t-1}\left( \mathbf {S}_{t-2}\right) }w G\left( {\mathbf {x}}_{t-1}, \mathbf m_{t-1}\right) . \end{aligned}$$
(36)

Applying Eq. (36) \(d-1\) times to

$$\begin{aligned}&\mathbb {E}\left[ \sum _{\left( {\mathbf {x}}_d, w, \mathbf m_d\right) \in \mathbf {S}_d}\frac{\mathbf m_d w}{\pi ^d\left( {\mathbf {x}}_d\right) }\;\vert \;\mathbf {S}_{d-1}\right] \\&\quad =\,\sum _{\left( {\mathbf {x}}_{d}, w, {\mathbf {m}}_{d}\right) \in \mathscr {V}_{d}\left( \mathbf {S}_{d-1}\right) }w G\left( {\mathbf {x}}_{d}, \mathbf m_{d}\right) \end{aligned}$$

completes the proof. \(\square \)

Proposition 4

Assume we have two units \(\left( {\mathbf {x}}_t, w, \mathbf m_t\right) \) and \(\left( {\mathbf {x}}_t', w', \mathbf m_t'\right) \), both corresponding to paths of the Markov chain with \(C_t = 2\) and \(O_t = 2\). Then we can remove these units and replace them with the single unit

$$\begin{aligned} \left( {\mathbf {x}}_t, w + w', \frac{w \mathbf m_t + w' \mathbf m_t'}{w + w'}\right) . \end{aligned}$$

This rule also applies if both units correspond to \(C_t = 2\) and \(O_t = 1\).

Proof

Under the specified conditions on \({\mathbf {x}}_t\) and \({\mathbf {x}}_t'\),

$$\begin{aligned}&\mathbb {P}\left( C_i = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t, {\mathbf {Y}}_d = {\mathbf {y}}_d\right) \\&\quad = \mathbb {P}\left( C_i = 2 \;\vert \;{\mathbf {X}}_t = {\mathbf {x}}_t', {\mathbf {Y}}_d = {\mathbf {y}}_d\right) ,&\forall t+ 1 \le i \le d, \\&\quad f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_t\right) = f\left( {\mathbf {x}}_t \;\vert \;{\mathbf {y}}_d\right) ,\\&\quad f\left( {\mathbf {x}}_t' \;\vert \;{\mathbf {y}}_t\right) = f\left( {\mathbf {x}}_t' \;\vert \;{\mathbf {y}}_d\right) . \end{aligned}$$

This shows that

$$\begin{aligned}&\left( w + w'\right) G\left( {\mathbf {x}}_t, \frac{w {\mathbf {m}}_t + w' {\mathbf {m}}_t'}{w + w'}\right) \\&\quad = w G\left( {\mathbf {x}}_t, {\mathbf {m}}_t\right) + w' G\left( {\mathbf {x}}_t', {\mathbf {m}}_t'\right) . \end{aligned}$$

So replacement of this pair of units by the specified single unit does not bias the resulting estimator. \(\square \)
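
As an implementation-level sketch of Proposition 4, the fragment below merges units whose paths end in the same \(\left( C_t, O_t\right) = \left( 2, o\right) \) state, summing the weights and weight-averaging the estimate vectors. The representation of \({\mathbf {x}}_t\) as a tuple of \(\left( c, o\right) \) pairs is an assumption made for illustration.

```python
def merge_units(units):
    # units: list of (x, w, m), where x is a path of (c, o) pairs and m is
    # the vector of running change-point probability estimates.
    out, index = [], {}
    for x, w, m in units:
        c, o = x[-1]
        if c == 2 and (c, o) in index:
            # Proposition 4: replace the pair of units by
            # (x, w + w', (w m + w' m') / (w + w')).
            i = index[(c, o)]
            x0, w0, m0 = out[i]
            out[i] = (x0, w0 + w,
                      [(w0 * a + w * b) / (w0 + w) for a, b in zip(m0, m)])
        else:
            if c == 2:
                index[(c, o)] = len(out)
            out.append((x, w, list(m)))
    return out

# Two paths both ending in (C_t, O_t) = (2, 2) merge into one unit.
units = [(((1, 1), (2, 2)), 0.4, [0.0, 1.0]),
         (((2, 1), (2, 2)), 0.1, [1.0, 1.0])]
print(merge_units(units))   # [(..., 0.5, [0.2, 1.0])]
```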

Cite this article

Shah, R., Kroese, D.P. Without-replacement sampling for particle methods on finite state spaces. Stat Comput 28, 633–652 (2018). https://doi.org/10.1007/s11222-017-9752-8
