Random sampling of contingency tables via probabilistic divide-and-conquer

  • Original paper
  • Published in: Computational Statistics

Abstract

We present a new approach for random sampling of contingency tables of any size and constraints based on a recently introduced probabilistic divide-and-conquer (PDC) technique. Our first application is a recursive PDC: it samples the least significant bit of each entry in the table, motivated by the fact that the bits of a geometric random variable are independent. The second application is via PDC deterministic second half, where one divides the sample space into two pieces, one of which is deterministic conditional on the other; this approach is highlighted via an exact sampling algorithm in the \(2\times n\) case. Finally, we also present a generalization of the sampling algorithm in which each entry of the table has a specified marginal distribution.
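
The deterministic-second-half idea can be illustrated in the \(2\times n\) case with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' published algorithm: the function name `sample_two_row_table` and its parameters are hypothetical, and the target is assumed to be the uniform distribution over nonnegative integer tables with the given margins. The first row is sampled in all but the last column, the last entry is then forced by the row sum, and the whole table is determined by the column sums.

```python
import random


def sample_two_row_table(row_sums, col_sums, rng=random):
    """Sketch: uniform sample of a 2 x n nonnegative integer table.

    Hypothetical helper, not taken from the paper. `rng` is anything
    with a `randint(a, b)` method, e.g. the `random` module itself.
    """
    r1, r2 = row_sums
    if not col_sums:
        raise ValueError("need at least one column")
    if min(r1, r2) < 0 or any(c < 0 for c in col_sums):
        raise ValueError("margins must be nonnegative")
    if r1 + r2 != sum(col_sums):
        raise ValueError("row sums and column sums must agree")

    while True:
        # First piece: top-row entries in all but the last column,
        # independent and uniform on {0, ..., c_j}.
        top = [rng.randint(0, c) for c in col_sums[:-1]]
        # Second piece: the last top-row entry is deterministic
        # given the first piece and the row sum r1.
        last = r1 - sum(top)
        # Accept exactly when the forced entry is feasible; every
        # accepted first piece has the same probability, so the
        # accepted table is uniform over all valid tables.
        if 0 <= last <= col_sums[-1]:
            top.append(last)
            # The bottom row is forced by the column sums.
            bottom = [c - x for c, x in zip(col_sums, top)]
            return [top, bottom]


table = sample_two_row_table((3, 4), (2, 2, 3), random.Random(0))
print(table)
```

Compared with sampling every top-row entry independently and rejecting unless the row sum matches, forcing the last entry saves a factor of \(c_n + 1\) in the expected number of attempts, where \(c_n\) is the last column sum; placing the largest column sum last would therefore be a natural choice in this sketch.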

Notes

  1. One must still fill those blocks, which can be done, e.g., using a permutation of s.

Acknowledgements

The authors gratefully acknowledge helpful discussions with Chris Anderson, Richard Arratia, and Jesus de Loera, and thank Igor Pak for help with the literature.

Author information

Corresponding author

Correspondence to Stephen DeSalvo.

Cite this article

DeSalvo, S., Zhao, J. Random sampling of contingency tables via probabilistic divide-and-conquer. Comput Stat 35, 837–869 (2020). https://doi.org/10.1007/s00180-019-00899-7

  • DOI: https://doi.org/10.1007/s00180-019-00899-7