Abstract
We present a new approach for random sampling of contingency tables of any size and with any constraints, based on a recently introduced probabilistic divide-and-conquer (PDC) technique. Our first application is a recursive PDC: it samples the least significant bit of each entry in the table, motivated by the fact that the bits of a geometric random variable are independent. The second application uses PDC deterministic second half, where one divides the sample space into two pieces, one of which is deterministic conditional on the other; this approach is highlighted via an exact sampling algorithm in the \(2\times n\) case. Finally, we also present a generalization of the sampling algorithm in which each entry of the table has a specified marginal distribution.
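The independence of the bits of a geometric random variable, which motivates the recursive PDC, can be checked numerically. The following is a minimal sketch, not the authors' algorithm: it assumes the parameterization \(P(X=k)=(1-q)q^k\) for \(k=0,1,2,\ldots\), under which bit \(i\) of \(X\) is Bernoulli with success probability \(q^{2^i}/(1+q^{2^i}\)). The function names `bit_prob`, `joint_bit_pmf`, and `sample_low_bits`, and the truncation point `kmax`, are illustrative choices.

```python
import random
from itertools import product


def bit_prob(q, i):
    """P(bit i of X = 1) when P(X = k) = (1 - q) * q**k, k = 0, 1, 2, ..."""
    r = q ** (2 ** i)
    return r / (1.0 + r)


def joint_bit_pmf(q, nbits, kmax=2000):
    """Joint law of the lowest `nbits` bits of X, summed directly from the pmf.

    The sum is truncated at kmax (an assumption); the neglected tail is q**kmax.
    """
    pmf = {bits: 0.0 for bits in product((0, 1), repeat=nbits)}
    for k in range(kmax):
        bits = tuple((k >> i) & 1 for i in range(nbits))
        pmf[bits] += (1.0 - q) * q ** k
    return pmf


def sample_low_bits(q, nbits, rng=random):
    """Draw X mod 2**nbits by sampling each of the lowest bits independently."""
    return sum((rng.random() < bit_prob(q, i)) << i for i in range(nbits))


if __name__ == "__main__":
    q, nbits = 0.7, 3
    pmf = joint_bit_pmf(q, nbits)
    for bits, p in sorted(pmf.items()):
        # Product of the Bernoulli marginals, to compare with the joint law.
        prod = 1.0
        for i, b in enumerate(bits):
            prod *= bit_prob(q, i) if b else 1.0 - bit_prob(q, i)
        print(bits, round(p, 6), round(prod, 6))
    print("one draw of X mod 8:", sample_low_bits(q, nbits))
```

The two printed columns agree for every bit pattern, which is the product form that allows the least significant bit of each entry to be sampled separately from the remaining bits.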
Notes
One must still fill those blocks, which can be done, e.g., using a permutation of s.
Acknowledgements
The authors gratefully acknowledge helpful discussions with Chris Anderson, Richard Arratia, and Jesus de Loera, and thank Igor Pak for help with the literature.
About this article
Cite this article
DeSalvo, S., Zhao, J. Random sampling of contingency tables via probabilistic divide-and-conquer. Comput Stat 35, 837–869 (2020). https://doi.org/10.1007/s00180-019-00899-7
DOI: https://doi.org/10.1007/s00180-019-00899-7