
Fibers of multi-way contingency tables given conditionals: relation to marginals, cell bounds and Markov bases

Abstract

A fiber of a contingency table is the space of all realizations of the table under a given set of constraints, such as marginal totals. Understanding the geometry of this space is a key problem in algebraic statistics, important for conducting exact conditional inference, calculating cell bounds, imputing missing cell values, and assessing the risk of disclosure of sensitive information. Motivated by disclosure problems, in this paper we study the space of all possible tables for a given sample size and set of observed conditional frequencies. We show that this space can be decomposed according to the different possible marginals, which, in turn, are encoded by the solution set of a linear Diophantine equation. This decomposition has two important consequences: (1) we derive new cell bounds, some of which have connections to directed acyclic graphs, and (2) we describe a structure for the Markov bases of the given space that simplifies their calculation in this particular setting.
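
To make the decomposition concrete, here is a minimal sketch for the simplest case: a two-way table whose observed conditionals are P(B = j | A = i) = c_ij for every row i, with every row total assumed positive. Requiring each c_ij · n_{i+} to be an integer forces n_{i+} = d_i k_i, where d_i is the least common multiple of the reduced denominators in row i, so the admissible row marginals are the solutions of the linear Diophantine equation d_1 k_1 + ... + d_I k_I = n with every k_i ≥ 1. Each solution determines exactly one table, and cell bounds are the componentwise minimum and maximum over those tables. The function names (`row_steps`, `diophantine_solutions`, `fiber`, `cell_bounds`) are hypothetical, the brute-force enumeration stands in for specialized Diophantine solvers or lattice-point counting software, and this is an illustration under those assumptions rather than the authors' procedure. It needs Python 3.9+ for the multi-argument `math.lcm`.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+ accepts several arguments


def row_steps(cond):
    """For each row i, the smallest positive row total d_i such that
    c_ij * d_i is an integer for every j (lcm of the reduced denominators)."""
    return [lcm(*(c.denominator for c in row)) for row in cond]


def diophantine_solutions(d, n):
    """All tuples (k_1, ..., k_I) with every k_i >= 1 and sum_i d_i * k_i == n."""
    if not d:
        return [()] if n == 0 else []
    first, rest = d[0], d[1:]
    min_rest = sum(rest)  # each remaining row contributes at least d_i
    sols = []
    k = 1
    while first * k + min_rest <= n:
        for tail in diophantine_solutions(rest, n - first * k):
            sols.append((k,) + tail)
        k += 1
    return sols


def fiber(cond, n):
    """Every table with conditionals P(B=j | A=i) = cond[i][j] and grand
    total n, returned as (row margins, table) pairs, one per solution."""
    if any(sum(row) != 1 for row in cond):
        raise ValueError("each row of conditionals must sum to 1")
    d = row_steps(cond)
    tables = []
    for ks in diophantine_solutions(d, n):
        margins = [di * ki for di, ki in zip(d, ks)]
        # The conditionals and the row total determine the row uniquely.
        table = [[int(c * m) for c in row] for row, m in zip(cond, margins)]
        tables.append((margins, table))
    return tables


def cell_bounds(tables):
    """Lower and upper bounds for each cell over the fully enumerated fiber."""
    if not tables:
        raise ValueError("empty fiber: no table is compatible with the data")
    cells = [t for _, t in tables]
    shape = [len(row) for row in cells[0]]
    lower = [[min(t[i][j] for t in cells) for j in range(w)] for i, w in enumerate(shape)]
    upper = [[max(t[i][j] for t in cells) for j in range(w)] for i, w in enumerate(shape)]
    return lower, upper


cond = [[Fraction(1, 2), Fraction(1, 2)],
        [Fraction(1, 3), Fraction(2, 3)]]
tables = fiber(cond, 30)
print(len(tables))          # 4
print(cell_bounds(tables))  # ([[3, 3], [2, 4]], [[12, 12], [8, 16]])
```

Here d = (2, 3), and the admissible (k_1, k_2) for n = 30 are (3, 8), (6, 6), (9, 4) and (12, 2). They differ by multiples of (3, -2), the generator of the integer kernel of (2, 3), so consecutive tables differ by the single move [[3, 3], [-2, -4]]. This only hints at the Markov-basis structure discussed in the paper; it is not a reconstruction of it.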


Notes

  1. We note that our Diophantine equation does not necessarily satisfy the hypothesis of the main result of Chen and Li (2007).

References

  1. 4ti2 Team. (2014). 4ti2—a software package for algebraic, geometric and combinatorial problems on linear spaces. http://www.4ti2.de.

  2. Agresti, A. (2002). Categorical data analysis. New Jersey: Wiley.

  3. Aoki, S., Takemura, A. (2002). Minimal basis for connected Markov chain over \(3 \times 3 \times K\) contingency tables with fixed two-dimensional marginals. Australian and New Zealand Journal of Statistics, 45, 229–249.

  4. Aoki, S., Takemura, A. (2008). Minimal invariant Markov basis for sampling contingency tables with fixed marginals. Annals of the Institute of Statistical Mathematics, 60, 229–256.

  5. Arnold, B., Castillo, E., Sarabia, J. (1999). Conditional specification of statistical models. Berlin: Springer.

  6. Barvinok, A. I. (1994). A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed. Mathematics of Operations Research, 19, 769–779. doi:10.1287/moor.19.4.769.

  7. Barvinok, A., Luria, Z., Samorodnitsky, A., Yong, A. (2010). An approximation algorithm for counting contingency tables. Random Structures and Algorithms, 37, 25–66.

  8. Bishop, Y., Fienberg, S., Holland, P. (2007). Discrete multivariate analysis. New York: Springer.

  9. Chen, S., Li, N. (2007). On a conjecture about the number of solutions to linear Diophantine equations with a positive integer parameter. arXiv:0710.0177.

  10. Chen, Y., Dinwoodie, I., Sullivant, S. (2006). Sequential importance sampling for multiway tables. Annals of Statistics, 34, 523–545.

  11. De Loera, J. A., Onn, S. (2006). Markov bases of three-way tables are arbitrarily complicated. Journal of Symbolic Computation, 41, 173–181. doi:10.1016/j.jsc.2005.04.010.

  12. De Loera, J. A., Hemmecke, R., Tauzer, J., Yoshida, R. (2004). Effective lattice point counting in rational convex polytopes. Journal of Symbolic Computation, 38, 1273–1302.

  13. Diaconis, P., Sturmfels, B. (1998). Algebraic algorithms for sampling from conditional distributions. The Annals of Statistics, 26, 363–397.

  14. Dobra, A. (2003). Markov bases for decomposable graphical models. Bernoulli, 9(6), 1093–1108.

  15. Dobra, A. (2012). Dynamic Markov bases. Journal of Computational and Graphical Statistics, 21, 496–517.

  16. Dobra, A., Fienberg, S. (2000). Bounds for cell entries in contingency tables given marginal totals and decomposable graphs. Proceedings of the National Academy of Sciences, 97, 11885–11892.

  17. Dobra, A., Fienberg, S. E. (2010). The generalized Shuttle algorithm. In P. Gibilisco, E. Riccomagno, M. P. Rogantin, H. P. Wynn (Eds.), Algebraic and geometric methods in statistics (pp. 135–156). UK: Cambridge University Press.

  18. Dobra, A., Tebaldi, C., West, M. (2006). Data augmentation in multi-way contingency tables with fixed marginal totals. Journal of Statistical Planning and Inference, 136(2), 355–372.

  19. Dobra, A., Fienberg, S., Rinaldo, A., Slavković, A., Zhou, Y. (2008). Algebraic statistics and contingency table problems: log-linear models, likelihood estimation and disclosure limitation. In M. Putinar, S. Sullivant (Eds.), IMA volumes in mathematics and its applications: emerging applications of algebraic geometry (Vol. 149, pp. 63–88). New York: Springer.

  20. Doyle, P., Lane, J., Theeuwes, J., Zayatz, L. (2001). Confidentiality, disclosure and data access. USA: North Holland.

  21. Drton, M., Sturmfels, B., Sullivant, S. (2009). Lectures on algebraic statistics. Oberwolfach seminars, Vol. 40. Basel: Birkhäuser.

  22. Eisenbeis, C., Temam, O., Wijshoff, H. (1992). On efficiently characterizing solutions of linear Diophantine equations and its application to data dependence analysis. Technical Report No. RUU-CS-92-01. Utrecht University.

  23. Hemmecke, R., Malkin, P. N. (2009). Computing generating sets of lattice ideals and Markov bases of lattices. Journal of Symbolic Computation, 44, 1463–1476. http://dblp.uni-trier.de/db/journals/jsc/jsc44.html#HemmeckeM09.

  24. Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E. S., Spicer, K., et al. (2012). Statistical disclosure control. West Sussex: Wiley.

  25. Lasserre, J. B., Zeron, E. S. (2007). Simple explicit formula for counting lattice points of polyhedra. In IPCO ’07: Proceedings of the 12th international conference on integer programming and combinatorial optimization (pp. 367–381). Berlin, Heidelberg: Springer. doi:10.1007/978-3-540-72792-7_28.

  26. LattE. (2014). LattE machiato—lattice point enumeration. http://www.math.ucdavis.edu/~mkoeppe/latte/.

  27. Lauritzen, S. (1996). Graphical models. USA: Oxford University Press.

  28. Lazebnik, F. (1996). On systems of linear Diophantine equations. Mathematics Magazine, 69, 261–266. http://www.jstor.org/stable/2690528.

  29. Lee, J. (2009). Sampling contingency tables given sets of conditionals and marginals in the context of statistical disclosure limitation. PhD thesis, Penn State University.

  30. Malkin, P. (2007). Computing Markov bases, Gröbner bases, and extreme rays. PhD thesis. http://edoc.bib.ucl.ac.be:81/ETD-db/collection/available/BelnUcetd-06222007-144602/unrestricted/thesis.pdf.

  31. Marjoram, P., Molitor, J., Plagnol, V., Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences of the United States of America, 100(26), 15324–15328.

  32. Morito, S., Salkin, H. M. (1980). Using the Blankinship algorithm to find the general solution of a linear Diophantine equation. Acta Informatica, 13, 379–382.

  33. Morton, J. (2013). Relations among conditional probabilities. Journal of Symbolic Computation, 50, 478–492. doi:10.1016/j.jsc.2012.02.005.

  34. R Development Core Team. (2005). R: a language and environment for statistical computing. ISBN 3-900051-07-0. http://www.R-project.org.

  35. Richardson, T., Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, 30(4), 962–1030.

  36. Sertoz, S. (1998). On the number of solutions of a Diophantine equation of Frobenius. Discrete Mathematics and Applications, 8, 153–162.

  37. Slavković, A. (2004). Statistical disclosure limitation beyond the margins: characterization of joint distributions for contingency tables. PhD thesis, Carnegie Mellon University.

  38. Slavković, A. (2009). Partial information releases for confidential contingency table entries: present and future research efforts. Journal of Privacy and Confidentiality, 1(2), 253–264.

  39. Slavković, A. B., Fienberg, S. E. (2004). Bounds for cell entries in two-way tables given conditional relative frequencies. In J. Domingo-Ferrer, V. Torra (Eds.), Privacy in statistical databases—PSD 2004, lecture notes in computer science No. 3050 (pp. 30–43). Berlin: Springer.

  40. Slavković, A. B., Fienberg, S. E. (2010). Algebraic geometry of \(2 \times 2\) contingency tables. In P. Gibilisco, E. Riccomagno, M. P. Rogantin, H. P. Wynn (Eds.), Algebraic and geometric methods in statistics (pp. 63–81). UK: Cambridge University Press.

  41. Slavković, A., Lee, J. (2010). Synthetic two-way contingency table preserving conditional frequencies. Statistical Methodology, 7, 225–239.

  42. Slavković, A., Sullivant, S. (2006). The space of compatible full conditionals is a unimodular toric variety. Journal of Symbolic Computation, 41, 196–209.

  43. Smarandache, F. (2000). Integer algorithms to solver Diophantine linear equations and systems. http://arxiv.org/abs/math/0010134.

  44. Smucker, B., Slavković, A. (2008). Cell bounds in two-way contingency tables based on conditional frequencies. In J. Domingo-Ferrer, Y. Saygın (Eds.), Proceedings of the UNESCO Chair in Data Privacy International Conference—PSD 2008, lecture notes in computer science No. 5262 (pp. 64–76). Berlin: Springer.

  45. Smucker, B., Slavković, A., Zhu, X. (2012). Cell bounds in multi-way contingency tables based on conditional frequencies. Journal of Official Statistics, 28, 121–140.

  46. Sturmfels, B., Weismantel, R., Ziegler, G. (1994). Gröbner bases of lattices, corner polyhedra, and integer programming. Berlin: Konrad-Zuse-Zentrum für Informationstechnik.

  47. Thibaudeau, Y. (2003). An algorithm for computing full rank minimal sufficient statistics with applications to confidentiality protection. In Monographs of official statistics, work session on statistical data confidentiality, Vol. 1. Luxembourg: Eurostat.

  48. Whittaker, J. (1990). Graphical models in applied multivariate statistics. New York: Wiley.

Acknowledgments

A. Slavković and X. Zhu were supported in part by NSF grants SES-052407 and BCS-0941553 to the Pennsylvania State University. S. Petrović was supported in part by grant FA9550-12-1-0392 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA). The authors would like to thank the reviewers for their valuable insights and comments.

Author information

Correspondence to Aleksandra Slavković.

About this article

Cite this article

Slavković, A., Zhu, X. & Petrović, S. Fibers of multi-way contingency tables given conditionals: relation to marginals, cell bounds and Markov bases. Ann Inst Stat Math 67, 621–648 (2015). https://doi.org/10.1007/s10463-014-0471-z

Keywords

  • Conditional tables
  • Contingency tables
  • Diophantine equations
  • Disclosure limitation
  • Directed acyclic graphs
  • Marginal tables
  • Markov bases
  • Optimization for cell entries