Hybrid schemes for exact conditional inference in discrete exponential families

Annals of the Institute of Statistical Mathematics

Abstract

Exact conditional goodness-of-fit tests for discrete exponential family models can be conducted via Monte Carlo estimation of p values by sampling from the conditional distribution of multiway contingency tables. The two most popular methods for such sampling are Markov chain Monte Carlo (MCMC) and sequential importance sampling (SIS). In this work we consider various ways to hybridize the two schemes and propose one strategy that stands out as a good general-purpose method for conducting inference. The proposed method runs many parallel chains initialized at SIS samples spread across the fiber (the set of tables sharing the observed sufficient statistics). When a Markov basis is unavailable, the proposed scheme uses a lattice basis with intermittent SIS proposals to guarantee irreducibility and asymptotic unbiasedness. The scheme alleviates many of the challenges faced by the MCMC and SIS schemes individually while largely retaining their strengths. It also provides diagnostics that guide the procedure and lend it credibility. Simulations demonstrate the viability of the approach.
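
To make the hybrid idea concrete, here is a minimal sketch for the simplest case only: two-way r × c tables under the independence model, tested with Pearson's X² statistic. It is not the authors' implementation. Several chains are each started at an SIS draw; most steps are Metropolis steps using the basic ±1 moves on 2 × 2 minors, and an occasional step is an independence Metropolis–Hastings step that proposes a fresh SIS table. The function names (`sis_walk`, `basis_step`, `sis_step`, `hybrid_pvalue`), the uniform cell-by-cell SIS proposal, and all tuning defaults are hypothetical choices made for illustration. For two-way independence the basic moves already form a Markov basis (Diaconis and Sturmfels 1998), so here the SIS steps only demonstrate the mechanism; they are not needed for irreducibility.

```python
import math
import random


def log_target(x):
    """Unnormalized log-probability of table x under independence, conditional
    on its margins: pi(x) is proportional to 1 / prod_ij x_ij!."""
    return -sum(math.lgamma(v + 1) for row in x for v in row)


def pearson_stat(x, rows, cols):
    """Pearson's X^2 with expected counts rows[i] * cols[j] / n."""
    n = sum(rows)
    return sum((x[i][j] - ri * cj / n) ** 2 / (ri * cj / n)
               for i, ri in enumerate(rows) for j, cj in enumerate(cols) if ri * cj > 0)


def sis_walk(rows, cols, rng=None, table=None):
    """Fill a table with the given margins row by row. With rng, each free cell
    is drawn uniformly from its feasible range (a simple, assumed SIS proposal);
    with table, that table is replayed instead. Returns (table, log q(table))."""
    col_left = list(cols)
    out, log_q = [], 0.0
    for i, r in enumerate(rows):
        if i == len(rows) - 1:              # last row is forced by the column margins
            out.append(list(col_left))
            break
        row, row_left = [], r
        for j in range(len(cols)):
            if j == len(cols) - 1:          # last cell of a row is forced
                v = row_left
            else:
                lo = max(0, row_left - sum(col_left[j + 1:]))
                hi = min(row_left, col_left[j])
                log_q -= math.log(hi - lo + 1)
                v = rng.randint(lo, hi) if table is None else table[i][j]
            row.append(v)
            row_left -= v
            col_left[j] -= v
        out.append(row)
    return out, log_q


def basis_step(x, rng):
    """Metropolis step with a random basic move (+1/-1 on a 2x2 minor). The move
    set is symmetric, so the acceptance ratio is pi(y) / pi(x)."""
    if len(x) < 2 or len(x[0]) < 2:
        return x
    i, k = rng.sample(range(len(x)), 2)
    j, m = rng.sample(range(len(x[0])), 2)
    y = [row[:] for row in x]
    y[i][j] += 1
    y[k][m] += 1
    y[i][m] -= 1
    y[k][j] -= 1
    if y[i][m] < 0 or y[k][j] < 0:          # move leaves the fiber: stay put
        return x
    accept = math.exp(min(0.0, log_target(y) - log_target(x)))
    return y if rng.random() < accept else x


def sis_step(x, rows, cols, rng):
    """Independence Metropolis-Hastings step with the SIS sampler as proposal:
    accept y with probability min(1, pi(y) q(x) / (pi(x) q(y)))."""
    y, log_qy = sis_walk(rows, cols, rng=rng)
    _, log_qx = sis_walk(rows, cols, table=x)
    log_ratio = log_target(y) - log_target(x) + log_qx - log_qy
    return y if rng.random() < math.exp(min(0.0, log_ratio)) else x


def hybrid_pvalue(obs, n_chains=8, n_steps=2000, burn_in=200, p_sis=0.05, seed=1):
    """Estimate P(X^2 >= observed X^2) from several chains, each started at an
    SIS draw. Also returns per-chain estimates as a crude agreement diagnostic."""
    rng = random.Random(seed)
    rows = [sum(r) for r in obs]
    cols = [sum(c) for c in zip(*obs)]
    t_obs = pearson_stat(obs, rows, cols)
    per_chain = []
    for _ in range(n_chains):
        x, _ = sis_walk(rows, cols, rng=rng)  # initialize at an SIS sample
        hits = 0
        for step in range(n_steps):
            if rng.random() < p_sis:
                x = sis_step(x, rows, cols, rng)
            else:
                x = basis_step(x, rng)
            if step >= burn_in:
                hits += pearson_stat(x, rows, cols) >= t_obs - 1e-9
        per_chain.append(hits / (n_steps - burn_in))
    return sum(per_chain) / n_chains, per_chain
```

For example, `hybrid_pvalue([[3, 1], [1, 3]])` should return an estimate near 34/70 ≈ 0.486, which is the exact conditional p value of Pearson's statistic for that table (its fiber contains only five tables). Each kernel leaves the conditional distribution invariant, so choosing one at random at every step does too. Disagreement among the per-chain estimates is one rough signal of poor mixing; the diagnostics proposed in the paper may differ.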

[Figs. 1–12 not reproduced here]

Notes

  1. This is the corrected value; the value printed in that article is generally known to have been a typographical error.

References

  • Agresti, A. (1992). A survey of exact inference for contingency tables. Statistical Science, 7(1), 131–153.

  • Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken: Wiley.

  • Aoki, S., Hara, H., Takemura, A. (2012). Markov bases in algebraic statistics (Vol. 199). New York: Springer.

  • Baldoni, V., Berline, N., De Loera, J., Dutra, B., Köppe, M., Moreinis, S., Pinto, G., Vergne, M., Wu, J. (2014). A user’s guide for LattE integrale v1.7.2. URL: http://www.math.ucdavis.edu/~latte/.

  • Bélisle, C. J., Romeijn, H. E., Smith, R. L. (1993). Hit-and-run algorithms for generating multivariate distributions. Mathematics of Operations Research, 18(2), 255–266.

  • Berkelaar, M., Eikland, K., Notebaert, P. (2015). lpSolve: Interface to Lp_solve v.5.5 to solve linear/integer programs. http://CRAN.R-project.org/package=lpSolve, R package version 5.6.11.

  • Bishop, Y. M. M., Fienberg, S. E., Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge: The MIT Press.

  • Booth, J. G., Butler, R. W. (1999). An importance sampling algorithm for exact conditional tests in log-linear models. Biometrika, 86(2), 321–332.

  • Boyett, J. M. (1979). Algorithm AS 144: Random \(r \times c\) tables with given row and column totals. Journal of the Royal Statistical Society Series C-Applied Statistics, 28(3), 329–332.

  • Brooks, S. P., Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.

  • Caffo, B. (2013). exactLoglinTest: Monte Carlo exact tests for log-linear models. http://CRAN.R-project.org/package=exactLoglinTest, R package version 1.4.2.

  • Caffo, B. S., Booth, J. G. (2001). A Markov chain Monte Carlo algorithm for approximating exact conditional probabilities. Journal of Computational and Graphical Statistics, 10(4), 730–745.

  • Chen, Y., Diaconis, P., Holmes, S. P., Liu, J. S. (2005a). Sequential Monte Carlo methods for statistical analysis of tables. Journal of the American Statistical Association, 100(469), 109–120.

  • Chen, Y., Dinwoodie, I., Dobra, A., Huber, M. (2005b). Lattice points, contingency tables, and sampling. Contemporary Mathematics, 374, 65–78.

  • Chen, Y., Dinwoodie, I., Sullivant, S. (2006). Sequential importance sampling for multiway tables. The Annals of Statistics, 34(1), 523–545.

  • Clarkson, D. B., Fan, Y., Joe, H. (1993). A remark on Algorithm 643: FEXACT: An algorithm for performing Fisher’s exact test in \(r \times c\) contingency tables. ACM Transactions on Mathematical Software, 19(4), 484–488.

  • Cox, D., Little, J., O’Shea, D. (1997). Ideals, varieties, and algorithms (2nd ed.). New York: Springer.

  • De Loera, J., Onn, S. (2005). Markov bases of three-way tables are arbitrarily complicated. Journal of Symbolic Computation, 41(2), 173–181.

  • Diaconis, P., Sturmfels, B. (1998). Algebraic algorithms for sampling from conditional distributions. The Annals of Statistics, 26(1), 363–397.

  • Dobra, A. (2003). Markov bases for decomposable graphical models. Bernoulli, 9(6), 1093–1108.

  • Dobra, A., Sullivant, S. (2004). A divide-and-conquer algorithm for generating Markov bases of multi-way tables. Computational Statistics, 19, 347–366.

  • Drton, M., Sturmfels, B., Sullivant, S. (2009). Lectures on algebraic statistics. Boston: Birkhauser Basel.

  • Eddelbuettel, D. (2013). Seamless R and C++ integration with Rcpp. New York: Springer.

  • Eddelbuettel, D., François, R. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8), 1–18.

  • Fisher, R. A. (1922a). On the interpretation of \(\chi^2\) from contingency tables, and the calculation of p. Journal of the Royal Statistical Society, 85(1), 87–94.

  • Fisher, R. A. (1922b). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A, Containing Papers of a Mathematical or Physical Character (pp. 309–368).

  • Fisher, R. A. (1934). Statistical methods for research workers (5th ed.). Edinburgh: Oliver & Boyd.

  • Gelman, A., Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

  • Halton, J. H. (1969). A rigorous derivation of the exact contingency formula. Mathematical Proceedings of the Cambridge Philosophical Society, 65(2), 527–530.

  • Hara, H., Takemura, A., Yoshida, R. (2010). On connectivity of fibers with positive marginals in multiple logistic regression. Journal of Multivariate Analysis, 101(4), 909–925.

  • Hara, H., Aoki, S., Takemura, A. (2012). Running Markov chain without Markov basis. In Proceedings of the second CREST-SBM international conference, Harmony of Gröbner bases and the modern industrial society, Singapore (pp. 19–34).

  • Kahle, D., Garcia-Puente, L., Yoshida, R. (2015). algstat: Algebraic statistics in R. http://CRAN.R-project.org/package=algstat, R package version 0.1.0.

  • Kahle, T., Rauh, J. (2011). The Markov bases database. http://www.markov-bases.de.

  • Lange, K. (2010). Numerical analysis for statisticians (2nd ed.). New York: Springer.

  • Lehmann, E. L., Romano, J. P. (2005). Testing statistical hypotheses (3rd ed.). New York: Springer.

  • Liu, J. S. (2008). Monte Carlo strategies in scientific computing. New York: Springer.

  • Lunn, D., Jackson, C., Best, N., Thomas, A., Spiegelhalter, D. (2012). The BUGS book: A practical introduction to Bayesian analysis. Boca Raton: CRC Press.

  • Mehta, C. R., Patel, N. R. (1986). Algorithm 643: FEXACT: A Fortran subroutine for Fisher’s exact test on unordered \(r \times c\) contingency tables. ACM Transactions on Mathematical Software, 12(2), 154–161.

  • Patefield, W. M. (1981). Algorithm AS 159: An efficient method of generating random \(r \times c\) tables with given row and column totals. Journal of the Royal Statistical Society Series C-Applied Statistics, 30(1), 91–97.

  • Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine Series 5, 50(302), 157–175.

  • R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.

  • Read, T. R., Cressie, N. (1988). Goodness-of-fit statistics for discrete multivariate data. New York: Springer.

  • Schrijver, A. (1986). Theory of linear and integer programming. Chichester: Wiley.

  • Sheskin, D. J. (2007). Handbook of parametric and nonparametric statistical procedures (4th ed.). Boca Raton: Chapman and Hall/CRC Press.

  • Snee, R. D. (1974). Graphical display of two-way contingency tables. The American Statistician, 28(1), 9–12.

  • Snijders, T. (1991). Enumeration and simulation methods for 0–1 matrices with given marginals. Psychometrika, 56(3), 397–417.

  • Sturmfels, B. (1996). Gröbner bases and convex polytopes (Vol. 8). Providence: American Mathematical Society.

  • 4ti2 team (2008). 4ti2: A software package for algebraic, geometric and combinatorial problems on linear spaces. http://www.4ti2.de.

Acknowledgements

D. K. and R. Y. are supported by the National Science Foundation under Grant Nos. 1622449 and 1622369, respectively. The authors would like to thank an anonymous referee for suggesting that the validity of the scheme should be considered through the lens of unbiasedness.

Author information

Corresponding author

Correspondence to David Kahle.

About this article

Cite this article

Kahle, D., Yoshida, R. & Garcia-Puente, L. Hybrid schemes for exact conditional inference in discrete exponential families. Ann Inst Stat Math 70, 983–1011 (2018). https://doi.org/10.1007/s10463-017-0615-z

  • DOI: https://doi.org/10.1007/s10463-017-0615-z
