A class of statistical models to weaken independence in two-way contingency tables

Metrika

Abstract

In this paper we study a new class of statistical models for contingency tables, defined through a subset of the binomial equations of the classical independence model. We prove that these models are log-linear, and we use notions from algebraic statistics to compute their sufficient statistic and their parametric representation. Moreover, we show how to compute maximum likelihood estimates and how to perform exact inference through the Diaconis-Sturmfels algorithm. Examples show that these models can be useful in a wide range of applications.
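
To make the phrase "a subset of the binomial equations" concrete, here is a minimal Python sketch. It assumes the standard description of the independence model for a two-way table by the vanishing of all 2×2 minors, p_ij p_kl − p_il p_kj = 0 for i < k and j < l. The function names (`binomial_minors`, `satisfies`) and both example tables are hypothetical illustrations and are not taken from the paper; the particular subset chosen below is arbitrary and only shows how a table can satisfy some binomials without being independent.

```python
from fractions import Fraction
from itertools import combinations


def binomial_minors(p):
    """Map (i, k, j, l) with i < k, j < l to p[i][j]*p[k][l] - p[i][l]*p[k][j]."""
    rows, cols = len(p), len(p[0])
    return {
        (i, k, j, l): p[i][j] * p[k][l] - p[i][l] * p[k][j]
        for i, k in combinations(range(rows), 2)
        for j, l in combinations(range(cols), 2)
    }


def satisfies(p, subset):
    """True if every binomial indexed by `subset` vanishes exactly on table p."""
    minors = binomial_minors(p)
    return all(minors[key] == 0 for key in subset)


# A rank-one (independent) 3x3 table: outer product of two marginal vectors.
row = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
col = [Fraction(1, 10), Fraction(2, 5), Fraction(1, 2)]
independent = [[r * c for c in col] for r in row]

# A hypothetical 2x3 table in which only the first two columns are proportional.
weak = [
    [Fraction(1, 13), Fraction(2, 13), Fraction(3, 13)],
    [Fraction(2, 13), Fraction(4, 13), Fraction(1, 13)],
]

print(satisfies(independent, binomial_minors(independent).keys()))
print(satisfies(weak, [(0, 1, 0, 1)]))
print(satisfies(weak, binomial_minors(weak).keys()))
```

Exact rational arithmetic (`Fraction`) is used so that a binomial vanishes exactly, without a floating-point tolerance. The table `weak` satisfies the single binomial on columns 0 and 1 but not the full set, so it lies in a model strictly larger than independence.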

Author information

Corresponding author

Correspondence to Fabio Rapallo.

About this article

Cite this article

Carlini, E., Rapallo, F. A class of statistical models to weaken independence in two-way contingency tables. Metrika 73, 1–22 (2011). https://doi.org/10.1007/s00184-009-0262-3

  • DOI: https://doi.org/10.1007/s00184-009-0262-3
