Abstract
In this paper we study a new class of statistical models for contingency tables. We define this class of models through a subset of the binomial equations of the classical independence model. We prove that they are log-linear and we use some notions from Algebraic Statistics to compute their sufficient statistic and their parametric representation. Moreover, we show how to compute maximum likelihood estimates and to perform exact inference through the Diaconis-Sturmfels algorithm. Examples show that these models can be useful in a wide range of applications.
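As a minimal sketch of two ingredients named in the abstract, the Python code below (i) evaluates the binomial equations of the classical independence model, that is, the 2×2 minors p_ij p_kl − p_il p_kj, and (ii) runs a Diaconis-Sturmfels style Metropolis walk over two-way tables with fixed margins. It covers only the full independence model with its well-known basic moves; the Markov bases of the weakened models studied in the paper are not given in the abstract and are not reproduced here. All function names and the example table are hypothetical, chosen for illustration.

```python
import math
import random


def independence_binomials(p):
    """Return every 2x2 minor p[i][j]*p[k][l] - p[i][l]*p[k][j] of table p."""
    rows, cols = len(p), len(p[0])
    return [
        p[i][j] * p[k][l] - p[i][l] * p[k][j]
        for i in range(rows) for k in range(i + 1, rows)
        for j in range(cols) for l in range(j + 1, cols)
    ]


def margins(table):
    """Row and column sums: the sufficient statistic under independence."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]


def log_weight(table):
    """Log of the hypergeometric weight 1 / prod(n_ij!) on the fibre."""
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def metropolis_step(table, rng):
    """One Metropolis step using a basic move on a random 2x2 subtable."""
    rows, cols = len(table), len(table[0])
    i, k = rng.sample(range(rows), 2)
    j, l = rng.sample(range(cols), 2)
    sign = rng.choice((1, -1))
    proposal = [row[:] for row in table]
    proposal[i][j] += sign
    proposal[k][l] += sign
    proposal[i][l] -= sign
    proposal[k][j] -= sign
    if min(min(row) for row in proposal) < 0:
        return table  # proposal has a negative cell: reject and stay
    log_ratio = log_weight(proposal) - log_weight(table)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return table


# A rank-one probability table satisfies all the binomial equations.
row_probs, col_probs = [0.2, 0.8], [0.5, 0.3, 0.2]
rank_one = [[a * b for b in col_probs] for a in row_probs]
minors = independence_binomials(rank_one)

# Hypothetical 3x3 table of counts; the walk keeps its margins fixed.
rng = random.Random(0)
observed = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
current = observed
for _ in range(1000):
    current = metropolis_step(current, rng)
```

Each move adds +1 and −1 on the corners of a 2×2 subtable, so row and column sums never change; evaluating a test statistic at the tables visited by the walk would give a Monte Carlo approximation of an exact conditional p-value. For a model defined by only a subset of the binomials, the set of moves would have to be replaced by a Markov basis of that model, for example one computed with software such as 4ti2 or CoCoA.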
References
Agresti A (1992) Modelling patterns of agreement and disagreement. Stat Methods Med Res 1: 201–218
Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New York
Aoki S, Takemura A (2005) Markov chain Monte Carlo exact tests for incomplete two-way contingency tables. J Stat Comput Simul 75(10): 787–812
Bigatti A, La Scala R, Robbiano L (1999) Computing toric ideals. J Symb Comput 27: 351–365
Bishop YM, Fienberg S, Holland PW (1975) Discrete multivariate analysis: theory and practice. MIT Press, Cambridge
Carlini E, Rapallo F (2009) Algebraic modelling of category distinguishability. In: Gibilisco P, Riccomagno E, Rogantin MP (eds) Algebraic and geometric methods in statistics. Cambridge University Press, London (in press)
Chen Y, Dinwoodie I, Dobra A, Huber M (2005) Lattice points, contingency tables, and sampling. In: Integer points in polyhedra—geometry, number theory, algebra, optimization, Contemp. Math., vol 37. Amer. Math. Soc., Providence, pp. 65–78
CoCoATeam (2007) CoCoA: a system for doing computations in commutative algebra. Available at http://cocoa.dima.unige.it
Cox D, Little J, O’Shea D (1992) Ideals, varieties, and algorithms. Springer, New York
Darroch JN, McCloud PI (1986) Category distinguishability and observer agreement. Aust J Stat 28(3): 371–388
De Loera J, Haws D, Hemmecke R, Huggins P, Tauzer J, Yoshida R (2003) A user’s guide for LattE v1.1. Software package LattE is available at http://www.math.ucdavis.edu/~latte/
Diaconis P, Sturmfels B (1998) Algebraic algorithms for sampling from conditional distributions. Ann Stat 26(1): 363–397
Duffy D (2006) The gllm package. Available from http://cran.r-project.org, 0.31 edn
Fienberg S (1980) The analysis of cross-classified categorical data. MIT Press, Cambridge
Fienberg SE, Rinaldo A (2007) Three centuries of categorical data analysis: log-linear models and maximum likelihood estimation. J Stat Plan Inference 137: 3430–3445
Fienberg SE, Hersh P, Rinaldo A, Zhou Y (2009) Maximum likelihood estimation in latent class models. In: Gibilisco P, Riccomagno E, Rogantin MP (eds) Algebraic and geometric methods in statistics. Cambridge University Press, London (in press)
Fingleton B (1984) Models of category counts. Cambridge University Press, Cambridge
Garcia LD, Stillman M, Sturmfels B (2005) Algebraic geometry of Bayesian networks. J Symb Comput 39: 331–355
Geiger D, Heckerman D, King H, Meek C (2001) Stratified exponential families: graphical models and model selection. Ann Stat 29(3): 505–529
Geiger D, Meek C, Sturmfels B (2006) On the toric algebra of graphical models. Ann Stat 34(3): 1463–1492
Govaert G, Nadif M (2007) Clustering of contingency table and mixture model. Eur J Oper Res 183(3): 1055–1066
Greenacre MJ (1988) Clustering the rows and columns of a contingency table. J Classif 5: 39–51
Gurevich G, Vexler A (2005) Change point problems in the model of logistic regression. J Stat Plan Inference 131(2): 313–331
Haberman SJ (1974) The analysis of frequency data. The University of Chicago Press, Chicago
Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67: 123–129
Hosten S, Sullivant S (2004) Ideals of adjacent minors. J Algebra 277: 615–642
Jeong HC, Jhun M, Kim D (2005) Bootstrap tests for independence in two-way ordinal contingency tables. Comput Stat Data Anal 48: 623–631
Kreuzer M, Robbiano L (2000) Computational commutative algebra 1. Springer, Berlin
Le CT (1998) Applied categorical data analysis. Wiley, New York
Pachter L, Sturmfels B (2005) Algebraic statistics for computational biology. Cambridge University Press, New York
Pistone G, Riccomagno E, Wynn HP (2001) Algebraic statistics: computational commutative algebra in statistics. Chapman & Hall/CRC, Boca Raton
R Development Core Team (2006) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org, ISBN 3-900051-07-0
Rapallo F (2003) Algebraic Markov bases and MCMC for two-way contingency tables. Scand J Stat 30(2): 385–397
Rapallo F (2007) Toric statistical models: binomial and parametric representations. Ann Inst Stat Math 59(4): 727–740
Rinaldo A (2005) Maximum likelihood estimates in large sparse contingency tables. Ph.D. thesis, Department of Statistics, Carnegie Mellon University
Ritschard G, Zighed DA (2003) Simultaneous row and column partitioning: the scope of a heuristic approach. In: Zhong N, Ras Z, Tsumoto S, Suzuki E (eds) Foundations of Intelligent Systems, ISMIS03. Springer, Heidelberg, pp 468–472
Sturmfels B (2007) Open problems in algebraic statistics, arXiv:0707.4558v1
4ti2 team (2007) 4ti2—a software package for algebraic, geometric and combinatorial problems on linear spaces. Available at http://www.4ti2.de
Cite this article
Carlini, E., Rapallo, F. A class of statistical models to weaken independence in two-way contingency tables. Metrika 73, 1–22 (2011). https://doi.org/10.1007/s00184-009-0262-3