Goodness of fit for log-linear network models: dynamic Markov bases using hypergraphs

Annals of the Institute of Statistical Mathematics

Abstract

Social networks and other sparse data sets pose significant challenges for statistical inference, since many standard statistical methods for testing model/data fit are not applicable in such settings. Algebraic statistics offers a theoretically justified approach to goodness-of-fit testing that relies on the theory of Markov bases. Most current practices require the computation of the entire basis, which is infeasible in many practical settings. We present a dynamic approach to explore the fiber of a model, which bypasses this issue, and is based on the combinatorics of hypergraphs arising from the toric algebra structure of log-linear models. We demonstrate the approach on the Holland–Leinhardt \(p_1\) model for random directed graphs that allows for reciprocation effects.
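
For orientation, the two objects named above can be written in standard notation. The symbols \(A\), \(u\), \(n\), \(\mathcal{F}(u)\) and \(\mathcal{B}\) are introduced here for illustration only and do not appear in the abstract. Assume \(A\) is the integer design matrix of the log-linear model and \(u\) is the observed table with \(n\) cells, so that \(Au\) is its vector of sufficient statistics. The fiber of \(u\) is then

\[
\mathcal{F}(u) = \{\, v \in \mathbb{Z}_{\ge 0}^{n} : Av = Au \,\},
\]

and a Markov basis is a finite set \(\mathcal{B} \subset \ker_{\mathbb{Z}} A\) such that any two tables in the same fiber are connected by a sequence of moves \(\pm b\), \(b \in \mathcal{B}\), each of which stays inside the fiber (Diaconis and Sturmfels, 1998).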

Notes

  1. Recall that a directed edge (uv) is called reciprocated if (vu) is also in the network. Otherwise, (uv) is called unreciprocated. In subsequent figures we sometimes draw reciprocated edges as undirected to reduce clutter.

  2. The skeleton of a graph \(G=(V,E)\) is the graph obtained by replacing the directed edges in E with their undirected counterparts and then removing multiple edges.
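
The two definitions in these notes can be illustrated with a minimal Python sketch. It assumes a directed graph given as a list of ordered pairs with no self-loops; the function names `reciprocated_edges` and `skeleton` are hypothetical and are not taken from the article or its supplementary material.

```python
def reciprocated_edges(edges):
    """Return the directed edges (u, v) whose reverse (v, u) is also present.

    Assumes `edges` is an iterable of ordered pairs with no self-loops.
    """
    edge_set = set(edges)
    return {(u, v) for (u, v) in edge_set if (v, u) in edge_set}


def skeleton(edges):
    """Return the skeleton as a set of unordered vertex pairs.

    Each directed edge is replaced by its undirected counterpart; storing the
    pairs as frozensets in a set removes the resulting multiple edges.
    """
    return {frozenset((u, v)) for (u, v) in edges}
```

For example, on the edge list `[(1, 2), (2, 1), (2, 3), (3, 1)]` the edges between vertices 1 and 2 are reciprocated, the other two edges are unreciprocated, and the skeleton is the undirected triangle on the vertices 1, 2 and 3.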

References

  • Aoki, S., Takemura, A. (2003). Minimal basis for a connected Markov chain over \(3\times 3\times k\) contingency tables with fixed two-dimensional marginals. Australian & New Zealand Journal of Statistics, 45(2), 229–249.

  • Aoki, S., Takemura, A. (2005). Markov chain Monte Carlo exact tests for incomplete two-way contingency tables. Journal of Statistical Computation and Simulation, 75(10), 787–812.

  • Aoki, S., Hara, H., Takemura, A. (2012). Markov bases in algebraic statistics. Springer Series in Statistics. New York: Springer.

  • Baird, D., Ulanowicz, R. (1989). The seasonal dynamics of the Chesapeake Bay ecosystem. Ecological Monographs, 59, 329–364.

  • Bishop, Y. M., Fienberg, S. E., Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. New York: Springer.

  • Chatterjee, S., Diaconis, P., Sly, A. (2011). Random graphs with a given degree sequence. Annals of Applied Probability, 21(4), 1400–1435.

  • Chen, Y., Dinwoodie, I. H., Sullivant, S. (2005). Sequential importance sampling for multiway tables. Annals of Statistics, 34, 523–545.

  • Csardi, G., Nepusz, T. (2006). The igraph software package for complex network research. International Journal of Complex Systems, 1695.

  • Develin, M., Sullivant, S. (2003). Markov bases of binary graph models. Annals of Combinatorics, 7(4), 441–466.

  • Diaconis, P., Sturmfels, B. (1998). Algebraic algorithms for sampling from conditional distributions. Annals of Statistics, 26(1), 363–397.

  • Dinwoodie, I. H., Chen, Y. (2011). Sampling large tables with constraints. Statistica Sinica, 21, 1591–1609.

  • Dobra, A. (2003). Markov bases for decomposable graphical models. Bernoulli, 9(6), 1093–1108.

  • Dobra, A. (2012). Dynamic Markov bases. Journal of Computational and Graphical Statistics, 21(2), 496–517.

  • Dobra, A., Sullivant, S. (2004). A divide-and-conquer algorithm for generating Markov bases of multi-way tables. Computational Statistics, 19, 347–366.

  • Dobra, A., Fienberg, S. E., Rinaldo, A., Slavković, A., Zhou, Y. (2008). Algebraic statistics and contingency table problems: Log-linear models, likelihood estimation and disclosure limitation. Emerging applications of algebraic geometry (pp. 63–88). IMA Volumes in Mathematics and its Applications, vol. 149. New York: Springer Verlag.

  • Drton, M., Sturmfels, B., Sullivant, S. (2009). Lectures on algebraic statistics, Oberwolfach Seminars, vol 39. Springer, Basel. doi:10.1007/978-3-7643-8905-5.

  • Fienberg, S. E., Wasserman, S. S. (1981). Discussion of Holland, P. W. and Leinhardt, S. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76, 54–57.

  • Fienberg, S. E., Petrović, S., Rinaldo, A. (2010). Algebraic statistics for \(p_1\) random graph models: Markov bases and their uses. Looking Back. Proceedings of a Conference in Honor of Paul W. Holland, chapter 1, Lecture Notes in Statistics - Proceedings, vol. 202. New York: Springer.

  • Goldenberg, A., Zheng, A. X., Fienberg, S. E., Airoldi, E. M. (2009). A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2), 129–233.

  • Gross, E., Petrović, S. (2013). Combinatorial degree bound for toric ideals of hypergraphs. International Journal of Algebra and Computation, 23(6), 1503–1520.

  • Gross, E., Petrović, S., Stasi, D. (2014). Goodness of fit for log-linear network models: supplementary material. http://math.iit.edu/~spetrov1/DynamicP1supplement/. Accessed 18 Mar 2016.

  • Haberman, S. J. (1981). An exponential family of probability distributions for directed graphs: Comment. Journal of the American Statistical Association, 76(373), 60–61.

  • Hara, H., Takemura, A. (2010). Connecting tables with zero-one entries by a subset of a Markov basis. In M. Viana, H. Wynn (Eds.), Algebraic methods in statistics and probability II, Contemporary Mathematics (Vol. 516, pp. 199–213). Providence: American Mathematical Society.

  • Hara, H., Takemura, A., Yoshida, R. (2009a). Markov bases for two-way subtable sum problems. Journal of Pure and Applied Algebra, 213(8), 1507–1521.

  • Hara, H., Takemura, A., Yoshida, R. (2009b). A Markov basis for conditional test of common diagonal effect in quasi-independence model for square contingency tables. Computational Statistics & Data Analysis, 53(4), 1006–1014.

  • Hara, H., Aoki, S., Takemura, A. (2010). Minimal and minimal invariant Markov bases of decomposable models for contingency tables. Bernoulli, 16(1), 208–233.

  • Hara, H., Aoki, S., Takemura, A. (2012). Running Markov chain without Markov basis. In T. Hibi (Ed.), Harmony of Gröbner bases and the modern industrial society. Singapore: World Scientific.

  • Haws, D., Martin del Campo, A., Takemura, A., Yoshida, R. (2014). Markov degree of the three-state toric homogeneous Markov chain model. Beiträge zur Algebra und Geometrie/Contributions to Algebra and Geometry, 55, 161–188.

  • Holland, P. W., Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association, 76(373), 33–65.

  • Hunter, D. R., Goodreau, S. M., Handcock, M. S. (2008). Goodness of fit of social network models. Journal of the American Statistical Association, 103(481), 248–258.

  • Král, D., Norine, S., Pangrác, O. (2010). Markov bases of binary graph models of \(K_4\)-minor free graphs. Journal of Combinatorial Theory, Series A, 117(6), 759–765.

  • Kushimba, S., Chaggar, H., Gross, E., Kunyu, G. (2013). Social networks of mobile money in Kenya. Working Paper 2013-1, Institute for Money, Technology, and Financial Inclusion, Irvine.

  • Norén, P. (2015). The three-state toric homogeneous Markov chain model has Markov degree two. Journal of Symbolic Computation, 68(2), 285–296.

  • Ogawa, M., Hara, H., Takemura, A. (2013). Graver basis for an undirected graph and its application to testing the beta model of random graphs. Annals of the Institute of Statistical Mathematics, 65(1), 191–212.

  • Pajek (2004a). Food webs. http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/foodweb.htm. Accessed 18 Mar 2016.

  • Pajek (2004b). Sampson’s monastery dataset. http://vlado.fmf.uni-lj.si/pub/networks/data/esna/sampson.htm. Accessed 18 Mar 2016.

  • Petrović, S., Stasi, D. (2014). Toric algebra of hypergraphs. Journal of Algebraic Combinatorics, 39(1), 187–208.

  • Petrović, S., Rinaldo, A., Fienberg, S. E. (2010). Algebraic statistics for a directed random graph model with reciprocation. In M. A. G. Viana, H. Wynn (Eds.), Algebraic methods in statistics and probability II, Contemporary Mathematics (Vol. 516). Providence: American Mathematical Society.

  • R Development Core Team (2005). R: a language and environment for statistical computing. http://www.R-project.org. Accessed 18 Mar 2016.

  • Rapallo, F., Yoshida, R. (2010). Markov bases and subbases for bounded contingency tables. Annals of the Institute of Statistical Mathematics, 62(4), 785–805.

  • Robert, C., Casella, G. (1999). Monte Carlo statistical methods. Springer Texts in Statistics. New York: Springer.

  • Sampson, S. F. (1968). A novitiate in a period of change: an experimental and case study of relationships. PhD thesis, Department of Sociology, Cornell University.

  • Slavković, A. B. (2010). Partial information releases for confidential contingency table entries: Present and future research efforts. Journal of Privacy and Confidentiality, 1(2).

  • Slavković, A. B., Zhu, X., Petrović, S. (2015). Fibers of multi-way contingency tables given conditionals: relation to marginals, cell bounds and Markov bases. Annals of the Institute of Statistical Mathematics, 67(4), 621–648.

  • Sturmfels, B. (1996). Gröbner bases and convex polytopes. University Lecture Series. Providence: American Mathematical Society.

  • Sturmfels, B., Welker, V. (2012). Commutative algebra of statistical ranking. Journal of Algebra, 361, 264–286.

  • Villarreal, R. H. (2000). Monomial algebras. Monographs and Research Notes in Mathematics. Boca Raton: Chapman and Hall/CRC.

  • Yamaguchi, T., Ogawa, M., Takemura, A. (2013). Markov degree of the Birkhoff model. Journal of Algebraic Combinatorics, 38(4), 1–19.

  • 4ti2 team (2008). 4ti2: a software package for algebraic, geometric and combinatorial problems on linear spaces. http://www.4ti2.de. Accessed 18 Mar 2016.

Acknowledgments

The authors are grateful to Alessandro Rinaldo and Stephen E. Fienberg for their support at the inception of this project. The authors would also like to thank two anonymous referees for their very thoughtful comments and suggestions which improved this manuscript.

Author information

Corresponding author

Correspondence to Sonja Petrović.

Additional information

E. Gross is supported by the NSF Postdoctoral Research Fellowship, NSF award #DMS-1304167. S. Petrović and D. Stasi acknowledge partial support from Grants #FA9550-12-1-0392 and #FA9550-14-1-0141 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA). Some of the computations were performed on a cluster provided by an NSF-SCREMS Grant to IIT. Part of this work was completed while D. Stasi was a postdoc at Pennsylvania State University Statistics Department.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (R 3 KB)

Supplementary material 2 (R 34 KB)

About this article

Cite this article

Gross, E., Petrović, S. & Stasi, D. Goodness of fit for log-linear network models: dynamic Markov bases using hypergraphs. Ann Inst Stat Math 69, 673–704 (2017). https://doi.org/10.1007/s10463-016-0560-2

  • DOI: https://doi.org/10.1007/s10463-016-0560-2
