Statistical Models for Network Graphs

Statistical Analysis of Network Data with R

Part of the book series: Use R! (volume 65)

Abstract

The network models discussed in the previous chapter serve a variety of useful purposes. Yet for the purpose of statistical model building, they come up short. Indeed, as Robins and Morris [125] write, “A good [statistical network graph] model needs to be both estimable from data and a reasonable representation of that data, to be theoretically plausible about the type of effects that might have produced the network, and to be amenable to examining which competing effects might be the best explanation of the data.” None of the models we have seen up until this point are really intended to meet such criteria.

Notes

  1.

    These models have also been referred to as p* models, particularly in the social network literature, where they are seen as one of the later examples of a series of model classes introduced in succession over a roughly 20-year period covering the late 1970s, 1980s, and early 1990s. See, for example, the review by Wasserman and Pattison [145]. Our use of the term ‘exponential random graph models’ reflects current practice, which emphasizes the connection of these models with traditional exponential family models in classical statistics.

  2.

    Recall that an arbitrary (discrete) random vector \(\mathbf{Z}\) is said to belong to an exponential family if its probability mass function may be expressed in the form

    $$\mathbb{P}_{\theta}\left(\mathbf{Z} = \mathbf{z}\right) = \exp\left\{\theta^{T}\mathbf{g}(\mathbf{z}) - \psi(\theta)\right\}, \tag{6.1}$$

    where \(\theta \in \mathbb{R}^{p}\) is a \(p \times 1\) vector of parameters, \(\mathbf{g}(\cdot)\) is a \(p\)-dimensional function of \(\mathbf{z}\), and \(\psi(\theta)\) is a normalization term, ensuring that \(\mathbb{P}_{\theta}(\cdot)\) sums to one.

  3.

    However, it is important to realize that not every collection of (in)dependence relations among the elements of Y yields a proper joint distribution on Y. Rather, certain conditions must be satisfied, as formalized in the celebrated Hammersley-Clifford theorem (e.g., Besag [12]).

  4.

    The statnet suite is arguably the most sophisticated single collection of R packages for doing statistical modeling of network graphs, particularly from the perspective of social network analysis.

  5.

    That is, for each pair \(\{i,j\}\), we assume that \(Y_{ij}\) is independent of \(Y_{i',j'}\) for any \(\{i',j'\} \neq \{i,j\}\).

  6.

    Note that \(S_{1}(\mathbf{y}) = N_{e}\) is the number of edges.

  7.

    Formally, Frank and Strauss introduced the notion of Markov dependence for network graph models, which specifies that two possible edges are dependent whenever they share a vertex, conditional on all other possible edges. A random graph G arising under Markov dependence conditions is called a Markov graph.

  8.

    In this context the term is used to refer to a probability distribution that places a disproportionately large amount of its mass on a correspondingly small set of outcomes.

  9.

    Hunter [78] offers an equivalent formulation of this definition, in terms of geometrically weighted counts of the neighbors common to adjacent vertices.

  10.

    We note that the ergm package provides not only summary statistics but also p-values. However, as mentioned earlier, the theoretical justification for the asymptotic chi-square and F-distributions used by ergm to compute these values has not been established formally to date. Therefore, our preference is to interpret these values informally, as additional summary statistics.

  11.

    Goodness-of-fit has been found to be particularly important where ERGMs are concerned, due in large part to the issue of potential model degeneracy.

  12.

    A random variable \(X\) is said to follow a \(Q\)-class mixture distribution if its probability density function is of the form \(f(x) = \sum_{q=1}^{Q} \alpha_{q} f_{q}(x)\), for class-specific densities \(f_{q}\), where the mixing weights \(\alpha_{q}\) are all non-negative and sum to one.

  13.

    The entropy of a discrete probability distribution \(\mathbf{p} = (p_{1}, \ldots, p_{Q})\) is defined as \(H(\mathbf{p}) = -\sum_{q=1}^{Q} p_{q} \log_{2} p_{q}\), with smaller values indicating a distribution concentrated on fewer classes. This value is bounded above by \(\log_{2} Q\), corresponding to a uniform distribution on \(\{1, \ldots, Q\}\).

  14.

    A set of random variables is said to be exchangeable if their joint distribution is the same for any ordering.

  15.

    In general, a probit model specifies, for a binary response \(Y\) and covariates \(\mathbf{x}\), that \(\mathbb{P}(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \Phi(\mathbf{x}^{T}\beta)\) for some coefficient vector \(\beta\), where \(\Phi\) denotes the standard normal cumulative distribution function.

  16.

    The package latentnet, in the statnet suite of tools, implements other variants of latent network models, such as latent distance models.

  17.

    The arguments S and burn chosen in our example ask that a ‘burn-in’ of 10,000 iterations be used to initiate our MCMC sampler, after which the following 1,000 iterations are used to perform posterior inference.

  18.

    Conventions of vertex color, shape, and label are the same as in Fig. 1.1 in Chap. 1, and are specified in R in the same manner as seen in Chap. 3.

  19.

    An ROC (receiver operating characteristic) curve is commonly used in classification problems. It is obtained by plotting the true positive rate of a classifier against its false positive rate as a threshold (or similar parameter) is varied across its natural range, where the threshold is applied to the predicted values to discriminate between two classes of interest. Here, since the predictions are posterior probabilities, the threshold is varied from 0 to 1, and vertex pairs whose posterior probability of an edge is above the threshold are predicted to have an edge.

References

  1. E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing, Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014 (2008)

  2. D. Aldous, Exchangeability and related topics. In École d’Été de Probabilités de Saint-Flour XIII—1983, (Springer, Berlin, 1985), pp. 1–198

  3. J. Besag, Spatial interaction and the statistical analysis of lattice systems. J. Roy. Stat. Soc. Ser. B 36(2), 192–236 (1974)

  4. A. Coja-Oghlan, A. Lanka, Finding planted partitions in random graphs with general degree distributions. SIAM J. Discrete Math. 23(4), 1682–1714 (2009)

  5. J.-J. Daudin, F. Picard, S. Robin, A mixture model for random graphs. Stat. Comput. 18(2), 173–183 (2008)

  6. O. Frank, D. Strauss, Markov graphs. J. Am. Stat. Assoc. 81(395), 832–842 (1986)

  7. C. Geyer, E. Thompson, Constrained Monte Carlo maximum likelihood for dependent data. J. Roy. Stat. Soc. Ser. B 54(3), 657–699 (1992)

  8. M. Handcock, Assessing degeneracy in statistical models of social networks. Technical Report No. 39, Center for Statistics and the Social Sciences, University of Washington, 2003

  9. P. Hoff, Modeling homophily and stochastic equivalence in symmetric relational data. Advances in Neural Information Processing Systems, NIPS (MIT Press, Cambridge, 2008)

  10. D.N. Hoover, Row-column exchangeability and a generalized model for probability. In Exchangeability in Probability and Statistics (North-Holland, Amsterdam, 1982), pp. 81–291

  11. D. Hunter, Curved exponential family models for social networks. Soc. Network. 29(2), 216–230 (2007)

  12. D. Hunter, M. Handcock, Inference in curved exponential family models for networks. J. Comput. Graph. Stat. 15(3), 565–583 (2006)

  13. B. Karrer, M.E. Newman, Stochastic blockmodels and community structure in networks. Phys. Rev. E 83(1), 016107 (2011)

  14. D. Lusher, J. Koskinen, G. Robins, Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications (Cambridge University Press, Cambridge, 2012)

  15. G. McLachlan, T. Krishnan, The EM Algorithm and Extensions, vol. 382 (Wiley, New York, 2007)

  16. K. Nowicki, T. Snijders, Estimation and prediction for stochastic blockstructures. J. Am. Stat. Assoc. 96(455), 1077–1087 (2001)

  17. P. Pattison, G. Robins, Neighborhood-based models for social networks. Socio. Meth. 32(1), 301–337 (2002)

  18. G. Robins, M. Morris, Advances in exponential random graph (p*) models. Soc. Network. 29(2), 169–172 (2007)

  19. G. Robins, P. Pattison, Y. Kalish, D. Lusher, An introduction to exponential random graph (p*) models for social networks. Soc. Network. 29(2), 173–191 (2007)

  20. T. Snijders, P. Pattison, G. Robins, M. Handcock, New specifications for exponential random graph models. Socio. Meth. 36(1), 99–153 (2006)

  21. S. Wasserman, K. Faust, Social Network Analysis: Methods and Applications (Cambridge University Press, New York, 1994)

  22. S. Wasserman, P. Pattison, Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika 61(3), 401–425 (1996)

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kolaczyk, E.D., Csárdi, G. (2014). Statistical Models for Network Graphs. In: Statistical Analysis of Network Data with R. Use R!, vol 65. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0983-4_6