Statistical Models for Network Graphs

Kolaczyk, Eric D.; Csárdi, Gábor

doi:10.1007/978-1-4939-0983-4_6


Part of the book series: Use R! (USE R, volume 65)


Abstract

The network models discussed in the previous chapter serve a variety of useful purposes. Yet for the purpose of statistical model building, they come up short. Indeed, as Robins and Morris [125] write, “A good [statistical network graph] model needs to be both estimable from data and a reasonable representation of that data, to be theoretically plausible about the type of effects that might have produced the network, and to be amenable to examining which competing effects might be the best explanation of the data.” None of the models we have seen up until this point are really intended to meet such criteria.


Notes

1.
These models have also been referred to as p* models, particularly in the social network literature, where they are seen as one of the later examples of a series of model classes introduced in succession over a roughly 20-year period covering the late 1970s, 1980s, and early 1990s. See the review of Wasserman and Pattison [145], for example. Our use of the term ‘exponential random graph models’ reflects current practice, which emphasizes the connection of these models with traditional exponential family models in classical statistics.
2.
Recall that an arbitrary (discrete) random vector Z is said to belong to an exponential family if its probability mass function may be expressed in the form
$$\displaystyle{ \mathbb{P}_{\theta }\left (\mathbf{Z} = \mathbf{z}\right ) =\exp \left \{{\theta }^{T}\mathbf{g}(\mathbf{z}) -\psi (\theta )\right \}, }$$
(6.1)
where $\theta \in \mathbb{R}^{p}$ is a p × 1 vector of parameters, $\mathbf{g}(\cdot )$ is a p-dimensional function of $\mathbf{z}$, and ψ(θ) is a normalization term, ensuring that $\mathbb{P}_{\theta }(\cdot )$ sums to one.
3.
However, it is important to realize that it is not the case that simply any collection of (in)dependence relations among the elements of Y yields a proper joint distribution on Y. Rather, certain conditions must be satisfied, as formalized in the celebrated Hammersley-Clifford theorem (e.g., Besag [12]).
4.
The statnet suite is arguably the most sophisticated single collection of R packages for doing statistical modeling of network graphs, particularly from the perspective of social network analysis.
5.
That is, for each pair {i, j}, we assume that $Y_{ij}$ is independent of $Y _{i^{\prime},j^{\prime}}$, for any $\{i^{\prime},j^{\prime}\}\neq \{i,j\}$.
6.
Note that $S_{1}(\mathbf{y}) = N_{e}$ is the number of edges.
7.
Formally, Frank and Strauss introduced the notion of Markov dependence for network graph models, which specifies that two possible edges are dependent whenever they share a vertex, conditional on all other possible edges. A random graph G arising under Markov dependence conditions is called a Markov graph.
8.
In this context the term is used to refer to a probability distribution that places a disproportionately large amount of its mass on a correspondingly small set of outcomes.
9.
Hunter [78] offers an equivalent formulation of this definition, in terms of geometrically weighted counts of the neighbors common to adjacent vertices.
10.
We note that the ergm package provides not only summary statistics but also p-values. However, as mentioned earlier, the theoretical justification for the asymptotic chi-square and F-distributions used by ergm to compute these values has not been established formally to date. Therefore, our preference is to interpret these values informally, as additional summary statistics.
11.
Goodness-of-fit has been found to be particularly important where ERGMs are concerned, due in large part to the issue of potential model degeneracy.
12.
A random variable X is said to follow a Q-class mixture distribution if its probability density function is of the form $f(x) =\sum _{ q=1}^{Q}\alpha _{q}f_{q}(x)$, for class-specific densities $f_{q}$, where the mixing weights $\alpha _{q}$ are all non-negative and sum to one.
13.
The entropy of a discrete probability distribution $\mathbf{p} = (p_{1},\ldots ,p_{Q})$ is defined as $H(\mathbf{p}) = -\sum _{q=1}^{Q}p_{q}\log _{2}p_{q}$, with smaller values indicating a distribution concentrated on fewer classes. This value is bounded above by $\log _{2}Q$, corresponding to a uniform distribution on {1, …, Q}.
14.
A set of random variables is said to be exchangeable if their joint distribution is the same for any ordering.
15.
In general, a probit model specifies, for a binary response Y, as a function of covariates x, that $\mathbb{P}(Y = 1\vert \mathbf{X} = \mathbf{x}) =\varPhi ({\mathbf{x}}^{T}\beta )$, for some β, where Φ is the cumulative distribution function of a standard normal random variable.
16.
The package latentnet, in the statnet suite of tools, implements other variants of latent network models, such as latent distance models.
17.
The arguments S and burn chosen in our example ask that a ‘burn-in’ of 10,000 iterations be used to initiate our MCMC sampler, after which the following 1,000 iterations are used to perform posterior inference.
18.
Conventions of vertex color, shape, and label are the same as in Fig. 1.1 in Chap. 1, and are specified in R in the same manner as seen in Chap. 3.
19.
An ROC curve is commonly used in classification problems. The term refers to a curve obtained by plotting the true positive rate of a classifier against the false positive rate, as a threshold (or similar parameter) is varied across its natural range, where the threshold is applied to the predicted values to discriminate between two classes of interest. Here, since the predictions are posterior probabilities, the threshold is varied from 0 to 1, with vertex pairs for which the posterior probability of an edge is above threshold being predicted to have an edge.
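
The exponential family form in note 2 can be made concrete with a very small example. The Python sketch below enumerates every undirected graph on three vertices and uses the edge count of note 6 as the only statistic g(y), so that ψ(θ) is just the log of a finite sum. The choice of three vertices, the value θ = −0.5, and the function names are assumptions made for illustration and do not come from the chapter.

```python
import itertools
import math


def edge_count(y):
    """Statistic g(y): number of edges, with y a tuple of 0/1 edge indicators."""
    return sum(y)


def exponential_family_pmf(theta, n_vertices=3):
    """Return {graph: probability} for P(Y = y) = exp(theta * g(y) - psi(theta))."""
    n_pairs = n_vertices * (n_vertices - 1) // 2
    graphs = list(itertools.product([0, 1], repeat=n_pairs))
    # psi(theta) is the log of the normalizing sum over all possible graphs.
    psi = math.log(sum(math.exp(theta * edge_count(y)) for y in graphs))
    return {y: math.exp(theta * edge_count(y) - psi) for y in graphs}


theta = -0.5  # hypothetical parameter value
pmf = exponential_family_pmf(theta)
total = sum(pmf.values())

# With the edge count as the only statistic, each edge is an independent
# Bernoulli trial with success probability exp(theta) / (1 + exp(theta)).
edge_prob = math.exp(theta) / (1.0 + math.exp(theta))
```

Enumeration is only feasible for a handful of vertices, since the number of graphs grows as 2 to the power of the number of vertex pairs.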
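
The Q-class mixture density of note 12 can be evaluated directly from its definition. The following Python sketch assumes two normal component densities with made-up means, standard deviations, and weights; none of these values, nor the function names, are taken from the chapter. A crude Riemann sum is included only as a sanity check that the mixture still integrates to roughly one.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, components):
    """Evaluate f(x) = sum_q alpha_q * f_q(x) for mixing weights alpha_q."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must be non-negative and sum to one")
    return sum(w * f(x) for w, f in zip(weights, components))


# Hypothetical two-class mixture (Q = 2).
weights = [0.3, 0.7]
components = [
    lambda x: normal_pdf(x, -2.0, 1.0),
    lambda x: normal_pdf(x, 3.0, 0.5),
]

# Riemann sum over [-10, 10]; the tails outside this range are negligible.
step = 0.01
area = sum(mixture_pdf(-10.0 + i * step, weights, components) * step
           for i in range(2001))
```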
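
The entropy in note 13 is simple to compute by hand or in code. The Python sketch below is a minimal illustration with two invented class-probability vectors for Q = 4; it treats terms with p_q = 0 as contributing zero, which is the usual convention but is an assumption here since the note does not address it.

```python
import math


def entropy(p):
    """Entropy in bits of a discrete distribution p; zero entries contribute nothing."""
    return -sum(pq * math.log2(pq) for pq in p if pq > 0)


# Hypothetical class-membership distributions over Q = 4 classes.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

h_uniform = entropy(uniform)   # attains the upper bound log2(4)
h_peaked = entropy(peaked)     # much smaller: mass sits almost entirely on one class
upper_bound = math.log2(len(uniform))
```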
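
The probit specification in note 15 amounts to passing a linear predictor through the standard normal cumulative distribution function. The Python sketch below writes that function in terms of the error function from the standard library; the covariate and coefficient values are invented for illustration, and the function names are not from the chapter.

```python
import math


def std_normal_cdf(t):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def probit_prob(x, beta):
    """P(Y = 1 | X = x) under a probit model with coefficient vector beta."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return std_normal_cdf(linear_predictor)


# Hypothetical covariates and coefficients.
p_zero = probit_prob([1.0, 0.0], [0.0, 2.0])  # linear predictor is 0
p_pos = probit_prob([1.0, 1.0], [0.5, 1.0])   # linear predictor is 1.5
```

A linear predictor of zero gives probability one half, and larger values push the probability toward one.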
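
The thresholding procedure in note 19 can be sketched in a few lines of Python. The scores below stand in for posterior edge probabilities and the labels for the true presence or absence of an edge; both are made-up values, and the function name is hypothetical. Each threshold yields one point, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis.

```python
def roc_points(scores, labels, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # A vertex pair is predicted to have an edge when its score is above t.
        predicted = [s > t for s in scores]
        tp = sum(1 for p, l in zip(predicted, labels) if p and l == 1)
        fp = sum(1 for p, l in zip(predicted, labels) if p and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical posterior edge probabilities and true edge indicators.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0]
thresholds = [1.0, 0.75, 0.5, 0.25, 0.0]

points = roc_points(scores, labels, thresholds)
```

Lowering the threshold from 1 to 0 moves the curve from the point (0, 0), where no edges are predicted, to (1, 1), where every pair is predicted to have an edge.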

References

E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing, Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981–2014 (2008)
D. Aldous, Exchangeability and related topics. In École d’Été de Probabilités de Saint-Flour XIII—1983, (Springer, Berlin, 1985), pp. 1–198
J. Besag, Spatial interaction and the statistical analysis of lattice systems. J. Roy. Stat. Soc. Ser. B 36(2), 192–236 (1974)
A. Coja-Oghlan, A. Lanka, Finding planted partitions in random graphs with general degree distributions. SIAM J. Discrete Math. 23(4), 1682–1714 (2009)
J.-J. Daudin, F. Picard, S. Robin, A mixture model for random graphs. Stat. Comput. 18(2), 173–183 (2008)
O. Frank, D. Strauss, Markov graphs. J. Am. Stat. Assoc. 81(395), 832–842 (1986)
C. Geyer, E. Thompson, Constrained Monte Carlo maximum likelihood for dependent data. J. Roy. Stat. Soc. Ser. B 54(3), 657–699 (1992)
M. Handcock, Assessing degeneracy in statistical models of social networks. Technical Report No. 39, Center for Statistics and the Social Sciences, University of Washington, 2003
P. Hoff, Modeling homophily and stochastic equivalence in symmetric relational data. Advances in Neural Information Processing Systems, NIPS (MIT Press, Cambridge, 2008)
D.N. Hoover, Row-column exchangeability and a generalized model for probability. In Exchangeability in Probability and Statistics (North-Holland, Amsterdam, 1982), pp. 281–291
D. Hunter, Curved exponential family models for social networks. Soc. Network. 29(2), 216–230 (2007)
D. Hunter, M. Handcock, Inference in curved exponential family models for networks. J. Comput. Graph. Stat. 15(3), 565–583 (2006)
B. Karrer, M.E. Newman, Stochastic blockmodels and community structure in networks. Phys. Rev. E 83(1), 016107 (2011)
D. Lusher, J. Koskinen, G. Robins, Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications (Cambridge University Press, Cambridge, 2012)
G. McLachlan, T. Krishnan, The EM Algorithm and Extensions, vol. 382 (Wiley, New York, 2007)
K. Nowicki, T. Snijders, Estimation and prediction for stochastic blockstructures. J. Am. Stat. Assoc. 96(455), 1077–1087 (2001)
P. Pattison, G. Robins, Neighborhood-based models for social networks. Socio. Meth. 32(1), 301–337 (2002)
G. Robins, M. Morris, Advances in exponential random graph (p*) models. Soc. Network. 29(2), 169–172 (2007)
G. Robins, P. Pattison, Y. Kalish, D. Lusher, An introduction to exponential random graph (p*) models for social networks. Soc. Network. 29(2), 173–191 (2007)
T. Snijders, P. Pattison, G. Robins, M. Handcock, New specifications for exponential random graph models. Socio. Meth. 36(1), 99–153 (2006)
S. Wasserman, K. Faust, Social Network Analysis: Methods and Applications (Cambridge University Press, New York, 1994)
S. Wasserman, P. Pattison, Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika 61(3), 401–425 (1996)


Author information

Authors and Affiliations

Department of Mathematics and Statistics, Boston University, Boston, MA, USA
Eric D. Kolaczyk
Department of Statistics, Harvard University, Cambridge, MA, USA
Gábor Csárdi



About this chapter

Cite this chapter

Kolaczyk, E.D., Csárdi, G. (2014). Statistical Models for Network Graphs. In: Statistical Analysis of Network Data with R. Use R!, vol 65. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0983-4_6


DOI: https://doi.org/10.1007/978-1-4939-0983-4_6
Published: 17 April 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0982-7
Online ISBN: 978-1-4939-0983-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
