Psychometrika, Volume 83, Issue 4, pp 809–830

Outliers and Influential Observations in Exponential Random Graph Models

  • Johan Koskinen
  • Peng Wang
  • Garry Robins
  • Philippa Pattison
Article

Abstract

We discuss measuring and detecting influential observations and outliers in the context of exponential family random graph (ERG) models for social networks. We focus on the level of the nodes of the network and consider those nodes whose removal would result in changes to the model as extreme or “central” with respect to the structural features that “matter”. We construe removal in terms of two case-deletion strategies: the tie-variables of an actor are assumed to be unobserved, or the node is removed resulting in the induced subgraph. We define the difference in inferred model resulting from case deletion from the perspective of information theory and difference in estimates, in both the natural and mean-value parameterisation, representing varying degrees of approximation. We arrive at several measures of influence and propose the use of two that do not require refitting of the model and lend themselves to routine application in the ERGM fitting procedure. MCMC p values are obtained for testing how extreme each node is with respect to the network structure. The influence measures are applied to two well-known data sets to illustrate the information they provide. From a network perspective, the proposed statistics offer an indication of which actors are most distinctive in the network structure, in terms of not abiding by the structural norms present across other actors.
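The two case-deletion strategies named in the abstract can be illustrated on a toy adjacency matrix. This is only a minimal sketch, not the authors' implementation: the function names `mask_actor_ties`, `induced_subgraph`, and `edge_count`, the use of `None` to stand for an unobserved tie-variable, and the four-node example network are all assumptions made for illustration.

```python
from typing import List, Optional

# Undirected network as a square matrix; None marks an unobserved tie-variable
# (an illustrative convention, not taken from the paper).
Adjacency = List[List[Optional[int]]]


def mask_actor_ties(adj: Adjacency, node: int) -> Adjacency:
    """Strategy 1: keep the node but treat all of its tie-variables as unobserved."""
    masked = [row[:] for row in adj]  # copy so the input is left untouched
    for j in range(len(adj)):
        if j != node:
            masked[node][j] = None
            masked[j][node] = None
    return masked


def induced_subgraph(adj: Adjacency, node: int) -> Adjacency:
    """Strategy 2: remove the node, keeping the subgraph induced by the others."""
    keep = [i for i in range(len(adj)) if i != node]
    return [[adj[i][j] for j in keep] for i in keep]


def edge_count(adj: Adjacency) -> int:
    """Count observed ties in the upper triangle, skipping unobserved entries."""
    n = len(adj)
    return sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j] == 1)


# Hypothetical toy network: node 0 is tied to everyone, plus one tie between 1 and 2.
toy = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

masked = mask_actor_ties(toy, 0)    # still 4 nodes, ties of node 0 unobserved
reduced = induced_subgraph(toy, 0)  # 3 nodes remain

print(edge_count(toy), edge_count(masked), edge_count(reduced))
```

Under the first strategy the network keeps its original size and the deleted actor's ties would have to be handled as missing data when the model is fitted; under the second the network itself shrinks. Both leave the same observed ties among the remaining actors, so any difference between them comes from how the model treats the removed node, not from the remaining data.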

Keywords

statistical analysis of social networks; exponential random graph models; outliers; leverage; missing data principle; case deletion

Copyright information

© The Psychometric Society 2018

Authors and Affiliations

  1. The Mitchell Centre for Social Network Analysis and the Department of Social Statistics, School of Social Sciences, University of Manchester, Manchester, UK
  2. Melbourne School of Psychological Sciences, The University of Melbourne, Parkville, Australia
  3. Institute of Analytical Sociology, University of Linköping, Linköping, Sweden
  4. Centre for Transformative Innovation, Faculty of Business and Law, Swinburne University of Technology, Hawthorn, Australia
  5. The University of Sydney, Sydney, Australia
