Combining Stochastic Block Models and Mixed Membership for Statistical Network Analysis

  • Edoardo M. Airoldi
  • David M. Blei
  • Stephen E. Fienberg
  • Eric P. Xing
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4503)

Abstract

Data in the form of multiple matrices of relations among objects of a single type, representable as a collection of unipartite graphs, arise in a variety of biological settings, in collections of author-recipient email, and in social networks. Clustering the objects of study or situating them in a low-dimensional space (e.g., a simplex) is only one of the goals of the analysis of such data; being able to estimate relational structures among the clusters themselves may also be important. In [1], we introduced the family of stochastic block models of mixed membership to support such integrated data analyses. Our models combine features of mixed-membership models and block models for relational data in a hierarchical Bayesian framework. Here we present a nested variational inference scheme for this class of models, which is necessary to successfully perform fast approximate posterior inference, and we use the models and the estimation scheme to examine two data sets: (1) a collection of sociometric relations among monks, used to investigate the crisis that took place in a monastery [2], and (2) data from a school-based longitudinal study of the health-related behaviors of adolescents. Both data sets have recently been reanalyzed in [3] using a latent position clustering model, and we compare our analyses with those presented there.
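To make the combination of mixed membership and block structure concrete, the following is a minimal sketch of how a single directed graph could be simulated under such a model. The abstract does not spell out the generative process, so the sketch assumes the formulation commonly associated with mixed-membership stochastic block models: each object draws a membership vector on the simplex from a Dirichlet prior, each ordered pair draws one latent group for the sender and one for the receiver, and the relation is a Bernoulli draw governed by a group-by-group block matrix. The function names, the number of objects, and the values of `alpha` and `block_matrix` are hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index k with probability probs[k]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_mixed_membership_blockmodel(n_objects, alpha, block_matrix, seed=0):
    """Simulate membership vectors and one directed adjacency matrix.

    Assumed generative process (not stated in the abstract):
      1. membership[p] ~ Dirichlet(alpha) for every object p
      2. for every ordered pair (p, q) with p != q:
           sender group   ~ Categorical(membership[p])
           receiver group ~ Categorical(membership[q])
           relation       ~ Bernoulli(block_matrix[sender][receiver])
    """
    rng = random.Random(seed)
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_objects)]
    adjacency = [[0] * n_objects for _ in range(n_objects)]
    for p in range(n_objects):
        for q in range(n_objects):
            if p == q:
                continue  # no self-relations in this sketch
            sender_group = sample_categorical(memberships[p], rng)
            receiver_group = sample_categorical(memberships[q], rng)
            edge_prob = block_matrix[sender_group][receiver_group]
            adjacency[p][q] = 1 if rng.random() < edge_prob else 0
    return memberships, adjacency


# Hypothetical settings: three latent groups with dense within-group relations.
alpha = [0.5, 0.5, 0.5]
block_matrix = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]
memberships, adjacency = sample_mixed_membership_blockmodel(10, alpha, block_matrix)
```

In this sketch the membership vectors are the simplex coordinates that situate each object, while the block matrix carries the relational structure among the clusters. Posterior inference runs in the opposite direction, recovering both from observed relations, and is not attempted here.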

References

  1. Airoldi, E.M., Blei, D.M., Fienberg, S.E., Xing, E.P.: Stochastic block models of mixed membership. Manuscript under review (2006)
  2. Sampson, F.S.: A Novitiate in a period of change: An experimental and case study of social relationships. PhD thesis, Cornell University (1968)
  3. Handcock, M.S., Raftery, A.E., Tantrum, J.M.: Model-based clustering for social networks. Journal of the Royal Statistical Society, Series A 170, 1–22 (2007)
  4. Holland, P.W., Leinhardt, S.: Local structure in social networks. In: Heise, D. (ed.) Sociological Methodology, pp. 1–45. Jossey-Bass, San Francisco (1975)
  5. Fienberg, S.E., Meyer, M.M., Wasserman, S.: Statistical analysis of multiple sociometric relations. Journal of the American Statistical Association 80, 51–67 (1985)
  6. Wasserman, S., Pattison, P.: Logit models and logistic regression for social networks: I. an introduction to Markov graphs and p*. Psychometrika 61, 401–425 (1996)
  7. Snijders, T.A.B.: Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure (2002)
  8. Hoff, P.D., Raftery, A.E., Handcock, M.S.: Latent space approaches to social network analysis. Journal of the American Statistical Association 97, 1090–1098 (2002)
  9. Doreian, P., Batagelj, V., Ferligoj, A.: Generalized Blockmodeling. Cambridge University Press, Cambridge (2004)
  10. Taskar, B., Wong, M.F., Abbeel, P., Koller, D.: Link prediction in relational data. In: Neural Information Processing Systems, vol. 15 (2003)
  11. Kemp, C., Griffiths, T.L., Tenenbaum, J.B.: Discovering latent classes in relational data. Technical Report AI Memo 2004-019, MIT (2004)
  12. Kemp, C., Tenenbaum, J.B., Griffiths, T.L., Yamada, T., Ueda, N.: Learning systems of concepts with an infinite relational model. In: Proceedings of the 21st National Conference on Artificial Intelligence (2006)
  13. McCallum, A., Wang, X., Mohanty, N.: Joint group and topic discovery from relations and text. In: Airoldi, E., Blei, D.M., Fienberg, S.E., Goldenberg, A., Xing, E.P., Zheng, A.X. (eds.) ICML 2006. LNCS, vol. 4503, pp. 28–44. Springer, Heidelberg (2007)
  14. Blei, D.M., Ng, A., Jordan, M.I.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
  15. Cohn, D., Hofmann, T.: The missing link—A probabilistic model of document content and hypertext connectivity. In: Advances in Neural Information Processing Systems, vol. 13 (2001)
  16. Erosheva, E.A., Fienberg, S.E., Lafferty, J.: Mixed-membership models of scientific publications. Proceedings of the National Academy of Sciences 97(22), 11885–11892 (2004)
  17. Barnard, K., Duygulu, P., de Freitas, N., Forsyth, D., Blei, D., Jordan, M.: Matching words and pictures. Journal of Machine Learning Research 3, 1107–1135 (2003)
  18. Erosheva, E.A., Fienberg, S.E.: Bayesian mixed membership models for soft clustering and classification. In: Weihs, C., Gaul, W. (eds.) Classification—The Ubiquitous Challenge, pp. 11–26. Springer, Heidelberg (2005)
  19. Manton, K.G., Woodbury, M.A., Tolley, H.D.: Statistical Applications Using Fuzzy Sets. Wiley, Chichester (1994)
  20. Rosenberg, N.A., Pritchard, J.K., Weber, J.L., Cann, H.M., Kidd, K.K., Zhivotovsky, L.A., Feldman, M.W.: Genetic structure of human populations. Science 298, 2381–2385 (2002)
  21. Pritchard, J., Stephens, M., Donnelly, P.: Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000)
  22. Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning with applications to clustering with side information. In: Advances in Neural Information Processing Systems, vol. 16 (2003)
  23. Holland, P., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: Some first steps. Social Networks 5, 109–137 (1983)
  24. Anderson, C.J., Wasserman, S., Faust, K.: Building stochastic blockmodels. Social Networks 14, 137–161 (1992)
  25. Nowicki, K., Snijders, T.A.B.: Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association 96, 1077–1087 (2001)
  26. Airoldi, E.M., Blei, D.M., Fienberg, S.E., Xing, E.P.: Admixtures of latent blocks with application to protein interaction networks. Manuscript under review (2006)
  27. Airoldi, E.M., Fienberg, S.E., Xing, E.P.: Latent aspects analysis for gene expression data. Manuscript under review (2006)
  28. Carley, K.M.: Smart agents and organizations of the future. In: Lievrouw, L., Livingstone, S. (eds.) The Handbook of New Media, pp. 206–220 (2002)
  29. Jordan, M., Ghahramani, Z., Jaakkola, T., Saul, L.: Introduction to variational methods for graphical models. Machine Learning 37, 183–233 (1999)
  30. Airoldi, E.M., Fienberg, S.E., Joutard, C., Love, T.M.: Discovering latent patterns with hierarchical Bayesian mixed-membership models and the issue of model choice. Technical Report CMU-ML-06-101, School of Computer Science, Carnegie Mellon University (2006)
  31. Xing, E.P., Jordan, M.I., Russell, S.: A generalized mean field algorithm for variational inference in exponential families. In: Uncertainty in Artificial Intelligence, vol. 19 (2003)
  32. Schervish, M.J.: Theory of Statistics. Springer, Heidelberg (1995)
  33. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families and variational inference. Technical Report 649, Department of Statistics, University of California, Berkeley (2003)
  34. David, G.B., Carley, K.M.: Clearing the FOG: Fuzzy, overlapping groups for social networks. Manuscript under review (2006)
  35. Breiger, R.L., Boorman, S.A., Arabie, P.: An algorithm for clustering relational data with applications to social network analysis and comparison to multidimensional scaling. Journal of Mathematical Psychology 12, 328–383 (1975)
  36. Harris, K.M., Florey, F., Tabor, J., Bearman, P.S., Jones, J., Udry, R.J.: The national longitudinal study of adolescent health: research design. Technical report, Carolina Population Center, University of North Carolina, Chapel Hill (2003)
  37. Udry, R.J.: The national longitudinal study of adolescent health (Add Health): waves I and II, 1994–1996; wave III, 2001–2002. Technical report, Carolina Population Center, University of North Carolina, Chapel Hill (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Edoardo M. Airoldi (1)
  • David M. Blei (2)
  • Stephen E. Fienberg (1, 3)
  • Eric P. Xing (1)

  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  2. Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
  3. Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA