MEI: Mutual Enhanced Infinite Generative Model for Simultaneous Community and Topic Detection

  • Dongsheng Duan
  • Yuhua Li
  • Ruixuan Li
  • Zhengding Lu
  • Aiming Wen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6926)

Abstract

Community and topic are two widely studied patterns in social network analysis. However, most existing studies either use textual content to improve community detection or use link structure to guide topic modeling. Some recent studies take both link-emphasized communities and text-emphasized topics into account, but they model community and topic with the same latent variable. In practice, communities and topics differ from each other, so it is more reasonable to model them with different variables. To discover communities, topics, and their relations simultaneously, a mutual enhanced infinite generative model (MEI) is proposed. The model distinguishes communities from topics and relates them through community-topic distributions. Communities and topics are detected simultaneously and enhance each other during the learning process. To determine the appropriate numbers of communities and topics automatically, Hierarchical/Dirichlet Process Mixture models (H/DPM) are employed, and the model parameters are learned with a Gibbs-sampling-based approach. Experiments are conducted on a co-author network extracted from DBLP, in which each author is associated with his/her published papers. Experimental results show that the proposed model outperforms several baseline models in terms of perplexity and link prediction performance.
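The abstract relies on Dirichlet process mixtures so that the number of clusters is learned rather than fixed, with inference by Gibbs sampling. As a minimal sketch of that general mechanism only (not MEI itself, which couples communities and topics and uses a hierarchical DP), here is a collapsed Gibbs sampler for a plain DP mixture of multinomials over bags of words, in the style of Neal's conjugate sampler. The function names `dpm_gibbs` and `log_predictive`, the symmetric Dirichlet prior `beta`, the concentration `alpha`, and the toy data are hypothetical choices made for illustration. Each item joins an existing cluster with weight proportional to the cluster size times the Dirichlet-multinomial predictive probability of the item, or opens a new cluster with weight proportional to `alpha`.

```python
import math
import random
from collections import Counter


def log_predictive(bag, n_tokens, cluster_words, cluster_total, vocab_size, beta):
    """Log Dirichlet-multinomial probability of a bag of words given the word
    counts already in a cluster, under a symmetric Dirichlet(beta) prior."""
    lp = math.lgamma(cluster_total + vocab_size * beta) - math.lgamma(
        cluster_total + n_tokens + vocab_size * beta)
    for w, c in bag.items():
        prior = cluster_words.get(w, 0) + beta
        lp += math.lgamma(prior + c) - math.lgamma(prior)
    return lp


def dpm_gibbs(docs, vocab_size, alpha=1.0, beta=0.5, iters=50, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of multinomials (hypothetical
    helper). The number of clusters is sampled, not fixed in advance."""
    rng = random.Random(seed)
    bags = [Counter(d) for d in docs]
    # Start with every item in a single cluster with id 0.
    z = [0] * len(docs)
    words, totals, sizes = {0: Counter()}, {0: 0}, {0: 0}
    for d, bag in zip(docs, bags):
        words[0].update(bag)
        totals[0] += len(d)
        sizes[0] += 1
    next_id = 1

    for _ in range(iters):
        for i, bag in enumerate(bags):
            n = len(docs[i])
            # Remove item i from its cluster; drop the cluster if it becomes empty.
            k = z[i]
            words[k].subtract(bag)
            totals[k] -= n
            sizes[k] -= 1
            if sizes[k] == 0:
                del words[k], totals[k], sizes[k]

            # Existing cluster c: weight n_c * p(bag | words already in c).
            options = list(sizes)
            logw = [math.log(sizes[c])
                    + log_predictive(bag, n, words[c], totals[c], vocab_size, beta)
                    for c in options]
            # New cluster: weight alpha * p(bag | prior only).
            options.append(None)
            logw.append(math.log(alpha) + log_predictive(bag, n, {}, 0, vocab_size, beta))

            # Normalise in log space and draw the new assignment.
            m = max(logw)
            weights = [math.exp(v - m) for v in logw]
            choice = rng.choices(options, weights=weights)[0]
            if choice is None:
                choice = next_id
                next_id += 1
                words[choice], totals[choice], sizes[choice] = Counter(), 0, 0

            words[choice].update(bag)
            totals[choice] += n
            sizes[choice] += 1
            z[i] = choice
    return z


# Toy data (illustrative): two groups of documents with disjoint vocabularies.
docs = [[0, 1, 2] * 13 for _ in range(10)] + [[3, 4, 5] * 13 for _ in range(10)]
labels = dpm_gibbs(docs, vocab_size=6)
print(len(set(labels)), labels)
```

On this toy data the sampler should typically settle on two clusters without being told that number. The design keeps only sufficient statistics (per-cluster word counts) because the cluster-specific multinomials are integrated out; MEI's link model, community-topic distributions, and hierarchical topic prior are not represented here.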

Keywords

social network analysis; community detection; topic modeling; mutual enhanced infinite generative model; Dirichlet process; Gibbs sampling



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Dongsheng Duan (1)
  • Yuhua Li (1)
  • Ruixuan Li (1)
  • Zhengding Lu (1)
  • Aiming Wen (1)
  (1) School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, P. R. China
